Structured Outputs Big Time
OpenAI introduce native structured output support in the API. You probably don't want to use it yet.
UPDATE:
Folks from OpenAI have confirmed that the significant time-to-first-token delay with structured outputs only happens on the first call with a new schema, and that the compiled schema is then cached for future calls. In most production cases that’s fine, and you can issue one initial call to warm the cache.
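If I understand the caching behaviour correctly, the warm-up can be a single throwaway call at startup. A minimal sketch, using a made-up example schema; the one slow call pays the compilation cost so real requests don’t:

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical example schema, just for illustration
schema = {
    "name": "event",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
        },
        "required": ["title", "date"],
        "additionalProperties": False,
    },
}

# Throwaway call at startup: this is the slow one, after which
# the compiled schema should be cached for subsequent calls.
client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "warm up"}],
    response_format={"type": "json_schema", "json_schema": schema},
)
```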
Also, I made a mistake in my initial testing, using the wrong model (‘gpt-4o’ instead of ‘gpt-4o-2024-08-06’), which led me to believe that the new response_format parameter wasn’t working and that this feature could only be used with tool calls.
Considering everything, I still feel that the full power of Instructor provides enough of an advantage for me to stick with it, but the new feature from OpenAI is definitely usable. Hopefully we can use both and get the power of controlled generation together with the richness of the complete model and the validation offered by Pydantic and Instructor.
SEE ALSO: https://simonwillison.net/2024/Aug/6/openai-structured-outputs/ with a thorough review of the new capabilities and some additional details.
If there’s anything people really hate about me, other than my exceptionally good looks, it’s that I can’t seem to stop talking (preaching?) about how using LLMs with structured outputs is the best thing since sliced bread and Pydantic is all you need and AI is Software and yada yada yada … so naturally I felt vindicated to learn that OpenAI agrees, and have now introduced structured output in their APIs and SDKs.
This is great news:
It will bring structured outputs to a much wider audience that has not yet discovered them as the best technique for working with LLMs, because until now doing so depended on community libraries many developers didn’t even know about.
By implementing this natively, OpenAI are able to tightly control the LLM during generation and thus guarantee 100% compliance with the specified schema. That was previously only possible with open models that let you intervene in the generation. (Note: Cohere also recently introduced this in their API - kudos to them!)
The (admittedly beta) implementation as it stands is probably not something you’ll want to use in most cases, though:
Time to first token is loooooooooong. Because OpenAI need to compile the schema into a grammar for use in generation, there is an initial overhead that makes every call take a very long time. I’m pretty sure they’ll eventually overcome this with faster compilation and caching of reused schemas, but the current implementation is not usable for many scenarios.
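You can see the overhead for yourself with a rough timing sketch like the one below (the schema is again a made-up example; per the update above, only the first call with a new schema should pay the cost):

```python
import time
from openai import OpenAI

client = OpenAI()

response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "answer",  # hypothetical schema for timing purposes
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
            "additionalProperties": False,
        },
    },
}

for attempt in ("first call (schema compiled)", "second call (cached)"):
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": "Say hello."}],
        response_format=response_format,
    )
    print(f"{attempt}: {time.perf_counter() - start:.1f}s")
```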
The JSON schema accepted by the API is limited. OpenAI claim they focused on core use cases and left out a “long tail” of features they consider unnecessary. Perhaps, but when I tried to migrate my existing code to the new format I discovered that many of my schemas are not accepted. At the very least, we’ll need to get used to working with a subset of JSON Schema when using this feature.
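From what I ran into, strict mode wants every property listed under required and additionalProperties explicitly set to false, while rejecting validation keywords many existing schemas use. A sketch of the difference, based on my testing (the exact limits are in OpenAI’s docs and may change):

```python
# Accepted by strict mode: every property is required and
# additionalProperties is explicitly false.
accepted = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
    "additionalProperties": False,
}

# Rejected in my testing: optional properties, open objects, and
# validation keywords like minLength / format fall outside the subset.
rejected = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "minLength": 1},      # minLength unsupported
        "email": {"type": "string", "format": "email"},  # format unsupported
    },
    "required": ["name"],  # "email" left optional: not allowed in strict mode
    # additionalProperties not set: must be explicitly false
}
```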
The Python SDK released today doesn’t actually include all the changes advertised in the documentation. In particular, support for passing Pydantic BaseModel subclasses as the schema definition isn’t there yet. I’m sure this will be added in future releases, but it’s a good reminder that this is beta software.
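For reference, this is the Pydantic-flavoured interface the documentation advertises; a sketch of the documented approach, assuming a later SDK release that actually ships it (the CalendarEvent model is my own example):

```python
from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    title: str
    date: str
    attendees: list[str]

client = OpenAI()

# The docs describe a parse() helper that accepts the BaseModel directly
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Alice and Bob meet Friday for a demo."}],
    response_format=CalendarEvent,
)

event = completion.choices[0].message.parsed  # a CalendarEvent instance
```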
What should we use instead? Instructor + Pydantic still offers the easiest way to do structured outputs with OpenAI as well as many other LLMs. It does not guarantee compliance during generation (that isn’t possible without controlling the LLM itself), but it does validate results against the response model’s definition and even retries, feeding hints from the error message back to the model when validation fails. The sketch below shows what that looks like.
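A minimal sketch, assuming a current Instructor release; the User model and its validator are my own example:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

class User(BaseModel):
    name: str
    age: int

    @field_validator("age")
    @classmethod
    def age_must_be_non_negative(cls, v: int) -> int:
        if v < 0:
            raise ValueError("age must be non-negative")
        return v

# Patch the client so create() accepts a response_model
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o",
    response_model=User,
    max_retries=2,  # on validation failure, retry with the error as a hint
    messages=[{"role": "user", "content": "Extract: John is 25 years old."}],
)
print(user)  # a validated User instance
```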
I’m super excited about OpenAI recognising the power of structured outputs and including it in the API. I am certain that in time this will become the main way software developers integrate LLMs into their code. But it may take just a bit longer.