OpenAI Vision + Function Calling Examples

OpenAI's April 9th model update merges Turbo and Vision, and these examples show the two working together.
April 28, 2024 · 1 min

When OpenAI released preview Vision support in GPT-4, it had some major limitations: it lacked many of the newest features available in GPT-4 Turbo, including structured output (JSON Mode) and function calling. As a workaround, I had to hack my Recipe Ingestor project, which uses GPT-4 Vision to ingest, structure, and enrich legacy recipe images, to run in two passes.
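
For context, the two-pass workaround looked roughly like this: a first call to the Vision preview model transcribes the image as plain text, then a second text-only Turbo call structures that text with JSON Mode. This is just a sketch of the pattern, not the project's actual code; the prompts and image URL are placeholders.

```python
from openai import OpenAI

client = OpenAI()
recipe_image_url = "https://example.com/recipe.jpg"  # placeholder image

# Pass 1: the Vision preview model could see images but not use JSON Mode,
# so it only transcribes the recipe as free text.
transcription = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this recipe exactly as written."},
            {"type": "image_url", "image_url": {"url": recipe_image_url}},
        ],
    }],
).choices[0].message.content

# Pass 2: a text-only Turbo call structures the transcription with JSON Mode.
structured = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": f"Return this recipe as JSON with title, ingredients, and steps:\n{transcription}",
    }],
)
print(structured.choices[0].message.content)
```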

On April 9th, OpenAI released an updated gpt-4-turbo model:

GPT-4 Turbo with Vision: The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling.

I’ve updated Recipe Ingestor so it now runs nicely in one pass that uses Vision alongside Turbo’s JSON Mode.
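
Here's a rough sketch of what that single pass looks like with the current OpenAI Python SDK. The prompt and image URL are illustrative placeholders, not the Ingestor's real ones.

```python
from openai import OpenAI

client = OpenAI()
recipe_image_url = "https://example.com/recipe.jpg"  # placeholder image

# One call: gpt-4-turbo accepts the image input and honors JSON Mode together.
response = client.chat.completions.create(
    model="gpt-4-turbo",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Extract this recipe as JSON with title, ingredients, and steps."},
            {"type": "image_url", "image_url": {"url": recipe_image_url}},
        ],
    }],
)
print(response.choices[0].message.content)
```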

In making these updates, I dug around in the OpenAI docs and came across a pair of new Vision examples in the OpenAI Cookbook that take advantage of the April 9th model, driving function calling from image inputs. Provided as a Jupyter Notebook, the examples have me mulling whether function calling might make sense for Recipe Ingestor.
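
To give a flavor of the pattern, here's a sketch of function calling driven from an image. The `save_recipe` tool is hypothetical, invented for illustration; the cookbook's notebook defines its own functions.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: the model fills in arguments it reads off the image.
tools = [{
    "type": "function",
    "function": {
        "name": "save_recipe",  # hypothetical, for illustration only
        "description": "Persist a recipe extracted from an image.",
        "parameters": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "ingredients": {"type": "array", "items": {"type": "string"}},
            },
            "required": ["title", "ingredients"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    tools=tools,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Save the recipe shown in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/recipe.jpg"}},
        ],
    }],
)

# The model returns a tool call whose arguments arrive as a JSON string.
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```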