Willison: The killer app of Gemini Pro 1.5 is video

Willison’s experience with Gemini 1.5 Pro extracting structured output from video prompts, and his reaction to it, parallel my own experience using GPT-4 Vision to extract structure from images of heirloom recipes (often handwritten and horribly mangled):

… I’m pretty astonished by this.

… I find those results pretty astounding.

The ability to analyze video like this feels SO powerful. Being able to take a 20 second video of a bookshelf and get back a JSON array of those books is just the first thing I thought to try.
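
As a rough illustration of the receiving end of that workflow, here is a minimal Python sketch that turns a model's text reply into a checked list of book records. It does not call Gemini or GPT-4 Vision. The function name `parse_book_list`, the `title` and `author` keys, and the sample reply are all assumptions made up for this example, not anything from Willison's post or either API.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def parse_book_list(reply: str) -> list[dict]:
    """Parse a model's text reply into a list of book records.

    Assumes (hypothetically) that each record is a JSON object
    with at least a "title" key.
    """
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)

    books = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(books, list):
        raise ValueError("expected a JSON array of books")
    for book in books:
        if not isinstance(book, dict) or "title" not in book:
            raise ValueError(f"unexpected entry: {book!r}")
    return books


# Made-up sample reply, wrapped in a fence the way a chat model might return it.
sample_reply = "\n".join([
    FENCE + "json",
    '[{"title": "Example Book", "author": "A. Writer"}]',
    FENCE,
])
books = parse_book_list(sample_reply)
print(len(books), books[0]["title"])
```

Checking the shape before using the data matters here because a vision model can return prose, a partial array, or entries with missing fields, and a strict parse surfaces that immediately.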