Thinking with images

OpenAI's o3 and o4-mini models can reason with images inside their chain of thought, rather than only accepting images as input alongside text. While working toward an answer, the models can transform an uploaded image, for example by cropping, zooming, or rotating it. This matters because it lets a single model combine visual and textual reasoning without relying on separate specialized models.

Published: Wednesday, April 16, 2025 at 12:00 PM

Story ID: #398

Original article excerpt

OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought.

OpenAI o3 and o4-mini are the latest visual reasoning models in our o-series. For the first time, our models can think with images in their chain-of-thought—not just see them.

Similar to our earlier OpenAI o1 model, o3 and o4-mini are trained to think for longer before answering, using a long internal chain of thought before responding to the user. o3 and o4-mini extend this capability by thinking with images in their chain of thought. They do this by transforming user-uploaded images with tools that let them crop, zoom in, and rotate, along with other simple image processing techniques. More importantly, these capabilities come natively, without relying on separate specialized models.
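To make the named transformations concrete, here is a minimal sketch of crop, zoom, and rotate on a tiny grid of pixel values. It is only an illustration of the kind of simple image processing the excerpt mentions. It is not OpenAI's tooling, and the `Image` type and the function names `crop`, `zoom`, and `rotate_cw` are hypothetical names chosen for this example.

```python
from typing import List

# Hypothetical stand-in for an image: a grid of grayscale pixel values.
Image = List[List[int]]


def crop(img: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Keep rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in img[top:bottom]]


def zoom(img: Image, factor: int) -> Image:
    """Enlarge by an integer factor by repeating each pixel (nearest neighbour)."""
    out: Image = []
    for row in img:
        wide = [px for px in row for _ in range(factor)]
        # Repeat the widened row; copy it so the output rows are independent lists.
        out.extend(list(wide) for _ in range(factor))
    return out


def rotate_cw(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

region = crop(image, 1, 0, 3, 2)  # top-right 2x2 region: [[2, 3], [5, 6]]
enlarged = zoom(region, 2)        # 4x4 grid, each pixel repeated twice per axis
turned = rotate_cw(region)        # [[5, 2], [6, 3]]
```

Chaining the steps, as in crop then zoom, mirrors the idea of narrowing in on a region of interest before inspecting it more closely. A real pipeline would use an image library rather than nested lists.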
