Google’s Gemini Omni turns images, audio, and text into video — and that’s just the start

Google has introduced Gemini Omni, a multimodal AI model that generates and edits video from text, image, and audio inputs. Users create and revise videos through simple conversational commands. Gemini Omni represents a significant advance in AI-driven video generation and editing.

Market reaction: GOOGL down 0.57% by next close (before: $389.84, after: $387.60)

Published: Tuesday, May 19, 2026 at 7:45 PM

Story ID: #3293

Original article excerpt

Google's Gemini Omni is a new multimodal model that reasons across text, images, audio, and video to generate and edit videos through simple conversation — starting with Omni Flash.

When Google launched Gemini three years ago, the goal was to build a multimodal large language model — a single neural network that was trained on text, image, audio, and video and could generate content in any of those formats.
