$7.99 per month. That’s what access to Google’s new multimodal video generator costs through a Google AI Plus subscription. Google launched Gemini Omni Flash at I/O 2026 on May 19, and YouTube Shorts creators get access for free.
Gemini Omni is not another text-to-video tool. It’s the first model from Google that accepts text, images, audio, and video as combined input, reasons across all of them, and outputs video with synchronized audio. DeepMind CEO Demis Hassabis called it “our new model that can create anything from any input” and described it as a step toward merging Gemini’s intelligence with the rendering capabilities of Veo, Nano Banana, and Genie.
For creators, the practical impact is immediate. You can feed Omni a script, a reference image, a voice sample, and a clip of existing footage, and it will synthesize a cohesive video that accounts for all of those inputs simultaneously. Then you can edit the result by talking to it.
What’s actually under the hood
Gemini Omni Flash is the first model in the Omni family. It combines several Google AI systems that previously operated independently:
- Gemini 3.1: The reasoning and knowledge engine that understands physics, culture, and context
- Veo 3.1: Google’s video generation model (the same engine behind Google Flow)
- Nano Banana 2: Google’s image generation model, responsible for the viral AI image trend earlier this year
- Genie: Google’s interactive world model
The integration means Omni doesn’t just stitch inputs together. Nicole Brichtova, DeepMind’s director of product management, explained it as “the next step towards combining the intelligence of Gemini with the rendering capabilities of our media models.” In practice, the model accounts for gravity, kinetic energy, and fluid dynamics when generating video. A marble rolling across a desk moves and sounds the way it should because the model reasons about physics before rendering frames.
Editing by conversation, not by prompt
The most useful feature for working creators is conversational editing. After Omni generates a video, you can refine it through plain-language follow-up requests that build on each other. Ask it to swap the background, change the lighting, add a character, or adjust the wardrobe, and it keeps the elements you didn’t ask it to change consistent.
This is the difference between Omni and a standalone tool like Runway Gen-4 or Kling 3.0. Those tools generate from a single prompt. Omni generates, then lets you iterate through conversation while preserving characters and environments across edits. It’s closer to directing than prompting.
Google also introduced two companion tools inside Flow:
- Flow Agent: An AI assistant that brainstorms scenes, organizes assets, recommends plot changes, and batch-edits across a project
- Flow Tools: Custom editing workflows built through natural language prompts, no coding required
Creators already using Google Flow for video or music work will find Omni layered directly into that environment.
Your face, your voice, and Google’s safety system
Omni can generate videos featuring your digital avatar. You record yourself and speak a series of numbers during onboarding, and Google creates a personalized avatar that looks and sounds like you. The avatar gets stored for reuse across future sessions.
Google designed the avatar system with anti-deepfake measures baked in. Only the person who recorded the avatar can use it. The onboarding process verifies identity, and every video generated through Omni carries Google’s SynthID digital watermark. SynthID is imperceptible to viewers but can be verified through the Gemini app, Chrome, and Google Search.
Audio and speech editing are still in testing. Google acknowledged the sensitivity of voice manipulation and is holding those features back until the safety implementation meets their standards. For now, voice references work for avatar creation, but you cannot edit someone else’s speech within a generated video.
YouTube creators Happy Kelli and comedian Adam Waheed demonstrated the avatar system at I/O 2026, generating short clips featuring their digital selves in scenarios they described through text prompts.
Where you can use it right now
Gemini Omni Flash is live as of May 19, 2026, across multiple platforms:
| Platform | Access Level | Cost |
|---|---|---|
| Gemini app | AI Plus subscribers and above | $7.99/mo (Plus), $19.99/mo (Pro), $99.99/mo (Ultra) |
| Google Flow | AI subscribers | Included with subscription |
| YouTube Shorts | All creators | Free |
| YouTube Create App | All creators | Free, rolling out this week |
The free YouTube Shorts access is the most significant detail for working creators. Google is positioning Omni as a native creation tool inside YouTube, not just a standalone generator. Shorts creators can generate and edit AI video directly within the platform they already publish on.
API access for developers and enterprise customers is coming within the next few weeks. Google also teased Omni Pro, a higher-capability model, but gave no release date beyond “when performance exceeds Flash significantly.”
The 10-second ceiling
Omni Flash generates up to 10 seconds of video per generation. Google’s Nicole Brichtova described this as “a product decision, not a technical limitation,” suggesting longer durations are technically possible but held back while the team evaluates quality and safety at scale.
Ten seconds is enough for YouTube Shorts hooks, social clips, and B-roll inserts. It’s not enough for a full YouTube video intro or a complete explainer segment. Creators building longer content will need to generate multiple clips and stitch them together, either in Flow or in an editor like DaVinci Resolve or Descript.
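To make the stitching arithmetic concrete, here is a minimal Python sketch that splits a target runtime into clips no longer than the 10-second limit. The function name `plan_clips` and the constant `MAX_CLIP_SECONDS` are hypothetical names chosen for illustration. Nothing here calls Omni or Flow; it only plans how many generations a segment would need.

```python
import math

# Per-generation limit described for Omni Flash (assumed constant for this sketch).
MAX_CLIP_SECONDS = 10


def plan_clips(total_seconds: float, max_clip: float = MAX_CLIP_SECONDS) -> list[float]:
    """Split a target runtime into clip lengths no longer than max_clip."""
    if total_seconds <= 0:
        return []
    # Number of separate generations needed to cover the runtime.
    count = math.ceil(total_seconds / max_clip)
    # All clips except the last run the full length; the last takes the remainder.
    full = [float(max_clip)] * (count - 1)
    last = float(total_seconds - max_clip * (count - 1))
    return full + [last]


if __name__ == "__main__":
    # A 45-second explainer segment needs five generations.
    print(plan_clips(45))
```

A 45-second segment comes out as four 10-second clips plus one 5-second clip, which you would then join in Flow or an external editor.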
For context, Runway Gen-4 also generates 10-second clips. Kling 3.0 generates up to 15 seconds. The 10-second limit is industry-standard for this generation of models, though longer output windows are widely expected by late 2026.
What Omni outputs (and what’s coming)
Omni Flash launches with video as its only output format (with synchronized audio built in), but the model architecture supports standalone image and audio generation as future output modalities. Google confirmed both will arrive, though no specific dates were shared.
The current output supports native audio synchronized with the video. This includes dialogue, ambient sound, and sound effects generated directly by the model rather than added in post-production. Vertical video (9:16) is supported natively, which makes Shorts and Reels output straightforward without cropping or reformatting.
Which creators should try this first
If you’re a YouTube Shorts creator, Omni Flash is the most immediately useful AI video tool Google has released. Free access removes the cost barrier, and the YouTube integration eliminates the friction of exporting from a standalone generator and re-uploading.
If you’re a long-form creator, the value is in B-roll generation and rapid prototyping. Feed Omni a mood board, a script fragment, and a reference clip, and you’ll get a 10-second draft that shows your concept before you commit to a full production shoot. The conversational editing means you can iterate quickly without starting from scratch each time.
If you’re already paying for Google AI Plus at $7.99 per month, you now have access to Gemini Omni, Veo 3.1 through Flow, Nano Banana 2 for image generation, and Flow Music for audio. That’s a full creative stack for less than the monthly price of Midjourney’s Pro plan.
The I/O 2026 announcement also brought updates to Veo 3.1 (adding audio to Ingredients-to-Video and enhanced creative controls in Flow) alongside refreshed pricing across all Google AI tiers. Google is consolidating its creative AI tools under the Gemini umbrella, and Omni is the model that ties them all together.
