Gemini Omni: Google’s New ‘Create Anything’ Video AI Just Landed on YouTube for Free


$7.99 per month. That’s what full access to Google’s new multimodal video generator costs through a Gemini AI Plus subscription. Google launched Gemini Omni Flash at I/O 2026 on May 19, and YouTube Shorts creators get access for free.

Gemini Omni is not another text-to-video tool. It’s the first model from Google that accepts text, images, audio, and video as combined input, reasons across all of them, and outputs video with synchronized audio. DeepMind CEO Demis Hassabis called it “our new model that can create anything from any input” and described it as a step toward merging Gemini’s intelligence with the rendering capabilities of Veo, Nano Banana, and Genie.

For creators, the practical impact is immediate. You can feed Omni a script, a reference image, a voice sample, and a clip of existing footage, and it will synthesize a cohesive video that accounts for all of those inputs simultaneously. Then you can edit the result by talking to it.

What’s actually under the hood

Gemini Omni Flash is the first model in the Omni family. It combines several Google AI systems that previously operated independently:

  • Gemini 3.1: The reasoning and knowledge engine that understands physics, culture, and context
  • Veo 3.1: Google’s video generation model (the same engine behind Google Flow)
  • Nano Banana 2: Google’s image generation model, responsible for the viral AI image trend earlier this year
  • Genie: Google’s interactive world model

The integration means Omni doesn’t just stitch inputs together. Nicole Brichtova, DeepMind’s director of product management, explained it as “the next step towards combining the intelligence of Gemini with the rendering capabilities of our media models.” In practice, the model accounts for gravity, kinetic energy, and fluid dynamics when generating video. A marble rolling across a desk moves and sounds the way it should because the model reasons about physics before rendering frames.

Editing by conversation, not by prompt

The most useful feature for working creators is conversational editing. After Omni generates a video, you can refine it through plain language prompts that build on each other. Ask it to swap the background, change the lighting, add a character, or adjust the wardrobe, and it maintains consistency in the elements you didn’t ask it to change.

This is the difference between Omni and a standalone tool like Runway Gen-4 or Kling 3.0. Those tools generate from a single prompt. Omni generates, then lets you iterate through conversation while preserving characters and environments across edits. It’s closer to directing than prompting.

Google also introduced two companion tools inside Flow:

  • Flow Agent: An AI assistant that brainstorms scenes, organizes assets, recommends plot changes, and batch-edits across a project
  • Flow Tools: Custom editing workflows built through natural language prompts, no coding required

Creators already using Google Flow for video or music work will find Omni layered directly into that environment.

Your face, your voice, and Google’s safety system

Omni can generate videos featuring your digital avatar. You record yourself and speak a series of numbers during onboarding, and Google creates a personalized avatar that looks and sounds like you. The avatar gets stored for reuse across future sessions.

Google designed the avatar system with anti-deepfake measures baked in. Only the person who recorded the avatar can use it. The onboarding process verifies identity, and every video generated through Omni carries Google’s SynthID digital watermark. SynthID is imperceptible to viewers but can be verified through the Gemini app, Chrome, and Google Search.

Audio and speech editing are still in testing. Google acknowledged the sensitivity of voice manipulation and is holding those features back until the safety implementation meets its standards. For now, voice references work for avatar creation, but you cannot edit someone else’s speech within a generated video.

YouTube creators Happy Kelli and comedian Adam Waheed demonstrated the avatar system at I/O 2026, generating short clips featuring their digital selves in scenarios they described through text prompts.

Where you can use it right now

Gemini Omni Flash is live as of May 19, 2026, across multiple platforms:

Platform | Access Level | Cost
Gemini app | AI Plus subscribers and above | $7.99/mo (Plus), $19.99/mo (Pro), $99.99/mo (Ultra)
Google Flow | AI subscribers | Included with subscription
YouTube Shorts | All creators | Free
YouTube Create App | All creators | Free, rolling out this week
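
If you’re budgeting for a paid tier, the monthly prices above can be annualized with a few lines of Python. This is a rough sketch that assumes straight monthly billing with no annual discount, which the announcement doesn’t mention either way. The names `MONTHLY_PRICE_CENTS`, `annual_cost_cents`, and `format_usd` are made up for illustration.

```python
# Monthly prices from the table, stored in cents to avoid float rounding.
MONTHLY_PRICE_CENTS = {"Plus": 799, "Pro": 1999, "Ultra": 9999}


def annual_cost_cents(monthly_cents):
    """Twelve months at the monthly rate (assumes no annual discount)."""
    return monthly_cents * 12


def format_usd(cents):
    """Render an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


for tier, monthly in MONTHLY_PRICE_CENTS.items():
    print(f"{tier}: {format_usd(annual_cost_cents(monthly))} per year")
```

Under that assumption, Plus works out to $95.88 a year, Pro to $239.88, and Ultra to $1199.88.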

The free YouTube Shorts access is the most significant detail for working creators. Google is positioning Omni as a native creation tool inside YouTube, not just a standalone generator. Shorts creators can generate and edit AI video directly within the platform they already publish on.

API access for developers and enterprise customers is coming within the next few weeks. Google also teased Omni Pro, a higher-capability model, but gave no release date beyond “when performance exceeds Flash significantly.”

The 10-second ceiling

Omni Flash generates up to 10 seconds of video per generation. Google’s Nicole Brichtova described this as “a product decision, not a technical limitation,” suggesting longer durations are technically possible but held back while the team evaluates quality and safety at scale.

Ten seconds is enough for YouTube Shorts hooks, social clips, and B-roll inserts. It’s not enough for a full YouTube video intro or a complete explainer segment. Creators building longer content will need to generate multiple clips and stitch them together, either in Flow or in an editor like DaVinci Resolve or Descript.
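
To see how many generations a longer piece needs, you can split the target runtime into clips that each fit under the cap. The sketch below is a planning aid only, not anything Omni or Flow provides: `plan_clips` and `MAX_CLIP_SECONDS` are hypothetical names, and it assumes clips are simply placed back to back with no overlap or transitions.

```python
import math

# Per-generation limit reported for Omni Flash (assumed fixed at 10 seconds).
MAX_CLIP_SECONDS = 10


def plan_clips(target_seconds, max_clip_seconds=MAX_CLIP_SECONDS):
    """Split a target duration into consecutive clip lengths within the cap."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    count = math.ceil(target_seconds / max_clip_seconds)
    clips = []
    remaining = target_seconds
    for _ in range(count):
        # Every clip is full length except possibly the last one.
        length = min(max_clip_seconds, remaining)
        clips.append(length)
        remaining -= length
    return clips


print(plan_clips(45))  # [10, 10, 10, 10, 5]
```

A 45-second segment therefore takes five generations, and a full 60-second Short takes six, before any retakes or conversational edits.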

For context, Runway Gen-4 also generates 10-second clips. Kling 3.0 generates up to 15 seconds. The 10-second limit is industry-standard for this generation of models, though longer output windows are widely expected by late 2026.

What Omni outputs (and what’s coming)

Omni Flash launches with video output only, but the model architecture supports image and audio generation as future output modalities. Google confirmed both will arrive, though no specific dates were shared.

The current output supports native audio synchronized with the video. This includes dialogue, ambient sound, and sound effects generated directly by the model rather than added in post-production. Vertical video (9:16) is supported natively, which makes Shorts and Reels output straightforward without cropping or reformatting.

Which creators should try this first

If you’re a YouTube Shorts creator, Omni Flash is the most immediately useful AI video tool Google has released. Free access removes the cost barrier, and the YouTube integration eliminates the friction of exporting from a standalone generator and re-uploading.

If you’re a long-form creator, the value is in B-roll generation and rapid prototyping. Feed Omni a mood board, a script fragment, and a reference clip, and you’ll get a 10-second draft that shows your concept before you commit to a full production shoot. The conversational editing means you can iterate quickly without starting from scratch each time.

If you’re already paying for Google AI Plus at $7.99 per month, you now have access to Gemini Omni, Veo 3.1 through Flow, Nano Banana 2 for image generation, and Flow Music for audio. That’s a full creative stack for less than a single month of Midjourney’s Pro plan.

The I/O 2026 announcement also brought updates to Veo 3.1 (adding audio to Ingredients-to-Video and enhanced creative controls in Flow) alongside refreshed pricing across all Google AI tiers. Google is consolidating its creative AI tools under the Gemini umbrella, and Omni is the model that ties them all together.

Ty Sutherland

Ty Sutherland is the Chief Editor of Full-stack Creators. Ty is a lifelong creator whose journey began with recording music at the tender age of 12 and crafting video content during his high school years. This passion for storytelling led him to the University of Regina's film faculty, where he honed his craft. Post-university, Ty transitioned into the technology realm, amassing 25 years of experience in coding and systems administration. His tenure at Electronic Arts provided a deep dive into the entertainment and game development sectors. As the GM of a data center and later the COO of WTFast, Ty's focus sharpened on product strategy, intertwining it with marketing and community-building, particularly within the gaming community. Outside of his professional pursuits, Ty remains an enthusiastic content creator. He's deeply intrigued by AI's potential to augment individual skill sets and help people unleash their innate talents. At Full-stack Creators, Ty's mission is clear: to impart the wealth of knowledge he's gathered over the years, assisting creators across all mediums and genres in their artistic endeavors.
