Professional video editors spend roughly 40% of their working hours on one task that has nothing to do with creativity: searching for the right clip. You scrub, you skim, you tag, you forget where you put the B-roll of that sunset, and you scrub again. TwelveLabs, a San Francisco- and Seoul-based AI company with $107 million in funding, just launched a product that attacks this exact bottleneck.
Rodeo, which went live on June 1, 2026, is an AI video editing copilot that lets you describe what you want in plain English and then searches, finds, and assembles clips from your footage library automatically. It is the first consumer-facing product from a company that has spent four years building video understanding models for enterprise clients like Autodesk.
What Rodeo Does That Your Current Editor Doesn’t
Most AI video tools in 2026 fall into two camps. Transcript-first editors like Descript let you cut video by editing text, which is brilliant for talking-head content but useless when the moment you need is visual (a drone shot, a reaction, a product close-up with no dialogue). Clip generators like OpusClip analyze long videos and spit out short vertical clips for social, which saves time but removes creative control.
Rodeo sits in a different lane entirely. You upload your footage library, and the AI watches all of it. Not the transcript. Not the metadata. The actual visual content, the audio, the movement, the scene rhythm. Then you talk to it.
“Find every shot where the camera pulls back from a close-up to a wide.” “Show me all the clips with warm golden-hour light.” “Give me three cuts of people laughing naturally, not posed.”
Rodeo returns timestamped results across your entire library. You select, reorder, and assemble. The tool handles the search; you handle the story.
The Models Powering the Search
Two proprietary models do the heavy lifting.
Marengo 3.0 is TwelveLabs’ video understanding engine. It compresses audio, text, movement, visuals, and scene context into a searchable embedding space. Think of it as a librarian who has actually read every book in the building, not one who only knows the spine labels. Marengo lets Rodeo understand what is happening in a frame, not just what objects are present.
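To make the "searchable embedding space" idea concrete, here is a minimal sketch of how embedding-based retrieval works in general. It is not TwelveLabs' implementation or API: the `Clip` class, the `search` function, the file names, and the three-dimensional vectors are all hypothetical, and real video embeddings have far more dimensions and come from a trained model. The sketch assumes that both clips and the text query have already been mapped to vectors, then ranks clips by cosine similarity to the query.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    """A hypothetical indexed segment: source file, time range, and its embedding."""
    file: str
    start_s: float
    end_s: float
    embedding: List[float]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: List[float], clips: List[Clip], top_k: int = 3) -> List[Tuple[float, Clip]]:
    """Return the top_k (score, clip) pairs, most similar first."""
    scored = [(cosine(query_embedding, clip.embedding), clip) for clip in clips]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy library: made-up vectors standing in for model-produced embeddings.
LIBRARY = [
    Clip("beach.mp4", 12.0, 18.5, [0.9, 0.1, 0.0]),
    Clip("interview.mp4", 40.0, 52.0, [0.1, 0.9, 0.1]),
    Clip("drone.mp4", 3.0, 9.0, [0.7, 0.0, 0.6]),
]

# Made-up vector standing in for an embedded query such as "warm golden-hour light".
QUERY = [1.0, 0.0, 0.1]

if __name__ == "__main__":
    for score, clip in search(QUERY, LIBRARY, top_k=2):
        print(f"{clip.file} [{clip.start_s}s to {clip.end_s}s] score={score:.3f}")
```

The point of the sketch is that nothing in the ranking step depends on a transcript: a clip with no dialogue can still score highly if its vector sits close to the query's vector.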
Pegasus 1.5 handles long-form video. It supports footage up to one hour with low latency, and TwelveLabs claims it outperformed Gemini 2.5 Pro by 30% on aggregate segmentation quality benchmarks. That benchmark was announced at NAB Show 2026 in April, where the model was already deployed with a major broadcast network.
Together, these models give Rodeo something that timeline scrubbing and transcript search simply cannot: the ability to find visual moments that were never spoken about.
Who Built This (and Why That Matters for Trust)
TwelveLabs was founded by Jae Lee and a team of twelve researchers spanning language, video, machine learning, and perception. The advisory board reads like an AI credibility checklist: Fei-Fei Li (Stanford, co-creator of ImageNet), Silvio Savarese (Stanford), Jeffrey Katzenberg (former DreamWorks Animation CEO), Alex Wang (Scale AI founder), and Lukas Biewald (Weights & Biases founder).
The company raised a $50 million Series A in June 2024 led by NEA and NVIDIA’s NVentures, with additional backing from Index Ventures and Radical Ventures. A $30 million round in December 2024 brought in Databricks, Snowflake Ventures, HubSpot Ventures, and In-Q-Tel (the CIA’s venture arm). Total raised: $107 million across five rounds.
Before Rodeo, TwelveLabs sold its video intelligence API to enterprise clients. The Autodesk partnership, announced at NAB Show 2026, embeds TwelveLabs’ search and tagging into Autodesk Flow Capture (the platform formerly known as Moxion and PIX). Hugh Calveley from Autodesk put it plainly: “Creative teams shouldn’t have to hunt for their footage.”
Rodeo is TwelveLabs’ bet that the same technology can serve individual creators, not just production studios.
Where Rodeo Fits Next to Descript, CapCut, and OpusClip
The comparison chart in most creators’ heads looks something like this:
| Tool | Best for | AI approach | Editing model |
|---|---|---|---|
| Descript | Podcasts, talking-head video | Transcript-first | Edit text, edit video |
| CapCut | Social video, quick cuts | Template and filter based | Timeline with AI shortcuts |
| OpusClip | Repurposing long video into shorts | Speech and engagement scoring | Automated clipping |
| Rodeo | Finding and assembling visual footage | Multimodal scene understanding | Natural language search, manual assembly |
Rodeo is not trying to replace your timeline editor. It is trying to replace the hours you spend before you start editing: the logging, the scrubbing, the “I know I shot this somewhere” frustration. If you shoot a lot of original footage (travel, events, product demos, documentary work), Rodeo solves a problem that Descript and OpusClip don’t even address.
If your content is mostly screen recordings or talking-head videos with minimal B-roll, Descript is still the faster path. If you need automated short-form clips for social, OpusClip is purpose-built for that. Rodeo’s superpower is visual search across large footage libraries, and it shines the more raw footage you have.
What Creators Should Watch Out For
Rodeo is brand new. Launched two days ago. A few things to consider before going all in:
Pricing is undisclosed. TwelveLabs has not published Rodeo pricing as of launch day. Their API pricing for developers starts at a free tier and scales to enterprise, but consumer pricing for Rodeo may follow a different structure. If budget is a constraint, wait for the pricing page before uploading your library.
Export and integration details are thin. The press materials describe the search and assembly workflow, but specifics on export formats, resolution support, and integrations with editors like Premiere Pro or DaVinci Resolve are still unclear. Expect these to fill in as the product matures.
It’s a 1.0 product from an infrastructure company. TwelveLabs has spent four years building models for enterprise customers. Rodeo is their first consumer product. The underlying AI is proven, but the user experience for solo creators may need iteration. Early adopters will get the best technology with the roughest edges.
Your footage has to be uploaded. Rodeo’s value scales with library size, but that means uploading potentially terabytes of footage to a cloud service. Creators working with sensitive client footage or operating under NDAs should review TwelveLabs’ data handling policies before committing.
The Practical Test
I ran a thought experiment based on my own workflow. When I was building out internal training materials during my IT operations years, the single biggest time sink was always the same: “I filmed this walkthrough six months ago. Where is the segment where I demonstrated the failover procedure?” That question, multiplied across dozens of shoots, is exactly what Rodeo promises to answer in seconds instead of hours.
For creators who shoot original footage at volume (YouTubers with multi-camera setups, freelance videographers juggling client projects, documentarians sitting on hundreds of hours of interviews), Rodeo represents a genuine shift. It is the first editing tool built around the assumption that finding footage is harder than cutting it.
For creators who work primarily with generated content, screen recordings, or templated social posts, Rodeo solves a problem you probably don’t have yet. Keep it on your radar, but your current stack is likely sufficient.
Rodeo is available now at rodeo.twelvelabs.io. Signing up requires a TwelveLabs account.
