Fine-Tuning NVIDIA Cosmos Predict 2.5 with LoRA/DoRA for Robot Video Generation

An NVIDIA blog post on Hugging Face describes fine-tuning Cosmos Predict 2.5 with LoRA and DoRA adapters for robot video generation. The adapters specialize the world model for domains such as robot manipulation without the memory cost of full fine-tuning. This matters for teams that want synthetic video trajectories for training robot policies.


Market reaction: NVDA +0.10% by next close (before: $222.03, after: $222.25)

Published: Monday, May 18, 2026 at 6:00 PM


Story ID: #3233


Original article excerpt

Server-side extracted preview paragraphs from the original source.

A blog post by NVIDIA on Hugging Face

NVIDIA Cosmos Predict 2.5 is a large-scale world model capable of generating physically plausible videos conditioned on text, images, or video clips. To adapt it to a specific domain, such as robot manipulation or a particular camera viewpoint, teams still need targeted fine-tuning.

Training robot policies requires demonstration data, but collecting real-robot trajectories is slow and expensive. Generating synthetic trajectories with a fine-tuned video world model offers a scalable alternative. However, full fine-tuning of a 2B-parameter model is expensive and risks catastrophic forgetting of general knowledge. LoRA and DoRA inject small trainable adapter modules into the frozen base model, reducing memory requirements while keeping the adapter files small and portable. This makes it practical to fine-tune on a single GPU and flexibly swap adapters for different domains at inference.
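To make the adapter idea concrete, here is a minimal pure-Python sketch of how LoRA and DoRA modify a single frozen weight matrix. It is an illustration under stated assumptions, not the code from the NVIDIA post: the function names, the alpha/rank scaling, and the per-column normalization (which follows the DoRA paper's formulation; some implementations normalize along the other axis) are all choices made here for clarity.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def column_norms(w):
    """Euclidean norm of each column of w."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*w)]

def lora_weight(w0, a, b, alpha, rank):
    """Effective LoRA weight: W0 + (alpha / rank) * B @ A.

    w0 (d_out x d_in) stays frozen; only a (rank x d_in) and
    b (d_out x rank) would be trained.
    """
    s = alpha / rank
    delta = matmul(b, a)
    return [[x + s * d for x, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]

def dora_weight(w0, a, b, magnitude, alpha, rank):
    """Effective DoRA weight: a trainable per-column magnitude times the
    unit-norm direction of the LoRA-updated weight (assumed convention)."""
    v = lora_weight(w0, a, b, alpha, rank)
    norms = column_norms(v)
    return [[m * x / n for x, m, n in zip(row, magnitude, norms)] for row in v]

def trainable_params(d_out, d_in, rank, dora=False):
    """Adapter parameter count for one d_out x d_in layer."""
    n = rank * (d_in + d_out)            # entries of A plus entries of B
    return n + d_in if dora else n       # DoRA adds one magnitude per column
```

With B initialized to zeros and the magnitude initialized to the column norms of W0, both variants reproduce the frozen weight at the start of training, so fine-tuning begins from the base model's behavior. For a hypothetical 2048 x 2048 projection at rank 8, `trainable_params` gives 32,768 adapter parameters for LoRA against 4,194,304 in the full matrix, which is why adapter files stay small enough to swap per domain.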
