
Veo 3.1 Reference-to-Video

Native audio can be automatically created and synchronized with visual content, improving output realism and coherence.

Free $1 Tokens for New Members


JavaScript

const main = async () => {
  const res = await fetch('https://api.ai.cc/v2/video/generations', {
    method: 'POST',
    headers: {
      // Replace <YOUR_API_KEY> with your own API key.
      Authorization: 'Bearer <YOUR_API_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'google/veo-3.1-reference-to-video',
      prompt: 'A graceful ballerina dancing outside a circus tent on green grass, with colorful wildflowers swaying around her as she twirls and poses in the meadow.',
      // Up to three reference images that guide style and scene layout.
      image_urls: [
        'https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-1.png',
        'https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-2.png',
        'https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-3.png',
      ],
    }),
  });

  // Fail loudly on HTTP errors (e.g. a missing key) instead of logging an error body as a result.
  if (!res.ok) {
    throw new Error(`Request failed with ${res.status}: ${await res.text()}`);
  }

  const response = await res.json();
  console.log('Generation:', response);
};

main().catch(console.error);

Python

import requests


def main():
    url = "https://api.ai.cc/v2/video/generations"
    payload = {
        "model": "google/veo-3.1-reference-to-video",
        "prompt": "A graceful ballerina dancing outside a circus tent on green grass, with colorful wildflowers swaying around her as she twirls and poses in the meadow.",
        # Up to three reference images that guide style and scene layout.
        "image_urls": [
            "https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-1.png",
            "https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-2.png",
            "https://storage.googleapis.com/falserverless/example_inputs/veo31-r2v-input-3.png",
        ],
    }
    # Replace <YOUR_API_KEY> with your own API key.
    headers = {"Authorization": "Bearer <YOUR_API_KEY>", "Content-Type": "application/json"}

    response = requests.post(url, json=payload, headers=headers, timeout=60)
    response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
    print("Generation:", response.json())


if __name__ == "__main__":
    main()
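The samples above only print the initial response. Because a video takes time to render, the service may return a pending job rather than a finished video; whether it does, and how job status is reported, is not described on this page. The sketch below is a generic polling loop under stated assumptions: you supply a `get_status` callable that performs whatever status request the API documentation specifies, and it returns a dict whose `status` field ends in `"completed"` or `"failed"`. Those field names and status values are hypothetical.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical terminal status values; confirm the real ones in the API docs.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    interval: float = 10.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call get_status() until it reports a terminal status or the timeout expires.

    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        job = get_status()  # e.g. a GET request for the job's current state
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still {job.get('status')!r} after {timeout} s")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop free of real waiting in tests; in production the defaults wait in real time. Which field of the finished job holds the video URL is not specified here, so check the documentation before reading it.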


300+ AI Models for OpenClaw & AI Agents

Save 20% on Costs & $1 Free Tokens


Introducing Veo 3.1 Reference-to-Video

Google DeepMind's Veo 3.1 Reference-to-Video is a video generation model built for creative control: reference images guide the style and composition of each scene, keeping the artistic look consistent and scene elements well integrated. Veo 3.1 natively produces high-fidelity 8-second videos at 720p or 1080p, with synchronized audio generated alongside the visuals.

Source: Veo 3.1 - Ingredients to video

Technical Specifications & Performance

✅ Core Specifications

Input Modalities: Text-to-Video, Image-to-Video (Reference images), Video-to-Video
Output Resolution: 720p and 1080p (16:9 aspect ratio)
Video Length: Up to 8 seconds per clip when using reference images (longer narratives via video extension)
Frame Rate: 24 fps for smooth motion
Audio: Natively generated and perfectly synchronized with video content
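The specifications above translate into concrete frame sizes and frame counts. A small sketch, assuming the standard 16:9 pixel dimensions for 720p (1280×720) and 1080p (1920×1080); the helper `output_geometry` is illustrative only, and it checks just the 8-second upper bound stated above, since which shorter lengths the API accepts is not stated here.

```python
from typing import NamedTuple

# Standard 16:9 pixel dimensions for the two supported output resolutions.
RESOLUTIONS = {"720p": (1280, 720), "1080p": (1920, 1080)}
FPS = 24
MAX_SECONDS = 8  # single-clip limit when using reference images


class Geometry(NamedTuple):
    width: int
    height: int
    frames: int


def output_geometry(resolution: str, seconds: int = MAX_SECONDS) -> Geometry:
    """Return pixel size and total frame count for one generated clip."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution!r}")
    if not 0 < seconds <= MAX_SECONDS:
        raise ValueError(f"clip length must be between 1 and {MAX_SECONDS} s")
    width, height = RESOLUTIONS[resolution]
    return Geometry(width, height, seconds * FPS)


print(output_geometry("1080p"))  # Geometry(width=1920, height=1080, frames=192)
```

A full 8-second clip at 24 fps therefore contains 192 frames, regardless of resolution.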

📈 Performance Benchmarks

Visually Rich Videos: Generates stunning videos with realistic lighting, intricate shadows, and fluid movements within minutes.
Cinematic & Diverse Styles: Excels at adapting and preserving cinematic and artistic styles from reference imagery while keeping the scene layout cohesive.
Stable & Evolving: Offers stable model availability, with continuous enhancements and advanced features currently in preview.

Key Features of Veo 3.1

🖼️ Reference-to-Video Control: Leverage up to three reference images to precisely dictate the aesthetic style and intricate scene layout.
🎵 Native Audio Generation: Automatically produces high-quality, synchronized music or compelling sound effects that perfectly complement your video.
💻 High-Definition Resolution: Delivers professional-grade 720p and 1080p output, ideal for a wide range of applications.
⏱️ Short Video Duration: Optimized for creating impactful clips up to 8 seconds, perfect for dynamic, concise content.
⭐ Frame-Specific Generation: Define the first and last frames to control exactly how a generated sequence begins and ends.
📏 Video Extension: Extend previously generated videos to build longer stories and narratives.
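The request body in the code samples carries the reference images in `image_urls`. The following sketch builds that body and enforces the three-image limit described above before any request is sent. The function name `build_r2v_payload`, the requirement of at least one image, and the http(s)-URL check are assumptions of this example, not documented API behavior.

```python
import json
from typing import Dict, List, Sequence

MODEL_ID = "google/veo-3.1-reference-to-video"
MAX_REFERENCE_IMAGES = 3


def build_r2v_payload(prompt: str, image_urls: Sequence[str]) -> Dict[str, object]:
    """Build a reference-to-video request body, validating inputs client-side."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    urls: List[str] = list(image_urls)
    if not 1 <= len(urls) <= MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"expected 1 to {MAX_REFERENCE_IMAGES} reference images, got {len(urls)}"
        )
    # Assumption: images are passed by URL, as in the samples above.
    if not all(u.startswith(("http://", "https://")) for u in urls):
        raise ValueError("reference images must be given as http(s) URLs")
    return {"model": MODEL_ID, "prompt": prompt, "image_urls": urls}


payload = build_r2v_payload(
    "A ballerina twirling in a meadow", ["https://example.com/dancer.png"]
)
print(json.dumps(payload))
```

Validating client-side catches an oversized image list before it costs a round trip; the server remains the authority on what it accepts.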

API Pricing

💰 $0.21 / sec (audio off)
💰 $0.42 / sec (audio on)

A cost-effective solution for high-quality video generation tailored to your needs.
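Because billing is per second of output, the cost of a clip is its length multiplied by the applicable rate: a full 8-second clip costs 8 × $0.21 = $1.68 without audio and 8 × $0.42 = $3.36 with audio. A minimal estimator using the rates listed above; it uses `Decimal` to avoid floating-point rounding, and assumes billing is a plain per-second multiplication (actual invoices may round differently).

```python
from decimal import Decimal

# Per-second rates from the pricing list above, in USD.
RATE_AUDIO_OFF = Decimal("0.21")
RATE_AUDIO_ON = Decimal("0.42")


def estimate_cost(seconds: int, audio: bool = True, clips: int = 1) -> Decimal:
    """Estimated USD cost for `clips` generations of `seconds` each."""
    if seconds <= 0 or clips <= 0:
        raise ValueError("seconds and clips must be positive")
    rate = RATE_AUDIO_ON if audio else RATE_AUDIO_OFF
    return rate * seconds * clips


print(estimate_cost(8))               # 3.36
print(estimate_cost(8, audio=False))  # 1.68
```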

Versatile Use Cases

🎦 Film & Storyboarding: Expedite the creation of cinematic short clips from text prompts and reference imagery, ideal for pre-visualization.
📂 Advertising & Marketing: Produce engaging product promos and dynamic social media videos efficiently and cost-effectively.
📱 Social Media Content: Generate captivating Shorts, TikToks, and Reels with unique stylized audio-visuals for maximum impact.
🎓 Educational Videos: Develop animated teaching aids and instructional content enriched with synchronized, AI-generated sound.

Important Considerations

💭 Optimal Reference Imagery: Reference images yield the best results when they clearly depict the desired subject matter and artistic style.
💭 Leveraging Multiple References: Using multiple reference images enhances the model's ability to understand and integrate diverse scene elements and complex compositions.
💭 Short-Form Content Optimization: Veo 3.1 is specifically optimized for generating short, high-quality video clips, making it ideal for concise, impactful content rather than lengthy productions.

Code Sample & API Details

For comprehensive API integration guides, code examples, and detailed documentation on utilizing Veo 3.1, please refer to the official AI/ML API documentation:

Access Veo 3.1 API Documentation


Veo 3.1 Compared to Other Leading Models

📈 Veo 3.1 vs. Sora 2

Veo 3.1 distinguishes itself from Sora 2 through stronger visual realism, scene coherence, and audio-visual synchronization, which makes it particularly well suited to cinematic storytelling and commercial video production. Sora 2 is recognized for rapid generation, while Veo 3.1 supports longer narratives through video extension and delivers smoother multi-scene transitions at a professional level of quality.

📈 Veo 3.1 vs. Veo 3.0

Veo 3.1 represents a significant step beyond Veo 3.0. Veo 3.0 already generated clips of up to 8 seconds with native synchronized audio; Veo 3.1 builds on that with reference-image control, first- and last-frame generation, video extension for longer narratives, advanced multi-scene control, embedded cinematic camera presets, and much better continuity of characters and lighting, turning it into a director-level narrative instrument.

📈 Veo 3.1 vs. Kling 2.1

Kling 2.1 offers strong stylistic video generation but generally handles less complex scene composition. Veo 3.1's integrated audio, cinematic effects, and ability to extend 8-second clips into longer sequences give it a clear edge for projects that demand polished narrative videos with consistent audiovisual flow.

📈 Veo 3.1 vs. Wan 2.5

Wan 2.5 focuses on quick video generation with basic scene structuring. However, it lacks the advanced multi-shot scene transitions and robust audio generation capabilities found in Veo 3.1. Veo's integration of cinematic presets and detailed scene control is inherently better suited for crafting highly directed and professionally nuanced video content.

Frequently Asked Questions (FAQ)

❓ What is Veo 3.1 Reference-to-Video?

Veo 3.1 is Google DeepMind's advanced AI model for generating high-fidelity videos. It enables users to control video style and scene composition by providing reference images, ensuring artistic consistency and creative flexibility.

❓ How does the reference image control work?

Users can upload up to three reference images. The model analyzes these images to capture desired artistic styles, color palettes, lighting, and scene layouts, integrating these visual cues into the generated video based on accompanying text prompts.

❓ What are Veo 3.1's key output specifications?

It generates videos up to 8 seconds in length (with extension capabilities), supporting 720p or 1080p resolution at a 16:9 aspect ratio and 24 frames per second. A standout feature is its native generation of synchronized audio, perfectly matched to the video content.

❓ How does Veo 3.1 improve upon Veo 3.0?

Veo 3.1 keeps the up-to-8-second clips and native audio that Veo 3.0 already offered, and adds reference-image control, first- and last-frame generation, and video extension for longer narratives, along with multi-scene control and advanced cinematic camera presets. Together these make it a more comprehensive narrative tool.

❓ What are the primary applications of Veo 3.1?

Veo 3.1 is ideally suited for diverse applications such as film storyboarding, creating engaging advertising and marketing content, producing dynamic social media videos (like Shorts, TikToks, and Reels), and developing animated educational materials with AI-generated sound.

AI Playground

Test all API models in the sandbox environment before you integrate. We provide more than 300 models to integrate into your app.
