Wan 2.2 Plus Text to Video
Wan2.2 T2V excels in tasks like visual question answering, cross-modal retrieval, and complex data analysis involving images and language. Optimized for scalable API use, it supports streaming and function calling to automate multi-modal workflows.
// JavaScript: submit a text-to-video generation request.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/video/alibaba/generation', {
    method: 'POST',
    headers: {
      // Replace <YOUR_API_KEY> with your own key.
      Authorization: 'Bearer <YOUR_API_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/wan2.2-t2v-plus',
      prompt: 'A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background',
      aspect_ratio: '16:9',
    }),
  }).then((res) => res.json());

  console.log('Generation:', response);
};

main();

# Python: submit the same text-to-video generation request.
import requests


def main():
    url = "https://api.ai.cc/v2/generate/video/alibaba/generation"
    payload = {
        "model": "alibaba/wan2.2-t2v-plus",
        "prompt": "A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background",
        "aspect_ratio": "16:9",
    }
    # Replace <YOUR_API_KEY> with your own key.
    headers = {
        "Authorization": "Bearer <YOUR_API_KEY>",
        "Content-Type": "application/json",
    }

    response = requests.post(url, json=payload, headers=headers, timeout=60)
    response.raise_for_status()  # surface HTTP errors instead of printing an error body
    print("Generation:", response.json())


if __name__ == "__main__":
    main()

Product Detail

Alibaba's Wan2.2 is an AI model built for multi-modal understanding. It accepts both text and vision inputs, handles large contexts, and targets high precision in text-to-vision tasks and multi-step reasoning.

✨ Technical Specifications

Performance Benchmarks

  • VQA-bench: 78.3%
  • Multi-modal Reasoning: 52.7%
  • Cross-modal Retrieval: 81.9%

Performance Metrics (Wan2.1)

Wan2.1, the previous release, reports an overall VBench score of 86.22%, with strong results in dynamic motion, spatial relationships, color accuracy, and multi-object interaction. Training foundational video models demands significant compute and large, high-quality datasets. Open access to such models lowers that barrier, letting more businesses produce tailored, high-quality visual content at lower cost.

Key Capabilities

  • 💡 Vision-Language Fusion: Excels at interpreting and generating precise responses by seamlessly combining image and text data.
  • 💡 Advanced Reasoning: Demonstrates strong multi-step reasoning abilities across various modalities for in-depth analytics and complex understanding.

💲 API Pricing

  • 🎥 480P: $0.105/video
  • 🎥 1080P: $0.525/video
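
Because pricing is per video, a batch budget is a simple sum. The sketch below is illustrative only: the two prices come from the list above, while the function name `estimate_cost` and the batch sizes are made up for the example. It uses `Decimal` so that fractional-cent prices add up exactly.

```python
from decimal import Decimal

# Per-video prices in USD, copied from the pricing list above.
PRICE_PER_VIDEO = {
    "480P": Decimal("0.105"),
    "1080P": Decimal("0.525"),
}


def estimate_cost(video_counts: dict[str, int]) -> Decimal:
    """Return the total USD cost for a mapping of resolution -> number of videos."""
    total = Decimal("0")
    for resolution, count in video_counts.items():
        if resolution not in PRICE_PER_VIDEO:
            raise ValueError(f"No listed price for resolution {resolution!r}")
        if count < 0:
            raise ValueError("Video counts must be non-negative")
        total += PRICE_PER_VIDEO[resolution] * count
    return total


if __name__ == "__main__":
    # Hypothetical batch: 100 drafts at 480P plus 20 final renders at 1080P.
    print("Estimated cost (USD):", estimate_cost({"480P": 100, "1080P": 20}))
```

At the listed prices a 1080P video costs five times as much as a 480P one, so iterating on prompts at 480P and rendering only the final versions at 1080P keeps spending down.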

🚀 Optimal Use Cases

  • Multi-modal Analysis: Enhancing comprehension through the expert combination of image and text data.
  • Visual Question Answering (VQA): Providing accurate and context-aware answers based on integrated image-text inputs.
  • Cross-modal Retrieval: Enabling efficient matching and retrieval of information across both vision and language domains.
  • Business Intelligence: Facilitating complex data interpretation by integrating visual content with textual analytics for deeper insights.

💻 Code Sample

The JavaScript and Python examples at the top of this page show a minimal request to the generation endpoint.
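
For production use, it may help to separate building the request from sending it, so the payload can be inspected or tested without a network call. The following is a minimal sketch using only the Python standard library. The endpoint URL, model ID, and the `prompt` and `aspect_ratio` fields are taken from this page's own examples; the function names and the `AICC_API_KEY` environment variable are hypothetical, and the shape of the JSON response is not documented here.

```python
import json
import os
import urllib.request

# Endpoint and model ID as used in this page's own request examples.
API_URL = "https://api.ai.cc/v2/generate/video/alibaba/generation"
MODEL_ID = "alibaba/wan2.2-t2v-plus"


def build_request(prompt: str, api_key: str, aspect_ratio: str = "16:9") -> urllib.request.Request:
    """Build (but do not send) the POST request for a text-to-video generation."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if not api_key:
        raise ValueError("api_key must not be empty")
    payload = {"model": MODEL_ID, "prompt": prompt, "aspect_ratio": aspect_ratio}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate(prompt: str) -> dict:
    """Send the request and return the decoded JSON body.

    AICC_API_KEY is a hypothetical environment variable name.
    urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = build_request(prompt, os.environ["AICC_API_KEY"])
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `generate("A lighthouse at dawn, slow aerial shot")` with the environment variable set would submit one generation request. Reading the key from the environment keeps it out of source control.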

📊 Comparison with Other Leading Models

  • Vs. Gemini 2.5 Flash: Alibaba Wan2.2 offers higher multi-modal accuracy (78.3% vs. 70.8% VQA-bench), making it a superior choice for integrated vision-language tasks.
  • Vs. OpenAI GPT-4 Vision: Wan2.2 provides a significantly larger context window (65K vs. 32K text tokens), enabling more extensive and coherent conversations with embedded images.
  • Vs. Qwen3-235B-A22B: Alibaba Wan2.2 demonstrates superior cross-modal retrieval precision (81.9% vs. ~78% estimated), optimizing it for demanding large-scale vision-language workflows.

⚠️ Limitations

Occasionally, generated videos may contain unwanted elements such as text artifacts or watermarks. While employing negative prompts can help mitigate these occurrences, it does not fully eliminate them.
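
A negative prompt would typically be sent alongside the main prompt in the request body. The sketch below assumes the endpoint accepts a `negative_prompt` field; that field name is a guess and is not confirmed anywhere on this page, so check the API documentation before relying on it. The helper `build_payload` is also hypothetical.

```python
from typing import Optional


def build_payload(prompt: str, negative_prompt: Optional[str] = None) -> dict:
    """Return a request body, optionally listing elements to steer away from."""
    payload = {
        "model": "alibaba/wan2.2-t2v-plus",  # model ID from this page's examples
        "prompt": prompt,
        "aspect_ratio": "16:9",
    }
    if negative_prompt:
        # ASSUMPTION: "negative_prompt" is a hypothetical field name,
        # not confirmed by this page.
        payload["negative_prompt"] = negative_prompt
    return payload


if __name__ == "__main__":
    print(build_payload("A quiet harbor at sunrise", "text, watermark, logo"))
```

Even with such a field set, the limitation above still applies: artifacts become less likely, not impossible, so generated videos are worth reviewing before publication.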

🔗 API Integration

Alibaba Wan2.2 is readily accessible via the AI/ML API. Comprehensive documentation is available to facilitate a smooth and efficient integration process.

❓ Frequently Asked Questions (FAQ)

Q: What is Alibaba Wan2.2 primarily designed for?
A: Alibaba Wan2.2 is an advanced AI model engineered for multi-modal understanding, specifically integrating text and vision inputs for complex reasoning and high-precision text-to-vision tasks.
Q: How does Wan2.2 perform in comparison to other models like Gemini 2.5 Flash?
A: Wan2.2 demonstrates higher multi-modal accuracy (78.3% VQA-bench) compared to Gemini 2.5 Flash (70.8%), making it particularly effective for integrated vision-language tasks.
Q: What are the key capabilities of Alibaba Wan2.2?
A: Its primary capabilities include robust vision-language fusion for interpreting and generating content from combined image and text data, and advanced multi-step reasoning across modalities.
Q: Are there any known limitations when using Wan2.2?
A: Occasionally, generated videos might contain unwanted elements such as text artifacts or watermarks. While negative prompts can mitigate these, they don't fully eliminate them.
Q: How can businesses integrate Alibaba Wan2.2 into their systems?
A: Alibaba Wan2.2 is easily accessible through the AI/ML API, with comprehensive documentation provided to guide the integration process.
