Wan 2.1 Plus

Wan 2.1 Plus combines strong multi-modal fusion with spatio-temporal coherence, enabling cinematic video synthesis for creative, marketing, and storytelling applications.

Free $1 Tokens for New Members

JavaScript

Python

// Submit a text-to-video generation request to Wan 2.1 Plus.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/video/alibaba/generation', {
    method: 'POST',
    headers: {
      // Append your API key after 'Bearer '.
      Authorization: 'Bearer ',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/wan2.1-t2v-plus',
      prompt: 'A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background',
      aspect_ratio: '16:9',
    }),
  });

  // Surface HTTP errors instead of treating an error body as a result.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  console.log('Generation:', await response.json());
};

main().catch(console.error);

import requests


def main():
    """Submit a text-to-video generation request to Wan 2.1 Plus."""
    url = "https://api.ai.cc/v2/generate/video/alibaba/generation"
    payload = {
        "model": "alibaba/wan2.1-t2v-plus",
        "prompt": "A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background",
        "aspect_ratio": "16:9",
    }
    # Append your API key after "Bearer ".
    headers = {"Authorization": "Bearer ", "Content-Type": "application/json"}

    response = requests.post(url, json=payload, headers=headers, timeout=60)
    # Raise on 4xx/5xx instead of treating an error body as a result.
    response.raise_for_status()
    print("Generation:", response.json())


if __name__ == "__main__":
    main()
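
A variant of the Python request, sketched with only the standard library and with the API key read from the environment rather than written into the source. The endpoint, model ID, and body fields are copied from the sample; the environment variable name `AICC_API_KEY` and the helper names `build_request` and `generate` are assumptions made for illustration, and the response shape is not documented here, so the result is returned as a plain dictionary.

```python
import json
import os
import urllib.request

API_URL = "https://api.ai.cc/v2/generate/video/alibaba/generation"


def build_request(api_key: str, prompt: str, aspect_ratio: str = "16:9") -> urllib.request.Request:
    """Build (but do not send) the POST request for a text-to-video job."""
    if not api_key:
        raise ValueError("API key is empty")
    body = json.dumps({
        "model": "alibaba/wan2.1-t2v-plus",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def generate(prompt: str) -> dict:
    """Send the request and return the decoded JSON response."""
    # AICC_API_KEY is a hypothetical variable name, not one defined by the API.
    request = build_request(os.environ.get("AICC_API_KEY", ""), prompt)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any paid call is made.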

300+ AI Models for OpenClaw & AI Agents

Save 20% on Costs & $1 Free Tokens

Product Detail

Alibaba's Wan2.1 Plus is a text-to-video model built to produce high-quality, cinematic video from text prompts. It uses multi-modal understanding to translate detailed prompts into visually coherent, dynamic video. It is suited to large-scale video synthesis and offers prompt-level control over motion dynamics and scene composition, which makes it a practical tool for creative and professional applications.

✨ Key Features & Technical Specifications

✔️ Video Generation Quality: Delivers high fidelity in dynamic motions, nuanced facial expressions, and intricate object interactions, ensuring professional-grade output.
🧠 Multi-step Reasoning: Shows strong contextual understanding of complex prompts, so multi-part descriptions are reflected closely in the generated video.
🎯 Instruction Following: Adheres closely to user prompts and aims to preserve physical realism in generated content.
🎬 Text-to-Video Synthesis: Effortlessly generates smooth, contextually accurate videos directly from natural language descriptions.
🖼️ Multi-modal Scene Understanding: Integrates scene layout, colors, lighting, and movement for truly cinematic and immersive visual effects.
⚙️ Fine Control: Supports detailed prompt-based tuning for aesthetic parameters, including precise adjustments to lighting, camera angles, and color tones.
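
Since aesthetic control is prompt-based, one way to keep prompts consistent is to assemble them from a subject plus optional lighting, camera, and color descriptors. The sketch below is illustrative only: the helper name `compose_prompt` and the "label: value" phrasing are assumptions, and nothing here states that the model gives those labels special treatment.

```python
def compose_prompt(subject: str, lighting: str = "", camera: str = "", color: str = "") -> str:
    """Join a subject with optional aesthetic descriptors into one prompt string."""
    parts = [subject.strip()]
    for label, value in (("lighting", lighting), ("camera", camera), ("color tones", color)):
        # Skip descriptors that were left empty.
        if value.strip():
            parts.append(f"{label}: {value.strip()}")
    return ", ".join(parts)


prompt = compose_prompt(
    "A lighthouse on a rocky coast at dusk",
    lighting="warm low sun, long shadows",
    camera="slow aerial orbit",
    color="teal and amber",
)
print(prompt)
```

The resulting string can be passed as the `prompt` field of the generation request.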

💰 API Pricing

Only $0.525 per video
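
A quick way to budget at this rate, assuming a flat $0.525 per video with no discounts or credits applied (the function names are illustrative):

```python
from decimal import Decimal

PRICE_PER_VIDEO = Decimal("0.525")  # USD per video, from the pricing line above


def batch_cost(videos: int) -> Decimal:
    """Total cost in USD for a number of generated videos."""
    if videos < 0:
        raise ValueError("videos must be non-negative")
    return PRICE_PER_VIDEO * videos


def videos_within_budget(budget_usd: str) -> int:
    """How many whole videos a budget covers at the flat rate."""
    return int(Decimal(budget_usd) // PRICE_PER_VIDEO)
```

For example, `batch_cost(100)` comes to 52.5 USD, and `videos_within_budget("10")` returns 19.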

💡 Optimal Use Cases

🎥 Creative Content Production: Ideal for filmmaking, advertising, and storyboarding workflows that demand high-definition video output generated from text.
📚 Visual Storytelling: Transforms textual narratives into dynamic, richly detailed visuals, bringing stories to life directly from a script.
🎮 Interactive Media & Entertainment: Facilitates the rapid development of visual assets from script or dialogue inputs for games and interactive experiences.
📈 Business Presentations & Marketing: Enables the generation of tailored video content, significantly enhancing communication impact in business contexts.

⚖️ Comparison with Other Models

Vs. Wan2.2-T2V: Wan2.1-T2V-Plus focuses on cost-effective 1080P video generation, whereas Wan2.2 adds larger-parameter models and a multi-expert architecture for improved aesthetics and efficiency.
Vs. Gemini 2.5 Flash: Gemini 2.5 Flash is a general-purpose multimodal model, while Wan2.1 is dedicated to text-to-video and is most valuable for 1080P generation tasks where cost-efficiency is the primary concern.
Vs. OpenAI GPT-4 Vision: Wan2.1 is dedicated to video synthesis from text, with per-video pricing that covers higher-resolution output, whereas GPT-4 Vision's strengths lie in broader multimodal conversation.

⚠️ Limitations

Minor Artifacts: Some generated videos may show minor artifacts or inconsistencies, especially with highly complex prompts. Careful prompt tuning can reduce these but may not eliminate them entirely.
Video Length: Currently optimized primarily for 5-second video clips. Generating longer videos may require additional processing steps or resources.
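
For planning longer pieces, a rough sketch of how a target duration breaks down into 5-second clips. Generating separate clips and joining them afterwards is an assumption about what "additional processing" could look like, not a documented workflow, and the function names are illustrative.

```python
import math

CLIP_SECONDS = 5  # clip length the model is described as optimized for


def clips_needed(target_seconds: float) -> int:
    """Number of 5-second clips required to cover a target duration."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(target_seconds / CLIP_SECONDS)


def plan_segments(target_seconds: float) -> list[tuple[float, float]]:
    """Split a target duration into (start, end) windows of at most 5 seconds."""
    total = float(target_seconds)
    segments = []
    start = 0.0
    while start < total:
        end = min(start + CLIP_SECONDS, total)
        segments.append((start, end))
        start = end
    return segments
```

A 12-second target needs three clips, with the last window covering only seconds 10 to 12.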

❓ Frequently Asked Questions (FAQ)

Q: What is Alibaba Wan2.1 Plus primarily designed for?

A: Alibaba Wan2.1 Plus is an advanced AI model specifically designed for high-quality, cinematic text-to-video generation, excelling in translating textual prompts into visually coherent video outputs.

Q: What kind of control does Wan2.1 Plus offer over video generation?

A: It provides fine control over aesthetic parameters, allowing detailed prompt-based tuning for lighting, camera angles, and color tones to achieve desired cinematic effects.

Q: How does its pricing compare to other models?

A: Wan2.1 Plus offers competitive pricing at $0.525 per video, making it particularly valuable for cost-sensitive 1080P video generation tasks compared to some broader multimodal AI models.

Q: What are the main limitations of Wan2.1 Plus?

A: Primary limitations include potential minor artifacts with complex prompts and current optimization mainly for 5-second video clips, requiring additional processing for longer durations.

Q: In which industries is Wan2.1 Plus most useful?

A: It is best suited to creative content production (filmmaking, advertising), visual storytelling, interactive media and entertainment, and business presentations and marketing.

AI Playground

Test all API models in the sandbox environment before you integrate. We provide more than 300 models to integrate into your app.
