



// JavaScript (uses the built-in fetch of Node.js 18+ or a browser)
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/video/generations', {
    method: 'POST',
    headers: {
      // Append your API key after "Bearer ".
      Authorization: 'Bearer ',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'alibaba/wan2.2-vace-fun-a14b-depth',
      prompt: 'Mona Lisa puts on glasses with her hands.',
      video_url: 'https://storage.googleapis.com/falserverless/example_inputs/wan_animate_input_video.mp4',
      image_url: 'https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg',
      resolution: '720p',
    }),
  }).then((res) => res.json());

  console.log('Generation:', response);
};

main();

# Python (requires the requests package)
import requests


def main():
    url = "https://api.ai.cc/v2/video/generations"
    payload = {
        "model": "alibaba/wan2.2-vace-fun-a14b-depth",
        "prompt": "Mona Lisa puts on glasses with her hands.",
        "video_url": "https://storage.googleapis.com/falserverless/example_inputs/wan_animate_input_video.mp4",
        "image_url": "https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg",
        "resolution": "720p",
    }
    # Append your API key after "Bearer ".
    headers = {"Authorization": "Bearer ", "Content-Type": "application/json"}
    response = requests.post(url, json=payload, headers=headers)
    print("Generation:", response.json())


if __name__ == "__main__":
    main()

Product Detail
Wan 2.2 VACE Depth is a video-to-video generation model optimized for depth map control. Part of the Wan 2.2 VACE Fun A14B family, it uses multimodal video synthesis to produce high-quality, depth-aware video output. Conditioning on depth gives fine-grained control over spatial layout, which improves realism and enables dynamic visual effects.
● Key Capabilities & Features
- ✓ Depth-Controlled Generation: Focuses on depth maps to guide video creation with precise spatial awareness.
- ✓ Multi-Resolution Support: Generates videos at 512, 768, and 1024 pixels, accommodating diverse production needs.
- ✓ Smooth Motion & Fluidity: Trained on 81-frame clips at 16 frames per second (FPS), roughly 5 seconds per clip, for smooth, fluid motion.
- ✓ Global Accessibility: Features multi-language support for broad global usability.
- ✓ Subject-Specific Depth: Lets you specify the subject of the video while keeping depth-based scene understanding consistent.
- ✓ Wide Input Compatibility: Compatible with various video input types, including MP4, MOV, WebM, M4V, and GIF.
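The input formats listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper name `has_supported_extension` is hypothetical, and it only inspects the file extension in the URL path, which is an assumption about how inputs are identified rather than documented behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the input formats listed on this page.
SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".m4v", ".gif"}


def has_supported_extension(video_url: str) -> bool:
    """Return True if the URL path ends in one of the listed video extensions.

    Hypothetical helper: it looks only at the extension, not at the file contents.
    """
    path = urlparse(video_url).path  # drops the query string and fragment
    return PurePosixPath(path).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS
```

A check like this catches obvious mistakes early, but the service itself remains the authority on what it accepts.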
● Technical Specifications
- 💻 Model Size: Approximately 64 GB
- 🔧 Architecture: Built on Wan 2.2-T2V-A14B base model with VACE scheme integration
- ⏰ Frame Rate: Outputs videos at 16 FPS
- 📈 Video Length: Up to 81 frames per inference
- 🔗 Input Types: Accepts raw video or depth map inputs for precise control
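The frame rate and frame limit above together determine the longest clip a single inference can produce. As a small worked example (the function name `clip_duration_seconds` is illustrative only):

```python
FRAMES_PER_INFERENCE = 81  # maximum frames per inference, from the specifications above
FRAME_RATE_FPS = 16        # output frame rate, from the specifications above


def clip_duration_seconds(frames: int = FRAMES_PER_INFERENCE,
                          fps: int = FRAME_RATE_FPS) -> float:
    """Return the playback duration in seconds for a clip of `frames` frames at `fps`."""
    if frames <= 0 or fps <= 0:
        raise ValueError("frames and fps must be positive")
    return frames / fps


# 81 / 16 = 5.0625, so one inference yields about 5 seconds of video.
print(clip_duration_seconds())
```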
● Performance Benchmarks
- ✓ Demonstrates high fidelity video prediction with stable depth consistency.
- ✓ Minimizes common video generation artifacts like jitter and scene inconsistency.
- ✓ Produces cinematic quality motion with enhanced spatial depth cues.
- ✓ Optimized for fluid video generation across multiple resolutions and formats.
● Use Cases
- 🎬 Cinematic video production with precise depth rendering.
- 🔍 Pre-visualization and concept video generation in filmmaking.
- 🎨 Digital art animation requiring spatial depth and scene stability.
- 💰 Commercial video effects where depth cues enhance realism.
- 📜 Research and development in multimodal video synthesis.
● API Integration & Pricing
Access Wan 2.2 VACE Depth capabilities seamlessly via the AI/ML API. Pricing is structured by output resolution:
- 💵 360p: $0.0525
- 💵 540p: $0.07875
- 💵 720p: $0.105
Comprehensive documentation for API integration is available in the API reference.
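Because pricing depends on the output resolution, it can help to validate the `resolution` field before sending a request. The sketch below is an illustration only: `PRICE_USD_BY_RESOLUTION` and `build_payload` are hypothetical names, and it assumes the API accepts "360p" and "540p" in the same form as the "720p" value used in the sample request. The billing unit for the listed prices is not stated on this page, so the code stores the figures without interpreting them.

```python
# Prices in USD copied from the list above; the billing unit is not stated on this page.
PRICE_USD_BY_RESOLUTION = {"360p": 0.0525, "540p": 0.07875, "720p": 0.105}

MODEL_ID = "alibaba/wan2.2-vace-fun-a14b-depth"


def build_payload(prompt: str, video_url: str, image_url: str,
                  resolution: str = "720p") -> dict:
    """Build a request body, rejecting resolutions that have no listed price.

    Hypothetical helper; the field names mirror the sample request on this page.
    """
    if resolution not in PRICE_USD_BY_RESOLUTION:
        allowed = ", ".join(sorted(PRICE_USD_BY_RESOLUTION))
        raise ValueError(f"unsupported resolution {resolution!r}; expected one of: {allowed}")
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "video_url": video_url,
        "image_url": image_url,
        "resolution": resolution,
    }
```

The returned dictionary can be passed as the `json=` argument of `requests.post`, as in the Python sample on this page.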
● Code Sample
<snippet data-name="alibaba.create-video-to-video-generation" data-model="alibaba/wan2.2-vace-fun-a14b-depth"></snippet>
● Comparison with Other Leading Models
Wan 2.2 Depth vs. KLING 2.0
Wan 2.2 Depth leverages a Mixture-of-Experts architecture with a strong emphasis on precise depth map control for spatially coherent video generation. In contrast, KLING 2.0 offers broader video synthesis capabilities but with less explicit depth-driven motion control. Wan 2.2 provides superior temporal stability and scene consistency at resolutions up to 1080p.
Wan 2.2 Depth vs. Veo 3
Veo 3 is optimized for fast, real-time video synthesis, typically focusing on lower resolutions (e.g., 720p) for speed. Wan 2.2 Depth, however, prioritizes cinematic quality with detailed depth conditioning and robust frame coherence, delivering higher-quality outputs at the cost of increased computational resources.
Wan 2.2 Depth vs. Wan 2.1 VACE
Wan 2.2 Depth represents a significant advancement, substantially improving video smoothness, motion realism, and depth accuracy through an upgraded architecture. Wan 2.1 VACE is less specialized in depth and often yields less stable outputs, particularly in complex scene generation scenarios.
● Frequently Asked Questions (FAQ)
Q: What is Wan 2.2 VACE Depth's primary advantage?
A: Its primary advantage is fine-grained control over video generation via depth maps, which gives precise spatial awareness and more realistic output.
Q: Can Wan 2.2 VACE Depth generate videos at high resolutions?
A: Yes, it supports multi-resolution video prediction, including 512, 768, and 1024 pixels, catering to various quality requirements.
Q: How does it ensure smooth video motion?
A: The model is trained on 81-frame clips at 16 FPS, which helps it produce smooth, fluid, cinematic-quality motion.
Q: What types of input videos does it accept?
A: It accepts MP4, MOV, WebM, M4V, and GIF video inputs.
Q: Is Wan 2.2 VACE Depth suitable for professional filmmaking?
A: Absolutely. Its precise depth rendering, cinematic quality motion, and ability to minimize generation artifacts make it ideal for cinematic video production and pre-visualization.