



```javascript
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/video/kling/generation', {
    method: 'POST',
    headers: {
      // Replace the placeholder with your API key
      Authorization: 'Bearer <YOUR_API_KEY>',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'kling-video/v1.5/pro/text-to-video',
      prompt: 'A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background',
      aspect_ratio: '16:9',
      duration: '5',
    }),
  }).then((res) => res.json());

  console.log('Generation:', response);
};

main();
```
```python
import requests


def main():
    url = "https://api.ai.cc/v2/generate/video/kling/generation"
    payload = {
        "model": "kling-video/v1.5/pro/text-to-video",
        "prompt": "A DJ on the stand is playing, around a World War II battlefield, lots of explosions, thousands of dancing soldiers, between tanks shooting, barbed wire fences, lots of smoke and fire, black and white old video: hyper realistic, photorealistic, photography, super detailed, very sharp, on a very white background",
        "aspect_ratio": "16:9",
        "duration": "5",
    }
    # Replace the placeholder with your API key
    headers = {"Authorization": "Bearer <YOUR_API_KEY>", "Content-Type": "application/json"}
    response = requests.post(url, json=payload, headers=headers)
    print("Generation:", response.json())


if __name__ == "__main__":
    main()
```
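Video generation endpoints like the one above usually respond asynchronously: the POST returns a generation record, not the finished video. The exact response shape and status endpoint are not documented on this page, so the sketch below is hypothetical. It assumes only a zero-argument callable that fetches the current status (for example, a GET against a generation-ID URL) and polls until the job reports `completed` or `failed`, or a timeout expires.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a status callable until the generation finishes.

    fetch_status: zero-argument callable returning a dict with a "status" key
    (hypothetical shape -- check the actual API response for the real field names).
    Returns the final status dict, or raises TimeoutError.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("generation did not finish within the timeout")
        sleep(interval)
```

With `requests`, `fetch_status` might wrap a GET against the generation's status URL; the loop itself is API-agnostic, which also makes it easy to unit-test with a fake status function.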
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
Kling V1.6: Advanced Multi-Image to Video Generation
Kling V1.6 Multi-Image to Video represents the latest advancement in the Kling series, meticulously engineered to transform multiple input images into seamlessly integrated, high-quality video sequences. Building upon the robust foundation of the Kling V1.5 generation suite, this version excels in coherently synthesizing temporal progression from static visual inputs. It offers enhanced creative control over scene transitions, object motion continuity, and stylistic consistency throughout generated videos. Tailored specifically for creators, agencies, and enterprises requiring precise video generation from curated imagery, Kling V1.6 M2V leverages cutting-edge spatiotemporal modeling to deliver industry-leading fidelity, expanded resolution support, and sophisticated multi-image contextual understanding.
Technical Specifications
- 🎥 Video Generation Quality: Utilizes an innovative approach combining advanced frame interpolation with context-aware temporal synthesis, minimizing temporal jitter and preserving image details while ensuring smooth and realistic animation over extended sequences.
- 💻 Resolution and Frame Rate: Supports up to 4K Ultra HD at a stable 30 frames per second, enabling production-ready video content with balanced computational efficiency.
- 🔍 Multi-Image Contextual Parsing: Features an enhanced multi-modal fusion engine capable of interpreting complex visual narratives across input images, maintaining spatial and semantic coherence to create fluid storyboards that precisely reflect user intent and image semantics.
- 🎦 Camera and Motion Dynamics: Implements superior simulation of camera movements, including parallax effects, dynamic zooms, stabilized pans, and auto focus adjustments, producing immersive cinematographic experiences directly from static image inputs.
Technical Details
Model Architecture
Kling V1.6 employs a hybrid transformer-GAN architecture with hierarchical spatiotemporal attention layers meticulously optimized for integrating diverse image inputs over time. This structure enables the model to maintain consistent object identities and scene context, with temporal GAN modules refining motion realism and suppressing visual artifacts across frames. Advanced cross-modal attention pathways fuse image feature embeddings with style and motion vectors for highly coherent video generation.
Performance Metrics
Kling V1.6 balances visual output quality with inference speeds robust enough for scalable deployment. It supports batch processing with fine-grained style, motion, and duration control, enabling users to customize output videos to exact project requirements while maintaining enterprise-grade uptime and reliability.
API Pricing
💸 Cost per second: $0.0588
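At the listed per-second rate, cost scales linearly with clip length. A quick sanity-check calculation (the rate is taken from the figure above; this assumes billing is purely per second of generated video, with no additional fees):

```python
RATE_PER_SECOND = 0.0588  # USD, from the pricing figure above


def video_cost(seconds: float) -> float:
    """Estimated cost in USD for a clip of the given duration."""
    return round(seconds * RATE_PER_SECOND, 4)


print(video_cost(5))   # 5-second clip, as in the code samples -> 0.294
print(video_cost(30))  # maximum per-generation length per Key Features -> 1.764
```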
Key Features
- ⏱ Extended Temporal Synthesis: Supports longer video generation with improved temporal coherence, capable of maintaining smooth transitions and narrative flow across up to 30 seconds per generation.
- 🎦 Advanced Camera Simulation: Includes a diverse range of camera effects adapted from still image inputs, delivering professional tracking shots, zoom effects, parallax shifts, and focus transitions that enhance the cinematic quality of generated videos.
- 🎭 Style and Visual Continuity: Trained extensively on multi-image datasets that enable replication of a broad spectrum of visual styles and aesthetics, ensuring generated sequences faithfully respect input imagery’s stylistic and thematic attributes.
- 🔀 Cross-Modal Context Integration: Effectively integrates visual semantics from multiple images to produce coherent narrative and scene progression, supporting complex storytelling scenarios such as character movement and environmental changes across frames.
- 🌐 Multilingual and Cross-Cultural Versatility: While primarily image-driven, the model’s training incorporates multilingual metadata to support additional text or cue integration from diverse languages for localizable visual content production.
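The request parameters shown in the code samples (`model`, `prompt`, `aspect_ratio`, `duration`) can be validated client-side before sending. The allowed values below are assumptions drawn from this page — `16:9` appears in the samples and the Extended Temporal Synthesis feature caps a generation at 30 seconds — not an exhaustive list from the API reference:

```python
# Assumption: common ratios only; confirm the full list with the API docs
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}
MAX_DURATION_SECONDS = 30  # from the Extended Temporal Synthesis feature above


def build_payload(prompt: str, aspect_ratio: str = "16:9", duration: int = 5) -> dict:
    """Build and validate a generation request body."""
    if not prompt.strip():
        raise ValueError("prompt must be non-empty")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect_ratio: {aspect_ratio!r}")
    if not 1 <= duration <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be 1-{MAX_DURATION_SECONDS} seconds")
    return {
        "model": "kling-video/v1.5/pro/text-to-video",
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "duration": str(duration),  # the samples above send duration as a string
    }
```

Failing fast on a bad parameter avoids paying for a request the API would reject anyway.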
Use Cases
- 🎨 Creative Production: Converting photo sets or concept art into animated video content.
- 📣 Advertising & Marketing: Generating dynamic video from static product shots.
- 📚 Visual Storytelling: Concept visualization using multiple scene captures.
- 📱 Social Media & Digital Content: Leveraging quick image-to-video transformations.
- 🎧 Animation Studios: Synthesizing motion from static layouts or multi-panel artwork.
- 🌍 Enterprise Multimedia: Integrating multi-angle visual assets for large-scale projects.
- 🔧 Rapid Prototyping: Quickly creating video narratives based on curated image collections.
Code Sample
<snippet data-name="kling.create-text-to-video-generation" data-model="kling-video/v1.5/pro/text-to-video"></snippet>
Please note: This code snippet represents a placeholder for API integration. For detailed implementation, refer to the official API documentation.
❓ Frequently Asked Questions (FAQ)
Q1: What is Kling V1.6 Multi-Image to Video?
A: Kling V1.6 is an advanced AI model designed to transform multiple static images into dynamic, high-quality video sequences, offering enhanced control over transitions, motion, and stylistic consistency.
Q2: What video resolutions does Kling V1.6 support?
A: It supports up to 4K Ultra HD resolution at a stable 30 frames per second, suitable for professional production-ready content.
Q3: How does Kling V1.6 ensure smooth transitions and continuity?
A: The model utilizes advanced frame interpolation, context-aware temporal synthesis, and a hybrid transformer-GAN architecture to maintain object identities, scene context, and smooth motion realism across frames.
Q4: Can I control camera movements with Kling V1.6?
A: Yes, it implements superior simulation of camera movements, including parallax effects, dynamic zooms, stabilized pans, and auto-focus adjustments, allowing for immersive cinematographic experiences.
Q5: What are the primary use cases for Kling V1.6?
A: It's ideal for creative production, advertising, visual storytelling, social media content, animation studios, enterprise multimedia generation, and rapid prototyping of video narratives from image collections.
Learn how you can transform your company with AICC APIs