



// Submit an image-to-video generation request to the Kling V2.1 Pro endpoint.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/video/kling/generation', {
    method: 'POST',
    headers: {
      // Append your API key after "Bearer "
      Authorization: 'Bearer ',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'kling-video/v2.1/pro/image-to-video',
      prompt: 'Mona Lisa puts on glasses with her hands.',
      image_url: 'https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg',
      duration: '5',
    }),
  }).then((res) => res.json());

  console.log('Generation:', response);
};

main();
import requests


def main():
    url = "https://api.ai.cc/v2/generate/video/kling/generation"
    payload = {
        "model": "kling-video/v2.1/pro/image-to-video",
        "prompt": "Mona Lisa puts on glasses with her hands.",
        "image_url": "https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg",
        "duration": "5",
    }
    # Append your API key after "Bearer "
    headers = {"Authorization": "Bearer ", "Content-Type": "application/json"}

    response = requests.post(url, json=payload, headers=headers)
    print("Generation:", response.json())


if __name__ == "__main__":
    main()
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
Kling V2.1 Pro represents the latest advancement in the Kling series’ image-to-video generation technology. It delivers unparalleled video synthesis quality, enhanced semantic relevance, and expanded creative control. Building on the robust foundation of Kling V2.0 Standard, this professional iteration caters to the most demanding multimedia production workflows by integrating advanced image understanding, long-duration video generation, and adaptive stylistic rendering. Designed for visual artists, production studios, and enterprises requiring scalable, high-fidelity video generation from static imagery, Kling V2.1 Pro Image-to-Video introduces enhanced contextual embedding and sophisticated temporal dynamics to support complex visual storytelling and innovation-driven pipelines.
⚙️Technical Specifications
- Video Generation Quality: Utilizes next-generation spatiotemporal synthesis and frame interpolation algorithms that ensure ultra-smooth motion continuity and striking photorealism, significantly minimizing visual artifacts and temporal noise across generated sequences.
- Resolution and Frame Rate: Supports seamless generation of videos up to 4K Ultra HD resolution at a stable 30 frames per second, achieved through optimized rendering engines that prioritize both visual fidelity and computational efficiency.
- Input Image Processing: Employs a refined image-encoding pipeline capable of extracting deep semantic and compositional features from images of varied formats and resolutions, enabling precise narrative extrapolation and visual expansion from a single image or a batch of images (a client-side pre-flight check is sketched after this list).
- Camera & Cinematic Effects: Integrates advanced virtual cinematography, including dynamic tracking, crane shots, zooms, parallax shifts, and programmable depth-of-field effects, facilitating immersive and professional video compositions while maintaining real-time synthesis speeds.
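Before submitting a request, it can help to confirm that the source image is in a format and resolution the encoder can work with. The sketch below is a minimal, hypothetical pre-flight check using requests and Pillow; the accepted formats and the minimum-size threshold are illustrative assumptions, not documented limits.

# Hypothetical pre-flight check for an input image before calling the API.
# The format and resolution limits below are illustrative assumptions.
import io

import requests
from PIL import Image

IMAGE_URL = "https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg"

raw = requests.get(IMAGE_URL, timeout=30).content
image = Image.open(io.BytesIO(raw))

assert image.format in {"JPEG", "PNG", "WEBP"}, f"Unexpected format: {image.format}"
assert min(image.size) >= 300, f"Image looks too small for clean synthesis: {image.size}"
print("Input image OK:", image.format, image.size)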
🔬Technical Details
Model Architecture
Features an enhanced hybrid transformer-GAN design with multi-scale hierarchical attention and temporal coherence modules built for long-range spatiotemporal modeling and frame-level consistency. The architecture incorporates novel image-encoder fusion blocks that combine static visual cues with dynamic video synthesis pathways, enabling sophisticated scene progression and context-aware animation.
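To make the idea of a temporal coherence module concrete, here is a minimal, generic sketch of self-attention applied across the time axis of per-frame features. It is not Kling’s proprietary architecture; the layer sizes and structure are illustrative assumptions only.

# Generic temporal self-attention sketch -- NOT Kling's actual architecture.
# Dimensions and structure are illustrative assumptions.
import torch
import torch.nn as nn


class TemporalCoherenceBlock(nn.Module):
    """Applies self-attention across the time axis of per-frame features."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, dim) -- one feature vector per generated frame.
        attended, _ = self.attn(frames, frames, frames)
        return self.norm(frames + attended)  # residual connection preserves frame identity


# Example: smooth 30 frames of 512-dim features.
features = torch.randn(1, 30, 512)
print(TemporalCoherenceBlock()(features).shape)  # torch.Size([1, 30, 512])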
Training Data
Trained on a proprietary, large-scale dataset combining diverse high-resolution images paired with synchronized video sequences spanning multiple genres, including narrative cinematics, advertising content, documentaries, and highly stylized animations. The dataset emphasizes multilingual annotations and rich metadata to bolster cross-domain adaptability and fine-grained style control.
Performance Metrics
Achieves industry-leading trade-offs between ultra-high visual fidelity, latency, and computational resource usage, offering robust batch processing capabilities and fine control over temporal length, scene complexity, and stylistic parameters to align with varied production needs.
💰API Pricing
Only $0.1029 per video second
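At this rate, total cost scales linearly with clip length. A quick estimate:

# Estimate generation cost at the listed rate of $0.1029 per video second.
RATE_PER_SECOND = 0.1029

for seconds in (5, 10, 30):
    print(f"{seconds:>2}s video = ${seconds * RATE_PER_SECOND:.4f}")
# 5s = $0.5145, 10s = $1.0290, 30s = $3.0870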
✨Key Features
- High-Fidelity Image-to-Video Generation: Transforms static images into coherent, richly detailed video sequences with fluid motion, preserving key visual characteristics while creatively extending the source content.
- Extended Temporal Scope: Supports video durations up to 30 seconds, leveraging extensive contextual memory to maintain thematic and visual consistency throughout evolving scenes.
- Dynamic Cinematic Simulation: Offers an advanced toolkit of camera maneuvers including smooth dolly and crane motions, multi-axis rotation, depth modulation, and focus pull transitions, enabling professional visual storytelling and dramatic effect creation.
- Multi-Style and Genre Adaptability: Trained on extensive genre-diverse datasets enabling faithful reproduction of live action, animation, documentary, and experimental styles with high-fidelity stylistic nuances and content variability.
- Multilingual and Multimodal Prompting: Incorporates robust multilingual understanding (English, Mandarin Chinese, and additional languages) and supports multimodal inputs combining text annotations and visual cues, enabling precise control and localization for global production requirements (a request sketch combining a longer duration with a Mandarin prompt follows this list).
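The payload below reuses the fields already shown in the request examples at the top of this page (model, prompt, image_url, duration) to combine two of the features above: a longer clip and a Mandarin prompt. Whether duration accepts values up to "30" is an assumption based on the 30-second limit described above; confirm against the API reference.

# Sketch: a longer clip driven by a Mandarin prompt, reusing the documented payload
# fields. Support for duration="30" is an assumption based on the 30-second limit above.
import requests

payload = {
    "model": "kling-video/v2.1/pro/image-to-video",
    "prompt": "蒙娜丽莎戴上眼镜。",  # "Mona Lisa puts on glasses."
    "image_url": "https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg",
    "duration": "30",
}
headers = {"Authorization": "Bearer ", "Content-Type": "application/json"}  # add your API key

response = requests.post(
    "https://api.ai.cc/v2/generate/video/kling/generation", json=payload, headers=headers
)
print(response.json())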
💡Use Cases
- ✅Generating extended, narrative-rich video content from photographic assets for advertising, marketing, and educational purposes.
- ✅Cinematic storyboarding and concept development translating static art into dynamic sequences.
- ✅Social media video enhancement and creative augmentation through image animation.
- ✅Documentary and narrative video augmentation driven by photographic archives.
- ✅Animation and live-action video synthesis from high-resolution images.
- ✅Enterprise-grade multimedia content generation for creative studios and corporate communication teams.
- ✅Rapid visual prototyping and iterative story development leveraging image inputs.
- ✅Multilingual video production tailored for diverse international markets.
💻Code Sample
The JavaScript and Python request examples at the top of this page show the minimal generation call for kling-video/v2.1/pro/image-to-video.
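Video generation typically completes asynchronously, so the submitted job usually needs to be polled until the result is ready. The sketch below illustrates one way to do that; the GET endpoint, the generation_id parameter, and the status key are assumptions made for illustration and should be checked against the API reference.

# Hypothetical polling loop for an asynchronous generation job. The GET endpoint,
# the "generation_id" parameter, and the "status" response key are assumptions
# made for illustration -- check the API reference for the real retrieval call.
import time

import requests

URL = "https://api.ai.cc/v2/generate/video/kling/generation"
HEADERS = {"Authorization": "Bearer ", "Content-Type": "application/json"}  # add your API key


def wait_for_video(generation_id: str, poll_seconds: int = 10) -> dict:
    while True:
        result = requests.get(URL, params={"generation_id": generation_id}, headers=HEADERS).json()
        if result.get("status") in {"completed", "failed"}:
            return result
        time.sleep(poll_seconds)  # give the renderer time before asking again


# Usage (assuming the POST response carries a generation id):
# video = wait_for_video(response.json()["generation_id"])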
📊Comparison with Other Models
vs Kling V2.0 Standard I2V: Kling V2.1 Pro significantly extends video duration from 15 to 30 seconds, upgrades maximum resolution and frame rate stability to 4K/30fps, introduces a more sophisticated image-encoding and temporal consistency approach, and enhances camera simulation capabilities with multi-axis dynamic effects. The Pro version also improves inference efficiency, supporting enterprise-scale batch processing with refined scene and style control.
vs Kling V1.5 Pro T2V: While Kling V1.5 Pro focuses on text-to-video generation, Kling V2.1 Pro I2V pioneers sophisticated image-to-video synthesis with higher resolution, longer video duration, enhanced motion realism, and multi-source multimodal integration, reflecting significant architectural innovations and expanded application scope.
❓Frequently Asked Questions (FAQ)
Q: What makes Kling V2.1 Pro ideal for professional multimedia production?
A: Kling V2.1 Pro offers unparalleled video synthesis quality, 4K Ultra HD resolution at 30fps, extended video durations up to 30 seconds, and advanced cinematic effects. These features, combined with its robust image understanding and adaptive stylistic rendering, make it suitable for demanding professional workflows in film, advertising, and enterprise content creation.
Q: How does Kling V2.1 Pro differ from Kling V2.0 Standard?
A: V2.1 Pro significantly extends video duration from 15 to 30 seconds, upgrades resolution and frame rate to 4K/30fps, and introduces a more sophisticated image-encoding pipeline. It also enhances camera simulation with multi-axis dynamic effects and improves inference efficiency for enterprise-scale batch processing.
Q: What kind of creative control does Kling V2.1 Pro offer?
A: Users gain extensive creative control through dynamic cinematic simulation (dolly, crane, zoom, depth-of-field), multi-style and genre adaptability, and robust multilingual/multimodal prompting. This allows for precise narrative extrapolation and customized visual storytelling.
Q: What is the pricing structure for Kling V2.1 Pro's API?
A: The API is priced at $0.1029 per video second, offering a competitive rate for high-fidelity video generation.
Q: Can Kling V2.1 Pro handle different languages for content generation?
A: Yes, it incorporates robust multilingual understanding, supporting English, Mandarin Chinese, and additional languages. This feature, combined with multimodal inputs, enables precise control and localization for global production requirements.
Learn how you can transform your company with AICC APIs


