



// Submit an image-to-video generation request to the Kling V2.1 Standard endpoint.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/video/kling/generation', {
    method: 'POST',
    headers: {
      // Replace YOUR_API_KEY with your API key.
      Authorization: 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'kling-video/v2.1/standard/image-to-video',
      prompt: 'Mona Lisa puts on glasses with her hands.',
      image_url: 'https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg',
      duration: '5',
    }),
  }).then((res) => res.json());

  console.log('Generation:', response);
};

main();
import requests


def main():
    # Submit an image-to-video generation request to the Kling V2.1 Standard endpoint.
    url = "https://api.ai.cc/v2/generate/video/kling/generation"
    payload = {
        "model": "kling-video/v2.1/standard/image-to-video",
        "prompt": "Mona Lisa puts on glasses with her hands.",
        "image_url": "https://s2-111386.kwimgs.com/bs2/mmu-aiplatform-temp/kling/20240620/1.jpeg",
        "duration": "5",
    }
    # Replace YOUR_API_KEY with your API key.
    headers = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

    response = requests.post(url, json=payload, headers=headers)
    print("Generation:", response.json())


if __name__ == "__main__":
    main()

Product Detail
The Kling V2.1 Standard Image-to-Video generation model marks a significant leap in multimodal AI capabilities, offering robust and versatile video synthesis. It transforms static image inputs, optionally guided by textual prompts, into dynamic video content. This iteration emphasizes improved stability, higher frame quality, and enhanced temporal coherence, all while maintaining user-friendly accessibility and efficient computational performance.
✨ Technical Specifications
- • Video Generation Quality: Employs advanced spatiotemporal convolutional transformers paired with novel motion inference modules to generate smooth, consistent, and artifact-minimized video sequences from single or multiple keyframe images.
- • Resolution and Frame Rate: Supports output resolutions up to 1080p Full HD at a steady 24 fps, optimized for a balanced trade-off between visual fidelity and efficient rendering, suitable for real-time applications and batch generation (see the frame-count sketch after this list).
- • Prompt & Image Integration: Features a sophisticated cross-modal fusion architecture that synergistically combines detailed image feature extraction with natural language prompts, enabling nuanced scene evolution and stylistic modifications.
- • Camera & Motion Effects: Incorporates baseline camera motion synthesis, including panning, slow zoom, and subtle parallax effects, to enhance immersion and dynamic storytelling while ensuring visual consistency and natural transitions.
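As a quick illustration of what the fixed 24 fps output means for clip length, the sketch below computes the frame count for a given duration; 1080p Full HD is taken to be 1920x1080.

FPS = 24
WIDTH, HEIGHT = 1920, 1080  # 1080p Full HD

def frame_count(duration_seconds: float, fps: int = FPS) -> int:
    """Frames in a clip of the given length at the model's fixed frame rate."""
    return round(duration_seconds * fps)

if __name__ == "__main__":
    for seconds in (5, 10):
        print(f"{seconds}s clip at {FPS} fps -> {frame_count(seconds)} frames of {WIDTH}x{HEIGHT}")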
📚 Training Data
The model was trained on an expanded, diverse multimedia corpus comprising paired image-to-video datasets across multiple domains: cinematic clips, nature scenes, urban environments, and dynamic artworks. This dataset features rich annotations and multilingual descriptive captions, fostering strong generalization across styles, motions, and cultural contexts.
📈 Performance Metrics
Kling V2.1 achieves a high fidelity-to-latency ratio, delivering seamless video outputs with minimal temporal artifacts at competitive inference speeds. It supports batch processing and prompt-guided variable-length video generation, offering fine-grained control over motion amplitude and stylistic consistency.
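Batch generation can be driven with the same request format shown in the samples at the top of this page. The sketch below is a minimal illustration, assuming the same endpoint and payload fields (model, prompt, image_url, duration); the image URLs, prompts, and API key are placeholders.

import requests

API_URL = "https://api.ai.cc/v2/generate/video/kling/generation"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

# Placeholder batch: one generation request per (image_url, prompt) pair.
JOBS = [
    ("https://example.com/frame_01.jpg", "Slow zoom toward the subject."),
    ("https://example.com/frame_02.jpg", "Gentle pan across the scene."),
]

def submit_batch(jobs, duration="5"):
    """Submit one generation request per job, reusing the payload fields from the samples above."""
    results = []
    for image_url, prompt in jobs:
        payload = {
            "model": "kling-video/v2.1/standard/image-to-video",
            "prompt": prompt,
            "image_url": image_url,
            "duration": duration,
        }
        response = requests.post(API_URL, json=payload, headers=HEADERS)
        results.append(response.json())
    return results

if __name__ == "__main__":
    for generation in submit_batch(JOBS):
        print("Generation:", generation)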
💲 API Pricing
Starting at $0.0588 per second of generated video.
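A minimal cost-estimate helper based on the listed rate, assuming per-second billing is the only charge (actual invoices may differ):

PRICE_PER_SECOND_USD = 0.0588

def estimated_cost(duration_seconds: float, clips: int = 1) -> float:
    """Estimated charge for generating `clips` videos of `duration_seconds` each."""
    return duration_seconds * clips * PRICE_PER_SECOND_USD

if __name__ == "__main__":
    print(f"One 5s clip:     ${estimated_cost(5):.4f}")        # ~$0.294
    print(f"100 x 10s clips: ${estimated_cost(10, 100):.2f}")  # ~$58.80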
💡 Key Features
- ✅ Direct Image-to-Video Generation: Converts a single image or an image set into smooth and coherent video sequences, preserving essential visual elements while introducing plausible motion consistent with scene semantics.
- ✅ Multimodal Prompt Conditioning: Enables users to steer video dynamics and aesthetics via optional textual prompts, augmenting creative flexibility and narrative depth.
- ✅ Enhanced Temporal Coherence: Incorporates novel temporal regularization techniques, significantly reducing flicker, jitter, and motion discontinuities to maintain fluid visual flow across frames.
- ✅ Dynamic Camera Emulation: Implements fundamental camera movements, including subtle zooms, pans, and slight rotational shifts, enhancing scene depth and cinematic presence without sacrificing performance.
- ✅ Stylistic and Contextual Adaptability: Trained to function across a wide range of visual genres, including natural landscapes, urban settings, animation styles, and artistic renderings, allowing for diverse creative outputs.
- ✅ Multilingual Support: Features robust understanding and processing of prompts in English, Chinese, and additional languages, supporting global user needs and broad international applications (see the sketch after this list).
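Prompt conditioning and multilingual support combine naturally: the same request format used in the samples at the top of this page accepts, for example, a Chinese prompt. The image URL and API key below are placeholders.

import requests

API_URL = "https://api.ai.cc/v2/generate/video/kling/generation"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}

# Same payload shape as the samples above, with a Chinese prompt.
payload = {
    "model": "kling-video/v2.1/standard/image-to-video",
    "prompt": "蒙娜丽莎用双手戴上眼镜。",  # "Mona Lisa puts on glasses with her hands."
    "image_url": "https://example.com/your_image.jpg",  # placeholder image URL
    "duration": "5",
}

response = requests.post(API_URL, json=payload, headers=HEADERS)
print("Generation:", response.json())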
🚀 Use Cases
- ➤ Artistic and creative video development from existing visual assets.
- ➤ Video enhancement and dynamic scene creation for compelling marketing content.
- ➤ Social media and digital storytelling, transforming static images into engaging motion.
- ➤ Preliminary concept visualization and rapid multimedia prototyping.
- ➤ Application in gaming, AR/VR content generation, and interactive media experiences.
- ➤ Cross-lingual video content generation for diverse audience engagement worldwide.
💻 Code Sample
# Example Python code snippet for Kling V2.1 Image-to-Video API integration
import kling_api

# Initialize the Kling API client with your authentication key
client = kling_api.KlingClient(api_key="YOUR_API_KEY")

# Define your input image and an optional textual prompt
image_path = "path/to/your/input_image.jpg"
text_prompt = "A majestic eagle soaring over snow-capped mountains at sunrise."
video_duration = 5  # Desired video length in seconds

try:
    with open(image_path, "rb") as image_file:
        # Call the Image-to-Video generation endpoint
        response = client.generate_video(
            model="kling-video/v2.1/standard/image-to-video",
            image=image_file.read(),
            prompt=text_prompt,
            duration=video_duration,
        )

    if response.status == "success":
        print("Video generation successful!")
        print(f"Generated Video URL: {response.video_url}")
        # Further steps: e.g., download the video or integrate into your application
    else:
        print(f"Video generation failed: {response.error_message}")
except FileNotFoundError:
    print(f"Error: Image file not found at {image_path}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
🆚 Comparison with Other Models
vs Kling V2.0 Standard I2V: Kling V2.1 delivers significant upgrades, boosting output resolution from 720p to 1080p. It features enhanced temporal smoothness through improved motion inference modules and integrates a more powerful cross-modal fusion mechanism for superior image-text alignment and overall video consistency. Both inference speed and API throughput have been optimized for lower latency and higher concurrency.
vs Kling V1.5 Standard T2V: While V1.5 focuses primarily on Text-to-Video (T2V) synthesis, V2.1 Standard I2V shifts the paradigm towards image-conditioned video generation (I2V). V2.1 offers richer scene dynamics guided primarily by visual input with complementary text prompts, greatly expanding its use-case versatility. Despite its different input modality focus, V2.1 also delivers notable improvements in temporal continuity and resolution.
❓ Frequently Asked Questions (FAQ)
Q1: What are the primary advantages of Kling V2.1 over its predecessor, V2.0?
Kling V2.1 offers significant advancements, including 1080p Full HD output resolution (up from 720p), enhanced temporal smoothness, and improved image-text alignment due to a more powerful cross-modal fusion mechanism. It also boasts optimized inference speed and API throughput for greater efficiency.
Q2: Can Kling V2.1 generate videos from multiple images, or only a single image?
Kling V2.1 is versatile and can generate smooth, coherent video sequences from either a single static image or a set of multiple keyframe images, integrating them into a dynamic visual narrative.
Q3: How does textual prompting enhance the video generation process?
Optional textual prompts allow users to finely steer the video's dynamics, aesthetics, and overall narrative direction. This multimodal conditioning facilitates nuanced scene evolution and stylistic modifications that are deeply grounded in both the input imagery and the provided text context.
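As an illustration, the sketch below submits the same image once without and once with a prompt so the two outputs can be compared. It reuses the endpoint and payload fields from the samples at the top of this page and assumes the prompt field can simply be omitted for image-only motion; the image URL and API key are placeholders.

import requests

API_URL = "https://api.ai.cc/v2/generate/video/kling/generation"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY", "Content-Type": "application/json"}
IMAGE_URL = "https://example.com/your_image.jpg"  # placeholder image URL

def generate(prompt=None, duration="5"):
    """Submit a generation request; the prompt is optional (assumption: omit the field entirely)."""
    payload = {
        "model": "kling-video/v2.1/standard/image-to-video",
        "image_url": IMAGE_URL,
        "duration": duration,
    }
    if prompt is not None:
        payload["prompt"] = prompt
    return requests.post(API_URL, json=payload, headers=HEADERS).json()

print("Image only:    ", generate())
print("Image + prompt:", generate("Slow dolly-in while the subject turns toward the camera."))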
Q4: Is Kling V2.1 suitable for applications requiring real-time video generation?
Yes, the model is optimized for a balanced trade-off between visual fidelity and efficient rendering. This makes it well-suited for real-time applications, interactive media, and batch video generation, thanks to its competitive inference speeds and minimal temporal artifacts.
Q5: What languages are supported for textual prompts in Kling V2.1?
Kling V2.1 offers robust multilingual support. It can effectively understand and process prompts provided in English, Chinese, and several other languages, thereby catering to a diverse and international user base.