



const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '',
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'moonshot/kimi-k2-0905-preview',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",
)

response = client.chat.completions.create(
    model="moonshot/kimi-k2-0905-preview",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
✨ The Kimi K2 0905 Preview is an advanced update to the Kimi K2 model, engineered for high performance in intelligent agent creation, multi-turn conversational AI, and complex analytical tasks. This version extends the context window to 262,144 tokens and integrates enhanced request caching, improving efficiency and depth in natural language understanding and reasoning. It is tailored for demanding applications such as corporate assistants, sophisticated agent-based workflows, and advanced reasoning systems that require extensive context and memory.
🚀 Technical Specifications
- Model Type: Large-scale Transformer-based language model
- Context Window: 262,144 tokens (significantly expanded from previous versions)
- Architecture: Hybrid architecture optimized for long context retention and efficient memory usage
- Training Data: Diverse, high-quality corpora with a strong focus on dialogue, reasoning, and enterprise-specific texts
- Supported Tasks: Natural language understanding, reasoning, multi-turn dialogue, text summarization, and advanced analytics
- Max Output Tokens per Request: 8192 tokens
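The two limits above interact: the input prompt plus the reserved output budget must fit inside the context window. A minimal pre-flight check, sketched with a rough 4-characters-per-token heuristic (an approximation for illustration; use the provider's tokenizer for accurate counts):

```python
# Rough sketch: estimate whether a prompt is likely to fit in the model's
# context window while leaving room for the response. The chars/4 heuristic
# is an assumption, not an exact token count.
CONTEXT_WINDOW = 262_144   # Kimi K2 0905 context window (tokens)
MAX_OUTPUT = 8_192         # max output tokens per request

def fits_in_context(prompt: str, reserved_output: int = MAX_OUTPUT) -> bool:
    estimated_tokens = len(prompt) // 4  # crude heuristic: ~4 chars per token
    return estimated_tokens + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("Tell me, why is the sky blue?"))  # small prompt: True
print(fits_in_context("x" * 2_000_000))                  # ~500K tokens: False
```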
📊 Performance Benchmarks
Across five distinct evaluations, including SWE-bench Verified, Multilingual, and SWE-Dev, the Kimi K2 0905 achieves consistently higher average scores than both Kimi K2-0711 and Claude Sonnet 4. Each reported score represents the average of five rigorous test runs, ensuring robust statistical reliability and showcasing its superior capabilities.
💡 Key Features
- Ultra-long Context Processing: Seamlessly handles extensive documents and conversations with up to 262K tokens.
- Enhanced Caching Mechanism: Significantly improves throughput and reduces latency in multi-turn sessions and repetitive queries, optimizing performance.
- Multi-turn Dialogue Specialization: Maintains excellent context coherency over long conversations, making it ideal for sophisticated virtual assistants.
- Intelligent Agent Capabilities: Provides robust support for autonomous decision-making and the execution of complex tasks in diverse environments.
- Advanced Reasoning: Excels in analytical queries that demand sustained logic and intricate inference chains.
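Prompt caches generally key on a stable request prefix, so multi-turn sessions benefit most when the system prompt and earlier turns are resent unchanged and only the new turn is appended. A minimal history-management sketch (the cache-keying detail is an assumption about typical provider behavior, not documented AI.CC internals):

```python
# Keep the conversation prefix stable across requests: append new turns at
# the end and never rewrite earlier messages, so cached prefixes can be reused.
class Conversation:
    def __init__(self, system_prompt: str):
        self.messages = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> list:
        """Append the user turn and return the full messages payload to send."""
        self.messages.append({"role": "user", "content": user_text})
        return self.messages

    def record_reply(self, assistant_text: str) -> None:
        """Store the model's reply so later turns retain context."""
        self.messages.append({"role": "assistant", "content": assistant_text})

chat = Conversation("You are an AI assistant who knows everything.")
payload = chat.ask("Why is the sky blue?")   # send this to the API
chat.record_reply("Because of Rayleigh scattering.")
print(len(chat.messages))  # → 3
```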
💲 Kimi K2 0905 API Pricing
- Input: $0.1575 / 1M tokens
- Output: $2.625 / 1M tokens
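A worked example of the per-token pricing above, for a request that sends 200,000 input tokens and receives 4,000 output tokens:

```python
# Cost calculator for the published Kimi K2 0905 rates.
INPUT_PRICE = 0.1575 / 1_000_000   # USD per input token
OUTPUT_PRICE = 2.625 / 1_000_000   # USD per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request."""
    return input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

# 200K input + 4K output: $0.0315 + $0.0105 = $0.042
print(round(request_cost(200_000, 4_000), 4))  # → 0.042
```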
💻 Use Cases
- Corporate Virtual Assistants: Managing complex workflows and interacting with large volumes of documentation.
- Customer Support Bots: Handling extended multi-turn conversations with personalized context retention, enhancing user experience.
- Intelligent Agents: For automated decision-making in critical enterprise domains such as finance, healthcare, and legal.
- Analytical Tools: Requiring deep contextual understanding and advanced inference capabilities over lengthy texts.
- Multi-agent Systems: Enabling synchronized memory and coordinated actions across extended interaction histories.
✍️ Code Sample
# Example: Basic API call structure (Python)
import requests
import json

API_KEY = "YOUR_API_KEY"
MODEL_URL = "https://api.kimi.ai/v1/chat/completions"  # Hypothetical URL

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

data = {
    "model": "moonshot/kimi-k2-0905-preview",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the key features of Kimi K2 0905 in detail."},
    ],
    "max_tokens": 500,
    "temperature": 0.7,
}

try:
    response = requests.post(MODEL_URL, headers=headers, json=data)
    response.raise_for_status()  # Raise an exception for HTTP errors
    print(json.dumps(response.json(), indent=2))
except requests.exceptions.RequestException as e:
    print(f"API Error: {e}")
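The try/except above surfaces failures; in practice, transient errors such as HTTP 429 or 5xx responses are usually worth retrying with exponential backoff. A sketch of the delay schedule (the base and cap values are illustrative, not provider requirements):

```python
# Exponential backoff: double the wait after each failed attempt, capped so
# a long outage doesn't produce unbounded sleeps. Pair each delay with
# time.sleep(delay) around the requests.post call above before retrying.
def backoff_delays(retries: int, base: float = 1.0, cap: float = 30.0) -> list:
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]

print(backoff_delays(6))  # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```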
🆚 Comparison with Other Models
vs GPT-4 Turbo: Kimi-K2-0905 offers double the context length (262K vs. 128K) and superior caching mechanisms for repetitive enterprise queries. While GPT-4 excels in general creativity, Kimi-K2-0905 is specifically optimized for structured reasoning and agent reliability.
vs Claude 3.5 Sonnet: Both deliver strong analytical performance, but Kimi-K2-0905 provides faster inference on long contexts and native support for stateful agent memory. Claude tends to favor conversational fluency, whereas Kimi prioritizes efficient task completion.
vs Llama 3 70B: Llama 3 is highly customizable, but it lacks built-in long-context optimization and comprehensive enterprise tooling. Kimi-K2-0905 delivers out-of-the-box performance with managed infrastructure, integrated caching, and compliance features.
vs Gemini 1.5 Pro: Gemini matches Kimi in context length, but Kimi-K2-0905 demonstrates lower latency in cached scenarios and offers better tool-integration for agentic loops. Gemini leads in multimodal tasks, while Kimi dominates in text-centric enterprise reasoning.
❓ Frequently Asked Questions (FAQ)
Q: What is the primary advantage of Kimi K2 0905's context window?
A: The Kimi K2 0905 features an ultra-long context window of 262,144 tokens, allowing it to process and retain information from extremely large documents and extended conversations, which is crucial for complex enterprise applications and intelligent agents.
Q: How does Kimi K2 0905 enhance efficiency for repetitive queries?
A: It integrates an enhanced caching mechanism that significantly improves throughput and reduces latency, especially beneficial for multi-turn sessions and frequently repeated requests, leading to more efficient operations.
Q: What types of tasks is Kimi K2 0905 best suited for?
A: Kimi K2 0905 is tailored for natural language understanding, advanced reasoning, multi-turn dialogue, text summarization, and complex analytical tasks. It particularly excels in applications requiring extensive context and memory like corporate assistants and intelligent agents.
Q: Can Kimi K2 0905 be used for developing intelligent agents?
A: Yes, it offers robust intelligent agent capabilities, supporting autonomous decision-making and complex task execution, making it an excellent choice for building sophisticated agent-based workflows.
Q: What are the API pricing details for Kimi K2 0905?
A: The input cost is $0.1575 per 1M tokens, and the output cost is $2.625 per 1M tokens.
Learn how you can transform your company with AICC APIs