



const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '',
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'google/gemini-2.5-flash',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",
)

response = client.chat.completions.create(
    model="google/gemini-2.5-flash",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
✨ Introducing Gemini 2.5 Flash: Google DeepMind's Breakthrough AI Model
Google DeepMind has unveiled Gemini 2.5 Flash, a highly efficient, cost-effective, and versatile multimodal AI model. Engineered for rapid reasoning and coding tasks, it boasts a formidable 1-million-token context window, making it exceptionally proficient in areas like web development, complex mathematics, and scientific analysis. This model is now accessible to developers and enterprises via Google AI Studio and Vertex AI (in preview), striking an optimal balance between performance quality, operational cost, and processing speed.
🔧 Technical Specifications & Performance Insights
Gemini 2.5 Flash utilizes a Transformer-based hybrid reasoning architecture, allowing developers to fine-tune its "thinking" depth for peak performance. It supports a comprehensive range of inputs including text, image, video, and audio, with advanced post-training for superior reasoning capabilities.
- 💰 Context Window: 1 million tokens, with plans to expand to 2 million soon.
- 📝 Output Capacity: Generates up to 32,768 tokens per response.
- ⚡ Speed: Achieves an impressive 180 tokens/second output speed, with a low latency of 0.8 seconds (TTFT without thinking).
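As a rough illustration of the throughput figures above (the 0.8 s TTFT and 180 tokens/second values are taken directly from these specs), end-to-end response time can be estimated as time-to-first-token plus steady-state generation time:

```python
# Rough end-to-end latency estimate using the published figures above.
TTFT_SECONDS = 0.8          # time to first token, without thinking
TOKENS_PER_SECOND = 180     # steady-state output speed

def estimated_response_time(output_tokens: int) -> float:
    """Time to first token plus generation time, in seconds."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND

# A 900-token answer: 0.8 + 900/180 = 5.8 seconds.
print(f"{estimated_response_time(900):.1f} s")
```

This is only a lower bound; network overhead and thinking mode add further delay.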
📈 Key Benchmarks (with thinking):
- AIME 2025 (Math): 78.3%
- GPQA Diamond (Science): 76.5%
- SWE-Bench Verified (Coding): 58.2%
- MMLU: 78.3%
💸 API Pricing (per million tokens):
- Input tokens: $0.1575
- Output tokens: $0.63
- Example: 1,000 input tokens + 1,000 output tokens (with thinking) costs $0.0007875 total
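The example cost above follows directly from the per-million-token prices. A small sketch of the arithmetic:

```python
# Cost calculator using the per-million-token prices listed above (USD).
INPUT_PRICE_PER_M = 0.1575
OUTPUT_PRICE_PER_M = 0.63

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# 1,000 input + 1,000 output tokens: $0.0007875, matching the figure above.
print(f"${request_cost(1_000, 1_000):.7f}")
```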

💡 Key Capabilities That Set Gemini 2.5 Flash Apart
Gemini 2.5 Flash (an experimental reasoning-focused model) meticulously analyzes tasks to deliver precise and nuanced outputs. Its robust multimodal processing allows for seamless integration of text, images, video, and audio, making it an incredibly versatile tool for diverse and complex workflows.
- 💻 Advanced Coding: Excels in WebDev Arena, generating functional web applications with aesthetically pleasing UIs (e.g., video players, dictation apps). Supports over 40 programming languages and enables agentic coding with minimal supervision.
- 🧠 Superior Reasoning & Problem-Solving: Achieves high scores in challenging domains like math (AIME 2025: 78.3%) and science (GPQA Diamond: 76.5%), leveraging built-in thinking processes for logical conclusions.
- 🎥 Multimodal Processing: Scores 84.8% on VideoMME, facilitating innovative video-to-code workflows (e.g., creating learning applications directly from YouTube videos).
- 🛠️ Tool Utilization: Integrates seamlessly with external tools and APIs through function calling and JSON structuring, enabling multi-step tasks and complex interactions.
- 🌐 Web Development Prowess: Capable of generating responsive, visually engaging web apps complete with advanced features like wavelength animations and hover effects.
- 🎮 Interactive Simulations: Creates executable code for games (e.g., endless runners) and sophisticated visualizations (e.g., Mandelbrot fractals, boid animations).
- 📡 API Features: Offers streaming capabilities, robust function calling, and multilingual support for developing real-time, scalable applications.
🚀 Optimal Use Cases for Gemini 2.5 Flash
- 📱 Web Development: Crafting interactive applications with dynamic and responsive designs.
- 🔣 Code Generation: Autonomous coding for complex simulations and extensive codebases.
- 🔬 Scientific Research: Advanced data analysis and problem-solving in mathematical and scientific fields.
- 🌈 Multimodal Applications: Developing innovative learning apps from video content and creating rich visualizations.
- 💼 Business Automation: Streamlining business processes through seamless API integration.
📊 Comparison with Other Leading Models
Gemini 2.5 Flash stands out in several key areas when compared to its contemporaries:
- ✅ vs. OpenAI o3-mini: Gemini 2.5 Flash is notably faster (180 vs. ~100 tokens/second) and more cost-effective without thinking ($0.15 vs. $0.30 per million output tokens).
- ✅ vs. Claude 3.7 Sonnet: While having a slightly lower SWE-Bench score (58.2% vs. ~65%), Gemini 2.5 Flash offers superior speed and cost efficiency.
- ✅ vs. DeepSeek R1: Possesses a lower AIME score (78.3% vs. 93.3%), but significantly excels in multimodal capabilities.
- ✅ vs. Qwen3-235B-A22B: Delivers a much higher output speed (180 vs. 40.1 tokens/second) and comes at a lower cost.
💻 Code Samples
// Example of calling Gemini 2.5 Flash directly through Google's official
// JavaScript SDK (the OpenAI-compatible samples above work the same way
// through the AICC endpoint). Set API_KEY in your environment first.

import { GoogleGenerativeAI } from "@google/generative-ai";

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });

async function run() {
  const prompt = "Write a short story about an AI exploring the deep sea.";
  const result = await model.generateContent(prompt);
  console.log(result.response.text());
}

run();
⚠️ Important Limitations to Consider
- ❌ Latency: TTFT is 0.8s even without thinking, and enabling thinking mode adds further delay, which can impact real-time applications.
- ❌ Experimental Status: Being in preview/experimental status may affect stability and lead to changes.
- ❌ No Fine-tuning Support: The model currently cannot be fine-tuned on custom data.
- ❌ Increased Costs: Utilizing the "thinking" mode will lead to higher operational costs.
🔗 Seamless API Integration
Gemini 2.5 Flash is readily accessible via the AI/ML API, offering comprehensive streaming capabilities, robust function calling, and full multimodal support to empower developers in building advanced, intelligent applications.
❓ Frequently Asked Questions (FAQ)
Q: What is Gemini 2.5 Flash designed for?
A: Gemini 2.5 Flash is designed as a fast, cost-effective multimodal AI model optimized for reasoning and coding tasks, excelling in areas like web development, math, and scientific analysis.
Q: What is the context window size of Gemini 2.5 Flash?
A: It features a 1-million-token context window, with plans to expand to 2 million tokens in the near future.
Q: How does its pricing compare to other models?
A: Gemini 2.5 Flash offers competitive pricing, with input tokens at $0.1575 and output tokens at $0.63 per million tokens, generally making it more cost-effective than some competitors like OpenAI o3-mini for output.
Q: Can Gemini 2.5 Flash process different types of media?
A: Yes, it boasts robust multimodal capabilities, supporting text, image, video, and audio inputs, and can even facilitate video-to-code workflows.
Q: What are some limitations of Gemini 2.5 Flash?
A: Key limitations include a relatively high latency of 0.8s TTFT with thinking, its current experimental status, lack of fine-tuning support, and increased costs when using the "thinking" mode.
Learn how you can transform your company with AICC APIs