



const { OpenAI } = require('openai');

// Point the OpenAI SDK at the AICC-compatible endpoint.
const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '', // your API key here
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
from openai import OpenAI

# Point the OpenAI SDK at the AICC-compatible endpoint.
client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",  # your API key here
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
✨ Discover Qwen3-235B-A22B: Alibaba Cloud's Flagship AI Model
The Qwen3-235B-A22B model, developed by Alibaba Cloud, is a flagship large language model (LLM) built on a sophisticated Mixture-of-Experts (MoE) architecture. Of its 235 billion total parameters, it activates only 22 billion per token, delivering top-tier performance across critical domains like coding, mathematics, and complex reasoning. With support for 119 languages, it is an ideal solution for global enterprise applications, from software development to advanced research. Access is streamlined via the AI/ML API.
🚀 Technical Deep Dive: Architecture and Performance
Qwen3-235B-A22B is engineered with a cutting-edge Transformer-based MoE architecture. It dynamically selects the top-8 experts per token, activating only 22 billion of its total 235 billion parameters to significantly reduce computational cost while maintaining peak performance. Enhanced with Rotary Positional Embeddings and Grouped-Query Attention, it ensures remarkable inference efficiency. The model was pre-trained on an extensive dataset of 36 trillion tokens spanning 119 languages, and further refined through RLHF and a rigorous four-stage post-training process for superior hybrid reasoning capabilities.
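To make that routing step concrete, here is a minimal PyTorch sketch of top-8 expert selection. The tensor shapes and the expert count are illustrative placeholders, not the model's actual configuration:

import torch
import torch.nn.functional as F

def route_tokens(hidden, router_weight, top_k=8):
    # hidden: (num_tokens, d_model); router_weight: (num_experts, d_model)
    logits = hidden @ router_weight.T             # score every expert per token
    probs = F.softmax(logits, dim=-1)
    gate, expert_idx = probs.topk(top_k, dim=-1)  # keep only the top-8 experts
    gate = gate / gate.sum(dim=-1, keepdim=True)  # renormalize the gate weights
    return expert_idx, gate

# Illustrative shapes: 4 tokens, 1024-dim hidden states, 64 experts.
expert_idx, gate = route_tokens(torch.randn(4, 1024), torch.randn(64, 1024))

Each token is then processed by only its selected experts, and their outputs are combined using the gate weights; this is why only about 22 of the 235 billion parameters are active per token.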
- Context Window: Natively supports 32K tokens, extendable up to an impressive 128K tokens with YaRN integration.
- Key Benchmarks:
- ✅ Outperforms OpenAI’s o3-mini on AIME (mathematics) and Codeforces (coding).
- ✅ Surpasses Gemini 2.5 Pro on BFCL (reasoning) and LiveCodeBench.
- ✅ Achieves an MMLU score of 0.828, competing directly with DeepSeek R1.
- Performance Metrics: Achieves a rapid 40.1 tokens/second output speed with a low latency of 0.54s (TTFT, Time To First Token).
- API Pricing (Highly Competitive):
- Input tokens: $0.21 per million tokens
- Output tokens: $0.63 per million tokens
- Cost for 1,000 input tokens plus 1,000 output tokens: $0.00084 total (worked out in the sketch below)
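A quick sanity check of that figure, using only the per-million rates above:

# Cost per token, derived from the per-million rates above.
INPUT_RATE = 0.21 / 1_000_000    # $ per input token
OUTPUT_RATE = 0.63 / 1_000_000   # $ per output token

def cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

print(f"${cost(1_000, 1_000):.5f}")  # prints $0.00084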

Performance Comparison: Qwen3-235B-A22B vs. Leading LLMs
💡 Key Capabilities: Empowering Diverse Applications
Qwen3-235B-A22B truly excels in hybrid reasoning, adeptly switching between a detailed thinking mode (/think) for step-by-step problem-solving and a rapid non-thinking mode (/no_think) for quick responses. Its native support for 119 languages ensures seamless global deployment for applications like multilingual chatbots and advanced translation. With its substantial 128K-token context window, it efficiently processes vast datasets, complex codebases, and extensive documents, maintaining high coherence through the use of XML delimiters for structural retention.
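The mode switch travels in the prompt itself. Here is a minimal sketch that toggles the switch per request, reusing the endpoint from the examples above (the API key is a placeholder):

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

def ask(question: str, think: bool) -> str:
    # /think requests step-by-step reasoning; /no_think requests a fast answer.
    switch = "/think" if think else "/no_think"
    response = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-fp8-tput",
        messages=[{"role": "user", "content": f"{switch} {question}"}],
    )
    return response.choices[0].message.content

print(ask("Why is the sky blue?", think=False))               # quick answer
print(ask("Prove that sqrt(2) is irrational.", think=True))   # detailed reasoning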
- </> Coding Excellence: Outperforms OpenAI’s o1 on LiveCodeBench, supporting over 40 programming languages (e.g., Python, Java, Haskell). It generates, debugs, and refactors complex codebases with exceptional precision.
- 🧠 Advanced Reasoning: Surpasses o3-mini on AIME for mathematics and BFCL for logical reasoning, making it ideal for intricate problem-solving scenarios requiring deep analytical capabilities.
- 🌍 Multilingual Proficiency: Natively handles 119 languages, powering critical cross-lingual tasks such as semantic analysis, content localization, and advanced translation services.
- 🏢 Enterprise Applications: A catalyst for diverse enterprise needs, including biomedical literature parsing, sophisticated financial risk modeling, precise e-commerce intent prediction, and detailed legal document analysis.
- 🤖 Agentic Workflows: Supports advanced features like tool-calling, the Model Context Protocol (MCP), and function calling, enabling the creation of autonomous and highly efficient AI agents.
- ⚙️ API Features: Offers robust API capabilities including streaming output, OpenAI-API compatibility, and structured output generation for seamless real-time integration into existing systems (a streaming sketch follows this list).
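Because the API is OpenAI-compatible, streaming uses the standard stream=True flag. A minimal sketch, again assuming the endpoint from the examples above:

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

# stream=True yields partial deltas as the model generates them.
stream = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[{"role": "user", "content": "Summarize the Qwen3 architecture."}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk may carry no content
        print(delta, end="", flush=True)
print()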
🎯 Optimal Use Cases: Where Qwen3-235B-A22B Shines
Qwen3-235B-A22B is specifically engineered for high-complexity enterprise environments demanding deep reasoning, scalability, and multilingual support.
- Software Development: Empower autonomous code generation, advanced debugging, and intelligent refactoring for large-scale projects, leveraging its superior performance on Codeforces and LiveCodeBench.
- Biomedical Research: Accurately parse dense medical literature, structure complex clinical notes, and generate lifelike patient dialogues with high fidelity.
- Financial Modeling: Conduct sophisticated risk analysis, efficiently answer regulatory queries, and summarize financial documents with precise numerical reasoning.
- Multilingual E-commerce: Drive intelligent semantic product categorization, accurate user intent prediction, and deploy highly effective multilingual chatbots across 119 languages.
- Legal Analysis: Facilitate comprehensive multi-document review for regulatory compliance and advanced legal research, utilizing the 128K-token context for coherence across sources (see the prompt sketch after this list).
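As one illustration of the long-context, XML-delimited workflow described above, here is a hedged sketch; the document names, tags, and prompt are hypothetical:

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

# Hypothetical documents; in practice these could be full contracts or filings.
docs = {"contract_a": "…", "contract_b": "…"}

# Wrap each document in an XML tag so the model can keep sources distinct
# across a very long prompt.
corpus = "\n".join(f"<doc name='{name}'>\n{text}\n</doc>" for name, text in docs.items())

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[
        {"role": "system", "content": "You are a legal analysis assistant."},
        {"role": "user", "content": f"{corpus}\n\nList the clauses that conflict between the documents, citing each doc by name."},
    ],
)
print(response.choices[0].message.content)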
🆚 Comparative Advantage: Qwen3-235B-A22B vs. Competitors
Qwen3-235B-A22B distinguishes itself among leading LLMs through its efficient MoE architecture and superior multilingual capabilities.
- Vs. OpenAI’s o3-mini: Outperforms in math (AIME) and coding (Codeforces), boasting lower latency (0.54s TTFT vs. 0.7s). Offers significantly broader language support (119 vs. ~20 languages).
- Vs. Google’s Gemini 2.5 Pro: Excels in reasoning (BFCL) and coding (LiveCodeBench), with more efficient inference via its MoE design.
- Vs. DeepSeek R1: Matches MMLU performance (0.828) but surpasses in multilingual tasks and enterprise scalability, all while offering cheaper API pricing.
- Vs. GPT-4.1: Competitive in core coding and reasoning benchmarks, offering distinct advantages with lower operational costs and native support for 119 languages.
💻 Code Sample: Integrating Qwen3-235B-A22B
Here's an example of how you might interact with the Qwen3-235B-A22B model via API for a chat completion task:
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",  # same endpoint as the examples above
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        # The /think soft switch asks the model for step-by-step reasoning.
        {"role": "user", "content": "/think What is the capital of France? Provide a detailed explanation."},
    ],
    temperature=0.7,
    max_tokens=200,
)

print(response.choices[0].message.content)
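The tool-calling support mentioned under Agentic Workflows uses the standard OpenAI tools format. A minimal sketch; the get_weather function and its schema are hypothetical:

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

# Hypothetical tool definition in the standard OpenAI "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example function
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the call arrives here instead of text.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)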
⚠️ Important Limitations
- Accuracy Degradation: Model accuracy may decline when context exceeds 100K tokens.
- Latency in Thinking Mode: Utilizing the "/think" mode will increase response latency; use "/no_think" for faster outputs.
- Access: the full 235B-parameter model is impractical to self-host without substantial multi-GPU infrastructure; most users will rely on hosted access via an API provider such as this one or Alibaba Cloud Model Studio.
- License: Qwen3-235B-A22B is released under the Apache 2.0 license, which permits commercial use; check the model card for current terms before deployment.
🔗 API Integration Details
Integrating Qwen3-235B-A22B is straightforward through its comprehensive AI/ML API. For detailed technical documentation and API references, please visit the official Alibaba Cloud resources.
❓ Frequently Asked Questions (FAQ)
- Q: What is the primary advantage of Qwen3-235B-A22B’s MoE architecture?
A: The Mixture-of-Experts (MoE) architecture allows the model to activate only 22 billion of its 235 billion parameters per inference, significantly reducing computational costs while maintaining top-tier performance across various tasks.
- Q: How many languages does Qwen3-235B-A22B support?
A: It natively supports 119 languages, making it highly proficient for multilingual applications like chatbots, translation, and global content analysis.
- Q: What is the maximum context window for the model?
A: While it natively offers a 32K token context window, it can be extended up to an impressive 128K tokens with YaRN, allowing it to process very large documents and codebases.
- Q: Is Qwen3-235B-A22B available for public use?
A: Yes. The model weights are openly released under the Apache 2.0 license, and hosted access is available through API providers such as this one as well as Alibaba Cloud Model Studio.
- Q: How does its API pricing compare to other models?
A: Qwen3-235B-A22B offers highly competitive API pricing, with input tokens at $0.21 per million and output tokens at $0.63 per million; 1,000 input tokens plus 1,000 output tokens cost $0.00084 in total.