



JavaScript example:

const { OpenAI } = require('openai');

// Point the OpenAI SDK at the AICC OpenAI-compatible endpoint.
const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '', // your AICC API key
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'alibaba/qwen3-next-80b-a3b-thinking',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
Python example:

from openai import OpenAI

# Point the OpenAI SDK at the AICC OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",  # your AICC API key
)

response = client.chat.completions.create(
    model="alibaba/qwen3-next-80b-a3b-thinking",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")
AI Playground

Test all API models in the sandbox environment before you integrate.
We provide more than 300 models to integrate into your app.


Product Detail
✨ Introducing Qwen3-Next-80B-A3B Thinking: Your Advanced Reasoning AI
Overview
The Qwen3-Next-80B-A3B Thinking model stands as a premier reasoning-focused chat AI, specifically engineered for intricate multi-step problem solving and advanced chain-of-thought tasks. It natively generates structured "thinking" traces, making it exceptionally proficient in domains demanding profound analytical reasoning, such as complex mathematical proofs, robust code synthesis, logical deduction, and sophisticated agentic planning.
💡 Technical Specifications
Qwen3-Next-80B-A3B Thinking is an advanced language model with 80 billion parameters. A key innovation is its sparse Mixture of Experts (MoE) architecture, which ensures only 3 billion parameters are actively engaged per token. This design facilitates remarkable efficiency; a toy illustration of the routing idea follows the specification list below.
- ⚙️ Architecture: 48 layers with a hidden dimension of 2048, employing a hybrid design with gating mechanisms and RMSNorm normalization.
- 📖 Context Window: Supports an expansive 262K tokens, extensible up to 1 million tokens with specialized scaling methods for superior long-context understanding.
- ⚡ Efficiency: Trained with resource-efficient hybrid strategies, it delivers high performance in complex reasoning, math, coding, and multi-step problem solving, while maintaining low inference costs and high throughput.
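To make sparse activation concrete, here is a toy top-k MoE router in plain NumPy. Every name and size in it (d_model, n_experts, top_k) is illustrative, not Qwen3-Next's actual configuration; it only demonstrates why so few parameters run per token.

import numpy as np

# Toy mixture-of-experts layer: only the top-k experts run per token.
# Sizes are illustrative; they are NOT Qwen3-Next's real configuration.
d_model, n_experts, top_k = 64, 8, 2
rng = np.random.default_rng(0)

W_gate = rng.normal(size=(d_model, n_experts))            # router weights
experts = rng.normal(size=(n_experts, d_model, d_model))  # toy expert FFNs

def moe_forward(x):
    """x: (d_model,) token vector -> (d_model,) output, touching only top_k experts."""
    logits = x @ W_gate
    chosen = np.argsort(logits)[-top_k:]   # ids of the top-k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts
    # Only top_k of n_experts weight matrices are used: sparse activation.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

print(moe_forward(rng.normal(size=d_model)).shape)  # (64,)

In the same spirit, Qwen3-Next routes each token through a small subset of experts, so roughly 3B of the 80B parameters do work per token while the rest sit idle.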
📈 Performance Benchmarks
- MMLU (General Knowledge): 78.5%
- HumanEval (Code Generation): 82.1%
- GSM8K (Mathematics): 91.2%
- MT-Bench (Instruction Following): 84.3%
💰 API Pricing
- Input: $0.1575
- Output: $1.60
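As a quick sanity check on costs, the sketch below estimates the price of a single request. It assumes the listed rates are per 1M tokens, which is a common convention for pricing pages but is not stated here; confirm the unit with AICC before relying on it.

# Hypothetical cost estimator; ASSUMES the listed prices are per 1M tokens.
INPUT_PRICE_PER_M = 0.1575   # USD per 1M input tokens (assumed unit)
OUTPUT_PRICE_PER_M = 1.60    # USD per 1M output tokens (assumed unit)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one chat completion request."""
    return (input_tokens * INPUT_PRICE_PER_M +
            output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Thinking models emit long reasoning traces, so budget for heavy output:
# e.g. a 2K-token prompt with an 8K-token response (trace plus answer).
print(f"${estimate_cost(2_000, 8_000):.4f}")  # ≈ $0.0131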
🚀 Key Features
- 🧠 Thinking Mode Optimization: Specifically designed for chain-of-thought and complex problem-solving, producing longer, more detailed output traces for enhanced transparency.
- ✅ Sparse Activation: Activates only 3 billion of 80 billion parameters per token, enabling rapid inference and significant cost efficiency.
- ⚡ Multi-token Prediction: Accelerates the decoding process by predicting multiple tokens concurrently, boosting output speed.
- 🔗 Stable Long-form Reasoning: Engineered for unwavering stability across extended reasoning chains and intricate instructions.
- 🤖 Agent Integration: Fully supports function calling and seamless integration into agent frameworks demanding step-by-step analytical solutions (see the sketch after this list).
- 🌐 Multilingual & Multimodal: Offers strong multilingual understanding and supports diverse reasoning tasks across various languages and modalities internationally.
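Because the model supports function calling through the OpenAI-compatible API, a tool-use request can be sketched as below. The get_weather tool and its schema are hypothetical placeholders; only the tools/tool_calls shapes follow the standard OpenAI chat-completions format.

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="")

# Hypothetical tool; the schema follows the standard OpenAI "tools" format.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="alibaba/qwen3-next-80b-a3b-thinking",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the arguments arrive as JSON text.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)

An agent loop would execute the requested function, append the result as a tool message, and call the model again until it produces a final answer.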
🎯 Use Cases
- 🔬 Scientific Research: Ideal for deep hypothesis generation and complex data analysis.
- 💻 Engineering & Mathematics: Excels in problem-solving, proofs, and sophisticated code synthesis/debugging.
- ⚖️ Legal Analysis: Supports detailed legal case analysis and structured argument construction.
- 📊 Financial & Business: Aids in financial risk modeling and strategic business planning with transparent decision steps.
- ⚕️ Medical Diagnosis Assistance: Provides reasoning transparency and detailed explanations for diagnostic support.
- 📄 Long-context Document Analysis: Perfect for document analysis and retrieval-augmented workflows requiring deep context (a minimal sketch follows this list).
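As a minimal sketch of the long-context workflow, the call below places a whole document into one request instead of chunking it, relying on the large native window. The file name, the prompt, and the assumption that the text fits under the token limit are all illustrative.

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="")

# Illustrative path; any long text under the ~262K-token window works.
with open("annual_report.txt", encoding="utf-8") as f:
    document = f.read()

response = client.chat.completions.create(
    model="alibaba/qwen3-next-80b-a3b-thinking",
    messages=[
        {"role": "system",
         "content": "Answer strictly from the provided document."},
        {"role": "user",
         "content": f"Document:\n{document}\n\nQuestion: "
                    "Summarize the three biggest risks discussed."},
    ],
)
print(response.choices[0].message.content)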
Code Example
The JavaScript and Python snippets at the top of this page show how to call the Qwen3-Next-80B-A3B Thinking model through an OpenAI-compatible chat completions API. Actual implementation details may vary based on your environment.
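Because the model emits a reasoning trace before its final answer, streaming is often the most practical way to consume responses. The sketch below streams tokens with the same OpenAI SDK. Whether the trace arrives inline in content or in a separate field such as reasoning_content is provider-specific, so the getattr fallback here is an assumption to verify against the AICC docs.

from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="")

stream = client.chat.completions.create(
    model="alibaba/qwen3-next-80b-a3b-thinking",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    # Some OpenAI-compatible gateways expose the thinking trace in a
    # separate `reasoning_content` field (an assumption; check your docs).
    trace = getattr(delta, "reasoning_content", None)
    if trace:
        print(trace, end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)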
↔️ Comparison with Other Leading Models
Vs. Qwen3-32B
Qwen3-Next-80B-A3B activates only 3 billion parameters per token, contrasting with Qwen3-32B's full activation. This makes Qwen3-Next about 10 times more efficient in both training and inference costs. Additionally, it delivers over 10 times faster output speed in long-context scenarios (beyond 32K tokens) while achieving superior accuracy on reasoning and complex tasks.
Vs. Qwen3-235B
Despite having significantly fewer active parameters, Qwen3-Next-80B-A3B remarkably approaches the performance levels of the much larger 235 billion parameter Qwen3-235B, especially in instruction following and long-context reasoning. It strikes an excellent balance between compute efficiency and high model quality, making it highly suitable for production environments.
Vs. Google Gemini-2.5-Flash-Thinking
The Qwen3-Next-80B-A3B Thinking variant demonstrably outperforms Google Gemini-2.5-Flash-Thinking in critical areas like chain-of-thought reasoning and multi-turn instruction tasks. This superior performance comes with substantially lower operational costs, attributed to its sparse activation and multi-token prediction capabilities.
Vs. Llama 3.1-70B
Qwen3-Next-80B-A3B offers enhanced long-range context understanding and superior reasoning stability at much larger context windows (scalable up to 1 million tokens), significantly surpassing Llama 3.1-70B's native window limitations. Its sparse MoE architecture also grants it superior efficiency at scale.
❓ Frequently Asked Questions (FAQ)
Q1: What makes Qwen3-Next-80B-A3B Thinking unique for reasoning tasks?
A1: It's specifically designed with a "Thinking Mode" optimization for complex multi-step problem solving and chain-of-thought, generating structured reasoning traces by default. Its sparse MoE architecture also ensures efficiency without compromising deep analytical capabilities.
Q2: How does the sparse MoE architecture benefit this model?
A2: The sparse Mixture of Experts (MoE) architecture means only 3 billion of its 80 billion parameters are active per token. This significantly reduces inference costs, boosts processing speed, and maintains high throughput, especially for complex reasoning tasks.
Q3: What is the maximum context window supported by Qwen3-Next-80B-A3B Thinking?
A3: The model natively supports an extensive context window of 262K tokens, and with specialized scaling methods, it can be extended up to an impressive 1 million tokens, enabling superior long-context understanding.
Q4: Can Qwen3-Next-80B-A3B Thinking be integrated into agent systems?
A4: Yes, it fully supports function calling and is designed for seamless integration into agent frameworks that require precise, step-by-step analytical solutions.
Q5: How does its performance compare to other large language models like Llama 3.1-70B?
A5: Qwen3-Next-80B-A3B Thinking offers better long-range context understanding and reasoning stability across significantly larger context windows (up to 1 million tokens) compared to Llama 3.1-70B. Its sparse MoE architecture also provides superior efficiency at scale.
Learn how you can transform your company with AICC APIs


