



const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '',
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'openai/gpt-oss-20b',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")


Product Detail
The GPT OSS 20B is an innovative open-weight language model developed by OpenAI, specifically optimized for efficient, local, and specialized AI use cases. It boasts strong reasoning and coding capabilities. This model achieves an excellent balance of high performance and low latency, making it exceptionally well-suited for edge devices and applications that demand rapid iteration or lower computational requirements. Designed with agentic workflows in mind, it provides robust support for chain-of-thought reasoning, function calling, and Python code execution, complete with customizable reasoning effort and structured output capabilities.
🚀 Technical Specifications
- ✔️ Model Size: 20 billion total parameters, with 3.6 billion active parameters during inference.
- ✔️ Compatibility: Engineered to run efficiently within 16GB memory, prioritizing lower latency and local deployment.
- ✔️ Architecture: A text-only model that demonstrates superior instruction following and sophisticated tool usage.
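
As a rough back-of-envelope check on the 16GB figure, assuming the weights are stored at roughly 4 bits per parameter (a common quantization for open-weight models of this class; the exact format is not stated above):

```python
# Rough memory estimate for 20B parameters at ~4 bits per weight (assumed quantization).
total_params = 20e9      # 20 billion total parameters
bytes_per_param = 0.5    # 4 bits ≈ 0.5 bytes per weight

weights_gb = total_params * bytes_per_param / 1e9
print(f"Approximate weight footprint: {weights_gb:.0f} GB")  # ~10 GB of weights
```

Under that assumption, the weights alone occupy about 10 GB, which leaves headroom within a 16GB budget for the KV cache and activations.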
📊 Performance Benchmarks
- 💡 Comparable Performance: Achieves performance levels comparable to OpenAI’s proprietary o3-mini model across numerous reasoning and coding tasks.
- 💡 Efficient Deployment: Highly efficient for deployment on consumer-grade hardware and diverse edge devices.
- 💡 Advanced Learning: Excels in few-shot learning scenarios, complex multi-step reasoning, and robust tool integration.
💰 API Pricing
- 💲 Input Tokens: $0.033233 per million tokens
- 💲 Output Tokens: $0.153248 per million tokens
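
The listed rates translate directly into a per-request cost estimate. The 2,000-input / 500-output token counts below are illustrative values, not figures from this page:

```python
# Per-token rates derived from the listed per-million-token prices.
INPUT_RATE = 0.033233 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.153248 / 1_000_000  # $ per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 2,000-token prompt producing a 500-token completion.
cost = estimate_cost(2_000, 500)
print(f"Estimated cost: ${cost:.6f}")
```

At these rates, even a fairly large request costs a small fraction of a cent, which is consistent with the model's positioning for rapid iteration.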
✨ Key Capabilities
- 🧠 Advanced Reasoning: Offers configurable reasoning effort levels (low, medium, high) to optimally balance accuracy with latency.
- 🤖 Agentic Features: Seamlessly supports function calling, web browsing, code execution, and structured outputs within sophisticated workflows.
- 💻 Code Generation: Highly effective at both producing and editing code across a wide array of programming languages.
- ⚡ Lightweight Deployment: Engineered for efficient operation in resource-constrained environments with modest hardware requirements.
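
As a minimal sketch of how a function-calling request might be assembled using the OpenAI tools schema (the `get_weather` tool and its parameters are hypothetical, and no network call is made here):

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Paris'"},
            },
            "required": ["city"],
        },
    },
}

# The assembled payload would be passed to client.chat.completions.create(...).
request_payload = {
    "model": "openai/gpt-oss-20b",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(request_payload, indent=2))
```

When the model decides to call the tool, the response carries a `tool_calls` entry with the function name and JSON arguments, which your application executes before sending the result back as a `tool` role message.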
🎯 Optimal Use Cases
- 📱 On-device AI: Perfect for applications demanding lightweight yet powerful AI models directly on edge devices.
- 🔄 Rapid Experimentation: Facilitates swift experimentation and iteration in coding and analytical tasks.
- 🛠️ Flexible Integration: Ideal for applications that benefit from adaptable reasoning depth and comprehensive tool integration.
- 🔒 Local/Offline Deployments: An excellent choice for scenarios prioritizing privacy and local data control.
💻 Code Sample
# Example API call using GPT OSS 20B via OpenAI's API client
import openai

client = openai.OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1/",  # Or your custom endpoint for GPT OSS 20B
)

try:
    chat_completion = client.chat.completions.create(
        model="openai/gpt-oss-20b",
        messages=[
            {"role": "system", "content": "You are a helpful and concise assistant."},
            {"role": "user", "content": "Explain the concept of machine learning in one sentence."},
        ],
        temperature=0.7,
        max_tokens=50,
    )
    print(chat_completion.choices[0].message.content)
except Exception as e:
    print(f"An error occurred: {e}")
Note: This Python example illustrates a typical API call for GPT OSS 20B. Ensure your API key and base_url are correctly configured before running it.
⚖️ Comparison with Other Models
vs GPT OSS 120B: GPT OSS 20B operates efficiently with limited hardware (16GB memory), making it ideal for local and rapid deployment with robust reasoning and coding. In contrast, GPT OSS 120B, with its significantly larger capacity (120B parameters), provides higher accuracy and is engineered for large-scale, high-compute tasks.
vs OpenAI o3-mini: GPT OSS 20B demonstrates performance comparable to the proprietary o3-mini model. Its key differentiators are open-weight access and flexible configuration, offering significant benefits to researchers and developers who prioritize transparency and customization.
vs GLM-4.5: While GLM-4.5 may outperform GPT OSS 20B in specific practical coding challenges and advanced tool integration, GPT OSS 20B maintains strong competitiveness in general reasoning tasks and offers easier deployment on hardware with limited resources.
⚠️ Limitations and Considerations
- ❗ Complexity Limit: While more cost-effective than larger models, it is less powerful than GPT OSS 120B for extremely complex tasks.
- ❗ Prompt Design: Optimal outputs are best achieved through explicit and well-designed prompt engineering.
- ❗ Hardware Dependency: Overall performance and latency are directly influenced by the underlying hardware capabilities and the size of the input.
- ❗ Production Safeguards: Due to its open-weight nature, enterprises should implement additional safeguards for production safety, security, and compliance.
❓ Frequently Asked Questions (FAQs)
What is GPT OSS 20B?
GPT OSS 20B is an open-weight language model optimized for efficient, local, and specialized AI use cases, particularly excelling in reasoning and coding tasks. It is built for scenarios requiring a balance of high performance and low latency, especially on edge devices.
What are the hardware requirements?
The model is optimized to run efficiently within 16GB of memory, making it accessible for deployment on consumer-grade hardware and various edge devices without demanding extensive computational resources.
What agentic capabilities does it support?
It offers robust support for agentic features, including configurable chain-of-thought reasoning, reliable function calling, web browsing, Python code execution, and the ability to generate structured outputs within complex automated workflows.
What are its limitations?
While powerful for many applications, GPT OSS 20B is less capable than much larger models such as GPT OSS 120B for extremely complex, large-scale tasks. It shines brightest in resource-constrained environments where efficiency and local deployment are key.
What does the open-weight release mean for developers?
The open-weight nature of GPT OSS 20B gives developers and researchers full access and flexibility for customization and transparency. This benefits those requiring deep insight into model internals, flexible configuration, and the ability to integrate the model into proprietary systems with enhanced control.