Code Sample (Node.js)

const { OpenAI } = require('openai');

const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '', // your AI.CC API key
});

const main = async () => {
  const result = await api.chat.completions.create({
    model: 'x-ai/grok-4-fast-non-reasoning',
    messages: [
      {
        role: 'system',
        content: 'You are an AI assistant who knows everything.',
      },
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const message = result.choices[0].message.content;
  console.log(`Assistant: ${message}`);
};

main();
Code Sample (Python)

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",  # your AI.CC API key
)

response = client.chat.completions.create(
    model="x-ai/grok-4-fast-non-reasoning",
    messages=[
        {
            "role": "system",
            "content": "You are an AI assistant who knows everything.",
        },
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

message = response.choices[0].message.content
print(f"Assistant: {message}")
AI Playground

Test any API model in the sandbox environment before you integrate.
Choose from more than 300 models to build into your app.


Product Detail
Grok 4 Fast Non-Reasoning is a specialized variant of xAI's Grok 4 model, optimized for ultra-high context capacity and rapid text-to-text tasks; it deliberately omits advanced reasoning capabilities. It efficiently handles extremely long contexts of up to 2,000,000 tokens, delivering fast, deterministic outputs ideal for high-throughput applications where extensive context retention is paramount.
Technical Specification
Performance Benchmarks
- Context Window: 2,000,000 tokens
- Max Output: Variable; optimized for streaming and fast responses
- Training Regime: Streamlined for speed and large-context encoding, non-reasoning focused
- Tool Use: Not supported (non-agentic)
Performance Metrics
Grok 4 Fast Non-Reasoning is specifically optimized to handle extremely large context windows up to 2 million tokens, enabling it to process vast amounts of text without losing coherence. While it does not support advanced multi-step reasoning or tool integration, it delivers highly efficient and stable performance in text-to-text generation tasks where context retention over long sequences is critical. Its architecture prioritizes speed and throughput, allowing for rapid response times even with very large inputs. This makes it ideal for applications such as long document summarization, extensive conversational histories, and batch processing where reasoning complexity is not required. The model’s deterministic output further ensures consistent and reliable behavior across repeated requests.
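As a quick illustration of working near that limit, a pre-flight check can estimate whether an input fits the 2,000,000-token window before a request is sent. The sketch below is our own, and the ~4-characters-per-token rule is a rough heuristic, not the model's actual tokenizer:

# Pre-flight sketch: estimate token count before sending a request.
# The ~4 characters-per-token rule is a rough heuristic for English text;
# the real tokenizer may differ, so treat this as a sanity check only.
CONTEXT_WINDOW = 2_000_000  # tokens, per the specification above

def fits_in_context(text: str, reserved_output_tokens: int = 1_000) -> bool:
    estimated_tokens = len(text) // 4  # heuristic, not the actual tokenizer
    return estimated_tokens + reserved_output_tokens <= CONTEXT_WINDOW

document = "word " * 2_000_000  # ~10M characters, ~2.5M estimated tokens
print(fits_in_context(document))  # False: the estimate exceeds the 2M window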
API Pricing
- Input: $0.21 per 1M tokens (context up to 128k); $0.42 per 1M tokens (context over 128k)
- Output: $0.525 per 1M tokens (context up to 128k); $1.05 per 1M tokens (context over 128k)
- Cached Input: $0.05 per 1M tokens (see the worked cost example below)
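To budget against these tiers, the rates can be folded into a small estimator. The helper below is illustrative only; it assumes the whole request is billed at the tier its prompt size falls into, which is a common but unconfirmed reading of such tier tables:

# Illustrative cost estimator for the tiered rates above (USD per 1M tokens).
# Assumption: the entire request is billed at the tier its prompt falls into.
def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    over_128k = input_tokens > 128_000
    input_rate = 0.42 if over_128k else 0.21
    output_rate = 1.05 if over_128k else 0.525
    return (
        input_tokens / 1e6 * input_rate
        + output_tokens / 1e6 * output_rate
        + cached_input_tokens / 1e6 * 0.05  # cached-input rate
    )

# Example: a 1M-token prompt with a 2k-token completion lands in the 128k+ tier
print(f"${estimate_cost(1_000_000, 2_000):.4f}")  # 0.4221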
✨ Key Capabilities
- Ultra-Long Context Windows: Handles up to 2 million tokens for massive document and multi-document processing.
- Rapid Text-to-Text Generation: Optimized for low-latency, time-sensitive applications (see the streaming sketch after this list).
- Deterministic Responses: Ensures stable and consistent output across repeated requests.
- Scalable for API-Driven Environments: Cached-input pricing supports cost-effective, high-volume deployment.
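Since the specification above notes streaming-optimized output, here is a minimal streaming sketch against the same OpenAI-compatible endpoint used in the quick-start samples; stream=True is the standard flag in the OpenAI Python SDK, though end-to-end support ultimately depends on the gateway:

# Minimal streaming sketch: print tokens as they arrive instead of waiting
# for the full completion. Uses the same OpenAI-compatible endpoint as above.
from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

stream = client.chat.completions.create(
    model="x-ai/grok-4-fast-non-reasoning",
    messages=[{"role": "user", "content": "Summarize the history of the telescope."}],
    stream=True,  # tokens arrive incrementally
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()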
💡 Optimal Use Cases
- Large-scale document summarization and analysis across extensive texts.
- Context-rich text completion for lengthy inputs, maintaining coherence.
- Fast-response conversational AI handling extensive historical dialogues.
- Batch text generation in content pipelines requiring consistent context retention.
Code Sample
A long-document summarization example using the same OpenAI-compatible endpoint as the quick-start samples above:

# Long-document summarization with Grok 4 Fast Non-Reasoning.
from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

def process_long_document(document_text: str) -> str:
    response = client.chat.completions.create(
        model="x-ai/grok-4-fast-non-reasoning",
        messages=[
            {"role": "system", "content": "Summarize the following document concisely."},
            {"role": "user", "content": document_text},
        ],
        max_tokens=500,  # cap the summary length
    )
    return response.choices[0].message.content

# Sample usage with an extremely long document string.
# For production, load from a file or database.
long_doc_example = "This is an extremely long document text... (up to 2 million tokens)"
summary = process_long_document(long_doc_example)
print(summary)
Comparison with Other Models
vs. Grok 4: Grok 4 Fast Non-Reasoning trades advanced multi-step reasoning and tool integration for vastly expanded context capacity and faster throughput. It is suitable for applications where reasoning is not critical but context scale and speed are essential.
vs. GPT-4o: Grok 4 Fast Non-Reasoning offers a maximum context length more than an order of magnitude larger than GPT-4o's (2,000,000 vs. 128,000 tokens), though it lacks the multimodal and advanced reasoning features available in GPT-4o.
vs. Grok 4 Fast Reasoning: Grok 4 Fast Non-Reasoning delivers faster responses but omits the multi-step reasoning capabilities of the reasoning-enabled variant.
⚠️ Limitations
- Lacks multi-step reasoning and agentic tool use.
- Text-only modality; no vision or audio processing.
- Closed-weight model without local offline inference capabilities.
- Streaming determinism may vary depending on context size (a mitigation sketch follows this list).
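Where repeatability matters, pinning sampling parameters narrows run-to-run variation. The sketch below sets temperature=0 and a fixed seed; both are standard chat-completion parameters in the OpenAI Python SDK, but whether this gateway propagates seed to the model is an assumption to verify in testing:

# Sketch: reduce run-to-run variation by pinning sampling parameters.
# temperature and seed are standard chat-completion parameters; whether the
# gateway honors seed end-to-end is an assumption, so verify in testing.
from openai import OpenAI

client = OpenAI(base_url="https://api.ai.cc/v1", api_key="YOUR_API_KEY")

response = client.chat.completions.create(
    model="x-ai/grok-4-fast-non-reasoning",
    messages=[{"role": "user", "content": "List three uses of a 2M-token context."}],
    temperature=0,  # greedy-leaning decoding for more repeatable output
    seed=42,        # best-effort determinism where supported
)
print(response.choices[0].message.content)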
❓ Frequently Asked Questions
Q: What is Grok 4 Fast Non-Reasoning primarily optimized for?
A: It is optimized for ultra-high context capacity and rapid text-to-text tasks, especially those requiring the processing of extremely long documents and conversational histories without complex reasoning.
Q: How large of a context window can Grok 4 Fast Non-Reasoning handle?
A: This model is designed to handle an exceptionally large context window of up to 2,000,000 tokens, making it suitable for processing vast amounts of text.
Q: Does Grok 4 Fast Non-Reasoning support advanced reasoning or tool use?
A: No, it specifically omits advanced multi-step reasoning capabilities and agentic tool integration to prioritize speed, throughput, and context scale.
Q: What types of applications benefit most from this model?
A: Applications such as large-scale document summarization, context-rich text completion, fast-response conversational AI, and batch text generation benefit most; in short, anywhere context retention over long sequences is crucial and complex reasoning is not required.
Q: How does its pricing compare to other models for large contexts?
A: For contexts over 128k tokens, its input pricing is $0.42 per 1M tokens and output pricing is $1.05 per 1M tokens, offering efficient rates for handling extensive data volumes. Cached input is even more cost-effective at $0.05 per 1M tokens.