



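```javascript
const axios = require('axios').default;

// axios.create() is a factory function, so no `new` is needed.
const api = axios.create({
  baseURL: 'https://api.ai.cc/v1',
  // Replace <YOUR_API_KEY> with your own API key.
  headers: { Authorization: 'Bearer <YOUR_API_KEY>' },
});

const main = async () => {
  const response = await api.post('/stt', {
    model: 'openai/gpt-4o-mini-transcribe',
    url: 'https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3',
  });

  console.log('[transcription]', response.data.results.channels[0].alternatives[0].transcript);
};

// Report request or parsing errors instead of leaving an unhandled rejection.
main().catch((err) => console.error(err.response?.data ?? err.message));
```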
```python
import requests

# Replace <YOUR_API_KEY> with your own API key.
headers = {"Authorization": "Bearer <YOUR_API_KEY>"}


def main():
    url = "https://api.ai.cc/v1/stt"
    data = {
        "model": "openai/gpt-4o-mini-transcribe",
        "url": "https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3",
    }

    # A timeout keeps the script from hanging if the service does not respond.
    response = requests.post(url, json=data, headers=headers, timeout=60)

    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
    else:
        response_data = response.json()
        transcript = response_data["results"]["channels"][0]["alternatives"][0]["transcript"]
        print("[transcription]", transcript)


if __name__ == "__main__":
    main()
```
🎙️ Introducing GPT-4o Mini Transcribe API
The GPT-4o Mini Transcribe API from OpenAI is a speech-to-text model built for accuracy and efficiency. It is a lighter, faster variant of the full GPT-4o Transcribe model, optimized for low latency and lower resource consumption while keeping transcription quality high. It suits developers who need fast, reliable speech recognition across diverse and challenging acoustic environments.
⚙️ Technical Specifications
- Model Type: Speech-to-text transcription model
- Architecture Basis: Built on the GPT-4o mini architecture and pretrained on specialized audio-centric datasets
- Token Context Window: Up to 16,000 tokens, supporting long audio inputs
- Maximum Output Tokens: Up to 2,000 tokens per transcription
- Training Data: Diverse, high-quality audio datasets including various accents, noise conditions, and speech speeds
- Training Techniques: Supervised fine-tuning and reinforcement learning to minimize word error rate and hallucinations
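As a rough sizing aid, the 2,000-token output cap can be translated into an approximate maximum audio length per request. The sketch below rests on two assumptions that do not come from this page: a speaking rate of about 150 words per minute, and OpenAI's general rule of thumb of roughly 0.75 words per token (about 4/3 tokens per word). Real speech varies widely, and the function names are hypothetical.

```python
import math

# Assumptions (not from this page): ~150 spoken words per minute and
# ~0.75 words per token, i.e. ~4/3 tokens per word.
ASSUMED_WORDS_PER_MINUTE = 150
ASSUMED_TOKENS_PER_WORD = 4 / 3
MAX_OUTPUT_TOKENS = 2_000  # output cap listed in the specifications above


def max_audio_minutes(
    max_output_tokens: int = MAX_OUTPUT_TOKENS,
    words_per_minute: float = ASSUMED_WORDS_PER_MINUTE,
    tokens_per_word: float = ASSUMED_TOKENS_PER_WORD,
) -> float:
    """Rough audio length whose transcript fits within the output-token cap."""
    output_tokens_per_minute = words_per_minute * tokens_per_word
    return max_output_tokens / output_tokens_per_minute


def segments_needed(total_minutes: float) -> int:
    """Number of segments so each segment's transcript stays under the cap."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return math.ceil(total_minutes / max_audio_minutes())
```

Under these assumptions, `max_audio_minutes()` comes out to about 10 minutes, so a 45-minute recording would need 5 segments. Faster or denser speech needs shorter segments. This page also does not say how audio maps to input tokens, so the 16,000-token context window could be the tighter limit.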
📊 Performance Benchmarks
- Word Error Rate (WER): Significantly lower than earlier Whisper models and similar baselines
- Reliability: Performs robustly with background noise, diverse accents, and varying speech speeds
- Language Recognition: Improved accuracy and language understanding across multiple languages
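Word Error Rate is the standard metric behind these comparisons. It counts the word substitutions, deletions, and insertions needed to turn the model's transcript into a reference transcript, then divides by the number of reference words. The page gives no numeric WER. If you want to check accuracy on your own recordings, the sketch below computes WER as a word-level Levenshtein distance. It uses plain whitespace tokenization and applies no normalization (casing, punctuation, numerals), which published benchmarks usually do before scoring. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the previous reference prefix
    # and the first j hypothesis words (row 0: empty reference prefix).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, scoring "the cat sit on mat" against the reference "the cat sat on the mat" finds one substitution and one deletion, for a WER of 2/6.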
✨ Key Features
- Efficiency: A lightweight model offering rapid inference times for quick transcription turnaround.
- Robustness: Handles challenging audio well, including background noise, varied accents, and speech variations.
- Scalability: Transcribes lengthy audio inputs without losing context, thanks to its 16,000-token context window.
- Streaming Capability: Provides support for continuous audio streaming and real-time transcription.
- Customizable Integration: Designed for seamless integration into various applications such as voice agents, call centers, transcription services, and meeting management tools.
💸 GPT-4o Mini Transcribe API Pricing
Cost: $0.63 per 1M input tokens
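The listed price is a simple per-token rate, so estimating spend is straightforward arithmetic: input tokens ÷ 1,000,000 × $0.63. The sketch below covers input tokens only, because that is the only rate this page lists. The page also does not state how many tokens a minute of audio uses, so the token count must come from elsewhere. The function name is hypothetical.

```python
# Rate from the pricing line above (USD per 1M input tokens).
PRICE_PER_MILLION_INPUT_TOKENS_USD = 0.63


def estimate_input_cost_usd(
    input_tokens: int,
    price_per_million: float = PRICE_PER_MILLION_INPUT_TOKENS_USD,
) -> float:
    """Cost in USD for a number of input tokens at a per-1M-token rate."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million
```

For example, an input that fills the full 16,000-token context window would cost 16,000 ÷ 1,000,000 × $0.63 = $0.01008.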
🎯 Practical Use Cases
- Customer Service: Call transcription and analytics for improved service and insights.
- Productivity: Automated note-taking for meetings and conferences.
- Voice Assistants: Transcription for voice assistants and voice agents.
- Specialized Transcription: Services for legal and medical dictation.
💻 Code Sample
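The sketch below wraps the same `/stt` request from the samples at the top of this page in reusable functions. It makes three changes:

- It swaps `requests` for the standard library's `urllib.request`, so it has no third-party dependency.
- It adds a timeout.
- It raises a clear error if the response does not match the `results.channels[0].alternatives[0].transcript` shape those samples read.

The function names (`build_stt_request`, `extract_transcript`, `transcribe`) are hypothetical. The response shape is taken from the samples, not from an API reference.

```python
import json
import urllib.request

# Endpoint and model ID taken from the samples at the top of this page.
STT_URL = "https://api.ai.cc/v1/stt"
MODEL_ID = "openai/gpt-4o-mini-transcribe"


def build_stt_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for the /stt endpoint (not sent yet)."""
    body = json.dumps({"model": MODEL_ID, "url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_transcript(payload: dict) -> str:
    """Pull the transcript out of the response shape used in the samples."""
    try:
        return payload["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {payload!r}") from exc


def transcribe(audio_url: str, api_key: str, timeout: float = 60.0) -> str:
    """Send the request and return the transcript text."""
    request = build_stt_request(audio_url, api_key)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_transcript(payload)
```

For example, `transcribe("https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3", "<YOUR_API_KEY>")` would return the transcript string. Keeping request building separate from response parsing lets you test both without making a network call.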
⚖️ Comparison with Other Models
vs. GPT-4o Transcribe
GPT-4o Mini Transcribe excels in low-latency applications where speed is paramount. The full GPT-4o Transcribe model, in contrast, is better suited to accuracy-critical settings such as legal or medical transcription, where even minor errors can have significant consequences.
vs. OpenAI Whisper-Large
GPT-4o Mini Transcribe outperforms Whisper-Large on both Word Error Rate (WER) and streaming latency. This advantage is largely attributed to its reinforcement learning techniques and specialized audio training. Whisper is a more general-purpose model and typically processes audio more slowly and less precisely when faced with noisy audio or accented speech.
vs. ElevenLabs Scribe
Both models are highly capable at streaming transcription. Some third-party tests suggest that ElevenLabs Scribe may match or slightly exceed GPT-4o Mini Transcribe on certain accuracy benchmarks. However, GPT-4o Mini Transcribe's speed and its seamless integration with OpenAI's ecosystem remain significant competitive advantages.
❓ Frequently Asked Questions (FAQ)
Q1: What is GPT-4o Mini Transcribe API designed for?
A: It's designed for highly accurate and efficient speech-to-text transcription, optimized for low latency and reduced resource consumption, making it ideal for real-time applications and developers needing quick, reliable audio processing.
Q2: How does it compare to the full GPT-4o Transcribe model?
A: GPT-4o Mini Transcribe prioritizes speed and efficiency for low-latency uses, while the full GPT-4o Transcribe focuses on maximum accuracy for critical applications like legal or medical transcription.
Q3: Can GPT-4o Mini Transcribe handle noisy audio or different accents?
A: Yes, it is built with robust capabilities to perform reliably in challenging acoustic environments, effectively handling background noise, diverse accents, and varying speech speeds.
Q4: What are the primary use cases for this API?
A: Key use cases include customer service call transcription and analytics, meeting and conference note-taking, powering voice assistants, and specialized services like legal and medical dictation.
Q5: Is streaming transcription supported?
A: Absolutely. GPT-4o Mini Transcribe supports continuous audio streaming and provides real-time transcription capabilities.