



// Node.js: request a spoken answer and save it as answer.wav
import { writeFileSync } from 'node:fs';
import OpenAI from 'openai';

const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '', // insert your API key here
});

const main = async () => {
  const answer = await api.chat.completions.create({
    model: 'openai/gpt-audio-mini',
    modalities: ['text', 'audio'],
    audio: { voice: 'alloy', format: 'wav' },
    messages: [
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  console.log(answer.choices[0]);

  // The audio arrives base64-encoded; decode it to raw bytes before writing.
  // No text encoding option is passed because the payload is binary.
  writeFileSync(
    'answer.wav',
    Buffer.from(answer.choices[0].message.audio.data, 'base64'),
  );
};

main();

# Python: request a spoken answer and save it as answer.wav
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="",  # insert your API key here
)

response = client.chat.completions.create(
    model="openai/gpt-audio-mini",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

print(response.choices[0])

# The audio arrives base64-encoded; decode it and write the raw bytes.
wav_bytes = base64.b64decode(response.choices[0].message.audio.data)
with open("answer.wav", "wb") as f:
    f.write(wav_bytes)
Product Detail
🚀 Introducing GPT Audio Mini API: Real-Time Speech Synthesis for Modern Applications
The GPT Audio Mini is a cutting-edge, lightweight variant within the GPT Audio family, specifically designed for highly efficient, low-latency speech generation. This powerful model is perfectly suited for real-time interactive applications, including advanced voice assistants, intelligent chatbots, and dictation software, where instantaneous responsiveness and minimal resource consumption are paramount. It expertly balances high-quality audio output with exceptional speed, making it an ideal solution for deployment on edge devices or in services with restricted computational capabilities.
⚙️ Technical Specifications
- Model Type: Lightweight autoregressive neural TTS (Text-to-Speech) model
- Parameter Count: Approximately 100 million parameters
- Input Modalities: Text input sequences
- Output Modalities: Audio waveform generation
- Sampling Rate: 24 kHz standard output quality
- Latency: Average response time under 100 ms on typical edge devices
- Supported Languages: English (primary), with planned multilingual support
- Model Architecture: Modified transformer-based encoder-decoder
- Hardware Compatibility: CPU and GPU optimized for inference on mainstream consumer devices
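One way to check the 24 kHz figure against a file you have saved (for example the answer.wav written by the code sample) is to read the WAV header. The following is a minimal sketch using Python's standard wave module; the describe_wav helper is a hypothetical name, and it assumes the file is a well-formed PCM WAV with a correct frame count in its header.

```python
import io
import wave


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic header facts for a PCM WAV payload (hypothetical helper)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()   # samples per second, e.g. 24000
        frames = wav.getnframes()   # number of sample frames in the data chunk
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Assumes answer.wav exists in the working directory.
    with open("answer.wav", "rb") as f:
        print(describe_wav(f.read()))
```

If the reported sample rate differs from 24000, the header of the file you received is the value to trust for playback.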
📊 Performance Benchmarks
- Speech Naturalness: MOS (Mean Opinion Score) around 4.1/5 in user tests
- Latency Comparison: 30-40% faster than full-scale GPT-Audio on standard hardware
- Resource Usage: Operates at 50-60% lower RAM consumption than the GPT-Audio base model
- Robustness: Maintains intelligibility with up to 15 dB background noise
✨ Key Features of GPT Audio Mini
- Low-Latency Speech Synthesis: Optimized architecture ensures minimal delay for real-time interaction.
- Resource-Efficient Design: Engineered for low power consumption and a reduced memory footprint, perfect for constrained environments.
- Versatile Voice Generation: Capable of producing natural-sounding speech across diverse styles and contexts.
- Compact Model Size: Facilitates easy integration into lightweight environments and mobile platforms.
- Robust in Noisy Scenarios: Maintains exceptional clarity and intelligibility even under challenging acoustic conditions.
- Customizable Voice Outputs: Allows for fine-tuning to align with specific brand voices or application-specific requirements.
💰 GPT Audio Mini API Pricing
- Input: $10.50 / 1M audio tokens; $0.63 / 1M text tokens
- Output: $21.00 / 1M audio tokens; $2.52 / 1M text tokens
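To turn these rates into a per-request figure, the arithmetic can be sketched as follows. This assumes the $21.00 output rate applies to audio tokens and the $2.52 rate to text tokens, mirroring the input line; the estimate_cost function and the token counts in the usage lines are hypothetical, for illustration only.

```python
# USD per 1M tokens, taken from the pricing list.
# Assumption: $21.00 is the audio-output rate and $2.52 the text-output rate.
PRICE_PER_MILLION = {
    "text_in": 0.63,
    "audio_in": 10.50,
    "text_out": 2.52,
    "audio_out": 21.00,
}


def estimate_cost(text_in: int = 0, audio_in: int = 0,
                  text_out: int = 0, audio_out: int = 0) -> float:
    """Estimate the USD cost of one request from its token counts."""
    counts = {
        "text_in": text_in,
        "audio_in": audio_in,
        "text_out": text_out,
        "audio_out": audio_out,
    }
    total = sum(counts[kind] * PRICE_PER_MILLION[kind] for kind in counts)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1,200 text tokens in, 3,000 audio tokens out.
    cost = estimate_cost(text_in=1_200, audio_out=3_000)
    print(f"Estimated cost: ${cost:.6f}")
```

Audio output dominates the total in most spoken replies, so the audio token count is the number worth watching.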
💡 Common Use Cases
- Voice Assistants: Enabling responsive and natural voice replies with minimal delays.
- Customer Support Bots: Delivering clear and engaging speech synthesis for call centers and online chat platforms.
- Dictation Applications: Reading dictated text back to the user in real time for an enhanced user experience.
- Interactive Educational Tools: Generating dynamic speech output for tutoring or language learning programs.
- Accessibility Tools: Powering assistive technologies for users with visual or motor impairments.
- IoT Devices: Integrating voice-enabled capabilities into smart devices with constrained hardware resources.
💻 Code Sample
The Node.js and Python samples for this section appear at the top of the page.
🆚 Comparison with Other Leading Models
vs GPT-4o Mini TTS: GPT-4o Mini TTS offers finer control over intonation and style through voice print decoupling, which yields slightly more natural and expressive speech. GPT Audio Mini is instead optimized for a slightly faster response time and a smaller memory footprint, making it ideal for edge computing.
vs OpenAI TTS-1: GPT Audio Mini significantly outperforms TTS-1 in generation speed and maintains a higher overall speech naturalness. While TTS-1 aims for fast synthesis, GPT Audio Mini combines speed with improved audio clarity, making it more suitable for demanding interactive voice assistant applications.
vs OpenAI Whisper: OpenAI Whisper is a speech recognition (speech-to-text) model that excels in multi-language support and transcription accuracy; it does not synthesize speech. GPT Audio Mini is tailored for interactive scenarios requiring rapid voice generation, with a primary focus on English and upcoming multilingual features.
vs ElevenLabs Turbo: ElevenLabs Turbo prioritizes speed but relies exclusively on cloud inference and lacks offline support. GPT Audio Mini delivers comparable quality while offering full on-device privacy and superior cross-platform portability.
❓ Frequently Asked Questions (FAQ)
Q: What is the primary purpose of GPT Audio Mini?
A: GPT Audio Mini is engineered for efficient, low-latency speech generation, targeting real-time interactive applications like voice assistants and chatbots where responsiveness and resource economy are crucial.
Q: How does GPT Audio Mini achieve low latency?
A: It utilizes an optimized architecture that minimizes processing delays, resulting in an average response time of under 100 milliseconds on typical edge devices.
Q: Is GPT Audio Mini suitable for devices with limited resources?
A: Yes, it's designed to be resource-efficient, operating with 50-60% lower RAM consumption than the base GPT-Audio model, making it ideal for edge deployments and IoT devices.
Q: Can GPT Audio Mini be customized for specific voice styles?
A: Absolutely. It offers customizable voice outputs, allowing fine-tuning to match brand voices or specific application needs.
Q: What languages does GPT Audio Mini support?
A: Currently, it primarily supports English, with plans for expanded multilingual support in future updates.
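To see how the latency figure quoted in this FAQ compares with your own setup, you can time a few requests yourself. The following is a minimal sketch using only the Python standard library; measure_latency_ms is a hypothetical helper, and it times whatever callable you pass in. When that callable is an API request, the result includes network round-trip time, so it is not directly comparable to an on-device number.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 5) -> dict:
    """Time `call` several times and summarize wall-clock latency in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "runs": runs,
        "mean_ms": statistics.mean(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a function that sends
    # one request to the API.
    print(measure_latency_ms(lambda: time.sleep(0.05), runs=3))
```

Reporting the maximum alongside the mean helps surface occasional slow responses that an average would hide.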