GPT-4o Mini Audio Preview

Context: 128K · Type: Chat

GPT-4o Mini Audio adds speech-to-text and text-to-speech abilities to the efficient GPT-4o Mini model, optimized for voice interfaces in smaller applications.

Free $1 Tokens for New Members

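Text to Speech

JavaScript

```javascript
import { writeFileSync } from 'node:fs';
import OpenAI from 'openai';

// Point the official OpenAI SDK at the AI/ML API endpoint.
const api = new OpenAI({
  baseURL: 'https://api.ai.cc/v1',
  apiKey: '<YOUR_API_KEY>', // replace with your AI/ML API key
});

const main = async () => {
  const answer = await api.chat.completions.create({
    model: 'gpt-4o-mini-audio-preview',
    modalities: ['text', 'audio'], // return both a transcript and audio
    audio: { voice: 'alloy', format: 'wav' },
    messages: [
      {
        role: 'user',
        content: 'Tell me, why is the sky blue?',
      },
    ],
  });

  const { audio } = answer.choices[0].message;
  console.log(audio.transcript); // text version of the spoken answer

  // The audio arrives base64-encoded; decode it and write the raw bytes.
  writeFileSync('answer.wav', Buffer.from(audio.data, 'base64'));
};

main().catch(console.error);
```

Python

```python
import base64

from openai import OpenAI

# Point the official OpenAI SDK at the AI/ML API endpoint.
client = OpenAI(
    base_url="https://api.ai.cc/v1",
    api_key="<YOUR_API_KEY>",  # replace with your AI/ML API key
)

response = client.chat.completions.create(
    model="gpt-4o-mini-audio-preview",
    modalities=["text", "audio"],  # return both a transcript and audio
    audio={"voice": "alloy", "format": "wav"},
    messages=[
        {
            "role": "user",
            "content": "Tell me, why is the sky blue?",
        },
    ],
)

audio = response.choices[0].message.audio
print(audio.transcript)  # text version of the spoken answer

# The audio arrives base64-encoded; decode it before writing the WAV file.
wav_bytes = base64.b64decode(audio.data)
with open("answer.wav", "wb") as f:
    f.write(wav_bytes)
```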
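The samples above send text and receive speech. The model can also take speech as input. The sketch below assumes the AI/ML API endpoint forwards OpenAI's `input_audio` content part unchanged (OpenAI's Chat Completions API accepts base64-encoded `wav` or `mp3` data in this form); the helper names `build_audio_question` and `ask_with_audio` are hypothetical.

```python
import base64


def build_audio_question(wav_bytes: bytes, prompt: str) -> list:
    """Build a Chat Completions `messages` list pairing a text prompt with a
    base64-encoded WAV clip (OpenAI-style `input_audio` content part)."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "input_audio",
                    "input_audio": {"data": encoded, "format": "wav"},
                },
            ],
        }
    ]


def ask_with_audio(client, wav_path: str, prompt: str) -> str:
    """Send a WAV file to the model and return its text reply.
    `client` is an `openai.OpenAI` instance configured for the AI/ML API."""
    with open(wav_path, "rb") as f:
        messages = build_audio_question(f.read(), prompt)
    response = client.chat.completions.create(
        model="gpt-4o-mini-audio-preview",
        modalities=["text"],  # text-only reply; add "audio" to get speech back
        messages=messages,
    )
    return response.choices[0].message.content
```

For example, `ask_with_audio(client, "question.wav", "Transcribe this recording.")` would return the model's text answer for a hypothetical local file `question.wav`. Keeping the payload builder separate from the network call makes it easy to inspect or test the request without an API key.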


✨ Introducing GPT-4o Mini Audio: Efficient & Versatile Speech AI

Designed for developers who need fast, natural-sounding, and efficient speech applications, GPT-4o Mini Audio accepts speech input and produces speech output. At just 25% of the cost of the full GPT-4o Audio models, it substantially lowers the cost of building voice-driven applications and makes advanced audio AI widely accessible.

Source information derived from: Original GPT-4o Mini Audio Description

💡 Key Capabilities of GPT-4o Mini Audio

💬 Real-Time Voice Interaction: Seamlessly processes and generates both voice and text responses for dynamic conversations.
📦 Lightweight Deployment: Optimized for resource-constrained environments, ensuring broad compatibility.
🌐 Multilingual Audio Support: Advanced speech recognition across 50+ languages.
⚡ Fast Response Time: Engineered for low-latency interactions crucial for real-time applications.
💰 Cost Efficiency: Remarkably budget-friendly, operating at just 25% of the cost of GPT-4o Audio models.
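For multi-turn voice conversations, OpenAI's Chat Completions API lets a follow-up request refer to an earlier spoken reply by its audio ID (`message.audio.id`) instead of resending the audio, until that ID expires. The sketch below assumes the AI/ML API endpoint passes this field through unchanged; the helper name `add_follow_up` is hypothetical.

```python
def add_follow_up(messages: list, audio_id: str, user_text: str) -> list:
    """Return a new message history that references the assistant's previous
    audio reply by ID and appends the user's next turn.
    The input list is not modified."""
    return messages + [
        {"role": "assistant", "audio": {"id": audio_id}},
        {"role": "user", "content": user_text},
    ]
```

A typical call after the first response would be `history = add_follow_up(history, response.choices[0].message.audio.id, "Can you say that more simply?")`, followed by another `chat.completions.create` request with `messages=history`.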

🎯 Intended Use Cases

📱 Voice Assistants on Mobile: Powering low-resource smart agents for seamless mobile experiences.
🧑‍🦯 Accessibility Features: Enhancing user accessibility through advanced voice control and feedback systems.
💡 Embedded IoT Tools: Integrating sophisticated audio AI into smart devices and IoT ecosystems.

⚙️ Technical Deep Dive

Architecture

Derived from the full GPT-4o model through sophisticated model distillation techniques, GPT-4o Mini Audio maintains a robust Transformer-based architecture. It is specifically optimized for audio tasks, incorporating advanced Voice Activity Detection (VAD) layers for precise audio segmentation and processing.
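The internal design of the model's VAD layers is not documented. As a generic illustration of what voice activity detection does, the sketch below uses a simple frame-energy threshold, a classical technique swapped in for explanation and not GPT-4o's method, to split a mono signal into speech segments. The frame size and threshold defaults are arbitrary assumptions.

```python
from typing import List, Sequence, Tuple


def detect_speech(
    samples: Sequence[float],
    frame_size: int = 160,
    threshold: float = 0.01,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames whose mean energy
    exceeds `threshold`. A toy energy-based VAD for illustration only."""
    segments = []
    start = None
    for i in range(0, len(samples), frame_size):
        frame = samples[i : i + frame_size]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy > threshold and start is None:
            start = i  # speech begins at this frame
        elif energy <= threshold and start is not None:
            segments.append((start, i))  # speech ended before this frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # speech runs to the end
    return segments
```

At a 16 kHz sample rate, the default 160-sample frame is 10 ms. Production VADs typically add smoothing (hangover frames) or learned classifiers so that short pauses and background noise do not fragment segments.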

Training Data

The model leverages a vast and diverse training dataset, including:

Comprehensive multilingual speech corpora.
Synthetic voice data covering various accents and tones to enhance robustness.
Extensive publicly available audiobooks, podcasts, and conversational datasets.

This training data comprises hundreds of hours of high-quality audio recordings combined with billions of text tokens, ensuring robust multimodal performance.

Knowledge Cutoff

The model's knowledge is current up to October 2023. It has no built-in real-time web search, so it cannot draw on information published after that date.

📈 Performance Benchmarks

Accuracy

GPT-4o Mini Audio reports strong results on two key metrics:

Speech-to-Text Transcription: Achieves a low Word Error Rate (WER) of 6.5%.
Text-to-Speech Synthesis: Fidelity and natural-intonation scores exceed 92%.
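Word Error Rate counts the substitutions (S), deletions (D), and insertions (I) needed to turn a reference transcript into the model's output, divided by the number of reference words (N): WER = (S + D + I) / N. A 6.5% WER therefore means roughly 6.5 word errors per 100 reference words. The sketch below computes WER with a word-level edit distance; the function name is illustrative, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed with a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, `word_error_rate("why is the sky blue", "why is sky blew")` returns 0.4: one deletion ("the") plus one substitution ("blue" to "blew") over five reference words.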

Speed

It processes asynchronous audio tasks with an average latency of 420 milliseconds per second of input audio, which makes it well suited to near-real-time applications.
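An average of 420 ms per second of input audio corresponds to a real-time factor (RTF) of 0.42, so a clip is processed in well under its own duration. As a rough planning aid, the sketch below scales that quoted average linearly with clip length. Linear scaling is an assumption, and the estimate ignores network and output-generation time; the constant and function names are illustrative.

```python
# Average quoted above: 420 ms of processing per 1 s of input audio.
AVG_LATENCY_S_PER_AUDIO_S = 0.420


def estimated_latency_s(clip_seconds: float,
                        rtf: float = AVG_LATENCY_S_PER_AUDIO_S) -> float:
    """Rough processing-time estimate, assuming latency scales linearly with
    input duration (real-time factor = processing time / audio duration)."""
    if clip_seconds < 0:
        raise ValueError("clip duration cannot be negative")
    return clip_seconds * rtf


print(f"{estimated_latency_s(10):.1f} s")  # a 10 s clip -> prints "4.2 s"
```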

Robustness

The model effectively handles diverse accents, dialects, and noisy environments. However, it may exhibit reduced accuracy when confronted with highly specialized jargon or in low-resource languages.

🚀 Integration & Usage

Code Samples

GPT-4o Mini Audio is available on the AI/ML API platform under the model identifier "gpt-4o-mini-audio-preview", as used in the samples above.

API Documentation

For comprehensive guidelines and integration details, refer to the detailed API Documentation available on the AI/ML API website.

⚖️ Ethical Considerations & Licensing

Ethical Guidelines

OpenAI has diligently incorporated ethical considerations throughout the model's development, with a strong focus on safety and bias mitigation. While the model integrates OpenAI’s bias mitigation framework, it's important to note that it may still reflect biases inherent in its training data sources, particularly concerning underrepresented languages or accents.

Licensing

GPT-4o Mini Audio is available under commercial usage rights, enabling businesses and developers to seamlessly integrate the model into their applications and services.

❓ Frequently Asked Questions (FAQs)

Q: What is GPT-4o Mini Audio?

A: GPT-4o Mini Audio is a highly cost-effective and efficient version of GPT-4o Audio, designed for quick, low-resource speech applications with real-time audio input and output capabilities. It costs only 25% of the full GPT-4o Audio models.

Q: How does GPT-4o Mini Audio compare in cost?

A: It operates at a significantly lower cost, specifically 25% of the price of the full GPT-4o Audio models, making advanced audio AI more accessible for budget-conscious projects.

Q: What are the primary use cases for this model?

A: Ideal for mobile voice assistants, accessibility features (voice control), and embedded AI in IoT devices due to its lightweight and efficient nature.

Q: Does it support multiple languages?

A: Yes, GPT-4o Mini Audio features robust multilingual audio support, offering speech recognition in over 50 languages.

Q: What is the knowledge cutoff for GPT-4o Mini Audio?

A: Its knowledge is current up to October 2023. It has no built-in real-time web search, so it cannot draw on information published after that date.
