



// Node.js example: request speech and stream it to a file.
const fs = require('fs');
const path = require('path');
const axios = require('axios').default;

// axios.create is a factory function, so it is called without `new`.
const api = axios.create({
  baseURL: 'https://api.ai.cc/v1',
  // Append your API key after "Bearer ".
  headers: { Authorization: 'Bearer ' },
});

const main = async () => {
  const response = await api.post(
    '/tts',
    {
      model: 'minimax/speech-2.8-hd',
      text: 'Hi! What are you doing today?',
      voice_setting: {
        voice_id: 'Wise_Woman',
      },
    },
    { responseType: 'stream' },
  );

  // Pipe the streamed audio into a file next to this script.
  const dist = path.resolve(__dirname, './audio.wav');
  const writeStream = fs.createWriteStream(dist);
  response.data.pipe(writeStream);
  writeStream.on('close', () => console.log('Audio saved to:', dist));
};

main().catch((err) => console.error('Request failed:', err.message));

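# Python example: request speech and stream it to a file.
import os
import requests


def main():
    url = "https://api.ai.cc/v1/tts"
    headers = {
        # Append your API key after "Bearer ".
        "Authorization": "Bearer ",
    }
    payload = {
        "model": "minimax/speech-2.8-hd",
        "text": "Hi! What are you doing today?",
        "voice_setting": {
            "voice_id": "Wise_Woman"
        }
    }
    response = requests.post(url, headers=headers, json=payload, stream=True)
    # Fail early rather than saving an error body as audio.
    response.raise_for_status()

    # Write the streamed audio in chunks to a file next to this script.
    dist = os.path.join(os.path.dirname(__file__), "audio.wav")
    with open(dist, "wb") as write_stream:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                write_stream.write(chunk)
    print("Audio saved to:", dist)


if __name__ == "__main__":
    main()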

Speech 2.8 HD
MiniMax Speech 2.8 HD is a high-definition text-to-speech model built for scenarios where audio quality, tonal depth, and realism are the top priorities.
What Is the MiniMax Speech 2.8 HD API?
MiniMax Speech 2.8 HD is the high-fidelity variant of the Speech 2.8 series, designed to produce broadcast-quality audio with rich timbre and expressive nuance. Instead of optimizing for speed, it emphasizes clarity, consistency, and depth across longer audio segments.
The model is based on an autoregressive Transformer architecture combined with a Flow-VAE decoder, enabling more detailed waveform generation and smoother transitions between phonemes and phrases. It has also performed strongly in blind listening evaluations, where listeners consistently rated its output as more natural than that of competing systems.
Performance Overview
API Pricing
- $130 per 1M characters
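To make the rate concrete, here is a minimal Python sketch that estimates the cost of a script from its length. It assumes billing is proportional to the raw character count of the submitted text; how whitespace, punctuation, or multi-byte characters are counted is not stated here, so treat the result as an approximation. The function name estimate_cost_usd is illustrative only.

```python
# Rate taken from the pricing line above: $130 per 1,000,000 characters.
PRICE_PER_MILLION_CHARS_USD = 130.0


def estimate_cost_usd(text: str) -> float:
    """Estimate synthesis cost, assuming billing by raw character count."""
    return len(text) * PRICE_PER_MILLION_CHARS_USD / 1_000_000


if __name__ == "__main__":
    sample = "Hi! What are you doing today?"
    print(f"{len(sample)} characters cost about ${estimate_cost_usd(sample):.6f}")
```

At this rate, a 50,000-character chapter would come to roughly $6.50 under the same assumption.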
Core Capabilities
High-Fidelity Voice Rendering
The defining strength of the HD model is its ability to reproduce subtle vocal characteristics, including breath, emphasis, and tonal variation. Speech feels less compressed and more spatially consistent, which is particularly noticeable in long-form narration.
Expressive Emotion Control
Emotion is deeply integrated into the synthesis process. Rather than adjusting tone superficially, the model modifies prosody, pacing, and emphasis to reflect an emotional intent such as calm, happy, or dramatic delivery.
Voice Cloning and Identity Consistency
The system supports voice cloning using short reference samples, allowing it to recreate a consistent voice identity across different scripts. Even with minimal input, it maintains recognizable vocal traits, improving continuity in serialized content.
Multilingual Speech Generation
MiniMax Speech 2.8 HD supports 30+ languages, maintaining pronunciation accuracy and tonal consistency across them.
Voice Control and Audio Customization
Fine-Grained Speech Parameters
The model provides predictable control over delivery characteristics. Speed, pitch, and volume can be adjusted within wide ranges while preserving natural articulation.
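As a rough sketch of how such overrides might be passed, the helper below extends the voice_setting object used in the request examples. The field names speed, pitch, and vol are assumptions and are not confirmed by this page, and no value ranges are enforced because none are given here; check the API reference before relying on them.

```python
def build_payload(text, voice_id="Wise_Woman", speed=None, pitch=None, vol=None):
    """Build a /tts request body with optional delivery overrides.

    The keys "speed", "pitch", and "vol" are assumed names, not confirmed ones.
    """
    voice_setting = {"voice_id": voice_id}
    # Include only the overrides that were explicitly set, so the service
    # defaults apply to everything else.
    for key, value in (("speed", speed), ("pitch", pitch), ("vol", vol)):
        if value is not None:
            voice_setting[key] = value
    return {
        "model": "minimax/speech-2.8-hd",
        "text": text,
        "voice_setting": voice_setting,
    }


if __name__ == "__main__":
    print(build_payload("Welcome back to the show.", speed=0.9, vol=1.2))
```

Leaving unset parameters out of the body, rather than sending guesses, keeps the request valid even if a default differs from what the caller expects.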
Structured Pauses and Timing
Custom pause markers allow precise control over pacing. This is particularly useful in narration, where rhythm and timing directly affect listener engagement.
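The sketch below shows one way to place a fixed pause between sentences before sending text to the API. The marker syntax `<#seconds#>` is an assumption here, since this page does not spell it out, and the add_pauses helper is purely illustrative; substitute the marker format given in the API reference.

```python
import re


def add_pauses(text: str, seconds: float = 0.5) -> str:
    """Join sentences with a pause marker (assumed syntax: <#seconds#>)."""
    marker = f"<#{seconds:.2f}#>"
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return marker.join(sentences)


if __name__ == "__main__":
    print(add_pauses("Hi! What are you doing today?", seconds=0.5))
```

The marked-up string would then be sent as the text field of the request in place of the plain script.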
Multiple Output Formats
Audio can be generated in formats such as WAV, MP3, FLAC, or PCM, with configurable bitrate and sampling rates.
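When a non-default format is requested, the saved file's extension should match it. The following sketch pairs a request body with an output path; the audio_setting object and its format key are assumed names not confirmed by this page, and the extension table covers only the four formats listed above.

```python
from pathlib import Path

# Formats named above, mapped to conventional file extensions.
EXTENSIONS = {"wav": ".wav", "mp3": ".mp3", "flac": ".flac", "pcm": ".pcm"}


def audio_request(text, fmt="mp3", out_dir="."):
    """Return (payload, output_path) for the requested audio format.

    "audio_setting" and "format" are assumed field names.
    """
    fmt = fmt.lower()
    if fmt not in EXTENSIONS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    payload = {
        "model": "minimax/speech-2.8-hd",
        "text": text,
        "voice_setting": {"voice_id": "Wise_Woman"},
        "audio_setting": {"format": fmt},  # assumed key names
    }
    return payload, Path(out_dir) / f"audio{EXTENSIONS[fmt]}"


if __name__ == "__main__":
    payload, out_path = audio_request("Hi! What are you doing today?", "flac")
    print(out_path, payload["audio_setting"])
```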
Natural Speech Details
Human-Like Interjections
MiniMax Speech 2.8 HD supports embedded vocal cues such as laughter, sighs, or breathing sounds. These are not layered effects but are generated as part of the speech itself, making them feel cohesive rather than artificial.
Consistent Long-Form Delivery
Unlike many TTS systems that degrade over longer passages, this model maintains stable tone and pacing across extended text, which is critical for audiobooks and podcasts.
Feature Breakdown
Use Cases
Audiobooks and Long-Form Narration
MiniMax Speech 2.8 HD is particularly effective for audiobook production, where maintaining consistent tone over long durations is essential. The model avoids fatigue-like degradation and keeps delivery stable from start to finish.
Professional Voiceovers
For marketing videos, corporate content, or branded media, the model produces audio that approaches studio-recorded quality, reducing the need for post-processing.
Podcast and Media Production
The clarity and depth of the generated voice make it suitable for podcast workflows, especially when consistency and scheduling flexibility are required.
Accessibility and Assistive Audio
High intelligibility and natural pacing improve the listening experience for accessibility applications, particularly for extended sessions.
HD vs Turbo: Key Differences