



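// Node.js example: request speech from the /tts endpoint and stream it to a file.
const fs = require('fs');
const path = require('path');
const axios = require('axios').default;

// axios.create is a factory function, so it is called without `new`.
const api = axios.create({
  baseURL: 'https://api.ai.cc/v1',
  // Append your API key after 'Bearer '.
  headers: { Authorization: 'Bearer ' },
});

const main = async () => {
  const response = await api.post(
    '/tts',
    {
      model: 'elevenlabs/eleven_multilingual_v2',
      text: 'Hi! What are you doing today?',
      voice: 'Alice',
    },
    // Receive the body as a stream so the audio can be piped to disk.
    { responseType: 'stream' },
  );

  const dist = path.resolve(__dirname, './audio.wav');
  const writeStream = fs.createWriteStream(dist);
  response.data.pipe(writeStream);
  writeStream.on('close', () => console.log('Audio saved to:', dist));
};

// Report request failures instead of leaving the promise rejection unhandled.
main().catch((err) => console.error('TTS request failed:', err.message));
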
# Python example: request speech from the /tts endpoint and stream it to a file.
import os
import requests


def main():
    url = "https://api.ai.cc/v1/tts"
    headers = {
        # Append your API key after "Bearer ".
        "Authorization": "Bearer ",
    }
    payload = {
        "model": "elevenlabs/eleven_multilingual_v2",
        "text": "Hi! What are you doing today?",
        "voice": "Alice",
    }
    # stream=True avoids loading the whole audio response into memory.
    response = requests.post(url, headers=headers, json=payload, stream=True)
    response.raise_for_status()  # fail early on HTTP errors

    dist = os.path.join(os.path.dirname(__file__), "audio.wav")
    with open(dist, "wb") as write_stream:
        for chunk in response.iter_content(chunk_size=8192):
            if chunk:
                write_stream.write(chunk)
    print("Audio saved to:", dist)


main()


Product Detail
Eleven Multilingual v2 is an AI text-to-speech model designed to generate natural, expressive speech across many languages. It offers extensive language support and delivers spoken content with high fidelity and strong context awareness.
🔧 Technical Specifications & Performance Benchmarks
Eleven Multilingual v2 sets new industry standards for AI-driven language processing. Its powerful technical foundation ensures high-quality, reliable output across all supported languages:
- ✅ Naturalness (MOS): Achieves an impressive 4.7/5.0 Mean Opinion Score across diverse languages, indicating highly natural-sounding speech.
- ✅ Intelligibility: Ensures >98% word accuracy in all supported languages, guaranteeing clear and easily understandable audio.
- ✅ Voice Similarity (Embedding Distance): Maintains a low average cosine distance of 0.22 between generated and reference voice embeddings (lower values indicate a closer match to the target voice), supporting consistent voice cloning.
- ✅ Language Accuracy: Delivers 95–98% native-level pronunciation across key languages, meticulously capturing cultural nuances and accents.
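To make the voice similarity figure above concrete, here is a minimal sketch of how a cosine distance between two voice embeddings can be computed. The embedding vectors are hypothetical placeholders; this page does not state which speaker-embedding model produced the 0.22 average, so treat the snippet as an illustration of the metric only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical embeddings of a reference voice and a generated voice.
reference_voice = [0.12, 0.80, 0.35, 0.41]
generated_voice = [0.10, 0.75, 0.42, 0.38]
print(round(cosine_distance(reference_voice, generated_voice), 4))
```

A distance of 0 means the two embeddings point in the same direction, and larger values mean the voices differ more.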
💡 Key Capabilities of Eleven Multilingual v2
- Natural Multilingual Speech: Generates fluent, culturally appropriate speech with native-like rhythm and accent, ensuring your content resonates authentically across global audiences.
- Expressive Voice Control: Easily adjust tone, emotion (e.g., happy, sad, excited), and emphasis through simple text prompts or API parameters for dynamic and engaging storytelling.
- Real-Time Streaming: Supports low-latency streaming, making it perfect for interactive applications such as intelligent voice assistants, real-time gaming, and live content generation.
- Custom Voice Creation: Enables the creation of unique, branded, or cloned voices with minimal training data, offering unparalleled personalization and brand consistency.
💰 Flexible & Transparent Pricing
Experience premium multilingual speech synthesis for just $0.189 per 1,000 characters!
Cost-effective solutions tailored for all your multilingual voice needs.
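As a rough illustration of the rate above, the sketch below estimates the cost of a request from its character count. It assumes billing is strictly linear and that every character, including spaces and punctuation, is counted; the page does not confirm either detail, and `estimate_cost` is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

# Rate quoted on this page: $0.189 per 1,000 characters.
PRICE_PER_1K_CHARS = Decimal("0.189")


def estimate_cost(text: str) -> Decimal:
    """Estimate the USD cost of synthesizing `text` (assumes linear per-character billing)."""
    return PRICE_PER_1K_CHARS * len(text) / Decimal(1000)


sample = "Hi! What are you doing today?"
# 29 characters at $0.189 per 1,000 characters.
print(f"{len(sample)} characters -> ${estimate_cost(sample):.6f}")  # 29 characters -> $0.005481
```

Under the same assumptions, a full 10,000-character request would come to about $1.89.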
🌍 Optimal Use Cases for Eleven Multilingual v2
Unlock new possibilities across various industries and applications by leveraging the power of Eleven Multilingual v2:
- 🎦 Global Content Localization: Add voice-overs to translated videos, e-learning modules, and applications in numerous languages with natural, authentic voices.
- 🤖 Interactive AI Agents: Empower multilingual chatbots, virtual assistants, and customer service avatars to communicate fluently and empathetically across language barriers.
- 🎧 Audiobooks & Podcasts: Generate expressive, long-form narration in multiple languages, significantly enriching the listener's experience.
- 🎮 Gaming & Animation: Provide dynamic, real-time voice lines for characters, enhancing immersion and expanding your global game reach.
- 💻 Accessibility Tools: Deliver high-quality screen readers and voice-based interfaces, making digital content widely accessible for visually impaired users.
💻 Code Sample (Integration Reference)
For developers, integrating Eleven Multilingual v2 is designed to be straightforward. The Node.js and Python samples at the top of this page show a typical request: a POST to the /tts endpoint with the model, text, and voice fields, with the audio response streamed to a file.
🔄 How Eleven Multilingual v2 Stands Out from Competitors
Eleven Multilingual v2 distinguishes itself with several key advantages over other leading text-to-speech models:
- Vs. Google WaveNet (Multilingual): Offers superior expressiveness (4.7 vs. 4.3 MOS), provides broader language support (29+ vs. 15), and delivers enhanced voice cloning capabilities.
- Vs. Amazon Polly (Neural): Boasts higher naturalness and a wider emotional range; supports more languages and real-time streaming with significantly lower latency.
- Vs. Microsoft Azure Neural TTS: Exhibits more consistent prosody in low-resource languages; features faster inference speeds and simpler API integration for developers.
- Vs. Meta’s MMS-TTS: Provides superior audio fidelity and advanced voice customization options; commercially licensed for broad deployment, ensuring versatility.
⚠️ Important Considerations (Limitations)
While Eleven Multilingual v2 is highly advanced, users should be aware of certain operational limitations:
- Language Switching: Issues may arise with accent bleeding during rapid language switching within very long content, potentially leading to inconsistent pronunciation.
- Variable Processing Time: Speech synthesis time can fluctuate depending on the language used and the complexity of the text.
- Uneven Audio Quality: There might be slight variations in overall audio quality across the extensive range of supported languages.
- Character Limit: The model supports a maximum of 10,000 characters per request, which may impose constraints on extremely long, single-request speech synthesis tasks.
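One way to work within the 10,000-character limit noted above is to split long text into smaller pieces and send one request per piece. The sketch below is a hypothetical helper (`split_text` is not part of the API) that prefers sentence boundaries and falls back to a hard cut for oversized sentences. It assumes the limit is measured in Python string characters, which the page does not specify.

```python
import re

MAX_CHARS = 10_000  # per-request limit stated on this page


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


# Small limit used here only to make the behavior visible.
print(split_text("One. Two. Three.", limit=10))  # ['One. Two.', 'Three.']
```

Each chunk can then be sent as the text field of its own request and the resulting audio files joined afterwards. Splitting at sentence boundaries keeps each request's prosody self-contained.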
❓ Frequently Asked Questions (FAQ)
What is Eleven Multilingual v2 and what advancements does it offer?
Eleven Multilingual v2 is an advanced AI text-to-speech model that generates highly natural, expressive speech across multiple languages. Key advancements include improved voice quality, expanded language support, enhanced emotional expression, and more realistic speech patterns that capture the nuances of human conversation.
What languages does Eleven Multilingual v2 support and how well does it handle accents?
The model supports numerous languages including English, Spanish, French, German, Italian, Portuguese, Hindi, Chinese, Japanese, Korean, and many others. It handles regional accents and dialects with impressive accuracy, adapting pronunciation and intonation patterns to sound authentic to native speakers while maintaining consistent voice characteristics.
What are the practical applications for this multilingual text-to-speech technology?
Practical applications include multilingual audiobook and podcast production, e-learning and educational content localization, customer service and IVR systems with natural voices, video game character dialogue, and accessibility tools for visually impaired users.
How does Eleven Multilingual v2 compare to competing TTS systems?
Eleven Multilingual v2 represents a significant improvement in voice naturalness, emotional range, and language coverage. It competes favorably with other leading TTS systems by offering more consistent quality across languages, better handling of complex sentence structures, more natural conversational flow, and superior voice cloning capabilities.
Learn how you can transform your company with AICC APIs


