Deepgram Nova-2

Deepgram Nova-2 API features enhanced accuracy, multilingual support, and rapid transcription across various applications.

Speech to Text

Javascript

const axios = require('axios').default;

// Shared client for the API. Append your API key after 'Bearer '.
const api = axios.create({
  baseURL: 'https://api.ai.cc/v1',
  headers: { Authorization: 'Bearer ' },
});

const main = async () => {
  // Submit a publicly reachable audio URL for transcription.
  const response = await api.post('/stt', {
    model: '#g1_nova-2-general',
    url: 'https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3',
  });

  // The transcript sits under the first channel's first alternative.
  console.log('[transcription]', response.data.results.channels[0].alternatives[0].transcript);
};

main().catch((error) => console.error('[error]', error.message));

Python

import requests

# Append your API key after "Bearer ".
headers = {"Authorization": "Bearer "}


def main():
    url = "https://api.ai.cc/v1/stt"
    data = {
        "model": "#g1_nova-2-general",
        "url": "https://audio-samples.github.io/samples/mp3/blizzard_unconditional/sample-0.mp3",
    }

    # Submit a publicly reachable audio URL for transcription.
    response = requests.post(url, json=data, headers=headers)

    if response.status_code >= 400:
        print(f"Error: {response.status_code} - {response.text}")
    else:
        response_data = response.json()
        # The transcript sits under the first channel's first alternative.
        transcript = response_data["results"]["channels"][0]["alternatives"][0][
            "transcript"
        ]
        print("[transcription]", transcript)


if __name__ == "__main__":
    main()

🚀 Discover Deepgram Nova-2: The Future of Speech-to-Text

Deepgram Nova-2 is an Automatic Speech Recognition (ASR) model built by Deepgram for transcribing both pre-recorded and real-time streaming English audio. Deepgram reports a substantial accuracy gain over its earlier Nova models and over competing systems.

Model Highlights:

Model Name: Nova-2
Developer: Deepgram
Model Type: Automatic Speech Recognition (ASR)

Performance Edge:

✨ About 18% lower Word Error Rate (WER) than the previous Nova model (18.4% relative to Nova-1).
🎯 Offers a 36% relative WER improvement over OpenAI Whisper (large).
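To make the "relative WER improvement" wording concrete, here is a minimal sketch of how such a percentage is usually derived. The two WER values in the example are hypothetical numbers picked for illustration, not measurements published by Deepgram or OpenAI.

```python
def relative_wer_improvement(baseline_wer: float, candidate_wer: float) -> float:
    """Return the relative WER reduction of the candidate versus the baseline.

    Both inputs are fractions (0.125 means 12.5% WER). The result is also a
    fraction, so 0.36 means a 36% relative improvement.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - candidate_wer) / baseline_wer


# Hypothetical values for illustration only: baseline 12.5% WER, candidate 8.0% WER.
improvement = relative_wer_improvement(0.125, 0.08)
print(f"{improvement:.0%} relative WER improvement")
```

Note that a relative improvement is not the same as a difference in percentage points: in the hypothetical example the absolute gap is 4.5 points, while the relative gain is 36%.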

💡 Key Features of Nova-2

Nova-2 is engineered with a suite of features designed to meet the rigorous demands of modern speech applications:

🌐 Multilingual Capabilities: Extend your reach with support for various languages.
📈 High Accuracy & Reduced Word Error Rate (WER): Achieve superior transcription quality.
⚡ Fast Inference Times: Process audio rapidly for real-time applications.
💰 Competitive Pricing: Benefit from cost-effective transcription solutions.

🎯 Versatile Applications & Specialized Models

Deepgram Nova-2 is designed for a broad spectrum of voice applications, from real-time transcription to media analysis. To cater to diverse industry needs, Nova-2 offers several highly optimized versions:

General & Core Models:

nova-2 or nova-2-general: General-purpose model for various domains.
nova-2-conversationalai: Ideal for conversational AI.
nova-2-video: Optimized for video content.

Industry-Specific Optimizations:

nova-2-meeting: Tailored for transcribing meetings.
nova-2-phonecall: Specifically for phone call transcription.
nova-2-finance: Customized for finance contexts.
nova-2-voicemail: Perfect for voicemail messages.
nova-2-medical: Specialized for medical transcription, achieving 16% better accuracy for medical terms at 120-180 words/minute.
nova-2-drivethru: Developed for drive-thru systems.
nova-2-automotive: Designed for automotive environments.
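One way to pick a variant in application code is a small lookup table, sketched below. The variant names come from the list above. The `#g1_` prefix is taken from the `#g1_nova-2-general` sample request; applying the same prefix to the other variants is an assumption, and `model_id_for` is a hypothetical helper, so confirm the exact model IDs before relying on them.

```python
# Variant names as listed above, keyed by a short use-case label.
NOVA2_VARIANTS = {
    "general": "nova-2-general",
    "conversationalai": "nova-2-conversationalai",
    "video": "nova-2-video",
    "meeting": "nova-2-meeting",
    "phonecall": "nova-2-phonecall",
    "finance": "nova-2-finance",
    "voicemail": "nova-2-voicemail",
    "medical": "nova-2-medical",
    "drivethru": "nova-2-drivethru",
    "automotive": "nova-2-automotive",
}


def model_id_for(use_case: str, prefix: str = "#g1_") -> str:
    """Map a use-case label to a model ID, falling back to the general model.

    The default prefix is an assumption based on the sample request.
    """
    variant = NOVA2_VARIANTS.get(use_case.strip().lower(), NOVA2_VARIANTS["general"])
    return f"{prefix}{variant}"


print(model_id_for("medical"))
print(model_id_for("podcast"))  # unknown label, falls back to the general model
```

Falling back to the general-purpose model keeps unknown labels from failing outright; raising an error instead would be the stricter choice if silent fallbacks are undesirable.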

⚙️ Technical Insights into Nova-2

Architecture:

Nova-2 is built on a Transformer-based architecture. Deepgram reports an 18.4% reduction in Word Error Rate (WER) compared to Nova-1. The improvements matter most for entities (such as proper nouns), punctuation, and capitalization, in both live and pre-recorded audio.
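For readers unfamiliar with the metric, WER is commonly computed as the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below implements that textbook definition. It is not Deepgram's evaluation code, and it skips the text normalization (casing, punctuation) that real benchmarks usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion: reference word missing
                    curr[j - 1] + 1,     # insertion: extra hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("please call doctor smith today",
                      "please call doctor smyth today"))
```

The example shows why proper nouns weigh heavily on WER: a single misspelled name in a five-word utterance already counts as one error in five words.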

Training Data:

The model was trained on Deepgram's most extensive and diverse dataset to date, utilizing nearly 6 million resources and 47 billion tokens. This massive dataset is enriched with a comprehensive collection of high-quality human transcriptions, ensuring robust and accurate learning.

Performance Metrics & Speed:

Nova-2 showcases significant improvements in WER against previous models and competitors. Furthermore, speed is a critical advantage: Nova-2 achieved a median inference time of just 29.8 seconds per hour of diarized audio. This makes it 5 to 40 times faster than other vendors offering diarization capabilities.
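The median figure above can be turned into a rough capacity estimate. The sketch below assumes processing time scales linearly with audio length, which the source does not state, and uses the quoted median for diarized audio. Actual times will vary with load, audio quality, and network overhead.

```python
MEDIAN_SECONDS_PER_AUDIO_HOUR = 29.8  # median quoted above for diarized audio


def speedup_over_real_time(
    seconds_per_audio_hour: float = MEDIAN_SECONDS_PER_AUDIO_HOUR,
) -> float:
    """How many times faster than playback speed the transcription runs."""
    return 3600.0 / seconds_per_audio_hour


def estimated_processing_seconds(
    audio_seconds: float,
    seconds_per_audio_hour: float = MEDIAN_SECONDS_PER_AUDIO_HOUR,
) -> float:
    """Linear estimate of processing time for a recording of the given length."""
    return audio_seconds / 3600.0 * seconds_per_audio_hour


print(f"{speedup_over_real_time():.1f}x faster than real time")
print(f"{estimated_processing_seconds(90 * 60):.1f} s for a 90-minute recording")
```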

🛠️ How to Use Deepgram Nova-2

Code Samples & SDK:

Integration Example: Send a POST request to the `/stt` endpoint with `model` set to `#g1_nova-2-general` for general transcription needs.
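When wiring this into an application, it can help to separate building the request body from reading the response. The sketch below does both without making a network call. The payload fields and the nested response path follow the request samples on this page. `build_stt_payload` and `extract_transcript` are hypothetical helper names, and returning `None` on an unexpected response shape is a design assumption.

```python
from typing import Any, Dict, Optional


def build_stt_payload(audio_url: str, model: str = "#g1_nova-2-general") -> Dict[str, str]:
    """Build the JSON body for a transcription request."""
    return {"model": model, "url": audio_url}


def extract_transcript(response_data: Dict[str, Any]) -> Optional[str]:
    """Read the first channel's first alternative, or None if the shape differs."""
    try:
        return response_data["results"]["channels"][0]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return None


# Hand-written stand-in for a parsed JSON response, for illustration only.
sample_response = {
    "results": {"channels": [{"alternatives": [{"transcript": "hello world"}]}]}
}
print(extract_transcript(sample_response))
print(extract_transcript({"error": "bad request"}))
```

Handling a missing field explicitly avoids an unhandled `KeyError` when the service returns an error body instead of a transcription result.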

Tutorials:

Dive deeper with guides like: Speech-to-text Multimodal Experience in NodeJS

Technical Constraints:

💾 Maximum File Size: 2 GB
⏱️ Rate Limits: 100 concurrent requests
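A client can respect both limits before any request is sent. The sketch below checks file size locally and caps parallel work with a thread pool. Reading "2 GB" as 2,000,000,000 bytes is a conservative assumption, and `transcribe_one` is a placeholder for whatever function actually calls the API.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Limits quoted above. The decimal reading of "2 GB" is an assumption.
MAX_FILE_SIZE_BYTES = 2 * 1000**3
MAX_CONCURRENT_REQUESTS = 100


def within_size_limit(path: str, limit: int = MAX_FILE_SIZE_BYTES) -> bool:
    """Return True if the file at path is no larger than the limit."""
    return os.path.getsize(path) <= limit


def transcribe_all(
    items: Iterable[T],
    transcribe_one: Callable[[T], R],
    max_workers: int = MAX_CONCURRENT_REQUESTS,
) -> List[R]:
    """Run transcribe_one over items with at most max_workers calls in flight."""
    workers = max(1, min(max_workers, MAX_CONCURRENT_REQUESTS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(transcribe_one, items))


# Stand-in worker for illustration; a real one would send the request.
print(transcribe_all(["a.mp3", "b.mp3"], lambda name: f"transcript of {name}"))
```

A thread pool is enough here because the work is I/O-bound. The cap only limits one process, so several workers sharing an API key would need to coordinate separately.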

⚖️ Ethical Considerations for Nova-2

Deepgram is committed to responsible AI development. Nova-2 adheres to stringent ethical guidelines:

🔒 Privacy & Ethical AI: Strict adherence to ethical AI development, emphasizing data privacy and responsible use.
🌍 Bias Mitigation: Continuous efforts to ensure fairness and accuracy across diverse speech patterns, accents, and demographics.

❓ Frequently Asked Questions (FAQ) about Deepgram Nova-2

Q: What is Deepgram Nova-2?

A: Deepgram Nova-2 is a state-of-the-art Automatic Speech Recognition (ASR) model designed for highly accurate speech-to-text transcription of both pre-recorded and streaming English audio.

Q: How does Nova-2 compare to other ASR models like OpenAI Whisper?

A: Nova-2 boasts an 18% improvement in accuracy over previous Deepgram Nova models and offers a significant 36% relative Word Error Rate (WER) improvement compared to OpenAI Whisper (large).

Q: Are there specialized versions of Nova-2 for specific industries?

A: Yes, Deepgram Nova-2 comes with several optimized versions for specific use cases, including `nova-2-meeting`, `nova-2-phonecall`, `nova-2-finance`, `nova-2-medical`, and more, each tailored for maximum accuracy in its respective domain.

Q: What are the main technical advantages of Nova-2?

A: Nova-2 utilizes an advanced Transformer-based architecture, leading to an 18.4% WER decrease from Nova-1. It was trained on an extensive dataset of 47 billion tokens and offers extremely fast inference times, being 5 to 40 times faster than competitors for diarized audio.

Q: How does Deepgram address ethical concerns with Nova-2?

A: Deepgram prioritizes ethical AI development, focusing on reducing bias, ensuring privacy, and maintaining fairness and accuracy across diverse speech patterns and accents through continuous efforts and adherence to strict guidelines.
