Stable Audio

Discover Stable Audio by Stability AI, an advanced audio generation model that creates high-quality tracks from text prompts with innovative features.

JavaScript

// Request a 10-second clip from the Stable Audio model.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v2/generate/audio', {
    method: 'POST',
    headers: {
      // Append your API key after "Bearer ".
      Authorization: 'Bearer ',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'stable-audio',
      prompt: 'lo-fi pop hip-hop ambient music',
      steps: 100,
      seconds_total: 10,
    }),
  }).then((res) => res.json());

  // Print the parsed JSON response.
  console.log('Generation:', response);
};

main();

Python

import requests


def main():
    # Request a 10-second clip from the Stable Audio model.
    url = "https://api.ai.cc/v2/generate/audio"
    payload = {
        "model": "stable-audio",
        "prompt": "lo-fi pop hip-hop ambient music",
        "steps": 100,
        "seconds_total": 10,
    }
    # Append your API key after "Bearer ".
    headers = {"Authorization": "Bearer ", "Content-Type": "application/json"}

    response = requests.post(url, json=payload, headers=headers)
    # Print the parsed JSON response.
    print("Generation:", response.json())


if __name__ == "__main__":
    main()
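
The request body used in the examples above can also be assembled by a small helper that checks values before anything is sent. This is only a minimal sketch: `build_payload` and `MAX_SECONDS` are hypothetical names, the 47-second cap mirrors the maximum track length stated on this page, and whether the API itself enforces that cap is an assumption.

```python
import json

MAX_SECONDS = 47  # assumption: mirrors the maximum track length stated on this page


def build_payload(prompt, seconds_total=10, steps=100):
    """Return the JSON body used by the examples above, after basic checks."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    if not 0 < seconds_total <= MAX_SECONDS:
        raise ValueError(f"seconds_total must be in (0, {MAX_SECONDS}]")
    if steps <= 0:
        raise ValueError("steps must be positive")
    return {
        "model": "stable-audio",
        "prompt": prompt.strip(),
        "steps": steps,
        "seconds_total": seconds_total,
    }


if __name__ == "__main__":
    # Show the body that would be sent for the sample prompt.
    print(json.dumps(build_payload("lo-fi pop hip-hop ambient music"), indent=2))
```

The returned dictionary can be passed directly as `json=payload` in the `requests.post` call shown above.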

🎵 Stable Audio: Advanced AI Audio Generation Model Overview

Stable Audio is an audio generation model developed by Stability AI and released in September 2023. It creates high-quality audio tracks directly from text prompts and targets a broad range of creative and professional applications.

✨ Key Features & Capabilities

✓ High-Fidelity Output: Generates stereo audio at a 44.1 kHz sampling rate, suitable for professional media production.
✓ Structured Tracks: Produces cohesive tracks with distinct musical structure, including intros, developments, and outros.
✓ Diverse Sound Creation: Generates melodies, varied musical styles, and realistic sound effects, serving both musicians and sound designers.

🎯 Intended Use Cases

This model is primarily designed for:

● Musicians & Composers: To aid in the creation of new musical pieces, backing tracks, or experimental soundscapes.
● Sound Designers: For generating bespoke sound effects or ambient backgrounds for games, films, interactive media, and other multimedia projects.
● Developers: To integrate AI-powered audio generation capabilities into various applications and platforms.

Stable Audio is designed primarily for English text prompts. Prompts in other languages may also be processed, though results depend on the specific context provided.

⚙️ Technical Specifications & Architecture

Underlying Architecture

Stable Audio is built on a latent diffusion model architecture optimized for audio synthesis. Key components of its design include:

● Highly Compressed Autoencoder: Encodes complex audio waveforms into a compact representation, which keeps processing and generation efficient while preserving quality.
● Diffusion Transformer (DiT): Operates on long sequences, enabling the generation of coherent, well-structured audio pieces.
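
To see why a compressed latent representation helps a transformer cope with long audio, it can help to compare sequence lengths before and after compression. The following is a back-of-the-envelope sketch, not a description of the actual model: the downsampling factor of 2048 is an illustrative assumption that does not come from this page, and `latent_frames` and `attention_pairs` are hypothetical helper names. The 47-second length and 44.1 kHz rate are the figures this page reports.

```python
import math

SAMPLE_RATE = 44_100  # Hz, as reported on this page
SECONDS = 47          # maximum track length reported on this page
DOWNSAMPLE = 2048     # assumption: illustrative compression factor only


def latent_frames(seconds, sample_rate, downsample):
    """Number of latent time steps once the autoencoder has shortened the waveform."""
    return math.ceil(seconds * sample_rate / downsample)


def attention_pairs(sequence_length):
    """Standard self-attention compares every position with every other: n * n pairs."""
    return sequence_length * sequence_length


samples = SECONDS * SAMPLE_RATE
frames = latent_frames(SECONDS, SAMPLE_RATE, DOWNSAMPLE)
reduction = attention_pairs(samples) // attention_pairs(frames)

print("waveform samples per channel:", samples)
print("latent frames:", frames)
print("reduction in attention pairs (factor):", reduction)
```

Because the pairwise cost grows with the square of the sequence length, shortening the sequence before the transformer sees it reduces the work far more than the compression factor alone suggests.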

Training Data & Curation

The model was trained on a large and varied dataset:

● Data Source & Scale: The training dataset was curated from the AudioSparx music library and comprises over 800,000 audio files, including music, sound effects, and individual instrument stems.
● Ethical Curation & Diversity: Data curation emphasized respect for creator rights, including an opt-out option for artists. This approach is intended to reduce potential biases and keep generated outputs diverse.

Performance Metrics

The key figures reported for Stable Audio are:

Metric	Value
Quality Index	High
Maximum Generated Track Length	Up to 47 seconds
Sampling Rate	44.1 kHz
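
For planning storage or bandwidth, the figures in the table translate into an uncompressed size roughly as follows. This is a minimal sketch that assumes 16-bit PCM samples (the bit depth is not stated on this page) and stereo output as described in the feature list; `pcm_bytes` is a hypothetical helper.

```python
def pcm_bytes(seconds, sample_rate=44_100, channels=2, bytes_per_sample=2):
    """Uncompressed PCM size: frames * channels * bytes per sample."""
    frames = int(seconds * sample_rate)
    return frames * channels * bytes_per_sample


size = pcm_bytes(47)
print(f"{size} bytes (~{size / 1_000_000:.1f} MB) for a 47-second stereo clip")
# prints "8290800 bytes (~8.3 MB) for a 47-second stereo clip"
```

A compressed format such as the `audio/mpeg` response mentioned in the API section would be considerably smaller than this uncompressed figure.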


💻 Usage Guide & API Integration

API Access & Code Examples

Stable Audio is readily available for integration via the AI/ML API platform, where it is listed as "Stable Audio".

Generating Audio Programmatically:

// Example API Request to Create Audio Generation
// "duration" is in seconds, up to 47
POST /audio.create-generation-stable
Content-Type: application/json

{
  "prompt": "a futuristic synthwave track with a driving beat and neon melodies",
  "model": "stable-audio",
  "duration": 45
}

Retrieving Generated Audio:

// Example API Call to Fetch Generated Audio File
GET /audio.fetch-generation?id={generation_id}
Accept: audio/mpeg
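
The two calls above could be prepared from Python roughly as follows. This is a sketch under stated assumptions: the base URL is a placeholder because the page gives only relative paths, the helper names are hypothetical, the Bearer header mirrors the request examples at the top of the page, and the shape of the creation response (where the generation id appears) is not documented here, so the id is supplied by the caller. Nothing is sent; the code only builds the request objects.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder: the real host is not given on this page


def create_request(prompt, duration, api_key):
    """Build (but do not send) the POST that starts a generation."""
    body = json.dumps({"prompt": prompt, "model": "stable-audio", "duration": duration})
    return urllib.request.Request(
        BASE_URL + "/audio.create-generation-stable",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def fetch_request(generation_id, api_key):
    """Build (but do not send) the GET that downloads the generated audio."""
    query = urllib.parse.urlencode({"id": generation_id})
    return urllib.request.Request(
        f"{BASE_URL}/audio.fetch-generation?{query}",
        headers={"Accept": "audio/mpeg", "Authorization": f"Bearer {api_key}"},
        method="GET",
    )


req = create_request("a futuristic synthwave track", 45, "<YOUR_API_KEY>")
print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would perform the call once a real host and API key are filled in.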

Comprehensive API Documentation

For in-depth details on request parameters, response formats, authentication, and error handling, please consult the official API Documentation.

⚖️ Ethical Guidelines & Licensing

Ethical Development Framework

Stability AI is committed to ethical AI development. Key aspects of its approach for Stable Audio include:

● Transparency: Upholding clear communication regarding the model's capabilities, potential applications, and inherent limitations.
● Creator Rights: Ensuring that all training data utilized adheres strictly to copyright laws and providing artists with a clear opt-out mechanism for their content's use in future training.

Licensing Information

Stable Audio is made available under a commercial license. This license grants users rights for both research and commercial applications, all while maintaining compliance with established ethical standards and respecting intellectual property rights.


❓ Frequently Asked Questions (FAQ)

Q: What is Stable Audio and who developed it?

A: Stable Audio is an advanced AI audio generation model developed by Stability AI, capable of creating high-quality audio tracks from text prompts.

Q: What is the maximum duration of audio Stable Audio can generate?

A: Stable Audio can generate coherent musical structures and tracks up to 47 seconds in length.

Q: What kind of data was used to train Stable Audio?

A: The model was trained on a vast and diverse dataset of over 800,000 audio files from the AudioSparx music library, including music, sound effects, and individual instrument stems.

Q: Can Stable Audio be used for commercial projects?

A: Yes, Stable Audio is available under a commercial license that permits both research and commercial usage, with adherence to ethical guidelines and creator rights.

Q: Where can I access the API for Stable Audio and its documentation?

A: Stable Audio's API is available on the AI/ML API platform, and comprehensive documentation can be found on its official documentation portal.
