HunyuanImage 3.0

The model can interpret prompts thousands of words long and render clear, legible text within images, making it well suited to diverse creative applications.

Free $1 Tokens for New Members

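JavaScript

// Add your API key after 'Bearer ' in the Authorization header below.
const main = async () => {
  const response = await fetch('https://api.ai.cc/v1/images/generations', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'hunyuan/hunyuan-image-v3-text-to-image',
      prompt: 'A jellyfish in the ocean',
    }),
  });

  // Fail loudly on HTTP errors instead of logging an error body as a result.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  console.log('Generation:', await response.json());
};

main().catch((err) => console.error(err));

Python

import requests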


def main():
    response = requests.post(
        "https://api.ai.cc/v1/images/generations",
        headers={
            "Authorization": "Bearer ",  # add your API key after "Bearer "
            "Content-Type": "application/json",
        },
        json={
            "model": "hunyuan/hunyuan-image-v3-text-to-image",
            "prompt": "A jellyfish in the ocean",
        },
    )

    response.raise_for_status()
    data = response.json()

    print("Generation:", data)


if __name__ == "__main__":
    main()
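
The request above can also be assembled by a small helper that validates its inputs before anything is sent. The following is a minimal sketch, not an official client: the endpoint and model ID are copied from the example above, while the `build_request` helper, the `HUNYUAN_API_KEY` environment variable name, and the 120-second timeout are assumptions made purely for illustration.

```python
import os

API_URL = "https://api.ai.cc/v1/images/generations"  # endpoint from the example above
MODEL_ID = "hunyuan/hunyuan-image-v3-text-to-image"  # model ID from the example above


def build_request(prompt: str, api_key: str) -> dict:
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    if not api_key:
        raise ValueError("API key is empty; set it before calling the API")
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return {
        "url": API_URL,
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"model": MODEL_ID, "prompt": prompt},
        "timeout": 120,  # seconds; an arbitrary choice for this sketch
    }


if __name__ == "__main__":
    # HUNYUAN_API_KEY is a hypothetical variable name chosen for this sketch.
    key = os.environ.get("HUNYUAN_API_KEY", "")
    if key:
        import requests  # imported here so the helper works without it

        response = requests.post(**build_request("A jellyfish in the ocean", key))
        response.raise_for_status()
        print("Generation:", response.json())
    else:
        print("Set HUNYUAN_API_KEY to send the request")
```

Reading the key from the environment keeps it out of source control, and validating before the call turns an empty `Bearer ` header into an immediate, readable error.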

Product Detail

HunyuanImage 3.0 is Tencent's cutting-edge native multimodal text-to-image generation model. This advanced system integrates an autoregressive large language model architecture with diffusion-based image generation, setting new benchmarks for image quality and text-image alignment. With an impressive 80 billion parameters and a Mixture-of-Experts (MoE) design, HunyuanImage 3.0 excels in generating hyper-realistic, highly detailed, and stylistically diverse images directly from natural language prompts. It offers robust support for both Chinese and English prompts and provides flexible aspect ratios, empowering creators across various industries.

✨ Technical Specifications

Model Type: Native multimodal autoregressive diffusion model with MoE LLM backbone
Parameters: 80 billion total, 13 billion active per token (MoE)
Architecture: Mixture of Experts (64 experts), enhanced diffusion transformer, variational autoencoder (VAE) compression
Training Data: Trained on 5 billion image-text pairs, enriched with video frames and interleaved multimodal data
Input Modalities: Text prompts (Chinese/English)
Output: High-resolution images, flexible aspect ratios

📈 Performance Benchmarks

Comparison to Previous Versions: Outperforms HunyuanImage 2.1 by a relative win rate of 14.1% in professional human evaluation for image quality and text alignment.
Image Quality: Produces hyper-realistic photos, detailed illustrations, and diverse artistic styles with strong prompt adherence.
Evaluation Methodology: 1000 carefully curated prompts evaluated by over 100 professional human raters using the Good/Same/Bad (GSB) framework for fairness.
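
To make the GSB figure concrete, here is a minimal sketch of one common way a relative win rate is computed from Good/Same/Bad votes: (Good - Bad) / total. The page does not spell out the exact formula used in the evaluation, so treat this definition as an assumption; the vote counts below are invented for illustration and are not the published results.

```python
def gsb_relative_win_rate(good: int, same: int, bad: int) -> float:
    """Relative win rate under the assumed definition (good - bad) / total."""
    total = good + same + bad
    if total == 0:
        raise ValueError("at least one rating is required")
    return (good - bad) / total


# Made-up counts for 1000 prompts, NOT the real evaluation data.
rate = gsb_relative_win_rate(good=450, same=300, bad=250)
print(f"Relative win rate: {rate:.1%}")  # prints "Relative win rate: 20.0%"
```

Under this definition a positive value means the new model was preferred more often than it was rejected, and "Same" votes dilute the score without changing its sign.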

💡 Key Features

✅ Massive Scale MoE Architecture: Features 80B total parameters, with 13B activated per token using 64 experts, balancing immense capacity with computational efficiency.
✨ Revolutionary Diffusion Architecture: An enhanced diffusion transformer ensures the generation of detailed, coherent, and high-resolution images.
🚀 Advanced Compression VAE: Effectively compresses image features, reducing computational costs while simultaneously improving visual fidelity.
🔗 Enhanced Dual Encoder System: Tightly integrates vision and text encoders for superior semantic understanding and alignment between text and image.
🔧 Prompt Enhancement Module: Automatically refines user prompts to optimize generation quality and accuracy, ensuring better outputs.
🌐 Multi-language Support: Character-aware processing provides fluent support for both Chinese and English prompts.
📐 Flexible Aspect Ratios: Supports various ratios including 1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3 to meet diverse creative demands.
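
As a rough illustration of what the listed aspect ratios mean in pixels, the sketch below derives a width and height for each ratio at a fixed pixel budget. The ratio list comes from the feature above; the 1024×1024 budget, the rounding to multiples of 16, and the `dimensions_for` helper are assumptions for this example, and the sizes the API actually accepts may differ.

```python
import math

ASPECT_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "2:3"]  # from the list above


def dimensions_for(ratio: str, target_pixels: int = 1024 * 1024,
                   multiple: int = 16) -> tuple[int, int]:
    """Return (width, height) close to target_pixels for a 'W:H' ratio."""
    w_part, h_part = (int(part) for part in ratio.split(":"))
    # Solve width * height = target_pixels with width / height = w_part / h_part.
    height = math.sqrt(target_pixels * h_part / w_part)
    width = height * w_part / h_part

    def snap(value: float) -> int:
        # Round to the nearest multiple (an assumed constraint, common for image models).
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


for ratio in ASPECT_RATIOS:
    width, height = dimensions_for(ratio)
    print(f"{ratio:>5} -> {width}x{height}")
```

Holding the pixel budget constant keeps generation cost roughly equal across ratios, which is convenient when pricing depends on image size.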

💲 API Pricing

Pricing for HunyuanImage 3.0 API is set at $0.105 per megapixel.
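
Per-megapixel pricing is easiest to reason about with a quick calculation. The sketch below uses the $0.105 rate stated above and assumes that one megapixel means 1,000,000 pixels and that billing applies no rounding or minimum charge; neither detail is stated on this page, so check the provider's billing terms before relying on the numbers.

```python
PRICE_PER_MEGAPIXEL_USD = 0.105  # rate stated above


def estimate_cost(width: int, height: int, images: int = 1) -> float:
    """Estimated cost in USD, assuming 1 MP = 1,000,000 pixels and no rounding."""
    if width <= 0 or height <= 0 or images < 0:
        raise ValueError("width and height must be positive, images non-negative")
    megapixels = width * height / 1_000_000
    return megapixels * PRICE_PER_MEGAPIXEL_USD * images


print(f"1024x1024, 1 image:   ${estimate_cost(1024, 1024):.4f}")
print(f"1360x768, 10 images:  ${estimate_cost(1360, 768, images=10):.4f}")
```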

🎯 Use Cases

🖼️ Marketing and advertising visuals requiring photorealistic quality.
🎨 Diverse artistic exploration: watercolor, oil painting, anime, surrealism, cyberpunk, and more.
👤 Character design and animation frames with expressive detail.
📚 Educational visuals and comics with fine textual consistency.
🏗️ Visual prototyping for product design and digital twins.

⚖️ Comparison with Other Models

vs Seedream 4.0: HunyuanImage 3.0 operates on a larger scale with 80 billion parameters through its Mixture of Experts architecture, surpassing Seedream 4.0’s approximately 50 billion parameters. HunyuanImage also offers more fluent support for both Chinese and English prompts, whereas Seedream primarily focuses on English. While both models deliver high-fidelity images, HunyuanImage demonstrates superior prompt adherence and comprehensive multi-aspect ratio support.

vs Gemini 2.5 Flash Image: HunyuanImage 3.0’s large-scale MoE model is engineered to generate hyper-realistic images as well as a broad spectrum of artistic styles. Gemini 2.5 Flash Image, by contrast, tends towards more artistic, stylized outputs and is reportedly smaller in parameter size (~30B). HunyuanImage is more versatile across use cases thanks to its dual-language input and flexible resolution options, giving it more creative freedom than models with narrower language and aspect ratio support.

vs GPT-Image: Both models utilize diffusion architectures, but HunyuanImage 3.0 uniquely integrates a large multimodal MoE LLM backbone, significantly enhancing text-image alignment. GPT-Image typically produces general quality images with moderate prompt adherence. In contrast, HunyuanImage systematically optimizes prompts and employs a two-stage pipeline to improve clarity and detail. Furthermore, HunyuanImage supports multilingual prompts and multiple aspect ratios, considerably expanding creative possibilities beyond GPT-Image’s more basic output formats.

🔌 API Integration

HunyuanImage 3.0 is accessible via the AI/ML API. Comprehensive documentation is available in the provider's API documentation.

❓ Frequently Asked Questions

Q: How does HunyuanImage 3.0's MoE architecture benefit image generation?

A: The Mixture-of-Experts (MoE) architecture in HunyuanImage 3.0 allows for efficient scaling with 80 billion parameters while activating only 13 billion per token. This design optimizes computational cost and enhances the model's capacity to learn complex visual features and diverse styles, leading to higher quality and more detailed image outputs.
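
The idea of activating only part of the model per token can be sketched with a toy top-k router. This is a generic MoE illustration, not HunyuanImage 3.0's actual routing: the 64-expert count and the 13B/80B figures come from this page, while the top-k value of 8, the softmax gating, and the scalar "experts" are assumptions chosen to keep the example small.

```python
import math
import random

NUM_EXPERTS = 64  # from the specifications on this page
TOP_K = 8         # assumption: the page does not say how many experts run per token


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(token, experts, router_logits, k=TOP_K):
    """Weighted sum of the selected experts' outputs; unselected experts never run."""
    return sum(weight * experts[i](token) for i, weight in route(router_logits, k))


random.seed(0)
# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, scale=i + 1: x * scale for i in range(NUM_EXPERTS)]
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route(logits)
print(f"{len(selected)} of {NUM_EXPERTS} experts run for this token")
print(f"Active parameter share from the spec: {13 / 80:.2%}")
print("Toy output:", moe_forward(1.0, experts, logits))
```

Because only the selected experts execute, compute per token scales with the active parameters rather than the total, which is the efficiency argument the answer above makes.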

Q: Can HunyuanImage 3.0 generate images with specific artistic styles?

A: Yes, HunyuanImage 3.0 excels in generating a wide range of artistic styles, including hyper-realistic photos, watercolor, oil painting, anime, surrealism, and cyberpunk, among others. Its advanced diffusion transformer and extensive training data enable it to adapt to various stylistic prompts effectively.

Q: What makes HunyuanImage 3.0 particularly strong in multi-language prompt support?

A: HunyuanImage 3.0 features character-aware processing and an enhanced dual encoder system that tightly integrates vision and text encoders. This allows for superior semantic understanding and alignment for both Chinese and English prompts, ensuring that multi-language inputs are interpreted accurately and reflected faithfully in the generated images.
