Qwen3-TTS-Flash

Text-to-Speech

Overview

Qwen3-TTS-Flash is Tongyi's latest offline text-to-speech foundation model. It offers 17 expressive voices and low-latency, high-stability audio synthesis. It supports multilingual and dialect output while keeping voice characteristics consistent across languages. Trained on large-scale data, the model automatically adjusts vocal tone to the semantics of the text and synthesizes complex content robustly.

Input

Text

Output

Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • TTS
    $0.1 per 10,000 characters
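
To make the pricing concrete, here is a minimal sketch of a cost estimate based on the rate listed above. It rests on two assumptions that this page does not confirm: that the billed character count equals Python's `len()` of the input string (one Unicode code point per character), and that usage is billed pro rata rather than rounded up to whole 10,000-character blocks. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.1")  # USD per block, taken from the pricing above
BLOCK_SIZE = 10_000               # characters per priced block


def estimate_cost(text: str) -> Decimal:
    """Estimate the synthesis cost of `text` in USD.

    Assumption: billed characters == len(text), billed pro rata.
    The provider's real counting rules may differ.
    """
    return PRICE_PER_BLOCK * len(text) / BLOCK_SIZE


if __name__ == "__main__":
    sample = "Hello, world. " * 1000  # 14,000 characters
    print(f"{len(sample)} characters -> ${estimate_cost(sample)}")
```

`Decimal` is used instead of `float` so that small per-request amounts can be summed without binary rounding drift.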

Rate Limits

  • RPM (Requests Per Minute)
    180
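
A limit of 180 requests per minute works out to one request every 60 / 180 ≈ 0.33 seconds on average. The sketch below shows one way a client might stay under that limit by spacing requests evenly. How the service actually enforces the limit (fixed window, sliding window, burst allowance) is not stated on this page, so even spacing is an assumed, conservative strategy. The class name `RequestPacer` is hypothetical, and no real API call is made.

```python
import time

RPM_LIMIT = 180  # requests per minute, from the rate limit above


class RequestPacer:
    """Space out calls so that at most `rpm` start in any minute."""

    def __init__(self, rpm=RPM_LIMIT, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm   # minimum seconds between requests
        self._clock = clock          # injectable for testing
        self._sleep = sleep          # injectable for testing
        self._next_allowed = None    # earliest start time of the next request

    def wait(self) -> float:
        """Block until the next request may start; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay


if __name__ == "__main__":
    pacer = RequestPacer()
    for i in range(3):
        pacer.wait()
        # A real client would send its synthesis request here.
        print(f"request {i} may be sent now")
```

Call `pacer.wait()` immediately before each request. A client sharing one API key across several processes would need a shared limiter instead, since this one only tracks calls made in its own process.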

API Reference
