Qwen3-TTS-Flash
Text-to-Speech
Overview
Qwen3-TTS-Flash is Tongyi's latest offline text-to-speech foundation model. It offers 17 expressive voices and low-latency, high-stability audio synthesis, and it supports multilingual and dialect output with consistent voice characteristics across languages. Trained on large-scale data, the model automatically adjusts vocal tone to match the semantics of the text and handles complex content robustly.
Input: Text
Output: Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.1 per 10,000 characters
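As a rough illustration of the rate above, the sketch below estimates the cost of synthesizing a piece of text. The function name `estimate_tts_cost` is hypothetical, and the code assumes billing is strictly linear in the Python character count with no minimum charge or rounding; the provider's actual character-counting and billing rules may differ.

```python
from decimal import Decimal

# Listed price: $0.1 per 10,000 characters.
PRICE_PER_10K_CHARS = Decimal("0.1")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the synthesis cost in USD for `text`.

    Assumption: cost scales linearly with len(text); no minimum
    charge, no rounding, and every character counts as one.
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10_000)


if __name__ == "__main__":
    sample = "a" * 25_000
    print(f"Estimated cost: ${estimate_tts_cost(sample)}")
```

`Decimal` is used instead of `float` so that small per-character amounts do not accumulate binary rounding error.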
Rate Limits
- RPM (Requests Per Minute): 180
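A client can stay under the 180 RPM limit by throttling itself before sending requests. The following is a minimal sketch of a sliding-window limiter; the class name `SlidingWindowLimiter` and its parameters are hypothetical, and it assumes the limit is enforced over a rolling 60-second window, which the page does not specify.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls per `window_seconds`.

    Assumption: the service counts requests over a rolling window.
    """

    def __init__(self, max_requests=180, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests, oldest first

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._sent and now - self._sent[0] >= self._window:
                self._sent.popleft()
            if len(self._sent) < self._max:
                self._sent.append(now)
                return
            # Wait until the oldest request leaves the window.
            self._sleep(self._window - (now - self._sent[0]))
```

Calling `limiter.acquire()` immediately before each synthesis request keeps the client within the limit. The clock and sleep functions are injectable so the limiter can be tested without real waiting.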
API Reference