Qwen3-TTS-VC
Text-to-Speech
Overview
Qwen3-TTS-Flash is Tongyi's latest real-time speech synthesis model. It performs high-fidelity, real-time synthesis with voices cloned through the qwen-voice-enrollment service, and it can speak 11 languages in the same cloned timbre. Trained on large-scale data, it adapts its intonation to the input text and handles complex text well. This model is a snapshot version from January 22, 2026.
Input: Text
Output: Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.115 per 10,000 characters
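The rate above can be turned into a rough budget estimate. This is a minimal sketch, assuming each character of the input string is billed as exactly one character and that charges scale linearly with no minimum or rounding; the service's real counting rules (for example, for CJK characters, punctuation, or whitespace) may differ. The function name `estimate_tts_cost_usd` is hypothetical.

```python
# Price taken from the Pricing section; the billing model below is an assumption.
PRICE_USD_PER_10K_CHARS = 0.115


def estimate_tts_cost_usd(text: str) -> float:
    """Estimate the TTS cost of `text` in USD.

    Assumes one billable character per Python str character and purely
    linear billing (no rounding up to a minimum unit).
    """
    return len(text) * PRICE_USD_PER_10K_CHARS / 10_000


if __name__ == "__main__":
    # One million characters at $0.115 per 10,000 characters.
    print(estimate_tts_cost_usd("a" * 1_000_000))  # 11.5
```

Treat the result as a lower-bound planning figure and check it against actual billing records.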
Rate Limits
- RPM (requests per minute): 180
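To stay under the 180 RPM limit, a client can throttle itself before sending each request. The sketch below uses a sliding-window limiter, which is one common technique and not necessarily how the service enforces its limit (it may use a fixed window or a token bucket), so treat it as a conservative local guard. The class `SlidingWindowLimiter` is hypothetical; the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import collections
import time
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `window_s`-second window."""

    def __init__(
        self,
        max_requests: int = 180,  # RPM from the Rate Limits section
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = collections.deque()  # send timestamps, oldest first

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have aged out of the window.
            while self._sent and now - self._sent[0] >= self.window_s:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self._sleep(self._sent[0] + self.window_s - now)


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    for i in range(3):
        limiter.acquire()  # call this right before each synthesis request
        print("request", i, "allowed")
```

Calling `acquire()` before every request keeps a single process under the limit; multiple processes sharing one API key would need a shared limiter instead.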
API Reference