Qwen3-TTS-Flash-Realtime
Text-to-Speech
Overview
Qwen3-TTS-Flash-Realtime is Tongyi's latest large-scale real-time speech synthesis model. It offers 51 expressive, human-like timbres and synthesizes audio in real time with low latency and high stability. It supports multiple languages and dialects and can produce multilingual output in a single timbre. Trained on large amounts of data, the model adapts its tone to the text and handles complex text well. This model is provided as a snapshot version.
Input
Text
Output
Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.13 per 10,000 characters
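As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes billing scales linearly with character count and that a "character" corresponds to one Python string element (`len(text)`); the service's actual counting rules are not stated here and may differ. The function name `estimate_tts_cost` is hypothetical.

```python
from decimal import Decimal

# Listed price: USD 0.13 per 10,000 characters.
PRICE_PER_10K_CHARS = Decimal("0.13")
CHARS_PER_UNIT = Decimal(10000)


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the synthesis cost in USD for `text`.

    Assumption: cost is linear in len(text). How the service actually
    counts characters (for example for CJK text or markup) is not
    specified on this page.
    """
    return PRICE_PER_10K_CHARS * len(text) / CHARS_PER_UNIT


if __name__ == "__main__":
    sample = "Hello, world! " * 1000  # 14,000 characters
    print(f"{len(sample)} characters cost about ${estimate_tts_cost(sample)}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.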
Rate Limits
- RPM (Requests Per Minute): 180
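A client can stay under the listed limit by throttling on its own side. The sketch below is one possible approach, assuming the limit is counted per request over a rolling 60-second window; the page does not say how the service measures the window, and `RequestThrottle` is a hypothetical helper, not part of any SDK.

```python
import time
from collections import deque


class RequestThrottle:
    """Client-side sliding-window limiter (illustrative sketch only)."""

    def __init__(self, max_requests=180, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._sent = deque()   # timestamps of requests inside the window

    def _expire(self, now):
        # Drop timestamps that have left the rolling window.
        while self._sent and now - self._sent[0] >= self._window:
            self._sent.popleft()

    def acquire(self):
        """Block until one more request fits in the window, then record it."""
        now = self._clock()
        self._expire(now)
        if len(self._sent) >= self._max:
            # Wait until the oldest recorded request leaves the window.
            self._sleep(self._window - (now - self._sent[0]))
            now = self._clock()
            self._expire(now)
        self._sent.append(now)
```

Call `throttle.acquire()` immediately before each request. The limiter is single-threaded; a multi-threaded client would need a lock around `acquire`.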
API Reference