qwen3-tts-instruct-flash-realtime
Text-to-Speech
Overview
Qwen3-TTS-Flash is Tongyi's latest real-time speech synthesis model. The Instruct variant lets you steer the synthesis with natural-language instructions, so that the emotion and expressiveness of the speech fit the context. Instruct control is currently supported for 25 timbres (voices), in both Chinese and English. This model is a snapshot version from January 22, 2026.
Input: Text
Output: Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.143 per 10,000 characters
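
As a rough illustration of the arithmetic behind this price, the sketch below estimates the cost of synthesizing a piece of text. It assumes that every Unicode code point is billed as one character, which this page does not confirm; the actual counting rules may differ. The function name `estimate_tts_cost` is hypothetical and not part of any SDK.

```python
from decimal import Decimal

# Listed price: USD 0.143 per 10,000 characters.
PRICE_PER_10K_CHARS = Decimal("0.143")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the synthesis cost of `text` in USD.

    Assumption: one billed character per Unicode code point.
    The service's real character-counting rules may differ.
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10_000)


if __name__ == "__main__":
    sample = "Hello, world! " * 1000  # 14,000 characters
    print(f"{len(sample)} characters cost about ${estimate_tts_cost(sample)}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift.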
Rate Limits
- RPM (Requests Per Minute): 180
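
A limit of 180 requests per minute works out to one request every 1/3 second on average. One way to stay under it on the client side is to space requests evenly, as in the minimal sketch below. The class `RpmThrottle` is a hypothetical helper, not part of any official SDK, and the service may enforce its limit differently (for example, over a sliding window), so treat this as a conservative approximation.

```python
import time


class RpmThrottle:
    """Space calls evenly so that no more than `rpm` happen per minute.

    Hypothetical client-side helper; `clock` and `sleep` are injectable
    so the behavior can be checked without real waiting.
    """

    def __init__(self, rpm: int = 180, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm  # minimum seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next request may be sent; return the delay used."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot one interval after this request's slot.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Call `throttle.wait()` immediately before each synthesis request; the first call returns at once, and later calls sleep only as long as needed to keep the spacing.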
API Reference