Qwen3-TTS-Flash-Realtime
Text-to-Speech
Overview
The Qwen3-TTS-Flash-Realtime-2025-09-18 model is Tongyi's latest real-time speech synthesis foundation model. It offers more than 50 expressive voices and delivers low-latency, high-stability audio synthesis. It supports multilingual and dialect output with consistent voice characteristics across languages. Trained on extensive datasets, the model automatically adjusts vocal tone based on text semantics and handles complex content robustly. This model is a snapshot version dated September 18, 2025.
Input
Text
Output
Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.13 per 10,000 characters
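As a rough illustration of the pricing above, the following sketch estimates the cost of synthesizing a piece of text. The function name `estimate_tts_cost_usd` is hypothetical, and the sketch assumes that billing is proportional to Python's `len(text)` with no rounding or minimum charge; the page does not state how characters are counted, so treat the result as an estimate only.

```python
# Price taken from the Pricing section: $0.13 per 10,000 characters.
PRICE_USD_PER_10K_CHARS = 0.13


def estimate_tts_cost_usd(text: str) -> float:
    """Estimate synthesis cost in USD.

    Assumption: cost scales linearly with len(text); the service's actual
    character-counting and rounding rules may differ.
    """
    return len(text) / 10_000 * PRICE_USD_PER_10K_CHARS


if __name__ == "__main__":
    sample = "x" * 25_000  # 25,000 characters of input
    print(f"${estimate_tts_cost_usd(sample):.4f}")  # $0.3250
```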
Rate Limits
- RPM (Requests Per Minute): 10
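One simple way a client could stay under a limit of 10 requests per minute is to space requests at least 6 seconds apart. The sketch below shows that idea; the `RequestPacer` class is hypothetical and not part of any SDK. Even spacing is stricter than the limit strictly requires, and how the service measures its one-minute window is not documented here, so this is a conservative assumption rather than the service's own algorithm.

```python
import time


class RequestPacer:
    """Client-side pacer that enforces a minimum interval between requests.

    Assumption: spacing requests 60 / rpm seconds apart keeps the client
    within an rpm-requests-per-minute limit, however the window is measured.
    """

    def __init__(self, rpm=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # 6.0 seconds for 10 RPM
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

A caller would invoke `pacer.wait()` immediately before each synthesis request. The clock and sleep functions are injectable so the pacing logic can be exercised without real delays.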
API Reference