Qwen3-TTS-Flash-Realtime

Text-to-Speech

Overview

Qwen3-TTS-Flash-Realtime is Tongyi's latest large-scale real-time speech synthesis model. It offers 51 expressive, human-like timbres and synthesizes audio in real time with low latency and high stability. It supports multiple languages and dialects and can produce multilingual output with the same timbre. Trained on large amounts of data, the model adapts its tone to the text and handles complex text well. This model is provided as a snapshot version.

Input

Text

Output

Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • TTS
    $0.13 per 10,000 characters
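
As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text. It assumes billing is linear in character count and that each Python string character counts as one billable character; the page does not state how characters are counted or rounded, so treat the result as an estimate only. The function name `estimate_tts_cost` is hypothetical.

```python
from decimal import Decimal

# Listed price: 0.13 USD per 10,000 characters.
PRICE_PER_10K_CHARS = Decimal("0.13")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the synthesis cost of `text` in USD.

    Assumption: billing is linear per character and len(text)
    matches the billable character count (not confirmed by the page).
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10_000)


if __name__ == "__main__":
    sample = "Hello, world. " * 500  # 7,000 characters
    print(f"{len(sample)} characters cost about ${estimate_tts_cost(sample)}")
```

Using `Decimal` instead of `float` avoids binary rounding noise when many small amounts are summed.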

Rate Limits

  • RPM (requests per minute)
    180
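
A limit of 180 requests per minute works out to one request every 60 / 180 ≈ 0.33 seconds if requests are spaced evenly. The sketch below paces a client on that basis. The page does not say how the limit is enforced (fixed window, sliding window, or otherwise) or what counts as a request for a real-time session, so even spacing is a conservative assumption, and the `RequestPacer` class is a hypothetical helper, not part of any SDK.

```python
import time

RPM_LIMIT = 180                   # listed rate limit
MIN_INTERVAL = 60.0 / RPM_LIMIT   # seconds between requests, about 0.333


class RequestPacer:
    """Spaces calls at least `min_interval` seconds apart (client-side only)."""

    def __init__(self, min_interval=MIN_INTERVAL,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
        return delay


if __name__ == "__main__":
    pacer = RequestPacer()
    for i in range(3):
        waited = pacer.wait()
        # A real client would send its synthesis request here.
        print(f"request {i}: waited {waited:.3f}s")
```

Call `wait()` immediately before each request; the clock and sleep functions are injectable so the pacing logic can be tested without real delays.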

API Reference
