Qwen3-TTS-Flash-Realtime

Text-to-Speech

Overview

Qwen3-TTS-Flash-Realtime-2025-09-18 is Tongyi's latest real-time speech synthesis foundation model. It offers more than 50 expressive voices and delivers low-latency, high-stability audio synthesis. It supports multilingual and dialect output while keeping voice characteristics consistent across languages. Trained on extensive datasets, the model automatically adjusts vocal tone to the semantics of the text and handles complex content robustly. This is a snapshot version dated September 18, 2025.

Input

Text

Output

Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • TTS
    $0.13 per 10,000 characters
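
As a rough illustration of the listed rate, the sketch below estimates the cost of synthesizing a piece of text at $0.13 per 10,000 characters. The function name `estimate_tts_cost` is hypothetical, and the counting rule is an assumption: the page does not say how characters are billed, so the sketch treats each Unicode code point as one character. Actual billing may count some scripts differently.

```python
from decimal import Decimal

# Listed price: USD 0.13 per 10,000 characters.
PRICE_PER_10K_CHARS = Decimal("0.13")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the synthesis cost in USD.

    Assumption: one billed character per Unicode code point (len(text)).
    The real billing rule is not stated on this page.
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10000)


if __name__ == "__main__":
    sample = "Hello, world. " * 100  # 1,400 characters
    print(f"{len(sample)} characters cost about ${estimate_tts_cost(sample)}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without rounding drift.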

Rate Limits

  • RPM (Requests Per Minute)
    10
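
A client can stay under a limit of 10 requests per minute by spacing calls at least 6 seconds apart. The sketch below is one conservative way to do that; `RequestPacer` is a hypothetical helper, not part of any SDK, and even spacing is an assumption, since the page does not say whether the service enforces the limit with a fixed or a sliding window.

```python
import time


class RequestPacer:
    """Spaces calls evenly so that at most `rpm` start per minute.

    Hypothetical client-side helper; the server's actual enforcement
    scheme is not documented on this page.
    """

    def __init__(self, rpm: int = 10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # 6 seconds at 10 RPM
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest start time of the next call

    def wait(self) -> float:
        """Block until the next request may start; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    pacer = RequestPacer(rpm=10)
    for i in range(3):
        waited = pacer.wait()
        # A real client would send its synthesis request here.
        print(f"request {i} may start (waited {waited:.1f}s)")
```

The clock and sleep functions are injectable so the pacing logic can be tested without real delays.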

API Reference
