Qwen3-LiveTranslate-Flash-Realtime
Real-time Speech Translation
Overview
Qwen3-LiveTranslate-Flash-Realtime is the real-time version of Qwen3-LiveTranslate-Flash, a high-precision, highly responsive, and robust multilingual model for simultaneous audio and video interpretation. Building on Qwen3-Omni's infrastructure, large-scale multimodal data, cross-language and cross-modal alignment, and visual enhancement technologies, Qwen3-LiveTranslate-Flash offers both offline and real-time audio and video translation. It can understand 19 languages and speak 10 languages, including 8 Chinese dialects.
Input
Image, Audio
Output
Text, Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input (Audio): $10 per 1M tokens
- Input (Image): $1.3 per 1M tokens
- Output (Text): $10 per 1M tokens
- Output (Audio): $38 per 1M tokens
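As a rough illustration of how these per-token prices combine, the sketch below estimates the cost of a session from token counts. The prices are copied from the list above; the function name `estimate_cost`, the usage layout, and the example token counts are hypothetical and are not part of any official SDK or billing API.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
PRICE_PER_MILLION = {
    ("input", "audio"): 10.0,
    ("input", "image"): 1.3,
    ("output", "text"): 10.0,
    ("output", "audio"): 38.0,
}


def estimate_cost(usage):
    """Return the estimated cost in USD.

    `usage` maps (direction, modality) pairs to token counts.
    This is a hypothetical helper for illustration only.
    """
    return sum(
        PRICE_PER_MILLION[key] * tokens / 1_000_000
        for key, tokens in usage.items()
    )


# Made-up token counts for one translation session.
example_usage = {
    ("input", "audio"): 200_000,
    ("output", "audio"): 50_000,
    ("output", "text"): 10_000,
}
cost = estimate_cost(example_usage)
print(f"Estimated cost: ${cost:.2f}")
```

Because audio output is priced well above the other modalities, the share of output audio tokens is likely to dominate the bill for long sessions.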
Context
53.24K
Max Input
49.15K
Max Output
4.09K
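The three figures above are consistent with each other: 49.15K of input plus 4.09K of output adds up to the 53.24K context. A minimal sketch of a pre-flight check follows. It uses the rounded values shown on this page, so the exact limits enforced by the service may differ slightly, and `fits_context` is a hypothetical helper, not an API call.

```python
# Rounded limits as displayed on this page; exact values are an assumption.
MAX_INPUT = 49_150
MAX_OUTPUT = 4_090
CONTEXT = 53_240


def fits_context(input_tokens, output_tokens):
    """Check a request against the input, output, and total context limits."""
    return (
        input_tokens <= MAX_INPUT
        and output_tokens <= MAX_OUTPUT
        and input_tokens + output_tokens <= CONTEXT
    )


print(fits_context(40_000, 4_000))  # within all three limits
print(fits_context(50_000, 1_000))  # input exceeds Max Input
```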
Rate Limits
- RPM (Requests Per Minute): 10
- TPM (Tokens Per Minute): 100K
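With limits this low, a client may want to throttle itself rather than rely on rejected requests. The sketch below is one possible client-side approach, assuming the service counts requests and tokens over a sliding 60-second window; the page does not state how the window is measured or which tokens count toward TPM. The class `SlidingWindowLimiter` and its methods are hypothetical names.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle for the RPM and TPM limits listed above."""

    def __init__(self, rpm=10, tpm=100_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) for each recorded request

    def _prune(self, now):
        # Drop requests that have left the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def wait_time(self, tokens):
        """Seconds to wait before a request of `tokens` fits both limits."""
        if tokens > self.tpm:
            raise ValueError("request exceeds the per-minute token limit")
        now = self.clock()
        self._prune(now)
        requests = len(self.events)
        used = sum(t for _, t in self.events)
        wait = 0.0
        # Expire the oldest requests one by one until the new one fits.
        for timestamp, spent in self.events:
            if requests < self.rpm and used + tokens <= self.tpm:
                break
            wait = timestamp + self.window - now
            requests -= 1
            used -= spent
        return wait

    def record(self, tokens):
        """Register a request that was just sent."""
        self.events.append((self.clock(), tokens))
```

A caller would typically sleep for `wait_time(tokens)` seconds, send the request, and then call `record(tokens)`. The injectable `clock` argument exists only so the limiter can be exercised without real delays.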
API Reference
Get API Key