Qwen-Omni-Turbo-Realtime
Real-time Omni-modality
Overview
Qwen-Omni-Turbo-Realtime is the real-time version of Qwen-Omni-Turbo, a large multimodal model for understanding and generation, designed for real-time audio interaction. It accepts audio mixed with text, images, and video as input, streams speech and text output simultaneously, and offers four natural conversational voice styles.
Input
Text, Image, Audio
Output
Text, Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input (Text): $0.27 per 1M tokens
- Input (Audio): $4.44 per 1M tokens
- Input (Vision): $0.84 per 1M tokens
- Output (Text, when input contains only text): $1.07 per 1M tokens
- Output (Text, when input contains images/audio/video): $2.52 per 1M tokens
- Output (Text & Audio; output text is not charged): $8.89 per 1M tokens
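The pricing tiers above can be combined into a rough per-request estimate. The sketch below is only illustrative: the `Usage` fields and the `estimate_cost_usd` helper are hypothetical names, not part of any official SDK, and it assumes that when audio is produced only the audio output tokens are billed (at $8.89 per 1M) while the accompanying text is free, as the last pricing line suggests.

```python
from dataclasses import dataclass

# USD per 1M tokens, copied from the pricing list above.
PRICE_PER_MILLION = {
    "input_text": 0.27,
    "input_audio": 4.44,
    "input_vision": 0.84,
    "output_text_text_only_input": 1.07,
    "output_text_multimodal_input": 2.52,
    "output_audio": 8.89,
}


@dataclass
class Usage:
    """Token counts for one request (hypothetical structure for illustration)."""
    input_text: int = 0
    input_audio: int = 0
    input_vision: int = 0
    output_text: int = 0
    output_audio: int = 0


def estimate_cost_usd(usage: Usage) -> float:
    """Estimate the cost of one request in USD under the assumptions stated above."""
    cost = (
        usage.input_text * PRICE_PER_MILLION["input_text"]
        + usage.input_audio * PRICE_PER_MILLION["input_audio"]
        + usage.input_vision * PRICE_PER_MILLION["input_vision"]
    )
    if usage.output_audio > 0:
        # Assumption: with audio output, only audio tokens are billed; text is free.
        cost += usage.output_audio * PRICE_PER_MILLION["output_audio"]
    else:
        # Text-only output: the rate depends on whether the input was multimodal.
        multimodal_input = usage.input_audio > 0 or usage.input_vision > 0
        key = (
            "output_text_multimodal_input"
            if multimodal_input
            else "output_text_text_only_input"
        )
        cost += usage.output_text * PRICE_PER_MILLION[key]
    return cost / 1_000_000
```

For example, `estimate_cost_usd(Usage(input_audio=20_000, output_audio=2_000))` prices a short voice exchange using the audio input and audio output rates only.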
Context
- Context: 32.76K
- Max Input: 30.72K
- Max Output: 2.04K
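The three limits are related: the maximum input and maximum output add up to the context size. The rounded figures are consistent with 30,720 + 2,048 = 32,768 tokens, although the exact values and the unit (tokens) are an inference from the rounded numbers, not something the page states. A minimal pre-flight check under that assumption, with a hypothetical `fits_context` helper, might look like this:

```python
# Assumed exact values behind the rounded figures "30.72K", "2.04K" and "32.76K".
MAX_INPUT_TOKENS = 30_720
MAX_OUTPUT_TOKENS = 2_048
CONTEXT_TOKENS = 32_768


def fits_context(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the input, output and context limits."""
    return (
        0 <= input_tokens <= MAX_INPUT_TOKENS
        and 0 <= requested_output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + requested_output_tokens <= CONTEXT_TOKENS
    )
```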
Rate Limits
- RPM (Requests Per Minute): 60
- TPM (Tokens Per Minute): 10K
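To stay under these limits, a client can throttle itself before sending. The following is a minimal sketch of a client-side sliding-window limiter; the class name and its parameters are hypothetical, "10K" is taken to mean 10,000 tokens, and the way the service itself measures the one-minute window is not documented here, so treat the window logic as an assumption.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Client-side guard for requests-per-minute and tokens-per-minute limits."""

    def __init__(
        self,
        rpm: int = 60,
        tpm: int = 10_000,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rpm = rpm
        self.tpm = tpm
        self.window_s = window_s
        self._clock = clock
        # Each entry is (timestamp, tokens) for a request admitted in the window.
        self._events: Deque[Tuple[float, int]] = deque()
        self._tokens_in_window = 0

    def _evict_expired(self, now: float) -> None:
        # Drop requests that are at least one full window old.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both limits, else False."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._events) >= self.rpm:
            return False
        if self._tokens_in_window + tokens > self.tpm:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and wait or back off when it returns `False`. The injectable `clock` exists only to make the limiter easy to test.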
API Reference
Get API Key