Qwen3.5-Omni-Flash-Realtime
Real-time Omni-modality
Overview
Qwen3.5-Omni is the latest generation of Qwen's multimodal large model, supporting understanding of and interaction with text, image, audio, and audio-visual input. An evolution of Qwen3-Omni, it supports audio input in 60+ languages and voice output in 30+ languages, along with controllable voice dialogue, WebSearch, complex FunctionCall invocation, and intelligent semantic interruption. It is widely used in scenarios such as text creation, voice assistants, and multimedia analysis, where it provides natural, smooth multimodal interaction.
Input
Text, Image, Video, Audio
Output
Text, Audio
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input: Audio, $4.5 per 1M tokens
- Output: Text & Audio (output text is not charged), $17.7 per 1M tokens
- Input: Text/Image/Video, $0.55 per 1M tokens
- Output: Text, $3.3 per 1M tokens
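The per-token prices above lend themselves to a quick cost estimate. The following is a minimal sketch, not an official billing formula: the category names and the `estimate_cost` helper are hypothetical, and it assumes that each token is billed once at the rate of its category, with text produced alongside audio output simply left out of the counts because it is not charged.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
PRICE_PER_MILLION = {
    "input_audio": 4.5,
    "input_text_image_video": 0.55,
    "output_text_only": 3.3,   # output when the response is text only
    "output_audio": 17.7,      # text & audio output; the text part is not charged
}


def estimate_cost(tokens: dict[str, int]) -> float:
    """Estimate the cost in USD for token counts keyed by price category."""
    total = 0.0
    for category, count in tokens.items():
        if category not in PRICE_PER_MILLION:
            raise KeyError(f"unknown price category: {category}")
        if count < 0:
            raise ValueError("token counts must be non-negative")
        total += count * PRICE_PER_MILLION[category] / 1_000_000
    return total


if __name__ == "__main__":
    # Example: a voice session with 200K audio input tokens and 50K audio output tokens.
    session = {"input_audio": 200_000, "output_audio": 50_000}
    print(f"Estimated cost: ${estimate_cost(session):.4f}")
```

Actual charges depend on how the service counts tokens for each modality, which this page does not specify.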
Context
262.14K
Max Input
196.60K
Max Output
65.53K
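A request can be checked against these limits before it is sent. The sketch below assumes the rounded figures stand for the powers-of-two values 262,144, 196,608, and 65,536 tokens; that reading is an assumption, not something the page states. The `fits_limits` helper is hypothetical.

```python
# Assumed exact values behind the rounded figures shown above.
CONTEXT_WINDOW = 262_144  # "262.14K"
MAX_INPUT = 196_608       # "196.60K"
MAX_OUTPUT = 65_536       # "65.53K"


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the input, output, and context limits."""
    if input_tokens < 0 or requested_output_tokens < 0:
        return False
    return (
        input_tokens <= MAX_INPUT
        and requested_output_tokens <= MAX_OUTPUT
        and input_tokens + requested_output_tokens <= CONTEXT_WINDOW
    )


if __name__ == "__main__":
    print(fits_limits(150_000, 8_000))  # within all three limits
    print(fits_limits(200_000, 8_000))  # exceeds the assumed max input
```

Under this assumption, the maximum input and maximum output add up exactly to the context window.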
Rate Limits
- RPM (Requests Per Minute): 60
- TPM (Tokens Per Minute): 100K
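A client can throttle itself to stay under both limits. The following is a minimal sketch of a sliding-window limiter. The `SlidingWindowLimiter` class is hypothetical, and it assumes that both limits apply over a rolling 60-second window and that 100K means 100,000 tokens; the page does not say how the service measures its windows.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for a requests-per-minute and tokens-per-minute budget."""

    def __init__(self, max_requests: int = 60, max_tokens: int = 100_000,
                 window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self.window_seconds = window_seconds
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, tokens)

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Record a request at time `now` if it fits both budgets; return whether it did."""
        # Drop events that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()
        used_tokens = sum(t for _, t in self._events)
        if len(self._events) >= self.max_requests:
            return False
        if used_tokens + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        return True


if __name__ == "__main__":
    import time

    limiter = SlidingWindowLimiter()
    if limiter.try_acquire(tokens=2_000, now=time.monotonic()):
        print("request allowed")
    else:
        print("request deferred")
```

Passing the clock value in as `now` keeps the limiter deterministic. In practice a caller would pass `time.monotonic()` and retry after a short delay when `try_acquire` returns `False`.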
Built-in Tools
search_strategy: agent
Completions API
API Reference
Get API Key