Qwen3.5-Flash
Reasoning, Text Generation, Visual Understanding
Overview
The Qwen3.5 Flash models are native vision-language models built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design to improve inference efficiency. Compared to the Qwen3 series, they deliver a substantial improvement on both pure-text and multimodal tasks, and they are tuned for fast responses, balancing inference speed against overall performance.
Input
Text, Image, Video
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input: $0.1 per 1M tokens
- Output: $0.4 per 1M tokens
- Explicit Cache Creation: $0.125 per 1M tokens
- Explicit Cache Read: $0.01 per 1M tokens
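The per-token prices above lend themselves to a quick cost estimate. The following is a minimal sketch, not an official billing formula: the function name `estimate_cost` and its parameters are hypothetical, and it assumes each token is billed in exactly one of the four listed categories, with no tiering or discounts applied.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
PRICE_PER_MILLION = {
    "input": 0.1,
    "output": 0.4,
    "cache_creation": 0.125,
    "cache_read": 0.01,
}


def estimate_cost(input_tokens=0, output_tokens=0,
                  cache_creation_tokens=0, cache_read_tokens=0):
    """Return an estimated cost in USD for the given token counts.

    Assumption (not stated on the page): every token falls into exactly
    one category, and the listed rates apply linearly.
    """
    counts = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_creation": cache_creation_tokens,
        "cache_read": cache_read_tokens,
    }
    return sum(
        counts[name] * PRICE_PER_MILLION[name] / 1_000_000
        for name in counts
    )


if __name__ == "__main__":
    # Example: 2M uncached input tokens and 500K output tokens.
    print(f"${estimate_cost(input_tokens=2_000_000, output_tokens=500_000):.4f}")
```

Actual charges may differ if the provider applies rules that the listing does not show.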
Context
Standard
- Context: 1M tokens
- Max Input: 991.80K tokens
- Max Output: 65.53K tokens
Thinking
- Context: 1M tokens
- Max Input: 983.61K tokens
- Max Output: 65.53K tokens
- Max Reasoning: 81.92K tokens
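A request can be checked against these per-mode limits before it is sent. The sketch below is illustrative only: `check_request` and the `LIMITS` table are hypothetical names, and the numbers are the rounded values displayed above with K read as 1,000, so the exact limits enforced by the service may differ slightly.

```python
# Limits as displayed on the page, with K = 1,000 and M = 1,000,000.
# These are rounded display values; the enforced limits may differ.
LIMITS = {
    "standard": {
        "context": 1_000_000,
        "max_input": 991_800,
        "max_output": 65_530,
    },
    "thinking": {
        "context": 1_000_000,
        "max_input": 983_610,
        "max_output": 65_530,
        "max_reasoning": 81_920,
    },
}


def check_request(mode, input_tokens, max_output_tokens, max_reasoning_tokens=0):
    """Return a list of limit violations for a planned request (empty if none)."""
    limits = LIMITS[mode]
    problems = []
    if input_tokens > limits["max_input"]:
        problems.append(f"input {input_tokens} exceeds {limits['max_input']}")
    if max_output_tokens > limits["max_output"]:
        problems.append(f"output {max_output_tokens} exceeds {limits['max_output']}")
    # Standard mode lists no reasoning budget, so any reasoning request is flagged.
    if max_reasoning_tokens > limits.get("max_reasoning", 0):
        problems.append(
            f"reasoning {max_reasoning_tokens} exceeds {limits.get('max_reasoning', 0)}"
        )
    return problems


if __name__ == "__main__":
    print(check_request("thinking", 990_000, 60_000, 80_000))
```

The check treats each limit independently because the page does not say how input, output, and reasoning tokens combine within the 1M context.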
Rate Limits
- RPM (Requests Per Minute): 15K
- TPM (Tokens Per Minute): 5M
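The two limits interact: whichever is reached first caps throughput. As a rough sketch, the hypothetical helper `max_requests_per_minute` below computes the sustainable request rate for a given average request size. It assumes that TPM counts all tokens in a request and that K and M mean 1,000 and 1,000,000; neither is confirmed by the page.

```python
RPM_LIMIT = 15_000       # requests per minute, from the list above
TPM_LIMIT = 5_000_000    # tokens per minute, from the list above


def max_requests_per_minute(avg_tokens_per_request):
    """Return the request rate allowed by both limits for a given request size.

    Assumption: every token in a request counts toward TPM.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    # The token budget allows this many whole requests per minute.
    by_tokens = TPM_LIMIT // avg_tokens_per_request
    return min(RPM_LIMIT, by_tokens)


if __name__ == "__main__":
    for size in (100, 1_000, 50_000):
        print(size, max_requests_per_minute(size))
```

Under these assumptions, requests averaging more than about 333 tokens (5M / 15K) are bounded by TPM rather than RPM.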
Built-in Tools
web_search (Responses API)
web_extractor (Responses API)
code_interpreter (Responses API)
t2i_search (Responses API)
i2i_search (Responses API)
API Reference