Qwen3-VL-Flash
Copied!
Try AIAdd to Compare
Visual Understanding
Overview
Visual Understanding
The Qwen3 series of small-scale visual understanding models effectively integrates thinking and non-thinking modes, delivering superior performance compared to the open-source Qwen3-VL-30B-A3B while maintaining fast response speeds. It features a comprehensive upgrade in image/video understanding, supporting ultra-long contexts such as extended videos and documents, spatial awareness, and object recognition across various domains. Equipped with 2D/3D visual localization capabilities, it is well-suited for tackling complex real-world tasks.
Input
TextImageVideo
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input$0.05Per 1M tokens
- Output$0.4Per 1M tokens
- Input(Implicit Cache)$0.01Per 1M tokens
- Input(Batch File)$0.025Per 1M tokens
- Output(Batch File)$0.2Per 1M tokens
- Explicit Cache Creation$0.0625Per 1M tokens
- Explicit Cache Read$0.005Per 1M tokens
- Input$0.05Per 1M tokens
- Output$0.4Per 1M tokens
- Input(Implicit Cache)$0.01Per 1M tokens
- Input(Batch File)$0.025Per 1M tokens
- Output(Batch File)$0.2Per 1M tokens
- Explicit Cache Creation$0.0625Per 1M tokens
- Explicit Cache Read$0.005Per 1M tokens
Context
Context
262.14K
Max Input
258.04K
Max Output
32.76K
Rate Limits
- RPMRequests Per Minute1.20K
- TPMTokens Per Minute1M
API Reference
Get API KeyCopied!
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263