Qwen3-VL-Flash
Copied!
Try AIAdd to Compare
Visual Understanding
Overview
Visual Understanding
The Qwen3 series of small-sized visual understanding models effectively integrates thinking and non-thinking modes. Compared with the snapshot taken on October 15, 2025, the overall performance of the model has improved significantly: it delivers enhanced capabilities in general visual recognition and reasoning, and shows marked improvements in recognition accuracy across various business scenarios such as security, in-store inspections, equipment monitoring, and photo-based problem solving. This version is a snapshot as of January 22, 2026.
Input
TextImageVideo
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input$0.05Per 1M tokens
- Output$0.4Per 1M tokens
- Input$0.05Per 1M tokens
- Output$0.4Per 1M tokens
Context
Context
262.14K
Max Input
258.04K
Max Output
32.76K
Rate Limits
- RPMRequests Per Minute60
- TPMTokens Per Minute100K
API Reference
Get API KeyCopied!
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263