Qwen3-VL-Plus
Copied!
Try AIAdd to Compare
Visual Understanding
Overview
Visual Understanding
The Qwen3 series VL models effectively integrates thinking and non-thinking modes, achieving world-leading performance in visual agent capabilities on public benchmark datasets such as OS World. This version features comprehensive upgrades in areas like visual coding, spatial perception, and multimodal reasoning, significantly enhancing visual perception and recognition abilities, and supporting the understanding of ultra-long videos.This version is a snapshot as of September 23, 2025
Input
TextImageVideo
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input$0.2Per 1M tokens
- Output$1.6Per 1M tokens
- Input$0.2Per 1M tokens
- Output$1.6Per 1M tokens
Context
Context
262.14K
Max Input
258.04K
Max Output
32.76K
Rate Limits
- RPMRequests Per Minute120
- TPMTokens Per Minute1M
API Reference
Get API KeyCopied!
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263