Qwen3-VL-Plus
Visual Understanding
Overview
The Qwen3 series VL models integrate thinking and non-thinking modes and achieve world-leading performance in visual agent capabilities on public benchmarks such as OSWorld. This version features comprehensive upgrades in visual coding, spatial perception, and multimodal reasoning, significantly improves visual perception and recognition, and supports the understanding of ultra-long videos.
Input
Text, Image, Video
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input: $0.2 per 1M tokens
- Output: $1.6 per 1M tokens
- Input (Implicit Cache): $0.04 per 1M tokens
- Explicit Cache Creation: $0.25 per 1M tokens
- Explicit Cache Read: $0.02 per 1M tokens
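As a rough illustration of how the per-token rates above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and its parameters are hypothetical, and the billing rule for cached tokens (charged at the implicit-cache rate instead of the standard input rate) is an assumption, not something this page states.

```python
# Per-million-token rates in USD, copied from the pricing list above.
RATES_PER_MILLION = {
    "input": 0.2,
    "output": 1.6,
    "implicit_cache_input": 0.04,
}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one request.

    Assumption: input tokens served from the implicit cache are billed at
    the implicit-cache rate instead of the standard input rate.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * RATES_PER_MILLION["input"]
        + cached_input_tokens * RATES_PER_MILLION["implicit_cache_input"]
        + output_tokens * RATES_PER_MILLION["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 100,000 uncached input tokens and 2,000 output tokens.
    print(f"${estimate_cost(100_000, 2_000):.4f}")  # prints "$0.0232"
```

Because output tokens cost eight times as much as input tokens at these rates, long generations can dominate the bill even when the prompt is large.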
Context
- Context: 262.14K tokens
- Max Input: 258.04K tokens
- Max Output: 32.76K tokens
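The listed maximum input and maximum output add up to more than the context size, so a request presumably cannot use both maxima at once. The sketch below checks a request against all three limits. The helper names are hypothetical, the integer constants are the page's rounded figures rather than exact limits, and the rule that input and output share the context window is an assumption.

```python
# Limits as listed above. The page shows rounded values (e.g. "262.14K"),
# so these integers are approximations, not exact limits.
CONTEXT_WINDOW = 262_140
MAX_INPUT = 258_040
MAX_OUTPUT = 32_760


def fits_limits(input_tokens, max_output_tokens):
    """Return True if a request stays within all three listed limits.

    Assumption: input and output tokens share the context window.
    """
    return (
        0 <= input_tokens <= MAX_INPUT
        and 0 < max_output_tokens <= MAX_OUTPUT
        and input_tokens + max_output_tokens <= CONTEXT_WINDOW
    )


def max_output_for(input_tokens):
    """Largest output budget that still fits, or 0 if the input is too large."""
    if not 0 <= input_tokens <= MAX_INPUT:
        return 0
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - input_tokens))
```

Under these assumptions, a prompt at the maximum input size leaves room for only about 4.1K output tokens, while prompts under roughly 229K tokens can use the full output budget.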
Rate Limits
- RPM (Requests Per Minute): 1.20K
- TPM (Tokens Per Minute): 1M
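A client can stay under these limits by tracking its own recent traffic. The following is a minimal sketch of a sliding-window limiter, assuming that 1.20K means 1,200 requests and 1M means 1,000,000 tokens per rolling 60-second window. The class name `SlidingWindowLimiter` is hypothetical, and how the service actually measures its windows is not stated on this page.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter for requests and tokens over a rolling window."""

    def __init__(self, rpm=1_200, tpm=1_000_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = deque()  # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that have aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both limits."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `try_acquire(estimated_tokens)` before each request and wait briefly and retry when it returns `False`.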
API Reference