Qwen3-Open-Source
Copied!
Try AIAdd to Compare
Visual Understanding
Overview
Visual Understanding
Qwen3-VL's second-largest MoE model delivers fast responses and supports ultra-long contexts (e.g., long videos and documents). It enhances image/video understanding, spatial perception, and object recognition, and includes 2D/3D visual localization to handle complex real-world tasks.
Input
TextImageVideo
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input$0.2Per 1M tokens
- Output$0.8Per 1M tokens
Context
Context
131.07K
Max Input
129.02K
Max Output
32.76K
Rate Limits
- RPMRequests Per Minute60
- TPMTokens Per Minute100K
API Reference
Get API KeyCopied!
1234567891011121314151617