Qwen-VL-Max
Copied!
Try AIAdd to Compare
Visual Understanding
Overview
Visual Understanding
Qwen's Most Capable Large Visual Language Model. Compared to the enhanced version, further improvements have been made to visual reasoning and instruction-following capabilities, offering a higher level of visual perception and cognitive understanding. It delivers optimal performance on an even broader range of complex tasks.
Input
TextImageVideo
Output
Text
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Input$0.8Per 1M tokens
- Output$3.2Per 1M tokens
- Input(Implicit Cache)$0.16Per 1M tokens
- Input(Batch File)$0.4Per 1M tokens
- Output(Batch File)$1.6Per 1M tokens
Context
Context
131.07K
Max Input
129.02K
Max Output
32.76K
Rate Limits
- RPMRequests Per Minute1.20K
- TPMTokens Per Minute1M
API Reference
Get API KeyCopied!
1234567891011121314151617