Qwen-VL-Max

Copied!
Try AIAdd to Compare
Visual Understanding

Overview

Visual Understanding

Qwen's Most Capable Large Visual Language Model. Compared to the enhanced version, further improvements have been made to visual reasoning and instruction-following capabilities, offering a higher level of visual perception and cognitive understanding. It delivers optimal performance on an even broader range of complex tasks.

Input

TextImageVideo

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • Input
    $0.8Per 1M tokens
  • Output
    $3.2Per 1M tokens
  • Input(Implicit Cache)
    $0.16Per 1M tokens
  • Input(Batch File)
    $0.4Per 1M tokens
  • Output(Batch File)
    $1.6Per 1M tokens

Context

Context
131.07K
Max Input
129.02K
Max Output
32.76K

Rate Limits

  • RPMRequests Per Minute
    1.20K
  • TPMTokens Per Minute
    1M

API Reference

Get API Key
Copied!
1234567891011121314151617