Qwen3-VL-Plus

Copied!
Try AIAdd to Compare
Visual Understanding

Overview

Visual Understanding

The Qwen3 series VL models effectively integrates thinking and non-thinking modes, achieving world-leading performance in visual agent capabilities on public benchmark datasets such as OS World. This version features comprehensive upgrades in areas like visual coding, spatial perception, and multimodal reasoning, significantly enhancing visual perception and recognition abilities, and supporting the understanding of ultra-long videos.

Input

TextImageVideo

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • Input
    $0.2Per 1M tokens
  • Output
    $1.6Per 1M tokens
  • Input(Implicit Cache)
    $0.04Per 1M tokens
  • Explicit Cache Creation
    $0.25Per 1M tokens
  • Explicit Cache Read
    $0.02Per 1M tokens
  • Input
    $0.2Per 1M tokens
  • Output
    $1.6Per 1M tokens
  • Input(Implicit Cache)
    $0.04Per 1M tokens
  • Explicit Cache Creation
    $0.25Per 1M tokens
  • Explicit Cache Read
    $0.02Per 1M tokens

Context

Context
262.14K
Max Input
258.04K
Max Output
32.76K

Rate Limits

  • RPMRequests Per Minute
    1.20K
  • TPMTokens Per Minute
    1M

API Reference

Get API Key
Copied!
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263