Qwen3-Open-Source

Visual Understanding

Overview

Qwen3-VL's second-largest MoE model delivers fast responses and supports ultra-long contexts (e.g., long videos and documents). It enhances image/video understanding, spatial perception, and object recognition, and includes 2D/3D visual localization to handle complex real-world tasks.

Input

Text, Image, Video

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • Input: $0.2 per 1M tokens
  • Output: $0.8 per 1M tokens
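
As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost_usd` is hypothetical, and the calculation assumes simple linear billing with no cache discount, batch discount, or tiering, none of which this page specifies.

```python
# Prices taken from the Pricing list above, in USD per 1M tokens.
INPUT_USD_PER_MILLION = 0.2
OUTPUT_USD_PER_MILLION = 0.8


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming plain linear billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a long document prompt with a short answer.
    print(f"{estimate_cost_usd(100_000, 2_000):.4f} USD")
```

Because output tokens cost four times as much as input tokens, long generations dominate the bill much sooner than long prompts do.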

Context

  • Context: 131.07K
  • Max Input: 129.02K
  • Max Output: 32.76K
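
A request can be checked against these limits before it is sent. The sketch below is only an illustration: `fits_limits` is a hypothetical helper, the constants are the rounded figures shown above read as token counts (the exact limits are not given on this page), and the rule that input plus requested output must fit inside the context window is an assumption.

```python
# Rounded limits from the Context section above, read as token counts.
# The exact values are not stated on the page, so these are approximations.
CONTEXT_TOKENS = 131_070
MAX_INPUT_TOKENS = 129_020
MAX_OUTPUT_TOKENS = 32_760


def fits_limits(input_tokens: int, max_output_tokens: int) -> bool:
    """Return True if a request stays within all three listed limits.

    Assumption: input and requested output share the context window.
    """
    return (
        0 <= input_tokens <= MAX_INPUT_TOKENS
        and 0 <= max_output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + max_output_tokens <= CONTEXT_TOKENS
    )


if __name__ == "__main__":
    print(fits_limits(100_000, 8_000))   # within every limit
    print(fits_limits(120_000, 32_000))  # each part fits, the sum does not
```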

Rate Limits

  • RPM (Requests Per Minute): 60
  • TPM (Tokens Per Minute): 100K
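
To stay under both limits, a client can throttle itself before sending requests. The following is a minimal sketch of a client-side sliding-window limiter. The class `SlidingWindowLimiter` and its method names are hypothetical, and how the service itself measures the window or counts tokens is not stated on this page, so a 60-second sliding window over a caller-supplied token estimate is an assumption.

```python
import time
from collections import deque

# Limits from the Rate Limits section above.
RPM_LIMIT = 60
TPM_LIMIT = 100_000
WINDOW_SECONDS = 60.0  # assumed window length


class SlidingWindowLimiter:
    """Tracks requests and tokens over the last WINDOW_SECONDS seconds."""

    def __init__(self, rpm=RPM_LIMIT, tpm=TPM_LIMIT, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) per admitted request

    def _evict_expired(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= WINDOW_SECONDS:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Record the request and return True if it fits both limits."""
        now = self.clock()
        self._evict_expired(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    print(limiter.try_acquire(5_000))  # first request is admitted
```

A caller that gets `False` back can sleep briefly and retry. The `clock` parameter exists so the limiter can be tested with a fake clock instead of real waiting.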

API Reference
