Qwen3-VL-30B-A3B-Instruct

Qwen3-Open-Source

Visual Understanding

Overview

Qwen3-VL's second-largest MoE model delivers fast responses and supports ultra-long contexts (e.g., long videos and documents). It enhances image/video understanding, spatial perception, and object recognition, and includes 2D/3D visual localization to handle complex real-world tasks.

Input

Text, Image, Video

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input
$0.2 per 1M tokens
Output
$0.8 per 1M tokens
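
As a rough illustration of how the per-token prices above translate into a per-request cost, here is a minimal sketch. The helper name `estimate_cost` is hypothetical and not part of any SDK; it assumes billing is a simple linear function of input and output token counts, with no cache or batch discounts applied.

```python
# Prices in USD per 1M tokens, as listed on this page.
INPUT_PRICE_PER_M = 0.2
OUTPUT_PRICE_PER_M = 0.8


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes plain linear pricing; discounts are not modeled.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens and 1,000 output tokens comes to roughly $0.0028.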

Context

131.07K

Max Input

129.02K

Max Output

32.76K
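
A client can check a request against these limits before sending it. The sketch below is illustrative only: `fits_limits` is a hypothetical helper, the constants are the rounded figures shown on this page (the exact service-side values may differ slightly), and the rule that input plus output must fit inside the context window is an assumption rather than something this page states.

```python
# Rounded figures from this page (K = thousand tokens); exact limits may differ.
CONTEXT_WINDOW = 131_070
MAX_INPUT_TOKENS = 129_020
MAX_OUTPUT_TOKENS = 32_760


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within the listed limits (hypothetical helper).

    Assumes input and output tokens share the same context window.
    """
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and requested_output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + requested_output_tokens <= CONTEXT_WINDOW
    )
```

Token counts have to come from the caller; this sketch does not tokenize text, images, or video.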

Rate Limits

RPM (Requests Per Minute)
60
TPM (Tokens Per Minute)
100K
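
One way to stay under the request limit on the client side is a sliding-window limiter. The following is a minimal sketch, not an official mechanism: the class `MinuteRateLimiter` is hypothetical, it tracks only the RPM limit (the TPM limit is not modeled), and it assumes the service counts requests over a rolling 60-second window.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter (hypothetical helper)."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self._sent = deque()    # timestamps of requests inside the window

    def wait_time(self) -> float:
        """Seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Register that a request is being sent now."""
        self._sent.append(self.clock())

    def acquire(self) -> None:
        """Block until a request slot is free, then claim it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Calling `limiter.acquire()` immediately before each API call would keep a single-threaded client at or below 60 requests per minute; multi-threaded use would additionally need a lock.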

API Reference

import os

import dashscope

# Point the SDK at the international endpoint.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# A single user turn: one image URL followed by a text instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg"},
            {"text": "Output the text in the image only."},
        ],
    }
]

response = dashscope.MultiModalConversation.call(
    # If the environment variable is not set, replace it with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv('DASHSCOPE_API_KEY'),
    model='qwen3-vl-30b-a3b-instruct',
    messages=messages,
)

# Print the text part of the first choice in the reply.
print(response.output.choices[0].message.content[0]["text"])
