Qwen-VL-OCR

Copied!

Try AIAdd to Compare

Visual Understanding

Overview

Visual Understanding

Qwen-VL-OCR is a large OCR recognition model built upon Qwen-VL. It unifies a wide range of image-text recognition, parsing, and processing tasks within a single model, delivering robust visual-text comprehension.

Input

Image

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input
$0.07Per 1M tokens
Output
$0.16Per 1M tokens

Context

38.19K

Max Input

30K

Max Output

8.19K

Rate Limits

RPMRequests Per Minute
600
TPMTokens Per Minute
6M

API Reference

Get API Key

Copied!

1234567891011121314151617

import os
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
    "role": "user",
    "content": [
    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg"},
    {"text": "Output the text in the image only."}]
}]
response = dashscope.MultiModalConversation.call(
    #If the environment variable is not set, replace it with your Model Studio API key:  api_key ="sk-xxx"
    api_key = os.getenv('DASHSCOPE_API_KEY'),
    model = 'qwen-vl-ocr',
    messages = messages
)
print(response.output.choices[0].message.content[0]["text"])

import os
import dashscope
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{
    "role": "user",
    "content": [
    {"image": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg"},
    {"text": "Output the text in the image only."}]
}]
response = dashscope.MultiModalConversation.call(
    #If the environment variable is not set, replace it with your Model Studio API key:  api_key ="sk-xxx"
    api_key = os.getenv('DASHSCOPE_API_KEY'),
    model = 'qwen-vl-ocr',
    messages = messages
)
print(response.output.choices[0].message.content[0]["text"])