Qwen-Flash

Reasoning, Text Generation

Overview

The Qwen3 Flash model combines thinking and non-thinking modes and can switch between them dynamically within a conversation. It excels at complex reasoning and shows significant gains in instruction following and text comprehension. It supports a 1M-token context length and uses tiered billing based on context usage.

Input: Text

Output: Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input: $0.05 per 1M tokens
Output: $0.40 per 1M tokens
Input (Implicit Cache): $0.01 per 1M tokens
Input (Batch File): $0.025 per 1M tokens
Output (Batch File): $0.20 per 1M tokens
Explicit Cache Creation: $0.063 per 1M tokens
Explicit Cache Read: $0.005 per 1M tokens
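
As a rough illustration of how the per-1M-token rates translate into a per-request cost, here is a minimal sketch. It assumes the flat rates listed above and ignores the tiered billing mentioned in the overview, so treat the result as an estimate only. The function name `estimate_cost` and its parameters are hypothetical, and the assumption that cached input tokens are billed at the implicit-cache rate instead of the regular input rate is ours, not something this page states.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
# Tiered pricing by context usage is NOT modeled here (assumption: flat rates).
PRICE_PER_MILLION = {
    "input": 0.05,
    "output": 0.4,
    "input_implicit_cache": 0.01,
    "input_batch": 0.025,
    "output_batch": 0.2,
}


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0, batch=False):
    """Return an estimated cost in USD for one request (hypothetical helper)."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    if batch and cached_input_tokens:
        # How caching interacts with batch pricing is not stated on this page,
        # so this sketch refuses to guess.
        raise ValueError("cache + batch pricing is not modeled in this sketch")

    in_rate = PRICE_PER_MILLION["input_batch" if batch else "input"]
    out_rate = PRICE_PER_MILLION["output_batch" if batch else "output"]

    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * in_rate
        + cached_input_tokens * PRICE_PER_MILLION["input_implicit_cache"]
        + output_tokens * out_rate
    )
    return total / 1_000_000


if __name__ == "__main__":
    # 200K input tokens and 2K output tokens at the regular rates.
    print(f"${estimate_cost(200_000, 2_000):.4f}")
```

Explicit cache creation and read charges are left out of the sketch because this page does not describe when they apply.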

Context

Max Input: 995.90K tokens

Max Output: 32.76K tokens
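
A request can be checked against these limits before it is sent. The sketch below uses the rounded figures shown above (995.90K and 32.76K); the exact limits enforced by the service may differ slightly. The helper `check_request` is hypothetical, and the token counts are assumed to come from whatever tokenizer or usage data you already have.

```python
# Limits as displayed on this page (rounded); the exact enforced values may differ.
MAX_INPUT_TOKENS = 995_900
MAX_OUTPUT_TOKENS = 32_760


def check_request(input_tokens, requested_output_tokens):
    """Return a list of human-readable problems; an empty list means the request fits."""
    problems = []
    if input_tokens > MAX_INPUT_TOKENS:
        problems.append(
            f"input of {input_tokens} tokens exceeds the ~{MAX_INPUT_TOKENS} limit"
        )
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        problems.append(
            f"requested output of {requested_output_tokens} tokens exceeds "
            f"the ~{MAX_OUTPUT_TOKENS} limit"
        )
    return problems


if __name__ == "__main__":
    for problem in check_request(1_200_000, 8_000):
        print(problem)
```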

Rate Limits

RPM (Requests Per Minute): 600
TPM (Tokens Per Minute): 5M
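
To stay under these limits from the client side, one option is a sliding-window throttle that tracks both request count and token volume over the last 60 seconds. This is a minimal sketch, not the service's own accounting: how the server measures its window is not described on this page, and the class name `SlidingWindowLimiter` and its methods are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle for requests-per-minute and tokens-per-minute."""

    def __init__(self, rpm=600, tpm=5_000_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self._events = deque()      # (timestamp, tokens) for each admitted request
        self._tokens = 0            # running token total inside the window

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens -= tokens

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both budgets, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._events) >= self.rpm or self._tokens + tokens > self.tpm:
            return False
        self._events.append((now, tokens))
        self._tokens += tokens
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    if limiter.try_acquire(tokens=1_500):
        print("ok to send")
    else:
        print("back off and retry later")
```

The `tokens` argument is an estimate supplied by the caller (for example, prompt tokens plus the requested maximum output); a caller that gets `False` would typically sleep briefly and retry.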

API Reference

import os

import dashscope
from dashscope import Generation

# Point the SDK at the international endpoint.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
response = Generation.call(
    # If the environment variable is not set, replace it with your Model Studio API key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen-flash",
    messages=messages,
    result_format="message",
    # Enable deep thinking
    enable_thinking=True,
)

if response.status_code == 200:
    # Print thinking process
    print("=" * 20 + "Thinking process" + "=" * 20)
    print(response.output.choices[0].message.reasoning_content)
    
    # Print response
    print("=" * 20 + "Full response" + "=" * 20)
    print(response.output.choices[0].message.content)
else:
    print(f"HTTP return code: {response.status_code}")
    print(f"Error code: {response.code}")
    print(f"Error message: {response.message}")