Qwen3-Omni-Flash-2025-09-15

Qwen3-Omni-Flash

Multimodal

Overview

Qwen3-Omni-Flash is a large multimodal model built on the Thinker–Talker Mixture-of-Experts (MoE) architecture. It understands text, images, audio, and video, and generates both text and speech. It handles text in 119 languages and speech in 20 languages, producing human-like speech for cross-lingual communication. The model follows instructions closely and supports system prompt customization, so it can adapt to different conversational styles and character settings. Typical uses include text creation, voice assistants, and multimedia analysis. This version is a snapshot from September 15, 2025.

Input

Text, Image, Audio, Video

Output

Text, Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input: Text
$0.43 per 1M tokens
Input: Audio
$3.81 per 1M tokens
Input: Vision
$0.78 per 1M tokens
Output: Text (when input contains only text)
$1.66 per 1M tokens
Output: Text (when input contains images/audio/video)
$3.06 per 1M tokens
Output: Text & Audio (output text is not charged)
$15.11 per 1M tokens
Input: Text (Thinking)
$0.43 per 1M tokens
Input: Audio (Thinking)
$3.81 per 1M tokens
Input: Vision (Thinking)
$0.78 per 1M tokens
Output: Text (in thinking mode, when input contains only text)
$1.66 per 1M tokens
Output: Text (in thinking mode, when input contains images/audio/video)
$3.06 per 1M tokens
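
The rates above can be turned into a rough per-request estimate. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes billing is strictly linear in token count and that, when audio is produced, only the audio output tokens are billed (following the "output text is not charged" note). Actual invoices may differ.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
INPUT_PRICES = {"text": 0.43, "audio": 3.81, "vision": 0.78}
OUTPUT_TEXT_FOR_TEXT_INPUT = 1.66        # text output, text-only input
OUTPUT_TEXT_FOR_MULTIMODAL_INPUT = 3.06  # text output, input has image/audio/video
OUTPUT_AUDIO = 15.11                     # text & audio output (text part is free)


def estimate_cost(input_tokens, output_text_tokens=0, output_audio_tokens=0):
    """Estimate the cost of one request in USD.

    input_tokens maps a modality ("text", "audio", "vision") to a token count.
    Assumption: linear pricing with no rounding or minimum charge.
    """
    cost = sum(INPUT_PRICES[modality] * count
               for modality, count in input_tokens.items())
    if output_audio_tokens > 0:
        # Text & audio output: only the audio tokens are charged.
        cost += OUTPUT_AUDIO * output_audio_tokens
    else:
        multimodal = any(count > 0 for modality, count in input_tokens.items()
                         if modality != "text")
        rate = (OUTPUT_TEXT_FOR_MULTIMODAL_INPUT if multimodal
                else OUTPUT_TEXT_FOR_TEXT_INPUT)
        cost += rate * output_text_tokens
    return cost / 1_000_000
```

For example, `estimate_cost({"text": 1000, "audio": 2000}, output_text_tokens=300, output_audio_tokens=500)` charges the 1,000 text and 2,000 audio input tokens plus the 500 audio output tokens, and ignores the 300 text output tokens.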

Rate Limits

RPM (Requests Per Minute): 60
TPM (Tokens Per Minute): 100K
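
A client can stay under these limits by throttling itself before sending. The following is a minimal sketch of a sliding-window limiter, not an official client feature: the class name and parameters are hypothetical, it assumes the limits apply over a rolling 60-second window (this page does not say how the server measures them), and the caller must supply its own token estimate for each request.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Tracks requests and tokens over a rolling window (assumed 60 s)."""

    def __init__(self, rpm=60, tpm=100_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self.clock = clock          # injectable so the limiter can be tested
        self.events = deque()       # (timestamp, tokens) for each admitted request
        self.tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        self._evict(now)
        if len(self.events) >= self.rpm:
            return False
        if self.tokens_in_window + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and sleep briefly and retry when it returns `False`.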

API Reference

Get API Key

import os
from openai import OpenAI

client = OpenAI(
    # The API keys for the Singapore and Beijing regions are different. To obtain an API key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash-2025-09-15",
    messages=[{"role": "user", "content": "Who are you"}],
    # Set the output modalities. Supported values: ["text", "audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Ethan", "format": "wav"},
    # The stream parameter must be set to True. Otherwise, an error is reported
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
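
The loop above only prints the raw deltas. To keep the generated speech, the audio fragments have to be collected and decoded. The sketch below shows one way to do that with the standard library. It rests on assumptions this page does not confirm: that each streamed delta carries an `audio` field whose `data` entry is a base64 fragment, and that the decoded bytes are 16-bit mono PCM at 24 kHz. The helper name is hypothetical; check the API reference for the actual field names and audio format.

```python
import base64
import wave


def save_pcm_chunks_as_wav(b64_chunks, path, sample_rate=24000):
    """Join base64 fragments, decode them, and write a WAV file.

    Assumptions (not confirmed by this page): the payload is raw
    16-bit mono PCM and the sample rate is 24 kHz.
    Returns the number of samples written.
    """
    # Join before decoding: a single fragment need not be valid base64 by itself.
    pcm = base64.b64decode("".join(b64_chunks))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono (assumed)
        wav_file.setsampwidth(2)            # 16-bit samples (assumed)
        wav_file.setframerate(sample_rate)  # 24 kHz (assumed)
        wav_file.writeframes(pcm)
    return len(pcm) // 2
```

In the streaming loop, a caller might append `delta.audio["data"]` to a list whenever that field is present (again, an assumed shape), then call `save_pcm_chunks_as_wav(chunks, "reply.wav")` once the stream ends.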
