Qwen2.5-Omni-7B - Qwen Cloud

Qwen2.5-Open-Source


Multimodal

Overview


Built on Qwen2.5, this multimodal model handles both understanding and generation. It accepts text, image, audio, and video input, including mixed inputs, and can generate text and speech simultaneously. It significantly improves the speed of multimodal content understanding and offers four natural-sounding conversational voices.

Input

Text, Image, Video, Audio

Output

Text, Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input: Text
$0.1 per 1M tokens
Input: Audio
$6.76 per 1M tokens
Input: Vision
$0.28 per 1M tokens
Output: Text (when input contains only text)
$0.4 per 1M tokens
Output: Text (when input contains images/audio/video)
$0.84 per 1M tokens
Output: Text & Audio (output text is not charged)
$13.51 per 1M tokens
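
To make the per-token prices concrete, here is a minimal sketch of a cost estimate. The prices are copied from the list above; the category names and the `estimate_cost` helper are hypothetical, and the sketch assumes you already know the token count for each billed category (for example, from the usage data returned with a response).

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
# The dictionary keys are illustrative names, not API fields.
PRICE_PER_MILLION = {
    "input_text": 0.1,
    "input_audio": 6.76,
    "input_vision": 0.28,
    "output_text_text_only_input": 0.4,
    "output_text_multimodal_input": 0.84,
    "output_audio": 13.51,  # text+audio output: only audio tokens are charged
}


def estimate_cost(token_counts):
    """Return the estimated USD cost for a dict of {category: token count}."""
    total = 0.0
    for category, tokens in token_counts.items():
        total += PRICE_PER_MILLION[category] * tokens / 1_000_000
    return total


# Example: 2,000 text input tokens answered with 500 audio output tokens.
example = {"input_text": 2_000, "output_audio": 500}
print(f"${estimate_cost(example):.6f}")
```

An unknown category raises `KeyError`, which keeps a misspelled name from silently being billed at zero.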

Context

32.76K

Max Input

30.72K

Max Output

2.04K
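
The three limits above appear to be related: the maximum input plus the maximum output equals the context size. The following sketch checks a request against them before sending. It assumes the rounded figures correspond to 32,768, 30,720, and 2,048 tokens, which the page does not state exactly, and `fits_limits` is a hypothetical helper rather than part of any SDK.

```python
# Assumed exact limits; the page shows only the rounded values
# 32.76K (context), 30.72K (max input), and 2.04K (max output).
CONTEXT_WINDOW = 32_768
MAX_INPUT_TOKENS = 30_720
MAX_OUTPUT_TOKENS = 2_048


def fits_limits(input_tokens, requested_output_tokens):
    """Return True if the request stays within all three limits."""
    return (
        0 <= input_tokens <= MAX_INPUT_TOKENS
        and 0 <= requested_output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + requested_output_tokens <= CONTEXT_WINDOW
    )


print(fits_limits(30_720, 2_048))  # both at their maximum
print(fits_limits(31_000, 100))    # input exceeds the input limit
```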

Rate Limits

API Reference

Get API Key


import os
from openai import OpenAI

client = OpenAI(
    # The API keys for the Singapore and Beijing regions are different. To obtain an API key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen2.5-omni-7b",
    messages=[{"role": "user", "content": "Who are you"}],
    # Set the output modalities. Supported values: ["text", "audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Ethan", "format": "wav"},
    # The stream parameter must be set to True. Otherwise, an error is reported
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
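
The loop above only prints each streamed delta. To keep the spoken reply, the audio fragments need to be collected and written to a file. The sketch below shows one way to do that under assumptions the page does not confirm: each audio fragment arrives as a base64 string of raw 16-bit mono PCM sampled at 24 kHz. The `write_wav_from_base64_chunks` helper is hypothetical; check the API reference for the actual field names and audio format before relying on it.

```python
import base64
import wave


def write_wav_from_base64_chunks(b64_chunks, path, sample_rate=24_000):
    """Decode base64 PCM fragments and write them to a WAV file.

    Assumes 16-bit mono PCM; returns the number of PCM bytes written.
    """
    pcm = b"".join(base64.b64decode(chunk) for chunk in b64_chunks)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono (assumption)
        wav_file.setsampwidth(2)      # 16-bit samples (assumption)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm)
```

In the streaming loop, you would append each base64 audio string to a list as it arrives and call the helper once the stream ends.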
