Qwen3-TTS-VD-2026-01-26

Qwen3-TTS-VD

Text-to-Speech

Overview

Qwen3-TTS-VD is Tongyi's latest real-time speech synthesis model. It performs high-fidelity, real-time synthesis with voices created by the qwen3-voice-design service and can speak 11 languages in the same voice. Trained on massive amounts of data, it adapts its tone to the content of the text and handles complex text well. This model is the January 26, 2026 snapshot.

Input

Text

Output

Audio

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

TTS: $0.115 per 10,000 characters
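As a worked example of the price above, the sketch below converts a billable character count into an estimated charge at $0.115 per 10,000 characters. It assumes billing is strictly linear in characters with no per-request rounding; how characters are counted (for example, whether non-Latin characters count as one unit each) is not stated on this page, so treat the result as a rough estimate. The function name `estimate_tts_cost` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: estimate the TTS charge in USD for a billable character count.
# Assumes linear billing at the listed $0.115 per 10,000 characters, with no rounding.
PRICE_PER_10K_CHARS=0.115

estimate_tts_cost() {
  # $1: number of billable characters (a non-negative integer)
  awk -v n="$1" -v p="$PRICE_PER_10K_CHARS" 'BEGIN { printf "%.4f\n", n / 10000 * p }'
}

# Example: 250,000 characters = 25 units of 10,000 characters, so 25 * 0.115 = 2.875 USD.
estimate_tts_cost 250000
```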

Rate Limits

RPM (requests per minute): 180
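At 180 requests per minute, a client sending requests one at a time needs at least 60,000 ms / 180 ≈ 333.3 ms between request starts. The sketch below rounds that up to whole milliseconds and paces a command over lines of input. This is a client-side convenience under stated assumptions, not part of the API: `min_interval_ms` and `paced_run` are hypothetical names, requests are assumed to be sequential, and fractional `sleep` arguments work with GNU and BSD `sleep` but are not required by POSIX.

```shell
#!/bin/sh
# Sketch: pace sequential requests to stay under a requests-per-minute (RPM) limit.

# Print the minimum gap between request starts in whole milliseconds, rounded up.
min_interval_ms() {
  limit=$1
  echo $(( (60000 + limit - 1) / limit ))
}

# Run a command once per line of stdin, passing the line as its last argument.
# Sleeping the full gap after each call finishes keeps starts at least that far apart.
paced_run() {
  limit=$1
  shift
  gap_ms=$(min_interval_ms "$limit")
  gap_s=$(awk -v ms="$gap_ms" 'BEGIN { printf "%.3f", ms / 1000 }')
  while IFS= read -r line; do
    # Redirect stdin so the command cannot consume the remaining input lines.
    "$@" "$line" < /dev/null
    sleep "$gap_s"
  done
}

# Usage: printf '%s\n' "Hello." "Goodbye." | paced_run 180 echo
```

Sleeping after every call, including the last, wastes a fraction of a second but keeps the loop simple; a production client would more likely react to the service's own rate-limit errors.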

API Reference

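# Design a new voice from a text description; target_model names the synthesis model that will use it.
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \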
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-2026-01-26",
        "voice_prompt": "A composed middle-aged male announcer with a deep and rich voice, possessing a strong magnetic quality. His speech rate is steady and his pronunciation is clear. This kind of voice is suitable for news broadcasts or documentary commentaries.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'
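To reuse the voice-design request with different descriptions, it can be wrapped in a shell function. The sketch below is a convenience wrapper built on stated assumptions, not an official client: `design_voice` is a hypothetical name, it relies on `jq` to build the JSON body so that quotes in the description cannot break the request, and it assumes the response reports the new voice name at `output.voice`. Check that field against the API reference before relying on it.

```shell
#!/bin/sh
# Hypothetical wrapper around the voice-design request for qwen3-tts-vd-2026-01-26.
# Requires curl and jq. ASSUMPTION: the response carries the voice name at .output.voice.
VOICE_DESIGN_URL="https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

# Usage: design_voice <voice_prompt> <preview_text> <preferred_name> [response_file]
design_voice() {
  prompt=$1
  preview=$2
  name=$3
  out=${4:-voice_design_response.json}

  # Fail early instead of sending an unauthenticated request.
  if [ -z "${DASHSCOPE_API_KEY:-}" ]; then
    echo "design_voice: DASHSCOPE_API_KEY is not set" >&2
    return 1
  fi

  # Same body as the curl example, with the user-supplied text JSON-escaped by jq.
  body=$(jq -n --arg p "$prompt" --arg t "$preview" --arg n "$name" '{
    model: "qwen-voice-design",
    input: {
      action: "create",
      target_model: "qwen3-tts-vd-2026-01-26",
      voice_prompt: $p,
      preview_text: $t,
      preferred_name: $n,
      language: "en"
    },
    parameters: { sample_rate: 24000, response_format: "wav" }
  }') || return 1

  # --fail makes curl exit non-zero on HTTP errors; on success the response is saved to $out.
  curl -sS --fail -X POST "$VOICE_DESIGN_URL" \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" \
    -o "$out" || return 1

  # Print the (assumed) voice-name field from the saved response.
  jq -r '.output.voice' "$out"
}
```

Keeping the full response on disk lets you inspect the preview audio and any other fields without repeating the request.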