Qwen-Voice-Enrollment

Overview

Qwen Voice-Enrollment is a series of voice replication (voice cloning) models in the Qwen (Qianwen) speech model family. It can quickly reproduce a closely matching voice from a reference recording of 5 seconds or more. Used together with the qwen3-tts-vc-realtime model, it replicates a person's voice with high fidelity and outputs speech in 10 languages. The synthesized audio also adapts its tone to the text being read and handles complex text well.
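Because enrollment needs at least 5 seconds of audio, it can help to check a recording's length locally before submitting it. The following is a minimal sketch using only the Python standard library; it assumes the reference recording is an uncompressed WAV file, and the names `MIN_SECONDS`, `wav_duration_seconds`, and `is_long_enough` are illustrative, not part of any Qwen SDK.

```python
import wave

# Minimum reference length stated in the overview ("5 seconds or more").
MIN_SECONDS = 5.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        # Duration = number of audio frames / frames per second.
        return wf.getnframes() / wf.getframerate()


def is_long_enough(path, min_seconds=MIN_SECONDS):
    """True if the recording meets the minimum enrollment length."""
    return wav_duration_seconds(path) >= min_seconds
```

A caller could run `is_long_enough("sample.wav")` before uploading and ask the speaker for a longer recording when it returns `False`. Other formats such as MP3 would need a different decoder; this sketch does not cover them.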

Input

Audio

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

TTS
$0.01 per voice

Rate Limits

RPM (Requests Per Minute): 180
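
A client that issues many enrollment requests can stay under the 180 RPM limit by spacing calls at least 60 / 180 ≈ 0.33 seconds apart. The sketch below is one simple way to do that on the client side; `RequestPacer` is a hypothetical helper, not part of any official SDK, and it assumes nothing about how the service counts requests within a minute. Even spacing is more conservative than the limit strictly requires.

```python
import time


class RequestPacer:
    """Spaces calls evenly so that no more than `rpm` happen per minute."""

    def __init__(self, rpm=180, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `pacer.wait()` immediately before each request is enough. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.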