CosyVoice
Overview
Synthesis Capabilities: CosyVoice-v3-Flash is the latest high-performance speech synthesis model in the CosyVoice series from Tongyi Labs. It offers improved naturalness, timbre, prosody, and emotional expressiveness compared to previous versions, and it supports real-time streaming text-to-speech synthesis.

Cloning Capabilities: CosyVoice-v3-Flash is also the latest voice cloning model in the CosyVoice series. Compared to previous versions, it improves pronunciation accuracy and timbre similarity, and it adds support for additional languages (German, Spanish, French, Italian, Russian, Japanese). It can quickly generate natural-sounding custom voices that closely match the speaker from just 5-20 seconds of reference audio.
Input
Output
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- TTS: $0.13 per 10,000 characters
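
The listed rate can be turned into a rough per-request cost estimate. The sketch below is illustrative only: it assumes billing is linear in the character count and that one Python character equals one billed character, neither of which is stated on this page. The function name `estimate_tts_cost` is hypothetical.

```python
from decimal import Decimal

# USD per 10,000 characters, taken from the pricing line above.
PRICE_PER_10K_CHARS = Decimal("0.13")


def estimate_tts_cost(text: str) -> Decimal:
    """Estimate the TTS cost of `text` in USD.

    Assumptions (not confirmed by the source): cost scales linearly
    with length, and len(text) matches the billed character count.
    """
    return PRICE_PER_10K_CHARS * len(text) / Decimal(10_000)


if __name__ == "__main__":
    sample = "Hello, this is a short synthesis request."
    print(f"{len(sample)} characters, about ${estimate_tts_cost(sample)}")
```

Using `Decimal` rather than `float` avoids rounding drift when many small per-request estimates are summed.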
Rate Limits
- RPM (Requests Per Minute): 180
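
A limit of 180 requests per minute works out to one request every 60 / 180 ≈ 0.33 seconds if requests are spaced evenly. The sketch below shows one conservative way a client might stay under that limit. The page does not say how the limit is enforced (fixed window, sliding window, or burst allowance), so even spacing is an assumption, and `RequestPacer` is a hypothetical helper, not part of any CosyVoice SDK.

```python
import time

# Requests per minute, taken from the rate limit line above.
RPM_LIMIT = 180


class RequestPacer:
    """Space out requests evenly so the rate never exceeds `rpm`."""

    def __init__(self, rpm=RPM_LIMIT, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self) -> float:
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    pacer = RequestPacer()
    for i in range(3):
        slept = pacer.wait()
        # A real client would send its synthesis request here.
        print(f"request {i}: waited {slept:.3f}s")
```

Calling `pacer.wait()` before each request keeps a single-threaded client at or below the listed rate; a multi-threaded client would need a lock around `wait()`.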