Qwen3-ASR-Flash-Realtime

Real-time Speech Recognition

Overview

The real-time version of Qwen3-ASR-Flash is an accurate, robust multilingual speech recognition model built on a large language model. Drawing on a strong foundation model, large volumes of text and multimodal data, and tens of millions of hours of audio, it automatically detects the spoken language, recognizes speech in 11 languages, and keeps transcription accurate even in complex acoustic environments. This version is a snapshot from October 27, 2025.

Input

Audio

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • Audio Duration
    $0.00009 per second
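
As a rough illustration of what the per-second rate means in practice, the sketch below estimates cost from audio duration. It assumes billing is strictly linear in duration, with no rounding, minimum charge, or free tier; none of that is stated on this page. The function name `transcription_cost_usd` is hypothetical and not part of any SDK.

```python
from decimal import Decimal

# Listed price from the pricing section: USD per second of audio.
PRICE_PER_SECOND_USD = Decimal("0.00009")


def transcription_cost_usd(duration_seconds):
    """Estimate cost in USD, assuming purely linear per-second billing."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert via str() so float inputs do not carry binary rounding noise.
    return PRICE_PER_SECOND_USD * Decimal(str(duration_seconds))


if __name__ == "__main__":
    for label, seconds in [("1 minute", 60), ("1 hour", 3600)]:
        print(f"{label}: ${transcription_cost_usd(seconds):.4f}")
```

Under that linear assumption, one hour of audio (3,600 seconds) comes to about $0.324.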

Rate Limits

  • RPM (requests per minute)
    1,200
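
A limit of 1,200 requests per minute averages out to 20 requests per second, or one request every 50 ms. The sketch below shows one conservative way a client might pace itself to stay under that figure. It assumes evenly spaced requests are acceptable; how the service actually measures the limit (fixed window, sliding window, burst allowance) is not stated here. `RequestPacer` is a hypothetical helper, not part of any SDK.

```python
import time

RPM_LIMIT = 1200                   # listed requests-per-minute limit
MIN_INTERVAL_S = 60.0 / RPM_LIMIT  # 0.05 s between requests


class RequestPacer:
    """Space out calls so they never exceed one per `min_interval_s`."""

    def __init__(self, min_interval_s=MIN_INTERVAL_S,
                 clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed; return seconds waited."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
        return delay
```

A caller would invoke `pacer.wait()` immediately before each request. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.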

API Reference

Get API Key