Fun-ASR-Realtime
Overview
Fun-ASR-Realtime is the real-time version of Tongyi Lab's next-generation end-to-end speech recognition model. Built on proprietary speech technology, it offers strong contextual awareness and high-precision transcription. Fun-ASR integrates RAG technology into its end-to-end architecture and supports large-scale hotword customization, automatic filtering of sensitive words and filler words, inverse text normalization (ITN), and punctuation prediction, which together improve recognition accuracy and contextual relevance. Fun-ASR also supports flexible switching between Chinese and English, covers multiple regional dialects, and has enhanced noise robustness for diverse and complex acoustic environments. This is a snapshot released on November 7, 2025.
Input
Output
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Rate Limits
- RPM (Requests Per Minute): 1.20K
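
A limit of 1.20K requests per minute works out to one request roughly every 50 ms. The following is a minimal client-side sketch of pacing requests to stay near that budget. The `RequestPacer` class and its parameters are hypothetical helpers written for illustration and are not part of any Fun-ASR SDK. It also assumes that evenly spacing request starts is acceptable to the service, which may count requests differently (for example, in fixed windows).

```python
import time

RPM_LIMIT = 1200  # taken from the rate limit listed above (1.20K RPM)


class RequestPacer:
    """Hypothetical helper: spaces request starts evenly under an RPM budget."""

    def __init__(self, rpm=RPM_LIMIT, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / rpm   # seconds between request starts
        self._clock = clock          # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may start

    def wait(self):
        """Block until the next request may start; return seconds waited."""
        now = self._clock()
        if self._next_allowed is None or now >= self._next_allowed:
            delay = 0.0
            start = now
        else:
            delay = self._next_allowed - now
            self._sleep(delay)
            start = self._next_allowed
        # Reserve the next slot one interval after this request's start.
        self._next_allowed = start + self.interval
        return delay
```

Calling `pacer.wait()` immediately before each request is enough for a single-threaded client. A multi-threaded client would additionally need a lock around `wait()`.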