Fun-ASR-Realtime
Overview
Fun-ASR-Realtime is the real-time version of Tongyi Lab's next-generation end-to-end speech recognition model. Built on proprietary speech technology, it offers strong contextual awareness and high-precision transcription. Its end-to-end architecture integrates RAG (retrieval-augmented generation) technology and supports large-scale hotword customization, automatic filtering of sensitive words and filler words (modal particles), ITN (inverse text normalization), and punctuation prediction, which together improve recognition accuracy and contextual relevance. Fun-ASR also supports flexible switching between Chinese and English, covers multiple regional dialects, and has enhanced noise robustness for diverse and complex environments.
Input
Output
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Rate Limits
- RPM (Requests Per Minute): 1.20K
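
A client can stay under the RPM limit by spacing its requests evenly. The following is a minimal sketch, not part of any Fun-ASR SDK: the `RequestPacer` class and its `wait` method are hypothetical names introduced here for illustration. It assumes the limit is 1,200 request starts per minute. How the service actually counts requests (for example, per connection or per call, in fixed or sliding windows) is not stated on this page.

```python
import time


class RequestPacer:
    """Hypothetical helper: spaces request starts at least 60/rpm seconds apart."""

    def __init__(self, rpm=1200, clock=time.monotonic, sleep=time.sleep):
        # Assumption: 1.20K RPM means 1,200 requests per minute,
        # which gives a minimum gap of 0.05 s between request starts.
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None  # earliest time the next request may start

    def wait(self):
        """Block until the next free slot; return the seconds spent waiting."""
        now = self._clock()
        if self._next_slot is None or now >= self._next_slot:
            # The slot is already free, so the request can start immediately.
            delay = 0.0
            start = now
        else:
            # Too early: sleep until the reserved slot arrives.
            delay = self._next_slot - now
            self._sleep(delay)
            start = self._next_slot
        # Reserve the following slot one interval after this request starts.
        self._next_slot = start + self.interval
        return delay
```

Calling `pacer.wait()` immediately before each request keeps consecutive request starts at least 0.05 s apart, so no 60-second window contains more than 1,200 of them. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.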