Fun-ASR-Realtime

Real-time Speech Recognition

Overview

Fun-ASR-Realtime is the real-time version of Tongyi Lab's next-generation end-to-end speech recognition model. Built on proprietary speech technology, it is designed for strong contextual awareness and high-precision transcription. Fun-ASR integrates RAG technology into its end-to-end architecture and supports large-scale hotword customization, automatic filtering of sensitive words and modal particles, inverse text normalization (ITN), and punctuation prediction, which together improve recognition accuracy and contextual relevance. It also supports flexible switching between Chinese and English, covers multiple regional dialects, and offers enhanced noise robustness for diverse and complex acoustic environments.

Input

Audio

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Rate Limits

  • RPM (Requests Per Minute): 1.20K
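
A limit of 1.20K RPM is 1,200 requests per minute, or 20 requests per second on average. The following is a minimal client-side sketch of how a caller might stay under that budget. It assumes a sliding 60-second window; the page does not say how the service counts requests, or how a streaming session counts toward the limit, so treat the window model as an assumption. The class name `SlidingWindowLimiter` and its methods are hypothetical and not part of any Fun-ASR SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=1200, window=60.0, clock=time.monotonic):
        self.limit = limit      # 1.20K RPM from the rate-limit listing
        self.window = window    # assumed sliding window length, in seconds
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.stamps = deque()   # timestamps of requests still inside the window

    def wait_time(self):
        """Return seconds until a request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            return 0.0
        return self.window - (now - self.stamps[0])

    def try_acquire(self):
        """Record a request and return True if under the limit, else return False."""
        if self.wait_time() > 0:
            return False
        self.stamps.append(self.clock())
        return True

    def acquire(self):
        """Block until a request slot is free, then record the request."""
        while not self.try_acquire():
            time.sleep(self.wait_time())
```

A caller would invoke `acquire()` immediately before each request it sends. Server-side rejections can still occur if several clients share one quota, so this complements rather than replaces handling of rate-limit errors.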