Fun-ASR-Realtime

Real-time Speech Recognition

Overview

Fun-ASR-Realtime is the real-time version of Tongyi Lab's next-generation end-to-end speech recognition model. Built on proprietary speech technology, it offers strong contextual awareness and high-precision transcription. Fun-ASR integrates RAG (retrieval-augmented generation) technology and supports large-scale hotword customization, automatic filtering of sensitive words and modal particles (filler words), inverse text normalization (ITN), and punctuation prediction, which together improve recognition accuracy and contextual relevance. It also supports flexible switching between Chinese and English, covers multiple regional dialects, and has enhanced noise robustness for diverse and complex acoustic environments. This is a snapshot released on November 7, 2025.

Input

Audio

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Rate Limits

  • RPM (Requests Per Minute): 1.20K
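
A client that issues many requests may want to stay under the RPM limit listed above (1.20K, i.e. 1,200 requests per minute) on its own side. The following is a minimal sketch of a sliding-window limiter, not part of any Fun-ASR SDK. The class name `SlidingWindowLimiter` and its methods are hypothetical, and how the service counts a "request" for a real-time session is not stated on this page, so treat the window logic as an assumption.

```python
import time
from collections import deque

# Assumption: 1.20K RPM means 1,200 requests per rolling 60-second window.
RPM_LIMIT = 1200


class SlidingWindowLimiter:
    """Hypothetical client-side limiter: at most `limit` requests per `window` seconds."""

    def __init__(self, limit=RPM_LIMIT, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self.sent = deque()     # timestamps of requests still inside the window

    def wait_time(self):
        """Return seconds until the next request is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            return 0.0
        return self.window - (now - self.sent[0])

    def try_acquire(self):
        """Record a request and return True if under the limit, otherwise return False."""
        if self.wait_time() > 0.0:
            return False
        self.sent.append(self.clock())
        return True
```

A caller would check `try_acquire()` before opening each recognition request and, when it returns False, sleep for `wait_time()` seconds before retrying. A sliding window is used here because it never allows more than the limit in any 60-second span; a token bucket would be an equally reasonable choice.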