Qwen3.5-Flash

Reasoning, Text Generation, Visual Understanding

Overview

The Qwen3.5 native vision-language Flash models use a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design to improve inference efficiency. Compared with the Qwen3 series, they perform substantially better on both text-only and multimodal tasks while keeping response times fast.

Input

Text, Image, Video

Output

Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

  • Input
    $0.1 per 1M tokens
  • Output
    $0.4 per 1M tokens
  • Explicit Cache Creation
    $0.125 per 1M tokens
  • Explicit Cache Read
    $0.01 per 1M tokens
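
As a rough illustration of how the per-token prices above combine, the sketch below estimates the cost of a single request in USD. The function name `estimate_cost` and its parameters are hypothetical. It assumes the caller already knows how many tokens fall into each billing category; how cached tokens are split from regular input tokens is not stated on this page.

```python
# Prices in USD per 1M tokens, copied from the pricing list above.
PRICE_PER_MILLION = {
    "input": 0.1,
    "output": 0.4,
    "cache_creation": 0.125,
    "cache_read": 0.01,
}


def estimate_cost(input_tokens=0, output_tokens=0,
                  cache_creation_tokens=0, cache_read_tokens=0):
    """Return an estimated cost in USD (hypothetical helper, not an official API).

    Assumption: each token is billed in exactly one category, and the caller
    supplies the per-category counts.
    """
    counts = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_creation": cache_creation_tokens,
        "cache_read": cache_read_tokens,
    }
    total = sum(counts[name] * PRICE_PER_MILLION[name] for name in counts)
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 200K input tokens and 4K output tokens.
    print(f"${estimate_cost(input_tokens=200_000, output_tokens=4_000):.4f}")
```

Keeping the prices in one dictionary makes it easy to update them if the listed rates change.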

Context

Standard

Context: 1M
Max Input: 991.80K
Max Output: 65.53K

Thinking

Context: 1M
Max Input: 983.61K
Max Output: 65.53K
Max Reasoning: 81.92K
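
The limits for the two modes can be checked before sending a request. The sketch below is a minimal example that uses the rounded figures listed above (for instance, 991.80K is treated as 991,800). The exact token limits may differ slightly, and the helper name `fits_limits` is hypothetical. It only compares each value against its own listed maximum, because the page does not state how input, output, and reasoning tokens share the 1M context.

```python
# Rounded limits as listed above; exact values are an assumption.
LIMITS = {
    "standard": {"max_input": 991_800, "max_output": 65_530},
    "thinking": {"max_input": 983_610, "max_output": 65_530,
                 "max_reasoning": 81_920},
}


def fits_limits(mode, input_tokens, output_tokens, reasoning_tokens=0):
    """Return True if every token count is within the listed limit for `mode`.

    Hypothetical helper: each count is checked independently against its
    own maximum; no combined-context rule is assumed.
    """
    limits = LIMITS[mode]
    if input_tokens > limits["max_input"]:
        return False
    if output_tokens > limits["max_output"]:
        return False
    # Standard mode lists no reasoning budget, so treat it as zero.
    if reasoning_tokens > limits.get("max_reasoning", 0):
        return False
    return True


if __name__ == "__main__":
    print(fits_limits("standard", 990_000, 1_000))
    print(fits_limits("thinking", 990_000, 1_000))
```

The same 990K-token input fits in Standard mode but exceeds the smaller Max Input listed for Thinking mode.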

Rate Limits

  • RPM (Requests Per Minute)
    15K
  • TPM (Tokens Per Minute)
    5M
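
To see which of the two limits above is the binding one for a given workload, the sketch below computes the highest sustainable request rate from an average token count per request. The function name `max_requests_per_minute` is hypothetical, and which tokens count toward TPM (input, output, or both) is an assumption left to the caller.

```python
RPM_LIMIT = 15_000      # requests per minute, from the list above
TPM_LIMIT = 5_000_000   # tokens per minute, from the list above


def max_requests_per_minute(avg_tokens_per_request):
    """Return the sustainable requests per minute under both limits.

    Hypothetical helper: assumes every request consumes roughly
    `avg_tokens_per_request` tokens that count toward TPM.
    """
    if avg_tokens_per_request <= 0:
        raise ValueError("avg_tokens_per_request must be positive")
    token_bound = TPM_LIMIT // avg_tokens_per_request
    return min(RPM_LIMIT, token_bound)


if __name__ == "__main__":
    for avg in (100, 1_000, 50_000):
        print(avg, max_requests_per_minute(avg))
```

With these figures, requests averaging more than about 333 tokens (5M / 15K) are limited by TPM rather than RPM.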

Built-in Tools

web_search (Responses API)
web_extractor (Responses API)
code_interpreter (Responses API)
t2i_search (Responses API)
i2i_search (Responses API)

API Reference
