DeepSeek-V4-Flash - Qwen Cloud

DeepSeek

Text Generation, Reasoning

Overview

A highly efficient, lightweight Mixture-of-Experts (MoE) model with 284 billion total parameters, of which 13 billion are activated per token, and native support for context windows of up to one million tokens. It offers fast inference, low latency, and low-cost invocation with well-balanced overall performance. Designed for high-concurrency, lightweight workloads, it suits common everyday use cases such as dialogue, content creation, basic RAG applications, and batch text processing.
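
The gap between total and activated parameters is what makes an MoE model cheap to run. The sketch below makes that concrete using the rough rule of thumb of about 2 FLOPs per parameter used in a forward pass; it ignores attention, routing, and other overheads, so treat the numbers as order-of-magnitude estimates, not measured throughput. The helper names are illustrative only.

```python
# Parameter counts from the model description above.
TOTAL_PARAMS = 284e9
ACTIVE_PARAMS = 13e9


def active_fraction(total, active):
    """Share of the weights used for each token in an MoE forward pass."""
    return active / total


def forward_flops_per_token(params):
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per parameter used."""
    return 2 * params


print(f"Active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")  # → 4.6%
print(f"MoE (13B active): {forward_flops_per_token(ACTIVE_PARAMS):.2e} FLOPs/token")
print(f"Dense 284B model: {forward_flops_per_token(TOTAL_PARAMS):.2e} FLOPs/token")
```

Note that the activated count lowers per-token compute, not the memory needed to hold all 284 billion parameters.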

Input: Text

Output: Text

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Input: $0.2 per 1M tokens
Output: $0.4 per 1M tokens
Input (Implicit Cache): $0.04 per 1M tokens
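
To estimate what a request costs, the three rates can be combined as below. This is a minimal sketch under stated assumptions: input tokens served from the implicit cache are billed at the cache rate and the remaining input at the standard rate, thinking (reasoning) tokens are billed as output, and the rates are flat with no tiers. `estimate_cost` is a hypothetical helper, not part of any SDK.

```python
# Per-1M-token prices (USD) from the pricing table above.
# Assumptions: cached input tokens are billed at the implicit-cache rate,
# the rest of the input at the standard rate, and thinking tokens count as output.
INPUT_PRICE = 0.2
OUTPUT_PRICE = 0.4
CACHED_INPUT_PRICE = 0.04
PER_TOKENS = 1_000_000


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated USD cost of one request."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    return (
        uncached * INPUT_PRICE
        + cached_input_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    ) / PER_TOKENS


# 100K-token prompt, 60K of it served from the implicit cache, 5K output tokens.
print(f"${estimate_cost(100_000, 5_000, cached_input_tokens=60_000):.4f}")  # → $0.0124
```

Check the usage fields of an actual response to see how many input tokens were cache hits before relying on such an estimate.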

Context

Max Input

Max Output

393.21K

Rate Limits

RPM (Requests Per Minute): 10K
TPM (Tokens Per Minute): 1.20M
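
These limits are enforced by the service, but a client sending many concurrent requests may want to throttle itself before hitting them. The following is a minimal sliding-window sketch, assuming both limits are measured over a trailing 60-second window and that the caller can estimate each request's token count in advance. `SlidingWindowLimiter` and `try_acquire` are hypothetical names, not part of the dashscope SDK, and the server's own accounting may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side pre-check that keeps requests and tokens under per-minute caps."""

    def __init__(self, rpm=10_000, tpm=1_200_000, window=60.0):
        self.rpm = rpm
        self.tpm = tpm
        self.window = window
        self._events = deque()  # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0

    def _evict(self, now):
        # Drop requests that have fallen out of the trailing window.
        while self._events and now - self._events[0][0] >= self.window:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens, now=None):
        """Record the request and return True if it fits under both limits."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self._events) + 1 > self.rpm:
            return False  # would exceed requests per minute
        if self._tokens_in_window + tokens > self.tpm:
            return False  # would exceed tokens per minute
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would estimate the prompt plus expected output tokens, call `try_acquire`, and back off briefly before retrying when it returns False.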

API Reference

import os

import dashscope
from dashscope import Generation

# Point the SDK at the international Model Studio endpoint
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
response = Generation.call(
    # If DASHSCOPE_API_KEY is not set, replace this line with api_key="sk-xxx" (your Model Studio API key)
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="deepseek-v4-flash",
    messages=messages,
    result_format="message",
    # Enable thinking mode; the reasoning is returned in message.reasoning_content
    enable_thinking=True,
)

if response.status_code == 200:
    # Print thinking process
    print("=" * 20 + "Thinking process" + "=" * 20)
    print(response.output.choices[0].message.reasoning_content)
    
    # Print response
    print("=" * 20 + "Full response" + "=" * 20)
    print(response.output.choices[0].message.content)
else:
    print(f"HTTP return code: {response.status_code}")
    print(f"Error code: {response.code}")
    print(f"Error message: {response.message}")