Embedding-Vision-Plus

Tongyi Multimodal Embedding

Embedding

Overview

Embedding-Vision is a vision-centric multimodal embedding model built on an LLM. It is designed for strong domain-specific performance at low cost in areas such as e-commerce, photo galleries, security, and autonomous driving. It accepts text, image, and video inputs and supports downstream retrieval tasks, including text-to-image, image-to-image, text-to-video, and video-to-video.
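
To make the retrieval use case concrete, the sketch below ranks candidate image embeddings against a text query embedding by cosine similarity. The vectors are tiny made-up placeholders rather than real model output, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical. Cosine similarity is a common choice for embedding retrieval, but this page does not state which metric the model is tuned for, so treat it as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for a text query and three image embeddings.
query_vec = [0.9, 0.1, 0.0]
image_vecs = {
    "red_dress.jpg": [0.8, 0.2, 0.1],
    "blue_car.jpg": [0.0, 0.1, 0.9],
    "red_shoes.jpg": [0.7, 0.6, 0.0],
}

ranking = rank_by_similarity(query_vec, image_vecs)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

The same ranking step applies to image-to-image or video-to-video retrieval, since only the source of the query vector changes.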

Input

Text, Image, Video

Output

Features

Prefix Completion

Function Calling

Cache

Structured Outputs

Batches

Web Search

Pricing

Image Input: $0.09 per 1M tokens
Text Input: $0.09 per 1M tokens
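
As a rough budgeting aid, the listed rate can be turned into a cost estimate. The sketch below assumes the caller already knows the total token count; how images map to tokens is not stated on this page. The function name `estimate_cost_usd` is hypothetical.

```python
# Listed price for both text and image input, in USD per one million tokens.
PRICE_PER_MILLION_TOKENS_USD = 0.09


def estimate_cost_usd(total_tokens: int) -> float:
    """Estimate the input cost for a given number of tokens at the listed rate."""
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Example: a batch job that sends 25 million input tokens.
print(f"Estimated cost: ${estimate_cost_usd(25_000_000):.2f}")
```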

Rate Limits

RPM (Requests Per Minute): 600
TPM (Tokens Per Minute): 200K
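
One way to stay under the request limit is to space calls on the client side. The sketch below is a minimal throttle that keeps calls at least 60 / 600 = 0.1 seconds apart. The `Throttle` class is hypothetical and not part of any SDK, and it only addresses the RPM limit; the TPM limit would need separate token accounting.

```python
import time

RPM_LIMIT = 600  # requests per minute, from the rate limits listed above


class Throttle:
    """Space successive calls at least 60 / rpm seconds apart."""

    def __init__(self, rpm, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot relative to when this call actually runs.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


throttle = Throttle(RPM_LIMIT)
for i in range(3):
    waited = throttle.wait()
    # An embedding request would be issued here.
    print(f"request {i}: waited {waited:.3f}s")
```

The clock and sleep functions are injectable so the spacing logic can be checked without real delays.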

API Reference

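import dashscope

# The SDK reads the API key from the DASHSCOPE_API_KEY environment variable.

# Inputs are passed as a list of dicts; this sample sends a single text item.
text = "Multimodal embedding model sample"
inputs = [{'text': text}]  # not named `input`, to avoid shadowing the built-in

resp = dashscope.MultiModalEmbedding.call(
    model="tongyi-embedding-vision-plus",
    input=inputs
)

# Print the raw response, including the status and the returned embeddings.
print(resp)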
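
After the call returns, the vectors still have to be pulled out of the response. The sketch below works on a plain dictionary and assumes the output payload contains an `embeddings` list whose items each carry an `embedding` field. That shape is an assumption not confirmed by this page, so verify it against the API Reference. Both `extract_vectors` and `sample_output` are hypothetical, and the numbers are placeholders.

```python
def extract_vectors(output: dict) -> list:
    """Collect embedding vectors from an output payload of the assumed shape."""
    items = output.get("embeddings", [])
    return [item["embedding"] for item in items if "embedding" in item]


# Placeholder payload mimicking the assumed response shape (not real model output).
sample_output = {
    "embeddings": [
        {"index": 0, "type": "text", "embedding": [0.12, -0.03, 0.57]},
    ]
}

vectors = extract_vectors(sample_output)
print(f"{len(vectors)} vector(s), dimension {len(vectors[0])}")
```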