Tongyi Multimodal Embedding
Copied!
Embedding
Overview
Embedding
Embedding-Vision is a vision-centric multimodal embedding model powered by an LLM, featuring outstanding domain-specific performance and high cost-effectiveness in various domains (e.g., e-commerce, photo galleries, security, autonomous driving). With support for text, image, and video, it is applicable to downstream retrieval tasks, including text-to-image, image-to-image, text-to-video and video-to-video.
Input
TextImageVideo
Output
Features
Prefix Completion
Function Calling
Cache
Structured Outputs
Batches
Web Search
Pricing
- Image Input$0.09Per 1M tokens
- Text Input$0.09Per 1M tokens
Rate Limits
- RPMRequests Per Minute600
- TPMTokens Per Minute200K
API Reference
Get API KeyCopied!
1234567891011
import dashscope
text = "Multimodal embedding model sample"
input = [{'text': text}]
resp = dashscope.MultiModalEmbedding.call(
model="tongyi-embedding-vision-plus",
input=input
)
print(resp)