Qwen2.5-VL-72B-Instruct

Visual LLM

Qwen2.5-VL is a powerful vision-language model, designed for advanced image understanding. It can generate detailed image captions, analyze documents, OCR, detect objects, and answer questions based on visuals, making it useful for AI assistants, RAG and Agents.

About Qwen2.5-VL-72B-Instruct model

Published on huggingface

27/01/2025


Input price

0.91 /Mtoken(input)

Output price

0.91 /Mtoken(output)


Supported Features
MultimodalStreaming
Output Formats
raw_textjson_objectjson_schema
Context Sizes
32k
Parameters
72B

Try out the model by playing with it.