Qwen2.5-VL-72B-Instruct

Visual LLM

Qwen2.5-VL is a powerful vision-language model designed for advanced image understanding. It can generate detailed image captions, analyze documents, perform OCR, detect objects, and answer questions based on visual content, which makes it useful for AI assistants, RAG, and agents.


About the Qwen2.5-VL-72B-Instruct model

Released on Hugging Face

27/01/2025

License: Qwen

Input price

0.91 € / Mtoken (input)

Output price

0.91 € / Mtoken (output)

Supported features

Multimodal, Streaming

Output formats

raw_text, json_object, json_schema

Context size

32k

Parameters

72B


Example: using the Qwen2.5-VL-72B-Instruct vision language model API in Python

Introduction

The following examples walk you through using a VLM (vision language model), which takes images and text prompts as input and generates text. To send a multimodal input to the model, use a content list that contains the text prompt and the base64-encoded image.
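As a sketch of that content list, the helpers below turn raw image bytes into a base64 data URL and build one user message with the same `text` and `image_url` entries used in the /chat/completions example further down. Both function names are hypothetical. Passing several images in a single message is an assumption based on the OpenAI-compatible message format; verify that the endpoint accepts it before relying on it.

```python
import base64

# Hypothetical helpers, not part of any OVHcloud library.


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (data:<mime>;base64,<payload>)."""
    return f"data:{mime_type};base64,{base64.b64encode(image_bytes).decode('utf-8')}"


def build_user_message(prompt: str, image_data_urls: list[str]) -> dict:
    """Build a chat-completions user message: the text prompt first, then each image.

    Assumption: the endpoint accepts more than one image_url entry per message.
    """
    content = [{"type": "text", "text": prompt}]
    for url in image_data_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

For example, `build_user_message("Compare these two images.", [to_data_url(a), to_data_url(b)])` produces a message that can go straight into the `messages` list of the payload.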

A VLM encodes the image into embeddings and represents it as tokens that are processed alongside the usual text tokens, so each image consumes part of the model's input context length (32k tokens for this model).
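To get a feel for how much context an image takes, the sketch below estimates its visual token count. It assumes the behaviour described in the Qwen2-VL/Qwen2.5-VL model documentation, where the image is resized to multiples of 28 pixels and each 28×28 pixel area maps to roughly one token. It ignores the server's own resizing limits and the few special tokens that delimit an image, so treat the result as an order of magnitude only.

```python
PATCH_SIZE = 28  # assumption: one visual token per 28x28 pixel area (Qwen2-VL / Qwen2.5-VL docs)


def estimate_image_tokens(width: int, height: int) -> int:
    """Roughly estimate how many visual tokens an image of width x height pixels uses.

    Each side is snapped to the nearest multiple of PATCH_SIZE (at least one patch).
    Server-side min/max pixel limits and image delimiter tokens are ignored.
    """
    cols = max(1, round(width / PATCH_SIZE))
    rows = max(1, round(height / PATCH_SIZE))
    return cols * rows


if __name__ == "__main__":
    # A 1024x768 photo: 37 x 27 patches
    print(estimate_image_tokens(1024, 768))  # 999
```

By this estimate, a 1024×768 image takes about 1,000 of the 32k tokens, so large or numerous images leave less room for the text prompt and the generated answer.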

With a simple HTTP client (requests)

First, install the requests library:

pip install requests

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.
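The examples below read the token with `os.getenv`, so when the variable is missing they send the header `Bearer None`. A small helper like the following (a hypothetical name, not part of any OVHcloud library) makes that case explicit. Dropping the `Authorization` header for anonymous use is an assumption, based on the anonymous rate limit described in the Model rate limit section.

```python
import os
import warnings

TOKEN_VAR = "OVH_AI_ENDPOINTS_ACCESS_TOKEN"


def build_headers() -> dict:
    """Return request headers, adding the bearer token only when it is set."""
    headers = {"Content-Type": "application/json"}
    token = os.getenv(TOKEN_VAR)
    if token:
        headers["Authorization"] = f"Bearer {token}"
    else:
        # Assumption: requests without a token fall under the anonymous rate limit
        warnings.warn(f"{TOKEN_VAR} is not set; sending an anonymous request")
    return headers
```

With this helper, `headers = build_headers()` can replace the hand-written `headers` dictionary in the requests example.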

If you don't have an image available for testing, save a sample image locally as sample.jpg.

Then run the following Python code to ask the VLM to describe the image (or to perform another task, by adapting the text entry in the content list of the messages payload):

import mimetypes
import os
import requests
import base64

image_filepath = "sample.jpg"
with open(image_filepath, "rb") as img_file:
    image_data = img_file.read()

# detect MIME type (default to jpeg if unknown)
mime_type, _ = mimetypes.guess_type(image_filepath)
if mime_type is None:
    mime_type = "image/jpeg"

encoded_image = f"data:{mime_type};base64,{base64.b64encode(image_data).decode('utf-8')}"
 
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

# With /chat/completions
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"
payload = {
    "max_tokens": 512,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": encoded_image
                    }
                }
            ]
        }
    ],
    "model": "Qwen2.5-VL-72B-Instruct",
    "temperature": 0.2,
}
response = requests.post(url, json=payload, headers=headers)
if response.status_code == 200:
    # Parse the JSON response body
    response_data = response.json()
    # Each choice holds one generated message
    choices = response_data["choices"]
    for choice in choices:
        text = choice["message"]["content"]
        # Print the generated text
        print(text)
else:
    print("Error:", response.status_code, response.text)

# With /responses
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/responses"
payload = {
    "max_output_tokens": 512,
    "input": [
        {
            "role": "user",
            "type": "message",
            "content": [
                {
                    "type": "input_text",
                    "text": "Describe this image."
                },
                {
                    "type": "input_image",
                    "image_url": encoded_image
                }
            ]
        }
    ],
    "store": False, # Stateful mode is not supported
    "model": "Qwen2.5-VL-72B-Instruct",
    "temperature": 0,
}
response = requests.post(url, json=payload, headers=headers)
if response.status_code == 200:
    # Handle response
    response_data = response.json()
    text = response_data["output"][0]["content"][0]["text"]
    print(text)
else:
    print("Error:", response.status_code, response.text)
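The model card lists Streaming among the supported features. The sketch below assumes the endpoint follows the OpenAI streaming convention: adding `"stream": True` to the payload returns server-sent events, one per line shaped like `data: {chunk JSON}`, ending with `data: [DONE]`. The function names are hypothetical, and the prompt is text-only to keep it short; an `image_url` entry can be added to the content list exactly as in the example above.

```python
import json
import os
from typing import Iterable, Iterator

URL = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"


def iter_stream_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text deltas carried by OpenAI-style server-sent event lines."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = (choice.get("delta") or {}).get("content")
            if delta:
                yield delta


def stream_chat_completion(prompt: str) -> str:
    """Send a streaming request, print text as it arrives, and return the full answer."""
    import requests  # imported here so iter_stream_content works without requests installed

    payload = {
        "model": "Qwen2.5-VL-72B-Instruct",
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        "max_tokens": 512,
        "stream": True,  # assumption: OpenAI-style SSE streaming
    }
    headers = {"Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}"}
    pieces = []
    with requests.post(URL, json=payload, headers=headers, stream=True, timeout=60) as response:
        response.raise_for_status()
        response.encoding = "utf-8"  # decode the event stream as UTF-8
        for piece in iter_stream_content(response.iter_lines(decode_unicode=True)):
            print(piece, end="", flush=True)
            pieces.append(piece)
    print()
    return "".join(pieces)
```

Keeping the SSE parsing in its own generator makes it easy to test against canned lines without a network connection.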

With the Python OpenAI library

The Qwen2.5-VL-72B-Instruct API is compatible with the OpenAI specification.

First, install the openai library:

pip install openai

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.

Finally, run the following Python code:

import mimetypes
import os
import base64

from openai import OpenAI

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

def read_encoded_image(image_filepath: str) -> str:
    with open(image_filepath, "rb") as img_file:
        image_data = img_file.read()

    # detect MIME type (default to jpeg if unknown)
    mime_type, _ = mimetypes.guess_type(image_filepath)
    if mime_type is None:
        mime_type = "image/jpeg"

    encoded_image = f"data:{mime_type};base64,{base64.b64encode(image_data).decode('utf-8')}"
    return encoded_image

def multimodal_chat_completion(new_message: str, image_filepath: str = None) -> str:
    new_user_message = {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": new_message
            }
        ]
    }

    if image_filepath is not None:
        image_content = {
            "type": "image_url",
            "image_url": {
                "url": read_encoded_image(image_filepath)
            }
        }
        new_user_message["content"].append(image_content)

    history_openai_format = [new_user_message]

    return client.chat.completions.create(
        model="Qwen2.5-VL-72B-Instruct",
        messages=history_openai_format,
        temperature=0.2,
        max_tokens=1024
    ).choices[0].message.content

def multimodal_responses(new_message: str, image_filepath: str = None) -> str:
    new_user_message = {
        "role": "user",
        "type": "message",
        "content": [
            {
                "type": "input_text",
                "text": new_message
            }
        ]
    }

    if image_filepath is not None:
        image_content = {
            "type": "input_image",
            "image_url": read_encoded_image(image_filepath)
        }
        new_user_message["content"].append(image_content)

    history_openai_format = [new_user_message]

    return client.responses.create(
        model="Qwen2.5-VL-72B-Instruct",
        input=history_openai_format,
        temperature=0.2,
        max_output_tokens=1024,
        store=False # Stateful mode is not supported
    ).output[0].content[0].text

if __name__ == '__main__':
    # With chat completion endpoint
    print(multimodal_chat_completion("Describe this image.", "sample.jpg"))
    # With responses endpoint
    print(multimodal_responses("Describe this image.", "sample.jpg"))
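Both functions above send a single user turn. The requests are stateless (the Responses examples set `store=False` for that reason), so asking a follow-up question about the same image means sending the whole conversation again. A minimal sketch of extending a chat-completions message list, with a hypothetical helper name:

```python
def add_turn(history: list[dict], assistant_reply: str, follow_up: str) -> list[dict]:
    """Return a new chat-completions message list with the model's reply and a follow-up.

    The image stays in the first user message, so it is re-sent (and counted
    against the context window again) with every request.
    """
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": [{"type": "text", "text": follow_up}]},
    ]
```

The resulting list can be passed as `messages=` to `client.chat.completions.create`, and the helper applied again for each further turn.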

Model rate limit

When using AI Endpoints, the following rate limits apply:

Anonymous: 2 requests per minute, per IP and per model.
Authenticated with an API access key: 400 requests per minute, per Public Cloud project and per model.

If you exceed these limits, the API returns an HTTP 429 (Too Many Requests) error.

If you require higher usage, please get in touch with us to discuss increasing your rate limits.
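One common way to cope with occasional 429 responses is to retry with exponential backoff. The sketch below is a generic pattern, not an OVHcloud-provided helper: `send` is any zero-argument callable that performs the request (for example a lambda wrapping `requests.post` from the first example), and the delays are illustrative. Retrying does not raise the per-minute quota, so sustained traffic above it still calls for a higher limit.

```python
import random
import time
from typing import Any, Callable


def post_with_retry(
    send: Callable[[], Any],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    `send` must return an object with a `status_code` attribute, such as a
    requests.Response. The last response is returned once retries run out.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        # Wait base_delay * 2**attempt seconds (1s, 2s, 4s, ...) plus random jitter
        sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_retries must be >= 0")
```

Usage: `response = post_with_retry(lambda: requests.post(url, json=payload, headers=headers))`. The injectable `sleep` parameter exists only so the backoff can be tested without waiting.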

Going Further

Want to explore the full capabilities of the LLM API? Dive into our dedicated Structured Output and Function Calling guides.
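As a taste of structured output, the sketch below asks for a JSON answer matching a schema, since json_schema is listed among the supported output formats. The `response_format` shape follows the OpenAI specification that the API is compatible with; the schema, its name, and the helper functions are hypothetical examples, so check the Structured Output guide for the exact options AI Endpoints accepts.

```python
import json

# Hypothetical schema for illustration; adapt the fields to your own use case.
IMAGE_SUMMARY_SCHEMA = {
    "type": "object",
    "properties": {
        "caption": {"type": "string"},
        "objects": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["caption", "objects"],
    "additionalProperties": False,
}


def build_structured_payload(prompt: str, encoded_image: str) -> dict:
    """Build a /chat/completions payload asking for JSON that matches IMAGE_SUMMARY_SCHEMA."""
    return {
        "model": "Qwen2.5-VL-72B-Instruct",
        "max_tokens": 512,
        "temperature": 0,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": encoded_image}},
                ],
            }
        ],
        # Assumption: OpenAI-style response_format for the json_schema output format
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "image_summary", "schema": IMAGE_SUMMARY_SCHEMA, "strict": True},
        },
    }


def parse_structured_reply(response_data: dict) -> dict:
    """Decode the JSON text returned in the first choice."""
    return json.loads(response_data["choices"][0]["message"]["content"])
```

The payload can be sent with `requests.post` exactly as in the first example, and `response.json()` passed to `parse_structured_reply`.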

For a broader overview of AI Endpoints, explore the full AI Endpoints Documentation.

Reach out to our support team or join the #ai-endpoints channel on the OVHcloud Discord to share your questions, feedback, and suggestions for improving the service with the team and the community.
