Qwen3-32B

New, Reasoning LLM

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction following, agent capabilities, and multilingual support. With this model, reasoning can be disabled by including "/no_think" in your prompts.
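As a minimal sketch of the "/no_think" switch described above, the hypothetical helper below appends the tag to a user prompt before it is sent. The helper name and the choice to place the tag at the end of the message are assumptions made for illustration; this page only states that the tag goes in the prompt.

```python
NO_THINK_TAG = "/no_think"


def build_user_message(prompt: str, reasoning: bool = True) -> dict:
    """Return an OpenAI-style chat message, optionally asking the model to skip reasoning.

    Hypothetical helper: appending the tag at the end of the prompt is an assumption.
    """
    content = prompt.strip()
    # Only add the tag when reasoning is disabled and it is not already present.
    if not reasoning and NO_THINK_TAG not in content:
        content = f"{content} {NO_THINK_TAG}"
    return {"role": "user", "content": content}


if __name__ == "__main__":
    print(build_user_message("Explain gravity to a 6-year-old", reasoning=False))
```

The returned dictionary can be used as an entry of the "messages" list in a chat completion request.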

About Qwen3-32B model

Published on Hugging Face

05/03/2025

Licence: Apache 2.0

Input price

0.08 € / Mtoken (input)

Output price

0.23 € / Mtoken (output)

Supported Features

Function calling, Reasoning, Streaming

Output Formats

raw_text, json_object, json_schema

Context Sizes

32k

Parameters

32.8B
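The input and output prices listed above can be turned into a rough per-request cost estimate. The sketch below assumes that token counts are available, for example from an OpenAI-style "usage" object with "prompt_tokens" and "completion_tokens" fields (an assumption, not confirmed on this page). The function name is hypothetical.

```python
# Prices taken from this page, in euros per million tokens.
INPUT_PRICE_EUR_PER_MTOKEN = 0.08
OUTPUT_PRICE_EUR_PER_MTOKEN = 0.23


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in euros from its token counts."""
    total = (
        input_tokens * INPUT_PRICE_EUR_PER_MTOKEN
        + output_tokens * OUTPUT_PRICE_EUR_PER_MTOKEN
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical usage figures for a single request.
    usage = {"prompt_tokens": 2_000, "completion_tokens": 500}
    cost = estimate_cost_eur(usage["prompt_tokens"], usage["completion_tokens"])
    print(f"{cost:.6f} EUR")
```

Because output tokens cost almost three times as much as input tokens, lowering "max_tokens" is usually the more effective way to cap the cost of a request.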

Example of how to use the Qwen3-32B API in Python

With a simple HTTP client (requests)

First install the requests library:

pip install requests

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

If you do not have an access token yet, follow the instructions in the AI Endpoints - Getting Started guide.

Finally, run the following Python code:

import os
import requests

# Option 1: the model's dedicated URL
# url = "https://qwen-3-32b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions"
# Option 2: the unified endpoint, for easy model switching with optimal OpenAI compatibility
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"

payload = {
    "max_tokens": 512,
    "messages": [
        {
            "content": "Explain gravity to a 6-year-old",
            "role": "user"
        }
    ],
    "model": "Qwen3-32B",
    "temperature": 0,
}

headers = {
    "Content-Type": "application/json",
    # The access token is read from the environment variable exported above
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

response = requests.post(url, json=payload, headers=headers)
if response.status_code == 200:
    # Parse the JSON response
    response_data = response.json()
    choices = response_data["choices"]
    for choice in choices:
        # Each choice holds one generated message
        text = choice["message"]["content"]
        print(text)
else:
    print("Error:", response.status_code, response.text)
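Streaming is listed among the supported features. The sketch below shows one way to read a streamed answer, assuming the endpoint follows the OpenAI streaming convention: the request sets "stream" to true, and the response arrives as server-sent events whose "data:" lines carry JSON chunks, ending with "data: [DONE]". That wire format is an assumption here, and iter_stream_text is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator, Union


def iter_stream_text(lines: Iterable[Union[str, bytes]]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style server-sent event lines.

    Assumes each event line looks like 'data: {...json chunk...}' and that
    the stream ends with 'data: [DONE]'.
    """
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        # Skip keep-alive blank lines and anything that is not a data event.
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With the requests example above, this could be used by sending `requests.post(url, json={**payload, "stream": True}, headers=headers, stream=True)` and passing `response.iter_lines()` to the helper, printing each fragment as it arrives.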

With the Python OpenAI library

The Qwen3-32B API is compatible with the OpenAI specification.

First install the openai library:

pip install openai

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

Finally, run the following Python code:

import os

from openai import OpenAI

# Option 1: the model's dedicated URL
# url = "https://qwen-3-32b.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1"
# Option 2: the unified endpoint, for easy model switching with optimal OpenAI compatibility
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

def chat_completion(new_message: str) -> str:
    # Single-turn conversation: the history holds only the new user message
    history_openai_format = [{"role": "user", "content": new_message}]

    response = client.chat.completions.create(
        model="Qwen3-32B",
        messages=history_openai_format,
        temperature=0,
        max_tokens=1024
    )
    # Return the text of the first (and only) generated choice
    return response.choices[0].message.content


if __name__ == '__main__':
    print(chat_completion("Explain gravity to a 6-year-old"))
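The json_object and json_schema output formats listed on this page suggest that structured output can be requested. The sketch below builds such a request, assuming the endpoint accepts the OpenAI-style "response_format" parameter; this is an assumption, so check the AI Endpoints - Structured Output guide for the exact form. The schema, its name, and both helper functions are hypothetical.

```python
import json


def json_schema_format(name: str, schema: dict) -> dict:
    """Build an OpenAI-style response_format entry for JSON Schema output."""
    return {"type": "json_schema", "json_schema": {"name": name, "schema": schema}}


# Hypothetical schema: a short explanation plus a list of everyday examples.
ANSWER_SCHEMA = {
    "type": "object",
    "properties": {
        "explanation": {"type": "string"},
        "examples": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["explanation", "examples"],
}


def build_payload(prompt: str, response_format: dict) -> dict:
    """Assemble a chat completion request body that asks for structured output."""
    return {
        "model": "Qwen3-32B",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
        "max_tokens": 1024,
        "response_format": response_format,
    }


if __name__ == "__main__":
    payload = build_payload(
        "Explain gravity to a 6-year-old",
        json_schema_format("gravity_explanation", ANSWER_SCHEMA),
    )
    print(json.dumps(payload, indent=2))
```

With the OpenAI library, the same dictionary can be passed as the response_format argument of client.chat.completions.create, and the returned message content parsed with json.loads.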

Model rate limit

When using AI Endpoints, the following rate limits apply:

Anonymous: 2 requests per minute, per IP and per model.
Authenticated with an API access key: 400 requests per minute, per Public Cloud project and per model.

If you exceed this limit, a 429 error code will be returned.

If you require higher usage, please get in touch with us to discuss increasing your rate limits.
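A client can react to the 429 error code mentioned above by waiting and retrying. The sketch below uses exponential backoff, which is one common approach rather than a documented recommendation for AI Endpoints. The function name, the number of attempts, and the delays are illustrative assumptions.

```python
import time
from typing import Any, Callable


def post_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until the response is not a 429 or attempts run out.

    send must return an object with a status_code attribute, such as a
    requests.Response. The delay doubles after every rate-limited attempt.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With the requests example, send could be `lambda: requests.post(url, json=payload, headers=headers)`. Since the limits are expressed per minute, a long run of 429 responses is better handled by slowing the overall request rate than by retrying more often.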

Get started with Qwen3-32B

AI Endpoints - Getting Started

AI Endpoints - Features, Capabilities and Limitations

AI Endpoints - Troubleshooting

AI Endpoints - Structured Output

AI Endpoints - Function Calling

AI Endpoints - Using Virtual Models