Gpt-oss-20b

Reasoning LLM

gpt-oss-20b is an open-weight reasoning model from OpenAI that delivers strong performance with fast inference and efficient reasoning. It suits a wide range of tasks and offers flexible deployment for developers seeking powerful, customizable AI solutions.


About Gpt-oss-20b model

Published on Hugging Face: 05/08/2025

Licence: Apache 2.0

Input price: 0.04 € per million input tokens

Output price: 0.15 € per million output tokens

Supported features: Function calling, Reasoning, Streaming

Output formats: raw_text, json_object, json_schema

Context size: 131k tokens

Parameters: 21B
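As a rough sketch, the cost of a request follows directly from the prices above: multiply each token count by its per-million price and divide by one million. The snippet assumes that reasoning tokens are billed as output tokens and that you take the counts from the `usage` field (`prompt_tokens`, `completion_tokens`) returned by OpenAI-compatible APIs; `estimate_cost_eur` is a hypothetical helper, so check your actual billing for the exact rules.

```python
# Prices listed above, in euros per million tokens
INPUT_PRICE_EUR_PER_MTOKEN = 0.04
OUTPUT_PRICE_EUR_PER_MTOKEN = 0.15


def estimate_cost_eur(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumption: reasoning tokens are counted as output tokens.
    """
    return (
        input_tokens * INPUT_PRICE_EUR_PER_MTOKEN
        + output_tokens * OUTPUT_PRICE_EUR_PER_MTOKEN
    ) / 1_000_000


if __name__ == "__main__":
    # 2,000 prompt tokens and 1,000 generated tokens
    print(f"{estimate_cost_eur(2_000, 1_000):.6f} €")
```

For 2,000 input tokens and 1,000 output tokens, this prints 0.000230 €.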


Example of how to use the gpt-oss-20b API in Python

With a simple HTTP client (requests)

First, install the requests library:

pip install requests

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.
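The examples below read the token with `os.getenv`, which silently sends `Bearer None` if the variable is missing. If you intend to use authenticated access, a small helper such as the hypothetical `get_access_token` below can fail early with a clear message instead. This is only a sketch; anonymous access, with its lower rate limit, does not need a token at all.

```python
import os


def get_access_token(var_name: str = "OVH_AI_ENDPOINTS_ACCESS_TOKEN") -> str:
    """Return the AI Endpoints access token, failing early if it is not set."""
    token = os.getenv(var_name)
    if not token:
        # Avoid sending "Authorization: Bearer None" to the API
        raise RuntimeError(
            f"{var_name} is not set; export your access token before running the examples"
        )
    return token
```

You can then build the header as `{"Authorization": f"Bearer {get_access_token()}"}`.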

Finally, run the following Python code:

import os
import requests

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"
payload = {
    "max_tokens": 512,
    "messages": [
        {
            "content": "Explain gravity to a 6-year-old",
            "role": "user"
        }
    ],
    "model": "gpt-oss-20b",
    "temperature": 0,
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

# A timeout prevents the call from hanging indefinitely
response = requests.post(url, json=payload, headers=headers, timeout=60)
if response.status_code == 200:
    # Parse the JSON response body
    response_data = response.json()
    for choice in response_data["choices"]:
        message = choice["message"]
        # Final answer produced by the model
        print(message["content"])
        # Reasoning trace; read with .get() in case the field is absent
        reasoning_text = message.get("reasoning_content")
        if reasoning_text:
            print(reasoning_text)
else:
    print("Error:", response.status_code, response.text)
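The model also supports streaming. Under the OpenAI specification, setting `"stream": True` in the payload makes the endpoint send server-sent events: each `data:` line holds a JSON chunk whose `choices[].delta` carries the next piece of text, and the stream ends with `data: [DONE]`. The sketch below assumes AI Endpoints follows that format; `iter_stream_deltas` and `stream_chat` are hypothetical helpers, and the presence of `reasoning_content` in streamed deltas is an assumption.

```python
import json
import os


def iter_stream_deltas(lines):
    """Yield ("reasoning" | "content", text) pairs from OpenAI-style SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker in the OpenAI specification
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # Assumption: streamed deltas carry reasoning_content like the full message
            if delta.get("reasoning_content"):
                yield "reasoning", delta["reasoning_content"]
            if delta.get("content"):
                yield "content", delta["content"]


def stream_chat(prompt: str) -> None:
    """Stream an answer from gpt-oss-20b and print it as it arrives."""
    import requests  # imported here so the parser above has no third-party dependency

    url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"
    payload = {
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "stream": True,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
    }
    with requests.post(url, json=payload, headers=headers, stream=True, timeout=60) as response:
        response.raise_for_status()
        for kind, text in iter_stream_deltas(response.iter_lines()):
            # Only the final answer is printed; reasoning chunks are skipped
            if kind == "content":
                print(text, end="", flush=True)
    print()
```

Calling `stream_chat("Explain gravity to a 6-year-old")` prints the answer piece by piece instead of waiting for the full response.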

With the Python OpenAI library

The gpt-oss-20b API is compatible with the OpenAI specification.

First, install the openai library:

pip install openai

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

Finally, run the following Python code:

import os

from openai import OpenAI

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

def chat_completion(new_message: str) -> str:
    history_openai_format = [{"role": "user", "content": new_message}]

    choice = client.chat.completions.create(
        model="gpt-oss-20b",
        messages=history_openai_format,
        temperature=0,
        max_tokens=1024,
        reasoning_effort="low",  # Either "low", "medium", or "high"
    ).choices[0]

    # reasoning_content is an extra field outside the OpenAI schema, so read it defensively
    reasoning = getattr(choice.message, "reasoning_content", None)
    print(f"** REASONING ** \n {reasoning}\n")
    print(f"** CONTENT ** \n {choice.message.content}")
    return choice.message.content or ""

if __name__ == '__main__':
    chat_completion("Explain gravity to a 6-year-old")
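The function above sends a single user message. To hold a conversation, the same `messages` list can be extended turn by turn. The sketch below is a hypothetical helper, `chat_turn`, that accepts any client exposing `client.chat.completions.create` (such as the `client` created above). As a simplifying choice, it appends only the final answer, not the reasoning trace, to the history.

```python
def chat_turn(client, history: list, user_message: str, model: str = "gpt-oss-20b") -> str:
    """Send one user turn, record the assistant reply in `history`, and return it."""
    history.append({"role": "user", "content": user_message})
    choice = client.chat.completions.create(
        model=model,
        messages=history,
        temperature=0,
        max_tokens=1024,
    ).choices[0]
    answer = choice.message.content or ""
    # Only the final answer is kept; the reasoning trace is not sent back
    history.append({"role": "assistant", "content": answer})
    return answer
```

For example, calling `chat_turn(client, history, ...)` twice with the same `history = []` list makes the second request include the first question and its answer.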

Model rate limit

When using AI Endpoints, the following rate limits apply:

Anonymous: 2 requests per minute, per IP and per model.
Authenticated with an API access key: 400 requests per minute, per Public Cloud project and per model.
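To stay under the authenticated limit from a single process, you can throttle requests on the client side. The class below is a hypothetical sliding-window limiter, not part of any OVHcloud SDK. It assumes the server counts requests over a rolling 60-second window, and it does not coordinate between several processes that share the same project quota.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls: int = 400, period: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window
            self._sleep(self.period - (now - self._calls[0]))
```

Create one `limiter = SlidingWindowLimiter()` per model and call `limiter.acquire()` before each `requests.post(...)`.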

If you exceed these limits, the API returns an HTTP 429 (Too Many Requests) error.
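When a 429 does occur, a common pattern is to wait and retry with exponential backoff. The hypothetical `post_with_retry` helper below wraps any request callable. It honours a `Retry-After` header expressed in seconds if the endpoint sends one (an assumption, since the header is not documented here) and otherwise waits 1 s, 2 s, 4 s, and so on, with a little random jitter.

```python
import random
import time


def post_with_retry(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff while it returns HTTP 429.

    `send` is any zero-argument callable returning an object with
    `status_code` and `headers`, such as a lambda wrapping requests.post.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            # Assumption: the server may send Retry-After in seconds
            delay = float(retry_after)
        else:
            # Exponential backoff (1 s, 2 s, 4 s, ...) plus up to 10% jitter
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
        sleep(delay)
```

For example: `response = post_with_retry(lambda: requests.post(url, json=payload, headers=headers, timeout=60))`.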

If you require higher usage, please get in touch with us to discuss increasing your rate limits.

Going Further

Want to explore the full capabilities of the LLM API? Dive into our dedicated Structured Output and Function Calling guides.
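Because the supported output formats include `json_schema`, you can ask the model for JSON that matches a schema through the OpenAI-style `response_format` field. The payload below follows the OpenAI specification; the schema, the `explanation` name, and both helper functions are hypothetical, and support for options such as `strict` should be confirmed in the Structured Output guide.

```python
import json

# Hypothetical schema used for illustration
EXPLANATION_SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "key_points": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "key_points"],
    "additionalProperties": False,
}


def build_structured_payload(prompt: str) -> dict:
    """Chat completion payload asking gpt-oss-20b for JSON matching the schema."""
    return {
        "model": "gpt-oss-20b",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "explanation", "schema": EXPLANATION_SCHEMA, "strict": True},
        },
    }


def parse_structured_content(content: str) -> dict:
    """Decode the model's JSON answer and check that the required keys are present."""
    data = json.loads(content)
    missing = [key for key in EXPLANATION_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return data
```

Send the payload with the requests example above, then pass `response_data["choices"][0]["message"]["content"]` to `parse_structured_content`.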

For a broader overview of AI Endpoints, explore the full AI Endpoints Documentation.

Reach out to our support team or join the #ai-endpoints channel on the OVHcloud Discord to share your questions, feedback, and suggestions for improving the service with the team and the community.


Get started with Gpt-oss-20b

AI Endpoints - Getting Started

AI Endpoints - Features, Capabilities and Limitations

AI Endpoints - Troubleshooting

AI Endpoints - Structured Output

AI Endpoints - Function Calling

AI Endpoints - Using Virtual Models