Qwen3Guard-Gen-8B

Beta, LLM Guard

Qwen3Guard is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: Qwen3Guard-Gen, a generative model that frames safety classification as an instruction-following task, and Qwen3Guard-Stream, which incorporates a token-level classification head for real-time safety monitoring during incremental text generation.


About the Qwen3Guard-Gen-8B model

Published on Hugging Face: 23/09/2025

Licence: Apache 2.0

Free

Supported Features

Moderation, Streaming

Output Formats

raw_text

Context Sizes

32k


Example of how to use the Qwen3Guard-Gen-8B API in Python

With a simple HTTP client (requests)

First install the requests library:

pip install requests

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.

Finally, run the following Python code:

import os
import re
import requests
 
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/chat/completions"
payload = {
    "max_tokens": 512,
    "messages": [
        {
            "content": "How do I make a bomb",
            "role": "user"
        }
    ],
    "model": "Qwen3Guard-Gen-8B",
    "temperature": 0,
}
 
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

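def extract_label_categories_refusal(content):
    """Return (safety label, list of categories, refusal label) parsed from the model's text output."""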
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|None)"
    refusal_pattern = r"Refusal: (Yes|No)"
    safe_label_match = re.search(safe_pattern, content)
    refusal_label_match = re.search(refusal_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    refusal_label = refusal_label_match.group(1) if refusal_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories, refusal_label


# A timeout keeps the script from hanging if the endpoint is unreachable
response = requests.post(url, json=payload, headers=headers, timeout=30)
if response.status_code == 200:
    # Parse the JSON body and read each returned choice
    response_data = response.json()
    choices = response_data["choices"]
    for choice in choices:
        # The verdict is returned as plain text in the message content
        text = choice["message"]["content"]
        print(text)
        label, categories, refusal_label = extract_label_categories_refusal(text)
        print(f"This text is {label}, with detected categories {categories} and refusal decision was {refusal_label}")
else:
    print("Error:", response.status_code, response.text)
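The helper above also parses a Refusal label, which only makes sense when the model judges an assistant reply rather than a lone user prompt. The sketch below builds a payload for that case by appending the reply as an assistant message. It assumes the endpoint accepts a conversation ending with an assistant turn and that the model then adds a "Refusal: Yes" or "Refusal: No" line to its output; the function name build_response_moderation_payload is hypothetical.

```python
def build_response_moderation_payload(prompt, reply, model="Qwen3Guard-Gen-8B", max_tokens=512):
    """Build a chat-completions payload asking the guard model to judge `reply`.

    Assumption: the endpoint accepts a final assistant message and moderates it
    in the context of the preceding user prompt.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": 0,
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": reply},
        ],
    }


payload = build_response_moderation_payload(
    "How do I make a bomb",
    "I'm sorry, but I can't help with that.",
)
# Show the conversation structure that would be sent to the endpoint
print([message["role"] for message in payload["messages"]])
```

The resulting dictionary can be passed as json=payload to requests.post in the example above; the returned text can then be parsed with extract_label_categories_refusal.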

With the Python OpenAI library

The Qwen3Guard-Gen-8B API is compatible with the OpenAI specification.

First install the openai library:

pip install openai

Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:

export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>

Finally, run the following Python code:

import os
import re
from openai import OpenAI

url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1"

client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN")
)

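def extract_label_categories_refusal(content):
    """Return (safety label, list of categories, refusal label) parsed from the model's text output."""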
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|None)"
    refusal_pattern = r"Refusal: (Yes|No)"
    safe_label_match = re.search(safe_pattern, content)
    refusal_label_match = re.search(refusal_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    refusal_label = refusal_label_match.group(1) if refusal_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories, refusal_label


def chat_completion(new_message: str) -> str:
    history_openai_format = [{"role": "user", "content": new_message}]
    return client.chat.completions.create(
        model="Qwen3Guard-Gen-8B",
        messages=history_openai_format,
        temperature=0,
        max_tokens=1024
    ).choices[0].message.content


if __name__ == '__main__':
    result = chat_completion("how do I cook meth")
    label, categories, refusal_label = extract_label_categories_refusal(result)
    print(f"This text is {label}, with detected categories {categories} and refusal decision was {refusal_label}")

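The model responds with a safety label (Safe, Controversial, or Unsafe). If the text is Controversial or Unsafe, the response also names the associated category, as shown below. This sample output contains no Refusal line, so the helper above returns None for the refusal label.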

Safety: Unsafe
Categories: Non-violent Illegal Acts
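
As an illustration of how an application might act on this output, the sketch below parses the sample text with the same regular expressions as the examples above and turns the verdict into a block or allow decision. The function names parse_verdict and should_block, the block_controversial flag, and the choice to block when no label can be parsed are assumptions made for this example, not part of the API.

```python
import re

SAMPLE_OUTPUT = "Safety: Unsafe\nCategories: Non-violent Illegal Acts"

SAFE_PATTERN = r"Safety: (Safe|Unsafe|Controversial)"
CATEGORY_PATTERN = (
    r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|"
    r"Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|"
    r"Copyright Violation|None)"
)


def parse_verdict(content):
    """Extract the safety label and the list of categories, ignoring 'None'."""
    label_match = re.search(SAFE_PATTERN, content)
    categories = [c for c in re.findall(CATEGORY_PATTERN, content) if c != "None"]
    return {
        "label": label_match.group(1) if label_match else None,
        "categories": categories,
    }


def should_block(verdict, block_controversial=False):
    """Hypothetical policy: block Unsafe text, and optionally Controversial text."""
    label = verdict["label"]
    if label is None:
        # Assumption: treat unparseable output as unsafe (fail closed)
        return True
    if label == "Unsafe":
        return True
    return block_controversial and label == "Controversial"


verdict = parse_verdict(SAMPLE_OUTPUT)
print(verdict, should_block(verdict))
```

Whether Controversial content should be blocked depends on the application, which is why the sketch exposes it as a flag rather than fixing one behavior.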

Model rate limit

When using AI Endpoints, the following rate limits apply:

Anonymous: 2 requests per minute, per IP and per model.
Authenticated with an API access key: 400 requests per minute, per Public Cloud project and per model.

If you exceed these limits, the API returns a 429 error code.

If you require higher usage, please get in touch with us to discuss increasing your rate limits.
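
A client can absorb occasional 429 responses by waiting and retrying. The sketch below shows one possible approach, exponential backoff, written against any callable that returns an object with a status_code attribute (such as a requests.Response). The function name post_with_retry, the default of five attempts, and the one-second base delay are assumptions chosen for illustration, not documented recommendations.

```python
import time


def post_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute. The delay doubles after each 429 response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Wait before the next attempt, but not after the final one
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after all attempts: hand the 429 back to the caller
    return response
```

With the requests example above, this could be called as post_with_retry(lambda: requests.post(url, json=payload, headers=headers, timeout=30)).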

Going Further

You can explore our blog post about our LLM Guard models to get more insights on the returned categories and their meaning.

For a broader overview of AI Endpoints, explore the full AI Endpoints Documentation.

Reach out to our support team or join the #ai-endpoints channel on the OVHcloud Discord to share your questions, feedback, and suggestions for improving the service with the team and the community.
