This NVIDIA TTS model generates natural-sounding Italian speech from raw text, without requiring any additional information.
The Text-to-Speech (TTS) API endpoint allows you to synthesize speech from raw text.
Text-to-Speech (TTS) is a subfield of Artificial Intelligence (AI) that converts written text into spoken words. This TTS API operates as a two-stage pipeline: a first model generates a mel spectrogram from the text, and a second model converts that mel spectrogram into speech.
AI Endpoints makes this easy with ready-to-use inference APIs. Discover how to use them below.
These TTS models were developed by NVIDIA. The TTS AI Endpoint takes text as input and returns an audio stream or audio buffer, along with optional additional metadata.
The TTS endpoint offers you a wide range of synthesis options. Learn how to use them with the following example:
First install the requests library:
pip install requests
Next, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:
export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>
If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.
Finally, run the following Python code:
import os

import requests

url = "https://nvr-tts-it-it.endpoints.kepler.ai.cloud.ovh.net/api/v1/tts/text_to_audio"

headers = {
    "accept": "application/octet-stream",
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

data = {
    "encoding": 1,
    "language_code": "it-IT",
    "sample_rate_hz": 16000,
    "text": "We provide a set of managed tools designed for building your Machine Learning projects: AI Notebooks, AI Training, AI Deploy and AI Endpoints.",
    "voice_name": "Italian-IT-Female-1"
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    # Save the audio content to a file
    with open("output_audio.wav", "wb") as audio_file:
        audio_file.write(response.content)
    print("Audio file saved as output_audio.wav")
else:
    print("Error:", response.status_code, response.text)
This returns the following result:
Audio file saved as output_audio.wav
You are now able to play and use your generated audio file.
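To quickly sanity-check the result without an audio player, you can inspect the saved file with Python's standard `wave` module. The helper below is a minimal sketch (`wav_info` is a hypothetical name, not part of the API):

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_s) for a WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        duration_s = wav.getnframes() / rate
    return channels, rate, duration_s
```

For example, `wav_info("output_audio.wav")` should report the sample rate requested in the call above.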
Install the Riva client and audio libraries:
pip install nvidia-riva-client numpy
This basic example returns the speech audio generated by the model:
import os

import numpy as np
import IPython.display as ipd
import riva.client

# Connect to the Riva TTS server, authenticating with your access token
tts_service = riva.client.SpeechSynthesisService(
    riva.client.Auth(
        uri="nvr-tts-it-it.endpoints-grpc.kepler.ai.cloud.ovh.net:443",
        use_ssl=True,
        metadata_args=[["authorization", f"bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}"]],
    )
)

# Set up the synthesis configuration
sample_rate_hz = 44100
req = {
    "language_code": "it-IT",  # choose the corresponding language in the list: en-US / es-ES / de-DE / it-IT
    "encoding": riva.client.AudioEncoding.LINEAR_PCM,
    "sample_rate_hz": sample_rate_hz,  # sample rate: 44.1 kHz audio
    "voice_name": "Italian-IT-Female-1"  # voices: `English-US.Female-1`, `English-US.Male-1`,
    # `English-US.Female-Calm`, `English-US.Female-Neutral`,
    # `English-US.Female-Happy`, `English-US.Female-Angry`,
    # `English-US.Female-Fearful`, `English-US.Female-Sad`,
    # `English-US.Male-Calm`, `English-US.Male-Neutral`,
    # `English-US.Male-Happy`, `English-US.Male-Angry`,
    # `Spanish-ES-Female-1`, `Spanish-ES-Male-1`,
    # `German-DE-Male-1`, `Italian-IT-Female-1`,
    # `Italian-IT-Male-1`, `Mandarin-CN.Female-1`,
    # `Mandarin-CN.Male-1`, `Mandarin-CN.Female-Calm`,
    # `Mandarin-CN.Female-Neutral`, `Mandarin-CN.Male-Happy`,
    # `Mandarin-CN.Male-Fearful`, `Mandarin-CN.Male-Sad`,
    # `Mandarin-CN.Male-Calm`, `Mandarin-CN.Male-Neutral`,
    # `Mandarin-CN.Male-Angry`
}

# Input text
req["text"] = "We provide a set of managed tools designed for building your Machine Learning projects: AI Notebooks, AI Training, AI Deploy and AI Endpoints."

# Request speech synthesis and decode the raw 16-bit PCM samples
response = tts_service.synthesize(**req)
audio_samples = np.frombuffer(response.audio, dtype=np.int16)

# Play the output audio (in a Jupyter notebook)
ipd.Audio(audio_samples, rate=sample_rate_hz)
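Outside a notebook, `ipd.Audio` is not available; the raw LINEAR_PCM samples can instead be written to a WAV file with Python's standard `wave` module. This is a sketch assuming 16-bit mono samples, as returned above (`save_pcm_to_wav` is a hypothetical helper name):

```python
import wave

import numpy as np


def save_pcm_to_wav(samples: np.ndarray, sample_rate_hz: int, path: str) -> None:
    """Write 16-bit mono PCM samples (an int16 NumPy array) to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(samples.astype(np.int16).tobytes())
```

For example, `save_pcm_to_wav(audio_samples, sample_rate_hz, "output_audio.wav")` produces a file playable by any standard audio player.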
When using AI Endpoints, rate limits apply. If you exceed these limits, a 429 error code will be returned.
If you require higher usage, please get in touch with us to discuss increasing your rate limits.
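A common way to handle 429 responses is to retry with exponential backoff. The sketch below wraps a POST call in a hypothetical helper (`post_with_backoff` is not part of the API; the `session` parameter is only there to make the helper easy to test):

```python
import time

import requests


def post_with_backoff(url, *, headers=None, json=None,
                      max_retries=5, base_delay=1.0, session=None):
    """POST a request, retrying with exponential backoff on HTTP 429."""
    http = session or requests
    for attempt in range(max_retries):
        response = http.post(url, headers=headers, json=json)
        if response.status_code != 429:
            return response
        # Wait 1s, 2s, 4s, ... before retrying
        time.sleep(base_delay * (2 ** attempt))
    return response
```

You could then replace the earlier `requests.post(url, headers=headers, json=data)` call with `post_with_backoff(url, headers=headers, json=data)`.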
For more information about the TTS model features, please refer to the NVIDIA Riva TTS documentation.
For a broader overview of AI Endpoints, explore the full AI Endpoints Documentation.
Reach out to our support team or join the OVHcloud Discord #ai-endpoints channel to share your questions, feedback, and suggestions for improving the service with the team and the community.
New to AI Endpoints? This guide walks you through everything you need to get an access token, call AI models, and integrate AI APIs into your apps with ease.
Explore what AI Endpoints can do. This guide breaks down current features, future roadmap items, and the platform's core capabilities so you know exactly what to expect.
Running into issues? This guide helps you solve common problems on AI Endpoints, from error codes to unexpected responses. Get quick answers, clear fixes, and helpful tips to keep your projects running smoothly.
Learn how to use OVHcloud AI Endpoints Virtual Models.
Explore the full potential of the ASR API through advanced parameters (diarization, timestamp granularities, chunking strategies, prompts ...) and ready-to-run code examples in Python, cURL, and JavaScript.
Turn hours of audio into sharp, readable summaries! This guide walks you through building an AI-powered audio assistant using ASR and LLMs, perfect for meetings, podcasts, or any voice recordings.
Build a voice-enabled assistant in under 100 lines of code! Learn how to combine ASR, LLM, and TTS endpoints to create an AI that listens, understands, and responds.
Discover how to synthesize realistic dialogues and automatically identify who’s speaking. This notebook shows you how to generate speech and label each speaker, perfect for building smart transcripts.
Bring your synthetic voice to life! Discover how to use Text-To-Speech with emotion mixing: adjusting pitch, tone, pace, and style to express emotions like joy, sadness, or surprise.