Whisper is a state-of-the-art model for automatic speech recognition (ASR) and speech translation, proposed in the paper "Robust Speech Recognition via Large-Scale Weak Supervision" by Alec Radford et al. from OpenAI. Trained on more than 5 million hours of labeled data, Whisper demonstrates a strong ability to generalise to many datasets and domains in a zero-shot setting. Whisper large-v3-turbo is a fine-tuned version of a pruned Whisper large-v3. In other words, it is the same model, except that the number of decoding layers has been reduced from 32 to 4. As a result, the model is much faster, at the expense of a minor degradation in quality.
The OpenAI-compatible Audio transcription API endpoint allows you to recognize and transcribe audio, especially human speech, into text.
Audio transcription is a technology that converts spoken language into written text. It is a complex process that involves several stages, including speech signal preprocessing, feature extraction, acoustic modeling, language modeling, and decoding by the speech recognition engine.
AI Endpoints makes this easy with ready-to-use inference APIs.
The Audio transcription endpoint offers a wide range of transcription options for audio files. The following examples show how to use them:
First, export your access token to the OVH_AI_ENDPOINTS_ACCESS_TOKEN environment variable:
export OVH_AI_ENDPOINTS_ACCESS_TOKEN=<your-access-token>
If you do not have an access token yet, follow the instructions in the AI Endpoints – Getting Started guide.
The whisper-large-v3-turbo API is compatible with the OpenAI specification.
First, install the openai library:
pip install openai
Finally, run the following Python code:
import os
from openai import OpenAI
# Option 1: the model's dedicated URL (uncomment to use it instead)
# url = "https://whisper-large-v3-turbo.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1"
# Option 2: the unified endpoint, for easy model switching with optimal OpenAI compatibility
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/"
audio_file_path = "my_audio.mp3"
client = OpenAI(
    base_url=url,
    api_key=os.getenv("OVH_AI_ENDPOINTS_ACCESS_TOKEN"),
)
# Open the file in binary mode; the context manager closes it after the request
with open(audio_file_path, "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3-turbo",
        file=audio_file,
    )
print(transcript)
This returns the following:
Transcription(text=' It is very exciting that the Olympic Games will be held in Paris in 2024 for France. The Olympic Games have been a fantastic way for England to feel excited about sports and the 2012 Olympics in London were very interesting for the whole nation. It was an opportunity for us to come together and unite under the interest of sport. The Olympic Games provided a sense of patriotism that I believe it also will do in France this year. It is a chance for everyone to get involved and become interested about different types of sports and to cheer on their home country.', logprobs=None, task='transcribe', success=True, language='en', duration=60.936, words=[], segments=[{'id': 1, 'seek': 0, 'start': 0.0, 'end': 8.48, 'text': ' It is very exciting that the Olympic Games will be held in Paris in 2024 for France.', 'tokens': [50365, 467, 307, 588, 4670, 300, 264, 19169, 12761, 486, 312, 5167, 294, 8380, 294, 45237, 337, 6190, 13, 50865], 'temperature': 0.0, 'avg_logprob': -0.31054688, 'compression_ratio': 0.9767442, 'no_speech_prob': 0.0}, {'id': 2, 'seek': 848, 'start': 8.48, 'end': 26.06, 'text': ' The Olympic Games have been a fantastic way for England to feel excited about sports and the 2012 Olympics in London were very interesting for the whole nation.', 'tokens': [50365, 440, 19169, 12761, 362, 668, 257, 5456, 636, 337, 8196, 281, 841, 2919, 466, 6573, 293, 264, 9125, 19854, 294, 7042, 645, 588, 1880, 337, 264, 1379, 4790, 13, 51246], 'temperature': 0.0, 'avg_logprob': -0.1633113, 'compression_ratio': 1.4670658, 'no_speech_prob': 0.0}, {'id': 3, 'seek': 848, 'start': 26.64, 'end': 34.84, 'text': ' It was an opportunity for us to come together and unite under the interest of sport.', 'tokens': [51246, 467, 390, 364, 2650, 337, 505, 281, 808, 1214, 293, 29320, 833, 264, 1179, 295, 7282, 13, 51685], 'temperature': 0.0, 'avg_logprob': -0.1633113, 'compression_ratio': 1.4670658, 'no_speech_prob': 0.0}, {'id': 4, 'seek': 3484, 'start': 34.84, 'end': 45.2, 
'text': ' The Olympic Games provided a sense of patriotism that I believe it also will do in France this year.', 'tokens': [50365, 440, 19169, 12761, 5649, 257, 2020, 295, 44210, 1434, 300, 286, 1697, 309, 611, 486, 360, 294, 6190, 341, 1064, 13, 50885], 'temperature': 0.0, 'avg_logprob': -0.11174939, 'compression_ratio': 1.4268292, 'no_speech_prob': 0.0}, {'id': 5, 'seek': 3484, 'start': 46.12, 'end': 60.24, 'text': ' It is a chance for everyone to get involved and become interested about different types of sports and to cheer on their home country.', 'tokens': [50885, 467, 307, 257, 2931, 337, 1518, 281, 483, 3288, 293, 1813, 3102, 466, 819, 3467, 295, 6573, 293, 281, 12581, 322, 641, 1280, 1941, 13, 51640], 'temperature': 0.0, 'avg_logprob': -0.11174939, 'compression_ratio': 1.4268292, 'no_speech_prob': 0.0}], diarization=[], usage={'type': 'duration', 'duration': 61.0})
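Each entry in the `segments` list of the response above carries `start` and `end` times in seconds together with the transcribed `text`. As a minimal sketch, assuming the segments are available as plain dictionaries with those three keys (as in the printed output), the hypothetical helpers `format_timestamp` and `format_segments` below turn them into timestamped lines. Both names are illustrative and not part of the API.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.cc (centisecond precision)."""
    total_cs = round(seconds * 100)          # work in whole centiseconds
    minutes, rest = divmod(total_cs, 6000)   # 6000 centiseconds per minute
    secs, cs = divmod(rest, 100)
    return f"{minutes:02d}:{secs:02d}.{cs:02d}"


def format_segments(segments: list[dict]) -> list[str]:
    """Build one '[start -> end] text' line per segment."""
    lines = []
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        # Whisper segment text usually starts with a space, so strip it
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines


# Two segments copied from the sample response above
sample_segments = [
    {"start": 0.0, "end": 8.48,
     "text": " It is very exciting that the Olympic Games will be held in Paris in 2024 for France."},
    {"start": 26.64, "end": 34.84,
     "text": " It was an opportunity for us to come together and unite under the interest of sport."},
]

for line in format_segments(sample_segments):
    print(line)
```

The same function can be applied to the `segments` field of a JSON response, since it only relies on the three keys named above.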
To process an audio file with curl, run the following command in your terminal:
curl -X POST "https://whisper-large-v3-turbo.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/audio/transcriptions" \
-H 'accept: application/json' \
-H "Authorization: Bearer $OVH_AI_ENDPOINTS_ACCESS_TOKEN" \
-F "file=@my_audio.mp3" \
-F "temperature=0.7" \
-H 'Content-Type: multipart/form-data'
This returns the following:
{"task":"transcribe","success":true,"language":"en","duration":60.925,"text":" It is very exciting that the Olympic Games will be held in Paris 2024 for France. The Olympic Games have been a fantastic way for England to feel excited about sports and the 2012 Olympics in London were very interesting for the whole nation It was an opportunity for us to come together and unite under the interest of sport The Olympic Games provided a sense of patriotism that I believe it also will do in France this year. It's a chance for everyone to get involved and become interested about different types of sports and to cheer, cheer on their home country.","words":[],"segments":[{"id":1,"seek":0,"start":0.0,"end":8.46,"text":" It is very exciting that the Olympic Games will be held in Paris 2024 for France.","tokens":[50365,467,307,588,4670,300,264,19169,12761,486,312,5167,294,8380,45237,337,6190,13,50799],"temperature":0.7,"avg_logprob":-0.5378082,"compression_ratio":0.97590363,"no_speech_prob":0.0},{"id":2,"seek":846,"start":8.46,"end":17.44,"text":" The Olympic Games have been a fantastic way for England to feel excited about sports","tokens":[50365,440,19169,12761,362,668,257,5456,636,337,8196,281,841,2919,466,6573,50817],"temperature":0.7,"avg_logprob":-0.16149491,"compression_ratio":1.4817073,"no_speech_prob":0.0},{"id":3,"seek":846,"start":17.44,"end":26.06,"text":" and the 2012 Olympics in London were very interesting for the whole nation","tokens":[50817,293,264,9125,19854,294,7042,645,588,1880,337,264,1379,4790,51247],"temperature":0.7,"avg_logprob":-0.16149491,"compression_ratio":1.4817073,"no_speech_prob":0.0},{"id":4,"seek":846,"start":26.06,"end":34.84,"text":" It was an opportunity for us to come together and unite under the interest of 
sport","tokens":[51247,467,390,364,2650,337,505,281,808,1214,293,29320,833,264,1179,295,7282,51685],"temperature":0.7,"avg_logprob":-0.16149491,"compression_ratio":1.4817073,"no_speech_prob":0.0},{"id":5,"seek":3484,"start":35.72,"end":45.18,"text":" The Olympic Games provided a sense of patriotism that I believe it also will do in France this year.","tokens":[50409,440,19169,12761,5649,257,2020,295,44210,1434,300,286,1697,309,611,486,360,294,6190,341,1064,13,50884],"temperature":0.7,"avg_logprob":-0.12831916,"compression_ratio":1.3953488,"no_speech_prob":0.0},{"id":6,"seek":3484,"start":46.26,"end":53.76,"text":" It's a chance for everyone to get involved and become interested about different types of sports","tokens":[50940,467,311,257,2931,337,1518,281,483,3288,293,1813,3102,466,819,3467,295,6573,51313],"temperature":0.7,"avg_logprob":-0.12831916,"compression_ratio":1.3953488,"no_speech_prob":0.0},{"id":7,"seek":3484,"start":53.76,"end":60.26,"text":" and to cheer, cheer on their home country.","tokens":[51313,293,281,12581,11,12581,322,641,1280,1941,13,51638],"temperature":0.7,"avg_logprob":-0.12831916,"compression_ratio":1.3953488,"no_speech_prob":0.0}],"diarization":[],"usage":{"type":"duration","duration":61.0}}
First, install the requests library:
pip install requests
Finally, run the following Python code:
import os
import requests
# Option 1: the model's dedicated URL (uncomment to use it instead)
# url = "https://whisper-large-v3-turbo.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/audio/transcriptions"
# Option 2: the unified endpoint, for easy model switching with optimal OpenAI compatibility
url = "https://oai.endpoints.kepler.ai.cloud.ovh.net/v1/audio/transcriptions"
audio_file_path = "my_audio.mp3"
headers = {
    "accept": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}
# Form fields sent alongside the audio file
data = {
    "model": "whisper-large-v3-turbo",
    "temperature": 0.7,
    "timestamp_granularities": "segment, word",
}
# Open the file in binary mode; the context manager closes it after the request
with open(audio_file_path, "rb") as audio_file:
    response = requests.post(url, headers=headers, files={"file": audio_file}, data=data)
if response.status_code == 200:
    # Handle response
    print(response.json())
else:
    print("Error:", response.status_code, response.text)
This returns the following result, which contains the audio transcription API response:
{'task': 'transcribe', 'success': True, 'language': 'en', 'duration': 60.936, 'text': " It is very exciting that the Olympic Games will be held in Paris in 2024 for France. The Olympic Games have been a fantastic way for England to feel excited about sports and the 2012 Olympics in London were very interesting for the whole nation. It was an opportunity for us to come together and unite under the interest of sport. The Olympic Games provided a sense of patriotism that I believe it also will do in France this year. It's a chance for everyone to get involved and become interested about different types of sports and to cheer, cheer on their home country.", 'words': [{'word': 'It', 'start': 0.0, 'end': 0.84}, {'word': 'is', 'start': 0.84, 'end': 1.02}, {'word': 'very', 'start': 1.02, 'end': 1.3}, {'word': 'exciting', 'start': 1.3, 'end': 1.9}, {'word': 'that', 'start': 1.9, 'end': 2.32}, {'word': 'the', 'start': 2.32, 'end': 2.52}, {'word': 'Olympic', 'start': 2.52, 'end': 2.92}, {'word': 'Games', 'start': 2.92, 'end': 3.36}, {'word': 'will', 'start': 3.36, 'end': 3.68}, {'word': 'be', 'start': 3.68, 'end': 3.86}, {'word': 'held', 'start': 3.86, 'end': 4.3}, {'word': 'in', 'start': 4.3, 'end': 5.06}, {'word': 'Paris', 'start': 5.06, 'end': 5.62}, {'word': 'in', 'start': 5.62, 'end': 6.16}, {'word': '2024', 'start': 6.16, 'end': 7.02}, {'word': 'for', 'start': 7.02, 'end': 7.96}, {'word': 'France.', 'start': 7.96, 'end': 8.48}, {'word': 'The', 'start': 8.48, 'end': 9.96}, {'word': 'Olympic', 'start': 9.96, 'end': 10.32}, {'word': 'Games', 'start': 10.32, 'end': 10.86}, {'word': 'have', 'start': 10.86, 'end': 11.86}, {'word': 'been', 'start': 11.86, 'end': 12.14}, {'word': 'a', 'start': 12.14, 'end': 12.34}, {'word': 'fantastic', 'start': 12.34, 'end': 13.04}, {'word': 'way', 'start': 13.04, 'end': 13.66}, {'word': 'for', 'start': 13.66, 'end': 14.32}, {'word': 'England', 'start': 14.32, 'end': 14.8}, {'word': 'to', 'start': 14.8, 'end': 15.32}, {'word': 'feel', 'start': 
15.32, 'end': 15.6}, {'word': 'excited', 'start': 15.6, 'end': 16.14}, {'word': 'about', 'start': 16.14, 'end': 16.72}, {'word': 'sports', 'start': 16.72, 'end': 17.48}, {'word': 'and', 'start': 17.48, 'end': 18.86}, {'word': 'the', 'start': 18.86, 'end': 19.06}, {'word': '2012', 'start': 19.06, 'end': 19.7}, {'word': 'Olympics', 'start': 19.7, 'end': 20.48}, {'word': 'in', 'start': 20.48, 'end': 21.14}, {'word': 'London', 'start': 21.14, 'end': 21.6}, {'word': 'were', 'start': 21.6, 'end': 22.92}, {'word': 'very', 'start': 22.92, 'end': 23.6}, {'word': 'interesting', 'start': 23.6, 'end': 24.32}, {'word': 'for', 'start': 24.32, 'end': 24.96}, {'word': 'the', 'start': 24.96, 'end': 25.3}, {'word': 'whole', 'start': 25.3, 'end': 25.58}, {'word': 'nation.', 'start': 25.58, 'end': 26.06}, {'word': 'It', 'start': 26.64, 'end': 27.4}, {'word': 'was', 'start': 27.4, 'end': 27.58}, {'word': 'an', 'start': 27.58, 'end': 27.74}, {'word': 'opportunity', 'start': 27.74, 'end': 28.42}, {'word': 'for', 'start': 28.42, 'end': 29.26}, {'word': 'us', 'start': 29.26, 'end': 29.5}, {'word': 'to', 'start': 29.5, 'end': 29.68}, {'word': 'come', 'start': 29.68, 'end': 30.0}, {'word': 'together', 'start': 30.0, 'end': 30.52}, {'word': 'and', 'start': 30.52, 'end': 31.36}, {'word': 'unite', 'start': 31.36, 'end': 31.98}, {'word': 'under', 'start': 31.98, 'end': 33.14}, {'word': 'the', 'start': 33.14, 'end': 33.38}, {'word': 'interest', 'start': 33.38, 'end': 33.98}, {'word': 'of', 'start': 33.98, 'end': 34.32}, {'word': 'sport.', 'start': 34.32, 'end': 34.84}, {'word': 'The', 'start': 35.74, 'end': 36.6}, {'word': 'Olympic', 'start': 36.6, 'end': 37.0}, {'word': 'Games', 'start': 37.0, 'end': 37.52}, {'word': 'provided', 'start': 37.52, 'end': 38.48}, {'word': 'a', 'start': 38.48, 'end': 38.78}, {'word': 'sense', 'start': 38.78, 'end': 39.12}, {'word': 'of', 'start': 39.12, 'end': 39.34}, {'word': 'patriotism', 'start': 39.34, 'end': 40.46}, {'word': 'that', 'start': 40.46, 'end': 
41.18}, {'word': 'I', 'start': 41.18, 'end': 41.36}, {'word': 'believe', 'start': 41.36, 'end': 41.84}, {'word': 'it', 'start': 41.84, 'end': 42.28}, {'word': 'also', 'start': 42.28, 'end': 42.66}, {'word': 'will', 'start': 42.66, 'end': 43.02}, {'word': 'do', 'start': 43.02, 'end': 43.36}, {'word': 'in', 'start': 43.36, 'end': 44.02}, {'word': 'France', 'start': 44.02, 'end': 44.5}, {'word': 'this', 'start': 44.5, 'end': 44.9}, {'word': 'year.', 'start': 44.9, 'end': 45.2}, {'word': "It's", 'start': 46.3, 'end': 47.22}, {'word': 'a', 'start': 47.22, 'end': 47.32}, {'word': 'chance', 'start': 47.32, 'end': 47.68}, {'word': 'for', 'start': 47.68, 'end': 47.9}, {'word': 'everyone', 'start': 47.9, 'end': 48.36}, {'word': 'to', 'start': 48.36, 'end': 48.78}, {'word': 'get', 'start': 48.78, 'end': 49.0}, {'word': 'involved', 'start': 49.0, 'end': 49.58}, {'word': 'and', 'start': 49.58, 'end': 50.4}, {'word': 'become', 'start': 50.4, 'end': 50.76}, {'word': 'interested', 'start': 50.76, 'end': 51.42}, {'word': 'about', 'start': 51.42, 'end': 51.96}, {'word': 'different', 'start': 51.96, 'end': 52.58}, {'word': 'types', 'start': 52.58, 'end': 53.04}, {'word': 'of', 'start': 53.04, 'end': 53.26}, {'word': 'sports', 'start': 53.26, 'end': 53.78}, {'word': 'and', 'start': 53.78, 'end': 54.8}, {'word': 'to', 'start': 54.8, 'end': 55.3}, {'word': 'cheer,', 'start': 55.3, 'end': 57.24}, {'word': 'cheer', 'start': 57.64, 'end': 58.44}, {'word': 'on', 'start': 58.44, 'end': 58.8}, {'word': 'their', 'start': 58.8, 'end': 59.38}, {'word': 'home', 'start': 59.38, 'end': 59.74}, {'word': 'country.', 'start': 59.74, 'end': 60.24}], 'segments': [{'id': 1, 'seek': 0, 'start': 0.0, 'end': 8.48, 'text': ' It is very exciting that the Olympic Games will be held in Paris in 2024 for France.', 'tokens': [50365, 467, 307, 588, 4670, 300, 264, 19169, 12761, 486, 312, 5167, 294, 8380, 294, 45237, 337, 6190, 13, 50875], 'temperature': 0.7, 'avg_logprob': -0.40542248, 'compression_ratio': 
0.9767442, 'no_speech_prob': 0.0}, {'id': 2, 'seek': 848, 'start': 8.48, 'end': 17.48, 'text': ' The Olympic Games have been a fantastic way for England to feel excited about sports', 'tokens': [50365, 440, 19169, 12761, 362, 668, 257, 5456, 636, 337, 8196, 281, 841, 2919, 466, 6573, 50816], 'temperature': 0.7, 'avg_logprob': -0.1908958, 'compression_ratio': 1.4670658, 'no_speech_prob': 0.0}, {'id': 3, 'seek': 848, 'start': 17.48, 'end': 26.06, 'text': ' and the 2012 Olympics in London were very interesting for the whole nation.', 'tokens': [50816, 293, 264, 9125, 19854, 294, 7042, 645, 588, 1880, 337, 264, 1379, 4790, 13, 51246], 'temperature': 0.7, 'avg_logprob': -0.1908958, 'compression_ratio': 1.4670658, 'no_speech_prob': 0.0}, {'id': 4, 'seek': 848, 'start': 26.64, 'end': 34.84, 'text': ' It was an opportunity for us to come together and unite under the interest of sport.', 'tokens': [51246, 467, 390, 364, 2650, 337, 505, 281, 808, 1214, 293, 29320, 833, 264, 1179, 295, 7282, 13, 51685], 'temperature': 0.7, 'avg_logprob': -0.1908958, 'compression_ratio': 1.4670658, 'no_speech_prob': 0.0}, {'id': 5, 'seek': 3484, 'start': 35.74, 'end': 45.2, 'text': ' The Olympic Games provided a sense of patriotism that I believe it also will do in France this year.', 'tokens': [50410, 440, 19169, 12761, 5649, 257, 2020, 295, 44210, 1434, 300, 286, 1697, 309, 611, 486, 360, 294, 6190, 341, 1064, 13, 50885], 'temperature': 0.7, 'avg_logprob': -0.1453783, 'compression_ratio': 1.3953488, 'no_speech_prob': 0.0}, {'id': 6, 'seek': 3484, 'start': 46.3, 'end': 53.78, 'text': " It's a chance for everyone to get involved and become interested about different types of sports", 'tokens': [50943, 467, 311, 257, 2931, 337, 1518, 281, 483, 3288, 293, 1813, 3102, 466, 819, 3467, 295, 6573, 51314], 'temperature': 0.7, 'avg_logprob': -0.1453783, 'compression_ratio': 1.3953488, 'no_speech_prob': 0.0}, {'id': 7, 'seek': 3484, 'start': 53.78, 'end': 60.24, 'text': ' and to cheer, cheer on their 
home country.', 'tokens': [51314, 293, 281, 12581, 11, 12581, 322, 641, 1280, 1941, 13, 51639], 'temperature': 0.7, 'avg_logprob': -0.1453783, 'compression_ratio': 1.3953488, 'no_speech_prob': 0.0}], 'diarization': [], 'usage': {'type': 'duration', 'duration': 61.0}}
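With word-level timestamps, the `words` list in the response above gives a `start` and `end` time for every word, which makes it possible to locate silences between words. The sketch below assumes the words are plain dictionaries with `word`, `start`, and `end` keys, as in the printed result. The helper name `find_pauses` and the 0.5-second threshold are illustrative choices, not part of the API.

```python
def find_pauses(words: list[dict], min_gap: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (previous_word, next_word, gap_in_seconds) for every gap >= min_gap."""
    pauses = []
    # Walk over consecutive word pairs and measure the silence between them
    for previous, current in zip(words, words[1:]):
        gap = current["start"] - previous["end"]
        if gap >= min_gap:
            pauses.append((previous["word"], current["word"], round(gap, 2)))
    return pauses


# A few words copied from the sample response above
sample_words = [
    {"word": "whole", "start": 25.3, "end": 25.58},
    {"word": "nation.", "start": 25.58, "end": 26.06},
    {"word": "It", "start": 26.64, "end": 27.4},
    {"word": "was", "start": 27.4, "end": 27.58},
]

for before, after, gap in find_pauses(sample_words):
    print(f"{gap:.2f}s pause between {before!r} and {after!r}")
```

Pauses found this way can serve as candidate break points when splitting a transcript into subtitle lines or paragraphs.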
When using AI Endpoints, the following rate limits apply:
On top of the request rate limitations, AI Endpoints also applies the following limitations to the audio data:
If you exceed these limits, a 429 error code will be returned.
If you require higher usage, please get in touch with us to discuss increasing your rate limits.
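Since exceeding the limits produces a 429 status code, a client can wait and retry instead of failing immediately. The following is a minimal sketch of exponential backoff with jitter, not an official client feature: `call_with_backoff`, the retry count, and the delay values are all assumptions made for illustration. The `send` argument is any zero-argument callable that returns an object with a `status_code` attribute, for example a lambda wrapping `requests.post(...)`.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    The delay doubles after each 429 (base_delay, 2x, 4x, ...) plus a small
    random jitter so that parallel clients do not retry in lockstep.
    """
    for attempt in range(max_retries + 1):
        response = send()
        # Return on any status other than 429, or when no retries remain
        if response.status_code != 429 or attempt == max_retries:
            return response
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        sleep(delay)


# Stand-in for a real HTTP call: rate-limited twice, then successful
class FakeResponse:
    def __init__(self, status_code):
        self.status_code = status_code


outcomes = iter([429, 429, 200])
result = call_with_backoff(
    lambda: FakeResponse(next(outcomes)),
    base_delay=0.01,  # short delay so the demo finishes quickly
)
print(result.status_code)
```

In real use, `send` would perform the transcription request; because the audio file is read on each attempt, it should be reopened or rewound inside `send` before every retry.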
For more information about the Whisper model's features, please refer to the Hugging Face documentation:
New to AI Endpoints? This guide walks you through everything you need to get an access token, call AI models, and integrate AI APIs into your apps with ease.
Explore what AI Endpoints can do. This guide breaks down current features, future roadmap items, and the platform's core capabilities so you know exactly what to expect.
Running into issues? This guide helps you solve common problems on AI Endpoints, from error codes to unexpected responses. Get quick answers, clear fixes, and helpful tips to keep your projects running smoothly.
Learn how to use OVHcloud AI Endpoints Virtual Models.
Explore the full potential of the ASR API through advanced parameters (diarization, timestamp granularities, chunking strategies, prompts ...) and ready-to-run code examples in Python, cURL, and JavaScript.
Turn hours of audio into sharp, readable summaries! This guide walks you through building an AI-powered audio assistant using ASR and LLMs, perfect for meetings, podcasts, or any voice recordings.
Build a voice-enabled assistant in under 100 lines of code! Learn how to combine ASR, LLM, and TTS endpoints to create an AI that listens, understands, and responds.
Discover how to synthesize realistic dialogues and automatically identify who’s speaking. This notebook shows you how to generate speech and label each speaker, perfect for building smart transcripts.
Bring your synthetic voice to life! Discover how to use Text-To-Speech with emotion mixing: adjusting pitch, tone, pace, and style to express emotions like joy, sadness, or surprise.