# TTS (Text To Speech) API - NvrTtsEnUs

The **Text-to-Speech** (TTS) API endpoint allows you to obtain speech synthesis from raw text.

## Introduction

**Text-to-Speech** (TTS) is a subfield of Artificial Intelligence (AI) that converts written text into spoken words. This TTS API operates as a two-stage pipeline: a first model generates a mel spectrogram, then a second model uses this mel spectrogram to generate speech.

This speech synthesis system enables you to synthesize natural speech from raw transcriptions without any additional information. AI Endpoints makes it easy, with ready-to-use inference APIs. Discover how to use them:

## Model concept and configuration

These TTS models were developed by NVIDIA. The TTS AI Endpoint takes text as input and returns an audio stream or audio buffer, along with optional metadata.

**Model configuration:**

- **Transcription mode**: offline
- **Language support**: en-US, es-ES, de-DE, it-IT (choose the corresponding endpoint, e.g. *nvr-tts-en-us*)
- **Input type**: raw text
- **Voice name**: this parameter specifies the voice to use for speech synthesis, allowing selection of speaker gender and emotional style. Available options vary depending on the model language: some languages offer both male and female voices, while others have only one gender available, and emotional variations (such as neutral, calm, or happy) are limited to certain languages. Voices are prefixed with the language code (e.g. English-US). The suffix "-1" (e.g. Female-1 or Male-1) indicates the base voice, which has natural characteristics such as timbre and accent, without any specific emotional modification.
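The naming scheme above can be sketched as a small helper. The voice names and language codes come from this page; the `pick_voice` function itself is purely illustrative and not part of the API:

```python
# Base ("-1") voices per language code, as listed in this document.
# Emotional variants (Calm, Happy, ...) exist for en-US only.
BASE_VOICES = {
    "en-US": ["English-US.Female-1", "English-US.Male-1"],
    "es-ES": ["Spanish-ES-Female-1", "Spanish-ES-Male-1"],
    "de-DE": ["German-DE-Male-1"],
    "it-IT": ["Italian-IT-Female-1", "Italian-IT-Male-1"],
}

def pick_voice(language_code: str, gender: str = "Female") -> str:
    """Return a base voice for the given language.

    Falls back to the first available voice when the requested
    gender does not exist for that language (e.g. de-DE only
    offers a male voice).
    """
    voices = BASE_VOICES[language_code]
    for voice in voices:
        if gender in voice:
            return voice
    return voices[0]
```

The returned string can be passed directly as the `voice_name` field of a request payload.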
  The available voice names are: English-US.Female-1, English-US.Male-1, English-US.Female-Calm, English-US.Female-Neutral, English-US.Female-Happy, English-US.Female-Angry, English-US.Female-Fearful, English-US.Female-Sad, English-US.Male-Calm, English-US.Male-Neutral, English-US.Male-Happy, English-US.Male-Angry, Spanish-ES-Female-1, Spanish-ES-Male-1, German-DE-Male-1, Italian-IT-Female-1, Italian-IT-Male-1.
- **Sample rate**: typically 22,050 Hz or 44,100 Hz

## How to?

The **TTS** endpoint offers you a wide range of transcription options. Learn how to use them with the following example:

### With a simple HTTP client (requests)

First, install the *requests* library:

```bash
pip install requests
```

Next, export your access token to the *OVH_AI_ENDPOINTS_ACCESS_TOKEN* environment variable:

```bash
export OVH_AI_ENDPOINTS_ACCESS_TOKEN=
```

*If you do not have an access token key yet, follow the instructions in the [AI Endpoints – Getting Started](https://help.ovhcloud.com/csm/en-gb-public-cloud-ai-endpoints-getting-started?id=kb_article_view&sysparm_article=KB0065401).*

Finally, run the following Python code:

```python
import os

import requests

url = "https://nvr-tts-en-us.endpoints.kepler.ai.cloud.ovh.net/api/v1/tts/text_to_audio"

headers = {
    "accept": "application/octet-stream",
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.getenv('OVH_AI_ENDPOINTS_ACCESS_TOKEN')}",
}

data = {
    "encoding": 1,
    "language_code": "en-US",
    "sample_rate_hz": 16000,
    "text": "We provide a set of managed tools designed for building your Machine Learning projects: AI Notebooks, AI Training, AI Deploy and AI Endpoints.",
    "voice_name": "English-US.Female-1"
}

response = requests.post(url, headers=headers, json=data)

if response.status_code == 200:
    # Save the audio content to a file
    with open("output_audio.wav", "wb") as audio_file:
        audio_file.write(response.content)
    print("Audio file saved as output_audio.wav")
else:
    print("Error:", response.status_code, response.text)
```

Returning the following result:

```
Audio file saved as output_audio.wav
```

You are now able to play and use your generated audio file.

### With the gRPC RIVA client

Install the RIVA client and audio libraries:

```bash
pip install nvidia-riva-client numpy
```

This basic example returns the audio speech generated by the model:

```python
import numpy as np
import IPython.display as ipd
import riva.client

# Connect to the RIVA TTS server
tts_service = riva.client.SpeechSynthesisService(
    riva.client.Auth(
        uri="nvr-tts-en-us.endpoints-grpc.kepler.ai.cloud.ovh.net:443",
        use_ssl=True,
    )
)

# Set up the request configuration
sample_rate_hz = 44100
req = {
    "language_code": "en-US",  # choose the corresponding language in the list: en-US / es-ES / de-DE / it-IT
    "encoding": riva.client.AudioEncoding.LINEAR_PCM,
    "sample_rate_hz": sample_rate_hz,  # sample rate: 44.1 kHz audio
    "voice_name": "English-US.Female-1"
    # voices: `English-US.Female-1`, `English-US.Male-1`,
    # `English-US.Female-Calm`, `English-US.Female-Neutral`,
    # `English-US.Female-Happy`, `English-US.Female-Angry`,
    # `English-US.Female-Fearful`, `English-US.Female-Sad`,
    # `English-US.Male-Calm`, `English-US.Male-Neutral`,
    # `English-US.Male-Happy`, `English-US.Male-Angry`,
    # `Spanish-ES-Female-1`, `Spanish-ES-Male-1`,
    # `German-DE-Male-1`, `Italian-IT-Female-1`,
    # `Italian-IT-Male-1`
}

# Input text
req["text"] = "We provide a set of managed tools designed for building your Machine Learning projects: AI Notebooks, AI Training, AI Deploy and AI Endpoints."

# Run synthesis and decode the returned audio buffer
response = tts_service.synthesize(**req)
audio_samples = np.frombuffer(response.audio, dtype=np.int16)

# Play the output audio
ipd.Audio(audio_samples, rate=sample_rate_hz)
```

## Model rate limit

When using AI Endpoints, the **following rate limits apply**:

- **Anonymous**: 2 requests per minute, per IP and per model.
- **Authenticated with an API access key**: 400 requests per minute, per Public Cloud project and per model.

If you exceed this limit, a **429 error code** will be returned. If you require higher usage, please **[get in touch with us](https://help.ovhcloud.com/csm?id=csm_get_help)** to discuss increasing your rate limits.

## References

For more information about the TTS model features, please refer to the RIVA TTS [documentation](https://docs.nvidia.com/deeplearning/riva/user-guide/docs/tts/tts-overview.html).

## Going Further

For a broader overview of AI Endpoints, explore the full [AI Endpoints Documentation](https://help.ovhcloud.com/csm/en-gb-documentation-public-cloud-ai-and-machine-learning-ai-endpoints?id=kb_browse_cat&kb_id=574a8325551974502d4c6e78b7421938&kb_category=ea1d6daa918a1a541e11d3d71f8624aa).

Reach out to our support team or join the *#ai-endpoints* channel on the [OVHcloud Discord](https://discord.gg/ovhcloud) to share your questions, feedback, and suggestions with the team and the community.
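When a **429 error code** is returned, a client can wait and retry rather than fail outright. Below is a minimal, library-agnostic sketch of exponential backoff; the `call_with_backoff` helper and its parameters are illustrative, not part of the API:

```python
import time

def call_with_backoff(send, max_retries: int = 3, base_delay: float = 1.0):
    """Call send() until it stops returning HTTP 429.

    send() must return a (status_code, body) tuple, e.g. a small
    wrapper around the requests.post(...) call from the HTTP example.
    Waits base_delay * 2**attempt seconds between attempts
    (1 s, 2 s, 4 s, ... with the defaults).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    # Still rate-limited after all retries; return the last response
    return status, body
```

To use it, wrap the `requests.post` call from the HTTP example in a function that returns `(response.status_code, response.content)` and pass that function as `send`.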