Text-to-Speech APIs
Several libraries and services are available in Python for converting text to speech. Here are a few examples:
- gTTS (Google Text-to-Speech) is a Python library and command-line tool that interfaces with Google Translate's text-to-speech API to convert text to speech. It requires an internet connection.
    from gtts import gTTS
    import os

    # The text to convert to speech
    text = "Hello, this is an example of text to speech conversion."

    # Create a gTTS object
    tts = gTTS(text, lang='en')

    # Save the speech to a file
    tts.save("speech.mp3")

    # Play the speech (requires the mpg321 command-line player)
    os.system("mpg321 speech.mp3")
- pyttsx3 is a Python library that uses the native text-to-speech engine of the operating system to convert text to speech.
    import pyttsx3

    # Create a text-to-speech engine
    engine = pyttsx3.init()

    # The text to convert to speech
    text = "Hello, this is an example of text to speech conversion."

    # Queue the text, then block until it has been spoken
    engine.say(text)
    engine.runAndWait()
- Amazon Polly is a service provided by Amazon Web Services (AWS) that uses deep learning to convert text to speech. To use Amazon Polly, you need an AWS account, configured AWS credentials, and the AWS SDK for Python (boto3).
    import boto3

    # Create a Polly client (uses your configured AWS credentials and region)
    polly = boto3.client('polly')

    # The text to convert to speech
    text = "Hello, this is an example of text to speech conversion."

    # Convert the text to speech
    response = polly.synthesize_speech(
        Text=text,
        VoiceId='Joanna',
        OutputFormat='mp3'
    )

    # Save the speech to a file
    with open('speech.mp3', 'wb') as f:
        f.write(response['AudioStream'].read())
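- OpenAI provides a text-to-speech endpoint through its API. To use it, you need an OpenAI API key and the openai Python package.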
    from openai import OpenAI

    # Create a client with your OpenAI API key (assumes openai>=1.0)
    client = OpenAI(api_key="YOUR_API_KEY")

    # The text to convert to speech
    text = "Hello, this is an example of text to speech conversion."

    # Generate speech from the text and stream it to a file.
    # "tts-1" and "alloy" are one available model and voice; others exist.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1",
        voice="alloy",
        input=text,
    ) as response:
        response.stream_to_file("speech.mp3")
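The gTTS example above plays its MP3 by passing a command string to `os.system`, which fails quietly when `mpg321` is not installed. A minimal sketch of a slightly more defensive approach follows, using only the standard library. The helper names (`find_player`, `player_command`, `play`) and the list of candidate players are assumptions made for illustration and are not part of any of the libraries above.

```python
import shutil
import subprocess

# Candidate command-line players and the extra arguments each needs to play
# one file and then exit. This list is an assumption; adjust it for your system.
PLAYER_ARGS = {
    "mpg321": [],
    "mpg123": [],
    "ffplay": ["-nodisp", "-autoexit", "-loglevel", "quiet"],
}


def find_player(candidates=tuple(PLAYER_ARGS)):
    """Return the first candidate found on PATH, or None if none is installed."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def player_command(player, path):
    """Build the argument list for playing `path` with `player`."""
    return [player, *PLAYER_ARGS.get(player, []), path]


def play(path):
    """Play an audio file, raising a clear error if no player is available."""
    player = find_player()
    if player is None:
        raise RuntimeError("No supported audio player found on PATH")
    # Passing a list of arguments avoids shell quoting problems with file names.
    subprocess.run(player_command(player, path), check=True)
```

Calling `play("speech.mp3")` after any of the examples above would then either play the file or raise an error that names the problem.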
Other cloud services are available as well:
- IBM Watson Text-to-Speech: This API uses IBM's deep learning technology to convert text to speech. It offers a wide range of voices and languages, and a free tier with usage limits is available with an IBM Cloud account.
    from ibm_watson import TextToSpeechV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # Set up the IBM Watson TTS service
    authenticator = IAMAuthenticator('<your_api_key>')
    tts = TextToSpeechV1(authenticator=authenticator)
    tts.set_service_url('<your_service_url>')

    # Define the text to convert to speech
    text = 'Hello, my name is Watson. I can convert text to speech.'

    # Configure the voice and format of the audio
    voice = 'en-US_AllisonVoice'
    audio_format = 'audio/mp3'

    # Convert the text to speech
    response = tts.synthesize(text, voice=voice, accept=audio_format).get_result()

    # Save the audio to a file
    with open('output.mp3', 'wb') as audio_file:
        audio_file.write(response.content)
- Microsoft Azure Cognitive Services Text-to-Speech: This API uses Microsoft’s advanced deep learning technology to convert text to speech. It offers a wide range of voices and languages, and the API can be accessed for free within certain usage limits.
    import requests
    from xml.sax.saxutils import escape

    # Set up the Azure Speech service
    subscription_key = '<your_subscription_key>'
    # Typically https://<region>.tts.speech.microsoft.com/cognitiveservices/v1
    endpoint = '<your_endpoint>'

    # Define the text to convert to speech
    text = 'Hello, my name is Azure. I can convert text to speech.'

    # Configure the voice and format of the audio
    # (voice names change over time; check the current voice list)
    voice_name = 'en-US-JennyNeural'
    audio_format = 'audio-16khz-128kbitrate-mono-mp3'

    # Build the SSML request body, escaping any XML special characters in the text
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice_name}'>{escape(text)}</voice>"
        "</speak>"
    )

    # Convert the text to speech, authenticating with the subscription key header
    response = requests.post(endpoint, headers={
        'Content-Type': 'application/ssml+xml',
        'X-Microsoft-OutputFormat': audio_format,
        'Ocp-Apim-Subscription-Key': subscription_key,
        'User-Agent': 'TTSForPython'
    }, data=ssml.encode('utf-8'))
    response.raise_for_status()

    # Save the audio to a file
    with open('output.mp3', 'wb') as audio_file:
        audio_file.write(response.content)
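The Azure request above sends its text wrapped in SSML, and text containing characters such as `&` or `<` will produce invalid XML if it is pasted into the markup directly. The following is a minimal sketch of a helper that builds the same `speak`/`voice` structure with `xml.etree.ElementTree`, which escapes the text automatically. The function name `build_ssml` and its parameters are assumptions made for illustration.

```python
from xml.etree import ElementTree as ET

# Namespace URI behind the reserved "xml:" prefix (used for xml:lang).
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_ssml(text, voice_name, lang="en-US"):
    """Return an SSML document that speaks `text` with the given voice."""
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    # Assigning to .text lets ElementTree escape &, < and > on serialization.
    voice.text = text
    return ET.tostring(speak, encoding="unicode")
```

The returned string can be encoded as UTF-8 and passed as the `data` argument of the `requests.post` call.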
Most text-to-speech libraries and services support multiple languages and voices, so you can select the voice and language that best suit your needs. Some services, such as Amazon Polly, provide a variety of natural-sounding voices, each with its own characteristics, such as gender and accent.
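As a rough illustration of choosing a voice by language and gender, the sketch below filters a list of voice records. The records follow the shape of the entries under the `Voices` key returned by Polly's `describe_voices` call (`Id`, `Gender`, `LanguageCode`). The hard-coded list is a small hand-picked sample, and `pick_voice` is a hypothetical helper, not part of boto3.

```python
# Sample voice records in the shape returned by Polly's describe_voices().
# This is a small illustrative subset, not the full list of voices.
SAMPLE_VOICES = [
    {"Id": "Joanna", "Gender": "Female", "LanguageCode": "en-US"},
    {"Id": "Matthew", "Gender": "Male", "LanguageCode": "en-US"},
    {"Id": "Amy", "Gender": "Female", "LanguageCode": "en-GB"},
    {"Id": "Hans", "Gender": "Male", "LanguageCode": "de-DE"},
]


def pick_voice(voices, language_code, gender=None):
    """Return the Id of the first voice matching the language (and gender), or None."""
    for voice in voices:
        if voice["LanguageCode"] != language_code:
            continue
        if gender is not None and voice["Gender"] != gender:
            continue
        return voice["Id"]
    return None


print(pick_voice(SAMPLE_VOICES, "en-US", gender="Male"))
```

With a live client, the sample list could be replaced by `polly.describe_voices(LanguageCode="en-US")["Voices"]`, and the result passed as `VoiceId` to `synthesize_speech`.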
Additionally, pre-trained open-source text-to-speech models are available, such as Tacotron 2 and Transformer TTS, often trained on a public dataset such as LJSpeech.