Text-to-Speech APIs

Several libraries and services are available in Python for converting text to speech. Here are a few examples:

  • gTTS (Google Text-to-Speech) is a Python library that converts text to speech using Google Translate’s text-to-speech API, so it needs an internet connection.
from gtts import gTTS
import os

# The text to convert to speech
text = "Hello, this is an example of text to speech conversion."

# Create a gTTS object
tts = gTTS(text, lang='en')

# Save the speech to a file
tts.save("speech.mp3")

# Play the speech (requires the mpg321 command-line player to be installed)
os.system("mpg321 speech.mp3")
  • pyttsx3 is a Python library that uses the operating system’s native text-to-speech engine (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on other platforms), so it works offline.
import pyttsx3

# Create a text-to-speech engine
engine = pyttsx3.init()

# The text to convert to speech
text = "Hello, this is an example of text to speech conversion."

# Speak the text
engine.say(text)
engine.runAndWait()
  • Amazon Polly is a service provided by Amazon Web Services (AWS) that uses deep learning to convert text to speech. To use Amazon Polly, you need an AWS account with configured credentials and the AWS SDK for Python (boto3):
import boto3

# Create a Polly client (credentials come from your AWS configuration; use your own region)
polly = boto3.client('polly', region_name='us-east-1')

# The text to convert to speech
text = "Hello, this is an example of text to speech conversion."

# Convert the text to speech
response = polly.synthesize_speech(
    Text=text,
    VoiceId='Joanna',
    OutputFormat='mp3'
)

# Save the speech to a file
with open('speech.mp3', 'wb') as f:
    f.write(response['AudioStream'].read())
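  • OpenAI: the OpenAI API provides text-to-speech models (such as tts-1) through the audio speech endpoint of the openai Python package (version 1.0 or later). You need an OpenAI API key.
from openai import OpenAI

# Create a client with your OpenAI API key
# (omit api_key to read it from the OPENAI_API_KEY environment variable)
client = OpenAI(api_key="YOUR_API_KEY")

# The text to convert to speech
text = "Hello, this is an example of text to speech conversion."

# Generate speech from the text with a TTS model and one of its built-in voices
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=text
)

# Save the speech to a file (MP3 is the default output format)
with open("speech.mp3", "wb") as f:
    f.write(response.content)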
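The gTTS example plays its file with `os.system("mpg321 speech.mp3")`, which only works where the `mpg321` player is installed. The following is a minimal, cross-platform sketch for playing any of the MP3 files saved above. It assumes `afplay` on macOS, the default media application on Windows (opened via `start`), and one of `mpg321`, `mpg123` or `ffplay` elsewhere; the helper names `playback_command` and `play` are hypothetical.

```python
import shutil
import subprocess
import sys


def playback_command(path, platform=sys.platform, which=shutil.which):
    """Return a command list that plays an MP3 file, or None if no player is found.

    Hypothetical helper: the player choices are assumptions, not requirements
    of any TTS library.
    """
    if platform == "darwin":
        # macOS ships with the afplay command-line player
        return ["afplay", path]
    if platform.startswith("win"):
        # Open the file with the default Windows media application
        return ["cmd", "/c", "start", "", path]
    # Elsewhere (e.g. Linux), use the first common command-line player on PATH
    for player in ("mpg321", "mpg123", "ffplay"):
        if which(player):
            # ffplay opens a window and keeps running unless told otherwise
            extra = ["-nodisp", "-autoexit"] if player == "ffplay" else []
            return [player, *extra, path]
    return None


def play(path):
    """Play an MP3 file with a platform-appropriate player (hypothetical helper)."""
    cmd = playback_command(path)
    if cmd is None:
        raise RuntimeError("No MP3 player found; install mpg321, mpg123 or ffplay")
    subprocess.run(cmd, check=True)
```

For example, call `play("speech.mp3")` after `tts.save("speech.mp3")`. Passing the command to `subprocess.run` as a list, rather than as a single shell string, avoids quoting problems with file names that contain spaces.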

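Cloud services cap how much text a single request may contain; Amazon Polly’s `synthesize_speech`, for example, rejects input that exceeds its documented character quota. A common workaround is to split long text at sentence boundaries and synthesize each piece separately. Below is a minimal sketch of that idea; the helper name `chunk_text` is hypothetical, and the 3000-character default is an assumption to check against the current quota of the service you use.

```python
import re


def chunk_text(text, max_chars=3000):
    """Split text into pieces of at most max_chars, preferring sentence boundaries.

    Hypothetical helper; the 3000 default is an assumed per-request limit.
    """
    # Split after ., ! or ? when followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Add the (remaining) sentence to the current chunk, or start a new chunk
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in its own `polly.synthesize_speech(Text=chunk, VoiceId='Joanna', OutputFormat='mp3')` call and the audio written to numbered files such as `speech_0.mp3`, `speech_1.mp3`. Splitting at sentence ends keeps the pauses between requests where a listener expects them.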
There are other services as well:

  • IBM Watson Text-to-Speech: This API uses IBM’s deep learning technology to convert text to speech. It offers a wide range of voices and languages, and limited free usage is available with an IBM Cloud account.
from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Set up the IBM Watson TTS service
authenticator = IAMAuthenticator('<your_api_key>')
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url('<your_service_url>')

# Define the text to convert to speech
text = 'Hello, my name is Watson. I can convert text to speech.'

# Configure the voice and format of the audio
voice = 'en-US_AllisonV3Voice'
audio_format = 'audio/mp3'

# Convert the text to speech
response = tts.synthesize(text, voice=voice, accept=audio_format).get_result()

# Save the audio to a file
with open('output.mp3', 'wb') as audio_file:
    audio_file.write(response.content)

  • Microsoft Azure Cognitive Services Text-to-Speech: This API uses Microsoft’s advanced deep learning technology to convert text to speech. It offers a wide range of voices and languages, and the API can be accessed for free within certain usage limits.
import requests
from xml.sax.saxutils import escape, quoteattr

# Set up the Azure Cognitive Services (Speech) TTS service
subscription_key = '<your_subscription_key>'
region = '<your_region>'  # the region of your Speech resource, e.g. 'eastus'
endpoint = f'https://{region}.tts.speech.microsoft.com/cognitiveservices/v1'

# Define the text to convert to speech
text = 'Hello, my name is Azure. I can convert text to speech.'

# Configure the voice and format of the audio
voice_name = 'en-US-JennyNeural'
audio_format = 'audio-16khz-128kbitrate-mono-mp3'

# Build the SSML request body; escape() and quoteattr() protect characters such as & and <
ssml = (
    "<speak version='1.0' xml:lang='en-US'>"
    f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
    "</speak>"
)

# Convert the text to speech, authenticating with the resource key
response = requests.post(endpoint, headers={
    'Ocp-Apim-Subscription-Key': subscription_key,
    'Content-Type': 'application/ssml+xml',
    'X-Microsoft-OutputFormat': audio_format,
    'User-Agent': 'TTSForPython'
}, data=ssml.encode('utf-8'))
response.raise_for_status()

# Save the audio to a file
with open('output.mp3', 'wb') as audio_file:
    audio_file.write(response.content)
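The Azure request body is SSML (Speech Synthesis Markup Language). Building SSML by pasting strings together breaks as soon as the text contains characters such as `&` or `<`, so a safer approach is to build the document with `xml.etree.ElementTree`, which escapes text automatically. The sketch below also shows the standard SSML `<prosody>` element for adjusting speaking rate and pitch. The helper name `build_ssml` is hypothetical, and the exact `rate` and `pitch` values a service accepts (for example `slow` or `+10%`) are assumptions to check in that service’s SSML documentation.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the standard xml:lang attribute
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(text, voice_name, lang="en-US", rate=None, pitch=None):
    """Return an SSML string that speaks `text` with one voice (hypothetical helper)."""
    speak = ET.Element("speak", {"version": "1.0", XML_LANG: lang})
    voice = ET.SubElement(speak, "voice", {"name": voice_name})

    # Wrap the text in <prosody> only when a rate or pitch is requested
    prosody_attrs = {key: value for key, value in (("rate", rate), ("pitch", pitch)) if value}
    target = ET.SubElement(voice, "prosody", prosody_attrs) if prosody_attrs else voice

    # Assigning .text lets ElementTree escape &, < and > on serialization
    target.text = text
    return ET.tostring(speak, encoding="unicode")
```

For example, `build_ssml('Tom & Jerry', 'en-US-JennyNeural', rate='slow')` produces a body that can be sent as `data=ssml.encode('utf-8')` in the `requests.post` call above.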

Most of these libraries and services support multiple languages and voices, so you can choose the language and voice that best suit your needs. Some services, such as Amazon Polly, offer a variety of natural-sounding voices, each with its own characteristics, such as gender and accent.
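With Amazon Polly, the available voices and their attributes can be listed with boto3’s `describe_voices` call, whose response contains a `Voices` list with fields such as `Id`, `Gender`, `LanguageCode` and `SupportedEngines`. The sketch below filters such a list to pick a voice by language (accent) and gender. The helper `pick_voices` and the sample entries are hypothetical, and a live call may return paginated results (`NextToken`) that this sketch ignores.

```python
def pick_voices(voices, language_code=None, gender=None, engine=None):
    """Return the Ids of voices matching every filter that is given.

    `voices` is assumed to be shaped like the 'Voices' list returned by
    boto3's polly.describe_voices().
    """
    matches = []
    for voice in voices:
        if language_code and voice.get("LanguageCode") != language_code:
            continue
        if gender and voice.get("Gender") != gender:
            continue
        if engine and engine not in voice.get("SupportedEngines", []):
            continue
        matches.append(voice["Id"])
    return matches


# Illustrative entries in the describe_voices() shape (not a live response)
sample_voices = [
    {"Id": "Joanna", "Gender": "Female", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Matthew", "Gender": "Male", "LanguageCode": "en-US", "SupportedEngines": ["neural", "standard"]},
    {"Id": "Amy", "Gender": "Female", "LanguageCode": "en-GB", "SupportedEngines": ["neural", "standard"]},
]
```

Against a real account, the list would come from something like `boto3.client('polly').describe_voices(LanguageCode='en-GB')['Voices']`, and the chosen Id is what you pass as `VoiceId` to `synthesize_speech`.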

You can also find pre-trained open-source TTS models, based on architectures such as Tacotron 2 or Transformer TTS, that have been trained on public datasets such as LJSpeech.