Amazon Polly Features

Simple-to-Use API

Amazon Polly provides an API that lets you quickly integrate speech synthesis into your application. You send the text you want converted to speech to the Amazon Polly API, and Amazon Polly returns the audio stream immediately, so your application can play it directly or store it in a standard audio file format such as MP3.

Sample text: "Hi. My name is Joanna."

Sample code:

    from boto3 import client

    # Create a Polly client in the chosen region.
    polly = client("polly", region_name="us-east-1")

    # Request MP3 audio for the text, spoken by the Joanna voice.
    response = polly.synthesize_speech(
        Text="Hi. My name is Joanna.",
        OutputFormat="mp3",
        VoiceId="Joanna")

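The response carries the audio in its AudioStream field, so storing it as an MP3 file takes only a few more lines. The following is a minimal sketch, assuming configured credentials; the helper name synthesize_to_file and the output path are illustrative and not part of the Amazon Polly API.

```python
from contextlib import closing


def synthesize_to_file(polly, text, path, voice_id="Joanna"):
    """Synthesize `text` and write the returned MP3 bytes to `path`.

    `polly` is a boto3 Polly client (or any object exposing the same
    synthesize_speech method). The helper name and signature are illustrative.
    Returns the number of bytes written.
    """
    response = polly.synthesize_speech(
        Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    # AudioStream is a file-like streaming body; close it once it is drained.
    with closing(response["AudioStream"]) as stream, open(path, "wb") as out:
        data = stream.read()
        out.write(data)
    return len(data)


# Usage (requires boto3 and valid credentials):
#   from boto3 import client
#   polly = client("polly", region_name="us-east-1")
#   synthesize_to_file(polly, "Hi. My name is Joanna.", "joanna.mp3")
```

Passing the client in as an argument keeps the helper easy to exercise with a stand-in object when no network access is available.
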
Wide Selection of Voices and Languages

Amazon Polly includes dozens of lifelike voices and support for a variety of languages, so you can select the ideal voice and distribute your speech-enabled applications in many countries.

Language	Female	Male
Australian English	Nicole	Russell
Brazilian Portuguese	Vitória	Ricardo
Canadian French	Chantal
Danish	Naja	Mads
Dutch	Lotte	Ruben
French	Léa, Céline	Mathieu
German	Vicki, Marlene	Hans
Hindi	Aditi
Icelandic	Dóra	Karl
Indian English	Raveena, Aditi
Italian	Carla	Giorgio
Japanese	Mizuki	Takumi
Korean	Seoyeon
Mandarin Chinese	Zhiyu
Norwegian	Liv
Polish	Ewa, Maja	Jacek, Jan
Portuguese - Iberic	Inês	Cristiano
Romanian	Carmen
Russian	Tatyana	Maxim
Spanish - Castilian	Conchita	Enrique
Swedish	Astrid
Turkish	Filiz
UK English	Amy, Emma	Brian
US English	Joanna, Salli, Kendra, Kimberly, Ivy	Matthew, Justin, Joey
US Spanish	Penélope	Miguel
Welsh	Gwyneth
Welsh English		Geraint
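
The set of available voices can also be queried from the service itself through the DescribeVoices operation, which is useful because a static table can fall out of date. The sketch below groups voice names by language and gender; the function name voices_by_language is illustrative, and the code assumes a boto3 Polly client is passed in.

```python
from collections import defaultdict


def voices_by_language(polly):
    """Group voice names by (language name, gender) using DescribeVoices.

    `polly` is a boto3 Polly client. The function name is illustrative.
    """
    grouped = defaultdict(list)
    kwargs = {}
    while True:
        page = polly.describe_voices(**kwargs)
        for voice in page["Voices"]:
            key = (voice["LanguageName"], voice["Gender"])
            grouped[key].append(voice["Name"])
        # Results may be paginated; follow NextToken until it is absent.
        token = page.get("NextToken")
        if not token:
            break
        kwargs = {"NextToken": token}
    return dict(grouped)


# Usage (requires boto3 and valid credentials):
#   from boto3 import client
#   polly = client("polly", region_name="us-east-1")
#   for (language, gender), names in sorted(voices_by_language(polly).items()):
#       print(language, gender, ", ".join(names))
```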

Synchronize Speech for an Enhanced Visual Experience

Amazon Polly makes it easy to request an additional stream of metadata that provides information about when particular sentences, words and sounds are being pronounced. Using this metadata stream alongside the synthesized speech audio stream, you can now build your applications with an enhanced visual experience, such as speech-synchronized facial animation or karaoke-style word highlighting.

Please visit the documentation to learn more about how to use Speech Marks.
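
As a rough illustration of how the metadata stream can be consumed: when speech marks are requested (OutputFormat set to "json" together with a SpeechMarkTypes list), the returned stream contains one JSON object per line, each with fields such as time, type, and value. The sketch below parses that stream and extracts word timings for karaoke-style highlighting. The helper names are illustrative.

```python
import json


def parse_speech_marks(raw):
    """Parse line-delimited JSON speech marks into a list of dicts.

    `raw` is the body of the returned stream, as bytes or str.
    """
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def word_timings(marks):
    """Return (time offset in milliseconds, word) pairs for word-level marks."""
    return [(m["time"], m["value"]) for m in marks if m["type"] == "word"]


# Usage (requires boto3 and valid credentials):
#   response = polly.synthesize_speech(
#       Text="Hi. My name is Joanna.",
#       OutputFormat="json",
#       VoiceId="Joanna",
#       SpeechMarkTypes=["sentence", "word"])
#   marks = parse_speech_marks(response["AudioStream"].read())
#   for offset_ms, word in word_timings(marks):
#       print(offset_ms, word)
```

The audio itself is requested in a separate call with an audio output format; the application then highlights each word when playback reaches its time offset.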

Optimize Your Streaming Audio

Amazon Polly returns audio as a stream, so your application can begin playback in near real time. You can also choose among several sampling rates to trade bandwidth against audio quality. Amazon Polly supports MP3, Ogg Vorbis, and raw PCM audio stream formats.

Sampling rate	MP3 size	OGG size	PCM size
22.05 kHz	19.02 kB	19.14 kB	N/A
16.00 kHz	16.04 kB	16.35 kB	99.53 kB
8.00 kHz	13.26 kB	10.40 kB	49.76 kB
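
The two PCM figures in the table differ by a factor of two because raw PCM is uncompressed: its size grows linearly with the sampling rate. The sketch below shows that arithmetic and wraps headerless PCM in a WAV container so that ordinary players can open it. It assumes the PCM layout described in the Amazon Polly documentation (signed 16-bit, mono, little-endian); the function names are illustrative.

```python
import wave

BYTES_PER_SAMPLE = 2  # assumed layout: signed 16-bit, mono, little-endian


def pcm_size_bytes(sample_rate_hz, duration_s):
    """Size of mono 16-bit PCM audio for a given rate and duration."""
    return int(sample_rate_hz * BYTES_PER_SAMPLE * duration_s)


def pcm_to_wav(pcm_bytes, sample_rate_hz, path):
    """Wrap raw PCM bytes in a WAV header and write them to `path`."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm_bytes)


# Usage (requires boto3 and valid credentials); SampleRate is passed as a string:
#   response = polly.synthesize_speech(
#       Text="Hi. My name is Joanna.",
#       OutputFormat="pcm",
#       SampleRate="16000",
#       VoiceId="Joanna")
#   pcm_to_wav(response["AudioStream"].read(), 16000, "joanna.wav")
```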

Adjust Speech Rate, Pitch, and Loudness

Amazon Polly supports Speech Synthesis Markup Language (SSML), a W3C standard, XML-based markup language for speech synthesis applications, and supports common SSML tags for phrasing, emphasis, and intonation. This flexibility helps you create lifelike speech that will attract and hold the attention of your audience.

To learn more, visit the Amazon Polly documentation on SSML tags.

Sample	SSML
This is how I speak normally.	(none)
I can speak in a higher pitched voice, or I can speak in a lower pitched voice.	<speak>I can speak in a <prosody pitch="high">higher pitched voice</prosody>, or I can speak <prosody pitch="low">in a lower pitched voice</prosody></speak>
I can speak really slowly, or I can speak really fast.	<speak>I can speak <prosody rate="x-slow">really slowly</prosody>, or I can speak <prosody rate="x-fast">really fast</prosody></speak>
I can also speak very loudly, or I can speak very quietly.	<speak>I can also speak <prosody volume="x-loud">very loudly</prosody>, or I can speak <prosody volume="x-soft">very quietly</prosody>.</speak>
I can whisper.	<speak>I have a secret to tell you, I will whisper it to you.<amazon:effect name="whispered"><prosody rate="x-slow"> <prosody volume="loud">I am not human.</prosody></prosody></amazon:effect>Can you believe it?</speak>
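
SSML like the samples above can be written by hand, but building it programmatically helps keep the markup well-formed when the text comes from user input. The following is a minimal sketch that assembles a prosody example and escapes special characters; the helper names prosody and speak are illustrative.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody(text, **attrs):
    """Wrap escaped text in a <prosody> tag with the given attributes."""
    attr_str = "".join(
        f" {name}={quoteattr(value)}" for name, value in attrs.items()
    )
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


def speak(*fragments):
    """Join already-escaped SSML fragments inside a <speak> root element."""
    return "<speak>" + "".join(fragments) + "</speak>"


ssml = speak(
    escape("I can speak "),
    prosody("really slowly", rate="x-slow"),
    escape(", or I can speak "),
    prosody("really fast", rate="x-fast"),
)

# Usage (requires boto3 and valid credentials); TextType marks the input as SSML:
#   response = polly.synthesize_speech(
#       Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Joanna")
```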

Adjust the Maximum Duration of Speech

With a feature called time-driven prosody, Amazon Polly automatically adjusts the speech rate so that the speech fits within a maximum duration you define. This is useful in many cases, especially localization.

For example, suppose a training video contains US English speech and you want to localize it into German. You translate the text and voice it with Amazon Polly. The German speech must fit the same video frames as the original, so it cannot run longer than the US English speech. Time-driven prosody lets you set that limit, which simplifies dubbing.
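
As a sketch of how such a limit might be expressed: the Amazon Polly SSML documentation describes an amazon:max-duration attribute on the prosody tag, with the value given in seconds or milliseconds. The helper below builds that markup; the function name fit_to_duration is illustrative, and the exact attribute syntax should be checked against the current documentation before relying on it.

```python
from xml.sax.saxutils import escape


def fit_to_duration(text, max_ms):
    """Build SSML asking for `text` to be spoken within `max_ms` milliseconds.

    Assumes the amazon:max-duration attribute on <prosody>; verify against
    the SSML documentation for the voices you use.
    """
    if max_ms <= 0:
        raise ValueError("max_ms must be positive")
    return (
        f'<speak><prosody amazon:max-duration="{int(max_ms)}ms">'
        f"{escape(text)}</prosody></speak>"
    )


# Usage (requires boto3 and valid credentials):
#   ssml = fit_to_duration("Willkommen zu unserem Schulungsvideo.", 2000)
#   response = polly.synthesize_speech(
#       Text=ssml, TextType="ssml", OutputFormat="mp3", VoiceId="Vicki")
```

In a dubbing workflow, max_ms would typically be the measured duration of the corresponding source-language segment.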

Platform and Programming Language Support

Amazon Polly supports all the programming languages included in the Amazon SDK (Java, Node.js, .NET, PHP, Python, Ruby, Go, and C++) and Amazon Mobile SDK (iOS/Android). Polly also supports an HTTP API so you can implement your own access layer.

Speech Synthesis via API, Console, or Command Line

Amazon Polly can be accessed via the Polly API (and various language-specific SDKs), Amazon Web Services Management Console, and the Amazon command-line interface (CLI). You have full control over all the capabilities of Amazon Polly, whether you use the service through the console, the API, or the CLI.

Custom Lexicons

With Amazon Polly’s custom lexicons, or vocabularies, you can modify the pronunciation of particular words, such as company names, acronyms, foreign words and neologisms (e.g., “ROTFL”, “C’est la vie” when spoken in a non-French voice). To customize these pronunciations, you upload an XML file with lexical entries. For example, you can customize the pronunciation of Nguyen by providing a phoneme using this XML:

<lexeme>
            <grapheme>Nguyen</grapheme>
            <grapheme>nguyen</grapheme>
            <grapheme>NGUYEN</grapheme>
            <phoneme>"nu.jEn'</phoneme>
</lexeme>

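The lexeme above is only a fragment; to upload it, it has to sit inside a complete lexicon document. The sketch below wraps a lexeme in a root element from the W3C Pronunciation Lexicon Specification and checks that the result is well-formed. The alphabet value "x-sampa" and the language "en-US" are assumptions based on how the phoneme string looks, and the names build_lexicon and NGUYEN_LEXEME are illustrative.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# The lexeme from the example above, copied as-is.
NGUYEN_LEXEME = """<lexeme>
    <grapheme>Nguyen</grapheme>
    <grapheme>nguyen</grapheme>
    <grapheme>NGUYEN</grapheme>
    <phoneme>"nu.jEn'</phoneme>
</lexeme>"""


def build_lexicon(lexemes_xml, alphabet="x-sampa", lang="en-US"):
    """Wrap lexeme markup in a PLS <lexicon> root and validate well-formedness.

    The default alphabet and language are assumptions, not values taken
    from the source page.
    """
    document = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<lexicon version="1.0" xmlns="{PLS_NS}" '
        f'alphabet="{alphabet}" xml:lang="{lang}">\n'
        f"{lexemes_xml}\n"
        "</lexicon>"
    )
    ET.fromstring(document.encode("utf-8"))  # raises ParseError if malformed
    return document


# Usage (requires boto3 and valid credentials):
#   polly.put_lexicon(Name="nguyen", Content=build_lexicon(NGUYEN_LEXEME))
#   response = polly.synthesize_speech(
#       Text="Please welcome Ms. Nguyen.", OutputFormat="mp3",
#       VoiceId="Joanna", LexiconNames=["nguyen"])
```
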