Option 2: Amazon Polly

Create sample call recordings with Amazon Polly

One way to create sample call recordings yourself is to use Amazon Polly to synthesize an audio track for either a virtual customer or a virtual call center agent. Each Amazon Polly request uses a single voice, so you cannot synthesize a dialog between two different voices in one audio track.

Amazon Polly

Amazon Polly is a Text-to-Speech service that turns text into lifelike speech. It uses advanced deep learning technologies to synthesize speech that sounds like a human voice, which lets you create applications that talk and build entirely new categories of speech-enabled products. With dozens of lifelike voices across a variety of languages, you can select the ideal voice and build speech-enabled applications that work in many different countries.

Listen to speech

Let’s get familiar with Amazon Polly by listening to it in the AWS console. Follow the steps below.

Plain text input

  1. Go to the Amazon Polly console and make sure you’re in the Text-to-Speech view. If you aren’t, click Text-to-Speech in the navigation pane on the left.

  2. The Plain text tab already contains a sample text. You can listen to it by clicking the Listen to speech button.

  3. We have also prepared a sample text for you that you can now paste into the Plain text field.

    Hello, this is John Doe calling. I'm a customer of your food delivery service. My user ID is jd42.
    I wanted to provide some feedback regarding my order 47110815. Your delivery driver for that order did an amazing job and was overwhelmingly friendly and helpful.

    Of course, you can also make up your own text. Once you’re ready, click the Listen to speech button. Try different voices and find the one you like best.

    Now, experience the more natural sound of the voice: under Engine, switch the radio button from Standard to Neural. The Neural Text-to-Speech (NTTS) engine was added in July 2019. Using a new machine learning approach, NTTS delivers significant improvements in speech quality by increasing naturalness and expressiveness, two key factors in synthesizing lifelike speech.

SSML text input

Sending plain text to Amazon Polly lets it interpret the content in a default manner. In our case, however, it would be more natural for the customer to read out the order number digit by digit instead of pronouncing it as one large number. Customers would probably also emphasize certain details, like their full name or their user ID. Speech Synthesis Markup Language (SSML) lets you express this, and Amazon Polly supports a number of SSML tags. Below you can find the same sample text as above, enriched with several SSML tags that should make the resulting speech sound more natural in the context of a contact center call.

    <speak>
    <p>Hello, this is John Doe calling. I'm a customer of your food delivery service. My user ID is <prosody rate="75%"><say-as interpret-as="spell-out">jd42</say-as></prosody>.</p>
    <p>I wanted to provide some feedback regarding my order <prosody rate="75%"><say-as interpret-as="digits">47110815</say-as></prosody>. Your delivery driver for that order did an amazing job and was overwhelmingly friendly and helpful.</p>
    </speak>
  4. Click on the SSML tab and enter the SSML version of your text in the text box. Take a moment to browse through the documentation of the supported SSML tags and try some of them out. Again, click on the Listen to speech button to listen to the audio output.

  5. This is not very relevant for today’s context, but together with NTTS, Amazon Polly also introduced a newscaster speaking style. If you open the NTTS announcement again, you will find an example close to the bottom of the page. Try it out: it should sound like a news reporter reading your text, and maybe you have a different use case where this style is helpful.

The customer statements above are only a sample. You can use them to produce a call recording with a clearly positive sentiment, or you can make up your own customer feedback. You will also need customer feedback with a clearly negative sentiment. In any case, the customer name “John Doe”, his user ID “jd42”, and his order number “47110815” are only samples and will not be needed or referred to in later stages of this workshop.

Download audio file

Instead of an audio stream that is played back through your browser, you can also download the result as an audio file. You can try it out by clicking on the Download MP3 button. Notice that you can change the file format by clicking on the Change file format link.

In the end, though, we’re mostly interested in storing our fictitious call recordings somewhere from which we can start processing them. That brings us to the Synthesize to S3 button.

Synthesize to S3

As we said before, we would like to store our sample call recordings in an Amazon S3 bucket. This is directly supported by Amazon Polly.

  1. When you click the Synthesize to S3 button (and you haven’t configured this before), you will see a dialog asking where to store the synthesized audio in Amazon S3. It asks for:

    • The name of the bucket in which the audio file should be stored as an Amazon S3 object.
    • An optional prefix to be used in every object key.
    • An optional Amazon SNS topic that notifies interested subscribers when a new audio file is stored. We will not make use of that in this workshop.
  2. For S3 output bucket, enter the name of your recording bucket. For S3 key prefix, enter recordings/. As mentioned, we’ll ignore the SNS topic option. Click Synthesize to start the job. Amazon Polly uses the input text that is currently in the Plain text or SSML text field.

  3. The result is a new synthesis task, and the console shows its metadata. Click the link next to Task ID to switch to the S3 synthesis tasks overview, where you can see the status of your task. Once the task is completed, you see the S3 URL of the audio file that was created.

  4. Go to the Amazon S3 console and click the name of your recording bucket. You will see a virtual folder called recordings. This is actually the prefix you defined for your object keys, but the Amazon S3 console displays key segments delimited by a forward slash (/) as virtual folders. Click the folder name to find the audio file that was just created.

  5. Create a few sample call recordings by repeating this procedure. As we later also want to run a sentiment analysis on the call recordings, make sure you have at least one recording with a very positive statement and at least one with a very negative statement in your samples collection.

Conclusion

Congratulations!

You have successfully used Amazon Polly to create sample voice audio files with virtual contact center agent or virtual customer statements, placed them in your recording bucket, and listened to your audio files on your computer.