Text to speech with Amazon Polly

This is an opportunity to experiment with an AI voice generator built on Amazon Polly. The task is simple: create a Lambda function that takes a piece of text, uses Amazon Polly to synthesize human-like speech as an MP3 file, and stores that file in an S3 bucket, where you can listen to it or download it.

Tools required:

  • An AWS account to sign in to the AWS Management Console

  • Basic familiarity with Node.js 18

Steps

  1. Go to AWS Management Console and search for Amazon Polly.

  2. Choose, according to your preferences, the engine (neural, long-form, standard), the language (even the accent), and the voice, and enter some text to test it out.

  3. Go to the AWS Management Console and search for IAM to create a role with the policies Lambda needs to connect to Polly.

  4. Go to Access Management → Roles → Create Role.

  5. Since we’ll be using Lambda to create the function, select the following:

    1. Trusted entity type: AWS service

    2. Use case → Service or use case: Lambda

    3. Click on Next

  6. Search and select the following policies:

    1. AmazonPollyFullAccess

    2. AmazonS3FullAccess

    3. AWSLambdaBasicExecutionRole

  7. Click on Next, and provide a relevant name and description for the role.

  8. Click on Create role to confirm.

  9. Go to AWS Management Console and search for S3 → Create bucket.

  10. Provide a relevant name for the S3 bucket, and choose General purpose as bucket type.

  11. Go to AWS Management Console and search for Lambda.

  12. Click on Create Function.

  13. Select the following:

    1. Function name: Provide a relevant name for the function

    2. Runtime: Choose the latest stable version of Node.js. At the time of writing, it’s Node.js 18.

    3. Architecture: x86_64

    4. Execution role: Use an existing role (Choose the role you just created in step 8).

    5. Click on Create function.

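  14. Rename the file from index.mjs to index.js (the code uses CommonJS require, which is not available in an .mjs ES module), and replace its contents with the following code: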

const { PollyClient, SynthesizeSpeechCommand } = require("@aws-sdk/client-polly");
const { S3Client } = require("@aws-sdk/client-s3");
const { Upload } = require("@aws-sdk/lib-storage");

// Region for both clients; the S3 client's Region should match your bucket's Region
const polly = new PollyClient({ region: "us-east-2" });
const s3 = new S3Client({ region: "us-east-2" });

exports.handler = async (event) => {
    try {
        // Extract text input from the event
        const text = event.text;

        // Reject empty input and anything longer than the 3,000-character limit
        if (!text || text.length > 3000) {
            throw new Error("Invalid text input. Ensure the text is non-empty and at most 3,000 characters.");
        }

        // Specify parameters for Polly
        const params = {
            Text: text,
            OutputFormat: 'mp3',
            VoiceId: 'Emma' // You can change this to the desired voice
        };

        // Synthesize speech using Polly
        const synthesizeCommand = new SynthesizeSpeechCommand(params);
        const data = await polly.send(synthesizeCommand);

        // Ensure AudioStream is valid
        if (!data.AudioStream || typeof data.AudioStream !== "object") {
            throw new Error("AudioStream is missing in Polly response");
        }

        // Generate a unique key for the audio file
        const key = `audio-${Date.now()}.mp3`;

        const upload = new Upload({
            client: s3,
            params: {
                Bucket: "pollys3-storage", // Replace with your bucket name
                Key: key,
                Body: data.AudioStream,
                ContentType: "audio/mpeg",
            },
        });

        await upload.done();

        const outputMessage = `The audio file has been successfully stored in the S3 bucket by the name ${key}`;

        return {
            statusCode: 200,
            body: JSON.stringify({ message: outputMessage })
        };
    } catch (error) {
        console.error('Error:', error);
        return {
            statusCode: 500,
            body: JSON.stringify({ message: 'Internal server error' })
        };
    }
};
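
The engine, language, and voice chosen in the Polly console in step 2 correspond to fields of the SynthesizeSpeech request (Engine, LanguageCode, VoiceId). As a minimal sketch, the helper below builds that request object from a few options. The name buildPollyParams and its defaults are assumptions for illustration and are not part of the function above. Note that not every voice supports every engine, so check the voice list in the console before changing these values.

```javascript
// Hypothetical helper (not part of the original function): turns console-style
// choices into an input object for SynthesizeSpeechCommand.
function buildPollyParams(text, options = {}) {
    // Defaults are assumptions: the standard engine and the "Emma" voice used above
    const { engine = "standard", voiceId = "Emma", languageCode } = options;

    const params = {
        Text: text,
        OutputFormat: "mp3",
        VoiceId: voiceId,
        Engine: engine, // e.g. "standard", "neural", or "long-form"
    };

    // LanguageCode is optional in the request, so only set it when provided
    if (languageCode) {
        params.LanguageCode = languageCode;
    }

    return params;
}

// Example: request the neural engine with British English
const exampleParams = buildPollyParams("Hello!", { engine: "neural", languageCode: "en-GB" });
console.log(exampleParams);
```

In the handler, the returned object could be passed straight to the command, for example new SynthesizeSpeechCommand(buildPollyParams(text, { engine: "neural" })).
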
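
  15. Click on Deploy to save the code, then click on Test to configure a test event for index.js.

  16. In the event JSON, provide the text you want under a text key, for example {"text": "Hello from Amazon Polly"}, and run the test. You should get a successful response (status code 200).

  17. Verify that the MP3 file for your text appears in the S3 bucket you created.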
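
The handler reads its input from event.text, so the test event has to be a JSON object with a text field. The sketch below shows a sample event together with a small validation helper. The name validateEvent is hypothetical and the helper is not part of the function above. It illustrates one possible refinement: reporting invalid input with a 400 status code, whereas in the handler above a validation error falls through to the catch block and comes back as a generic 500.

```javascript
// Hypothetical helper (an illustration, not the original handler's logic):
// validates a Lambda event shaped like { "text": "..." }.
const MAX_CHARS = 3000; // same limit the handler checks

function validateEvent(event) {
    const text = event && event.text;

    if (typeof text !== "string" || text.trim().length === 0) {
        return { ok: false, statusCode: 400, message: "The 'text' field must be a non-empty string." };
    }
    if (text.length > MAX_CHARS) {
        return { ok: false, statusCode: 400, message: `The 'text' field must be at most ${MAX_CHARS} characters.` };
    }
    return { ok: true, text };
}

// Sample event; its JSON form is what goes into the console's test event editor
const sampleEvent = { text: "Hello from Amazon Polly!" };

console.log(JSON.stringify(sampleEvent));   // {"text":"Hello from Amazon Polly!"}
console.log(validateEvent(sampleEvent).ok); // true
console.log(validateEvent({}).statusCode);  // 400
```

Inside the handler, a failed check could then return { statusCode: result.statusCode, body: JSON.stringify({ message: result.message }) } before any call to Polly is made.
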