Scheduled PostgreSQL backups using AWS Lambda, S3 and EventBridge

[Image: Photo by Markus Spiske from Pexels]

AWS Lambda, at its core, is a serverless compute platform where developers can write small, independent and often single-purpose applications without managing the underlying infrastructure on which they run. It also gives cloud consumers a flexible billing model: nothing is paid for “idle” compute, yet the compute resource remains highly available when required.

One of the use cases for AWS Lambda is running short-lived, routine computational tasks that run on a schedule or at a fixed rate; in our case, running database backups.

PostgreSQL is one of the most popular open-source relational databases out there, and while Amazon RDS already provides us with a fully managed Postgres solution (point-in-time recovery, snapshots, multi-AZ deployments and other rich features), its current price point does not exactly make it attractive for running small projects, as it is often more than twice the price of an EC2 instance of the same compute shape.

Current hourly on-demand pricing table for Singapore (ap-southeast-1) Region:

RDS Instance

Type            Hourly Rate
db.m5.large     $0.247
db.m5.xlarge    $0.494

Equivalent EC2 Instance

Type            Hourly Rate
m5.large        $0.12
m5.xlarge       $0.24

That is not to say we need to do away with RDS entirely, because oftentimes the cost of data loss to a business can be a whole lot more than the “savings” gained from the compromise of not using it.

So, depending on the level of service and reliability required, a self-hosted PostgreSQL database on an EC2 instance might suffice for small, non-mission-critical applications (with a tolerable RTO/RPO). And like any good cloud architect, we need to ensure that we have a backup/DR plan.

We can create a backup strategy in a variety of ways. We can:

  1. Create a cron job alongside the database instance that runs the backup - cheap, but it has a performance impact since it shares compute resources with our instance (CPU, memory, disk I/O)
  2. Have a small separate instance that runs our backups - minimal impact on the database instance, but it incurs additional cost, or we can...
  3. Create a Lambda application that runs a backup on schedule - effectively incurring a bill only while the function is running, but otherwise costless? Sounds like a good idea.

We’ll be implementing a solution outlined in the architecture diagram below:

[Image: Final cloud architecture]

Our Lambda Function

For this exercise, I prepared a Lambda function that serves as a “wrapper” around a compiled native binary of the pg_dump utility, executing it as a child process.

rcordial/lambda_pg_dump - GitHub
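The core idea looks roughly like the sketch below: spawn the bundled pg_dump binary as a child process and stream its output straight to S3. This is a simplified illustration rather than the exact code from the repository; the ./bin/pg_dump path and the way the connection string is resolved are assumptions.

// index.js - a simplified sketch of the wrapper idea (not the exact repo code)
const { spawn } = require("child_process");
const AWS = require("aws-sdk"); // v2 SDK bundled with the Node.js 16.x runtime

exports.handler = async (event) => {
  const s3 = new AWS.S3({ region: event.REGION });
  const key = `${event.PREFIX || ""}dump-${new Date().toISOString()}.sql`;

  // In the real function the connection details come from Secrets Manager
  // (see "Securing Database Credentials" below); this is just a placeholder.
  const connectionString = process.env.PG_CONNECTION_STRING;

  // ./bin/pg_dump is the compiled native binary shipped inside the zip bundle
  const dump = spawn("./bin/pg_dump", ["--dbname", connectionString]);

  // Stream pg_dump's stdout directly into a (multipart) S3 upload
  await s3
    .upload({ Bucket: event.S3_BUCKET, Key: key, Body: dump.stdout })
    .promise();

  return { bucket: event.S3_BUCKET, key };
};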

To create our function, simply clone the repo above and, in the root project directory, run:

npm run build

This script will simply install dependencies and package the application as a zip file (lambda_pg_dump.zip) that we can upload to AWS Lambda.

Let’s head over to the AWS console and create our Lambda function. Give the function a name, set the runtime to Node.js 16.x, and allow it to create a new basic execution role (which we will add more permissions to later on).

[Image: Creating the lambda function]

After creation, under the Code panel, we’re going to upload our source bundle. Luckily, our function package is only 4.5 MB so we can directly upload it here.

[Image: Uploading the lambda function]

Lastly, we need to configure the memory, storage and timeout depending on the expected size of the database we are going to dump. For this exercise, we can set the memory to 1024 MB and the ephemeral storage to 1024 MB as well. The most important setting here is the timeout: we need to give our function ample time to dump the database, so it wouldn’t hurt to set it to something longer, like 10 or even 15 minutes (the Lambda maximum).

[Image: Configuring our lambda function]
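If you prefer to script these settings rather than clicking through the console, something like the sketch below would do it (using the AWS SDK for JavaScript v3; the function name is an assumption):

// configure-function.js - optional alternative to the console configuration above
const {
  LambdaClient,
  UpdateFunctionConfigurationCommand,
} = require("@aws-sdk/client-lambda");

const client = new LambdaClient({ region: "ap-southeast-1" });

client
  .send(
    new UpdateFunctionConfigurationCommand({
      FunctionName: "lambda_pg_dump",   // hypothetical function name
      MemorySize: 1024,                 // MB
      EphemeralStorage: { Size: 1024 }, // MB of /tmp space
      Timeout: 900,                     // seconds (15 minutes, the Lambda maximum)
    })
  )
  .then(() => console.log("Function configuration updated"));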

Securing Database Credentials

As a best practice, we should never store database credentials in the application code, and while we can pass credentials in the event JSON parameter, this seems like a questionable security practice to me. A more secure way of doing this is to put the credentials in the function’s environment variables, as this gives improved security and the benefit of KMS encryption.

So the best and most secure way to do this is by utilizing AWS Secrets Manager. As its name suggests, it gives applications a better way to protect secrets like database credentials by providing features for the management, retrieval and automatic rotation of secret values.

To create a secret, we need to go to the AWS console and open the Secrets Manager service. We’ll create a key/value pair secret and populate the values with the ones required by our application.

[Image: Creating a secret]

We can optionally configure it to use a different encryption key, but we’ll leave everything at the defaults for now. Then we just need to give it a name such as “database_secret”, and hit Store to save our secret.

[Image: Note the secret]

We need to take note of the secret ARN or name, as we’re going to pass this value to our Lambda function so that the application can reference it and retrieve the database credentials.
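Inside the function, resolving those credentials boils down to a single Secrets Manager call, roughly like the sketch below (using the v2 SDK bundled with the Node.js 16.x runtime; the returned fields are simply whatever key/value pairs were stored in the secret):

// A rough sketch of resolving credentials from the secret passed in the event
const AWS = require("aws-sdk");

async function getDatabaseCredentials(secretIdOrArn, region) {
  const sm = new AWS.SecretsManager({ region });
  const { SecretString } = await sm
    .getSecretValue({ SecretId: secretIdOrArn })
    .promise();
  // e.g. { host, port, username, password, dbname } - whatever was stored earlier
  return JSON.parse(SecretString);
}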

Invoking our Lambda Function

Our Lambda application accepts the following event JSON parameter.

{
    "SECRET": "<secret arn or name>",
    "REGION": "<aws region>",
    "S3_BUCKET": "<s3 bucket name>",
    "PREFIX": "<optional prefix/folder in s3>"
}

After executing the Lambda with this test event, you might find an error message like below:

[Image: Error caused by lack of permissions for Secrets Manager]

The error message is caused by our Lambda function not having enough permissions to retrieve the database credentials stored as secrets in AWS Secrets Manager. To remedy this, we must attach another permission policy (it can be inline) to our Lambda execution role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::rcordial/*"
        },
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": "arn:aws:secretsmanager:<aws-region>:<account-id>:secret:database-XXXX"
        }
    ]
}

The JSON document above is a policy that we need to attach to our Lambda execution role to give it permission to retrieve secret values from AWS Secrets Manager; similarly, it also gives it permission to put objects into our S3 bucket. Our Lambda function should now be able to execute successfully.

Scheduling executions with EventBridge

To automatically invoke our Lambda function, we can use another AWS service called Amazon EventBridge, more specifically its Scheduler capability. It allows us to deliver events to targets like our Lambda function on a recurring schedule, passing relevant event data (such as the reference to our secret in Secrets Manager) with every invocation.

We can create one by heading to EventBridge Scheduler and creating a schedule, giving it a name and specifying the occurrence and schedule type like below:

[Image: Creating an EventBridge Schedule]

In the example above, we are specifying that we want to invoke our Lambda function every 12 hours. Furthermore, we can optionally add a timezone and a start date and time.

Then we can specify the target, which in this case is a Lambda function. We just need to supply the same JSON event parameter from earlier, which EventBridge will pass to our Lambda function on every invocation.

[Image: Lambda as an event target]
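For completeness, the same schedule can also be created programmatically. A rough sketch with the AWS SDK for JavaScript v3 is shown below; the schedule name, ARNs and the scheduler role are assumptions, and the role is one that EventBridge Scheduler assumes, so it needs lambda:InvokeFunction on the target function.

// create-schedule.js - the same schedule expressed in code instead of console clicks
const {
  SchedulerClient,
  CreateScheduleCommand,
} = require("@aws-sdk/client-scheduler");

const client = new SchedulerClient({ region: "ap-southeast-1" });

client.send(
  new CreateScheduleCommand({
    Name: "pg-dump-every-12-hours",
    ScheduleExpression: "rate(12 hours)",
    FlexibleTimeWindow: { Mode: "OFF" },
    Target: {
      Arn: "arn:aws:lambda:ap-southeast-1:<account-id>:function:lambda_pg_dump",
      // Role assumed by EventBridge Scheduler; needs lambda:InvokeFunction on the target
      RoleArn: "arn:aws:iam::<account-id>:role/scheduler-invoke-role",
      // The same event JSON our function expects
      Input: JSON.stringify({
        SECRET: "database_secret",
        REGION: "ap-southeast-1",
        S3_BUCKET: "<s3 bucket name>",
        PREFIX: "backups/",
      }),
    },
  })
);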

If everything has been followed up to this point, we have now successfully created a Lambda function that runs on a specified schedule and creates a dump of our PostgreSQL database.

Summary

  1. We deployed a Lambda function with the pg_dump utility bundled in
  2. Created a Secrets Manager secret to securely store sensitive database credentials
  3. Automated function invocation by setting up an EventBridge Scheduler schedule
  4. Implemented the whole thing serverless again: no server-run cron jobs and no idle compute, resulting in another negligible AWS bill
