Dynamo Backups
Objective: Steps to install streaming backup support into an AWS account for DynamoDB tables.
Requirement(s):
Prerequisites:
Execution Location: Amazon EC2 client dedicated account
Application/version: Any
Checklist Author: Brian Holt <beholt@gmail.com>
Summary:
Sage Bionetworks required a backup solution for the DynamoDB tables that power the Bridge production servers. Following AWS best practices, the procedure below provides real-time backups and, if desired, replication of critical databases.
The proposed solution:
DynamoDB streams CRUD operations as they happen to AWS Lambda
Lambda executes JavaScript to push the results into an S3 backup bucket
The S3 bucket has versioning enabled to provide incremental backups of record changes
JavaScript executables allow 'backfilling' of old data into the S3 bucket to pre-seed the backups
Each backed-up table requires its own IAM role, though with proper namespace design as many tables as desired can back up to the same bucket.
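The per-record transform at the heart of this pipeline can be sketched as follows. The actual replicator runs as Node.js inside Lambda; this Python sketch uses illustrative stream-record shapes and a hypothetical 'Prod' prefix simply to show how one stream event maps to one S3 object:

```python
import hashlib
import json

def backup_key(prefix, table, keys):
    # S3 object key for a row: an MD5 digest of the serialized primary key,
    # which spreads objects evenly and avoids hotspots in S3.
    digest = hashlib.md5(json.dumps(keys, sort_keys=True).encode()).hexdigest()
    return f"{prefix}/{table}/{digest}"

def handle_record(record, prefix="Prod"):
    """Map one DynamoDB stream record to an S3 action (shapes are illustrative)."""
    # The table name is the second path segment of the stream's eventSourceARN.
    table = record["eventSourceARN"].split("/")[1]
    keys = record["dynamodb"]["Keys"]
    key = backup_key(prefix, table, keys)
    if record["eventName"] == "REMOVE":
        # Deletes remove the current object; versioning keeps prior versions.
        return ("delete", key, None)
    # INSERT and MODIFY both write the full new image of the row.
    return ("put", key, json.dumps(record["dynamodb"]["NewImage"]))
```

Because puts and deletes land on a versioned bucket, every change produces a new object version rather than overwriting history.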
Use:
Execution steps for configuration are attached below
Use instructions are summarized in my fork of the dynamodb-replicator repo:
This fork adds a Python script, restore-backup.py, which lets you play back all objects in the backup into a new table, either for development purposes or for disaster recovery.
usage: restore-backup.py [-h] [--region REGION] sourceS3Bucket sourceS3Prefix tablename
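The restore script itself lives in the fork; as an illustration of one detail any such playback must handle, DynamoDB's BatchWriteItem accepts at most 25 put requests per call, so restored items have to be chunked. A minimal sketch (function name and batching approach are illustrative, not the script's actual code):

```python
def batch_write_requests(items, batch_size=25):
    # Group restored items into BatchWriteItem-sized chunks.
    # DynamoDB rejects batches larger than 25 write requests.
    batch = []
    for item in items:
        batch.append({"PutRequest": {"Item": item}})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each yielded batch would be passed as the RequestItems payload for the target table, with unprocessed items retried.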
Security:
The solution is designed to be secure, using AWS's built-in IAM roles to limit access to as few resources as possible. This breaks down as:
The Lambda function runs as an IAM role which possesses:
Ability to access the specific named stream from the DynamoDB table
Ability to write to the specific S3 bucket
S3 buckets are designed with versioning to allow for incremental backups. The ability to delete versions can be further limited to prevent accidental or malicious deletion of backups.
Backups can exist within non-default VPCs by adding appropriate permissions to lambda roles (not covered in this process)
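To further limit version deletion as described above, a deny statement of roughly this shape could be attached to the relevant principals (the bucket name is taken from the example later in this document; adapt to your own bucket):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "s3:DeleteObjectVersion",
      "Resource": "arn:aws:s3:::org-sagebridge-dynamo-backups/*"
    }
  ]
}
```

With this in place, the Lambda can still delete the current object for a removed row, but historical versions cannot be purged.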
Configuration Checklist
Step # | Checklist Step with Expected Results | Findings
1 | Take note of source DynamoDB table names: |
2 | Create backup bucket and enable versioning
In AWS, go to Services->S3 and Create bucket. Take note of bucket name:
Highlight the created bucket, choose Properties, then Versioning, and Enable Versioning. |
3 | Create IAM policy for S3 access for Lambdas
In AWS, go to Services->IAM->Policies->Create Policy->Create Your Own Policy. Take note of created policy name:
Create your own policy with the name above, and a description of purpose as necessary. Include the following policy document with the name of the S3 bucket appropriately placed:
Note: This policy is reusable for all Lambdas backing up via this method if the same backup bucket is acceptable. Individual namespaces per table will be created. |
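The policy document referenced in this step was not preserved in this copy. A minimal policy of roughly this shape would grant the needed S3 access (the bucket name here is the example bucket from the validation step; substitute your own):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::org-sagebridge-dynamo-backups/*"
    }
  ]
}
```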
4 | Create IAM policy for the new Lambda
In AWS, go to Services->IAM and Create Policy. Take note of created policy name:
Create your own policy with the name above, and a description of purpose as necessary. Include the following policy document with the ARN of the stream noted above appropriately placed: |
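The stream policy document for this step was also not preserved here. A minimal policy of roughly this shape grants read access to a single table's stream (the region, account ID, and table name in the ARN are placeholders):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeStream",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:ListStreams"
      ],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ProdUsers/stream/*"
    }
  ]
}
```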
5 | Create Lambda execution role
In AWS, go to Services->IAM and Roles. Create New Role. Take note of created role name:
Select AWS Lambda for Role Type. Under Attach Policy, change the filter to 'Customer Managed' and attach the policies created above, as well as CloudWatchLogsFullAccess to allow the Lambda to log properly. Create Role. |
6 | Create Lambda
In AWS, go to Services->Lambda->Functions and Create a Lambda Function. Choose 'Blank Function'. Leave the trigger blank and choose Next. Choose the proper table name. Set Batch Size to 100 and Starting Position to 'Trim horizon', and enable the trigger. Choose Next. Apply the following settings:
Select Next. Create Function. |
7 | Test function
Under the newly created function, choose 'Test' and select the template 'DynamoDB Update'. Save and Test. The test result should return 'null'. If the test was successful, the S3 bucket with prefix should now contain an 'ExampleTableWithStream' folder with contents. |
8 | Enable backups on target tables
To enable backups, each DynamoDB table needs a stream enabled, and that stream must be added to the Lambda function. The Lambda only sends new events to the backups; pre-existing rows must be backfilled to give the backups a basis. A Python script exists in the git repo for this purpose. From a unix shell (ideally a Mac with Homebrew) with:
In our example, the backfill command to back up all tables starting with Prod to the Prod namespace, using the function we have, would be:
To back up a single table: |
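The exact backfill commands are in the repo. Conceptually, a backfill scans every existing row and writes it to the same S3 key the streaming path would have used, so old and new data share one namespace. A sketch under that assumption (function name, arguments, and item shapes are illustrative):

```python
import hashlib
import json

def backfill_objects(items, key_attrs, prefix, table):
    # Map every scanned row to the S3 object key the streaming backup
    # would write: <prefix>/<table>/<md5 of serialized primary key>.
    out = {}
    for item in items:
        keys = {k: item[k] for k in key_attrs}
        digest = hashlib.md5(json.dumps(keys, sort_keys=True).encode()).hexdigest()
        out[f"{prefix}/{table}/{digest}"] = json.dumps(item)
    return out
```

A real backfill would page through a table scan and upload each value to the versioned bucket; this sketch only shows the key derivation that keeps backfilled and streamed objects consistent.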
9 | Validate functionality
Look in the backup (org-sagebridge-dynamo-backups) S3 bucket. A folder matching BackupPrefix should exist ('Prod' in the above example), and within it a folder for the table. That folder should contain a number of objects, one for each row in your table. The file names are MD5 sums of the row key, which prevents hotspots in the S3 storage architecture. New rows, updates to rows, and deletions should immediately affect this content. If you view versions, all old versions should remain even if files are deleted. |