Dynamo Backups


Objective:

Steps to install streaming backup support into an AWS account for DynamoDB tables.

Requirement(s):



Prerequisites:

  • A dedicated AWS account exists for the client, and the operator has admin privileges in that account

  • A pre-existing DynamoDB database

  • An EC2 instance or Unix client for command execution, with Node.js and Python installed

Execution Location:

Amazon EC2 client in the dedicated account

Application/version:

Any

Checklist Author:

Brian Holt <beholt@gmail.com>



Summary:

Sage Bionetworks required a backup solution for the DynamoDB tables which power the Bridge production servers. Leveraging AWS best practices, the following procedure can be implemented to provide real-time backups and, if desired, replication of critical databases.



The proposed solution:

  • Dynamo streams CRUD operations as they happen to AWS Lambda

  • Lambda executes JavaScript to push the results into an S3 backup bucket

  • The S3 bucket has versioning enabled to provide incremental backups of record changes

  • JavaScript executables allow 'backfilling' of old data into the S3 bucket to pre-seed the backups



Each backed-up table will require its own IAM role, though because object keys are namespaced per table, as many tables as desired can back up to the same bucket.



Use:

Execution steps for configuration are attached below.



Use instructions are summarized in my fork of the dynamodb-replicator repo: https://github.com/consolecowboy/dynamodb-replicator





This fork adds a Python script, 'restore-backup.py', which allows you to directly play back all objects in the backup into a new table, either for development purposes or for disaster recovery.



usage: restore-backup.py [-h] [--region REGION]
                         sourceS3Bucket sourceS3Prefix tablename
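The restore script itself is in the fork, but the playback it performs can be sketched: list every object under the backup prefix, parse each record, and write the items back in batches. DynamoDB's BatchWriteItem API accepts at most 25 put/delete requests per call, so a restore must chunk its writes. A minimal, stdlib-only sketch of that batching step (the helper names and the item format are illustrative assumptions; the actual S3 listing and write calls would go through boto3, as noted in the comments):

```python
def chunks(seq, size=25):
    """Split a sequence into lists of at most `size` items.

    BatchWriteItem accepts at most 25 requests per call, so restored
    items must be written in chunks of this size.
    """
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def build_batch_request(table_name, items):
    """Build the RequestItems payload for one BatchWriteItem call."""
    return {
        table_name: [{"PutRequest": {"Item": item}} for item in items]
    }

# In the real script, each chunk would be sent with
#   boto3.client("dynamodb").batch_write_item(RequestItems=...)
# after listing and reading the objects under sourceS3Bucket/sourceS3Prefix.
```
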



Security:

The solution is designed to be secure, using AWS built-in roles to limit access to as few resources as possible. This is broken down as:

  • The Lambda function runs as an IAM role which possesses:

    • The ability to access the specific named stream from the Dynamo table

    • The ability to write to the specific S3 bucket

S3 buckets are designed with versioning to allow for incremental backups. The ability to delete versions can be further limited to prevent accidental or malicious deletion of backups.
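One way to limit version deletion is a bucket policy along these lines. This fragment is illustrative and not part of the checklist: it denies s3:DeleteObjectVersion to all principals on the backup bucket named above, so in practice you would scope the Principal or add exceptions for a designated admin role before applying it.

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyBackupVersionDeletion",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObjectVersion",
            "Resource": "arn:aws:s3:::org-sagebridge-dynamo-backups/*"
        }
    ]
}
```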

Backups can exist within non-default VPCs by adding appropriate permissions to the Lambda roles (not covered in this process).



Configuration Checklist

Step # | Checklist Step with Expected Results | Findings

Step 1.

Take note of the source Dynamo table names:

prod-exporter-SynapseMetaTables

prod-exporter-SynapseTables

prod-exporter-testing-SynapseMetaTables

prod-exporter-testing-SynapseTables

prod-heroku-BackfillRecord

prod-heroku-BackfillTask

prod-heroku-Criteria

prod-heroku-ExternalIdentifier

prod-heroku-FPHSExternalIdentifier

prod-heroku-HealthCode

prod-heroku-HealthDataAttachment

prod-heroku-HealthDataRecord3

prod-heroku-HealthId

prod-heroku-MpowerVisualization

prod-heroku-ParticipantOptions

prod-heroku-ReportData

prod-heroku-ReportIndex

prod-heroku-SchedulePlan

prod-heroku-Study

prod-heroku-StudyConsent1

prod-heroku-Subpopulation

prod-heroku-Survey

prod-heroku-SurveyElement

prod-heroku-SurveyResponse2

prod-heroku-SynapseSurveyTables

prod-heroku-Task

prod-heroku-TaskEvent

prod-heroku-Upload2

prod-heroku-UploadDedupe2

prod-heroku-UploadSchema

prod-heroku-UserConsent2

prod-heroku-UserConsent3



Step 2.

Create backup bucket and enable versioning



In AWS, go to Services->S3 and Create bucket



Take note of bucket name:

org-sagebridge-dynamo-backups

 

Highlight the created bucket name and choose Properties, Versioning, and Enable Versioning.



Step 3.

Create IAM policy for S3 access for Lambdas

 

In AWS, go to Services->IAM->Policies->Create Policy->Create your own policy:



Take note of the created policy name:



LambdaS3BackupPolicy



Create your own policy with the name above, and a description of its purpose as necessary. Include the following policy document, with the name of the S3 bucket appropriately placed:



{

    "Version": "2012-10-17",

    "Statement": [

        {

            "Sid": "LambdaS3Perms",

            "Effect": "Allow",

            "Action": [

                "s3:AbortMultipartUpload",

                "s3:DeleteObject",

                "s3:GetObject",

                "s3:GetObjectVersion",

                "s3:ListBucket",

                "s3:ListBucketMultipartUploads",

                "s3:ListMultipartUploadParts",

                "s3:PutObject"

            ],

            "Resource": [

                "arn:aws:s3:::org-sagebridge-dynamo-backups/*"

            ]

        }

    ]

}
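If several buckets or clients need the same grant, the document above can be generated rather than hand-edited. A small sketch (the function name is illustrative, not part of the checklist):

```python
def make_s3_backup_policy(bucket):
    """Build the IAM policy document granting a backup Lambda
    read/write access to the objects in one S3 bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "LambdaS3Perms",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                ],
                # Grant applies to the bucket's objects, not the bucket itself
                "Resource": ["arn:aws:s3:::%s/*" % bucket],
            }
        ],
    }
```
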





Note: This policy is reusable for all Lambdas backing up via this method, provided the same backup bucket is acceptable. Individual namespaces per table will be created.

 



Step 4.

Create IAM policy for the new Lambda:



In AWS, go to Services->IAM and Create Policy. Take note of created policy name:



LambdaDynamoStreamAccessPolicy



Create your own policy with the name above, and a description of its purpose as necessary. Include the following policy document, with the ARN of the stream noted above appropriately placed:

{

    "Version": "2012-10-17",

    "Statement": [

        {

            "Sid": "LambdaStreamPermsTestTable1",

            "Effect": "Allow",

            "Action": [

                "dynamodb:DescribeStream",

                "dynamodb:GetRecords",

                "dynamodb:GetShardIterator",

                "dynamodb:ListStreams"

            ],

            "Resource": [

                "arn:aws:dynamodb:us-east-1:649232250620:*"

            ]

        }

   ]

}
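The resource ARN must be colon-delimited with no embedded whitespace, which is easy to get wrong when pasting account IDs. A helper that formats the document (names are illustrative) makes that explicit:

```python
def make_stream_policy(region, account_id, sid="LambdaStreamPerms"):
    """Build the IAM policy document granting read access to the
    DynamoDB streams in one account and region."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                    "dynamodb:ListStreams",
                ],
                # ARN segments are joined by colons with no spaces
                "Resource": ["arn:aws:dynamodb:%s:%s:*" % (region, account_id)],
            }
        ],
    }
```
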



Step 5.

Create Lambda Execution Role



In AWS, go to Services->IAM->Roles and create a new role. Take note of the created role name:



LambdaDynamoBackupRole

 

Select AWS Lambda for Role Type



Under Attach Policy, change the filter type to 'Customer Managed' and attach the policies created above, as well as CloudWatchLogsFullAccess to allow the Lambda to log properly.



Create Role





Step 6.

Create Lambda



In AWS, go to Services->Lambda->Functions and Create a Lambda Function.



Choose 'Blank Function'



Leave the trigger blank and choose Next. (When a trigger is later added for a table, choose the proper table name, set Batch Size to 100, set Starting position to 'Trim horizon', and enable the trigger.)





Following settings:

  • Set Name to 'DynamoBackups' and Runtime to Node.js 4.3

  • For function package, choose 'Upload a file from Amazon S3' and enter the URL

  • For environment variables, set the following:

    • BackupRegion: us-east-1 [the region the Dynamo DB exists in]

    • BackupBucket: org-sagebridge-dynamo-backups [set to the S3 bucket created above]

    • BackupPrefix: Prod [a unique string for the namespace]

  • For Handler set 'dynamodb-replicator.backup'

  • For Role, choose the existing role created in step 5:

    • LambdaDynamoBackupRole

  • Other defaults should be sufficient
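Inside the function, dynamodb-replicator resolves its destination from these environment variables. A sketch of the equivalent lookup (whether the replicator itself applies a region default is an assumption here; the variable names match those configured above):

```python
import os


def backup_target(env=os.environ):
    """Resolve the backup destination from the Lambda's environment,
    as configured in this step."""
    bucket = env["BackupBucket"]          # e.g. org-sagebridge-dynamo-backups
    prefix = env.get("BackupPrefix", "")  # namespace within the bucket
    region = env.get("BackupRegion", "us-east-1")  # assumed default
    return bucket, prefix, region
```
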



Select Next. Create Function.

 



Step 7.

Test function



Under newly created function, choose 'Test', and select template 'DynamoDB Update'. Save and Test.



The test result should return 'null'.



If the test was successful, the S3 bucket should now contain an 'ExampleTableWithStream' folder, with contents, under the prefix.



 

Step 9.

Enable backups on target tables

 

In order to enable backups, each Dynamo table needs a stream enabled, and that stream needs to be added to the Lambda function as a trigger. The Lambda sends only new events to the backups; pre-existing rows need to be backfilled to establish a basis for the backups.
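Enabling a stream on a table is a single UpdateTable call per table, which the script below presumably issues. A sketch of the request it would build (the NEW_AND_OLD_IMAGES view type is an assumption; the backup needs at least the new image of each changed record):

```python
def enable_stream_request(table_name):
    """Build the UpdateTable parameters that turn on a stream for one
    table; pass to boto3.client('dynamodb').update_table(**params)."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            # View type is an assumption for this sketch; the backup
            # Lambda needs at least the new image of each record.
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }
```
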



A python script exists in the git repo for this purpose.



From a Unix shell (ideally a Mac with Homebrew) with:

  • git

  • AWS credentials configured with admin privileges

  • Node.js

  • Python (with pip)



git clone https://github.com/consolecowboy/dynamodb-replicator

cd dynamodb-replicator

npm install -g dynamodb-replicator

pip install argparse boto3

python ./backfill-all-dynamo-tables S3BUCKETNAME S3PREFIX LAMBDANAME [--tableprefix PREFIX] [--region REGION]



In our example, the backfill command to back up all tables starting with 'prod' into the Prod namespace, using the function created above, would be:

python ./enable-dynamo-lambdabackups.py org-sagebridge-dynamo-backups Prod DynamoBackups --tableprefix=prod

To back up a single table:







Step 10.

Validate Functionality



Look in the backup S3 bucket (org-sagebridge-dynamo-backups). In the bucket, a folder matching BackupPrefix should exist ('Prod' in the above example). Inside it, there should be a folder for the table.



This folder should contain a number of objects, one for each row in your table. The file names are MD5 sums of the row key, which prevents hotspots in the S3 storage architecture.
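The key layout can be sketched as follows. The exact serialization of the row key that dynamodb-replicator hashes is an assumption here; the point is that an MD5 digest spreads object names evenly across the keyspace:

```python
import hashlib
import json


def backup_object_key(prefix, table, row_key):
    """Derive the S3 object key for one backed-up row:
    <prefix>/<table>/<md5 of the serialized row key>.

    The serialization below (sorted-key JSON) is an illustrative
    assumption, not necessarily what the replicator uses.
    """
    digest = hashlib.md5(
        json.dumps(row_key, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return "%s/%s/%s" % (prefix, table, digest)
```
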



New rows, updates to rows, and deletions should immediately affect this content. If you view versions, all old versions should remain even when files are deleted.