Synapse stacks deployment
Deployments
Creating branches (usually Fri night or Sat morning)
Each week, we make sure release branches are merged back into develop at the stack release meeting. Before building the next staging stack (N+1), we tag the existing release branch and create branches for the next stack.
You can create a shell script to simplify the process:
#!/bin/bash
# Usage: create-branch.sh <repo> <N> <N+1>
# Tags stack N's release branch for prod, merges it back into develop,
# then creates and tags the release branch for stack N+1 (staging).
set -e
mkdir -p ~/tmp/deploy
pushd ~/tmp/deploy
git clone git@github.com:Sage-Bionetworks/$1.git
pushd $1
# Tag the current release branch as the prod release
git checkout release-$2
git tag -a $2.1 -m stack-$2-prod
# Merge the release branch back into develop
git checkout develop
git merge release-$2
git push origin develop --tags
# Create and tag the next release branch for staging
git checkout -b release-$3
git tag -a $3.0 -m stack-$3-staging
git push origin release-$3 --tags
popd
popd
Then use it (here saved as ~/bin/create-branch.sh) to close/tag the current staging branches and create new ones:
~/bin/create-branch.sh Synapse-Stack-Builder <N> <N+1>
~/bin/create-branch.sh SynapseWebClient <N> <N+1>
~/bin/create-branch.sh Synapse-Repository-Services <N> <N+1>
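For example, closing stack 445 and opening stack 446 (numbers are illustrative):

~/bin/create-branch.sh Synapse-Stack-Builder 445 446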
Now run the build jobs, pointing to the new branches:
Release builds point to release-<N+1>
Connect to VPN
Go to swc-release-build
'Configure' > Source Code Management > "Branches to build" = release-<N+1>
example: release-5
Save
Click the 'Enable' button to enable the build if not already enabled
Click "Build now"
Go to repo-release-build
Configure > Source Code Management > 'Branches to build' = release-<N+1>
Build Steps > Trigger/call builds on other projects > Project to build = Stack-Builder-Parameterized > Predefined Parameters > 'BRANCH' = release-<N+1>
Production builds point to release-<N>
Go to swc-production-build
Same process as above but point to ‘release-<N>’
Go to repo-production-build
Same process as above but point to ‘release-<N>’
Note: the production builds are typically enabled when needed. Click the 'Enable' button to enable the build if you need to build the production artifacts.
Deploying dev stack (usually Fri night or Sat)
The dev stack's 'prod' environment runs the same version of the code as the staging stack (i.e. there is no permanently visible dev staging stack). Once we have artifacts from the builds above, we can deploy a staging version.
Go to build-dev-stack
Make any changes needed to the configuration (new config values, updated beanstalk solution, …)
Build with Parameters > specify the instance number (N+1), beanstalk numbers (0 if regular/non-patch), versions (from the builds above) and the git branch for the stack builder (matches the instance number)
After the build is done, go to SynapseDev AWS console / Elastic Beanstalk and restart the servers for the environments
Go to dev-link-dns-to-alb to configure the stack just deployed as a staging stack
Configure > Execute shell
In the command, for 'repo.staging.dev.sagebase.org' and 'portal.staging.dev.sagebase.org', replace 'none' with 'repo-dev-<N+1>-<beanstalk-number>' and 'portal-dev-<N+1>-<beanstalk-number>' respectively, where <N+1> and <beanstalk-number> are the values specified for the dev stack deployed above (see the illustrative mapping below)
==> This moves the instances into the target groups for the staging ALB (i.e. makes stack <N+1> the staging stack). The operation takes about 5 minutes.
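For illustration, for dev stack 445 with beanstalk number 0 (hypothetical values; the job's actual shell command is authoritative), the replacements would be:

repo.staging.dev.sagebase.org   -> repo-dev-445-0
portal.staging.dev.sagebase.org -> portal-dev-445-0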
While the linking is happening, set up a MySQL connection to the new stack's repo database:
dev-<N+1>-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com as endpoint
dev-<N+1>user as user
dev-<N+1> as db
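A minimal sketch of the connection with the mysql command-line client (the password comes from the stack configuration and is not shown here):

# Prompts for the password, then opens the dev-<N+1> database
mysql -h dev-<N+1>-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com -u dev-<N+1>user -p dev-<N+1>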
To verify that you have the correct prod and staging instances of the dev stack:
go to https://repo-prod.dev.sagebase.org/repo/v1/version ==> N
go to https://repo-staging.dev.sagebase.org/repo/v1/version ==> N+1
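For example, from a shell (any HTTP client works):

curl -s https://repo-prod.dev.sagebase.org/repo/v1/version     # expect the stack <N> version
curl -s https://repo-staging.dev.sagebase.org/repo/v1/version  # expect the stack <N+1> version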
Go to migrate-dev-stack
Migrate the stack by running the job
It should take 2-3 attempts to get the migration time down to about 5 minutes
When at 5 minutes, connect to the 'prod' dev stack and set it to read-only mode (the migration job sets the destination to RO and leaves it in RO).
After the final migration, both stacks are in RO mode. Verify that all types displayed have null values for (min,max) [these are empty tables]
Go back to dev-link-dns-to-alb, this time to configure the new stack as the 'prod' stack
Configure > Execute shell
In the command, for 'repo.prod.dev.sagebase.org' and 'portal.prod.dev.sagebase.org', replace '<N>-<beanstalk-number>' with '<N+1>-<beanstalk-number>' (i.e. make the new stack the prod stack), and for 'repo.staging.dev.sagebase.org' and 'portal.staging.dev.sagebase.org', set the values back to 'none' (there is no 'staging' stack except for migration purposes)
==> Again, the operation takes about 5 minutes
In the database for the new stack, set the status to read-write mode and verify that the prod stack is in read-write mode at https://repo-prod.dev.sagebase.org/repo/v1/status.
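A hedged sketch of the status update, using the connection above (the table and column names, STACK_STATUS and STATUS, are assumptions about the repo schema; verify them against the actual database before running):

# Table/column names assumed; status values are READ_WRITE / READ_ONLY
mysql -h dev-<N+1>-db.cdusmwdhqvso.us-east-1.rds.amazonaws.com -u dev-<N+1>user -p dev-<N+1> \
  -e "UPDATE STACK_STATUS SET STATUS = 'READ_WRITE'"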
In the Synapse Dev AWS console, go to CloudFormation and delete the stacks for portal-dev-<N>, repo-dev-<N> and workers-dev-<N>.
Run the Synapse Python client integration tests
Go to http://build-system-ops.dev.sagebase.org:8080/job/synapse-python-client-integration-test-prod/ > Build Now
Go to http://build-system-ops.dev.sagebase.org:8080/job/synapse-python-client-integration-test-staging/ > Build Now
NOTE: If you wait more than a few minutes after the switch to run these, hold off for a while; otherwise the messages the tests expect on the queues will start mixing with the replaying of messages and the tests will time out. This sometimes happens as soon as the second (dev branch) test.
If there is any error, open a Jira and work with the client dev team to resolve it.
Deploying upcoming staging stack (usually Sat or Sun in between migrations below)
Copy the configuration of the build-synapse-staging-stack job to the build-synapse-prod-stack job
On build-synapse-staging-stack, click Configure, then copy the content of the Execute Shell Command field to the clipboard
Go to build-synapse-prod-stack
Click Configure and paste into the Execute Shell command field
In Source Code Management / Branches to build, update the branch to 'release-<N>'
Update the configuration of the build-synapse-staging-stack job
Go to build-synapse-staging-stack
In Source Code Management / Branches to build, update the branch to 'release-<N+1>'
In Execute Shell Command (an illustrative excerpt of these properties appears at the end of this section)
update 'org.sagebionetworks.instance' to <N+1>
update 'org.sagebionetworks.beanstalk.version.[repo|workers|portal]' to the corresponding versions of the artifacts (usually 'release-<N+1>.0')
update 'org.sagebionetworks.vpc.subnet.color' to the next color in ('red', 'green', 'blue')
verify that the beanstalk numbers are all '0'
update 'org.sagebionetworks.beanstalk.image.version.amazonlinux' as needed
make any other change needed to the configuration (new config values?)
Build now
After the build is done, go to SynapseProd AWS console / Elastic Beanstalk, verify that the environments are up and restart the servers for the environments
Set up a connection to the repo database and set the stack to read-only mode.
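For reference, the properties in the Execute Shell Command end up looking something like this illustrative excerpt (assuming stack 446 with release-446.0 artifacts and the green subnet; the job's real command is authoritative):

org.sagebionetworks.instance=446
org.sagebionetworks.beanstalk.version.repo=release-446.0
org.sagebionetworks.beanstalk.version.workers=release-446.0
org.sagebionetworks.beanstalk.version.portal=release-446.0
org.sagebionetworks.vpc.subnet.color=green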
Final migrations
Migrations are scheduled Tue-Fri night and take 2-3 hours. The goal is to be around 30 mins when going into the final migration (Sat night or Sun night, depending on the progress of the table/search building and/or the availability of the migrator). To achieve that, we take advantage of the decrease in activity during the weekend and start several migrations during the day (at least one in the morning, one mid-afternoon to get the time to around 1 hr, then several back to back until the migration time dips towards 30 mins). In between migrations, perform final verifications, such as checking that the deployed artifacts are the latest builds and completing any validation left over from the stack meeting. Once we get to back-to-back migrations, the job is usually set up to keep the destination in read-only mode.
Go to migration-prod-staging
When the migration time is around 30 mins, Configure > in 'Execute shell', change the value of '-Dorg.sagebionetworks.remain.read.only.mode' from 'false' to 'true'. From this point on, the staging stack will stay in read-only mode when migration is done.
The final migration, where both prod and staging are in read-only mode, is usually timed for early evening to minimize the impact on users. Back-to-back migrations start around 2 hrs prior to that; at some point the migration time stops decreasing. That's the clue that it's ready to go.
Final migration - read-only mode
Connect to the repo database for the production stack (<N-1>) and set the stack to read-only mode
Go to migration-prod-staging and start the last migration.
At the end of the job, you should only see a couple of lines with (null, null) values; these are migrateable tables that are empty:
04:34:38,031 INFO - Currently running: 0 restore jobs. Waiting to start 0 restore jobs.
04:34:38,158 INFO - MULTIPART_UPLOAD_COMPOSER_PART_STATE: mins:(null, null) maxes:(null, null)
04:34:38,159 INFO - COMMENT: mins:(null, null) maxes:(null, null)
04:34:38,159 INFO - migration successful
04:34:38,159 INFO - Destination remains in READ-ONLY mode.
If you see anything else, do not go live without a clear understanding of what happened.
Open an issue naming the problematic type, and we'll discuss on Monday.
Promoting staging stack to production
Verify the versions for each stack
Go to build-synapse-prod-bind-nlb-to-alb
Build with parameters
specify N for 'REPO_PROD_INSTANCE_AND_VERSION' and 'PORTAL_PROD_INSTANCE_AND_VERSION'
specify N+1 for 'REPO_STAGING_INSTANCE_AND_VERSION' and 'PORTAL_STAGING_INSTANCE_AND_VERSION' (see the illustrative values below)
It takes about 5 minutes for the operation to complete
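Illustrative parameter values, assuming stack 445 is being promoted over 444 and stack 446 is the upcoming staging stack, all with beanstalk number 0 (the <release-number>-<beanstalk-number> format is noted in the patching section below):

REPO_PROD_INSTANCE_AND_VERSION=445-0
PORTAL_PROD_INSTANCE_AND_VERSION=445-0
REPO_STAGING_INSTANCE_AND_VERSION=446-0
PORTAL_STAGING_INSTANCE_AND_VERSION=446-0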
Verify the versions for each stack
Set the production stack to read-write mode
Connect to the repo database and set the status to READ_WRITE.
Create an invalidation for the CDN
Go to the AWS console, CloudFront > look for distribution E3BS1ZI5UHV8OS
On the Invalidations tab, click 'Create invalidation' and specify '/*' for the object path
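The same invalidation can be created from the AWS CLI (assuming credentials for the prod account are configured):

aws cloudfront create-invalidation --distribution-id E3BS1ZI5UHV8OS --paths '/*'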
Connect to https://www.synapse.org and verify that it works (e.g. create an annotation)
Removing old production stack
Go to the AWS console, CloudFormation
In the list of stacks
for each stack in ('portal-prod-<N-1>', 'repo-prod-<N-1>', 'workers-prod-<N-1>')
select the stack
Stack Actions > Edit Termination Protection > Deactivated
Delete
Wait for all the stacks to be deleted before deleting the shared resources stack
select 'prod-<N-1>-shared-resources' > Stack Actions > Edit Termination Protection > Deactivated (where <N-1> is just the version number, e.g. 444)
Go to delete-cloudformation-stack
Build with Parameters > replace '999' with the version above
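For reference, the per-stack console steps above map to these AWS CLI calls (a sketch, assuming prod-account credentials; stack number 444 is illustrative, and the shared-resources stack must still wait for these deletions to finish):

# Disable termination protection, then delete each old prod stack
for stack in portal-prod-444 repo-prod-444 workers-prod-444; do
  aws cloudformation update-termination-protection --stack-name $stack --no-enable-termination-protection
  aws cloudformation delete-stack --stack-name $stack
done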
Initial migration
At this point, you have a live production stack and an empty staging stack. You can start migration for next week.
Go to migration-prod-staging
Build
Note: typically after starting the job, I change the ‘org.sagebionetworks.remain.read.only.mode’ back to ‘false’ and queue up another job
Patching
Patching is deploying changes (parameters or artifacts) to a running stack. It happens routinely on the staging stack as we’re finding and fixing issues, less often on the production stack.
Patching staging
Patching staging is essentially deploying staging: just update the artifact versions in the 'Update the configuration of the build-synapse-staging-stack job' section above and run the job.
Patching production
Patching production is essentially deploying a ‘-1’ environment, mapping it as a ‘tst’ endpoint, validating the fixes and then promoting it to production.
Deploy ‘-1’ environment
Same as above, but specify '1' for the beanstalk numbers ('repo' and 'portal' only; 'workers' is always '0').
Map environment to ‘tst’ website
Go to http://build-system-ops.prod.sagebase.org:8080/job/build-synapse-prod-bind-nlb-to-alb/
Configure
In Execute Shell Command, edit the values for 'repotst.prod.sagebase.org' and 'tst.synapse.org' from 'none' to 'repo-prod-<stack-instance>-1' and 'portal-prod-<stack-instance>-1', where <stack-instance> is the release number you're on (e.g. '473')
Build with parameters
CAUTION: Make sure you enter the correct (<release-number>-<beanstalk-number>) parameters for the running stack (i.e. what was input when staging was promoted to prod above).
Promote to production
Go to http://build-system-ops.prod.sagebase.org:8080/job/build-synapse-prod-bind-nlb-to-alb/
Configure
In Execute Shell Command, set the values for 'repotst.prod.sagebase.org' and 'tst.synapse.org' back to 'none'
Build with parameters
CAUTION: Make sure you enter the correct (<release-number>-<beanstalk-number>) parameters for the running stack (i.e. what was input when staging was promoted to prod above).