Deployment / QA Team
Sage Bionetworks does not have separate operations, QA, and development teams. Instead, we have a cross-functional team where engineers contribute to all these areas, following the principles of the DevOps movement.
Role Rotations
At any given time, two team members make up the deployment team. In 2013, Xa will serve as a permanent member of the deployment team. Each week, one additional person will rotate onto the team to assist with deployment team duties, according to the 2013 Deployment Team Rotation. During the rotation, the high-level responsibilities of the deployment team are:
- To keep production systems stable for end users. This is the highest priority.
- To be highly interruptible in order to deal with issues with the live production service. Team members will be the first line of defense in responding to issues with production systems. The team has the (still metaphorical) pager.
- To manage the logistics of creating release branches, updating production and staging environments, and managing automated and manual QA and testing of the production and staging systems
- To identify issues that make deployment and operation of Synapse difficult, and create Jira issues as appropriate. By default, assign build / test / stack builder issues to Xa.
- To ensure deployment and operational documentation is up to date in the wiki. This includes accurately recording in the Platform AWS Log any activities that touch the production environment.
- Proposal New in 2013: To create a summary of issues resolved in a stack moving to staging (e.g. list of closed JIRAs emailed to platform) and validate fixes to those issues.
- Proposal New in 2013: To create release notes for a stack moving to production and send summary to end users of significant new fixes.
It is not the responsibility of the team to make non-critical changes to the deployment process or environment, any more than it is their responsibility to fix any bugs or make any feature improvements they uncover as part of their work. Instead, the team should open Jira issues for any non-critical issues and assign them to the appropriate engineer to address as part of their "normal" development work. These issues will be triaged and managed like any other work items in our queue. Deployment team members will continue to work on their normal issue queue, although it's understood that their productivity will decrease as a result of fulfilling their deployment team role.
Team Activities
In general, the deployment team should try to work as a pair, particularly on any task which touches the production system. Also, any time one of the two team members is new, activities should be performed as a pair.
Monday Staff Meeting (10AM, M1-C103)
Our Monday staff meeting is designed to be operationally focused. Prior to the meeting, the deployment team should make a final QA pass over last week's staging environment. At this meeting, we will:
- Identify the next rotation in the deployment team role.
- Come to a clear go / no go decision on putting the existing staging stack into production Monday evening.
- Define any issues that need to be resolved before a new release branch of develop can be created Monday evening or Tuesday morning.
- Define the new features / bug fixes moving to staging and production so that we can prepare communications to our end users.
If you're on the deployment team, make sure you are clear on these items before the meeting ends!
Monday Evening
We want to minimize the impact of upgrades on users by doing them outside normal business hours, and at the beginning of the week when there is plenty of time to respond to issues.
- Cut over CNAMEs to put last week's staging environment into production
- Smoke test again on the public CNAMEs
- Send out email to synapse-users announcing new functionality available on production, and expected tomorrow for staging
- Decommission old production stack, leave data intact.
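The CNAME cutover in the first step above could be prepared as data before anything is changed, so that the pair can review it together. The sketch below assumes DNS is managed in Amazon Route 53 and builds a change batch in the shape that service accepts; the function name, hostnames, and TTL are hypothetical and not taken from our actual setup.

```python
def cname_cutover_batch(public_names, new_stack_targets, ttl=300):
    """Build a Route 53-style change batch that repoints public CNAMEs.

    public_names: public hostnames to repoint (hypothetical values).
    new_stack_targets: maps each public hostname to the hostname of the
        stack being promoted (last week's staging stack).
    """
    # Refuse to build a partial cutover: every public name needs a target.
    missing = [name for name in public_names if name not in new_stack_targets]
    if missing:
        raise ValueError("no target for: " + ", ".join(missing))

    changes = []
    for name in public_names:
        changes.append({
            "Action": "UPSERT",  # create the record or overwrite the old one
            "ResourceRecordSet": {
                "Name": name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": new_stack_targets[name]}],
            },
        })
    return {"Comment": "Promote staging stack to production", "Changes": changes}


if __name__ == "__main__":
    batch = cname_cutover_batch(
        ["repo.example.org"],
        {"repo.example.org": "repo-stack-42.example.org"},
    )
    print(batch)
```

If Route 53 and boto3 are in use, a batch like this could be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the hosted zone ID. Keeping the batch construction separate from the call makes it easy to review the change before it touches production.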
Tuesday Morning
These activities should be done first thing Tuesday morning (or Monday evening if preferred):
- Create a new release branch off of origin/develop. This will be used to put code changes from last week's Standard Release Cycle into staging.
- Send an email to platform announcing that origin/develop is open for this week's work.
- Ensure we have an automated build of the new release branch working
- Create the initial tagged release candidates for staging on the master branch as release 1.n.0. (See our Git Branching Model).
- Build a new staging environment.
- Deploy the new staging release.
- Run the smoke test and any other automated tests that require the full environment to execute
- Do some QA on staging, focusing on work done in the previous week. The goal here is to identify issues and log them in Jira, with recommendations for when the issues should be addressed.
- Switch the staging CNAMEs over to make this stack the live staging stack. Send email announcing its availability to platform + any interested end users (not full synapse-users, how should we manage this?)
In general, developers should think of themselves as operations and QA personnel for a day while building up the new staging stack.
These are the step-by-step instructions for doing the above: Staging Deployment, Step by Step
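As a rough illustration of the branch and tag steps above, the sketch below generates the git commands for cutting a release branch and works out the next tag name for release 1.n. The branch name `release-1.n` is a hypothetical convention, and numbering patch releases as 1.n.1, 1.n.2, and so on is an assumption; the step-by-step instructions and our Git Branching Model remain the authority.

```python
import re

# Matches tags of the form 1.n.p, e.g. "1.7.0".
TAG_PATTERN = re.compile(r"^1\.(\d+)\.(\d+)$")


def release_branch_commands(n, remote="origin", source="develop"):
    """Return the git commands that cut a release branch from remote/source."""
    branch = f"release-1.{n}"  # hypothetical branch naming convention
    return [
        ["git", "fetch", remote],
        ["git", "checkout", "-b", branch, f"{remote}/{source}"],
        ["git", "push", "-u", remote, branch],
    ]


def next_release_tag(existing_tags, n):
    """Return 1.n.0 for a new release, else the next patch tag for 1.n.

    Assumes patch releases increment the third component of the tag.
    """
    patches = []
    for tag in existing_tags:
        match = TAG_PATTERN.match(tag)
        if match and int(match.group(1)) == n:
            patches.append(int(match.group(2)))
    next_patch = max(patches) + 1 if patches else 0
    return f"1.{n}.{next_patch}"


if __name__ == "__main__":
    for command in release_branch_commands(7):
        print(" ".join(command))
    print(next_release_tag(["1.6.0", "1.6.1"], 7))
```

The commands are returned as lists rather than executed, so they can be checked by both members of the pair (or passed to `subprocess.run`) before anything is pushed.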
Rest of Week
- Respond to issues on production as appropriate. Top priority is to ensure best experience possible for end users as issues arise.
- Manage the release of patches to staging and production as appropriate. Note, the deployment team is responsible for accepting pull requests into the release branch that are destined for patch releases.
- Do a little QA and monitoring work as appropriate
- Try to get some of your "normal" work done