Synapse VPC and VPN

Starting in May 2018, the Synapse infrastructure will be deployed within an AWS Virtual Private Cloud (VPC).  The infrastructure is composed of three main environments: web UI (portal), service APIs (repo), and workers.  Each environment is deployed as an Elastic Beanstalk environment composed of load balancers, EC2 instances, and a MySQL database.  This document outlines the networking architecture of the Synapse VPC.

Blue Green Deployment

Since the beginning, the Synapse development team has used a variation of blue-green deployment for continuous delivery of releases.  In the book Continuous Delivery, the authors describe blue-green deployment as the practice of maintaining two production stacks, named Blue and Green, at all times.  At any given time only one of the two stacks is 'live' and accessible to end users.  The 'live' stack is considered 'production' while the 'inactive' stack is considered 'staging'.  All maintenance and code patches are applied to the inactive staging stack, where the changes can go through validation and quality control.  When the staging stack passes both validation and QA, it is ready to become production.  The switch from staging to production is done by routing all network traffic to the new stack; specifically, the production CNAMEs are changed to point to the new stack.  With the commodification of cloud computing, it is no longer necessary to maintain and swap two physical stacks.  Instead, a new staging stack can be created each week in the cloud, and once the old production stack becomes inactive after the swap it can simply be deleted.  This means Synapse still maintains two production-capable stacks at all times, but we no longer call one Blue and the other Green.  Instead, each stack is assigned a new number designation.  For example, the current production stack is designated 225 while the current staging stack is designated 226.

Return of Colors

Prior to switching to a VPC, each EC2 and database instance in the Synapse stack was issued a public IP address and was visible to the public internet.  While each machine was still protected by a firewall (Security Group), SSH keys, and passwords, it was theoretically possible for attackers to probe the publicly visible machines for weaknesses.  By switching to a VPC, it is possible to completely hide the EC2 and database instances from the public internet.



At a high level, a VPC is a named container for a virtual network.  For example, the Synapse production VPC was assigned a CIDR of 10.20.0.0/16, which means the Synapse network contains 65,536 IP addresses between 10.20.0.0 and 10.20.255.255.  The VPC is further divided into subnets, each declared as either public or private.  Instances in public subnets can be assigned public IP addresses and can therefore be seen on the public internet.  Conversely, instances deployed in private subnets do not have public IP addresses and cannot be seen from the public internet; they are only visible to machines deployed within the containing VPC.  We will cover how internal developers can access machines in private subnets in a later section.
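The CIDR arithmetic above can be verified with a short sketch using Python's standard `ipaddress` module (the addresses are the Synapse production values described in this document):

```python
import ipaddress

# The Synapse production VPC CIDR.
vpc = ipaddress.ip_network("10.20.0.0/16")

print(vpc.num_addresses)     # 65536 addresses
print(vpc[0], vpc[-1])       # 10.20.0.0 10.20.255.255

# A private subnet carved out of the VPC (RedPrivateUsEast1a).
subnet = ipaddress.ip_network("10.20.48.0/24")
print(subnet.subnet_of(vpc)) # True: the subnet falls inside the VPC range
```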

Figure 1. Synapse VPC

The Synapse VPC is divided into six public and twenty-four private subnets that span six Availability Zones (AZs): us-east-1a through us-east-1f (see Figure 1).  Deploying all instances (EC2s and databases) across six zones ensures redundancy should an outage occur in a single zone (such as the event in April 2011), and broadens the availability of instance types.  For details on each subnet see Table 1.



| Subnet Name | Type | CIDR | First | Last | Total |
|---|---|---|---|---|---|
| PublicUsEast1a | Public | 10.20.0.0/21 | 10.20.0.0 | 10.20.7.255 | 2048 |
| PublicUsEast1b | Public | 10.20.8.0/21 | 10.20.8.0 | 10.20.15.255 | 2048 |
| PublicUsEast1c | Public | 10.20.16.0/21 | 10.20.16.0 | 10.20.23.255 | 2048 |
| PublicUsEast1d | Public | 10.20.24.0/21 | 10.20.24.0 | 10.20.31.255 | 2048 |
| PublicUsEast1e | Public | 10.20.32.0/21 | 10.20.32.0 | 10.20.39.255 | 2048 |
| PublicUsEast1f | Public | 10.20.40.0/21 | 10.20.40.0 | 10.20.47.255 | 2048 |
| RedPrivateUsEast1a | Private | 10.20.48.0/24 | 10.20.48.0 | 10.20.48.255 | 256 |
| RedPrivateUsEast1b | Private | 10.20.49.0/24 | 10.20.49.0 | 10.20.49.255 | 256 |
| RedPrivateUsEast1c | Private | 10.20.50.0/24 | 10.20.50.0 | 10.20.50.255 | 256 |
| RedPrivateUsEast1d | Private | 10.20.51.0/24 | 10.20.51.0 | 10.20.51.255 | 256 |
| RedPrivateUsEast1e | Private | 10.20.52.0/24 | 10.20.52.0 | 10.20.52.255 | 256 |
| RedPrivateUsEast1f | Private | 10.20.53.0/24 | 10.20.53.0 | 10.20.53.255 | 256 |
| BluePrivateUsEast1a | Private | 10.20.56.0/24 | 10.20.56.0 | 10.20.56.255 | 256 |
| BluePrivateUsEast1b | Private | 10.20.57.0/24 | 10.20.57.0 | 10.20.57.255 | 256 |
| BluePrivateUsEast1c | Private | 10.20.58.0/24 | 10.20.58.0 | 10.20.58.255 | 256 |
| BluePrivateUsEast1d | Private | 10.20.59.0/24 | 10.20.59.0 | 10.20.59.255 | 256 |
| BluePrivateUsEast1e | Private | 10.20.60.0/24 | 10.20.60.0 | 10.20.60.255 | 256 |
| BluePrivateUsEast1f | Private | 10.20.61.0/24 | 10.20.61.0 | 10.20.61.255 | 256 |
| GreenPrivateUsEast1a | Private | 10.20.64.0/24 | 10.20.64.0 | 10.20.64.255 | 256 |
| GreenPrivateUsEast1b | Private | 10.20.65.0/24 | 10.20.65.0 | 10.20.65.255 | 256 |
| GreenPrivateUsEast1c | Private | 10.20.66.0/24 | 10.20.66.0 | 10.20.66.255 | 256 |
| GreenPrivateUsEast1d | Private | 10.20.67.0/24 | 10.20.67.0 | 10.20.67.255 | 256 |
| GreenPrivateUsEast1e | Private | 10.20.68.0/24 | 10.20.68.0 | 10.20.68.255 | 256 |
| GreenPrivateUsEast1f | Private | 10.20.69.0/24 | 10.20.69.0 | 10.20.69.255 | 256 |
| OrangePrivateUsEast1a | Private | 10.20.72.0/24 | 10.20.72.0 | 10.20.72.255 | 256 |
| OrangePrivateUsEast1b | Private | 10.20.73.0/24 | 10.20.73.0 | 10.20.73.255 | 256 |
| OrangePrivateUsEast1c | Private | 10.20.74.0/24 | 10.20.74.0 | 10.20.74.255 | 256 |
| OrangePrivateUsEast1d | Private | 10.20.75.0/24 | 10.20.75.0 | 10.20.75.255 | 256 |
| OrangePrivateUsEast1e | Private | 10.20.76.0/24 | 10.20.76.0 | 10.20.76.255 | 256 |
| OrangePrivateUsEast1f | Private | 10.20.77.0/24 | 10.20.77.0 | 10.20.77.255 | 256 |

Table 1.  Synapse VPC 10.20.0.0/16 subnets



Given that subnet addresses cannot overlap and must be allocated from a fixed range of IP addresses (defined by the VPC), it would be awkward to dynamically allocate new subnets for each new stack.  Instead, we decided to create fixed, permanent subnets to deploy new stacks into each week, thus returning to the blue-green naming scheme.  Since we occasionally need three production-capable stacks running at a time, we also included a set of Red subnets.  We will continue to give each new stack a numeric designation, but we will also assign a color to each new stack, with colors assigned in a round-robin manner.  For example, stack 226 will be deployed to Red, 227 to Blue, 228 to Green, 229 to Red, and so on.  The Orange subnets are reserved for shared resources such as the ID generator database.
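The round-robin color assignment can be sketched as follows (a hypothetical helper for illustration; the actual deployment tooling is not shown in this document):

```python
# Hypothetical sketch of the round-robin color assignment described above.
COLORS = ["Red", "Blue", "Green"]

def stack_color(stack_number: int, base_stack: int = 226) -> str:
    """Stack 226 -> Red, 227 -> Blue, 228 -> Green, 229 -> Red, and so on."""
    return COLORS[(stack_number - base_stack) % len(COLORS)]

print([f"{n}:{stack_color(n)}" for n in range(226, 230)])
# ['226:Red', '227:Blue', '228:Green', '229:Red']
```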



Public subnets

Any machine that must be publicly visible must be deployed to a public subnet.  Since it is not possible to isolate public subnets from each other, there was little value in creating a public subnet for each color; instead, one public subnet per availability zone was created.  Each public subnet contains the Elastic Beanstalk load balancers for each environment (portal, repo, workers) of each stack.  There is also a NAT Gateway deployed to each public subnet (one NAT Gateway is needed per AZ).  We will cover the details of the NAT Gateways in a later section.

Each subnet has a Route Table that answers the question of how network traffic flows out of the subnet.  Table 2 shows the route table used by all six public subnets.  Note: The most selective route will always be used.



| Destination | Target | Description |
|---|---|---|
| 10.20.0.0/16 | local | The default entry: all addresses within this VPC (10.20.0.0/16) can be found within this VPC (local) |
| 10.1.0.0/16 | VPC Peering Connection | If the destination IP address is in the VPN VPC (10.1.0.0/16), use the VPC peering connection that connects the two VPCs |
| 0.0.0.0/0 | Internet Gateway | If the destination is any other IP address (0.0.0.0/0), use the VPC's Internet Gateway |

Table 2. Public subnet route table entries
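The "most selective route" rule is longest-prefix matching; a minimal sketch over the Table 2 entries (the target names are illustrative labels, not AWS resource identifiers):

```python
import ipaddress

# Route table entries from Table 2: destination CIDR -> target (illustrative labels).
ROUTES = {
    "10.20.0.0/16": "local",
    "10.1.0.0/16": "vpc-peering-connection",
    "0.0.0.0/0": "internet-gateway",
}

def route_for(ip: str) -> str:
    """Return the target of the most specific (longest prefix) matching route."""
    addr = ipaddress.ip_address(ip)
    best_net, best_target = None, None
    for cidr, target in ROUTES.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_target = net, target
    return best_target

print(route_for("10.20.5.1"))  # local (matches both /16 and /0; /16 is more specific)
print(route_for("10.1.2.3"))   # vpc-peering-connection
print(route_for("8.8.8.8"))    # internet-gateway
```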



Each load balancer deployed in the public subnets is protected by a firewall called a Security Group that blocks access to all ports except 80 (HTTP) and 443 (HTTPS).  See Table 3 for the details of the load balancer Security Groups.



| Type | CIDR | Port Range | Description |
|---|---|---|---|
| HTTP | 0.0.0.0/0 | 80 | Allow all traffic in on port 80 (HTTP) |
| HTTPS | 0.0.0.0/0 | 443 | Allow all traffic in on port 443 (HTTPS) |

Table 3. Security Groups for Elastic Beanstalk load balancers deployed to public subnets



Private Subnets

As stated above, any machine deployed to a private subnet is not visible from outside the VPC.  The EC2 and database instances of a stack are deployed to the private subnets of a single color.  For example, if stack 226 is configured to be Red, the EC2s and databases for that stack will be deployed to Red subnets such as RedPrivateUsEast1a and RedPrivateUsEast1b.  The private subnets of each color are assigned CIDRs such that all of the addresses within a color's private subnets fall within a single CIDR (see Table 4).  This means the CIDR of each color can be used to isolate each stack: for example, instances in Red subnets cannot see instances in Blue subnets, or vice versa.  Table 5 shows an example of the Security Group applied to the EC2 instances within the Red subnets using the color group CIDR.  Table 6 shows an example of the Security Group applied to the MySQL databases deployed in the Red private subnets.



| Color Group | CIDR | First | Last | Total |
|---|---|---|---|---|
| Red | 10.20.48.0/21 | 10.20.48.0 | 10.20.55.255 | 2048 |
| Blue | 10.20.56.0/21 | 10.20.56.0 | 10.20.63.255 | 2048 |
| Green | 10.20.64.0/21 | 10.20.64.0 | 10.20.71.255 | 2048 |

Table 4. CIDR for each Color Group
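The isolation property of the color groups can be illustrated with a small sketch: a single containment check against the color group CIDR determines which stack an address belongs to (the helper name is hypothetical):

```python
import ipaddress

# Color group CIDRs from Table 4.
COLOR_GROUPS = {
    "Red": "10.20.48.0/21",
    "Blue": "10.20.56.0/21",
    "Green": "10.20.64.0/21",
}

def color_of(ip: str):
    """Return the color group containing the address, or None if outside all groups."""
    addr = ipaddress.ip_address(ip)
    for color, cidr in COLOR_GROUPS.items():
        if addr in ipaddress.ip_network(cidr):
            return color
    return None

print(color_of("10.20.49.10"))  # Red  (an address in RedPrivateUsEast1b)
print(color_of("10.20.57.5"))   # Blue (an address in BluePrivateUsEast1b)
print(color_of("10.20.0.1"))    # None (a public subnet address)
```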



| Type | CIDR/SG | Port Range | Description |
|---|---|---|---|
| HTTP | 10.20.48.0/21 | 80 | Allow machines within any Red private subnet to connect with HTTP |
| SSH | 10.1.0.0/16 | 22 | Allow traffic from the Sage VPN (10.1.0.0/16) to connect with SSH |
| HTTP | Loadbalancer Security Group | 80 | Allow the load balancers from this stack to connect with HTTP |

Table 5.  Security Group applied to EC2 instances in Red Private Subnets



| Type | CIDR | Port Range | Description |
|---|---|---|---|
| MySQL | 10.20.48.0/21 | 3306 | Allow machines within any Red private subnet to connect on port 3306 |
| MySQL | 10.1.0.0/16 | 3306 | Allow traffic from the Sage VPN (10.1.0.0/16) to connect on port 3306 |

Table 6. Security Group applied to MySQL RDS instances in Red Private Subnets



Just like with public subnets, each private subnet has a Route Table that describes how network traffic flows out of the subnet.  Table 7 shows an example of the Route Table on the RedPrivateUsEast1a subnet.



| Destination | Target | Description |
|---|---|---|
| 10.20.0.0/16 | local | The default entry: all addresses within this VPC (10.20.0.0/16) can be found within this VPC (local) |
| 10.1.0.0/16 | VPC Peering Connection | If the destination IP address is in the VPN VPC (10.1.0.0/16), use the VPC peering connection that connects the two VPCs |
| 0.0.0.0/0 | NAT Gateway us-east-1a | If the destination is any other IP address (0.0.0.0/0), use the NAT Gateway within the same availability zone |

Table 7.  The Route Table on RedPrivateUsEast1a subnet



Internet Access from Within a Private Subnet

All of the EC2 instances of Synapse need to make calls to AWS, Google, Jira, etc.  However, each instance is deployed in a private subnet without a public IP address, so how does the response to an outside request make it back to the private machine?  The answer is the Network Address Translation (NAT) Gateway.  Figure 1 shows that each public subnet has a NAT Gateway, and Table 7 shows each private subnet has a route for 0.0.0.0/0 that points to the NAT Gateway within the same AZ.  Each NAT Gateway has a public IP address and acts as a proxy for outgoing requests from instances in the private subnets.  When the NAT receives an outgoing request from a private instance, it replaces the source address of the request with its own public address before forwarding the request.  As a result, the response goes back to the NAT, which then forwards it to the original caller within the private subnet.  Note: The NAT will not respond to requests that originate from the outside world, so it cannot be used to gain access to the private machines.
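A toy model of the translation step described above (the class and the addresses are hypothetical; a real NAT Gateway also rewrites ports per protocol, tracks connection state, times out sessions, etc.):

```python
class NatGateway:
    """Toy model of outbound NAT: rewrite the source to the NAT's public
    address, remember the mapping, and forward responses back to the caller."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self.sessions = {}     # public port -> (private ip, private port)
        self.next_port = 1024  # simplistic port allocator

    def outbound(self, private_ip: str, private_port: int, dest):
        public_port = self.next_port
        self.next_port += 1
        self.sessions[public_port] = (private_ip, private_port)
        # The forwarded request carries the NAT's public address as its source.
        return {"src": (self.public_ip, public_port), "dst": dest}

    def inbound(self, dst_port: int):
        # Responses map back to the original caller; unsolicited traffic has
        # no session entry and is dropped (None).
        return self.sessions.get(dst_port)

nat = NatGateway("203.0.113.10")  # example public IP (RFC 5737 documentation range)
pkt = nat.outbound("10.20.48.15", 53211, ("52.0.0.1", 443))
print(pkt["src"])         # ('203.0.113.10', 1024)
print(nat.inbound(1024))  # ('10.20.48.15', 53211)
print(nat.inbound(9999))  # None -> outside requests cannot reach private machines
```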



While setting up the Synapse VPC, our plan was to utilize Network Access Control Lists (Network ACLs) to further isolate each private subnet.  In fact, we originally set up Network ACLs similar to the Security Groups in Tables 3, 5, and 6.  With those Network ACLs in place, the EC2 instances were unable to communicate with the outside world (for example, 'ping amazon.com' would hang).  The following document outlines important differences between Security Groups and Network ACLs: Comparison of Security Groups and Network ACLs.  The explanation of the observed communication failures is summarized in Table 8.  In short, if a private subnet needs outgoing access to 0.0.0.0/0, then the Network ACL must also allow incoming return traffic from 0.0.0.0/0, because Network ACLs are stateless.



| Security Group | Network ACL |
|---|---|
| Is stateful: return traffic is automatically allowed, regardless of any rules | Is stateless: return traffic must be explicitly allowed by rules |

Table 8. Important differences between Security Groups and Network ACLs
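The stateless behavior can be sketched as follows: every packet, including a response, must match an explicit rule (the rule shapes here are hypothetical simplifications of real Network ACL entries):

```python
import ipaddress

def acl_allows(rules, direction, remote_ip, port):
    """Stateless check: a packet passes only if an explicit rule matches it.
    There is no session memory, so return traffic gets no special treatment."""
    addr = ipaddress.ip_address(remote_ip)
    return any(
        r["direction"] == direction
        and addr in ipaddress.ip_network(r["cidr"])
        and r["ports"][0] <= port <= r["ports"][1]
        for r in rules
    )

# Outbound-only rules: the request leaves, but the response is dropped.
rules = [{"direction": "out", "cidr": "0.0.0.0/0", "ports": (0, 65535)}]
print(acl_allows(rules, "in", "52.94.0.1", 53211))  # False -> 'ping' hangs

# Adding an inbound return-traffic rule (ephemeral ports) fixes it.
rules.append({"direction": "in", "cidr": "0.0.0.0/0", "ports": (1024, 65535)})
print(acl_allows(rules, "in", "52.94.0.1", 53211))  # True
```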