Elastic Container Registry with Synapse

UPDATE: The idea described below turns out NOT to work. The details of our investigation are here. We are leaving this document in Confluence for future reference.

Synapse has a Docker registry, implemented using Docker’s open source registry. Two copies run, one on each of a pair of EC2 instances behind an ALB. On each EC2 instance we run the registry as a Docker container obtained from Docker Hub. There are several open issues with our current implementation:

1. Site Reliability Engineering: The infrastructure was deployed manually and should be scripted using CloudFormation for reproducibility (see PLFM-7259);
2. Upgrading the registry: There is no process for updating the running version of the open source registry;
3. Scanning the registry server: FISMA/FedRAMP requires that we scan both the EC2 instances and the registry container instances, the latter prior to deployment as part of an automated CI/CD process (i.e., as part of addressing (1), above);
4. Scanning the registry contents: Today we do not scan images loaded into the Synapse registry for vulnerabilities. We have open issues to evaluate and implement a registry scanner (see PLFM-7429).

All of the above could be addressed by using AWS Elastic Container Registry (ECR) rather than the open source registry. ECR is a hosted solution which would eliminate the infrastructure we have deployed and the first three issues listed above. Further, ECR has a scanning capability, providing an immediate solution to the last issue.

The open source registry allows delegating authorization, and we use this feature to let Synapse permissions control which Docker repositories a user can access (push/pull). We have avoided ECR because it lacks this feature: it requires that authorization be done using IAM policies. However, it is possible to map Synapse permissions to fine-grained (repository-level) permissions using AWS STS with inline policies (much as we do when issuing STS tokens to access Synapse files). The flow of web requests would be:

1. The Docker client makes a request (push/pull) to the registry endpoint, which we would map to a Synapse request.
2. Since the request has no authentication header, a 401 response is returned.
3. The Docker client repeats the request, passing a user name + password or personal access token. Synapse evaluates the request against the user’s permissions, creates a short-lived STS token scoped to the requested repository, and uses it to obtain a Docker authorization token.
4. Synapse redirects the client (using a 307 response) to the actual ECR endpoint, including the auth token in the redirect URL. In the case of a docker push operation, Synapse first calls the ECR CreateRepository service before returning the redirect.
   UPDATE: At this point the flow fails because (as far as we can tell) ECR itself redirects the client to other URL(s), expecting the client to authenticate the request(s) with the Docker authorization token. Since the client does not retain the token, requests to the specified URL(s) fail. Details here.
5. ECR completes the push or pull operation.
6. ECR emits an event which we can connect to a chosen Synapse API using EventBridge. For PUSH and DELETE events, Synapse will create, update or delete the corresponding Synapse entity. (Synapse already has a webhook for events: https://rest-docs.synapse.org/rest/POST/events.html. This API would be similar.)
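
To make the 401 challenge and 307 redirect described above concrete, here is a minimal Python sketch of the decision Synapse would make for each incoming registry request. It only illustrates the proposed flow, which (per the UPDATE above) does not work against ECR in practice. The use of HTTP Basic authentication, the realm string, the 403 response for denied access, the ECR host name (built from the account and region in the example policy in this document), and the `issue_ecr_credentials` callable are all assumptions made for illustration.

```python
import base64
from urllib.parse import quote

# Assumed values for illustration only.
ECR_HOST = "325565585839.dkr.ecr.us-east-1.amazonaws.com"
REALM = 'Basic realm="Synapse Docker registry"'


def parse_basic_auth(header):
    """Return (username, secret) from a Basic Authorization header, or None."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, secret = decoded.partition(":")
    return (username, secret) if sep else None


def handle_registry_request(path, headers, issue_ecr_credentials):
    """Challenge unauthenticated requests; redirect authenticated ones.

    issue_ecr_credentials is a hypothetical callable
    (username, secret, path) -> (ecr_user, ecr_password), or None when
    Synapse permissions deny the request.
    Returns (status_code, response_headers).
    """
    creds = parse_basic_auth(headers.get("Authorization"))
    if creds is None:
        # No usable authentication header: ask the client to retry.
        return 401, {"WWW-Authenticate": REALM}
    ecr_creds = issue_ecr_credentials(creds[0], creds[1], path)
    if ecr_creds is None:
        return 403, {}
    user, password = ecr_creds
    # Put the ECR credentials in the redirect URL, percent-encoded.
    location = (
        f"https://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{ECR_HOST}{path}"
    )
    return 307, {"Location": location}
```

Passing the credential lookup in as a callable keeps the HTTP decision logic separate from the STS and ECR calls, so it can be exercised without AWS access.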

The implementation is simple. The most complex step is step 3, above, which works as follows:

1. Synapse determines the repository of interest from the URL*.
2. Synapse determines the user’s permissions on the repository (intersected with the scope of the PAT).
3. Synapse invokes the STS AssumeRole service for a role which is scoped to ECR, and applies an inline policy of the form:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "GetAuthorizationToken",
          "Effect": "Allow",
          "Action": [
            "ecr:GetAuthorizationToken"
          ],
          "Resource": "*"
        },
        {
          "Sid": "ManageRepositoryContents",
          "Effect": "Allow",
          "Action": [
            "ecr:BatchCheckLayerAvailability",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetRepositoryPolicy",
            "ecr:DescribeRepositories",
            "ecr:ListImages",
            "ecr:DescribeImages",
            "ecr:BatchGetImage",
            "ecr:InitiateLayerUpload",
            "ecr:UploadLayerPart",
            "ecr:CompleteLayerUpload",
            "ecr:PutImage"
          ],
          "Resource": "arn:aws:ecr:us-east-1:325565585839:repository/syn123456/my/repo"
        }
      ]
    }

4. Synapse then calls the ECR GetAuthorizationToken service, authenticated by the STS token returned by the previous AssumeRole request.
5. The token is Base64-decoded to get the username and password to put in the redirect URL.
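
The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the Synapse implementation: the split of the policy's actions into read and write groups, the `allow_push` flag, the session name, and the `make_ecr_client` factory are hypothetical. The `sts` argument is expected to behave like a boto3 STS client, and the object returned by `make_ecr_client` like a boto3 ECR client.

```python
import base64
import json

# Actions taken from the example policy; grouping them by pull vs. push
# is an assumption made for this sketch.
READ_ACTIONS = [
    "ecr:BatchCheckLayerAvailability",
    "ecr:GetDownloadUrlForLayer",
    "ecr:GetRepositoryPolicy",
    "ecr:DescribeRepositories",
    "ecr:ListImages",
    "ecr:DescribeImages",
    "ecr:BatchGetImage",
]
WRITE_ACTIONS = [
    "ecr:InitiateLayerUpload",
    "ecr:UploadLayerPart",
    "ecr:CompleteLayerUpload",
    "ecr:PutImage",
]


def build_inline_policy(repository_arns, allow_push):
    """Build the inline session policy for the given repository ARNs."""
    actions = READ_ACTIONS + (WRITE_ACTIONS if allow_push else [])
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetAuthorizationToken",
                "Effect": "Allow",
                "Action": ["ecr:GetAuthorizationToken"],
                "Resource": "*",
            },
            {
                "Sid": "ManageRepositoryContents",
                "Effect": "Allow",
                "Action": actions,
                "Resource": list(repository_arns),
            },
        ],
    }


def issue_docker_credentials(sts, make_ecr_client, role_arn, policy):
    """Assume the ECR role with the inline policy, then fetch and decode
    the ECR authorization token into (username, password)."""
    assumed = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="synapse-docker",  # hypothetical session name
        Policy=json.dumps(policy),
        DurationSeconds=900,  # shortest session duration STS accepts
    )
    # make_ecr_client is a hypothetical factory that builds an ECR client
    # from the temporary credentials.
    ecr = make_ecr_client(assumed["Credentials"])
    data = ecr.get_authorization_token()["authorizationData"][0]
    decoded = base64.b64decode(data["authorizationToken"]).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password
```

With boto3, `make_ecr_client` could be a small function that calls `boto3.client("ecr", ...)` with the `aws_access_key_id`, `aws_secret_access_key` and `aws_session_token` taken from the `AccessKeyId`, `SecretAccessKey` and `SessionToken` fields of the returned credentials.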

* Some Docker commands use multiple repositories. E.g., if a user pushes an image having layers shared with another repository already in the registry, and if the user has permission to pull from the other repository, then the Docker registry can take advantage of this, reuse the layers already pushed, and make the push operation go much faster. We need to ensure that the STS token has access to all the repositories necessary for the Docker operation being performed.
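
As a rough sketch of how the repositories involved in a request could be read from the URL, the following Python function extracts the target repository from a Registry HTTP API V2 path and, for a cross-repository blob mount (a blob upload request carrying `mount` and `from` query parameters), the source repository as well. The function name is hypothetical, and the sketch assumes repository names never contain the path segments `manifests`, `blobs` or `tags`.

```python
from urllib.parse import parse_qs, urlsplit

# Path segments that follow the repository name in Registry V2 URLs.
_MARKERS = ("/manifests/", "/blobs/", "/tags/")


def repositories_for_request(url):
    """Return (target_repo, source_repo_or_None) for a /v2/ registry URL."""
    parts = urlsplit(url)
    path = parts.path
    if not path.startswith("/v2/"):
        raise ValueError("not a registry API path: " + path)
    rest = path[len("/v2/"):]
    # Repository names may contain slashes, so cut at the last marker.
    cut = max(rest.rfind(marker) for marker in _MARKERS)
    if cut <= 0:
        raise ValueError("no repository name in path: " + path)
    target = rest[:cut]
    query = parse_qs(parts.query)
    # A cross-repository mount names the repository to copy layers from.
    if "mount" in query and "from" in query:
        return target, query["from"][0]
    return target, None
```

Both returned repositories would then need to be covered by the inline policy: the target with the permissions for the requested operation, and the source, if any, with pull permission.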

Content Migration

We have to migrate the content of the current Docker registry to ECR. A tool for doing this is here: https://hub.docker.com/r/docker/migrator. Note that there is a discrepancy between the list of repositories returned by the registry and the list of repositories indexed in Synapse, the former being much larger. We should investigate the discrepancy and see if we can eliminate the ‘orphaned’ repositories.
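
One possible way to start investigating the discrepancy is to list every repository the registry reports and subtract those indexed in Synapse. The Python sketch below assumes the registry listing comes from the paginated `/v2/_catalog` endpoint. `fetch_page` is a hypothetical callable standing in for the HTTP request, and how the Synapse-indexed names are obtained is left open.

```python
def list_catalog(fetch_page, page_size=1000):
    """Collect all repository names from a paginated catalog listing.

    fetch_page is a hypothetical callable (n, last) -> list of names,
    standing in for GET /v2/_catalog?n=<n>&last=<last>.
    """
    names, last = [], None
    while True:
        page = fetch_page(page_size, last)
        names.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing more to fetch.
            return names
        last = page[-1]


def find_orphaned_repositories(registry_repos, synapse_repos):
    """Return repositories present in the registry but not in Synapse."""
    return sorted(set(registry_repos) - set(synapse_repos))
```

The resulting list is only a set of candidates. Each repository would still need to be reviewed before being excluded from the migration.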