Synapse Worker Architecture
Users interact with Synapse by making web-service calls to the Synapse REST API. All web-service requests are expected to return quickly (30 seconds or less). Typically, the average web-service request time is well under 100 milliseconds across all Synapse web-services. To achieve quick return times, limits must be placed on what can be done on the server thread that handles a user's web-service request. To illustrate these limits, consider the following use cases:
- When a user makes a web-service request to reply to a thread in a forum, all of the users who are watching the thread expect to receive an email notification.
- A user uploads a CSV file containing millions of rows of data to create a new Synapse table.
- A user wishes to download thousands of tiny files bundled into a single large zip file.
In all of these use cases, the work required to fulfill the user's request (either directly or indirectly) can take much longer to complete than the standard web-service timeout allows. In addition, if the Synapse REST API server machines were to perform all of the requested work, they would become unresponsive to additional web-service requests. Therefore, the REST API server machines limit the work performed on the request thread by asynchronously delegating most of the work to a cluster of worker machines. This allows the REST API server machines to both return quickly from any web-service request and remain available for additional web-service requests. This document outlines the architecture that enables the work to be performed asynchronously by a cluster of worker machines.
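The delegation described above can be illustrated with a minimal in-process sketch. It is not Synapse's implementation: a Python thread stands in for a worker machine, a `queue.Queue` stands in for whatever channel connects the API servers to the workers, and the names `handle_request`, `worker_loop`, and `completed` are hypothetical.

```python
import queue
import threading
import time

work_queue = queue.Queue()  # hypothetical stand-in for the server-to-worker channel
completed = []              # records finished tasks so the demo can be inspected


def worker_loop():
    """Runs on the 'worker': pull tasks off the queue and do the slow part."""
    while True:
        task = work_queue.get()
        if task is None:        # sentinel value: shut the worker down
            work_queue.task_done()
            break
        time.sleep(0.05)        # simulate slow work, e.g. sending emails
        completed.append(task)
        work_queue.task_done()


def handle_request(payload):
    """Runs on the 'request thread': enqueue the work and return immediately."""
    work_queue.put(payload)
    return {"status": "accepted"}


worker = threading.Thread(target=worker_loop, daemon=True)
worker.start()

# The request returns before the slow work has been performed.
response = handle_request("notify-watchers-of-thread")

# Demo only: wait for the worker to finish, then stop it.
work_queue.join()
work_queue.put(None)
worker.join()
```

The request thread only pays the cost of enqueueing the task, so it stays free to serve further requests while the worker does the slow part.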
Types of Work
There are two main types of work performed by the cluster of worker machines: indirect work and asynchronous jobs.
- Indirect work - This is work that is performed indirectly after a user makes a web-service request. Sending the notification emails in the first use case is an example of indirect work. While a user's request can trigger indirect work, the user does not track the completion of that work.
- Asynchronous Job - This is work that the user has directly requested Synapse to perform as a long-running task. Both the second and third use cases are examples of asynchronous jobs. Users make web-service requests to both start and track the progress of asynchronous jobs.
Indirect work occurs as a response to synchronous web-service requests, while asynchronous jobs are explicitly started and tracked using web-service requests.
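The start-and-track pattern for asynchronous jobs can be sketched as follows. This is an illustrative sketch only, not the Synapse API: the functions `start_job` and `get_job_status`, the in-memory `jobs` store, and the state names `PROCESSING`, `COMPLETE`, and `FAILED` are all assumptions, and a thread pool stands in for the worker cluster.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

jobs = {}                                     # hypothetical job store: job_id -> status
jobs_lock = threading.Lock()
executor = ThreadPoolExecutor(max_workers=2)  # stand-in for the worker cluster


def _run_job(job_id, rows):
    """Worker side: process the rows and report progress as it goes."""
    try:
        total = 0
        for i, row in enumerate(rows, start=1):
            total += row
            with jobs_lock:
                jobs[job_id]["progress"] = i / len(rows)
        with jobs_lock:
            jobs[job_id].update(state="COMPLETE", result=total)
    except Exception as err:
        with jobs_lock:
            jobs[job_id].update(state="FAILED", error=str(err))


def start_job(rows):
    """First request: register the job, hand it to a worker, return its ID."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"state": "PROCESSING", "progress": 0.0, "result": None}
    executor.submit(_run_job, job_id, rows)
    return job_id


def get_job_status(job_id):
    """Follow-up requests: return a snapshot of the job's current status."""
    with jobs_lock:
        return dict(jobs[job_id])


# Client side: start the job, then poll until it leaves the PROCESSING state.
job_id = start_job(list(range(1000)))
while get_job_status(job_id)["state"] == "PROCESSING":
    time.sleep(0.01)
final_status = get_job_status(job_id)
executor.shutdown()
```

Both calls return quickly. The long-running part happens on the worker, and the client learns the outcome by polling with the job ID.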
Indirect Work
A single synchronous web-service request can trigger a cascade of indirect work. For example, a web-service request to update an Entity can trigger the following work:
- The search index must be updated to reflect the entity update.
- The entity replication data used for Views must be updated.
- A new snapshot of the Entity must be taken for auditing.
In this example, each type of indirect work is handled by a separate type of worker. Therefore, a single update event must be broadcast to multiple workers, with each performing independent tasks. Amazon's Simple Notification Service (SNS) is leveraged to facilitate the broadcast of each event to multiple workers.
When a user updates an entity, a single message is published to the 'prod-<stack-number>-repo-ENTITY' SNS topic after the update transaction commits. No messages are published for failed transactions. The above screenshot shows four Amazon Simple Queue Service (SQS) queues, one for each worker, that subscribe to the 'prod-188-repo-ENTITY' topic. When a message is pushed to the entity topic, a copy of the message is automatically pushed to each subscribing queue. Each worker can then process messages from its own queue independently of all other indirect workers, at its own pace.
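The fan-out and publish-after-commit behavior can be sketched with in-memory stand-ins. This sketch does not call AWS: the `Topic` class only imitates an SNS topic, `deque` objects imitate SQS queues, and the `Transaction` class, the queue names, and the message fields are hypothetical.

```python
import json
from collections import deque


class Topic:
    """In-memory stand-in for an SNS topic (not the AWS API)."""

    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, q):
        self.subscribers.append(q)

    def publish(self, message):
        body = json.dumps(message)
        for q in self.subscribers:
            q.append(body)  # every subscribing queue receives its own copy


class Transaction:
    """Buffers messages and publishes them only if the block completes."""

    def __init__(self, topic):
        self.topic = topic
        self.pending = []

    def __enter__(self):
        return self

    def send_after_commit(self, message):
        self.pending.append(message)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:          # commit: release the buffered messages
            for message in self.pending:
                self.topic.publish(message)
        self.pending.clear()          # rollback: buffered messages are dropped
        return False                  # never swallow the caller's exception


entity_topic = Topic("example-entity-topic")
queues = {name: deque() for name in ("search", "replication", "snapshot")}
for q in queues.values():
    entity_topic.subscribe(q)

# A successful update publishes one message, copied to every queue.
with Transaction(entity_topic) as tx:
    tx.send_after_commit({"objectId": "syn123", "changeType": "UPDATE"})

# A failed update publishes nothing.
try:
    with Transaction(entity_topic) as tx:
        tx.send_after_commit({"objectId": "syn456", "changeType": "UPDATE"})
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# One worker drains its queue without affecting the others.
search_message = json.loads(queues["search"].popleft())
```

Because each worker owns its queue, a slow worker only delays its own backlog. The other workers keep consuming their copies of the same events.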