Option A
Solution
The four hierarchy-related operations can be converted to asynchronous jobs. Just like all existing asynchronous jobs in Synapse, a user would start the job and receive a job ID. The client would then poll the job's status in a loop until the job completes.
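The start-then-poll pattern described above can be sketched from the client side as follows. This is a minimal illustration, not the Synapse client API: `wait_for_job`, `FakeJobService`, the request fields, and the state names `PROCESSING` and `COMPLETE` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict


def wait_for_job(get_status: Callable[[str], Dict], job_id: str,
                 poll_interval_s: float = 1.0, timeout_s: float = 3600.0) -> Dict:
    """Poll a job's status until it leaves the PROCESSING state or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status["state"] != "PROCESSING":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        time.sleep(poll_interval_s)


class FakeJobService:
    """In-memory stand-in for the job API (hypothetical, for illustration only)."""

    def __init__(self, polls_until_done: int) -> None:
        self._polls_until_done = polls_until_done
        self._remaining: Dict[str, int] = {}
        self._next_id = 0

    def start_job(self, request: Dict) -> str:
        # Starting a job returns immediately with a job ID.
        self._next_id += 1
        job_id = str(self._next_id)
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def get_status(self, job_id: str) -> Dict:
        # Each poll moves the fake job one step closer to completion.
        self._remaining[job_id] -= 1
        done = self._remaining[job_id] <= 0
        return {"jobId": job_id, "state": "COMPLETE" if done else "PROCESSING"}


service = FakeJobService(polls_until_done=3)
job_id = service.start_job({"operation": "move", "entityId": "syn123"})
final_status = wait_for_job(service.get_status, job_id, poll_interval_s=0.01)
```

A real client would likely also need to handle a failed state and surface the failure reason to the user.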
An asynchronous worker would execute each job by updating all dependencies in a single transaction. A message for each dependency would be sent after the single transaction commits.
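The ordering described above (all updates in one transaction, messages only after the commit) might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the real database; the `node` table, `run_hierarchy_job`, and the `send_message` callback are hypothetical names chosen for the example.

```python
import sqlite3
from typing import Callable, List, Tuple


def run_hierarchy_job(conn: sqlite3.Connection,
                      updates: List[Tuple[str, str]],
                      send_message: Callable[[str], None]) -> None:
    """Apply every (node_id, new_parent_id) update atomically, then notify."""
    # The connection context manager commits on success and rolls back
    # if any statement inside the block raises.
    with conn:
        for node_id, new_parent in updates:
            cur = conn.execute(
                "UPDATE node SET parent_id = ? WHERE id = ?", (new_parent, node_id))
            if cur.rowcount != 1:
                raise LookupError(f"node {node_id} does not exist")
    # Reached only after the commit, so no message describes a change
    # that was later rolled back.
    for node_id, _ in updates:
        send_message(node_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, parent_id TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?)", [("a", "root"), ("b", "root")])
conn.commit()

sent: List[str] = []
run_hierarchy_job(conn, [("a", "trash"), ("b", "trash")], sent.append)
```

If any update fails, the exception propagates before the message loop runs, so neither the partial changes nor their messages become visible.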
Concurrency & Consistency
Just like the current implementation, this option would require locks to maintain consistency when the same hierarchy is changed concurrently by multiple threads. However, non-blocking locks could be used instead of blocking locks. The worker would attempt to acquire a semaphore lock before executing the job. If the lock is not acquired, the message for the job would be returned to the queue for a future retry. Non-blocking locks are less resource intensive than the blocking locks used in the current implementation.
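The try-acquire-or-requeue behavior can be sketched as below. This is a single-process illustration under stated assumptions: `HierarchyLocks` and `process_next` are hypothetical names, a `threading.Semaphore` per hierarchy stands in for the shared semaphore lock, and a `queue.Queue` stands in for the message queue.

```python
import queue
import threading
from typing import Callable, Dict


class HierarchyLocks:
    """One binary semaphore per hierarchy key (in-process stand-in)."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Semaphore] = {}

    def try_acquire(self, key: str) -> bool:
        with self._guard:
            sem = self._locks.setdefault(key, threading.Semaphore(1))
        # blocking=False returns immediately instead of waiting for the holder.
        return sem.acquire(blocking=False)

    def release(self, key: str) -> None:
        self._locks[key].release()


def process_next(job_queue: "queue.Queue", locks: HierarchyLocks,
                 execute: Callable[[dict], None]) -> str:
    """Take one job; run it if its hierarchy lock is free, else requeue it."""
    job = job_queue.get_nowait()
    key = job["hierarchy"]
    if not locks.try_acquire(key):
        job_queue.put(job)  # return the message to the queue for a later retry
        return "requeued"
    try:
        execute(job)
        return "executed"
    finally:
        locks.release(key)


locks = HierarchyLocks()
job_queue: "queue.Queue" = queue.Queue()
executed = []
job = {"jobId": "1", "hierarchy": "syn123"}
job_queue.put(job)

locks.try_acquire("syn123")  # simulate another worker holding the lock
first_attempt = process_next(job_queue, locks, executed.append)
locks.release("syn123")      # the other worker finishes
second_attempt = process_next(job_queue, locks, executed.append)
```

Because the worker never waits on the lock, it stays free to pick up jobs for other hierarchies while the contended job sits in the queue.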
Migration
The asynchronous job manager rejects any job status change while a stack is in read-only mode. As a result, a long-running job that is started before the stack enters read-only mode will fail shortly after read-only mode is set. The entire job would therefore need to run in a single transaction so that the failure triggers a rollback of the transaction.
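The interaction between read-only mode and the single transaction can be illustrated with the sketch below. It assumes the worker reports progress from inside the transaction; `JobStatusManager`, `ReadOnlyModeError`, `run_job`, and the `node` table are hypothetical names, and SQLite stands in for the real database.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple


class ReadOnlyModeError(RuntimeError):
    """Raised when a status change is attempted while the stack is read-only."""


class JobStatusManager:
    """Hypothetical stand-in for the asynchronous job manager."""

    def __init__(self) -> None:
        self.read_only = False
        self.progress: Dict[str, Tuple[int, int]] = {}

    def update_progress(self, job_id: str, done: int, total: int) -> None:
        if self.read_only:
            raise ReadOnlyModeError("stack is in read-only mode")
        self.progress[job_id] = (done, total)


def run_job(conn: sqlite3.Connection, manager: JobStatusManager, job_id: str,
            updates: List[Tuple[str, str]],
            after_each: Callable[[int], None] = lambda step: None) -> None:
    # Any exception inside this block, including a rejected status
    # change, rolls back every update made so far.
    with conn:
        for step, (node_id, new_parent) in enumerate(updates, start=1):
            conn.execute(
                "UPDATE node SET parent_id = ? WHERE id = ?", (new_parent, node_id))
            after_each(step)
            manager.update_progress(job_id, step, len(updates))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, parent_id TEXT)")
conn.executemany("INSERT INTO node VALUES (?, ?)", [("a", "root"), ("b", "root")])
conn.commit()

manager = JobStatusManager()


def enter_read_only(step: int) -> None:
    # Simulate the stack switching to read-only mode mid-job.
    if step == 1:
        manager.read_only = True


try:
    run_job(conn, manager, "1", [("a", "trash"), ("b", "trash")], enter_read_only)
    outcome = "committed"
except ReadOnlyModeError:
    outcome = "rolled back"

parents = dict(conn.execute("SELECT id, parent_id FROM node"))
```

In this sketch the first update has already been applied when the status change is rejected, yet after the rollback both nodes still have their original parent.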
User Experience
Users would be expected to start potentially long-running jobs and wait for them to finish. This is an improvement over the current behavior, but users may not want to wait around for 20+ minutes for their operations to complete. This option would retain the all-or-none user experience of the existing implementation.
Problems with this Option
- This option would require a breaking API change. All clients would need to be updated to use the asynchronous jobs. The existing synchronous calls would need to be removed.
- The entire operation would need to be run in a single database transaction that could span a long period of time.
- Jobs started immediately before the stack is set to read-only mode will fail.