
How to Migrate Project Data from Synapse Storage to Another S3 Bucket (not yet implemented)

This does not work!

Note: This document describes a process that is currently under development and has not been released. These steps will not work until the process has been implemented in Synapse!

Large projects in Synapse should be moved from the main Synapse Storage bucket to another S3 bucket. By doing this, S3 can generate itemized costs for storage and egress involving that bucket. This also provides a mechanism for Synapse to "offload" the costs to an external party that manages the AWS account owning the destination S3 bucket.

This document describes the steps necessary to accomplish this task.

Step-by-step guide

  1. Contact the project manager and have them configure a destination S3 bucket, assisting them as needed.
    1. Open question (PLFM-5250): The S3 Storage Location record in Synapse should have a meaningful, easily understood label that explains the storage location. Any user should be able to tell who has custody of their data (Synapse or a third party).
      1. An S3 protocol + bucket name (current implementation) does not satisfy this.
  2. Move the project's storage location to the new S3 bucket. This will cause all new files stored in the project to be uploaded to the new bucket.
  3. Determine which file handles belong to file entities that are stored in Synapse storage.
    1. The project should probably be placed in read-only mode beforehand (details to be added later). Essentially, you want to make sure you can identify all of the file handle IDs belonging to file entities in the old bucket. To do this, ensure that existing file entities are not moved or modified in a way that would break searching through the project.
    2. Recursively search through the project to get file handles belonging to file entities in the project. Filter the file handles to include only those that are in the default Synapse storage bucket. Record these file handle IDs, as these must be modified.
      1. This will probably be a script
      2. Note that no new files will be added to the default Synapse storage bucket for this project once steps 1 and 2 have been completed, so the recorded list is complete.
    3. The project may then be taken out of read-only mode.
  4. Using an administrator account, a Synapse administrator should call PUT /fileHandle/{id} (not yet implemented) on each file handle, specifying the new S3 bucket. This call will:
    1. Copy the underlying file (and its preview) to the new bucket.
    2. When the file has been copied and verified to exist in both locations:
      1. The copy in default Synapse storage will be moved to low-cost S3 storage with a lifecycle policy to delete it after some period of time.
      2. The file handle record will be updated with the new bucket and key.
    3. If the process is interrupted, resume as follows:
      1. Re-retrieve the file handle details from the existing list of file handle IDs
      2. Filter the list of remaining file handles to contain only those that remain in the default Synapse storage bucket.
  5. Verify with the project owner that the file entities have been moved successfully.
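
The filtering in step 3 and the resume check in step 4 are the same operation: from a list of file handle records, keep only the IDs of those still in the default Synapse storage bucket. Below is a minimal Python sketch of that filter, not a released tool. It assumes each file handle record has already been retrieved as a dictionary with "id" and "bucketName" keys; that record shape, the function name handles_in_default_bucket, and the bucket names in the example are assumptions made for illustration. Retrieving the records from Synapse is deliberately left out.

```python
from typing import Iterable, List, Mapping


def handles_in_default_bucket(
    file_handles: Iterable[Mapping[str, str]],
    default_bucket: str,
) -> List[str]:
    """Return the IDs of file handles that are still in default_bucket.

    Assumption: each record is a mapping with an "id" key and, for
    S3-backed handles, a "bucketName" key. Records without a
    "bucketName" are treated as not being in the default bucket.
    Duplicate IDs are reported once, in first-seen order.
    """
    seen = set()
    remaining = []
    for handle in file_handles:
        if handle.get("bucketName") != default_bucket:
            continue  # already migrated, or not stored in the default bucket
        handle_id = handle["id"]
        if handle_id not in seen:
            seen.add(handle_id)
            remaining.append(handle_id)
    return remaining


if __name__ == "__main__":
    # Hypothetical records; real ones would come from the recursive
    # project search (step 3) or from re-retrieving the recorded IDs (step 4).
    records = [
        {"id": "101", "bucketName": "example-default-bucket"},
        {"id": "102", "bucketName": "example-destination-bucket"},
        {"id": "103", "bucketName": "example-default-bucket"},
    ]
    for handle_id in handles_in_default_bucket(records, "example-default-bucket"):
        print(handle_id)
```

Because the filter depends only on the current state of each record, running it again after an interruption yields exactly the handles that still need to be migrated.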