A design document for building a solution for use case 1 can be found here: Synapse Storage Reports: API Design Document
A handful of internally managed Synapse projects store so much data (tens to hundreds of terabytes) that there are benefits (namely, itemized S3 billing) to placing each project in its own storage location. It would be useful to be able to move all of the content in one of these projects to its own bucket. Additionally, we can encrypt this data as we move it into the new bucket, dramatically reducing the amount of data we need to encrypt in the main bucket as a part of .
This is a high-priority use case and is driving this proposal. This issue "blocks" tickets related to certification remediation (namely, storing PHI on AWS). Those tickets do not necessarily need this service (we can come up with workarounds), but this could simplify the work that needs to be done later.
Users have stored files on an SFTP server that is expected to be decommissioned, so they must move them to another storage location (an external object store). The data is not managed by Synapse, so users must transfer the files manually. Because the content of the files has not changed but their location has, users would like to update all file handles tied to the old storage location (or an enumerated list of particular file handles, if only a subset of the data is being moved) to point to the new location.
Priority is not currently high, but this would be very useful for some users (and would save the engineering team time, since on at least one occasion this has been done manually in the database). It would save significant time down the road, and may be fairly trivial to implement given the similarity of its requirements to the other use cases.
Synapse is deprecating SFTP storage. To simplify migration (and accelerate deprecation), we can give SFTP users the option to migrate their data into Synapse storage.
Based on the use cases above, we would probably need the following operations:
Use Case 1 could probably be one operation called by a project owner or bucket owner. This operation could handle (1-4).
Use Case 4 could be similar, but would only require (1-3).
Use Case 3 could involve a user calling (1) to get a list of file handles and (3) to update the file handles (one suggestion was to have a similar interface to the Python client's copyFileHandles method. Perhaps an updateFileHandles method would make sense here from a user perspective)
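To make the user-facing shape of Use Case 3 concrete, here is a minimal sketch of what an updateFileHandles-style helper could look like. All names here (the FileHandle record, update_file_handles, storage_location_id) are illustrative assumptions for this proposal, not the actual Synapse API:

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class FileHandle:
    """Hypothetical, simplified model of a file handle record."""
    id: str
    owner_id: str
    storage_location_id: int
    key: str  # object key within the storage location

def update_file_handles(
    handles: Iterable[FileHandle],
    old_location: int,
    new_location: int,
    only_ids: Optional[Set[str]] = None,
) -> List[FileHandle]:
    """Re-point file handles from old_location to new_location.

    If only_ids is given, restrict the update to that subset
    (the "enumerated list of particular file handles" case).
    """
    updated = []
    for fh in handles:
        if fh.storage_location_id != old_location:
            updated.append(fh)  # not in the old location; leave untouched
        elif only_ids is not None and fh.id not in only_ids:
            updated.append(fh)  # in the old location, but not selected
        else:
            updated.append(replace(fh, storage_location_id=new_location))
    return updated
```

This mirrors copyFileHandles in spirit (batch in, batch out), but mutates the location reference rather than producing copies, which matches the "content unchanged, location changed" premise of Use Case 3.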
Issues we need to resolve that may guide implementation or refine use cases.
An individual user should not be able to modify file handles that they do not own. Migrations that involve file handles owned by multiple users should only be performed by an admin after determining that the.
Users should be permitted to update their own file handles in cases where Synapse does not manage the storage.
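The two ownership rules above can be sketched as a single batch-level permission check. This is a sketch of the proposed policy only; the function and parameter names are assumptions, and managed_locations stands in for however Synapse identifies Sage-managed storage locations:

```python
from typing import Iterable, Set, Tuple

# Each handle is modeled as (owner_id, storage_location_id).
def may_update_batch(
    handles: Iterable[Tuple[str, int]],
    caller_id: str,
    is_admin: bool,
    managed_locations: Set[int],
) -> bool:
    """Apply the proposed permission rules to a batch update."""
    handles = list(handles)
    # Admins may perform migrations spanning multiple owners.
    if is_admin:
        return True
    # Non-admins may only touch file handles they own...
    if {owner for owner, _loc in handles} != {caller_id}:
        return False
    # ...and only in storage that Synapse does not manage.
    return all(loc not in managed_locations for _owner, loc in handles)
```

The batch-level framing matters: a mixed-ownership batch is rejected outright for non-admins rather than partially applied, which avoids half-completed migrations.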
This JSON file outlines all of the objects that can be tied to a file handle. We can walk through a project, pick out the following objects, and migrate or modify their associated file handles:
FileEntity
TableEntity
WikiAttachment
WikiMarkdown
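The walk itself could look something like the sketch below, which collects file handle ids for the four object types above from a toy project tree. The node structure and field names (type, fileHandleId, children) are assumptions made for illustration, not the real Synapse entity model:

```python
from typing import Dict, List

# The four object types from the JSON file that carry file handles.
MIGRATABLE_TYPES = {"FileEntity", "TableEntity", "WikiAttachment", "WikiMarkdown"}

def collect_file_handles(nodes: List[Dict]) -> List[str]:
    """Recursively gather file handle ids of every migratable object."""
    handles = []
    for node in nodes:
        if node["type"] in MIGRATABLE_TYPES and node.get("fileHandleId"):
            handles.append(node["fileHandleId"])
        # Recurse into containers (folders, sub-projects, etc.).
        handles.extend(collect_file_handles(node.get("children", [])))
    return handles
```

In practice this enumeration would be operation (1) from the list above, feeding its output into the copy/update steps.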
We must consider the implications of moving all file handles belonging to entities structured under a project. By moving all files referenced by a project to an S3 bucket that is not managed by Sage, the bucket owner becomes able to delete or modify files referenced in Synapse. If those files are owned by a different Synapse user and used elsewhere in Synapse, they could be deleted or modified even though the user owning the file handle assumed they were safe in Synapse storage. For this reason, we should initially only consider moving files between buckets managed by Sage until we are sure there are no such risks.