Download Proxy StorageLocation
Use Case
We have encountered several cases where data providers want to make data available on Synapse but are restricted to hosting the data within their own data centers (see PLFM-3548). Currently, there are two Synapse features that were designed to address such cases:
- ExternalFileHandles
- SFTP Proxy
Unfortunately, both features have serious limitations. With both features, FileHandles are created in Synapse that point to data files within the data provider's data center. The resulting FileHandles are then associated with Synapse FileEntities. Synapse users are then able to download these files only after passing two completely separate authentication/authorization layers. The first layer is the Synapse security system, which authenticates the user (log in) and then makes an authorization check (ACL check) to determine if the user has permission to download the requested file. If the user passes this check, their client is provided with a URL that sends the client to the final hosting data center. The client must then pass a second authentication/authorization check against the data center's security system. The two layers of security make gaining access to the files difficult and confusing. Data access is also brittle, as two completely separate security systems must be manually synchronized.
Therefore, we need a new solution that allows data to be stored in a third-party data center with authentication/authorization provided exclusively by Synapse.
Synapse Download HTTPS Proxy
The proposed solution is to provide a generic Download Proxy that can be deployed within, or co-located with, a data provider's data center. The proxy would be configured with "services" level credentials that grant it read permission to select files within the data provider's data center. The proxy would also be configured with a shared secret key that will be used to validate pre-signed URLs. A user would then download data files via one of the Synapse clients (Web, R, Python) through the proxy as follows:
- A user chooses to download a FileEntity from Synapse that is associated with a ProxyFileHandle.
- The Synapse security layer authenticates the user and validates that the user is authorized to download the file by checking the FileEntity's Access Control List (ACL).
- If the user has permission to download the file, a new pre-signed URL will be generated by Synapse and returned (or redirected) to the client. Synapse will sign the URL using the shared secret key of the ProxyStorageLocation identified in the associated FileHandle. This pre-signed URL will have the following format:
- HTTPS://<proxy_host>/<proxy_path>?storageLocationId=123&expires=<epoch_expires_ms>&signature=<signature>
- The client is expected to treat the pre-signed URL exactly like any other pre-signed URL returned by Synapse by simply executing an HTTP GET on the URL.
- The pre-signed URL's host will take the client to the co-located Download Proxy.
- The Download Proxy will check the signature of the pre-signed URL using the pre-configured shared secret key.
- If the pre-signed URL is valid and has not expired, the proxy will then attempt to connect to the data center using its services credentials and the file path from the pre-signed URL.
- If the file exists and the proxy service account has read access, then the contents of the file will be streamed to the caller via the HTTPS response.
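The signing and validation steps above can be sketched as follows. This is only an illustration under stated assumptions: the proposal does not specify a signing algorithm or a canonical form for the signed string, so HMAC-SHA256 over the URL path plus the sorted query parameters is assumed here, and the function names (presign_url, validate_url) and the default lifetime are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def _signature(secret: bytes, path: str, params: dict) -> str:
    # Assumed canonical form: the URL path followed by the sorted query parameters.
    canonical = path + "?" + urlencode(sorted(params.items()))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def presign_url(secret: bytes, proxy_host: str, proxy_path: str,
                storage_location_id: int, ttl_ms: int = 30_000, now_ms=None) -> str:
    """Synapse side: build a URL in the format shown above and sign it."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    path = quote("/" + proxy_path.lstrip("/"))
    params = {
        "storageLocationId": str(storage_location_id),
        "expires": str(now_ms + ttl_ms),
    }
    # The signature covers the path and every parameter except itself.
    params["signature"] = _signature(secret, path, params)
    return f"https://{proxy_host}{path}?{urlencode(params)}"


def validate_url(secret: bytes, url: str, now_ms=None) -> bool:
    """Proxy side: accept the URL only if it is unexpired and correctly signed."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    claimed = params.pop("signature", "")
    try:
        expires = int(params.get("expires", ""))
    except ValueError:
        return False
    if now_ms > expires:
        return False
    expected = _signature(secret, parts.path, params)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(claimed.encode("utf-8"), expected.encode("utf-8"))
```

Because the path and the expiration time are both part of the signed string, a client cannot change the requested file or extend the lifetime of a URL without invalidating the signature.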
Note: The Download HTTPS proxy is generic. It could be used to proxy multiple types of file transfer protocols including HTTPS, SFTP, FTP, local files, SSH.
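One way the proxy could stay generic is to hide each transfer protocol behind a small common interface and select a backend per storage location. The sketch below is an assumption about how that might look, not part of the proposal: FileBackend, LocalFileBackend, and stream_file are hypothetical names, and only the local-file case is shown. An SFTP or HTTPS backend would implement the same open method.

```python
import shutil
from pathlib import Path
from typing import BinaryIO, Dict, Protocol


class FileBackend(Protocol):
    """Hypothetical interface that each protocol-specific backend implements."""

    def open(self, path: str) -> BinaryIO: ...


class LocalFileBackend:
    """Serves files from a directory on the proxy host."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def open(self, path: str) -> BinaryIO:
        target = (self.root / path.lstrip("/")).resolve()
        # Refuse paths that escape the configured root (for example via "..").
        if self.root not in target.parents:
            raise PermissionError(path)
        return target.open("rb")


def stream_file(backends: Dict[int, FileBackend], storage_location_id: int,
                path: str, out: BinaryIO, chunk_size: int = 64 * 1024) -> None:
    """Copy the requested file to the response stream in fixed-size chunks."""
    backend = backends[storage_location_id]
    with backend.open(path) as src:
        shutil.copyfileobj(src, out, chunk_size)
```

With this shape, the signature check and the streaming logic stay identical for every protocol, and only the backend registered for a given storageLocationId changes.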