...

Info

Note: System metadata, annotations, and provenance records are still stored in Synapse's S3 storage.

Setting up an external AWS S3 bucket

This article describes two ways to set up an external AWS S3 bucket:

...

To add files to Synapse that are already in your bucket, see Adding files in your S3 bucket to Synapse below.

Read-write permissions

To allow authorized Synapse users to upload data to your bucket, read-write permissions need to be set on that bucket so you can allow Synapse to upload and retrieve files:

...

For read-write permissions, you also need to create an object that proves to the Synapse service that you own this bucket. To do this, create a file named owner.txt that contains a line-separated list of the user identifiers that are allowed to register and upload to the bucket. A valid user identifier is either a numeric Synapse user ID or the numeric ID of a team that you are a member of.

...

Code Block
# copy your owner.txt file to your s3 bucket
aws s3 cp owner.txt s3://nameofmybucket/nameofmyfolder
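
The contents of owner.txt can also be generated programmatically. The following is a minimal sketch, assuming the file only needs one numeric ID per line as described above; the function name format_owner_file and the IDs in the usage note are hypothetical.

```python
def format_owner_file(ids):
    """Return owner.txt contents: one numeric Synapse user or team ID per line."""
    lines = []
    for raw in ids:
        text = str(raw).strip()
        # Only plain ASCII digits are accepted, matching "numeric ID" in the text above.
        if not (text.isascii() and text.isdigit()):
            raise ValueError(f"not a numeric Synapse ID: {raw!r}")
        lines.append(text)
    return "\n".join(lines) + "\n"
```

For example, format_owner_file([1234567, 7654321]) returns two lines that can be written to owner.txt and then copied to the bucket with the aws s3 cp command shown above.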

Read-only permissions

If you do not want to allow authorized Synapse users to upload data to your bucket but provide read access instead, you can change the permissions to read-only:

Code Block
{
    "Statement": [
        {
            "Action": [ "s3:ListBucket*", "s3:GetBucketLocation" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::synapse-share.yourcompany.com",
            "Principal": { "AWS": "325565585839" }
        },
        {
            "Action": [ "s3:GetObject*", "s3:*MultipartUpload*" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::synapse-share.yourcompany.com/*",
            "Principal": { "AWS": "325565585839" }
        }
    ]
}
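
If you manage several buckets, the read-only policy can be generated rather than edited by hand. This is a sketch only: it reproduces the statements and the principal from the policy above and substitutes the bucket name. The function name read_only_policy is an assumption made for illustration.

```python
import json

# AWS principal copied from the policy above.
SYNAPSE_AWS_PRINCIPAL = "325565585839"


def read_only_policy(bucket_name):
    """Build the read-only bucket policy shown above for the given bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Statement": [
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Action": ["s3:ListBucket*", "s3:GetBucketLocation"],
                "Effect": "Allow",
                "Resource": bucket_arn,
                "Principal": {"AWS": SYNAPSE_AWS_PRINCIPAL},
            },
            {
                # Object-level actions apply to every key in the bucket.
                "Action": ["s3:GetObject*", "s3:*MultipartUpload*"],
                "Effect": "Allow",
                "Resource": f"{bucket_arn}/*",
                "Principal": {"AWS": SYNAPSE_AWS_PRINCIPAL},
            },
        ]
    }


print(json.dumps(read_only_policy("synapse-share.yourcompany.com"), indent=4))
```

The printed JSON can be pasted into the bucket policy editor in place of the hand-written version.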

Enable cross-origin resource sharing (CORS)

In Permissions, click CORS configuration. In the CORS configuration editor, edit the configuration so that Synapse is included in the AllowedOrigin tag. An example CORS configuration that would allow this is:

...

For more information, please read: How Do I Configure CORS on My Bucket?

Setup with AWS CloudFormation

For convenience, AWS CloudFormation can be used to provision a custom AWS S3 bucket for use with Synapse. This approach results in exactly the same bucket as described in Setup with AWS Console.

...

Code Block
# (Optional) true for read-write, false (default) for read-only bucket
AllowWriteBucket: 'true'
# (Optional) true (default) to encrypt bucket, false for no encryption
EncryptBucket: 'false'
# (Optional) 'Enabled' to enable bucket versioning, default is 'Suspended'
BucketVersioning: 'Enabled'
# (Optional) 'Enabled' to enable bucket data life cycle rule, default is 'Disabled'
EnableDataLifeCycle: 'Enabled'
# (Optional) S3 bucket objects will transition into this storage class: GLACIER(default), STANDARD_IA, ONEZONE_IA
LifecycleDataStorageClass: 'STANDARD_IA'
# (Optional) Number of days until S3 objects are moved to the LifecycleDataStorageClass, default is 30
LifecycleDataTransition: '90'
# (Optional) Number of days (from creation) when objects are deleted from S3 and LifecycleDataStorageClass, default is 365000
LifecycleDataExpiration: '1825'
# (Optional) Restrict downloading files from this bucket to only AWS resources (e.g. EC2, Lambda) within the same region as this bucket, default is false
SameRegionResourceAccessToBucket: 'true'
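
Before launching the stack, it can help to sanity-check the parameter values. The sketch below is not part of the template: the allowed values and defaults are taken from the comments above, and the rule that the transition must come before the expiration is an assumption based on how S3 lifecycle rules behave. The function name check_parameters is hypothetical.

```python
# Allowed values, as listed in the parameter comments above.
ALLOWED_VALUES = {
    "AllowWriteBucket": {"true", "false"},
    "EncryptBucket": {"true", "false"},
    "SameRegionResourceAccessToBucket": {"true", "false"},
    "BucketVersioning": {"Enabled", "Suspended"},
    "EnableDataLifeCycle": {"Enabled", "Disabled"},
    "LifecycleDataStorageClass": {"GLACIER", "STANDARD_IA", "ONEZONE_IA"},
}
DEFAULT_TRANSITION_DAYS = 30
DEFAULT_EXPIRATION_DAYS = 365000


def check_parameters(params):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for name, allowed in ALLOWED_VALUES.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}: {value!r} is not one of {sorted(allowed)}")
    transition = int(params.get("LifecycleDataTransition", DEFAULT_TRANSITION_DAYS))
    expiration = int(params.get("LifecycleDataExpiration", DEFAULT_EXPIRATION_DAYS))
    # Assumption: objects must transition before they expire.
    if transition >= expiration:
        problems.append(
            f"LifecycleDataTransition ({transition}) should be smaller than "
            f"LifecycleDataExpiration ({expiration})"
        )
    return problems


example = {
    "AllowWriteBucket": "true",
    "EncryptBucket": "false",
    "BucketVersioning": "Enabled",
    "EnableDataLifeCycle": "Enabled",
    "LifecycleDataStorageClass": "STANDARD_IA",
    "LifecycleDataTransition": "90",
    "LifecycleDataExpiration": "1825",
    "SameRegionResourceAccessToBucket": "true",
}
print(check_parameters(example))
```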

After executing the CloudFormation command, view the AWS CloudFormation dashboard to verify whether the bucket was provisioned successfully.

Set S3 bucket as upload location

By default, your project or folder uses Synapse’s default S3 storage location. You can use the external bucket configured above via the web or programmatic clients.

Web

Navigate to your project or folder of interest, then select Tools, and Change Storage Location. In the resulting pop-up, select the Amazon S3 Bucket option and fill in the relevant information, where Bucket is the name of your external bucket, Base Key is the name of the folder in your bucket to upload to, and Banner is a short description such as who owns the storage location:

...

Code Block
#set storage location
library(synapser)
library(rjson)
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='S3',
                    concreteType='org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting',
                    bucket='nameofyourbucket')
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))
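
The same two request bodies can be assembled in Python. This is a sketch under stated assumptions: it only builds the payloads and makes no network calls, the helper names are hypothetical, and the optional baseKey field is assumed to be named as in the Google Cloud example in this article.

```python
import json


def s3_storage_location(bucket, base_key=None):
    """Body for POST /storageLocation describing an external S3 bucket."""
    setting = {
        "uploadType": "S3",
        "concreteType": "org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting",
        "bucket": bucket,
    }
    if base_key:
        # Assumption: only needed when uploading into a folder inside the bucket.
        setting["baseKey"] = base_key
    return setting


def project_upload_setting(project_id, storage_location_id):
    """Body for POST /projectSettings that points a project at a storage location."""
    return {
        "concreteType": "org.sagebionetworks.repo.model.project.UploadDestinationListSetting",
        "settingsType": "upload",
        "locations": [storage_location_id],
        "projectId": project_id,
    }


print(json.dumps(s3_storage_location("nameofyourbucket"), indent=2))
```

With a logged-in client, these bodies would be sent in the same way as in the Python proxy examples in this article, that is, with syn.restPOST('/storageLocation', body=json.dumps(...)) followed by syn.restPOST('/projectSettings', body=json.dumps(...)).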

Adding files in your S3 bucket to Synapse

If your bucket is set for read-write access, files can be added to the bucket using the standard Synapse interface (web or programmatic).

...

Code Block
# create filehandle
fileHandle = {'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle',
              'fileName'    : 'nameOfFile.csv',
              'contentSize' : "sizeInBytes",
              'contentType' : 'text/csv',
              'contentMd5' :  'md5',
              'bucketName' : destination['bucket'],
              'key' : 's3ObjectKey',
              'storageLocationId': destination['storageLocationId']}
fileHandle = syn.restPOST('/externalFileHandle/s3', json.dumps(fileHandle), endpoint=syn.fileHandleEndpoint)

f = synapseclient.File(parentId=PROJECT, dataFileHandleId = fileHandle['id'])

f = syn.store(f)
R
Code Block
# create filehandle
fileHandle <- list(concreteType='org.sagebionetworks.repo.model.file.S3FileHandle',
                   fileName    = 'nameOfFile.csv',
                   contentSize = 'sizeInBytes',
                   contentType = 'text/csv',
                   contentMd5 =  'md5',
                   storageLocationId = destination$storageLocationId,
                   bucketName = destination$bucket,
                   key ='s3ObjectKey')
fileHandle <- synRestPOST('/externalFileHandle/s3', body=toJSON(fileHandle), endpoint = 'https://file-prod.prod.sagebase.org/file/v1')

f <- File(dataFileHandleId=fileHandle$id, parentId=projectId)

f <- synStore(f)
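
The 'md5' and 'sizeInBytes' placeholders in the file handle examples above have to be replaced with real values for the object. One way to obtain them from a local copy of the file is sketched below. It assumes that contentMd5 is the hex-encoded MD5 digest and that the local copy is byte-identical to the object in the bucket; the function name md5_and_size is hypothetical.

```python
import hashlib
import os


def md5_and_size(path, chunk_size=1024 * 1024):
    """Return (hex MD5 digest, size in bytes) for a local file."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest(), os.path.getsize(path)
```

The two return values would then be used for the contentMd5 and contentSize fields of the file handle.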

See the REST docs for more information on setting external storage location settings using our REST API.

Setting up an external Google Cloud storage bucket

Follow the documentation on Google Cloud’s site to Create a Bucket.

...

To add files to Synapse that are already in your bucket, see Adding Files in Your Google Cloud Bucket to Synapse below.

Command line

Code Block
# copy your owner.txt file to your Google Cloud Storage bucket
gsutil cp owner.txt gs://nameofmybucket/nameofmyfolder

...

Navigate to your bucket on the Google Cloud Console and select the Upload files button to upload your text file into the folder where you want your data.

Enable cross-origin resource sharing (CORS)

Follow the instructions for Setting CORS on a bucket. You may have to install the gsutil application.

...

For more information, please read: Configuring cross-origin resource sharing (CORS).

Set Google Cloud Bucket as Upload Location

By default, your project uses the Synapse default storage location. You can use the external bucket configured above via our programmatic clients or web client.

...

Code Block
#set storage location
library(synapser)
library(rjson)
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='GOOGLECLOUDSTORAGE',
                    concreteType='org.sagebionetworks.repo.model.project.ExternalGoogleCloudStorageLocationSetting',
                    bucket='nameofyourbucket',
                    baseKey='nameOfSubfolderInBucket' # optional, only necessary if using a subfolder in your bucket
)
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting', 
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

Web

Navigate to your project or folder of interest, then select Tools, and Change Storage Location. In the resulting pop-up, select the Google Cloud Storage Bucket option and fill in the relevant information, where Bucket is the name of your external bucket, Base Key is the name of the folder in your bucket to upload to, and Banner is a short description such as who owns the storage location.

Adding Files in Your Google Cloud Bucket to Synapse

If your bucket is set for read-write access, files can be added to the bucket using the standard Synapse interface (web or programmatic).

...

Please see the REST docs for more information on setting external storage location settings using our REST API.

Using SFTP

To set up an SFTP server as a storage location, the settings on the project need to be changed; specifically, the storageLocation needs to be set. This is best done using either R or Python, but it also has alpha support in the web browser. Customize the code below to set the storage location as your SFTP server:

...

Code Block
# uses the synapser client, as in the other R examples in this article
library(synapser)
library(rjson)
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='SFTP',
                    concreteType='org.sagebionetworks.repo.model.project.ExternalStorageLocationSetting',
                    description='My SFTP upload location',
                    supportsSubfolders=TRUE,
                    url='https://your-sftp-server.com',
                    banner='A descriptive banner, tada!')

destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

Using a proxy to access a local file server or SFTP server

For files stored outside of Amazon, an additional proxy is needed to validate the pre-signed URL and then proxy the requested file contents. View more information here about the process as well as about creating a local proxy or an SFTP proxy.

Set project settings for a local proxy

You must have a key (“your_secret_key”) to allow Synapse to interact with the file system.

Python

Code Block
import synapseclient
import json
syn = synapseclient.login()
PROJECT = 'syn12345'

destination = {"uploadType":"PROXYLOCAL",
               "secretKey":"your_secret_key",
               "proxyUrl":"https://your-proxy.prod.sagebase.org",
               "concreteType":"org.sagebionetworks.repo.model.project.ProxyStorageLocationSettings"}
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

project_destination ={"concreteType": "org.sagebionetworks.repo.model.project.UploadDestinationListSetting",
                      "settingsType": "upload"}
project_destination['locations'] = [destination['storageLocationId']]
project_destination['projectId'] = PROJECT

project_destination = syn.restPOST('/projectSettings', body = json.dumps(project_destination))

...

Code Block
library(synapser)
library(rjson)  # provides toJSON
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='PROXYLOCAL',
                    secretKey='your_secret_key',
                    proxyUrl='https://your-proxy.prod.sagebase.org',
                    concreteType='org.sagebionetworks.repo.model.project.ProxyStorageLocationSettings')
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

Set project settings for an SFTP proxy

You must have a key (“your_secret_key”) to allow Synapse to interact with the file system.

Python
Code Block
import synapseclient
import json
syn = synapseclient.login()
PROJECT = 'syn12345'

destination = {"uploadType":"SFTP",
               "secretKey":"your_secret_key",
               "proxyUrl":"https://your-proxy.prod.sagebase.org",
               "concreteType":"org.sagebionetworks.repo.model.project.ProxyStorageLocationSettings"}
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

project_destination ={"concreteType": "org.sagebionetworks.repo.model.project.UploadDestinationListSetting",
                      "settingsType": "upload"}
project_destination['locations'] = [destination['storageLocationId']]
project_destination['projectId'] = PROJECT

project_destination = syn.restPOST('/projectSettings', body = json.dumps(project_destination))
R
Code Block
library(synapser)
library(rjson)  # provides toJSON
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='SFTP',
                    secretKey='your_secret_key',
                    proxyUrl='https://your-proxy.prod.sagebase.org',
                    concreteType='org.sagebionetworks.repo.model.project.ProxyStorageLocationSettings')
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

See also: AWS Security Token Service Storage Locations

/wiki/spaces/DOCS/pages/2048426057