...

This article describes two ways to set up an external AWS S3 bucket:

To begin, follow the documentation on the Amazon Web Services (AWS) site to Create a Bucket. Buckets do not need to be located in the US.
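
If you prefer the command line, a bucket can also be created with the AWS CLI. This is a minimal sketch; the bucket name and region below are placeholders to replace with your own:

Code Block
# create the bucket (substitute your own bucket name and region)
aws s3 mb s3://thisisthenameofmybucket --region us-east-1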

...

To allow authorized Synapse users to upload data to your bucket, set read-write permissions on the bucket so that Synapse can upload and retrieve files:

Code Block
{
    "Statement": [
        {
            "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::thisisthenameofmybucket",
            "Principal": { "AWS": "325565585839" }
        },
        {
            "Action": [ "s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:AbortMultipartUpload" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::thisisthenameofmybucket/*",
            "Principal": { "AWS": "325565585839" }
        }
    ]
}
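
You can paste this policy into the bucket's Permissions tab in the S3 console, or, if you manage the bucket from the command line, apply it with the AWS CLI. The policy file name below is an assumption; use whatever name you saved the policy under:

Code Block
# apply the bucket policy saved locally as bucket-policy.json (example file name)
aws s3api put-bucket-policy --bucket thisisthenameofmybucket --policy file://bucket-policy.json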

...

Navigate to your bucket on the Amazon Console and select Upload to upload your text file.

...

Command line

Code Block
# copy your owner.txt file to your s3 bucket
aws s3 cp owner.txt s3://nameofmybucket/nameofmyfolder/

...

If you do not want to allow authorized Synapse users to upload data to your bucket and instead want to provide read access only, you can change the permissions to read-only:

Code Block
{
    "Statement": [
        {
            "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::synapse-share.yourcompany.com",
            "Principal": { "AWS": "325565585839" }
        },
        {
            "Action": [ "s3:GetObject" ],
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::synapse-share.yourcompany.com/*",
            "Principal": { "AWS": "325565585839" }
        }
    ]
}

...

In Permissions, click CORS configuration. In the CORS configuration editor, edit the configuration so that Synapse is included in the AllowedOrigin tag. An example CORS configuration that would allow this is:

Code Block
<CORSConfiguration>
    <CORSRule>
        <AllowedOrigin>*</AllowedOrigin>
        <AllowedMethod>GET</AllowedMethod>
        <AllowedMethod>POST</AllowedMethod>
        <AllowedMethod>PUT</AllowedMethod>
        <AllowedMethod>HEAD</AllowedMethod>
        <MaxAgeSeconds>3000</MaxAgeSeconds>
        <AllowedHeader>*</AllowedHeader>
    </CORSRule>
</CORSConfiguration>
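
The same rule can also be applied from the command line. Note that the AWS CLI expects the CORS configuration as JSON rather than XML; the sketch below writes a JSON equivalent of the rule above to a file named cors.json (an example name) and applies it:

Code Block
# write the JSON equivalent of the CORS rule above to cors.json (example file name)
cat > cors.json <<'EOF'
{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "POST", "PUT", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}
EOF
# apply it to your bucket
aws s3api put-bucket-cors --bucket thisisthenameofmybucket --cors-configuration file://cors.json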

...

Example using the AWS CLI:

Code Block
aws cloudformation create-stack \
--stack-name MyCustomSynapseBucket \
--template-body file://SynapseExternalBucket.yaml \
--parameters ParameterKey=Department,ParameterValue=Cancer ParameterKey=Project,ParameterValue=Mammography \
ParameterKey=OwnerEmail,ParameterValue=joe.smith@company.com ParameterKey=SynapseUserName,ParameterValue=jsmith
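
Stack creation is not instantaneous. One way to watch its progress is to query the stack status until it reports CREATE_COMPLETE:

Code Block
# check the status of the stack created above
aws cloudformation describe-stacks --stack-name MyCustomSynapseBucket --query 'Stacks[0].StackStatus'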

...

The following are optional parameters:

Code Block
# (Optional) true for read-write, false (default) for read-only bucket
AllowWriteBucket: 'true'
# (Optional) true (default) to encrypt bucket, false for no encryption
EncryptBucket: 'false'
# (Optional) 'Enabled' to enable bucket versioning, default is 'Suspended'
BucketVersioning: 'Enabled'
# (Optional) 'Enabled' to enable bucket data life cycle rule, default is 'Disabled'
EnableDataLifeCycle: 'Enabled'
# (Optional) S3 bucket objects will transition into this storage class: GLACIER (default), STANDARD_IA, ONEZONE_IA
LifecycleDataStorageClass: 'STANDARD_IA'
# (Optional) Number of days until S3 objects are moved to the LifecycleDataStorageClass, default is 30
LifecycleDataTransition: '90'
# (Optional) Number of days (from creation) when objects are deleted from S3 and LifecycleDataStorageClass, default is 365000
LifecycleDataExpiration: '1825'
# (Optional) Restrict downloading files from this bucket to only AWS resources (e.g. EC2, Lambda) within the same region as this bucket. Default is false.
SameRegionResourceAccessToBucket: 'true'
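
Optional parameters are passed the same way as the required ones. For example, to request a read-write, versioned bucket, the create-stack call above could be extended as follows (the parameter values here are illustrative):

Code Block
aws cloudformation create-stack \
--stack-name MyCustomSynapseBucket \
--template-body file://SynapseExternalBucket.yaml \
--parameters ParameterKey=Department,ParameterValue=Cancer ParameterKey=Project,ParameterValue=Mammography \
ParameterKey=OwnerEmail,ParameterValue=joe.smith@company.com ParameterKey=SynapseUserName,ParameterValue=jsmith \
ParameterKey=AllowWriteBucket,ParameterValue=true ParameterKey=BucketVersioning,ParameterValue=Enabled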

...

Navigate to your project or folder of interest, then select Tools, then Change Storage Location. In the resulting pop-up, select the Amazon S3 Bucket option and fill in the relevant information: Bucket is the name of your external bucket, the optional Base Key is the name of the folder in your bucket to upload to, and Banner is a short description, such as who owns the storage location:

...

Python

Code Block
# Set storage location
import synapseclient
import json
syn = synapseclient.login()
PROJECT = 'syn12345'

# create a new external S3 storage location that points at your bucket
destination = {'uploadType':'S3',
               'concreteType':'org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting',
               'bucket':'nameofyourbucket'}
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

# make the new storage location the upload destination for the project
project_destination = {'concreteType': 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                       'settingsType': 'upload'}
project_destination['locations'] = [destination['storageLocationId']]
project_destination['projectId'] = PROJECT

project_destination = syn.restPOST('/projectSettings', body=json.dumps(project_destination))

R

Code Block
# set storage location
library(synapser)
library(rjson)
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='S3',
                    concreteType='org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting',
                    bucket='nameofyourbucket')
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

...

If the bucket is read-only or you already have content in the bucket, you will have to add representations of the files in Synapse programmatically. This is done using a FileHandle, which is a Synapse representation of the file.

Python

Code Block
# create filehandle
# replace the placeholder values (file name, size in bytes, MD5, S3 object key) with your file's actual metadata
fileHandle = {'concreteType': 'org.sagebionetworks.repo.model.file.S3FileHandle',
              'fileName'    : 'nameOfFile.csv',
              'contentSize' : 'sizeInBytes',
              'contentType' : 'text/csv',
              'contentMd5'  : 'md5',
              'bucketName'  : destination['bucket'],
              'key'         : 's3ObjectKey',
              'storageLocationId': destination['storageLocationId']}
fileHandle = syn.restPOST('/externalFileHandle/s3', json.dumps(fileHandle), endpoint=syn.fileHandleEndpoint)

f = synapseclient.File(parentId=PROJECT, dataFileHandleId = fileHandle['id'])

f = syn.store(f)

R

Code Block
# create filehandle
fileHandle <- list(concreteType='org.sagebionetworks.repo.model.file.S3FileHandle',
                   fileName    = 'nameOfFile.csv',
                   contentSize = 'sizeInBytes',
                   contentType = 'text/csv',
                   contentMd5 =  'md5',
                   storageLocationId = destination$storageLocationId,
                   bucketName = destination$bucket,
                   key ='s3ObjectKey')
fileHandle <- synRestPOST('/externalFileHandle/s3', body=toJSON(fileHandle), endpoint = 'https://file-prod.prod.sagebase.org/file/v1')

f <- File(dataFileHandleId=fileHandle$id, parentId=projectId)

f <- synStore(f)

...

To add files to Synapse that are already in your bucket, see below.

Command line

Code Block
# copy your owner.txt file to your google cloud bucket
gsutil cp owner.txt gs://nameofmybucket/nameofmyfolder/

...

The configuration must include Synapse as a permitted origin. An example CORS configuration that would allow this is:

Code Block
[
    {
        "maxAgeSeconds": 3000,
        "method": ["GET", "POST", "PUT", "HEAD"],
        "origin": ["*"],
        "responseHeader": ["Content-Type"]
    }
]

Using gsutil, you can set the CORS configuration with the command:

Code Block
gsutil cors set cors-json-file.json gs://example-bucket
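
You can confirm that the configuration was applied by reading it back:

Code Block
# verify the CORS configuration on the bucket
gsutil cors get gs://example-bucket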

...

By default, your project uses the Synapse default storage location. You can use the external bucket configured above via our programmatic clients or web client.

Python

Code Block
# Set storage location
import synapseclient
import json
syn = synapseclient.login()
PROJECT = 'syn12345'

destination = {'uploadType':'GOOGLECLOUDSTORAGE', 
               'concreteType':'org.sagebionetworks.repo.model.project.ExternalGoogleCloudStorageLocationSetting',
               'bucket':'nameofyourbucket',
               'baseKey': 'nameOfSubfolderInBucket' # optional, only necessary if using a subfolder in your bucket
               }
destination = syn.restPOST('/storageLocation', body=json.dumps(destination))

project_destination = {'concreteType': 'org.sagebionetworks.repo.model.project.UploadDestinationListSetting',
                      'settingsType': 'upload'}
project_destination['locations'] = [destination['storageLocationId']]
project_destination['projectId'] = PROJECT

project_destination = syn.restPOST('/projectSettings', body = json.dumps(project_destination))

R

Code Block
#set storage location
library(synapser)
library(rjson)
synLogin()
projectId <- 'syn12345'

destination <- list(uploadType='GOOGLECLOUDSTORAGE', 
                    concreteType='org.sagebionetworks.repo.model.project.ExternalGoogleCloudStorageLocationSetting',
                    bucket='nameofyourbucket',
                    baseKey='nameOfSubfolderInBucket' # optional, only necessary if using a subfolder in your bucket
)
destination <- synRestPOST('/storageLocation', body=toJSON(destination))

projectDestination <- list(concreteType='org.sagebionetworks.repo.model.project.UploadDestinationListSetting', 
                           settingsType='upload')
projectDestination$locations <- list(destination$storageLocationId)
projectDestination$projectId <- projectId

projectDestination <- synRestPOST('/projectSettings', body=toJSON(projectDestination))

...

If the bucket is read-only or you already have content in the bucket, you will have to add representations of the files in Synapse programmatically. This is done using a FileHandle, which is a Synapse representation of the file.

Python

Code Block
externalFileToAdd = 'googleCloudObjectKey' # put the key for the file to add here

# create filehandle
# replace the placeholder values (file name, size in bytes, MD5) with your file's actual metadata
fileHandle = {'concreteType': 'org.sagebionetworks.repo.model.file.GoogleCloudFileHandle',
              'fileName'    : 'nameOfFile.csv',
              'contentSize' : 'sizeInBytes',
              'contentType' : 'text/csv',
              'contentMd5'  : 'md5',
              'bucketName'  : destination['bucket'],
              'key'         : externalFileToAdd,
              'storageLocationId': destination['storageLocationId']}
fileHandle = syn.restPOST('/externalFileHandle/googleCloud', json.dumps(fileHandle), endpoint=syn.fileHandleEndpoint)
f = synapseclient.File(parentId=PROJECT, dataFileHandleId = fileHandle['id'])
f = syn.store(f)

R

Code Block
externalFileToAdd <- 'googleCloudObjectKey' # put the key for the file to add here

# create filehandle
fileHandle <- list(concreteType='org.sagebionetworks.repo.model.file.GoogleCloudFileHandle', 
                   fileName    = 'nameOfFile.csv',
                   contentSize = 'sizeInBytes',
                   contentType = 'text/csv',
                   contentMd5 =  'md5',
                   storageLocationId = destination$storageLocationId,
                   bucketName = destination$bucket,
                   key = externalFileToAdd)
fileHandle <- synRestPOST('/externalFileHandle/googleCloud', body=toJSON(fileHandle), endpoint = 'https://file-prod.prod.sagebase.org/file/v1')
f <- File(dataFileHandleId=fileHandle$id, parentId=projectId)
f <- synStore(f)

Please see the REST docs for more information on configuring external storage locations using our REST API.

See also: Compute Directly on Data in Synapse or S3
