...

Many projects may have two categories of data:

  • "Raw Data": written to the bucket by an externally managed process and unlikely to be edited in Synapse.
  • "Processed Data": created as Synapse users read the raw data and write new data into Synapse.

We would like a single external organization to be able to pay for both data sets, as well as for any large-scale computation on that data.

This might be achieved easily by configuring a Synapse project with two external buckets. The processed data bucket would be the default upload location for the project, so that everything users create while working in Synapse naturally goes there. The raw data bucket could be exposed through Synapse as a read-only data resource and populated by some other mechanism that is accessible only to the project admins.
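To make the two-bucket idea concrete, here is a minimal sketch of how such a project layout could be modeled and sanity-checked. It is not the Synapse API: `BucketLocation`, `validate_project_buckets`, and the bucket names are all hypothetical and exist only to illustrate the rule that exactly one writable bucket acts as the default upload location while the raw data bucket stays read-only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BucketLocation:
    """One external bucket attached to a project (hypothetical model)."""
    bucket: str
    read_only: bool        # True: Synapse users may read but not write
    default_upload: bool   # True: new uploads through Synapse land here


def validate_project_buckets(locations: List[BucketLocation]) -> BucketLocation:
    """Return the default upload bucket after checking the layout is consistent."""
    defaults = [loc for loc in locations if loc.default_upload]
    if len(defaults) != 1:
        raise ValueError("a project needs exactly one default upload bucket")
    if defaults[0].read_only:
        raise ValueError("the default upload bucket cannot be read-only")
    return defaults[0]


# Placeholder bucket names, not real resources.
project_buckets = [
    BucketLocation("example-raw-data", read_only=True, default_upload=False),
    BucketLocation("example-processed-data", read_only=False, default_upload=True),
]
upload_target = validate_project_buckets(project_buckets)
print(upload_target.bucket)
```

Keeping the read-only flag and the default-upload flag separate leaves room for additional read-only buckets later without changing where user uploads go.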

Downloading large files

For exporting large BAM datasets, the following procedure applies:

  • User creates an S3 bucket in us-east and grants Synapse permission to copy files into it
  • User indicates which files to copy
  • Synapse copies the files to the user's S3 bucket
  • User copies the data from their bucket to the final destination
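As an illustration of the first step, the sketch below builds an S3 bucket policy document that lets another AWS principal write objects into the user's bucket. The exact permissions Synapse would need are not specified here, so the action list is an assumption, and the bucket name and principal ARN are placeholders (`123456789012` is a dummy account ID, not the Synapse account). `build_copy_policy` is a hypothetical helper.

```python
import json


def build_copy_policy(bucket: str, synapse_principal_arn: str) -> dict:
    """Build a bucket policy allowing the given principal to write objects.

    Assumption: write access (including aborting multipart uploads) is
    enough for the copy; the real requirement may differ.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowSynapseCopyIntoBucket",
                "Effect": "Allow",
                "Principal": {"AWS": synapse_principal_arn},
                "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
                # Object-level actions apply to keys inside the bucket.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_copy_policy(
    "example-user-export",
    "arn:aws:iam::123456789012:root",
)
print(json.dumps(policy, indent=2))
```

The printed JSON could then be attached to the bucket as its bucket policy before the user indicates which files to copy.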

Open Questions

- How is a file handle created from an uploaded file in Synapse today?  What file naming conventions or S3 metadata are required (e.g. MD5-hash, content-type)?

...