...
File Organization and Format
S3 bucket names must be globally unique, so bucket URLs are best built from hostnames under our DNS control.
- Recommended format: http://data01.sagebase.org.s3.amazonaws.com/(file or directory)
- The two-digit suffix in Data01 allows for 100 numbered buckets, which is the maximum number of buckets currently allowed per account
- Use Emb-data01/Pub-data01 if embargoed and unrestricted data must be kept in separate buckets
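As a rough illustration of the naming scheme above, the following sketch builds bucket names and object URLs. The function names are hypothetical, the 00 to 99 range is an assumption about how the two-digit suffix yields 100 buckets, and the prefixes are lowercased because DNS-style S3 bucket names must be lowercase.

```python
# Hypothetical helpers for the bucket naming scheme described above.
DOMAIN = "sagebase.org"
S3_SUFFIX = "s3.amazonaws.com"


def bucket_name(index, prefix="data"):
    """Return a bucket name such as data01.sagebase.org.

    Assumption: the two-digit suffix runs from 00 to 99 (100 buckets).
    """
    if not 0 <= index <= 99:
        raise ValueError("bucket index must fit in two digits (0-99)")
    return f"{prefix.lower()}{index:02d}.{DOMAIN}"


def object_url(index, key, prefix="data"):
    """Return the URL of a file or directory key inside a numbered bucket."""
    return f"http://{bucket_name(index, prefix)}.{S3_SUFFIX}/{key.lstrip('/')}"
```

For example, `object_url(1, "tcga_curation_package.tar.gz")` yields a URL in the recommended format, and `bucket_name(1, prefix="emb-data")` gives the name of an embargoed-data bucket.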
Logging should be done to its own bucket.
- Allows for bucket-level permissions and is easy to maintain
- http://logs.sagebase.org.s3.amazonaws.com
Two paradigms for storing files (which can also be used in combination):
- Store datasets in tar.gz files
  - Benefits: Easier to conform to the S3 URL naming standard, compressed to reduce size in storage and transfer, much easier to load into S3 and manage in the Platform
  - Cons: Users do not have access to the contents of the file unless we store the contents separately in the Platform
  - Example: http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_package.tar.gz
- Store datasets uncompressed
  - Benefits: Users can retrieve single files from the dataset
  - Cons: Users will have to use scripts to retrieve an entire dataset, and files must be named to match URL naming restrictions
  - Example: http://data01.sagebase.org.s3.amazonaws.com/tcga_curation_package/sage_bionetworks_user_agreement.pdf
- Store datasets in both formats
  - Benefits: Gain the benefits of both formats
  - Cons: More than double the storage space used, and file naming restrictions still apply
  - Example: Both of the above
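The two storage layouts above can be sketched with the Python standard library. This is only an illustration under assumptions: `archive_dataset` and `file_urls` are hypothetical names, the base URL is the recommended format from this page, and nothing here uploads to S3. It only produces the archive and the per-file URLs each layout would use.

```python
import tarfile
from pathlib import Path
from typing import List
from urllib.parse import quote

# Recommended URL format from this page; the bucket itself is assumed to exist.
BASE_URL = "http://data01.sagebase.org.s3.amazonaws.com"


def archive_dataset(dataset_dir: Path, out_dir: Path) -> Path:
    """Pack dataset_dir into <name>.tar.gz inside out_dir (compressed layout)."""
    archive_path = out_dir / f"{dataset_dir.name}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as tar:
        # Keep the dataset name as the top-level directory inside the archive.
        tar.add(dataset_dir, arcname=dataset_dir.name)
    return archive_path


def file_urls(dataset_dir: Path) -> List[str]:
    """Return one URL per file in dataset_dir (uncompressed layout)."""
    urls = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            # The key is the path relative to the dataset's parent directory.
            key = path.relative_to(dataset_dir.parent).as_posix()
            # Percent-encode characters that are not safe in a URL path.
            urls.append(f"{BASE_URL}/{quote(key)}")
    return urls
```

A user retrieving an entire uncompressed dataset would need a list like the one `file_urls` returns and a script that fetches each entry, which is the scripting cost noted under the uncompressed option.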
...
The Pacific Northwest Gigapop (PNWGP) is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at the Pacific Wave International Peering Exchange. See Participant Services for more information.
...