...
Options to Consider
AWS Public Data Sets
Current Scenario:
- Sage currently has two data sets stored as "AWS Public Datasets" in the US West Region.
- Users can discover them by browsing public datasets on http://aws.amazon.com/datasets/Biology?browse=1 and also via the platform.
- Users can use them for cloud computation by spinning up EC2 instances and mounting the data as EBS volumes.
- Users cannot directly download data from these public datasets, but once they have them mounted on an EC2 host, they can scp them to their local system.
- Users are not forced to sign a Sage-specified EULA prior to access because they can bypass the platform and access this data directly via normal AWS mechanisms.
- Users must have an AWS account to access this data.
- There is no mechanism to grant access. All users with AWS accounts are granted access by default.
- There is no mechanism to keep an audit log for downloads or other usage of this data.
- Users pay their own costs for EC2 and bandwidth charges. Hosting is free.
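A minimal sketch of the mount workflow described above, assuming the boto3 library (not mentioned in these notes) and an EC2 instance that is already running. The function name, the default device name, and all ids are hypothetical placeholders; the caller supplies the EC2 client.

```python
def attach_public_dataset(ec2, snapshot_id, instance_id, availability_zone,
                          device="/dev/sdf"):
    """Create an EBS volume from a public snapshot and attach it to an instance.

    `ec2` is assumed to be a boto3 EC2 client for the region that holds the
    snapshot. The volume must be created in the same availability zone as
    the instance it will be attached to.
    """
    # Create a new volume whose contents come from the public snapshot.
    volume = ec2.create_volume(SnapshotId=snapshot_id,
                               AvailabilityZone=availability_zone)
    volume_id = volume["VolumeId"]

    # Block until the volume is ready to be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume; the user still has to mount the device on the host.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id
```

With boto3 installed this could be called as `attach_public_dataset(boto3.client("ec2", region_name="us-west-1"), snapshot_id, instance_id, zone)`, where the region and the three arguments are placeholders. The EC2 and bandwidth charges for the resulting volume and instance fall on the user's own AWS account, as noted above.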
Future Scenario:
- AWS Public Datasets are currently EBS only, but they will also be available for S3 in the future
- TODO ask Deepak what other plans they have in mind for the re-launch of AWS Public Datasets.
- TODO tell Deepak our suggested features for AWS Public Datasets.
Tech Details:
- You create a new "Public Dataset" by
  - making an EBS snapshot in each region in which you would like it to be available
  - providing the snapshot id(s) and metadata to Deepak (TODO see if this is still the case)
  - then you wait for Amazon to get around to it
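The first step above (one snapshot per region) could be sketched as follows, assuming boto3 EC2 clients and assuming a copy of the data already sits on an EBS volume in each target region. The function name and the two dictionaries are hypothetical; handing the resulting ids to Amazon remains a manual step.

```python
def snapshot_in_regions(clients_by_region, volume_ids_by_region, description):
    """Snapshot one EBS volume per region and return {region: snapshot_id}.

    `clients_by_region` maps a region name to a boto3 EC2 client for that
    region; `volume_ids_by_region` maps the same region names to the volume
    holding the dataset there. Both mappings are assumptions of this sketch.
    """
    snapshot_ids = {}
    for region, volume_id in volume_ids_by_region.items():
        response = clients_by_region[region].create_snapshot(
            VolumeId=volume_id,
            Description=description,
        )
        # These are the ids that would be sent along with the metadata.
        snapshot_ids[region] = response["SnapshotId"]
    return snapshot_ids
```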
Pros:
- free hosting!
- scalable
Cons:
- this won't work for public data if it is a requirement that
  - all users provide an email address and agree to a EULA prior to access
  - we must log downloads
- this won't work for protected data unless the future implementation provides more support
CloudFront Private Content
...
Is IAM only intended for managing groups and users where the base assumption is all activity is rolling up to a single AWS bill?
S3 Pre-Signed URLs for Private Content
Query String Request Authentication Alternative
You can authenticate certain types of requests by passing the required information as query-string parameters instead of using the Authorization HTTP header. This is useful for enabling direct third-party browser access to your private Amazon S3 data, without proxying the request. The idea is to construct a "pre-signed" request and encode it as a URL that an end-user's browser can retrieve. Additionally, you can limit a pre-signed request by specifying an expiration time.
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?RESTAuthentication.html
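To make the quoted description concrete, here is a minimal sketch of building such a pre-signed GET URL with only the Python standard library. It follows the legacy query-string scheme (HMAC-SHA1, AWS Signature Version 2) that the linked page describes; current AWS SDKs sign with Signature Version 4 instead, so treat this as an illustration of the idea rather than production code. The function name, the path-style URL form, and the credentials in the usage line are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def presign_s3_get(access_key_id, secret_key, bucket, key, expires_epoch):
    """Return a pre-signed GET URL for a private S3 object (legacy SigV2).

    `expires_epoch` is the expiration time in seconds since the Unix epoch;
    S3 rejects the request once that time has passed.
    """
    # Canonicalized resource for a path-style request: /bucket/key
    resource = "/{}/{}".format(bucket, quote(key, safe="/"))

    # Verb, empty Content-MD5, empty Content-Type, expiry, resource.
    string_to_sign = "GET\n\n\n{}\n{}".format(expires_epoch, resource)

    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")

    return ("https://s3.amazonaws.com{}?AWSAccessKeyId={}&Expires={}&Signature={}"
            .format(resource,
                    quote(access_key_id, safe=""),
                    expires_epoch,
                    quote(signature, safe="")))
```

For example, `presign_s3_get("AKIAIOSFODNN7EXAMPLE", "not-a-real-secret", "some-bucket", "data/file.tar.gz", 1175139620)` yields a URL that a browser could fetch directly until the expiry time. Because the platform would generate each URL on request, it could require an email address and EULA acceptance first and log every URL it issues.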
EBS snapshot ACL
...
The Pacific Northwest Gigapop is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at Pacific Wave International Peering Exchange. See Participant Services for more information.
...