...

  • Sage currently has two data sets stored as "AWS Public Datasets" in the US West Region.
  • Users can discover them by browsing public datasets on http://aws.amazon.com/datasets/Biology?browse=1 and also via the platform.
  • Users can use them for cloud computation by spinning up EC2 instances and mounting the data as EBS volumes.
  • Users cannot directly download data from these public datasets, but once they have them mounted on an EC2 host, they can scp them to their local system.
  • Users are not forced to sign a Sage-specified EULA prior to access, because they can bypass the platform and access this data directly via normal AWS mechanisms.
  • Users must have an AWS account to access this data.
  • There is no mechanism to grant access. All users with AWS accounts are granted access by default.
  • There is no mechanism to keep an audit log for downloads or other usage of this data.
  • Users pay for access by paying their own EC2 and bandwidth charges.
  • The cost of hosting is free.

Future Scenario:

  • AWS Public Datasets are currently EBS only, but they will also be available for S3 in the future.
  • TODO ask Deepak what other plans they have in mind for the re-launch of AWS Public Datasets.
  • TODO tell Deepak our suggested features for AWS Public Datasets.

...

http://docs.amazonwebservices.com/AmazonS3/2006-03-01/dev/index.html?UsingDevPay.html

S3

Skipping a description of public data on S3 because the scenario is very straightforward: if you have the URL, you can download the resource. For example: http://s3.amazonaws.com/nicole.deflaux/ElasticMapReduceFun/mapper.R
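As a minimal sketch of the public case, the URL can be built from a bucket name and object key and fetched with the Python standard library. The helper names `public_s3_url` and `download_public_object` are hypothetical, and the download only works if the object is still publicly readable.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def public_s3_url(bucket: str, key: str) -> str:
    """Build the path-style URL of a publicly readable S3 object."""
    # Keep "/" unescaped so key prefixes stay readable in the URL.
    return f"http://s3.amazonaws.com/{bucket}/{quote(key, safe='/')}"


def download_public_object(bucket: str, key: str, dest: str) -> str:
    """Fetch a public object to a local path (needs network access)."""
    path, _headers = urlretrieve(public_s3_url(bucket, key), dest)
    return path


# Rebuilds the example URL given above; no request is made here.
url = public_s3_url("nicole.deflaux", "ElasticMapReduceFun/mapper.R")
print(url)
```

No credentials or signing are involved, which is why this scenario needs no further design.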

Protected Data Scenario:

  • All Sage data is stored on S3 and is not public.
  • Users can only discover what data is available via the platform.
  • Users can use the data for cloud computation by spinning up EC2 instances and downloading the files from S3 to the hard drive of their EC2 instance. See below for more details on this.
  • Users can download the data from S3 to their local system. See below for more details on this.
  • The platform directs users to sign a Sage-specified EULA prior to gaining access to these files in S3.
  • Users must have a Sage platform account to access this data for download.
  • The platform grants access to this data. See below for details.
  • S3 logs all access to resources and this could serve as an audit log
    • TODO list what info is logged
  • See proposals below regarding how users might pay for usage.
  • The cost of hosting is not free.
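To illustrate how S3 access logs could serve as an audit log, here is a rough sketch that splits one server access log record into named fields. The field list reflects the leading fields of the S3 server access log format as we understand it and should be checked against the current AWS documentation; the sample record, bucket name, and IDs are made up for illustration.

```python
import re

# Leading fields of an S3 server access log record, in order. This list is an
# assumption to verify against the AWS docs; newer records append more fields.
FIELDS = [
    "bucket_owner", "bucket", "time", "remote_ip", "requester",
    "request_id", "operation", "key", "request_uri", "http_status",
    "error_code", "bytes_sent", "object_size", "total_time_ms",
    "turnaround_time_ms", "referrer", "user_agent", "version_id",
]

# A token is a [bracketed] timestamp, a "quoted" string, or a run of non-blanks.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_log_line(line: str) -> dict:
    """Map the leading tokens of one log record onto FIELDS."""
    tokens = []
    for tok in TOKEN.findall(line):
        # Drop the surrounding brackets or quotes, if present.
        if len(tok) >= 2 and tok[0] + tok[-1] in ("[]", '""'):
            tok = tok[1:-1]
        tokens.append(tok)
    return dict(zip(FIELDS, tokens))


# Hypothetical record, not taken from a real log.
SAMPLE_LINE = (
    'EXAMPLEOWNERID sage-example-bucket [06/Feb/2011:00:00:38 +0000] '
    '192.0.2.3 EXAMPLEREQUESTERID 3E57427F3EXAMPLE REST.GET.OBJECT '
    'dataset/file.tsv "GET /sage-example-bucket/dataset/file.tsv HTTP/1.1" '
    '200 - 1048576 1048576 70 10 "-" "curl/7.21.0" -'
)

record = parse_access_log_line(SAMPLE_LINE)
print(record["requester"], record["operation"], record["key"])
```

For auditing, the requester, operation, key, and time fields are the interesting ones. Note that this simple tokenizer does not handle quoted values that themselves contain escaped quotes.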

Resources:

S3 ACL

This is ruled out for protected data because ACLs can have a maximum of 100 grants, and it appears that these grants cannot be made to groups such as groups of arbitrary AWS users.

Open Question:

  • Confirm that grants do not apply to groups of AWS users.

Resources:

S3 Bucket Policies


This is ruled out for protected data because

Resources: http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?UsingBucketPolicies.html

S3 and IAM

With IAM, a group of users can be granted access to S3 resources. This will be helpful for managing access for Sage system administrators and Sage employees.

This is ruled out for protected data because IAM is used for managing groups of users all under a particular AWS bill (e.g., all employees of a company).
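For the administrator and employee case, a read-only policy that could be attached to an IAM group might look like the sketch below. The bucket name `sage-example-bucket` and the helper `read_only_bucket_policy` are hypothetical, and the policy is an illustration of the IAM policy document shape rather than a reviewed access policy.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document allowing list and read on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


policy = read_only_bucket_policy("sage-example-bucket")
print(json.dumps(policy, indent=2))
```

Because the policy is attached to a group inside one AWS account, it names no principal, which matches the single-bill assumption discussed above.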

Open Questions:

  • Is there a cap on the number of users for IAM?

...

  • Confirm that IAM is only intended for managing groups and users where the base assumption is that all activity rolls up to a single AWS bill.

Resources: http://docs.amazonwebservices.com/IAM/latest/UserGuide/index.html?UsingWithS3.html

S3 Pre-Signed URLs for Private Content

Pros:

Cons:

Resources:

  • "Query String Request Authentication Alternative: You can authenticate certain types of requests by passing the required information as query-string parameters instead of using the Authorization HTTP header. This is useful for enabling direct third-party browser access to your private Amazon S3 data, without proxying the request. The idea is to construct a "pre-signed" request and encode it as a URL that an end-user's browser can retrieve. Additionally, you can limit a pre-signed request by specifying an expiration time."
  • http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?RESTAuthentication.html
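The query-string authentication quoted above can be sketched with the Python standard library. This follows the older HMAC-SHA1 signing scheme (Signature Version 2) described in that guide; newer regions require Signature Version 4, and in practice an AWS SDK would normally generate the URL. The function name `presign_get_url`, the bucket, and the key are hypothetical, and the credentials are placeholder values, not real keys.

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote


def presign_get_url(access_key, secret_key, bucket, key, expires_at):
    """Return a pre-signed GET URL valid until expires_at (epoch seconds)."""
    resource = f"/{bucket}/{quote(key, safe='/')}"
    # StringToSign for a plain GET: verb, empty Content-MD5, empty
    # Content-Type, expiry time, then the resource (no x-amz- headers).
    string_to_sign = f"GET\n\n\n{expires_at}\n{resource}"
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    # Base64 output may contain "+", "/" and "=", so URL-encode all of it.
    signature = quote(base64.b64encode(digest).decode("ascii"), safe="")
    return (f"http://s3.amazonaws.com{resource}"
            f"?AWSAccessKeyId={quote(access_key, safe='')}"
            f"&Expires={expires_at}&Signature={signature}")


# Placeholder credentials and a hypothetical object; link expires in one hour.
url = presign_get_url("AKIAIOSFODNN7EXAMPLE",
                      "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
                      "sage-example-bucket", "dataset/file.tsv",
                      int(time.time()) + 3600)
print(url)
```

The platform could hand such a URL to a user only after the EULA has been signed, and the expiry limits how long the link can be passed around.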

EBS snapshot ACL

http://docs.amazonwebservices.com/AWSEC2/latest/APIReference/index.html?ApiReference-query-ModifySnapshotAttribute.html

...

The Pacific Northwest Gigapop (PNWGP) is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at Pacific Wave International Peering Exchange. See Participant Services for more information.

...