...
- All Sage data is stored on S3 and is not public.
- Users can discover what data is available only via the platform.
- Users can use the data for cloud computation by spinning up EC2 instances and downloading the files from S3 to the hard drive of their EC2 instance.
- Users can download the data from S3 to their local system.
- The platform directs users to sign a Sage-specified EULA prior to gaining access to these files in S3.
- Users must have a Sage platform account to access this data for download. They may need an AWS account for the cloud computation use case depending upon the mechanism we use to grant access.
- The platform grants access to this data. See below for details about the various ways we might do this.
- The platform will write to the audit log each time it grants access. S3 can also be configured to log all access to resources and this could serve as a means of intrusion detection.
- These two types of logs will record different events (granting access vs. using access), so there will not be a strict 1-to-1 mapping between their entries, but the overlap should be substantial.
- The platform can store anything it likes in its audit log.
- The S3 log stores normal web access log type data with the following identifiable fields:
- client IP address is available in the log
- "anonymous" or the user's AWS canonical user ID will appear in the log
- We can try appending an extra query parameter to the S3 URL to help us line it up with audit log entries.
- See proposals below regarding how users might pay for usage.
- Hosting is not free.
- Storage fees will apply.
- Bandwidth fees apply when data is uploaded.
- Data can also be shipped via hard drives and AWS Import fees would apply.
- Bandwidth fees apply when data is downloaded out of AWS. There is no charge when it is downloaded inside AWS (e.g., to an EC2 instance).
- These same fees apply to any S3 log data we keep.
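As a sketch of how the extra query parameter idea above could let us line the two logs up: a hypothetical "grant-token" parameter appended at grant time would survive into the request-URI field of the S3 access log and could be joined back to the platform's audit log. The parameter name, log shapes, and entries below are all illustrative, not the real formats.

```python
import re

# Hypothetical entries the platform might write each time it grants access:
# (token appended to the S3 URL, platform user, S3 key granted).
audit_log = [
    ("tok-001", "jdoe", "dataset1/file.tar.gz"),
    ("tok-002", "asmith", "dataset2/file.tar.gz"),
]

# Simplified S3 server access log lines; real entries carry many more fields
# (bucket owner, time, canonical user id, request id, ...), but the
# request-URI field would preserve our query parameter.
s3_log = [
    '192.0.2.10 "GET /dataset1/file.tar.gz?grant-token=tok-001 HTTP/1.1" 200',
    '198.51.100.7 "GET /dataset2/file.tar.gz?grant-token=tok-002 HTTP/1.1" 200',
]

TOKEN_RE = re.compile(r"grant-token=([\w-]+)")

def correlate(audit_log, s3_log):
    """Map each platform user to the S3 log lines that used their grant token."""
    uses = {}
    for line in s3_log:
        m = TOKEN_RE.search(line)
        if m:
            uses.setdefault(m.group(1), []).append(line)
    return {user: uses.get(token, []) for token, user, _ in audit_log}
```

An empty list for a user would mean access was granted but never exercised; conversely, S3 log lines whose token matches no audit entry would be a candidate intrusion signal.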
...
S3 "Requester Pays" Buckets
Scenario:
- The platform requires that users give us their AWS account number for download use cases.
In this scenario the requester's AWS account would be charged for any download bandwidth incurred. Currently assuming we would use this in combination with a bucket policy or IAM.
TODO try this out
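As a rough sketch of the bucket-policy variant: a policy statement granting s3:GetObject to one approved user's AWS account would look roughly like the following. The account number and bucket name are placeholders, and whether a bucket policy composes cleanly with Requester Pays is part of what the TODO would verify.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDownloadForApprovedUser",
      "Effect": "Allow",
      "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::example-sage-data-bucket/*"
    }
  ]
}
```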
Open Questions:
- This statement from http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?RequesterPaysBucketConfiguration.html suggests this won't work with S3 pre-signed URLs: "Bucket owners who give out pre-signed URLs should think twice before configuring a bucket to be Requester Pays, especially if the URL has a very long expiry. The bucket owner is charged each time the requester uses pre-signed URLs that use the bucket owner's credentials."
- But http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?ObjectsinRequesterPaysBuckets.html says "For signed URLs, include x-amz-request-payer=requester in the request". So is it correct that the bucket owner cannot make signed URLs for other payers because the credentials won't match?
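To make the second document's suggestion concrete, here is a minimal stdlib-only sketch of appending x-amz-request-payer=requester to an already pre-signed URL. Note the caveat that makes the open question matter: a parameter appended after signing is not covered by the signature, so unless S3 treats this parameter specially or it was included at signing time, the request may simply be rejected.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def mark_requester_pays(presigned_url):
    """Append x-amz-request-payer=requester to a pre-signed S3 URL.

    Caveat: the signature embedded in the URL was computed without this
    parameter, so this only works if S3 accepts the parameter outside the
    signed portion of the request -- exactly the open question above.
    """
    parts = urlparse(presigned_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("x-amz-request-payer", "requester"))
    return urlunparse(parts._replace(query=urlencode(query)))

# Example with a fake pre-signed URL (credentials and signature made up):
url = mark_requester_pays(
    "https://example-bucket.s3.amazonaws.com/dataset1/file.tar.gz"
    "?AWSAccessKeyId=AKIDEXAMPLE&Expires=1300000000&Signature=abc123"
)
```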
...
The Pacific Northwest Gigapop is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
The PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at the Pacific Wave International Peering Exchange. See Participant Services for more information.
...