...

  • All Sage data is stored on S3 and is not public.
  • Users can only discover what data is available via the platform.
  • Users can use the data for cloud computation by spinning up EC2 instances and downloading the files from S3 to the hard drive of their EC2 instance.
  • Users can download the data from S3 to their local system.
  • The platform directs users to sign a Sage-specified EULA prior to gaining access to these files in S3.
  • Users must have a Sage platform account to access this data for download.  They may need an AWS account for the cloud computation use case depending upon the mechanism we use to grant access.
  • The platform grants access to this data. See below for details about the various ways we might do this.
  • The platform will write to the audit log each time it grants access. S3 can also be configured to log all access to resources and this could serve as a means of intrusion detection.  
    • These two types of logs will have log entries about different events (granting access vs. using access) so they will not have a strict 1-to-1 mapping between entries but should have a substantial overlap.  
    • The platform can store anything it likes in its audit log.  
    • The S3 log stores normal web access log type data with the following identifiable fields:
      • client IP address is available in the log
      • "anonymous" or the user's AWS canonical user ID will appear in the log
    • We can try appending an extra query parameter to the S3 URL to help us match S3 log entries with audit log entries.
  • See proposals below regarding how users might pay for usage.
  • Hosting is not free.
    • Storage fees will apply.
    • Bandwidth fees apply when data is uploaded.
    • Data can also be shipped via hard drives and AWS Import fees would apply.
    • Bandwidth fees apply when data is downloaded out of AWS. There is no charge when it is downloaded inside AWS (e.g., to an EC2 instance).
    • These same fees apply to any S3 log data we keep.
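
The idea of appending a query parameter to the S3 URL so that S3 log entries can be matched with audit log entries could be sketched as follows. This is a minimal illustration, not the platform's implementation: the parameter name x-audit-id and both helper functions are hypothetical. To our understanding, S3 ignores unrecognized query parameters that begin with "x-" while still recording the full request URI in its access log, but this should be verified. If the URL is pre-signed, the parameter may need to be added before signing.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter name; chosen with an "x-" prefix on the assumption
# that S3 ignores such parameters but still writes them to its access log.
AUDIT_PARAM = "x-audit-id"


def tag_url_with_audit_id(url, audit_id=None):
    """Return (tagged_url, audit_id), appending the audit id as a query parameter."""
    audit_id = audit_id or uuid.uuid4().hex
    parts = urlsplit(url)
    # Preserve any query parameters already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((AUDIT_PARAM, audit_id))
    tagged = urlunsplit(parts._replace(query=urlencode(query)))
    return tagged, audit_id


def audit_id_from_logged_uri(request_uri):
    """Extract the audit id from a logged request URI, or None if it is absent."""
    query = urlsplit(request_uri).query
    return dict(parse_qsl(query)).get(AUDIT_PARAM)
```

The platform would write the returned audit id into its own audit log entry when it grants access. A later pass over the S3 log could call audit_id_from_logged_uri on each request URI to find the matching grant.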

...

S3 "Requester Pays" Buckets

Scenario:

  • The platform requires that users give us their AWS account number for download use cases.

In this scenario, any download bandwidth charges would be billed to the requester's AWS account. We currently assume we would use this in combination with a bucket policy or IAM.
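
As a starting point for trying this out, the bucket policy half of this scenario might look like the sketch below. It builds a policy document that lets a list of user-supplied AWS account numbers read objects. The function name, the Sid, and the bucket name in the usage note are hypothetical. Enabling Requester Pays is a separate bucket-level setting not shown here, and requesters would also need to send the x-amz-request-payer header with each request.

```python
import json


def requester_pays_read_policy(bucket, account_ids):
    """Build a bucket policy dict granting s3:GetObject to the given AWS accounts."""
    principals = []
    for account_id in account_ids:
        # AWS account numbers are 12-digit strings; reject anything else early.
        if not (len(account_id) == 12 and account_id.isdigit()):
            raise ValueError(f"not a 12-digit AWS account number: {account_id!r}")
        principals.append(f"arn:aws:iam::{account_id}:root")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListedAccountsToRead",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Header a requester sends to acknowledge that their account will be charged.
REQUESTER_PAYS_HEADER = {"x-amz-request-payer": "requester"}


def policy_as_json(bucket, account_ids):
    """Serialize the policy in the JSON form that S3 expects."""
    return json.dumps(requester_pays_read_policy(bucket, account_ids), indent=2)
```

For example, policy_as_json("example-sage-data", ["123456789012"]) yields a document that could be attached to the bucket. Anonymous requests are not permitted against Requester Pays buckets, which fits the requirement that users give us an AWS account number.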

TODO try this out

Open Questions:

...

The Pacific Northwest Gigapop is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at the Pacific Wave International Peering Exchange. See Participant Services for more information.

...