...

Code Block
Class Location
    String provider // AWS, Google, Azure, Sage cluster; people will want to set a preferred cloud to work in
    String type     // filepath, download URL, S3 URL, EBS snapshot name
    String location // the actual URI or path
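
As an illustration of how a preferred cloud could be honored, here is a minimal Python sketch of the Location record above. The pick_location helper, its fallback rule, and the example URIs are assumptions made for illustration; they are not part of the original design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Location:
    provider: str  # e.g. "AWS", "Google", "Azure", "Sage cluster"
    type: str      # e.g. "filepath", "download url", "S3 url", "EBS snapshot name"
    location: str  # the actual URI or path


def pick_location(locations: Sequence[Location],
                  preferred_provider: str) -> Optional[Location]:
    """Return the first location hosted by the preferred provider.

    Hypothetical fallback rule: if the preferred provider has no copy,
    return the first known location, or None when the list is empty.
    """
    for loc in locations:
        if loc.provider == preferred_provider:
            return loc
    return locations[0] if locations else None


# Hypothetical copies of one dataset in two places.
copies = [
    Location("Sage cluster", "filepath", "/work/datasets/example.tar"),
    Location("AWS", "S3 url", "s3://example-bucket/example.tar"),
]
chosen = pick_location(copies, "AWS")
```

Keeping the provider as a separate field, rather than parsing it out of the URI, is what makes this kind of preference lookup a one-line comparison.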

...

  • metadata
    • how to ensure we have metadata for everything we store in the cloud
  • file formats
    • tar archives or individual files on S3?
    • EBS block devices per dataset?
  • file layout
    • how to organize what we have
    • how can we enforce a clean layout for files and EBS volumes?
    • how to keep track of what we have
  • access patterns
    • we want to make the right thing be the easy thing - make it easy to do computation in the cloud
    • file download will be supported but will not be the recommended use case
    • recommendations and examples from the R prompt for interacting with the data when working on EC2
  • security
    • not all data is public
    • encryption or clear text?
      • key management
    • one-time URLs?
    • intrusion detection
    • how to manage ACLs and bucket policies
      • are there scalability upper bounds on ACLs? e.g., can't add more than X AWS accounts to an ACL
  • auditability
    • how to have audit logs
    • how to download them and make use of them
  • human data and regulations
    • what recommendations do we make to people who get some data from Sage and some data from dbGaP and commingle that data in the cloud?
  • monitoring - what should be monitored
    • access patterns
    • who
    • when
    • what
    • how much
      • data footprint
      • upload bandwidth
      • download bandwidth
      • archive unused data to cheaper storage
  • cost
    • read vs. write
    • cost of allowing writes
    • cost of keeping same data in multiple formats
    • can we take advantage of the free hosting for http://aws.amazon.com/datasets even though we want to keep an audit log?
  • operations
    • how to make it efficient to manage
    • reduce the burden of administrative tasks
    • how to enable multiple administrators
  • how long does it take to get files up/down?
    • upload speeds - we are on National LambdaRail
    • shipping hard drives
  • durability
    • data corruption
    • data loss

...

The Pacific Northwest Gigapop (PNWGP) is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at the Pacific Wave International Peering Exchange. See Participant Services for more information.
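
To put the 10 GbE figure in context, the following rough sketch estimates how long an upload would take at a given link rate. The efficiency factor and the 1 TB example size are assumptions; real throughput depends on every hop between us and the cloud provider and will usually be well below the nominal link rate.

```python
def transfer_hours(size_bytes: float,
                   link_bits_per_sec: float,
                   efficiency: float = 1.0) -> float:
    """Estimate transfer time in hours.

    `efficiency` is the assumed fraction of the nominal link rate actually
    achieved end to end (1.0 is the theoretical best case).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_sec * efficiency)
    return seconds / 3600


TEN_GBE = 10e9   # nominal 10 GbE rate in bits per second
ONE_TB = 1e12    # hypothetical dataset size in bytes (decimal terabyte)

best_case = transfer_hours(ONE_TB, TEN_GBE)         # 800 seconds, about 0.22 hours
half_rate = transfer_hours(ONE_TB, TEN_GBE, 0.5)    # 1600 seconds, about 0.44 hours
```

Running the same calculation with a measured upload rate would show at what dataset size shipping hard drives becomes the faster option.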

...