...
Code Block |
---|
class Location {
    String provider; // AWS, Google, Azure, Sage cluster – people will want to set a preferred cloud to work in
    String type;     // filepath, download URL, S3 URL, EBS snapshot name
    String location; // the actual URI or path
}
|
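As a rough illustration of how the provider field could support a preferred cloud, here is a minimal sketch. The `LocationPicker` helper, its fallback rule, and any paths passed to it are hypothetical and not part of the design above; only the three fields come from the notes.

```java
import java.util.List;
import java.util.Optional;

// Sketch only: mirrors the three fields of the Location class in these notes.
final class Location {
    final String provider; // e.g. "AWS", "Google", "Azure", "Sage cluster"
    final String type;     // e.g. "filepath", "download url", "S3 url", "EBS snapshot name"
    final String location; // the actual URI or path

    Location(String provider, String type, String location) {
        this.provider = provider;
        this.type = type;
        this.location = location;
    }
}

// Hypothetical helper, not part of the original design.
final class LocationPicker {
    // Returns the first location hosted by the preferred provider
    // (case-insensitive match), or falls back to the first location
    // of any provider. Empty if the list itself is empty.
    static Optional<Location> pick(List<Location> locations, String preferredProvider) {
        for (Location candidate : locations) {
            if (candidate.provider.equalsIgnoreCase(preferredProvider)) {
                return Optional.of(candidate);
            }
        }
        return locations.stream().findFirst();
    }
}
```

Falling back to any available copy is one possible policy; a stricter variant could return empty when the preferred provider has no copy.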
...
- metadata
- how to ensure we have metadata for all stuff in the cloud
- file formats
- tar archives or individual files on S3?
- EBS block devices per dataset?
- file layout
- how to organize what we have
- how can we enforce a clean layout for files and EBS volumes?
- how to keep track of what we have
- access patterns
- we want to make the right thing be the easy thing - make it easy to do computation in the cloud
- file download will be supported but will not be the recommended use case
- recommendations and examples from the R prompt for interacting with the data when working on EC2
- security
- not all data is public
- encryption or clear text?
- key management
- one time urls?
- intrusion detection
- how to manage ACLs and bucket policies
- are there scalability upper bounds on ACLs? e.g., can't add more than X AWS accounts to an ACL
- auditability
- how to have audit logs
- how to download them and make use of them
- human data and regulations
- what recommendations do we make to people who get some data from Sage and some data from dbGaP and commingle that data in the cloud?
- monitoring - what should be monitored
- access patterns
- who
- when
- what
- how much
- data footprint
- upload bandwidth
- download bandwidth
- archive unused data to cheaper storage
- cost
- read vs. write
- cost of allowing writes
- cost of keeping same data in multiple formats
- can we take advantage of the free hosting for http://aws.amazon.com/datasets even though we want to keep an audit log?
- operations
- how to make it efficient to manage
- reduce the burden of administrative tasks
- how to enable multiple administrators
- how long does it take to get files up/down?
- upload speeds - we are on National LambdaRail
- shipping hard drives
- durability
- data corruption
- data loss
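
For the data corruption item, one common approach is to record a checksum when a file is uploaded and recompute it after download or at rest. The notes above do not choose a method, so the sketch below is only an assumption: it uses SHA-256 from the Java standard library, and the `ChecksumCheck` class name is hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for detecting corrupted copies of a file.
final class ChecksumCheck {
    // Returns the lowercase hex-encoded SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // True when the data still matches the checksum recorded at upload time.
    static boolean matches(byte[] data, String expectedHex) {
        return sha256Hex(data).equalsIgnoreCase(expectedHex);
    }
}
```

For large files the digest would be fed in chunks from a stream rather than from a single byte array; the comparison step stays the same.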
...
The Pacific Northwest Gigapop is the point of presence for the Internet2/Abilene network in the Pacific Northwest. The PNWGP is connected to the Abilene backbone via a 10 GbE link. In turn, the Abilene Seattle node is connected via OC-192 links to both Sunnyvale, California and Denver, Colorado.
PNWGP offers two types of Internet2/Abilene interconnects: Internet2/Abilene transit services and Internet2/Abilene peering at Pacific Wave International Peering Exchange. See Participant Services for more information.
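
To put the 10 GbE figure in context for upload planning, transfer time can be estimated as size in bits divided by achieved bandwidth. The sketch below is a back-of-the-envelope calculation only: the `TransferEstimate` class, the utilization parameter, and any dataset sizes are illustrative assumptions, and real end-to-end throughput to a cloud provider will usually be limited by other hops.

```java
// Hypothetical estimator; not a measurement of any real link.
final class TransferEstimate {
    // Seconds needed to move `bytes` over a link rated at `gigabitsPerSecond`,
    // of which only the fraction `utilization` (0 < utilization <= 1) is achieved.
    static double seconds(long bytes, double gigabitsPerSecond, double utilization) {
        if (bytes < 0 || gigabitsPerSecond <= 0 || utilization <= 0 || utilization > 1) {
            throw new IllegalArgumentException("invalid transfer parameters");
        }
        double bitsPerSecond = gigabitsPerSecond * 1e9 * utilization;
        return (bytes * 8.0) / bitsPerSecond;
    }

    // Same estimate expressed in hours.
    static double hours(long bytes, double gigabitsPerSecond, double utilization) {
        return seconds(bytes, gigabitsPerSecond, utilization) / 3600.0;
    }
}
```

Under these assumptions, 1 TB (10^12 bytes) at a full 10 Gb/s takes 800 seconds, and twice that at 50% utilization. Comparing such estimates against courier time is one way to decide when shipping hard drives is worthwhile.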
...