Context
- Sage scientific expertise
- Sage expertise in data-intensive biological analysis and interest in initiating a large-scale sequencing analysis program.
- Google engineering expertise
- Google tools that could enable large-scale (terabyte- or petabyte-scale) data query and analysis.
Once we have started, we will think hard about the scientific and engineering challenges. I am confident that if the data is transparently queryable, we can make a "go" decision and commit to pursuing the collaboration. Data access, however, is a major challenge, given regulatory hurdles (e.g. dbGaP) and the difficulty of organizing the data. Therefore, I would like to focus this hour on determining, in practical terms, whether we can interact with SRA data through Google tools, which will allow us to decide on next steps.
Meeting objectives:
- What data can we access?
- How do we handle dbGaP-type access restrictions if we want to do meta-analysis across all of SRA?
- How do we access data?
- What formats is the data in?
- How is the data organized, including sample metadata annotations, and how can we interact with it?
- Is it possible to drill down to something akin to a schema description of how the data are organized and begin exploring the database?
...