Context

  1. Sage scientific expertise
    1. Sage expertise in data-intensive biological analysis and interest in initiating a large-scale sequencing analysis program.
  2. Google engineering expertise
    1. Google tools that could enable large-scale (terabyte or petabyte) data query and analysis.

Once we get started we will think hard about the scientific and engineering challenges. I am confident that if the data is transparently queryable we can make a "go" decision and commit to pursuing the collaboration. However, data access is a major challenge given regulatory hurdles (e.g. dbGAP) and the difficulty of organizing the data. Therefore, I would like to focus this hour on practically determining our ability to interact with SRA data through Google tools, which will allow us to decide on next steps.

Meeting objectives:

  1. What data can we access?
  2. How do we handle dbGAP-type access restrictions if we want to do meta-analysis across all of SRA?
  3. How do we access data?
  4. What formats are data in?
  5. How is the data organized, including the sample-annotation metadata, and how can we interact with it?
  6. Is it possible to drill down to something akin to a schema description of how the data are organized and begin exploring the database?

...