For example, if you are sharing RNA-seq data, the raw data would be the FASTQ files (.fastq.gz), the processed data would be the aligned reads (.bam) or gene count tables, and differential expression analyses and volcano plots would be considered results. This distinction is well defined for many types of data, but it may be less clear for assays we encounter less often. "Results" might also be acceptable for assays that do not lend themselves to re-analysis, such as Western blotting. We can work with you to figure this out.

A rubric for determining which datasets are key data

For the purposes of this portal, we define key data as data that, when shared in a raw or semi-processed format, is of sufficient size or complexity OR can be combined with similar data, such that it can be mined for additional insights. For example, a single Western blot image is typically not key data: it can answer only a handful of questions, usually all related to the assayed protein, and it is difficult to combine with many other Western blots to create a resource that can be mined. On the other hand, a collection of five whole-slide images of patient tumor sections would likely be key data, because many different questions could potentially be asked of it.

As a rough rule of thumb, ask yourself: if I were not doing this experiment myself, would I still want access to the raw data, to combine it with other data or to ask my own questions of it? Or would a figure in a publication suffice? If the former, it’s probably key data. If the latter, it’s probably optional.

Key datasets generally fulfill at least one of the following criteria.

  1. The dataset was generated using high-throughput methods that output raw data in a widely used, standardized format, and it includes more than one or two samples. See the table below for examples!

  2. The dataset is considered validation data for a new method being developed under the funded grant.

  3. The dataset is specifically deemed of interest by the investigator for some other reason, e.g., because the data are particularly unique or cannot be recreated.

  4. The dataset is specifically deemed of interest by the funder for some other reason.

In addition to key datasets, you might consider sharing “optional” data for reasons such as archiving or meeting publication requirements (e.g., some journals require scans of original Western blot films).

What data are required, and how should you format it?

Please note: this requirements table includes many common experimental data types, but you may be generating different or novel types of data that are not listed here. If you do not see your type of data mentioned, please don’t hesitate to reach out and ask us for a recommendation.
