...
For the purposes of this portal, we define key data as data that, when shared in a raw or semi-processed format, is of sufficient size or complexity, or can be combined with similar data, such that it can be mined for additional insights or knowledge beyond the primary research question.
For example, a single Western blot image is typically not key data, because it can answer only a handful of questions, typically all related to the protein that was assayed, and it is difficult to combine it with many other Western blots to create a resource that can be mined. On the other hand, a collection of five whole slide images of patient tumor sections would likely be key data, because many questions could be asked of the data that were not examined in the study that generated it.
As a rough rule of thumb, you might ask yourself: if I were not doing this experiment myself, would I still want access to the raw data to combine it with other data or to ask my own questions about it? Or would a figure in a publication suffice? If the former, it's probably key data. If the latter, it's probably optional.
...
Data type | Requirement | Levels [a] | Format | Notes |
---|---|---|---|---|
DNA | ||||
whole genome sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
whole exome sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
SNP microarray | required | raw AND processed | raw: CEL, IDAT, tsv (raw values per SNP); processed: tsv (genotypes per SNP) | |
immunosequencing | required | raw OR semi-processed | vendor-dependent, e.g. ImmunoSEQ and 10x Genomics formats | |
Sanger sequencing | optional | processed | ||
RNA expression | ||||
RNA sequencing (bulk) | required | (raw OR semi-processed) AND processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM; processed: counts matrices or quantification files | quantification files: e.g. the quant.sf files generated by Salmon-based RNA-seq workflows |
RNA sequencing (single-cell) | required | raw AND processed | raw: FASTQ; processed: h5ad/HDF5 format following the cellxgene required format | FASTQ files should be created from BCL files with a suitable conversion program. More documentation on formatting h5ad files can be found here. The h5ad format is a type of HDF5 file. |
gene expression microarray | required | raw AND processed | raw: CEL, IDAT, tsv (raw values per SNP, copy number, and loss of heterozygosity); processed: tsv (normalized values and purity/ploidy) | |
qPCR | optional | processed | csv/tsv (according to template) | |
methylation | ||||
ATAC sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
methylation array | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
bisulfite sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
protein | ||||
LC-MS | required | raw AND processed | raw: mzML; processed: protein intensities (csv/tsv) | |
western blot | optional | processed | densitometry output (csv/tsv) | |
plate-based ELISA | optional | raw | plate reader output (csv/tsv) | |
protein/peptide microarrays | required | processed | label-free quantification matrix (csv/tsv) | |
metabolomics | ||||
LC-MS | required | raw AND processed | raw: mzML or vendor-dependent format; processed: metabolite intensities (csv/tsv) | |
clinical | ||||
structured clinical data | required | processed | csv/tsv or XML with metadata for each variable | key primary and secondary endpoints only |
EEG | required | raw | pending additional comments | |
clinical/imaging | ||||
MRI or other radiological image | required | raw | DICOM, NIfTI, MINC | |
imaging | ||||
immunohistochemistry | required | raw | OME-TIFF (preferred); at minimum, a Bio-Formats-compatible file format | |
immunofluorescence | required | raw | OME-TIFF (preferred); at minimum, a Bio-Formats-compatible file format | |
gross morphology photos (mice) | optional | raw | tiff, png, jpg | |
in vitro drug screening | ||||
plate-based cell viability assay | required | processed | csv/tsv (according to template) | |
other | ||||
flow cytometry | optional | raw | FCS with gating parameters | |
in vivo tumor growth experiments | optional | raw OR processed | csv/tsv (according to template); raw: tumor dimensions or other raw measurements; processed: calculated tumor volume/size | |
[a] Level nomenclature can be cross-referenced with the TCGA data levels (https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels), where 'raw' corresponds to Level 1 and 'semi-processed' most closely corresponds to Level 2.
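
The OR/AND wording in the Levels column can be read as a small rule: every AND-joined group must be covered, and any one level within an OR-joined group covers it. The sketch below is a minimal illustration of that reading, not a portal tool. The names `REQUIREMENTS` and `missing_levels` are hypothetical, only a few rows of the table are encoded, and the bulk RNA sequencing row is assumed to mean "(raw OR semi-processed) AND processed".

```python
# Hypothetical encoding of a few table rows (illustrative only, not a portal API).
# Each entry: (requirement, list of AND-joined groups); any level in a group satisfies it.
REQUIREMENTS = {
    "whole genome sequencing": ("required", [{"raw", "semi-processed"}]),
    "SNP microarray": ("required", [{"raw"}, {"processed"}]),
    # Assumption: read as (raw OR semi-processed) AND processed.
    "RNA sequencing (bulk)": ("required", [{"raw", "semi-processed"}, {"processed"}]),
    "western blot": ("optional", [{"processed"}]),
}


def missing_levels(data_type, provided_levels):
    """Return the level groups still missing for a data type.

    An empty list means the requirement is met. Optional data types
    never report anything as missing.
    """
    requirement, groups = REQUIREMENTS[data_type]
    if requirement == "optional":
        return []
    provided = set(provided_levels)
    # A group is unmet when none of its alternative levels was provided.
    return [sorted(group) for group in groups if not (group & provided)]


if __name__ == "__main__":
    for data_type, levels in [
        ("whole genome sequencing", {"semi-processed"}),
        ("RNA sequencing (bulk)", {"raw"}),
        ("SNP microarray", set()),
    ]:
        print(data_type, "->", missing_levels(data_type, levels))
```

Under this reading, a bulk RNA sequencing dataset shared only as FASTQ would still be missing its processed counts or quantification files, whereas whole genome sequencing shared only as an aligned BAM would be complete.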
...