...
| | Requirement | Levels<sup>a</sup> | Format | Notes |
---|---|---|---|---|
DNA | ||||
whole genome sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
whole exome sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
SNP microarray | required | raw AND processed | raw: CEL, IDAT, tsv (raw values per SNP); processed: tsv (genotypes per SNP) | |
immunosequencing | required | raw OR semi-processed | vendor-dependent, e.g. immunoSEQ and 10x Genomics formats | |
Sanger sequencing | optional | processed | ||
RNA expression | ||||
RNA sequencing (bulk) | required | raw OR semi-processed AND processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM; processed: counts matrices or quantification files | quantification files: e.g. the quant.sf files generated by Salmon-based RNA-seq workflows |
RNA sequencing (single-cell) | required | raw AND processed | raw: FASTQ; processed: h5ad (HDF5-based), following the format required by cellxgene | FASTQ files should be created from BCL files. The h5ad format is a type of HDF5 file; see the cellxgene documentation for details on formatting h5ad files. |
gene expression microarray | required | raw AND processed | raw: CEL, IDAT, tsv (raw values per SNP, copy number, and loss of heterozygosity); processed: tsv (normalized values and purity/ploidy) | |
qPCR | optional | processed | csv/tsv (according to template) | |
methylation | ||||
ATAC sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
methylation array | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
bisulfite sequencing | required | raw OR semi-processed | raw: FASTQ, unaligned BAM, CRAM; semi-processed: aligned BAM | |
protein | ||||
LC-MS | required | raw AND processed | raw: mzML; processed: protein intensities (csv/tsv) | |
western blot | optional | processed | densitometry output (csv/tsv) | |
plate-based ELISA | optional | raw | plate reader output (csv/tsv) | |
protein/peptide microarrays | required | processed | label-free quantification matrix (csv/tsv) | |
metabolomics | ||||
LC-MS | required | raw AND processed | raw: mzML or vendor-dependent format; processed: metabolite intensities (csv/tsv) | |
clinical | ||||
structured clinical data | required | processed | csv/tsv or XML with metadata for each variable | key primary and secondary endpoints only |
EEG | required | raw | pending additional comments | |
clinical/imaging | ||||
MRI or other radiological image | required | raw | DICOM, NIfTI, MINC | |
imaging | ||||
immunohistochemistry | required | raw | OME-TIFF (preferred); otherwise a Bio-Formats-compatible file format | |
immunofluorescence | required | raw | OME-TIFF (preferred); otherwise a Bio-Formats-compatible file format | |
gross morphology photos (mice) | optional | raw | tiff, png, jpg | |
in vitro drug screening | ||||
plate-based cell viability assay | required | processed | csv/tsv (according to template) | |
other | ||||
flow cytometry | optional | raw | FCS with gating parameters | |
in vivo tumor growth experiments | optional | raw OR processed | csv/tsv (according to template), where raw: tumor dimensions or other raw measurements; processed: calculated tumor volume/size | |
<sup>a</sup> Level nomenclature can be cross-referenced with https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels , where 'raw' corresponds to Level 1 and 'semi-processed' most closely corresponds to Level 2.
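
The Levels column combines data levels with OR and AND. As a rough illustration of how those combinations could be checked before a submission, the sketch below encodes three rows of the table in Python. The names `REQUIRED_LEVELS` and `missing_levels` are hypothetical and not part of any official tooling. Reading "raw OR semi-processed AND processed" as "(raw OR semi-processed) AND processed" is also an assumption, made here because the Format column lists a processed output alongside the raw and semi-processed ones.

```python
# Hypothetical encoding of three rows from the table above.
# Each assay maps to a list of acceptable alternatives; each alternative is
# the set of levels that must ALL be provided.
#   "raw OR semi-processed" -> [{"raw"}, {"semi-processed"}]
#   "raw AND processed"     -> [{"raw", "processed"}]
REQUIRED_LEVELS = {
    "whole genome sequencing": [{"raw"}, {"semi-processed"}],
    "SNP microarray": [{"raw", "processed"}],
    # Assumption: read as (raw OR semi-processed) AND processed.
    "RNA sequencing (bulk)": [
        {"raw", "processed"},
        {"semi-processed", "processed"},
    ],
}


def missing_levels(assay, provided):
    """Return the levels still missing for the closest acceptable alternative.

    An empty set means the provided levels satisfy the requirement.
    """
    provided = set(provided)
    # For every alternative, compute which of its levels are not yet provided.
    gaps = [alternative - provided for alternative in REQUIRED_LEVELS[assay]]
    # Report the alternative that needs the fewest additional levels.
    return min(gaps, key=len)


if __name__ == "__main__":
    print(missing_levels("SNP microarray", ["raw"]))  # {'processed'}
    print(missing_levels("whole genome sequencing", ["semi-processed"]))  # set()
```

Storing each requirement as a list of alternative level sets keeps OR and AND explicit, so adding another row of the table is a matter of adding one dictionary entry.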
...