
...

For the purposes of this portal, we define key data as data that, when shared in a raw or semi-processed format, is of sufficient size or complexity, or can be combined with similar data, such that it can be mined for additional insights beyond the primary research question.

For example, a single Western blot image is typically not key data, because it can be used to answer just a handful of questions, typically all related to the protein that was assayed, and it is difficult to combine this information with lots of other Western blots to create a resource that can be mined. On the other hand, a collection of 5 whole slide images of patient tumor sections would likely be key data, because there are lots of questions that could be asked of the data but were not examined in the study that generated the data.

As a rough rule of thumb, you might ask yourself: if I were not doing this experiment myself, would I still want access to the raw data to combine it with other data or to ask my own questions about the data? Or would a figure in a publication suffice? If the former, it’s probably key data. If the latter, it’s probably optional.

...

Dataset contains data generated using high-throughput methods that output raw data presented in a widely used systematic format, and has more than just one or two samples. See the table below for examples!
Dataset is considered to be validation data for a new method that is being developed under the funded grant.
Dataset is specifically deemed of interest by the investigator for some other reason, e.g. particularly unique data or data that cannot be recreated.
Dataset is specifically deemed of interest by the funder for some other reason.
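
As an illustration only, the criteria above can be sketched as a small check. The field names, the `MIN_SAMPLES` threshold (reading "more than just one or two samples" as at least three), and the assumption that meeting any single criterion is enough are ours, not part of the portal's policy.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset; field names are illustrative only."""
    high_throughput: bool = False          # generated using a high-throughput method
    systematic_format: bool = False        # raw data in a widely used systematic format
    n_samples: int = 0                     # number of samples in the dataset
    validates_new_method: bool = False     # validation data for a method developed in the grant
    flagged_by_investigator: bool = False  # deemed of interest by the investigator
    flagged_by_funder: bool = False        # deemed of interest by the funder


# Assumption: "more than just one or two samples" is read as at least 3.
MIN_SAMPLES = 3


def is_key_data(ds: Dataset) -> bool:
    """Return True if the dataset meets at least one criterion (assumed any-of)."""
    meets_throughput = (
        ds.high_throughput
        and ds.systematic_format
        and ds.n_samples >= MIN_SAMPLES
    )
    return (
        meets_throughput
        or ds.validates_new_method
        or ds.flagged_by_investigator
        or ds.flagged_by_funder
    )
```

For instance, `is_key_data(Dataset(high_throughput=True, systematic_format=True, n_samples=5))` evaluates to `True` under these assumptions, while the same dataset with only two samples does not qualify on that criterion alone.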

...

	Requirement	Levels^a	Format	Notes
DNA
whole genome sequencing	required	raw OR semi-processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM
whole exome sequencing	required	raw OR semi-processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM
SNP microarray	required	raw AND processed	raw: CEL, IDAT, tsv (raw values per SNP) | processed: tsv (genotypes per SNP)
immunosequencing	required	raw OR semi-processed	vendor-dependent, e.g. ImmunoSEQ and 10XGenomics formats
Sanger sequencing	optional	processed
RNA expression
RNA sequencing (bulk)	required	raw OR semi-processed AND processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM | processed: counts matrices or quantification files	quantification files: e.g. the quant.sf files generated by Salmon-based RNA-seq workflows
RNA sequencing (single-cell)	required	raw AND processed	raw: FASTQ | processed: h5ad (HDF5-based), following the format required by cellxgene	FASTQ files should be created from BCL files with a program like `cellranger mkfastq`. More documentation on formatting h5ad files can be found here. The h5ad format is a type of HDF5 file.
gene expression microarray	required	raw AND processed	raw: CEL, IDAT, tsv (raw values per SNP, copy number, and loss of heterozygosity) | processed: tsv (normalized values and purity/ploidy)
qPCR	optional	processed	csv/tsv (according to template)
methylation
ATAC sequencing	required	raw OR semi-processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM
methylation array	required	raw OR semi-processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM
bisulfite sequencing	required	raw OR semi-processed	raw: FASTQ, unaligned BAM, CRAM | semi-processed: aligned BAM
protein
LC-MS	required	raw AND processed	raw: mzML | processed: protein intensities (csv/tsv)	https://www.psidev.info/mzML
western blot	optional	processed	densitometry output (csv/tsv)
plate-based ELISA	optional	raw	plate reader output (csv/tsv)
protein/peptide microarrays	required	processed	label-free quantification matrix (csv/tsv)
metabolomics
LC-MS	required	raw AND processed	raw: mzML or vendor-dependent format | processed: metabolite intensities (csv/tsv)
clinical
structured clinical data	required	processed	csv/tsv or XML with metadata for each variable	key primary and secondary endpoints only
EEG	required	raw		pending additional comments
clinical/imaging
MRI or other radiological image	required	raw	DICOM, NIfTI, MINC
imaging
immunohistochemistry	required	raw	OME-TIFF (preferred), at least bio-formats compatible file format
immunofluorescence	required	raw	OME-TIFF (preferred), at least bio-formats compatible file format
gross morphology photos (mice)	optional	raw	tiff, png, jpg
in vitro drug screening
plate-based cell viability assay	required	processed	csv/tsv (according to template)
other
flow cytometry	optional	raw	FCS with gating parameters
in vivo tumor growth experiments	optional	raw OR processed	csv/tsv (according to template), where raw: tumor dimensions or other raw measurements | processed: calculated tumor volume/size
^a Level nomenclature can be cross-referenced with https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/data-levels, where 'raw' corresponds to Level 1 and 'semi-processed' most closely corresponds to Level 2.
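
To make the Levels column concrete, here is a minimal sketch of how a submission could be checked against it. The `REQUIREMENTS` layout and the `missing_levels` helper are hypothetical and only a few rows are transcribed. We also assume that "raw OR semi-processed AND processed" means (raw OR semi-processed) AND processed, which the table does not state explicitly.

```python
# A few rows transcribed from the table above. The layout is hypothetical:
# each assay maps to (requirement, list of level groups). A group is met when
# any one of its levels is submitted, and every group must be met.
REQUIREMENTS = {
    "whole genome sequencing": ("required", [{"raw", "semi-processed"}]),
    "SNP microarray": ("required", [{"raw"}, {"processed"}]),
    # Assumption: read as (raw OR semi-processed) AND processed.
    "RNA sequencing (bulk)": ("required", [{"raw", "semi-processed"}, {"processed"}]),
    "western blot": ("optional", [{"processed"}]),
}


def missing_levels(assay, submitted):
    """Return the level groups not yet covered by the submitted levels."""
    _, groups = REQUIREMENTS[assay]
    submitted = set(submitted)
    # A group is unmet when none of its acceptable levels was submitted.
    return [sorted(group) for group in groups if not (group & submitted)]
```

Under these assumptions, `missing_levels("RNA sequencing (bulk)", ["raw"])` reports that a processed file is still needed, while submitting both a semi-processed and a processed file leaves nothing missing.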

...
