Genomic Data Processing - Data Eligibility Criteria (v-1.0)

Data types eligible for processing

 Whole Exome Sequencing

A technique that sequences only the exons, the protein-coding regions of an individual's genome. Whole exome sequencing (WES) is used to identify genetic variations in these regions that may be associated with certain diseases or traits.

 Whole Genome Sequencing

A technique that sequences the entire genome of an individual, including both the coding and the non-coding regions of DNA. Whole genome sequencing (WGS) is used to identify genetic variations throughout the genome that may be associated with certain diseases or traits.

 Bulk RNA Sequencing

A technique that involves sequencing the transcriptome of a bulk population of cells. This technique provides information on the gene expression levels of all genes in the sample and is commonly used to compare gene expression profiles between different samples or conditions.

 Single Cell RNA Sequencing

A technique that involves sequencing the transcriptome of individual cells. This technique provides information on the gene expression levels of each cell and is commonly used to study cell heterogeneity and identify rare cell populations.

Required annotations for data files to be staged for processing

For each of the following assays, data files must be annotated with the terms listed below.

Bulk and Single Cell RNA Sequencing

| # | Annotation term | Additional details |
| --- | --- | --- |
| 1 | fileFormat | Accepted file formats are "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., they retain unmapped reads and have not been trimmed) are eligible for processing. |
| 2 | individualID | Individual IDs are needed to create the sample sheets. |
| 3 | specimenID | Specimen IDs are needed to interpret the analysis. |
| 4 | Assay | Choose either Bulk RNA Seq or Single Cell RNA Seq. |
| 5 | Species | The species is needed to select the corresponding reference genome. |
| 6 | libraryPreparationMethod | The name of the library preparation method, such as KAPA Hyper PCR 3. |
| 7 | Platform | The name of the sequencing platform used, for example, Illumina. |
| 8 | readPair | Specify whether the file contains read 1 or read 2 of the pair. |
| 9 | specimenPreparationMethod | Minimize RNA degradation with methods such as flash freezing or RNALater. FFPE is not recommended. |
| 10 | tumorType | If the tissue is normal, indicate "not applicable". Otherwise, specify the tumor type. |
| 11 | isStranded* | Either "yes" or "no". |
| 12 | readPairOrientation* | The read pair orientation, such as forward or reverse. |

* optional but recommended
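
The annotation requirements above can be checked before files are submitted for staging. The following is a minimal sketch only, assuming each file's annotations are available as a Python dictionary keyed by the term names in the table. The function names `check_annotations` and `missing_recommended`, and all example values, are hypothetical and are not part of any NF-OSI tooling.

```python
# Illustrative pre-submission check; names and example values are hypothetical.
REQUIRED_TERMS = [
    "fileFormat", "individualID", "specimenID", "Assay", "Species",
    "libraryPreparationMethod", "Platform", "readPair",
    "specimenPreparationMethod", "tumorType",
]
RECOMMENDED_TERMS = ["isStranded", "readPairOrientation"]  # marked * in the table
ACCEPTED_FORMATS = {"fastq", "bam", "cram"}


def check_annotations(annotations):
    """Return a list of problems found in one file's annotation dictionary."""
    problems = []
    for term in REQUIRED_TERMS:
        value = annotations.get(term)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required annotation: {term}")
    file_format = annotations.get("fileFormat")
    if file_format and str(file_format).lower() not in ACCEPTED_FORMATS:
        problems.append(f"unsupported fileFormat: {file_format}")
    return problems


def missing_recommended(annotations):
    """Return the optional-but-recommended terms that are absent or empty."""
    return [t for t in RECOMMENDED_TERMS if annotations.get(t) in (None, "")]


# Made-up example record for a single FASTQ file.
example = {
    "fileFormat": "fastq",
    "individualID": "patient_01",
    "specimenID": "patient_01_tumor_A",
    "Assay": "Bulk RNA Seq",
    "Species": "Human",
    "libraryPreparationMethod": "KAPA Hyper PCR 3",
    "Platform": "Illumina",
    "readPair": 1,
    "specimenPreparationMethod": "Flash frozen",
    "tumorType": "not applicable",
}

print(check_annotations(example))    # → []
print(missing_recommended(example))  # → ['isStranded', 'readPairOrientation']
```

This check covers only the presence of terms and the file format. It does not verify that a "bam" or "cram" file is unfiltered, which would require inspecting the file itself.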


Whole Genome Sequencing

| # | Annotation term | Additional details |
| --- | --- | --- |
| 1 | fileFormat | Accepted file formats are "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., they retain unmapped reads and have not been trimmed) are eligible for processing. |
| 2 | individualID | Individual IDs are needed to create the sample sheets. |
| 3 | specimenID | Specimen IDs are needed to interpret the analysis. |
| 4 | Assay | Indicate Whole Genome Sequencing. |
| 5 | Species | The species is needed to select the corresponding reference genome. |
| 6 | libraryPreparationMethod | The name of the library preparation method, such as KAPA Hyper PCR 3. |
| 7 | Platform | The name of the sequencing platform used, for example, Illumina. |
| 8 | readPair | Specify whether the file contains read 1 or read 2 of the pair. |
| 9 | specimenPreparationMethod | Minimize nucleic acid degradation with methods such as flash freezing or RNALater. FFPE is not recommended. |
| 10 | tumorType | If the tissue is normal, indicate "not applicable". Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs are not eligible for somatic variant calling or for microsatellite instability processing. |
| 11 | isStranded* | Either "yes" or "no". |
| 12 | readPairOrientation* | The read pair orientation, such as forward or reverse. |

* optional but recommended
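
The tumor-normal pairing note under tumorType can be pre-checked from the annotations alone. The sketch below assumes that pairing is decided per individualID, and that a normal specimen is any record whose tumorType is "not applicable"; both are assumptions made for illustration, and the function name `individuals_with_tumor_normal_pair` is hypothetical.

```python
from collections import defaultdict

# Value used in the tumorType annotation for normal tissue.
NORMAL_LABEL = "not applicable"


def individuals_with_tumor_normal_pair(records):
    """Return the set of individualIDs having both a tumor and a normal specimen.

    records: iterable of dicts with 'individualID' and 'tumorType' keys.
    Assumption: pairing is evaluated per individual, not per specimen.
    """
    seen = defaultdict(lambda: {"tumor": False, "normal": False})
    for rec in records:
        is_normal = rec["tumorType"].strip().lower() == NORMAL_LABEL
        seen[rec["individualID"]]["normal" if is_normal else "tumor"] = True
    return {ind for ind, kinds in seen.items() if kinds["tumor"] and kinds["normal"]}


# Made-up records: patient_01 has a pair, patient_02 has a tumor specimen only.
records = [
    {"individualID": "patient_01", "tumorType": "Plexiform Neurofibroma"},
    {"individualID": "patient_01", "tumorType": "not applicable"},
    {"individualID": "patient_02", "tumorType": "Plexiform Neurofibroma"},
]

print(individuals_with_tumor_normal_pair(records))  # → {'patient_01'}
```

Under these assumptions, files from individuals missing from the returned set would not qualify for somatic variant calling or microsatellite instability processing.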


Whole Exome Sequencing

| # | Annotation term | Additional details |
| --- | --- | --- |
| 1 | fileFormat | Accepted file formats are "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., they retain unmapped reads and have not been trimmed) are eligible for processing. Note: WES files are not eligible for variant calling if a BED file is not available. |
| 2 | individualID | Individual IDs are needed to create the sample sheets. |
| 3 | specimenID | Specimen IDs are needed to interpret the analysis. |
| 4 | Assay | Indicate Whole Exome Sequencing. |
| 5 | Species | The species is needed to select the corresponding reference genome. |
| 6 | libraryPreparationMethod | The name of the library preparation method, such as KAPA Hyper PCR 3. |
| 7 | Platform | The name of the sequencing platform used, for example, Illumina. |
| 8 | readPair | Specify whether the file contains read 1 or read 2 of the pair. |
| 9 | specimenPreparationMethod | Minimize nucleic acid degradation with methods such as flash freezing or RNALater. FFPE is not recommended. |
| 10 | tumorType | If the tissue is normal, indicate "not applicable". Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs are not eligible for somatic variant calling or for microsatellite instability processing. |
| 11 | isStranded* | Either "yes" or "no". |
| 12 | readPairOrientation* | The read pair orientation, such as forward or reverse. |

* optional but recommended

Additional requirement:

For a WES dataset to be eligible for processing, the BED file associated with its library preparation method must be uploaded and made available to the NF-OSI Sage Team.
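
As a rough illustration of this requirement, the sketch below flags WES datasets whose library preparation method has no BED file on record. The two dictionaries, the function name `wes_datasets_missing_bed`, and every example value are hypothetical; how BED files are actually tracked is not specified here.

```python
def wes_datasets_missing_bed(datasets, bed_files_by_method):
    """Return sorted names of WES datasets that have no BED file available.

    datasets: dict mapping dataset name -> libraryPreparationMethod
    bed_files_by_method: dict mapping libraryPreparationMethod -> BED file path
    Both structures are assumptions made for this example.
    """
    return sorted(
        name
        for name, method in datasets.items()
        if not bed_files_by_method.get(method)
    )


# Made-up inputs: only one of the two library preparation methods has a BED file.
datasets = {
    "wes_cohort_a": "capture_kit_v1",
    "wes_cohort_b": "capture_kit_v2",
}
bed_files_by_method = {"capture_kit_v1": "capture_kit_v1_targets.bed"}

print(wes_datasets_missing_bed(datasets, bed_files_by_method))  # → ['wes_cohort_b']
```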

Assay-specific workflow availability

The table below shows the availability of processing workflows for different data files generated through various ‘omics assays. 

Columns: Assay | Germline SNV | Somatic SNV | Copy Number Variation (CNV) | Structural variants (SV) | Microsatellite Instability (MSI) | Raw counts
WES: ✖️, ✖️, (blue star)
WGS: (blue star)
Bulk RNAseq: (blue star), (blue star), (blue star), (blue star), (blue star)
Single Cell RNAseq: (blue star), (blue star), (blue star), (blue star), (blue star)
✅ The workflow is available for this data type.
✖️ The workflow is available for this data type, but the NF-OSI will not provide this processing. This decision follows the recommendation of scientists and engineers at Sage who have worked with these data modalities and have noted problems interpreting the processed data from these workflows during downstream analysis.
(blue star) The workflow is not applicable for this data type.
