Genomic Data Processing - Data Eligibility Criteria (v-1.0)

With the support of the Neurofibromatosis Therapeutic Acceleration Program (NTAP), we are excited to introduce a pilot initiative to uniformly process genomic and transcriptomic data on the NF Data Portal. This initiative covers all projects that received funding from NTAP between 2018 and 2023. The high-dimensional genomic and transcriptomic data will be processed with standardized data processing pipelines. Uniform processing allows us to share the processed data on the NF Data Portal and makes the data easier to use in other data analysis and exploration platforms. If you have a dataset shared on the NF Data Portal and would like to use the NF-OSI processing pipelines, please reach out to us at nf-osi@sagebionetworks.org. We would be happy to assist you on a case-by-case basis.

Datatypes eligible for processing

Whole Exome Sequencing
Whole Genome Sequencing
Bulk RNA Sequencing
Single Cell RNA Sequencing

Required annotations for data files to be staged for processing

For each of the following assays, data files must be annotated with the terms listed below.

RNA Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats include "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Choose between Bulk RNA Seq or Single Cell RNA Seq.
5	Species	The species is needed to select the corresponding reference genome.
6	libraryPreparationMethod	This refers to the name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	This refers to the name of the sequencing platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNAlater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type.
11	isStranded*	This answer should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* optional but recommended
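As a rough illustration of how the table above could be applied before staging a file, the sketch below checks a dictionary of annotations against the listed terms. The term names and accepted values come from the table; the function name `check_annotations`, the report layout, and the example values are hypothetical and are not part of the NF-OSI pipelines.

```python
# Term names are copied from the table above; everything else is illustrative.
REQUIRED_TERMS = [
    "fileFormat", "individualID", "specimenID", "Assay", "Species",
    "libraryPreparationMethod", "Platform", "readPair",
    "specimenPreparationMethod", "tumorType",
]
RECOMMENDED_TERMS = ["isStranded", "readPairOrientation"]  # marked * in the table
ACCEPTED_FORMATS = {"fastq", "bam", "cram"}


def check_annotations(annotations):
    """Report missing required terms, missing recommended terms, and bad values."""
    def is_blank(term):
        return annotations.get(term) in (None, "")

    report = {
        "missing_required": [t for t in REQUIRED_TERMS if is_blank(t)],
        "missing_recommended": [t for t in RECOMMENDED_TERMS if is_blank(t)],
        "problems": [],
    }
    # Only validate values that are actually present.
    if not is_blank("fileFormat") and annotations["fileFormat"] not in ACCEPTED_FORMATS:
        report["problems"].append(f"unsupported fileFormat: {annotations['fileFormat']}")
    if not is_blank("readPair") and str(annotations["readPair"]) not in {"1", "2"}:
        report["problems"].append(f"readPair must be 1 or 2, got: {annotations['readPair']}")
    return report


# Hypothetical annotations for one file; the values are made up for the example.
example = {
    "fileFormat": "fastq",
    "individualID": "patient_01",
    "specimenID": "patient_01_tumor_A",
    "Assay": "Bulk RNA Seq",
    "Species": "Homo sapiens",
    "Platform": "Illumina",
    "readPair": 1,
    "tumorType": "not applicable",
}
print(check_annotations(example))
```

A file would be ready for staging under this sketch only when `missing_required` and `problems` are both empty; missing recommended terms are reported but do not block staging.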

Whole Genome Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats include "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Set to Whole Genome Sequencing.
5	Species	The species is needed to select the corresponding reference genome.
6	libraryPreparationMethod	This refers to the name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	This refers to the name of the sequencing platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNAlater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs will not be eligible for somatic variant calling or for microsatellite instability processing.
11	isStranded*	This answer should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* optional but recommended
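The tumor-normal requirement noted under tumorType can be checked ahead of time. The sketch below assumes that a pair means one individual has at least one file annotated with tumorType "not applicable" (normal tissue) and at least one file with any other tumor type. That reading of "pair", the function name, and the example records are assumptions made for illustration.

```python
from collections import defaultdict

NORMAL = "not applicable"  # value used for normal tissue in the table above


def individuals_with_tumor_normal_pair(files):
    """Return the individualIDs that have both a normal and a tumor file."""
    tumor_types = defaultdict(set)
    for annotations in files:
        tumor_types[annotations["individualID"]].add(annotations["tumorType"])

    paired = set()
    for individual, types in tumor_types.items():
        has_normal = NORMAL in types
        has_tumor = any(t != NORMAL for t in types)
        if has_normal and has_tumor:
            paired.add(individual)
    return paired


# Hypothetical file annotations; IDs and tumor types are made up for the example.
files = [
    {"individualID": "patient_01", "tumorType": "not applicable"},
    {"individualID": "patient_01", "tumorType": "Plexiform Neurofibroma"},
    {"individualID": "patient_02", "tumorType": "Plexiform Neurofibroma"},
]
print(individuals_with_tumor_normal_pair(files))
```

Here only `patient_01` would qualify for somatic variant calling and microsatellite instability processing, because `patient_02` has no normal sample.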

Whole Exome Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats include "fastq", "bam", and "cram". If the raw data are provided in "bam" or "cram" format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing. NOTE: WES files are not eligible for variant calling if a BED file is not available.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Set to Whole Exome Sequencing.
5	Species	The species is needed to select the corresponding reference genome.
6	libraryPreparationMethod	This refers to the name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	This refers to the name of the sequencing platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNAlater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs will not be eligible for somatic variant calling or for microsatellite instability processing.
11	isStranded*	This answer should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* optional but recommended

Assay-specific workflow availability

The table below shows the availability of processing workflows for different data files generated through various ‘omics assays.

Assay	Germline SNV	Somatic SNV	Copy Number Variation (CNV)	Structural variants (SV)	Microsatellite Instability (MSI)	Raw counts
WES	✅	✅	✖️	✖️	✅
WGS	✅	✅	✅	✅	✅
Bulk RNAseq						✅
Single Cell RNAseq						✅

✅ The workflow is available for this data type.

✖️ The workflow is available for this data type, but the NF-OSI will not provide this processing. This decision follows from the recommendation of scientists and engineers at Sage who have worked with these data modalities and have noted various problems in interpretation of processed data from these workflows during downstream analysis.
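For readers who want to query the availability table programmatically, the sketch below encodes it as a dictionary. Blank cells in the table are left out, on the assumption that a blank means the workflow does not apply to that assay. The tumor-normal filter follows the note under tumorType in the annotation tables; the BED file requirement for WES is not modeled. All names here (`WORKFLOW_TABLE`, `provided_workflows`) are hypothetical.

```python
PROVIDED = "provided"          # ✅ in the table
NOT_PROVIDED = "not provided"  # ✖️ in the table: workflow exists, NF-OSI does not run it

# Blank cells are omitted (assumed to mean "does not apply to this assay").
WORKFLOW_TABLE = {
    "WES": {
        "Germline SNV": PROVIDED,
        "Somatic SNV": PROVIDED,
        "CNV": NOT_PROVIDED,
        "SV": NOT_PROVIDED,
        "MSI": PROVIDED,
    },
    "WGS": {
        "Germline SNV": PROVIDED,
        "Somatic SNV": PROVIDED,
        "CNV": PROVIDED,
        "SV": PROVIDED,
        "MSI": PROVIDED,
    },
    "Bulk RNAseq": {"Raw counts": PROVIDED},
    "Single Cell RNAseq": {"Raw counts": PROVIDED},
}

# Workflows that need a tumor-normal pair, per the tumorType notes.
NEEDS_TUMOR_NORMAL_PAIR = {"Somatic SNV", "MSI"}


def provided_workflows(assay, has_tumor_normal_pair=True):
    """List the workflows NF-OSI would run for an assay, in table order."""
    workflows = [
        name for name, status in WORKFLOW_TABLE[assay].items() if status == PROVIDED
    ]
    if not has_tumor_normal_pair:
        workflows = [w for w in workflows if w not in NEEDS_TUMOR_NORMAL_PAIR]
    return workflows


print(provided_workflows("WES"))
print(provided_workflows("WGS", has_tumor_normal_pair=False))
```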
