Page Comparison

The NF-OSI is piloting a new initiative to uniformly process the genomic and transcriptomic data shared on the NF Data Portal. The uniformly processed data will be shared back on the NF Data Portal and will also support the use of portal data in other data analysis and exploration platforms.

High-dimensional genomic and transcriptomic data will be processed using standardized data processing pipelines.

Thanks to the generosity of the NF-OSI funders, a subset of the NF Data Portal data is currently eligible for reprocessing as part of this initiative. If you have shared a dataset on the NF Data Portal and would like to use the NF-OSI processing pipelines, please reach out to us at nf-osi@sagebionetworks.org. We would be happy to assist you on a case-by-case basis.

Read the checklist below to learn more about which data are currently eligible for reprocessing.

Studies in the current scope of this initiative:

All studies funded by the Neurofibromatosis Therapeutic Acceleration Program that uploaded data between 2018 and 2023.

Data Types in the scope of processing:

Whole exome sequencing data
Whole genome sequencing data
Bulk RNA sequencing data
Single cell RNA sequencing data

Eligibility criteria for data files to be staged for processing:

...

Raw data files present in the study must have their file format annotated as “fastq”, “bam”, or “cram”.

For each of the following assays, data files must be annotated with the terms listed below to be staged for processing.

Bulk and Single Cell RNA Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats are "fastq", "bam", and "cram". If the raw data provided are in “bam” or “cram” format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Choose between Bulk RNA Seq and Single Cell RNA Seq.
5	Species	The species must be known to select the corresponding reference genome.
6	libraryPreparationMethod	The name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	The name of the platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNALater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type.
11	isStranded*	The value should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* Optional but recommended
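The annotation requirements in the table above lend themselves to a mechanical pre-check before a dataset is submitted for staging. The following is a minimal sketch only: it assumes each file's annotations are available as a plain Python dictionary keyed by the terms exactly as spelled in the table, and the function name `check_rnaseq_annotations` and the sample values are hypothetical, not part of any NF-OSI tool.

```python
# Annotation terms copied from the table above (spelling and case as shown there).
REQUIRED_TERMS = [
    "fileFormat", "individualID", "specimenID", "Assay", "Species",
    "libraryPreparationMethod", "Platform", "readPair",
    "specimenPreparationMethod", "tumorType",
]
RECOMMENDED_TERMS = ["isStranded", "readPairOrientation"]  # optional but recommended
ACCEPTED_FORMATS = {"fastq", "bam", "cram"}


def check_rnaseq_annotations(annotations):
    """Return (problems, warnings) for one file's annotation dictionary.

    Hypothetical helper: it only checks presence of terms and the file
    format value; it cannot tell whether a bam/cram file was filtered.
    """
    problems = []
    for term in REQUIRED_TERMS:
        value = annotations.get(term)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required annotation: {term}")

    file_format = annotations.get("fileFormat")
    if file_format and str(file_format).lower() not in ACCEPTED_FORMATS:
        problems.append(f"unsupported fileFormat: {file_format}")

    warnings = [
        f"missing recommended annotation: {term}"
        for term in RECOMMENDED_TERMS
        if not annotations.get(term)
    ]
    return problems, warnings


# Illustrative values only; the IDs below are made up.
example = {
    "fileFormat": "fastq",
    "individualID": "patient_01",
    "specimenID": "patient_01_tumor_A",
    "Assay": "Bulk RNA Seq",
    "Species": "Homo sapiens",
    "libraryPreparationMethod": "KAPA Hyper PCR 3",
    "Platform": "Illumina",
    "readPair": 1,
    "specimenPreparationMethod": "flash frozen",
    "tumorType": "not applicable",
}
problems, warnings = check_rnaseq_annotations(example)
print(problems)
print(warnings)
```

An empty `problems` list means every required term is present; entries in `warnings` flag only the two recommended terms and would not block staging under the criteria listed above.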

...

Whole Genome Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats are "fastq", "bam", and "cram". If the raw data provided are in “bam” or “cram” format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Whole Genome Sequencing
5	Species	The species must be known to select the corresponding reference genome.
6	libraryPreparationMethod	The name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	The name of the platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNALater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs will not be eligible for somatic variant calls or for microsatellite instability processing.
11	isStranded*	The value should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* Optional but recommended

...

Data files must originate from samples that were “flash frozen”. Data from FFPE samples are not eligible for processing because the preservation process can degrade genomic material.

...

Data files must have the required minimal annotations according to the NF-OSI data model (as suggested in the NF Data Curator App). These annotations include, among others:

individualID
specimenID
assay
libraryPreparationMethod
Platform
isStranded
readPairOrientation
readPair
specimenPreparationMethod

...

Only raw data files uploaded to Synapse projects are eligible to be staged for processing at this time.

Ineligibility criteria for specific processing methods:

...

WES files are not eligible for structural variant calling or copy number variant calling.

...

WES files are not eligible for variant calling if a BED file is not available.

...

Availability of processing workflows:

The table below shows the availability of processing workflows for data files generated through various ‘omics assays.

✅ means the workflow is available for the data files

❌ means the workflow is available for these kinds of data files, but the NF-OSI will not provide this processing. This decision follows the recommendation of scientists and engineers at Sage who have worked with these data modalities and have noted various problems in interpreting processed data from these workflows during downstream analysis.

NA means the workflow is not applicable for these data files

Assay

Germline SNV

Somatic SNV

Copy Number Variation (CNV)

Structural variants (SV)

Microsatellite Instability (MSI)

Raw counts

WES

✅

❌

✅

NA

WGS

✅

NA

Bulk RNAseq

NA

✅

Single Cell RNAseq

NA

✅

...

Whole Exome Sequencing

	Annotation term	Additional Details
1	fileFormat	Accepted file formats are "fastq", "bam", and "cram". If the raw data provided are in “bam” or “cram” format, only files that have not undergone any additional filtering (e.g., files that retain unmapped reads and have not been trimmed) are eligible for processing. Note: WES files are not eligible for variant calling if a BED file is not available.
2	individualID	Individual IDs are necessary to create the sample sheets.
3	specimenID	Specimen IDs are necessary to interpret the analysis.
4	Assay	Whole Exome Sequencing
5	Species	The species must be known to select the corresponding reference genome.
6	libraryPreparationMethod	The name of the library preparation method, such as KAPA Hyper PCR 3.
7	Platform	The name of the platform used, for example, Illumina.
8	readPair	Specify whether the read pair is 1 or 2.
9	specimenPreparationMethod	Minimize RNA degradation with methods such as flash freezing or RNALater. FFPE is not recommended.
10	tumorType	If the tissue is normal, indicate "not applicable." Otherwise, specify the tumor type. NOTE: Files from samples lacking tumor-normal pairs will not be eligible for somatic variant calls or for microsatellite instability processing.
11	isStranded*	The value should be either "yes" or "no."
12	readPairOrientation*	Indicate the read pair orientation, such as forward or reverse.

* Optional but recommended

Note

Additional requirement:

The BED file associated with the library preparation method for each WES dataset is required to be uploaded and available to the NF-OSI Sage Team for the dataset to be eligible for processing.
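The two WES conditions stated above (a BED file must be available, and somatic variant calls and microsatellite instability processing need tumor-normal pairs) can be illustrated with a short sketch. Everything here is hypothetical: the function names are invented for illustration, and the code assumes that a tumor-normal pair means at least one normal file (tumorType "not applicable") and at least one tumor file sharing the same individualID, which this page does not define explicitly.

```python
from collections import defaultdict

NORMAL = "not applicable"  # tumorType value used for normal tissue, per the table


def individuals_with_tumor_normal_pairs(files):
    """Return the set of individualIDs that have both tumor and normal files.

    ASSUMPTION: pairing is decided per individualID; the page itself does
    not spell out how tumor-normal pairs are identified.
    """
    seen = defaultdict(lambda: {"tumor": False, "normal": False})
    for f in files:
        kind = "normal" if f["tumorType"].strip().lower() == NORMAL else "tumor"
        seen[f["individualID"]][kind] = True
    return {ind for ind, s in seen.items() if s["tumor"] and s["normal"]}


def wes_eligibility(files, bed_file_available):
    """Map each individualID to the WES processing its files qualify for."""
    paired = individuals_with_tumor_normal_pairs(files)
    result = {}
    for ind in {f["individualID"] for f in files}:
        result[ind] = {
            # The BED file is required for the dataset to be processed at all.
            "processing": bed_file_available,
            # Somatic calls and MSI additionally need a tumor-normal pair.
            "somatic_and_msi": bed_file_available and ind in paired,
        }
    return result


# Illustrative records only; IDs and tumor types are made up.
files = [
    {"individualID": "P1", "tumorType": "Plexiform Neurofibroma"},
    {"individualID": "P1", "tumorType": "not applicable"},
    {"individualID": "P2", "tumorType": "Plexiform Neurofibroma"},
]
print(wes_eligibility(files, bed_file_available=True))
```

In this example P1 has both a tumor and a normal file, so it qualifies for somatic and MSI processing, while P2 has only a tumor file and qualifies for general processing alone. Without a BED file, neither individual's files would be processed.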
