...

The NF Data Portal was built by members of the NF Open Science Initiative (NF-OSI), an alliance that was formed to support open science within the neurofibromatosis and schwannomatosis research community. NF-OSI is the result of a collaboration between the Children’s Tumor Foundation (CTF) and the Neurofibromatosis Therapeutic Acceleration Program (NTAP) that dates back to 2014. More recent participants include the Gilbert Family Foundation (GFF), the Developmental and Hyperactive Ras Tumor SPORE (DHART-SPORE), and the CDMRP Neurofibromatosis Research Program (NFRP). You can read more about these programs here. The NF-OSI is an open effort focused on finding NF treatments by sharing data and analysis results with the broader community, an effort aided by the existence of the NF Data Portal.

...

Sage Bionetworks

First, there’s Sage Bionetworks, a name you may or may not have come across. While Sage is not a tool you’ll be using, you should know what it is: the organization behind all of this! We are a non-profit organization based in Seattle, Washington. Sage is dedicated to promoting and advancing open science, as well as engaging patients in the research process. Sage acts as the Data Coordinating Center (DCC) for several different portals, including the NF Data Portal. The scientists, developers, and designers who built the tools you’re using are all employed by Sage. You can learn more about Sage Bionetworks and its initiatives here.

Synapse

In line with advocating for open science, Sage developed a software platform called Synapse. This platform is what allows for collaborative data curation and analysis, computational modelling, and more. It allows users to upload, store, analyze, and track data in a private space, before releasing it to the public-facing NF Data Portal. Think of Synapse as the back-end for all the data to live in.

NF Data Portal

If Synapse is the back-end for data, the NF Data Portal is the front. It’s essentially the user interface (UI) or entry point for you to view data and other shared content. Data gets uploaded into Synapse, where it is then processed into readable form for you to access in the portal.

NF Data Standards

Data standards underpin data sharing and make it possible to successfully explore, access, analyze, and reuse data. Data standards involve:

...

Where possible, Sage Bionetworks models its data standards on established global standards to promote interoperability across platforms, in support of FAIR data sharing. When these components work together, data standards allow users to find data, and ensure all information is present for successful reuse and analysis.

The majority of data available in the NF Data Portal is sequencing data, such as RNA sequencing and whole exome sequencing, though we also have a variety of imaging assays and other data. We derive most of our data standards and collection of standardized keys and values from vetted sources such as the National Cancer Institute’s Genomic Data Commons (NCI’s GDC) and NCI Thesaurus. If you already use or consult those standards, the following terminology may be familiar to you.

Metadata Standards

For the most part, we collect scientific metadata (also referred to as annotations) that documents information about the experimental assay—for example, with sequencing data, information such as:

...

However, we also collect information related to the data project, such as:

  • who funded the project (fundingAgency)

  • what initiative/consortium it’s associated with (initiative)

  • the study’s title and ID (studyName, studyID)

  • general information about the data (filename, fileFormat, resourceType, dataType, dataSubtype)

Metadata is provided in CSV files, so think about this information in terms of a spreadsheet.

The attributes listed above (such as type of assay, platform used, study ID) are called keys, and would appear as the column headers in a spreadsheet.

The entries recorded under those keys (for example, under assay, platform, or studyID) are called values, and would appear within the cells of a spreadsheet.

Investigators provide metadata by filling in manifests (spreadsheets) using the NF Data Curator App. For more information on annotating data, see How to Annotate Data.
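To make the spreadsheet picture concrete, here is a minimal Python sketch of a metadata file in which keys are the column headers and values are the cells. The column names are keys mentioned in this section, but the file contents are hypothetical: apart from rnaSeq, every cell value is a made-up placeholder, and the helper read_manifest is not part of any NF tool.

```python
import csv
import io

# Hypothetical manifest: the column headers are keys named in this section;
# every cell value except "rnaSeq" is a made-up placeholder.
MANIFEST_CSV = """filename,fileFormat,assay,studyID,fundingAgency
sample_A.fastq,fastq,rnaSeq,study-0001,ExampleFunder
sample_B.fastq,fastq,rnaSeq,study-0001,ExampleFunder
"""


def read_manifest(text):
    """Return (keys, rows): keys are the header names, rows map key -> value."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # reading the rows also populates reader.fieldnames
    return list(reader.fieldnames), rows


keys, rows = read_manifest(MANIFEST_CSV)
print("keys:", keys)
for row in rows:
    # Each row is one file; each cell is the value recorded under a key.
    print(row["filename"], "has assay value", row["assay"])
```

Each row describes one file, so looking up row["assay"] returns the value recorded under the assay key for that file.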

To allow for data standards, we control the terminology used for values through (meta)data dictionaries and other tools. Using controlled vocabularies and other data standards allows you to find what you’re looking for on the portal, so that you don’t have to search through multiple terms for the same thing. For example, instead of ribonucleic acid sequencing, or RNA-Seq, we use the value rnaSeq.
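The idea of a controlled vocabulary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table and the function normalize_assay are hypothetical, and the only term taken from the text above is the controlled value rnaSeq.

```python
# Hypothetical synonym table mapping free-text spellings to one controlled value.
# Only the controlled value "rnaSeq" comes from the text; the rest is assumed.
ASSAY_SYNONYMS = {
    "ribonucleic acid sequencing": "rnaSeq",
    "rna-seq": "rnaSeq",
    "rnaseq": "rnaSeq",
}


def normalize_assay(raw):
    """Map a free-text assay name to its controlled value, or raise ValueError."""
    lookup = raw.strip().lower()
    if lookup not in ASSAY_SYNONYMS:
        raise ValueError(f"'{raw}' is not in the controlled vocabulary")
    return ASSAY_SYNONYMS[lookup]


for spelling in ["RNA-Seq", "Ribonucleic acid sequencing", "rnaSeq"]:
    print(spelling, "->", normalize_assay(spelling))
```

Because every spelling collapses to a single value, a search for rnaSeq finds all matching files, and an unrecognized term is rejected rather than silently stored.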

You can find our full data dictionary here, and as regular releases of a JSON-LD file on our nf-osi GitHub here.
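Since JSON-LD is ordinary JSON, a released dictionary file can be inspected with Python's standard json module. The sketch below uses a tiny made-up document rather than the real dictionary: "@context", "@graph", and "@id" are standard JSON-LD keywords, but the entries, the example prefix, and the use of rdfs:label are assumptions, and the actual file may be organized differently.

```python
import json

# Made-up, minimal JSON-LD document for illustration only. It is NOT a copy of
# the NF data dictionary; the entries and the "rdfs:label" layout are assumed.
SAMPLE_JSONLD = """
{
  "@context": {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "example": "http://example.org/"
  },
  "@graph": [
    {"@id": "example:assay", "rdfs:label": "assay"},
    {"@id": "example:rnaSeq", "rdfs:label": "rnaSeq"}
  ]
}
"""


def list_labels(jsonld_text):
    """Return the rdfs:label of every node in the document's @graph, in order."""
    doc = json.loads(jsonld_text)
    return [node["rdfs:label"] for node in doc.get("@graph", [])
            if "rdfs:label" in node]


labels = list_labels(SAMPLE_JSONLD)
print(labels)
```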