How to Organize Data

This page documents best practices for organizing data and other materials within your NF project.

Project Folders

A new Synapse Project is initialized using a default structure with these three folders:

Raw Data or Data: This can be further partitioned to house different types of raw data. For raw data types and formats commonly seen in this location, see How to Format Your Data.
Milestone Reports or Reporting: This should house the summary reports that link data files to specific award milestones.
Analysis: This can house the protocols, code, and derived results that comprise an analysis performed on raw data.
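The default layout above can be mirrored locally before upload. The following is a minimal sketch, assuming a plain local directory stands in for the Synapse project; the function names are hypothetical and nothing here calls the Synapse API. It uses the first name given for each folder in the list above.

```python
from pathlib import Path

# Top-level folder names taken from the list above. The local directory is
# only a stand-in for a Synapse project (an assumption of this sketch).
DEFAULT_FOLDERS = ("Raw Data", "Milestone Reports", "Analysis")


def create_project_skeleton(root: Path) -> list[Path]:
    """Create the three default top-level folders under root."""
    created = []
    for name in DEFAULT_FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def unexpected_folders(root: Path) -> list[str]:
    """Return the names of top-level folders outside the default scheme."""
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name not in DEFAULT_FOLDERS
    )
```

A non-empty result from `unexpected_folders` is not necessarily a problem; it only flags folders worth a second look before upload.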

To make analysis code more reproducible, a Docker image can capture the environment, including the software dependencies and configuration, needed to rerun the analysis. Each Synapse project has its own Docker registry for storing and distributing its images. See https://help.synapse.org/docs/Synapse-Docker-Registry.2011037752.html.
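As a rough illustration of how an image might be named for a project's registry, the sketch below builds the image reference and the corresponding Docker commands as argument lists. The registry host `docker.synapse.org` and the `<project ID>/<repository>` naming pattern are assumptions based on our reading of the Synapse Docker Registry documentation and should be checked against the page linked above. The helper names and the project ID `syn00000000` are hypothetical.

```python
# Assumed registry host; verify against the Synapse Docker Registry docs.
REGISTRY_HOST = "docker.synapse.org"


def synapse_image_ref(project_id: str, repo: str, tag: str = "latest") -> str:
    """Build an image reference scoped to one Synapse project (assumed pattern)."""
    if not (project_id.startswith("syn") and project_id[3:].isdigit()):
        raise ValueError(f"not a Synapse ID: {project_id!r}")
    return f"{REGISTRY_HOST}/{project_id}/{repo}:{tag}"


def push_commands(local_image: str, project_id: str, repo: str,
                  tag: str = "latest") -> list[list[str]]:
    """Return the docker commands (as argument lists) to publish an image."""
    ref = synapse_image_ref(project_id, repo, tag)
    return [
        ["docker", "login", REGISTRY_HOST],
        ["docker", "tag", local_image, ref],
        ["docker", "push", ref],
    ]


# Example with a placeholder project ID; the commands are printed, not run.
for cmd in push_commands("my-analysis:1.0", "syn00000000", "my-analysis", "1.0"):
    print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch free of side effects; each list could be passed to `subprocess.run` once the naming has been confirmed.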

New NF community contributions should go into these core folders as described above. Some older projects or independent projects (not sponsored by one of our funders) may not follow this exact top-level scheme. A project may also create an additional folder for materials that fall outside the scope of the core folders, which is usually not an issue.

More details and examples are provided for each in the following subsections.

Raw Data

The Synodos NF2 project provides a good working example for organization of multiple raw data types within Data, illustrating several guidelines:

Data type is the first and most important grouping factor. Create separate folders for each data type, e.g. an “RNA-seq” folder that will have .fastq files.

A metadata schema can be applied at the folder level for describing all files within that folder. Since metadata are specific to data types, this is simplest when the files are of the same type.

Within each data type, the data can be further grouped in whatever way makes the most sense for the study. The Synodos NF2 example further groups RNA-seq data by release year; grouping by cohort is another reasonable choice when a study has several cohorts.
Separate original raw data from processed data. A folder can be created to store the processed versions.
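The guidelines above can be sketched as a small path-planning helper: data type first, then a study-specific group, with processed files kept in their own subfolder. This is only an illustration under stated assumptions. The subfolder name `processed`, the annotation keys, and the function names are hypothetical and are not part of any Synapse or NF schema.

```python
from pathlib import PurePosixPath

RAW_ROOT = PurePosixPath("Raw Data")

# Hypothetical folder-level metadata: one shared set of annotations per data
# type, applied to every file in that data type's folder.
FOLDER_ANNOTATIONS = {
    "RNA-seq": {"dataType": "RNA-seq", "fileFormat": "fastq"},
}


def plan_destination(filename: str, data_type: str, group: str,
                     processed: bool = False) -> PurePosixPath:
    """Return the folder path for a file: data type, then group, then stage."""
    folder = RAW_ROOT / data_type / group
    if processed:
        # Keep processed versions apart from the original raw files.
        folder = folder / "processed"
    return folder / filename


def annotations_for(data_type: str, **file_specific: str) -> dict[str, str]:
    """Merge folder-level annotations with any file-specific overrides."""
    merged = dict(FOLDER_ANNOTATIONS.get(data_type, {}))
    merged.update(file_specific)
    return merged


print(plan_destination("sample1.fastq", "RNA-seq", "2016"))
print(plan_destination("sample1.counts.txt", "RNA-seq", "2016", processed=True))
```

Passing the data type explicitly, rather than guessing it from the file extension, avoids misfiling formats that are shared across data types.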
