How to Organize Data

If and when your data submission request is approved, we will work with you to set up a repository for your data, known as a project, in Synapse. We will create a basic folder structure, based on community standards, that data contributors can build upon.

This page documents best practices for organizing data and other materials within your NF project. Following these recommendations will make annotating your data easier.

Project Folders Overview

We will set up your Synapse project using a default structure with three folders:

Raw Data or Data
Milestone Reports or Reporting
Analysis

This is the default structure, although some older or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme.

Within these three main folders, you have some flexibility to further structure your assets in whichever way fits your study. However, this structure does determine how governance can be applied. For consistency, community-friendliness, and ease of annotation, we’ve outlined some best practices below.

You can create an additional top-level folder to house materials that fall outside the scope of the pre-generated folders.

Raw Data or Data

The Raw Data or Data folder is intended to be further partitioned by data type. This format must be followed for your data to be detected by our data curation tooling. See the example below:

Typical structure

Raw Data
├── Imaging
│   ├── img1.tiff
│   ├── img2.tiff
│   └── manifest.csv
├── Cognitive Assessments
│   ├── a_visit.xlsx
│   ├── b_visit.xlsx
│   └── manifest.csv
└── RNA-seq
    ├── abc.fq.gz
    ├── def.fq.gz
    └── manifest.csv

We usually scaffold this structure based on your Data Sharing Plan. (If the Data Sharing Plan changes, you will need to add or delete some of these folders.) As a best practice, files should sit in a data-type folder under Raw Data rather than directly under Raw Data, even if there is only one data type. For raw data types and formatting recommendations, see How to Format Your Data.
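
Before uploading, you may find it useful to check a local copy of your files against the layout described above. The following is a minimal sketch, not part of the NF data curation tooling: the function name check_layout is hypothetical, and the rule that every data-type folder contains a manifest.csv is an assumption drawn from the typical structure shown on this page.

```python
from pathlib import Path


def check_layout(raw_data_dir):
    """Return a list of human-readable problems found under raw_data_dir.

    Hypothetical helper: it checks only the conventions described on this page.
    """
    root = Path(raw_data_dir)
    problems = []
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            # Files should live in a data-type folder, not directly under Raw Data.
            problems.append(f"{entry.name}: file sits directly under {root.name}")
        elif entry.is_dir() and not (entry / "manifest.csv").is_file():
            # Assumption: each data-type folder has a manifest.csv, as in the example tree.
            problems.append(f"{entry.name}: no manifest.csv found")
    return problems
```

Calling check_layout("Raw Data") on a folder that follows the typical structure should return an empty list; any returned strings name the file or folder to fix.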

Real example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

Data type is again the first and most important grouping factor. Create separate folders for separate data types; for example, an RNA-seq folder that holds .fastq files.
- A metadata schema can be applied at the folder level to describe all files within that folder (and any sub-folders). Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.
Within each data type, you can group data in whatever way makes the most sense for the study (e.g. batches). The example above groups RNA-seq data by release year. You may want to apply a different factor, such as cohort.
Original raw data are separated from processed data.
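
The grouping order in these guidelines (data type first, then a study-specific factor) can be written as a small path-building rule. This is an illustrative sketch only: the function name destination, the default root folder name, and the year 2016 are hypothetical and are not taken from the Synodos NF2 project. It does not cover the separation of raw and processed data.

```python
from pathlib import PurePosixPath


def destination(data_type, group, filename, root="Raw Data"):
    """Build a folder path with data type first, then a study-specific group."""
    return PurePosixPath(root) / data_type / str(group) / filename


print(destination("RNA-seq", 2016, "abc.fq.gz"))
# prints: Raw Data/RNA-seq/2016/abc.fq.gz
```

Swapping the group argument for a cohort or batch label keeps data type as the top-level grouping while changing only the second level.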

Milestone Reports or Reporting

This folder should house the summary reports that link data files to specific award milestones. Unlike with Raw Data or Data folders, files within the Milestone Reports or Reporting folder usually won’t need further partitioning. This folder is most relevant to funders as opposed to data re-users. Sometimes, reports housed in this folder are generated by the NF data coordination team.

Analysis

This folder can house the protocols, code, and derived results that comprise an analysis performed on raw data.

In addition to the Analysis folder, each project has its own Docker Registry to store and distribute analysis code. To make analysis code more reproducible, Docker images include both the code and the software dependencies and configurations needed to run the analysis. See Synapse Docker Registry for more information and instructions.