How to Organize Data


Once your data submission request is approved, we will work with you to set up a repository for your data in Synapse, known as a project. We will create a basic folder structure, based on community standards, that data contributors can build upon.

This page documents best practices for organizing data and other materials within your NF project. Following these recommendations makes annotating your data easier.

Project Folders Overview

We will set up your Synapse project using a default structure with three folders:

  • Raw Data or Data

  • Milestone Reports or Reporting

  • Analysis

Some older or independent projects (those not sponsored by one of our funders) may not follow this exact top-level scheme.

Within these three main folders, you have the flexibility to further structure your assets in whichever way fits your study. However, for the purpose of consistency, community-friendliness, and ease of annotation, we’ve outlined some best practices below.

You can create an additional top-level folder to house materials that fall outside the scope of the pre-generated folders.
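The layout described above can be sketched locally before anything is uploaded. The following is a minimal illustration that only plans and creates folders on a local disk; it does not call Synapse. The function names, the choice of "Data" and "Reporting" over the alternative names, and the example values "WGS" and "Presentations" are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

# Default top-level folders named on this page. Some projects use the
# alternative names "Raw Data" and "Milestone Reports" instead.
DEFAULT_FOLDERS = ["Data", "Reporting", "Analysis"]


def plan_layout(data_types, extra_top_level=()):
    """Return relative folder paths: the three defaults, one sub-folder
    per data type under Data, and any extra top-level folders."""
    paths = [Path(name) for name in DEFAULT_FOLDERS]
    paths += [Path("Data") / data_type for data_type in data_types]
    paths += [Path(name) for name in extra_top_level]
    return paths


def create_layout(root, paths):
    """Create the planned folders under a local root directory."""
    for rel in paths:
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)


# Hypothetical study with two data types and one extra top-level folder.
layout = plan_layout(["RNA-seq", "WGS"], extra_top_level=["Presentations"])
with tempfile.TemporaryDirectory() as root:
    create_layout(root, layout)
    created = sorted(
        p.relative_to(root).as_posix() for p in Path(root).rglob("*")
    )
    print("\n".join(created))
```

Planning the paths separately from creating them makes it easy to review the structure with collaborators before committing to it.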

Raw Data or Data

This folder is intended to be further partitioned for different types of data.

As a best practice, we recommend that you create a new folder within the Raw Data or Data folder for each data type. For raw data types and formatting recommendations, see How to Format Your Data.

Working Example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

  • Data type is the first and most important grouping factor. Create a separate folder for each data type, for example, an RNA-seq folder that holds the .fastq files.

    • Note: A metadata schema can be applied at the folder level to describe all files within that folder (and any sub-folders). Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.

  • For each data type, you can further group data in whatever way makes the most sense for the study. The Synodos NF2 example groups RNA-seq data by release year. You may want to apply a different factor, such as cohort.

  • Keep original raw data separate from processed data: create a dedicated folder to store the processed versions.
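The guidelines above amount to a simple rule for deciding where a file belongs: data type first, then an optional study-specific grouping, with processed files kept apart from raw ones. Here is a minimal sketch of that rule. The function name, the "processed" folder name, and its position directly under the data type folder are assumptions for illustration; this page does not prescribe them.

```python
from pathlib import PurePosixPath


def destination_folder(data_type, group=None, processed=False, data_root="Data"):
    """Return the folder path for a file.

    data_type: first and most important grouping factor, e.g. "RNA-seq".
    group: optional study-specific factor, e.g. a release year or cohort.
    processed: True for processed versions, which go in a separate
        "processed" folder (hypothetical name and placement).
    """
    parts = [data_root, data_type]
    if processed:
        parts.append("processed")
    if group is not None:
        parts.append(str(group))
    return PurePosixPath(*parts)


# Raw RNA-seq files grouped by release year, as in the working example.
print(destination_folder("RNA-seq", group=2016))
# Processed versions of the same files live in their own folder.
print(destination_folder("RNA-seq", group=2016, processed=True))
```

Because each data type gets its own folder, a metadata schema bound at that folder covers files of one type only, which is what keeps the metadata consistent.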

Milestone Reports or Reporting

This folder should house the summary reports that link data files to specific award milestones. Unlike the Raw Data or Data folder, the Milestone Reports or Reporting folder usually won’t need further partitioning. This folder is more relevant to funders than to data re-users. Sometimes, reports housed in this folder are generated by the NF data coordination team.

Analysis

This folder can house the protocols, code, and derived results that make up an analysis performed on raw data.

In addition to the Analysis folder, each project has its own Docker Registry to store and distribute analysis code. To make analysis code more reproducible, Docker images include both the code and the software dependencies and configurations needed to run the analysis. See Synapse Docker Registry for more information and instructions.
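As a rough sketch of what pushing an analysis image to a project's registry might involve, the helper below assembles the image reference and the two Docker commands. It assumes the registry host is docker.synapse.org and that repositories are namespaced by the project's Synapse ID; both should be confirmed against the Synapse Docker Registry documentation. The function names and the example project ID and repository name are hypothetical, and the code only builds the commands without running them.

```python
# Assumption: registry host and naming scheme; confirm against the
# Synapse Docker Registry documentation before use.
REGISTRY_HOST = "docker.synapse.org"


def image_reference(project_id, repo_name, tag="latest"):
    """Build a registry reference such as host/syn12345/repo:tag."""
    if not (project_id.startswith("syn") and project_id[3:].isdigit()):
        raise ValueError(f"not a Synapse ID: {project_id!r}")
    return f"{REGISTRY_HOST}/{project_id}/{repo_name}:{tag}"


def push_commands(local_image, project_id, repo_name, tag="latest"):
    """Return the docker commands (as argument lists) to tag and push."""
    ref = image_reference(project_id, repo_name, tag)
    return [
        ["docker", "tag", local_image, ref],
        ["docker", "push", ref],
    ]


# Hypothetical project ID and repository name.
for cmd in push_commands("rnaseq-pipeline:dev", "syn12345", "rnaseq-pipeline", "v1"):
    print(" ".join(cmd))
```

Returning argument lists rather than running the commands lets you inspect them first; they can be passed to `subprocess.run` once you have logged in to the registry.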
