If and when your data submission request is approved, we will work with you to set up a repository for your data, known as a project, in Synapse. We will create a basic folder structure, based on community standards, that data contributors can build upon.

This page documents best practices for organizing data and other materials within your NF project. If you follow these recommendations, it will make the process of annotating your data easier.

Project Folders

...

Overview

We will set up your Synapse project using a default structure with these three folders:

  • Raw Data or Data

  • Milestone Reports or Reporting

  • Analysis

This is the default structure and is considered the community standard, although some older projects or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme.

Within these three main folders, you have the flexibility to further structure your assets in whichever way fits your study. However, for the purpose of consistency, community-friendliness, and ease of annotation, we’ve outlined some best practices below.

What if I have something that is not raw data, milestone report, or analysis?

You can create an additional top-level folder to house materials that fall outside the scope of the pre-generated folders.

Raw Data or Data

This folder is intended to be further partitioned for different types of data.

As a best practice, we recommend that you create a new folder within the Raw Data folder for each data type (https://sagebionetworks.jira.com/wiki/spaces/NPD/pages/2137326583/How+to+Upload+Data#3.-Create-a-folder-for-your-data). For raw data types and formatting recommendations, see How to Format Your Data.

Working Example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

  • Data type is the first and most important grouping factor. Create separate folders for separate data types; for example, an RNA-seq folder that will have .fastq files.

...

    • Note: A metadata schema can be applied at the folder level to describe all files within that folder (and any sub-folders). Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.

  • For each data type, you can further group data in whatever way makes most sense for the study. The example above further groups RNA-seq data by release year. You may want to apply a different factor, such as cohort.

  • Original raw data are separated from processed data. You can create a folder to store the processed versions.
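
The guidelines above can be sketched as a small planning helper. This is only an illustration and not part of any Synapse tooling: the function name `plan_data_folders`, the "Drug screening" data type, the release years, and the placement of a `processed` folder inside each data type folder are all assumptions made for the example. It lists the folder paths you might create, with data type as the first grouping factor, an optional second factor (such as release year or cohort), and a separate folder for processed files.

```python
from pathlib import PurePosixPath


def plan_data_folders(data_types, groups_by_type=None, processed=()):
    """Return folder paths to create under the top-level data folder.

    data_types: data type names; each gets its own folder (first grouping factor).
    groups_by_type: optional mapping of data type -> sub-group names,
        for example release years or cohorts (hypothetical values).
    processed: data types that also get a separate folder for processed files
        (the folder name and location are assumptions).
    """
    root = PurePosixPath("Raw Data")
    groups_by_type = groups_by_type or {}
    paths = []
    for data_type in data_types:
        type_folder = root / data_type
        paths.append(type_folder)
        # Second-level grouping is whatever makes sense for the study.
        for group in groups_by_type.get(data_type, []):
            paths.append(type_folder / str(group))
        # Keep processed versions apart from the original raw files.
        if data_type in processed:
            paths.append(type_folder / "processed")
    return [str(p) for p in paths]


layout = plan_data_folders(
    ["RNA-seq", "Drug screening"],
    groups_by_type={"RNA-seq": [2015, 2016]},
    processed={"RNA-seq"},
)
for path in layout:
    print(path)
```

Keeping one data type per folder means a single metadata schema can describe everything beneath it, which is the main reason data type comes first in the hierarchy.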

Milestone Reports or Reporting

This folder should house the summary reports that link data files to specific award milestones. Unlike with Raw Data or Data folders, files within the Milestone Reports or Reporting folder usually won’t need further partitioning. This folder is most relevant to funders as opposed to data re-users. Sometimes, reports housed in this folder are generated by the NF data coordination team.

Analysis

This folder can house the protocols, code, and derived results that comprise an analysis performed on raw data.

...

In addition to the Analysis folder, each project has its own Docker Registry to store and distribute analysis code. To make analysis code more reproducible, Docker images include both the code and the software dependencies and configurations needed to run the analysis. See Synapse Docker Registry for more information and instructions.