New projects are set up with a basic structure that data contributors can build upon. This page documents best practices for organizing data and other materials within your NF project. The organization of your data can also affect the later annotation workflow.

Project Folders

A new Synapse Project is initialized using a default structure with these three folders:

  • Raw Data or Data: This can be further partitioned to house different types of raw data. For raw data types and formats commonly seen in this location, see How to Format Your Data.

  • Milestone Reports or Reporting: This should house the summary reports that link data files to specific award milestones.

  • Analysis: This can house the protocols, code, and derived results that comprise an analysis performed on raw data.

While some older projects or independent projects (not sponsored by one of our funders) may not follow this exact top-level scheme, it is considered the community standard. There is much flexibility in how to further structure your assets within these core containers, but for community-friendliness and ease of annotation, follow the additional best practices explained below.

What if I have something that is not raw data, milestone report, or analysis?

A project may also create an additional folder to house materials that fall outside the scope of these containers; this is usually not an issue.
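As a rough illustration of the top-level scheme, the sketch below checks a list of top-level folder names against the three core containers. It accepts the alternate names listed above (Data, Reporting) and reports unmatched folders as extras rather than errors, since an additional folder is usually fine. This is a hypothetical helper for self-checking a layout, not part of any Synapse tool.

```python
# Standard containers, keyed by every folder name this page lists for them.
STANDARD_CONTAINERS = {
    "Raw Data": "Raw Data",
    "Data": "Raw Data",
    "Milestone Reports": "Milestone Reports",
    "Reporting": "Milestone Reports",
    "Analysis": "Analysis",
}


def classify_top_level(folder_names):
    """Match top-level folders to the standard containers.

    Returns (matched, missing, extra):
      matched: dict of container -> first folder name found for it
      missing: sorted containers with no matching folder
      extra:   sorted folders outside the standard scheme (usually fine)
    """
    matched = {}
    extra = []
    for name in folder_names:
        container = STANDARD_CONTAINERS.get(name.strip())
        if container is None:
            extra.append(name)
        else:
            matched.setdefault(container, name)
    missing = sorted(set(STANDARD_CONTAINERS.values()) - set(matched))
    return matched, missing, sorted(extra)


if __name__ == "__main__":
    matched, missing, extra = classify_top_level(
        ["Data", "Reporting", "Analysis", "Protocols"]
    )
    print(matched)
    print("missing:", missing)
    print("extra:", extra)
```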

More details and examples for each core folder are provided in the following subsections.

Raw Data or Data

This is intended to be further partitioned for different types of raw data. For raw data types and formats commonly seen in this location, see How to Format Your Data. In How to Upload Data (https://sagebionetworks.jira.com/wiki/spaces/NPD/pages/2137326583/How+to+Upload+Data#3.-Create-a-folder-for-your-data), we advise that you create a folder under this location for each data type.

Working Example

The Synodos NF2 project provides a good working example of organizing multiple raw data types within Data. It demonstrates several guidelines:

  • Data type is the first and most important grouping factor. Create a separate folder for each data type, e.g. an “RNA-seq” folder that holds .fastq files.

Info

A metadata schema can be applied at the folder level to describe all files within that folder. Since metadata are specific to data types, keeping files of a single type within each folder helps keep metadata valid and consistent.

  • Within each data type, the data can be further grouped however makes the most sense for the study. The Synodos NF2 example groups RNA-seq data by release year; another reasonable factor would be cohort, if there were multiple cohorts.

  • Separate original raw data from processed data. A folder can be created to store the processed versions.
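The guidelines above can be sketched as a small path planner. It assumes each file is described by a hypothetical FileRecord holding its data type, a study-specific group (release year here, as in the Synodos NF2 example), and a processed flag. The "processed" subfolder name and its placement under each group are assumptions for illustration; this page only says a folder can be created for processed versions. The last helper flags existing folders that mix data types, which is what makes a folder-level metadata schema hard to apply.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical description of one file to be placed under Data."""
    name: str
    data_type: str          # first grouping factor, e.g. "RNA-seq"
    group: str              # study-specific grouping, e.g. release year or cohort
    processed: bool = False


def target_folder(rec: FileRecord, root: str = "Data") -> str:
    """Data type first, then the study group; processed files get their own subfolder."""
    path = PurePosixPath(root) / rec.data_type / rec.group
    if rec.processed:
        path = path / "processed"  # assumed name for the processed-data folder
    return str(path)


def plan_layout(records, root: str = "Data") -> dict:
    """Map each target folder to the names of the files that belong in it."""
    layout = defaultdict(list)
    for rec in records:
        layout[target_folder(rec, root)].append(rec.name)
    return dict(layout)


def mixed_type_folders(placements) -> list:
    """Given (folder, data_type) pairs for existing files, list folders holding >1 type."""
    types_by_folder = defaultdict(set)
    for folder, data_type in placements:
        types_by_folder[folder].add(data_type)
    return sorted(f for f, types in types_by_folder.items() if len(types) > 1)


if __name__ == "__main__":
    files = [
        FileRecord("sample1_R1.fastq.gz", "RNA-seq", "2018"),
        FileRecord("sample2_R1.fastq.gz", "RNA-seq", "2019"),
        FileRecord("counts_2018.tsv", "RNA-seq", "2018", processed=True),
    ]
    for folder, names in sorted(plan_layout(files).items()):
        print(folder, names)
    print(mixed_type_folders([("Data/misc", "RNA-seq"), ("Data/misc", "WES")]))
```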

Milestone Reports or Reporting

This should house the summary reports that link data files to specific award milestones. Files within this folder usually won’t need further partitioning, unlike Raw Data, and are most relevant to funders rather than data re-users. Sometimes reports placed here are generated by the NF data coordination team.
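Report formats vary, but as a hypothetical sketch of how file-to-milestone links might be summarized, the snippet below groups Synapse IDs by milestone number into one plain-text line per milestone. The input pairs and output format are assumptions for illustration, not a template used by the NF data coordination team.

```python
from collections import defaultdict


def milestone_summary(links):
    """Group (milestone number, Synapse ID) pairs into one text line per milestone.

    Duplicate IDs are dropped; IDs keep the order in which they were first seen.
    """
    ids_by_milestone = defaultdict(dict)  # dict used as an insertion-ordered set
    for milestone, syn_id in links:
        ids_by_milestone[milestone][syn_id] = None
    return [
        f"Milestone {m}: {', '.join(ids_by_milestone[m])}"
        for m in sorted(ids_by_milestone)
    ]


if __name__ == "__main__":
    # Placeholder milestone numbers and Synapse IDs.
    links = [(1, "syn100"), (2, "syn200"), (1, "syn101"), (1, "syn100")]
    for line in milestone_summary(links):
        print(line)
```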

Analysis

This can house the protocols, code, and derived results that comprise an analysis performed on raw data.

Info

Alongside the Analysis folder, each project has its own Docker Registry for storing and distributing Docker images. To make analysis code more reproducible, a Docker image can recreate the environment, including the software dependencies and configurations needed for the analysis. See https://help.synapse.org/docs/Synapse-Docker-Registry.2011037752.html.
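Images in a project's registry are named after the project's Synapse ID. The sketch below builds such an image reference, assuming the docker.synapse.org/<Synapse ID>/<repository>:<tag> pattern described in the Synapse Docker Registry documentation linked above; check that page before relying on it. The repository-name check is a simplified version of Docker's lowercase naming rule, and the project ID and repository name in the usage block are placeholders. The script only prints the Docker CLI commands; it does not run them.

```python
import re

REGISTRY_HOST = "docker.synapse.org"
# A Synapse ID is "syn" followed by digits, e.g. syn12345.
SYNAPSE_ID = re.compile(r"syn\d+")
# Simplified Docker repository rule: lowercase alphanumerics joined by ".", "_" or "-".
REPO_NAME = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")
# Docker tag rule: up to 128 word characters, "." or "-", not starting with "." or "-".
TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def registry_reference(project_id: str, repo: str, tag: str = "latest") -> str:
    """Build an image reference for a project's Synapse Docker Registry."""
    if not SYNAPSE_ID.fullmatch(project_id):
        raise ValueError(f"not a Synapse ID: {project_id!r}")
    if not REPO_NAME.fullmatch(repo):
        raise ValueError(f"invalid repository name: {repo!r}")
    if not TAG.fullmatch(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    return f"{REGISTRY_HOST}/{project_id}/{repo}:{tag}"


if __name__ == "__main__":
    # Placeholder project ID and repository name.
    ref = registry_reference("syn12345", "nf-rnaseq-analysis", "v1.0")
    # Print (not run) the commands that would publish a locally built image.
    print(f"docker login {REGISTRY_HOST}")
    print(f"docker tag nf-rnaseq-analysis:v1.0 {ref}")
    print(f"docker push {ref}")
```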