Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

New projects are set up with a basic structure that data contributors can build upon. This page documents If and when your data sharing plan request is approved, we will work with you to set up a repository for your data, known as a project, in Synapse. We will create a folder structure based on community standards for contributors to use.

This page explains best practices for organizing data and other materials within your NF project. The organization of your data can also affect the later annotation workflowFollowing this structure will make annotation and other data management easier both for you and NF-OSI staff.

Project Folders Overview

A new Synapse Project is initialized using a default structure with these three Your Synapse project will usually be set up with these top-level folders:

  • Raw Data or Data - contains subfolders organized by expected datasets.

  • Milestone Reports or Reporting - contains generated reports for project; your program officers may also upload reports here.

  • Data Sharing Plan - contains versioned copies of your Data Sharing Plan.

  • Analysis

...

  • - figures or other outputs not considered “raw data”.

Note: older or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme, this is considered the community standard. There is much flexibility in how to further structure your assets within these core containers, but for community-friendliness and ease of annotation there are additional best practices as explained below.

What if I have something that is not raw data, milestone report, or analysis?

A project may also create an additional folder .

This structure determines how easily others can find and understand your contributions, how easily you can annotate data, and how governance can be applied.

You can create other top-level folders to house materials that fall outside the scope of

...

the pre-generated folders, and these will be ignored.

Raw Data or Data

This is intended to be further partitioned for different types of raw data. For raw data types and formats commonly seen in this location, see How to Format Your Data . In https://sagebionetworks.jira.com/wiki/spaces/NPD/pages/2137326583/How+to+Upload+Data#3.-Create-a-folder-for-your-data , we advise that you create a folder under this location for each data type.

Working Example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within Data. It demonstrates these several guidelines:

...

Info

A metadata schema can be applied at the folder level to describe all files within that folder. Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.

...

For each data type, the data can be further grouped however makes the most sense for the study. The example above further groups RNA-seq data by release year, but other reasonable factors could be, e.g., by cohort if there were multiple different cohorts.

...

The Raw Data or Data folder will have folders for datasets based on your data sharing plan.This format must be followed in order for your data to be detected with our data curation application.

Typical structure

Code Block
Raw Data
├── Imaging
    ├── img1.tiff
    ├── img2.tiff
    ├── manifest.csv
├── Cognitive Assessments
    ├── a_visit.xlsx
    ├── b_visit.xlsx
    ├── manifest.csv
├── RNA-seq
    ├── abc.fq.gz
    ├── def.fq.gz
    └── manifest.csv

When created, these dataset folders are automatically tagged with the special key-value contentType=dataset.

...

Info

If the Data Sharing Plan changes, you will need to add or delete some of these folders. Currently, wou will need to add the contentType key-value pair yourself in order for the dataset to be detected in the curation application.

Files should be in a folder under Raw Data and not directly under Raw Data, even if there is only one type of dataset/files. For raw data types and formatting recommendations, see How to Format Your Data.

Finer organization

  • For each data type, it is possible to group data with batches or certain other factors. For example, the RNA-seq data folder may have subfolders “batch 1” and “batch 2” that were produced at different times during the project.

  • You may do something similar with cohort, having subfolders for MRI data with “French” vs “US” patient groups, which may be especially helpful later on if different consents or geography-specific legal requirements apply to that dataset.

Milestone Reports or Reporting

This folder should house the summary reports that link data files to specific award milestones. Files within this Unlike with Raw Data or Data folders, files within the Milestone Reports or Reporting folder usually won’t need further partitioning, unlike Raw Data, and are . This folder is most relevant to funders rather than as opposed to data re-users. Sometimes reports placed here , reports housed in this folder are generated by the NF data coordination team.

Analysis

This folder can house the protocols, code, and derived results that comprise an analysis performed on raw data.

...

In addition to the Analysis folder, each project has its own Docker Registry to store and distribute

...

analysis code. To make analysis code more reproducible, Docker images

...

include both the code and the software dependencies and configurations needed

...

to run the analysis. See

...

Synapse

...

Docker

...

Registry for more information and instructions.