When your data sharing plan is approved, we will work with you to set up a repository for your data, known as a project, in Synapse. We will create a basic folder structure, based on community standards, for contributors to use.

This page explains best practices for organizing data and other materials within your NF project. Following this structure will make annotation and other data management easier both for you and NF-OSI staff.

Project Folders Overview

Your Synapse project will usually be set up with these top-level folders:

  • Raw Data or Data - contains subfolders organized by expected datasets.

  • Milestone Reports or Reporting - contains generated reports for the project; your program officers may also upload reports here.

  • Data Sharing Plan - contains versioned copies of your Data Sharing Plan.

  • Analysis - figures or other outputs not considered “raw data”.

Note: older or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme.

Within these main folders, you have the flexibility to further structure your assets in whichever way fits your study. This structure determines how easily others can find and understand your contributions, how easily you can annotate data, and how governance can be applied.

You can create other top-level folders to house materials that fall outside the scope of the pre-generated folders, and these will be ignored.

Raw Data or Data

The Raw Data or Data folder will have folders for datasets based on your data sharing plan. This format must be followed in order for your data to be detected by our data curation application.

Typical structure

Raw Data
├── Imaging
│   ├── img1.tiff
│   ├── img2.tiff
│   └── manifest.csv
├── Cognitive Assessments
│   ├── a_visit.xlsx
│   ├── b_visit.xlsx
│   └── manifest.csv
└── RNA-seq
    ├── abc.fq.gz
    ├── def.fq.gz
    └── manifest.csv

When created, these dataset folders are automatically tagged with the special key-value pair contentType=dataset.

...

Note: If the Data Sharing Plan changes, you will need to add or delete some of these folders. Currently, you will need to add the contentType=dataset key-value pair to any new dataset folder yourself in order for the dataset to be detected in the curation application.
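As a rough illustration of adding the key-value pair by hand, the sketch below sets contentType=dataset on a folder while keeping any annotations it already has. It assumes the Synapse Python client (synapseclient) and its `get_annotations` / `set_annotations` methods, which may be superseded in newer client releases. The helper name `tag_as_dataset` and the folder ID are hypothetical. The same annotation can also be added through the Synapse web interface.

```python
def tag_as_dataset(syn, folder_id):
    """Add contentType=dataset to a folder, preserving its other annotations.

    Assumption: `syn` is a logged-in synapseclient.Synapse instance and
    `folder_id` is the Synapse ID of a dataset folder under Raw Data.
    """
    annotations = syn.get_annotations(folder_id)  # existing annotations
    annotations["contentType"] = "dataset"        # key-value pair used for detection
    return syn.set_annotations(annotations)       # save back to Synapse


# Possible usage (requires Synapse credentials; the ID below is a placeholder):
# import synapseclient
# syn = synapseclient.login()
# tag_as_dataset(syn, "syn00000000")
```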

Files should be placed in a dataset folder under Raw Data, not directly under Raw Data, even if there is only one dataset or file type. For raw data types and formatting recommendations, see How to Format Your Data.
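If you stage files locally before uploading, a quick check like the one below can catch files sitting directly under Raw Data. This is a minimal sketch and not part of any NF-OSI tooling. It assumes your local directory mirrors the Synapse folder layout, and the function name `check_raw_data_layout` is hypothetical.

```python
from pathlib import Path


def check_raw_data_layout(raw_data_dir):
    """Return (misplaced_files, dataset_folders) for a local Raw Data directory.

    misplaced_files: names of files placed directly under Raw Data.
    dataset_folders: names of the subfolders that would act as datasets.
    """
    root = Path(raw_data_dir)
    misplaced = sorted(p.name for p in root.iterdir() if p.is_file())
    datasets = sorted(p.name for p in root.iterdir() if p.is_dir())
    return misplaced, datasets


# Possible usage (the path is a placeholder):
# misplaced, datasets = check_raw_data_layout("Raw Data")
# if misplaced:
#     print("Move these into a dataset folder:", misplaced)
```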

Working Example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

...

Data type is the first and most important grouping factor. Create separate folders for separate data types—for example, an RNA-seq folder that will have .fastq files.

  • Note: A metadata schema can be applied at the folder level to describe all files within that folder (and any sub-folders). Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.

...

For each data type, you can further group data in whatever way makes most sense for the study. The example above groups RNA-seq data by release year. You may want to apply a different factor, such as by cohort.

...

Finer organization

  • For each data type, you can group data by batch or by other factors. For example, the RNA-seq data folder may have subfolders “batch 1” and “batch 2” for data produced at different times during the project.

  • You may do something similar with cohort, such as subfolders of MRI data for “French” and “US” patient groups. This may be especially helpful later on if different consents or geography-specific legal requirements apply to that dataset.
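The grouping order described above (data type first, then an optional factor such as batch or cohort) can be sketched as a small path-building helper. This is only an illustration of the convention. The function name `dataset_path` and its parameters are hypothetical, and the folder names come from the examples on this page.

```python
from pathlib import PurePosixPath


def dataset_path(data_type, filename, group=None, root="Raw Data"):
    """Build a destination path: root / data type / optional group / file.

    `group` is an optional finer grouping such as a batch or cohort name.
    """
    parts = [root, data_type]
    if group:
        parts.append(group)
    return PurePosixPath(*parts, filename)


# Data type only
print(dataset_path("Imaging", "img1.tiff"))
# Data type, then batch
print(dataset_path("RNA-seq", "abc.fq.gz", group="batch 1"))
```

Keeping data type as the first level means a folder-level metadata schema still applies cleanly, whichever finer grouping you choose beneath it.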

Milestone Reports or Reporting

...