If and when your data sharing plan is approved, we will work with you to set up a repository for your data, known as a project, in Synapse. We will create a basic folder structure, based on community standards, for contributors to use.

This page explains best practices for organizing data and other materials within your NF project. Following this structure will make annotation and other data management easier for both you and NF-OSI staff.

Project Folders Overview

Your Synapse project will usually be set up with these top-level folders:

  • Raw Data or Data - contains subfolders organized by expected datasets.

  • Milestone Reports or Reporting - contains generated reports for project; your program officers may also upload reports here.

  • Data Sharing Plan - contains versioned copies of your Data Sharing Plan.

  • Analysis - figures or other outputs not considered “raw data”.

Note: older or independent projects (not sponsored by one of our funders) may not have this exact top-level scheme.

Within these main folders, you have some flexibility to further structure your assets in whichever way fits your study. However, this structure determines how easily others can find and understand your contributions, how easily you can annotate data, and how governance can be applied.

You can create other top-level folders to house materials that fall outside the scope of the pre-generated folders, and these will be ignored.

Raw Data or Data

The Raw Data or Data folder will have folders for datasets based on your data sharing plan. This format must be followed in order for your data to be detected by our data curation application. See example:

Typical structure

Raw Data
├── Imaging
│   ├── img1.tiff
│   ├── img2.tiff
│   └── manifest.csv
├── Cognitive Assessments
│   ├── a_visit.xlsx
│   ├── b_visit.xlsx
│   └── manifest.csv
└── RNA-seq
    ├── abc.fq.gz
    ├── def.fq.gz
    └── manifest.csv

We usually scaffold this structure based on your Data Sharing Plan. When created, these dataset folders are automatically tagged with the special key-value pair contentType=dataset.

...

Info

If the Data Sharing Plan changes, you will need to add or delete some of these folders.

...

Currently, you will need to add the contentType key-value pair yourself in order for the dataset to be detected in the curation application.
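One way to add the key-value pair is through the Synapse Python client rather than the web interface. The following is only a minimal sketch: it assumes the `synapseclient` package and its `get_annotations` / `set_annotations` methods, and the folder ID shown in the usage comment is a hypothetical placeholder. The helper name `tag_as_dataset` is made up for illustration.

```python
def tag_as_dataset(syn, folder_id):
    """Add contentType=dataset to a folder's annotations and save them.

    `syn` is assumed to be a logged-in synapseclient.Synapse object;
    `folder_id` is the Synapse ID of a dataset folder you created.
    """
    # Fetch the current annotations so existing keys are preserved.
    annotations = syn.get_annotations(folder_id)
    annotations["contentType"] = "dataset"
    # Store the updated annotations back on the folder.
    return syn.set_annotations(annotations)


# Example usage (requires synapseclient and valid credentials):
#   import synapseclient
#   syn = synapseclient.login()
#   tag_as_dataset(syn, "syn00000000")  # hypothetical folder ID
```

Depending on your client version, other interfaces for editing annotations may be available; the same key-value pair can also be entered by hand in the folder's annotation editor.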

Files should be in a folder under Raw Data and not directly under Raw Data, even if there is only one dataset or one type of file. For raw data types and formatting recommendations, see How to Format Your Data.
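Before uploading, it may help to check a local copy of your data against this layout. The sketch below is not part of the NF-OSI tooling; the function name is hypothetical, and the manifest check is an assumption drawn from the example tree, where each dataset folder carries a manifest.csv.

```python
from pathlib import Path


def check_raw_data_layout(raw_data_dir, manifest_name="manifest.csv"):
    """Return a list of problems found in a local Raw Data directory."""
    root = Path(raw_data_dir)
    problems = []
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            # Files must sit inside a dataset folder, not directly under Raw Data.
            problems.append(f"file directly under {root.name}: {entry.name}")
        elif entry.is_dir() and not (entry / manifest_name).is_file():
            # Assumption: every dataset folder includes a manifest file.
            problems.append(f"dataset folder without {manifest_name}: {entry.name}")
    return problems


# Example usage:
#   for problem in check_raw_data_layout("Raw Data"):
#       print(problem)
```

An empty list means every file is inside a dataset folder and every dataset folder has a manifest.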

Working Example

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

Data type is again the first and most important grouping factor. Create separate folders for separate data types, for example, an RNA-seq folder that will hold .fastq files.

...

Finer organization

  • For each data type, it is possible to group data by batch or certain other factors, in whatever way makes most sense for the study. The example above groups RNA-seq data by release year. As another example, the RNA-seq data folder may have subfolders “batch 1” and “batch 2” that were produced at different times during the project.

  • Original raw data are separated from processed data.

  • You may do something similar with cohort, having subfolders for MRI data with “French” vs “US” patient groups, which may be especially helpful later on if different consents or geography-specific legal requirements apply to that dataset.

Milestone Reports or Reporting

...