...

Within these three main folders, you have some flexibility to further structure your assets in whichever way fits your study. However, this structure does determine how governance can be applied, so for the purpose of consistency, community-friendliness, and ease of annotation, we’ve outlined some best practices below.

You can create an additional top-level folder to house materials that fall outside the scope of the pre-generated folders.

Raw Data or Data

The Raw Data or Data folder is intended to be further partitioned for different types of data. This format must be followed in order for your data to be detected by our data curation tooling. See example:

Typical structure

Raw Data
├── Imaging
│   ├── img1.tiff
│   ├── img2.tiff
│   └── manifest.csv
├── Cognitive Assessments
│   ├── a_visit.xlsx
│   ├── b_visit.xlsx
│   └── manifest.csv
└── RNA-seq
    ├── abc.fq.gz
    ├── def.fq.gz
    └── manifest.csv

We usually scaffold this structure based on your Data Sharing Plan. (If the Data Sharing Plan changes, you will need to add or delete some of these folders.) As a best practice, files should be in a folder under Raw Data and not directly under Raw Data, even if there is only one data type. For raw data types and formatting recommendations, see How to Format Your Data.
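As a rough illustration of the layout rule above, the following sketch lists files that sit directly under the Raw Data folder instead of inside a data-type folder. The function name `check_raw_data_layout` is hypothetical and is not part of our data curation tooling. The check for a `manifest.csv` in every data-type folder is an assumption drawn from the example tree, not a documented requirement.

```python
from pathlib import Path


def check_raw_data_layout(raw_data_dir):
    """Return a list of human-readable problems found under raw_data_dir.

    Hypothetical helper for illustration only.
    """
    root = Path(raw_data_dir)
    problems = []
    for entry in sorted(root.iterdir()):
        if entry.is_file():
            # Files should live inside a data-type folder, not directly under the root.
            problems.append(f"{entry.name}: file sits directly under {root.name}")
        elif entry.is_dir() and not (entry / "manifest.csv").is_file():
            # Assumption: each data-type folder carries a manifest.csv, as in the example tree.
            problems.append(f"{entry.name}: no manifest.csv found")
    return problems
```

Running this on a local copy of the folder before upload, for example `check_raw_data_layout("Raw Data")`, returns an empty list when every file is inside a data-type folder and every such folder has a manifest.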

...

The Synodos NF2 project provides a good working example for organization of multiple raw data types within a Data folder. Here are guidelines that this example demonstrates:

  • Data type is again the first and most important grouping factor. Create separate folders for separate data types: for example, an RNA-seq folder that will have .fastq files.

    • A metadata schema can be applied at the folder level to describe all files within that folder (and any sub-folders). Since metadata are specific to data types, having the same type within a folder helps keep metadata valid and consistent.

  • For each data type, you can further group data in whatever way makes most sense for the study (e.g. batches). The example above groups RNA-seq data by release year. You may want to apply a different factor, such as by cohort.

  • Original raw data are separated from processed data; you can create a folder to store the processed data.
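The guidelines above can be sketched as a small path-building helper: data type first, then an optional grouping factor such as release year or cohort, with processed data kept in its own folder. Everything here is an assumption made for illustration, including the function name `target_folder`, the `Processed` folder name, and the decision to place the processed folder above the grouping factor. The guidelines do not prescribe any of these.

```python
from pathlib import PurePosixPath


def target_folder(data_type, group=None, processed=False, root="Data"):
    """Build a folder path of the form root / data type / [Processed] / [group].

    Hypothetical helper: the folder names and their order are illustrative.
    """
    # Data type is the first grouping factor.
    path = PurePosixPath(root) / data_type
    if processed:
        # Keep processed data apart from the original raw data.
        path = path / "Processed"
    if group is not None:
        # Optional second factor, e.g. a release year or a cohort name.
        path = path / str(group)
    return str(path)
```

For example, `target_folder("RNA-seq", group=2016)` gives `Data/RNA-seq/2016`, and passing `processed=True` as well gives `Data/RNA-seq/Processed/2016`.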

Milestone Reports or Reporting

...