Metadata is standardized information about your data and will be used to annotate the data files in Synapse. Four metadata files are required for each study:
Individual
Biospecimen
Assay
Manifest
UPDATE - May 2023
The following approach will be deprecated by Q3 2023. Stay tuned for updates to the workflow.
The Climb database variables that are exported to Synapse are mapped here: https://www.synapse.org/#!Synapse:syn26137185
Metadata Templates
Metadata templates provide guidance for allowed variable keys and values on the dictionary and values worksheets. The AD Metadata Dictionary includes the latest information about allowed values. The latest templates are linked below.
For your metadata to validate successfully, download the latest Metadata Templates and populate them with the relevant information. Once you have populated the metadata fields, export the template worksheet from each file as a plain-text comma-separated values (CSV) file, then follow the validation instructions below.
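Before running the full validation, a quick local check can catch values that fall outside a template's allowed list. The sketch below is a minimal Python example using only the standard library. It is not part of dccvalidator, and the `sex` column and its allowed values are hypothetical stand-ins for entries in the AD Metadata Dictionary.

```python
import csv
import io

# Hypothetical allowed values standing in for a template's "values" worksheet;
# the authoritative list is the AD Metadata Dictionary.
ALLOWED_VALUES = {
    "sex": {"female", "male", "unknown"},
}


def invalid_values(csv_text, allowed):
    """Return (row_number, column, value) for every non-empty cell not in its allowed set."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Row 1 is the header, so data rows are numbered from 2, as in a spreadsheet.
    for row_number, row in enumerate(reader, start=2):
        for column, permitted in allowed.items():
            value = (row.get(column) or "").strip()
            if value and value not in permitted:
                problems.append((row_number, column, value))
    return problems


example = "individualID,sex\nA1,female\nA2,Female\nA3,\n"
print(invalid_values(example, ALLOWED_VALUES))  # [(3, 'sex', 'Female')]
```

Note that the comparison is case-sensitive, so "Female" is flagged even though "female" is allowed. Blank cells are skipped here; whether a blank is acceptable depends on the template.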
Individual Animal
This file contains metadata about each individual animal that is part of a study. Each animal will be described in one row with information that is true of the animal as a whole (e.g., individualID, genotype).
Template: template_individual_animal_model-ad.xlsx
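Because each animal should occupy exactly one row, a repeated individualID usually signals a copy-paste error. The following is a minimal sketch, assuming the Individual Animal template has been exported to CSV with an `individualID` column. The `duplicate_ids` helper and the example rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_ids(csv_text, id_column="individualID"):
    """Return IDs that occur on more than one row, in the order first seen."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[id_column].strip() for row in reader)
    return [value for value, n in counts.items() if n > 1]


# Made-up rows: animal A1 was accidentally entered twice.
individuals = "individualID,genotype\nA1,genotypeX\nA2,WT\nA1,genotypeX\n"
print(duplicate_ids(individuals))  # ['A1']
```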
Biospecimen
A biospecimen is a sample of cells, tissue, RNA, DNA, etc. This metadata file contains information about each biospecimen that is part of a study, including details like what organ and tissue the specimen is from.
The biospecimen and individual animal metadata files are linked by the individualID variable. Verify that these values are consistent across the two files. Each individualID may have more than one associated specimenID. Not all data will have an associated biospecimen; for instance, behavioral or imaging studies may only have records in the individual animal metadata file.
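The individualID link can be checked by looking for IDs in the biospecimen file that have no row in the individual animal file. The hypothetical `unmatched_ids` helper below checks only that direction, because animals without biospecimens (for instance, from behavioral studies) are allowed. The example data are made up.

```python
import csv
import io


def column_values(csv_text, column):
    """Return the stripped values of one column from CSV text."""
    return [row[column].strip() for row in csv.DictReader(io.StringIO(csv_text))]


def unmatched_ids(child_csv, parent_csv, key):
    """Return sorted key values present in child_csv but absent from parent_csv."""
    parent_keys = set(column_values(parent_csv, key))
    return sorted({v for v in column_values(child_csv, key) if v not in parent_keys})


individual = "individualID,sex\nA1,female\nA2,male\nA3,female\n"
biospecimen = (
    "individualID,specimenID,tissue\n"
    "A1,S1,hippocampus\n"
    "A1,S2,cortex\n"   # one animal, several specimens: allowed
    "A4,S3,cortex\n"   # A4 is not in the individual file: flagged
)
print(unmatched_ids(biospecimen, individual, "individualID"))  # ['A4']
```

Animals A2 and A3 have no biospecimens and are not reported, which matches the rule that individual-only records are acceptable.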
The biospecimen and assay metadata files are linked by the specimenID variable, which should be consistent across these two files. If more than one assay was conducted on the same specimen, use the same specimenID throughout.
Template: template_biospecimen.xlsx
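A similar check can be applied to the specimenID link. The sketch below records which assay files use each specimen, so a specimen shared by several assays appears under a single ID. It also lists any specimenIDs that are missing from the biospecimen file. The function name, assay names, and columns are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict


def specimen_usage(biospecimen_csv, assay_csvs):
    """Map each specimenID used by an assay file to the assays that use it,
    and list the specimenIDs that are missing from the biospecimen file."""
    known = {row["specimenID"].strip()
             for row in csv.DictReader(io.StringIO(biospecimen_csv))}
    usage = defaultdict(set)
    for assay_name, text in assay_csvs.items():
        for row in csv.DictReader(io.StringIO(text)):
            usage[row["specimenID"].strip()].add(assay_name)
    missing = sorted(s for s in usage if s not in known)
    return {s: sorted(names) for s, names in usage.items()}, missing


biospecimen = "individualID,specimenID\nA1,S1\nA1,S2\n"
assays = {  # hypothetical assay names and columns
    "rnaSeq": "specimenID,batch\nS1,b1\nS2,b1\n",
    "proteomics": "specimenID,batch\nS1,p1\nS9,p1\n",
}
usage, missing = specimen_usage(biospecimen, assays)
print(usage["S1"])  # ['proteomics', 'rnaSeq']
print(missing)      # ['S9']
```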
Assays
Each assay metadata file contains information about the assay. There are multiple templates because the information collected varies by assay.
The assay and biospecimen metadata files are linked by the specimenID variable, which appears in both files and should be consistent across them.
Not all assays have related assay metadata templates, but let the DCC know if you would like to collaborate on the development of new templates.
Manifest
A tab-delimited manifest file lets you upload and download many data files, and set their annotations, all at once using a client (Python, R, command line). Each row in the manifest specifies the file to be uploaded and the annotations to be applied.
Template: template_manifest.xlsx
Specify:
path – the path of the file to upload (local, server, cloud)
parentID – the Synapse ID of the staging location for each file
Annotations are key-value pairs that associate metadata with a file and help users find and query data (see Synapse Annotation documentation).
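A manifest can also be generated programmatically instead of being typed into the template. This minimal sketch writes tab-separated text using Python's `csv` module. The column names follow this page (path, parentID, then one column per annotation key) and should be checked against template_manifest.xlsx. All paths, IDs, and annotation values are placeholders.

```python
import csv
import io


def build_manifest(rows, annotation_keys):
    """Return tab-separated manifest text with path, parentID, then one column per annotation."""
    fieldnames = ["path", "parentID", *annotation_keys]
    out = io.StringIO()
    # delimiter="\t" makes this a TSV; cells missing from a row are written as empty.
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = [  # placeholder paths, Synapse IDs and annotation values
    {"path": "data/S1_R1.fastq.gz", "parentID": "syn00000000", "specimenID": "S1", "assay": "rnaSeq"},
    {"path": "data/S2_R1.fastq.gz", "parentID": "syn00000000", "specimenID": "S2"},
]
print(build_manifest(rows, ["specimenID", "assay"]), end="")
```

Using a TSV writer, rather than joining strings by hand, keeps each row's columns aligned with the header even when some annotations are left blank.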
Provenance describes the relationship between raw and processed data (see Synapse Provenance documentation). If you are uploading the results of an analysis, you may add a Used column to the manifest that gives the Synapse ID(s) of the raw files that went into the analysis. If multiple Synapse IDs should be associated with a processed file, separate them with a semicolon.
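When filling in the Used column, the Synapse IDs of the input files are joined with semicolons. The hypothetical `used_cell` helper below performs the join and rejects any value that does not look like a Synapse ID. The ID pattern is a simplified assumption, and the IDs in the example are placeholders.

```python
import re

# Simplified pattern: "syn" followed by digits (e.g. syn26137185).
# Assumption: versioned IDs such as "syn123.4" are not accepted here.
SYNAPSE_ID = re.compile(r"syn\d+")


def used_cell(input_ids):
    """Join the Synapse IDs of an analysis's raw inputs into one Used value."""
    ids = [i.strip() for i in input_ids]
    bad = [i for i in ids if not SYNAPSE_ID.fullmatch(i)]
    if bad:
        raise ValueError(f"not Synapse IDs: {bad}")
    return ";".join(ids)


print(used_cell(["syn111", " syn222 "]))  # syn111;syn222
```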
Once you have populated the manifest fields, export the template worksheet as a plain-text tab-separated file (TSV). You are now ready to validate the four metadata files.
Metadata Validation
To standardize data submissions and support quality control, we’ve built a metadata validation tool, dccvalidator, that performs several data quality checks on metadata templates and manifest files.
Validated metadata can be uploaded to the staging location provided by the DCC Curator. See more information about uploading data.