Metadata
Metadata is standardized information provided along with data that helps with data organization and description. It is, essentially, data about the data.
Metadata is useful to you and to anyone else who needs to discover, access, or use the corresponding data or the repository in which it is stored. For that reason, metadata should be understandable to a diverse set of users, so that anyone can interpret the data comprehensively and in context.
What is the purpose of metadata?
Metadata serves many purposes:
Data understandability
Metadata gives context to the data, allowing it to be understood by others, including those who were not involved in generating it. For example, descriptive metadata such as the study name, the assay performed, the tissue type, and the species provides general information that allows a user, such as a bioinformatician, to decide whether they can reuse the data for analysis or other purposes.
Data discovery and accessibility
Because metadata is provided in a standardized format, the data can be searched by individual metadata elements and accessed in an organized way. This standardization can greatly improve the accuracy of a search and reduce ambiguity.
Data interoperability
The standardized nature of metadata allows the data to be efficiently integrated with other applications.
Data reuse
All of the purposes above contribute to the reusability of data. If data is not understandable, discoverable, accessible, or interoperable, it will be difficult or even impossible to reuse.
Metadata requirements
All data uploaded to Synapse is curated by the NF-OSI to ensure that its metadata makes the data usable. In most cases, the required metadata is expected in four files:
Individual metadata: .csv file describing each individual in the study
Biospecimen metadata: .csv file describing the specimens collected in the study
Assay metadata: .csv file(s) describing the assay(s) performed (one file to be uploaded per study, if multiple assays were used)
A manifest: a .tsv (tab-delimited text) file listing each file that will be uploaded, used to upload the data after it has been validated and approved
For more information on how to upload metadata as a data contributor, see the data contribution section.
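As a rough illustration of the manifest described above, the following sketch parses a tab-delimited file with Python's standard csv module and reports any expected columns that are missing. The column names (path, parent, assay, specimenID) and all row values are placeholders assumed for this example; the actual manifest template may use different ones.

```python
import csv
import io

# Hypothetical manifest columns, assumed for illustration only.
EXPECTED_COLUMNS = ["path", "parent", "assay", "specimenID"]


def read_manifest(text):
    """Parse tab-delimited manifest text into (column names, row dicts)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = list(reader)
    # fieldnames is None when the input has no header line at all.
    return reader.fieldnames or [], rows


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from the manifest header."""
    return [col for col in expected if col not in columns]


# Placeholder content: one header line, then one line per file to upload.
example = (
    "path\tparent\tassay\tspecimenID\n"
    "data/sample1.fastq\tsyn000\texampleAssay\tS1\n"
    "data/sample2.fastq\tsyn000\texampleAssay\tS2\n"
)

columns, rows = read_manifest(example)
print(len(rows), "files listed; missing columns:", missing_columns(columns))
```

A check like this only catches structural problems in the manifest; it does not replace validation and approval by the curation team.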
Metadata dictionary
The metadata dictionary provides the terms you need to know in order to query for data as a user or to contribute data as a contributor. As a user, knowing these terms allows you to search for and find the data you need.
Find more information on annotations and how to assign them to your uploaded data here.
In general, the metadata dictionary is useful if you are:
A user who needs to know what terms in the annotations and metadata files mean so you can properly search for and find the data you need.
A contributor who needs to know what the accepted values are for a given key. For example, if you used a HiSeq 2000 machine to generate your data, then you would need to know that the accepted value for that machine is HiSeq2000 (no space). You also need to know that it goes in the column with the key platform.
A curator who needs to know what keys/values are currently available in order to direct contributors to use them, or to gauge the need for any additional ones.
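The contributor case above (HiSeq 2000 versus HiSeq2000 in the platform column) can be sketched as a small lookup against the dictionary. This is a minimal sketch, not the portal's validation tooling: the DICTIONARY structure and the check_value helper are hypothetical, and only the platform key and the HiSeq2000 value come from the text.

```python
# Hypothetical excerpt of a metadata dictionary: key -> accepted values.
# Only "platform" and "HiSeq2000" are taken from the text; the structure is assumed.
DICTIONARY = {"platform": {"HiSeq2000"}}


def normalize(value):
    """Drop whitespace and case so near-miss spellings compare equal."""
    return "".join(value.split()).lower()


def check_value(key, value, dictionary=DICTIONARY):
    """Return (is_valid, suggestion) for a key/value pair.

    suggestion is the accepted spelling when one matches after
    normalization, otherwise None.
    """
    accepted = dictionary.get(key)
    if accepted is None:
        return False, None  # the key itself is not in the dictionary
    if value in accepted:
        return True, value
    for candidate in sorted(accepted):
        if normalize(candidate) == normalize(value):
            return False, candidate
    return False, None


print(check_value("platform", "HiSeq 2000"))  # (False, 'HiSeq2000')
print(check_value("platform", "HiSeq2000"))   # (True, 'HiSeq2000')
```

Returning a suggestion rather than silently correcting the value keeps the contributor in control of what ends up in the metadata file.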