History of Controlled Vocabularies in Synapse

Any discussion of improving project organization is bound to touch on the use of controlled vocabularies to ensure that all contributed data is annotated correctly. Synapse has a long history of controlled vocabulary features. In fact, the first release of Synapse in 2011 required every data contributor to conform to an established set of objects, each with a controlled set of required annotations. Most users were confused by the object models. What is a layer, and how does it differ from a dataset? When I upload a PowerPoint, why am I required to choose a species or tissue type? Users complained that we were being too restrictive.

In a subsequent release of Synapse, we added a feature that let advanced users define their own objects with their own fields, in an attempt to provide more options. The number of object types peaked at about fifteen (Study, Data, Link, Preview, Media, Project, Folder, PhenotypeData, RObject, Code, Step, Analysis, Row, Reference, Layer). We polled our users and discovered that everyone was confused, including our power users.

A decision was made to reduce the object types to Projects, Folders, and Files. This decision created a schism within the Synapse community. It took months of negotiation to figure out how to collapse all of the existing objects into the three new types.

Probably the least-known controlled vocabulary feature in the history of Synapse was "Concept Services". This feature set allowed a project organizer to use an OWL ontology as a controlled vocabulary for annotations. The first and only ontology applied using this feature was an extension of the NCI oncology ontology. The project organizers were the only users to contribute data to this project.

The history of Synapse suggests that controlled vocabulary features will ensure that only those who created the vocabulary will contribute to the project.