Glossary

Browse the glossary to learn more about terms and definitions commonly used throughout Synapse.

ACT

Abbreviation for the Synapse Access and Compliance Team, a group of people who are responsible for setting, maintaining, and controlling governance throughout Sage and its platforms. The ACT recommends appropriate safeguards depending on the type of data and who can access it, and can apply certain limitations or conditions for data access based on how the data will be shared.

Find more information on the ACT here.

Annotations

Annotations help users search for and find data, and they are a powerful tool used to systematically group and/or describe things in Synapse.

Annotations are stored as key-value pairs in Synapse, where the key defines a particular aspect of your data (for example, species, assay, file format) and the value defines a variable that belongs to that category (mouse, RNAseq, .bam). You can use annotations to add additional information about a project, file, folder, table, or view.

Annotations can be based on an existing ontology or controlled vocabulary, or can be created as needed and modified later as your metadata evolves.
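
As a rough illustration of the key-value idea described above, the following Python sketch models annotations as plain dictionaries and filters a small list of files by one annotation. The file names and the `find_by_annotation` helper are hypothetical and are not part of any Synapse client; the keys and values mirror the examples given in this entry.

```python
# Hypothetical files, each carrying annotations as key-value pairs.
files = [
    {"name": "sample_a.bam",
     "annotations": {"species": "mouse", "assay": "RNAseq", "fileFormat": "bam"}},
    {"name": "sample_b.bam",
     "annotations": {"species": "human", "assay": "RNAseq", "fileFormat": "bam"}},
    {"name": "sample_c.bam",
     "annotations": {"species": "mouse", "assay": "RNAseq", "fileFormat": "bam"}},
]


def find_by_annotation(items, key, value):
    """Return the names of items whose annotations map `key` to `value`."""
    return [
        item["name"]
        for item in items
        if item["annotations"].get(key) == value
    ]


# The key ("species") names an aspect of the data; the value ("mouse")
# is one member of that category.
mouse_files = find_by_annotation(files, "species", "mouse")
print(mouse_files)
```

Because every file uses the same keys, a single lookup is enough to group files that share a value, which is what makes consistent annotations useful for search.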

Anonymous access

This is a type of data access setting. Data set to anonymous access is available to anyone on the web, without Conditions for Use.

The other data access tiers are: private access, controlled access, and open access.

Learn more about data access here.

API

Abbreviation for Application Programming Interface. An API is a software interface that allows otherwise unconnected services to integrate or connect with one another. Synapse provides API clients so that certain tasks, such as uploading and downloading data, can be performed programmatically.

Find more information, including installation instructions, on Sage’s API clients here.

Certified User

This is one of four user account types in Synapse. The account type determines what actions a user can perform. Certified users have full access to Synapse functionality.

The other user account types are: anonymous users, registered users, and validated users.

Find more information on user account types here.

Challenges

An open-science, collaborative competition framework for evaluating and comparing computational algorithms.

Command line

One of the API clients that provides a way to use Synapse programmatically.

Learn how to install the Synapse command line client here.

Is this enough?

Controlled access

This is a type of data access setting. Data set to controlled access is available to registered, certified, or validated users who fulfil specific requirements for data access.

The other data access tiers are: private access, open access, and anonymous access.

Learn more about data access here.

Data dictionary

This is essentially a repository for metadata used throughout the Synapse site.

Need more here…

Data science

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains.

Source: Wikipedia. Is there a better / more Sage-focused explanation?

Data model

An abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities. A data model explicitly determines the structure of data.

Source: Wikipedia. Is there a better / more Sage-focused explanation?

Digital object identifier (DOI)

A distinct alphanumeric string assigned to uniquely label and identify a digital object. A DOI is defined by a digital location like a URL and a description of the object, which includes attribution and a creation or publication date.

Docker

A tool for creating, running, and managing lightweight containers that bundle code and its dependencies. You can add a Docker container to a project and share it with your teammates. Learn more about Docker here.

Entity

Any distinct object in Synapse, including a file, folder, project… what else?

Experimental mode

A mode in Synapse that exposes new features and feature updates that are still in development, before they are ready to be pushed live. Anyone can try this mode using the Experiment Mode link at the bottom right of Synapse.

FAIR

Abbreviation for Findable, Accessible, Interoperable, and Reusable. This is the standard set of principles that we follow at Sage. FAIR data are discoverable to users through precise metadata, understandable in terms of how the data can be used, machine-readable to enable computational analysis, and ultimately, fit for reuse.

Governance

A set of protocols, conditions, and responsibilities that Sage has established to ensure the quality, compliance, and usability of data uploaded to our platforms.

Synapse governance is an essential component of the Synapse platform; it is a system of policies, procedures, and tools for managing and protecting data in Synapse. Our policies define the community norms, user rights, and user responsibilities. Our procedures determine who can contribute, access, and use content, and how.

Read more about Synapse governance here.

JSON

Abbreviation for JavaScript Object Notation. JSON is a lightweight, text-based data-interchange format built on two structures: an object (a collection of name-value pairs) and an array (an ordered list of values).
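
A short Python sketch of the two JSON structures mentioned above, an object and an array, using the standard `json` module. The record contents are made up for illustration.

```python
import json

# A JSON object (name-value pairs) that contains a JSON array.
text = '{"species": "mouse", "assay": "RNAseq", "replicates": [1, 2, 3]}'

# Parsing turns the object into a dict and the array into a list.
record = json.loads(text)
print(type(record).__name__)
print(type(record["replicates"]).__name__)

# Serializing turns the Python structures back into JSON text.
round_trip = json.dumps(record, sort_keys=True)
print(round_trip)
```

The same text can be read by any language with a JSON parser, which is why the format is commonly used to exchange data between services.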

This may need tweaking… I tried to make it “approachable” but I may have missed key points or got something wrong.

JSON schema

A specific JSON-based format that defines the structure of JSON data for validation, documentation, and interaction control. It provides a contract for the JSON data required by a given application, and how that data can be modified.
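
To make the "contract" idea concrete, here is a deliberately simplified Python sketch. The schema and the `check` helper are hypothetical; `check` understands only the `type`, `required`, and `properties` keywords, which is a small fraction of what JSON Schema defines. Real validation would normally rely on a dedicated validator library rather than hand-written code like this.

```python
import json

# Hypothetical schema: a record must be an object with a string "species"
# and an array "replicates"; "assay" is optional but must be a string.
schema = json.loads("""
{
  "type": "object",
  "required": ["species", "replicates"],
  "properties": {
    "species": {"type": "string"},
    "assay": {"type": "string"},
    "replicates": {"type": "array"}
  }
}
""")

# Map the JSON Schema type names used above to Python types.
PY_TYPES = {"object": dict, "array": list, "string": str}


def check(record, schema):
    """Return a list of problems; an empty list means the record passed.

    Handles only the "type", "required", and "properties" keywords.
    """
    if not isinstance(record, PY_TYPES[schema["type"]]):
        return [f"expected {schema['type']}"]
    problems = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required property: {name}")
    for name, rule in schema.get("properties", {}).items():
        if name in record and not isinstance(record[name], PY_TYPES[rule["type"]]):
            problems.append(f"{name}: expected {rule['type']}")
    return problems


good = {"species": "mouse", "replicates": [1, 2]}
bad = {"species": 7}
print(check(good, schema))
print(check(bad, schema))
```

The schema itself is ordinary JSON, so it can be stored, shared, and versioned alongside the data it describes.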

Source: Wikipedia. Please check for accuracy/relevance

Key-value pairs

Key-value pairs are used in annotations, where the key defines a particular aspect of your data (for example, species, assay, file format) and the value defines a variable that belongs to that category (mouse, RNAseq, .bam).

Is this enough explanation? Are key-value pairs used outside of annotations?

Manifest

This is a file that is uploaded alongside data and specifies information about the data files being uploaded. It also contains the annotations that will be associated with each file in Synapse. It gives the current location of each file to be uploaded (via path) and the Synapse ID of the folder where the files will be uploaded (via parent). The manifest can also describe the provenance of each file, indicating how it was generated; this is optional but helpful.

There are several different types of manifests used throughout Synapse:

Upload manifest: This is a .tsv file used to upload metadata. More details, along with a template, are provided here.
Download manifest: This is used when downloading data programmatically. The template is provided by the Synapse Python client.
File Schema Driven Manifest: This is based on the new File Schema.
Portals Manifest: This is currently provided when exporting data.
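
As a minimal sketch of what an upload manifest might contain, the Python snippet below writes a small tab-separated table with the `path` and `parent` columns described above, plus two annotation columns. The file paths, the placeholder Synapse ID `syn00000000`, and the choice of annotation columns are assumptions for illustration only; the official template remains the reference for the exact columns.

```python
import csv
import io

# Hypothetical rows: local file paths, a placeholder parent folder ID,
# and annotations to attach to each file.
rows = [
    {"path": "data/sample_a.bam", "parent": "syn00000000",
     "species": "mouse", "assay": "RNAseq"},
    {"path": "data/sample_b.bam", "parent": "syn00000000",
     "species": "mouse", "assay": "RNAseq"},
]

# Write the rows as tab-separated text. An in-memory buffer is used here;
# a real manifest would be written to a .tsv file on disk.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["path", "parent", "species", "assay"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)

manifest_text = buffer.getvalue()
print(manifest_text)
```

Each row describes one file: where it currently lives, where it should go, and which annotations to attach.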

Markdown

Markdown is a lightweight markup language used for creating formatted text (such as italics, bold, hyperlinks, and paragraph breaks). It is simple in that the source text can be processed by computers while remaining readable and understandable to people.

Markdown is used in certain parts of Synapse, including wikis and discussion forums.

Metadata

Metadata is additional, standardized information included alongside the data to give it context—data about the data, if you will. Metadata is what allows data in Synapse to be searchable, discoverable, accessible, re-usable, and understandable to others, including those who were not involved in the data generation process.

Metadata can be descriptive (e.g., the name of the file), administrative (e.g., provenance information), or research-based (e.g., information about the sampling and handling of data).

Ontology

In the context of data science, an ontology is essentially the system in place for naming and classifying entities and the relationships between them, as they exist in a particular data model. For example, the ontology of a research study would specify an appropriate naming convention for terms used throughout the study.

I made this up but not sure if it’s accurate / if I captured everything. Probably needs tweaking.

Open access

This is a type of data access setting. Data set to open access is available to all registered Synapse users, without use limitations.

The other data access tiers are: private access, controlled access, and anonymous access.

Learn more about data access here.

Permissions

This refers to the level of access that a Synapse user or team has to view, download, edit, delete, and manage data. Permissions can be set within the sharing settings of projects, files, folders, and tables.

Read more about permissions and how to set them here.

Private access

This is a type of data access setting. Data set to private access is visible only to the data owner and to any other users to whom the owner grants access.

The other data access tiers are: controlled access, open access, and anonymous access.

Learn more about data access here.

Project

In Synapse, projects act as containers that group relevant content and people together.

Provenance

Provenance is a concept describing the origin of something. In Synapse, it is used to describe the connections between the workflow steps used to create a particular file or set of results. The Synapse provenance system is one of many solutions that make research work reproducible by you and others.

Read more about Synapse provenance here.

Python

One of the API clients that provides a way to use Synapse programmatically.

Learn how to install the Synapse Python client here.

Is this enough?

Registered User

This is one of four user account types in Synapse. The account type determines what actions a user can perform. Registered users can create projects and wikis, collaborate with other registered users, create Synapse teams, download publicly available data, and access controlled data (if they fulfil the conditions for use).

The other user account types are: anonymous users, certified users, and validated users.

Find more information on user account types here.

REST (or REST API)

Need explanation

RNA-Seq

A sequencing technique which uses next-generation sequencing to reveal the presence and quantity of RNA in a biological sample at a given moment, analyzing the continuously changing cellular transcriptome (the set of all RNA transcripts in an individual or population of cells).

Source: Wikipedia. Is there a better / more Sage-focused explanation?

Schema

A snapshot of all the objects contained in a database and the relationships between them. Essentially, it is the structure of your data. In Synapse entities such as views and tables, the schema defines the column names, as well as the values or types of data allowed in each column.

Please review for accuracy.

Sharing Settings

Settings that determine who can access content in Synapse and what permissions those users have with respect to a dataset.

Learn more about sharing settings here.

Synapse ID (synID)

Every object in Synapse (file, folder, project, table, view, user, etc.) is designated a unique Synapse ID (also known as synID) that is readable by programmatic clients.

Is this accurate?

Validated User

This is one of four user account types in Synapse. The account type determines what actions a user can perform. Validated users are certified users who have applied to have their user profile validated. Validation makes a user eligible to request access to mHealth data.

The other user account types are: anonymous users, registered users, and certified users.

Find more information on user account types here, and specifically on validated users (including instructions for gaining validation) here.