...

A dataset is a collection of files that already exist in Synapse and that may be hosted in one or more Synapse projects or folders. You can create a dataset that includes any files that you have read access to, whether or not you own or can edit them.

You can use a dataset to:

  • Search and query many files at once

  • View and edit file annotations in bulk

  • Collect and distribute a set of files generated from the same study or project

  • Create a single item to represent a group of files that exist across disparate projects or folders

The main use of datasets is to collect and share immutable sets of items that:

  • You created and want to distribute to the community

  • You created and want to connect with a publication or tool

  • You found in Synapse and used as part of your own research, and want to distribute

Although a dataset is similar to a view, it serves different purposes. While a view allows you to set a scope for a folder that could be continuously updating, a dataset includes specific versions of files that you determine when setting it up.

After you create a dataset, it exists as a draft version, meaning you can continue editing it as you wish. You can also create a stable (static) version of the dataset at that point in time, which cannot be changed. You can share this stable version with others, or link it to a publication, by minting a DOI.
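The relationship between an editable draft and a fixed stable version can be pictured with a small Python model. This is only a conceptual sketch, not how Synapse implements datasets: the class `DraftDataset` and its methods are hypothetical names invented for illustration, and the synIDs are made up.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class DraftDataset:
    """Hypothetical stand-in for a draft dataset: file synID -> pinned file version."""
    items: dict = field(default_factory=dict)
    stable_versions: list = field(default_factory=list)

    def add_file(self, syn_id: str, version: int) -> None:
        # Adding a file pins one specific version of it.
        self.items[syn_id] = version

    def remove_file(self, syn_id: str) -> None:
        self.items.pop(syn_id, None)

    def create_stable_version(self):
        # Copy the current items so later draft edits do not leak in,
        # then expose the copy through a read-only view.
        snapshot = MappingProxyType(dict(self.items))
        self.stable_versions.append(snapshot)
        return snapshot


draft = DraftDataset()
draft.add_file("syn111", 1)
draft.add_file("syn222", 3)
v1 = draft.create_stable_version()

# The draft stays editable after the snapshot is taken.
draft.add_file("syn333", 1)
draft.remove_file("syn111")
```

In this sketch, `v1` still lists `syn111` and `syn222` after the draft changes, and attempting to assign into `v1` raises a `TypeError`, which mirrors the idea that a stable version cannot be changed.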

...

Note

Notice the banner indicating that this is a draft version of the dataset. A draft dataset should not be distributed externally until it is finalized by creating a stable version. See Create a Stable Version for more information.

...

  1. In the dataset, click Add Items

  2. In the Add Files to Dataset window, browse for the file(s) you want to add

    1. Click on the name of a project to see all folders, files, and tables contained within that project. Note that only files can be selected and added to the dataset

    2. If you want to see the contents of an individual folder, click the dropdown arrow next to a project name, or next to a folder, to reveal all of its contents. This will allow you to select individual files contained within

    3. You can also search for individual files using the Search for Files tool (Note that you cannot use this tool to search for folders or projects, only individual files)

  3. Click the checkbox next to any of the file(s) that you want to add. If you want to add all files from within a folder, you can click the general checkbox at the top of the list to add all contents.
    Your selections will appear in the Selected box at the bottom. You can remove individual selections from here if necessary. You can also select which version of the file you want to appear in your dataset.

  4. Once you have selected all of the files you want, click Add Files. All files from your Selected box will be added to the dataset. At this point, before saving the dataset, you can still add or remove files from the dataset, or change the version of any files (see screenshot below)

  5. Click Save to save your current selection and return to the draft dataset
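If you prepare a file selection in a script rather than in the web interface, the steps above amount to building a list of (file, version) pairs. The sketch below is a plain-Python illustration under stated assumptions: the helper `build_dataset_items` is hypothetical, the `entityId` and `versionNumber` keys reflect the dataset-item shape as we understand it from the Synapse REST model (verify against the current API documentation before relying on it), and the rule that a later pick of the same file replaces an earlier one is a choice made for this example.

```python
def build_dataset_items(selections):
    """Turn (synID, version) pairs into a list of dataset-item dicts.

    If the same file is selected twice, the later version wins
    (an assumption of this sketch, not documented Synapse behavior).
    """
    chosen = {}
    for syn_id, version in selections:
        # Only accept IDs of the form "syn" followed by digits.
        if not (syn_id.startswith("syn") and syn_id[3:].isdigit()):
            raise ValueError(f"not a synID: {syn_id!r}")
        if version < 1:
            raise ValueError(f"version must be >= 1, got {version}")
        chosen[syn_id] = version
    return [{"entityId": s, "versionNumber": v} for s, v in chosen.items()]


# Made-up synIDs: syn111 is picked twice, so version 2 replaces version 1.
items = build_dataset_items([("syn111", 1), ("syn222", 3), ("syn111", 2)])
```

Here `items` holds one entry per file, each pinned to a single version, which is the same information the Selected box shows before you click Add Files.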

...

Once you have created a draft dataset, you can work with it in several ways, much as with other features in Synapse. You can:

  • Create a stable version (a static snapshot of the dataset)

  • Edit sharing settings

  • Annotate the dataset with metadata (in order to query for sets of datasets)

  • Create a wiki (add documentation of the dataset using the wiki)

  • Edit the dataset column schema

  • Mint a DOI

These actions are described below or linked to other help articles.

Create a stable version

A dataset can exist as a draft or stable version. 

A draft dataset is mutable, meaning that it can be edited. A stable version is a snapshot of the dataset at the moment the version was created. Each stable version is identified by the dataset's synID with the version number appended. For example, syn123456.2 is version two of syn123456.
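When handling these identifiers in a script, it can help to split a versioned synID into its two parts. The following is a minimal sketch, assuming the identifier always has the form `syn<digits>` optionally followed by `.<version>` as in the example above; `parse_syn_id` is a hypothetical helper, not part of any Synapse client.

```python
import re

# "syn" + digits, optionally followed by "." + version digits.
_SYN_ID = re.compile(r"syn(\d+)(?:\.(\d+))?")


def parse_syn_id(text):
    """Return (synID, version) where version is None if no suffix is present."""
    match = _SYN_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a synID: {text!r}")
    syn_id = f"syn{match.group(1)}"
    version = int(match.group(2)) if match.group(2) else None
    return syn_id, version


print(parse_syn_id("syn123456.2"))  # ('syn123456', 2)
print(parse_syn_id("syn123456"))    # ('syn123456', None)
```

An identifier without a version suffix returns `None` for the version, leaving it to the caller to decide what an unversioned reference should mean.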

Only stable versions should be shared with others, or included in downstream resources, as only stable versions are immutable (static and uneditable). If a file is deleted from Synapse, its metadata will still be visible in any stable dataset version that included that file. However, a user who clicks on that file in the dataset version will find that it no longer exists: the deleted file may still be listed, but it no longer physically exists.

It is important to note that the wiki, sharing settings, and annotations remain the same between the draft dataset and the stable version.

Here’s how to create a stable version:

  1. Click Dataset Tools and select Create a Stable Dataset Version from the dropdown menu

  2. In the Create Stable Version window, add an appropriate label for the version, and a comment if necessary. Note that you do not need to add a version number, since it is already added for you

You will now see the new version, as well as the full version history. From here, you can go back to your draft.

Edit sharing settings

In the dataset, click on Dataset Tools, and select Dataset Sharing Settings from the dropdown menu. This will show you the current sharing settings of the dataset. Note that sharing settings of the dataset will be inherited from any parent projects or folders. If you want to have different settings on a specific file, you can create local sharing settings and then modify them. See this article for more information.

...

If your dataset is included in a view, you may wish to customize how you or others are able to query your dataset. If this is the case, you can add annotations so that you and other users can query for this dataset on custom keys. Another reason for adding annotations is to make your dataset findable using the search tool in Synapse.

To add annotations to a dataset:

...

  1. In the dataset, click Dataset Tools, and select Show Dataset Schema from the dropdown menu

  2. Click Edit Schema at the bottom of the table

  3. In the resulting Edit Columns window, you can add columns to your dataset schema using any combination of these three options:

    • Click Add Column to manually add individual columns one by one. If these columns exist as annotations on one or more of the files in the Dataset, the values will be displayed in the Dataset. You cannot use a Dataset to bulk annotate files, so do not add columns that do not already exist as annotations, since this will not serve any purpose.

    • Click Add Default Dataset Columns to add the default columns used in Datasets

      • You can then customize this list by removing any of the default columns you don’t want to be included. To do so, click the checkbox next to any column(s), followed by the trash can icon at the top.

    • Click Import columns to import columns from another table in Synapse. Again, only columns which already exist as underlying annotations will be relevant.

  4. Once you have added all columns of interest, you can:

    • Use the arrows at the top to reorder the columns

    • Enter any values as needed in the Restrict Values column

    • Select/change any column facet using the Facet dropdown

  5. Click Save

...

Mint a DOI (digital object identifier)

...