
...

Although a dataset is similar to a file view /wiki/spaces/DOCS/pages/2011070739, it serves a different purpose. A file view's scope is set to projects or folders whose contents may keep changing, whereas a dataset contains the specific file versions that you choose when you set it up.

After you create a dataset, it exists as a draft version, which you can continue to edit as you wish. You can also create a stable (static) version that captures the dataset as it is at that point in time. A stable version cannot be changed. You can share this stable version with others, or link it to a publication by minting a DOI /wiki/spaces/DOCS/pages/1972405096.

Since you can create and share a dataset with files you do not own, it’s important to ensure that you follow the Synapse Terms and Conditions of Use.

When to use a dataset vs. a file view

As mentioned, a dataset is similar to a file view /wiki/spaces/DOCS/pages/2011070739 in that both group a specific set of files together. However, there are distinct differences between the two, and these may determine when you use one over the other. The table below summarizes these differences.

	Datasets	File Views
Underlying object type	Dataset	EntityView
Method for adding files	Select specific versions of individual files	Select projects and folders, including the latest version of all contained files, tables, or datasets
View and query file annotations	✔	✔
Edit file annotations in bulk		✔
Versioning/snapshot functionality	✔	✔
Annotation functionality	✔
DOI functionality	✔	✔
Limit on number of files	10,000	up to 350,000,000 with appropriate project/folder structure (see View Limits)
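The "Method for adding files" row is the key practical difference. The minimal Python sketch below illustrates it. A file view is modeled as a scope that resolves to the latest version of each file whenever it is queried. A dataset is modeled as a fixed list of (file ID, version) pairs. The `syn...` IDs, the `latest_versions` dictionary, and both helper functions are hypothetical and are not Synapse client calls. Only the 10,000-file limit comes from the table above.

```python
# Hypothetical illustration only: neither function is part of the Synapse client.
DATASET_FILE_LIMIT = 10_000  # per the comparison table

def resolve_file_view(scope_files, latest_versions):
    """A file view shows the current latest version of each file in its scope.

    scope_files stands in for the files currently inside the scoped projects/folders.
    """
    return {file_id: latest_versions[file_id] for file_id in scope_files}

def resolve_dataset(items):
    """A dataset shows exactly the (file ID, version) pairs chosen when it was set up."""
    if len(items) > DATASET_FILE_LIMIT:
        raise ValueError(f"a dataset can hold at most {DATASET_FILE_LIMIT} files")
    return dict(items)

latest_versions = {"syn111": 2, "syn222": 5}
dataset_items = [("syn111", 1), ("syn222", 5)]

print(resolve_file_view(["syn111", "syn222"], latest_versions))  # {'syn111': 2, 'syn222': 5}
print(resolve_dataset(dataset_items))                             # {'syn111': 1, 'syn222': 5}

# A new upload to syn222 changes what the file view shows, but not the dataset.
latest_versions["syn222"] = 6
print(resolve_file_view(["syn111", "syn222"], latest_versions))  # {'syn111': 2, 'syn222': 6}
print(resolve_dataset(dataset_items))                             # {'syn111': 1, 'syn222': 5}
```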

How to create a dataset

Create a dataset

Navigate to the project that you want to create the dataset in
Click the Datasets tab
Click Add Dataset
In the Create Dataset window, enter a name for the dataset
Click Finish
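If you prefer to script this step, you should in principle be able to reach the same result through the Synapse REST API by creating a new entity. The sketch below only builds a JSON request body and does not send anything. It assumes that a dataset is created by posting an entity whose concrete type is `org.sagebionetworks.repo.model.table.Dataset`, with `name` and `parentId` fields. Treat these field names as assumptions to check against the Synapse REST API reference. The helper function is hypothetical, and `syn12345` is a placeholder project ID.

```python
import json

# Assumed concrete type for a Synapse dataset entity; verify against the REST API docs.
DATASET_CONCRETE_TYPE = "org.sagebionetworks.repo.model.table.Dataset"

def build_create_dataset_body(name: str, project_id: str) -> dict:
    """Hypothetical helper: build a request body for an empty draft dataset.

    Mirrors the UI steps: choose the parent project, then enter a name.
    No items are included, so the new dataset starts out empty.
    """
    if not name.strip():
        raise ValueError("a dataset needs a name")
    if not project_id.startswith("syn"):
        raise ValueError("expected a Synapse ID such as 'syn12345'")
    return {
        "concreteType": DATASET_CONCRETE_TYPE,
        "name": name,
        "parentId": project_id,
    }

body = build_create_dataset_body("My Dataset", "syn12345")  # placeholder project ID
print(json.dumps(body, indent=2))
```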

You will now be directed to the new dataset that you just created (it will be empty at this point).

Note
Notice the banner indicating that this is a draft version of the dataset. A draft dataset should not be distributed externally until it is finalized by creating a stable version. See the section Create a Stable Version below for more information.

Add files to the dataset

In the dataset, click Add Items
In the Add Files to Dataset window, browse for the file(s) you want to add
1. Click on the name of a project to see all folders, files, and tables contained within that project. Note that only files can be selected and added to the dataset.
2. To see the contents of an individual folder, click the dropdown arrow next to a project name, or next to a folder, to reveal all of its contents. This allows you to select individual files contained within it.
3. You can also search for individual files using the Search for Files tool (Note that you cannot use this tool to search for folders or projects, only individual files)
Click the checkbox next to any of the file(s) that you want to add. If you want to add all files from within a folder, you can click the general checkbox at the top of the list to add all contents. You can also select which version of the file you want to appear in your dataset.

Find more information at /wiki/spaces/DOCS/pages/2667675758.

Your selections will appear in the Selected box at the bottom. They will remain in this “selected” status, even as you navigate through other folders and files. You can remove individual selections from here if necessary.
Once you have selected all of the files you want, click Add Files. All files from your Selected box will be added to the dataset. At this point, before saving the dataset, you can still add or remove files or change the version of any file (see screenshot below).
Click Save to save your current selection and return to the draft dataset
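The selection rules in the steps above can be summarized in a small Python model. Only files can be selected. The Selected box keeps its contents while you browse. Each file is added at a version you choose. Before saving, you can still remove items or change versions. `DraftSelection`, its methods, and the node dictionaries are hypothetical and only illustrate the workflow. The sketch also assumes that a dataset holds at most one version of any given file, so choosing a new version replaces the old one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class DraftSelection:
    """Hypothetical model of the Add Items workflow for a draft dataset."""
    selected: Dict[str, int] = field(default_factory=dict)  # the "Selected" box: file ID -> version
    items: Dict[str, int] = field(default_factory=dict)     # dataset contents: file ID -> version

    def select(self, node: dict, version: Optional[int] = None) -> None:
        # Only files can be selected; projects, folders, and tables are skipped.
        if node["type"] != "file":
            return
        # Default to the latest version unless a specific version was chosen.
        self.selected[node["id"]] = version if version is not None else node["latest_version"]

    def select_all(self, folder_children: List[dict]) -> None:
        # The checkbox at the top of a folder listing selects every file in it.
        for child in folder_children:
            self.select(child)

    def add_files(self) -> None:
        # "Add Files" moves everything from the Selected box into the dataset.
        # Assumption: one version per file, so re-adding a file replaces its version.
        self.items.update(self.selected)
        self.selected.clear()

    def remove(self, file_id: str) -> None:
        self.items.pop(file_id, None)

    def change_version(self, file_id: str, version: int) -> None:
        if file_id in self.items:
            self.items[file_id] = version

folder = [
    {"id": "syn111", "type": "file", "latest_version": 3},
    {"id": "syn222", "type": "folder"},
    {"id": "syn333", "type": "file", "latest_version": 1},
]
draft = DraftSelection()
draft.select_all(folder)           # selects syn111 (v3) and syn333 (v1); the folder is skipped
draft.add_files()
draft.change_version("syn111", 2)  # pin an earlier version before saving
draft.remove("syn333")
print(draft.items)                 # {'syn111': 2}
```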

In the screenshot below, notice that there are several actions you can take after adding new files, before saving this version. You can use the checkmarks to select any of the files and remove them, or use the Version dropdowns to change the selected version of any file. You can also add more files. You can continue to edit the draft dataset after saving, but a stable version reflects only the selection the dataset contained when that stable version was created. See the section Create a Stable Version below for more information.

...

How to use a dataset

Once you have created a draft dataset, you can work with it in several ways, much as you would with other Synapse features. These include:

...

Here’s how to create a stable version:

Click Dataset Tools and select Create a Stable Dataset Version from the dropdown menu
In the Create Stable Version window, add an appropriate label for the version, and a comment if necessary. You do not need to add a version number, since it is assigned automatically
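The relationship between the draft and its stable versions can be sketched as follows. Creating a stable version freezes a read-only copy of the draft's current items, assigns the next version number automatically, and records the label and optional comment. The draft itself stays editable. The `VersionedDataset` and `StableVersion` classes are hypothetical. The numbering scheme, which starts at 1 and increments, is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional

@dataclass(frozen=True)
class StableVersion:
    number: int
    label: str
    comment: Optional[str]
    items: Mapping[str, int]  # read-only snapshot: file ID -> version

@dataclass
class VersionedDataset:
    draft_items: Dict[str, int] = field(default_factory=dict)
    history: List[StableVersion] = field(default_factory=list)

    def create_stable_version(self, label: str, comment: Optional[str] = None) -> StableVersion:
        # The version number is assigned automatically; callers never supply one.
        number = len(self.history) + 1
        frozen_items = MappingProxyType(dict(self.draft_items))  # copy, then expose read-only
        version = StableVersion(number, label, comment, frozen_items)
        self.history.append(version)
        return version

ds = VersionedDataset(draft_items={"syn111": 2})
v1 = ds.create_stable_version(label="Initial release", comment="Files used in the paper")
ds.draft_items["syn333"] = 1      # the draft can still change...
print(v1.number, dict(v1.items))  # 1 {'syn111': 2}  ...but v1 does not
```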

You will now see the new version, as well as the full version history. From here, you can go back to your draft.

...

In the dataset, click Dataset Tools and select Dataset Sharing Settings from the dropdown menu. This shows the current sharing settings of the dataset. By default, the dataset inherits its sharing settings from its parent project or folder. If you want the dataset to have different settings, you can create local sharing settings and then modify them.

For more information on sharing settings, see /wiki/spaces/DOCS/pages/2024276030.
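Inheritance of sharing settings works by walking up the hierarchy. An entity without local sharing settings uses the settings of its nearest ancestor that has them. The sketch below models that lookup with hypothetical `parents` and `local_acl` dictionaries. It is a simplified illustration, not how Synapse stores permissions, and the permission labels are example values.

```python
from typing import Dict, Optional

def effective_sharing(entity_id: str,
                      parents: Dict[str, Optional[str]],
                      local_acl: Dict[str, dict]) -> dict:
    """Return the sharing settings that apply to entity_id (hypothetical model).

    Checks the entity itself first, then its parent folders and project,
    and returns the first local sharing settings found.
    """
    current: Optional[str] = entity_id
    while current is not None:
        if current in local_acl:
            return local_acl[current]
        current = parents.get(current)
    raise LookupError(f"no sharing settings found above {entity_id}")

parents = {"syn_dataset": "syn_project", "syn_project": None}
local_acl = {"syn_project": {"team-A": "Can edit"}}

print(effective_sharing("syn_dataset", parents, local_acl))  # inherited from the project

# Creating local sharing settings on the dataset overrides what it inherits.
local_acl["syn_dataset"] = {"team-A": "Can download", "public": "Can view"}
print(effective_sharing("syn_dataset", parents, local_acl))
```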

Annotate the dataset with metadata

...

To add annotations to a dataset:

Once in the dataset, click Dataset Tools and select Annotations from the dropdown list
In the My Dataset window, click Edit
As the on-screen instructions state, click the Add icon to begin adding annotations
Complete your annotations using the fields provided
- Use the + button to the right of each row to add a new value for any Key
- Use the x button to the right of each row to delete that row
- Click the Add icon to add another Key
Click Save
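Dataset annotations behave like a set of keys, each holding one or more values. The Add icon, the + button, and the x button in the annotations editor manipulate that structure. The dictionary-based sketch below is a hypothetical model of those three actions, not the Synapse client's annotation API. For simplicity, all values are strings, and the keys and values are placeholders.

```python
from typing import Dict, List

Annotations = Dict[str, List[str]]

def add_key(annotations: Annotations, key: str, value: str) -> None:
    """The Add icon: start a key with its first value."""
    annotations.setdefault(key, []).append(value)

def add_value(annotations: Annotations, key: str, value: str) -> None:
    """The + button: add another value to an existing key."""
    if key not in annotations:
        raise KeyError(f"add the key {key!r} first")
    annotations[key].append(value)

def delete_row(annotations: Annotations, key: str, index: int) -> None:
    """The x button: delete one value row; drop the key when no values remain."""
    del annotations[key][index]
    if not annotations[key]:
        del annotations[key]

notes: Annotations = {}
add_key(notes, "assay", "rnaSeq")
add_value(notes, "assay", "wholeGenomeSeq")
add_key(notes, "species", "Human")
delete_row(notes, "species", 0)
print(notes)  # {'assay': ['rnaSeq', 'wholeGenomeSeq']}
```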

Edit the dataset schema

You can customize the visible columns of a dataset. These columns will be auto-populated based on the annotation values of the underlying files in the dataset. Here’s how to customize these columns:

In the dataset, click Dataset Tools, and select Show Dataset Schema from the dropdown menu
Click Edit Schema at the bottom of the table
In the resulting Edit Columns window, you can add columns to your dataset schema using any combination of these three options:
- Click Add Column to manually add individual columns one by one. If these columns exist as annotations on one or more of the files in the Dataset, their values will be displayed in the Dataset. You cannot use a Dataset to bulk annotate files, so adding a column that does not already exist as an annotation serves no purpose.
- Click Add Default Dataset Columns to add the default columns used in Datasets
  - You can then customize this list by removing any default columns you don't want to include. To do so, click the checkbox next to the column(s), then click the trash can icon at the top.
- Click Import columns to import columns from another table in Synapse. Again, only columns which already exist as underlying annotations will be relevant.
Once you have added all columns of interest, you can:
- Use the arrows at the top to reorder the columns
- Enter any values as needed in the Restrict Values column
- Select/change any column facet using the Facet dropdown
Click Save
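The way schema columns are filled from file annotations can be sketched as a simple read-only projection. Each row is one file in the dataset. Each column shows that file's annotation value for the column's name, or nothing if the file lacks that annotation. Because the projection only reads annotations, adding a column never writes values back to the files. This matches the point above that a dataset cannot be used to bulk annotate. The function, column names, and annotation values below are hypothetical.

```python
from typing import Dict, List, Optional

def build_rows(columns: List[str],
               file_annotations: Dict[str, Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Project each file's annotations onto the chosen columns (read-only)."""
    rows = []
    for file_id, annotations in file_annotations.items():
        row: Dict[str, Optional[str]] = {"id": file_id}
        for column in columns:
            row[column] = annotations.get(column)  # None when the file lacks the annotation
        rows.append(row)
    return rows

files = {
    "syn111": {"assay": "rnaSeq", "species": "Human"},
    "syn333": {"assay": "ATACSeq"},
}
for row in build_rows(["assay", "species", "tissue"], files):
    print(row)
# {'id': 'syn111', 'assay': 'rnaSeq', 'species': 'Human', 'tissue': None}
# {'id': 'syn333', 'assay': 'ATACSeq', 'species': None, 'tissue': None}
```

The `tissue` column stays empty for every file because no file carries that annotation.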

Mint a DOI (digital object identifier)

You can mint a DOI (Digital Object Identifier) to create a permanent link to the dataset.

Find more information and instructions at Digital Object Identifiers (DOIs).

Version	Old Version 13	New Version Current
Changes made by	alex.knoll (Unlicensed)	Kevin Boske
Saved on	Jul 19, 2022	Feb 16, 2024