
Download List for all Clients - API Design

Introduction

The original download list feature was designed solely for Synapse web client users. The assumption was that a web client user would only want to download 100 or fewer files as a packaged zip file. The original feature API design document can be found here: Bulk File/Table Download via Web Client REST API.

The original feature has a few drawbacks that limit its usefulness:

  • While the files in the downloaded package are named, it would be extremely difficult, if not impossible, to tie the downloaded files back to their source. This means none of the metadata associated with each file is available to the user after download. Without the file’s metadata its usefulness is limited. See: 1-Pager Annotations Download

  • Many datasets have thousands of files that a user might wish to download, so the 100 file limit is overly restrictive. This is especially true for our programmatic client users that are capable of managing vast numbers of files.

With the new download list, we want to overcome these limitations, but we also want to provide a more consistent experience for all of our users, regardless of the client used.

The Figma design for this project can be found at: Data List Download II.

Note: The statistics on the current download list feature suggest that it is fairly popular with our users. See: Statistics on current download list feature. While it should be safe to add new features, we should probably avoid removing any of the existing features.

Stages

It can be useful to divide the download process into at least two stages:

  • File Selection - In this stage a data consumer is engaged in navigation and discovery of potential files to download. This stage is similar to the product selection stage of an on-line shopping experience, where items are added to a cart.

  • Download List Management - The data consumer enters this stage when they are ready to actually download the files on their download list. This stage is similar to the “checkout” stage of an on-line shopping experience.

File Selection

The first stage in the file download process is file selection. In this stage, the data consumer uses faceted navigation of one or more Synapse File Views to find the files they are interested in downloading. Users can also add files from a folder, or individual files via the file explorer. Once a user finds one or more files they wish to download, they can add them to their download list, and then continue looking for more files. Note: The user is free to add files that they cannot yet download to their list.

We currently have no plans to add any new API features or changes to the file selection stage.

Download List Management

The goal of this stage is to download and clear the files from the user’s download list. There can be two distinct categories of files on the user’s download list:

  • Available for Download - Files in this category have no access restrictions, or access restrictions that the user has already met. There is an ongoing debate about whether FileEntities with ExternalFileHandles should be in this category.

  • Unavailable for Download - Files in this category have access restrictions that the user has not met.

While the user is free to remove files from their list or even clear their entire list, their main goal is to download the files on their list. Once a file is successfully downloaded, it should automatically be removed from the download list.

If there are only a few files available for download on the user’s list, then downloading each file individually through the UI will be an option. If there is less than 2 GB of total data, then the user will have the option of initiating the creation of a “package” zip file from the UI. However, for cases where there are many files on their list, we want to guide the user to use one of the programmatic clients to perform the actual download. The number of files that can be managed by a programmatic client is bounded only by available disk space.
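The choice between these download paths can be sketched as a small decision function. This is only an illustration of the rule described above: the "few files" threshold is a hypothetical value (the text does not give a number), and reading 2 GB as 2 * 1024^3 bytes is also an assumption.

```python
from enum import Enum

# Assumption: "2 GB" is read as 2 * 1024**3 bytes.
PACKAGE_LIMIT_BYTES = 2 * 1024**3
# Hypothetical threshold: the text only says "a few files".
FEW_FILES_THRESHOLD = 10


class DownloadOption(Enum):
    INDIVIDUAL = "download each file through the UI"
    PACKAGE = "request a packaged zip file from the UI"
    PROGRAMMATIC = "use a programmatic client"


def available_options(file_count: int, total_bytes: int) -> list:
    """Return the download paths that apply to the available files."""
    options = []
    if file_count == 0:
        return options
    if file_count <= FEW_FILES_THRESHOLD:
        options.append(DownloadOption.INDIVIDUAL)
    if total_bytes < PACKAGE_LIMIT_BYTES:
        options.append(DownloadOption.PACKAGE)
    # A programmatic client is bounded only by available disk space.
    options.append(DownloadOption.PROGRAMMATIC)
    return options
```

For a list with thousands of files and tens of gigabytes of data, only the programmatic option remains, which matches the guidance above.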

Proposed API

Deprecate Existing APIs

With the previous download list limit of 100 files, all items on a user’s download list would fit in client-side memory. This allowed us to create a very simple API. Most of the API calls returned the full download list. Since the amount of data on the list was small, the UI was able to handle most of the complexity. For example, the UI would group files on the list into categories such as files available for download, files with access restrictions, and files that are external.

In order to remove the limit, all calls that examine or manipulate the download list must be paginated. All grouping, sorting, and statistics must now be provided by the API. Workers that add files to the download list from view queries must be converted to stream over the potentially unbounded results.

Therefore, we are proposing to completely deprecate, and eventually remove, all of the existing download list API calls. The existing API calls are the last nine methods on: File Services.

Download List Management

The driving use case from the design involves allowing users to add files from a file view. This includes adding individual files, or all files based on the current filters applied to the view.

 

APIs

Response: AsynchJobId
URL: POST /user/<user_id>/download/list/add/async/start
Request: AddResultsToDownloadListRequest
Description: Start an asynchronous job to add files from the given View query or folder to the user’s download list. In all cases the current version of the file entity will be explicitly added to the user’s download list. When adding files from a folder, only the direct children of the folder will be added. It should be possible to add a ‘recursive’ option in the future.

Response: AddResultsToDownloadListResponse
URL: GET /user/<user_id>/download/list/add/async/get/<job_id>
Request: AsynchJobId
Description: Get the results of a job to add query results to the user’s download list.

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "org.sagebionetworks-AddResultsToDownloadListRequest",
  "description": "Start an asynchronous job to add files from the given View query or folder to the user’s download list.",
  "allOf": [
    { "$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody" }
  ],
  "properties": {
    "query": {
      "description": "Results from this view query will be added to the user's download list. This parameter should be excluded when adding files from a folder.",
      "$ref": "org.sagebionetworks.repo.model.table.Query"
    },
    "folderId": {
      "description": "The synID of a folder to add all of the children from the folder to the user's download list. This parameter should be excluded when adding files from a query.",
      "type": "string"
    },
    "useVersionNumber": {
      "description": "When true (default), the version number will be included for each file added to the user's download list. When set to false, the version number will be excluded, indicating that the 'current' version should always be downloaded.",
      "type": "boolean"
    }
  }
}
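The request schema treats "query" and "folderId" as mutually exclusive. A minimal sketch of a client-side helper that enforces this before sending the request is shown below. The helper name is hypothetical, and carrying the SQL in a "sql" field of the Query object is an assumption, since the Query model is only referenced here and not defined.

```python
from typing import Optional


def build_add_results_request(
    sql: Optional[str] = None,
    folder_id: Optional[str] = None,
    use_version_number: bool = True,
) -> dict:
    """Build an AddResultsToDownloadListRequest body (hypothetical helper)."""
    # Exactly one source of files must be given: a view query or a folder.
    if (sql is None) == (folder_id is None):
        raise ValueError("Provide exactly one of 'sql' or 'folder_id'.")
    body = {"useVersionNumber": use_version_number}
    if sql is not None:
        # Assumption: the Query model carries the SQL string in a "sql" field.
        body["query"] = {"sql": sql}
    else:
        body["folderId"] = folder_id
    return body
```

Calling `build_add_results_request(folder_id="syn123")` would produce a body with "folderId" and "useVersionNumber" only, while passing both arguments (or neither) fails fast on the client.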

 

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "org.sagebionetworks-AddQueryResultsToDownloadListResponse",
  "description": "The results of a job to add the files from a query result or folder to the user's download list.",
  "allOf": [
    { "$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousResponseBody" }
  ],
  "properties": {
    "numberOfFilesAdded": {
      "description": "The number of files that were added to the user's download list.",
      "type": "integer"
    },
    "totalNumberOfFilesOnDownloadList": {
      "description": "The total number of files on the user's download list.",
      "type": "integer"
    }
  }
}
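The start/get pair above follows a start-then-poll pattern. The sketch below shows one way a client might drive it, under stated assumptions: the `start` and `get` callables are hypothetical wrappers around the POST and GET calls, and the convention that GET returns HTTP 202 while the job is still running and 200 when it is done is an assumption that this document does not specify.

```python
import time
from typing import Callable, Tuple


def run_async_job(
    start: Callable[[], str],
    get: Callable[[str], Tuple[int, dict]],
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> dict:
    """Start a job, then poll until its response body is ready.

    start: issues POST .../async/start and returns the job id.
    get:   issues GET .../async/get/<job_id> and returns (status, body).
    """
    job_id = start()
    for _ in range(max_polls):
        status, body = get(job_id)
        if status == 200:
            return body  # job finished; body is the response object
        if status != 202:
            # Assumption: anything other than 200/202 is a failure.
            raise RuntimeError(f"Job {job_id} failed with status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job {job_id} did not finish after {max_polls} polls")
```

The same helper would apply unchanged to the manifest and package jobs, since they use the same start/get URL shape.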

 

Response: AddBatchOfFilesToDownloadListResponse
URL: POST /user/<user_id>/download/list/add
Request: AddBatchOfFilesToDownloadListRequest
Description: A request to add a batch of files to the user’s download list. There is a limit of 1000 files per batch.

{
  "$schema": "http://json-schema.org/draft-07/schema",
  "$id": "org.sagebionetworks-IdAndVersion",
  "description": "",
  "properties": {
    "fileEntityId": {
      "description": "The 'syn' identifier of a Synapse FileEntity.",
      "type": "string"
    },
    "versionNumber": {
      "description": "Optional. When included, indicates a specific version number of a FileEntity. When excluded, this is a reference to the current version of the FileEntity.",
      "type": "integer"
    }
  }
}
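Because each add request is limited to 1000 files, a client with a longer list has to split it into batches. The following is a minimal sketch: it builds IdAndVersion objects as defined above and slices them into batches. The helper names are hypothetical, and the wrapper field that holds the batch inside AddBatchOfFilesToDownloadListRequest is not defined in this document, so it is left out.

```python
from typing import Iterator, List, Optional

MAX_BATCH_SIZE = 1000  # per-request limit stated in the table above


def id_and_version(file_entity_id: str, version_number: Optional[int] = None) -> dict:
    """Build an IdAndVersion object; omit versionNumber for 'current'."""
    item = {"fileEntityId": file_entity_id}
    if version_number is not None:
        item["versionNumber"] = version_number
    return item


def batches(items: List[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 2500 files this yields three batches (1000, 1000, and 500 items), each of which would be sent as one POST to /user/<user_id>/download/list/add.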

 

 

 

Response: RemoveBatchOfFilesFromDownloadListResponse
URL: POST /user/<user_id>/download/list/remove
Request: RemoveBatchOfFilesFromDownloadListRequest
Description: A request to remove a batch of files from the user’s download list. The client is expected to remove files from the download list after download using this call. There is a limit of 1000 files per batch.
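Since the client is expected to remove files after downloading them, a programmatic client might follow a download-then-remove loop like the sketch below. Both callables are hypothetical stand-ins: `download` fetches one file and raises OSError on failure, and `remove_batch` issues one remove request. Only successfully downloaded files are removed, so failed files stay on the list for a retry.

```python
from typing import Callable, List


def download_and_remove(
    files: List[str],
    download: Callable[[str], None],
    remove_batch: Callable[[List[str]], None],
    batch_size: int = 1000,
) -> List[str]:
    """Download each file, then remove the successes in batches.

    Returns the files that failed to download.
    """
    downloaded, failed = [], []
    for file_id in files:
        try:
            download(file_id)
            downloaded.append(file_id)
        except OSError:
            failed.append(file_id)  # keep it on the list for a later retry
    for start in range(0, len(downloaded), batch_size):
        remove_batch(downloaded[start:start + batch_size])
    return failed
```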

 

 

Response: (none specified)
URL: DELETE /user/<user_id>/download/list
Request: (none specified)
Description: Clear all files from a user’s download list.

Response: DownloadListStatistics
URL: GET /user/<user_id>/download/list/statistics
Request: (none specified)
Description: Get the statistics about the files on the user’s download list.

 

 

Response: DownloadListPageResponse
URL: GET /user/<user_id>/download/list
Request: DownloadListPageRequest
Description: Get a single page of files on the user’s download list. Note: This call will only return files that the user can download. There is a limit of 1000 files per page.
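A client that needs the whole list has to walk it one page at a time. The sketch below shows a generic page iterator under explicit assumptions: DownloadListPageRequest and DownloadListPageResponse are not defined in this document, so the "page" and "nextPageToken" field names are hypothetical, as is the `fetch_page` callable that wraps the HTTP call.

```python
from typing import Callable, Iterator, Optional


def iter_download_list(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every file on the list by following page tokens.

    fetch_page(token) returns one page response; token is None for the
    first page. Field names "page" and "nextPageToken" are assumptions.
    """
    token = None
    while True:
        response = fetch_page(token)
        yield from response.get("page", [])
        token = response.get("nextPageToken")
        if token is None:
            break  # no further pages
```

Because this is a generator, the client never holds more than one page (at most 1000 files) in memory, which is the point of paginating the new API.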

 

 

 

Response: GetActionRequiredResponse
URL: POST /user/<user_id>/download/list/action/required
Request: GetActionRequiredRequest
Description: Get a single page of results that summarizes actions that the user must take to download inaccessible files from their download list.

 

 

 

Response: AsynchJobId
URL: POST /user/<user_id>/download/list/manifest/async/start
Request: GetDownloadListManifestRequest
Description: Start an asynchronous job to generate a metadata file manifest of all of the files on the user’s download list. The manifest will be a sparse matrix CSV including all annotations and general file metadata.

Response: GetDownloadListManifestResponse
URL: GET /user/<user_id>/download/list/manifest/async/get/<job_id>
Request: AsynchJobId
Description: Get the results of a job to generate a manifest file for the files on the user’s download list.
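To illustrate what a "sparse matrix CSV" could look like, the sketch below builds one column per annotation key seen on any file and leaves a cell empty when a file lacks that annotation. The base column names ("id", "name", "versionNumber") and the input shape are hypothetical; the document does not define the manifest columns. Name collisions between annotation keys and base columns are not handled here.

```python
import csv
import io
from typing import List


def build_manifest(files: List[dict]) -> str:
    """Return a sparse CSV: fixed metadata columns plus one per annotation key."""
    base_columns = ["id", "name", "versionNumber"]  # hypothetical columns
    # The union of annotation keys across all files defines the extra columns.
    annotation_keys = sorted({k for f in files for k in f.get("annotations", {})})
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=base_columns + annotation_keys,
        restval="",  # files without a given annotation get an empty cell
        lineterminator="\n",
    )
    writer.writeheader()
    for f in files:
        row = {c: f.get(c, "") for c in base_columns}
        row.update(f.get("annotations", {}))
        writer.writerow(row)
    return out.getvalue()
```

A file annotated only with "species" and another annotated only with "assay" would share the header `id,name,versionNumber,assay,species`, each with one empty annotation cell.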

 

Response: AsynchJobId
URL: POST /user/<user_id>/download/list/package/async/start
Request: GetDownloadListPackageRequest
Description: Start an asynchronous job to add all files from the user’s download list to a packaged zip file. There is a two GB limit on the total size of the resulting zip file. This job will attempt to create the largest possible zip file while remaining under the limit. Files that are packaged will automatically be removed from the user’s download list. It might take multiple jobs to package all of the files on the user’s download list.

Response: GetDownloadListPackageResponse
URL: GET /user/<user_id>/download/list/package/async/get/<job_id>
Request: AsynchJobId
Description: Get the results of a job to generate a package of files from a user’s download list.
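The reason several package jobs may be needed can be illustrated with a simple greedy selection. This is only a sketch, not the proposed server implementation: it walks the list in order and adds each file that still fits, using uncompressed file sizes as a stand-in for the zip size and treating the limit as inclusive, both of which are assumptions.

```python
from typing import List, Tuple

File = Tuple[str, int]  # (file id, size in bytes)


def select_for_package(files: List[File], limit: int) -> Tuple[List[str], List[File]]:
    """Greedily pick files, in list order, whose total size fits the limit."""
    packaged, remaining, used = [], [], 0
    for file_id, size in files:
        if used + size <= limit:  # assumption: the limit is inclusive
            packaged.append(file_id)
            used += size
        else:
            remaining.append((file_id, size))
    return packaged, remaining


def count_package_jobs(files: List[File], limit: int) -> Tuple[int, List[File]]:
    """Simulate repeated package jobs; return (jobs run, files never packaged)."""
    jobs = 0
    while files:
        packaged, files = select_for_package(files, limit)
        if not packaged:
            break  # every remaining file is larger than the limit
        jobs += 1
    return jobs, files
```

Packaged files leave the list after each job, so the loop shrinks the list until it is empty or only files larger than the limit remain; those would need to be downloaded individually or with a programmatic client.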