
Bulk File/Table Download via Web Client REST API

For use cases see: Bulk File/Table Download via Web Client

Introduction

The bulk file download via the web client is a new feature that will allow users to select files, review and refine the file selection, and then download all of the files as a single zip file.  The workflow consists of five basic phases: file selection, selection review, download, download order review, and download history.  While a typical workflow might involve the user moving through all of the phases in order, the user is free to move between phases at will.

File selection

The user's goal for this phase will simply be to select files they wish to download from various sources in Synapse. This phase is similar to the product selection phase of an online shopping experience. In this phase the user will be able to select files from the following sources:

  • Add all files from a folder.  Note: This operation is not recursive, so files within sub-folders will not be added.
  • Add a single file from a folder.
  • Add all of the files listed in a view query.  We do not plan to support adding individual files from a view; instead, the user is expected to refine their query to sub-select files from views.  Note: This operation will add all files from the view query, not just the single page of files shown in the UI.

Selection review

The user's goal for this phase is to review and refine the files they selected before they start the actual download.  This phase is similar to reviewing a shopping cart/basket in an online shopping experience.  All of the files the user selected in the file selection phase will be consolidated in the user's private download list.  The download list will include the following information about each file:

  • Link to the original file
  • File size
  • File availability:
    • Are there any unmet access restrictions on the file?
    • Is the file type available for bulk download?  For example, external file links and SFTP files will not be available for bulk download.

For each file in the download list the user will have the option to perform the following actions:

  • Request access for unmet access restrictions
  • Remove the file from the list.

The user will have the option to perform the following actions on the entire download list:

  • Clear the list
  • Download the list (transition to download phase)

Download

The user's goal for this phase will be to proceed with the actual file download.  This phase is similar to the checkout phase of an online shopping experience.  In this phase the user will provide a name for their zip file and will be presented with the sub-set of files that will actually be included in the download (unavailable files will be excluded).  When the user chooses to proceed with the download, the download order will be started (see Figure 1).

Figure 1. State transition from a download list to a final file download.

Download transaction

When the user orders the download of their download list, a download transaction will be started.  The download transaction consists of the following operations:

  1. User's download list will be locked.
  2. All of the available files will be moved from the user's download list to a newly created download order.  Unavailable files will remain on the user's download list.
  3. The lock on the user's download list will be released.

If any errors occur during this transaction all changes will be rolled back, restoring their download list to its starting state.  The user will be blocked from making changes to their download list during the execution of the download transaction.  Upon success, the user will be transitioned to the download order review phase to review the newly created download order and ultimately download the file.
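
The transaction steps above can be sketched in a few lines.  This is only an illustration under stated assumptions: the in-memory DownloadList class, the is_available callback, and the dictionary used for the order are hypothetical stand-ins, not the service's actual implementation, and a real service would use a database transaction rather than a thread lock.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class DownloadList:
    """Hypothetical in-memory stand-in for a user's download list."""
    files: list = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock)


def create_download_order(download_list, is_available, zip_file_name):
    """Move available files into a new order; restore the list on any error."""
    with download_list.lock:                  # step 1: lock (released on exit, step 3)
        snapshot = list(download_list.files)  # kept so the list can be rolled back
        try:
            available, unavailable = [], []
            for f in snapshot:
                (available if is_available(f) else unavailable).append(f)
            order = {"zipFileName": zip_file_name, "files": available}
            download_list.files = unavailable  # step 2: unavailable files stay behind
            return order
        except Exception:
            download_list.files = snapshot     # roll back to the starting state
            raise
```

Because the lock is held for the whole block, any other caller that also takes the lock before editing the list is blocked until the order is created or the rollback completes.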

Download Order Review

The user's goal for the download order review phase is to review an existing download order.  This phase is similar to reviewing an existing order in an online shopping experience.  Figure 1 shows two start points: the first involves the creation of a new download order from the user's download list, and the second starts from an existing download order.  In either case the user will be able to select an existing download order to start the actual bulk file download job and ultimately download the file.  Download orders are immutable and stateless.  The user will be free to re-download any of their previous download orders from this phase.

Download History

The user's goal for the download history phase is to review previous download orders.  This phase is similar to reviewing previous orders in an online shopping experience.  In this phase the user will be able to see a listing of all of their previous orders in reverse chronological order.  When the user wishes to download a previous order they will transition back to the download order review phase.

Limitations

Managing file selection across a paginated list of results creates an awkward user experience.  Therefore, the entire download list must be presented to the user without pagination (scrollbars are allowed).  This means there must be a limit on the number of files allowed in the download list.  The download list must be small enough to be fetched as a single web-service request.  A download list will have a limit of 100 files.

There is also a limit to how long a user will wait for their file download to be created.  Currently, it can take 10 minutes to prepare a 2 GB zip file for download.  A download list will have a maximum size of 2 GB (the sum of the unzipped file sizes must be less than 2 GB).

File sizes

One of the implied requirements from Ljubomir's design is the availability of the total size of all files in both view query results and folder navigation.  The file sizes will be used to estimate the download time based on the user's current network speeds.  The sizes will also be used to help the user keep their download list under the maximum size.
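
The download-time estimate mentioned above is simple arithmetic.  A minimal sketch, assuming the total size is given in megabytes (10^6 bytes) and the measured network speed in megabits per second; the function name and units are assumptions made for illustration and are not part of the design:

```python
def estimate_download_seconds(total_size_mb, speed_mbit_per_s):
    """Estimate transfer time: megabytes * 8 bits per byte / megabits per second."""
    if speed_mbit_per_s <= 0:
        raise ValueError("network speed must be positive")
    return total_size_mb * 8 / speed_mbit_per_s
```

For example, a 2000 MB download list on a 100 Mbit/s connection comes to 2000 * 8 / 100 = 160 seconds, ignoring the time needed to build the zip file.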

View Query Results

Since the user will only have the option to add all of the files from a given view result, and not just the currently shown page, the file size results will need to include the size of all files for a given query.  This is similar to the query count already available in table query results.  The proposal is to add a new mask called 'fileSizes', with a value of 0x20, to the existing QueryBundleRequest.partMask.  When the 'fileSizes' mask is included, the resulting QueryResultBundle will include a numeric value called 'sizeOfAllFilesMB'.
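
To illustrate how the proposed mask bit (0x20) would gate the new field, here is a minimal sketch.  Only the value 0x20 and the field name 'sizeOfAllFilesMB' come from the proposal; the function names and the dictionary standing in for QueryResultBundle are hypothetical.

```python
FILE_SIZES_MASK = 0x20  # value proposed in the text


def includes_file_sizes(part_mask):
    """True when the caller set the file-size bit in the part mask."""
    return bool(part_mask & FILE_SIZES_MASK)


def build_bundle(part_mask, file_sizes_mb):
    """Return a dict standing in for QueryResultBundle (hypothetical shape)."""
    bundle = {}
    if includes_file_sizes(part_mask):
        # Sum over every file matched by the query, not just the current page.
        bundle["sizeOfAllFilesMB"] = sum(file_sizes_mb)
    return bundle
```

A caller would OR the bit into whatever other parts it already requests, for example `part_mask = existing_mask | 0x20`.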

Folder Navigation

Unlike view query results, users will have the option to add one file at a time from the folder navigation.  This implies that we will need to show the size of each individual file in the folder navigation.  We should be able to use the existing POST /fileHandle/batch to get the file handles for a single page of files shown in the folder navigation.

To support adding all of the files in a folder (non-recursive) (use case 1a) we will need to return the total number of files in a folder and the total size of all files in the folder from POST /entity/children.  The proposal is to add a 'partMask' (similar to QueryBundleRequest) with 0x01=count and 0x02=totalFileSizeMB.
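
The folder totals could be computed along the following lines.  This is a sketch under assumptions: the mask values 0x01 and 0x02 come from the proposal, while the shape of each child record (a dictionary with 'type' and 'sizeMB' keys) and the function name are invented for illustration.

```python
COUNT_MASK = 0x01            # proposed: include the file count
TOTAL_FILE_SIZE_MASK = 0x02  # proposed: include totalFileSizeMB


def summarize_children(children, part_mask):
    """Summarize the direct children of one folder (non-recursive)."""
    # Only files directly in the folder are counted; sub-folders are skipped
    # rather than descended into.
    files = [c for c in children if c["type"] == "file"]
    summary = {}
    if part_mask & COUNT_MASK:
        summary["count"] = len(files)
    if part_mask & TOTAL_FILE_SIZE_MASK:
        summary["totalFileSizeMB"] = sum(c["sizeMB"] for c in files)
    return summary
```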

REST APIs

DownloadList

  • List<FileHandleAssociation> filesToDownload
  • Date updatedOn

DownloadOrder

  • String orderId
  • List<FileHandleAssociation> files
  • Date createdOn
  • String createdBy
  • String zipFileName
  • Long totalSizeMB

DownloadOrderSummary

  • String orderId
  • Date createdOn
  • String zipFileName
  • Long numberOfFiles
  • Long totalSizeMB

<<Interface>> AddRequest

AddFolderRequest implements AddRequest

  • String parentId

AddQueryRequest implements AddRequest

  • Query query

DownloadOrderRequest

  • List<FileHandleAssociation> subSet
  • String zipFileName


Phase | Description | Response | Path | Request
--- | --- | --- | --- | ---
File Selection | Start an asynchronous job to add all of the files from either a folder or a view query to the user's DownloadList. | AsyncJobId | POST /download/list/<userid>/add/async/start | AddRequest
File Selection | Get the results of an asynchronous job to add files to a user's download list. | DownloadList | GET /download/list/<userid>/add/async/get/<jobid> |
File Selection | Add a single file to a user's download list. | DownloadList | POST /download/list/<userid>/add | FileHandleAssociation
Selection Review | Get a user's download list. | DownloadList | GET /download/list/<userid> |
Selection Review | Remove a list of files from a user's download list. | | POST /download/list/<userid>/remove | List<FileHandleAssociation>
Selection Review | Clear a user's download list. | | DELETE /download/list/<userid> |
Download | Create a DownloadOrder from the user's current download list. | DownloadOrder | PUT /download/list/<userid>/order | DownloadOrderRequest
Download Order Review | Get a DownloadOrder given its ID. | DownloadOrder | GET /download/order/<orderId> |
Download Order Review | Start an asynchronous job to download a download order. | AsyncJobId | POST /download/order/<orderId>/async/start |
Download Order Review | Get the results of an asynchronous job to download a download order. | BulkFileDownloadResponse | POST /download/order/<orderId>/async/get/<jobId> |
Download History | Get a user's previous download order history in reverse chronological order. | Paginated<DownloadOrderSummary> | GET /download/order/<userId>/history |
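
The asynchronous start/get pair for adding files could be driven from a client roughly as follows.  This is a sketch only: the two paths come from the listing above, but the injected send(method, path, body) callable, the assumption that the start call returns a plain job ID, and the convention that an unfinished job returns None are all hypothetical choices made to keep the example self-contained.

```python
import time


def add_to_download_list(send, user_id, add_request, poll_seconds=1.0, max_polls=60):
    """Start the add job, then poll until the updated download list is returned."""
    base = f"/download/list/{user_id}/add/async"
    job_id = send("POST", f"{base}/start", add_request)
    for _ in range(max_polls):
        download_list = send("GET", f"{base}/get/{job_id}", None)
        if download_list is not None:   # assumption: None means "not ready yet"
            return download_list
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Injecting the transport keeps the sketch independent of any particular HTTP library; a real client would also need authentication and error handling.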

Review Notes

  1. Do not use file size to restrict the addition of a file to the list.  Allow the user's download list to be larger than the size limit.  Instead, block the download of a list if it is over the size limit.
  2. A download order should include a sub-list so the user can choose to download a sub-set of their list.  This is part of allowing the user's download list to be over the maximum size.
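
Taken together, the two notes suggest that the size check moves from "add to list" time to "create order" time and applies only to the chosen sub-set.  A minimal sketch of that check, assuming files are identified by hashable IDs, sizes are in bytes, and the 2 GB limit is read as binary gigabytes; the function name and the size_of callback are hypothetical:

```python
MAX_ORDER_BYTES = 2 * 1024**3  # 2 GB limit from the text, assuming binary gigabytes


def validate_order_subset(download_list, sub_set, size_of):
    """Return the sub-set's total size in bytes, or raise if it cannot be ordered."""
    listed = set(download_list)
    missing = [f for f in sub_set if f not in listed]
    if missing:
        raise ValueError(f"not on the download list: {missing}")
    total = sum(size_of(f) for f in sub_set)
    if total >= MAX_ORDER_BYTES:  # the text requires the sum to be less than 2 GB
        raise ValueError(f"sub-set is {total} bytes; limit is {MAX_ORDER_BYTES}")
    return total
```

With this shape, a download list holding more than 2 GB of files is still valid; only an order whose sub-set reaches the limit is rejected.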