...
- The process is relatively lengthy and requires non-trivial technical skills.
- The data is limited to a 6-month window; the current workaround is to store incremental updates in an external source (a CSV file on S3).
- The data cannot be easily integrated into the Synapse portal or other components (in some cases the files are manually annotated with the number of downloads).
- Access to the data is all-or-nothing: it is (for good reason) restricted to a specific subset of Synapse employees, so users of the Synapse platform cannot get this kind of data without asking a Synapse engineer.
An example of a usage report generated from the data warehouse using the Synapse Usage Report written by Kenny Daily:
Files and Downloads
Files in Synapse are referenced through an abstraction (FileHandle) that maintains the information about the link to the content of the file itself (e.g. an S3 bucket). A file handle is then referenced in many places (such as FileEntity and WikiPage, see FileHandleAssociateType) as a pointer to the actual file content. To download the content, the Synapse platform can generate a pre-signed URL (according to the location where the file is stored) that can be used to download the file directly. Note that the platform has no way to guarantee that the client actually uses the pre-signed URL to download the file. Every pre-signed URL request in the codebase comes down to a single method, getURLForFileHandle.
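The relationship described above can be illustrated with a minimal Python sketch. It is not the Synapse implementation: the class `FileHandleUrlService`, the signing scheme, the `example.invalid` host and the in-memory counter are all assumptions made for illustration. Only the names FileHandle, FileEntity, WikiPage and the idea of a single URL-generating method come from the text. The sketch shows why a single choke point is a natural place to record URL requests, and why such a record counts requested URLs rather than completed downloads.

```python
import hashlib
import hmac
import time
from collections import Counter
from dataclasses import dataclass
from urllib.parse import urlencode

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


@dataclass(frozen=True)
class FileHandle:
    """Pointer to the stored content (here: a bucket and a key)."""
    id: str
    bucket: str
    key: str


class FileHandleUrlService:
    """Hypothetical stand-in for the single place where URLs are issued."""

    def __init__(self, handles):
        self._handles = {h.id: h for h in handles}
        # Counts URL *requests* per (association type, association id).
        # Whether the client then uses the URL is not observable here.
        self.url_requests = Counter()

    def get_url_for_file_handle(self, handle_id, associate_type,
                                associate_id, ttl_seconds=30, now=None):
        handle = self._handles[handle_id]
        issued_at = int(now if now is not None else time.time())
        expires = issued_at + ttl_seconds
        payload = f"{handle.bucket}/{handle.key}:{expires}".encode()
        signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
        self.url_requests[(associate_type, associate_id)] += 1
        query = urlencode({"expires": expires, "signature": signature})
        # Fake host: a real pre-signed URL depends on the storage location.
        return f"https://{handle.bucket}.example.invalid/{handle.key}?{query}"


service = FileHandleUrlService([FileHandle("fh1", "demo-bucket", "data/file.csv")])
# The same file handle is reachable through different associations.
entity_url = service.get_url_for_file_handle("fh1", "FileEntity", "syn123", now=1000)
wiki_url = service.get_url_for_file_handle("fh1", "WikiPage", "wiki42", now=1000)
```

Because every request passes through one method, the counter sees each association separately even though both point to the same file handle.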
...
"direct" downloads statistics
Included APIs | Average Daily Downloads | Max Daily Downloads |
---|---|---|
| 2324 | 4030 |
| 67 | 245 |
"Batch and bulk" Downloads
Association Type | Average Daily Downloads | Max Daily Downloads | Average daily users | Max daily users |
---|---|---|---|---|
All | 33175 | 167002 | 138 | 227 |
FileEntity | 12100 | 164269 | 58 | 100 |
Proposed API Design
...
Method | Endpoint | Request Body | Response Body | Description | Restrictions |
---|---|---|---|---|---|
POST | /asynchronous/job | | AsynchronousJobStatus: the responseBody property in the status will contain either the | Submits a job to gather the download statistics for an entity. The id returned by the request is used to get the job status. | |
GET | /asynchronous/job/{id} | N/A | AsynchronousJobStatus | Returns the current status of the statistics job with the given id. |
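The submit-then-poll flow in the table above could be driven by a client along the following lines. This is a sketch under stated assumptions, not the proposed client: the status field names `jobId`, `jobState` and `errorMessage`, the state values `PROCESSING`, `COMPLETE` and `FAILED`, and the request body shape are hypothetical, since the design only names the two endpoints and the `responseBody` property. The `send` callable abstracts the HTTP transport, and `make_fake_send` is an in-memory stand-in so the example runs without a server.

```python
import time


def wait_for_job(send, request_body, poll_interval=1.0, max_polls=60,
                 sleep=time.sleep):
    """Submit a job, then poll its status until it finishes.

    `send(method, path, body)` must return the status as a dict.
    Field names and state values are assumptions (see lead-in).
    """
    status = send("POST", "/asynchronous/job", request_body)
    job_id = status["jobId"]
    for attempt in range(max_polls + 1):
        state = status.get("jobState")
        if state == "COMPLETE":
            return status["responseBody"]
        if state == "FAILED":
            raise RuntimeError(status.get("errorMessage", "job failed"))
        if attempt == max_polls:
            break  # polling budget exhausted, do not sleep again
        sleep(poll_interval)
        status = send("GET", f"/asynchronous/job/{job_id}", None)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def make_fake_send(polls_until_done, result):
    """In-memory stand-in for the server: completes after N GET polls."""
    remaining = {"polls": polls_until_done}

    def send(method, path, body):
        if method == "POST":
            return {"jobId": "1", "jobState": "PROCESSING"}
        remaining["polls"] -= 1
        if remaining["polls"] > 0:
            return {"jobId": "1", "jobState": "PROCESSING"}
        return {"jobId": "1", "jobState": "COMPLETE", "responseBody": result}

    return send


# Hypothetical request and response bodies, for illustration only.
fake_send = make_fake_send(polls_until_done=2, result={"downloads": 5})
stats = wait_for_job(fake_send, {"objectId": "syn123"}, sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the polling logic testable without real delays; a production client would likely also add backoff between polls.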
...