
...

  1. Downloads Count
  2. Page Views
  3. Data Breaches/Audit Trail

Downloads Count

This is the main statistic that users are currently looking for. It gives project owners, funders and data contributors a way to monitor interest over time in the datasets published in a particular project, which in turn reflects interest in the project itself and is a metric of the value provided by its data. This kind of data relates specifically to the usage of the platform by Synapse users, since downloads are not available without authentication. It is part of a generic category of statistics about the entities and metadata stored in the backend, and it is only a subset of the aggregate statistics that could be exposed (e.g. number of projects, users, teams etc.).

Page Views

This metric is also an indicator of interest, but it plays a different role and focuses on general user activity across the Synapse platform as a whole. While it might indicate the success of a specific project, it captures a different aspect that may span the different types of clients used to interface with the Synapse API, and it includes information about users who are not authenticated in Synapse. For this particular aspect there are tools already integrated (e.g. Google Analytics) that collect analytics on user interactions. Note, however, that this information is not currently available to Synapse users, nor is it set up in a way that produces information about specific project pages, files, wikis etc.

Data Breaches/Audit Trail

Another aspect that came up and might seem related is identifying the when/what/why of potential data breaches (e.g. a dataset was released even though it was not supposed to be). This relates to the audit trail of user activity, which is used to identify potential offenders. While this information is crucial, it should not be exposed by the API, and a due process is in place to access this kind of data.

Project Statistics

With this brief introduction in mind this document focuses on the main driving use case, that is:

  • A funder and/or project creator would like to have a way to understand if the project is successful and if its data is used.

There are several metrics that can be used in order to determine the usage and success of a project, among which:

  • Project Access (e.g. page views)
  • Number of Downloads
  • Number of Uploads
  • User Discussions

...

Files, Downloads and Uploads

Files in Synapse are referenced through an abstraction (FileHandle) that maintains the information about the link to the content of the file itself (e.g. an S3 bucket). A file handle is then referenced in many places (such as FileEntity and WikiPage, see FileHandleAssociateType) as a pointer to the actual file content. To download the content, the Synapse platform generates a pre-signed URL (according to the location where the file is stored) that can be used to download the file directly. Note that the platform has no way to guarantee that the client actually uses the pre-signed URL to download the file. Every pre-signed URL request in the codebase comes down to a single method, getURLForFileHandle.
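The following is a minimal sketch of why a single method such as getURLForFileHandle is a convenient place to observe download requests. It is not the Synapse implementation: the `FileHandle` fields, the `PreSignedUrlService` class, the recorded tuple layout and the placeholder URL are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class FileHandle:
    """Hypothetical stand-in for a FileHandle: a pointer to stored content."""
    file_handle_id: str
    bucket: str
    key: str


@dataclass
class PreSignedUrlService:
    """Illustrative choke point through which every URL request passes."""
    # Each entry is (file_handle_id, associate_type, associate_id, user_id).
    requested: List[Tuple[str, str, str, str]] = field(default_factory=list)

    def get_url_for_file_handle(self, handle, associate_type, associate_id, user_id):
        # Record the request; whether the client later uses the URL is unknown.
        self.requested.append(
            (handle.file_handle_id, associate_type, associate_id, user_id)
        )
        # Placeholder only: a real pre-signed URL carries a signature and expiry.
        return f"https://{handle.bucket}.example.invalid/{handle.key}?signature=PLACEHOLDER"
```

Because the platform cannot confirm that a pre-signed URL is actually used, a count collected this way measures download requests, which approximates rather than equals the number of completed downloads.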

...

Represents the response for a job request for computing the project statistics (in response to a ProjectStatisticsRequest).

| Property | Type | Description |
|---|---|---|
| downloads | DownloadStatistics | Contains the download statistics for the project specified in the request; this is included only if the mask property in the ProjectStatisticsRequest has the 0x1 flag set. |
| uploads | UploadStatistics | Contains the upload statistics for the project specified in the request; this is included only if the mask property in the ProjectStatisticsRequest has the 0x2 flag set. |
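As an illustration of how the mask selects which properties appear in the response, here is a small sketch. The flag values 0x1 and 0x2 come from the table above; the constant and function names are assumptions made for this example.

```python
# Flag values taken from the response description; the names are hypothetical.
DOWNLOADS_FLAG = 0x1
UPLOADS_FLAG = 0x2


def included_properties(mask: int) -> list:
    """Return the response properties that a given mask would include."""
    included = []
    if mask & DOWNLOADS_FLAG:
        included.append("downloads")
    if mask & UPLOADS_FLAG:
        included.append("uploads")
    return included
```

A client interested in both statistics would combine the flags with a bitwise OR, for example `included_properties(DOWNLOADS_FLAG | UPLOADS_FLAG)`.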

DownloadStatistics/UploadStatistics

| Property | Type | Description |
|---|---|---|
| lastUpdatedOn | Date | Contains the last update date for the download/upload statistics; this value provides an approximation of the freshness of the statistics. |
| monthly | ARRAY<StatisticsCountBucket> | An array containing the monthly download/upload counts for the last 12 months; each bucket aggregates a month's worth of data. The number of buckets is limited to 12. Each bucket includes the unique users count for the month (in the extra.usersCount property). |

StatisticsCountBucket

The purpose of this object is to hold the count of a certain metric within a specific time frame (in this case monthly). Extra information about the specific bucket may be included in the extra property:

| Property | Type | Description |
|---|---|---|
| startDate | Date | The starting date of the time frame represented by the bucket |
| count | INTEGER | The download/upload count in the time frame |
| extra | Object/Map | Extra values that are part of this count |
| extra.usersCount | INTEGER | The number of unique users that performed a download/upload in the time frame of the bucket |
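To make the bucket semantics concrete, the sketch below aggregates raw events into monthly buckets shaped like StatisticsCountBucket. It is only an illustration under stated assumptions: the input format (pairs of timestamp and user id), the function name, and the choice to keep the 12 most recent months that contain data (omitting empty months) are not specified by this document.

```python
from collections import defaultdict
from datetime import datetime, timezone

MAX_BUCKETS = 12  # the number of monthly buckets is limited to 12


def monthly_buckets(events, max_buckets=MAX_BUCKETS):
    """Aggregate (timestamp, user_id) pairs into monthly count buckets.

    Assumes timestamps are datetimes already expressed in UTC.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for timestamp, user_id in events:
        # Every event in the same calendar month maps to the same bucket start.
        start = datetime(timestamp.year, timestamp.month, 1, tzinfo=timezone.utc)
        counts[start] += 1
        users[start].add(user_id)
    # Most recent month first, keeping at most max_buckets months.
    latest = sorted(counts, reverse=True)[:max_buckets]
    return [
        {
            "startDate": start.isoformat(),
            "count": counts[start],
            "extra": {"usersCount": len(users[start])},
        }
        for start in latest
    ]
```

Note that count tallies every event while extra.usersCount tallies distinct users, so a single user downloading three times in a month contributes 3 to count and 1 to usersCount.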


Example

```json
{
  "downloads": {
    "lastUpdatedOn": "2019-06-26T01:01:00.000Z",
    "monthly": [{
      "startDate": "2019-06-01T00:00:00.000Z",
      "count": 1230,
      "extra": {
        "usersCount": 10
      }
    },
    {
      "startDate": "2019-05-01T00:00:00.000Z",
      "count": 10000,
      "extra": {
        "usersCount": 100
      }
    }]
  },
  "uploads": {
    "lastUpdatedOn": "2019-06-26T01:01:00.000Z",
    "monthly": [{
      "startDate": "2019-06-01T00:00:00.000Z",
      "count": 51200,
      "extra": {
        "usersCount": 200
      }
    },
    {
      "startDate": "2019-05-01T00:00:00.000Z",
      "count": 10000,
      "extra": {
        "usersCount": 100
      }
    }]
  }
}
```
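A client consuming this response might total the monthly counts per section. The helper below is a hypothetical sketch (the function name is an assumption); it treats a missing section as zero, since a section is omitted when its mask flag is not set.

```python
def total_count(response: dict, section: str) -> int:
    """Sum the monthly counts of one section ("downloads" or "uploads")."""
    stats = response.get(section)
    if stats is None:
        # The section is absent when its flag was not set in the request mask.
        return 0
    return sum(bucket["count"] for bucket in stats["monthly"])


# Counts copied from the example response; other properties are left out.
example = {
    "downloads": {"monthly": [{"count": 1230}, {"count": 10000}]},
    "uploads": {"monthly": [{"count": 51200}, {"count": 10000}]},
}
print(total_count(example, "downloads"))  # 11230
print(total_count(example, "uploads"))    # 61200
```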

Endpoints

The API reuses the endpoints from the Asynchronous Job API (we could potentially add a dedicated /statistics endpoint just for clarity).

| Method | Endpoint | Request Body | Response Body | Description | Restrictions |
|---|---|---|---|---|---|
| POST | /asynchronous/job | ProjectStatisticsRequest | AsynchronousJobStatus: the responseBody property in the status will contain the ProjectStatisticsResponse | Submits a job to gather the download statistics for a project. The id returned by the request is used to get the job status. | The project specified in the request should exist (404 if not). The current user should be the owner (and/or administrator) of the project (403 if not). |
| GET | /asynchronous/job/{id} | N/A | AsynchronousJobStatus | Gets the current status of the statistics job with the given id. | |
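The submit-then-poll flow implied by these two endpoints could look roughly like the sketch below. Only the endpoint paths, the mask property and the responseBody property come from this document. Everything else is an assumption for illustration: the `send` transport callable, the `projectId` request field, the `id` field in the POST response, and the `jobState` field with its `COMPLETE` and `FAILED` values.

```python
import time


def fetch_project_statistics(send, project_id, mask=0x3, poll_interval=1.0, max_polls=30):
    """Submit a statistics job and poll until it finishes.

    `send(method, path, body)` is a caller-supplied function that performs the
    HTTP call and returns the parsed JSON response as a dict (hypothetical).
    """
    # Field names in the request body other than "mask" are assumptions.
    request = {"projectId": project_id, "mask": mask}
    job_id = send("POST", "/asynchronous/job", request)["id"]

    for _ in range(max_polls):
        status = send("GET", f"/asynchronous/job/{job_id}", None)
        state = status.get("jobState")  # assumed field name and values
        if state == "COMPLETE":
            # The ProjectStatisticsResponse is carried in responseBody.
            return status["responseBody"]
        if state == "FAILED":
            raise RuntimeError(f"statistics job {job_id} failed")
        time.sleep(poll_interval)

    raise TimeoutError(f"statistics job {job_id} did not finish in time")
```

Passing the transport in as a function keeps the sketch independent of any particular HTTP library and makes it easy to exercise with a stub.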

...