
Evaluation Submissions Annotations

Linked JIRA:

PLFM-4595

Overview

Synapse provides a set of services that allow users to manage submissions of file and docker entities in order to support data analysis that can be performed by external systems (See https://sagebionetworks.jira.com/wiki/spaces/PLFM/pages/30441522/Evaluation+API).

Over the years, the system evolved towards a model that is a precursor of the current view engine. In particular, while each submission is an immutable object, its status is represented by a mutable object (SubmissionStatus) that can be annotated by any user with UPDATE_SUBMISSION access to the evaluation the submission is tied to.

The annotated submissions are indexed asynchronously in special-purpose tables that contain the annotation key/value pairs in a semi-normalized format so that they can be queried efficiently.

This system and its architecture are very close to the current engine used to query annotated entities, which are indexed separately and on top of which views can be built for efficient querying.

Both systems use an SQL-like language for querying, but they follow different conventions (see Evaluation Query and Table Queries) and are developed in parallel. Additionally, the annotations on submissions and on entities use different, incompatible API models.

Finally, the way queries are submitted, their access model and the way results are retrieved are quite different. The evaluation API provides a synchronous API that directly builds an SQL query on the indexing tables. Querying entities indexed through their annotations instead requires the user to first specify a scope and a meta-model, which together constitute a view, and then submit an asynchronous job to query and retrieve the results.

The underlying issue is therefore the added overhead of maintaining two separate systems that achieve the same result, both from an engineering perspective and from an end-user perspective, where it causes confusion and conflicting documentation. While both systems of course have issues, maintaining both when the more advanced and more widely adopted entity query system already solves some of the problems is burdensome and far from ideal.

Additionally, the entity query engine is richer in features, for example faceting, multi-value annotations, ACLs on views, etc.

The scope of this document is to outline a potential solution to move toward a single unified system that allows us to deprecate the submission query system over time and eventually adopt the entity query engine as the primary solution for querying submissions.

Submissions as Entities?

Since the current annotation and entity query systems already work on top of entities, one idea would be to turn the current evaluation objects (Evaluation, Submission, etc.) into entities (see CHL-14 for a proposal). After all, an evaluation is a container of submissions, and evaluations live in a project (note that an evaluation is not necessarily linked to a challenge).

We could see a parallel:

Evaluation → Folder

Submission → File

On the surface it would make sense to go for this refactoring, as the evaluation objects would automatically inherit the entity query engine. While this sounds easy in theory, in practice refactoring the system this way would most likely introduce more issues than it solves. Among others:

  • The ACL scheme adopted by the evaluation API is completely different (for good reasons). Different access levels are required for submissions vs evaluations, and in general submissions do not inherit their permissions from the evaluation.

  • The life cycle of the evaluation objects is different from that of entities: for example, submissions are immutable and cannot be deleted (except by the evaluation manager), and their management is driven by the evaluation they are associated with.

  • Additional, dedicated APIs are needed for evaluations, for example to inspect submission eligibility. While their basic CRUD operations could be managed through the entity APIs, we risk having dedicated APIs on top that work only on evaluations, which might be confusing.

  • It would also introduce several complexities around the user experience: how do we represent these objects? Do we need to filter them in a project? Do we put them alongside the project itself (e.g. see docker repository)?

  • Documentation: It would be hard from an API perspective to document their usage and constraints.

Some advantages if we model evaluation and submissions as entities:

  • We can automatically index annotations using a common and known API

  • Views could be created both on top of evaluations and submissions

  • We wouldn’t have to (heavily) alter the view engine to support external objects (we would still need to add to the replication index the system properties that are tied to evaluations and submissions)

Proposed solution: decouple entities and annotations

The proposed solution instead takes a different approach: what if we generalize the annotation system to allow annotating objects other than entities? The platform already seems to be heading towards extending the types of objects that can be annotated (e.g. Access Requirements), so that the same query system can be enabled for other objects as well.

This has the advantage that any time we need to enable annotations on an object, we do not have to refactor the system to move the object under the entity hierarchy; rather, we add annotations to the object itself, retaining the object's life cycle.

An additional advantage is that we can bring the feature to the end-user in a faster way, without having to replicate the API to work under the entity hierarchy assumption and at the same time reducing the amount of refactoring needed for clients that use the current API.

The proposal is therefore to avoid recreating from scratch a completely different API for evaluations that solves all the current issues (with the potential to introduce new issues), but instead purely focus on the submission annotations and querying through views.

Proposed API

For the above proposal to work we need to introduce new APIs for evaluations so that the new type of annotations supported by the replication index can be added to submissions. At the same time we need to provide a way to create views on submissions so that the table/view query system can be used as a replacement for the current submission query.

Evaluation API additions

The Evaluations Service currently provides a way to annotate submissions (which are immutable) through a mutable object, SubmissionStatus. This object contains a property named annotations that uses the old format supported by the submission query engine.

We can reuse the exact same APIs and add a new attribute to the SubmissionStatus, named submissionAnnotations, that carries the annotations in the new format instead. The previous annotations attribute will be deprecated (but still supported) and will not be used to index the annotations in the replication index.

This has the advantage of not having to reimplement the logic around the ACL for submissions.
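As an illustration of how the deprecated annotations attribute and the new submissionAnnotations attribute could coexist on a SubmissionStatus, here is a minimal Python sketch. The dataclass, the annotations_to_index helper and the exact inner layout of both annotation formats (a key mapped to a type and a list of values for the new one) are assumptions made for the example, not the actual Synapse model.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SubmissionStatus:
    """Hypothetical, trimmed-down model of the mutable status object."""
    id: str
    etag: str
    status: str
    # Deprecated attribute: old format, still accepted but not indexed for views.
    annotations: Optional[Dict[str, Any]] = None
    # New attribute proposed in this document; the inner layout is an assumption.
    submissionAnnotations: Optional[Dict[str, Dict[str, Any]]] = None


def annotations_to_index(status: SubmissionStatus) -> Dict[str, List[Any]]:
    """Return the key -> values map that would feed the replication index.

    Only the new attribute is considered; the deprecated one is ignored.
    """
    new_format = status.submissionAnnotations or {}
    return {key: list(entry["value"]) for key, entry in new_format.items()}


status = SubmissionStatus(
    id="9600001",
    etag="etag-1",
    status="SCORED",
    # Old-style annotations (illustrative layout), kept only for old clients.
    annotations={"stringAnnos": [{"key": "legacy", "value": "x", "isPrivate": False}]},
    submissionAnnotations={"score": {"type": "DOUBLE", "value": [0.93]}},
)
indexed = annotations_to_index(status)
```

In this sketch a client that only sets the old attribute simply contributes nothing to the view index, which mirrors the "deprecated but still supported" behavior described above.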

Entity and Table APIs additions

In order to support querying the annotated submissions we need to introduce a new type of view, similar to an EntityView. We propose to be explicit and create a SubmissionView that simply specifies the scope:

  • scopeIds: The list of evaluations ids that define the scope of the submissions to query

The EntityView currently allows a typeMask to be defined to filter the object sub-types. Views on submissions are not hierarchical and do not have different subtypes, so there is no need for the type mask for now. Instead, we introduce a new super interface named View that both EntityView and SubmissionView implement. Since all view types need to define a scope, the View interface will embed the scopeIds, while the EntityView will keep its current attribute for the type mask. This allows the API to remain backward compatible while being more extensible.
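The type hierarchy described above could look roughly like the following Python sketch. The class and attribute names View, EntityView, SubmissionView and scopeIds come from this document; the name attribute, the viewTypeMask spelling of the type mask and the describe_scope helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class View:
    """Common super type: every kind of view declares a scope."""
    name: str
    scopeIds: List[str] = field(default_factory=list)


@dataclass
class EntityView(View):
    """Scope holds container ids; keeps the existing type mask attribute."""
    viewTypeMask: Optional[int] = None


@dataclass
class SubmissionView(View):
    """Scope holds evaluation ids; no sub-types, hence no type mask."""


def describe_scope(view: View) -> str:
    # Hypothetical helper: code written against View works for both view types.
    kind = "evaluations" if isinstance(view, SubmissionView) else "containers"
    return f"{view.name}: {len(view.scopeIds)} {kind} in scope"


submission_view = SubmissionView(name="leaderboard", scopeIds=["9614112", "9614113"])
entity_view = EntityView(name="files", scopeIds=["syn123"], viewTypeMask=0x01)
```

Keeping the mask only on EntityView means existing EntityView payloads stay valid, while a SubmissionView payload never carries a field that has no meaning for it.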

For a SubmissionView, the creator must be granted ACCESS_TYPE.READ_PRIVATE_SUBMISSION on each of the evaluations in the scope. Note that this is stricter than the current submission querying system: today, only ACCESS_TYPE.READ access on an evaluation is needed to query its submissions.
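The scope check above amounts to verifying one access type against every evaluation in the scope. A minimal sketch follows, assuming a hypothetical in-memory ACL map (evaluation id to user id to granted access types); the function names and the data layout are illustrative and not part of the Synapse API.

```python
from typing import Dict, Iterable, List, Set

READ = "READ"
READ_PRIVATE_SUBMISSION = "READ_PRIVATE_SUBMISSION"

# Hypothetical in-memory ACLs: evaluation id -> user id -> granted access types.
Acls = Dict[str, Dict[str, Set[str]]]


def missing_access(acls: Acls, user_id: str, scope_ids: Iterable[str],
                   required: str = READ_PRIVATE_SUBMISSION) -> List[str]:
    """Return the evaluation ids in the scope where the user lacks `required`."""
    return [
        evaluation_id
        for evaluation_id in scope_ids
        if required not in acls.get(evaluation_id, {}).get(user_id, set())
    ]


def validate_submission_view_scope(acls: Acls, user_id: str,
                                   scope_ids: Iterable[str]) -> None:
    """Reject the view unless the creator has the access on every evaluation."""
    missing = missing_access(acls, user_id, scope_ids)
    if missing:
        raise PermissionError(
            f"{READ_PRIVATE_SUBMISSION} is required on evaluations: {missing}"
        )


acls: Acls = {
    "eval1": {"alice": {READ, READ_PRIVATE_SUBMISSION}},
    "eval2": {"alice": {READ}},  # READ alone is enough for the old query only
}
```

With these ACLs, alice could query eval2 through the old submission query but could not include it in the scope of a SubmissionView.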

The system attributes of a submission that are currently indexed in the submission query system should also be available in the view (e.g. status, userId, teamId, submitterId, name, createdOn, submitterAlias, etc.). They will be indexed similarly to how the entity system attributes are indexed and exposed.

Note: canCancel, cancelRequested and cancelControl will NOT be included in the first release. They are currently serialized into the database as blob values and need to be normalized into their own columns before they can be efficiently retrieved when indexed.

Once a view is exposed it can be shared using the standard ACL controls on the view (which is an entity) so that it can be queried (e.g. in a leaderboard) by other users. This allows more flexibility than the current model which includes in the annotations an isPrivate boolean to “hide” non-public annotations. The user creating the view could create multiple views exposing different columns for different purposes.

The rest of the table APIs should largely remain the same with minor additions to accommodate the new view type, in particular:

  • Two new ColumnTypes: SUBMISSIONID and EVALUATIONID, so that they can be rendered in the view as links to the respective objects

  • A new EntityType: submissionview for the new type of view

  • A new viewEntityType parameter in the https://rest-docs.synapse.org/rest/GET/column/tableview/defaults.html API. This is needed because at the moment the only information passed is the viewTypeMask, which is not enough to discriminate between different view types. By default it will be set to entityview to maintain backward compatibility. If submissionview is used, the viewTypeMask will not be taken into consideration (as submission views do not have subtypes).

  • Similarly, a new viewEntityType attribute will be added to the ViewScope object used in the https://rest-docs.synapse.org/rest/POST/column/view/scope.html API to discriminate between different view types.

The viewEntityType parameter/attribute will be an enumeration which is a subset of the EntityType enumeration and that will contain only the entity types that define a view: entityview and submissionview.
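The defaulting rules for viewEntityType and viewTypeMask described above can be sketched as a small normalization step. The enumeration values entityview and submissionview come from this document; the resolve_view_type function and its signature are hypothetical and only illustrate the intended behavior.

```python
from enum import Enum
from typing import Optional, Tuple


class ViewEntityType(Enum):
    """Subset of EntityType containing only the types that define a view."""
    entityview = "entityview"
    submissionview = "submissionview"


def resolve_view_type(
    view_entity_type: Optional[str],
    view_type_mask: Optional[int],
) -> Tuple[ViewEntityType, Optional[int]]:
    """Normalize the two request parameters.

    A missing viewEntityType defaults to entityview (backward compatible).
    For submissionview the mask is dropped, since there are no sub-types.
    Any other entity type is rejected with a ValueError by the enum lookup.
    """
    if view_entity_type is None:
        view_type = ViewEntityType.entityview
    else:
        view_type = ViewEntityType(view_entity_type)
    if view_type is ViewEntityType.submissionview:
        return view_type, None
    return view_type, view_type_mask


old_client = resolve_view_type(None, 0x01)
new_client = resolve_view_type("submissionview", 0x01)
```

An existing client that only sends the mask keeps getting entity view defaults, while a client asking for submissionview has its mask ignored rather than rejected.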

Submissions Schemas

In the future, with the introduction of schemas for objects, we could bind a schema to the annotations on submissions. In particular, we could bind a schema to an evaluation so that the submission annotations would abide by the rules set in the schema. At the same time, creating a view on submissions could be guided by the same schema.