Curation Task v2 API Design
The Tasks v1 API design can be found here: Design for Records-Based Metadata Curation.
Problem Statement:
The current implementation of Curation Tasks in Synapse has served as an effective MVP for organizing file-based and record-based metadata. However, as the system has scaled to support larger consortia and collaborative teams, three critical areas of friction have emerged that the V2 API intends to resolve:
1. Fragile Session Resolution and Data Integrity
Currently, the responsibility for finding or creating a metadata grid session is delegated entirely to the UI layer. Because tasks are not formally linked to specific session IDs, the UI must "guess" which session to open based on the task definition. This architectural gap leads to:
Duplicate Sessions: Multiple users (or even a single user in multiple tabs) can accidentally spawn parallel sessions for the same task.
Version Confusion: Users may be directed to the "latest" session by default, which might not be the session currently under review or the one that was recently completed.
Work on Stale Data: Curators can inadvertently land in a session that should be closed or read-only, risking data loss or conflicting updates.
2. Visibility and Scaling Issues (The "Curator's Burden")
As users become responsible for metadata curation across dozens of projects, the current project-centric view has become a bottleneck.
Lack of Global Task Access: There is currently no unified way for a user to see all tasks assigned to them or their teams across the entire Synapse platform.
Cluttered Workspaces: Without a formal status field (e.g., Not Started, In-Progress, Completed), all tasks, including those that have already been finalized, remain visible in the project tab and create significant visual noise.
Ineffective Filtering: Current filtering is limited, making it difficult for users to isolate high-priority tasks or tasks specifically assigned to them within high-volume projects.
3. Ambiguous Collaboration and Verification Workflows
While the system supports Teams, the lack of a formal task lifecycle creates ambiguity regarding ownership and completion.
Concurrent Editing Risks: Without a linked session, team members cannot reliably know if a collaborator is currently working on a shared task.
Undefined Completion: There is no distinct state to signal that a task has been completed and verified by a data manager.
Objectives for V2
To address these issues, the V2 API will introduce a task status tracker with client-orchestrated session linking, optimistic concurrency control, and comprehensive cross-project filtering for assignees (Users and Teams).
V2 API
This V2 design addresses the original MVP's technical debt through three primary mechanisms:
A. Client-Orchestrated Session Linking (Solving Problem 1)
In V1, the UI "guessed" the session ID, leading to parallel, duplicate sessions.
V2 Solution: The task system now tracks a TaskStatus that includes an activeSessionId via polymorphic TaskExecutionDetails. When an assignee starts a task, the client first creates a grid session using the existing async CreateGridRequest job. It then makes a single synchronous PUT call, guarded by etag-based optimistic concurrency, that updates the task status to IN_PROGRESS and links the new session ID. Race conditions are handled gracefully. If two assignees start simultaneously, the second caller's etag check fails (409 Conflict). The client then re-fetches the current status via GET /curation/task/{taskId}/status to obtain the fresh etag and the linked session.
B. Global Visibility & Filtering (Solving Problem 2)
V1 required users to manually check every project for tasks, creating a "Curator's Burden."
V2 Solution: The ListCurationTaskRequest now supports an optional projectId. If it is omitted, the API aggregates tasks across all projects where the caller has READ access. This gives assignees a unified cross-project task view when combined with three filters:
stateFilter (e.g., to hide COMPLETED tasks)
assigneeIds filtering
the assignedToMe flag, which automatically includes all teams the caller belongs to
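The following is a minimal sketch of how a client might assemble a ListCurationTaskRequest body from these filters. build_list_request is a hypothetical helper, not part of any Synapse client. It enforces one rule, stated in the model definition, that assigneeIds and assignedToMe cannot be combined. It also assumes that stateFilter keeps only tasks whose state is listed, so "hide COMPLETED" is expressed as the complement of the hidden states.

```python
from typing import Optional, Sequence

# Hypothetical helper: builds a ListCurationTaskRequest body as a plain dict.
# Assumption: stateFilter keeps only tasks whose state is in the listed values.
ALL_STATES = ("NOT_STARTED", "IN_PROGRESS", "COMPLETED", "CANCELED")


def build_list_request(
    project_id: Optional[str] = None,
    assignee_ids: Optional[Sequence[str]] = None,
    assigned_to_me: bool = False,
    hide_states: Sequence[str] = (),
    next_page_token: Optional[str] = None,
) -> dict:
    """Build a request body, omitting every optional field that is unset."""
    if assignee_ids and assigned_to_me:
        # The model states these two filters cannot be combined.
        raise ValueError("assigneeIds cannot be combined with assignedToMe")
    unknown = set(hide_states) - set(ALL_STATES)
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")

    body: dict = {}
    if project_id is not None:
        body["projectId"] = project_id  # omit for a cross-project view
    if assignee_ids:
        body["assigneeIds"] = list(assignee_ids)
    if assigned_to_me:
        body["assignedToMe"] = True
    if hide_states:
        # Express "hide X" as "keep every state except X".
        body["stateFilter"] = [s for s in ALL_STATES if s not in hide_states]
    if next_page_token is not None:
        body["nextPageToken"] = next_page_token
    return body


# Cross-project "my open work" view: no projectId, finished tasks hidden.
print(build_list_request(assigned_to_me=True, hide_states=("COMPLETED", "CANCELED")))
# {'assignedToMe': True, 'stateFilter': ['NOT_STARTED', 'IN_PROGRESS']}
```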
C. Formalized Task Lifecycle (Solving Problem 3)
V1 lacked a way to track task progress and signal completion.
V2 Solution: A formal TaskState enum (NOT_STARTED, IN_PROGRESS, COMPLETED, CANCELED) is paired with a separate TaskStatus object. TaskStatus tracks the state, the execution details, and an etag for optimistic concurrency. Managers can filter for in-progress tasks. The TaskBundle in list responses shows both the task definition and its current status at a glance.
All of the API changes are additive to the existing API, so there are no "breaking" API changes.
New Task State
We propose extending the curation tasks to support a basic state machine with the following possible states:
State | Definition | Session Status |
|---|---|---|
NOT_STARTED | Task is created but no work has begun. | No active session linked. |
IN_PROGRESS | An assignee has started the task. | Session created by the client and linked via the task status update. |
COMPLETED | Data Manager has verified the results. | The linked session remains unchanged. |
CANCELED | Data Manager no longer needs this task. A "soft" delete that removes tasks from most views while keeping them for historical purposes. | The existing grid session link is maintained. |
For more details on the possible states see: TaskState (enum).
New Task APIs
Response | Endpoint | Request | Description |
|---|---|---|---|
TaskStatus | GET /curation/task/{taskId}/status | None | Get the current status of a task. Useful for fetching a fresh etag after a 409 Conflict. Requires READ access on the task's project. |
TaskStatus | PUT /curation/task/{taskId}/status | TaskStatus | Update the state of a task. Requires the current etag; a stale etag results in a 409 Conflict. |
Authorization for GET: READ access on the task's project.
Authorization for PUT: A user can update a task's status if:
The user has UPDATE access on the task's project (task managers), OR
The user is the assignee of the task (either directly or via team membership).
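The PUT authorization rule above reduces to a simple predicate. The following is a minimal sketch, not the service's implementation. Caller, is_manager, is_assignee, and can_put_status are hypothetical names. The sketch also assumes that a task has a single assignee principal ID, which may be a user or a team. The per-transition restrictions in the State Transition & Authorization Matrix still apply on top of this coarse check.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Caller:
    """Hypothetical caller context assembled by the service for one request."""
    principal_id: str
    team_ids: FrozenSet[str] = frozenset()
    has_project_update: bool = False  # UPDATE access on the task's project


def is_manager(caller: Caller) -> bool:
    # Task managers are callers with UPDATE access on the task's project.
    return caller.has_project_update


def is_assignee(caller: Caller, assignee_id: str) -> bool:
    # Assigned directly, or through membership in the assigned team.
    return assignee_id == caller.principal_id or assignee_id in caller.team_ids


def can_put_status(caller: Caller, assignee_id: str) -> bool:
    """Coarse PUT check; the per-transition matrix is enforced separately."""
    return is_manager(caller) or is_assignee(caller, assignee_id)


alice = Caller("1", team_ids=frozenset({"100"}))
print(can_put_status(alice, "100"))  # True: alice belongs to assignee team 100
```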
Client Workflow: Starting a Task
When an assignee clicks "Start" on a task that is in the NOT_STARTED state:
1. Client creates a grid session using the existing async job (POST /grid/session/async/start with CreateGridRequest), populated from the task's definition.
2. Client updates the task status to IN_PROGRESS and links the newly created session via PUT /curation/task/{taskId}/status with:
   state: IN_PROGRESS
   executionDetails: GridExecutionDetails with activeSessionId set to the new session ID
   etag: the current task etag
Race condition handling: If two assignees click "Start" simultaneously:
1. Both create separate grid sessions (step 1 succeeds for both).
2. The first caller's PUT succeeds and links their session.
3. The second caller's PUT fails with 409 Conflict (stale etag).
4. The client calls GET /curation/task/{taskId}/status to fetch the current status with a fresh etag. It sees that the task is now IN_PROGRESS with a linked session and redirects the second user to that session.
5. The client can clean up the orphaned grid session created by the losing caller.
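The start workflow and its race handling can be sketched client-side as follows. This is a hedged illustration, not the Synapse web client's code. The callables create_session, get_status, put_status, and delete_session stand in for, respectively:
- the async grid-session job
- GET /curation/task/{taskId}/status
- PUT /curation/task/{taskId}/status
- an optional session cleanup

ConflictError is a hypothetical exception for a 409 response. The GridExecutionDetails concreteType string assumes the same package as the $ref'd types in the model section. One detail deserves care. The etag is shared with the task definition, so a 409 can also come from an unrelated task edit. The sketch therefore retries while the task is still NOT_STARTED, instead of assuming another assignee won.

```python
from typing import Callable, Optional


class ConflictError(Exception):
    """Hypothetical: raised by the HTTP layer when the server returns 409."""


# Assumed concrete type name (same package as the $ref'd model types).
GRID_TYPE = "org.sagebionetworks.repo.model.curation.GridExecutionDetails"


def _linked_session(status: dict) -> Optional[str]:
    return (status.get("executionDetails") or {}).get("activeSessionId")


def start_task(
    task_id: int,
    create_session: Callable[[int], str],
    get_status: Callable[[int], dict],
    put_status: Callable[[int, dict], dict],
    delete_session: Optional[Callable[[str], None]] = None,
    max_attempts: int = 3,
) -> Optional[str]:
    """Start a NOT_STARTED task, or join the session another assignee linked."""
    status = get_status(task_id)
    if status["state"] != "NOT_STARTED":
        return _linked_session(status)  # nothing to start; reuse any link

    session_id = create_session(task_id)  # step 1: async CreateGridRequest job
    body = {
        "taskId": task_id,
        "state": "IN_PROGRESS",
        "executionDetails": {"concreteType": GRID_TYPE, "activeSessionId": session_id},
        "etag": status["etag"],
    }
    for _ in range(max_attempts):
        try:
            put_status(task_id, body)  # step 2: etag-checked status update
            return session_id
        except ConflictError:
            fresh = get_status(task_id)  # fresh etag after the 409
            if fresh["state"] != "NOT_STARTED":
                # Another caller won the race: drop our orphan, join theirs.
                if delete_session is not None:
                    delete_session(session_id)
                return _linked_session(fresh)
            # Still NOT_STARTED: the shared etag was bumped by an unrelated
            # task edit, so retry with the fresh etag.
            body["etag"] = fresh["etag"]
    raise ConflictError(f"could not start task {task_id} after {max_attempts} attempts")
```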
State Transition & Authorization Matrix:
Current State | Target State | Authorized User | Side Effects |
|---|---|---|---|
NOT_STARTED | IN_PROGRESS | Assignee | None (client creates and links session) |
IN_PROGRESS | COMPLETED | Manager Only | None |
ANY | CANCELED | Manager Only | None |
ANY | NOT_STARTED | Manager Only | None |
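The matrix can be encoded as a lookup that the PUT handler consults after the coarse authorization check. This is a sketch under stated assumptions:
- Role checks are passed in as booleans.
- The ANY rows apply to every source state, so NOT_STARTED to NOT_STARTED is treated as a manager-only no-op.
- Any pair not in the matrix is rejected. This includes same-state updates such as IN_PROGRESS to IN_PROGRESS.

This design does not specify whether the service permits same-state updates, for example to refresh execution details.

```python
from typing import Optional

STATES = ("NOT_STARTED", "IN_PROGRESS", "COMPLETED", "CANCELED")
ASSIGNEE, MANAGER = "assignee", "manager"  # hypothetical role labels


def required_role(current: str, target: str) -> Optional[str]:
    """Role the matrix requires for current -> target, or None if not listed."""
    if current not in STATES or target not in STATES:
        raise ValueError(f"unknown transition {current!r} -> {target!r}")
    if target in ("CANCELED", "NOT_STARTED"):
        return MANAGER  # the two ANY rows
    if (current, target) == ("NOT_STARTED", "IN_PROGRESS"):
        return ASSIGNEE
    if (current, target) == ("IN_PROGRESS", "COMPLETED"):
        return MANAGER
    return None  # not in the matrix


def may_transition(current: str, target: str, *, is_manager: bool, is_assignee: bool) -> bool:
    """Apply the matrix; unlisted transitions are always rejected."""
    role = required_role(current, target)
    if role == MANAGER:
        return is_manager
    if role == ASSIGNEE:
        return is_assignee
    return False
```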
Database Design
The task status columns live on the same CURATION_TASK table and share its single ETAG column with the task definition. Any mutation bumps the same etag, whether it is a task property update or a status transition. Migration detects row changes via CRC32(CONCAT(ID, '@', ETAG)), so this shared etag ensures that migration correctly picks up all changes. The new columns are:
`STATE` ENUM('NOT_STARTED','IN_PROGRESS','COMPLETED','CANCELED') NOT NULL DEFAULT 'NOT_STARTED',
`EXECUTION_DETAILS` JSON DEFAULT NULL,
`STATE_UPDATED_BY` BIGINT DEFAULT NULL,
`STATE_UPDATED_ON` TIMESTAMP(3) NULL DEFAULT NULL,
Key design decisions:
Single ETAG column for optimistic concurrency on both task definition updates and status transitions. A separate STATE_ETAG would not be detected by the migration system's etag-based change detection, which could cause data loss during blue-green deployments. The single column avoids this problem.
MySQL ENUM for the STATE column provides type safety at the database level.
JSON column for EXECUTION_DETAILS enables polymorphic execution details per task type (e.g., GridExecutionDetails with activeSessionId, UploadExecutionDetails with fileCount).
No FK to GRID_SESSION: the session link is stored inside the JSON execution details, keeping the schema flexible for future task types.
DDL defaults (STATE = 'NOT_STARTED') handle backfill of existing tasks automatically. A custom MigratableTableTranslation in the DBO handles migration from pre-v2 stacks.
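The shared-etag design can be demonstrated end to end with an in-memory database. This sketch uses SQLite as a stand-in for MySQL. The ENUM is therefore emulated with a CHECK constraint, and only the columns needed here are created. update_state and row_checksum are hypothetical helpers. The sketch shows two things:
- A compare-and-set UPDATE that matches on the caller's etag. Zero affected rows maps to a 409.
- A row checksum that mirrors CRC32(CONCAT(ID, '@', ETAG)). It changes after a status-only update because the shared etag is bumped.

```python
import sqlite3
import uuid
import zlib
from typing import Optional

# SQLite stand-in for MySQL: a CHECK constraint emulates the STATE ENUM.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE CURATION_TASK (
        ID INTEGER PRIMARY KEY,
        ETAG TEXT NOT NULL,
        STATE TEXT NOT NULL DEFAULT 'NOT_STARTED'
            CHECK (STATE IN ('NOT_STARTED','IN_PROGRESS','COMPLETED','CANCELED')),
        EXECUTION_DETAILS TEXT DEFAULT NULL
    )"""
)
conn.execute("INSERT INTO CURATION_TASK (ID, ETAG) VALUES (1, 'e0')")


def update_state(task_id: int, expected_etag: str, state: str, details_json: Optional[str]) -> str:
    """Compare-and-set: succeeds only if the caller's etag is still current."""
    new_etag = str(uuid.uuid4())
    cur = conn.execute(
        "UPDATE CURATION_TASK SET STATE = ?, EXECUTION_DETAILS = ?, ETAG = ?"
        " WHERE ID = ? AND ETAG = ?",
        (state, details_json, new_etag, task_id, expected_etag),
    )
    if cur.rowcount == 0:
        raise RuntimeError("409 Conflict: stale etag")
    return new_etag


def row_checksum(task_id: int) -> int:
    """Mirrors CRC32(CONCAT(ID, '@', ETAG)) used for migration change detection."""
    (etag,) = conn.execute(
        "SELECT ETAG FROM CURATION_TASK WHERE ID = ?", (task_id,)
    ).fetchone()
    return zlib.crc32(f"{task_id}@{etag}".encode("utf-8"))


before = row_checksum(1)
update_state(1, "e0", "IN_PROGRESS", '{"activeSessionId": "s1"}')
after = row_checksum(1)
print(before != after)  # a status-only change still alters the migration checksum
```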
New Model Objects
TaskState (enum)
{
"description": "The state of a curation task in its lifecycle.",
"type": "string",
"name": "TaskState",
"enum": [
{
"name": "NOT_STARTED",
"description": "The task has been created and assigned but work has not yet started."
},
{
"name": "IN_PROGRESS",
"description": "The assignee has actively started the task."
},
{
"name": "COMPLETED",
"description": "The task has been completed and verified."
},
{
"name": "CANCELED",
"description": "The task has been canceled and is no longer needed."
}
]
}
TaskStatus (object)
{
"description": "Tracks the dynamic lifecycle and progress of a CurationTask.",
"properties": {
"taskId": {
"type": "integer",
"description": "The unique identifier of the associated curation task."
},
"state": {
"$ref": "org.sagebionetworks.repo.model.curation.TaskState",
"description": "The current state of the task in its lifecycle."
},
"executionDetails": {
"$ref": "org.sagebionetworks.repo.model.curation.TaskExecutionDetails",
"description": "Task-type-specific execution details. Null if no execution details are available."
},
"lastUpdatedBy": {
"type": "string",
"description": "The principal ID of the user who last updated the status."
},
"lastUpdatedOn": {
"type": "string",
"format": "date-time",
"description": "Timestamp of when the status was last updated."
},
"etag": {
"type": "string",
"description": "Optimistic concurrency control token for the task. Shared with the task definition — any mutation (task update or status transition) bumps this etag.",
"transient": true
}
}
}
TaskExecutionDetails (interface)
{
"description": "An interface for task-specific execution details. The concrete type determines which task-type-specific properties are available.",
"type": "interface",
"properties": {
"concreteType": {
"type": "string",
"description": "Indicates which implementation of TaskExecutionDetails this object represents."
}
}
}
GridExecutionDetails
{
"description": "Execution details for a metadata curation task involving a collaborative grid session.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.curation.TaskExecutionDetails"
}
],
"properties": {
"activeSessionId": {
"type": "string",
"description": "The unique identifier of the active CRDT grid session linked to this task."
}
}
}
UploadExecutionDetails
{
"description": "Execution details for a file upload task.",
"implements": [
{
"$ref": "org.sagebionetworks.repo.model.curation.TaskExecutionDetails"
}
],
"properties": {
"fileCount": {
"type": "integer",
"description": "The current number of files successfully uploaded for this task."
},
"totalBytesUploaded": {
"type": "integer",
"description": "The total size, in bytes, of all files uploaded for this task."
}
}
}
TaskBundle
{
"description": "A bundle containing a CurationTask and its associated TaskStatus.",
"properties": {
"task": {
"$ref": "org.sagebionetworks.repo.model.curation.CurationTask",
"description": "The configuration and metadata of the task."
},
"status": {
"$ref": "org.sagebionetworks.repo.model.curation.TaskStatus",
"description": "The dynamic lifecycle state, including execution details and concurrency etag."
}
}
}
Extended Model Objects
These are existing model objects that have been extended.
ListCurationTaskRequest
{
"description": "Request for a single page of CurationTasks with optional filtering.",
"properties": {
"projectId": {
"type": "string",
"description": "Optional. The synId of the project. If omitted, results are aggregated across all projects where the caller has READ access."
},
"assigneeIds": {
"type": "array",
"items": {
"type": "string"
},
"description": "Optional. Filter tasks assigned to specific users or teams. Cannot be combined with assignedToMe."
},
"assignedToMe": {
"type": "boolean",
"description": "Optional. When true, filter to tasks assigned to the caller or any team the caller belongs to. Cannot be combined with assigneeIds."
},
"stateFilter": {
"type": "array",
"items": {
"$ref": "org.sagebionetworks.repo.model.curation.TaskState"
},
"description": "Optional. Filter tasks by their current state."
},
"nextPageToken": {
"type": "string",
"description": "Forward the returned 'nextPageToken' to get the next page of results."
}
}
}
ListCurationTaskResponse
{
"description": "A single page of CurationTasks.",
"properties": {
"page": {
"type": "array",
"items": {
"$ref": "org.sagebionetworks.repo.model.curation.CurationTask"
},
"description": "A list of task definitions only. Use 'bundlePage' for task status info."
},
"bundlePage": {
"type": "array",
"items": {
"$ref": "org.sagebionetworks.repo.model.curation.TaskBundle"
},
"description": "A list of task bundles containing both the definition and the current status."
},
"nextPageToken": {
"type": "string",
"description": "Forward this token to get the next page of results."
}
}
}
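Clients that consume bundlePage must do two things: dispatch on concreteType to read execution details, and follow nextPageToken across pages. The following is a minimal sketch built on two assumptions. First, the concrete types share the org.sagebionetworks.repo.model.curation package used by the $ref values above. Second, post is a hypothetical callable that sends a ListCurationTaskRequest to the list endpoint and returns the parsed ListCurationTaskResponse. Unknown fields are ignored so that new properties do not break older clients. Unknown concrete types raise an error.

```python
from dataclasses import dataclass, fields
from typing import Callable, Iterator, Optional, Tuple, Union

# Assumed package for the concrete types (matches the $ref values above).
PKG = "org.sagebionetworks.repo.model.curation."


@dataclass
class GridExecutionDetails:
    activeSessionId: Optional[str] = None


@dataclass
class UploadExecutionDetails:
    fileCount: Optional[int] = None
    totalBytesUploaded: Optional[int] = None


Details = Union[GridExecutionDetails, UploadExecutionDetails]
_REGISTRY = {
    PKG + "GridExecutionDetails": GridExecutionDetails,
    PKG + "UploadExecutionDetails": UploadExecutionDetails,
}


def parse_details(raw: Optional[dict]) -> Optional[Details]:
    """Map a TaskExecutionDetails JSON object to its concrete class."""
    if raw is None:
        return None  # executionDetails is null when no details exist
    cls = _REGISTRY.get(raw.get("concreteType"))
    if cls is None:
        raise ValueError(f"unsupported concreteType: {raw.get('concreteType')!r}")
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in raw.items() if k in known})


def iter_bundles(
    post: Callable[[dict], dict], request: dict
) -> Iterator[Tuple[dict, Optional[str], Optional[Details]]]:
    """Yield (task, state, details) for every bundle on every page."""
    body = dict(request)
    while True:
        response = post(body)
        for bundle in response.get("bundlePage") or []:
            status = bundle.get("status") or {}
            yield (
                bundle["task"],
                status.get("state"),
                parse_details(status.get("executionDetails")),
            )
        token = response.get("nextPageToken")
        if not token:
            return
        body["nextPageToken"] = token  # forward the returned token unchanged
```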