RecordSet Synchronization

This document describes proposed Synapse functionality to support the scenarios described by PLFM-9608.

Specifically, we aim to support updating the columns in a GridSession sourced from a RecordSet after that RecordSet's bound JSON Schema is changed or updated. This proposal does so by adding comprehensive synchronization support, similar to the synchronization added for EntityView-based grid sessions in PLFM-9356.

New Functionality

The GridSession object is extended to store the RecordSet's revision (version) number. This value is set when a GridSession is created from a RecordSet, and updated each time the 'pull' algorithm below completes.

{
  "description": "Basic information about a grid session.",
  "properties": {
    // ...existing properties
    "sourceEntityVersionNumber": {
      "type": "integer",
      "description": "If referencing a RecordSet (CSV), this represents the version number of the source entity at the time this grid was created or last synchronized. This reference enables the system to identify if rows were deleted in a GridSession."
    }
  }
}

While this document does not propose implementing synchronization for grids created from a TableEntity, such a feature could re-use the sourceEntityVersionNumber property.

Pull

  1. Verify all rows in the grid have a non-null upsertKey, and that the upsertKey is unique for each row.

  2. Columns are reconciled with the schema currently bound to the RecordSet.

    1. If the specific, immutable JSON Schema $id of the GridSession matches the schema bound to the RecordSet, this step can be skipped.

    2. Add any columns that are in the new schema but missing from the grid.

  3. Merge data from the latest revision into the grid. Each row will be individually considered using the following algorithm.

    • Rows are matched using the upsertKey.

    • If the current revision matches the last synced revision, data does not need to be pulled.

    • If the current revision is newer than the last synced revision, and the row exists in both the grid and the latest revision, each cell in the row is compared. If the grid cell was modified (CRDT attribution), then the modification will be preserved and re-attributed to the system. If the grid cell was not modified, the value in the latest revision will be used (if different).

    • If the current revision is newer than the last synced revision, and the row exists in the current revision but not the grid, then we must load the last-synced revision to determine whether the row was added in a recent revision or was deleted by the user. If the row does not exist in the last-synced revision, it is added to the grid. If it does exist there, we consider it to have been removed in the current grid session, and it will not be added back to the grid.

  4. Update the GridSession's sourceEntityVersionNumber and gridJsonSchema$Id. If the JSON Schema changed, or did not reference a specific immutable version, clear all validation information and trigger re-validation for every row.

  5. Re-attribute all modified cells in the GridSession to the system so they no longer appear to be user-modified. This ensures that future merges are performed accurately.
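
Steps 1 and 3 above can be sketched as follows. This is a minimal illustration, not the Synapse implementation. It assumes rows are plain dictionaries, that the upsertKey is a tuple of values taken from a list of key columns, and that the set of user-modified columns for a row is already known (in the real system this would come from CRDT attribution). The names `upsert_key`, `validate_upsert_keys`, and `merge_row` are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def upsert_key(row: Row, key_columns: List[str]) -> Key:
    """Build the (assumed) composite upsertKey for a row."""
    return tuple(row.get(column) for column in key_columns)


def validate_upsert_keys(rows: Iterable[Row], key_columns: List[str]) -> None:
    """Step 1: every grid row needs a non-null, unique upsertKey."""
    seen: Set[Key] = set()
    for row in rows:
        key = upsert_key(row, key_columns)
        if any(part is None for part in key):
            raise ValueError(f"Row has a null upsertKey component: {key}")
        if key in seen:
            raise ValueError(f"Duplicate upsertKey: {key}")
        seen.add(key)


def merge_row(grid_row: Row, user_modified: Set[str], latest_row: Row) -> Row:
    """Step 3, for a row present in both the grid and the latest revision.

    Cells the user modified are preserved; every other cell takes the
    value from the latest revision (including columns new to the grid).
    """
    merged = dict(grid_row)
    for column, latest_value in latest_row.items():
        if column not in user_modified:
            merged[column] = latest_value
    return merged
```

For example, merging a grid row whose `a` cell was edited by the user keeps that edit while `b` and a new column `c` are taken from the latest revision.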

Truth table for a row-wise merge algorithm:

| Row is in Grid | Grid Row includes user changes | Row is in Synced Revision | Row is in New Revision | Variant Description | Outcome | Potential Data Loss Scenario |
|---|---|---|---|---|---|---|
| Yes | Yes | No | No | New row was added to grid | No change to grid | |
| No | - | Yes | No | Row was removed from grid and new revision | No change to grid | |
| No | - | No | Yes | Row was added to revision | Add new row to grid | |
| Yes | No | Yes | No | Row was removed by a new revision | Remove row from grid | |
| Yes | Yes | Yes | No | Row was changed in grid, removed by a new revision | No change to grid (next push will add this row back) | |
| Yes | Yes | No | Yes | Row was added in grid and in a recent revision. | Current row data that matches the new schema is maintained. If the latest revision contains any new columns, that row's data will be added to the grid. | ⚠️ |
| No | - | Yes | Yes | Row was removed from grid | No change to grid (next push will remove it from the CSV) | ⚠️ |
| Yes | No | Yes | Yes | Row is in all 3 | Pull any changes from latest revision | |
| Yes | Yes | Yes | Yes | Row is in all 3, grid has changes by a user | Keep all cell changes applied by a user or agent. Any data in cells that were not changed by a user or agent will be updated to the current revision's value. If the latest revision contains any new columns, that row's data will be added to the grid. | ⚠️ |

Any grid row not changed by the user necessarily exists in the synced revision; combinations violating this are unreachable and omitted.
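
The truth table can also be read as a small decision function. The sketch below is one possible encoding, assuming four boolean inputs per upsertKey; the `Action` enum and the `decide` function are hypothetical names used only for illustration. A row that was added in both the grid and a recent revision is treated as a cell-level merge, on the assumption that every cell of a user-added row counts as user-modified.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NO_CHANGE = "no change to grid"
    ADD_ROW = "add row to grid"
    REMOVE_ROW = "remove row from grid"
    PULL_ROW = "pull changes from latest revision"
    MERGE_CELLS = "keep user-modified cells, pull the rest"


def decide(in_grid: bool, user_changed: Optional[bool],
           in_synced: bool, in_new: bool) -> Action:
    """Pick the pull outcome for one upsertKey."""
    if not in_grid:
        # A row absent from the grid is only added when it first
        # appears in the new revision; otherwise the user removed it.
        if in_new and not in_synced:
            return Action.ADD_ROW
        return Action.NO_CHANGE
    if not user_changed and not in_synced:
        # An unchanged grid row must have come from the synced revision.
        raise ValueError("unreachable combination")
    if not in_new:
        # The new revision dropped the row: keep it only if the user touched it.
        return Action.NO_CHANGE if user_changed else Action.REMOVE_ROW
    return Action.MERGE_CELLS if user_changed else Action.PULL_ROW
```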

Additionally:

  • A change to a row's upsertKey value is indistinguishable from, and handled as, a deletion of the old-key row plus an addition of the new-key row; it therefore reduces to the cases above.

  • If a RecordSet's bound schema is the same, but does not reference a specific (immutable) version of the JSON Schema, columns will still be reconciled against the current schema, and all grid rows will be re-validated.
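
The column reconciliation rule, including the unpinned-schema case in the last bullet, might look like the following sketch. It assumes the caller already knows whether the bound $id references a specific immutable version (passed in as `bound_id_is_pinned`), and it only adds columns, since that is all the pull steps describe. The function name and the example $id values are hypothetical.

```python
from typing import List, Tuple


def reconcile_columns(
    grid_columns: List[str],
    schema_columns: List[str],
    grid_schema_id: str,
    bound_schema_id: str,
    bound_id_is_pinned: bool,
) -> Tuple[List[str], bool]:
    """Return (columns after reconciliation, whether rows need re-validation)."""
    if bound_id_is_pinned and grid_schema_id == bound_schema_id:
        # Same immutable schema version: nothing to reconcile.
        return list(grid_columns), False
    # Schema changed, or is not pinned to an immutable version:
    # append any schema columns the grid does not have yet.
    missing = [c for c in schema_columns if c not in grid_columns]
    return list(grid_columns) + missing, True


# Hypothetical $id values, for illustration only.
columns, revalidate = reconcile_columns(
    ["id", "name"],
    ["id", "name", "age"],
    "org.example-Patient-1.0.0",
    "org.example-Patient-1.1.0",
    bound_id_is_pinned=True,
)
```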

Push

The 'push' step behavior is similar to the existing RecordSet export service, creating a new version of the source RecordSet with a new CSV containing the grid data. Push should be blocked if the last-synced revision does not match the latest revision. Once the 'push' step completes, the system will update the GridSession with the new sourceEntityVersionNumber.
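
A minimal sketch of the push guard described above, assuming the latest RecordSet version number is available to the caller. `GridSessionState`, `StaleGridError`, and the `create_version` callback (a stand-in for the export that writes the CSV and returns the new version number) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GridSessionState:
    source_entity_version_number: int


class StaleGridError(Exception):
    """Raised when the grid has not been synced with the latest revision."""


def push(grid: GridSessionState, latest_version: int,
         create_version: Callable[[], int]) -> int:
    """Block the push if the grid is stale, else record the new version."""
    if grid.source_entity_version_number != latest_version:
        raise StaleGridError(
            f"Grid synced at version {grid.source_entity_version_number}, "
            f"but the latest version is {latest_version}; pull first."
        )
    new_version = create_version()
    grid.source_entity_version_number = new_version
    return new_version
```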

API Changes

org.sagebionetworks.repo.model.grid.SynchronizeGridRequest is updated to support separating PULL/PULL_PUSH logic. EntityViews currently do a PULL_PUSH; implementing PULL only for Views is feasible but out of scope for this design.

{
  "description": "Start a new job to synchronize a grid session with its source data.",
  "implements": [
    {
      "$ref": "org.sagebionetworks.repo.model.asynch.AsynchronousRequestBody"
    }
  ],
  "properties": {
    "gridSessionId": {
      "type": "string",
      "description": "The ID of the grid session to synchronize."
    },
    "syncType": {
      "type": "string",
      "enum": ["PULL", "PULL_PUSH"],
      "description": "The type of synchronization to perform. For both values, the grid will be updated. If \"PULL_PUSH\", the referenced entities (EntityView-based grids) or the source RecordSet (RecordSet-based grids) will be updated after updating the grid with the latest data. Default is \"PULL_PUSH\". Currently, \"PULL\" is only supported for RecordSet-based GridSessions."
    }
  }
}
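
As an illustration of the schema above, a client might assemble the request body as follows. This sketch only builds and checks the JSON payload; it does not call any Synapse endpoint. The helper name `build_sync_request` and the `source_is_record_set` flag are hypothetical.

```python
import json
from typing import Optional

SYNC_TYPES = ("PULL", "PULL_PUSH")


def build_sync_request(grid_session_id: str,
                       sync_type: Optional[str] = None,
                       source_is_record_set: bool = True) -> str:
    """Serialize a SynchronizeGridRequest body, defaulting to PULL_PUSH."""
    sync_type = sync_type or "PULL_PUSH"
    if sync_type not in SYNC_TYPES:
        raise ValueError(f"Unknown syncType: {sync_type}")
    if sync_type == "PULL" and not source_is_record_set:
        # PULL is currently only supported for RecordSet-based grids.
        raise ValueError("PULL is only supported for RecordSet-based GridSessions")
    return json.dumps({"gridSessionId": grid_session_id, "syncType": sync_type})
```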

Comments

Nick Grosenbacher
June 16, 2026

It is not guaranteed that a newer RecordSet version follows the upsertKey rule, so we also need to handle importing rows from the CSV that have an incomplete upsertKey

Lingling Peng
June 22, 2026

Currently, the Python client also doesn’t have a way to handle incomplete upsertKey: https://python-docs.synapse.org/en/stable/reference/experimental/async/curator/?h=import+csv#synapseclient.models.Grid.import_csv_async . Do you also plan to add how we should handle incomplete upsert key when importing CSV in this design?