Large number of messages on prod-291-ENTITY_REPLICATION_RECONCILIATION queue

Description

Last night and tonight, we noticed two bursts of messages on the queue (~12M and ~7M).


Note: the quick drop in the number of visible messages is us purging the queue.

Environment

None

Activity

Xavier Schildwachter
January 10, 2020, 3:08 AM

Collected some msgs on the queue:

John Hill
January 10, 2020, 8:54 PM
Edited

Here is the order of events that causes this issue:

  1. A user runs a query against a view. If the view is currently 'available', a check is made to determine whether the view is up-to-date. If the view is out-of-date, the view state is changed to 'processing' to trigger a rebuild, and the query message is returned to its queue. We also send all containers from the view's scope to the ENTITY_REPLICATION_RECONCILIATION queue.

  2. The view worker responds to the update message and updates the view, then changes its state back to 'available'.

  3. The view query message is again pulled from the query queue, jumping back to step one.

If any file in the view changes during the above process, the cycle repeats. Each time the cycle repeats, all of the containers from the view's scope get pushed back to the ENTITY_REPLICATION_RECONCILIATION queue. The entity replication reconciliation process is expensive and slow, so the cycle adds messages to the queue faster than the worker can process them.
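The feedback loop described above can be sketched as a toy simulation. This is not Synapse code: the function name, the parameters, and the numbers are all hypothetical and chosen only to show why the backlog grows when each cycle enqueues more messages than the worker drains.

```python
from collections import deque


def simulate_cycle(num_containers, query_retries, worker_rate):
    """Toy model of the query/rebuild cycle (hypothetical names and numbers).

    num_containers: containers in the view's scope, enqueued on every cycle.
    query_retries:  how many times the query message re-enters step one.
    worker_rate:    messages the reconciliation worker drains per cycle.
    Returns the queue backlog measured at the end of each cycle.
    """
    reconciliation_queue = deque()
    backlog_after_each_cycle = []
    for _ in range(query_retries):
        # Step 1: the view is found out-of-date, so every container in
        # its scope is pushed onto the reconciliation queue.
        reconciliation_queue.extend(range(num_containers))
        # Steps 2-3: while the view rebuilds and the query is retried,
        # the slow worker only drains a limited number of messages.
        for _ in range(min(worker_rate, len(reconciliation_queue))):
            reconciliation_queue.popleft()
        backlog_after_each_cycle.append(len(reconciliation_queue))
    return backlog_after_each_cycle


print(simulate_cycle(num_containers=1000, query_retries=5, worker_rate=100))
# prints [900, 1800, 2700, 3600, 4500]
```

With these assumed rates the backlog grows by 900 messages per cycle, which is the same shape of growth as the bursts reported on the queue, only at a much smaller scale.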

John Hill
January 10, 2020, 9:00 PM

In the above cycle the view query message is the cycle driver. With our fix for PLFM-5954, we will be breaking the cycle because we will no longer check if a view is up-to-date before running the view query. Instead, if the view is available, the query will be run.
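A minimal sketch of the change in query handling, assuming a simplified view model. The `View` class, both handler functions, and the returned strings are hypothetical illustrations, not the actual worker implementation. What happens when the view is not 'available' is also an assumption (the message is simply requeued).

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical, simplified stand-in for a view."""
    state: str = "available"          # 'available' or 'processing'
    up_to_date: bool = True
    containers: list = field(default_factory=list)


def handle_query_before_fix(view, reconciliation_queue):
    """Old behaviour as described in this ticket: the staleness check
    can trigger a rebuild and re-enqueue every container in scope."""
    if view.state != "available":
        return "requeued"             # assumption: wait for the rebuild
    if not view.up_to_date:
        view.state = "processing"     # triggers the rebuild
        reconciliation_queue.extend(view.containers)
        return "requeued"             # query message goes back on its queue
    return "executed"


def handle_query_after_fix(view, reconciliation_queue):
    """Behaviour after the fix: an available view is queried directly,
    with no up-to-date check, so the query no longer drives the cycle."""
    if view.state == "available":
        return "executed"
    return "requeued"                 # assumption: wait for the rebuild


stale = View(up_to_date=False, containers=["c1", "c2", "c3"])
queue_before, queue_after = [], []
print(handle_query_before_fix(stale, queue_before), len(queue_before))

stale = View(up_to_date=False, containers=["c1", "c2", "c3"])
print(handle_query_after_fix(stale, queue_after), len(queue_after))
```

In this sketch the old handler requeues the query and pushes three containers, while the new handler runs the query and leaves the reconciliation queue untouched.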

John Hill
January 17, 2020, 1:03 AM

With the fix, we break the cycle that pushes messages to the reconciliation queue by immediately running the query instead of putting the query message back on the queue.

Xavier Schildwachter
February 3, 2020, 7:46 AM

Looks good. Verified fixed in 294.0*.

Fixed

Assignee

John Hill

Reporter

Xavier Schildwachter

Labels

None

Validator

Xavier Schildwachter

Development Area

Synapse Core Infrastructure

Release Version History

None

Components

Fix versions

Priority

Major