Last night and tonight, we noticed two bursts of messages on the queue (~12M and ~7M).
Note: the quick drop in the number of visible messages is us purging the queue.
Collected some msgs on the queue:
Here is the order of events that cause this issue:
1. A user runs a query against a view. If the view is currently 'available', a check is made to determine whether the view is up-to-date. If the view is out-of-date, the view state is changed to 'processing' to trigger a rebuild, and the query message is returned to its queue. We also send all containers from the view's scope to the ENTITY_REPLICATION_RECONCILIATION queue.
2. The view worker responds to the update message and updates the view, then changes its state back to 'available'.
3. The view query message is again pulled from the query queue, jumping back to step one.
If there are any changes to any files in the view during the above process, it will trigger the above cycle to repeat. Each time the cycle repeats, all of the containers from the view's scope get pushed back to the ENTITY_REPLICATION_RECONCILIATION queue. The entity replication reconciliation process is expensive and slow, so the above cycle is adding messages to the queue faster than the worker is processing the messages.
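The backlog growth described above can be illustrated with a rough simulation. This is only a sketch: the function name `simulate_backlog` and the numbers used are hypothetical, chosen to show the shape of the problem rather than the actual queue rates.

```python
def simulate_backlog(cycles, containers_in_scope, processed_per_cycle):
    """Return the reconciliation queue depth after each rebuild cycle.

    Hypothetical model: every cycle re-sends the whole view scope, and the
    worker drains a fixed number of messages before the next cycle starts.
    """
    depth = 0
    history = []
    for _ in range(cycles):
        depth += containers_in_scope                 # whole scope pushed again
        depth = max(0, depth - processed_per_cycle)  # slow worker drains some
        history.append(depth)
    return history


# A scope of 1000 containers with a worker that handles 200 per cycle.
print(simulate_backlog(cycles=5, containers_in_scope=1000, processed_per_cycle=200))
# → [800, 1600, 2400, 3200, 4000]
```

As long as the scope is larger than what the worker can process per cycle, the depth grows linearly with the number of cycles, which is consistent with the bursts we saw.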
In the above cycle the view query message is the cycle driver. With our fix for PLFM-5954, we will be breaking the cycle because we will no longer check if a view is up-to-date before running the view query. Instead, if the view is available, the query will be run.
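The difference between the old and new query handling can be sketched as follows. This is a simplified model, not the actual worker code: `handle_query`, the `view` dictionary, and the `check_up_to_date` flag are all hypothetical names, and returning the message to the queue when the view is not 'available' is an assumption.

```python
from collections import deque


def handle_query(view, query_queue, reconciliation_queue, check_up_to_date):
    """Process one query message; return True if the query was run."""
    message = query_queue.popleft()
    if view["state"] != "available":
        # Assumption: a view that is still rebuilding makes the query wait.
        query_queue.append(message)
        return False
    if check_up_to_date and not view["up_to_date"]:
        # Old behavior: trigger a rebuild, re-send the whole scope,
        # and put the query message back on its queue.
        view["state"] = "processing"
        reconciliation_queue.extend(view["containers"])
        query_queue.append(message)
        return False
    # New behavior: an available view is queried immediately.
    return True


def demo(check_up_to_date):
    """Run one out-of-date query and report (ran, queries left, reconciliation msgs)."""
    view = {"state": "available", "up_to_date": False, "containers": ["c1", "c2", "c3"]}
    queries, reconciliation = deque(["q1"]), deque()
    ran = handle_query(view, queries, reconciliation, check_up_to_date)
    return ran, len(queries), len(reconciliation)


print(demo(check_up_to_date=True))   # old: query requeued, scope re-sent
print(demo(check_up_to_date=False))  # new: query runs, nothing re-sent
```

With the up-to-date check removed, the query message no longer returns to the queue for an available view, so it cannot drive repeated pushes to the reconciliation queue.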
With the fix for we break the cycle that pushes messages to the reconciliation queue by immediately running the query instead of putting the query message back on the queue.
Looks good. Verified fixed in 294.0*.