Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix resync remote devices on receive PDU in worker mode. #7815

Merged
merged 5 commits into from
Jul 10, 2020

Conversation

erikjohnston
Copy link
Member

The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.

The replication client requires that arguments are given as keyword
arguments, which was not done in this case. We also pull out the logic
so that we can catch and handle any exceptions raised, rather than
leaving them unhandled.
@erikjohnston erikjohnston requested a review from a team July 10, 2020 09:28
"""

try:
await self.store.mark_remote_user_device_cache_as_stale(sender)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this function will cause the next iteration of the resync retry loop (introduced in #7453) to retry this device list, which means that we can have 2 resync happen at the same time for a given user. This may be fine, but at least we should be aware of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. I think its still correct to go and fire off a resync, as a) we don't want to wait for the loop to come round necessarily, b) its safe and c) I don't really want to assume that the loop will come round quickly or even retry this particular user (it might have backoff logic etc)

Copy link
Contributor

@babolivier babolivier Jul 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it makes sense, again I think it's fine as long as we're aware of it :)
. o O ( we could avoid this issue by storing and checking an in-memory list of the users we're currently resyncing (i.e. update it in user_device_resync) but that's probably out of scope here ) ignore me, that still wouldn't work, and it's not that big a deal anyway.

@erikjohnston erikjohnston requested a review from a team July 10, 2020 10:04
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you give us some clues about the symptoms of this bug? was it throwing exceptions?

@@ -784,15 +784,23 @@ async def _process_received_pdu(
resync = True

if resync:
await self.store.mark_remote_user_device_cache_as_stale(event.sender)
run_in_background(self._resync_device, event.sender)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you're here, can you fix this to be run_as_background_process please.

return run_in_background(
self._device_list_updater.user_device_resync, event.sender
)
async def _resync_device(self, sender: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async def _resync_device(self, sender: str):
async def _resync_device(self, sender: str) -> None:

@erikjohnston
Copy link
Member Author

erikjohnston commented Jul 10, 2020

This produces the glorious stack trace of the following, I don't think it had any other observable behaviour.

2020-07-10 13:31:17,205 - twisted - 192 - CRITICAL -  - Unhandled error in Deferred:
2020-07-10 13:31:17,215 - twisted - 192 - CRITICAL -  - 
Capture point (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/synapse/src/synapse/app/federation_reader.py", line 24, in <module>
    start(sys.argv[1:])
  File "/home/synapse/src/synapse/app/generic_worker.py", line 1046, in start
    _base.start_worker_reactor("synapse-generic-worker", config)
  File "/home/synapse/src/synapse/app/_base.py", line 80, in start_worker_reactor
    run_command=run_command,
  File "/home/synapse/src/synapse/app/_base.py", line 140, in start_reactor
    daemon.start()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/daemonize.py", line 248, in start
    self.action(*privileged_action_result)
  File "/home/synapse/src/synapse/app/_base.py", line 117, in run
    run_command()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1283, in run
    self.mainLoop()
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 1292, in mainLoop
    self.runUntilCurrent()
  File "/home/synapse/src/synapse/metrics/__init__.py", line 517, in f
    ret = func(*args, **kwargs)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/base.py", line 886, in runUntilCurrent
    f(*a, **kw)
...
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/synapse/src/synapse/util/async_helpers.py", line 157, in _concurrently_execute_inner
    await maybe_awaitable(func(next(it)))
  File "/home/synapse/src/synapse/federation/federation_server.py", line 281, in process_pdus_for_room
    await self._handle_received_pdu(origin, pdu)
  File "/home/synapse/src/synapse/federation/federation_server.py", line 660, in _handle_received_pdu
    await self.handler.on_receive_pdu(origin, pdu, sent_to_us_directly=True)
  File "/home/synapse/src/synapse/handlers/federation.py", line 416, in on_receive_pdu
    await self._process_received_pdu(origin, pdu, state=state)
  File "/home/synapse/src/synapse/handlers/federation.py", line 791, in _process_received_pdu
    return run_in_background(self._user_device_resync, event.sender)
  File "/home/synapse/src/synapse/logging/context.py", line 695, in run_in_background
    res = f(*args, **kwargs)
  File "/home/synapse/src/synapse/logging/opentracing.py", line 745, in _trace_inner
    result = func(*args, **kwargs)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1613, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1529, in _cancellableInlineCallbacks
    _inlineCallbacks(None, g, status)
Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/synapse/env-py37/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
  File "/home/synapse/src/synapse/replication/http/_base.py", line 159, in send_request
    "Instance %r not in 'instance_map' config" % (instance_name,)
Exception: Instance '@witchent:zapdos.my-router.de' not in 'instance_map' config

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@erikjohnston erikjohnston merged commit f1245dc into develop Jul 10, 2020
@erikjohnston erikjohnston deleted the erikj/fix_worker_fderation_device_resync branch July 10, 2020 17:23
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit 'f1245dc3c':
  Fix resync remote devices on receive PDU in worker mode. (#7815)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants