Ark: Synchronise handling of LSP messages #2999

Closed
lionel- opened this issue May 3, 2024 · 1 comment · Fixed by posit-dev/ark#361
Labels: area: kernels (Issues related to Jupyter kernels and LSP servers) · bug (Something isn't working) · lang: r

Comments

lionel- (Contributor) commented May 3, 2024

While investigating #2692, @DavisVaughan discovered this thread discussing synchronisation issues within tower-lsp, the LSP server implementation used in Ark: ebkalderon/tower-lsp#284.

The summary is that although tower-lsp itself always delivers messages in order, the handlers are async, so that ordering can be lost as soon as a handler yields with .await. And we call .await a lot, for instance at handler entry to log a message in our LSP output. This causes two main issues:

  1. Stale state. A state-changing did_change notification might be invoked and grab the document lock before a refactoring/analysis request has had a chance to read state. When control is yielded back to that request, the world has suddenly changed and the cursor positions or ranges the request is working on have become stale. We think this explains crashes like #2692 (Ark: LSP crash while looking for completions or creating new document context).

  2. Unordered responses. The LSP server should try its best to preserve ordering of messages, and it's even required to do so when the messages might change the document. Because of the concurrency of async handlers, this ordering is not guaranteed, causing the server and client states to drift apart.

To fix (1), we should synchronise our handlers on entry with a read-write lock. To fix (2), we should synchronise them on exit with some sort of queue.
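To make (1) concrete, here is a minimal sketch of the entry-synchronisation idea using tokio's async RwLock. This is illustrative only, not the actual design of posit-dev/ark#361, and the `World`/`Backend` types are made up for the example: state-changing notifications take the write half of the lock for their whole duration, read-only requests take the read half, so a did_change can no longer mutate the state underneath a request that has already started reading it.

```rust
// Minimal sketch of handler-entry synchronisation (illustrative only, not the
// actual fix in posit-dev/ark#361). `World` and `Backend` are made-up types.
use tokio::sync::RwLock;

struct World {
    documents: Vec<String>, // stand-in for the real workspace state
}

struct Backend {
    world: RwLock<World>,
}

impl Backend {
    // State-changing notification: holds the write half for its whole
    // duration, so no request can observe a half-applied change.
    async fn did_change(&self, new_text: String) {
        let mut world = self.world.write().await;
        world.documents.push(new_text);
    }

    // Read-only request: holds the read half for its whole duration, so the
    // state it computes on cannot be mutated underneath it mid-handler.
    async fn completion_count(&self) -> usize {
        let world = self.world.read().await;
        world.documents.len()
    }
}

#[tokio::main]
async fn main() {
    let backend = Backend { world: RwLock::new(World { documents: Vec::new() }) };
    backend.did_change("fn f() {}".to_string()).await;
    println!("{} document(s)", backend.completion_count().await);
}
```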

This comment from the author of rust-analyzer is also relevant: https://reddit.com/r/rust/comments/tbsiv5/towerlsp_0160_lightweight_framework_for_building/i0exnbi/

while most of the requests can be handled asynchronously, a minority of requests are "real time", in that they block user's input (eg, selectionRanges). For such requests, you want to handle them "on the main loop" as soon as possible, like you'd do for state update notifications.

Worth noting that the ruff project recently decided (astral-sh/ruff#10158) to switch from tower-lsp to lsp-server (https://github.com/rust-lang/rust-analyzer/tree/master/lib/lsp-server), the framework created for rust-analyzer. This seems like a lot of work but allows greater control over the scheduling of handlers and avoids the tokio runtime.
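For contrast, the lsp-server model is a plain main loop owned by the server: messages are pulled off a channel and dispatched in arrival order, and nothing yields implicitly. Below is a rough, simplified sketch of that shape, recalled from the lsp-server crate rather than taken from ruff or rust-analyzer; the initialize handshake and shutdown handling are omitted and the didChange handling is a placeholder.

```rust
// Rough sketch of the lsp-server dispatch model (simplified; recalled from the
// rust-lang/rust-analyzer `lsp-server` crate, not taken from ruff or Ark).
// One loop owns the state and drains messages in arrival order; nothing yields
// implicitly, so ordering and scheduling stay under the server's control.
use lsp_server::{Connection, Message, Response};

fn main_loop() {
    let (connection, io_threads) = Connection::stdio();
    let mut document = String::new(); // stand-in for real server state

    for msg in &connection.receiver {
        match msg {
            Message::Notification(not) if not.method == "textDocument/didChange" => {
                // State updates are applied immediately, in order, on this thread.
                document = not.params.to_string();
            }
            Message::Request(req) => {
                // "Real time" requests can be answered right here; long-running
                // ones could be shipped off to a thread pool and answered later.
                let result = serde_json::json!({ "docLen": document.len() });
                let resp = Response::new_ok(req.id, result);
                connection.sender.send(Message::Response(resp)).unwrap();
            }
            _ => {}
        }
    }
    io_threads.join().unwrap();
}
```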

DavisVaughan (Contributor) commented May 3, 2024

Based on our new understanding of all this, I did another analysis of how we could possibly be getting supposedly "out of order" did_change() requests, which is what forced me to put that pending changes queue in place. I think it can be described as the following series of events (a runnable sketch of this interleaving follows the list):

  • Client sends did-change request A
  • Client sends did-change request B
  • tower-lsp forwards did-change request A
  • tower-lsp forwards did-change request B
  • Everything is fine up to this point
  • did_change(A)
    • Immediately calls backend_log() before locking the doc
    • await the backend_log()
  • The await allows for switching to did_change(B) immediately and we pause did_change(A)
  • did_change(B)
    • Immediately calls backend_log() before locking the doc
    • await the backend_log()
    • Assume we stay in did_change(B) this time instead of switching back to did_change(A)
    • Takes the lock on the doc
    • Tries to process the change, but it looks out of order! Saves B as a pending change.
  • did_change(A) finally gets control back
    • We process change A and the saved one from B
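Here's a minimal, self-contained sketch of that interleaving. This is illustrative code, not Ark's handlers: backend_log is a stand-in, and the artificial sleep durations just model the entry await taking a variable amount of time.

```rust
// Illustrative sketch of the interleaving above (not Ark's code). The sleeps
// stand in for the awaited `backend_log()` call taking a variable amount of
// time; everything else mirrors the shape of the handlers.
use std::time::Duration;
use tokio::sync::Mutex;

// Stand-in for the awaited logging call at handler entry.
async fn backend_log(msg: &str, delay: Duration) {
    println!("{msg}");
    tokio::time::sleep(delay).await;
}

async fn did_change(doc: &Mutex<Vec<u32>>, version: u32, log_delay: Duration) {
    // Yields *before* the document lock is taken, so another handler can run
    // (and even finish) in the meantime.
    backend_log(&format!("did_change({version}) entered"), log_delay).await;
    doc.lock().await.push(version);
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let doc = Mutex::new(Vec::new());
    // Change 1 is delivered before change 2, but handler 1 parks longer on its
    // entry await, so change 2 reaches the document first.
    tokio::join!(
        did_change(&doc, 1, Duration::from_millis(20)),
        did_change(&doc, 2, Duration::from_millis(1)),
    );
    println!("applied order: {:?}", *doc.lock().await); // prints [2, 1]
}
```

Without the pending changes queue, change B would be applied to a document that hasn't seen change A yet; with it, B just parks its change until A catches up.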

It's worth noting that if we had locked the doc before calling backend_log(), which is a totally reasonable thing to think you could do, this scenario would deadlock (something we have seen many times in other scenarios): when execution switches to B and B tries to take the lock on the doc, it can't acquire it until A releases it, but A can't release it until it gets control back. We are super careful about this in the diagnostics code, for example.
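And a sketch of that deadlock variant, again illustrative rather than Ark's code: the blocking lock is taken before the awaited call, so once A yields while holding it, B blocks the executor thread inside lock() and A never gets to run again and release it. This is easiest to reproduce on a single-threaded runtime, but holding a guard across an .await is a hazard on any runtime.

```rust
// Illustrative sketch of the deadlock variant (not Ark's code): the blocking
// lock is taken *before* the awaited call.
use std::sync::Mutex;

async fn did_change_deadlocking(doc: &Mutex<Vec<u32>>, version: u32) {
    let mut guard = doc.lock().unwrap(); // blocking lock taken before the await
    tokio::task::yield_now().await;      // stand-in for `backend_log().await`
    guard.push(version);
}

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let doc = Mutex::new(Vec::new());
    // A yields while still holding the lock; B then blocks the only executor
    // thread inside `lock()`, so A never resumes to release it: this hangs.
    tokio::join!(
        did_change_deadlocking(&doc, 1),
        did_change_deadlocking(&doc, 2),
    );
    println!("never reached");
}
```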

lionel- added the area: kernels (Issues related to Jupyter kernels and LSP servers) label on May 16, 2024
github-actions bot locked this issue as resolved and limited conversation to collaborators on Aug 13, 2024