Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature Request: UTF-8 support #3344

Open
axelson opened this issue Feb 5, 2022 · 11 comments
Open

Feature Request: UTF-8 support #3344

axelson opened this issue Feb 5, 2022 · 11 comments

Comments

@axelson
Copy link
Contributor

axelson commented Feb 5, 2022

It seems likely that a future version of the Language Server Protocol will allow servers and clients to support UTF-8 instead of UTF-16:
microsoft/language-server-protocol#376 (comment)

I'd like to request that lsp-mode support UTF-8 as an encoding. One benefit is to avoid #2080 and it will also hopefully have a speed benefit when compared to full UTF-16 support.

Rust Analyzer LSP Server has already implemented their own version of UTF-8 negotiation although I don't have a link handy.

@yyoncho
Copy link
Member

yyoncho commented Feb 5, 2022

ATM we run using utf-8. we should fix #2080 in a way to allow using utf8 when the server supports that.

@ericdallo
Copy link
Member

I'd love to implement that on clojure-lsp side, although have no clue what would be like the necessary changes

@ddickstein
Copy link

ATM we run using utf-8

@yyoncho did you mean to say UTF-16 here?

@matklad
Copy link
Contributor

matklad commented Nov 19, 2022

Specific improvement here would be to add "positionEncodings": ["utf-8", "utf-16"] somewhere here:

lsp-mode/lsp-mode.el

Lines 3530 to 3625 in b4e8aac

(defun lsp--client-capabilities (&optional custom-capabilities)
"Return the client capabilities appending CUSTOM-CAPABILITIES."
(append
`((workspace . ((workspaceEdit . ((documentChanges . t)
(resourceOperations . ["create" "rename" "delete"])))
(applyEdit . t)
(symbol . ((symbolKind . ((valueSet . ,(apply 'vector (number-sequence 1 26)))))))
(executeCommand . ((dynamicRegistration . :json-false)))
,@(when lsp-enable-file-watchers '((didChangeWatchedFiles . ((dynamicRegistration . t)))))
(workspaceFolders . t)
(configuration . t)
,@(when lsp-semantic-tokens-enable
`((semanticTokens . ((refreshSupport . ,(or (and (boundp 'lsp-semantic-tokens-honor-refresh-requests)
lsp-semantic-tokens-honor-refresh-requests)
:json-false))))))
,@(when lsp-lens-enable '((codeLens . ((refreshSupport . t)))))
(fileOperations . ((didCreate . :json-false)
(willCreate . :json-false)
(didRename . t)
(willRename . t)
(didDelete . :json-false)
(willDelete . :json-false)))))
(textDocument . ((declaration . ((dynamicRegistration . t)
(linkSupport . t)))
(definition . ((dynamicRegistration . t)
(linkSupport . t)))
(references . ((dynamicRegistration . t)))
(implementation . ((dynamicRegistration . t)
(linkSupport . t)))
(typeDefinition . ((dynamicRegistration . t)
(linkSupport . t)))
(synchronization . ((willSave . t) (didSave . t) (willSaveWaitUntil . t)))
(documentSymbol . ((symbolKind . ((valueSet . ,(apply 'vector (number-sequence 1 26)))))
(hierarchicalDocumentSymbolSupport . t)))
(formatting . ((dynamicRegistration . t)))
(rangeFormatting . ((dynamicRegistration . t)))
(onTypeFormatting . ((dynamicRegistration . t)))
,@(when (and lsp-semantic-tokens-enable
(functionp 'lsp--semantic-tokens-capabilities))
(lsp--semantic-tokens-capabilities))
(rename . ((dynamicRegistration . t) (prepareSupport . t)))
(codeAction . ((dynamicRegistration . t)
(isPreferredSupport . t)
(codeActionLiteralSupport . ((codeActionKind . ((valueSet . [""
"quickfix"
"refactor"
"refactor.extract"
"refactor.inline"
"refactor.rewrite"
"source"
"source.organizeImports"])))))
(resolveSupport . ((properties . ["edit" "command"])))
(dataSupport . t)))
(completion . ((completionItem . ((snippetSupport . ,(cond
((and lsp-enable-snippet (not (featurep 'yasnippet)) t)
(lsp--warn (concat
"Yasnippet is not installed, but `lsp-enable-snippet' is set to `t'. "
"You must either install yasnippet, or disable snippet support."))
:json-false)
(lsp-enable-snippet t)
(t :json-false)))
(documentationFormat . ["markdown" "plaintext"])
;; Remove this after jdtls support resolveSupport
(resolveAdditionalTextEditsSupport . t)
(insertReplaceSupport . t)
(deprecatedSupport . t)
(resolveSupport
. ((properties . ["documentation"
"detail"
"additionalTextEdits"
"command"])))
(insertTextModeSupport . ((valueSet . [1 2])))))
(contextSupport . t)
(dynamicRegistration . t)))
(signatureHelp . ((signatureInformation . ((parameterInformation . ((labelOffsetSupport . t)))))
(dynamicRegistration . t)))
(documentLink . ((dynamicRegistration . t)
(tooltipSupport . t)))
(hover . ((contentFormat . ["markdown" "plaintext"])
(dynamicRegistration . t)))
,@(when lsp-enable-folding
`((foldingRange . ((dynamicRegistration . t)
,@(when lsp-folding-range-limit
`((rangeLimit . ,lsp-folding-range-limit)))
,@(when lsp-folding-line-folding-only
`((lineFoldingOnly . t)))))))
(selectionRange . ((dynamicRegistration . t)))
(callHierarchy . ((dynamicRegistration . :json-false)))
(typeHierarchy . ((dynamicRegistration . t)))
(publishDiagnostics . ((relatedInformation . t)
(tagSupport . ((valueSet . [1 2])))
(versionSupport . t)))
(linkedEditingRange . ((dynamicRegistration . t)))))
(window . ((workDoneProgress . t)
(showDocument . ((support . t))))))
custom-capabilities))

Just signaling support for utf8 should be enough to solve the problems for servers which support utf8

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#positionEncodingKind

@teor2345
Copy link

teor2345 commented Feb 3, 2023

I just ran into this issue today, it still crashes rust-analyzer.

Can we get at least one of these fixes applied?

@ddickstein
Copy link

ddickstein commented Feb 3, 2023 via email

@lnicola
Copy link

lnicola commented Feb 10, 2023

@ddickstein I'm not a lsp-mode user, but my understanding is that yes, lsp-mode still sends UTF-8 offsets while the protocol mandates UTF-16. This has probably been reported dozens times as a rust-analyzer bug (because lsp-mode corrupts our state), but most likely affects other language servers just as well.

Advertising UTF-8 offset encoding might fix RA, but it's optional, so servers which don't implement it will still have problems. But it fixes our problem, it's better for performance, and the UTF-16 encoding can still be fixed in the future, maybe like in yyoncho@aac9b6a.

@hammerandtongs
Copy link

As probably noticed elsewhere, fasterthanli did a writeup of this issue -

https://fasterthanli.me/articles/the-bottom-emoji-breaks-rust-analyzer#the-language-server-protocol

@yyoncho I hope this is useful and not annoying, thanks for all your help over the years, lsp-mode has been awesome and I've never noticed this bug.

@yyoncho
Copy link
Member

yyoncho commented Feb 13, 2023

Can someone using the patch report if there are noticeable perf impact? If no - I will merge it as default option.

@ddickstein
Copy link

@yyoncho can you confirm @lnicola's reply above is correct (namely, that already UTF-8 is supported just not declared, and UTF-16 is declared but has a bug), and if so can UTF-8 support be declared and this issue closed and a separate issue be made to track the bug in UTF-16?

@lnicola
Copy link

lnicola commented Mar 29, 2023

It's incorrect. It seems that the "native" encoding of Emacs is UTF-32, which lsp-mode now advertises. So UTF-8 is still unimplemented, but that's fine from rust-analyzer's point of view because we now support UTF-32.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants