Highlight confusable Unicode characters #136437

Closed
alexdima opened this issue Nov 4, 2021 · 3 comments · Fixed by #137508

Labels: feature-request (Request for new features or functionality), insiders-released (Patch has been released in VS Code Insiders), on-testplan


alexdima commented Nov 4, 2021

updated

UNTRUSTED WORKSPACES:

  • highlight non-basic-ASCII characters (on by default) ---> boolean
    • defaults to true in untrusted workspaces (false in trusted ones)
    • "editor.unicodeHighlight.nonBasicASCII"
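
A minimal sketch of what such a non-basic-ASCII scan might look like. The `HighlightRange` shape, the function names, and the exact definition of "basic ASCII" (tab, LF, CR and U+0020..U+007E) are assumptions made for illustration, not taken from the VS Code source.

```typescript
// Hypothetical shape for a highlighted span; not taken from the VS Code source.
interface HighlightRange {
  line: number;        // 1-based line number
  startColumn: number; // 1-based column, counted in UTF-16 code units
  endColumn: number;   // exclusive
  codePoint: number;
}

// Assumption: "basic ASCII" means tab, LF, CR and printable U+0020..U+007E.
function isBasicASCII(codePoint: number): boolean {
  return (
    codePoint === 0x09 ||
    codePoint === 0x0a ||
    codePoint === 0x0d ||
    (codePoint >= 0x20 && codePoint <= 0x7e)
  );
}

// Returns one range per code point that falls outside basic ASCII.
function findNonBasicASCII(text: string): HighlightRange[] {
  const ranges: HighlightRange[] = [];
  const lines = text.split(/\r\n|\r|\n/);
  lines.forEach((lineText, lineIndex) => {
    let i = 0;
    while (i < lineText.length) {
      const codePoint = lineText.codePointAt(i) ?? 0;
      // Code points above U+FFFF occupy two UTF-16 code units.
      const width = codePoint > 0xffff ? 2 : 1;
      if (!isBasicASCII(codePoint)) {
        ranges.push({
          line: lineIndex + 1,
          startColumn: i + 1,
          endColumn: i + 1 + width,
          codePoint,
        });
      }
      i += width;
    }
  });
  return ranges;
}
```

For example, scanning `"let a = 1;\nlet \u0430 = 2;"` would report a single range on line 2 covering the Cyrillic letter.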

TRUSTED WORKSPACES:

  • highlight unexpected invisible code points (on by default) ---> boolean

    • defaults to true
    • some smartness if they appear in emojis
    • "editor.unicodeHighlight.invisibleCharacters"
    • does not highlight characters that appear in the vscode-loc translation files of the current language.
  • highlight unexpected confusable code points ---> boolean (linting feature)

    • on the worker side, keep a set of dangerous characters (including Cyrillic!) and return all ranges that contain them
    • create a set of 1000 characters (e.g. the Cyrillic 'а', U+0430)
      • FIND false positives in repos such as nodejs/linux -- are 900 of them problematic, or 100, or 10?
      • EXCEPTIONS (remove things from this set ("false positives"))
      • never highlight comments or strings (filter on the UI side, needs some work with background tokenization?)
      • do not highlight characters if your display language contains them
      • or if your keyboard layout produced them
      • Intl (system defined preferences?), like env.LOCALE
      • intersect with vscode-loc
    • defaults to true except for markdown, plain text, latex
    • "editor.unicodeHighlight.confusables"

-> try to start with NO setting to define further "exceptions" for confusables
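
The trusted-workspace checks above could be sketched roughly as follows. Everything here is hypothetical: the tiny `INVISIBLE_CODE_POINTS` and `CONFUSABLES` tables stand in for the much larger sets discussed above, `allowedCodePoints` stands in for the exceptions derived from the display language or vscode-loc, and the emoji check is a crude heuristic. Filtering out comments and strings is left out because it would need tokenization.

```typescript
// Illustrative subsets only; a real table would be far larger.
const INVISIBLE_CODE_POINTS = new Set<number>([
  0x200b, // ZERO WIDTH SPACE
  0x200c, // ZERO WIDTH NON-JOINER
  0x200d, // ZERO WIDTH JOINER
  0x2060, // WORD JOINER
  0xfeff, // ZERO WIDTH NO-BREAK SPACE
]);

// Confusable code point -> the basic ASCII character it resembles.
const CONFUSABLES = new Map<number, string>([
  [0x0430, "a"], // CYRILLIC SMALL LETTER A
  [0x0435, "e"], // CYRILLIC SMALL LETTER IE
  [0x043e, "o"], // CYRILLIC SMALL LETTER O
  [0x0441, "c"], // CYRILLIC SMALL LETTER ES
]);

interface Finding {
  offset: number; // UTF-16 offset into the text
  codePoint: number;
  kind: "invisible" | "confusable";
  resembles?: string;
}

// Hypothetical options object mirroring the two proposed boolean settings.
interface HighlightOptions {
  invisibleCharacters: boolean;
  confusables: boolean;
  allowedCodePoints: Set<number>; // exceptions, e.g. derived from the locale
}

// Crude heuristic: treat anything at or above U+1F000 as emoji-like.
function isEmojiLike(ch: string | undefined): boolean {
  return ch !== undefined && (ch.codePointAt(0) ?? 0) >= 0x1f000;
}

function findSuspiciousCharacters(text: string, options: HighlightOptions): Finding[] {
  const findings: Finding[] = [];
  const chars = Array.from(text); // one entry per code point
  let offset = 0;
  chars.forEach((ch, i) => {
    const codePoint = ch.codePointAt(0) ?? 0;
    const start = offset;
    offset += ch.length;
    if (options.allowedCodePoints.has(codePoint)) return;
    if (options.invisibleCharacters && INVISIBLE_CODE_POINTS.has(codePoint)) {
      // Skip a zero-width joiner that sits between two emoji-like code points.
      const joinsEmoji =
        codePoint === 0x200d && isEmojiLike(chars[i - 1]) && isEmojiLike(chars[i + 1]);
      if (!joinsEmoji) findings.push({ offset: start, codePoint, kind: "invisible" });
      return;
    }
    const resembles = CONFUSABLES.get(codePoint);
    if (options.confusables && resembles !== undefined) {
      findings.push({ offset: start, codePoint, kind: "confusable", resembles });
    }
  });
  return findings;
}
```

Passing a user's locale-derived characters in `allowedCodePoints` is one way to model the "exceptions" idea without adding a user-facing setting.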

@alexdima alexdima self-assigned this Nov 4, 2021
@alexdima alexdima added the feature-request Request for new features or functionality label Nov 4, 2021
@alexdima alexdima assigned hediet and unassigned alexdima Nov 5, 2021
@alexdima alexdima added this to the November 2021 milestone Nov 5, 2021

Kroc commented Nov 11, 2021

Semi-related; I asked for a not-too-dissimilar feature some time ago: #1727
It was also mentioned by @roa-nyx as a potential security threat back in 2018. Why was this issue ignored and closed by the bot?


alexdima commented Nov 11, 2021

@Kroc I apologize for missing or not responding to your later comments on #1727. I don't remember exactly what happened back then, but I can speculate that the issue started out as a discussion about rendering all characters as monospace (as in a terminal, for example), and that I considered it a feature request when I saw that Chromium renders the same way we do.

Later the discussion mentioned that there might be security implications, but what was mentioned was watermarked content using ZWJ, which is not a security problem per se; it is a privacy problem (AFAIK watermarked content might be used to track down a specific person who copy-pasted the content).

While this has been a known problem for more than 10 years, I think the merit of https://www.trojansource.codes/ is that they have shown a way to weaponize it.

Also, AFAIK no editors today try to tackle confusables, such as:

( а → a ) CYRILLIC SMALL LETTER A → LATIN SMALL LETTER A
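
A small sketch of why this pair is a problem: the two strings below render almost identically but compare as different. The helper name `toCodePointLabels` and the sample identifier are made up for illustration.

```typescript
// Formats each code point of a string as "U+XXXX" so lookalikes become visible.
function toCodePointLabels(text: string): string[] {
  return Array.from(text, (ch) => {
    const hex = (ch.codePointAt(0) ?? 0).toString(16).toUpperCase();
    return "U+" + (hex.length < 4 ? "0".repeat(4 - hex.length) + hex : hex);
  });
}

const latin = "admin";      // starts with LATIN SMALL LETTER A (U+0061)
const mixed = "\u0430dmin"; // starts with CYRILLIC SMALL LETTER A (U+0430)

console.log(latin === mixed);             // false
console.log(toCodePointLabels(latin)[0]); // U+0061
console.log(toCodePointLabels(mixed)[0]); // U+0430
```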

I want to apologize if I missed your point. If you find any security problems with VS Code, please follow the steps documented at https://github.com/microsoft/vscode/blob/main/SECURITY.md to bring them to our attention.

alexdima commented

@hediet A first implementation could focus just on https://www.unicode.org/Public/security/14.0.0/IdentifierStatus.txt and would highlight Restricted code points.
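
One way such a first implementation could be sketched, assuming the file lists only `Allowed` code points and ranges and that anything not listed counts as Restricted. The `SAMPLE` text is an abbreviated, simplified stand-in for the real file, and the function names are hypothetical; which code points are actually Allowed has to be checked against the real data.

```typescript
interface CodePointRange {
  start: number;
  end: number; // inclusive
}

// Abbreviated, simplified stand-in for IdentifierStatus.txt (not the real file).
const SAMPLE = [
  "# code point(s) ; status # comment",
  "0027          ; Allowed    # APOSTROPHE",
  "0030..0039    ; Allowed    # DIGIT ZERO..DIGIT NINE",
  "0041..005A    ; Allowed    # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
  "0061..007A    ; Allowed    # LATIN SMALL LETTER A..LATIN SMALL LETTER Z",
].join("\n");

// Collects every "XXXX ; Allowed" or "XXXX..YYYY ; Allowed" entry.
function parseAllowedRanges(fileText: string): CodePointRange[] {
  const ranges: CodePointRange[] = [];
  for (const rawLine of fileText.split(/\r?\n/)) {
    const hash = rawLine.indexOf("#");
    const line = (hash >= 0 ? rawLine.slice(0, hash) : rawLine).trim();
    const match = /^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*Allowed$/.exec(line);
    if (!match) continue; // blank, comment-only, or unrecognized line
    const start = parseInt(match[1], 16);
    const end = match[2] ? parseInt(match[2], 16) : start;
    ranges.push({ start, end });
  }
  return ranges;
}

function isAllowed(codePoint: number, allowed: CodePointRange[]): boolean {
  return allowed.some((r) => codePoint >= r.start && codePoint <= r.end);
}

// Returns the code points of an identifier that are not in any Allowed range.
function findRestricted(identifier: string, allowed: CodePointRange[]): number[] {
  return Array.from(identifier, (ch) => ch.codePointAt(0) ?? 0).filter(
    (cp) => !isAllowed(cp, allowed)
  );
}

const allowed = parseAllowedRanges(SAMPLE);
console.log(findRestricted("userName1", allowed));      // []
console.log(findRestricted("user\u200bname", allowed)); // [ 8203 ], i.e. U+200B
```

Because the data is about identifiers, a check like this would presumably be applied to identifier tokens only; spaces and most punctuation are not listed as Allowed either.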

@hediet hediet linked a pull request Nov 19, 2021 that will close this issue
@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2022