C Code detected as LESS Code #3184
Comments
Worth noting that the latest version (11 beta) does correctly identify this as C++ code, but it's close. There is simply not enough actual code here (signal to noise). Our language detection is merely best effort (based on counting keywords, etc.), not best in class; see #1213. Auto-detect often does a good job, but not always. When auto-detect is confused, tiny changes to the code can affect which language is identified, and reading too much into those changes typically isn't helpful. To always highlight correctly as C++, you should specify the language manually.
2b9ca9f might also help slightly, but the core problem, again, is that auto-detect is simply not perfect and shouldn't be relied on if accuracy is required.
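As a rough sketch of what specifying the language manually could look like: the helper names below (`highlightExplicit`, `highlightWithinSubset`) are made up for illustration, and `hljs` is assumed to be a highlight.js instance (v10.7 or later, where `highlight(code, { language })` is the supported call shape). The second helper keeps auto-detection but limits it to a few candidate languages, which may reduce surprises such as LESS on a paste site that mostly hosts C.

```javascript
// Sketch only: `hljs` is whatever highlight.js instance the page already has
// (for example the global from the browser build). Helper names are hypothetical.

// Highlight with a language the caller already knows; no auto-detection involved.
function highlightExplicit(hljs, code, language) {
  return hljs.highlight(code, { language }).value;
}

// Still auto-detect, but only among the given candidate languages.
function highlightWithinSubset(hljs, code, candidates) {
  const result = hljs.highlightAuto(code, candidates);
  return { language: result.language, html: result.value };
}
```

For example, `highlightExplicit(hljs, source, 'c')` would skip detection entirely, while `highlightWithinSubset(hljs, source, ['c', 'cpp', 'javascript'])` only ever picks among those three.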
Thanks for the commit. So accuracy might just not be possible in my case, since all user content is end-to-end encrypted. Because of the length of these strings, putting another field in the database for the language could compromise user privacy: even when encrypted, the byte length would be telling as to what language a paste is in, or at a minimum would narrow it down to a 4- or 8-character language name. It's additional data I don't want to ask for or store.
Sounds like a problem specific to your use case. If the code itself can be encrypted/decrypted, then I'm unsure why the language can't be also... no one mandates that you store it in a second field, or that you encrypt it such that the length is easily guessable. Those would both seem to be issues of implementation rather than issues inherent in the problem itself.
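One way to read that suggestion, sketched under assumptions: put the language name in a fixed-width header inside the plaintext, so it is encrypted together with the code and its length reveals nothing. `LANG_FIELD_WIDTH`, `packPaste`, and `unpackPaste` are hypothetical names, the language name is assumed to be short ASCII without trailing spaces, and the encryption step itself is whatever the application already does.

```javascript
// Sketch only: all names here are hypothetical, not part of any library.
const LANG_FIELD_WIDTH = 16; // assumed upper bound on language-name length

// Build the plaintext that will be encrypted: padded language header + code.
function packPaste(language, code) {
  if (language.length > LANG_FIELD_WIDTH) {
    throw new RangeError("language name longer than " + LANG_FIELD_WIDTH + " characters");
  }
  // Padding makes the header the same size for "c" and "javascript" alike.
  return language.padEnd(LANG_FIELD_WIDTH, " ") + code;
}

// Split a decrypted plaintext back into its language and code parts.
function unpackPaste(plaintext) {
  return {
    language: plaintext.slice(0, LANG_FIELD_WIDTH).trimEnd(),
    code: plaintext.slice(LANG_FIELD_WIDTH),
  };
}
```

With this layout the ciphertext grows by the same amount for every paste, so the stored size depends only on the code length, as it already does today.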
Then indeed accuracy may not be possible. Or you can use some entirely different heuristic for detecting the language and then ask us to highlight the language you detect via that heuristic...
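A minimal sketch of that idea, with loud caveats: `guessLanguage` below is a deliberately crude, made-up heuristic (it only recognizes C-style `#include` lines after dropping comments), and `highlightWithFallback` is likewise a hypothetical helper. `hljs` is assumed to be a highlight.js instance; only its `highlight` and `highlightAuto` calls are used.

```javascript
// Sketch only: a toy heuristic, not how highlight.js detects languages.
function guessLanguage(code) {
  // Drop comments first so a leading "// https://..." line cannot skew the guess.
  // (This naive regex also eats "//" inside string literals; fine for a sketch.)
  const stripped = code
    .replace(/\/\*[\s\S]*?\*\//g, "")
    .replace(/\/\/.*$/gm, "");
  // A preprocessor include is a strong hint for C (or C++).
  if (/^\s*#\s*include\s*[<"]/m.test(stripped)) return "c";
  return null; // no opinion
}

// Use the heuristic when it has an answer, otherwise fall back to auto-detect.
function highlightWithFallback(hljs, code) {
  const language = guessLanguage(code);
  return language
    ? hljs.highlight(code, { language })
    : hljs.highlightAuto(code);
}
```

The design choice here is that the heuristic only overrides auto-detection when it is fairly sure, so pastes it does not recognize behave exactly as before.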
Closing via 2b9ca9f and "auto detect isn't perfect, sadly".
Describe the issue
I have an application which uses the JavaScript fetch() API to fetch some data and render it on a page. After the data is fetched and decrypted, highlight.js init is called and the code is highlighted as LESS instead of C. I've narrowed this down to a single-line comment at the top of a file. It's usually then erroneously detected as CSS unless the comment is a link.
Example Page: https://paste.is/p/v/264ce2ae-c9c0-40f9-9863-61f1b0c3fd1b#NXZNeWVpM3lEV2xpNXZENldOOE5qaENUYlE5eHpoajJYTUJUblViQlZXWHY2dTJTYTFrWmVKeXZON0NHZ2haUQ==
Which language seems to have the issue?
The issue is with C.
Are you using highlight or highlightAuto?
I believe I am using highlightAuto.
...
Sample Code to Reproduce
The code using highlightjs is
The code I need to highlight is:
Now I did some research on this issue. First I tried turning off the paste encryption, just to make sure the fetch and decrypt weren't causing some weird issue. Then I found something interesting: putting the comment
// https://raw.githubusercontent.com/openjdk/jdk/master/src/java.base/windows/native/libjava/VM_md.c
is what causes the code to be detected as LESS. If I remove said comment, the code is detected as JavaScript rather than C, which is also wrong but, hey, more accurate than LESS. See: https://paste.is/p/v/8ad68734-1054-4f4e-b787-22d9e5158c6f
Expected behavior
C code with a // comment at the top should be detected as C and not as LESS.
Additional context
tags or is it an HTML feature on Element.innerText = property?