-
-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Make sure binary files are not checked. #5780
Conversation
fixes #5779 - add `.mp4` to the video list - treat unknown file types that contain 0x00 as binary.
Performance Report
Note:
|
const ids = getLanguagesForBasename(file); | ||
if (ids.length) return isGenerated(ids); | ||
// Unknown file type, check the content. | ||
return text?.slice(0, 1024).includes('\u0000') || false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you are looking for \0
to check if file is a binary ? how can be sure it's statiscally correct ?
maybe a lib like this ?
https://github.com/bevry/istextorbinary
https://github.com/gjtorikian/isBinaryFile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking a look. In this case we are only looking at the contents of an unknown file type. Since the file loader already converts UTF16 to text, the occurrence of \0
should be minimal (since \0
doesn't occur is most text files).
In any case, I'll take a look at the suggested libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
I took a look at isBinaryFile. It mostly does exactly the same thing, searching for
\0
ornull
as they call it. It also checks for weird character combinations. -
istextorbinary does look at the file name to and then at the content (exactly like above), but it only checks to see if the data is UTF8. It does seem to include two other packages
textextensions
andbinaryextensions
. I'll take a look to see if their lists are more extensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for checking. I think you code is robust enough then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine. Especially since it only runs when it is an unknown file type.
fixes #5779
.mp4
to the video list