fix: Make sure binary files are not checked. #5780

Merged 1 commit on Jun 19, 2024
Jason3S (Collaborator) commented Jun 19, 2024

fixes #5779

  • add .mp4 to the video list
  • treat unknown file types that contain 0x00 as binary.

@github-actions github-actions bot added the fix label Jun 19, 2024
github-actions bot (Contributor) commented Jun 19, 2024

Performance Report

Repository Elapsed Min/Avg/Max SD SD Graph
AdaDoom3/AdaDoom3 12.35 11.9 / 12.3 / 13.9 0.37 ┣━━┻━━●━━┻━━┫
alexiosc/megistos 29.95 29.5 / 31.3 / 33.0 0.80 ┣●━┻━━╋━━┻━━┫
apollographql/apollo-server 6.72 6.4 / 6.8 / 7.0 0.17 ┣━┻━━●━━┻━┫
aspnetboilerplate/aspnetboilerplate 25.07 23.3 / 24.5 / 25.8 0.69 ┣━━┻━━╋━●┻━━┫
aws-amplify/docs 39.67 35.3 / 36.8 / 39.0 0.97 ┣━━┻━━╋━━┻━━┫ ●
Azure/azure-rest-api-specs 31.56 30.1 / 32.1 / 34.3 1.16 ┣━━┻━●╋━━┻━━┫
bitjson/typescript-starter 0.87 0.8 / 0.9 / 1.1 0.05 ┣━━┻━●━┻━━┫
caddyserver/caddy 11.73 11.2 / 11.7 / 12.6 0.34 ┣━━┻━━●━━┻━━┫
canada-ca/open-source-logiciel-libre 1.01 0.9 / 1.0 / 1.1 0.03 ┣━━┻━╋●┻━━┫
chef/chef 21.81 19.8 / 20.6 / 21.8 0.55 ┣━━┻━━╋━━┻━━┫●
django/django 52.55 50.2 / 52.8 / 55.5 1.19 ┣━━━┻━●╋━━┻━━━┫
eslint/eslint 31.03 29.7 / 31.8 / 33.7 0.94 ┣━━┻●━╋━━┻━━┫
exonum/exonum 12.18 11.3 / 11.7 / 12.3 0.25 ┣━━┻━━╋━━┻━●┫
gitbucket/gitbucket 6.73 6.7 / 7.0 / 7.5 0.22 ┣━━●━━╋━━┻━━┫
googleapis/google-cloud-cpp 385.67 378.1 / 393.2 / 419.4 10.17 ┣━━━┻●━━╋━━━┻━━━┫
graphql/express-graphql 0.92 0.9 / 0.9 / 1.0 0.03 ┣━━┻━●━┻━━┫
graphql/graphql-js 5.92 5.7 / 6.0 / 6.6 0.18 ┣━┻━●╋━━┻━┫
graphql/graphql-relay-js 0.94 0.9 / 0.9 / 1.1 0.05 ┣━━┻━●━┻━━┫
graphql/graphql-spec 1.84 1.8 / 1.9 / 2.0 0.05 ┣━●┻━╋━┻━━┫
iluwatar/java-design-patterns 35.25 31.1 / 33.0 / 36.2 1.12 ┣━━┻━━╋━━┻━━●
ktaranov/sqlserver-kit 23.01 22.1 / 23.4 / 25.3 0.65 ┣━━┻●━╋━━┻━━┫
liriliri/licia 7.60 7.5 / 8.0 / 8.7 0.29 ┣━●┻━━╋━━┻━━┫
MartinThoma/LaTeX-examples 14.42 12.8 / 14.0 / 15.2 0.47 ┣━━┻━━╋━━●━━┫
mdx-js/mdx 4.00 3.7 / 3.8 / 4.1 0.10 ┣━┻━━╋━━┻●┫
microsoft/TypeScript-Website 17.62 17.1 / 18.5 / 20.6 0.98 ┣━━●━━╋━━┻━━┫
MicrosoftDocs/PowerShell-Docs 88.82 85.0 / 89.0 / 94.4 2.00 ┣━━━┻━━●━━┻━━━┫
neovim/nvim-lspconfig 8.99 8.6 / 9.1 / 9.7 0.23 ┣━━┻━●╋━━┻━━┫
pagekit/pagekit 7.35 7.4 / 7.7 / 8.3 0.24 ┣━●┻━━╋━━┻━━┫
php/php-src 115.26 110.5 / 116.3 / 139.3 5.03 ┣━━┻━━●╋━━━┻━━┫
plasticrake/tplink-smarthome-api 1.53 1.5 / 1.5 / 1.7 0.05 ┣━━┻━●━┻━━┫
prettier/prettier 13.71 12.8 / 13.4 / 14.1 0.33 ┣━━┻━━╋━━●━━┫
pycontribs/jira 2.60 2.5 / 2.7 / 3.1 0.11 ┣━┻●━╋━━┻━┫
RustPython/RustPython 15.06 13.9 / 14.5 / 15.9 0.41 ┣━━┻━━╋━━┻●━┫
shoelace-style/shoelace 7.17 7.1 / 7.5 / 8.5 0.31 ┣━━●━━╋━━┻━━┫
SoftwareBrothers/admin-bro 4.84 4.5 / 4.7 / 5.0 0.13 ┣━┻━━╋━━┻●┫
sveltejs/svelte 38.00 36.6 / 37.9 / 39.5 0.78 ┣━━┻━━●━━┻━━┫
TheAlgorithms/Python 16.87 16.2 / 17.0 / 18.1 0.42 ┣━━┻━●╋━━┻━━┫
twbs/bootstrap 3.79 3.6 / 3.7 / 4.0 0.11 ┣━┻━━╋━●┻━┫
typescript-cheatsheets/react 2.12 2.0 / 2.1 / 2.3 0.07 ┣━━┻━●━┻━━┫
typescript-eslint/typescript-eslint 6.52 6.3 / 6.7 / 7.5 0.26 ┣━━┻●━╋━━┻━━┫
w3c/aria-practices 9.72 9.2 / 9.7 / 10.5 0.31 ┣━━┻━━●━━┻━━┫
w3c/specberus 2.91 2.8 / 3.0 / 3.3 0.10 ┣━┻●━╋━━┻━┫
webdeveric/webpack-assets-manifest 0.86 0.8 / 0.8 / 1.0 0.04 ┣━━┻━╋●┻━━┫
webpack/webpack 12.42 11.5 / 12.2 / 13.0 0.33 ┣━━┻━━╋━●┻━━┫
wireapp/wire-desktop 1.32 1.3 / 1.4 / 1.6 0.07 ┣━━┻●╋━┻━━┫
wireapp/wire-webapp 21.60 19.5 / 21.5 / 22.8 0.64 ┣━━┻━━╋●━┻━━┫
Repository Elapsed Rel Trend Count
AdaDoom3/AdaDoom3 12.35 0.33% ▅▂▃▃▄▄▃▃▂▄▄▃▂▃▄▃▅▆▄▄ 31
alexiosc/megistos 29.95 -4.21% ▄▃▄▃▂▃▄▄▂▄▃▃▄ ▂▃▅▅▅▁ 31
apollographql/apollo-server 6.72 -0.44% ▃▂▅▅▁▂▅▅▃▃▄▆▃▃▆▅▃▃▃ 31
aspnetboilerplate/aspnetboilerplate 25.07 2.21% ▄▇▅▃▃▃▃▆▇▃▄▆▃▃▄▄▅▇▃▅ 32
aws-amplify/docs 39.67 7.70% ▄▄▃▂▆▂▄▁▃▄▄▄▂▂▄▂▄▇▆█ 32
Azure/azure-rest-api-specs 31.56 -1.67% ▅▅▃▅▃▃▄▅▃▃▅▁▄▃▅▆▅▄▄▃ 32
bitjson/typescript-starter 0.87 -0.32% ▃▃▅▃▃▃▃▂▄▂▇▄▃▄▄▃▄▂▃▄ 31
caddyserver/caddy 11.73 0.04% ▄▆▁▃▃▂▅▆▃▂▂▃▆▄▄▄▄▄▆▄ 32
canada-ca/open-source-logiciel-libre 1.01 1.71% ▃▃▃▂▇▅▄▃▃▃▅▂▅▂▃ ▄▃▅▅ 31
chef/chef 21.81 5.77% ▄▁▃▃▄▃▂▃▂▂▁▅▂▃▄▇▃▃▄▇ 32
django/django 52.55 -0.43% ▅▁▃▃▅▂▃▅▃▄▂▄▄▃▇▆▃▄▅▃ 32
eslint/eslint 31.03 -2.32% ▃▃▂▂▃▂▂▇▅▃▅▄▄▄▂▆▄▄▇▂ 32
exonum/exonum 12.18 3.97% ▅▂▃▄▇▃▃▄▂▃▄▄█▃▁▂▄▃▂▇ 31
gitbucket/gitbucket 6.73 -3.60% ▃▄▁▆▆▁▄▃▅▃▅▄▂▂▇▃▅▄▄▂ 32
googleapis/google-cloud-cpp 385.67 -1.92% ▁▆▂▂▄▄▅▃▂▅▇▄▂▄▃▄▅▅█▃ 34
graphql/express-graphql 0.92 -0.25% ▄▂█▅▂▅▂▄▂▄▂▂▄▂▃▂▄▆▄▄ 31
graphql/graphql-js 5.92 -1.52% ▅▁▂▂▅▂▂▄▄█▄▅▄▄▃▂▂▃▃▃ 32
graphql/graphql-relay-js 0.94 -0.39% ▃▇▃▃▃▅▃▄▃▃▂▃▃▂▂▂▃▅█▄ 31
graphql/graphql-spec 1.84 -3.35% ▂▃▂▂▃▂▂▂▇▃▃▄▄▅▃▅▅▂▇▁ 32
iluwatar/java-design-patterns 35.25 6.91% ▄▃▃▄█▄▄▅▄▄▄▅▄▅▅▃▇▄▆▇ 32
ktaranov/sqlserver-kit 23.01 -1.46% ▂▃▃▆▄▄█▃▃▁▃▁▄▃▅▄▄▇▃▃ 31
liriliri/licia 7.60 -5.35% ▆▁▃▂▃▁▃▅▂▄▄▄▆▇▄▂▆▇▅▁ 32
MartinThoma/LaTeX-examples 14.42 3.05% ▄▄▂▃▄▇▃▃▄▃▄▂▅▆▄▆▅▃▅▅ 31
mdx-js/mdx 4.00 4.14% ▃▆▂▇▄▁▃▅▆▂▄▆▅▅▄▄▅▂▃▆ 31
microsoft/TypeScript-Website 17.62 -4.94% ▅▅▅▅▆▆▅▅▃▂▂▃▃▃▂▃▃▃▂▂ 32
MicrosoftDocs/PowerShell-Docs 88.82 -0.21% ▅▄▄▁▅▃▄▆▂▄▄▄▃▁█▂▄▄▆▄ 32
neovim/nvim-lspconfig 8.99 -0.93% ▇▄▄▂▂▃▄▆▅▄▅▄▄▅▃▅▄▂▄▃ 32
pagekit/pagekit 7.35 -4.60% ▄▃▂▃▃▂▂▃▇▃▄▂▄▄▂▄▂▃▅▁ 31
php/php-src 115.26 -0.85% ▄▅▂▃▂▅▄▂▃▃▃█▂▃▅▄▄▄▄▃ 33
plasticrake/tplink-smarthome-api 1.53 -0.58% ▅▃▂▃▂▅▂▃▃▂▅▅▄▃█▄▄▄▄▃ 31
prettier/prettier 13.71 2.21% ▃▅▁▃▃▁▃▂▆▅▇▁▅▃▃▅▂▅▄▅ 32
pycontribs/jira 2.60 -3.26% ▄▅▃▂▄▅▆▄▄▄▄▄▃▂▃▃▄▃█▂ 31
RustPython/RustPython 15.06 3.83% ▃▅▃▆▃▃▅▂▃▅▄▃▂▂▅▅▃▃▃▆ 32
shoelace-style/shoelace 7.17 -4.00% ▄▃▅▂▂█▄▃▇▄▃▄▃▂▃▂▂▂▃▂ 32
SoftwareBrothers/admin-bro 4.84 3.85% ▂▄▆▄▂▂▁▃▃▄▄▃▅▃▃█▃▆▃▆ 31
sveltejs/svelte 38.00 0.14% ▅▄▃▄▇▄▆▄▃▅▅▂▃▅▄▃▇▄▃▄ 32
TheAlgorithms/Python 16.87 -0.51% ▂▃▄▃▃▁▅▅▄▅▂▃▂▅▄▄▆█▃▃ 32
twbs/bootstrap 3.79 1.78% ▆▃▅▂▇▃▂▄▅▄▄▃▆▂▁▇▄▅▄▅ 32
typescript-cheatsheets/react 2.12 0.01% ▆▅▂▃▄▄▂▅▃▃▂▂▇▄▆▄▃▂▄▄ 31
typescript-eslint/typescript-eslint 6.52 -2.45% █▆▃▃▂▅▇▅▅▄▄▄▄▄▃▃▃▃▃▃ 32
w3c/aria-practices 9.72 -0.27% ▄▄▂▅▁▂▆▇▂▄▄▃▄▂▂▃▃▅▄▄ 31
w3c/specberus 2.91 -2.01% ▄▃▃▄▂▃▅▅█▄▂▄▂▃▄▄▂▃▅▃ 32
webdeveric/webpack-assets-manifest 0.86 2.16% ▃▃▃█▃▅▃▃▃▃▄▃▄▃▃▃▃▃▄▄ 31
webpack/webpack 12.42 1.76% ▂▅█▂▄▃▅▅▄▃▂▃▃▂▆▂▃▄▇▅ 32
wireapp/wire-desktop 1.32 -2.85% █▂▇▂▂▃▂▂▅▃▃▃▃▃▄▃▃▃▄▃ 32
wireapp/wire-webapp 21.60 0.55% ▃▄▆▄▄▃▆▅▂▃▃▄▆▄▂▃▄▄▅▄ 32

Note:

  • Elapsed time is in seconds. The trend graph shows the last 10 runs.
    The SD graph shows the current run relative to the average and standard deviation.
  • Rel is the relative change from the average.

const ids = getLanguagesForBasename(file);
if (ids.length) return isGenerated(ids);
// Unknown file type, check the content.
return text?.slice(0, 1024).includes('\u0000') || false;


So you are looking for \0 to check whether a file is binary? How can you be sure that is statistically reliable?

Maybe use a library like one of these?

https://github.com/bevry/istextorbinary
https://github.com/gjtorikian/isBinaryFile

Collaborator Author

@ccoVeille,

Thank you for taking a look. In this case we are only looking at the contents of an unknown file type. Since the file loader already converts UTF-16 to text, the occurrence of \0 should be minimal (since \0 doesn't occur in most text files).

In any case, I'll take a look at the suggested libraries.
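
To illustrate the point about UTF-16, here is a minimal sketch (not the project's file loader) showing that the raw bytes of UTF-16 text contain 0x00, while the decoded string does not. The helper name `decodeUtf16le` and the sample bytes are assumptions made for this example; `TextDecoder` is the standard web/Node.js API.

```typescript
// Hypothetical helper for illustration only; the real loader is not shown in this PR.
function decodeUtf16le(bytes: Uint8Array): string {
    // 'utf-16le' is a standard TextDecoder encoding label.
    return new TextDecoder('utf-16le').decode(bytes);
}

// "hi" encoded as UTF-16LE: every ASCII character is followed by a 0x00 byte.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);

// The raw bytes contain zero bytes, so a byte-level check would flag this as binary.
const rawHasZeroByte = utf16Bytes.indexOf(0) !== -1;

// After decoding, the text no longer contains NUL characters.
const decoded = decodeUtf16le(utf16Bytes);
const decodedHasNul = decoded.includes('\u0000');
```

This is why checking the decoded text, rather than the raw bytes, keeps ordinary UTF-16 files from being misclassified.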

Collaborator Author

@ccoVeille,

  • I took a look at isBinaryFile. It mostly does exactly the same thing, searching for \0 (or null, as they call it). It also checks for unusual character combinations.

  • istextorbinary looks at the file name first and then at the content (exactly like above), but it only checks whether the data is UTF-8. It does seem to include two other packages, textextensions and binaryextensions. I'll take a look to see if their lists are more extensive.


Thanks for checking. I think your code is robust enough, then?

Collaborator Author

I think it is fine. Especially since it only runs when it is an unknown file type.
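
The heuristic discussed in this thread can be sketched as a standalone function. This is a simplified illustration, not the merged code: `looksBinary`, `shouldSkip`, and `SAMPLE_SIZE` are hypothetical names, and the `isGenerated` callback stands in for the project's own lookup, whose implementation is not shown here.

```typescript
// Number of leading characters to inspect; 1024 matches the snippet under review.
const SAMPLE_SIZE = 1024;

/** Returns true when the first `sampleSize` characters of `text` contain a NUL character. */
function looksBinary(text: string | undefined, sampleSize: number = SAMPLE_SIZE): boolean {
    if (!text) return false; // missing or empty content is not treated as binary
    return text.slice(0, sampleSize).includes('\u0000');
}

/**
 * Known file types are decided by the caller-supplied `isGenerated` lookup
 * (an assumption standing in for the real one); only unknown types fall back
 * to the content check.
 */
function shouldSkip(
    languageIds: string[],
    isGenerated: (ids: string[]) => boolean,
    text: string | undefined,
): boolean {
    if (languageIds.length) return isGenerated(languageIds);
    return looksBinary(text);
}
```

Because the content check only runs when no language ID matches the file name, a stray NUL in a known text format does not cause the file to be skipped.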

Successfully merging this pull request may close these issues.

[Bug]: don't scan .mp4