Don't choke on invalid UTF-8 in `file` output #1298
Sometimes `file` output contains data from the file under examination, which may include binary data that is not valid UTF-8. `String#split` raises `ArgumentError` ("invalid byte sequence in UTF-8") on such a string, so tell Ruby to treat `file` output as a bytestring.
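The failure and the bytestring fix can be reproduced in a few lines. This is a minimal sketch on an invented `file`-style line (the path and description are made up), assuming the output is tagged UTF-8, as Ruby typically does for text read under a UTF-8 locale. It uses `String#b`, one standard way to get a binary copy of a string; it is not necessarily the exact call the patch uses, and `split_on_nul` is a hypothetical helper.

```ruby
# A made-up line in the shape described above: a path, a NUL separator,
# then a description echoing raw bytes from the examined file.
# 0xFF and 0xFE never appear in valid UTF-8, so once the string is
# tagged UTF-8 its encoding is broken.
LINE = "foo.bin\0data, header \xFF\xFE".dup.force_encoding(Encoding::UTF_8)

# Hypothetical helper: split on NUL, returning [:ok, parts] or [:error, message].
def split_on_nul(str)
  [:ok, str.split("\0")]
rescue ArgumentError => e
  [:error, e.message]
end

puts LINE.valid_encoding?   # false
p split_on_nul(LINE)        # [:error, "invalid byte sequence in UTF-8"]

# String#b returns a copy of the same bytes tagged ASCII-8BIT (binary).
# Every byte sequence is valid in a binary string, so split succeeds.
status, parts = split_on_nul(LINE.b)
p status                    # :ok
p parts.first               # "foo.bin"
```

Reinterpreting the bytes leaves them unchanged; only the encoding tag differs, which is why searching for ASCII-only substrings such as `"text"` still works on the result.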
cc @jawshooah
A failing line looks like:
```diff
-path, info = line.split("\0")
-next unless info.to_s.include?("text")
+path, info = line.split("\0", 2)
+next unless info.include?("text")
```
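The `+` side of the diff passes a limit of 2 to `split`. A small sketch of what that argument changes in Ruby, on invented input; whether real `file` output contains a second NUL is not stated in this thread.

```ruby
# Made-up line with a NUL after the path and another inside the description.
line = "foo.txt\0ASCII text\0more"

all_fields = line.split("\0")     # splits at every NUL
two_fields = line.split("\0", 2)  # splits at the first NUL only

p all_fields.length  # 3
p two_fields.length  # 2
p two_fields[0]      # "foo.txt"

# A positive limit also keeps a trailing empty field that a plain
# split would drop.
p "foo.txt\0".split("\0")     # ["foo.txt"]
p "foo.txt\0".split("\0", 2)  # ["foo.txt", ""]
```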
The `to_s` needs to stay for the reasons given in #1273 (comment).
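One case where removing `to_s` would break, offered as an illustration rather than a summary of #1273: if a line has no NUL separator (or is empty), `split` returns fewer than two elements, `info` is `nil`, and `nil` has no `include?`. Both helpers below are hypothetical and only mirror the two reviewed lines.

```ruby
# Hypothetical helper mirroring the reviewed lines: does this line
# describe a text file?
def text_file?(line)
  _path, info = line.split("\0", 2)
  # nil.to_s is "", so a line without a description counts as
  # "not text" instead of raising NoMethodError.
  info.to_s.include?("text")
end

# The same check without to_s, for comparison.
def text_file_without_to_s?(line)
  _path, info = line.split("\0", 2)
  info.include?("text")
end

p text_file?("foo.txt\0ASCII text")  # true
p text_file?("no separator here")    # false
p text_file?("")                     # false

begin
  text_file_without_to_s?("no separator here")
rescue NoMethodError => e
  puts "without to_s: #{e.class}"    # without to_s: NoMethodError
end
```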
oh, gross, okay! this is surprising enough i think being more explicit about it is good
Thanks for the reviews!