Don't choke on invalid UTF-8 in file output #1298

Merged
tdsmith merged 2 commits into Homebrew:master from no-no-bad-unicode on Oct 16, 2016

Contributor

@tdsmith tdsmith commented Oct 16, 2016

Sometimes `file` output contains data from the file under examination,
which may include binary data that is not valid UTF-8. `String#split`
dies if it doesn't understand the encoding, so tell Ruby to treat `file`
output as a bytestring.
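
A minimal sketch of the idea, not the PR's actual diff: the string below is a made-up stand-in for one line of `file` output, and `as_bytes` is a hypothetical helper name. Relabelling the string as binary (ASCII-8BIT) leaves the bytes untouched but makes every byte sequence valid, so later string operations have no encoding to reject.

```ruby
# Hypothetical stand-in for one line of `file` output: a path, a NUL
# separator, then a description that embeds raw bytes from the file.
raw = "/tmp/example.pyc\0: data, 1st item \"\x83\x01\"\n".dup
raw.force_encoding(Encoding::UTF_8)

# The lone 0x83 byte is a UTF-8 continuation byte with no lead byte,
# so Ruby reports the string as invalidly encoded.
utf8_valid = raw.valid_encoding?

# Hypothetical helper: return a copy of the string tagged as binary.
# String#b copies the bytes and labels them ASCII-8BIT.
def as_bytes(string)
  string.b
end

bytes = as_bytes(raw)
binary_valid = bytes.valid_encoding?
path, info = bytes.split("\0", 2)

puts utf8_valid     # false
puts binary_valid   # true
puts path           # /tmp/example.pyc
```

Whether `split` itself raises on the UTF-8 version can depend on the Ruby version and the kind of separator, which is why the sketch checks `valid_encoding?` rather than relying on an exception.
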
@tdsmith tdsmith added the in progress Maintainers are working on this label Oct 16, 2016
Contributor Author

tdsmith commented Oct 16, 2016

cc @jawshooah

Contributor Author

tdsmith commented Oct 16, 2016

A failing line looks like:
"/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/__pycache__/reprlib.cpython-35.opt-2.pyc\u0000: dBase III DBT, version number 0, next free block index 168627478, 1st item \"\x83\u0001\"\n"

- path, info = line.split("\0")
- next unless info.to_s.include?("text")
+ path, info = line.split("\0", 2)
+ next unless info.include?("text")
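
One reading of the limit argument added in the change above, sketched with a made-up line (the path and description are assumptions, not taken from the PR): if the description echoed by `file` itself contains a NUL byte, an unlimited split produces extra fields, while a limit of 2 splits only at the first NUL.

```ruby
# Hypothetical line: the description itself contains a NUL byte, as can
# happen when `file` echoes raw data from the file it examined.
line = "/tmp/example.bin\0: data, 1st item \"a\0b\"\n".b

# No limit: every NUL is a split point, so destructuring into two
# variables would drop the third field.
unlimited = line.split("\0")

# Limit of 2: only the first NUL splits; the description stays whole.
path, info = line.split("\0", 2)

puts unlimited.length          # 3
puts info.end_with?("\"\n")    # true
```
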
Contributor

The to_s needs to stay for the reasons given in #1273 (comment).
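
The linked comment in #1273 is not reproduced here, so the following is only one scenario where `to_s` matters, sketched with a made-up line: when a line contains no NUL separator, `split` returns a single element and `info` is `nil`.

```ruby
# Hypothetical line with no NUL separator at all.
line = "no separator here\n"

path, info = line.split("\0", 2)

# split returned a single element, so `info` is nil.
# nil.to_s is "", so include? returns false instead of raising.
safe = info.to_s.include?("text")

# Without to_s the same check raises NoMethodError on nil.
raised = begin
  info.include?("text")
  false
rescue NoMethodError
  true
end

puts info.nil?  # true
puts safe       # false
puts raised     # true
```
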

Contributor Author

oh, gross, okay! this is surprising enough i think being more explicit about it is good

@tdsmith tdsmith merged commit de880f1 into Homebrew:master Oct 16, 2016
@tdsmith tdsmith deleted the no-no-bad-unicode branch October 16, 2016 16:47
@tdsmith tdsmith removed the in progress Maintainers are working on this label Oct 16, 2016
Contributor Author

tdsmith commented Oct 16, 2016

Thanks for the reviews!

@Homebrew Homebrew locked and limited conversation to collaborators May 3, 2018
3 participants