Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check Before Re-encoding #1

Merged
merged 2 commits into from
Apr 18, 2019
Merged

Conversation

ACBullen
Copy link

First commit is straightforward and harmless (to address: sporkmonger#254), second is potentially problematic, but I think represents what's actually wanted with all the force_encoding calls.

@QuinnWilton
Copy link

I think this looks good to me. The 2nd commit might be problematic, but I don't know what the implications of both choices are, and I think I agree with your assessment. I'd be curious to hear what Sporkmonger has to say about it if you PR these upstream.

@QuinnWilton
Copy link

Worth noting is that some URLs will still break for us (anything that is not UTF-8 will get force encoded), but I don't think that will happen in normal operation, and it might be okay for those URLs to fail anyway.

calling force_encoding on them

This prevents unnecessary mutation of these variables when already set
and appropriately encoded. Necessary for our use case as we freeze our
Addressable::URI objects to ensure they don't get changed when passed around
Calling force_encoding doesn't actually change anything about the string,
it just changes what the string considers its encoding to be, allowing you
to end up with strings with invalid encodings. There are some invocations
in Addressable::URI where this seems intentional (followed by some g_sub'ing)
but at least at the output stage it seems like the strings should be converted
and the non-utf-8 bytes/characters removed
@ACBullen ACBullen force-pushed the alexb/check_before_force_encoding branch from ba9af52 to bfc9a44 Compare April 18, 2019 00:04
@ACBullen
Copy link
Author

ACBullen commented Apr 18, 2019

Worth noting is that some URLs will still break for us (anything that is not UTF-8 will get force encoded), but I don't think that will happen in normal operation, and it might be okay for those URLs to fail anyway.

Since we call normalize on the Addressable::URI before freezing I believe we'll end up doing an encode! with replace everywhere we care about so there ought not be any non-utf-8 characters left anywhere we care about? I will re-check that though.

@QuinnWilton
Copy link

Ooh, I hadn't considered that, but that sounds excellent :)

@ACBullen ACBullen merged commit 98bc887 into master Apr 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants