Use correct byte size for truncation #810

Merged: 1 commit, Dec 5, 2019

Conversation

waltjones
Contributor

When calculating the payload length for truncation, payload.length was being used. This returns the number of UTF-16 code units (roughly the character count), not the UTF-8 byte count. As a result, non-ASCII payloads were insufficiently truncated and then rejected by the API.
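To illustrate the mismatch, here is a small sketch comparing the two measures. The helper name `utf8ByteLength` is hypothetical and is not part of this PR; it relies on the standard `TextEncoder`, which is assumed to be available as a global (modern browsers and recent Node.js versions).

```javascript
// Hypothetical helper for illustration only: exact UTF-8 byte length
// computed with the standard TextEncoder (assumed to be a global).
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

const payload = "日本語";

// String length counts UTF-16 code units, not bytes.
console.log(payload.length);          // 3
// Each of these characters takes 3 bytes in UTF-8.
console.log(utf8ByteLength(payload)); // 9
```

A size limit expressed in bytes and checked against `payload.length` would treat this payload as one third of its actual size.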

This PR adds a minimal UTF-8 byte counter that never undercounts and rarely overcounts*. Because it can only err toward truncating more than strictly necessary, it is safe to use for truncation, and it is significantly smaller and faster than a complete UTF-8 encoder.

  • It overcounts when it encounters UTF-16 surrogate pairs (characters encoded as two code units). These should be rare in practice, and specific support for them could be added if and when desired.
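A counter with the properties described above might look roughly like the following. This is a sketch under assumptions, not the code merged in this PR: the function name `approxUtf8Length` and the choice to count every code unit at or above 0x800 as 3 bytes are illustrative.

```javascript
// Hypothetical sketch, not the implementation from this PR.
// Returns an upper bound on the UTF-8 byte length of `str` by
// classifying each UTF-16 code unit independently.
function approxUtf8Length(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII: 1 byte
    } else if (unit < 0x800) {
      bytes += 2; // U+0080..U+07FF: 2 bytes
    } else {
      // Everything else is counted as 3 bytes. Each half of a
      // surrogate pair lands here, so a pair counts as 6 bytes
      // even though its real UTF-8 encoding is 4 bytes.
      bytes += 3;
    }
  }
  return bytes;
}

console.log(approxUtf8Length("abc"));       // 3
console.log(approxUtf8Length("é"));         // 2
console.log(approxUtf8Length("€"));         // 3
console.log(approxUtf8Length("\u{1F600}")); // 6 (actual UTF-8 length is 4)
```

In this sketch the only overcount comes from surrogate pairs, by 2 bytes per pair, so a truncation routine built on it may trim slightly more than needed but should not leave a payload over the byte limit.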

@waltjones waltjones merged commit b63fb20 into master Dec 5, 2019
mudetroit pushed a commit that referenced this pull request Mar 14, 2024
Use correct byte size for truncation