-
Notifications
You must be signed in to change notification settings - Fork 12.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid re-validating UTF-8 in FromUtf8Error::into_utf8_lossy
#130408
base: master
Are you sure you want to change the base?
Conversation
r? @Noratrieb rustbot has assigned @Noratrieb. Use |
let iter = self.bytes[self.error.valid_up_to()..].utf8_chunks(); | ||
|
||
for chunk in iter { | ||
res.push_str(chunk.valid()); | ||
if !chunk.invalid().is_empty() { | ||
res.push_str(REPLACEMENT); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the logic from String::from_utf8_lossy
library/alloc/src/string.rs
Outdated
|
||
// `Utf8Error::valid_up_to` returns the maximum index of validated | ||
// UTF-8 bytes. Copy the valid bytes into the output buffer. | ||
v.extend(self.bytes[..self.error.valid_up_to()].iter().copied()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use extend_from_slice
here? that's both simpler and possible more efficient (though the iterator probably produces good code as well? but either way, extend_from_slice is shorter)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I normally use extend_from_slice
and second-guessed myself since I couldn't remember if there was a difference.
I checked the docs and went with this because of the last sentence here but that's maybe a bit hopeful still.
Note that this function is same as
extend
except that it is specialized to work with slices instead. If and when Rust gets specialization this function will likely be deprecated (but still available).
https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#method.extend_from_slice
library/alloc/tests/string.rs
Outdated
assert_eq!(func(xs), ys); | ||
|
||
let xs = b"Hello\xC2 There\xFF Goodbye"; | ||
assert_eq!(func(xs), String::from("Hello\u{FFFD} There\u{FFFD} Goodbye")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit (you don't have to change this but can if you're touching it anyways and want to): you've used .to_owned()
above but use String::from
here, i prefer .to_owned()
myself so they could be made consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed.
I just copied the test above but I prefer that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, this looks good! a small comment and then it's good
Refactor `into_utf8_lossy` to copy valid UTF-8 bytes into the buffer, avoiding double validation of bytes. Add tests that mirror the `String::from_utf8_lossy` tests
a9b04b3
to
b94c5a1
Compare
Force pushed |
@bors r+ rollup |
…atrieb Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy` Part of the unstable feature `string_from_utf8_lossy_owned` - rust-lang#129436 Refactor `FromUtf8Error::into_utf8_lossy` to copy valid UTF-8 bytes into the buffer, avoiding double validation of bytes. Add tests that mirror the `String::from_utf8_lossy` tests.
…iaskrgr Rollup of 9 pull requests Successful merges: - rust-lang#127766 (add `extern "C-cmse-nonsecure-entry" fn` ) - rust-lang#129629 (Implement Return Type Notation (RTN)'s path form in where clauses) - rust-lang#130246 (rustc_expand: remember module `#[path]`s during expansion) - rust-lang#130408 (Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy`) - rust-lang#130651 (Add --enable-profiler to armhf dist) - rust-lang#130653 (ABI compatibility: mention Result guarantee) - rust-lang#130665 (Prevent Deduplication of `LongRunningWarn`) - rust-lang#130666 (Assert that `explicit_super_predicates_of` and `explicit_item_super_predicates` truly only contains bounds for the type itself) - rust-lang#130667 (compiler: Accept "improper" ctypes in extern "rust-cold" fn) r? `@ghost` `@rustbot` modify labels: rollup
…atrieb Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy` Part of the unstable feature `string_from_utf8_lossy_owned` - rust-lang#129436 Refactor `FromUtf8Error::into_utf8_lossy` to copy valid UTF-8 bytes into the buffer, avoiding double validation of bytes. Add tests that mirror the `String::from_utf8_lossy` tests.
…mpiler-errors Rollup of 8 pull requests Successful merges: - rust-lang#127766 (add `extern "C-cmse-nonsecure-entry" fn` ) - rust-lang#129629 (Implement Return Type Notation (RTN)'s path form in where clauses) - rust-lang#130408 (Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy`) - rust-lang#130651 (Add --enable-profiler to armhf dist) - rust-lang#130653 (ABI compatibility: mention Result guarantee) - rust-lang#130666 (Assert that `explicit_super_predicates_of` and `explicit_item_super_predicates` truly only contains bounds for the type itself) - rust-lang#130667 (compiler: Accept "improper" ctypes in extern "rust-cold" fn) - rust-lang#130673 (Parser: recover from `:::` to `::`) r? `@ghost` `@rustbot` modify labels: rollup
…mpiler-errors Rollup of 8 pull requests Successful merges: - rust-lang#127766 (add `extern "C-cmse-nonsecure-entry" fn` ) - rust-lang#129629 (Implement Return Type Notation (RTN)'s path form in where clauses) - rust-lang#130408 (Avoid re-validating UTF-8 in `FromUtf8Error::into_utf8_lossy`) - rust-lang#130651 (Add --enable-profiler to armhf dist) - rust-lang#130653 (ABI compatibility: mention Result guarantee) - rust-lang#130666 (Assert that `explicit_super_predicates_of` and `explicit_item_super_predicates` truly only contains bounds for the type itself) - rust-lang#130667 (compiler: Accept "improper" ctypes in extern "rust-cold" fn) - rust-lang#130673 (Parser: recover from `:::` to `::`) r? `@ghost` `@rustbot` modify labels: rollup
Part of the unstable feature
string_from_utf8_lossy_owned
- #129436Refactor
FromUtf8Error::into_utf8_lossy
to copy valid UTF-8 bytes into the buffer, avoiding double validation of bytes.Add tests that mirror the
String::from_utf8_lossy
tests.