remove language-level UB for non-UTF-8 str #792

Merged
joshtriplett merged 2 commits into rust-lang:master from the str-utf8 branch on May 11, 2020

RalfJung
Member

@RalfJung RalfJung commented Apr 11, 2020

Ever since Rust 1.0, the reference said that a non-UTF-8 str causes immediate UB. In terms of today's terminology, that means that str has a validity invariant of being valid UTF-8.

However, that seems unnecessary: the compiler does not actually exploit this, nor is there any clear way it could exploit this. Making UTF-8 a library-level safety invariant is more than enough for everything str does. Most likely, it was made a validity invariant because we had not yet properly teased apart those two concepts when the document was initially written.

This is also the conclusion that the UCG WG arrived at in rust-lang/unsafe-code-guidelines#78.

I therefore propose we remove the UTF-8 clause from the language spec, so that str will have the same validity invariant as [u8]. @Centril suggested I open a PR here to put this through FCP, so here we go.

Fixes rust-lang/rust#71033
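
To illustrate the distinction the description draws, here is a minimal sketch contrasting a validity invariant that the compiler does exploit (the niche in `bool`) with the UTF-8 requirement of `str`, which is only checked and relied upon by library code. The helper name `is_valid_str_content` is hypothetical and not part of the PR; the size assertions reflect what current rustc does in practice rather than a documented layout guarantee.

```rust
use std::mem::size_of;
use std::str;

/// Hypothetical helper: reports whether `bytes` satisfy the library-level
/// UTF-8 requirement that `str` methods rely on.
fn is_valid_str_content(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // `bool` has a validity invariant (only the bit patterns 0 and 1 are
    // allowed), and the compiler uses an invalid pattern to encode `None`.
    assert_eq!(size_of::<Option<bool>>(), 1);

    // Every bit pattern is a valid `u8`, so `Option<u8>` needs extra space
    // for its discriminant.
    assert_eq!(size_of::<Option<u8>>(), 2);

    // UTF-8 well-formedness is checked by a library function at run time.
    assert!(is_valid_str_content("héllo".as_bytes()));
    assert!(!is_valid_str_content(&[0xff, 0xfe]));
}
```

Nothing in this sketch constructs a non-UTF-8 `str`. Doing so would still break the contract of the standard library's unchecked constructors, even though it would no longer be language-level UB under this proposal.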

Contributor

@Centril Centril left a comment

Thanks!

However, I don't believe I can actually use @rfcbot in this repo,
so I think you will need to create an issue on rust-lang/rust
perhaps with most of the PR description pasted there so that I can initiate FCP.

Here are also some textual nits according to the style guide we recently merged.

Two outdated review threads on src/behavior-considered-undefined.md
@RalfJung RalfJung force-pushed the str-utf8 branch 2 times, most recently from f40dd70 to e2fceb4 Compare April 11, 2020 15:07
Contributor

@Centril Centril left a comment

(Approving the text itself to be merged once FCP itself is done and stuff.)

@RalfJung
Member Author

Opened rust-lang/rust#71033.

@RalfJung
Member Author

FCP passed. So can we land this?

@joshtriplett joshtriplett self-requested a review May 11, 2020 18:12
@ehuss
Contributor

ehuss commented May 11, 2020

AIUI, str would be defined as [u8], correct? Would it make sense to update the str definition to clarify what it means? For example, make it clear what str can contain, that it is conventionally interpreted as UTF-8 code units, and then include a side note that the standard library assumes it is valid UTF-8, and that violating this assumption breaks a safety invariant (or whatever terminology you want to use).
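
The shape described in this comment (a byte representation plus a UTF-8 requirement enforced by the library) can be sketched with a user-defined type. `Utf8Buf` and its methods are hypothetical names invented for illustration; this is not how the standard library defines `str`, only an analogous pattern.

```rust
use std::str;

/// Hypothetical wrapper whose representation is plain bytes. Being valid
/// UTF-8 is a safety invariant upheld by the constructors, not something
/// the compiler knows about.
pub struct Utf8Buf {
    bytes: Vec<u8>,
}

impl Utf8Buf {
    /// Checked constructor: rejects bytes that are not valid UTF-8.
    pub fn new(bytes: Vec<u8>) -> Result<Self, str::Utf8Error> {
        str::from_utf8(&bytes)?;
        Ok(Utf8Buf { bytes })
    }

    /// Unchecked constructor.
    ///
    /// # Safety
    /// `bytes` must be valid UTF-8; otherwise later calls to `as_str`
    /// break the contract of `str::from_utf8_unchecked`.
    pub unsafe fn new_unchecked(bytes: Vec<u8>) -> Self {
        Utf8Buf { bytes }
    }

    /// Safe accessor that relies on the invariant instead of re-checking.
    pub fn as_str(&self) -> &str {
        // SAFETY: both constructors require `bytes` to be valid UTF-8.
        unsafe { str::from_utf8_unchecked(&self.bytes) }
    }
}

fn main() {
    let checked = Utf8Buf::new("grüß".as_bytes().to_vec()).unwrap();
    assert_eq!(checked.as_str(), "grüß");

    // Invalid bytes are rejected by the checked constructor.
    assert!(Utf8Buf::new(vec![0xff]).is_err());

    // SAFETY: the literal below is ASCII, hence valid UTF-8.
    let raw = unsafe { Utf8Buf::new_unchecked(b"abc".to_vec()) };
    assert_eq!(raw.as_str(), "abc");
}
```

The design choice mirrors the comment: the type's contents are just bytes, while the UTF-8 requirement lives in the documentation of the `unsafe` constructor and in the code that depends on it.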

@joshtriplett joshtriplett merged commit 892b928 into rust-lang:master May 11, 2020
@RalfJung RalfJung deleted the str-utf8 branch May 12, 2020 07:23
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request May 12, 2020
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Nov 4, 2023
clarify that the str invariant is a safety, not validity, invariant

Updates these docs to match rust-lang/reference#792
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Nov 4, 2023
Rollup merge of rust-lang#117534 - RalfJung:str, r=Mark-Simulacrum

clarify that the str invariant is a safety, not validity, invariant

Updates these docs to match rust-lang/reference#792
Development

Successfully merging this pull request may close these issues.

remove language-level UB for non-UTF-8 str
4 participants