Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add str::get_unchecked(_mut) in accordance with slice #40436

Closed
wants to merge 1 commit into from
Closed

add str::get_unchecked(_mut) in accordance with slice #40436

wants to merge 1 commit into from

Conversation

F001
Copy link
Contributor

@F001 F001 commented Mar 11, 2017

They take usize or ranges as generic parameter.

cc #39932

r? @nagisa

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

The original issue is also about get and get_mut. If you could add those as well, it’d be great.

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

As for implementation, you probably will also want to add the methods to impl str block here.

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

cc @alexcrichton

which can takes usize or ranges as generic parameter
@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

Also, use str::as_bytes instead of transmute. And the methods probably should still return a &str (via str::from_utf8_unchecked et al (str::from_utf8 could be used for the safe variant, but it would revalidate the whole slice, which seems inefficient))

@F001
Copy link
Contributor Author

F001 commented Mar 11, 2017

Since the parameter can be a range, I assume it represents byte-range instead of chars-range. So, I can't guarantee the result is a valid utf8 string.

@F001
Copy link
Contributor Author

F001 commented Mar 11, 2017

Also, if the parameter is an usize, the best result type is u8 instead of &str.

@F001
Copy link
Contributor Author

F001 commented Mar 11, 2017

I noticed the implementation of str::as_bytes is based on transmute. But there is no str::as_bytes_mut available, so I used transmute in both functions. If I use str::as_bytes instead, I probably need to write a str::as_bytes_mut but keep it private?

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

Since the parameter can be a range, I assume it represents byte-range instead of chars-range. So, I can't guarantee the result is a valid utf8 string.

Yes, str is always sliced in bytes.

The get and get_mut should return None if either side of the range is not on a character boundary (is_char_boundary helps here)

The get_unchecked and get_unchecked_mut should slice in unchecked way, without caring about how correct the end result is.

All the methods should be available on both StrExt trait and as inherent methods (in impl str).

I noticed the implementation of str::as_bytes is based on transmute. But there is no str::as_bytes_mut available, so I used transmute in both functions. If I use str::as_bytes instead, I probably need to write a str::as_bytes_mut but keep it private?

Oh hmm, amusing oversight in the API. I’m not sure.

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

I do not think we want to allow indexing str with usize. We’ve avoided that so far to avoid people assuming all strings are ASCII and will want to avoid introducing such API in the future as well.

EDIT: so instead of using SliceIndex, we’ll probably want to change it somehow to be able to differentiate between slicing [u8] and str. So, that would involve changing SliceIndex somewhat from what it is now to something along the lines of:

pub trait SliceIndex<T: ?Sized> {
    type Output: ?Sized;
    fn get(self, slice: &T) -> Option<&Self::Output>;
    fn get_mut(self, slice: &mut T) -> Option<&mut Self::Output>;
    unsafe fn get_unchecked(self, slice: &T) -> &Self::Output;
    unsafe fn get_unchecked_mut(self, slice: &mut T) -> &mut Self::Output;
    fn index(self, slice: &T) -> &Self::Output;
    fn index_mut(self, slice: &mut T) -> &mut Self::Output;
}

impl<T> SliceIndex<[T]> for usize { ... }
// etc
impl SliceIndex<str> for RangeFull { ... }
// etc

@F001
Copy link
Contributor Author

F001 commented Mar 11, 2017

I agree we should not allow indexing str with usize, which means we can't reuse SliceIndex here.

I tend to not to add the get and get_mut methods now. The logic that default to None when index is not at boundary seems over designed to me. It seems like we should leave the error handling choice to end users.

@nagisa
Copy link
Member

nagisa commented Mar 11, 2017

I tend to not to add the get and get_mut methods now.

I’m fine with adding these later.

which means we can't reuse SliceIndex here.

Please see my edit to the previous comment.

@alexcrichton
Copy link
Member

Thanks for the PR @F001! I agree with the conclusion so far that reusing the SliceIndex trait probably isn't what we want here, but in general having these methods on str sounds like a great idea. We can probably start with just ranges for now and that'll allow non-fallible methods of indexing a string easily. Note that all the implementations for str will have to concern themselves not only with indices but also UTF-8 -ness (e.g. not landing between long codepoints, etc)

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Mar 14, 2017
@F001
Copy link
Contributor Author

F001 commented Mar 19, 2017

@alexcrichton Thanks for your attention. I was very busy in the past few days and will not have enough time to do this work recently. So I would like to close this PR for now. And hopefully I will come back to work on this issue again.

@nagisa Thanks for your patient guidance. I will close this PR and create a new one later.

@F001 F001 closed this Mar 19, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants