
Add iterator constructors and accessors from/to raw pointers #37921

Closed

Conversation

@bluss
Member

bluss commented Nov 21, 2016

Add a constructor for creating a slice iterator from a pair of start/end pointers. Also add accessors for getting the start/end pointers.

The motivation is that low level code wants to be able to create iterators losslessly from raw pointers, without going through an intermediate slice (computing a length from the pointers, only for .iter() to convert that length back into a pointer pair).
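For context, a minimal sketch (not code from this patch, the helper name and the non-ZST assumption are mine) of the round trip being described: to build an iterator from a pointer pair today, low level code has to derive a length, build a slice, and let .iter() turn that length back into an end pointer.

use std::{mem, slice};

// Hypothetical helper, assuming T is not zero-sized: pointer pair -> length ->
// slice -> iterator, which recomputes the end pointer from the length again.
unsafe fn iter_between<'a, T: 'a>(start: *const T, end: *const T) -> slice::Iter<'a, T> {
    assert!(mem::size_of::<T>() != 0);
    let len = (end as usize - start as usize) / mem::size_of::<T>();
    slice::from_raw_parts(start, len).iter()
}

A raw-parts constructor would instead let such code hand the two pointers to the iterator directly, and read them back out losslessly later.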

@rust-highfive
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss
Member Author

bluss commented Nov 21, 2016

Ok.. Request for comment! Does this need a bigger design to be accepted?

Big topic: The ZST (zero-sized type) special case. It's not actually an important use case, but it's an aspect of the slice iterator anyway. I imagine most users won't have to deal with this case, but when they do, it needs more detailed docs so that it is possible to write correct code.

@BurntSushi added the T-libs-api label (Relevant to the library API team, which will review and decide on the PR/issue) Nov 21, 2016
@BurntSushi
Member

cc @rust-lang/libs

@BurntSushi
Member

This seems reasonable to me. Could you elaborate more on the ZST special case? I don't think I quite grok it.

@bluss
Member Author

bluss commented Nov 21, 2016

Oh yeah, I didn't intend for the ZST case to be understandable from the PR alone. If you look at the implementation of the slice iterator, basically every part of it has different behaviour for ZSTs vs other types.

Regular elements:

  • start, end are pointers that are in bounds or one past the end
  • step using .offset()
  • iterator element is &*ptr (raw pointer as a reference)

ZST:

  • start, end are raw pointers that are used like integers. The space between them (in arithmetic/byte units) is the length of the sequence. The pointers are not actually related to any slice or allocated object.
  • step using wrapping_offset (arith_offset)
  • iterator element is &*0x1 (the address 0x1 cast to a raw pointer and turned into a reference to the ZST)
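To make the difference concrete, here is a heavily simplified sketch of the two next() paths. This is not the actual std implementation (which uses specialization and other internal machinery); the type and field names are made up, and the ZST branch is a runtime check here purely for illustration.

use std::marker::PhantomData;
use std::mem::size_of;

struct RawIter<'a, T: 'a> {
    start: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T: 'a> Iterator for RawIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.start == self.end {
            return None;
        }
        unsafe {
            if size_of::<T>() == 0 {
                // ZST path: the pointers only act as a counter; step by one
                // "byte" with wrapping arithmetic and hand out a reference
                // conjured from address 0x1 (the value is zero-sized anyway).
                self.start = (self.start as *const u8).wrapping_offset(1) as *const T;
                Some(&*(1 as *const T))
            } else {
                // Regular path: the pointers stay in bounds (or one past the
                // end) of a real allocation, so plain offset and a reborrow work.
                let item = &*self.start;
                self.start = self.start.offset(1);
                Some(item)
            }
        }
    }
}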

@Kimundi
Member

Kimundi commented Nov 21, 2016

Looks nice! I mentioned on IRC that I can imagine setters for the pointers being useful, but I just realized that you can just as well reconstruct the whole iterator instead.

@brson
Contributor

brson commented Nov 21, 2016

What low level code wants to do this? The standard library doesn't seem to need it (it's not using it in this patch). It seems like code that needs slice iterators should just create a slice.

@bluss
Member Author

bluss commented Nov 21, 2016

The patch for PR #30740 first used the slice iterator for one part, and raw pointers for the ASCII fast path. Then @dotdash came along and realized that the constant conversion back and forth between slice and iterator led to some losses, and rewrote the function to only use slices (current formulation), which was an improvement. So that's one example where this would be useful.

Second use case is https://docs.rs/odds/0.2.25/odds/slice/unalign/struct.UnalignedIter.html

Right now it uses a custom replacement iterator (fn tail(&self) -> SliceCopyIter<'a, u8>) exactly because these constructors are missing.

What I like here is the abstraction UnalignedIter and that it allows some blocked algorithms in safe code.

It has a neat lossless hand-off from the main iterator to the tail end (the .tail() method). I think I've found that using the same pointer through the whole iteration is more efficient than splitting the slice up front and doing two or three separate iterations. This idea can be used both for the unaligned-load iterator and for a three-way split around an aligned block.

The motivation for UnalignedIter is a safe abstraction for this pseudocode:

// `data: &[u8]` is the input and `block_size` the number of bytes per block.
let mut ptr: *const u8 = data.as_ptr();
// A whole number of blocks from `ptr`:
let end_block = ptr.offset((data.len() - data.len() % block_size) as isize);
let real_end = ptr.offset(data.len() as isize);
while ptr != end_block {
    // load and process one whole block at `ptr`
    ptr = ptr.offset(block_size as isize);
}
while ptr != real_end {
    // process one remaining byte at `ptr`
    ptr = ptr.offset(1);
}

The example usage of UnalignedIter is:

use odds::slice::unalign::UnalignedIter; // path per the docs.rs link above

fn unalign_count_ones(data: &[u8]) -> u32 {
    let mut total = 0;
    let mut iter = UnalignedIter::<u64>::from_slice(data);
    // Whole (unaligned) u64 loads first, then the leftover bytes via .tail().
    total += (&mut iter).map(|x| x.count_ones()).sum::<u32>();
    total += iter.tail().map(|x| x.count_ones()).sum::<u32>();
    total
}

It can also be plugged into the current memchr fallback code to reproduce about the same algorithm, but with a high-level and safe interface (:wink: burntsushi).

@nagisa
Member

nagisa commented Nov 22, 2016

Here’s a question: consider a T of size 1024 bytes. Now somebody provides from_raw_parts with a start pointer of x and an end pointer of x + 512. What happens?

Going through a slice avoids this problem altogether, but allowing the end pointer to be specified may lead to unintended behaviour IMHO.

@bluss
Member Author

bluss commented Nov 22, 2016

@nagisa It's certainly possible to provide bad input to this unsafe function. If I understand your scenario correctly, x + 512 can't be produced by using .offset() without casting to a different pointer type, so it's quite unlikely to happen.

Slices have from_raw_parts too, and you can put in bogus data there too.

@nagisa
Member

nagisa commented Nov 22, 2016

> Slices have from_raw_parts too, and you can put in bogus data there too.

You cannot provide 3.5 as a length for the slice::from_raw_parts.

> x + 512 is not even aligned correctly for T

That’s not true. As a counterexample, the alignment of (u8, ..., u8) is 1 byte for an arbitrary number of u8 fields. As you say, you cannot produce such an invalid pointer using ptr::offset, though.

Perhaps the documentation for the function ought to mention this case; something along the lines of the function only accepting pointers such that end - start is a multiple of size_of::<T>().

@bluss
Member Author

bluss commented Nov 22, 2016

Yep, the alignment part wasn't true.

@bluss
Member Author

bluss commented Nov 22, 2016

For slice::from_raw_parts we pass in a length as an integer instead. So a specific criticism against it would be that it's vulnerable to arithmetic wraparound bugs if you subtract things the wrong way, for example.
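A hypothetical illustration of that pitfall (the helper is made up, not from std or this patch):

use std::slice;

// Rebuild a byte slice from a pointer pair.
unsafe fn between<'a>(start: *const u8, end: *const u8) -> &'a [u8] {
    // Correct order: end - start.
    let len = end as usize - start as usize;
    // Subtracting the other way around (start - end) would wrap to a huge
    // usize in release builds (and panic in debug builds), producing a
    // wildly out-of-bounds slice length.
    slice::from_raw_parts(start, len)
}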

@alexcrichton
Member

My gut here would be to push on slice::from_raw_parts as much as possible which handles things like ZSTs quite nicely. That way we don't have to expand the API which is always nice!

It sounds like LLVM doesn't optimize slice::from_raw_parts(..).iter() though? (relative to construction via the raw methods here). Perhaps that's an LLVM bug we should fix?

Is there extra functionality we want here though like unaligned accesses and such?

@nagisa does have a good point that we currently have a static guarantee that the start/end pointers are aligned, which we could in theory have forgotten to document (although now we could state it explicitly as part of the unsafe guarantee)

@bluss
Member Author

bluss commented Nov 23, 2016

I've made a test case that can serve as a “benchmark” for whether the std::slice::from_raw_parts-to-iterator conversion optimizes well. Current Rust is not "passing" the test.

Playground link

I don't think that's very realistic, but at least it's good to formulate the target.

I would prefer to have access to the raw parts of something as fundamental as the slice iterator, because it is useful for a lot of low level code. I've got a reimplementation of the slice iterator in an external crate with some extra features (which is just as well, third party crates are our experimentation & innovation space). This is also why I can say that it has a useful effect to have the raw parts constructor.

@alexcrichton
Member

@bluss hm, I'm not sure I understand the playground link? The optimized LLVM IR looks optimal in all cases there?

@bluss
Member Author

bluss commented Nov 28, 2016

iter_split uses more instructions than iter_split_raw, even with optimizations. So the conversion through a slice and back is not zero cost.

@alexcrichton
Member

Ah yes, they do differ in what looks like an instruction or two. This still isn't answering my previous question, though. So far we've confirmed that LLVM doesn't optimize this well (which may just be an LLVM bug). What I'm curious about is the motivation for this API beyond "nice to have", e.g.:

> Is there extra functionality we want here though like unaligned accesses and such?

I'm assuming that any performance difference is an LLVM bug that's fixable. Those sorts of problems shouldn't force us to expose internals of the standard library.

I'm also still very worried about the ZST problem, slice::from_raw_parts is a natural solution which doesn't have the pitfalls of this method.

@bluss
Member Author

bluss commented Nov 28, 2016

It does answer the LLVM question that brson and you asked: this fills a niche that is not reached through optimized slice::from_raw_parts.

The slice iterator shouldn't deal with unaligned pointers. std::slice::from_raw_parts has the same precondition that the raw pointer is well aligned. Take any raw pointer API and this will be a precondition unless otherwise stated, IMO. Rust in general makes this assumption itself (the * operator does).

@bluss
Member Author

bluss commented Nov 28, 2016

I don't think this is just nice to have, I've motivated that it's needed to lower overhead of some low level abstractions.

@alexcrichton
Member

Ok so it sounds like we're basically at a point where:

  • This API allows for "more zero cost" construction of a slice iterator given two raw pointers. The alternative is slice::from_raw_parts(..).iter(), but it's unclear whether the difference between the two is an LLVM issue or a more fundamental one.
  • This API doesn't handle ZSTs currently, and it's questionable whether we'd want to stabilize the fashion in which iterators currently handle this case.
  • Currently, there's no motivation for this API beyond more performance.

@bluss I'm curious about your thoughts on the ZST issue? Do you think we can paper over it here and the perf improvements are worth it? I'm also curious, do you think the perf differences here are simply LLVM bugs to fix, or are they perhaps more fundamental?

@bluss
Member Author

bluss commented Nov 30, 2016

I don't know anything definite about whether this optimization can be improved or not; I'm hoping that a contributor with better LLVM knowledge can help answer that. My thought is not to expect too much: it would be about computing the same end pointer in two or more different ways and expecting the compiler to see that they are equal. Reliance on the optimizer is reliance on LLVM; it impacts debug performance and makes code vulnerable to optimization regressions.

This API handles ZSTs correctly in the sense that ZST iterators round-trip correctly: if you get the start/end pointers out and make a new iterator from them, you get the same iterator back. So what can go wrong: creating a ZST iterator manually (use a slice instead), or stepping a raw pointer to a ZST manually (which already needs special handling).

I think it is close to a non-issue, because these functions won't usually be used with ZST elements, but most often with specific concrete types.

> Currently, there's no motivation for this API beyond more performance.

To fulfill Rust's promise of zero-cost abstractions 😄 I think we can make awesome things with this. As shown in a previous comment, I looked at the two consecutive raw-pointer loops and tried to figure out how we can express this in safe Rust. This PR is part of the result, and a needed part.

@bluss
Member Author

bluss commented Nov 30, 2016

cc @ranma42 What do you think about the optimization question? It is a question of std::slice::from_raw_parts(ptr, <length expression>) versus the raw pointer constructors in this PR, and whether better optimization can make the raw pointer constructors redundant.

@ranma42
Contributor

ranma42 commented Nov 30, 2016

I think it should be possible to optimise the computation. Apparently LLVM does not recognise that it is computing the same value twice. It might be possible to improve this on the LLVM side or to find out a different way to express the same values so that the equivalence is proved by the current optimisation pipeline. I will try to experiment with this.

@ranma42
Contributor

ranma42 commented Dec 1, 2016

Gasp, I had a pretty long comment explaining what was going on, but a misclick erased it (I should stop writing long comments in the browser).

Long story short, I worked on a slightly simpler case (u8 instead of u32, see Playpen) and found a case in which LLVM is not properly optimising nested GetElementPtr. Fixing this would make it possible for LLVM to optimise iter_split as well as iter_split_good. I will try to write this optimisation and propose it to upstream LLVM "soon".

Additionally, the version of LLVM currently used by Rust does not yet include the patch from https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264, which would remove the bounds checking (and panic handling) in the iter_split_safe version.

@alexcrichton
Member

Thanks for the investigation @ranma42! To clarify, does "LLVM tip" optimize this the same way? Or is it still missing a "hypothetical optimization" to get these two examples to have equivalent codegen?

@ranma42
Contributor

ranma42 commented Dec 2, 2016

@alexcrichton rust-llvm + https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264 optimises away the bound check from iter_split_safe.

In order to get all of the functions in the simplified example to optimise completely, another patch like the one here is needed. I will try to improve it a little (for example it definitely needs a testcase) before proposing it upstream, but it already passes the LLVM test suite :D

I think that in order to fully optimise the code by @bluss, other patches might be needed. It looks like LLVM does not infer the alignment of the slice pointers (or Rust does not provide this information?) and manually realigns the length by adding an and operation. (This is the reason why I started working on the simpler case of u8 slices, as it already hit other issues.)

@bluss
Member Author

bluss commented Dec 2, 2016

I just want to emphasize that the bounds check wasn't even present in the code I originally asked about, so it's a separate thing.

@ranma42
Contributor

ranma42 commented Dec 2, 2016

@bluss Yes, sorry, I added it in the code I was testing because iter_split_safe looked to me like the most Rust-ic way to express that split, so I wanted to compare the results of that, too. In any case, the GEP optimisation affects the code generated for both iter_split_safe and iter_split.

@alexcrichton
Member

Interesting! So the GEP optimization still doesn't bring the two iter_split functions to be the same? It still favors the one using the head/tail pointers directly?

@ranma42
Contributor

ranma42 commented Dec 3, 2016

No, for the original example (u32), there is still a difference.

It should be possible to optimise away the and in line 26 as it is simply related to some re-alignment of the pointers. That would also cause the add in line 29 to be optimised away.

@ranma42
Contributor

ranma42 commented Dec 5, 2016

LLVM patch for GEP is here

@bluss
Member Author

bluss commented Dec 5, 2016

LLVM improvements are awesome!

@alexcrichton
Member

Awesome, thanks for that patch @ranma42! Do you know if those extra optimizations can be implemented in LLVM, or do they rely on the source?

@ranma42
Contributor

ranma42 commented Dec 13, 2016

Sorry for the late reply. Apparently LLVM has some of the needed optimisations, but they need to be extended. Specifically, it looks like some GEP patterns only match on signed operations. There might be some additional interaction with ptrtoint and bitcast combinations, but apparently it should be possible to get the desired code working only on the LLVM side :D

@bluss
Member Author

bluss commented Dec 13, 2016

Oh, the alignment question is a bit clearer when comparing to UnalignedIter, I realized (see #38352). The latter can't give out elements by reference, only copies. Since the slice iterators yield references, we know (follow-up: do we?) that they can only deal with aligned data.

@cyplo
Contributor

cyplo commented Dec 29, 2016

Would it be possible to cover these with some unit tests or write some compilable examples, or is this possibly covered by some existing tests?

@bluss
Member Author

bluss commented Dec 29, 2016

Sure, they should get the regular kind of unit tests.

@alexcrichton
Member

Oh sorry, this accidentally fell off my radar. I would still be quite uncomfortable stabilizing functions like this, but it seems fine that, if we want to test out the performance, we land unstable versions and see how things go, either by updating LLVM in the meantime or by pushing on APIs like this. Does that sound ok?

@alexcrichton
Member

@bluss thoughts on my most recent comment?

@BurntSushi
Member

I'm somewhat sympathetic to making these functions available because it makes the intended optimization clearer, especially if getting LLVM to optimize it is tricky. But landing them now as unstable seems like a reasonable starting place to me.

@bluss
Member Author

bluss commented Jan 23, 2017

@alexcrichton I'm sorry! I was meaning to answer twice; not sure what thoughts derailed my answers. I have accepted that the two cases make it an API that's hard to use correctly with generic code, and that's not good.

We have Vec::from_raw_parts that's hard to use and often ends in tears for our users, and also CString::from_vec_unchecked; and in neither of these cases does it help that the docs are relatively detailed. (The common error with CString is to assume it can be used like an uninitialized buffer; it can't because from_vec_unchecked reallocates.)

So before this derails again, how do we design an unsafe function that is easy to use? I suspect that leaving a lot unspecified doesn't work (see Vec::from_raw_parts's docs). Clearly the ZST case can catch one unaware. One mitigating factor is that ZST bugs are not really memory safety issues: you can't read or write through a pointer to a ZST (it's a no-op).
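A tiny illustration of that mitigating factor (my example, not from the PR): a "read" of a zero-sized type moves no bytes, so even a dangling (but non-null and aligned) pointer is harmless here.

use std::ptr;

fn main() {
    let dangling = 0x1 as *const ();
    // Reads zero bytes; no memory is actually touched.
    let _unit: () = unsafe { ptr::read(dangling) };
}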

I'm ok with skipping these for now due to the complications. It is a shame that the slice iterator is still implemented with unstable-only fairy dust (specialization and assume intrinsic), but with time Rust users will hopefully have the former and not need the latter.

@alexcrichton
Member

Yeah it's a good point in general, and one that I'm not entirely sure how to solve. Given your thoughts though I'll close this for now (or at least that's my interpretation, feel free to reopen!) but do you want to move discussion over to an issue perhaps?

@bluss deleted the slice-iter-raw-parts branch January 24, 2017 18:43