
Add iterator constructors and accessors from/to raw pointers #37921

Closed

Conversation

@bluss
Member

bluss commented Nov 21, 2016

Add a constructor for creating a slice iterator from a pair of start/end pointers. Also add accessors for getting the start/end pointers.

The motivation is that low level code wants to be able to create iterators losslessly from raw pointers, without going through an intermediate slice (computing a length from the pointers, only for .iter() to convert that length back into a pointer pair).
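For context, a minimal sketch (not code from this patch, the helper name and the non-ZST assumption are mine) of the round trip being described: to build an iterator from a pointer pair today, low level code has to derive a length, build a slice, and let .iter() turn that length back into an end pointer.

use std::{mem, slice};

// Hypothetical helper, assuming T is not zero-sized: pointer pair -> length ->
// slice -> iterator, which recomputes the end pointer from the length again.
unsafe fn iter_between<'a, T: 'a>(start: *const T, end: *const T) -> slice::Iter<'a, T> {
    assert!(mem::size_of::<T>() != 0);
    let len = (end as usize - start as usize) / mem::size_of::<T>();
    slice::from_raw_parts(start, len).iter()
}

A raw-parts constructor would instead let such code hand the two pointers to the iterator directly, and read them back out losslessly later.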

@rust-highfive
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss
Member Author

bluss commented Nov 21, 2016

Ok.. Request for comment! Does this need a bigger design to be accepted?

Big topic: The ZST (zero-sized type) special case. It's not actually an important use case, but it's an aspect of the slice iterator anyway. I imagine most users won't have to deal with this case, but when they do, it needs more detailed docs so that it is possible to write correct code.

@BurntSushi added the T-libs-api label (Relevant to the library API team, which will review and decide on the PR/issue) Nov 21, 2016
@BurntSushi
Member

cc @rust-lang/libs

@BurntSushi
Member

This seems reasonable to me. Could you elaborate more on the ZST special case? I don't think I quite grok it.

@bluss
Member Author

bluss commented Nov 21, 2016

Oh yeah, I didn't intend for the ZST case to be understandable from the PR alone. If you look at the implementation of the slice iterator, basically every part of it has different behaviour for ZSTs vs other types.

Regular elements:

  • start, end are pointers that are in bounds or one past the end
  • step using .offset()
  • iterator element is &*ptr (raw pointer as a reference)

ZST:

  • start, end are raw pointers that are used like integers. The space between them (in arithmetic/byte units) is the length of the sequence. The pointers are not actually related to any slice or allocated object.
  • step using wrapping_offset (arith_offset)
  • iterator element is &*0x1 (the address 0x1 cast to a raw pointer and turned into a reference to the ZST)
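To make the difference concrete, here is a heavily simplified sketch of the two next() paths. This is not the actual std implementation (which uses specialization and other internal machinery); the type and field names are made up, and the ZST branch is a runtime check here purely for illustration.

use std::marker::PhantomData;
use std::mem::size_of;

struct RawIter<'a, T: 'a> {
    start: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

impl<'a, T: 'a> Iterator for RawIter<'a, T> {
    type Item = &'a T;

    fn next(&mut self) -> Option<&'a T> {
        if self.start == self.end {
            return None;
        }
        unsafe {
            if size_of::<T>() == 0 {
                // ZST path: the pointers only act as a counter; step by one
                // "byte" with wrapping arithmetic and hand out a reference
                // conjured from address 0x1 (the value is zero-sized anyway).
                self.start = (self.start as *const u8).wrapping_offset(1) as *const T;
                Some(&*(1 as *const T))
            } else {
                // Regular path: the pointers stay in bounds (or one past the
                // end) of a real allocation, so plain offset and a reborrow work.
                let item = &*self.start;
                self.start = self.start.offset(1);
                Some(item)
            }
        }
    }
}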

@Kimundi
Member

Kimundi commented Nov 21, 2016

Looks nice! I mentioned on IRC that I can imagine setters for the pointers being useful, but I just realized that you can just as well reconstruct the whole iterator instead.

@brson
Contributor

brson commented Nov 21, 2016

What low level code wants to do this? The standard library doesn't seem to need it (it's not using it in this patch). It seems like code that needs slice iterators should just create a slice.

@bluss
Member Author

bluss commented Nov 21, 2016

The patch for PR #30740 first used the slice iterator for one part, and raw pointers for the ASCII fast path. Then @dotdash came along and realized that the constant conversion back and forth between slice and iterator led to some losses, and rewrote the function to only use slices (current formulation), which was an improvement. So that's one example where this would be useful.

Second use case is https://docs.rs/odds/0.2.25/odds/slice/unalign/struct.UnalignedIter.html

Right now it uses a custom replacement iterator (fn tail(&self) -> SliceCopyIter<'a, u8>) exactly because these constructors are missing.

What I like here is the abstraction UnalignedIter and that it allows some blocked algorithms in safe code.

It has a neat lossless hand-off from the main iterator to the tail end (the .tail() method). I think I've found that using the same pointer through the whole iteration is more efficient than splitting the slice up front and doing two or three separate iterations. This idea can be used both for the unaligned-load iterator and for a three-way split around an aligned block.

The motivation for UnalignedIter is a safe abstraction for this pseudocode:

// `data: &[u8]` is the input and `block_size` the number of bytes per block.
let mut ptr: *const u8 = data.as_ptr();
// A whole number of blocks from `ptr`:
let end_block = ptr.offset((data.len() - data.len() % block_size) as isize);
let real_end = ptr.offset(data.len() as isize);
while ptr != end_block {
    // load and process one whole block at `ptr`
    ptr = ptr.offset(block_size as isize);
}
while ptr != real_end {
    // process one remaining byte at `ptr`
    ptr = ptr.offset(1);
}

The example usage of UnalignedIter is:

use odds::slice::unalign::UnalignedIter; // path per the docs.rs link above

fn unalign_count_ones(data: &[u8]) -> u32 {
    let mut total = 0;
    let mut iter = UnalignedIter::<u64>::from_slice(data);
    // Whole (unaligned) u64 loads first, then the leftover bytes via .tail().
    total += (&mut iter).map(|x| x.count_ones()).sum::<u32>();
    total += iter.tail().map(|x| x.count_ones()).sum::<u32>();
    total
}

It can also be plugged into the current memchr fallback code to reproduce about the same algorithm, but with a high-level and safe interface (:wink: burntsushi).

@nagisa
Member

nagisa commented Nov 22, 2016

Here’s a question: consider a T of size 1024 bytes. Now somebody provides from_raw_parts with a start pointer of x and an end pointer of x + 512. What happens?

Going through a slice avoids this problem altogether, but allowing the end pointer to be specified may lead to unintended behaviour IMHO.

@bluss
Member Author

bluss commented Nov 22, 2016

@nagisa It's certainly possible to provide bad input to this unsafe function. If I understand your scenario correctly, x + 512 can't be produced by using .offset() without casting to a different pointer type, so it's quite unlikely to happen.

Slices have from_raw_parts too, and you can put in bogus data there too.

@nagisa
Member

nagisa commented Nov 22, 2016

> Slices have from_raw_parts too, and you can put in bogus data there too.

You cannot provide 3.5 as a length for the slice::from_raw_parts.

> x + 512 is not even aligned correctly for T

That’s not true. As a counterexample, the alignment of (u8, ..., u8) is 1 byte for an arbitrary number of u8 fields. As you say, you cannot produce such an invalid pointer using ptr::offset, though.

Perhaps the documentation for the function ought to mention this case; something along the lines of the function only accepting pointers such that end - start is a multiple of size_of::<T>().

@bluss
Member Author

bluss commented Nov 22, 2016

Yep, the alignment part wasn't true.

@bluss
Member Author

bluss commented Nov 22, 2016

For slice::from_raw_parts we pass in a length as an integer instead. So a specific criticism against it would be that it's vulnerable to arithmetic wraparound bugs if you subtract things the wrong way, for example.
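A hypothetical illustration of that pitfall (the helper is made up, not from std or this patch):

use std::slice;

// Rebuild a byte slice from a pointer pair.
unsafe fn between<'a>(start: *const u8, end: *const u8) -> &'a [u8] {
    // Correct order: end - start.
    let len = end as usize - start as usize;
    // Subtracting the other way around (start - end) would wrap to a huge
    // usize in release builds (and panic in debug builds), producing a
    // wildly out-of-bounds slice length.
    slice::from_raw_parts(start, len)
}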

@alexcrichton
Member

My gut here would be to push on slice::from_raw_parts as much as possible which handles things like ZSTs quite nicely. That way we don't have to expand the API which is always nice!

It sounds like LLVM doesn't optimize slice::from_raw_parts(..).iter() though? (relative to construction via the raw methods here). Perhaps that's an LLVM bug we should fix?

Is there extra functionality we want here though like unaligned accesses and such?

@nagisa does have a good point that we currently have a static guarantee that the start/end pointers are aligned, which we could in theory have forgotten to document (although now we could state it explicitly as part of the unsafe guarantee)

@bluss
Member Author

bluss commented Nov 23, 2016

I've made a test case that can serve as a “benchmark” for whether the std::slice::from_raw_parts-to-iterator conversion optimizes well. Current Rust is not "passing" the test.

Playground link

I don't think that's very realistic, but at least it's good to formulate the target.

I would prefer to have access to the raw parts of something as fundamental as the slice iterator, because it is useful for a lot of low level code. I've got a reimplementation of the slice iterator in an external crate with some extra features (which is just as well, third party crates are our experimentation & innovation space). This is also why I can say that it has a useful effect to have the raw parts constructor.

@alexcrichton
Member

@bluss hm, I'm not sure I understand the playground link? The optimized LLVM IR looks optimal in all cases there?

@bluss
Member Author

bluss commented Nov 28, 2016

iter_split uses more instructions than iter_split_raw, even with optimizations. So the conversion through a slice and back is not zero cost.

@alexcrichton
Member

Ah yes, they do differ in what looks like an instruction or two. This still isn't answering my previous question, though. So far we've confirmed that LLVM doesn't optimize this well (which may just be an LLVM bug). What I'm curious about is the motivation for this API beyond "nice to have", e.g.:

> Is there extra functionality we want here though like unaligned accesses and such?

I'm assuming that any performance difference is an LLVM bug that's fixable. Those sorts of problems shouldn't force us to expose internals of the standard library.

I'm also still very worried about the ZST problem, slice::from_raw_parts is a natural solution which doesn't have the pitfalls of this method.

@bluss
Member Author

bluss commented Nov 28, 2016

It does answer the LLVM question that brson and you asked: this fills a niche that is not reached through optimized slice::from_raw_parts.

The slice iterator shouldn't deal with unaligned pointers. std::slice::from_raw_parts has the same precondition that the raw pointer is well aligned. Take any raw pointer API and this will be a precondition unless otherwise stated, IMO. Rust in general makes this assumption itself (the * operator does).

@bluss
Member Author

bluss commented Nov 28, 2016

I don't think this is just nice to have, I've motivated that it's needed to lower overhead of some low level abstractions.

@alexcrichton
Member

Ok so it sounds like we're basically at a point where:

  • This API allows for "more zero cost" construction of a slice iterator given two raw pointers. The alternative is slice::from_raw_parts(..).iter(), but it's unclear whether the difference between the two is an LLVM issue or a more fundamental one.
  • This API doesn't handle ZSTs currently, and it's questionable whether we'd want to stabilize the fashion in which iterators currently handle this case.
  • Currently, there's no motivation for this API beyond more performance.

@bluss I'm curious about your thoughts on the ZST issue? Do you think we can paper over it here and the perf improvements are worth it? I'm also curious, do you think the perf differences here are simply LLVM bugs to fix, or are they perhaps more fundamental?

@bluss
Member Author

bluss commented Nov 30, 2016

I don't know anything definite about whether this optimization can be improved or not; I'm hoping that a contributor with better LLVM knowledge can help answer that. My thought is not to expect too much: it would be about computing the same end pointer in two or more different ways and expecting the compiler to see that they are equal. Reliance on the optimizer is reliance on LLVM; it impacts debug performance and makes code vulnerable to optimization regressions.

This API handles ZSTs correctly in the sense that ZST iterators round-trip correctly: if you get the start/end pointers out and make a new iterator from them, you get the same iterator back. So what can go wrong: creating a ZST iterator manually (use a slice instead), or stepping a raw pointer to a ZST manually (which already needs special handling).

I think it is close to a non-issue, because these functions won't usually be used with ZST elements, but most often with specific concrete types.

> Currently, there's no motivation for this API beyond more performance.

To fulfill Rust's promise of zero-cost abstractions 😄 I think we can make awesome things with this. As shown in a previous comment, I looked at the two consecutive raw-pointer loops and tried to figure out how we can express this in safe Rust. This PR is part of the result, and a needed part.

@bluss
Member Author

bluss commented Nov 30, 2016

cc @ranma42 What do you think about the optimization question? It is a question of std::slice::from_raw_parts(ptr, <length expression>) versus the raw pointer constructors in this PR, and whether better optimization can make the raw pointer constructors redundant.

@ranma42
Contributor

ranma42 commented Nov 30, 2016

I think it should be possible to optimise the computation. Apparently LLVM does not recognise that it is computing the same value twice. It might be possible to improve this on the LLVM side or to find out a different way to express the same values so that the equivalence is proved by the current optimisation pipeline. I will try to experiment with this.

@ranma42
Contributor

ranma42 commented Dec 1, 2016

Gasp, I had a pretty long comment explaining what was going on, but a misclick erased it (I should stop writing long comments in the browser).

Long story short, I worked on a slightly simpler case (u8 instead of u32, see Playpen) and found a case in which LLVM is not properly optimising nested GetElementPtr. Fixing this would make it possible for LLVM to optimise iter_split as well as iter_split_good. I will try to write this optimisation and propose it to upstream LLVM "soon".

Additionally, the version of LLVM currently used by Rust does not yet include the patch from https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264, which would remove the bounds checking (and panic handling) in the iter_split_safe version.

@alexcrichton
Member

Thanks for the investigation @ranma42! To clarify, does "LLVM tip" optimize this the same way? Or is it still missing a "hypothetical optimization" to get these two examples to have equivalent codegen?

@ranma42
Contributor

ranma42 commented Dec 2, 2016

@alexcrichton rust-llvm + https://github.com/llvm-project/llvm/commit/cf6e9a81f676b3e3885f86af704e834ef5c04264 optimises away the bound check from iter_split_safe.

In order to get all of the functions in the simplified example to optimise completely, another patch like the one here is needed. I will try to improve it a little (for example it definitely needs a testcase) before proposing it upstream, but it already passes the LLVM test suite :D

I think that in order to fully optimise the code by @bluss, other patches might be needed. It looks like LLVM does not infer the alignment of the slice pointers (or Rust does not provide this information?) and manually realigns the length by adding an and operation. (This is the reason why I started working on the simpler case of u8 slices, as it already hit other issues.)

@bluss
Member Author

bluss commented Dec 2, 2016

I just want to emphasize that the bounds check wasn't even present in the code I originally asked about, so it's a separate thing.

@ranma42
Contributor

ranma42 commented Dec 2, 2016

@bluss Yes, sorry, I added it in the code I was testing because iter_split_safe looked to me like the most Rust-ic way to express that split, so I wanted to compare the results of that, too. In any case, the GEP optimisation affects the code generated for both iter_split_safe and iter_split.

@alexcrichton
Member

Interesting! So the GEP optimization still doesn't bring the two iter_split functions to be the same? It still favors the one using the head/tail pointers directly?

@ranma42
Contributor

ranma42 commented Dec 3, 2016

No, for the original example (u32), there is still a difference.

It should be possible to optimise away the and in line 26 as it is simply related to some re-alignment of the pointers. That would also cause the add in line 29 to be optimised away.

@ranma42
Contributor

ranma42 commented Dec 5, 2016

LLVM patch for GEP is here

@bluss
Member Author

bluss commented Dec 5, 2016

LLVM improvements are awesome!

@alexcrichton
Member

Awesome, thanks for that patch @ranma42! Do you know if those extra optimizations can be implemented in LLVM, or do they rely on the source?

@ranma42
Contributor

ranma42 commented Dec 13, 2016

Sorry for the late reply. Apparently LLVM has some of the needed optimisations, but they need to be extended. Specifically, it looks like some GEP patterns only match on signed operations. There might be some additional interaction with ptrtoint and bitcast combinations, but apparently it should be possible to get the desired code working only on the LLVM side :D

@bluss
Member Author

bluss commented Dec 13, 2016

Oh, the alignment question is a bit clearer when comparing to UnalignedIter, I realized (see #38352). The latter can't give out elements by reference, only copies. Since the slice iterators yield references, we know (follow-up: do we?) that they can only deal with aligned data.

@cyplo
Contributor

cyplo commented Dec 29, 2016

Would it be possible to cover these with some unit tests or write some compilable examples, or is this possibly covered by some existing tests?

@bluss
Member Author

bluss commented Dec 29, 2016

Sure, they should get the regular kind of unit tests.

@alexcrichton
Member

Oh sorry, this accidentally fell off my radar. I would still be quite uncomfortable stabilizing functions like this, but it seems fine that, if we want to test out the performance, we land unstable versions and see how things go, either by updating LLVM in the meantime or by pushing on APIs like this. Does that sound ok?

@alexcrichton
Member

@bluss thoughts on my most recent comment?

@BurntSushi
Member

I'm somewhat sympathetic to making these functions available because it makes the intended optimization clearer, especially if getting LLVM to optimize it is tricky. But landing them now as unstable seems like a reasonable starting place to me.

@bluss
Member Author

bluss commented Jan 23, 2017

@alexcrichton I'm sorry! I was meaning to answer twice; not sure what thoughts derailed my answers. I have accepted that the two cases make it an API that's hard to use correctly with generic code, and that's not good.

We have Vec::from_raw_parts that's hard to use and often ends in tears for our users, and also CString::from_vec_unchecked; and in neither of these cases does it help that the docs are relatively detailed. (The common error with CString is to assume it can be used like an uninitialized buffer; it can't because from_vec_unchecked reallocates.)

So before this derails again, how do we design an unsafe function that is easy to use? I suspect that leaving a lot unspecified doesn't work (see Vec::from_raw_parts's docs). Clearly the ZST case can catch one unaware. One mitigating factor is that ZST bugs are not really memory safety issues: you can't read or write through a pointer to a ZST (it's a no-op).
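A tiny illustration of that mitigating factor (my example, not from the PR): a "read" of a zero-sized type moves no bytes, so even a dangling (but non-null and aligned) pointer is harmless here.

use std::ptr;

fn main() {
    let dangling = 0x1 as *const ();
    // Reads zero bytes; no memory is actually touched.
    let _unit: () = unsafe { ptr::read(dangling) };
}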

I'm ok with skipping these for now due to the complications. It is a shame that the slice iterator is still implemented with unstable-only fairy dust (specialization and assume intrinsic), but with time Rust users will hopefully have the former and not need the latter.

@alexcrichton
Member

Yeah it's a good point in general, and one that I'm not entirely sure how to solve. Given your thoughts though I'll close this for now (or at least that's my interpretation, feel free to reopen!) but do you want to move discussion over to an issue perhaps?

@bluss deleted the slice-iter-raw-parts branch January 24, 2017 18:43