Make AllocId decoding thread-safe #50957

Closed
wants to merge 4 commits into from

Zoxc
Contributor

@Zoxc Zoxc commented May 22, 2018

This builds on top of #50520.

cc @michaelwoerister
r? @oli-obk

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 22, 2018
@oli-obk
Contributor

oli-obk commented May 22, 2018

Repeating my worries from the other PR:

I am very unsure about [this]. We had a scheme like that when miri was merged, and we kept running into various edge cases with the decoding order. Even creating MCVEs for the panics we were getting was hard, because small changes in the code would change the order of evaluation.

Additionally I don't think we should do this at all, even considering it works now, because it will cause all those bugs again if we allow

let mut foo = Rc::new(RefCell::new(None));
let bar = Rc::new(RefCell::new(Some(foo.clone())));
*foo.borrow_mut() = Some(bar);

within constants. #49172 is a first step in that direction.

Any cyclic pointer structure inside constants will not work with the system proposed in this PR.
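A minimal sketch of such a cycle, written as ordinary runtime code rather than inside a constant. The three-line snippet above relies on type inference, which would need an infinitely nested type, so this version assumes a hypothetical named `Node` type; `Node` and `make_cycle` are illustrative names only.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node type. Naming it avoids the infinitely nested
// `Rc<RefCell<Option<..>>>` type the inferred version would need.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Builds two nodes that point at each other and returns both handles.
fn make_cycle() -> (Rc<Node>, Rc<Node>) {
    let foo = Rc::new(Node { next: RefCell::new(None) });
    let bar = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&foo))) });
    // Closing the loop: foo -> bar -> foo.
    *foo.next.borrow_mut() = Some(Rc::clone(&bar));
    (foo, bar)
}
```

Any encoder that walks `foo` reaches `bar`, which leads straight back to `foo`, so it has to recognise allocations it has already started writing.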

@michaelwoerister
Member

I want to take a closer look at this.

@michaelwoerister
Member

So, I think the difference between this and what we had before the table approach is that AllocKind::Alloc and AllocKind::AllocAtPos are two distinct cases now. That way we never encounter the case where the decoder would have to "skip ahead" when it decodes an already cached allocation.

I think it would also work for circular allocation graphs if we cache the pos -> AllocId mapping before encoding the allocation contents here.

trace!("encoding {:?} with {:#?}", alloc_id, alloc);
AllocKind::Alloc.encode(encoder)?;
alloc.encode(encoder)?;
cache(encoder).insert_same(alloc_id, pos);
Member

insert_same() doesn't seem to be what we want here. If this could be reached in a racy way, then pos would not necessarily be the same. It would also mean that two encoders are writing to the same stream. Something like assert!(cache(encoder).insert(alloc_id, pos).is_none()) seems more appropriate. Correct me if I'm wrong.
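A minimal sketch of the suggested check, assuming a plain `HashMap` in place of the real cache. `AllocId`, `Pos`, and `record_position` are hypothetical stand-ins here, not the types used in rustc.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real `AllocId` and encoder position types.
type AllocId = u64;
type Pos = usize;

// Records where `alloc_id` was written and panics if it was already recorded.
fn record_position(cache: &mut HashMap<AllocId, Pos>, alloc_id: AllocId, pos: Pos) {
    // `HashMap::insert` returns the previous value, so `None` proves that
    // this is the first time the allocation is being encoded.
    assert!(
        cache.insert(alloc_id, pos).is_none(),
        "AllocId {} was encoded twice",
        alloc_id
    );
}
```

Unlike a "same value" insert, this fails loudly on any second insertion, even one that happens to carry the same position.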

@michaelwoerister
Member

OK, so I've reviewed 9d4d3e9 and it looks good to me. Do you still have objections, @oli-obk?

@michaelwoerister
Member

OK, so I've reviewed 9d4d3e9 and it looks good to me.

It looks good to me if we support encoding circular graphs, as noted above, that is...

@oli-obk
Contributor

oli-obk commented May 22, 2018

So, I think the difference between this and what we had before the table approach is that AllocKind::Alloc and AllocKind::AllocAtPos are two distinct cases now. That way we never encounter the case where the decoder would have to "skip ahead" when it decodes an already cached allocation.

This is exactly the same situation we had before, except that AllocAtPos is now discriminant + u32 instead of just u32. The old code simply inlined the AllocAtPos variant into the discriminant.

The case I mentioned will still happen. Imagine the following steps:

  1. You reach an AllocAtPos, so you do a decoder.with_position.
  2. The allocation you are decoding leads you to decode another AllocId via AllocAtPos.
  3. You follow that AllocAtPos, start decoding and reach another AllocId, this time AllocKind::Alloc, and it happens to be the one that step 1 also pointed to.

Since thinking about this tends to fry my brain, I created a google doc illustrating the issue: https://docs.google.com/presentation/d/1AWwnDxuZKZgj1PvWo5mPiwhapmV5h3bjUVn-De-tpKc/edit?usp=sharing

I think it would also work for circular allocation graphs if we cache the pos -> AllocId mapping before encoding the allocation contents here.

While you can pre-cache the AllocId, that doesn't help you here, since you don't know how many bytes you need to skip ahead.

We also cannot encode this skip bytes amount, because at encoding time we don't know how far they are. We could reserve 4 bytes and write back the skip amount later, but that'll get horrible fast.
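The reserve-and-write-back idea can be sketched on a plain byte buffer. This only illustrates the technique being weighed here. `encode_with_skip` and `skip_record` are hypothetical helpers, and the 4-byte little-endian length is an assumption.

```rust
// Reserves 4 bytes, lets the caller write a payload of unknown size, then
// goes back and patches the payload length into the reserved bytes.
fn encode_with_skip(out: &mut Vec<u8>, write_payload: impl FnOnce(&mut Vec<u8>)) {
    let len_at = out.len();
    out.extend_from_slice(&[0u8; 4]); // placeholder for the skip amount
    let start = out.len();
    write_payload(out);
    let skip = (out.len() - start) as u32;
    out[len_at..start].copy_from_slice(&skip.to_le_bytes());
}

// Returns the position just past the record that starts at `pos`,
// without looking at the payload itself.
fn skip_record(data: &[u8], pos: usize) -> usize {
    let mut len = [0u8; 4];
    len.copy_from_slice(&data[pos..pos + 4]);
    pos + 4 + u32::from_le_bytes(len) as usize
}
```

The write-back needs random access to bytes that were already emitted, which is part of why this gets awkward with a streaming encoder.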

That said. I think we should just do this, because as @Zoxc correctly pointed out to me some time ago, we will (in the future) refactor AllocId to be

enum AllocId<'tcx> {
    Static(DefId),
    Function(Instance<'tcx>),
    Local(u64),
}

where Local refers to a constant-local id. This means that constants cannot contain pointers into other constants anymore, which is totally fine, since we can just copy the entire constant's memory. This won't do any actual copying, because the Allocations are interned so we still point to the very same physical memory in RAM, but it'll appear to have a different AllocId in the interpreter.

@oli-obk
Contributor

oli-obk commented May 22, 2018

Oh, that said, yes, please insert loads of sanity checks as @michaelwoerister already pointed out. I'd rather have sensible assertions triggered than really weird decoding errors later in the pipeline.

This mainly means asserting that the return value of any insert or remove operation is the expected one.

@michaelwoerister
Member

michaelwoerister commented May 22, 2018

@oli-obk, I'm wondering if case 3 in your presentation wouldn't just work (although it would decode Alloc(99) twice):

Decode(AtPos(99))             <-- reserve/cache 99
  Decode(Alloc(99))           
    Decode(AtPos(42))         <-- reserve/cache 42
      Decode(Alloc(42))
        Decode(Alloc(99))
          Decode(AtPos(42))   <-- cache hit 42
          Done(Alloc(99))     <-- Alloc(99) interned
        Done(Alloc(42))       <-- Alloc(42) interned
    Done(AtPos(42))           <-- cache[42] = Alloc(42)
  Done(Alloc(99))             <-- Alloc(99) interned (again)
Done(AtPos(99))               <-- cache[99] = Alloc(99)

yes please insert loads of sanity checks

Yes, please! :)

@oli-obk
Contributor

oli-obk commented May 22, 2018

We are also creating real AllocIds, so we'd need to ensure we don't create new ones for the same allocation. And then we need to guarantee that the second interning produces the exact same Allocation.

This might get tricky, especially with multithreading being involved. I'll do another review wrt multithreading

let alloc_type: AllocType<'tcx, &'tcx Allocation> =
    tcx.alloc_map.lock().get(alloc_id).expect("no value for AllocId");
match alloc_type {
    AllocType::Memory(alloc) => {
        if let Some(alloc_pos) = cache(encoder).get(&alloc_id).cloned() {
Contributor

This would need to be an atomic "get or insert" operation in order to prevent two threads that get here at the same time from both trying to encode alloc (I think this is the same as what @michaelwoerister mentioned below)
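Should the cache ever be shared between threads, an atomic "get or insert" could look roughly like the sketch below. It assumes a `Mutex<HashMap>` as the cache. `AllocId`, `Pos`, and `get_or_insert` are hypothetical names, not the PR's code.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical stand-ins for the real id and position types.
type AllocId = u64;
type Pos = usize;

// Returns the stored position for `id` and whether this call inserted it.
// Lookup and insertion happen under one lock, so two threads can never both
// see "missing" and both decide to encode the allocation.
fn get_or_insert(cache: &Mutex<HashMap<AllocId, Pos>>, id: AllocId, pos: Pos) -> (Pos, bool) {
    let mut map = cache.lock().unwrap();
    let mut is_new = false;
    let stored = *map.entry(id).or_insert_with(|| {
        is_new = true;
        pos
    });
    (stored, is_new)
}
```

Only the caller that gets `true` back would go on to encode the allocation. Everyone else emits a back-reference to the stored position.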

Contributor

Has this been addressed?

Contributor Author

This operation is effectively atomic since we have unique ownership of the encoder and the cache. This doesn't matter though as encoding isn't intended to be multithreaded.

            AllocKind::AllocAtPos.encode(encoder)?;
            return encoder.emit_usize(alloc_pos);
        }
        let pos = encoder.position();

@Zoxc
Contributor Author

Zoxc commented May 23, 2018

I think it would also work for circular allocation graphs if we cache the pos -> AllocId mapping before encoding the allocation contents here.

Yes. I've moved the insertion so that it should handle circular allocation graphs there.
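A toy model of why the insertion order matters, under loud assumptions: allocations are reduced to a `HashMap` from id to the ids they point to, the output stream is a `Vec<u64>`, and the `ALLOC` / `ALLOC_AT_POS` tags and the `encode` function are hypothetical simplifications of the real encoder.

```rust
use std::collections::HashMap;

const ALLOC: u64 = 0;        // followed by a count and the nested references
const ALLOC_AT_POS: u64 = 1; // followed by the position of an ALLOC record

// Encodes allocation `id` and everything reachable from it.
fn encode(
    id: u64,
    graph: &HashMap<u64, Vec<u64>>,
    cache: &mut HashMap<u64, usize>,
    out: &mut Vec<u64>,
) {
    if let Some(&pos) = cache.get(&id) {
        // Already written (or currently being written): emit a back-reference.
        out.push(ALLOC_AT_POS);
        out.push(pos as u64);
        return;
    }
    let pos = out.len();
    // Cache the position BEFORE encoding the contents, so that a cycle
    // leading back to `id` hits the cache instead of recursing forever.
    assert!(cache.insert(id, pos).is_none());
    out.push(ALLOC);
    let targets = &graph[&id];
    out.push(targets.len() as u64);
    for &target in targets {
        encode(target, graph, cache, out);
    }
}
```

With the insertion placed after the loop instead, a graph where 1 points to 2 and 2 points back to 1 would never terminate.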

@Zoxc
Contributor Author

Zoxc commented May 23, 2018

There was a possible race condition where one thread would decode an AllocId using the AllocKind::Alloc path and insert the new id in the cache. Another thread could then decode the same AllocId using the AllocAtPos path; it would see the id in the cache and exit, but the first thread might not have finished loading the allocation yet.

I've changed the way decoding works to deal with this. We now have 2 caches. One global and one for the current session. The global cache contains a flag which indicates if the AllocId was partially loaded (it was assigned an id) or fully loaded (the AllocId has an associated Allocation).

This PR does not attempt to make encoding thread-safe, as we currently only encode using a single thread.
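One possible reading of the two-cache scheme, as a rough sketch rather than the PR's actual code. The global cache is modelled as a `Mutex<HashMap>` from stream position to `(id, fully_loaded)`, and the session cache as a `HashSet` of ids this decoding session has already started. `GlobalCache`, `begin_decode`, and `finish_decode` are hypothetical names, and ids are simply handed out in order of first use.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

type AllocId = u64;

// Global cache: stream position -> (assigned id, fully loaded?).
struct GlobalCache {
    entries: Mutex<HashMap<usize, (AllocId, bool)>>,
}

// Returns the id for `pos` and whether the caller must decode the contents.
fn begin_decode(global: &GlobalCache, session: &mut HashSet<AllocId>, pos: usize) -> (AllocId, bool) {
    let mut entries = global.entries.lock().unwrap();
    let next_id = entries.len() as AllocId;
    // Reserve an id that is not yet fully loaded if `pos` is new.
    let (id, fully_loaded) = *entries.entry(pos).or_insert((next_id, false));
    // Skip decoding if the allocation is complete, or if this session is
    // already in the middle of decoding it (a cycle).
    let must_decode = !fully_loaded && session.insert(id);
    (id, must_decode)
}

// Marks the allocation at `pos` as fully loaded once its contents exist.
fn finish_decode(global: &GlobalCache, pos: usize) {
    let mut entries = global.entries.lock().unwrap();
    if let Some(entry) = entries.get_mut(&pos) {
        entry.1 = true;
    }
}
```

In this model a second session that sees a reserved but unfinished id decodes the allocation itself instead of trusting the half-loaded entry, which is the situation the race described above would otherwise hit.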

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

travis_time:start:test_incremental
Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[00:57:33] 
[00:57:33] running 88 tests
tal-verify-ich" "-Z" "incremental-queries" "--error-format" "json" "-Zui-testing" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/issue-49595/issue_49595/a" "-Crpath" "-O" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "query-dep-graph" "--test" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/issue-49595/issue_49595/auxiliary"
[00:57:53] ------------------------------------------
[00:57:53] 
[00:57:53] ------------------------------------------
[00:57:53] stderr:
[00:57:53] stderr:
[00:57:53] ------------------------------------------
[00:57:53] thread 'main' panicked at 'internal error: entered unreachable code', librustc/mir/interpret/value.rs:197:61
[00:57:53] 
[00:57:53] error: internal compiler error: unexpected panic
[00:57:53] 
[00:57:53] 
[00:57:53] note: the compiler unexpectedly panicked. this is a bug.
[00:57:53] 
[00:57:53] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:57:53] note: rustc 1.28.0-dev running on x86_64-unknown-linux-gnu
[00:57:53] 
[00:57:53] 
[00:57:53] note: compiler flags: -Z incremental-verify-ich -Z incremental-queries -Z ui-testing -Z unstable-options -Z query-dep-graph -C incremental -C prefer-dynamic -C rpath
[00:57:53] 
[00:57:53] ------------------------------------------
[00:57:53] 
[00:57:53] thread '[incremental] incremental/issue-49595/issue_49595.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3044:9
---
[00:57:53] 
[00:57:53] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:498:22
[00:57:53] 
[00:57:53] 
[00:57:53] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/incremental" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "incremental" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-3.9/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "3.9.1\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[00:57:53] 
[00:57:53] 
[00:57:53] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[00:57:53] Build completed unsuccessfully in 0:14:59
[00:57:53] Build completed unsuccessfully in 0:14:59
[00:57:53] Makefile:58: recipe for target 'check' failed
[00:57:53] make: *** [check] Error 1

The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:132a10a1
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@michaelwoerister
Member

I've changed the way decoding works to deal with this. We now have 2 caches. [...]

Makes sense.

This PR does not attempt to make encoding thread-safe, as we currently only encode using a single thread.

Yes, at the moment we don't have an encoder that could work concurrently anyway.

The travis error suggests that it's trying to decode from an invalid position somewhere.

@bors
Contributor

bors commented May 23, 2018

☔ The latest upstream changes (presumably #50866) made this pull request unmergeable. Please resolve the merge conflicts.

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:43:26] .....................................................i..............................................
[00:43:30] .........................................................................ii.........................
[00:43:36] ....................................................................................................
[00:43:42] ...................................................................................i................
[00:43:44] .iiiiiiiii...................................................
[00:43:44] 
[00:43:44] travis_fold:start:test_ui_nll
travis_time:start:test_ui_nll
Check compiletest suite=ui mode=ui compare_mode=nll (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
---
[00:44:31] .....................................................i..............................................
[00:44:35] .........................................................................ii.........................
[00:44:40] ....................................................................................................
[00:44:46] ...................................................................................i................
[00:44:48] ..iiiiiiiii..................................................
[00:44:48] 
[00:44:48]  finished in 63.758
[00:44:48] travis_fold:end:test_ui_nll

---
travis_time:start:test_incremental
Check compiletest suite=incremental mode=incremental (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
[00:55:27] 
[00:55:27] running 88 tests
[00:55:47] .......................................................F................................
[00:55:47] thread 'main' panicked at 'Some tests failed', tools/compiletest/src/main.rs:498:22
[00:55:47] 
[00:55:47] ---- [incremental] incremental/issue-49595/issue_49595.rs stdout ----
[00:55:47] 
[00:55:47] 
[00:55:47] error in revision `cfail2`: test compilation failed although it shouldn't!
[00:55:47] status: exit code: 101
[00:55:47] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/incremental/issue-49595/issue_49595.rs" "--target=x86_64-unknown-linux-gnu" "--cfg" "cfail2" "-C" "incremental=/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/issue-49595/issue_49595/issue_49595.inc" "-Z" "incremental-verify-ich" "-Z" "incremental-queries" "--error-format" "json" "-Zui-testing" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/issue-49595/issue_49595/a" "-Crpath" "-O" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "query-dep-graph" "--test" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental/issue-49595/issue_49595/auxiliary"
[00:55:47] ------------------------------------------
[00:55:47] 
[00:55:47] ------------------------------------------
[00:55:47] stderr:
[00:55:47] stderr:
[00:55:47] ------------------------------------------
[00:55:47] thread 'main' panicked at 'internal error: entered unreachable code', librustc/mir/interpret/value.rs:197:61
[00:55:47] 
[00:55:47] error: internal compiler error: unexpected panic
[00:55:47] 
[00:55:47] 
[00:55:47] note: the compiler unexpectedly panicked. this is a bug.
[00:55:47] 
[00:55:47] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:55:47] note: rustc 1.28.0-dev running on x86_64-unknown-linux-gnu
[00:55:47] 
[00:55:47] 
[00:55:47] note: compiler flags: -Z incremental-verify-ich -Z incremental-queries -Z ui-testing -Z unstable-options -Z query-dep-graph -C incremental -C prefer-dynamic -C rpath
[00:55:47] 
[00:55:47] ------------------------------------------
[00:55:47] 
[00:55:47] thread '[incremental] incremental/issue-49595/issue_49595.rs' panicked at 'explicit panic', tools/compiletest/src/runtest.rs:3053:9
---
[00:55:47] test result: FAILED. 87 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out
[00:55:47] 
[00:55:47] 
[00:55:47] 
[00:55:47] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/incremental" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/incremental" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "incremental" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-3.9/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options " "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "3.9.1\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[00:55:47] 
[00:55:47] 
[00:55:47] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[00:55:47] Build completed unsuccessfully in 0:14:31
[00:55:47] Build completed unsuccessfully in 0:14:31
[00:55:47] Makefile:58: recipe for target 'check' failed
[00:55:47] make: *** [check] Error 1
104168 ./obj/build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu
104164 ./obj/build/x86_64-unknown-linux-gnu/stage0-tools/x86_64-unknown-linux-gnu/release
103608 ./obj/build/x86_64-unknown-linux-gnu/stage0/lib/rustlib/x86_64-unknown-linux-gnu/codegen-backends
103228 ./obj/build/bootstrap/debug/incremental/bootstrap-c730863262pt
103228 ./obj/build/bootstrap/debug/incremental/bootstrap-c730863262pt
103224 ./obj/build/bootstrap/debug/incremental/bootstrap-c730863262pt/s-f1c1jm4hc1-16gfdfi-2h3qirbcc1hzj
91892 ./obj/build/x86_64-unknown-linux-gnu/stage1
91868 ./obj/build/x86_64-unknown-linux-gnu/stage1/lib
89804 ./src/llvm/test/CodeGen
89412 ./obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@michaelwoerister
Member

Here's a backtrace for the ICE.

thread 'main' panicked at 'internal error: entered unreachable code', librustc/mir/interpret/value.rs:197:61
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
   1: std::panicking::default_hook::{{closure}}
             at libstd/sys_common/backtrace.rs:71
             at libstd/sys_common/backtrace.rs:59
             at libstd/panicking.rs:211
   2: std::panicking::default_hook
             at libstd/panicking.rs:227
   3: rustc::util::common::panic_hook
             at librustc/util/common.rs:54
   4: std::panicking::rust_panic_with_hook
             at libstd/panicking.rs:467
   5: std::panicking::begin_panic
             at ./src/libstd/panicking.rs:397
   6: serialize::serialize::Decoder::read_enum
             at librustc/mir/interpret/value.rs:197
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/interpret/value.rs:197
             at ./src/libserialize/serialize.rs:168
   7: serialize::serialize::Decoder::read_enum
             at librustc/mir/interpret/value.rs:197
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/mir/interpret/value.rs:15
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/interpret/value.rs:10
             at ./src/libserialize/serialize.rs:168
   8: serialize::serialize::Decoder::read_enum
             at librustc/mir/interpret/value.rs:10
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/middle/const_val.rs:28
             at ./src/libserialize/serialize.rs:175
             at librustc/middle/const_val.rs:25
             at ./src/libserialize/serialize.rs:168
   9: serialize::serialize::Decoder::read_enum
             at librustc/middle/const_val.rs:25
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:205
             at librustc/ty/sty.rs:1765
             at ./src/libserialize/serialize.rs:199
             at librustc/ty/sty.rs:1761
             at librustc/ty/codec.rs:263
             at librustc/ty/codec.rs:403
             at ./src/libserialize/serialize.rs:850
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/mir/mod.rs:1849
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/mod.rs:1846
             at ./src/libserialize/serialize.rs:168
  10: serialize::serialize::Decoder::read_struct
             at librustc/mir/mod.rs:1846
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:205
             at librustc/mir/mod.rs:1840
             at ./src/libserialize/serialize.rs:199
  11: serialize::serialize::Decoder::read_enum
             at librustc/mir/mod.rs:1836
             at ./src/libserialize/serialize.rs:511
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/mir/mod.rs:1525
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/mod.rs:1512
             at ./src/libserialize/serialize.rs:168
  12: serialize::serialize::Decoder::read_enum
             at librustc/mir/mod.rs:1512
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/mir/mod.rs:1570
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/mod.rs:1567
             at ./src/libserialize/serialize.rs:168
  13: serialize::serialize::Decoder::read_enum
             at librustc/mir/mod.rs:1567
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:179
             at librustc/mir/mod.rs:1221
             at ./src/libserialize/serialize.rs:175
             at librustc/mir/mod.rs:1218
             at ./src/libserialize/serialize.rs:168
  14: serialize::serialize::Decoder::read_struct
             at librustc/mir/mod.rs:1218
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:205
             at librustc/mir/mod.rs:1199
             at ./src/libserialize/serialize.rs:199
  15: serialize::serialize::Decoder::read_seq
             at librustc/mir/mod.rs:1196
             at ./src/libserialize/serialize.rs:563
             at ./src/libserialize/serialize.rs:248
             at ./src/libserialize/serialize.rs:563
             at ./src/libserialize/serialize.rs:245
  16: serialize::serialize::Decoder::read_struct
             at ./src/libserialize/serialize.rs:560
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:205
             at librustc/mir/mod.rs:686
             at ./src/libserialize/serialize.rs:199
  17: serialize::serialize::Decoder::read_seq
             at librustc/mir/mod.rs:683
             at ./src/libserialize/serialize.rs:563
             at ./src/libserialize/serialize.rs:248
             at ./src/libserialize/serialize.rs:563
             at ./src/libserialize/serialize.rs:245
  18: <rustc::mir::Mir<'tcx> as serialize::serialize::Decodable>::decode::{{closure}}
             at ./src/libserialize/serialize.rs:560
             at ./src/librustc_data_structures/indexed_vec.rs:350
             at ./src/libcore/ops/function.rs:223
             at ./src/libserialize/serialize.rs:205
             at librustc/mir/mod.rs:79
  19: <rustc::ty::maps::queries::optimized_mir<'tcx> as rustc::ty::maps::config::QueryDescription<'tcx>>::try_load_from_disk
             at ./src/libserialize/serialize.rs:199
             at librustc/mir/mod.rs:75
             at librustc/ty/maps/on_disk_cache.rs:506
             at librustc/ty/maps/on_disk_cache.rs:396
             at librustc/ty/maps/on_disk_cache.rs:342
             at librustc/ty/maps/config.rs:702
  20: rustc::ty::maps::plumbing::<impl rustc::ty::context::TyCtxt<'a, 'gcx, 'tcx>>::get_query
             at librustc/ty/maps/plumbing.rs:440
             at librustc/ty/maps/plumbing.rs:406
             at librustc/ty/maps/plumbing.rs:603
             at librustc/ty/maps/plumbing.rs:610
  21: rustc::ty::maps::plumbing::<impl rustc::dep_graph::dep_node::DepNode>::load_from_on_disk_cache
             at librustc/ty/maps/plumbing.rs:780
             at librustc/ty/maps/plumbing.rs:773
             at librustc/ty/maps/plumbing.rs:1189
  22: rustc::dep_graph::graph::DepGraph::exec_cache_promotions
             at librustc/dep_graph/graph.rs:815
  23: rustc::ty::context::tls::with_context::{{closure}}
             at ./src/librustc/ty/maps/on_disk_cache.rs:203
             at ./src/librustc/dep_graph/graph.rs:166
             at ./src/librustc/ty/context.rs:1725
             at ./src/librustc/ty/context.rs:1666
             at ./src/librustc/ty/context.rs:1724
             at ./src/librustc/dep_graph/graph.rs:165
             at ./src/librustc/ty/context.rs:1770
  24: rustc::util::common::time
             at ./src/librustc/ty/context.rs:1761
             at ./src/librustc/ty/context.rs:1770
             at ./src/librustc/dep_graph/graph.rs:159
             at ./src/librustc/ty/maps/on_disk_cache.rs:173
             at ./src/librustc/ty/context.rs:1340
             at librustc_incremental/persist/save.rs:256
             at ./src/librustc/util/common.rs:166
             at ./src/librustc/util/common.rs:160
  25: rustc_incremental::persist::save::save_in
             at librustc_incremental/persist/save.rs:255
             at librustc_incremental/persist/save.rs:39
             at librustc_incremental/persist/save.rs:120
  26: rustc::util::common::time
             at librustc_incremental/persist/save.rs:37
             at ./src/librustc/util/common.rs:166
             at ./src/librustc/util/common.rs:160
  27: rustc_incremental::persist::save::save_dep_graph
             at librustc_incremental/persist/save.rs:36
             at ./src/librustc/dep_graph/graph.rs:166
             at ./src/librustc/ty/context.rs:1725
             at ./src/librustc/ty/context.rs:1666
             at ./src/librustc/ty/context.rs:1724
             at ./src/librustc/dep_graph/graph.rs:165
             at ./src/librustc/ty/context.rs:1770
             at ./src/librustc/ty/context.rs:1761
             at ./src/librustc/ty/context.rs:1770
             at ./src/librustc/dep_graph/graph.rs:159
             at librustc_incremental/persist/save.rs:30
  28: rustc_codegen_llvm::base::codegen_crate
             at librustc_codegen_llvm/base.rs:955
             at librustc_codegen_llvm/base.rs:946
  29: <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_utils::codegen_backend::CodegenBackend>::codegen_crate
             at librustc_codegen_llvm/lib.rs:204
  30: rustc_driver::driver::phase_4_codegen
             at librustc_driver/driver.rs:1247
             at ./src/librustc/util/common.rs:166
             at ./src/librustc/util/common.rs:160
             at librustc_driver/driver.rs:1247
  31: rustc_driver::driver::compile_input::{{closure}}
             at librustc_driver/driver.rs:317
  32: rustc::ty::context::tls::enter_context
             at librustc_driver/driver.rs:1231
             at ./src/librustc/ty/context.rs:1748
             at ./src/librustc/ty/context.rs:1725
             at ./src/librustc/ty/context.rs:1666
             at ./src/librustc/ty/context.rs:1724
  33: <std::thread::local::LocalKey<T>>::with
             at ./src/librustc/ty/context.rs:1747
             at ./src/librustc/ty/context.rs:1714
             at ./src/libstd/thread/local.rs:294
             at ./src/libstd/thread/local.rs:248
             at ./src/librustc/ty/context.rs:1706
             at ./src/libstd/thread/local.rs:294
             at ./src/libstd/thread/local.rs:248
  34: rustc::ty::context::TyCtxt::create_and_enter
             at ./src/librustc/ty/context.rs:1698
             at ./src/librustc/ty/context.rs:1736
             at ./src/librustc/ty/context.rs:1178
  35: rustc_driver::driver::compile_input
             at librustc_driver/driver.rs:1141
             at librustc_driver/driver.rs:276
  36: rustc_driver::run_compiler_with_pool
             at librustc_driver/lib.rs:551
  37: syntax::with_globals
             at librustc_driver/lib.rs:472
             at librustc_driver/driver.rs:72
             at librustc_driver/lib.rs:471
             at /home/mw/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.1/src/lib.rs:155
             at ./src/libsyntax/lib.rs:96
             at /home/mw/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.1/src/lib.rs:155
             at ./src/libsyntax/lib.rs:95
  38: rustc_driver::monitor::{{closure}}
             at librustc_driver/lib.rs:462
             at librustc_driver/lib.rs:1695
             at librustc_driver/lib.rs:180
             at librustc_driver/lib.rs:1609
  39: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:105
  40: std::panicking::try
             at ./src/libstd/panicking.rs:289
  41: rustc_driver::run
             at ./src/libstd/panic.rs:374
             at librustc_driver/lib.rs:1541
             at librustc_driver/lib.rs:1608
             at librustc_driver/lib.rs:179
  42: rustc_driver::main
             at librustc_driver/lib.rs:1688
  43: std::rt::lang_start::{{closure}}
             at ./src/libstd/rt.rs:74
  44: std::panicking::try::do_call
             at libstd/rt.rs:59
             at libstd/panicking.rs:310
  45: __rust_maybe_catch_panic
             at libpanic_unwind/lib.rs:105
  46: std::panicking::try
             at libstd/panicking.rs:289
  47: std::rt::lang_start_internal
             at libstd/panic.rs:374
             at libstd/rt.rs:58
  48: main
  49: __libc_start_main
  50: _start
``

@michaelwoerister
Member

OK, I think this is the issue @oli-obk has been warning about all along. The code does the following:

  1. It decodes AllocAtPos at 1234, which caches 1234.
  2. Later it directly decodes Alloc at 1234 (where it was embedded the first time) and hits the cache. That leaves the decoder positioned at the start of the allocation, whereas it is expected to be positioned after the allocation.
  3. Decoding the data after the allocation thus fails.
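
The three steps above can be reproduced with a toy decoder. This is only a sketch under assumptions: `Decoder`, its one-byte "allocation" payload, and both method names are hypothetical and far simpler than the real decoder in this PR.

```rust
use std::collections::HashMap;

/// Toy decoder over a byte stream; `pos` is the read cursor.
struct Decoder<'a> {
    data: &'a [u8],
    pos: usize,
    /// Maps the stream position of an allocation to its decoded payload.
    cache: HashMap<usize, u8>,
}

impl<'a> Decoder<'a> {
    fn read_u8(&mut self) -> u8 {
        let b = self.data[self.pos];
        self.pos += 1;
        b
    }

    /// Buggy on purpose: a cache hit returns without advancing the cursor
    /// past the inline bytes of the allocation.
    fn decode_inline_alloc_buggy(&mut self) -> u8 {
        let start = self.pos;
        if let Some(&v) = self.cache.get(&start) {
            return v; // cursor still points at the allocation
        }
        let v = self.read_u8();
        self.cache.insert(start, v);
        v
    }

    /// Models AllocAtPos: jump to `at`, decode, then restore the cursor.
    fn decode_alloc_at_pos(&mut self, at: usize) -> u8 {
        let saved = self.pos;
        self.pos = at;
        let v = self.decode_inline_alloc_buggy();
        self.pos = saved;
        v
    }
}

/// Returns (back-reference value, inline value, byte read after the allocation).
fn demo_position_bug() -> (u8, u8, u8) {
    // Layout: [allocation payload = 7][trailing data = 42]
    let data = [7u8, 42u8];
    let mut d = Decoder { data: &data, pos: 0, cache: HashMap::new() };
    let a = d.decode_alloc_at_pos(0); // step 1: caches position 0
    let b = d.decode_inline_alloc_buggy(); // step 2: cache hit, cursor not advanced
    let c = d.read_u8(); // step 3: reads the payload again, not the trailing 42
    (a, b, c)
}
```

In this sketch the third read returns the allocation payload rather than the trailing byte, which is the same kind of desynchronisation described above.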

@oli-obk
Contributor

oli-obk commented May 24, 2018

Oh right. This has nothing to do with recursive allocs, just with a different order of encoding and decoding (of completely unrelated objects that contain the same AllocId).

@oli-obk
Contributor

oli-obk commented May 24, 2018

Though without recursion you can cache the end position, too, and everything should work. (It did for the old impl; it just died on recursion, because you didn't know the end position yet when you got to the actual allocation.)
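
A minimal sketch of the end-position idea, assuming a hypothetical `Cached` entry and a one-byte payload (neither is taken from the PR). On a cache hit the cursor jumps to the stored end position instead of staying put:

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: decoded payload plus the position just past it.
#[derive(Clone, Copy)]
struct Cached {
    value: u8,
    end_pos: usize,
}

/// Decode a one-byte "allocation" at `*pos`; on a cache hit, skip ahead.
fn decode_inline(data: &[u8], pos: &mut usize, cache: &mut HashMap<usize, Cached>) -> u8 {
    let start = *pos;
    if let Some(hit) = cache.get(&start) {
        *pos = hit.end_pos; // skip the bytes that were already decoded
        return hit.value;
    }
    let value = data[*pos];
    *pos += 1;
    // The end position is only known here, after the body has been decoded.
    cache.insert(start, Cached { value, end_pos: *pos });
    value
}

/// Returns (allocation value, byte found after the allocation).
fn demo_end_pos() -> (u8, u8) {
    let data = [7u8, 42u8];
    let mut cache = HashMap::new();
    // First decode through a back-reference, using a scratch cursor.
    let mut scratch = 0;
    decode_inline(&data, &mut scratch, &mut cache);
    // Then decode inline with the real cursor; the hit advances it.
    let mut pos = 0;
    let alloc = decode_inline(&data, &mut pos, &mut cache);
    (alloc, data[pos])
}
```

The entry is inserted only after the body has been read, which is why a recursive reference encountered while still inside the body would find no end position to skip to.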

@oli-obk
Contributor

oli-obk commented May 24, 2018

Oh that won't work for parallel decoding :(

Any ideas @Zoxc ?

@michaelwoerister
Member

I think we should stick to the table based approach. I'm sure it can be made to work with parallel decoding.

// Create an id which is not fully loaded
(tcx.alloc_map.lock().reserve(), false)
});
if fully_loaded || !local_cache(decoder).insert(alloc_id) {

Contributor

What happens if one thread starts decoding, the next thread takes over the CPU, gets here for the same AllocId, skips over and tries to access the allocation? It'll ICE about uncached alloc or error with dangling pointer, right?

Contributor Author

In the fully_loaded case this isn't a problem, since the AllocId already has an Allocation assigned. For the !local_cache(decoder).insert(alloc_id) case, we know that some stack frame above us will assign an AllocId before the result will be used. Since local_cache is thread-local, another thread won't see the value inserted here. It may instead decode the same allocation in parallel.

Contributor

Ah, neat. Please make this explanation a comment on that if statement
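
The check discussed in this thread can be sketched as follows. All names here (`GlobalCache`, `get_or_reserve`, `can_skip`, a plain `u64` id) are hypothetical stand-ins, and the per-thread set is passed explicitly rather than read from a real thread-local:

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Mutex;

type AllocId = u64;

/// Hypothetical shared state: stream position -> (reserved id, fully loaded?).
#[derive(Default)]
struct GlobalCache {
    by_pos: Mutex<HashMap<usize, (AllocId, bool)>>,
    next_id: Mutex<AllocId>,
}

impl GlobalCache {
    /// Look up the id for `pos`, reserving a fresh, not-yet-loaded id if absent.
    fn get_or_reserve(&self, pos: usize) -> (AllocId, bool) {
        let mut map = self.by_pos.lock().unwrap();
        *map.entry(pos).or_insert_with(|| {
            let mut next = self.next_id.lock().unwrap();
            let id = *next;
            *next += 1;
            (id, false)
        })
    }

    /// Record that the allocation at `pos` has been completely decoded.
    fn mark_loaded(&self, pos: usize) {
        if let Some(entry) = self.by_pos.lock().unwrap().get_mut(&pos) {
            entry.1 = true;
        }
    }
}

/// Returns the id and whether the caller may skip decoding the body:
/// either it is fully loaded, or this same thread is already decoding it
/// further up the stack (`local_in_progress` stands in for the thread-local set).
fn can_skip(
    global: &GlobalCache,
    local_in_progress: &mut HashSet<AllocId>,
    pos: usize,
) -> (AllocId, bool) {
    let (id, fully_loaded) = global.get_or_reserve(pos);
    let skip = fully_loaded || !local_in_progress.insert(id);
    (id, skip)
}
```

With separate sets per thread, a second thread reaching the same position gets the same id but does not skip, so it decodes the allocation itself instead of observing a half-finished one.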

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

[00:23:11]    Compiling syntax_pos v0.0.0 (file:///checkout/src/libsyntax_pos)
[00:23:15]    Compiling rustc_errors v0.0.0 (file:///checkout/src/librustc_errors)
[00:24:15]    Compiling proc_macro v0.0.0 (file:///checkout/src/libproc_macro)
[00:24:26]    Compiling syntax_ext v0.0.0 (file:///checkout/src/libsyntax_ext)
[00:26:18] thread 'main' panicked at 'internal error: entered unreachable code', librustc/mir/interpret/mod.rs:164:10
[00:26:19] 
[00:26:19] error: internal compiler error: unexpected panic
[00:26:19] 
[00:26:19] 
[00:26:19] note: the compiler unexpectedly panicked. this is a bug.
[00:26:19] 
[00:26:19] note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
[00:26:19] note: rustc 1.28.0-dev running on x86_64-unknown-linux-gnu
[00:26:19] 
[00:26:19] 
[00:26:19] note: compiler flags: -Z force-unstable-if-unmarked -C prefer-dynamic -C opt-level=3 -C prefer-dynamic -C debug-assertions=y -C link-args=-Wl,-rpath,$ORIGIN/../lib --crate-type dylib
[00:26:19] 
[00:26:19] note: some of the compiler flags provided by cargo are hidden
[00:26:19] error: Could not compile `rustc`.
[00:26:19] 
[00:26:19] Caused by:
[00:26:19] Caused by:
[00:26:19]   process didn't exit successfully: `/checkout/obj/build/bootstrap/debug/rustc --crate-name rustc librustc/lib.rs --color always --error-format json --crate-type dylib --emit=dep-info,link -C prefer-dynamic -C opt-level=3 -C metadata=62d74f21a64f8c0c -C extra-filename=-62d74f21a64f8c0c --out-dir /checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps --target x86_64-unknown-linux-gnu -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps -L dependency=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/release/deps --extern tempdir=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libtempdir-450feded456a4278.rlib --extern lazy_static=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/liblazy_static-3846f1b0424591fd.rlib --extern jobserver=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libjobserver-3720d8c52a6bc989.rlib --extern graphviz=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libgraphviz-f21bfea456e2feba.so --extern proc_macro=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libproc_macro-5c863390141836fe.so --extern syntax_pos=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libsyntax_pos-70b92be3dfddcce2.so --extern byteorder=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libbyteorder-270afc7a968c2570.rlib --extern flate2=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libflate2-ff77786b985e61bc.rlib --extern syntax=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libsyntax-9dea40d5c994cba1.so --extern 
rustc_errors=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/librustc_errors-e7bbb7d6e0541d97.so --extern rustc_data_structures=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/librustc_data_structures-3762ade15a64029b.so --extern bitflags=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64-unknown-linux-gnu/release/deps/libbitflags-575f47f158b62d9a.rlib --extern log=/checkout/obj/build/x86_64-unknown-linux-gnu/stage1-rustc/x86_64own-linux-gnu/stage0/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-j" "4" "--release" "--locked" "--color" "always" "--features" " jemalloc" "--manifest-path" "/checkout/src/rustc/Cargo.toml" "--message-format" "json"
[00:26:19] expected success, got: exit code: 101
[00:26:19] thread 'main' panicked at 'cargo must succeed', bootstrap/compile.rs:1091:9
[00:26:19] travis_fold:end:stage1-rustc

[00:26:19] travis_time:end:stage1-rustc:start=1527227484793947373,finish=1527227712491800476,duration=227697853103


[00:26:19] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap build
[00:26:19] Build completed unsuccessfully in 0:21:36
[00:26:19] Makefile:28: recipe for target 'all' failed
[00:26:19] make: *** [all] Error 1
70300 ./obj/build/x86_64-unknown-linux-gnu/native/jemalloc
68788 ./src/llvm/lib
65420 ./src/llvm-emscripten/test/CodeGen
61608 ./obj/build/x86_64-unknown-linux-gnu/stage0-rustc/release

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)


// Write placeholder for size
let size_pos = encoder.position();
0usize.encode(encoder)?;

Member

This doesn't work because of variable-length integer encoding.

Contributor

rustc has some similar code elsewhere and works around this by using a 4 byte array that the size is encoded into. Somewhat space-wasteful though.
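
To illustrate the two comments above, here is a sketch that assumes a LEB128-style variable-length integer encoding and a little-endian 4-byte prefix. Both function names are hypothetical and are not the encoder API used in this PR:

```rust
/// Minimal unsigned LEB128 writer: small values take one byte, larger ones more.
fn write_uleb128(out: &mut Vec<u8>, mut v: u64) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80); // continuation bit: more bytes follow
    }
}

/// Encode `payload` behind a fixed-width 4-byte size that is patched afterwards.
fn encode_with_size_prefix(out: &mut Vec<u8>, payload: &[u8]) {
    let size_pos = out.len();
    out.extend_from_slice(&[0u8; 4]); // placeholder, always exactly 4 bytes
    out.extend_from_slice(payload);
    let size = (out.len() - size_pos - 4) as u32;
    // The placeholder and the final value have the same width, so patching is safe.
    out[size_pos..size_pos + 4].copy_from_slice(&size.to_le_bytes());
}
```

A placeholder of `0` written with `write_uleb128` occupies one byte, while a size such as 300 needs two, so overwriting it in place would clobber the first payload byte. The fixed-width prefix avoids that at the cost of always spending 4 bytes.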

Member

It might be simpler to just remember the in the global_cache during decoding?

Contributor

that doesn't work, because we need this value also when another thread hasn't finished decoding the allocation yet.

match AllocKind::decode(decoder)? {
AllocKind::AllocAtPos => {

Contributor

You could use the trick here that I had originally where you read a usize, and that tag is either 0 for Alloc, 1 for Static, 2 for Fn or anything else is the real_pos. This is also used in Ty encoding I think.
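
A sketch of that tag trick, with a hypothetical `AllocTag` enum and the added assumption that real stream positions are never below 3 (otherwise they would collide with the reserved values):

```rust
/// Hypothetical decoded form of the single leading `usize`.
#[derive(Debug, PartialEq)]
enum AllocTag {
    Alloc,
    Static,
    Fn,
    /// Back-reference to an allocation encoded at this stream position.
    AtPos(usize),
}

/// Assumption: positions below this value never occur in the stream.
const FIRST_REAL_POS: usize = 3;

fn encode_tag(tag: &AllocTag) -> usize {
    match *tag {
        AllocTag::Alloc => 0,
        AllocTag::Static => 1,
        AllocTag::Fn => 2,
        AllocTag::AtPos(pos) => {
            assert!(pos >= FIRST_REAL_POS, "position collides with a reserved tag");
            pos
        }
    }
}

fn decode_tag(raw: usize) -> AllocTag {
    match raw {
        0 => AllocTag::Alloc,
        1 => AllocTag::Static,
        2 => AllocTag::Fn,
        pos => AllocTag::AtPos(pos),
    }
}
```

This folds the kind discriminant and the position into one integer, so a back-reference costs a single `usize` instead of a tag followed by a position.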

@oli-obk
Contributor

oli-obk commented May 25, 2018

Before @Zoxc does more work here, we should decide whether

> I think we should stick to the table based approach. I'm sure it can be made to work with parallel decoding.

is an option. It is certainly the more easily grokked option. What exactly is blocking that solution from allowing parallel decoding?

Isn't the table-based solution essentially equivalent to this one, except that it always uses AllocAtPos and has no Alloc variant?

@michaelwoerister
Member

@oli-obk, do you remember why exactly we switched to the table-based approach? Because you already had the skipping implemented but that may not have been sufficient for all cases.

@michaelwoerister
Member

The table-based approach also might work better if we ever want to make the cache updateable in-place. I know that was one of the reasons why we wanted it.

@rust-highfive
Collaborator

The job x86_64-gnu-llvm-3.9 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@bjorn3
Member

bjorn3 commented May 25, 2018

@TimNN @rust-highfive log is empty

@oli-obk
Contributor

oli-obk commented May 25, 2018

> Because you already had the skipping implemented but that may not have been sufficient for all cases.

The skipping failed for recursive cases. That's where we decided to scrap the approach for the table-based version, since iirc that's what we wanted all along.

> The table-based approach also might work better if we ever want to make the cache updateable in-place

Oh yea, that would definitely require an indirection like the one used currently

@oli-obk
Contributor

oli-obk commented May 25, 2018

@Zoxc

[00:24:29] thread 'main' panicked at 'assertion failed: hi >= filemap.original_start_pos && hi <= filemap.original_end_pos', librustc_metadata/decoder.rs:356:9

@michaelwoerister
Member

FYI, I'm looking into an alternative implementation of this right now.

bors added a commit that referenced this pull request May 29, 2018
WIP: Make const decoding thread-safe.

This is an alternative to #50957. It's a proof of concept (e.g. it doesn't adapt metadata decoding, just the incr. comp. cache) but I think it turned out nice. It's rather simple and does not require passing around a bunch of weird closures, like we currently do.

If you (@Zoxc & @oli-obk) think this approach is good then I'm happy to finish and clean this up.

Note: The current version just spins when it encounters an in-progress decoding. I don't have a strong preference for this approach. Decoding concurrently is equally fine by me (or maybe even better because it doesn't require poisoning).

r? @Zoxc
bors added a commit that referenced this pull request Jun 1, 2018
Make const decoding thread-safe.

@bors
Contributor

bors commented Jun 1, 2018

☔ The latest upstream changes (presumably #51060) made this pull request unmergeable. Please resolve the merge conflicts.

@oli-obk
Contributor

oli-obk commented Jun 1, 2018

the alternative to this PR (#51060) has been merged

@oli-obk oli-obk closed this Jun 1, 2018