-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible builds on Linux #3043
Comments
@joyeecheung do you have any thoughts/suggestions on the snapshot front? @sxa I assume that it should be possible to update |
That would be my assumption yes. The previous issues and PRs suggest this was resolved, so we'd just need to understand what happened and potentially reimplement a fix for it. |
Pinging @warpfork for this thread. He's currently heads-down on building Warpforge and has a passion for reproducible builds and I was telling him that there's interest in making progress on this front for Node.js so there might be some potential for collaboration and the production of some tooling to both generate reproducible builds and also document & provide mechanisms for people to do it for themselves without all the pain that usually comes from doing it manually. So @sxa, meet the most excellent and clever @warpfork; maybe you two should chat. |
Thanks @rvagg - I've worked on reproducible build environments for another project, and the Node.js build is close other than those two issues above, so it should just require a little bit of knowledge about the Node processes involved in producing those two things to be able to resolve it, then we can possibly put some more robust things in place for making our builds reproducible by default, which I think should be the aim here. @warpfork DO you have much experience on non-Linux reproducible build work (I've so far only looked at Linux for Node.js) |
One source of indeterminism is here: Instead of a set a dict can be used here (with |
Is there a todo on this issue or can it be closed? |
Definitely still work to do and I still plan to look at it. |
Just tried four consecutive builds with the latest code and the This was tested with With snapshots enabled the builds are still non-reproducible. due to |
Looks like the break was between these two commits, so nodejs/node@1faf6f459f likely made it non-reproducible again.
FYI @joyeecheung (PR) @bnoordhuis (PR) I don't think I have enough knowledge of this code to be able to propose a safe solution here so looking for advice. |
@joyeecheung Would you be able to assist with the reproducibility of the snapshots now? I'm not sure who else might have good knowledge of the snapshot support in Node so anyone else might have to start from scratch. |
Thanks for the ping, not sure how I missed the earlier one. I diff'ed the generated snapshot locally and found ~20 lines of differences in the snapshot.cc generated (with |
With this patch the snapshot.cc difference is down to 13 lines. Still need to figure out the differences in the blob though. see diffdiff --git a/src/cleanup_queue-inl.h b/src/cleanup_queue-inl.h
index 5d9a56e6b0..d1fbd8241d 100644
--- a/src/cleanup_queue-inl.h
+++ b/src/cleanup_queue-inl.h
@@ -39,7 +39,9 @@ void CleanupQueue::Remove(Callback cb, void* arg) {
template <typename T>
void CleanupQueue::ForEachBaseObject(T&& iterator) const {
- for (const auto& hook : cleanup_hooks_) {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
+ for (const auto& hook : callbacks) {
BaseObject* obj = GetBaseObject(hook);
if (obj != nullptr) iterator(obj);
}
diff --git a/src/cleanup_queue.cc b/src/cleanup_queue.cc
index 6290b6796c..c0fcda2fac 100644
--- a/src/cleanup_queue.cc
+++ b/src/cleanup_queue.cc
@@ -5,7 +5,7 @@
namespace node {
-void CleanupQueue::Drain() {
+std::vector<CleanupQueue::CleanupHookCallback> CleanupQueue::GetOrdered() const {
// Copy into a vector, since we can't sort an unordered_set in-place.
std::vector<CleanupHookCallback> callbacks(cleanup_hooks_.begin(),
cleanup_hooks_.end());
@@ -20,6 +20,12 @@ void CleanupQueue::Drain() {
return a.insertion_order_counter_ > b.insertion_order_counter_;
});
+ return callbacks;
+}
+
+void CleanupQueue::Drain() {
+ std::vector<CleanupHookCallback> callbacks = GetOrdered();
+
for (const CleanupHookCallback& cb : callbacks) {
if (cleanup_hooks_.count(cb) == 0) {
// This hook was removed from the `cleanup_hooks_` set during another
diff --git a/src/cleanup_queue.h b/src/cleanup_queue.h
index 64e04e1856..2ca333aca8 100644
--- a/src/cleanup_queue.h
+++ b/src/cleanup_queue.h
@@ -6,6 +6,7 @@
#include <cstddef>
#include <cstdint>
#include <unordered_set>
+#include <vector>
#include "memory_tracker.h"
@@ -66,6 +67,7 @@ class CleanupQueue : public MemoryRetainer {
uint64_t insertion_order_counter_;
};
+ std::vector<CleanupHookCallback> GetOrdered() const;
inline BaseObject* GetBaseObject(const CleanupHookCallback& callback) const;
// Use an unordered_set, so that we have efficient insertion and removal.
diff --git a/tools/snapshot/node_mksnapshot.cc b/tools/snapshot/node_mksnapshot.cc
index ecc295acdb..2ba6878a28 100644
--- a/tools/snapshot/node_mksnapshot.cc
+++ b/tools/snapshot/node_mksnapshot.cc
@@ -52,6 +52,7 @@ int main(int argc, char* argv[]) {
#endif // _WIN32
v8::V8::SetFlagsFromString("--random_seed=42");
+ v8::V8::SetFlagsFromString("--predictable");
v8::V8::SetFlagsFromString("--harmony-import-assertions");
return BuildSnapshot(argc, argv);
} |
Sounds like good progress - thanks @joyeecheung! |
Checking the snapshots again, another source of indeterminism comes from performance data (milestones, time origin etc.), the easiest way to fix it is probably discarding it before snapshot generation. But I'll need to check if/how they should be synchronized. |
With nodejs/node#48702 and nodejs/node#48708 and --predictable the differences are down to 8 places:
EDIT: we can just return some non-empty data for both BaseObject slots to fix 1. 2 probably need a V8 patch for us to customize how the slots should be serialized. Trying to work out a prototype. |
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we just rely on the unordered_set order to iterate over the BaseObjects, which is not deterministic. The iteration is only used in printing, verification, and snapshot generation. In the first two cases the performance overhead of sorting does not matter because they are only used for debugging. In the last case the determinism is more important than the trivial overhead of sorting. So this patch makes the iteration deterministic by sorting the set first, as what is already being done when we drain the queue. PR-URL: #48702 Refs: nodejs/build#3043 Reviewed-By: Tobias Nießen <tniessen@tnie.de> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: #48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Previously we cache the time origin for the milestones in the user land, and refresh it at pre-execution. As result the time origin gets serialized into the snapshot and is therefore not deterministic. Now we store it in the milestone array as an internal value and reset the milestones at serialization time instead of deserialization time. This improves the determinism of the snapshot. Drive-by: remove the unused MarkMilestone() binding. PR-URL: nodejs#48708 Refs: nodejs/build#3043 Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
I've put https://ci.nodejs.org/job/reproducibility-test/ in place to test how reproducible the builds are on Linux/aarch64 - it will run weekly:
|
Thanks, I think when nodejs/node#48851 lands, we could also add another flag to display the |
Yeah that's probably more useful for the future :-) |
that was 18.18.2 |
Without nodejs/node#50983 The snapshot would not be reproducible. The V8 patch just landed last week in the upstream, I plan to backport it soon once I fix a small regression the patch caused. |
Just tried 18.18.2 with:
Here is the full Containerfile if anyone wants to test.
Still can't get a match, though I would note I am building against musl rather than glibc as with nix.
|
Saw some diffs were ICU related so I also just tried with |
I would suggest that with all of the configure options you're using (plus using a musl instead of glibc) it may take a bit of work to identify the source of your problems from this. My tests were on Fedora 39 with gcc 13.2.1 but I've used earlier compilers when doing this in the past with the same results, and I have a CI job that does a regular check of the tip with gcc 9.3.0 on Ubuntu 20.04 which still seems to be working ok. 18.18.2 didn't produce identical builds for me, although
That's great news @joyeecheung! Sounds like we're really close now. |
Okay so when I build 20.11.1 twice with only
Will try again with latest tip and see if that one difference is corrected and share if not. I was able to conclude the big diffs I provided above, were only from cases where I am doing parallel builds with identical containerized build environments except for the CPU. In one case I am building with 20 cores of a Threadripper 2990WX 32 core, and in another I was building with ~all 32 cores of a AMD EPYC 7502P 32 core. In stagex (the deterministic Linux distro we are trying to add nodejs to) we have seen this behavior a few times in other packages, and was always because the compilation system was either making optimizations based on the CPU sub-architecture, or embedding information about the CPU or build environment into early headers. In either case we end up with a huge shift in headers as per my diffs above. Anyone more familiar with nodejs building have ideas on how I can maybe hardcode away any CPU-specific behavior or metadata beyond "generic x86_64"? |
Yeah I get different output on different CPU types too (Most notably in the |
Interesting. My biggest differences above also all seemed ICU related. Did you try comparing with using system icu? |
No I haven't tried that |
I landed the regression fix for my V8 patch, looking into backporting the V8 patches and rebasing nodejs/node#50983 - with the patches the snapshot part should be reproducible, though it seems there are some bits beyond snapshots that are not reproducible. (It seems #3043 (comment) matches my findings in #3043 (comment), I was using Ubuntu (don't remember the version now)) |
To improve determinism of snapshot generation, add --predictable to the V8 flags used to initialize a process launched to generate snapshot. Also add a kGeneratePredictableSnapshot flag to ProcessInitializationFlags for this and moves the configuration of these flags into node::InitializeOncePerProcess() so that it can be shared by embedders. PR-URL: nodejs#48749 Refs: nodejs/build#3043 Reviewed-By: Richard Lau <rlau@redhat.com> Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com> Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
In nodejs/node#50983 the snapshots are made reproducible with a test that diffs the snapshots passing in the CI. Would need some reviews to push it forward though. |
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
With nodejs/node#50983 landed on Ubuntu I am able to get identical builds with |
This is an awesome result @joyeecheung! Thank you! I've just done a test on Linux/aarch64 and confirmed that two consecutive builds on Ubuntu 22.04 come out identical on the same machine (I was running without |
From a quick glance, the job needs to be using a newer compiler than gcc 9.4.0. |
Yep - sorted and the build is now good. I've also verified that it's reproducible whether or not you use @joyeecheung Would it be possible to put these fixes back to earlier Node versions, or will the V8 levels in those not be easy to convert? It'll be a good story for the next major release regardless though! |
For v22.x the changes should land cleanly or don't need too much work to backport. For 20.x, the necessary V8 API changes would be ABI breaking although I think it can still be backportable by either backporing the V8 API changes in a non-ABI breaking way (adding new signatures instead of appending parameters), or nulling out the pointers on our end before snapshot serialization (might be easier that way anyway?). For 18.x that might be harder since it's already quite different. |
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: #50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
For pointer values in the context data, we need to return non-empty data in the serializer so that V8 does not serialize them verbatim, making the snapshot unreproducible. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
To make the snapshots reproducible, this patch updates the size of a few types and adds some static assertions to ensure that there are no padding in the memcpy-ed structs. PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
- Print offsets in blob serializer - Add a special node:generate_default_snapshot ID to generate the built-in snapshot. - Improve logging - Add a test to check the reproducibilty of the snapshot PR-URL: nodejs#50983 Refs: nodejs/build#3043 Reviewed-By: Daniel Lemire <daniel@lemire.me> Reviewed-By: James M Snell <jasnell@gmail.com>
In the dim and distant past before global lockdowns there was some previous work to make the build process binary reproducible.
I had a play recently with the process (on Linux/aarch64 because they are the quickest machines I have access to) and I hit two issues:
debug-support.cc
is nondeterministic - the definitions are not always in the same order within the file. It is generated by https://github.com/nodejs/node/blob/main/deps/v8/tools/gen-postmortem-metadata.py - If I copy in a consistent one the builds can be reproducible.configure --without-node-snapshot
), which suggests that the changes made in node_code_cache.cc and node_snapshot.cc generation is unreproducible node#29108 are no longer valid for the current release.Tagging @ChALkeR who put the original PR in (albeit three years ago!)
The text was updated successfully, but these errors were encountered: