nixos: fix some remaining consequences of network-online.target dep fix #282795
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,7 +28,7 @@ import ./make-test-python.nix ({ pkgs, lib, ... }: { | |
}; | ||
group.root.exists = true; | ||
kernel-param."kernel.ostype".value = "Linux"; | ||
service.goss = { | ||
service."systemd-journald" = { | ||
Comment: Why the change from goss to systemd-journald? Reply: The failure takes the form of goss reporting that some service is not running. journald is started very early in boot and is definitely enabled, so checking it instead was an attempt to make the test more deterministic, in case systemd did not acknowledge goss as running. Nevertheless, the test still fails without the sleep. I cannot explain this: the monitor here simply invokes systemctl, and I have no idea why that does not work early on or why it fixes itself later in boot. This change and the delay should be enough to stop the failure, but I do not know why it was happening in the first place. See https://github.com/goss-org/goss/blob/4e36e8fb52d999e418be7d72f14bad1bfbd65737/system/service_systemd.go#L75 AFAICT this is not my regression; at best it exposes a test that was already flaky for bizarre reasons. |
||
enabled = true; | ||
running = true; | ||
}; | ||
|
@@ -43,8 +43,14 @@ import ./make-test-python.nix ({ pkgs, lib, ... }: { | |
machine.wait_for_unit("goss.service") | ||
machine.wait_for_open_port(8080) | ||
|
||
# due to incomprehensible race conditions, somehow goss fails to get | ||
# answers out of systemctl about systemd-journald being up, despite this | ||
# being more or less impossible. | ||
machine.sleep(5) | ||
|
||
with subtest("returns health status"): | ||
result = json.loads(machine.succeed("curl -sS http://localhost:8080/healthz")) | ||
print(result) | ||
Comment: Guessing this Reply: Yeah, it was left in so the test is debuggable if it fails in the future, since it doesn't print out the failures. |
||
|
||
assert len(result["results"]) == 10, f".results should be an array of 10 items, was {result['results']!r}" | ||
assert result["summary"]["failed-count"] == 0, f".summary.failed-count should be zero, was {result['summary']['failed-count']}" | ||
|
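The fixed `machine.sleep(5)` in the hunk above is the fix this PR settles on. As an alternative for comparison only, the sketch below polls the health endpoint until `failed-count` reaches zero and, on timeout, reports which checks were still failing, which also covers the debuggability concern raised in the review. It is not part of the PR. The names `failing_checks`, `wait_until_healthy`, and `fetch` are hypothetical, and the `successful` and `summary-line` keys are an assumption about the shape of goss's JSON output that has not been checked against the test.

```python
import time
from typing import Any, Callable


def failing_checks(result: dict[str, Any]) -> list[str]:
    """Describe every check in a health response that did not pass.

    Assumes each entry of result["results"] has a boolean "successful" key
    and an optional "summary-line" string (an assumption, not from the PR).
    """
    return [
        str(item.get("summary-line", item))
        for item in result.get("results", [])
        if not item.get("successful", False)
    ]


def wait_until_healthy(
    fetch: Callable[[], dict[str, Any]],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch() repeatedly until summary.failed-count is zero.

    Raises TimeoutError listing the failing checks once `timeout` seconds
    have elapsed without a fully passing response.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if result["summary"]["failed-count"] == 0:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"still failing: {failing_checks(result)}")
        sleep(interval)
```

In the test script, `fetch` could plausibly be `lambda: json.loads(machine.succeed("curl -sS http://localhost:8080/healthz"))`. Polling like this would hide the race rather than explain it, just as the sleep does, but it avoids waiting longer than necessary.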
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -23,6 +23,8 @@ import ./make-test-python.nix ({ pkgs, lib, ...} : | |
DynamicUser = true; | ||
Restart = "on-failure"; | ||
RestartSec = "1s"; | ||
# restart service without going through failed/inactive state that confuses wait_for_unit | ||
RestartMode = "direct"; | ||
Comment: I'm curious why this service seems to be expected to need a restart, but I suppose it was already like that. Reply: It needs to restart because the rtsp stream broker in the middle doesn't have its sockets up immediately, so the service will have a few false starts when connecting to the stream broker. We could probably try harder, but shrug. This works. |
||
TimeoutStartSec = "10s"; | ||
ExecStart = "${lib.getBin pkgs.ffmpeg-headless}/bin/ffmpeg -re -f lavfi -i smptebars=size=800x600:rate=10 -c libx264 -f flv rtmp://localhost:1935/test"; | ||
}; | ||
|
@@ -37,6 +39,8 @@ import ./make-test-python.nix ({ pkgs, lib, ...} : | |
DynamicUser = true; | ||
Restart = "on-failure"; | ||
RestartSec = "1s"; | ||
# restart service without going through failed/inactive state that confuses wait_for_unit | ||
RestartMode = "direct"; | ||
TimeoutStartSec = "10s"; | ||
ExecStart = "${lib.getBin pkgs.ffmpeg-headless}/bin/ffmpeg -y -re -i rtmp://localhost:1935/test -f flv /dev/null"; | ||
}; | ||
|
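The review reply above notes that the ffmpeg units restart because the stream broker's sockets are not up immediately, and that one "could probably try harder". The PR keeps the restart loop and adds `RestartMode = "direct"`. Purely as an illustration of the "try harder" idea, the sketch below waits for a TCP port to accept connections before proceeding. The helper name `wait_for_tcp` and its parameters are hypothetical and do not appear in the PR.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the units in this diff, the call would look like `wait_for_tcp("localhost", 1935)`, matching the port in the `rtmp://localhost:1935/test` URLs. An accepted TCP connection only shows that the port is open, not that the broker is ready to serve a stream, so a restart policy may still be needed.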
Comment: Was this just always flaky? Or is this breakage new?
Reply: I think this breakage might be new, but I absolutely don't understand why it is happening, given that it is resolved by depending on network-online.target.