feat: Conditionally remove networkd online dependency on Ubuntu #5772

TheRealFalcon · 2024-10-02T20:31:10Z

Proposed Commit Message

feat: Conditionally remove networkd online dependency on Ubuntu

Traditionally, cloud-init-network.service (previously
cloud-init.service) waited for network connectivity (via systemd
service ordering) before running. This has caused
cloud-init-network.service to block boot for a significant amount of
time. For the vast majority of boots, this network connectivity
isn't required.

This commit removes the ordering
After=systemd-networkd-wait-online.service, but checks the datasource
and user data in the init-local timeframe to see if network
connectivity will be necessary in the init network timeframe.
If so, when the init network service starts, it will call
systemd-networkd-wait-online manually in the same manner that the
systemd-networkd-wait-online.service does to wait for network
connectivity.

This commit affects Ubuntu only due to the various number of service
orderings and network renderers possible, along with the downstream
synchronization needed. However, a new overrideable method in the
Distro class should make this optimization trivial to implement for
any other distro.

Additional Context

Test Steps

Merge type

Squash merge using "Proposed Commit Message"
Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

TheRealFalcon · 2024-10-03T03:38:31Z

Take 2...

I added a commit that instead of using a drop-in will call the systemd-networkd-wait-online binary using the exact same args that the service uses based on 'systemctl cat'. This removes the need for a costly daemon-reload in the cases we need to wait on network.

I haven't touched integration tests since the first commit, so they will have known failures.

blackboxsw

Thanks @TheRealFalcon for working this. quick first pass review as your are actively coding/developing tests for this functionality

blackboxsw · 2024-10-03T15:52:12Z

cloudinit/cmd/main.py

@@ -411,6 +474,9 @@ def main_init(name, args):
    mode = sources.DSMODE_LOCAL if args.local else sources.DSMODE_NETWORK

    if mode == sources.DSMODE_NETWORK:
+        if os.path.exists(init.paths.get_runpath(".wait-on-network")):
+            LOG.debug("Will wait for network connectivity before continuing")
+            init.distro.wait_for_network()


Should we now util.del_file("init.paths.get_runpath(".wait-on-network")) now that we've waited and succeeded on the wait_for_network() call?

Is there a reason to? /run gets cleaned on boot, and if a cloud-init collect-logs will contain the file as another piece of evidence. I don't think it would hurt any further invocations of cloud-init on the current boot.

The only reason would be later calls on the command line to cloud-init init per use-cases like -#5726 not needing to actually shell out to run that wait again.

blackboxsw · 2024-10-03T16:01:48Z

cloudinit/cmd/main.py

+        _should_wait_via_cloud_config(config)
+        for config in [
+            datasource.get_userdata_raw(),
+            datasource.get_vendordata_raw(),
+            datasource.get_vendordata2_raw(),
+        ]


Can we make this a generator expression without having to calculate the truthy return values from userdata vedordata and vendordarta2? Then we can early exit on the first case of userdata/vendordata containing write_files or un-decodable file content without having to perform the extra calculations.

Hmmm, good question. A generator expression won't accomplish that though.

I could do something like:

def _get_configs(datasource): yield datasource.get_userdata_raw() yield datasource.get_vendordata_raw() yield datasource.get_vendordata2_raw() def _should_wait_on_network(datasource): ... return any( _should_wait_via_cloud_config(config) for config in _get_configs(datasource) )

blackboxsw · 2024-10-03T16:03:12Z

cloudinit/cmd/main.py

+        if _should_wait_on_network(init.datasource):
+            LOG.debug(
+                "Network connectivity determined necessary for "
+                "cloud-init's network stage"


"Future me" in triage wants to know the reason we chose to wait on network. Can we log what condition triggered our network wait: no datasource vs non-#cloud-config user-data vs write_files in config. I also want this in logs to make it conspicuous why we are waiting in the event that we shouldn't be in some cases.

blackboxsw · 2024-10-03T16:08:52Z

cloudinit/distros/debian.py

@@ -220,6 +221,10 @@ def set_keymap(self, layout: str, model: str, variant: str, options: str):
        # be needed
        self.manage_service("restart", "console-setup")

+    def wait_for_network(self):
+        """Ensure that cloud-init's network service has network connectivity"""
+        netplan.wait_for_network()


This looks like this approach will work for broader environments than just netplan and just Ubuntu and/or Debian. I like that you are keeping it specific for the time being to environments which have netplan.

Don't we also want to confirm netplan.available() before calling netplan.wait_for_network() to avoid calling this on debian environments which don't have netplan.io installed?

This makes sense if the debian image contains netplan, but what happens on Debian images when they don't have have netplan?

Good point. I was assuming it was fine for debian because of the netplan usage there too.
I think instead I should move this method into ubuntu.py. If netplan isn't available, we have nothing left to wait on. On Ubuntu, we can assume we have netplan available for supported images. If somebody goes off the beaten path in their image there's really no "right" thing to do other than log an error.

blackboxsw · 2024-10-03T16:15:44Z

systemd/cloud-init-network.service.tmpl

@@ -9,7 +9,9 @@ Wants=cloud-init-local.service
 Wants=sshd-keygen.service
 Wants=sshd.service
 After=cloud-init-local.service
+{% if variant not in ["debian", "ubuntu"] %}


I don't think it is "safe to include debian in this conditional here. There are plenty of images, even cloud image containing cloud-init, which also don't contain netplan.io in the image: lxc launch images:debian/bookworm/cloud. In these cases, I'd not want default packaging to drop this ordering on Debian.

blackboxsw · 2024-10-03T16:20:18Z

cloudinit/net/netplan.py

+        LOG.debug("NetworkManager is enabled, skipping networkd wait")
+        return
+    wait_online_def: str = subp.subp(
+        ["systemctl", "cat", "systemd-networkd-wait-online.service"]


Given that this function isn't really netplan-speciifc, it's more systemd-networkd specific, doesn't this helper function actually belong over in cloudinit/net/systemd.py?

If we happen to run in an environment which doesn't have the systemd-networkd-wait-online.service the systemctl cat will exit 1, we should figure out how we want to behave in that environment. Info or WARNING log and return without waiting?

If called on a system without netplan.io, do we want this function to log that condition and just return without waiting?

Traditionally, cloud-init-network.service (previously cloud-init.service) waited for network connectivity (via systemd service ordering) before running. This has caused cloud-init-network.service to block boot for a significant amount of time. For the vast majority of boots, this network connectivity isn't required. This commit removes the ordering After=systemd-networkd-wait-online.service, but checks the datasource and user data in the init-local timeframe to see if network connectivity will be necessary in the init network timeframe. If so, when the init network service starts, it will call systemd-networkd-wait-online manually in the same manner that the systemd-networkd-wait-online.service does to wait for network connectivity. This commit affects Ubuntu only due to the various number of service orderings and network renderers possible, along with the downstream synchronization needed. However, a new overrideable method in the Distro class should make this optimization trivial to implement for any other distro.

…call into the activators.

TheRealFalcon · 2024-10-03T19:49:39Z

@blackboxsw , thanks for the review. I applied your comments and also made a change to move the wait into the activators.

blackboxsw · 2024-10-03T20:23:47Z

cloudinit/cmd/main.py

+        return True, "random_seed command found"
+    if parsed_yaml.get("mounts"):
+        return True, "mounts found"
+    return False, "cloud-config does not contain network requiring elements"


This is good. thanks for the breakdown

blackboxsw · 2024-10-03T20:32:22Z

cloudinit/distros/__init__.py

-            return activators.select_activator(priority=priority)
-        except activators.NoActivatorException:
-            return None
+        return activators.select_activator(priority=priority)


With this change I think you've also missed handling the exception at the call-site in cloudinit/cmd/devel/hotplug_hook.py... Although I think that call-site in hotplug hook is also broken anyway before this PR if cloud-init can't find an activator as activator return value would be None, and we'd try to reference None.bring_up_interface

Yeah, lack of exception handling there was intentional.

blackboxsw reviewed Oct 3, 2024

View reviewed changes

TheRealFalcon force-pushed the wait branch from b8d6094 to f88d8a9 Compare October 3, 2024 19:42

Comments and minor refactors. Additionally, move the wait_on_network …

ed3a50e

…call into the activators.

TheRealFalcon force-pushed the wait branch from f88d8a9 to ed3a50e Compare October 3, 2024 19:48

fix the tests I broke

7f1e212

blackboxsw added the packaging Supplemental package review requested label Oct 3, 2024

blackboxsw reviewed Oct 3, 2024

View reviewed changes

TheRealFalcon marked this pull request as ready for review October 8, 2024 19:41

blackboxsw self-assigned this Oct 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Conditionally remove networkd online dependency on Ubuntu #5772

feat: Conditionally remove networkd online dependency on Ubuntu #5772

TheRealFalcon commented Oct 2, 2024 •

edited

Loading

TheRealFalcon commented Oct 3, 2024 •

edited

Loading

blackboxsw left a comment

blackboxsw Oct 3, 2024

TheRealFalcon Oct 3, 2024

blackboxsw Oct 3, 2024

blackboxsw Oct 3, 2024

TheRealFalcon Oct 3, 2024

blackboxsw Oct 3, 2024

blackboxsw Oct 3, 2024

TheRealFalcon Oct 3, 2024

blackboxsw Oct 3, 2024

blackboxsw Oct 3, 2024

TheRealFalcon commented Oct 3, 2024

blackboxsw Oct 3, 2024

blackboxsw Oct 3, 2024

TheRealFalcon Oct 3, 2024

feat: Conditionally remove networkd online dependency on Ubuntu #5772

Are you sure you want to change the base?

feat: Conditionally remove networkd online dependency on Ubuntu #5772

Conversation

TheRealFalcon commented Oct 2, 2024 • edited Loading

Proposed Commit Message

Additional Context

Test Steps

Merge type

TheRealFalcon commented Oct 3, 2024 • edited Loading

blackboxsw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheRealFalcon commented Oct 3, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TheRealFalcon commented Oct 2, 2024 •

edited

Loading

TheRealFalcon commented Oct 3, 2024 •

edited

Loading