Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kubelet.service: Wait for network-online.target #1249

Merged
merged 1 commit into from
Apr 24, 2020

Conversation

rosti
Copy link
Contributor

@rosti rosti commented Apr 22, 2020

What type of PR is this?

/kind bug

What this PR does / why we need it:

Whenever kubeadm detects a system that has systemd-resolved running, it would
provision the kubelet on the local node with a resolv.conf overwrite -
/run/systemd/resolve/resolv.conf.

However, some kubeadm users have discovered an issue during system boot.
The kubelet can end up in a race with the systemd-resolved service and actually
startup loads with empty or incorrect resolve.conf files.

The race is caused by the fact that the kubelet.service file does not indicate
dependence on the network-online.target.

To fix this we add network-online.target as a dependency and wait for its
initialization to complete before starting the kubelet.

Which issue(s) this PR fixes:

Refs #1248, kubernetes/kubeadm#2111

Special notes for your reviewer:

/cc @neolit123
/cc @kubernetes/release-engineering
/cc @kubernetes/sig-node-bugs
/assign @justaugustus @tpepper
/priority important-longterm

Does this PR introduce a user-facing change?

systemd would now start the kubelet service after the network-online.target is reached.

Whenever kubeadm detects a system that has systemd-resolved running, it would
provision the kubelet on the local node with a resolv.conf overwrite
- /run/systemd/resolve/resolv.conf.

However, some kubeadm users have discovered an issue during system boot.
The kubelet can end up in a race with the systemd-resolved service and actually
startup loads with empty or incorrect resolve.conf files.

The race is caused by the fact that the kubelet.service file does not indicate
dependence on the network-online.target.

To fix this we add network-online.target as a dependency and wait for its
initialization to complete before starting the kubelet.

Signed-off-by: Rostislav M. Georgiev <rostislavg@vmware.com>
@k8s-ci-robot k8s-ci-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Apr 22, 2020
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Apr 22, 2020
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. labels Apr 22, 2020
@neolit123
Copy link
Member

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 22, 2020
@ncdc
Copy link
Member

ncdc commented Apr 22, 2020

cc @detiber

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

But I'm wondering if this fixes the real issue since it seemed related to kubeadm.

@rosti
Copy link
Contributor Author

rosti commented Apr 22, 2020

But I'm wondering if this fixes the real issue since it seemed related to kubeadm.

@saschagrunert there are a couple of problems in the original kubeadm issue:

  1. kubeadm ignores resolv.conf overwrite by users if they are done through the kubelet component config (and not the kubelet command line). That problem requires a patch in kubeadm itself.

  2. Users with slow systems, who rely on kubeadm to point the kubelet to the systemd-resolved resolv.conf file, might experience problems due to the described race (the problem this PR fixes).

@dims
Copy link
Member

dims commented Apr 22, 2020

cc @akutz

Copy link
Member

@SataQiu SataQiu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
👍

@hickeng
Copy link

hickeng commented Apr 22, 2020

@rosti I think this is a useful change and does close at least one timing hole.
However please note that it doesn't address kubernetes/kubeadm#2111 which comes from a post-boot restart of systemd-networkd.

@justaugustus
Copy link
Member

Thanks for the fix!
/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justaugustus, neolit123, rosti, saschagrunert, SataQiu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 24, 2020
@k8s-ci-robot k8s-ci-robot merged commit 29b1cd9 into kubernetes:master Apr 24, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.19 milestone Apr 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/release-eng Issues or PRs related to the Release Engineering subproject cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/release Categorizes an issue or PR as relevant to SIG Release. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants