Adapt the event message regex in the OOM tracker #1579

Merged
merged 1 commit into from
Nov 19, 2020

tosi3k
Member

@tosi3k tosi3k commented Nov 19, 2020

Remove the Kill process <PID> (<PROCESS_NAME>) score 0 or sacrifice child\n part of the OOM event message regex, as it is no longer printed in the kernel logs.

/sig scalability
/assign @jkaniuk

@k8s-ci-robot k8s-ci-robot added sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Nov 19, 2020
@@ -42,7 +42,7 @@ const (
)

var (
-	oomEventMsgRegex = regexp.MustCompile(`Kill process (\d+) \((.+)\) score \d+ or sacrifice child\nKilled process \d+ .+ total-vm:(\d+kB), anon-rss:\d+kB, file-rss:\d+kB.*`)
+	oomEventMsgRegex = regexp.MustCompile(`Killed process (\d+) \((.+)\) total-vm:(\d+kB), anon-rss:\d+kB, file-rss:\d+kB.*`)
Member

Shouldn't we try to have both?

Member Author

I'm not sure I follow but here's the context: kubernetes/node-problem-detector#480 (comment).

Basically, I'm adapting the regex to handle OOM events emitted by both the old and the new NPD here.

Contributor

@jkaniuk jkaniuk Nov 19, 2020

If you are replacing Kill with Killed, it will not work with the old NPD/kernel.

Member Author

It will - I'm simply removing the prefix of the message we observe in the NPD's kernel monitor.

Contributor

You are right, I did not notice the second part.

Contributor

@jkaniuk jkaniuk Nov 19, 2020

Could you add some unit tests? Old kernel message and new message, so that future changes would not break it. It could be in another commit so it would be easier to backport.
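
A minimal sketch of what such a check could look like, assuming the new regex from this PR. The sample messages are illustrative strings shaped to fit the old and new patterns, not copied from real kernel logs, and `oomCase`, `oomCases`, and `checkOOMCase` are hypothetical names. It is written as a standalone program rather than a `_test.go` file so it runs on its own; the same table could drive a `testing.T` loop.

```go
package main

import (
	"fmt"
	"regexp"
)

// New regex as proposed in this PR.
var oomEventMsgRegex = regexp.MustCompile(`Killed process (\d+) \((.+)\) total-vm:(\d+kB), anon-rss:\d+kB, file-rss:\d+kB.*`)

// oomCase is a hypothetical table entry: an input message and the
// values the three capture groups are expected to hold.
type oomCase struct {
	name        string
	msg         string
	wantPID     string
	wantProcess string
	wantTotalVM string
}

// Illustrative messages only; the PIDs, names, and sizes are made up.
var oomCases = []oomCase{
	{
		name:        "old format with Kill process prefix line",
		msg:         "Kill process 1234 (stress) score 0 or sacrifice child\nKilled process 1234 (stress) total-vm:1024kB, anon-rss:512kB, file-rss:0kB",
		wantPID:     "1234",
		wantProcess: "stress",
		wantTotalVM: "1024kB",
	},
	{
		name:        "new format without the prefix line",
		msg:         "Killed process 4321 (memhog) total-vm:2048kB, anon-rss:1024kB, file-rss:0kB, shmem-rss:0kB",
		wantPID:     "4321",
		wantProcess: "memhog",
		wantTotalVM: "2048kB",
	},
}

// checkOOMCase returns an error if the regex does not match the message
// or if any capture group differs from the expected value.
func checkOOMCase(c oomCase) error {
	m := oomEventMsgRegex.FindStringSubmatch(c.msg)
	if m == nil {
		return fmt.Errorf("%s: no match", c.name)
	}
	if m[1] != c.wantPID || m[2] != c.wantProcess || m[3] != c.wantTotalVM {
		return fmt.Errorf("%s: got (%q, %q, %q)", c.name, m[1], m[2], m[3])
	}
	return nil
}

func main() {
	for _, c := range oomCases {
		if err := checkOOMCase(c); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:", c.name)
	}
}
```

Keeping one case per message format means a future regex change that drops support for either format fails a named case.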

Member Author

Sure, I will open a separate PR for that.

Contributor

Since the backports have already merged, OK.

@mm4tt
Contributor

mm4tt commented Nov 19, 2020

IIUC, the new method will be backward compatible, i.e. it will also work for old NPD releases.
In that case, it looks reasonable. Adding a hold to confirm. Feel free to cancel it yourself.
/hold

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2020
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 19, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mm4tt, tosi3k

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 19, 2020
@tosi3k
Member Author

tosi3k commented Nov 19, 2020

> IIUC, the new method will be backward compatible, i.e. it will also work for old NPD releases.

Exactly. OOM tracker searches for the substring match of the regex, not for an exact match, so both formats will be covered.
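
To illustrate the difference, here is a rough sketch that contrasts a substring search with a whole-string match for the new regex. The anchored variant, the helper names `matchesSubstring` and `matchesWhole`, and the sample messages are assumptions made for this example; they are not taken from the OOM tracker code or from real kernel logs.

```go
package main

import (
	"fmt"
	"regexp"
)

// Pattern of the new regex from this PR.
const oomPattern = `Killed process (\d+) \((.+)\) total-vm:(\d+kB), anon-rss:\d+kB, file-rss:\d+kB.*`

var (
	// Unanchored: matches if the pattern occurs anywhere in the message.
	substringRegex = regexp.MustCompile(oomPattern)
	// Anchored (hypothetical variant): the whole message must fit the pattern.
	wholeRegex = regexp.MustCompile(`^(?:` + oomPattern + `)$`)
)

// Illustrative messages only; values are made up.
const (
	oldFormatMsg = "Kill process 1234 (stress) score 0 or sacrifice child\nKilled process 1234 (stress) total-vm:1024kB, anon-rss:512kB, file-rss:0kB"
	newFormatMsg = "Killed process 4321 (memhog) total-vm:2048kB, anon-rss:1024kB, file-rss:0kB"
)

// matchesSubstring reports whether the pattern is found anywhere in msg.
func matchesSubstring(msg string) bool {
	return substringRegex.MatchString(msg)
}

// matchesWhole reports whether msg as a whole fits the pattern.
func matchesWhole(msg string) bool {
	return wholeRegex.MatchString(msg)
}

func main() {
	for _, msg := range []string{oldFormatMsg, newFormatMsg} {
		fmt.Printf("substring=%v whole=%v\n", matchesSubstring(msg), matchesWhole(msg))
	}
}
```

With the old two-line message, only the substring search succeeds, because the leading Kill process line is simply skipped over; with the new single-line message, both succeed.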

@tosi3k
Member Author

tosi3k commented Nov 19, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2020
@jkaniuk
Contributor

jkaniuk commented Nov 19, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2020
@jkaniuk
Contributor

jkaniuk commented Nov 19, 2020

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2020
@k8s-ci-robot k8s-ci-robot merged commit d638337 into kubernetes:master Nov 19, 2020
@tosi3k tosi3k deleted the oom-regex-master branch February 9, 2021 21:24
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.