
Error generating reports #115

Open
chzgustavo opened this issue Nov 2, 2022 · 6 comments
Labels
documentation Improvements or additions to documentation

Comments

@chzgustavo

Hello, I am using this tool; congratulations, it is very good. However, I have noticed that when a segmentation fault occurs, the handler sometimes generates all the files with another namespace's name.

I attach evidence.

  • cluster: EKS v1.21
  • core dump version:
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                        APP VERSION
core-dump-handler       observe         1               2022-07-01 04:33:50.377219926 +0000 UTC deployed        core-dump-handler-v8.6.0     v8.6.0    
  • pod-info file: it contains the namespace env-1f1de3e2bda8, when in fact this pod is in the namespace: env-e4e2facbcb22
    [screenshot attached]

One idea is to update core-dump-handler to the newest version; I don't know whether that would solve this problem.

@chzgustavo
Author

chzgustavo commented Nov 2, 2022

Do you have any idea how I could debug this error?

Regards,
Gustavo.

@No9
Collaborator

No9 commented Nov 2, 2022

Hi @chzgustavo
Thanks for the feedback; I really appreciate it.

Do you have pods with the same name running in different namespaces?

Background

The information from CRI-O is currently queried using the hostname of the crashing container, which is assumed to be unique.

This container hostname is then used to match to the pod.
https://github.com/IBM/core-dump-handler/blob/main/core-dump-composer/src/main.rs#L75

It isn't ideal, but using the hostname is the only way I am aware of to capture the crashing container's information.

This isn't an issue in most deployment scenarios, as people tend to use ReplicaSets/Deployments, which generate a unique ID for each pod.

However, if you are creating pods directly in each namespace then you can hit a name clash.

Possible Solution

If that sounds like the problem I would suggest giving each pod a unique name when provisioning.

@chzgustavo
Author

chzgustavo commented Nov 3, 2022

Yes, indeed, I have many pods with the same name running in different namespaces.
The pods that generate segmentation faults belong to StatefulSet resources.

@chzgustavo
Author

They all have the same hostname (but are in different namespaces); is there any other possible solution for this case?
Thanks for your help!

@No9
Collaborator

No9 commented Nov 3, 2022

Sorry I'm not aware of another possible solution.

StatefulSets intentionally label their pods with ordinal numbers:
https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#pod-identity.

If you're using Helm you can add the namespace to the StatefulSet name, which would resolve this.
I know it's clunky, but it should resolve it handily enough.
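A minimal sketch of that workaround, assuming a Helm chart (all names and values here are hypothetical): embedding the built-in `.Release.Namespace` in the StatefulSet name makes every pod hostname (`<statefulset-name>-<ordinal>`) unique across namespaces.

```yaml
# templates/statefulset.yaml -- illustrative only
apiVersion: apps/v1
kind: StatefulSet
metadata:
  # e.g. "db-env-e4e2facbcb22"; pods become db-env-e4e2facbcb22-0, -1, ...
  name: {{ printf "%s-%s" .Values.name .Release.Namespace }}
spec:
  serviceName: {{ printf "%s-%s" .Values.name .Release.Namespace }}
  selector:
    matchLabels:
      app: {{ .Values.name }}
  template:
    metadata:
      labels:
        app: {{ .Values.name }}
    spec:
      containers:
        - name: {{ .Values.name }}
          image: {{ .Values.image }}
```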

The underlying issue here is that kernel.core_pattern is per host, not per container, so it's not possible to feed dynamic info from the pod to the kernel at runtime.
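For context, a pipe-style core pattern looks roughly like the fragment below (the file path and composer flags are hypothetical; the `%` specifiers are the standard ones from the kernel's core(5) documentation). There is exactly one value per node, shared by every container on it, and `%h` (the crashing process's hostname) is the only identity the handler receives.

```
# /etc/sysctl.d/50-coredump.conf -- one node-wide value for all containers
# %c = core limit, %e = executable, %p = PID, %s = signal,
# %t = timestamp, %h = hostname of the crashing process
kernel.core_pattern = |/usr/local/bin/core-dump-composer -c %c -e %e -p %p -s %s -t %t -h %h
```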

As systemd becomes more pod aware there may be a possibility to do something there but the last time I looked it just seemed to pass through to the system code.

[Edit]
I will add this to the FAQ as it seems like it would be a fairly common scenario that will trip others up.

[Edit2]
I'll double-check the statuses in the responses from CRI-O; it may be possible to detect whether a pod is crashing and, if it isn't, move on to the next pod. I seem to remember looking at this when I wrote it and finding it wasn't possible, but I'll double-check.
I won't get to that for a bit though, as I have to look at #114 first.
I won't get to that for a bit though as I have to look at #114 first.

@No9 added the documentation label Nov 3, 2022
No9 referenced this issue in Ninja-Kiwi/k8s-core-dump-handler Nov 17, 2022
…ariable from a core dump and use that as the podname

If the environment variable can't be found then the composer will default back to hostname

Signed-off-by: Tom Haygarth <tom@ninjakiwi.com>
@jesuslinares

Hi @No9,

Thanks for the information. We are still hitting this bug in production, since we didn't apply the "clunky" workaround.

Have you made any progress on fixing it?

This project is very useful for us, thanks for the good work.
