Attempt to run a container scenario for api while count is bigger than 1 results in crash #430

Open
achuzhoy opened this issue May 25, 2023 · 1 comment

Comments

@achuzhoy

How to reproduce:
config.yaml should have this scenario:
chaos_scenarios: # List of policies/chaos scenarios to load
    - container_scenarios: # List of chaos pod scenarios to load
        - - scenarios/openshift/container_api.yml

The content of the scenario file:
`
scenarios:
- name: "kill apiserver container"
  namespace: "openshift-apiserver"
  label_selector: "app=openshift-apiserver-a"
  container_name: "openshift-apiserver"
  action: "kill 1"
  count: 2
  expected_recovery_time: 60
    `

python3.9 run_kraken.py --config config/kill-api.yaml
 _              _
| | ___ __ __ _| | _____ _ __
| |/ / '__/ _` | |/ / _ \ '_ \
|   <| | | (_| |   <  __/ | | |
|_|\_\_|  \__,_|_|\_\___|_| |_|

2023-05-25 11:58:39,485 [INFO] Starting kraken
2023-05-25 11:58:39,495 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,998 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 11:58:42,999 [INFO] Starting http server at http://0.0.0.0:8085

2023-05-25 11:58:43,000 [INFO] Fetching cluster info
2023-05-25 11:58:43,008 [INFO] Cluster version is 4.13.0
2023-05-25 11:58:43,008 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 11:58:43,008 [INFO] Generated a uuid for the run: a713f10c-8b26-4b2c-8a81-8356cff6ef58
2023-05-25 11:58:43,008 [INFO] Daemon mode not enabled, will run through 1 iterations

2023-05-25 11:58:43,009 [INFO] Executing scenarios for iteration 0
2023-05-25 11:58:43,009 [INFO] connection set up
127.0.0.1 - - [25/May/2023 11:58:43] "GET / HTTP/1.1" 200 -
2023-05-25 11:58:43,010 [INFO] response RUN
2023-05-25 11:58:43,010 [INFO] Running container scenarios
2023-05-25 11:58:44,823 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-hmpsj (ns openshift-apiserver)
2023-05-25 11:58:44,959 [INFO] Killing container openshift-apiserver in pod apiserver-5d45f6d58f-cd7bv (ns openshift-apiserver)
2023-05-25 11:58:45,071 [INFO] Scenario kill apiserver container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'

`

The issue also reproduced with count set to 3.
The issue didn't reproduce with count set to 1.

Note that the cluster has 3 pods.
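
The traceback and the dependence on count are consistent with how `list.remove()` behaves in Python: it mutates the list in place and returns `None`, so assigning its result back to the same name turns the list into `None` after the first removal, and the second removal fails. The sketch below reproduces only that pattern. It is not the actual body of `check_failed_containers`; the function names, the `recovered` argument, and the second pod name are hypothetical and chosen for illustration.

```python
def remove_killed_buggy(killed_container_list, recovered):
    """Mirror the assignment pattern shown on the failing traceback line."""
    for item in recovered:
        # list.remove() returns None, so this rebinding replaces the list
        # with None after the first removal.
        killed_container_list = killed_container_list.remove(item)
    return killed_container_list


def remove_killed_fixed(killed_container_list, recovered):
    """Possible fix: call remove() for its side effect only."""
    remaining = list(killed_container_list)  # work on a copy
    for item in recovered:
        if item in remaining:
            remaining.remove(item)
    return remaining


if __name__ == "__main__":
    # Hypothetical (pod, container) pairs standing in for the killed containers.
    killed = [
        ("apiserver-pod-1", "openshift-apiserver"),
        ("apiserver-pod-2", "openshift-apiserver"),
    ]

    # One item: no crash, which matches count set to 1.
    print(remove_killed_buggy(killed[:1], killed[:1]))

    # Two items: the second remove() is called on None.
    try:
        remove_killed_buggy(list(killed), list(killed))
    except AttributeError as err:
        print(err)

    print(remove_killed_fixed(killed, killed))
```

With a single item the buggy loop finishes before anything touches the `None` value, which would explain why count set to 1 did not crash in the first scenario.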

When the same was attempted against SNO (with a single api pod), the following error was thrown:
2023-05-25 12:06:17,950 [INFO] Killing container openshift-apiserver in pod apiserver-6b77769b8-6j4gg (ns openshift-apiserver)
2023-05-25 12:06:18,083 [ERROR] Trying to kill more containers than were found, try lowering kill count
2023-05-25 12:06:18,083 [ERROR] Scenario kill apiserver container failed
In this case it's an expected error.
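
For contrast, the expected SNO failure looks like the result of a guard that compares the requested count with the number of matching containers before killing anything. A minimal sketch of such a guard follows, assuming random selection among the matches. The function name and the use of `ValueError` are hypothetical; only the error message text is taken from the log above.

```python
import random


def pick_containers_to_kill(found, kill_count):
    """Return kill_count randomly chosen entries from found.

    Raises ValueError when fewer containers were found than requested,
    reusing the message seen in the SNO log.
    """
    if kill_count > len(found):
        raise ValueError(
            "Trying to kill more containers than were found, "
            "try lowering kill count"
        )
    return random.sample(found, kill_count)


if __name__ == "__main__":
    # A single API pod, as on SNO.
    found = [("apiserver-6b77769b8-6j4gg", "openshift-apiserver")]
    try:
        pick_containers_to_kill(found, 2)
    except ValueError as err:
        print(err)
```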

@achuzhoy

Same behavior reproduced with killing etcd:

`
chaos_scenarios: # List of policies/chaos scenarios to load
    - container_scenarios: # List of chaos pod scenarios to load
        - - scenarios/openshift/container_etcd.yml

`

`
scenarios:
- name: "kill etcd container"
  namespace: "openshift-etcd"
  label_selector: "k8s-app=etcd"
  container_name: "etcd"
  action: "kill 1"
  count: 1
  expected_recovery_time: 60
    `

python3.9 run_kraken.py --config config/kill-etcd.yaml
 _              _
| | ___ __ __ _| | _____ _ __
| |/ / '__/ _` | |/ / _ \ '_ \
|   <| | | (_| |   <  __/ | | |
|_|\_\_|  \__,_|_|\_\___|_| |_|

2023-05-25 12:23:02,066 [INFO] Starting kraken
2023-05-25 12:23:02,075 [INFO] Initializing client to talk to the Kubernetes cluster
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,649 [INFO] Publishing kraken status at http://0.0.0.0:8085
2023-05-25 12:23:05,650 [INFO] Starting http server at http://0.0.0.0:8085

2023-05-25 12:23:05,650 [INFO] Fetching cluster info
2023-05-25 12:23:05,658 [INFO] Cluster version is 4.13.0
2023-05-25 12:23:05,659 [INFO] Server URL: https://api.elvis2.qe.lab.redhat.com:6443
2023-05-25 12:23:05,659 [INFO] Generated a uuid for the run: 77d465f6-2149-4233-b9f7-4642e84dffb0
2023-05-25 12:23:05,659 [INFO] Daemon mode not enabled, will run through 1 iterations

2023-05-25 12:23:05,659 [INFO] Executing scenarios for iteration 0
2023-05-25 12:23:05,659 [INFO] connection set up
127.0.0.1 - - [25/May/2023 12:23:05] "GET / HTTP/1.1" 200 -
2023-05-25 12:23:05,660 [INFO] response RUN
2023-05-25 12:23:05,660 [INFO] Running container scenarios
2023-05-25 12:23:08,343 [INFO] Killing container etcd in pod etcd-master-1-2 (ns openshift-etcd)
2023-05-25 12:23:08,466 [INFO] Killing container etcd in pod etcd-master-1-1 (ns openshift-etcd)
2023-05-25 12:23:08,657 [INFO] Scenario kill etcd container successfully injected
Traceback (most recent call last):
  File "/root/krkn/krkn/run_kraken.py", line 421, in <module>
    main(options.cfg)
  File "/root/krkn/krkn/run_kraken.py", line 218, in main
    failed_post_scenarios = pod_scenarios.container_run(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 92, in container_run
    failed_post_scenarios = check_failed_containers(
  File "/root/krkn/krkn/kraken/pod_scenarios/setup.py", line 199, in check_failed_containers
    killed_container_list = killed_container_list.remove(item)
AttributeError: 'NoneType' object has no attribute 'remove'
`
