CIM Server issues/timeouts with ESXi 6.7 #37
Comments
I have the same issue. Working fine from ESX v4.1 up to ESXi v6.7U1 (Dell PowerEdge M620, M630, R640, R720, R730, R740, R815, R840, R905).
Dell M630:
Dell R640:
Hope this will help you folks figure it out. Edit by @Napsty : Adjusted format for better readability |
I upgraded a couple of Dell R740 hosts (BIOS and iDRAC firmware up to date, running Dell OpenManage 9.2) from ESXi 6.7U1 to 6.7U2 plus the latest security update, and got timeouts querying CIM. Updating to OpenManage 9.3 changed nothing. Uninstalling OpenManage fixed it. |
There's a discussion over at Spiceworks: https://community.spiceworks.com/topic/2213257-dell-omsa-9-2-esxi-6-7u2 |
Dell's OpenManage Integration for VMware vCenter doesn't support 6.7U2. Their web page (https://www.dell.com/support/article/us/en/04/sln311238/openmanage-integration-for-vmware-vcenter) says: "Does not add official 6.7 U2 support (support for 6.7 U2 will come in the fall with the next major release)". So we could be waiting a while for an updated OpenManage VIB. |
Thanks @philrandal for your research! This would explain the issues on Dell servers with ESXi 6.7 U2, but in the other issue Supermicro was also mentioned (by @MarcusCaepio). We need confirmation of whether only Dell's OpenManage is responsible or whether other vendors are affected as well. In the meantime I have created a support ticket at VMware to get a statement. |
I can confirm that Supermicro also has this issue. |
@ucola Let's assume there are two different issues:
Are the Supermicro servers only affected by 1 or 2 or both? |
Thank you for your answer. Supermicro with ESXi 6.7 U1 runs into the timeout after some time, but after a restart (/sbin/services.sh restart) it works again for a while. |
@ucola OK that looks like the first issue applies to Supermicro. It would be interesting to see how it behaves with ESXi 6.7 U2. |
@ucola Do you have to |
I compared against another R640 with v6.7.0 build=10764712 (before U2) and the output is the same as on the R640 with v6.7.0 build=13006603 (U2): some sensors show unknown, but most show status=normal. |
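When comparing hosts like this, it can help to sort them by build number first. The following is only a sketch: it takes the U2 build number quoted in this thread (13006603) as the threshold, and it assumes the version string comes from an ESXi 6.7 host, since build numbers from other release lines are not comparable this way. The function names are made up for illustration.

```python
import re

# ESXi 6.7 U2 build number as quoted in this thread.
U2_BUILD = 13006603


def parse_build(version_string):
    """Extract the build number from strings such as 'v6.7.0 build=13006603'."""
    match = re.search(r"build[-=](\d+)", version_string)
    if match is None:
        raise ValueError("no build number found in: %r" % version_string)
    return int(match.group(1))


def is_u2_or_later(version_string):
    """True if the build is at or above the U2 build (assumes a 6.7 host)."""
    return parse_build(version_string) >= U2_BUILD


if __name__ == "__main__":
    for host in ("v6.7.0 build=10764712", "v6.7.0 build=13006603"):
        label = "U2 or later" if is_u2_or_later(host) else "before U2"
        print(host, "->", label)
```

The regular expression accepts both `build=` and `build-` so that slightly different version string formats can be fed in unchanged.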
@bridrod So "Hardware Health" is shown correctly and without timeouts (when you hit "Refresh") on a Dell server with ESXi 6.7 U2 and OpenManage VIB installed? |
@Napsty /sbin/services.sh restart works fine... I put this in a cron job that runs every hour... |
I refreshed on both versions; both accepted the task and it showed as completed. Interestingly, the "Last updated" column did not change, though. I am not sure how it should behave, as I have never used the Hardware Health tab before. I have relied only on Dell OpenManage Essentials and the Enterprise appliance (to monitor Dell hardware) and the check_esxi_hardware script. |
A colleague of mine added two new ESXi 6.7 U2 servers to the datacenter, so I can test locally now too. I immediately hit the error reported by @ucola. I will keep an eye on this over the next few days and see whether there are stability issues. This is on a Cisco UCSB-B200-M5, by the way. |
@Napsty Now my ESXi is updated to 6.7 U2, same issue "<Terminated by signal 9 (Killed).>" |
I have the same issue on HP DL380 G8, ESXi 6.7U2 when I check with Icinga2 --> <Terminated by signal 9 (Killed).> Local check from console works fine. |
@hampe4460 Can you please elaborate on what you mean by "console"? |
Sorry for the "console", I meant the Debian CLI. |
@hampe4460 Thanks. Can you measure how long it takes? |
@Napsty also in the terminal... |
It takes around 12 seconds, and it works every time I try it. |
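To compare the plugin's runtime against a scheduler's timeout, the measurement can be wrapped in a small script. This is a minimal sketch, not how Icinga2 itself runs checks: `time_command` is a hypothetical helper, and the sleeping child process stands in for the real plugin call, whose arguments are not shown in this thread.

```python
import subprocess
import sys
import time


def time_command(cmd, timeout):
    """Run cmd and return (elapsed_seconds, exit_code).

    exit_code is None if the command exceeded the timeout.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout
        )
        exit_code = completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process when the timeout expires.
        exit_code = None
    return time.monotonic() - start, exit_code


if __name__ == "__main__":
    # Stand-in for the plugin: a child process that takes 2 seconds.
    slow_cmd = [sys.executable, "-c", "import time; time.sleep(2)"]
    for limit in (1, 5):
        elapsed, code = time_command(slow_cmd, limit)
        status = "timed out" if code is None else "exit code %d" % code
        print("timeout=%ss: %s after %.1fs" % (limit, status, elapsed))
```

If a check that needs about 12 seconds is run under a shorter limit, it would end up in the "timed out" branch here, which is the situation a larger command timeout is meant to avoid.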
@hampe4460 You might need to increase the command timeout in Icinga2:
But be aware that if you change this in ITL (/usr/share/icinga2/include/plugins-contrib.d/virtualization.conf) it might be overwritten again after an update. So you could just copy the whole command definition from @ucola Can you run the plugin on the terminal in verbose mode (-v) with |
I added the timeout = 180 option to /usr/share/icinga2/include/plugins-contrib.d/virtualization.conf, still the same |
@hampe4460 You reloaded icinga2 afterwards, right? |
Yes, I rebooted the whole server. I was also looking for a solution on the Icinga2 side, but no luck. I don't know Icinga 2 well enough to figure out which switches to set... |
You were so right. I just figured out what I needed to change; it was too easy in the end... hopefully. |
FYI, I opened a VMware support ticket concerning the CIM server issues. However, neither VMware nor I can reproduce them. The issues with the OpenManage VIB are obviously caused by Dell's OpenManage, and that's not VMware's problem. @ucola Is there a way we can have a remote session to try and debug this? |
@Napsty Of course, you can contact me... |
FYI, from Dell -
|
To close this issue, which actually covers two separate cases (see the updated description at the beginning):
Both cases are caused by third-party tools. There is nothing check_esxi_hardware can do about them. |
VMware's KB article - ESXi 6.7 U2/U3 unresponsive when running Dell OpenManage Server Administrator 9.3.0: |
Yep. The solution was to remove OpenManage indeed. In our case, we were not gaining much anyway, by having it installed. |
Has anyone tried the new OMSA version 9.3.1 (OM-SrvAdmin-Dell-Web-9.3.1-3684.VIB-ESX67i_A00.zip)? |
OMSA 9.3.1 has fixed the problem |
CIM issues in ESXi 6.7
Over the past weeks, several users have reported issues with check_esxi_hardware on ESXi 6.7.
This ticket is here to gather more information and pinpoint the issue. So far, multiple hardware vendors and server models are affected. Please comment on this issue if you experience similar problems.
As of today, June 12 2019, I have not received any feedback from VMware or Dell. I reached out to both of them via Twitter.
Related tickets
#31
#34
Two issues depending on ESXi version and hardware
Affected server vendors
Workarounds
Issue 1: Restart of CIM server using
/sbin/services.sh restart
Issue 2: Uninstalling the "OpenManage" Offline Bundle VIB helps; the CIM server, and therefore the plugin, works again afterwards.
esxcli software vib remove --vibname=OpenManage