
False critical on HP ProLiant 380p Gen8 with ESXi 6.5u1 #24

Closed
reach3r opened this issue Sep 19, 2017 · 1 comment
reach3r commented Sep 19, 2017

Some time after we upgraded our fleet to ESXi 6.5u1, the check has been returning a critical status for the disk or disk bay of every installed drive (2 SAS HDDs) on two of our hosts.

./check_esxi_hardware-20161013.py -H hostX.fqdn -U user -P password
 CRITICAL : Disk or Disk Bay 2 C1 P1I Bay 2: In Failed Array  CRITICAL : Disk or Disk Bay 1 C1 P1I Bay 1: In Failed Array - Server: HP ProLiant DL380p Gen8 s/n: XXXXXXXXXXX System BIOS: P70 2015-07-01

At the same time it reports all good for a dozen other hosts with identical or similar hardware configurations:

./check_esxi_hardware-20161013.py -H hostY.fqdn -U user -P password
 OK - Server: HP ProLiant DL380p Gen8 s/n: XXXXXXXXXXYY System BIOS: P70 2015-07-01

Several things confuse me:

  1. The critical status was not triggered immediately after the firmware and ESXi upgrades, but only after a later reboot.
  2. The check reports all good for a dozen other hosts, most of which have the same or a very similar configuration.
  3. Firmware versions don't appear to be the culprit at first glance: the two hosts with critical checks run slightly different firmware versions (controller and disks), and the same versions are found on other hosts that do not trigger criticals.

Is there further information I can provide to help get to the root of this?

Best regards

Napsty (Owner) commented Apr 1, 2018

There are several things to check:

  1. Please show the verbose output.
  2. Check the Hardware Status tab in the vSphere client / Web UI and compare the output.
  3. I don't see you using the "-V hp" switch, although you have a ProLiant server; try with this, too.
  4. Make sure you have updated your CIM offline bundle from HP.

The question is rather why your ESXi server's CIM service reports these failures (firmware problem? CIM offline bundle problem? etc.). The plugin does nothing more than parse the full CIM output and report any non-OK elements.
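A re-run along the lines suggested above might look like the following sketch. The hostname and credentials are placeholders carried over from the original report; `-V hp` selects vendor-specific handling and `-v` enables verbose output, per the plugin's documented options.

```shell
# Re-run the check with HP vendor mode and verbose output so every
# CIM element the plugin inspects is printed, not just the summary.
# hostX.fqdn, user and password are placeholders.
./check_esxi_hardware-20161013.py -H hostX.fqdn -U user -P password -V hp -v
```

Comparing this verbose element list against the Hardware Status tab should show whether the "In Failed Array" state originates from the host's CIM service itself rather than from the plugin.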

@Napsty Napsty added the wontfix label Apr 1, 2018
@Napsty Napsty closed this as completed Oct 1, 2018