Add redfish_chassis_temperature_sensor_health_state metric #73

Closed
ulikl opened this issue Oct 6, 2023 · 4 comments


ulikl commented Oct 6, 2023

Hi,

The current temperature metrics look like this:

redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="CPU1 Temp", sensor_id="0"} 37
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="CPU2 Temp", sensor_id="1"} 32
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board Exhaust Temp", sensor_id="4"} 30
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board GPU7 Temp", sensor_id="3"} 32
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board Inlet Temp", sensor_id="2"} 19
redfish_chassis_temperature_sensor_state{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="CPU1 Temp", sensor_id="0"} 1
redfish_chassis_temperature_sensor_state{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="CPU2 Temp", sensor_id="1"} 1
redfish_chassis_temperature_sensor_state{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board Exhaust Temp", sensor_id="4"} 1
redfish_chassis_temperature_sensor_state{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board GPU7 Temp", sensor_id="3"} 1
redfish_chassis_temperature_sensor_state{chassis_id="System.Embedded.1", instance="xxxx", job="redfish-exporter", resource="temperature",  sensor="System Board Inlet Temp", sensor_id="2"} 1

Note: for this test I set the warning threshold for the sensor "System Board Inlet Temp" to 17.
The only state/health metrics with a value > 1 in this case are:

redfish_system_health_state{cluster="steyr-prod-gpu",environment="prod",instance="steyr-prod-gpu__lp05edge02008",job="redfish-exporter",node="lp05edge02008",prometheus="victoriametrics/central",resource="system",scrape_from="edge-tooling",system_id="System.Embedded.1"} 2
redfish_chassis_health{chassis_id="System.Embedded.1",cluster="steyr-prod-gpu",environment="prod",instance="steyr-prod-gpu__lp05edge02008",job="redfish-exporter",node="lp05edge02008",prometheus="victoriametrics/central",resource="chassis",scrape_from="edge-tooling"} 2

So in this case we can only get an unspecific chassis alert, or we need to define an alert on redfish_chassis_temperature_celsius with separate thresholds in the alert definition, which might not match the server configuration.

But at least for our Dell servers, a Health value is also provided via:
https:///redfish/v1/Chassis/System.Embedded.1/Sensors/SystemBoardInletTemp

For example:

{
    "@odata.context": "/redfish/v1/$metadata#Sensor.Sensor",
    "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Sensors/SystemBoardInletTemp",
    "@odata.type": "#Sensor.v1_5_0.Sensor",
    "Name": "System Board Inlet Temp",
    "Id": "SystemBoardInletTemp",
    "Description": "Instance of Sensor Id",
    "ReadingType": "Temperature",
    "ReadingUnits": "Cel",
    "Status": {
        "Health": "Warning",
        "State": "Enabled"
    },
    "Reading": 20.0,
   ...
}

Can the redfish_exporter be extended with such a temperature health metric?
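To make the request concrete, here is a minimal Python sketch of how the `Status.Health` field of a Sensor payload like the one above could be turned into a numeric sample. It is not the exporter's actual code (the exporter is a separate Go project). The numeric mapping (OK=1, Warning=2, Critical=3, 0 for unknown), the function name `sensor_health_sample`, and the reduced label set are assumptions for illustration only; the metric name is the one proposed in the issue title.

```python
import json

# Assumed numeric encoding for illustration: 1 = OK, 2 = Warning, 3 = Critical.
HEALTH_VALUES = {"OK": 1, "Warning": 2, "Critical": 3}


def sensor_health_sample(payload: str, chassis_id: str) -> str:
    """Render one Prometheus-style sample line from a Redfish Sensor JSON payload."""
    sensor = json.loads(payload)
    status = sensor.get("Status") or {}  # tolerate a missing or null Status object
    # 0 stands for "unknown or not reported" (an assumption, not an exporter convention).
    value = HEALTH_VALUES.get(status.get("Health"), 0)
    name = sensor.get("Name", "")
    return (
        "redfish_chassis_temperature_sensor_health_state"
        f'{{chassis_id="{chassis_id}",sensor="{name}"}} {value}'
    )


# Trimmed version of the payload shown in this comment.
example = (
    '{"Name": "System Board Inlet Temp", '
    '"Status": {"Health": "Warning", "State": "Enabled"}, "Reading": 20.0}'
)
print(sensor_health_sample(example, "System.Embedded.1"))
```

With such a per-sensor value, an alert could key on the health reported by the BMC itself instead of duplicating thresholds in the alert definition.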

@jenningsloy318
Owner

I checked the code: we have redfish_chassis_temperature_celsius and redfish_chassis_temperature_sensor_state, but we don't have redfish_chassis_temperature_sensor_health. I will check whether we can add it.

@jenningsloy318
Owner

@ulikl the latest commit adds such a metric. Please build and test it, since I don't have a device.

@ulikl
Author

ulikl commented Nov 13, 2023

@jenningsloy318, thank you very much.
It's working:

# HELP redfish_chassis_temperature_celsius celsius of temperature on this chassis component
# TYPE redfish_chassis_temperature_celsius gauge
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU1 Temp",sensor_id="0"} 36
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU2 Temp",sensor_id="1"} 36
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Exhaust Temp",sensor_id="3"} 37
redfish_chassis_temperature_celsius{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Inlet Temp",sensor_id="2"} 27
# HELP redfish_chassis_temperature_sensor_health status health of temperature on this chassis component,1(Enabled),2(Disabled),3(StandbyOffinline),4(StandbySpare),5(InTest),6(Starting),7(Absent),8(UnavailableOffline),9(Deferring),10(Quiesced),11(Updating)
# TYPE redfish_chassis_temperature_sensor_health gauge
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU1 Temp",sensor_id="0"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU2 Temp",sensor_id="1"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Exhaust Temp",sensor_id="3"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Inlet Temp",sensor_id="2"} 1

With the inlet temperature over the warning threshold:

# TYPE redfish_chassis_temperature_sensor_health gauge
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU1 Temp",sensor_id="0"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU2 Temp",sensor_id="1"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Exhaust Temp",sensor_id="3"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Inlet Temp",sensor_id="2"} 2
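A quick way to check output like the above without a full Prometheus setup is to scan the exposition text for samples whose value is not 1. The following Python sketch does that for the new metric. The helper name `unhealthy_sensors` is hypothetical, and the label regex is deliberately simplistic (it assumes no escaped quotes inside label values), so treat it as a test aid rather than a complete exposition-format parser.

```python
import re

# Matches lines of the form: metric_name{label="value",...} number
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
# Simplified label matcher: assumes label values contain no escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def unhealthy_sensors(text, metric="redfish_chassis_temperature_sensor_health"):
    """Return (sensor, value) pairs for samples of `metric` whose value is not 1."""
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        value = float(match.group("value"))
        if value != 1:
            result.append((labels.get("sensor"), value))
    return result


# Two of the sample lines posted in this comment.
SCRAPE = """
# TYPE redfish_chassis_temperature_sensor_health gauge
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="CPU1 Temp",sensor_id="0"} 1
redfish_chassis_temperature_sensor_health{chassis_id="System.Embedded.1",resource="temperature",sensor="System Board Inlet Temp",sensor_id="2"} 2
"""
print(unhealthy_sensors(SCRAPE))
```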

ulikl closed this as completed Nov 13, 2023

@fschlich

If "2" means Warning, then the HELP text is wrong: it should use CommonHealthHelp instead of CommonStateHelp, no?
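One way to avoid this kind of mismatch is to derive the HELP legend from the same table that produces the values. The Python sketch below illustrates the idea only. The `HEALTH_VALUES` mapping (OK=1, Warning=2, Critical=3) is an assumption consistent with the "2" seen above, the helper `help_line` is hypothetical, and nothing here reflects the actual contents of CommonHealthHelp in the exporter's source.

```python
# Assumed health encoding, consistent with "2" appearing for a Warning above.
HEALTH_VALUES = {"OK": 1, "Warning": 2, "Critical": 3}


def help_line(metric, description, values):
    """Build a HELP line whose legend is generated from the value table itself."""
    ordered = sorted(values.items(), key=lambda item: item[1])
    legend = ",".join(f"{number}({name})" for name, number in ordered)
    return f"# HELP {metric} {description},{legend}"


print(
    help_line(
        "redfish_chassis_temperature_sensor_health",
        "status health of temperature on this chassis component",
        HEALTH_VALUES,
    )
)
```

Because the legend and the emitted numbers come from one dictionary, a health metric cannot accidentally end up with the state legend.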
