collected metric "redfish_system_pcie_function_state" ... was collected before with the same name and label values on PERC H730 Mini #71

Open
fschlich opened this issue Aug 25, 2023 · 10 comments

Comments

@fschlich

We have a few older Dell systems that have a PERC H730 Mini integrated RAID controller. On these systems, redfish_exporter (latest git: e28371d) throws a fatal error, while it used to work ok prior to the collection of more detailed PCIe metrics:

An error has occurred while serving metrics:

2 error(s) occurred:
* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values

I think these adapters perhaps don't report a "state" the way the exporter expects. This is the data from /redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1:

{
  "@odata.context": "/redfish/v1/$metadata#Storage.Storage",
  "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1",
  "@odata.type": "#Storage.v1_4_0.Storage",
  "Description": "PERC H730 Mini",
  "Drives": [
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.0:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.1:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.2:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.3:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.4:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.5:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.6:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/Drives/Disk.Bay.7:Enclosure.Internal.0-1:RAID.Integrated.1-1"
    }
  ],
  "Drives@odata.count": 8,
  "Id": "RAID.Integrated.1-1",
  "Links": {
    "Enclosures": [
      {
        "@odata.id": "/redfish/v1/Chassis/Enclosure.Internal.0-1:RAID.Integrated.1-1"
      },
      {
        "@odata.id": "/redfish/v1/Chassis/System.Embedded.1"
      }
    ],
    "Enclosures@odata.count": 2
  },
  "Name": "PERC H730 Mini",
  "Status": {
    "Health": "OK",
    "HealthRollup": "OK",
    "State": "Enabled"
  },
  "StorageControllers": [
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/StorageControllers/RAID.Integrated.1-1",
      "Assembly": {
        "@odata.id": "/redfish/v1/Chassis/System.Embedded.1/Assembly"
      },
      "FirmwareVersion": "25.5.6.0009",
      "Identifiers": [
        {
          "DurableName": "544A842006943000",
          "DurableNameFormat": "NAA"
        }
      ],
      "Links": {},
      "Manufacturer": "DELL",
      "MemberId": "RAID.Integrated.1-1",
      "Model": "PERC H730 Mini",
      "Name": "PERC H730 Mini",
      "SpeedGbps": 12,
      "Status": {
        "Health": "OK",
        "HealthRollup": "OK",
        "State": "Enabled"
      },
      "SupportedControllerProtocols": [
        "PCIe"
      ],
      "SupportedDeviceProtocols": [
        "SAS",
        "SATA"
      ]
    }
  ],
  "StorageControllers@odata.count": 1,
  "Volumes": {
    "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Storage/RAID.Integrated.1-1/Volumes"
  }
}
@jenningsloy318
Owner

Hi,
Your error shows that it occurred when scraping pcie_function, but you didn't post that response; you only posted the Storage/RAID output. Can you please confirm?

@fschlich
Author

ok, so /redfish/v1/Systems/System.Embedded.1 has a few PCIeFunctions:

  "PCIeFunctions": [
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/130-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/130-0-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/9-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-23-4"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-2"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-3"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-49-2"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"
    }
  ],
  "PCIeFunctions@odata.count": 17,

and I read from the error message that it is 0-0-0 which we're interested in, so this is /redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0:

{
  "@odata.context": "/redfish/v1/$metadata#PCIeFunction.PCIeFunction",
  "@odata.etag": "1693376981",
  "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0",
  "@odata.type": "#PCIeFunction.v1_1_1.PCIeFunction",
  "ClassCode": "0x000006",
  "Description": "Xeon E7 v3/Xeon E5 v3/Core i7 DMI2",
  "DeviceClass": "Bridge",
  "DeviceId": "0x2f00",
  "FunctionId": 0,
  "FunctionType": "Physical",
  "Id": "0-0-0",
  "Links": {
    "Drives": [],
    "Drives@odata.count": 0,
    "EthernetInterfaces": [],
    "EthernetInterfaces@odata.count": 0,
    "PCIeDevice": {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevice/0-0"
    },
    "StorageControllers": [],
    "StorageControllers@odata.count": 0
  },
  "Name": "Xeon E7 v3/Xeon E5 v3/Core i7 DMI2",
  "RevisionId": "0x02",
  "Status": {
    "Health": "OK",
    "HealthRollup": "OK",
    "State": "Enabled"
  },
  "SubsystemId": "0x0000",
  "SubsystemVendorId": "0x8086",
  "VendorId": "0x8086"
}

Is that helpful? I'm happy to post more, please explain in detail what you might need

@jenningsloy318
Owner

Not exactly. Your error message is:

* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H730 Mini" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values

This means there must be some extra attribute that distinguishes these metrics, so please upload all the API responses that match the errors exactly.

From your single PCIeFunction response, I can't tell which label I could add to differentiate them.
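To illustrate why two samples with identical names and label values are rejected, here is a minimal sketch that only mimics the idea of a uniqueness check. It is not the Prometheus client's actual implementation, and `sample`, `seriesKey`, and `firstCollision` are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is a hypothetical stand-in for one collected metric.
type sample struct {
	name   string
	labels map[string]string
}

// seriesKey builds an identity string from the metric name and its labels,
// sorted by label name so that map ordering does not matter.
func seriesKey(s sample) string {
	keys := make([]string, 0, len(s.labels))
	for k := range s.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	b.WriteString(s.name)
	for _, k := range keys {
		fmt.Fprintf(&b, "|%s=%q", k, s.labels[k])
	}
	return b.String()
}

// firstCollision returns the index of the first sample whose identity was
// already seen, or -1 if all samples are distinct.
func firstCollision(samples []sample) int {
	seen := make(map[string]struct{}, len(samples))
	for i, s := range samples {
		key := seriesKey(s)
		if _, dup := seen[key]; dup {
			return i
		}
		seen[key] = struct{}{}
	}
	return -1
}

func main() {
	labels := map[string]string{
		"pcie_function_id":   "0-0-0",
		"pcie_function_name": "PERC H730 Mini",
		"resource":           "pcie_function",
	}
	samples := []sample{
		{"redfish_system_pcie_function_state", labels},
		{"redfish_system_pcie_function_state", labels}, // same function scraped twice
	}
	fmt.Println("first colliding sample index:", firstCollision(samples))
}
```

Under this model, the second sample can only be accepted if some label value differs from the first, which is why the matching API responses are needed.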

@fschlich
Author

ok, so three weeks ago I was confused, because what I was seeing didn't match my memories and I had a hard time reproducing the original issue. Today I took some more time and a systematic approach, and I am now certain that some servers which displayed this issue no longer do. On those servers, we have done firmware updates, among other things updating the "PowerEdge Server BIOS" from version 2.15 to 2.17.

On several boxes that still have a 2.15 or 2.13 BIOS and display the error, the output of /redfish/v1/Systems/System.Embedded.1 actually looks different to what I wrote three weeks ago: As you can see below, the PCIeFunction/0-0-0 is listed twice, and I guess that's the reason the exporter is scraping it twice, and unsurprisingly finds the same data twice.

Given that this is fixed in current firmware versions, I'm not sure if you want to change the exporter to guard against duplicate IDs, or just write it off as Dell's problem and close this issue?

$ curl 'https://..../redfish/v1/Systems/System.Embedded.1' | jq
...
  "PCIeFunctions": [
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/10-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"                      <==
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-23-4"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"                     <==
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-1"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-2"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/1-0-3"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-49-2"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-2-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"
    },
    {
      "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"
    }
  ],
  "PCIeFunctions@odata.count": 16,
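If the exporter were to guard against duplicate IDs, one possible approach is to drop repeated `@odata.id` values before scraping each member. The following is a minimal sketch under that assumption. `dedupeIDs` is a hypothetical helper, not a function from the exporter or from any Redfish library.

```go
package main

import "fmt"

// dedupeIDs returns ids with repeated entries removed, keeping the first
// occurrence of each value and preserving the original order.
func dedupeIDs(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			continue // already queued for scraping
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

func main() {
	const base = "/redfish/v1/Systems/System.Embedded.1/PCIeFunction/"
	ids := []string{
		base + "10-0-0",
		base + "0-0-0",
		base + "0-23-4",
		base + "0-0-0", // listed twice by the affected firmware
	}
	for _, id := range dedupeIDs(ids) {
		fmt.Println(id)
	}
}
```

Keeping the first occurrence preserves the order reported by the BMC, so the resulting metrics are the same as on a system whose firmware lists each function once.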

@hanchao131415

Opening http://172.100.70.202:9610/redfish?target=172.100.70.52 in a browser gives this result:

`An error has occurred while serving metrics:

8 error(s) occurred:

  • [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller" > label:<name:"pcie_device_id" value:"177-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller" > label:<name:"pcie_device_id" value:"177-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family SMBus" > label:<name:"pcie_device_id" value:"0-31" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family SMBus" > label:<name:"pcie_device_id" value:"0-31" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family PCI Express Root Port #5" > label:<name:"pcie_device_id" value:"0-28" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"C620 Series Chipset Family PCI Express Root Port #5" > label:<name:"pcie_device_id" value:"0-28" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"PowerEdge Rx5xx LOM Board" > label:<name:"pcie_device_id" value:"4-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values
  • [from Gatherer #2] collected metric "redfish_system_pcie_device_health_state" { label:<name:"hostname" value:"ipmi" > label:<name:"pcie_device" value:"PowerEdge Rx5xx LOM Board" > label:<name:"pcie_device_id" value:"4-0" > label:<name:"resource" value:"pcie_device" > gauge:<value:1 > } was collected before with the same name and label values`

==========================================================================

Requesting /redfish/v1/Systems/System.Embedded.1 with Postman returns:

"PCIeDevices": [ { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-23" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-28" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/4-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/202-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/49-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/3-0" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-17" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-28" }, { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/4-0" } ], "PCIeDevices@odata.count": 13,

The returned results contain the same @odata.id more than once, for example: "@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0"
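To check a saved System response for repeated members without reading it by eye, a small sketch like the following could be used. It assumes the response has the `PCIeDevices` shape quoted above; `systemDoc` and `duplicateIDs` are hypothetical names, and the sample body is a shortened copy of that response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// systemDoc holds only the part of the System resource that matters here.
type systemDoc struct {
	PCIeDevices []struct {
		ODataID string `json:"@odata.id"`
	} `json:"PCIeDevices"`
}

// duplicateIDs returns every @odata.id that occurs more than once in the
// PCIeDevices array, each reported once, in order of first repetition.
func duplicateIDs(body []byte) ([]string, error) {
	var doc systemDoc
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	var dups []string
	for _, d := range doc.PCIeDevices {
		counts[d.ODataID]++
		if counts[d.ODataID] == 2 { // second sighting: record it once
			dups = append(dups, d.ODataID)
		}
	}
	return dups, nil
}

func main() {
	body := []byte(`{"PCIeDevices": [
		{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0"},
		{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/177-0"},
		{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31"},
		{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-23"},
		{"@odata.id": "/redfish/v1/Systems/System.Embedded.1/PCIeDevices/0-31"}
	]}`)
	dups, err := duplicateIDs(body)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, id := range dups {
		fmt.Println("duplicate:", id)
	}
}
```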

===================================================================

My server is a Dell PowerEdge R750 with iDRAC9.

@fschlich
Author

fschlich commented Dec 1, 2023

@hanchao131415 what is your BiosVersion value from /redfish/v1/Systems/System.Embedded.1? If it is less than 2.17.0, does the issue persist when you upgrade to the current server firmware?

@hanchao131415

hanchao131415 commented Dec 4, 2023

@fschlich

"AssetTag": "", "Bios": { "@odata.id": "/redfish/v1/Systems/System.Embedded.1/Bios" }, "BiosVersion": "1.8.2",

==================================
My BIOS version is 1.8.2, and I have not upgraded it.

@burdorff

burdorff commented Feb 14, 2024

"AssetTag":"","Bios":{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/Bios"},"BiosVersion":"2.9.0",

I see the issue here despite a bios of 2.9.0.

2 error(s) occurred:
* [from Gatherer #2] collected metric "redfish_system_pcie_function_state" { label:<name:"hostname" value:"--removed--" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H710P Mini (for monolithics)" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "redfish_system_pcie_function_health_state" { label:<name:"hostname" value:"--removed--" > label:<name:"pci_function_deviceclass" value:"UnclassifiedDevice" > label:<name:"pci_function_type" value:"Physical" > label:<name:"pcie_function_id" value:"0-0-0" > label:<name:"pcie_function_name" value:"PERC H710P Mini (for monolithics)" > label:<name:"resource" value:"pcie_function" > gauge:<value:1 > } was collected before with the same name and label values

In my case it's possible that some examples (such as this one) have IDRAC7 (which still supports Redfish API).
edit: confirmed on a 2.18.1 BIOS for IDRAC8

However the pcie_function 0-0-0 still appears twice despite the bios version:
https://removed/redfish/v1/Systems/System.Embedded.1

"PCIeFunctions":[{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/6-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-29-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-31-2"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-1-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-4"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-2"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/8-0-3"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/2-0-1"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-26-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-3-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-0"},{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-28-7"}], "PCIeFunctions@odata.count":18,

@burdorff

https://removed/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0

{"@odata.context":"/redfish/v1/$metadata#PCIeFunction.PCIeFunction","@odata.etag":"1705552257","@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeFunction/0-0-0","@odata.type":"#PCIeFunction.v1_1_1.PCIeFunction","ClassCode":"0x000000","Description":"PERC H830 Adapter","DeviceClass":"UnclassifiedDevice","DeviceId":"0x005d","FunctionId":0,"FunctionType":"Physical","Id":"0-0-0","Links":{"Drives":[],"Drives@odata.count":0,"EthernetInterfaces":[],"EthernetInterfaces@odata.count":0,"PCIeDevice":{"@odata.id":"/redfish/v1/Systems/System.Embedded.1/PCIeDevice/0-0"},"StorageControllers":[],"StorageControllers@odata.count":0},"Name":"PERC H830 Adapter","RevisionId":"0x00","Status":{"Health":"OK","HealthRollup":"OK","State":"Enabled"},"SubsystemId":"0x1f41","SubsystemVendorId":"0x1028","VendorId":"0x1000"}

@GregWhiteyBialas

Hi,
I submitted a PR that works around this problem. Any feedback is welcome.
