Dell R730xd iDRAC "System Board CPU Usage" metric 100%

Hi everyone,

I have an R730xd that suits the needs of my homelab perfectly. It's equipped with a Tesla P4 and a Mini PCIe Coral TPU, and everything has been working smoothly.

I recently dove into iDRAC's capabilities, setting up alerts and the like, and something caught my eye. The "System Board CPU Usage" metric under Hardware > System Performance in iDRAC shows 100% even when the server is only moderately loaded. For context, here's what it shows:

[Screenshot: iDRAC, Hardware > System Performance]

And here is the PVE (Proxmox VE) Summary at the same moment:

[Screenshot: PVE Summary]

The load shown in the PVE Summary matches what I'd expect from the server's current workload.

According to iDRAC's help documentation, this metric is supposed to reflect the current summarized CPU load:

In 13th generation servers (with iDRAC Enterprise) that supports Intel ME, you can view the performance monitoring data for the CPU, memory, and I/O utilization. Intel ME supports Compute Usage Per Second (CUPS) functionality. It provides real-time monitoring of CPU, memory, I/O, and system level utilization index for the server. It is independent of the OS and does not consume CPU resources. The CUPS sensors in the server compute the CPU, memory and I/O resource utilization values as CUPS Index value. iDRAC monitors this CUPS index for overall system utilization and it also monitors instantaneous values of CPU, memory, and I/O utilization index.

CPU utilization — The combined CPU utilization of all the CPU cores in the system. Utilization is based on time spent in active state versus time spent in inactive state.
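The help text defines CPU utilization as time spent in the active state versus time spent in the inactive state. To get the same ratio from the OS side for comparison, here's a minimal Python sketch that runs on the PVE host and diffs two snapshots of the aggregate `cpu` line in Linux's `/proc/stat`. Counting `idle` + `iowait` as "inactive" is my own assumption, and there's no guarantee Intel ME defines "active" the same way, so the two numbers may legitimately differ somewhat. They shouldn't be 100% vs. moderate load, though.

```python
import time


def parse_cpu_line(line):
    """Return (inactive, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest/guest_nice are already counted inside user/nice, so they are left out.
    values = [int(v) for v in fields[1:9]]
    inactive = values[3] + values[4]  # assumption: idle + iowait = "inactive" time
    return inactive, sum(values)


def utilization(prev, curr):
    """Percent of elapsed CPU time spent active between two (inactive, total) samples."""
    d_inactive = curr[0] - prev[0]
    d_total = curr[1] - prev[1]
    if d_total <= 0:
        return 0.0
    return 100.0 * (d_total - d_inactive) / d_total


def read_cpu_times(path="/proc/stat"):
    """Read the first line of /proc/stat (the all-CPU aggregate) and parse it."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def sample_utilization(interval=1.0):
    """Take two snapshots `interval` seconds apart and return the active percentage."""
    first = read_cpu_times()
    time.sleep(interval)
    return utilization(first, read_cpu_times())
```

Calling `sample_utilization(1.0)` on the host while reading the iDRAC value at the same moment gives a rough side-by-side check. Diffing two snapshots matters because `/proc/stat` counters are cumulative since boot, so a single read would only give the average since startup.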

Initially, monitoring of these performance metrics was effectively disabled on my server, with the warning threshold set at 101%. Watching the metrics myself, I've noticed that while the I/O and memory usage readings track the actual load, the CPU metric doesn't: it sits at 100% even when the CPUs are far from fully loaded.
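To log the iDRAC side next to the OS figure without clicking through the web UI, one option might be the IPMI sensor list. I haven't confirmed which usage sensors (if any) `ipmitool sdr` exposes on this particular box, so the sensor names in the usage example are hypothetical. This sketch pulls every percent-valued row out of that pipe-separated text and flags a gap between an iDRAC reading and an OS reading. The 20-point tolerance is an arbitrary placeholder.

```python
import re

# Matches rows shaped like "CPU Usage        | 100 percent       | ok".
# Sensor names are whatever your BMC reports; nothing here is Dell-specific.
_PERCENT_ROW = re.compile(
    r"^\s*(?P<name>[^|]+?)\s*\|\s*(?P<value>\d+(?:\.\d+)?)\s+percent\s*\|\s*(?P<status>\S+)"
)


def parse_percent_sensors(sdr_text):
    """Return {sensor_name: (value, status)} for rows whose reading is in percent."""
    sensors = {}
    for line in sdr_text.splitlines():
        m = _PERCENT_ROW.match(line)
        if m:
            sensors[m.group("name")] = (float(m.group("value")), m.group("status"))
    return sensors


def flag_mismatch(idrac_pct, os_pct, tolerance=20.0):
    """True if the BMC and OS readings differ by more than `tolerance` points (hypothetical threshold)."""
    return abs(idrac_pct - os_pct) > tolerance
```

On the host you'd feed it something like `subprocess.run(["ipmitool", "sdr"], capture_output=True, text=True).stdout`. Rows in other units (RPM, degrees, volts) are simply skipped.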

I'm wondering if this could be due to the CPUs installed: 72 x Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz (2 Sockets), which appear to be some type of OEM variant common in China. Might they lack full support for CUPS functionality?

This inquiry is driven by curiosity more than anything. Having OS-independent monitoring of the actual CPU load would be ideal. So, what are your thoughts on this, reddit?