Key Insight: “100% GPU Util” ≠ “100% Heat”

4 min read · May 6, 2025

What “GPU Util” Actually Measures

When NVIDIA tools such as NVML or DCGM report the utilization.gpu metric, they divide time into small slices, typically milliseconds. For each time slice, they check only whether at least one CUDA kernel is resident on any of the GPU's cores, called Streaming Multiprocessors (SMs). How much work the GPU is actually doing does not matter; whether a kernel is merely ready to execute or fully executing, being "occupied" counts as 100% utilization for that slice. As a result, 100% on the utilization.gpu metric does not reflect:

  • Functional unit activity — It doesn’t show whether FP32 units, Tensor Cores, or memory controllers are actively working or sitting idle.
  • SM occupancy — It ignores how fully the SMs are used. An SM may be active but running only a small number of warps (e.g., just 1 out of many possible).
  • Voltage/frequency state — It doesn’t account for power-saving features like DVFS (Dynamic Voltage and Frequency Scaling) or clock-gating, where parts of the chip may run slower or be temporarily turned off.
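To make the sampling semantics concrete, here is a small illustrative model of that time-slice accounting. This is a sketch of the counting rule described above, not NVIDIA's actual implementation; the slice granularity and kernel traces are invented for illustration.

```python
# Illustrative model of utilization.gpu accounting: a time slice counts as
# "busy" if at least one kernel is resident during it, no matter how little
# of the GPU that kernel actually uses.
# NOTE: the traces below are invented; this is not NVIDIA's implementation.

def utilization_gpu(slices):
    """slices: list of ints, the number of kernels resident in each time slice.

    Returns the percentage of slices in which at least one kernel was resident.
    """
    busy = sum(1 for resident_kernels in slices if resident_kernels >= 1)
    return 100 * busy / len(slices)

# A trivial kernel occupying a single SM in every slice reads as 100% "util",
# exactly like a workload that saturates every SM and Tensor Core.
single_warp_trace = [1] * 10    # one tiny kernel always resident
saturated_trace   = [132] * 10  # many kernels saturating the GPU

print(utilization_gpu(single_warp_trace))  # 100.0
print(utilization_gpu(saturated_trace))    # 100.0
```

Both traces report 100% even though their power draw and heat output would differ by hundreds of watts, which is exactly why the metric is a poor thermal proxy.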

Example cases where “100% Util” masks variable heat output

The utilization.gpu metric often shows 100%, but this can be misleading. The table below compares what the metric reports with the GPU’s actual activity and power usage, and clarifies what that 100% utilization really means in different scenarios.

* TDP (Thermal Design Power) is the expected maximum heat a GPU generates under typical peak load, measured in watts. It’s a design target — not the absolute power limit — used to size cooling systems for safe and stable operation.

Metrics to Track Real Heat Generation

Ultimately, the best signal for gauging thermal load is the power draw reported by the nvmlDeviceGetPowerUsage API. Combined with the pstate metric, it lets us determine how much heat the workloads running on a GPU generate and whether thermal throttling has occurred because of insufficient cooling.
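As a sketch of how these two signals combine: NVML reports power in milliwatts, and P-states range from P0 (maximum performance) down to idle states such as P8 or P12. The 90%-of-TDP threshold and the "high power but demoted P-state" rule below are heuristics assumed here for illustration, not an NVIDIA-defined criterion; on real hardware the inputs would come from pynvml's nvmlDeviceGetPowerUsage and nvmlDeviceGetPerformanceState.

```python
# Hedged sketch: combine NVML-style power and P-state readings into a
# simple throttle heuristic. The 0.9 * TDP threshold is an assumption
# made for illustration, not an NVIDIA-defined rule.

def watts_from_nvml(power_mw):
    """Convert NVML's milliwatt power reading to watts."""
    return power_mw / 1000.0

def likely_throttling(pstate, power_w, tdp_w, busy_pstates=(0, 1)):
    """Flag a GPU that draws near-TDP power yet is demoted out of P0/P1.

    Under load a healthy GPU should sit in a high-performance P-state;
    near-TDP draw at a lower P-state suggests the clocks were forced down.
    """
    return power_w > 0.9 * tdp_w and pstate not in busy_pstates

print(watts_from_nvml(285_000))            # 285.0
print(likely_throttling(2, 640.0, 700.0))  # True: near TDP but demoted to P2
print(likely_throttling(0, 640.0, 700.0))  # False: near TDP and still in P0
```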

How Federator.ai monitors and manages thermal energy generated by GPU workloads

It is useful to define a heat index that models generated heat independently of the GPU model. A reasonable way to define such a heat index (HI) is

HI = (GPU Power Draw − GPU Idle Power) / (GPU Max Power − GPU Idle Power)

With this definition, the heat index ranges between 0 and 1. Federator.ai monitors the scheduling and orchestration of GPU workloads and the fluctuation of the heat index across the GPUs of servers in the same rack, which are cooled by the same CDU (coolant distribution unit). It also monitors the CDU's temperature sensors, coolant flow rate, and other CDU metrics in real time. With this information, Federator.ai dynamically adjusts the CDU coolant flow rate to maintain the optimal GPU operating temperature range while reducing the energy the CDU consumes.
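The heat index above can be computed directly from three power readings. A minimal sketch follows; the clamping to [0, 1] is an assumption added here to guard against samples that briefly dip below idle or spike above the rated maximum.

```python
def heat_index(power_draw_w, idle_power_w, max_power_w):
    """HI = (draw - idle) / (max - idle), clamped to [0, 1].

    Clamping is an added assumption, not part of the definition, to handle
    transient samples outside the idle..max power range.
    """
    hi = (power_draw_w - idle_power_w) / (max_power_w - idle_power_w)
    return min(1.0, max(0.0, hi))

# Example: a GPU rated at 700 W max, idling at 60 W, currently drawing 380 W.
print(heat_index(380.0, 60.0, 700.0))  # 0.5
```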

It is also important to raise alerts and notifications if any GPU reaches its maximum operating temperature and experiences thermal throttling. Federator.ai monitors the GPU's pstate metric for this purpose.

The Federator.ai Smart Cooling system consists of the following three management planes for efficient thermal management.

  1. Real-time GPU Metrics Monitoring at the Edge
    An edge agent installed on each GPU server collects and monitors DCGM metrics (power usage, temperatures, pstate) and computes the heat index of each GPU at a 1-second interval. An alert is triggered if GPU thermal throttling occurs or a GPU's temperature reaches a predefined maximum boundary.
  2. Thermal-Aware Workload Placement
    Using metrics collected from DCGM as well as from the liquid cooling system (e.g., the CDU), Federator.ai places new GPU workloads on appropriate GPU servers so that it avoids hotspots while making the most energy-efficient use of the CDUs.
  3. Intelligent Smart Cooling Control
    Federator.ai interfaces with external liquid cooling hardware, such as rack-based or in-row CDUs, and adjusts flow rates and valves so that GPUs operate in the optimal temperature range with the least amount of energy.
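To illustrate the thermal-aware placement idea in plane 2, here is a toy scheduler sketch. This is not Federator.ai's actual algorithm; the hotspot threshold of 0.8 and the "pick the coolest server" rule are assumptions made purely for illustration.

```python
# Toy thermal-aware placement sketch (NOT Federator.ai's actual scheduler):
# among candidate servers, pick the one with the lowest current heat index,
# skipping any server already near the hotspot threshold.

def place_workload(servers, hotspot_threshold=0.8):
    """servers: dict mapping server name -> current mean GPU heat index.

    Returns the coolest server below the threshold, or None if every
    candidate would become a hotspot. The 0.8 threshold is an assumption.
    """
    candidates = {name: hi for name, hi in servers.items()
                  if hi < hotspot_threshold}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

rack = {"server-a": 0.72, "server-b": 0.41, "server-c": 0.55}
print(place_workload(rack))  # server-b
```

A production scheduler would also weigh CDU coolant capacity and the workload's expected power draw, but even this simple rule shows how heat-index telemetry can steer placement away from hotspots.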

The following table summarizes how the Federator.ai GPU Booster integrates the workload-aware IT plane and the liquid-cooling facility plane into an intelligent smart cooling solution.

#AI #GPU #SmartCooling #DataCenter #Server #AIDC

Bottom line: a single "100% GPU util" flag is a poor proxy for thermal load; Federator.ai should key its cooling logic on power draw and functional-unit activity, not the coarse utilization bit.


Written by ProphetStor

Pioneering Excellence in IT/Cloud Efficiency and GPU Management Through Resilient and Advanced Optimization.

Responses (1)