<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>Intel(R) XPU Manager and XPU System Management Interface &mdash; XPU Manager  documentation</title>
      <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
      <link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
      <link rel="stylesheet" href="_static/doxyrest-pygments.css" type="text/css" />
      <link rel="stylesheet" href="_static/doxyrest-sphinx_rtd_theme.css" type="text/css" />
  <!--[if lt IE 9]>
    <script src="_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
        <script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
        <script src="_static/jquery.js"></script>
        <script src="_static/underscore.js"></script>
        <script src="_static/doctools.js"></script>
        <script src="_static/target-highlight.js"></script>
    <script src="_static/js/theme.js"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="next" title="Installation" href="installation.html" />
    <link rel="prev" title="XPUM Documentation" href="xpum_index.html" /> 
</head>

<body class="wy-body-for-nav"> 
  <div class="wy-grid-for-nav">
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >
            <a href="index.html" class="icon icon-home"> XPU Manager
          </a>
<div role="search">
  <form id="rtd-search-form" class="wy-form" action="search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>
        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
              <ul class="current">
<li class="toctree-l1 current"><a class="reference internal" href="xpum_index.html">XPUM Documentation</a><ul class="current">
<li class="toctree-l2 current"><a class="current reference internal" href="#">Intel(R) XPU Manager and XPU System Management Interface</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#intel-r-xpu-manager-features">Intel(R) XPU Manager features</a></li>
<li class="toctree-l3"><a class="reference internal" href="#cli-output-of-gpu-device-info-telemetries-and-firmware-update">CLI output of GPU device info, telemetries and firmware update</a></li>
<li class="toctree-l3"><a class="reference internal" href="#feature-set-of-xpu-manager-xpu-smi-and-windows-cli-tool">Feature set of XPU Manager, XPU-SMI and Windows CLI tool</a></li>
<li class="toctree-l3"><a class="reference internal" href="#how-to-get-xpu-manager-xpu-smi-windows-cli-and-amcmcli-binaries">How to get XPU Manager, XPU-SMI, Windows CLI and amcmcli binaries.</a></li>
<li class="toctree-l3"><a class="reference internal" href="#supported-devices">Supported Devices</a></li>
<li class="toctree-l3"><a class="reference internal" href="#supported-oses">Supported OSes</a></li>
<li class="toctree-l3"><a class="reference internal" href="#documentation">Documentation</a></li>
<li class="toctree-l3"><a class="reference internal" href="#architecture">Architecture</a></li>
<li class="toctree-l3"><a class="reference internal" href="#gpu-telemetry-exported-to-grafana">GPU telemetry exported to Grafana</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="installation.html">Installation</a></li>
<li class="toctree-l2"><a class="reference internal" href="CLI_user_guide.html">Intel(R) XPU Manager Command Line Interface User Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="restful.html">Restful API</a></li>
<li class="toctree-l2"><a class="reference internal" href="xpumdoc/rst/index.html">Core Library API</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="smi_index.html">XPU-SMI Linux Documentation</a></li>
<li class="toctree-l1"><a class="reference internal" href="win_index.html">XPU-SMI Windows Documentation</a></li>
</ul>

        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="index.html">XPU Manager</a>
      </nav>

      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="Page navigation">
  <ul class="wy-breadcrumbs">
      <li><a href="index.html" class="icon icon-home"></a> &raquo;</li>
          <li><a href="xpum_index.html">XPUM Documentation</a> &raquo;</li>
      <li>Intel(R) XPU Manager and XPU System Management Interface</li>
      <li class="wy-breadcrumbs-aside">
      </li>
  </ul>
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
             
  <section class="tex2jax_ignore mathjax_ignore" id="intel-r-xpu-manager-and-xpu-system-management-interface">
<h1>Intel(R) XPU Manager and XPU System Management Interface<a class="headerlink" href="#intel-r-xpu-manager-and-xpu-system-management-interface" title="Permalink to this headline"></a></h1>
<p>Intel(R) XPU Manager is a free and open-source tool for monitoring and managing Intel data center GPUs.</p>
<p>It is designed to simplify administration, maximize reliability and uptime, and improve utilization.</p>
<p>XPU Manager can be used standalone through its command line interface (CLI) to manage GPUs locally, or through its RESTful APIs to manage GPUs remotely. Intel(R) XPU System Management Interface (XPU-SMI) is the daemon-less version of XPU Manager and provides only the local interface. The XPU-SMI feature set is a subset of the XPU Manager feature set; both are listed in the table below. Note that XPU-SMI is included in the GPU driver repository. To use XPU Manager instead, uninstall XPU-SMI first and then install XPU Manager.</p>
<p>amcmcli is a portable CLI tool for managing GPU AMC firmware on Linux. It does not depend on the GPU driver.</p>
<p>Third-party open-source and commercial workload and cluster managers, job schedulers, and monitoring solutions can also integrate XPU Manager or XPU-SMI to manage Intel data center GPUs.</p>
<section id="intel-r-xpu-manager-features">
<h2>Intel(R) XPU Manager features<a class="headerlink" href="#intel-r-xpu-manager-features" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>Administration:</p>
<ul>
<li><p>GPU discovery and information - name, model, serial, stepping, location, frequency, memory capacity, firmware version</p></li>
<li><p>GPU topology and grouping</p></li>
<li><p>GPU firmware updates, including GPU GFX firmware and AMC (Add-in card Management Controller) firmware</p></li>
</ul>
</li>
<li><p>Monitoring:</p>
<ul>
<li><p>GPU telemetry – utilization, power, frequency, temperature, fabric speed, memory throughput, errors</p></li>
<li><p>GPU health – memory, power, temperature, fabric port, etc.</p></li>
</ul>
</li>
<li><p>Diagnostics:</p>
<ul>
<li><p>Three levels of GPU diagnostic tests</p></li>
<li><p>Pre-checks for critical GPU hardware and driver issues</p></li>
<li><p>GPU log collection for issue investigation</p></li>
</ul>
</li>
<li><p>Configuration:</p>
<ul>
<li><p>GPU settings - GPU power limits, frequency range, standby mode, scheduler mode, ECC On/Off, performance factor, fabric port status</p></li>
<li><p>GPU policies - throttle the GPU when the configured temperature threshold is reached</p></li>
</ul>
</li>
<li><p>Supported Frameworks:</p>
<ul>
<li><p>Prometheus exporter, Docker container support, Icinga plugin</p></li>
</ul>
</li>
</ul>
</section>
<section id="cli-output-of-gpu-device-info-telemetries-and-firmware-update">
<h2>CLI output of GPU device info, telemetries and firmware update<a class="headerlink" href="#cli-output-of-gpu-device-info-telemetries-and-firmware-update" title="Permalink to this headline"></a></h2>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>xpumcli discovery -d 0
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Type: GPU                                                                     |
|           | Device Name: Intel(R) Graphics [0x56c0]                                              |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 01000000-0000-0000-0000-0000004d0000                                           |
|           | Serial Number: LQAC20305316                                                          |
|           | Core Clock Rate: 2050 MHz                                                            |
|           | Stepping: C0                                                                         |
|           |                                                                                      |
|           | Driver Version:                                                                      |
|           | Kernel Version: 5.15.47+prerelease3762                                               |
|           | GFX Firmware Name: GFX                                                               |
|           | GFX Firmware Version: DG02_1.3170                                                    |
|           | GFX Data Firmware Name: GFX_DATA                                                     |
|           | GFX Data Firmware Version: 0x12d                                                     |
|           |                                                                                      |
|           | PCI BDF Address: 0000:4d:00.0                                                        |
|           | PCI Slot: J37 - Riser 1, Slot 1                                                      |
|           | PCIe Generation: 4                                                                   |
|           | PCIe Max Link Width: 16                                                              |
+-----------+--------------------------------------------------------------------------------------+

xpumcli dump -d 0 -m 0,1,2,3
Timestamp, DeviceId, GPU Utilization (%), GPU Power (W), GPU Frequency (MHz), GPU Core Temperature (Celsius Degree)
21:23:00.000,    0, 99.55, 119.61, 1800, 49.00
21:23:01.000,    0, 99.45, 119.36, 1800, 50.00
21:23:02.000,    0, 99.48, 119.55, 1750, 50.50
21:23:03.000,    0, 99.65, 119.24, 1700, 51.00


sudo xpumcli updatefw -d 0 -t GFX -f ATS_M150_512_C0_PVT_ES_032_gfx_fwupdate_SOC1.bin
Device 0 FW version: DG02_1.3170
Image FW version: DG02_1.3172
Do you want to continue? (y/n) y
Start to update firmware
Firmware Name: GFX
Image path: /home/dcm/ATS_M150_512_C0_PVT_ES_032_gfx_fwupdate_SOC1.bin
[============================================================] 100 %
Update firmware successfully.
</pre></div>
</div>
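<p>The <code>dump</code> output shown above is plain comma-separated text, so it can be post-processed with ordinary scripting. The following is a minimal sketch rather than part of XPU Manager: the helper names <code>parse_dump</code> and <code>column_mean</code> are hypothetical, the sample text is copied from the output above, and the sketch assumes every column except <code>Timestamp</code> is numeric, which holds for this sample but may not hold for every metric.</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
import csv
import io

# Sample text copied from the `xpumcli dump -d 0 -m 0,1,2,3` output above.
SAMPLE = """Timestamp, DeviceId, GPU Utilization (%), GPU Power (W), GPU Frequency (MHz), GPU Core Temperature (Celsius Degree)
21:23:00.000,    0, 99.55, 119.61, 1800, 49.00
21:23:01.000,    0, 99.45, 119.36, 1800, 50.00
21:23:02.000,    0, 99.48, 119.55, 1750, 50.50
21:23:03.000,    0, 99.65, 119.24, 1700, 51.00
"""


def parse_dump(text):
    """Parse comma-separated dump text into a list of dicts keyed by column name."""
    # skipinitialspace drops the padding that follows each comma.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, records = rows[0], rows[1:]
    parsed = []
    for record in records:
        entry = {}
        for name, value in zip(header, record):
            # Assumption: only the Timestamp column is non-numeric.
            entry[name] = value if name == "Timestamp" else float(value)
        parsed.append(entry)
    return parsed


def column_mean(samples, name):
    """Average one numeric column across all parsed samples."""
    return sum(s[name] for s in samples) / len(samples)


samples = parse_dump(SAMPLE)
print(len(samples), "samples")
print("mean power (W):", round(column_mean(samples, "GPU Power (W)"), 2))
print("max temperature:", max(s["GPU Core Temperature (Celsius Degree)"] for s in samples))
```
</pre></div>
</div>
<p>In practice the text would come from the captured output of the CLI rather than from an embedded string; the column names are used exactly as they appear in the header line.</p>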
</section>
<section id="feature-set-of-xpu-manager-xpu-smi-and-windows-cli-tool">
<h2>Feature set of XPU Manager, XPU-SMI and Windows CLI tool<a class="headerlink" href="#feature-set-of-xpu-manager-xpu-smi-and-windows-cli-tool" title="Permalink to this headline"></a></h2>
<table class="colwidths-auto docutils align-default">
<thead>
<tr class="row-odd"><th class="text-left head"><p>Feature</p></th>
<th class="text-center head"><p>XPU Manager</p></th>
<th class="text-center head"><p>XPU-SMI</p></th>
<th class="text-center head"><p>Windows CLI tool</p></th>
<th class="text-center head"><p>amcmcli</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td class="text-left"><p>Device Info and Topology</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-odd"><td class="text-left"><p>GPU Telemetries</p></td>
<td class="text-center"><p>Yes (aggregated data)</p></td>
<td class="text-center"><p>Yes (real-time data)</p></td>
<td class="text-center"><p>Yes (real-time data)</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-even"><td class="text-left"><p>GPU Firmware Update</p></td>
<td class="text-center"><p>GFX, GFX_Data, AMC</p></td>
<td class="text-center"><p>GFX, GFX_Data, AMC</p></td>
<td class="text-center"><p>GFX, GFX_Data, AMC</p></td>
<td class="text-center"><p>AMC (IPMI)</p></td>
</tr>
<tr class="row-odd"><td class="text-left"><p>GPU Configuration</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-even"><td class="text-left"><p>GPU Diagnostics</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-odd"><td class="text-left"><p>GPU Health</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-even"><td class="text-left"><p>GPU Grouping</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-odd"><td class="text-left"><p>GPU Policy</p></td>
<td class="text-center"><p>Yes</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
<td class="text-center"><p>No</p></td>
</tr>
<tr class="row-even"><td class="text-left"><p>Architecture</p></td>
<td class="text-center"><p>Daemon based</p></td>
<td class="text-center"><p>Daemon-less</p></td>
<td class="text-center"><p>Daemon-less</p></td>
<td class="text-center"><p>Daemon-less</p></td>
</tr>
<tr class="row-odd"><td class="text-left"><p>Interfaces</p></td>
<td class="text-center"><p>CLI, RESTful, Library</p></td>
<td class="text-center"><p>CLI, Library</p></td>
<td class="text-center"><p>CLI, Library</p></td>
<td class="text-center"><p>CLI</p></td>
</tr>
</tbody>
</table>
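<p>When choosing a tool for a deployment, the table can be read as a lookup from required features to the tools that provide all of them. The sketch below transcribes the Yes/No rows of the table into a Python dictionary; the short feature keys and the function name <code>tools_supporting</code> are hypothetical names chosen for illustration, and the firmware row is simplified to "some firmware update is supported" (amcmcli updates AMC firmware only).</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
# Feature support transcribed from the table above.
# The short keys are illustrative names, not identifiers used by the tools.
ALL_TOOLS = {"XPU Manager", "XPU-SMI", "Windows CLI tool", "amcmcli"}

FEATURES = {
    "device_info": {"XPU Manager", "XPU-SMI", "Windows CLI tool"},
    "telemetry": {"XPU Manager", "XPU-SMI", "Windows CLI tool"},
    # amcmcli supports AMC firmware only; the other tools also cover GFX and GFX_Data.
    "firmware_update": set(ALL_TOOLS),
    "configuration": {"XPU Manager", "XPU-SMI", "Windows CLI tool"},
    "diagnostics": {"XPU Manager", "XPU-SMI"},
    "health": {"XPU Manager", "XPU-SMI"},
    "grouping": {"XPU Manager"},
    "policy": {"XPU Manager"},
}


def tools_supporting(required):
    """Return the sorted list of tools that support every feature in `required`."""
    candidates = set(ALL_TOOLS)
    for feature in required:
        candidates = candidates.intersection(FEATURES[feature])
    return sorted(candidates)


print(tools_supporting(["diagnostics", "firmware_update"]))
print(tools_supporting(["policy", "telemetry"]))
```
</pre></div>
</div>
<p>An empty list of requirements returns all four tools, and an unknown feature key raises <code>KeyError</code>, which keeps typos from silently matching everything.</p>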
</section>
<section id="how-to-get-xpu-manager-xpu-smi-windows-cli-and-amcmcli-binaries">
<h2>How to get XPU Manager, XPU-SMI, Windows CLI and amcmcli binaries.<a class="headerlink" href="#how-to-get-xpu-manager-xpu-smi-windows-cli-and-amcmcli-binaries" title="Permalink to this headline"></a></h2>
<p>Download the latest installers and binaries from the <a class="reference external" href="https://github.com/intel/xpumanager/releases">Releases</a> page.</p>
</section>
<section id="supported-devices">
<h2>Supported Devices<a class="headerlink" href="#supported-devices" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>Intel(R) Data Center Flex Series GPU (<a class="reference external" href="https://dgpu-docs.intel.com/installation-guides/index.html">GPU Driver Installation Guides</a>)</p></li>
<li><p>Intel(R) Data Center Max Series GPU (<a class="reference external" href="https://dgpu-docs.intel.com/installation-guides/index.html">GPU Driver Installation Guides</a>)</p></li>
</ul>
</section>
<section id="supported-oses">
<h2>Supported OSes<a class="headerlink" href="#supported-oses" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>XPU Manager</p>
<ul>
<li><p>Ubuntu 20.04.3/22.04</p></li>
<li><p>RHEL 8.5/8.6</p></li>
<li><p>CentOS 8/9 Stream</p></li>
<li><p>CentOS 7.4/7.9</p></li>
<li><p>SLES 15 SP3/SP4</p></li>
<li><p>Windows Server 2019/2022 (limited feature set: GPU device info, GPU telemetry, GPU firmware update, and GPU configuration)</p></li>
</ul>
</li>
<li><p>XPU-SMI</p>
<ul>
<li><p>Ubuntu 20.04.3/22.04</p></li>
<li><p>RHEL 8.5/8.6</p></li>
<li><p>CentOS 8/9 Stream</p></li>
<li><p>CentOS 7.4/7.9</p></li>
<li><p>SLES 15 SP3/SP4</p></li>
<li><p>Debian 10.13</p></li>
</ul>
</li>
</ul>
</section>
<section id="documentation">
<h2>Documentation<a class="headerlink" href="#documentation" title="Permalink to this headline"></a></h2>
<ul class="simple">
<li><p>Refer to the <span class="xref myst">XPU Manager Installation Guide</span> for how to install and uninstall XPU Manager.</p></li>
<li><p>Refer to the <span class="xref myst">XPU-SMI Installation Guide</span> for how to install and uninstall XPU-SMI.</p></li>
<li><p>Refer to the <span class="xref myst">XPU Manager CLI User Guide</span> to get started with XPU Manager.</p></li>
<li><p>Refer to the <span class="xref myst">XPU-SMI CLI User Guide</span> to get started with XPU-SMI.</p></li>
<li><p>Refer to the <span class="xref myst">XPU Manager Windows CLI User Guide</span> to get started with the XPU Manager Windows CLI.</p></li>
<li><p>Refer to the <span class="xref myst">XPU Manager amcmcli User Guide</span> to get started with amcmcli.</p></li>
<li><p>Refer to <a class="reference external" href="https://hub.docker.com/r/intel/xpumanager">DockerHub</a> for a Docker container image that can be used as a Prometheus exporter in a Kubernetes environment.</p></li>
<li><p>Refer to <span class="xref myst">Building XPU Manager Installer</span> to build XPU Manager installer packages.</p></li>
<li><p>Refer to <a class="reference external" href="https://intel.github.io/xpumanager/smi_index.html">XPU Manager/XPU-SMI API documents</a> to integrate the library or RESTful interface.</p></li>
</ul>
</section>
<section id="architecture">
<h2>Architecture<a class="headerlink" href="#architecture" title="Permalink to this headline"></a></h2>
<p><img alt="XPU Manager Architecture" src="_images/architecture.PNG" /></p>
</section>
<section id="gpu-telemetry-exported-to-grafana">
<h2>GPU telemetry exported to Grafana<a class="headerlink" href="#gpu-telemetry-exported-to-grafana" title="Permalink to this headline"></a></h2>
<p><img alt="GPU telemetry exported from XPU Manager to Grafana" src="_images/Grafana.PNG" /></p>
</section>
</section>


           </div>
          </div>
          <footer><div class="rst-footer-buttons" role="navigation" aria-label="Footer">
        <a href="xpum_index.html" class="btn btn-neutral float-left" title="XPUM Documentation" accesskey="p" rel="prev"><span class="fa fa-arrow-circle-left" aria-hidden="true"></span> Previous</a>
        <a href="installation.html" class="btn btn-neutral float-right" title="Installation" accesskey="n" rel="next">Next <span class="fa fa-arrow-circle-right" aria-hidden="true"></span></a>
    </div>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2023, Intel.</p>
  </div>

  Built with <a href="https://www.sphinx-doc.org/">Sphinx</a> using a
    <a href="https://github.com/readthedocs/sphinx_rtd_theme">theme</a>
    provided by <a href="https://readthedocs.org">Read the Docs</a>.
   

</footer>
        </div>
      </div>
    </section>
  </div>
  <script>
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script> 

</body>
</html>