Name		Name	Last commit message	Last commit date
parent directory ..
docker		docker
kubernetes		kubernetes
README.md		README.md

README.md

Intel XPU Manager

Intel XPU Manager is an in-band node-level tool that provides local/remote GPU management. It is easily integrated into the cluster management solutions and cluster scheduler. GPU users may use it to manage Intel GPUs, locally. It supports local command line interface, local library call and remote RESTFul API interface.

So far, this container image is targeted as a Prometheus exporter.

The Intel XPU Manager source repository can be found at intel/xpumanager.

Run XPU Manager in Docker

Enable TLS

Generate certificate for TLS and configure REST user credential:

xpum_src_root=... # e.g., $HOME/xpumanager
mkdir -p rest/conf
sudo ${xpum_src_root}/install/tools/rest/keytool.sh --owner=root --group=root
sudo ${xpum_src_root}/install/tools/rest/rest_config.py --owner=root --group=root

Run the XPUM container:

xpum_image=...
docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
--publish 29999:29999 \
--device /dev/dri:/dev/dri \
-v $(pwd)/rest/conf:/usr/lib/xpum/rest/conf:ro \
${xpum_image}

Disable TLS

Configure REST user credential:

xpum_src_root=... # e.g., $HOME/xpumanager
mkdir -p rest/conf
sudo ${xpum_src_root}/install/tools/rest/rest_config.py --owner=root --group=root

Run the XPUM container by passing environment variable XPUM_REST_NO_TLS=1:

xpum_image=...
docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
--publish 29999:29999 \
--device /dev/dri:/dev/dri \
-v $(pwd)/rest/conf:/usr/lib/xpum/rest/conf:ro \
-e XPUM_REST_NO_TLS=1 \
${xpum_image}

Support Fabric Throughput

To enable XPUM for reporting fabric throughput metrics, the container must be run in the 'host' network mode (--network host). To avoid port conflict with host network port assignment, you can pass environment variable XPUM_REST_PORT to the container to specify the XPUM REST server port. For example:

docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
--network host \
--device /dev/dri:/dev/dri \
-v $(pwd)/rest/conf:/usr/lib/xpum/rest/conf:ro \
-e XPUM_REST_NO_TLS=1 \
-e XPUM_REST_PORT=12345 \
${xpum_image}

Support PCIe Throughput

PCIe throughput metrics collection depends on the kernel module 'msr'. It should be loaded on the host by "modprobe msr".

And this metrics collection is not started in XPUM by default. To make XPUM start to collect it, user needs to pass environment variable XPUM_METRICS which includes the PCIe throughput metrics index.

This example shows how to get the list of metrics index from XPUM daemon help text:

docker run --rm --entrypoint /usr/bin/xpumd ${xpum_image} -h

This example shows how to make XPUM in container start to collect PCIe throughput metrics by passing the environment variable XPUM_METRICS:

docker run --rm --cap-drop ALL --cap-add=SYS_ADMIN \
--cap-add=SYS_RAWIO \
--publish 29999:29999 \
--device /dev/dri:/dev/dri \
--device /dev/cpu:/dev/cpu \
-v /sys/firmware/acpi/tables/MCFG:/pcm/sys/firmware/acpi/tables/MCFG:ro \
-v /proc/bus/pci/:/pcm/proc/bus/pci/ \
-v /proc/sys/kernel/nmi_watchdog:/pcm/proc/sys/kernel/nmi_watchdog \
-v $(pwd)/rest/conf:/usr/lib/xpum/rest/conf:ro \
-e XPUM_REST_NO_TLS=1 \
-e XPUM_METRICS=0-38 \
${xpum_image}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

deployment

deployment

README.md

Intel XPU Manager

Run XPU Manager in Docker

Enable TLS

Disable TLS

Support Fabric Throughput

Support PCIe Throughput

Files

deployment

Directory actions

More options

Directory actions

More options

Latest commit

History

deployment

Folders and files

parent directory

README.md

Intel XPU Manager

Run XPU Manager in Docker

Enable TLS

Disable TLS

Support Fabric Throughput

Support PCIe Throughput