Add kernel config for NVIDIA DPU/ConnectX adapter #9620

Open: l8huang wants to merge 1 commit into main

l8huang commented May 10, 2024

With an NVIDIA DPU or ConnectX network adapter, a VF can be passed through to the guest VM via VFIO in guest-kernel mode. The guest kernel needs the adapter's driver in order to claim the VFIO device and create a network interface.

katacontainersbot added the size/small (Small and simple task) label May 10, 2024
l8huang (Author) commented May 10, 2024

FYI: in the guest VM:

root@localhost:/proc# lspci  -tv
-[0000:00]-+-00.0  Device 8086:29c0
           +-01.0  Device 1af4:1003
           +-02.0-[01]--
           +-03.0  Device 1af4:1004
           +-04.0  Device 1af4:1005
           +-05.0-[02]----00.0  Device 15b3:101e
           +-06.0-[03]--
           +-07.0  Device 1af4:1053
           +-08.0  Device 1af4:1009
           +-1f.0  Device 8086:2918
           +-1f.2  Device 8086:2922
           \-1f.3  Device 8086:2930

root@localhost:/proc# lspci -nn -k -s 0000:02:00.0
02:00.0 Class [0200]: Device [15b3:101e] (rev 01)
	Subsystem: Device [15b3:0063]
	Kernel driver in use: mlx5_core

root@localhost:/proc# ethtool -i eth0 
driver: mlx5_core
version: 6.1.62-nvidia-gpu
firmware-version: 24.35.3502 (MT_0000000542)
expansion-rom-version: 
bus-info: 0000:02:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
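
For anyone who wants to script the same check instead of reading the lspci output, the driver binding can also be read from sysfs. The following is only a minimal sketch and is not part of this PR: `pci_driver` is a hypothetical helper name, and the `SYSFS_ROOT` variable is an assumption added so the function can be pointed at a fake tree when testing (it falls back to /sys).

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the name of the kernel
# driver bound to a PCI device by resolving its sysfs "driver" symlink.
# SYSFS_ROOT is an assumed override for testing; it defaults to /sys.
pci_driver() {
    dev="$1"
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/${dev}/driver"
    # No "driver" symlink means no driver has claimed the device.
    [ -L "$link" ] || return 1
    # The symlink target ends in the driver name, e.g. .../drivers/mlx5_core
    basename "$(readlink "$link")"
}
```

In the guest shown above, `pci_driver 0000:02:00.0` would be expected to print the same driver name that lspci reports for that device.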

zvonkok changed the title from "Add kernel config for Nvidia DPU/ConnectX adapter" to "Add kernel config for NVIDIA DPU/ConnectX adapter" May 13, 2024
l8huang (Author) commented May 17, 2024

@zvonkok PR updated, PTAL, thanks

zvonkok added the ok-to-test, area/dpu (All things related to Data Processing Units), and area/arm (Issues specific to the ARM architecture) labels May 22, 2024
zvonkok (Contributor) commented May 22, 2024

@l8huang You need to bump tools/packaging/kernel/kata_config_version
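
As a rough illustration of what the bump involves, here is a minimal sketch. It assumes the file holds a single decimal integer; `bump_config_version` is a hypothetical helper name and is not a script from the repository.

```shell
#!/bin/sh
# Hypothetical helper: increment the integer stored in a version file.
# Assumption: the file contains exactly one non-negative decimal integer.
bump_config_version() {
    file="$1"
    current="$(cat "$file")"
    # Refuse to touch the file if its content is not a plain integer.
    case "$current" in
        ''|*[!0-9]*)
            echo "unexpected content in $file: $current" >&2
            return 1
            ;;
    esac
    echo "$((current + 1))" > "$file"
}
```

It would be run from the repository root as `bump_config_version tools/packaging/kernel/kata_config_version`, and the changed file committed together with the config change.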

l8huang force-pushed the kernel branch 2 times, most recently from 95edfb3 to 71c5f45, May 22, 2024 21:31
l8huang (Author) commented May 22, 2024

> @l8huang You need to bump tools/packaging/kernel/kata_config_version

Thanks for the heads up, version updated.

zvonkok (Contributor) left a comment

LGTM, thanks!

l8huang (Author) commented Jun 4, 2024

@lifupan @fidencio @GabyCT This needs another LGTM. Could you please take a look? Thanks.

amshinde (Member) left a comment

@l8huang I see that this PR adds kernel configs for the NVIDIA Mellanox ConnectX network adapter. The PR looks good overall, but the use of NVIDIA here and in the name of the tools/packaging/kernel/configs/fragments/dpu/nvidia.conf file you added is slightly confusing, especially since we already have fragments for NVIDIA GPUs.

Since the configs added in this fragment have Mellanox/MLX in their names, can we rename this to Mellanox to reduce confusion?

I suggest replacing
-D : DPU/SmartNIC vendor, only NVIDIA. => -D : DPU/SmartNIC vendor, only Mellanox.

cc @zvonkok @fidencio

l8huang (Author) commented Jun 6, 2024

> -D : DPU/SmartNIC vendor, only NVIDIA. => -D : DPU/SmartNIC vendor, only Mellanox.

@amshinde Mellanox was acquired by NVIDIA, the products are named under NVIDIA now.

amshinde (Member) commented Jun 6, 2024

> > -D : DPU/SmartNIC vendor, only NVIDIA. => -D : DPU/SmartNIC vendor, only Mellanox.
>
> @amshinde Mellanox was acquired by NVIDIA, the products are named under NVIDIA now.

@l8huang I understand that. I suggested renaming it to Mellanox to avoid confusion with NVIDIA GPUs, and because the kernel configs refer to Mellanox rather than NVIDIA.

l8huang (Author) commented Jun 7, 2024

According to https://en.wikipedia.org/wiki/Mellanox_Technologies:

> The company was integrated into Nvidia's networking division in 2020 and Nvidia stopped using the brand name "Mellanox" for its new networking products.

If one googles Mellanox Technologies, the top results point to NVIDIA.

TBH, I don't see much confusion: the option says it is the DPU/SmartNIC vendor. We should move beyond historical legacies; looking forward, NVIDIA is the de facto vendor.

@zvonkok what do you think?

l8huang (Author) commented Jun 13, 2024

@amshinde Would you mind merging this PR as it is? If any confusion arises later, I will address and amend it accordingly.

amshinde (Member) commented

@l8huang There is a merge conflict now. Can you rebase this PR?

With an NVIDIA DPU or ConnectX network adapter, a VF can be passed
through to the guest VM via VFIO in `guest-kernel` mode. The guest kernel
needs the adapter's driver to claim the VFIO device and create a network
interface.

Signed-off-by: Lei Huang <leih@nvidia.com>

l8huang (Author) commented Jun 25, 2024

@amshinde Thanks for the heads-up, just rebased.

Labels: area/arm, area/dpu, ok-to-test, size/small
5 participants