Proxmox eGPU setup for AI VM usage

Yes, just like the title says. We are taking our setup from the last post and moving on from there.

Proxmox Host setup

We have Proxmox on our NUC and the eGPU plugged in and ready to go. First we need to prepare the Proxmox host.

nano /etc/default/grub
# Adjust accordingly:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off"
update-grub
reboot

We are activating IOMMU which we need for the PCI passthrough. Next we need the VFIO modules loaded:

echo -e "vfio\nvfio_pci\nvfio_virqfd" >> /etc/modules
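After the next reboot it can be worth double checking that the kernel flag and the modules actually arrived. The following is only a minimal sketch: `has_kernel_flag` and `missing_vfio_modules` are made-up helper names, and `vfio_virqfd` is deliberately not checked because on newer kernels it may no longer show up as a separate module.

```shell
#!/bin/sh
# Sketch: check a kernel command line string for a flag set in GRUB.
# has_kernel_flag is a hypothetical helper name, not a Proxmox tool.
has_kernel_flag() {
    # $1 = full kernel command line, $2 = flag to look for (exact word match)
    for word in $1; do
        [ "$word" = "$2" ] && return 0
    done
    return 1
}

# Print which of the expected VFIO modules are missing from an lsmod-style listing.
# $1 = text in lsmod format (module name in the first column)
missing_vfio_modules() {
    for mod in vfio vfio_pci; do
        printf '%s\n' "$1" \
            | awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' \
            || echo "$mod"
    done
}

# Usage on the Proxmox host after the reboot:
#   has_kernel_flag "$(cat /proc/cmdline)" intel_iommu=on && echo "IOMMU flag set"
#   missing_vfio_modules "$(lsmod)"
```

If the second helper prints nothing, both modules are loaded.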

I am assuming that you will not be using the eGPU to drive a monitor or some such on your Proxmox NUC, but that you plan to use it exclusively for the VMs. In case you will be using it on the host (I really do not know why), you need to make sure it binds to the host.

#OPTIONAL - NOT REALLY NEEDED:
echo -e "blacklist nouveau\noptions nouveau modeset=0" | tee /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot

Same goes for the NVIDIA drivers: we could install, load and test them, but there is no need on the Proxmox host as we will not be using the GPU directly there.

Guest VM setup

So let’s get a VM built which we can then use as a template for all of our future tinkering. Get yourself an Ubuntu 22.04 Server ISO to install. Yes, version 24 has been out for a while, however I ran into more problems with it in my later LLM tinkering than I imagined, so I fell back to 22 for now. I need to get a working win here before I throw it all in the garbage, if you know what I mean.

So make a KVM VM in your Proxmox GUI. Add the Ubuntu CD-ROM image. Give it 60 GB of space (won’t need it all for the template). Give it some cores and RAM.

Important: you do not want the default machine type, you want to select "q35".
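If you prefer the console over the GUI, the same VM could roughly be described with `qm create`. This is a sketch that only prints the command instead of running it. The VM ID, the VM name, the storage names (`local-lvm`, `local`), the bridge, the core and memory sizes and the ISO file name are all assumptions for illustration, so adjust them before use.

```shell
#!/bin/sh
# Sketch: print (not run) a qm command roughly matching the GUI steps.
# Every value here is a placeholder: VM ID, name, storages, bridge, sizes, ISO.
VMID=9000
ISO="local:iso/ubuntu-22.04-live-server-amd64.iso"

build_qm_create() {
    # $1 = VM ID, $2 = ISO volume on a Proxmox storage
    echo "qm create $1 --name ubuntu2204-template --machine q35 --ostype l26" \
         "--cores 4 --memory 8192 --scsi0 local-lvm:60" \
         "--net0 virtio,bridge=vmbr0 --ide2 $2,media=cdrom"
}

# Dry run: review the printed line, then paste it into the host shell.
build_qm_create "$VMID" "$ISO"
```

Printing first keeps the sketch harmless, and the `--machine q35` part is the bit that matters for the passthrough later.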

When installing Ubuntu, you can take the Default or the Minimal install. I do not remember really missing anything when using just the Minimal. Do not forget to install the SSH server of course, as you will need to connect to it 🙂

When done, as usual, remove the CD and reboot the VM. Log in and see that all is fine. There was one thing I installed for better console visuals when I used the minimal server:

sudo apt install -y whiptail

Then make sure you are all up to date:

sudo apt update && sudo apt upgrade -y

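Now we need to add the NVIDIA repository (curl is needed for the key download further down and may be missing on a minimal install):

sudo apt install -y wget gnupg curl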

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin

sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600

sudo mkdir -p /etc/apt/keyrings

curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub | gpg --dearmor | sudo tee /etc/apt/keyrings/cuda-archive-keyring.gpg > /dev/null

echo "deb [signed-by=/etc/apt/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda.list

sudo apt update

Get CUDA installed:

sudo apt install -y cuda-toolkit-12-2 libcudnn8 libcudnn8-dev

Make nvcc available on the system (we will see if it works after the reboot below):

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc

One last step. If you are using a different graphics card than I am here, you will most likely need to do your own thing, as I doubt you need the same driver.

sudo apt install -y nvidia-driver-535
sudo reboot
# Check after reboot
nvcc --version
nvidia-smi
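Once the GPU is actually visible inside the VM, the driver version can also be read in a script friendly way. A small sketch, assuming the `--query-gpu` CSV output of `nvidia-smi`; `driver_major` is a made-up helper name and 535 is simply the driver version used in this post.

```shell
#!/bin/sh
# Sketch: read the driver's major version from nvidia-smi CSV output.
# driver_major is a hypothetical helper name.
driver_major() {
    # $1 = one line in the form "name, driver_version, memory.total"
    printf '%s\n' "$1" | awk -F', ' '{ split($2, v, "."); print v[1] }'
}

# Usage inside the VM, once the GPU is passed through:
#   line=$(nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader)
#   [ "$(driver_major "$line")" = "535" ] && echo "driver 535 is loaded"
```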

So if you got a different card, what are you going to do? Well let’s see what you got first:

lspci -nnk | grep -i nvidia

Here you will see some sort of ID like "10de:2783" or similar. Write that down somewhere as you will need it again later.

Now with that ID you can go searching directly at NVIDIA or do it like me and check The PCI ID Repository.
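Instead of writing the IDs down by hand, they can be pulled out of the `lspci -nn` output. This is only a sketch with made-up helper names (`extract_pci_ids`, `join_ids`). Feed it `lspci -nn` rather than `lspci -nnk`, since the `-k` output adds subsystem lines with IDs of their own.

```shell
#!/bin/sh
# Sketch: pull the [vendor:device] pairs out of `lspci -nn` output.
# extract_pci_ids and join_ids are hypothetical helper names.
extract_pci_ids() {
    # Reads lspci -nn lines on stdin, prints one vendor:device pair per line.
    # The class code, e.g. [0300], has no colon and is therefore skipped.
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | sed 's/[][]//g'
}

join_ids() {
    # Turns one ID per line into a comma-separated list.
    paste -sd, -
}

# Usage:
#   lspci -nn | grep -i nvidia | extract_pci_ids | join_ids
```

The comma-separated result is the form that the `ids=` option of vfio-pci expects.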

Time to do the passthrough of the eGPU from Proxmox to the VM.

First check on the Proxmox Host:

lspci -nn | grep -i nvidia
for d in /sys/kernel/iommu_groups/*/devices/*; do 
  echo -n "$d → "; 
  lspci -nnks "${d##*/}"; 
  echo ""; 
done | grep -EA3 "NVIDIA|GeForce"
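If the loop output is a lot to scan, the group of a single device can also be read from its `iommu_group` link in sysfs. A minimal sketch with hypothetical helper names; the addresses `0000:01:00.0` and `0000:01:00.1` are only examples, and the optional last argument exists so the helpers can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: look up the IOMMU group of a PCI device through sysfs.
# iommu_group_of and same_iommu_group are hypothetical helper names.
iommu_group_of() {
    # $1 = PCI address with domain, e.g. 0000:01:00.0
    # $2 = sysfs root (optional, defaults to /sys)
    link="${2:-/sys}/bus/pci/devices/$1/iommu_group"
    [ -L "$link" ] || return 1
    # The link target ends in the group number.
    basename "$(readlink "$link")"
}

same_iommu_group() {
    # $1, $2 = PCI addresses, $3 = sysfs root (optional)
    a=$(iommu_group_of "$1" "$3") || return 1
    b=$(iommu_group_of "$2" "$3") || return 1
    [ "$a" = "$b" ]
}

# Usage on the Proxmox host (example addresses):
#   same_iommu_group 0000:01:00.0 0000:01:00.1 && echo "same group"
```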

Both VGA and Audio should be listed in the same IOMMU group. Now you can go to the GUI, go to Hardware and add the PCI devices. I noticed that the GUI does not add the full address of the device to the conf.

So check on the console (your numbers will probably differ):

nano /etc/pve/qemu-server/<VMID>.conf
# Example:
hostpci0: 01:00.0,pcie=1
hostpci1: 01:00.1,pcie=1

If you have some vga settings in there, this might block your passthrough.
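To look at just the relevant lines of the VM config without opening an editor, something like the following sketch might help. `check_vm_conf` is a made-up helper name, and the notes it prints are hints only, not a verdict on whether passthrough will work.

```shell
#!/bin/sh
# Sketch: show the passthrough-related lines of a Proxmox VM config.
# check_vm_conf is a hypothetical helper name.
check_vm_conf() {
    # $1 = path to the VM config, e.g. /etc/pve/qemu-server/<VMID>.conf
    grep -E '^hostpci[0-9]+:' "$1" || echo "note: no hostpci entries found"
    if grep -qE '^vga:' "$1"; then
        echo "note: vga setting present, may conflict with passthrough"
    fi
    if ! grep -qE '^machine:.*q35' "$1"; then
        echo "note: machine type does not look like q35"
    fi
}

# Usage on the Proxmox host (replace <VMID> with your VM's ID):
#   check_vm_conf /etc/pve/qemu-server/<VMID>.conf
```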

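Now we need to get VFIO ready on the Proxmox host. The first line is the same module list as before, so skip it if you already added it to /etc/modules. Put your own IDs from lspci into the vfio.conf line. The commands run as root, like the earlier ones on the host:

echo -e "vfio\nvfio_pci\nvfio_virqfd" | tee -a /etc/modules

echo -e "blacklist nouveau\noptions nouveau modeset=0" | tee /etc/modprobe.d/blacklist-nouveau.conf
echo -e "options vfio-pci ids=10de:2783,10de:22bc" | tee /etc/modprobe.d/vfio.conf
update-initramfs -u
reboot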

Now we need to check if we got it all working:

lspci -nnk | grep -i nvidia -A3
dmesg | grep -i vfio

It would be good to see: "Kernel driver in use: vfio-pci"

If you are not getting this, then recheck: is IOMMU active? Is the VM a q35 machine?
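To reduce the `lspci -nnk` output to just the device address and its bound driver, a small awk filter can be used. This is a sketch with a hypothetical helper name, `drivers_in_use`.

```shell
#!/bin/sh
# Sketch: list "<pci address> <driver>" for every device with a bound driver.
# drivers_in_use is a hypothetical helper name.
drivers_in_use() {
    # Reads `lspci -nnk` output on stdin.
    awk '
        /^[0-9a-f]/             { addr = $1 }        # device header line starts with the address
        /Kernel driver in use:/ { print addr, $NF }  # indented detail line, driver is the last field
    '
}

# Usage on the Proxmox host (10de is the NVIDIA vendor ID):
#   lspci -nnk -d 10de: | drivers_in_use
```

For a working passthrough setup, both functions of the card should show vfio-pci here.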

If all went well you are done and can make a Template out of the VM and get started tinkering.
