Yes, just like the title says. We are taking our setup from the last post and moving on.
Proxmox Host setup
We have Proxmox on our NUC and the eGPU plugged in and ready to go. First we need to prepare the Proxmox host.
nano /etc/default/grub
# Adjust accordingly:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off"
update-grub
reboot
We are activating IOMMU, which we need for the PCI passthrough. Next we need the VFIO modules loaded:
echo -e "vfio\nvfio_pci\nvfio_virqfd" >> /etc/modules
I am assuming that you will not be using the eGPU to drive a monitor or some such on your Proxmox NUC, but that you are planning to use it exclusively for the VMs. In that case you can blacklist the nouveau driver so the host never grabs the card. If you do want to use it on the host (really do not know why), then you need to make sure it binds to the host and should skip this step.
#OPTIONAL - NOT REALLY NEEDED:
echo -e "blacklist nouveau\noptions nouveau modeset=0" | tee /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
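After the reboot you can verify that nouveau stayed out of the way; no output means the blacklist worked:
# Should print nothing if nouveau is not loaded
lsmod | grep nouveau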
Same goes for the NVIDIA drivers: we could install, load, and test them, but there is no need on the Proxmox host, as we will not be using the card directly there.
Guest VM setup
So let’s get a VM built which we can then use as a template for all of our future tinkering. Get yourself an Ubuntu 22.04 Server ISO to install. Yes, version 24 has been out for a while, however I had more problems later in my LLM tinkering than I imagined, so I fell back to 22 for now. Need to get a working win here before I throw it all in the garbage, if you know what I mean.
So make a KVM VM in your Proxmox GUI. Add the Ubuntu CD-ROM image. Give it 60 GB of space (you won’t need it all for the template). Give it some cores and RAM.
Important: you do not want a default machine, you want to select "q35".
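If you would rather script this than click through the GUI, the same VM can be created with qm. The VMID, name, sizes, storage names, and ISO filename here are just my assumptions, adjust them to your setup:
# Hypothetical values: VMID 9000, storage local-lvm, ISO already uploaded to local
qm create 9000 --name ubuntu22-template --machine q35 --cores 4 --memory 8192 \
  --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-pci --scsi0 local-lvm:60 \
  --ide2 local:iso/ubuntu-22.04-live-server-amd64.iso,media=cdrom --boot order='scsi0;ide2'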
When installing Ubuntu, you can take the default or the minimal install; I do not remember really missing anything when just using the minimal. Do not forget to install the SSH server of course, as you will need to connect to it 🙂
When done, as usual, remove the CD and reboot the VM. Log in and see that all is fine. There was one thing I installed for better console visuals when I used the minimal server:
sudo apt install -y whiptail
Then make sure you are all up to date:
sudo apt update && sudo apt upgrade -y
Now we need to add the NVIDIA repository:
sudo apt install -y wget gnupg
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub | gpg --dearmor | sudo tee /etc/apt/keyrings/cuda-archive-keyring.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt update
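A quick sanity check that apt is actually picking the packages up from the new repo:
# The candidate version should come from developer.download.nvidia.com
apt-cache policy cuda-toolkit-12-2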
Get CUDA installed:
sudo apt install -y cuda-toolkit-12-2 libcudnn8 libcudnn8-dev
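Just to confirm both the toolkit and cuDNN landed:
# Should list cuda-toolkit-12-2 and the libcudnn8 packages
dpkg -l | grep -E 'cuda-toolkit|cudnn'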
Make nvcc available on the system and see if it works:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
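At this point nvcc should already resolve; a quick check that the PATH edit took effect:
# Should print /usr/local/cuda/bin/nvcc
command -v nvcc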
One last step, and if you are using a different graphics card than I am here, you most likely need to do your own thing, as I doubt you have the same driver.
sudo apt install -y nvidia-driver-535
reboot
# Check after reboot
nvcc --version
nvidia-smi
So if you have a different card, what are you going to do? Well, let’s see what you have first:
lspci -nnk | grep -i nvidia
Here you will see some sort of ID like "10de:2783" or similar. Write that down somewhere, as you will need it again later.
Now with that ID you can go searching directly at NVIDIA, or do it like me and check The PCI ID Repository.
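If you prefer the console over a website, lspci can also narrow it down for you. 10de is NVIDIA's vendor ID; note down the IDs of both the VGA and the audio function, as both go into the vfio config later:
# List every NVIDIA function (GPU + HDMI audio) with its vendor:device ID
lspci -nn -d 10de: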
Time to do the passthrough of the eGPU from Proxmox to the VM.
First check on the Proxmox Host:
lspci -nn | grep -i nvidia
for d in /sys/kernel/iommu_groups/*/devices/*; do
  echo -n "$d → "
  lspci -nnks "${d##*/}"
  echo ""
done | grep -EA3 "NVIDIA|GeForce"
Both the VGA and the audio function should be listed in the same IOMMU group. Now you can go to the GUI, go to Hardware, and add the PCI devices. I noticed that the GUI does not add the full address of the device into the conf.
So check on the console (your numbers will probably differ):
nano /etc/pve/qemu-server/<VMID>.conf
# Example:
hostpci0: 01:00.0,pcie=1
hostpci1: 01:00.1,pcie=1
If you have some vga settings in there, they might block your passthrough.
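A quick way to review just those lines without opening the editor, using the standard Proxmox qm tool (your VMID instead of <VMID> of course):
qm config <VMID> | grep -E 'hostpci|vga|machine'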
Now we need to get VFIO ready on the host. The modules and nouveau lines repeat the host steps from earlier; the new and important part is binding your two IDs (VGA and audio) to vfio-pci, so put in the ones you noted down:
echo -e "vfio\nvfio_pci\nvfio_virqfd" | sudo tee -a /etc/modules
echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo -e "options vfio-pci ids=10de:2783,10de:22bc" | sudo tee /etc/modprobe.d/vfio.conf
update-initramfs -u
reboot
Now we need to check if we got it all working:
lspci -nnk | grep -i nvidia -A3
dmesg | grep -i vfio
Would be good to see: "Kernel driver in use: vfio-pci"
If you are not getting this, then recheck: is IOMMU active? Is the VM a q35?
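Both are quick to verify from the host console; the qm line assumes your VMID again:
dmesg | grep -i -e DMAR -e IOMMU | head   # IOMMU enabled?
qm config <VMID> | grep machine           # should say q35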
If all went well, you are done and can make a template out of the VM and get started tinkering.
Do realize that checking with nvidia-smi will of course fail without the NVIDIA drivers installed.
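If you prefer the console for that last step too, a stopped VM can be converted and cloned with qm. <NEWID> and the clone name are just placeholders:
qm stop <VMID>
qm template <VMID>
qm clone <VMID> <NEWID> --name llm-sandbox --full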