So I’ve been doing some Ollama LLM testing on my old RTX 3060 Ti in a Windows machine and found a model that works acceptably on that GPU for Home Assistant, and now I want to move it to my LLM VM.
I added an Oculink card and GPU dock to the VM server.
The host shows the card properly in lspci.
I set the card and audio device for passthrough in the UI and added them to the VM.
I disabled dom0 access to the devices per the XCP-NG documentation.
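For anyone following along, the dom0-hiding step from the XCP-ng 8.3 docs looks roughly like this (the UUID is a placeholder, not output from a real host; get the real ones from `xe pci-list`):

```shell
# On the XCP-ng host: find the PCI UUIDs of the GPU and its audio function,
# then hide both from dom0 so they can be handed to the guest.
# <pci-uuid> below is a placeholder; this is a sketch, not a transcript.
xe pci-list                                   # note the uuid of each function
xe pci-disable-dom0-access uuid=<pci-uuid>    # repeat for video and audio
```

Then reboot the host for the change to take effect.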
I added both devices to the passthrough list on the VM in the UI.
When the VM is running they show in the UI properly.
However, the VM does not show the devices in lspci (and nvidia drivers don’t see it).
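For comparison, a healthy passthrough shows both functions in the guest’s `lspci`. Here’s a simulated check against hypothetical output (the bus addresses and `[10de:…]` IDs are made-up examples; on the real guest just run `lspci -nnk`):

```shell
# Simulated guest `lspci -nn` output for a working passthrough (hypothetical
# addresses/IDs). On a real guest, run: lspci -nnk | grep -iA3 nvidia
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [10de:2489]
01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b]'
printf '%s\n' "$sample" | grep -ci nvidia   # both functions should appear -> 2
```

If the count is 0 in the real guest, the problem is upstream of the drivers entirely and no amount of driver reinstalling will help.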
I followed the Debian 13 instructions, then the Nvidia direct instructions for Debian 13, and even reloaded the VM with Ubuntu 24.04 and used the Nvidia direct 580 drivers.
I’ve turned the VGA setting for the VM on and off in case that was conflicting.
At one point last night it showed the card in lspci..when there were no Nvidia drivers installed..but I haven’t seen it again in any variation I’ve tried.
Any suggestion on what I might be missing/doing wrong?
XCP-NG 8.3.0 (fully patched) and XOA built from source, which I updated last night while messing with stuff in case that would help..not sure of the exact version, but apparently it’s a combination of v6 and v5 now…(the UI loads to the new interface and most management stuff links back to the old one).
Is the RTX 3000 series allowed to pass through? I know pretty much any Quadro card works, but there was a gotcha with some of the RTX cards until a few years ago. Yes, it’s an Nvidia thing where they blocked it to force you to buy a Quadro or better card. I know they later allowed certain RTX and GeForce cards to work, but I don’t remember how new the card needed to be.
My understanding is the old restriction was removed, and Nvidia lists Ampere series on their list of passthrough supported architectures. So it SHOULD be unless it’s only part of that series and they left some restrictions in?
Making progress I think.
After some messing around it seems to be somewhat reliably seeing the GPU (but sometimes I need to reboot the host..I’m guessing that when things freeze up and I force a shutdown/reboot of the VM, it doesn’t properly release/reconnect the GPU).
Nvidia Direct Drivers 590 and 595 DO NOT support RTX 3060, so I’m reloading the VM and will try the Debian packaged ones (550).
Well I’m stuck again lol.
Fresh Debian install, fresh boot of the host, and both nvidia-kernel-dkms and nvidia-open-kernel-dkms cause the VM to freeze on boot with “probe with driver nvidia failed with error -1”
nvidia-open-dkms also gives “Unrecognized AMD processor in cpuidInfoAMD”
So at this point it seems like a very finicky issue with drivers that supposedly support the 3060…which makes me wonder how much of this is a virtualization issue and how much is just an old-card issue..
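One thing worth ruling out on the error -1 path (a hedged guess, since both module flavors fail): make sure nouveau isn’t claiming the card before the nvidia module probes it. The DKMS packages normally install this blacklist themselves, but a leftover or missing config can defeat it. The cpuidInfoAMD message, for what it’s worth, looks like the open module reading the guest’s masked CPUID, which is a separate sign that module and the virtualized CPU don’t get along.

```shell
# Hedged sketch: blacklist nouveau so it can't grab the GPU first,
# then rebuild the initramfs so the blacklist applies at boot.
cat <<'EOF' | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
EOF
sudo update-initramfs -u
```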
In your original post you mention an Oculink dock - I used one of those to pass through a 3060 to an Ubuntu VM on XCP-ng 8.3, but I didn’t pass the GPU device through; I passed the Oculink device through (the GPU is a child device of the Oculink device). Try passing through only the Oculink PCIe device and see if that works?
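To find the upstream device the GPU hangs off on the host, `lspci -t` shows the tree, or you can walk one level up in sysfs. Here’s the idea with a hypothetical path (real hosts should start from `/sys/bus/pci/devices/<gpu-address>` and resolve the symlink with `readlink -f` first):

```shell
# The GPU's parent directory in sysfs is its upstream PCIe bridge/port,
# which on an OcuLink add-in card should be the card's own port.
# Hypothetical path below, for illustration only.
path=/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0
parent=$(basename "$(dirname "$path")")
echo "$parent"   # -> 0000:00:01.0, the candidate device to pass through
```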
Interesting..how does your device show up? I don’t see anything in lspci or the UI device list that looks like the adapter card or dock, just the GPU video/audio devices.
Sadly I’ve reinstalled that system and am no longer using the OcuLink setup. I did have an OcuLink add-on PCIe card installed in a Lenovo M920q, with the dock attached to that. Now I’m second-guessing myself as to whether I passed through an ‘OcuLink PCIe’ device. Hope I haven’t set you off on a false trail. But I am certain I didn’t pass through the GPU device… Can’t easily replicate the test as my kit is in pieces and in crates.
Haha all good. Maybe I’ll try Ubuntu again now that I have a better idea of likely usable driver versions and the Nvidia site has a number of versions available for it.