Dual GPU Setup Issues
I recently bought a second RTX 3090 for LLMs and other AI work. My motherboard only gives the second GPU a PCIe 3.0 x4 slot, while the primary slot gets a full Gen4 x16. That shouldn't be a problem. Shouldn't. The problem is that, whether it's Linux or the motherboard deciding, GPU order follows PCI bus address.
The Problem
My two GPUs sit at:
0000:04:00.0 - the x4 slot (slow)
0000:08:00.0 - the x16 slot (fast)
Lower bus address = GPU 0, higher = GPU 1. So my display, games, drivers and everything else default to the GPU with a quarter of the bandwidth. That alone should still work, but in practice I got really bad performance and tearing. Here is how the two cards enumerate:
Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 3@ 4x
Device 1 [NVIDIA GeForce RTX 3090] PCIe GEN 4@16x
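One way to check the current link generation and width per card, without any extra tools, is through nvidia-smi's query fields:

nvidia-smi --query-gpu=index,pci.bus_id,pcie.link.gen.current,pcie.link.width.current --format=csv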
My BIOS/UEFI offers nothing to change this, no software launch options seem to help, and GNOME/Wayland cares little for my attempts.
The Solution
I created a service that unbinds the slow GPU before the display manager starts. The nvidia driver is then only attached to the remaining GPU, which becomes GPU 0, so the display runs on the fast slot. The AI card can be bound and unbound later without crashing everything, since the driver, desktop, etc. are already running on the other card.
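If you want to try the idea by hand first, the same unbind can be done from a terminal (adjust the address to whichever card is your slow one):

echo 0000:04:00.0 | sudo tee /sys/bus/pci/drivers/nvidia/unbind

This may fail or hang if the desktop is actively rendering on that card, which is exactly why the service below runs before the display manager starts.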
Setup
Create the systemd service:
sudo nano /etc/systemd/system/unbind-gpu0.service
[Unit]
Description=Unbind GPU 0 from display
# Run before GDM/SDDM/etc. so the x16 card becomes the primary GPU
Before=display-manager.service
# The nvidia module must already be loaded for the unbind path to exist
After=systemd-modules-load.service

[Service]
Type=oneshot
# Detach the x4 card (0000:04:00.0) from the nvidia driver
ExecStart=/bin/sh -c 'echo 0000:04:00.0 > /sys/bus/pci/drivers/nvidia/unbind'
RemainAfterExit=yes

[Install]
WantedBy=graphical.target
Enable it:
sudo systemctl enable unbind-gpu0.service
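If systemctl complains that it can't find the unit, make systemd re-read its unit files first:

sudo systemctl daemon-reload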
Reboot. Your x16 GPU is now primary.
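To verify after the reboot, check whether the slow card is still claimed by the driver and what the driver now enumerates:

lspci -k -s 04:00.0
nvidia-smi -L

The x4 card should show no "Kernel driver in use: nvidia" line, and nvidia-smi should list only the x16 card, as GPU 0.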
Re-enabling the AI GPU
When you need both GPUs, run this script (put it in /bin/ for easy access):
#!/bin/bash
# Re-attach the x4 card to the nvidia driver so it shows up for compute work
echo "Binding GPU 0..."
sudo sh -c 'echo 0000:04:00.0 > /sys/bus/pci/drivers/nvidia/bind'
sleep 2       # give the driver a moment to initialize the device
nvidia-smi    # confirm both GPUs are visible again
echo "GPU 0 enabled"
When we're done we can unbind it again; we just have to make sure to kill everything running on the secondary GPU first:
#!/bin/bash
# Kill all processes using GPU 0
GPU0_PIDS=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader -i 0 2>/dev/null)
if [ -n "$GPU0_PIDS" ]; then
echo "Killing processes on GPU 0: $GPU0_PIDS"
echo "$GPU0_PIDS" | xargs -r kill -9
sleep 2
fi
# Verify no processes remain
REMAINING=$(nvidia-smi --query-compute-apps=pid --format=csv,noheader -i 0 2>/dev/null)
if [ -n "$REMAINING" ]; then
echo "ERROR: Processes still running on GPU 0"
exit 1
fi
# Unbind
echo "Unbinding GPU 0..."
sudo sh -c 'echo 0000:04:00.0 > /sys/bus/pci/drivers/nvidia/unbind'
nvidia-smi
echo "GPU 0 disabled"
Finding Your PCI Addresses
If your addresses differ:
lspci | grep -i nvidia
Or for more detail:
nvidia-smi --query-gpu=index,pci.bus_id,name --format=csv
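Note that lspci leaves off the PCI domain, so a device it lists as 04:00.0 becomes 0000:04:00.0 in the sysfs paths above. You can confirm the full form exists with:

ls -d /sys/bus/pci/devices/0000:04:00.0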