VM Setup#
Prerequisites#
Before starting this guide, you must complete:
Getting Started with Virtualization - Understanding of MxGPU concepts
Host Configuration - Host system properly configured
Advanced (for custom configurations):
Guest VM Initial Setup#
The initial VM setup can be performed using the QEMU/libvirt command line utilities. Creating each guest OS VM is similar, and the steps are largely common across guest operating systems. This chapter uses an Ubuntu 22.04 setup as an example.
Install dependencies:
# sudo apt update
# sudo apt install cloud-utils
Download the Ubuntu base image:
# sudo wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
Set password for new VM:
# cat >user-data1.txt <<EOF
# > #cloud-config
# > password: user1234
# > chpasswd: { expire: False }
# > ssh_pwauth: True
# > EOF
# sudo cloud-localds user-data1.img user-data1.txt
Create a disk for new VM:
# sudo qemu-img create -b ubuntu-22.04-server-cloudimg-amd64.img -F qcow2 -f qcow2 ubuntu22.04-vm1-disk.qcow2 100G
Install new VM and login to check the IP:
# sudo virt-install --name ubuntu22.04-vm1 --virt-type kvm --memory 102400 --vcpus 20 --boot hd,menu=on --disk path=ubuntu22.04-vm1-disk.qcow2,device=disk --disk path=user-data1.img,format=raw --graphics none --os-variant ubuntu22.04
# Login: ubuntu
# Password: user1234
# ip addr
# sudo passwd root (set the root password to `user1234`)
# sudo usermod -aG sudo ubuntu
# sudo vi /etc/default/grub
# GRUB_CMDLINE_LINUX="modprobe.blacklist=amdgpu"
# sudo update-grub
# sync
# sudo shutdown now
# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234) - verify access
# exit
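The GRUB edit above can also be applied non-interactively, which helps when provisioning several VMs. A minimal sketch, assuming the stock Ubuntu /etc/default/grub layout with a single GRUB_CMDLINE_LINUX line (the function name set_amdgpu_blacklist is illustrative, not part of any toolchain):

```shell
# set_amdgpu_blacklist FILE
# Rewrite the GRUB_CMDLINE_LINUX line in FILE so the in-box amdgpu driver
# is blacklisted on boot; it will be loaded manually after the guest
# driver is installed.
set_amdgpu_blacklist() {
  sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="modprobe.blacklist=amdgpu"/' "$1"
}
# Usage inside the VM (as root): set_amdgpu_blacklist /etc/default/grub && update-grub
```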
GPU VF device nodes can be added to the VM XML configuration using the sudo virsh edit <VM_NAME> command and modifying the devices section:
# sudo virsh list --all
# sudo virsh shutdown ubuntu22.04-vm1
# sudo virsh edit ubuntu22.04-vm1 (add hostdev entry under devices section)
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x<DEVICE_BUS_ID>' slot='0x<DEVICE_SLOT>' function='0x0'/>
</source>
</hostdev>
Repeat this step for every virtual GPU being added to the VM (one hostdev node per virtual device). The DEVICE_BUS_ID and DEVICE_SLOT for each targeted device can be obtained from the output of the lspci -d 1002:74b5 command, which prints each device's VF BDF address in the format DEVICE_BUS_ID:DEVICE_SLOT.function.
As an example, this is how all eight GPU VF device nodes can be added to the VM configuration. Given the following command output:
# lspci -d 1002:74b5
03:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
26:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
43:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
63:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
83:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
a3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
c3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
e3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)
The VF BDF address is shown at the beginning of every line in the mentioned format: DEVICE_BUS_ID:DEVICE_SLOT.function.
Based on that data, the GPU VF device nodes should be added to the VM XML configuration under the devices section as follows:
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x03' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x26' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x43' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x63' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0x83' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0xa3' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0xc3' slot='0x02' function='0x0'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<source>
<address domain='0x0000' bus='0xe3' slot='0x02' function='0x0'/>
</source>
</hostdev>
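Writing all eight hostdev entries by hand is error-prone, and the mapping from lspci output to XML can be scripted. A minimal sketch (the function name gen_hostdev_xml is illustrative, not part of any toolchain) that turns each BUS:SLOT.FUNCTION line into a hostdev element matching the template above:

```shell
# gen_hostdev_xml: read lspci lines ("BUS:SLOT.FUNC ...") on stdin and
# emit one <hostdev> element per VF.
gen_hostdev_xml() {
  awk -v q="'" '{
    split($1, b, /[:.]/)   # b[1]=bus, b[2]=slot, b[3]=function
    printf "<hostdev mode=%ssubsystem%s type=%spci%s managed=%syes%s>\n", q, q, q, q, q, q
    printf "  <source>\n"
    printf "    <address domain=%s0x0000%s bus=%s0x%s%s slot=%s0x%s%s function=%s0x%s%s/>\n", q, q, q, b[1], q, q, b[2], q, q, b[3], q
    printf "  </source>\n"
    printf "</hostdev>\n"
  }'
}
# Usage: lspci -d 1002:74b5 | gen_hostdev_xml
# Paste the output into the devices section via sudo virsh edit <VM_NAME>.
```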
Check that the added GPUs are visible in the guest:
# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234)
# lspci
To set up a RHEL VM, refer to the Red Hat documentation on preparing and deploying KVM guest images with Image Builder. The process is similar to the Ubuntu setup, as both utilize QEMU/libvirt for creating and configuring the VM. However, there are some differences: RHEL images are obtained from the Red Hat Customer Portal and their setup may involve using Image Builder for customization, while Ubuntu images are downloaded from the Ubuntu cloud images repository.
It’s important to note that assigning GPU VF devices to the VM is not operating-system-specific. The described method for adding VF devices is consistent across both RHEL and Ubuntu environments.
Guest Driver Setup#
Connect to the VM to install ROCm AMDGPU VF Driver:
# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234)
The ROCm™ software stack and other Radeon™ software for Linux components are installed using the amdgpu-install script, which assists in installing a coherent set of stack components. For installation steps and post-install verification, refer to the Radeon software for Linux with ROCm installation guide.
Note: Loading the AMDGPU VF driver should be done with the command:
# sudo modprobe amdgpu
Post-install verification check#
To confirm that the entire setup is functioning correctly and that the VM can efficiently execute tasks on the GPU, check the output of the rocminfo and clinfo tools in the VM.
# sudo rocminfo
The output should be similar to the following:
[...]
*******
Agent 2
*******
Name: gfx942
Uuid: GPU-664b52e347835f94
Marketing Name: AMD Instinct MI300X
Vendor Name: AMD
Feature: KERNEL_DISPATCH
[...]
Also try the following:
# sudo clinfo
The output should be similar to the following:
[...]
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP (3649.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Host timer resolution 1ns
[...]
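These checks can also be run non-interactively, e.g. from a provisioning script. A minimal sketch of a parser for the rocminfo output shown above (the function name check_gpu_agent is illustrative; it assumes the Name / Marketing Name field layout rocminfo prints):

```shell
# check_gpu_agent: read rocminfo output on stdin, print the Marketing Name
# of every agent whose Name field is a gfx* GPU, and fail if none is found.
check_gpu_agent() {
  awk '
    /^[[:space:]]*Name:[[:space:]]*gfx/ { isgpu = 1 }
    /^[[:space:]]*Marketing Name:/ && isgpu {
      sub(/^[[:space:]]*Marketing Name:[[:space:]]*/, "")
      print
      seen++; isgpu = 0
    }
    END { exit (seen ? 0 : 1) }
  '
}
# Usage: rocminfo | check_gpu_agent && echo "GPU VF visible in guest"
```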
This marks the final step in setting up the AMD GPUs with MxGPU in KVM/QEMU environments. By following the outlined steps, users can effectively allocate GPU resources across virtual machines, optimizing performance and resource utilization for demanding workloads.
With your environment now configured, consider deploying high-performance computing applications, artificial intelligence models, or machine learning tasks that can fully leverage the compute capabilities of the AMD GPUs. These applications can benefit significantly from the enhanced resource allocation that MxGPU provides.
Next Steps#
Congratulations! Your MxGPU setup is complete.
Optional Maintenance#
Removing MxGPU - Clean removal procedures when needed