VM Setup

Prerequisites

Before starting this guide, you must complete:

  1. Getting Started with Virtualization - Understanding of MxGPU concepts

  2. Host Configuration - Host system properly configured

Advanced (for custom configurations):

  1. GPU Partitioning

  2. XGMI Configuration


Guest VM Initial Setup

The initial VM setup can be performed with the QEMU/libvirt command-line utilities. The procedure is largely the same for each guest OS. This chapter uses Ubuntu 22.04 as the example.

  1. Install dependencies:

# sudo apt update
# sudo apt install cloud-utils
  2. Download the Ubuntu base image:

# sudo wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
  3. Set a password for the new VM:

# cat >user-data1.txt <<EOF
# > #cloud-config
# > password: user1234
# > chpasswd: { expire: False }
# > ssh_pwauth: True
# > EOF
# sudo cloud-localds user-data1.img user-data1.txt
  4. Create a disk for the new VM:

# sudo qemu-img create -b ubuntu-22.04-server-cloudimg-amd64.img -F qcow2 -f qcow2 ubuntu22.04-vm1-disk.qcow2 100G
  5. Install the new VM and log in to check its IP address:

# sudo virt-install --name ubuntu22.04-vm1 --virt-type kvm --memory 102400 --vcpus 20 --boot hd,menu=on --disk path=ubuntu22.04-vm1-disk.qcow2,device=disk --disk path=user-data1.img,format=raw --graphics none --os-variant ubuntu22.04

# Login: ubuntu
# Password: user1234
# ip addr
# sudo passwd root (set root password as `user1234`)
# sudo usermod -aG sudo ubuntu
# sudo vi /etc/default/grub
#       GRUB_CMDLINE_LINUX="modprobe.blacklist=amdgpu"
# sudo update-grub
# sync
# sudo shutdown now
# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234) - verify access
# exit
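
After the reboot in the step above, it can be useful to confirm that the blacklist entry actually reached the guest kernel command line. The following is a minimal sketch, assuming it runs inside the guest and that amdgpu is the only module named in modprobe.blacklist (a comma-separated list would need extra handling). The name has_amdgpu_blacklist is a hypothetical helper, not part of any AMD or libvirt tool.

```shell
# Sketch only: has_amdgpu_blacklist is a hypothetical helper name.
# $1: kernel command line text, for example the contents of /proc/cmdline.
# Returns 0 if the exact token "modprobe.blacklist=amdgpu" is present.
has_amdgpu_blacklist() {
    case " $1 " in
        *" modprobe.blacklist=amdgpu "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use inside the guest (not executed here):
#   has_amdgpu_blacklist "$(cat /proc/cmdline)" && echo "amdgpu is blacklisted"
```

Padding the argument with spaces lets the pattern match the token at the start, middle, or end of the command line without matching a longer token that merely contains it.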
  6. Add the GPU VF device nodes to the VM XML configuration by running sudo virsh edit <VM_NAME> and adding a hostdev entry for each VF under the devices section:

# sudo virsh list --all
# sudo virsh shutdown ubuntu22.04-vm1
# sudo virsh edit ubuntu22.04-vm1 (add hostdev entry under devices section)

<hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x<DEVICE_BUS_ID>' slot='0x<DEVICE_SLOT>' function='0x0'/>
    </source>
</hostdev>

Repeat this step for every virtual GPU that is being added to the VM (one hostdev node per virtual device). The DEVICE_BUS_ID and DEVICE_SLOT values for each targeted device can be obtained from the output of the lspci -d 1002:74b5 command, which prints each VF's BDF address in the format DEVICE_BUS_ID:DEVICE_SLOT.function.

As an example, the following shows how to add all eight GPU VF device nodes to the VM configuration. Suppose the command produces this output:

# lspci -d 1002:74b5
03:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
26:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
43:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
63:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
83:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
a3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
c3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02) 
e3:02.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 74b5 (rev 02)

The VF BDF address appears at the beginning of each line in the format DEVICE_BUS_ID:DEVICE_SLOT.function.

Based on that data, add the GPU VF device nodes to the VM XML configuration under the devices section (in this example, all 8 GPUs are assigned to a single VM):

<hostdev mode='subsystem' type='pci' managed='yes'>
      <source> 
          <address domain='0x0000' bus='0x03' slot='0x02' function='0x0'/> 
      </source> 
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
      <source> 
          <address domain='0x0000' bus='0x26' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0x43' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0x63' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0x83' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0xa3' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0xc3' slot='0x02' function='0x0'/> 
      </source> 
</hostdev> 
<hostdev mode='subsystem' type='pci' managed='yes'> 
      <source> 
          <address domain='0x0000' bus='0xe3' slot='0x02' function='0x0'/> 
      </source> 
</hostdev>
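
Writing eight hostdev entries by hand is error-prone, so the mapping from lspci output to XML shown above can be scripted. The following is a minimal sketch, assuming lspci prints the BDF without a PCI domain prefix (as in the listing above) and that every device sits in domain 0x0000. The name bdf_to_hostdev is a hypothetical helper introduced for illustration only.

```shell
# Sketch only: bdf_to_hostdev is a hypothetical helper name.
# Reads lspci-style lines on stdin and prints one <hostdev> entry per line.
# Assumes the first field is a BDF of the form BUS:SLOT.FUNCTION (no domain).
bdf_to_hostdev() {
    while read -r bdf _rest; do
        [ -n "$bdf" ] || continue      # skip empty lines
        bus=${bdf%%:*}                 # "03:02.0" -> "03"
        slot_fn=${bdf#*:}              # "03:02.0" -> "02.0"
        slot=${slot_fn%%.*}            # "02.0"    -> "02"
        fn=${slot_fn#*.}               # "02.0"    -> "0"
        cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
        <address domain='0x0000' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
    </source>
</hostdev>
EOF
    done
}

# Example use on the host (not executed here):
#   lspci -d 1002:74b5 | bdf_to_hostdev
```

The generated entries still need to be pasted under the devices section with sudo virsh edit; the sketch only removes the manual transcription of bus and slot values.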
  7. Check that the added GPUs are visible in the guest:

# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234)
# lspci
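
Scanning the full lspci listing by eye is easy to get wrong when several VFs are attached. The sketch below counts matching devices instead. It assumes that the VFs keep the 1002:74b5 vendor:device ID inside the guest and that the input is the numeric listing from lspci -n, where the third field is the vendor:device pair. The name count_vfs is a hypothetical helper.

```shell
# Sketch only: count_vfs is a hypothetical helper name.
# Reads `lspci -n` style lines on stdin ("BDF CLASS: VENDOR:DEVICE (rev NN)")
# and prints how many lines carry the given vendor:device ID.
# $1: optional vendor:device ID; defaults to the VF ID used in this guide.
count_vfs() {
    awk -v id="${1:-1002:74b5}" '$3 == id { n++ } END { print n + 0 }'
}

# Example use inside the guest (not executed here):
#   lspci -n | count_vfs
```

With all eight VFs assigned as in the example above, the printed count would be expected to be 8.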

To set up a RHEL VM, refer to the Red Hat documentation on preparing and deploying KVM guest images with Image Builder. The process is similar to the Ubuntu setup, as both utilize QEMU/libvirt for creating and configuring the VM. However, there are some differences: RHEL images are obtained from the Red Hat Customer Portal and their setup may involve using Image Builder for customization, while Ubuntu images are downloaded from the Ubuntu cloud images repository.

Note that assigning GPU VF devices to the VM is not operating system-specific. The method described above for adding VF devices is the same in RHEL and Ubuntu environments.

Guest Driver Setup

Connect to the VM to install the ROCm AMDGPU VF driver:

# sudo virsh start ubuntu22.04-vm1
# sudo virsh domifaddr ubuntu22.04-vm1
# ssh ubuntu@<VM_IP> (password: user1234)

The ROCm™ software stack and other Radeon™ software for Linux components are installed with the amdgpu-install script, which helps you install a coherent set of stack components. For installation steps and post-install verification, refer to the Radeon software for Linux with ROCm installation guide.

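Note: Because the amdgpu module was blacklisted on the kernel command line during the initial VM setup, it is not loaded automatically at boot. Load the AMDGPU VF driver with the following command: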

# sudo modprobe amdgpu
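
To confirm that the module is actually resident after the modprobe call, the lsmod listing can be checked. The following is a minimal sketch, assuming standard lsmod output in which the module name is the first field of each line. The name amdgpu_loaded is a hypothetical helper.

```shell
# Sketch only: amdgpu_loaded is a hypothetical helper name.
# Reads lsmod-style text on stdin and returns 0 if a line's first field
# is exactly "amdgpu", 1 otherwise.
amdgpu_loaded() {
    awk '$1 == "amdgpu" { found = 1 } END { exit !found }'
}

# Example use inside the guest (not executed here):
#   lsmod | amdgpu_loaded && echo "amdgpu is loaded"
```

Matching on the first field avoids false positives from other modules that merely list amdgpu in their "Used by" column.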

Post-install verification check

To confirm that the setup works and that the VM can run workloads on the GPU, check the output of the rocminfo and clinfo tools in the VM.

# sudo rocminfo

The output should be similar to the following:

[...]
*******
Agent 2
*******
  Name:                    gfx942
  Uuid:                    GPU-664b52e347835f94
  Marketing Name:          AMD Instinct MI300X
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
[...]
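
When several VFs are assigned, counting the GPU agents in the rocminfo output is a quick sanity check. The sketch below assumes that each GPU agent has a "Name:" line whose value starts with "gfx", as in the excerpt above, and that ISA entries use a different prefix. The name count_gfx_agents is a hypothetical helper.

```shell
# Sketch only: count_gfx_agents is a hypothetical helper name.
# Reads rocminfo-style text on stdin and prints the number of lines whose
# first field is "Name:" and whose value begins with "gfx".
# "Marketing Name:" lines are skipped because their first field differs.
count_gfx_agents() {
    awk '$1 == "Name:" && $2 ~ /^gfx/ { n++ } END { print n + 0 }'
}

# Example use inside the guest (not executed here):
#   sudo rocminfo | count_gfx_agents
```

The printed number can then be compared with the number of hostdev entries added to the VM configuration.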

Also run clinfo:

# sudo clinfo

The output should be similar to the following:

[...]
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (3649.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD
  Platform Host timer resolution                  1ns
[...]

This marks the final step in setting up the AMD GPUs with MxGPU in KVM/QEMU environments. By following the outlined steps, users can effectively allocate GPU resources across virtual machines, optimizing performance and resource utilization for demanding workloads.

With your environment now configured, consider deploying high-performance computing applications, artificial intelligence models, or machine learning tasks that can fully leverage the compute capabilities of the AMD GPUs. These applications can benefit significantly from the enhanced resource allocation that MxGPU provides.


Next Steps

Congratulations! Your MxGPU setup is complete.

Optional Maintenance