Deploy a Workload with a Network Device
1. Create a NetworkAttachmentDefinition
Create a NetworkAttachmentDefinition (NAD) that assigns the requested device to the workload, then apply it to the cluster:
cat <<EOF > amd-host-device-nad.yaml
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: amd-host-device-nad
  annotations:
    k8s.v1.cni.cncf.io/resourceName: amd.com/vnic
spec:
  config: '{
    "name": "amd-host-device-nad",
    "cniVersion": "0.3.1",
    "type": "amd-host-device"
  }'
EOF
kubectl apply -f amd-host-device-nad.yaml
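This step defines a secondary network using a NetworkAttachmentDefinition. When a pod references this network, Multus reads the k8s.v1.cni.cncf.io/resourceName annotation to find the device allocated to the pod for that resource. It then passes that device to the amd-host-device CNI plugin, so the requested NIC or vNIC is attached to the workload pod.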
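2. Deploy the Workload
Create a Deployment whose pods each request one GPU and one NIC or vNIC. The device resource requested under resources must match the resourceName annotation in the NetworkAttachmentDefinition. As written, the NAD names amd.com/vnic while the pod requests amd.com/nic. Before deploying, change one of them so that both use the resource your device plugin advertises for your device type.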
cat <<EOF > workload.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: workload-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: workload-app
  template:
    metadata:
      annotations:
        k8s.v1.cni.cncf.io/networks: amd-host-device-nad
      labels:
        app: workload-app
    spec:
      hostNetwork: false
      containers:
      - name: workload-container
        image: docker.io/rocm/roce-workload:ubuntu24_rocm7_rccl-J13A-1_anp-v1.1.0-4D_ainic-1.117.1-a-63
        imagePullPolicy: IfNotPresent
        workingDir: /tmp
        command: ["/bin/bash", "-c"]
        args:
        - |
          /tmp/container_setup.sh
        securityContext:
          capabilities:
            add:
            - IPC_LOCK
            - NET_ADMIN
        resources:
          requests:
            amd.com/gpu: 1
            amd.com/nic: 1
          limits:
            amd.com/gpu: 1
            amd.com/nic: 1
EOF
kubectl apply -f workload.yaml
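A mismatch between the NAD's resourceName and the pod's resource request is easy to miss, so a quick pre-flight check can help before applying workload.yaml. The sketch below is a hypothetical helper and not part of AMD's tooling. The function names nad_resource_name and requests_resource are made up for illustration. The helper matches plain text rather than parsing YAML, so it assumes the unquoted `key: value` layout used in these manifests.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not an AMD or Multus tool). It compares the
# NAD's resourceName annotation with the workload's resource requests using
# plain-text matching only; it does not parse YAML.

# Print the value of the k8s.v1.cni.cncf.io/resourceName annotation in file $1.
nad_resource_name() {
  sed -n 's|^[[:space:]]*k8s\.v1\.cni\.cncf\.io/resourceName:[[:space:]]*||p' "$1"
}

# Succeed if manifest $1 contains a "<resource $2>:" entry, e.g. "amd.com/nic: 1".
requests_resource() {
  [ -n "$2" ] || return 1          # an empty resource name never matches
  grep -Fq -e "$2:" -- "$1"        # fixed-string match, so dots are literal
}
```

For example, `requests_resource workload.yaml "$(nad_resource_name amd-host-device-nad.yaml)" || echo "resource mismatch"` prints a warning only when the workload does not request the NAD's resource.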
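3. Run IB and RCCL Tests
These tests assume the two replicas are running on different nodes. You can check this with kubectl get pods -l app=workload-app -o wide. Open a shell in each workload pod, for example with kubectl exec -it <pod-name> -- bash, and run the IB and RCCL tests between the nodes.
3.1 Run IB tests between the nodes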
On node1, start the write bandwidth test using the local RoCE device:
root@app:/tmp# ib_write_bw -d roce_ai1 -i 1 -n 1000 -F -a -x 1 -q 1
On node2, run the client side of the write bandwidth test using node2's local RoCE device, targeting node1's IP address (55.1.1.56 in this example):
root@app:/tmp# ib_write_bw -d ionic_0 -i 1 -n 1000 -F -a -x 1 -q 1 55.1.1.56
Note: roce_ai1 and ionic_0 are the RoCE devices available on the respective pods in this example, and device names differ between systems. To list the available RDMA devices, run ibv_devices inside the workload container.
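The IB commands above can be reused more easily by wrapping their flags in a small Bash function that documents each one. This is a hypothetical convenience sketch, and the function name ib_write_bw_args is made up. The flag descriptions are paraphrased from the perftest ib_write_bw help text, so check `ib_write_bw --help` in your image for the authoritative meanings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the ib_write_bw arguments used in this guide,
# one per line.
# Usage: ib_write_bw_args DEVICE [SERVER_IP]
#   DEVICE     local RDMA device, as listed by ibv_devices
#   SERVER_IP  omit on the server (node1); pass node1's address on the client (node2)
ib_write_bw_args() {
  local dev=$1 server=${2:-}
  local -a args
  args=(
    -d "$dev"   # RDMA device to open
    -i 1        # port number on that device
    -n 1000     # iterations (exchanges) per message size
    -F          # do not fail if CPU frequency scaling is detected
    -a          # sweep message sizes from 2 bytes up to 2^23 bytes
    -x 1        # GID index; on RoCE each GID entry maps to an address and RoCE version
    -q 1        # number of queue pairs
  )
  if [ -n "$server" ]; then
    args+=("$server")   # client mode: connect to the server at this address
  fi
  printf '%s\n' "${args[@]}"
}
```

On node2, for instance: `mapfile -t args < <(ib_write_bw_args ionic_0 55.1.1.56); ib_write_bw "${args[@]}"`. Printing one argument per line and reading it back with mapfile keeps each argument intact without relying on eval.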
3.2 Run RCCL tests between the nodes
From inside a workload pod, launch the RCCL test:
root@app:/tmp# /tmp/vf_rccl_run.sh