Release Notes

v1.1.0

This release introduces major enhancements, including a Cluster Validation Framework and Network Operator images that can be deployed independently of the host OS version.

Release Highlights

  • Network Operator

    • Introduced support for the Cluster Validation Framework, which validates newly added worker nodes in the Kubernetes cluster before distributed training or inference workloads are scheduled on them.

    • Added support for Fluent sidecar-based logging, providing centralized logging of cluster validation runs.

  • Device Plugin, Metrics Exporter and Node Labeller

    • The NICCTL tool is now bundled within the Device Plugin, Metrics Exporter and Node Labeller images, allowing these Operator components to run independently of the host OS version.

  • RoCE Workload Image

    • An Ubuntu-based workload image with supported AINIC firmware 1.117.5-a-56 has been published to the ROCm Docker Hub for running RCCL and InfiniBand tests.

v1.0.1

This release introduces support for user-defined tolerations in KMM modules and includes significant latency improvements for RDMA statistics in the Device Metrics Exporter.

Release Highlights

  • Network Operator

    • Added support for user-defined tolerations on the KMM module. Custom tolerations can now be injected into the KMM module through the NetworkConfig CR.

  • Device Metrics Exporter

    • Improved RDMA statistics collection, reducing latency severalfold compared to release v1.0.0.
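The user-defined tolerations mentioned above use the standard Kubernetes toleration fields (`key`, `operator`, `value`, `effect`). The following is a minimal sketch, not the Operator's own code, of how such a toleration could be assembled and merged into a NetworkConfig CR represented as a plain dictionary. The field path `spec.driver.tolerations`, the CR name, and the taint key are assumptions made for illustration only; consult the NetworkConfig CRD for the actual location.

```python
import copy

# Effects accepted by this sketch (the standard Kubernetes taint effects).
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def make_toleration(key, effect, operator="Exists", value=None):
    """Build a Kubernetes toleration as a plain dict."""
    if operator not in ("Exists", "Equal"):
        raise ValueError(f"unsupported operator: {operator!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unsupported effect: {effect!r}")
    if operator == "Exists" and value is not None:
        raise ValueError("operator 'Exists' must not carry a value")
    toleration = {"key": key, "operator": operator, "effect": effect}
    if operator == "Equal":
        toleration["value"] = "" if value is None else value
    return toleration


def add_tolerations(network_config, tolerations,
                    field_path=("spec", "driver", "tolerations")):
    """Return a copy of the CR dict with tolerations appended at field_path.

    field_path is a HYPOTHETICAL location, not confirmed by the release notes.
    """
    updated = copy.deepcopy(network_config)
    node = updated
    for part in field_path[:-1]:
        node = node.setdefault(part, {})
    node.setdefault(field_path[-1], []).extend(tolerations)
    return updated


# Hypothetical CR skeleton; the name and taint key are placeholders.
cr = {"kind": "NetworkConfig",
      "metadata": {"name": "example-networkconfig"},
      "spec": {}}
patched = add_tolerations(
    cr, [make_toleration("example.com/dedicated", "NoSchedule", "Equal", "ainic")]
)
```

The helper copies the input rather than mutating it, so the original CR dictionary can still be compared against the patched one before anything is applied to a cluster.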

v1.0.0

This is the first major release of the AMD Network Operator. The Operator simplifies the use of AMD AI NICs in Kubernetes environments by managing all networking components required to enable RDMA workloads within a cluster.

Release Highlights

  • Manage AMD AI NIC drivers at the desired versions on Kubernetes cluster nodes

  • Customized scheduling and efficient resource allocation for containers

  • Metrics and statistics monitoring solution for AMD AI NIC workloads

Hardware Support

New Hardware Support

  • AMD Pensando™ Pollara AI NIC

Platform Support

New Platform Support

  • Kubernetes 1.29+

    • Supported features:

      • Driver management

      • Workload scheduling

      • Metrics monitoring

    • Requirements: Kubernetes version 1.29+
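The version requirement above can be checked programmatically before installation. The sketch below assumes the cluster version is available as a string in the usual Kubernetes form (for example `v1.29.3`, possibly with a vendor suffix); the function names are illustrative and not part of the Operator.

```python
import re

# Minimum supported (major, minor) version, taken from the requirement above.
MIN_VERSION = (1, 29)


def parse_k8s_version(git_version):
    """Extract (major, minor) from strings such as 'v1.29.3' or '1.30.2+k3s1'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version, minimum=MIN_VERSION):
    """Return True when the cluster version is at least the minimum."""
    return parse_k8s_version(git_version) >= minimum
```

Comparing `(major, minor)` tuples avoids the pitfalls of string comparison, where `"1.9"` would sort after `"1.29"`.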

Breaking Changes

Not applicable, as this is the initial release.