AMD SMI: Event Monitoring

Functions

amdsmi_status_t amdsmi_event_create (amdsmi_processor_handle *processor_list, uint32_t num_devices, uint64_t event_types, amdsmi_event_set *set)
 Allocates a new event set notifier to monitor different types of issues on GPUs running virtualization software. This call registers an event set. The caller passes an array of handles for the GPUs to monitor, together with a bitmask of the event types to watch.

amdsmi_status_t amdsmi_event_read (amdsmi_event_set set, int64_t timeout_usec, amdsmi_event_entry_t *event)
 Waits up to the given timeout for an event from the event set and copies one event into the user-provided notifier storage.

amdsmi_status_t amdsmi_event_destroy (amdsmi_event_set set)
 Destroys and frees an event set.


Function Documentation

◆ amdsmi_event_create()

amdsmi_status_t amdsmi_event_create(amdsmi_processor_handle *processor_list,
                                    uint32_t num_devices,
                                    uint64_t event_types,
                                    amdsmi_event_set *set)

Allocates a new event set notifier to monitor different types of issues on GPUs running virtualization software. This call registers an event set. The caller passes an array of handles for the GPUs to monitor, together with a bitmask of the event types to watch.

Parameters
[in] processor_list  Array of processor handles for the GPUs to monitor for events.
[in] num_devices  Number of processors in processor_list.
[in] event_types  Bitmask of the event types that the event set will monitor on these GPUs. Bit layout (bit index from 0):

    | 63 62 61 60    | 59 .......... 0          |
    | event severity | event category bit field |

There are five event severities, each with a macro that sets it in the severity field:

    0b0000  High severity     AMDSMI_MASK_HIGH_ERROR_SEVERITY_ONLY
    0b0001  Medium severity   AMDSMI_MASK_INCLUDE_MED_ERROR_SEVERITY
    0b0010  Low severity      AMDSMI_MASK_INCLUDE_LOW_ERROR_SEVERITY
    0b0100  Warning severity  AMDSMI_MASK_INCLUDE_WARN_SEVERITY
    0b1000  Info severity     AMDSMI_MASK_INCLUDE_INFO_SEVERITY

The AMDSMI_MASK_INCLUDE_CATEGORY macro sets the bit for a category to monitor. It takes a value of the AMDSMI_EVENT_CATEGORY enum as its argument.
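The following C sketch shows how a 64-bit event_types value can be assembled and inspected under the bit layout described above. It does not use the library's macros: the DEMO_* names, the demo_* helpers, and the numeric category indices are hypothetical stand-ins. It assumes that the severity flags combine with bitwise OR (as their distinct bit values suggest) and that a category index selects a single bit in bits 59 to 0. Real code should use AMDSMI_MASK_INCLUDE_CATEGORY, the severity macros, and the AMDSMI_EVENT_CATEGORY values from amdsmi.h, whose exact definitions may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring the documented layout:
 * bits 63..60 = event severity, bits 59..0 = event category bit field.
 * The real AMDSMI_MASK_* macros live in amdsmi.h and may differ. */
#define DEMO_SEVERITY_SHIFT 60
#define DEMO_CATEGORY_FIELD_MASK ((UINT64_C(1) << DEMO_SEVERITY_SHIFT) - 1)

/* 4-bit severity field values, taken from the table above. */
enum demo_severity {
    DEMO_SEV_HIGH_ONLY    = 0x0, /* 0b0000 */
    DEMO_SEV_INCLUDE_MED  = 0x1, /* 0b0001 */
    DEMO_SEV_INCLUDE_LOW  = 0x2, /* 0b0010 */
    DEMO_SEV_INCLUDE_WARN = 0x4, /* 0b0100 */
    DEMO_SEV_INCLUDE_INFO = 0x8  /* 0b1000 */
};

/* Set the bit for one category index (0..59) in the category field. */
static uint64_t demo_include_category(uint64_t mask, unsigned category)
{
    if (category >= DEMO_SEVERITY_SHIFT)
        return mask; /* would spill into the severity field: ignore it */
    return mask | (UINT64_C(1) << category);
}

/* OR severity flags (assumed combinable) into bits 63..60. */
static uint64_t demo_include_severity(uint64_t mask, unsigned severity_bits)
{
    return mask | ((uint64_t)(severity_bits & 0xFu) << DEMO_SEVERITY_SHIFT);
}

/* Extract the 4-bit severity field. */
static unsigned demo_severity_field(uint64_t mask)
{
    return (unsigned)(mask >> DEMO_SEVERITY_SHIFT);
}

/* Report whether a category index is selected in the category field. */
static bool demo_monitors_category(uint64_t mask, unsigned category)
{
    return category < DEMO_SEVERITY_SHIFT &&
           (((mask & DEMO_CATEGORY_FIELD_MASK) >> category) & 1u);
}
```

For example, demo_include_severity(demo_include_category(0, 3), DEMO_SEV_INCLUDE_MED | DEMO_SEV_INCLUDE_LOW) gives a mask with category bit 3 set and a severity field of 0b0011. A value built this way, using the library's own macros, is what would be passed as event_types to amdsmi_event_create.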

Parameters
[out] set  Pointer to the variable that receives the event set created by the library. The library allocates the event set; free it with amdsmi_event_destroy.
Returns
amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.

◆ amdsmi_event_read()

amdsmi_status_t amdsmi_event_read(amdsmi_event_set set,
                                  int64_t timeout_usec,
                                  amdsmi_event_entry_t *event)

Waits up to the given timeout for an event from the event set and copies one event into the user-provided notifier storage.

Note
If timeout_usec is negative, the call blocks indefinitely; if timeout_usec is zero, the call returns immediately. A positive timeout, given in microseconds, is converted to milliseconds by rounding down to a whole number of milliseconds (e.g. 1500 us -> 1 ms, 2600 us -> 2 ms). The minimum timeout is 1000 us: a positive value below 1000 us is raised to 1000 us.
The returned event entry contains a 64-bit timestamp; the category, sub-code, and flags of the error; the handles of the VF and GPU that originated the error; and a 256-byte text buffer with a human-readable description of the error.
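The timeout rules in the note above can be modeled with a small C function. This is a sketch of the documented behavior, not the library's implementation: demo_effective_timeout_ms is a hypothetical name, and the function assumes the 1000 us minimum applies only to positive values, since the note also states that a zero timeout returns immediately.

```c
#include <stdint.h>

/* Hypothetical model of the documented amdsmi_event_read timeout rules.
 * Returns -1 for "block indefinitely", 0 for "return immediately",
 * otherwise the effective timeout in whole milliseconds. */
static int64_t demo_effective_timeout_ms(int64_t timeout_usec)
{
    if (timeout_usec < 0)
        return -1;               /* negative: wait forever */
    if (timeout_usec == 0)
        return 0;                /* zero: do not wait */
    if (timeout_usec < 1000)
        timeout_usec = 1000;     /* raise small positive values to the 1000 us minimum */
    return timeout_usec / 1000;  /* integer division rounds down: 2600 us -> 2 ms */
}
```

Under this model, any positive timeout shorter than 2000 us waits for 1 ms, so timeouts finer than whole milliseconds have no effect.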
Parameters
[in] set  Event set to read from. Pass the same set that was created by the amdsmi_event_create call.
[in] timeout_usec  Time to wait for an event, in microseconds.
[out] event  Pointer to a user-allocated event entry that receives the event.
Returns
amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.

◆ amdsmi_event_destroy()

amdsmi_status_t amdsmi_event_destroy(amdsmi_event_set set)

Destroys and frees an event set.

Parameters
[in] set  Event set to destroy.
Returns
amdsmi_status_t: AMDSMI_STATUS_SUCCESS on success, non-zero on failure.