Posted On: Mar 11, 2022

Amazon CloudWatch agent now supports the collection of NVIDIA GPU performance metrics from Amazon Elastic Compute Cloud (Amazon EC2) accelerated computing instances running Linux. GPU-based instances provide access to NVIDIA GPUs with thousands of compute cores. You can use these instances to accelerate scientific, engineering, and rendering applications. Customers can install and configure CloudWatch agent to collect system and application metrics from Amazon EC2, on-premises hosts, and containerized applications and send them to CloudWatch. CloudWatch provides you with data and actionable insights to monitor your applications and optimize resource utilization. GPU metrics are intended for users who want to monitor the utilization of GPU co-processors in their EC2 accelerated instances.

Using CloudWatch agent, you can now collect NVIDIA GPU metrics and send them to CloudWatch. GPU metrics help you ensure efficient and cost-effective use of your GPU accelerators. By monitoring metrics such as GPU Utilization and Free Memory on a CloudWatch Dashboard, you can identify when an accelerated instance is over- or under-utilized, enabling you to right-size your instances or provision additional hosts. You can surface anomalies in usage with CloudWatch Anomaly Detection and receive notifications with CloudWatch Alarms.
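As a rough illustration of what enabling GPU collection might look like, the following Python sketch generates a minimal agent configuration in JSON. The `nvidia_gpu` section name and the measurement names `utilization_gpu` and `memory_free` are assumptions based on the agent's configuration reference, and the helper `build_gpu_agent_config` is hypothetical; verify the exact keys against the CloudWatch User Guide before using this on a host.

```python
import json


def build_gpu_agent_config(measurements, interval_seconds=60):
    """Build a CloudWatch agent config dict with a GPU metrics section.

    Assumption: the agent reads GPU settings from a "nvidia_gpu" key under
    "metrics" -> "metrics_collected". Check the agent documentation.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "nvidia_gpu": {
                    # Measurement names are assumed, not taken from this post.
                    "measurement": list(measurements),
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


# Collect GPU utilization and free GPU memory every 60 seconds.
config = build_gpu_agent_config(["utilization_gpu", "memory_free"])
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON could be saved as the agent configuration file (or merged into an existing one) and loaded when the agent starts. Generating it from code rather than editing it by hand makes it easier to keep the same measurement list across a fleet of accelerated instances.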

Amazon CloudWatch agent is available in the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD.

To get started, see Create or Edit the CloudWatch Agent Configuration File in the Amazon CloudWatch User Guide. To learn more about accelerated computing instances, see User Guide for Linux Instances in the Amazon EC2 User Guide. For more information about CloudWatch features, see the CloudWatch User Guide.