Amazon ECS provides contextual failure reasons for troubleshooting task launches with capacity providers

Posted On: Apr 9, 2023

Amazon Elastic Container Service (Amazon ECS) now provides contextual failure reasons that make it easier to troubleshoot task launch failures when customers use capacity providers for auto scaling the compute capacity in their cluster.

ECS capacity providers are designed to automatically scale your Amazon EC2 instances based on the capacity requirements of your applications, so that you can scale your applications without having to manage scaling of the underlying infrastructure. When you use capacity providers, ECS schedules tasks to run even when no capacity is available, by automatically adding capacity in the EC2 Auto Scaling group; without capacity providers, task launches would fail immediately in this scenario. However, even with capacity providers, your tasks can still fail to start, for example if none of the EC2 instances match your task placement constraints, or if the instances in your Auto Scaling group have fewer resources (CPU or memory) than your task definition requires. With today’s launch, ECS publishes contextual failure reasons in the Amazon ECS console, in the DescribeTasks API response, and in the service task placement failure events sent to Amazon EventBridge.
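As an illustration of how these reasons might be read programmatically, here is a minimal Python sketch. It assumes the reason appears in the `stoppedReason` field of a task (or the `reason` field of a `failures` entry) in a DescribeTasks-shaped response, and in `detail.reason` of an "ECS Service Action" event named `SERVICE_TASK_PLACEMENT_FAILURE`. The helper names, ARNs, and reason strings are hypothetical sample values, not output from a real cluster.

```python
from __future__ import annotations

from typing import Any, Optional

# Hypothetical sample shaped like a DescribeTasks response; all values are made up.
SAMPLE_RESPONSE: dict[str, Any] = {
    "tasks": [
        {
            "taskArn": "arn:aws:ecs:region:111122223333:task/demo/aaaa",
            "lastStatus": "STOPPED",
            "stoppedReason": "Example: no instance met the task's memory requirement",
        },
        {
            "taskArn": "arn:aws:ecs:region:111122223333:task/demo/bbbb",
            "lastStatus": "RUNNING",
        },
    ],
    "failures": [
        {"arn": "arn:aws:ecs:region:111122223333:task/demo/cccc", "reason": "MISSING"},
    ],
}


def task_failure_reasons(response: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (arn, reason) pairs from a DescribeTasks-shaped response."""
    reasons: list[tuple[str, str]] = []
    # Tasks that stopped carry a human-readable reason; running tasks do not.
    for task in response.get("tasks", []):
        reason = task.get("stoppedReason")
        if reason:
            reasons.append((task.get("taskArn", "unknown"), reason))
    # Tasks that could not be described at all are listed under "failures".
    for failure in response.get("failures", []):
        reasons.append(
            (failure.get("arn", "unknown"), failure.get("reason", "unknown"))
        )
    return reasons


def placement_failure_reason(event: dict[str, Any]) -> Optional[str]:
    """Return the reason from a service task placement failure event, else None."""
    detail = event.get("detail", {})
    is_service_action = event.get("detail-type") == "ECS Service Action"
    is_placement_failure = detail.get("eventName") == "SERVICE_TASK_PLACEMENT_FAILURE"
    if is_service_action and is_placement_failure:
        return detail.get("reason")
    return None
```

In practice, the response dictionary would typically come from a call such as `boto3.client("ecs").describe_tasks(cluster=..., tasks=[...])`, and the event dictionary from an EventBridge rule target such as a Lambda function. The exact reason strings should be taken from the API reference documentation rather than from this sketch.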

The new experience is now automatically enabled for you in the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD. For a detailed list of failure reasons, see the API reference documentation.