Best practices for viewing and querying Amazon SageMaker service quota usage
Amazon SageMaker helps you build, train, and deploy machine learning (ML) models with ease. To learn more, refer to the Amazon SageMaker documentation.
With Service Quotas, you can view the maximum number of resources, actions, or items in your Amazon Web Services account or Amazon Web Services Region. You can also use Service Quotas to request an increase for adjustable quotas.
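For example, a quick way to see the quotas that apply to SageMaker in your account is the Service Quotas API. The following is a minimal sketch using boto3; sagemaker is the service code Service Quotas uses for SageMaker:

```python
import boto3

client = boto3.client("service-quotas")

# List the quotas applied to SageMaker in the current Region; each entry
# carries the quota name, code, current value, and whether it is adjustable.
paginator = client.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        print(quota["QuotaCode"], quota["QuotaName"], quota["Value"], quota["Adjustable"])
```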
As adoption of MLOps practices grows, so does the demand for resources dedicated to ML model experimentation and retraining; more customers need to run multiple instances, often of the same instance type, at the same time.
Many data science teams work in parallel, using several instances for processing, training, and tuning concurrently. Previously, users would sometimes reach the adjustable account limit for a particular instance type and have to manually request a limit increase from Amazon Web Services.
To request quota increases manually, you can use the Service Quotas console: choose the quota from the list for the service and then choose Request quota increase.
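You can submit the same request programmatically. A minimal sketch with boto3 follows; the quota code below is a placeholder, so look up the real code for the quota you need with list_service_quotas, as shown earlier:

```python
import boto3

client = boto3.client("service-quotas")

# L-1234ABCD is a hypothetical quota code; find the real one with
# list_service_quotas(ServiceCode="sagemaker") first.
response = client.request_service_quota_increase(
    ServiceCode="sagemaker",
    QuotaCode="L-1234ABCD",
    DesiredValue=4.0,  # the new quota value you are asking for
)
print(response["RequestedQuota"]["Status"])  # for example, PENDING
```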
In this post, we show how you can use these features to automatically request a limit increase when instance usage crosses a preconfigured threshold.
Solution overview
The following diagram illustrates the solution architecture.
This architecture includes the following workflow:
- A CloudWatch metric monitors the usage of the resource. A CloudWatch alarm triggers when the resource usage goes beyond a certain preconfigured threshold.
- A message is sent to Amazon Simple Notification Service (Amazon SNS).
- The message is received by an Amazon Web Services Lambda function.
- The Lambda function requests the quota increase.
Aside from requesting a quota increase for the specific account, the Lambda function can also add the quota increase to the organization quota request template, so that new accounts created in your organization request the same increase automatically.
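The Lambda function at the heart of this workflow is small. The following is a minimal sketch, not the exact function from the solution: the quota code and increase factor are assumptions you would replace with your own values.

```python
import json

import boto3

SERVICE_CODE = "sagemaker"
QUOTA_CODE = "L-1234ABCD"  # hypothetical; use the code of the quota you monitor
INCREASE_FACTOR = 2        # request double the current quota value

quotas = boto3.client("service-quotas")


def handler(event, context):
    # The SNS record wraps the CloudWatch alarm state change as a JSON string.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    print(f"Alarm in ALARM state: {alarm.get('AlarmName')}")

    current = quotas.get_service_quota(
        ServiceCode=SERVICE_CODE, QuotaCode=QUOTA_CODE
    )["Quota"]["Value"]

    # Request the increase for this account.
    quotas.request_service_quota_increase(
        ServiceCode=SERVICE_CODE,
        QuotaCode=QUOTA_CODE,
        DesiredValue=current * INCREASE_FACTOR,
    )

    # Optionally, also put the increase into the organization quota request
    # template so new accounts in the organization request it automatically.
    region = context.invoked_function_arn.split(":")[3]
    quotas.put_service_quota_increase_request_into_template(
        ServiceCode=SERVICE_CODE,
        QuotaCode=QUOTA_CODE,
        AwsRegion=region,
        DesiredValue=current * INCREASE_FACTOR,
    )
```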
Prerequisites
Complete the following prerequisite steps:
- Set up an Amazon Web Services account and create an Amazon Web Services Identity and Access Management (IAM) user. For instructions, refer to Secure Your Amazon Web Services Account.
- Install the Amazon Web Services SAM CLI.
Deploy using Amazon Web Services Serverless Application Model
To deploy the application using the Amazon Web Services SAM CLI, run sam build followed by sam deploy --guided from the root directory of the project and follow the prompts.
After the solution is deployed, you should see a new alarm on the CloudWatch console. This alarm monitors the usage of SageMaker notebook instances of the ml.t3.medium instance type.
If your resource usage exceeds 50% of the applied quota, the alarm triggers and the Lambda function requests an increase.
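If you prefer to define such an alarm yourself, the sketch below shows one way to do it with boto3 and CloudWatch metric math: the SERVICE_QUOTA() function resolves the applied quota for a usage metric, so the expression yields usage as a percentage. The Resource dimension value and the SNS topic ARN are assumptions; verify them against the AWS/Usage metrics and topics in your own account.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Usage metric for ml.t3.medium notebook instances. The dimension values
# follow the AWS/Usage convention; confirm them in your account.
usage_metric = {
    "Namespace": "AWS/Usage",
    "MetricName": "ResourceCount",
    "Dimensions": [
        {"Name": "Service", "Value": "SageMaker"},
        {"Name": "Type", "Value": "Resource"},
        {"Name": "Resource", "Value": "notebook-instance/ml.t3.medium"},
        {"Name": "Class", "Value": "None"},
    ],
}

cloudwatch.put_metric_alarm(
    AlarmName="SageMakerNotebookUsageOver50Percent",
    Metrics=[
        # m1: raw resource count, not returned directly.
        {
            "Id": "m1",
            "MetricStat": {"Metric": usage_metric, "Period": 300, "Stat": "Maximum"},
            "ReturnData": False,
        },
        # e1: usage as a percentage of the applied quota.
        {
            "Id": "e1",
            "Expression": "(m1 / SERVICE_QUOTA(m1)) * 100",
            "Label": "Usage (%)",
            "ReturnData": True,
        },
    ],
    EvaluationPeriods=1,
    Threshold=50.0,
    ComparisonOperator="GreaterThanThreshold",
    # Placeholder topic ARN; use the SNS topic created by the solution.
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:quota-usage-alarms"],
)
```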
If your account is part of an organization in Amazon Web Services Organizations and you have the Service Quotas quota request template enabled, the Lambda function also adds the quota increase to the template.
Deploy using the CloudWatch console
To deploy the application using the CloudWatch console, complete the following steps:
- On the CloudWatch console, choose All alarms in the navigation pane.
- Choose Create alarm.
- Choose Select metric.
- Choose Usage.
- Select the metric you want to monitor.
- Specify the condition under which the alarm should trigger.
For more alarm configuration options, see Using Amazon CloudWatch alarms in the Amazon CloudWatch documentation.
- Configure the SNS topic to be notified about the alarm.
You can also use Amazon SNS to invoke a Lambda function when the alarm fires; see Invoking Lambda functions using Amazon SNS notifications in the Amazon SNS documentation, and the sketch after these steps for how to wire this up programmatically.
- For Alarm name, enter a name.
- Choose Next .
- Choose Create alarm .
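To complete the Amazon SNS to Lambda wiring mentioned above, the function needs a resource-based permission for SNS and a subscription on the topic. A minimal sketch, with placeholder ARNs, could look like this:

```python
import boto3

sns = boto3.client("sns")
lam = boto3.client("lambda")

# Placeholder ARNs; substitute the topic and function from your deployment.
topic_arn = "arn:aws:sns:us-east-1:111122223333:quota-usage-alarms"
function_arn = "arn:aws:lambda:us-east-1:111122223333:function:quota-increase"

# Allow the SNS topic to invoke the function ...
lam.add_permission(
    FunctionName=function_arn,
    StatementId="AllowSnsInvoke",
    Action="lambda:InvokeFunction",
    Principal="sns.amazonaws.com",
    SourceArn=topic_arn,
)

# ... then subscribe the function to the topic.
sns.subscribe(TopicArn=topic_arn, Protocol="lambda", Endpoint=function_arn)
```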
Clean up
To clean up the resources created as part of this post, make sure to delete all the created stacks. If you deployed with the Amazon Web Services SAM CLI, you can run the sam delete command; you can also delete the stacks on the Amazon Web Services CloudFormation console.
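If you prefer to clean up programmatically instead, a minimal boto3 sketch follows; the stack name is a placeholder for whatever you chose during deployment.

```python
import boto3

STACK_NAME = "sagemaker-quota-monitor"  # placeholder stack name

cloudformation = boto3.client("cloudformation")
cloudformation.delete_stack(StackName=STACK_NAME)

# Block until the stack (and the alarm, topic, and function it created) is gone.
cloudformation.get_waiter("stack_delete_complete").wait(StackName=STACK_NAME)
```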
Conclusion
In this post, we showed how you can use the new integration from SageMaker with Service Quotas to automate the requests for quota increases for SageMaker resources. This way, data science teams can effectively work in parallel and reduce issues related to unavailability of instances.
You can learn more about Amazon SageMaker quotas on the Amazon SageMaker endpoints and quotas page in the Amazon Web Services General Reference.