Unlocking efficiency: Harnessing the power of Selective Execution in Amazon SageMaker Pipelines
MLOps is a key discipline that often oversees the path to productionizing machine learning (ML) models. It’s natural to focus on a single model that you want to train and deploy. However, in reality, you’ll likely work with dozens or even hundreds of models, and the process may involve multiple complex steps. Therefore, it’s important to have the infrastructure in place to track, train, deploy, and monitor models with varying complexities at scale. This is where MLOps tooling comes in. MLOps tooling helps you repeatably and reliably build these processes and simplify them into a workflow tailored for ML.
In this post, we spotlight an exciting new feature of SageMaker Pipelines known as Selective Execution.
Solution overview
SageMaker Pipelines continues to innovate its developer experience with the release of Selective Execution, which allows you to run a subset of the steps in your pipeline while reusing outputs from a prior run for the rest.
It’s important to note that the selected steps may rely on the results of non-selected steps. In such cases, the outputs of these non-selected steps are reused from a reference run of the current pipeline version. This means that the reference run must have already completed. The default reference run is the latest run of the current pipeline version, but you can also choose to use a different run of the current pipeline version as a reference.
The overall status of the reference run must be Succeeded, Failed, or Stopped; it cannot be Running when Selective Execution attempts to use its outputs. When using Selective Execution, you can choose any number of steps to run, as long as they form a contiguous portion of the pipeline.
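Selective Execution validates contiguity for you, but for a simple linear pipeline you can sanity-check a step selection yourself before launching a run. The following is a hypothetical helper, not part of the SageMaker SDK; it assumes you know the pipeline's step order:

```python
def is_contiguous(selected, step_order):
    """Return True if `selected` forms an unbroken run of consecutive
    steps in `step_order` (a linear pipeline's steps, first to last)."""
    indices = sorted(step_order.index(step) for step in selected)
    return indices == list(range(indices[0], indices[0] + len(indices)))

order = ["Preprocess", "Train", "Evaluate", "MSECheck"]
print(is_contiguous(["Train", "Evaluate"], order))       # True
print(is_contiguous(["Preprocess", "Evaluate"], order))  # False
```

A branching DAG needs a more careful check, but for the common straight-line pipeline this catches an invalid selection before you submit it.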
The following diagram illustrates the pipeline behavior with a full run.
The following diagram illustrates the pipeline behavior using Selective Execution.
In the following sections, we show how to use Selective Execution for various scenarios, including complex workflows in pipeline Direct Acyclic Graphs (DAGs).
Prerequisites
To start experimenting with Selective Execution, first set up the following components of your SageMaker environment:
- SageMaker Python SDK – Ensure that you have an up-to-date SageMaker Python SDK installed in your Python environment. Run the following command from your notebook or terminal to install or upgrade to version 2.162.0 or higher: python3 -m pip install "sagemaker>=2.162.0" or pip3 install "sagemaker>=2.162.0". (Quote the package specifier so your shell doesn't interpret >= as a redirection.)
- Access to SageMaker Studio (optional) –
Amazon SageMaker Studio can be helpful for visualizing pipeline runs and interacting with preexisting pipeline ARNs visually. If you don’t have access to SageMaker Studio or are using on-demand notebooks or other IDEs, you can still follow this post and interact with your pipeline ARNs using the Python SDK.
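If you're unsure whether your installed SDK meets the 2.162.0 floor, a quick numeric comparison of version strings can confirm it. This is a minimal sketch; the helper name is ours, and it assumes a plain major.minor.patch version string:

```python
def _as_tuple(version):
    """Parse the first three dotted fields of a version string as integers."""
    return tuple(int(part) for part in version.split(".")[:3])

def meets_minimum(version, minimum="2.162.0"):
    """Compare dotted version strings numerically, not lexically."""
    return _as_tuple(version) >= _as_tuple(minimum)

# Example usage against the installed SDK:
# import sagemaker
# print(meets_minimum(sagemaker.__version__))
print(meets_minimum("2.162.0"))  # True
print(meets_minimum("2.161.9"))  # False
```

A plain string comparison would get this wrong ("2.9.0" > "2.162.0" lexically), which is why the fields are compared as integers.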
The sample code for a full end-to-end walkthrough is available in the accompanying GitHub repository.
Setup
With the sagemaker>=2.162.0 SDK installed, you can use the SelectiveExecutionConfig class from the sagemaker.workflow.selective_execution_config module. The Selective Execution feature relies on a pipeline execution ARN whose run previously completed with a status of Succeeded, Failed, or Stopped. The following code snippet demonstrates how to import the SelectiveExecutionConfig class, retrieve the reference pipeline execution ARN, and gather the associated pipeline steps and runtime parameters governing the pipeline run:
import boto3
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig
sm_client = boto3.client('sagemaker')
# reference the name of your sample pipeline
pipeline_name = "AbalonePipeline"
# filter for previous success pipeline execution arns
pipeline_executions = [_exec
for _exec in Pipeline(name=pipeline_name).list_executions()['PipelineExecutionSummaries']
if _exec['PipelineExecutionStatus'] == "Succeeded"
]
# get the last successful execution
latest_pipeline_arn = pipeline_executions[0]['PipelineExecutionArn']
print(latest_pipeline_arn)
>>> arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/x62pbar3gs6h
# list all steps of your sample pipeline
execution_steps = sm_client.list_pipeline_execution_steps(
PipelineExecutionArn=latest_pipeline_arn
)['PipelineExecutionSteps']
print(execution_steps)
>>>
[{'StepName': 'Abalone-Preprocess',
'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 519000, tzinfo=tzlocal()),
'EndTime': datetime.datetime(2023, 6, 27, 4, 41, 30, 986000, tzinfo=tzlocal()),
'StepStatus': 'Succeeded',
'AttemptCount': 0,
'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-fvsmu7m7ki3q-Abalone-Preprocess-d68CecvHLU'}},
'SelectiveExecutionResult': {'SourcePipelineExecutionArn': 'arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/ksm2mjwut6oz'}},
{'StepName': 'Abalone-Train',
'StartTime': datetime.datetime(2023, 6, 27, 4, 41, 31, 320000, tzinfo=tzlocal()),
'EndTime': datetime.datetime(2023, 6, 27, 4, 43, 58, 224000, tzinfo=tzlocal()),
'StepStatus': 'Succeeded',
'AttemptCount': 0,
'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:training-job/pipelines-x62pbar3gs6h-Abalone-Train-PKhAc1Q6lx'}}},
{'StepName': 'Abalone-Evaluate',
'StartTime': datetime.datetime(2023, 6, 27, 4, 43, 59, 40000, tzinfo=tzlocal()),
'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 76000, tzinfo=tzlocal()),
'StepStatus': 'Succeeded',
'AttemptCount': 0,
'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-1:123123123123:processing-job/pipelines-x62pbar3gs6h-Abalone-Evaluate-vmkZDKDwhk'}}},
{'StepName': 'Abalone-MSECheck',
'StartTime': datetime.datetime(2023, 6, 27, 4, 57, 43, 821000, tzinfo=tzlocal()),
'EndTime': datetime.datetime(2023, 6, 27, 4, 57, 44, 124000, tzinfo=tzlocal()),
'StepStatus': 'Succeeded',
'AttemptCount': 0,
'Metadata': {'Condition': {'Outcome': 'True'}}}]
# list all configurable pipeline parameters
# params can be altered during Selective Execution
parameters = sm_client.list_pipeline_parameters_for_execution(
PipelineExecutionArn=latest_pipeline_arn
)['PipelineParameters']
print(parameters)
>>>
[{'Name': 'XGBNumRounds', 'Value': '120'},
{'Name': 'XGBSubSample', 'Value': '0.9'},
{'Name': 'XGBGamma', 'Value': '2'},
{'Name': 'TrainingInstanceCount', 'Value': '1'},
{'Name': 'XGBMinChildWeight', 'Value': '4'},
{'Name': 'XGBETA', 'Value': '0.25'},
{'Name': 'ApprovalStatus', 'Value': 'PendingManualApproval'},
{'Name': 'ProcessingInstanceCount', 'Value': '1'},
{'Name': 'ProcessingInstanceType', 'Value': 'ml.t3.medium'},
{'Name': 'MseThreshold', 'Value': '6'},
{'Name': 'ModelPath',
'Value': 's3://sagemaker-us-east-1-123123123123/Abalone/models/'},
{'Name': 'XGBMaxDepth', 'Value': '12'},
{'Name': 'TrainingInstanceType', 'Value': 'ml.c5.xlarge'},
{'Name': 'InputData',
'Value': 's3://sagemaker-us-east-1-123123123123/sample-dataset/abalone/abalone.csv'}]
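The ARN lookup above can be factored into a small helper. This is our own convenience function, not an SDK API; it operates on the PipelineExecutionSummaries list shown earlier, which list_executions returns newest-first:

```python
def latest_execution_arn(summaries, status="Succeeded"):
    """Return the ARN of the most recent execution with the given status,
    or None if no execution matches. `summaries` is assumed newest-first,
    as returned by Pipeline.list_executions()."""
    for summary in summaries:
        if summary["PipelineExecutionStatus"] == status:
            return summary["PipelineExecutionArn"]
    return None

# Illustrative data mirroring the structure printed above
sample = [
    {"PipelineExecutionArn": "arn:...:execution/abc", "PipelineExecutionStatus": "Failed"},
    {"PipelineExecutionArn": "arn:...:execution/xyz", "PipelineExecutionStatus": "Succeeded"},
]
print(latest_execution_arn(sample))  # arn:...:execution/xyz
```

Passing status="Failed" instead selects the most recent failed run, which is useful for the rerun scenario covered later.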
Use cases
In this section, we present a few scenarios where Selective Execution can potentially save time and resources. As a reference, we use a typical pipeline flow that includes steps such as data extraction, training, evaluation, model registration, and deployment to demonstrate the advantages of Selective Execution.
SageMaker Pipelines allows you to define runtime parameters for your pipeline run using pipeline parameters. When a new run is triggered, it typically runs the entire pipeline from start to finish. However, if you only need to iterate on a few steps, running the full pipeline is an unnecessary use of time and compute.
With the release of the Selective Execution feature, you can now rerun an entire pipeline workflow or selectively run a subset of steps using a prior pipeline ARN. This can be done even without step caching enabled. The following use cases illustrate the various ways you can use Selective Execution.
Use case 1: Run a single step
Data scientists often focus on the training stage of an MLOps pipeline and don’t want to worry about the preprocessing or deployment steps. Selective Execution allows data scientists to focus on just the training step and modify training parameters or hyperparameters on the fly to improve the model. This can save time and reduce cost because compute resources are utilized only for running the user-selected pipeline steps. See the following code:
# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
selected_steps=["Abalone-Train"]
)
# start execution of pipeline subset
select_execution = pipeline.start(
selective_execution_config=selective_execution_config,
parameters={
"XGBNumRounds": 120,
"XGBSubSample": 0.9,
"XGBGamma": 2,
"XGBMinChildWeight": 4,
"XGBETA": 0.25,
"XGBMaxDepth": 12
}
)
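If you want to reuse most of a reference run's parameters and override only a few, the Name/Value list returned by list_pipeline_parameters_for_execution can be flattened into the dict that pipeline.start expects. This helper is our own sketch, not part of the SDK:

```python
def parameters_with_overrides(parameter_list, **overrides):
    """Flatten [{'Name': ..., 'Value': ...}, ...] into a dict and apply
    overrides (values are stringified to match the API's string values)."""
    params = {p["Name"]: p["Value"] for p in parameter_list}
    params.update({name: str(value) for name, value in overrides.items()})
    return params

# Illustrative subset of the parameters listed earlier
reference_params = [
    {"Name": "XGBNumRounds", "Value": "120"},
    {"Name": "XGBMaxDepth", "Value": "12"},
]
print(parameters_with_overrides(reference_params, XGBMaxDepth=8))
```

The resulting dict can be passed as the parameters argument of pipeline.start alongside selective_execution_config, so you only spell out the hyperparameters you actually change.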
The following figures illustrate the pipeline with one step in progress and then complete.
Use case 2: Run multiple contiguous pipeline steps
Continuing with the previous use case, a data scientist wants to train a new model and evaluate its performance against a golden test dataset. This evaluation is crucial to ensure that the model meets rigorous guidelines for user acceptance testing (UAT) or production deployment. However, the data scientist doesn’t want to run the entire pipeline workflow or deploy the model. They can use Selective Execution to focus solely on the training and evaluation steps, saving time and resources while still getting the validation results they need:
# select a reference pipeline arn and subset step to execute
selective_execution_config = SelectiveExecutionConfig(
source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
selected_steps=["Abalone-Train", "Abalone-Evaluate"]
)
# start execution of pipeline subset
select_execution = pipeline.start(
selective_execution_config=selective_execution_config,
parameters={
"ProcessingInstanceType": "ml.t3.medium",
"XGBNumRounds": 120,
"XGBSubSample": 0.9,
"XGBGamma": 2,
"XGBMinChildWeight": 4,
"XGBETA": 0.25,
"XGBMaxDepth": 12
}
)
Use case 3: Update and rerun failed pipeline steps
You can use Selective Execution to rerun failed steps within a pipeline or resume the run of a pipeline from a failed step onwards. This can be useful for troubleshooting and debugging failed steps because it allows developers to focus on the specific issues that need to be addressed. This can lead to more efficient problem-solving and faster iteration times. The following example illustrates how you can choose to rerun just the failed step of a pipeline.
# select a previously failed pipeline arn
selective_execution_config = SelectiveExecutionConfig(
source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/fvsmu7m7ki3q",
selected_steps=["Abalone-Evaluate"]
)
# start execution of failed pipeline subset
select_execution = pipeline.start(
selective_execution_config=selective_execution_config
)
Alternatively, a data scientist can resume a pipeline from a failed step to the end of the workflow by specifying the failed step and all the steps that follow it in the SelectiveExecutionConfig.
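For a linear pipeline, the step list for such a resume can be derived from the ordered step names. The following is a hypothetical helper (the SDK does not provide one), using the step names from the earlier listing:

```python
def steps_from_failure(step_order, failed_step):
    """Return the failed step and every step after it, suitable for the
    selected_steps argument when resuming a linear pipeline."""
    return step_order[step_order.index(failed_step):]

order = ["Abalone-Preprocess", "Abalone-Train", "Abalone-Evaluate", "Abalone-MSECheck"]
print(steps_from_failure(order, "Abalone-Evaluate"))
# ['Abalone-Evaluate', 'Abalone-MSECheck']
```

The returned list can be passed directly as selected_steps, with the failed run's ARN as source_pipeline_execution_arn.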
Use case 4: Pipeline coverage
In some pipelines, certain branches are less frequently run than others. For example, there might be a branch that only runs when a specific condition fails. It’s important to test these branches thoroughly to ensure that they work as expected when a failure does occur. By testing these less frequently run branches, developers can verify that their pipeline is robust and that error-handling mechanisms effectively maintain the desired workflow and produce reliable results.
# select steps that cover the pipeline's failure branch
selective_execution_config = SelectiveExecutionConfig(
source_pipeline_execution_arn="arn:aws:sagemaker:us-east-1:123123123123:pipeline/AbalonePipeline/execution/9e3ljoql7s0n",
selected_steps=["Abalone-Train", "Abalone-Evaluate", "Abalone-MSECheck", "Abalone-FailNotify"]
)
# start execution of the selected branch
select_execution = pipeline.start(selective_execution_config=selective_execution_config)
Conclusion
In this post, we discussed the Selective Execution feature of SageMaker Pipelines, which empowers you to selectively run specific steps of your ML workflows. This capability leads to significant time and computational resource savings. We provided sample code for these use cases in the accompanying GitHub repository.
To explore the available steps within the SageMaker Pipelines workflow in more detail, refer to the Amazon SageMaker Pipelines documentation.
About the Authors
Pranav Murthy is an AI/ML Specialist Solutions Architect at Amazon Web Services. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes. In his free time, he enjoys playing chess and traveling.
Akhil Numarsu is a Sr. Product Manager - Technical focused on helping teams accelerate ML outcomes through efficient tools and services in the cloud. He enjoys playing table tennis and is a sports fan.
Nishant Krishnamoorthy is a Sr. Software Development Engineer with Amazon Stores. He holds a master's degree in Computer Science and currently focuses on accelerating ML adoption in different orgs within Amazon by building and operationalizing ML solutions on SageMaker.