Bring legacy machine learning code into Amazon SageMaker using Amazon Web Services Step Functions
Tens of thousands of Amazon Web Services customers use Amazon Web Services machine learning (ML) services to accelerate their ML development with fully managed infrastructure and tools. Customers who have been developing ML models on premises, such as on a local desktop, often want to migrate their legacy ML models to the Amazon Web Services Cloud to take full advantage of the most comprehensive set of ML services, infrastructure, and implementation resources available on Amazon Web Services.
The term legacy code refers to code that was developed to be manually run on a local desktop, and is not built with cloud-ready SDKs such as the Amazon SageMaker Python SDK.
In this post, we share a scalable and easy-to-implement approach to migrate legacy ML code to the Amazon Web Services Cloud for inference using Amazon SageMaker and Amazon Web Services Step Functions.
Solution overview
In this framework, we run the legacy code in a container as a SageMaker Processing job.
We assume the involvement of two personas: a data scientist and an MLOps engineer. The data scientist is responsible for moving the code into SageMaker, either manually or by cloning it from a code repository such as GitHub.
The MLOps engineer takes ownership of building a Step Functions workflow that we can reuse to deploy the custom container developed by the data scientist with the appropriate parameters. The Step Functions workflow can be as modular as needed to fit the use case, or it can consist of just one step to initiate a single process. To minimize the effort required to migrate the code, we have identified three modular components to build a fully functional deployment process:
- Preprocessing
- Inference
- Postprocessing
The following diagram illustrates our solution architecture and workflow.
The following steps are involved in this solution:
- The data scientist persona uses Studio to import legacy code by cloning it from a code repository, and then modularizes the code into separate components that follow the steps of the ML lifecycle (preprocessing, inference, and postprocessing).
- The data scientist uses Studio, and specifically the Studio Image Build CLI tool provided by SageMaker, to build a Docker image. This CLI tool allows the data scientist to build the image directly within Studio and automatically registers the image into Amazon ECR.
- The MLOps engineer uses the registered container image and creates a deployment for a specific use case using Step Functions. Step Functions is a serverless workflow service that can control SageMaker APIs directly through the use of the Amazon States Language.
SageMaker Processing job
Let’s understand how a SageMaker Processing job works.
SageMaker takes your script, copies your data from Amazon Simple Storage Service (Amazon S3), and then pulls a processing container.
The SageMaker Processing job sets up your processing image using a Docker container entrypoint script. You can also provide your own custom entrypoint by using the ContainerEntrypoint and ContainerArguments parameters of the CreateProcessingJob API.
For this example, we construct a custom container and use a SageMaker Processing job for inference. The preprocessing and postprocessing jobs use script mode with a pre-built scikit-learn container.
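To make this concrete, the following is a minimal sketch (using boto3, whose parameters the Step Functions states later in this post mirror) of how ContainerEntrypoint and ContainerArguments could be set when calling CreateProcessingJob. The image URI, role ARN, script path, and arguments are placeholders rather than values from this post's repo; data inputs and outputs are covered in the inference section.

```python
import boto3

sm = boto3.client("sagemaker")

# Minimal Processing job with a custom entrypoint; names, ARNs, and paths are placeholders.
sm.create_processing_job(
    ProcessingJobName="legacy-inference-demo-001",  # must be unique in the Region
    RoleArn="arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
    AppSpecification={
        # Custom image previously pushed to Amazon ECR
        "ImageUri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/sagemakerstudio:latest",
        # Override the image's default entrypoint and pass arguments to it
        "ContainerEntrypoint": ["python3", "/opt/ml/code/predict.py"],
        "ContainerArguments": ["--output-dir", "/opt/ml/processing/output"],
    },
    ProcessingResources={
        "ClusterConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "VolumeSizeInGB": 30,
        }
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```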
Prerequisites
To follow along with this post, complete the following prerequisite steps:
- Create a Studio domain. For instructions, refer to Onboard to Amazon SageMaker Domain Using Quick setup.
- Create an S3 bucket.
- Clone the provided GitHub repo into Studio.
The GitHub repo is organized into different folders that correspond to various stages in the ML lifecycle, facilitating easy navigation and management.
Migrate the legacy code
In this step, we act as the data scientist responsible for migrating the legacy code.
We begin by opening the build_and_push.ipynb notebook.
The initial cell in the notebook guides you through installing the Studio Image Build CLI (the sagemaker-studio-image-build package).
Before we run the build command, it’s important to ensure that the role running the command has the necessary permissions, as specified in the CLI documentation.
See the following code:
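The original snippet isn't reproduced here, but as a hedged sketch: the Studio Image Build CLI delegates the build to Amazon Web Services CodeBuild, so the execution role must trust codebuild.amazonaws.com (in addition to sagemaker.amazonaws.com) and carry the CodeBuild, Amazon ECR, CloudWatch Logs, and S3 permissions listed in the CLI documentation. One way to add the trust relationship, assuming you know your role name, is with boto3:

```python
import json
import boto3

iam = boto3.client("iam")
role_name = "MySageMakerExecutionRole"  # placeholder: your Studio execution role name

# update_assume_role_policy replaces the existing trust policy, so keep the
# sagemaker.amazonaws.com principal alongside codebuild.amazonaws.com.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["sagemaker.amazonaws.com", "codebuild.amazonaws.com"]
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.update_assume_role_policy(RoleName=role_name, PolicyDocument=json.dumps(trust_policy))
```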
To streamline your legacy code, divide it into three distinct Python scripts named preprocessing.py, predict.py, and postprocessing.py. Adhere to best programming practices by converting the code into functions that are called from a main function. Ensure that all necessary libraries are imported and the requirements.txt file is updated to include any custom libraries.
After you organize the code, package it along with the requirements file into a Docker container. You can easily build the container from within Studio using the following command:
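Assuming the Dockerfile and requirements.txt sit in the current working directory, the default invocation is a single command (run from a Studio notebook cell, where the leading ! passes the command to the shell):

```python
# Run from the directory that contains the Dockerfile and requirements.txt
!sm-docker build .
```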
By default, the image will be pushed to an ECR repository called sagemakerstudio with the tag latest. Additionally, the execution role of the Studio app will be utilized, along with the default SageMaker Python SDK S3 bucket. However, these settings can be easily altered using the appropriate CLI options. See the following code:
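As a hedged sketch of those options (the repository name and tag, role, and bucket below are placeholders rather than values from this post):

```python
# Push to a specific repository and tag, and override the role and S3 bucket
# used by the underlying CodeBuild project
!sm-docker build . --repository my-custom-repo:1.0 --role MyDockerBuildRole --bucket my-sagemaker-staging-bucket
```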
Now that the container has been built and registered in an ECR repository, it’s time to dive deeper into how we can use it to run predict.py. We also show you the process of using a pre-built scikit-learn container to run preprocessing.py and postprocessing.py in script mode.
Productionize the container
In this step, we act as the MLOps engineer who productionizes the container built in the previous step.
We use Step Functions to orchestrate the workflow. Step Functions allows for exceptional flexibility in integrating a diverse range of services into the workflow, accommodating any dependencies that exist in the legacy system. This approach ensures that all necessary components are seamlessly integrated and run in the desired sequence, resulting in an efficient and effective workflow solution.
Step Functions can control certain Amazon Web Services services directly from the Amazon States Language. To learn more about working with Step Functions and its integration with SageMaker, refer to the Amazon Web Services Step Functions Developer Guide.
Preprocessing
SageMaker offers several options for running custom code. If you only have a script without any custom dependencies, you can take the bring your own script (BYOS) approach. To do this, pass your script to the pre-built scikit-learn framework container and run a SageMaker Processing job in script mode, using the ContainerArguments and ContainerEntrypoint parameters of the CreateProcessingJob API.
Check out the “Preprocessing Script Mode” state configuration in the GitHub repo.
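The exact definition lives in the repo; the following is only a rough sketch of what such a script-mode task state could look like, expressed as a Python dict that you would serialize with json.dumps into the state machine definition. It assumes the execution input uses the parameter names listed later in this post (code_uri, input_uri, and so on), and that the execution name is valid as part of a Processing job name.

```python
# Sketch of a script-mode Step Functions task state (not the repo's exact definition)
preprocessing_script_mode_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
    "Parameters": {
        "ProcessingJobName.$": "States.Format('preprocessing-{}', $$.Execution.Name)",
        "RoleArn.$": "$.role",
        "AppSpecification": {
            # Pre-built scikit-learn framework image; the script itself is
            # delivered through the "code" ProcessingInput below
            "ImageUri.$": "$.scikit_image_uri",
            "ContainerEntrypoint": ["python3", "/opt/ml/processing/input/code/preprocessing.py"],
        },
        "ProcessingInputs": [
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri.$": "$.code_uri",
                    "LocalPath": "/opt/ml/processing/input/code",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            },
            {
                "InputName": "data",
                "S3Input": {
                    "S3Uri.$": "$.input_uri",
                    "LocalPath": "/opt/ml/processing/input/data",
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            },
        ],
        "ProcessingOutputConfig": {
            "Outputs": [
                {
                    "OutputName": "preprocessed",
                    "S3Output": {
                        "S3Uri.$": "$.output_uri",
                        "LocalPath": "/opt/ml/processing/output",
                        "S3UploadMode": "EndOfJob",
                    },
                }
            ]
        },
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType.$": "$.instance_type",
                "VolumeSizeInGB.$": "$.volume_size",
            }
        },
        "StoppingCondition": {"MaxRuntimeInSeconds.$": "$.max_runtime"},
    },
    "Next": "Inference Custom Container",
}
```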
Inference
You can run a custom container using the Step Functions CreateProcessingJob step. A SageMaker Processing job works with artifacts under the /opt/ml local path, and you can specify your ProcessingInputs and their local path in the configuration. The Processing job then copies the artifacts to the local container and starts the job. After the job is complete, it copies the artifacts specified in the local path of the ProcessingOutputs to their specified external location.
Check out the “Inference Custom Container” state configuration in the GitHub repo.
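Again only as a sketch rather than the repo's exact definition: the inference state can use the same createProcessingJob.sync integration and the same input, output, resource, and stopping-condition parameters as the preprocessing state above. The main difference is the AppSpecification, which points at the custom image and relies on the entrypoint baked into the Dockerfile to run predict.py:

```python
# Only the part that differs from the script-mode state above (a sketch)
inference_app_specification = {
    "AppSpecification": {
        # Custom image from Amazon ECR; no ContainerEntrypoint override is
        # needed because the Dockerfile's ENTRYPOINT already runs predict.py
        "ImageUri.$": "$.custom_image_uri"
    }
}
```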
Postprocessing
You can run a postprocessing script just like a preprocessing script using the Step Functions CreateProcessingJob step. Running a postprocessing script allows you to perform custom processing tasks after the inference job is complete.
Create the Step Functions workflow
For quick prototyping, we define the workflow directly in the Step Functions Amazon States Language (ASL).
You can create a new Step Functions state machine on the Step Functions console by selecting Write your workflow in code.
Step Functions can look at the resources you use and create a role. However, you may see the following message:
“Step Functions cannot generate an IAM policy if the RoleArn for SageMaker is from a Path. Hardcode the SageMaker RoleArn in your state machine definition, or choose an existing role with the proper permissions for Step Functions to call SageMaker.”
To address this, you must create an IAM role with the appropriate permissions and choose it when you create the state machine.
The following figure illustrates the flow of data and container images into each step of the Step Functions workflow.
The following is a list of minimum required parameters to initialize in Step Functions; you can also refer to the state machine definition in the GitHub repo:
- input_uri – The S3 URI for the input files
- output_uri – The S3 URI for the output files
- code_uri – The S3 URI for script files
- custom_image_uri – The container URI for the custom container you have built
- scikit_image_uri – The container URI for the pre-built scikit-learn framework
- role – The execution role to run the job
- instance_type – The instance type you need to use to run the container
- volume_size – The storage volume size you require for the container
- max_runtime – The maximum runtime for the container, with a default value of 1 hour
Run the workflow
We have broken down the legacy code into manageable parts: preprocessing, inference, and postprocessing. To support our inference needs, we constructed a custom container equipped with the necessary library dependencies. We use Step Functions to orchestrate these steps, taking advantage of its ability to call the SageMaker API. We have shown two methods for running custom code using the SageMaker API: a SageMaker Processing job that uses a pre-built image and takes a custom script at runtime, and a SageMaker Processing job that uses a custom container, which is packaged with the necessary artifacts to run custom inference.
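Once the state machine exists, a run can be started from code as well as from the console. The following is a sketch that passes the parameters listed earlier as the execution input; the bucket names, image URIs, role ARN, and state machine ARN are placeholders:

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Placeholder values; replace them with your own resources
execution_input = {
    "input_uri": "s3://my-bucket/input/",
    "output_uri": "s3://my-bucket/output/",
    "code_uri": "s3://my-bucket/code/",
    "custom_image_uri": "111122223333.dkr.ecr.us-east-1.amazonaws.com/sagemakerstudio:latest",
    "scikit_image_uri": "<pre-built scikit-learn image URI for your Region>",
    "role": "arn:aws:iam::111122223333:role/MySageMakerExecutionRole",
    "instance_type": "ml.m5.xlarge",
    "volume_size": 30,
    "max_runtime": 3600,  # seconds (the default of 1 hour)
}

sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:legacy-ml-workflow",
    input=json.dumps(execution_input),
)
```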
The following figure shows the run of the Step Functions workflow.
Summary
In this post, we discussed the process of migrating legacy ML Python code from local development environments and implementing a standardized MLOps procedure. With this approach, you can effortlessly transfer hundreds of models and incorporate your desired enterprise deployment practices. We presented two different methods for running custom code on SageMaker, and you can select the one that best suits your needs.
If you require a highly customizable solution, it’s recommended to use the custom container approach. If you have simple scripts and don’t need to build a custom container, you may find it more suitable to run them with pre-built images in script mode, as described in the preprocessing step earlier. Furthermore, if required, you can apply this solution to containerize legacy model training and evaluation steps, just as the inference step is containerized in this post.
About the Authors
Bhavana Chirumamilla is a Senior Resident Architect at Amazon Web Services with a strong passion for data and machine learning operations. She brings a wealth of experience and enthusiasm to help enterprises build effective data and ML strategies. In her spare time, Bhavana enjoys spending time with her family and engaging in various activities such as traveling, hiking, gardening, and watching documentaries.
Shyam Namavaram is a senior artificial intelligence (AI) and machine learning (ML) specialist solutions architect at Amazon Web Services (Amazon Web Services). He passionately works with customers to accelerate their AI and ML adoption by providing technical guidance and helping them innovate and build secure cloud solutions on Amazon Web Services. He specializes in AI and ML, containers, and analytics technologies. Outside of work, he loves playing sports and experiencing nature with trekking.
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his PhD in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently, he helps customers in the financial service and insurance industry build machine learning solutions on Amazon Web Services. In his spare time, he likes reading and teaching.
Srinivasa Shaik is a Solutions Architect at Amazon Web Services based in Boston. He helps enterprise customers accelerate their journey to the cloud. He is passionate about containers and machine learning technologies. In his spare time, he enjoys spending time with his family, cooking, and traveling.