Posted On: Mar 18, 2021

We’re excited to announce Amazon SageMaker Pipelines, a new capability of Amazon SageMaker to build, manage, automate, and scale end-to-end machine learning workflows. SageMaker Pipelines brings automation and orchestration to ML workflows, enabling you to accelerate machine learning projects and scale up to thousands of models in production.

Machine learning is an iterative process that requires collaboration across stakeholders such as data engineers, data scientists, ML engineers, and DevOps engineers. Building a scalable model-building process is challenging because the number of steps across data preparation, feature engineering, training, and model evaluation can become large, increasing the complexity of managing data dependencies. As the number of models rises, managing model versions and deploying them to production requires automation that is both easy to use and scalable. Finally, tracking lineage across the end-to-end pipeline otherwise requires custom tooling to track data, model artifacts, and actions.

Amazon SageMaker Pipelines enables data science and engineering teams to collaborate seamlessly on ML projects and streamlines building, automating, and scaling end-to-end ML workflows. The Amazon SageMaker Python SDK makes it easy to construct model-building pipelines by defining parameters and steps, which can include Amazon SageMaker Data Wrangler, Processing, Training, Batch Transform, conditional evaluation, and registering models to the central model registry. Once a pipeline is defined, Amazon SageMaker takes care of executing it, and you can view pipeline executions along with real-time metrics and logs for each step in Amazon SageMaker Studio. Models are registered to the new Amazon SageMaker Model Registry, which automatically versions new models generated from pipelines and offers built-in approval workflows to select which models are deployed to production. Amazon SageMaker Pipelines automatically tracks lineage for each step of your ML pipeline, which can help with governance and audit requirements without the need to build custom tooling.
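As an illustration, a minimal two-step pipeline (a processing step feeding a training step) might be defined with the SageMaker Python SDK roughly as follows. This is a sketch, not a definitive implementation: the IAM role ARN, bucket name, and `preprocess.py` script are hypothetical placeholders, and exact argument names may vary across SDK versions.

```python
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.processing import ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep, TrainingStep

# Hypothetical placeholders -- substitute your own role, bucket, and script.
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"
bucket = "my-ml-bucket"

# A pipeline parameter that can be overridden at execution time.
train_instance_type = ParameterString(
    name="TrainInstanceType", default_value="ml.m5.xlarge")

# Step 1: data preparation with a scikit-learn processing job.
processor = SKLearnProcessor(
    framework_version="0.23-1", role=role,
    instance_type="ml.m5.xlarge", instance_count=1)
step_process = ProcessingStep(
    name="PrepareData",
    processor=processor,
    code="preprocess.py",  # hypothetical preprocessing script
    outputs=[ProcessingOutput(output_name="train",
                              source="/opt/ml/processing/train")],
)

# Step 2: train an XGBoost model on the processed data.
estimator = Estimator(
    image_uri=image_uris.retrieve("xgboost", "us-east-1", version="1.2-1"),
    role=role,
    instance_count=1,
    instance_type=train_instance_type,
    output_path=f"s3://{bucket}/model-artifacts",
)
step_train = TrainingStep(
    name="TrainModel",
    estimator=estimator,
    inputs={"train": TrainingInput(
        # Referencing the processing step's output creates the data
        # dependency from which SageMaker infers step ordering.
        s3_data=step_process.properties.ProcessingOutputConfig
                .Outputs["train"].S3Output.S3Uri)},
)

pipeline = Pipeline(name="MyModelBuildPipeline",
                    parameters=[train_instance_type],
                    steps=[step_process, step_train])
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
pipeline.start()                # kick off an execution
```

Because execution order is inferred from data dependencies between steps, `steps` need not be listed in order; a conditional-evaluation step and a model-registration step could be appended in the same way to gate registration on an evaluation metric.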

Read the documentation for more information and for sample notebooks. To learn how to use the feature, visit the blog post.