Amazon SageMaker makes it easy to generate predictions by providing everything you need to deploy machine learning models in production and monitor model quality.
Machine learning models are typically trained and evaluated on historical data, but their quality degrades after they are deployed to production. This is because the distribution of the data sent to models for predictions can drift from the distribution of the data used during training. The validity of prediction results can also change over time, and errors introduced upstream can impact model quality. To prevent this, you need to monitor the quality of models in production, identify issues quickly, and take corrective actions such as auditing or retraining. Doing so yourself means building tooling to store prediction-related data securely, implementing statistical techniques to analyze that data and evaluate model quality, and detecting deviations in quality so you can take the right corrective action. The usual alternative to building this tooling, retraining models on a fixed schedule regardless of need, can be expensive.
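As a concrete illustration of the kind of statistical check you would otherwise have to build yourself, the sketch below computes the Population Stability Index (PSI) between a feature's training-time and production-time distributions. The bin count and the 0.2 alert threshold are common rules of thumb, not SageMaker internals.

```python
# Minimal drift-detection sketch: Population Stability Index (PSI)
# between two samples of one numeric feature. A PSI above ~0.2 is a
# widely used heuristic for "significant drift worth investigating".
import math

def population_stability_index(expected, actual, bins=10):
    """Compare a baseline (training) sample against a production sample."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def bucket_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # Small floor keeps log() defined for empty buckets.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = bucket_fractions(expected), bucket_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Identical distributions yield a PSI near zero; a shifted production
# distribution yields a large PSI that should trigger an alert.
training = [i / 100 for i in range(1000)]
shifted = [x + 5.0 for x in training]
```

SageMaker Model Monitor, described next, runs this class of analysis for you on captured endpoint traffic instead of requiring hand-rolled checks like this one.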
Amazon SageMaker Model Monitor eliminates the need to build any tooling to monitor models in production and detect when corrective actions need to be taken. SageMaker Model Monitor continuously monitors the quality of machine learning models in production, and alerts you when there are deviations in model quality.
Amazon SageMaker makes it easy to deploy your trained model into production with a single click so that you can start generating predictions for real-time or batch data. You can one-click deploy your model onto auto-scaling Amazon ML instances across multiple availability zones for high redundancy. Just specify the type of instance, and the maximum and minimum number desired, and SageMaker takes care of the rest. SageMaker will launch the instances, deploy your model, and set up the secure HTTPS endpoint for your application. Your application simply needs to include an API call to this endpoint to achieve low latency, high throughput inference. This architecture allows you to integrate your new models into your application in minutes because model changes no longer require application code changes.
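The single API call mentioned above can be sketched as follows. The endpoint name and feature values are placeholders, and the live `invoke_endpoint` call is left commented out because it requires AWS credentials and a deployed endpoint; the runnable part just assembles the request arguments.

```python
# Hedged sketch of the one API call an application makes against the
# HTTPS endpoint SageMaker sets up. Endpoint name and features are
# illustrative placeholders.
import json

def build_invocation(endpoint_name, features):
    """Assemble keyword arguments for sagemaker-runtime invoke_endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"instances": [features]}),
    }

request = build_invocation("my-endpoint", [0.5, 1.2, 3.4])

# With AWS credentials configured, the live call would be:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**request)
# prediction = json.loads(response["Body"].read())
```

Because the application only holds the endpoint name, swapping in a retrained model behind that endpoint requires no application code change, which is the integration benefit described above.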
Most of the time, processing batches of data for non-real-time predictions is done by breaking large datasets into smaller chunks and managing real-time endpoints, which can be expensive and error-prone. With the Batch Transform feature of Amazon SageMaker, there is no need to break the dataset into multiple chunks or manage real-time endpoints. Batch Transform allows you to run predictions on large or small batch datasets. Using a simple API, you can request predictions for a large number of data records and transform the data quickly and easily. Datasets can range from a few bytes for quick explorations up to petabytes.
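A minimal sketch of that simple API follows, as the request a client would pass to the SageMaker `CreateTransformJob` operation (`boto3.client("sagemaker").create_transform_job`). The job name, model name, S3 URIs, and instance type are placeholder assumptions.

```python
# Sketch of a Batch Transform request: SageMaker splits the input
# dataset, runs inference, and writes results to S3, replacing manual
# chunking and real-time endpoint management. All names and URIs are
# illustrative.
def build_transform_job(model_name, input_s3, output_s3):
    """Arguments for the SageMaker CreateTransformJob API."""
    return {
        "TransformJobName": f"{model_name}-batch",
        "ModelName": model_name,
        "TransformInput": {
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix", "S3Uri": input_s3}},
            "ContentType": "text/csv",
            # SageMaker splits the dataset per record for you:
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3},
        "TransformResources": {
            "InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    }

job = build_transform_job("churn-model",
                          "s3://my-bucket/input/",
                          "s3://my-bucket/output/")
```

Note that the chunking burden described above is expressed in a single field (`SplitType`) rather than in client-side orchestration code.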
Train once, deploy anywhere
To achieve high inference performance across a range of edge devices, you typically need to spend weeks or months hand-tuning a model for each target device because every hardware configuration has a unique set of capabilities and restrictions. With Amazon SageMaker Neo, you can train your machine learning models once and deploy them anywhere in the cloud and at the edge.
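The "train once, deploy anywhere" idea can be sketched with Neo's `CreateCompilationJob` API: the same trained artifact is compiled for different edge targets by changing only the target device. The role ARN, S3 paths, input shape, and the `jetson_nano`/`rasp3b` targets are illustrative placeholders.

```python
# Hedged sketch: compile one trained model artifact for two different
# edge devices with the SageMaker Neo CreateCompilationJob API
# (boto3.client("sagemaker").create_compilation_job). Values are
# placeholders, not a tested configuration.
def build_compilation_job(model_s3, target_device, role_arn):
    return {
        "CompilationJobName": f"compile-{target_device}",
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3,
            "DataInputConfig": '{"input": [1, 3, 224, 224]}',
            "Framework": "PYTORCH",
        },
        "OutputConfig": {
            "S3OutputLocation": "s3://my-bucket/compiled/",
            "TargetDevice": target_device,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 900},
    }

role = "arn:aws:iam::123456789012:role/NeoRole"  # placeholder ARN
nano = build_compilation_job("s3://my-bucket/model.tar.gz", "jetson_nano", role)
pi = build_compilation_job("s3://my-bucket/model.tar.gz", "rasp3b", role)
```

The per-device hand-tuning described above collapses into one changed field per target, while the trained model artifact stays the same.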
Integration with Kubernetes
Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker addresses this requirement by letting Kubernetes users train and deploy models in SageMaker using SageMaker Operators for Kubernetes and SageMaker Components for Kubeflow Pipelines. With operators and pipelines, Kubernetes users can access fully managed SageMaker ML tools and engines natively from Kubeflow. This eliminates the need to manually manage and optimize ML infrastructure in Kubernetes while still preserving control of overall orchestration through Kubernetes. Using SageMaker operators and pipelines for Kubernetes, you get the benefits of a fully managed service for machine learning in Kubernetes without migrating workloads.
Data processing beyond training
Real-life ML workloads typically require more than training and prediction. Data needs to be pre-processed and post-processed, sometimes in multiple steps, and you may have to train and deploy a sequence of algorithms that collaborate to deliver predictions from raw data. SageMaker enables you to deploy Inference Pipelines so you can pass raw input data and execute pre-processing, prediction, and post-processing on real-time and batch inference requests. An Inference Pipeline can be composed of any machine learning framework, built-in algorithm, or custom container usable on Amazon SageMaker. You can build feature data processing and feature engineering pipelines with the suite of feature transformers available in the SparkML and Scikit-learn framework containers in Amazon SageMaker, and deploy these as part of an Inference Pipeline to reuse data processing code and manage machine learning processes more easily.
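An Inference Pipeline can be sketched as a single SageMaker `CreateModel` request whose `Containers` list is executed in order: pre-processing, prediction, then post-processing, with each container's output feeding the next. The container image URIs, artifact paths, role ARN, and model name below are placeholders.

```python
# Hedged sketch of an Inference Pipeline: one CreateModel request
# (boto3.client("sagemaker").create_model) with an ordered list of
# containers. Image URIs and S3 artifacts are placeholders.
def build_pipeline_model(name, role_arn, containers):
    """SageMaker chains the containers so raw input flows through
    pre-processing, the predictor, and post-processing in order."""
    return {
        "ModelName": name,
        "ExecutionRoleArn": role_arn,
        "Containers": [
            {"Image": image, "ModelDataUrl": artifact}
            for image, artifact in containers
        ],
    }

pipeline = build_pipeline_model(
    "raw-to-prediction",
    "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder ARN
    [
        ("<sparkml-serving-image>", "s3://my-bucket/preprocess.tar.gz"),
        ("<predictor-image>", "s3://my-bucket/model.tar.gz"),
        ("<sklearn-image>", "s3://my-bucket/postprocess.tar.gz"),
    ],
)
```

The pre- and post-processing code lives in the pipeline definition rather than in each calling application, which is what makes the data processing code reusable across real-time and batch inference.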
Increasingly, companies are training machine learning models based on individual user data. For example, a music streaming service will train custom models based on each listener's music history to personalize music recommendations, or a taxi service will train custom models based on each city's traffic patterns to predict rider wait times. Building custom ML models for each use case leads to higher inference accuracy, but the cost of deploying and managing the models increases significantly. These challenges become more pronounced when not all models are accessed at the same rate but still need to be available at all times.
Amazon SageMaker Multi-Model Endpoints provide a scalable and cost-effective way to deploy large numbers of custom machine learning models. SageMaker Multi-Model Endpoints enable you to deploy multiple models with a single click on a single endpoint and serve them using a single serving container. You specify the type of instance, and the maximum and minimum number desired, and SageMaker takes care of the rest. SageMaker will launch the instances, deploy your models, and set up the secure HTTPS endpoint for your application. Your application simply needs to include an API call with the target model to this endpoint to achieve low latency, high throughput inference.
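The per-request model selection can be sketched as the same `invoke_endpoint` call shown for single-model endpoints, plus a `TargetModel` parameter naming which artifact behind the endpoint should serve the request. The endpoint name, artifact names, and CSV payloads are placeholders.

```python
# Hedged sketch of invoking a Multi-Model Endpoint: one endpoint, many
# model artifacts, with TargetModel selecting the artifact per request.
# Names and payloads are illustrative placeholders.
def build_mme_invocation(endpoint_name, target_model, csv_row):
    """Arguments for sagemaker-runtime invoke_endpoint against a
    Multi-Model Endpoint."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "text/csv",
        "Body": csv_row,
        # Relative path of the artifact under the endpoint's S3 prefix:
        "TargetModel": target_model,
    }

# Two listeners, two custom models, one shared endpoint:
req_a = build_mme_invocation("recs-endpoint", "listener-a.tar.gz", "1,0,3")
req_b = build_mme_invocation("recs-endpoint", "listener-b.tar.gz", "2,5,1")
```

Because thousands of per-user models can sit behind one endpoint and one fleet of instances, the deployment cost no longer scales with the number of models, which addresses the cost concern raised above.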