Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.
Traditional ML development is a complex, expensive, iterative process made even harder because there are no integrated tools for the entire machine learning workflow. You need to stitch together tools and workflows, which is time-consuming and error-prone. SageMaker solves this challenge by providing all of the components used for machine learning in a single toolset so models get to production faster with much less effort and at lower cost.
Build machine learning models
Improve productivity using Amazon SageMaker Studio, the first fully integrated development environment (IDE) for machine learning
Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps. SageMaker Studio gives you complete access, control, and visibility into each step required to build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production all in one place, making you much more productive. All ML development activities including notebooks, experiment management, automatic model creation, debugging, and model drift detection can be performed within the unified SageMaker Studio visual interface.
Use an IDE for ML development. For example, make updates to models inside a notebook and see how changes impact model quality using a side-by-side view of your notebook and training experiments.
Build and collaborate faster using Amazon SageMaker Notebooks
Managing compute instances to view, run, or share a notebook is tedious. Amazon SageMaker Notebooks provide one-click Jupyter notebooks that you can start working with in seconds. The underlying compute resources are fully elastic, so you can easily dial up or down the available resources and the changes take place automatically in the background without interrupting your work. SageMaker also enables one-click sharing of notebooks. All code dependencies are automatically captured, so you can easily collaborate with others. They’ll get the exact same notebook, saved in the same place.
Automatically build, train, and tune models with full visibility and control, using Amazon SageMaker Autopilot
Amazon SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control and visibility into your ML models. Typical approaches to automated machine learning do not give you the insights into the data used in creating the model or the logic that went into creating the model. As a result, even if the model is mediocre, there is no way to evolve it. Also, you don’t have the flexibility to make trade-offs such as sacrificing some accuracy for lower latency predictions since typical automated ML solutions provide only one model to choose from.
SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. The result is the best performing model, which you can deploy in a fraction of the time normally required to train the model. You get full visibility into how the model was created and what’s in it. SageMaker Autopilot also integrates with Amazon SageMaker Studio, where you can explore up to 50 different models generated by SageMaker Autopilot, so it’s easy to pick the best model for your use case. People without machine learning experience can use SageMaker Autopilot to easily produce a model, and experienced developers can use it to quickly develop a baseline model on which teams can further iterate.
Automatically create machine learning models and pick the one that best suits your use case. For example, review the leaderboard to see how each option performs and pick the model that meets your model accuracy and latency requirements.
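The leaderboard choice described above can be sketched in a few lines of Python. This is a minimal illustration and not the SageMaker Autopilot API: the `Candidate` record, its fields, and the `pick_candidate` helper are hypothetical names, and the values in the usage lines are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One row of a hypothetical leaderboard (not an Autopilot type)."""
    name: str
    accuracy: float    # validation accuracy in [0, 1]
    latency_ms: float  # measured prediction latency in milliseconds


def pick_candidate(candidates: Iterable[Candidate],
                   min_accuracy: float,
                   max_latency_ms: float) -> Optional[Candidate]:
    """Return the most accurate candidate that meets both requirements.

    Ties on accuracy are broken in favor of lower latency.
    Returns None when no candidate qualifies.
    """
    eligible = [c for c in candidates
                if c.accuracy >= min_accuracy and c.latency_ms <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.accuracy, -c.latency_ms))


# Illustrative values only; a real leaderboard would come from your own runs.
leaderboard = [
    Candidate("candidate-a", accuracy=0.94, latency_ms=120.0),
    Candidate("candidate-b", accuracy=0.92, latency_ms=35.0),
    Candidate("candidate-c", accuracy=0.88, latency_ms=10.0),
]
choice = pick_candidate(leaderboard, min_accuracy=0.90, max_latency_ms=50.0)
```

Here the most accurate model misses the latency requirement, so the helper trades a little accuracy for a faster model, which is the kind of trade-off a single-model AutoML result does not allow.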
Reduce data labeling costs by up to 70% using Amazon SageMaker Ground Truth
Successful machine learning models are built on large volumes of high-quality training data. But the process of creating the training data necessary to build these models is often expensive, complicated, and time-consuming. Amazon SageMaker Ground Truth helps you build and manage highly accurate training datasets quickly. Ground Truth offers easy access to labelers through Amazon Mechanical Turk and provides them with pre-built workflows and interfaces for common labeling tasks. You can also use your own labelers or use vendors recommended by Amazon through AWS Marketplace. Additionally, Ground Truth continuously learns from labels made by humans to produce high-quality automatic annotations that significantly lower labeling costs.
Amazon SageMaker supports the leading deep learning frameworks
Supported frameworks include TensorFlow, PyTorch, Apache MXNet, Chainer, Keras, Gluon, Horovod, Scikit-learn, and Deep Graph Library.
Train machine learning models
Organize, track, and evaluate training runs using Amazon SageMaker Experiments
Amazon SageMaker Experiments helps you organize and track iterations of machine learning models. Training an ML model typically entails many iterations to isolate and measure the impact of changing data sets, algorithm versions, and model parameters. You produce hundreds of artifacts such as models, training data, platform configurations, parameter settings, and training metrics during these iterations. Teams often fall back on cumbersome mechanisms such as spreadsheets to track these experiments.
SageMaker Experiments helps you manage iterations by automatically capturing the input parameters, configurations, and results, and storing them as ‘experiments’. You can work within the visual interface of SageMaker Studio, where you can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.
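The idea of capturing parameters and results for each run, then searching and comparing them, can be illustrated with a small in-memory sketch. This is not the SageMaker Experiments API: `Run`, `ExperimentLog`, and their methods are hypothetical names, and the parameter and metric values are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Run:
    """One training run: its inputs and the metrics it produced."""
    name: str
    params: Dict[str, Any]
    metrics: Dict[str, float] = field(default_factory=dict)


class ExperimentLog:
    """Hypothetical in-memory stand-in for an experiment tracker."""

    def __init__(self) -> None:
        self.runs: List[Run] = []

    def record(self, name: str, params: Dict[str, Any],
               metrics: Dict[str, float]) -> Run:
        """Store a copy of the run's parameters and results."""
        run = Run(name, dict(params), dict(metrics))
        self.runs.append(run)
        return run

    def search(self, predicate: Callable[[Run], bool]) -> List[Run]:
        """Find earlier runs by their characteristics."""
        return [r for r in self.runs if predicate(r)]

    def best(self, metric: str) -> Run:
        """Return the run with the highest value of the given metric."""
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run reports {metric!r}")
        return max(scored, key=lambda r: r.metrics[metric])


# Illustrative values only.
log = ExperimentLog()
log.record("run-1", {"learning_rate": 0.1, "max_depth": 4}, {"validation_accuracy": 0.81})
log.record("run-2", {"learning_rate": 0.01, "max_depth": 4}, {"validation_accuracy": 0.86})
log.record("run-3", {"learning_rate": 0.01, "max_depth": 8}, {"validation_accuracy": 0.84})
low_lr_runs = log.search(lambda r: r.params["learning_rate"] == 0.01)
top_run = log.best("validation_accuracy")
```

A structured log like this replaces the spreadsheet: every run carries its own inputs, so comparing results never depends on someone remembering which settings produced which number.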
Analyze, detect, and alert on problems for machine learning using Amazon SageMaker Debugger
The ML training process is largely opaque, and the time it takes to train a model can be long and difficult to optimize. As a result, it is often difficult to interpret and explain models. Amazon SageMaker Debugger makes the training process more transparent by automatically capturing real-time data during training, such as training and validation metrics, confusion matrices, and learning gradients, to help improve model accuracy.
The metrics from SageMaker Debugger can be visualized in SageMaker Studio for easy understanding. SageMaker Debugger can also generate warnings and remediation advice when common training problems are detected. With SageMaker Debugger, you can interpret how a model is working, representing an early step towards model explainability.
Analyze and debug anomalies. For example, a neural network stops learning when its gradients vanish. SageMaker Debugger identifies vanishing gradients so you can remediate before training is impacted.
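To make the vanishing-gradient check concrete, here is a minimal sketch of the underlying idea: flag any layer whose gradient norm falls below a small threshold. It does not reproduce how SageMaker Debugger implements its checks; the `vanishing_layers` helper, the threshold value, and the sample gradients are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence


def l2_norm(values: Sequence[float]) -> float:
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(v * v for v in values))


def vanishing_layers(gradients: Dict[str, Sequence[float]],
                     threshold: float = 1e-7) -> List[str]:
    """Return the names of layers whose gradient norm is below the threshold.

    The default threshold is an arbitrary illustrative choice.
    """
    return [name for name, grad in gradients.items()
            if l2_norm(grad) < threshold]


# Made-up gradients for one training step, keyed by layer name.
step_gradients = {
    "layer1": [1e-9, -2e-9],
    "layer2": [0.03, -0.01],
    "layer3": [0.2, 0.1],
}
flagged = vanishing_layers(step_gradients)
```

In this sample only `layer1` is flagged, since its gradient norm is on the order of 1e-9 while the other layers still receive a usable signal.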
AWS is the best place to run TensorFlow
AWS’ TensorFlow optimizations provide near-linear scaling efficiency across hundreds of GPUs, so you can operate at cloud scale without a lot of processing overhead and train more accurate, more sophisticated models in much less time.
Lower training costs by up to 90%
Amazon SageMaker provides Managed Spot Training to help you reduce training costs by up to 90%. This capability uses Amazon EC2 Spot instances, which run on spare AWS compute capacity. Training jobs run automatically when compute capacity becomes available and are made resilient to interruptions caused by changes in capacity, allowing you to save cost when you have flexibility in when to run training jobs.
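As a rough sketch, the Spot-related part of a training job request can be assembled as a plain dictionary. The field names below (`EnableManagedSpotTraining`, `StoppingCondition`, `CheckpointConfig`) follow the SageMaker CreateTrainingJob request as we understand it, so check them against the current API reference. A complete request needs further fields that are omitted here, and the helper names and the bucket URI are hypothetical.

```python
from typing import Any, Dict


def spot_training_settings(max_runtime_s: int, max_wait_s: int,
                           checkpoint_s3_uri: str) -> Dict[str, Any]:
    """Build only the Spot-related fields of a training job request.

    max_wait_s covers both waiting for capacity and running, so it must
    be at least max_runtime_s.
    """
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be at least max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted job resume instead of starting over.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Percentage saved when fewer seconds are billed than were trained."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return 100.0 * (1.0 - billable_seconds / training_seconds)


# Hypothetical bucket name; one hour of training, up to two hours in total.
settings = spot_training_settings(3600, 7200, "s3://example-bucket/checkpoints/")
example_savings = savings_percent(training_seconds=1000, billable_seconds=250)
```

The savings calculation shows where the "up to" in the headline figure comes from: the actual saving depends on how many billable seconds a particular job ends up with.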
Deploy machine learning models
Amazon SageMaker makes it easy to deploy your trained model into production with a single click so that you can start generating predictions for real-time or batch data. You can deploy your model with one click onto auto-scaling Amazon ML instances across multiple Availability Zones for high redundancy. Just specify the type of instance and the maximum and minimum number desired, and SageMaker takes care of the rest. SageMaker launches the instances, deploys your model, and sets up the secure HTTPS endpoint for your application. Your application simply needs to include an API call to this endpoint to achieve low latency, high throughput inference. This architecture allows you to integrate your new models into your application in minutes because model changes no longer require application code changes.
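The API call to the endpoint might look like the following sketch. It assumes a client created with `boto3.client("sagemaker-runtime")` and a model container that accepts and returns JSON. The `predict` helper, the `{"features": ...}` payload shape, and the endpoint name in the usage comment are hypothetical, since the real payload format depends on the model you deploy.

```python
import json
from typing import Any, List


def predict(runtime_client: Any, endpoint_name: str,
            features: List[float]) -> Any:
    """Send one JSON request to a deployed endpoint and decode the JSON reply.

    runtime_client is expected to behave like boto3's "sagemaker-runtime"
    client; passing it in keeps this function easy to test with a stub.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"features": features}),  # assumed payload shape
    )
    # The response body is a stream, so read it before decoding.
    return json.loads(response["Body"].read())


# Usage against a real endpoint (requires AWS credentials and boto3):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   result = predict(client, "my-endpoint", [1.0, 2.0, 3.0])
```

Because the application only knows the endpoint name, a retrained model can be swapped in behind the same endpoint without touching this code.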
Keep models accurate over time using Amazon SageMaker Model Monitor
Amazon SageMaker Model Monitor allows developers to detect and remediate concept drift. One of the biggest factors affecting the accuracy of a deployed model is a difference between the data used to generate predictions and the data used to train the model. For example, changing economic conditions could drive new interest rates, affecting home purchasing predictions. This is called concept drift, whereby the patterns the model uses to make predictions no longer apply. SageMaker Model Monitor automatically detects concept drift in deployed models and provides detailed alerts that help identify the source of the problem. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio. From inside SageMaker Studio you can configure data to be collected, how to view it, and when to receive alerts.
Monitor models in production. For example, view charts with important model features and summary statistics, watch them over time and compare with the features used in training. Some features drift when the model is running in production, which can indicate the need to retrain your model.
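Comparing production features against training features can be sketched with simple summary statistics. The check below is a deliberately simple stand-in, not the method SageMaker Model Monitor uses: it flags a feature when its production mean has moved more than a chosen number of training standard deviations. The `drifted_features` helper, the `max_shift` default, and the sample values are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def drifted_features(baseline: Dict[str, Sequence[float]],
                     live: Dict[str, Sequence[float]],
                     max_shift: float = 3.0) -> List[str]:
    """Flag features whose live mean moved more than `max_shift` baseline
    standard deviations away from the training mean."""
    flagged = []
    for name, train_values in baseline.items():
        # Skip features that cannot be compared.
        if name not in live or len(train_values) < 2 or len(live[name]) == 0:
            continue
        spread = stdev(train_values)
        shift = abs(mean(live[name]) - mean(train_values))
        if spread == 0:
            # A constant training feature drifts as soon as its mean moves.
            if shift > 0:
                flagged.append(name)
        elif shift / spread > max_shift:
            flagged.append(name)
    return flagged


# Made-up values echoing the interest rate example.
training = {"interest_rate": [3.0, 3.2, 3.1, 2.9, 3.0], "rooms": [3, 4, 3, 5, 4]}
production = {"interest_rate": [6.8, 7.0, 7.1], "rooms": [4, 3, 4]}
drifted = drifted_features(training, production)
```

With these sample values, `interest_rate` is flagged while `rooms` is not, which is the signal that would prompt a closer look and possibly retraining.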
Validate predictions through human review
Many machine learning applications require humans to review low confidence predictions to ensure the results are correct. But building human review into the workflow can be time-consuming and expensive because it involves complex processes. Amazon Augmented AI is a service that makes it easy to build the workflows required for human review of ML predictions. Augmented AI provides built-in human review workflows for common machine learning use cases. You can also create your own workflows for models built on Amazon SageMaker. With Augmented AI, you can allow human reviewers to step in when a model is unable to make high confidence predictions.
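The routing decision at the heart of a human review workflow can be sketched as a confidence threshold. This illustrates the concept only and does not use the Amazon Augmented AI API: the `Prediction` record, the `route_predictions` helper, the 0.8 threshold, and the sample predictions are all hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Prediction(NamedTuple):
    """A model output with the model's own confidence in [0, 1]."""
    item_id: str
    label: str
    confidence: float


def route_predictions(predictions: Iterable[Prediction],
                      threshold: float = 0.8
                      ) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into (accepted automatically, sent to human review).

    Predictions at or above the threshold are accepted; the rest are
    queued for a reviewer.
    """
    accepted: List[Prediction] = []
    needs_review: List[Prediction] = []
    for p in predictions:
        (accepted if p.confidence >= threshold else needs_review).append(p)
    return accepted, needs_review


# Made-up predictions for illustration.
batch = [
    Prediction("doc-1", "invoice", 0.97),
    Prediction("doc-2", "receipt", 0.55),
    Prediction("doc-3", "invoice", 0.80),
]
accepted, needs_review = route_predictions(batch)
```

Raising the threshold sends more items to reviewers and lowers the risk of wrong automatic results, so the value is a cost and quality trade-off to tune for each use case.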
Integrate with Kubernetes for orchestration and management
Kubernetes is an open source system used to automate the deployment, scaling, and management of containerized applications. Many customers want to use the fully managed capabilities of Amazon SageMaker for machine learning, but also want platform and infrastructure teams to continue using Kubernetes for orchestration and managing pipelines. SageMaker lets users train and deploy models in SageMaker using Kubernetes operators and pipelines. Kubernetes users can access all of SageMaker’s capabilities natively from Kubeflow.
Lower machine learning inference costs by up to 75% using Amazon Elastic Inference
In most deep learning applications, making predictions using a trained model (a process called inference) can be a major factor in the compute costs of the application. A full GPU instance may be over-sized for model inference. In addition, it can be difficult to optimize the GPU, CPU, and memory needs of your deep learning application. Amazon Elastic Inference solves these problems by allowing you to attach just the right amount of GPU-powered inference acceleration to any Amazon EC2 or Amazon SageMaker instance type or Amazon ECS task with no code changes. With Elastic Inference, you can choose the instance type that is best suited to the overall CPU and memory needs of your application, and then separately configure the amount of inference acceleration that you need, so you use resources efficiently and reduce the cost of running inference.