Skip to main content

Amazon SageMaker

Amazon SageMaker FAQs

General

Open all

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high quality models.

For a list of the supported Amazon SageMaker Amazon Web Services regions, please visit the Amazon Web Services Region Table for all Amazon Web Services global infrastructure. Also for more information, see Regions and Endpoints in the Amazon Web Services General Reference.

Amazon SageMaker is designed for high availability. There are no maintenance windows or scheduled downtimes. SageMaker APIs run in Amazon’s proven, high-availability data centers, with service stack replication configured across three facilities in each Amazon Web Services region to provide fault tolerance in the event of a server failure or Availability Zone outage.

Amazon SageMaker ensures that ML model artifacts and other system artifacts are encrypted in transit and at rest. Requests to the SageMaker API and console are made over a secure (SSL) connection. You pass Amazon Identity and Access Management roles to SageMaker to provide permissions to access resources on your behalf for training and deployment. You can use encrypted S3 buckets for model artifacts and data, as well as pass a KMS key to SageMaker notebooks, training jobs, and endpoints, to encrypt the attached ML storage volume.

Amazon SageMaker stores code in ML storage volumes, secured by security groups and optionally encrypted at rest.

You pay for ML compute, storage, and data processing resources you use for hosting the notebook, training the model, performing predictions, and logging the outputs. Amazon SageMaker allows you to select the number and type of instance used for the hosted notebook, training, and model hosting. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments. See the Amazon SageMaker pricing page for details.

Amazon SageMaker provides a full end-to-end workflow, but you can continue to use your existing tools with SageMaker. You can easily transfer the results of each stage in and out of SageMaker as your business requirements dictate.

Amazon SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps. SageMaker Studio gives you complete access, control, and visibility into each step required to build, train, and deploy models. You can quickly upload data, create new notebooks, train and tune models, move back and forth between steps to adjust experiments, compare results, and deploy models to production all in one place, making you much more productive. All ML development activities including notebooks, experiment management, automatic model creation, debugging and profiling, and model drift detection can be performed within the unified SageMaker Studio visual interface.

Amazon SageMaker Autopilot is the industry’s first automated machine learning capability that gives you complete control and visibility into your ML models. SageMaker Autopilot automatically inspects raw data, applies feature processors, picks the best set of algorithms, trains and tunes multiple models, tracks their performance, and then ranks the models based on performance, all with just a few clicks. The result is the best performing model that you can deploy at a fraction of the time normally required to train the model. You get full visibility into how the model was created and what’s in it and SageMaker Autopilot integrates with Amazon SageMaker Studio. You can explore up to 50 different models generated by SageMaker Autopilot inside SageMaker Studio so it’s easy to pick the best model for your use case. SageMaker Autopilot can be used by people without machine learning experience to easily produce a model or it can be used by experienced developers to quickly develop a baseline model on which teams can further iterate.

While Amazon Personalize and Amazon Forecast specifically target at personalized recommendation and forecasting use cases, Amazon SageMaker Autopilot is a generic automatic machine learning solution for classification and regression problems, such as fraud detection, churn analysis, and targeted marketing. Personalize and Forecast focus on simplifying end to end experience by offering training and model hosting in a bundle. You can train models using Amazon SageMaker Autopilot and get full access to the models as well as the pipelines that generated the models. They can then deploy the models to the hosting environment of their choice, or further iterate to improve model quality.

Amazon SageMaker Autopilot supports 2 built-in algorithms at launch: XGBoost and Linear Learner.

Yes. All Amazon SageMaker Autopilot built-in algorithms support distributed training out of the box.

Yes. You can stop a job at any time. When an Amazon SageMaker Autopilot job is stopped, all ongoing trials will be stopped and no new trial will be started.

Build Models

Open all

Currently, Jupyter notebooks are supported.

Amazon SageMaker Notebooks provide one-click Jupyter notebooks that you can start working within seconds. The underlying compute resources are fully elastic, so you can easily dial up or down the available resources and the changes take place automatically in the background without interrupting your work. SageMaker also enables one-click sharing of notebooks. All code dependencies are automatically captured, so you can easily collaborate with others. They’ll get the exact same notebook, saved in the same place.

With SageMaker Notebooks you can sign in with your corporate credentials using SSO and start working with notebooks within seconds. Sharing notebooks within and across teams is easy, since the dependencies needed to run a notebook are automatically tracked in environments that are encapsulated with the notebook as it is shared.

Amazon SageMaker Notebooks give you access to all SageMaker features, such as distributed training, batch transform, hosting, and experiment management. You can access other services such as datasets in Amazon S3, Amazon Redshift, Amazon Glue, or Amazon EMR from SageMaker Notebooks.

Amazon SageMaker Ground Truth provides automated data labeling using machine learning. SageMaker Ground Truth will first select a random sample of data and send it to Amazon Mechanical Turk to be labeled. The results are then used to train a labeling model that attempts to label a new sample of raw data automatically. The labels are committed when the model can label the data with a confidence score that meets or exceeds a threshold you set. Where the confidence score falls below your threshold, the data is sent to human labelers. Some of the data labeled by humans is used to generate a new training dataset for the labeling model, and the model is automatically retrained to improve its accuracy. This process repeats with each sample of raw data to be labeled. The labeling model becomes more capable of automatically labeling raw data with each iteration, and less data is routed to humans.

Train Models

Open all

Amazon SageMaker Experiments helps you organize and track iterations to machine learning models. SageMaker Experiments helps you manage iterations by automatically capturing the input parameters, configurations, and results, and storing them as ‘experiments’. You can work within the visual interface of SageMaker Studio, where you can browse active experiments, search for previous experiments by their characteristics, review previous experiments with their results, and compare experiment results visually.

Amazon SageMaker Debugger makes the training process more transparent by automatically capturing real-time metrics during training such as training and validation, confusion matrices, and learning gradients to help improve model accuracy.

The metrics from SageMaker Debugger can be visualized in Amazon SageMaker Studio for easy understanding. SageMaker Debugger can also generate warnings and remediation advice when common training problems are detected. With SageMaker Debugger, you can interpret how a model is working, representing an early step towards model explainability.

Managed Spot Training with Amazon SageMaker lets you train your machine learning models using Amazon EC2 Spot instances, while reducing the cost of training your models by up to 90%.

You enable the Managed Spot Training option when submitting your training jobs and you also specify how long you want to wait for Spot capacity. Amazon SageMaker will then use Amazon EC2 Spot instances to run your job and manages the Spot capacity. You have full visibility into the status of your training job, both while they are running and while they are waiting for capacity.

Managed Spot Training is ideal when you have flexibility with your training runs and when you want to minimize the cost of your training jobs. With Managed Spot Training, you can reduce the cost of training your machine learning models by up to 90%.

Managed Spot Training uses Amazon EC2 Spot instances for training, and these instances can be pre-empted when Amazon Web Services needs capacity. As a result, Managed Spot Training jobs can run in small increments as and when capacity becomes available. The training jobs need not be restarted from scratch when there is an interruption as Amazon SageMaker can resume the training jobs using the latest model checkpoint. The built-in frameworks and the built-in computer vision algorithms with SageMaker enable periodic checkpoints, and you can enable checkpoints with custom models.

We recommend periodic checkpoints as a general best practice for long running training jobs. This prevents your Managed Spot Training jobs from restarting if capacity is pre-empted. When you enable checkpoints, Amazon SageMaker resumes your Managed Spot Training jobs from the last checkpoint.

Once a Managed Spot Training job is completed, you can see the savings in the Amazon Web Services management console and also calculate the cost savings as the percentage difference between the duration for which the training job ran and the duration for which you were billed.

Regardless of how many times your Managed Spot Training jobs are interrupted, you are charged only once for the duration for which the data was downloaded.

Managed Spot Training can be used with all instances supported in Amazon SageMaker.

Managed Spot Training is supported on all Amazon Web Services regions where Amazon SageMaker is currently available .

There are no fixed limits to the size of the dataset you can use for training models with Amazon SageMaker.

You can specify the Amazon S3 location of your training data as part of creating a training job.

Amazon SageMaker includes built-in algorithms for linear regression, logistic regression, k-means clustering, principal component analysis, factorization machines, neural topic modeling, latent dirichlet allocation, gradient boosted trees, sequence2sequence, time series forecasting, word2vec, and image classification. SageMaker also provides optimized Apache MXNet, Tensorflow, Chainer, PyTorch, Gluon, Keras, Horovod, Scikit-learn, and Deep Graph Library containers. In addition, Amazon SageMaker supports your custom training algorithms provided through a Docker image adhering to the documented specification.

Most machine learning algorithms expose a variety of parameters that control how the underlying algorithm operates. Those parameters are generally referred to as hyperparameters and their values affect the quality of the trained models. Automatic model tuning is the process of finding a set of hyperparameters for an algorithm that can yield an optimal model.

You can run automatic model tuning in Amazon SageMaker on top of any algorithm as long as it’s scientifically feasible, including built-in SageMaker algorithms, deep neural networks, or arbitrary algorithms you bring to SageMaker in the form of Docker images.

Not at this time. The best model tuning performance and experience is within Amazon SageMaker.

Currently, our algorithm for tuning hyperparameters is a customized implementation of Bayesian Optimization. It aims to optimize a customer specified objective metric throughout the tuning process. Specifically, it checks the object metric of completed training jobs, and leverages the knowledge to infer the hyperparameter combination for the next training job.

No. How certain hyperparameters impact the model performance depends on various factors and it is hard to definitively say one hyperparameter is more important than the others and thus needs to be tuned. For built-in algorithms within Amazon SageMaker, we do call out whether or not a hyperparameter is tunable.

The length of time for a hyperparameter tuning job depends on multiple factors including the size of the data, the underlying algorithm, and the values of the hyperparameters. Additionally, customers can choose the number of simultaneous training jobs and total number of training jobs. All these choices affect how long a hyperparameter tuning job can last.

Not at this time. Right now, you need to specify a single objective metric to optimize or change your algorithm code to emit a new metric, which is a weighted average between two or more useful metrics, and have the tuning process optimize towards that objective metric.

There is no charge for a hyperparameter tuning job itself. You will be charged by the training jobs that are launched by the hyperparameter tuning job, based on model training pricing .

Amazon SageMaker Autopilot automates everything in a typical machine learning workflow, including feature preprocessing, algorithm selection, and hyperparameter tuning, while specifically focusing on classification and regression use cases. Automatic Model Tuning, on the other hand, is designed to tune any model, no matter it is based on built-in algorithms, deep learning frameworks, or custom containers. In exchange for the flexibility, you have to manually pick the specific algorithm, determine the hyperparameters to tune, and corresponding search ranges.

Reinforcement learning is a machine learning technique that enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences.

Yes, you can train reinforcement learning models in Amazon SageMaker in addition to supervised and unsupervised learning models.

Though both supervised and reinforcement learning use mapping between input and output, unlike supervised learning where the feedback provided to the agent is correct set of actions for performing a task, reinforcement learning uses a delayed feedback where reward signals are optimized to ensure a long-term goal through a sequence of actions.

While the goal of supervised learning techniques is to find the right answer based on the patterns in the training data and the goal of unsupervised learning techniques is to find similarities and differences between data points. In contrast, the goal of reinforcement learning techniques is to learn how to achieve a desired outcome even when it is not clear how to accomplish that outcome. As a result, RL is more suited to enabling intelligent applications where an agent can make autonomous decisions such as robotics, autonomous vehicles, HVAC, industrial control, and more.

Amazon SageMaker RL supports a number of different environments for training reinforcement learning models. You can use Amazon Web Services services such as Amazon RoboMaker, open source environments or custom environments developed using Open AI Gym interfaces, or commercial simulation environments such as MATLAB and SimuLink.

No, Amazon SageMaker RL includes RL toolkits such as Coach and Ray RLLib that offer implementations of RL agent algorithms such as DQN, PPO, A3C, and many more.

Yes, you can bring your own RL libraries and algorithm implementations in Docker Containers and run those in Amazon SageMaker RL.

Yes. You can even select a heterogeneous cluster where the training can run on a GPU instance and the simulations can run on multiple CPU instances.

Deploy Models

Open all

Amazon SageMaker Model Monitor allows developers to detect and remediate concept drift. SageMaker Model Monitor automatically detects concept drift in deployed models and provides detailed alerts that help identify the source of the problem. All models trained in SageMaker automatically emit key metrics that can be collected and viewed in SageMaker Studio. From inside SageMaker Studio you can configure data to be collected, how to view it, and when to receive alerts.

No. Amazon SageMaker operates the compute infrastructure on your behalf, allowing it to perform health checks, apply security patches, and do other routine maintenance. You can also deploy the model artifacts from training with custom inference code in your own hosting environment.

Amazon SageMaker hosting automatically scales to the performance needed for your application using Application Auto Scaling. In addition, you can manually change the instance number and type without incurring downtime through modifying the endpoint configuration.

Amazon SageMaker emits performance metrics to Amazon CloudWatch Metrics so you can track metrics, set alarms, and automatically react to changes in production traffic. In addition, Amazon SageMaker writes logs to Amazon Cloudwatch Logs to let you monitor and troubleshoot your production environment.

Amazon SageMaker can host any model that adheres to the documented specification for inference Docker images. This includes models created from Amazon SageMaker model artifacts and inference code.

Amazon SageMaker is designed to scale to a large number of transactions per second. The precise number varies based on the deployed model and the number and type of instances to which the model is deployed.

Batch Transform enables you to run predictions on large or small batch data. There is no need to break down the data set into multiple chunks or managing real-time endpoints. With a simple API, you can request predictions for a large number of data records and transform the data quickly and easily

Amazon SageMaker Neo enables machine learning models to train once and run anywhere in the cloud and at the edge. SageMaker Neo automatically optimizes models built with popular deep learning frameworks that can be used to deploy on multiple hardware platforms. Optimized models run up to two times faster and consume less than a tenth of the resources of typical machine learning models.

To get started with Amazon SageMaker Neo, you log into the Amazon SageMaker console, choose a trained model, follow the example to compile models, and deploy the resulting model onto your target hardware platform.

Amazon SageMaker Neo contains two major components – a compiler and a runtime. First, the Neo compiler reads models exported by different frameworks. It then converts the framework-specific functions and operations into a framework-agnostic intermediate representation. Next, it performs a series of optimizations. Then, the compiler generates binary code for the optimized operations and writes them to a shared object library. The compiler also saves the model definition and parameters into separate files. During execution, the Neo runtime loads the artifacts generated by the compiler -- model definition, parameters, and the shared object library to run the model.

No. You can train models elsewhere and use Neo to optimize them for Amazon SageMaker ML instances or Amazon IoT Greengrass supported devices.

Currently, Amazon SageMaker Neo supports the most popular deep learning models that power computer vision applications and the most popular decision tree models used in Amazon SageMaker today. Neo optimizes the performance of AlexNet, ResNet, VGG, Inception, MobileNet, SqueezeNet, and DenseNet models trained in MXNet and TensorFlow, and classification and random cut forest models trained in XGBoost.

Currently, Neo supports SageMaker ML.C5, ML.C4, ML.M5, ML.M4, ML.P3, and ML.P2 instances and Raspberry Pi, and Jetson TX1 and TX2 devices, and Greengrass devices-based Intel® Atom and Intel® Xeon CPUs, ARM Cortex-A CPUs, and Nvidia Maxwell and Pascal GPUs.

No. Developers can run models using the Amazon SageMaker Neo container without dependencies on the framework.

You pay for the use of the Amazon SageMaker ML instance that runs inference using Amazon SageMaker Neo.

To see a list of support regions, view the Amazon Web Services region table .