Host the Spark UI on Amazon SageMaker Studio
You can run Spark applications interactively from
Alternatively, if you need more control over the environment, you can use a pre-built SageMaker Spark container to run Spark applications as batch jobs on a fully managed distributed cluster with
Finally, you can run Spark applications by connecting Studio notebooks with
All these options allow you to generate and store Spark event logs and analyze them through the web-based user interface commonly named the Spark UI.
In this post, we share a
Solution overview
The solution integrates Spark History Server into the Jupyter Server app in SageMaker Studio. This allows users to access Spark logs directly from the SageMaker Studio IDE. The integrated Spark History Server supports the following:
- Accessing logs generated by SageMaker Processing Spark jobs
- Accessing logs generated by AWS Glue Spark applications
- Accessing logs generated by self-managed Spark clusters and Amazon EMR
A utility command line interface (CLI) called sm-spark-cli is also provided for interacting with the Spark UI from the SageMaker Studio system terminal. The sm-spark-cli enables managing Spark History Server without leaving SageMaker Studio.
The solution consists of shell scripts that perform the following actions:
- Install Spark on the Jupyter Server for SageMaker Studio user profiles or for a SageMaker Studio shared space
- Install the sm-spark-cli for a user profile or shared space
Install the Spark UI manually in a SageMaker Studio domain
To host Spark UI on SageMaker Studio, complete the following steps:
- Choose System terminal from the SageMaker Studio launcher.
- Run the following commands in the system terminal:
The commands will take a few seconds to complete.
- When the installation is complete, you can start the Spark UI by using the provided sm-spark-cli and access it from a web browser by running the following code:
sm-spark-cli start s3://DOC-EXAMPLE-BUCKET/<SPARK_EVENT_LOGS_LOCATION>
You configure the S3 location where SageMaker Processing, AWS Glue, or Amazon EMR store their event logs when you run the Spark application.
For SageMaker Studio notebooks and AWS Glue Interactive Sessions, you can set up the Spark event log location directly from the notebook by using the sparkmagic kernel.
The sparkmagic kernel contains a set of tools for interacting with remote Spark clusters through notebooks. It offers magic commands (%spark, %sql) to run Spark code, perform SQL queries, and configure Spark settings like executor memory and cores.
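As a minimal sketch of what such a notebook configuration could look like, the following Python snippet builds the text of a sparkmagic %%configure cell that turns on Spark event logging. The properties spark.eventLog.enabled and spark.eventLog.dir are standard Spark settings. The helper name build_configure_cell and the example prefix are hypothetical, and the accepted keys can differ between kernels (AWS Glue Interactive Sessions use their own parameters), so treat this as an illustration rather than the exact cell to run.

```python
import json


def build_configure_cell(event_log_uri: str) -> str:
    """Return the text of a sparkmagic %%configure cell that enables event logs.

    The helper name and the event_log_uri value are illustrative assumptions.
    """
    session_config = {
        "conf": {
            # Standard Spark properties controlling event log generation.
            "spark.eventLog.enabled": "true",
            "spark.eventLog.dir": event_log_uri,
        }
    }
    # The -f flag asks sparkmagic to recreate the session with these settings.
    return "%%configure -f\n" + json.dumps(session_config, indent=2)


if __name__ == "__main__":
    # Hypothetical prefix inside the placeholder bucket used in this post.
    print(build_configure_cell("s3://DOC-EXAMPLE-BUCKET/spark-event-logs/"))
```

Pasting the printed text into the first cell of a notebook, before any Spark code runs, is one way to apply the settings to the session.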
For the SageMaker Processing job, you can configure the Spark event log location directly from the SageMaker Python SDK.
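The sketch below illustrates one way this might look with the PySparkProcessor class, whose run method accepts a spark_event_logs_s3_uri argument. The job name, framework version, instance settings, role ARN, and script path are placeholder assumptions, not values taken from this post.

```python
def event_log_uri(bucket: str, prefix: str) -> str:
    """Build the S3 URI where Spark event logs should be written."""
    return f"s3://{bucket}/{prefix.strip('/')}/"


def run_spark_job(role_arn: str, script_path: str, log_uri: str) -> None:
    """Submit a PySpark script as a SageMaker Processing job (illustrative)."""
    # Imported lazily so event_log_uri works without the SageMaker SDK installed.
    from sagemaker.spark.processing import PySparkProcessor

    processor = PySparkProcessor(
        base_job_name="spark-ui-demo",  # hypothetical job name
        framework_version="3.3",        # assumed Spark container version
        role=role_arn,
        instance_count=2,               # assumed cluster size
        instance_type="ml.m5.xlarge",   # assumed instance type
    )
    # spark_event_logs_s3_uri tells the job where to publish its event logs.
    processor.run(submit_app=script_path, spark_event_logs_s3_uri=log_uri)


if __name__ == "__main__":
    print(event_log_uri("DOC-EXAMPLE-BUCKET", "spark-event-logs"))
```

The URI passed as spark_event_logs_s3_uri is the same location you would later hand to sm-spark-cli start.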
Refer to the Amazon Web Services documentation for additional information:
- For SageMaker Processing, refer to PySparkProcessor
- For AWS Glue Interactive Sessions, refer to Configuring the Spark UI (console)
- For Amazon EMR, refer to Configure an output location
You can choose the generated URL to access the Spark UI.
The following screenshot shows an example of the Spark UI.
You can check the status of the Spark History Server by using the sm-spark-cli status command in the Studio System terminal.
You can also stop the Spark History Server when needed.
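If you want to drive the CLI from a script rather than typing the commands, a thin wrapper along the following lines could work. It is only a sketch: it covers the two subcommands shown in this post (start and status), and the function names build_command and run_cli are hypothetical.

```python
import shutil
import subprocess
from typing import List

# Only the subcommands that appear in this post are allowed here.
KNOWN_ACTIONS = ("start", "status")


def build_command(action: str, *args: str) -> List[str]:
    """Return the argument vector for an sm-spark-cli invocation."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["sm-spark-cli", action, *args]


def run_cli(action: str, *args: str) -> str:
    """Run sm-spark-cli and return its standard output."""
    if shutil.which("sm-spark-cli") is None:
        raise RuntimeError("sm-spark-cli is not installed or not on PATH")
    result = subprocess.run(
        build_command(action, *args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, run_cli("start", "s3://DOC-EXAMPLE-BUCKET/<SPARK_EVENT_LOGS_LOCATION>") followed by run_cli("status") mirrors the manual steps above.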
Automate the Spark UI installation for users in a SageMaker Studio domain
As an IT admin, you can automate the installation for SageMaker Studio users by using a lifecycle configuration.
You can create a lifecycle configuration from the
From a terminal configured with the
After Jupyter Server restarts, the Spark UI and the sm-spark-cli will be available in your SageMaker Studio environment.
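For admins who prefer the AWS SDK for Python, registering a lifecycle configuration and attaching it to a domain might look like the sketch below. It assumes boto3 is installed and credentials are configured. The function names, the configuration name, and the script text are hypothetical, and the actual installation script is not reproduced here.

```python
import base64


def encode_script(script_text: str) -> str:
    """Base64-encode a shell script, as the lifecycle configuration API expects."""
    return base64.b64encode(script_text.encode("utf-8")).decode("utf-8")


def register_lifecycle_config(domain_id: str, name: str, script_text: str) -> str:
    """Create a JupyterServer lifecycle configuration and attach it to a domain."""
    # Imported lazily so encode_script works without boto3 installed.
    import boto3

    sm = boto3.client("sagemaker")
    response = sm.create_studio_lifecycle_config(
        StudioLifecycleConfigName=name,
        StudioLifecycleConfigContent=encode_script(script_text),
        StudioLifecycleConfigAppType="JupyterServer",
    )
    arn = response["StudioLifecycleConfigArn"]
    # Caution: this sets the list of attached configurations to this single ARN;
    # merge in any configurations already attached before calling update_domain.
    sm.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            "JupyterServerAppSettings": {
                "DefaultResourceSpec": {
                    "InstanceType": "system",
                    "LifecycleConfigArn": arn,
                },
                "LifecycleConfigArns": [arn],
            }
        },
    )
    return arn
```

The script runs the next time a user's Jupyter Server app starts, which is why existing apps need a restart before the Spark UI shows up.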
Clean up
In this section, we show you how to clean up the Spark UI in a SageMaker Studio domain, either manually or automatically.
Manually uninstall the Spark UI
To manually uninstall the Spark UI in SageMaker Studio, complete the following steps:
- Choose System terminal in the SageMaker Studio launcher.
- Run the following commands in the system terminal:
Uninstall the Spark UI automatically for all SageMaker Studio user profiles
To automatically uninstall the Spark UI in SageMaker Studio for all user profiles, complete the following steps:
- On the SageMaker console, choose Domains in the navigation pane, then choose the SageMaker Studio domain.
- On the domain details page, navigate to the Environment tab.
- Select the lifecycle configuration for the Spark UI on SageMaker Studio.
- Choose Detach.
- Delete and restart the Jupyter Server apps for the SageMaker Studio user profiles.
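The last step can be tedious for domains with many user profiles. As a rough sketch, assuming boto3 with suitable permissions and that each profile uses the default Jupyter Server app name "default", the apps could be deleted in a loop as shown below. The function names are hypothetical, and the apps are recreated when users next open SageMaker Studio.

```python
from typing import Dict, Iterable, List


def profile_names(pages: Iterable[Dict]) -> List[str]:
    """Collect user profile names from paginated list_user_profiles responses."""
    return [
        profile["UserProfileName"]
        for page in pages
        for profile in page.get("UserProfiles", [])
    ]


def delete_jupyter_server_apps(domain_id: str) -> List[str]:
    """Delete the default JupyterServer app of every user profile in a domain."""
    # Imported lazily so profile_names works without boto3 installed.
    import boto3

    sm = boto3.client("sagemaker")
    paginator = sm.get_paginator("list_user_profiles")
    names = profile_names(paginator.paginate(DomainIdEquals=domain_id))
    deleted = []
    for name in names:
        try:
            sm.delete_app(
                DomainId=domain_id,
                UserProfileName=name,
                AppType="JupyterServer",
                AppName="default",  # assumed app name
            )
            deleted.append(name)
        except sm.exceptions.ResourceNotFound:
            # The profile has no JupyterServer app to delete.
            continue
    return deleted
```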
Conclusion
In this post, we shared a solution you can use to quickly install the Spark UI on SageMaker Studio. With the Spark UI hosted on SageMaker, machine learning (ML) and data engineering teams can use scalable cloud compute to access and analyze Spark logs from anywhere and speed up their project delivery. IT admins can standardize and expedite the provisioning of the solution in the cloud and avoid proliferation of custom development environments for ML projects.
All the code shown as part of this post is available in the
About the Authors
Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering experience and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the Amazon Web Services Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of Amazon Web Services services. In his free time, Giuseppe enjoys playing football.
Bruno Pistone is an AI/ML Specialist Solutions Architect for Amazon Web Services based in Milan. He works with customers of any size, helping them understand their technical needs and design AI and ML solutions that make the best use of the Amazon Web Services Cloud and the Amazon Machine Learning stack. His field of expertise includes end-to-end machine learning, machine learning industrialization, and generative AI. He enjoys spending time with his friends, exploring new places, and traveling to new destinations.