Posted On: Mar 7, 2023

We’re pleased to announce the launch of Amazon Glue version 4.0, a new version of Amazon Glue that accelerates data integration workloads on Amazon Web Services. Amazon Glue 4.0 upgrades the Spark engines to Apache Spark 3.3.0 and Python 3.10, giving customers the latest Spark and Python releases so they can develop, run, and scale their data integration workloads and get insights faster.

Amazon Glue is a serverless, scalable data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources. Amazon Glue 4.0 adds support for built-in Pandas APIs as well as for the Apache Hudi, Apache Iceberg, and Delta Lake formats, giving you more options for analyzing and storing your data. It upgrades the connectors for native Amazon Glue database sources such as RDS, MySQL, and SQL Server, which simplifies connections to common database sources. Amazon Glue 4.0 also adds native support for the new Cloud Shuffle Storage Plugin for Apache Spark, which helps customers scale their disk usage at runtime. It enables Adaptive Query Execution, which dynamically optimizes your queries as they run. Finally, Amazon Glue 4.0 improves the developer experience by adding more context to error messages. As with Amazon Glue 3.0, customers pay only for the resources they use.
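As a rough illustration of opting a job into the new version, the sketch below builds the request parameters for a Glue job that targets version 4.0 and asks for the Hudi, Iceberg, and Delta Lake libraries. It assumes the `--datalake-formats` job parameter and the standard `create_job` request fields. The helper name, job name, role ARN, script path, and worker settings are hypothetical placeholders, not values from this announcement.

```python
from __future__ import annotations

from typing import Any


def build_glue4_job_definition(
    name: str,
    role_arn: str,
    script_location: str,
    datalake_formats: tuple[str, ...] = (),
) -> dict[str, Any]:
    """Build keyword arguments for a Glue `create_job` request on Glue 4.0.

    Hypothetical helper for illustration; it only assembles a dictionary
    and does not call any cloud API.
    """
    default_arguments: dict[str, str] = {}
    if datalake_formats:
        # Assumption: data lake libraries are requested as a comma-separated
        # list through the `--datalake-formats` job parameter.
        default_arguments["--datalake-formats"] = ",".join(datalake_formats)

    return {
        "Name": name,
        "Role": role_arn,
        "GlueVersion": "4.0",  # selects the Spark 3.3.0 / Python 3.10 runtime
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        "DefaultArguments": default_arguments,
        # Placeholder capacity settings; size these for your own workload.
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


# All values below are placeholders.
job = build_glue4_job_definition(
    name="example-glue4-job",
    role_arn="arn:aws-cn:iam::123456789012:role/ExampleGlueRole",
    script_location="s3://example-bucket/scripts/job.py",
    datalake_formats=("hudi", "iceberg", "delta"),
)
print(job["GlueVersion"], job["DefaultArguments"])
```

If boto3 is installed and credentials are configured, a dictionary like this could be passed as `boto3.client("glue").create_job(**job)`. Check the parameter names against the current documentation before relying on them.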

Amazon Glue 4.0 is generally available today in the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD.

To learn more, visit our documentation.