Posted On: Sep 28, 2021

Today, we’re pleased to announce Amazon Glue version 3.0, a new version of Amazon Glue that accelerates your data integration workloads in Amazon Web Services. Amazon Glue 3.0 introduces a performance-optimized Spark runtime that includes optimizations from Amazon Glue and Amazon EMR, and is based on open-source Apache Spark 3.1.1. The Amazon Glue 3.0 runtime optimizes both read and write access to Amazon Simple Storage Service (Amazon S3), using faster vectorized readers and Amazon S3 optimized output committers. It also optimizes access to the Amazon Glue Data Catalog through partition predicates. For highly partitioned datasets, Amazon Glue 3.0 improves execution speed by using partition indexes to filter out unnecessary partitions. The Amazon Glue 3.0 runtime is also fully integrated with Amazon Lake Formation, so you can secure data access at different granularities, such as database-, table-, column-, row-, and cell-level access control, using resource names and Amazon Lake Formation tag-based access control. Amazon Glue 3.0 also brings new capabilities that improve the user experience for monitoring, debugging, and tuning Spark applications. Spark 3.1.1 enables an improved Spark UI experience that includes new Spark executor memory metrics and Spark Structured Streaming metrics that are useful for Amazon Glue streaming jobs. Like Amazon Glue 2.0, Amazon Glue 3.0 reduces startup latency and improves overall job completion times.
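As an illustration of the partition predicates mentioned above, the following is a minimal sketch, not an official sample. It assumes a Glue job script in which a `GlueContext` is already available, and it passes a predicate through the `push_down_predicate` argument of `create_dynamic_frame.from_catalog`. The helper names `build_partition_predicate` and `read_filtered`, and the partition keys `year` and `month`, are hypothetical and chosen only for the example.

```python
def build_partition_predicate(partitions):
    """Build a SQL-style predicate string from partition key/value pairs.

    Hypothetical helper for illustration. Values are treated as strings and
    are not escaped, so do not pass untrusted input.
    """
    return " AND ".join(f"{key} = '{value}'" for key, value in partitions.items())


def read_filtered(glue_context, database, table_name, partitions):
    """Read only the matching partitions of a Data Catalog table.

    `glue_context` is assumed to be an awsglue.context.GlueContext created
    by the job script; the database and table names are supplied by the caller.
    """
    predicate = build_partition_predicate(partitions)
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        push_down_predicate=predicate,
    )
```

Inside a job, a call such as `read_filtered(glue_context, "my_database", "my_table", {"year": "2021", "month": "09"})` would request only the partitions that match both keys, where `my_database` and `my_table` are placeholder names.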

Amazon Glue 3.0 is available in every Region where Amazon Glue is available, including the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD. To learn more about this feature, visit the blog and the Amazon Glue User Guide.