Posted On: May 30, 2023

Amazon Glue now offers three new features: custom visual transforms, which let customers define, reuse, and share business-specific ETL logic among their teams; support for three open-source data lake storage frameworks (Apache Hudi, Apache Iceberg, and Delta Lake), which read and write data in Amazon Simple Storage Service (Amazon S3) in a transactionally consistent manner; and new connectors. Amazon Glue is a serverless, scalable data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources. With these new features, data engineers can write reusable transforms for the Amazon Glue visual job editor, write data in open data lake table formats, and connect to more native data stores.
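As a rough sketch of how a reusable transform fits together, the Python below pairs a transform function with a definition of its input form. In an Amazon Glue job the function would receive a `DynamicFrame` as `self` and be attached to `awsglue.DynamicFrame`. So that this sketch runs without a Glue environment, it substitutes a hypothetical `Frame` class holding a list of dict rows. The form fields (`name`, `displayName`, `functionName`, `parameters`, `type`) are modeled on the example in the Glue documentation and are an assumption to verify there, and `validate_inputs` is a hypothetical stand-in for the form validations Glue applies.

```python
import json

# Form definition that would live in the transform's JSON file.
# Assumption: field names are modeled on the Glue docs example; verify them.
TRANSFORM_CONFIG = json.loads("""
{
  "name": "custom_filter_state",
  "displayName": "Filter State",
  "description": "Keep only rows whose column matches a state postal code.",
  "functionName": "custom_filter_state",
  "parameters": [
    {"name": "colName", "displayName": "Column name", "type": "str"},
    {"name": "state", "displayName": "State postal code", "type": "str"}
  ]
}
""")


class Frame:
    """Hypothetical stand-in for awsglue.DynamicFrame: a list of dict rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def filter(self, f):
        # Return a new frame holding only the rows for which f(row) is true.
        return Frame(row for row in self.rows if f(row))


def custom_filter_state(self, colName, state):
    """Transform body: receives the frame as `self` plus the form inputs."""
    return self.filter(lambda row: row[colName] == state)


def validate_inputs(config, inputs):
    """Hypothetical form check: every declared "str" parameter must be a string."""
    for param in config["parameters"]:
        value = inputs.get(param["name"])
        if param["type"] == "str" and not isinstance(value, str):
            raise ValueError(f"{param['displayName']} must be a string")
    return inputs


# Attach the function as a method, mirroring how a Glue transform is
# attached with `DynamicFrame.custom_filter_state = custom_filter_state`.
setattr(Frame, TRANSFORM_CONFIG["functionName"], custom_filter_state)

frame = Frame([{"state": "CA", "id": 1}, {"state": "NY", "id": 2}])
args = validate_inputs(TRANSFORM_CONFIG, {"colName": "state", "state": "CA"})
print(frame.custom_filter_state(**args).rows)  # [{'state': 'CA', 'id': 1}]
```

Keeping the function and the form definition separate is what lets the same logic back both a drop-down entry in the visual editor and a direct method call in a code-based job.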

You define an Amazon Glue custom visual transform with Apache Spark code and a definition of its user input form. You can also specify validations for the input form to help prevent users from making mistakes. Once you save the files that define the transform to your account, the transform automatically appears in the drop-down list of available transforms in the visual job editor. You can call custom visual transforms from both visual and code-based jobs, and sharing transforms between accounts is straightforward.

When reading or writing data, the data lake frameworks simplify incremental data processing in data lakes built on Amazon S3. They enable capabilities including time travel queries, ACID (atomicity, consistency, isolation, durability) transactions, streaming ingestion, change data capture (CDC), upserts, and deletes.
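To illustrate what upserts, deletes, and time travel mean in practice, here is a toy in-memory model of a snapshot-based table. It is an assumption-laden sketch of the semantics only: the `SnapshotTable` class and its methods are hypothetical and are not the API of any of the data lake frameworks, which keep data files in Amazon S3 and track snapshots in their own metadata.

```python
from copy import deepcopy


class SnapshotTable:
    """Hypothetical keyed table where every commit produces a new snapshot.

    Toy model only; not the API of Apache Hudi, Apache Iceberg, or Delta Lake.
    """

    def __init__(self, key):
        self.key = key
        self.snapshots = [{}]  # snapshot 0 is the empty table

    def _commit(self, rows_by_key):
        # Each change builds a complete new snapshot before publishing it,
        # so a reader never observes a half-applied change.
        self.snapshots.append(rows_by_key)
        return len(self.snapshots) - 1

    def upsert(self, rows):
        """Insert new keys and overwrite existing ones in a single commit."""
        current = deepcopy(self.snapshots[-1])
        for row in rows:
            current[row[self.key]] = dict(row)
        return self._commit(current)

    def delete(self, keys):
        """Remove the given keys in a single commit."""
        current = deepcopy(self.snapshots[-1])
        for k in keys:
            current.pop(k, None)
        return self._commit(current)

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or an earlier one ("time travel")."""
        sid = len(self.snapshots) - 1 if snapshot_id is None else snapshot_id
        return sorted(self.snapshots[sid].values(), key=lambda r: r[self.key])


table = SnapshotTable(key="id")
v1 = table.upsert([{"id": 1, "status": "placed"}, {"id": 2, "status": "placed"}])
v2 = table.upsert([{"id": 2, "status": "shipped"}, {"id": 3, "status": "placed"}])
v3 = table.delete([1])
print(table.read())    # [{'id': 2, 'status': 'shipped'}, {'id': 3, 'status': 'placed'}]
print(table.read(v1))  # [{'id': 1, 'status': 'placed'}, {'id': 2, 'status': 'placed'}]
```

Copying the whole table on every commit keeps the model short; real frameworks avoid that cost by rewriting or logging only the affected files, which is what makes incremental processing practical at data lake scale.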

These features are available in every region where Amazon Glue is offered:

  • Amazon Web Services China (Beijing) Region, operated by Sinnet
  • Amazon Web Services China (Ningxia) Region, operated by NWCD

To learn more, visit our documentation.