Integrated data catalog

The Amazon Glue Data Catalog is your persistent metadata store for all your data assets, regardless of where they are located. The Data Catalog contains table definitions, job definitions, and other control information to help you manage your Amazon Glue environment. It automatically computes statistics and registers partitions to make queries against your data efficient and cost-effective. It also maintains a comprehensive schema version history so you can understand how your data has changed over time.
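
As a rough illustration of what a table definition in the Data Catalog holds, the sketch below summarizes a response shaped like the one returned by the Glue GetTable API. The helper name `summarize_table` and the sample database, table, and bucket names are assumptions made for this example only.

```python
from typing import Any, Dict, List, Tuple


def summarize_table(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract column and partition-key (name, type) pairs from a GetTable-style response."""
    table = response["Table"]
    storage = table.get("StorageDescriptor", {})

    def pairs(columns: List[Dict[str, str]]) -> List[Tuple[str, str]]:
        return [(c["Name"], c.get("Type", "")) for c in columns]

    return {
        "name": table["Name"],
        "location": storage.get("Location"),
        "columns": pairs(storage.get("Columns", [])),
        "partition_keys": pairs(table.get("PartitionKeys", [])),
    }


# Hypothetical sample data, shaped like a GetTable response.
sample_response = {
    "Table": {
        "Name": "orders",
        "StorageDescriptor": {
            "Location": "s3://example-bucket/orders/",
            "Columns": [
                {"Name": "order_id", "Type": "bigint"},
                {"Name": "amount", "Type": "double"},
            ],
        },
        "PartitionKeys": [{"Name": "order_date", "Type": "string"}],
    }
}

summary = summarize_table(sample_response)
print(summary)
```

With boto3 installed and credentials configured, a live response could be obtained with `boto3.client("glue").get_table(DatabaseName="sales_db", Name="orders")`, where the database and table names are again placeholders.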

Automatic schema discovery

Amazon Glue crawlers connect to your source or target data store, progress through a prioritized list of classifiers to determine the schema for your data, and then create metadata in your Amazon Glue Data Catalog. The metadata is stored in tables in your Data Catalog and used in the authoring process of your ETL jobs. You can run crawlers on a schedule, on demand, or in response to an event to keep your metadata up to date.
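
A minimal sketch of defining a scheduled crawler follows. It only builds the keyword arguments for the Glue CreateCrawler API. The helper name `build_crawler_request`, the role ARN, the database name, and the S3 path are all placeholders assumed for illustration.

```python
from typing import Any, Dict, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateCrawler call against one S3 path."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Glue cron expressions use six fields; this value is supplied by the caller.
        request["Schedule"] = schedule
    return request


# Hypothetical values: crawl one prefix every day at 02:00 UTC.
crawler_request = build_crawler_request(
    name="orders-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales_db",
    s3_path="s3://example-bucket/orders/",
    schedule="cron(0 2 * * ? *)",
)
print(crawler_request)
```

Given a boto3 Glue client, the request could then be submitted with `glue.create_crawler(**crawler_request)` and run on demand with `glue.start_crawler(Name="orders-crawler")`. Omitting `schedule` leaves the crawler to be started on demand only.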

Code generation

Amazon Glue automatically generates the code to extract, transform, and load your data. Simply point Amazon Glue to your data source and target, and Amazon Glue creates ETL scripts to transform, flatten, and enrich your data. The code is generated in Scala or Python and written for Apache Spark.
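
To make the "flatten" step concrete, here is a plain-Python illustration of what flattening a nested record means. It is not the code Amazon Glue generates, which targets Apache Spark. The function name `flatten` and the sample record are assumptions for this sketch.

```python
from typing import Any, Dict


def flatten(record: Dict[str, Any], parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Turn nested dictionaries into a single level with dotted key names."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested structures, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical nested record, such as one parsed from a JSON source.
nested = {"id": 1, "customer": {"name": "Ana", "address": {"city": "Lima"}}}
flat_record = flatten(nested)
print(flat_record)
```

Each nested field becomes a top-level column named by its path, which is the shape that relational targets generally expect.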

Development endpoints

If you choose to interactively develop your ETL code, Amazon Glue provides development endpoints for you to edit, debug, and test the code it generates for you. You can use your favorite IDE or notebook. You can write custom readers, writers, or transformations and import them into your Amazon Glue ETL jobs as custom libraries. You can also use and share code with other developers in our GitHub repository.

Flexible job scheduler

Amazon Glue jobs can be invoked on a schedule, on demand, or based on an event. You can start multiple jobs in parallel or specify dependencies across jobs to build complex ETL pipelines. Amazon Glue handles inter-job dependencies, filters bad data, and retries jobs that fail. All logs and notifications are pushed to Amazon CloudWatch so you can monitor jobs and receive alerts from a central service.
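
Scheduling and job dependencies can be sketched with Glue triggers. The helpers below only build keyword arguments for the CreateTrigger API. The helper names, trigger names, and job names are hypothetical and chosen for illustration.

```python
from typing import Any, Dict


def scheduled_trigger(name: str, job_name: str, cron: str) -> Dict[str, Any]:
    """Build CreateTrigger arguments that start a job on a cron schedule."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def conditional_trigger(name: str, upstream_job: str, downstream_job: str) -> Dict[str, Any]:
    """Build CreateTrigger arguments that start one job after another succeeds."""
    return {
        "Name": name,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


# Hypothetical pipeline: extract nightly at 03:00 UTC, then load once extract succeeds.
nightly = scheduled_trigger("nightly-extract", "extract-orders", "cron(0 3 * * ? *)")
chained = conditional_trigger("load-after-extract", "extract-orders", "load-orders")
print(nightly)
print(chained)
```

Given a boto3 Glue client, each dictionary could be passed as `glue.create_trigger(**nightly)`, and a job could be started on demand with `glue.start_job_run(JobName="extract-orders")`.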
