Amazon Glue Data Quality is now available in Amazon Web Services China Regions
Amazon Glue Data Quality is now generally available in Amazon Web Services China Regions. Glue Data Quality automatically measures and monitors the quality of data in data repositories and in Amazon Glue ETL pipelines.
Amazon Glue Data Quality helps reduce the need for manual data quality work by automatically analyzing your data to gather data statistics. It then recommends data quality rules to help you get started, and you can update the recommended rules or add new ones. It uses open-source Deequ to evaluate rules and to measure and monitor the data quality of petabyte-scale data lakes. If data quality deteriorates, you can configure actions to alert users and drill down into the issue's root cause. Data quality rules and actions can also be configured on Amazon Glue data pipelines, helping prevent "bad" data from entering data lakes and data warehouses.
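As an illustration of what adding or updating rules can look like, the following minimal Python sketch assembles a ruleset in the Data Quality Definition Language (DQDL) that Glue Data Quality rules are written in. The table columns (order_id, amount), the row-count threshold, and the helper name build_ruleset are hypothetical and chosen only for this example.

```python
def build_ruleset(rules):
    """Render a list of DQDL rule strings as one ruleset document."""
    if not rules:
        raise ValueError("a ruleset needs at least one rule")
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Hypothetical rules for an illustrative "orders" table.
ORDER_RULES = [
    'IsComplete "order_id"',      # no missing order IDs
    'IsUnique "order_id"',        # no duplicate order IDs
    'ColumnValues "amount" > 0',  # amounts must be positive
    "RowCount > 100",             # assumed minimum table size
]

ruleset = build_ruleset(ORDER_RULES)
print(ruleset)
```

A string like this could then be supplied as the ruleset text when creating a ruleset, for example through the Glue console or the `Ruleset` parameter of the `create_data_quality_ruleset` API. Which rules make sense depends on your data, so treat the ones above as placeholders.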
Additionally, Glue Data Quality has an ML-powered anomaly detection capability in Glue ETL that uses advanced algorithms to detect hard-to-find data quality issues and anomalies. While a rule-based approach works well for known data patterns, this feature helps customers proactively identify unanticipated issues. Data engineers and analysts can write rules or analyzers and turn on Anomaly Detection in Glue ETL, which collects column statistics, applies ML algorithms, and generates easy-to-understand visual observations explaining detected issues.
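To make the combination of rules and analyzers more concrete, here is a hedged Python sketch that builds a DQDL document with a DetectAnomalies rule and an Analyzers section. The column name customer_id and the helper name build_anomaly_ruleset are assumptions for illustration. The sketch only produces the ruleset text and does not reproduce the ML algorithms that Glue applies.

```python
def _section(name, entries):
    """Render one DQDL section such as Rules or Analyzers."""
    body = ",\n".join(f"    {entry}" for entry in entries)
    return f"{name} = [\n{body}\n]"


def build_anomaly_ruleset(rules, analyzers):
    """Combine rules and analyzers into a single DQDL document."""
    if not rules and not analyzers:
        raise ValueError("provide at least one rule or analyzer")
    sections = []
    if rules:
        sections.append(_section("Rules", rules))
    if analyzers:
        sections.append(_section("Analyzers", analyzers))
    return "\n".join(sections)


anomaly_ruleset = build_anomaly_ruleset(
    # Flag unusual changes in the row count between runs.
    rules=['DetectAnomalies "RowCount"'],
    # Collect statistics without pass/fail thresholds (hypothetical column).
    analyzers=['Completeness "customer_id"', 'DistinctValuesCount "customer_id"'],
)
print(anomaly_ruleset)
```

Analyzers gather statistics without asserting a threshold, which suits cases where you do not yet know what a sensible rule would be. Anomaly detection relies on statistics gathered over repeated runs, so early results may be less informative.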
To learn more, visit Amazon Glue Data Quality.
These new features are now available in the same Regions as Amazon Glue:
• Amazon Web Services China (Beijing) Region, operated by Sinnet
• Amazon Web Services China (Ningxia) Region, operated by NWCD
To learn more, visit the Amazon Glue documentation.