Posted On: Jan 25, 2021

Amazon EMR integration with Apache Ranger is now available in the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD. You can define, enforce, and audit fine-grained data access control on Amazon EMR with Apache Ranger, and use Amazon CloudWatch to capture audit logs.

Apache Ranger is an open-source tool to enable, monitor, and manage comprehensive data security across the Hadoop platform. Previously, you could use Apache Ranger to enforce fine-grained authorization on data in HDFS with Apache Hive by following this blog post. This native integration now enables additional capabilities. You can define three types of authorization policies on the Apache Ranger Policy Admin server: table, column, and row level authorization for Apache Hive; table and column level authorization for Apache Spark; and prefix and object level authorization for Amazon S3. Amazon EMR automatically installs and configures the corresponding Apache Ranger plugins on the cluster. These Ranger plugins sync with the Policy Admin server to retrieve authorization policies, enforce data access control, and send audit events to Amazon CloudWatch Logs.
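As an illustration of the column level authorization for Apache Hive described above, the following Python sketch builds a policy payload in the general shape of Apache Ranger's policy model (service, resources, policy items). It is a minimal sketch under stated assumptions: the service name, database, table, column, and user names are hypothetical, and the code only prints the payload rather than calling the Policy Admin server.

```python
import json


def hive_column_policy(service, database, table, columns, users):
    """Build a Ranger-style policy dict granting SELECT on specific Hive columns.

    All argument values used below are hypothetical examples.
    """
    return {
        "service": service,  # name of the Hive service registered in Ranger (assumed)
        "name": f"{database}.{table}-select",
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": list(columns)},
        },
        "policyItems": [
            {
                "users": list(users),
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


# Hypothetical values for illustration only.
policy = hive_column_policy(
    service="example_hive_service",
    database="sales",
    table="orders",
    columns=["order_id", "amount"],
    users=["analyst1"],
)
print(json.dumps(policy, indent=2))
```

In practice, a payload like this would be created through the Policy Admin server's UI or REST API; consult the Amazon EMR Management Guide for the service definitions that the EMR plugins expect.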

Review the following considerations and limitations before you enable Apache Ranger integration on Amazon EMR. 1/ Row-level authorization and data masking policies are currently supported only with Apache Hive. 2/ The EMR Ranger-Spark plugin enforces fine-grained authorization when reading and writing data using the Spark API with Java, Scala, R, and PySpark. However, writing data using Spark SQL on Ranger-enabled clusters is currently not supported; only reading data using Spark SQL is supported. 3/ This native integration supports selected applications such as Apache Zeppelin and Hue. For a full list of supported applications, see Supported Applications.

To get started, see the following list of resources:

• Amazon EMR Management Guide: Integrating Amazon EMR with Apache Ranger
• Amazon Web Services Big Data Blog post: Introducing Amazon EMR integration with Apache Ranger