Apache HBase on Amazon EMR
Overview
Apache HBase is a massively scalable, distributed big data store in the Apache Hadoop ecosystem. It is an open-source, non-relational, versioned database that runs on top of Amazon S3 (using EMRFS) or the Hadoop Distributed Filesystem (HDFS), and it is built for random, strictly consistent, real-time access to tables with billions of rows and millions of columns. Apache Phoenix integrates with Apache HBase to provide low-latency SQL access over Apache HBase tables and secondary indexing for increased performance. Additionally, Apache HBase integrates tightly with Apache Hadoop, Apache Hive, and Apache Pig, so you can combine massively parallel analytics with fast data access. Apache HBase's data model, throughput, and fault tolerance are a good match for workloads in ad tech, web analytics, financial services, applications using time-series data, and many more.
Apache HBase is natively supported in Amazon EMR, so you can create managed Apache HBase clusters from the Amazon Web Services Management Console, Amazon Web Services CLI, or the Amazon EMR API. You can also use additional Amazon EMR features: Amazon S3 as a data store to reduce costs, read-replica clusters for increased availability, your choice of a wide variety of Amazon EC2 instances and Amazon EBS volumes for your cluster's hardware, backup and restore to Amazon S3 using the Amazon EMR File System (EMRFS), automatic node replacement, and resize commands to add or remove instances from your cluster. You can also use Hue to visualize your HBase tables and explore your data. Learn more about Apache HBase and about Apache HBase on Amazon EMR.
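As a rough illustration of the API route mentioned above, the following Python sketch builds the request parameters for launching an HBase cluster that keeps its data in Amazon S3, optionally as a read replica. The helper name `build_hbase_cluster_request`, the cluster name, bucket path, release label, instance types, and role names are placeholder assumptions rather than recommendations. The `hbase` and `hbase-site` configuration classifications follow the HBase-on-S3 settings in the Amazon EMR documentation, but check them against the EMR release you actually use.

```python
import json


def build_hbase_cluster_request(name, root_dir, read_replica=False,
                                release_label="emr-6.15.0", core_nodes=2):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    Every default here is an illustrative placeholder.
    """
    # Store HBase data in Amazon S3 (through EMRFS) instead of HDFS.
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # A read-replica cluster points at the same root directory as the primary.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "HBase"}],
        "Configurations": [
            {"Classification": "hbase", "Properties": hbase_props},
            {"Classification": "hbase-site",
             "Properties": {"hbase.rootdir": root_dir}},
        ],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_hbase_cluster_request(
        "hbase-demo", "s3://amzn-s3-demo-bucket/hbase-root/")
    print(json.dumps(request, indent=2))
    # To launch for real (needs AWS credentials and the boto3 package):
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as a plain dictionary makes it easy to review or diff before anything is launched; the same structure maps onto the options accepted by the CLI and the console.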
Features and benefits
Apache HBase is designed to maintain performance while scaling out to hundreds of nodes, supporting billions of rows and millions of columns. It uses Amazon S3 (with EMRFS) or the Hadoop Distributed Filesystem (HDFS) as a fault-tolerant datastore. Amazon EMR supports a wide variety of instance types and Amazon EBS volumes, so you can customize the hardware of your cluster to optimize for cost and performance. Additionally, you can use Apache Phoenix to run low-latency SQL over massive HBase tables or to create secondary indexes for increased performance.
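To make the versioned, column-oriented data model more concrete, here is a toy in-memory sketch of how a cell can be addressed by row key, column, and timestamp. It is a teaching model only, not HBase's storage engine or client API; the class name `ToyVersionedTable`, its methods, and the sample row are invented for illustration.

```python
from collections import defaultdict


class ToyVersionedTable:
    """Toy model: row key -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, value, timestamp):
        versions = self._rows[row][column]
        versions[timestamp] = value
        # Keep only the newest max_versions timestamps for this cell.
        for old in sorted(versions)[:-self.max_versions]:
            del versions[old]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [t for t in versions
                      if timestamp is None or t <= timestamp]
        return versions[max(candidates)] if candidates else None


table = ToyVersionedTable(max_versions=2)
table.put("user#42", "clicks:count", 1, timestamp=100)
table.put("user#42", "clicks:count", 5, timestamp=200)
table.put("user#42", "clicks:count", 9, timestamp=300)

print(table.get("user#42", "clicks:count"))       # 9 (latest version)
print(table.get("user#42", "clicks:count", 250))  # 5 (newest at or before 250)
print(table.get("user#42", "clicks:count", 150))  # None (version 100 was evicted)
```

Because each row stores only the columns it actually has, a table in this model can be very wide and sparse, which is the sense in which millions of columns are practical.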