What does this Amazon Web Services Solution do?

This solution helps you quickly deploy a highly available ClickHouse cluster on the Amazon Cloud. ClickHouse is an open-source OLAP database management system. It is used in scenarios such as user behavior analysis in e-commerce, data storage and statistics in the advertising and telecommunications industries, log analysis in information security, data mining in remote sensing, business intelligence, and data processing and value analysis for online games and the Internet of Things (IoT). For more details on its features, see the ClickHouse website.
To help you test and use the deployed ClickHouse cluster, this solution uses the open OnTime flight dataset as an example, visualizes the data with Grafana, and provides an analysis report.
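
As a rough illustration of querying the example dataset once it is loaded, the sketch below builds a request URL for ClickHouse's HTTP interface. The host name is hypothetical, the table and column names (`ontime`, `Year`) are assumed from the public OnTime example schema, and whether the HTTP port (8123 by default in ClickHouse) is reachable depends on how your deployment is configured.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the address of your own deployment.
CLICKHOUSE_HOST = "clickhouse.example.internal"
# 8123 is ClickHouse's default HTTP port; your deployment may differ.
CLICKHOUSE_HTTP_PORT = 8123


def build_query_url(sql, host=CLICKHOUSE_HOST, port=CLICKHOUSE_HTTP_PORT):
    """Return a URL that passes `sql` to the ClickHouse HTTP interface."""
    return f"http://{host}:{port}/?{urlencode({'query': sql})}"


# Assumed table/column names from the public OnTime example schema.
SQL = "SELECT Year, count() AS flights FROM ontime GROUP BY Year ORDER BY Year"

url = build_query_url(SQL)
print(url)
```

The URL can then be fetched from the ClickHouse client host with any HTTP client; credentials are deliberately left out of this sketch.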

Amazon Web Services Solution overview

This solution launches a highly available ClickHouse cluster environment within minutes. You select the deployment parameters through the UI in Amazon CloudFormation. The solution is also integrated with cloud services such as Amazon S3 and Amazon CloudWatch.


The solution deploys the following resources:

1. A highly available architecture that spans two Availability Zones.

2. An Amazon Virtual Private Cloud (Amazon VPC) configured with public and private subnets, according to best practices, to provide you with your own virtual network on Amazon Web Services.

3. A dynamically generated random text string, stored in Amazon Secrets Manager, to be used as the password.

4. In the public subnets:

    - A Linux bastion host in an Amazon Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon EC2 instances in public and private subnets.

    - An Elastic IP address associated with the bastion host.

    - An internet gateway to allow internet access for the bastion host.

    - Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.

5. In the private subnets:

    - A ClickHouse client host that lets the administrator connect to the ClickHouse cluster through the command line or a graphical interface.

    - A ClickHouse database cluster that includes several Amazon EC2 instances. The default is two.

    - A ZooKeeper cluster that includes three Amazon EC2 instances by default.

    - Metadata for ClickHouse replication is stored in ZooKeeper.

    - Each replica stores its state in ZooKeeper as the set of data parts and their checksums.

6. An Amazon ELB Network Load Balancer in front of the ClickHouse cluster.

7. Three Amazon security groups are created: one for the bastion host, one for the ClickHouse cluster, and one for the ZooKeeper cluster.

    - A bastion security group whose inbound rules allow access from specific IP addresses.

    - A ClickHouse security group whose inbound rules open the cluster communication port.

    - An admin security group whose inbound rules open the graphical interface port.

    - A private Amazon S3 bucket for tiered storage of the ClickHouse cluster (local disk first, then S3, controlled by a move factor), named according to the rule "clickhouse-data-vpcid".

8. Amazon CloudWatch Logs, which lets you centralize the logs from the ClickHouse cluster and adjust the log retention policy.

9. An Amazon CloudWatch dashboard to monitor CPU, memory, disk I/O, and network on the EC2 instances in the ClickHouse and ZooKeeper clusters. An Amazon Simple Notification Service (Amazon SNS) email notification is sent when an alarm is triggered. The following metrics are covered:

    - CPU utilization (cpu_usage_user and cpu_usage_system from the CloudWatch agent): the alarm triggers when it stays higher than 90% for more than 5 minutes.

    - Memory utilization (mem_used_percent from the CloudWatch agent): the alarm triggers when it stays higher than 90% for more than 5 minutes.

    - Disk IOPS (diskio_writes and diskio_reads from the CloudWatch agent).

    - Network throughput (net_bytes_sent and net_bytes_recv from the CloudWatch agent).

10. After deploying the solution, you can obtain example datasets and import them into ClickHouse.
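
The memory alarm condition in item 9 (above 90% for more than 5 minutes) could be expressed roughly as the following set of CloudWatch alarm parameters. This is a minimal sketch and not the solution's actual template: the helper name, alarm name, instance ID, and SNS topic ARN are hypothetical, splitting the 5 minutes into five 60-second periods is an assumption, and the dimensions that the CloudWatch agent attaches to its metrics depend on the agent configuration.

```python
def build_memory_alarm(instance_id, sns_topic_arn, threshold=90.0, minutes=5):
    """Build parameters for a CloudWatch alarm on mem_used_percent.

    The keys follow the CloudWatch PutMetricAlarm API, so the dict could be
    passed to boto3's put_metric_alarm(**params).
    """
    return {
        "AlarmName": f"clickhouse-mem-{instance_id}",  # hypothetical naming
        "Namespace": "CWAgent",  # default namespace of the CloudWatch agent
        "MetricName": "mem_used_percent",
        # Assumption: the agent is configured to add an InstanceId dimension.
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 60,  # seconds per datapoint (assumed)
        "EvaluationPeriods": minutes,  # 5 x 60 s = 5 minutes above threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic that sends the email
    }


# Hypothetical identifiers, for illustration only.
params = build_memory_alarm(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:example-topic",
)
print(params["Period"] * params["EvaluationPeriods"], "seconds above", params["Threshold"])
```

The CPU alarm combines two agent metrics (cpu_usage_user and cpu_usage_system), so it would need a slightly different definition than this single-metric sketch.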


Integration with cloud services

Users who need monitoring normally have to integrate ClickHouse with third-party services themselves. This solution integrates Amazon S3 for tiered storage, uses Amazon CloudWatch for log collection and a resource metrics dashboard, and provides a Grafana client for data visualization.
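
To make the tiered storage behavior more concrete, here is a small sketch of the move-factor rule as ClickHouse documents it: once the share of free space on a volume drops below the move factor, data starts moving to the next volume (here, S3). The function names are hypothetical, 0.1 is ClickHouse's documented default rather than necessarily the value this solution configures, and reading "vpcid" in the bucket naming rule as a placeholder for the VPC ID is an assumption.

```python
def bucket_name(vpc_id):
    """Assumed reading of the naming rule "clickhouse-data-vpcid"."""
    return f"clickhouse-data-{vpc_id}"


def should_move_to_s3(free_bytes, total_bytes, move_factor=0.1):
    """Return True when the local disk's free-space ratio is below move_factor.

    0.1 is ClickHouse's default move_factor; the deployed value may differ.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < move_factor


# Hypothetical VPC ID and disk sizes, for illustration only.
name = bucket_name("vpc-0abc123")
nearly_full = should_move_to_s3(free_bytes=50, total_bytes=1000)   # 5% free
plenty_free = should_move_to_s3(free_bytes=200, total_bytes=1000)  # 20% free
print(name, nearly_full, plenty_free)
```

With the default factor, the local disk is used until less than 10% of it remains free, which keeps recent data on fast local storage and moves older parts to S3.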

Support of ARM architecture

ClickHouse makes very efficient use of hardware, but the hardware it requires can be a heavy cost for customers. This solution supports deployment on ARM-based Amazon EC2 instances, helping customers reduce their hardware costs.

Easy to use

Launching a ClickHouse cluster usually requires many settings. This solution applies best practices to help customers quickly build a distributed ClickHouse cluster environment on the Amazon Cloud. You can deploy it into your own Amazon Web Services account with one click through Amazon CloudFormation. For details, see the deployment guide.