What does this Amazon Web Services Solution do?
This solution helps you quickly deploy a highly available ClickHouse cluster on the Amazon Cloud. ClickHouse is an open-source OLAP database management system. It suits a variety of scenarios, such as user behavior analysis in e-commerce, data storage and statistics in the advertising and telecommunications industries, log analysis in information security, data mining in remote sensing, business intelligence, and data processing and value analysis in online games and the Internet of Things (IoT). For more details on its features, see the ClickHouse website.
To help you test and use the deployed ClickHouse cluster, this solution uses the open OnTime flight dataset as an example, visualizes the data with Grafana, and provides an analysis report.
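As a rough illustration of the kind of question the OnTime dataset can answer, the sketch below builds (but does not send) a request for ClickHouse's HTTP interface that counts flights per day of the week. The host name is hypothetical, and the table name `ontime` and column `DayOfWeek` follow the public ClickHouse OnTime example; the names used by this solution may differ. Credentials are omitted.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_ontime_request(host: str, port: int = 8123) -> Request:
    """Build a POST request for the ClickHouse HTTP interface.

    Assumptions: the table is called `ontime` and has a `DayOfWeek`
    column, as in the public ClickHouse OnTime example.
    """
    query = (
        "SELECT DayOfWeek, count(*) AS c "
        "FROM ontime "
        "GROUP BY DayOfWeek ORDER BY c DESC "
        "FORMAT JSONEachRow"
    )
    # The query text travels in the request body; the database is a URL parameter.
    url = f"http://{host}:{port}/?{urlencode({'database': 'default'})}"
    return Request(url, data=query.encode("utf-8"), method="POST")


# Hypothetical internal host name, for illustration only.
request = build_ontime_request("clickhouse.example.internal")
```

Passing `request` to `urllib.request.urlopen` from a machine that can reach the cluster would run the query; in this deployment that means going through the client host or the load balancer in the private subnets.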
Amazon Web Services Solution overview
This solution allows you to quickly launch a highly available ClickHouse cluster environment within minutes. You can select the deployment parameters through the UI in Amazon CloudFormation. In addition, the solution has been integrated with cloud services such as Amazon S3 and Amazon CloudWatch.
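The deployment parameters chosen in the CloudFormation UI can also be thought of as a simple key/value mapping. The sketch below converts such a mapping into the list-of-dicts shape that the CloudFormation API expects for stack parameters. The parameter names `ClickHouseNodeCount` and `ZookeeperNodeCount` are hypothetical and are not taken from the solution's template; the values 2 and 3 are the defaults stated in this page.

```python
from typing import Dict, List


def to_cfn_parameters(values: Dict[str, object]) -> List[Dict[str, str]]:
    """Convert a plain mapping into CloudFormation's parameter list shape."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Hypothetical parameter names; check the actual template before use.
example_parameters = to_cfn_parameters(
    {"ClickHouseNodeCount": 2, "ZookeeperNodeCount": 3}
)
```

A list in this shape is what an SDK call such as boto3's `create_stack` takes as its `Parameters` argument.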
The solution deploys the following resources:
1. A highly available architecture that spans two Availability Zones.
2. An Amazon Virtual Private Cloud (Amazon VPC) configured with public and private subnets, according to best practices, to provide you with your own virtual network on Amazon Web Services.
3. A dynamically generated random text string stored in Amazon Secrets Manager for use as the password.
4. In the public subnets:
- A Linux bastion host in an Amazon Auto Scaling group to allow inbound Secure Shell (SSH) access to Amazon EC2 instances in public and private subnets.
- An Elastic IP address associated with the bastion host.
- An internet gateway to allow internet access for the bastion host.
- Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.
5. In the private subnets:
- A ClickHouse client host to allow the administrator to connect to the ClickHouse cluster using the command line or a graphical interface.
- A ClickHouse database cluster that includes several Amazon EC2 instances. The default is two.
- A ZooKeeper cluster that includes three Amazon EC2 instances by default.
- Metadata for ClickHouse replication is stored in ZooKeeper.
- Each replica stores its state in ZooKeeper as the set of parts and their checksums.
6. An Amazon ELB Network Load Balancer in front of the ClickHouse cluster.
7. Three Amazon Security Groups: one for the bastion host, one for the ClickHouse cluster, and one for the ZooKeeper cluster.
- A bastion Security Group whose inbound rule allows access from specific IP addresses.
- A ClickHouse Security Group whose inbound rule opens the cluster communication port.
- An Admin Security Group whose inbound rule allows access to the graphical interface.
- A private Amazon S3 bucket for tiered storage of the ClickHouse cluster (local disk first, then S3, governed by a move factor), named according to the pattern "clickhouse-data-vpcid".
8. Amazon CloudWatch Logs to centralize the logs from the ClickHouse cluster and adjust the log retention policy.
9. An Amazon CloudWatch dashboard to monitor CPU, memory, disk I/O, and network on the EC2 instances in the ClickHouse and ZooKeeper clusters, with an Amazon Simple Notification Service (Amazon SNS) email notification sent when an alarm is triggered.
- CPU utilization (cpu_usage_user and cpu_usage_system from the CloudWatch agent) higher than 90% for more than 5 minutes.
- Memory utilization (mem_used_percent from the CloudWatch agent) higher than 90% for more than 5 minutes.
- Disk IOPS (diskio_writes and diskio_reads from the CloudWatch agent).
- Network throughput (net_bytes_sent and net_bytes_recv from the CloudWatch agent).
10. After deploying the solution, you can obtain example datasets and import them into ClickHouse.
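The CPU and memory alarm conditions in item 9 (above 90% for more than 5 minutes) can be illustrated with a small sketch. It assumes one sample per minute and reads "more than 5 minutes" as more than five consecutive samples above the threshold. It illustrates the rule only and is not how CloudWatch itself evaluates alarms; the function name is hypothetical.

```python
from typing import Sequence

THRESHOLD_PERCENT = 90.0  # from the text: "higher than 90%"
SUSTAINED_MINUTES = 5     # from the text: "more than 5 minutes"


def alarm_triggered(
    samples: Sequence[float],
    threshold: float = THRESHOLD_PERCENT,
    sustained: int = SUSTAINED_MINUTES,
) -> bool:
    """Return True if more than `sustained` consecutive one-minute
    samples are strictly above `threshold`."""
    run = 0  # length of the current streak of samples above the threshold
    for value in samples:
        run = run + 1 if value > threshold else 0
        if run > sustained:
            return True
    return False


# Example with made-up mem_used_percent readings, one per minute.
readings = [70.0, 92.5, 93.0, 95.1, 96.4, 97.0, 98.2]
print(alarm_triggered(readings))
```

A dip back below the threshold resets the streak, so brief spikes do not trigger a notification under this reading of the rule.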
ClickHouse on Amazon Web Services
Last updated: 09/2022
Author: Amazon Web Services
Estimated deployment time: 40 min
Integration with cloud services
Support of ARM architecture
Easy to use