Q: What is Apache Kafka?
Q: What is streaming data?
Q: What are Apache Kafka’s primary capabilities?
- Apache Kafka stores streaming data in a fault-tolerant way as a continuous series of records and preserves the order in which the records were produced.
- Apache Kafka acts as a buffer between data producers and data consumers. Apache Kafka allows many data producers (e.g. websites, IoT devices, Amazon EC2 instances) to continuously publish streaming data and categorize this data using Apache Kafka topics. Multiple data consumers (e.g. machine learning applications, Lambda functions) read from these topics at their own rate, similar to a message queue or enterprise messaging system.
- Data consumers process data from Apache Kafka topics on a first-in-first-out basis, preserving the order in which the data was produced.
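The capabilities above can be illustrated with a toy, in-memory model. This is a minimal sketch and not a Kafka client: the `TopicLog` class, its method names, and the consumer names are all hypothetical. It only shows an append-only log that keeps production order while each consumer reads at its own rate.

```python
from collections import defaultdict


class TopicLog:
    """Toy in-memory stand-in for one ordered topic log (hypothetical, not Kafka)."""

    def __init__(self):
        self._records = []                # append-only list keeps production order
        self._offsets = defaultdict(int)  # next offset to read, per consumer name

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return the next records for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for event in ["click-1", "click-2", "click-3"]:
    log.publish(event)

fast = log.poll("analytics")                # reads everything available
slow = log.poll("archiver", max_records=1)  # reads one record at a time
```

Because each consumer tracks its own offset, a slow reader never blocks a fast one, and both see records in the order they were published.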
Q: What are the key concepts of Apache Kafka?
Q: When should I use Apache Kafka?
Apache Kafka is used to support real-time applications that transform, deliver, and react to streaming data, and to build real-time streaming data pipelines that reliably move data between multiple systems or applications.
Q: What does Amazon MSK do?
Amazon MSK makes it easy to get started with and run open-source versions of Apache Kafka in Amazon Web Services with high availability and security, while providing integration with Amazon Web Services services without the operational overhead of running an Apache Kafka cluster. Amazon MSK allows you to use and configure open-source versions of Apache Kafka while the service manages the setup, provisioning, Amazon Web Services integrations, and ongoing maintenance of Apache Kafka clusters.
With a few clicks in the console, you can provision an Amazon MSK cluster. From there, Amazon MSK replaces unhealthy brokers, automatically replicates data for high availability, manages Apache ZooKeeper nodes, automatically deploys hardware patches as needed, manages the integrations with Amazon Web Services services, makes important metrics visible through the console, and supports Apache Kafka version upgrades so you can take advantage of improvements to the open-source version of Apache Kafka.
Q: What Apache Kafka versions does Amazon MSK support?
For supported Kafka versions, see the Amazon MSK documentation.
Q: Are Apache Kafka APIs compatible with Amazon MSK?
Yes, all data plane and admin APIs are natively supported by Amazon MSK.
Q: Is the Apache Kafka AdminClient supported by Amazon MSK?
Data production and consumption
Q: Can I use Apache Kafka APIs to get data in and out of Apache Kafka?
Yes, Amazon MSK supports the native Apache Kafka producer and consumer APIs. Your application code does not need to change when clients begin to work with clusters within Amazon MSK.
Q: Can I use Apache Kafka Connect, Apache Kafka Streams, or any other ecosystem component of Apache Kafka with Amazon MSK?
Yes, you can use any component that leverages the Apache Kafka producer and consumer APIs, and the Apache Kafka Admin Client. Tools that upload .jar files into Apache Kafka clusters are currently not compatible with Amazon MSK, including Confluent Control Center, Confluent Auto Data Balancer, and Uber uReplicator.
Migrating to Amazon MSK
You can create your first cluster with a few clicks in the Amazon Web Services management console or by using the Amazon SDKs. First, in the Amazon MSK console, select the Amazon Web Services region in which to create the Amazon MSK cluster. Choose a name for your cluster, the VPC you want to run the cluster in, a data replication strategy for the cluster, and the subnets for each AZ. Next, pick a broker instance type and the quantity of brokers per AZ, and click create.
Each cluster contains broker instances, provisioned storage, and Apache ZooKeeper nodes.
You can choose EC2 T3.small or instances within the EC2 M5 instance family.
Q: Does Amazon MSK offer Reserved Instance pricing?
No, not at this time.
No, each broker you provision includes boot volume storage managed by the Amazon MSK service.
Some resources, like elastic network interfaces (ENIs), will show up in your Amazon EC2 account. Other Amazon MSK resources will not show up in your EC2 account as these are managed by the Amazon MSK service.
You need to provision broker instances and broker storage with every cluster you create. You do not provision Apache ZooKeeper nodes as these resources are included at no additional charge with each cluster you create.
Unless otherwise specified, Amazon MSK uses the same defaults specified by the open-source version of Apache Kafka. The default settings are listed in the Amazon MSK configuration documentation.
Q: Can I provision brokers such that they are imbalanced across AZs (e.g. 3 in cn-north-1a, 2 in cn-north-1b, 1 in cn-north-1c)?
No, Amazon MSK enforces the best practice of balancing broker quantities across AZs within a cluster.
Amazon MSK uses Apache Kafka’s leader-follower replication to replicate data between brokers. Amazon MSK makes it easy to deploy clusters with multi-AZ replication and gives you the option to use a custom replication strategy by topic. By default, with each of the replication options, leader and follower brokers are deployed and isolated using the replication strategy specified. For example, if you select a 3 AZ broker replication strategy with 1 broker per AZ, Amazon MSK will create a cluster of three brokers (1 broker in each of three AZs in a region), and by default (unless you choose to override the topic replication factor) the topic replication factor will also be 3.
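The arithmetic in the example above can be sketched as follows. The `plan_cluster` function is hypothetical, and the rule that the default replication factor equals the number of AZs is an assumption generalized from the 3 AZ example; check the Amazon MSK documentation for other layouts.

```python
def plan_cluster(availability_zones, brokers_per_az):
    """Return a balanced broker layout and the implied default replication factor."""
    if brokers_per_az < 1:
        raise ValueError("each AZ needs at least one broker")
    # The same broker count in every AZ keeps the cluster balanced.
    layout = {az: brokers_per_az for az in availability_zones}
    return {
        "brokers_by_az": layout,
        "total_brokers": sum(layout.values()),
        # Assumption: default replication factor equals the AZ count,
        # as in the 3 AZ example in the text.
        "default_replication_factor": len(availability_zones),
    }


plan = plan_cluster(["cn-north-1a", "cn-north-1b", "cn-north-1c"], 1)
```

Here `plan["total_brokers"]` is 3 and `plan["default_replication_factor"]` is 3, matching the example.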
Q: Can I change the default broker configurations or upload a cluster configuration to Amazon MSK?
Yes, Amazon MSK allows you to create custom configurations and apply them to new and existing clusters. For more information on custom configurations, see the configuration documentation.
Connecting to the VPC
Q: How do I connect to my Amazon MSK cluster outside of the VPC?
There are several methods to connect to your Amazon MSK clusters outside of your VPC: VPN, VPC Peering, VPC Transit Gateway, Amazon Direct Connect. You can also use a REST proxy on an instance running within your VPC. REST proxies allow your producers and consumers to communicate to the cluster through HTTP API requests.
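As an illustration of the REST proxy approach, the sketch below builds (but does not send) an HTTP produce request. It assumes the proxy is the Confluent REST Proxy with its v2 JSON API; the FAQ does not name a specific proxy. The host name, port, and topic are hypothetical.

```python
import json
import urllib.request


def build_produce_request(proxy_url, topic, values):
    """Build a POST request that publishes JSON values to a topic via a REST proxy."""
    body = json.dumps({"records": [{"value": v} for v in values]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{proxy_url}/topics/{topic}",
        data=body,
        method="POST",
        # Content type assumed from the Confluent REST Proxy v2 API.
        headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
    )


request = build_produce_request(
    "http://rest-proxy.example.internal:8082",  # hypothetical proxy address
    "clickstream",                              # hypothetical topic
    [{"page": "/home"}],
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```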
Yes, Amazon MSK uses Amazon EBS server-side encryption and Amazon KMS keys to encrypt storage volumes.
Yes, by default new clusters have encryption in transit enabled via TLS for inter-broker communication. You can opt out of using encryption in transit when a cluster is created.
Yes, by default in-transit encryption is set to TLS only for clusters created from the CLI or Amazon Web Services Console. Additional configuration is required for clients to communicate with clusters using TLS encryption. You can change the default encryption setting by selecting the TLS/plaintext or plaintext settings. For more information, see the Amazon MSK encryption documentation.
Yes, Amazon MSK clusters running Apache Kafka version 2.5.1 or greater support TLS in-transit encryption between Kafka brokers and ZooKeeper nodes.
Q: How do I control cluster authentication and Apache Kafka API authorization?
Amazon MSK offers three options for controlling authentication (AuthN) and authorization (AuthZ): 1) IAM Access Control for both AuthN and AuthZ (recommended), 2) TLS certificate authentication (CA) for AuthN and access control lists for AuthZ, and 3) SASL/SCRAM for AuthN and access control lists for AuthZ. Amazon MSK recommends using IAM Access Control. It’s the easiest to use and, because it defaults to least privilege access, the most secure option.
Q: How does authorization work in Amazon MSK?
If you are using IAM Access Control, Amazon MSK uses the policies you write and its own authorizer to authorize actions. If you are using TLS certificate authentication or SASL/SCRAM, Apache Kafka uses access control lists (ACLs) for authorization. To enable ACLs you must enable client authentication using either TLS certificates or SASL/SCRAM.
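For IAM Access Control, the policies you write are ordinary IAM policy documents. The sketch below generates one that would let a client connect to a cluster and write to a single topic. The `kafka-cluster:*` action names and the ARN layout are given as best understood and should be verified against the Amazon MSK IAM Access Control documentation; the account ID, cluster name, cluster UUID, and topic are hypothetical placeholders.

```python
import json


def topic_producer_policy(partition, region, account_id,
                          cluster_name, cluster_uuid, topic):
    """Build a least-privilege IAM policy for producing to one topic."""
    base = f"arn:{partition}:kafka:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Permission to open a connection to the cluster.
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": f"{base}:cluster/{cluster_name}/{cluster_uuid}",
            },
            {   # Permission to describe and write to a single topic.
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:WriteData"],
                "Resource": f"{base}:topic/{cluster_name}/{cluster_uuid}/{topic}",
            },
        ],
    }


policy = topic_producer_policy(
    "aws-cn", "cn-north-1", "111122223333",  # placeholder account ID
    "demo-cluster", "00000000-0000-0000-0000-000000000000", "clickstream",
)
policy_json = json.dumps(policy, indent=2)
```

Anything not explicitly allowed is denied, which is what makes least privilege the default.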
Q: How can I authenticate and authorize a client at the same time?
If you are using IAM Access Control, Amazon MSK will authenticate and authorize for you without any additional setup. If you are using TLS authentication, you can use the distinguished name (Dname) of a client’s TLS certificate as the principal of the ACL to authorize client requests. If you are using SASL/SCRAM, you can use the username as the principal of the ACL to authorize client requests.
Q: How do I control service API actions?
You can control service API actions using Amazon IAM.
Q: Can I enable IAM Access Control for an existing cluster?
Yes, you can enable IAM Access Control on an existing cluster from the console or through the update-security API.
Q: Can I use IAM Access Control outside of Amazon MSK?
No, IAM Access Control is only available for Amazon MSK clusters.
Q: Can I update authentication settings on my cluster?
You can enable or disable authentication modes for your clusters from the console or through the update-security API. When using the API, the authentication modes that are explicitly declared will be modified accordingly, while those that are omitted will be maintained as is. For example, if your cluster uses mTLS for authentication and you enable IAM Access Control by calling the update-security API, both mTLS and IAM Access Control will be enabled on your cluster.
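The update semantics described above (declared modes change, omitted modes stay as they are) amount to a dictionary merge. This is a minimal sketch of that behavior only; the function and the mode keys are hypothetical and do not mirror the real API request shape.

```python
def apply_security_update(current_modes, declared_changes):
    """Return the auth modes after an update: declared modes change, omitted ones stay."""
    merged = dict(current_modes)     # copy so the original state is untouched
    merged.update(declared_changes)  # only explicitly declared modes are modified
    return merged


current = {"mtls": True, "iam": False, "sasl_scram": False}
after = apply_security_update(current, {"iam": True})
```

After the update, both `mtls` and `iam` are enabled, matching the example in the text.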
Q: Can I enable multiple authentication modes on my cluster?
Yes, you can add multiple authentication modes to your cluster, both during creation and updates. The brokers within the cluster have dedicated ports for each authentication mode, and your clients that connect to Kafka through these ports must have the corresponding authentication mode enabled.
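The sketch below shows how a client might derive its connection settings from the authentication mode it uses. The port numbers and property values are assumptions based on commonly documented Amazon MSK defaults for access from within the VPC, and should be checked against the current documentation. The broker host names are hypothetical.

```python
# Assumed in-VPC ports and Apache Kafka client properties per authentication mode.
AUTH_MODES = {
    "plaintext": {"port": 9092, "security.protocol": "PLAINTEXT"},
    "tls": {"port": 9094, "security.protocol": "SSL"},
    "sasl_scram": {"port": 9096, "security.protocol": "SASL_SSL",
                   "sasl.mechanism": "SCRAM-SHA-512"},
    "iam": {"port": 9098, "security.protocol": "SASL_SSL",
            "sasl.mechanism": "AWS_MSK_IAM"},
}


def client_properties(broker_hosts, mode):
    """Return client properties that point at the port dedicated to the given mode."""
    settings = dict(AUTH_MODES[mode])
    port = settings.pop("port")
    settings["bootstrap.servers"] = ",".join(f"{host}:{port}" for host in broker_hosts)
    return settings


props = client_properties(["b-1.example.internal", "b-2.example.internal"], "iam")
```

A client configured for one mode cannot connect through the port of another, which is why the bootstrap string and the security properties are derived together.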
Q: Can I disable an authentication mode on my cluster?
Yes, you can disable an authentication mode. To ensure that your clients do not lose connectivity with the brokers, do not disable any existing authentication modes until all the clients have been updated to use other available authentication modes.
Q: Can I track clients using an authentication mode with my cluster?
Yes, you can track the number of open connections by authentication mode using the ClientConnectionCount metric published to Amazon CloudWatch metrics.
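Connection counts per authentication mode can drive the decision of when a mode is safe to disable. The sketch below works on sample numbers rather than a live CloudWatch query; the function name and the input shape are hypothetical.

```python
def modes_without_connections(connection_counts):
    """Return auth modes whose latest open-connection count is zero, sorted by name."""
    return sorted(mode for mode, count in connection_counts.items() if count == 0)


# Sample values standing in for the latest ClientConnectionCount datapoints.
sample_counts = {"iam": 42, "sasl_scram": 0, "tls": 7}
candidates = modes_without_connections(sample_counts)
```

With these sample values only `sasl_scram` has no open connections, so it is the only candidate for disabling.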
Monitoring, metrics, logging, tagging
Q: How do I monitor the performance of my clusters or topics?
You can monitor the performance of your clusters using the Amazon MSK console or the Amazon CloudWatch console, or you can access JMX and host metrics using Open Monitoring with Prometheus, an open source monitoring solution.
Q: What is the cost for the different CloudWatch monitoring levels?
The cost of monitoring your cluster using Amazon CloudWatch is dependent on the monitoring level and the size of your Apache Kafka cluster. Amazon CloudWatch charges per metric per month and includes a free tier; see Amazon CloudWatch pricing for more information. For details on the number of metrics exposed for each monitoring level, see Amazon MSK monitoring documentation.
Q: What monitoring tools are compatible with Open Monitoring with Prometheus?
Tools that are designed to read from Prometheus exporters are compatible with Open Monitoring, such as Datadog, Lenses, New Relic, Sumologic, or a Prometheus server. For details on Open Monitoring, see Amazon MSK Open Monitoring documentation.
Q: How do I monitor the health and performance of clients?
You can use any client-side monitoring supported by the Apache Kafka version you are using.
Q: Can I tag Amazon MSK resources?
Yes, you can tag Amazon MSK clusters from the Amazon CLI or Console.
Q: How do I monitor consumer lag?
Topic level consumer lag metrics are available as part of the default set of metrics that Amazon MSK publishes to Amazon CloudWatch for all clusters. No additional setup is required to get these metrics. To get partition level metrics (partition dimension), you can enable enhanced monitoring (PER_PARTITION_PER_TOPIC) on your cluster. Alternatively, you can enable Open Monitoring on your cluster and use a Prometheus server to capture partition level metrics from the brokers in your cluster. Consumer lag metrics are available at port 11001, like other Kafka metrics.
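To make the difference between partition level and topic level lag concrete, the sketch below computes both from sample offsets. It assumes the usual definition of lag (latest offset in the partition minus the consumer group's committed offset); the function name and numbers are hypothetical, and the exact aggregation Amazon MSK uses for its published metrics is not specified here.

```python
def partition_lag(end_offsets, committed_offsets):
    """Lag per partition: latest log offset minus the group's committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


end_offsets = {0: 1200, 1: 950, 2: 1010}  # sample latest offsets per partition
committed = {0: 1150, 1: 950, 2: 700}     # sample committed offsets for one group

lag = partition_lag(end_offsets, committed)
topic_lag = sum(lag.values())             # one possible topic level aggregate
worst_partition = max(lag, key=lag.get)   # the partition that is furthest behind
```

Topic level numbers show that a group is behind; partition level numbers show where, which is why the finer metric requires enhanced monitoring.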
Q: How much does it cost to publish the consumer lag metric to Amazon CloudWatch?
Topic level metrics are included in the default set of Amazon MSK metrics, which are free of charge. Partition level metrics are charged as per Amazon CloudWatch pricing.
Q: How do I access Apache Kafka broker Logs?
You can enable broker log delivery for new and existing Amazon MSK clusters. You can deliver broker logs to Amazon CloudWatch Logs, Amazon S3, and Kinesis Data Firehose. Kinesis Data Firehose supports Amazon Elasticsearch Service among other destinations. To learn how to enable this feature, see the Amazon MSK Logging Documentation. To learn about pricing, refer to the CloudWatch Logs and Kinesis Data Firehose pricing pages.
Q: What is the logging level for broker logs?
Amazon MSK provides INFO level logs for all brokers within a cluster.
Q: How do I access Apache ZooKeeper Logs?
You can request Apache ZooKeeper logs through a support ticket.
Q: Can I log the use of Apache Kafka resource APIs, like create topic?
Yes, if you use IAM Access Control, the use of Apache Kafka resource APIs is logged to Amazon CloudTrail.
Q: What is Apache ZooKeeper?
From https://zookeeper.apache.org/: “Apache ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications,” including Apache Kafka.
Q: Does Amazon MSK use Apache ZooKeeper?
Yes, Amazon MSK uses Apache ZooKeeper and manages Apache ZooKeeper within each cluster as a part of the Amazon MSK service. Apache ZooKeeper nodes are included with each cluster at no additional cost.
Q: How do my clients interact with Apache ZooKeeper?
Your clients can interact with Apache ZooKeeper through an Apache ZooKeeper endpoint provided by the service. This endpoint is available in the Amazon Web Services management console or through the DescribeCluster API.
Q: What Amazon Web Services does Amazon MSK integrate with?
Amazon MSK integrates with:
- Amazon VPC for network isolation and security
- Amazon CloudWatch for metrics
- Amazon KMS for storage volume encryption
- Amazon IAM for authentication and authorization of Apache Kafka and service APIs
- Amazon Lambda for MSK event sourcing
- Amazon IoT for IoT event sourcing
- Amazon Glue Schema Registry for controlling the evolution of schemas used by Apache Kafka applications
- Amazon CloudTrail for Amazon API logs
- Amazon Certificate Manager for Private CAs used for client TLS authentication
- Amazon CloudFormation for describing and provisioning Amazon MSK clusters using code
- Amazon Kinesis Data Analytics for fully managed Apache Flink applications that process streaming data
- Amazon Secrets Manager for client credentials used for SASL/SCRAM authentication
Q: How can I scale up storage in my cluster?
You can scale up storage in your cluster using the Amazon Web Services Management Console or the Amazon CLI.
Q: How can I automatically expand storage in my cluster?
You can create an auto scaling policy for storage using the Amazon Web Services Management Console or by creating an Amazon Web Services Application Auto Scaling policy using the Amazon CLI or APIs.
Q: Can I scale a broker instance size in an existing cluster?
Yes. You can choose to scale to a smaller or larger broker type on your Amazon MSK clusters.
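The idea behind a storage auto scaling policy can be sketched as a simple target-utilization rule. This is a simplified model and not the algorithm Application Auto Scaling actually runs. The function name, the 60 percent target, and the 1000 GiB ceiling are hypothetical values chosen for illustration. The sketch only grows storage, in line with the scale-up wording above.

```python
import math


def next_volume_size_gib(provisioned_gib, used_gib, target_percent=60, max_gib=1000):
    """Return the per-broker volume size that brings utilization back to the target."""
    # Stay at the current size while utilization is at or below the target.
    if used_gib * 100 <= target_percent * provisioned_gib:
        return provisioned_gib
    # Smallest size at which used storage is no more than the target percentage.
    needed = math.ceil(used_gib * 100 / target_percent)
    # Never shrink, and never exceed the configured ceiling.
    return min(max(needed, provisioned_gib), max_gib)


unchanged = next_volume_size_gib(100, 50)  # 50% used, below the 60% target
expanded = next_volume_size_gib(100, 75)   # 75% used, above the target
```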
Q: How do I balance partitions across brokers?
You can use Cruise Control to automatically rebalance partitions and manage I/O heat. See the Cruise Control documentation for more information. Alternatively, you can use the Apache Kafka command-line tool kafka-reassign-partitions.sh to reassign partitions across brokers.
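For the manual route, kafka-reassign-partitions.sh takes a JSON file that lists the desired replicas for each partition. The sketch below generates such a file with a simple round-robin placement. The topic name and broker IDs are hypothetical, and round-robin is only one possible placement strategy, not the one Cruise Control uses.

```python
import json


def round_robin_reassignment(topic, partition_count, broker_ids, replication_factor):
    """Build a reassignment plan that spreads replicas evenly across brokers."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(partition_count):
        # Start each partition on the next broker and wrap around the list.
        replicas = [
            broker_ids[(partition + i) % len(broker_ids)]
            for i in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


plan = round_robin_reassignment("clickstream", 3, [1, 2, 3], 2)
plan_json = json.dumps(plan, indent=2)
```

The resulting JSON can be saved to a file and passed to the tool with `--reassignment-json-file` together with `--execute`.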
Pricing and availability
Q: How does Amazon MSK pricing work?
Pricing is per broker-hour and per GB-month of storage provisioned. Amazon Web Services data transfer rates apply for data transfer in and out of Amazon MSK. For more information, visit our pricing page.
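The pricing model above reduces to a short calculation. The rates in this sketch are hypothetical placeholders, not actual Amazon MSK prices, and the 730-hour month is an averaging assumption. Data transfer charges are left out.

```python
def monthly_cost(brokers, broker_hourly_rate, storage_gb_per_broker,
                 storage_gb_month_rate, hours_per_month=730):
    """Estimate a monthly bill from broker-hours plus provisioned GB-months."""
    broker_cost = brokers * hours_per_month * broker_hourly_rate
    storage_cost = brokers * storage_gb_per_broker * storage_gb_month_rate
    return round(broker_cost + storage_cost, 2)


# Hypothetical rates: 0.25 per broker-hour and 0.10 per GB-month.
estimate = monthly_cost(brokers=3, broker_hourly_rate=0.25,
                        storage_gb_per_broker=1000, storage_gb_month_rate=0.10)
```

With these placeholder rates, three brokers cost 547.50 in broker-hours and 300.00 in storage, for an estimate of 847.50 per month.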
Q: Do I pay for data transfer as a result of data replication?
No, all in-cluster data transfer is included with the service at no additional charge.
Q: How does data transfer pricing work?
You will pay standard Amazon Web Services data transfer charges for data transferred in and out of an Amazon MSK cluster. You will not be charged for data transfer within the cluster in a region, including data transfer between brokers and data transfer between brokers and Apache ZooKeeper nodes.
Service Level Agreement
Our Amazon MSK SLA guarantees a Monthly Uptime Percentage of at least 99.9% for Amazon MSK.
Q: How do I know if I qualify for an SLA Service Credit?
You are eligible for an SLA credit for Amazon MSK under the Amazon MSK SLA if Multi-AZ deployments on Amazon MSK have a Monthly Uptime Percentage of less than 99.9% during any monthly billing cycle.
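To get a feel for what 99.9% means in practice, the sketch below converts downtime into an uptime percentage for a 30-day billing cycle. It is simplified arithmetic: the Amazon MSK SLA defines exactly how Monthly Uptime Percentage is measured, and that definition takes precedence. The function names are hypothetical.

```python
def monthly_uptime_percentage(total_minutes, downtime_minutes):
    """Share of the billing cycle, in percent, during which the service was up."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def qualifies_for_credit(total_minutes, downtime_minutes, threshold=99.9):
    """True when uptime falls below the SLA threshold."""
    return monthly_uptime_percentage(total_minutes, downtime_minutes) < threshold


MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200 minutes

below_budget = qualifies_for_credit(MINUTES_IN_30_DAYS, 43)  # about 99.9005% uptime
over_budget = qualifies_for_credit(MINUTES_IN_30_DAYS, 44)   # about 99.8981% uptime
```

In a 30-day cycle, 0.1% of the time is 43.2 minutes, so 43 minutes of downtime stays within the threshold while 44 minutes falls below it.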