Amazon Kinesis Data Streams is a fully managed service for real-time processing of streaming data at massive scale. You can configure hundreds of thousands of data producers to continuously put data into a Kinesis stream, such as data from website clickstreams, application logs, and social media feeds. In less than a second, the data is available for your Kinesis Applications to read and process from the stream.
Amazon Kinesis Data Streams Concepts
A shard is the base throughput unit of a Kinesis stream. One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output, and can support up to 1,000 PUT records per second. You specify the number of shards you need when you create a stream. For example, a stream with two shards has a throughput of 2 MB/sec data input and 4 MB/sec data output, and allows up to 2,000 PUT records per second. As your data throughput changes, you can dynamically add shards to or remove shards from your stream via resharding.
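The per-shard arithmetic above is easy to work out programmatically. The sketch below is purely illustrative; the `stream_capacity` helper is hypothetical, not part of any AWS SDK, and simply scales the per-shard limits stated above.

```python
# Per-shard limits, as documented: 1 MB/sec in, 2 MB/sec out,
# 1,000 PUT records per second.
def stream_capacity(shard_count: int) -> dict:
    """Return the aggregate capacity of a stream with the given shard count."""
    return {
        "write_mb_per_sec": shard_count * 1,        # 1 MB/sec input per shard
        "read_mb_per_sec": shard_count * 2,         # 2 MB/sec output per shard
        "put_records_per_sec": shard_count * 1000,  # 1,000 PUT records/sec per shard
    }

# The two-shard example from the text: 2 MB/sec in, 4 MB/sec out,
# up to 2,000 PUT records per second.
print(stream_capacity(2))
```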
A record is the unit of data stored in a Kinesis stream. A record is composed of a sequence number, partition key, and data blob. A data blob is the data of interest your data producer adds to a stream. The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
A partition key is used to segregate and route data records to different shards of a stream. The partition key is specified by your data producer while putting data into a Kinesis stream. For example, assume you have a Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
A sequence number is a unique identifier for each data record. It is assigned by Amazon Kinesis Data Streams when a data producer calls the PutRecord or PutRecords API to add data to a Kinesis stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.
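Under the hood, Kinesis routes a record by taking the MD5 hash of its partition key and sending the record to the shard whose hash-key range contains the resulting 128-bit value. The following is a simplified sketch of that routing, assuming a freshly created stream whose hash-key space is split evenly across shards; `shard_for_key` is an illustrative helper, not an SDK function.

```python
import hashlib

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split
    hash-key ranges over the 2**128 hash-key space."""
    # Kinesis hashes the UTF-8 partition key with MD5 to a 128-bit integer.
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    range_size = 2 ** 128 // shard_count
    # Clamp so the topmost sliver of the hash space maps to the last shard.
    return min(hash_key // range_size, shard_count - 1)

# The same partition key always routes to the same shard:
assert shard_for_key("Key A", 2) == shard_for_key("Key A", 2)
```

This is why a well-distributed set of partition keys matters: keys that hash into the same range concentrate load on one shard.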
Use Amazon Kinesis Data Streams
After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Streams by creating a Kinesis stream, putting data into it, and building applications that read and process that data.
Put Data into Amazon Kinesis Data Streams
Data producers can put data into Kinesis streams via the PutRecord and PutRecords APIs, or via the Kinesis Producer Library (KPL). The PutRecord API accepts a single data record per API call, while the PutRecords API accepts multiple data records per API call.
Kinesis Producer Library (KPL)
The Kinesis Producer Library (KPL) is an easy-to-use, highly configurable library that helps you put data into a Kinesis stream. It presents a simple, asynchronous, and reliable interface that enables you to quickly achieve high producer throughput with minimal client resources.
A Kinesis Application is a data consumer that reads and processes data from a Kinesis stream. You can build your Kinesis Applications using either the Kinesis Data Streams API or the Kinesis Client Library (KCL).
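A consumer built directly on the Kinesis Data Streams API follows a simple loop: obtain a shard iterator, then call GetRecords repeatedly. The sketch below assumes `client` is any object exposing boto3-style `get_shard_iterator` and `get_records` methods, so it can be exercised with a stub; with real AWS credentials you would pass `boto3.client("kinesis")`. The stream and shard names are placeholders.

```python
def read_shard(client, stream_name, shard_id, handle, max_polls=3):
    """Read records from one shard and pass each data blob to `handle`."""
    # Start from the oldest available record in the shard.
    it = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    for _ in range(max_polls):
        resp = client.get_records(ShardIterator=it, Limit=100)
        for record in resp["Records"]:
            handle(record["Data"])  # process each data blob
        # The next iterator is absent/None once the shard is closed
        # (for example, after resharding).
        it = resp.get("NextShardIterator")
        if it is None:
            break
```

This is exactly the bookkeeping (iterator tracking, shard discovery, checkpointing, load balancing across workers) that the KCL, described next, takes off your hands.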
Kinesis Client Library (KCL)
The Kinesis Client Library (KCL) is a pre-built library that helps you easily build Kinesis Applications for reading and processing data from a Kinesis stream. The KCL handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault tolerance, enabling you to focus on business logic while building Kinesis Applications.
Kinesis Connector Library
The Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. The Kinesis Client Library (KCL) is required to use the Kinesis Connector Library. The current version of this library provides connectors to Amazon DynamoDB, Amazon S3, and Elasticsearch. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.
Kinesis Storm Spout
Kinesis Storm Spout is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with Apache Storm. The current version of Kinesis Storm Spout fetches data from a Kinesis stream and emits it as tuples. You add the spout to your Storm topology to leverage Amazon Kinesis Data Streams as a reliable, scalable, stream capture, storage, and replay service.
Amazon CloudWatch Integration
Amazon Kinesis Data Streams integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Kinesis streams. For more information about Amazon Kinesis Data Streams metrics, see Monitoring Amazon Kinesis with Amazon CloudWatch.
Amazon IAM Integration
Amazon Kinesis Data Streams integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to put data into your Kinesis stream. For more information about access management and control of your Kinesis stream, see Controlling Access to Amazon Kinesis Resources using IAM.
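An IAM policy of the kind described above might look like the following sketch, which allows only the PutRecord and PutRecords actions on a single stream. The region, account ID, and stream name are placeholders you would replace with your own values.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
      "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
    }
  ]
}
```

Attaching this policy to a user or group limits that principal to putting data into the named stream, with no ability to read from or administer it.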
AWS CloudTrail Integration
Amazon Kinesis Data Streams integrates with AWS CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis Data Streams APIs, see Logging Kinesis Streams API calls Using Amazon CloudTrail.
Amazon Kinesis Data Streams allows you to tag your Kinesis streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your Kinesis streams by cost centers so that you can categorize and track your Amazon Kinesis Data Streams costs based on cost centers. For more information about Kinesis Data Streams tagging, see Tagging Your Kinesis Streams.