Amazon Kinesis Data Streams Concepts

Shard

A shard is the base throughput unit of a Kinesis stream. One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output, and supports up to 1,000 PUT records per second. You specify the number of shards needed when you create a stream. For example, a stream with two shards has a throughput of 2 MB/sec data input and 4 MB/sec data output, and allows up to 2,000 PUT records per second. As your data throughput changes, you can dynamically add or remove shards from your stream via resharding.
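
The sizing rule above can be sketched as a small helper. This is a minimal illustration (the function name is hypothetical), using only the per-shard limits quoted in this section:

```python
import math

# Per-shard limits described above: 1 MB/sec input, 1,000 PUT records/sec.
SHARD_INPUT_MB_PER_SEC = 1
SHARD_RECORDS_PER_SEC = 1000

def shards_needed(input_mb_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that satisfies both the bandwidth and record-rate limits."""
    by_bandwidth = math.ceil(input_mb_per_sec / SHARD_INPUT_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / SHARD_RECORDS_PER_SEC)
    return max(by_bandwidth, by_records, 1)

# A producer writing 4.5 MB/sec across 3,000 records/sec needs 5 shards,
# because bandwidth (ceil(4.5) = 5) is the binding limit here.
print(shards_needed(4.5, 3000))  # → 5
```

Whichever limit is hit first, bandwidth or record rate, determines the shard count; the resulting stream then also gets the corresponding output capacity (2 MB/sec per shard).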

Data Record

A record is the unit of data stored in a Kinesis stream. A record is composed of a sequence number, partition key, and data blob. A data blob is the data of interest your data producer adds to a stream. The maximum size of a data blob (the data payload after Base64-decoding) is 1 megabyte (MB).
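
As a sketch of the producer-side view of a record (the helper name is hypothetical; the sequence number is absent because the service assigns it):

```python
MAX_BLOB_BYTES = 1024 * 1024  # 1 MB payload limit, measured after Base64 decoding

def make_record(partition_key: str, blob: bytes) -> dict:
    """Build the producer-supplied parts of a record: partition key plus data blob."""
    if len(blob) > MAX_BLOB_BYTES:
        raise ValueError("data blob exceeds the 1 MB limit")
    return {"PartitionKey": partition_key, "Data": blob}

rec = make_record("sensor-42", b'{"temp": 21.5}')
```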

Partition Key

A partition key is used to segregate and route data records to different shards of a stream. The partition key is specified by your data producer while putting data into a Kinesis stream. For example, assume you have a Kinesis stream with two shards (Shard 1 and Shard 2). You can configure your data producer to use two partition keys (Key A and Key B) so that all data records with Key A are added to Shard 1 and all data records with Key B are added to Shard 2.
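
Under the hood, Kinesis maps each partition key to a 128-bit integer via an MD5 hash and routes the record to the shard whose hash key range contains that integer. A rough sketch of the idea, assuming two shards splitting the key space evenly (function names are illustrative, not part of any Kinesis API):

```python
import hashlib

def hash_key(partition_key: str) -> int:
    # MD5 digest of the UTF-8 partition key, read as a 128-bit unsigned integer.
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

MID = 2**127  # boundary between two equal hash key ranges

def shard_for(partition_key: str) -> str:
    return "Shard 1" if hash_key(partition_key) < MID else "Shard 2"
```

The same partition key always hashes to the same shard, which is what keeps records for one key ordered; note that with hashing you control grouping, not which specific shard a key lands on.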

Sequence Number

A sequence number is a unique identifier for each data record. The sequence number is assigned by Amazon Kinesis Data Streams when a data producer calls the PutRecord or PutRecords API to add data to a Kinesis stream. Sequence numbers for the same partition key generally increase over time; the longer the time period between PutRecord or PutRecords requests, the larger the sequence numbers become.

Use Amazon Kinesis Data Streams

After you sign up for Amazon Web Services, you can start using Amazon Kinesis Data Streams by creating a Kinesis stream, putting data into the stream, and building Kinesis Applications that read and process the data.

Put Data into Amazon Kinesis Data Streams

Data producers can put data into Kinesis streams via the PutRecord and PutRecords APIs, or the Kinesis Producer Library (KPL). The PutRecord API accepts a single data record per API call, while the PutRecords API accepts multiple data records per API call.
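
A minimal sketch of a PutRecords batch, written so the client object is injected (any object exposing the boto3-style `put_records` method works, which also makes the logic testable without AWS credentials):

```python
def put_batch(client, stream_name: str, items: list) -> int:
    """Send one PutRecords batch; returns the number of records that failed.

    `items` is a list of (partition_key, data_bytes) pairs. `client` is assumed
    to expose boto3's `put_records(StreamName=..., Records=...)` signature.
    """
    records = [{"Data": data, "PartitionKey": key} for key, data in items]
    response = client.put_records(StreamName=stream_name, Records=records)
    # PutRecords is not all-or-nothing: individual records can fail, so callers
    # should inspect FailedRecordCount and retry the entries with an ErrorCode.
    return response.get("FailedRecordCount", 0)
```

With a real boto3 client (`boto3.client("kinesis")`) the same call goes to the service; the partial-failure handling is the important part, since a nonzero FailedRecordCount with a 200 response is normal under throttling.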

Kinesis Producer Library (KPL)

The Kinesis Producer Library (KPL) is an easy-to-use, highly configurable library that helps you put data into a Kinesis stream. It presents a simple, asynchronous, and reliable interface that enables you to quickly achieve high producer throughput with minimal client resources.

Build Amazon Kinesis Applications

A Kinesis Application is a data consumer that reads and processes data from a Kinesis stream. You can build your Kinesis Applications using either the Kinesis Data Streams API or the Kinesis Client Library (KCL).
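
Using the raw API, a consumer obtains a shard iterator and then pages through GetRecords calls. A sketch of that loop, with the client injected so it runs against a stub as well as a real boto3 client (the function name and batch limit are illustrative):

```python
def read_shard(client, stream_name: str, shard_id: str, max_batches: int = 3) -> list:
    """Read a few GetRecords batches from one shard; returns the data blobs.

    `client` is assumed to expose boto3's `get_shard_iterator` and
    `get_records` signatures.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest available record
    )["ShardIterator"]
    payloads = []
    for _ in range(max_batches):
        response = client.get_records(ShardIterator=iterator, Limit=100)
        payloads.extend(record["Data"] for record in response["Records"])
        iterator = response.get("NextShardIterator")
        if not iterator:  # the shard was closed, e.g. by a reshard
            break
    return payloads
```

A production consumer would also checkpoint its position and handle multiple shards; that bookkeeping is exactly what the KCL, described next, takes off your hands.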

Kinesis Client Library (KCL)

The Kinesis Client Library (KCL) is a pre-built library that helps you easily build Kinesis Applications for reading and processing data from a Kinesis stream. It handles complex issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault tolerance, enabling you to focus on business logic while building Kinesis Applications.

Kinesis Connector Library

The Kinesis Connector Library is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with other AWS services and third-party tools. The Kinesis Client Library (KCL) is required for using the Kinesis Connector Library. The current version of this library provides connectors to Amazon DynamoDB, Amazon S3, and Elasticsearch. The library also includes sample connectors of each type, plus Apache Ant build files for running the samples.

Kinesis Storm Spout

The Kinesis Storm Spout is a pre-built library that helps you easily integrate Amazon Kinesis Data Streams with Apache Storm. The current version fetches data from a Kinesis stream and emits it as tuples. You add the spout to your Storm topology to leverage Amazon Kinesis Data Streams as a reliable, scalable, stream capture, storage, and replay service.

Management Features

Amazon CloudWatch Integration

Amazon Kinesis Data Streams integrates with Amazon CloudWatch so that you can collect, view, and analyze CloudWatch metrics for your Kinesis streams. For more information about Amazon Kinesis Data Streams metrics, see Monitoring Amazon Kinesis with Amazon CloudWatch.

AWS IAM Integration

Amazon Kinesis Data Streams integrates with AWS Identity and Access Management (IAM), a service that enables you to securely control access to your AWS services and resources for your users. For example, you can create a policy that only allows a specific user or group to put data into your Kinesis stream. For more information about access management and control of your Kinesis stream, see Controlling Access to Amazon Kinesis Resources using IAM.
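
For illustration, a policy of the kind described above might look as follows; the stream name, region, and account ID here are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
      "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/example-stream"
    }
  ]
}
```

Attached to a user or group, this grants permission only to put data into the named stream, with no ability to read from or manage it.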

AWS CloudTrail Integration

Amazon Kinesis Data Streams integrates with AWS CloudTrail, a service that records AWS API calls for your account and delivers log files to you. For more information about API call logging and a list of supported Amazon Kinesis Data Streams APIs, see Logging Kinesis Streams API Calls Using AWS CloudTrail.

Tagging Support

Amazon Kinesis Data Streams allows you to tag your Kinesis streams for easier resource and cost management. A tag is a user-defined label expressed as a key-value pair that helps organize AWS resources. For example, you can tag your Kinesis streams by cost centers so that you can categorize and track your Amazon Kinesis Data Streams costs based on cost centers. For more information about Kinesis Data Streams tagging, see Tagging Your Kinesis Streams.

Intended Usage and Restrictions

Your use of this service is subject to the AWS Customer Agreement.
