Amazon Kinesis Data Firehose is the easiest way to capture, transform, and load streaming data into data stores and analytics tools. Kinesis Data Firehose is a fully managed service that makes it easy to capture, transform, and load massive volumes of streaming data from hundreds of thousands of sources into Amazon S3, Amazon Redshift, Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), Amazon Kinesis Data Analytics, generic HTTP endpoints, and service providers such as Datadog, New Relic, MongoDB, and Splunk, enabling near real-time analytics and insights.
Easy to use
You can launch Amazon Kinesis Data Firehose and create a delivery stream to load data into Amazon S3, Amazon Redshift, Amazon OpenSearch Service, HTTP endpoints, Datadog, New Relic, MongoDB, or Splunk with just a few clicks in the Amazon Web Services Management Console. You can send data to the delivery stream by writing to the Firehose API directly, or configure the delivery stream to consume from a Kinesis Data Stream. Kinesis Data Firehose then continuously loads the data into the destination you specified.
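As a minimal sketch, sending a record through the API with the Python SDK (boto3) could look like the following; the stream name and payload are placeholders:

```python
import json

# Hypothetical delivery stream name -- substitute your own.
STREAM_NAME = "my-delivery-stream"

def build_record(event: dict) -> dict:
    # Firehose expects each record's payload as bytes; newline-delimited
    # JSON is a common convention so downstream tools can split records.
    return {"Data": (json.dumps(event) + "\n").encode("utf-8")}

record = build_record({"ticker": "AMZN", "price": 174.5})

# The call itself needs AWS credentials, so it is shown but not run here:
# import boto3
# firehose = boto3.client("firehose")
# firehose.put_record(DeliveryStreamName=STREAM_NAME, Record=record)
```

At higher volumes, multiple records can be sent in one request with `put_record_batch` for better throughput.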
You only pay for the volume of data you transmit through Kinesis Data Firehose. There are no minimum fees or upfront commitments.
Amazon Kinesis Data Firehose always encrypts data in transit using HTTPS, supports encryption at rest, and gives you the option to have your data automatically encrypted after it is uploaded to the destination.
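For encryption at rest, the Firehose API exposes a start-encryption operation on an existing delivery stream. A hedged sketch follows; the stream name is a placeholder, and the call itself requires credentials, so it is shown commented out:

```python
# Server-side encryption settings for a delivery stream. A service-owned
# key is the simplest option; "CUSTOMER_MANAGED_CMK" with a "KeyARN" is
# the alternative for keys you manage yourself.
sse_config = {"KeyType": "AWS_OWNED_CMK"}

# import boto3
# firehose = boto3.client("firehose")
# firehose.start_delivery_stream_encryption(
#     DeliveryStreamName="my-delivery-stream",  # placeholder name
#     DeliveryStreamEncryptionConfigurationInput=sse_config,
# )
```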
Based on your ingestion pattern, the Kinesis Data Firehose service might proactively increase the limits of your delivery stream when it observes excessive throttling.
Inline JSON to Parquet or ORC format conversion
Columnar data formats such as Apache Parquet and Apache ORC are optimized for cost-effective storage and analytics using services such as Amazon Athena, Amazon Redshift Spectrum, Amazon EMR, and other Hadoop-based tools. Amazon Kinesis Data Firehose can convert the format of incoming data from JSON to Parquet or ORC before storing the data in Amazon S3, so you can save storage and analytics costs.
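A sketch of what the conversion settings look like in a delivery stream's Extended S3 destination configuration; the Glue database, table, and role names are placeholder assumptions:

```python
# Format-conversion block for an Extended S3 destination, as passed to
# create_delivery_stream. All names below are illustrative examples.
format_conversion = {
    "Enabled": True,
    "InputFormatConfiguration": {
        # Parse incoming records as JSON.
        "Deserializer": {"OpenXJsonSerDe": {}},
    },
    "OutputFormatConfiguration": {
        # Write Apache Parquet; {"OrcSerDe": {}} selects ORC instead.
        "Serializer": {"ParquetSerDe": {}},
    },
    "SchemaConfiguration": {
        # The output schema comes from an AWS Glue table (placeholders).
        "DatabaseName": "streaming_db",
        "TableName": "events",
        "RoleARN": "arn:aws:iam::111122223333:role/example-firehose-role",
    },
}
```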
Dynamically partition data during delivery to Amazon S3
Kinesis Data Firehose can continuously partition streaming data by keys within the data, such as “customer_id” or “transaction_id”, and deliver data grouped by these keys into corresponding Amazon S3 prefixes. This makes it easier for you to perform high-performance, cost-efficient analytics on streaming data in Amazon S3 using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
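A sketch of the corresponding dynamic-partitioning settings for an Extended S3 destination, assuming JSON records carry a `customer_id` field:

```python
# Dynamic-partitioning block for an Extended S3 destination. A JQ
# expression extracts the partition key from each JSON record, and the
# S3 prefix references that key so deliveries are grouped by customer.
dynamic_partitioning = {
    "DynamicPartitioningConfiguration": {"Enabled": True},
    "ProcessingConfiguration": {
        "Enabled": True,
        "Processors": [{
            "Type": "MetadataExtraction",
            "Parameters": [
                {"ParameterName": "MetadataExtractionQuery",
                 "ParameterValue": "{customer_id: .customer_id}"},
                {"ParameterName": "JsonParsingEngine",
                 "ParameterValue": "JQ-1.6"},
            ],
        }],
    },
    # Each extracted key value becomes its own S3 prefix.
    "Prefix": "customer_id=!{partitionKeyFromQuery:customer_id}/",
}
```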
Load data in near real time
You can specify a batch size or batch interval to control how quickly data is uploaded to destinations. For example, you can set the batch interval to 60 seconds if you want to receive new data within 60 seconds of sending it to your delivery stream. Additionally, you can specify whether data should be compressed; the service supports common compression algorithms, including GZIP and Snappy. Batching and compressing data before upload gives you control over how quickly new data arrives at the destinations.
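The batching and compression settings described above map to a couple of fields in the S3 destination configuration; a sketch with illustrative values:

```python
# Buffering and compression settings for an S3 destination. Firehose
# flushes a batch when either buffering threshold is reached first.
s3_buffering = {
    "BufferingHints": {
        "IntervalInSeconds": 60,  # flush at least once per minute
        "SizeInMBs": 5,           # or sooner, once 5 MB has accumulated
    },
    "CompressionFormat": "GZIP",  # "Snappy" and "UNCOMPRESSED" also exist
}
```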
Custom data transformations
You can configure Amazon Kinesis Data Firehose to prepare your streaming data before it is loaded to data stores. Simply select an Amazon Lambda function from the Amazon Kinesis Data Firehose delivery stream configuration tab in the Amazon Web Services Management Console. Amazon Kinesis Data Firehose will automatically apply that function to every input data record and load the transformed data to destinations. Amazon Kinesis Data Firehose provides pre-built Lambda blueprints for converting common data sources such as Apache logs and system logs to JSON and CSV formats. You can use these pre-built blueprints as-is, customize them further, or write your own custom functions. You can also configure Amazon Kinesis Data Firehose to automatically retry failed jobs and back up the raw streaming data.
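A minimal transformation function, following the record contract Firehose uses when invoking Lambda (base64-encoded `data`, with `recordId` and a `result` status echoed back); the upper-casing transformation itself is just a placeholder:

```python
import base64

def handler(event, context):
    """Firehose data-transformation handler.

    Each incoming record's payload is base64-decoded, transformed, and
    returned re-encoded under the same recordId so Firehose can match
    results to inputs.
    """
    output = []
    for record in event["records"]:
        payload = base64.b64decode(record["data"]).decode("utf-8")
        transformed = payload.upper()  # placeholder transformation
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",  # "Dropped" / "ProcessingFailed" are the alternatives
            "data": base64.b64encode(transformed.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```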
Supports multiple destinations
Amazon Kinesis Data Firehose currently supports Amazon S3, Amazon Redshift, Amazon OpenSearch Service, HTTP endpoints, Datadog, New Relic, MongoDB, and Splunk as destinations. You can specify the destination Amazon S3 bucket, Amazon Redshift cluster, Amazon OpenSearch Service domain, generic HTTP endpoint, or service provider where the data should be loaded.
Intended usage and restrictions
Your use of this service is subject to the Amazon Web Services Customer Agreement.