Amazon Data Firehose

Prepare and load real-time data streams into data stores and analytics tools

Amazon Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon OpenSearch Service (successor to Amazon Elasticsearch Service), generic HTTP endpoints, and service providers like Datadog, New Relic, and MongoDB. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, transform, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security.

You can create a Firehose stream from the Amazon Web Services Management Console, configure it with a few clicks, and within minutes start sending data from hundreds of thousands of data sources to be loaded continuously into Amazon Web Services. You can also configure your Firehose stream to automatically convert incoming data to columnar formats such as Apache Parquet and Apache ORC before it is delivered to Amazon S3, for cost-effective storage and analytics.

With Firehose, you only pay for the amount of data you transmit through the service, and if applicable, for data format conversion and VPC delivery. There is no minimum fee or setup cost.

Benefits

Easy to use

Amazon Data Firehose provides a simple way to capture, transform, and load streaming data with just a few clicks in the Amazon Web Services Management Console. You create a Firehose stream, select the destinations, and start sending real-time data from hundreds of thousands of data sources simultaneously. The service takes care of stream management, including the scaling, sharding, and monitoring needed to continuously load the data to destinations at the intervals you specify.
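As a rough illustration of what "start sending data" can look like from a producer, the sketch below groups JSON events into batches and hands them to the `PutRecordBatch` API. It assumes a boto3-style Firehose client is passed in. The stream name, the event fields, and the helper names (`encode_event`, `chunk_records`, `send_events`) are hypothetical, and the per-call limits reflect our understanding of the API quotas, so check the current API reference before relying on them.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# Assumed PutRecordBatch quotas; verify against the current API reference.
MAX_RECORDS_PER_CALL = 500
MAX_BYTES_PER_CALL = 4 * 1024 * 1024


def encode_event(event: Dict[str, Any]) -> bytes:
    """Serialize one event as a newline-terminated JSON line.

    Firehose concatenates records on delivery, so the producer adds
    the delimiter itself.
    """
    return (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")


def chunk_records(blobs: Iterable[bytes]) -> Iterator[List[Dict[str, bytes]]]:
    """Group encoded records into batches that respect both assumed limits."""
    batch: List[Dict[str, bytes]] = []
    batch_bytes = 0
    for blob in blobs:
        too_many = len(batch) >= MAX_RECORDS_PER_CALL
        too_big = batch_bytes + len(blob) > MAX_BYTES_PER_CALL
        if batch and (too_many or too_big):
            yield batch
            batch, batch_bytes = [], 0
        batch.append({"Data": blob})
        batch_bytes += len(blob)
    if batch:
        yield batch


def send_events(client: Any, stream_name: str,
                events: Iterable[Dict[str, Any]]) -> int:
    """Send events in batches; return how many records the service rejected."""
    failed = 0
    for batch in chunk_records(encode_event(e) for e in events):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        # A call can partially succeed; FailedPutCount reports rejected records.
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, a caller might create the client with `boto3.client("firehose")` and call `send_events(client, "example-stream", events)`. A production producer would also retry the records counted as failed, which this sketch leaves out.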

Integrated with Amazon Web Services services and service providers

Amazon Data Firehose is integrated with Amazon S3, Amazon Redshift, and Amazon OpenSearch Service. It can also deliver data to generic HTTP endpoints and directly to service providers like Datadog, New Relic, MongoDB, and Splunk. From the Amazon Web Services Management Console, you can point Firehose to an Amazon S3 bucket, Amazon Redshift table, or Amazon OpenSearch Service domain. You can then use your existing analytics applications and tools to analyze streaming data.

Serverless data transformation

Amazon Data Firehose enables you to prepare your streaming data before it is loaded to data stores. With Firehose, you can easily convert raw streaming data from your data sources into formats required by your destination data stores, without having to build your own data processing pipelines.
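One way Firehose supports this kind of preparation is by invoking an AWS Lambda function on each batch of buffered records. The sketch below shows the general shape of such a function: it decodes each base64 record, drops some records, enriches the rest, and returns every record ID with a result status. The `level` and `source` fields and the `example-app` value are hypothetical and stand in for whatever your data actually contains.

```python
import base64
import json
from typing import Any, Dict, List


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Transform a batch of Firehose records (illustrative logic only)."""
    output: List[Dict[str, str]] = []
    for record in event["records"]:
        record_id = record["recordId"]
        try:
            payload = json.loads(base64.b64decode(record["data"]))
        except ValueError:
            payload = None

        if not isinstance(payload, dict):
            # Unparseable input: hand the original data back as a failure.
            output.append({"recordId": record_id,
                           "result": "ProcessingFailed",
                           "data": record["data"]})
            continue

        if payload.get("level") == "DEBUG":  # hypothetical filter rule
            output.append({"recordId": record_id,
                           "result": "Dropped",
                           "data": record["data"]})
            continue

        payload["source"] = "example-app"  # hypothetical enrichment
        transformed = (json.dumps(payload) + "\n").encode("utf-8")
        output.append({"recordId": record_id,
                       "result": "Ok",
                       "data": base64.b64encode(transformed).decode("ascii")})
    return {"records": output}
```

Every incoming record ID appears exactly once in the response, marked `Ok`, `Dropped`, or `ProcessingFailed`, which is what lets the service account for each record it handed to the function.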

Near real-time

Amazon Data Firehose captures and loads data in near real time. It loads new data into Amazon S3, Amazon Redshift, and Amazon OpenSearch Service within 60 seconds after the data is sent to the service. As a result, you can access new data sooner and react to business and operational events faster.
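Delivery latency in a buffered pipeline like this is governed by two thresholds, a buffer size and a buffer interval, with delivery triggered by whichever is reached first. The toy model below illustrates that rule only. It is not the service's implementation: the class name and thresholds are made up, and unlike a real service it checks the age of the buffer only when a new record arrives.

```python
from typing import List, Optional


class BufferModel:
    """Toy model of size-or-interval buffering (illustrative only)."""

    def __init__(self, max_bytes: int, max_age_seconds: float) -> None:
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self._records: List[bytes] = []
        self._bytes = 0
        self._opened_at: Optional[float] = None

    def add(self, record: bytes, now: float) -> Optional[List[bytes]]:
        """Buffer one record; return the flushed batch if a threshold is hit."""
        if self._opened_at is None:
            self._opened_at = now  # the interval starts with the first record
        self._records.append(record)
        self._bytes += len(record)

        size_hit = self._bytes >= self.max_bytes
        age_hit = now - self._opened_at >= self.max_age_seconds
        if size_hit or age_hit:
            return self.flush()
        return None

    def flush(self) -> List[bytes]:
        """Return the buffered records and reset the buffer."""
        batch = self._records
        self._records, self._bytes, self._opened_at = [], 0, None
        return batch
```

Passing the clock in as `now` keeps the model deterministic, so the effect of a smaller interval or a smaller size threshold on how soon data leaves the buffer is easy to try out.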

No ongoing administration

Amazon Data Firehose is a fully managed service that automatically provisions, manages, and scales the compute, memory, and network resources required to load your streaming data. Once set up, Firehose loads data continuously as it arrives.

Pay only for what you use

With Amazon Data Firehose, you pay only for the volume of data you transmit through the service and, if applicable, for data format conversion and VPC delivery. There are no minimum fees or upfront commitments.

Use cases

Amazon Data Firehose can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, and Amazon OpenSearch Service, enabling near real-time analytics with the business intelligence tools and dashboards you’re already using. Below are examples of key use cases that our customers tackle using Amazon Data Firehose.

IoT Analytics

With Amazon Data Firehose, you can capture data continuously from connected devices such as consumer appliances, embedded sensors, and TV set-top boxes. Firehose loads the data into Amazon S3, Amazon Redshift, and Amazon OpenSearch Service, enabling you to provide your customers near real-time access to metrics, insights, and dashboards.

Clickstream Analytics

You can use Amazon Data Firehose to deliver real-time metrics on digital content, helping authors and marketers connect with their customers more effectively. You can stream billions of small messages that are compressed, encrypted, and delivered to Amazon OpenSearch Service and Amazon Redshift. From there, you can aggregate, filter, and process the data, and refresh content performance dashboards in near real time. For example, Hearst Corporation built a clickstream analytics platform using Firehose to transmit and process 30 terabytes of data per day from 300+ websites worldwide. With this platform, Hearst can make the entire data stream, from website clicks to aggregated metrics, available to editors in minutes.

Log Analytics

Log data from your applications and servers, whether they run in the cloud or on premises, can help you monitor your applications and troubleshoot issues quickly. For example, you can detect application errors as they happen and identify the root cause by collecting, monitoring, and analyzing log data. You can install and configure the Amazon Kinesis Agent on your servers to automatically watch application and server log files and send the data to Firehose. Firehose continuously streams the log data to Amazon OpenSearch Service, so you can visualize and analyze the data with Kibana.
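The Amazon Kinesis Agent is driven by a JSON configuration file that lists "flows", each pairing a log file pattern with a destination stream. The sketch below generates a minimal configuration of that shape. The log path and stream name are hypothetical, and the file location mentioned in the comment is the agent's usual default as we understand it, so confirm both against the agent documentation for your installation.

```python
import json
from typing import Any, Dict


def build_agent_config(file_pattern: str, stream_name: str) -> Dict[str, Any]:
    """Build a minimal Kinesis Agent config with one Firehose flow."""
    return {
        "flows": [
            {
                # Glob for the log files the agent should tail.
                "filePattern": file_pattern,
                # Name of the Firehose stream that receives the lines.
                "deliveryStream": stream_name,
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical path and stream name; the output would typically be
    # saved as /etc/aws-kinesis/agent.json (assumed default location).
    config = build_agent_config("/var/log/example-app/*.log",
                                "example-log-stream")
    print(json.dumps(config, indent=2))
```

Generating the file from code rather than editing it by hand can help when the same flow needs to be rolled out to many servers with different log paths.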