Posted On: Aug 31, 2021

Amazon Kinesis Data Firehose can now continuously partition streaming data by keys within the data, such as “customer_id” or “transaction_id”, and deliver the data grouped by these keys into corresponding Amazon Simple Storage Service (Amazon S3) prefixes. This makes it easier to run high-performance, cost-efficient analytics on streaming data in Amazon S3 using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

Data partitioning is a best practice for optimizing performance and reducing the cost of analytics queries on Amazon S3 because partitions minimize the amount of data scanned. Partitioning also increases the granularity with which you can control access to data. For example, you may want to grant specific applications access only to data for a given “customer_id” or for a “device_type.IoT”. Traditionally, customers used Kinesis Data Firehose delivery streams to capture and load their data streams into Amazon S3. To partition a streaming data set for Amazon S3-based analytics, they then ran partitioning applications between buckets before making the data available for analysis. This additional partitioning step delayed insights by minutes or hours, increased cost, and complicated architectures.

Now, with Dynamic Partitioning, Kinesis Data Firehose continuously groups data in transit by dynamically or statically defined data keys and delivers it to individual Amazon S3 prefixes by key. This reduces time-to-insight by minutes or hours, lowers costs, and simplifies architectures. Combined with Kinesis Data Firehose's Apache Parquet and Apache ORC format conversion feature, Dynamic Partitioning makes Kinesis Data Firehose an ideal option for capturing, preparing, and loading data that is ready for Amazon S3 analytics.
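To make the idea of grouping by key concrete, here is a minimal Python sketch of what partitioning records into key-based prefixes looks like conceptually. It is an illustration only, not how Kinesis Data Firehose is implemented or configured: the function name `partition_records`, the `data/` base prefix, the `key=value/` prefix layout, and the sample records are all assumptions made for this example.

```python
import json
from collections import defaultdict


def partition_records(records, key_name, base_prefix="data/"):
    """Group JSON records by the value of key_name.

    Returns a dict mapping a hypothetical S3-style prefix to the list of
    parsed records that share that key value. Records missing the key
    are grouped under "unknown" (an assumption made for this sketch).
    """
    groups = defaultdict(list)
    for raw in records:
        record = json.loads(raw)
        key_value = record.get(key_name, "unknown")
        # Assumed prefix layout: <base>/<key>=<value>/
        prefix = f"{base_prefix}{key_name}={key_value}/"
        groups[prefix].append(record)
    return dict(groups)


# Hypothetical sample stream of JSON records
sample = [
    '{"customer_id": "c1", "amount": 10}',
    '{"customer_id": "c2", "amount": 5}',
    '{"customer_id": "c1", "amount": 7}',
]

partitions = partition_records(sample, "customer_id")
for prefix, items in sorted(partitions.items()):
    print(prefix, len(items))
```

With records laid out this way, a query or access policy scoped to one customer only needs to touch the single prefix for that key rather than the whole data set.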

Visit the Kinesis Data Firehose user guide to get started with dynamic partitioning, or visit the pricing page to learn more about on-demand pricing for dynamic partitioning. Dynamic partitioning can be used in the Amazon Web Services China (Beijing) Region, operated by Sinnet, and the Amazon Web Services China (Ningxia) Region, operated by NWCD, as well as in all commercial Amazon regions where Kinesis Data Firehose is available.