Posted On: Dec 4, 2018

Customers can now get Amazon S3 Inventory reports in Apache Parquet file format. Amazon S3 Inventory provides a flat file list of objects and selected metadata for your bucket or shared prefixes. You can use S3 Inventory to list, audit and report on the status of your objects, or use it to simplify and speed up business workflows and big data jobs.

Parquet is a columnar storage file format, similar to ORC (Optimized Row Columnar), and is available to any project in the Hadoop ecosystem regardless of the choice of data processing framework, data model or programming language. The columnar format lets the reader read, decompress, and process only the columns that are required for the current query. To query S3 Inventory with applications such as Amazon Athena or Redshift Spectrum, or with tools such as Apache Hive, Spark, HBase or Presto, we recommend configuring your inventory in either Parquet or ORC for faster query performance and lower query cost.

Parquet output format for S3 Inventory is now available in Amazon Web Services China (Beijing) Region, operated by SINNET, and in Amazon Web Services China (Ningxia) Region, operated by NWCD. You can get started by visiting the Amazon Web Services Management Console or by using the API, CLI or SDK to set your Inventory configuration.
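
As an illustration of the SDK route, the following is a minimal sketch in Python that builds an inventory configuration requesting Parquet output and, optionally, applies it with boto3. The bucket names, the configuration ID, the helper function names, the daily schedule and the chosen optional fields are all hypothetical placeholders, not values from this announcement. The `aws-cn` ARN partition and the `cn-north-1` region name are assumed here because the announcement covers the China (Beijing) and China (Ningxia) Regions.

```python
from typing import Any, Dict, List, Optional


def build_inventory_configuration(
    config_id: str,
    destination_bucket: str,
    destination_prefix: str = "inventory",
    partition: str = "aws-cn",  # assumption: ARN partition for the China Regions
    optional_fields: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build an S3 Inventory configuration that requests Parquet output.

    All argument values are illustrative; adjust them to your own setup.
    """
    fields = optional_fields or ["Size", "LastModifiedDate", "StorageClass"]
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        "OptionalFields": list(fields),
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:{partition}:s3:::{destination_bucket}",
                "Format": "Parquet",  # other accepted values are "CSV" and "ORC"
                "Prefix": destination_prefix,
            }
        },
    }


def apply_inventory_configuration(
    source_bucket: str,
    configuration: Dict[str, Any],
    region_name: str = "cn-north-1",  # assumption: China (Beijing) Region
) -> None:
    """Send the configuration to S3. Requires boto3 and valid credentials."""
    import boto3  # imported here so the builder above works without boto3

    s3 = boto3.client("s3", region_name=region_name)
    s3.put_bucket_inventory_configuration(
        Bucket=source_bucket,
        Id=configuration["Id"],
        InventoryConfiguration=configuration,
    )


if __name__ == "__main__":
    # Hypothetical names; printing the configuration makes no AWS calls.
    config = build_inventory_configuration(
        "daily-parquet", "example-inventory-reports"
    )
    print(config)
```

Calling `apply_inventory_configuration("example-source-bucket", config)` would submit the configuration; the destination bucket also needs a bucket policy that allows Amazon S3 to write the inventory files to it.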