Features and benefits

Interactive query performance

Presto uses a custom query execution engine with operators designed to support SQL semantics. Unlike Hive/MapReduce, Presto executes queries in memory and pipelines data across the network between stages, which avoids unnecessary I/O. The pipelined execution model runs multiple stages in parallel and streams data from one stage to the next as it becomes available.
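The streaming idea behind pipelined execution can be illustrated with a small Python sketch. This is not Presto's engine, only an analogy built on generators: the stage names (`scan`, `keep_large`, `project`), the sample rows, and the `trace` list are all hypothetical and exist purely to show that a row can reach the last stage before the first stage has finished reading its input.

```python
# Illustrative analogy only: Python generators standing in for query stages.
# None of these names come from Presto; they are hypothetical.

trace = []  # records the order in which stages touch each row


def scan(rows):
    """First stage: read rows one at a time."""
    for row in rows:
        trace.append(("scan", row["id"]))
        yield row


def keep_large(rows, min_amount):
    """Middle stage: pass through rows that meet the threshold."""
    for row in rows:
        if row["amount"] >= min_amount:
            trace.append(("filter", row["id"]))
            yield row


def project(rows):
    """Last stage: keep only the columns the query asked for."""
    for row in rows:
        trace.append(("project", row["id"]))
        yield (row["region"], row["amount"])


orders = [
    {"id": 1, "region": "us-east", "amount": 250},
    {"id": 2, "region": "eu-west", "amount": 40},
    {"id": 3, "region": "us-east", "amount": 120},
]

# Each row flows through every stage as soon as it is available;
# no stage materializes its full output before the next one starts.
results = list(project(keep_large(scan(orders), min_amount=100)))
```

Inspecting `trace` after the run shows row 1 being projected before row 2 is scanned, which is the property the paragraph above describes. A real engine additionally runs stages in parallel across workers, which this single-threaded sketch does not attempt to model.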

Ease of use

You can launch an Amazon EMR cluster running Presto in minutes. You don’t need to worry about node provisioning, cluster setup, configuration, or cluster tuning. Amazon EMR takes care of these tasks so you can focus on analysis. You can also use tools such as Airpal, a web-based query execution tool open-sourced by Airbnb. Airpal’s user interface simplifies data exploration and ad hoc analysis, with features such as syntax highlighting, CSV export of results, saved queries, and a table explorer for viewing schemas.

Integration with Amazon EMR feature set

Run interactive queries that directly access data in Amazon S3, save costs using Amazon EC2 Spot Instance capacity, use EMR Managed Scaling to dynamically add and remove capacity, and launch long-running or ephemeral clusters to match your workload. You can also add other Hadoop ecosystem applications to your cluster.
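As a rough sketch of how these options fit together, the following Python builds a request that could be passed to the `run_job_flow` call of the boto3 EMR client. The helper name `build_cluster_request`, the cluster name, the release label, the instance types and counts, and the scaling limits are all assumptions chosen for illustration; the role names are the commonly used EMR defaults and may differ in your account. The sketch only builds the dictionary and does not contact AWS.

```python
# Hypothetical helper: every value below is an illustrative assumption,
# not a recommended or required configuration.

def build_cluster_request(name, long_running, min_units=2, max_units=10):
    """Build keyword arguments for the boto3 EMR client's run_job_flow."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one that fits
        "Applications": [{"Name": "Presto"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "Market": "ON_DEMAND",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
                # Task nodes on Spot capacity to reduce cost.
                {"Name": "Task", "InstanceRole": "TASK",
                 "Market": "SPOT",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # True keeps the cluster up (long-running);
            # False lets it terminate once its steps finish (ephemeral).
            "KeepJobFlowAliveWhenNoSteps": long_running,
        },
        # EMR Managed Scaling adds and removes capacity within these limits.
        "ManagedScalingPolicy": {
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": min_units,
                "MaximumCapacityUnits": max_units,
            }
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_cluster_request("presto-adhoc", long_running=False)

# To launch (requires AWS credentials and the boto3 package):
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review or adjust before anything is launched.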

ANSI SQL support

Presto supports the ANSI SQL standard, which makes it easy for data analysts and developers to query both structured and unstructured data at scale. Currently, Presto supports a wide variety of SQL functionality, including complex queries, aggregations, joins, and window functions.
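To make the kind of query described here concrete, the sketch below combines a join, an aggregation, and a window function in one statement. It runs against an in-memory SQLite database from Python's standard library rather than Presto, so that it is self-contained; this assumes a SQLite build with window-function support (version 3.25 or later). The table names, columns, and data are hypothetical. The query uses only standard SQL constructs, so a statement of this shape should carry over to Presto, though that is not verified here.

```python
import sqlite3

# In-memory stand-in database; tables and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'us-east'), (2, 'eu-west'), (3, 'us-east');
INSERT INTO orders VALUES (1, 1, 250), (2, 2, 40), (3, 3, 120), (4, 1, 30);
""")

# Join orders to customers, total the amounts per region,
# then rank the regions by that total with a window function.
QUERY = """
SELECT c.region,
       SUM(o.amount) AS total_amount,
       RANK() OVER (ORDER BY SUM(o.amount) DESC) AS region_rank
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
GROUP BY c.region
ORDER BY region_rank
"""

rows = conn.execute(QUERY).fetchall()
for region, total_amount, region_rank in rows:
    print(region_rank, region, total_amount)
```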
