Amazon EMR launches support for Amazon EC2 C7g (Graviton3) instances to improve cost performance for Spark workloads by 7–13%
With Amazon EMR release 6.7, you can now use
Amazon EMR runtime performance with EC2 C7g instances
We ran TPC-DS 3 TB benchmark queries on Amazon EMR 6.9 using the Amazon EMR runtime for Apache Spark (compatible with Apache Spark 3.3) with C7g instances. Data was stored in
Our results showed 13.65–18.73% improvement in total query runtime performance and 16.98–20.28% improvement in geometric mean on EMR clusters with C7g compared to equivalent EMR clusters with C6g instances, depending on the instance size. In comparing costs, we observed 7.93–13.35% reduction in cost on the EMR cluster with C7g compared to the equivalent with C6g, depending on the instance size. We did not benchmark the C6g xlarge instance because it didn’t have sufficient memory to run the queries.
The following table shows the results of running the TPC-DS 3 TB benchmark queries on Amazon EMR 6.9, comparing equivalent EMR clusters with C7g and C6g instances.
Instance size | 16xlarge | 12xlarge | 8xlarge | 4xlarge | 2xlarge |
Number of instances in the cluster (1 leader + 5 core nodes) | 6 | 6 | 6 | 6 | 6 |
Total query runtime on C6g (seconds) | 2774.86205 | 2752.84429 | 3173.08086 | 5108.45489 | 8697.08117 |
Total query runtime on C7g (seconds) | 2396.22799 | 2336.28224 | 2698.72928 | 4151.85869 | 7249.58148 |
Total query runtime improvement with C7g | 13.65% | 15.13% | 14.95% | 18.73% | 16.64% |
Geometric mean query runtime C6g (seconds) | 22.2113 | 21.75459 | 23.38081 | 31.97192 | 45.41656 |
Geometric mean query runtime C7g (seconds) | 18.43905 | 17.65898 | 19.01684 | 25.48695 | 37.43737 |
Geometric mean query runtime improvement with C7g | 16.98% | 18.83% | 18.66% | 20.28% | 17.57% |
EC2 C6g instance price ($ per hour) | $2.1760 | $1.6320 | $1.0880 | $0.5440 | $0.2720 |
EMR C6g instance price ($ per hour) | $0.5440 | $0.4080 | $0.2720 | $0.1360 | $0.0680 |
(EC2 + EMR) C6g instance price ($ per hour) | $2.7200 | $2.0400 | $1.3600 | $0.6800 | $0.3400 |
Cost of running on C6g ($ per instance) | $2.09656 | $1.55995 | $1.19872 | $0.96493 | $0.82139 |
EC2 C7g instance price ($ per hour) | $2.3200 | $1.7400 | $1.1600 | $0.5800 | $0.2900 |
EMR C7g instance price ($ per hour) | $0.5800 | $0.4350 | $0.2900 | $0.1450 | $0.0725 |
(EC2 + EMR) C7g instance price ($ per hour) | $2.9000 | $2.1750 | $1.4500 | $0.7250 | $0.3625 |
Cost of running on C7g ($ per instance) | $1.93029 | $1.41150 | $1.08699 | $0.83614 | $0.72999 |
Total cost change with C7g, including performance improvement | -7.93% | -9.52% | -9.32% | -13.35% | -11.13% |
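The improvement rows in the table can be reproduced from the runtime rows. The following Python sketch is illustrative only: it assumes the improvement is the relative reduction against the C6g baseline, and the helper names (`improvement_pct`, `geometric_mean`) are hypothetical rather than taken from the benchmark tooling. The geometric mean helper shows how such a summary figure is typically derived from per-query runtimes, which are not listed in this post.

```python
import math
from typing import Sequence

# Runtimes in seconds, copied from the table above.
# Column order: 16xlarge, 12xlarge, 8xlarge, 4xlarge, 2xlarge.
SIZES = ["16xlarge", "12xlarge", "8xlarge", "4xlarge", "2xlarge"]
TOTAL_C6G = [2774.86205, 2752.84429, 3173.08086, 5108.45489, 8697.08117]
TOTAL_C7G = [2396.22799, 2336.28224, 2698.72928, 4151.85869, 7249.58148]
GEOMEAN_C6G = [22.2113, 21.75459, 23.38081, 31.97192, 45.41656]
GEOMEAN_C7G = [18.43905, 17.65898, 19.01684, 25.48695, 37.43737]


def improvement_pct(baseline: float, candidate: float) -> float:
    """Percentage by which the candidate runtime is lower than the baseline."""
    return (baseline - candidate) / baseline * 100.0


def geometric_mean(runtimes: Sequence[float]) -> float:
    """Geometric mean of positive per-query runtimes, computed via logarithms."""
    if not runtimes or any(t <= 0 for t in runtimes):
        raise ValueError("runtimes must be a non-empty sequence of positive numbers")
    return math.exp(sum(math.log(t) for t in runtimes) / len(runtimes))


if __name__ == "__main__":
    rows = zip(SIZES, TOTAL_C6G, TOTAL_C7G, GEOMEAN_C6G, GEOMEAN_C7G)
    for size, total6, total7, geo6, geo7 in rows:
        print(
            f"{size}: total runtime {improvement_pct(total6, total7):.2f}% better, "
            f"geometric mean {improvement_pct(geo6, geo7):.2f}% better"
        )
```

The geometric mean is less sensitive than the total runtime to a handful of long-running queries, which is why the two improvement rows differ.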
The following graph shows per-query improvements observed on C7g 2xlarge instances compared to equivalent C6g instances.
Benchmarking methodology
The benchmark used in this post is derived from the industry-standard TPC-DS benchmark, and uses queries from the
We calculated the total cost of ownership (TCO) by multiplying the hourly cost per instance (EC2 price plus EMR price) by the number of instances in the cluster and by the time taken to run the queries on the cluster. We used On-Demand pricing in the US East (N. Virginia) Region for all instances.
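As a rough illustration of this calculation, the following Python sketch recomputes the cost figures for two of the instance sizes from the hourly prices and total runtimes in the results table. The function names and the `CLUSTER_NODES` constant are assumptions made for this example. The table reports cost per instance; multiplying by the six nodes in the cluster gives the cluster-level figure and leaves the percentage change unaffected.

```python
SECONDS_PER_HOUR = 3600
CLUSTER_NODES = 6  # 1 leader + 5 core nodes, as in the results table


def run_cost(price_per_hour: float, runtime_seconds: float, instances: int = 1) -> float:
    """Cost in dollars of keeping `instances` nodes up for the benchmark runtime."""
    return price_per_hour * instances * runtime_seconds / SECONDS_PER_HOUR


def cost_change_pct(baseline_cost: float, candidate_cost: float) -> float:
    """Signed percentage change in cost; negative means the candidate is cheaper."""
    return (candidate_cost - baseline_cost) / baseline_cost * 100.0


# (EC2 + EMR) hourly price and total query runtime, copied from the results table.
BENCHMARKS = {
    "16xlarge": {"c6g": (2.7200, 2774.86205), "c7g": (2.9000, 2396.22799)},
    "4xlarge": {"c6g": (0.6800, 5108.45489), "c7g": (0.7250, 4151.85869)},
}

if __name__ == "__main__":
    for size, data in BENCHMARKS.items():
        c6g_cost = run_cost(*data["c6g"])
        c7g_cost = run_cost(*data["c7g"])
        cluster_c7g = run_cost(*data["c7g"], instances=CLUSTER_NODES)
        print(
            f"{size}: C6g ${c6g_cost:.5f}, C7g ${c7g_cost:.5f} per instance, "
            f"C7g cluster ${cluster_c7g:.5f}, "
            f"change {cost_change_pct(c6g_cost, c7g_cost):.2f}%"
        )
```

Although C7g has a higher hourly price, the shorter runtime more than offsets it, which is where the net cost reduction comes from.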
Conclusion
In this post, we described how we estimated the cost-performance benefit of using Amazon EMR with C7g instances compared to equivalent previous-generation C6g instances. Using these new instances with Amazon EMR improves cost performance by an additional 7–13%.
About the authors
Al MS is a product manager for Amazon EMR at Amazon Web Services.
Yuzhou Sun is a software development engineer for EMR at Amazon Web Services.
Steve Koonce is an engineering manager for EMR at Amazon Web Services.