How VMware consolidated a multi-tenant cloud asset data store on Amazon Aurora MySQL with Amazon RDS Proxy
This post is co-written with Peter Fein, Staff Engineer 2 at VMware
Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud. Aurora combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open-source databases.
RDS Proxy is a fully managed, highly available database proxy for
In this post, we discuss how VMware Tanzu CloudHealth consolidated and migrated the databases to
VMware Tanzu CloudHealth’s asset data store: pre-Aurora architecture
The platform’s previous data store architecture was based on self-managed MySQL databases. These databases stored customer cloud asset data, usage metrics, and operational data. Before the migration, the team managed 166 MySQL version 5.6 shards on C5 EC2 instances. Each shard was composed of a primary and a standby replica, but only the primary carried operational load. Sharding was implemented at the customer level, with groups of customers co-located on a single database (MySQL schema). Each schema had a unique name and contained an average of 200 customers pre-migration. The databases were under continuous high load and shards were managed using complex
Key drivers to modernize the cloud asset data store
Over the past decade, the customer base has grown significantly, resulting in an increase in the database capacity that needed to be self-managed. This growth also caused database operational, scaling, and resiliency issues. As a result, VMware started looking at options to migrate to cloud-managed databases.
The following key factors drove the push for modernization:
- Move away from single EC2 instance MySQL limitations. Difficulty provisioning EC2 instances of the right size due to variable load and storage needs
- MySQL binlog replicas were only used for failover, not for read scaling
- Shard expansion required a platform maintenance outage window
- Large customer and partner tenant onboarding required manual intervention to get proper balancing across shards
- Schema changes across 166 shards reduced velocity of code deployments
- Complex Chef recipes to configure databases required specialized knowledge
Why they migrated to Aurora and RDS Proxy
The VMware Tanzu CloudHealth organization had been discussing the need to do a database modernization project for a long time. As a long-time Amazon Web Services customer, they looked at the RDS family and found Aurora MySQL to be a good fit. They considered several features to match requirements, as detailed in the following figure.

Amazon Aurora Features
Of these features, the following were the most important for the migration:
- High performance and push button compute scalability
- Storage auto scaling
- Amazon Aurora Parallel Query for Aurora MySQL
- Multi-AZ deployments with Amazon Aurora Replicas
- Fault-tolerant and self-healing storage
- Amazon RDS Blue/Green Deployments
- Optimized I/O costs
In the following sections, we walk through the phases of VMware’s planning and migration approach.
Phase 1: Compliance and performance testing
VMware used the MySQL slow query log to capture a large sample of SQL queries over a 24-hour period, then built a tool to replay the read queries from the logs and capture response times against both Aurora and EC2 MySQL. This helped them plan shard consolidation and optimize the top long-running SQL queries, database instance resource utilization (CPU, memory, and I/O), and overall performance.
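The replay tool itself is not shown in this post. The following is only a minimal sketch of the idea, assuming a standard slow query log layout (comment lines starting with `#`, then `SET timestamp=...;` and optional `use ...;` lines before each statement). The function names `read_queries` and `replay`, and the idea of passing each target as a plain callable, are hypothetical choices made for illustration.

```python
import time
from typing import Callable, Dict, Iterable, List


def read_queries(log_lines: Iterable[str]) -> List[str]:
    """Extract SELECT statements from MySQL slow query log lines."""
    queries: List[str] = []
    buffer: List[str] = []
    for raw in log_lines:
        line = raw.strip()
        if not line:
            continue
        # Comment lines mark the start of a new log entry; drop any
        # partial text (for example, the log file header) collected so far.
        if line.startswith("#"):
            buffer = []
            continue
        # Skip the per-entry session statements written by the server.
        if not buffer and line.lower().startswith(("set timestamp=", "use ")):
            continue
        buffer.append(line)
        if line.endswith(";"):
            statement = " ".join(buffer).rstrip(";")
            buffer = []
            # Only read queries are replayed; writes are ignored.
            if statement.lower().startswith("select"):
                queries.append(statement)
    return queries


def replay(
    queries: Iterable[str],
    targets: Dict[str, Callable[[str], object]],
) -> Dict[str, List[float]]:
    """Run every query on every target and record elapsed seconds."""
    timings: Dict[str, List[float]] = {name: [] for name in targets}
    for sql in queries:
        for name, execute in targets.items():
            start = time.perf_counter()
            execute(sql)
            timings[name].append(time.perf_counter() - start)
    return timings
```

In a real run, each entry in `targets` would wrap a database connection to one of the two systems under test (for example, a function that executes the statement and fetches the first row), so the recorded times can be compared per query.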
The result of this phase was as follows:
- The Aurora MySQL 5.7-compatible version was 20% faster in first-row response time under load, with one primary and one reader instance, when compared with MySQL 5.6 on EC2 C5.
- The test compared a c5.9xlarge EC2 instance with 36 vCPUs against a Graviton r6g.4xlarge Aurora instance with 16 vCPUs. With less than half the vCPU capacity, Aurora on Graviton still outperformed the EC2 instance.
- No application code (SQL syntax) changes were required to move to the Aurora MySQL 5.7-compatible version. As a result, it was easy to test the application using Aurora MySQL.
At the end of this phase, the VMware team was confident in terms of performance and compatibility to move forward with the Aurora MySQL database migration.
Phase 2: First Aurora shards and internal customers
During this phase, VMware added the first Aurora cluster to the database farm. They then initialized some of their tenants on this shard for a full integration testing period. During this phase, VMware also started building out an operational environment for Aurora MySQL as they moved away from Chef-based Amazon EC2 deployment. This modernization phase enhanced database operations by leveraging other Amazon Web Services services such as
The VMware team also started working out an optimal Aurora configuration through parameter group settings. Components constructed during this time included:
- A provisioning system using Terraform to simplify the creation of Aurora clusters with size-dependent parameter group settings.
- Amazon Route 53 records that are used to provide compatibility with their application DNS. These are automatically provisioned as well.
- An Amazon Web Services Lambda function to modernize local data integrity check scripts and local MySQL initialization scripts that were part of Chef recipes.
- A monitoring dashboard and alerting based on Amazon CloudWatch data.
- Database migration process that used scripts to back up (logical backup method, namely the mydumper and myloader migration tool) and restore the Amazon EC2-based MySQL database into Aurora MySQL cluster. Standard MySQL binlog replication was used to synchronize ongoing write changes to the new Aurora shard before cut over. This process also allowed for consolidation as multiple MySQL databases were migrated onto a single Aurora cluster. The logical database structure was maintained.
- Adoption of Amazon RDS Performance Insights across the organization to help diagnose database performance issues and make the overall database monitoring process easier.
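The post does not describe how the team decided that binlog replication had caught up before cutting a shard over. As one possible illustration, the check below inspects a row returned by MySQL's `SHOW SLAVE STATUS` statement, using its `Slave_IO_Running`, `Slave_SQL_Running`, and `Seconds_Behind_Master` fields. The function name `ready_for_cutover` and the `max_lag_seconds` threshold are hypothetical.

```python
from typing import Mapping, Optional


def ready_for_cutover(
    status: Mapping[str, Optional[object]],
    max_lag_seconds: int = 0,
) -> bool:
    """Return True when a SHOW SLAVE STATUS row shows a caught-up replica."""
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master")
    # MySQL reports NULL (None here) when the lag cannot be determined.
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds
```

A cutover script could poll this check after stopping writes on the source shard and only repoint the application DNS record once it returns `True`.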
Phase 3: Start shard migration
During this phase, the team began migrating customer shards to Aurora. As the migration occurred, the applications were tuned to leverage the Aurora architecture. Accomplishments included:
- Modification of common database access libraries to move read loads to the Aurora cluster reader endpoint. This allowed applications to access the reader endpoint with minimal code changes.
- Integration of RDS Proxy in front of the Aurora clusters to better manage connection load and improve database availability. RDS Proxy also provides support for upsizing, parameter group changes, and minor version upgrades with near zero downtime.
- With RDS Proxy, VMware was able to initiate a consolidation plan that migrated three EC2 shards onto a single Aurora cluster. In this case, RDS Proxy connection multiplexing allows significantly higher client connection counts to each Aurora cluster without impacting performance.
- Normalization of MySQL client connection parameters across the application ecosystem to make better use of RDS Proxy connection pools.
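The post does not show the changes made to the database access libraries. A minimal sketch of read/write routing, under the assumption that the library can see the SQL text and knows whether a transaction is open, might look like the following. The `pick_endpoint` function, the prefix list, and the endpoint hostnames are all hypothetical.

```python
# Statement prefixes treated as read-only in this sketch.
READ_PREFIXES = ("select", "show", "describe", "explain")


def pick_endpoint(
    sql: str,
    writer: str,
    reader: str,
    in_transaction: bool = False,
) -> str:
    """Choose the writer or reader endpoint for one SQL statement."""
    statement = sql.lstrip().lower()
    # Statements inside an open transaction stay on the writer so the
    # whole transaction runs on a single instance.
    if in_transaction:
        return writer
    # Locking reads need the writer even though they start with SELECT.
    if "for update" in statement or "lock in share mode" in statement:
        return writer
    if statement.startswith(READ_PREFIXES):
        return reader
    return writer


# Placeholder hostnames, not real Aurora endpoints.
WRITER = "writer.example.internal"
READER = "reader.example.internal"
print(pick_endpoint("SELECT * FROM assets WHERE customer_id = 7", WRITER, READER))
```

Because Aurora replicas can lag slightly behind the writer, a library built this way would likely also need a way for callers to force the writer endpoint for reads that must see data they have just written.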
Phase 4: Migration final phase and wrap up
The final phase included the following components:
- Conduct a metrics review of all Aurora clusters for the final consolidation plan and proper instance sizing
- Migrate the last remaining shards in the system, including some of the largest at over 1 TB of total storage
- Continued remapping of DB read load across applications to Aurora reader endpoints
- Purchase Amazon RDS Reserved Instances to optimize Aurora database instance costs
- Training and education for team members on the new Aurora stack
Benefits of consolidation to Aurora
At the end of the project, the consolidated multi-tenant customer cloud asset database farm on Aurora achieved 3:1 consolidation compared to the standalone MySQL databases. This was accomplished with all schemas maintained and no consolidation of individual tables. All application frameworks across the VMware Tanzu CloudHealth application ecosystem were modified to use Aurora reader endpoints. The new provisioning system built in Terraform allows for easy creation of Aurora clusters, RDS Proxy, RDS parameter groups, and Route 53 entries with a few lines of code.
The following are some key metrics resulting from migration to Aurora MySQL:
- Post-migration, the Aurora instance count is 124 r6g.4xlarge and r6g.8xlarge Aurora Graviton2 instances, which make up 62 two-node clusters (nodes spread across two Availability Zones for high availability). This replaced 166 EC2 shards on 332 C5 instances, for a total reduction of 63% in required vCPU.
- The migration resulted in a 20% cost reduction that continues to increase over time.
- RDS Proxy supports heavy application connection load, including intensive horizontally scaled applications deployed on Amazon Elastic Kubernetes Service (Amazon EKS) clusters. They achieved 10:1 connection compression during load spikes, with flat, stable DB connection counts.
- The Aurora query cache is a big part of the performance gains over MySQL. Across all shards, an average of 40% of all queries are cached.
- Large savings in operations staff time. Database administration team members could focus on new strategic projects instead of keeping the previous system up and running. A reduction in pages and operational issues boosted team morale and cut the time spent working off hours.
- Minor engine upgrades completed with minimal downtime. The farm is currently running on the latest Aurora version 2 minor release.
- Simplified monitoring with CloudWatch and Performance Insights.
- Responsive support from Amazon Web Services:
  - Weekly project review meetings with the Aurora and RDS Proxy team members during core migration phase.
  - Provided SQL query optimization advice.
  - Enhanced RDS Proxy to better support our use cases with schema consolidation.
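The shard and instance counts quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the counts stated in the post. The 63% vCPU figure depends on the instance size mix on each side, which the post does not break down, so only the consolidation ratio and the instance-count reduction are computed here.

```python
# Counts quoted in the post.
ec2_shards = 166
ec2_instances = 332      # one primary and one standby per shard
aurora_clusters = 62
aurora_instances = 124   # two nodes per cluster

shards_per_cluster = ec2_shards / aurora_clusters
instance_reduction = 1 - aurora_instances / ec2_instances

print(f"{shards_per_cluster:.2f} EC2 shards per Aurora cluster")  # 2.68
print(f"{instance_reduction:.1%} fewer database instances")       # 62.7%
```

An average of about 2.7 shards per cluster is consistent with the roughly 3:1 consolidation described earlier.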
Lessons learned
They had the following takeaways:
- Managing large numbers of Aurora clusters requires a strategy for managing parameter groups for both clusters and instances. You need to allow for parameter tuning for different instance sizes and may want per-cluster overrides. The parameters that required tuning in their environment were the common MySQL timeout, buffer size, and max connections settings.
- Take the time to understand all Aurora parameter group options and what they mean at both the cluster and instance level.
- When making application-related changes and configurations, keep RDS Proxy performance in mind. RDS Proxy performs best when connection parameters are consistent across clients, which maximizes pooling and reduces pinning.
- It can be complicated and time-consuming to modify existing legacy applications to take advantage of Aurora reader endpoints if the application doesn’t already support splitting reads and writes. Be sure to analyze applications properly to ensure they can make the best use of the Aurora cluster architecture.
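The first takeaway above, size-dependent parameters with per-cluster overrides, can be pictured as three layers merged in order. The team built this in Terraform; the Python sketch below only illustrates the layering. The parameter names are standard MySQL system variables, but every value, the `resolve_parameters` function, and the table names are hypothetical and not taken from the post.

```python
from typing import Dict, Optional

# Hypothetical baseline shared by every cluster (values are illustrative).
BASE: Dict[str, int] = {"wait_timeout": 300, "interactive_timeout": 300}

# Hypothetical size-dependent settings keyed by instance class.
BY_INSTANCE_CLASS: Dict[str, Dict[str, int]] = {
    "db.r6g.4xlarge": {"max_connections": 4000, "tmp_table_size": 64 * 1024 * 1024},
    "db.r6g.8xlarge": {"max_connections": 8000, "tmp_table_size": 128 * 1024 * 1024},
}


def resolve_parameters(
    instance_class: str,
    overrides: Optional[Dict[str, int]] = None,
) -> Dict[str, int]:
    """Merge baseline, size-dependent, and per-cluster settings in that order."""
    if instance_class not in BY_INSTANCE_CLASS:
        raise ValueError(f"no parameter profile for {instance_class}")
    params = dict(BASE)                               # layer 1: shared defaults
    params.update(BY_INSTANCE_CLASS[instance_class])  # layer 2: instance size
    params.update(overrides or {})                    # layer 3: cluster override
    return params


# Example: one cluster needs a shorter idle timeout than the default.
print(resolve_parameters("db.r6g.8xlarge", {"wait_timeout": 60}))
```

Keeping the later layers small makes it easier to see which clusters deviate from the defaults and why.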
Conclusion
In this post, we saw how VMware Tanzu CloudHealth modernized the primary data store for customer cloud asset data to support continued business growth. The modernized database solution on Aurora is highly resilient, scalable, and performant, and resulted in 20% cost savings.
Consider evaluating Aurora for your current database workload requirements to take advantage of innovative features like Global Databases, serverless compute, and cross-Amazon Web Services Region disaster recovery. Also, review
About the Authors
Peter Fein is Staff Engineer 2 at VMware, Cloud Management Unit. He works on the VMware Tanzu CloudHealth platform, architecting and engineering its data storage layers. He is very passionate about building and operating scalable SaaS applications.
Rajesh Matkar is a Principal Partner Database Specialist Solutions Architect at Amazon Web Services. He works with Amazon Web Services Technology and Consulting partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions.
Sahil Thapar is a Principal Solutions Architect. He works with ISV customers to help them build highly available, scalable, and resilient applications on the Amazon Web Services Cloud. He is currently focused on containers and machine learning solutions.