Improve application performance on Amazon RDS for MySQL and MariaDB instances and MySQL Multi-AZ DB clusters with Optimized Writes

by Chelluru V N S S Vidyadhar | on

Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale deployments of MySQL and MariaDB in the cloud. Amazon RDS customers run different types of workloads on Amazon RDS for MySQL and Amazon RDS for MariaDB . You can use read replicas to scale read options of their workloads, but scaling writes on applications is always challenging. It requires implementing solutions, such as sharding, which needs application-level changes and additional instances that incur additional costs, depending how it’s implemented.

In this post, we discuss how the Optimized Writes feature can be helpful to scale the application’s write throughput to achieve better write performance on RDS for MySQL and RDS for MariaDB running on Single-AZ, Multi-AZ and MySQL Multi-AZ DB Cluster deployments . We also discuss Optimized Writes use cases and best practices.

What is Optimized Writes?

By default, popular Linux file systems use a block size of 4 KB because storage devices normally provide atomicity for writes that are aligned with 4 KB blocks. However, MySQL and MariaDB (InnoDB storage engine) use a page size of 16 KB by default to store table data. This means each page write from the database involves multiple 4 KB block writes to the underlying file system. If there is an operating system crash or power loss during write transactions, it can encounter incomplete or torn page writes. To protect against this and make sure that ACID properties are enforced, InnoDB uses the doublewrite buffer (DWB). For every page written to the data files in your database instance, InnoDB first writes them to this doublewrite buffer. When written there, the pages are persisted to their respective datafiles.

If there is an operating system crash or failure in the storage subsystem in the middle of the page writes, InnoDB can find the good copy of the page from the doublewrite buffer during the crash recovery process. The writes to the doublewrite buffer are performed in a large sequential chunk using a single fsync() system call, which causes additional overhead on the engine and limits the overall throughput of the database system.

With the Optimized Writes feature, the file system can write 16 KB blocks (pages) to provide torn write protection, which means the doublewrite buffer in InnoDB is no longer needed to provide torn page protection. By disabling the doublewrite buffer, the database engine can reduce write overhead. This leads to improved transaction processing and latency. The actual changes are implemented using torn write prevention for Amazon Web Services Nitro instances (at the hardware layer) to provide the torn page protection. By disabling the doublewrite buffer, the database engine can improve the number of transactions processed per second and the overall throughput of the database.

How to take advantage of optimized writes

Optimized Writes is supported for Amazon RDS for MySQL 8.0.30 and higher and Amazon RDS for MariaDB 10.6.10 and higher versions. This feature is enabled by default on instances and Multi-AZ Clusters running the supported engine version and instance class. For the list of supported instance classes, refer to the documentation for Amazon RDS for MySQL and Amazon RDS for MariaDB . The feature is supported for new DB instances and new Multi-AZ clusters launched with the supported configurations. If required, the RDS Optimized Writes feature can be disabled on the database instances by setting the parameter rds.optimized_writes to OFF.

To understand the current limitations associated with the feature, refer to Limitations of Amazon RDS Optimized Writes for MySQL and Limitations of Amazon RDS Optimized Writes for MariaDB .

Effective IOPS utilization

Without Optimized Writes, the engine needs to perform additional page writes to the doublewrite buffer. It causes additional I/O load on the instance and limits the overall throughput on the instance depending on the instance class, storage, and database configurations. However, with Optimized Writes enabled, the engine runs without the doublewrite buffer by maintaining the atomicity at the hardware layer, which eliminates additional writes. The engine can effectively use the available IOPS to achieve more throughput and improve the database’s overall write performance.

The writes to the doublewrite buffer don’t result in twice as much I/O overhead or twice as many IO operations, and these are performed in a large sequential chunk with a single fsync() operation. Unless pages are written to the doublewrite buffer, the engine doesn’t flush the changes to the respective data files. This slows down the data flush operations on write-heavy instances with highly concurrent workloads. Here, due to the doublewrite buffer, the overhead increases linearly on highly concurrent workloads. By disabling the doublewrite buffer, there is significant improvement on the database throughput because it eliminates the additional write operations.

When this feature is enabled (when the doublewrite buffer is disabled) on a Multi-AZ instance, additional overhead to replicate doublewrite buffer writes to the secondary instance is also removed. This provides better write performance for workloads running on Multi-AZ instances with the Optimized Writes feature enabled compared to the instance running with the feature disabled.

For Multi-AZ DB Cluster , Optimized Writes will help improve throughput in addition to the latency improvements that the cluster offers.

We look into the benchmarks provided later in this post to understand the overall throughput difference between instances with Optimized Writes enabled and disabled and their IOPS utilization accordingly.

Identify workloads that can benefit from Optimized Writes

Before choosing RDS Optimized Writes for your workloads, it’s important to check certain aspects that can help you understand the overall improvement gained at the database layer.

We can monitor the number of pages written and number of writes performed to the doublewrite buffer. This can give us some clarification on the additional burden on the instance.

In Amazon RDS for MySQL and Amazon RDS for MariaDB, we can monitor the doublewrite buffer usage with help of the following global status variables . These can help to monitor the overhead on the instance with doublewrite buffer enabled.

  • innodb_dblwr_writes – The number of doublewrite operations that have been performed
  • innodb_dblwr_pages_written – The number of pages that have been written for doublewrite operations
  • innodb_data_fsyncs – The total number of fsync() operations performed so far
  • innodb_data_pending_fsyncs – The current number of pending fsync() operations

We can measure the doublewrite buffer usage by using Amazon RDS Performance Insights and selecting innodb_rows_changes, innodb_data_writes, and innodb_dblwr_writes metrics from the database metrics section. For example, the following graph shows the metrics during the run of a sample workload on an instance with Optimized Writes disabled (an instance running with the doublewrite buffer enabled).

On Amazon RDS for MySQL (8.0), the doublewrite buffer is located outside the system tablespace. The doublewrite buffer data is stored in multiple files inside the data directory. When performance_schema is enabled and configured on the instance, we can monitor the I/O latency on doublewrite files using the following query. It can help you check the number of writes on the files and latency associated with respective write operations.

select FILE_NAME, COUNT_WRITE AS count_write, format_pico_time(AVG_TIMER_WRITE) AS average_write_latency, format_pico_time(MAX_TIMER_WRITE) AS max_write_latency, format_pico_time(MAX_TIMER_WRITE) AS max_write_latency, format_pico_time(SUM_TIMER_WRITE) as sum_write_latency from performance_schema.file_summary_by_instance where FILE_NAME such as '%dblwr' order by AVG_TIMER_WAIT desc;

You can also check I/O waits on doublewrite buffer on an RDS for MySQL or RDS for MariaDB instance using the following query. Depending on the type of workload running on the instance, the doublewrite buffer can be one of the wait events with higher average latency, indicating contention.

select * from performance_schema.events_waits_summary_global_by_event_name order by AVG_TIMER_WAIT desc;

From the preceding query output, you can monitor wait times on specific events in comparison to other top wait events on the instance. In MySQL, we see higher waits on either wait/io/file/innodb/innodb_dblwr_file or wait/synch/mutex/innodb/dblwr_mutex events. In MariaDB, we can monitor waits on wait/synch/mutex/innodb/buf_dblwr_mutex accordingly.

Consider monitoring your Amazon CloudWatch metrics, especially write IOPS, to understand the usage. If you notice that write IOPS usage is increasing gradually due to changes in the write workload on the instance, evaluate Optimized Writes to check whether it can scale the write throughput of your applications.

In any case, it’s recommended to compare the performance of your application workload against the Optimized Writes instance to better understand the performance gains that can be achieved. The performance gains can be measured in terms of overall throughput achieved by the database instance or number of write queries processed by the database instance over a specific period of time.

Key aspects for Optimized Writes

In this section, we discuss some keys aspects to consider while using Optimized Writes:

  • The feature can benefit tables using the InnoDB engine because the doublewrite buffer is implemented for the InnoDB engine in MySQL and MariaDB. If the database instance uses any other storage engines such as memory or MyISAM, the performance improvement isn’t visible for the operations on these tables with Optimized Writes.
  • The write performance improvements with this feature should be measured in terms of overall write throughput achieved by the database instance in a specific period of time rather than comparing individual queries’ performance. By disabling the doublewrite buffer, the background flushing performance can be improved. InnoDB automatically adjust the rate of background flushing based on the workload, and this feature helps keep up with the changes in the workload.
  • This feature is designed to help applications that perform heavy writes on database instances. The database read queries (SELECT) read pages directly from the data files and don’t read any pages from the doublewrite buffer, which is used for the crash recovery process. Therefore, disabling the doublewrite buffer doesn’t have any impact on the performance of the read-only queries.
  • This feature helps avoid additional write operations. Changing (reducing) the number allocated IOPS to the instance isn’t recommended because it may impact the overall performance on the database instance, because every write operation (especially update and delete operations) performs reads from the respective data files. Also, the database needs to serve regular read queries (SELECT) to serve the application requirements.
  • This feature helps increase the throughput of write operations being run on the database. To take better advantage of the same, the application should be able to push a greater number of write operations per second. Also, the workload shouldn’t cause any conflicting locks that can lead to timeouts or deadlocks, and eventually impact the overall throughput of the applications.

Use cases for Optimized Writes

The following use cases are candidates for RDS Optimized Writes:

  • An application that serves a high volume of concurrent write transactions can take advantage of this feature. It’s also helpful to the applications where sudden surges in write traffic are expected.
  • It can be used by applications that are planning to scale write throughput without any changes to the application code or architecture. In many cases, until long-term plans such as partitioning or scaling are implemented to enhance the write scalability of the application, Optimized Writes can be ready to use the solution. Also, utilizing this feature on applications with sharded databases can help increase the throughput served by respective shard database instances.
  • Enabling this feature on read replica instances can help minimize the replication lag by improving the write performance on the instance. In addition to enabling this feature, consider enabling parallel replication if the workload can better use the same to minimize the replication lag.
  • Applications using Multi-AZ instances and Multi-AZ Cluster deployments can take advantage of this feature.
  • Applications that perform data loading operations ( LOAD DATA LOCAL INFILE ) or bulk inserts into tables frequently can benefit from the feature.

Performance improvements

After evaluating the use cases and workload patterns against application workloads, it’s recommended to benchmark the write performance improvements achieved by Optimized Writes.

Consider performing a quick benchmarking to understand the write performance between instances with Optimized Writes enabled and disabled. As we discussed earlier in this post, the feature is enabled by default when the supported engine version, instance class, and parameters are used. However, if needed, you can disable the same by changing the rds.optimized_writes parameter to OFF (the default value is AUTO). Also, we now know that enabling Optimized Writes means the doublewrite buffer on the engine is disabled and vice versa.

For this benchmark, we launched two RDS for MySQL instances with the following configurations:

  • Engine version – 8.0.31
  • Instance class – db.r6g.8xlarge
  • Storage type – IO1
  • Provisioned IOPS – 50,000
  • Multi-AZ – Enabled
  • Automated backups – Enabled

On one of the instances, consider disabling Optimized Writes by disabling rds.optimized_writes in your custom parameter group. For more information, refer to Overview of RDS Optimized Writes .

When both the instances are active (one instance is running with the default parameter group with Optimized Writes enabled by default and the other with a custom parameter group where Optimized Writes is disabled explicitly), we populated data using the sysbench open-source benchmark tool. We prepared the test database with 250 tables, each having 33 million rows on both the instances. Finally, we ran a sysbench write-only workload with different number of threads from 1, 2, 4… 1024 and ran the workload for 20 minutes with every thread count.

The following graph illustrates the throughput in terms of transactions per second resulting from sysbench at their respective thread counts. Here, the orange line indicates throughput on instance running with Optimized Writes enabled (DWB is disabled), and the blue line indicates throughput on instance with Optimized Writes disabled (DWB is enabled).

As we can see, the throughput gains on the instances with Optimized Write enabled are significantly higher when number of active threads running the workload increases. When the number of active threads is higher, the throughput started to increase further on the Optimized Writes instance. On the other hand, the performance on instances with the feature disabled stabilized without any noticeable changes.

For demonstration purposes, we can attempt the same workload with binary logging disabled. Binary logging has many useful use cases in MySQL, but it can introduce write amplification and contention on write-heavy workloads. In Amazon RDS for MySQL, binary logging is enabled when automated backups are enabled to provide point-in-time recovery (PITR) capabilities.

On production instances, it’s always recommended to enable automated backups (binary logs), which provide features such as automated snapshots, PITR, and the ability to create read replicas.

In the following graph, we can see that the overall throughput on an instance with Optimized Writes enabled increases with the increase in the number of active threads.

The following two graphs explain the write IOPS and read IOPS utilization on instances with Optimized Writes enabled (DWB is disabled) and Optimized Writes disabled (DWB is enabled).

The first graph shows that the instance with Optimized Writes enabled (DWB is disabled) can push a greater number of IOPS and complete the data loading faster than the instance with Optimized Writes disabled. Because DWB is disabled, the instances can write directly to the data files, which improves the write latency of the engine because of reduction in additional I/O overhead and fsync() calls. This helps the engine process a greater number of writes, which increases the IOPS utilization in turn. The application can effectively use the IOPS on the RDS instance to achieve the better throughput.

In the preceding example, you may have also noticed that the instance utilizes more read IOPS with Optimized Writes enabled. When database pages need to be modified, the engine must read pages from the disk when the pages to modify aren’t available in the cache. In this example, the increased number of reads were needed to satisfy our increased DML throughput.

Overall, the instance with Optimized Write enabled can process more transactions on the instance, which helps improve the write throughput of the applications. The amount of benefit that can be achieved depends on the type of workload being run on the database instance.

Best practices for running highly concurrent workloads in addition to Optimized Writes

Consider the following best practices:

  • Configure your RDS instance class and storage appropriately to avoid resource contention by following Amazon RDS operational guidelines . For example, an RDS instance configured to use r6g.large may not be able to help take the complete workload of your application and reach the max CPU utilization before taking full advantage of the feature.
  • Use appropriate storage and instance configurations to avoid resource throttling due to workload changes.
  • Make sure that your application workload is distributed among different database objects. If the schema uses limited objects, it can cause locking issues . This can lead to deadlocks and timeouts and reduce application throughput. Make sure that that your database is configured to manage transactions efficiently and handle deadlocks appropriately inside the InnoDB engine.
  • For an instance with Optimized Writes enabled, consider using Multi-AZ, because it helps eliminate the overhead of writing doublewrite buffer-related pages to the secondary (standby) instance to improve the write performance. The write performance improvement is higher on Multi-AZ instances with Optimized Writes enabled compared to Multi-AZ instances with Optimized Writes disabled.

Conclusion

In this post, we showed how Optimized Writes can improve the performance of write-heavy workloads. We also discussed common use cases and best practices. The Amazon RDS Optimized Writes feature is available on Amazon RDS for MySQL version 8.0.30 and higher and Amazon RDS for MariaDB version 10.6.10 and higher. We encourage you to try out this new feature for your MySQL and MariaDB workloads and see whether it improves write performance and helps scale your application’s overall throughput.

For complete details about the feature, see Improving write performance with Amazon RDS Optimized Writes for MySQL and Improving write performance with Amazon RDS Optimized Writes for MariaDB .


About the Author

Chelluru Vidyadhar is a Database Engineer with the Amazon RDS team at Amazon Web Services.