How to handle Amazon Web Services DMS replication when used with Oracle database in fail-over scenarios

by Prashanth Ramaswamy, Jeemy Patel, and Saumya Mula | on

Amazon Web Services Database Migration Service (Amazon Web Services DMS) is used for migrating databases to Amazon Web Services, but you can also use it for replicating data with low latency from any supported source to any supported target .

When working with Oracle database, Amazon Web Services DMS stops replicating data changes due to reset logs when the standby database is transitioned to the primary database during a fail-over of the primary database.

In this post, we describe the sequence of the necessary steps to overcome this issue and also demonstrate how to handle Amazon Web Services DMS source endpoints during database failover. We also demonstrate how to add an extra connection attribute to support database fail-over scenarios and the steps you need to take before you re-point Amazon Web Services DMS to the standby database.

Solution Overview

The following diagram illustrates our fail-over architecture.

For Amazon Web Services DMS to keep replicating data changes from an Oracle standby database to Oracle Primary database, after a database fail-over, we add the supportResetlog=TRUE parameter as an extra connection attribute in the Amazon Web Services DMS source endpoint settings. By adding this parameter, Amazon Web Services DMS is able to read the new set of archive log sequence numbers generated post database fail-over.

Should a fail-over happen while we’re using Amazon Web Services DMS with a standby database without setting supportResetlog=TRUE, we would see the following error on the Amazon Web Services DMS task logs

Errorlog

Amazon Web Services DMS when used with Oracle Data Guard setup, can take advantage of Oracle Standby database to offload the workload from the primary database. However, should a fail-over occur, the DMS replication tasks connected to the Oracle Standby database would stop replicating changes as a result of an event of reset logs, which is not supported with standby databases. In an Oracle Data Guard environment, the standby database is in sync with the primary database by receiving and applying redo data from the primary database. If the primary database becomes unavailable, you can use Oracle Data Guard to switch or fail over the standby database to the primary role.

To overcome this issue, we need to add an extra connection attribute supportResetlog=TRUE to Amazon Web Services DMS source endpoint settings repoint Amazon Web Services DMS source to primary database and resume the DMS task either by specifying custom CDC start point or a log sequence number or by specifying a recovery checkpoint. For more information, refer to Creating tasks for ongoing replication using Amazon Web Services DMS .

Note: You need to set supportResetlog=TRUE if your Amazon Web Services DMS version is 3.4.7 and below, since this parameter by default is set to TRUE in future releases.

In the following sections, we walk through the process of adding an extra connection attribute (supportResetlog=TRUE) to the Amazon Web Services DMS source endpoint. We then reconnect Amazon Web Services DMS to the standby database after the change data capture (CDC) of Amazon Web Services DMS tasks bypasses the sequence numbers of archive logs that have been generated post database fail-over.

Prerequisites

For this walk-through, the following prerequisites are necessary:

  • An Amazon Web Services account .
  • Oracle databases in a primary and standby setup and in real-time sync. For instructions, refer to Creating a Physical Standby Database .
  • Replication instance in Amazon Web Services DMS should be able to access primary and standby hosts. For instructions, refer to Setting up a network for a replication instance
  • Source and destination target endpoints. For instructions, refer to Creating source and target endpoints .
  • An Amazon Web Services DMS migration task. For instructions, refer to Creating a task .

Create source and target endpoints

Based on the Oracle Data Guard setup, we configure the source endpoint to connect to the standby database and the target endpoint to any supported target.

For this post, we use an Oracle standby database as the source and Amazon Simple Storage Service (Amazon S3) as the target.

Configure the source endpoint to connect to the new primary database host and add the extra connection attribute

After a fail-over of the database, we want Amazon Web Services DMS source endpoint to connect to the new primary database host and add the supportResetlog=TRUE parameter to the extra connection attribute of the source endpoint. Complete the following steps:

  1. On the Amazon Web Services DMS console, choose Endpoints in the navigation pane.
  2. Choose the source endpoint you want to modify.
  3. For Server name, enter the name of the primary database host.
    pciture3
  4. Select Use endpoint connection attributes.
  5. Add the supportResetlog=TRUE parameter in the Extra Connection Attributes.

During fail-over or switchover scenarios, there is a possibility that new primary might be using different archive DEST_ID values compared to original primary. In this case, Amazon Web Services DMS needs information about what destination to get archive redo logs from to read changes.

Oracle uses random DEST_ID which ranges from 0 to 6 . Therefore it’s a best practice to add the ExtraArchivedLogDestIds=[1,2,3,4,5,6] to cover DEST_ID from 1 to 6 and useZeroDestid=true parameters as well to cover DEST_ID value as 0.

The following screenshot has been taken after Use Endpoint Connection Attributes has been Enabled and values has been added to Extra Connection attributes.

picture4

For more information about extra connection attributes, refer to Extra connection attributes when using Oracle as a source for Amazon Web Services DMS .

Resume the Amazon Web Services DMS task and verify the logs

When you resume a task, Amazon Web Services DMS continues replication from the last point before the task stopped or failed. When you restart a task, Amazon Web Services DMS begins replication from the start, and uses the table preparation mode that you chose when you created the task.

For this scenario, we need the task to continue from where it left off and therefore we will resume the Amazon Web Services DMS task and verify the logs in Amazon CloudWatch . Complete the following steps:

  1. On the Amazon Web Services DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task you want to resume and on the Actions menu, choose Restart/Resume.
  3. Select Resume and choose Start task.
    picture5
  4. Choose the name of your task, then review the Overview details section.
  5. Choose View CloudWatch logs.

In the following screenshot, you can see the archived redo log sequence processing from sequence number 5, which indicates that the Amazon Web Services DMS process started reading the new sequence archive logs of the database post fail-over.

picture6

For more information about viewing your Amazon Web Services DMS task details, refer to How do I use Overview details to access my Amazon Web Services DMS task information .

Stop the Replication Task and Configure the Amazon Web Services DMS source endpoint back to the standby database

At this point, it’s safe to reconfigure the Amazon Web Services DMS source endpoint back to the standby database to avoid any performance impact to the primary database.

  1. On the Amazon Web Services DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task you want to stop and on the Actions menu, choose Stop.
  3. On the Amazon Web Services DMS console, choose Endpoints in the navigation pane.
  4. Choose the source endpoint you want to modify.
  5. For Server name, enter the name of the standby database host.
    picture7

Resume the Amazon Web Services DMS task

Now you can resume the Amazon Web Services DMS task.

  1. On the Amazon Web Services DMS console, choose Database migration tasks in the navigation pane.
  2. Select the task you want to resume and on the Actions menu, choose Restart/Resume.
  3. Select Resume and choose Start task.
    picture5

The following screenshot was taken after resuming the task on the standby Database. We can also see in the last line of screenshot that CAPTURE process is reading from ONLINELOG which means there is NO lag in our replication and redo changes are current.

picture8

Conclusion

Replication using Amazon Web Services DMS is a common solution for many customers, and an Oracle standby database is a common method to capture data changes by Amazon Web Services DMS.

In this post, we demonstrated how to handle Amazon Web Services DMS replication task in a fail-over scenario, with the help of extra connection attribute supportResetlog=TRUE .We also discussed the steps needed to point Amazon Web Services DMS back to the standby database, which can help offload the primary database in a sensitive production environment.

We invite you to leave feedback in the comments sections in this post.


About the Author


Prashanth Ramaswamy
is a Senior Database Consultant with the Professional Services team at Amazon Web Services. Prashanth focuses on leading the database migration efforts to Amazon Web Services as well as provide technical guidance including cost optimization,monitoring and modernization expertise to Amazon customers.


Jeemy Patel
is a Database Consultant with the Professional Services team at Amazon Web Services. Jeemy helps customer with migration to Amazon Web Services, performance optimization as well as provide technical guidance on various Disaster Recovery solutions for Amazon customers.


Saumya Mula
is a Senior Database Consultant with the Professional Services team at Amazon Web Services. She provides overall guidance on database migrations from on-premises to Amazon Web Services along with automation , cost management and performance tuning of the critical production systems for Amazon customers.