Implement an automated approach for handling Amazon Web Services DMS operational events

by Felix David, Kanwar Bajwa, and Harish Bannai

Amazon Web Services Database Migration Service (Amazon Web Services DMS) helps you tackle the complex task of migrating between database engines, whether the migration is homogeneous or heterogeneous. As businesses evolve, they adopt the database engines best suited to their requirements. This often leads to several database systems coexisting, which makes data migration between them challenging. Amazon Web Services DMS gives customers an efficient and reliable way to bridge the gap between different database engines.

Amazon Web Services DMS migrates and replicates databases, but errors can occur during the process. When an error is recoverable, you need to resume and complete the task efficiently to protect data integrity and minimize downtime. In this post, we show how to automatically resume tasks that hit a recoverable error, for tasks with a relational database target.

Overview

Amazon Web Services DMS replication tasks generate a variety of events, including start, pause, finish, full load completion, change data capture (CDC) initiation, and error. These events play a crucial role in monitoring and managing database migration tasks. Each event is identified by an event ID that represents a specific occurrence during the migration process. This solution relies on two of these event IDs, 0078 and 0079, to decide when to resume a task:

DMS-EVENT-0078 indicates that the replication task has failed.

DMS-EVENT-0079 indicates that the replication task has stopped.

You can find additional information about the event categories and event messages that Amazon Web Services DMS generates in the Amazon Web Services DMS Developer Guide.
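To make the event IDs concrete, the following sketch pulls the event ID and task ARN out of a DMS event as delivered by EventBridge and checks whether it is one of the two IDs above. The field layout (`detail.eventId`, `resources`) follows the general shape of DMS task state change events, but treat it as an assumption and verify it against an event captured in your own account. The helper name `classify_event` and the sample ARN are hypothetical.

```python
# Event IDs named in this post: task failed / task stopped.
RESUMABLE_EVENT_IDS = {"DMS-EVENT-0078", "DMS-EVENT-0079"}


def classify_event(event):
    """Return (event_id, task_arn, resumable) for a DMS event from EventBridge.

    Assumes the event ID is in detail.eventId and the task ARN is the
    first entry of resources; missing fields come back as None.
    """
    event_id = event.get("detail", {}).get("eventId")
    resources = event.get("resources") or []
    task_arn = resources[0] if resources else None
    resumable = event_id in RESUMABLE_EVENT_IDS and task_arn is not None
    return event_id, task_arn, resumable


# Illustrative event; the ARN and message are made up for the example.
sample_event = {
    "source": "aws.dms",
    "detail-type": "DMS Replication Task State Change",
    "resources": ["arn:aws:dms:us-east-1:111122223333:task:EXAMPLETASK"],
    "detail": {
        "eventId": "DMS-EVENT-0079",
        "detailMessage": "Replication task stopped.",
    },
}

print(classify_event(sample_event))
```

An event with any other ID, or one that carries no task ARN, comes back with `resumable` set to `False`, so the caller can ignore it.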

When DMS generates events, it sends them to the default Amazon EventBridge event bus. In this solution, we create an EventBridge rule on the default bus that matches incoming recoverable events and sends them to a target for processing. We use Amazon Web Services Lambda to process these events and restart failed or stopped tasks.

With this architecture, the solution will automatically resume recoverable tasks.
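The flow described above can be sketched in a few lines of Python. This is not the code shipped in the CloudFormation template, which this post does not reproduce; it is a minimal illustration that assumes the rule filters on `detail.eventId` and that the task ARN arrives in the event's `resources` list. The names `EVENT_PATTERN` and `handle_event` are hypothetical. The DMS call used is `start_replication_task` with the `resume-processing` start type.

```python
import json

# Event pattern for a rule on the default bus. The field names follow the
# usual EventBridge pattern syntax; the exact pattern in the provided
# template may differ.
EVENT_PATTERN = {
    "source": ["aws.dms"],
    "detail": {"eventId": ["DMS-EVENT-0078", "DMS-EVENT-0079"]},
}


def handle_event(event, dms_client):
    """Resume the replication task named in a matching DMS event.

    dms_client is expected to behave like boto3.client("dms").
    """
    resources = event.get("resources") or []
    if not resources:
        return {"resumed": False, "reason": "no task ARN in event"}
    task_arn = resources[0]
    # resume-processing picks the task up from where it stopped rather
    # than reloading the target tables.
    dms_client.start_replication_task(
        ReplicationTaskArn=task_arn,
        StartReplicationTaskType="resume-processing",
    )
    return {"resumed": True, "taskArn": task_arn}


def lambda_handler(event, context):
    # boto3 is imported here so the rest of the module can be exercised
    # locally with a stand-in client.
    import boto3

    return handle_event(event, boto3.client("dms"))


print(json.dumps(EVENT_PATTERN))
```

Passing the client into `handle_event` keeps the restart logic testable without Amazon Web Services credentials; only `lambda_handler` touches the real service.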

Prerequisites

For this walkthrough, you need the following prerequisites:

  • An active Amazon Web Services account.
  • A configured Virtual Private Cloud (VPC), because Amazon Web Services DMS operates within this isolated network environment to keep data transfers secure.
  • A basic understanding of DMS concepts and functionality.
  • A basic understanding of Amazon Web Services Lambda, a service that runs code in response to events without requiring you to manage servers. In this solution, Lambda issues the commands that resume tasks.

Deployment

In this section, we walk you through how to deploy this solution. To launch the provided CloudFormation template, complete the following steps:

  1. Sign in to the console on the central account.
  2. Choose Launch Stack.
  3. Choose Next.
  4. For Stack name, enter a name. For example, DMSResumableTaskStack.
  5. Provide the Security group ID and Subnet ID for the Amazon Web Services Lambda function. They must give the function network connectivity to the DMS API.
  6. Choose Next.
  7. Enter any tags you want to assign to the stack and choose Next.
  8. Select the acknowledgement check boxes and choose Submit.
    The stack takes approximately 5 minutes to complete. Wait until the stack is complete before proceeding to testing and verification.

Note: Because this solution runs Amazon Web Services Lambda, you incur the associated costs, which depend on request volume, execution duration, and the memory allocated to your Lambda functions. When the Lambda function restarts a task, it tags the task with the timestamp of the automatic restart. If the task fails again within 5 minutes, the Lambda function does not restart it. In that case, troubleshoot the task to understand the cause of the failure, and consider opening a support case for further assistance.
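The 5-minute guard described in the note can be sketched as follows. This is an illustration of the idea, not the template's code: the tag key `AutoRestartedAt` and the ISO 8601 timestamp format are assumptions, and the helper names are hypothetical. In a real function, the tag list would come from the DMS `list_tags_for_resource` call and the new tag would be written with `add_tags_to_resource`.

```python
from datetime import datetime, timedelta, timezone

RESTART_TAG_KEY = "AutoRestartedAt"  # hypothetical tag key
COOLDOWN = timedelta(minutes=5)


def last_restart_time(tag_list):
    """Return the restart timestamp stored in the task's tags, or None.

    An unreadable value is treated the same as a missing tag.
    """
    for tag in tag_list:
        if tag.get("Key") == RESTART_TAG_KEY:
            try:
                return datetime.fromisoformat(tag.get("Value", ""))
            except (ValueError, TypeError):
                return None
    return None


def should_restart(tag_list, now):
    """Allow a restart unless the task was auto-restarted within COOLDOWN."""
    previous = last_restart_time(tag_list)
    return previous is None or now - previous >= COOLDOWN


def restart_tag(now):
    """Build the tag to write to the task after a restart."""
    return {"Key": RESTART_TAG_KEY, "Value": now.isoformat()}


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = [restart_tag(now - timedelta(minutes=3))]
print(should_restart(recent, now))  # task was auto-restarted 3 minutes ago
print(should_restart([], now))      # task has never been auto-restarted
```

Storing the timestamp on the task itself means the guard needs no separate state store, at the cost of one extra API call per event.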

Additional Rules

EventBridge receives an event, an indicator of a change in your Amazon Web Services DMS environment, and applies a rule to route the event to a notification mechanism. Rules match events to notification mechanisms based on the structure of the event, called an event pattern. You can add more event IDs to the EventBridge rule so that tasks are restarted automatically for those events too. To update the EventBridge rule, complete the following steps:

  1. Open the Amazon EventBridge console and select the DMS rule created in the previous section.
  2. Edit the rule's event pattern, add the event IDs you need, and choose Next.
  3. Review the updated rule and choose Update rule.

Note: Adding DMS operational events to the EventBridge rule enables automatic restart of DMS tasks for those events. Choose these events carefully, and include only events that do not require manual intervention.
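If you prefer to prepare the change outside the console, the edit amounts to appending IDs to the list under `detail.eventId` in the rule's event pattern. The sketch below assumes the pattern has that shape; `add_event_ids` is a hypothetical helper and `DMS-EVENT-XXXX` is a placeholder, not a recommended event.

```python
import copy
import json


def add_event_ids(pattern, new_ids):
    """Return a copy of an event pattern with extra DMS event IDs added."""
    updated = copy.deepcopy(pattern)
    ids = updated.setdefault("detail", {}).setdefault("eventId", [])
    for event_id in new_ids:
        if event_id not in ids:  # keep the list free of duplicates
            ids.append(event_id)
    return updated


current = {
    "source": ["aws.dms"],
    "detail": {"eventId": ["DMS-EVENT-0078", "DMS-EVENT-0079"]},
}
# DMS-EVENT-XXXX is a placeholder, not a recommendation.
updated = add_event_ids(current, ["DMS-EVENT-XXXX"])
print(json.dumps(updated, indent=2))
```

The resulting JSON can be pasted into the rule editor or passed as the `EventPattern` argument of the EventBridge `put_rule` call. Because `put_rule` replaces the rule's settings with the arguments you supply, pass the rule's other existing settings along with the new pattern.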

Testing and Verifying

  1. In the DMS console, select a migration task under Database migration tasks and choose Stop.
  2. Wait a couple of minutes. The stop event should trigger the EventBridge rule, which invokes the Lambda function to resume the task.
  3. To troubleshoot, open the Amazon Web Services Lambda console, select the ‘StackName-<Hash>’ Lambda function, open its Amazon CloudWatch Logs, and look for errors.
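The same check can be scripted. The sketch below polls the task status with the DMS `describe_replication_tasks` call until the task reports `running` again. The function name, the attempt count, and the delay are assumptions chosen for illustration; `dms_client` is expected to behave like `boto3.client("dms")`.

```python
import time


def wait_for_status(dms_client, task_arn, wanted="running",
                    attempts=10, delay=30, sleep=time.sleep):
    """Poll a replication task until it reaches the wanted status.

    Returns the last status seen, or None if the task was not found.
    """
    status = None
    for attempt in range(attempts):
        response = dms_client.describe_replication_tasks(
            Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
        )
        tasks = response.get("ReplicationTasks", [])
        status = tasks[0].get("Status") if tasks else None
        if status == wanted:
            break
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll
    return status
```

To run the test end to end, you could stop the task with `stop_replication_task(ReplicationTaskArn=...)` and then call `wait_for_status` with the same ARN. A return value other than `running` suggests checking the Lambda function's logs.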

Cleaning up

To avoid incurring future charges, open the Amazon Web Services CloudFormation console, select the stack, and delete it.

Conclusion

In this post, we used Amazon Web Services Lambda and Amazon EventBridge to automatically resume Amazon Web Services DMS tasks that fail or stop on a recoverable error. By setting up EventBridge rules to capture specific event IDs, we can invoke Lambda functions that analyze and handle recoverable errors, which removes the need for manual intervention and makes for a smoother migration experience.

If you have questions or feedback, leave a comment.


About the authors

Felix David is a Sr. Technical Account Manager at Amazon Web Services. He works with Amazon Web Services customers to understand their business and technical needs, align technical solutions, and achieve the greatest value from Amazon Web Services.

Harish Bannai is a Sr. Technical Account Manager at Amazon Web Services. He works with enterprise customers, providing technical assistance on the operational performance of RDS and Database Migration Service and sharing database best practices.

Kanwar Bajwa is an Enterprise Support Lead at Amazon Web Services who works with customers to optimize their use of Amazon Web Services and achieve their business objectives.

