Archive to cold storage with Amazon DynamoDB
Numerous education technology (EdTech) companies use DynamoDB as a persistent data store to track students’ exam scores and course progress. As students advance through grades, their engagement with specific course materials and their exam scores change. After completing a class or graduating, students access their past educational assets far less frequently. Due to compliance or contractual obligations, EdTech companies must keep this data readily available to students for an extended period of time, often exceeding 5 years. This pattern extends beyond the education sector: customers across different industries face similar data access patterns. Consequently, there is a growing demand for cost-effective storage solutions in DynamoDB that maintain data accessibility.
DynamoDB organizes data into tables, offering two distinct table classes:
- DynamoDB Standard – The default table class, designed for workloads with frequently accessed data
- DynamoDB Standard-Infrequent Access (Standard-IA) – A lower-cost table class for tables that store data that is accessed infrequently
For a detailed pricing comparison between the DynamoDB Standard table class and the Standard-IA table class, you can view the Amazon DynamoDB pricing page.
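As a brief illustration, the destination archive table can be created with the Standard-IA table class by setting the TableClass parameter. The following is a minimal boto3 sketch; the table name and key schema are hypothetical placeholders:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Create the archive table using the Standard-IA table class.
# The table name and key schema are illustrative placeholders.
dynamodb.create_table(
    TableName="exam-scores-archive",
    TableClass="STANDARD_INFREQUENT_ACCESS",
    AttributeDefinitions=[
        {"AttributeName": "studentId", "AttributeType": "S"},
        {"AttributeName": "examId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "studentId", "KeyType": "HASH"},
        {"AttributeName": "examId", "KeyType": "RANGE"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
```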
In this post, we explore the process of creating a customized solution that uses DynamoDB Time to Live (TTL), DynamoDB Streams, and Lambda to archive expired data from a DynamoDB Standard table to a DynamoDB Standard-IA table.
Solution overview
By combining the power of DynamoDB Streams and Lambda, we can capture changes made to the Standard table and trigger specific actions based on those changes. With the help of TTL, we can automatically mark data as expired in the Standard table when it reaches a certain age and generate a record in DynamoDB Streams containing the expired data. Then, with Lambda event filtering, we can selectively process only the expired-data events from DynamoDB Streams. This filtering mechanism allows us to efficiently handle and migrate the expired data to the Standard-IA table while avoiding unnecessary processing and costs.
The following diagram illustrates the solution architecture.
The workflow contains the following steps:
- DynamoDB TTL deletes expired items from DynamoDB Standard tables based on an item attribute.
- DynamoDB Streams generates stream records containing the expired items.
- Lambda processes the deletion event from DynamoDB Streams. With Lambda event filtering, Lambda is only invoked by deletion events from DynamoDB TTL.
- The data is written to the DynamoDB Standard-IA table.
Delete data with DynamoDB TTL
DynamoDB TTL offers a convenient way to manage the lifecycle of your data in DynamoDB. With TTL, you can assign a timestamp to each item in your table, indicating when it is considered expired or no longer needed. After the specified timestamp, DynamoDB automatically removes the item from the table, eliminating the need for you to manually delete it. The primary benefit of TTL is that it allows you to reduce stored data volumes by eliminating outdated or irrelevant items with no operational overhead. This can be particularly useful in scenarios like the one outlined earlier, where you have large amounts of data that become outdated over time. You can keep your table lean and ensure that you’re only retaining the most relevant and current data for your workload by automatically removing expired items.
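As an example, the following sketch enables TTL on a hypothetical source table named exam-scores using an attribute called expireAt (the attribute name is your choice; its value must be a Number holding an expiry time in Unix epoch seconds), then writes an item that expires in roughly 5 years:

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the Standard table, keyed off the "expireAt" attribute.
# Table and attribute names are illustrative placeholders.
dynamodb.update_time_to_live(
    TableName="exam-scores",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expireAt"},
)

# Write an item that expires roughly 5 years from now.
five_years = 5 * 365 * 24 * 60 * 60
dynamodb.put_item(
    TableName="exam-scores",
    Item={
        "studentId": {"S": "student-123"},
        "examId": {"S": "exam-456"},
        "score": {"N": "92"},
        "expireAt": {"N": str(int(time.time()) + five_years)},
    },
)
```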
Importantly, DynamoDB deletes expired items without consuming any write throughput, meaning you won’t incur additional costs or impact performance when removing outdated data. When you use DynamoDB global tables with TTL, DynamoDB replicates TTL deletes across all replicas, and these replicated deletes incur write costs according to each replica Region’s configured capacity mode and table class.
Overview of DynamoDB Streams
DynamoDB Streams provides a time-ordered log containing changes to items in a DynamoDB table. When an application creates, updates, or deletes an item in a table, a record of the modification is written to the table’s corresponding stream.
By default, DynamoDB Streams collects the following actions performed on DynamoDB items:
- INSERT – A new item was added to the table
- MODIFY – One or more of an existing item’s attributes were modified
- REMOVE – An item was deleted from the table
You can choose which data the stream captures from the following options:
- Key attributes only – Only the key attributes of the modified item
- New image – The entire item, as it appears after it was modified
- Old image – The entire item, as it appeared before it was modified
- New and old images – Both the new and the old images of the item
With DynamoDB Streams, you can natively capture the items that TTL expires and use those records for further processing.
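For this solution, the stream must capture old images so that an expired item’s full contents are available to archive. The following is a minimal boto3 sketch, reusing the hypothetical exam-scores table from earlier:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable a stream that captures both old and new images; the old image
# holds the item as it appeared before the TTL delete.
dynamodb.update_table(
    TableName="exam-scores",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```

The Old image view type alone would also work for this pattern, because the solution only reads the item as it appeared before it expired.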
When items are expired with DynamoDB TTL, they create a record in DynamoDB Streams with the following fields:
- Records[<index>].userIdentity.type – "Service"
- Records[<index>].userIdentity.principalId – "dynamodb.amazonaws.com"
These properties can then be added as an event filter for Lambda functions as seen below:
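A minimal boto3 sketch of such a mapping follows; the function name, account ID, Region, and stream ARN are hypothetical placeholders:

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Attach the stream to the function with a filter that matches only
# TTL deletes performed by the DynamoDB service itself.
lambda_client.create_event_source_mapping(
    FunctionName="archive-expired-items",  # placeholder function name
    EventSourceArn=(
        "arn:aws:dynamodb:us-east-1:123456789012:"
        "table/exam-scores/stream/2023-01-01T00:00:00.000"
    ),
    StartingPosition="TRIM_HORIZON",
    FilterCriteria={
        "Filters": [
            {
                "Pattern": json.dumps(
                    {
                        "userIdentity": {
                            "type": ["Service"],
                            "principalId": ["dynamodb.amazonaws.com"],
                        }
                    }
                )
            }
        ]
    },
)
```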
By using this event filter, you can make sure your Lambda function is invoked only for DynamoDB TTL deletes. This results in fewer invocations and greater cost savings.
Use Lambda to write to DynamoDB Standard-IA tables
Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs code on a highly available compute infrastructure in response to an event, and you are charged only for the resources consumed. Lambda has out-of-the-box integrations with a variety of Amazon Web Services services, including DynamoDB Streams.
You are not charged for the GetRecords API calls invoked by Lambda as part of consuming data from DynamoDB Streams. Standard charges for Lambda invocation duration will apply as per Lambda pricing.
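To complete the pipeline, the Lambda function deserializes each expired item’s old image from the stream record and writes it to the Standard-IA table. The following is a minimal Python sketch, assuming the hypothetical exam-scores-archive table from earlier and a stream view type that includes old images:

```python
import boto3
from boto3.dynamodb.types import TypeDeserializer

deserializer = TypeDeserializer()
dynamodb = boto3.resource("dynamodb")
# Hypothetical Standard-IA destination table created earlier
archive_table = dynamodb.Table("exam-scores-archive")


def _to_plain_dict(image):
    # Convert a DynamoDB-JSON stream image into a plain Python dict
    return {k: deserializer.deserialize(v) for k, v in image.items()}


def lambda_handler(event, context):
    # With the event filter in place, every record is a TTL delete
    with archive_table.batch_writer() as batch:
        for record in event["Records"]:
            old_image = record["dynamodb"].get("OldImage")
            if old_image:
                batch.put_item(Item=_to_plain_dict(old_image))
```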
Conclusion
In this post, we discussed archiving data from DynamoDB Standard tables to DynamoDB Standard-IA tables using DynamoDB TTL, DynamoDB Streams, and Lambda with event filtering. By taking advantage of Amazon Web Services services and their native integrations, you can build a fully managed and cost-effective solution to archive data within DynamoDB Standard-IA tables. Customers who want to maintain accessibility to their data through DynamoDB APIs while saving costs on storing cold data should consider implementing this solution.
Join the conversation! Your feedback and experiences are invaluable to us and our community. Dive into the comments below to share your insights, ask questions, or offer alternative viewpoints. Let’s collaboratively enhance our understanding! For more information about using DynamoDB, please see the Amazon DynamoDB Developer Guide.
About the Authors
Andrew Chen is an EdTech Solutions Architect with an interest in data analytics, machine learning, and virtualization of infrastructure. Andrew has previous experience in management consulting in which he worked as a technical lead for various cloud migration projects. In his free time, Andrew enjoys fishing, hiking, kayaking, and keeping up with financial markets.
Lee Hannigan is a Sr. DynamoDB Specialist Solutions Architect based in Donegal, Ireland. He brings a wealth of expertise in distributed systems, backed by a strong foundation in big data and analytics technologies. In his role as a DynamoDB Specialist Solutions Architect, Lee excels in assisting customers with the design, evaluation, and optimization of their workloads leveraging DynamoDB’s capabilities.