Rate limited bulk operations in DynamoDB Shell
DynamoDB provides APIs to update, delete, and replace items, but each of these operates on exactly one item. When you need to perform data maintenance on many items, you have to write a bespoke application that uses the DynamoDB API to fetch the items to be modified and then modifies them one at a time. You also have to implement code that keeps these operations from affecting the foreground application’s use of the same tables. DynamoDB Shell (ddbsh) provides a simpler way to do this. It supports SQL-like constructs that can operate on many or all items in a table and perform UPDATE, DELETE, or REPLACE operations in a controlled manner.
Note that ddbsh is provided for your use on an as-is basis and is not supported for production use cases. It should only be used for non-production and experimental use cases. Refer to
ddbsh operates directly against your DynamoDB tables, so deletes and drops affect your tables and are irreversible. The scans and queries that ddbsh performs count against your table capacity and could incur significant costs. In this post, we show you how to rate limit bulk operations in ddbsh.
For details on how to restrict access to specific APIs, refer to
It is strongly advised that you understand what ddbsh is doing, and
Examples
Let’s take a look at some simple examples. For this, we use a table with a
We populate it with 2,000 items, each of which looks something like the following code:
Each long string is 5,000 characters long. Reading all the data from this table ends up consuming 1227.5
Even if we fetch only the three numbers (a, b, c), the operation consumes the same number of RCUs because the scan has to read each item in full. Also, this entire SELECT operation took just over 1 second:
Introduction to rate limiting
Suppose we want to ensure that this command never consumes more than 50 RCUs per second. To do that, we add a ratelimit to the SELECT operation:
The command now took 23 seconds to complete!
The following are similar commands that use the -q (quiet) command line option and pipe the results to a line counter (wc -l). The first one has no rate limit (takes 1.05 seconds), the second has a 50 RCU limit (takes 23.39 seconds), and the last one has a 10 RCU limit (takes 115.33 seconds). All of them produce 2,000 rows of output.
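As a rough cross-check of these timings, the total capacity consumed by the scan can be divided by the rate limit. The following is a minimal sketch using only the figures quoted in this post (1227.5 RCUs for the full scan, limits of 50 and 10 RCUs per second); the function name is hypothetical and is not part of ddbsh.

```python
def naive_duration_seconds(total_capacity_units, limit_per_second):
    """Estimate run time as total consumed capacity divided by the rate limit."""
    return total_capacity_units / limit_per_second


TOTAL_RCUS = 1227.5  # RCUs consumed by the full scan, as reported in this post

# (limit in RCUs per second, measured run time in seconds from this post)
for limit, measured in [(50, 23.39), (10, 115.33)]:
    estimate = naive_duration_seconds(TOTAL_RCUS, limit)
    print(f"limit {limit} RCU/s: estimate {estimate:.2f} s, measured {measured} s")
```

The measured runs finish somewhat sooner than this simple division suggests. One plausible reason, offered here as an assumption rather than a measured fact, is that the capacity consumed by the final request never has to be waited for before the command ends.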
How rate limiting works
Rate limiting in DynamoDB Shell is implemented using a simple token bucket algorithm. A token bucket accumulates tokens at a predetermined rate. An operation is only allowed to proceed when there is a positive number of tokens in the bucket. When the operation is complete, the resources it consumed are computed and the appropriate number of tokens is removed from the bucket (the number of remaining tokens is allowed to go negative). The bucket is never allowed to hold more than 1 second’s worth of tokens. For the complete implementation of the token bucket, refer to the
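The behavior described above can be illustrated with a short Python sketch. This is not the ddbsh implementation; it is a minimal model of the stated rules (refill at a fixed rate, proceed only on a positive balance, charge after the fact, allow debt, cap at 1 second's worth). The class and function names, the assumption that the bucket starts full, and the figure of 128 capacity units per request are all hypothetical choices made for illustration. A simulated clock is used so the example runs instantly.

```python
import time


class TokenBucket:
    """Token bucket that may go into debt, capped at one second's worth of tokens."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate_per_second)
        self.capacity = self.rate    # never hold more than 1 second's worth
        self.tokens = self.capacity  # assumption: the bucket starts full
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        """Add tokens for the time elapsed since the last refill, up to the cap."""
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def wait_until_positive(self):
        """Block until the balance is strictly positive."""
        self._refill()
        while self.tokens <= 0:
            # Sleep long enough to pay off the debt, plus a small margin.
            self._sleep(-self.tokens / self.rate + 0.01)
            self._refill()

    def charge(self, consumed):
        """Deduct what the finished operation actually used; debt is allowed."""
        self._refill()
        self.tokens -= consumed


class FakeClock:
    """Simulated time source so the demonstration does not really sleep."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate(rate, costs):
    """Run operations with the given costs; return (simulated seconds, final balance)."""
    clock = FakeClock()
    bucket = TokenBucket(rate, clock=clock.time, sleep=clock.sleep)
    for cost in costs:
        bucket.wait_until_positive()
        bucket.charge(cost)
    return clock.now, bucket.tokens


# Hypothetical workload: 10 requests that each consume 128 capacity units.
elapsed, balance = simulate(rate=50, costs=[128] * 10)
print(f"simulated {elapsed:.2f} s, final balance {balance:.1f} tokens")
```

Because the charge is applied only after an operation completes, a single large request can exceed the limit momentarily; the debt it leaves behind delays the next request, which is what keeps the average rate near the limit.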
For each command, DynamoDB Shell implements two token buckets: one for read tokens and one for write tokens. Therefore, you can perform a rate-limited update as follows:
The write took over 11 minutes and updated all 2,000 items. The same update without rate limiting takes less than 5 seconds:
Queries can specify a read limit, a write limit, or both. The syntax for the rate limit clause is as follows:
Updating with indexes
Suppose we want to update the table bulkops and set all items where b = 30 as follows:
This query would need to perform a scan of the table. But there’s a way to make this easier because we have a global secondary index (GSI). As an extension to SQL, ddbsh lets you specify the index as the update target. In reality, this only updates the table, but it uses the index to find the items to update, as shown in the following code. The update on bulkops.bulkgsi allows DynamoDB Shell to perform a query against the index and use the value it found there to perform the single update.
Observe that the query against the index projects the key attribute of the table, which is then used in the UpdateItem call that follows.
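The query-then-update pattern described above can be sketched in Python against the low-level DynamoDB client interface (Query followed by UpdateItem). This is an illustration of the pattern, not what ddbsh runs internally. The function name and all parameters are hypothetical, and the sketch assumes that the table has a partition-key-only primary key and that the index is keyed on the attribute being matched. The client is passed in so the sketch has no hard dependency; in practice it could be a boto3 DynamoDB client.

```python
def update_via_index(client, table, index, index_key, index_value,
                     table_key, update_expression, update_values):
    """Find items through an index, then update each one in the base table.

    Assumes (hypothetically) a partition-key-only table whose key attribute
    is named by `table_key`. Values use the low-level format, e.g. {"N": "30"}.
    Returns the number of items updated.
    """
    query_args = {
        "TableName": table,
        "IndexName": index,
        "KeyConditionExpression": "#k = :v",
        # Placeholders avoid clashes with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#k": index_key, "#pk": table_key},
        "ExpressionAttributeValues": {":v": index_value},
        # Only the table's key attribute is needed to address each item.
        "ProjectionExpression": "#pk",
    }
    updated = 0
    while True:
        page = client.query(**query_args)
        for item in page.get("Items", []):
            # One UpdateItem call per item found through the index.
            client.update_item(
                TableName=table,
                Key={table_key: item[table_key]},
                UpdateExpression=update_expression,
                ExpressionAttributeValues=update_values,
            )
            updated += 1
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return updated
        # Continue from where the previous page stopped.
        query_args["ExclusiveStartKey"] = last_key
```

The index query reads only the small projected entries for matching items, whereas a scan of the base table would read every item in full regardless of whether it matches.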
Remember to delete the table when you are done.
Conclusion
DynamoDB Shell provides some SQL-like constructs and extensions that allow you to perform bulk UPDATE, DELETE, and REPLACE operations with rate limiting. This rate limiting makes sure that the operations don’t consume more than a certain number of RCUs and WCUs. This can be useful when you want to perform these operations without impacting other traffic that may be going to these tables.
If you have questions about DynamoDB Shell, or suggestions for improvements, please contact us or provide feedback. If you would like to learn more about cost-effective bulk processing with DynamoDB, refer to
About the author
Amrith Kumar is a Senior Principal Engineer in Amazon Web Services and works on Amazon DynamoDB.