We frequently upgrade our Amazon ElastiCache fleet, with patches and upgrades being applied to instances seamlessly. We do this in one of two ways:
(a) continuous managed maintenance, and (b) service updates. These maintenance and service updates are required to apply upgrades that strengthen security, reliability, and operational performance.
Continuous managed maintenance happens from time to time and directly in your maintenance windows without requiring any action from your end.
Service updates give you flexibility to apply them on your own. They are timed and may be moved into the maintenance window to be applied by us after their due date lapses.
You have the option to manage updates yourself at any time prior to the scheduled maintenance window. When you manage an update yourself, your instance will receive the OS update when you relaunch the node and your scheduled maintenance window will be cancelled.
Q: What are service updates in Amazon ElastiCache?
Service updates is a feature in Amazon ElastiCache that enables you to apply certain host updates at your discretion. These updates can be of the following types: security patches or minor software updates. These updates help strengthen security, reliability, and operational performance of your clusters.
The value of these service updates is that you can control when to apply the update (e.g., you can delay applying service updates when there is an important business event that requires 24x7 availability of ElastiCache clusters).
Q: How do I get notified of an available ElastiCache service update?
When service updates applicable to your Memcached or Redis clusters become available, we will notify you via several channels, including the Amazon ElastiCache console, email, Amazon Simple Notification Service (SNS), Amazon Personal Health Dashboard, and Amazon CloudWatch events.
Q: How are updates applied in the maintenance window different from the service updates?
Updates available via our continuous managed maintenance are separate from those offered by service updates. Updates applied via continuous managed maintenance are scheduled directly in your maintenance windows without any action needed from your side. Service updates are timed and let you choose when to apply them, up to the “Recommended Apply by Date”. If they are still not applied by then, ElastiCache may schedule these updates in your maintenance window.
Q: How do I determine whether I should apply the available service update?
We recommend that you apply service updates as per your business cadence. Even if you are unable to apply a service update by its “Recommended Apply by Date”, you will be able to apply it until its “Update Expiration Date”. However, the “Update Expiration Date” can change at any time depending on the availability of new updates.
Q: What is the impact of applying a service update to ElastiCache for Redis clusters? Will I lose data or connectivity to my clusters?
No. Redis clusters will continue to serve traffic and experience downtime of only a few seconds. When you select one or more Redis clusters to apply a service update to, Amazon ElastiCache applies the update one node at a time, with all shards in parallel, until all selected clusters are updated.
- There will be no change in the cluster configuration.
- You will see a delay in your CloudWatch metrics that catch up as soon as possible.
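The "one node at a time with all shards in parallel" ordering can be pictured as a series of waves, where each wave touches at most one node per shard. The following is a minimal sketch of that idea only, not ElastiCache's actual scheduler; the function name `update_waves`, the node names, and the order of nodes within each shard are all assumptions made for illustration.

```python
from itertools import zip_longest


def update_waves(shards):
    """Group nodes into waves: wave i holds the i-th node of every shard.

    `shards` maps a shard name to an ordered list of node names.
    The ordering within a shard is an assumption for this sketch.
    """
    waves = []
    for group in zip_longest(*shards.values()):
        # Shards with fewer nodes contribute None once they are done.
        waves.append([node for node in group if node is not None])
    return waves


# Hypothetical three-shard cluster with uneven shard sizes.
shards = {
    "shard-1": ["s1-node-a", "s1-node-b"],
    "shard-2": ["s2-node-a", "s2-node-b"],
    "shard-3": ["s3-node-a"],
}

waves = update_waves(shards)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {wave}")
```

Each wave contains no more than one node from any shard, which is why every shard can keep serving traffic from its remaining nodes while the wave is in progress.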
Q: Is there downtime associated with applying the update to ElastiCache for Memcached clusters?
Yes. Each node is replaced by a new, empty node, so its cache contents are lost and the cache starts fresh.
Q: Can I opt out of service updates?
Yes. However, if the value of the “Auto-Update after Due Date” attribute of a service update is “yes” and the “Recommended Apply by Date” has passed, ElastiCache will schedule the service update on any remaining clusters for the next maintenance window. If you apply the service update to the remaining clusters before that maintenance window, ElastiCache will not reapply it during the maintenance window.
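The rule above can be written as a small decision function. This is a sketch of the logic as described in this FAQ, not a call into any ElastiCache API; the function and parameter names are hypothetical, and treating "has passed" as strictly later than the date is an assumption.

```python
from datetime import date


def will_be_auto_scheduled(auto_update_after_due_date, recommended_apply_by,
                           today, already_applied):
    """Return True if the update would be scheduled for the next maintenance window.

    auto_update_after_due_date: True when the attribute value is "yes".
    already_applied: True when you applied the update to the cluster yourself.
    """
    if already_applied:
        # An update you applied yourself is not reapplied.
        return False
    if not auto_update_after_due_date:
        # With the attribute set to "no", the update stays optional.
        return False
    # Assumption: "has passed" means strictly after the date.
    return today > recommended_apply_by


due = date(2024, 3, 1)
print(will_be_auto_scheduled(True, due, date(2024, 3, 2), already_applied=False))
print(will_be_auto_scheduled(True, due, date(2024, 3, 2), already_applied=True))
print(will_be_auto_scheduled(False, due, date(2024, 3, 2), already_applied=False))
```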
Q: Why can’t the service updates be directly applied by ElastiCache during maintenance windows?
The purpose of service updates is to give you flexibility on when to apply them. Clusters that are not participating in the ElastiCache-supported compliance programs can choose to not apply these updates, or apply them at a reduced frequency throughout the year. This is true only when the value of “Auto-Update after Due Date” attribute of a service update is “no”. For more information, see Can I opt out of service updates?.
Q: Can I use service updates to opt out of Amazon ElastiCache service maintenance and apply the available updates myself?
No. Service updates are separate from, and do not replace, the continuous managed maintenance updates that Amazon ElastiCache applies directly during your clusters’ maintenance windows.
Q: Where is the list of all service update attributes?
A complete list of attributes and their descriptions is available in Applying the Self-Service Updates.
Q: Do all service updates have the same timeline to apply?
To help determine how soon to apply the available service updates, you can refer to the “Severity” service update attribute which has the following values (in order of priority):
1. critical: Recommended to apply immediately (within 14 days or less)
2. important: Recommended to apply as soon as your business flow allows (within 30 days or less)
3. medium: Recommended to apply within 60 days or less
4. low: Recommended to apply within 90 days or less
For more details refer to our public documentation – Applying Updates.
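As a rough illustration, the severity levels above can be turned into a lookup table for planning. The sketch below assumes the day counts start on the date the update became available, which this FAQ does not state; `RECOMMENDED_DAYS`, `latest_recommended_date`, and the update names are hypothetical.

```python
from datetime import date, timedelta

# Day counts taken from the severity list above, in order of priority.
RECOMMENDED_DAYS = {"critical": 14, "important": 30, "medium": 60, "low": 90}
PRIORITY = tuple(RECOMMENDED_DAYS)  # ("critical", "important", "medium", "low")


def latest_recommended_date(severity, available_on):
    """Last day of the recommended period, counted from `available_on`.

    Counting from the availability date is an assumption of this sketch.
    """
    return available_on + timedelta(days=RECOMMENDED_DAYS[severity])


# Hypothetical pending updates as (name, severity) pairs.
updates = [("update-a", "low"), ("update-b", "critical"), ("update-c", "medium")]
ordered = sorted(updates, key=lambda item: PRIORITY.index(item[1]))

for name, severity in ordered:
    print(name, severity, latest_recommended_date(severity, date(2024, 1, 1)))
```

Sorting by the index in `PRIORITY` puts the most urgent updates first, so they can be worked through from the top of the list.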
Q: How often are service updates released?
The release schedule depends on the importance of the service updates.
Q: What is the “Service Update SLA Met” attribute?
This attribute reflects whether your cluster was updated by the “Recommended Apply by Date”. If a service update is applied after the “Recommended Apply by Date”, the attribute “Service Update SLA Met” is set to “no”.
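The comparison behind this attribute can be sketched in a few lines. This is an illustration of the rule as described here, with hypothetical names; it assumes that applying the update on the “Recommended Apply by Date” itself still counts as meeting the date, and it only covers updates that have actually been applied.

```python
from datetime import date


def service_update_sla_met(applied_on, recommended_apply_by):
    """Return "yes" or "no" following the rule described above.

    Assumption: applying on the recommended date itself counts as "yes".
    """
    return "yes" if applied_on <= recommended_apply_by else "no"


print(service_update_sla_met(date(2024, 5, 10), date(2024, 5, 15)))
print(service_update_sla_met(date(2024, 5, 20), date(2024, 5, 15)))
```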
Q: If I miss one or more service updates, will I be able to apply them later?
Yes. Unless noted otherwise in the service update “Description” attribute, service updates are always cumulative: if you miss applying them by the “Update Expiration Date”, they will be included in the next service update. Service updates of type “security” fall under this cumulative category.
Q: Can I choose to apply a service update to specific nodes in an ElastiCache cluster?
No, service updates are applied at the cluster level. If you cancel an ongoing update, a cluster may have some nodes updated and some nodes not updated. In this case, the cluster will continue to show up in the list of clusters to apply the service update to. The cluster will continue to operate normally.
Q: Why did the update status for one or more nodes in my ElastiCache clusters change from “not-applied” to “complete” even when I did not apply the service update?
There are two cases when this may happen:
(a) You did not apply an optional service update, and the update is now in “expired” status. For this reason, clusters participating in compliance programs must always apply all service updates.
(b) If your node(s) are replaced for any other reason, such as a planned maintenance event or node failover, Amazon ElastiCache will provision new node(s) with the latest service updates included.
In both cases, the cluster will continue to operate normally.
Q: What if I have nodes that I want to apply expired service updates to? Should I wait for next service update?
New nodes contain all applicable service updates, so you can manually replace the existing nodes that haven’t been updated to get the latest updates.
Q: Are service updates engine-specific?
Yes. A service update may be applicable to only Redis, only Memcached, or both Redis and Memcached. You can look for the “Engine” and “Engine Version” service update attributes to determine the scope of each update.
Continuous Managed Maintenance Updates
Q: What is a continuous managed maintenance update?
These updates are mandatory and applied directly in your maintenance windows without any action needed from your side. These updates are separate from those offered by service updates.
Q: How long does a node replacement take?
A replacement typically completes within a few seconds. The replacement may take longer with certain instance configurations and traffic patterns. For example, a Redis primary node may not have enough free memory and may be experiencing high write traffic. When an empty replica syncs from this primary, the primary node may run out of memory trying to handle the incoming writes as well as sync the replica. In that case, the primary disconnects the replica and restarts the sync process. It may take multiple attempts for the replica to sync successfully. It is also possible that the replica may never sync if the incoming write traffic remains high.
Memcached nodes do not need to sync during replacement and are always replaced fast irrespective of node sizes.
Q: How does a node replacement impact my application?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. For single-node Redis clusters, ElastiCache dynamically spins up a replica, replicates the data, and then fails over to it. For replication groups consisting of multiple nodes, ElastiCache replaces the existing replicas and syncs data from the primary to the new replicas. If Multi-AZ with auto failover is enabled, replacing the primary triggers a failover to a read replica. For Redis Cluster configurations that are set up to use Redis Cluster clients, and non-Cluster configurations with auto failover enabled, the planned node replacements complete while the cluster serves incoming write requests. If Multi-AZ is disabled, ElastiCache replaces the primary and then syncs the data from a read replica. The primary node is unavailable during this time, leading to a longer write interruption.
For Memcached nodes, the replacement process brings up an empty new node and terminates the current node. The new node will be unavailable for a short period during the switch. Once switched, your application may see performance degradation while the empty new node is populated with cache data.
Q: What best practices should I follow to ensure a smooth replacement experience and minimize data loss?
For Redis nodes, the replacement process is designed to make a best effort to retain your existing data and requires successful Redis replication. We try to replace just enough nodes from the same cluster at a time to keep the cluster stable. You can provision primary and read replicas in different availability zones. In this case, when a node is replaced, the data will be synced from a peer node in a different availability zone. For single node Redis clusters, we recommend that sufficient memory is available to Redis, as described here. For Redis replication groups with multiple nodes, we also recommend scheduling the replacement during a period with low incoming write traffic.
For Memcached nodes, schedule your maintenance window during a period with low incoming write traffic, test your application for failover and use the ElastiCache provided "smarter" client. You cannot avoid data loss as Memcached has data purely in memory.
Q: What client configuration best practices should I follow to minimize application interruption during maintenance?
For Redis, the cluster mode configuration has the best availability during managed or unmanaged operations, and we recommend always using a client that supports cluster mode and connects to the cluster discovery endpoint. With cluster mode disabled, always use the primary endpoint for write operations. The individual node endpoints of the replica nodes can be used for read operations. If auto failover is enabled in the cluster, the primary node may change, so the application should confirm the role of each node and update its read endpoints to avoid placing a major load on the primary. With auto failover disabled, the role of a node will not change, but downtime during managed or unmanaged operations is higher than in clusters with auto failover enabled. Avoid directing read requests to read replicas only. If you do configure your client to direct read requests to read replicas only, ensure that you have at least two read replicas to avoid any read interruption during maintenance.
Q: How do I manage node replacements on my own?
We recommend that you allow ElastiCache to manage your node replacements for you during your scheduled maintenance window. You can specify your preferred time for replacements via the weekly maintenance window when you create an ElastiCache cluster. To change your maintenance window to a more convenient time later, you can use the ModifyCacheCluster API or click Modify in the ElastiCache Management Console.
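As an illustration of the API route, the sketch below builds a weekly window string and passes it to ModifyCacheCluster through boto3. It is a sketch under stated assumptions rather than a reference implementation: it relies on the documented `ddd:hh24:mi-ddd:hh24:mi` window format in UTC and a 60-minute minimum window, the helper names `maintenance_window` and `set_cache_cluster_window` are hypothetical, and the boto3 call needs the third-party `boto3` package plus valid AWS credentials.

```python
DAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")
MINUTES_PER_DAY = 24 * 60


def maintenance_window(day, start_hour, start_minute=0, duration_minutes=60):
    """Build a weekly window string such as "sun:23:00-mon:01:30" (times in UTC)."""
    if day not in DAYS:
        raise ValueError(f"unknown day: {day!r}")
    if not (0 <= start_hour <= 23 and 0 <= start_minute <= 59):
        raise ValueError("start time out of range")
    if duration_minutes < 60:
        # Assumption: the minimum window length is 60 minutes.
        raise ValueError("window must be at least 60 minutes")

    start = DAYS.index(day) * MINUTES_PER_DAY + start_hour * 60 + start_minute
    # Wrap around the end of the week (sun -> mon).
    end = (start + duration_minutes) % (7 * MINUTES_PER_DAY)

    def fmt(total_minutes):
        day_index, remainder = divmod(total_minutes, MINUTES_PER_DAY)
        hour, minute = divmod(remainder, 60)
        return f"{DAYS[day_index]}:{hour:02d}:{minute:02d}"

    return f"{fmt(start)}-{fmt(end)}"


def set_cache_cluster_window(cluster_id, window):
    """Apply the window with ModifyCacheCluster (requires boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the helper above works without it

    client = boto3.client("elasticache")
    return client.modify_cache_cluster(
        CacheClusterId=cluster_id,
        PreferredMaintenanceWindow=window,
    )


print(maintenance_window("sun", 23, 0, duration_minutes=150))
print(maintenance_window("fri", 16))
```

A call such as `set_cache_cluster_window("my-cluster", maintenance_window("fri", 16))` would then request the change, where `my-cluster` is a placeholder cluster ID.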
If you choose to manage the replacement yourself, you can take various actions depending on your use case and cluster configuration:
- Change the Maintenance Window.
- Re-launch your Redis instance using Backup & Restore process.
- If your Redis cluster configuration is Cluster Mode Disabled
o Replace a read-replica (Cluster-Mode Disabled) – A procedure to manually replace a read-replica in a Redis replication group.
o Replace the primary node (Cluster-Mode Disabled) – A procedure to manually replace the primary node in a Redis replication group.
o Replace a standalone node (Cluster-Mode Disabled) – Two different procedures to replace a standalone Redis node.
- If your Redis cluster configuration is Cluster Mode Enabled
o Replace a node in cluster with one or more shards – You can either use backup and restore or scale-out followed by a scale-in to replace the nodes.
For more instructions on all these options, see the Actions You Can Take When a Node is Scheduled for Replacement page.
For Memcached, you can just delete and re-create the clusters. Post replacement, your instance should no longer have a scheduled event associated with it.
Q: How do I find out about upcoming scheduled replacements?
ElastiCache will send you email notifications before your node is scheduled for replacement. You can use the Cache Events section of the ElastiCache Management Console or use the describe-events API to check for the upcoming ElastiCache:NodeReplacementScheduled event. Finally, you can set up Amazon SNS notifications for this event in Redis using the information provided here.
For setting up SNS notifications in Memcached, use the information provided here.
Q: Can I change the scheduled maintenance to a more suitable time?
Yes, you can change your cluster’s maintenance window. To change your maintenance window to a more convenient time, you can use the API (ModifyCacheCluster or ModifyReplicationGroup) or click Modify in the ElastiCache Management Console.
Once you change your maintenance window, ElastiCache service will schedule your node for maintenance during the newly specified window. Please see examples on how the changes take effect below.
Let's say, currently it's Thursday, 11/09, at 1500 and the next maintenance window is Friday, 11/10, at 1700. Following are 3 scenarios with their outcomes:
- You change your maintenance window to Friday at 1600 (after the current date time and before the next scheduled maintenance window). The node will be replaced on Friday, 11/10, at 1600.
- You change your maintenance window to Saturday at 1600 (after the current date time and after the next scheduled maintenance window). The node will be replaced on Saturday, 11/11, at 1600.
- You change your maintenance window to Wednesday at 1600 (earlier in the week than the current date time). The node will be replaced next Wednesday, 11/15, at 1600.
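All three scenarios follow one rule: the replacement lands on the first occurrence of the new window that starts after the current time. The sketch below reproduces the scenarios with that rule. The function name is hypothetical, and the year 2017 is an assumption chosen only because 11/09 falls on a Thursday in that year.

```python
from datetime import datetime, timedelta

WEEKDAYS = {"monday": 0, "tuesday": 1, "wednesday": 2, "thursday": 3,
            "friday": 4, "saturday": 5, "sunday": 6}


def next_window_start(now, weekday, hour, minute=0):
    """First start of the weekly window that is strictly after `now`."""
    days_ahead = (WEEKDAYS[weekday] - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # The window for this week has already started or passed.
        candidate += timedelta(days=7)
    return candidate


# Thursday, 11/09, at 1500 (the year is an assumption for this example).
now = datetime(2017, 11, 9, 15, 0)

for day in ("friday", "saturday", "wednesday"):
    print(day, next_window_start(now, day, 16).strftime("%a %m/%d %H%M"))
```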
Q: Why are you doing these node replacements?
These replacements are needed to apply mandatory software updates to your underlying host. The updates help strengthen our security, reliability, and operational performance.
Q: Do these replacements affect my nodes in Multiple Availability Zones at the same time?
We may replace multiple nodes from the same cluster, depending on the cluster configuration, while maintaining cluster stability. For Redis sharded clusters, we try not to replace multiple nodes in the same shard at a time. In addition, we try not to replace a majority of the primary nodes in the cluster across all the shards.
For non-sharded clusters, we will attempt to stagger node replacements over the maintenance window as much as possible to continue maintaining cluster stability.
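The two guidelines for sharded clusters can be expressed as a check on a proposed batch of nodes. The sketch below is only an illustration of those guidelines, which the service follows on a best-effort basis; the data layout and the name `batch_is_safe` are hypothetical, and "majority" is interpreted as more than half of the primaries.

```python
def batch_is_safe(shards, batch):
    """Check a batch of node names against the two guidelines above.

    shards: {shard_name: {"primary": node_name, "replicas": [node_name, ...]}}
    batch:  node names proposed for replacement at the same time.
    """
    batch = set(batch)
    primaries_in_batch = 0
    for shard in shards.values():
        nodes = {shard["primary"], *shard["replicas"]}
        if len(nodes & batch) > 1:
            # More than one node from the same shard at a time.
            return False
        if shard["primary"] in batch:
            primaries_in_batch += 1
    # A majority means more than half of all primaries.
    return primaries_in_batch * 2 <= len(shards)


# Hypothetical three-shard cluster.
cluster = {
    "shard-1": {"primary": "s1-p", "replicas": ["s1-r1"]},
    "shard-2": {"primary": "s2-p", "replicas": ["s2-r1"]},
    "shard-3": {"primary": "s3-p", "replicas": ["s3-r1"]},
}

print(batch_is_safe(cluster, ["s1-p", "s2-r1", "s3-r1"]))
print(batch_is_safe(cluster, ["s1-p", "s1-r1"]))
print(batch_is_safe(cluster, ["s1-p", "s2-p", "s3-r1"]))
```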
Q: Can the nodes in different clusters from different regions be replaced at the same time?
Yes, it is possible that these nodes will be replaced at the same time, if your maintenance window for these clusters is configured to be the same.