Reducing Java cold starts on Amazon Web Services Lambda functions with SnapStart
Written by Mark Sailes, Senior Serverless Solutions Architect, Amazon Web Services.
At Amazon Web Services re:Invent 2022, Amazon Web Services announced SnapStart for
Overview
For Lambda function invocations today, the largest contributor to startup latency is the time spent initializing the function. This includes loading the function’s code and initializing its dependencies. For interactive workloads that are sensitive to startup latency, this can lead to a suboptimal end-user experience.
To address this challenge, customers either provision resources ahead of time, or spend effort building relatively complex performance optimizations. Although these workarounds help reduce the startup latency, users must spend time on some heavy lifting instead of focusing on delivering business value. SnapStart addresses this concern directly for Java-based Lambda functions.
How SnapStart works
With SnapStart, when a customer publishes a function version, the Lambda service initializes the function’s code. It takes an encrypted snapshot of the initialized execution environment, and persists the snapshot in a tiered cache for low latency access.
When the function is first invoked, and again whenever it scales up, Lambda resumes new execution environments from the persisted snapshot instead of initializing them from scratch. This results in lower startup latency.

Lambda function lifecycle
A function version activated with SnapStart transitions to an inactive state if it remains idle for 14 days, after which Lambda deletes the snapshot. When you try to invoke a function version that is inactive, the invocation fails. Lambda sends a SnapStartNotReadyException and begins initializing a new snapshot in the background, during which the function version remains in Pending state. Wait until the function reaches the
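One way to handle the Pending period after a failed invocation is to poll the version's state with a backoff before retrying. The following is a minimal sketch, not part of the sample application: `VersionStateWaiter` and its `stateLookup` supplier are hypothetical names, and the supplier stands in for whatever SDK or CLI call you use to read the function version's state.

```java
import java.time.Duration;
import java.util.function.Supplier;

/** Polls a function version's state until it becomes Active or attempts run out. */
final class VersionStateWaiter {

    // Hypothetical stand-in for a call that returns the current state,
    // for example "Pending" or "Active".
    private final Supplier<String> stateLookup;

    VersionStateWaiter(Supplier<String> stateLookup) {
        this.stateLookup = stateLookup;
    }

    /** Returns true once the state is "Active", false if it never gets there. */
    boolean awaitActive(int maxAttempts, Duration initialDelay) {
        long delayMillis = initialDelay.toMillis();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if ("Active".equals(stateLookup.get())) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return false;
                }
                // Double the wait each time, capped at 30 seconds.
                delayMillis = Math.min(delayMillis * 2, 30_000L);
            }
        }
        return false;
    }
}
```

Only retry the invocation once `awaitActive` returns true. The attempt limit and delays are illustrative and should be tuned to how long your initialization code takes.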
Using SnapStart
Application frameworks such as
If the functionality that these frameworks bring is implemented at runtime, then they often contribute to latency in startup time. SnapStart allows you to use frameworks like Spring and not compromise tail latency.
To demonstrate SnapStart, I use a
To deploy:
- Clone the git repository and change to project directory:
git clone https://github.com/aws-samples/serverless-patterns.git
cd serverless-patterns/apigw-lambda-snapstart
- Use the Amazon Web Services SAM CLI to build the application:
sam build
- Use the Amazon Web Services SAM CLI to deploy the resources to your Amazon Web Services account:
sam deploy -g
This project deploys with SnapStart already enabled. To enable or disable this functionality in the Amazon Web Services Management Console:
- Navigate to your Lambda function.
- Select the Configuration tab.
- Choose Edit and change the SnapStart attribute to PublishedVersions.
- Choose Save.
Lambda Console configuration
- Select the Versions tab and choose Publish new.
- Choose Publish.
Once you’ve enabled SnapStart, Lambda publishes all subsequent versions with snapshots. How long it takes to publish a version depends on your initialization code, which can run for up to 15 minutes with this feature.
Considerations
Stale credentials
Using SnapStart and restoring from a snapshot often changes how you create functions. With on-demand functions, you might access one-time data in the init phase and then reuse it during future invokes. If this data is ephemeral, a database password for example, the password might change between the time you fetch the secret and the time you use it, leading to an error. You must write code to handle this error case.
With SnapStart, if you follow the same approach, your database password is persisted in an encrypted snapshot. All future execution environments have the same state. This can be days, weeks, or longer after the snapshot is taken. This makes it more likely that your function has the incorrect password stored. To improve this, you could move the functionality to fetch the password to the post-snapshot hook. With each approach, it is important to understand your application’s needs and handle errors when they occur.
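The approach above can be sketched as a small cache that is refreshed after a restore and again whenever authentication fails. This is an illustration under assumptions, not code from the sample application: `RefreshableSecret`, `AuthFailedException`, and the `loader` supplier are hypothetical names, and the loader stands in for a call to your secrets store. In a SnapStart function, `refresh()` would be called from the `afterRestore` hook described under Runtime hooks.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Caches a secret and reloads it when the cached copy may be stale. */
final class RefreshableSecret {

    /** Hypothetical exception that a client throws when the secret is rejected. */
    static final class AuthFailedException extends RuntimeException {
        AuthFailedException(String message) {
            super(message);
        }
    }

    // Hypothetical loader, for example a call to a secrets store.
    private final Supplier<String> loader;
    private volatile String cached;

    RefreshableSecret(Supplier<String> loader) {
        this.loader = loader;
    }

    /** Returns the cached secret, loading it on first use. */
    String get() {
        String value = cached;
        if (value == null) {
            value = loader.get();
            cached = value;
        }
        return value;
    }

    /** Reloads the secret; intended to be called from an afterRestore hook. */
    String refresh() {
        String value = loader.get();
        cached = value;
        return value;
    }

    /** Runs the action, reloading the secret and retrying once if it is rejected. */
    <T> T callWithSecret(Function<String, T> action) {
        try {
            return action.apply(get());
        } catch (AuthFailedException e) {
            return action.apply(refresh());
        }
    }
}
```

Retrying once after a refresh covers both cases from the text: a password that changed shortly after it was fetched, and one that was captured in a snapshot long before the restore.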

Demo application architecture
A second challenge in sharing the initial state is with randomness and uniqueness. If random seeds are stored in the snapshot during the initialization phase, then it may cause random numbers to be predictable.
Cryptography
Amazon Web Services has changed the managed runtime to help customers handle the effects of uniqueness and randomness when restoring functions.
Lambda has already incorporated updates to
Software that always gets random numbers from the operating system (for example, from /dev/random or /dev/urandom) is already resilient to snapshot operations. It does not need updates to restore uniqueness. However, customers who prefer to implement uniqueness using custom code for their Lambda functions must verify that their code restores uniqueness when using SnapStart.
For more details, read
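To make the risk concrete, the sketch below simulates a restore by rebuilding a generator from the same seed, which is effectively what happens when a seed chosen during initialization is captured in a snapshot. It then shows the alternative of drawing identifiers at invocation time. The class and method names are hypothetical, and the sketch assumes that `java.security.SecureRandom` in the managed runtime is among the sources that stay unique after a restore; confirm this in the Lambda documentation for your runtime.

```java
import java.security.SecureRandom;
import java.util.Random;

final class SnapshotRandomnessDemo {

    /** Simulates a generator whose seed was fixed during init and then snapshotted. */
    static long firstValueFromSeed(long snapshottedSeed) {
        return new Random(snapshottedSeed).nextLong();
    }

    /** Draws a 128-bit identifier at invocation time and returns it as hex. */
    static String freshId(SecureRandom random) {
        byte[] bytes = new byte[16];
        random.nextBytes(bytes);
        StringBuilder hex = new StringBuilder(32);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        long seed = 12345L; // hypothetical seed captured in a snapshot
        // Two "restored environments" with the same seed produce the same value.
        System.out.println(firstValueFromSeed(seed) == firstValueFromSeed(seed));
        // Identifiers drawn per invocation are expected to differ.
        SecureRandom random = new SecureRandom();
        System.out.println(freshId(random).equals(freshId(random)));
    }
}
```

The practical rule is to avoid generating seeds, identifiers, or nonces in initialization code, and to generate them inside the handler or in an after-restore hook instead.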
Runtime hooks
These pre- and post-hooks give developers a way to react to the snapshotting process.
For example, a function that must always preload large amounts of data from
The Java managed runtime uses the open-source
The following function example shows how you can create a function handler with runtime hooks. The handler implements the CRaC Resource and the Lambda RequestHandler interface.
...
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import org.crac.Core;
import org.crac.Resource;
...
public class HelloHandler implements RequestHandler<String, String>, Resource {

    public HelloHandler() {
        // Register this handler so the runtime calls its checkpoint and restore hooks.
        Core.getGlobalContext().register(this);
    }

    @Override
    public String handleRequest(String name, Context context) {
        System.out.println("Handler execution");
        return "Hello " + name;
    }

    // Runs before Lambda takes the snapshot.
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("Before Checkpoint");
    }

    // Runs after Lambda restores an execution environment from the snapshot.
    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        System.out.println("After Restore");
    }
}
For the classes required to write runtime hooks, add the following dependency to your project:
Maven
<dependency>
    <groupId>io.github.crac</groupId>
    <artifactId>org-crac</artifactId>
    <version>0.1.3</version>
</dependency>
Gradle
implementation 'io.github.crac:org-crac:0.1.3'
Priming
SnapStart and runtime hooks give you new ways to build your Lambda functions for low startup latency. You can use the pre-snapshot hook to make your Java application as ready as possible for the first invoke. Do as much as possible within your function before the snapshot is taken. This is called priming.
When you upload your zip file of Java code to Lambda, the zip contains .class files of bytecode, which can run on any machine with a JVM. When the JVM executes your bytecode, it initially interprets it and then compiles it into native machine code. This compilation stage is relatively CPU intensive and is performed by the just-in-time (JIT) compiler while the application runs.
You can use the before snapshot hook to run code paths before the snapshot is taken. The JVM compiles these code paths and the optimization is kept for future restores. For example, if you have a function that integrates with DynamoDB, you can make a read operation in your before snapshot hook.
This means that your function code, the Amazon Web Services SDK for Java, and any other libraries used in that action are compiled and kept within the snapshot. The JVM then doesn’t need to compile this code when your function is invoked, which lowers latency the first time an execution environment is invoked.
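A minimal way to structure priming is to keep the invocation code path in one method and give the handler a separate method that exercises it with a sample input. The sketch below is an assumption-laden illustration rather than the sample application's code: `PrimedHandler`, `prime`, and the `businessLogic` function are hypothetical, and `prime` is meant to be called from a `beforeCheckpoint` hook.

```java
import java.util.function.Function;

/** Sketch of a handler whose hot path can be exercised before the snapshot is taken. */
final class PrimedHandler {

    // Hypothetical business logic; in a real function this might read from DynamoDB.
    private final Function<String, String> businessLogic;

    PrimedHandler(Function<String, String> businessLogic) {
        this.businessLogic = businessLogic;
    }

    /** The code path that real invocations take. */
    String handle(String input) {
        return businessLogic.apply(input);
    }

    /**
     * Runs the invocation code path with a sample input so the work it triggers
     * happens before the snapshot. Returns the number of runs that completed.
     */
    int prime(String sampleInput, int iterations) {
        int completed = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                handle(sampleInput);
                completed++;
            } catch (RuntimeException e) {
                // Assumption: a failed priming run should not abort publishing the version.
                System.err.println("Priming run failed: " + e.getMessage());
            }
        }
        return completed;
    }
}
```

Choose a sample input that triggers only read operations or other side-effect-free work, because the priming call runs against real resources when the version is published.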
Priming requires that you understand your application code and the consequences of executing it. The sample application includes a before snapshot hook, which
Metrics
The following table shows the results of invoking the sample application's Lambda function 100 times per second for 10 minutes. The test was run against the same function, both with and without SnapStart.
Configuration | p50 | p99.9
On-demand | 7.87ms | 5,114ms
SnapStart | 7.87ms | 488ms
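Cold starts affect only a small share of invocations, which is why the two rows share the same p50 and differ only at p99.9. The sketch below shows how such percentiles can be computed from raw latency samples. The post does not say how its percentiles were calculated, so the nearest-rank method used here is an assumption, and the class name and sample values are hypothetical.

```java
import java.util.Arrays;

final class LatencyPercentiles {

    /**
     * Nearest-rank percentile. The percentile is given in tenths of a percent
     * (500 for p50, 999 for p99.9) so that the rank uses exact integer arithmetic.
     */
    static double percentile(double[] samplesMs, int perMille) {
        if (samplesMs.length == 0 || perMille < 1 || perMille > 1000) {
            throw new IllegalArgumentException("need samples and 1 <= perMille <= 1000");
        }
        double[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        // rank = ceil(perMille / 1000 * n), computed without floating point
        long rank = ((long) perMille * sorted.length + 999) / 1000;
        return sorted[(int) rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical data: 998 warm invocations and 2 cold starts.
        double[] samples = new double[1000];
        Arrays.fill(samples, 8.0);
        samples[0] = 5000.0;
        samples[1] = 5000.0;
        System.out.println("p50   = " + percentile(samples, 500));
        System.out.println("p99.9 = " + percentile(samples, 999));
    }
}
```

With this hypothetical data, the median reflects the warm invocations while p99.9 reflects a cold start, mirroring the shape of the table above.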
Conclusion
This blog post shows how SnapStart reduces startup (cold-start) latency for Java-based Lambda functions. You can configure SnapStart using
To learn more, see Configuring function options in the
This feature allows developers to use the on-demand model in Lambda with low-latency response times, without incurring extra cost. To read more about how to use SnapStart with partner frameworks, find out more from