Ignorance may be Strength : January 2023

Apologies: somewhat in violation of my own rules, I don't have any actual working example code. This is more complex than usual, and the number of moving parts was just too vast. But if you want any help or conversation about this, drop me a line and I'll do/share whatever I can.

When I saw that AWS had released a "SnapStart" feature for Lambda, I was ecstatic. I have taken to using Lambda as a way of delivering servers with minimum fuss, but I have somewhat abused the technology by basically moving my existing server-based code into a lambda, along with its long start time.

I grew up in a world where the logic has always been that what you want to optimize is the time spent doing frequent operations; initialization time is effectively "free" (up to the point where it becomes minutes or hours, at which point you really have to do something about it). Lambda, on the other hand, says you have 10s to initialize: if you fail to complete in this time, you will be rejected and the whole thing starts again. When you are trying to configure multiple things with multiple services … it doesn't quite make it.

I finally went over the edge when I found that in order to support the latest version of JavaFX, I needed to copy a whole bunch of ".so" files from S3 to the "local disk". This was taking pretty much all the 10s … As I started to consider my options (I'd got as far as panicking, and decided that was not very productive), AWS announced the "SnapStart" feature: initialize once, use repeatedly. Excited, I turned it on for my functions (so excitedly, I just did it in the console, rather than using CloudFormation; more on that later).

And nothing happened. Or, at least, I continued to have problems. But why?

It only works on published functions

Uh, boss, it's not as simple as that. On the upside, it is very clear that the 10s timeout does not apply to SnapStart functions: you have up to 900s of initialization time when the function is published. On the downside, this implies (and, indeed, it is explicitly stated) that you need to publish your functions in order to take advantage of this. My test environment does not bother with publishing and versioning lambdas, so I still want to bring that initialization time down.

The Runtime Hooks

Before doing that, I decided to at least try and integrate the runtime hooks and issue tracing messages. My thought process was firstly to see if these were called when the lambda wasn't published, and even if they weren't, I would then at least know when I had successfully published the lambda, because my tracing would come out.

In order to turn on the runtime hooks, it is necessary to download the CRaC library (the org.crac package) and attach it to your project. The AWSHandler then needs to implement the CRaC Resource interface and register itself with the CRaC global context. So I added code like this:

    public AWSHandler() {
        // Register this handler so the runtime calls its CRaC hooks.
        Core.getGlobalContext().register(this);
    }

and then it's a simple matter of implementing the callbacks provided in the Resource interface:

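    // Called once, just before Lambda snapshots the initialized function.
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) {
        logger.info("before checkpoint");
    }

    // Called every time an execution environment is restored from the snapshot.
    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) {
        logger.info("after restore");
    }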

When It Works …

I added the relevant code to publish and alias my lambdas, and also to ensure that the code inside APIGateway called the alias (and thus the published version), rather than the "$LATEST" version, and, lo and behold, it all worked.

During the publication, this happens:

INIT_START Runtime Version: java:11.v15 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:0a25e3e7a1cc9ce404bc435eeb2ad358d8fa64338e618d0c224fe509403583ca
Picked up JAVA_TOOL_OPTIONS: -Dui4j.headless=true
-Dglass.platform=Monocle
-Dmonocle.platform=Headless
-Dprism.order=sw
-Djavafx.cachedir=/tmp/solibs
20230104-13:06:48.640 tdaserver/Thread-0 INFO: In config

The key thing here is the "INIT_START" rather than just "START". Interestingly, it doesn't seem to issue any message when the initialization is done; it just stops issuing messages.

And then, when the lambda is called during API Gateway access, I see this:

RESTORE_START Runtime Version: java:11.v15 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:0a25e3e7a1cc9ce404bc435eeb2ad358d8fa64338e618d0c224fe509403583ca
RESTORE_REPORT Restore Duration: 383.49 ms
START RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa Version: 1
…
END RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa
REPORT RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa Duration: 785.60 ms Billed Duration: 1044 ms Memory Size: 1024 MB Max Memory Used: 401 MB Restore Duration: 383.49 ms

Here the RESTORE_START and RESTORE_REPORT lines make it clear that a SnapStart image is being used, and how much time was spent actually starting the lambda (383ms may seem a lot, until you realize that the initialization itself took over 15000ms).
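If you want to check for this programmatically rather than by eyeballing the logs, something like the following might do. It is only a sketch: it assumes the log lines look exactly like the ones above, and the class and method names (RestoreDurationParser, restoreMillis) are my own invention, not part of any AWS library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RestoreDurationParser {
    // Matches e.g. "Restore Duration: 383.49 ms" in a RESTORE_REPORT or REPORT line.
    // The format is assumed from the log excerpts in this post.
    private static final Pattern RESTORE =
        Pattern.compile("Restore Duration: ([0-9]+(?:\\.[0-9]+)?) ms");

    /** Returns the restore duration in milliseconds, or empty if the line has none. */
    public static Optional<Double> restoreMillis(String logLine) {
        Matcher m = RESTORE.matcher(logLine);
        return m.find()
            ? Optional.of(Double.parseDouble(m.group(1)))
            : Optional.empty();
    }

    public static void main(String[] args) {
        String restored = "RESTORE_REPORT Restore Duration: 383.49 ms";
        String plain = "START RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa Version: 1";
        System.out.println(restoreMillis(restored)); // prints Optional[383.49]
        System.out.println(restoreMillis(plain));    // prints Optional.empty
    }
}
```

A line with no "Restore Duration" field simply yields an empty result, so an invocation whose REPORT line comes back empty was presumably not started from a snapshot.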

After that, the lambda proceeds in the normal way.

No CRaC Output

Interestingly, up to this point, I have not seen any of the tracing output I would expect from my CRaC callbacks. I don't know whether I didn't succeed in registering correctly, or whether my tracing is simply not coming out. In the fullness of time, I will need to sort this out, because after a restore it is necessary to check that all the initialization captured in the snapshot is still up to date.

Configuring from CloudFormation

As a bleeding-edge adopter, when I first tried to use SnapStart, there wasn't any active CloudFormation documentation on using it. For all I know, it wasn't supported in CloudFormation. However, now, a couple of months later, all the relevant documentation is there.

As you'd expect, you configure this by adding a SnapStart property to your Lambda Function configuration. It is quite simple and basically amounts to adding:

    "SnapStart": {
        "ApplyOn": "PublishedVersions"
    }

to your existing function declarations.
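Since SnapStart only applies to published versions, the function declaration needs a version and an alias alongside it. The fragment below is a sketch of how the three resources might fit together, not my actual template: the logical IDs, the handler name, the bucket and key, the alias name "live" and the ServerFunctionRole (assumed to be defined elsewhere in the template) are all placeholders.

```json
{
  "Resources": {
    "ServerFunction": {
      "Type": "AWS::Lambda::Function",
      "Properties": {
        "Runtime": "java11",
        "Handler": "com.example.AWSHandler::handleRequest",
        "Role": { "Fn::GetAtt": ["ServerFunctionRole", "Arn"] },
        "Code": { "S3Bucket": "example-bucket", "S3Key": "server.jar" },
        "SnapStart": { "ApplyOn": "PublishedVersions" }
      }
    },
    "ServerFunctionVersion": {
      "Type": "AWS::Lambda::Version",
      "Properties": { "FunctionName": { "Ref": "ServerFunction" } }
    },
    "ServerFunctionAlias": {
      "Type": "AWS::Lambda::Alias",
      "Properties": {
        "FunctionName": { "Ref": "ServerFunction" },
        "FunctionVersion": { "Fn::GetAtt": ["ServerFunctionVersion", "Version"] },
        "Name": "live"
      }
    }
  }
}
```

Anything that invokes the function (API Gateway, in my case) then has to point at the alias rather than at "$LATEST", otherwise the snapshot is never used.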

Conclusion

For a long time, Lambda on AWS with Java has been plagued by painfully slow startup times. It does seem that SnapStart makes major strides towards addressing these and, provided you are publishing your lambdas, is relatively easy to set up.

On the other hand, it seems somewhat opaque to use.

Thursday, January 5, 2023

Lambda Snapstart is Harder than I Thought