Friday, December 13, 2019

Integrating with a Real Server

Go to the Table of Contents for the Java API Gateway

Now things start to get interesting.  We have completed our "characterization test" phase of AWS and we have figured out how to build a gateway, deploy some Java into it and handle the request and response objects.  Now we want to connect this to a system designed to do real work.

The eagle-eyed among you will have noticed that when I created more resources and methods in the post on input, I connected them all to the same lambda I had been working with.  This was very much a deliberate choice - the "integration" part of each API Gateway Method definition allows you to choose which lambda to proxy to - and I made it because all the information we need is there to handle our own routing within the lambda.

How much to put into a single lambda - and when to start creating new lambdas - is a thorny topic and not one I am going to get into here.  Suffice it to say that general software engineering best practices should apply: if splitting would cause significant code duplication, put it all in one lambda.  On the other hand, the duplication only needs to exist in the packaged product - which is just storage and configuration - so you may find that moving towards more lambdas is a good idea.  What I would definitely advise against is literally duplicating code to create multiple lambdas.  Code duplication is almost always a bad idea.

All of this code is in the repository, tagged API_GATEWAY_TDA.

A Tell-Don't-Ask Server

As trailed in the last post, I'm going to be working towards integrating a tell-don't-ask HTTP server into API Gateway here.  Again, for full disclosure, I already have this working within a Grizzly setup.  I am going to copy-and-paste some of that code into the blog area to make it clear what is going on, and then rework the handler, request and response to fit with it.  You will also notice that I have changed the API Gateway configuration to point to a new resource, so if you are following along and haven't already, you will need to drop and recreate your gateway configuration.

What is a tell-don't-ask server?  Simply put, it views an HTTP request as a pipeline: the data from the request is fed into the handler, which then places its output (status, data, etc.) in a response object, which in turn feeds it back to the client.

To be clear, no public HTTP processing framework I am aware of works like this: most pass a handler both a request object (from which it is supposed to pull data) and a response object to write to.  The AWS API we are working with here expects the handler to pull from a request object and then create a response object.  While I'm not going to get into the philosophy of why I prefer functional and pipeline approaches here, suffice it to say that there are reasons of modularity, composability and testability which I find compelling.

The intent is that the server is configured to map input paths to "handler generators": that is, classes which act as factories for handlers, creating a new object for each incoming request and attaching any global configuration (such as access to databases).  The server then processes the incoming request and "tells" the handlers the data that it wants to know to do its job.  Once the handler has been created and configured, it is given a response object and told to get on with it.  When complete, the response can be processed by the server. 
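The shape described above can be sketched in a few interfaces.  To be clear, all the names here (HandlerFactory, RequestHandler and so on) are illustrative stand-ins, not the repository's actual types:

```java
// Sketch of the tell-don't-ask contract.  The server creates a handler
// per request via a factory, tells it what it needs to know, hands it a
// response object, and tells it to get on with the job.
public class TellDontAskSketch {
    interface HttpResponse {
        void status(int code);
        void write(String body);
    }

    interface RequestHandler {
        // The handler never pulls from a request; it is given a response
        // and told to complete it.
        void process(HttpResponse response);
    }

    interface HandlerFactory {
        // One fresh handler per incoming request; global configuration
        // (database access etc.) would be attached here.
        RequestHandler create();
    }

    // A trivial handler and an in-memory response to show the flow.
    static class HelloHandler implements RequestHandler {
        public void process(HttpResponse response) {
            response.status(200);
            response.write("hello");
        }
    }

    static class MemoryResponse implements HttpResponse {
        int code;
        String body;
        public void status(int code) { this.code = code; }
        public void write(String body) { this.body = body; }
    }

    public static void main(String[] args) {
        HandlerFactory factory = HelloHandler::new;
        MemoryResponse resp = new MemoryResponse();
        factory.create().process(resp);
        System.out.println(resp.code + " " + resp.body); // prints "200 hello"
    }
}
```

Note that nothing in this contract mentions HTTP requests at all: the handler only ever sees the data it was told and the response it must complete, which is what makes these handlers easy to test in isolation.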

What is being Requested, anyway?

One of the things we did not look at in the last post when considering input data was how to find out what the actual resource being requested was.  It is possible to determine this by implementing additional setters on the Request object.  The documentation says that httpMethod, resource and path are the ones to look at.  For our purposes, the overlap between path and resource means that path is probably not going to be particularly useful except for debugging and errors.
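In a proxy-integration event, resource carries the template (e.g. "/pet/{id}") while path carries the concrete URL that was actually requested (e.g. "/pet/42") - hence the overlap.  The extra setters look like this; AWS's JSON-to-POJO mapping calls them by name, though the routingKey helper is my own illustrative addition:

```java
// Request POJO with the additional setters.  AWS deserializes the
// incoming event and calls setHttpMethod/setResource/setPath by name.
public class Request {
    private String httpMethod;
    private String resource;
    private String path;

    public void setHttpMethod(String httpMethod) { this.httpMethod = httpMethod; }
    public void setResource(String resource) { this.resource = resource; }
    public void setPath(String path) { this.path = path; }

    // Method plus resource template is enough to dispatch on; path is
    // kept around mainly for debugging and error messages.
    public String routingKey() {
        return httpMethod + " " + resource;
    }
}
```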

For this post, I've essentially thrown away what I've done up to this point and started again.  Specifically, I've created a set of new packages to hold the "client" code, the necessary interfaces and an implementation server.

The code that we have been working with over the past posts becomes "implementation detail" in this version and thus is buried inside the server.  At the same time, I have taken the opportunity to add setters for these additional data items.

Initialization in Lambda

As far as I can tell from reading the AWS documentation, there is no specific mechanism for starting up or "bootstrapping" a lambda.  The documentation is clear that lambdas can, by and large, be reused between invocations, but on the first invocation you have to make sure yourself that everything gets initialized.

The one guarantee that we have is that the API Gateway is going to call our Handler method before anything else we care about, and Java semantics guarantee that before an object is created all of its class's static members will be initialized and any static initializer blocks will have run.

Like most software engineers, I don't like static variables and I don't like this way of approaching problems.  On the other hand, it is possible to use a static member to declare that there is a singleton configuration object which needs to be created.
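This guarantee can be demonstrated in miniature.  The names below are illustrative; the point is that the static member is constructed exactly once per JVM (that is, once per warm lambda container), before the first handler instance is used:

```java
// Demonstrates the class-initialization guarantee we are relying on:
// the static singleton is built once, no matter how many times the
// handler is constructed afterwards.
public class StaticInitDemo {
    static class Initialization {
        static int timesConstructed = 0;
        Initialization() { timesConstructed++; }
    }

    static class Handler {
        // Java runs this exactly once, before any Handler can be
        // constructed or any of its methods called.
        static final Initialization INIT = new Initialization();
    }

    public static void main(String[] args) {
        new Handler();
        new Handler(); // a "reused" container handling a second request
        System.out.println(Initialization.timesConstructed); // prints 1
    }
}
```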

So to make the initialization work, I have defined an Initialization class and created a single static instance of it in the top-level handler.  For testing purposes (not that I am seriously testing this version of this code) it is possible to create the Initialization class independently of this static variable, and I will treat the Handler class as I would a main method: it does the minimal amount of wiring up, which can be "tested" in production, and otherwise depends on thorough testing of all its components.

This Initialization class is again an "implementation detail" and is mainly responsible for making sure that a central configuration repository is initialized, accessible to all handlers, and then calling a user-provided configuration class (passing it the repository instance).  The name of this class is determined from an environment variable passed into the lambda definition.
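The reflective part of that can be sketched as follows.  The interface name, the map-standing-in-for-a-repository, and the idea that the class name arrives as a plain string are all assumptions of mine; in the lambda itself the name would come from something like System.getenv, with the variable set in the lambda definition:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: load a user-provided configuration class by name and pass it
// the central repository.  All names here are illustrative.
public class InitSketch {
    public interface ServerConfig {
        void configure(Map<String, String> repository);
    }

    // Stands in for the user's configuration class, which registers
    // handler factories against method/resource combinations.
    public static class DemoConfig implements ServerConfig {
        public void configure(Map<String, String> repository) {
            repository.put("GET /demo", "DemoHandlerFactory");
        }
    }

    public static Map<String, String> initialize(String configClassName) throws Exception {
        Map<String, String> repository = new HashMap<>();
        ServerConfig config = (ServerConfig)
            Class.forName(configClassName).getDeclaredConstructor().newInstance();
        config.configure(repository);
        return repository;
    }

    public static void main(String[] args) throws Exception {
        // In the lambda this name would come from an environment variable,
        // e.g. System.getenv("CONFIG_CLASS") - the variable name is hypothetical.
        Map<String, String> repo = initialize("InitSketch$DemoConfig");
        System.out.println(repo.get("GET /demo")); // prints "DemoHandlerFactory"
    }
}
```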

The Processing Flow

The processing of a request begins in the "implementation detail" handler.  It is provided with a ProcessorRequest object which has been populated by the AWS server.  But our handler, being tell-don't-ask, rejects the opportunity to extract data from the request; instead it creates a response object and tells the request to get on with the job of processing itself, passing any output to the response.  When complete, it returns the response to AWS.

The ProcessorRequest object first talks back to the central repository and asks it whether it can create a handler for the given combination of method and resource.  If it cannot, it populates the response with a 404 and gives up.

Otherwise, it looks at the capabilities of the defined handler using reflection and populates it accordingly: if it wants headers, it gives it those; if it wants path parameters, it gives it those; if it wants query or form parameters, it gives it those; and if it wants the body, it gives it the body.

Finally, when the handler is fully configured, the obligatory process method is called, providing the response object, and the handler is expected to "do its thing" and complete the response object.  When the userspace handler completes, so does the handler method in the ProcessorRequest object, and finally the AWS handler returns the response.
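The whole flow - lookup, 404, capability-based population, process - can be sketched in one place.  The post says capabilities are detected "using reflection"; here I use optional interfaces and instanceof checks, which is the simplest form of that idea, and every name is illustrative rather than the repository's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the ProcessorRequest flow: look up a factory for the
// method/resource pair, 404 if absent, otherwise create a handler,
// tell it only what it has declared it wants, and let it run.
public class DispatchSketch {
    interface Response { void status(int code); void write(String body); }
    interface WantsHeaders { void headers(Map<String, String> headers); }
    interface WantsBody { void body(String body); }
    interface Handler { void process(Response response); }
    interface HandlerFactory { Handler create(); }

    // Stands in for the central repository configured at initialization.
    static final Map<String, HandlerFactory> REPOSITORY = new HashMap<>();

    // A handler that declares (via WantsBody) that it wants the body.
    static class EchoHandler implements Handler, WantsBody {
        private String body;
        public void body(String body) { this.body = body; }
        public void process(Response response) {
            response.status(200);
            response.write(body);
        }
    }

    static void process(String method, String resource,
                        Map<String, String> headers, String body,
                        Response response) {
        HandlerFactory factory = REPOSITORY.get(method + " " + resource);
        if (factory == null) {
            response.status(404);
            response.write("no handler for " + method + " " + resource);
            return;
        }
        Handler handler = factory.create();
        // Tell the handler only what it has declared it wants to know.
        if (handler instanceof WantsHeaders) ((WantsHeaders) handler).headers(headers);
        if (handler instanceof WantsBody) ((WantsBody) handler).body(body);
        handler.process(response);
    }

    static class MemoryResponse implements Response {
        int code; String body;
        public void status(int code) { this.code = code; }
        public void write(String body) { this.body = body; }
    }

    public static void main(String[] args) {
        REPOSITORY.put("POST /echo", EchoHandler::new);

        MemoryResponse ok = new MemoryResponse();
        process("POST", "/echo", new HashMap<>(), "hello", ok);
        System.out.println(ok.code + " " + ok.body); // prints "200 hello"

        MemoryResponse missing = new MemoryResponse();
        process("GET", "/nowhere", new HashMap<>(), null, missing);
        System.out.println(missing.code); // prints 404
    }
}
```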

What about the duplication?

Apologies to those of you who have been following along and asking this question.   Clearly, there is duplication between the paths that are set up in API Gateway and the path matching that is being done here.  The question is how to avoid it.

If you follow the "pure" API Gateway approach, it is possible to eliminate the duplication by associating each of the methods with its own endpoint in its own lambda; you could then remove the initialization and pattern matching here.  The problem with that (from my perspective) is that you are trading a small amount of mapping duplication for a huge amount of infrastructural duplication.

The other possibility is to deal with the duplication by creating one of the copies from the other.  The duplication will still technically "exist" but it will no longer be your problem.

My solution in real life is to extract all of the mapping setup into external (JSON) files which are then read to configure the server.  These same files can then be used at deployment time to generate the appropriate configurations in API Gateway.  Doing this is left as an exercise to the reader.
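Such a mapping file might look something like this - the field names and factory classes are hypothetical, but the idea is that both the server's repository and the API Gateway deployment script read the same source of truth:

```json
{
  "routes": [
    { "method": "GET",  "resource": "/pet/{id}", "factory": "blog.pet.GetPetFactory" },
    { "method": "POST", "resource": "/pet",      "factory": "blog.pet.CreatePetFactory" }
  ]
}
```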

Abstractions and Testing

What may or may not be clear is that I have taken the same (actually slightly simplified) abstraction as I was already using for Grizzly and redeployed it under API Gateway.  That means that, with minimal effort, I can continue to run "local" end-to-end tests inside a Grizzly server on actual hardware, and only deploy to API Gateway for "staging" and "production" purposes.

The whole point, of course, of using these interfaces is to isolate my development cycle from the underlying technologies and so unit testing of handlers in this form becomes a breeze (although you do need to be careful that your tests populate the handlers in the same way that the configuration does).

Conclusion

Hopefully that was not too breathless a tour of integrating a separate method dispatcher into API Gateway.  In principle, I don't see any reason why the same technique would not work for established frameworks such as Spring or Struts.

For me, the key conclusion is that it is possible to deploy behind API Gateway with minimal or no changes to the underlying logic, just a separate adapter/wrapper for the web technology.  This gives me the freedom to develop and test my code in units and deploy to either Grizzly containers or to AWS API Gateway.

Next: Logging to Cloudwatch
