Wednesday, April 1, 2020

I just upgraded to Java 11 this weekend

I upgraded to Java 11 this weekend.  I'm neither proud nor bitter about this per se, but I am bitter about the way in which I was forced into it and the incomplete way in which I ended up doing it.

To be clear, upgrading past Java 8 has been on my list for a while.  However, any number of constraints (Android, AWS Lambda) have been holding me back.  Late last year, Lambda started supporting Java 11 and migrating to that officially went onto my "to do" list.

But I ended up upgrading this weekend in a somewhat awkward halfway-house way because of a number of incompatible bugs in different versions of Java.

The Background

I have an application which has a Java server and both a Java and JavaScript client.  The Java client is supposed to operate on both desktops and Android phones, but leaving aside the fact that Android does not support Java 11 yet, the Android aspect of this is not relevant to this article.

Obviously, being me, I need to test both of these "end-to-end".  And I don't want to do that with any kind of wrapper and framework that takes forever to set up and ends up being very flaky.

So the best bet seemed to be to run a Web Client inside a Java runtime (the only alternative I seriously considered was to use ChromeDriver to run the tests against a Chrome instance).  In the end, I selected the UI4J library as being the simplest thing to use.

The UI4J library is really just a wrapper around a JavaFX WebView, which is built in to Java 8, but has been packaged as OpenJFX after the Java 9 breakup.  In retrospect, I'm not sure how much it buys me given that it's an extra moving part in this scenario, but upgrading was painful enough without deciding to rearchitect at the same time.

My client and server connect using websockets.  JavaFX supports websockets, although as with everything else in its JavaScript implementation, it can be very difficult to diagnose problems with them.  But I had code that was working and successfully connecting if I ran it in a browser; it just wasn't connecting in my test case.

The Problems

The first problem I had to solve was the websocket not connecting.  Googling this revealed that WebSocket support quietly broke between Java 8 u202 and u211.  I was running u211.  OK, so I haven't upgraded for a while.  I felt I'd tried at some point and had issues, but that was then.  I downloaded u241 and tried that.

It turned out to be worse.  I'm not quite sure why, but I ran into run-time link errors with UI4J.  Apparently, somebody thought it would be a good idea to change part of the JavaFX API between u211 and u241 - or else UI4J was using an undocumented interface in JavaFX that changed, and they didn't release a revised version that "fixed" that.

No other Java 8 versions are available for download.  Bite the bullet and upgrade!

I downloaded a Java 11 JDK package on my Mac and tried to restart Eclipse.  It wouldn't.  What?  I set JAVA_HOME back to Java 8 and tried again.  It still died due to this issue.  I don't know why this is so all-pervading that having this dylib anywhere causes such heartache, but it does.  Fortunately, deleting the entire package solved the problem and I was able to go back to Java 8 and re-run Eclipse.

OK, what now?

I downloaded a ".tar.gz" version of the same thing and didn't put it in a central place but just unpacked it inside "Downloads/".  Needless to say, macOS wasn't impressed by this trick and I spent a lot of time dealing with security warnings.  But having done that, I was able to start Eclipse with Java 8 and configure it internally to have a Java 11 runtime (from Downloads(!)) and only end up with about 250 errors.

The problem, of course, being that JavaFX was no longer included in the Java runtime, so I had to go and find all of the appropriate libraries (JavaFX itself is broken up into modules) - and also upgrade to UI4J version 4.0.

Along the way, it turned out I needed to upgrade a few other packages (BouncyCastle, for instance, needed to go from 1.60 to 1.64) to be compatible with the module requirements.  And I had some SSL and KeyTool code that needed upgrading because I'd written it to use internal Sun APIs.

But by the end of the weekend, I'd worked my way through all those problems and my WebSocket successfully connected and I was able to go back to doing TDD with a working test harness.

The Solution

So there I am.  For the time being (at least until they release the next version) I am still "running" Java 8, but Eclipse is the only thing that uses that.  Inside Eclipse, I am configured for language compatibility with Java 8 (to stop myself drifting from an Android-compatible build) while building my projects with a Java 11 JDK and OpenJFX 11 and UI4J 4.0.  Jenkins had no real problems upgrading and is running under (and using) the Linux version of the Java 11 JDK.  And my AWS Lambdas are now using the "java11" runtime.

Update

I spoke too soon.  It turns out that there are bugs in the (it would seem new) implementation of TLS 1.3 in Java 11.  I haven't found the specific bug I encountered described online, but suffice it to say that it took a long time to track down, mainly because I assumed it was a bug in my code.

I have a stress test and, as part of my standard automated build, I run a "sanity" version of the stress test - client and server in one executable, five client threads and ten client sessions.  It runs for about 10-20s and goes through some standard operations.
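For what it's worth, a harness of that shape can be given an overall deadline so that a hang shows up as a failed run rather than a stalled build.  What follows is only a minimal sketch, not my actual test: the names SanityStress and runSessions are made up for illustration, and the "session" is just a Runnable standing in for whatever a real client session does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: runs a fixed number of sessions on a fixed number of threads.
public class SanityStress {
    /** Returns true only if every session finished normally before the deadline. */
    static boolean runSessions(int threads, int sessions, long timeoutSeconds, Runnable session)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int i = 0; i < sessions; i++) {
                tasks.add(() -> { session.run(); return null; });
            }
            // invokeAll with a timeout cancels every task that has not completed in time.
            List<Future<Void>> results = pool.invokeAll(tasks, timeoutSeconds, TimeUnit.SECONDS);
            for (Future<Void> result : results) {
                if (result.isCancelled()) {
                    return false; // did not finish before the deadline
                }
                try {
                    result.get(); // rethrows any exception the session threw
                } catch (ExecutionException e) {
                    return false;
                }
            }
            return true;
        } finally {
            pool.shutdownNow(); // interrupts workers that are still running
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Five client threads, ten sessions, 30 second deadline; the session body is a placeholder.
        boolean ok = runSessions(5, 10, 30, () -> { });
        System.out.println(ok ? "sanity run passed" : "sanity run failed or timed out");
    }
}
```

Note that cancellation works by interrupting the worker threads, so a thread spinning in a loop that never checks for interruption will keep spinning; the sketch only makes the caller find out about the hang.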

After the upgrade, it would hang.  After diagnosing a number of timeout issues, it became clear that it was hanging because the server would respond to one request, the client would receive the response but fail to make the next connection.  However, the server responding threads would never terminate while something was chewing up 1000% of CPU (the joys of having 16 cores!).

It turns out that the problem was that the SSLEngine implementation for TLS 1.3 has an issue with multiple threads: it goes into what looks like an infinite loop and never quite closes a connection - or possibly can't open a new one.  As more threads try to access the SSLEngine, they seem to get sucked in too.

While I wasn't able to isolate this enough to produce a test case to file a bug, I was able to track it down on my machine by pausing all the threads.  As I released individual threads that were "in" SSLEngineImpl, each of them took up a whole processor and my CPU usage jumped to 100%, 200%, 300% ...
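I did this by hand in the debugger, but the same question ("which threads are currently inside this class?") can also be asked from code.  The following is a sketch of that alternative rather than what I actually did; StuckThreadFinder and threadsInside are made-up names, and it relies only on the standard Thread.getAllStackTraces() call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists threads that have a stack frame in a given class.
public class StuckThreadFinder {
    /** Returns the names of live threads with at least one frame whose class name contains the fragment. */
    static List<String> threadsInside(String classNameFragment) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().contains(classNameFragment)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // "SSLEngineImpl" is the class name mentioned in the text above.
        System.out.println(threadsInside("SSLEngineImpl"));
    }
}
```

Calling this a few seconds apart from a watchdog thread and seeing the same names each time would be a hint (not proof) that those threads are stuck rather than just passing through.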

The fix is quite easy - and described by any number of articles discussing similar problems - turn off TLS 1.3.  Exactly how you do this depends on what communications infrastructure you are using; I am using Grizzly servers and Apache HTTP clients and this worked for me:
-Djdk.tls.client.protocols=TLSv1.2
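Note that because I am running both sides of the connection in a single application, I am able to be very clear that I want to use 1.2.  There is no way to say "NOT 1.3" directly; the property takes a comma-separated list, so to exclude 1.3 you need to list every other protocol version you still want to allow.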
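If you would rather do this in code than on the command line, the "everything except 1.3" idea can be expressed with the standard JSSE API.  This is only a sketch under assumptions: Tls12Only and engineWithoutTls13 are names I have made up, it uses the JVM's default SSLContext, and it does not show how to hand the result to Grizzly or the Apache HTTP client, which each have their own configuration hooks.

```java
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

// Hypothetical example: build an SSLEngine with TLS 1.3 filtered out.
public class Tls12Only {
    /** Creates an engine from the default context and removes TLSv1.3 from its enabled protocols. */
    static SSLEngine engineWithoutTls13() throws Exception {
        SSLContext context = SSLContext.getDefault();
        SSLEngine engine = context.createSSLEngine();
        // Keep whatever the JVM enabled by default, minus TLS 1.3.
        String[] allowed = Arrays.stream(engine.getEnabledProtocols())
                .filter(protocol -> !protocol.equals("TLSv1.3"))
                .toArray(String[]::new);
        engine.setEnabledProtocols(allowed);
        return engine;
    }

    public static void main(String[] args) throws Exception {
        SSLEngine engine = engineWithoutTls13();
        // Prints the protocols that remain enabled; the exact list depends on the JVM.
        System.out.println(Arrays.toString(engine.getEnabledProtocols()));
    }
}
```

Filtering the default list, rather than hard-coding "TLSv1.2", means the code keeps whichever older protocols the JVM still enables, which is the same effect as listing them all in the system property.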
