In my daily life, I do a lot of work with compilers and programming languages. In the course of doing that, I want to provide a quality editing experience; in the modern era that means things like syntax highlighting, auto completion and rapid feedback on errors. But I don't want to write tools, I want to integrate with them. The question then arises as to which editors to integrate with. As a Java programmer, in the past I have generally used Eclipse, but it is not an easy architecture to plug into.

Recently, I have started using Microsoft's VSCode to do front-end JavaScript, HTML and CSS development: in spite of being a Microsoft product, it actually seems to be quite stable, reliable and sane. Its approach to embedding editing features for new languages is not necessarily to strictly embed them, but to allow for a connection to an external server which delivers the relevant features. This appeals to me because it enables me to write most of the integration code in Java - with which I am familiar and which is the implementation language of the compiler itself - thus making the job easier.

As an added benefit, the Language Server Protocol is supported by a wide array of languages and tools which means that doing this work once provides easier access to a range of tools, although it is still necessary to implement a relatively thin "client" experience for each tool.

So what is the Language Server Protocol?

Basically, the Language Server Protocol is a communication protocol between editing tools and "compilers" which abstracts away the language details and allows the two ends to communicate in terms of the kinds of abstract operations that tools want to perform on language elements - look up definitions, search for usages, complete symbols, etc.

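The protocol is a version of JSON-RPC in which each message is preceded by a small set of HTTP-style headers (most importantly Content-Length). It is not carried over HTTP itself, but over whatever byte stream connects the two ends, such as standard input/output or a socket.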
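To make the wire format concrete, here is a minimal sketch in plain Java (no LSP library) of wrapping a JSON-RPC body in a Content-Length header and reading it back. The class name LspFraming, its method names and the file URI are made up for illustration; the textDocument/definition request is just one example of the "look up definitions" operation mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of lsp4j: shows the header + body framing only.
class LspFraming {
    /** Prefixes a JSON body with the Content-Length header and a blank line. */
    static byte[] frame(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        // The length counts bytes of the encoded body, not characters.
        byte[] header = ("Content-Length: " + body.length + "\r\n\r\n")
                .getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[header.length + body.length];
        System.arraycopy(header, 0, out, 0, header.length);
        System.arraycopy(body, 0, out, header.length, body.length);
        return out;
    }

    /** Reads one framed message from the stream and returns its JSON body. */
    static String readMessage(InputStream in) throws IOException {
        int contentLength = -1;
        while (true) {
            String line = readHeaderLine(in);
            if (line.isEmpty()) break; // a blank line ends the header section
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                contentLength = Integer.parseInt(line.substring(colon + 1).trim());
            }
        }
        if (contentLength < 0) throw new IOException("missing Content-Length header");
        byte[] body = new byte[contentLength];
        int off = 0;
        while (off < contentLength) {
            int n = in.read(body, off, contentLength - off);
            if (n < 0) throw new IOException("stream ended mid-message");
            off += n;
        }
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Reads bytes up to the next line feed, dropping carriage returns. */
    private static String readHeaderLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\r') continue;
            if (b == '\n') return new String(buf.toByteArray(), StandardCharsets.US_ASCII);
            buf.write(b);
        }
        throw new IOException("stream ended inside header");
    }

    public static void main(String[] args) throws IOException {
        // Example request body; the URI and position are invented.
        String request = "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"textDocument/definition\","
                + "\"params\":{\"textDocument\":{\"uri\":\"file:///example.txt\"},"
                + "\"position\":{\"line\":3,\"character\":7}}}";
        byte[] wire = frame(request);
        System.out.println(new String(wire, StandardCharsets.UTF_8));
        System.out.println(readMessage(new ByteArrayInputStream(wire)).equals(request));
    }
}
```

In practice a library does this work for you; the sketch is only meant to show how little there is underneath.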

Building a Server in Java

Nobody wants to actually go to all the effort of writing the code to read and write JSON-RPC messages by hand. Fortunately, people have been there before us and done that. For example, in Java there is the lsp4j library, which makes it possible to write to interfaces and then have a main method that wires up a server.

Most of the work lies in implementing the LanguageServer interface and all of its methods. The main method then instantiates this implementation and wraps it in a "server" using the LSPLauncher.createServerLauncher() method. In addition to the server instance, this method requires an input stream and an output stream. Where do these come from?

This is where the genuine connection to the client comes in. You need a physical transport layer - most likely a socket connection - from which you can extract a stream in each direction. In passing these to the "server" you enable it to read requests and write responses.

Finally, there is a little bit of magic in wiring up the "remote client" interface (by which the server communicates back with the actual client) with the server code by having the server implement the LanguageClientAware interface.
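The transport part of this section can be sketched with nothing but the JDK. The example below is an assumption-laden stand-in, not the real thing: a ServerSocket plays the role of the editor, the "server" connects to its port and extracts one stream in each direction, and a trivial ack message takes the place of real protocol traffic. The class and method names are hypothetical, and the lsp4j calls appear only as comments showing where the streams would be handed over.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: shows where the two streams come from, nothing more.
class SocketTransportSketch {
    /** Reads exactly n bytes, or fails if the stream ends first. */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int read = in.read(buf, off, n - off);
            if (read < 0) throw new IOException("stream closed early");
            off += read;
        }
        return buf;
    }

    /** Sends one message from the "editor" to the "server" and returns the reply. */
    static String exchange(String fromEditor) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Stand-in for the editor: port 0 asks the OS for any free port.
        try (ServerSocket editor = new ServerSocket(0, 1, loopback)) {
            int port = editor.getLocalPort(); // a real server would be told this port
            try (Socket server = new Socket(loopback, port);
                 Socket editorSide = editor.accept()) {
                // The server's view of the connection: one stream in each direction.
                InputStream in = server.getInputStream();
                OutputStream out = server.getOutputStream();

                // With lsp4j on the classpath, these streams would be passed on like this:
                //   Launcher<LanguageClient> launcher =
                //       LSPLauncher.createServerLauncher(languageServer, in, out);
                //   languageServer.connect(launcher.getRemoteProxy()); // LanguageClientAware
                //   launcher.startListening();

                byte[] request = fromEditor.getBytes(StandardCharsets.UTF_8);
                editorSide.getOutputStream().write(request);
                editorSide.getOutputStream().flush();

                // Here the "server" just acknowledges whatever it read.
                byte[] received = readExactly(in, request.length);
                byte[] reply = ("ack:" + new String(received, StandardCharsets.UTF_8))
                        .getBytes(StandardCharsets.UTF_8);
                out.write(reply);
                out.flush();

                return new String(readExactly(editorSide.getInputStream(), reply.length),
                        StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(exchange("hello")); // prints "ack:hello"
    }
}
```

Both ends run in one thread here only because the messages are tiny; a real server would hand the streams to the launcher and let it do the reading and writing.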

Embedding a Connector in VSCode

Integrating with VSCode is not quite as simple as merely implementing a server. For a variety of reasons, VSCode requires a "beachhead" on the client side to handle the communication with the server.

The process is described in the VSCode documentation. Apart from anything else, this client-side code defines the capabilities of the language server and provides the implementation of the connection layer, including starting up the "remote" (from VSCode's point of view) server.

Additionally, some of the functionality associated with a language is not implemented over the server protocol at all. Syntax highlighting, for example, is implemented strictly on the client side using regular expression matching. I can't say I like it - parsing is so much more sane - but it is sadly common to most tools.
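To illustrate what highlighting by regular expression amounts to, here is a small sketch in Java. It is not VSCode's grammar format, just the same idea: a set of patterns is run over a line and each match is given a category. The class name, the category names and the tiny keyword list are all invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of regex-driven highlighting; no parsing is involved.
class RegexHighlighterSketch {
    // One alternation with a named group per category; at any given position
    // the first alternative that matches wins.
    private static final Pattern TOKENS = Pattern.compile(
            "(?<comment>//.*)"
          + "|(?<string>\"[^\"]*\")"
          + "|(?<keyword>\\b(?:if|else|while|return)\\b)"
          + "|(?<number>\\b\\d+\\b)");

    /** Returns "category:text" for every match found on the line, in order. */
    static List<String> highlight(String line) {
        List<String> spans = new ArrayList<>();
        Matcher m = TOKENS.matcher(line);
        while (m.find()) {
            String kind = m.group("comment") != null ? "comment"
                        : m.group("string") != null ? "string"
                        : m.group("keyword") != null ? "keyword"
                        : "number";
            spans.add(kind + ":" + m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(highlight("if (x) return 42; // done"));
        // The matcher has no notion of context: here "return" is a member
        // name, yet it is still reported as a keyword.
        System.out.println(highlight("value = x.return;"));
    }
}
```

The second call in main shows the weakness: without a parser there is no way to know that a word is being used as something other than a keyword.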

A Simple Example

For this post I am only going to offer a little code and, breaking with tradition, it is not even my code. Microsoft offers an example of how to connect to another node.js based language server but the late Adam Voss ported this to use a Java server.

Reversing the order of presentation from the sections above, I am going to start with the client side (i.e. the code embedded in VSCode). Obviously, this needs to be node.js compatible and, since we are in Microsoft-land, that means TypeScript (although I believe it is possible to use JavaScript if you insist).

The client code

Each client needs to have a manifest associated with it, in package.json. Most of this is, of course, vanilla node.js/npm configuration: setting up dependencies and the like.

The key elements seem to be engines, activationEvents and configuration. These are described in some detail in the developer guide and I don't think I have very much to add to that at this point. Obviously, engines describes the versions of VSCode with which the plugin is compatible; activationEvents lists the events that cause VSCode to load and activate the plugin (for example, opening a file in a particular language); and configuration (which sits under contributes) allows the plugin to introduce settings which the user can then configure.

What the manifest does not describe is how the client itself is set up; that is done in code. It would seem that the module (the one named by the main entry in package.json) needs to export a function called activate which receives an ExtensionContext and is responsible for creating a new LanguageClient object (defined by a Microsoft library) and, after configuring it as appropriate, calling start on it. The cunning, of course, is all in the options parameters that are passed to it.

Moving on to the Java example, we can look at the equivalent extension.ts file in this repository.

Starting towards the bottom (line 76), the language client is created and started (all on one line). The client options look fairly vanilla, but in lieu of the server options, there is a function name. For full disclosure, I haven't so much as cloned this repository yet, so for all I know it doesn't even work, but at the same time I know that JavaScript - and TypeScript, presumably - will accept a function as a parameter and then call it when it needs the value. I'm assuming that is what is going to happen here. It is worth noting, on the other hand, that this repository is a couple of years out of date, so it is also possible that it is using a no-longer-supported feature.

Anyway, assuming that it is right, it is passing in the function which takes up most of the module (lines 17-60). Again, confusingly, it returns a Promise of a StreamInfo, not the LanguageServerOptions I was expecting. But no matter.

First, it creates a server socket which, when a connection arrives, resolves the promise by providing the reader and the writer. It also listens for the socket to be closed and reports that to the console (it doesn't actually close anything, which seems surprising, but it is possible that somebody else catches that). It then starts listening, with a handler which starts a java process (the server), telling it the port number which has been opened.

I have to admit that there are a number of things going on here which don't seem exactly right to me - but that is probably because I don't understand enough about how the node.js net.Server abstraction works.

The server side

The server is a fairly simple and brain-dead Java application. On the flip side, it doesn't do very much.

The main code is in App.java. Basically, this reads the port from the arguments, opens a client socket to it, and then does the work to set up an LSP server using the socket's streams and an ExampleLanguageServer.

This implements the minimal number of methods to implement a TextDocumentService, although for reasons I don't understand, the actual implementation is split between a class FullTextDocumentService and an inner class inside the language server class.

The server has methods to initialize, connect, shut down and exit the server, as well as to return the implementation of the FullTextDocumentService. It also provides an implementation of the WorkspaceService, which appears to be responsible for handling user configuration changes.

Conclusion

I've learned a lot about VSCode and the Language Server Protocol that I didn't previously know and having saved away the links, I am hoping this will be of use to me when I return to actually try and implement something.

The next step is obviously to clone these various repositories, bring everything up to date, get it to work as is and understand it a little better. After that, I will need to try and understand the breadth of the protocol before trying to connect an actual compiler.

Expect to hear more.

Ignorance may be Strength

Friday, September 18, 2020

Integrating with VisualStudio using Language Server Protocol