Friday, October 30, 2020

PWA Notifications

Notifications are both more interesting and more complex than adding to the home screen.

"Push notifications" are, in fact, more a browser technology than they are a web app technology. This makes them even more browser specific than adding to the home screen. I generally develop with Chrome, and for my personal projects that's all I consider, so that's as far as I'm going to go in this article: Chrome uses a cloud module called "Firebase Cloud Messaging". However, it's my understanding that on the desktop, Firefox support for push notifications uses a similar cloud service called "autopush". It is my understanding that Edge and Safari on the desktop do not support push notifications at all. Mobile browsers are a different story again.

On the plus side, I believe that wherever they are available, these technologies are all compatible at the API level - at least as far as I describe it here. Your mileage may vary …

The basic paradigm

The basic paradigm for push notifications is that there is "special, magic infrastructure" that can deliver messages from a server to a client through the cloud and the browser. For Chrome, this is called "Firebase Cloud Messaging". As far as I can tell, it is only part of "Firebase" in a branding sense: you don't have to create an account and a project to use the cloud messaging service.

I have attempted to draw out what I think the basic architecture is here:



The browser loads the application from the server and enters into the "usual" two-way interaction with it, possibly including Ajax or WebSocket interactions. During this interaction, a "magic" key (called a VAPID key) is generated on the server and passed to the client. The client uses this key to register its interest in push notifications with the browser. The browser, in turn, magically and under the covers, notifies the cloud messaging service of this registration. The client receives a unique and somewhat persistent handle which it can pass to the server as it sees fit - the server needs this in order to send messages to the client.

When the web server wants to send a push notification to the client, it contacts the cloud messaging service, passing it the appropriate request signed with the private portion of the VAPID key. The messaging server then looks up the corresponding registration(s) and delivers the messages to the corresponding browser(s). These, in turn, deliver the message to the service worker thread(s) of the appropriate web application(s), waking them up if necessary and displaying some notification to the user.

Security and Permissions

Everything to do with "the modern web" seems wrapped up in security and permissions. Sadly, with the number of bad actors out there, this is just a fact of life. Three separate processes go on in this context: firstly, as indicated above, a public/private keypair is generated to ensure that the server sending the message is the one whose messages the client asked to receive; secondly, before any messages can be sent, the client must obtain permission from the user of the browser for push notifications to happen; and thirdly, when messages are transmitted with content (or "payload"), the content must be encrypted end to end, which is handled by a shared key generated along with the subscription.

The guidelines suggest that users should be encouraged to opt in to push notifications by taking a concrete action to enable them. Based on the number of websites I visit where the first thing you see is a message saying that the site would like to send push notifications, this does not seem to be widely adhered to. We will, of course, adhere to it.

Sending Messages

Sending messages theoretically requires a server, but because we are trying to do this using just a static website, we are going to take advantage of a command line tool to send notifications. There are tools of this kind for most languages, it would seem, but I have chosen to install a Node.js version.

This can be installed by running
npm install -g web-push
This installs the command globally which, if you have npm correctly configured, means you should now be able to run the web-push command from your command line.
$ web-push
Usage: …
But before we can send any messages, we need to create a keypair. This is the basic security mechanism to ensure that all messages are sent by an approved party. In principle, the server generates the keypair and retains the private key, shipping the public key to the application somehow.

The private key in this context is a signing key: that is, it is used to provide a signature for the message that it sends and the public key can then be used to check that the signature is valid. Invalid messages are rejected.

This is done using the web-push command with the generate-vapid-keys subcommand:
$ web-push generate-vapid-keys
Public Key:
BNX8bG8mNTIJmXai9k35J5CKB2Wyc8kZoJS9Y31qkfUSfiQr7q22vDe5CHCxUclvpl1gEVAewVoINOvFlFFl4
Private Key:
7XXms7NXIM-FuCrxVzoQqlLYz3kuYpdftzL5Dz_LI
We can now send a message using the send-notification subcommand:
web-push send-notification --vapid-pubkey="BNX…" --vapid-pvtkey="7XXX...LI" --vapid-subject="mailto:ignorant@blogspot.com" --payload='hello, world'
Usage:
    web-push send-notification --endpoint=<url> [--key=<browser key>] [--auth=<auth secret>] [--payload=<message>] [--ttl=<seconds>] [--encoding=<encoding type>] [--vapid-subject=<vapid subject>] [--vapid-pubkey=<public key url base64>] [--vapid-pvtkey=<private key url base64>] [--gcm-api-key=<api key>]
The problem is that we haven't specified an --endpoint - where to send the message. In order to send a payload, --key and --auth are also required. Fortunately, we can obtain all three at once by the simple expedient of subscribing in our web app.

Subscribing to Push Messages

In order to get the endpoint - and the end-to-end encryption keys in order to send a payload - we need to subscribe on the client. As I said in the introduction, the "polite" way of doing this is to add a button to your web page to enable the user to "ask" for notifications.

Since this depends on having a registered service worker, by default this button should be invisible and only be displayed when the service worker has been registered. In the registration callback, it can then be displayed until such time as it is clicked (or otherwise dismissed). Of course, as with everything else, this doesn't have to be a button per se but can be any kind of user affordance which indicates a deliberate intent to subscribe.
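
Concretely, the wiring in start.js might look something like this (a sketch; the button id and function name are my own inventions):
navigator.serviceWorker.register('/service_worker.js').then(function(registration) {
    var btn = document.getElementById('subscribe-btn'); // starts out with display: none
    btn.style.display = 'inline';
    btn.addEventListener('click', function() {
        btn.style.display = 'none';
        subscribeForPush(registration); // runs the subscription code shown below
    });
});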

The button then needs an event listener which actually does the subscription like so:
var options = {
    userVisibleOnly: true,
    applicationServerKey: applicationServerKey
};
registration.pushManager.subscribe(options)
.then(function(sub) {
    console.log("subscribed to", sub.endpoint);
    var simple = JSON.parse(JSON.stringify(sub));
    console.log("auth", simple.keys.auth);
    console.log("key", simple.keys.p256dh);
});
When you click the button, it turns around and asks the registration object obtained from registering the service worker to subscribe using a set of options. The applicationServerKey is the public VAPID key used by the server. This is defined at the top of start.js; if you want to run this example, you will need to replace that value with the one you generated above. The userVisibleOnly flag promises that whenever a message arrives, we will alert the user to the fact. Our current code in fact does not do this; instead, the browser will (sometimes?) display an automatic notification on our behalf to say that messages have been received.
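
One wrinkle worth noting: some browsers insist that applicationServerKey be passed as a Uint8Array rather than as a base64url string, so start.js may need the commonly used conversion helper (a sketch; the function name is conventional rather than anything standard):
// Convert a URL-safe base64 string (the VAPID public key) into a Uint8Array,
// which pushManager.subscribe reliably accepts across browsers
function urlBase64ToUint8Array(base64String) {
    var padding = '='.repeat((4 - base64String.length % 4) % 4);
    var base64 = (base64String + padding).replace(/-/g, '+').replace(/_/g, '/');
    var raw = window.atob(base64);
    var output = new Uint8Array(raw.length);
    for (var i = 0; i < raw.length; i++) {
        output[i] = raw.charCodeAt(i);
    }
    return output;
}
The subscription call above would then pass urlBase64ToUint8Array(applicationServerKey) instead of the raw string.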

The result of calling subscribe is a subscription object, returned through a promise, which contains a new endpoint describing this application in this browser on this machine. Note that this subscription is somewhat persistent: if you run this code multiple times, you will get the same value over and over. Obviously on a different browser or a different device, you will get a different code.

The endpoint also automatically contains everything you need to know to send messages - as a URI, it has within it the server that is capable of sending messages to this browser.

The auth and key values are the values we need to use to encrypt the payload for end-to-end transmission.

Receiving Messages

Turning to the service worker, we need to handle messages when they arrive. This is done by listening for the push event.
self.addEventListener('push', function(ev) {
    console.log("received push event", ev.data.text());
});
This is where most of your code will need to be placed, but for now this is enough to show something working end to end. Add the extra parameters to web-push send-notification and you should see messages come out in the console.
web-push send-notification --vapid-pubkey="BNX…" --vapid-pvtkey="7XXX...LI" --vapid-subject="mailto:ignorant@blogspot.com" --payload='hello, world' --endpoint="..." --auth="..." --key="..."

Displaying Notifications

As noted above, we are expected to display a user notification when these messages arrive. More than that, it is obviously useful to attract the user's attention, especially since the notification can be displayed when the application (and even the browser, they say) is not running.

It is easy to create a notification in the callback:
self.addEventListener('push', function(ev) {
    console.log("received push event", ev.data.text());
    // waitUntil keeps the service worker alive until the notification
    // has actually been displayed
    ev.waitUntil(self.registration.showNotification('New Message', {
        body: ev.data.text()
    }));
});
Now, what if we want to have the user able to do something when this happens?

Handling Notifications

There is a notificationclick event that the service worker can handle. In this case, it is possible to take actions based on the notification arriving. This handler simply closes the notification and shows a message:
self.addEventListener("notificationclick",  function(ev)  {
      const  notify  =  ev.notification;
      notify.close();
      var  longOp  =  new  Promise(function(resolve,  reject)  {
              console.log('notification  was  clicked');
              resolve();
      });
      ev.waitUntil(longOp);
});
although, for full disclosure, I deliberately showed the message inside a promise to show how that can be wired up to the notification handling mechanism.

It's possible to do much more than this and, in particular, it's possible to make sure that our whole app is woken up. There are examples of how to do this on the Google Developers' blog.
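
The skeleton of "waking the whole app up" looks something like this (a sketch using the service worker clients API; the URL to open is an assumption):
self.addEventListener('notificationclick', function(ev) {
    ev.notification.close();
    // Focus an existing window running this app if there is one, else open a new one
    ev.waitUntil(clients.matchAll({ type: 'window' }).then(function(wins) {
        if (wins.length > 0)
            return wins[0].focus();
        return clients.openWindow('/');
    }));
});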

A Pattern for Using Notifications

For me at least, there is something of a mental model dissonance in using this push technology. I grew up on socket-based client-server architectures and then moved to event-driven computing with an event bus at TIBCO. From this perspective, the web has always seemed backward; the closest web technology is the WebSocket.

I think the right way to think about push notifications is to only use them when the server already feels ignored and has no other way of communicating with the app. When the user is actively working on the client, the server should just interact with the app directly and the user should see that.

The problem, of course, is knowing when the user is interacting with the client.

There may be a better way of knowing, but I think the simplest thing to do is just to use WebSockets for communication "when the app is active", and then either allow the connection to time out (this generally seems to happen after about ten minutes) or deliberately close it after a few minutes of inactivity. When the user next interacts with the app, restart the WebSocket connection; if the server wants to bring something new to the user's attention while the connection is down, it sends a push notification, which is displayed and can get the user interacting with the client again (or not, if they so choose).
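
A sketch of what that might look like on the client (the endpoint, timings and wiring are all my own assumptions):
var socket = null;
var idleTimer = null;

function ensureSocket() {
    if (!socket || socket.readyState !== WebSocket.OPEN) {
        socket = new WebSocket('wss://example.com/updates'); // hypothetical endpoint
        socket.onmessage = function(ev) {
            // handle server-initiated updates directly while we are "active"
            console.log('update', ev.data);
        };
    }
    // reset the inactivity clock on every interaction
    clearTimeout(idleTimer);
    idleTimer = setTimeout(function() {
        if (socket) socket.close(); // go quiet; the server now falls back to push
        socket = null;
    }, 5 * 60 * 1000); // five minutes of inactivity
}

// restart the connection whenever the user touches the app
document.addEventListener('click', ensureSocket);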

A Note on Terminology

I have found in this area that a lot of the terminology seems to be used loosely and inaccurately, but I'm not really sure what "accurate" would look like. Consequently, I've followed the herd and been loose and inaccurate. But here are my thoughts on how the terms seem to be used.

"Push" is a very vague, general term that seems to mean something along the lines of "initiated by the server". The idea seems to be akin to "unsolicited". For me, of course, the key concept is the idea that events happen "in the real world" and you want to be able to react to them. If the server sees the event, it is only reasonable that it lets the client know - and the client lets the user know.

"Messages" is a word I use a lot that means what it says. A message has been passed from somewhere to somewhere else. It's encoded in some way (separately from encryption) that has been agreed by both parties making it a sensible communication. Many people seem to use the phrase "message" or "push message" in the current context to mean the process of sending a message from the web server to the app via the cloud (the very name "Firebase Cloud Messaging" is such a usage).

"Notifications", I think, technically refer to just the final step of the journey: showing something to the user. This is common parlance in the Android world. "Push Notifications" seems to blur the meaning somewhat. Yes, it should end in a notification - as we saw, the API wants you to commit to user visibility - but it encompasses the full lifecycle of the message's travel.

"Subscription" describes the way in which the app connects to the cloud messaging service for a particular web app. This word is used in many different ways in other fields (especially within the publish/subscribe paradigm) but here it has a very specific meaning of a single web app on a single device in a particular browser.

Push from an Actual Web Server

We have used the command line to generate our messages which is obviously not realistic. From within a real web server it is possible to do exactly the same thing - the command line tool that we used simply spun up a node.js instance and used the server-side library.

As far as I can see, the github "user" web-push-libs hosts libraries for Node, Java, PHP, Python, C and C#. If you need something else, it is possible to work more directly with the REST API and talk directly to the endpoint.
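
For example, using the same web-push package as a library rather than a command (a sketch; the key values and the subscription object are the ones obtained earlier):
var webPush = require('web-push');

// the keys generated with generate-vapid-keys earlier
var vapidPublicKey = 'BNX...';
var vapidPrivateKey = '7XX...';
webPush.setVapidDetails('mailto:ignorant@blogspot.com', vapidPublicKey, vapidPrivateKey);

// 'subscription' is the object returned by pushManager.subscribe on the client,
// shipped to the server by Ajax or WebSocket
webPush.sendNotification(subscription, 'hello, world')
    .catch(function(err) { console.log('push failed', err.statusCode); });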

Likewise, we have copied and pasted various items from the console to glue all of this together. A real application would need to use a technology such as AJAX or WebSockets to connect everything together.

All of that is left as an exercise to the reader.

Firebase

Given that this uses "Firebase Cloud Messaging" on Chrome, it may seem like a good idea to use Firebase. This may in fact be a good idea. But it seems to me that it adds a lot of complexity and moving parts - and I am unclear on the benefits.

Conclusion

Notifications are definitely harder than most of the other web technologies I've used. There are more moving parts than usual, and they are connected together in different ways. But it is certainly possible to get something working in an hour or two if you know what you're doing.

I think I do now, and hopefully you do too!

Thursday, October 29, 2020

Adding to the Home Screen in PWA

Moving on from "just being a website", there are two things that most Progressive Web Apps want to do: be added to the home screen, and to deliver notifications.

These are relatively easy to achieve, but possibly more arcane than I would like, not to mention being inconsistent: the treatment on different platforms is platform- and browser-specific. It would seem to me that Chrome on Android is the "gold standard" of what's supported, and everything else is either "inadequate", "on the way there" or "unsupported" depending on your perspective.

Tidying up from before

While everything seemed to work before, Chrome remained a bit "picky" about what we'd done. There are a couple of warnings that tell you that something is up, but not really what.

Scope

Message: Site cannot be installed. No matching service worker detected.

The service worker runs in the background and has the ability to 'intercept' requests (see the next section). It can be limited in the set of requests that it intercepts by specifying a "scope". Either way, the scope is constrained by the directory from which the service worker file is loaded. This is annoying, since it stops you arranging your code properly, but only the service worker file itself needs to be at the top level.

I moved service_worker.js up to the top level.
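
For what it's worth, register does accept an explicit scope option, although it cannot widen the scope beyond the worker's own directory (a sketch):
navigator.serviceWorker.register('/service_worker.js', { scope: '/' });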

Intercepting Requests

One of the key functions of a PWA is its ability to continue to function even when the device is not connected to the internet (this is also one of the main reasons that support is so much better on mobile devices than desktop devices).

In order to make this work, it is necessary to be able to provide all of the resources from local storage rather than from the internet, which means that you need to know where to find them locally.

The browser delegates this task to the service worker through the "fetch" mechanism.

fetch is an event which the service worker must register for. Registering for events in the service worker is just like doing so in a regular javascript application except that the "magic" variable is not document but self. self is a global variable in the context of the service worker which resolves to the current instance of type ServiceWorkerGlobalScope.

Thus we have something like this in service_worker.js:
self.addEventListener('fetch', function(ev) {
  ...
});
Here ev is a FetchEvent. The key thing that it supports is a respondWith method which enables the service worker to return a cached copy of a file.

It seems to me that most of this method is "boilerplate" code in that it seems to be there to connect requests from the browser to a builtin "caching" mechanism. Of course, it also gives you the opportunity to decide that some URIs cannot be cached, or to pull them from another data source - such as a database - but it seems overly reliant on user code rather than allowing more of a "filter" approach.

The "cache" is not in fact a single cache but a set of caches. In order to store the results of a fetch it is necessary to open an individual cache, but the matching process - by which we test if we have a cached file - operates across all the caches.

The implementation of the fetch logic can be found in the git repo with the tag PWA_BASIC_FETCH.

Background Color

For some reason, before you can add to the home screen, you need to set a background_color. This is mentioned in the Mozilla Documentation.

Adding to Home Screen

This feature comes for free on Android the moment you load a PWA that Google Chrome recognizes. On the desktop however, more work is still required.

There is an event on the window called beforeinstallprompt which is triggered by the browser when all the conditions for being installed locally are met. In short these are:
  • the manifest can be found and is configured correctly;
  • the service worker is installed and has a correctly configured fetch event;
  • the app is being served securely (either by HTTPS or from localhost);
  • it hasn't already been installed.
We can now add the appropriate handler in our start.js script (because this is on the window, we need to do this in the client portion of the browser, not the service worker).

This does not immediately add the app to the home screen; rather, the app needs to provide an affordance - usually a button - to enable the user to do so. It is a requirement of the specification that the user must actively choose to enable this feature.

However, the action to actually add the app to the home screen is triggered by calling a method on the event passed to your beforeinstallprompt handler. Thus we need to squirrel away a copy of this event when we are given it for later usage.

Because this is a two-step process (first receive the beforeinstallprompt event, then click on a button), we need to make our button initially invisible and then make it visible when the event arrives. We do this by attaching a CSS class to the button with an initial setting of display: none and then specifically overriding the style of the element when the beforeinstallprompt event arrives. (I realize that there are many other ways of doing this; I'm just saying what I chose).
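
A sketch of stashing the event and revealing the button (the button id is my own invention):
var deferredPrompt; // squirrelled-away copy of the event

window.addEventListener('beforeinstallprompt', function(ev) {
    ev.preventDefault(); // stop any automatic prompt
    deferredPrompt = ev; // keep it for when the button is clicked
    document.getElementById('a2hs-btn').style.display = 'inline-block';
});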

This is sufficiently tricky that I've checked it in before moving on - the tag is PWA_BASIC_BEFORE_INSTALL_PROMPT.

If you refresh, the button should appear. Note that the system can be a little picky about this, so you may need to hard refresh to get it to happen reliably.

Finally, we need to wire up the event handler. This does three things:
  • makes the button vanish again by setting its display value back to 'none';
  • prompts the user to check that they want to add to the home screen;
  • if they agree, does any subsidiary processing.
Note that we don't actually do any subsidiary processing, but the code is there for completeness and reference.
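
A sketch of that handler (again, the names are mine):
document.getElementById('a2hs-btn').addEventListener('click', function() {
    document.getElementById('a2hs-btn').style.display = 'none'; // vanish again
    deferredPrompt.prompt(); // ask the user to confirm
    deferredPrompt.userChoice.then(function(choice) {
        if (choice.outcome === 'accepted') {
            // any subsidiary processing would go here
        }
        deferredPrompt = null; // the event can only be used once
    });
});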

In Chrome on a Mac (my environment), once this is complete the browser window "pops out" and becomes its own application. It is also added to the "Chrome Apps" folder which is then opened in Finder.

This is all checked in with the tag PWA_BASIC_A2HS.

Removing the App from the Home Screen

Again, I can't speak for all platforms, but on the Mac it is possible to uninstall the app by selecting the three dots in the right hand corner of the window and choosing "Uninstall …".

This is obviously useful - and important - for developing the install flow without reinstalling Chrome.

Monday, October 26, 2020

Getting Started with Progressive Web Apps

If you're reading this, I'm going to assume that you know what a web app is. You've probably written one or more. If you don't, think amazon.com.

If you're alive in the 21st century, you have to know what a mobile phone app is. You may well have written - or considered writing - one of those too.

If you have, in fact, written both web apps and mobile phone apps, you may have asked, "can I just do this once?" - particularly if you have written mobile phone apps for multiple platforms.

The answer is yes.

The technology is continually changing, but all of it essentially depends on you writing a completely client-side app - that is, you need to write your application almost entirely in JavaScript, and just communicate with the server to load data using APIs - "AJAX" as they often still confusingly call it.

The latest iteration of the technology is to say "just write a client-side web app that works when connected to your server, and then progressively add features to enrich it".

Whence progressive web app.

What's the minimum I can do?

Technically, the minimum you can do is to create a web app. In keeping with tradition, we could serve up hello world from helloworld.html. It's not my job to write HTTP servers for you, so I'm going to assume you either have one you like or can easily obtain node.js or python's SimpleHTTPServer. If you can't, you probably want to stop reading about now.

However, for my money, this doesn't actually qualify as a progressive web app because the amount of progress it has made is about zero. So the minimum I would consider is a web app with a manifest.

Adding a manifest

Any number of programming languages have had "manifests" for as long as I can remember. The idea comes from a "shipping manifest" (you know, one of those labels on the boxes FedEx drop off at your house): basically, it tells you what's in the box and where to go look for it.

I have to say, they always seem like a bit of a hack to me. If these things need to be known, why are they not required somewhere in the code? Anyway, progressive web apps require a manifest, the manifest must be a JSON document your server can serve, and your HTML page needs to tell the browser that it's there. In doing so, it notifies the browser that your web app is not just a web app, but a progressive web app and that it should trawl through it until it finds the things it needs.

You might think (well, anyway, I might think) that if this thing is JSON, the minimum it could be is an empty object, so:
{
}
In order to include this in your application, you need to add a link tag in the <head> section of your index.html.
<link rel='manifest' href='/json/manifest.json'>
Now we can load the website using our trusty friend, python's SimpleHTTPServer (or otherwise):
cd pwa/html
python -m SimpleHTTPServer
And then the website should be visible on http://localhost:8000/.

So far, so good. You can check this out from github as PWA_MINIMAL_ERRORS.

Uh, it doesn't look any different

Well, no.

That is what the "progressive" bit is all about. It starts off as a completely generic website and then, when you have reached a certain threshold of adding stuff, you can start doing fancy things such as working offline, adding it to the home screen and sending notifications.

We can check up on our work, though. Assuming you are using Google Chrome, open up the developer tools and go to Application. The top thing there, which you have probably normally ignored on your way to Cookies, is a tab called Manifest. Click on that. You'll see a number of things there.



First off, there is a link to the source: always useful to check that it has downloaded the most recent version of your manifest (which can be a problem as discussed later). After that comes a list of errors that Chrome helpfully describes as "instabilities". We're going to fix those before we do anything else.

Naming It

The second message here complains that there is no "name" or "short name". Drawing on Mozilla's Documentation, we can add both.

Note that for reasons best known to themselves, manifest JSON files separate words with underscores.
{
    "name": "First Ignorant PWA",
    "short_name": "PWA1"
}
We can reload and the error now goes away. Scrolling down, we can now see that these fields have been added to the Identity section of the manifest information.


Where to Start?

Going back to the first error, most websites "know" that the home page of the website is called something like index.html. This is just a convention, of course, and you can configure a web server to point to any page on your website. Much the same convention applies with progressive web apps, but Chrome considers it an error to take advantage of the convention and instead wants it made explicit, hence the warning. By adding a start_url field to the manifest, we can identify that index.html is where we want the application to start when it is started locally rather than downloaded.

What Can I Show You?

The display field says how the application wants to present itself. If you are thinking of a PWA as a webpage that just has some fancy features, you will want to go with the simplest level, standalone. But PWAs also offer the opportunity to try to become full-fledged apps on your phone or tablet. In that case, there are additional levels of control, each providing less browser UI and depending on you to do more of the work, until you reach the fullscreen level of control - where your app takes up the whole screen.

According to the Mozilla docs, there is also a browser option that it claims is the default. Chrome says this is an error. I am going to go with the reality that Chrome provides rather than "the way it should be". Your mileage may vary.

We, of course, are going to go for standalone, since we're not doing anything fancy at all.
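
Putting the pieces together, the manifest so far looks something like this (the exact start_url value is an assumption):
{
    "name": "First Ignorant PWA",
    "short_name": "PWA1",
    "start_url": "/index.html",
    "display": "standalone"
}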

Icons

Icons are the bane of my life. Mainly because I'm not at all artistic. But also because there are all sorts of arcane rules about what is and is not allowed - sizes, formats, etc - all of which vary between platforms.

For this project, I used an online icon generator to generate a package of icons. As it happens, it also generates a minimal manifest which would not be an unreasonable place to start. But instead, I just used the icons and copied and edited the relevant sections of its manifest into mine.

That completes a minimal manifest. This is tagged PWA_MINIMAL_MANIFEST.

May I Be of Service?

The final thing that Chrome complains about is the absence of a service worker. What is one of these? Well, it is the thing that is key to making a web app progressive, and it's really what we've been building up to.

Web applications, like other UI applications, have a "main thread" or "UI thread" that is responsible for dealing with user interactions and handling display. If you have done any amount of JavaScript (or other UI development), you will know that it is important to keep things simple and short on the main flow in order to keep responses snappy. If you have done quite a bit of JavaScript, you will know that, because there is only one thread, this can be difficult.

Workers solve this problem by offering other execution environments in which it is possible to do work that does not interfere with the main flow. And it really is separate and does not interfere: nothing is shared between these two environments except for a message passing mechanism. In a sense, they are like iframes without any visual component.

In order to handle all the things that Progressive Web Apps need to handle without interfering with the main application rendering cycle, it is necessary for them to have at least one service worker.

Oddly, the service worker is not declared in the manifest but rather must be created from main thread JavaScript code. (To be honest, there is nothing odd about this at all. It used to be declared in the manifest, but the declaration became inadequate to cover all the registration cases so has been deprecated.)

So at this point we need to create two JavaScript files, which I'm calling start.js and service-worker.js. Note, however, that only start.js finds its way into a script tag in index.html. This is because service-worker.js is not loaded into the main body of the page but into a background "page".

Starting with the service worker (because it's simpler for now), all we want to do to begin with is identify that we have in fact been loaded.
console.log("hello  from  the  service  worker");
All the hard work (for now) is getting that to load.

Loading the service worker

First off, not all browsers support service workers (really? You know this is 2020?). Well, possibly they all do now, but you can't be sure, so first test that the functionality we are going to use - the serviceWorker property of navigator - has been defined. If it has, then add a callback when the document is loaded to try and load the service worker. Note that while it's not strictly necessary to wait until the whole page is loaded, it makes sense because you will probably want to have things happen and you don't want the page to not have loaded the elements that you need. You may also want to add other setup and configuration to this callback before you register, just to make sure that everything happens in the right order.
if ('serviceWorker' in navigator) {
    window.addEventListener('load', () => {
        ...
    });
}
The final thing we need to do in there is call the register method on the serviceWorker property of the navigator. This needs the path to the JavaScript file to use (service-worker.js) and returns a Promise that will contain the registration if successful - or an error if not.
navigator.serviceWorker.register('/js/service-worker.js')
    .then((registration) => {
        console.log("have registered service worker", registration);
        registration.update().catch(ex => console.log(ex.message));
    }).catch(ex => {
        console.log("failed to register service worker", ex.message);
    });
While not exactly idempotent, the register method can be called regardless of whether there is already a service worker installed. This may seem bizarre, but remember that PWAs can be stored locally and thus can be rerun in different scenarios. Anyway, go ahead and call it and it will do the right thing.

But what it doesn't generally do is to check whether the JavaScript is up to date. It will generally look at the local cache and accept whatever version is there. This is absolutely fine if you are not connected to the internet (offline working is the first benefit of using a PWA) and is OK if you are a casual user of an application. But it is not at all good if you are actively developing. To bypass the cache, the returned registration has an update method that says "if you can, go and check if there is a more up-to-date version out there".

If all goes well, you should now see messages come out in your console.



This is all available as PWA_MINIMAL_WORKER.

Conclusion

That is pretty much as minimal as you can make a Progressive Web App. Obviously, I have no intention of stopping there, so read some of the other posts in this thread.

Wednesday, October 7, 2020

Understanding TextMate Grammars

When I was working through the example VSCode syntax highlighter, I observed that the grammar was specified using "TextMate grammars" and noted that I had never heard of these. Research was called for.

My particular challenge was to figure out how to match the syntax of a language which leans heavily on indentation. It seems hard to find the end of a block using the definition that "it doesn't have more tabs than this one".

Excuse my ignorance (as it happens, this is a rule of reading this blog - I am exposing my ignorance precisely to help others) but it turns out somebody has already come up with a solution to this problem. It is apparently called negative lookahead. As explained in the link, the idea of "lookahead" regular expressions is that they can check whether or not a pattern occurs but do not consume the matched characters. Thus, providing a negative lookahead pattern for the end pattern of a rule which matches one level more of nesting will cause the block to end on the previous line. I think.

I gained this insight from a separate web discussion which simply went ahead and used these expressions to achieve this objective without really explaining what they did.

Breaking Down the Example

So what does it do? Well, first off, I ran it to check that it did actually work in the way I was expecting. However, it doesn't really do what I want.

This is the specification of the rule in some cson format that I don't entirely understand but appears to be some form of simplified JSON:
'indent-block': {
  name: 'indent-block.test-grammar'
  begin: '^(\\s+)(?=\\S)'
  end: '^(?!\\1\\s+)(?!\\s*$)'
  patterns: [
    {include: '$$self'}
  ]
}
So this says that a block starts on a line that has some leading whitespace (followed by something other than whitespace) and ends at the first non-blank line which does not begin with the opening whitespace plus at least one more whitespace character. This is good for matching blocks such as
  while true
     …
  done
because the begin rule will match while true and the end rule will match done.

Unpicking the patterns, the begin pattern wants to match text which:
  • starts at the beginning of the line;
  • has at least one whitespace (\\s) character;
  • is followed by at least one non-whitespace character which is not included in the pattern.
This allows the block to contain arbitrary matches for things like while as well as inner blocks.

The end pattern wants to match text which:
  • starts at the beginning of the line;
  • does not begin with more whitespace than the begin line (\\1 represents the pattern which matched the inside of the parenthesis in the begin pattern);
  • contains at least one non-whitespace character before the end of the line (that is, it is not blank).
None of this (except the beginning of the line) is actually part of the pattern, leaving the remaining text to belong to an outer block (although I'm not sure why this is desired).

I did not come to a positive conclusion, but it seemed to me several times during my debugging sessions that if the beginning of line anchor is matched, it cannot be used again. That is, it seems that VSCode treats the beginning of line anchor as it would any other character and once it is matched it cannot be matched by a subsequent pattern.

Making it Work for Me

But this is not the pattern I want to match.

First off, I want a block to include all the lines after this one that are indented more than this one. And I want the block to end the moment it sees a line that is indented the same amount as this one. But I want it to end before that line starts.

To add to the complexity, I'm happy for blank lines - or lines with non-whitespace text in column 1 - to appear within a block, so any line which is blank or doesn't start with whitespace is not an end marker.

An additional challenge for me personally is a mental model shift: I'm used to thinking outside-in: breaking up the document into large chunks, then breaking those up into smaller chunks and so on. But this mechanism for matching regular expressions approaches the problem in a more inside-out way: the outer blocks "do not get a look in" until the innermost block has reached its end.

So a block is a sequence of lines which starts with a line with at least one leading tab character and continues until it finds a line which has:
  • at least one leading tab character;
  • no more leading tab characters than the begin line had;
  • is not blank (consisting entirely of whitespace characters).
Finally, it does not actually consume any of this, leaving it all for the next start line.

So the begin pattern is fairly easy to write:
^(\\t+)(\\S+)
This only matches lines with at least one leading tab and at least one non-whitespace character after the leading tabs. The second capture group is simply there to capture the "keyword" for highlighting to make it clear where the blocks begin.

The end pattern is considerably more complex:
(?=^\\1\\S)|(?=\\t(?!\\1\\S))
This is an OR of two positive lookahead conditions.

The left hand condition says that the block will end if the putative end line starts with exactly the same number of tab characters that were in the block opener, followed by a non-whitespace character. The right hand condition says it will also come to an end if the line begins with a tab and does not then contain the full block opening.

Put another way, the left hand condition will end the block when a line appears with the same indentation; the right hand condition will end the block when a line appears with less indentation (but at least one tab, thus ignoring blank lines and literate comments).
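
For reference, in the JSON form used by flas.json, such a rule might look like this (a sketch; the rule and scope names are my own):
"indent-block": {
    "name": "indent-block.flas",
    "begin": "^(\\t+)(\\S+)",
    "end": "(?=^\\1\\S)|(?=\\t(?!\\1\\S))",
    "patterns": [
        { "include": "$self" }
    ]
}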

I have put this rule in the grammar for .st files and you can experiment with that.

Once again, I want to point out how very hard debugging all this is: in particular, there are some things you can do with lookahead patterns (I'm not sure exactly what) that cause the syntax highlighter to just "give up" and when you try and open the inspector, it just sits there saying "loading". Your only option at this point is to quit the editor and try a different pattern.

It also really annoys me, both from a development and a documentation standpoint, that you cannot add comments to a JSON file.

But what I Really Want is …

Having put all that hard work in, I discovered that it wasn't what I really wanted.

The reason, of course, is my mental model is wrong. As I noted above, I tend to think "outside-in", so I feel that if I can break the code up into blocks, then I can tackle one block at a time. This is not the right way to think about the problem.

Instead, what I need to do is to match each type of block that I have, and then provide it with a set of included patterns. Since my grammar does, in fact, distinguish quite significantly between blocks, this is actually fine.

But all the effort was not wasted, because pretty much every pattern is going to be a variant of what we saw above. One simplification that arises is that for many of the blocks, I know exactly what the acceptable indentation levels are, so I can just use that directly rather than the \\1 matching. On the other hand, I've made it more complex by trying to catch explicitly some versions of invalid indentation.

Another Challenge - Nesting

Another challenge I am facing is: "how do rules nest"? That is, if I specify multiple sub-patterns for a rule, in what order are they selected?

Again, a lot of this just doesn't matter. We are not fully parsing the code, trying to determine its structure and meaning, and rejecting invalid programs; we are just trying to highlight things according to their category. So, except for going deep into blocks, it is generally enough just to say "ah, this is a type" or "ah, this is a variable".

A couple of things do seem to matter, though. One is that I want end-of-line comments (starting with //) to be selected in preference to anything else. Another is that I want to handle blank and literate comments as a priority. And so, by trial and error, I determined that the rules seem to be processed in the order in which they are presented in the patterns block for a given rule. And, as noted above, once a rule has been selected, no other rule will be selected until it has finished (except for those nested within it).

Creating a grammar

Taking all these points together, I looked at the sample .fl file and created a grammar which seemed to cover the subset of rules that I had relied on.

This is now installed in flas.json and can be seen in the repository.

Making it real

My original plan was to generate a grammar file from my formal grammar, but having experimented in this way, I've realized two things:
  • the impedance mismatch between a formal grammar and the regular expressions used by VSCode is vast and an automatic conversion between the two would not be easy;
  • having sunk the investment that I have into figuring it out, it actually isn't that hard to define rules in this way, and I might either do it by hand or invest in a quick conversion tool that automates all of the hard work of JSON formatting and repetition and checks my work.

Conclusion

It's not really news, but doing anything with regular expressions is a pain.

That is particularly true when you are bending them out of shape to do something of the job of an actual parser.

But provided you can bend your mind around it all, it is perfectly possible to come up with an acceptable set of regular expressions for an indentation-based language.

Thursday, September 24, 2020

Packaging the Extension


I am not, at the moment, interested in distributing an extension through the marketplace, so the final step is to package the extension up so that I can include it in my "regular" version of VSCode (and share it with anybody else who is interested).

Everything here is based on my reading of the Microsoft documentation.

VSCE

All of the tasks around publishing seem to depend on a tool called vsce. This is obtained from Microsoft via npm as follows:
npm i -g vsce
By installing this globally, it can be run from anywhere.

Packaging the extension

From the package root (in my case vscode-java), simply run
vsce package
This produces a vsix file.

Installing the extension

From within VSCode, select the Extensions tab on the sidebar (on the left hand side) and, from the drop down menu, select Install from VSIX…. A file chooser comes up; find the appropriate .vsix file and select it. This installs the extension in the current VSCode.

You can remove the extension by selecting the "settings" icon shown by the plugin (the gear icon) and choosing "Uninstall". It may be necessary to click on "Reload Required" to complete the uninstallation.

Handling the Java Executable

That, of course, would all be so easy if it weren't for the Java executable. This is going to "move" in the process of bundling the executable and so the code that locates it needs to be able to distinguish between the "development" and "release" cases.

This has two facets: first off, we need to copy it from its current build directory (in lsp-java) to the extension directory (vscode-java); then we need the extension.ts to look in both places and choose the development one if available, else the production version. All of this ends up being sufficiently complicated that I created a script, package.sh, to do all the packaging.
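
The heart of package.sh is just a couple of steps (a sketch; the exact paths are assumptions based on the layout described here):
#!/bin/sh
set -e
# copy the freshly built server jar into the extension directory
cp ../lsp-java/build/libs/lsp-java-all.jar .
# build the .vsix package
vsce package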

Conclusion

Packaging and installing an extension for VSCode was pleasantly easy. In fact, the whole process of dealing with extensions has been easier than I was expecting, and now I feel ready to tackle this in the real world.

LSP Navigation and Completion


The final things I want to do before moving on are to navigate to elements and to complete them. This demonstrates that we have at least a basic understanding of the program we are analyzing.

Building up Knowledge

The first thing to do is to build a repository of definitions encountered by the compiler.

In a real compiler, this is obviously the main purpose of parsing, but having ignored that up to now for the purposes of this exercise, I need to add building a repository to the SimpleParser. For simplicity, I am just going to capture the definitions of cards and contracts.

In order to be ready before users open files, we need to parse all the files in the workspace during initialization and store all the definitions in the repository. I want to make a few points here before moving on.
  • In the real world, this calls for quite a cunning data structure, in that it needs to be refreshed quite frequently (every time we receive a didChange event) and incrementally (removing all the definitions from the damaged areas), without leaving any dangling links (if we allow references), while (for the purposes of completion) being able to find all the tokens matching some typed text (possibly in some preference order such as usage); for this demo, I am just going to keep it small, simple and brute force.
  • The IgnorantLanguageServer class now seems to be growing too big and to have too many responsibilities: if this were a blog about clean code, I would have a post or two about how I broke this up and found the ideal pattern that describes it; but for now, I am just going to keep putting more miscellaneous material in here as long as it comes under the very general heading of "coordination".
As soon as I started to do this, I realized that the protocol I was using on the server side was different to the client side … of course my dependencies are out of date. I upgraded to version 0.9.0 of the lsp4j library and adjusted to cope with the fallout.

In line with the general air of insouciance which pervades this article - if not indeed this entire blog - I am not going to parse the files at all carefully, but just go after what I think should be there, being just careful enough not to cause any fatal exceptions. The important thing is to capture, for each interesting definition, its name and its location.

Because this is an incremental, repeated process, we need to clear out the repository every time before we parse a file. We do this by scanning through the repository looking at each entry in turn. Because we are parsing an entire file at a time, we only need to match the entry's URI although if we were interested in handling smaller changes we could also look at line numbers (and even possibly character positions) and clear out at that level. As noted above, our repository is just storing all the items in a TreeSet, so we just brute-force our way through the entries and remove the ones with a matching URI.

Go to Definition

In order to handle the "Go to Definition" functionality, we need to inform the client that we are able to do this.

For some reason, this requires two steps. First, we must add the capability in the initialize method of the IgnorantLanguageServer, and then we must implement a new method (declaration) in the TextDocumentService. I have to say I don't like this approach. Maybe it's just because I haven't caught up with default methods in Java interfaces, but it seems to me that a better pattern would be to say in initialize "here is an implementation of a (one function) interface that supports declaration".

Anyway, the intent of the declaration method is to find all the locations where there might be a declaration of a symbol. Not, of course, that you get the symbol per se - you get the location of the cursor when the user asks for the definition. One approach would be to have the parser record all instances of all symbols and then have a location to location table; while I think there is a lot to be said for this (for example, it would make Find Usages easy to implement) it seems to be more work than I want to do right now. So instead, I'm going to go to the relevant file and line and try and identify the token around that location.

This feels like a responsibility for the parser, so I'll put the code there. I don't like allowing abstraction bleed, but I'm going to pass across an LSP Position because I don't at the moment have my own position abstraction as such (I just stored the primitive line and character position in name) so if I were going to take this further I would need to refactor (my real compiler, of course, does have a very clear, and quite rich, position abstraction). Because this method does not come with any file text, I also had to store the text of all files during parsing so that I can read it back now (it would of course be possible to read them from the disk).

Having found the token (an exercise in string manipulation), we then ask the repository to hand back a list of locations where it is defined. While (logically) that might be only one, in a "live" compiler where users can type anything, it is not unreasonable to suppose that there might be multiple definitions; or indeed, that ambiguity might exist as to which of several matching definitions might be intended. The repository scans through all its definitions and sees which match. Again, for convenience, I have let the LSP abstractions bleed through into the repository; obviously the code in the ParsingTextDocumentService should adapt these.

And hey presto, we are done. If you click on the uses of HitMe or Multiplier in the contract.st file, and select "Go to Definition" from the popup menu, the appropriate definition in contract.fl will be selected.

Completion

Completion works by you typing some characters and the editor offering you a selection of possible completions. Obviously, this list should be ordered in a way which makes the more likely selections come first and contains possible and partial matches as well as the "obvious" ones. I'm not going to do any of that. I'm just going to offer completions for which what the user has typed is a proper prefix. Yes, it is because I'm lazy - but my justification is that all of that hard work is not relevant to the integration.

As before, the first thing is to announce that we can do this. This again comes in two parts: we need to specify the capability and then we need to implement the completion method. In this case, however, the capability is more than just a boolean value: it wants a boolean saying if it is a "resolve provider" and a list of "trigger characters". It is not entirely clear what it means to be a "resolve provider", but I think it is reasonable to say "no" for now. Googling around, the idea behind "trigger characters" seems to be that not everything you type in the editor should force the expense of a round trip to the server to obtain completions. By default, it seems, alphanumeric characters will; if your language wants to extend this, it is possible to specify "trigger characters" that will force completion logic (think < in HTML).

The implementation is much the same as before: try and figure out the token and then ask the repository to complete it. Again, I let the LSP abstractions bleed over into the repository code.

And just like that, it works!

Resolving Completions

It would seem that there is a lot more information that can be communicated about completions than just the text to fill in. Some of this (such as the icon to display) probably needs to be returned at the same time as the original completion item to be useful; but other data (such as how to insert it, etc) is only relevant if you choose to do the insert.

Given that collecting and transmitting this data is probably expensive, I think the logic is that the completion() method returns a list of all the candidates and the resolve() method asks for detailed information about only the selected candidate.

I did a brief - and not very interesting - foray into this area to finish up this work.

Conclusion

Wiring up the LSP services does not seem that hard. It seems to be mainly a question of implementing the correct parsing and repository operations on the back end and adapting between the expectations of the LSP protocol and how you have your information stored. In a real system, a sophisticated repository is essential to good performance.

Connecting VSCode to Java


The next step in the process of integrating my compiler with VSCode is to get the LanguageClient inside VSCode talking to an LSP server running inside a Java process.

I was pinning my hopes on deriving this code from what Adam Voss had done. Sadly, I could not reverse engineer what he had done on the client side, so I started trying to research it for myself. Although there appears to be good "explanatory" documentation for VSCode, there doesn't seem to be very much in the way of "reference" documentation, so I ended up looking at the code.

Now, I'm not a typescript expert, but going through this, it seems that what I really want to do is to provide a hash for the "server options" containing the field command in it. OK, I can do that. Everything else I'm going to liberally borrow from the lsp-sample from Microsoft.

Let's get coding

Another problem rears its head at this point. I have one project; lsp-sample seems to have two. Logically, it has a client and a server. I want to copy the client and do my own server (in Java). But the top level directory also has a package.json and the relevant items seem to be spread across the two. I don't really understand how this works, but I will just steal what I need to go in my package.json and hope for the best.

First off, I need the dependency on language-client:
"dependencies": {
  "vscode-languageclient": "^6.1.3"
},
Then I need to define activationEvents, which, if I understand it, is the way in which you tell VSCode that your extension is willing to take on a particular file (possibly along with other situations).

So we declare two activation events (one for each of the two languages declared in package.json) which notice when an editor is opened which meets their criteria for editing.
"activationEvents": [
  "onLanguage:flas",
  "onLanguage:flas-st"
],
When I tried this, it didn't work and I received an error message that
properties `activationEvents` and `main` must both be specified or must both be omitted
I didn't see anything about this in the documentation, but by reference to the sample, it would seem that you have to specify a value for main in package.json, pointing to where the extension.ts file is found.
"main": "out/extension.js",
This may not immediately appear to be where extension.ts will be found, and it's not. Node runs JavaScript, not TypeScript, so the "out/" is required: that is where tsc puts the JavaScript it generates (note also the .js extension here).
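
For reference, that mapping comes from the outDir setting in tsconfig.json, something like this (a sketch based on the lsp-sample layout):
{
    "compilerOptions": {
        "module": "commonjs",
        "target": "es6",
        "outDir": "out",
        "rootDir": "src",
        "sourceMap": true
    }
}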

Defining the Extension

So that's the configuration. But what goes in extension.ts? I'm not going to reproduce it all here, but it is complicated enough - and took me long enough to figure out - that I think it's worth digging into a little bit.

Working backwards, we need to create and start a LanguageClient:
// Create the language client and start the client.
client = new LanguageClient(
  'IgnorancePlugin',
  'Plugin through Ignorance',
  serverOptions,
  clientOptions
);
        
// Start the client. This will also launch the server
client.start();
The four fields here are:
  • the id of the plugin which will come up from time to time later;
  • the title of the plugin which is displayed in the output and extensions windows in VSCode;
  • the options about how to run the LSP server;
  • the options about how the client is configured.
The client options seem fairly easy, although I have to say that I didn't dig too far into what all the possibilities were.
// Options to control the language client
let clientOptions: LanguageClientOptions = {
  // Register for our languages
  documentSelector: [
    { scheme: 'file', language: 'flas' },
    { scheme: 'file', language: 'flas-st' }
  ]
};
This seems to me somewhat duplicative of what we configured in package.json, but it may not be.

Finally, we have the server options, which, as noted above, can come in one of several varieties. To specify an external server which needs to be launched each time, the server options need to be an object with a command specified; args, env and cwd may also be specified. In the absence of cwd, the current workspace root is used.

Thus I end up with these options:
let serverOptions: ServerOptions = {
  command: "java",
  args: [
    "-jar",
    path.resolve(context.extensionPath, '..', 'lsp-java', 'build', 'libs', 'lsp-java-all.jar')
  ]
};
Here, context.extensionPath is the path where the extension is found. Because I know that the jar is going to be found relative to the extension, I can specify this here. I'm not sure what happens when you come to package the extension for distribution, but that's a topic for another day.

Oh, and don't think that I figured all this out a priori. I spent a lot of time running sample scripts that were outputting relevant information and causing errors in VSCode to find out all the information I needed.

The Java Server

On the server side, I simply repackaged the Adam Voss server, putting it into my own package (under the lsp-java directory), and made it operate over standard input/output rather than using a socket.

So now it works, right? How do we know?

It appears that there is a mechanism to view the communication between client and server, but how easy is that to do in practice? Not too hard, as it turns out.

First off, you need a block like this in the contributes hash of package.json.
"configuration": {
  "type": "object",
  "title": "Ignorance Settings",
  "properties": {
    "IgnorancePlugin.trace.server": {
    "scope": "window",
    "type": "string",
    "enum": [
      "off",
        "messages",
        "verbose"
      ],
      "default": "off",
      "description": "Traces the communication between VS Code and the language server."
    }
  }
}
Overall, the "configuration" block defines all the settings that your extension has. You can define anything (within reason) here and access it from both the client and the server.

The settings block is automatically configured as a properties "window" inside Settings. If you go to Code > Settings and then select Extensions, you will see a sub-block called Ignorance Settings (the name comes from the title above).

The property defined here is interpreted directly by VSCode as defining the trace level of the communication. This works because VSCode looks for a setting named exactly after the plugin id (as specified in the LanguageClient constructor in extension.ts) followed by .trace.server. From the settings window, it is possible to change the value to messages or verbose and see the communication between client and server. This is obviously vital for understanding what is going on.

Once turned on, you can quit the Extension window and restart it. As you do so, you should see the messages appear in the Output window. If you can't see the Output window anywhere, you can make it pop up by selecting View>Output from the main menu.

When I do this and restart with contract.st open, I see three messages sent across to the server: initialize - (0), initialized and textDocument/didOpen. It's a lot of output, so I'm not going to reproduce it all here, but a number of fields are interesting.

In initialize, a rootPath and a rootUri are passed across, which appear to be the location of the workspace folder. The response contains information about the features that are implemented - presumably derived from the code I have copied across to get started.

The initialized message is empty.

The textDocument/didOpen message contains a full uri, the languageId which VSCode has assigned to the file, a version number (which is updated every time you make a change - i.e. type anything), and the full text of the original document.

Every time the document changes, a textDocument/didChange message is sent across: the textDocument element describes the uri and updated version of the document, and the contentChanges contains the full text of the document. I believe more advanced usage exists where you receive only the changed ranges rather than the whole document, but for me right now, getting the whole document every time feels like a win. In fact, it would seem that this is a setting configured during the initialization step (see IgnorantLanguageServer.java).
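For the record, the relevant part of the initialization step looks something like this in lsp4j - a sketch of the idea rather than a verbatim copy of IgnorantLanguageServer.java:
// Inside the LanguageServer implementation; types come from org.eclipse.lsp4j
// and java.util.concurrent.CompletableFuture.
@Override
public CompletableFuture<InitializeResult> initialize(InitializeParams params) {
  ServerCapabilities caps = new ServerCapabilities();
  // Full sync: every textDocument/didChange delivers the whole document,
  // not just the changed ranges
  caps.setTextDocumentSync(TextDocumentSyncKind.Full);
  return CompletableFuture.completedFuture(new InitializeResult(caps));
}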

Wiring up a "Real" Compiler

In my first server-side check-in, I simply accepted the code as I found it, but it didn't really do what I wanted. Here I am going to rework this code and, as I do so, describe how it now works.

Now, I don't want to wire up my full compiler at the moment (well, actually I do, but here is not the place and now is not the time). But I definitely want to check out how to deal with errors and report messages if something has gone wrong. So I am going to write a very simple compiler to handle the flas and flas-st languages without dealing with all their complexity.

FLAS is an indentation-based language, so I'm going to start by saying that we can have two top-level elements in FLAS files, contract and card (mainly because that is what is in the sample I have here). Any other keyword (for example error) is going to be an error and a message needs to come back to that effect. Lines that are not indented are considered comments; lines that are indented more than one tab stop are ignored by this simple parser.
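To give a flavour of what I mean, here is a sketch of such a parser; the ErrorReporter interface is hypothetical glue I'm introducing for illustration, and the real SimpleParser differs in its details:
// A hypothetical callback by which the parser reports errors; these
// ultimately become LSP diagnostics
public interface ErrorReporter {
  void error(int line, String message);
}

public class SimpleParser {
  private final ErrorReporter errors;

  public SimpleParser(ErrorReporter errors) {
    this.errors = errors;
  }

  public void parse(String text) {
    String[] lines = text.split("\n", -1);
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i];
      if (line.isEmpty() || !line.startsWith("\t"))
        continue; // unindented lines are comments
      if (line.startsWith("\t\t"))
        continue; // more deeply indented lines are ignored by this parser
      String keyword = line.trim().split("\\s+")[0];
      if (!keyword.equals("contract") && !keyword.equals("card"))
        errors.error(i, "unknown top-level keyword: " + keyword);
    }
  }
}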

The main function is in LSPServer. It simply instantiates an IgnorantLanguageServer and wraps it in an LSP server connected to standard input and output.
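In outline, it is something like this (again a sketch; the real file has a little more ceremony):
import org.eclipse.lsp4j.jsonrpc.Launcher;
import org.eclipse.lsp4j.launch.LSPLauncher;
import org.eclipse.lsp4j.services.LanguageClient;

public class LSPServer {
  public static void main(String[] args) throws Exception {
    IgnorantLanguageServer server = new IgnorantLanguageServer();
    // VSCode launched us over stdio, so stdin and stdout are wired
    // directly to the LanguageClient at the other end
    Launcher<LanguageClient> launcher =
      LSPLauncher.createServerLauncher(server, System.in, System.out);
    server.connect(launcher.getRemoteProxy());
    launcher.startListening().get(); // block until the streams close
  }
}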

The top level of the LSP server is IgnorantLanguageServer. It is responsible for setting up the connection and wiring things together. The initialize method is the first method called from the client side and passes in the expectations from the client side; the response is the set of capabilities that the server is prepared to offer.

The connect method is called from LSPServer when the client connects and enables the server to respond.

Because we want to implement the text document service, we do that and return a wrapper around our simple parser, the ParsingTextDocumentService. While there are many methods this could implement, we are basically just interested in the client opening or changing documents. Every time this happens, we parse the document using our SimpleParser; in the process of parsing, it sends back any errors it encounters.
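Sketched out, that service looks something like this, assuming the SimpleParser above; the real ParsingTextDocumentService differs in its details:
import java.util.ArrayList;
import java.util.List;
import org.eclipse.lsp4j.*;
import org.eclipse.lsp4j.services.LanguageClient;
import org.eclipse.lsp4j.services.TextDocumentService;

public class ParsingTextDocumentService implements TextDocumentService {
  private final LanguageClient client;

  public ParsingTextDocumentService(LanguageClient client) {
    this.client = client;
  }

  @Override
  public void didOpen(DidOpenTextDocumentParams params) {
    parse(params.getTextDocument().getUri(), params.getTextDocument().getText());
  }

  @Override
  public void didChange(DidChangeTextDocumentParams params) {
    // with full-document sync, the single content change is the whole text
    parse(params.getTextDocument().getUri(), params.getContentChanges().get(0).getText());
  }

  @Override
  public void didClose(DidCloseTextDocumentParams params) {
  }

  @Override
  public void didSave(DidSaveTextDocumentParams params) {
  }

  private void parse(String uri, String text) {
    List<Diagnostic> diagnostics = new ArrayList<>();
    new SimpleParser((line, message) ->
      diagnostics.add(new Diagnostic(
        new Range(new Position(line, 0), new Position(line, 1)), message))
    ).parse(text);
    // publishing replaces any previous diagnostics for this uri
    client.publishDiagnostics(new PublishDiagnosticsParams(uri, diagnostics));
  }
}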

Having done all of that, restarting the extension window of VSCode produces messages about the errors in our code. Great!

Conclusion

In this episode, we successfully wired up a Java-based back end for the LSP and enabled it to read and parse documents sent from the editor. In doing so, we were able to send back errors as we came across them.

That's most of what I want to do. There are just two more things I want to try in this fake universe - can I navigate to a definition from a reference and can I complete a typename?

Syntax Highlighting in VSCode

So, time to start writing some code. Or, at least copying it.

I created a new directory in the ignorance repository, called vscode-java. This is where I'm going to put the VSCode half of the language server - the client if you will. As trailed in the last post, my starting point is going to be copying the contentprovider sample and simplifying it. So that's the code that I copied.

And then I went through "simplifying" it - i.e. deleting most of the actual code so that I was just left with the syntax highlighting portion. I then copied in a couple of sample text files from my own repository. And obviously I had to run npm install and open it in VSCode.

How Syntax Highlighting Works

The instructions on configuring syntax highlighting in the Microsoft documentation are actually quite clear, but far from exhaustive. Mainly they seem to defer most of the details to "it's the same as TextMate" without referencing anything.

The official word on TextMate grammars appears to be this document, but it's not very detailed itself. I haven't managed (yet) to find any more introductory work.

The key thing seems to be to configure language and grammar contributions in package.json, so I did this:
"contributes": {
  "languages": [
    {
      "id": "flas",
      "extensions": [ ".fl" ]
    },
    {
      "id": "flas-st",
      "extensions": [ ".st" ]
    }
  ],
  "grammars": [
    {
      "language": "flas",
      "path": "./syntax/flas.json",
      "scopeName": "source.flas"
    },
    {
      "language": "flas-st",
      "path": "./syntax/flas-st.json",
      "scopeName": "source.flas-st"
    }
  ]
}
Here I appear to be defining two languages, but that's just because I have two types of file for my language: the main files have extension .fl and the system tests have .st. Each of these has its own grammar. The grammars are placed in files under the syntax directory and each has its own scope name for theming purposes.

Defining the Grammars

The grammars are defined in JSON approximately in line with the description of "TextMate grammars" insofar as I understand it (not a lot as yet). I'm sure it will become clearer as I dig in more. Sadly, the syntax is sufficiently opaque as to discourage you from learning by example.

However, this is an excerpt of one of the grammars I defined.
{
  "name": "flas",
  "scopeName": "source.flas",
  "patterns": [
    { "include": "#comment" },
    { "include": "#contract-intro"}
  ],
  "repository" : {
    "comment": {
      "name": "comment.line.file",
      "match": "^[^ \t].*$"
    },
    "contract-intro": {
      "begin": "\tcontract\\b",
      "beginCaptures": {
        "0": { "name": "keyword.intro" }
      },
      "end": "(//.*$|$)",
      "name": "statement.contract",
      "patterns": [{"include":"#typename"}]
    }
  }
}
The name and scopeName match the language and scopeName from the package.json. Failure to match both invalidates the grammar and it will not be used for syntax highlighting. The patterns array defines a set of productions or rules that can occur at the top level. In spite of being defined by regular expressions, there is an element of grammar productions to this, and it is certainly NOT the case that each regular expression just matches what it feels like.

The repository allows you to define more complex, possibly recursively nested, rules. The #include syntax says that, instead of specifying a pattern directly, it is possible to delegate to a rule in the repository. The reference to the rule must begin with a #, while the rule name itself does not; I'm not sure why. The patterns entry, both at the top level and within a repository definition, is an array; any of the patterns may match some or all of the inner text, but it is not possible for them to match overlapping text.

It's also important to realize that the begin and end patterns are not part of the body of the rule and so are not included in the sub-matching of the patterns array but rather have separate logic to style them (beginCaptures and endCaptures).

Debugging

If reading (and writing) these grammars is hard, figuring out what is going on - and wrong - is insanely hard. First off, every time you make a change to any of the grammar files, you need to restart the Extension instance. This is done by using Sh-F5 to stop the current instance and then F5 to start a new instance.

It is then possible to see the consequences of your actions. If you're fortunate, you will see visual effects on the screen. If not, or if you just want clarity about what happened, it's possible to bring up the Token Inspector to see what happens. In the extension window, type M-Sh-P to bring up the command window and then type some portion of Developer: Inspect Editor Tokens and Scopes. Selecting this pops up a window which shows which rules were applied at the current location. To choose a different location, simply click there (unless it's under the popup window, in which case you may need to resort to trickery such as clicking elsewhere first or using the keyboard). To dismiss the window, press ESC.

On the upside, every time you restart, VSCode picks up from where it left off, so you don't need to go through the steps of re-opening the relevant windows. It also learns fairly quickly that you want to use the Token Inspector and suggests it sooner. And of course, if you are desperate, you can bind it to a simple keyboard shortcut.

Conclusion

Actually wiring up syntax highlighting was surprisingly easy. Getting the patterns to work was not. The complete lack of any tooling (at least in VSCode) that points out errors or failings was really annoying. Some things that particularly caught me out were:
  • forgetting the # when referencing a rule in the repository;
  • not doubling the backslash characters before special characters (such as \b) in regular expressions (but note that this is not wanted for \t, which is a tab character);
  • the begin and end syntax, along with the fact that they are not included in the inner patterns;
  • the fact that regular expressions do not overlap.

I need to spend considerably more time looking into the syntax and trying to figure out how to use it to reasonably describe a quite complex "context sensitive" grammar using this weird mixture of regular expressions and production rules. But that is more for the "real world" than it is for this blog (although I may come back here if I have any wisdom to distill), and in the meantime it is time to move on to the task of integrating with a Java back end.

Sunday, September 20, 2020

First cut at LSP


Building on my previous research, I'm ready to try and do something with VSCode and LSP.

The place to start is by seeing if we can get some existing sample code to build and run. Let's try that.

Doing something

Based on my previous research, I decided to start by checking out Adam Voss' LanguageServer over Java example:
git clone https://github.com/adamvoss/vscode-languageserver-java-example.git
Given that this was based on a Microsoft sample, I checked this out too:
git clone https://github.com/microsoft/vscode-languageserver-node-example.git
But then, reading the README, it transpires that in the meantime Microsoft have deprecated this (actually, quite a long time ago). Looking at the replacements, it looks to me not so much that the technology has changed as that Microsoft have moved their samples from separate repositories to a "mono repo". Anyway, I went ahead and checked that out, too.
git clone https://github.com/Microsoft/vscode-extension-samples
This seems to have a number of samples within it, so which one should we pick? It seems that the most basic one is helloworld-sample, so let's start there. There appears to be some level of "getting started" documentation available, although it doesn't start by checking out this repository but instead uses some complicated mechanism to generate your own, which seems to overcomplicate matters. Based on all of this, including the instructions in the README (which also seemed a little confusing), this is what worked for me:
  • In a terminal, cd into the helloworld-sample directory
  • Run npm i
  • In VSCode, select "File > New Window" and then "Open Folder…" and then select the "helloworld-sample" directory
  • From here, push F5 or select "Run > Start Debugging"
  • A new window opens
Now, I haven't dug into what this extension does (and probably won't), because at the moment I'm not interested, but it offers a new command you can run by doing CMD-SHIFT-P and typing Hello World. This pops up a message in the bottom right corner of the window. OK, very good, we have something working. Close both of the VSCode windows (not saving whatever it was we hadn't changed) and we are back to where we started.

Working with LSP

It seems like the sample closest to what I had been looking at before is the lsp-sample. So let's repeat the above process to see if we can make this sample work as well.
  • In a terminal, cd into the lsp-sample directory
  • Run npm i
  • In VSCode, select "File > New Window" and then "Open Folder…" and then select the "lsp-sample" directory
  • From here, push F5 or select "Run > Start Debugging"
  • A new window opens
The documentation says that you need to open a file in the "'plain text' language mode", but I have no idea how to open a file in a particular language mode. So I guessed that what it means is to open a file that doesn't have an extension VSCode recognizes. Fortunately, I have a fair few of those, so I tried one. The important thing seems to be to find a file which does not already have a built-in language server in VSCode (e.g. .js or .ts files are a bad choice).

The sample appears to have three features: if you type a 'J' or a 'T' it will offer the completions "JavaScript" and "TypeScript" respectively; meanwhile, if you type an "identifier" that starts with at least two uppercase letters, it flags an error.

OK, that seems to do what I want, except it does it entirely inside node, whereas I want to be delegating all the hard work to Java. There doesn't seem to be an official sample that does that, so I think I'm back to rolling my own based on the model I have from Adam Voss.

Syntax Highlighting

At this point, it seems worth going on a diversion to look into syntax highlighting. As I noted last time, syntax highlighting is handled differently from the language features that depend on LSP: it is done entirely on the client side using regular expressions.

Strangely, there doesn't seem to be a sample that specifically addresses this; I'm not sure why. There are three samples, however, which do include syntax highlighting of which contentprovider-sample seems to be the simplest. This is actually a sample about how you can generate documents and display them in an editor window; the syntax highlighting is just an adjunct that is there in order to make the generated document "look good" (and possibly to define regions to which affordances can be added).

Syntax highlighting is covered thoroughly in the documentation but the thrust is that you need to add a "contribution point" into the package.json file that defines the grammars for the languages you want to define (you also need to define a "languages" contribution point which identifies the languages based on file extensions, but I think that is pretty much a given here). The basic idea is that you identify regions of text and associate each of them with a "scope" (the word scope implies, I believe accurately, that the regions can nest within other regions). In the contribution point, you specify a list of bindings, each of which identifies a language, the path to a JSON grammar file (relative to the extension directory) and the root scope for the language.

The JSON grammar file defines an object representing the grammar of the language. Within this, there appears to be duplication of the scopeName. Since we are referencing the grammar file by name, I don't see how these two can be different, yet both seem to be needed, which is my definition of duplication. The other two values are patterns, which is a list of the patterns that can appear at the top level, and repository, which amounts to the complete set of productions for the grammar. These can, of course, be recursive, and the upshot is that each identified pattern in the grammar has a chain of matching rules (or scope) which enables VSCode to apply the appropriate themes.

The themes can likewise be introduced as extensions, and the theme-sample shows how this can be done. It would seem that Microsoft for some reason have adopted much of this from TextMate and largely expect you to have existing TextMate themes that you wish to reuse. There does appear to be an alternative scheme for defining a new color scheme which would be compatible with the grammar.

These grammar JSON files do not look easy to define. Since I already have a machine-readable formal grammar for my language, I'm sure I will just end up writing one more tool to convert that into a set of regular expressions for syntax highlighting. It would appear that somebody at or connected with Microsoft has done something similar to generate some of their examples.

Conclusion

Well, I still haven't written any code. But I am starting to get closer to knowing what it is that I need to do, so next time out I'm going to start by copying across parts of these various samples and trying to get something which does an amalgam of them: register a language, add syntax highlighting and themes, and start a Java LSP server.

Friday, September 18, 2020

Integrating with VisualStudio using Language Server Protocol


In my daily life, I do a lot of work with compilers and programming languages. In the course of doing that, I want to provide a quality editing experience; in the modern era that means things like syntax highlighting, auto completion and rapid feedback on errors. But I don't want to write tools, I want to integrate with them. The question then arises as to which editors to integrate with. As a Java programmer, in the past I have generally used Eclipse, but it is not an easy architecture to plug into.

Recently, I have started using Microsoft's VSCode to do front-end JavaScript, HTML and CSS development: in spite of being a Microsoft product, it actually seems to be quite stable, reliable and sane. Its approach to embedding editing features for new languages is not necessarily to strictly embed them, but to allow for a connection to an external server which delivers the relevant features. This appeals to me because it enables me to write most of the integration code in Java - with which I am familiar and which is the implementation language of the compiler itself - thus making the job easier.

As an added benefit, the Language Server Protocol is supported by a wide array of languages and tools which means that doing this work once provides easier access to a range of tools, although it is still necessary to implement a relatively thin "client" experience for each tool.

So what is the Language Server Protocol?

Basically, the language server protocol is a communication protocol between editing tools and "compilers" which abstracts away the language details and allows the two ends to communicate in terms of the kinds of abstract operations that tools want to perform on language elements - look up definitions, search for usages, complete symbols, etc.

The protocol is a version of JSON-RPC, framed with lightweight HTTP-style headers.

Building a Server in Java

Nobody wants to actually go to all the effort of writing the code to read and write JSON-RPC over HTTP. Fortunately, people have been there before us and done that. For example, in Java there is the lsp4j library which makes it possible to write to interfaces and then have a main method that wires up a server.

Most of the work is involved in implementing the LanguageServer interface and implementing all the methods. The main method then instantiates this and creates a "server" by wrapping this using the LSPLauncher.createServerLauncher() method. In addition to the server instance, this method requires an input stream and an output stream. Where do these come from?

This is where the genuine connection to the client comes in. You need a physical transport layer - most likely a socket connection - from which you can extract a stream in each direction. In passing these to the "server" you enable it to read requests and write responses.

Finally, there is a little bit of magic in wiring up the "remote client" interface (by which the server communicates back with the actual client) with the server code by having the server implement the LanguageClientAware interface.
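Putting those pieces together, the wiring looks something like this - a sketch, with MyLanguageServer standing in for whatever LanguageServer (and LanguageClientAware) implementation you have written:
import java.net.ServerSocket;
import java.net.Socket;
import org.eclipse.lsp4j.jsonrpc.Launcher;
import org.eclipse.lsp4j.launch.LSPLauncher;
import org.eclipse.lsp4j.services.LanguageClient;

public class Main {
  public static void main(String[] args) throws Exception {
    try (ServerSocket listener = new ServerSocket(Integer.parseInt(args[0]))) {
      // the physical transport: a socket connection from the editor
      Socket socket = listener.accept();
      MyLanguageServer server = new MyLanguageServer();
      Launcher<LanguageClient> launcher = LSPLauncher.createServerLauncher(
        server, socket.getInputStream(), socket.getOutputStream());
      // because the server is LanguageClientAware, hand it the remote proxy
      // so that it can talk back to the editor
      server.connect(launcher.getRemoteProxy());
      launcher.startListening().get(); // block until the connection closes
    }
  }
}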

Embedding a Connector in VSCode

Integrating with VSCode is not quite as simple as merely implementing a server. For a variety of reasons, VSCode requires a "beachhead" on the client side to handle the communication with the server.

The process is described in the VSCode documentation. Apart from anything else, this defines the capabilities of the language server and provides the implementation of the connection layer including starting up the "remote" (from the VSCode's point of view) server.

Additionally, some of the functionality associated with a language (such as syntax highlighting) is not implemented over the server protocol at all. Syntax highlighting, for example, is implemented strictly on the client side using regular expression matching. I can't say as I like it - parsing is so much more sane - but it is sadly common to most tools.

A Simple Example

For this post I am only going to offer a little code and, breaking with tradition, it is not even my code. Microsoft offers an example of how to connect to another node.js based language server but the late Adam Voss ported this to use a Java server.

Reversing the order of presentation from the sections above, I am going to start with the client side (i.e. the code embedded in VSCode). Obviously, this needs to be node.js compatible and, since we are in Microsoft-land, that means typescript (although I believe it is possible to use JavaScript if you insist).

The client code

Each client needs to have a manifest associated with it, in package.json. Most of this is, of course, vanilla node.js/npm configuration: setting up dependencies and the like.

The key elements seem to be engines, activationEvents and configuration. These are described in some detail in the developer guide and I don't think I have very much to add to that at this point. Obviously, engines describes the versions of VSCode with which the plugin is compatible; activationEvents describes the events that cause VSCode to activate the plugin; and configuration covers the rest of the concerns, including (it would seem) allowing the plugin to introduce settings which the user can then configure.

What is not configured - and is therefore presumably implicit - is how the client is configured. It would seem that the module needs to export a single function called activate which receives an ExtensionContext and is responsible for creating a new LanguageClient object (defined by a Microsoft library) and, after configuring it as appropriate, calling start on it. The cunning, of course, is all in the options parameters that are passed to it.

Moving on to the Java example, we can look at the equivalent extension.ts file in this repository.

Starting towards the bottom (line 76), the language client is created and started (all on one line). The client options look fairly vanilla, but in lieu of the server options, a function is passed. For full disclosure, I haven't so much as cloned this repository yet, so for all I know it doesn't even work, but at the same time I know that JavaScript - and typescript, presumably - will accept a function as a parameter and then call it when it needs the value. I'm assuming that is what is going to happen here. It is worth noting, on the other hand, that this repository is a couple of years out of date, so it is also possible that it is using a no-longer-supported feature.

Anyway, assuming that it is right, it is passing in the function which takes up most of the module (lines 17-60). Again, confusingly, it returns a Promise of a StreamInfo, not the LanguageServerOptions I was expecting. But no matter.

First, it creates a socketpair which, on completion, resolves the promise by providing the reader and the writer. It also listens for the socket to be closed and reports that to the console (it doesn't actually close anything, which seems surprising, but it is possible that somebody else catches that). It connects the listen event to a handler which starts a java process (the server), telling it the port number which has been opened.

I have to admit that there are a number of things going on here which don't seem exactly right to me - but that is probably because I don't understand enough about how the node.js net.server abstraction works.

The server side

The server is a fairly simple and brain-dead Java application. On the flip side, it doesn't do very much.

The main code is in App.java. Basically, this reads the port from the arguments, creates a client around it, and then does the work to set up an LSP server using the streams and an ExampleLanguageServer.

This implements the minimal set of methods needed for a TextDocumentService, although for reasons I don't understand, the actual implementation is split between a class FullTextDocumentService and an inner class inside the language server class.

The server has methods to initialize, connect, shutdown and exit the server, as well as to return the implementation of the FullTextDocumentService. It also provides an implementation of the WorkspaceService which appears to be responsible for handling user configuration changes.
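In lsp4j terms, the minimal version of that looks roughly like this (a sketch; the example's version does a little more with the incoming settings):
import org.eclipse.lsp4j.DidChangeConfigurationParams;
import org.eclipse.lsp4j.DidChangeWatchedFilesParams;
import org.eclipse.lsp4j.services.WorkspaceService;

public class SimpleWorkspaceService implements WorkspaceService {
  @Override
  public void didChangeConfiguration(DidChangeConfigurationParams params) {
    // params.getSettings() carries the user's configuration as parsed JSON
  }

  @Override
  public void didChangeWatchedFiles(DidChangeWatchedFilesParams params) {
    // notified when files matching any registered watchers change on disk
  }
}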

Conclusion

I've learned a lot about VSCode and the Language Server Protocol that I didn't previously know and, having saved away the links, I am hoping this will be of use to me when I return to actually try and implement something.

The next step is obviously to clone these various repositories, bring everything up to date, get it to work as is and understand it a little better. After that, I will need to try and understand the breadth of the protocol before trying to connect an actual compiler.

Expect to hear more.