Wednesday, April 26, 2023

Populating a Side View with a Notification

Following on from my previous post, it has come to my attention that it is also possible to populate the view by using a notification from the back end to the front end.

The purpose of notifications is to provide updates without being asked. In our case, it makes sense when we have updated the repository (after parsing one or more files) to send such a notification. Otherwise, the front end needs to "guess" when the information may have changed and request it again.

So the task here is to turn the process around: instead of the front end sending a request for the information, the back end sends it automatically when it is ready.
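To make the difference concrete, here is a minimal sketch of the two message shapes as JSON-RPC 2.0 defines them: a request carries an id and expects a response, while a notification has no id and expects nothing back. The interface names, the isNotification helper and the file:///example URI are made up for illustration; the method and command strings are the ones used in these posts.

```typescript
// JSON-RPC 2.0 message shapes, reduced to the fields that matter here.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number | string;
  method: string;
  params?: unknown;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params?: unknown;
}

type JsonRpcMessage = JsonRpcRequest | JsonRpcNotification;

// A notification is a message without an id, so no response is expected.
function isNotification(msg: JsonRpcMessage): msg is JsonRpcNotification {
  return !("id" in msg);
}

// Pull model: the front end asks and waits for an answer.
const pullMessage: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "workspace/executeCommand",
  // The URI is a placeholder, not a real workspace folder.
  params: { command: "java.lsp.requestTokens", arguments: ["file:///example"] },
};

// Push model: the back end announces new data without being asked.
const pushMessage: JsonRpcNotification = {
  jsonrpc: "2.0",
  method: "ignorance/tokens",
  params: "hello, world",
};
```

Because the push message has no id, the sender never learns whether anybody was listening, which is why the ordering problem described further down matters.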

Reading through the specification, it seems that this is possible, but that it requires us to "extend" the protocol. But I'm not entirely sure how to do that. It turns out that while it's possible, it doesn't seem to me that it's "genuinely" supported at all.

On the server side, we need to define a new interface which extends LanguageClient and declares the new notification methods that we want, each tagged with the @JsonNotification annotation, which specifies the method name that will be used when sending the message.

package ignorance;

import org.eclipse.lsp4j.jsonrpc.services.JsonNotification;
import org.eclipse.lsp4j.services.LanguageClient;

public interface ExtendedLanguageClient extends LanguageClient {
  @JsonNotification("ignorance/tokens")
  void sendTokens(Object object);
}

It should then just be a case of telling the LSPLauncher that this is the protocol we want to use, but none of the factory methods takes an interface - they all assume that you want to use LanguageClient. But fortunately, there is nothing magic about any of these methods, so we can extract it locally and make the change we need.

Launcher<ExtendedLanguageClient> launcher = createServerLauncher(server, in, out);

  private static Launcher<ExtendedLanguageClient> createServerLauncher(IgnorantLanguageServer server, InputStream in, OutputStream out) {
    return new LSPLauncher.Builder<ExtendedLanguageClient>()
      .setLocalService(server)
      .setRemoteInterface(ExtendedLanguageClient.class)
      .setInput(in)
      .setOutput(out)
      .create();
  }

Likewise, on the client side, it should be just as simple as calling the onNotification method of the client, but this can only be done once the client is ready, and thus after communication has already started up - meaning that some messages might be sent (and lost) before the handler is installed. I tried a number of ways of working around this (there appears to be a notion of Features which can install handlers) but nothing fixed the fundamental problem, so I ended up having to add my own synchronization around this. But for now, here's the code that sets up the handler in the onReady method.

  client.onReady().then(() => {
    client.onNotification("ignorance/tokens", () => {
      console.log("received token notification");
    });

So to handle the synchronization, we add another command with the message ignorance/readyForTokens. In the onReady method we call this AFTER configuring the notification handler.

    client.onNotification("ignorance/tokens", () => {
      console.log("received token notification");
    });
    client.sendRequest(ExecuteCommandRequest.type, {
      command: 'ignorance/readyForTokens',
      arguments: [ ]
    });

    const tokensProvider = new TokensProvider();

On the server side, we can then start sending messages when appropriate.

      public CompletableFuture<Object> executeCommand(ExecuteCommandParams params) {
        switch (params.getCommand()) {
        case "java.lsp.requestTokens": {
          System.err.println("requestTokens command called for " + params.getArguments().get(0));
          return repo.allTokensFor(URI.create(((JsonPrimitive) params.getArguments().get(0)).getAsString()));
        }
        case "ignorance/readyForTokens": {
          System.err.println("readyForTokens");
          amReady = true;
          return CompletableFuture.completedFuture(null);
        }
        default: {
          System.err.println("cannot handle command " + params.getCommand());
          return CompletableFuture.completedFuture(null);
        }
        }
      }

Which just leaves the process of actually sending a message, which we want to do once we have finished parsing. There is a certain amount of off-camera wiring to make this happen reliably.

  private void parseAllFiles() {
    File file = new File(workspaceRoot.getPath());
    parseDir(file);
    client.sendTokens("hello, world");
  }

All the complex wiring is once again left as an exercise for the reader (see VSCODE_NOTIFICATIONS_COMPLEX_WIRING).
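For what it's worth, the core of that wiring can be sketched independently of the real code. The idea is language-neutral, so this sketch is in TypeScript to match the front-end snippets even though the actual server is Java. ReadyGate and its method names are hypothetical and are not taken from the repository; the sketch assumes that only the most recent payload matters, so an older pending payload is simply replaced.

```typescript
// Hypothetical helper: holds the most recent payload until the client is ready.
class ReadyGate<T> {
  private ready = false;
  private hasPending = false;
  private pending: T | undefined = undefined;
  private send: (payload: T) => void;

  constructor(send: (payload: T) => void) {
    this.send = send;
  }

  // Called when parsing finishes (the role played by client.sendTokens above).
  publish(payload: T): void {
    if (this.ready) {
      this.send(payload);
    } else {
      // Keep only the latest payload; anything older is superseded.
      this.pending = payload;
      this.hasPending = true;
    }
  }

  // Called when the ignorance/readyForTokens command arrives.
  markReady(): void {
    this.ready = true;
    if (this.hasPending) {
      const payload = this.pending as T;
      this.pending = undefined;
      this.hasPending = false;
      this.send(payload);
    }
  }
}

// Usage: record what would have gone over the wire.
const sentPayloads: string[] = [];
const gate = new ReadyGate<string>((payload) => { sentPayloads.push(payload); });
gate.publish("first parse");  // held back: the client is not ready yet
gate.publish("second parse"); // replaces the held payload
gate.markReady();             // flushes "second parse"
gate.publish("third parse");  // sent immediately
```

A real server would also need to make this thread-safe, since parsing and command handling may well run on different threads.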

Conclusion

It is certainly possible to configure both client and server to handle custom notifications, but there are enough tricky issues involved that I don't think that it can honestly be described as "supported". But if you're an event-driven freak like me, you just have to bite it off.

Tuesday, April 25, 2023

Adding a side view to LSP

It seems like a while since I last worked on my LSP server (*checks git history*: it's a very long while, OK).

But right now I find myself wanting to add a new view with some metadata to my LSP server in "the real world". And so I come back here to figure out how to support that.

I was originally thinking I could add a window at the bottom (akin to Problems, or Output) but it doesn't seem possible to add something there. I considered adding a virtual document but then I realized that was a lot of hard work for not much gain and what I think I probably want is a view.

So here goes …

The basic setup is that we want to add a view which is capable of presenting some hierarchical information as a tree. The thought would be that the server has some way of deducing "global" information about a project which it can then feed back to you (a class explorer, for example).

Looking back at what I did before, I can see that for each project in the workspace, I collected together a list of tokens (called Names), each of which has a file (in the form of a URI) and a location. For the purposes of testing, I am going to use this as a source of information and build a tree which has a list of Names, and then for each Name, it will return a list of URIs, and within each URI a list of locations. If I understand what I did previously, each name will only have one URI and location, but that is not all that interesting: it's the fact that it's presented as a list which I care about.

But before we get to that, we need to start on the front end, and add a new contribution to our existing package.json.

    "views": {
      "explorer": [
        {
          "id": "ignorantTokens",
          "name": "Ignorant Tokens"
        }
      ]
    }

And run the extension using F5. Well, yes, that was easy. The "Ignorant Tokens" view appears at the bottom of the left hand column. Click on it and it expands to show … nothing.

So now we need to put some content there. For now, we're not going to do the hard work of getting the content from the server, we're just going to hack something in, say a list of items with lists in them. We'll use multiple items here to make sure our code works, even if that isn't something the Java backend is going to offer.

Start small, Gareth, start small. So let's just start with a list of the currently active folders in the workspace.

The key to providing the content is to call the window.registerTreeDataProvider method and to provide it with an object which can provide all the data that you need. That, in itself, is not that hard.

const tokensProvider = new TokensProvider();
window.registerTreeDataProvider('ignorantTokens', tokensProvider);

Now we can move on to actually implementing the provider, along with the data model representing the tree. The provider has to implement the interface vscode.TreeDataProvider, and this has basically two methods: one to get the children of an item and one to get the "tree item data" itself. The Microsoft example I'm working from assumed it was, in fact, possible for the item to be an instance of the TreeItem class, and I have followed suit. This is my implementation of ProjectTokens, which represents a workspace folder as a tree item by providing the super constructor with the folder name and saying that it should initially be in the "collapsed" state (click to show children).

import { TreeItem, TreeItemCollapsibleState, WorkspaceFolder } from "vscode";

export class ProjectTokens extends TreeItem {
  constructor(wf : WorkspaceFolder) {
    super(wf.name, TreeItemCollapsibleState.Collapsed);
  }
}

The provider has a constructor which looks at all the folders in the workspace and creates an item for them. It's not actually clear to me whether it is better to put this code in the constructor, or to put it in the code which returns the top-level list of items. I have put it in the constructor here to provide for a clearer separation of responsibilities, but it is not clear to me how well this will adapt to changes.

On the subject of which, I think this is the case which is supposed to be handled by the Event onDidChangeTreeData. I have to confess that I am ignorant about this: at this juncture I could not be bothered to go and check it out, nor am I sure who is supposed to generate this event. (A later juncture occurred while tidying up at the end; see below for how this works and is used.)

The first method that VSCode calls when trying to generate the tree view is, oddly, getChildren. But it does so passing in undefined or null or something which gives you the clue that it doesn't know whose children it wants: and the children of nobody must be the top level items, which we have to hand and can return.

Then for each of these items that you pass back, it turns around and asks you to hand it a TreeItem. As noted above, we chose to implement our ProjectTokens as TreeItems, so we can just return the very thing we are given. In other situations, you could return a member of the object, or presumably create one on the fly.

import { Event, ProviderResult, TreeDataProvider, TreeItem } from "vscode";
import { ProjectTokens } from "./tokenlocation";
import * as vscode from 'vscode';

export class TokensProvider implements TreeDataProvider<ProjectTokens> {
  locations: ProjectTokens[];
  constructor() {
    this.locations = [];
    if (vscode.workspace.workspaceFolders == null)
      return;
    for (var wf=0;wf<vscode.workspace.workspaceFolders.length;wf++) {
      this.locations.push(new ProjectTokens(vscode.workspace.workspaceFolders[wf]));
    }
  }

  onDidChangeTreeData?: Event<ProjectTokens | null | undefined> | undefined;
  getChildren(element?: ProjectTokens | undefined): ProviderResult<ProjectTokens[]> {
    if (!element) { // it wants the top list
      if (!this.locations) {
        return Promise.resolve([]);
      } else {
        return Promise.resolve(this.locations);
      }
    }
  }
  getTreeItem(element: ProjectTokens): TreeItem {
    return element;
  }
  getParent?(element: ProjectTokens): ProviderResult<ProjectTokens> {
    throw new Error("Method not implemented.");
  }
}

To support multiple levels of nesting, we now need to provide two more data classes (extending TreeItem): Token and TokenLocation. The idea here is that Token gives you the name of the token that has been defined and TokenLocation gives you the location where it can be found.

import { TreeItem, TreeItemCollapsibleState, WorkspaceFolder } from "vscode";

export class ProjectTokens extends TreeItem {
  tokens: Token[];
  constructor(wf : WorkspaceFolder, tokens : Token[]) {
    super(wf.name, TreeItemCollapsibleState.Collapsed);
    this.tokens = tokens;
  }

  children() : Promise<Token[]> {
    return Promise.resolve(this.tokens);
  }
}

export class Token extends TreeItem {
  locations: TokenLocation[];
  constructor(name: string, locations: TokenLocation[]) {
    super(name, TreeItemCollapsibleState.Collapsed);
    this.locations = locations;
  }

  children() : Promise<TokenLocation[]> {
    return Promise.resolve(this.locations);
  }
}

export class TokenLocation extends TreeItem {
  constructor(where: string) {
    super(where, TreeItemCollapsibleState.None);
  }

  children() : Promise<Token[]> {
    return Promise.resolve([]);
  }
}

In the provider, we need to make three sets of changes. First, we need to change the signatures of the provider and all the methods to enable all three TreeItem types to be returned. The second, and most important, change is that in the getChildren method, when an element is provided, we must ask that element for its children. And finally, because we are hacking things together, we must update the constructor to provide the appropriate tokens.

import { Event, ProviderResult, TreeDataProvider, TreeItem } from "vscode";
import { ProjectTokens, Token, TokenLocation } from "./tokenlocation";
import * as vscode from 'vscode';

export class TokensProvider implements TreeDataProvider<ProjectTokens | Token | TokenLocation> {
  locations: ProjectTokens[];
  constructor() {
    this.locations = [];
    if (vscode.workspace.workspaceFolders == null)
      return;
    const tmp = [
      new Token("List", [
        new TokenLocation("46.2"),
        new TokenLocation("53.1")
      ]),
      new Token("Map", [
        new TokenLocation("15.7"),
        new TokenLocation("28.9")
      ])
    ];
    for (var wf=0;wf<vscode.workspace.workspaceFolders.length;wf++) {
      this.locations.push(new ProjectTokens(vscode.workspace.workspaceFolders[wf], tmp));
    }
  }

  onDidChangeTreeData?: Event<ProjectTokens | Token | TokenLocation | null | undefined> | undefined;
  getChildren(element?: ProjectTokens | Token | TokenLocation | undefined): ProviderResult<ProjectTokens[] | Token[] | TokenLocation[]> {
    if (!element) { // it wants the top list
      if (!this.locations) {
        return Promise.resolve([]);
      } else {
        return Promise.resolve(this.locations);
      }
    } else {
      return element.children();
    }
  }
  getTreeItem(element: ProjectTokens | Token | TokenLocation): TreeItem {
    return element;
  }
  getParent?(element: ProjectTokens | Token | TokenLocation): ProviderResult<ProjectTokens | Token | TokenLocation> {
    throw new Error("Method not implemented.");
  }
}
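To see the calling pattern in isolation, here is a small sketch that needs no VSCode at all. DemoItem, DemoBranch, DemoProvider and renderTree are stand-ins invented for this example and are not the vscode API; the walker plays the part of the host. Two simplifications are assumed: children are returned as plain arrays rather than promises, and the whole tree is walked eagerly, whereas the real view only asks for children when a node is expanded.

```typescript
// Stand-ins for the vscode types; invented for this sketch.
interface DemoItem {
  label: string;
  children(): DemoItem[];
}

class DemoBranch implements DemoItem {
  label: string;
  private kids: DemoItem[];

  constructor(label: string, kids: DemoItem[]) {
    this.label = label;
    this.kids = kids;
  }

  children(): DemoItem[] {
    return this.kids;
  }
}

class DemoProvider {
  private roots: DemoItem[];

  constructor(roots: DemoItem[]) {
    this.roots = roots;
  }

  // No element means "give me the top-level items";
  // otherwise the element is asked for its own children.
  getChildren(element?: DemoItem): DemoItem[] {
    return element ? element.children() : this.roots;
  }
}

function indent(depth: number): string {
  let pad = "";
  for (let i = 0; i < depth; i++) {
    pad += "  ";
  }
  return pad;
}

// Roughly what the host does: ask for the roots, then recurse into each item.
function renderTree(provider: DemoProvider, element?: DemoItem, depth: number = 0): string[] {
  let lines: string[] = [];
  for (const child of provider.getChildren(element)) {
    lines.push(indent(depth) + child.label);
    lines = lines.concat(renderTree(provider, child, depth + 1));
  }
  return lines;
}

// The same hacked-in data as above, under a made-up folder name.
const demoTree = new DemoProvider([
  new DemoBranch("demo-project", [
    new DemoBranch("List", [new DemoBranch("46.2", []), new DemoBranch("53.1", [])]),
    new DemoBranch("Map", [new DemoBranch("15.7", []), new DemoBranch("28.9", [])]),
  ]),
]);
const renderedLines = renderTree(demoTree);
```

Here renderedLines holds seven entries, one per node, each indented by two spaces per level of nesting.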

The back end

Now it's time to turn our attention to the back end and how we can generate this data on the server side and pass it across. You may recall (or, if you're more like me, may not - I had to go and look at the code) that the communication between client and server is handled by a language client (called client in extension.ts).

Now, the key to doing this is to be able to send a command across from the client to the server when we would otherwise generate the data ourselves - i.e. in the constructor of the data provider. But what message should we send? The documentation doesn't seem wildly clear on this, but it would seem that we can repurpose executeCommand for this purpose. The documentation states that "usually" this will return an applyEdit message, but it seems entirely possible that we could return something else - maybe "hello, world"? Let's give it a try, shall we?

First things first: we need to pass the client to the token provider, and that in turn means that we need to move the declaration down to the bottom. There are a couple of other caveats here: because we are going to call a method in client, we need to wait to make sure that it has initialized, so we attach this code to its onReady method. And because this call is asynchronous, we will want to call await, so it cannot be located in the constructor, so we need to move it out into a separate method. Note that because of this, the constructor could be outside the onReady callback if you needed to store it somewhere; I don't, and it looks neater to me all grouped together.

  client.onReady().then(() => {
    const tokensProvider = new TokensProvider();
    tokensProvider.loadTokens(client);
    window.registerTreeDataProvider('ignorantTokens', tokensProvider);
  });

Inside the TokensProvider, we need to call sendRequest, specifying that we want an ExecuteCommandRequest and providing a unique string as the command identifier along with any arguments we might want (in this case the URI of the workspace folder). When we get the response back, we need to process it, but this is left as an exercise for the reader.

  async loadTokens(client: LanguageClient) : Promise<void> {
    if (vscode.workspace.workspaceFolders == null)
      return;
    for (var wf=0;wf<vscode.workspace.workspaceFolders.length;wf++) {
      let uri = vscode.workspace.workspaceFolders[wf].uri.toString();
      const result = await client.sendRequest(ExecuteCommandRequest.type, {
        command: 'java.lsp.requestTokens',
        arguments: [ uri ]
      })
      // this.locations.push(new ProjectTokens(vscode.workspace.workspaceFolders[wf], result));
    }
  }
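As a hint for that exercise, here is one way the response could be checked before it is turned into tree items. The wire shape is purely an assumption: the sketch imagines the server returning a JSON array of objects with a name and a list of location strings, which is not necessarily what the real server sends. WireToken, isWireToken and parseTokens are hypothetical names.

```typescript
// Assumed wire shape (hypothetical, not taken from the real server).
interface WireToken {
  name: string;
  locations: string[];
}

// Runtime check, because the response arrives as untyped JSON.
function isWireToken(value: unknown): value is WireToken {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as { name?: unknown; locations?: unknown };
  if (typeof candidate.name !== "string" || !Array.isArray(candidate.locations)) {
    return false;
  }
  for (const loc of candidate.locations) {
    if (typeof loc !== "string") {
      return false;
    }
  }
  return true;
}

// Anything that is not an array yields no tokens; malformed entries are skipped.
function parseTokens(result: unknown): WireToken[] {
  if (!Array.isArray(result)) {
    return [];
  }
  const tokens: WireToken[] = [];
  for (const entry of result) {
    if (isWireToken(entry)) {
      tokens.push(entry);
    }
  }
  return tokens;
}
```

With this in place, a response such as the string "hello, world" produces an empty list instead of an error, and each surviving entry could be mapped onto a Token with its TokenLocation children.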

On the back end, we need to make two changes. First, we need to say that we now support executeCommand and to list the unique command identifiers we support. Then, in the WorkspaceService, we need to provide a trivial implementation of the command execution - in this case, returning "hello, world".

        ServerCapabilities capabilities = new ServerCapabilities();
        ExecuteCommandOptions requestTokens = new ExecuteCommandOptions(Arrays.asList("java.lsp.requestTokens"));
        capabilities.setExecuteCommandProvider(requestTokens);
        return new WorkspaceService() {
          @Override
          public CompletableFuture<Object> executeCommand(ExecuteCommandParams params) {
            System.err.println("execute command called for " + params.getArguments().get(0));
            return CompletableFuture.completedFuture("hello, world");
          }

I thought that was all I needed to do, but it turns out that it's important to take the final step of actually updating the list, because when we do, we need to make sure that we send a notification that the tree has changed.

I had commented earlier about the Event object that was part of the interface, and that I wasn't quite sure what to do about it. The answer would appear to be an idiom: you define another, similar object called an EventEmitter, wire up the Event to be the event field of this, and then call the fire() method on the EventEmitter when you want the tree to be updated.

      this.locations.push(new ProjectTokens(vscode.workspace.workspaceFolders[wf], result));
    }
    this._onDidChangeTreeData.fire();
  }

  private _onDidChangeTreeData: vscode.EventEmitter<ProjectTokens | Token | TokenLocation | null | undefined> = new vscode.EventEmitter<ProjectTokens | Token | TokenLocation | null | undefined>();
  readonly onDidChangeTreeData: vscode.Event<ProjectTokens | Token | TokenLocation | null | undefined> = this._onDidChangeTreeData.event;
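The relationship between the two objects is perhaps easier to see with a toy version. MiniEmitter below is a stand-in written for this sketch and is not vscode.EventEmitter: the real event returns a Disposable so that listeners can unsubscribe, which is left out here. DemoTokensProvider and its load method are likewise hypothetical.

```typescript
type ChangeListener<T> = (value: T) => void;

// Minimal stand-in for vscode.EventEmitter, just enough to show the idiom.
class MiniEmitter<T> {
  private listeners: ChangeListener<T>[] = [];

  // The "event" is nothing more than a function that registers a listener.
  event = (listener: ChangeListener<T>): void => {
    this.listeners.push(listener);
  };

  // Firing calls every registered listener in turn.
  fire(value: T): void {
    for (const listener of this.listeners) {
      listener(value);
    }
  }
}

class DemoTokensProvider {
  private changeEmitter = new MiniEmitter<undefined>();
  // Exposed so that the host (the view) can subscribe to changes.
  readonly onDidChangeTreeData = this.changeEmitter.event;
  labels: string[] = [];

  load(labels: string[]): void {
    this.labels = labels;
    // Tell subscribers that the data changed and should be queried again.
    this.changeEmitter.fire(undefined);
  }
}

// The host side: subscribe once, then count how often a refresh is requested.
let refreshCount = 0;
const demoProvider = new DemoTokensProvider();
demoProvider.onDidChangeTreeData(() => { refreshCount++; });
demoProvider.load(["List", "Map"]);
```

So the provider keeps the emitter private and only hands out the event; the view subscribes through the event, and the provider decides when to fire.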

In order to support this, I had to change "hello, world" to be an empty JSON array:

          public CompletableFuture<Object> executeCommand(ExecuteCommandParams params) {
            System.err.println("execute command called for " + params.getArguments().get(0));
            return CompletableFuture.completedFuture(new JsonArray());
          }

Exercise

As I always go back and read my own work, I also have to follow through on the exercise of wiring up the dots. I'm not going to bore you with the details here, but you can check it out in the repository (tag VSCODE_SIDEVIEW_EXERCISE).

Conclusion

It is possible to create side views in VSCode and populate them from a repository of information that can be found on the server side.

Late in the game, I also came across https://code.visualstudio.com/api/extension-guides/tree-view which contains explanations for some of the things I found confusing while doing this work. It might be worth a read.

Monday, February 20, 2023

Playwright and FBAR forms

For those fortunate enough not to know, an FBAR is a form required by the US Government for all US Persons who have assets worth more than $10,000 in another country. And once you have more than that, they want all the details, no matter how big or small. And to make matters worse, while they will allow you to fill in your personal details just once, if you have a joint account, you need to identify that individual for each and every jointly held asset.

And you need to do this every year, even though almost no information changes from year to year.

So, for a long time, I have wanted to automate filling this form in. Normally, I download the PDF version and complete it. Two years ago, I tried to see if I could automate that using the PDFBox tool, but for various reasons that did not work. But last year, I discovered that there is also an online version of the form, and a few weeks ago I discovered the Playwright Chrome Driver library. So …

Let's download Playwright

One of the cool things about Playwright (from my perspective) is that it has an API in Java. So I'm going to use that. In order to build everything, I'm going to use gradle since that seems quite common these days, and so to start with I'm going to have this build.gradle file:

plugins {
    id 'java'
    id 'application'
}

mainClassName = 'ignorance.FBAR'

repositories {
    jcenter()
}

dependencies {
  implementation 'com.microsoft.playwright:playwright:1.30.0'
}

task copyToLib(type: Copy) {
    from configurations.default
    into "$buildDir/output/lib"
}

Following along from the documentation, the first step is to create a central Playwright instance and then use that to open a browser window. I tend to use Chrome, so that's what I'm doing here, but you can also use Webkit or Firefox.

package ignorance;

import com.microsoft.playwright.Browser;
import com.microsoft.playwright.Page;
import com.microsoft.playwright.Playwright;

class FBAR {
  public static void main(String[] argv) {
    try (Playwright playwright = Playwright.create()) {
          Browser browser = playwright.webkit().launch();
          Page page = browser.newPage();
          page.navigate("http://whatsmyuseragent.org/");
    }
  }
}

(In order to get this to work with eclipse, at least for me, I needed to copy the files into a local directory, for which I created the gradle task copyToLib, ran it, and then added the copied JAR files to my classpath).

When I first ran this, it downloaded a whole bunch of files, which appear to be the browsers it supports. And then, to my (not) very great surprise, it threw an exception:

Caused by: com.microsoft.playwright.impl.DriverException: Error {
message='Target closed
name='Error
stack='Error: Target closed

Now, I have basically no idea what this means, but I'm going to assume I haven't set something up correctly. Before panicking though, I'm going to try it again. No, still no joy.

Reviewing the code, I realized I was a little over-zealous with my copying and, instead of launching chromium as planned, I launched webkit. I'm not really sure what that does, or what browser it would use, but changing it to chromium certainly solves the problem.

And I also want to see what I am doing, so I have added the headless-off and slowmo options to the launch configuration. And so we have the following:

      Browser browser = playwright.chromium().launch(new BrowserType.LaunchOptions().setHeadless(false).setSlowMo(50));
      Page page = browser.newPage();
      page.navigate("https://bsaefiling1.fincen.treas.gov/lc/content/xfaforms/profiles/htmldefault.html");

Filling in some fields

So, since I don't know what I am doing, I am going to just start randomly filling in the form (my overall plan is to pull all the info from my personal records, probably through a JSON intermediary) using the most stable references I can find. If you haven't pulled the link from the code, the form I'm filling in is here. And obviously, I have this open in a regular Chrome window with the inspector on so I can find things in it.

It seems to me the best way of identifying the first field - Email Address - is to use the div with class EmailAddress and then find the input within that. So let's do that and give my official email address - mickey.mouse@disney.com.

page.fill("div.EmailAddress input", "mickey.mouse@disney.com");

OK, that didn't work. It loaded the form, and then paused for a long while (30s to be precise) before giving up and telling me it couldn't find the div:

Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for locator("div.EmailAddress input")
============================================================

And, after mature reflection, I realized that the div class is actually Email, not EmailAddress, so let's try that again:

page.fill("div.Email input", "mickey.mouse@disney.com");

Indeed, that does work. So let's quickly fill in the rest of the details in the form and check in.

      page.fill("div.Email input", "mickey.mouse@disney.com");
      page.fill("div.ConfirmedEmail input", "mickey.mouse@disney.com");
      page.fill("div.FirstName input", "Mickey");
      page.fill("div.LastName input", "Mouse");
      page.fill("div.PhoneNumber input", "770-555-1234");

The next thing to do is to "start" filling in the form. I'm not quite sure why those first fields don't count as filling in the form. This involves pushing the "Start FBAR" button, which is the click action on the page.

But, as I go to look at the documentation for this, I discover that both fill and click on the Page are now discouraged in favour of locators, so I am going to digress for a moment into refactoring to use these.

It would seem that this is an attempt to abstract away CSS selectors, and this makes a lot of sense to me. It's more typing, to be sure, but we should never be afraid of trading typing for reliability and correctness.

Interestingly, in doing this, it turns out that there are a number of "duplicate" entries in the form. In particular, because it appears that it just matches some of the text you provide, the phrase "Enter your email address" also matches the confirmation message. To clarify, it is necessary to add .setExact(true) to the end. But I have to say, the error message is exceedingly helpful and clear:

Error: strict mode violation: getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your email address.")) resolved to 2 elements:
1) <input type="text" class="_O" name="Email_5" placehold…/> aka getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your email address.").setExact(true))
2) <input type="text" class="_O" placeholder="" maxlength…/> aka getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Re-enter your email address."))
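The matching rule behind that error can be sketched in a few lines. This is a simplification of what Playwright documents for the name option (case-insensitive substring match by default, case-sensitive whole-string match when exact is set), and it ignores details such as whitespace normalization. It is written in TypeScript to match the other sketches on this page rather than in Java, and matchesName and countMatches are hypothetical helpers, not Playwright functions.

```typescript
// Simplified model of how a role locator compares accessible names.
function matchesName(accessibleName: string, query: string, exact: boolean): boolean {
  if (exact) {
    // Exact mode: the whole string must match, including case.
    return accessibleName === query;
  }
  // Default mode: case-insensitive substring match.
  return accessibleName.toLowerCase().indexOf(query.toLowerCase()) !== -1;
}

// The two accessible names reported in the error message above.
const fieldNames = ["Enter your email address.", "Re-enter your email address."];

function countMatches(query: string, exact: boolean): number {
  let count = 0;
  for (const fieldName of fieldNames) {
    if (matchesName(fieldName, query, exact)) {
      count++;
    }
  }
  return count;
}

const looseCount = countMatches("Enter your email address.", false); // 2: both fields match
const exactCount = countMatches("Enter your email address.", true);  // 1: only the first field
```

The loose query matches both fields because "Re-enter your email address." contains "enter your email address." once case is ignored, which is exactly the strict mode violation reported above.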

So, with the refactoring done and the click() added, let's check in again:

      page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your email address.").setExact(true)).fill("mickey.mouse@disney.com");
      page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Re-enter your email address.")).fill("mickey.mouse@disney.com");
      page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your first name.")).fill("Mickey");
      page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your last name.")).fill("Mouse");
      page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Enter your telephone number. Do not include formatting such as spaces, dashes, or other punctuation.")).fill("770-555-1234");

      page.getByRole(AriaRole.BUTTON, new Page.GetByRoleOptions().setName("Please click this button to begin preparing your FBAR.")).click();

So now we can quickly fill in the next few fields. I am hoping that I will never file this form late again (because it will be so easy when I have this script working!) but in the past few years I have struggled because they moved the deadline from June 30 to April 15 (to align with the US tax year). And I keep forgetting that. So, for the purposes of this blog, I will choose a reason and provide an explanation.

page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Filing name")).fill("Mouse FBAR 2022");
page.getByRole(AriaRole.COMBOBOX, new Page.GetByRoleOptions().setName("reason")).selectOption("A");
page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Explanation")).fill("I keep forgetting the deadline has changed.");

Once again, it fails. And once again, playwright's exception message is very clear:

Timeout 30000ms exceeded.
=========================== logs ===========================
waiting for getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Explanation"))
locator resolved to <textarea class="_k" placeholder="" maxlength="750" tabin…></textarea>
elementHandle.fill("I keep forgetting the deadline has changed.")
waiting for element to be visible, enabled and editable
element is not enabled - waiting...
============================================================

The element is not enabled. Checking by hand, it seems that "I forgot" is enough of an explanation and that you only need to provide an explanation if you choose "Other". I'm not that bothered, so I'm just going to comment all that lot out and move on.

page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Filing name")).fill("Mouse FBAR 2022");
// page.getByRole(AriaRole.COMBOBOX, new Page.GetByRoleOptions().setName("reason")).selectOption("A");
// page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("Explanation")).fill("I keep forgetting the deadline has changed.");

Filling in the joint assets

Right, well, so far, I don't think I've achieved anything very much. As my wife would say, "you could have done that by hand with a lot less typing". Fair enough. So let's skip to the interesting part of the operation (page 3 is just more information about the primary filer which only needs to be provided once). Parts II and III reflect accounts owned individually or jointly, and the forms can be duplicated by using the appropriate + button in the top right hand corner of the page. Now, as noted above, the main thing I want to avoid is providing my wife's details ten times on the ten copies of the page for each of the forms (I don't actually want to do any of it, but this is the thing that drives me crazy).

In order to show the important things, then, I'm going to create two classes now: Portfolio, which holds all my assets, and JointAsset, which holds the information about a joint asset. Because a joint asset shares most of its information with an individually held asset, I'm only going to give it two fields: an AccountInfo and an Asset; I'm going to reuse the former when I go back and fill in Part I.

In the fullness of time, I will extract all the information about the portfolio from its ultimate sources of truth; for now, I am just going to hack something together. Anyway, it's all in PortfolioLoader.java:

package ignorance;

public class PortfolioLoader {

  public Portfolio load() {
    Portfolio ret = new Portfolio();
    AccountInfo me = new AccountInfo();
    AccountInfo other = new AccountInfo();
    ret.user(me);

    ret.joint(new JointAsset().jointWith(other).setMaximumValue(10000).setType("A"));
    return ret;
  }

}

All the other classes I created are just boring POJOs, although you could think of them as DTOs between the two systems (the loader and the form-filler).

For now, we are just going to try and load one account. This should not be too difficult. Having said that, we are going to build it as if we are loading multiple accounts and just throw an error if we reach the second.

So, we start by doing the obvious thing:

page.getByRole(AriaRole.TEXTBOX, new Page.GetByRoleOptions().setName("*15")).fill(Integer.toString(joint.getMaximumValue()));

which should identify the maximum account value field, but in fact, there are four of them (one in each of sections II, III, IV and V). So that doesn't work that well. We need some means of distinguishing them.

Looking through the structure, there is a div with a class subform Part3 and that would seem enough of a distinction. Note that although we would prefer to use a nice, stable mechanism for identifying the fields, in a pinch it is still possible to use a selector. So let's do that now.

Very good. That works. Let's check in again.

      boolean first = true;
      for (JointAsset joint : portfolio.joints()) {
        if (!first) {
          throw new RuntimeException("Not implemented");
        }
        first = false;

        Locator mypage3 = page.locator("div.subform.Part3");
        mypage3.getByRole(AriaRole.TEXTBOX, new Locator.GetByRoleOptions().setName("*15")).fill(Integer.toString(joint.getMaximumValue()));
        mypage3.getByRole(AriaRole.COMBOBOX, new Locator.GetByRoleOptions().setName("*16")).selectOption(joint.getType());
      }
      Thread.sleep(10000);

So the one remaining thing I'm interested in experimenting with before I get serious and start integrating things is to try adding a second page for a second asset. Adding the second asset to PortfolioLoader is easy enough:

    ret.joint(new JointAsset().jointWith(other).setMaximumValue(10000).setType("A"));
    ret.joint(new JointAsset().jointWith(other).setMaximumValue(20000).setType("B"));
    return ret;

which is all fine and dandy until we reach the exception we included earlier for the "more than one" case. Now we need to go back and handle that.

The first thing to do is to add another page by clicking on the "+" button. This has "+" as its aria label, so we can do this quite easily:

      for (JointAsset joint : portfolio.joints()) {
        Locator mypage3 = page.locator("div.subform.Part3");
        if (!first) {
          mypage3.getByRole(AriaRole.BUTTON, new Locator.GetByRoleOptions().setName("+").setExact(true)).click();

which works first time, but then puts us in the situation where it cannot resolve which of the two "Maximum Value" entries it should be considering:

Error: strict mode violation: locator("div.subform.Part3").getByRole(AriaRole.TEXTBOX, new Locator.GetByRoleOptions().setName("*15")) resolved to 2 elements:
1) <input class="_s" value="10000" type="numeric" placeho…/> aka locator("input[name=\"MaxAcctValue_137\"]")
2) <input class="_s" type="numeric" placeholder="" tabind…/> aka locator("input[name=\"MaxAcctValueCL_1676554053694\"]")

It turns out that the Locator abstraction is happy to contain one or many potential DOM nodes, until you decide to do something with them - then it complains about the fact that it cannot choose. But we can easily force that choice using the last() operator (we could use first() or nth(), but since the new forms are added at the end, last() is what is wanted).

So now we have this code:

        Locator mypage3 = page.locator("div.subform.Part3");
        if (!first) {
          mypage3.getByRole(AriaRole.BUTTON, new Locator.GetByRoleOptions().setName("+").setExact(true)).click();
          mypage3 = mypage3.last();

Conclusion

In the space of this blog post, I have convinced myself that Playwright is a reasonable tool for interacting with websites. Hopefully after the conclusion, I will be able to go on and build a tool to import JSON files and fill out FBARs. This will save me a headache for years to come - and may be of use to others too! The repository will be updated to include the final version, even though it is not shown here.

Addendum

In the process of working through the rest of the features, I discovered a number of wrinkles that needed to be resolved.

Selecting the country on this form is a little tricky, as it has an aria-label which is the empty string. I don't think there is anything that can capture that. I solved this problem by instead selecting based on a good, old-fashioned CSS locator. It's not as elegant, but it does the trick.

mypage3.locator("div.partSub div.choicelist.Country select").selectOption(joint.getCountry());

When filling out the address of the owners, it turns out that there is JavaScript logic that connects the country to the list of states. Before a state can be selected, this logic must be run. For whatever reason, this happens in real life but not in Playwright. A little bit of googling suggested that a "blur" event was necessary. (By the way, I discovered the monitorEvents operation in the Chrome Developer Tools while I was doing this: check it out).

with.getByRole(AriaRole.COMBOBOX, new Locator.GetByRoleOptions().setName("33")).selectOption(other.getCountry());
with.getByRole(AriaRole.COMBOBOX, new Locator.GetByRoleOptions().setName("33")).dispatchEvent("blur");

Early on, I had tested that I could add additional pages to the document. What I had not considered was that this would add additional + buttons. So when I came to add a third asset, it could not tell which + button to push. A simple last() fixed that problem:

mypage3.getByRole(AriaRole.BUTTON, new Locator.GetByRoleOptions().setName("+").setExact(true)).last().click();

At the end of the operation, I manually sign and submit the form - which gives me an opportunity to download what I have submitted. Sadly, this download ended up somewhere in the ether (possibly just in memory) where I could not find it. Before next year, I need to figure this out and save it responsibly to somewhere I can keep records of it.

Thursday, January 5, 2023

Lambda Snapstart is Harder than I Thought

Apologies, but, somewhat in violation of my own rules, I don't have any actual working example code. That's because this is more complex than usual and the number of moving parts was just too vast. But if you want any help or conversation about this, drop me a line and I'll do/share whatever I can.

When I saw that AWS had released a "snapstart" feature for Lambda, I was ecstatic. I have taken to using Lambda as a way of delivering servers with minimum fuss, but I have somewhat abused the technology by basically moving my existing server-based code into a lambda, along with its long start time.

I grew up in a world where the logic has always been that what you want to optimize is the time spent doing frequent operations; initialization time is effectively "free" (up to the point where it becomes minutes or hours: at which point you really have to do something about it). Lambda, on the other hand, says you have 10s: if you fail to complete in this time, you will be rejected and the whole thing starts again. When you are trying to configure multiple things with multiple services … it doesn't quite make it.

I finally went over the edge when I found that in order to support the latest version of JavaFX, I needed to copy a whole bunch of ".so" files from S3 to the "local disk". This was taking pretty much all the 10s … As I started to consider my options (I'd got as far as panicking, and decided that was not very productive), AWS announced the "SnapStart" feature: initialize once, use repeatedly. Excited, I turned it on for my functions (so excitedly, I just did it in the console, rather than using CloudFormation; more on that later).

And nothing happened. Or, at least, I continued to have problems. But why?

It only works on published functions

Uh, boss, it's not as simple as that. On the upside, it is very clear that the 10s timeout does not apply to SnapStart functions - you have up to 900s to be used when the function is published. On the downside, this implies and, indeed, it is actively stated that you need to publish your functions in order to take advantage of this. Except my test environment does not bother with publishing and versioning lambdas, so I still want to bring that initialization time down.

The Runtime Hooks

Before doing that, I decided to at least try and integrate the runtime hooks and issue tracing messages. My thought process was firstly to see if these were called when the lambda wasn't published, and even if they weren't, I would then at least know when I had successfully published the lambda, because my tracing would come out.

In order to turn on the runtime hooks, it is necessary to download the CRAC library and attach this to your project. The AWSHandler then needs to implement the CRAC Resource interface, and register with the CRAC global context. So I added code like this:

public AWSHandler() {
  // Register this handler so that CRaC knows to call the hooks below
  Core.getGlobalContext().register(this);
}

and then it's a simple matter of implementing the callbacks provided in the Resource interface:

@Override
public void beforeCheckpoint(org.crac.Context<? extends Resource> arg0) {
  logger.info("before checkpoint");
}

@Override
public void afterRestore(org.crac.Context<? extends Resource> arg0) {
  logger.info("after restore");
}

When It Works …

I added the relevant code to publish and alias my lambdas, and also to ensure that the code inside APIGateway called the alias (and thus the published version), rather than the "$LATEST" version, and, lo and behold, it all worked.

During the publication, this happens:

INIT_START Runtime Version: java:11.v15 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:0a25e3e7a1cc9ce404bc435eeb2ad358d8fa64338e618d0c224fe509403583ca
Picked up JAVA_TOOL_OPTIONS: -Dui4j.headless=true
-Dglass.platform=Monocle
-Dmonocle.platform=Headless
-Dprism.order=sw
-Djavafx.cachedir=/tmp/solibs
20230104-13:06:48.640 tdaserver/Thread-0 INFO: In config

The key thing here being the "INIT_START" rather than just "START". Interestingly, it doesn't seem to issue any message when the initialization is done, it just stops issuing messages.

And then, when the lambda is called during API Gateway access, I see this:

RESTORE_START Runtime Version: java:11.v15 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:0a25e3e7a1cc9ce404bc435eeb2ad358d8fa64338e618d0c224fe509403583ca
RESTORE_REPORT Restore Duration: 383.49 ms
START RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa Version: 1
…
END RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa
REPORT RequestId: c55dfaca-70b4-4a7e-8e4e-b6fc920269aa Duration: 785.60 ms Billed Duration: 1044 ms Memory Size: 1024 MB Max Memory Used: 401 MB Restore Duration: 383.49 ms

Here the RESTORE_START and RESTORE_REPORT make it clear that a SnapStart image is being used, and how much time has been used to literally start the lambda (383ms may seem a lot, until you realize that it was over 15000ms to actually do the initialization).

After that, the lambda proceeds in the normal way.

No CRAC Output

Interestingly, up to this point, I have not seen any of the tracing output I would expect from my CRAC callback. I don't know whether I didn't succeed in registering correctly, or whether my tracing is simply not coming out. In the fullness of time, I will need to sort this out because it is necessary to check that all the initialization that has been done up to this point is up to date.

Configuring from CloudFormation

As a bleeding-edge adopter, when I first tried to use SnapStart, there wasn't any active CloudFormation documentation on using it. For all I know, it wasn't supported in CloudFormation. However, now, a couple of months later, all the relevant documentation is there.

As you'd expect, you configure this by adding a SnapStart property to your Lambda Function configuration. It is quite simple, and basically amounts to adding:

"SnapStart": {
  "ApplyOn": "PublishedVersions"
}

to your existing function declarations.

Conclusion

For a long time, Lambda on AWS with Java has been plagued by painfully slow startup times. It does seem that SnapStart makes major strides towards addressing these and, provided you are publishing your lambdas, is relatively easy to set up.

On the other hand, it seems somewhat opaque to use.

Friday, October 30, 2020

PWA Notifications

Notifications are both more interesting and more complex than adding to the home screen.

"Push notifications" are, in fact, more a browser technology than they are a web app technology. This makes them even more browser specific than adding to the home screen. I generally develop with Chrome, and for my personal projects that's all I consider, so that's as far as I'm going to go in this article: Chrome uses a cloud module called "Firebase Cloud Messaging". However, it's my understanding that on the desktop, Firefox support for push notifications uses a similar cloud service called "autopush". It is my understanding that Edge and Safari on the desktop do not support push notifications at all. Mobile browsers are a different story again.

On the plus side, I believe that wherever they are available, these technologies are all compatible at the API level - at least as far as I describe it here. Your mileage may vary …

The basic paradigm

The basic paradigm for push notifications is that there is "special, magic infrastructure" that can deliver messages from a server to a client through the cloud and the browser. For Chrome, this is called "Firebase Cloud Messaging". As far as I can tell, it is only part of "Firebase" in a branding sense: you don't have to create an account and a project to use the cloud messaging service.

I have attempted to draw out what I think the basic architecture is here:

The browser loads the application from the server and enters into the "usual" two way interaction with it, possibly including Ajax or Websocket interactions. During this interaction, a "magic" key (called a VAPID key) is generated on the server and passed to the client. It uses this key to register its interest in push notifications with the browser. The browser in turn, magically and under the covers, notifies the cloud messaging service of this registration. The client receives a unique and somewhat persistent handle which it can pass to the server as it sees fit - the server needs this in order to be able to send messages to the client.

When the web server wants to send a push notification to the client, it contacts the cloud messaging service, passing it the appropriate request signed with the private portion of the VAPID key. The messaging server then looks up the corresponding registration(s) and delivers the messages to the corresponding browser(s). These, in turn, deliver the message to the service worker thread(s) of the appropriate web application(s), waking them up if necessary and displaying some notification to the user.

Security and Permissions

Everything to do with "the modern web" seems wrapped up in security and permissions. Sadly, with the number of bad actors out there, this is just a fact of life. Three separate processes go on in this context: firstly, as indicated above, there is a public/private keypair generated to ensure that the server sending the message corresponds to the clients wishing to receive messages; secondly, before any messages can be sent, the client must obtain permission from the user of the browser for push notifications to happen; and thirdly, when messages are transmitted with content (or "payload") the content must be encrypted from end to end which is handled by generating a shared key with the subscription.

The guidelines suggest that users should be encouraged to opt in to push notifications by taking a concrete action to enable them. Based on the number of websites I visit where the first thing you see is the message saying that the site would like to send push notifications, this does not seem to be widely adhered to. We will, of course.

Sending Messages

Sending messages theoretically requires a server, but because we are trying to just do this using a static website, we are going to take advantage of a command line tool to send notifications. There are tools of this kind for most languages it would seem, but I have chosen to install a Node.js version.

This can be installed by running

npm install -g web-push

This installs the command globally which, if you have npm correctly configured, means you should now be able to run the web-push command from your command line.

$ web-push
Usage: …

But before we can send any messages, we need to create a keypair. This is the basic security mechanism to ensure that all messages are sent by an approved party. In principle, the server generates the keypair and retains the private key, shipping the public key to the application somehow.

The private key in this context is a signing key: that is, it is used to provide a signature for the message that it sends and the public key can then be used to check that the signature is valid. Invalid messages are rejected.

This is done using the web-push command with the generate-vapid-keys subcommand:

$ web-push generate-vapid-keys
Public Key:
BNX8bG8mNTIJmXai9k35J5CKB2Wyc8kZoJS9Y31qkfUSfiQr7q22vDe5CHCxUclvpl1gEVAewVoINOvFlFFl4
Private Key:
7XXms7NXIM-FuCrxVzoQqlLYz3kuYpdftzL5Dz_LI
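
To make the signing idea concrete, here is a minimal sketch using Node's built-in crypto module. It is only an illustration of "sign with the private key, check with the public key": it is not what web-push does internally (there the signature is wrapped up in a signed token), the keypair is a throwaway one generated on the spot rather than the one printed above, and the function names sign and isValid are my own.

```javascript
const crypto = require("node:crypto");

// VAPID keys live on the P-256 curve, which Node calls "prime256v1".
// This is a throwaway keypair, purely for illustration.
const { publicKey, privateKey } = crypto.generateKeyPairSync("ec", {
  namedCurve: "prime256v1",
});

// The sender signs the message with the private key ...
function sign(message) {
  return crypto.sign("sha256", Buffer.from(message), privateKey);
}

// ... and anybody holding the public key can check the signature.
function isValid(message, signature) {
  return crypto.verify("sha256", Buffer.from(message), publicKey, signature);
}

const signature = sign("hello, world");
console.log(isValid("hello, world", signature)); // true
console.log(isValid("hello, w0rld", signature)); // false: message was altered
```
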

We can now send a message using the send-notification subcommand:

web-push send-notification --vapid-pubkey="BNX…" --vapid-pvtkey="7XXX...LI" --vapid-subject="mailto:ignorant@blogspot.com" --payload='hello, world'
Usage:
web-push send-notification --endpoint=<url> [--key=<browser key>] [--auth=<auth secret>] [--payload=<message>] [--ttl=<seconds>] [--encoding=<encoding type>] [--vapid-subject=<vapid subject>] [--vapid-pubkey=<public key url base64>] [--vapid-pvtkey=<private key url base64>] [--gcm-api-key=<api key>]

The problem is that we haven't specified an --endpoint - where to send the message. In order to send a payload, --key and --auth are also required. Fortunately, we can obtain all three at once by the simple expedient of subscribing in our web app.

Subscribing to Push Messages

In order to get the endpoint - and the end-to-end encryption keys in order to send a payload - we need to subscribe on the client. As I said above, the "polite" way of doing this is to add a button to your web page to enable the user to "ask" for notifications.

Since this depends on having a registered service worker, by default this button should be invisible and only be displayed when the service worker has been registered. In the registration callback, it can then be displayed until such time as it is clicked (or otherwise dismissed). Of course, as with everything else, this doesn't have to be a button per se but can be any kind of user affordance which indicates a deliberate intent to subscribe.

The button then needs an event listener which actually does the subscription like so:

var options = {
  userVisibleOnly: true,
  applicationServerKey: applicationServerKey
};
registration.pushManager.subscribe(options)
  .then(function(sub) {
    console.log("subscribed to", sub.endpoint);
    var simple = JSON.parse(JSON.stringify(sub));
    console.log("auth", simple.keys.auth);
    console.log("key", simple.keys.p256dh);
  });

When you click the button, it turns around and asks the registration object obtained from registering the service worker to subscribe using a set of options. The applicationServerKey is the public VAPID key used by the server. This is defined at the top of start.js; if you want to run this example, you will need to replace that value with the one you generated above. The userVisibleOnly flag is one that says that when we send a message we will alert the user that we have done so. Our current code in fact does not do this; instead, the browser will (sometimes?) display an automatic notification on our behalf to say that messages have been received.
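
Depending on the browser, applicationServerKey may need to be passed as bytes rather than as the base64url string that web-push prints. A commonly used helper for that conversion looks something like the following. Treat it as a sketch: the function name is my own choice, and it assumes atob is available (which it is in browsers and in recent versions of Node).

```javascript
// Convert a base64url string (as printed by web-push) into a Uint8Array.
function urlBase64ToUint8Array(base64Url) {
  // base64url drops the trailing "=" padding, so put it back first
  const padding = "=".repeat((4 - (base64Url.length % 4)) % 4);
  // then map the URL-safe alphabet back onto the standard one
  const base64 = (base64Url + padding).replace(/-/g, "+").replace(/_/g, "/");
  const raw = atob(base64);
  const bytes = new Uint8Array(raw.length);
  for (let i = 0; i < raw.length; i++) {
    bytes[i] = raw.charCodeAt(i);
  }
  return bytes;
}

console.log(urlBase64ToUint8Array("AQID")); // bytes 1, 2, 3
```

If it turns out to be needed, the options would use applicationServerKey: urlBase64ToUint8Array(applicationServerKey) instead of passing the string directly.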

The result of calling subscribe is a subscription object, returned through a promise, which contains a new endpoint describing this application in this browser on this machine. Note that this subscription is somewhat persistent: if you run this code multiple times, you will get the same value over and over. Obviously on a different browser or a different device, you will get a different code.

The endpoint also automatically contains everything you need to know to send messages - as a URI, it has within it the server that is capable of sending messages to this browser.

The auth and key values are the values we need to use to encrypt the payload for end-to-end transmission.

Receiving Messages

Turning to the service worker, we need to handle messages when they arrive. This is done by listening for the push event.

self.addEventListener('push', function(ev) {
  console.log("received push event", ev.data.text());
});

This is where most of your code will need to be placed, but for now this is enough to show something working end to end. Add the extra parameters to web-push send-notification and you should see messages come out in the console.

web-push send-notification --vapid-pubkey="BNX…" --vapid-pvtkey="7XXX...LI" --vapid-subject="mailto:ignorant@blogspot.com" --payload='hello, world' --endpoint="..." --auth="..." --key="..."

Displaying Notifications

As noted above, we are expected to display a user notification when these messages arrive. More than that, it is obviously useful to attract the user's attention, especially since the notification can be displayed when the application (and even the browser, they say) is not running.

It is easy to create a notification in the callback:

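self.addEventListener('push', function(ev) {
  console.log("received push event", ev.data.text());
  // showNotification returns a promise; waitUntil keeps the service worker
  // alive until the notification has actually been shown
  ev.waitUntil(self.registration.showNotification('New Message', {
    body: ev.data.text()
  }));
});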

Now, what if we want to have the user able to do something when this happens?

Handling Notifications

There is a notificationclick event that the service worker can handle. In this case, it is possible to take actions based on the notification arriving. This handler simply closes the notification and shows a message:

self.addEventListener("notificationclick", function(ev) {
  const notify = ev.notification;
  notify.close();
  var longOp = new Promise(function(resolve, reject) {
    console.log('notification was clicked');
    resolve();
  });
  ev.waitUntil(longOp);
});

although, for full disclosure, I deliberately showed the message inside a promise to show how that can be wired up to the notification handling mechanism.

It's possible to do much more than this and, in particular, it's possible to make sure that our whole app is woken up. There are examples of how to do this on the Google Developers' blog.
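
As a rough idea of what "waking the app up" could look like, here is a sketch of a click handler that focuses an existing window for the app if there is one, and otherwise opens a new one. This is my own illustration rather than code from the repository or from the Google examples: the focusOrOpen function is hypothetical, and it assumes the app lives at the root of the origin. It would replace the handler above rather than sit alongside it.

```javascript
// `clientsApi` is the service worker's global `clients` object; it is passed
// in as a parameter so the function can also be tried outside a browser.
async function focusOrOpen(clientsApi, url) {
  const windows = await clientsApi.matchAll({
    type: "window",
    includeUncontrolled: true,
  });
  for (const client of windows) {
    // Reuse a window that is already showing the app
    if (client.url === url && "focus" in client) {
      return client.focus();
    }
  }
  // Nothing suitable is open, so start a new window
  return clientsApi.openWindow(url);
}

// Only wire this up when actually running inside a service worker
if (typeof ServiceWorkerGlobalScope !== "undefined") {
  self.addEventListener("notificationclick", function (ev) {
    ev.notification.close();
    // Assumption: the app is served from the root of this origin
    ev.waitUntil(focusOrOpen(self.clients, self.location.origin + "/"));
  });
}
```
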

A Pattern for Using Notifications

For me at least, there is something of a mental model dissonance in using this push technology. I grew up on socket-based client-server architectures and then moved to event-driven computing with an event bus at TIBCO. From this perspective, the web always seems backward; the closest web technology is the WebSocket.

I think the right way to think about push notifications is to only use them when the server already feels ignored and has no other way of communicating with the app. When the user is actively working on the client, the server should just interact with the app directly and the user should see that.

The problem, of course, is knowing when the user is interacting with the client.

There may be a better way of knowing, but I think the simplest thing to do is just to use WebSockets for communication "when the app is active" and then allow the connection to time out (this generally seems to happen after about ten minutes) or deliberately close it after a few minutes of inactivity. Then, when the user next interacts with the app, restart the WebSocket connection. If the server wants to bring something new to the user's attention while the WebSocket connection is down, it sends a "push notification", which is displayed and can get the user interacting with the client again (or not, if they so choose).
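
The client half of this pattern can be sketched in a few lines. This is a minimal illustration, not tested against a real server: createIdleConnection, the three minute timeout and the wss://example.com/updates URL are all made up for the example, and the timer functions are injectable purely so that the logic can be exercised without waiting around.

```javascript
// Open a connection on first use, and close it after `idleMs` of inactivity.
// `openSocket` is any function that returns an object with a close() method.
function createIdleConnection(openSocket, idleMs, timers) {
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  };
  let socket = null;
  let handle = null;

  function close() {
    if (socket) {
      socket.close();
      socket = null;
    }
    handle = null;
  }

  // Call this on every user interaction: it (re)opens the socket if needed
  // and restarts the inactivity timer.
  function touch() {
    if (!socket) {
      socket = openSocket();
    }
    if (handle !== null) {
      t.clear(handle);
    }
    handle = t.set(close, idleMs);
    return socket;
  }

  return { touch: touch, isOpen: () => socket !== null };
}

// In the page, it might be wired up like this (hypothetical URL):
if (typeof document !== "undefined" && typeof WebSocket !== "undefined") {
  const conn = createIdleConnection(
    () => new WebSocket("wss://example.com/updates"),
    3 * 60 * 1000
  );
  ["click", "keydown"].forEach((name) => document.addEventListener(name, conn.touch));
}
```

While the socket is closed, the server has no direct route to the app, which is exactly the window in which it would fall back to a push notification.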

A Note on Terminology

I have found in this area that a lot of the terminology seems to be used loosely and inaccurately, but I'm not really sure what "accurate" would look like. Consequently, I've followed the herd and been loose and inaccurate. But here are my thoughts on how the terms seem to be used.

"Push" is a very vague, general term that seems to mean something along the lines of "initiated by the server". The idea seems to be akin to "unsolicited". For me, of course, the key concept is the idea that events happen "in the real world" and you want to be able to react to them. If the server sees the event, it is only reasonable that it lets the client know - and the client lets the user know.

"Messages" is a word I use a lot that means what it says. A message has been passed from somewhere to somewhere else. It's encoded in some way (separately from encryption) that has been agreed by both parties making it a sensible communication. Many people seem to use the phrase "message" or "push message" in the current context to mean the process of sending a message from the web server to the app via the cloud (the very name "Firebase Cloud Messaging" is such a usage).

"Notifications", I think, technically refer to just the final step of the journey: showing something to the user. This is common parlance in the Android world. "Push Notifications" seems to blur the meaning somewhat. Yes, it should end in a notification - as we saw, the API wants you to commit to user visibility - but it encompasses the full lifecycle of the message's travel.

"Subscription" describes the way in which the app connects to the cloud messaging service for a particular web app. This word is used in many different ways in other fields (especially within the publish/subscribe paradigm) but here it has a very specific meaning of a single web app on a single device in a particular browser.

Push from an Actual Web Server

We have used the command line to generate our messages which is obviously not realistic. From within a real web server it is possible to do exactly the same thing - the command line tool that we used simply spun up a node.js instance and used the server-side library.

As far as I can see, the GitHub "user" web-push-libs supports libraries for Node, Java, PHP, Python, C and C#. If you need something else, it is possible to work more directly with the REST API and talk directly to the endpoint.
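
For Node, the server-side equivalent of the command line above might look roughly like this. It is a sketch under assumptions: makePusher is a hypothetical wrapper of my own, and the web-push library is passed in as a parameter (rather than required directly) so that the snippet does not depend on it being installed. The two library calls it relies on are setVapidDetails and sendNotification.

```javascript
// `webpush` is the object you get from require("web-push").
// `vapid` holds the subject and the keypair generated earlier.
function makePusher(webpush, vapid) {
  webpush.setVapidDetails(vapid.subject, vapid.publicKey, vapid.privateKey);

  // `subscription` has the shape { endpoint, keys: { p256dh, auth } },
  // which is what JSON.stringify(sub) produced in the browser.
  return function push(subscription, message) {
    return webpush.sendNotification(subscription, message);
  };
}

// Usage in a real server (keys elided as in the rest of this post):
//   const push = makePusher(require("web-push"), {
//     subject: "mailto:ignorant@blogspot.com",
//     publicKey: "BNX…",
//     privateKey: "7XXX...LI"
//   });
//   push(subscriptionSentByTheClient, "hello, world");
```
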

Likewise, we have copied and pasted various items from the console to glue all of this together. A real application would need to use a technology such as AJAX or WebSockets to connect everything together.

All of that is left as an exercise to the reader.

Firebase

Given that this uses "Firebase Cloud Messaging" on Chrome, it may seem like a good idea to use Firebase. This may in fact be a good idea. But it seems to me that it adds a lot of complexity and moving parts - and I am unclear on the benefits.

Conclusion

Notifications are definitely harder than most of the other web technologies I've used. There are more moving parts than usual and connected together in different ways. But it is certainly possible to get something working in an hour or two if you know what you're doing.

I think I do now, and hopefully you do too!

Thursday, October 29, 2020

Adding to the Home Screen in PWA

Moving on from "just being a website", there are two things that most Progressive Web Apps want to do: be added to the home screen, and deliver notifications.

These are relatively easy to achieve, but possibly more arcane than I would like, not to mention being inconsistent: the treatment on different platforms is platform- and browser-specific. It would seem to me that Chrome on Android is the "gold standard" of what's supported, and everything else is either "inadequate", "on the way there" or "unsupported" depending on your perspective.

Tidying up from before

While everything seemed to work before, it nevertheless remained the case that Chrome felt a bit "picky" about what we'd done. There are a couple of warnings that tell you that something is up, but not really what.

Scope

Message: Site cannot be installed. No matching service worker detected.

The service worker runs in the background and has the ability to 'intercept' requests (see the next section). But it can be limited in the number of requests that it can intercept by specifying a "scope". Either way, the scope is constrained by the directory from which the service worker file is loaded. This is annoying, since it stops you properly arranging your code, but only the service worker file itself needs to be at the top level.

I moved service_worker.js up to the top level.
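
To illustrate why the location of the file matters: by default, the widest scope a service worker can claim is the directory it was loaded from. The little helper below (defaultScopeFor is my own name, and example.com is just a placeholder) shows how that falls out of the script URL, followed by what I assume the registration call would look like once the file is at the top level.

```javascript
// The default scope is the directory that contains the service worker script.
function defaultScopeFor(scriptUrl) {
  return new URL("./", scriptUrl).pathname;
}

console.log(defaultScopeFor("https://example.com/js/service_worker.js")); // "/js/"
console.log(defaultScopeFor("https://example.com/service_worker.js"));    // "/"

// In the page (not in the worker): register from the top level so that the
// scope can cover the whole site.
if (typeof navigator !== "undefined" && "serviceWorker" in navigator) {
  navigator.serviceWorker
    .register("/service_worker.js", { scope: "/" })
    .catch(function (err) {
      console.log("service worker registration failed", err);
    });
}
```
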

Intercepting Requests

One of the key functions of a PWA is its ability to continue to function even when the device is not connected to the internet (this is also one of the main reasons that support is so much better on mobile devices than desktop devices).

In order to make this work, it is necessary to be able to provide all of the resources from local storage rather than from the internet, which means that you need to know where to find them locally.

The browser delegates this task to the service worker through the "fetch" mechanism.

fetch is an event which the service worker must register for. Registering for events in the service worker is just like doing so in a regular javascript application except that the "magic" variable is not document but self. self is a global variable in the context of the service worker which resolves to the current instance of type ServiceWorkerGlobalScope.

Thus we have something like this in service_worker.js:

self.addEventListener('fetch', function(ev) {
...
});

Here ev is a FetchEvent. The key thing that it supports is a respondWith method which enables the service worker to return a cached copy of a file.

It seems to me that most of this method is "boilerplate" code in that it seems to be there to connect requests from the browser to a builtin "caching" mechanism. Of course, it also gives you the opportunity to decide that some URIs cannot be cached, or to pull them from another data source - such as a database - but it seems overly reliant on user code rather than allowing more of a "filter" approach.

The "cache" is not in fact a single cache but a set of caches. In order to store the results of a fetch it is necessary to open an individual cache, but the matching process - by which we test if we have a cached file - operates across all the caches.

The implementation of the fetch logic can be found in the git repo with the tag PWA_BASIC_FETCH.
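
I won't reproduce the repository code here, but a minimal sketch of the "look in the caches first, otherwise go to the network and remember the answer" idea might look like this. The cache name and the cacheFirst function are inventions for the purpose of illustration, and the caches and fetch globals are passed in as parameters only so that the logic can be tried outside a browser.

```javascript
// Hypothetical cache name; not taken from the repository.
const CACHE_NAME = "pwa-sketch-v1";

// `cacheStorage` is the service worker's global `caches` object and
// `fetchFn` is the global `fetch`.
async function cacheFirst(request, cacheStorage, fetchFn) {
  // Only GET requests can be stored in a cache
  if (request.method !== "GET") {
    return fetchFn(request);
  }
  // match() on the CacheStorage object searches across all the caches
  const cached = await cacheStorage.match(request);
  if (cached) {
    return cached;
  }
  const response = await fetchFn(request);
  if (response && response.ok) {
    // Storing requires opening one specific cache. A response body can only
    // be read once, so the cache gets a clone and the page gets the original.
    const cache = await cacheStorage.open(CACHE_NAME);
    await cache.put(request, response.clone());
  }
  return response;
}

// Only wire this up when actually running inside a service worker
if (typeof ServiceWorkerGlobalScope !== "undefined") {
  self.addEventListener("fetch", function (ev) {
    ev.respondWith(cacheFirst(ev.request, caches, fetch));
  });
}
```
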

Background Color

For some reason, before you can add to the home screen, you need to set a background_color. This is mentioned in the Mozilla Documentation.

Adding to Home Screen

This feature comes for free on Android the moment you load a PWA that Google Chrome recognizes. On the desktop however, more work is still required.

There is an event on the window called beforeinstallprompt which is triggered by the browser when all the conditions for being installed locally are met. In short these are:

the manifest can be found and is configured correctly;
the service worker is installed and has a correctly configured fetch event;
the app is being served securely (either by HTTPS or from localhost);
it hasn't already been installed.

We can now add the appropriate handler in our start.js script (because this is on the window, we need to do this in the client portion of the browser, not the service worker).

This does not immediately add the app to the home screen; rather it needs to provide an affordance - usually a button - to enable the user to do so. It is a requirement of the specification that the user must actively choose to enable this feature.

However, the action to actually add the app to the home screen is triggered by calling a method on the event passed to your beforeinstallprompt handler. Thus we need to squirrel away a copy of this event when we are given it for later usage.

Because this is a two-step process (first receive the beforeinstallprompt event, then click on a button), we need to make our button initially invisible and then make it visible when the event arrives. We do this by attaching a CSS class to the button with an initial setting of display: none and then specifically overriding the style of the element when the beforeinstallprompt event arrives. (I realize that there are many other ways of doing this; I'm just saying what I chose).

This is sufficiently tricky that I've checked it in before moving on - the tag is PWA_BASIC_BEFORE_INSTALL_PROMPT.

If you refresh, the button should appear. Note that the system can be a little picky about this, so you may need to hard refresh to get it to happen reliably.

Finally, we need to wire up the event handler. This does three things:

makes the button vanish again by setting its display value back to 'none';
prompts the user to check that they want to add to the home screen;
if they agree, does any subsidiary processing.

Note that we don't actually do any subsidiary processing, but the code is there for completeness and reference.
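The three steps can be sketched as below. Again, onInstallClick and the install-button id are hypothetical names of mine; prompt() and userChoice are the members of the beforeinstallprompt event that do the real work. The saved event is passed in as a parameter so that the function can be read on its own.

```javascript
// Handle a click on the install button, given the saved beforeinstallprompt event.
async function onInstallClick(button, promptEvent) {
  button.style.display = 'none';                // 1. make the button vanish again
  promptEvent.prompt();                         // 2. ask the user if they want to install
  const choice = await promptEvent.userChoice;  //    resolves once they have answered
  if (choice.outcome === 'accepted') {          // 3. any subsidiary processing goes here
    console.log('user chose to add to home screen');
  }
  return choice.outcome;
}

// Only wire things up when we are running in a page.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  let savedEvent = null;
  window.addEventListener('beforeinstallprompt', (e) => { savedEvent = e; });
  // 'install-button' is a hypothetical element id
  const button = document.getElementById('install-button');
  if (button) {
    button.addEventListener('click', () => {
      if (savedEvent) {
        onInstallClick(button, savedEvent);
        savedEvent = null;                      // only use each saved event once
      }
    });
  }
}
```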

In Chrome on a Mac (my environment), once this is complete the browser window "pops out" and becomes its own application. It is also added to the "Chrome Apps" folder which is then opened in Finder.

This is all checked in with the tag PWA_BASIC_A2HS.

Removing the App from the Home Screen

Again, I can't speak for all platforms, but on the Mac it is possible to uninstall the app by selecting the three dots in the right hand corner of the window and choosing "Uninstall …".

This is obviously useful - and important - for developing the install flow without reinstalling Chrome.

Monday, October 26, 2020

Getting Started with Progressive Web Apps

If you're reading this, I'm going to assume that you know what a web app is. You've probably written one or more. If you don't, think amazon.com.

If you're alive in the 21st century, you have to know what a mobile phone app is. You may well have written - or considered writing - one of those too.

If you have, in fact, written both web apps and mobile phone apps, you may have asked, "can I just do this once?" - particularly if you have written mobile phone apps for multiple platforms.

The answer is yes.

The technology is continually changing, but all of it essentially depends on you writing a completely client-side app - that is, you need to write your application almost entirely in JavaScript, and just communicate with the server to load data using APIs - "AJAX" as they often still confusingly call it.

The latest iteration of the technology is to say "just write a client-side web app that works when connected to your server, and then progressively add features to enrich it".

Whence progressive web app.

What's the minimum I can do?

Technically, the minimum you can do is to create a web app. In keeping with tradition, we could serve up hello world from helloworld.html. It's not my job to write HTTP servers for you, so I'm going to assume you either have one you like or can easily obtain node.js or python's SimpleHTTPServer. If you can't, you probably want to stop reading about now.

However, for my money, this doesn't actually qualify as a progressive web app because the amount of progress it has made is about zero. So the minimum I would consider is a web app with a manifest.

Adding a manifest

Any number of programming languages have had "manifests" for as long as I can remember. The idea comes from a "shipping manifest" (you know, one of those labels on the boxes FedEx drop off at your house): basically, it tells you what's in the box and where to go look for it.

I have to say, they always seem like a bit of a hack to me. If these things need to be known, why are they not required somewhere in the code? Anyway, progressive web apps require a manifest, the manifest must be a JSON document your server can serve, and your HTML page needs to tell the browser that it's there. In doing so, it notifies the browser that your web app is not just a web app, but a progressive web app and that it should trawl through it until it finds the things it needs.

You might think (well, anyway, I might think) that if this thing is JSON, the minimum it could be is an empty object, so:

{
}

In order to include this in your application, you need to add a link tag in the <head> section of your index.html.

<link rel='manifest' href='/json/manifest.json'>

Now we can load the website using our trusty friend, python's SimpleHTTPServer (or otherwise; note that on Python 3 the equivalent module is called http.server):

cd pwa/html
python -m SimpleHTTPServer

And then the website should be visible on http://localhost:8000/.

So far, so good. You can check this out from github as PWA_MINIMAL_ERRORS.

Uh, it doesn't look any different

Well, no.

That is what the "progressive" bit is all about. It starts off as a completely generic website and then, when you have reached a certain threshold of adding stuff, you can start doing fancy things such as working offline, adding it to the home screen and sending notifications.

We can check up on our work, though. Assuming you are using Google Chrome, open up the developer tools and go to Application. The top thing there, which you have probably normally ignored on your way to Cookies, is a tab called Manifest. Click on that. You'll see a number of things there.

First off, there is a link to the source: always useful to check that it has downloaded the most recent version of your manifest (which can be a problem as discussed later). After that comes a list of errors that Chrome helpfully describes as "instabilities". We're going to fix those before we do anything else.

Naming It

The second message here complains that there is no "name" or "short name". Drawing on Mozilla's Documentation

Note that for reasons best known to themselves, manifest JSON files separate words with underscores.

{
  "name": "First Ignorant PWA",
  "short_name": "PWA1"
}

We can reload and the error now goes away. Scrolling down, we can now see that these fields have been added to the Identity section of the manifest information.

Where to Start?

Going back to the first error, most websites "know" that the home page of the website is called something like index.html. This is just a convention, of course, and you can configure a web server to point to any page on your website. Much the same convention applies with progressive web apps, but Chrome considers it an error to take advantage of the convention and instead wants it to be made explicit, hence the warning. By adding a start_url field to the manifest, we can identify that index.html is where we want the application to start when it is restarted locally rather than downloaded.

What Can I Show You?

The display field says how the application wants to present itself. If you are thinking of a PWA as a webpage that just has some fancy features, you will want to go with the simplest level, standalone. But PWAs also offer the opportunity to try and become full-fledged apps on your phone or tablet. In that case, there are additional levels of control, each providing less browser UI and depending on you to do all the work until you reach the fullscreen level of control - where your app takes up the whole screen.

According to the Mozilla docs, there is also a browser option that it claims is the default. Chrome says this is an error. I am going with the reality that Chrome provides rather than "the way it should be". Your mileage may vary.

We, of course, are going to go for standalone, since we're not doing anything fancy at all.

Icons

Icons are the bane of my life. Mainly because I'm not at all artistic. But also because there are all sorts of arcane rules about what is and is not allowed - sizes, formats, etc - all of which vary between platforms.

For this project, I used an online icon generator to generate a package of icons. As it happens, it also generates a minimal manifest which would not be an unreasonable place to start. But instead, I just used the icons and copied and edited the relevant sections of the manifest into my manifest.

That completes a minimal manifest. This is tagged PWA_MINIMAL_MANIFEST.
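Putting the pieces from the sections above together, a minimal manifest might look something like this sketch. It is not a copy of the tagged file: the start_url path, the icon paths and the choice of two icon sizes are my assumptions for illustration, and an icon generator will typically produce a longer list.

```json
{
  "name": "First Ignorant PWA",
  "short_name": "PWA1",
  "start_url": "/index.html",
  "display": "standalone",
  "icons": [
    {
      "src": "/icons/icon-192x192.png",
      "sizes": "192x192",
      "type": "image/png"
    },
    {
      "src": "/icons/icon-512x512.png",
      "sizes": "512x512",
      "type": "image/png"
    }
  ]
}
```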

May I Be of Service?

The final thing that Chrome complains about is the absence of a service worker. What is one of these? Well, it is the thing that is key to making a web app progressive, and it's really what we've been building up to.

Web applications, like other UI applications, have a "main thread" or "UI thread" that is responsible for dealing with user interactions and handling display. If you have done any amount of JavaScript (or other UI development), you will know that it is important to keep things simple and short on the main flow in order to keep response snappy. If you have done quite a bit of JavaScript, you will know that because there is only one thread, it can be difficult.

Workers solve this problem by offering other execution environments in which it is possible to do work that does not interfere with the main flow. And it really is separate and does not interfere: nothing is shared between these two environments except for a message passing mechanism. In a sense, they are like iframes without any visual component.
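To get a feel for what "nothing is shared" means, here is a small sketch using MessageChannel, which passes messages with the same postMessage mechanism used between a page and a worker. No actual worker is involved, and the variable names are mine; it is purely to show that the receiving side gets its own copy of the data rather than the sender's object.

```javascript
// Two connected ports standing in for "the page" and "the worker".
const { port1, port2 } = new MessageChannel();

const original = { count: 1 };

// The receiving side: resolve with whatever data arrives.
const received = new Promise((resolve) => {
  port2.onmessage = (event) => {
    resolve(event.data);
    port1.close();               // tidy up so nothing is left listening
  };
});

port1.postMessage(original);     // the message is copied at this point
original.count = 2;              // a later change on the sending side...

received.then((copy) => {
  // ...does not reach the receiver: copy.count is still 1
  console.log('receiver saw', copy.count, 'sender now has', original.count);
});
```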

In order to handle all the things that Progressive Web Apps need to handle without interfering with the main application rendering cycle, it is necessary for them to have at least one service worker and to identify this in the manifest.

Oddly, the service worker is not declared in the manifest but rather must be created from main thread JavaScript code. (To be honest, there is nothing odd about this at all. It used to be declared in the manifest, but the declaration became inadequate to cover all the registration cases so has been deprecated.)

So at this point we need to create two JavaScript files, which I'm calling start.js and service-worker.js. Note, however, that only start.js finds its way into a script tag in index.html. This is because service-worker.js is not loaded into the main body of the page but into a background "page".

Starting with the service worker (because it's simpler for now), all we want to do to begin with is identify that we have in fact been loaded.

console.log("hello from the service worker");

All the hard work (for now) is getting that to load.

Loading the service worker

First off, not all browsers support service workers (really? You know this is 2020?). Well, possibly they all do now, but you can't be sure, so first test that the functionality we are going to use - the serviceWorker property of navigator - has been defined. If it has, then add a callback when the document is loaded to try and load the service worker. Note that while it's not strictly necessary to wait until the whole page is loaded, it makes sense because you will probably want to have things happen and you don't want the page to not have loaded the elements that you need. You may also want to add other setup and configuration to this callback before you register, just to make sure that everything happens in the right order.

if ('serviceWorker' in navigator) {
  window.addEventListener('load', () => {
    ...
  });
}

The final thing we need to do in there is call the register method on the serviceWorker property of the navigator. This needs the path to the JavaScript file to use (service-worker.js) and returns a Promise that will contain the registration if successful - or an error if not.

navigator.serviceWorker.register('/js/service-worker.js')
  .then((registration) => {
    console.log("have registered service worker", registration);
    registration.update().catch(ex => console.log(ex.message));
  }).catch(ex => {
    console.log("failed to register service worker", ex.message);
  });

While not exactly idempotent, the register method can be called regardless of whether there is already a service worker installed. This may seem bizarre, but remember that PWAs can be stored locally and thus can be rerun in different scenarios. Anyway, go ahead and call it and it will do the right thing.

But what it doesn't generally do is to check whether the JavaScript is up to date. It will generally look at the local cache and accept whatever version is there. This is absolutely fine if you are not connected to the internet (offline working is the first benefit of using a PWA) and is OK if you are a casual user of an application. But it is not at all good if you are actively developing. To bypass the cache, the returned registration has an update method that says "if you can, go and check if there is a more up-to-date version out there".

If all goes well, you should now see messages come out in your console.

This is all available as PWA_MINIMAL_WORKER.

Conclusion

That is pretty much as minimal as you can make a Progressive Web App. Obviously, I have no intention of stopping there, so read some of the other posts in this thread.