Tuesday, September 26, 2023

Parsing the Diagram Description


The core of what we are trying to do here is to describe a diagram (or set of diagrams) in a declarative, textual way. As much as possible, we want that description to be abstract and close to the semantics of what is going on. But there must obviously be some means by which that is mapped onto a diagram consisting of boxes and lines.

That being the case, I'm going to start from a very generic notion from graph theory that we have a graph consisting of nodes and edges. Each of these items may have metadata of various forms attached to them, which will inform how we lay them out and render them. Some of these may be semantic and need further interpretation; others may be more direct.

It would be great if the layout were fully automatic and exactly right every time; however, such an algorithm has not yet been invented, so there are going to have to be commands in the description which enable us to describe the manner in which nodes and edges should be connected.

And this is the right point to add the disclaimer that I am not any kind of expert in the fields of graph theory or layout, and that I'm going to come up with a layout algorithm that sort of works. I'll be happy to work with anybody who can do better - I'm just interested in a "tool" that can produce decent-ish pictures from a completely textual description.

The description language

The basis of the description language is a set of lines with nesting. Each "top level" line either introduces a new node or a new edge; adds metadata to an existing node or edge; or adds a constraint between multiple existing items. Because the language is declarative, the order of the declarations is irrelevant. If needed, additional information is included in nested lines.

Let's start with a very simple example:
node producer
    label "Producer"
node consumer
    label "Consumer"
edge
    from producer
    to consumer
        cap arrow
So this introduces two nodes, a "producer" and a "consumer" (and gives them labels) and then defines an arrow leading from the producer to the consumer.

Analyzing the description

The way in which I'm approaching this problem is much the same as at the top level: I'm going to build a TDA pipeline in which the parser is mainly responsible for breaking the input up into lines, and then calling a tokenizer on each of them, and having that pass any "interesting" (non-blank, non-comment) lines on to the next step.
import { tokenize } from "./tokenize.js";
import Blocker from "./blocker.js";

function parser(top, errors) {
  return function(text) {
    var lines = text.split(/\n/);
    var blocker = new Blocker(top);
    for (var i=0;i<lines.length;i++) {
      tokenize(lines[i], blocker, errors);
    }
  }
}

export default parser;
The next step is to try and "group" these by indentation level. Any lines with no indentation go directly to the "top level parser", which creates a model object (attaching it to the top level model) and returns a parser which will handle any nested items. Lines with greater indentation are assessed according to a "stack": it is passed to the deepest parser with an indentation strictly less than the current line.

The parser which is called is responsible for understanding the meaning of the tokenized line in its context and updating the model it possesses, and then returning a new parser which is capable of handling any nested content. If it does not wish to support nested content, returning null or undefined will cause the blocker to automatically raise an error.
class Blocker {
  constructor(top, errors) {
    this.top = top;
    this.errors = errors;
    this.stack = [];
  }

  line(tok) {
    var current = this.top; // the default

    // find which parser to use, closing and shifting others more deeply indented
    while (this.stack.length > 0) {
      var first = this.stack[0];
      if (tok.indent > first.indent) {
        current = first.handler;
        break;
      }
      if (first.handler && first.handler.complete) {
        first.handler.complete();
      }
      this.stack.shift();
    }

    // call the current parser
    if (!current) {
      // if there is no handler specified at this level, it's because no further nesting is allowed
      this.errors.raise("no content allowed here");
    } else {
      var nested = current.line(tok);
    
      // record this parser for everything under this nesting level
      this.stack.unshift({indent: tok.indent, handler: nested});
    }
  }

  complete() { // on end of input
    this.top.complete();
  }
}

While we have been doing this, we have quietly introduced modules into the code, and started writing simple tests of the tokenizer and blocker. We have also introduced some rudimentary error handling.

The Top Level Parser

We're now ready to try building a top level parser. Three lines of input should be presented to it: the two node lines and the edge line. It should be able to identify these, reject anything else, build a model entry for the item and add it to the model, and then return a parser to handle any configuration items that they may have (a NodeConfigurationParser and an EdgeConfigurationParser respectively). How hard can this be?
import NodeConfigParser from "./nodeconfig.js";
import EdgeConfigParser from "./edgeconfig.js";
import Node from "./model/node.js";
import Edge from "./model/edge.js";

class TopLevelParser {
  constructor(model, errors) {
    this.model = model;
    this.errors = errors;
  }

  line(l) {
    var cmd = l.tokens[0];
    switch (cmd) {
      case "node": {
        switch(l.tokens.length) {
          case 1: {
            this.errors.raise("node command requires a name");
            break;
          }
          case 2: {
            var node = new Node(l.tokens[1]);
            this.model.add(node);
            return new NodeConfigParser(node);
          }
          default: {
            this.errors.raise("node: too many arguments");
            break;
          }
        }
        break;
      }
      case "edge": {
        switch(l.tokens.length) {
          case 1: {
            var edge = new Edge();
            this.model.add(edge);
            return new EdgeConfigParser(edge);
          }
          default: {
            this.errors.raise("edge: not allowed arguments");
            break;
          }
        }
        break;
      }
      default: {
        this.errors.raise("no command: " + cmd);
      }
    }
  }
}

Each of the nested parsers go much the same way: the node parser supports the label property and the edge parser supports the from and to properties. The latter, in turn, allows further configuration using the EdgeEndParser. We now have the ability to parse the input script outlined above.

Handling Errors

I've quietly added some (primitive) error handling into the code above, but it still just prints on the console. This doesn't seem the best idea, so I'm going to add an error tab and put the errors in there. We still haven't implemented any CSS, so we don't have tabs as such, but the idea will be that normally this error tab will be hidden, but that when there are errors all the diagram tabs will be hidden, the error tab will be visible and selected.

This follows much the same pattern as the happy path in the code. We add a block in the tabs for the errors to appear:
          <div class="error-tab">
            <div class="tab-title">Errors</div>
            <div class="error-messages">
            </div>
          </div>
In the pipeline, we check if we have seen any errors. If so, we recover the error-tab and pass it to the show method on the ErrorReporter object, otherwise we continue with the normal flow, making sure to clear any previous errors.
function pipeline(ev) {
  var errors = new ErrorReporter();
  var model = new DiagramModel(errors);
  readText("text-input", parser(new TopLevelParser(model, errors), errors));
  if (errors.hasErrors()) {
    applyToDiv("error-messages", tab => errors.show(tab));
  } else {
    applyToDiv("error-messages", tab => tab.innerHTML = '');
    var portfolio = new Portfolio();
    model.partitionInto(portfolio);
    applyToDiv("tabs-row", ensureTabs(portfolio));
    portfolio.each((graph, tab) => graph.layout(d => d.drawInto(tab)));
  }
}
And here is the ErrorReporter class, now pulled out into its own file:
class ErrorReporter {
  constructor() {
    this.errors = [];
  }

  hasErrors() {
    return this.errors.length > 0;
  }

  raise(s) {
    console.log("error: " + s);
    this.errors.push(s);
  }

  show(elt) {
    elt.innerHTML = '';
    for (var i=0;i<this.errors.length;i++) {
      var msgdiv = document.createElement("div");
      msgdiv.className = "error-message";
      var msgtext = document.createTextNode(this.errors[i]);
      msgdiv.appendChild(msgtext);
      elt.appendChild(msgdiv);
    }
  }
}

export default ErrorReporter;

No comments:

Post a Comment