Monday, March 24, 2025

A Very Simple Compiler


In this episode, we're going to write a compiler.

Now, when I was younger, I thought that writing compilers was the very pinnacle of software development. That if I could do that, I could do anything. After I had mastered the art of writing compilers, I then went and worked on real-time software with race conditions and realised that writing compilers was actually quite easy.

It's even easier if you leave out all of the hard parts such as type checking, helpful error messages, error cascade detection and so on. So if you're scared by the assertion that we are going to write a compiler in this blog post, don't be. On the other hand, if you're hoping to learn all the intricacies of doing so, that won't happen either.

If you are completely unfamiliar with compilers, they operate in "phases", with each phase feeding its output into the next. While a typical compiler may have a dozen phases, we are just going to implement the two most important. Parsing is the process by which we understand the human-readable text of the source code and convert it into an internal data structure. Code Generation is then the process by which we turn an internal data structure into something that can be executed (in our case a JSON file which can be read in the browser). Obviously, since we only have these two phases, the "internal data structure" must be the same in both cases.
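If that sounds abstract, here is a toy two-phase "compiler" in a few lines of Go. To be clear, this is nothing to do with till itself, just an illustration of the shape: parsing turns text into a data structure, and code generation turns that same data structure into JSON.

```go
package main

import (
    "encoding/json"
    "fmt"
    "strings"
)

// The one internal data structure shared by both phases.
type Command struct {
    Verb string   `json:"verb"`
    Args []string `json:"args"`
}

// Phase 1, parsing: human-readable text in, data structure out.
func parse(line string) Command {
    parts := strings.Fields(line)
    return Command{Verb: parts[0], Args: parts[1:]}
}

// Phase 2, code generation: data structure in, executable form (JSON) out.
func generate(cmd Command) []byte {
    bs, _ := json.Marshal(cmd)
    return bs
}

func main() {
    fmt.Println(string(generate(parse("enable Coffee Tea"))))
    // {"verb":"enable","args":["Coffee","Tea"]}
}
```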

A Toy Language

And if you want to make it really easy, you write a compiler for a toy language. I spent some time thinking about what the simplest language I could come up with was, and while I didn't quite get there, I think I've come up with something fairly simple. For my purposes, the key feature it has to have is that it mustn't look like JavaScript, and it mustn't be possible to just say "ah, use a source map". So:
  • Its source code mustn't map directly onto underlying JavaScript;
  • The runtime data layout must not be at all obvious from the debugger;
  • Ideally, the function call stack isn't either.
In other words, I want a language where you would be almost hopelessly lost trying to use the standard Chrome JavaScript debugger to try and debug programs. If that seems like an odd specification for a language, bear in mind that I'm trying to motivate building a whole new debugger plugin for Chrome, and that I already have a non-trivial language with these characteristics (I know, because I'm struggling to debug it with the Chrome debugger).

So the concept I ended up with was a language to program the kinds of tills that you see these days in coffee shops. Now, when I last worked in such a place (Burger King, 1986), these things were all pretty much "manual". I'm sure there was something that mapped the buttons to the orders, but it wasn't on an iPad.

My vision, then, is of something with (say) five rows and four columns - 20 buttons. Each of those buttons has some label on it. Some of them are "nouns" - items you can order like tea or coffee. Others are "adjectives" - things that describe how you would like it, such as "strong" or "milk". And then there are a couple of "verbs" - "item is complete" and "order is complete".

As you push one or more of the buttons, it changes which other buttons are available. For instance, once you push "coffee", you can't push "tea" anymore. Until you have selected one "noun", neither "NEXT" nor "DONE" is available. How does it know? Because we have written a program, of course. In a programming language, of course. A very simple, toy programming language that I have, imaginatively, named till.

So, before I start writing a compiler, I'm going to present a sample of what I naively think is going to be a working till program. I've checked this in, and we will see whether it actually ends up looking anything like this when I'm done.
A layout enables us to say where the buttons go on the screen.

In theory, we could have any number of layouts, but for ease, I'm only going to support one for now,
and thus the name is irrelevant.  If we did support multiple layouts, then the idea would be
that there would be a verb "show" that switched to a different layout.  I may find that adding
more introduces an essential complexity to the language that makes the debugger more interesting.

    layout main
        row1 <- Coffee Tea Steamer Water
        row2 <- Strong Black
        row3 <- Dairy blank Oat Almond
        row4 <- blank blank blank blank
        row5 <- NEXT blank blank DONE

All good programming languages need an entry point and this one is no different.

The entry point is always called init, and, as shown here, does not have any arguments.
All routines are the same in that they have a header at one level of indentation and then
nested operations.  We'll come back to what the operations mean later.

    init
        enable all
        disable NEXT
        disable DONE
        milks <- Oat Dairy Almond
        drinks <- Tea Coffee Steamer Water

Each button on the screen has a name which is both an identifier (used elsewhere in the program)
and the label that appears on the button.  For simplicity, spaces are not allowed.  When pressed,
the operations listed "inside" it are carried out.

    button Coffee
        style noun
        noun <- Coffee
        disable all
        enable Strong Black milks
        enable NEXT

    button Tea
        noun <- Tea
        disable all
        enable Black milks
        enable NEXT

    button Water
        disable all
        enable NEXT

    button Oat
        style adjective
        adjs <- Oat
        disable milks Water

    button Dairy
        adjs <- Dairy
        disable milks Water

    button Almond
        adjs <- Almond
        disable milks Water

    button NEXT
        order <- noun adjs
        clear noun adjs
        enable all
        disable NEXT

    button DONE
        submit order
        clear order
        enable all
        disable NEXT
        disable DONE

CDP_SAMPLE:cdp-till/samples/cafe.till

I have no idea how intuitive this is to you right now, or even if it looks like a program to you. Certainly don't sweat the details; we should come back to all of that before we're done. The key point is that this language has been designed to be very, very easy to compile to a JSON intermediate form that we can then load into the browser and "run", just so we can get around to trying to debug it.

I suspect that if you are familiar with functional languages such as Miranda or Haskell, the style will be a lot more familiar than if you are more used to C, Java, Go or JavaScript. To be clear, while I have designed this to be event driven and in some sense "declarative", it is most certainly not a functional language.

It does follow the "semi-literate" style of Miranda's "literate scripts". That is, any line which is not indented is treated as commentary and no attempt is made to parse it or use it in the compiler. Indentation is the exclusive means by which nesting is defined (no braces or keywords are used), with the caveat that non-indented lines, because they are ignored, may appear anywhere.

We are going to support three "top-level" keywords: layout, init and button. layout takes an argument which we will not (currently) use. init does not take an argument. button takes an argument that acts both as an identifier (by which it is referenced elsewhere) and a label. For simplicity, everything is case-sensitive and spaces are not permitted.

Inside the layout, we simply indicate for each row which buttons should appear on it.

Inside the other two blocks, we have a series of actions. Each action is either of the form keyword args or of the form var <- args. There are a bunch of keywords such as style, enable, disable, clear and submit which are "built in" to the language definition. The variables are not built in and you can use them as you wish.

The "assignment" operator <- is not an assignment operator as such, but a "create-and-append" operator. All variables in Till are lists, and the <- operator appends the items on the right to the list on the left. If it does not already exist, an empty list is created. A variable which is referenced but does not exist has the "empty" value. If you want to remove things from a list, you use the clear keyword.

This probably all sounds more complicated than it really is.
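As a sketch of those semantics (this is just an illustration, not the till runtime), a Go map of string slices behaves almost exactly this way, because append on a missing key starts from an empty list:

```go
package main

import "fmt"

// appendTo models the <- operator: every till variable is a list, and
// appending to a variable that doesn't exist yet creates it as empty first.
func appendTo(vars map[string][]string, dest string, items []string) {
    // append on a nil slice allocates, so a missing key "just works"
    vars[dest] = append(vars[dest], items...)
}

func main() {
    vars := make(map[string][]string)
    appendTo(vars, "milks", []string{"Oat", "Dairy"})
    appendTo(vars, "milks", []string{"Almond"})
    fmt.Println(vars["milks"]) // [Oat Dairy Almond]

    // a variable which is referenced but never assigned has the "empty" value
    fmt.Println(len(vars["drinks"])) // 0

    // and clear just removes the list entirely
    delete(vars, "milks")
    fmt.Println(len(vars["milks"])) // 0
}
```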

A Parser

One time, when the team I was on was writing a parser for a templating language (like Velocity or Freemarker), a colleague asked me why we didn't use "tell-don't-ask" to write parsers. I explained that there were reasons, but we had a go anyway ... and failed.

Since then, I've practised doing TDA a lot more, and I've learnt how to do this. There are still reasons why it is tricky, mainly involving "global state", and I'm still not sure whether I like it or not, but the resulting parsers are quite testable at a "unit" level.

So here is the basic tell-don't-ask parser pipeline:
package parser

import "github.com/gmmapowell/ignorance/cdp-till/internal/compiler"

func Parse(repo compiler.Repository, srcdir string) {
    scope := NewGlobalScope(repo)
    lineLexicator := NewLineLexicator()
    blocker := NewBlocker(scope, lineLexicator)
    splitter := NewSplitter(blocker)
    scanner := NewScanner(splitter)
    scanner.ScanFilesInDirectory(srcdir)
}

CDP_TDA_PARSER:cdp-till/internal/parser/parser.go

In the same commit, I have checked in all the minimal code to make this compile. It doesn't run - for a start, we haven't attached this to any code anywhere yet! - but it is enough to explain how we expect the parser to operate. In a little while, we will come back to the code generation pipeline.

As with all TDA code, this reads "backwards", so to understand what it does, you have to read from the bottom upwards. The Scanner is (going to be) a class that looks at a directory on the file system and finds files with the .till extension. It provides the paths to each of these files in turn to the splitter. The Splitter has the easy task of splitting these files into lines and telling its client about files starting, ending and the line number and text content of each line. Here, we specify its client as a Blocker, which is responsible for tracking indentation, throwing away comment lines and then keeping a "stack" of indentations and indented scopes. It has a scope which represents the "top level" scope, within which all the others will be nested, and a lineLexicator which is responsible for breaking up each line into tokens. Internally, it will pass the resulting tokens to the current scope and that will generate new scopes and build up the parse tree. This will become clearer when we look at the Blocker in detail.

All of these units communicate using interfaces to make it easy to test them in isolation. I am not going to do much of this here for two reasons: the didactic one is that it is just more code to present; the other is that I am stealing a lot of this code from other places and modifying it - in those other places there are unit tests.

Now, let's start filling in the blanks. I'm sure some or all of the assumptions I've made in this checkin will change, but no worries...

The Basic OS Stuff

The scanner and the splitter are so boring, I'm tempted to not even mention them. But they are there, so here we go.

The scanner just looks through a directory on the disk, given a path name.

(For ease, speed and clarity I am avoiding most of the usual error handling and just ignoring errors or panicking as seems appropriate.)
package parser

import (
    "fmt"
    "os"
    "path/filepath"
    "strings"
)

type FileHandler interface {
    ProcessFile(file string)
}

type Scanner struct {
    handler FileHandler
}

func (s *Scanner) ScanFilesInDirectory(srcdir string) {
    s.scanFor(srcdir, ".till")
}

func (s *Scanner) scanFor(srcdir, suffix string) {
    files, err := os.ReadDir(srcdir)
    if err != nil {
        panic(fmt.Sprintf("could not read src directory %s: %v", srcdir, err))
    }
    for _, f := range files {
        if !f.IsDir() && strings.HasSuffix(f.Name(), suffix) {
            s.handler.ProcessFile(filepath.Join(srcdir, f.Name()))
        }
    }
}

func NewScanner(handler FileHandler) *Scanner {
    return &Scanner{handler: handler}
}

CDP_PARSER_OS:cdp-till/internal/parser/scanner.go

This is just a simple piece of code that uses os.ReadDir to find all the files in a directory, sees if they have the desired suffix and, if so, joins the file name to the directory path and hands all of that off to the file handler.
package parser

import (
    "bufio"
    "fmt"
    "os"
)

type LineHandler interface {
    BeginFile(file string)
    ProcessLine(lineNo int, line string)
    EndFile(file string)
}

type Splitter struct {
    handler LineHandler
}

func (s *Splitter) ProcessFile(file string) {
    stream, err := os.Open(file)
    if err != nil {
        panic(fmt.Sprintf("could not process %s: %v", file, err))
    }
    defer stream.Close()

    s.handler.BeginFile(file)
    scanner := bufio.NewScanner(stream)
    i := 1
    for scanner.Scan() {
        text := scanner.Text()
        s.handler.ProcessLine(i, text)
        i++
    }
    s.handler.EndFile(file)
}

func NewSplitter(handler LineHandler) *Splitter {
    return &Splitter{handler: handler}
}

CDP_PARSER_OS:cdp-till/internal/parser/splitter.go

The splitter, likewise, takes a file and between notifying the LineHandler of the start and end of the file, splits it into lines and sends each line - along with its number - to the LineHandler.

The Blocker

The Blocker implements LineHandler and is responsible for grouping the incoming lines by indentation. Lines which are blank or have no leading white space are discarded. All other lines must begin with one or more tabs, followed by a non-white-space character. Anything else is an error (and we will panic).

We start off with some interface definitions, which provide placeholders for us to "send requests":
package parser

import (
    "fmt"
    "path/filepath"
    "unicode"
)

type Scope interface {
    PresentTokens([]string) Scope
    Close()
}

type LineLexer interface {
    Lexicate(line string) []string
}

type Blocker struct {
    lineLexer LineLexer
    scopes    []Scope
}

CDP_BLOCKER:cdp-till/internal/parser/blocker.go

The idea of a Scope is that it is something which knows what can be parsed there, and can attach any nested lines to it. We will have a total of about four different scopes (global, layout, methods and "invalid") in this language. The idea is that we present the tokenized version of a line to its immediate parent scope, which interprets it and updates the repository structures, and then returns a new, nested scope. If nothing can be legally nested inside the current command, an "invalid" scope is returned which will just panic if PresentTokens is called.
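The InvalidScope itself doesn't appear in the excerpts here, but a minimal version only needs to panic on nesting and do nothing on Close. This sketch (with the Scope interface repeated so it stands alone) is my guess at what it looks like, not necessarily the checked-in code:

```go
package main

import "fmt"

// The Scope interface from the parser, repeated here so the sketch compiles.
type Scope interface {
    PresentTokens(tokens []string) Scope
    Close()
}

// InvalidScope is returned by commands that allow no nesting,
// so any attempt to nest a line inside them panics.
type InvalidScope struct {
}

func (s *InvalidScope) PresentTokens(tokens []string) Scope {
    panic(fmt.Sprintf("nothing may be nested here, but got %v", tokens))
}

func (s *InvalidScope) Close() {
    // closing an invalid scope is harmless - there is nothing to finish
}

func main() {
    var s Scope = &InvalidScope{}
    s.Close() // fine
    defer func() { fmt.Println("recovered:", recover() != nil) }()
    s.PresentTokens([]string{"enable", "all"})
}
```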

Because this implements the LineHandler interface, we next have the three methods for that.
func (b *Blocker) BeginFile(file string) {
    fmt.Printf("%s:\n", filepath.Base(file))
}

func (b *Blocker) ProcessLine(lineNo int, line string) {
    indent, remaining := figureIndent(lineNo, line)
    if indent == 0 {
        return
    }

    b.closeScopes(indent)
    if indent > len(b.scopes) {
        panic(fmt.Sprintf("double indent at line %d", lineNo))
    }

    tokens := b.lineLexer.Lexicate(remaining)
    inner := b.scopes[len(b.scopes)-1].PresentTokens(tokens)
    b.scopes = append(b.scopes, inner)
}

func (b *Blocker) EndFile(file string) {
    b.closeScopes(1)
}

CDP_BLOCKER:cdp-till/internal/parser/blocker.go

The BeginFile method just echoes the filename so that we can see where we are up to in the compilation. The EndFile method closes off any scopes that may still be open (except the global scope). Most of the interesting code here is, of course, in ProcessLine.

It delegates to figureIndent (below) the task of figuring out if this line is blank, a comment, invalid or a code line. If it is blank or a comment, it returns an indent of zero, and we just give up. If it is invalid it panics, so we don't see the result. So the other possibility is that it returns a non-zero indent.

In this case, we close all the scopes which we created at this level of indentation or more. There is a quick check that we haven't "double indented" (e.g. going from 1 to 3 without 2), and then we tokenize the line. We present these tokens to the current scope and then append the returned scope to our list of current scopes.

figureIndent is just an exercise in string handling. It's supposed to do what I described above and while it isn't hard, it isn't clean, either.
func figureIndent(lineNo int, line string) (int, string) {
    if len(line) == 0 {
        // it's an empty line
        return 0, ""
    }

    runes := []rune(line)
    hasnontabs := false
    indent := -1
    for i, r := range runes {
        if unicode.IsSpace(r) {
            if r != '\t' {
                hasnontabs = true
            }
        } else {
            indent = i
            break
        }
    }
    if indent > 0 && hasnontabs {
        panic(fmt.Sprintf("line %d has non-tab characters in indent", lineNo))
    }
    if indent <= 0 {
        // it's blank or a comment
        return 0, ""
    }

    return indent, string(runes[indent:])
}

CDP_BLOCKER:cdp-till/internal/parser/blocker.go

Closing scopes just involves what you'd think: going around a loop until you've got the number down to what you want and telling each one that's going away that you're closing it, and then removing it from the end of the list.
func (b *Blocker) closeScopes(upto int) {
    for len(b.scopes) > upto {
        b.scopes[len(b.scopes)-1].Close()
        b.scopes = b.scopes[:len(b.scopes)-1]
    }
}

CDP_BLOCKER:cdp-till/internal/parser/blocker.go

And NewBlocker gets the show on the road by creating a new blocker with a list of scopes which has just the global scope in it and the line lexer.
func NewBlocker(scope Scope, lineLexicator LineLexer) *Blocker {
    return &Blocker{scopes: []Scope{scope}, lineLexer: lineLexicator}
}

CDP_BLOCKER:cdp-till/internal/parser/blocker.go

Tokenization

On the subject of that lexer, I've tried to keep it as simple as possible as well. Almost everything in this language is an alphanumeric string separated by spaces. The exception is the assign/append operator (<-), but that is easy to spot as well. So we are just going to rattle through the input string, gathering up alphanumeric strings, spotting the assign operator, and panicking if anything else happens. This is the kind of thing where unit testing really helps to get all the difficult cases right at the same time, so I'm going to write a few (look in lexicator_test.go if you want to see them) but just present the simple final code.
package parser

import "unicode"

type LineLexicator struct {
}

func (l *LineLexicator) Lexicate(line string) []string {
    ol := &OneLine{}
    for _, r := range line {
        if unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_' {
            ol.completeAssign()
            ol.alpha(r)
        } else if r == '<' || r == '-' {
            ol.completeAlpha()
            ol.assign(r)
        } else if unicode.IsSpace(r) {
            ol.completeAlpha()
            ol.completeAssign()
        } else {
            panic("huh?")
        }
    }
    ol.completeAlpha()
    ol.completeAssign()
    return ol.ret
}

func NewLineLexicator() *LineLexicator {
    return &LineLexicator{}
}

CDP_LEXER:cdp-till/internal/parser/lexicator.go

We need to manage a bunch of state here, and it's all interconnected, so the easiest thing to do is to bundle it off into an object. Since the state comes into being and goes away all within the course of a line, I've called it OneLine and we'll look at it below.

We then iterate through all the characters in the line, taking appropriate actions on the state depending on whether the character is alphanumeric, a valid operator character, a space character or an invalid character.

If it's alphanumeric, we observe that any "symbol" in progress must be completed, and then this character can be appended to an alphanumeric token. On the other hand, if it's a valid symbol character, we need to complete any ongoing alphanumeric token and then append this to a symbol token. If it's a space character, then we need to complete either the ongoing alphanumeric or symbol token. Anything else (e.g. + or =) will cause an immediate panic.

When we've finished processing, we tell the OneLine state object to complete either type of token, and then we return the collection of tokens it processed.

In OneLine, we maintain a current token (of any kind) as a slice of runes. We keep the current set of completed tokens as a slice of strings. And we keep track of whether we are "in" (i.e. have already processed at least one rune of) an alphanumeric or symbol token.
type OneLine struct {
    curr              []rune
    ret               []string
    inAlpha, inAssign bool
}

func (ol *OneLine) alpha(r rune) {
    ol.inAlpha = true
    ol.curr = append(ol.curr, r)
}

func (ol *OneLine) assign(r rune) {
    ol.inAssign = true
    ol.curr = append(ol.curr, r)
}

func (ol *OneLine) completeAlpha() {
    if !ol.inAlpha {
        return
    }
    tok := string(ol.curr)
    ol.curr = nil
    ol.ret = append(ol.ret, tok)
    ol.inAlpha = false
}

func (ol *OneLine) completeAssign() {
    if !ol.inAssign {
        return
    }
    op := string(ol.curr)
    ol.curr = nil
    if op != "<-" {
        panic("invalid operator: " + op)
    }
    ol.ret = append(ol.ret, op)
    ol.inAssign = false
}

CDP_LEXER:cdp-till/internal/parser/lexicator.go

We then have four methods: two to say "we have just seen a rune which belongs to a certain type of token" and two to say "if we are in this type of token finish up". The two methods in each pair are very similar, mainly just updating the current state. The complete functions also move any token onto the ret slice.

Note that the complete methods are careful not to "complete" a token which has not been started.

And then, depending on your view of what "parsing" is, we are done. We still need to "understand" what the parsed lines mean, which we'll do next, but we haven't had to faff with any of that "lookahead token" stuff or "shift/reduce conflicts". Keeping it simple really helps.
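To give a flavour of what those tests check, here is a compact, simplified re-implementation of the same lexing rules (it keeps its state in local variables rather than an OneLine object, and skips the <- validity check) together with the token streams it should produce:

```go
package main

import (
    "fmt"
    "unicode"
)

// lexicate is an illustrative sketch of the rules above: alphanumeric runs
// become tokens, '<' and '-' runs become the operator token, spaces end
// whichever token is in progress, anything else is a panic.
func lexicate(line string) []string {
    var toks []string
    var curr []rune
    inAssign := false
    flush := func() {
        if len(curr) > 0 {
            toks = append(toks, string(curr))
            curr = nil
        }
        inAssign = false
    }
    for _, r := range line {
        switch {
        case unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_':
            if inAssign {
                flush() // a letter ends any operator in progress
            }
            curr = append(curr, r)
        case r == '<' || r == '-':
            if !inAssign {
                flush() // an operator ends any identifier in progress
                inAssign = true
            }
            curr = append(curr, r)
        case unicode.IsSpace(r):
            flush()
        default:
            panic("unexpected character: " + string(r))
        }
    }
    flush()
    return toks
}

func main() {
    fmt.Println(lexicate("row1 <- Coffee Tea Steamer Water"))
    // [row1 <- Coffee Tea Steamer Water]
    fmt.Println(lexicate("noun<-Coffee")) // spaces around <- are optional
    // [noun <- Coffee]
}
```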

Interpreting the Tokens in a Scope

We are certainly not done with the front end. We need to build up a repository of top level objects with nested commands inside them. This involves each of the tokenized lines being understood (or interpreted) in a context (or scope).

This happens when the Blocker passes the list of tokens to an existing Scope. Since that is where the magic begins, let's first look at GlobalScope:
package parser

import "github.com/gmmapowell/ignorance/cdp-till/internal/compiler"

type GlobalScope struct {
    repo compiler.Repository
}

func (gs *GlobalScope) PresentTokens(lineNo int, tokens []string) Scope {
    switch tokens[0] {
    case "layout":
        return &LayoutScope{repo: gs.repo, hdrLine: lineNo, name: tokens[1]}
    case "init":
        return &MethodScope{repo: gs.repo, hdrLine: lineNo, name: "init"}
    case "button":
        return &MethodScope{repo: gs.repo, hdrLine: lineNo, name: tokens[1]}
    default:
        panic("cannot handle global command " + tokens[0])
    }
}

func (gs *GlobalScope) Close() {
    panic("you should not close the global scope")
}

func NewGlobalScope(repo compiler.Repository) Scope {
    return &GlobalScope{repo: repo}
}

CDP_INTERPRETATION:cdp-till/internal/parser/globalscope.go

All that really matters here is PresentTokens (by the way, I have quietly added a lineNo argument here because it turns out to be needed). The first token tells you what you've got, and then you need to construct an appropriate nested scope. We pass in the repo because they will need that to add any items they create, the line number associated with the start line, and the "name" of the thing - provided in the second argument.

Note that for simplicity, I have said that init is basically the same as button - both create a method. If there turns out to be a difference, it will be possible to split them.

The LayoutScope handles all those pesky row assignments:
package parser

import "github.com/gmmapowell/ignorance/cdp-till/internal/compiler"

type LayoutScope struct {
    repo    compiler.Repository
    hdrLine int
    name    string
    rows    []compiler.RowInfo
}

func (s *LayoutScope) PresentTokens(lineNo int, tokens []string) Scope {
    if tokens[1] == "<-" {
        s.rows = append(s.rows, compiler.RowInfo{LineNo: lineNo, Label: tokens[0], Tiles: tokens[2:]})
        return &InvalidScope{}
    } else {
        panic("layout cannot handle " + tokens[0])
    }
}

func (s *LayoutScope) Close() {
    s.repo.Layout(s.hdrLine, s.name, s.rows)
}

CDP_INTERPRETATION:cdp-till/internal/parser/layoutscope.go

If the second token is not <-, then we panic. Otherwise, we append a RowInfo object which has the row name (the first token) and all the tile arguments (the remaining tokens). We return an InvalidScope because nothing may be nested within this.

When the scope closes (in Close), we tell the repository we want to add a layout starting at the line number we were passed in, with the given name and the rows we have collected.

The MethodScope handles assignments and all the actions:
package parser

import "github.com/gmmapowell/ignorance/cdp-till/internal/compiler"

type MethodScope struct {
    repo    compiler.Repository
    hdrLine int
    name    string
    actions []compiler.Action
}

func (s *MethodScope) PresentTokens(lineNo int, tokens []string) Scope {
    if tokens[1] == "<-" {
        s.actions = append(s.actions, compiler.AssignAction{LineNo: lineNo, Dest: tokens[0], Append: tokens[2:]})
        return &InvalidScope{}
    }
    switch tokens[0] {
    case "clear":
        s.actions = append(s.actions, compiler.ClearAction{LineNo: lineNo, Vars: tokens[1:]})
        return &InvalidScope{}

    case "disable":
        s.actions = append(s.actions, compiler.DisableAction{LineNo: lineNo, Tiles: tokens[1:]})
        return &InvalidScope{}
    case "enable":
        s.actions = append(s.actions, compiler.EnableAction{LineNo: lineNo, Tiles: tokens[1:]})
        return &InvalidScope{}

    case "submit":
        s.actions = append(s.actions, compiler.SubmitAction{LineNo: lineNo, Var: tokens[1]})
        return &InvalidScope{}

    case "style":
        s.actions = append(s.actions, compiler.StyleAction{LineNo: lineNo, Styles: tokens[1:]})
        return &InvalidScope{}
    }
    panic("method cannot handle " + tokens[0])
}

func (s *MethodScope) Close() {
    s.repo.Method(s.hdrLine, s.name, s.actions)
}

CDP_INTERPRETATION:cdp-till/internal/parser/methodscope.go

If the second token is the assignment operator, then we create an AssignAction with the variable name from the first token and the items to append from the remaining tokens.

Otherwise, we create an appropriate action based on the keyword (first token) and an appropriate interpretation of the remaining tokens.

When we Close this, we pass the whole thing to the repository as a Method.

These are the Action classes:
package compiler

type Action interface {
}

type AssignAction struct {
    LineNo int
    Dest   string
    Append []string
}

type EnableAction struct {
    LineNo int
    Tiles  []string
}

type DisableAction struct {
    LineNo int
    Tiles  []string
}

type ClearAction struct {
    LineNo int
    Vars   []string
}

type SubmitAction struct {
    LineNo int
    Var    string
}

type StyleAction struct {
    LineNo int
    Styles []string
}

CDP_INTERPRETATION:cdp-till/internal/compiler/actions.go

And the RowInfo class used by Layout:
package compiler

type RowInfo struct {
    LineNo int
    Label  string
    Tiles  []string
}

CDP_INTERPRETATION:cdp-till/internal/compiler/layout.go

And I have added the relevant methods to the Repository interface:
package compiler

type Repository interface {
    Layout(lineNo int, name string, rows []RowInfo)
    Method(lineNo int, name string, actions []Action)
}

CDP_INTERPRETATION:cdp-till/internal/compiler/repository.go

And then I've gone ahead and implemented that interface in the same file with a simple map and some complicated structs:
type Entry interface {
}

type LayoutEntry struct {
    lineNo int
    name   string
    rows   []RowInfo
}

type MethodEntry struct {
    lineNo  int
    name    string
    actions []Action
}

type RepositoryStore struct {
    entries map[string]Entry
}

func (r *RepositoryStore) Layout(lineNo int, name string, rows []RowInfo) {
    _, ok := r.entries[name]
    if ok {
        panic("duplicate name: " + name)
    }
    r.entries[name] = LayoutEntry{lineNo: lineNo, name: name, rows: rows}
}

func (r *RepositoryStore) Method(lineNo int, name string, actions []Action) {
    _, ok := r.entries[name]
    if ok {
        panic("duplicate name: " + name)
    }
    r.entries[name] = MethodEntry{lineNo: lineNo, name: name, actions: actions}
}

CDP_INTERPRETATION:cdp-till/internal/compiler/repository.go

And then added a method to create a RepositoryStore, but this method is not used yet.
func NewRepository() Repository {
    return &RepositoryStore{entries: make(map[string]Entry)}
}

CDP_INTERPRETATION:cdp-till/internal/compiler/repository.go

Let's Take Stock

That completes the front end of the compiler. We have the ability to read all the files in a directory, parse them, interpret the commands and build up a repository of all the items we've created by name.

That leaves three more tasks in the compiler, which I'm hoping will be simpler:
  • Generate a JSON file, either in memory or on disk and enable the webserver to serve it to the client;
  • Add a handler to serve it to the client;
  • Add a task to our watcher to watch the samples directory and generate the appropriate code.
While that will complete the "compiler" portion of the project, we will still have to write a runtime library to read the code, execute it and render the tiles. That's for the next episode, though, so we're closer to the end of this episode than you might think.

Generate the Code

Now, as luck would have it, Go has very good JSON support, provided I can put my elements in the right form for it to work correctly. Sadly, when I first tried this, I was very close but not quite close enough. The fixes are mainly of the form of changing field names from being unexported (with lower case first letters) to exported, and adding explicit fields to identify the types of repository entries and actions.

Having done that, we can add a Json() method to the repository:
package compiler

import (
    "encoding/json"
    "log"
    "maps"
    "slices"
)

type Repository interface {
    Layout(lineNo int, name string, rows []RowInfo)
    Method(lineNo int, name string, actions []Action)
    Json() []byte
}

CDP_EXPORT_JSON:cdp-till/internal/compiler/repository.go

And implement it in the obvious way:
func (r *RepositoryStore) Json() []byte {
    bs, err := json.Marshal(slices.Collect(maps.Values(r.entries)))
    if err != nil {
        log.Printf("Error %v\n", err)
        return nil
    } else {
        return bs
    }
}

CDP_EXPORT_JSON:cdp-till/internal/compiler/repository.go

The malarkey inside the call to Marshal() is just turning the map into a slice. For updating, having a map is useful, but "as code", it doesn't make a lot of sense. Of course, we will probably turn it back into a map when it gets to the other end 🙂

And Wire Everything Up

Now we have all the pieces, and it's just a race to the finish to wire it all up. Let's start at the top in main:
func main() {
    reloader := chrome.NewReloader("http://localhost:"+port+"/", "http://localhost:9222")
    go watcher.Watch("website", reloader)
    repo := compiler.NewRepository()
    recompile := generator.NewCompiler(repo, "samples", reloader)
    go watcher.Watch("samples", recompile)
    server.StartServer(":"+port, repo)
}

CDP_COMPILER_WIRING:cdp-till/cmd/till/main.go

We need a "top level" action for the compiler to parse the code and store it in the repository. We wire this up to our watcher and also connect it to the reloader. If the sample file changes, we will regenerate the code and then reload the web page. In the fullness of time, that will turn around and reload the code. As with the reloader, we make sure this is invoked on startup so that there is always code available even if the file is not changed.

Let's look at that NewCompiler function:
package generator

import (
    "github.com/gmmapowell/ignorance/cdp-till/internal/compiler"
    "github.com/gmmapowell/ignorance/cdp-till/internal/parser"
    "github.com/gmmapowell/ignorance/cdp-till/internal/watcher"
)

type Compiler interface {
    watcher.FileChanged
}

type CodeGenerator struct {
    repo     compiler.Repository
    srcdir   string
    reloader watcher.FileChanged
}

func (c *CodeGenerator) Changed(file string) {
    c.repo.Clean()
    parser.Parse(c.repo, c.srcdir)
    c.reloader.Changed(file)
}

func NewCompiler(repo compiler.Repository, srcdir string, reloader watcher.FileChanged) Compiler {
    ret := &CodeGenerator{repo: repo, srcdir: srcdir, reloader: reloader}
    ret.Changed("")
    return ret
}

CDP_COMPILER_WIRING:cdp-till/internal/generator/generator.go

It creates a CodeGenerator, a wrapper around Parse which first makes sure the Repository is empty, then recompiles all the code and notifies the reloader.

Finally, we update the HTTP Server:
func StartServer(addr string, repo compiler.Repository) {
    handlers := http.NewServeMux()
    index := NewFileHandler("website/index.html", "text/html")
    handlers.Handle("/{$}", index)
    handlers.Handle("/index.html", index)
    favicon := NewFileHandler("website/favicon.ico", "image/x-icon")
    handlers.Handle("/favicon.ico", favicon)
    cssHandler := NewDirHandler("website/css", "text/css")
    handlers.Handle("/css/{resource}", cssHandler)
    jsHandler := NewDirHandler("website/js", "text/javascript")
    handlers.Handle("/js/{resource}", jsHandler)
    repoHandler := NewRepoHandler(repo, "application/json")
    handlers.Handle("/till-code", repoHandler)
    server := &http.Server{Addr: addr, Handler: handlers}
    err := server.ListenAndServe()
    if err != nil && !errors.Is(err, http.ErrServerClosed) {
        fmt.Printf("error starting server: %s\n", err)
    }
}

CDP_COMPILER_WIRING:cdp-till/internal/web/server.go

We create a RepoHandler, providing it with a pointer to the repository (which it shares with the parser), and then attach that to the Mux at /till-code.

The RepoHandler looks like this:
type RepoHandler struct {
    repo      compiler.Repository
    mediatype string
}

func (r *RepoHandler) ServeHTTP(resp http.ResponseWriter, req *http.Request) {
    bs := r.repo.Json()
    resp.Header().Set("Content-Type", r.mediatype)
    resp.Header().Set("Content-Length", fmt.Sprintf("%d", len(bs)))
    resp.Write(bs)
}

func NewRepoHandler(repo compiler.Repository, mediatype string) http.Handler {
    return &RepoHandler{repo: repo, mediatype: mediatype}
}
This simply recovers the JSON generated by the Repository and packages it up over HTTP.

And then we can test that we can retrieve the generated code:
$ curl http://localhost:1399/till-code
[{"EntryType":"layout","LineNo":9,"Name":"main","Rows":[{"LineNo":10,"Label":"row1","Tiles":["Coffee","Tea","Steamer","Water"]},{"LineNo":11,"Label":"row2","Tiles":["Strong","Black"]},{"LineNo":12,"Label":"row3","Tiles":["Dairy","blank","Oat","Almond"]},{"LineNo":13,"Label":"row4","Tiles":["blank","blank","blank","blank"]},{"LineNo":14,"Label":"row5","Tiles":["NEXT","blank","blank","DONE"]}]},{"EntryType":"method","LineNo":33,"Name":"Coffee","Actions":[{"ActionName":"style","LineNo":34,"Styles":["noun"]},{"ActionName":"assign","LineNo":35,"Dest":"noun","Append":["Coffee"]},{"ActionName":"disable","LineNo":36,"Tiles":["all"]},{"ActionName":"enable","LineNo":37,"Tiles":["Strong","Black","milks"]},{"ActionName":"enable","LineNo":38,"Tiles":["NEXT"]}]},{"EntryType":"method","LineNo":40,"Name":"Tea","Actions":[{"ActionName":"assign","LineNo":41,"Dest":"noun","Append":["Tea"]},{"ActionName":"disable","LineNo":42,"Tiles":["all"]},{"ActionName":"enable","LineNo":43,"Tiles":["Black","milks"]},{"ActionName":"enable","LineNo":44,"Tiles":["NEXT"]}]},{"EntryType":"method","LineNo":46,"Name":"Water","Actions":[{"ActionName":"disable","LineNo":47,"Tiles":["all"]},{"ActionName":"enable","LineNo":48,"Tiles":["NEXT"]}]},{"EntryType":"method","LineNo":50,"Name":"Oat","Actions":[{"ActionName":"style","LineNo":51,"Styles":["adjective"]},{"ActionName":"assign","LineNo":52,"Dest":"adjs","Append":["Oat"]},{"ActionName":"disable","LineNo":53,"Tiles":["milks","Water"]}]},{"EntryType":"method","LineNo":63,"Name":"NEXT","Actions":[{"ActionName":"assign","LineNo":64,"Dest":"order","Append":["noun","adjs"]},{"ActionName":"clear","LineNo":65,"Vars":["noun","adjs"]},{"ActionName":"enable","LineNo":66,"Tiles":["all"]},{"ActionName":"disable","LineNo":67,"Tiles":["NEXT"]}]},{"EntryType":"method","LineNo":22,"Name":"init","Actions":[{"ActionName":"enable","LineNo":23,"Tiles":["all"]},{"ActionName":"disable","LineNo":24,"Tiles":["NEXT"]},{"ActionName":"disable","LineNo":25,"Tiles":["DONE"]},{"ActionName":"assign","LineNo":26,"Dest":"milks","Append":["Oat","Dairy","Almond"]},{"ActionName":"assign","LineNo":27,"Dest":"drinks","Append":["Tea","Coffee","Steamer","Water"]}]},{"EntryType":"method","LineNo":55,"Name":"Dairy","Actions":[{"ActionName":"assign","LineNo":56,"Dest":"adjs","Append":["Dairy"]},{"ActionName":"disable","LineNo":57,"Tiles":["milks","Water"]}]},{"EntryType":"method","LineNo":59,"Name":"Almond","Actions":[{"ActionName":"assign","LineNo":60,"Dest":"adjs","Append":["Almond"]},{"ActionName":"disable","LineNo":61,"Tiles":["milks","Water"]}]},{"EntryType":"method","LineNo":69,"Name":"DONE","Actions":[{"ActionName":"submit","LineNo":70,"Var":"order"},{"ActionName":"clear","LineNo":71,"Vars":["order"]},{"ActionName":"enable","LineNo":72,"Tiles":["all"]},{"ActionName":"disable","LineNo":73,"Tiles":["NEXT"]},{"ActionName":"disable","LineNo":74,"Tiles":["DONE"]}]}]

Conclusion

When I was planning out this project, the idea of creating a language and compiler from scratch seemed somewhat daunting. Not in the thought of doing it per se, but doing it as scaffolding for writing a debugger plugin. But I decided that with enough simplifications and compromises, I could build a complete compiler in a day. I'm glad to say I've managed that.

I didn't help myself by frittering away two hours this morning finishing off the previous blog post, and writing this one certainly slowed me down, but it's important to recognise that I have made major compromises.

First off, this language is very simple. Secondly, the compiler's only reaction to bad input is to panic, and that won't help users a great deal. On top of that, there aren't any of the usual checks you would see: are things defined if they're referenced? What about types? In short, as a compiler, this is much like JavaScript was back in the days when it was LiveScript.

But the important thing is that I have something that works and that, I believe, can be used to effectively control a limited experience in the browser. Next time, we will take this JSON output, treat it as a program, and use it to simulate a till.

Then we can start building our debugger plugin.
