Wednesday, August 21, 2024

Reading the Configuration


One of the things that is tricky in these situations is you simultaneously want to change things and want to have them remain the same while you are changing the way it works.

I know I definitely don't want to make any changes to all the (dozens, at least) of source files I have. I want all of that to keep on working just as it has.

But I do know that the whole point of this is to make the configuration more modular, so I will ultimately have to change the configuration files so that they specify the modules in use. Part of me wants to do that up front, but the sensible part of me wants to do it last. So, for now, as I start to pull things out of the "main" code into modules, I am just going to "hardcode" the loading of the modules so that the configuration files do not change. Only at the last minute will I then quickly change all of the configuration files and strip out the hardcoded module definitions.

The ReadConfig class

I may have mentioned before that I write a lot of parsers. And in some sense, this project is a string of parsers connected in one way or another. But I would say that, in fact, it is more just a "text processor" at the moment. There is no abstract grammar involved - I just look at each line of text in context and do a switch based on the first token on the line.

And, tempting though it might be, I'm not going to change that. Very simply because in making everything modular, I am making it less susceptible to an overarching model. That would promote consistency (which is always a good thing), but would also place arbitrary constraints on how the modules work and potentially on literally what they can do; and that is a bad thing. I don't know what every single module might want to do, so I will keep the interface as simple as possible.

The ReadConfig class itself, though, knows almost nothing about this. It just creates a ConfigParser class and pumps it the contents of the Place that it is passed in.
        public Config read() throws ConfigException {
                ConfigParser parser = new ConfigParser(universe, place.region());
                place.lines(parser);
                try {
                        return parser.config();
                } catch (IOException ex) {
                        throw new ConfigException("Could not read configuration " + place + ": " + ex.toString());
                } catch (Exception ex) {
                        throw WrappedException.wrap(ex);
                }
        }

The ConfigParser class

The ConfigParser as it currently stands has too many responsibilities: in reality, it should take the input lines and turn these into a set of actions and inform a listener that something has happened. Because the configuration is hierarchical, it should keep a hierarchy of listeners that reflect the level of nesting. At the various levels, some of the rules remain the same; others change.

In fact, this class has the responsibility for creating the final configuration, as well as constructing all the objects from the parsed configuration. My first task is going to be to break these out. While I'm about this, I'm going to move all of the parsing code into a new package config.reader, and try and leave the actual configuration classes in the config package.

A Nasty Surprise

I was in the middle of reorganizing all of the configuration code, and was about halfway through, when suddently it asked me to authenticate with Blogger. I wasn't expecting that. It turns out that the constructor for the BloggerSink immediately contacts Blogger to find out which posts are live. This seems out of order to me, but, when I thought about where I would expect it to go, I started looking for a phase where back ends are initialized, and realized there is no such place. So I think I will have to consider this when revisiting main() and make sure that for each phase there is a clear "initialize front end processors" and "initialize back end processors".

Interestingly, the drive loader did not seem to load anything from Google Drive during the configuraion step. But did it load the index file from disk? Or is that something else again? It seems like there may be more gremlins hiding in this code than I had realized.

Losing Control ...

It's usually about this point when a major fork appears in the road and it feels like I am losing control of the changes. This is happening here. The fork appears when you have to choose between nailing down one section of the code, and keeping the code working. There is no "one, true" path. This is the price that must be paid for not doing things right in the first place.

In this case, I have the choice between taking the configuration all the way up front, or building some "scaffolding" to keep all the other cases working while I work through the entirety of one case (say the blogger portion). Given that there are no satisfactory solutions, it won't be surprising to hear that I have never found one. And, this time, as usual, I have decided that I don't like the inefficiency of building scaffolding so that I can keep jumping around in the code; I would rather tackle one part of the code and nail it down properly. On the other hand, I am doing the minimum for the moment to truly "clean up" the code; I am just moving what I have to around to make the configuration more "modular": the rest of the code is still hard-coupled together.

In some ways, this is a problem - for a while at least, I'm not going to be able to test anything. And I'm certainly not going to be able to release anything. And there is a lot of work left undone. On the other hand, this always was a huge task, and the configuration files are the one thing in all the projects that need to change (after all, we are trying to introduce modules into the configuration), so once we are through this phase, everything should settle down a little. So my "justification" is: we can't go all the way to our destination on a nice, smooth road, and this feels like the shortest amount of off-roading, so let's get on with it and back onto a road where all our (regression) tests pass again (i.e. all the existing cases where I have used the formatter).

One of the consequences of this choice is that some of the aspects of the configuration are going to have rough edges for now: for example, in refactoring the way in which we think about filesystems, the "Google Drive loader" is no longer specifically loading from Google Drive: it's loading from an abstract point in the filesystem. That needs to be cleaned up, but it's not urgent. We are also saying "why not allow multiple loaders?" but that is not going into the code just yet. So in these cases, we may have some scaffolding; and in others, the code may just be inconsistent. I'm not even 100% sure that by the end, when I declare "victory", it will all be done. Of course, if this were a professional project, the existence of end-to-end tests covering all possible cases would force me into making sure that was done. But in this case, I am just using my existing projects to drive the work forward, and when they all pass I will be happy.

Likewise, one of the biggest things I need to do is to make the system extensible, and that means pulling all of the module code out of the main body of code. As I'm going, I'm trying to move code into packages called "*.modules.name.*", but what I'm not doing yet is requiring the configuration file to have lines in it that look like:
module classname
But that is a step I need to go back to as soon as I have finished the first go-around of modifying the configuration processor.

Then Just Like That ...

And then, suddenly (as always), I make a couple of changes and I'm out the other side. All the configurations load and parse, and when I turn the rest of the formatter back on I'm just a couple of bugfixes away from everything (mainly) working.

The very fact that you are reading this says that I have all of the blogger code working at least to a first pass.

The one exception is that in reorganizing the configuration of the processor files, I explicitly said some variables would be used to configure "modules" - but nowhere did I specify those modules or make the code pass the variables to them. Consequently, those features simply don't work.

I'm not sure that was what I was going to work on next, but it is "divergent" - that is, I would expect it to create more problems than it fixes - so I think I'll tackle that next.

ScriptFormatter main()


Putting in place all the regression testing I wanted took more time than I would have liked. There were a number of reasons for this.
  • Actually isolating the intermediate form and writing it to a file took some time;
  • Reading this back in and decoding it to text, along with making sure everything was correct took a while;
  • Finding all the places where I had used ScriptFormatter, adding them to a regression script and then getting them all to work again (ScriptFormatter has "drifted" over the years, and with no automated testing :-));
  • Of particular issue was the "presentation" module, which (understandably, if wrongly) worked completely differently and didn't split into a "front" and "back" end.
The last one required me to do considerably more surgery than I was comfortable with on a piece of code that I'm trying not to adjust. But there you go. The price you pay.

But once I had done all that, I had a regression suite I was happy with, and that seemed to work quite well.

Replacing main()

My first goal was to try and pull main() into the shape I wanted. I suspected it would not be too hard, and it wasn't. It was basically in the right shape already, just with too many bits and pieces distributed in the wrong places for my liking. While I was about it, I moved "everything else" somewhere else. The idea of course, is to be able to make all of "everything else" be completely testable, and for main() to be the only thing that deals in "real" classes. It's worth pointing out that I haven't done that yet, but it gives that vague appearance. I will come back to that, although I haven't quite decided what gets shared here and what doesn't.

So, for now, main() looks like this:
package com.gmmapowell.script;


import com.gmmapowell.geofs.Universe;
import com.gmmapowell.geofs.lfs.LocalFileSystem;
import com.gmmapowell.geofs.simple.SimpleUniverse;
import com.gmmapowell.script.config.Config;
import com.gmmapowell.script.config.ConfigArgs;
import com.gmmapowell.script.intf.FilesToProcess;


public class Main {
        public static void main(String[] args) {
                Universe uv = new SimpleUniverse();
                LocalFileSystem lfs = new LocalFileSystem(uv);
                try {
                        Config cfg = ConfigArgs.processConfig(lfs, args).read();
                        FilesToProcess files = cfg.updateIndex();
                        cfg.generate(files);
                        cfg.show();
                        cfg.upload();
                } catch (Throwable t) {
                        ExceptionHandler.handleAllExceptions(t);
                }
        }
}
Yes, I've included the whole thing (imports and all) to show that there isn't any trickery going on. I did keep two other (exception-related) classes in the same package, but that's it. Yes, the ExceptionHandler class is static, but even so, it can be unit-tested if you are that way inclined (I'm not, at the moment, but the goal here is testability, not test coverage).

The eagle-eyed among you will notice that even now this does not quite reflect the outline I presented above. Partly because the names are wrong, but mainly because the steps themselves are wrong. But, remember, this is just the first step of a much bigger task: as that task unfolds, I will come back to this and make subtle adjustments.

The key thing to note is that everything already goes through a single object - the configuration cfg - and this is already an interface. The static class ConfigArgs is responsible for creating an instance of the ConfigReader interface and present it to us, at which point we call the exposed read() method. This separation allows us to test the ReadConfig class directly without worrying about the pesky details of file systems and the like.

Moving On

Having got this into shape, I can now tackle the thing I most want to tackle - reading the configuration file and breaking this up into "modules" and then introducing a class to handle modular configuration and dispatch. I'm hoping that this will turn out to be just what I need when I come to processing the modules within my text files. Oh, and the thing that started me down the path: when I want to add nested modules within the modules.

I think this is the design I had in mind when I started this project, although after four years it's hard to be sure. But it certainly seems that a lot of the code falls that way. So let's get started and start refactoring the code that reads the configuration.

Tuesday, August 13, 2024

Rescuing ScriptFormatter

I don't like duplication. And I generally don't like graphical things. I do, on the other hand, like things that are shared seamlessly between environments.

So when it comes to writing documents, I have moved towards writing everything in Google Docs. But I don't format anything there. At least, I don't use the Google Docs formatting tools.

One reason is that I prefer structure over appearance: while it's important to me to be able to indicate that I want certain words to be bold or italic at the drop of a hat, I want to be explicit about chapter and section breaks and not have to guess what's going on based on fonts.

So when I was writing a movie script during lockdown, I didn't "format" it in my word processor, I just wrote what came naturally and then said "I'll write a program to transform that to the appropriate standard". And I did.

It was a quick hack, and it was somewhat experimental - it involved downloading from Google Docs, and it involved generating PDFs. And while I had sort of done those things before, all of this was basically new to me, so I just went ahead and did it. And I hardcoded things like styles as well.

When I wanted to format some documentation, I thought "well, I've got that, so I could just...".

When I wanted to write a book, I said "well, it isn't so very different...".

When I became completely bored of formatting this blog in the Blogger tools, I said, "well, I could integrate with Blogger...".

When I wanted to work on a presentation, I said, "it would be cool if I could extend this..."

And now I have a mess.

The Good

The good news is that with the amount of experience I have, I did at least lay out a basic design and structure first time up that reflects at least a four-stage pipeline. There are the obvious two stages of "front end" (parsing all the input files into an intermediate form) and "back end" (turning that into something readable). There are also the stages of collecting the input files (e.g. from Google Drive) and uploading the final result somewhere (e.g. to Blogger). The process of configuring the tool might be considered a separate stage (or two). So we can say that there are six stages, and they are largely separated in the code.
  • Read a configuration file,
  • Configure the tool, finding any local files and loading modules,
  • Download source files from all appropriate remote locations,
  • Parse the source files,
  • Generate the output file(s),
  • Upload the files.
The individual processors are currently all classes in separate packages within the project, and have a flexible configuration mechanism. A lot of the processors use inheritance to share code, although in a haphazard way.

The Bad

Everything is currently monolithic and in the same project, and there is no ability to extend this.

There is quite a bit of duplication. For example, I have two different pieces of code (which work slightly differently) to access GIT repositories - one for the documentation tool and one for the blogger tool.

There are basically no tests of any description.

The Ugly

This is not really a piece of software that "does one thing well". In fact, it does a whole bunch of different things fairly well, but with no real consistency or vision. In other words, it's a hack.

I guess in a lot of ways, that's what I paid for when I built it. But it's not what I want now, and certainly not what I want going forward.

Often I up my game when I want to add new features. I don't see how to add a new feature without retrenching. On this occasion, however, it's just simply that I've reached a point where I can see more clearly what to "do one thing well" would look like. And thus how to factor (or modularize) this whole thing correctly.

For now, I'm going to do it all within one project, but I want to build it out to be extensible across multiple projects.

The Larger Design Space

In the original version of his paper on the spineless, tagless, G-machine, Simon Peyton-Jones described how different implementations of functional language machines originally "appeared as isolated 'islands'" but later it was possible to see how they related to each other in a "larger design space". Something very similar has happened here: I have five or six applications all of which fall into the overall design space of "converted annotated input from somewhere (possibly with additional metadata from elsewhere) into some readable document somewhere".

That is the one thing we want to do well. Including, of necessity, the task of finding all the modules that the user wants to incorporate and plugging them all together.

Testing and Testability

One thing I'm going to insist on as we do this is that everything I build should be testable at multiple levels:
  • I want lots of unit tests wrapped around the individual "functions" within modules;
  • I want to be able to test the configuration tool works correctly;
  • I want some tests which assert that the interfaces to the external world (e.g. Google Drive, Blogger, git) work;
  • I want the ability to create "golden" tests for whole phases - particularly the "front end" - that enable me to assert that certain collections of features work together nicely.
I tend to distinguish between testing and testability for a couple of reasons: one is that I am simply not very good at always "testing first" - I am often driven more by the overall goal or sweep of a story rather than specifically writing tests; the other is that it takes a lot more effort to make something testable than it does to write a specific test - so once you have made something "testable", it becomes a lot easier to actually test it.

My goal of this rescue project is to do the hard yards of making everything testable. I can then write tests as and when I feel like it.

Testability and the File System

"Always design to interfaces, not classes", we are told, and then we are given filesystem abstractions that are nothing like that. I don't understand why.

Over my life, I have tried fifty different ways to resolve and simplify the poor tools that we are given for dealing with files. So here is number 51. The main goal is testability, but I'm also keen on reducing bloat, making sure that it is easy to do the things I want to do, thus reducing duplication, and encouraging a "tell-don't-ask" style.

The Universe/World/Region/Place model

If you start of thinking about "content" rather than files, you immediately move to something like a URI notation rather than a path. I'm going for something similar, except I want more "active" objects rather than just a path that you then need to do something with.

So every content document is somewhere in "the Universe". This "Universe" consists of providers, some of which are going to be on your local filesystem, some could be on remote servers (such as NFS or SMB), some could be in databases, and some could be in the cloud (Google Drive, iCloud or S3). I am going to model each of these providers as a "World", although I accept that there might be situations where a single provider would offer multiple "Worlds" for some reason (e.g. AWS might offer S3v1 and S3v2, built against V1 and V2 of the S3 client library).

Each world can offer a default "root" and multiple named "roots". A root is basically the starting point for navigation and gives you access to a "Region". Loosely, a "Region" corresponds to a directory, a folder, a label or a prefix (in S3). Each Region can then nest other Regions, and finally, within some Regions it is possible to find "Place"s. A Place is simply a content file.

The only solid class you need to create to make all of this happen is the Universe - after that, you go the Universe to ask it for a World; you ask the World for a Region; Regions for subregions and Places; and you ask Places for their contents.

Each of these objects also has other operations you can perform - such as creating new Regions and Places, or writing to a Place. And each of these operations happens by interacting with an object you have already been given, not by creating a new concrete reader or writer implementation.

This, of course, makes the whole thing testable: simply have your main() implementation create the "real" Universe and pass it in to your code; alternatively, create a double of the Universe (or smaller object) in your test code and pass that in to a class under test.

In addition to be testable, this is also extensible: either explicitly or by using a service provider model, it is possible to have the "real" Universe pick up drivers for other Worlds and configure them appropriately. As long as they implement all the necessary methods, they should just slot straight in.

Initial Impressions

Basically, I'm really happy with this. It took me a couple of days (admittedly, spread out over a week or two, but that's just the pace at which I'm working at the moment) to convert everything and implement most of this, along with unit tests and integration tests against Google Drive.

One of the things that I struggled with was how much of the code was very similar and extracting that. I don't think I want all of these things to inherit from some base classes, so I created some utility classes instead that can take the abstract references (to Worlds, Regions and Places) and then do the hard work in a central place. One of the things I wanted Places to do was offer a lines method, that did all the hard work to provide the contents of the Place one line at a time: I found myself duplicating code about LineNumberReaders all over the place. Of course, I don't want to do that; I want to share that code, while having the code that can acquire a Reader from the Place.

Configuration and Modularity

In every project, there is a "bootstrapping" problem. In this case, it is: "how do you find the configuration file?". There are two parts to the answer to this question. The first part is that the application is going to take exactly one command-line argument, which is the name of the configuration file. The second part is that the configuration file must be somewhere in "the Universe" before configuration happens. For me, for now, that means it is on my local file system, and the driver for the local filesystem is loaded into "the Universe" in main(). After that, other worlds can be configured as modules in the configuration file.

This is, of course, not the only way of doing this. One alternative is to use some kind of "service provider" pattern in which any World implementation found on the classpath is automatically added to the Universe before looking for the configuration file. I have done this in other places on other occasions. I have never been as happy with it as I would have liked. It gives the appearance of reducing duplication, because you are not specifying a driver both in the classpath and in the configuration file. And when everything works, this is great. But when things do not work, you can spend forever trying to track down an obscure bug (I had this recently trying to use the javax.mail replacement: there was a nested dependency I had not included, and none of my IMAP messages were being decoded; four hours of my life I will not get back). But the thing is, at the end of the day, we are not talking about true duplication here: the classpath specifies where to look for what drivers are available; the configuration file selects which of these should be used. The real advantage is that if you want to use something that is not available, you fail early: an error message tells you that the driver you want is not available.

A Modular Pipeline

So, basically the tool that we are building is a configurable pipeline which:
  • Reads the configuration file;
  • Identifies the modules to be loaded and loads them;
  • Figures out which modules should be used in this configuration;
  • Configures them with any options the user might have provided;
  • Asks the "download" modules to collect all the relevant files;
  • Asks the "front end" modules to interpret the files and produce an intermediate form;
  • Asks the "back end" modules to convert the intermediate form into a final form;
  • Asks the "distribution" modules to upload files as necessary.
Note the plurals there. I think it is very reasonable that you could want, for example, multiple sources for your input files. Possibly multiple locations from the same system, or possibly a combination of systems (e.g. some local files and some from Google Drive). It's also possible you want to generate PDF, EPUB and HTML from the same intermediate form.

The core code then, should be very simple. Probably 20-30 classes with a total of about 1-2 kloc. All the hard work will be done in the modules. But this is great, because the more modular everything is, the easier it is to test close to the bone. The main code should be very short indeed (10 lines?) with only a couple of concrete class names being invoked in order to get everything started.

Testing from the Top

But the current implementation has few, if any, tests. How do I know if I'm going to break anything?

Good question. I'm going to have a three part strategy:
  • First off, I can eyeball things if I want to. For example, I'm writing this blog post in parallel with making the changes, and I'm checking through it as I go. If I break something obvious in the Blogger pipeline, it will show up.
  • I can compare "before" and "after" files somewhat automatically. For example, I can generate (but not upload), all my historical blog posts and then compare the output HTML to the HTML actually on my blog. Actually, I think I could add a different "distribution" module which directly compared the generated HTML to what's already up there (rather than uploading it). I love it when you can use something you're developing to help you solve its own problems! It is possible to do much the same for PDF using Acrobat.
  • I am going to (early on) add code to "capture" the intermediate form in a file. After doing this, I will be able to write "golden tests" for the front end which transform a known input in a test directory and then assert that it generates the correct intermediate form.
And then I hope to strangle the rest of it by adding more and more tests.

Right Now

I've implemented the "Universe" model - at least as far I need to - although I still want to clean it up and factor more things out.

Before getting in to the hard work, I want to save the intermediate form to a file to make "big picture" testing easier.

Then I want to try and separate out the "pipeline" from the modules, while currently hardwiring in the modules so that things keep on working. My goal is to try and reduce main() to about 10 lines which reflect the claims I made above about it being an 8 step pipeline.

I'm not going to blog continously about this, (or generally provide code samples), but you can find all of the code on github. From time to time, particularly if I have done anything interesting, or feel I have achieved something clear, I may follow up with another post.

Tuesday, January 30, 2024

Collecting the FBAR Data


You may recall from last year that I used Playwright to try and automate the process of filling out the US Government's Foreign Funds form FBAR.

The problem was I had to collect all the data by hand and put it into a JSON document to start the process. Not ideal. Especially when I keep all the information I need about what I own in a spreadsheet already. That is, I keep all the data in a spreadsheet. There aren't any calculations in the spreadsheet.

For the simple reason that I hate spreadsheets. Yes, they are fine for entering and holding data, but they suck as far as doing calculations. When I have this strong an opinion (especially when it isn't widely held), I do like to check it with a little rational analysis. Something like this:

Interrogator: Why do you feel this way about spreadsheets?

Me: Data and code are different things. You shouldn't mix them. Spreadsheets jumble data and calculations together and then make you repeat the calculations over and over in different cells when they apply to different data elements. And if you ever want to change the way you do a calculation ... Life isn't supposed to be that way.

Interrogator: So what's the alternative?

Me: To use individual sheets to store your data in much the same way you store data in a (relational) database - each sheet represents a table and each row represents a single instance of a common item type: not dissimilar to a struct or class in a programming language.

Interrogator: If you like databases so much, and they already exist, why not use one?

Me: Because I do like spreadsheets from a data entry point of view; they are much easier to manipulate and use for visualization than a database is.

So, in short, I want to enter data in a spreadsheet, organise it like a relational database, and then access it using a programming language with a functional metaphor. While I'm doing exactly what I want, I'm going to invoke event sourcing and star schemas. If you think that star schemas are all about "big data", you may be confused by the fact that what I'm talking about here is just a few hundred rows of data. But what I'm after is the fact that a star schema almost perfectly models the concept of event sourcing: for example, the ability to support metadata that changes slowly. If you own an asset for long enough, something will always change: the name, registered address, contact person, whatever. Not to mention the fact that what you own also changes, and thus you need to keep a record of what you currently own and what needs to be reported on. I may only own a few assets - and value each of them 2 or 3 times a year - but over a long enough period of time, it is complicated enough that you want that level of help in tracking what is currently owned and thus needs to be reported.

So let's start off with a nice, simple abstraction. We can have a set of asset valuations. Say we have three assets: A1, A2 and A3. Each of these has a particular value on a particular date. We thus create a table with columns for date, symbol and value. So far, so good. When we say we are modelling this using event sourcing, what we mean is that we never delete or update a row. The table just keeps on growing. We have a complete history of all the valuations we have ever recorded.

So how do we know what the "current" value is? Well, that information may not (in fact, generally is not) in the spreadsheet. But we can find the most recent valuation. We simply filter for the asset we're interested in and then sort by date to find the most recent one. This is not hard using the UI. It is hard (although possible) using things like VLOOKUP and pivot tables. You may want to do that; I don't.

However, I claim that adding the notion of star schemas and slowly changing dimensions makes it simply impossible to do in a spreadsheet (although I'm more than happy to admit I'm wrong if you want to show me how, it still won't change the fact that I don't want to do it that way). So, for example, we can add a second sheet that indicates the kind of information - such as registered address - that we need to have to report an FBAR. This can change over time, so we have each row in this sheet be keyed by date and asset - just like the main valuation table - and do the same "reduction" operation to find the most recent version of the information. We can then "join" these - in the sense intended by standard relational databases - on the asset symbol to come up with all the relevant information about that asset.

We can, of course, do more. And we need to in order to complete the FBAR. For example, we only want to submit information relevant to a given calendar year - so we need to trim our data set to exclude all the dates that are not in the last calendar year. Except, for the dimensional data, we need to make sure that we have the data which was up to date at the end of the year, even if it was from a previous year (the nature of slowly changing dimensions is that they change very slowly).

The final thing that we need to talk about before launching into code is the fact that I want to approach this in a "functional" or a "pipeline" way. I've written before about how, in my mind, these two things are the same thing, just presented differently, much like "Polish" and "Reverse Polish" mathematical notation. So my approach is to use a functional language to describe what I want to do, and thus build up a model of a pipeline; the program will then interrogate that model and design an appropriate pipeline with the requisite delivery between components to try and ensure the minimum amount of rework is done.

Fortunately, I carry a functional language compiler with me everywhere I go: as part of my day job, I have the FLAS compiler which is, fortunately enough, designed to be embeddable and extensible. Just the thing we need for this job.

Let's get started!

Modelling the Data

The whole point here is to store the data in a spreadsheet, and, because we want to store different kinds of data, we will use multiple sheets for this. Each sheet has a name, and that is how we will find it to load it. For our purposes, it does not matter that much which database is used: I started off doing this using Excel but moved to Google Sheets a few months ago (transferring the data was actually harder than I'd expected, but I think I've sorted all the issues out now). Each sheet, then, contains rows of data of a particular type with a header indicating the field. So we can model these in a programming language using a record-like construct. In FLAS, this is called a struct, just like it is in C.

Let's start with what I'm going to describe as "the fact table", the table where I record the value of each asset every time I look or they send me a statement:
    struct Valuation
        Date date
        String code
        Number quant
        Number unitprice
        Number total
        String comment
There are a few more fields here than I outlined above, but hopefully the reasoning is obvious: some assets come as a quantity held and a price per unit: the total is obtained by multiplying these together; others just come with a simple total. We need to support both classes in this sheet and struct (yes, we could put them in different sheets and use different structs to hold them, but that is not the choice I made). There is also a field for comments that I have used from time to time, in particular to record "odd" things that have happened, such as an asset being split or redeemed or something that means that it appears to change value dramatically, but there is a good reason.

While we're on historical reasons, I have some assets in British Pounds, and some in US Dollars. In order to be able to do anything (and, in particular, complete an FBAR), it's necessary to convert between these. So from time to time I look up the exchange rate between these two and store it as the asset GBP with a value of a Pound in Dollars ($1.26/£ last time I checked), although I'm starting to think that it's a sufficiently different beast that it should go in its own sheet as its own dimension. The clue is really in the fact that it has its own struct.
    struct ExchRate
        Date date
        String symbol
        Number rate

Querying the Sheets

Yes, I really do think about this as if it were a database. So I'm going to go ahead and write some "operations" that enable us to load data from the sheets. There are two "operations" for this: LoadFromXLS and LoadFromSheets. As noted, I'm fairly agnostic about where the data comes from - you just need to have the appropriate adapter. And these are the two that I have written: Excel and Google Sheets.

What do I mean by "operation"? Well, in building up a data processing pipeline, we are going to do all sorts of things such as loading, transforming, merging and printing. Each of these steps is an "operation". Except, in line with almost all things functional, we are going to describe our desire or ability to do an operation, but not do it yet. We will only do that when we get to the very end and know what output is needed. Don't waste work.

So we describe our desire to load information from Google Sheets by creating a struct of type LoadFromSheets defined as follows:
    struct LoadFromSheets
        String fileId
        String sheetName
        Type emit
And so we can load our valuation fact table like so:
    values = LoadFromSheets sheetId "Valuations" (type Valuation)
This says that we would like to load a spreadsheet from Google Sheets. Obviously in order to do this, we need to tell it which sheets document, which sheet within that document, and then the kind of struct we want to map from there. Utlimately, it will attempt to match up (as best it can) the field names in the struct to the column headers in the spreadsheet and be ready to load the table.

Where did sheetId come from? Well, it's basically an environment variable. In this "Modeller" app, we can access external variables of this kind by asking the builtin object modeller to find a named variable, like so:
    sheetId = modeller.getvar "sheetId"

Extracting the Exchange Rate

I noted above that for historical reasons, the exchange rate is mixed in with everything else. In order to extract it, we need to filter and map the Valuation objects to find the ones that have the symbol GBP and then convert them into ExchRate objects. This is a mix of "operations" (Map and Filter) with vanilla functional code (isExch and toExch):
    exch = Map toExch (Filter "code" isExch values)
        isExch s = s == "GBP"
        toExch (Valuation v) = ExchRate (v.date) "GBP" (v.unitprice)
This first finds all the entries in the Valuation table which have the code GBP. It then converts these from a Valuation to an ExchRate giving the date, symbol and unit price.

Dimensions

I'm not going to claim that the FBAR is the most complicated form ever, but it does have a lot of requirements, and a lot of moving parts that need to be coupled together. To make all this happen, we have four additional dimensional tables. Again, these are tables which include information about the assets that just doesn't change very often: for example, your address or the name of the asset.

We load all of them and populate objects in much the same way we loaded the fact table.

The first is not FBAR-specific, but is the "core information" about an asset:
    struct CoreInfo
        Date date
        String code
        String name
        Boolean active
        String currency
        String accountNo        
    coreInfo = LoadFromSheets sheetId "CoreInfo" (type CoreInfo)
The FBARInfo is the information specific to FBAR associated with a given asset:
    struct FBARInfo
        Date date
        String code
        String status
        String mergeAs
        String assetType
        String institution
        String accountNo
        String address
        String city
        String postCode
        String country
    fbarraw = LoadFromSheets sheetId "FBARAssets" (type FBARInfo)
There are also two tables which are needed in order to complete the form. FBARBase contains the boring information that goes on the front page and FBARUser gives information about the several owners of the assets.

Preparing the Data

There are many shenanigans that go on in preparing the data to be joined together: mainly around the task of choosing the most recent information. I'm going to skip most of it, because it's a lot of hard work, but I claim it is fairly well described in the code itself.

I am going to present one piece of the code which I think is fairly typical of these shenanigans. In order to report on last year's valuations, we need to trim the Valuations table down to just a mapping of asset name to highest value. This is handled in these three lines of code:
    lastYear = Filter "date" isYear values
The first line extracts only the Valuation entries which have this year's date (the function isYear is not shown for simplicity). Then highestLastYear is defined using the KeyedTable operation, which selects exactly one item for each matching key (the key in this case is just the code field, but it is possible to use multiple fields in the key). The function selectHigherValue is specified here to decide how to choose between any pair of source entries: in this case we choose the one with the higher value in the total field.

Doing the Join

Now that we have all of our tables in place, we can do a star join. The result will be a stream of values, each of which will have one fact and one item from each of the dimensions with which it is joined. Each source fact will only appear at most once; the dimensional data may be repeated. This is achieved by defining a new type which can hold the joined data:
    struct FBARStar
        CoreInfo core
        FBARInfo fbar
        Valuation highest
Then we can specify the actual join itself. This requires the Star operation, which takes the output type (FBARStar as above) together with a list consisting of the field names where the various parts of the join are to be place, the field in each table which will be used to do the join, the fact "table", a list of dimensional tables and finally the inclusion semantics (see below).
    fbarstar = Star (type FBARStar) ["highest", "core", "fbar"] ["code", "code", "code"] highestLastYear [fbarCore, fbardim] [False, True]
Getting just slightly ahead of ourselves, this works by looking at each entry in turn in the fact table. An attempt is made to join it to each of the dimensions in turn. If the join fails for a given dimension, it may either produce a null, or the whole fact may be thrown out. This is determined by the "inclusion semantics". If set to True for a given dimension, then the dimension must join in order for an output row to be produced; if False, an output row will be produced anyway, but that dimension will show up as null. In this case, we want the presence of rows in the final output to be conditional on the FBARInfo being there for them, so we specify True for that table.

Producing JSON

In order to connect to the automated Playwright tool to fill in the form (which I wrote last year), it will be necessary to produce a JSON file. Of course, this is not technically necessary, it would be possible to have this Modeller program directly talk to and control Playwright. But, for now at least, I am happy to run this as two separate steps, with a very visible (and checkable!) barrier between the two in the form of a JSON file.
    fbar = JsonStore 
              fbarFile                                        // the output file
              {}                                            // a base hash to become the document
              [                                                // a list of injections
                ("", injectbase, base),                        // inject the "base" information into the root of the object
                ("users", injectuser, users),                // users
                ("solos", injectsolo, soloassets),            // solo assets
                ("joints", injectjoint, jointassets)        // joint assets
              ]
Wow! That looks complicated. But wait, and sit with it for a while.

The first line is basically just saying that the rule fbar is a JsonStore command; and the second identifies the file where the resultant JSON should be stored. But what is the JSON? Well, it's a Hash object (i.e. a JavaScript object) which is populated from the various different sources of information that we have. The third line defines a "base" JSON object, which in this case is the empty hash. The final six lines offer four "injections" into this object. That is, each of them is a triple of a location in which to store the information; a function which injects the information into that location; and a pipeline expression which has the data to be injected. There is, of course, a bunch of plumbing that goes into supporting that, but it is all just boilerplate functional code. It's there in the repo if you want to look at it, but I'm not going to dwell on it here.

That variable fbar is not special, as such, but it is the one that we pass to Modeller on the command line to say that we want the effect of evaluating that pipeline. And that's what happens: the spreadsheets are read, the transformations done, the calculations calculated, and the JSON file is spat out at the end. It may be possible to do all of that in Excel, but I wouldn't like to try. It wasn't quick getting to this point this way, either, but most of that time was spent writing the tool, which is reusable work 🙂

How Does this Work?

OK, so now we've described what we think we want to do. How does the Modeller program deal with that? Well, it starts by finding and evaluating the constant specified in its command line (in our case, fbar). It expects that to form some kind of graph (specifically it expects it to be directed and without cycles) of operation instances. It then works its way through the graph until it reaches the leaves, which should be sources, that is, operations like LoadFromSheets that can magic information out of nowhere. It then inverts the graph, connecting each source to all the operations which reference it, until it comes together back at the root of the graph with an operation we expect to be a sink, that is an operation such as JsonStore which knows how to consume data (in this case storing it to disk) but doesn't produce any.

As it does this, it records all of the sources in a list to be invoked, and then asks each of them to start generating data (in this case, reading it from a spreadsheet). As each of them reads a row from the spreadsheet, it passes it to all of the interested parties (or "subscribers"). They, in turn, do all of the processing they want to do and spit out their outputs in turn to their subscribers. This process continues until there is no more data coming out of the sources and, as this knowledge trickles through the network, the activity slowly dries up until finally the JsonStore is told that it has all the information it needs and it can stop updating the object and write the whole thing to disk. Note that nowhere in here are there any guarantees about ordering: we don't say which sheets will be read in which order, or whether the rows in a sheet will be presented in any particular order. It is up to the components (and, to a lesser extent, the code) to make sure that it is agnostic to the ordering of information by concentrating on the data itself and not the order in which the data arrives. For example, if you want something sorted by date, choose the record with the appropriate date, not just the first or last record to arrive.

There is a small wrinkle in all of this which is vaguely disquieting to me. Having worked on such systems in the past (at a bigger scale), it feels that this kind of flow should be continuous: the idea that we read from a sheet and then stop is vaguely unsettling to me. But it does mirror the small-scale workflow that we are targetting here. If we wanted to make it continuous, we would need to add a clock to the system which, from time to time, sent a message to all of the sources, saying "act as if you've just finished". As each source received this message, it would send out appropriate updates, followed by a message that said "this is a checkpoint, pass it on". This would carry on up the chain until the JsonStore wrote out a version of the file with the most-current data. As I say, this could be done, but really doesn't suit the kinds of very small workflows I have in mind. It is easier just to run this once a year (or once a month if you want a summary of your assets every month).

But That's Not All

Now, I claim that just being able to do all that is pretty cool. But actually, you can do so much more. For example, the monthly valuation report uses the same data (although just the Valuation fact table together with the CoreInfo dimension table). But it pulls it together differently and then summarizes it into a PDF file (using the MakePDF sink rather than JsonStore). This uses (some would say abuses) the notion of cards that is central to the conception of FLAS as a web programming language.

The user defines and then creates a card as a recipient for one or more streams of data. As that data is delivered, the card processes it and renders the output as HTML in real time. Modeller can then read that HTML and turn it into PDF using styles specified by the user in a format not dissimilar to CSS.

Final Notes

Because, while this is an adventure into the unknown, it is not an adventure into the complete unknown starting from scratch, it is in its own GitHub repository at https://github.com/gmmapowell/modeller. Everything here is current at the tag MODELLER_FOR_FBAR_BLOG.

The question is bound to arise as to why I did not simply do everything is a straightforward functional fashion - after all, if we can express our intent in a functional program, why not just code it that way? Again, the answer really comes down to history and experience. When I first started writing this (the evidence would seem to suggest over five years ago), the functional configuration was an aspiration (the FLAS compiler was in its infancy then) and I was basically trying to implement on a small scale an in-memory version of something like Ziggrid that I had been working on back in 2013. So I built an engine that worked that way, and hardcoded applications using it. And then over the past year I've backed my way into a more configurable thing.

But also, I have to give a plug to Phil Wadler's classic 1984 paper, and say that by inverting this as I have, I have once more shown that listlessness is better than laziness.

Conclusion

Even though I spent a lot of time building out the Modeller core, along with implementing all of this, and entering the relevant data into sheets, and then debugging it all, I'm going to say that this has been my easiest FBAR filing to date. But I'm hoping that it will be better in future years.

But more than that, I'm just really happy with separating code and data, and there could not be a clearer example of it than this.

I realise that this is a best a brief overview of what is (now) a large project. I may come back and tackle various aspects of it at other times, but to cover the whole project here would be an enormous task and, while I am very happy with the product, I am substantially less happy with the code, which is at best held together with string and sealing-wax.

Capturing Downloads in Playwright

When I left the FBAR project last year, I added a note that I had subsequently added more code to parse the real data from a JSON file, and used this to submit my form electronically, but that the completed form disappeared into the ether when I tried to download it. I resolved to fix the problem when I came back to it this year.

And now here I am.

The problem is, of course, that playwright (being a test tool) operates in a sandbox environment. So any downloads are put somewhere on the file system, but where? Having now figured this out, I can confirm that it also clears them up once the test has finished.

However, all is not lost. A quick web search reveals this suggestion from Microsoft. So let's try that (at least, translating to Java). In the same project, I added a new test script, TestDownloads and put (basically) this code into it:
        public static void main(String[] args) {
                try (Playwright playwright = Playwright.create()) {
                        Browser browser = playwright.chromium().launch(new BrowserType.LaunchOptions().setHeadless(false).setSlowMo(50));


                        Page page = browser.newPage();
                        page.navigate("https://pixabay.com/photos/flowers-meadow-sunlight-summer-276014/");
                        
                        page.onDownload(d -> { System.out.println("Download found at " + d.path());});
                        Thread.sleep(10000);
                } catch (InterruptedException e) {
                        e.printStackTrace();
                }
        }
The first few lines should be familiar. I'm navigating to a different (somewhat random) page as being somewhere that I can easily push a "download" button. The idea (as I see it) is that the onDownload method will fire the callback when it sees the download take place. But no joy. Nothing comes out.

It's interesting that this doesn't work. The documentation says it should, and simply not firing seems an egregious bug. But looking at this code, we don't interact with playwright after setting up the onDownload, so maybe it just never gets to fire (if it has to fire from the main thread, not a background thread). Is there some kind of loop we can go to?

I found waitForClose, and it seems reasonable that that would do the job. If nothing else, it seems an upgrade on that 10s sleep we've got there. But then, just down from waitForClose is waitForDownload, which also returns the Download object. Let's try that!
                        Download download = page.waitForDownload(() -> { });
                        Path isAt = download.path();
                        System.out.println("Download found at " + isAt);
Lo and behold, it works! More than that, the onDownload also fires, so now we have two lines of output. I also replaced my sleep with waitForClose, and that works too, including firing the onDownload callback (this is fired when the download happens; it doesn't wait until the browser is closed).

So, we have two ways of doing this: a notification method and explicitly waiting. I'm not entirely sure which one I prefer. The onDownload method has the advantage that it wil capture multiple downloads. The waitForDownload method has the advantage of clarity: we are clearly waiting for a download to happen.

So, now we have a path. This is where we learn that it is indeed under /tmp, and is indeed deleted when the test is cleaned up after we close the browser.

Copying the file

What we need to do is to copy the file somewhere else where it will be safe. Obviously, this should involve some kind of argument to the program - I'll do that in the "real" FBAR program - but the easier thing is just to say I'll copy it to "normal" /tmp.
        File isAt = download.path().toFile();
        File saveAs = new File("/tmp", isAt.getName());
        System.out.println("Copying " + isAt + " to " + saveAs);
        try {
                Utils.copyFile(download.path().toFile(), saveAs);
        } catch (IOException e) {
                e.printStackTrace();
        }
And, running this, we end up with:
-rw-rw-r-- 1 gareth gareth 370627 Jan 13 03:48 /tmp/5ea3c1a8-2dfc-416c-9559-13eed8f6153f
This is checked in and tagged as PLAYWRIGHT_CATCH_DOWNLOADS.

One Last Thing ...

When I went to actually submit the form and capture the download in real life, I found I couldn't. I can't rightly say if this is a change since last year, or if I did something cunning last year. But this year, there are JavaScript alerts that pop up when you sign and submit the form and (by default) Playwright just cancels them. A quick web search gives the answer though and a few minutes later I'd added this code:
                        page.onDialog(dialog -> {
                                System.out.println("Dialog message: " + dialog.message());
                                System.out.println("Dialog prompt: " + dialog.defaultValue());
                                dialog.accept();
                        });
I ran through it all and, lo and behold, everything worked and I had a copy of the downloaded form in my long-lasting directory.

Conclusion

Although nothing is ever as straightforward as you would hope, adding in all of the code to capture downloads and trap alert boxes in Playwright is not hard, and it's possible to submit an FBAR from a JSON file and then download the result. Woo-hoo!

Sunday, January 7, 2024

Getting to Grips with Memory Allocation

Now I want to go back and handle memory allocation properly. All of our buffers and arrays are statically allocated (and often reused). And we have the hack to address aligning message buffers that I want to remove.

It was at this point that I discovered the Embedded Rust Manual while searching for information about memory allocation. In particular, I became much more aware of the difference between the core crate and the std crate. Specifically, most of the hard work is done in the core crate and the main things that are missing are memory allocation and automated loading of libraries. Collections such as Vec are also excluded, but can be included separately by importing them from the collections crate. Macros such as println! also seem to be a problem, but it's one I've seen other people work around and will come back to.

Explicit memory allocation is covered in Chapter 7 of the Embedded Rust Manual and it seems at first glance to be really complicated. Now that I think I've understood it (having read numerous other references as well), let's test that by seeing if I can explain it while writing some code.

Let's start by writing the line of code that we really want, replacing the code that allocates a message buffer on the stack with one that allocates a 35-byte message on the heap and returns an array.
    let mut buf: &mut [u32] = allocate_message_buffer(35);
To implement this function we need to allocate a block of memory using the "global allocator" and then "turn that" into an array slice.

In order to use alloc directly, we need to add the following declaration at the top of our file:
extern crate alloc;
The documentation says that it is also necessary to specify #![feature(alloc)], but the compiler contradicts this, saying that this is only necessary for unstable features, and the alloc feature has been stable for a while, so the feature declaration is no longer needed.

The function to allocate the memory we need for our message buffer is then fairly easy to write, with the caveat that it involves some arcane Rust magic:
fn allocate_message_buffer(nwords: usize) -> &'static mut [u32] {
    unsafe {
        let ptr = alloc::alloc::alloc(alloc::alloc::Layout::from_size_align_unchecked(nwords * 4, 16)) as *mut u32;
        alloc::slice::from_raw_parts_mut(ptr, nwords)
    }
}
Starting in the middle, the normal way of referencing alloc would be std::alloc::alloc, that is, the alloc function in the alloc module of the std crate, but because we aren't using std but the alloc crate direcly, it becomes alloc::alloc::alloc. Likewise, the Layout class would normally be referenced as std::alloc::Layout but becomes alloc::alloc::Layout.

The key thing we want to achieve is 16-byte alignment, so we specify a Layout with a size of nwords words and an alignment of 16. It's my impression that both of these values are expected to be in bytes, but I can't claim that I have read anything which makes this absolutely clear. We then pass the resultant Layout object to the alloc method which returns a *mut u8 pointer, which we then cast to a *mut u32 pointer.

However, what we really want to do is to return an array slice, which is a "regular", stack-allocated object which wraps this pointer together with other meta-information (specifically, the size of each element and the number of elements that can be referenced without overflow) which can be used "safely", that is, with full compiler checks and no need for the unsafe keyword.

To effect this transition, we call the slice::from_raw_parts_mut function, passing in the pointer and the available number of words (I think it deduces the size of each entry from the type of the pointer). This constructs and returns an array slice, which we then return to the caller.

Most of the return type - &mut [u32] - makes perfect sense to me, but the compiler forced me to add the 'static lifetime. I'm still a Rust newbie, so I can't claim to understand this, but if it's what the compiler requires, it "must be" right - at least until it isn't or until I know more. My instinct is that we want to copy what we have created on the stack into the parent, not return something with the static lifecycle, but since I don't understand what I am doing, I will follow the compiler's advice until I run into trouble.

There are a couple more changes that I need to make in order for all the code to be consistent. Obviously, I want to remove the copy into the Message struct, and we need to then convert the array slice address to a raw pointer using the as_mut_ptr.
    mbox_send(8, &mut buf);

    let volbuf: *mut u32 = buf.as_mut_ptr();
We need to change the signature of the mbox_send to remove the explicit array length, and likewise change the code to extract the pointer.
fn mbox_send(ch: u8, buf: &mut[u32]) {
    while mmio_read(MBOX_STATUS) & MBOX_BUSY != 0 {
    }

    // obtain the address of buf as a raw pointer
    let volbuf = buf.as_mut_ptr();
So now we can try building this. As you might expect, we get linking errors:
   Compiling homer_rust v0.1.0 (/home/gareth/Projects/IgnoranceBlog/homer_rust)
    Finished release [optimized] target(s) in 0.20s
aarch64-linux-gnu-ld -nostdlib boot.o ../target/aarch64-unknown-linux-gnu/release/libhomer_rust.rlib -T linker.ld -o kernel8.elf
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/release/libhomer_rust.rlib(homer_rust-2e5114238dd615aa.homer_rust.f56f16d2546ffac4-cgu.0.rcgu.o): in function `kernel_main':
homer_rust.f56f16d2546ffac4-cgu.0:(.text.kernel_main+0xb8): undefined reference to `__rust_no_alloc_shim_is_unstable'
aarch64-linux-gnu-ld: homer_rust.f56f16d2546ffac4-cgu.0:(.text.kernel_main+0xbc): undefined reference to `__rust_no_alloc_shim_is_unstable'
aarch64-linux-gnu-ld: homer_rust.f56f16d2546ffac4-cgu.0:(.text.kernel_main+0xcc): undefined reference to `__rust_alloc'
make: *** [Makefile:14: kernel8.img] Error 1
Having said I was expecting linking errors, I don't think I was expecting these linking errors: I don't know what rust_no_alloc_shim_is_unstable even is, and I was thinking the rust_alloc would be defined and have some internal logic to handle the actual allocation process.

Anyway, I'm now going to go ahead and implement an allocator as laid out in the Embedded Rust Book chapter on alloc and see what happens.

I'm going to put the allocator in a new module, so in lib.rs, I added the following line toward the top of the file:
mod allocator;
and created a file allocator.rs.

The first thing I want to do is build enough code that the program links again, and then worry about whether it works or not - once it builds, I can always connect a debugger if I need to in order to see what is happening inside.

Following the instructions in the guide, the first thing we are going to do is use the global_allocator attribute to declare an allocator called HEAP.
use core::alloc::GlobalAlloc;

#[global_allocator]
static HEAP: HeapAllocator = HeapAllocator {
};

struct HeapAllocator {
}
In order to be able to fulfill the role of global_allocator, it is necessary to implement the GlobalAlloc trait. This has two methods, alloc and dealloc. For now, while we are just trying to get something to link, we are going to just return 0 in alloc. We don't need to implment dealloc at all yet.
unsafe impl GlobalAlloc for HeapAllocator {
    unsafe fn alloc(&self, _layout: core::alloc::Layout) -> *mut u8 {
        // simply return 0 - won't work but will link
        0x0 as *mut u8
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: core::alloc::Layout) {
        // we don't support deallocation yet
    }
}
Excellent. That links and we can try running it. Unsurprisingly, it doesn't resize the screen and doesn't display Homer. OK, I can live with that for now.

Moving on, let's try and allocate a heap and then have our allocator return the block at the beginning of that area.

It seems to me that the right way to do this is to allocate a .heap section in the linker script, linker.ld. I'm going to put it at the end, after the .bss section and before the .end (obviously):
    . = ALIGN(4096); /* align to page size */
    __bss_end = .;
    __bss_size = __bss_end - __bss_start;

    __heap_start = .;
    .heap :
    {

    }
    . = 0x100000;
    __heap_end = .;
    __heap_size = __heap_end - __heap_start;
    
    __end = .;
This declares a .heap section and three symbols: __heap_start, which marks the start of the block, immediately after the end of the .bss section, aligned to a 4096-byte page boundary; __heap_end, at the end of the heap, which I have explicitly forced to be at 0x100000; and __heap_size, the size of the heap, which the linker calculates as the difference between these two.

Now we can update the allocator to include __heap_start:
extern {
    pub static __heap_start : *mut u8;
}
and return it from the alloc function:
    unsafe fn alloc(&self, _layout: core::alloc::Layout) -> *mut u8 {
        // just return the top of heap
        __heap_start
    }
Strangely, this now goes back to issuing some linker errors:
   Compiling homer_rust v0.1.0 (/home/gareth/Projects/IgnoranceBlog/homer_rust)
    Finished dev [unoptimized + debuginfo] target(s) in 0.06s
aarch64-linux-gnu-ld -nostdlib boot.o ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib -T linker.ld -o kernel8.elf
aarch64-linux-gnu-ld: ../target/aarch64-unknown-linux-gnu/debug/libhomer_rust.rlib(homer_rust-c9f0f32593953886.1y1lhhvr9vp1xcdg.rcgu.o): in function `alloc::alloc::alloc':
/rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/alloc.rs:92: undefined reference to `__rust_no_alloc_shim_is_unstable'
aarch64-linux-gnu-ld: /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/alloc/src/alloc.rs:92: undefined reference to `__rust_no_alloc_shim_is_unstable'
It's that symbol __rust_no_alloc_shim_is_unstable again. I suspect that this is something to do with unstable Rust builds; let's track that down.

This thread explains that it is there as a hint to remind implementors that some aspects of the API may change in the future. I'm not quite sure what that means, but it sounds a bit like signing a legal contract. I don't see how it will actually help me in the future, unless it is going to be replaced by _is_stable or something when the API changes, which will stop the program linking again until I make the necessary changes.

Anyway, it seems to be declared in the allocator API and we just need to define and export it:
#[allow(non_upper_case_globals)]
#[no_mangle]
pub static __rust_no_alloc_shim_is_unstable: u8 = 0;
This feels more wrangling than actual code: the first simply suppresses the warning associated with having to declare a C style variable: Rust expects static variables to be declared in upper snake case. The second stops the name (which has already been mangled) being mangled again.

Running the application with this allocator in place, we now see Homer again.

Adding in a trace statement to check on the address that is returned from our allocator, I'm staggered to see it returning 00000000. How can that be? And if so, why did it not work when I returned 0 explicitly?

I suspect that there is something funky going on between declaring the variable __heap_start in the linker and using it here. So let's attach the debugger again. Remember that we have scripts scripts/debug.sh to compile and start the program, and scripts/gdb.sh to run the debugger. When we do this, we can put a breakpoint in the allocate function:
(gdb) b allocator.rs:21
Breakpoint 1 at 0x83364: file src/allocator.rs, line 21.
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, homer_rust::allocator::{impl#0}::alloc (self=0x8a163, _layout=...) at src/allocator.rs:21
21                __heap_start
(gdb) x/5i $pc
=> 0x83364 <_ZN89_LThomer_rust..allocator..HeapAllocatoru20asu20core..alloc..global..GlobalAllocGT5alloc17h5b531fd15c42200cE+16>:        adrp        x8, 0x8b000
   0x83368 <_ZN89_LThomer_rust..allocator..HeapAllocatoru20asu20core..alloc..global..GlobalAllocGT5alloc17h5b531fd15c42200cE+20>:        ldr        x8, [x8, #16]
   0x8336c <_ZN89_LThomer_rust..allocator..HeapAllocatoru20asu20core..alloc..global..GlobalAllocGT5alloc17h5b531fd15c42200cE+24>:        ldr        x0, [x8]
   0x83370 <_ZN89_LThomer_rust..allocator..HeapAllocatoru20asu20core..alloc..global..GlobalAllocGT5alloc17h5b531fd15c42200cE+28>:        add        sp, sp, #0x20
   0x83374 <_ZN89_LThomer_rust..allocator..HeapAllocatoru20asu20core..alloc..global..GlobalAllocGT5alloc17h5b531fd15c42200cE+32>:        ret
So it seems that what is happening here is that it is treating __heap_start not as a pointer itself, but as the location where the desired value can be found. Consequently, it is dereferencing it and returning the value in memory at that location - which could be anything but happens to be 0. So we need to take the address of __heap_start and then work through all the insanity of casting it to the value it needs to be:
    unsafe fn alloc(&self, _layout: core::alloc::Layout) -> *mut u8 {
        // just return the top of heap
        (&__heap_start as *const _) as *mut u8
    }
In this context, *const _ refers to a "raw" pointer which is what we need to move from the pointer we have to the pointer we want. I think in the fullness of time the correct solution to declare the value of __heap_start not as a pointer but as an array or object or something from which we can more reasonably take the address. But this works, and the value shown by the debugger is now 00090000 which matches the value we can see using nm:
$ nm asm/kernel8.elf
...
0000000000090000 B __heap_start
...
This is checked in and tagged as RUST_BARE_METAL_MESSAGE_ALLOCATOR.

A Simple, Page-Based Allocator

So, the actual process of incorporating an allocator wasn't too bad. On the other hand, I haven't actually written an allocator yet - I have just implemented the API and it won't so much as handle two calls, let alone complexities such as reallocation or multi-threading.

Not surprisingly, the alloc documentation in the Embedded Rust Book suggests that you don't write your own allocator, but use a battle-hardened one. But that's not why we're here. We are here (as JFK would say) to "play Texas": these are precisely the things we want to do in order to see how they work and how we can deal with the problems that arise.

For now, at least, I want to write something very simple, which can handle both allocation and deallocation and can handle different sizes of memory, along with alignment.

One of the challenges of memory management is fragmentation, and knowing where the most suitable chunk of memory can be found. What I'm going to do to reduce this complexity is only offer three possible chunk sizes: 16, 256 and 4096 bytes. Each of these will be aligned to the same boundary as its size. If you want less memory than one of these, I'll give you the next size up. If you want more than 4096 bytes in a single allocation, you will see receive a series of pages, fresh from the heap, but at least they will be guaranteed to be consecutive.

Memory is often (generally?) issued in 4096-byte "pages". This is certainly the case in the ARM v8 MMU. So if we handle memory internally in pages, we will probably find life easier than if we do something else.

The struct associated with the global_allocator can hold arbitrary data items, although obviously this is all allocated in the data segment, not on the heap. We can use this to hold references to our currently available pages.

My basic strategy is to divide up the memory between __heap_start and __heap_end into pages, and to allocate one of these each time we have run out of free memory. We will call this next_page. We can either allocate the new page as a whole page, as 15 256-byte blocks, or as 255 16-byte blocks. If a new page is requested and the pointer is already at __heap_end, we will return nulll (indicating out of memory, which apparently causes a panic in _rust_alloc). There is no garbage collection here, we are only going to reuse memory which has been deallocated.

We also need pointers to the free lists, which we will call free_16, free_256 and free_4096. Note that even when someone requests a whole page, we may have pages on hand which have been previously allocated and then deallocated which we can hand them. We will initialize each of these pointers to 0, indicating that there are no free entries available for that size.

When an alloc request is received, if a suitable block is on the free list, we want to return that, and update the pointer to point to the next free block. If the pointer is 0, then we need to allocate a whole new page, and then initialize it. What is meant by initialization? For a whole page, nothing is required, but for the other cases, we need to do two things:
  • The first entry (somewhat wastefully) contains an integer which is the size of the entries in that block, i.e. either 16 or 256. We will use this in dealloc to determine how big the block was that we issued.
  • We need to set up a linked list through the remaining entries in the block, so that when we return one of the entries in the block, we know where the next one is.
OK, ready to start?

This is a non-trivial piece of code, and it is made more complicated by the fact that I am abusing Rust without really knowing what I'm doing. I am going to press on regardless, and get something to work which is in about the shape that we want. I'm then going to double back and use automated tests (now that I know how) to flesh out the implementation.

First off, I'm going to change __heap_start to be a u32, since we're only taking its address anyway. I'm also going to introduce __heap_end as a u32, so that we can detect when we run out of memory.
extern {
    pub static __heap_start : u32;
    pub static __heap_end : u32;
}
I'm going to rename HeapAllocator to PageAllocator, since that more accurately reflects what it is doing, rather than the role that it is playing. And I am going to add four members: one to track where the next unassigned page is, and the other three to track where the heads of the three free lists are.
struct PageAllocator {
    next_page: UnsafeCell<usize>,
    free_16: UnsafeCell<usize>,
    free_256: UnsafeCell<usize>,
    free_4096: UnsafeCell<usize>
}
I spent a lot of time trying to figure out how to store these values, but I ended up settling for the same UnsafeCell<usize> that is used in BumpPointerAlloc in the embedded Rust book. But doing this causes an error to appear:
error[E0277]: `UnsafeCell<usize>` cannot be shared between threads safely
  --> src/allocator.rs:14:14
   |
14 | static HEAP: PageAllocator = PageAllocator {
   |              ^^^^^^^^^^^^^ `UnsafeCell<usize>` cannot be shared between threads safely
   |
   = help: within `PageAllocator`, the trait `Sync` is not implemented for `UnsafeCell<usize>`
note: required because it appears within the type `PageAllocator`
  --> src/allocator.rs:21:8
   |
21 | struct PageAllocator {
   |        ^^^^^^^^^^^^^
   = note: shared static variables must have a type that implements `Sync`
I spent some time looking into this, and trying to understand how the BumpPointerAlloc class worked, until I spotted this line in the example:
unsafe impl Sync for BumpPointerAlloc {}
Basically, just wish it to be true! The code explicitly says that the memory allocator is only valid in a single-threaded environment. Presumably it would be possible to make the allocator thread-safe by appropriate use of thread-aware constructs, but since we are (currently) also in a single-threaded environment, I'm not going to worry about this too much, and just copy it.
// OK to hack this because we are single threaded for now
unsafe impl Sync for PageAllocator {}
Having declared the allocator, we need to initialize it. I again spent a non-trivial amount of time trying to figure out how to pass __heap_start into the initializer before finally giving up and saying "that is just a case I will need to handle in alloc". So we initialize all the fields to zero instead:
#[global_allocator]
static HEAP: PageAllocator = PageAllocator {
    next_page: UnsafeCell::new(0),
    free_16: UnsafeCell::new(0),
    free_256: UnsafeCell::new(0),
    free_4096: UnsafeCell::new(0)
};
Now it's just a question of writing the actual allocation function. This has four cases, which are:
  • allocate a small block (16 bytes or less)
  • allocate a middling block (17-256 byres)
  • allocate a page (for requests from 257-4096 bytes)
  • fail for now (for larger requests)
    unsafe fn alloc(&self, layout: core::alloc::Layout) -> *mut u8 {
        if layout.size() <= 16 && layout.align() <= 16 {
            self.next(&self.free_16)
        } else if layout.size() <= 256 && layout.align() <= 256 {
            self.next(&self.free_256)
        } else if layout.size() <= 4096 && layout.align() <= 4096 {
            self.next(&self.free_4096)
        } else {
            core::ptr::null_mut() // this is allegedly what will cause OOM exceptions
        }
    }
So, yes, this does just delegate all of the hard work to a function called next. The reason for this is that the three cases where we do allocate memory are all very similar and we just need to consider which list to use (we also need to know the sizes of the block, but that code isn't written yet).

That function does all the hard work. I think I'm going to present it first, then talk about it.
impl PageAllocator {
    unsafe fn next(&self, list: &UnsafeCell<usize>) -> *mut u8 {
        let lp = list.get();
        if *lp == 0 {
            let np = self.next_page.get();
            // we don't have any free memory, allocate next page
            if *np == 0 {
                *np = (&__heap_start as *const _) as usize;
            }
            if *np >= (&__heap_end as *const _) as usize {
                return core::ptr::null_mut(); // we are out of memory
            }

            // TODO: still need to initialize inside of block if not 4096
            *lp = *np;
            *np = *np + 4096;
        }
        let curr = *lp;
        *lp = *(curr as *const usize);
        curr as *mut u8
    }
}
First, note that this is not in the same impl block as alloc. That's because that block is exclusively for implementing the methods of the GlobalAlloc trait. This is a new method exclusive to PageAllocator and so goes in its own block.

An UnsafeCell is an object which can hold a value and return a pointer to it when desired. We are using these here to store the head of a linked list inside our heap. In next, we start by obtaining and dereferencing this pointer to obtain the usize value. If it is the value 0, then we know we don't have any available items on this free list and we need to allocate some. Leaving aside that block, when we have an entry on the free list, we assign it to curr, the current value we will return, and then assign the contents of that memory address to the "head" pointer. This assumes that we have stored the next free pointer (if any) in this location previously, or 0 if there are no more free locations.

Now let's return to the base case, in which there are no entries on the free list. Here, we look at the next_page pointer, which is where we are storing the pages that haven't been used yet. If this is 0, then this is the first call to alloc, and we need to assign the address of __heap_start to the next_page pointer before continuing.

The next check ensures that the value in the next_page pointer is not past the address of __heap_end. If it is, then we have completely run out of heap and cannot allocate the requested memory. Following the example we are basing our code on, we return core::ptr::null_mut() to indicate that we could not perform the allocation, and it would seem that the parent code deals with the rest of the issues.

We have two final steps to complete: first, we need to initialize this block to make sure that it contains all the right values in all the right places (just a comment right now); and then we need to update the current free list to point to the page just allocated, and move the next_page pointer on by 4096 to point to the next page (which must either be the next free page or __heap_end).

For all that complexity, this code does exactly the same thing as our prior code in the sample we are using: it just returns the address of __heap_start. So, unsurprisingly, the code still works.

I have checked this in as RUST_BARE_METAL_PAGE_ALLOCATOR_1.

Completing the Implementation

So up to this point, I have been mainly fiddling in the dark without really knowing what I was doing and mainly just trying to abuse Rust to the point where it let me compile something. But now I have something working, I want to switch gears and complete the implementation in the canonical way - by writing unit tests.

So let's write a test, which we will place in its own module and file, tests.rs:
#[cfg(test)]
mod tests {
    use crate::allocator::PageAllocator;
    use core::{cell::UnsafeCell, alloc::GlobalAlloc};

    #[test]
    fn test_allocate_first_page() {
        let pa = simple_allocator();
        unsafe {
            let addr = pa.alloc(alloc::alloc::Layout::fromsizealign(4096, 16).unwrap()) as usize;
            assert_eq!(addr, 4096);
        }
    }

    fn simple_allocator() -> PageAllocator {
        return PageAllocator{
            next_page: UnsafeCell::new(4096),
            free_16: UnsafeCell::new(0),
            free_256: UnsafeCell::new(0),
            free_4096: UnsafeCell::new(0)        
        }
    }
}
The problem with this is that when I go to run the test, the linker gives me errors. Why? Well, I can't say it's entirely unexpected. We have written code that depends on __heap_start and that is a symbol we bind in during our explicit linking process in linker.ld. The cargo script doesn't know how to find it. However, the error message is as impressive as usual for Rust, and in addition to reporting the entire command line which is passed to ld through cc, it gives these errors and suggestions:
          /usr/bin/ld: /home/gareth/Projects/IgnoranceBlog/homerrust/target/debug/deps/homerrust-6186d6a57da04f90.3ti03rv249xsljw4.rcgu.o: in function `homer_rust::allocator::PageAllocator::next':
          /home/gareth/Projects/IgnoranceBlog/homer_rust/src/allocator.rs:58: undefined reference to `__heap_start'
          /usr/bin/ld: /home/gareth/Projects/IgnoranceBlog/homer_rust/src/allocator.rs:60: undefined reference to `__heap_end'
          collect2: error: ld returned 1 exit status
          
  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
I have checked the current state of the code in as RUST_BARE_METAL_PAGE_ALLOCATOR_TESTS_BROKEN, in large part so I can try different strategies and come back to a known state.

Interestingly, what my research reveals is that the solution to the whole problem is a lot simpler than seems to be indicated by the messages above. What I hadn't realized, but is clearly documented is that the #[cfg(test)] attribute is in fact a directive for conditional compilation comparable to using #ifdef in C. So we can put any code we want inside the #[cfg(test)] and it will be defined there and not in the live application. And, in contrast, we can define a block with #[cfg(not(test))] which will include code that is only defined when compiling not for tests.

So, I can add this directly into tests.rs:
    #[no_mangle]
    pub static __heap_start :u32 = 0;
    #[no_mangle]
    pub static __heap_end :u32 = 0;
And we can add yet one more attribute to the existing definition of the unstable shim variable in allocator.rs:
#[cfg(not(test))]
#[allow(non_upper_case_globals)]
#[no_mangle]
pub static __rust_no_alloc_shim_is_unstable: u8 = 0;
This stops it being defined more than once in the test scenario.

The tests still fail:
$ cargo test
    Finished test [unoptimized + debuginfo] target(s) in 0.00s
     Running unittests src/lib.rs (target/debug/deps/homer_rust-6186d6a57da04f90)
memory allocation of 48 bytes failed
error: test failed, to rerun pass `--lib`
This is not actually that surprising, because we have defined a #[global-allocator] and so the code we are trying to test is being used to allocate memory for the test. This was never going to end well.

But surely our new "get out of jail free" card can be played here as well:
#[cfg(not(test))]
#[global_allocator]
static HEAP: PageAllocator = PageAllocator {
    ...
};
Now it goes a little further, but still no joy:
$ cargo test
   Compiling homer_rust v0.1.0 (/home/gareth/Projects/IgnoranceBlog/homer_rust)
    Finished test [unoptimized + debuginfo] target(s) in 0.21s
     Running unittests src/lib.rs (target/debug/deps/homer_rust-6186d6a57da04f90)

running 2 tests
test tests::test_set_4_in_1_from_0 ... ok
error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/home/gareth/Projects/IgnoranceBlog/homer_rust/target/debug/deps/homer_rust-6186d6a57da04f90` (signal: 11, SIGSEGV: invalid memory reference)
SIGSEGV is never good. And it suggests that we have done something wrong with memory and pointers and all that unsafe stuff we've been messing around with. Basically, our code doesn't work. But at least it is compiling and running now.

Removing all the code from the test case gets the test to pass:
running 2 tests
test tests::test_set_4_in_1_from_0 ... ok
test allocator::tests::tests::test_allocate_first_page ... ok
So at least we have a baseline that compiles and links we can work from. I'm going to put the code back, and then check this in as RUST_BARE_METAL_PAGE_ALLOCATOR_TESTS_SEGV.

I don't know how to attach a debugger to cargo test, and, rather than find out right now, I've decided to fall back on the old tactic of commenting out code until I find the line that fails. It's this one:
        *lp = *(curr as *const usize);
But it's not obvious to me why. Presumably the variable curr is not correctly defined. Looking through my code, I see that in the test definition of the allocator, I set it to 4096 which is clearly not a valid address. I'd done that without thinking about the fact that I would be wanting to write to that address. So let's instead set it to zero, and then, now that we have __heap_start defined in the test, that should work.

Sadly, it still fails, but at least it's now an assertion failure:
assertion `left == right` failed
  left: 94405430325248
 right: 4096
So, now that we have reached this point, I think it's time to ask if we are going the way we want to go or not. So, again, I'm going to check in broken code as RUST_BARE_METAL_PAGE_ALLOCATOR_TESTS_FAILING.

I'm not really ready for how complicated this is becoming. But having floundered my way through it, I hope I can explain it in words of one syllable. It occurs to me that what I need to do (in the tests) is to use the standard allocator to allocate a block of (test) heap which I can then treat as if it were the entire heap available to the allocator under test.

So it seems to me that the right thing to do is to extract the section of code that handles initialisation when the initial value of next_page is 0. We can then wrap this up in a closure and pass it to the PageAllocator constructor.

So, in the test, this looks something like this:
            let blk = alloc::alloc::alloc(alloc::alloc::Layout::from_size_align(4096, 16).unwrap());
            let start = (blk as *const _) as usize;
            let f = || { (start, start+4096) };
The first line here allocates a block of memory on the heap which is exactly the size of a page. The second line converts the pointer to a usize value, and the third defines a closure which returns the start and end of the block.

Meanwhile, in the non-test case, we can extract the values of __heap_start and __heap_end as defined by the linker in a function:
    fn init_from_heap() -> (usize, usize) {
        unsafe { ((&__heap_start as *const _) as usize, (&__heap_end as *const _) as usize) }
    }    
We need to update the PageAllocator struct to store this field, as well as a field to hold the end value.
struct PageAllocator {
    next_page: UnsafeCell<usize>,
    end: UnsafeCell<usize>,
    free_16: UnsafeCell<usize>,
    free_256: UnsafeCell<usize>,
    free_4096: UnsafeCell<usize>,
    init: fn() -> (usize, usize)
}
What could possibly go wrong?

Well, quite a bit.

First, we have declared init to have a function type, but Rust views a closure as something different because it references (or "captures") the value of start.
error[E0308]: mismatched types
  --> src/allocator/tests.rs:27:23
   |
20 |             let f = || { (start, start+4096) };
   |                     -- the found closure
...
27 |                 init: f
   |                       ^ expected fn pointer, found closure
   |
   = note: expected fn pointer `fn() -> (usize, usize)`
                 found closure `[closure@src/allocator/tests.rs:20:21: 20:23]`
note: closures can only be coerced to `fn` types if they do not capture any variables
  --> src/allocator/tests.rs:20:27
   |
20 |             let f = || { (start, start+4096) };
   |                           ^^^^^ `start` captured here
This now sends us down a chain of corrections. OK, it's a closure, so we need to declare it as such. To declare a closure, you need to use a Rust trait, which, as I understand it, is similar to an interface in Java. But you can't just declare the closure type, as you can with a function, because each closure is different and requires the compiler to generate a different enclosing type (in this case PageAllocator) for each actual closure instance. So, we need to provide the type with a type attribute, and then express the constraint that it is a closure trait.
struct PageAllocator<F> where F: Fn() -> (usize,usize) {
    ...
    init: F
}
So far, so good. The problem is that we need to add a similar attribute everywhere we use PageAllocator.

In the actual initialization of the global_allocator:
    static HEAP: PageAllocator<fn()->(usize,usize)> = PageAllocator {
        ...
    };
In the test initialization:
    fn simple_allocator() -> (usize, PageAllocator<Fn() -> (usize,usize)>) {
        ...
    }
And then in each of the impl blocks for PageAllocator. Interestingly, in these cases, we need to declare the parameter all over again - it isn't automatically inherited from the struct declaration. I'm sure there are reasons for this, but I don't understand them.
impl<F> PageAllocator<F> where F: Fn() -> (usize,usize) {
    ...
}

unsafe impl<F> Sync for PageAllocator<F> where F: Fn() -> (usize,usize) {}

unsafe impl<F> GlobalAlloc for PageAllocator<F> where F: Fn() -> (usize,usize) {
    ...
}
OK, now does it work? Well, sadly not. The closure type cannot be used directly in the test initialization because the compiler cannot figure out what size it is. I think what it means is that it cannot figure out how to generate the code for the PageAllocator to store or invoke the closure, because I can't see how it doesn't know it when generating the code to call the constructor.
error[E0782]: trait objects must include the `dyn` keyword
  --> src/allocator/tests.rs:16:52
   |
16 |     fn simple_allocator() -> (usize, PageAllocator<Fn() -> (usize,usize)>) {
   |                                                    ^^^^^^^^^^^^^^^^^^^^^
   |
help: add `dyn` keyword before this trait
   |
16 |     fn simple_allocator() -> (usize, PageAllocator<dyn Fn() -> (usize,usize)>) {
   |                                                    +++
OK, so let's try adding a dyn as shown. This leads to another message:
error[E0277]: the size for values of type `(dyn Fn() -> (usize, usize) + 'static)` cannot be known at compilation time
  --> src/allocator/tests.rs:16:30
   |
16 |     fn simple_allocator() -> (usize, PageAllocator<dyn Fn() -> (usize,usize)>) {
   |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ doesn't have a size known at compile-time
   |
   = help: the trait `Sized` is not implemented for `(dyn Fn() -> (usize, usize) + 'static)`
note: required by a bound in `PageAllocator`
  --> src/allocator.rs:7:22
   |
7  | struct PageAllocator<F> where F: Fn() -> (usize,usize) {
   |                      ^ required by this bound in `PageAllocator`
help: you could relax the implicit `Sized` bound on `F` if it were used through indirection like `&F` or `Box<F>`
  --> src/allocator.rs:7:22
   |
7  | struct PageAllocator<F> where F: Fn() -> (usize,usize) {
   |                      ^ this could be changed to `F: ?Sized`...
...
13 |     init: F
   |           - ...if indirection were used here: `Box<F>`
All of these suggestions can be followed, and I did try a couple of them, but the Box seemed the simplest to get working:
    fn simple_allocator() -> (usize, PageAllocator<Box<dyn Fn() -> (usize,usize)>>) {
        unsafe {
            ...
            let f = || { (start, start+4096) };
            (start, PageAllocator{
                ...
                init: Box::new(f)
            })
        }
    }
So, now it works, right? Almost, but not quite.

Quite understandably, the compiler thinks that start will go out of scope before it is used in the closure:
error[E0373]: closure may outlive the current function, but it borrows `start`, which is owned by the current function
  --> src/allocator/tests.rs:20:21
   |
20 |             let f = || { (start, start+4096) };
   |                     ^^    ----- `start` is borrowed here
   |                     |
   |                     may outlive borrowed value `start`
   |
note: closure is returned here
  --> src/allocator/tests.rs:21:13
   |
21 | /             (start, PageAllocator{
22 | |                 next_page: UnsafeCell::new(0),
23 | |                 end: UnsafeCell::new(0),
24 | |                 free_16: UnsafeCell::new(0),
...  |
27 | |                 init: Box::new(f)
28 | |             })
   | |_______^
help: to force the closure to take ownership of `start` (and any other referenced variables), use the `move` keyword
   |
20 |             let f = move || { (start, start+4096) };
   |                     ++++
But at least the suggested solution is quite simple: add the move keyword to the definition of the closure.

And, yes, now everything works. And the test passes while the code keeps on running.

That's all checked in as RUST_BASE_METAL_PAGE_ALLOCATOR_FIRST_TEST. Before checking it in, I did a couple of quick refactorings to put all the non-test code in its own module, all neatly wrapped up in a single #[not(cfg(test))].

Tidying up

OK, I think everything from here on is straightforward TDD of a fairly normal piece of code - the fact that it will be used as a memory allocator can be safely ignored. Some refactoring happened as well, and I've replaced all the static memory allocation with dynamic allocation. I don't think any interesting issues came up.

The final version is checked in as RUST_BARE_METAL_PAGE_ALLOCATOR_COMPLETE.

Conclusion

We have successfully managed to incorporate memory management into our Homer example. That sorts out most of the issues we have right now and just leaves the fact that the code is a mess, but I'm not going to fix that in the Homer project. But before I start a whole new project, there are a couple of things I still want to try and understand how to do in the bare-metal context.