Saturday, May 24, 2025

Processing the Verbs


The final step of generating a GnuCash file is to process the verbs. Then we can wire everything up and let it rip.

Each "verb/action" combines two sources of information: the source and destination accounts from the configuration file together with the date, description and amount from the spreadsheet. There is also the odd case where the spreadsheet can "override" the source or destination accounts.

Configuration

We can start by defining the verbs in the configuration file:
type Configuration struct {
    APIKey       string
    OAuth        string
    Token        string
    RedirectPort int
    Spreadsheet  string
    Output       string

    Business Business
    Accounts []Account
    Verbs    []Verb
    VerbMap  map[string]*Verb
}

type Verb struct {
    Name   string
    Source string
    Dest   string
}

GNUCASH_SPREADSHEET_TO_XML:accounts/internal/gnucash/config/config.go

Because we want to configure the verbs as a list, but use them as a map, I have defined VerbMap here which is not read from the JSON file but filled in later:
func ReadConfig(file string) (*Configuration, error) {
    log.Printf("reading %s\n", file)
    bs, err := os.ReadFile(file)
    if err != nil {
        return nil, err
    }
    ret := Configuration{VerbMap: make(map[string]*Verb)}
    err = json.Unmarshal(bs, &ret)
    if err != nil {
        panic(err)
    }
    for _, v := range ret.Verbs {
        ret.VerbMap[v.Name] = &v
    }
    log.Printf("read %v\n", ret)
    return &ret, nil
}

GNUCASH_SPREADSHEET_TO_XML:accounts/internal/gnucash/config/reader.go

That completes handling the configuration.

Processing the Sheets

The bulk of the code we're going to add is in the AccountsTransformer. It receives a sheet and processes it.
func (at *AccountsTransformer) DeliverSheet(tabs []sheets.Tab) {
    fmt.Printf("%v\n", tabs)
    accts := writer.NewAccounts(at.conf)
    for _, tab := range tabs {
        verb := at.conf.VerbMap[tab.Title]
        if verb == nil {
            panic("there is no verb " + tab.Title)
        }
        for _, r := range tab.Rows {
            dt := r.Columns["Date"]
            var yr, mon, day int
            fmt.Sscanf(dt.(string), "%04d-%02d-%02d", &yr, &mon, &day)

            amount := r.Columns["Value"].(string)
            var amt writer.Money
            amount = strings.Replace(amount, "£", "", -1)
            amount = strings.Replace(amount, ",", "", -1)
            if strings.Contains(amount, ".") {
                var gbp, pence int
                fmt.Sscanf(amount, "%d.%d", &gbp, &pence)
                amt = writer.GBPP(gbp, pence)
            } else {
                var gbp int
                fmt.Sscanf(amount, "%d", &gbp)
                amt = writer.GBP(gbp)
            }

            tx := accts.Verb(verb, writer.Date(yr, mon, day, 900), r.Columns["Description"].(string), amt)
            if r.Columns["Source"] != nil && r.Columns["Source"] != "" {
                tx.SetSrc(r.Columns["Source"].(string))
            }
            if r.Columns["Dest"] != nil && r.Columns["Dest"] != "" {
                tx.SetDest(r.Columns["Dest"].(string))
            }
        }
    }
    at.dlvr.Deliver(accts)
}

GNUCASH_SPREADSHEET_TO_XML:accounts/internal/gnucash/accounts/transform.go

The verb is chosen based on the title of the tab. (This was not recorded when I wrote the original version of the spreadsheet reader, so I had to go back and add it.)

Then, all the Rows are processed sequentially. The date is recovered and parsed using Sscanf. The amount is extracted and all the punctuation is removed before it is parsed using Sscanf (I'm sure that it's possible to do this all in one step, but I didn't want to deal). I decided to consider the two separate cases of round pounds (which most of my transactions are) as well as pounds and pence (about 3).

Then we call the Verb method on GnuCash which is just a simple wrapper over the Transact method we wrote last time which does the combining of the verb and transaction details.
func (g *Gnucash) Verb(verb *config.Verb, date DateInfo, description string, amount Money) *Transaction {
    return g.Transact(date, description, verb.Source, verb.Dest, amount)
}

GNUCASH_SPREADSHEET_TO_XML:accounts/internal/gnucash/writer/gnucash.go

And finally we consider that the user might have defined (and put values in) columns called Source and Dest which override the configured columns by calling SetSrc or SetDest on the transaction:
func (tx *Transaction) SetDest(name string) {
    destGuid := tx.accountGuids[name]
    if destGuid == "" {
        panic("there is no account for " + name)
    }
    tx.destAcct.Value = destGuid
}

func (tx *Transaction) SetSrc(name string) {
    srcGuid := tx.accountGuids[name]
    if srcGuid == "" {
        panic("there is no account for " + name)
    }
    tx.srcAcct.Value = srcGuid
}

GNUCASH_SPREADSHEET_TO_XML:accounts/internal/gnucash/writer/gnucash.go

(In the previous episode, we ignored the transaction description, so I quietly reworked that to add it. I also realized that I had inverted the splits "from" and "to" so I changed them. On top of that, a whole bunch of other things had to change to make it possible to implement SetSrc and SetDest.)

Conclusion

And, for now, that is that. I ran through all my spreadsheets and after correcting a few data entry errors found that everything matched up.

I want to do two more things with this project, though. The first - and most important - is to see if I can use the GnuCash ixbrl-reporter project to automate generation of my accounts in iXBRL format (inline eXtensible Business Reporting Language). If I understand this correctly, if correctly configured, I should be able to generate an HTML document that will present my accounts. But, internally, it will have semantic tags that enable the government's computers to process it automatically. The other - which is somewhat related inasmuch as it supports me understanding how to do that - is to generate my own reports (i.e. annual P&L and Balance Sheets). I can do all that without going through GnuCash, of course - and my plan is to do just that by keeping track of all the accounts in memory and then spewing out the appropriate things.

Because it is a supporting task, and because I think I understand how to do it, I am going to do the second one first.

Friday, May 23, 2025

Generating the XML


I'm now going to skip to the other end, writing out the XML file, for a number of reasons:
  • I haven't done it before, so it seems high risk;
  • It connects to our other dependency in this project, so its shape is forced on us;
  • In order to build this out, I will need to define the structs that I want to transform into;
  • The middle piece is the most interesting, so I'm pushing it off until last.
So, for now, the middle stage of the pipeline is just going to print out the incoming sheet data and then call the final phase with initially no accounts. As I need them, I will add more test data there.

In the first episode, I toyed around with GnuCash, so I have a sample file. I'm going to try and produce something "similar" to that, although I doubt that I can generate an identical file (for example, the GUIDs will be different).

XML Marshalling

So the basic thing we want to do is take the incoming structs and marshal them to XML. There is a package xml and a method MarshalIndent that does exactly that. How hard can it be?

It's not that hard to do the writing:
package writer

import (
    "encoding/xml"
    "os"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
)

const (
    header = `<?xml version="1.0" encoding="UTF-8"?>` + "\n"
)

type Writer struct {
    Config *config.Configuration
}

func (w *Writer) Deliver(accts *Gnucash) {
    bs, err := xml.MarshalIndent(accts, "", "  ")
    if err != nil {
        panic(err)
    }

    withHeader := header + string(bs)

    os.WriteFile(w.Config.Output, []byte(withHeader), 0666)
}

func MakeWriter(conf *config.Configuration) *Writer {
    return &Writer{Config: conf}
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/writer.go

As much as anything, it's a pain that the xml package doesn't allow you to specify the doctype header, so there are hoops to jump through for that.

All the pain, of course, comes from having to specify the types to be marshalled and getting all the XML tags in place.

At the top level, we have the GnuCash definition which represents the whole file:
package writer

import "encoding/xml"

type Gnucash struct {
    XMLName xml.Name `xml:"gnc-v2"`
    Namespaces
    Elements
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

The field XMLName here is a special field whose value or tag specifies the "tag" associated with the element, in this case gnc-v2. The other two fields represent struct values which are anonymous: this means that the marshaller does not output any field name for them.
type Namespaces struct {
    GNC string `xml:"xmlns:gnc,attr"`
    ACT string `xml:"xmlns:act,attr"`
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

This is where we will put all our namespaces. Because it is included anonymously inside the GnuCash definition, these attr items are automatically included as attributes on the gnc-v2 element. Their values are defined on initialization to have the appropriate scopes:
func completeNamespaces(gnc *Gnucash) {
    gnc.GNC = "http://www.gnucash.org/XML/gnc"
    gnc.ACT = "http://www.gnucash.org/XML/act"
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

In the same way, we can define the anonymous field of type Elements to just be a slice of any, allowing us to include an arbitrary number of nested elements, each self-identifying:
type Elements []any

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

And then we can make a start on all the big elements in a GnuCash file, starting with CountData and Book:
type CountData struct {
    XMLName xml.Name `xml:"gnc:count-data"`
    Type    string   `xml:"cd:type,attr"`
    Count   int      `xml:",chardata"`
}

type AccountBook struct {
    XMLName     xml.Name `xml:"gnc:book"`
    BookVersion string   `xml:"version,attr"`
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

And it's fairly easy to create the basic set of these:
func NewAccounts() *Gnucash {
    ret := Gnucash{}
    completeNamespaces(&ret)
    ret.Elements = append(ret.Elements, NewCountData("book", 1))
    ret.Elements = append(ret.Elements, NewAccountBook())
    return &ret
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

func NewCountData(ty string, cnt int) *CountData {
    ret := CountData{Type: ty, Count: cnt}
    return &ret
}

func NewAccountBook() *AccountBook {
    ret := AccountBook{BookVersion: "2.0.0"}
    return &ret
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

And having done all that, we can see that we get the expected output in our output file:
<?xml version="1.0" encoding="UTF-8"?>
<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc" xmlns:act="http://www.gnucash.org/XML/act">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0"></gnc:book>
</gnc-v2>
Now, while I have checked this about five times, there's no guarantee that I've done it perfectly, so it may be that we later find out that it isn't valid GnuCash XML. Anyway, this is really very boring and tedious, so I'm going to go off and finish all the boilerplate stuff and I'll come back here when I want to start dealing with the actual accounts and transactions.

You can see the remaining changes in GNUCASH_WRITE_XML_2. Note that while most of this is "complete boilerplate", there are a couple of sections which copy data directly from the configuration file into the output file, which is something of a precursor to what is about to follow.

Defining Accounts

Ideally, I would like all of the account information to be in just one place and, logically, that would have to be in the spreadsheet. However, I've tried this before and spreadsheets are really only good at one thing - storing grid data. Everything else we try and do with them - charts, reports, presentations - just shows up how bad they are at doing anything else, which is why I built my own Modeller tool to do those things instead.

Also, the information I have in the spreadsheet is not arranged by account but by "verb" and so it's not really clear where this information would go anyway. I could have a separate tab with account info in, but at the end of the day I'd rather put it in JSON in the configuration file.

So let's do that.

The account structure is hierarchical, which means that accounts can contain accounts. In the GnuCash configuration, this is represented by parent pointers, but it is perfectly reasonable to do it using a list member for each account. So I will have an Accounts field which is a list of accounts, and each account can likewise have an Accounts field which contains a list of sub-accounts. If it doesn't, it is a leaf account.

GnuCash has a "Root Account" which is a fake account which is the parent of all the others. So let's generate that first before we try and map the others.

As with everything else, we need to start by defining the XML record types we want to write to the GnuCash file.
type Account struct {
    XMLName xml.Name `xml:"gnc:account"`
    Version string   `xml:"version,attr"`
    Elements
}

type AccountItem struct {
    XMLName xml.Name
    Type    string `xml:"type,attr,omitempty"`
    Value   string `xml:",chardata"`
    Elements
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go

Then in NewAccountBook, we can call mapAccounts, which will do all the hard work and return a list of accounts, and then use that response to set both the number of accounts and to include all the accounts in the list of Elements in the Account Book.
func NewAccountBook(conf *config.Configuration) *AccountBook {
    ret := AccountBook{BookVersion: "2.0.0"}
    bookId := BookId{Type: "guid", Guid: "95d515f6a8ef4c6fb50d245c82e125b3"}
    ret.Elements = append(ret.Elements, bookId)
    slots := BookSlots{}
    opts := MakeSlot("options", "frame")
    opts.Value.AnonymousValue = []any{FillBusiness(conf), FillTax(conf)}
    slots.Elements = append(slots.Elements, opts)
    ret.Elements = append(ret.Elements, slots)

    mappedAccounts := mapAccounts(conf)

    commodities := CountData{Type: "commodity", Count: 1}
    ret.Elements = append(ret.Elements, commodities)
    accounts := CountData{Type: "account", Count: len(mappedAccounts)}
    ret.Elements = append(ret.Elements, accounts)
    txns := CountData{Type: "transaction", Count: 3} // TODO: figure out this number
    ret.Elements = append(ret.Elements, txns)

    gbp := Commodity{Version: "2.0.0"}
    space := NewCommodityItem("space", "CURRENCY")
    id := NewCommodityItem("id", "GBP")
    gq := NewCommodityItem("get_quotes", "")
    qs := NewCommodityItem("quote_source", "currency")
    qtz := NewCommodityItem("quote_tz", "")
    gbp.Elements = []any{space, id, gq, qs, qtz}
    ret.Elements = append(ret.Elements, gbp)

    ret.Elements = append(ret.Elements, mappedAccounts...)

    return &ret
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go

And we can define mapAccounts and its supporting functions which, for now, just define the root account:
func mapAccounts(conf *config.Configuration) []any {
    name := NewAccountItem("name", "RootAccount")
    id := NewAccountItem("id", newGuid())
    id.Type = "guid"
    ty := NewAccountItem("type", "ROOT")
    curr := NewAccountItem("commodity", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    scu := NewAccountItem("commodity-scu", "100")

    rootAccount := Account{Version: "2.0.0", Elements: []any{name, id, ty, curr, scu}}
    return []any{rootAccount}
}

func NewAccountItem(tag, value string) AccountItem {
    name := xml.Name{Local: "act:" + tag}
    return AccountItem{XMLName: name, Value: value}
}

func newGuid() string {
    return strings.Replace(uuid.New().String(), "-", "", -1)
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go

And for all this hard work, we get a Root Account that looks something like this (the Guid changes every time we run it, obviously):
   <gnc:account version="2.0.0">
      <act:name>RootAccount</act:name>
      <act:id type="guid">1b69d9969d1640fba38a048c52a12c48</act:id>
      <act:type>ROOT</act:type>
      <act:commodity>
        <cmdty:space>CURRENCY</cmdty:space>
        <cmdty:id>GBP</cmdty:id>
      </act:commodity>
      <act:commodity-scu>100</act:commodity-scu>
    </gnc:account>
It's interesting that when I try and open this file from the command line, GnuCash starts but it opens the file I was experimenting with previously. I'm not sure why, but it's presumably not a very important bug. From the menu, I'm able to select the generated file and it opens properly and shows no account hierarchy or transactions, but I can see the company info in the Properties window.
$ open accounts.gnucash
Now that we have a root account, we have somewhere to attach our other accounts as we (recursively) traverse the configuration. So let's look at the configuration as read from the JSON file:
      "Accounts": [
                {
                        "Name": "Assets",
                        "Type": "ASSET",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Current Assets",
                                        "Type": "ASSET",
                                        "Placeholder": true,
                                        "Accounts": [
                                                {
                                                        "Name": "Checking Account",
                                                        "Type": "BANK"
                                                }
                                        ]
                                }
                        ]
                },
                {
                        "Name": "Income",
                        "Type": "INCOME"
                },
                {
                        "Name": "Expenses",
                        "Type": "EXPENSE",
                        "Accounts": [
                                {
                                        "Name": "Accountancy Fees",
                                        "Type": "EXPENSE"
                               }
                        ]
                },
                {
                        "Name": "Equity",
                        "Type": "EQUITY",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Capital Account - Gareth Powell",
                                        "Type": "EQUITY"
                                }
                        ]
                },
                {
                        "Name": "Liabilities",
                        "Type": "LIABILITY",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Expense Account - Gareth Powell",
                                        "Type": "LIABILITY"
                                },
                                {
                                        "Name": "Director Loan Account - Gareth Powell",
                                        "Type": "LIABILITY"
                                }
                        ]
                }
        ]
When it comes down to it, I may end up needing more or less accounts and they may need to be different to these. But I believe this is basically what I want.

I started off by adding the appropriate struct to the configuration, which then allows all of this to be read in by our existing code:
type Configuration struct {
    APIKey       string
    OAuth        string
    Token        string
    RedirectPort int
    Spreadsheet  string
    Output       string

    Business Business
    Accounts []Account
}
type Account struct {
    Name        string
    Type        string
    Placeholder bool
    Accounts    []Account
}

GNUCASH_MAP_ACCOUNTS:accounts/internal/gnucash/config/config.go

Mapping the accounts now becomes recursive, with the base case being mapping the Root account with the slice of accounts from the configuration and explicit settings for name and type, and then the recursive case taking its information from an Account object. Most of this is a refactoring from what we already had, but it was also necessary to "fill in" the features (such as parent) that were not used in the base case.

Most of the work is done by the recursive case makeAccount, but the helper function mapAccount massages the conf.Account structure into the right form.
func mapAccounts(conf *config.Configuration) []any {
    return makeAccount([]any{}, "RootAccount", "ROOT", "", false, conf.Accounts)
}

func mapAccount(mapped []any, acc config.Account, parent string) []any {
    return makeAccount(mapped, acc.Name, acc.Type, parent, acc.Placeholder, acc.Accounts)
}

func makeAccount(mapped []any, called, ofType, parent string, placeholder bool, accts []config.Account) []any {
    name := NewAccountItem("name", called)
    guid := newGuid()
    id := NewAccountItem("id", guid)
    id.Type = "guid"
    ty := NewAccountItem("type", ofType)
    curr := NewAccountItem("commodity", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    scu := NewAccountItem("commodity-scu", "100")
    acct := Account{Version: "2.0.0", Elements: []any{name, id, ty, curr, scu}}

    if parent != "" {
        desc := NewAccountItem("description", called)
        acct.Elements = append(acct.Elements, desc)
    }

    if placeholder {
        plac := NewAccountItem("slots", "")
        ps := MakeSlot("placeholder", "string")
        ps.Value.StringValue = "true"
        plac.Elements = []any{ps}
        acct.Elements = append(acct.Elements, plac)
    }

    if parent != "" {
        parElt := NewAccountItem("parent", parent)
        parElt.Type = "guid"
        acct.Elements = append(acct.Elements, parElt)
    }

    mapped = append(mapped, acct)
    for _, a := range accts {
        mapped = mapAccount(mapped, a, guid)
    }

    return mapped
}

GNUCASH_MAP_ACCOUNTS:accounts/internal/gnucash/writer/gnucash.go

And, just like that, all the accounts have been mapped.

Transactions

The transactions really need to come out of the spreadsheets, but only after being transformed. Remember that our spreadsheets hold "verb" statements and so need to be interpreted in the context of yet more configuration (to be defined in the next episode), so we need to have the middle layer provide something closer to what GnuCash expects.

On the other hand, the GnuCash format itself is horrible (because it is so closely tied to XML). So we want an intermediate form which is isomorphic to the GnuCash format, but simpler to use. While I'm making it simpler to use, I'm going to omit (at least for now) some of the more complex features (such as split transactions and the ability to handle multiple currencies) and say that a transaction consists of the date of the transaction, the name of a source account, the name of a destination account and a Money item, which is just a type to wrap up some of the complexities of handling money.

We can define a couple of sample transactions in the accounting pipeline in lieu of doing the real work of reading and processing a spreadsheet:
func AccountsPipeline(conf *config.Configuration) {
    w := writer.MakeWriter(conf)
    // TODO: these two lines skip reading the spreadsheet
    accts := writer.NewAccounts(conf)
    accts.Transact(writer.Date(2021, 10, 29, 1059), "Income", "Checking Account", writer.GBP(1))
    accts.Transact(writer.Date(2020, 12, 31, 1059), "Income", "Checking Account", writer.GBP(8624))
    w.Deliver(accts)
    // accts := accounts.MakeAccounts(conf, writer)
    // sheets.ReadSpreadsheet(conf, accts)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/pipeline/accounts.go

This uses the Date and GBP functions from the writer package.

Money is defined in its own file, money.go:
package writer

import "fmt"

type Money struct {
    Units, Subunits int
}

func GBP(units int) Money {
    if units < 0 {
        panic("invalid money")
    }
    return Money{Units: units, Subunits: 0}
}

func GBPP(units, subunits int) Money {
    if units < 0 || subunits < 0 || subunits > 100 {
        panic("invalid money")
    }
    return Money{Units: units, Subunits: subunits}
}

func (m Money) GCCredit() string {
    return fmt.Sprintf("%d/100", 100*m.Units+m.Subunits)
}

func (m Money) GCDebit() string {
    return "-" + m.GCCredit()
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/money.go

The key functions here are GBP and GBPP which accept numbers of whole pounds, or pounds and pence and pack them into a structure. That structure then has the ability to generate "credit" or "debit" lines that we will use to build a transaction in XML.

DateInfo is defined in the same way in date.go:
package writer

import (
    "encoding/xml"
    "fmt"
)

type DateInfo struct {
    Year, Month, Day int
    HHMM             int
}

type DateXML struct {
    XMLName xml.Name `xml:"ts:date"`
    Value   string   `xml:",chardata"`
}

type GDateXML struct {
    XMLName xml.Name `xml:"gdate"`
    Value   string   `xml:",chardata"`
}

func Date(yyyy, mm, dd, hhmm int) DateInfo {
    return DateInfo{Year: yyyy, Month: mm, Day: dd, HHMM: hhmm}
}

func (d DateInfo) AsXML() DateXML {
    return DateXML{Value: d.String()}
}

func (d DateInfo) AsGDateXML() GDateXML {
    return GDateXML{Value: d.JustDate()}
}

func (d DateInfo) String() string {
    return fmt.Sprintf("%04d-%02d-%02d %02d:%02d:00 +0000", d.Year, d.Month, d.Day, d.HHMM/100, d.HHMM%100)
}

func (d DateInfo) JustDate() string {
    return fmt.Sprintf("%04d-%02d-%02d", d.Year, d.Month, d.Day)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/date.go

This allows users to create dates giving a date in three parts and the time in the 24-hour clock. It can convert that into two different XML formats used by GnuCash, as well as the two corresponding string formats.

In order to add transactions to the XML, we need to define appropriate XML-compliant structs:
type Transaction struct {
    XMLName xml.Name `xml:"gnc:transaction"`
    Version string   `xml:"version,attr"`
    Elements
}

type TransactionItem struct {
    XMLName xml.Name
    Type    string `xml:"type,attr,omitempty"`
    Value   string `xml:",chardata"`
    Elements
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/gnucash.go

Then we can add a method to GnuCash to accept simple transaction definitions and map them onto these GnuCash XML transactions:
func (g *Gnucash) Transact(date DateInfo, src string, dest string, amount Money) {
    srcGuid := g.accountGuids[src]
    if srcGuid == "" {
        panic("there is no account for " + src)
    }
    destGuid := g.accountGuids[dest]
    if destGuid == "" {
        panic("there is no account for " + dest)
    }
    tx := &Transaction{Version: "2.0.0"}
    guid := newGuid()
    id := NewTxItem("id", guid)
    id.Type = "guid"
    curr := NewTxItem("currency", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    dateXML := date.AsXML()
    datePosted := NewTxItem("date-posted", "")
    datePosted.Elements = []any{dateXML}
    dateEntered := NewTxItem("date-entered", "")
    dateEntered.Elements = []any{dateXML}
    desc := NewTxItem("description", "")
    slots := NewTxItem("slots", "")
    dp := MakeSlot("date-posted", "")
    dp.Value.Type = "gdate"
    dp.Value.AnonymousValue = date.AsGDateXML()
    notes := MakeSlot("notes", "")
    notes.Value.Type = "string"
    slots.Elements = []any{dp, notes}
    splits := NewTxItem("splits", "")
    splitFrom := NewTxItem("split", "")
    {
        sfid := NewSplitItem("id", newGuid())
        sfid.Type = "guid"
        rec := NewSplitItem("reconciled-state", "y")
        value := NewSplitItem("value", amount.GCCredit())
        quant := NewSplitItem("quantity", amount.GCCredit())
        acct := NewSplitItem("account", destGuid)
        acct.Type = "guid"
        splitFrom.Elements = []any{sfid, rec, value, quant, acct}
    }
    splitTo := NewTxItem("split", "")
    {
        stid := NewSplitItem("id", newGuid())
        stid.Type = "guid"
        rec := NewSplitItem("reconciled-state", "y")
        value := NewSplitItem("value", amount.GCDebit())
        quant := NewSplitItem("quantity", amount.GCDebit())
        acct := NewSplitItem("account", srcGuid)
        acct.Type = "guid"
        splitTo.Elements = []any{stid, rec, value, quant, acct}
    }
    splits.Elements = []any{splitFrom, splitTo}
    tx.Elements = []any{id, curr, datePosted, dateEntered, desc, slots, splits}
    g.book.Elements = append(g.book.Elements, tx)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/gnucash.go

This is all just a lot of malarkey to generate the expected transaction with its splits:
   <gnc:transaction version="2.0.0">
      <trn:id type="guid">8fafffa713914d96bed806ecb7c18685</trn:id>
      <trn:currency>
        <cmdty:space>CURRENCY</cmdty:space>
        <cmdty:id>GBP</cmdty:id>
      </trn:currency>
      <trn:date-posted>
        <ts:date>2021-10-29 10:59:00 +0000</ts:date>
      </trn:date-posted>
      <trn:date-entered>
        <ts:date>2021-10-29 10:59:00 +0000</ts:date>
      </trn:date-entered>
      <trn:description></trn:description>
      <trn:slots>
        <slot>
          <slot:key>date-posted</slot:key>
          <slot:value type="gdate">
            <gdate>2021-10-29</gdate>
          </slot:value>
        </slot>
        <slot>
          <slot:key>notes</slot:key>
          <slot:value type="string"></slot:value>
        </slot>
      </trn:slots>
      <trn:splits>
        <trn:split>
          <split:id type="guid">948211d71ae841a48452fde946509056</split:id>
          <split:reconciled-state>y</split:reconciled-state>
          <split:value>100/100</split:value>
          <split:quantity>100/100</split:quantity>
          <split:account type="guid">60ae38766cce4d65b5c843911af9414f</split:account>
        </trn:split>
        <trn:split>
          <split:id type="guid">dbe06968e0e2472db535fd47a28d03df</split:id>
          <split:reconciled-state>y</split:reconciled-state>
          <split:value>-100/100</split:value>
          <split:quantity>-100/100</split:quantity>
          <split:account type="guid">b178172425484d66b57febee0a1fdb7b</split:account>
        </trn:split>
      </trn:splits>
    </gnc:transaction>
We are now in a good place to take the final step of processing the verbs.

Conclusion

That felt like a lot of hard work to me. I wonder, in retrospect, if it is easier to deal with XML by simply having more generic types and manipulating the XMLName values more. But I do now have an XML file which GnuCash can open and show me what I expect to see.

Thursday, May 22, 2025

Reading the Spreadsheet


So we have a pipeline, but all the code is just stubbed out. Let's start filling in the blanks by reading the spreadsheet.

In doing this, I'm not going to make any pretense to "understand" what is in the spreadsheet. I am just going to convert each tab into a list of rows, each of which is a map of column name (taken from row 1) to an any value (the value in the cell of row K in the corresponding column). The final result, passed down the pipeline, is a map of tab name to a list of such rows.

Not that I've followed it as such, but there is a "quickstart" provided by Google.

Authentication

The first step is to make sure that we can authenticate the application to use Google Sheets. In the past, I've used OAuth to do this, but today I see that "API Key" is an option, so let's try that.

First, you have to go to the Google Cloud Welcome Page (all this has changed since I last did it, of course). In the top left corner, there's a thing which tells you which project you are working in, so click that and select 'New Project'.

From there, it's possible to select "APIs & Services" and then "Credentials" (look everywhere for all these things; I don't know who designed this UI, but it wasn't a UX professional). There is a "create credentials" button which has the option to create an API Key. Because it insisted, I also went through the "define branding" setup page, but I'm not sure whether that's needed.

You then need to go to the "API LIbrary" and find the "Google Sheets API" and enable it.

This API key goes in our configuration file, which now looks something like this (as I said, I'm not going to share all my files, because, well, it's kind of private).
{
        "Spreadsheet": "<id>",
        "APIKey": "<key>",
        "Output": "<a local file>"
}
And we can update Configuration to reflect the fact that this is there:
package config

type Configuration struct {
    APIKey      string
    Spreadsheet string
    Output      string
}

GNUCASH_CONNECT_SHEETS:accounts/internal/gnucash/config/config.go

In order to use Google Sheets with Go, we need to get the appropriate library:
$ go get google.golang.org/api/sheets/v4
which loads in a lot more packages as well, and updates go.mod and creates go.sum.

As we start to write the code to connect to sheets, I realize that I had originally said I would pass the whole Configuration around, but ended up just passing fields. So I went back and changed that. Now we can start writing the code to connect to the sheets:
package sheets

import (
    "context"
    "fmt"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
    "google.golang.org/api/option"
    sheets "google.golang.org/api/sheets/v4"
)

func ReadSpreadsheet(conf *config.Configuration, rcvr Receiver) {
    ctx := context.Background()
    sheetsService, err := sheets.NewService(ctx, option.WithAPIKey(conf.APIKey))
    if err != nil {
        panic(err)
    }
    call := sheetsService.Spreadsheets.Get(conf.Spreadsheet)
    fmt.Printf("%v\n", call)
}

GNUCASH_CONNECT_SHEETS:accounts/internal/gnucash/sheets/reader.go

Very good.

For what it's worth, the reference page for the Sheets library is at https://pkg.go.dev/google.golang.org/api/sheets/v4.

Troubleshooting

I always have problems with Google APIs.

It turns out that all I have done here is to create a "call" which still needs to be executed in order to actually connect to the Sheets API. This requires a .Do() to be put on the end. When I do that, it tells me that I don't have permission.

I think the deal is, that while the API key allows me to connect to the API, it doesn't allow me to access my spreadsheet. This is because, of course, while I see them as both "mine", Google Cloud doesn't. OK, let's add an OAuth API key, and copy it down onto the filesystem and provide that in the Configuration.

When I do that, it tells me that it won't allow two forms of authentication. OK, let's go back and look at the sample and try and copy that code "more closely".

There are, of course, a number of things I don't like about that code, including the manual copying-and-pasting into the browser (which I replaced by using the package github.com/pkg/browser and the manual copying-and-pasting of the response (which took a bit more work since I had to implement a browser). This also required adding some more configuration options. But at the end of the day I was able to get it to work and to recover the slice of sheets and print out the Title of each one.

This has changed sufficiently that I'm just going to present it all again. Here's the configuration:
package config

import "fmt"

type Configuration struct {
    APIKey       string
    OAuth        string
    Token        string
    RedirectPort int
    Spreadsheet  string
    Output       string
}

func (c *Configuration) RedirectURI() string {
    return fmt.Sprintf("http://localhost:%d/redirect_uri", c.RedirectPort)
}

func (c *Configuration) ListenOn() string {
    return fmt.Sprintf(":%d", c.RedirectPort)
}

GNUCASH_AUTHENTICATE:accounts/internal/gnucash/config/config.go

And then all of the code to set up servers, read tokens, connect, burble burble:
package sheets

import (
    "context"
    "encoding/json"
    "errors"
    "fmt"
    "net/http"
    "os"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
    "github.com/pkg/browser"
    "golang.org/x/oauth2"
    "golang.org/x/oauth2/google"
    "google.golang.org/api/option"
    sheets "google.golang.org/api/sheets/v4"
)

func ReadSpreadsheet(conf *config.Configuration, rcvr Receiver) {
    ctx := context.Background()

    client, err := getClient(conf)
    if err != nil {
        panic(err)
    }

    sheetsService, err := sheets.NewService(ctx, option.WithHTTPClient(client))
    if err != nil {
        panic(err)
    }

    svc, err := sheetsService.Spreadsheets.Get(conf.Spreadsheet).Do()
    if err != nil {
        panic(err)
    }

    for _, id := range svc.Sheets {
        fmt.Println(id.Properties.Title)
    }
}

func getClient(conf *config.Configuration) (*http.Client, error) {
    jc, err := readCredentials(conf)
    if err != nil {
        return nil, err
    }

    tok, err := getToken(conf, jc)
    if err != nil {
        return nil, err
    }

    return jc.Client(context.Background(), tok), nil
}

func readCredentials(conf *config.Configuration) (*oauth2.Config, error) {
    b, err := os.ReadFile(conf.OAuth)
    if err != nil {
        return nil, fmt.Errorf("unable to read client secret file: %v", err)
    }

    c, err := google.ConfigFromJSON(b, "https://www.googleapis.com/auth/spreadsheets.readonly")
    if err != nil {
        return nil, fmt.Errorf("unable to parse client secret file to config: %v", err)
    }

    c.RedirectURL = conf.RedirectURI()
    return c, nil
}

func getToken(conf *config.Configuration, jc *oauth2.Config) (*oauth2.Token, error) {
    tok, err := tokenFromFile(conf.Token)
    if err == nil {
        return tok, nil
    }

    tok, err = tokenFromWeb(conf, jc)
    if err != nil {
        return nil, err
    }

    err = saveToken(conf, tok)
    if err != nil {
        return nil, err
    }
    return tok, nil
}

func tokenFromFile(file string) (*oauth2.Token, error) {
    f, err := os.Open(file)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    tok := &oauth2.Token{}
    err = json.NewDecoder(f).Decode(tok)
    return tok, err
}

func tokenFromWeb(conf *config.Configuration, jc *oauth2.Config) (*oauth2.Token, error) {
    ch := make(chan string)
    ws := launchWebServer(conf, ch)
    authURL := jc.AuthCodeURL("state-token", oauth2.AccessTypeOffline)
    browser.OpenURL(authURL)

    defer ws.Shutdown(context.Background())

    authCode := <-ch

    tok, err := jc.Exchange(context.TODO(), authCode)
    if err != nil {
        return nil, fmt.Errorf("unable to retrieve token from web: %v", err)
    }
    return tok, nil
}

func saveToken(conf *config.Configuration, tok *oauth2.Token) error {
    fmt.Printf("Saving credential file to: %s\n", conf.Token)
    f, err := os.OpenFile(conf.Token, os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0600)
    if err != nil {
        return fmt.Errorf("unable to cache oauth token: %v", err)
    }
    defer f.Close()
    json.NewEncoder(f).Encode(tok)

    return nil
}

func launchWebServer(conf *config.Configuration, ch chan string) *http.Server {
    handlers := http.NewServeMux()
    redirectUri := &RedirectHandler{channel: ch}
    handlers.Handle("/redirect_uri", redirectUri)

    addr := conf.ListenOn()
    server := &http.Server{Addr: addr, Handler: handlers}
    go func() {
        err := server.ListenAndServe()
        if err != nil && !errors.Is(err, http.ErrServerClosed) {
            fmt.Printf("error starting server: %s\n", err)
        }
    }()
    return server
}

type RedirectHandler struct {
    channel chan string
}

func (r *RedirectHandler) ServeHTTP(resp http.ResponseWriter, req *http.Request) {
    code := req.URL.Query()["code"]
    r.channel <- code[0]
}

GNUCASH_AUTHENTICATE:accounts/internal/gnucash/sheets/reader.go

That's long enough that I think it deserves some explanation, but I'm not really sure what to say. Hopefully the function names and structure give most of the clarity that is required. I think the only real alternative is to go back and rewrite Google's example and documentation for them and I'm not really interested in doing that. The point being that I don't really feel I have written any of this code, I just scrabbled things together from Google and made it work. In the ordinary course of events I would shunt this off into a library - it's exactly the same code regardless of which Google Sheet you're reading and what you're doing with it.

Reading the Sheets

Now we can move on to actually reading and parsing the data in the sheets, which is why we came here in the first place. We can see that we have the sheets and that each one has a Title. Now we need to recover the values in each sheet and slice and dice it into something sensible (i.e. a list of rows, where each row is a map of the header to the value).

The way in which the Google APIs work always seems "backwards" to me, because you keep having to go back to the service and ask for this or that, rather than it returning more "active" objects. Anyway, with a little bit of cribbing and a little bit of googling, I managed to get this to work out:
func ReadSpreadsheet(conf *config.Configuration, rcvr Receiver) {
    ctx := context.Background()

    client, err := getClient(conf)
    if err != nil {
        panic(err)
    }

    sheetsService, err := sheets.NewService(ctx, option.WithHTTPClient(client))
    if err != nil {
        panic(err)
    }

    info, err := sheetsService.Spreadsheets.Get(conf.Spreadsheet).Do()
    if err != nil {
        panic(err)
    }

    var tabs []Tab
    for _, sheet := range info.Sheets {
        tab := gatherTabData(sheetsService, conf.Spreadsheet, sheet.Properties.Title, sheet.Properties.GridProperties.RowCount, sheet.Properties.GridProperties.ColumnCount)
        tabs = append(tabs, tab)
    }

    rcvr.DeliverSheet(tabs)
}

func gatherTabData(svc *sheets.Service, sheetId, title string, rc, cc int64) Tab {
    cells, err := svc.Spreadsheets.Values.Get(sheetId, fmt.Sprintf("%s!R1C1:R%dC%d", title, rc, cc)).Do()
    if err != nil {
        panic(err)
    }

    heads := make(map[int]string)
    var tab Tab
    for i, r := range cells.Values {
        if i == 0 {
            for j, c := range r {
                heads[j] = c.(string)
            }
        } else {
            row := Row{Columns: make(map[string]any)}
            for j, c := range r {
                if heads[j] != "" {
                    row.Columns[heads[j]] = c
                }
            }
            tab.Rows = append(tab.Rows, row)
        }
    }
    return tab
}

GNUCASH_READ_SPREADSHEET:accounts/internal/gnucash/sheets/reader.go

I noticed during my googling that other people seemed to have similar frustrations and had gone as far as to write a wrapper around the sheets API to make it work more sensibly. Anyway, I hope what I ended up writing was clear enough. If not, get in touch and I'll try and address your concerns.

Conclusion

While it took more time and effort to read a spreadsheet than I would like, I was at least ready for that from my previous experiences with Google APIs. It's just a lot of details and quite a bit of UI and none of it is intuitive to me.

Building out a Pipeline

OK, we've understood what we're dealing with, so let's get started.

I have worked with Google Docs and Sheets in the past (I am, in fact, writing this blog post as a Google Doc: it reaches you through a series of "automated" transformations) but always in Java.

Currently, I'm doing a lot in Go, and there doesn't seem any reason not to do this that way. Reading Google Sheets from Go should be supported since both come from Google, right?

As usual with Go, I am going to try and keep the cmd portion down to a minimum. So all that does is read the configuration and then pass that to the AccountsPipeline function which sets up a pipeline to read the accounts, generate a GnuCash file, create accounts from that, and then submit them.

Configuration

Since I don't want to hardcode lots of things into my code, but I also don't want 10,000 arguments, I am going to have a configuration file whose name is provided as the main argument to the program. It will define all the paths and addresses and private keys that I need. I suspect I may end up with a couple more arguments, such as the range of dates to include in the accounts. But this is good enough for now.

Since GnuCash files are in XML and I will need to be using XML to interact with that, I'm tempted to put my configuration file in XML. But I realize that even though it's functionally equivalent, I don't hate myself that much and I'll use JSON instead.

For obvious reasons, I'm not going to share either my accounts files or my configuration file here. Most of the details will become obvious as we play the game, and I will probably share excerpts of things where it is less obvious and I am not sharing private data. All the other code is where you would expect it.

Let's start by defining our initial configuration options in Go:
package config

type Configuration struct {
    Spreadsheet string
    Output      string
}

GNUCASH_CONFIG:accounts/internal/gnucash/config/config.go

We can easily write code to read that using the JSON Serializer:
package config

import (
    "encoding/json"
    "log"
    "os"
)

func ReadConfig(file string) (*Configuration, error) {
    log.Printf("reading %s\n", file)
    bs, err := os.ReadFile(file)
    if err != nil {
        return nil, err
    }
    ret := Configuration{}
    json.Unmarshal(bs, &ret)
    log.Printf("read %v\n", ret)
    return &ret, nil
}

GNUCASH_CONFIG:accounts/internal/gnucash/config/reader.go

And we can define a main program:
package main

import (
    "fmt"
    "os"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
)

func main() {
    _, err := config.ReadConfig(os.Args[1])
    if err != nil {
        fmt.Printf("failed to read config: %v\n", err)
    }
}

GNUCASH_CONFIG:accounts/cmd/accounts/main.go

Setting up a Pipeline

As you know, I'm a big fan of TDA (tell-don't-ask) programming, and this is no different. The general approach is to set up a pipeline of operations where each one takes what arguments it needs, along with a processor for "the next step". I'm going to put all of this in a function that I can call from my cmd file after reading the configuration. There are three basic steps:
  • Read the spreadsheet;
  • Convert into Gnucash format;
  • Write to a .gnucash file.
Not that hard. The cmd file now looks like this:
package main

import (
    "fmt"
    "os"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/pipeline"
)

func main() {
    conf, err := config.ReadConfig(os.Args[1])
    if err == nil {
        pipeline.AccountsPipeline(conf)
    } else {
        fmt.Printf("failed to read config: %v\n", err)
    }
}

GNUCASH_OUTLINE_PIPELINE:accounts/cmd/accounts/main.go

And the pipeline is fairly simple too:
package pipeline

import (
    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/accounts"
    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/sheets"
    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/writer"
)

func AccountsPipeline(conf *config.Configuration) {
    writer := writer.MakeWriter(conf.Output)
    accts := accounts.MakeAccounts(writer)
    sheets.ReadSpreadsheet(conf.Spreadsheet, accts)
}

GNUCASH_OUTLINE_PIPELINE:accounts/internal/gnucash/pipeline/accounts.go

And for now, I am just going to stub out the rest of the things in packages under internal/gnucash.

Company Accounts


Like many of you, I have a limited company. Like you, I have to submit annual accounts and a tax return. Probably unlike most of you, my company hasn't had any income since Covid struck, and my accountants were so useless I fired them (for the amount they were charging you would think they could submit some very simple "nothing happened" accounts within nine months but apparently not).

So I was left with the problem of submitting my accounts myself. Up until now I have done it "by hand", filling out the government issued forms in my browser. It's not that hard (at least for me) since the only activity is paying my administratiion fee.

But it always raises the question (for me) of "if I can do it by hand, can the computer do it by itself?".

Some googling later, I found the GnuCash ecosystem, which apparently also has a module to submit UK corporate accounts. On the surface this sounds like exactly what I need: while it has a GUI, everything is based around open (XML) files, which I can read, create and edit.

Furthermore, it seems to have all the features I need in order to get the job done. There's just one problem: I don't want to use their UI, and I don't really want to enter into their complex mental model. But what do I want to do?

A Simpler Model

I was sufficiently wrapped up in the complexities presented by GnuCash that I lost sight of the obvious, and kudos to Martin Rue for pointing it out to me: what I want to do is to track all my transactions in a Google Sheet. I'm not entirely sure I understood what he was describing, but my thought is to have a single Google Sheet with one tab for each "type" of transaction. The obvious types of transactions are "income" and "expense" but I suspect that these need to be further categorized.

Given that, I can write a simple program which reads the sheets, runs through all the transactions, creates or updates the GnuCash files (I am tempted to create the files from scratch on every run) and then invokes the tool to prepare accounts for Companies House and HMRC.

Simples. Let's get started.

Playing with GnuCash

The GnuCash file format may be documented somewhere, although I've only seen vague references, but as I've commented before, while I find documentation useful for clearing up details, I generally find it a lot easier to work with an example. So I'm not going to look that hard but instead try and reverse engineer it by myself.

So the first thing I'm going to do is boot up the GnuCash UI and mess around for a while until I have something I'm happy with, and then look at that XML file. Then I can use that as a guide in generating something similar.

The first step is to download GnuCash, and because today is a Wednesday, I'm going to do this on my Mac rather than on Linux and download the Mac x86 dmg and then install GnuCash into Applications. Once there, I'm going to start it up. I think there's something to be said for going through all the steps with pictures, but there's documentation on how to do that, so I'm not going to. I'll just elaborate on the basic steps.

Two windows come up; one is a "Book" window which is presumably where my accounts are; and one is a dialog asking what I want to do. As it happens, I want to create a new set of accounts, which is the first choice, so I'll select that.

This opens a new dialog called "New Account Hierarchy Setup". This is all described in the documentation so I won't dwell on it that much except to say that I'm going through it with the documentation in one hand and my previous years' accounts in the other.

In going through the various options, I opted to set up for "UK VAT Accounts". Now, I'm not registered for - and don't pay - VAT, but this seems the closest thing to what I want. Remember that it doesn't really matter that much as I'm planning on generating all this myself in a little while.

In doing all of this, and tinkering around and setting up accounts and entering values, I kept on getting the same thing - all the "headline numbers" in my balance sheet were $0. (And yes, in dollars.) Everything just seemed screwed up. I went back and tried various different options, but nothing seemed to work. Unlike having issues with the UI or entering transactions, it seems to me that this really does matter if I am going to be using GnuCash to submit my company accounts.

Then I tried "Balance Sheet (multicolumn)" and that "just worked". I am sitting here right now, looking at two balance sheets. One has entries of $0 almost everywhere (apart from one value which is correct, but ignored in the big picture) and the other looks exactly how I would expect to see it. Huh?

Some amount of Googling later, I found this. It suggests that I have mixed currencies, which seems true to me, and to sort this out in "local settings". Not sure what that is, but going through "Gnucash > Settings", I found a number of places (specifically and, I suspect, most significantly, "Reports") where the currency was set to USD. I'm annoyed about this, because I was told, specifically, to be careful during setup that you chose the right currency as this would affect everything. Apparently it doesn't, and you end up with USD half the time anyway. (For the record, my laptop is set up to believe it is in the US.)

OK, now I get what I expect from the balance sheet. I'll have to be careful to replicate those settings when I generate it by hand.

While I'm looking at settings, I had read that by default the XML file is compressed and there is an option (General > Files > Compress Files) to stop that. So I unchecked that.

OK, I seem to have exactly what I want now. Time to look at the resulting XML file.

Introspecting a GnuCash File

I'm not going to explain XML format or syntax here. If you don't understand that yet, and want to, I'd recommend a good book such as O'Reilly's Learning XML. Other than that, there are online resources, such as W3Schools. I'm just going to dive in to this XML file. Because I'm a luddite, I just read these things as text files using more on the command line, but there are good structured editors out there.

The top level node is gnc-v2, and that then includes a lot of namespaces, so we'll need to be sure to copy those. Sadly, the links are all "fraudulent", so we can't go there for documentation. Then there is a count-data entry that seems to say there is 1 "book", which seems reasonable, and then a book definition using version 2.0.0. This has a GUID ID, so we'll need to make sure we can generate a Guid.

There are then a bunch of "slot" definitions, The first is for "counter_formats", each of which has a key but no value; I'm not sure what these are for, so I'm tempted to leave them out. These are followed by "counters" which were configured during setup - and seem to correspond to the slots. I didn't need any of these, so I'm tempted to ask if I can do without them. Then "features", then "options". I'm starting to recognize all of these as the configuration settings in the setup screens.

Then we have a count for "commodity" set to 1, a count for "accounts" set to 12 and "transactions" set to 3. The commodity seems to be GBP and still seems to reflect a world in which I don't just want all my accounts in GBP. Although possibly it is just "for rigour" and you always need to specify this. Maybe I should go back and do a set of accounts all in USD and see what happens. Interestingly, there is then another "commodity" entry which is something to do with "templates". I don't know...

Then I have my tree of accounts, starting with the root account. The root account does not have a parent but all the others have an <act:parent> entry with a guid parent. These guids also do not have any dashes - I'm not sure if that's essential but it's something to bear in mind.

And finally we have a bunch of <gnc:transaction> elements which seem more complicated than possibly they need to be but basically show the accounts they were transferred between. It also has a somewhat confusing syntax for the value - citing it in pence and then writing "/100" after it. Presumably this is to make it easy to parse as integers rather than floating point.

Simplifying the GnuCash file

It seems like a lot of this file is junk, but the more I think about it, the more I consider it is "necessary complexity" rather than unnecessary complexity. However, to satisfy my curiosity more than anything else, I deleted the "counter_formats" and "counters" slots, the "template" commodity and the "notes" slot inside a transaction, as well as changing transaction values and marking everything "reconciled" and was able to open the file again in GnuCash and see everything working.

I have a (possibly misplaced) degree of confidence that I know what I'm doing here.

Entering the Data

I have all my accounts and bank statements for the lifetime of the company, so I can put those in a Google Sheet easily enough (because of the way I work, this is not an overwhelming amout of data either).

Now, as I do this, I am struck by how "regular" many of my transactions are: I get paid, I pay myself, I pay my accountant, I pay Companies House, and so on. And, because I'm thinking a lot about "verbs" at the moment, it seems that one way of organizing this spreadsheet is to have one tab for each "kind" of action: being paid, paying myself, paying particular kinds of bills, etc. In this way, each tab will be configured to be from one account to another account. And thus all I need to enter is a date and an amount. As it happens, I think a description may also be useful, so all the tabs will have those three columns.

Talk among yourselves while I enter that data ...

Having entered all that, I find there is one exception to the general rule. It turns out that sometimes I have paid Companies House from my business account, and at other times from my personal account (which shows up as a Director Loan). After some reflection, I decided I would rather keep all this on one tab, so I added an extra column Source to allow me to override the configured source account. It is not unreasonable to assume a hypothetical Destination column to override the configured destination account. I also put some information in a Comments field to help me remember in future what I had done. This column will be ignored.

Moving forward

So I think I'm all set up and ready to start writing code. But that's for next time.