Friday, May 23, 2025

Generating the XML


I'm now going to skip to the other end, writing out the XML file, for a number of reasons:
  • I haven't done it before, so it seems high risk;
  • It connects to our other dependency in this project, so its shape is forced on us;
  • In order to build this out, I will need to define the structs that I want to transform into;
  • The middle piece is the most interesting, so I'm pushing it off until last.
So, for now, the middle stage of the pipeline is just going to print out the incoming sheet data and then call the final phase with initially no accounts. As I need them, I will add more test data there.

In the first episode, I toyed around with GnuCash, so I have a sample file. I'm going to try and produce something "similar" to that, although I doubt that I can generate an identical file (for example, the GUIDs will be different).

XML Marshalling

So the basic thing we want to do is take the incoming structs and marshal them to XML. The standard library has a package encoding/xml with a function MarshalIndent that does exactly that. How hard can it be?

It's not that hard to do the writing:
package writer

import (
    "encoding/xml"
    "os"

    "github.com/gmmapowell/ignorance/accounts/internal/gnucash/config"
)

const (
    header = `<?xml version="1.0" encoding="UTF-8"?>` + "\n"
)

type Writer struct {
    Config *config.Configuration
}

func (w *Writer) Deliver(accts *Gnucash) {
    bs, err := xml.MarshalIndent(accts, "", "  ")
    if err != nil {
        panic(err)
    }

    withHeader := header + string(bs)

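    // Don't silently lose the output if the file cannot be written.
    if err := os.WriteFile(w.Config.Output, []byte(withHeader), 0666); err != nil {
        panic(err)
    }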
}

func MakeWriter(conf *config.Configuration) *Writer {
    return &Writer{Config: conf}
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/writer.go

As much as anything, it's a pain that MarshalIndent doesn't write the XML declaration (the <?xml ...?> line) for you, so there are hoops to jump through for that. The xml package does at least export the same string as the constant xml.Header.
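
For comparison, here is a minimal sketch of the same hoop using the exported xml.Header constant rather than a hand-written one. The Note type and the render helper are hypothetical names used only for this illustration; they are not part of the project.

```go
package main

import (
    "encoding/xml"
    "fmt"
)

// Note is a hypothetical type used only for this illustration.
type Note struct {
    XMLName xml.Name `xml:"note"`
    Text    string   `xml:",chardata"`
}

// render marshals v and prepends the standard XML declaration,
// which MarshalIndent does not emit on its own.
func render(v any) (string, error) {
    bs, err := xml.MarshalIndent(v, "", "  ")
    if err != nil {
        return "", err
    }
    return xml.Header + string(bs) + "\n", nil
}

func main() {
    out, err := render(Note{Text: "hello"})
    if err != nil {
        panic(err)
    }
    fmt.Print(out)
}
```

Returning the error instead of panicking is a design choice for the sketch; the writer above panics, which is fine for a one-shot tool.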

All the pain, of course, comes from having to specify the types to be marshalled and getting all the XML tags in place.

At the top level, we have the GnuCash definition which represents the whole file:
package writer

import "encoding/xml"

type Gnucash struct {
    XMLName xml.Name `xml:"gnc-v2"`
    Namespaces
    Elements
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

The field XMLName here is a special field whose value or tag specifies the name of the element, in this case gnc-v2. The other two fields are embedded (anonymous): this means that the marshaller does not wrap them in an element named after the field, so their contents appear directly on or inside gnc-v2.
type Namespaces struct {
    GNC string `xml:"xmlns:gnc,attr"`
    ACT string `xml:"xmlns:act,attr"`
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

This is where we will put all our namespaces. Because it is embedded anonymously inside the Gnucash definition, these attr items are automatically included as attributes on the gnc-v2 element. Their values are set on initialization to the appropriate namespace URIs:
func completeNamespaces(gnc *Gnucash) {
    gnc.GNC = "http://www.gnucash.org/XML/gnc"
    gnc.ACT = "http://www.gnucash.org/XML/act"
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go
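
The attribute promotion is easy to check in isolation. This is a minimal sketch with stand-in types (NS, Root and renderRoot are hypothetical names, not the ones in the repository), using plain xml.Marshal so that the result is a single line:

```go
package main

import (
    "encoding/xml"
    "fmt"
)

// NS is embedded in Root, so its attr fields become attributes of Root's element.
type NS struct {
    GNC string `xml:"xmlns:gnc,attr"`
    ACT string `xml:"xmlns:act,attr"`
}

// Root stands in for the top-level Gnucash type.
type Root struct {
    XMLName xml.Name `xml:"gnc-v2"`
    NS
}

// renderRoot fills in the namespace URIs through the promoted fields and marshals.
func renderRoot() (string, error) {
    r := Root{}
    r.GNC = "http://www.gnucash.org/XML/gnc"
    r.ACT = "http://www.gnucash.org/XML/act"
    bs, err := xml.Marshal(r)
    return string(bs), err
}

func main() {
    out, err := renderRoot()
    if err != nil {
        panic(err)
    }
    fmt.Println(out)
}
```

The attributes come out in field order, which is why the order of the fields in the struct matters if you want to match an existing file.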

In the same way, we can define the anonymous field of type Elements to just be a slice of any, allowing us to include an arbitrary number of nested elements, each self-identifying:
type Elements []any

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go
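
To see the "self-identifying" part on its own, here is a small sketch in which two unrelated element types go into the same slice. Doc, A, B and renderDoc are hypothetical names for illustration; the point is only that each value's XMLName decides its element name, whether it is stored as a value or as a pointer.

```go
package main

import (
    "encoding/xml"
    "fmt"
)

// Elements mirrors the definition above: a slice that can hold any element.
type Elements []any

// Doc is a stand-in container with an embedded Elements slice.
type Doc struct {
    XMLName xml.Name `xml:"doc"`
    Elements
}

// A and B each carry their own element name in XMLName.
type A struct {
    XMLName xml.Name `xml:"a"`
    Value   int      `xml:",chardata"`
}

type B struct {
    XMLName xml.Name `xml:"b"`
    Value   string   `xml:",chardata"`
}

// renderDoc appends one value and one pointer, then marshals the container.
func renderDoc() (string, error) {
    d := Doc{}
    d.Elements = append(d.Elements, A{Value: 1}, &B{Value: "two"})
    bs, err := xml.Marshal(d)
    return string(bs), err
}

func main() {
    out, err := renderDoc()
    if err != nil {
        panic(err)
    }
    fmt.Println(out)
}
```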

And then we can make a start on all the big elements in a GnuCash file, starting with CountData and Book:
type CountData struct {
    XMLName xml.Name `xml:"gnc:count-data"`
    Type    string   `xml:"cd:type,attr"`
    Count   int      `xml:",chardata"`
}

type AccountBook struct {
    XMLName     xml.Name `xml:"gnc:book"`
    BookVersion string   `xml:"version,attr"`
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

And it's fairly easy to create the basic set of these:
func NewAccounts() *Gnucash {
    ret := Gnucash{}
    completeNamespaces(&ret)
    ret.Elements = append(ret.Elements, NewCountData("book", 1))
    ret.Elements = append(ret.Elements, NewAccountBook())
    return &ret
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

func NewCountData(ty string, cnt int) *CountData {
    ret := CountData{Type: ty, Count: cnt}
    return &ret
}

func NewAccountBook() *AccountBook {
    ret := AccountBook{BookVersion: "2.0.0"}
    return &ret
}

GNUCASH_WRITE_XML_1:accounts/internal/gnucash/writer/gnucash.go

And having done all that, we can see that we get the expected output in our output file:
<?xml version="1.0" encoding="UTF-8"?>
<gnc-v2 xmlns:gnc="http://www.gnucash.org/XML/gnc" xmlns:act="http://www.gnucash.org/XML/act">
  <gnc:count-data cd:type="book">1</gnc:count-data>
  <gnc:book version="2.0.0"></gnc:book>
</gnc-v2>
Now, while I have checked this about five times, there's no guarantee that I've done it perfectly, so it may be that we later find out that it isn't valid GnuCash XML. Anyway, this is really very boring and tedious, so I'm going to go off and finish all the boilerplate stuff and I'll come back here when I want to start dealing with the actual accounts and transactions.

You can see the remaining changes in GNUCASH_WRITE_XML_2. Note that while most of this is "complete boilerplate", there are a couple of sections which copy data directly from the configuration file into the output file, which is something of a precursor to what is about to follow.

Defining Accounts

Ideally, I would like all of the account information to be in just one place and, logically, that would have to be in the spreadsheet. However, I've tried this before and spreadsheets are really only good at one thing - storing grid data. Everything else we try and do with them - charts, reports, presentations - just shows up how bad they are at doing anything else, which is why I built my own Modeller tool to do those things instead.

Also, the information I have in the spreadsheet is not arranged by account but by "verb" and so it's not really clear where this information would go anyway. I could have a separate tab with account info in, but at the end of the day I'd rather put it in JSON in the configuration file.

So let's do that.

The account structure is hierarchical, which means that accounts can contain accounts. In the GnuCash file, this is represented by parent pointers, but in the configuration it is perfectly reasonable to do it using a list member for each account. So I will have an Accounts field which is a list of accounts, and each account can likewise have an Accounts field which contains a list of sub-accounts. If it doesn't, it is a leaf account.
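
The conversion between the two representations is a depth-first walk that hands each node's identifier down to its children. A minimal sketch of the idea, using hypothetical Node and Flat types and sequential identifiers in place of GUIDs so that the result is predictable:

```go
package main

import "fmt"

// Node is a hypothetical nested account, as it might appear in the configuration.
type Node struct {
    Name     string
    Children []Node
}

// Flat is the same account in flat form, pointing at its parent by ID.
type Flat struct {
    ID, Name, Parent string
}

// flatten walks the tree depth-first. Each node is appended before its
// children, and its ID is passed down as their parent pointer.
// IDs are sequential here; real code would generate GUIDs.
func flatten(out []Flat, n Node, parent string) []Flat {
    id := fmt.Sprintf("id%d", len(out))
    out = append(out, Flat{ID: id, Name: n.Name, Parent: parent})
    for _, c := range n.Children {
        out = flatten(out, c, id)
    }
    return out
}

func main() {
    root := Node{Name: "Root", Children: []Node{
        {Name: "Assets", Children: []Node{{Name: "Checking"}}},
        {Name: "Income"},
    }}
    for _, f := range flatten(nil, root, "") {
        fmt.Printf("%s %q parent=%q\n", f.ID, f.Name, f.Parent)
    }
}
```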

GnuCash has a "Root Account" which is a fake account which is the parent of all the others. So let's generate that first before we try and map the others.

As with everything else, we need to start by defining the XML record types we want to write to the GnuCash file.
type Account struct {
    XMLName xml.Name `xml:"gnc:account"`
    Version string   `xml:"version,attr"`
    Elements
}

type AccountItem struct {
    XMLName xml.Name
    Type    string `xml:"type,attr,omitempty"`
    Value   string `xml:",chardata"`
    Elements
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go

Then in NewAccountBook, we can call mapAccounts, which will do all the hard work and return a list of accounts. We then use that result both to set the number of accounts and to include all the accounts in the list of Elements in the Account Book.
func NewAccountBook(conf *config.Configuration) *AccountBook {
    ret := AccountBook{BookVersion: "2.0.0"}
    bookId := BookId{Type: "guid", Guid: "95d515f6a8ef4c6fb50d245c82e125b3"}
    ret.Elements = append(ret.Elements, bookId)
    slots := BookSlots{}
    opts := MakeSlot("options", "frame")
    opts.Value.AnonymousValue = []any{FillBusiness(conf), FillTax(conf)}
    slots.Elements = append(slots.Elements, opts)
    ret.Elements = append(ret.Elements, slots)

    mappedAccounts := mapAccounts(conf)

    commodities := CountData{Type: "commodity", Count: 1}
    ret.Elements = append(ret.Elements, commodities)
    accounts := CountData{Type: "account", Count: len(mappedAccounts)}
    ret.Elements = append(ret.Elements, accounts)
    txns := CountData{Type: "transaction", Count: 3} // TODO: figure out this number
    ret.Elements = append(ret.Elements, txns)

    gbp := Commodity{Version: "2.0.0"}
    space := NewCommodityItem("space", "CURRENCY")
    id := NewCommodityItem("id", "GBP")
    gq := NewCommodityItem("get_quotes", "")
    qs := NewCommodityItem("quote_source", "currency")
    qtz := NewCommodityItem("quote_tz", "")
    gbp.Elements = []any{space, id, gq, qs, qtz}
    ret.Elements = append(ret.Elements, gbp)

    ret.Elements = append(ret.Elements, mappedAccounts...)

    return &ret
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go

And we can define mapAccounts and its supporting functions which, for now, just define the root account:
func mapAccounts(conf *config.Configuration) []any {
    name := NewAccountItem("name", "RootAccount")
    id := NewAccountItem("id", newGuid())
    id.Type = "guid"
    ty := NewAccountItem("type", "ROOT")
    curr := NewAccountItem("commodity", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    scu := NewAccountItem("commodity-scu", "100")

    rootAccount := Account{Version: "2.0.0", Elements: []any{name, id, ty, curr, scu}}
    return []any{rootAccount}
}

func NewAccountItem(tag, value string) AccountItem {
    name := xml.Name{Local: "act:" + tag}
    return AccountItem{XMLName: name, Value: value}
}

func newGuid() string {
    return strings.ReplaceAll(uuid.New().String(), "-", "")
}

GNUCASH_MAP_ROOT_ACCOUNT:accounts/internal/gnucash/writer/gnucash.go
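
If you would rather not pull in a UUID package just for this, a 32-character hex identifier can be sketched with the standard library alone. This is an alternative, not what the code above does: randomHexID is a hypothetical name, the result is 16 random bytes rather than a proper version 4 UUID, and I am assuming that GnuCash only cares about the shape and uniqueness of the string.

```go
package main

import (
    "crypto/rand"
    "encoding/hex"
    "fmt"
)

// randomHexID returns 16 random bytes encoded as 32 lowercase hex digits.
// Unlike a real UUID, no version or variant bits are set.
func randomHexID() (string, error) {
    buf := make([]byte, 16)
    if _, err := rand.Read(buf); err != nil {
        return "", err
    }
    return hex.EncodeToString(buf), nil
}

func main() {
    id, err := randomHexID()
    if err != nil {
        panic(err)
    }
    fmt.Println(id)
}
```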

And for all this hard work, we get a Root Account that looks something like this (the GUID changes every time we run it, obviously):
    <gnc:account version="2.0.0">
      <act:name>RootAccount</act:name>
      <act:id type="guid">1b69d9969d1640fba38a048c52a12c48</act:id>
      <act:type>ROOT</act:type>
      <act:commodity>
        <cmdty:space>CURRENCY</cmdty:space>
        <cmdty:id>GBP</cmdty:id>
      </act:commodity>
      <act:commodity-scu>100</act:commodity-scu>
    </gnc:account>
It's interesting that when I try and open this file from the command line, GnuCash starts but it opens the file I was experimenting with previously. I'm not sure why, but it's presumably not a very important bug. From the menu, I'm able to select the generated file and it opens properly and shows no account hierarchy or transactions, but I can see the company info in the Properties window.
$ open accounts.gnucash
Now that we have a root account, we have somewhere to attach our other accounts as we (recursively) traverse the configuration. So let's look at the configuration as read from the JSON file:
      "Accounts": [
                {
                        "Name": "Assets",
                        "Type": "ASSET",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Current Assets",
                                        "Type": "ASSET",
                                        "Placeholder": true,
                                        "Accounts": [
                                                {
                                                        "Name": "Checking Account",
                                                        "Type": "BANK"
                                                }
                                        ]
                                }
                        ]
                },
                {
                        "Name": "Income",
                        "Type": "INCOME"
                },
                {
                        "Name": "Expenses",
                        "Type": "EXPENSE",
                        "Accounts": [
                                {
                                        "Name": "Accountancy Fees",
                                        "Type": "EXPENSE"
                                }
                        ]
                },
                {
                        "Name": "Equity",
                        "Type": "EQUITY",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Capital Account - Gareth Powell",
                                        "Type": "EQUITY"
                                }
                        ]
                },
                {
                        "Name": "Liabilities",
                        "Type": "LIABILITY",
                        "Placeholder": true,
                        "Accounts": [
                                {
                                        "Name": "Expense Account - Gareth Powell",
                                        "Type": "LIABILITY"
                                },
                                {
                                        "Name": "Director Loan Account - Gareth Powell",
                                        "Type": "LIABILITY"
                                }
                        ]
                }
        ]
When it comes down to it, I may end up needing more or fewer accounts and they may need to be different to these. But I believe this is basically what I want.

I started off by adding the appropriate struct to the configuration, which then allows all of this to be read in by our existing code:
type Configuration struct {
    APIKey       string
    OAuth        string
    Token        string
    RedirectPort int
    Spreadsheet  string
    Output       string

    Business Business
    Accounts []Account
}
type Account struct {
    Name        string
    Type        string
    Placeholder bool
    Accounts    []Account
}

GNUCASH_MAP_ACCOUNTS:accounts/internal/gnucash/config/config.go
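
Because the Account struct refers to itself through the Accounts slice, encoding/json will fill in the whole tree in one call. A small self-contained sketch of that, assuming the configuration is read with json.Unmarshal (the post does not show the reading code); Config, sample, load and countAccounts are hypothetical names for illustration:

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Account mirrors the recursive configuration struct above.
type Account struct {
    Name        string
    Type        string
    Placeholder bool
    Accounts    []Account
}

// Config is a cut-down stand-in for Configuration.
type Config struct {
    Accounts []Account
}

// sample is a shortened version of the JSON shown earlier.
const sample = `{"Accounts": [
  {"Name": "Assets", "Type": "ASSET", "Placeholder": true,
   "Accounts": [{"Name": "Checking Account", "Type": "BANK"}]},
  {"Name": "Income", "Type": "INCOME"}
]}`

// load parses the JSON text into a Config.
func load(data string) (Config, error) {
    var c Config
    err := json.Unmarshal([]byte(data), &c)
    return c, err
}

// countAccounts counts every account in the tree, at any depth.
func countAccounts(accts []Account) int {
    n := 0
    for _, a := range accts {
        n += 1 + countAccounts(a.Accounts)
    }
    return n
}

func main() {
    c, err := load(sample)
    if err != nil {
        panic(err)
    }
    fmt.Println(countAccounts(c.Accounts), "accounts")
}
```

Note that a missing "Placeholder" key simply leaves the field at false, which is why the leaf accounts in the configuration don't need to mention it.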

Mapping the accounts now becomes recursive, with the base case being mapping the Root account with the slice of accounts from the configuration and explicit settings for name and type, and then the recursive case taking its information from an Account object. Most of this is a refactoring from what we already had, but it was also necessary to "fill in" the features (such as parent) that were not used in the base case.

Most of the work is done by the recursive case makeAccount, but the helper function mapAccount massages the config.Account structure into the right form.
func mapAccounts(conf *config.Configuration) []any {
    return makeAccount([]any{}, "RootAccount", "ROOT", "", false, conf.Accounts)
}

func mapAccount(mapped []any, acc config.Account, parent string) []any {
    return makeAccount(mapped, acc.Name, acc.Type, parent, acc.Placeholder, acc.Accounts)
}

func makeAccount(mapped []any, called, ofType, parent string, placeholder bool, accts []config.Account) []any {
    name := NewAccountItem("name", called)
    guid := newGuid()
    id := NewAccountItem("id", guid)
    id.Type = "guid"
    ty := NewAccountItem("type", ofType)
    curr := NewAccountItem("commodity", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    scu := NewAccountItem("commodity-scu", "100")
    acct := Account{Version: "2.0.0", Elements: []any{name, id, ty, curr, scu}}

    if parent != "" {
        desc := NewAccountItem("description", called)
        acct.Elements = append(acct.Elements, desc)
    }

    if placeholder {
        plac := NewAccountItem("slots", "")
        ps := MakeSlot("placeholder", "string")
        ps.Value.StringValue = "true"
        plac.Elements = []any{ps}
        acct.Elements = append(acct.Elements, plac)
    }

    if parent != "" {
        parElt := NewAccountItem("parent", parent)
        parElt.Type = "guid"
        acct.Elements = append(acct.Elements, parElt)
    }

    mapped = append(mapped, acct)
    for _, a := range accts {
        mapped = mapAccount(mapped, a, guid)
    }

    return mapped
}

GNUCASH_MAP_ACCOUNTS:accounts/internal/gnucash/writer/gnucash.go

And, just like that, all the accounts have been mapped.

Transactions

The transactions really need to come out of the spreadsheets, but only after being transformed. Remember that our spreadsheets hold "verb" statements and so need to be interpreted in the context of yet more configuration (to be defined in the next episode), so we need to have the middle layer provide something closer to what GnuCash expects.

On the other hand, the GnuCash format itself is horrible (because it is so closely tied to XML). So we want an intermediate form which maps directly onto the GnuCash format, but is simpler to use. While I'm making it simpler to use, I'm going to omit (at least for now) some of the more complex features, such as split transactions and the ability to handle multiple currencies. A transaction then consists of the date of the transaction, the name of a source account, the name of a destination account and a Money item, which is just a type to wrap up some of the complexities of handling money.

We can define a couple of sample transactions in the accounting pipeline in lieu of doing the real work of reading and processing a spreadsheet:
func AccountsPipeline(conf *config.Configuration) {
    w := writer.MakeWriter(conf)
    // TODO: these two lines skip reading the spreadsheet
    accts := writer.NewAccounts(conf)
    accts.Transact(writer.Date(2021, 10, 29, 1059), "Income", "Checking Account", writer.GBP(1))
    accts.Transact(writer.Date(2020, 12, 31, 1059), "Income", "Checking Account", writer.GBP(8624))
    w.Deliver(accts)
    // accts := accounts.MakeAccounts(conf, writer)
    // sheets.ReadSpreadsheet(conf, accts)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/pipeline/accounts.go

This uses the Date and GBP functions from the writer package.

Money is defined in its own file, money.go:
package writer

import "fmt"

type Money struct {
    Units, Subunits int
}

func GBP(units int) Money {
    if units < 0 {
        panic("invalid money")
    }
    return Money{Units: units, Subunits: 0}
}

func GBPP(units, subunits int) Money {
    // Subunits are pence, so valid values are 0 to 99.
    if units < 0 || subunits < 0 || subunits >= 100 {
        panic("invalid money")
    }
    return Money{Units: units, Subunits: subunits}
}

func (m Money) GCCredit() string {
    return fmt.Sprintf("%d/100", 100*m.Units+m.Subunits)
}

func (m Money) GCDebit() string {
    return "-" + m.GCCredit()
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/money.go

The key functions here are GBP and GBPP which accept numbers of whole pounds, or pounds and pence and pack them into a structure. That structure then has the ability to generate "credit" or "debit" lines that we will use to build a transaction in XML.
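
To make the arithmetic concrete: GnuCash wants amounts as a fraction over 100, so pounds are multiplied up and pence added on. A minimal sketch of just that calculation (gcAmount is a hypothetical helper, not part of the writer package):

```go
package main

import "fmt"

// gcAmount renders pounds and pence as a GnuCash-style fraction of pence over 100.
func gcAmount(units, subunits int) string {
    return fmt.Sprintf("%d/100", 100*units+subunits)
}

func main() {
    fmt.Println(gcAmount(1, 0))        // 100/100
    fmt.Println(gcAmount(8624, 0))     // 862400/100
    fmt.Println(gcAmount(12, 5))       // 1205/100
    fmt.Println("-" + gcAmount(12, 5)) // -1205/100, the debit form
}
```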

DateInfo is defined in the same way in date.go:
package writer

import (
    "encoding/xml"
    "fmt"
)

type DateInfo struct {
    Year, Month, Day int
    HHMM             int
}

type DateXML struct {
    XMLName xml.Name `xml:"ts:date"`
    Value   string   `xml:",chardata"`
}

type GDateXML struct {
    XMLName xml.Name `xml:"gdate"`
    Value   string   `xml:",chardata"`
}

func Date(yyyy, mm, dd, hhmm int) DateInfo {
    return DateInfo{Year: yyyy, Month: mm, Day: dd, HHMM: hhmm}
}

func (d DateInfo) AsXML() DateXML {
    return DateXML{Value: d.String()}
}

func (d DateInfo) AsGDateXML() GDateXML {
    return GDateXML{Value: d.JustDate()}
}

func (d DateInfo) String() string {
    return fmt.Sprintf("%04d-%02d-%02d %02d:%02d:00 +0000", d.Year, d.Month, d.Day, d.HHMM/100, d.HHMM%100)
}

func (d DateInfo) JustDate() string {
    return fmt.Sprintf("%04d-%02d-%02d", d.Year, d.Month, d.Day)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/date.go

This allows users to create a date from three parts (year, month and day) plus a time on the 24-hour clock packed into a single integer, so that 1059 means 10:59. It can convert that into the two different XML forms used by GnuCash, as well as the two corresponding strings.
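
The same two strings can also be produced with the standard time package, which is a possible alternative to the hand-rolled Sprintf formats. This is only a sketch under the assumption that all times are UTC, as the hard-coded +0000 suggests; stamp is a hypothetical helper name.

```go
package main

import (
    "fmt"
    "time"
)

// stamp splits hhmm into hours and minutes and formats the result in
// both the timestamp form and the date-only form.
func stamp(yyyy, mm, dd, hhmm int) (string, string) {
    t := time.Date(yyyy, time.Month(mm), dd, hhmm/100, hhmm%100, 0, 0, time.UTC)
    return t.Format("2006-01-02 15:04:05 -0700"), t.Format("2006-01-02")
}

func main() {
    full, day := stamp(2021, 10, 29, 1059)
    fmt.Println(full) // 2021-10-29 10:59:00 +0000
    fmt.Println(day)  // 2021-10-29
}
```

One difference to be aware of: time.Date normalizes out-of-range values (a minute count of 75 rolls over into the next hour), whereas the Sprintf version prints whatever it is given.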

In order to add transactions to the XML, we need to define appropriate XML-compliant structs:
type Transaction struct {
    XMLName xml.Name `xml:"gnc:transaction"`
    Version string   `xml:"version,attr"`
    Elements
}

type TransactionItem struct {
    XMLName xml.Name
    Type    string `xml:"type,attr,omitempty"`
    Value   string `xml:",chardata"`
    Elements
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/gnucash.go

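Then we can add a method to Gnucash to accept simple transaction definitions and map them onto these GnuCash XML transactions. It relies on a few fields and helpers that are not reproduced in this post (accountGuids, book, NewTxItem and NewSplitItem):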
func (g *Gnucash) Transact(date DateInfo, src string, dest string, amount Money) {
    srcGuid := g.accountGuids[src]
    if srcGuid == "" {
        panic("there is no account for " + src)
    }
    destGuid := g.accountGuids[dest]
    if destGuid == "" {
        panic("there is no account for " + dest)
    }
    tx := &Transaction{Version: "2.0.0"}
    guid := newGuid()
    id := NewTxItem("id", guid)
    id.Type = "guid"
    curr := NewTxItem("currency", "")
    space := NewCommodityItem("space", "CURRENCY")
    currid := NewCommodityItem("id", "GBP")
    curr.Elements = []any{space, currid}
    dateXML := date.AsXML()
    datePosted := NewTxItem("date-posted", "")
    datePosted.Elements = []any{dateXML}
    dateEntered := NewTxItem("date-entered", "")
    dateEntered.Elements = []any{dateXML}
    desc := NewTxItem("description", "")
    slots := NewTxItem("slots", "")
    dp := MakeSlot("date-posted", "")
    dp.Value.Type = "gdate"
    dp.Value.AnonymousValue = date.AsGDateXML()
    notes := MakeSlot("notes", "")
    notes.Value.Type = "string"
    slots.Elements = []any{dp, notes}
    splits := NewTxItem("splits", "")
    // Despite its name, splitFrom posts the positive amount to the destination account (destGuid).
    splitFrom := NewTxItem("split", "")
    {
        sfid := NewSplitItem("id", newGuid())
        sfid.Type = "guid"
        rec := NewSplitItem("reconciled-state", "y")
        value := NewSplitItem("value", amount.GCCredit())
        quant := NewSplitItem("quantity", amount.GCCredit())
        acct := NewSplitItem("account", destGuid)
        acct.Type = "guid"
        splitFrom.Elements = []any{sfid, rec, value, quant, acct}
    }
    // splitTo posts the matching negative amount to the source account (srcGuid).
    splitTo := NewTxItem("split", "")
    {
        stid := NewSplitItem("id", newGuid())
        stid.Type = "guid"
        rec := NewSplitItem("reconciled-state", "y")
        value := NewSplitItem("value", amount.GCDebit())
        quant := NewSplitItem("quantity", amount.GCDebit())
        acct := NewSplitItem("account", srcGuid)
        acct.Type = "guid"
        splitTo.Elements = []any{stid, rec, value, quant, acct}
    }
    splits.Elements = []any{splitFrom, splitTo}
    tx.Elements = []any{id, curr, datePosted, dateEntered, desc, slots, splits}
    g.book.Elements = append(g.book.Elements, tx)
}

GNUCASH_XML_TRANSACTIONS:accounts/internal/gnucash/writer/gnucash.go

This is all just a lot of malarkey to generate the expected transaction with its splits:
    <gnc:transaction version="2.0.0">
      <trn:id type="guid">8fafffa713914d96bed806ecb7c18685</trn:id>
      <trn:currency>
        <cmdty:space>CURRENCY</cmdty:space>
        <cmdty:id>GBP</cmdty:id>
      </trn:currency>
      <trn:date-posted>
        <ts:date>2021-10-29 10:59:00 +0000</ts:date>
      </trn:date-posted>
      <trn:date-entered>
        <ts:date>2021-10-29 10:59:00 +0000</ts:date>
      </trn:date-entered>
      <trn:description></trn:description>
      <trn:slots>
        <slot>
          <slot:key>date-posted</slot:key>
          <slot:value type="gdate">
            <gdate>2021-10-29</gdate>
          </slot:value>
        </slot>
        <slot>
          <slot:key>notes</slot:key>
          <slot:value type="string"></slot:value>
        </slot>
      </trn:slots>
      <trn:splits>
        <trn:split>
          <split:id type="guid">948211d71ae841a48452fde946509056</split:id>
          <split:reconciled-state>y</split:reconciled-state>
          <split:value>100/100</split:value>
          <split:quantity>100/100</split:quantity>
          <split:account type="guid">60ae38766cce4d65b5c843911af9414f</split:account>
        </trn:split>
        <trn:split>
          <split:id type="guid">dbe06968e0e2472db535fd47a28d03df</split:id>
          <split:reconciled-state>y</split:reconciled-state>
          <split:value>-100/100</split:value>
          <split:quantity>-100/100</split:quantity>
          <split:account type="guid">b178172425484d66b57febee0a1fdb7b</split:account>
        </trn:split>
      </trn:splits>
    </gnc:transaction>
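
The two splits carry equal and opposite values, which is the double-entry rule that every transaction must sum to zero. A quick way to sanity-check generated values is to add them up as fractions; this sketch uses math/big for that, and balanced is a hypothetical helper rather than anything in the project.

```go
package main

import (
    "fmt"
    "math/big"
)

// balanced parses each value as a fraction such as "100/100" and reports
// whether they sum to exactly zero.
func balanced(values ...string) (bool, error) {
    sum := new(big.Rat)
    for _, v := range values {
        r, ok := new(big.Rat).SetString(v)
        if !ok {
            return false, fmt.Errorf("cannot parse %q as a fraction", v)
        }
        sum.Add(sum, r)
    }
    return sum.Sign() == 0, nil
}

func main() {
    ok, err := balanced("100/100", "-100/100")
    if err != nil {
        panic(err)
    }
    fmt.Println("balanced:", ok)
}
```
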
We are now in a good place to take the final step of processing the verbs.

Conclusion

That felt like a lot of hard work to me. I wonder, in retrospect, if it is easier to deal with XML by simply having more generic types and manipulating the XMLName values more. But I do now have an XML file which GnuCash can open and show me what I expect to see.
