Thursday, July 24, 2025

iXBRL Schemas


Picking up where we left off, HMRC has rejected our submission because of a schema validation error, but locally everything validates. How can that be?

Well, xmllint is somewhat "generous" in that it only validates against schemas it has access to; anything else it just waves on through. We need to find the missing schemas and use them. Do I have these additional schemas somewhere?

Digging through everything I have, it turns out I do not have the schemas, but one of the technical packs has another document that uses them (and it's considerably longer), so I'm going to copy that across. But where are the schemas themselves? It turns out that xbrl.org has put them exactly where they declare them to be. Who knew?

There are quite a few of them; a bit of manual downloading, followed by a short script to curl more of them down later, and we have at least the ixbrl schema. I think I may need to go back and download others. When I run xmllint on my ixbrl sample, I get schema validation errors. I fixed a number of them (most were just spaces in tokens that I had not removed from the PDF), and one seemed to be a random token (had I copied it in by accident?).
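That download script is nothing clever; in the spirit of the rest of this project, a Go version might look something like this (the two URLs are illustrative entries only, not the full list):
package main

import (
	"io"
	"log"
	"net/http"
	"os"
	"path"
)

// Fetch each schema and save it locally under its own filename so that
// xmllint can find it.
func main() {
	schemas := []string{
		// illustrative entries only; the real list is much longer
		"http://www.xbrl.org/2008/inlineXBRL/xhtml-inlinexbrl-1_0.xsd",
		"http://www.hmrc.gov.uk/schemas/ct/comp/2023-01-01/ct-comp-2023.xsd",
	}
	for _, url := range schemas {
		resp, err := http.Get(url)
		if err != nil {
			log.Fatalf("could not fetch %s: %v", url, err)
		}
		out, err := os.Create(path.Base(url))
		if err != nil {
			log.Fatalf("could not create file for %s: %v", url, err)
		}
		if _, err := io.Copy(out, resp.Body); err != nil {
			log.Fatalf("could not save %s: %v", url, err)
		}
		resp.Body.Close()
		out.Close()
	}
}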

OK, let's try submitting again:
---- xmllint output
submit.xml validates
----
2 error(s) reported:
  Code Raised By Location Type Message
  3001 Department business The submission of this document has failed due to departmental specific business logic in the Body tag.
  4999 ChRIS /hd:GovTalkMessage[1]/hd:Body[1]/ct:IRenvelope[1]/ct:CompanyTaxReturn[1]/ct:AttachedFiles[1]/ct:XBRLsubmission[1]/ct:Accounts[1]/ct:Instance[1]/ct:InlineXBRLDocument[1] schema Internal schema validation error
OK, no different. Well, let's check in all those schemas under the tag CT600_XBRL_SCHEMAS anyway. I'm sure it makes for better validation (it certainly takes longer; I'd started to think I'd created an infinite loop). I included more of these in importer.xml, but it seems that a lot of them are already included by something else, so I commented them out again.

Let's try using the other file I found.
---- xmllint output
submit.xml validates
----
101 error(s) reported:
  Code Raised By Location Type Message
  3001 Department business The submission of this document has failed due to departmental specific business logic in the Body tag.
  0 ChRIS Accounts xbrl.core.xml.SchemaValidationError.UndeclaredPrefix UndeclaredPrefix: Cannot resolve 'iso4217:GBP' as a QName: the prefix 'iso4217' is not declared.
  0 ChRIS Accounts xbrl.core.xml.SchemaValidationError.cvc-type31_3 cvc-type.3.1.3: The value 'iso4217:GBP' of element 'xbrli:measure' is not valid.
  0 ChRIS Accounts xbrl.core.xml.SchemaValidationError.UndeclaredPrefix UndeclaredPrefix: Cannot resolve 'ct-comp:CompanyName' as a QName: the prefix 'ct-comp' is not declared.
...
(I won't bore you with all 101; there is a lot of similarity here.)

I don't know why this one gives detailed errors while the other one didn't, but what stands out to me here is that it is telling me that the namespace prefixes are not declared, when the xbrl file does declare them. What gives?

Well, we're still saving the submissions as we send them (mainly so that we can check them against the schema locally). So we can look at what we're actually submitting. And something immediately jumps out at me. When we look at the attachment, it looks like this:
         <XBRLsubmission>
            <Accounts>
              <Instance>
                <InlineXBRLDocument>
<html xmlns="http://www.w3.org/1999/xhtml" version="-//XBRL International//DTD XHTML Inline XBRL 1.0//EN">
   <head>
      <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"></meta>
      <title>Sample iXBRL instance</title>
   </head>
All the namespace declarations have been removed! No wonder it's complaining that they're not there. So the question is, where do I attach them? Let's start with iso4217 and try attaching it to the GovTalkMessage, since I normally see namespaces attached to the top-level element.

So now submit.xml starts like this:
<GovTalkMessage xmlns="http://www.govtalk.gov.uk/CM/envelope" xmlns:iso4217="http://www.xbrl.org/2003/iso4217">
and the response starts like this:
101 error(s) reported:
  Code Raised By Location Type Message
  3001 Department business The submission of this document has failed due to departmental specific business logic in the Body tag.
  0 ChRIS Accounts xbrl.core.xml.SchemaValidationError.UndeclaredPrefix UndeclaredPrefix: Cannot resolve 'iso4217:GBP' as a QName: the prefix 'iso4217' is not declared.
OK, that's no improvement. Let's try adding it onto the IRenvelope instead.

No, again, the canonicalisation process seems to remove it from there. Time to go back to Google.

Google offers that there is an iXBRL validator offered by Companies House, which, while part of the government, is different from the HMRC tax people. Let's try that.

That fails: "Your file is not valid". Most tellingly, it says that it cannot obtain the schemaRef referred to by http://www.hmrc.gov.uk/schemas/ct/comp/2023-01-01/ct-comp-2023.xsd. But that URL is a schema.

Wait a moment ... that URL is the actual schema location; I thought we were using the fake schema names. Looking through the file, I can see that there is a section in ix:header that explicitly links in the schema location:
           <ix:references>
               <link:schemaRef xlink:type="simple"
                               xlink:href="http://www.hmrc.gov.uk/schemas/ct/comp/2023-01-01/ct-comp-2023.xsd"/>
            </ix:references>
And given that the xsd has a targetNamespace which links back to the "fake" name for the schema, all should be good. So now there are about three things I do not understand:
  • How to submit my CT-600 with these external namespaces;
  • Why the ixbrl validation fails for the sample files;
  • Why, specifically, it complains that a schema for which it can find the correct xsd path cannot be found.
I sense I am inevitably heading down the path of emailing the support team again. But before I do, there is an "escape hatch" (described on page 7 of the xbrl style guide) to submit "encoded" accounts if you are doing something with them that can't be embedded. I'm not sure what cases they are really trying to cover there, but it's worth trying to base64 encode this file and submit it within an EncodedInlineXBRLDocument tag.
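If I go down that road, the change on our side should be small. Here is a minimal sketch reusing the existing ElementWith* helpers; encodedAccountsElement is a name I've made up, and whether the encoded document still sits inside Accounts and Instance is my assumption rather than anything the style guide spells out:
// Sketch only: read the iXBRL file, base64-encode it, and emit it inside an
// EncodedInlineXBRLDocument element instead of InlineXBRLDocument.
// (Needs encoding/base64, fmt and os imported.)
func encodedAccountsElement(filename string) (any, error) {
	bs, err := os.ReadFile(filename)
	if err != nil {
		return nil, fmt.Errorf("could not read %s: %w", filename, err)
	}
	enc := base64.StdEncoding.EncodeToString(bs)
	return ElementWithNesting("Accounts",
		ElementWithNesting("Instance",
			ElementWithText("EncodedInlineXBRLDocument", enc))), nil
}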

At the end of the day, nothing I read, including the xbrl style guide and the XBRL Technical Pack, helped me at all. I'm stuck. Time to email the SDST again.

Conclusion

I don't think I made any real progress here at all. I made some random changes and have checked them in as CT600_ATTEMPTS_AT_IXBRL.

Tuesday, July 22, 2025

Including Accounts


I'm starting to feel a little bit of deadline pressure.

I need to get my accounts submitted by the end of September, and it's now the middle of July. If that doesn't sound like deadline pressure to you, then you're not me 🙂

The sample XML I have been working from up until now has been called no-attach.xml because it doesn't have any attachments. The accounts are an attachment, so we need to look at another example that has one. If it seems to you that I've suddenly changed my focus from CT600 to attachments, it's because my focus at the moment is on getting something complete that HMRC will accept. I can then come back and make something that reflects my accounts and does the appropriate CT600 computations; while that will require more configuration and coding, I'm confident that I can do it, and I don't think it's very interesting. On the other hand, I suspect that there is significant risk in trying to attach something to the return.

I have checked in the HMRC sample with accounts attachments as ixbrl-accounts.xml. This is basically what it has that the other one doesn't:
                        <ThisPeriodAccounts>yes</ThisPeriodAccounts>
...
                <AttachedFiles>
                    <XBRLsubmission>
                        <Accounts>
                            <Instance>
                                <InlineXBRLDocument>
                                    <!--  Place your iXBRL accounts here -->
                                </InlineXBRLDocument>
                            </Instance>
                        </Accounts>
                    </XBRLsubmission>
                </AttachedFiles>

CT600_SAMPLE_ATTACHMENTS:accounts/ct600/ixbrl-accounts.xml

This is less enlightening than I had hoped. An example of the ixbrl would be nice.

iXBRL

I struggled to find examples of iXBRL accounts out there. There is an organisation that plugs iXBRL, and it has some resources, such as the specification, but the only useful example I could find was a sample in PDF on the Companies House website. I would have preferred an XHTML example, but I have extracted the text from the PDF and checked it in as ixbrl-sample.xml (we will use this in a bit as our first cut at a document to include), and we can have a look at it.

This is an outline:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:ix="http://www.xbrl.org/2008/inlineXBRL"
    xmlns:link="http://www.xbrl.org/2003/linkbase" 
    xmlns:xlink="http://www.w3.org/1999/xlink" 
    xmlns:xbrldi="http://xbrl.org/2006/xbrldi" 
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217" 
    xmlns:uk-gaap-pt="http://www.xbrl.org/uk/gaap/core/2009-09-01" 
    xmlns:uk-gaap-cd-bus="http://www.xbrl.org/uk/cd/business/2009-09-01" 
    xmlns:uk-gaap-rp-dir="http://www.xbrl.org/uk/reports/direp/2009-09-01"
    xmlns:ixt="http://www.xbrl.org/2008/inlineXBRL/transformation" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://www.w3.org/1999/xhtml http://www.xbrl.org/2008/inlineXBRL/xhtml-inlinexbrl-1_0.xsd">
  <head>
    <meta name="generator" content="Companies House WebFiling - AA02 Form"/>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style type="text/css">
      ...
    </style>
    <title>Dormant Company Accounts - CH Sample Company Ltd</title>
  </head>
  <body>
    <div class="hidden">
      <ix:header>
        <ix:hidden>
          ...
        </ix:hidden>
        <ix:references>
          <link:schemaRef xlink:type="simple" xlink:href="http://www.xbrl.org/uk/uk-gaap-main-2009-09-01.xsd"/>
        </ix:references>
        <ix:resources>
          ...
        </ix:resources>
      </ix:header>
    </div>
    <div class="titlepage accountspage pagebreak title">
      <p>Registered Number 12345678</p>
      <p>CH Sample Company Ltd</p>
      <p>Dormant Accounts</p>
      <p>31 December 2011</p>
      <p class="dottedLine"/>
    </div>
    <div class="accountspage">
      <div class="accountsheader">
        <h2 class="print">CH Sample Company Ltd<span class="right">Registered Number 12345678</span></h2>
        <p class="print spacer"/>
      </div>
      <div id="balancesheet">
        ...
      </div>
      <div id="statements">
        ...
      </div>
      <div id="approval">
        ...
      </div>
    </div>
  </body>
</html>

CT600_SAMPLE_ATTACHMENTS:accounts/ct600/ixbrl-sample.xml

It starts off reasonably enough. XHTML docs are supposed to have a DOCTYPE, but I will accept they chose not to do that because it is for inline use. Note all the namespace and schema declarations. I wonder if I have those schemas in my technical packs or if I can download them from somewhere. We will need schemas for anything that we want to run through the validator.

To comply with the "inline xbrl" rules, the document must be fully self-contained. Since it isn't allowed to have Javascript anyway, linking in script files is not an issue. On the other hand, all the CSS needs to be included as an inline style. For us, I think it will make sense to copy those in from files provided to us in the configuration when we're ready to do that.
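As and when we do, something along these lines is what I have in mind, assuming the configuration hands us a list of stylesheet paths; inlineStyles is a made-up name and nothing like this exists in the code yet:
// Sketch: concatenate the configured CSS files into a single <style> block
// that can be dropped into the <head> of the generated document.
// (Needs fmt, os and strings imported.)
func inlineStyles(cssFiles []string) (string, error) {
	var sb strings.Builder
	sb.WriteString("<style type=\"text/css\">\n")
	for _, f := range cssFiles {
		bs, err := os.ReadFile(f)
		if err != nil {
			return "", fmt.Errorf("could not read %s: %w", f, err)
		}
		sb.Write(bs)
		sb.WriteString("\n")
	}
	sb.WriteString("</style>")
	return sb.String(), nil
}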

I don't know about you, but this ix:header block seems weird to me. I was expecting everything to be nested inside the visible elements. Why are there things that are sent electronically but would not be in the paper copy? Especially when you are already embedded in an envelope that handles all the authentication?

I was very close to the end of the specification before I realized that there is something weird going on here. Weird, but probably very sensible when you understand it.

I am going to suggest that ixbrl is akin to literate programming, and the ixbrl content we submit is the woven version of the program. There is a process by which the "original code" - in this case, a straightforward xbrl XML document - can be extracted. This needs more things than the paper copy in order to comply with a schema or something, and it needs those things in a particular order. Thus the ixbrl needs to contain instructions on how to do that; we will presumably come back to that when we are generating our accounts. Whatever this process is, it is internal to the HMRC, so I'm not sure whether we can duplicate it, or whether we will just have to deal with any errors when we submit our ixbrl attachments.

In addition to submitting accounts, there is also the requirement (or opportunity?) to submit "tax computations" as an ixbrl. If you thought there weren't many examples of ixbrl accounts out there, you'll be even more disappointed by the tax computations.

All I could find was a picture in a slide deck and an example from a tax preparer.

Fortunately, I also have my previous years' accounts which I believe are in ixbrl, so hopefully that will help. I also found this guide which has quite a lot of useful information in it.

If we can generate the ixbrl for the tax computations, it's at least clear what to do with it in the submitted XML:
                        <ThisPeriodComputations>yes</ThisPeriodComputations>
...
                <AttachedFiles>
                    <XBRLsubmission>
                        <Computation>
                            <Instance>
                                <InlineXBRLDocument>
                                    <!--  Place your iXBRL comps here -->
                                </InlineXBRLDocument>
                            </Instance>
                        </Computation>
                    </XBRLsubmission>
                </AttachedFiles>

CT600_SAMPLE_ATTACHMENTS:accounts/ct600/ixbrl-computations.xml

Submitting What We Have

I am now thoroughly bored of reading what I can find, and of fruitlessly searching for other things. So I want to return to writing code.

At the moment, we just have the "no accounts" and "no computations" reasons hacked into their respective functions:
func accounts() any {
    return ElementWithNesting("Accounts", ElementWithText("NoAccountsReason", "Not within charge to CT"))
}

func computations() any {
    return ElementWithNesting("Computations", ElementWithText("NoComputationsReason", "Not within charge to CT"))
}

CT600_SAMPLE_ATTACHMENTS:accounts/internal/ct600/govtalk/irenvelope.go

We need to change this to support variable content. Starting at the top, we can add new customization to the IRenvelope (this is still hardcoded, but it drives the rest of the layers):
func Submit(conf *config.Config) error {
    utr := conf.Utr
    if utr == "" {
        utr = conf.Business.TaxNum
    }
    ctr := &govtalk.IRenvelope{Business: conf.Business, ReturnType: "new",
        Sender: "Company", // the type of business we are, I believe.  The schema limits it to a handful of options

        UTR:         utr,
        PeriodStart: "2021-04-01", PeriodEnd: "2022-03-31",
        Turnover: 100000.0, TradingProfits: 0, LossesBroughtForward: 0, TradingNetProfits: 0,
        CorporationTax: 0,

        AccountsIXBRL: "ct600/ixbrl-sample.xml",
        NoComputationsReason: "Not within charge to CT",
    }
    submitOptions := &govtalk.EnvelopeOptions{Qualifier: "request", Function: "submit", IncludeSender: true, IncludeKeys: true, IncludeBody: true, IRenvelope: ctr}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/submission/submit.go

This is then analyzed in the IRenvelope AsXML method:
type IRenvelope struct {
    Business   config.Business
    UTR        string
    ReturnType string

    PeriodStart string
    PeriodEnd   string
    Sender      string

    Turnover             float64
    TradingProfits       float64
    LossesBroughtForward float64
    TradingNetProfits    float64
    CorporationTax       float64

    NoAccountsReason     string
    AccountsIXBRL        string
    NoComputationsReason string
    ComputationsIXBRL    string
}

func (ire *IRenvelope) AsXML() any {
    if ire.NoAccountsReason == "" && ire.AccountsIXBRL == "" {
        log.Fatalf("Must give accounts or a reason not to")
    }
    if ire.NoAccountsReason != "" && ire.AccountsIXBRL != "" {
        log.Fatalf("Must EITHER give accounts OR a reason not to")
    }
    if ire.NoComputationsReason == "" && ire.ComputationsIXBRL == "" {
        log.Fatalf("Must give computations or a reason not to")
    }
    if ire.NoComputationsReason != "" && ire.ComputationsIXBRL != "" {
        log.Fatalf("Must EITHER give computations OR a reason not to")
    }

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/irenvelope.go

And used to build the summary, and then the attachments are added to the end of the overall return:
    summary := ElementWithNesting("ReturnInfoSummary", ire.accounts(), ire.computations())
    turnover := ElementWithNesting("Turnover", ElementWithText("Total", fmt.Sprintf("%.2f", ire.Turnover)))
    calc := ElementWithNesting("CompanyTaxCalculation", ire.taxCalc())
    too := ElementWithNesting("CalculationOfTaxOutstandingOrOverpaid", ire.cotoo())
    decl := ElementWithNesting("Declaration", decl())
    attachments := ire.figureAttachments()
    var attach any
    if attachments != nil {
        attach = ElementWithNesting("AttachedFiles", ElementWithNesting("XBRLsubmission", attachments))
    }
    ctr := MakeCompanyTaxReturn(ire.ReturnType, ci, summary, turnover, calc, too, decl, attach)
    return MakeIRenvelopeMessage(irh, ctr)
}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/irenvelope.go

The excuses are now conditional:
func (ire *IRenvelope) accounts() any {
    if ire.NoAccountsReason != "" {
        return ElementWithNesting("Accounts", ElementWithText("NoAccountsReason", ire.NoAccountsReason))
    } else {
        return ElementWithNesting("Accounts", ElementWithText("ThisPeriodAccounts", "yes"))
    }
}

func (ire *IRenvelope) computations() any {
    if ire.NoComputationsReason != "" {
        return ElementWithNesting("Computations", ElementWithText("NoComputationsReason", ire.NoComputationsReason))
    } else {
        return ElementWithNesting("Computations", ElementWithText("ThisPeriodComputations", "yes"))
    }
}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/irenvelope.go

And code appears at the end to generate XML elements for the attached accounts and computations:
func (ire *IRenvelope) figureAttachments() []any {
    ret := []any{}
    if ire.AccountsIXBRL != "" {
        acxml := ElementWithNesting("Accounts", ElementWithNesting("Instance", ContentFromFile("InlineXBRLDocument", ire.AccountsIXBRL)))
        ret = append(ret, acxml)
    }
    return ret
}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/irenvelope.go

We need another trick from Go's book of XML tricks to get this to work by just including a file from disk (which is probably only temporary, as we expect to generate this later, but we will probably end up inlining CSS the same way):
type ContentElement struct {
    XMLName xml.Name
    Content []byte `xml:",innerxml"`
}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/xml.go

The innerxml here says that this is a "string" which contains formatted XML, so don't reformat it.
func ContentFromFile(tag, filename string) *ContentElement {
    fp, err := os.Open(filename)
    if err != nil {
        log.Fatalf("Could not read %s: %v", filename, err)
    }
    defer fp.Close()
    bs, err := io.ReadAll(fp)
    if err != nil {
        log.Fatalf("Could not read %s: %v", filename, err)
    }
    return &ContentElement{XMLName: xml.Name{Local: tag}, Content: bs}
}

CT600_IXBRL_ACCOUNTS:accounts/internal/ct600/govtalk/xml.go

And we can create one of these by reading a file and not doing sensible error handling because I can't be bothered right now (it may appear at some point).
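For the record, a better-behaved version would presumably just return the error rather than exiting (and os.ReadFile saves the open/read dance); a sketch, with a made-up name:
// Sketch of the error-returning variant; not what is checked in.
func contentFromFileErr(tag, filename string) (*ContentElement, error) {
	bs, err := os.ReadFile(filename)
	if err != nil {
		return nil, fmt.Errorf("could not read %s: %w", filename, err)
	}
	return &ContentElement{XMLName: xml.Name{Local: tag}, Content: bs}, nil
}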

We can now run through that submission and get to this point when we check the response:
2 error(s) reported:
  Code Raised By Location Type Message
  3001 Department business The submission of this document has failed due to departmental specific business logic in the Body tag.
  4999 ChRIS /hd:GovTalkMessage[1]/hd:Body[1]/ct:IRenvelope[1]/ct:CompanyTaxReturn[1]/ct:AttachedFiles[1]/ct:XBRLsubmission[1]/ct:Accounts[1]/ct:Instance[1]/ct:InlineXBRLDocument[1] schema Internal schema validation error
OK, given that it does appear to validate on my end (before we submit it), we have to figure that the problem here is that the inline document itself fails schema validation. It would be nice if it gave a crisper error. It would be nice if there were more thoroughly worked examples. Within the realm of reality, though, it would be nice if I had more schema definitions to check it against. I can at least go and look for those.

Conclusion

It wasn't exactly easy, but it wasn't hard either, to include the accounts ixbrl document in the document we submitted to the government. But, for some reason, it was rejected there. I need to look into that.

Monday, July 21, 2025

Notifying Watchers


I think I've reached the point where I've actually done all the Neptune experiments that I really need to do.

Now I'm going to start moving on to building a dummy app that tests that I can do this for real. I'm not going to present all this code because there will be a lot of it and a lot of it is orthogonal to what I really want to do. But it feels that there is one thing that is still relevant to this discussion and that is being aware of connected users.

Before I go any further, I just want to make a few observations that I have noticed in getting to this point.

First off, I'm disappointed that when you choose "serverless" mode for your database, the only way of actually not having a server running is to shut down the database instance. It apparently knows when it is "idle", but still charges you. We're not talking vast sums of money here, but that is different to how (say) Dynamo works.

Secondly, in the introduction I said that I like the way relational databases model relationships but that "they don't scale". It would seem that Neptune is, in fact, built on top of a relational database, not from the ground up. I'm not sure, therefore, what does happen if you try and build a Neptune Cluster across regions or availability zones. I should at least research that if not experiment with that.

Thirdly, my reading of the documentation had led me to believe that I needed two engines: a "writer" and a "reader". That's a fallacy. You need a "primary" (which is both a writer and a reader) and then you can scale by adding more readers. You cannot add multiple writers. So I have shut down one of my two engines.

Tracking Connections

I'm planning on deploying my application to AWS using APIGateway and Lambda. APIGateway (v2) allows websocket connections and, when a websocket connects, provides you with a unique handle that you can later recover to send "unsolicited" messages on (as opposed to replies). I want to store that handle in the neptune graph. Each user can have multiple of these (they could be logged on from a computer and a phone, for instance). From my previous experience, this is just a string; we will have a node type of Endpoint, which could have multiple properties, but for now we are just going to model a connectionId.

So I want to do three things:
  • Add a method to "connect" a user by adding a new Endpoint node with a given connectionId in Neptune.
  • Add a method to "disconnect" a user by removing an existing Endpoint with a given connectionId from Neptune.
  • Update the FindWatchers logic to return a list of pairs (username, connectionId).
In the fullness of time, these will just come from the lambda code, but for now I'm going to add a new main() program, endpoint, which takes three arguments:
  • c to connect or d to disconnect;
  • a userid;
  • a connectionid.
Remember that this code is not expected to generate a unique id; in the case we are considering, that is given to us by APIGateway when the user connects.
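For what it's worth, here is a rough sketch of what I expect the eventual lambda handler to look like, using the aws-lambda-go event types. The connect/disconnect calls are the methods described in the list above (which I haven't written yet), and how the username reaches the handler is pure guesswork on my part:
package main

import (
	"context"

	"github.com/aws/aws-lambda-go/events"
	"github.com/aws/aws-lambda-go/lambda"

	"github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

// Invoked for the $connect and $disconnect websocket routes; APIGateway hands
// us the connection id in the request context.
func handler(ctx context.Context, req events.APIGatewayWebsocketProxyRequest) (events.APIGatewayProxyResponse, error) {
	connId := req.RequestContext.ConnectionID
	var err error
	switch req.RequestContext.RouteKey {
	case "$connect":
		// where the username comes from (authorizer, query string, ...) is still to be decided
		user := req.QueryStringParameters["user"]
		err = neptune.ConnectEndpoint("user-stocks", user, connId)
	case "$disconnect":
		err = neptune.DisconnectEndpoint("user-stocks", connId)
	}
	if err != nil {
		return events.APIGatewayProxyResponse{StatusCode: 500}, err
	}
	return events.APIGatewayProxyResponse{StatusCode: 200}, nil
}

func main() {
	lambda.Start(handler)
}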

Going back to curl, I came up with this to implement connecting:
$ curl https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=
  MATCH (u:User {username:$username})
  CREATE (e:Endpoint {connId:$connId})
  CREATE (u)-[r:endpoint]->(e) RETURN e, r'
-d 'parameters={"username": "user003", "connId": "xx-cc-3"}'
{
  "results": [{
      "e": {
        "~id": "83617f37-c299-4c43-9d56-58e7d97c6b12",
        "~entityType": "node",
        "~labels": ["Endpoint"],
        "~properties": {
          "connId": "xx-cc-3"
        }
      },
      "r": {
        "~id": "5e23d753-f4c2-4ca4-91be-0f366374ad0f",
        "~entityType": "relationship",
        "~start": "492f8845-e1e3-4592-8fb8-27fdeb9351e1",
        "~end": "83617f37-c299-4c43-9d56-58e7d97c6b12",
        "~type": "endpoint",
        "~properties": {}
      }
    }]
}
It took me a while to figure this out. I was trying to understand how I could "create" a node and a relationship at the same time; eventually I realized that this is a "program" and so I can do multiple steps: first, find the user node; then create the new endpoint node with its associated connection id; and then create a relationship between them. I opted to return the answers so that I can see that it worked; it's my expectation that if no user can be found, no new node will be created. I checked and this is correct:
$ curl https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=
  MATCH (u:User {username:$username})
  CREATE (e:Endpoint {connId:$connId})
  CREATE (u)-[r:endpoint]->(e) RETURN e, r'
-d 'parameters={"username": "userNOTFOUND", "connId": "xx-cc-3"}'
{"results":[]}
So in my code I can report if the user matched or not.

Showing Endpoints

Given we have forced an endpoint in there "by hand", I can now make the fairly simple updates to watchers to return the endpoint along with the user:
func FindStockWatchers(db string, stock string) ([]*Connection, error) {
    svc, err := openNeptune(db)
    if err != nil {
        return nil, err
    }
    query := `
    MATCH (u:User)-[r]->(s:Stock {symbol:$symbol})
    MATCH (u)-[]->(e:Endpoint)
    RETURN u.username, s.symbol, e.connId
    `
    params := fmt.Sprintf(`{"symbol": "%s"}`, stock)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(query), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return nil, err
    }

    results, err := unpack(out.Results)
    if err != nil {
        return nil, err
    }
    var ret []*Connection
    for _, m := range results {
        ret = append(ret, &Connection{User: m["u.username"].(string), ConnectionId: m["e.connId"].(string)})
    }

    return ret, nil
}

NEPTUNE_SHOW_ENDPOINTS:neptune/internal/neptune/findWatchers.go

I have highlighted the changes. It may be that it is possible to do all of the matching in one MATCH expression, but certainly my openCypher fu is not up to the task. Instead, it's possible to find all the users that have a stock, and then all the endpoints for "that" user. This should return all the endpoints for all users watching the stock (along with the user id). Because we are now returning a pair, we need to declare a struct for that:
package neptune

import "strings"

type Connection struct {
    User         string
    ConnectionId string
}

NEPTUNE_SHOW_ENDPOINTS:neptune/internal/neptune/connection.go

In the main function for watchers, we need to receive this list of connections and display them appropriately:
func main() {
    if len(os.Args) < 2 {
        log.Printf("Usage: watchers <stock>")
        return
    }
    stock := os.Args[1]
    watchers, err := neptune.FindStockWatchers("user-stocks", stock)
    if err != nil {
        panic(err)
    }
    if len(watchers) == 0 {
        fmt.Printf("no watchers found\n")
        return
    }
    slices.SortFunc(watchers, neptune.OrderConnection)
    curr := ""
    fmt.Printf("Stock %s watched by:\n", stock)
    for _, w := range watchers {
        if w.User != curr {
            fmt.Printf("  %s\n", w.User)
            curr = w.User
        }
        fmt.Printf("    connected at %s\n", w.ConnectionId)
    }
}

NEPTUNE_SHOW_ENDPOINTS:neptune/cmd/watchers/main.go

In order to sort the Connections, we need to provide a comparison function, which I've put in the same file as the Connection:
func OrderConnection(left, right *Connection) int {
    ret := strings.Compare(left.User, right.User)
    if ret != 0 {
        return ret
    }
    return strings.Compare(left.ConnectionId, right.ConnectionId)
}

NEPTUNE_SHOW_ENDPOINTS:neptune/internal/neptune/connection.go

And when we run it, we see this output:
Stock UPM6 watched by:
  user003
    connected at xx-cc-3

Deleting Endpoints

When the user disconnects, we will want to delete the associated endpoint. We should be able to find it by matching on connId and then "calling" DELETE:
$ curl https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=
  MATCH (e:Endpoint {connId:$connId})
  DELETE (e)
  RETURN e'
-d 'parameters={"connId": "xx-cc-3"}'
{
  "code": "BadRequestException",
  "requestId": "4cf7d73a-332c-4cbf-be7c-88e1df36e933",
  "detailedMessage": "Cannot delete node, because it still has relationships. To delete this node, you must first delete its relationships.",
  "message": "Cannot delete node, because it still has relationships. To delete this node, you must first delete its relationships."
}
Unsurprisingly, we are not allowed to delete a node which still has relationships, and this one is still linked to the user node. However, according to the cheat sheet, openCypher has a keyword DETACH to deal with this exact situation.
$ curl https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=
  MATCH (e:Endpoint {connId:$connId})
  DETACH DELETE (e)
  RETURN e'
-d 'parameters={"connId": "xx-cc-3"}'
{
  "results": [{
      "e": {
        "~id": "83617f37-c299-4c43-9d56-58e7d97c6b12",
        "~entityType": "node",
        "~labels": [],
        "~properties": {}
      }
    }]
}
It's interesting to me that this node shows up with the ~id that it had when we created it, but its label and properties have been removed before it was returned.

Fixing Watchers with no Endpoints

When we run the watchers program again, we see:
no watchers found
Now, I realize at this point that I have significantly changed the semantics of watchers here. When updating FindStockWatchers above, I was focused on getting the connection back, and that worked for the user with a connection, but now I notice that I am not seeing the watchers who are not connected. While that is fine if all I am interested in is notifying connected users, it's not what I intended to happen. (Yes, yes, regression tests, I know.)

The problem, of course, is that I have what in relational terms would be called an "inner" join, but I want a "left" join. Can I do that in openCypher? You betcha. Going back to the cheat sheet, there is a special section on OPTIONAL MATCH which is exactly what we want. If it's there, it's included. If it's not, null comes back in its place. Let's add that.

First, in the query portion:
func FindStockWatchers(db string, stock string) ([]*Connection, error) {
    svc, err := openNeptune(db)
    if err != nil {
        return nil, err
    }
    query := `
    MATCH (u:User)-[r]->(s:Stock {symbol:$symbol})
    OPTIONAL MATCH (u)-[]->(e:Endpoint)
    RETURN u.username, s.symbol, e.connId
    `
    params := fmt.Sprintf(`{"symbol": "%s"}`, stock)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(query), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return nil, err
    }

    results, err := unpack(out.Results)
    if err != nil {
        return nil, err
    }
    var ret []*Connection
    for _, m := range results {
        connId := ""
        cid := m["e.connId"]
        if cid != nil {
            connId = cid.(string)
        }
        ret = append(ret, &Connection{User: m["u.username"].(string), ConnectionId: connId})
    }

    return ret, nil
}

NEPTUNE_OPTIONAL_ENDPOINT:neptune/internal/neptune/findWatchers.go

My feeling is that this code is complicated by the way Go handles casting and empty strings, but it is just a question of adding steps and lines rather than any real complexity. Anyway, the key thing is that when there is no endpoint, the map still exists and contains u.username, but the value for e.connId is nil, so we cannot blindly cast it to string.
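An equivalent way of writing that guard, for what it's worth, is the two-value form of the type assertion, which folds the nil check into the assertion itself:
// ok is false when e.connId is missing or nil, so connId stays empty
connId := ""
if cid, ok := m["e.connId"].(string); ok {
	connId = cid
}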

Much the same happens in the main routine:
func main() {
    if len(os.Args) < 2 {
        log.Printf("Usage: watchers <stock>")
        return
    }
    stock := os.Args[1]
    watchers, err := neptune.FindStockWatchers("user-stocks", stock)
    if err != nil {
        panic(err)
    }
    if len(watchers) == 0 {
        fmt.Printf("no watchers found\n")
        return
    }
    slices.SortFunc(watchers, neptune.OrderConnection)
    curr := ""
    fmt.Printf("Stock %s watched by:\n", stock)
    for _, w := range watchers {
        if w.User != curr {
            fmt.Printf("  %s\n", w.User)
            curr = w.User
        }
        if w.ConnectionId != "" {
            fmt.Printf("    connected at %s\n", w.ConnectionId)
        }
    }
}

NEPTUNE_OPTIONAL_ENDPOINT:neptune/cmd/watchers/main.go

Here, we may get multiple rows back for any given user, but we will get at least one for each user watching the stock. We are not guaranteed that there will be a valid ConnectionId, but Go guarantees that in that case there will be an empty string there, so we need to look for that and guard against printing an invalid endpoint.

And, finally, we are back to:
Stock UPM6 watched by:
  user003
  user015
When we have our connection code working, we can check that all the other cases also work.

Encoding Connection Updates

We may be further along than it appears, but what we don't have yet is any code to add and delete connections. Let's fix that, starting with the new endpoint main() program:
package main

import (
    "log"
    "os"

    "github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

func main() {
    if len(os.Args) < 3 {
        log.Printf("Usage: endpoint c <user> <connId>")
        log.Printf(" or    endpoint d <connId>")
        return
    }
    command := os.Args[1]
    var err error
    switch command {
    case "c":
        if len(os.Args) != 4 {
            log.Printf("Usage: endpoint c <user> <connId>")
            return
        }
        watcher := os.Args[2]
        connId := os.Args[3]
        err = neptune.ConnectEndpoint("user-stocks", watcher, connId)
    case "d":
        connId := os.Args[2]
        err = neptune.DisconnectEndpoint("user-stocks", connId)
    default:
        log.Printf("Usage: the command must be 'c' or 'd'")
        return
    }
    if err != nil {
        panic(err)
    }
}

NEPTUNE_ENDPOINT_CONNECTOR:neptune/cmd/endpoint/main.go

This is basically just a lot of argument processing, but it comes down to calling one of two methods, ConnectEndpoint and DisconnectEndpoint, which I have put together in the same file (endpoints.go).

Here is ConnectEndpoint, which should be fairly familiar:
package neptune

import (
    "context"
    "fmt"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/neptunedata"
)

func ConnectEndpoint(db string, watcher string, connId string) error {
    svc, err := openNeptune(db)
    if err != nil {
        return err
    }

    program := `
        MATCH (u:User {username:$username})
        CREATE (e:Endpoint {connId:$connId})
        CREATE (u)-[r:endpoint]->(e)
        RETURN e, r
`
    params := fmt.Sprintf(`{"username": "%s", "connId":"%s"}`, watcher, connId)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(program), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return err
    }

    ret, err := unpack(out.Results)
    if err != nil {
        return err
    }

    if len(ret) == 0 {
        return fmt.Errorf("no user found to connect: %s", watcher)
    }

    return nil
}

NEPTUNE_ENDPOINT_CONNECTOR:neptune/internal/neptune/endpoints.go

This is just our usual query infrastructure with the CREATE query we figured out in curl above, plus processing to make sure that at least one row of results is returned; otherwise it generates an error saying that it (presumably) couldn't find the user.

And DisconnectEndpoint:
func DisconnectEndpoint(db string, connId string) error {
    svc, err := openNeptune(db)
    if err != nil {
        return err
    }

    program := `
        MATCH (e:Endpoint {connId:$connId})
        DETACH DELETE (e)
        RETURN e
`
    params := fmt.Sprintf(`{"connId":"%s"}`, connId)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(program), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return err
    }

    ret, err := unpack(out.Results)
    if err != nil {
        return err
    }

    if len(ret) == 0 {
        return fmt.Errorf("no connectionId found to disconnect: %s", connId)
    }

    return nil
}

NEPTUNE_ENDPOINT_CONNECTOR:neptune/internal/neptune/endpoints.go

This is the same thing again but with our other (DETACH DELETE) query.

Using this to create an endpoint, we see no output, because it is written to only report errors:
$ cmd/endpoint/endpoint c user003 xx-cc-3
But we can check that it connected successfully by running our watchers program:
Stock UPM6 watched by:
  user003
    connected at xx-cc-3
  user015
Which also confirms that it handles both the connected and unconnected cases.

And we can likewise run the disconnect version of endpoint and see it return to no endpoints:
$ cmd/endpoint/endpoint d xx-cc-3
$ cmd/watchers/watchers UPM6
Stock UPM6 watched by:
  user003
  user015

Conclusion

We have successfully managed to navigate the processes associated with having endpoint relationships: creating and deleting them, along with finding the endpoints associated with the users watching a stock. This is basically all the (database) code we need in order to implement a stock watching webapp.

Building such a webapp obviously requires a bunch of other code, mainly around wrapping in a lambda and building out a JavaScript client; and then requires more infrastructure in my deployer. I want to do this, but I'm not sure I've got the energy. On the other hand, I do need to implement a lot of that code in my deployer, so it's as good a testbed as anything.

Finding Watchers


Now I want to turn it around the other way and consider what happens when a stock price changes.

On this occasion, I have a fixed stock in mind, and want to find all the users who are watching that stock so that I can notify them that the price has changed.

Going back to curl, we can come up with a suitable query:
curl https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=MATCH (u:User)-[r]->(s:Stock {symbol:$symbol}) RETURN u.username, s.symbol' -d 'parameters={"symbol": "UPM6"}'
{
  "results": [{
      "u.username": "user003",
      "s.symbol": "UPM6"
    }, {
      "u.username": "user015",
      "s.symbol": "UPM6"
    }]
}
If you're surprised that user003 showed up here, don't be. I obtained this stock from the list we generated in the last episode - which was a list of stocks being watched by user003. It would be very strange if it did not show up.

So let's repeat our trick from last time and put that into code, starting with a main:
package main

import (
    "fmt"
    "log"
    "os"
    "slices"

    "github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

func main() {
    if len(os.Args) < 2 {
        log.Printf("Usage: watchers <stock>")
        return
    }
    stock := os.Args[1]
    watchers, err := neptune.FindStockWatchers("user-stocks", stock)
    if err != nil {
        panic(err)
    }
    if len(watchers) == 0 {
        fmt.Printf("no watchers found\n")
        return
    }
    slices.Sort(watchers)
    fmt.Printf("Stock %s watched by:\n", stock)
    for _, w := range watchers {
        fmt.Printf("  %s\n", w)
    }
}

NEPTUNE_FIND_WATCHERS:neptune/cmd/watchers/main.go

This is simpler than before: we have no need to go to dynamo to find out more information about the stock, since we already started with the stock in hand.

The code to query Neptune is basically just a copy-and-paste of the code that goes the other way, with the appropriate changes for the query above. I have highlighted the lines that are different.
package neptune

import (
    "context"
    "fmt"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/neptunedata"
)

func FindStockWatchers(db string, stock string) ([]string, error) {
    svc, err := openNeptune(db)
    if err != nil {
        return nil, err
    }
    query := `
    MATCH (u:User)-[r]->(s:Stock {symbol:$symbol})
    RETURN u.username, s.symbol
    `
    params := fmt.Sprintf(`{"symbol": "%s"}`, stock)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(query), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return nil, err
    }

    results, err := unpack(out.Results)
    if err != nil {
        return nil, err
    }
    var ret []string
    for _, m := range results {
        ret = append(ret, m["u.username"].(string))
    }

    return ret, nil
}

NEPTUNE_FIND_WATCHERS:neptune/internal/neptune/findWatchers.go

Yes, I agree: if they are so similar, we should probably refactor them to extract the commonalities. On this occasion, I would rather not do that because I'm not exactly sure what the commonalities are and what patterns I would use to put them back together. In short, I'm not sure how much shorter I could make it, and I think the loss of clarity would not be worth it.
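For the record, if I were to factor them, the shared core would presumably look something like this (runCypher is a made-up name; openNeptune and unpack are the existing helpers, and I'm assuming unpack returns []map[string]any, which is how it is used), leaving only the query text and the result-mapping loop in each caller:
// Sketch of the common core both Find* functions share; not checked in.
func runCypher(db, query, params string) ([]map[string]any, error) {
	svc, err := openNeptune(db)
	if err != nil {
		return nil, err
	}
	in := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(query), Parameters: aws.String(params)}
	out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &in)
	if err != nil {
		return nil, err
	}
	return unpack(out.Results)
}
But, as I said, I'm leaving the two functions as they are for now.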

And when we're done we end up with:
$ AWS_PROFILE=ziniki-admin cmd/watchers/watchers UPM6
Stock UPM6 watched by:
  user003
  user015

Conclusion

Once you've got things started, it's really easy to run Neptune queries. I'm still struggling with identifying and fixing syntax errors in the openCypher language, mainly because the error messages seem opaque. But as my understanding of the language grows, I'm getting better at it.

Sunday, July 20, 2025

Watching Stocks


I'll often approach software from the top down, but just as often from the bottom up.

I'm sure there's a lot of interesting psychology behind this, but I think the simple answer is that it tells you where my focus is. On this occasion my focus is very much on the technology - and particularly on Neptune - and writing an application at all is not very interesting.

But I still think in terms of the application, sketching all of the code from the browser downwards in my head until I figure out what the eventual method/lambda call would be. So right now I am thinking about a user who, having logged in (i.e. we have their Username), wants to see the current prices of all the stocks they are watching.

In order to do this, we need to follow all the Watching relationships from the given user, then go back and look up the keys in Dynamo. It is obviously the first of these that is more interesting, but let's do both.

Going back to the cheat sheet, we can look at the code to match a relationship; I think the closest thing they have to what I want is:
MATCH (:Movie {title: 'Wall Street'})<-[:ACTED_IN]-(actor:Person)
RETURN actor.name AS actor
So I want something like this:
MATCH (:User {username: "user008"})-[:watching]->(s:Stock)
RETURN s.symbol AS symbol
I think I can try this using curl and debug it there:
$ curl -k https://user-stocks.cluster-ckgvna81hufy.us-east-1.neptune.amazonaws.com:8182/openCypher -d 'query=MATCH (u:User {username:$username})-[r]->(s:Stock) RETURN u.username, s.symbol' -d 'parameters={"username": "user003"}'
{
  "results": [{
      "u.username": "user003",
      "s.symbol": "CWT5"
    }, {
      "u.username": "user003",
      "s.symbol": "AUOR"
    }, {
      "u.username": "user003",
      "s.symbol": "DRYM"
    }, {
      "u.username": "user003",
      "s.symbol": "PMA2"
    }, {
      "u.username": "user003",
      "s.symbol": "UDBO"
    }, {
      "u.username": "user003",
      "s.symbol": "GKXW"
    }, {
      "u.username": "user003",
      "s.symbol": "RTO2"
    }, {
      "u.username": "user003",
      "s.symbol": "IFC0"
    }, {
      "u.username": "user003",
      "s.symbol": "PFW9"
    }, {
      "u.username": "user003",
      "s.symbol": "POCG"
    }, {
      "u.username": "user003",
      "s.symbol": "ZKDI"
    }, {
      "u.username": "user003",
      "s.symbol": "IRI8"
    }, {
      "u.username": "user003",
      "s.symbol": "XEH1"
    }, {
      "u.username": "user003",
      "s.symbol": "UBD9"
    }, {
      "u.username": "user003",
      "s.symbol": "MAGS"
    }, {
      "u.username": "user003",
      "s.symbol": "OGCY"
    }, {
      "u.username": "user003",
      "s.symbol": "WUIU"
    }, {
      "u.username": "user003",
      "s.symbol": "DJBS"
    }, {
      "u.username": "user003",
      "s.symbol": "UPM6"
    }, {
      "u.username": "user003",
      "s.symbol": "ULB4"
    }, {
      "u.username": "user003",
      "s.symbol": "EUT6"
    }, {
      "u.username": "user003",
      "s.symbol": "LNRF"
    }]
}
So now let's try to put that in code. We need a new main program to drive this, which I'll call stockprices:
package main

import (
    "fmt"
    "log"
    "os"
    "slices"

    "github.com/gmmapowell/ignorance/neptune/internal/dynamo"
    "github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

func main() {
    if len(os.Args) < 2 {
        log.Printf("Usage: stockprices <user>")
        return
    }
    user := os.Args[1]
    stocks, err := neptune.FindWatchedStocks("user-stocks", user)
    if err != nil {
        panic(err)
    }
    if len(stocks) == 0 {
        fmt.Printf("no stocks found\n")
        return
    }
    slices.Sort(stocks)
    prices, err := dynamo.FindStockPrices("Stocks", stocks)
    if err != nil {
        panic(err)
    }
    for _, s := range stocks {
        fmt.Printf("%s: %d\n", s, prices[s])
    }
}

NEPTUNE_STOCK_PRICES:neptune/cmd/stockprices/main.go

This calls the neptune engine to find all the stocks associated with the user specified in argument 1, sorts them by symbol, then goes to dynamo to find the prices of all these stocks and prints them out.

Turning to the neptune code, we have the query from above embedded into a new function FindWatchedStocks:
package neptune

import (
    "context"
    "fmt"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/neptunedata"
)

func FindWatchedStocks(db string, user string) ([]string, error) {
    svc, err := openNeptune(db)
    if err != nil {
        return nil, err
    }
    query := `
    MATCH (u:User {username:$username})-[r]->(s:Stock)
    RETURN u.username, s.symbol
    `
    params := fmt.Sprintf(`{"username": "%s"}`, user)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(query), Parameters: aws.String(params)}
    out, err := svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return nil, err
    }

    results, err := unpack(out.Results)
    if err != nil {
        return nil, err
    }
    var ret []string
    for _, m := range results {
        ret = append(ret, m["s.symbol"].(string))
    }

    return ret, nil
}

NEPTUNE_STOCK_PRICES:neptune/internal/neptune/findStocks.go

I'm not really sure what to say about this, except it is an embedding in Go of the query (with parameters) I showed above. We then go through all the results, extract the s.symbol field from each, and build a list. The user is the same in all the entries because we specified it in the query parameters.

We then pass this list to the FindStockPrices function in the dynamo package:
package dynamo

import (
    "context"
    "strconv"

    "github.com/aws/aws-sdk-go-v2/service/dynamodb"
    "github.com/aws/aws-sdk-go-v2/service/dynamodb/types"
)

func FindStockPrices(table string, stocks []string) (map[string]int, error) {
    svc, err := openDynamo()
    if err != nil {
        return nil, err
    }
    var keys []map[string]types.AttributeValue
    var attrs []string
    for _, s := range stocks {
        key := make(map[string]types.AttributeValue)
        key["Symbol"] = &types.AttributeValueMemberS{Value: s}
        keys = append(keys, key)
    }
    attrs = append(attrs, "Symbol", "Price")
    tableRequest := make(map[string]types.KeysAndAttributes)
    tableRequest[table] = types.KeysAndAttributes{Keys: keys, AttributesToGet: attrs}
    out, err := svc.BatchGetItem(context.TODO(), &dynamodb.BatchGetItemInput{RequestItems: tableRequest})
    if err != nil {
        return nil, err
    }

    ret := make(map[string]int)
    for _, x := range out.Responses[table] {
        sym := x["Symbol"].(*types.AttributeValueMemberS).Value
        price := x["Price"].(*types.AttributeValueMemberN).Value
        ret[sym], err = strconv.Atoi(price)
    }
    return ret, nil
}

NEPTUNE_STOCK_PRICES:neptune/internal/dynamo/findPrices.go

This looks complicated, but that's only because the Dynamo API is so awful. For efficiency, we want to fetch all of the stock prices together, which involves a BatchGetItem call. This obviously requires multiple keys, but instead of just taking a list, it insists on having it carefully constructed. I'm sure there's a reason that applies to cases I have never considered.

Separately, it wants a list of attributes that you want to recover. For some reason, it doesn't (as far as I can see) give you the key back, so you have to say that you want the key fields (Symbol) as well as the attribute we want (Price).

And then we receive back a list of Responses (which I think of as Rows, but are more properly Documents). Each of these has an entry for each of the attributes requested, so we need to extract the Symbol and the Price and then we can build a map.

And when we've finished, we get this:
$ cmd/stockprices/stockprices user003
AUOR: 313
CWT5: 189
DJBS: 456
DRYM: 150
EUT6: 181
GKXW: 329
IFC0: 355
IRI8: 478
LNRF: 322
MAGS: 371
OGCY: 434
PFW9: 415
PMA2: 190
POCG: 477
RTO2: 428
UBD9: 424
UDBO: 180
ULB4: 105
UPM6: 239
WUIU: 257
XEH1: 173
ZKDI: 183

Conclusion

We have been able to traverse the relationship graph in Neptune and then use the responses from that to identify the current stock prices. It's easy to imagine how this could be shown in a web app (or on a command line tool).