Saturday, July 12, 2025

Fixing the IRmark


HMRC replied to me promptly and helpfully, as usual.

In their reply, they included the following important information:
  • The IRmark is calculated from the payload of the submission, from (and including) the opening <Body> tag all the way to (and including) the closing </Body> tag. (From experimentation, I can confirm that this does not include any whitespace before or after those tags.)
  • They pointed me to this collection of information (see below).
  • They gave me the IRmark that their system was calculating from my submission, which makes matching so much easier.
Most importantly, the additional documents they sent me included a step-by-step guide to generating the IRmark, which I had not managed to find myself.

Section 1.1 answers the questions I had and addresses my main problem. In fact, if I had had this document last time, I think I would have made it to the finish line.

The IRmark is generated from the payload of the submission so this part of the XML must be extracted first. The payload is everything inside and including the <Body></Body> node. When you extract the body you must “inherit” any and all namespace declarations in the <GovTalkMessage> node and place them in the <Body> node.

I would never have thought to do this, but it is certainly something that I can do in generation. The other important thing, although phrased differently to the way I would say it, is also in section 1.1:

Finally, to prepare the XML you need to remove the IRmark node from the <Body>. However you choose to do this any data around the IRmark opening and closing tags e.g. white space, line-endings, tabs etc must be preserved.

In other words, when inserting the IRmark node, it's important not to also insert any additional whitespace.
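
The two preparation steps quoted from section 1.1 can be sketched as plain string operations. This is only a minimal illustration, not the code from this project: prepareBody, its ns parameter, the placeholder namespace and the sample document are all hypothetical. It handles just a default namespace on a bare <Body> tag, and the result would still need to be canonicalised before hashing.

```go
package main

import (
	"fmt"
	"strings"
)

// prepareBody is a hypothetical helper. It copies a default namespace onto a
// bare <Body> start tag and removes the IRmark element, leaving every byte
// around that element (newlines, indentation) untouched.
func prepareBody(body, ns string) (string, error) {
	// Step 1: "inherit" the envelope's namespace declaration.
	if !strings.Contains(body, "<Body>") {
		return "", fmt.Errorf("no bare <Body> start tag found")
	}
	body = strings.Replace(body, "<Body>", `<Body xmlns="`+ns+`">`, 1)

	// Step 2: cut from "<IRmark" to the end of "</IRmark>" and nothing else.
	const closeTag = "</IRmark>"
	start := strings.Index(body, "<IRmark")
	if start == -1 {
		return body, nil // nothing to remove
	}
	end := strings.Index(body[start:], closeTag)
	if end == -1 {
		return "", fmt.Errorf("unterminated IRmark element")
	}
	return body[:start] + body[start+end+len(closeTag):], nil
}

func main() {
	in := "<Body>\n  <IRmark Type=\"generic\">abc=</IRmark>\n  <Sender>Company</Sender>\n</Body>"
	// "urn:example:envelope" is a placeholder, not the real GovTalk namespace.
	out, err := prepareBody(in, "urn:example:envelope")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Printing with %q makes the preserved whitespace visible: the indentation that preceded the IRmark element and the newline that followed it are both still there.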

Even more helpfully, this collection of information also includes a worked sample. It contains the submitted message (including the GovTalk header and the calculated IRmark) and the canonicalised Body, which clearly shows the changes above. Because the sample is known to be correct, it can be used to check the algorithm.

This enabled me to "tinker" with my submission until I was able to generate the same IRmark that they did, which, after all, is the objective here.
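
Once the canonical body text is in hand, the comparison itself is small. The sketch below assumes the canonicalisation has already been done elsewhere; irmarkOf and matches are hypothetical names, and the input in main is a placeholder rather than a real submission.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// irmarkOf returns the Base64 encoding of the SHA-1 digest of the
// already-canonicalised body bytes.
func irmarkOf(canonical []byte) string {
	sum := sha1.Sum(canonical)
	return base64.StdEncoding.EncodeToString(sum[:])
}

// matches reports whether our calculated value agrees with the value
// reported for the same submission.
func matches(canonical []byte, reported string) bool {
	return irmarkOf(canonical) == reported
}

func main() {
	canonical := []byte("<Body>...</Body>") // placeholder, not a real submission
	fmt.Println(irmarkOf(canonical))
}
```

A SHA-1 digest is 20 bytes, so the resulting IRmark is always 28 Base64 characters ending in a single "=".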

So, now let's go and do the same thing with our generated code.

The Correct IRmark Implementation

So, we need to generate the <Body> first, then process it to generate the IRmark, manipulate the body (as text, because the IRmark process is incredibly sensitive to whitespace), and finally wrap all of that in the GovTalk message. I'm going to present the code as it ended up, rather than any intermediate steps.
func (gtm *GovTalkMessage) AsXML() (*GovTalkMessageXML, error) {
...
    var body *SimpleElement
    var canonBody string
    if gtm.opts.IncludeBody {
        body = gtm.makeBody()
        var err error
        canonBody, err = canonicaliseBody(body)
        if err != nil {
            return nil, err
        }
    }

    gt := MakeGovTalkMessage(
        canonBody,
        env,
        ElementWithNesting("Header", msgDetails, sndrDetails),
        gtDetails)

    return gt, nil
}

CT600_CORRECT_IRMARK:accounts/internal/ct600/govtalk/govtalk.go

The key difference here is that all the work of dealing with the IRmark, and then generating a text body, now lives in the canonicaliseBody function. We then put that body in the GovTalkMessage structure, and it is used during the final generation of the XML text:
func Generate(conf *config.Config, options *govtalk.EnvelopeOptions) (io.Reader, error) {
    msg := govtalk.MakeGovTalk(options)
    msg.Identity(conf.Sender, conf.Password)
    msg.Utr(conf.Utr)
    msg.Product(conf.Vendor, conf.Product, conf.Version)
    m, err := msg.AsXML()
    if err != nil {
        return nil, err
    }
    bs, err := xml.MarshalIndent(m, "", "  ")
    if err != nil {
        return nil, err
    }
    bs, err = m.AttachBodyTo(bs)
    if err != nil {
        return nil, err
    }

    bs = append(bs, '\n')

    err = checkAgainstSchema(bs)
    if err != nil {
        return nil, err
    }

    return bytes.NewReader(bs), nil
}

CT600_CORRECT_IRMARK:accounts/internal/ct600/submission/generate.go

Attaching the body just uses a placeBefore function, which we'll look at in a bit more detail later on.
func (gtx *GovTalkMessageXML) AttachBodyTo(bs []byte) ([]byte, error) {
    return placeBefore(bs, "</GovTalkMessage>", gtx.canonBody)
}

CT600_CORRECT_IRMARK:accounts/internal/ct600/govtalk/xml.go

Now let's turn and look at canonicaliseBody.
func canonicaliseBody(from *SimpleElement) (string, error) {
    body := MakeBodyWithSchemaMessage(from.Elements...)

    // Generate a text representation
    bs, err := xml.MarshalIndent(body, "  ", "  ")
    if err != nil {
        return "", err
    }

    bs, err = placeBefore(bs, "<Sender>", "\n        ")
    if err != nil {
        return "", err
    }

    // now canonicalise that
    decoder := xml.NewDecoder(bytes.NewReader(bs))
    out, err := c14n.Canonicalize(decoder)
    if err != nil {
        return "", err
    }

    // Generate a SHA-1 digest of the canonical form
    sha := sha1.Sum(out)

    // And then turn that into Base64; the string of this is the IRmark
    b64sha := base64.StdEncoding.EncodeToString(sha[:])

    // remove the "fake" schema
    bs, err = deleteBetween(out, "<Body", ">")
    if err != nil {
        return "", err
    }

    // Add the IRmark
    bs, err = placeBefore(bs, "\n        <Sender>", `<IRmark Type="generic">`+b64sha+"</IRmark>")
    if err != nil {
        return "", err
    }

    // Fix up whitespace around Body
    ret := "  " + string(bs) + "\n"

    return ret, nil
}

CT600_CORRECT_IRMARK:accounts/internal/ct600/govtalk/govtalk.go

The main steps here are as indicated in the comments. The other points of note are described next.

We copy the contents of the body into a special Body element that has an associated schema. We then marshal this body with an additional two-space indent, since we will be placing it inside the GovTalk element. We add the extra line (together with indent) for the <IRmark>. Then, after we have done all the calculation (and come up with the IRmark value in b64sha), we delete the schema from the body tag and insert the <IRmark> element in the right place.
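
The reason for adding the empty line before hashing is that the IRmark element can then be dropped in later without changing any of the surrounding whitespace. Here is a minimal sketch of that idea on its own, leaving out marshalling, canonicalisation and the schema handling. The names reserveSlot and fillSlot, the four-space indent and the sample body are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is the text that follows the reserved line: a newline, the indent,
// and the <Sender> start tag. The four-space indent is an assumption.
const marker = "\n    <Sender>"

// reserveSlot adds an empty, indented line ahead of <Sender>. The result is
// the text that would be canonicalised and hashed.
func reserveSlot(body string) string {
	return strings.Replace(body, "<Sender>", marker, 1)
}

// fillSlot writes the IRmark element at the end of the reserved line and
// adds no whitespace of its own.
func fillSlot(hashed, irmark string) string {
	elem := `<IRmark Type="generic">` + irmark + `</IRmark>`
	return strings.Replace(hashed, marker, elem+marker, 1)
}

func main() {
	body := "<Body>\n    <Sender>Company</Sender>\n</Body>"
	hashed := reserveSlot(body)
	final := fillSlot(hashed, "abc=")

	// Removing the element again must reproduce the hashed bytes exactly.
	elem := `<IRmark Type="generic">abc=</IRmark>`
	fmt.Println(strings.Replace(final, elem, "", 1) == hashed) // prints "true"
}
```

The check in main is the property that matters: whoever removes the IRmark element from the submitted body should end up with exactly the bytes that were hashed.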

Finally, here are the placeBefore and deleteBetween functions, which manipulate the XML text buffer directly:
func placeBefore(bs []byte, match string, insert string) ([]byte, error) {
    str := string(bs)
    s1 := strings.Index(str, match)
    if s1 == -1 {
        return nil, fmt.Errorf("did not find %s", match)
    }
    str = str[0:s1] + insert + str[s1:]
    bs = []byte(str)
    return bs, nil
}

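// deleteBetween removes everything after the first occurrence of from, up to
// (but not including) the next occurrence of to.
func deleteBetween(bs []byte, from string, to string) ([]byte, error) {
    canonBody := string(bs)
    j := strings.Index(canonBody, from)
    if j == -1 {
        return nil, fmt.Errorf("did not find %s", from)
    }
    j += len(from)
    j1 := strings.Index(canonBody[j:], to)
    if j1 == -1 {
        return nil, fmt.Errorf("did not find %s after %s", to, from)
    }
    canonBody = canonBody[0:j] + canonBody[j+j1:]
    return []byte(canonBody), nil
}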

CT600_CORRECT_IRMARK:accounts/internal/ct600/govtalk/govtalk.go
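
To see what these two helpers do to a piece of text, here is a small standalone variant built on strings.Cut (Go 1.18 or later). It is a sketch rather than the project's code: insertBefore and dropBetween are hypothetical names that work on strings instead of byte slices, and the sample input is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// insertBefore places insert immediately ahead of the first occurrence of match.
func insertBefore(s, match, insert string) (string, error) {
	before, after, found := strings.Cut(s, match)
	if !found {
		return "", fmt.Errorf("did not find %q", match)
	}
	return before + insert + match + after, nil
}

// dropBetween removes the text after the first occurrence of from, up to
// (but not including) the next occurrence of to.
func dropBetween(s, from, to string) (string, error) {
	before, rest, found := strings.Cut(s, from)
	if !found {
		return "", fmt.Errorf("did not find %q", from)
	}
	_, after, found := strings.Cut(rest, to)
	if !found {
		return "", fmt.Errorf("did not find %q after %q", to, from)
	}
	return before + from + to + after, nil
}

func main() {
	// Strip the attributes from the <Body> start tag, then add an element.
	s, err := dropBetween(`<Body xmlns="urn:example"><Sender/></Body>`, "<Body", ">")
	if err != nil {
		panic(err)
	}
	s, err = insertBefore(s, "<Sender/>", "<IRmark>abc=</IRmark>")
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints "<Body><IRmark>abc=</IRmark><Sender/></Body>"
}
```

Both helpers act only on the first match and return an error when a marker is missing, so unexpected input fails loudly instead of producing a silently wrong body.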

And, with a deep sigh of relief, I can say "aha! that submits!" and get a <SuccessResponse> back from the government.

Conclusion

With the help of the HMRC support team, I was able to get all of the IRmark code working. Hopefully it keeps working as we now move forward to update the CT600.
