Wednesday, March 5, 2025

Weaving the Chains Together

The final step of the process by which we build the blockchain (before we turn our attention to testing and implementing on AWS) is to weave all these individual chains together.

At the moment, each node has generated its own transactions and its own blocks, and chained those blocks together. Each node has distributed its transactions and blocks to all the other nodes, and all of them have been stored in journals specific to the originating node: we have created multiple copies of all the transactions and blocks, and that "should" be enough, but there is one more step I want to go through.

At any given moment in time T, each node has produced a "latest" block and that should have been sent to all the other nodes. At a point of eventual consistency, all the nodes will have the same set of blocks which were the latest at time T. We can thus define a Weave as a structure that contains:
  • The unique timestamp associated with the Weave (i.e. the time T);
  • The previous Weave.ID, if any;
  • A list of all the nodes, together with the ID of the last Block they published before or at time T.
Note that for consistency, the list of block ids will be in alphabetical order by node name (i.e. the ASCII collation order of the URL strings of each node).

We can then hash this value and come up with a unique hash for "the overall weave at time T".

Separately (and not today), we will require all the nodes to provide a signature for that hash: why this is separate will become clear when we get there.

Defining the Weave

First things first: we need to define a type to hold the Weave:
package records

import "github.com/gmmapowell/ChainLedger/internal/types"

type Weave struct {
    ID           types.Hash
    ConsistentAt types.Timestamp
    PrevID       types.Hash
    LatestBlocks []NodeBlock
}

WEAVE_OUTLINE:internal/records/weave.go

This depends on the idea of a NodeBlock which is a pair of the node name and its latest block ID:
package records

import "github.com/gmmapowell/ChainLedger/internal/types"

type NodeBlock struct {
    NodeName      string
    LatestBlockID types.Hash
}

WEAVE_OUTLINE:internal/records/nodeblock.go

And then we need a type that can create the Weave. For hopefully obvious reasons, I've called this a Loom:
package loom

import (
    "github.com/gmmapowell/ChainLedger/internal/records"
    "github.com/gmmapowell/ChainLedger/internal/types"
)

type Loom struct {
}

func (loom *Loom) WeaveAt(when types.Timestamp) *records.Weave {
    ret := records.Weave{}
    return &ret
}

WEAVE_OUTLINE:internal/loom/loom.go

Copying the process we followed for building blocks, we also have a thread that will build the Weaves on a timed basis:
package loom

import "github.com/gmmapowell/ChainLedger/internal/helpers"

type LoomThread interface {
    Start()
}

type IntervalLoomThread struct {
    clock helpers.Clock
    loom  *Loom
}

func (t *IntervalLoomThread) Start() {
    go t.Run()
}

func (t *IntervalLoomThread) Run() {
    t.loom.WeaveAt(t.clock.Time())
}

WEAVE_OUTLINE:internal/loom/loom_thread.go

(I have put a minimal skeleton in each of these, mainly to stop Go complaining about things not being used; none of this is actually used, because I haven't wired it in.)

Getting it Started

Following along with the model from the Blocker, we will call Start on this thread when we start up the node:
func NewLoom(clock helpers.Clock) LoomThread {
    loom := &Loom{}
    return &IntervalLoomThread{clock: clock, loom: loom}
}

WEAVE_START_THREAD:internal/loom/loom_thread.go
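I'm not going to show the actual wiring here, but the startup code ends up along these lines (a sketch only: the Node type and its clock field are stand-ins, not the real ChainLedger startup code):
package node

import (
    "github.com/gmmapowell/ChainLedger/internal/helpers"
    "github.com/gmmapowell/ChainLedger/internal/loom"
)

// Sketch only: "Node" and its clock field stand in for whatever the
// real startup type looks like.
type Node struct {
    clock helpers.Clock
}

func (n *Node) Start() {
    // ... existing startup: journals, blocker, HTTP listener ...
    loomThread := loom.NewLoom(n.clock)
    loomThread.Start()
}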

Now we need to make the thread call the Loom on a repeating basis. But when?

When Do We Weave?

It's vitally important that each Weave is generated on all the nodes using the same set of blocks, which means they all have to generate the weave based on a specific moment in (Unix) time. To be clear, they don't need to actually weave at the same time, but they need to weave based on the data that was present at a single moment in time. And their clocks do not need to be in sync (although it helps if they are close): they are working on the times at which each of the nodes said it did the work.

But how do they pick a time that they can all agree on?

This is one of the big problems in distributed software: agreeing on anything. Generally, you either need to designate one node the "leader", in which case you have a single point of failure (if that node goes down, who leads?) or you have to have some kind of system for deciding how to resolve different opinions.

On the other hand, if you can decide on a "rule" that you can build into the software, you can avoid all of these issues. In this case, there is an obvious rule to follow which will generate the same results on all the nodes. It is this:

Generate a weave at a regular heartbeat of N ms, and do it when the Unix time on your local clock modulo N is zero.

That is, have a recurring heartbeat which fires at least once every N ms; when it does, take the current time and round it down ("clearing out" the bottom bits) so that the result is an exact multiple of N. If the weave for that rounded time hasn't been built yet, build it now.

I'm going to take advantage of that check on previous work and have my clock tick three times as fast as I want to weave: this reduces the possibility that I simply "miss" an interval, and means a weave will generally be produced within about N/6 ms, on average, of its nominal time.
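To make the arithmetic concrete, here is a tiny standalone illustration (with made-up tick values) of why several fast ticks per interval all land on the same weave time:
package main

import "fmt"

func main() {
    // Three ticks roughly N/3 = 333ms apart all fall inside the same
    // 1000ms interval, so they round down to the same weave timestamp;
    // only the first of them would actually build the weave.
    for _, tick := range []int64{1741099483100, 1741099483433, 1741099483766} {
        rounded := (tick / 1000) * 1000
        fmt.Println(tick, "->", rounded) // all print "-> 1741099483000"
    }
}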

For now, I am going to specify a value of 1000ms (or 1s) for N. This is very frequent and I don't think such a value would be appropriate for a production system; I would probably choose 60,000ms (or 1 minute). On the other hand, remember that until the final Weave is generated, you do not have guaranteed transactional integrity and non-repudiability, so a granularity of many minutes or hours would delay the time when anyone could declare the transaction "complete".

How Do We Specify This?

It's quite simple: this is a system-wide configuration parameter. The obvious thing to do is to put it right in the code, but since we have a configuration file, I'm going to put it there. Note that it's really important to specify the same value on all nodes, and it would probably be a good idea for the nodes to agree on this value at startup. An alternative would be to keep this configuration, along with the names and public keys of all the nodes, in a central place. For now, I'm just going to put it at the top level of the harness configuration and make it work its way through to all the nodes, although it's so dull I'm going to do it off camera.
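For concreteness, the harness config struct presumably grows a field something like this (a sketch: the field name and JSON tag are my own illustration, since the real change happens off camera):
package config

// Sketch only: the field name and tag are illustrative.
type HarnessConfig struct {
    // ... existing fields: node names, public keys, etc. ...
    WeaveInterval int `json:"weaveInterval"` // in ms; must be identical on every node
}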

The Loom Loop

With all that in place, we can now update the LoomThread:
func (t *IntervalLoomThread) Run() {
    // Tick three times per weave interval so we don't miss one.
    delay := time.Duration(t.interval/3) * time.Millisecond
    timer := t.clock.After(delay)

    for {
        select {
        case weaveBefore := <-timer:
            // Round down to an exact multiple of the interval ...
            weaveBefore = weaveBefore.RoundTime(t.interval)
            // ... and only weave if we haven't already done so for that time.
            if !t.myjournal.HasWeaveAt(weaveBefore) {
                weave := t.loom.WeaveAt(weaveBefore)
                t.myjournal.StoreWeave(weave)
            }
        }
        timer = t.clock.After(delay)
    }
}

WEAVE_REGULAR_BUILD:internal/loom/loom_thread.go
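As an aside, the loop relies on the Clock being able to deliver timestamps through a channel; reconstructed from the usage above (not copied from the helpers package), the interface implies something like:
package helpers

import (
    "time"

    "github.com/gmmapowell/ChainLedger/internal/types"
)

// Reconstructed from usage: Time() returns "now", and After(d) returns
// a channel which delivers the timestamp at which the timer fired.
type Clock interface {
    Time() types.Timestamp
    After(d time.Duration) <-chan types.Timestamp
}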

RoundTime is a new method we have defined on types.Timestamp which returns a timestamp which is an exact multiple of the argument granularity:
func (ts Timestamp) RoundTime(granularity int) Timestamp {
    // Integer division truncates, so dividing and re-multiplying rounds
    // the timestamp down to an exact multiple of the granularity.
    i64 := int64(ts)
    i64 /= int64(granularity)
    i64 *= int64(granularity)
    return Timestamp(i64)
}

WEAVE_REGULAR_BUILD:internal/types/timestamp.go

The Journaller is given the job of recording the weaves. We have two operations we are using here: HasWeaveAt and StoreWeave. Putting all of the plumbing to one side, the core of the code is handled here:
func LaunchJournalThread(name string, finj helpers.FaultInjection) chan<- JournalCommand {
    var txs []*records.StoredTransaction
    var blocks []*records.Block
    weaves := make(map[types.Timestamp]*records.Weave)
    ret := make(chan JournalCommand, 20)
...
            case JournalHasWeaveAtCommand:
                v.ResultChan <- weaves[v.When] != nil
            case JournalStoreWeaveCommand:
                weaves[v.Weave.ConsistentAt] = v.Weave
...
}

WEAVE_REGULAR_BUILD:internal/storage/journal_thread.go
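The command structs implied by the fields used above look something like this (part of the plumbing I'm skipping over; this is a reconstruction, not a copy from the repo):
package storage

import (
    "github.com/gmmapowell/ChainLedger/internal/records"
    "github.com/gmmapowell/ChainLedger/internal/types"
)

// Reconstructed from the case clauses above.
type JournalHasWeaveAtCommand struct {
    When       types.Timestamp
    ResultChan chan<- bool // receives true if a weave already exists at When
}

type JournalStoreWeaveCommand struct {
    Weave *records.Weave // stored under its ConsistentAt timestamp
}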

We are storing the Weaves in a map keyed by timestamp because this makes it very easy to check whether one is already defined for a given timestamp. That is the only case we have right now, and I suspect that any other cases will also want to access them by timestamp, so it seems a good fit; if something else comes up, we can always revisit that decision.

And when we run it, we see that we are attempting to weave the blocks together on a regular basis, but not duplicating the weaves:
2025/03/04 14:44:43 http://localhost:5001 weaving at 1741099483000
2025/03/04 14:44:43 http://localhost:5002 weaving at 1741099483000
2025/03/04 14:44:44 http://localhost:5001 weaving at 1741099484000
2025/03/04 14:44:44 http://localhost:5002 weaving at 1741099484000
2025/03/04 14:44:45 http://localhost:5001 weaving at 1741099485000
2025/03/04 14:44:45 http://localhost:5002 weaving at 1741099485000
(Note that each timestamp is an exact multiple of 1000.)

Actually Weaving the Blocks

We have built the loop and, to make it work, we have built something approximating a Weave, but we certainly haven't met the contract we set for ourselves. That is going to require quite a bit more work, most of which is more plumbing. I'll just quietly get on with that and point to the highlights.

The Weave ID is the first thing in the struct, but it's the hardest thing to calculate, because it is a hash of all the other fields, so we need to assemble those first.

ConsistentAt is the timestamp we have been passed in.

PrevID seems fairly easy; we just need to keep track of each time we generate a Weave and record that. The first one will be nil or blank or something.

The block ids are, understandably, the most complicated piece. But let's take it slowly and it will all become clear.

We know we want a slice of NodeBlock, so we can create that:
func (loom *Loom) WeaveAt(when types.Timestamp, prev *records.Weave) *records.Weave {
    var prevID types.Hash
    if prev != nil {
        prevID = prev.ID
    }
    nbs := make([]records.NodeBlock, len(loom.allJournals))

WEAVE_LOOM_COMPLETE:internal/loom/loom.go

We then want a big loop which figures out, for each of the nodes that "this node" knows about, the node's name and the ID of the most recent block that node recorded. We'll delegate figuring that out to the journal, and just assemble the pieces here:
    k := 0
    for n, j := range loom.allJournals {
        blk := j.LatestBlockBy(when)
        if len(blk) == 0 {
            return nil // it is not possible to weave if we don't have at least one block for every node
        }
        nbs[k] = records.NodeBlock{NodeName: n, LatestBlockID: blk}
        k++
    }

WEAVE_LOOM_COMPLETE:internal/loom/loom.go

If there is no latest block for a given node (this won't be true locally, but we might not have received the initial block from a remote node yet), then we simply can't calculate a valid Weave, and so we give up at that point and return nil.
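That nil return has to be handled by the caller, and the thread also needs to track the previous weave so it can pass it in; the loop presumably ends up along these lines (a sketch: I'm not showing the updated thread, and prevWeave is my illustrative field name, not necessarily the real one):
// Sketch: prevWeave is an illustrative field on IntervalLoomThread.
func (t *IntervalLoomThread) Run() {
    delay := time.Duration(t.interval/3) * time.Millisecond
    timer := t.clock.After(delay)

    for {
        select {
        case weaveBefore := <-timer:
            weaveBefore = weaveBefore.RoundTime(t.interval)
            if !t.myjournal.HasWeaveAt(weaveBefore) {
                // A nil weave means some node has no block yet; try again
                // on a later tick rather than storing anything.
                if weave := t.loom.WeaveAt(weaveBefore, t.prevWeave); weave != nil {
                    t.myjournal.StoreWeave(weave)
                    t.prevWeave = weave
                }
            }
        }
        timer = t.clock.After(delay)
    }
}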

A very important point comes next: in order for all the nodes to hash consistently, they must hash exactly the same set of bytes in the same order. Hash functions are (intentionally) very sensitive to even the smallest change in input. To make sure that all the nodes do the same thing, we sort the list of NodeBlocks by node name (the sort function is shown below).
    sort.Slice(nbs, sortByName(nbs))

WEAVE_LOOM_COMPLETE:internal/loom/loom.go

Finally, we can assemble the Weave, calculate its ID by hashing it (I have learnt from previous experience and just implemented HashMe as a function to begin with), and return it:
    sort.Slice(nbs, sortByName(nbs))
    ret := records.Weave{ConsistentAt: when, PrevID: prevID, LatestBlocks: nbs}
    ret.ID = ret.HashMe(loom.hf)
    return &ret
}

WEAVE_LOOM_COMPLETE:internal/loom/loom.go
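I'm not quoting HashMe here, but the crucial property is that every field goes into the hasher in a fixed, well-defined order. A reconstruction (guessing at the type of loom.hf as a plain hasher factory) might look like this:
package records

import (
    "encoding/binary"
    "hash"

    "github.com/gmmapowell/ChainLedger/internal/types"
)

// A reconstruction, not the real method: hash every field in a fixed
// order so that all nodes hash exactly the same bytes.
func (w Weave) HashMe(hf func() hash.Hash) types.Hash {
    hasher := hf()
    binary.Write(hasher, binary.LittleEndian, int64(w.ConsistentAt))
    hasher.Write(w.PrevID) // zero length for the very first weave
    for _, nb := range w.LatestBlocks { // already sorted by node name
        hasher.Write([]byte(nb.NodeName))
        hasher.Write(nb.LatestBlockID)
    }
    return hasher.Sum(nil)
}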

How does that sort work? Well, the second argument to sort.Slice is a function which takes two indices and answers "which of these two elements should come first?" Sadly, it is not also handed the slice; it is expected to "capture" it, which happens naturally if you write the function inline. I didn't want to do that, so I needed to provide a function that returns a function. If you are used to functional programming (or even JavaScript) this might not seem an outrageous idea. For everybody else...
func sortByName(nbs []records.NodeBlock) func(i, j int) bool {
    return func(i, j int) bool {
        return nbs[i].NodeName < nbs[j].NodeName
    }
}

WEAVE_LOOM_COMPLETE:internal/loom/loom.go

So, sortByName is a function which takes a slice of NodeBlock records. It then returns a func. This is really just a way of passing an extra argument (nbs) to the inner function by "capturing" nbs in the outer scope.
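As a design note, since Go 1.21 the standard library offers slices.SortFunc, which hands the comparator the elements themselves rather than indices, so no capturing trick is needed. For comparison:
package loom

import (
    "slices"
    "strings"

    "github.com/gmmapowell/ChainLedger/internal/records"
)

// For comparison only: the same sort using Go 1.21's slices.SortFunc,
// which passes elements instead of indices.
func sortNodeBlocks(nbs []records.NodeBlock) {
    slices.SortFunc(nbs, func(a, b records.NodeBlock) int {
        return strings.Compare(a.NodeName, b.NodeName)
    })
}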

The only other code I think is worth talking about is the new journal method to find the latest block, although it's simple "Data Structures and Algorithms" stuff:
            case JournalLatestBlockByCommand:
                var last types.Hash
                var lastWhen types.Timestamp
                for _, b := range blocks {
                    if b.UpUntil <= v.Before && (last == nil || b.UpUntil > lastWhen) {
                        last = b.ID
                        lastWhen = b.UpUntil
                    }
                }
                v.ResultChan <- last
It scans through the list of blocks (if any) looking for one that is at or before the specified time, and is either the first such one it has seen, or is newer than any of the others it has seen. If it doesn't find any (either because there are no blocks, or because none of them qualify), it will return an empty (length 0) hash.

When we now run the harness, the last few lines are showing both nodes weaving away and generating the same Weave ID:
2025/03/05 10:23:02 http://localhost:5002 wove at 1741170182000: f3dc727521554a5e6a5cebcce2883b6be33a56f305eb00a84d199f91a9be096a03ec49b7e99146a7fc920655e24bd715b52d8e60f113f8d71e0bdd8e948c0072
2025/03/05 10:23:02 http://localhost:5001 wove at 1741170182000: f3dc727521554a5e6a5cebcce2883b6be33a56f305eb00a84d199f91a9be096a03ec49b7e99146a7fc920655e24bd715b52d8e60f113f8d71e0bdd8e948c0072
2025/03/05 10:23:03 http://localhost:5001 wove at 1741170183000: c8b135f5dae19ddade15c7debbf1a66e9eae90b7c6642df17b2b935586fa0c2937e7e5d30e20ab5fa021ab7af9fc33435cea3dff6fdf5b21372a5599cb17144a
2025/03/05 10:23:03 http://localhost:5002 wove at 1741170183000: c8b135f5dae19ddade15c7debbf1a66e9eae90b7c6642df17b2b935586fa0c2937e7e5d30e20ab5fa021ab7af9fc33435cea3dff6fdf5b21372a5599cb17144a
It might be objected that the IDs, while consistent across the nodes, seem to be changing every second, even though no messages have been published. This is, in fact, a bogus objection: there are no new messages or blocks, but the time the Weave was valid has changed. The ConsistentAt member of the Weave has bumped by 1000 and yes, that does make that much difference to a hash. The key thing is that the two nodes are generating the same Weave based on the same data.

Conclusion

And so we have successfully woven together all the blocks from all the nodes on all the nodes. But what we haven't done is signed off on them. As I've said a couple of times, if we want everything in the blockchain to be non-repudiable, all the nodes need to:
  • Agree on the set of messages that have been published and that an approved node has signed them;
  • Incorporate all such messages into a valid, signed block;
  • Chain together all the blocks for the node;
  • Distribute all the messages and blocks to all the other nodes;
  • Weave together some of those blocks from all of the nodes at a consistent interval;
  • Sign off on the finished work.
It's that last step that we will tackle next time in order to declare that we are "code complete": on the happy path at least.
