Saturday, July 12, 2025

Building a Graph

So now we turn to the question of building a "serious"-sized graph.

Not too serious, of course. I'm not even going to build something that wouldn't fit in memory. I just want to build something where you could visualize it not fitting in memory.

So the idea here is that we have an application for "watching" stock prices. There are only a limited number of stocks, but we would visualize several million users, each of whom is watching 20-30 stocks, so there are ~250M relationships. Given the amount of memory machines have these days, that might fit, but it is certainly big.
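
To make the ~250M figure concrete, here is a back-of-envelope sketch. The inputs are assumptions for illustration, not measurements: 10 million users, 25 stocks each (the midpoint of 20-30), and 16 bytes per relationship (two 8-byte identifiers, ignoring any overhead a real graph store would add).

```go
package main

import "fmt"

// relationships returns the total number of user-to-stock links for a
// given number of users and an average watch-list size.
func relationships(users, avgWatched int64) int64 {
	return users * avgWatched
}

// edgeBytes estimates raw storage for the links alone, given an assumed
// cost per link in bytes.
func edgeBytes(links, bytesPerLink int64) int64 {
	return links * bytesPerLink
}

func main() {
	// Assumed inputs: 10 million users, 25 stocks each, 16 bytes per link.
	links := relationships(10_000_000, 25)
	fmt.Printf("links: %d\n", links) // links: 250000000
	fmt.Printf("raw link storage: %.1f GB\n", float64(edgeBytes(links, 16))/1e9) // 4.0 GB
}
```

Under those assumptions the links alone come to about 4 GB, which is consistent with "might fit, but certainly big".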

So, starting with main, what we want to do is delegate almost all the work, so we end up with two lines: one to create stocks and one to create users and link them to the stocks:
func main() {
    inserter, err := dynamo.NewInserter()
    if err != nil {
        log.Fatal(err)
    }

    nodeCreator, err := neptune.NewNodeCreator("user-stocks")
    if err != nil {
        log.Fatal(err)
    }

    stocks := model.CreateAndInsertStocks(inserter, nodeCreator, 2000)
    model.CreateInsertAndLinkUsers(inserter, nodeCreator, stocks, 100, 10, 30)
}

NEPTUNE_CREATE_NETWORK:neptune/cmd/create/main.go

This depends on a new package model which is where all the hard work is going to go on. First off, we have moved the definition of Stock and added a definition of User to types.go:
package model

type Stock struct {
    Symbol string
    Price  int
}

type User struct {
    Username string
}

NEPTUNE_CREATE_NETWORK:neptune/internal/model/types.go

Then we have a file to create the stocks, insert them into dynamo, create nodes in Neptune, and return a list of stocks so that the users will be able to look at them.
package model

import (
    "log"
    "math/rand/v2"

    "github.com/gmmapowell/ignorance/neptune/internal/dynamo"
    "github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

func CreateAndInsertStocks(inserter *dynamo.Inserter, creator *neptune.NodeCreator, count int) []*Stock {
    // We could very easily generate randomly random data
    // But I quite like having reproducibility
    r := rand.New(rand.NewPCG(1, 2))
    stocks := make([]*Stock, count)
    for i := 0; i < count; i++ {
        name := uniqueName(r, stocks[0:i])
        s := Stock{Symbol: name, Price: somePrice(r, 100, 500)}
        stocks[i] = &s
        log.Printf("stock %s %d\n", s.Symbol, s.Price)

        err := inserter.Insert("Stocks", s)
        if err != nil {
            log.Fatal(err)
        }

        err = creator.Insert("Stock", "symbol", s.Symbol)
        if err != nil {
            log.Fatal(err)
        }

    }
    return stocks
}

NEPTUNE_CREATE_NETWORK:neptune/internal/model/stocks.go

This is little different to the code we already had in main(). The obvious difference is that we now create a unique name for each stock (see below), and build a list of stocks to return to the caller.

Unique name creation is just random name creation combined with looping if we've seen the name before.
func uniqueName(r *rand.Rand, notIn []*Stock) string {
tryAgain:
    for {
        tryIt := someName(r)
        for _, s := range notIn {
            if s.Symbol == tryIt {
                continue tryAgain
            }
        }
        return tryIt
    }
}

func someName(r *rand.Rand) string {
    name := make([]rune, 4)
    for i := 0; i < 3; i++ {
        name[i] = rune(65 + r.IntN(26))
    }
    if r.IntN(2) == 0 {
        name[3] = rune(65 + r.IntN(26))

    } else {
        name[3] = rune(48 + r.IntN(10))
    }
    return string(name)
}

func somePrice(r *rand.Rand, from, to int) int {
    return from + r.IntN(to-from+1)
}

NEPTUNE_CREATE_NETWORK:neptune/internal/model/stocks.go
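
The linear scan in uniqueName is perfectly adequate for 2000 stocks, but every check walks the whole list built so far. As a minimal sketch of an alternative (uniqueNames and newRand are hypothetical helpers, not part of the repository), the symbols seen so far can be kept in a map used as a set, so each duplicate check is a single lookup. The sketch also prints how many symbols the generator can produce at all, which shows why collisions are rare at this scale.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newRand returns a generator with the same fixed seed the post uses,
// so that runs are reproducible.
func newRand() *rand.Rand {
	return rand.New(rand.NewPCG(1, 2))
}

// someName mirrors the generator above: three letters followed by
// either a letter or a digit.
func someName(r *rand.Rand) string {
	name := make([]rune, 4)
	for i := 0; i < 3; i++ {
		name[i] = rune('A' + r.IntN(26))
	}
	if r.IntN(2) == 0 {
		name[3] = rune('A' + r.IntN(26))
	} else {
		name[3] = rune('0' + r.IntN(10))
	}
	return string(name)
}

// uniqueNames generates count distinct symbols, using a map as a set
// to reject duplicates.
func uniqueNames(r *rand.Rand, count int) []string {
	seen := make(map[string]struct{}, count)
	names := make([]string, 0, count)
	for len(names) < count {
		n := someName(r)
		if _, dup := seen[n]; dup {
			continue // already used; draw again
		}
		seen[n] = struct{}{}
		names = append(names, n)
	}
	return names
}

func main() {
	names := uniqueNames(newRand(), 2000)
	fmt.Println(len(names), "unique symbols generated")
	// 26*26*26 three-letter prefixes, times 26 letters plus 10 digits.
	fmt.Println("possible symbols:", 26*26*26*36) // 632736
}
```

Like the original loop, this never terminates if count exceeds the number of possible symbols, so it relies on the name space being much larger than the number of stocks.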

Creating users is all new code, but the ideas are much the same as creating stocks, and shares the same inserters for Dynamo and Neptune.
package model

import (
    "fmt"
    "log"
    "math/rand/v2"

    "github.com/gmmapowell/ignorance/neptune/internal/dynamo"
    "github.com/gmmapowell/ignorance/neptune/internal/neptune"
)

func CreateInsertAndLinkUsers(inserter *dynamo.Inserter, creator *neptune.NodeCreator, stocks []*Stock, count int, from, to int) {
    r := rand.New(rand.NewPCG(1, 2))
    for i := 1; i <= count; i++ {
        name := fmt.Sprintf("user%03d", i)
        u := User{Username: name}
        log.Printf("User %s\n", u.Username)

        err := inserter.Insert("Users", u)
        if err != nil {
            log.Fatal(err)
        }

        err = creator.Insert("User", "username", u.Username)
        if err != nil {
            log.Fatal(err)
        }

NEPTUNE_CREATE_NETWORK:neptune/internal/model/users.go

...

We will come back to the rest of this function below.

Unlike the stocks, I have just issued each user with a sequential user ID. There isn't any particular logic to my choices; it just helps me to "feel it more" to have the stocks have reasonable ticker symbols, whereas users just have names.

So the final part of the code has to do with linking together users and stocks. We can do this as follows:
        linkTo := make([]int, from+r.IntN(to-from+1))
        for j := 0; j < len(linkTo); j++ {
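            // Only compare against the slots filled so far: the unfilled
            // slots are still zero, which would rule out stock 0 entirely.
            linkTo[j] = uniqueStock(r, len(stocks), linkTo[:j])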
            sym := stocks[linkTo[j]].Symbol
            log.Printf("  watching %s\n", sym)
            err := creator.Link("Watching", "User", "username", u.Username, "Stock", "symbol", sym)
            if err != nil {
                log.Fatal(err)
            }
        }
    }
}

NEPTUNE_CREATE_NETWORK:neptune/internal/model/users.go

For each user, we choose a random number of stocks to watch; then, for each of those slots, we choose at random a stock the user is not already watching and add a relationship in Neptune between user and stock.

To find a unique stock index we do much the same thing as we did before; the logic here is that the user is not already watching the stock:
func uniqueStock(r *rand.Rand, quant int, notIn []int) int {
tryAgain:
    for {
        tryIt := r.IntN(quant)
        for _, s := range notIn {
            if s == tryIt {
                continue tryAgain
            }
        }
        return tryIt
    }
}

NEPTUNE_CREATE_NETWORK:neptune/internal/model/users.go
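
The retry loop works well when each user watches only a small fraction of the stocks. As a sketch of a different technique (not what the code above does), the distinct indices can be drawn in one step by shuffling all the indices and keeping the first few. pickDistinct and newRand are hypothetical helpers introduced here for illustration; Perm is the standard method on rand.Rand in math/rand/v2.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// newRand returns a generator with the same fixed seed the post uses.
func newRand() *rand.Rand {
	return rand.New(rand.NewPCG(1, 2))
}

// pickDistinct returns k distinct indices in [0, n) by taking the first
// k entries of a random permutation of all n indices.
func pickDistinct(r *rand.Rand, n, k int) []int {
	if k > n {
		k = n // cannot pick more distinct stocks than exist
	}
	return r.Perm(n)[:k]
}

func main() {
	r := newRand()
	from, to := 20, 30
	k := from + r.IntN(to-from+1) // same range calculation as above
	fmt.Println(pickDistinct(r, 2000, k))
}
```

The trade-off is that this allocates and shuffles a slice of all the stock indices for every user, whereas the retry loop only does work proportional to the number of stocks watched. In exchange, it always terminates, even if a user were asked to watch more stocks than exist.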

So all that is left is implementing the Link method in Neptune.

Creating a Link

The Cypher Cheat Sheet has a create section which includes this about creating relationships:
CREATE (n:Label)-[r:TYPE]->(m:Label)
Now, it is not clear to me exactly what this means, but it does say that it binds r to the resulting relationship. I infer from this that it expects n and m to already be bound. It also says that the relationship is directional. I think of the relationship I want as being "from" the user and "to" the stock, but I will want to use it the other way. I assume that you can in fact follow relationships either way.
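
On the question of direction: in Cypher, a pattern can be written with the arrow pointing either way, so a relationship created "from" the user "to" the stock can still be matched starting from the stock. The following is a minimal sketch of such a query, built in the same Sprintf style used elsewhere in this post. watchersQuery is a hypothetical helper, the query has not been run against Neptune here, and it assumes the User and Stock labels and the username and symbol properties used in this post.

```go
package main

import "fmt"

// watchersQuery builds an openCypher query that walks the relationship
// "backwards": from a stock to the users that are watching it.
// The stock symbol is supplied separately as the $symbol parameter.
func watchersQuery(rel string) string {
	return fmt.Sprintf(
		"MATCH (s:Stock {symbol: $symbol})<-[:%s]-(u:User) RETURN u.username",
		rel)
}

func main() {
	fmt.Println(watchersQuery("Watching"))
}
```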

So I think I want to add two MATCH commands to the front of this to find the nodes for User and Stock with the names of the items I want to link.

Oddly, the match section of the cheat sheet does not offer just returning a node matching an attribute, but I think I can infer that I want these two:
MATCH (u:User {username: "user001"})
MATCH (s:Stock {symbol: "ESAV"})
and thus the complete program we want to run is:
MATCH (u:User {username: "user001"})
MATCH (s:Stock {symbol: "ESAV"})
CREATE (u)-[r:Watching]->(s)
Thus I implement the following function inside the node creator:
func (nc *NodeCreator) Link(rel string, t1, p1, l1 string, t2, p2, l2 string) error {
    program := `
MATCH (u:%s {%s: $%s}) 
MATCH (s:%s {%s: $%s})
CREATE (u)-[r:%s]->(s)
RETURN r
`
    doLink := fmt.Sprintf(program, t1, p1, p1, t2, p2, p2, rel)
    params := fmt.Sprintf(`{"%s": "%s","%s": "%s"}`, p1, l1, p2, l2)
    log.Printf("Query: %s Params: %s\n", doLink, params)
    linkQuery := neptunedata.ExecuteOpenCypherQueryInput{OpenCypherQuery: aws.String(doLink), Parameters: aws.String(params)}
    out, err := nc.svc.ExecuteOpenCypherQuery(context.TODO(), &linkQuery)
    if err != nil {
        return err
    }
    return showResults(out.Results)
}

NEPTUNE_CREATE_NETWORK:neptune/internal/neptune/insert.go
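
One thing to watch in Link is that the parameter JSON is assembled with fmt.Sprintf. That is fine for values like "user001" and "ESAV", but a value containing a double quote would produce invalid JSON, and linking two nodes by the same property name would make both MATCH clauses share one parameter. A minimal sketch of a safer way to build the string, using encoding/json from the standard library (buildParams is a hypothetical helper, not part of the repository):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildParams encodes the two property values as a JSON object keyed by
// property name, escaping quotes and other special characters.
func buildParams(p1, l1, p2, l2 string) (string, error) {
	if p1 == p2 {
		// Both MATCH clauses would refer to the same $parameter.
		return "", fmt.Errorf("parameter names must differ, got %q twice", p1)
	}
	bs, err := json.Marshal(map[string]string{p1: l1, p2: l2})
	if err != nil {
		return "", err
	}
	return string(bs), nil
}

func main() {
	params, err := buildParams("username", "user001", "symbol", "ESAV")
	if err != nil {
		panic(err)
	}
	fmt.Println(params) // {"symbol":"ESAV","username":"user001"}
}
```

The keys come out in sorted order because json.Marshal sorts map keys; the order makes no difference to a JSON object of named parameters.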

We can then let it rip and hope for the best. It is totally unreasonable to consider speed "tests" when I am using the serverless mode, am on the other side of the Atlantic from my server, am working through a VPN, and have done nothing to parallelize the operations. For what it is worth, though, it took 8 minutes to create 2000 stocks and another 5 to create 100 users with 20-30 relationships each.

That's not quick, but I do think most of the time has gone in round trips. If I ran all the updates in parallel, or from within a lambda, I'm sure it would be a lot quicker. I'll probably come back to that when I have the lambda sorted out.
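
As a rough idea of what "in parallel" could look like, here is a minimal sketch of a fixed-size worker pool. Everything in it is hypothetical: the link type, the runLinks helper, and the create callback, which stands in for a call such as creator.Link. It also assumes that the underlying client can be shared between goroutines and that all the nodes exist before any links are created.

```go
package main

import (
	"fmt"
	"sync"
)

// link is one user-to-stock relationship waiting to be created.
type link struct {
	Username, Symbol string
}

// runLinks fans the links out to a fixed number of workers and returns
// the first error any of them reports, or nil if all succeed.
func runLinks(links []link, workers int, create func(link) error) error {
	jobs := make(chan link)
	var wg sync.WaitGroup
	var once sync.Once
	var firstErr error

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for l := range jobs {
				if err := create(l); err != nil {
					once.Do(func() { firstErr = err }) // remember only the first failure
				}
			}
		}()
	}
	for _, l := range links {
		jobs <- l
	}
	close(jobs) // no more work; workers exit when the channel drains
	wg.Wait()
	return firstErr
}

func main() {
	links := []link{{"user001", "ESAV"}, {"user001", "QXB7"}, {"user002", "ESAV"}}
	var mu sync.Mutex
	created := 0
	// Stand-in for the real call to Neptune: just count the requests.
	err := runLinks(links, 2, func(l link) error {
		mu.Lock()
		defer mu.Unlock()
		created++
		return nil
	})
	fmt.Println(created, err)
}
```

With the round trip dominating, the number of workers is the main knob: each worker has one request in flight at a time, so the elapsed time should fall roughly in proportion until the server or the network becomes the limit.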

Conclusion

We can build a graph and, with all the pesky issues of connectivity and Neptune syntax sorted out, it went quite quickly.
