Simulating Infinity in Conway’s Game of Life with Modern C++
Conway’s Game of Life, simulating Conway’s Game of Life.
GOLDE is an editor and simulator for cellular automata capable of simulating trillions of generations instantly. I started building it eight months ago with no C++ experience.
I had heard the rumors that C++ was a scary language filled with footguns and segmentation faults, but I had never given it a fair chance myself. This project started as a way for me to learn the basics of OpenGL and C++ by making an app for rendering the Game of Life. What followed was a feedback loop of going deeper into both modern C++ and cellular automata until GOLDE became what it is today.
Though there’s a lot to talk about in GOLDE, I want to walk through the aspect of this journey that I found the most technically challenging: implementing the beautiful HashLife algorithm discovered by Bill Gosper which makes near-infinite evolution possible. I’ll discuss how using modern C++23 features made this process safer, cleaner, and more performant.
HashLife
HashLife represents the Game of Life universe as a memoized quadtree, where each node caches its state many generations into the future. Each node in HashLife is canonical, meaning identical subpatterns are never computed twice. Larger nodes represent larger areas of the universe and are able to cache more generations into the future. Specifically, a node that is $2^n \times 2^n$ cells large can compute itself $2^{n-2}$ generations in the future. This means a $2048 \times 2048$ universe can jump $512$ generations near-instantly. If we want to go further, GOLDE automatically makes the universe larger by padding the edges with empty cells.
For patterns like the Breeder, which expands endlessly, HashLife can jump billions of generations in the time it would take a naive simulation to progress a few hundred. If you’re looking for a more detailed explanation of the algorithm, I recommend this article by Tomas Rokicki. What I want to focus on is the parts that weren’t in any article I could find.
Representing the universe
The actual node-level representation of HashLife looks like this:
1
2
3
4
5
6
7
8
9
10
11
struct LifeNode {
const LifeNode* NorthWest;
const LifeNode* NorthEast;
const LifeNode* SouthWest;
const LifeNode* SouthEast;
uint64_t Hash;
bool IsEmpty;
constexpr LifeNode(const LifeNode* nw, const LifeNode* ne, const LifeNode* sw, const LifeNode* se);
};
The four pointers on LifeNode form the quadtree structure of HashLife. We precompute the hash of the node so node lookups are almost free, and also cache IsEmpty so we can skip over large areas of the universe that are entirely empty. For a glider traveling through an otherwise empty universe, this means HashLife descends only into the handful of nodes that actually contain live cells, skipping vast empty regions in a single pointer check.
To represent the base cases, we use two globally-defined nodes:
1
2
3
4
constexpr inline LifeNode StaticTrueNode{nullptr, nullptr, nullptr, nullptr};
constexpr inline const LifeNode* FalseNode = nullptr;
constexpr inline const LifeNode* TrueNode = &StaticTrueNode;
FalseNode simply represents a dead cell and TrueNode represents a live cell. A nice benefit of this approach is that this will make it easier to expand GOLDE to support multi-state rules in the future, since we just have to add additional base-case nodes.
You might be slightly uncomfortable with the use of raw pointers here, but notice that every LifeNode only stores const pointers. Since the entire quadtree data structure is canonical, it is also immutable. But how do we make it canonical? Enter LifeNodeArena, a bump-pointer allocator that hands out nodes in contiguous blocks:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class LifeNodeArena {
public:
const LifeNode* emplace(const LifeNode* nw, const LifeNode* ne, const LifeNode* sw, const LifeNode* se);
void clear();
private:
constexpr static auto BlockCapacity = 65536UZ / sizeof(LifeNode);
struct BlockDeleter {
void operator()(LifeNode* p) const;
};
std::vector<std::unique_ptr<LifeNode, BlockDeleter>> m_Blocks;
size_t m_Current = BlockCapacity;
};
This is a simple arena implementation with the key advantage that it maintains pointer stability. It’s similar to std::deque, but stores much larger blocks for better cache locality. The custom deleter paired with std::unique_ptr makes memory management trivial here. The emplace function itself is also fairly simple:
1
2
3
4
5
6
7
8
9
10
11
const LifeNode* LifeNodeArena::emplace(const LifeNode* nw, const LifeNode* ne, const LifeNode* sw, const LifeNode* se) {
if (m_Current == BlockCapacity) {
auto* raw = static_cast<LifeNode*>(
::operator new(BlockCapacity * sizeof(LifeNode)));
m_Blocks.emplace_back(raw);
m_Current = 0;
}
auto* node = m_Blocks.back().get() + m_Current++;
std::construct_at(node, nw, ne, sw, se);
return node;
}
std::construct_at is particularly useful here to avoid the naked call to a placement new. In fact, the ::operator new in this code snippet is the only occurrence of a raw new in the entire codebase!
Since LifeNodeArena has no built-in way to look up cached nodes, I turned to ankerl::unordered_dense for an open-addressing hash table fit for the job and used an ankerl::unordered_dense::map<const LifeNode*, const LifeNode*>.
The base case: 65,536 precomputed universes
We represent the universe using an $8 \times 8$ base case. Recall that HashLife will advance by $2^{n-2}$ generations, and since $n = 3$ for the $8 \times 8$ base case, we will advance $2^{3-2}=2$ generations. The goal of the base case is therefore to compute what the center $4 \times 4$ grid of the $8 \times 8$ region looks like $2$ generations from now.
A $4 \times 4$ grid contains $16$ cells, and can therefore be squeezed inside a uint16_t like so:
1
2
3
4
bit 15 14 13 12
bit 11 10 9 8
bit 7 6 5 4
bit 3 2 1 0
Bit $0$ is the bottom-right cell, and bit $15$ is the top-left cell. The entire $8 \times 8$ grid is then four uint16_ts: nw, ne, sw, se.
Since there are $16$ bits that each have two states, there are a total of $2^{16}=65,536$ possible combinations. For a computer, this is a relatively small number, so we pre-compute all $65,536$ patterns at startup using the Game of Life rules.
Crucially, each result in the lookup table is stored in a uint16_t as well, with the four center bits going in bits $0$, $1$, $4$, and $5$, which are the bottom-right quadrant of the bit layout shown above. This decision will pay off later when we’re assembling the final result.
This approach allows for easy integration with rules other than Conway’s Game of Life, since all we need to change is the lookup table for different rules, and the rest of the HashLife algorithm still works. The entire BuildRuleTable function is also constexpr, but in practice compilers find it a bit tricky to fill in such a big table at compile-time. For now, GOLDE simply computes this table when starting up the app (or after changing a rule).
For advancing this $8 \times 8$ base case, we essentially apply the HashLife algorithm at the bit-level rather than the node-level. Nine overlapping $4 \times 4$ windows are formed in the $8 \times 8$ grid. Each window is extracted from the $8 \times 8$ grid purely using bitwise operations. Here is how the center is extracted, for example:
1
2
3
4
5
6
7
8
9
10
// Bitmasks for extracting 2x2 quadrants from a 16-bit 4x4 grid.
constexpr uint16_t MaskNW = 0xCC00;
constexpr uint16_t MaskNE = 0x3300;
constexpr uint16_t MaskSW = 0x00CC;
constexpr uint16_t MaskSE = 0x0033;
uint16_t WindowCenter(uint16_t nw, uint16_t ne, uint16_t sw, uint16_t se) {
return ((nw << 10) & MaskNW) | ((ne << 6) & MaskNE) | ((sw >> 6) & MaskSW) |
((se >> 10) & MaskSE);
}
Each of the nine windows is looked up in the precomputed table, returning the next-generation state of that window’s center $2 \times 2$ cells. The results from this first step are combined like so:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
HashLife::FirstGenResults HashLife::ComputeFirstGeneration(const LeafQuadrants& q) const {
const auto& table = s_Rule.Table();
return {
.nw = table[q.nw],
.n = table[WindowN(q.nw, q.ne)],
.ne = table[q.ne],
.w = table[WindowW(q.nw, q.sw)],
.center = table[WindowCenter(q.nw, q.ne, q.sw, q.se)],
.e = table[WindowE(q.ne, q.se)],
.sw = table[q.sw],
.s = table[WindowS(q.sw, q.se)],
.se = table[q.se]
};
}
These nine $2 \times 2$ results are then recombined into four overlapping $4 \times 4$ windows and looked up a second time, producing the final centered $4 \times 4$ output. Two generations, thirteen table lookups, and zero branching.
Supporting bounded topologies through abstraction
GOLDE supports finite toroidal grids, where the universe wraps around like a donut:
HashLife has no concept of this. It assumes the universe extends infinitely in all directions, and that everything outside the quadtree is dead.
The solution is to lie to HashLife. Before each simulation step, GOLDE copies live cells from each edge of the bounded region to just outside the opposite edge. This means a cell at the rightmost column will get a ghost copy one cell past the left boundary, and vice versa.
HashLife then runs normally and sees the correct neighbors for each cell. Afterwards, the ghost cells are discarded and the simulation is rendered to the screen. All of this happens without HashLife ever even knowing there were bounds to the universe.
Though GOLDE only supports the torus right now, there are several other possible topologies, such as klein bottles, cross-surface topologies, and spheres. The extension point is made readily available through the three-way abstraction central to GOLDE’s algorithm layer: LifeDataStructure, LifeAlgorithm, and Topology. When it’s time to add another topology, all it takes is extending the Topology interface:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
class Topology {
public:
Topology(Rect bounds = {});
virtual ~Topology() = default;
// ...
virtual bool CompatibleWith(const LifeDataStructure& data) const = 0;
virtual int32_t Log2MaxIncrement(const BigInt& requestedStep) const = 0;
virtual void PrepareBorderCells(LifeDataStructure& data) = 0;
virtual void CleanupBorderCells(LifeDataStructure& data) = 0;
};
LifeDataStructure is the complementary side of this contract. At minimum it exposes Get and Set methods, but our Torus can deny any data structures that don’t have additional features it needs, like Extract, using the CompatibleWith method. HashLife itself only ever sees the abstract interfaces, so swapping in a new topology requires no changes to the algorithm. PrepareBorderCells and CleanupBorderCells handle the pre- and post-processing of the bounded grid we discussed earlier.
This topology then gets passed into LifeAlgorithm, which HashLife inherits from:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
class LifeAlgorithm {
public:
virtual ~LifeAlgorithm() = default;
// ...
virtual BigInt Step(LifeDataStructure& data, const BigInt& numSteps, /* ... */) = 0;
virtual void SetTopology(std::unique_ptr<Topology> topology) = 0;
virtual void SetRule(const LifeRule& rule) = 0;
virtual bool CompatibleWith(const LifeDataStructure& data) const = 0;
};
Together, these interfaces will make it simple to extend GOLDE to support algorithms like QuickLife, unique rules, and more in future updates.
Looking back at the Topology interface, the Log2MaxIncrement virtual method controls the maximum number of generations the algorithm can advance in a single step. On an infinite plane, that can be enormous, but on a bounded grid it’s typically capped at one, since the ghost cell trick only works generation by generation. That brings us to one of the most interesting aspects of GOLDE’s HashLife implementation: how do we support arbitrary step sizes at all?
Jumping exactly $n$ generations
As discussed above, HashLife naturally advances by powers of $2$. For an interactive editor where a user can type any number into the step count field, this is a problem. If someone wants exactly $1000$ generations, the naive HashLife implementation has no answer.
I ran into this wall early in development and couldn’t find any resources about the solution online. Unsure of how to proceed, I sent a cold email to Tomas Rokicki, the author of the Dr. Dobbs article linked earlier and one of the core developers of Golly, and asked him directly.
Golly’s approach is elegant in its simplicity. It finds a maxadvance equal to the greatest power of $2$ that divides the step size. For a step size of $1000$, that’s $2^3=8$ generations, so Golly takes $125$ steps of $8$ generations each. If the node being advanced would already predict maxadvance generations (or fewer) through regular HashLife, we can simply use the approach described above. For the slow path, however, we need a different algorithm.
Rather than the standard $9$-window approach, the slow path divides each node into an $8 \times 8$ grid and builds $16$ overlapping windows. We recurse exactly one level down at a time (as opposed to regular HashLife advancing each node twice as they’re assembled), and thread the maxadvance constraint through the call stack. We also need an alternate base-case that only jumps one generation instead of two. This is the technique Rokicki described in his email, and it allows each jump to land on exactly the right generation without overshooting.
At the end of his reply, Rokicki added: “This is not the only possible implementation… it might be a fun thing to play with!” That offhand suggestion became the foundation of GOLDE’s simulation loop.
Rather than finding the greatest power of $2$ that divides $n$, GOLDE finds the greatest power of $2$ that is less than or equal to $n$, jumps by that amount, subtracts it from the remaining steps, then repeats. In code, that looks like this:
1
2
3
4
5
6
7
8
9
10
BigInt generation{};
while (generation < numSteps) {
const auto advanceLevel =
m_Topology->Log2MaxIncrement(numSteps - generation);
const auto gens =
BigPow2(DoOneJump(hashQuadtree, advanceLevel, stopToken));
generation += gens;
}
So a step of $1000$ is $512+256+128+64+32+8$: six instead of Golly’s $125$.
This approach isn’t strictly better than Golly’s. Because maxadvance changes with each iteration of the loop, the memoized results from one jump can’t always be reused by the next. We have to key the hash table on both the node size and the generation count to maintain efficiency. Golly’s fixed maxadvance means the cache stays hot across the entire step, but GOLDE trades some cache efficiency for the ability to land on any generation in at most $\log_2{n}$ jumps.
In further conversations with Rokicki, we discussed improvements to the algorithm that combine the benefits of both Golly’s and GOLDE’s approaches, which I will be investigating as GOLDE’s development continues.
Stopping gracefully
For a responsive editor, it’s important to make sure that HashLife can end on a moment’s notice from the user, especially considering how massive some jumps can be. To solve this constraint, I opted to use C++20’s std::jthread and std::stop_token and threaded it through the HashLife algorithm.
When the simulation starts, the std::jthread passes a stop token into HashLife. On each recursive call, HashLife makes a quick check to see if the stop token has requested stop, and begins an early exit if so:
1
2
3
4
5
6
7
NodeUpdateInfo HashLife::AdvanceNode(const LifeNode* node, std::stop_token stopToken, /* ... */) const {
if (stopToken.stop_requested()) {
return {node, 0};
}
// Do HashLife...
}
Both the fast and slow path call back to this function, so stop-token checks are never missed. After a stop is requested, some cached results may be incomplete, so we clear the relevant cache entries before returning control to the user.
These features prevent us from encountering any nasty bugs from attempting to kill the thread directly, and also ensure a safe exit when the user closes the editor, since std::jthread’s destructor automatically requests a stop before joining.
GOLDE also supports running multiple simulations at once, which causes trouble for a typical static hash table. A simple solution would be to store the hash table as a member variable, but that invariably will result in additional code complexity considering how much the data structure is being copied (for undo/redo management, etc.). Using this approach in an early version of GOLDE also revealed some serious performance penalties, around a 2x slowdown. Additionally, we want the editor thread to be able to use the same cache while performing operations like copy, paste, etc. There’s little concern regarding data races since editing is only allowed while the simulation is paused.
The solution I’ve settled on is a static std::array<HashLifeCache, 256>, with each thread possessing a thread_local size_t CacheIndex. When a thread is born, it selects a cache based on a unique ID provided to each editor. This gives us the best of both worlds: efficiency from a static cache, and the ability to safely share caches between threads.
Conclusion
There’s a lot more to GOLDE than HashLife: a multithreaded simulation loop offloads grid mutations to a background thread while the camera keeps rendering; the OpenGL renderer batches the entire universe into a single draw call by packing cells into a flat texture; and the GUI library built on top of ImGui required bridging its C-Style immediate-mode API with GOLDE’s ownership model without sacrificing the benefits of either. All of these are technically interesting discussions in their own right that deserve follow-up articles. Modern C++23 made building all of this cleaner than C++’s reputation would suggest, so if you’re afraid of the language like I was, it’s worth a second look.
You can download GOLDE and view the full source code on GitHub. Issues, PRs, and questions are very welcome. I want to continue evolving GOLDE to support extensions, scripting, multi-state rules, and more. If any of that interests you, come build GOLDE with me.
