I may be carbon-dating myself here, but one of the first things I implemented on my Apple IIe in the early 80s was John Conway's Game of Life. If you're not familiar with it, it's a cellular automaton: a mathematical game played on an infinite square grid whose spaces are either empty or occupied by a cell. Every generation, cells are born or die according to a few fixed rules. Despite the simple rules, very complex structures can emerge. See the wiki page for more information.
All the code here can be found on GitHub: https://github.com/pauljchang/sandbox/blob/master/life/life.sql
The rules for the game are:
- Any empty cell with exactly three neighbours gives "birth" to a new cell.
- Any cell with 1 or 0 neighbours "dies" from starvation and is removed.
- Any cell with 4 or more neighbours "dies" from overcrowding and is removed.
- Any cell with 2 or 3 neighbours survives to the next generation.
For fun, I've implemented this in BASIC and C++, and maybe even Java, but I thought I'd share my MSSQL implementation, as it uses some interesting concepts.
First of all, instead of using an array, we are going to create a table that contains one row for every cell that is in the field. This may seem like overkill, but for very sparse fields, where just a few cells occupy a very large area, it can be surprisingly efficient, and more efficient than a very large two-dimensional array.
So, let's see the table:
    --
    -- dbo.cell -- Table to hold all cells
    --
    -- Each cell has an entry in the dbo.cell table
    -- It knows its (x,y) coordinates, as well as the generation number
    -- The generation number is useful for diagnostics
    -- and indicates how "old" a cell is
    --
    CREATE TABLE dbo.cell (
        x   SMALLINT NOT NULL
    ,   y   SMALLINT NOT NULL
    ,   gen SMALLINT NOT NULL
    );
    CREATE UNIQUE CLUSTERED INDEX KUX_cell ON dbo.cell (y, x);
    CREATE UNIQUE INDEX UX_cell_x_y ON dbo.cell (x, y) INCLUDE (gen);
    CREATE INDEX X_cell_gen ON dbo.cell (gen, y, x);
You might notice that I clustered the table along the y-axis, which will facilitate printing out the field for display. I also included other indices that will help later on as we step through generations of cells.
There is also a "gen" column that records the generation in which each cell was born. This tells us the age of each cell, and also helps us distinguish between newly "born" cells and cells that were already there.
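To show why clustering on (y, x) suits display, here is a minimal sketch that renders the field as one text row per y value. It is not the print_cells procedure from the repository. The names x_range, live_row, and cells_text are made up for illustration, it assumes SQL Server 2017 or later (for STRING_AGG), and it assumes each cell is drawn as the last digit of its generation, which is what the diagrams in this post appear to show.

```sql
-- Hypothetical sketch, not the repository's print_cells procedure.
-- Assumes dbo.cell as defined above and SQL Server 2017+ (STRING_AGG).
DECLARE @min_x INT, @max_x INT;
SELECT @min_x = MIN(x), @max_x = MAX(x) FROM dbo.cell;

WITH x_range (x) AS (
    -- Every column between the leftmost and rightmost cell
    SELECT @min_x
    UNION ALL
    SELECT x + 1 FROM x_range WHERE x < @max_x
)
, live_row (y) AS (
    -- Only rows that contain at least one cell; blank rows are skipped
    SELECT DISTINCT y FROM dbo.cell
)
SELECT
    live_row.y
,   STRING_AGG(
        CASE WHEN c.gen IS NULL THEN ' '
             ELSE CAST(c.gen % 10 AS CHAR(1))
        END
    ,   ''
    ) WITHIN GROUP (ORDER BY x_range.x) AS cells_text
FROM
    live_row
    CROSS JOIN x_range
    LEFT JOIN dbo.cell AS c
        ON  c.y = live_row.y
        AND c.x = x_range.x
GROUP BY
    live_row.y
ORDER BY
    live_row.y DESC
OPTION (MAXRECURSION 0);
```

The final ORDER BY walks the rows from top to bottom, which is the order the clustered index already stores them in (reversed). A fuller version would also emit blank rows and the border.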
Let's also seed the field with the classic R-pentomino shape. This happens to be one of the smallest patterns that grows into a huge shape that doesn't stabilise for many generations. Here's what we want to seed:

       +---+
     1 | 00|
     0 |00 |
    -1 | 0 |
       +---+
The numbers on the left are the y-axis, and I didn't think it was necessary to display the x-axis. Here's the code to seed the table:
    INSERT INTO dbo.cell (x, y, gen)
    VALUES
        ( 0,  1, 0)
    ,   ( 1,  1, 0)
    ,   (-1,  0, 0)
    ,   ( 0,  0, 0)
    ,   ( 0, -1, 0)
    ;
To step forward one generation, there are two calculations we must make. First, we have to see whether any new cells will be born from the existing cells on the field. Second, we have to see whether any of the existing cells (not the newly born cells) will die.
To do this, I use CTEs (Common Table Expressions), which are like temporary queries or views that live only for the lifespan of a query, but can be referenced by the query.
The first CTE I create is "delta", which allows me to look at neighbouring cells or spaces in all eight directions. Just one delta CTE can be referenced multiple times, which is very convenient.
    WITH delta (num) AS (
        SELECT CAST(-1 AS SMALLINT) AS num
        UNION ALL SELECT CAST( 0 AS SMALLINT) AS num
        UNION ALL SELECT CAST( 1 AS SMALLINT) AS num
    )
...
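As a quick illustration of how a single "delta" CTE covers every direction, cross-joining it with itself yields the nine (dx, dy) offsets of a 3x3 neighbourhood. The column aliases dx and dy exist only in this sketch.

```sql
-- Stand-alone illustration: list every offset produced by delta x delta
WITH delta (num) AS (
    SELECT CAST(-1 AS SMALLINT) AS num
    UNION ALL SELECT CAST( 0 AS SMALLINT) AS num
    UNION ALL SELECT CAST( 1 AS SMALLINT) AS num
)
SELECT
    delta_x.num AS dx
,   delta_y.num AS dy
FROM
    delta AS delta_x
    CROSS JOIN delta AS delta_y
ORDER BY
    dy DESC
,   dx
;
-- Returns 9 rows: the 8 directions plus (0, 0), the position itself
```

Because (0, 0) is among the nine rows, any query that counts the neighbours of an occupied cell has to exclude the cell itself.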
The next CTE generates the set of neighbouring empty spaces. It starts from the current set of cells and looks left, right, up, down, and diagonally for empty spaces. Here, we can see how "delta" is used. It's a pretty simple query: return the distinct neighbouring spaces that don't already have a cell.
    ...
    , empty_neighbour (x, y) AS (
        SELECT DISTINCT
            dbo.cell.x + delta_x.num AS x
        ,   dbo.cell.y + delta_y.num AS y
        FROM
            dbo.cell
            CROSS JOIN delta AS delta_x
            CROSS JOIN delta AS delta_y
        WHERE
            NOT EXISTS (
                SELECT *
                FROM
                    dbo.cell AS other_cell
                WHERE
                    other_cell.x = dbo.cell.x + delta_x.num
                AND other_cell.y = dbo.cell.y + delta_y.num
            )
    )
    ...
The next CTE counts how many neighbouring cells exist for each of those empty spaces. Again, we use "delta" to look in all eight directions. The delta pairs also include (0, 0), but the space itself is empty, so it never adds to its own count.
    ...
    , neighbour_count (x, y, neighbour_count) AS (
        SELECT
            empty_neighbour.x
        ,   empty_neighbour.y
            -- This expression is here to eliminate the silly NULL aggregation warning
            -- Otherwise, we could just COUNT(other_cell.gen)
        ,   COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count
        FROM
            empty_neighbour
            CROSS JOIN delta AS delta_x
            CROSS JOIN delta AS delta_y
            LEFT JOIN dbo.cell AS other_cell
                ON  other_cell.x = empty_neighbour.x + delta_x.num
                AND other_cell.y = empty_neighbour.y + delta_y.num
        GROUP BY
            empty_neighbour.x
        ,   empty_neighbour.y
    )
    ...
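I use the odd-looking COALESCE(SUM(CASE...)) expression to avoid SQL Server's "Null value is eliminated by an aggregate or other SET operation" warning, which COUNT(other_cell.gen) would raise for the unmatched rows of the LEFT JOIN. Just think of it as counting the number of neighbours.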
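To make the equivalence concrete, here is a small stand-alone comparison. The derived table t is a made-up stand-in for the LEFT JOIN result, with one NULL row representing a direction in which no cell was found.

```sql
-- Stand-alone comparison; t stands in for the LEFT JOIN result
SELECT
    COUNT(t.gen) AS count_gen
,   COALESCE(SUM(CASE WHEN t.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS sum_case
FROM
    (VALUES (CAST(0 AS SMALLINT)), (NULL), (CAST(1 AS SMALLINT))) AS t (gen)
;
-- Both columns return 2
```

Both expressions give the same count. With the default ANSI_WARNINGS setting, it should be the COUNT column that triggers the message, because it is the one aggregating over a NULL; the CASE expression turns every NULL into 0 before SUM sees it.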
And finally, the INSERT. We insert with the current generation number to distinguish the newly "born" cells from older cells. This will be useful in the DELETE query.
    ...
    INSERT INTO dbo.cell (x, y, gen)
    SELECT
        neighbour_count.x
    ,   neighbour_count.y
    ,   @gen AS gen
    FROM
        neighbour_count
    WHERE
        neighbour_count.neighbour_count = 3
    ORDER BY
        neighbour_count.y
    ,   neighbour_count.x
    ;
    ...
       +---+
     1 |100|
     0 |00 |
    -1 |10 |
       +---+
The "1" cells are the ones that we just inserted, based on the rule of empty spaces with three surrounding cells. But we still need to delete cells that are overcrowded or starved. In this case, the centre cell should be removed because it has four neighbours (not including the newly "born" ones).
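The count of four can be checked by hand with a small stand-alone query. This sketch copies the field shown above into a table variable (@cell is a made-up name, used so the example does not touch dbo.cell) and counts the older neighbours of the centre cell at (0, 0).

```sql
-- Stand-alone check: the field after the INSERT of generation 1
DECLARE @gen SMALLINT = 1;
DECLARE @cell TABLE (
    x   SMALLINT NOT NULL
,   y   SMALLINT NOT NULL
,   gen SMALLINT NOT NULL
);
INSERT INTO @cell (x, y, gen)
VALUES
    (-1,  1, 1), ( 0,  1, 0), ( 1,  1, 0)
,   (-1,  0, 0), ( 0,  0, 0)
,   (-1, -1, 1), ( 0, -1, 0)
;

SELECT
    COUNT(*) AS old_neighbours
FROM
    @cell AS other_cell
WHERE
    other_cell.x BETWEEN -1 AND 1
AND other_cell.y BETWEEN -1 AND 1
    -- Skip the centre cell itself
AND (other_cell.x <> 0 OR other_cell.y <> 0)
    -- Skip the cells born in this generation
AND other_cell.gen < @gen
;
-- Returns 4: (0, 1), (1, 1), (-1, 0) and (0, -1)
```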
Here, we merely count the neighbours of existing cells, minus newly "born" ones. Again, we use the "delta" CTE.
    ...
    , neighbour_count (x, y, neighbour_count) AS (
        SELECT
            dbo.cell.x
        ,   dbo.cell.y
            -- This expression is here to eliminate the silly NULL aggregation warning
            -- Otherwise, we could just COUNT(other_cell.gen)
        ,   COALESCE(SUM(CASE WHEN other_cell.gen IS NOT NULL THEN 1 ELSE 0 END), 0) AS neighbour_count
        FROM
            dbo.cell
            CROSS JOIN delta AS delta_x
            CROSS JOIN delta AS delta_y
            LEFT JOIN dbo.cell AS other_cell
                ON  other_cell.x = dbo.cell.x + delta_x.num
                AND other_cell.y = dbo.cell.y + delta_y.num
                -- Don't count the cells we just created
                AND other_cell.gen < @gen
                -- We don't want to count the cell itself, just neighbours
                AND (   other_cell.x <> dbo.cell.x
                    OR  other_cell.y <> dbo.cell.y
                    )
        WHERE
            -- Don't count the cells we just created
            dbo.cell.gen < @gen
        GROUP BY
            dbo.cell.x
        ,   dbo.cell.y
    )
    ...
...and the final delete:
    ...
    DELETE
        dbo.cell
    FROM
        dbo.cell
        INNER JOIN neighbour_count
            ON  neighbour_count.x = dbo.cell.x
            AND neighbour_count.y = dbo.cell.y
            AND (   neighbour_count.neighbour_count <= 1
                OR  neighbour_count.neighbour_count >= 4
                )
    ;
    ...
Now our field looks like the following:
       +---+
     1 |100|
     0 |0  |
    -1 |10 |
       +---+
If we step forward a few generations, we see the pattern grow:
       +----+
     2 |  2 |
     1 | 10 |
     0 |2  2|
    -1 | 10 |
       +----+

       +----+
     2 | 32 |
     1 | 103|
     0 |2  2|
    -1 | 10 |
       +----+

       +----+
     2 | 3 4|
     1 |4  3|
     0 |2  2|
    -1 | 10 |
       +----+

       +-----+
     2 |  5  |
     1 |45 35|
     0 |2  2 |
    -1 | 10  |
       +-----+
And after 100 generations, we see it grows, and keeps growing:
        +--------------------------------------------------+
     12 | 98 |
     11 | 9 9 |
     10 | 58 8 8 |
      9 | 45 98 |
      8 | 89 |
      7 | 0 70 45 00 9 |
      6 | 989 0 69 49 80 |
      5 | 9 70 |
      4 | 80 07 069 89|
      3 | 950 980 08 80|
      2 | 09 0 06 0 6 89 00076 |
      1 | 0 0 38 8 5 0 7 90 |
      0 | 0 0 6 8 9 8 |
     -1 | 9 9 9 98 9 0 0 |
     -2 | 00 0 089 |
     -3 | |
     -4 | 09 80 |
     -5 | 7 |
     -6 |07 0 71 |
     -7 |88 0 52 |
     -8 |0 |
     -9 | 9 0 |
    -10 | 0 8 |
    -11 | 090 |
        +--------------------------------------------------+
The code on GitHub contains two stored procedures: "print_cells" to display the field, and "step_cells" to step one generation. It also contains code to set up the initial field with the R-pentomino example above. The code is written for MSSQL, but I've tried to stay close to ANSI SQL, so it should be reasonably easy to port this to other RDBMSs. Bear in mind that CTEs only arrived in SQL:1999, and that the clustered index, the INCLUDE clause, and the DELETE ... FROM join syntax are specific to T-SQL.
I'm reasonably happy with this code. Unlike array implementations, which slow down as the patterns grow larger and occupy more rectangular space, the work per step in this implementation grows roughly O(n), linearly with the number of cells, n, because each cell triggers a fixed number of index lookups.
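One rough way to see the difference is to compare the number of rows in dbo.cell with the area of the rectangle an array implementation would have to cover. This is only a sketch for inspecting the field; the column names live_cells and bounding_box_area are made up here.

```sql
-- Assumes dbo.cell as defined above
SELECT
    COUNT(*) AS live_cells
    -- Cast to INT so the subtraction cannot overflow SMALLINT
,   (CAST(MAX(x) AS INT) - MIN(x) + 1)
  * (CAST(MAX(y) AS INT) - MIN(y) + 1) AS bounding_box_area
FROM
    dbo.cell
;
```

For the freshly seeded R-pentomino this returns 5 live cells in a bounding box of 9 spaces. As the pattern spreads, the second number should grow much faster than the first.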
Is there a better way to implement this in SQL? Of course, we all know that SQL is likely not the best way to implement this, but within the bounds of SQL, how else can we improve this?