Making a Game in Go, part 5
Last post about draw ordering, I promise.
In the first four posts of this series, I iterated
on an implementation for a 2D game
engine, starting from a minimal implementation of the ebiten.Game
interface
through to a system for organising components both in a logical hierarchy as
well as a data structure for drawing the right component at the right time
(assuming the components can be totally ordered). The previous post ended on
some open pondering on whether switching from sort.Sort
to a topological
ordering made sense, from both an ease-of-use perspective (the user can define
fewer ordering constraints) and an efficiency perspective (big-O amount of work
is potentially quadratic).
Friends, it does make sense to implement topological ordering. And all the other kinds, too.
Making things easier for the users of a library is often a good call. Accidentally sorting things incorrectly, because they happen to not be totally ordered, could be really confusing.
And it’s not always the library writer’s job to shield the users from inefficiencies. While a library should ideally aim to avoid needless inefficiency, there are other concerns to balance, like maintainability and ease-of-use, and some problems are by their very nature intractable, or at least difficult in the sense of requiring a lot of processing to solve.
A library or framework can embody the opinions of its author. A responsible library author would clearly document the chosen tradeoffs - at least a big fat “this implementation is hella slow” in the godoc and example code would go a long way. Designing the library interface to avoid or discourage inefficient usage could be even better.
If a game maker wants to overlap thousands of components in the same place, knowing that it would probably be inefficient given the engine being used, then that’s a little bit on them.
And good library might offer a few different approaches, and let the user pick among them.
But all discussion of theoretical efficiency is kind of moot. What matters is how the program behaves when it is actually run. As long as the game’s performance isn’t annoying players, it almost doesn’t matter what kind of algorithm is being used.
By performance I don’t just mean framerate. Other concerns players might have include memory usage, remaining battery, GPU fan noise, wear on the hard drive, secretly mining cryptocurrencies (which, to be clear, are all evil - this isn’t hyperbole - if you support cryptocurrencies or NFTs then you are morally bankrupt and any rights you might have had to read my blog are revoked), the ongoing climate catastrophe…
Benchmarking and profiling tools exist to measure how fast an implementation is. A heuristic I’ve been using is to take a CPU profile, and compare how long is spent preparing to draw versus actually drawing. Actually drawing should take a bit longer.
For my own goals, I’m aiming to make my game run at 60fps under WebAssembly on
an iPad. In a rough early cut of implementing topological ordering, the
framerate on my iPad began to dip a bit below 60. But, with a little work, it
runs about as quickly (often faster) than the previous sort
method, even
though it would be even slower in the worst case (and, bonus: everything is
correctly ordered now).
The gory details
Let’s start with a few data structures.
More data structures? Why not keep everything in a flat slice? Because incremental changes to the right data structure might be more efficient overall compared with doing processing in-place on a flat slice.
Firstly, here is a type that is a set (of things to draw):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
|
struct{}
has no fields and there’s only one value of the type, which isstruct{}{}
, but you can still use it as a value type in a map.Representing a set as
map[...]struct{}
versusmap[...]bool
is a stylistic choice.On the one hand, iterating over
map[...]struct{}
does not require checking if the corresponding value isfalse
, if that is a possibility, and there is (theoretically) no memory required to store eachstruct{}{}
.On the other, testing to see if something is an element is easier in
map[...]bool
, because failed lookups result in the zero value for the value type (false
).if m[key] {
instead of the longerif _, found := m[key]; found {
. Also,true
is slightly easier to type on many keyboards, compared withstruct{}{}
, when adding new values.
Secondly, a direct acyclic graph (DAG) data structure, implemented in a two-sided edge-set fashion:
1 2 3 4 5 |
|
This dag
maps things to draw (type Drawer
) to two sets: a set of things to
draw before it (in
) and a set of things to draw after it (out
). I will now
switch terminology: the things to draw are the vertices, and the ordering
constraints are the edges linking the vertices together.
Some edge-list or edge-set implementations of graphs focus only on the
outgoing edges from each vertex. I’m also recording ingoing edges, for reasons
that will be explained soon. What is important is that if v
is an element of
d[u].out
, then u
should be an element of d[v].in
.
Here’s a few methods for managing edges and vertices in dag
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
|
A quick note about
addEdge
andremoveEdge
. Ifu
isn’t already ind
, thend[u]
has the valueedges{}
, or more verbosely,edges{in: nil, out: nil}
. Using.in
and.out
on this value is safe (unlike, say, if the map stored pointers toedges
, because thend[u]
would benil
ifu
is not in the map) and deleting from.in
and.out
is also safe (becausedelete
is documented to do nothing fornil
maps). Setting values in.in
and.out
(wherenil
) would panic, however. This is whyaddEdge
needs to calladdVertex
to ensure the maps exist, butremoveEdge
does not.
The degree of a vertex is how many edges connect to it. In a directed graph,
the outdegree of a vertex is how many edges start with that vertex, and the
indegree is how many end with that vertex. In- and out-degrees can be counted
easily with dag
: indegree of v
is len(d[v].in)
and outdegree is
len(d[v].out)
Ease of counting indegrees and iterating out-edges suggests using a topological sorting algorithm based on… counting indegrees and iterating out-edges (Kahn’s Algorithm):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
|
This can be extended to tolerate (though not truly “support”) cyclic graphs; if there is a cycle, the queue will become empty but some vertices will remain unvisited. But, if there is a cycle, then there is no draw ordering that respects all the constraints. At this level of the library, it may be appropriate to enqueue an arbitrary unvisited vertex, perhaps one with minimal indegree, let it draw, and hope for the best.
Hope? What happened to “hope is not a strategy”?
Actually, hope is a strategy, just often a bad one.
Using the dag
Now I just have to get a dag
loaded up with stuff to draw and enough ordering
constraints, and we’re good to go. In the previous post this is where I threw my
hands up, exlaiming “comparing everything against everything else is quadratic!
This is going to suck!”. And that’s still true, in the worst case.
In practice, a game consisting of lots of things overlapping a small area is likely to be inscrutable to the player, and there might be easier ways to achieve the intended effect (e.g. a single flat image or animation instead).
The number of comparisons and ordering constraints can be reduced in the following two ways, among others:
- If something hasn’t moved (e.g. components that are “fixed terrain”), then we don’t have to re-evaluate its ordering against other fixed things.
- If, after projecting into screen space, two items don’t overlap, we don’t have to compare them. We can speed up finding potentially overlapping things in a variety of ways.
To simplify the rest of the discussion, I will assume every component that needs
drawing also has a 3D bounding box, i.e. the Drawer
interface now looks like:
1 2 3 4 |
|
where geom.Box
is a 3D analogue of image.Rectangle
. This definition makes it
harder on components that “just want to provide a Z value”, since they now have
to invent a bounding box. Though, that bounding box could be:
1 2 3 4 |
|
This is an edge case worth keeping in mind later - what happens with incredibly huge components?
For the sake of letting the game maker choose different draw-ordering approaches, let’s expose the DAG-based drawing code via a new type:
1 2 3 4 5 6 7 8 9 |
|
Other approaches to drawing can have their own type. For instance, DrawDFS
(short for “draw depth-first search”) could draw subcomponents in a depth-first
traversal of the tree, without regard to whether the components have other ideas
about their draw ordering. The game maker can then pick and choose which “draw
manager” to use, or even write their own. With a bit of care, multiple draw
managers could be used for different parts of the game tree!
Anyway.
The API for DrawDog DrawDAG
will involve being notified of subcomponents
being added and removed, as well as needing to take stock of all subcomponents
initially. Since each component is actually a tree of components, registering
and unregistering will have to happen recursively (recall Scan
for finding
immediate subcomponents of a component).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
|
After Update
-ing subcomponents, DrawDAG
is ready to update d.dag
. A
simplistic approach would be to delete d.dag
and build it from scratch. But
some work can be saved if we only update for components whose bounding boxes
have changed. If registerOne
and unregisterOne
are relatively quick, we can
remove all the components that changed, and re-add them.
These changes to DrawDAG
might look like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
|
Note that this does not deal with the possibility that draw ordering might need to change in cases where bounding boxes remain unchanged. This could be added using an interface for components to report whether they’ve changed shape in the current update.
Still, this is an improvement on rebuilding the DAG on every update - components that have changed have their constraints checked, and components that are fixed only change if they were or are connected to something that changed. In the worst case this could still be every component, but for a scene consisting mostly of fixed terrain, this is a big improvement.
Next, let’s deal with avoiding comparisons in registerOne
by checking if
there are overlaps in screen space. A blunt-force approach, that makes the
theoretical worst-case even more worse, is to divide screen space into
equally-sized square chunks, and for each chunk, track the set of components
that overlap that chunk.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 |
|
If the scene consists of small components that are spread out, this approach is pretty good. Each chunk will only have a few components, and each component is small, so there are few chunks to gather and few components to compare.
If the scene consists of lots of large, perpetually moving components, all overlapping the same chunks, then all this is bad! - at least \(O(A|V| + |V|^2)\), where A is the average screen area of the components (as measured in chunks, but that’s a constant factor).
It’s a start!