GovHack 2017

Nice.

GovHack is wrapped up for 2017. It was finished on the weekend, but Monday and Tuesday were a waste - I have been too miserable and sick to blog about it, and only just got a burst of energy to break through the mehffort. Our entry, Death Who? (Colonial Edition), is a virtual card game based upon real lives recorded in Tasmanian historical records. I wanted to share what I’ve learned about making a super hacky multiplayer game backend from scratch in 46 hours, in case you want to try the same thing.

The past few years I have flown down to Hobart to team up with old friends. As usual, I stated up-front that I wanted no share of any prizes we might win. I’m literally in it for the fun of hanging out with my friends.

The mix was a bit different this year. Our team of 8 consisted of me, Hannah, Paris, Mars, Jon, Tim, Seb, and Shea. Our new team members Hannah, Mars, and Shea all brought their own different strengths this year; Hannah and Shea worked on a web-based component, which is unusual for our team. Previous regular Rex went and did a solo entry which you need to check out.

Similar to past years, I made a backend for the game in Go. Each time this has been a server of some kind, depending on what is needed. There are a few preliminaries that can be done while the team settles on a direction, but it’s often better to have the decisions settled on Friday night, get a good night’s sleep, and then get cracking on Saturday morning. This time, we did the latter.

Constraints

  • Serve a game based on some government-provided data.
  • Work with multiple players.
  • SLO: It has to work for demonstration purposes (what might be called “best effort” SLO, but I have a whole other rant about that).
  • No need to worry about bandwidth, CPU, memory, disk usage, or any of that as long as it works for demo purposes.
  • Players can cheat and hack the server as much as they like - demo purposes, see above.

As such, the server doesn’t have to be Google-class production-grade stuff. It just has to work for 30 minutes. Nevertheless, it’s relatively easy to do the right things in Go. I think it’s still running even now…

Hello, GovHack

The designers of Go had web servers in mind, so adding one feels natural. I generally jump straight to a skeleton main.go in a server directory in the GitHub repo without thinking much more about it:

package main

import (
    "flag"
    "fmt"
    "log"
    "net/http"
)

var httpPort = flag.Int("http_port", 23480, "Port the webserver listens on")

func main() {
	flag.Parse()

	// Set up HTTP handlers
	http.HandleFunc("/helloz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "Hello, GovHack 2017!\n")
	})

	// Start listening on HTTP port; block.
	if err := http.ListenAndServe(fmt.Sprintf(":%d", *httpPort), nil); err != nil {
		log.Fatalf("Couldn't serve HTTP: %v", err)
	}
}

If you look carefully through our repo history, you’ll notice that flag.Parse call was missing until Sunday. I frequently forget the basics!

The beauty of running a web server is that it lets you (1) provide a way of checking that the server is still running, and (2) provide a way of inspecting the game state at any point without logging—just leave a browser tab open on localhost:23480/something and refresh when you want to take a peek. So let’s set up a handler for that:

	http.HandleFunc("/statusz", func(w http.ResponseWriter, r *http.Request) {
		// TODO: write the game state to w
	})

If you’re wondering about the z at the end of /helloz and /statusz: I really can’t help putting it there. It’s a Google-ism.
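For the curious, here is roughly how that TODO could be filled in. This is a sketch rather than what is in our repo: gameState and fetchStatus are made-up names, and the real state has more in it. The handler takes a read lock and writes the state out as indented JSON, which is about as much "admin UI" as a hackathon needs.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// gameState is a hypothetical stand-in for the real game state.
type gameState struct {
	sync.RWMutex
	Phase   string `json:"phase"`
	Players int    `json:"players"`
}

// ServeHTTP writes the current state as indented JSON.
func (g *gameState) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	g.RLock()
	defer g.RUnlock()
	w.Header().Set("Content-Type", "application/json")
	enc := json.NewEncoder(w)
	enc.SetIndent("", "  ")
	if err := enc.Encode(g); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
	}
}

// fetchStatus exercises the handler in-process, without opening a port.
func fetchStatus(g *gameState) (int, string) {
	rec := httptest.NewRecorder()
	g.ServeHTTP(rec, httptest.NewRequest("GET", "/statusz", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	g := &gameState{Phase: "lobby", Players: 2}
	// In the real server this would be: http.Handle("/statusz", g)
	code, body := fetchStatus(g)
	fmt.Println(code)
	fmt.Print(body)
}
```

Because the state type is itself an http.Handler, registering it is one line, and the same method can be poked at from a test without a network.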

State

It’s best to agree upon the basic structure of the state-space with the people making the client before going any further. In this case, we have a multiplayer card game with two types of cards. We whiteboarded some states:

  • Lobby: the players are joining the game, the game hasn’t “started”
    • When someone pushes the “start” button, transition to…
  • In game: the game is underway
    • Players take turns. Once one player plays, go to the next player.
    • Once all players have had a turn, begin the next round.
    • Once all rounds have been played, transition to…
  • Game over: the winner is proclaimed.
    • The game can be returned to the lobby state by pushing a button.

Start setting up some types to track the state. I like to do this in a separate package, but there’s no compelling reason to be so neat in a jam/hackathon.

package game

// Statum is some fake latin I made up
type Statum int

const (
    StateLobby Statum = iota
    StateInGame
    StateGameOver
)

type State struct {
    State     Statum          `json:"state"`
    Players   map[int]*Player `json:"players"`
    WhoseTurn int             `json:"whose_turn"`
    Clock     int             `json:"clock"`
}

type Player struct {
    // ... snip ...  
    Score       int     `json:"score"`
}

While it’s no big deal in Go to handle sending/receiving different JSON types nested inside some kind of genericised “message” type by adding MarshalJSON/UnmarshalJSON methods, to make it easier for clients I recommend avoiding doing that. In this case some parts of the state only have meaning depending on other parts of the state (e.g. WhoseTurn and Clock only mean something when State == StateInGame). To make it even easier we are also sending the entire game state to the client—we don’t care about cheating (it’s a hackathon).
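To make "send the entire game state" concrete, here is a small sketch of what the client would see on the wire for the types above. The encode helper is a name I made up for the example, and Player is trimmed down to the one field shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Statum int

const (
	StateLobby Statum = iota
	StateInGame
	StateGameOver
)

type Player struct {
	Score int `json:"score"`
}

type State struct {
	State     Statum          `json:"state"`
	Players   map[int]*Player `json:"players"`
	WhoseTurn int             `json:"whose_turn"`
	Clock     int             `json:"clock"`
}

// encode returns the JSON a client would receive for s.
func encode(s *State) string {
	b, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	s := &State{Players: map[int]*Player{0: {}, 1: {Score: 3}}}
	fmt.Println(encode(s))
	// prints {"state":0,"players":{"0":{"score":0},"1":{"score":3}},"whose_turn":0,"clock":0}
}
```

Two things worth telling the client developer: the state enum goes over the wire as a plain integer, and the integer map keys turn into JSON strings.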

Since this is a multiplayer game, be sure to smother state mutations in mutexes. sync.RWMutex is great and lets you get better performance in some cases but a sync.Mutex would be fine (it’s a hackathon). It would be possible to use the new sync.Map here, but we have to guard more than just the map from concurrent access and I like concrete types, so embedding a mutex makes the most sense.

type State struct {
    //...snip...

    // Bookkeeping. The mutex is embedded, so s.Lock(), s.RLock(), etc.
    // can be called directly on a *State.
    sync.RWMutex
    nextID int
}

func New() *State {
    return &State{
        Players: make(map[int]*Player),
    }
}

func (s *State) AddPlayer() int {
    s.Lock()
    defer s.Unlock()
    id := s.nextID
    s.Players[id] = new(Player)
    s.nextID++
    return id
}

func (s *State) RemovePlayer(id int) {
    s.Lock()
    defer s.Unlock()
    delete(s.Players, id)
}

Player is defined in a way that its “zero value” is a sensible default. State is almost but not quite as simple, hence the New (it contains a map which we’d like to just use, but nil maps don’t work that way). It starts in the lobby state and supports arbitrarily adding and removing players. You could use a slice for storing players but then you will either have to handle nil “holes” in the slice, or fiddly logic with reslicing. Just use a map (it’s a hackathon). I might use a slice with nils next time.

Serving a game

The next step needs you to settle on the communication between the game client and the server. We chose what seemed like the easy thing which was to send JSON messages over TCP. Typical “serious” game servers often implement a compact binary protocol over UDP. It’s straightforward to do JSON and TCP in Go, and you can see how I did it this time in server.go. The example TCP listener in the net package documentation is a nice starting point. However, some notes.

Firstly, an infinite loop will… infinitely loop, blocking its goroutine indefinitely. Since this binary is also running a web server, one of the two has to be executed in a new goroutine. (Best practice is to have the goroutine created only in the main func so it’s obvious what its lifetime is, but this is a hackathon.)

Secondly, unless the server is synchronous (in the sense that there are no messages sent from the server that aren’t in response to something from the client), you need 2 goroutines per connection: one for receiving and one for sending. One client can affect all the other clients, so this was needed. We planned for the server to just spam the clients with state objects as it pleases.

Thirdly, it is very important that goroutine leaks are avoided: they might be lightweight but they consume memory and CPU cycles, after all. Contexts are great for this, especially in larger server projects. Here they serve the purpose of keeping the sending and receiving goroutines organised. When the context is cancelled, the connection can be closed and both goroutines can end. Additionally, the context can hold some per-player state. For a while I was using the context to hold the player ID, but instead went for an explicit parameter. Instead of a cancellable context, it is pretty much equivalent to give it a “quit” channel that gets closed for cancellation, but I use contexts all the time at work and didn’t bother to think about it much (it’s a hackathon).
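Here is a compact sketch of how those three notes fit together. It is not a copy of server.go: serve, handleConn, handleInbound and demo are names chosen for this example, and the outbound side is a stub that sends a single greeting instead of game state. The accept loop runs in its own goroutine, each connection gets a cancellable context, and closing the connection on cancellation is what unblocks the reader.

```go
package main

import (
	"bufio"
	"context"
	"fmt"
	"net"
	"sync"
)

// serve accepts connections until the listener is closed.
func serve(ctx context.Context, ln net.Listener, wg *sync.WaitGroup) {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return // listener closed
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			handleConn(ctx, conn)
		}()
	}
}

// handleConn owns one client connection and returns once the client
// disconnects or the parent context is cancelled.
func handleConn(parent context.Context, conn net.Conn) {
	ctx, cancel := context.WithCancel(parent)
	defer cancel() // ends the sender when the receiver returns

	// Closing the connection unblocks any pending Read or Write.
	go func() {
		<-ctx.Done()
		conn.Close()
	}()

	go handleOutbound(ctx, conn)
	handleInbound(conn)
}

// handleInbound reads newline-delimited messages until the connection fails.
func handleInbound(conn net.Conn) {
	sc := bufio.NewScanner(conn)
	for sc.Scan() {
		fmt.Printf("received: %s\n", sc.Text())
	}
}

// handleOutbound is a stub: it greets the client, then waits for cancellation.
// A real sender would loop here, writing state updates.
func handleOutbound(ctx context.Context, conn net.Conn) {
	fmt.Fprintln(conn, "hello")
	<-ctx.Done()
}

// demo connects one client to a listener on an ephemeral port and reports
// the greeting it received and whether the handler finished cleanly.
func demo() (string, bool) {
	ln, err := net.Listen("tcp", "localhost:0")
	if err != nil {
		return "", false
	}
	defer ln.Close()

	var wg sync.WaitGroup
	go serve(context.Background(), ln, &wg)

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return "", false
	}
	fmt.Fprintln(client, "{}")
	greeting, _ := bufio.NewReader(client).ReadString('\n')
	client.Close()
	wg.Wait() // returns only if handleConn noticed the disconnect
	return greeting, true
}

func main() {
	greeting, ok := demo()
	fmt.Printf("greeting=%q finished=%v\n", greeting, ok)
}
```

The little goroutine that waits on the context and closes the connection is the important bit: a blocked Read does not notice a cancelled context on its own.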

Notifying all the clients

The goroutine handling outbound data (the imaginatively-named handleOutbound) is notified by a channel closing when it is time to transmit the game state, but this bears a little closer examination since there’s a great time-saving upside to this: there is no need to implement a registry of things to send notifications to, Go can handle it.

Firstly, remember that every receive on a closed channel finishes straight away and gets the zero value (this holds for buffered channels too, once any buffered values are drained). The outbound handler is an infinite loop around a select waiting on that channel or on the context. Here’s the part in State:

type State struct {
    // ... snip ...
    changedNote chan struct{}
}

func (s *State) Changed() <-chan struct{} {
	s.RLock()
	defer s.RUnlock()
	return s.changedNote
}

func (s *State) notify() {
	close(s.changedNote)
	s.changedNote = make(chan struct{})
}

Every time Changed is called it returns a channel whose sole purpose in life is to be closed in the future by notify. When the state changes (and it should only be changed by methods that do the correct locking), notify is called, closing the current channel and replacing it with a new one. (Calls to notify need to be guarded by the mutex.) Anything that is interested in the state can then just call Changed, and proceed once the returned channel is closed.
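To convince yourself that one close really does wake everybody, here is a self-contained sketch. The Tick method and the wakeAll helper are invented for this example (the real state changes are game actions), and notify is inlined into Tick so the locking is visible in one place.

```go
package main

import (
	"fmt"
	"sync"
)

type State struct {
	sync.RWMutex
	clock       int
	changedNote chan struct{}
}

// New creates the first notification channel up front: closing a nil
// channel panics, so the zero value of State would not be safe here.
func New() *State {
	return &State{changedNote: make(chan struct{})}
}

// Changed returns a channel that is closed at the next state change.
func (s *State) Changed() <-chan struct{} {
	s.RLock()
	defer s.RUnlock()
	return s.changedNote
}

// Tick is a hypothetical mutation. It changes the state and notifies
// every waiter, all under the write lock.
func (s *State) Tick() {
	s.Lock()
	defer s.Unlock()
	s.clock++
	close(s.changedNote)
	s.changedNote = make(chan struct{})
}

// wakeAll starts n watchers, changes the state once, and returns how many
// watchers were woken by that single change.
func wakeAll(n int) int {
	s := New()
	var ready, done sync.WaitGroup
	woken := make(chan int, n)
	for i := 0; i < n; i++ {
		ready.Add(1)
		done.Add(1)
		go func(id int) {
			defer done.Done()
			ch := s.Changed() // grab the channel before the change happens
			ready.Done()
			<-ch // blocks until Tick closes it
			woken <- id
		}(i)
	}
	ready.Wait()
	s.Tick()
	done.Wait()
	return len(woken)
}

func main() {
	fmt.Println("watchers woken by one change:", wakeAll(5))
}
```

Nobody registers or unregisters anywhere: a watcher that goes away simply stops calling Changed.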

However, it’s a hackathon - why not do something really cheap and use a timer to spam updates every second (or so)?

func (s *Server) handleOutbound(conn net.Conn) {
    for range time.Tick(time.Second) {
        s.state.Dump(conn)
    }
}

This is fine, as long as steps are taken to avoid the goroutine leak (hint: it should end when the context is done, which means selecting on both the ticker and <-ctx.Done()). But it didn’t occur to me to do this at the time. Using channel-closing as a snappy notification system for arbitrarily many clients is a technique I’ve used a lot, and feels very natural in Go.
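Spelling out that hint, a leak-free version of the timer approach might look like the sketch below. spamUpdates and demo are hypothetical names, and the Fprintln line stands in for the s.state.Dump(conn) call above; it writes to any io.Writer so it can be tried without a network connection.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"time"
)

// spamUpdates writes an update to w on every tick until ctx is done,
// then returns how many updates it sent.
func spamUpdates(ctx context.Context, w io.Writer, every time.Duration) int {
	t := time.NewTicker(every)
	defer t.Stop() // release the ticker when we return
	sent := 0
	for {
		select {
		case <-ctx.Done():
			return sent
		case <-t.C:
			fmt.Fprintln(w, "state goes here") // stand-in for s.state.Dump(w)
			sent++
		}
	}
}

// demo runs the sender for a short while and reports how many updates
// went out before the context expired.
func demo() int {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()
	return spamUpdates(ctx, io.Discard, 10*time.Millisecond)
}

func main() {
	fmt.Println("updates sent before cancellation:", demo())
}
```

The fact that demo returns at all is the point: once the context is done, the goroutine running the loop has somewhere to go.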

One thing that’s important for development speed is to give the client developer (Jon) a simple message to send the server that does nothing, to ensure communication works from the client side without much effort. Here’s Action:

type Act int

const (
	ActNoOp      Act = iota
	ActStartGame
	ActPlayCard
	ActDiscard
	ActReturnToLobby
)

type Action struct {
	Act  Act `json:"act"`
	Card int `json:"card"`
}

The zero value for Action has Act = ActNoOp, so sending the empty JSON object {} works as a no-op message. This also helps with manual testing of the server: you can netcat/telnet into it and type in {} (or real actions as JSON).
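A quick way to check the no-op claim without a server is to run the decoder over a couple of lines by hand. The decode helper is just for this sketch; the types are the ones above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

type Act int

const (
	ActNoOp Act = iota
	ActStartGame
	ActPlayCard
	ActDiscard
	ActReturnToLobby
)

type Action struct {
	Act  Act `json:"act"`
	Card int `json:"card"`
}

// decode parses one JSON message the way a connection handler might.
func decode(line string) (Action, error) {
	var a Action
	err := json.NewDecoder(strings.NewReader(line)).Decode(&a)
	return a, err
}

func main() {
	for _, line := range []string{`{}`, `{"act":2,"card":3}`} {
		a, err := decode(line)
		fmt.Printf("%s -> %+v (err: %v)\n", line, a, err)
	}
}
```

Fields that are missing from the JSON keep their zero values, which is why {} comes out as ActNoOp with card 0.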

Unit testing

Test what it does, not how it does it.

It’s a hackathon: if you don’t have time, don’t bother with unit tests, and just test manually. However I can hardly live without at least one or two unit tests. By writing a unit test against your actual API you force yourself to understand some implications of the API design.

Go doesn’t come with a mocking framework. You don’t need one. Run and test the actual server:

func TestGame(t *testing.T) {
	s := server{}
	r := &response{}
	if err := s.listenAndServe("localhost:0"); err != nil {
		t.Fatalf("Couldn't start: %v", err)
	}
	defer s.Close()

	// Connect player 0
	conn0, err := net.Dial("tcp", s.Addr().String())
	if err != nil {
		t.Fatalf("Couldn't connect: %v", err)
	}
	defer conn0.Close()

	send0 := json.NewEncoder(conn0)
	recv0 := json.NewDecoder(conn0)

	// ... snip...

	// Play a game!
	for i, p := range actions {
		// Send an action as one of the players.
		// Check each player receives the state.
		// Check the state against the desired state.
	}
}

All that really has to be faked is the card deck: you don’t want a flaky test because your virtual players got dud hands. So the game state uses a deck that you give it satisfying a Deck interface, and it has two implementations: the real deck which can be Shuffled, and a fake RiggedDeck which is the same but calling Shuffle has no effect.
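As a sketch of that arrangement: the Deck interface and RiggedDeck names come from the description above, but the Draw method, the RealDeck name, the int card type and the deal helper are my guesses for the sake of a runnable example, not the actual code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Deck is what the game state depends on.
type Deck interface {
	Shuffle()
	Draw() (card int, ok bool)
}

// RealDeck shuffles for real.
type RealDeck struct {
	cards []int
}

func (d *RealDeck) Shuffle() {
	rand.Shuffle(len(d.cards), func(i, j int) {
		d.cards[i], d.cards[j] = d.cards[j], d.cards[i]
	})
}

// Draw removes and returns the top card, or ok=false if the deck is empty.
func (d *RealDeck) Draw() (int, bool) {
	if len(d.cards) == 0 {
		return 0, false
	}
	c := d.cards[0]
	d.cards = d.cards[1:]
	return c, true
}

// RiggedDeck is the same deck, except Shuffle leaves the order alone,
// so a test always knows which cards come next.
type RiggedDeck struct {
	RealDeck
}

func (d *RiggedDeck) Shuffle() {}

// deal shuffles d and draws up to n cards from it.
func deal(d Deck, n int) []int {
	d.Shuffle()
	hand := make([]int, 0, n)
	for i := 0; i < n; i++ {
		c, ok := d.Draw()
		if !ok {
			break
		}
		hand = append(hand, c)
	}
	return hand
}

func main() {
	rigged := &RiggedDeck{RealDeck{cards: []int{3, 1, 2}}}
	fmt.Println(deal(rigged, 2)) // the rigged deck deals in the order given
}
```

Embedding the real deck keeps the fake honest: only Shuffle differs, so everything else the test exercises is production code.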

Adding the, y’know, game

Adding the game logic isn’t all that interesting: with the above in place, it would be possible to make it into almost any kind of multiplayer game (with tweaks). The biggest tweak would be reducing the outbound data to only partial state updates, but that adds unnecessary complexity (it’s a hackathon).

Implementing the rules of the game is a lot like careful state bookkeeping. Having clear delineation of states helps a lot. There are explicit and implicit actions, e.g. player 1 plays card 3, versus player 2 has disconnected. It is important that concurrent actions don’t corrupt one another - use mutexes and the race detector (go test -race ...). It’s also important to think about what states could be “black holes” (often related to implicit actions). For example, if someone disconnects during a game, then the game shouldn’t wait for them to play. Or another example: if a player successfully connecting requires the game to be in the lobby state, and all the players disconnect, then the state should reset to the lobby state so people can rejoin.
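Both "black hole" examples can be handled in the method that removes a player. The sketch below is illustrative only: it assumes turns go in ascending player ID order and wrap around, which may not match our actual rules, and nextAfter is a helper invented for the example.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type Statum int

const (
	StateLobby Statum = iota
	StateInGame
	StateGameOver
)

type Player struct {
	Score int
}

type State struct {
	sync.RWMutex
	State     Statum
	Players   map[int]*Player
	WhoseTurn int
}

// RemovePlayer handles the implicit "player disconnected" action.
func (s *State) RemovePlayer(id int) {
	s.Lock()
	defer s.Unlock()
	delete(s.Players, id)
	switch {
	case len(s.Players) == 0:
		// Nobody left: reset so that new players can join again.
		s.State = StateLobby
		s.WhoseTurn = 0
	case s.State == StateInGame && s.WhoseTurn == id:
		// Don't wait on a player who has gone away.
		s.WhoseTurn = s.nextAfter(id)
	}
}

// nextAfter returns the lowest remaining player ID above id, wrapping
// around to the lowest ID overall. Callers must hold the lock and ensure
// at least one player remains.
func (s *State) nextAfter(id int) int {
	ids := make([]int, 0, len(s.Players))
	for pid := range s.Players {
		ids = append(ids, pid)
	}
	sort.Ints(ids)
	for _, pid := range ids {
		if pid > id {
			return pid
		}
	}
	return ids[0]
}

func main() {
	s := &State{
		State:     StateInGame,
		Players:   map[int]*Player{0: {}, 1: {}, 2: {}},
		WhoseTurn: 1,
	}
	s.RemovePlayer(1)
	fmt.Println("turn passes to player", s.WhoseTurn)
}
```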

Fortunately it’s a hackathon, so it’s allowed to have bugs galore, but I don’t think there are many. 😜

The game is data-driven (a bunch of historical data is churned into game cards). Data wrangling was done by Seb and Tim. We agreed early on that the data should be a JSON-formatted array of objects, which is not hard for them to encode and the server to decode. Loading the data is straightforward (hello my old friend json.NewDecoder) but the magic is in creating the cards out of them. (A few loops though.)
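The loading half really is only a few lines. In this sketch the record type and its fields are placeholders (the real records have their own shape, which I am not reproducing here), and the sample input is made up purely to have something to decode.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// record is a hypothetical shape for one entry in the data file.
type record struct {
	Name string `json:"name"`
	Born int    `json:"born"`
}

// loadRecords decodes a JSON array of objects from r.
func loadRecords(r io.Reader) ([]record, error) {
	var recs []record
	if err := json.NewDecoder(r).Decode(&recs); err != nil {
		return nil, fmt.Errorf("decoding records: %w", err)
	}
	return recs, nil
}

// loadString is a convenience wrapper for trying things out; the server
// would pass an opened file to loadRecords instead.
func loadString(s string) ([]record, error) {
	return loadRecords(strings.NewReader(s))
}

func main() {
	// Placeholder data, not taken from the real dataset.
	recs, err := loadString(`[{"name":"Example Person","born":1800},{"name":"Another Example","born":1810}]`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("loaded", len(recs), "records")
}
```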

Conclusion

It’s a lot of fun working with such talented people in a hackathon environment. Making a good game server involves a mix of planning, technique, experience, design, teamwork, and communication. I’m once again looking forward to GovHack next year!

Exit, WordPress

I moved this blog out of WordPress. It’s now a static site generated with Hugo. Here’s what I did.

The old setup

Previously, this blog was hosted on DreamHost (both the domain registration and hosting). Due to the remarkable quirk of me being a cheapskate while I was at uni, I bought the domain myself but one of Paris or Jon (I can’t remember who now) did the hosting, because they had hosting. DreamHost was the domain registrar and provided the box that the A record pointed to.

DreamHost makes this easy…too easy. To host content for a domain on DreamHost, if you have purchased hosting in some form, all you have to do is enter the domain name, and be the first person to do that. Then they can set it all up. They get the options all the way down to and including the DNS settings, but not to renew or cancel the domain registration (unless they registered the domain too). If someone else has a hosting account (say, the person who registered the domain) and they want to take over hosting, one of two things has to happen:

  1. The person who is currently hosting has to remove the hosting in their account, or,
  2. The person who registered the domain files a ticket with DreamHost support to get the hosting cut over to their account.

So under the old setup, I registered the domain, and one of my friends kindly and graciously provided the hosting, which is the more expensive part, and for that I’m really grateful.

The hosting itself was a typical WordPress setup. The server ran Linux, the web server was Apache with PHP 5, and MySQL was the database backend.

Benefits of this old setup include:

  • Standard! So freaking standard that helpful articles just ooze out of the internets.
  • WordPress is a dynamic blogging platform, so you can get pretty complicated with content generated out of the database.
  • WordPress has themes, and auto resizes photos, and and and…
  • WordPress supports plugins. I have various bits of math floating around and having a plugin that renders LaTeX in actual symbols is really nice!
  • Comments. I no longer consider this a benefit, but it was nice for the first 5 years.
  • WordPress has mobile apps that you can use to edit your blog. I only did this a couple of times but it was a cute touch.

Problems with the old setup

Problem #1: Comment spam. A mere 4 months after my first blog post, I wrote a blog post about spammers.

Until I migrated off WordPress yesterday, I used Akismet to do automated spam filtering. This requires setting up a WordPress account and getting an API key, then you put the API key into the Akismet WordPress plugin and you’re set. Every now and again you review some spam comments but the volume of spam is much lower.

But nobody really commented much on my blog. For that reason, and also because there exist people I don’t want to hear from, I disabled the comment form on posts and pages. Sadly this wasn’t enough. For some reason I don’t care to understand (because the old way is dead now), comments would still appear in the moderation queue.

Problem #2: Security vulnerabilities. Old PHP had ‘em. Old WordPress had ‘em. WordPress plugins get ‘em. The database password was crap. The attack surface of the old WordPress blog is pretty big, but the value of the target was small. (My blog isn’t that interesting.) This is kind of a deal with any dynamically-generated content, compared with static sites.

I know that there were vulnerabilities and they were abused. I had shell access to the server, and a few times found various PHP files that DreamHost automatically blocked either by setting the file perms to 0000, or moving them into “.INFECTED” files. Just how badly pwnd my old blog was in the end, I’ll never be sure. But it was pwnd.

Security means security updates. It was a chore signing in to click the “Update everything” button. It’s more of a chore doing the recommended file and database backup before WordPress updates. It’s not a large burden, but it’s a burden, and my feelings on automating and getting rid of toil like this blossomed as a result of working for Google now.

Problem #3: The hosting situation. Because I’m a Googler now, I don’t want to be a cheapskate. I have the privilege of having a good income. I’m sure my friends get good feelings from being generous. Again, thank you guys. It was greatly appreciated. However! I can host things myself now.

The new setup

The new setup works like this.

  • Domain registrar: DreamHost (still).
  • Hosting option: Redirect (HTTP 301 redirect) joshdeprez.com to www.joshdeprez.com, on my own DreamHost account.
  • Custom DNS option: www (in the joshdeprez.com zone) is a CNAME for c.storage.googleapis.com. So the files are hosted by Google Cloud Storage.
  • The files for the site are generated with Hugo.
  • The source code for the site is written in Markdown, which Hugo converts into HTML according to various Go HTML templates and layout files.

The benefits of this approach:

  • It’s my own DreamHost account, and Google Cloud account.
  • It’s static, so the content is highly cacheable and can be served from Google’s CDN.
  • It’s static, so there’s no database or PHP.
  • It’s static, so there’s no comments at all. I could use a plugin like Disqus later on if I decide I really want comments (I really do not want comments).
  • It’s static, so entire classes of web vulnerabilities don’t happen.
  • It’s static, so I compose content offline, use git to version control the whole thing, and upload when I’m happy with it.

The drawbacks:

  • The 301 redirect to www is an annoying but necessary part of using a CNAME to use transparent hosting on a different domain. Why can’t I point the A record at Google Cloud Storage? Because GCS uses DNS load balancing, and the IP address (the A record target) would change depending on location, load, the phase of the moon, etc.

Erm. That’s about it for drawbacks, actually.

Migration

Migration was a bit of a pain, but I prevailed.

There exists a WordPress to Hugo Exporter. It is a WordPress plugin. You push the button and then you download a zip file containing static content and all your pages and posts helpfully converted to Markdown for you. I used this.

What I was unable to do was run it on my live blog. I tried and it failed. When you push the Export button, it scans the site, building an archive in /tmp on the server. The content for my old blog was over 1 GB, but /tmp didn’t have that much space, so it crashed and failed.

I solved this problem by:

  1. Running up Debian 8 in a VM at home, giving it stacks of disk and RAM;
  2. Installing the standard LAMP stack from the Debian repos;
  3. Packing all the WordPress files from my live site, and a full database dump, into a tarball that I downloaded;
  4. Unpacking the tarball in my Debian VM;
  5. Reconfiguring WordPress in the VM enough to get it working;
  6. Running the exporter locally.

The exported files worked pretty well in Hugo, but I wanted to make it really sing. So began the editing process.

Most important was the look and feel. There is a Twenty Fourteen theme for Hugo, which is an adaptation of the WordPress theme of the same name. It’s the theme I used on my old blog, and it’s kind of nice, so I kept it for the new blog.

It was easy enough to implement the theme (git clone the theme into the themes directory). Some adjustments later (such as the summary view), and it was pretty good.

Most of the editing effort was reorganising the content. I wanted to migrate off the wp-content/uploads structure that was unhelpfully preserved by the exporter. So I went through all the posts one by one and found the bits that were actually used, and moved them into one of two places:

  1. For “galleries” of photos that I dumped at the end of blog posts, I uploaded them to Google Photos and pasted the share-link.
  2. For photos interspersed with text in blog posts, I copied them into /static/$postnumber and then rewrote the autogenerated <figure> tags with the figure shortcodes provided by the theme.

Rewriting <figure> to shortcodes and fixing the paths got really boring, really fast. So I gave up and wrote a short Go program to do it for me.
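I no longer have that program to hand, but something in the spirit of it might look like the sketch below. Treat every detail as an assumption: the shape of the exported <figure> markup, the use of Hugo's built-in figure shortcode syntax with src and caption (the theme's shortcode may take different parameters), and the rewriteFigures name. It is regexp-on-HTML hackery that is only reasonable for a one-off migration.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

var (
	figureRE  = regexp.MustCompile(`(?s)<figure\b.*?</figure>`)
	srcRE     = regexp.MustCompile(`\bsrc="([^"]*)"`)
	captionRE = regexp.MustCompile(`(?s)<figcaption[^>]*>(.*?)</figcaption>`)
)

// rewriteFigures replaces each <figure> element in post with a figure
// shortcode whose image path points into dir (e.g. "/42" for post 42).
// Figures without a src attribute are left untouched.
func rewriteFigures(post, dir string) string {
	return figureRE.ReplaceAllStringFunc(post, func(fig string) string {
		m := srcRE.FindStringSubmatch(fig)
		if m == nil {
			return fig
		}
		out := fmt.Sprintf(`{{< figure src="%s/%s"`, dir, path.Base(m[1]))
		if c := captionRE.FindStringSubmatch(fig); c != nil {
			// Captions containing double quotes would need escaping.
			out += fmt.Sprintf(` caption="%s"`, strings.TrimSpace(c[1]))
		}
		return out + ` >}}`
	})
}

func main() {
	in := `Before. <figure><img src="/wp-content/uploads/2015/03/cat.jpg" alt=""/><figcaption>A cat</figcaption></figure> After.`
	fmt.Println(rewriteFigures(in, "/42"))
}
```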

Much tweaking later (deleting autogenerated HTML weirdness, replacing formatting with Markdown equivalents, tidying filenames, fixing brokenness in the theme layouts…), I was happy with the output from Hugo (what you see on this site now!) so I:

  • filed a DreamHost ticket to move hosting to my account,
  • gsutil -m rsync-ed the site up to the GCS bucket,
  • set the hosting options with the redirect and custom CNAME,
  • SSH’d into the old host and deleted all the old site files and dropped the database (local backup just in case!) and…

Everything worked!

Time for dinner!

Nines of nines

In the operations business we like to talk about nines of things, especially regarding service levels. If

  • “one nine of availability” = available 0.9 of the time,
  • “two nines of availability” = available 0.99 of the time,
  • and so on…

then generally,

  • “\(n\) nines of availability” = available \((1 - 10^{-n})\) of the time,

right?

This works for any whole number n: e.g. 5 nines is $$\begin{align}1 - 10^{-5} &= 1 - 0.00001 \\ &= 0.99999.\end{align}$$

There’s a problem with this simple generalisation, and that is, when people say “three and a half nines” the number they actually mean doesn’t fit the pattern. “Three and a half nines” means 0.9995, but

  • \(1 - 10^{-3.5} \approx 0.9996838\), and going the other way,
  • \(0.9995 \approx 1 - 10^{-3.30103}\).

We could resolve this difficulty by saying “3.3ish nines” when we mean 0.9995, or by meaning ~0.9996838 when we say “three and a half nines.” But there’s at least one function that fits the half-nines points as well!

Let’s start with the function above: $$f(n) = 1 - 10^{-n}.$$ At every half-odd integer (\(n = 0.5, 1.5, 2.5, \ldots\)), it just has to be lower by a small, correspondingly decreasing amount. We can do this by increasing the exponent of 10 by $$\begin{align}k &= 0.5 + \log_{10}(0.5) \\ &\approx 0.19897.\end{align}$$
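In case that constant looks like it fell out of the sky, here is a short derivation. The only notation I am adding is \(m\), a whole number, so that the half-nines point is \(n = m + \tfrac{1}{2}\) and the value people mean there is \(1 - 5 \times 10^{-(m+1)}\) (for \(m = 3\), that is 0.9995).

$$\begin{align}1 - 10^{-(m + \frac{1}{2}) + k} &= 1 - 5 \times 10^{-(m+1)} \\ -\left(m + \tfrac{1}{2}\right) + k &= \log_{10} 5 - (m + 1) \\ k &= \log_{10} 5 - \tfrac{1}{2} \\ &= (1 - \log_{10} 2) - \tfrac{1}{2} \\ &= \tfrac{1}{2} + \log_{10}(0.5) \approx 0.19897.\end{align}$$

Note that \(m\) cancels, so the same \(k\) works at every half-nines point.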

One function for introducing a perturbation at half-odd integers is $$p(n) = \sin^2(\pi n).$$ When \(n\) is an integer, \(p(n) = 0\), and when \(n\) is half an odd integer, \(p(n) = 1\). Multiply this function by some constant and you’re in business.

Thus, define a new function \(g(n)\) for all \(n\):

$$g(n) := 1 - 10^{-n + k p(n)}$$

i.e.

$$g(n) = 1 - 10^{-n + (0.5 + \log_{10}(0.5))\sin^2(\pi n)}$$

which, when plotted, looks like this:

a negative exponential curve with a negative exponential wiggle. And it has the desired property that at every integer and half-integer it has a value with the traditional number of nines and trailing five (or not).
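If you would rather check this numerically than trust the plot, a few lines of Go will do it. This is just a direct transcription of \(g(n)\) as defined above; the only liberty is computing \(\sin^2\) by squaring.

```go
package main

import (
	"fmt"
	"math"
)

// k is the exponent bump applied at half-odd integers.
var k = 0.5 + math.Log10(0.5)

// g returns the availability for n nines, where n may be a whole number
// or a whole number plus a half.
func g(n float64) float64 {
	s := math.Sin(math.Pi * n)
	return 1 - math.Pow(10, -n+k*s*s)
}

func main() {
	for _, n := range []float64{1, 1.5, 2, 2.5, 3, 3.5} {
		fmt.Printf("g(%.1f) = %.6f\n", n, g(n))
	}
}
```

Up to floating-point rounding, the values come out as 0.9, 0.95, 0.99, 0.995, 0.999 and 0.9995.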