Steinar H. Gunderson

Tue, 30 Oct 2018 - SAT solvers for fun and fairness

Trøndisk 2018, the first round of the Norwegian ultimate series (the frisbee sport, not the fighting style) is coming up this weekend! Normally that would mean that I would blog something about all the new and exciting things we are doing with Nageru for the stream, but for now, I will just point out that the stream is on plastkast.no and will be live from 0945–1830 CET on Saturday (group stage) and 1020–1450 (playoffs) on Sunday.

Instead, I wanted to talk about a completely different but interesting subproblem we had to solve; how do you set up a good schedule for the group stages? There are twelve teams, pre-seeded and split into two groups (call them A0–A5 and B0–B5) that are to play round-robin, but there are only two fields—and only one of them is streamed. You want a setup that maximizes fairness in the sense that people get adequate rest between matches, and also more or less equal number of streamed games. Throw in that one normally wants the more exciting games last, and it starts to get really tricky to make something good by hand. Could we do it programmatically?

My first thought was that since this is all about the ordering, it sounded like a variant of the infamous travelling salesman problem. It's well-known that TSP is NP-hard (or NP-complete, but I won't bother with the details), but there are excellent heursitic implementations in practice. In particular, I had already used OR-Tools, Google's optimization toolkit, to solve TSP problems in the past; it contains a TSP solver that can deal with all sorts of extra details, like multiple agents to travel (in our case, multiple fields), subconstraints on ordering and so on. (OR-Tools probably doesn't contain the best TSP solver in the world—there are specialized packages that do even better—but it's much better than anything I could throw together myself.)

However, as I tried figuring out something, and couldn't quite get it to fit (there are so many extra nonlocal constraints), I saw that the OR-Tools documentation had a subsection on scheduling problems. It turns out this kind of scheduling can be represented as a so-called SAT (satisfiability) problem, and OR-Tools also has a SAT solver. (SAT, in its general forms, is also NP-hard, but again, there are great heuristics.) I chose the Python frontend, which probably wasn't the best idea in the world (it's poorly documented, and I do wonder when Python will take the step into the 90s and make spelling errors in variables into compile-time errors instead of throwing a runtime exception four hours into a calculation), but that's what the documentation used, and the backend is in C++ anyway, so speed doesn't matter.

The SAT solver works by declaring variables and various constraints between them, and then asking the machine to either come up with a solution that fits, or to prove that it's not possible. Let's have a look of some excerpts to get a feel for how it all works:

We know we have 15 rounds, two fields on each, and every field should contain a match. So let's generate 30 such variables, each containing a match number (we use the convention that match 0, 2, 4, 6, etc. are on the stream field and 1, 3, 5, 7, etc. are played in parallel on the other field):

matchnums = []
for match_idx in range(num_matches):
  matchnums.append(model.NewIntVar(0, num_matches - 1, "matchnum%d" % (match_idx))) 

So this is 30 variables, and each go from 0 to 29, inclusive. We start a fairly obvious constraint; we can only play each match once:

model.AddAllDifferent(matchnums)

The SAT solver might make this into a bunch of special constraints underneath, or it might not. We don't care; it's abstracted away for us.

Now, it's not enough to just find any ordering—after all, we want to find an ordering with some constraints. However, the constraints are rarely about the match numbers, but more about the teams that play in those matches. So we'll need some helper variables. For instance, it would be interesting to know which teams play in each match:

home_teams = []
away_teams = []
for match_idx in range(num_matches):
  home_teams.append(model.NewIntVar(0, num_teams - 1, "home_team_match%i" % (match_idx)))
  away_teams.append(model.NewIntVar(0, num_teams - 1, "away_team_match%i" % (match_idx)))
  model.AddElement(matchnums[match_idx], home_teams_for_match_num, home_teams[match_idx])
  model.AddElement(matchnums[match_idx], away_teams_for_match_num, away_teams[match_idx])

AddElement() here simply is an indexing operation; since there's no difference between home and away teams for us, we've just pregenerated all the matches as A0 vs. A1, A0 vs. A2, etc. up until A4 vs. A6, A5 vs. A6 and then similarly for the other gruop. The “element” constraint makes sure that e.g. home_team_match0 = home_teams_for_match_num[matchnum0]. Note that even though I think of this is as an assignment where the home team for match 0 follows logically from which match is being played as match 0, it is a constraint that goes both ways; the solver is free to do inference that way, or instead first pick the home team and then deal with the consequences for the match number. (E.g., if it picks A5 as the home team, the match number most certainly needs to be 14, which corresponds to A5–A6.)

We're not quite done with the helpers yet; we want to explode these variables into booleans:

home_team_in_match_x_is_y = [[
  model.NewBoolVar('home_team_in_match_%d_is_%d' % (match_idx, team_idx)) for team_idx in range(num_teams)
] for match_idx in range(num_matches)]
for match_idx in range(num_matches):
  model.AddMapDomain(matchnums[match_idx], match_x_has_num_y[match_idx])

and similarly for away team and match number.

So now we have a bunch of variables of the type “is the home team in match 6 A4 or not?”. Finally we can make some interesting constraints! For instance, we've decided already that the group finals (A0–A1 and B0–B1) should be the last two matches of the day, and on the stream field:

model.AddBoolOr([match_x_has_num_y[28][0], match_x_has_num_y[28][15]])
model.AddBoolOr([match_x_has_num_y[26][0], match_x_has_num_y[26][15]])

This is a hard constraint; we don't have a solution unless match 0 and match 15 are the last two (and we earlier said that they must be different).

We're going to need even more helper variables now. It's useful to know whether a team is playing at all in a given round; that's the case if they are the home or away team on either field:

plays_in_round = {}
for team_idx in range(num_teams):
  plays_in_round[team_idx] = {}
  for round_idx in range(num_rounds):
    plays_in_round[team_idx][round_idx] = model.NewBoolVar('plays_in_round_t%d_r%d' % (team_idx, round_idx))
    model.AddMaxEquality(plays_in_round[team_idx][round_idx], [
        home_team_in_match_x_is_y[round_idx * 2 + 0][team_idx],
        home_team_in_match_x_is_y[round_idx * 2 + 1][team_idx],
        away_team_in_match_x_is_y[round_idx * 2 + 0][team_idx],
        away_team_in_match_x_is_y[round_idx * 2 + 1][team_idx]])

Now we can establish a few other very desirable properties; in particular, each team should never need to play two matches back-to-back:

for round_idx in range(num_rounds - 1):
  for team_idx in range(num_teams):
    model.AddBoolOr([plays_in_round[team_idx][round_idx].Not(), plays_in_round[team_idx][round_idx + 1].Not()])

Note that there's nothing here that says the same team can't be assigned to play on both fields at the same time! However, this is taken care of by some constraints on the scheduling that I'm not showing for brevity (in particular, we established that each round must have exactly one game from group A and one from group B).

Now we're starting to get out of the “hard constraint” territory and more into things that would be nice. For this, we need objectives. One such objective is what I call ”tiredness”; playing matches nearly back-to-back (ie., game - rest - game) should have a penalty, and the solution should try to avoid it.

tired_matches = []
for round_idx in range(num_rounds - 2):
  for team_idx in range(num_teams):
    tired = model.NewBoolVar('team_%d_is_tired_in_round_%d' % (team_idx, round_idx))
    model.AddMinEquality(tired, [plays_in_round[team_idx][round_idx], plays_in_round[team_idx][round_idx + 2]])
    tired_matches.append(tired)
sum_tiredness = sum(tired_matches)

So here we have helper variables that are being set to the minimum (effectively a logical AND) of “do I play in round N” and “do I play in round N + 2”. Tiredness is simply a sum of those 0–1 variables, which we can seek to minimize:

model.Minimize(sum_tiredness)

You may wonder how we went from a satisfiability problem to an optimization problem. Conceptually, however, this isn't so hard. Just ask the solver to find any solution, e.g. something with sum_tiredness 20. Then simply add a new constraint saying sum_tiredness <= 19 and ask for a re-solve (or continue). Eventually, the solver will either come back with a better solution (in which case you can tighten the constraint further), or the message that you've asked for something impossible, in which case you know you have the optimal solution. (I have no idea whether modern SAT solvers actually work this way internally, but again, conceptually it's simple.)

As an extra bonus, you do get incrementally better solutions as you go. These problems are theoretically very hard—in fact, I let it run for fun for a week now, and it's still not found an optimal solution—and in practice, you just take some intermediate solution that is “good enough”. There are always constraints that you don't bother adding to the program anyway, so there's some eyeballing involved, but still feels like a more fair process than trying to nudge it by hand.

We had many more objectives, some of them contradictory (e.g., games between more closely seeded opponents are more “exciting”, and should be put last—but they should also be put on the stream, so do you put them early on the stream field or late on the non-stream field?). It's hard to weigh all the factors against each other, but in the end, I think we ended up with something pretty nice. Every team gets to play two or three times (out of five) on the stream, only one team needs to be “tired” twice (and I checked; if you ask for a hard maximum of once for every team, it comes back pretty fast as infeasible), many of the tight matches are scheduled near the end… and most importantly, we don't have to play the first matches while I'm still debugging the stream. :-)

You can see final schedule here. Good luck to everyone, and consider using a SAT solver next time you have a thorny scheduling problem!

[22:55] | | SAT solvers for fun and fairness

Steinar H. Gunderson <sgunderson@bigfoot.com>