Integer Programming (and Operations Research)

Wed, Dec 1, 2021 tags: [ rwth or programming julia ]

updated: 2021-12-27

Towards the end of my Bachelor’s program, I decided to broaden my horizon and expand my options – and subsequently enrolled not only for Master’s studies in Physics, but also in Economics. This is a special curriculum offered by my university for students of science and engineering.

Long story short: whereas some courses are quite, hm, boring – I’ve been amazed by the Operations Research I lecture (given by Prof. Lübbecke). In fact, I even chose Operations Research as the focus of studies for the economics degree. But what is Operations Research (OR) in the first place? Being very much a beginner in this field, I don’t dare give a good definition or even an overview of it. Generally, the field is concerned with optimization problems, often discrete ones, with applications in business, logistics, government, the military, and so on. Typical problems in OR are route planning (including the travelling salesperson problem, TSP), packing, scheduling, allocation of resources, and so on.

Very important tools of OR are Linear Programming (LP) and (Mixed) Integer Linear Programming ((M)ILP). I had heard about LP before, but never actually read more about it. Depending on where you read about it, there is a lot of theory involved – which I don’t want to focus on here. There are good books for that! Instead, I want to present a few typical problems that can be solved using these two tools; most of them adapted from the lecture, which treated us to a nice overview of scenarios.

The goal of using both LP and MILP is usually to solve a linear (the L in LP) optimization problem. The problem is always stated as an objective function, which is either to be minimized or maximized, plus a set of constraints, all of which depend linearly on a set of variables. The constraints are formulated from data about the given problem. There are also special objective types such as min-max objectives, which are used to, for example, minimize the largest X, where X could be a distance, duration, or any other quantity worth optimizing for (think: “minimize the longest distance between a hospital and any settlement”, which is different from “minimize the mean distance between a hospital and all settlements”).
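
As a small illustration (my own sketch, not from the lecture): a min-max objective stays linear by introducing an auxiliary variable $t$ that bounds all quantities from above,

$$\min t \qquad \text{s.t.} \quad d_s \le t \quad \forall s,$$

where $d_s$ would be the (linearly expressible) distance between settlement $s$ and its nearest hospital; minimizing $t$ then minimizes the largest of the $d_s$.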

The difference between LP and MILP is that Linear Programs use variables from the set of real numbers $\mathbb R$, while MILP requires at least one variable from the set of integers $\mathbb Z$ (or non-negative integers $\mathbb Z_+$). It may sound counterintuitive, but the restriction to integers actually makes MILP a lot more powerful (for many applications) than having to use real numbers! Essentially, combinatorial problems are hard or impossible to even express as LPs, but can be stated naturally as MILPs (mixed integer linear programs) – at the price of NP-hard worst-case complexity.
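
A tiny example (my own) of why integrality changes the game: consider

$$\max\; 5x_1 + 5x_2 \qquad \text{s.t.} \quad 6x_1 + 6x_2 \le 10, \qquad x_1, x_2 \in \lbrace 0, 1 \rbrace.$$

The LP relaxation ($0 \le x_i \le 1$) attains $x_1 = 1, x_2 = 2/3$ with value $\approx 8.33$, but rounding $x_2$ up violates the constraint; the true integer optimum packs a single item for a value of 5. In general, the relaxed solution can tell us very little about the combinatorial one.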

The goal of this short, non-exhaustive, hand-waving practical article is to show you how to start using LP and MILP to solve everyday problems, or maybe even simple problems in your work. I didn’t know of these tools before, and then discovered that they are actually fairly easy to use – I hope this text will help you do the same!

Intro: Nutrition Program

Before slam-dunking on why I am amazed by these two basic tools, I’d like to present a typical linear program used for introducing the basic concepts: the nutrition problem. It is quite a famous example, too, so following it here is a safe bet.

The problem is: we have a limited offering of foods – think of a global pandemic combined with broken trade contracts – specifically Bread, Meat, Cheese, and Apples.

| (per kg) | Bread ($i = 1$) | Meat ($i = 2$) | Cheese ($i = 3$) | Apples ($i = 4$) |
|---|---|---|---|---|
| Price $P_i$ | 4 | 17 | 20 | 4 |
| Carbs $\mathrm{carbs}_i$ | 600 g | 100 g | 50 g | 150 g |
| Fat $\mathrm{fat}_i$ | 10 g | 200 g | 400 g | 1 g |
| Proteins $\mathrm{prot}_i$ | 30 g | 300 g | 300 g | 50 g |

Additionally, we know the calorie content of macronutrients (roughly):

| Macronutrient | $\mathrm{cal}(\cdot)$ per kg |
|---|---|
| Carbohydrates | 4100 |
| Fat | 9000 |
| Proteins | 4000 |

From this we can also calculate the calories per kilogram of each food. Now we’d like to know what we need to buy in order to fulfill our 2000 recommended daily calories – of which 1/2 should come from carbs and 1/4 each from fat and proteins – while at the same time paying the least possible price.

We can formulate this as a linear program. We introduce the variables storing how much (weight in kg) we buy for each food:

$$x_i,\qquad 1 \le i \le 4$$

The objective function is the price we pay in total:

$$\min \sum_i x_i P_i$$

And now we just need to formulate our constraints. First, we want to eat at least 2000 calories! That means that the sum of calories per nutrient per kilogram times the weight of each product needs to be at least 2000:

$$\mathrm{totalcals} := \sum_i x_i \sum_{j\in\lbrace\mathrm{carbs},\,\mathrm{fat},\,\mathrm{prot}\rbrace} j_i \,\mathrm{cal}(j) \ge 2000,$$

where $j_i$ is the amount (in kg) of nutrient $j$ per kilogram of food $i$.

Also, we don’t want to only eat fat just because it’s cheap (for example). Thus, for each nutrient the sum over all products $i$ needs to be equal to the desired fraction of total calories:

$$\sum_i \mathrm{carbs}_i \, x_i \,\mathrm{cal}(\mathrm{carbs}) = \frac{1}{2}\, \mathrm{totalcals}$$

$$\sum_i \mathrm{fat}_i \, x_i \,\mathrm{cal}(\mathrm{fat}) = \frac{1}{4}\, \mathrm{totalcals}$$

$$\sum_i \mathrm{prot}_i \, x_i \,\mathrm{cal}(\mathrm{prot}) = \frac{1}{4}\, \mathrm{totalcals}$$

Finally, we shouldn’t forget that we can only buy positive quantities:

$$x_i \ge 0 \qquad \forall i \in 1..4$$

That’s the linear program representing the problem at hand. Geometrically, the feasible region is a polyhedron in $n$-dimensional space, and the whole program can be written compactly as

$$\min \lbrace a^\top x \mid Ax \le c, x \ge 0 \rbrace $$

where $a$ is a vector representing the prices (in this case), $x$ are the product weights (as above), and $A$ a matrix containing our conditions phrased above. I won’t translate this here – but as all constraints are linear, be assured that it is possible to phrase the problem elegantly as a vector and a matrix.

Intro 2: Integer program

One of the easiest problems to formulate as an integer program is the knapsack problem (packing a single bin; the related bin packing problem distributes items over several bins). Every computer science student knows it, and knows that solving it is NP-hard. For now, let us just write it down as a linear integer program.

Assuming we have $n$ items, where item $i$ has size $s_i$ and yields profit $p_i$, and a single bin of capacity $C$, the goal is to pack objects into the bin in a way that maximizes the profit: imagine a truck being packed with discrete goods, and we want to maximize the profit earned by driving the goods to their destinations.

The variable vector $x_i \in \lbrace 0, 1 \rbrace$ determines whether item $i$ is packed into the bin.

The objective is maximizing the profit: $$\max \sum_i p_i x_i$$

The objective is subject to (s.t.) fulfilling the only constraint (so far): the capacity of the bin may not be exceeded.

$$\sum_i x_i s_i \le C$$

What is the difference from the nutrition program above? The fact that $x_i$ can only be 0 or 1 changes everything! And this is not even exaggerated; it would be quite unsatisfying to try to write down this problem as an LP with only real variables.

To solve this naively, we would have to check each of the $2^n$ possible combinations to be sure of finding the best one – an overwhelming number – or use a greedy strategy, which works well in many cases but is not guaranteed to be optimal.
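
For comparison, a minimal sketch of such a greedy strategy (my own illustration; sizes, profits, and capacity as in the knapsack code further below):

# Greedy knapsack sketch: pack items by best profit/size ratio until full.
# Fast, but not guaranteed to be optimal.
function greedy_pack(sizes, profits, capacity)
    order = sortperm(profits ./ sizes, rev=true)
    packed, used = Int[], 0
    for i in order
        if used + sizes[i] <= capacity
            push!(packed, i)
            used += sizes[i]
        end
    end
    packed  # indices of packed items
end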

Solving problems

The next interesting question is: how can we solve problems like the ones presented? Fortunately, a lot of software for treating this kind of problem already exists, both proprietary and open source! And in most cases, it does quite a decent job, too.

What this means for you: Many tricky optimization problems can be formulated as linear (integer) programs and solved quite efficiently. This saves writing specific algorithms to pack a bin or calculate an optimal nutrition program.

I will not treat proprietary software packages here (such as Gurobi), and instead focus on the best combination I’ve encountered so far: Julia with the JuMP package (jump.dev). What this package does for us is quite amazing: it encapsulates a number of existing solvers – such as GLPK and SCIP, both open source, but also Gurobi and many more – behind a single interface, making it very easy to solve linear programs. And even better: it also supports non-linear optimization problems (such as semidefinite or conic programs), which I will not treat here.
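
Switching solvers is a one-liner; a minimal sketch (model building omitted):

import JuMP as J
import GLPK, SCIP

m = J.Model(GLPK.Optimizer)         # build and solve the model with GLPK...
# ... @variable / @objective / @constraint ...
J.set_optimizer(m, SCIP.Optimizer)  # ...or hand the same model to SCIP instead
J.optimize!(m)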

In any case, similar software packages exist for other programming languages.

Translating to Julia

Nutrition

First, we need to introduce the data which has been presented as tables above:

# target energy per day
want_cal = 2000

# how many calories from each macronutrient
want_frac_carb = 1/2
want_frac_prot = 1/4
want_frac_fat = 1/4

# calories per kilogram
fat_cals = 9000
carb_cals = 4100
prot_cals = 4000

# available foods. All following arrays refer to this order;
# one could use DataFrames instead, too.
foods = [:bread, :meat, :cheese, :apple]
price_per_kg = [4, 17, 20, 4]

carbs_per_kg = [.600, .100, .050, .150]
fat_per_kg = [.010, .200, .400, .001]
prot_per_kg = [.030, .300, .300, .050]

# calculate calories per kg from the macronutrients.
# this matrix contains the macronutrient content per food in each column;
# calculating calories per mass for each food is then an easy matrix-vector multiplication.
macromat = hcat(carbs_per_kg, prot_per_kg, fat_per_kg)

calories_per_kg = macromat * [carb_cals, prot_cals, fat_cals]

How to organize the input is of course a matter of taste and depends on how the model looks. Some ways may be better than others, and I don’t claim that my input format here is perfect!

In the next step, we can already formulate the model. It incorporates the constraints written down above; the code just phrases them in terms of the JuMP primitives @variable, @objective, and @constraint. The API Reference contains all the details, and is very readable. Here, we use the SCIP optimizer; frankly, it doesn’t matter much for this problem – the GLPK optimizer works just as well. Attesting to the thought that went into JuMP, we can swap out the optimizer in the first line of the function without changing anything else!

import JuMP as J
import SCIP

# Nutrition model!
function nutrition()
    m = J.Model(SCIP.Optimizer)
    F = length(foods)

    # Variables: weight bought per food. The only variable needed here; it is a vector. Each element
    # will be set so that the objective function is optimized, subject to the constraints.
    # Note that the macros don't adhere to plain Julia syntax, and instead implement a kind of
    # domain-specific language making it easier to write terse models.
    J.@variable(m, weight[i=1:F])
    
    # Objective function: minimize price. In the objective and constraints, we use standard Julia
    # expressions for values.
    J.@objective(m, Min, sum(price_per_kg[i] * weight[i] for i in 1:F))
    
    # Constraints:

    # 1. We only buy nonnegative amounts of food.
    J.@constraint(m, [i=1:F], weight[i] >= 0)

    # 2. In total, we buy at least 2000 calories.
    J.@constraint(m, sum(weight[i] * calories_per_kg[i] for i in 1:F) >= want_cal)
    
    # 3. Ensure that each macronutrient provides its desired share of total calories. Using >= is
    # enough here: the three shares sum to the total, so none can exceed its share without another
    # falling below its minimum -- the constraints are effectively equalities. It could pay off to
    # organize the inputs in a more structured fashion; also, the generator expressions could be
    # replaced by elementwise multiplications.
    J.@constraint(m, carbcals, sum(carbs_per_kg[i] * weight[i] * carb_cals for i in 1:F) >=
        want_frac_carb * sum(weight[i] * calories_per_kg[i] for i in 1:F))
    J.@constraint(m, fatcals, sum(fat_per_kg[i] * weight[i] * fat_cals for i in 1:F) >=
        want_frac_fat * sum(weight[i] * calories_per_kg[i] for i in 1:F))
    J.@constraint(m, protcals, sum(prot_per_kg[i] * weight[i] * prot_cals for i in 1:F) >=
        want_frac_prot * sum(weight[i] * calories_per_kg[i] for i in 1:F))
    
    # That was it! Optimize the model now. This effectively finds the best values for the `weight`
    # vector.
    J.optimize!(m)
    
    # `value()` provides access to the optimum weights calculated before.
    o_weight = J.value.(weight)
    # The value of the objective function is the price we have to pay for our food.
    o_price = J.objective_value(m)
    # We can calculate the calories again to check that this constraint is actually fulfilled.
    o_cals = sum(o_weight[i] * calories_per_kg[i] for i in 1:F)
    
    (weights=o_weight, price=o_price, calories=o_cals)
end

By running nutrition(), the model is created and solved, and we obtain a solution:

(weights = [0.1597925757578072, 0.2657397182651143, 0.0, 0.80968614495463],
 price = 8.395490093356692,
 calories = 2000.0)

In the case of using SCIP, we also gain insights into the solving process – which worked really fast here:

presolving:
(round 1, fast)       0 del vars, 4 del conss, 0 add conss, 4 chg bounds, 0 chg sides, 0 chg coeffs, 0 upgd conss, 0 impls, 0 clqs
presolving (2 rounds: 2 fast, 1 medium, 1 exhaustive):
 0 deleted vars, 4 deleted constraints, 0 added constraints, 4 tightened bounds, 0 added holes, 0 changed sides, 0 changed coefficients
 0 implications, 0 cliques
presolved problem has 4 variables (0 bin, 0 int, 0 impl, 4 cont) and 4 constraints
      4 constraints of type <linear>
Presolving Time: 0.00

 time | node  | left  |LP iter|LP it/n|mem/heur|mdpt |vars |cons |rows |cuts |sepa|confs|strbr|  dualbound   | primalbound  |  gap   | compl. 
* 0.0s|     1 |     0 |     3 |     - |    LP  |   0 |   4 |   4 |   4 |   0 |  0 |   0 |   0 | 8.395490e+00 | 8.395490e+00 |   0.00%| unknown
  0.0s|     1 |     0 |     3 |     - |   586k |   0 |   4 |   4 |   4 |   0 |  0 |   0 |   0 | 8.395490e+00 | 8.395490e+00 |   0.00%| unknown

SCIP Status        : problem is solved [optimal solution found]
Solving Time (sec) : 0.00
Solving Nodes      : 1
Primal Bound       : +8.39549009335669e+00 (1 solutions)
Dual Bound         : +8.39549009335669e+00
Gap                : 0.00 %

To the best of my knowledge, the last few lines indicate that the found solution is provably optimal. Just what we wanted!

So what did we find? We need to buy 160 grams of bread, 266 grams of meat, no cheese, and 810 grams of apples. Hmm, a lot of meat and apples indeed (and I’m a vegetarian…). I guess we didn’t put an objective on taste :-)

Knapsack (truck packing?)

This problem is actually a fair bit simpler than the previous one, especially in the simple case of one bin, no conflicts, etc. Without further ado, let me just write down my approach:

import JuMP as J
import SCIP

# data section (could also be read from an external source)

# We can pack 200 units
capacity = 200
# There are 500 articles
n = 500
# Each article has an integer size of 1 to 15 units.
sizes = rand(1:15, n)
# and a float profit between 0 and 20.
profits = rand(Float32, n) * 20

function pack()
    m = J.Model(SCIP.Optimizer)
    
    # Item i is packed or not? (binary variable)
    J.@variable(m, x[i=1:n], Bin)
    
    # Maximize profit, i.e. sum of profits for all packed items.
    J.@objective(m, Max, sum(x[i] * profits[i] for i in 1:n))
    
    # Here: only observe capacity. Of course, there could be many more interesting constraints, such
    # as conflicts, priority or deadlines, and so on.
    J.@constraint(m, sum(x[i] * sizes[i] for i in 1:n) <= capacity)
    
    J.optimize!(m)
    
    o_x = J.value.(x) .> 0.5  # compare against 0.5 to be robust to floating-point solver output
    # Return the vector of packed items.
    o_x
end

That’s it! I find it almost too boring to include here – but I also trust you, dear reader, to generalize this problem and implement your own complications; a small sketch follows below. Note that even with thousands or many more items, the problem is solved fairly quickly (milliseconds or less).
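
As a starting point, here is a sketch of two such complications (hypothetical item numbers; these lines would go into pack() next to the capacity constraint):

    # Conflict: items 3 and 7 must not be packed together.
    J.@constraint(m, x[3] + x[7] <= 1)
    # Dependency: item 9 may only be packed if item 1 is packed as well.
    J.@constraint(m, x[9] <= x[1])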

More Integer Programming

Linear programming is interesting, but (in my opinion) not as interesting as integer programming. Therefore, I will follow the simple examples above with a few more that demonstrate some cool properties of being able to simply translate a problem and have it solved without doing the work yourself.

Sudoku

Filling a sudoku grid is somewhat of a prime application for integer programming: for one, it only needs binary variables (like the knapsack problem above); furthermore, it can be modelled as a graph coloring problem: imagine each sudoku cell as a vertex with edges to all cells in the same block, row, and column; the goal, after all, is to assign a “color” (i.e., a number) to each vertex that is not shared with any adjacent vertex (cell).

As before, it is useful to first write down the problem in mathematical terms. I assume a 9x9 sudoku from here on:
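
We encode the grid with binary variables $x_{ijk} \in \lbrace 0, 1 \rbrace$, where $x_{ijk} = 1$ means that cell $(i,j)$ contains digit $k$. No objective is needed – this is a pure feasibility problem – and the constraints read:

$$\sum_{k=1}^{9} x_{ijk} = 1 \qquad \forall\, i, j \qquad \text{(exactly one digit per cell)}$$

$$\sum_{j=1}^{9} x_{ijk} = 1 \qquad \forall\, i, k \qquad \text{(each digit once per row)}$$

$$\sum_{i=1}^{9} x_{ijk} = 1 \qquad \forall\, j, k \qquad \text{(each digit once per column)}$$

$$\sum_{(i,j) \in B} x_{ijk} = 1 \qquad \forall\, \text{blocks } B,\; \forall\, k \qquad \text{(each digit once per block)}$$

Pre-filled cells are handled by fixing the corresponding $x_{ijk} = 1$.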

Now we only need code to convert a sudoku from matrix form into this rank-3 binary array and back:

const Sudoku = Matrix{Int}

function field_to_bitfield(s::Sudoku)::Array{Int8, 3}
    n = size(s, 1)
    if n != size(s, 2)
        error("not quadratic")
    end
    
    bits = zeros(Int8, n, n, n)
    for i in 1:n
        for j in 1:n
            if s[i,j] > 0
                bits[i, j, s[i,j]] = 1
            end
        end
    end
    bits
end

function bitfield_to_field(s::Array{Int8, 3})::Sudoku
    n = size(s, 1)
    if n != size(s, 2) || n != size(s, 3)
        error("not quadratic")
    end
    
    field = zeros(Int8, n, n)
    for i in 1:n
        for j in 1:n
            field[i,j] = findfirst(==(1), s[i,j,:])
        end
    end
    field
end

Also, I copied an evil-level sudoku puzzle from sudoku.com:

evil = [
    0 0 0  0 1 0  0 0 0;
    0 9 0  0 0 0  2 0 0;
    0 0 3  5 0 4  0 6 0;
    
    3 0 0  0 0 0  0 0 4;
    0 0 0  0 0 8  0 0 0;
    0 0 4  7 0 6  0 5 0;
    
    0 0 7  0 8 0  0 0 0;
    2 0 0  1 0 7  6 0 0;
    0 0 0  0 3 0  0 1 0;
]

Here, a 0 represents an empty cell; the conversion functions above respect this. The constraints are clear now, and therefore the code is fairly simple to write – after all, this is the great advantage of the integer programming approach: no complicated code, everything comes down to thinking of the correct constraints!

import JuMP as J
import SCIP

function fill_sudoku(s::Sudoku)
    model = J.Model(SCIP.Optimizer)
    # n == 9
    n = size(s, 1)
    # block == 3
    block = round(Int, sqrt(n))

    # field variable: binary, as promised.
    J.@variable(model, field[i = 1:n, j = 1:n, k = 1:n], Bin)

    # Place initial constraints on field.
    for i in 1:n
        for j in 1:n
            # 1. Each cell can only contain one number.
            J.@constraint(model, sum(field[i, j, :]) == 1)
            if s[i,j] != 0
                # 2. All pre-filled numbers are fixed, not subject to being inferred.
                # https://jump.dev/JuMP.jl/stable/reference/variables/#JuMP.fix
                J.fix(field[i, j, s[i,j]], 1)
            end
        end
    end
    
    # Now, the simple constraints: Note the implicit loop using `[digit = 1:n]`, i.e. one constraint
    # per digit is added.
    for l in 1:n  # 3. line uniqueness
        J.@constraint(model, [digit = 1:n], sum(field[l, :, digit]) == 1)
    end
    for c in 1:n  # 4. column uniqueness
        J.@constraint(model, [digit = 1:n], sum(field[:, c, digit]) == 1)
    end

    # 5. Finally, the block constraint as described above.
    for bi in 1:block  # one for each row block
        for bj in 1:block  # one for each column block
            J.@constraint(model,
                    [digit=1:n], # one for each digit
                    # actual expression
                    sum(field[i,j,digit]
                        for i in (bi-1)*block+1:bi*block  # first sum sign
                        for j in (bj-1)*block+1:bj*block  # second sum sign
                    ) == 1)
        end
    end

    J.optimize!(model)

    # Convert solved bit tensor to the human-friendly sudoku format.
    bitfield_to_field(round.(Int8, J.value.(field)))
end

And after some work by SCIP, we get the solution within a few milliseconds. Honestly, I find solving sudokus super boring, so all the better if we know how to get a computer to solve them for us!

> fill_sudoku(evil)
9×9 Matrix{Int64}:
 4  7  6  2  1  9  5  8  3
 5  9  1  8  6  3  2  4  7
 8  2  3  5  7  4  1  6  9
 3  6  2  9  5  1  8  7  4
 7  1  5  3  4  8  9  2  6
 9  8  4  7  2  6  3  5  1
 1  3  7  6  8  5  4  9  2
 2  4  8  1  9  7  6  3  5
 6  5  9  4  3  2  7  1  8

I wrote a sudoku solver in Haskell a few years ago (a few = 8), and I definitely prefer the new way! You can find the old one on GitHub, anyway. It used fairly boring backtracking, which was good enough for most humanly solvable puzzles.

Travelling Sales Person (TSP)

As mentioned in the beginning, the TSP problem is a classic optimization problem. Every computer science student learns about it – but thanks to IP, we can skip the nasty details and focus on how to shape it into a well-formulated problem.

On first thought, we have a few requirements: (1) every vertex needs to be visited exactly once, and (2) the distance travelled shall be minimal. With a bit more thought, we also find that (3) the tour should be one closed loop and not fall apart into multiple sub-loops – and that is where the problem becomes a bit more tricky. Naively, we could use constraints stating that every proper subset of vertices must be entered and left at least once – this rules out closed subtours (Dantzig/Fulkerson/Johnson, 1954). However, there is an exponential number of subsets – $2^n$, to be precise – which gets out of hand with as few as 10 vertices (1024 subsets); possible, but not desirable. Let me write it down here anyway:

TSP: subtour elimination via cut constraints

The inputs are the number of facilities $n$ and the distances (or costs) between facilities. If plain geographic distance is used as the cost, this is a symmetric matrix $c_{ij}$.
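
Written out (matching the code below, with binary variables $x_{ij}$ indicating that edge $i \to j$ is travelled), the Dantzig/Fulkerson/Johnson formulation reads:

$$\min \sum_{i,j} c_{ij}\, x_{ij} \qquad \text{s.t.} \qquad \sum_j x_{ij} = 1 \;\;\forall i, \qquad \sum_i x_{ij} = 1 \;\;\forall j, \qquad x_{ii} = 0,$$

$$\sum_{i \in S,\ j \notin S} (x_{ij} + x_{ji}) \ge 2 \qquad \forall\, \emptyset \ne S \subsetneq V.$$

The last family of constraints – one per subset $S$ of the vertex set $V$ – is what eliminates subtours, and also what blows up exponentially.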

What remains is to determine the set of cut edges for every subset. A naive but working function (on my machine: 136 ms for $2^{15}$ subsets) looks like this:

function all_subset_cuts(count)
    subset = zeros(Int, count)

    allcuts::Vector{Vector{Tuple}} = []

    for i = 1:2^count
        # Decode the bitmask i into a subset indicator vector.
        for j = 1:count
            if (i & (1 << (j-1))) > 0
                subset[j] = 1
            end
        end

        # Collect all edges crossing the subset boundary (in both directions).
        cuts::Vector{Tuple} = []
        for j in 1:count
            for k in 1:count
                if subset[j] + subset[k] == 1
                    push!(cuts, (j, k))
                end
            end
        end
        subset .= 0
        push!(allcuts, cuts)
    end
    allcuts
end

and returns a result such as

> all_subset_cuts(4)
16-element Vector{Vector{Tuple}}:
[(1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 1)]
[(1, 2), (2, 1), (2, 3), (2, 4), (3, 2), (4, 2)]
[(1, 3), (1, 4), (2, 3), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)]
[(1, 3), (2, 3), (3, 1), (3, 2), (3, 4), (4, 3)]
[(1, 2), (1, 4), (2, 1), (2, 3), (3, 2), (3, 4), (4, 1), (4, 3)]
[(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (3, 4), (4, 2), (4, 3)]
[(1, 4), (2, 4), (3, 4), (4, 1), (4, 2), (4, 3)]
[(1, 4), (2, 4), (3, 4), (4, 1), (4, 2), (4, 3)]
[(1, 2), (1, 3), (2, 1), (2, 4), (3, 1), (3, 4), (4, 2), (4, 3)]
[(1, 2), (1, 4), (2, 1), (2, 3), (3, 2), (3, 4), (4, 1), (4, 3)]
[(1, 3), (2, 3), (3, 1), (3, 2), (3, 4), (4, 3)]
[(1, 3), (1, 4), (2, 3), (2, 4), (3, 1), (3, 2), (4, 1), (4, 2)]
[(1, 2), (2, 1), (2, 3), (2, 4), (3, 2), (4, 2)]
[(1, 2), (1, 3), (1, 4), (2, 1), (3, 1), (4, 1)]
[]
[]

Imagining a four-element set, you can verify that these are indeed all possible cuts. For example, the first element contains all cuts for the subset $\lbrace 1 \rbrace$ if the set is $\lbrace 1,2,3,4 \rbrace$.

You know by now roughly how to translate the IP into a JuMP problem, so not much more explanation is needed – except that this first version is very explicit: we leave no wiggle room, and end up with an exactly defined route with a fixed direction of travel. Further below, there is an optimized version using half as many variables and even fewer constraints.

import JuMP as J
import SCIP

# Dantzig/Fulkerson/Johnson
# Travelling Sales Person problem
function tsp_dfj()
    m = J.Model(SCIP.Optimizer)
    
    # variable: edge i->j is travelled.
    J.@variable(m, tour[i=1:n, j=1:n], Bin)
    
    J.@objective(m, Min, sum(tour[i,j] * dists[i,j] for i = 1:n for j = 1:n))
    
    # 1. no self-visiting.
    J.@constraint(m, facility_no_self[i=1:n], tour[i,i] == 0)
    # 2. for each facility: exactly one incoming edge
    J.@constraint(m, facility_incoming[i=1:n], sum(tour[:, i]) == 1)
    # 3. for each facility: exactly one outgoing edge
    J.@constraint(m, facility_outgoing[i=1:n], sum(tour[i, :]) == 1)

    # eliminate subtours: enforce enter/exit for each subset.
    allcuts = all_subset_cuts(n)
    for subset = allcuts
        if length(subset) > 0
            # 4. Every cut must be crossed at least twice (once in, once out)
            J.@constraint(m, sum(tour[i,j] for (i,j) = subset) >= 2)
        end
    end
    
    J.optimize!(m)
    
    J.value.(tour), J.objective_value(m)
end

The data is generated at random, using Euclidean distances between facilities:

n = 15
facilities = rand(n, 2) .* 50

dists = zeros(n, n)
for i = 1:n
    for j = 1:n
        dists[i,j] = sqrt(sum((facilities[i, :] .- facilities[j, :]).^2))
    end
end

Running this for 15 facilities gives us a result after 18 seconds (on my machine, of course). That’s already kind of lame for just 15 vertices! Using the following function, the resulting graph matrix can be converted into a sequence:

function graph_to_sequence(g::Matrix)::Vector
    i = 1
    path = zeros(Int, size(g, 1))
    g_ = (g .> 0.5)  # solver output is floating point; compare robustly

    n = 1
    while n <= length(path)
        path[n] = i
        i = findfirst(g_[i, :])
        n += 1
    end
    path
end

If we don’t care so much about the exact order – after all, we could walk both ways – but just about the connections, the model can work with half as many constraints and variables:

import GLPK
import Combinatorics

function tsp_dfj_short()
    m = J.Model(GLPK.Optimizer)
    
    # variable: connection i->j is used. Note: Only created for j >= i!
    J.@variable(m, tour[i=1:n, j=i:n], Bin)
    
    J.@objective(m, Min, sum(tour[i,j] * dists[i,j] for i = 1:n for j = i:n))

    # There must be two connections from/to any single node
    J.@constraint(m, enterexit[i=1:n], sum(tour[k, l] for k in 1:n for l in 1:n if (i == k || i == l) && k < l) == 2)

    # and each subset must be entered/left at least twice.
    for ps in Combinatorics.powerset(collect(1:n), 2)
        if length(ps) == n
            continue
        end
        # There must be at least 2 crossing edges for any subset.
        J.@constraint(m, sum(tour[i,j] for i in 1:n for j in i:n if (in(i, ps) + in(j, ps)) == 1) >= 2)
    end
    
    J.optimize!(m)
    
    J.value.(tour), J.objective_value(m)
end

To extract the sequence, we have to piece it together ourselves:

g, o = tsp_dfj_short()
[k for k in keys(g.data) if g[k] == 1]

# 8-element Vector{Tuple{Int64, Int64}}:
# (4, 5)
# (1, 2)
# (2, 5)
# (1, 3)
# (3, 8)
# (4, 6)
# (7, 8)
# (6, 7)

I.e.: (for this example): 1 -> 2 -> 5 -> 4 -> 6 -> 7 -> 8 -> 3 -> 1.

TSP: Improving on exponential subsets

Using the approach presented by Miller/Tucker/Zemlin in 1960, we can improve a lot. Caution, spoiler alert: for the 15 vertices solved above in 18 seconds, this problem formulation takes just 770 ms. Even for 25 vertices, only 2.4 seconds are needed to find an optimal solution. So how does it work?

The idea is to not look at every subset – which is prohibitively expensive – but instead to assign a sequence number to each vertex and to ensure that it increases monotonically along the tour. This prevents returning to an already-visited node (except for the first one).
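
Concretely, with sequence variables $u_i \ge 0$ and edge variables $x_{ij}$ as before, the Miller/Tucker/Zemlin subtour elimination constraints read:

$$u_i - u_j + n\, x_{ij} \le n - 1 \qquad \forall\, i, \quad \forall\, j \in \lbrace 2, \dots, n \rbrace$$

If edge $i \to j$ is used ($x_{ij} = 1$), this forces $u_j \ge u_i + 1$; otherwise the constraint is slack. Node 1 is exempt, so the tour can close there.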

In JuMP, this problem can be written as

# Miller/Tucker/Zemlin (1960)

function tsp_mtz()
    m = J.Model(SCIP.Optimizer)
    
    # variable: connection i->j is used.
    J.@variable(m, tour[i=1:n, j=1:n], Bin)
    # variable: sequence number of visited nodes
    J.@variable(m, sequence[i=1:n] >= 0, Int)
        
    # facility 1 is the start.
    J.fix(sequence[1], 1, force=true)

    # dists contains distances between vertices.
    J.@objective(m, Min, sum(tour[i,j] * dists[i,j] for i = 1:n for j = 1:n))
    
    # 1. no self-visiting.
    J.@constraint(m, facility_no_self[i=1:n], tour[i,i] == 0)
    # 2. for each facility: exactly one incoming edge
    J.@constraint(m, facility_incoming[i=1:n], sum(tour[:, i]) == 1)
    # 3. for each facility: exactly one outgoing edge
    J.@constraint(m, facility_outgoing[i=1:n], sum(tour[i, :]) == 1)
    
    # 4. eliminate subtours: enforce strict monotonicity
    #J.@constraint(m, monoton[i=1:n, j=2:n], sequence[i] + 1 <= sequence[j] + n * (1 - tour[i,j]))
    # or reformulated:
    J.@constraint(m, monoton[i=1:n, j=2:n], sequence[i] - sequence[j] + n * tour[i,j] <= n-1)
    
    J.optimize!(m)
    
    J.value.(sequence), J.objective_value(m)
end

The journey can be visualized in a simple plot, showing the optimal tour for a particular set of 25 points (distance = 227.8):

Figure: the found optimal route.
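
The plotting code is not shown here; a minimal sketch (assuming Plots.jl, with `order` being the visit sequence, e.g. `sortperm(J.value.(sequence))` from the MTZ solution) could look like this:

import Plots

# Draw the tour by connecting the facilities in visit order, closing the loop.
function plot_tour(facilities::Matrix, order::Vector{Int})
    closed = vcat(order, order[1])
    Plots.plot(facilities[closed, 1], facilities[closed, 2],
               marker=:circle, legend=false)
end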

TSP, the third: use incremental constraints (DFJ revisited)

While experimenting more, I also tried a variation of the Dantzig/Fulkerson/Johnson method described above. As it turns out, not all constraints need to be added at the beginning. Instead, we can solve the model with only the basic DFJ constraints (in-/outgoing edges, no self-visiting), detect disconnected subtours in the solution, and add a new constraint for each detected subtour. Finding subtours can be achieved in polynomial time by looking for min-cuts; I’m using the Graphs.jl framework here.

This approach drastically reduces the number of constraints the solver has to consider: instead of adding all $2^n$ subset constraints up front, we only add those that are actually violated by an intermediate solution. In my observations, the number of iterations was usually of the same order of magnitude as the number of vertices.

Despite having to run the solver several times in a row, this approach proved to be faster and more scalable than both problem formulations above. Of course, I validated the solutions – here against the MTZ approach from above – and the costs and paths are identical for all samples I’ve tried.

import JuMP as J
import SCIP
import GLPK

import Graphs as G
import SimpleWeightedGraphs as SWG

# Convert a tour matrix into a weighted graph.
function result_to_graph(g::Matrix)::SWG.SimpleWeightedGraph
    gr = SWG.SimpleWeightedGraph(size(g, 1))

    n = size(g, 1)
    for i in 1:n
        for j in 1:n
            if g[i,j] > .9
                G.add_edge!(gr, i, j, g[i,j])
            end
        end
    end
    gr
end

# Finds a min-cut in a weighted graph, and returns the edges in that cut. These edges will be added
# as constraints to the MILP problem.
function find_sec_cut(rg::SWG.SimpleWeightedGraph)::Tuple
    (partition, cost) = G.mincut(rg)
    a, b = findall(==(1), partition), findall(==(2), partition)
    if (cost >= 2)
        return [], cost
    end
    return [(i,j) for i in a for j in b], cost
end

function tsp_incremental()
    m = J.Model(GLPK.Optimizer)
    n = size(facilities, 1)

    J.@variable(m, 0 <= tour[i=1:n, j=1:n] <= 1, Bin)

    J.@objective(m, Min, sum(tour[i, j] * dists[i, j] for i = 1:n for j = 1:n))

    J.@constraint(m, [i=1:n, j=1:n], tour[i,j] + tour[j,i] <= 1)

    # no self-visiting.
    J.@constraint(m, facility_no_self[i=1:n], tour[i,i] == 0)
    J.@constraint(m, facility_net[i=1:n], sum(tour[:, i]) + sum(tour[i, :]) == 2)
    # for each facility: exactly one outgoing and one incoming edge
    J.@constraint(m, facility_outgoing[i=1:n], sum(tour[i, :]) == 1)
    J.@constraint(m, facility_ingoing[i=1:n], sum(tour[:, i]) == 1)

    secs = 1
    while true
        J.optimize!(m)

        rg = result_to_graph(J.value.(tour))
        seccut, cost = find_sec_cut(rg)
        @show cost, length(seccut)
        if cost >= 2 - 1e-3
            # No violated subtour constraint is left; the tour is complete.
            break
        end
        # Undirected graph: consider both directions: into/out of this subset.
        J.@constraint(m, sum(tour[i,j] + tour[j,i] for (i,j) = seccut) >= 2)
        secs += 1
    end
    return result_to_graph(J.value.(tour)), J.value.(tour), J.objective_value(m), secs
end

TSP: Comparison & Summary

In the following chart, I marked a few run-time values on a logarithmic time scale. The DFJ approach (blue diamonds) is the first approach above, with the exponential power-set size reflected in the straight rise in this logarithmic plot; the MTZ approach shows a lot more variance, and rises exponentially as well, but with a smaller constant. You can also see that even the slightly better MTZ approach runs quite slowly for moderate problem sizes (~30 vertices). In contrast, the incremental DFJ approach presented last performs a lot better (up to a factor of 10²) with the GLPK solver, although there are large differences between individual problem instances, some being much more easily solved than others. Conversely, the MTZ approach works better with the SCIP solver.

Figure: run time by problem size for the three TSP approaches using SCIP.

Figure: run time by problem size for the three TSP approaches using GLPK – much better than SCIP for the incremental approach.

In a real scenario, only equipped with this problem formulation, one might partition a long tour into subtours manually, or use multiple sales persons. After all, 50 stops can become a bit much even for the most motivated sales person – if it’s a sales person’s route we are planning.

This comparison’s most valuable result is that the same problem can often be modelled in very different ways, some of which may be much more suitable to the solver software used. Arriving at a good model definitely takes some creativity, and I have great admiration for the operations researchers who have made such progress that TSP instances with millions of vertices can now be solved in acceptable time – of course using much cleverer approaches than the ones presented here.

You can find the three different implementations (DFJ, MTZ, and symmetric DFJ) in a notebook here (where you can also observe effects like: more variables and constraints can sometimes lead to quicker solving times).

Scheduling

Finally, I’ve come up with a small scenario in which a few workers cover three shifts per day, for four weeks (the coming month). Each worker has certain days off each week, additionally takes some vacation days, and has a shift preference. Consecutive shifts are to be avoided: there should be two rest shifts between any two worked shifts for each worker.

| name | weekly_hours | weekly_off_days | vacation_days | shift_preference |
|---|---|---|---|---|
| String | Int64 | Vector{Int64} | Vector{Int64} | Vector{Int64} |
| Jenny | 25 | [1, 7] | [3] | [1, 0, 0] |
| Marco | 40 | [2, 3] | [12, 13, 14] | [0, 1, 0] |
| Pete | 45 | [2, 3] | [1, 2, 3] | [0, 1, 1] |
| Jessica | 45 | [1, 7] | [20] | [0, 1, 1] |
| Brutus | 40 | [1, 5] | [26, 27, 28] | [1, 0, 1] |
| Lark | 40 | [2, 4] | [15, 16, 17] | [1, 1, 1] |

Weekly off days refer to day-of-week (1 = Monday, for example), and vacation days refer to day-of-month (1–28, i.e. four weeks). The shift preference is given in the form [early, day, late].

The constraints are fed by a few parameters; the table above is stored as a DataFrame named schedule_data:

import DataFrames as DF

weeks = 4
hours_per_shift = 8

min_pause_shifts = 2  # Free shifts between working shifts

n_workers = DF.nrow(schedule_data)
n_shifts = weeks * 7 * 3

start_day_of_month = 3 # day-of-week of first day in period

There are a few utility functions for handling the dates and day-of-week numbers:

# Convert day-of-month to day-of-week
function day_to_dow(d)
    x = (d+start_day_of_month-1) % 7
    if x == 0
        7
    else
        x
    end
end
# Return day-of-month that a shift is on. Shifts are enumerated consecutively.
function shift_on_day(s)
    return div(s-1, 3) + 1
end
# Return the three shifts on a certain day-of-month.
function shifts_on_day(d)
    d = d-1
    [d*3+1, d*3+2, d*3+3] 
end
# Return type (early/day/late) of shift.
function shift_type(s)
    t = s%3
    if t == 0
        return [0,0,1]
    elseif t == 1
        return [1,0,0]
    elseif t == 2
        return [0,1,0]
    end
end

This time, the model is a bit more complex, although its representation is still very simple. I will not write down the model in mathematical terms first, as it should be quite clear from the JuMP code:

function schedule()
    m = J.Model(SCIP.Optimizer)
 
    # Shift s is taken by worker w.
    J.@variable(m, shift_assign[s=1:n_shifts, w=1:n_workers], Bin)
    
    # Add constraints

    # Per shift

    # Each shift is taken by at most one worker. It is possible that a shift is not assigned:
    # The model shouldn't fail then, but give the option to manually reschedule.
    J.@constraint(m, shift_filled[s=1:n_shifts], sum(shift_assign[s,:]) <= 1)
    
    # Per worker
    for (i, w) in enumerate(schedule_data.name)
        # Respect the weekly hours budget, allowing up to 150% of it in any single week
        for week in 1:weeks
            firstshift = (week-1)*7*3 + 1
            lastshift = (week)*7*3
            J.@constraint(m, sum(shift_assign[firstshift:lastshift, i]) * hours_per_shift
                <= 1.5 * schedule_data.weekly_hours[i])
        end
        # Respect the total hours worked in the period:
        J.@constraint(m, sum(shift_assign[:, i]) * hours_per_shift
                <= weeks * schedule_data.weekly_hours[i])
        # Respect weekly off days:
        od = schedule_data.weekly_off_days[i]
        for shift in 1:n_shifts
            if day_to_dow(shift_on_day(shift)) in od
                J.@constraint(m, shift_assign[shift, i] == 0)
            end
        end
        # Respect vacation:
        for vd in schedule_data.vacation_days[i]
            for shift in shifts_on_day(vd)
               J.@constraint(m, shift_assign[shift, i] == 0)
            end
        end
    end
    
    # Adhere to shift breaks, i.e. min_pause_shifts between two active shifts for each worker.
    J.@constraint(m, breaks[shift=1:n_shifts-min_pause_shifts, worker=1:n_workers],
        sum(shift_assign[shift:shift+min_pause_shifts, worker]) <= 1)
    
    # The objective rewards respecting shift preferences (positive term) and penalizes
    # unscheduled shifts: `sum(shift_assign, dims=2) .- 1` contributes -1 for every
    # shift that is left unassigned.
    J.@objective(m, Max, sum(
            shift_assign[s, w] * sum(schedule_data.shift_preference[w] .* shift_type(s))
            for s in 1:n_shifts
            for w in 1:n_workers) +
            sum(sum(shift_assign, dims=2) .- 1))
    
    J.optimize!(m)
    
    J.value.(shift_assign)
end

And that’s it, already. I wrote another function converting the (here) 84x6 binary matrix back into a human-readable shift plan. We find an almost perfect schedule, except for two uncovered shifts on Monday the 13th and Tuesday the 28th. These we can try to cover in some other way (ask for overtime – e.g. Pete is working at only 36% of his capacity, but covering Mon 13th would require him to work a double shift). As you can see in the utilization numbers, we’re staying well below the full work time anyway: so there’s definitely room to optimize.

2 shifts could not be scheduled.
Hours for week 1 (1 - 7)
  Jenny: scheduled for 8
  Marco: scheduled for 40
  Pete: scheduled for 16
  Jessica: scheduled for 32
  Brutus: scheduled for 40
  Lark: scheduled for 32
Hours for week 2 (8 - 14)
  Jenny: scheduled for 32
  Marco: scheduled for 16
  Pete: scheduled for 16
  Jessica: scheduled for 24
  Brutus: scheduled for 40
  Lark: scheduled for 32
Hours for week 3 (15 - 21)
  Jenny: scheduled for 32
  Marco: scheduled for 32
  Pete: scheduled for 16
  Jessica: scheduled for 24
  Brutus: scheduled for 40
  Lark: scheduled for 24
Hours for week 4 (22 - 28)
  Jenny: scheduled for 24
  Marco: scheduled for 32
  Pete: scheduled for 16
  Jessica: scheduled for 24
  Brutus: scheduled for 24
  Lark: scheduled for 40
TOTAL HOURS:
 Jenny: 96 of 100 (96 %)
 Marco: 120 of 160 (75 %)
 Pete: 64 of 180 (36 %)
 Jessica: 104 of 180 (58 %)
 Brutus: 144 of 160 (90 %)
 Lark: 128 of 160 (80 %)
day weekday shift1 shift2 shift3
1 Wed Brutus Jessica Lark
2 Thu Brutus Marco Jessica
3 Fri Lark Marco Jessica
4 Sat Brutus Marco Pete
5 Sun Lark Marco Brutus
6 Mon Lark Marco Pete
7 Tue Jenny Jessica Brutus
8 Wed Jenny Lark Brutus
9 Thu Jenny Jessica Brutus
10 Fri Jenny Marco Jessica
11 Sat Lark Marco Brutus
12 Sun Lark Pete Brutus
13 Mon unfilled Pete Lark
14 Tue Jenny Jessica Brutus
15 Wed Jenny Jessica Brutus
16 Thu Jenny Marco Brutus
17 Fri Jenny Pete Jessica
18 Sat Lark Marco Brutus
19 Sun Lark Marco Brutus
20 Mon Lark Marco Pete
21 Tue Jenny Jessica Brutus
22 Wed Jenny Lark Brutus
23 Thu Jenny Jessica Brutus
24 Fri Lark Marco Jessica
25 Sat Lark Marco Brutus
26 Sun Lark Marco Pete
27 Mon Lark Marco Pete
28 Tue Jenny unfilled Jessica

This problem can also be extended arbitrarily: from finding the best contiguous vacation days, to allocating multiple workers to each shift, to covering peak demand on weekend days, or anything else; it’s all a matter of the right constraints! One example follows below.
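
For instance, requiring two workers on weekend shifts could look like this sketch (my own extension; it would replace the shift_filled constraint for the affected shifts):

    # Hypothetical extension: weekend shifts need exactly two workers.
    for s in 1:n_shifts
        if day_to_dow(shift_on_day(s)) in (6, 7)  # Saturday and Sunday
            J.@constraint(m, sum(shift_assign[s, :]) == 2)
        end
    end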

You can look at the notebook in PDF form or download it as Jupyter notebook.

Final remarks

If you’ve made it this far: great! This has been a slightly random tour through what I’ve come across in (MI)LP so far, and I hope you found it interesting. Let me know if you find a mistake in the problems above, or if you know an interesting application area yourself!