Personalised city trip itinerary using integer linear programming
Case
As a research project I have developed an itinerary service. The idea started when I was doing a hackathon with colleagues for the city of Amsterdam (see earlier post). I wanted to recommend an itinerary to a tourist visiting the city of Amsterdam. Furthermore, I wanted to make the itinerary based on the user’s interests to recommend interesting places and activities for him in the city. If the user is interested in modern art for example, the recommendation scores for modern art museums will increase for that user.
The tool divides the duration of the tourist’s stay into separate time slots. For example, a single day could be divided in 3 time slots: morning, afternoon, evening. POIs get a different recommendation score for each time slot they can be visited. The Vondelpark for example could be less recommended on Monday morning because of expected rain or because of expected crowds. On Tuesday morning the Vondelpark could be recommended because of an interesting event or nice weather.
The itinerary tool will try to limit travel time between the recommended POIs (point of interests). In this way the tourist will not waste time on travailing. The user can also set a budget for the entire itinerary.
By taking all these considerations into account the tool should be able to aid the user in making good decisions about which places to visit and when to visit them.
Here is a screenshot of the user interface of the itinerary tool that was created as part of the research project:
In this blog I’m going to explain the techniques used in making the itinerary planning algorithm. The calculation of the recommendation scores for the POIs is handled by another system (a recommendation engine) and is not described in this blog.
Problem
When creating an itinerary (or for that matter any kind of optimised planning) where you have a lot of possible solutions, you quickly run into the situation that you can’t find an the perfect solution by trying all combinations. The amount of feasible combinations you have in making the itinerary practically rules out the possibility to try out all combinations to find an optimised one.
Let’s say we want to plan a trip where we visit 6 POIs in the city of Amsterdam in succession. A quick look on Google maps shows you how many POIs are in the city of Amsterdam.
Below is the an overview of museums in the city of Amsterdam:
The amount of restaurants in Amsterdam is even more impressive:
Brute force attempt
Let’s say there are about a 1,000 POIs in the city of Amsterdam. If you’d want to try out all combinations to find the most optimum itinerary for 6 time slots you get the following amount of combinations: 1,000 * 1,000 * 1,000 * 1,000 * 1,000 * 1,000 = 1,000,000,000,000,000,000.
Remember that the order in which you visit the POIs does matter, as each combination of a time slot and a POI has it’s own recommendation score. For simplicity I don’t take into account that you may not want to visit the same POI twice, this would slightly reduce the amount of permutations.
If you can make an implementation that can check 100,000 combinations per second, it will take you about 300,000 years to try out all combinations. I doubt that the city of Amsterdam will still have the same tourist attractions in the amount of time it takes to find the most optimum itinerary.
Here is an example of how we can solve this by trying all combinations.
(for the complete example see LargeItineraryBruteForce.java)
// List of pois List pois = IntStream.range(0, 1000) .mapToObj(i -> "poi" + i) .collect(Collectors.toList()); // Weight per poi Map<String, Integer> weightPerPoi = new HashMap<>(); Random random = new Random(); for (String poi : pois) { weightPerPoi.put(poi, random.nextInt(10000) + 1); } int maxSum = 0; for (String poiTimeSlot1 : pois) { for (String poiTimeSlot2 : pois) { for (String poiTimeSlot3 : pois) { for (String poiTimeSlot4 : pois) { for (String poiTimeSlot5 : pois) { for (String poiTimeSlot6 : pois) { // Calculate the sum of recommendations int sum = weightPerPoi.get(poiTimeSlot1) + weightPerPoi.get(poiTimeSlot2) + weightPerPoi.get(poiTimeSlot3) + weightPerPoi.get(poiTimeSlot4) + weightPerPoi.get(poiTimeSlot5) + weightPerPoi.get(poiTimeSlot6); // Check if we have a new best solution if(sum > maxSum){ maxSum = sum; List solution = Arrays.asList(poiTimeSlot1, poiTimeSlot2, poiTimeSlot3, poiTimeSlot4, poiTimeSlot5, poiTimeSlot6); System.out.println("Found new best solution: " + solution + ", sum:" + sum); } } } } } } }
(I didn’t implement any checks for visiting duplicate pois for brevity)
On my machine this calculation will take about 3000 years, which is not very practical.
The example given was actually still a simple one, probably there are some shortcuts imaginable. However, one of the features in the actual planning algorithm is that it also takes into account the distance between each POI to make sure you don’t travel too much between them. Furthermore, there is also a constraint that takes the costs of the POIs into account. In this way a user can set a budget for his trip. These features add more complexity which makes the brute force attempt even more unfeasible because optimizations, or shortcuts within the brute force program become much harder.
Integer linear programming
Let’s see if we can find a better solution which can solve the problem in near real time. Luckily there are techniques specialised in solving these kinds of optimization problems. One of these techniques is ILP (Integer Linear Programming). With ILP you write the problem in a mathematical form and let an algorithm (also called a solver) come up with an optimised solution. The mathematical form contains constraints that specify the desired properties of the solution.
History
Linear programming has been around for a long time. According to wikipedia linear programming was already being researched in the 1940s. Optimization is central to any problem involving decision making. It’s found to be useful in a lot of different problem domains in economics, engineering like production planning, scheduling, transportation planning, telecommunications networks and cellular networks.
Problem specification
Integer linear programming works by optimising the output of a linear function. For example:
a * 1 + b * 4 + c * 5
a, b and c are the variables in the function. 1, 4 and 5 are the multipliers in the function. If we set the values of a, b and c to 8, 9 and 10 we can calculate the outcome of the linear function:
(a) 8 * 1 + (b) 9 * 4 + (c) 10 * 5 = 94
The linear function of the itinerary planning, also called the objective function, contains a variable for each of the poi’s for each time slot.
For example, if we want to plan a trip where we have two possible POIs and two time slots we have 4 combinations possible in making the itinerary, 2 time slots times 2 POIs. We define a variable for the Rijksmuseum in the morning and one for the Rijksmuseum on in the afternoon.
Each of these variables has an associated multiplier or weight for that POI/time slot combination. In the case of only two POIs and two time slots, the linear function would consist of the following components:
a = Rijksmuseum morning
b = Rijksmuseum afternoon
c = van Gogh museum morning
d = van Gogh museum afternoon
Together with the weight, in this case recommendation score, the objective function would look like this:
a * 6 + b * 9 + c * 5 + d * 5
The algorithm will try to find a value for each of the variables that maximizes the output of this function. Without any constraints this would be the most optimum solution:
infinity * 3 + infinity * 9 + infinity * 5 + infinity * 5 = infinity
So the algorithm is trying to tell us that in order for the most optimum day you have to go infinite times to the Rijksmuseum in the morning, an infinite times to the Rijksmuseum in the afternoon, an infinite times to the van Gogh museum in the morning and an infinite times to the van Gogh museum in the afternoon. This maximizes the output of the objective function.
Not very practical to visit a museum infinite times, is it? Let’s try adding some constraints. Let’s first add a constraint for each museum per time slot so that you can only go once to every museum per time slot. Here is the constraint that limits variable a, the Rijksmuseum in the morning, to be either 0 (don’t visit) or 1 (visit).
A constraint is basically a row of multipliers combined with an operator and a value. This constraint is met if the sum of the variables combined with the multipliers plus the offset is lower or equal to 1.
variable |
multiplier |
operator |
value |
a |
1 |
<= |
1 |
The first constraint is calculated like this:
a * 1 <= 1
If the algorithms picks 0 for variable a the constraint is met:
(a) 0 * 1 = 0
0 <= 1 = true
0 is smaller than 1 so the constraint is met.
If the algorithms picks 1 for variable a the constraint is also met:
(a) 1 * 1 = 1
1 <= 1 = true
1 is equal to 1 so the constraint is met.
If we pick a number for variable a larger than 1 we can see that the constraint is no longer met:
(a) 2 * 1 = 2
2 <= 1 = false
2 is larger than 1. The algorithm cannot pick a number larger than 1 for variable a.
We need to specify this constraint for each variable. The list of constraint will look like this in a table (when a constraint multiplier is zero effectively the value of the accompanying variable is ignored, multiplying something with zero is always zero):
constraint | a | b | c | d | operator | value |
1 | 1 | 0 | 0 | 0 | <= | 1 |
2 | 0 | 1 | 0 | 0 | <= | 1 |
3 | 0 | 0 | 1 | 0 | <= | 1 |
4 | 0 | 0 | 0 | 1 | <= | 1 |
Constraint 1 limits the value of a to be either 1 or 0. Constraint 2 limits the value of b to be either 1 or 0 etc..
When we run the algorithm with these constraints with the optimization function (a3 + b9 + c5 + d5) we get the following answer:
(a) 1 * 6 + (b) 1 * 9 + (c) 1 * 5 + (d) 1 * 5 = 25
The algorithm picks the value 1 for variable a, b, c and d. We see that the most optimum (maximal) outcome is 25.
We have still have a problem. We are asked to visit two museums at the same time. We can solve that by adding the following constraints:
a (Rijksmuseum morning) |
b (Rijksmuseum afternoon) |
c (van Gogh morning) |
d (van Gogh afternoon) |
operator |
value |
1 |
0 |
1 |
0 |
<= |
1 |
0 |
1 |
0 |
1 |
<= |
1 |
This will force the algorithm pick a or c (but not both), or to pick none of them. The same is true for b and d. So because a, c and b, d are in the same time slots will not have to visit two museums in the same time slot
When we run the algorithm adding these constraints we get the following answer:
(a) 1 * 6 + (b) 1 * 9 + (c) 0 * 5 + (d) 0 * 5 = 15
Still there is one problem left to solve. We are being asked to visit museum the Rijksmuseum in the morning and in the afternoon. Not cool. We don’t want to visit the same museum twice. So there is one set of constraints left:
a (Rijksmuseum morning) |
b (Rijksmuseum afternoon) |
c (van Gogh morning) |
d (van Gogh afternoon) |
operator |
value |
1 |
1 |
0 |
0 |
<= |
1 |
0 |
0 |
1 |
1 |
<= |
1 |
Now the museum is forced to pick one of:
a or b
and one of:
c or d
Given these constraints the outcome now becomes:
(a) 0 * 6 + (b) 1 * 9 + (c) 1 * 5 + (d) 0 * 5 = 14
The Rijksmuseum in the afternoon (b) is recommended more than the Rijksmuseum in the morning (a). The van Gogh museum is equally recommended in the morning as in the afternoon. So the optimum solution (maximizing the outcome of the objective function) given all the constraints is to chose the van Gogh museum in the morning and the Rijksmuseum in the afternoon.
ILP implementation
Google open sourced a library with optimization algorithms that I use in this example. It can be found at https://developers.google.com/optimization/?hl=en.
Let’s first try to find the most optimum itinerary for the earlier simplified example of visiting two museums in two time slots:
First we create the solver:
MPSolver solver = new MPSolver("SmallItinerary", MPSolver.OptimizationProblemType.GLOP_LINEAR_PROGRAMMING);
Then we define the variables:
// Create integer variables a, b, c and d with a lower limit of 0 and an upper limit of 1 MPVariable rijksmuseumMorning = solver.makeIntVar(0, 1, "a"); MPVariable rijksmuseumAfternoon = solver.makeIntVar(0, 1, "b"); MPVariable vanGoghMuseumMorning = solver.makeIntVar(0, 1, "c"); MPVariable vanGoghMuseumAfternoon = solver.makeIntVar(0, 1, "d");
The OR tools solver let’s you immediately set a minimum and a maximum for a variable, it wouldn’t surprise me if this was implemented by just adding a constraint for each variable as explained earlier.
Then we insert the objective function a * 6 + b * 9 + c * 5 + d * 5:
// Create objective function MPObjective objective = solver.objective(); // Set the variables and their multipliers (recommendation scores) objective.setCoefficient(rijksmuseumMorning, 6); objective.setCoefficient(rijksmuseumAfternoon, 9); objective.setCoefficient(vanGoghMuseumMorning, 5); objective.setCoefficient(vanGoghMuseumAfternoon, 5); // The goal is to maximize outcome of the objective function objective.setMaximization();
Now we set the constraints to limit visiting two museums in the same time slot:
// Make a constraint with a minimum outcome of 0 and a maximum outcome of 1 MPConstraint constraint1 = solver.makeConstraint(0, 1, "Visit only one museum in the morning"); constraint1.setCoefficient(rijksmuseumMorning, 1); constraint1.setCoefficient(vanGoghMuseumMorning, 1); // Make a constraint with a minimum outcome of 0 and a maximum outcome of 1 MPConstraint constraint2 = solver.makeConstraint(0, 1, "Visit only one museum in the afternoon"); constraint2.setCoefficient(rijksmuseumAfternoon, 1); constraint2.setCoefficient(vanGoghMuseumAfternoon, 1);
Now we set the constraints to limit visiting one museum in the morning and afternoon:
// Make a constraint with a minimum outcome of 0 and a maximum outcome of 1 MPConstraint constraint3 = solver.makeConstraint(0, 1, "Visit the Rijksmuseum once"); constraint3.setCoefficient(rijksmuseumMorning, 1); constraint3.setCoefficient(rijksmuseumAfternoon, 1); // Make a constraint with a minimum outcome of 0 and a maximum outcome of 1 MPConstraint constraint4 = solver.makeConstraint(0, 1, "Visit the van Gogh museum once"); constraint4.setCoefficient(vanGoghMuseumMorning, 1); constraint4.setCoefficient(vanGoghMuseumAfternoon, 1);
Now that we’ve inserted the constraints and the objective function we can solve for the maximum outcome of the objective function:
solver.solve();
Print the solution:
System.out.println("Optimal objective value = " + (int) solver.objective().value()); System.out.println("Visit the Rijksmuseum in the morning " + (int) rijksmuseumMorning.solutionValue() + " times"); System.out.println("Visit the Rijksmuseum in the afternoon " + (int) rijksmuseumAfternoon.solutionValue() + " times"); System.out.println("Visit the van Gogh museum in the morning " + (int) vanGoghMuseumMorning.solutionValue() + " times"); System.out.println("Visit the van Gogh museum in the afternoon " + (int) vanGoghMuseumAfternoon.solutionValue() + " times");
Which will output:
Optimal objective value = 14
Visit the Rijksmuseum in the morning 0 times
Visit the Rijksmuseum in the afternoon 1 times
Visit the van Gogh museum in the morning 1 times
Visit the van Gogh museum in the afternoon 0 times
(for the complete example see SmallItineraryILP.java)
Ramping it up
I’ve also created an implementation of the 1000 POIs in 6 time slots which is a bit more complex but is just an expansion of the previous example but with more variables. You can find the code in LargeItineraryILP.java. On my machine this calculation takes about 150 milliseconds. Which is a whole lot better than the brute force attempt.
Conclusion
With ILP it’s a lot easier to solve certain kinds of problems that can’t be easily solved using traditional procedural styles of programming. The nice thing about these algorithms is if you know how to specify the problem, the constraints and the objective function, a complex algorithm will find the answer for you without it being necessary that you understand the algorithm itself. Google’s OR tools library contains several implementations of these algorithms. It’s possible to switch the implementation of the solver without changing the specification of the objective function or the constraints. Basically you’re programming what to output of the program should be, instead of programming how to get there.
Example code
On github you can find the example code: https://github.com/mvanzelst/itinerary-blog
Install the google or tools library
mvn install:install-file -DgroupId=com.google.ortools -DartifactId=ortools -Dversion=0.3300 -Dpackaging=jar -Dfile=lib/com.google.ortools.jar
Running the examples
Run LargeItineraryILP.java with:
mvn clean package exec:java -Dexec.mainClass=”nl.trifork.blog.ilp.itinerary.LargeItineraryILP”
Run SmallItineraryILP.java with:
mvn clean package exec:java -Dexec.mainClass=”nl.trifork.blog.ilp.itinerary.SmallItineraryILP”
Run LargeItineraryBruteForce.java with:
mvn clean package exec:java -Dexec.mainClass=”nl.trifork.blog.ilp.itinerary.LargeItineraryBruteForce”
Native libraries
I’ve included the native libraries for windows, linux and mac. The program will try to detect your os to load the correct library. I’ve only included the 64 bit libraries. If you need libraries for a different os or architecture you can find the libraries under the java section at the OR tools website: https://developers.google.com/optimization/installing
Resources
https://developers.google.com/optimization
https://developers.google.com/optimization/puzzles/sudoku
https://en.wikipedia.org/wiki/Integer_programming