Travelling Salesman Problem

Problem definition

Consider a salesman needing to travel to n cities:

He can start from any city
He needs to visit any city once and only once

What is the path with the shortest traveling distance?

This is the TSP: Traveling Salesman Problem

Example with 4 cities:

For example, the path A->B->D->C has a distance of 5+7.2+7.8+5.8 = 25.8. Is it the shortest distance?

Symmetry / Asymmetry

Above the distance is symmetric: from A to B the distance is the same as B to A, this is named sTSP. By default, the TSP is considered sTSP.

TSP can also consider asymmetric distances, called aTSP. For example, the road isn’t the same between going and coming

Example of an aTSP problem:

How many paths possibles in a TSP?

A naive counting

It can be counted this way:

At the first level: all cities are possible: n possibilities
At second level: n-1 possibilities (because it is forbidden to visit twice a city)
…
At the last level: 1 possibility

For example, the first two levels of a 4-cities problem:

This mean n*(n-1)*(n-2)*…*1 possible paths:

There are n! possible paths

With 4-cities, we can list the 4! = 4*3*2*1 = 24 possibilities

A	B	C	D
B	A	C	D
C	A	B	D
A	C	B	D
B	C	A	D
C	B	A	D
C	B	D	A
B	C	D	A
D	C	B	A
C	D	B	A
B	D	C	A
D	B	C	A
D	A	C	B
A	D	C	B
C	D	A	B
D	C	A	B
A	C	D	B
C	A	D	B
B	A	D	C
A	B	D	C
D	B	A	C
B	D	A	C
A	D	B	C
D	A	B	C

First skimming

If you shift a path, from 1, 2,…,n offset, the distance is same.

For example, the distance of path [A-B-C-D] is the same as [B-C-D-A], and the same as [C-D-A-B] and [D-A-B-C]. Concretely, it means it means is possible to start from a single chosen city.

From the naive count n!, we can divide by n: the first skimming solution is (n-1)! possible paths.

With 4-cities, we can summarize the 24 solutions as solutions starting from the city A, there are 3! = 3*2*1 = 6 solutions. Here we choose to start only from the city A

A	B	C	D
A	C	B	D
A	D	B	C
A	B	D	C
A	C	D	B
A	D	C	B

Last skimming

When the distances are symmetric, when you reverse a path, the distance is the same.

For example, the distance [A-B-C-D] is the same as [D-C-B-A].

With 4-cities, we skim the inverse duplicates:

Row 1 and 6: row A is [A-B-C-D], its reverse is [D-C-B-A], and after rearrangement, it is similar to [A-D-C-B]
Row 2 and 3
Row 4 and 5

There are now 3 solutions:

A	B	C	D
A	C	B	D
A	B	D	C

Final count

If there are n cities, in a symmetric scenario, the total number of paths to evaluate is (n-1)!/2

How difficult is it to solve the Traveler Salesman Problem?

There is no problem with the calculation of distances: this is basically additions.

There is an issue with the number of distances to calculate, due to the factorial effect:

Number of cities	Number of solutions
4	3
10	181440
20	6.0823E+16
30	4.4209E+30
40	1.0199E+46
50	3.0414E+62
60	6.9342E+79
70	8.5561E+97
80	4.473E+116
90	8.254E+135
100	4.666E+155

Already with 20 cities, there are around 40.000.000.000.000.000 paths to calculate.

With 100 cities, surely your desktop or laptop won’t finish the calculations before the end of the XXIst century.

Specificity of the TSP

Despite the aim is to find the shortest path, if there are too many cities, the aim will be to find the best possible short path! It may not be the shortest one, indeed no possibility of checking it.

Hence, the idea if to try several algorithms, compare them, and choose the one with the minimum distance.

Algorithm for solving the TSP

Nearest insertion algorithm example

Algorithm definition

Starting step
Choose a starting city I
Find a new city R such that the distance R-I is minimal. The starting subpath is [I-R-I]
Regular step
Selection step: find the city R closest to any city of the current subpath, meaning finding the city R, such that for any city J in the subpath, the distance(R-J) is minimal
Insertion step: from the current subpath, find the adjacent cities I and J, where R has a minimal cost of insertion distance(I-R) + distance(R-J) – distance(I-J). Insert R between I and J
If the travel is complete, stop, else go to regular step

Example with 4 cities

1 – Starting step

We start from city A.

We choose the nearest city from A: it is B with a distance of 5:

Hence the starting subpath is [A-B-A]

2 – Regular step

We choose the nearest city from the subpath [A-B-A]: this is C with a distance of 5.8 from A:

For the insertion of C into [A-B-A], it could be:

[A-C-B-A]
[A-B-C-A]

Let’s compare the minimal insertion cost:

[A-C-B-A]: distance(A-C) + distance(C-B) – distance(A-B) = 5.8 + 9 – 5 = 9.8
[A-B-C-A]: distance(B-C) + distance(C-A) – distance(A-B) = 9 + 5.8 – 5 = 9.8

Have you noted the distance is the same? It’s because, in this step, we have basically a triangle, with symmetric distances, hence the paths are the same.

The subpath [A-B-C-A] is chosen amongst the two (quite the same) possibilities.

3 – Regular step

It remains one city to select, so finding the city with the closest distance to any city of the current subpath is easy

It remains one city D to insert:

[A-D-B-C-A]
[A-B-D-C-A]
[A-B-C-D-A]

Amongst these 2 options, which one minimizes the insertion distance?

[A-D-B-C-A] has an insertion distance of distance(A-D) + distance(D-B) – distance(A-B) = 9 + 7.2 – 5 = 11.2
[A-B-D-C-A] has an insertion distance of distance(B-D) + distance(D-C) – distance(B-C) = 7.2 + 7.8 – 9 = 6
[A-B-C-D-A] has an insertion distance of distance(C-D) + distance(D-A) – distance(C-A) =7.8 + 9 – 5.8 =11

The winner with the nearest insertion step is [A-B-D-C-A] The total distance from this algorithm is 20.

Performance of the algorithm

A scholarly and complex calculation indicates that the distance with the nearest insertion is no more than twice the optimal distance.

The number of required calculations is increasing with the square of n: conventionally written as O(n²)

Cheapest insertion algorithm

Algorithm definition:

The cheapest insertion is the cousin of the nearest insertion. The nearest insertion is looking for the closest city, and then is looking for the minimal insertion cost.

The cheapest insertion is more accurate, and thus requires more calculation: every city is a candidate (instead of the closest city), and then every minimal insertion cost is checked.

This means the cheapest insertion:

is as good or better than nearest insertion
requires more calculations

The steps of the cheapest insertion algorithm:

Starting step
Choose a starting city I
Regular step
From the current subpath, find the city R, not in the subpath, and two adjacent cities in the subpath I and J, such that the cost of insertion is minimal: meaning distance(I-R) + distance(R-J) – distance(I-J)
If the travel is complete, stop, else go to regular step

Example with 4 cities

Step 1

We start from city A.

We look for the city X such as distance A-X-A is minimum (the step is the same as nearest insertion):

with A-B-A the distance is 5+5=10
with A-C-A the distance is 5.8+5.8=10.6
with A-D-A the distance is 9+9=18

C procures the cheapest insertion and is selected:

Step 2

Now we are looking between B and D which one procures the cheapest insertion inside the subpath [A-C-A]:

for insertion of B, we have the options [A-B-C-A] and [A-C-B-A]
for insertion of D, we have the options [A-D-C-A] and [A-C-D-A]

The insertion costs are:

[A-B-C-A]: distance(A-B) + distance(B-C) – distance(A-C) = 5 + 9 – 5.8 = 8.2
[A-C-B-A]: distance(C-B) + distance(B-A) – distance(C-A) = 9 + 5 5.8 = 8.2
[A-D-C-A]: distance(A-D) + distance(D-C) – distance(A-C) = 9 + 7.8 – 5.8 = 11
[A-C-D-A]: distance(C-D) + distance(D-A) – distance(C-A) = 7.8 + 9 – 5.8 = 11

You have noted, like for nearest insertion, some ways are redundant: it is like to the symmetryc distance and triangle effect.

[A-B-C-A] and [A-C-B-A] are the winners, and arbitrarily, [A-B-C-A] is the chosen one.

Step 3

It remains the city D to insert in the subpath [A-B-C-A], the options are:

[A-D-B-C-A]
[A-B-D-C-A]
[A-B-C-D-A]

The insertion costs are:

[A-D-B-C-A]: distance(A-D) + distance(D-B) – distance(A-B) = 9 + 7.2 – 5 = 11.2
[A-B-D-C-A]: distance(B-D) + distance(D-C) – distance(B-C) = 7.2 + 7.8 – 9 = 6
[A-B-C-D-A]: distance(C-D) + distance(D-A) – distance(C-A) = 7.8 + 9 – 5.8 = 11

The calculations provide [A-B-D-C-A] as the winner:

Performance of the algorithm

A sophisticated calculation indicates that the distance with the nearest insertion is no more than twice the optimal distance.

The number of required calculations is increasing with the square of n²*log(n): conventionally written as O(n².log(n))

Nearest Neighbor Algorithm

Algorithm definition:

This algorithm is easier than the precedent ones: the aim is to find the closest city and to add it at the end of the subpath, without looking for a better insertion cost inside the subpath.

As consequences, Nearest Neighbour:

is less performant than the previous algorithm
unintuitively, it doesn’t reduce so much the number of calculations compared to nearest insertion

The steps of the nearest neighbor algorithm:

Starting step
Choose a starting city
Regular step
From the current subpath, find the city R which is the closest to the last city of the current subpath. Add this city R to the end of the subpath.
If the travel is complete, stop, else go to regular step

Example with 4 cities

Step 1

We start from city A.

Step 2

We choose the nearest city from A: it is B with a distance of 5:

Hence the starting subpath is [A-B-A]

Step 3

We choose the nearest city from the last city of [A-B], which is indeed B.

And the closest city from B is D. The current subpath is [A-B-C]:

Step 4

It remains only one city, D, which is added at the end of the current subpath.

The solution from the algorithm is [A-B-C-D].

It’s better to include the start city, so we can compare with the previous solutions, which provides [A-B-C-D-A]

Performance of the algorithm

A scholarly and complex calculation indicates that the distance with the nearest insertion is no more than 0.5 * rounded_to_upper_integer[log₂(n)] + 0.5

The number of required calculations is increasing with the square of n²: conventionally written as O(n²)

Farthest insertion

Algorithm description

This algorithm is more visual than the others, it requires to draw lines and to estimate distances between points and lines.

The steps are:

Starting step
Find the two cities I and J with the farthest distance. Draw an edge between I and J.
The city R to be added is the one which is the farthest from the edge[I-J]
Regular step
From all the edges of the subpath, chose the city R with the maximal distance.
Find the closest edge [I-J] of the subpath from city R
Replace this edge [I-J] by the two edges [I-R] and [R-J]

Reminder: how to calculate the distance to and edge

For the orange and violet point, we are looking the distance from the edge of the two blue points:

We transform the edge into a line:

We draw the perpendiculars to the line:

The perpendicular of the orange point is touching the edge: its distance is the one between the blue edge and the orange point

The perpendicular of the violet point is not touching the edge: its distance is the one between the violet point and the closest blue point.

Example with 4 cities

We start with cities A and B which have the smallest distance:

Between C and D, D has the farthest distance from the edge [A-B]:

Hence, the subpath [A-B] is replaced by [A-D-B]:

Now it remains only one city, C.

Its minimal distance to the tour is designated hereunder:

So, the subpath[A-D] is replaced by the subpath [A-C-D]:

The solution is [A-B-D-C]

Concorde code

Definition of the underlying algorithm

It’s very complex to understand. The main parts of the algorithm:

It finds a very good initial path with an algorithm called Chained Lin-Kernighan
It improves this initial path with Branch-and-Bound, Branch-and-Cut algorithm and solver.

It is only for symmetric distance. It’s the best solver as it often provides a perfect solution.

Feel free to provide me vulgarized information about its methodology;

Areas of applications

The Concorde solver can be used only for educational purposes.

You cannot use it in a commercial way, for your company or your customer!

Concorde implementation

The main site is: http://www.math.uwaterloo.ca/tsp/concorde/index.html

You can download the Concorde code at http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm

And you’ll be able to call this solver from the TSP R-library

It’s also possible to work with Concorde solver as a stand-alone.

Windows users, please note that you’ll need to install Cygwin too : https://www.cygwin.com/

Handling Concorde

When Concorde GUI is installed, you can, for example, generate some random nodes :

And then run the solver :

You also can load some configuration file containing your own cities, in a relevant file format :

Application with language R to a 4-cities example

This link https://vgir.fr/toolbox/ provides information about R installation.

We will use the TSP library, and precise we want to apply the Nearest Insertion Algorithm

We need to manually create a matrix of all distances:

Tips:

The distance between A and A is not relevant, no need to fill the matrix
Here we have symmetric distances, hence the matrix is symmetric
You read the matrix « from / to » as « row /to column »

We note the distances:

Distance from/to	A	B	C	D
A	–	2	3	7
B	2	–	5	6
C	3	5	–	8
D	7	6	8	–

Please note that TSP library will translate cities into numbers, as numbers are consistent with row and column numbers

A	1
B	2
C	3
D	4

Hereunder the code for creation of the matrix:

# call the TSP library
library(TSP)
# manually create the lines of the matrix
matrix_row1 <- c(0,2,3,7)
matrix_row2 <- c(2,0,5,6)
matrix_row3 <- c(3,5,0,8)
matrix_row4 <- c(7,6,8,0)
# bind the vector row by row
matrix <- data.matrix(rbind(matrix_row1, matrix_row2,matrix_row3,matrix_row4))
# Note that a matrix m is only symmetric if its rownames and colnames are identical. Consider using unname(m). 
rownames(matrix) <- NULL
# adapt the format to the one required by the library TSP                      
tsp_matrix <-  as.TSP(matrix)

The matrix matrix is created:

     [,1] [,2] [,3] [,4]
[1,]    0    2    3    7
[2,]    2    0    5    6
[3,]    3    5    0    8
[4,]    7    6    8    0

We call the solver. Please note in the manual all the available algorithms https://cran.r-project.org/web/packages/TSP/TSP.pdf :

tour <- solve_TSP(tsp_matrix, method = "nearest_insertion")
tour

The result is:

object of class ‘TOUR’ 
result of method ‘nearest_insertion’ for 4 cities
tour length: 19

So, the minimal distance is 19 with the Nearest Insertion Algorithm.

We ask the corresponding path:

tour_solution <- as.vector(tour)
tour_solution

Manually we can convert this list: the solution 1 2 4 3 is converted as A B D C.

Which tools for solving a TSP?

As described above, you cannot use the Concorde solver in a non-environment.

Hence, it’s forbidden at your job, as an employee, as a consultant.

Run all the possible algorithms available in your language/library, and keep the best one.

Consider using another language, if it contains a promising library

Type of distance / Context	Educational	Business, Consulting,…
Symmetric distances	Concorde: Optimal solution	TSP libraries Consider licensed software Suboptimal solution
Asymmetric distance	Compare: TSP open-source libraries Concorde by approximation from aTSP to tTSP suboptimal solution	TSP libraries Suboptimal solution