# How to implement your first parallel hill-climbing algorithm on GPU?

This lesson will let you

• Learn the main principles of ParadisEO-GPU
• Browse through the code of HC algorithm
• Tabu Search on GPU

1. How to run my simple Hill-climbing on GPU!

cmake ..

and type

make

All tutorials are now compiled (otherwise call the hot-line in the forum). You can run your hill climbing. From the "ParadisEO-GPU/paradiseo-gpu/build/tutorial/OneMax" directory, type:

./GPUtestSimpleHC

A very simple hill climbing had solved the oneMax problem (which maximizes the number of ones in the bit string) for the bit strings of size 32. On your output screen, you can see on the first line the random initial solution, on the second line the initial solution fitness, on the third line the time that GPU make to execute Hill climbing and on the last line the final solution which is the bit string of size 32 with all ones. For example:

1 0 1 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 1 0 1 1
initial: 17
Execution time = 0.878000 ms
final: 32

The simple HC stops on local optima when no improvement can not be done.

2. The main principles of ParadisEO-GPU

In ParadisEO-GPU a prallel neighborhoods design is defined by this schema:

The main idea it's to divide neighborhood to a group of one or more neighbors that we can evaluate parallelly on GPU and save those calculated fitness in array structure. At each iteration, from the current solution:

• Generate neighbors from the neighborhood and fetch them pre-calculated fitness
• Select a neighbor
• Decide to replace solution by the selected neighbor

This sequence diagram shows the different interactions between objects at each iteration :

Like in paradisEO-eo, ParadisEO-GPU separates the representation-dependent part (represention of solution, neighbor, evaluation) from the generic part .

So, to define a local search, it is necessary to define:

1. the solution representation
2. the neighbor representation
3. the evaluation function of a solution
4. the evaluation function of a neighbor
5. the neighborhood (generation of the neighbors)

If you can define the representation-dependent part of your problem, you can use all local search algorithms from MO (Hill-climbing, tabu search, iterated local search, variable neighborhood search, and more) and even combined them with evolutionary algorithms !

3. Browsing the HC code

Now let's have a look on the code. All the elements of the simple HC are already defined in a both MO and ParadisEO-GPU framework, and the code only puts all this bricks together. The code following the previous general principles. It defines the solution and neighbor representation, the solution and neighbor evaluation, the neighborhood, and the local search used.

1. The includes part:

1. include "moGPUConfig.h"

This include GPU Config File that contains many object-like macros

```  #define BLOCK_SIZE 8
#ifndef NB_POS
#define NB_POS 2
#endif
#ifndef SIZE
#define SIZE 100
#endif
#endif
```
• BLOCK_SIZE : Define the number of threads per block, it must be a power of two number (8,16,32,64,128...)
• NB_POS : Define the number of position index to change on each solution.For example, the number of position to change for permutation neighborhood is two.
• SIZE : Define the size of solution vector.

The general includes for the c++ stdlib streams:

``` #include <iostream>
#include <stdlib.h>
```

This includes for eo which contains all include files of EO:

``` #include <eo>
```

This includes the evaluation function of a solution (full evaluation), the incremental evaluation of a neighbor for the oneMax problem.

``` #include <problems/eval/moGPUEvalOneMax.h>
#include <eval/moGPUVectorEval.h>
#include <problems/eval/moGPUOneMaxIncrEval.h>
```

This inculdes solution/neighbor & neighborhood. The first line to include the bit string representation on a both CPU & GPU,the second one to include the bit string neighbor representation and the third line to include gpu-neighborhood.

``` #include <GPUType/moGPUBitVector.h>
#include <neighborhood/moGPUBitNeighbor.h>
#include <neighborhood/moGPUOrderNeighborhoodByModif.h>
```

This include the timer for timing calculation.

``` #include <performance/moGPUTimer.h>
```

This includes all the MO elements necessary to run the simple HC

``` // The Solution and neighbor comparator
#include <comparator/moNeighborComparator.h>
#include <comparator/moSolNeighborComparator.h>
// The continuator
#include <continuator/moTrueContinuator.h>
// Local search algorithm
#include <algo/moLocalSearch.h>
// Simple HC algorithm
#include <algo/moSimpleHC.h>
// The simple HC algorithm explorer
#include <explorer/moSimpleHCexplorer.h>
```

2. The typedef part:

As in EO and MO the paradiseo-gpu classes parametrized by the type of solutions, type of neighbors so it is useful to use a synonym (with a typedef) of the solution's type and neighbor's type .

Here the solution representation is a bit string and the neighbor representation is related to a bit string solution and Hamming distance 1 (only 1 bit can be flipped), both using an "eoMaximizingFitness" fitness value.

``` typedef moGPUBitVector<eoMaximizingFitness> solution;
typedef moGPUBitNeighbor <solution,eoMaximizingFitness> Neighbor;
typedef moGPUOrderNeighborhoodByModif<Neighbor> Neighborhood;
```

3. Object definition part:

Follows the main function "main_function" where all useful objects are defined.
First, a code to parse the command line and a file. It gives the value of the random seed and many parameters. The lesson 3 of EO tutorial gives more precision on this code. Here we have only to understand that the variables "seed" and "vecSize" are initialized.

```  eoParser parser(argc, argv);

eoValueParam<uint32_t> seedParam(time(0), "seed", "Random number seed", 'S');
parser.processParam( seedParam );
unsigned seed = seedParam.value();
```

(...)

To seed the random seed (see lesson 1 of EO tutorial for more precision):

```  rng.reseed(seed);
```

The definition the initialization of solutions is defined in paradiseo-gpu we can't reuse type defined in EO because the solution must be useful in a both GPU and CPU. The "moGPUBitVector" is a class that makes a random intialization of bit string of a given length. Each bit is true with 1/2 rate. You can see the lesson 3 of GPU tutorial for more precision.

```  solution sol(SIZE);
```

The fitness function of the oneMax problem is the number of 1 in the bit string. It is already defined in paradiseo-gpu:

```  moGPUEvalOneMax<solution> eval;
```

A neighbor is not necessary a modified copy of the solution. But often a neighbor defines how to move, i.e. how to modify a solution to compute the neighbor. For example, for the bit string, a neighbor indicates which bit is flipped.
In the same way, it is often possible to define the incremental evaluation of a neighbor knowing the modification (the move).
The incremental evaluation for the oneMax problem on GPU is already defined and adds +1 or -1 at the fitness value according to the flipped bit.

``` moGPUOneMaxIncrEval<Neighbor> incr_eval;
moGPUEvalByModif<Neighbor,moGPUOneMaxIncrEval<Neighbor> > gpueval(SIZE,incr_eval);
```

we should pass in parameter the incremental evaluation to "moGPUEval" in order to launch the Kernel to compute parallely all neighbors fitness.

for the simple hill-climbing on GPU, all the neighbors are explored in increasing order of their bit flip. So, you can use the class "moGPUOrderNeighborhoodByModif" where the size of the neighborhood and neighbors evaluation has to be precise in the constructor:

```  Neighborhood neighborhood(SIZE,gpueval);
```

All representation-dependent part is now defined, so the simple Hill-Climbing can be defined. The constructor needs the neighborhood, the solution and neighbor evaluations. The solution and neighbor representation are defined by the template of the class:

```  moSimpleHC<Neighbor> simpleHC(neighborhood,eval,gpueval);
```

4. The execution of hill-climbing part:

Always in the "main_function" function, it is the execution part.
In MO, the local search algorithm never initialises the solution. It must be made outside the local search algorithm.

Now apply your simple HC on the solution as follows:

```  // The current solution
solution sol(SIZE);
//Evaluation of the intial solution:
eval(sol);
// output: the initial solution fitness
std::cout << "initial: " << sol<< std::endl;
// Create timer for timing  calculation
moGPUTimer timer;
timer.start();
//apply the local search on the solution
simpleHC(sol);
timer.stop();
// print execution time
printf("Execution time = %f s\n",timer.getTime());
// output: the final solution
std::cout << "final:   " << sol<< std::endl;
```

4. Tabu Search on GPU

1. Browsing the Tabu Search code:

```   a. The includes part:

The same inclusions must be made such as  the previous section and others specific to run Tabu search algorithm must be added.This includes all the MO elements necessary to run the simple Tabou search :
```
``` // The time continuator
#include <continuator/moTimeContinuator.h>
// Local search algorithm
#include <algo/moLocalSearch.h>
// The Tabou Search algorithm explorer
#include <explorer/moTSexplorer.h>
//Algorithm and its components
#include <algo/moTS.h>
//Tabu list
#include <memory/moNeighborVectorTabuList.h>
memories inclusions
#include <memory/moDummyIntensification.h>
#include <memory/moDummyDiversification.h>
#include <memory/moBestImprAspiration.h>
```
``` The typedef part and object definition parts are similar to the previous section. The difference that can be observed is specific to the execution of the tabu search:
```
``` //define continuator
moTimeContinuator <Neighbor> continuator(timeLimit);
//define tabu list
moNeighborVectorTabuList<Neighbor> tl(sizeTabuList,0);
// define memories
moDummyIntensification<Neighbor> inten;
moDummyDiversification<Neighbor> div;
moBestImprAspiration<Neighbor> asp;
// define Tabu search explorer of solution neighborhood's
moTSexplorer<Neighbor> explorer(neighborhood, gpueval, comparator,solComparator, tl, inten, div, asp);
// Run Tabu search using general local search template
moLocalSearch<Neighbor> localSearch1(explorer, continuator, eval);
```

It's possible to use another method to run Tabu search :

// Run Tabu search using basic Constructor

```  moTS<Neighbor> localSearch2(neighborhood,eval, gpueval,  2, 7);
// Run Tabu search using simple Constructor
moTS<Neighbor> localSearch3(neighborhood, eval, gpueval, 5, tl);
// Run Tabu search using general Constructor
moTS<Neighbor> localSearch4(neighborhood, eval, gpueval, comparator,  solComparator, continuator, tl, inten, div, asp);
```
```   b. The execution of Tabu Search part:

Now apply your local search on the solution as follows:
```
```  solution sol1(SIZE);
eval(sol1);
std::cout << "initial: " << sol1<< std::endl;
moGPUTimer timer1;
timer1.start();
localSearch1(sol1);
timer1.stop();
printf("Execution time = %f s\n",timer1.getTime());
std::cout << "final:   " << sol1<< std::endl;
```
``` Run my simple Tabu search on GPU :
```

./GPUtestSimpleTS

On your output screen, you can see on the first the initial solution fitness, on the second line the time continuator, on the third line time that GPU make to execute tabu search and on the last line the final solution fitness which is the bit string of size 1000 with all ones. For example:

``` Tabu Search 1:
---------------------
initial: 522
STOP in moTimeContinuator: Reached maximum time [1/1]
Execution time = 0.883 s
final:   1000
```

NB: To note that up to the time value that you associate to a time continuator's the final solution fitness can be different than bit string with all ones.