Boost Graph - Very slow Astar on big graph - c++

Currently I have some trouble with my pathfinding system, which is abnormally slow on my big graph:
My Graph
Graph properties: 16814 vertices / 61512 edges
The graph is directed.
Each vertex has an ID of its subgraph (island ID) → there is no path between subgraphs, but there is always one inside the same subgraph.
Each vertex of the graph is defined by:
a type (rock, sand, ...)
a height
Last rule: land is not connected to ocean (so we have many subgraphs).
My Astar configuration
My heuristic is very classic: I compute the distance between the current vertex position and the goal position.
I don't pre-compute edge weights.
I use a "complex" weight function (it depends on the walker's speed, the kind of ground, and whether we go up or down):
float PathWorld::updateWeight(const Agent& agent, const EdgeInfo& edgeInfo) const
{
    const Agent::Navigation& navigation = agent.getNavigation();
    const auto& fromTerrain = edgeInfo._from->_terrain;
    const auto& toTerrain   = edgeInfo._to->_terrain;
    // Mean walking speed over the two terrain types of this edge.
    const float mean = (navigation._speed.at(fromTerrain._type) + navigation._speed.at(toTerrain._type)) * 0.5f;
    // Slope factor: going uphill costs more, downhill costs less.
    const float diff = BT::Maths::clamp((1000.0f + toTerrain._height - fromTerrain._height) / 1000.0f, 0.5f, 2.0f);
    return edgeInfo._distance / mean * diff;
}
Issues
Currently, computing one path takes anywhere from less than 1 ms up to 1 second. The resulting paths only contain between 8 and 80 vertices, and the time is not proportional to the path length: an 8-vertex path can take 1 second while an 80-vertex path takes 1 ms.
I did a quick profiling pass with Visual Studio: Boost is my bottleneck.
Code and testing data
All complete code and testing data can be found on my GitHub.
https://github.com/Rominitch/myBlogSource/tree/master/DEMO/TestPathfinding
The easy/small demo doesn't suffer from this issue, only the complex case does.
All graphs were generated by the same program (not published).
My Testing program output
My testing program is really basic:
- I pick a start node in my graph.
- I take XXX nodes after it (by index) and compute a path to each.
Outputs:
Statistics:
Start node: Ocean H= 0 SubGraph= 2
nbValid: 2053/15000 (valid paths / number of paths computed)
min / max: 1/75 (number of vertices in computed paths)
min time for one path: 0 ms
max time for one path: 7 ms
Statistics:
Start node: Forest H= 100 SubGraph= 1
nbValid: 1420/1500
min / max: 1/76
min time for one path: 0 ms
max time for one path: 558 ms
Statistics:
Start node: Swamp H= 50 SubGraph= 1
nbValid: 601/1000
min / max: 1/51
min time for one path: 0 ms
max time for one path: 1246 ms
Statistics:
Start node: Clay H= 300 SubGraph= 22
nbValid: 138/15000
min / max: 1/12
min time for one path: 0 ms
max time for one path: 0 ms
Questions
Where is my issue? (Am I using Boost badly? Is my graph bad? Is this a Boost limitation?)
Is Boost a good choice for pathfinding, or is there a better library?
Can my graph data be optimized (a better Boost algorithm, less data duplication, ...)?
Thanks!

OK! I found my issue.
The bug was inside my heuristic implementation, which didn't take the square root of the squared distance between the current node and the goal.
That effectively made it a "quasi-random" heuristic.
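For reference, here is a minimal sketch of what a corrected straight-line heuristic for Boost's A* can look like. The graph type and the _position member are assumptions based on the snippets above, not the repository's actual types; only the sqrt is the point.

#include <boost/graph/astar_search.hpp>
#include <boost/graph/graph_traits.hpp>
#include <cmath>

// Sketch of an admissible Euclidean-distance heuristic. `Graph` is
// assumed to be a bundled-property adjacency_list whose vertex bundle
// exposes a `_position` member with `x` and `y` floats.
template <class Graph>
class EuclideanHeuristic : public boost::astar_heuristic<Graph, float>
{
public:
    using Vertex = typename boost::graph_traits<Graph>::vertex_descriptor;

    EuclideanHeuristic(const Graph& graph, Vertex goal)
        : _graph(graph), _goal(goal)
    {}

    float operator()(Vertex v) const
    {
        const float dx = _graph[v]._position.x - _graph[_goal]._position.x;
        const float dy = _graph[v]._position.y - _graph[_goal]._position.y;
        // The crucial step: take the square root. Returning the squared
        // distance puts the heuristic on a different scale than the edge
        // weights, so it stops guiding the search.
        return std::sqrt(dx * dx + dy * dy);
    }

private:
    const Graph& _graph;
    Vertex       _goal;
};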
Moreover, in my case boost::astar_search is less performant than boost::astar_search_tree.
Finally, I also optimized my graph (removed dummy edges).
New stats:
Statistics:
Start node: Ocean H= 0 SubGraph= 2
nbValid: 2028/15000
min / max: 1/145
min time for one path: 0 ms
max time for one path: 13 ms
mean: 0 ms
Global time: 1845 ms
Statistics:
Start node: Forest H= 100 SubGraph= 1
nbValid: 1420/1500
min / max: 1/92
min time for one path: 0 ms
max time for one path: 13 ms
mean: 0 ms
Global time: 1232 ms
Statistics:
Start node: Swamp H= 50 SubGraph= 1
nbValid: 601/1000
min / max: 1/50
min time for one path: 0 ms
max time for one path: 11 ms
mean: 0 ms
Global time: 504 ms
Statistics:
Start node: Clay H= 300 SubGraph= 23
nbValid: 138/15000
min / max: 1/17
min time for one path: 0 ms
max time for one path: 1 ms
mean: 0 ms
Global time: 115 ms

perft-function of chess engine is giving self-contradictory output

I am currently developing a chess engine in C++, and I am in the process of debugging my move generator. For this purpose, I wrote a simple perft() function:
int32_t Engine::perft(GameState game_state, int32_t depth)
{
    int32_t last_move_nodes = 0;
    int32_t all_nodes = 0;

    Timer timer;
    timer.start();

    int32_t output_depth = depth;

    if (depth == 0)
    {
        return 1;
    }

    std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);
    for (Move move : legal_moves)
    {
        game_state.make_move(move);
        last_move_nodes = perft_no_print(game_state, depth - 1);
        all_nodes += last_move_nodes;
        std::cout << index_to_square_name(move.get_from_index()) << index_to_square_name(move.get_to_index()) << ": " << last_move_nodes << "\n";
        game_state.unmake_move(move);
    }

    std::cout << "\nDepth: " << output_depth << "\nTotal nodes: " << all_nodes << "\nTotal time: " << timer.get_milliseconds() << "ms/" << timer.get_milliseconds() / 1000.0f << "s\n\n";
    return all_nodes;
}

int32_t Engine::perft_no_print(GameState game_state, int32_t depth)
{
    int32_t nodes = 0;

    if (depth == 0)
    {
        return 1;
    }

    std::vector<Move> legal_moves = generator.generate_legal_moves(game_state);
    for (Move move : legal_moves)
    {
        game_state.make_move(move);
        nodes += perft_no_print(game_state, depth - 1);
        game_state.unmake_move(move);
    }

    return nodes;
}
Its results for the initial chess position (FEN: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1) for depths 1 and 2 match the results of Stockfish's perft command, so I assume they are correct:
h2h3: 1
h2h4: 1
g2g3: 1
g2g4: 1
f2f3: 1
f2f4: 1
e2e3: 1
e2e4: 1
d2d3: 1
d2d4: 1
c2c3: 1
c2c4: 1
b2b3: 1
b2b4: 1
a2a3: 1
a2a4: 1
g1h3: 1
g1f3: 1
b1c3: 1
b1a3: 1
Depth: 1
Total nodes: 20
Total time: 1ms/0.001s
h2h3: 20
h2h4: 20
g2g3: 20
g2g4: 20
f2f3: 20
f2f4: 20
e2e3: 20
e2e4: 20
d2d3: 20
d2d4: 20
c2c3: 20
c2c4: 20
b2b3: 20
b2b4: 20
a2a3: 20
a2a4: 20
g1h3: 20
g1f3: 20
b1c3: 20
b1a3: 20
Depth: 2
Total nodes: 400
Total time: 1ms/0.001s
The results stop matching at depth 3, though:
Stockfish:
go perft 3
a2a3: 380
b2b3: 420
c2c3: 420
d2d3: 539
e2e3: 599
f2f3: 380
g2g3: 420
h2h3: 380
a2a4: 420
b2b4: 421
c2c4: 441
d2d4: 560
e2e4: 600
f2f4: 401
g2g4: 421
h2h4: 420
b1a3: 400
b1c3: 440
g1f3: 440
g1h3: 400
Nodes searched: 8902
My engine:
h2h3: 361
h2h4: 380
g2g3: 340
g2g4: 397
f2f3: 360
f2f4: 436
e2e3: 380
e2e4: 437
d2d3: 380
d2d4: 437
c2c3: 399
c2c4: 326
b2b3: 300
b2b4: 320
a2a3: 280
a2a4: 299
g1h3: 281
g1f3: 280
b1c3: 357
b1a3: 320
Depth: 3
Total nodes: 7070
Total time: 10ms/0.01s
I figured that my move generator was just buggy, and tried to track down the bugs by playing a move the engine gives incorrect values for and then calling perft() with depth = 2 on the resulting position to find out which moves are missing. But for every move I tried this with, the engine suddenly started to output the correct results I expected to get earlier!
Here is an example for the move a2a3:
When calling perft() on the initial position in stockfish, it calculates 380 subnodes for a2a3 at depth 3.
When calling perft() on the initial position in my engine, it calculates 280 subnodes for a2a3 at depth 3.
When calling perft() with depth = 2 on the position you get after making the move a2a3 in the initial position, my engine calculates the correct number of total nodes, 380:
h7h5: 19
h7h6: 19
g7g5: 19
g7g6: 19
f7f5: 19
f7f6: 19
e7e5: 19
e7e6: 19
d7d5: 19
d7d6: 19
c7c5: 19
c7c6: 19
b7b5: 19
b7b6: 19
a7a5: 19
a7a6: 19
g8h6: 19
g8f6: 19
b8c6: 19
b8a6: 19
Depth: 2
Total nodes: 380
Total time: 1ms/0.001s
If you have any idea what the problem could be here, please help me out. Thank you!
EDIT:
I discovered some interesting new facts that might help to solve the problem, but I don't know what to do with them:
For some reason, using std::sort() like this in perft():
std::sort(legal_moves.begin(), legal_moves.end(), [](auto first, auto second){ return first.get_from_index() % 8 > second.get_from_index() % 8; });
to sort the vector of legal moves causes the found number of total nodes for the initial position (for depth 3) to change from the wrong 7070 to the (also wrong) 7331.
When printing the game state after calling game_state.make_move() in perft(), the call seems to have had no effect on the position bitboards (the other properties change like they are supposed to). This is very strange because, in isolation, the make_move() method works just fine.
I'm unsure if you were able to pin down the issue, but from the limited information available in the question, the best I can assume (and something I faced myself earlier) is that there is a problem in your unmake_move() function when it comes to captures, since:
Your perft fails only at depth 3 - this is when the first capture is possible; moves 1 and 2 cannot include any captures.
Your perft works fine when it searches from the position after a2a3, rather than when it reaches that position while searching at depth 3 from the start.
This probably means that your unmake_move() fails at depths greater than 1, where you need to restore parts of the board's state that cannot be derived from just the move parameter you are passing in (e.g. en passant, castling rights, etc. before you made the move).
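A minimal sketch of the usual fix: snapshot the irreversible parts of the position before making a move and restore them on unmake. All the type and field names here (Move, board_, en_passant_square_, ...) are hypothetical; your real GameState API will differ.

#include <cstdint>
#include <vector>

struct Move { int from = 0, to = 0; };  // placeholder move type

struct UndoInfo
{
    int     captured_piece;     // piece that stood on `to` before the move
    int     en_passant_square;  // en passant target before the move
    uint8_t castling_rights;    // castling rights before the move
};

class GameState
{
public:
    void make_move(const Move& move)
    {
        // Save everything that cannot be recomputed from `move` alone.
        history_.push_back({board_[move.to], en_passant_square_, castling_rights_});
        board_[move.to]   = board_[move.from];
        board_[move.from] = 0;
        // ... update en_passant_square_ / castling_rights_ here ...
    }

    void unmake_move(const Move& move)
    {
        const UndoInfo undo = history_.back();
        history_.pop_back();
        board_[move.from]  = board_[move.to];
        board_[move.to]    = undo.captured_piece;  // put the capture back
        en_passant_square_ = undo.en_passant_square;
        castling_rights_   = undo.castling_rights;
    }

private:
    int                   board_[64] = {};        // simplified mailbox board
    int                   en_passant_square_ = -1;
    uint8_t               castling_rights_   = 0x0F;
    std::vector<UndoInfo> history_;
};

(A real engine would also restore the halfmove clock and handle promotions, castling, and en passant captures; the point is only that this state must be saved, not re-derived.)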
This is how you can debug your move generator using perft:
Given startpos as p1, generate perft(3) for your engine and Stockfish. (You did that.)
Now check any move that has a different node count; you picked a2a3. (You did that.)
Given startpos + a2a3 as p2, generate perft(2) for your engine and Stockfish. (You partially did this.)
Now check any move that has a different node count in step 3. Let's say move x.
Given startpos + a2a3 + x as p3, generate perft(1) for your engine and Stockfish.
Since this is only perft(1), by this time you will be able to figure out the wrong or missing move in your generator. Set up that last position p3 on the board and compare your engine's moves against Stockfish's perft(1) result.

dramatic error in lp_solve?

I have a simple problem that I passed to lp_solve via the IDE (version 5.5.2.0):
/* Objective function */
max: +r1 +r2;
/* Constraints */
R1: +r1 +r2 <= 4;
R2: +r1 -2 b1 = 0;
R3: +r2 -3 b2 = 0;
/* Variable bounds */
b1 <= 1;
b2 <= 1;
/* Integer definitions */
int b1,b2;
The obvious solution to this problem is 3. SCIP as well as CBC give 3 as the answer, but not lp_solve: here I get 2. Is there a major bug in the solver?
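For reference, the four binary combinations can be checked by hand:
b1=0, b2=0 → r1+r2 = 0 (feasible)
b1=1, b2=0 → r1+r2 = 2 (feasible)
b1=0, b2=1 → r1+r2 = 3 (feasible)
b1=1, b2=1 → r1+r2 = 5, which violates R1 (r1+r2 <= 4)
So the optimum is indeed 3, at b1=0, b2=1.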
Thanks in advance.
I was in contact with the developer group that maintains the lpsolve software. The error will be fixed in the next version of lpsolve.
When I tried it, I got 3 as the optimal value for the objective function.
Model name: 'LPSolver' - run #1
Objective: Maximize(R0)
SUBMITTED
Model size: 3 constraints, 4 variables, 6 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
Relaxed solution 4 after 4 iter is B&B base.
Feasible solution 2 after 6 iter, 3 nodes (gap 40.0%)
Optimal solution 2 after 7 iter, 4 nodes (gap 40.0%).
Excellent numeric accuracy ||*|| = 0
MEMO: lp_solve version 5.5.2.0 for 32 bit OS, with 64 bit REAL variables.
In the total iteration count 7, 1 (14.3%) were bound flips.
There were 2 refactorizations, 0 triggered by time and 0 by density.
... on average 3.0 major pivots per refactorization.
The largest [LUSOL v2.2.1.0] fact(B) had 8 NZ entries, 1.0x largest basis.
The maximum B&B level was 3, 0.8x MIP order, 3 at the optimal solution.
The constraint matrix inf-norm is 3, with a dynamic range of 3.
Time to load data was 0.001 seconds, presolve used 0.017 seconds,
... 0.007 seconds in simplex solver, in total 0.025 seconds.

OMP: How to find the right size of cache at runtime

I have the following piece of (pseudo) code:
static void ConvertBuffer( unsigned char * buffer, const int width )
{
#pragma omp parallel for
    for ( int x = 0; x < width; ++x ) // one image row
    {
        RGB rgb = {0,0,0};
        HSB hsb;
        rgb.red = (float)buffer[x] / 255.;
        RGBToHSB(rgb, hsb);
        buffer[x] = hsb.brightness * 255;
    }
}
This is a very naive implementation of an RGB → HSB conversion algorithm.
The first implementation would pull a single scanline (= one row of the image) at a time, in my case 65536 bytes. However, after trial and error on my particular system, I discovered that I could decrease the total computation time by a factor of 2 if I instead processed 16 scanlines at a time (= 1048576 bytes).
What tools are available to guess that magic number, possibly at runtime, so that I do not need to hard-code a magic value of 16 somewhere in my code?
If I know that RGBToHSB is embarrassingly parallel and cache friendly, can I just completely fill the L3 cache, and should that be close to the maximum possible speed?
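For illustration, the chunked variant I understand the question to describe might look like the sketch below. CHUNK_ROWS plays the role of the magic 16; RGB, HSB, and RGBToHSB are taken from the pseudo-code above, so this is a shape, not a drop-in function.

#include <algorithm>

static void ConvertBufferChunked( unsigned char * buffer, const int width,
                                  const int rows, const int chunkRows )
{
    // Process `chunkRows` scanlines per parallel region instead of one,
    // amortizing the OpenMP fork/join overhead and improving locality.
    for ( int y = 0; y < rows; y += chunkRows )
    {
        const int end = std::min(rows, y + chunkRows);
#pragma omp parallel for
        for ( int i = y * width; i < end * width; ++i )
        {
            RGB rgb = {0,0,0};
            HSB hsb;
            rgb.red = (float)buffer[i] / 255.;
            RGBToHSB(rgb, hsb);
            buffer[i] = hsb.brightness * 255;
        }
    }
}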
For reference, my system is described by:
$ sudo likwid-topology
-------------------------------------------------------------
CPU type: Intel Core SandyBridge processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets: 1
Cores per socket: 4
Threads per core: 1
-------------------------------------------------------------
HWThread Thread Core Socket
0 0 0 0
1 0 1 0
2 0 2 0
3 0 3 0
-------------------------------------------------------------
Socket 0: ( 0 1 2 3 )
-------------------------------------------------------------
*************************************************************
Cache Topology
*************************************************************
Level: 1
Size: 32 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
-------------------------------------------------------------
Level: 2
Size: 256 kB
Cache groups: ( 0 ) ( 1 ) ( 2 ) ( 3 )
-------------------------------------------------------------
Level: 3
Size: 6 MB
Cache groups: ( 0 1 2 3 )
-------------------------------------------------------------
*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 1
-------------------------------------------------------------
Domain 0:
Processors: 0 1 2 3
Relative distance to nodes: 10
Memory: 122.332 MB free of total 5898.17 MB
-------------------------------------------------------------
You can't really define a 'right size' for buffering. My answer would be to set it as big as reasonably possible. I would say somewhere between 10MB and 100MB, but you can set it higher if you can afford it, or lower if you are short on RAM.
If you are reading a file and writing to a file (same or another), you should consider using memory mapped files. This way you get rid of the buffering (managed by the OS), and you can call your function once for the whole image. Note that this is probably not a good idea on a 32-bit system if your image is bigger than 4GB.
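If you do want to derive the chunk size from the actual cache hierarchy at runtime, a minimal sketch for Linux/glibc follows. The _SC_LEVEL3_CACHE_SIZE query is glibc-specific and may return 0 or -1 on systems that don't expose it, and the "half of L3" target is an assumption, not a measured optimum.

#include <unistd.h>
#include <cstdio>

int main()
{
    // glibc-specific: ask the OS for the L3 cache size in bytes.
    const long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);
    const long scanline = 65536;  // bytes per image row, from the question

    long rows = 16;  // fallback: the empirically found magic number
    if (l3 > 0)
    {
        // Heuristic: target half of the shared L3, leaving headroom
        // for other data; clamp to at least one row.
        rows = (l3 / 2) / scanline;
        if (rows < 1) rows = 1;
    }
    std::printf("L3 = %ld bytes -> process %ld rows per chunk\n", l3, rows);
    return 0;
}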

How to generate an XML file for face detection

I am creating my own classifier for face detection, and now I want to train it. When I run the command 'opencv_haartraining -data facehaar -vec vecfile.vec -bg negatives.txt -npos 3 -nneg 5 -nstages 30 -w 30 -h 32', it shows the output below. What is this error? I don't understand it. Could anyone help me?
Data dir name: facehaar
Vec file name: vecfile.vec
BG file name: negatives.txt, is a vecfile: no
Num pos: 3
Num neg: 5
Num stages: 30
Num splits: 1 (stump as weak classifier)
Mem: 200 MB
Symmetric: TRUE
Min hit rate: 0.995000
Max false alarm rate: 0.500000
Weight trimming: 0.950000
Equal weights: FALSE
Mode: BASIC
Width: 30
Height: 32
Applied boosting algorithm: GAB
Error (valid only for Discrete and Real AdaBoost): misclass
Max number of splits in tree cascade: 0
Min number of positive samples per cluster: 500
Required leaf false alarm rate: 9.31323e-10
Tree Classifier
Stage
+---+
| 0|
+---+
Number of features used : 234720
Parent node: NULL
*** 1 cluster ***
POS: 3 3 1.000000
Invalid background description file.
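For context on that last line: the -bg file (negatives.txt here) is expected to be a plain text file listing one negative image path per line, with paths resolvable from the directory where the tool is run. A hypothetical example:

negatives/img001.jpg
negatives/img002.jpg
negatives/img003.jpg

"Invalid background description file" is the error opencv_haartraining reports when it cannot read that file or the images it lists.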

(Re)loading the R Tree with spatialindex library

I am bulk-loading an R-tree with the spatialindex (http://libspatialindex.github.com/) library:
string baseName = "streets";
size_t capacity = 10 * 1024 * 1024;
bool bWriteThrough = false;
indexIdentifier = 0;

IStorageManager* disk = StorageManager::createNewDiskStorageManager(baseName, 512);
fileInMem = StorageManager::createNewRandomEvictionsBuffer(*disk, capacity, bWriteThrough);

// bulk-loads my tree
bulkLoadRTree();

cout << "tree info:" << endl;
cout << *tree << endl;

delete disk;
The following info about the built tree is printed:
Dimension: 2
Fill factor: 0.7
Index capacity: 100
Leaf capacity: 100
Tight MBRs: enabled
Near minimum overlap factor: 32
Reinsert factor: 0.3
Split distribution factor: 0.4
Utilization: 69%
Reads: 1
Writes: 35980
Hits: 0
Misses: 0
Tree height: 4
Number of data: 2482376
Number of nodes: 35979
Level 0 pages: 35463
Level 1 pages: 507
Level 2 pages: 8
Level 3 pages: 1
Splits: 0
Adjustments: 0
Query results: 0
Now I am trying to load what I have saved on disk:
IStorageManager* ldisk = StorageManager::loadDiskStorageManager(baseName);
SpatialIndex::StorageManager::IBuffer* fileLoadBuffer = StorageManager::createNewRandomEvictionsBuffer(*ldisk, capacity, bWriteThrough);

id_type id = 1;
tree = RTree::loadRTree(*fileLoadBuffer, id);
cout << *tree << endl;
but the tree has only one node (its output is):
Dimension: 2
Fill factor: 0.7
Index capacity: 100
Leaf capacity: 100
Tight MBRs: enabled
Near minimum overlap factor: 32
Reinsert factor: 0.3
Split distribution factor: 0.4
Utilization: 0%
Reads: 0
Writes: 0
Hits: 0
Misses: 0
Tree height: 1
Number of data: 0
Number of nodes: 1
Level 0 pages: 1
Splits: 0
Adjustments: 0
Query results: 0
What am I doing wrong? Why isn't the whole tree loaded from disk?
Did you maybe not sync your changes to disk?
Also, usually one would keep the tree on disk and not read it completely on first access, so at this point it cannot report accurate statistics.
Or maybe your bulkLoadRTree does not use fileInMem.
You have to delete fileInMem so that its buffered pages are flushed back down to disk before delete disk runs. This line needs to be added before delete disk:
delete fileInMem;
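Putting it together, the end of the bulk-loading snippet would then look something like this (assuming tree was heap-allocated by bulkLoadRTree; the deletion order matters):

bulkLoadRTree();
cout << "tree info:" << endl;
cout << *tree << endl;

delete tree;       // release the index first, while its storage is still alive
delete fileInMem;  // flush buffered pages down to `disk`
delete disk;       // finally close the disk storage manager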