A* pathfinding slow - c++

I am currently working on an A* search algorithm. The algorithm just solves text-file mazes. I know that A* is supposed to be very quick at finding the finish, but mine takes about 6 seconds to find the path in a 20x20 maze with no walls. It does find the finish with the correct path; it just takes forever to do so.
If I knew which part of code was the problem I would just post that but I really have no idea what is going wrong. So here is the algorithm that I use...
while(!openList.empty()) {
visitedList.push_back(openList[index]);
openList.erase(openList.begin() + index);
if(currentCell->x_coor == goalCell->x_coor && currentCell->y_coor == goalCell->y_coor)
{
FindBestPath(currentCell);
break;
}
if(map[currentCell->x_coor+1][currentCell->y_coor] != wall)
{
openList.push_back(new SearchCell(currentCell->x_coor+1,currentCell->y_coor,currentCell));
}
if(map[currentCell->x_coor-1][currentCell->y_coor] != wall)
{
openList.push_back(new SearchCell(currentCell->x_coor-1,currentCell->y_coor,currentCell));
}
if(map[currentCell->x_coor][currentCell->y_coor+1] != wall)
{
openList.push_back(new SearchCell(currentCell->x_coor,currentCell->y_coor+1,currentCell));
}
if(map[currentCell->x_coor][currentCell->y_coor-1] != wall)
{
openList.push_back(new SearchCell(currentCell->x_coor,currentCell->y_coor-1,currentCell));
}
for(int i=0;i<openList.size();i++) {
openList[i]->G = openList[i]->parent->G + 1;
openList[i]->H = openList[i]->ManHattenDistance(goalCell);
}
float bestF = 999999;
index = -1;
for(int i=0;i<openList.size();i++) {
if(openList[i]->GetF() < bestF) {
for(int n=0;n<visitedList.size();n++) {
if(CheckVisited(openList[i])) {
bestF = openList[i]->GetF();
index = i;
}
}
}
}
if(index >= 0) {
currentCell = openList[index];
}
}
I know this code is messy and not the most efficient way to do things, but I think it should still be faster than it is. Any help would be greatly appreciated.
Thanks.

Your 20x20 maze has no walls, and therefore many, many routes which are all the same length. I'd estimate trillions of equivalent routes, in fact. It doesn't seem so bad when you take that into account.
Of course, since your heuristic looks perfect, you should get a big benefit from excluding routes that are heuristically predicted to be precisely as long as the best route known so far. (This is safe if your heuristic is admissible, i.e. it never overestimates the remaining distance.)

Here is a big hint.
If ever you find two paths to the same cell, you can always throw away the longer one. If there is a tie, you can throw away the second one to get there.
If you implement that, with no other optimizations, the search would become more than acceptably fast.
Secondly, the A* algorithm should only bother backtracking if the length to the current cell plus its heuristic exceeds the length plus heuristic of some other open node. If you implement that, then it should find a path directly and stop. To facilitate that, you need to store paths in a priority queue (typically implemented with a heap), not a vector.
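To make the first point concrete, here is a minimal sketch of that bookkeeping, assuming cells are identified by their (x, y) coordinates; cellKey, bestG and worthExpanding are invented names, not anything from your code:
#include <cstdint>
#include <unordered_map>

// Pack a cell's coordinates into one key (assumes coordinates fit in 16 bits).
inline std::uint32_t cellKey(int x, int y) { return (std::uint32_t(x) << 16) | std::uint32_t(y); }

std::unordered_map<std::uint32_t, int> bestG;    // cheapest known cost to reach each cell

// Call before pushing a neighbour onto the open list.
// Returns false when an equal-or-shorter path to this cell already exists.
bool worthExpanding(int x, int y, int g)
{
    auto it = bestG.find(cellKey(x, y));
    if (it != bestG.end() && it->second <= g)
        return false;                            // throw the longer (or tied) path away
    bestG[cellKey(x, y)] = g;                    // remember the new best cost
    return true;
}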

openList.erase is O(n), and the for-loop beginning with for(int i=0;i<openList.size();i++) is O(n^2) due to the call to CheckVisited - these are called every iteration, making your overall algorithm O(n^3). A* should be O(n log n).
Try changing openList to a priority-queue like it's supposed to be, and visitedList to a hash table. The entire for loop can then be replaced by a dequeue - make sure you check if visitedList.Contains(node) before enqueuing!
Also, there is no need to recalculate the ManHattenDistance for every node every iteration, since it never changes.
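A rough sketch of that data-structure swap (the SearchCell below is a cut-down stand-in for your class, and mapWidth is an assumed constant):
#include <queue>
#include <unordered_set>
#include <vector>

// Cut-down stand-in for the question's SearchCell: only the fields used below.
struct SearchCell { int x_coor, y_coor; float G, H; SearchCell* parent; };

// Orders the heap so the cell with the smallest F = G + H comes out first.
struct CompareF {
    bool operator()(const SearchCell* a, const SearchCell* b) const {
        return (a->G + a->H) > (b->G + b->H);
    }
};

std::priority_queue<SearchCell*, std::vector<SearchCell*>, CompareF> openList;
std::unordered_set<int> visited;      // closed set, keyed on y * width + x
const int mapWidth = 20;              // assumed maze width

void pushIfNew(SearchCell* cell)
{
    int id = cell->y_coor * mapWidth + cell->x_coor;
    if (visited.insert(id).second)    // skip cells that were already enqueued
        openList.push(cell);          // G and H were set once, when the cell was created
}

// Each iteration then starts with:  SearchCell* currentCell = openList.top(); openList.pop();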

Aren't you constantly backtracking?
The A* algorithm backtracks when the current best solution becomes worse than another previously visited route. In your case, since there are no walls, all routes are good and never die (and, as MSalters correctly pointed out, there are many of them). When you take a step, your route becomes worse than all the others that are one step shorter.
If that is true, this may account for the time taken by your algorithm.

Related

Shortest path for unweighted graph using iterative depth first search algorithm

I have managed to find the shortest path for an unweighted graph using recursive DFS. Here is such an attempt.
void dfsHelper(graph*& g, int start,int end, bool*& visited, int& min, int i) {
visited[start] = true;
i = i + 1;
if (start == end) {
if (i<=min) { min = i; }
}
node* current = g->adj[start];
while (current != NULL) {
if (!visited[current->dest]) {
dfsHelper(g, current->dest,end, visited,min,i);
}
current = current->next;
}
visited[start] = false;
}
However, for an iterative DFS such as this one, how should I approach it?
void dfsItr(graph*& g, int start, int end) {
bool* isVisited = new bool[g->numVertex];
for (int i = 0; i < g->numVertex; ++i) {
isVisited[i] = false;
}
stack<int> st;
isVisited[start] = true;
st.push(start);
while (!st.empty()) {
start = st.top();
cout << start << " ";
st.pop();
node* current = g->adj[start];
while (current != NULL) {
if (!isVisited[current->dest]) {
isVisited[current->dest] = true;
st.push(current->dest);
if (current->dest == end) {
cout << current->dest << endl;
}
}
current = current->next;
}
}
}
Is there any algorithm detailing the procedure to follow? I am well aware of finding the shortest path using the BFS algorithm, as given here or as suggested here. My initial intuition as to why such an idea works for BFS is that the traversal happens layer by layer, and multiple children share the same parent in each layer, so it is easy to backtrack just by following the parent node. In iterative DFS, that is not the case. Can someone shed some light on how to proceed? Is there any proven algorithm to tackle this scenario?
Thanks.
It is not entirely clear to me what you are asking about...
If you are asking how to optimise the iterative implementation of DFS, the one thing I'd do here is not use std::stack but write my own collection that has a LIFO interface but pre-allocates its memory.
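A minimal sketch of such a pre-allocating LIFO (the class and method names are invented for illustration):
#include <cstddef>
#include <vector>

// LIFO wrapper around std::vector that reserves its storage up front,
// so pushes during the search never reallocate.
class PreallocStack {
public:
    explicit PreallocStack(std::size_t capacity) { data_.reserve(capacity); }
    void push(int v)   { data_.push_back(v); }
    int  top() const   { return data_.back(); }
    void pop()         { data_.pop_back(); }
    bool empty() const { return data_.empty(); }
private:
    std::vector<int> data_;
};

// Usage in your function:  PreallocStack st(g->numVertex); st.push(start); ...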
Another place to optimise is to avoid the stream operators, since they are significantly slower than printf; check out this answer's section about performance. Also, does it really make sense to print to STDOUT all the time? If performance is key, this might be done once every couple of iterations, since IO operations are really slow.
If you're asking what algorithm is better than the DFS approach, that is hard to answer since it always depends on the given problem. If you want to find the best path between nodes, go for a BFS-based approach (e.g. Dijkstra's algorithm), since it performs best on unweighted graphs in comparison to DFS (by the way, A* won't do the trick here: with no weights and no fancy heuristic it just collapses to Dijkstra/BFS). If you are more interested in this topic, you can find more info on tricks for optimising path-finding in this book series.
Last but not least, give some heuristics a try too. Maybe there's no need to do an exhaustive search to find a solution to your problem.
Here is an example that illustrates why depth-first search, even with some optimizations, can be a bad idea. (It's almost always a bad idea, but the illustration does not go that far).
Suppose your graph is the complete graph on nodes 0, ..., n, that is, the graph containing every possible edge. Suppose further that the edges always appear in order in the data structure, and you want to find the shortest path from 0 to n.
Naive depth-first search will explore (n-1)! paths before it finds the optimal path. Breadth-first search explores n paths. Both cases are essentially worst cases (edit: worst-case orderings for this graph) for their respective algorithms.
You could optimize depth-first search in a couple of ways:
1) Prune the search if the current path is one hop shorter than the best successful path so far, and is not a successful path.
2) More aggressively, each time you visit a node for the first time, store the length of the current path in the node. Each later time you visit, compare the length of the current path with the previously stored length. If the new path is shorter, store the new length. Otherwise, prune the search.
Of these two, (2) is the more aggressive optimization. Even so, it is still strictly worse than breadth-first search. In breadth-first search, every time you pass through a node it is because you reached it by a shortest path, and at that point the node becomes a dead end for all further path traversals. Neither of these things is the case for depth-first search. Additionally, the (asymptotic) memory cost of storing the lengths is no better than that of using a breadth-first queue.
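For what it's worth, here is a sketch of optimization (2) against the graph/node layout from your question (treat it as an illustration of the pruning, not a drop-in replacement):
#include <climits>
#include <vector>

// Assumes your types:  struct node { int dest; node* next; };  struct graph { int numVertex; node** adj; };
// Remember the shortest length found so far to each node and prune any branch
// that reaches a node no more cheaply than before.
void dfsPruned(graph*& g, int start, int end, std::vector<int>& bestLen, int len, int& answer) {
    if (len >= bestLen[start]) return;        // already got here at least as cheaply: prune
    bestLen[start] = len;
    if (start == end) { answer = len; return; }
    for (node* current = g->adj[start]; current != nullptr; current = current->next)
        dfsPruned(g, current->dest, end, bestLen, len + 1, answer);
}

// Usage:
//   std::vector<int> bestLen(g->numVertex, INT_MAX);
//   int answer = INT_MAX;
//   dfsPruned(g, start, end, bestLen, 0, answer);    // answer holds the shortest length, if reachable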

Segmentation fault in recursive function when using smart pointers

I get a segmentation fault in the call to
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
after a few recursive calls. The strange thing is that it always happens at the same point every time. Can anyone spot the problem?
This is an implementation for a dynamic programming problem and here I'm accumulating the costs of a path. I have simplified the cost function but in this example the problem still occurs.
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
n->cost= 1 + n->prev->cost;
//Check if we reached the last column(done!)
if (n->x==current_edges.cols-1)
{
//Save the info in the last node if it's the cheapest path
if (last_node->cost > n->cost)
{
last_node->cost=n->cost;
last_node->prev=n;
}
}
else
{
//Check for neighboring pixels to see if they are edges, launch dp with all the ones that are
for (int i=0;i<2;i++)
{
for (int j=-1;j<2;j++)
{
if (i==0 && j==0) continue;
if (n->x+i >= current_edges.cols || n->x+i < 0 ||
n->y+j >= current_edges.rows || n->y+j < 0) continue;
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
dp(n1);
}
}
}
}
}
class Node
{
public:
Node(){}
Node(std::shared_ptr<Node> p,int x_,int y_){prev=p;x=x_;y=y_;lost=0;}
Node(Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
std::shared_ptr<Node> prev; //Previous and next nodes
int cost; //Total cost until now
int lost; //Number of steps taken without a clear path
int x,y;
Node& operator=(const Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
Node& operator=(Node &&n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;n1.prev=nullptr;}//next=n1.next;n1.next.clear();}
};
Your code looks like a pathological path search, in that it checks almost every path and doesn't keep track of cells it has already checked that you can reach in more than one way.
This will build up recursive depth equal to the length of the longest path, then the next-longest path, and so on down to the shortest one. I.e., something like O(# of pixels) depth.
This is bad, and since call stack depth is limited, it will crash.
The easy solution is to modify dp into dp_internal, and have dp_internal return a vector of nodes to process next. Then write dp, which calls dp_internal and repeats on its return value.
std::vector<std::shared_ptr<Node>>
HorizonLineDetector::dp_internal(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> retval;
...
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
retval.push_back(n1);
}
...
return retval;
}
then dp becomes:
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> nodes={n};
while (!nodes.empty()) {
auto node = nodes.back();
nodes.pop_back();
auto new_nodes = dp_internal(node);
nodes.insert(nodes.end(), new_nodes.begin(), new_nodes.end());
}
}
but (A) this will probably just crash when the number of queued-up nodes gets ridiculously large, and (B) it just patches over the recursion-causes-crash problem; it doesn't make your algorithm suck less.
Use A*.
This involves keeping track of which nodes you have visited and what nodes to process next with their current path cost.
You then use heuristics to figure out which of the ones to process next you should check first. If you are on a grid of some sort, the heuristic is to use the shortest possible distance if nothing was in the way.
Add the cost to get to the node-to-process, plus the heuristic distance from that node to the destination. Find the node-to-process that has the least total. Process that one: you mark it as visited, and add all of its adjacent nodes to the list of nodes to process.
Never add a node to the list of nodes to process that you have already visited (as that is redundant work).
Once you have a solution, prune the list of nodes to process against any node whose current path value is greater than or equal to your solution. If you know your heuristic is a strong one (that it is impossible to get to the destination faster), you can even prune based off of the total of heuristic and current cost. Similarly, don't add to the list of nodes to process if it would be pruned by this paragraph.
The result is that your algorithm searches in a relatively straight line towards the target, and then expands outwards trying to find a way around any barriers. If there is a relatively direct route, it is used and the rest of the universe isn't even touched.
There are many optimizations on A* you can do, and even alternative solutions that don't rely on heuristics. But start with A*.
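For illustration, the pruning step described above could look roughly like this (OpenNode and its fields are placeholders, not types from your code):
#include <algorithm>
#include <vector>

struct OpenNode { int cost; int heuristic; /* position, parent, ... */ };

// Once a complete path of cost bestSolution is known, drop every queued node that
// can no longer beat it (assumes the heuristic never overestimates).
void pruneOpenList(std::vector<OpenNode>& open, int bestSolution) {
    open.erase(std::remove_if(open.begin(), open.end(),
                              [bestSolution](const OpenNode& n) {
                                  return n.cost + n.heuristic >= bestSolution;
                              }),
               open.end());
}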

QHashIterator in c++

I developed a game in C++, and want to make sure everything is properly done.
Is it a good solution to use a QHashIterator to check which item in the list has the lowest value (F-cost for pathfinding)?
Snippet from my code:
while(!pathFound){ //do while path is found
QHashIterator<int, PathFinding*> iterator(openList);
PathFinding* parent;
iterator.next();
parent = iterator.value();
while(iterator.hasNext()){ //we take the next tile, and we take the one with the lowest value
iterator.next();
//checking lowest f value
if((iterator.value()->getGcost() + iterator.value()->getHcost()) < (parent->getGcost() + parent->getHcost())){
parent = iterator.value();
}
}
if(!atDestionation(parent,endPoint)){ //here we check if we are at the destionation. if we are we return our pathcost.
clearLists(parent);
filllists(parent,endPoint);
}else{
pathFound = true;
while(parent->hasParent()){
mylist.append(parent);
parent = parent->getParent();
}
pathcost = calculatePathCost(mylist); //we calculate what the pathcost is and return it
}
}
If not, are there better improvements?
I also found something about std::priority_queue. Is this much better than a QHashIterator?
It's maybe not a problem with game worlds that are not big, but I'm looking for a suitable solution for when the game worlds are big (like 10000+ calculations). Any remarks?
Here you basically scan the whole map to find the element that is the minimum one according to some values:
while(iterator.hasNext()){ //we take the next tile, and we take the one with the lowest value
iterator.next();
//checking lowest f value
if((iterator.value()->getGcost() + iterator.value()->getHcost()) < (parent->getGcost() + parent->getHcost())){
parent = iterator.value();
}
}
All this code, if you had an STL container, for instance a std::map, could be reduced to:
auto parent = std::min_element(openList.begin(), openList.end(),
    [](const auto& lhs, const auto& rhs) {
        return (lhs.second->getGcost() + lhs.second->getHcost()) < (rhs.second->getGcost() + rhs.second->getHcost());
    });
Once you have something easier to understand you can play around with different containers; for instance, it might be faster to hold a sorted vector in this case.
Your code does not present any obvious problems per se; often performance gains are not won by optimizing little loops, it's more about how your code is organized. For instance, I see that you have a lot of indirection, and that costs a lot in cache misses. Or, if you always have to find the minimum element, you could cache it in another structure and have it available in constant time, all the time.
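To answer the std::priority_queue part of your question directly: yes, a heap avoids the full scan entirely. A rough sketch, assuming a PathFinding interface like the one in your code (the stand-in struct below only contains what the comparator needs):
#include <queue>
#include <vector>

// Minimal stand-in for the question's PathFinding class.
struct PathFinding {
    int g = 0, h = 0;
    int getGcost() const { return g; }
    int getHcost() const { return h; }
};

// Makes the priority_queue a min-heap on F = G + H.
struct HigherF {
    bool operator()(const PathFinding* lhs, const PathFinding* rhs) const {
        return (lhs->getGcost() + lhs->getHcost()) > (rhs->getGcost() + rhs->getHcost());
    }
};

std::priority_queue<PathFinding*, std::vector<PathFinding*>, HigherF> openList;

// openList.push(tile);                  // O(log n) insert
// PathFinding* parent = openList.top(); // cheapest tile, O(1), replaces the whole scan
// openList.pop();                       // O(log n) removal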

Is there any way of optimising this function?

This piece of code seems to be the worst offender in terms of time in my program. What my program is trying to do is find the minimum number of individual "nodes" required to satisfy a network with two constraints:
Each node must connect to x number of other nodes
Each node must have y degrees of separation between it and each of the nodes it's connected to.
However, for values of x greater than 600 this task takes a very long time. The task is on the order of exponential anyway, so I expect it to take forever at some point, but that also means that if any small changes could be made here it would speed up the entire program by a lot.
uniint = unsigned long long int (64-bit)
network is a vector of the form vector<vector<uniint>>
The piece of code:
/* Checks if id2 is in id1's list of connections */
inline bool CheckIfInList (uniint id1, uniint id2)
{
uniint id1size = network[id1].size();
for (uniint itr = 0; itr < id1size; ++itr)
{
if (network[id1][itr] == id2)
{
return true;
}
}
return false;
}
The only way is to sort the network[id1] array when you build it.
If you arrive here with a sorted array you can easily find, if it exists, what you are looking for using a binary (dichotomic) search.
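A sketch of that, assuming each network[id1] is sorted once after the network is built (same typedef as in your question):
#include <algorithm>
#include <vector>

typedef unsigned long long int uniint;
extern std::vector<std::vector<uniint> > network;   // filled and sorted elsewhere

// Requires each network[id1] to be sorted, e.g. std::sort(network[id1].begin(), network[id1].end());
inline bool CheckIfInList(uniint id1, uniint id2)
{
    const std::vector<uniint>& row = network[id1];
    return std::binary_search(row.begin(), row.end(), id2);   // O(log n) instead of O(n)
}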
Use std::map or std::unordered_map for fast search. I guess it's impossible to micro-optimize this code; std::vector is cool, but not for searching through 600 elements.
I'm guessing CheckIfInList() is called in a loop? Perhaps a vector is not the best choice; you could try vector<set<uniint>>. This will give you O(log n) for a lookup in the inner collection instead of O(n).
For a quick micro-optimization, check whether your compiler optimizes the multiple calls to network[id1] away. If not, that is where you lose a lot of time, so remember the address:
vector<uniint>& connectedNodes = network[id1];
uniint id1size = connectedNodes.size();
for (uniint itr = 0; itr < id1size; ++itr)
{
if (connectedNodes[itr] == id2)
{
return true;
}
}
return false;
If your compiler already took care of that, I'm afraid that there's not much you can micro optimize about this method. The only real optimization can be achieved on the algorithmic level, starting with sorting the neighbour lists, moving on to using unordered_map<> instead of vector<>, and ending with asking yourself whether you can't somehow reduce the number of calls to CheckIfInList().
This is not as effective as HAL9000's suggestion, and is good for cases when you have an unsorted list/array. What you can do is ask fewer questions in each iteration if you put the value you are looking for at the end of the vector.
uniint id1size = network[id1].size();
network[id1].push_back(id2);              // append id2 as a sentinel
uniint itr = 0;
while (network[id1][itr] != id2) ++itr;   // guaranteed to stop at the sentinel at the latest
network[id1].pop_back();                  // remove the sentinel again
// if itr != id1size return true, else false...
You need to check whether the element you stopped at is a real member of the vector or just the sentinel you appended. This way you don't need to ask on each iteration whether you have reached the end of the list.

Optimizing C++ Tree Generation

I'm generating a Tic-Tac-Toe game tree (it takes 9 seconds after the first move), and I'm told it should take only a few milliseconds. So I'm trying to optimize it. I ran it through CodeAnalyst and these are the top 5 calls being made (I used bitsets to represent the Tic-Tac-Toe board):
std::_Iterator_base::_Orphan_me
std::bitset<9>::test
std::_Iterator_base::_Adopt
std::bitset<9>::reference::operator bool
std::_Iterator_base::~_Iterator_base
void BuildTreeToDepth(Node &nNode, const int& nextPlayer, int depth)
{
if (depth > 0)
{
//Calculate gameboard states
int evalBoard = nNode.m_board.CalculateBoardState();
bool isFinished = nNode.m_board.isFinished();
if (isFinished || (nNode.m_board.isWinner() > 0))
{
nNode.m_winCount = evalBoard;
}
else
{
Ticboard tBoard = nNode.m_board;
do
{
int validMove = tBoard.FirstValidMove();
if (validMove != -1)
{
Node f;
Ticboard tempBoard = nNode.m_board;
tempBoard.Move(validMove, nextPlayer);
tBoard.Move(validMove, nextPlayer);
f.m_board = tempBoard;
f.m_winCount = 0;
f.m_Move = validMove;
int currPlay = (nextPlayer == 1 ? 2 : 1);
BuildTreeToDepth(f,currPlay, depth - 1);
nNode.m_winCount += f.m_board.CalculateBoardState();
nNode.m_branches.push_back(f);
}
else
{
break;
}
}while(true);
}
}
}
Where should I be looking to optimize it? How should I optimize these 5 calls (I don't recognize them)?
The tic-tac-toe game tree is very redundant. Eliminating rotated and mirrored boards will reduce the final ply of the game tree by 3 or 4 orders of magnitude. No amount of optimizations will make bubblesort as fast as introsort.
struct Game_board;
struct Node
{
Game_board game_board;
Node* parent;
std::vector<Node*> children;
enum { X_Win, Y_Win, Draw, Playing } outcome;
};
// returns the same hash value for all "identical" boards.
// ie boards that can be rotated or mirrored to look the
// same will have the same hash value
int hash( const Game_board& game_board );
// uses hash() function to generate hashes from Node*
struct Hash_functor;
// nodes yet to be explored.
std::hash_set<Node*,Hash_functor> open;
//nodes already explored.
std::hash_set<Node*,Hash_functor> closed;
while( ! open.empty() )
{
Node* node_to_explore = get_a_node( open );
assert( node_to_explore not in closed or open sets )
if( node_to_explore is win lose or draw )
{
Mark node as win lose or draw
add node to closed set
}
loop through all children of node_to_explore
{
if( child in closed )
{
add node from closed set to children list of node_to_explore
}
else if( child in open )
{
add node from open set to children list of node_to_explore
}
else
{
add child to open set
add child to children list of node_to_explore
}
}
}
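A sketch of what that hash() could look like if the board were stored as a plain array of 9 cells (an assumption; your code uses a bitset): encode the board in base 3 under all 8 rotations/reflections and keep the smallest encoding, so every symmetric board maps to the same value.
#include <array>

// 0 = empty, 1 = X, 2 = O; cells indexed row-major 0..8.
typedef std::array<int, 9> Board;

// The 8 symmetries of the square, each as a permutation of cell indices.
const int kSymmetries[8][9] = {
    {0,1,2,3,4,5,6,7,8},  // identity
    {6,3,0,7,4,1,8,5,2},  // rotate 90
    {8,7,6,5,4,3,2,1,0},  // rotate 180
    {2,5,8,1,4,7,0,3,6},  // rotate 270
    {2,1,0,5,4,3,8,7,6},  // mirror left-right
    {6,7,8,3,4,5,0,1,2},  // mirror top-bottom
    {0,3,6,1,4,7,2,5,8},  // mirror main diagonal
    {8,5,2,7,4,1,6,3,0},  // mirror anti-diagonal
};

// Returns the same value for all boards that are rotations/mirrors of each other.
int canonicalHash(const Board& b)
{
    int best = -1;
    for (const int* perm : kSymmetries) {
        int code = 0;
        for (int i = 0; i < 9; ++i)
            code = code * 3 + b[perm[i]];   // base-3 encoding of the permuted board
        if (best < 0 || code < best) best = code;
    }
    return best;
}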
Those functions are typically trivial. That means that an optimized ("release") build will typically have them inlined. However, in a debug build they're not. The result is that a debug build is slower, but allows you to set breakpoints on those functions. So, the "milliseconds comment" should be applied to the release build, where you wouldn't even have those functions anymore.
You're getting all wrapped up in data structure.
Don't build the tree, just walk it. Have only one copy of the board. At each node in the search tree, just modify the board, and on the way back out, un-modify it.
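A sketch of that walk-don't-build idea, using a plain int array as the single board instead of your bitset class (so the representation here is an assumption):
#include <algorithm>

int board[9] = {0};   // one shared board: 0 = empty, 1 = X, 2 = O, row-major

bool hasWon(int p) {
    static const int lines[8][3] = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},{1,4,7},{2,5,8},{0,4,8},{2,4,6}};
    for (const int* l : lines)
        if (board[l[0]] == p && board[l[1]] == p && board[l[2]] == p) return true;
    return false;
}

// Negamax over the single shared board: make a move, recurse, then undo it.
// Returns +1 if the side to move wins with best play, 0 for a draw, -1 for a loss.
int walk(int player)
{
    if (hasWon(3 - player)) return -1;        // the previous move already won
    int best = -2;
    bool moved = false;
    for (int i = 0; i < 9; ++i) {
        if (board[i] != 0) continue;
        board[i] = player;                    // modify the board in place
        best = std::max(best, -walk(3 - player));
        board[i] = 0;                         // un-modify it on the way back out
        moved = true;
    }
    return moved ? best : 0;                  // board full: draw
}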
And if you want to know what it's doing, just hit the pause button at random. It will show you why it's in those routines you don't recognize that are taking all the time.
Honestly, and I don't mean this as a slam against you, you're asking us to examine a poorly documented piece of code that is a small part of a larger code base. We don't have the context that would give much information. I personally am also turned off by examining others' code when it doesn't appear that they've done all they can to examine it themselves (and I don't mean this to say I'm annoyed at you or anything, just that my eyes glaze over looking at your code).
I recommend you run your code through a profiler and determine what exactly is taking so much time. Treat this profiling like you're debugging. When you find a module taking a long time, examine that module in small sections (as if you're hunting for a bug) to see why.
This will allow you to ask a much more informed question if you still need to ask something.
You've posted far too little of your code.
You are asking how to optimize the code however you should also be asking how to optimize the algorithm.
There are two things that I immediately see.
As "Michael Dorgan" stated generate the tree of moves once.
How many broads are you generating in your tree? 362880? Your code appears to be generating redundant entries. For example, with an empty board there are actually three moves not nine moves. All other combinations are the board rotated (which are equal). You can reduce the number of boards that needs to be generated and speed up the generation of the tree.
Here are the three first moves (rotate the last two boards to generate the other boards):
| |
|X|
| |
|X|
| |
| |
X| |
| |
| |
Let me add that if your system is taking 9 seconds to do its work, that means that something is being called billions and billions of times more than it should be. If you don't have release-level profiling abilities, place a few global counters in your code and increment them every time the code they are in is called. This will give you a poor man's profile that will work on release builds. If you see a billion calls somewhere you don't expect, you now have a place to look closer.
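Something as simple as this is usually enough (the counter names are invented for illustration):
#include <cstdio>

// Poor man's release-build profiler: bump a counter inside each suspect function,
// then print the totals once at the end of the run.
unsigned long long g_buildTreeCalls = 0;
unsigned long long g_boardCopies    = 0;

// inside BuildTreeToDepth():        ++g_buildTreeCalls;
// wherever the board gets copied:   ++g_boardCopies;

void dumpCounters()
{
    std::printf("BuildTreeToDepth calls: %llu\n", g_buildTreeCalls);
    std::printf("board copies:           %llu\n", g_boardCopies);
}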
In reality, Tic-Tac-Toe's entire move tree should be trivial to store as a hash if you need the speed. 9! moves just isn't that big of a domain (anymore). 362k moves shouldn't break the bank, and that's a brute-force analysis. That domain can be cut way down when you take into consideration all the various symmetries of the data.
Bah, here's how I would do it if I was coding it since people have latched onto my back of the envelope math:
I wouldn't even go the tree route, just some rules and be done.
Turn 1. Go in center.
Turn 2. If center unoccupied, go there, else go in corner
Turn 3. If opponent filled a corner, go in opposite corner, else go in corner - you've won.
Turn 4. Block if needed. If Xs occupy opposite corners, fill an edge. If Xs occupy the center and the opposite corner, fill a corner. If Xs occupy opposite edges, fill a corner and win. If Xs fill adjacent edges, fill the corner between them.
Turn 5. Win if possible. Block if needed. Else go in the corner opposite the adjacent-edge move and win.
Turns 6-9. Win if possible. Block if needed. Else, place randomly towards a draw.