Low Memory Shortest Path Algorithm - c++

I have a global unique path table which can be thought of as a directed un-weighted graph. Each node represents either a piece of physical hardware which is being controlled, or a unique location in the system. The table contains the following for each node:
A unique path ID (int)
Type of component (char - 'A' or 'L')
String which contains a comma separated list of path ID's which that node is connected to (char[])
I need to create a function which given a starting and ending node, finds the shortest path between the two nodes. Normally this is a pretty simple problem, but here is the issue I am having. I have a very limited amount of memory/resources, so I cannot use any dynamic memory allocation (ie a queue/linked list). It would also be nice if it wasn't recursive (but it wouldn't be too big of an issue if it was as the table/graph itself if really small. Currently it has 26 nodes, 8 of which will never be hit. At worst case there would be about 40 nodes total).
I started putting something together, but it doesn't always find the shortest path. The pseudo code is below:
bool shortestPath(int start, int end)
if start == end
if pathTable[start].nodeType == 'A'
Turn on part
end if
return true
else
mark the current node
bool val
for each node in connectedNodes
if node is not marked
val = shortestPath(node.PathID, end)
end if
end for
if val == true
if pathTable[start].nodeType == 'A'
turn on part
end if
return true
end if
end if
return false
end function
Anyone have any ideas how to either fix this code, or know something else that I could use to make it work?
----------------- EDIT -----------------
Taking Aasmund's advice, I looked into implementing a Breadth First Search. Below I have some c# code which I quickly threw together using some pseudo code I found online.
pseudo code found online:
Input: A graph G and a root v of G
procedure BFS(G,v):
create a queue Q
enqueue v onto Q
mark v
while Q is not empty:
t ← Q.dequeue()
if t is what we are looking for:
return t
for all edges e in G.adjacentEdges(t) do
u ← G.adjacentVertex(t,e)
if u is not marked:
mark u
enqueue u onto Q
return none
C# code which I wrote using this code:
public static bool newCheckPath(int source, int dest)
{
Queue<PathRecord> Q = new Queue<PathRecord>();
Q.Enqueue(pathTable[source]);
pathTable[source].markVisited();
while (Q.Count != 0)
{
PathRecord t = Q.Dequeue();
if (t.pathID == pathTable[dest].pathID)
{
return true;
}
else
{
string connectedPaths = pathTable[t.pathID].connectedPathID;
for (int x = 0; x < connectedPaths.Length && connectedPaths != "00"; x = x + 3)
{
int nextNode = Convert.ToInt32(connectedPaths.Substring(x, 2));
PathRecord u = pathTable[nextNode];
if (!u.wasVisited())
{
u.markVisited();
Q.Enqueue(u);
}
}
}
}
return false;
}
This code runs just fine, however, it only tells me if a path exists. That doesn't really work for me. Ideally what I would like to do is in the block "if (t.pathID == pathTable[dest].pathID)" I would like to have either a list or a way to see what nodes I had to pass through to get from the source and destination, such that I can process those nodes there, rather than returning a list to process elsewhere. Any ideas on how i could make that change?

The most effective solution, if you're willing to use static memory allocation (or automatic, as I seem to recall that the C++ term is), is to declare a fixed-size int array (of size 41, if you're absolutely certain that the number of nodes will never exceed 40). By using two indices to indicate the start and end of the queue, you can use this array as a ring buffer, which can act as the queue in a breadth-first search.
Alternatively: Since the number of nodes is so small, Bellman-Ford may be fast enough. The algorithm is simple to implement, does not use recursion, and the required extra memory is only a distance (int, or even byte in your case) and a predecessor id (int) per node. The running time is O(VE), alternatively O(V^3), where V is the number of nodes and E is the number of edges.

Related

Fastest ways to check if a value already exists in a stl container

I am holding a very big list of memory addresses (around 400.000) and need to check if a certain address already exists in it 400.000 times a second.
A code example to illustrate my setup:
std::set<uintptr_t> existingAddresses; // this one contains 400.000 entries
while (true) {
// a new list with possible new addresses
std::set<uintptr_t> newAddresses; // also contains about ~400.000 entries
// in my own code, these represent a new address list
for (auto newAddress : newAddresses) {
// already processed this address, skip it
if (existingAddresses.find(newAddress) != existingAddresses.end()) {
continue;
}
// we didn't have this address yet, so process it.
SomeHeavyTask(newAddress);
// so we don't process it again
existingAddresses.emplace(newAddress);
}
Sleep(1000);
}
This is the first implementation I came up with and I think it can be greatly improved.
Next I came up with using some custom indexing strategy, also used in databases. The idea is to take a part of the value and use that to index it in its own group set. If I would take for example the last two numbers of the address I would have 16^2 = 256 groups to put addresses in.
So I would end up with a map like this:
[FF] -> all address ending with `FF`
[EF] -> all addresses ending with `EF`
[00] -> all addresses ending with `00`
// etc...
With this I will only need to do a lookup on ~360 entries in the corresponding set. Resulting in ~360 lookups being done 400.000 times a second. Much better!
I am wondering if there are any other tricks or better ways to do this? My goal is to make this address lookup as FAST as possible.
std::set<uintptr_t> uses a balanced tree, so look-up time is O(log N).
std::unordered_set<uintptr_t>, on the other hand, is hash-based, with lookup time of O(1).
Although this is only an asymptotic complexity measure, meaning that there is no guaranteed improvement due to constant factors involved, the difference may prove significant when the collection contains 400,000 elements.
You may use algorithm similar to merge:
std::set<uintptr_t> existingAddresses; // this one contains 400.000 entries
while (true) {
// a new list with possible new addresses
std::set<uintptr_t> newAddresses; // also contains about ~400.000 entries
auto existing_it = existingAddresses.begin();
auto new_it = newAddresses.begin();
while (new_it != newAddresses.end() && existing_it != existingAddresses.end()) {
if (*new_it < *existing_it) {
// we didn't have this address yet, so process it.
SomeHeavyTask(*new_it);
// so we don't process it again
existingAddresses.insert(existing_it, *new_it);
++new_it;
} else if (*existing_it < *new_it) {
++existing_it;
} else { // Both equal
++existing_it;
++new_it;
}
}
for (new_it != newAddresses.end())
// we didn't have this address yet, so process it.
SomeHeavyTask(*new_it);
// so we don't process it again
existingAddresses.insert(existingAddresses.end(), *new_it);
++new_it;
}
Sleep(1000);
}
Complexity is now linear: O(N + M) instead of O(N log M) (with N number of new addresses, and M count for old ones).

Segmentation fault in recursive function when using smart pointers

I get a segmentation fault in the call to
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
after a few recursive calls. Strange thing is that it's always at the same point in time. Can anyone spot the problem?
This is an implementation for a dynamic programming problem and here I'm accumulating the costs of a path. I have simplified the cost function but in this example the problem still occurs.
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
n->cost= 1 + n->prev->cost;
//Check if we reached the last column(done!)
if (n->x==current_edges.cols-1)
{
//Save the info in the last node if it's the cheapest path
if (last_node->cost > n->cost)
{
last_node->cost=n->cost;
last_node->prev=n;
}
}
else
{
//Check for neighboring pixels to see if they are edges, launch dp with all the ones that are
for (int i=0;i<2;i++)
{
for (int j=-1;j<2;j++)
{
if (i==0 && j==0) continue;
if (n->x+i >= current_edges.cols || n->x+i < 0 ||
n->y+j >= current_edges.rows || n->y+j < 0) continue;
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
dp(n1);
}
}
}
}
}
class Node
{
public:
Node(){}
Node(std::shared_ptr<Node> p,int x_,int y_){prev=p;x=x_;y=y_;lost=0;}
Node(Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
std::shared_ptr<Node> prev; //Previous and next nodes
int cost; //Total cost until now
int lost; //Number of steps taken without a clear path
int x,y;
Node& operator=(const Node &n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;}//next=n1.next;}
Node& operator=(Node &&n1){x=n1.x;y=n1.y;cost=n1.cost;lost=n1.lost;prev=n1.prev;n1.prev=nullptr;}//next=n1.next;n1.next.clear();}
};
Your code looks like a pathological path search, in that it checks almost every path and doesn't keep track of paths it has already checked you can get to more than one way.
This will build recursive depth equal to the length of the longest path, and then the next longest path, and ... down to the shortest one. Ie, something like O(# of pixels) depth.
This is bad. And, as call stack depth is limited, will crash you.
The easy solution is to modify dp into dp_internal, and have dp_internal return a vector of nodes to process next. Then write dp, which calls dp_internal and repeats on its return value.
std::vector<std::shared_ptr<Node>>
HorizonLineDetector::dp_internal(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> retval;
...
if (current_edges.at<char>(n->y+j,n->x+i)!=0)
{
auto n1=std::make_shared<Node>(n,n->x+i,n->y+j);
//n->next.push_back(n1);
nlist.push_back(n1);
retval.push_back(n1);
}
...
return retval;
}
then dp becomes:
void HorizonLineDetector::dp(std::shared_ptr<Node> n)
{
std::vector<std::shared_ptr<Node>> nodes={n};
while (!nodes.empty()) {
auto node = nodes.back();
nodes.pop_back();
auto new_nodes = dp_internal(node);
nodes.insert(nodes.end(), new_nodes.begin(), new_nodes.end());
}
}
but (A) this will probably just crash when the number of queued-up nodes gets ridiculously large, and (B) this just patches over the recursion-causes-crash, doesn't make your algorithm suck less.
Use A*.
This involves keeping track of which nodes you have visited and what nodes to process next with their current path cost.
You then use heuristics to figure out which of the ones to process next you should check first. If you are on a grid of some sort, the heuristic is to use the shortest possible distance if nothing was in the way.
Add the cost to get to the node-to-process, plus the heuristic distance from that node to the destination. Find the node-to-process that has the least total. Process that one: you mark it as visited, and add all of its adjacent nodes to the list of nodes to process.
Never add a node to the list of nodes to process that you have already visited (as that is redundant work).
Once you have a solution, prune the list of nodes to process against any node whose current path value is greater than or equal to your solution. If you know your heuristic is a strong one (that it is impossible to get to the destination faster), you can even prune based off of the total of heuristic and current cost. Similarly, don't add to the list of nodes to process if it would be pruned by this paragraph.
The result is that your algorithm searches in a relatively strait line towards the target, and then expands outwards trying to find a way around any barriers. If there is a relatively direct route, it is used and the rest of the universe isn't even touched.
There are many optimizations on A* you can do, and even alternative solutions that don't rely on heuristics. But start with A*.

Need suggestion to improve speed for word break (dynamic programming)

The problem is: Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "hithere",
dict = ["hi", "there"].
Return true because "hithere" can be segmented as "leet code".
My implementation is as below. This code is ok for normal cases. However, it suffers a lot for input like:
s = "aaaaaaaaaaaaaaaaaaaaaaab", dict = {"aa", "aaaaaa", "aaaaaaaa"}.
I want to memorize the processed substrings, however, I cannot done it right. Any suggestion on how to improve? Thanks a lot!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
for(int i(0); i<len; i++) {
string tmp = s.substr(0, i+1);
if((wordDict.find(tmp)!=wordDict.end())
&& (wordBreak(s.substr(i+1), wordDict)) )
return true;
}
return false;
}
};
It's logically a two-step process. Find all dictionary words within the input, consider the found positions (begin/end pairs), and then see if those words cover the whole input.
So you'd get for your example
aa: {0,2}, {1,3}, {2,4}, ... {20,22}
aaaaaa: {0,6}, {1,7}, ... {16,22}
aaaaaaaa: {0,8}, {1,9} ... {14,22}
This is a graph, with nodes 0-23 and a bunch of edges. But node 23 b is entirely unreachable - no incoming edge. This is now a simple graph theory problem
Finding all places where dictionary words occur is pretty easy, if your dictionary is organized as a trie. But even an std::map is usable, thanks to its equal_range method. You have what appears to be an O(N*N) nested loop for begin and end positions, with O(log N) lookup of each word. But you can quickly determine if s.substr(begin,end) is a still a viable prefix, and what dictionary words remain with that prefix.
Also note that you can build the graph lazily. Staring at begin=0 you find edges {0,2}, {0,6} and {0,8}. (And no others). You can now search nodes 2, 6 and 8. You even have a good algorithm - A* - that suggests you try node 8 first (reachable in just 1 edge). Thus, you'll find nodes {8,10}, {8,14} and {8,16} etc. As you see, you'll never need to build the part of the graph that contains {1,3} as it's simply unreachable.
Using graph theory, it's easy to see why your brute-force method breaks down. You arrive at node 8 (aaaaaaaa.aaaaaaaaaaaaaab) repeatedly, and each time search the subgraph from there on.
A further optimization is to run bidirectional A*. This would give you a very fast solution. At the second half of the first step, you look for edges leading to 23, b. As none exist, you immediately know that node {23} is isolated.
In your code, you are not using dynamic programming because you are not remembering the subproblems that you have already solved.
You can enable this remembering, for example, by storing the results based on the starting position of the string s within the original string, or even based on its length (because anyway the strings you are working with are suffixes of the original string, and therefore its length uniquely identifies it). Then, in the beginning of your wordBreak function, just check whether such length has already been processed and, if it has, do not rerun the computations, just return the stored value. Otherwise, run computations and store the result.
Note also that your approach with unordered_set will not allow you to obtain the fastest solution. The fastest solution that I can think of is O(N^2) by storing all the words in a trie (not in a map!) and following this trie as you walk along the given string. This achieves O(1) per loop iteration not counting the recursion call.
Thanks for all the comments. I changed my previous solution to the implementation below. At this point, I didn't explore to optimize on the dictionary, but those insights are very valuable and are very much appreciated.
For the current implementation, do you think it can be further improved? Thanks!
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict) {
int len = s.size();
if(len<1) return true;
if(wordDict.size()==0) return false;
vector<bool> dq (len+1,false);
dq[0] = true;
for(int i(0); i<len; i++) {// start point
if(dq[i]) {
for(int j(1); j<=len-i; j++) {// length of substring, 1:len
if(!dq[i+j]) {
auto pos = wordDict.find(s.substr(i, j));
dq[i+j] = dq[i+j] || (pos!=wordDict.end());
}
}
}
if(dq[len]) return true;
}
return false;
}
};
Try the following:
class Solution {
public:
bool wordBreak(string s, unordered_set<string>& wordDict)
{
for (auto w : wordDict)
{
auto pos = s.find(w);
if (pos != string::npos)
{
if (wordBreak(s.substr(0, pos), wordDict) &&
wordBreak(s.substr(pos + w.size()), wordDict))
return true;
}
}
return false;
}
};
Essentially one you find a match remove the matching part from the input string and so continue testing on a smaller input.

priority_queue becomes extremely slow in debug mode

I am currently writing an A* pathfinding algorithm for a game and came across a very strange performance problem regarding priority_queue's.
I am using a typical 'open nodes list', where I store found, but yet unprocessed nodes. This is implemented as an STL priority_queue (openList) of pointers to PathNodeRecord objects, which store information about a visited node. They are sorted by the estimated cost to get there (estimatedTotalCost).
Now I noticed that whenever the pathfinding method is called, the respective AI thread gets completely stuck and takes several (~5) seconds to process the algorithm and calculate the path. Subsequently I used the VS2013 profiler to see, why and where it was taking so long.
As it turns out, the pushing to and popping from the open list (the priority_queue) takes up a very large amount of time. I am no expert in STL containers, but I never had problems with their efficiency before and this is just weird to me.
The strange thing is that this only occurs while using VS's 'Debug' build configuration. The 'Release' conf. works fine for me and the times are back to normal.
Am I doing something fundamentally wrong here or why is the priority_queue performing so badly for me? The current situation is unacceptable to me, so if I cannot resolve it soon, I will need to fall back to using a simpler container and inserting it to the right place manually.
Any pointers to why this might be occuring would be very helpful!
.
Here is a snippet of what the profiler shows me:
http://i.stack.imgur.com/gEyD3.jpg
.
Code parts:
Here is the relevant part of the pathfinding algorithm, where it loops the open list until there are no open nodes:
// set up arrays and other variables
PathNodeRecord** records = new PathNodeRecord*[graph->getNodeAmount()]; // holds records for all nodes
std::priority_queue<PathNodeRecord*> openList; // holds records of open nodes, sorted by estimated rest cost (most promising node first)
// null all record pointers
memset(records, NULL, sizeof(PathNodeRecord*) * graph->getNodeAmount());
// set up record for start node and put into open list
PathNodeRecord* startNodeRecord = new PathNodeRecord();
startNodeRecord->node = startNode;
startNodeRecord->connection = NULL;
startNodeRecord->closed = false;
startNodeRecord->costToHere = 0.f;
startNodeRecord->estimatedTotalCost = heuristic->estimate(startNode, goalNode);
records[startNode] = startNodeRecord;
openList.push(startNodeRecord);
// ### pathfind algorithm ###
// declare current node variable
PathNodeRecord* currentNode = NULL;
// loop-process open nodes
while (openList.size() > 0) // while there are open nodes to process
{
// retrieve most promising node and immediately remove from open list
currentNode = openList.top();
openList.pop(); // ### THIS IS, WHERE IT GETS STUCK
// if current node is the goal node, end the search here
if (currentNode->node == goalNode)
break;
// look at connections outgoing from this node
for (auto connection : graph->getConnections(currentNode->node))
{
// get end node
PathNodeRecord* toNodeRecord = records[connection->toNode];
if (toNodeRecord == NULL) // UNVISITED -> path record needs to be created and put into open list
{
// set up path node record
toNodeRecord = new PathNodeRecord();
toNodeRecord->node = connection->toNode;
toNodeRecord->connection = connection;
toNodeRecord->closed = false;
toNodeRecord->costToHere = currentNode->costToHere + connection->cost;
toNodeRecord->estimatedTotalCost = toNodeRecord->costToHere + heuristic->estimate(connection->toNode, goalNode);
// store in record array
records[connection->toNode] = toNodeRecord;
// put into open list for future processing
openList.push(toNodeRecord);
}
else if (!toNodeRecord->closed) // OPEN -> evaluate new cost to here and, if better, update open list entry; otherwise skip
{
float newCostToHere = currentNode->costToHere + connection->cost;
if (newCostToHere < toNodeRecord->costToHere)
{
// update record
toNodeRecord->connection = connection;
toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
toNodeRecord->costToHere = newCostToHere;
}
}
else // CLOSED -> evaluate new cost to here and, if better, put back on open list and reset closed status; otherwise skip
{
float newCostToHere = currentNode->costToHere + connection->cost;
if (newCostToHere < toNodeRecord->costToHere)
{
// update record
toNodeRecord->connection = connection;
toNodeRecord->estimatedTotalCost = newCostToHere + (toNodeRecord->estimatedTotalCost - toNodeRecord->costToHere);
toNodeRecord->costToHere = newCostToHere;
// reset node to open and push into open list
toNodeRecord->closed = false;
openList.push(toNodeRecord); // ### THIS IS, WHERE IT GETS STUCK
}
}
}
// set node to closed
currentNode->closed = true;
}
Here is my PathNodeRecord with the 'less' operator overloading to enable sorting in priority_queue:
namespace AI
{
struct PathNodeRecord
{
Node node;
NodeConnection* connection;
float costToHere;
float estimatedTotalCost;
bool closed;
// overload less operator comparing estimated total cost; used by priority queue
// nodes with a higher estimated total cost are considered "less"
bool operator < (const PathNodeRecord &otherRecord)
{
return this->estimatedTotalCost > otherRecord.estimatedTotalCost;
}
};
}
std::priority_queue<PathNodeRecord*> openList
I think the reason is that you have a priority_queue of pointers to PathNodeRecord.
and there is no ordering defined for the pointers.
try changing it to std::priority_queue<PathNodeRecord> first, if it makes a difference then all you need is passing on your own comparator that knows how to compare pointers to PathNodeRecord, it will just dereference the pointers first and then do the comparison.
EDIT:
taking a wild guess about why did you get an extremely slow execution time, I think the pointers were compared based on their address. and the addresses were allocated starting from one point in memory and going up.
and so that resulted in the extreme case of your heap (the heap as in data structure not the memory part), so your heap was actually a list, (a tree where each node had one children node and so on).
and so you operation took a linear time, again just a guess.
You cannot expect a debug build to be as fast as a release optimized one, but you seems to do a lot of dynamic allocation that may interact badly with the debug runtime.
I suggest you to add _NO_DEBUG_HEAP=1 in the environment setting of the debug property page of your project.

Pointer comparision issue

I'm having a problem with a pointer and can't get around it..
In a HashTable implementation, I have a list of ordered nodes in each bucket.The problem I have It's in the insert function, in the comparision to see if the next node is greater than the current node(in order to inserted in that position if it is) and keep the order.
You might find this hash implementation strange, but I need to be able to do tons of lookups(but sometimes also very few) and count the number of repetitions if It's already inserted (so I need fasts lookups, thus the Hash , I've thought about self-balanced trees as AVL or R-B trees, but I don't know them so I went with the solution I knew how to implement...are they faster for this type of problem?),but I also need to retrieve them by order when I've finished.
Before I had a simple list and I'd retrieve the array, then do a QuickSort, but I think I might be able to improve things by keeping the lists ordered.
What I have to map It's a 27 bit unsigned int(most exactly 3 9 bits numbers, but I convert them to a 27 bit number doing (Sr << 18 | Sg << 9 | Sb) making at the same time their value the hash_value. If you know a good function to map that 27 bit int to an 12-13-14 bit table let me know, I currently just do the typical mod prime solution.
This is my hash_node struct:
class hash_node {
public:
unsigned int hash_value;
int repetitions;
hash_node *next;
hash_node( unsigned int hash_val,
hash_node *nxt);
~hash_node();
};
And this is the source of the problem
void hash_table::insert(unsigned int hash_value) {
unsigned int p = hash_value % tableSize;
if (table[p]!=0) { //The bucket has some elements already
hash_node *pred; //node to keep the last valid position on the list
for (hash_node *aux=table[p]; aux!=0; aux=aux->next) {
pred = aux; //last valid position
if (aux->hash_value == hash_value ) {
//It's already inserted, so we increment it repetition counter
aux->repetitions++;
} else if (hash_value < (aux->next->hash_value) ) { //The problem
//If the next one is greater than the one to insert, we
//create a node in the middle of both.
aux->next = new hash_node(hash_value,aux->next);
colisions++;
numElem++;
}
}//We have arrive to the end od the list without luck, so we insert it after
//the last valid position
ant->next = new hash_node(hash_value,0);
colisions++;
numElem++;
}else { //bucket it's empty, insert it right away.
table[p] = new hash_node(hash_value, 0);
numElem++;
}
}
This is what gdb shows:
Program received signal SIGSEGV, Segmentation fault.
0x08050b4b in hash_table::insert (this=0x806a310, hash_value=3163181) at ht.cc:132
132 } else if (hash_value < (aux->next->hash_value) ) {
Which effectively indicates I'm comparing a memory adress with a value, right?
Hope It was clear. Thanks again!
aux->next->hash_value
There's no check whether "next" is NULL.
aux->next might be NULL at that point? I can't see where you have checked whether aux->next is NULL.