finding an element in a matrix with the minimum cost starting from one point - c++

I have an n*n matrix and I want to find the element with the minimum cost, where cost = ManhattanDistance(startingNode, node) + costOf(node), and the starting node is the node from whose perspective I am searching.
I first did it with 4 nested for loops and it works, but I wanted to optimize it, so I wrote something like a BFS: I use a queue and push the starting node first; then, in a while loop, I pop a node from the queue and push all the elements around it at Manhattan distance 1. I keep doing this while the distance of the node I just popped, plus the minimum price over the whole matrix (which I know from the start), is less than the minimum cost found so far (I compare the cost of the node I just popped against that minimum); if it is bigger I stop searching, because no remaining node can beat the minimum already found. The problem is that this algorithm is too slow, possibly because I use a std::queue? It actually runs slower than the 4-for-loops version. (I also use a flags matrix to check whether an element has already been added to the queue.) The most time-consuming block of code is the part where I expand a node, and I don't know why: I just check that the element is valid, i.e. its coordinates are less than n and at least 0, and if so I push it onto the queue.
I want to know how I can improve this, or whether there is another way to do it. I hope I was explicit enough.
This is the part of the code that takes too long:
if ((p1.dist + 1 + Pmin) < pretmincomp || (p1.dist + 1 + Pmin) < pretres) {
    // offsets of the four neighbours at Manhattan distance 1
    const int da[4] = { 1, -1, 0, 0 };
    const int db[4] = { 0, 0, 1, -1 };
    for (int k = 0; k < 4; ++k) {
        PAIR pa;
        pa.a = p1.a + da[k];
        pa.b = p1.b + db[k];
        pa.dist = p1.dist + 1;
        if (isok(pa.a, pa.b, n) && flags[pa.a][pa.b] != 1) {
            devazut.push(pa);
            flags[pa.a][pa.b] = 1;
        }
    }
}

You are dealing with a shortest path problem, which can be efficiently solved with BFS (if the graph is unweighted) or with the A* algorithm - if you have some "knowledge" of the graph and can estimate how much it will "cost" to reach a target from each node.
Your solution is very similar to BFS, with one difference - BFS also maintains a visited set of all the nodes you have already visited. The idea of this visited set is that you never need to revisit a node, because any path through it will not be shorter than the shortest path found during the first visit of that node.
Note that without the visited set, each node is revisited many times, which makes the algorithm very inefficient.
Pseudo code for BFS (with visited set):
BFS(start):
    q <- new queue
    q.push(pair(start, 0)) // 0 indicates the distance from the start
    visited <- new set
    visited.add(start)
    while (not q.isEmpty()):
        curr <- q.pop()
        if (curr.first is target):
            return curr.second // the distance is stored in the second element
        for each neighbor of curr.first:
            if (not visited.contains(neighbor)): // add the element only if it is not in the set
                q.push(pair(neighbor, curr.second + 1)) // add the new element to the queue
                // and also add it to the visited set, so it won't be re-added to the queue
                visited.add(neighbor)
    // when here - no solution was found
    return infinity // exhausted all vertices and there is no path to a target
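For the grid in the question, a minimal C++ sketch of this BFS could look as follows. This is an illustration, not the asker's actual code: the names cost, Pmin, and minTotalCost are assumptions, and the pruning bound is the one described in the question.

#include <algorithm>
#include <climits>
#include <queue>
#include <vector>

// BFS from (sa, sb) over an n*n matrix `cost`, pruning with the
// smallest cell value Pmin as the question describes.
int minTotalCost(const std::vector<std::vector<int>>& cost, int n, int sa, int sb) {
    struct Node { int a, b, dist; };
    int Pmin = INT_MAX; // smallest cell cost in the whole matrix
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            Pmin = std::min(Pmin, cost[i][j]);

    std::vector<std::vector<char>> visited(n, std::vector<char>(n, 0));
    std::queue<Node> q;
    q.push({sa, sb, 0});
    visited[sa][sb] = 1;

    int best = INT_MAX;
    const int da[4] = {1, -1, 0, 0}, db[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        Node cur = q.front(); q.pop();
        best = std::min(best, cur.dist + cost[cur.a][cur.b]);
        // BFS pops nodes in nondecreasing distance, so every remaining node
        // is at distance >= cur.dist and cannot beat best once this holds:
        if (cur.dist + Pmin >= best) break;
        for (int k = 0; k < 4; ++k) {
            int na = cur.a + da[k], nb = cur.b + db[k];
            if (na >= 0 && na < n && nb >= 0 && nb < n && !visited[na][nb]) {
                visited[na][nb] = 1;
                q.push({na, nb, cur.dist + 1});
            }
        }
    }
    return best;
}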

Related

Order Notation of pop-max in a binary heap

I need to write a pop_max function for a binary heap that removes the max element. The solution given is as below:
void pop_max() {
    assert(!m_heap.empty());
    int tmp = (size() + 1) / 2;              // index of the first leaf
    for (int i = tmp + 1; i < size(); i++) { // scan the remaining leaves for the max
        if (m_heap[tmp] < m_heap[i])
            tmp = i;
    }
    m_heap[tmp] = m_heap.back();             // overwrite the max with the last element
    m_heap.pop_back();
    this->percolate_up(tmp);                 // restore the heap property
}
The solution also says the number of "nodes" visited is n + log(n), where n is the total number of nodes in the heap. It then goes on to say the running time is O(n).
This makes zero sense to me though.
Their solution finds the first leaf node with int tmp = (size()+1)/2; and then goes through the remaining leaf nodes.
Is their solution not n/2 nodes visited, and O(n/2) for the running time as well? Could someone explain why this might be?
Edit: O(n/2) = O(n). But what about the number of nodes visited? I still don't quite understand how it is n + log(n).
O(n/2) is equal to O(n).
"Number of nodes visited" means n for the scan over the leaf nodes plus log(n) for the percolate-up along one root-to-leaf path; since log(n) grows more slowly than n, O(n + log n) is still O(n).

If edges are not inserted in the deque in sorted order of weights, does 0-1 BFS produce the right answer?

The general pattern of 0-1 BFS algorithms is: if the edge encountered has weight 0, the node is pushed to the front of the deque, and if the edge's weight is 1, it is pushed to the back of the deque.
If we push randomly instead, can 0-1 BFS still compute the right answer? What if the entries in the deque end up not sorted by their distances?
This is the general 0-1 BFS algorithm. If I leave out the last if and else parts and push randomly, what will happen?
To me it seems it should still work, so why is the algorithm designed this way?
void bfs(int start)
{
    std::deque<int> Q; // double ended queue
    Q.push_back(start);
    distance[start] = 0;
    while (!Q.empty())
    {
        int v = Q.front();
        Q.pop_front();
        for (int i = 0; i < edges[v].size(); i++)
        {
            // if the distance of v's neighbour from the start node is greater
            // than the distance of v from the start node plus the weight of
            // the edge between v and that neighbour, then improve it
            if (distance[edges[v][i].first] > distance[v] + edges[v][i].second)
            {
                distance[edges[v][i].first] = distance[v] + edges[v][i].second;
                // if the edge weight between v and its neighbour is 0,
                // push the neighbour to the front of the deque,
                // else push it to the back
                if (edges[v][i].second == 0)
                {
                    Q.push_front(edges[v][i].first);
                }
                else
                {
                    Q.push_back(edges[v][i].first);
                }
            }
        }
    }
}
It is all a matter of performance. While random insertion still finds the shortest path, you have to consider a lot more paths (exponential in the size of the graph). So basically, the structured insertion guarantees a linear time complexity. Let's start with why the 0-1 BFS guarantees this complexity.
The basic idea is the same as the one of Dijkstra's algorithm. You visit nodes ordered by their distance from the start node. This ensures that you won't discover an edge that would decrease the distance to a node observed so far (which would require you to compute the entire subgraph again).
In 0-1 BFS, you start with the start node and the distances in the queue are just:
d = [ 0 ]
Then you consider all neighbors. If the edge weight is zero, you push it to the front, if it is one, then to the back. So you get a queue like this:
d = [ 0 0 0 1 1]
Now you take the first node. It may have neighbors over zero-weight edges and neighbors over one-weight edges. So you do the same and end up with a queue like this (new nodes are marked with *):
d = [ 0* 0* 0 0 1 1 1*]
So as you see, the nodes are still ordered by their distance, which is essential. Eventually, you will arrive at this state:
d = [ 1 1 1 1 1 ]
Going from the first node over a zero-weight edge produces a total path length of 1. Going over a one-weight edge results in two. So doing 0-1 BFS, you will get:
d = [ 1* 1* 1 1 1 1 2* 2*]
And so on... So concluding, the procedure is required to make sure that you visit nodes in order of their distance to the start node. If you do this, you will consider every edge only twice (once in the forward direction, once in the backward direction). This is because when visiting a node, you know that you cannot get to the node again with a smaller distance. And you only consider the edges emanating from a node when you visit it. So even if the node is added to the queue again by one of its neighbors, you will not visit it because the resulting distance will not be smaller than the current distance. This guarantees the time complexity of O(E), where E is the number of edges.
So what would happen if you did not visit nodes ordered by their distance from the start node? Actually, the algorithm would still find the shortest path. But it will consider a lot more paths. So assume that you have visited a node and that node is put in the queue again by one of its neighbors. This time, we cannot guarantee that the resulting distance will not be smaller. Thus, we might need to visit it again and put all its neighbors in the queue again. And the same applies to the neighbors, so in the worst case this might propagate through the entire graph and you end up visiting nodes over and over again. You will find a solution eventually because you always decrease the distance. But the time needed is far more than for the smart BFS.
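For reference, here is a self-contained sketch of the same 0-1 BFS, filling in the data structures the question's snippet leaves implicit. The adjacency-list layout (edges[v] = list of {neighbour, weight} pairs with weight 0 or 1) matches the question; the function name and return type are illustrative.

#include <deque>
#include <limits>
#include <utility>
#include <vector>

std::vector<int> zeroOneBfs(const std::vector<std::vector<std::pair<int, int>>>& edges, int start) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> distance(edges.size(), INF);
    std::deque<int> Q;
    distance[start] = 0;
    Q.push_back(start);
    while (!Q.empty()) {
        int v = Q.front();
        Q.pop_front();
        for (const auto& e : edges[v]) {
            int to = e.first, w = e.second; // w is 0 or 1
            if (distance[to] > distance[v] + w) {
                distance[to] = distance[v] + w;
                // a zero-weight edge keeps the node at the current distance
                // level, so it goes to the front; a one-weight edge goes to the back
                if (w == 0) Q.push_front(to);
                else        Q.push_back(to);
            }
        }
    }
    return distance;
}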

Unable to Input the graph

I am solving the problem http://www.spoj.com/problems/SHOP/ in C++, but I am unable to figure out how to input the graph in order to further apply Dijkstra's algorithm to it.
Here is the graph format-
4 3
X 1 S 3
4 2 X 4
X 1 D 2
The first line indicates the columns and rows of the grid; "S" and "D" indicate the source and destination respectively; numbers indicate the time required to pass that block; "X" indicates a no-entry zone.
How do I convert this map into the nodes and edges required by Dijkstra's algorithm?
There is no need to convert. Just imagine that you are at some point (i, j). (I assume that four moves are allowed from each square.) Then you can go to (i + 1, j), (i, j + 1), (i - 1, j), or (i, j - 1) if:
1) that index is inside the table;
2) that index is not marked with X.
So, you give the position of square S to your Dijkstra algorithm, and each time you add the newly allowed squares to your data structure. Once you reach the position of D, you print its distance.
Besides, this problem does not seem weighted to me, so you could use a simple BFS with a queue as well. But if you want to use Dijkstra, and going to different squares has different costs, then you use a priority queue instead of a queue.
For example, you can use a set data structure like this:
const int MAXN = 1000; // illustrative bound on the grid size
int dist[MAXN][MAXN];  // the cost to get to each square;
                       // dist is initialized with a large number

struct node {
    int i, j; // location
    node(int ii, int jj) {
        i = ii;
        j = jj;
    }
    bool operator<(const node& n) const { // std::set uses this to order nodes
        // tie-break on the coordinates so distinct squares with equal cost
        // still compare unequal (this is necessary for a valid ordering)
        if (dist[i][j] == dist[n.i][n.j])
            return i < n.i || (i == n.i && j < n.j);
        return dist[i][j] < dist[n.i][n.j];
    }
};

std::set<node> q;

int main() {
    // initialize dist with a large number, read the grid, locate S and D
    dist[S.i][S.j] = 0; // we start from the source
    q.insert(node(S.i, S.j));
    while (!q.empty()) {
        // the first element of the set has the smallest cost
        node cur = *q.begin();
        q.erase(q.begin());
        // update dist for every allowed neighbour of cur if necessary;
        // for every node you update, remove it from q and insert it again
        // so that its position inside q is refreshed
        // if cur is the square 'D', you are done: print dist[D.i][D.j]
    }
    return 0;
}
There is no need to convert the matrix into nodes and edges.
You can make a structure containing (row number, column number, time), where time represents how long it takes to reach that coordinate from the source. Now build a min-heap of this structure keyed on time. Extract an element from the min-heap (initially the source is in the heap with time 0) and push its adjacent elements into the heap (only those that have not been visited and do not contain an X), then mark the extracted element as visited. Continue like this until the extracted element is the destination.
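A sketch of that idea with std::priority_queue. The function name, the grid representation, and the zero cost assumed for the S and D squares are illustrative assumptions, not part of the original answer.

#include <functional>
#include <queue>
#include <string>
#include <vector>

struct Cell {
    int r, c, time;
    bool operator>(const Cell& o) const { return time > o.time; } // min-heap key
};

int shortestTime(const std::vector<std::string>& grid, int sr, int sc, int dr, int dc) {
    int R = grid.size(), C = grid[0].size();
    std::vector<std::vector<bool>> visited(R, std::vector<bool>(C, false));
    std::priority_queue<Cell, std::vector<Cell>, std::greater<Cell>> heap;
    heap.push({sr, sc, 0});
    const int dRow[4] = {1, -1, 0, 0}, dCol[4] = {0, 0, 1, -1};
    while (!heap.empty()) {
        Cell cur = heap.top(); heap.pop();
        if (visited[cur.r][cur.c]) continue; // stale entry, already settled
        visited[cur.r][cur.c] = true;
        if (cur.r == dr && cur.c == dc) return cur.time; // reached D
        for (int k = 0; k < 4; ++k) {
            int nr = cur.r + dRow[k], nc = cur.c + dCol[k];
            if (nr < 0 || nr >= R || nc < 0 || nc >= C) continue;
            if (grid[nr][nc] == 'X' || visited[nr][nc]) continue;
            int cost = (grid[nr][nc] >= '0' && grid[nr][nc] <= '9')
                           ? grid[nr][nc] - '0' : 0; // S and D assumed free
            heap.push({nr, nc, cur.time + cost});
        }
    }
    return -1; // destination unreachable
}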

How to implement a minimum heap sort to find the kth smallest element?

I've been implementing selection sort problems for class and one of the assignments is to find the kth smallest element in the array using a minimum heap. I know the procedure is:
heapify the array
delete the minimum (root) k times
return kth smallest element in the group
I don't have any problems creating a minimum heap. I'm just not sure how to go about properly deleting the minimum k times and successfully return the kth smallest element in the group. Here's what I have so far:
bool Example::min_heap_select(long k, long & kth_smallest) const {
    // duplicate the test group (thanks, const!)
    Example test = Example(*this);
    // variable declaration and initialization
    int n = test._total;
    int i;
    // heapifying stage (THIS WORKS CORRECTLY)
    for (i = n / 2; i >= 0; i--) {
        // allows for heap construction
        test.percolate_down_protected(i, n);
    }
    // delete-min phase (THIS DOESN'T WORK)
    for (i = n - 1; i >= (n - k + 1); i--) {
        // deletes the min by swapping it with the last element
        int tmp = test._group[0];
        test._group[0] = test._group[i];
        test._group[i] = tmp;
        // resume percolating down
        test.percolate_down_protected(0, i);
    }
    // IDK WHAT TO RETURN
    kth_smallest = test._group[0];
    return true;
}
void Example::percolate_down_protected(long i, long n) {
    // variable declaration and initialization:
    int currPos, child, r_child, tmp;
    currPos = i;
    tmp = _group[i];
    child = left_child(i);
    // set a sentinel and begin loop (no recursion allowed)
    while (child < n) {
        // calculate the right child's position
        r_child = child + 1;
        // pick the smaller of the two children (this is a min-heap)
        if ((r_child < n) && (_group[r_child] < _group[child])) {
            child = r_child;
        }
        // found the correct spot
        if (tmp <= _group[child]) {
            break;
        }
        // move the smaller child up to the parent's position
        _group[currPos] = _group[child];
        // shift down the tree
        currPos = child;
        child = left_child(currPos);
    }
    // put tmp where it belongs
    _group[currPos] = tmp;
}
As I stated before, the minimum heap part works correctly. I understand what I want to do - it seems easy to delete the root k times, but after that, what index in the array do I return... 0? This almost works - it doesn't work with k = n or k = 1. Any help would be much appreciated!
The only array index which is meaningful to the user is zero, which is the minimum element. So, after removing k elements, the k'th smallest element will be at zero.
Probably you should destroy the heap and return the value rather than asking the user to concern themself with the heap itself… but I don't know the details of the assignment.
Note that the C++ Standard Library has algorithms to help with this: make_heap, pop_heap, and nth_element.
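For instance, both of the following find the kth smallest element (k counted from 1) using those algorithms. This is a sketch to illustrate the standard-library route; the function names are made up for the example.

#include <algorithm>
#include <functional>
#include <vector>

// Min-heap route: build a min-heap in O(n), then pop the minimum k-1 times.
int kth_smallest_heap(std::vector<int> v, long k) {
    std::make_heap(v.begin(), v.end(), std::greater<int>());
    for (long i = 0; i < k - 1; ++i)
        std::pop_heap(v.begin(), v.end() - i, std::greater<int>()); // min goes to the back
    return v.front(); // after k-1 removals the root is the kth smallest
}

// Quickselect route: nth_element partially sorts in O(n) on average.
int kth_smallest_nth(std::vector<int> v, long k) {
    std::nth_element(v.begin(), v.begin() + (k - 1), v.end());
    return v[k - 1];
}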
I am not providing a detailed answer, just explaining the key points of getting the k smallest elements from a min-heap-ordered tree. The approach uses skip lists.
First form a skip list of tree nodes containing just one element: the node corresponding to the root of the heap. The 1st minimum is just the value stored at this node.
Now delete this node and insert its child nodes at the right positions so as to maintain the order of values. This step takes O(log k) time.
The second minimum is then the value at the first node of this skip list.
Repeat the above steps until you have all k minimum elements. The overall time is log 2 + log 3 + ... + log k = O(k log k). Forming the heap takes O(n) time, so the overall time complexity is O(n + k log k).
There is one more approach, without making a heap: quickselect, which has an average time complexity of O(n) but a worst case of O(n^2).
The striking difference between the two approaches is that the first gives all k elements, from the minimum up to the kth minimum, while quickselect gives only the kth minimum element.
Memory-wise, the former approach uses O(n) extra space, while quickselect uses O(1).

Top 10 Frequencies in a Hash Table with Linked Lists

The code below will print the highest frequency it can find in my hash table (which is a bunch of linked lists) 10 times. I need my code to print the top 10 frequencies in my hash table. I do not know how to do this (code examples would be great; plain-English logic/pseudocode is just as good).
I create a temporary pointer called 'tmp' which points into my hash table 'hashtable'.
A while loop then goes through the list and looks for the highest frequency, which is an int 'tmp->freq'.
The loop continues this process of recording the highest frequency it finds in the variable 'topfreq' until it reaches the end of the linked lists in the hash table.
My 'node' is a struct comprising the variables 'freq' (int) and 'word' (a 128-char array). When the loop has nothing left to search, it prints these two values on screen.
The problem is, I can't wrap my head around how to find the next lower frequency after the one I've just found (and this can include another node with the same freq value, so I also have to check that the word is not the same).
void toptenwords()
{
    int topfreq = 0;
    int minfreq = 0;
    char topword[SIZEOFWORD];
    for (int p = 0; p < 10; p++) // we need the top 10 frequencies... so we do this 10 times
    {
        for (int m = 0; m < HASHTABLESIZE; m++) // go through the entire hash table
        {
            node* tmp = hashtable[m];
            while (tmp != NULL) // walk through the entire linked list
            {
                if (tmp->freq > topfreq) // if the frequency at hand is larger than the one found, store it
                {
                    topfreq = tmp->freq;
                    strcpy(topword, tmp->word);
                }
                tmp = tmp->next;
            }
        }
        cout << topfreq << "\t" << topword << endl;
    }
}
Any and all help would be GREATLY appreciated :)
Keep an array of 10 node pointers, and insert each node into the array, maintaining the array in sorted order. The eleventh node in the array is overwritten on each iteration and contains junk.
void toptenwords()
{
    node* topwords[11]; // the eleventh slot only ever holds overflow
    int current_topwords = 0;
    for (int m = 0; m < HASHTABLESIZE; m++) // go through the entire hash table
    {
        node* tmp = hashtable[m];
        while (tmp != NULL) // walk through the entire linked list
        {
            // append the node, then bubble it up to keep the array
            // sorted by decreasing frequency (an insertion sort step)
            topwords[current_topwords] = tmp;
            current_topwords++;
            for (int i = current_topwords - 1; i > 0; i--)
            {
                if (topwords[i]->freq > topwords[i - 1]->freq)
                {
                    node* temp = topwords[i - 1];
                    topwords[i - 1] = topwords[i];
                    topwords[i] = temp;
                }
                else break;
            }
            // drop whatever fell into the eleventh slot
            if (current_topwords > 10) current_topwords = 10;
            tmp = tmp->next;
        }
    }
    // print the top frequencies, highest first
    for (int i = 0; i < current_topwords; i++)
        cout << topwords[i]->freq << "\t" << topwords[i]->word << endl;
}
I would maintain a set of words already used and change the inner-most if condition to test for frequency greater than previous top frequency AND tmp->word not in list of words already used.
When iterating over the hash table (and then over each linked list contained therein), keep a self-balancing binary tree (std::set) as a "result" list. As you come across each frequency, insert it into the tree, then truncate the tree if it has more than 10 entries. When you finish, you'll have a set (a sorted list) of the top ten frequencies, which you can manipulate as you desire.
There may be performance gains to be had by using sets instead of linked lists in the hash table itself, but you can work that out for yourself.
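A minimal sketch of the truncation idea, assuming (freq, word) pairs; the use of std::multiset (so duplicate frequencies are kept) and the function name are illustrative choices:

#include <set>
#include <string>
#include <utility>

std::multiset<std::pair<int, std::string>> top; // sorted ascending by freq

void offer(int freq, const std::string& word) {
    top.insert(std::make_pair(freq, word));
    if (top.size() > 10)
        top.erase(top.begin()); // drop the current lowest frequency
}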
Step 1 (inefficient):
Move the data into a sorted container via insertion sort, but insert into a container (e.g. a linked list or vector) of size 10, and drop any elements that fall off the bottom of the list.
Step 2 (efficient):
Same as step 1, but keep track of the value of the item at the bottom of the list, and skip the insertion step entirely if the current item is too small.
Suppose there are n words in total, and we need the k most frequent words (here, k = 10).
If n is much larger than k, the most efficient way I know of is to maintain a min-heap (i.e. the top element has the minimum frequency of all elements in the heap). On each iteration, you insert the next frequency into the heap, and if the heap now contains k+1 elements, you remove the smallest. This way, the heap is maintained at a size of k elements throughout, containing at any time the k highest frequencies seen so far. At the end of processing, read out the k highest frequencies in increasing order.
Time complexity: for each of n words, we do two things: insert into a heap of size at most k, and remove the minimum element. Each operation costs O(log k) time, so the entire loop takes O(n log k) time. Finally, we read out the k elements from a heap of size at most k, taking O(k log k) time, for a total of O((n + k) log k). Since we know that k < n, O(k log k) is at worst O(n log k), so this simplifies to just O(n log k).
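A sketch of that min-heap approach with std::priority_queue, assuming the input has already been flattened into (freq, word) pairs; topK and the pair layout are illustrative names, not part of the original answer:

#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Keep the k highest-frequency pairs seen so far in a min-heap of size k.
// std::pair compares freq first, which is exactly the ordering we want.
std::vector<std::pair<int, std::string>>
topK(const std::vector<std::pair<int, std::string>>& items, size_t k) {
    std::priority_queue<std::pair<int, std::string>,
                        std::vector<std::pair<int, std::string>>,
                        std::greater<std::pair<int, std::string>>> heap; // min-heap
    for (const auto& it : items) {
        heap.push(it);                    // O(log k)
        if (heap.size() > k) heap.pop();  // drop the smallest, keeping k elements
    }
    std::vector<std::pair<int, std::string>> result; // increasing frequency order
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}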
A hash table containing linked lists of words seems like a peculiar data structure to use if the goal is to accumulate word frequencies.
Nonetheless, an efficient way to get the ten highest-frequency nodes is to insert each into a priority queue/heap such as a Fibonacci heap, which has O(1) amortized insertion time and O(log n) amortized delete-min. Assuming that iteration over the hash table is fast, this method has a runtime of O(n · 1 + 10 · log n) = O(n).
The absolute fastest way to do this would be to use a soft heap. Using a soft heap, you can find the top 10 items in O(n) time, whereas every other solution posted here would take O(n lg n) time.
http://en.wikipedia.org/wiki/Soft_heap
This Wikipedia article shows how to find the median in O(n) time using a soft heap, and the top 10 is simply a subset of the median problem. You could then sort the items that were in the top 10 if you needed them in order, and since you're always sorting at most 10 items, it's still O(n) time.