How to store a complete path that a priority queue follows while performing A* search - c++

I have been given a problem in which I am provided with a user-entered matrix (rows and columns). The user will also provide the start state (row and column) and the goal state.
The job is to use A* search to find the path from the start node to the goal node.
A sample matrix is provided below,
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 G
0 0 0 1 1 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
0 1 0 1 0 1 1 0 0 0
0 1 0 1 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
where "S" is the start state and "G" is the goal state. Cells marked 0 are states you can move to, and cells marked 1 are obstacles that you cannot move to.
There are 3 actions allowed.
Up one cell (cost is 1)
right one cell (cost is 3)
diagonally up towards the right (cost is 2)
To solve this problem, I used Manhattan distance as my heuristic function and calculated the heuristic values for all my states. They looked like this (for the grid specified above):
10 9 8 7 6 5 4 3 2 1
9 8 7 6 5 4 3 2 1 0
10 9 8 7 6 5 4 3 2 1
11 10 9 8 7 6 5 4 3 2
12 11 10 9 8 7 6 5 4 3
13 12 11 10 9 8 7 6 5 4
14 13 12 11 10 9 8 7 6 5
15 14 13 12 11 10 9 8 7 6
16 15 14 13 12 11 10 9 8 7
17 16 15 14 13 12 11 10 9 8
18 17 16 15 14 13 12 11 10 9
19 18 17 16 15 14 13 12 11 10
20 19 18 17 16 15 14 13 12 11
21 20 19 18 17 16 15 14 13 12
Now, this is my code for A* search
void A_search()
{
priority_queue<node, vector<node>, CompareCost>q; // Priority Queue is used.
// node contains 4 elements... 1. "Heuristic" value, 2. Index row, 3. Index Col, 4. Actual Cost until this point
q.push(node(heuristic[srow][scol], srow, scol, 0)); // srow, scol is start state. 0 is Actual Cost
while (!q.empty())
{
node temp = q.top();
path_cost = temp.cost; // path_cost is global variable, which stores actual cost until this point
path[temp.i][temp.j] = true; // Boolean array, which tells the path followed so far.
q.pop();
if (temp.i == grow && temp.j == gcol) // If goal state is found, we break out of the loop
break;
if (temp.i > 0) // Checking for rows above the current state.
{
if (arr[temp.i - 1][temp.j] != 1) // Checking if index above current state is obstacle or not
{
q.push(node(heuristic[temp.i - 1][temp.j] + (temp.cost+1), temp.i - 1, temp.j, temp.cost + 1)); // pushing the above index into queue
}
if (temp.j + 1 < cols) // the cell to the right must exist before reading arr[..][temp.j + 1]
{
if (arr[temp.i - 1][temp.j + 1] != 1) // Diagonal Index checking
{
q.push(node(heuristic[temp.i - 1][temp.j + 1] + (temp.cost + 2), temp.i - 1, temp.j + 1, temp.cost + 2));
}
}
}
if (temp.j + 1 < cols) // Horizontal Index... Checking that the column to the right is still inside the grid
{
if (arr[temp.i][temp.j + 1] != 1) // Obstacle check for horizontal index
{
q.push(node(heuristic[temp.i][temp.j + 1] + (temp.cost + 3), temp.i, temp.j + 1, temp.cost + 3));
}
}
}
}
And this is the result I get after running this algorithm (Please note that # represents the path taken by the program... I am simply using a boolean 2D array to check which nodes are being visited by Priority Queue. For those indexes only, I am printing # and rest of the grid remains the same)
0 0 0 0 0 # # # # #
# # # 1 1 # # # # G
# # # 1 1 # # # 1 1
# # 0 # # # # # 0 0
# 1 1 # # # # 0 0 1
# # # # # # # 0 1 0
# # # # # 1 1 0 0 0
# # # # 0 1 1 0 0 0
# 1 # 1 0 1 1 0 0 0
# 1 # 1 0 1 1 0 0 0
# # 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
Path Cost: 21
Now the problem, as is evident from the output, is that it records every index that gets visited. Because the heuristic values differ very little between neighbouring cells, almost every node ends up being visited. However, A* search ultimately finds the best path, as can be seen from "Path Cost: 21", which is the actual cost of the optimal path.
I believe that my algorithm is correct, judging by the path cost, but what I want now is to also store the optimal path itself.
For this, I want to keep a record of all the indexes (row and column) that are visited by one path.
For example, my path starts from
Row 11, Col 0
Then the optimal path goes to,
Row 10, Col 1 -> When I push these nodes into queue, I want to store "11, 0" as well. So that, I can know what path this node has taken previously to reach this state.
Following the same, then it will go to,
Row 9, Col 2 -> So, this node should also store both "11, 0" and "10, 1" in it, hence keeping record of the path it has taken so far.
And this goes on, until the "goal" node.
But I can't seem to find a way to implement this, that is, something that keeps track of the whole path each node has taken. That way I can easily avoid the problem I am facing: I will simply print the path the goal node took to reach that point, ignoring all the other nodes that were visited unnecessarily.
Can anyone help me in trying to find a logic for this?
Also, just to be clear, my class node and CompareCost have this implementation,
class node
{
public:
int i, j, heuristic, cost;
node() { i = j = heuristic = cost = 0; }
node(int heuristic, int i, int j, int cost) :i(i), j(j), heuristic(heuristic), cost(cost) {}
};
struct CompareCost {
bool operator()(node const& p1, node const& p2)
{
return p1.heuristic > p2.heuristic;
}
};
I am guessing that I need to store something extra in my "class node" but I can't seem to figure out the exact thing.

Construct your node like a linked list:
class node
{
public:
int i, j, cost;
node* next;
};
Add a method to the class to display the full path:
void ShowPath()
{
    // Walk the chain of linked nodes, printing each cell in turn.
    for (const node* temp = this; temp != NULL; temp = temp->next)
    {
        std::cout << "(" << temp->i << ", " << temp->j << ")";
    }
}
Last, modify A_search() so that it returns a node of this new type (the goal node, with its chain of links). You can then call ShowPath() on the return value.
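A minimal sketch of the same idea, assuming a slightly different layout than the linked list above: instead of a pointer inside each node, it keeps one parent entry per grid cell and walks those entries back from the goal. The names AStarPath, Node, ByF and Cell are made up for this illustration, and the moves, costs and Manhattan heuristic are the ones described in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <queue>
#include <utility>
#include <vector>

struct Node {
    int f, g, r, c;  // f = g + heuristic, g = cost so far, (r, c) = cell
};
struct ByF {  // orders the priority queue so the smallest f is on top
    bool operator()(const Node& a, const Node& b) const { return a.f > b.f; }
};
using Cell = std::pair<int, int>;  // (row, col)

// Returns the cells from start to goal (inclusive), or an empty vector
// if the goal cannot be reached with the three allowed moves.
std::vector<Cell> AStarPath(const std::vector<std::vector<int>>& grid,
                            int sr, int sc, int gr, int gc) {
    assert(!grid.empty() && !grid[0].empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    const int INF = 1 << 30;
    std::vector<std::vector<int>> best(rows, std::vector<int>(cols, INF));
    std::vector<std::vector<Cell>> parent(rows, std::vector<Cell>(cols, Cell(-1, -1)));
    auto h = [&](int r, int c) { return std::abs(r - gr) + std::abs(c - gc); };
    // {row delta, col delta, cost}: up, diagonal up-right, right
    const int moves[3][3] = {{-1, 0, 1}, {-1, 1, 2}, {0, 1, 3}};

    std::priority_queue<Node, std::vector<Node>, ByF> open;
    best[sr][sc] = 0;
    open.push(Node{h(sr, sc), 0, sr, sc});
    while (!open.empty()) {
        const Node cur = open.top();
        open.pop();
        if (cur.g > best[cur.r][cur.c]) continue;  // stale queue entry
        if (cur.r == gr && cur.c == gc) break;     // goal reached
        for (const auto& m : moves) {
            const int nr = cur.r + m[0], nc = cur.c + m[1], ng = cur.g + m[2];
            if (nr < 0 || nc >= cols || grid[nr][nc] == 1) continue;
            if (ng >= best[nr][nc]) continue;      // not an improvement
            best[nr][nc] = ng;
            parent[nr][nc] = Cell(cur.r, cur.c);   // remember where we came from
            open.push(Node{ng + h(nr, nc), ng, nr, nc});
        }
    }

    std::vector<Cell> path;
    if (best[gr][gc] == INF) return path;
    // Follow parent links from the goal back to the start, then reverse.
    for (Cell at(gr, gc); at.first != -1; at = parent[at.first][at.second])
        path.push_back(at);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Only the cells on the returned path need to be printed as #, so the nodes that were expanded along the way no longer show up in the output.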

Related

Extracting integers from file

I have the following file stored in a string vector.
ratingsTiny.txt...
Jesse
-3 5 -3 0 -1 -1 0 0 5
Shakea
5 0 5 0 5 0 0 0 1
Batool
5 -5 0 0 0 0 0 -3 -5
Muhammad
0 0 0 -5 0 -3 0 0 0
Maria
5 0 5 0 0 0 0 1 0
Alex
5 0 0 5 0 5 5 1 0
Riley
-5 3 -5 0 -1 0 0 0 3
My goal is to extract the numbers, preferably column-wise, to add them together and get an average rating for each column among the 7 users.
My closest attempt is below, but I can only print the first column and I can't figure out how to iterate through the entirety of the rows to get the rest of the integers.
Any help is very much appreciated.
ourvector<string> ratings;
for (int i = 1; i < ratings.size(); i += 2){
int num = atoi(ratings[i].c_str());
intRatings.push_back(num);
cout << num << endl;
}
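One possible approach, sketched under the assumption that the lines alternate name, ratings, name, ratings as in the file above: atoi only converts the first number on a line, so each ratings line is read with a std::istringstream instead, and the values are summed per column. ColumnAverages is a hypothetical helper name, and std::vector stands in for the ourvector type used in the attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the average of each ratings column over all users.
std::vector<double> ColumnAverages(const std::vector<std::string>& lines) {
    // Assumed layout: every name line is followed by one ratings line.
    assert(lines.size() % 2 == 0);
    std::vector<long long> sums;  // one running total per column
    int users = 0;
    for (std::size_t i = 1; i < lines.size(); i += 2) {
        std::istringstream in(lines[i]);
        int value;
        std::size_t col = 0;
        while (in >> value) {  // reads every integer on the line
            if (col == sums.size()) sums.push_back(0);
            sums[col++] += value;
        }
        ++users;
    }
    std::vector<double> averages;
    for (long long s : sums)
        averages.push_back(users ? static_cast<double>(s) / users : 0.0);
    return averages;
}
```

With the first two users from the file, the first column averages to (-3 + 5) / 2 = 1 and the second to (5 + 0) / 2 = 2.5.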

time series sliding window with occurrence counts

I am trying to get a count between two timestamped values:
for example:
time letter
1 A
4 B
5 C
9 C
18 B
30 A
30 B
I am dividing the time range into windows: (1 + 30) / 30
then I want to know how many A, B and C occur in each time window of size 1
timeseries A B C
1 1 0 0
2 0 0 0
...
30 1 1 0
This should give me a table of 30 rows and 3 columns (A, B, C) of occurrences.
The problem is that the data takes too long to break down, because the code iterates through the whole master table every time to slice the data, even though the data is already sorted.
master = mytable
minimum = master.timestamp.min()
maximum = master.timestamp.max()
window = (minimum + maximum) / maximum
wstart = minimum
wend = minimum + window
concurrent_tasks = []
while ( wstart <= maximum ):
    As = 0
    Bs = 0
    Cs = 0
    for d, row in master.iterrows():
        ttime = row.timestamp
        if ((ttime >= wstart) & (ttime < wend)):
            #print (row.channel)
            if (row.channel == 'A'):
                As = As + 1
            elif (row.channel == 'B'):
                Bs = Bs + 1
            elif (row.channel == 'C'):
                Cs = Cs + 1
    concurrent_tasks.append([m_id, As, Bs, Cs])
    wstart = wstart + window
    wend = wend + window
Could you help me make this perform better? I want to use the map function, and I want to prevent Python from looping through the whole table every time.
This is part of a big data set and it is taking days to finish.
Thank you.
There is a faster approach - pd.get_dummies():
In [116]: pd.get_dummies(df.set_index('time')['letter'])
Out[116]:
A B C
time
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 0 0
30 0 1 0
If you want to "compress" (group) it by time:
In [146]: pd.get_dummies(df.set_index('time')['letter']).groupby(level=0).sum()
Out[146]:
A B C
time
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 1 0
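or using sklearn.feature_extraction.text.CountVectorizer (note that pd.SparseDataFrame was removed in pandas 1.0; on newer versions pd.DataFrame.sparse.from_spmatrix accepts the same matrix, index and columns):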
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(token_pattern=r"\b\w+\b", stop_words=None)
r = pd.SparseDataFrame(cv.fit_transform(df.groupby('time')['letter'].agg(' '.join)),
index=df['time'].unique(),
columns=df['letter'].unique(),
default_fill_value=0)
Result:
In [143]: r
Out[143]:
A B C
1 1 0 0
4 0 1 0
5 0 0 1
9 0 0 1
18 0 1 0
30 1 1 0
If we want to list all times from 1 to 30:
In [153]: r.reindex(np.arange(r.index.min(), r.index.max()+1)).fillna(0).astype(np.int8)
Out[153]:
A B C
1 1 0 0
2 0 0 0
3 0 0 0
4 0 1 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 1
10 0 0 0
11 0 0 0
12 0 0 0
13 0 0 0
14 0 0 0
15 0 0 0
16 0 0 0
17 0 0 0
18 0 1 0
19 0 0 0
20 0 0 0
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 0 0 0
29 0 0 0
30 1 1 0
or using Pandas approach:
In [159]: pd.get_dummies(df.set_index('time')['letter']) \
...: .groupby(level=0) \
...: .sum() \
...: .reindex(np.arange(r.index.min(), r.index.max()+1), fill_value=0)
...:
Out[159]:
A B C
time
1 1 0 0
2 0 0 0
3 0 0 0
4 0 1 0
5 0 0 1
6 0 0 0
7 0 0 0
8 0 0 0
9 0 0 1
10 0 0 0
... .. .. ..
21 0 0 0
22 0 0 0
23 0 0 0
24 0 0 0
25 0 0 0
26 0 0 0
27 0 0 0
28 0 0 0
29 0 0 0
30 1 1 0
[30 rows x 3 columns]
UPDATE:
Timing:
In [163]: df = pd.concat([df] * 10**4, ignore_index=True)
In [164]: %timeit pd.get_dummies(df.set_index('time')['letter'])
100 loops, best of 3: 10.9 ms per loop
In [165]: %timeit df.set_index('time').letter.str.get_dummies()
1 loop, best of 3: 914 ms per loop

Eigen: Zero matrix with smaller matrix as the "diagonals"

Is it possible to create a 9x9 matrix where the "diagonal" is another matrix and the rest are zeroes, like this:
5 5 5 0 0 0 0 0 0
5 5 5 0 0 0 0 0 0
5 5 5 0 0 0 0 0 0
0 0 0 5 5 5 0 0 0
0 0 0 5 5 5 0 0 0
0 0 0 5 5 5 0 0 0
0 0 0 0 0 0 5 5 5
0 0 0 0 0 0 5 5 5
0 0 0 0 0 0 5 5 5
from a smaller 3x3 matrix repeated:
5 5 5
5 5 5
5 5 5
I am aware of the Replicate function but that repeats it everywhere in the matrix and doesn't maintain the zeroes. Is there a builtin way of achieving what I'm after?
One way of doing this is by using blocks where .block<3,3>(0,0) is a 3x3 block starting at 0,0. (Note: Your IDE might flag this line as an error but it will compile and run)
for (int x = 0; x < 3; x++) {
    zero_matrix.block<3,3>(x*3, x*3) = five_matrix;
}
You can use the (unsupported) KroneckerProduct module for that:
#include <iostream>
#include <unsupported/Eigen/KroneckerProduct>
int main()
{
Eigen::MatrixXd A = Eigen::kroneckerProduct(Eigen::Matrix3d::Identity(), Eigen::Matrix3d::Constant(5));
std::cout << A << '\n';
}

bit vector intersect in handling parquet file format

I am handling parquet file format. For example:
a group of data:
1 2 null 3 4 5 6 null 7 8 null null 9 10 11 12 13 14
I have a bit vector that marks the null elements (0 = null):
1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1
and only the non-null elements are stored:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
I want to evaluate a predicate: greater than 5
I compared the non-null elements to 5 and got a bit vector:
0 0 0 0 0 1 1 1 1 1 1 1 1 1
I want to get a bit vector covering all elements:
0 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 1
The zeros at the positions of the null elements (positions 3, 8, 11 and 12) should be false.
void IntersectBitVec(vector<int64_t>& bit_vec, vector<int64_t>& sub_bit_vec) {
    int data_idx = 0;
    int bit_idx = 63;
    for (int i = 0; i < bit_vec.size(); ++i) {
        for (int j = 63; j >= 0; --j) {
            // Use a 64-bit constant: shifting a plain int by more than 31 bits is undefined.
            if (bit_vec[i] & (1ULL << j)) {
                if (!(sub_bit_vec[data_idx] & (1ULL << bit_idx))) {
                    bit_vec[i] &= ~(1ULL << j);
                }
                if (--bit_idx < 0) {
                    ++data_idx; // advance to the next word of sub_bit_vec
                    bit_idx = 63;
                }
            }
        }
    }
}
My code is quite ugly. Is there any way to make it faster? Many thanks!
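A tidier variant might look like the sketch below. It is not necessarily faster. It assumes unsigned 64-bit words and that bits are read most-significant first within each word, matching the loop order in the code above. Pack is a hypothetical helper that exists only to build the example words from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Packs up to 64 '0'/'1' characters into one word, first character in the top bit.
std::uint64_t Pack(const std::string& bits) {
    assert(bits.size() <= 64);
    std::uint64_t word = 0;
    for (std::size_t k = 0; k < bits.size(); ++k)
        if (bits[k] == '1') word |= std::uint64_t{1} << (63 - k);
    return word;
}

// validity: 1 = non-null row. predicate: one bit per non-null value.
// On return, validity holds 1 only for non-null rows whose predicate bit is set.
void IntersectBitVec(std::vector<std::uint64_t>& validity,
                     const std::vector<std::uint64_t>& predicate) {
    std::size_t next = 0;  // position of the next non-null value in predicate
    for (std::uint64_t& word : validity) {
        for (int j = 63; j >= 0; --j) {
            const std::uint64_t mask = std::uint64_t{1} << j;
            if (!(word & mask)) continue;  // null row: stays 0
            const std::size_t w = next / 64;
            const int b = 63 - static_cast<int>(next % 64);
            const bool keep = w < predicate.size() && ((predicate[w] >> b) & 1u);
            if (!keep) word &= ~mask;      // predicate false: clear the row
            ++next;
        }
    }
}
```

Deriving the word and bit position from a single running counter removes the separate data_idx and bit_idx bookkeeping, which is where the original went wrong.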

Find all puddles on the square (algorithm)

The problem is defined as follows:
You're given a square. The square is paved with flat flagstones of size 1m x 1m. Grass surrounds the square. Flagstones may be at different heights. It starts raining. Determine where puddles will form and compute how much water they will contain. Water doesn't flow through the corners. Any area of grass can soak up any volume of water at any time.
Input:
width height
width*height non-negative numbers describing the height of each flagstone above grass level.
Output:
Volume of water from puddles.
width*height signs describing places where puddles will form and places where they won't.
. - no puddle
# - puddle
Examples
Input:
8 8
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Output:
11
........
........
..#.....
....#...
........
..####..
........
........
Input:
16 16
8 0 1 0 0 0 0 2 2 4 3 4 5 0 0 2
6 2 0 5 2 0 0 2 0 1 0 3 1 2 1 2
7 2 5 4 5 2 2 1 3 6 2 0 8 0 3 2
2 5 3 3 0 1 0 3 3 0 2 0 3 0 1 1
1 0 1 4 1 1 2 0 3 1 1 0 1 1 2 0
2 6 2 0 0 3 5 5 4 3 0 4 2 2 2 1
4 2 0 0 0 1 1 2 1 2 1 0 4 0 5 1
2 0 2 0 5 0 1 1 2 0 7 5 1 0 4 3
13 6 6 0 10 8 10 5 17 6 4 0 12 5 7 6
7 3 0 2 5 3 8 0 3 6 1 4 2 3 0 3
8 0 6 1 2 2 6 3 7 6 4 0 1 4 2 1
3 5 3 0 0 4 4 1 4 0 3 2 0 0 1 0
13 3 6 0 7 5 3 2 21 8 13 3 5 0 13 7
3 5 6 2 2 2 0 2 5 0 7 0 1 3 7 5
7 4 5 3 4 5 2 0 23 9 10 5 9 7 9 8
11 5 7 7 9 7 1 0 17 13 7 10 6 5 8 10
Output:
103
................
..#.....###.#...
.......#...#.#..
....###..#.#.#..
.#..##.#...#....
...##.....#.....
..#####.#..#.#..
.#.#.###.#..##..
...#.......#....
..#....#..#...#.
.#.#.......#....
...##..#.#..##..
.#.#.........#..
......#..#.##...
.#..............
................
I tried different approaches: flood fill from the max value, then from the min value, but they either don't work for every input or require complicated code. Any ideas?
I'm interested in an algorithm with complexity O(n^2) or O(n^3).
Summary
I would be tempted to try and solve this using a disjoint-set data structure.
The algorithm would be to iterate over all heights in the map performing a floodfill operation at each height.
Details
For each height x (starting at 0)
Connect all flagstones of height x to their neighbours if the neighbour height is <= x (storing connected sets of flagstones in the disjoint set data structure)
Remove any sets that are connected to the grass
Mark all flagstones of height x in still remaining sets as being puddles
Add the total count of flagstones in remaining sets to a total t
At the end t gives the total volume of water.
Worked Example
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Connect all flagstones of height 0 into sets A,B,C,D,E,F
A A A A A 1 B B
A 1 1 1 A 1 B B
A 1 C 2 1 2 4 5
A 1 1 2 D 2 4 5
A 3 3 3 3 3 3 4
A 3 E 1 2 F 3 4
A 3 3 3 3 3 3 A
A A A A A A A A
Remove flagstones connecting to the grass, and mark remaining as puddles
          1
  1 1 1   1
  1 C 2 1 2 4 5        #
  1 1 2 D 2 4 5            #
  3 3 3 3 3 3 4
  3 E 1 2 F 3 4        #     #
  3 3 3 3 3 3
Count remaining set size t=4
Connect all of height 1
          G
  C C C   G
  C C 2 D 2 4 5        #
  C C 2 D 2 4 5            #
  3 3 3 3 3 3 4
  3 E E 2 F 3 4        #     #
  3 3 3 3 3 3
Remove flagstones connecting to the grass, and mark remaining as puddles
      2   2 4 5        #
      2   2 4 5            #
  3 3 3 3 3 3 4
  3 E E 2 F 3 4        # #   #
  3 3 3 3 3 3
t=4+3=7
Connect all of height 2
      A   B 4 5        #
      A   B 4 5            #
  3 3 3 3 3 3 4
  3 E E E E 3 4        # #   #
  3 3 3 3 3 3
Remove flagstones connecting to the grass, and mark remaining as puddles
            4 5        #
            4 5            #
  3 3 3 3 3 3 4
  3 E E E E 3 4        # # # #
  3 3 3 3 3 3
t=7+4=11
Connect all of height 3
            4 5        #
            4 5            #
  E E E E E E 4
  E E E E E E 4        # # # #
  E E E E E E
Remove flagstones connecting to the grass, and mark remaining as puddles
            4 5        #
            4 5            #
              4
              4        # # # #
After doing this for heights 4 and 5 nothing will remain.
A preprocessing step to create lists of all locations with each height should mean that the algorithm is close to O(n^2).
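The per-height idea can be sketched without the disjoint-set structure, at the price of a worse running time. This is a simplified variant, not the algorithm exactly as described above. For every water level it flood-fills inwards from the border cells that are low enough to drain onto the grass, and every cell at or below that level which was not reached holds one unit of water. FindPuddles and PuddleResult are hypothetical names, and the cost is roughly (maximum height) x (number of cells).

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct PuddleResult {
    long long volume = 0;
    std::vector<std::string> map;  // '#' = puddle, '.' = no puddle
};

PuddleResult FindPuddles(const std::vector<std::vector<int>>& h) {
    const int rows = static_cast<int>(h.size());
    const int cols = rows ? static_cast<int>(h[0].size()) : 0;
    int maxH = 0;
    for (const auto& row : h) {
        assert(static_cast<int>(row.size()) == cols);  // grid must be rectangular
        for (int v : row) maxH = std::max(maxH, v);
    }
    PuddleResult out;
    out.map.assign(rows, std::string(cols, '.'));
    const int dr[] = {-1, 1, 0, 0};
    const int dc[] = {0, 0, -1, 1};

    for (int level = 0; level < maxH; ++level) {
        // drained[r][c] != 0: water standing at this level can reach the grass.
        std::vector<std::vector<char>> drained(rows, std::vector<char>(cols, 0));
        std::queue<std::pair<int, int>> q;
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c) {
                const bool border = r == 0 || c == 0 || r == rows - 1 || c == cols - 1;
                if (border && h[r][c] <= level) {
                    drained[r][c] = 1;
                    q.push(std::make_pair(r, c));
                }
            }
        while (!q.empty()) {  // flood fill through cells no higher than the level
            const std::pair<int, int> cur = q.front();
            q.pop();
            for (int k = 0; k < 4; ++k) {
                const int nr = cur.first + dr[k], nc = cur.second + dc[k];
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (drained[nr][nc] || h[nr][nc] > level) continue;
                drained[nr][nc] = 1;
                q.push(std::make_pair(nr, nc));
            }
        }
        // Every low cell that could not drain holds one unit of water at this level.
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
                if (h[r][c] <= level && !drained[r][c]) {
                    ++out.volume;
                    out.map[r][c] = '#';
                }
    }
    return out;
}
```

On the 8x8 example this counts 4 cells at level 0, 3 at level 1 and 4 at level 2, which matches the t=4, 7, 11 progression in the worked example.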
This seems to be working nicely. The idea is a recursive function that checks whether there is an "outward flow" that lets the water escape to the edge. Cells that have no such escape will puddle. In the output, 1 means the water drains to the edge and 0 marks a puddle. I tested it on your two input files and it works quite nicely; I copied the output for these two files below. Pardon my nasty use of global variables and whatnot, I figured it was the concept behind the algorithm that mattered, not good style :)
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
int SIZE_X;
int SIZE_Y;
bool **result;
int **INPUT;
bool flowToEdge(int x, int y, int value, bool* visited) {
if(x < 0 || x == SIZE_X || y < 0 || y == SIZE_Y) return true;
if(visited[(x * SIZE_Y) + y]) return false; // row stride is SIZE_Y, since each row holds SIZE_Y cells
if(value < INPUT[x][y]) return false;
visited[(x * SIZE_Y) + y] = true;
bool left = false;
bool right = false;
bool up = false;
bool down = false;
left = flowToEdge(x-1, y, value, visited);
right = flowToEdge(x+1, y, value, visited);
up = flowToEdge(x, y+1, value, visited);
down = flowToEdge(x, y-1, value, visited);
return (left || up || down || right);
}
int main() {
ifstream myReadFile;
myReadFile.open("test.txt");
myReadFile >> SIZE_X;
myReadFile >> SIZE_Y;
INPUT = new int*[SIZE_X];
result = new bool*[SIZE_X];
for(int i = 0; i < SIZE_X; i++) {
INPUT[i] = new int[SIZE_Y];
result[i] = new bool[SIZE_Y];
for(int j = 0; j < SIZE_Y; j++) {
int someInt;
myReadFile >> someInt;
INPUT[i][j] = someInt;
result[i][j] = false;
}
}
for(int i = 0; i < SIZE_X; i++) {
for(int j = 0; j < SIZE_Y; j++) {
// Variable-length arrays are not standard C++, so allocate the flat
// visited buffer on the heap; the trailing () zero-initialises it.
bool* visited = new bool[SIZE_X * SIZE_Y]();
result[i][j] = flowToEdge(i, j, INPUT[i][j], visited);
delete[] visited;
}
}
for(int i = 0; i < SIZE_X; i++) {
cout << endl;
for(int j = 0; j < SIZE_Y; j++)
cout << result[i][j];
}
cout << endl;
}
The 16 by 16 file:
1111111111111111
1101111100010111
1111111011101011
1111000110101011
1011001011101111
1110011111011111
1100000101101011
1010100010110011
1110111111101111
1101101011011101
1010111111101111
1110011010110011
1010111111111011
1111110110100111
1011111111111111
1111111111111111
The 8 by 8 file
11111111
11111111
11011111
11110111
11111111
11000011
11111111
11111111
You could optimize this algorithm easily and considerably by doing several things. First, returning true immediately upon finding a route would speed it up considerably. You could also connect it globally to the current set of results, so that any given point would only have to find a path to an already known flow point, and not all the way to the edge.
As for the work involved: each of the n cells may have to examine every other cell. With optimizations we should be able to get this much lower than n^2 for most cases, but it is still an n^3 algorithm in the worst case... although creating such a case would be very difficult (with proper optimization logic... dynamic programming for the win!)
EDIT:
The modified code works for the following circumstances:
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 1 1
And these are the results:
11111111
10000001
10111101
10100101
10110101
10110101
10000101
11111111
Now when we remove that 1 at the bottom we want to see no puddling.
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 0 1
And these are the results
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1