bit vector intersect in handling parquet file format - c++

I am working with the Parquet file format. For example, given this group of values:
1 2 null 3 4 5 6 null 7 8 null null 9 10 11 12 13 14
I have a bit vector that marks the non-null elements:
1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1
and only the non-null elements are stored:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
I want to evaluate a predicate: greater than 5.
I compared the non-null elements to 5 and got this bit vector:
0 0 0 0 0 1 1 1 1 1 1 1 1 1
I want to get a bit vector that covers all elements:
0 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 1
The zeros at positions 3, 8, 11 and 12 correspond to null elements, which should evaluate to false.
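void IntersectBitVec(vector<int64_t>& bit_vec, const vector<int64_t>& sub_bit_vec) {
    size_t data_idx = 0;  // current word of sub_bit_vec
    int bit_idx = 63;     // current bit within that word (most significant first)
    for (size_t i = 0; i < bit_vec.size(); ++i) {
        for (int j = 63; j >= 0; --j) {
            const uint64_t bit = uint64_t(1) << j;  // 64-bit shift; 0x01 << j is only an int
            if (bit_vec[i] & bit) {
                if (!(sub_bit_vec[data_idx] & (uint64_t(1) << bit_idx))) {
                    bit_vec[i] &= ~bit;
                }
                if (--bit_idx < 0) {
                    ++data_idx;  // move on to the next word
                    bit_idx = 63;
                }
            }
        }
    }
}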
My code is quite ugly. Is there any way to make it faster? Many thanks!
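One way to avoid testing every bit position is to visit only the set bits of each validity word and scatter the predicate bits onto them. The following is a minimal sketch, not a tuned implementation. It assumes bits are numbered from the least significant bit of each 64-bit word (the code above counts from the most significant bit), and the names ReadBits and ExpandPredicate are made up for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns 64 bits of `bits` starting at bit position `pos` (bit 0 is the least
// significant bit of word 0). Positions past the end read as zero.
std::uint64_t ReadBits(const std::vector<std::uint64_t>& bits, std::size_t pos) {
    const std::size_t word = pos / 64, offset = pos % 64;
    if (word >= bits.size()) return 0;
    std::uint64_t out = bits[word] >> offset;
    if (offset != 0 && word + 1 < bits.size()) out |= bits[word + 1] << (64 - offset);
    return out;
}

// On entry `validity` has a 1 for every non-null value. On return it has a 1
// only where the value is non-null AND its bit in `compact` is set.
void ExpandPredicate(std::vector<std::uint64_t>& validity,
                     const std::vector<std::uint64_t>& compact) {
    std::size_t consumed = 0;  // bits of `compact` used so far
    for (std::uint64_t& word : validity) {
        std::uint64_t mask = word;
        const std::uint64_t src = ReadBits(compact, consumed);
        std::uint64_t out = 0;
        for (std::uint64_t bit = 1; mask != 0; bit <<= 1) {
            const std::uint64_t lowest = mask & (~mask + 1);  // lowest set bit
            if (src & bit) out |= lowest;
            mask &= mask - 1;  // clear that bit
            ++consumed;
        }
        word = out;
    }
}
```

The inner loop runs once per non-null value rather than once per bit position. On x86 CPUs with BMI2, the inner loop is what the `_pdep_u64(src, mask)` intrinsic computes in a single instruction, which may be worth trying if portability is not a concern.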

Related

Printing an std::array gives random values

I am trying to print out an std::array as shown below. The output is supposed to consist only of booleans, but there seem to be other numbers in the output as well (also below). I've tried printing the offending elements on their own, but then I get their actual value, which is weird.
My main function:
float f(float x, float y)
{
return x * x + y * y - 1;
}
int main()
{
std::array<std::array<bool, ARRAY_SIZE_X>, ARRAY_SIZE_Y> temp = ConvertToBinaryImage(&f);
for(int i = 0; i < (int)temp.size(); ++i)
{
for(int j = 0; j < (int)temp[0].size(); ++j)
{
std::cout << temp[i][j] << " ";
}
std::cout << std::endl;
}
}
The function that sets the array:
std::array<std::array<bool, ARRAY_SIZE_X>, ARRAY_SIZE_Y> ConvertToBinaryImage(float(*func)(float, float))
{
std::array<std::array<bool, ARRAY_SIZE_X>, ARRAY_SIZE_Y> result;
for(float x = X_MIN; x <= X_MAX; x += STEP_SIZE)
{
for(float y = Y_MIN; y <= Y_MAX; y += STEP_SIZE)
{
int indx = ARRAY_SIZE_X - (x - X_MIN) * STEP_SIZE_INV;
int indy = ARRAY_SIZE_Y - (y - Y_MIN) * STEP_SIZE_INV;
result[indx][indy] = func(x, y) < 0;
}
}
return result;
}
The constants
#define X_MIN (-1)
#define Y_MIN (-1)
#define X_MAX 1
#define Y_MAX 1
#define STEP_SIZE_INV 10
#define STEP_SIZE ((float)1 / STEP_SIZE_INV)
#define ARRAY_SIZE_X ((X_MAX - X_MIN) * STEP_SIZE_INV)
#define ARRAY_SIZE_Y ((Y_MAX - Y_MIN) * STEP_SIZE_INV)
My output:
184 225 213 111 0 0 0 0 230 40 212 111 0 0 0 0 64 253 98 0
0 0 0 0 1 0 1 0 1 1 1 1 6 1 0 0 168 0 0 0
0 183 213 111 0 0 0 0 0 0 0 0 0 0 0 0 9 242 236 108
0 0 0 1 64 1 1 0 1 1 1 1 240 1 1 1 249 1 0 0
0 21 255 0 0 0 0 0 98 242 236 108 0 0 0 0 0 0 0 0
0 0 0 1 128 1 1 0 1 1 1 1 128 1 1 1 0 1 1 0
0 1 255 1 0 1 1 0 1 1 1 1 0 1 1 1 31 1 1 1
0 0 0 0 184 225 213 111 0 0 0 0 2 0 0 0 0 0 0 0
9 1 0 1 0 1 1 0 1 1 1 1 0 1 1 1 64 1 1 1
0 1 0 1 64 1 1 0 1 1 1 1 96 1 1 1 249 1 1 1
0 1 213 1 0 1 1 0 1 1 1 1 0 1 1 1 32 1 1 1
0 1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1
0 21 255 0 0 0 0 0 80 59 117 0 0 0 0 0 32 112 64 0
0 1 0 1 17 1 1 16 1 1 1 1 104 1 1 1 0 1 1 1
0 0 144 1 249 1 1 0 1 1 1 1 0 1 1 1 0 1 1 0
0 0 0 1 80 1 1 0 1 1 1 1 24 1 1 1 0 1 1 0
0 0 0 0 0 0 0 0 17 0 1 16 0 0 0 0 112 7 255 0
0 0 0 1 134 1 1 30 1 1 1 1 8 1 1 1 0 1 0 0
0 0 0 0 0 1 1 0 1 1 1 1 0 1 1 1 32 0 0 0
0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 0 0 0 0
Floating-point maths will often not produce exact results (see "Is floating point math broken?").
If we print out the values of indx and indy:
20, 20
20, 19
20, 18
20, 17
20, 15
20, 14
20, 13
20, 13
20, 11
20, 10
20, 9
20, 9
20, 8
20, 6
20, 5
20, 5
20, 3
20, 3
20, 1
20, 1
19, 20
19, 19
19, 18
19, 17
...
You can see that you are writing to index 20, which is out of bounds for the array, and that you aren't writing to every index, which leaves some of the array elements uninitialised. Although booleans are normally only true or false, they are usually stored as a byte, which can hold values between 0 and 255. Printing the uninitialised values is undefined behaviour.
We can fix your code in this particular instance by calculating the indexes a little more carefully (std::clamp needs C++17 and <algorithm>; round needs <cmath>):
int indx = std::clamp(int(round(ARRAY_SIZE_X - (x - X_MIN) * STEP_SIZE_INV)), 1, ARRAY_SIZE_X)-1;
int indy = std::clamp(int(round(ARRAY_SIZE_Y - (y - Y_MIN) * STEP_SIZE_INV)), 1, ARRAY_SIZE_Y)-1;
There are two fixes here. You were generating values between 1 and 20; the -1 reduces this to 0 to 19. The round solves the issue of not using all the indexes (you were simply truncating by assigning to an int). The clamp ensures the values are always in range (though in this case the calculations work out to be in range).
As you want to always write to every pixel, a better solution would be to iterate over the values of indx and indy and calculate the values of x and y from the indices:
for (int indx = 0; indx < ARRAY_SIZE_X; indx++)
{
float x = X_MIN - (indx - ARRAY_SIZE_X) * STEP_SIZE;
for (int indy = 0; indy < ARRAY_SIZE_Y; indy++)
{
float y = Y_MIN - (indy - ARRAY_SIZE_Y) * STEP_SIZE;
result[indx][indy] = func(x, y) < 0;
}
}
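As a self-contained illustration of the index-driven loop, here is a minimal sketch. It swaps the macros for constexpr constants (kMin, kStepsPerUnit and kSize are names invented for this example, standing in for X_MIN/Y_MIN, STEP_SIZE_INV and ARRAY_SIZE_X/ARRAY_SIZE_Y) and value-initialises the array, so no element can be left unwritten.

```cpp
#include <array>
#include <cstddef>

constexpr float kMin = -1.0f;      // plays the role of X_MIN and Y_MIN
constexpr int kStepsPerUnit = 10;  // plays the role of STEP_SIZE_INV
constexpr std::size_t kSize = 20;  // (X_MAX - X_MIN) * STEP_SIZE_INV

using Image = std::array<std::array<bool, kSize>, kSize>;

// The unit circle from the question: negative inside, positive outside.
float Circle(float x, float y) { return x * x + y * y - 1; }

Image ConvertToBinaryImage(float (*func)(float, float)) {
    Image result{};  // value-initialised: every cell starts as false
    for (std::size_t ix = 0; ix < kSize; ++ix) {
        // Same orientation as the loop above: index 0 is the X_MAX edge.
        const float x = kMin + static_cast<float>(kSize - ix) / kStepsPerUnit;
        for (std::size_t iy = 0; iy < kSize; ++iy) {
            const float y = kMin + static_cast<float>(kSize - iy) / kStepsPerUnit;
            result[ix][iy] = func(x, y) < 0;
        }
    }
    return result;
}
```

Because the coordinates are derived from the integer indices, each cell is written exactly once and accumulated floating-point error in x and y cannot skip or repeat an index.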

How to store a complete path that a priority queue follows while performing A* search

I have been given a problem in which I am provided with a user-entered matrix (rows and columns). The user will also provide the start state (row and column) and the goal state.
The job is to use A* search to find the path from the start node to the goal node.
A sample matrix is provided below,
0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0 G
0 0 0 1 1 0 0 0 1 1
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
0 1 0 1 0 1 1 0 0 0
0 1 0 1 0 1 1 0 0 0
0 0 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
where "S" is the start state and "G" is the goal state. 0 marks a cell you can move to, and 1 marks an obstacle in the grid that you can't move to.
There are 3 actions allowed.
Up one cell (cost is 1)
right one cell (cost is 3)
diagonally up towards the right (cost is 2)
To solve this problem, I used the Manhattan distance as my heuristic function and calculated the heuristic values for all my states. They look like this for the grid specified above:
10 9 8 7 6 5 4 3 2 1
9 8 7 6 5 4 3 2 1 0
10 9 8 7 6 5 4 3 2 1
11 10 9 8 7 6 5 4 3 2
12 11 10 9 8 7 6 5 4 3
13 12 11 10 9 8 7 6 5 4
14 13 12 11 10 9 8 7 6 5
15 14 13 12 11 10 9 8 7 6
16 15 14 13 12 11 10 9 8 7
17 16 15 14 13 12 11 10 9 8
18 17 16 15 14 13 12 11 10 9
19 18 17 16 15 14 13 12 11 10
20 19 18 17 16 15 14 13 12 11
21 20 19 18 17 16 15 14 13 12
Now, this is my code for A* search
void A_search()
{
priority_queue<node, vector<node>, CompareCost>q; // Priority Queue is used.
// node contains 4 elements... 1. "Heuristic" value, 2. Index row, 3. Index Col, 4. Actual Cost until this point
q.push(node(heuristic[srow][scol], srow, scol, 0)); // srow, scol is start state. 0 is Actual Cost
while (!q.empty())
{
node temp = q.top();
path_cost = temp.cost; // path_cost is global variable, which stores actual cost until this point
path[temp.i][temp.j] = true; // Boolean array, which tells the path followed so far.
q.pop();
if (temp.i == grow && temp.j == gcol) // If goal state is found, we break out of the loop
break;
if (temp.i > 0) // Checking for rows above the current state.
{
if (arr[temp.i - 1][temp.j] != 1) // Checking if index above current state is obstacle or not
{
q.push(node(heuristic[temp.i - 1][temp.j] + (temp.cost+1), temp.i - 1, temp.j, temp.cost + 1)); // pushing the above index into queue
}
if (temp.j + 1 < cols) // The column to the right must exist before it is indexed
{
if (arr[temp.i - 1][temp.j + 1] != 1) // Diagonal Index checking
{
q.push(node(heuristic[temp.i - 1][temp.j + 1] + (temp.cost + 2), temp.i - 1, temp.j + 1, temp.cost + 2));
}
}
}
if (temp.j + 1 < cols) // Horizontal Index... Checking that the next column is still inside the grid
{
if (arr[temp.i][temp.j + 1] != 1) // Obstacle check for horizontal index
{
q.push(node(heuristic[temp.i][temp.j + 1] + (temp.cost + 3), temp.i, temp.j + 1, temp.cost + 3));
}
}
}
}
And this is the result I get after running this algorithm (Please note that # represents the path taken by the program... I am simply using a boolean 2D array to check which nodes are being visited by Priority Queue. For those indexes only, I am printing # and rest of the grid remains the same)
0 0 0 0 0 # # # # #
# # # 1 1 # # # # G
# # # 1 1 # # # 1 1
# # 0 # # # # # 0 0
# 1 1 # # # # 0 0 1
# # # # # # # 0 1 0
# # # # # 1 1 0 0 0
# # # # 0 1 1 0 0 0
# 1 # 1 0 1 1 0 0 0
# 1 # 1 0 1 1 0 0 0
# # 0 0 0 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 1 0 0 0
0 0 0 0 0 1 1 0 0 0
Path Cost: 21
The problem, as is evident from the output, is that it records every index that gets visited. Because the heuristic values differ very little between neighbouring indexes, almost every node is visited. Ultimately, though, A* search does find the best path, as can be seen from "Path Cost: 21", which is the actual cost of the optimal path.
I believe that my algorithm is correct, considering the path cost, but what I want now is to store the optimal path itself.
For this, I want to keep a record of all the indexes (row and column) that are visited along one path.
For example, my path starts from
Row 11, Col 0
Then "optimal paths" goes to,
Row 10, Col 1 -> When I push these nodes into queue, I want to store "11, 0" as well. So that, I can know what path this node has taken previously to reach this state.
Following the same, then it will go to,
Row 9, Col 2 -> So, this node should also store both "11, 0" and "10, 1" in it, hence keeping record of the path it has taken so far.
And this goes on, until the "goal" node.
But I can't seem to find a way to implement this: something that keeps track of the whole path every node has taken. That way, I can easily avoid the problem I am facing (I will simply print the path the "goal node" took to reach that point, ignoring all the other nodes which were visited unnecessarily).
Can anyone help me work out the logic for this?
Also, just to be clear, my class node and CompareCost have this implementation,
class node
{
public:
int i, j, heuristic, cost;
node() { i = j = heuristic = cost = 0; }
node(int heuristic, int i, int j, int cost) :i(i), j(j), heuristic(heuristic), cost(cost) {}
};
struct CompareCost {
bool operator()(node const& p1, node const& p2)
{
return p1.heuristic > p2.heuristic;
}
};
I am guessing that I need to store something extra in my "class node" but I can't seem to figure out the exact thing.
Construct your node like a linked list:
class node
{
public:
int i, j, cost;
node* next; // link to the neighbouring node on the path
};
Add a method to the class to display the full path:
void ShowPath()
{
// Follow the links from this node until the end of the list.
for (const node* temp = this; temp != NULL; temp = temp->next)
{
std::cout << "(" << temp->i << ", " << temp->j << ")";
}
}
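Finally, modify A_search() so that every node it pushes onto the queue is linked to the node it was expanded from, and so that it returns the goal node using the new node definition. You can then call ShowPath() on the return value. Because the priority queue stores copies of the nodes, the linked nodes need storage that outlives the queue entries (for example, a container that is kept until the path has been printed).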
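An alternative to linking node objects is to keep one "parent" entry per grid cell and walk it backwards from the goal. The sketch below is only an illustration under assumptions: it is not the A_search() from the question, the names Cell and FindPath are invented here, and it takes the grid as a vector of 0/1 rows with the three moves and costs listed in the question.

```cpp
#include <algorithm>
#include <cstdlib>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

using Cell = std::pair<int, int>;  // (row, col)

// Returns the cells from start to goal (inclusive), or an empty vector if the
// goal cannot be reached. Moves: up (cost 1), right (3), up-right (2).
std::vector<Cell> FindPath(const std::vector<std::vector<int>>& grid, Cell start, Cell goal) {
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    const int kInf = 1 << 30;
    const Cell kNone{-1, -1};
    std::vector<std::vector<int>> best(rows, std::vector<int>(cols, kInf));
    std::vector<std::vector<Cell>> parent(rows, std::vector<Cell>(cols, kNone));
    auto h = [&](int r, int c) {  // Manhattan distance to the goal
        return std::abs(r - goal.first) + std::abs(c - goal.second);
    };

    // Entry = (f = g + h, g, row, col); std::greater turns the queue into a min-heap.
    using Entry = std::tuple<int, int, int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    best[start.first][start.second] = 0;
    open.emplace(h(start.first, start.second), 0, start.first, start.second);

    const int moves[3][3] = {{-1, 0, 1}, {0, 1, 3}, {-1, 1, 2}};  // dRow, dCol, cost
    while (!open.empty()) {
        const Entry top = open.top();
        open.pop();
        const int g = std::get<1>(top), r = std::get<2>(top), c = std::get<3>(top);
        if (g > best[r][c]) continue;  // a cheaper route to this cell was already found
        if (Cell(r, c) == goal) break;
        for (const auto& m : moves) {
            const int nr = r + m[0], nc = c + m[1], ng = g + m[2];
            if (nr < 0 || nc >= cols || grid[nr][nc] == 1) continue;
            if (ng < best[nr][nc]) {
                best[nr][nc] = ng;
                parent[nr][nc] = Cell(r, c);  // remember where we came from
                open.emplace(ng + h(nr, nc), ng, nr, nc);
            }
        }
    }

    std::vector<Cell> path;
    if (best[goal.first][goal.second] == kInf) return path;
    for (Cell at = goal; at != kNone; at = parent[at.first][at.second]) path.push_back(at);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Only the cells on the returned path need to be marked with #, so nodes that were expanded but are not on the optimal route no longer show up in the printout.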

Inaccurate C++ factorial program

I wrote an implementation of the following tutorial: LINK
Basically, since C/C++ does not have a built-in big-integer type, we store the decimal digits of the factorial in an array. This is equivalent to performing the long multiplication that children are taught at school.
Problem: it works fine for values up to 17!, but after that (18!, 19!, ...) it does not output correct values.
#include <cstdio> // scanf, printf
#include <iostream>
using namespace std;
int main(){
int fact[1000]={1};
int n; scanf("%d", &n); //n are the number of factorials we will calculate
while(n--){
int number; scanf("%d", &number); //scan the number
if(number == 0) printf("%d", 1);
int flag = number;
int index = 0, length = 0;
//following lines we find the length of the entered number
while(flag!=0){
fact[index] = flag%10;
flag /= 10;
index++; length++;
}
//following lines are the multiplication code
while(number>1){
index = 0;
int temp = 0;
number--;
for(index = 0; index<length; index++){
int x = (fact[index] * number) + temp;
fact[index] = x%10;
temp = x/10;
}
//here we append the carry over left from multiplication
while(temp){
fact[index] = temp%10;
temp /= 10;
length++;
}
}
//print the array from most to least significant digit
for(int i = length-1; i>=0; i--){
printf("%d", fact[i]);
}
printf("\n");
}
return 0;
}
For a start, you need to be very careful with:
long long int x = (fact[index] * number) + temp;
Since fact[], number and temp are all int types, the calculation will be done as an int and only widened to a long long when placing the value into x.
You would be better off with:
long long x = fact[index];
x *= number;
x += temp;
That way, it becomes a long long early enough that the calculations will be done with that type.
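The same widening can be written as a single expression by converting one operand before the multiplication. A minimal sketch (WideProduct is a name made up for this example):

```cpp
#include <cstdint>

// Multiplies two ints without overflowing: converting one operand first makes
// the compiler perform the whole multiplication in 64 bits.
std::int64_t WideProduct(int a, int b) {
    return static_cast<std::int64_t>(a) * b;
}
```

For instance, WideProduct(100000, 100000) yields 10000000000, a value that does not fit in a 32-bit int.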
However, that doesn't actually fix your problem, so let's modify your code a little to see where the problem lies:
#include <cstdio> // printf
#include <iostream>
using namespace std;
int main(){
int fact[1000]={1};
int n = 18, numberx = 0;
while(n-- > 0){
int number = ++numberx;
if(number == 0) { printf("%d", 1); continue; }
int flag = number;
int index = 0, length = 0;
//following lines we find the length of the entered number
while(flag!=0){
fact[index] = flag%10;
flag /= 10;
index++; length++;
}
//following lines are the multiplication code
while(number>1){
index = 0;
int temp = 0;
number--;
for(index = 0; index<length; index++){
long long int x = fact[index];
x *= number;
x += temp;
fact[index] = x%10;
temp = x/10;
}
//here we append the carry over left from multiplication
while(temp){
fact[index] = temp%10;
temp /= 10;
length++;
}
}
//print the array from most to least significant digit
printf("%d! = ", numberx); // number itself has been counted down to 1 by now
for(int i = length-1; i>=0; i--){
printf("%d ", fact[i]);
}
printf("\n");
}
return 0;
}
Running this gives you:
1! = 1
2! = 2
3! = 6
4! = 2 4
5! = 1 2 0
6! = 7 2 0
7! = 5 0 4 0
8! = 4 0 3 2 0
9! = 3 6 2 8 8 0
10! = 3 6 2 8 8 0 0
11! = 3 9 9 1 6 8 0 0
12! = 4 7 9 0 0 1 6 0 0
13! = 6 2 2 7 0 2 0 8 0 0
14! = 8 7 1 7 8 2 9 1 2 0 0
15! = 1 3 0 7 6 7 4 3 6 8 0 0 0
16! = 2 0 9 2 2 7 8 9 8 8 8 0 0 0
17! = 3 5 5 6 8 7 4 2 8 0 9 6 0 0 0
18! = 1 9 9 1 0 4 7 1 7 3 8 5 7 2 8 0 0 0
which is, as you state, okay up until 18!, where it fails. In fact, you can see the ratio between 17! and 18! is roughly 560 rather than 18, so that's where we should look.
Let's first strip out the extraneous stuff by starting at 17!. That can be done simply by changing a couple of starting values:
int n = 2, numberx = 16;
and that gives:
17! = 3 5 5 6 8 7 4 2 8 0 9 6 0 0 0
18! = 1 9 9 1 0 4 7 1 7 3 8 5 7 2 8 0 0 0
Then we can add debug code to see what's happening, outputting temporary results along the way. The main loop can become:
while(number>1){
index = 0;
int temp = 0;
number--;
if (numberx > 17) printf("\n");
for(index = 0; index<length; index++){
if (numberx > 17) printf("index %d fact[] %d number %d temp %d", index, fact[index], number, temp);
long long int x = fact[index];
x *= number;
x += temp;
fact[index] = x%10;
temp = x/10;
if (numberx > 17) printf(" -> fact[] %d temp %d\n", fact[index], temp);
}
//here we append the carry over left from multiplication
while(temp){
fact[index] = temp%10;
temp /= 10;
length++;
}
if (numberx > 17) {
printf("temp: ");
for(int i = length-1; i>=0; i--){
printf("%d ", fact[i]);
}
printf("\n");
}
}
This shows you exactly where things start to go wrong (the // comments were added by me):
17! = 3 5 5 6 8 7 4 2 8 0 9 6 0 0 0
index 0 fact[] 8 number 17 temp 0 -> fact[] 6 temp 13
index 1 fact[] 1 number 17 temp 13 -> fact[] 0 temp 3
temp: 3 0 6 // okay: 18 * 17 = 306
index 0 fact[] 6 number 16 temp 0 -> fact[] 6 temp 9
index 1 fact[] 0 number 16 temp 9 -> fact[] 9 temp 0
index 2 fact[] 3 number 16 temp 0 -> fact[] 8 temp 4
temp: 4 8 9 6 // okay 306 * 16 = 4896
index 0 fact[] 6 number 15 temp 0 -> fact[] 0 temp 9
index 1 fact[] 9 number 15 temp 9 -> fact[] 4 temp 14
index 2 fact[] 8 number 15 temp 14 -> fact[] 4 temp 13
index 3 fact[] 4 number 15 temp 13 -> fact[] 3 temp 7
temp: 7 3 4 4 0 // okay 4896 * 15 = 73440
index 0 fact[] 0 number 14 temp 0 -> fact[] 0 temp 0
index 1 fact[] 4 number 14 temp 0 -> fact[] 6 temp 5
index 2 fact[] 4 number 14 temp 5 -> fact[] 1 temp 6
index 3 fact[] 3 number 14 temp 6 -> fact[] 8 temp 4
index 4 fact[] 7 number 14 temp 4 -> fact[] 2 temp 10
temp: 8 1 2 8 1 6 0 // no good: 73440 * 14 = 1028160 !!!
      1 0 2 8 1 6 0 // is what it should be
With a bit of thought, it appears to be the point where the final "carry" from the multiplication is greater than nine, meaning the bug is almost certainly in the code for handling that:
while(temp){
fact[index] = temp%10;
temp /= 10;
length++;
}
Thinking about that (and comparing it to other code that changes index and length together), it becomes obvious - even though you increase the length of the array, you're not increasing the index. That means, for a final carry of ten or more, the subsequent carry will not populate the correct index, it will simply overwrite the same index each time.
This can be seen here:
temp: 8 1 2 8 1 6 0 // no good: 73440 * 14 = 1028160 !!!
      1 0 2 8 1 6 0 // is what it should be
where it will have placed the zero (10 % 10) at that second location (increasing the length) but then placed the one (10 / 10) at the same index, leaving the 8 at whatever value it had before.
So, if we increment index as well, what do we see (going back to the less verbose code)?
1! = 1
2! = 2
3! = 6
4! = 2 4
5! = 1 2 0
6! = 7 2 0
7! = 5 0 4 0
8! = 4 0 3 2 0
9! = 3 6 2 8 8 0
10! = 3 6 2 8 8 0 0
11! = 3 9 9 1 6 8 0 0
12! = 4 7 9 0 0 1 6 0 0
13! = 6 2 2 7 0 2 0 8 0 0
14! = 8 7 1 7 8 2 9 1 2 0 0
15! = 1 3 0 7 6 7 4 3 6 8 0 0 0
16! = 2 0 9 2 2 7 8 9 8 8 8 0 0 0
17! = 3 5 5 6 8 7 4 2 8 0 9 6 0 0 0
18! = 6 4 0 2 3 7 3 7 0 5 7 2 8 0 0 0
19! = 1 2 1 6 4 5 1 0 0 4 0 8 8 3 2 0 0 0
20! = 2 4 3 2 9 0 2 0 0 8 1 7 6 6 4 0 0 0 0
That solves your specific problem and hopefully provides some education on debugging as well :-)
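For reference, the corrected carry handling can be sketched as a standalone function. This is an illustration rather than the exact program above: FactorialDigits is a name invented here, and it stores the digits in a std::vector so that appending a carry digit grows the length and advances the write position in one step, which rules out the index/length mismatch.

```cpp
#include <string>
#include <vector>

// Returns n! as a decimal string, using schoolbook multiplication on an
// array of digits (least significant digit first).
std::string FactorialDigits(int n) {
    std::vector<int> digits{1};
    for (int factor = 2; factor <= n; ++factor) {
        int carry = 0;
        for (int& d : digits) {
            const int x = d * factor + carry;
            d = x % 10;
            carry = x / 10;
        }
        // Append every digit of the remaining carry; each push_back moves on
        // to a new position, so a carry of ten or more cannot overwrite itself.
        while (carry != 0) {
            digits.push_back(carry % 10);
            carry /= 10;
        }
    }
    std::string text;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        text.push_back(static_cast<char>('0' + *it));
    }
    return text;
}
```

Calling FactorialDigits(18) gives 6402373705728000, matching the corrected output above.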

QuickSort crash on odd sized set

I've been trying to implement quicksort and have been having a lot of problems. I even copied a lot of implementations and accepted answers from the net, and they ALL crash on an odd-sized array/vector if you run them enough times (each run sorts a fresh set of random numbers, rather than pretending my code works just because it can sort one particular set of numbers).
Here is my code, with prints to help debug the error.
template <typename T>
void quickSortMidPivot(vector<T>&vec, size_t left, size_t right)
{
mcount++;
if(right - left < 1)
return;
//crash all the time
//if(left >= right)
// return;
size_t l = left;
size_t r = right;
T pivot = vec[left + ((right-left)/2)];
cout << endl << "PivotValue:" << pivot << endl;
while (l <= r)
{
while (vec[l] < pivot)
l++;
while (vec[r] > pivot)
r--;
if (l <= r) {
cout << endl << "swap:" << vec[l] << "&" << vec[r] << endl;
std::swap(vec[l], vec[r]);
l++;
r--;
for (int i =left; i<=right; i++)
cout << vec[i] << " ";
}
}
cout << endl << "left:" << left << " r:" << r << endl;
cout << "l:" << l << " right:" << right << endl;
if(left < r)
quickSortMidPivot(vec, left, r);
if(l < right)
quickSortMidPivot(vec, l, right);
}
//in main
quickSortMidPivot(dsVector, 0, dsVector.size() - 1);
mcount is a global just so that I can count the number of recursive calls. Help me figure out the most effective implementation...
Here is some debug info.
When run on even sized vector.
Test values are (PRE-SORTING):
8 4 6 5 2 4 1 2
PivotValue:5
swap:8&2
2 4 6 5 2 4 1 8
swap:6&1
2 4 1 5 2 4 6 8
swap:5&4
2 4 1 4 2 5 6 8
left:0 r:4
l:5 right:7
PivotValue:1
swap:2&1
1 4 2 4 2
left:0 r:0
l:1 right:4
PivotValue:2
swap:4&2
2 2 4 4
swap:2&2
2 2 4 4
left:1 r:1
l:3 right:4
PivotValue:4
swap:4&4
4 4
left:3 r:3
l:4 right:4
PivotValue:6
swap:6&6
5 6 8
left:5 r:5
l:7 right:7
# Recursions:5 0
Data Sorted.
Sorted test values are (POST-SORTING):
1 2 2 4 4 5 6 8
Here is a case with an odd-sized array (9). It works 90% of the time.
Test values are (PRE-SORTING):
7 7 5 6 5 8 9 5 8
PivotValue:5
swap:7&5
5 7 5 6 5 8 9 7 8
swap:7&5
5 5 5 6 7 8 9 7 8
swap:5&5
5 5 5 6 7 8 9 7 8
left:0 r:1
l:3 right:8
PivotValue:5
swap:5&5
5 5
left:0 r:0
l:1 right:1
PivotValue:8
swap:8&8
6 7 8 9 7 8
swap:9&7
6 7 8 7 9 8
left:3 r:6
l:7 right:8
PivotValue:7
swap:7&7
6 7 8 7
left:3 r:4
l:5 right:6
PivotValue:6
swap:6&6
6 7
left:3 r:2
l:4 right:4
PivotValue:8
swap:8&7
7 8
left:5 r:5
l:6 right:6
PivotValue:9
swap:9&8
8 9
left:7 r:7
l:8 right:8
# Recursions:7 0
Data Sorted.
Sorted test values are (POST-SORTING):
5 5 5 6 7 7 8 8 9
Here is the print output for a case where an odd-sized (9) vector input causes the crash.
Test values are (PRE-SORTING):
8 3 2 3 9 3 8 1 5
PivotValue:9
swap:9&5
8 3 2 3 5 3 8 1 9
left:0 r:7
l:8 right:8
PivotValue:3
swap:8&1
1 3 2 3 5 3 8 8
swap:3&3
1 3 2 3 5 3 8 8
swap:3&3
1 3 2 3 5 3 8 8
left:0 r:2
l:4 right:7
PivotValue:3
swap:3&2
1 2 3
left:0 r:1
l:2 right:2
PivotValue:1
swap:1&1
1 2
swap:2&0
1 0
swap:3&0
1 0
swap:3&1
1 0
swap:5&0
1 0
swap:3&1
1 0
swap:8&0
1 0
swap:8&0
1 0
swap:9&0
1 0
swap:7274596&0
1 0
swap:666050571&0
1 0
swap:369110150&0
1 0
swap:1&0
1 0
swap:1&0
1 0
swap:110&0
1 0
swap:649273354&0
1 0
swap:134229126&0
1 0
swap:3764640&0
1 0
swap:2293216&0
1 0
swap:8&0
1 0
swap:2&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764672&0
1 0
swap:3764608&0
1 0
swap:3&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764704&0
1 0
swap:3764640&0
1 0
swap:2&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764736&0
1 0
swap:3764672&0
1 0
swap:3&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764768&0
1 0
swap:3764704&0
1 0
swap:9&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764800&0
1 0
swap:3764736&0
1 0
swap:3&0
1 0
swap:6619252&0
1 0
swap:649273354&0
1 0
swap:134229127&0
1 0
swap:3764832&0
1 0
swap:3764768&0
1 0
swap:8&0
1 0
swap:666050571&0
1 0
swap:402664583&0
1 0
swap:3765152&0
1 0
swap:3764800&0
1 0
swap:1&0
1 0
swap:900931609&0
1 0
swap:268446854&0
1 0
swap:2046&0
1 0
swap:2046&0
1 0
swap:649273354&0
1 0
swap:134229140&0
1 0
swap:2293216&0
1 0
swap:3764832&0
1 0
swap:5&0
1 0
swap:11399&0
1 0
swap:3735896&0
1 0
swap:3735896&0
1 0
swap:548610060&1
1 0
swap:50342980&0
1 0
swap:6356944&-1
1 0
swap:3735800&-2
1 0
swap:3735648&0
1 0
swap:3735648&-1
1 0
swap:3768320&0
1 0
swap:32768&1
1 0
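One reading of the failing trace: l, r, left and right are all size_t, so when a partition ends with r at 0, the r-- wraps around to a huge value, `l <= r` stays true, and the loop starts indexing far outside the vector (which would explain swapped values such as 7274596). The trace is consistent with this in the call on the two elements 1 2, where the first swap happens at index 0. A minimal sketch of the same mid-pivot scheme with signed indices follows; the choice of std::ptrdiff_t and the name QuickSortSigned are assumptions made for this example, and the debug prints are left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
void QuickSortSigned(std::vector<T>& vec, std::ptrdiff_t left, std::ptrdiff_t right) {
    if (left >= right) return;  // zero or one element: nothing to sort
    std::ptrdiff_t l = left;
    std::ptrdiff_t r = right;
    const T pivot = vec[left + (right - left) / 2];
    while (l <= r) {
        while (vec[l] < pivot) ++l;
        while (vec[r] > pivot) --r;
        if (l <= r) {
            std::swap(vec[l], vec[r]);
            ++l;
            --r;  // may legitimately become left - 1, which a signed type can hold
        }
    }
    QuickSortSigned(vec, left, r);
    QuickSortSigned(vec, l, right);
}
```

It would be called as QuickSortSigned(dsVector, 0, static_cast<std::ptrdiff_t>(dsVector.size()) - 1), which also behaves sensibly for an empty vector because the right bound is then -1.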

Find all puddles on the square (algorithm)

The problem is defined as follows:
You're given a square that is paved with flat flagstones of size 1m x 1m. Grass surrounds the square. Flagstones may be at different heights. It starts raining. Determine where puddles will form and compute how much water they will contain. Water doesn't flow through the corners. Any area of grass can soak up any volume of water at any time.
Input:
width height
width*height non-negative numbers describing the height of each flagstone above grass level.
Output:
Volume of water from puddles.
width*height signs describing where puddles will form and where they won't.
. - no puddle
# - puddle
Examples
Input:
8 8
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Output:
11
........
........
..#.....
....#...
........
..####..
........
........
Input:
16 16
8 0 1 0 0 0 0 2 2 4 3 4 5 0 0 2
6 2 0 5 2 0 0 2 0 1 0 3 1 2 1 2
7 2 5 4 5 2 2 1 3 6 2 0 8 0 3 2
2 5 3 3 0 1 0 3 3 0 2 0 3 0 1 1
1 0 1 4 1 1 2 0 3 1 1 0 1 1 2 0
2 6 2 0 0 3 5 5 4 3 0 4 2 2 2 1
4 2 0 0 0 1 1 2 1 2 1 0 4 0 5 1
2 0 2 0 5 0 1 1 2 0 7 5 1 0 4 3
13 6 6 0 10 8 10 5 17 6 4 0 12 5 7 6
7 3 0 2 5 3 8 0 3 6 1 4 2 3 0 3
8 0 6 1 2 2 6 3 7 6 4 0 1 4 2 1
3 5 3 0 0 4 4 1 4 0 3 2 0 0 1 0
13 3 6 0 7 5 3 2 21 8 13 3 5 0 13 7
3 5 6 2 2 2 0 2 5 0 7 0 1 3 7 5
7 4 5 3 4 5 2 0 23 9 10 5 9 7 9 8
11 5 7 7 9 7 1 0 17 13 7 10 6 5 8 10
Output:
103
................
..#.....###.#...
.......#...#.#..
....###..#.#.#..
.#..##.#...#....
...##.....#.....
..#####.#..#.#..
.#.#.###.#..##..
...#.......#....
..#....#..#...#.
.#.#.......#....
...##..#.#..##..
.#.#.........#..
......#..#.##...
.#..............
................
I tried different ways: flood fill from the max value, then from the min value, but it doesn't work for every input or it requires complicated code. Any ideas?
I'm interested in an algorithm with complexity O(n^2) or O(n^3).
Summary
I would be tempted to try and solve this using a disjoint-set data structure.
The algorithm would be to iterate over all heights in the map performing a floodfill operation at each height.
Details
For each height x (starting at 0)
Connect all flagstones of height x to their neighbours if the neighbour height is <= x (storing connected sets of flagstones in the disjoint set data structure)
Remove any sets that are connected to the grass
Mark all flagstones of height x in still remaining sets as being puddles
Add the total count of flagstones in remaining sets to a total t
At the end t gives the total volume of water.
Worked Example
0 0 0 0 0 1 0 0
0 1 1 1 0 1 0 0
0 1 0 2 1 2 4 5
0 1 1 2 0 2 4 5
0 3 3 3 3 3 3 4
0 3 0 1 2 0 3 4
0 3 3 3 3 3 3 0
0 0 0 0 0 0 0 0
Connect all flagstones of height 0 into sets A,B,C,D,E,F
A A A A A 1 B B
A 1 1 1 A 1 B B
A 1 C 2 1 2 4 5
A 1 1 2 D 2 4 5
A 3 3 3 3 3 3 4
A 3 E 1 2 F 3 4
A 3 3 3 3 3 3 A
A A A A A A A A
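Remove flagstones connecting to the grass, and mark remaining as puddles (left: remaining flagstones, with - for a removed one; right: puddle map so far)
- - - - - 1 - -    . . . . . . . .
- 1 1 1 - 1 - -    . . . . . . . .
- 1 C 2 1 2 4 5    . . # . . . . .
- 1 1 2 D 2 4 5    . . . . # . . .
- 3 3 3 3 3 3 4    . . . . . . . .
- 3 E 1 2 F 3 4    . . # . . # . .
- 3 3 3 3 3 3 -    . . . . . . . .
- - - - - - - -    . . . . . . . .
Count remaining set size t=4
Connect all of height 1
- - - - - G - -    . . . . . . . .
- C C C - G - -    . . . . . . . .
- C C 2 D 2 4 5    . . # . . . . .
- C C 2 D 2 4 5    . . . . # . . .
- 3 3 3 3 3 3 4    . . . . . . . .
- 3 E E 2 F 3 4    . . # . . # . .
- 3 3 3 3 3 3 -    . . . . . . . .
- - - - - - - -    . . . . . . . .
Remove flagstones connecting to the grass, and mark remaining as puddles
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .
- - - 2 - 2 4 5    . . # . . . . .
- - - 2 - 2 4 5    . . . . # . . .
- 3 3 3 3 3 3 4    . . . . . . . .
- 3 E E 2 F 3 4    . . # # . # . .
- 3 3 3 3 3 3 -    . . . . . . . .
- - - - - - - -    . . . . . . . .
t=4+3=7
Connect all of height 2
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .
- - - A - B 4 5    . . # . . . . .
- - - A - B 4 5    . . . . # . . .
- 3 3 3 3 3 3 4    . . . . . . . .
- 3 E E E E 3 4    . . # # . # . .
- 3 3 3 3 3 3 -    . . . . . . . .
- - - - - - - -    . . . . . . . .
Remove flagstones connecting to the grass, and mark remaining as puddles
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .
- - - - - - 4 5    . . # . . . . .
- - - - - - 4 5    . . . . # . . .
- 3 3 3 3 3 3 4    . . . . . . . .
- 3 E E E E 3 4    . . # # # # . .
- 3 3 3 3 3 3 -    . . . . . . . .
- - - - - - - -    . . . . . . . .
t=7+4=11
Connect all of height 3
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .
- - - - - - 4 5    . . # . . . . .
- - - - - - 4 5    . . . . # . . .
- E E E E E E 4    . . . . . . . .
- E E E E E E 4    . . # # # # . .
- E E E E E E -    . . . . . . . .
- - - - - - - -    . . . . . . . .
Remove flagstones connecting to the grass, and mark remaining as puddles
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .
- - - - - - 4 5    . . # . . . . .
- - - - - - 4 5    . . . . # . . .
- - - - - - - 4    . . . . . . . .
- - - - - - - 4    . . # # # # . .
- - - - - - - -    . . . . . . . .
- - - - - - - -    . . . . . . . .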
After doing this for heights 4 and 5 nothing will remain.
A preprocessing step to create lists of all locations with each height should mean that the algorithm is close to O(n^2).
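The steps above can be sketched in code as follows. This is one possible reading of the description, not a reference implementation: the names DisjointSet, PuddleResult and FindPuddles are invented for the example, the surrounding grass is modelled as one extra node in the disjoint set, and instead of visiting every integer height it jumps between the heights that actually occur and multiplies the trapped count by the gap.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Minimal disjoint-set (union-find) that also tracks set sizes.
struct DisjointSet {
    std::vector<int> parent, size;
    explicit DisjointSet(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int Find(int v) {
        while (parent[v] != v) v = parent[v] = parent[parent[v]];  // path halving
        return v;
    }
    void Union(int a, int b) {
        a = Find(a);
        b = Find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

struct PuddleResult {
    long long volume;
    std::vector<std::string> map;  // '#' = puddle, '.' = no puddle
};

PuddleResult FindPuddles(const std::vector<std::vector<int>>& h) {
    const int rows = static_cast<int>(h.size());
    const int cols = static_cast<int>(h[0].size());
    const int grass = rows * cols;  // extra node standing for the surrounding grass
    auto height = [&](int id) { return h[id / cols][id % cols]; };

    // Preprocessing: all flagstones ordered by height.
    std::vector<int> order(rows * cols);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) { return height(a) < height(b); });

    DisjointSet sets(rows * cols + 1);
    PuddleResult result{0, std::vector<std::string>(rows, std::string(cols, '.'))};
    const int dr[] = {-1, 1, 0, 0}, dc[] = {0, 0, -1, 1};

    for (std::size_t begin = 0; begin < order.size();) {
        const int level = height(order[begin]);
        std::size_t end = begin;
        // Connect every flagstone of this height to neighbours that are not higher.
        for (; end < order.size() && height(order[end]) == level; ++end) {
            const int r = order[end] / cols, c = order[end] % cols;
            for (int k = 0; k < 4; ++k) {
                const int nr = r + dr[k], nc = c + dc[k];
                if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) sets.Union(order[end], grass);
                else if (h[nr][nc] <= level) sets.Union(order[end], nr * cols + nc);
            }
        }
        // Flagstones of this height whose set does not reach the grass are puddles.
        for (std::size_t k = begin; k < end; ++k)
            if (sets.Find(order[k]) != sets.Find(grass))
                result.map[order[k] / cols][order[k] % cols] = '#';
        // Every flagstone handled so far that is cut off from the grass holds one
        // more layer of water for each unit of height up to the next level.
        if (end < order.size()) {
            const long long trapped =
                static_cast<long long>(end) - (sets.size[sets.Find(grass)] - 1);
            result.volume += trapped * (height(order[end]) - level);
        }
        begin = end;
    }
    return result;
}
```

On the 8 by 8 example this sketch follows the same sequence as the worked example (4, then 3, then 4 trapped flagstones) and arrives at a volume of 11. The sort makes it O(n^2 log n) for an n by n square; bucketing by height, as suggested above, would remove the logarithmic factor when heights are small.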
This seems to be working nicely. The idea is a recursive function that checks whether there is an "outward flow" that lets the water escape to the edge. Cells that have no such escape will puddle. I tested it on your two input files and it works quite nicely; I copied the output for these two files below. Pardon my nasty use of global variables and whatnot, I figured it was the concept behind the algorithm that mattered, not good style :)
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
int SIZE_X;
int SIZE_Y;
bool **result;
int **INPUT;
bool flowToEdge(int x, int y, int value, bool* visited) {
if(x < 0 || x == SIZE_X || y < 0 || y == SIZE_Y) return true;
if(visited[(x * SIZE_Y) + y]) return false; // each row of visited holds SIZE_Y entries
if(value < INPUT[x][y]) return false;
visited[(x * SIZE_Y) + y] = true;
bool left = false;
bool right = false;
bool up = false;
bool down = false;
left = flowToEdge(x-1, y, value, visited);
right = flowToEdge(x+1, y, value, visited);
up = flowToEdge(x, y+1, value, visited);
down = flowToEdge(x, y-1, value, visited);
return (left || up || down || right);
}
int main() {
ifstream myReadFile;
myReadFile.open("test.txt");
myReadFile >> SIZE_X;
myReadFile >> SIZE_Y;
INPUT = new int*[SIZE_X];
result = new bool*[SIZE_X];
for(int i = 0; i < SIZE_X; i++) {
INPUT[i] = new int[SIZE_Y];
result[i] = new bool[SIZE_Y];
for(int j = 0; j < SIZE_Y; j++) {
int someInt;
myReadFile >> someInt;
INPUT[i][j] = someInt;
result[i][j] = false;
}
}
for(int i = 0; i < SIZE_X; i++) {
for(int j = 0; j < SIZE_Y; j++) {
bool visited[SIZE_X][SIZE_Y];
for(int k = 0; k < SIZE_X; k++)//You can avoid this looping by using maps with pairs of coordinates instead
for(int l = 0; l < SIZE_Y; l++)
visited[k][l] = 0;
result[i][j] = flowToEdge(i,j, INPUT[i][j], &visited[0][0]);
}
}
for(int i = 0; i < SIZE_X; i++) {
cout << endl;
for(int j = 0; j < SIZE_Y; j++)
cout << result[i][j];
}
cout << endl;
}
The 16 by 16 file (1 means the water can reach the edge, 0 means a puddle):
1111111111111111
1101111100010111
1111111011101011
1111000110101011
1011001011101111
1110011111011111
1100000101101011
1010100010110011
1110111111101111
1101101011011101
1010111111101111
1110011010110011
1010111111111011
1111110110100111
1011111111111111
1111111111111111
The 8 by 8 file
11111111
11111111
11011111
11110111
11111111
11000011
11111111
11111111
You could optimize this algorithm easily and considerably by doing several things. First, returning true immediately upon finding a route would speed it up considerably. You could also connect it globally to the current set of results, so that any given point would only have to find a path to an already known flow point, and not all the way to the edge.
As for the work involved, each n will have to examine each node. However, with optimizations, we should be able to get this much lower than n^2 for most cases, but it is still an n^3 algorithm in the worst case... though creating this would be very difficult (with proper optimization logic... dynamic programming for the win!)
EDIT:
The modified code works for the following circumstances:
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 1 1
And these are the results:
11111111
10000001
10111101
10100101
10110101
10110101
10000101
11111111
Now when we remove that 1 at the bottom we want to see no puddling.
8 8
1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 1
1 0 1 1 1 1 0 1
1 0 1 0 0 1 0 1
1 0 1 1 0 1 0 1
1 0 1 1 0 1 0 1
1 0 0 0 0 1 0 1
1 1 1 1 1 1 0 1
And these are the results
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1