What is the maximum number of comparisons to heapify an array? - heap

Is there a general formula to calculate the maximum number of comparisons to heapify n elements?
If not, is 13 the max number of comparisons to heapify an array of 8 elements?
My reasoning is as such:
at h = 0, 1 node, 0 comparisons, 1* 0 = 0 comparisons
at h = 1, 2 nodes, 1 comparison each, 2*1 = 2 comparisons
at h = 2, 4 nodes, 2 comparisons each, 4*2 = 8 comparisons
at h = 3, 1 node, 3 comparisons each, 1*3 = 3 comparisons
Total = 0 + 2 + 8 + 3 =13

Accepted theory is that build-heap requires at most (2N - 2) comparisons. So the maximum number of comparisons required should be 14. We can confirm that easily enough by examining a heap of 8 elements:
        7
      /   \
     3     1
    / \   / \
   5   4 8   2
  /
 6
Here, the 4 leaf nodes will never move down. The nodes 5 and 1 can move down 1 level. 3 could move down two levels. And 7 could move down 3 levels. So the maximum number of level moves is:
(0*4)+(1*2)+(2*1)+(3*1) = 7
Every level move requires 2 comparisons, so the maximum number of comparisons would be 14.
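The level-move count above can be reproduced for any array size with a small C++ sketch. It only mirrors the reasoning of this answer (two comparisons per level move) and does not count the comparisons of any particular build-heap implementation; the function names are made up for this illustration.

```cpp
#include <cstddef>

// Height of the node at 0-based index i in a complete binary tree of n nodes:
// the number of edges on the path that keeps following left children.
std::size_t heightOf(std::size_t i, std::size_t n) {
    std::size_t h = 0;
    while (2 * i + 1 < n) {
        i = 2 * i + 1;
        ++h;
    }
    return h;
}

// Sum of all node heights: the maximum number of level moves in build-heap.
std::size_t maxLevelMoves(std::size_t n) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += heightOf(i, n);
    }
    return total;
}

// Bound used in the answer above: two comparisons per level move.
std::size_t comparisonBound(std::size_t n) {
    return 2 * maxLevelMoves(n);
}
```

Calling maxLevelMoves(8) and comparisonBound(8) gives the two figures derived above for the 8-element heap.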

Related

How to construct a tree given its depth and postorder traversal, then print its preorder traversal

I need to construct a tree given its depth and postorder traversal, and then I need to generate the corresponding preorder traversal. Example:
Depth: 2 1 3 3 3 2 2 1 1 0
Postorder: 5 2 8 9 10 6 7 3 4 1
Preorder(output): 1 2 5 3 6 8 9 10 7 4
I've defined two arrays that contain the postorder sequence and depth. After that, I couldn't come up with an algorithm to solve it.
Here's my code:
int postorder[1000];
int depth[1000];
string postorder_nums;
getline(cin, postorder_nums);
istringstream token1(postorder_nums);
string tokenString1;
int idx1 = 0;
while (token1 >> tokenString1) {
    postorder[idx1] = stoi(tokenString1);
    idx1++;
}
string depth_nums;
getline(cin, depth_nums);
istringstream token2(depth_nums);
string tokenString2;
int idx2 = 0;
while (token2 >> tokenString2) {
    depth[idx2] = stoi(tokenString2);
    idx2++;
}
Tree tree(1);
You can do this actually without constructing a tree.
First note that if you reverse the postorder sequence, you get a kind of preorder sequence, but with the children visited in opposite order. So we'll use this fact and iterate over the given arrays from back to front, and we will also store values in the output from back to front. This way at least the order of siblings will come out right.
The first value we get from the input will thus always be the root value. Obviously we cannot store this value at the end of the output array, as it really should come first. But we will put this value on a stack until all other values have been processed. The same will happen for any value that is followed by a "deeper" value (again: we are processing the input in reversed order). But as soon as we find a value that is not deeper, we flush a part of the stack into the output array (also filling it up from back to front).
When all values have been processed, we just need to flush the remaining values from the stack into the output array.
Now, we can optimise our space usage here: as we fill the output array from the back, we have free space at its front to use as the stack space for this algorithm. This has the nice consequence that when we arrive at the end we don't need to flush the stack anymore, because it is already there in the output, with every value where it should be.
Here is the code for this algorithm where I did not include the input collection, which apparently you already have working:
// Input example
int depth[] = {2, 1, 3, 3, 3, 2, 2, 1, 1, 0};
int postorder[] = {5, 2, 8, 9, 10, 6, 7, 3, 4, 1};
// Number of values in the input (a compile-time constant, so the array below is standard C++)
const int n = sizeof(depth) / sizeof(depth[0]);
int preorder[n]; // This will contain the output
int j = n; // index where last value was stored in preorder
int stackSize = 0; // how many entries are used as stack in preorder
for (int i = n - 1; i >= 0; i--) {
    while (depth[i] < stackSize) {
        preorder[--j] = preorder[--stackSize]; // flush it
    }
    preorder[stackSize++] = postorder[i]; // stack it
}
// Output the result:
for (int i = 0; i < n; i++) {
    std::cout << preorder[i] << " ";
}
std::cout << "\n";
This algorithm has an auxiliary space complexity of O(1) -- so not counting the memory needed for the input and the output -- and has a time complexity of O(n).
I won't give you the code, but some hints how to solve the problem.
First, for postorder graph processing you first visit the children, then print (process) the value of the node. So, the tree or subtree parent is the last thing that is processed in its (sub)tree. I replace 10 with 0 for better indentation:
2 1 3 3 3 2 2 1 1 0
--------------------
5 2 8 9 0 6 7 3 4 1
As explained above, node of depth 0, or the root, is the last one. Let's lower all other nodes 1 level down:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
5 2 8 9 0 6 7 3 4
Now identify all nodes of depth 1, and lower all that is not of depth 0 or 1:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
  2           3 4
5   8 9 0 6 7
As you can see, (5,2) is in a subtree, (8,9,10,6,7,3) in another subtree, (4) is a single-node subtree. In other words, all that is to the left of 2 is its subtree, all to the right of 2 and to the left of 3 is in the subtree of 3, all between 3 and 4 is in the subtree of 4 (here: empty).
Now let's deal with depth 3 in a similar way:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
  2           3 4
5         6 7
    8 9 0
2 is the parent of 5;
6 is the parent of 8, 9, 10;
3 is the parent of 6, 7;
or very explicitly:
2 1 3 3 3 2 2 1 1 0
-------------------
                  1
   /           / /
  2           3 4
 /         / /
5         6 7
     / / /
    8 9 0
This is how you can construct a tree from the data you have.
EDIT
Clearly, this problem can be solved easily by recursion. In each step you find the lowest depth, print the node, and call the same function recursively for each of its subtrees as its argument, where the subtree is defined by looking for current_depth + 1. If the depth is passed as another argument, it can save the necessity of computing the lowest depth.
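A minimal C++ sketch of the recursive idea above, assuming the input is valid (the root is the last entry of its range, and every child of a node at depth d appears with depth d + 1). The names emitPreorder and preorderFromPostorder are made up for this illustration; the current depth is passed as an argument, as suggested, so it never has to be searched for.

```cpp
#include <vector>

// Appends the preorder of the subtree stored in postorder[lo..hi] (inclusive).
// The root of that subtree sits at index hi and has depth d.
void emitPreorder(const std::vector<int>& postorder, const std::vector<int>& depth,
                  int lo, int hi, int d, std::vector<int>& out) {
    out.push_back(postorder[hi]);  // preorder: the root comes first
    int start = lo;                // first index of the next child's subtree
    for (int i = lo; i < hi; ++i) {
        if (depth[i] == d + 1) {   // index i is a child and closes its own subtree
            emitPreorder(postorder, depth, start, i, d + 1, out);
            start = i + 1;
        }
    }
}

// Returns the preorder sequence for the given postorder sequence and depths.
std::vector<int> preorderFromPostorder(const std::vector<int>& postorder,
                                       const std::vector<int>& depth) {
    std::vector<int> out;
    if (!postorder.empty()) {
        const int last = static_cast<int>(postorder.size()) - 1;
        emitPreorder(postorder, depth, 0, last, depth[last], out);
    }
    return out;
}
```

Each level of the recursion rescans its range, so this sketch runs in time proportional to n times the height of the tree rather than in O(n).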

Can we really avoid extra space when all the values are non-negative?

This question is a follow-up of another one I had asked quite a while ago:
We have been given an array of integers and another number k, and we need to find the total number of continuous subarrays whose sum equals k. For example, for the input [1,1,1] and k=2, the expected output is 2.
In the accepted answer, @talex says:
PS: BTW if all values are non-negative there is better algorithm. it doesn't require extra memory.
While I didn't think much about it then, I am curious about it now. IMHO, we will require extra memory. In the event that all the input values are non-negative, our running (prefix) sum will go on increasing, and as such, sure, we don't need an unordered_map to store the frequency of a particular sum. But, we will still need extra memory (perhaps an unordered_set) to store the running (prefix) sums that we get along the way. This obviously contradicts what @talex said.
Could someone please confirm if we absolutely do need extra memory or if it could be avoided?
Thanks!
Let's start with a slightly simpler problem: all values are positive (no zeros). In this case the sub arrays can overlap, but they cannot contain one another.
E.g.: arr = 2 1 5 1 1 5 1 2, Sum = 8
2 1 5 1 1 5 1 2
|---|
  |-----|
      |-----|
          |---|
But this situation can never occur:
* * * * * * *
  |-------|
    |---|
With this in mind, there is an algorithm that requires no extra space (well, O(1) space) and has O(n) time complexity. The idea is to keep left and right indexes that delimit the current sequence, together with the sum of the current sequence.
if the sum is k increment the counter, advance left and right
if the sum is less than k then advance right
else advance left
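The three rules above can be sketched in C++ as follows. This is only an illustration for the strictly positive case with k > 0; the function name is hypothetical, and the loop is written in the common "extend right, then shrink left" form rather than as three literal branches.

```cpp
#include <cstddef>
#include <vector>

// Counts the subarrays whose sum is exactly k.
// Assumes every value is strictly positive and k > 0.
long long countSubarraysPositive(const std::vector<int>& a, long long k) {
    long long count = 0;
    long long sum = 0;       // sum of the current window a[left..right]
    std::size_t left = 0;
    for (std::size_t right = 0; right < a.size(); ++right) {
        sum += a[right];     // advance right
        while (sum > k) {    // sum too large: advance left
            sum -= a[left++];
        }
        if (sum == k) {      // exact hit: count it
            ++count;
        }
    }
    return count;
}
```

Because all values are positive, shrinking the window strictly decreases the sum, so for each right end there is at most one matching left end.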
Now if there are zeros the intervals can contain one another, but only if the zeros are on the margins of the interval.
To adapt to non-negative numbers:
Do as above, except:
skip zeros when advancing left
if sum is k:
count consecutive zeros to the right of right, let's say zeroes_right_count
count consecutive zeros to the left of left, let's say zeroes_left_count
instead of incrementing the count as before, increase the counter by: (zeroes_left_count + 1) * (zeroes_right_count + 1)
Example:
... 7 0 0 5 1 2 0 0 0 9 ...
          ^   ^
         left right
Here we have 2 zeroes to the left and 3 zeros to the right. This makes (2 + 1) * (3 + 1) = 12 sequences with sum 8 here:
5 1 2
5 1 2 0
5 1 2 0 0
5 1 2 0 0 0
0 5 1 2
0 5 1 2 0
0 5 1 2 0 0
0 5 1 2 0 0 0
0 0 5 1 2
0 0 5 1 2 0
0 0 5 1 2 0 0
0 0 5 1 2 0 0 0
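As a cross-check for the non-negative case, here is a C++ sketch of a different O(1)-space formulation. It is not the zero-counting rule described above: it counts the subarrays with sum at most k and subtracts those with sum at most k - 1. The function names are made up for this illustration, and all values are assumed to be non-negative.

```cpp
#include <cstddef>
#include <vector>

// Number of subarrays whose sum is <= limit; assumes all values are >= 0.
long long countAtMost(const std::vector<int>& a, long long limit) {
    if (limit < 0) {
        return 0;  // no subarray of non-negative values has a negative sum
    }
    long long count = 0;
    long long sum = 0;       // sum of the current window a[left..right]
    std::size_t left = 0;
    for (std::size_t right = 0; right < a.size(); ++right) {
        sum += a[right];
        while (sum > limit) {
            sum -= a[left++];
        }
        // every subarray that ends at right and starts at left or later qualifies
        count += static_cast<long long>(right + 1 - left);
    }
    return count;
}

// Number of subarrays whose sum is exactly k, using only O(1) extra space.
long long countSubarraysNonNegative(const std::vector<int>& a, long long k) {
    return countAtMost(a, k) - countAtMost(a, k - 1);
}
```

The subtraction handles zeros at the margins without any special casing, at the cost of two passes over the array.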
I think this algorithm would work, using O(1) space.
We maintain two pointers to the beginning and end of the current subsequence, as well as the sum of the current subsequence. Initially, both pointers point to array[0], and the sum is obviously set to array[0].
Advance the end pointer (thus extending the subsequence to the right), and increase the sum by the value it points to, until that sum exceeds k. Then advance the start pointer (thus shrinking the subsequence from the left), and decrease the sum, until that sum gets below k. Keep doing this until the end pointer reaches the end of the array. Keep track of the number of times the sum was exactly k.

How do I optimize my code to find the product of all the contiguous subsequences of an array?

This is my try to count the contiguous subsequences of an array with product mod 4 is not equal to 2:
#include <iostream>
using namespace std;
int main() {
    long long int n, i, j, s, t, count = 0;
    cin>>n;
    long long int arr[n];
    count = 0;
    for(i = 0; i<n; i++) {
        cin>>arr[i];
    }
    for(i = 0; i<n; i++) {
        s = 1;
        for(j = i; j<n; j++) {
            s = s*arr[j];
            if(s%4!=2) {
                count++;
            }
        }
    }
    cout<<count;
    return 0;
}
However, I want to reduce the time taken by my code to execute. I am looking for a way to do it. Any help/hint would be appreciated.
Thank you.
What does this definition of contiguous subsequences mean?
Listing all the subsequences
Suppose we have the sequence:
A B C D E F
First of all, we should recognize that there is one substring for every unique start and end point. Let's use the notation C-F to mean all items from C through F: i.e.: C D E F.
We can list all subsequences in a triangular arrangement like this:
 A   B   C   D   E   F
  A-B B-C C-D D-E E-F
    A-C B-D C-E D-F
      A-D B-E C-F
        A-E B-F
          A-F
The first row lists all the subsequences of length 1.
The second row lists all the subsequences of length 2.
The third row lists all the subsequences of length 3. Etc.
The last row is the full sequence.
Modular arithmetic
Computing the product MOD 4 of a set of numbers
To figure out the product of a bunch of numbers MOD 4, we just need to look at each element of the set MOD 4. Intuitively, this is because when you multiply a bunch of numbers, the last digit of the result is determined entirely by the last digit of each factor. In this case "the last digit base 4" is the number mod 4.
The identity we are using is:
(A * B) MOD N == ((A MOD N) * (B MOD N)) MOD N
The table of products
Now we also have to look at the matrix of possible multiplications that might happen. It's a fairly small table and the interesting entries are given here:
2 * 2 = 4 4 MOD 4 = 0
2 * 3 = 6 6 MOD 4 = 2
3 * 3 = 9 9 MOD 4 = 1
So the results of multiplying any 2 numbers MOD 4 are given by this table:
+--------+---+---+---+---+
| Factor | 0 | 1 | 2 | 3 |
+--------+---+---+---+---+
| 0 | 0 | / | / | / |
| 1 | 0 | 1 | / | / |
| 2 | 0 | 2 | 0 | / |
| 3 | 0 | 3 | 2 | 1 |
+--------+---+---+---+---+
The /'s are omitted because of the symmetry of multiplication (A * B = B * A)
An example sequence
Now for each subsequence, let's compute the product MOD 4 of its elements.
Consider the following list of numbers
242 496 681 685 410 795
The first thing we do is take all these numbers MOD 4 and list them as the first row of our list of all subsequences triangle.
2 0 1 1 2 3
The second row is just the product of the pairs above it.
2 0 1 1 2 3
 0 0 1 2 2
In general, the Nth element of each row is the product, MOD 4, of:
the number just above it and to its left in the row above, and
the element in the first row that lies diagonally up and to its right
For example C = A * B
* * * * B *
 * * * / *
  * A / *
   * C *
    * *
     *
Again,
A is immediately up and left of C
B is diagonally right all the way to the top row from C
Now we can complete our triangle
2 0 1 1 2 3
 0 0 1 2 2
  0 0 2 2
   0 0 2
    0 0
     0
This can be computed easily in O(n^2) time.
Optimization
These optimizations do not improve the time complexity of the algorithm in its worst case, but can cause an early exit in the computation, and should therefore be included if time is to be reduced and the input is unknown.
Contagious 0's
Furthermore, as a matter of optimization, notice how contagious the 0's are. Anything times 0 is 0, so you can skip computing products of cells below a 0. In your case those sequences will not equal 2 MOD 4 once the product of one of its subsequences is determined to be equal to 0 MOD 4.
* * * 0 * *   // <-- this zero infects all cells below it
 * * 0 0 *
  * 0 0 0
   0 0 0
    0 0
     0
Need a 2 to make a 2.
Look back at the table of factors and products. Notice that the only way to get a product equal to 2 MOD 4 is for one of the factors to be equal to 2 MOD 4. That means a 2 can only appear below another 2, so we only need to compute the entries of the table that lie below a 2. Other entries in the rows below can never become a 2.
You don't have to store more than one row at a time.
You only need O(n) storage to implement this. Working line by line, you can compute the values in a row entirely from the values in the first row and values in the row above.
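A minimal C++ sketch of the row-by-row computation, assuming the goal is the count of contiguous subsequences whose product is not 2 MOD 4. The function name is hypothetical. It keeps the first row plus one working row, and it leaves out the early-exit optimizations described above to stay short.

```cpp
#include <cstddef>
#include <vector>

// Counts the contiguous subsequences whose product is not congruent to 2 mod 4.
// Uses O(n^2) time and O(n) extra storage.
long long countProductsNotTwoMod4(const std::vector<long long>& arr) {
    const std::size_t n = arr.size();
    std::vector<int> first(n);  // first row of the triangle: each element mod 4
    for (std::size_t i = 0; i < n; ++i) {
        first[i] = static_cast<int>(((arr[i] % 4) + 4) % 4);  // also maps negatives to 0..3
    }
    std::vector<int> row(first);  // row[i] = product mod 4 of arr[i .. i+len-1]
    long long count = 0;
    for (std::size_t len = 1; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            if (len > 1) {
                // entry from the previous row times the newly included element
                row[i] = (row[i] * first[i + len - 1]) % 4;
            }
            if (row[i] != 2) {
                ++count;
            }
        }
    }
    return count;
}
```

Reducing every factor MOD 4 before multiplying also keeps the running product small, so it cannot overflow the way a plain running product of the raw values can.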
Reading the answers from the table
Now you can look at the rows of the triangle list as you generate them and read off which subsequences are to be included.
Entries with a 2 are to be excluded. All others are to be included.
2 0 1 1 2 3
 0 0 1 2 2
  0 0 2 2
   0 0 2
    0 0
     0
The excluded subsequences for the example (which I will list only because there are fewer of them in my example) are:
A
E
D-E
E-F
C-E
D-F
C-F
Which remember, according to our convention refer to the elements:
A
E
D E
E F
C D E
D E F
C D E F
Which are:
242
410
685 410
410 795
681 685 410
685 410 795
681 685 410 795
Hopefully it's obvious how to display the "included" sequences, rather than the "excluded" sequences, as I have shown above.
Displaying all the elements makes it take much longer.
Sadly, actually displaying all of the elements of such subsequences is still an O(N^3) operation in the worst case. (Imagine a sequence of all zeros.)
Summary
For me, I feel like an average developer could take the magic bullet observation made in the diagram below and write an implementation that has optimal time complexity.
C = A * B
* * * * B *
 * * * / *
  * A / *
   * C *
    * *
     *

Downscale array for decimal factor

Is there an efficient way to downscale the number of elements in an array by a decimal factor?
I want to downsize elements from one array by certain factor.
Example:
If I have 10 elements and need to scale down by factor 2.
1 2 3 4 5 6 7 8 9 10
scaled to
1.5 3.5 5.5 7.5 9.5
Grouping 2 by 2 and use arithmetic mean.
My problem is: what if I need to downsize an array with 10 elements to 6 elements? In theory I should group about 1.67 (10/6) elements and find their arithmetic mean, but how do I do that?
Before suggesting a solution, let's define "downsize" in a more formal way. I would suggest this definition:
Downsizing starts with an array a[N] and produces an array b[M] such that the following is true:
M <= N - otherwise it would be upsizing, not downsizing
SUM(b) = (M/N) * SUM(a) - The sum is reduced proportionally to the number of elements
Elements of a participate in computation of b in the order of their occurrence in a
Let's consider your example of downsizing 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 to six elements. The total for your array is 55, so the total for the new array would be (6/10)*55 = 33. We can achieve this total in two steps:
Walk the array a totaling its elements until we've reached the integer part of N/M fraction (it must be an improper fraction by rule 1 above)
Let's say that a[i] was the last element of a that we could take as a whole in the current iteration. Take the fraction of a[i+1] equal to the fractional part of N/M
Continue to the next number starting with the remaining fraction of a[i+1]
Once you are done, your array b would contain M numbers totaling to SUM(a). Walk the array once more, and scale the result by M/N.
Here is how it works with your example:
b[0] = a[0] + (2/3)*a[1] = 2.33333
b[1] = (1/3)*a[1] + a[2] + (1/3)*a[3] = 5
b[2] = (2/3)*a[3] + a[4] = 7.66666
b[3] = a[5] + (2/3)*a[6] = 10.6666
b[4] = (1/3)*a[6] + a[7] + (1/3)*a[8] = 13.3333
b[5] = (2/3)*a[8] + a[9] = 16
--------
Total = 55
Scaling down by 6/10 produces the final result:
1.4 3 4.6 6.4 8 9.6 (Total = 33)
Here is a simple implementation in C++:
// a is the input; b must already have its final size and be filled with zeros
double need = ((double)a.size()) / b.size();
double have = 0;
size_t pos = 0;
for (size_t i = 0 ; i != a.size() ; i++) {
    if (need >= have+1) {
        b[pos] += a[i];
        have++;
    } else {
        double frac = (need-have); // frac is less than 1 because of the "if" condition
        b[pos++] += frac * a[i]; // frac of a[i] goes to current element of b
        have = 1 - frac;
        if (pos < b.size()) { // guard: rounding can push pos past the end on the last element
            b[pos] += have * a[i]; // (1-frac) of a[i] goes to the next position of b
        }
    }
}
for (size_t i = 0 ; i != b.size() ; i++) {
    b[i] /= need;
}
You will need to resort to some form of interpolation, as the number of elements to average isn't integer.
You can consider computing the prefix sum of the array, i.e.
0 1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 10
yields by summation
0 1 2 3 4 5 6 7 8 9
1 3 6 10 15 21 28 36 45 55
Then perform linear interpolation to get the intermediate values that you are lacking, like at 0*, 10/6, 20/6, 30/6*, 40/6, 50/6, 60/6*. (Those with an asterisk are readily available).
0 1 10/6 2 3 20/6 4 5 6 40/6 7 8 50/6 9
1 3 15/3 6 10 35/3 15 21 28 100/3 36 45 145/3 55
Now you get fractional sums by subtracting values in pairs. The first average is
(15/3-1)/(10/6) = 12/5
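A hedged C++ sketch of the prefix-sum idea, under one assumption that differs from the table above: the prefix sums here start with a leading 0, so prefix[i] is the sum of the first i elements. Each output value is then the difference of two interpolated prefix sums divided by the group width N/M. The function name downscale is made up for this illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Downscales a to m values by averaging groups of a.size()/m (possibly fractional) elements.
std::vector<double> downscale(const std::vector<double>& a, std::size_t m) {
    const std::size_t n = a.size();
    if (n == 0 || m == 0) {
        return {};
    }
    std::vector<double> prefix(n + 1, 0.0);  // prefix[i] = a[0] + ... + a[i-1]
    for (std::size_t i = 0; i < n; ++i) {
        prefix[i + 1] = prefix[i] + a[i];
    }
    // Linearly interpolated prefix sum at a fractional position x in [0, n].
    auto prefixAt = [&](double x) {
        const std::size_t i = std::min<std::size_t>(static_cast<std::size_t>(x), n - 1);
        return prefix[i] + (x - static_cast<double>(i)) * a[i];
    };
    const double step = static_cast<double>(n) / static_cast<double>(m);
    std::vector<double> b(m);
    for (std::size_t k = 0; k < m; ++k) {
        const double lo = static_cast<double>(k) * step;
        const double hi = static_cast<double>(k + 1) * step;
        b[k] = (prefixAt(hi) - prefixAt(lo)) / step;  // fractional sum divided by group width
    }
    return b;
}
```

With the input 1 to 10, calling downscale(a, 5) averages pairs, and downscale(a, 6) averages groups of 10/6 elements.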
I can't think of anything in the C++ library that will crank out something like this, all fully cooked and ready to go.
So you'll have to, pretty much, roll up your sleeves and go to work. At this point, the question of what's the "efficient" way of doing it boils down to its very basics. Which means:
1) Calculate how big the output array should be. Based on the description of the issue, you should be able to make that calculation even before looking at the values in the input array. You know the input array's size(), you can calculate the size() of the destination array.
2) So, you resize() the destination array up front. Now, you no longer need to worry about the time wasted in growing the size of the dynamic output array, incrementally, as you go through the input array, making your calculations.
3) So what's left is the actual work: iterating over the input array, and calculating the downsized values.
auto b=input_array.begin();
auto e=input_array.end();
auto p=output_array.begin();
Don't see many other options here, besides brute force iteration and calculations. Iterate from b to e, getting your samples, calculating each downsized value, and saving the resulting value into *p++.

How to know if an index in a binary heap is on an odd level?

If I have a binary heap with the typical layout, where the left child of position "pos" is at (2*pos)+1, the right child is at (2*pos)+2, and the parent is at (pos-1)/2, how can I efficiently determine whether a given index represents a node on an odd level (with the root being on level 0)?
(Disclaimer: This is a more complete answer based on Jarod42's comment.)
The formula you want is:
floor(log2(pos+1)) mod 2
To understand why, look at the levels of the first few nodes:
        0                 Level: 0
   1          2                  1
 3   4     5     6               2
7 8 9 10 11 12 13 14             3
0 -> 0
1 -> 1
2 -> 1
3 -> 2
...
6 -> 2
7 -> 3
...
The first step is to find a function that will map node numbers to level numbers in this way. Adding one to the number and taking a base 2 logarithm will give you almost (but not quite) what you want:
log2 (0+1) = log2 1 = 0
log2 (1+1) = log2 2 = 1
log2 (2+1) = log2 3 = 1.6 (roughly)
log2 (3+1) = log2 4 = 2
....
log2 (6+1) = log2 7 = 2.8 (roughly)
log2 (7+1) = log2 8 = 3
You can see from this that rounding down to the nearest integer in each case will give you the level of each node, hence giving us floor(log2(pos+1)).
As Jarod42 said, it's then a case of looking at the parity of the level number, which just involves taking the number mod 2. This will give either 0 (the level is even) or 1 (the level is odd).
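A small C++ sketch of the formula, using integer shifts instead of a floating-point log2 so that large indexes are not affected by rounding. The function names are made up for this illustration, and pos is assumed to be smaller than the largest 64-bit value so that pos + 1 does not wrap around.

```cpp
#include <cstdint>

// Level of the node at 0-based index pos: floor(log2(pos + 1)).
unsigned levelOf(std::uint64_t pos) {
    unsigned level = 0;
    for (std::uint64_t v = pos + 1; v > 1; v >>= 1) {
        ++level;  // each halving moves one level up towards the root
    }
    return level;
}

// True when the node at index pos lies on an odd level (the root is on level 0).
bool isOddLevel(std::uint64_t pos) {
    return levelOf(pos) % 2 == 1;
}
```

The loop runs once per level, so it takes O(log pos) steps; a compiler intrinsic or a bit-width function could replace it where available.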