How to map array elements to select binary tree nodes? - c++

I have an array of length n. I want to rearrange the array elements so that the new array looks like this:
arr[0] = arr[n/2]
arr[1] = arr[n/4]
arr[2] = arr[3n/4]
arr[3] = arr[n/8]
arr[4] = arr[3n/8]
arr[5] = arr[5n/8]
and so on...
Here is what I have tried, using vectors:
#include <iostream>
#include <algorithm>
#include <vector>

bool myfunc (int l, int r)
{
    int m = (l+r)/2;
    return m;
}

int main()
{
    std::vector<int> myvector = {3,1,20,9,7,5,6,22,17,14,4};
    std::sort (myvector.begin(), myvector.end(), myfunc);
    for (std::vector<int>::iterator it=myvector.begin(); it!=myvector.end(); ++it)
        std::cout << ' ' << *it;
    std::cout << '\n';
    return 0;
}
So, for an array of length 11, I expect:
myvector[0] = arr[5]
myvector[1] = arr[2]
myvector[2] = arr[8]
myvector[3] = arr[0]
myvector[4] = arr[3]
myvector[5] = arr[6]
myvector[6] = arr[9]
myvector[7] = arr[1]
myvector[8] = arr[4]
myvector[9] = arr[7]
myvector[10] = arr[10]
My question is: what should the definition of myfunc be, such that I get the expected output?
bool myfunc (int l, int r)
{
    int m = (l+r)/2;
    // Can't figure out this logic
}
I have tried a debugger, but that definitely doesn't help in defining the function! Any clues would be appreciated.

It appears you want a binary search tree (BST) stored in array form, using the same internal representation that is often used to store a heap.
The expected output is an array whose one-based indexes form a tree: for any one-based index x, the left child of x is at index 2*x and the right child is at index 2*x+1. Additionally, there are no gaps; every member of the array is used, up to N (it is a complete binary tree). Since C++ uses zero-based indexing, you need to be careful with this one-based convention; the zero-based arithmetic is sketched just below.
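In zero-based terms (setting i = x - 1), the child and parent arithmetic works out as in this small helper sketch:

// Zero-based index arithmetic for the array form of a binary tree:
// the one-based rule (2*x, 2*x + 1) becomes (2*i + 1, 2*i + 2) for i = x - 1.
inline int left_child(int i)  { return 2 * i + 1; }
inline int right_child(int i) { return 2 * i + 2; }
inline int parent(int i)      { return (i - 1) / 2; }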
That way of representing a tree is very good for storing a heap data structure, but very bad for a binary search tree where you want to insert things, since insertion breaks the completeness and forces a very expensive rebalance.
You asked for a mapping from the sorted array index to this array format. We can build it using a recursive function. This recursive function takes exactly as much work as building the binary tree itself; in fact, it is nearly identical to how you would write that function, so this is not an optimal approach: we do as much work as the entire problem requires just to produce an intermediate step.
The special note here is that we do not want the median. We want the left subtree to fill its array positions with no gaps, so its size must be exactly the size of the left subtree of a complete binary tree: a perfect top portion of 2^(h-1) - 1 nodes plus however much of the bottom row falls on the left side. The right subtree gets the remaining nodes and is merely complete.
#include <algorithm>
#include <vector>

// number of nodes in the left subtree of a complete binary tree with `count` nodes
int left_subtree_size(int count) {
    int h = 0;                              // height = floor(log2(count))
    while ((2 << h) <= count)
        ++h;
    int last_row = count - ((1 << h) - 1);  // nodes in the bottom row
    int half = 1 << (h - 1);                // bottom-row capacity of the left subtree
    return (half - 1) + std::min(last_row, half);
}

// current_position is the index in bst_indexes
void build_binary_tree_index_mapping(std::vector<int>& bst_indexes, int lower, int upper, int current_position = 0) {
    if (current_position >= (int)bst_indexes.size())
        return;
    int count = upper - lower + 1;
    int root = lower + (count > 1 ? left_subtree_size(count) : 0);
    // fill current_position
    // std::cout << current_position << " = " << root << std::endl;
    bst_indexes[current_position] = root;
    if (lower < root) {
        // fill left subtree
        int left_node_position = (current_position + 1) * 2 - 1;
        build_binary_tree_index_mapping(bst_indexes, lower, root - 1, left_node_position);
    }
    if (root < upper) {
        // fill right subtree
        int right_node_position = (current_position + 1) * 2 + 1 - 1;
        build_binary_tree_index_mapping(bst_indexes, root + 1, upper, right_node_position);
    }
}
This gives me {7, 3, 9, 1, 5, 8, 10, 0, 2, 4, 6} as the index mapping. It differs from yours because you left gaps in the lower left of the tree, while I ensure that the array is completely filled; the bottom row had to shift over, and the BST property then required reordering everything.
As a side note, in order to use this mapping you first must sort the data, which is also about the same complexity as the whole problem.
Additionally, the sorted vector already gives you a superior way to do a binary search, using std::binary_search (http://en.cppreference.com/w/cpp/algorithm/binary_search).
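Putting the pieces together, a minimal usage sketch (my own illustration, assuming the functions above) could look like this:

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> data = {3, 1, 20, 9, 7, 5, 6, 22, 17, 14, 4};
    std::sort(data.begin(), data.end());

    // index mapping: position in the tree array -> index into the sorted data
    std::vector<int> mapping(data.size());
    build_binary_tree_index_mapping(mapping, 0, (int)data.size() - 1);

    std::vector<int> tree_form(data.size());
    for (std::size_t pos = 0; pos < data.size(); ++pos)
        tree_form[pos] = data[mapping[pos]];

    for (int v : tree_form)
        std::cout << v << ' ';
    std::cout << '\n';
}

Each position of tree_form then holds the sorted value that belongs at that node of the complete tree.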

Count how many iterations of deletion until array is ordered

I'm trying to write a program whose input is an array of integers and its size. The code has to delete each element which is smaller than the element to its left. We want to find the number of times we can process the array this way until no more elements can be deleted.
The contents of the array after we return are unimportant - only the return value is of interest.
For example: given the array [10, 9, 7, 8, 6, 5, 3, 4, 2, 1], the function should return 2, because:
[10,9,7,8,6,5,3,4,2,1] → [10,8,4] → [10]
For example: given the array [1,2,3,4], the function should return 0, because
No element is larger than the element to its right
In other words, each element removes the element to its right if it is greater than that element. We get a smaller array, and we repeat the operation until we reach an array in which no element can delete another. I want to count the number of steps performed.
#include <vector>
using namespace std;

int Mafia(int n, vector<int> input_array)
{
    int ptr = n;
    int last_ptr = n;
    int night_Count = 0;
    do
    {
        last_ptr = ptr;
        ptr = 1;
        for (int i = 1; i < last_ptr; i++)
        {
            if (input_array[i] >= input_array[i - 1])
            {
                input_array[ptr++] = input_array[i];
            }
        }
        night_Count++;
    } while (last_ptr > ptr);
    return night_Count - 1;
}
My code works but I want it to be faster.
Do you have any ideas for making this code faster, or another approach that is faster than this?
Here is an O(N log N) solution.
The idea is to iterate over the array and keep track of candidateKillers, the numbers that could kill unvisited numbers. We then find the killer for the current number using binary search and update the maximum number of iterations if needed.
Since we iterate over the array of N numbers and apply an O(log N) binary search for each number, the overall time complexity is O(N log N).
Algorithm
If the current number is greater than or equal to the number before it, it could be a killer for the numbers after it.
For each killer we keep track of its index idx, its value num, and the number of iterations iters needed to reach it.
The values in candidateKillers are by nature non-increasing (see the next point). Therefore we can apply binary search to find the killer of the current number, which is the one that is a) closest to the current number and b) greater than the current number. This is implemented in searchKiller.
If the current number will be killed by the number at killerPos in candidateKillers, then all candidate killers after killerPos are outdated, because they will be killed before the numbers after the current number can reach them. If the current number is greater than all candidateKillers, then all the candidateKillers can be discarded.
When we find the killer of the current number, we increase its iters by one, because from now on one more iteration is needed to reach that killer: the current number needs to be killed first.
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;

class Solution {
public:
    int countIterations(vector<int>& array) {
        if (array.size() <= 1) {
            return 0;
        }
        int ans = 0;
        vector<Killer> candidateKillers = {Killer(0, array[0], 1)};
        for (int i = 1; i < (int)array.size(); i++) {
            int curNum = array[i];
            int killerPos = searchKiller(candidateKillers, curNum);
            if (killerPos == -1) {
                // current one is the largest so far and all candidateKillers before are outdated
                candidateKillers = {Killer(i, curNum, 1)};
                continue;
            }
            // get rid of outdated killers
            int popCount = (int)candidateKillers.size() - 1 - killerPos;
            for (int j = 0; j < popCount; j++) {
                candidateKillers.pop_back();
            }
            Killer killer = candidateKillers[killerPos];
            ans = max(killer.iters, ans);
            if (curNum < array[i-1]) {
                // the killer of the current one may not even be in the list,
                // e.g. if current is 4 in [6,5,4]
                if (killer.idx == i - 1) {
                    candidateKillers[killerPos].iters += 1;
                }
            } else {
                candidateKillers[killerPos].iters += 1;
                candidateKillers.push_back(Killer(i, curNum, 1));
            }
        }
        return ans;
    }

private:
    struct Killer {
        Killer(int idx, int num, int iters)
            : idx(idx), num(num), iters(iters) {}
        int idx;
        int num;
        int iters;
    };

    int searchKiller(vector<Killer>& candidateKillers, int n) {
        int lo = 0;
        int hi = (int)candidateKillers.size() - 1;
        if (candidateKillers[0].num < n) {
            return -1;
        }
        int ans = -1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (candidateKillers[mid].num > n) {
                ans = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return ans;
    }
};

int main() {
    vector<int> array1 = {10, 9, 7, 8, 6, 5, 3, 4, 2, 1};
    vector<int> array2 = {1, 2, 3, 4};
    vector<int> array3 = {4, 2, 1, 2, 3, 3};
    cout << Solution().countIterations(array1) << endl; // 2
    cout << Solution().countIterations(array2) << endl; // 0
    cout << Solution().countIterations(array3) << endl; // 4
}
You can iterate in reverse, keeping two iterators or indices and moving elements in place; you don't need to allocate a new vector or even resize the existing one. As a minor point, you can also replace recursion with a loop, or write the code the way the compiler is likely to optimize it; a sketch follows below.
This approach is still O(n^2) in the worst case, but it would be faster in practice.
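A sketch of what that might look like, keeping the pass structure of the OP's Mafia() but taking the vector by reference so no copy is made per call (my own illustration, not the answerer's exact code):

#include <vector>

// Same worst-case O(n^2) pass structure as Mafia(), but in place:
// survivors of each "night" are compacted to the front of the vector.
int countNights(std::vector<int>& a)
{
    if (a.size() <= 1)
        return 0;
    int nights = 0;
    std::size_t last = a.size();
    for (;;) {
        std::size_t kept = 1;
        for (std::size_t i = 1; i < last; i++) {
            if (a[i] >= a[i - 1])   // not smaller than its left neighbour: survives
                a[kept++] = a[i];
        }
        if (kept == last)           // nothing was deleted this pass
            return nights;
        last = kept;
        ++nights;
    }
}

For the example [10,9,7,8,6,5,3,4,2,1] this compacts to [10,8,4], then [10], and returns 2.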

Upsampling: insert extra values between each consecutive elements of a vector

Suppose we have a vector V consisting of 20 floating point numbers. Is it possible to insert values between each pair of these floating points such that vector V becomes a vector of exactly 50 numbers?
The inserted value should be a random number between the upper and lower values; for now I've decided to insert the midpoint of the two values.
I have tried the following:
#include <vector>
using std::vector;

vector<double> upsample(vector<double>& in)
{
    vector<double> temp;
    for (std::size_t i = 1; i <= in.size() - 1; i++)
    {
        double sample = (in[i] + in[i - 1]) / 2;
        temp.push_back(in[i - 1]);
        temp.push_back(sample);
    }
    temp.push_back(in.back());
    return temp;
}
With this function the number of elements grows to 2n - 1 (20 elements become 39). The input vector can also have other sizes less than 50.
I think it could be done by inserting more than one value between some pairs of elements (e.g. insert 3 values between V[0] and V[1], 1 value between V[3] and V[4], etc.) so that the result has exactly 50 elements. Is this possible?
Could you please guide me on how to do this?
Thank you.
So I did some math myself, because I was curious how to get the weight ratios (as if upsampling linearly up to a common multiple and then extracting only the target values from the large array, but without creating the large array, just using weights to know how much the left and right elements contribute to a particular value).
The sample code always creates the new value by a simple weighted average (i.e. 40% of 123.4 with 60% of 567.8 gives the "upscaled" value 390.04); no randomness is used to pepper the upscaled values (that part is left to the OP).
The ratios go like this:
If a vector of size M is being upscaled to size N (M <= N), the "upscale" will always preserve the first and last elements of the input vector; those are "fixed" in this proposal.
Then every upscaled element can be viewed as lying somewhere between some original elements [i, i+1].
If we declare the "distance" between the source elements [i, i+1] to be d = N-1, then the position of an upscaled element between them can always be expressed as some j/d where j:[0,d] (when j reaches d, the element is precisely at "i+1" and can be treated the same as j=0 with source elements [i+1,i+2]).
The step between two consecutive upscaled elements is then M-1.
So when the source vector has size 4 and the upscaled vector should have size 5, the ratios for the upscaled elements are [ [4/4,0/4], [1/4,3/4], [2/4,2/4], [3/4,1/4], [0/4,4/4] ] of the elements (indices into the vector) [ [0,1], [0,1], [1,2], [2,3], [2,3] ].
(The "distance" between source elements is 5-1=4, hence the "/4" normalizing the weights; the step between upscaled elements is 4-1=3, which is why the ratios shift by [-3,+3] in every step.)
I'm afraid my description is far from "obvious" (however clear it feels in my head after figuring it out), but if you put some of it into a spreadsheet and toy around with it, hopefully it will make sense. Or you can step through the code in a debugger to get a better feel for how the mumbling above translates into real code.
Code example 1: this one will "copy" a source element only if the weight falls precisely on it (i.e. with the example data only the first and last elements are copied; the rest of the upscaled elements are weighted averages of the original values).
#include <iostream>
#include <vector>
#include <cassert>

static double get_upscale_value(const size_t total_weight, const size_t right_weight, const double left, const double right) {
    // do the simple weighted average for demonstration purposes
    const size_t left_weight = total_weight - right_weight;
    return (left * left_weight + right * right_weight) / total_weight;
}

std::vector<double> upsample_weighted(std::vector<double>& in, size_t n)
{
    assert( 2 <= in.size() && in.size() <= n ); // this is really only upscaling (can't downscale)
    // resulting vector variable
    std::vector<double> upscaled;
    upscaled.reserve(n);
    // upscaling factors variables and constants
    size_t index_left = 0;      // first "left" item is the in[0] element
    size_t weight_right = 0;    // and "right" has zero weight (i.e. in[0] is copied)
    const size_t in_weight = n - 1;          // total weight of single "in" element
    const size_t weight_add = in.size() - 1; // shift of weight between "upscaled" elements
    while (upscaled.size() < n) { // add N upscaled items
        if (0 == weight_right) {
            // full weight of left -> just copy it (never tainted by "upscaling")
            upscaled.push_back(in[index_left]);
        } else {
            // the weight is somewhere between "left" and "right" items of "in" vector
            // i.e. weight = 1..(in_weight-1) ("in_weight" is full "right" value, never happens)
            double upscaled_val = get_upscale_value(in_weight, weight_right, in[index_left], in[index_left+1]);
            upscaled.push_back(upscaled_val);
        }
        weight_right += weight_add;
        if (in_weight <= weight_right) {
            // the weight shifted so much that "right" is new "left"
            ++index_left;
            weight_right -= in_weight;
        }
    }
    return upscaled;
}

int main(int argc, const char *argv[])
{
    std::vector<double> in { 10, 20, 30 };
    // std::vector<double> in { 20, 10, 40 };
    std::vector<double> upscaled = upsample_weighted(in, 14);
    std::cout << "upsample_weighted from " << in.size() << " to " << upscaled.size() << ": ";
    for (const auto i : upscaled) {
        std::cout << i << " ";
    }
    std::cout << std::endl;
    return 0;
}
output:
upsample_weighted from 3 to 14: 10 11.5385 13.0769 14.6154 16.1538 17.6923 19.2308 20.7692 22.3077 23.8462 25.3846 26.9231 28.4615 30
Code example 2: this one will "copy" every source element and use the weighted average only to fill the gaps between them, so as much of the original data as possible is preserved (at the price of the result not being a linear upscale of the original data set, but "aliased" to the "grid" defined by the target size):
(The code is pretty much identical to the first example, except for one if line in the upscaler.)
#include <iostream>
#include <vector>
#include <cassert>

static double get_upscale_value(const size_t total_weight, const size_t right_weight, const double left, const double right) {
    // do the simple weighted average for demonstration purposes
    const size_t left_weight = total_weight - right_weight;
    return (left * left_weight + right * right_weight) / total_weight;
}

// identical to "upsample_weighted", except all source values from "in" are copied into the result
// and only extra added values (to make the target size) are generated by "get_upscale_value"
std::vector<double> upsample_copy_preferred(std::vector<double>& in, size_t n)
{
    assert( 2 <= in.size() && in.size() <= n ); // this is really only upscaling (can't downscale)
    // resulting vector variable
    std::vector<double> upscaled;
    upscaled.reserve(n);
    // upscaling factors variables and constants
    size_t index_left = 0;      // first "left" item is the in[0] element
    size_t weight_right = 0;    // and "right" has zero weight (i.e. in[0] is copied)
    const size_t in_weight = n - 1;          // total weight of single "in" element
    const size_t weight_add = in.size() - 1; // shift of weight between "upscaled" elements
    while (upscaled.size() < n) { // add N upscaled items
        /* ! */ if (weight_right < weight_add) { /* ! this line is modified */
            // most of the weight on left -> copy it (don't taint it by upscaling)
            upscaled.push_back(in[index_left]);
        } else {
            // the weight is somewhere between "left" and "right" items of "in" vector
            // i.e. weight = 1..(in_weight-1) ("in_weight" is full "right" value, never happens)
            double upscaled_val = get_upscale_value(in_weight, weight_right, in[index_left], in[index_left+1]);
            upscaled.push_back(upscaled_val);
        }
        weight_right += weight_add;
        if (in_weight <= weight_right) {
            // the weight shifted so much that "right" is new "left"
            ++index_left;
            weight_right -= in_weight;
        }
    }
    return upscaled;
}

int main(int argc, const char *argv[])
{
    std::vector<double> in { 10, 20, 30 };
    // std::vector<double> in { 20, 10, 40 };
    std::vector<double> upscaled = upsample_copy_preferred(in, 14);
    std::cout << "upsample_copy_preferred from " << in.size() << " to " << upscaled.size() << ": ";
    for (const auto i : upscaled) {
        std::cout << i << " ";
    }
    std::cout << std::endl;
    return 0;
}
output:
upsample_copy_preferred from 3 to 14: 10 11.5385 13.0769 14.6154 16.1538 17.6923 19.2308 20 22.3077 23.8462 25.3846 26.9231 28.4615 30
(Notice how "20.7692" from example 1 is here just "20", a copy of the original sample, even though at that point "30" would have some small weight under linear interpolation.)

Divide elements of a sorted array into least number of groups such that difference between the elements of the new array is less than or equal to 1

How do I divide the elements of an array into a minimum number of arrays such that the values within each resulting array differ by no more than 1?
Let's say that we have an array: [4, 6, 8, 9, 10, 11, 14, 16, 17].
The array elements are sorted.
I want to divide the elements of the array into a minimum number of arrays such that the elements in each resulting array do not differ by more than 1.
In this case, the groupings would be: [4], [6], [8, 9, 10, 11], [14], [16, 17]. So there would be a total of 5 groups.
How can I write a program for this? Suggestions for algorithms are welcome as well.
I tried the naive approach:
Obtain the difference between consecutive elements of the array and, if the difference is less than or equal to 1, add those elements to a new vector. However, this method is very unoptimized and fails to produce results for large inputs.
Actual code implementation:
#include <cstdio>
#include <iostream>
#include <vector>
using namespace std;

int main() {
    // min_groups starts at 1 to account for the group containing the first element(s)
    int num = 0, buff = 0, min_groups = 1;
    cout << "Enter the number of elements in the array: " << endl;
    cin >> num;
    vector<int> ungrouped;
    cout << "Please enter the elements of the array: " << endl;
    for (int i = 0; i < num; i++)
    {
        cin >> buff;
        ungrouped.push_back(buff);
    }
    for (size_t i = 1; i < ungrouped.size(); i++)
    {
        if ((ungrouped[i] - ungrouped[i - 1]) > 1)
        {
            min_groups++;
        }
    }
    cout << "The elements of the entered vector can be split into " << min_groups << " groups." << endl;
    return 0;
}
Inspired by Faruk's answer, if the values are constrained to be distinct integers, there is a possibly sublinear method.
Indeed, if the difference between two values equals the difference between their indexes, they are guaranteed to belong to the same group and there is no need to look at the intermediate values.
You have to organize a recursive traversal of the array, in preorder. Before subdividing a subarray, you compare the difference of the indexes of the first and last elements to the difference of their values, and only subdivide in case of a mismatch. As you work in preorder, this allows you to emit pieces of the groups in consecutive order, as well as detect the gaps. Some care has to be taken to merge the pieces of the groups.
The worst case will remain linear, because the recursive traversal can degenerate to a linear traversal (but not worse than that). The best case can be better. In particular, if the array holds a single group, it will be found in time O(1). If I am right, for every group of length between 2^n and 2^(n+1), you will spare at least 2^(n-1) tests. (In fact, it should be possible to estimate an output-sensitive complexity, equal to the array length minus a fraction of the lengths of all groups, or similar.)
Alternatively, you can work in a non-recursive way by means of exponential search: from the beginning of a group, start with a unit step and double the step every time until you detect a gap (difference in values too large); then restart with a unit step. Here again, for large groups you will skip a significant number of elements. In any case, the best case can only be O(log N); a sketch of this galloping variant follows.
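Here is a minimal sketch of that exponential (galloping) search, assuming a sorted vector of distinct integers (my own illustration, not the answerer's code):

#include <algorithm>
#include <cstddef>
#include <vector>

// Counts the groups by galloping ahead: within a group of distinct sorted
// integers, a[j] - a[s] == j - s, so we double the step while that holds,
// then binary-search the exact group boundary.
std::size_t count_groups(const std::vector<int>& a)
{
    std::size_t groups = 0, s = 0;
    const std::size_t n = a.size();
    while (s < n) {
        std::size_t step = 1, e = s;
        while (e + step < n && a[e + step] - a[s] == (int)(e + step - s)) {
            e += step;
            step *= 2;
        }
        // the end of the group is in [e, min(e + step, n - 1)]; binary search for it
        std::size_t lo = e, hi = std::min(e + step, n - 1);
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo + 1) / 2;
            if (a[mid] - a[s] == (int)(mid - s)) lo = mid; else hi = mid - 1;
        }
        ++groups;
        s = lo + 1;
    }
    return groups;
}

For the example array {4, 6, 8, 9, 10, 11, 14, 16, 17} this returns 5.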
I would suggest encoding subsets into an offset array defined as follows:
Elements for set #i are defined for indices j such that offset[i] <= j < offset[i+1]
The number of subsets is offset.size() - 1
This only requires one memory allocation.
Here is a complete implementation:
#include <cassert>
#include <iostream>
#include <vector>

std::vector<std::size_t> split(const std::vector<int>& to_split, const int max_dist = 1)
{
    const std::size_t to_split_size = to_split.size();
    std::vector<std::size_t> offset(to_split_size + 1);
    offset[0] = 0;
    size_t offset_idx = 1;
    for (std::size_t i = 1; i < to_split_size; i++)
    {
        const int dist = to_split[i] - to_split[i - 1];
        assert(dist >= 0); // we assumed sorted input
        if (dist > max_dist)
        {
            offset[offset_idx] = i;
            ++offset_idx;
        }
    }
    offset[offset_idx] = to_split_size;
    offset.resize(offset_idx + 1);
    return offset;
}

void print_partition(const std::vector<int>& to_split, const std::vector<std::size_t>& offset)
{
    const std::size_t offset_size = offset.size();
    std::cout << "\nwe found " << offset_size - 1 << " sets";
    for (std::size_t i = 0; i + 1 < offset_size; i++)
    {
        std::cout << "\n";
        for (std::size_t j = offset[i]; j < offset[i + 1]; j++)
        {
            std::cout << to_split[j] << " ";
        }
    }
}

int main()
{
    std::vector<int> to_split{4, 6, 8, 9, 10, 11, 14, 16, 17};
    std::vector<std::size_t> offset = split(to_split);
    print_partition(to_split, offset);
}
which prints:
we found 5 sets
4
6
8 9 10 11
14
16 17
Iterate through the array. Whenever the difference between two consecutive elements is greater than 1, add 1 to your answer variable.
int getPartitionNumber(int arr[], int n) // n = size of the array
{
    int result = 1;
    for (int i = 1; i < n; i++) {
        if (arr[i] - arr[i-1] > 1) result++;
    }
    return result;
}
And because it is always nice to see more ideas and select the one that suits you best, here is a straightforward solution. Yes, it is also O(n), though I am not sure whether the overhead of the other methods makes them any faster.
Please see:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>

using Data = std::vector<int>;
using Partition = std::vector<Data>;

Data testData{ 4, 6, 8, 9, 10, 11, 14, 16, 17 };

int main(void)
{
    // This is the resulting vector of vectors with the partitions
    std::vector<std::vector<int>> partition{};

    // Iterating over source values
    for (Data::iterator i = testData.begin(); i != testData.end(); ++i) {
        // Check if we need to add a new partition:
        // either at the beginning, or if diff > 1.
        // No underflow, because of boolean short-circuit evaluation
        if ((i == testData.begin()) || ((*i) - (*(i-1)) > 1)) {
            // Create a new partition
            partition.emplace_back(Data());
        }
        // And store the value in the current partition
        partition.back().push_back(*i);
    }

    // Debug output: copy all data to std::cout
    std::for_each(partition.begin(), partition.end(), [](const Data& d) {
        std::copy(d.begin(), d.end(), std::ostream_iterator<int>(std::cout, " "));
        std::cout << '\n';
    });
    return 0;
}
Maybe this could be a solution...
Why do you say your approach is not optimized? If your approach is correct, then it takes O(n) time.
But you can use binary search here, which can help in the average case; in the worst case, however, the binary search version can take more than O(n) time.
Here's a tip:
As the array is sorted, you can pick the furthest position whose value differs from the current one by at most 1.
Binary search can do this in a simple way.
int arr[] = {4, 6, 8, 9, 10, 11, 14, 16, 17};
int st = 0, ed = n - 1; // n = size of the array
int partitions = 0;
while (st <= ed) {
    int low = st, high = n - 1;
    int pos = low;
    while (low <= high) {
        int mid = (low + high) / 2;
        if ((arr[mid] - arr[st]) <= 1) {
            pos = mid;
            low = mid + 1;
        } else {
            high = mid - 1;
        }
    }
    partitions++;
    st = pos + 1;
}
cout << partitions << endl;
In the average case this can beat the linear scan, but in the worst case (where the answer is equal to n) it takes O(n log(n)) time.

Checking if a vector is a min heap using recursion

I'm trying to write a program that checks if a vector is a min heap. I've been looking at code from here. I understand why they use 2*i+2 to compare to n, since there's a tipping point with the index where values in the vector/array (mine uses a vector) become leaf nodes. What I don't understand is why they keep using 2*i + 1 and 2*i + 2 as the index when they call the function recursively. Shouldn't they be using i+1 to access the left node and i+2 to access the right? But I tried this and I get a segmentation fault.
bool checkMinHeap(int A[], int i, int n)
{
    // if i is a leaf node, return true as every leaf node is a heap
    if (2*i + 2 > n)
        return true;
    // if i is an internal node,
    // recursively check if the left child is a heap
    bool left = (A[i] <= A[2*i + 1]) && checkMinHeap(A, 2*i + 1, n);
    // recursively check if the right child is a heap (to avoid array out
    // of bounds, we first check if the right child exists or not)
    bool right = (2*i + 2 == n) ||
                 (A[i] <= A[2*i + 2] && checkMinHeap(A, 2*i + 2, n));
    // return true if both left and right children are heaps
    return left && right;
}
Their test code:
#include <iostream>
using namespace std;

int main()
{
    int A[] = {1, 2, 3, 4, 5, 6};
    int n = sizeof(A) / sizeof(int);
    // start with index 0 (root of the heap)
    int index = 0;
    if (checkMinHeap(A, index, n))
        cout << "Given array is a min heap";
    else
        cout << "Given array is not a min heap";
    return 0;
}
My test code (returns 0, when it should return 1):
#include <iostream>
#include <vector>
using namespace std;

int main(void)
{
    vector<int> test;
    test.push_back(1);
    test.push_back(2);
    test.push_back(3);
    test.push_back(4);
    test.push_back(5);
    test.push_back(9);
    test.push_back(3);
    test.push_back(19);
    cout << isMinHeap(test, 0) << endl;
}
What I don't understand is why they keep using 2*i + 1 and 2*i + 2 as the index when they call the function recursively.
For instance, suppose your heap's values are stored in an array, say A[i], i = 0, 1, …, 7.
Then A[0] is the root, and the elements at i = 3 = 2*1+1 and i = 4 = 2*1+2 are the children of the element at i = 1.
Like this, in general, the left child of a parent at index i has index 2*i+1 and the right child has index 2*i+2.
These are the standard child-parent relationships for a binary heap stored in an array.
This is the reason why they keep using 2*i+1 and 2*i+2 as the indices when they call the function recursively.
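For completeness, since your main calls isMinHeap(test, 0) but that function isn't shown, a vector-based version following the same logic as checkMinHeap above might look like this (an assumed reconstruction):

#include <vector>

// Vector variant of checkMinHeap: n is taken from the vector itself.
bool isMinHeap(const std::vector<int>& A, std::size_t i)
{
    const std::size_t n = A.size();
    if (2*i + 2 > n)   // no left child, so i is a leaf
        return true;
    bool left = (A[i] <= A[2*i + 1]) && isMinHeap(A, 2*i + 1);
    bool right = (2*i + 2 == n) ||
                 (A[i] <= A[2*i + 2] && isMinHeap(A, 2*i + 2));
    return left && right;
}

With your test data {1, 2, 3, 4, 5, 9, 3, 19} this returns 1, so the segmentation fault and the wrong result most likely came from using i+1 and i+2 as the child indices.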

Finding number of pairs having sum as 0 in two different arrays

If I have two separate sorted arrays containing an equal number of entries, and I need to find the number of pairs (one number taken from each array) having sum = 0 in linear time, how can I do that?
I can easily do it in O(n^2), but how can I do it in linear time?
Or should I merge the two arrays and then proceed?
Thanks!
You don't need the arrays to be sorted.
Stick the numbers from one of the arrays into a hash table. Then iterate over the other array. For each number n, see if -n is in the hash table.
(If either array can contain duplicates, you need to take some care around handling them.)
P.S. You can exploit the fact that the arrays are sorted. Just iterate over them from the opposite ends once, looking for items that have the same value but the opposite signs. I leave figuring out the details as an exercise (hint: think of the merge step of merge sort).
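A minimal sketch of the hash-table approach (counting every matching (i, j) combination, so duplicates multiply; adjust the bookkeeping if you want distinct pairs only):

#include <unordered_map>
#include <vector>

// Hash-table approach: O(n) expected time, no sorting required.
long long countZeroPairsHash(const std::vector<int>& a, const std::vector<int>& b)
{
    std::unordered_map<int, long long> freq;
    for (int x : a)
        ++freq[x];                  // occurrences of each value in a
    long long pairs = 0;
    for (int y : b) {
        auto it = freq.find(-y);    // x = -y gives x + y == 0
        if (it != freq.end())
            pairs += it->second;
    }
    return pairs;
}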
Try this (assuming both arrays are sorted in ascending order):
int i = 0, j = n - 1, count = 0;
while (i < n && j >= 0)
{
    if (arr1[i] + arr2[j] == 0)
    {
        count++;
        i++;
        j--;
    }
    else if (arr1[i] + arr2[j] > 0)
    {
        j--; // the sum is too large: move the right pointer down
    }
    else
    {
        i++; // the sum is too small: move the left pointer up
    }
}
The following may help:
#include <algorithm>
#include <cassert>
#include <vector>

std::size_t count_zero_pair(const std::vector<int>& v1, const std::vector<int>& v2)
{
    assert(std::is_sorted(v1.begin(), v1.end()));
    assert(std::is_sorted(v2.begin(), v2.end()));
    std::size_t res = 0;
    auto it1 = v1.begin();
    auto it2 = v2.rbegin();
    while (it1 != v1.end() && it2 != v2.rend()) {
        const int sum = *it1 + *it2;
        if (sum < 0) {
            ++it1;
        } else if (0 < sum) {
            ++it2;
        } else { // sum == 0
            // may be more complicated depending on
            // how you want to manage duplicated pairs
            ++it1;
            ++it2;
            ++res;
        }
    }
    return res;
}
If they are already sorted, you can traverse them, one from left to right, the other from right to left:
Take two pointers and put one at the very left of one array, the other at the very right of the other array. Look at the two values you currently point at. If the absolute value of one of them is greater than the other, advance the pointer at the greater one. If the absolute values are equal, report both values and advance both pointers. Stop as soon as the pointer coming from the left reaches a positive value, or the pointer coming from the right reaches a negative value. After that, do the same with the pointers starting at the respective other ends of the arrays.
This is essentially the solution proposed by @Matthias, with an added pointer to catch duplicates. If there is a run of duplicate values in arr2, searchStart will always point to the one with the highest index, so that we can check the entire run against the next value in arr1. All values in arr1 are explicitly checked, so no extra duplicate handling is required.
int pairCount = 0;
for (int base = 0, searchStart = arr2Size - 1; base < arr1Size; base++) {
    int searchCurrent = searchStart;
    while (arr1[base] + arr2[searchCurrent] > 0) {
        searchCurrent--;
        if (searchCurrent < 0) break;
    }
    searchStart = searchCurrent;
    if (searchStart < 0) break;
    while (searchCurrent >= 0 && arr1[base] + arr2[searchCurrent] == 0) {
        std::cout << "arr1[" << base << "] + arr2[" << searchCurrent << "] = ";
        std::cout << "[" << arr1[base] << "," << arr2[searchCurrent] << "]\n";
        pairCount++;
        searchCurrent--;
    }
}
std::cout << "pairCount = " << pairCount << "\n";
std::cout << "pairCount = " << pairCount << "\n";
Given the arrays:
arr1[] = {-5, -3, -3, -2, -1, 0, 2, 4, 4, 5, 8};
arr2[] = {-7, -5, -5, -4, -3, -2, 1, 3, 4, 5, 6, 7, 8};
we get:
arr1[0] + arr2[9] = [-5,5]
arr1[1] + arr2[7] = [-3,3]
arr1[2] + arr2[7] = [-3,3]
arr1[4] + arr2[6] = [-1,1]
arr1[6] + arr2[5] = [2,-2]
arr1[7] + arr2[3] = [4,-4]
arr1[8] + arr2[3] = [4,-4]
arr1[9] + arr2[2] = [5,-5]
arr1[9] + arr2[1] = [5,-5]
pairCount = 9
Now we come to the question of time complexity. The construction of searchStart is such that each value in arr1 can incur one extra compare against a value in arr2 (but no more than one). Otherwise, for arrays with no duplicates, this checks each value in arr2 exactly once, so the algorithm runs in O(n).
If duplicate values are present, however, it complicates things a bit. Consider the arrays:
arr1 = {-3, -3, -3}
arr2 = { 3, 3, 3}
Clearly, since all O(n²) pairs sum to zero, we have to count all O(n²) pairs. This means that in the worst case the algorithm is O(n²), and that is the best we can do. It is perhaps more instructive to say that the complexity is O(n + p), where p is the number of matching pairs.
Note that if you only want to count the number of matches rather than printing them all, you can do this in linear time as well. Change when searchStart is updated to when the last match is found, and keep a counter of the matches found for the current searchStart; then, if the next arr1[base] matches arr2[searchStart], add the counter to the number of pairs. A closely related run-counting sketch is shown below.
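Here is a run-length counting sketch in that spirit (my own variant of the bookkeeping described above, assuming both arrays are sorted ascending):

#include <vector>

// Counts pairs a[i] + b[j] == 0 in O(n) by multiplying run lengths of
// duplicates instead of enumerating every matching pair.
long long countZeroPairsSorted(const std::vector<int>& a, const std::vector<int>& b)
{
    long long pairs = 0;
    std::size_t i = 0;          // scans a left to right (ascending)
    std::size_t j = b.size();   // scans b right to left (descending)
    while (i < a.size() && j > 0) {
        long long sum = (long long)a[i] + b[j - 1];
        if (sum < 0) {
            ++i;
        } else if (sum > 0) {
            --j;
        } else {
            int va = a[i], vb = b[j - 1];
            long long run_a = 0, run_b = 0;
            while (i < a.size() && a[i] == va) { ++run_a; ++i; }
            while (j > 0 && b[j - 1] == vb)    { ++run_b; --j; }
            pairs += run_a * run_b;   // every combination of the two runs matches
        }
    }
    return pairs;
}

On the arr1/arr2 example above this returns 9, matching the printed pairCount.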