Finding the maximum of an interval in an array using binary trees - C++

I'm studying binary trees and I have a problem with this homework. I have to use binary trees to solve it.
Here is the problem:
You are given a list of integers. You then need to answer a number of questions of the form: "What is the maximum value of the elements of the list between index A and index B?"
example :
INPUT :
10
2 4 3 5 7 19 3 8 6 7
4
1 5
3 6
8 10
3 9
OUTPUT:
7
19
8
19
TIME LIMITS AND MEMORY (Language: C++)
Time: 0.5s on a 1GHz machine.
Memory: 16000 KB
CONSTRAINTS
1 <= N <= 100000, where N is the number of elements in the list.
1 <= A, B <= N, where A, B are the limits of a range.
1 <= I <= 10000, where I is the number of intervals.
Please do not give me the solution, just a hint!
Thanks so much !

As already discussed in the comments, to make things simple, you can add entries to the array to make its size a power of two, so the binary tree has the same depth for all leaves. It doesn't really matter what elements you add to this list, as you won't use these computed values in the actual algorithm.
In the binary tree, you have to compute the maxima in a bottom-up manner. Each node's value then tells you the maximum of the whole range that node represents; this is the major idea of the tree.
What remains is splitting a query into such tree nodes, so that they cover the original interval using fewer nodes than the size of the interval. Figure out the "pattern" of the intervals the tree nodes represent. Then figure out a way to split the input interval into as few nodes as possible. Maybe start with the trivial solution: split the input into leaf nodes, i.e. single elements. Then figure out how you can combine multiple elements from the interval using inner nodes of the tree. Find an algorithm that does this without visiting every element individually (that would take time linear in the number of elements, but the whole idea of the tree is to make it logarithmic).
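For readers who want to see the shape of this hint in code, here is a minimal sketch (mine, not from the thread) of a bottom-up max tree padded to a power of two; the iterative query combines precomputed node maxima while walking up from both ends of the interval:

#include <algorithm>
#include <climits>
#include <vector>

// Minimal iterative max segment tree. The number of leaves is padded to a
// power of two so that every leaf sits at the same depth, as discussed above.
struct MaxTree {
    int size;               // number of leaves (a power of two)
    std::vector<int> node;  // node[1] is the root; children of i are 2i and 2i+1

    explicit MaxTree(const std::vector<int>& a) {
        size = 1;
        while (size < (int)a.size()) size *= 2;
        node.assign(2 * size, INT_MIN);         // padding entries never win a max
        for (int i = 0; i < (int)a.size(); ++i) node[size + i] = a[i];
        for (int i = size - 1; i >= 1; --i)     // maxima, bottom-up
            node[i] = std::max(node[2 * i], node[2 * i + 1]);
    }

    // Maximum of a[l..r], 0-based and inclusive.
    int query(int l, int r) const {
        int best = INT_MIN;
        for (l += size, r += size + 1; l < r; l /= 2, r /= 2) {
            if (l & 1) best = std::max(best, node[l++]);
            if (r & 1) best = std::max(best, node[--r]);
        }
        return best;
    }
};

With the sample input, MaxTree t({2,4,3,5,7,19,3,8,6,7}); t.query(0, 4) - the 1-based interval "1 5" - returns 7.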

Write some code which works with an interval of size 0. It will be very simple.
Then write some for an interval of size 1. It will still be simple.
Then write some for an interval of size 2. It may need a comparison. It will still be simple.
Then write some for an interval of size 3. It may involve a choice of which interval of size 2 to compare. This isn't too hard.
Once you've done this, it should be easy to make it work with any interval size.
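As an illustration (my own reading of this advice), here is a naive reference function whose cases grow exactly this way; it is linear rather than logarithmic, but it makes a handy oracle to test a tree-based version against:

#include <algorithm>
#include <climits>

// Naive maximum of a[l..r] (inclusive), built up case by case:
// size 0 is trivial, size 1 is the element itself, and every further
// size costs one comparison more than the size below it.
int intervalMax(const int* a, int l, int r) {
    if (r < l)  return INT_MIN;  // size 0: empty interval, neutral element
    if (r == l) return a[l];     // size 1: the element itself
    return std::max(a[l], intervalMax(a, l + 1, r));  // size k: one comparison + size k-1
}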

An array would be the best data structure for this problem.
But given that you need to use a binary tree, I would store (index, value) pairs in the binary tree, keyed on index.

Related

Find the number of ways to partition the array

I want the number of ways to divide an array of positive integers such that the maximum value of the left part of the array is greater than or equal to the maximum value of the right part of the array.
For example,
6 4 1 2 1 can be divided into:
[[6,4,1,2,1]] [[6][4,1,2,1]] [[6,4][1,2,1]] [[6,4,1][2,1]] [[6,4,1,2][1]] [[6][4,1][2,1]] [[6][4][1,2,1]] [[6][4][1,2][1]] [[6][4,1,2][1]] [[6,4,1][2][1]] [[6][4,1][2][1]] [[6,4][1,2][1]]
which are total 12 ways of partitioning.
I tried a recursive approach, but it fails because it exceeds the time limit. It also does not always give the correct output.
In another approach, I took the array, sorted it in decreasing order, and then for each element checked whether it lies to the right in the original array; if it does, I added its partitions to its previous numbers too.
I want an approach to solve this; any implementation, pseudocode, or just an idea would be appreciated.
I designed a simple recursive algorithm. I will try to explain it on your example.
First, check if [6] is a possible/valid part of a partition.
It is a valid part because the maximum element of [6] is bigger than the maximum value of the remaining part [4,1,2,1].
Since it is a valid part, we can use the recursive step of the algorithm:
concatenate([6],algorithm([4,1,2,1]))
now the partitions
[[6][4,1,2,1]], [[6][4,1][2,1]], [[6][4,1][2][1]], [[6][4][1,2,1]], [[6][4][1,2][1]], [[6][4,1,2][1]]
are in our current solution set.
Check if [6,4] is a possible/valid part of a partition.
Continue like this until reaching [6,4,1,2,1].
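A memoized sketch of this recursion (my formulation, with my own names). It relies on the fact that, because the part maxima end up non-increasing, checking the first part against the maximum of the entire remainder is equivalent to checking it against the next part. Positive integers are assumed, as in the question:

#include <algorithm>
#include <vector>

// f[l] = number of valid partitions of a[l..n-1]; a first part a[l..k-1] is
// valid when its maximum is >= the maximum of everything after it.
long long countPartitions(const std::vector<int>& a) {
    int n = a.size();
    std::vector<int> sufMax(n + 1, 0);  // sufMax[k] = max of a[k..n-1]; 0 is safe for positive ints
    for (int k = n - 1; k >= 0; --k)
        sufMax[k] = std::max(a[k], sufMax[k + 1]);

    std::vector<long long> f(n + 1, 0);
    f[n] = 1;                            // empty suffix: one finished partition
    for (int l = n - 1; l >= 0; --l) {
        int firstMax = 0;
        for (int k = l + 1; k <= n; ++k) {        // first part is a[l..k-1]
            firstMax = std::max(firstMax, a[k - 1]);
            if (k == n || firstMax >= sufMax[k])  // the last part needs no check
                f[l] += f[k];
        }
    }
    return f[0];  // returns 12 for {6,4,1,2,1}
}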

Which is the best algorithm to find the sums of all subarrays?

For example, if
a[3]={1,2,3}
the subarray sums are 1, 2, 3, 1+2, 2+3, 1+2+3,
so my program should print
1
2
3
3
5
8
I know there exists a formula for calculating the total sum.
But what I want is an efficient method to calculate the individual sums.
Is a segment tree advisable?
Assuming that by subarray you mean what some authors conventionally call a subarray, i.e. a contiguous slice of an array, you are looking for Kadane's algorithm.
It works by incrementally finding the biggest subarray. At any given index, the maximum subarray ending there is either the empty array (sum == zero) or the maximum subarray that ended at the previous position extended by one more element. You keep track of the best you have ever found, so you can compare candidates with the best so far and return the actual best solution.
It may also be extended to multiple dimensions.
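For reference, a compact sketch of Kadane's algorithm as described above (the empty subarray, with sum 0, is allowed as a candidate, so the result is never negative):

#include <algorithm>
#include <vector>

long long maxSubarraySum(const std::vector<int>& a) {
    long long endingHere = 0;  // best sum of a subarray ending at the current index
    long long best = 0;        // best sum seen anywhere so far (empty allowed)
    for (int x : a) {
        endingHere = std::max(0LL, endingHere + x);  // extend, or restart empty
        best = std::max(best, endingHere);
    }
    return best;
}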

Building a segment tree

Given an integer array A of n elements and m queries, where each query contains an integer x, I have to answer how many elements of the array are less than x.
0 < A[i] < 10^6 && x < 10^6
example:
A[]={105,2,9,3,8,5,7,7}
query
6
8
104
answer
3
5
7
Explanation:
for query1 elements are={2,3,5}
for query2 elements are={2,3,5,7,7}
for query3 elements are={2,9,3,8,5,7,7}
Question:
How can I solve this using a segment tree? (I have built segment trees for finding the max, min, and sum in a range, but my mind is going blank on how to build a segment tree for this.) Please explain with an example.
Note: I already know the O(n log n) solution using sorting and binary search (for each query). I want to learn how a segment tree can be exploited to solve this.
Thank you
I don't think the segment tree will work if you build it over the elements of the array A. You may use some heuristics/pruning based on the maximum and minimum of each segment, but in the end, for cases like
0, 10^6, 0, 10^6, 0, 10^6, ...
the queries will degenerate to O(n), because you need to descend into every leaf.
What you should do is build a segment tree over the range of possible values: for every value 0 < a < 10^6 you remember how many elements with this value are in the array A. For example, for
A=[5,2,3,3,3,5,7,7]
the array of occurrences would be
f=[0,0,1,3,0,2,0,2,0,...]
Now the query for the number of elements of A that are less than x translates to the query for the sum of the elements of the occurrence array f from 0 to x-1 (or up to x, for less-than-or-equal).
You can use a segment tree to answer these queries.
However, if you know the whole array prior to the queries - this is a pretty boring case - you could just use a prefix sum over the array f, with O(n) preprocessing time and O(1) query time.
Segment trees are only interesting if queries and updates of the array A are interleaved.
If they are interleaved, I would recommend a Fenwick tree: it is not as flexible as a segment tree, but it is tailored for exactly this kind of problem. It is easier to implement, faster, and needs less memory.
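A minimal sketch of that suggestion (names are mine): a Fenwick tree over the occurrence array f, where add(v, 1) registers an element of A and prefixSum(x - 1) counts the elements strictly less than x:

#include <vector>

struct Fenwick {
    std::vector<int> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int delta) {     // f[i] += delta, 1 <= i <= n
        for (; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    int prefixSum(int i) const {     // f[1] + ... + f[i]
        int s = 0;
        for (; i > 0; i -= i & -i) s += t[i];
        return s;
    }
};

// Usage for this problem (values satisfy 0 < A[i] < 10^6):
//   Fenwick fw(1000000);
//   for (int v : A) fw.add(v, 1);            // or add/remove on updates
//   int lessThanX = fw.prefixSum(x - 1);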
With a normal segment tree (built over the array positions) it is not possible to answer this type of query in O(log n).
You need to use a wavelet tree (there are also a few other data structures that can answer this query, but wavelet trees are the most fun). These links might be helpful if you don't know this data structure:
https://codeforces.com/blog/entry/52854
https://youtu.be/4aSv9PcecDw

Algorithm to find min and max in a given set

A large array array[n] of integers is given as input. Two index values are given: start, end. It is desired to find, very quickly, the min and max in the range [start, end] (inclusive), and the max in the rest of the array (excluding [start, end]).
e.g.
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
start,end - 2,7
min,max in [2,7] -- 1,12
max in rest - 10
I cannot think of anything better than linear, but this is not good enough, as n is of the order of 10^5 and the number of such find operations is of the same order.
Any help would be highly appreciated.
The way I understand your question is that you want to do some preprocessing on a fixed array that then makes your find-max operation very fast.
This answer describes an approach that does O(n log n) preprocessing work, followed by O(1) work for each query.
Preprocessing: O(n log n)
The idea is to prepare two 2d arrays BIG[a,k] and SMALL[a,k] where
1. BIG[a,k] is the max of the 2^k elements starting at a
2. SMALL[a,k] is the min of the 2^k elements starting at a
You can compute these arrays recursively, starting at k == 0 and building each higher level by combining two elements of the previous one:
BIG[a,k] = max(BIG[a,k-1] , BIG[a+2^(k-1),k-1])
SMALL[a,k] = min(SMALL[a,k-1] , SMALL[a+2^(k-1),k-1])
Lookup: O(1) per query
You are then able to instantly find the max and min of any range by combining two prepared answers.
Suppose you want to find the max of the elements from 100 to 133.
You already know the max of the 32 elements from 100 to 131 (in BIG[100,5]) and the max of the 32 elements from 102 to 133 (in BIG[102,5]), so the answer is the larger of these two.
The same logic applies for the minimum: you can always find two overlapping prepared answers that combine to give the answer you need.
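A sketch of the BIG table and its O(1) lookup (SMALL is symmetric, with min; the table names follow the answer, the code is mine):

#include <algorithm>
#include <vector>

// big[k][a] = max of the 2^k elements starting at index a.
std::vector<std::vector<int>> buildBig(const std::vector<int>& v) {
    int n = v.size(), K = 1;
    while ((1 << K) <= n) ++K;                  // number of levels
    std::vector<std::vector<int>> big(K, std::vector<int>(n));
    big[0] = v;                                 // 2^0: single elements
    for (int k = 1; k < K; ++k)
        for (int a = 0; a + (1 << k) <= n; ++a)
            big[k][a] = std::max(big[k-1][a], big[k-1][a + (1 << (k-1))]);
    return big;
}

// Max of v[l..r] from two overlapping prepared answers.
int queryMax(const std::vector<std::vector<int>>& big, int l, int r) {
    int k = 0;
    while ((1 << (k + 1)) <= r - l + 1) ++k;    // largest 2^k fitting the range
    return std::max(big[k][l], big[k][r - (1 << k) + 1]);
}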
You're asking for a data structure that answers min and max queries for intervals of an array quickly.
You want to build two segment trees over your input array: one for answering interval minimum queries and one for answering interval maximum queries. This takes linear preprocessing time and linear extra space, and it allows queries to run in logarithmic time.
I am afraid there is no faster way. Your data is completely random, so you have to go through every value.
Even sorting won't help you, because it is at best O(n log n), so it's slower. You can't use bisection, because the data is not sorted. If you start building data structures (like a heap), it will again be O(n log n) at best.
If the array is very large, then split it into partitions and use threads to do a linear check of each partition. Then do min/max with the results from the threads.
Searching for min and max in an unsorted array can only be optimized by taking two values at a time and comparing them to each other first:
int min, max, i;
min = max = array[0];
for (i = 1; i + 1 < length; i += 2)
{
    /* one comparison decides which of the pair can update min and which max */
    if (array[i] < array[i+1])
    {
        if (min > array[i])   min = array[i];
        if (max < array[i+1]) max = array[i+1];
    }
    else
    {
        if (min > array[i+1]) min = array[i+1];
        if (max < array[i])   max = array[i];
    }
}
if (i < length) /* odd length: one leftover element */
{
    if (min > array[i]) min = array[i];
    else if (max < array[i]) max = array[i];
}
But I don't believe it's actually faster. Consider writing it in assembly.
EDIT:
When comparing strings, this algorithm could make the difference!
If you kinda know the min, you can test from x up to min whether the value exists in the array. If you kinda know the max, you can test (backwards) from y down to max; once a value exists in the array, you have found the max.
For example, from your array, I will assume you have only positive integers:
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
You set x to 0 and test whether 0 exists; it doesn't, so you change it to 1, and you find 1: there is your min.
You set y to 15 (an arbitrary large number): exists? No. Set it to 14: exists? No. Set it to 13: exists? No. Set it to 12: exists? Yes! There is your max! I made just 4 membership tests.
If y exists on the first try, you might have tested a value INSIDE the array. So you test again with y + length / 2. If you happen to have hit the middle of the array, shift it a bit. If you again find the value on the first try, it might still be within the array.
If you have negative and/or float values, this technique does not work :)
Of course it is not possible to have a sub-linear algorithm (as far as I know) to search the way you want. However, you can achieve sub-linear time in some cases by storing the min-max of fixed ranges; with some knowledge of the typical query range you can improve the search time.
E.g. if you know that 'most' of the time the search range will be, say, 10, then you can store the min-max of blocks of 10/2 = 5 elements separately and index those ranges. During a search you then have to find the set of stored ranges covered by the search range.
e.g. in the example
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
start,end - 2,7
min,max in [2,7] -- 1,12
if you 'know' that most of the time the search range will be around 5 elements, you can index the min-max beforehand in blocks of 5/2 = 2 elements:
0-1 min-max (3,4)
2-3 min-max (2,2)
4-5 min-max (1,3)
6-7 min-max (5,12)
...
I think this method will work better when the ranges are large, so that the stored min-max values avoid some scanning.
To find the min-max of [2-7] you look up the stored indexes 2/2 = 1 to 7/2 = 3; then the min of the mins (2, 1, 5) gives you the minimum (1), and the max of the maxes (2, 3, 12) gives you the maximum (12). In case of partial overlap you only have to scan the corner indexes linearly. It should still avoid several comparisons, I think.
It is possible that this algorithm is slower than a linear search (because linear search has very good locality of reference), so I would advise you to measure both first.
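A sketch of this blocking idea (essentially block decomposition; my own formulation). With a block size of about sqrt(n), each query touches O(sqrt(n)) entries:

#include <algorithm>
#include <climits>
#include <utility>
#include <vector>

struct BlockMinMax {
    int B;                             // block size
    std::vector<int> a, bmin, bmax;    // data plus per-block min and max

    BlockMinMax(const std::vector<int>& v, int blockSize) : B(blockSize), a(v) {
        int nb = ((int)a.size() + B - 1) / B;
        bmin.assign(nb, INT_MAX);
        bmax.assign(nb, INT_MIN);
        for (int i = 0; i < (int)a.size(); ++i) {
            bmin[i / B] = std::min(bmin[i / B], a[i]);
            bmax[i / B] = std::max(bmax[i / B], a[i]);
        }
    }

    std::pair<int, int> minMax(int l, int r) const {  // inclusive range
        int lo = INT_MAX, hi = INT_MIN;
        for (int i = l; i <= r; ) {
            if (i % B == 0 && i + B - 1 <= r) {       // whole block: O(1)
                lo = std::min(lo, bmin[i / B]);
                hi = std::max(hi, bmax[i / B]);
                i += B;
            } else {                                  // corner: element by element
                lo = std::min(lo, a[i]);
                hi = std::max(hi, a[i]);
                ++i;
            }
        }
        return {lo, hi};
    }
};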
Linear is the best you can do, and it's relatively easy to prove.
Assume an infinite amount of instantaneous memory storage and costless access, just so we can ignore them.
Furthermore, we'll treat finding the min and finding the max as essentially the same mechanical problem: one just magically keeps track of the smaller number in each comparison, the other of the bigger one. This action is assumed to be costless.
Let's also assume away the min/max-of-the-subarray part, because it's just the same problem as the min/max of any array. We'll magically assume that the biggest number in the entire array happens, by some fluke, to be the first number we look at, that it is also the biggest number in the subarray, and that it even happens to be the smallest number in the subarray, but we just don't know how lucky we are. How can we find out?
The least work we have to do is one comparison between it and every other number in the array to prove it is the biggest/smallest. This is the only action we assume has a cost.
How many comparisons do we have to do? Let N be the length of the array; the total number of operations is then N - 1. As we add elements to the array, the number of comparisons scales at the same rate, even if all of our wildly outrageous assumptions hold.
So N is both the length of the array and the determinant of the cost of the best possible operation in our wildly unrealistic best-case scenario.
Your operation scales with N even in the best case. I'm sorry.
/ Sorting the inputs must be more expensive than this minimal operation, so it would only be applicable if you were doing the operation multiple times and had no way of storing the actual results, which doesn't seem likely, because 10^5 answers is not exactly taxing.
// Multithreading and the like is all well and good too; just assume away any cost of doing so and divide N by the number of threads. The best algorithm possible still scales linearly, however.
/// I'm guessing it would in fact have to be a particularly curious phenomenon for anything to ever scale better than linearly without assuming things about the data... stackoverflowers?

Counting Number of Paths (Length N) in Bipartite Graph

I am currently counting the number of paths of length n in a bipartite graph by doing a depth-first search (up to 10 levels). However, my implementation takes more than 5 minutes to count 7 million paths of length 5 in a bipartite graph with 3000+ elements. I am looking for a more efficient way to do this counting, and I am wondering whether there is such an algorithm in the literature.
These are undirected bipartite graphs, so the paths can contain cycles.
My goal is to count the number of paths of length n in a bipartite graph of 1 million elements in under a minute.
Thank you in advance for any suggested answers.
I agree with the first idea, but it's not quite a BFS: in a BFS you go through each node once, while here you can visit a node a large number of times.
You have to keep two arrays (call them Cnt1 and Cnt2): Cnt1[v] is the number of paths of length i that reach node v, and Cnt2 is the same but for length i + 1. Initially all the elements are 0 in Cnt2 and 1 in Cnt1 (because there is one path of length zero starting at each node).
Repeat N times:
Go through all the nodes.
For the current node, go through all of its connected nodes, and for each of them add, at its position in Cnt2, the number of times you reached the current node in Cnt1.
When you have finished all the nodes, copy Cnt2 into Cnt1 and set Cnt2 to zero.
At the end, the sum of all the numbers in Cnt1 is the answer.
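A sketch of this counting scheme (my code; note that it counts walks, i.e. revisiting nodes is allowed, which matches the question's remark that paths may contain cycles):

#include <algorithm>
#include <vector>

// cnt1[v] = number of walks of the current length ending at v; one pass over
// all edges advances the length by one. Runs in O(n * (V + E)) time.
long long countWalksOfLength(const std::vector<std::vector<int>>& adj, int n) {
    int V = adj.size();
    std::vector<long long> cnt1(V, 1), cnt2(V, 0);  // length 0: one walk per node
    for (int step = 0; step < n; ++step) {
        for (int u = 0; u < V; ++u)
            for (int v : adj[u])
                cnt2[v] += cnt1[u];                 // extend each walk along edge u-v
        cnt1.swap(cnt2);
        std::fill(cnt2.begin(), cnt2.end(), 0);
    }
    long long total = 0;
    for (long long c : cnt1) total += c;
    return total;
}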
Convert to a breadth-first search, and whenever you have two paths that lead to the same node at the same length, just keep track of how many such ways there are, not of how you got there.
This avoids a lot of repeated work and should provide a significant speedup. (If n is not small, there are better speedups; read on.)
My goal here is to count the number of paths of length n in a bipartite graph of 1 million elements under a minute.
Um, good luck?
An alternate approach to look into: if you take the adjacency matrix of the graph and raise it to the n'th power, the entries of the matrix you get are the numbers of paths of length n starting in one place and ending in another. So you can take shortcuts like repeated squaring. Convenient, isn't it?
Unfortunately, a graph with a million elements gives rise to an adjacency matrix with 10^12 entries. Multiplying two such matrices with the naive algorithm requires 10^18 operations. Of course we have better matrix multiplication algorithms, but you're still not getting below, say, 10^15 operations, which will most assuredly not complete in one minute. (If your matrix is sparse enough you might have a chance, but you should do some research on the topic.)
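For small graphs, a toy sketch of the repeated-squaring idea (dense matrices, so only sensible for small n; the entries also overflow quickly without a modulus):

#include <vector>

using Matrix = std::vector<std::vector<long long>>;

Matrix multiply(const Matrix& A, const Matrix& B) {
    int n = A.size();
    Matrix C(n, std::vector<long long>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                C[i][j] += A[i][k] * B[k][j];
    return C;
}

// A^p by repeated squaring: O(log p) multiplications instead of p - 1.
// The (i, j) entry of matrixPower(adjacency, n) counts length-n walks i -> j.
Matrix matrixPower(Matrix A, long long p) {
    int n = A.size();
    Matrix R(n, std::vector<long long>(n, 0));
    for (int i = 0; i < n; ++i) R[i][i] = 1;  // identity matrix
    while (p > 0) {
        if (p & 1) R = multiply(R, A);        // fold in the current square
        A = multiply(A, A);
        p >>= 1;
    }
    return R;
}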