Building a segment tree - C++

Given an integer array A of n elements and m queries, where each query contains an integer x, I have to answer how many elements of the array are less than x.
Constraints: 0 < A[i] < 10^6 and x < 10^6
example:
A[]={105,2,9,3,8,5,7,7}
query
6
8
104
answer
3
5
7
Explanation:
for query 1 the elements are {2,3,5}
for query 2 the elements are {2,3,5,7,7}
for query 3 the elements are {2,9,3,8,5,7,7}
Question:
How can I solve this using a segment tree? (I have built segment trees for finding the max, min and sum in a range, but my mind is going blank on how to build one for this.) Please explain with an example.
Note: I already know the O(n log n) solution using sorting and binary search (for each query). I want to learn how a segment tree can be exploited to solve this.
Thank you

I don't think the segment tree will work if you build it over the elements of the array A. You may use some heuristics/pruning based on the maximum and minimum of a segment, but in the end, for cases like
0, 10^6, 0, 10^6, 0, 10^6, ...
the queries will degenerate to O(n), because you need to descend into every leaf.
What you should do is build a segment tree over the range of possible values: for every value 0 < a < 10^6 you remember how many elements with this value are in the array A. For example, for
A=[5,2,3,3,3,5,7,7]
the array of occurrences would be
f=[0,0,1,3,0,2,0,2,0,...]
Now the query for the number of elements of A that are less than x translates to a query for the sum of the elements of the occurrences array f from 0 to x-1 (or from 0 to x for less-or-equal).
You can use a segment tree to answer these range-sum queries.
However, if you know the whole array prior to the queries - this is a pretty boring case - you could just use prefix sums over the array f, with preprocessing time O(n) and query time O(1).
Segment trees only become interesting when queries and updates of the array A are interleaved.
If queries and updates are interleaved, I would recommend a Fenwick tree: it is not as flexible as a segment tree, but it is tailored to exactly this kind of problem. It is easier to implement, faster, and needs less memory.
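For the interleaved case, a minimal sketch of that value-indexed approach with a Fenwick tree might look like the following (the struct and method names are mine, and the size 10^6 matches the stated constraints); add(a, 1) records an element with value a, and countLess(x) answers one query:

#include <vector>

struct Fenwick {
    std::vector<int> t;                       // 1-based internal array over values 0..n-1
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int value, int delta) {          // value occurs delta more (or fewer) times
        for (int i = value + 1; i < (int)t.size(); i += i & -i) t[i] += delta;
    }
    int prefix(int value) const {             // how many recorded values are <= value
        int s = 0;
        for (int i = value + 1; i > 0; i -= i & -i) s += t[i];
        return s;
    }
    int countLess(int x) const { return x > 0 ? prefix(x - 1) : 0; }
};

With Fenwick f(1000000) and f.add(a, 1) for every element of the sample array, f.countLess(6), f.countLess(8) and f.countLess(104) return 3, 5 and 7, matching the expected answers; an update of A then costs one add(oldValue, -1) plus one add(newValue, +1).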

Using a normal segment tree it is not possible to answer this type of query in O(log n).
You need to use a wavelet tree (there are also a few other data structures that enable answering this query, but wavelet trees are the most fun). These links might be helpful if you don't know this data structure:
https://codeforces.com/blog/entry/52854
https://youtu.be/4aSv9PcecDw


Hash that returns the same value for all numbers in range?

I'm working on a problem where I have an entire table from a database in memory at all times, with a low range and high range of 9-digit numbers. I'm given a 9-digit number that I need to use to lookup the rest of the columns in the table based on whether that number falls in the range. For example, if the range was 100,000,000 to 125,000,000 and I was given a number 117,123,456, then I would know that I'm in the 100-125 mil range, and whatever vector of data that points to is what I will be using.
Now the best lookup time I can think of is log(n). This is OK, but still pretty slow. The table has at least 100,000 entries, and I will need to look up values in this table tens of thousands, if not hundreds of thousands, of times per execution of this application (which runs 10+ times/day).
So I was wondering if it was possible to use an unordered_set instead, writing my own Hash function that ALWAYS returns the same hash-value for every number in range. Using the same example above, 100,000,000 through 125,000,000 will always return, for example, a hash value of AB12CD. Then when I use the lookup value of 117,123,456, I will get that same AB12CD hash and have a lookup time of O(1).
Is this possible, and if so, any ideas how?
Thanks in advance.
Yes. Assuming that you can number your intervals in order, you could fit a polynomial to your cutoff values, and receive an index value from the polynomial. For instance, with cutoffs of 100,000,000, 125,000,000, 250,000,000, and 327,000,000, you could use points (100, 0), (125, 1), (250, 2), and (327, 3), restricting the first derivative to [0, 1]. Assuming that you have decently-behaved intervals, you'll be able to fit this with an (N+2)th-degree polynomial for N cutoffs.
Have a table of desired hash values; use floor[polynomial(i)] for the index into the table.
Can you write such a hash function? Yes. Will evaluating it be slower than a search? Well there's the catch...
I would personally solve this problem as follows. I'd have a sorted vector of all values. And then I'd have a jump table of indexes into that vector based on the value of n >> 8.
So now your logic is: look in the jump table to figure out where you are jumping to and how many values you should consider (just look at where you land versus the next index to see the size of the range). If the whole range maps to the same bucket, you're done. If there are only a few entries, do a linear search to find where you belong. If there are a lot of entries, do a binary search. Experiment with your data to find where binary search beats a linear search.
A vague memory suggests that the tradeoff is around 100 or so because predicting a branch wrong is expensive. But that is a vague memory from many years ago, so run the experiment for yourself.
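To make the jump-table idea concrete, here is a rough sketch. It assumes the ranges are non-overlapping and sorted by lower bound; the names (RangeLookup, kShift, lows) and the 8-bit bucket width are illustrative choices, not something from the answer above:

#include <algorithm>
#include <cstdint>
#include <vector>

struct RangeLookup {
    static const unsigned kShift = 8;   // bucket width; experiment with your data
    std::vector<uint32_t> lows;         // sorted lower bounds, one per range
    std::vector<size_t> jump;           // jump[b] = first range whose low is >= (b << kShift)

    void build(uint32_t maxValue) {
        jump.assign((size_t(maxValue) >> kShift) + 2, lows.size());
        size_t r = 0;
        for (size_t b = 0; b + 1 < jump.size(); ++b) {
            while (r < lows.size() && lows[r] < (uint32_t(b) << kShift)) ++r;
            jump[b] = r;
        }
    }

    // Index of the range whose lower bound is the largest one <= v (assumes v >= lows[0]);
    // the caller still compares v against that range's upper bound.
    size_t find(uint32_t v) const {
        size_t lo = jump[v >> kShift], hi = jump[(v >> kShift) + 1];   // small window
        return std::upper_bound(lows.begin() + lo, lows.begin() + hi, v) - lows.begin() - 1;
    }
};

When hi - lo is small you could replace the binary search with the linear scan the answer mentions; where the crossover sits is exactly the experiment suggested above.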

What is the fastest data structure to search and update list of integer values?

I have to maintain a list of unordered integers, where the number of integers is unknown. It may increase or decrease over time. I need to update this list of integers frequently. I have tried using a vector, but it is really slow. An array appears to be faster, but since the length of the list is not fixed, it takes a significant amount of time to resize it. Please suggest any other option.
Use a hash table if the order of the values is unimportant. Time is O(1). I'm pretty sure you'll find an implementation in the standard library.
Failing that, a splay tree is extremely fast, especially if you want to keep the list ordered: amortized cost of O(ln n) per operation, with a very low constant factor. I think the C++ stdlib map is something like this.
Know thy data structures.
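For reference, a minimal sketch of the hash-table route with the standard containers (std::unordered_multiset if duplicates must be kept, std::unordered_set otherwise):

#include <iostream>
#include <unordered_set>

int main() {
    std::unordered_multiset<int> values;   // keeps duplicates; use std::unordered_set for unique values
    values.insert(42);                     // amortized O(1)
    values.insert(42);
    values.erase(values.find(42));         // removes a single occurrence, O(1) on average
    std::cout << values.count(42) << "\n"; // prints 1
}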
If you are interested in growing the array dynamically, you can do this:
int current = 0;
int **x = (int**)malloc(1024 * sizeof(int*));              /* room for up to 1024 chunks */
x[current] = (int*)malloc(RequiredLength * sizeof(int));   /* first chunk */
Add elements to the array, and when x[current] is full you can make room for RequiredLength more elements by doing
x[++current] = (int*)malloc(RequiredLength * sizeof(int));
You can repeat this up to 1024 times, which means 1024*RequiredLength elements can be accommodated; this gives you the chance to increase the size of the array whenever you want.
You can always access the n-th element with x[n / RequiredLength][n % RequiredLength];
Considering your comments, it looks like std::set or std::unordered_set fits your needs better than std::vector.
If sequential data structures fail to meet your requirements, you could try looking at trees (binary, AVL, m-way, red-black, etc.). I would suggest you try to implement an AVL tree, since it yields a balanced (or nearly balanced) binary search tree, which would optimize your operations. For more on AVL trees: http://en.wikipedia.org/wiki/AVL_tree
Well, a deque has no resize cost, but if it's unordered its search time is linear, and deleting and inserting in the middle is even worse than with a vector.
If you don't need to search by the value of the number, a hashmap or map may be your choice: no resize cost. Set the key of the map to the number's index and the value to the number's value; search and insert are then better than linear.
std::list was definitely created for such problems; adding and deleting elements in a list does not necessitate memory re-allocations as in a vector. However, due to the non-contiguous memory allocation of the list, searching for elements may prove to be a painful experience, of course, but if you do not search its entries frequently, it can be used.

Find that unique element from the 10^5 array size [duplicate]

What would be the best algorithm for finding a number that occurs only once in a list in which all other numbers occur exactly twice?
So, in the list of integers (let's take it as an array), each integer repeats exactly twice, except one. What is the best algorithm for finding that one?
The fastest (O(n)) and most memory efficient (O(1)) way is with the XOR operation.
In C:
#include <stdio.h>

int main(void) {
    int arr[] = {3, 2, 5, 2, 1, 5, 3};
    int num = 0, i;
    for (i = 0; i < 7; i++)
        num ^= arr[i];
    printf("%i\n", num);
    return 0;
}
This prints "1", which is the only one that occurs once.
This works because the first time you hit a number it marks the num variable with itself, and the second time it unmarks num with itself (more or less). The only one that remains unmarked is your non-duplicate.
By the way, you can expand on this idea to very quickly find two unique numbers among a list of duplicates.
Let's call the unique numbers a and b. First take the XOR of everything, as Kyle suggested. What we get is a^b. We know a^b != 0, since a != b. Choose any 1 bit of a^b, and use that as a mask -- in more detail: choose x as a power of 2 so that x & (a^b) is nonzero.
Now split the list into two sublists -- one sublist contains all numbers y with y&x == 0, and the rest go in the other sublist. By the way we chose x, we know that a and b are in different buckets. We also know that each pair of duplicates is still in the same bucket. So we can now apply ye olde "XOR-em-all" trick to each bucket independently, and discover what a and b are completely.
Bam.
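A small C++ sketch of that two-unique-numbers trick (the function name and the vector interface are mine):

#include <utility>
#include <vector>

// Every value appears exactly twice except two distinct values x and y; return them.
std::pair<int, int> findTwoSingles(const std::vector<int>& a) {
    int xorAll = 0;
    for (int v : a) xorAll ^= v;      // duplicates cancel, leaving x ^ y
    int mask = xorAll & -xorAll;      // lowest set bit: x and y differ in this bit
    int x = 0, y = 0;
    for (int v : a) {
        if (v & mask) x ^= v;         // bucket where the chosen bit is set
        else          y ^= v;         // bucket where it is clear
    }
    return {x, y};
}

For {3, 2, 5, 2, 1, 5} this returns 3 and 1 (in some order).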
O(N) time, O(N) memory
Go over the list in order; for each item, remove it from a hash set if it is already there, otherwise add it. At the end, the single item left in the set is the one you are looking for. In C++:
#include <unordered_set>
#include <vector>

int findSingle(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    for (int v : a) {
        if (!seen.erase(v))      // erase returns 0 if v was not in the set
            seen.insert(v);      // first time we see v: remember it
    }
    return *seen.begin();        // the only element left is the non-duplicate
}
Note (credit @Jared Updike): this approach will leave in the set all items that occur an odd number of times.
Comment: I don't see how people can vote up solutions that give N log N performance. In which universe is that "better"?
I am even more shocked that you marked as the accepted answer an N log N solution...
I do agree, however, that if memory is required to be constant, then N log N would be (so far) the best solution.
Kyle's solution would obviously not catch situations where the data set does not follow the rules. If all numbers were in pairs, the algorithm would give a result of zero, the exact same value as if zero were the only value with a single occurrence.
If there were multiple single-occurrence values, or triples, the result would be erroneous as well.
Testing the data set might well end up with a more costly algorithm, either in memory or in time.
Csmba's solution does detect some erroneous data (none, or more than one, single-occurrence value), but not other kinds (quadruples). Regarding his solution, depending on the implementation of HT, either memory and/or time is more than O(n).
If we cannot be sure about the correctness of the input set, sorting and counting, or using a hashtable that counts occurrences with the integer itself as the hash key, would both be feasible.
I would say that using a sorting algorithm and then going through the sorted list to find the number is a good way to do it.
And now the problem is finding "the best" sorting algorithm. There are a lot of sorting algorithms, each of them with its strong and weak points, so this is quite a complicated question. The Wikipedia entry seems like a nice source of info on that.
Implementation in Ruby:
a = [1,2,3,4,123,1,2,.........]
t = a.length-1
for i in 0..t
  s = a.index(a[i])+1
  b = a[s..t]
  w = b.include?a[i]
  if w == false
    puts a[i]
  end
end
You need to specify what you mean by "best" - to some, speed is all that matters and would qualify an answer as "best" - for others, they might forgive a few hundred milliseconds if the solution was more readable.
"Best" is subjective unless you are more specific.
That said:
Iterate through the numbers, and for each number search the list for it; when you reach the number whose search returns only one result, you are done.
Seems like the best you can do is to iterate through the list; for every item, add it to a list of "seen" items, or else remove it from the "seen" list if it's already there, and at the end your list of "seen" items will include the singular element. This is O(n) in time and O(n) in space in the worst case (it will be much better if the list is sorted).
The fact that they're integers doesn't really factor in, since there's nothing special you can do with adding them up... is there?
Question
I don't understand why the selected answer is "best" by any standard. O(N*lgN) > O(N), and it changes the list (or else creates a copy of it, which is still more expensive in space and time). Am I missing something?
Depends on how large/small/diverse the numbers are though. A radix sort might be applicable which would reduce the sorting time of the O(N log N) solution by a large degree.
The sorting method and the XOR method have the same time complexity. The XOR method is only O(n) if you assume that bitwise XOR of two strings is a constant time operation. This is equivalent to saying that the size of the integers in the array is bounded by a constant. In that case you can use Radix sort to sort the array in O(n).
If the numbers are not bounded, then bitwise XOR takes time O(k) where k is the length of the bit string, and the XOR method takes O(nk). Now again Radix sort will sort the array in time O(nk).
You could simply put the elements of the set into a hash until you find a collision. In Ruby, this is a one-liner.
def find_dupe(array)
  h = {}
  array.detect { |e| h[e] || (h[e] = true; false) }
end
So, find_dupe([1,2,3,4,5,1]) would return 1.
This is actually a common "trick" interview question though. It is normally about a list of consecutive integers with one duplicate. In this case the interviewer is often looking for you to use the Gaussian sum of n-integers trick e.g. n*(n+1)/2 subtracted from the actual sum. The textbook answer is something like this.
def find_dupe_for_consecutive_integers(array)
  n = array.size - 1   # subtract one from array.size because of the dupe
  array.sum - n*(n+1)/2
end

Fast Algorithm for finding largest values in 2d array

I have a 2D array (an image actually) that is size N x N. I need to find the indices of the M largest values in the array ( M << N x N) . Linearized index or the 2D coords are both fine. The array must remain intact (since it's an image). I can make a copy for scratch, but sorting the array will bugger up the indices.
I'm fine with doing a full pass over the array (ie. O(N^2) is fine). Anyone have a good algorithm for doing this as efficiently as possible?
Selection is sorting's austere sister (repeat this ten times in a row). Selection algorithms are less known than sort algorithms, but nonetheless useful.
You can't do better than O(N^2) (in N) here, since nothing indicates that you must not visit each element of the array.
A good approach is to keep a priority queue made of the M largest elements. This makes something O(N x N x log M).
You traverse the array, enqueuing pairs (elements, index) as you go. The queue keeps its elements sorted by first component.
Once the queue has M elements, instead of enqueuing you now:
Query the min element of the queue
If the current element of the array is greater, insert it into the queue and discard the min element of the queue
Else do nothing.
If M is large (comparable to N x N), sorting the array is preferable.
NOTE: @Andy Finkenstadt makes a good point (in the comments to your question): you definitely should traverse your array in the direction of data locality: make sure that you read memory contiguously.
Also, this is trivially parallelizable; the only non-parallelizable part is merging the queues when joining the sub-processes.
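A hedged sketch of that priority-queue traversal with a bounded min-heap (the flat row-major std::vector image layout and the names are assumptions for illustration):

#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Linear indices of the M largest values of an image stored row-major in 'img'.
// O(width*height*log M) time, O(M) extra space; row = index / width, col = index % width.
std::vector<size_t> largestM(const std::vector<float>& img, size_t M) {
    using Entry = std::pair<float, size_t>;                  // (value, linear index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;  // min on top
    for (size_t i = 0; i < img.size(); ++i) {
        if (heap.size() < M) heap.push({img[i], i});
        else if (img[i] > heap.top().first) {                // beats the worst kept value
            heap.pop();
            heap.push({img[i], i});
        }
    }
    std::vector<size_t> idx;
    while (!heap.empty()) { idx.push_back(heap.top().second); heap.pop(); }
    return idx;                                              // ascending by value
}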
You could copy the array into a single-dimensional array of tuples (value, original X, original Y) and build a basic heap out of it in O(n) time, provided you implement the heap as an array.
You could then retrieve the M largest tuples in O(M lg n) time and reference their original x and y from the tuple.
If you are going to make a copy of the input array in order to do a sort, that's way worse than just walking linearly through the whole thing to pick out numbers.
So the question is: how big is your M? If it is small, you can store the results (i.e. structs with 2D indexes and values) in a simple array or a vector. That'll minimize heap operations, but when you find a value larger than what's in your vector, you'll have to shift things around.
If you expect M to get really large, then you may need a better data structure like a binary tree (std::set), or a sorted std::deque. std::set will reduce the number of times elements must be shifted in memory, while a std::deque will do some shifting but will significantly reduce the number of times you have to go to the heap, which may give you better performance.
Your problem doesn't use the 2 dimensions in any interesting way; it is easier to consider the equivalent problem on a 1D array.
There are 2 main ways to solve this problem:
Mantain a set of M largest elements, and iterate through the array. (Using a heap allows you to do this efficiently).
This is simple and is probably better in your case (M << N)
Use selection, (the following algorithm is an adaptation of quicksort):
Create an auxiliary array, containing the indexes [1..N].
Choose an arbitrary index (and corresponding value), and partition the index array so that indexes corresponding to smaller elements go to the left, and those of bigger elements go to the right.
Repeat the process, binary search style until you narrow down the M largest elements.
This is good for cases with large M. If you want to avoid worst-case issues (the same ones quicksort has), then look at more advanced algorithms (like median-of-medians selection).
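And a sketch of the selection route, leaning on the standard library's introselect (std::nth_element) over an index array instead of a hand-written quicksort-style partition (assumes M is at most the number of pixels; names are mine):

#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Indices of the M largest values of 'img' (flattened image), average O(n) plus O(M log M) for the final sort.
std::vector<size_t> largestMBySelection(const std::vector<float>& img, size_t M) {
    std::vector<size_t> idx(img.size());
    std::iota(idx.begin(), idx.end(), 0);                    // 0, 1, 2, ...
    auto byValueDesc = [&](size_t a, size_t b) { return img[a] > img[b]; };
    std::nth_element(idx.begin(), idx.begin() + M, idx.end(), byValueDesc);
    idx.resize(M);                                           // first M indices now hold the M largest values
    std::sort(idx.begin(), idx.end(), byValueDesc);          // optional: order them by value
    return idx;
}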
How many times do you search for the largest value from the array?
If you only search 1 time, then just scan through it keeping the M largest ones.
If you do it many times, just insert the values into a sorted list (probably best implemented as a balanced tree).

What elegant solution exists for this pattern? Multi-Level Searching

Assume that we have multiple arrays of integers. You can consider each array as a level. We try to find a sequence of elements, exactly one element from each array, such that each chosen element and the one chosen from the next array satisfy the same predicate. For example, we have v1, v2, v3 as the arrays:
v1 | v2 | v3
-----------------
1 | 4 | 16
2 | 5 | 81
3 | 16 | 100
4 | 64 | 121
I could say that the predicate is: next_element == previous_element^2
A valid sequence from the above example is: 2 -> 4 -> 16
Actually, in this example there isn't another valid sequence.
I could write three loops to brute-force the mentioned example, but what if the number of arrays is variable (with a known order, of course)? How would you solve this problem?
Hints, or references to design patterns, are very much appreciated. I shall do it in C++, but I just need the idea.
Thanks,
If you sort your arrays beforehand, the search can be done much faster. You could start with your smallest array, then binary-search for the expected numbers in each of the others. This would be O(n*k*log M), n being the size of the smallest array, k being the number of arrays, and M being the size of the largest array.
This could be done even faster if you use hash maps instead of arrays. That would let you search in O(n*k).
If using inverse functions (to search in earlier arrays) is not an option, then you should start with the first array, and n = the size of the first array.
For simplicity, I'll start from the first array:
//note the 1-based arrays
for (i : 1 until allArrays[1].size()) {
    baseNumber = allArrays[1][i];
    for (j : 2 until allArrays.size()) {
        expectedNumber = function(baseNumber);
        if (!find(expectedNumber, allArrays[j]))
            break;
        baseNumber = expectedNumber;
    }
}
You can probably do some null checks and add some booleans in there to know whether the sequence exists or not.
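A rough C++ rendering of the hash-map variant suggested above, searching each later level in an unordered_set rather than with a linear find; the predicate shown (squaring) and the function name are placeholders:

#include <unordered_set>
#include <vector>

// One valid sequence (one element per level), or an empty vector if none exists.
std::vector<long long> findChain(const std::vector<std::vector<long long>>& levels) {
    if (levels.empty()) return {};
    auto pred = [](long long prev) { return prev * prev; };    // next_element == previous_element^2
    std::vector<std::unordered_set<long long>> sets;            // one hash set per later level
    for (size_t j = 1; j < levels.size(); ++j)
        sets.emplace_back(levels[j].begin(), levels[j].end());

    for (long long start : levels[0]) {                         // try every starting element
        std::vector<long long> chain{start};
        long long cur = start;
        bool ok = true;
        for (size_t j = 1; j < levels.size(); ++j) {
            cur = pred(cur);                                     // value expected at level j
            if (!sets[j - 1].count(cur)) { ok = false; break; }
            chain.push_back(cur);
        }
        if (ok) return chain;                                    // {2, 4, 16} for the v1, v2, v3 example
    }
    return {};
}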
(Design patterns apply to class and API design to improve code quality, but they aren't for solving computational problems.)
Depending on the cases:
If the arrays come in random order, and you have a finite space requirement, then brute force is the only solution: O(N^k) time (k = 3 here), O(1) space.
If the predicate is not invertible (e.g. SHA1(next_elem) xor SHA1(prev_elem) == 0x1234), then brute force is also the only solution.
If you can spend space, then create hash sets for v2 and v3, so you can quickly find the next element that satisfies the predicate: O(N + b^k) time, O(kN) space. (b = max number of next_elem that satisfy the predicate given a prev_elem)
If the arrays are sorted and bounded, you can also use binary search instead of the hash table to avoid using space: O(N (log N)^(k-1) + b^k) time, O(1) space.
(None of the space counts take into account stack usage due to recursion.)
A general way that consumes up to O(N*b^k) space is to build the solution by successive filtering, e.g.
solutions = [[1], [2], ... [N]]
filterSolution(Predicate, curSols, nextElems) {
    nextSols = []
    for each curSol in curSols:
        find elem in nextElems that satisfies the Predicate
        append elem to a copy of curSol, then push it into nextSols
    return nextSols
}
for each level:
    solutions = filterSolution(Predicate, solutions, all elems in this level)
return solutions
You could generate a separate index that maps an index in one array to the index of another. From the index you can quickly see whether a solution exists or not.
Generating the index would require a brute-force approach, but then you'd only do it once. If you want to improve the array search, consider using a more appropriate data structure to allow for fast search (red-black trees, for example, instead of arrays).
I would keep all vectors sorted (or in balanced search trees) so that searching for an element is O(log n). So for a total of k vectors you get a complexity like O(k * log n).
If the predicates preserve the ordering in the arrays (e.g. with your example, if the values are all guaranteed non-negative), you could adapt a merge algorithm. Consider the key for each array to be the end-value (what you get after applying the predicate as many times as needed for all arrays).
If the predicate doesn't preserve ordering (or the arrays aren't ordered to start) you can sort by the end-value first, but the need to do that suggests that another approach may be better (such as the hash tables suggested elsewhere).
Basically, check whether the next end-value is equal for all arrays. If not, step over the lowest (in one array only) and repeat. If you get all three equal, that is a (possible) solution - step over all three before searching for the next.
"Possible" solution because you may need to do a check - if the predicate function can map more than one input value to the same output value, you might have a case where the value found in some arrays (but not the first or last) is wrong.
EDIT - there may be bigger problems when the predicate doesn't map each input to a unique output - can't think at the moment. Basically, the merge approach can work well, but only for certain kinds of predicate function.