Fast per-column summation. Possible? - c++

Please look at this picture:
Is it possible to find per-column sum for all columns faster than in O(n^2)?
At first I thought it would be possible to do it in n * log(n) by regrouping the summation like this (sum 2 rows at a time, then the next 2 rows, and so on):
But then I counted the number of pluses and it came out to be equal in both cases - 7 = 7 from both pictures.
So is it possible to compose such a sum in n * log(n) time, or have I fooled myself (I know there are FHT or FFT like transforms, so that might be the case)?

No, our input size is O(n^2), so our algorithm can not be faster than that (because we are using all the input values).
This is assuming that n is the amount of rows, that the matrix is square (giving n^2) and there is no special relation between the elements.

No. You need to read (at least) n^2 items from memory, which takes (at least) O(n^2) time [1].
[1] Assuming n is the number of columns (or number of rows).

It cannot be done better than O(n^2) unless you have more knowledge about the matrix.
You need to read each element in the matrix to get the correct sum for each column, so you get a lower bound of Omega(n^2).
Also, note that your idea is O(n^2), because even at the first iteration you have n * (n/2) sum operations, which is O(n^2).
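To make the lower bound concrete, here is a minimal sketch of the straightforward approach, assuming the matrix is stored as a vector of rows (the name `column_sums` is made up for this illustration). Every element is read exactly once, which is the work the answers above say cannot be avoided.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums each column of a row-major matrix in a single pass.
// Every element is read exactly once, so an n x n matrix costs n * n additions.
std::vector<long long> column_sums(const std::vector<std::vector<int>>& m) {
    if (m.empty()) return {};
    std::vector<long long> sums(m[0].size(), 0);
    for (const auto& row : m) {
        assert(row.size() == sums.size());  // assumes a rectangular matrix
        for (std::size_t c = 0; c < row.size(); ++c) {
            sums[c] += row[c];
        }
    }
    return sums;
}
```

Iterating row by row also keeps the memory access sequential, which tends to matter more in practice than the order in which the additions are grouped.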

Related

Fast generation of random derangements

I am looking to generate derangements uniformly at random. In other words: shuffle a vector so that no element stays in its original place.
Requirements:
uniform sampling (each derangement is generated with equal probability)
a practical implementation is faster than the rejection method (i.e. keep generating random permutations until we find a derangement)
None of the answers I found so far are satisfactory in that they either don't sample uniformly (or fail to prove uniformity) or do not make a practical comparison with the rejection method. About 1/e = 37% of permutations are derangements, which gives a clue about what performance one might expect at best relative to the rejection method.
The only reference I found which makes a practical comparison is in this thesis which benchmarks 7.76 s for their proposed algorithm vs 8.25 s for the rejection method (see page 73). That's a speedup by a factor of only 1.06. I am wondering if something significantly better (> 1.5) is possible.
I could implement and verify various algorithms proposed in papers, and benchmark them. Doing this correctly would take quite a bit of time. I am hoping that someone has done it, and can give me a reference.
Here is an idea for an algorithm that may work for you. Generate the derangement in cycle notation. So (1 2) (3 4 5) represents the derangement 2 1 4 5 3. (That is (1 2) is a cycle and so is (3 4 5).)
Put the first element in the first place (in cycle notation you can always do this) and take a random permutation of the rest. Now we just need to find out where the parentheses go for the cycle lengths.
As https://mathoverflow.net/questions/130457/the-distribution-of-cycle-length-in-random-derangement notes, in a permutation, a random cycle is uniformly distributed in length. Cycle lengths are not uniformly distributed in derangements. But the number of derangements of length m is m!/e rounded up for even m and down for odd m. So what we can do is pick a length uniformly distributed in the range 2..n and accept it with the probability that the remaining elements would, proceeding randomly, be a derangement. This cycle length will be correctly distributed. And then once we have the first cycle length, we repeat for the next until we are done.
The procedure done the way I described is simpler to implement but mathematically equivalent to taking a random derangement (by rejection), and writing down the first cycle only. Then repeating. It is therefore possible to prove that this produces all derangements with equal probability.
With this approach done naively, we will be taking an average of 3 rolls before accepting a length. However we then cut the problem in half on average. So the number of random numbers we need to generate for placing the parentheses is O(log(n)). Compared with the O(n) random numbers for constructing the permutation, this is a rounding error. However it can be optimized by noting that the highest probability for accepting is 0.5. So if we accept with twice the probability of randomly getting a derangement if we proceeded, our ratios will still be correct and we get rid of most of our rejections of cycle lengths.
If most of the time is spent in the random number generator, for large n this should run at approximately 3x the rate of the rejection method. In practice it won't be as good because switching from one representation to another is not actually free. But you should get speedups of the order of magnitude that you wanted.
This is just an idea, but I think it can produce uniformly distributed derangements.
It needs a helper buffer of at most around N/2 elements, where N is the number of items to be arranged.
First, choose a random(1,N) position for value 1.
Note: positions run from 1 to N instead of 0 to N-1 for simplicity.
Then for value 2, the position will be random(1,N-1) if 1 fell on position 2 and random(1,N-2) otherwise.
The algorithm walks the list and counts only the not-yet-used positions until it reaches the chosen random position for value 2; position 2 is, of course, skipped.
For value 3, the algorithm checks whether position 3 is already used: if it is, pos3 = random(1,N-2); if not, pos3 = random(1,N-3).
Again, the algorithm walks the list and counts only the not-yet-used positions until the count reaches pos3, and then places value 3 there.
This goes on for the next values until all the values have been placed.
That will generate derangements with uniform probability.
The optimization should focus on how quickly the algorithm can reach pos#.
Instead of walking the list to count the not-yet-used positions one by one, the algorithm can use a heap-like search over the unused positions, or any other method. This is a separate problem to be solved: how to reach an unused item given its position count in a list of unused items.
I'm curious ... and mathematically uninformed. So I ask innocently, why wouldn't a "simple shuffle" be sufficient?
for i from array_size-1 downto 1: # assume zero-based arrays
    j = random(0,i-1)  # bounds are inclusive
    swap_elements(i,j)
Since the random function will never produce a value equal to i it will never leave an element where it started. Every element will be moved "somewhere else."
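With zero-based bounds, the loop above corresponds to what is usually called Sattolo's algorithm. A rough C++ sketch follows (the function name is an assumption for this illustration). It does leave no element in its original place, but it only ever produces permutations that form a single cycle, so it does not sample uniformly from all derangements: for four elements it can reach 6 of the 9 derangements and never yields 2 1 4 3.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sattolo-style shuffle: for i = n-1 down to 1, swap a[i] with a[j],
// where j is drawn uniformly from [0, i-1] (never i itself).
template <class T, class Rng>
void sattolo_shuffle(std::vector<T>& a, Rng& rng) {
    for (std::size_t i = a.size(); i-- > 1;) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(a[i], a[pick(rng)]);
    }
}
```

A possible call would be `std::mt19937 rng(std::random_device{}()); sattolo_shuffle(v, rng);`.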
Let d(n) be the number of derangements of an array A of length n.
d(n) = (n-1) * (d(n-1) + d(n-2))
The d(n) arrangements are achieved by:
1. First, swapping A[0] with one of the remaining n-1 elements
2. Next, either deranging all n-1 remaining elements, or deranging
the n-2 remaining elements that exclude the index
that received A[0] in the initial swap.
How can we generate a derangement uniformly at random?
1. Perform the swap of step 1 above.
2. Randomly decide which path we're taking in step 2,
with probability d(n-1)/(d(n-1)+d(n-2)) of deranging all remaining elements.
3. Recurse down to derangements of size 2-3 which are both precomputed.
Wikipedia has d(n) = floor(n!/e + 0.5) (exactly). You can use this to calculate the probability of step 2 exactly in constant time for small n. For larger n the factorial can be slow, but all you need is the ratio. It's approximately (n-1)/n. You can live with the approximation, or precompute and store the ratios up to the max n you're considering.
Note that (n-1)/n converges very quickly.
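As a small illustration of the precomputation suggested above, the following sketch tabulates d(n) with the recurrence and derives the branch probability from step 2. The function names and the 64-bit limit (n up to 20) are assumptions of this example, not part of the answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// d(n) via d(n) = (n-1) * (d(n-1) + d(n-2)), with d(0) = 1 and d(1) = 0.
// The values fit in 64 bits up to n = 20.
std::vector<std::uint64_t> derangement_counts(int max_n) {
    assert(max_n >= 1 && max_n <= 20);
    std::vector<std::uint64_t> d(max_n + 1);
    d[0] = 1;
    d[1] = 0;
    for (int n = 2; n <= max_n; ++n) {
        d[n] = static_cast<std::uint64_t>(n - 1) * (d[n - 1] + d[n - 2]);
    }
    return d;
}

// Probability of the "derange all n-1 remaining elements" branch in step 2.
double full_branch_probability(const std::vector<std::uint64_t>& d, int n) {
    assert(n >= 2 && n < static_cast<int>(d.size()));
    return static_cast<double>(d[n - 1]) /
           static_cast<double>(d[n - 1] + d[n - 2]);
}
```

For n = 10 the table gives d(9) = 133496 and d(8) = 14833, so the probability is 133496/148329, which is already very close to (n-1)/n = 0.9.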

finding the average number of comparisons

I would like to write an algorithm to find the min and max of 100000 arrays of size 1000, containing random numbers from 1 to 1000. The algorithm is supposed to return the average number of comparisons.
Suppose I use a naive solution with complexity O(n). What is the average number of comparisons supposed to be? 1999 or 2000 (for min and max)?
I would also like to ask how to create a random array in C++.
You have to compare every element twice (once to the current min, once to the current max).
That's not "naive", it's the optimal way to find min and max of unsorted numbers.
There is a big difference between naive and simple/optimal. It's always good to look for other solutions, but you are not always going to find a better or more optimal one.
As for your question, you have to compare each element twice: once against the min and once against the max.
Well, I disagree with Sid's answer that checking each element against max and min is optimal.
First, you can take the first two elements, compare them with each other, and set one as the minimum and the other as the maximum.
Then, in a loop, you can take 2 elements at a time, compare them with each other, and check the lower one against the minimum and the bigger one against the maximum.
Therefore, for 2 elements you have only 3 comparisons.
It is better than checking each number against the minimum and maximum, because that takes 4 comparisons for 2 elements.
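The pairwise scheme can be sketched as follows, with a counter added so the number of comparisons can be checked directly. The struct and function names are invented for this example. For an array of 1000 elements it performs 1 + 3 * 499 = 1498 element comparisons, against 2 * 999 = 1998 when every remaining element is compared with both the current min and the current max.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MinMax {
    int min;
    int max;
    std::size_t comparisons;  // number of element comparisons performed
};

// Pairwise scan: compare the two elements of each pair first, then compare
// the smaller one with min and the larger one with max (3 per pair).
MinMax min_max_pairwise(const std::vector<int>& a) {
    assert(!a.empty());
    MinMax r{a[0], a[0], 0};
    std::size_t i = 1;
    if (a.size() % 2 == 0) {  // even length: seed min and max from the first pair
        ++r.comparisons;
        if (a[0] < a[1]) r.max = a[1]; else r.min = a[1];
        i = 2;
    }
    for (; i + 1 < a.size(); i += 2) {
        int lo = a[i], hi = a[i + 1];
        ++r.comparisons;
        if (hi < lo) { int t = lo; lo = hi; hi = t; }
        ++r.comparisons;
        if (lo < r.min) r.min = lo;
        ++r.comparisons;
        if (r.max < hi) r.max = hi;
    }
    return r;
}
```

Seeding from one element for odd lengths and from the first pair for even lengths means no unpaired element is left over at the end of the loop.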

sum of all the elements in an array in less than O(n)

I tried adding all even positions and all odd positions in a loop and then adding both to get the final answer, making the complexity O(n/2), but I need a better way.
In the general case, where all you know is that there is an array of n elements, it is impossible to compute the sum of all of the elements in less than O(n) time.
However, if the elements in the array follow a pattern there is likely a mathematical formula which is much faster.
If you know you will need to compute the sum of the array while you build it, you can calculate the sum as you build the array, but this will still take O(n) time, just at a different point in your code.
In general, certain things simply can't be done faster than O(n). If a result depends on the values of n things, of which you know nothing, then it can't be computed without at least looking at the values of all n things, which takes O(n) time.
You could manage the array and update the sum whenever there are changes. This shifts the time to the modifying operations, and you technically calculate (or not) the sum in zero time.
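One way to read the last suggestion is a small wrapper that adjusts a stored total on every modification. The class below is a hypothetical sketch, not something taken from the answers; the total work is still O(n) overall, it is just spread across the updates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper that keeps the total up to date on every
// modification, so reading the sum is O(1).
class SummedArray {
public:
    void push_back(long long value) {
        data_.push_back(value);
        sum_ += value;
    }
    void set(std::size_t index, long long value) {
        assert(index < data_.size());
        sum_ += value - data_[index];  // adjust by the difference
        data_[index] = value;
    }
    long long sum() const { return sum_; }

private:
    std::vector<long long> data_;
    long long sum_ = 0;
};
```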

Algorithm to find min and max in a given set

A large array array[n] of integers is given as input. Two index values are given - start,end. It is desired to find very quickly - min & max in the set [start,end] (inclusive) and max in the rest of array (excluding [start,end]).
eg-
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
start,end - 2,7
min,max in [2,7] -- 1,12
max in rest - 10
I cannot think of anything better than linear. But this is not good enough as n is of order 10^5 and the number of such find operations is also of the same order.
Any help would be highly appreciated.
The way I understand your question is that you want to do some preprocessing on a fixed array that then makes your find max operation very fast.
This answer describes an approach that does O(nlogn) preprocessing work, followed by O(1) work for each query.
Preprocessing O(nlogn)
The idea is to prepare two 2d arrays BIG[a,k] and SMALL[a,k] where
1. BIG[a,k] is the max of the 2^k elements starting at a
2. SMALL[a,k] is the min of the 2^k elements starting at a
You can compute these arrays recursively by starting at k==0 and then building up the value for each higher k by combining two previous entries.
BIG[a,k] = max(BIG[a,k-1] , BIG[a+2^(k-1),k-1])
SMALL[a,k] = min(SMALL[a,k-1] , SMALL[a+2^(k-1),k-1])
Lookup O(1) per query
You are then able to instantly find the max and min for any range by combining 2 preprepared answers.
Suppose you want to find the max for elements from 100 to 133.
You already know the max of 32 elements 100 to 131 (in BIG[100,5]) and also the max of 32 elements from 102 to 133 (in BIG[102,5]) so you can find the largest of these to get the answer.
The same logic applies for the minimum. You can always find two overlapping prepared answers that will combine to give the answer you need.
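The preprocessing and lookup described above are commonly called a sparse table. Here is a minimal sketch under a few assumptions: indices are zero-based and inclusive, the class and method names are invented, and the tables are stored as `big_[k][a]` rather than `BIG[a,k]`. The level is found with a short loop here; a precomputed log table or a bit-scan instruction would make each query strictly constant time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// big_[k][a] / small_[k][a] hold the max / min of the 2^k elements
// starting at index a (BIG[a,k] and SMALL[a,k] in the text).
class SparseTable {
public:
    explicit SparseTable(const std::vector<int>& v) {
        assert(!v.empty());
        big_.push_back(v);
        small_.push_back(v);
        for (std::size_t k = 1; (std::size_t{1} << k) <= v.size(); ++k) {
            const std::size_t half = std::size_t{1} << (k - 1);
            const std::size_t count = v.size() - (half << 1) + 1;
            std::vector<int> b(count), s(count);
            for (std::size_t a = 0; a < count; ++a) {
                b[a] = std::max(big_[k - 1][a], big_[k - 1][a + half]);
                s[a] = std::min(small_[k - 1][a], small_[k - 1][a + half]);
            }
            big_.push_back(b);
            small_.push_back(s);
        }
    }
    // Max / min over the inclusive range [lo, hi] from two overlapping blocks.
    int range_max(std::size_t lo, std::size_t hi) const {
        const std::size_t k = level(lo, hi);
        return std::max(big_[k][lo], big_[k][hi + 1 - (std::size_t{1} << k)]);
    }
    int range_min(std::size_t lo, std::size_t hi) const {
        const std::size_t k = level(lo, hi);
        return std::min(small_[k][lo], small_[k][hi + 1 - (std::size_t{1} << k)]);
    }

private:
    // Largest k with 2^k <= length of [lo, hi].
    std::size_t level(std::size_t lo, std::size_t hi) const {
        assert(lo <= hi && hi < big_[0].size());
        std::size_t k = 0;
        while ((std::size_t{2} << k) <= hi - lo + 1) ++k;
        return k;
    }
    std::vector<std::vector<int>> big_, small_;
};
```

For the example in the question, `range_min(2, 7)` and `range_max(2, 7)` give 1 and 12, and the max of the rest can be obtained as the larger of `range_max(0, 1)` and `range_max(8, 17)`, which is 10.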
You're asking for a data structure that will answer min and max queries for intervals on an array quickly.
You want to build two segment trees on your input array; one for answering interval minimum queries and one for answering interval maximum queries. This takes linear preprocessing, linear extra space, and allows queries to take logarithmic time.
I am afraid that there is no faster way. Your data is completely random, so you have to go through every value.
Even sorting won't help you, because it's at best O(n log n), so it's slower. You can't use the bisection method, because the data is not sorted. If you start building data structures (like a heap), it will again be O(n log n) at best.
If the array is very large, then split it into partitions and use threads to do a linear check of each partition. Then do min/max with the results from the threads.
Searching for min and max in an unsorted array can only be optimized by taking two values at a time and comparing them to each other first:
int min, max, i;
min = max = array[0];
for (i = 1; i + 1 < length; i += 2)
{
    if (array[i] < array[i+1])
    {
        if (min > array[i]) min = array[i];
        if (max < array[i+1]) max = array[i+1];
    }
    else
    {
        if (min > array[i+1]) min = array[i+1];  /* smaller of the pair updates min */
        if (max < array[i]) max = array[i];      /* larger of the pair updates max */
    }
}
if (i < length)  /* one unpaired element left over */
{
    if (min > array[i]) min = array[i];
    else if (max < array[i]) max = array[i];
}
But I don't believe it's actually faster. Consider writing it in assembly.
EDIT:
When comparing strings, this algorithm could make the difference!
If you roughly know the min, you can test values upward from a lower bound x, checking whether each value exists in the array; the first one that exists is the min. If you roughly know the max, you can test values downward from an upper bound y in the same way; the first value that exists in the array is the max.
For example, from your array, I will assume you have only positive integers:
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
You set x to 0 and test whether 0 exists. It doesn't, so you change it to 1. You find 1: there is your min.
You set y to 15 (an arbitrary large number): exists? No. Set to 14: exists? No. Set to 13: exists? No. Set to 12: exists? Yes! There is your max. I just made 4 existence tests.
If y exists on the first try, you might have tested a value INSIDE the array's range rather than its max. So you test again with y + length / 2. Assume you hit the middle of the range, so shift it a bit. If you again find the value on the first try, it might still be within the range.
If you have negative and/or float values, this technique does not work :)
Of course it is not possible to have a sub-linear algorithm (as far as I know) to search the way you want. However, you can achieve sub-linear time in some cases by storing the min-max of fixed ranges, and with some knowledge of the typical query range you can improve search time.
e.g. if you know that 'most' of the time range of search will be say 10 then you can store min-max of 10/2 = 5 elements separately and index those ranges. During search you have to find the superset of ranges that can subsume search-range.
e.g. in the example
array - 3 4 2 2 1 3 12 5 7 9 7 10 1 5 2 3 1 1
start,end - 2,7
min,max in [2,7] -- 1,12
if you 'know' that most of the time search range would be 5 elements then, you can index the min-max beforehand like: since 5/2 = 2,
0-1 min-max (3,4)
2-3 min-max (2,2)
4-5 min-max (1,3)
6-7 min-max (5,12)
...
I think, this method will work better when ranges are large so that storing min-max avoids some searches.
To search min-max [2-7] you have to search the stored indexes like: 2/2 = 1 to 7/2 = 3,
then min of mins(2,1,5) will give you the minimum (1) and max of maxes (2,3,12) will give you the maximum(12). In case of overlap you will have to search only the corner indexes (linearly). Still it could avoid several searches I think.
It is possible that this algorithm is slower than linear search (because linear search has a very good locality of reference) so I would advise you to measure them first.
Linear is the best you can do, and it's relatively easy to prove it.
Assume an infinite amount of instantaneous memory storage and costless access, just so we can ignore them.
Furthermore, we'll assume away your task of finding min/max in a substring. We will think of them both as essentially the exact same mechanical problem. One just magically keeping track of the numbers smaller than other numbers in a comparison, and one magically keeping track of the numbers bigger than in a comparison. This action is assumed to be costless.
Let's then assume away the min/max of the sub-array problem, because it's just the same problem as the min/max of any array, and we'll magically assume that it is solved as part of our general action of finding the max in the bigger array. We can do this by assuming that the biggest number in the entire array is in fact the first number we look at by some magical fluke, and it is also the biggest number in the sub-array, and also happens to be the smallest number in the sub-array, but we just don't happen to know how lucky we are. How can we find out?
The least work we have to do is one comparison between it and every other number in the array to prove it is the biggest/smallest. This is the only action we are assuming has a cost.
How many comparisons do we have to do? We'll let N be the length of the array, and the total number of operations for any length N is N - 1. As we add elements to the array, the number of comparisons scales at the same rate even if all of our wildly outrageous assumptions held true.
So we've arrived at the point where N is both the length of the array, and the determinant of the increasing cost of the best possible operation in our wildly unrealistic best case scenario.
Your operation scales with N in the best case scenario. I'm sorry.
/sorting the inputs must be more expensive than this minimal operation, so it would only be applicable if you were doing the operation multiple times, and had no way of storing the actual results, which doesn't seem likely, because 10^5 answers is not exactly taxing.
//multithreading and the like is all well and good too, just assume away any cost of doing so, and divide N by the number of threads. The best algorithm possible still scales linearly however.
///I'm guessing it would in fact have to be a particularly curious phenomenon for anything to ever scale better than linearly without assuming things about the data...stackoverflowers?

Missing number(s) Interview Question Redux

The common interview problem of determining the missing value in a range from 1 to N has been done a thousand times over. Variations include 2 missing values up to K missing values.
Example problem: Range [1,10] (1 2 4 5 7 8 9 10) = {3,6}
Here is an example of the various solutions:
Easy interview question got harder: given numbers 1..100, find the missing number(s)
My question is that seeing as the simple case of one missing value is of O(n) complexity and that the complexity of the larger cases converge at roughly something larger than O(nlogn):
Couldn't it just be easier to answer the question by saying sort (mergesort) the range and iterate over it observing the missing elements?
This solution should take no more than O(nlogn) and is capable of solving the problem for ranges other than 1-to-N such as 10-to-1000 or -100 to +100 etc...
Is there any reason to believe that the given solutions in the above SO link will be better than the sorting based solution for larger number of missing values?
Note: It seems a lot of the common solutions to this problem, assume an only number theoretic approach. If one is being asked such a question in an S/E interview wouldn't it be prudent to use a more computer science/algorithmic approach, assuming the approach is on par with the number theoretic solution's complexity...
More related links:
https://mathoverflow.net/questions/25374/duplicate-detection-problem
How to tell if an array is a permutation in O(n)?
You are only specifying the time complexity, but the space complexity is also important to consider.
The problem complexity can be specified in term of N (the length of the range) and K (the number of missing elements).
In the question you link, the solution of using equations is O(K) in space (or perhaps a bit more ?), as you need one equation per unknown value.
There is also the preservation point: may you alter the list of known elements ? In a number of cases this is undesirable, in which case any solution involving reordering the elements, or consuming them, must first make a copy, O(N-K) in space.
I cannot see faster than a linear solution: you need to read all known elements (N-K) and output all unknown elements (K). Therefore you cannot get better than O(N) in time.
Let us break down the solutions
Destroying, O(N) space, O(N log N) time: in-place sort
Preserving, O(K) space ?, O(N log N) time: equation system
Preserving, O(N) space, O(N) time: counting sort
Personally, though I find the equation system solution clever, I would probably use either of the sorting solutions. Let's face it: they are much simpler to code, especially the counting sort one!
And as far as time goes, in a real execution, I think the "counting sort" would beat all other solutions hands down.
Note: the counting sort does not require the range to be [0, X), any range will do, as any finite range can be transposed to the [0, X) form by a simple translation.
EDIT:
Changed the sort to O(N), one needs to have all the elements available to sort them.
Having had some time to think about the problem, I also have another solution to propose. As noted, when N grows (dramatically) the space required might explode. However, if K is small, then we could change our representation of the list, using intervals:
{4, 5, 3, 1, 7}
can be represented as
[1,1] U [3,5] U [7,7]
In the average case, maintaining a sorted list of intervals is much less costly than maintaining a sorted list of elements, and it's as easy to deduce the missing numbers too.
The time complexity is easy: O(N log N), after all it's basically an insertion sort.
Of course what's really interesting is that there is no need to actually store the list, thus you can feed it with a stream to the algorithm.
On the other hand, I have quite a hard time figuring out the average space complexity. The "final" space occupied is O(K) (at most K+1 intervals), but during the construction there will be many more missing intervals as we introduce the elements in no particular order.
The worst case is easy enough: N/2 intervals (think odd vs even numbers). I cannot, however, figure out the average case. My gut feeling is telling me it should be better than O(N), but I am not that trusting.
Whether the given solution is theoretically better than the sorting one depends on N and K. While your solution has complexity of O(N*log(N)), the given solution is O(N*K). I think that the given solution is (same as the sorting solution) able to solve any range [A, B] just by transforming the range [A, B] to [1, N].
What about this?
create your own set containing all the numbers
remove the given set of numbers from your set (no need to sort)
What's left in your set are the missing numbers.
My question is that seeing as the [...] cases converge at roughly
something larger than O(nlogn) [...]
In 2011 (after you posted this question) Caf posted a simple answer that solves the problem in O(n) time and O(k) space [where the array size is n - k].
Importantly, unlike in other solutions, Caf's answer has no hidden memory requirements (using bit arrays, adding numbers to elements, multiplying elements by -1 - these would all require O(log(n)) space).
Note: The question here (and the original question) didn't ask about the streaming version of the problem, and the answer here doesn't handle that case.
Regarding the other answers: I agree that many of the proposed "solutions" to this problem have dubious complexity claims, and if their time complexities aren't better in some way than either:
count sort (O(n) time and space)
compare (heap) sort (O(n*log(n)) time, O(1) space)
...then you may as well just solve the problem by sorting.
However, we can get better complexities (and more importantly, genuinely faster solutions):
Because the numbers are taken from a small, finite range, they can be 'sorted' in linear time.
All we do is initialize an array of 100 booleans, and for each input, set the boolean corresponding to each number in the input, and then step through reporting the unset booleans.
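The boolean-marking idea can be sketched roughly as below, generalized from 100 to a caller-supplied n. The function name is hypothetical, and the sketch assumes every input value lies in [1, n].

```cpp
#include <cassert>
#include <vector>

// Returns the values in [1, n] that do not appear in `present`.
// Assumes every input value lies in [1, n]; O(n) time and O(n) extra space.
std::vector<int> missing_by_marking(const std::vector<int>& present, int n) {
    std::vector<bool> seen(n + 1, false);
    for (int x : present) {
        assert(x >= 1 && x <= n);
        seen[x] = true;
    }
    std::vector<int> missing;
    for (int x = 1; x <= n; ++x) {
        if (!seen[x]) missing.push_back(x);
    }
    return missing;
}
```

With the example from the question, the input (1 2 4 5 7 8 9 10) and n = 10 yield {3, 6}.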
If there are total N elements where each number x is such that 1 <= x <= N then we can solve this in O(nlogn) time complexity and O(1) space complexity.
First sort the array using quicksort or mergesort.
Scan through the sorted array: if the difference between the previously scanned number a and the current number b is equal to 2 (b - a = 2), then the missing number is a+1. This can be extended to the case where b - a > 2.
Time complexity is O(nlogn)+O(n), which is O(nlogn).
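A sort-and-scan version might look like the sketch below. It uses `std::sort` as a stand-in for quicksort or mergesort, takes the range bounds as parameters so that ranges other than 1..N also work, and handles gaps larger than 2 as well as gaps at either end. The function name and the by-value copy of the input are choices made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sorts a copy of the input and reports every value skipped between
// neighbours, plus any gap before the first or after the last element.
// Assumes all values lie in [lo, hi].
std::vector<int> missing_by_sorting(std::vector<int> values, int lo, int hi) {
    std::sort(values.begin(), values.end());
    std::vector<int> missing;
    int expected = lo;  // next value we expect to see
    for (int v : values) {
        assert(v >= lo && v <= hi);
        for (; expected < v; ++expected) missing.push_back(expected);
        expected = v + 1;
    }
    for (; expected <= hi; ++expected) missing.push_back(expected);
    return missing;
}
```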
I already answered it HERE
You can also create an array of booleans of size last_element_in_the_existing_array + 1.
In a for loop, mark as true all the elements that are present in the existing array.
In another for loop, print the indexes of the elements that contain false, AKA the missing ones.
Time Complexity: O(last_element_in_the_existing_array)
Space Complexity: O(last_element_in_the_existing_array)
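If the range is given to you ahead of time (in this case the range is [1,10]), you can XOR the numbers in your range with the numbers given to you. Since XOR is commutative and associative, every number that is present cancels out, and you are left with the XOR of the missing numbers. With a single missing number that is the number itself; with {3,6} missing you get 3 XOR 6 = 5, so the two values still have to be separated by other means.
(1 2 3 4 5 6 7 8 9 10) XOR (1 2 4 5 7 8 9 10) = 3 XOR 6 = 5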
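A quick way to check what the XOR approach actually yields is the sketch below (the function name is made up). It returns the XOR of all missing values: the missing value itself when exactly one number is absent, and a combined value when several are absent.

```cpp
#include <cassert>
#include <vector>

// XOR of all values in [1, n] combined with the XOR of the given values.
// Every value that is present cancels out, leaving the XOR of the missing ones.
unsigned xor_of_missing(const std::vector<unsigned>& present, unsigned n) {
    assert(present.size() <= n);  // assumes distinct values taken from [1, n]
    unsigned acc = 0;
    for (unsigned x = 1; x <= n; ++x) acc ^= x;
    for (unsigned x : present) acc ^= x;
    return acc;
}
```

For the range [1,10] with only 3 missing the result is 3; with both 3 and 6 missing it is 5.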