Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized - c++

Let's say we have an int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is to find the minimum absolute value of the differences between the array's elements.
We need to check pairs like this:
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum absolute value among these subtractions. We can find it with 2 for loops. The question is: is there any algorithm for finding this value with one and only one for loop?

Sort the list and subtract adjacent elements: after sorting, the closest pair must be adjacent, so one pass over the sorted list suffices.
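A minimal sketch of that approach using std::sort, O(n log n) overall (min_abs_diff is a made-up name):

#include <algorithm>
#include <climits>
#include <vector>

// Minimum |v[i] - v[j]| over all pairs: sort, then scan adjacent elements.
int min_abs_diff(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    int best = INT_MAX;
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        best = std::min(best, v[i + 1] - v[i]); // non-negative once sorted
    return best;
}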

The provably best-performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of elements in the array (which of course is the best we can do, as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(vector<int>& v)  // non-const: the list is sorted in place
{
    IntRadixSort(v.begin(), v.end());  // linear-time fixed-width integer sort, see below
    int best = INT_MAX;                // from <climits>
    for (size_t i = 0; i + 1 < v.size(); i++)  // i + 1 < size() also handles an empty vector
    {
        int diff = v[i + 1] - v[i];    // non-negative, since v is now sorted
        if (diff < best)
            best = diff;
    }
    return best;
}
IntRadixSort is a linear-time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed bit-width nature of ints by partitioning them in a series of fixed passes over the bit positions, i.e. partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which takes only linear time.

The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If the issue is the n(n-1)/2 subtractions required by the simple two-loop case, you can use an O(n log n) algorithm.

No. Unless you know the list is sorted, you need two loops.

It's simple: iterate in a single for loop and keep four variables, "minpos", "maxpos", "minneg" and "maxneg". Check the sign of each value you encounter: store the maximum positive number in maxpos and the minimum positive number in minpos, and do the same in a separate branch for numbers less than zero. Then take the difference maxpos-minpos in one variable and maxneg-minneg in another, and print the larger of the two; that gives the desired result. I believe you already know how to find a max and a min in one for loop.
Correction: the above finds the maximum difference. For the minimum you need to take the max and the second max instead of the max and the min :)

This might help you:

int subtractmin = INT_MAX; // smallest difference seen so far
int m = 0, end = 4;        // a has 5 elements, indices 0..4
for (int i = 1; i <= end; i++) {
    if (abs(a[m] - a[i + m]) < subtractmin)
        subtractmin = abs(a[m] - a[i + m]);
    if (i == end && m < 3) { // finished comparing a[m] against the rest
        m = m + 1;           // move to the next base element
        end = end - 1;       // one fewer partner to compare against
        i = 0;               // the loop's i++ restarts the count at 1
    }
}
// still n(n-1)/2 comparisons in total, just written as a single for loop


What is the Big-O of code that uses random number generators?

I want to fill the array 'a' with random values from 1 to N (no repeated values). Let's suppose the Big-O of randInt(i, j) is O(1) and that this function generates random values from i to j.
Examples of the output are:
{1,2,3,4,5} or {2,3,1,4,5} or {5,4,2,1,3} but not {1,2,1,3,4}
#include <set>
using std::set;

set<int> S; // space O(N)
int a[N];   // space O(N)
int i = 0;  // space O(1)
do {
    int val = randInt(1, N);        // space O(1), time O(1); variable val is created many times?
    if (S.find(val) == S.end()) {   // val not seen yet; time O(log N)?
        a[i] = val;                 // time O(1)
        i++;                        // time O(1)
        S.insert(val);              // time O(log N) <-- executed N times, O(N log N)
    }
} while (S.size() < N);             // time O(1)
The while loop will continue until we have generated all the values from 1 to N.
My understanding is that the set keeps its values sorted, and both lookup and insertion take O(log N).
Big-O = O(1) + O(X*log N) + O(N*log N) = O(X*log N)
where X is the total number of iterations of the do loop: the fuller the set gets, the lower the probability of generating a number that is not yet in it, so X grows.
time O(X log N)
space O(2N+1) => O(N), as we reuse the space of val
Near the end it becomes very hard to generate a number that is still missing, so I expect the loop to execute at least N times.
Is the variable val created many times?
What would be a good value for X?
Suppose that the RNG is ideal. That is, repeated calls to randInt(1,N) generate an i.i.d. (independent and identically distributed) sequence of values uniformly distributed on {1,...,N}.
(Of course, in reality the RNG won't be ideal. But let's go with it since it makes the math easier.)
Average case
In the first iteration, a random value val1 is chosen which of course is not in the set S yet.
In the next iteration, another random value is chosen.
With probability (N-1)/N, it will be distinct from val1 and the inner conditional will be executed. In this case, call the chosen value val2.
Otherwise (with probability 1/N), the chosen value will be equal to val1. Retry.
How many iterations does it take on average until a valid (distinct from val1) val2 is chosen? Well, we have an independent sequence of attempts, each of which succeeds with probability (N-1)/N, and we want to know how many attempts it takes on average until the first success. This is a geometric distribution, and in general a geometric distribution with success probability p has mean 1/p. Thus, it takes N/(N-1) attempts on average to choose val2.
Similarly, it takes N/(N-2) attempts on average to choose val3 distinct from val1 and val2, and so on. Finally, the N-th value takes N/1 = N attempts on average.
In total the do loop will be executed
N/N + N/(N-1) + ... + N/2 + N/1 = N * (1 + 1/2 + ... + 1/N) = N * H_N
times on average. The sum 1 + 1/2 + ... + 1/N is the N-th harmonic number H_N, which can be roughly approximated by ln(N). (There's a well-known better approximation which is a bit more complicated and involves the Euler-Mascheroni constant, but ln(N) is good enough for finding asymptotic complexity.)
So to an approximation, the average number of iterations will be N ln N.
What about the rest of the algorithm? Things like inserting N things into a set also take at most O(N log N) time, so can be disregarded. The big remaining thing is that each iteration you have to check if the chosen random value lies in S, which takes logarithmic time in the current size of S. So we have to compute a sum of the form
sum_{k=1}^{N} (N/k) * ln(k)
which, from numerical experiments, appears to be approximately equal to N/2 * (ln N)^2 for large N. (Consider asking for a proof of this on math.SE, perhaps.) EDIT: See this math.SE answer for a short informal proof, and the other answer to that question for a more formal proof.
So in conclusion, the total average complexity is Θ(N (ln N)^2).
Again, this is assuming that the RNG is ideal.
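As a quick numerical sanity check of that approximation (a hypothetical sketch, not part of the original answer), one can evaluate the sum directly and compare it with N/2 * (ln N)^2:

#include <cmath>
#include <cstdio>

// Evaluates sum_{k=1}^{N} (N/k) * ln(k) and compares it with N/2 * (ln N)^2.
int main()
{
    for (double N : {1e3, 1e5, 1e7}) {
        double sum = 0.0;
        for (double k = 2; k <= N; ++k)   // ln(1) = 0, so start at k = 2
            sum += (N / k) * std::log(k);
        double approx = N / 2.0 * std::log(N) * std::log(N);
        std::printf("N=%.0f  sum=%.4e  approx=%.4e  ratio=%.4f\n",
                    N, sum, approx, sum / approx);
    }
    return 0;
}

The ratio tends towards 1 as N grows, consistent with the claim above.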
Worst case
Like xaxxon mentioned, it is in principle possible (though unlikely) that the algorithm will not terminate at all. Thus, the worst case complexity would be O(∞).
That's a very bad algorithm for achieving your goal.
Simply fill the array with the numbers 1 through N and then shuffle.
That's O(N)
https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
To shuffle, pick an index between 0 and N-1 and swap it with index 0. Then pick an index between 1 and N-1 and swap it with index 1. All the way until the end of the list.
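A minimal sketch of that shuffle (the modern Fisher-Yates variant; randomPermutation is a made-up name, and <random> stands in for the question's randInt):

#include <random>
#include <utility>
#include <vector>

// Fill a with 1..N, then Fisher-Yates shuffle it: O(N) total.
std::vector<int> randomPermutation(int N)
{
    static std::mt19937 rng(std::random_device{}());
    std::vector<int> a(N);
    for (int i = 0; i < N; ++i)
        a[i] = i + 1;
    for (int i = 0; i < N - 1; ++i) {
        // pick uniformly from the not-yet-fixed suffix [i, N-1]
        int j = std::uniform_int_distribution<int>(i, N - 1)(rng);
        std::swap(a[i], a[j]);
    }
    return a;
}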
In terms of your specific question, it depends on the behavior of your random number generator. If it's truly random, it may never complete. If it's pseudorandom, it depends on the period of the generator. If it has a period of 5, then you'll never have any dupes.
It's catastrophically bad code with complex behaviour. Generating the first number is O(1). Then the second involves a binary search, so a log N, plus a rerun of the generator should the number be found. The chance of getting a new number is p = 1 - i/N, so the average number of re-runs is the reciprocal, which gives you another factor of N. So O(N^2 log N).
The way to do it is to generate the numbers, then shuffle them. That's O(N).

m smallest values of vector with size n (c++11)

I need the average of the nClose smallest values (except the first zero) in a vector with n elements where we know that nClose + 1 < n, there are only non-negative numbers, and the vector contains at least one zero value. Furthermore, nClose will be a lot smaller than n, say that nClose will be around 10 and n will be around 500.
Normally I would use min_element to find the minimum; however, that is useless here since I need several values. At the moment I use the following code:
sort(diff.begin(), diff.end());
double sum = accumulate(diff.begin() + 1, diff.begin() + 1 + nClose, 0.0); // 0.0: accumulate as double, not int
double avg = sum / nClose;
Due to the sort it runs in O(n log n), whereas we could do it in O(nClose*n) by just finding the minimum and removing it, repeated nClose times. Does one of you know how to accomplish this with the algorithms of C++11?
You can use std::nth_element for that:
nth_element(diff.begin(), diff.begin() + nClose + 1, diff.end()); // the nClose+1 smallest values come first; O(n) on average
double sum = accumulate(diff.begin(), diff.begin() + 1 + nClose, 0.0); // the guaranteed zero adds nothing to the sum
double avg = sum / nClose;
Regarding your remark about finding the minimum and removing it: this would probably be even less efficient than your current solution, as removing an element from a vector requires all elements after it to be shifted one position to the left, making each of the nClose rounds cost two O(n) passes instead of one.
Also, while this should be a pretty efficient solution, I'd warn you against putting too much weight on algorithmic complexity, as the constants may actually play a much bigger role than any advantage in Big O notation.
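A self-contained version of the above, for reference (avgOfSmallest is a made-up name; 0.0 as the accumulate seed keeps the sum in double):

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Average of the nClose smallest non-zero values, relying on the fact
// that the guaranteed zero contributes nothing to the sum.
double avgOfSmallest(std::vector<double> diff, std::size_t nClose)
{
    std::nth_element(diff.begin(), diff.begin() + nClose + 1, diff.end());
    double sum = std::accumulate(diff.begin(), diff.begin() + nClose + 1, 0.0);
    return sum / nClose;
}

int main()
{
    std::vector<double> diff = {4.0, 0.0, 2.5, 9.0, 1.5, 7.0, 3.0};
    std::cout << avgOfSmallest(diff, 2) << '\n'; // (1.5 + 2.5) / 2 = 2
}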

2 player team knowing maximum moves

Given a list of N players who are to play a 2-player game. Each of them is either well versed in making a particular move or is not. Find out the maximum number of moves a 2-player team can know between them.
And also find out how many teams can know that maximum number of moves.
Example: let there be 4 players and 5 moves, where the ith player is versed in the jth move if a[i][j] is 1, and is not otherwise.
10101
11100
11010
00101
Here the maximum number of moves a 2-player team can know is 5, and there are two teams that know that maximum number of moves.
Explanation: (1, 3) and (3, 4) know all 5 moves. So the maximal number of moves a 2-player team knows is 5, and only 2 teams can achieve this.
My approach: for each pair of players I check, move by move, whether either player is versed in it, and for each player I maintain the best move count he can reach with any partner along with the number of partners reaching it.
vector<int> pairmemo;
vector<int> maxmemo; // best move count found for each player
for (int i = 0; i < n; i++) {
    int mymax = INT_MIN; // from <climits>
    int countpairs = 0;
    for (int j = i + 1; j < n; j++) {
        int count = 0;
        for (int k = 0; k < m; k++) {
            if (arr[i][k] == 1 || arr[j][k] == 1) {
                count++;
            }
        }
        if (mymax < count) {
            mymax = count;
            countpairs = 0;
        }
        if (mymax == count) {
            countpairs++;
        }
    }
    pairmemo.push_back(countpairs);
    maxmemo.push_back(mymax);
}
The overall maximum over all N players is the answer, and the count is the corresponding sum of the pairs calculated above.
int maxi = INT_MIN;
for (int i = 0; i < n; i++) {
    if (maxi < maxmemo[i])
        maxi = maxmemo[i];
}
int countmaxi = 0;
for (int i = 0; i < n; i++) {
    if (maxmemo[i] == maxi) {
        countmaxi += pairmemo[i];
    }
}
cout << maxi << "\n";
cout << countmaxi << "\n";
Time complexity : O((N^2)*M)
How can I improve it?
Constraints : N<= 3000 and M<=1000
If you represent each player's set of moves by a very large integer, the problem boils down to finding the pair of players (I, J) with the maximum number of bits set in MovesI OR MovesJ.
So you can use bit-packing and compress all the information on moves into an array of long integers. It would take 16 unsigned 64-bit integers per player according to the constraints. Then, for each pair of players, you OR the corresponding arrays and count the number of ones. This takes O(N^2 * 16), which runs pretty fast given the constraints.
Example:
Let's say the given matrix is
11010
00011
and you used 4-bit integer for packing it.
It would look like:
1101-0000
0001-1000
that is,
13,0
1,8
After the OR, the moves array for the 2-player team becomes 13,8; now count the bits that are one. You have to optimize the counting of bits as well (for that, read the accepted answer here), otherwise the factor M would appear in the complexity. Just maintain one count variable and one maxNumberOfBitsSet variable as you process the pairs.
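A sketch of that approach (assumptions: a GCC/Clang-style __builtin_popcountll for the bit count, and made-up names):

#include <cstdint>
#include <utility>
#include <vector>

// moves[i] holds player i's moves bit-packed into 64-bit words.
// Returns {maximum move count, number of teams achieving it}; O(N^2 * M/64).
std::pair<int, long long>
bestTeam(const std::vector<std::vector<std::uint64_t>>& moves)
{
    int n = (int)moves.size();
    int words = n ? (int)moves[0].size() : 0;
    int maxBits = -1;
    long long teams = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            int bits = 0;
            for (int w = 0; w < words; ++w)
                bits += __builtin_popcountll(moves[i][w] | moves[j][w]);
            if (bits > maxBits) { maxBits = bits; teams = 1; }
            else if (bits == maxBits) ++teams;
        }
    return {maxBits, teams};
}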
What I'll do is:
1. Do a logical OR between all the possible pairs - O(N^2) - and store each result's SUM in a 2D array, ignoring the symmetric half below the diagonal (that way we save half of the calculations - see the example).
2. Find the max value in the 2D array (can be done while doing task 1) -> O(1).
3. Count how many cells in the 2D array equal the maximum value from task 2 - O(N^2).
Sum: 2*O(N^2) + O(1) => O(N^2)
Example (using the data in the question, with letter indexes):
A[10101] B[11100] C[11010] D[00101]
Task 1:
[A|B] = 11101 = SUM(4)
[A|C] = 11111 = SUM(5)
[A|D] = 10101 = SUM(3)
[B|C] = 11110 = SUM(4)
[B|D] = 11101 = SUM(4)
[C|D] = 11111 = SUM(5)
Task 2 (done while doing task 1):
Max = 5
Task 3:
Count = 2
By the way, O(N^2) is the minimum possible since you HAVE to check all the possible pairs.
Since you have to count all optimal teams, unless you find a way to obtain the count without actually finding the solutions themselves, you have to look at or eliminate every possible pair. So the worst case will always be O(N^2*M), which I'll call O(n^3) as long as N and M are both big and of similar size.
However, you can hope for much better performance on the average case by pruning.
Don't check every case. Find ways to eliminate combinations without checking them.
I would sum and store the total number of moves known to each player, and sort the array rows by that value. That should provide an easy check for exiting the loop early. Sorting at O(n log n) should be basically free in an O(n^3) algorithm.
Use Priyank's basic idea, except with bitsets, since you obviously can't use a fixed integer type with 3000 bits.
You may benefit from making a second array of bitsets for the columns, and use that as a mask for pruning players.

Generate a new element different from 1000 elements of an array

I was asked this question in an interview. Consider the scenario of punched cards, where each punched card has a 64-bit pattern. It was suggested that I treat each card as an int, since an int is a collection of bits.
Also consider that I have an array which already contains 1000 such cards. I have to generate a new element every time which is different from the previous 1000 cards. The integers (aka cards) in the array are not necessarily sorted.
Moreover, the question was for C++: where does the 64-bit int come from, and how can I generate a new card from the array such that the generated element is different from all the elements already present?
There are 2^64 different 64-bit integers, a number so much larger than 1000 that the simplest solution would be to just generate a random 64-bit number, and then verify that it isn't in the table of already generated numbers. (The probability that it is is infinitesimal, but you might as well be sure.)
Since most random number generators do not generate 64-bit values, you are left with either writing your own, or (much simpler) combining values, say by generating 8 random bytes and memcpying them into a uint64_t.
As for verifying that the number isn't already present, std::find is just fine for one or two new numbers; if you have to do a lot of lookups, sorting the table and using a binary search would be worthwhile. Or some sort of hash table.
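In modern C++ the pieces above come for free (a sketch, not part of the original answer; newCard is a made-up name):

#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Returns a 64-bit value not present in cards. std::mt19937_64 already
// produces full 64-bit values, so no byte-combining is needed; std::find
// is the linear search suggested above.
std::uint64_t newCard(const std::vector<std::uint64_t>& cards)
{
    static std::mt19937_64 rng(std::random_device{}());
    std::uint64_t candidate;
    do {
        candidate = rng();
    } while (std::find(cards.begin(), cards.end(), candidate) != cards.end());
    return candidate;
}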
I may be missing something, but most of the other answers appear to me as overly complicated.
Just sort the original array and then start counting from zero: if the current count is in the array, skip it; otherwise you have your next number. This algorithm is O(n), where n is the number of newly generated numbers: sorting the (fixed-size) array and skipping its existing values amount to constant overhead per generated number. Here's an example:
#include <algorithm>
#include <iostream>
unsigned array[] = { 98, 1, 24, 66, 20, 70, 6, 33, 5, 41 };
unsigned count = 0;
unsigned index = 0;
int main() {
    std::sort(array, array + 10);
    while ( count < 100 ) {
        if ( index < 10 && count > array[index] )
            ++index;                             // move past a used value
        else {
            if ( index >= 10 || count < array[index] )
                std::cout << count << std::endl; // count is not in the array
            ++count;
        }
    }
}
Here's an O(n) algorithm:
#include <algorithm>
#include <cstdint>
#include <vector>

// O(n): one more than the maximum card cannot be in the list.
std::int64_t generateNewValue(const std::vector<std::int64_t>& list_of_cards)
{
    return *std::max_element(list_of_cards.begin(), list_of_cards.end()) + 1;
}
Note: As #amit points out below, this will fail if INT64_MAX is already in the list.
As far as I'm aware, this is the only way you're going to get O(n). If you want to deal with that (fairly important) edge case, then you're going to have to do some kind of proper sort or search, which will take you to O(n log n).
#arne is almost there. What you need is a self-balancing interval tree, which can be built in O(n lg n) time.
Then take the top node, which will store some interval [i, j]. By the properties of an interval tree, both i-1 and j+1 are valid candidates for a new key, unless i = 0 (the minimum value) or j = UINT64_MAX. If both hold, then you've stored 2^64 elements and you can't possibly generate a new element. Store the new element, which takes O(lg n) worst-case time.
I.e.: init takes O(n lg n), generate takes O(lg n). Both are worst-case figures. The greatest thing about this approach is that the top node will keep "growing" (storing larger intervals) and merging with its successor or predecessor, so the tree will actually shrink in terms of memory use and eventually the time per operation decays to O(1). You also won't waste any numbers, so you can keep generating until you've got 2^64 of them.
This algorithm has O(N lg N) initialisation, O(1) query and O(N) memory usage. I assume you have some integer type which I will refer to as int64 and that it can represent the integers [0, int64_max].
Sort the numbers
Create a linked list containing intervals [u, v]
Insert [1, first number - 1]
For each of the remaining numbers, insert [prev number + 1, current number - 1]
Insert [last number + 1, int64_max]
You now have a list representing the numbers which are not used. You can simply iterate over them to generate new numbers.
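A sketch of this construction (illustrative; buildFreeList and the type names are assumptions, and it assumes INT64_MAX itself is not among the cards):

#include <algorithm>
#include <cstdint>
#include <list>
#include <vector>

struct Interval { std::int64_t lo, hi; }; // inclusive range of unused values

// Steps 1-5 above: O(N lg N) for the sort, O(N) for the list building.
std::list<Interval> buildFreeList(std::vector<std::int64_t> used)
{
    std::sort(used.begin(), used.end());
    std::list<Interval> freeList;
    std::int64_t next = 0; // smallest value not yet accounted for
    for (std::int64_t v : used) {
        if (v > next)
            freeList.push_back({next, v - 1});
        next = v + 1; // duplicates in `used` are harmless here
    }
    freeList.push_back({next, INT64_MAX});
    return freeList;
}

To generate numbers, walk the first interval, emitting lo, lo+1, ... and shrinking it as you go.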
I think the way to go is to use some kind of hashing. So you store your cards in buckets based on, let's say, a MOD operation. Until you create some sort of indexing you are stuck with looping over the whole array.
If you have a look at the HashSet implementation in Java you might get a clue.
Edit: I assumed you wanted them to be random numbers; if you don't mind a sequence, the MAX+1 solution below is a good one :)
You could build a binary tree over the bits of the already existing elements and traverse it until you find a node whose depth is less than 64 and which has fewer than two child nodes. You can then construct a "missing" child for it and get a new element. This should be fairly quick, on the order of O(n) if I'm not mistaken.
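A sketch of that idea as a bit trie (illustrative; Node, insert and findMissing are made-up names, distinct cards assumed, with subtree counts added so the walk always finds a free slot):

#include <cstdint>
#include <memory>

// One trie node per bit; cnt = number of cards stored in this subtree.
struct Node {
    std::unique_ptr<Node> child[2];
    std::uint64_t cnt = 0;
};

// Insert a card as a 64-bit path, most significant bit first; O(64).
void insert(Node* root, std::uint64_t card)
{
    Node* cur = root;
    cur->cnt++;
    for (int bit = 63; bit >= 0; --bit) {
        int b = (card >> bit) & 1;
        if (!cur->child[b])
            cur->child[b].reset(new Node);
        cur = cur->child[b].get();
        cur->cnt++;
    }
}

// Walk down, taking a missing or non-full branch; O(64).
std::uint64_t findMissing(const Node* root)
{
    std::uint64_t result = 0;
    const Node* cur = root;
    for (int bit = 63; bit >= 0; --bit) {
        if (!cur->child[0]) return result;                 // 0-branch is free
        if (!cur->child[1]) return result | (1ULL << bit); // 1-branch is free
        // Both present: follow one whose subtree is not full (capacity
        // 2^bit); one must exist unless all 2^64 values are stored.
        if (cur->child[0]->cnt < (1ULL << bit)) {
            cur = cur->child[0].get();
        } else {
            result |= 1ULL << bit;
            cur = cur->child[1].get();
        }
    }
    return result; // not reached for fewer than 2^64 cards
}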
bool seen[1001] = { false };
for (std::uint64_t card : cards)  // cards: the original array of 1000 values
    if (card <= 1000)             // only values in the range 0..1000 matter
        seen[card] = true;
// 1000 cards cannot mark all 1001 slots, so a false entry must exist
std::size_t fresh = std::find(seen, seen + 1001, false) - seen;
Initialization:
Don't sort the list.
Create a new array 1000 long containing 0..999.
Iterate the list and, if any number is in the range 0..999, invalidate it in the new array by replacing the value in the new array with the value of the first item in the list.
Insertion:
Use an incrementing index to the new array. If the value in the new array at this index is not the value of the first element in the list, add it to the list, else check the value from the next position in the new array.
When the new array is used up, refill it using 1000..1999 and invalidating existing values as above. Yes, this is looping over the list, but it doesn't have to be done for each insertion.
Near O(1) until the list gets so large that occasionally iterating it to invalidate the 'new' new array becomes significant. Maybe you could mitigate this by using a new array that grows, maybe always the size of the list?
Rgds,
Martin
Put them all into a hash table of size > 1000, and find the empty cell (this is the parking problem). Generate a key for that. This will of course work better for bigger table size. The table needs only 1-bit entries.
EDIT: this is the pigeonhole principle.
This needs "modulo tablesize" (or some other "semi-invertible" function) for a hash function.
unsigned hashtab[1001] = { 0 };
unsigned long long numbers[1000] = { /* ... */ };

void init(void)
{
    unsigned idx;
    for (idx = 0; idx < 1000; idx++) {
        hashtab[ numbers[idx] % 1001 ] += 1;
    }
}

unsigned long long generate(void)
{
    unsigned idx;
    /* 1000 numbers cannot fill all 1001 buckets, so an empty one exists */
    for (idx = 0; idx < 1001; idx++) {
        if (!hashtab[idx]) break;
    }
    /* any value congruent to idx mod 1001 is absent from the array */
    return idx + (unsigned long long)rand() * 1001;
}
Based on the solution here: question on array and number
Since there are 1000 numbers, if we consider their remainders with 1001, at least one remainder will be missing. We can pick that as our missing number.
So we maintain an array of counts: C[1001], which will maintain the number of integers with remainder r (upon dividing by 1001) in C[r].
We also maintain a set of numbers for which C[j] is 0 (say using a linked list).
When we move the window over, we decrement the count of the first element (say remainder i), i.e. decrement C[i]. If C[i] becomes zero we add i to the set of numbers. We update the C array with the new number we add.
If we need one number, we just pick a random element from the set of j for which C[j] is 0.
This is O(1) for new numbers and O(n) initially.
This is similar to other solutions but not quite.
How about something simple like this:
1) Partition the array into numbers equal to or below 1000, and numbers above 1000
2) If all the numbers fit within the lower partition then choose 1001 (or any number greater than 1000) and we're done.
3) Otherwise we know that there must exist a number between 1 and 1000 that doesn't exist within the lower partition.
4) Create a 1000 element array of bools, or a 1000-element long bitfield, or whatnot and initialize the array to all 0's
5) For each integer in the lower partition, use its value as an index into the array/bitfield and set the corresponding bool to true (i.e., do a radix-sort-style bucketing pass)
6) Go over the array/bitfield and pick any unset value's index as the solution
This works in O(n) time, or since we've bounded everything by 1000, technically it's O(1), but O(n) time and space in general. There are three passes over the data, which isn't necessarily the most elegant approach, but the complexity remains O(n).
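A sketch of these steps (illustrative; findUnused is a made-up name, and std::bitset plays the role of the bool array/bitfield):

#include <bitset>
#include <cstdint>
#include <vector>

// Steps 1-6 above, with 0..1000 as the "lower" range.
std::uint64_t findUnused(const std::vector<std::uint64_t>& cards)
{
    std::bitset<1001> used;          // step 4: one bit per lower-range value
    bool anyAbove = false;
    for (std::uint64_t c : cards) {  // steps 1 and 5 in a single pass
        if (c <= 1000) used.set(c);
        else anyAbove = true;
    }
    if (!anyAbove) return 1001;      // step 2: everything is in the lower range
    for (std::uint64_t v = 0; v <= 1000; ++v)  // step 6
        if (!used.test(v)) return v;
    return 0; // not reached: step 3 guarantees a lower-range gap
}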
You can create a new array with the numbers that are not in the original array, then just pick one from this new array.
O(1)?

USACO: Subsets (Inefficient)

I am trying to solve subsets from the USACO training gateway...
Problem Statement
For many sets of consecutive integers from 1 through N (1 <= N <= 39), one can partition the set into two sets whose sums are identical.
For example, if N=3, one can partition the set {1, 2, 3} in one way so that the sums of both subsets are identical:
{3} and {1,2}
This counts as a single partitioning (i.e., reversing the order counts as the same partitioning and thus does not increase the count of partitions).
If N=7, there are four ways to partition the set {1, 2, 3, ... 7} so that each partition has the same sum:
{1,6,7} and {2,3,4,5}
{2,5,7} and {1,3,4,6}
{3,4,7} and {1,2,5,6}
{1,2,4,7} and {3,5,6}
Given N, your program should print the number of ways a set containing the integers from 1 through N can be partitioned into two sets whose sums are identical. Print 0 if there are no such ways.
Your program must calculate the answer, not look it up from a table.
End
Before, I was running an O(N*2^N) algorithm by simply enumerating all subsets of the set and computing their sums.
Finding out how horribly inefficient that was, I moved on to mapping the sum sequences...
http://en.wikipedia.org/wiki/Composition_(number_theory)
After many coding attempts to scrape out the repetitions, it was still too slow, so I am back to square one :(.
Now that I look more closely at the problem, it looks like I should try to find a way not to compute the sums themselves, but to go directly to the number of partitions via some kind of formula.
If anyone can give me pointers on how to solve this problem, I'm all ears. I program in Java, C++ and Python.
Actually, there is a better and simpler solution: you should use dynamic programming instead. In your code, you would have an array of integers (whose size is the total sum), where the value at index i represents the number of ways to partition the numbers so that one of the partitions has a sum of i. Here is what your code could look like in C++:
// N and sum are assumed to be compile-time constants, with sum = N*(N+1)/2
int values[N];         // filled with the consecutive integers 1..N
long long dp[sum + 1]; // dp[j] = number of subsets with sum j (zero-initialised)

long long solve() {
    if (sum % 2 == 1)  // an odd total can never be split into two equal halves
        return 0;
    dp[0] = 1;         // the empty subset
    for (int i = 0; i < N; i++) {
        int val = values[i];
        for (int j = sum - val; j >= 0; j--) { // backwards, so each value is used once
            dp[j + val] += dp[j];
        }
    }
    return dp[sum / 2] / 2; // each partition {A, B} is counted twice
}
This gives you an O(N * sum) = O(N^3) solution, which is by far fast enough for this problem.
I haven't tested this code, so there might be a syntax error or something, but you get the point. Let me know if you have any more questions.
This is the same thing as finding the coefficient of the x^0 term in the polynomial (x^1+1/x)(x^2+1/x^2)...(x^n+1/x^n), then halving it since each partition is counted twice; this should take about O(n^3) as an upper bound.