I have n vectors, say 3, and each has some number of elements (not necessarily the same number). I need to choose x combinations between them, like choosing 2 from vectors[n].
Example:
std::vector<int> v1(3), v2(5), v3(2);
There cannot be combinations from one vector itself, like v1[0] and v1[1]. How can I do this?
I've tried everything, but cannot figure this out.
If I understand you correctly you have N vectors, each with a different number of elements (call the size of the ith vector Si), and you wish to choose M combinations of elements from these vectors without repetition. Each combination would be N elements, one element from each vector.
In this case the number of possible permutations is the product of the sizes of the vectors, which, for lack of some form of equation setting I'll call P and compute in C++:
std::vector<size_t> S(N);
// ...populate S...
size_t P = 1;
for(size_t i=0;i<S.size();++i)
P *= S[i];
So now the problem becomes one of picking M distinct numbers between 0 and P-1, then converting each of those M numbers into N indices into the original vectors. I can think of a few ways to compute those M numbers, perhaps the easiest is to keep drawing random numbers until you get M distinct ones (effectively rejection sampling from the distribution).
The slightly more convoluted part is to turn each of your M numbers into a vector of indices. We can do this with
size_t m = /* ... one of the M permutations */;
std::vector<size_t> indices(N);
for(size_t i=0; i<N; ++i)
{
    indices[i] = m % S[i];
    m /= S[i];
}
which basically chops m up into chunks for each index, much like you would when indexing a 2D array represented as a 1D array.
Now if we take your N=3 example we can get the 3 elements of our permutation with
v1[indices[0]]
v2[indices[1]]
v3[indices[2]]
generating as many distinct values of m as required.
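The two steps above (draw M distinct numbers below P, then decode each one into indices) can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function names `decode` and `pick_distinct` are made up for illustration, and rejection sampling is only reasonable while M is not too close to P.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_set>
#include <vector>

// Decode a combination number m (0 <= m < product of sizes) into one index
// per vector, using the same modulo/divide loop as above.
std::vector<std::size_t> decode(std::size_t m, const std::vector<std::size_t>& sizes) {
    std::vector<std::size_t> indices(sizes.size());
    for (std::size_t i = 0; i < sizes.size(); ++i) {
        assert(sizes[i] > 0);          // an empty vector leaves nothing to pick
        indices[i] = m % sizes[i];
        m /= sizes[i];
    }
    return indices;
}

// Draw `count` distinct numbers in [0, P) by rejection sampling:
// keep drawing and discard any number that was already seen.
std::vector<std::size_t> pick_distinct(std::size_t count, std::size_t P, std::mt19937_64& rng) {
    assert(count <= P);                // otherwise the loop could never finish
    std::uniform_int_distribution<std::size_t> dist(0, P - 1);
    std::unordered_set<std::size_t> seen;
    std::vector<std::size_t> picked;
    while (picked.size() < count) {
        const std::size_t m = dist(rng);
        if (seen.insert(m).second) picked.push_back(m);
    }
    return picked;
}
```

For the sizes 3, 5 and 2 from the question, P is 30, and each number returned by `pick_distinct` decodes to one index into v1, one into v2 and one into v3.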
Probably the confusion arises from an unclear definition of the problem. Guessing that you need to pick, N times, 1 element from 1 of V vectors, you can do this:
select N of the V vectors you want to pick from (N <= V)
for each of the selected vectors, select 1 of the vector.size() elements.
I have two arrays, one sorted array int b[] and another unsorted array int a[n] having n elements. The sorted array is made of some or all elements of the unsorted array. Now there are M queries. For each query, values of l and r are given. In each query I need to find the number of elements of b[] which are present in a[l..r].
For eg -
N=5 ,M=2
a= [2 5 1 2 3]
b=[3 2 1]
for each m:
l=1 r=5 ->a[1]=1, a[5]=5 -> answer should be 3 as all elements of b i.e 1,2,3 are present in a
l=2 r=4 ->a[2]=5 , a[4]=2 ->answer should be 2 as only 1 and 2 are there in b for given value of l and r for array.
How to find the answer with not more than O(M * LOGN) time complexity ?
NOTE:
Array is not necessary. Vector can also be used that is if it helps in reducing time complexity or easier to implement the code.
Well i think you can do something like this
std::map<int,int> c;
for(int i = 0; i < (int)b.size(); i++){  // assumes b is a std::vector<int>
    c[b[i]] = 0;
}
for(int i = l; i<=r; i++){
    int number = a[i];
    c[number]++;
}
// Iterate over the values of b and count those for which c[b[i]] is different from 0. The rest is left to you.
The purpose of this is to create a map keyed by the values of b. While iterating over a you increase the matching c value, so that afterwards any entry of c with a value different from zero means that a holds that number of b.
You can use an array instead of a map for better performance if the values start from zero and are reasonably dense. If you use an array, make sure a[i] cannot index out of bounds.
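A possible completion of the idea above is sketched below. It assumes `a` and `b` are `std::vector<int>` and that l and r are 1-based and inclusive, as in the example from the question; the function name `count_present` is made up for illustration. Note that this still walks the whole range for each query, so on its own it does not reach the O(M log N) bound the question asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Counts how many distinct values of b occur somewhere in a[l..r]
// (1-based, inclusive bounds).
int count_present(const std::vector<int>& a, const std::vector<int>& b,
                  std::size_t l, std::size_t r) {
    assert(l >= 1 && l <= r && r <= a.size());
    std::map<int, int> c;
    for (int value : b) c[value] = 0;        // one counter per value of b
    for (std::size_t i = l; i <= r; ++i) {
        auto it = c.find(a[i - 1]);          // values that are not in b are ignored
        if (it != c.end()) ++it->second;
    }
    int present = 0;
    for (const auto& entry : c)
        if (entry.second != 0) ++present;    // this value of b was seen in the range
    return present;
}
```

With a = [2 5 1 2 3] and b = [3 2 1], the range 1..5 gives 3 and the range 2..4 gives 2, matching the expected answers in the question.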
Suppose you are given an array A of size n and an integer k
Now you have to follow this function:
long long sum(int k)
{
    long long sum=0;
    for(int i=0;i<n;i++){
        sum+=min(A[i],k);
    }
    return sum;
}
what is the most efficient way to find sum?
EDIT: if I am given m(<=100000) queries, and given a different k every time, it becomes very time consuming.
If the array changes with each k then you can't do better than O(n). Your only options for optimizing are to use multiple threads (each thread sums some region of the array) or at least to ensure that your loop is properly vectorized by the compiler (or to write a vectorized version manually using intrinsics).
But if the array is fixed and only k changes, then you can answer each query in O(log n) by using the following optimization.
Preprocess the array. This is done only once for all values of k:
Sort elements
Make another array of the same length which contains partial sums
For example:
inputArray: 5 1 3 8 7
sortedArray: 1 3 5 7 8
partialSums: 1 4 9 16 24
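Now, when a new k is given, you need to perform the following steps:
Do a binary search for the given k in sortedArray; it returns the (0-based) index i of the maximal element <= k
Result is partialSums[i] + (partialSums.length - 1 - i) * k, because only the elements after index i are larger than k. For k = 5 in the example, i = 2 and the result is 9 + 2 * 5 = 19. If no element is <= k, the result is simply partialSums.length * k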
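The preprocessing and the query can be sketched in C++ as follows. The struct name `MinSum` and its members are made up for illustration. As an assumption that differs slightly from the layout above, the prefix array gets one extra leading zero, so the case where no element is <= k needs no special handling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct MinSum {
    std::vector<int> sorted;        // the array, sorted ascending
    std::vector<long long> prefix;  // prefix[c] = sum of the c smallest elements

    // Preprocessing, done once: O(n log n).
    explicit MinSum(std::vector<int> values) : sorted(std::move(values)) {
        std::sort(sorted.begin(), sorted.end());
        prefix.assign(sorted.size() + 1, 0);
        for (std::size_t i = 0; i < sorted.size(); ++i)
            prefix[i + 1] = prefix[i] + sorted[i];
    }

    // Returns the sum of min(element, k) over all elements in O(log n).
    long long query(int k) const {
        // c = number of elements <= k; each of them contributes its own value.
        const std::size_t c =
            std::upper_bound(sorted.begin(), sorted.end(), k) - sorted.begin();
        // The remaining elements are larger than k and contribute k each.
        return prefix[c] + static_cast<long long>(sorted.size() - c) * k;
    }
};
```

For the input 5 1 3 8 7, `query(5)` evaluates 1 + 3 + 5 for the three elements that are <= 5, plus 2 * 5 for the two larger ones.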
You can do way better than that if you can sort the array A[i] and have a secondary array prepared once.
The idea is:
Count how many items are at least k: each of them contributes k, so their share of the sum is given by the formula: count*k
Prepare a helper array which will give you the sum of the items below k directly
Preparation
Step 1: sort the array
std::sort(begin(A), end(A));
Step 2: prepare a helper array
// p_sums[i] = sum of the i smallest items, so p_sums[0] == 0.
// std::partial_sum accumulates in A's element type, so A should hold
// long long if the running sums can overflow int.
std::vector<long long> p_sums(A.size() + 1, 0);
std::partial_sum(begin(A), end(A), begin(p_sums) + 1);
Query
long long query(int k) {
    // find the first item that is not strictly below k
    auto it = std::lower_bound(begin(A), end(A), k);
    // compute the distance (number of items strictly below k)
    auto index = std::distance(begin(A), it);
    // items below k contribute themselves, all others contribute k each
    long long result = p_sums[index] + (static_cast<long long>(A.size()) - index) * k;
    return result;
}
The complexity of the query is: O(log(N)) where N is the length of the array A.
The complexity of the preparation is: O(N*log(N)). We could go down to O(N) with a radix sort but I don't think it is useful in your case.
References
std::sort()
std::partial_sum()
std::lower_bound()
What you do seems absolutely fine, unless this is really time critical (that is, customers complain that your app is too slow, you measured it, and this function is the problem), in which case you can try some non-portable vector instructions, for example.
Often you can do things more efficiently by looking at them from a higher level. For example, if I write
for (n = 0; n < 1000000; ++n)
printf ("%lld\n", sum (100));
then this will take an awfully long time (half a trillion additions) and can be done a lot quicker. The same applies if you change one element of the array A at a time and recalculate sum each time.
Suppose there are x elements of array A which are no larger than k, and set B contains those elements of A which are larger than k.
Then the result of function sum(k) equals
sum_small + k * (n - x)
where sum_small is the sum of the x elements that are no larger than k, and n - x is the number of elements in B, each of which contributes k.
You can first sort the array A (1-indexed), and calculate the array pre_A, where
pre_A[i] = pre_A[i - 1] + A[i] (i > 0),
or 0 (i = 0);
Then for each query k, use binary search on A to find the largest element u which is no larger than k. Assume the index of u is index_u (0 if there is no such element), then sum(k) equals
pre_A[index_u] + k * (n - index_u)
The time complexity of each query is O(log n).
In case array A may be dynamically changed, you can use BST to handle it.
https://www.codechef.com/problems/MAXGCD
Chef has a set consisting of N integers. Chef calls a subset of this set to be good if the subset has two or more elements. He denotes all the good subsets as S1, S2, S3, ... , S_{2^N-N-1}. Now he represents the GCD of the elements of each good subset Si as Gi.
Chef wants to find the maximum Gi.
Input
The first line of the input contains an integer T denoting the number of test cases. The description of T test cases follows.
The first line of each test case contains a single integer N denoting the number of elements in the set. The second line contains N space-separated integers A1, A2, ..., AN denoting the elements of the set.
Output
For each test case, output the maximum Gi
My solution:
I generate all possible subsets of the given set.
I calculate the GCD of each set using Euclid's algorithm
I tried to find the maximum of all of them.
This is my code for making all possible subsets:
vector< vector<int> > getAllSubsets(vector<int> set)
{
    vector< vector<int> > subset;
    vector<int> empty;
    subset.push_back( empty );
    for (int i = 0; i < set.size(); i++)
    {
        vector< vector<int> > subsetTemp = subset;
        for (int j = 0; j < subsetTemp.size(); j++)
            subsetTemp[j].push_back( set[i] );
        for (int j = 0; j < subsetTemp.size(); j++)
            subset.push_back( subsetTemp[j] );
    }
    return subset;
}
However, I get TLE while going with this approach. Where am I going wrong in this?
One optimization is that you never need to consider subsets larger than 2 elements. This is because if you add another element, the GCD can never increase: it either stays the same or decreases.
This leads to an O(n^2) algorithm. The problem statement says that n can be as large as 100 000, so we need to do even better.
The problem also says that the given values are at most 500 000, so the GCD cannot exceed this.
Let count[i] = how many times the value i appears in the array.
Then we can apply something similar to the Sieve of Eratosthenes: for a fixed value v, see if you can find two multiples of v (sum of count[multiple_of_v] > 1). If you can, then you can have a GCD of v. Keep track of the max you can find.
Pseudocode:
V = max(given array)
cnt[i] = how many times value i occurs in given array
for v = V down to 1:
    num_multiples_v = 0
    for j = v up to V in steps of v:   # only the multiples of v
        num_multiples_v += cnt[j]
        if num_multiples_v > 1:        # two multiples found, no need to look further
            print v as solution
            return
Complexity will be O(V log V), because the inner loops take V/1 + V/2 + ... + V/V steps in total, which should be very fast.
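The sieve-like idea can be sketched in C++ as follows. It assumes the input has at least two elements and that all values are positive; the function name `max_pair_gcd` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the largest v such that at least two elements of `a` are
// multiples of v; that v is the maximum GCD over all pairs.
int max_pair_gcd(const std::vector<int>& a) {
    assert(a.size() >= 2);
    const int V = *std::max_element(a.begin(), a.end());
    std::vector<int> cnt(V + 1, 0);          // cnt[x] = occurrences of value x
    for (int x : a) {
        assert(x > 0);
        ++cnt[x];
    }
    for (int v = V; v >= 1; --v) {
        int multiples = 0;
        for (int j = v; j <= V; j += v) {    // visit only the multiples of v
            multiples += cnt[j];
            if (multiples > 1) return v;     // two multiples of v exist
        }
    }
    return 1;  // not reached: v = 1 divides every element
}
```

The first v that succeeds is the answer because any pair's GCD g has two multiples of its own, so no pair can have a GCD larger than the largest v with two multiples.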
You don't need all subsets.
Some basic properties of gcd:
gcd(a,b) == gcd(b,a)
gcd(a,b) <= a
gcd(a,b) <= b
gcd(a,b,c) == gcd(a,gcd(b,c)) == gcd(gcd(a,b),c)
and with this, it's easy to show that
gcd(a,b) >= gcd(a,b,c) >= gcd(a,b,c,d)...
for any natural numbers a,b,c,d.
You want to find the (or one of the) subsets with the max. gcd. According to the rules above, one of these subsets has exactly two elements (given that the whole set has at least two elements). So the first optimization is to throw the subset generation away and make something like
max = 0
for all set elements "a"
{
for all set elements "b" other than "a"
{
if(gcd(a,b) > max)
max = gcd(a,b)
}
}
If that is still not enough, sort the set from the largest to the smallest element first, and for each gcd calculated in the loops, delete every set element smaller than the calculated value.
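The pairwise loop can be sketched in C++ as follows, without the sorting and pruning step. The function names are made up for illustration, and the GCD is computed with Euclid's algorithm as in the question. Only pairs of distinct positions are compared, so an element is never paired with itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Euclid's algorithm for non-negative integers.
int gcd_euclid(int a, int b) {
    while (b != 0) {
        const int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// O(n^2) maximum GCD over all pairs of distinct positions.
int max_pair_gcd_quadratic(const std::vector<int>& a) {
    assert(a.size() >= 2);
    int best = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            best = std::max(best, gcd_euclid(a[i], a[j]));
    return best;
}
```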
This is a follow up question to Given a sequence of N numbers ,extract number of sequences of length K having range less than R?
I basically need a vector V of size N as an answer, such that V[i] denotes the number of sequences of length i which have range <=R.
Traditionally, in recursive solutions, you would compute the solution for K = 0, K = 1, and then find some kind of recurrence relation between subsequent elements to avoid recomputing the solution from scratch each time.
However here I believe that maybe attacking the problem from the other side would be interesting, because of the property of the spread:
Given a sequence of spread R (or less), any subsequence has a spread inferior to R as well
Therefore, I would first establish a list of the longest subsequences of spread R beginning at each index. Let's call this list M, and have M[i] = j where j is the highest index in S (the original sequence) for which S[j] - S[i] <= R. This is going to be O(N).
Now, for any i, the number of sequences of length K starting at i is either 0 or 1, and this depends whether K is greater than M[i] - i or not. A simple linear pass over M (from 0 to N-K) gives us the answer. This is once again O(N).
So, if we call V the resulting vector, with V[k] denoting the number of subsequences of length K in S with spread inferior to R, then we can do it in a single iteration over M:
for i in [0, len(M)]:
    for k in [0, M[i] - i]:
        ++V[k]
The algorithm is simple, however the number of updates can be rather daunting. In the worst case, supposing that M[i] - i equals N - i, it is O(N*N) complexity. You would need a better data structure (probably an adaptation of a Fenwick Tree) to use this algorithm and lower the cost of computing those numbers.
If you are looking for contiguous sequences, try doing it recursively : The K-length subsequences set having a range inferior than R are included in the (K-1)-length subsequences set.
At K=0, you have N solutions.
Each time you increase K, you append (resp. prepend) the next (resp. previous) element, check if the range is inferior to R, and either store it in a set (look out for duplicates!) or discard it depending on the result.
I think the complexity of this algorithm is O(n*n) in the worst-case scenario, though it may be better on average.
I think Matthieu has the right answer when looking for all sequences with spread R.
As you are only looking for sequences of length K, you can do a little better.
Instead of looking at the maximum sequence starting at i, just look at the sequence of length K starting at i, and see if it has range R or not. Do this for every i, and you have all sequences of length K with spread R.
You don't need to go through the whole list, as the latest start point for a sequence of length K is n-K+1. So complexity is something like (n-K+1)*K = n*K - K*K + K. For K=1 this is n, and for K=n it is n. For K=n/2 it is n*n/2 - n*n/4 + n/2 = n*n/4 + n/2, which I think is the maximum. So while this is still O(n*n), for most values of K you get a little better.
Start with a simpler problem: count the maximal length of sequences, starting at each index and having the range, equal to R.
To do this, let the first pointer point to the first element of the array. Advance the second pointer (also starting from the first element of the array) while the sequence between the pointers has range less than or equal to R. Push every array element passed by the second pointer to a min-max-queue, made of a pair of min-max-stacks, described in this answer. When the difference between the max and min values reported by the min-max-queue exceeds R, stop advancing the second pointer, increment V[ptr2-ptr1], increment the first pointer (removing the element it pointed to from the min-max-queue), and continue advancing the second pointer (keeping the range under control).
When second pointer leaves bounds of the array, increment V[N-ptr1] for all remaining ptr1 (corresponding ranges may be less or equal to R). To add all other ranges, that are less than R, compute cumulative sum of array V[], starting from its end.
Both time and space complexities are O(N).
Pseudo-code:
p1 = p2 = 0;
do {
    do {
        min_max_queue.push(a[p2]);
        ++p2;
    } while (p2 < N && min_max_queue.range() <= R);
    if (p2 < N) {
        ++v[p2 - p1 - 1];
        min_max_queue.pop();
        ++p1;
    }
} while (p2 < N);
for (i = 1; i <= N-p1; ++i) {
    ++v[i];
}
sum = 0;
for (j = N; j > 0; --j) {
    value = v[j];
    v[j] += sum;
    sum += value;
}
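A runnable variant of the two-pointer idea is sketched below. It is not a line-by-line translation of the pseudo-code: as an assumption, the min-max-queue is replaced by two monotonic deques of indices, a different but common way to track the window minimum and maximum. The function name `count_by_length` is made up for illustration, and the sequences are taken to be contiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Returns v where v[k] (1 <= k <= n) is the number of contiguous
// subsequences of length k whose max - min is at most R. v[0] is unused.
std::vector<long long> count_by_length(const std::vector<int>& a, long long R) {
    assert(R >= 0);
    const std::size_t n = a.size();
    std::vector<long long> v(n + 1, 0);
    std::deque<std::size_t> maxq, minq;  // indices; values decreasing / increasing
    std::size_t p1 = 0;
    for (std::size_t p2 = 0; p2 < n; ++p2) {
        while (!maxq.empty() && a[maxq.back()] <= a[p2]) maxq.pop_back();
        while (!minq.empty() && a[minq.back()] >= a[p2]) minq.pop_back();
        maxq.push_back(p2);
        minq.push_back(p2);
        // While the window [p1, p2] is too wide, the longest valid window
        // starting at p1 is [p1, p2 - 1], which has length p2 - p1.
        while (static_cast<long long>(a[maxq.front()]) - a[minq.front()] > R) {
            ++v[p2 - p1];
            if (maxq.front() == p1) maxq.pop_front();
            if (minq.front() == p1) minq.pop_front();
            ++p1;
        }
    }
    // The remaining start positions extend validly to the end of the array.
    for (std::size_t s = p1; s < n; ++s) ++v[n - s];
    // A window with longest valid length L also yields one valid window of
    // every shorter length, so accumulate from the end.
    for (std::size_t k = n; k > 1; --k) v[k - 1] += v[k];
    return v;
}
```

Each index enters and leaves each deque at most once, so the whole computation stays O(N) in time and space.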
Suppose I have an array of 100 numbers. The only distinct values in the array are 1, 2 and 3. The values are randomly ordered throughout the array. For instance, the array might be populated as:
int values[100];
for (int i = 0; i < 100; i++)
values[i] = 1 + rand() % 3;
How can I efficiently sort an array like this?
The fastest solution is not to "sort" at all:
Run through the array and count the number of occurrences of 1,2 and 3. These counts should hopefully fit in registers...
Fill the array with the right number of 1s, 2s and 3s, overwriting whatever is there already.
At the end you will have a fully sorted array.
In general, this can be a useful O(n) sorting algorithm when you have a very small range of possible values compared to the size of the array.
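The count-and-fill approach can be sketched as follows, assuming the values live in a `std::vector<int>` and are all 1, 2 or 3. The function name `sort123` is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts a vector whose values are all 1, 2 or 3 by counting and refilling.
void sort123(std::vector<int>& values) {
    std::size_t counts[4] = {0, 0, 0, 0};    // counts[v] = occurrences of v
    for (int v : values) {
        assert(v >= 1 && v <= 3);
        ++counts[v];
    }
    auto out = values.begin();
    for (int v = 1; v <= 3; ++v)
        out = std::fill_n(out, counts[v], v);  // overwrite with counts[v] copies of v
}
```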
Dutch National flag algorithm is the commonly cited algorithm for this and is actually the partition step in one of the variants of quicksort (1 corresponds to less than, 2 to equal to and 3 to greater than). In that variant, you don't need to sort the middle portion.
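A minimal sketch of that three-way partition for the values 1, 2 and 3 follows. The function name `dutch_flag_sort` is made up for illustration. It sorts in place in a single pass, using only swaps.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// In-place three-way partition of a vector containing only 1, 2 and 3.
void dutch_flag_sort(std::vector<int>& values) {
    std::size_t low = 0;               // values[0, low) are all 1
    std::size_t mid = 0;               // values[low, mid) are all 2
    std::size_t high = values.size();  // values[high, n) are all 3
    while (mid < high) {               // values[mid, high) is still unexamined
        if (values[mid] == 1) {
            std::swap(values[low], values[mid]);
            ++low;
            ++mid;
        } else if (values[mid] == 2) {
            ++mid;
        } else {
            assert(values[mid] == 3);
            --high;
            std::swap(values[mid], values[high]);  // swapped-in value is examined next
        }
    }
}
```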