Recursion vs bitmasking for getting all combinations of vector elements - C++

While practicing for programming competitions (like ACM, Code Jam, etc.), I've come across problems that require me to generate all possible combinations of a vector's elements.
Let's say that I have the vector {1,2,3}; I'd need to generate the following combinations (order is not important):
1
2
3
1 2
1 3
2 3
1 2 3
So far I've done it with the following code:
void getCombinations(int a)
{
    printCombination();                    // print the current contents of `combination`
    for (int j = a; j < vec.size(); j++)
    {
        combination.push_back(vec.at(j));  // include vec[j] in the combination
        getCombinations(j + 1);            // recurse over the remaining elements
        combination.pop_back();            // backtrack
    }
}
Calling getCombinations(0); does the job for me. But is there a better (faster) way? I've recently heard of bitmasking. As I understand it, for every number between 1 and 2^N - 1 I turn that number into binary, where the 1s and 0s indicate whether or not the corresponding element is included in the combination.
How do I implement this efficiently, though? If I turn every number into binary the standard way (by repeatedly dividing by 2) and then check all the digits, it seems to waste a lot of time. Is there a faster way? Should I keep using the recursion (unless I run into numbers so big that recursion can't do the job because of the stack limit)?

The number of combinations you can get is 2^n, where n is the number of your elements. You can interpret every integer from 0 to 2^n - 1 as a mask. In your example (elements 1, 2, 3) you have 3 elements, so the masks would be 000, 001, 010, 011, 100, 101, 110, and 111. Let every place in the mask represent one of your elements: for each place that holds a 1, take the corresponding element; if the place holds a 0, leave the element out. For example, the number 5 would be the mask 101, and it would generate this combination: 1, 3.
If you want to have a fast and relatively short code for it, you could do it like this:
#include <cstdio>
#include <vector>
using namespace std;

int main(){
    vector<int> elements;
    elements.push_back(1);
    elements.push_back(2);
    elements.push_back(3);

    // 1<<n is essentially pow(2, n), but much faster and only for integers
    // the iterator i will be our mask, i.e. its binary form tells us which elements to use and which not
    for (int i = 0; i < (1 << elements.size()); ++i){
        printf("Combination #%d:", i + 1);
        for (int j = 0; j < (int)elements.size(); ++j){
            // 1<<j shifts the 1 by j places and then we check the j-th binary digit of i
            if (i & (1 << j)){
                printf(" %d", elements[j]);
            }
        }
        printf("\n");
    }
    return 0;
}

Related

Every sum possibilities of elements

From a given array (call it numbers[]), I want another array (results[]) which contains all the sum possibilities between elements of the first array.
For example, if I have numbers[] = {1,3,5}, results[] will be {1,3,5,4,8,6,9,0}.
There are 2^n possibilities.
It doesn't matter if a number appears two times, because results[] will be a set.
I did it for sums of pairs and triplets, and it's very easy. But I don't understand how it works when we sum 0, 1, 2, or n numbers.
This is what I did for pairs :
std::unordered_set<int> pairPossibilities(std::vector<int> &numbers) {
    std::unordered_set<int> results;
    for (int i = 0; i < numbers.size() - 1; i++) {
        for (int j = i + 1; j < numbers.size(); j++) {
            results.insert(numbers.at(i) + numbers.at(j));
        }
    }
    return results;
}
Also, assuming that numbers[] is sorted, is there any possibility of sorting results[] while we fill it?
Thanks!
This can be done with Dynamic Programming (DP) in O(n*W), where W = sum{numbers}.
This is basically the same solution as for the Subset Sum Problem, exploiting the fact that the problem has optimal substructure.
DP[i, 0] = true
DP[-1, w] = false,  for w != 0
DP[i, w] = DP[i-1, w] OR DP[i-1, w - numbers[i]]
Start by following the above recurrence to fill the table up to DP[n, sum{numbers}].
As a result, you will get:
DP[n, w] = true if and only if w can be constructed from numbers
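As a rough illustration, here is a minimal bottom-up sketch of that DP (the function name is mine), collapsing the table to a single boolean row. Conveniently, this also answers the sorting sub-question, since the sums come out in ascending order for free:
#include <vector>
using namespace std;

// reachable[w] ends up true iff some subset of `numbers` sums to w.
// O(n*W) time, O(W) space, with W = sum of all numbers.
vector<int> allSubsetSums(const vector<int>& numbers) {
    int W = 0;
    for (int x : numbers) W += x;
    vector<bool> reachable(W + 1, false);
    reachable[0] = true;                      // the empty subset sums to 0
    for (int x : numbers)
        for (int w = W; w >= x; --w)          // go downwards so each number
            if (reachable[w - x])             // is used at most once
                reachable[w] = true;
    vector<int> sums;
    for (int w = 0; w <= W; ++w)
        if (reachable[w]) sums.push_back(w);  // collected in ascending order
    return sums;
}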
Following on from the Dynamic Programming answer, you could go with a recursive solution and then use memoization to cache the results - a top-down approach, in contrast to Amit's bottom-up one.
void generateSubsetSum(vector<int>& ans, int sum, vector<int>& nums, int i)
{
    if (i == nums.size())
    {
        ans.push_back(sum);
        return;
    }
    generateSubsetSum(ans, sum + nums[i], nums, i + 1); // branch that includes nums[i]
    generateSubsetSum(ans, sum, nums, i + 1);           // branch that skips nums[i]
}

vector<int> subsetSum(vector<int>& nums)
{
    vector<int> ans;
    generateSubsetSum(ans, 0, nums, 0);
    return ans;
}
The result is {9 4 6 1 8 3 5 0} for the set {1,3,5}.
This simply picks the number at the first index i, adds it to the sum, and recurses. Once that returns, the second branch follows: sum without nums[i] added. To memoize this, you would keep a cache storing the sums already generated from each i.
I would do something like this (seems easier). [I wanted to put this in a comment, but I can't write the shifting and element removal there - you might need a linked list.]
1 3 5
  3 5
-----
4 8

1 3 5
    5
-----
6

1 3 5
  3 5
    5
-----
9
Add 0 to the list in the end.
Another way to solve this is to create the subset arrays of the vector's elements and then sum up each array's data.
E.g.
1 3 5 = {1,3} + {1,5} + {3,5} + {1,3,5}, after removing the single-element sets.
Keep in mind that it is always easier said than done: a single tiny mistake in the implemented algorithm can take a lot of time in debugging to find. =]]
There has to be a binary-chop version as well. This one is a bit heavy-handed and relies on the set of answers you mention to filter repeated results (a code sketch follows below):
Split the list into 2 and generate the list of sums for each half by recursion:
the minimal state is either 2 entries with 1 result, or 3 entries with 3 results;
alternatively, take it down to 1 entry with 0 results, if you insist.
Then combine the 2 halves:
all the returned entries from both halves are legitimate results;
there are 4 additional result sets to add to the output by combining: the first-half inputs vs the second-half inputs, the first-half outputs vs the second-half inputs, the first-half inputs vs the second-half outputs, and the first-half outputs vs the second-half outputs.
Note that the outputs of the two halves may have some elements in common, but they should be treated separately for these combines.
The inputs can be scrubbed from the returned outputs of each recursion if the inputs are legitimate final results; if they are, they can either be added back in at the top level or returned by the bottom level and not considered again in the combining.
You could use a bitfield instead of a set to filter out the duplicates. There are reasonably efficient ways of stepping through a bitfield to find all the set bits. The maximum size of the bitfield is the sum of all the inputs.
There is no intelligence here, but there is plenty of opportunity for parallel processing within the recursion and combine steps.
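Here is a minimal sketch of that binary chop under one simplifying assumption: each recursive call returns every subset sum of its slice, including 0 for the empty subset. Carrying the 0 makes the four combine cases collapse into a single cross-sum, and the set filters the repeated results. It assumes a non-empty slice, and all names are mine:
#include <set>
#include <vector>
using namespace std;

// All subset sums of v[lo, hi), with 0 included for the empty subset.
set<int> subsetSums(const vector<int>& v, size_t lo, size_t hi) {
    if (hi - lo == 1) return {0, v[lo]};   // minimal state: one entry
    size_t mid = lo + (hi - lo) / 2;       // split the list into 2
    set<int> left  = subsetSums(v, lo, mid);
    set<int> right = subsetSums(v, mid, hi);
    set<int> out;
    for (int l : left)                     // combine the 2 halves:
        for (int r : right)                // every left sum with every right sum
            out.insert(l + r);             // the set drops the duplicates
    return out;
}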

Iterating through all possible combinations

My objective is to iterate through all combinations of a given number of 1s and 0s. Say, if I am given the number 5, what would be a sufficiently fast way to list
1110100100,
1011000101, etc.
(each different combination of five 1s and five 0s)?
I am trying to avoid iterating through all possible permutations and checking whether five 1s exist, as 2^n is much greater than (n choose n/2). Thanks.
UPDATE
The answer can be calculated efficiently (it recurses 10 deep) with:
// call combo() to have calculate(b) called with every valid bitset combo exactly once
void combo(int index = 0, int numones = 0) {
    static bitset<10> b;
    if (index == 10) {
        calculate(b); // can't have too many zeroes or too many ones, so it must be 5 of each
    } else {
        if (index - numones < 5) { // ignore paths with too many zeroes
            b[index] = 0;
            combo(index + 1, numones);
        }
        if (numones < 5) { // ignore paths with too many ones
            b[index] = 1;
            combo(index + 1, numones + 1);
        }
    }
}
(Above code is not tested)
You can transform the problem. If you fix the 1s (or vice versa), then it's simply a matter of where you put the 0s: for five 1s, there are 5+1 bins, and you want to distribute the five 0s among those bins.
This can be solved with one level of recursion per bin and a loop for each bin (put 0..remaining elements in the bin - except for the last bin, where you have to put all the remaining elements); see the sketch below.
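A minimal sketch of that bins recursion, with the counts hard-coded to five 1s and five 0s (all names are mine): each non-final bin chooses how many 0s to emit before its 1, and the last bin takes whatever 0s remain.
#include <cstdio>
#include <string>
using namespace std;

void placeZeros(string prefix, int bin, int bins, int zerosLeft) {
    if (bin == bins - 1) {                          // last bin: all remaining 0s
        printf("%s\n", (prefix + string(zerosLeft, '0')).c_str());
        return;
    }
    for (int z = 0; z <= zerosLeft; ++z)            // put 0..zerosLeft zeroes here,
        placeZeros(prefix + string(z, '0') + '1',   // then the 1 closing this bin
                   bin + 1, bins, zerosLeft - z);
}

int main() {
    const int ones = 5, zeros = 5;
    placeZeros("", 0, ones + 1, zeros);             // prints all C(10,5) = 252 strings once
    return 0;
}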
Another way to think about it is as a variant of the string permutation question - just build a string of length 2n (e.g. 111000 for n = 3) and then use the same algorithm as for string permutations to build the result.
Note that the naive permutation algorithm will print duplicate results. However, it can easily be adapted to skip such duplicates by keeping, in the recursive function, a bool array of the values already placed at the current position.
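For reference, the standard library can drive this string-permutation variant directly: std::next_permutation generates each distinct arrangement exactly once (it already accounts for repeated characters), so no extra duplicate bookkeeping is needed.
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "0000011111";   // start from the lexicographically smallest arrangement
    do {
        cout << s << '\n';     // each of the C(10,5) = 252 strings, exactly once
    } while (next_permutation(s.begin(), s.end()));
    return 0;
}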

2 player team knowing maximum moves

Given a list of N players who are to play a 2-player game. Each of them is either well versed in a particular move or not. Find the maximum number of moves a 2-player team can know.
And also find how many teams can know that maximum number of moves.
Example: let there be 4 players and 5 moves, with the ith player versed in the jth move if a[i][j] is 1, and otherwise 0.
10101
11100
11010
00101
Here the maximum number of moves a 2-player team can know is 5, and there are two teams that can know that maximum number of moves.
Explanation: (1, 3) and (3, 4) know all 5 moves. So the maximum number of moves a 2-player team knows is 5, and only 2 teams can achieve this.
My approach: for each pair of players I check whether at least one of the two is versed in each move, and for each player I maintain the number of pairs he can make with the other players at his local maximum move count.
vector<int> pairmemo;   // per-player count of pairs achieving his best
vector<int> maxmemo;    // per-player best move count
for (int i = 0; i < n; i++) {
    int mymax = INT_MIN;
    int countpairs = 0;
    for (int j = i + 1; j < n; j++) {
        int count = 0;
        for (int k = 0; k < m; k++) {
            if (arr[i][k] == 1 || arr[j][k] == 1) {
                count++;
            }
        }
        if (mymax < count) {
            mymax = count;
            countpairs = 0;
        }
        if (mymax == count) {
            countpairs++;
        }
    }
    pairmemo.push_back(countpairs);
    maxmemo.push_back(mymax);
}
The overall maximum across all N players is the answer, and the count is the corresponding sum of the pairs calculated above.
int maxi = INT_MIN;
for (int i = 0; i < n; i++) {
    if (maxi < maxmemo[i])
        maxi = maxmemo[i];
}
int countmaxi = 0;
for (int i = 0; i < n; i++) {
    if (maxmemo[i] == maxi) {
        countmaxi += pairmemo[i];
    }
}
cout << maxi << "\n";
cout << countmaxi << "\n";
Time complexity: O(N^2 * M)
How can I improve it?
Constraints: N <= 3000 and M <= 1000
If you represent each set of moves by a very large integer, the problem boils down to finding the pair of players (I, J) with the maximum number of bits set in Moves_I OR Moves_J.
So you can use bit-packing and compress all the move information into an array of unsigned long integers. It would take 16 64-bit unsigned long integers to store the moves under the given constraints. For each pair of players you OR the corresponding arrays and count the number of ones. This takes O(N^2 * 16), which runs pretty fast given the constraints.
Example:
Let's say the given matrix is
11010
00011
and you pack it with 4-bit integers.
It would look like:
1101-0000
0001-1000
that is,
13,0
1,8
After the OR, the moves array for the 2-player team becomes 13,8; now count the bits that are one. You have to optimize the counting of bits as well (for that, read the accepted answer here), otherwise the factor M would appear in the complexity. Just maintain one count variable and one maxNumberOfBitsSet variable as you process the pairs.
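A sketch of this idea using std::bitset instead of hand-packed words (the function name is mine): bitset::count() is an efficient popcount, so the per-pair work drops to the bitset's fixed word count rather than M individual checks.
#include <bitset>
#include <utility>
#include <vector>
using namespace std;

const int MAXM = 1000;   // M <= 1000 per the constraints

// Returns {maximum moves a 2-player team knows, number of such teams}.
pair<int, long long> bestTeams(const vector<bitset<MAXM>>& moves) {
    int best = -1;
    long long teams = 0;
    int n = (int)moves.size();
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            int known = (int)(moves[i] | moves[j]).count(); // moves the pair knows
            if (known > best) { best = known; teams = 1; }
            else if (known == best) ++teams;
        }
    return {best, teams};
}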
What I'll do is:
1. Do a logical OR between all the possible pairs - O(N^2) - and store each result's SUM in a 2D array, with the symmetric diagonal ignored (that way we save half of the calculation - see the example).
2. Find the max value in the 2D array (can be done while doing task 1) -> O(1).
3. Count how many cells in the 2D array equal the maximum value from task 2 - O(N^2).
In total: 2*O(N^2) + O(1) => O(N^2).
Example (using the data in the question, with letter indices):
A[10101] B[11100] C[11010] D[00101]
Task 1:
[A|B] = 11101 = SUM(4)
[A|C] = 11111 = SUM(5)
[A|D] = 10101 = SUM(3)
[B|C] = 11110 = SUM(4)
[B|D] = 11101 = SUM(4)
[C|D] = 11111 = SUM(5)
Task 2 (done while doing task 1):
Max = 5
Task 3:
Count = 2
By the way, O(N^2) is the minimum possible since you HAVE to check all the possible pairs.
Since you have to find all solutions, unless you find a way to obtain the count without actually examining the solutions themselves, you have to look at or eliminate all possible pairs. So the worst case will always be O(N^2 * M), which I'll call O(n^3) as long as N and M are both large and of similar size.
However, you can hope for much better performance on the average case by pruning.
Don't check every case. Find ways to eliminate combinations without checking them.
I would sum and store the total number of moves known to each player, and sort the array rows by that value. That should provide an easy check for exiting the loop early. Sorting at O(n log n) should be basically free in an O(n^3) algorithm.
Use Priyank's basic idea, except with bitsets, since you obviously can't use a fixed-width integer type with 1000 bits.
You may benefit from making a second array of bitsets for the columns, and use that as a mask for pruning players.

C++ Perform calculations on a huge array

I was asked a question in a job interview and I did not know the correct answer.
The question was:
If you have an array of 10 000 000 ints between 1 and 100, determine (efficiently) how many pairs of these ints sum up to 150 or less.
I don't know how to do this without a loop within a loop, but that is not very efficient.
Does anyone have some pointers for me, please?
One way is to create a smaller counting array of 101 elements (one slot per value from 1 to 100, index 0 unused). Loop through the 10,000,000 elements and count how many there are of each value, storing the counts in that array.
int counter[101] = {0}; // one counter per value 1..100, all starting at 0
for (int i = 0; i < 10000000; i++) {
    counter[input[i]]++;
}
Then do a second loop j from 1 to 100. Inside it, run a loop k from 1 to min(150-j, j). If k != j, add counter[j]*counter[k]; if k == j, add counter[j]*(counter[j]-1)/2.
The total sum is your result.
Your total run time is bounded above by 10,000,000 + 100*100 = 10,010,000 (it's actually smaller than this).
This is a lot faster than (10,000,000)^2, which is 100,000,000,000,000.
Of course, you have to give up 101 ints of space in memory, and you can delete the counter when you're done.
Note also (as pointed out in the discussion below) that this counts each unordered pair once; if order matters, i.e. (a, b) and (b, a) count separately, just multiply the result by 2.
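A minimal self-contained sketch of that counting approach (the function name is mine). Note the 64-bit totals: with 10^7 inputs, a single product of two counters can reach about 10^14, which overflows a 32-bit int.
#include <algorithm>
#include <vector>
using namespace std;

// Unordered pairs of distinct indices whose values (each in 1..100) sum to <= 150.
long long countPairs(const vector<int>& input) {
    long long counter[101] = {0};
    for (int v : input) counter[v]++;          // single pass over all elements
    long long total = 0;
    for (int j = 1; j <= 100; ++j) {
        int kmax = min(150 - j, j);            // partners k <= j with j + k <= 150
        for (int k = 1; k <= kmax; ++k) {
            if (k != j) total += counter[j] * counter[k];
            else        total += counter[j] * (counter[j] - 1) / 2; // pairs within one bucket
        }
    }
    return total;
}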
First, I would sort the array. Then you make a single pass through the sorted array. For the value n in the current cell, you find the largest value that is still allowed as a partner (e.g. for 15 it is 135). Now you find the index of the last occurrence of such a value in the array (a binary search), and that is the number of pairs for n. Sum all of these up and you have (if my mind is working correctly) counted each pair twice, so if you divide the sum by 2, you have the correct number.
The solution should be O(n log n), compared to the trivial one, which is O(n^2).
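A sketch of that idea with std::upper_bound doing the binary search (the function name is mine); the self-pairing correction is the one subtlety the prose glosses over.
#include <algorithm>
#include <vector>
using namespace std;

long long countPairsSorted(vector<int> a) {
    sort(a.begin(), a.end());                               // O(n log n)
    long long total = 0;
    for (size_t i = 0; i < a.size(); ++i) {
        // how many values v in the array satisfy a[i] + v <= 150
        total += upper_bound(a.begin(), a.end(), 150 - a[i]) - a.begin();
        if (a[i] + a[i] <= 150) --total;                    // don't pair a[i] with itself
    }
    return total / 2;                                       // each pair was counted twice
}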
These kinds of questions always require a mixture of mathematical insight and efficient programming. They don't want brute force.
First Insight
Numbers can be grouped according to how they will pair with other groups.
Putting them into:
1-50 | 51-75 | 76-100
  A  |   B   |   C
Group A can pair with anything.
Group B can pair with A and B, and possibly with C.
Group C can pair with A, and possibly with B, but not with C.
The "possibly" is where we need some more insight.
Second Insight
For each number in B we need to check how many numbers there are up to its complement with respect to 150. For example, with 62 from group B we want to know how many numbers in group C are less than or equal to 88.
For each number in C we add up the tallies up to it, e.g. tallies for 76, 77, 78, ..., 88. This is known mathematically as the partial sum.
In the standard library (header <numeric>) there is a function which produces a partial_sum:
vector<int> tallies(25); // this is room for the tallies from C (values 76..100)
vector<int> partial_sums(25);
partial_sum(tallies.begin(), tallies.end(), partial_sums.begin());
Symmetry means this sum only needs to be done for one group.
Third (much later) insight
Calculating the totals for groups A and B can be done using partial_sum, too. So rather than only calculating for group C and tracking the other totals some other way, just store the tallies for each number from 1 to 100 (a 101-entry, 1-based vector is convenient) and then create the partial_sum over the whole thing. partial_sums[50] will give you the number of values less than or equal to 50, partial_sums[75] those less than or equal to 75, and partial_sums[100] should be 10 million, i.e. all the values less than or equal to 100.
Finally we can calculate the combinations of B and C. We want to pair the tallies for 51 with the counts up to 99, 52 with those up to 98, and so on down to 75 with 75. We can do this by iterating through the tallies from 51 to 75 and through the partial_sums from 99 down to 75. There is a standard library function, inner_product, which can handle this.
This seems quite linear to me.
#include <numeric>
#include <random>
#include <vector>
using namespace std;

int main() {
    random_device rd;
    mt19937 gen(rd());
    uniform_int_distribution<> dis(1, 100);

    vector<long long> tallies(101);           // 1-based: tallies[v] counts value v
    for (int i = 0; i < 10000000; ++i) {
        tallies[dis(gen)]++;
    }
    vector<long long> partial_sums(101);
    partial_sum(tallies.begin(), tallies.end(), partial_sums.begin());

    long long A   = partial_sums[50];
    long long AB  = partial_sums[75];
    long long ABC = partial_sums[100];
    long long B = AB - A;
    long long C = ABC - AB;

    long long A_match = A * ABC;
    long long B_match = B * B;
    // pair tallies[51..75] with partial_sums[99..75];
    // rbegin() is partial_sums[100], so rbegin() + 1 starts at partial_sums[99]
    long long C_match = inner_product(tallies.begin() + 51, tallies.begin() + 76,
                                      partial_sums.rbegin() + 1, 0LL);
    return 0;
}

Output wrong Project Euler 50

So I am attempting Problem 50 of Project Euler (so close to level 2 :D). It goes like this:
The prime 41, can be written as the sum of six consecutive primes:
41 = 2 + 3 + 5 + 7 + 11 + 13
This is the longest sum of consecutive primes that adds to a prime below one-hundred.
The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.
Which prime, below one-million, can be written as the sum of the most consecutive primes?
Here is my code:
#include <iostream>
#include <vector>
using namespace std;

int main(){
    vector<int> primes(1000000, true);
    primes[0] = false;
    primes[1] = false;
    for (int n = 4; n < 1000000; n += 2)
        primes[n] = false;
    for (int n = 3; n < 1000000; n += 2){
        if (primes[n] == true){
            for (int b = n * 2; b < 100000; b += n)
                primes[b] = false;
        }
    }
    int basicmax, basiccount = 1, currentcount, biggermax, biggercount = 1, sum = 0, basicstart, basicend, biggerstart, biggerend;
    int limit = 1000000;
    for (int start = 2; start < limit; start++){
        //cout<<start;
        sum = 0;
        currentcount = 0;
        for (int basic = start; start < limit && sum + basic < limit; basic++){
            if (primes[basic] == true){
                //cout<<basic<<endl;
                sum += basic; currentcount++;
            }
            if (primes[sum] && currentcount > basiccount && sum < limit){
                basicmax = sum; basiccount = currentcount; basicstart = start; basicend = basic;
            }
        }
        if (basiccount > biggercount){
            biggercount = basiccount; biggermax = basicmax; biggerend = basicend; biggerstart = basicstart;
        }
    }
    cout << biggercount << endl << biggermax << endl;
    return 0;
}
Basically it just creates a vector of all primes up to 1,000,000 and then loops through them to find the right answer. The answer is 997651 and the count is supposed to be 543, but my program outputs 997661 and 546 respectively. What might be wrong?
It looks like you're building your primes vector wrong:
for (int b = n * 2; b < 100000; b += n)
    primes[b] = false;
I think that should be 1,000,000, not 100,000. It might be better to factor that number out as a constant to make sure it's consistent throughout.
The rest of it looks basically fine, although without testing it ourselves I'm not sure what else we can add. There's plenty of room for efficiency improvements: you do a lot of repeated scanning of ranges, e.g. there's no point starting to sum when primes[start] is false, you could build a second vector of just the primes for the summing, etc. (Does Project Euler have runtime and memory limit restrictions? I can't remember.)
You are thinking about this the wrong way.
Generate the maximal sequence of primes 2, 3, 5, ..., p such that their sum is less than 1,000,000.
Sum this sequence and test it for primality.
If it is prime, terminate and return the sum.
Otherwise, a shorter sequence must be the correct one. There are exactly two ways of shortening the sequence while preserving the consecutive-prime property - removing the first element or removing the last. Recurse from the second step with both of these shortened sequences.
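A sketch of that idea (all names are mine): dropping either the first or the last prime means every sequence visited is a contiguous window of the prime list, so the search can be written as a scan over window lengths, longest first, with prefix sums giving each window's total in O(1).
#include <iostream>
#include <vector>
using namespace std;

int main() {
    const int LIMIT = 1000000;

    // Sieve of Eratosthenes up to LIMIT.
    vector<bool> isPrime(LIMIT, true);
    isPrime[0] = isPrime[1] = false;
    for (int i = 2; 1LL * i * i < LIMIT; ++i)
        if (isPrime[i])
            for (int j = i * i; j < LIMIT; j += i)
                isPrime[j] = false;
    vector<int> primes;
    for (int i = 2; i < LIMIT; ++i)
        if (isPrime[i]) primes.push_back(i);

    // prefix[i] = sum of the first i primes, so the window of len primes
    // starting at lo sums to prefix[lo + len] - prefix[lo].
    vector<long long> prefix(primes.size() + 1, 0);
    for (size_t i = 0; i < primes.size(); ++i)
        prefix[i + 1] = prefix[i] + primes[i];

    // The maximal sequence: the longest prefix whose sum stays below LIMIT.
    size_t maxLen = 0;
    while (maxLen < primes.size() && prefix[maxLen + 1] < LIMIT) ++maxLen;

    // Try windows from longest to shortest; the first prime sum wins.
    for (size_t len = maxLen; len >= 1; --len)
        for (size_t lo = 0; lo + len <= primes.size(); ++lo) {
            long long s = prefix[lo + len] - prefix[lo];
            if (s >= LIMIT) break;                  // sums only grow as lo increases
            if (isPrime[s]) {                       // expected output: 543 and 997651
                cout << len << "\n" << s << "\n";
                return 0;
            }
        }
    return 0;
}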