USACO: Subsets (Inefficient) - c++

I am trying to solve subsets from the USACO training gateway...
Problem Statement
For many sets of consecutive integers from 1 through N (1 <= N <= 39), one can partition the set into two sets whose sums are identical.
For example, if N=3, one can partition the set {1, 2, 3} in one way so that the sums of both subsets are identical:
{3} and {1,2}
This counts as a single partitioning (i.e., reversing the order counts as the same partitioning and thus does not increase the count of partitions).
If N=7, there are four ways to partition the set {1, 2, 3, ... 7} so that each partition has the same sum:
{1,6,7} and {2,3,4,5}
{2,5,7} and {1,3,4,6}
{3,4,7} and {1,2,5,6}
{1,2,4,7} and {3,5,6}
Given N, your program should print the number of ways a set containing the integers from 1 through N can be partitioned into two sets whose sums are identical. Print 0 if there are no such ways.
Your program must calculate the answer, not look it up from a table.
End
Before I was running on a O(N*2^N) by simply permuting through the set and finding the sums.
Finding out how horribly inefficient that was, I moved on to mapping the sum sequences...
http://en.wikipedia.org/wiki/Composition_(number_theory)
After many coding problems to scrape out repetitions, still too slow, so I am back to square one :(.
Now that I look more closely at the problem, it looks like I should try to find a way to not find the sums, but actually go directly to the number of sums via some kind of formula.
If anyone can give me pointers on how to solve this problem, I'm all ears. I program in java, C++ and python.

Actually, there is a better and simpler solution. You should use Dynamic Programming
instead. In your code, you would have an array of integers (whose size is the sum), where each value at index i represents the number of ways to possibly partition the numbers so that one of the partitions has a sum of i. Here is what your code could look like in C++:
int values[N];
int dp[sum+1]; //sum is the sum of the consecutive integers
int solve(){
if(sum%2==1)
return 0;
dp[0]=1;
for(int i=0; i<N; i++){
int val = values[i]; //values contains the consecutive integers
for(int j=sum-val; j>=0; j--){
dp[j+val]+=dp[j];
}
}
return dp[sum/2]/2;
}
This gives you an O(N^3) solution, which is by far fast enough for this problem.
I haven't tested this code, so there might be a syntax error or something, but you get the point. Let me know if you have any more questions.

This is the same thing as finding the coefficient x^0 term in the polynomial (x^1+1/x)(x^2+1/x^2)...(x^n+1/x^n), which should take about an upper bound of O(n^3).

Related

Subsequence having sum at most 'k'

Given a non decreasing array A of size n and an integer k, how to find a subsequence S of the array A with maximum possible sum of its elements, such that this sum is at most k. If there are multiple such subsequences, we are interested in finding only one.
For example, let the array be {1, 2, 2, 4} so, n = 4 and let k = 7. Then, the answer should be {1, 2, 4}.
Brute force approach takes approximately O(n(2^n-1)) but is there a more efficient solution to this problem?
In the general case the answer is no.
Just deciding if there is a solution where elements sum up to k is equivalent to the Subset Sum Problem and thus already NP-complete.
The Subset Sum Problem can be equivalently formulated as: given the integers or
natural numbers w_1,... ,w_n does any subset of them sum to precisely W
However, if either n or the number of bits P that it takes to represent the largest number w is small there might be more efficient solution (e.g., a pseudo-polynomial solution based on dynamic programming if P is small). Additionally, if all your numbers w are positive then it might also be possible to find a better solution.

Faster way of searching array of sets

I have an array containing 100,000 sets. Each set contains natural numbers below 1,000,000. I have to find the number of ordered pairs {m, n}, where 0 < m < 1,000,000, 0 < n < 1,000,000 and m != n, which do not exist together in any of 100,000 sets. A naive method of searching through all the sets leads to 10^5 * (10^6 choose 2) number of searches.
For example I have 2 sets set1 = {1,2,4} set2 = {1,3}. All possible ordered pairs of numbers below 5 are {1,2}, {1,3}, {1,4}, {2,3}, {2,4} and {3,4}. The ordered pairs of numbers below 5 which do not exist together in set 1 are {1,3},{2,3} and {3,4}. The ordered pairs below 5 missing in set 2 are {1,2},{1,4},{2,3},{2,4} and {3,4}. The ordered pairs which do not exist together in both the sets are {2,3} and {3,4}. So the count of number of ordered pairs missing is 2.
Can anybody point me to a clever way of organizing my data structure so that finding the number of missing pairs is faster? I apologize in advance if this question has been asked before.
Update:
Here is some information about the structure of my data set.
The number of elements in each set varies from 2 to 500,000. The median number of elements is around 10,000. The distribution peaks around 10,000 and tapers down in both direction. The union of the elements in the 100,000 sets is close to 1,000,000.
If you are looking for combinations across sets, there is a way to meaningfully condense your dataset, as shown in frenzykryger's answer. However, from your examples, what you're looking for is the number of combinations available within each set, meaning each set contains irreducible information. Additionally, you can't use combinatorics to simply obtain the number of combinations from each set either; you ultimately want to deduplicate combinations across all sets, so the actual combinations matter.
Knowing all this, it is difficult to think of any major breakthroughs you could make. Lets say you have i sets and a maximum of k items in each set. The naive approach would be:
If your sets are typically dense (i.e. contain most of the numbers between 1 and 1,000,000), replace them with the complement of the set instead
Create a set of 2 tuples (use a set structure that ensures insertion is idempotent)
For each set O(i):
Evaluate all combinations and insert into set of combinations: O(k choose 2)
The worst case complexity for this isn't great, but assuming you have scenarios where a set either contains most of the numbers between 0 and 1,000,000, or almost none of them, you should see a big improvement in performance.
Another approach would be to go ahead and use combinatorics to count the number of combinations from each set, then use some efficient approach to find the number of duplicate combinations among sets. I'm not aware of such an approach, but it is possible it exists.
First lets solve more simple task of counting number of elements not present in your sets. This task can be reworded in more simple form - instead of 100,000 sets you can think about 1 set which contains all your numbers. Then number of elements not present in this set is x = 1000000 - len(set). Now you can use this number x to count number of combinations. With repetitions: x * x, without repetitions: x * (x - 1). So bottom line of my answer is to put all your numbers in one big set and use it's length to find number of combinations using combinatorics.
Update
So above we have a way to find number of combinations where each element in combination is not in any of the sets. But question was to find number of combinations where each combination is not present in any of the sets.
Lets try to solve simpler problem first:
your sets have all numbers in them, none missing
each number is present exactly in one set, no duplicates across sets
How you would construct such combinations over such sets? You would simply pick two elements from different sets and resulting combination would not be in any of the sets. Number of such combinations could be counted using following code (it accepts sizes of the sets):
int count_combinations(vector<int>& buckets) {
int result = 0;
for (int i=0; i < buckets.size(); ++i) {
for (int j=i+1; j < buckets.size(); ++j) {
result += buckets[i] * buckets[j];
}
}
return result;
}
Now let's imagine that some numbers are missing. Then we can just add additional set with those missing numbers to our sets (as a separate set). But we also need to account that given there were n missing numbers there would be n * (n-1) combinations constructed using only these missing numbers. So following code will produce total number of combinations with account to missing numbers:
int missing_numbers = upper_bound - all_numbers.size() - 1;
int missing_combinations = missing_numbers * (missing_numbers - 1);
return missing_combinations + count_combinations(sets, missing_numbers);
Now lets imagine we have a duplicate across two sets: {a, b, c}, {a, d}.
What types of errors they will introduce? Following pairs: {a, a} - repetition, {a, d} - combination which is present in second set.
So how to treat such duplicates? We need to eliminate them completely from all sets. Even single instance of a duplicate will produce combination present in some set. Because we can just pick any element from the set where duplicate was removed and produce such combination (in my example - if we will keep a in first set, then pick d from the second to produce {a, d}, if we will keep a in second set, then pick b or c from the first to produce {a, b} and {a, c}). So duplicates shall be removed.
Update
However we can't simply remove all duplicates, consider this counterexample:
{a, b} {a, c} {d}. If we simply remove a we will acquire {b} {c} {d} and lost information about not-existing combination {a, d}. Consider another counterexample:
{a, b} {a, b, c} {b, d}. If we simply remove duplicates we will acquire {c} {d} and lost information about {a, d}.
Also we can't simply apply such logic to pairs of sets, a simple counter example for numbers < 3: {1, 2} {1} {2}. Here number of missing combinations is 0, but we will incorrectly count in {1, 2} if we will apply duplicates removal to pair of sets. Bottom line is that I can't come up with good technique which will help to correctly handle duplicate elements across sets.
What you can do, depending on memory requirements, is take advantage of the ordering of Set, and iterate over the values smartly. Something like the code below (untested). You'll iterate over all of your sets, and then for each of your sets you'll iterate over their values. For each of these values, you'll check all of the values in the set after them. Our complexity is reduced to the number of sets times the square of their sizes. You can use a variety of methods to keep track of your found/unfound count, but using a set should be fine, since insertion is simply O(log(n)) where n is no more than 499999500000. In theory using a map of sets (mapping based on the first value) could be slightly faster, but in either case the cost is minimal.
long long numMissing(const std::array<std::set<int>, 100000>& sets){
std::set<pair<int, int> > found;
for (const auto& s : sets){
for (const auto& m : s){
const auto &n = m;
for (n++; n != s.cend(); n++){
found.emplace(m, n);
}
}
}
return 499999500000 - found.size();
}
As an option you can build Bloom Filter(s) over your sets.
Before checking against all sets you can quickly lookup at your bloom filter and since it will never produce false negatives you can safely use your pair as its not present in your sets.
Physically storing each possible pair would take too much memory. We have 100k sets and an average set has 10k numbers = 50M pairs = 400MB with int32 (and set<pair<int, int>> needs much more than 8 bytes per element).
My suggestion is based on two ideas:
don't store, only count the missing pairs
use interval set for compact storage and fast set operations (like boost interval set)
The algorithm is still quadratic on the number of elements in the sets but needs much less space.
Algorithm:
Create the union_set of the individual sets.
We also need a data structure, let's call it sets_for_number to answer this question: which sets contain a particular number? For the simplest case this could be unordered_map<int, vector<int>> (vector stores set indices 0..99999)
Also create the inverse sets for each set. Using interval sets this takes only 10k * 2 * sizeof(int) space per set on average.
dynamic_bitset<> union_set = ...; //union of individual sets (can be vector<bool>)
vector<interval_set<int>> inverse_sets = ...; // numbers 1..999999 not contained in each set
int64_t missing_count = 0;
for(int n = 1; n < 1000000; ++n)
// count the missing pairs whose first element is n
if (union_set.count(n) == 0) {
// all pairs are missing
missing_count += (999999 - n);
} else {
// check which second elements are not present
interval_set<int> missing_second_elements = interval_set<int>(n+1, 1000000);
// iterate over all sets containing n
for(int set_idx: sets_for_number.find(n)) {
// operator&= is in-place intersection
missing_second_elements &= inverse_sets[set_idx];
}
// counting the number of pairs (n, m) where m is a number
// that is not present in any of the sets containing n
for(auto interval: missing_second_elements)
missing_count += interval.size()
}
}
If it is possible, have a set of all numbers and remove each of the number when you insert to your array of set. This will have a O(n) space complexity.
Of course if you don't want to have high spec complexity, maybe you can have a range vector. For each element in the vector, you have a pair of numbers which are the start/end of a range.

How to calc percentage of coverage in an array of 1-100 using C++?

This is for an assignment so I would appreciate no direct answers; rather, any logic help with my algorithms (or pointing out any logic flaws) would be incredibly helpful and appreciated!
I have a program that receives "n" number of elements from the user to put into a single-dimensional array.
The array uses random generated numbers.
IE: If the user inputs 88, a list of 88 random numbers (each between 1 to 100) is generated).
"n" has a max of 100.
I must write 2 functions.
Function #1:
Determine the percentage of numbers that appear in the array of "n" elements.
So any duplicates would decrease the percentage.
And any missing numbers would decrease the percentage.
Thus if n = 75, then you have a maximum possible %age of 0.75
(this max %age decreases if there are duplicates)
This function basically calls upon function #2.
FUNCTION HEADER(GIVEN) = "double coverage (int array[], int n)"
Function #2:
Using a linear search, search for the key (key being the current # in the list of 1 to 100, which should be from the loop in function #1), in the array.
Return the position if that key is found in the array
(IE: if this is the loops 40th run, it will be at the variable "39",
and will go through every instance of an element in the array
and if any element is equal to 39, all of those positions will be returned?
I believe that is what our prof is asking)
Return -1 if the key is not found.
Given notes = "Only function #1 calls function #2,
and does so to find out if a certain value (key) is found among the first n elements of the array."
FUNCTION HEADER(GIVEN) = "int search (int array[], int n, int key)"
What I really need help with is the logic for the algorithm.
I would appreciate any help with this as I would approach this problem completely differently than our professor wants us.
My first thoughts would be to loop through function #1 for all variable keys of 1 through 100.
And in that loop, go to the search function (function #2), in which a loop would go through every number in the array and add to a counter if a number was (1)a duplicate or (2) non-existent in the array. Then I would subtract that counter from 100. Thus if all numbers were included in the array except for the #40 and #41, and then #77 was a duplicate , the total percentage of coverage would be 100 - 3 = 97%.
Although as I type this I think that may in of itself be flawed? ^ Because with a max of 100 elements in the array, if the only number missing was 99, then you would subtract 1 for having that number missing, and then if there was a duplicate you would subtract another 1, and thus your percentage of coverage would be (100-2) = 98, when clearly it ought to be 99.
And this ^ is exactly why I would REALLY appreciate any logic help. :)
I know I am having some problems approaching this logically.
I think I can figure out the coding with a relative amount of ease; what I am struggling witht he most is the steps to take. So any pseudocode ideas would be amazing!
(I can post my entire program code so far if necessary for anyone, just ask, but it is rather long as of now as I have many other functions performing other tasks in the program)
I may be mistaken, but as I read it all you need to do is:
write a function that loops through the array of n elements to find a given number in it. It would return the index of first occurence, or a negative value in case the number cannot be found in the array.
write a loop to call the function for all numbers 1 to 100 and count the finds. Then divide the result by 100.
I'm not sure if I understand this whole thing right, but 1 function you can do it, if you don't care about speed, it's better to put array into a vector, loop through 1..100 and use boost find function http://www.boost.org/doc/libs/1_41_0/doc/html/boost/algorithm/find_nth.html. There you can compare current value with the second entry value in the vector, if it contains you decrease, not not decrease, if you want to find if the unique number is in array, use http://www.cplusplus.com/reference/algorithm/find/. I don't understand, how the percentage decreases, so it's on your own and I don't rly understand second function, but if its linear search use again find.
P.S. Vector description http://www.cplusplus.com/reference/vector/vector/begin/.
You want to know how many numbers in the range [1, 100] appear in your given array. You can search for each number in turn:
size_t count_unique(int array[], size_t n)
{
size_t result = 0;
for (int i = 1; i <= 100; ++i)
{
if (contains(array, n, i))
{
++result;
}
}
return result;
}
All you still need is an implementation of the containment check contains(array, n, i), and to transform the unique count into a percentage (by using division).

2 player team knowing maximum moves

Given a list of N players who are to play a 2 player game. Each of them are either well versed in making a particular move or they are not. Find out the maximum number of moves a 2-player team can know.
And also find out how many teams can know that maximum number of moves?
Example Let we have 4 players and 5 moves with ith player is versed in jth move if a[i][j] is 1 otherwise it is 0.
10101
11100
11010
00101
Here maximum number of moves a 2-player team can know is 5 and their are two teams that can know that maximum number of moves.
Explanation : (1, 3) and (3, 4) know all the 5 moves. So the maximal moves a 2-player team knows is 5, and only 2 teams can acheive this.
My approach : For each pair of players i check if any of the players is versed in ith move or not and for each player maintain the maximum pairs he can make with other players with his local maximum move combination.
vector<int> pairmemo;
for(int i=0;i<n;i++){
int mymax=INT_MIN;
int countpairs=0;
for(int j=i+1;j<n;j++){
int count=0;
for(int k=0;k<m;k++){
if(arr[i][k]==1 || arr[j][k]==1)
{
count++;
}
}
if(mymax<count){
mymax=count;
countpairs=0;
}
if(mymax==count){
countpairs++;
}
}
pairmemo.push_back(countpairs);
maxmemo.push_back(mymax);
}
Overall maximum of all N players is answer and count is corresponding sum of the pairs being calculated.
for(int i=0;i<n;i++){
if(maxi<maxmemo[i])
maxi=maxmemo[i];
}
int countmaxi=0;
for(int i=0;i<n;i++){
if(maxmemo[i]==maxi){
countmaxi+=pairmemo[i];
}
}
cout<<maxi<<"\n";
cout<<countmaxi<<"\n";
Time complexity : O((N^2)*M)
Code :
How can i improve it?
Constraints : N<= 3000 and M<=1000
If you represent each set of moves by a very large integer, the problem boils down to finding pair of players (I, J) which have maximum number of bits set in MovesI OR MovesJ.
So, you can use bit-packing and compress all the information on moves in Long integer array. It would take 16 unsigned long integers to store according to the constraints. So, for each pair of players you OR the corresponding arrays and count number of ones. This would take O(N^2 * 16) which would run pretty fast given the constraints.
Example:
Lets say given matrix is
11010
00011
and you used 4-bit integer for packing it.
It would look like:
1101-0000
0001-1000
that is,
13,0
1,8
After OR the moves array for 2 player team becomes 13,8, now count the bits which are one. You have to optimize the counting of bits also, for that read the accepted answer here, otherwise the factor M would appear in complexity. Just maintain one count variable and one maxNumberOfBitsSet variable as you process the pairs.
What Ill do is:
1. Do logical OR between all the possible pairs - O(N^2) and store it's SUM in a 2D array with the symmetric diagonal ignored. (thats we save half of the calc - see example)
2. find the max value in the 2D Array (can be done while doing task 1) -> O(1)
3. count how many cells in the 2D array equals to the maximum value in task 2 O(N^2)
sum: 2*O(N^2)+ O(1) => O(N^2)
Example (using the data in the question (with letters indexes):
A[10101] B[11100] C[11010] D[00101]
Task 1:
[A|B] = 11101 = SUM(4)
[A|C] = 11111 = SUM(5)
[A|D] = 10101 = SUM(3)
[B|C] = 11110 = SUM(4)
[B|D] = 11101 = SUM(4)
[C|D] = 11111 = SUM(5)
Task 2 (Done while is done 1):
Max = 5
Task 3:
Count = 2
By the way, O(N^2) is the minimum possible since you HAVE to check all the possible pairs.
Since you have to find all solutions, unless you find a way to find a count without actually finding the solutions themselves, you have to actually look at or eliminate all possible solutions. So the worst case will always be O(N^2*M), which I'll call O(n^3) as long as N and M are both big and similar size.
However, you can hope for much better performance on the average case by pruning.
Don't check every case. Find ways to eliminate combinations without checking them.
I would sum and store the total number of moves known to each player, and sort the array rows by that value. That should provide an easy check for exiting the loop early. Sorting at O(n log n) should be basically free in an O(n^3) algorithm.
Use Priyank's basic idea, except with bitsets, since you obviously can't use a fixed integer type with 3000 bits.
You may benefit from making a second array of bitsets for the columns, and use that as a mask for pruning players.

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Lets say we have int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is to find minimum abs value of array's elements' subtraction:
We need to check like that
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find minimum abs value of these subtractions. We can find it with 2 fors. The question is, is there any algorithm for finding value with one and only for?
sort the list and subtract nearest two elements
The provably best performing solution is assymptotically linear O(n) up until constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(const vector<int>& v)
{
IntRadixSort(v.begin(), v.end());
int best = MAX_INT;
for (int i = 0; i < v.size()-1; i++)
{
int diff = abs(v[i]-v[i+1]);
if (diff < best)
best = diff;
}
return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed-bitwidth nature of ints by paritioning them in a series of fixed passes on the bit positions. ie partition them on the hi bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which only takes linear time.
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If this issue is the n(n+1)/2 subtractions required by the simple two loop case, you can use an O(n log n) algorithm.
No, unless you know the list is sorted, you need two
Its simple Iterate in a for loop
keep 2 variable "minpos and maxpos " and " minneg" and "maxneg"
check for the sign of the value you encounter and store maximum positive in maxpos
and minimum +ve number in "minpos" do the same by checking in if case for number
less than zero. Now take the difference of maxpos-minpos in one variable and
maxneg and minneg in one variable and print the larger of the two . You will get
desired.
I believe you definitely know how to find max and min in one for loop
correction :- The above one is to find max difference in case of minimum you need to
take max and second max instead of max and min :)
This might be help you:
end=4;
subtractmin;
m=0;
for(i=1;i<end;i++){
if(abs(a[m]-a[i+m])<subtractmin)
subtractmin=abs(a[m]-a[i+m];}
if(m<4){
m=m+1
end=end-1;
i=m+2;
}}