Maximum subset which has no sum of two divisible by K - c++

I am given the set {1, 2, 3, ... ,N}. I have to find the maximum size of a subset of the given set so that the sum of any 2 numbers from the subset is not divisible by a given number K. N and K can be up to 2*10^9, so I need a very fast algorithm. I have only come up with an algorithm of complexity O(K), which is too slow.

First reduce every element of the set mod k. That leaves a simpler problem:
find the maximum size of a subset such that no two chosen remainders add up to k (or to 0, in the case of remainder 0).
Split the elements into residue classes. Classes i and k-i conflict: you cannot choose elements from class i and from class k-i simultaneously.
int myset[]
int modclass[k]        // modclass[r] = number of elements with remainder r, all 0 at the start
for(int i=0; i< size of myset ;i++)
{
    modclass[(myset[i] mod k)] ++;
}
Then choose one class from each conflicting pair:
for(int i=1; 2*i < k ;i++)
{
    if (modclass[i] > modclass[k-i])
    {
        choose all of the set elements whose remainder mod k equals i
    }
    else
    {
        choose all of the set elements whose remainder mod k equals k-i
    }
}
Finally you can add one element whose remainder mod k equals 0 (if there is one) and, when k is even, one element whose remainder equals k/2.
The complexity of this solution is O(K).
You can improve this idea with a dynamic array that only holds the remainders that actually occur. Each entry stores a key (the smaller of x and k-x) and two counters:
for(int i=0; i< size of myset ;i++)
{
    x = myset[i] mod k;
    key = min(x, k-x);          // x and k-x share one entry
    set = false;
    for(int j=0; j< size of newset ;j++)
    {
        if(newset[j].key == key)
        {
            if (2*x < k) newset[j].low++;
            else         newset[j].high++;
            set = true;
        }
    }
    if(set == false)
    {
        if (2*x < k) newset.add(key, 1, 0);
        else         newset.add(key, 0, 1);
    }
}
Now you can make the choice in O(myset.count). The whole algorithm cannot be faster than O(myset.count) anyway, because you need O(myset.count) just to read your set.
The complexity of this solution is O(myset.count^2), so choose the algorithm depending on your input by comparing O(myset.count^2) with O(k).
For a better solution you can sort myset by remainder mod k.

I'm assuming that the set of numbers is always 1 through N for some N.
Consider the first N-(N mod K) numbers. They form floor(N/K) sequences of K consecutive numbers, with reductions mod K from 0 through K-1. For each group, floor(K/2) have to be dropped for having a reduction mod K that is the negation mod K of another subset of floor(K/2). You can keep ceiling(K/2) from each set of K consecutive numbers.
Now consider the remaining N mod K numbers. They have reductions mod K starting at 1. I have not worked out the exact limits, but if N mod K is less than about K/2 you will be able to keep all of them. If not, you will be able to keep about the first ceiling(K/2) of them.
==========================================================================
I believe the concept here is correct, but I have not yet worked out all the details.
==========================================================================
Here is my analysis of the problem and answer. In what follows |x| is floor(x). This solution is similar to the one in @Constantine's answer, but differs in a few cases.
Consider the first K*|N/K| elements. They consist of |N/K| repeats of the reductions modulo K.
In general, we can include |N/K| elements that are k modulo K subject to the following limits:
If (k+k)%K is zero, we can include only one element that is k modulo K. That is the case for k=0 and k=(K/2)%K, which can only happen for even K.
That means we get |N/K| * |(K-1)/2| elements from the repeats.
We need to correct for the omitted elements. If N >= K we need to add 1 for the 0 mod K elements. If K is even and N >= K/2 we also need to add 1 for the (K/2)%K elements.
Finally, if N%K != 0 we need to add a partial or complete copy of the repeat elements, min(N%K,|(K-1)/2|).
The final formula is:
|N/K| * |(K-1)/2| +
(N>=K ? 1 : 0) +
((N>=K/2 && (K%2)==0) ? 1 : 0) +
min(N%K,|(K-1)/2|)
This differs from @Constantine's version in some cases involving even K. For example, consider N=4, K=6. The correct answer is 3, the size of the set {1, 2, 3}. @Constantine's formula gives |(6-1)/2| = |5/2| = 2. The formula above gets 0 for each of the first two lines, 1 from the third line, and 2 from the final line, giving the correct answer.
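As a minimal sketch of the formula above, here it is in Python, next to an O(K) residue-counting version used only as a cross-check. The function names max_subset_size and max_subset_size_by_counting are made up for this illustration and are not from the answer.

```python
def max_subset_size(n, k):
    """Closed-form size of the largest subset of {1..n} with no pair summing to a multiple of k."""
    half = (k - 1) // 2                      # |(K-1)/2|: residues 1..half are always safe to keep
    size = (n // k) * half                   # complete blocks of k consecutive numbers
    size += 1 if n >= k else 0               # one multiple of k, if any exists
    size += 1 if (k % 2 == 0 and n >= k // 2) else 0   # one element with residue k/2
    size += min(n % k, half)                 # the partial block at the end
    return size


def max_subset_size_by_counting(n, k):
    """O(k) cross-check: count each residue class, then keep the larger class of each pair."""
    count = [n // k + (1 if 1 <= r <= n % k else 0) for r in range(k)]
    size = min(count[0], 1)
    if k % 2 == 0:
        size += min(count[k // 2], 1)
    for r in range(1, (k + 1) // 2):         # residues strictly below k/2
        size += max(count[r], count[k - r])
    return size


print(max_subset_size(4, 6))   # the N=4, K=6 example from the text: 3
```

The closed form runs in constant time, so the 2*10^9 limits from the question are not a problem. The counting version is there only to compare against on small inputs.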

The formula is
|N/K| * |(K-1)/2| + ost
where ost is:
if n<k:
    ost = 0
else if n%k == 0:
    ost = 1
else if n%k < |(K-1)/2|:
    ost = n%k
else:
    ost = |(K-1)/2|
Here |a/b| denotes integer division, for example |9/2| = 4 and |7/2| = 3.
Example: n = 30, k = 7.
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
1 2 3 |4| 5 6 7. - is first line .
8 9 10 |11| 12 13 14 - second line
If we take the first 3 numbers in each line we get the size of this subset. We may also add one number from (7, 14, 21, 28).
The first 3 numbers (1 2 3) are |(k-1)/2| numbers per line.
The number of such lines is |n/k|.
If there is no residue we may add one number (for example the last number).
If the residue is < |(k-1)/2| we take all the numbers in the last line,
else we take |(K-1)/2| of them.
Thanks for pointing out the exception case:
ost = 0 if k>n

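n, k = map(int, input().split())
d = [int(x) for x in input().split()]
# l[r] counts the elements of d whose remainder mod k is r
l = [0] * k
for x in d:
    l[x % k] += 1
# at most one element with remainder 0 can be kept
total = 1 if l[0] != 0 else 0
# for even k, at most one element with remainder k/2 can be kept
if k % 2 == 0 and l[k // 2] != 0:
    total += 1
# from each pair of complementary remainders (i, k-i) keep the larger bucket
i, j = 1, k - 1
while i < j:
    total += max(l[i], l[j])
    i += 1
    j -= 1
print(total)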

This is an explanation of ABRAR TYAGI's and amin k's solution.
The approach is:
Create an array L with K buckets and group all the elements of the input array D into them. Bucket L[i] contains the elements of D for which (element % K) = i.
All the elements that are individually divisible by K are in L[0]. The sum of any two of them is divisible by K, so at most one of them (if any) can belong to the final (maximal) subset.
If we add an element from L[i] to an element from L[K-i], the sum is divisible by K. Hence we can take elements from only one of these two buckets into the final set. We pick the larger bucket.
Code:
d is the array containing the initial set of n numbers. The goal of the code is to find the size of the largest subset of d such that the sum of no two of its integers is divisible by k.
l is an array of k integers. The idea is to reduce each element of d to (element % k) and store the frequency of each remainder in l.
For example, l[1] contains the number of elements with element % k = 1.
We know that (1 + (k-1)) % k = 0, so either the l[1] group or the l[k-1] group has to be discarded to meet the criterion that no two numbers sum to 0 mod k.
As we want the largest subset of d, we keep the larger of l[1] and l[k-1].
We loop through l with for (i=1; i<=k/2 && i < k-i; i++) and repeat the step above for each pair l[i], l[k-i].
There are two outliers. The sum of any two numbers in the l[0] group is 0 mod k, so add 1 if l[0] is non-zero.
If k is even, the loop does not handle i=k/2; by the same logic as above, increment the count by one if l[k/2] is non-zero.

Related

Intuition behind storing the remainders?

I am trying to solve a question on LeetCode.com:
Given a list of non-negative numbers and a target integer k, write a function to check if the array has a continuous subarray of size at least 2 that sums up to the multiple of k, that is, sums up to n*k where n is also an integer. For e.g., if [23, 2, 4, 6, 7], k=6, then the output should be True, since [2, 4] is a continuous subarray of size 2 and sums up to 6.
I am trying to understand the following solution:
class Solution {
public:
    bool checkSubarraySum(vector<int>& nums, int k) {
        int n = nums.size(), sum = 0, pre = 0;
        unordered_set<int> modk;
        for (int i = 0; i < n; ++i) {
            sum += nums[i];
            int mod = k == 0 ? sum : sum % k;
            if (modk.count(mod)) return true;
            modk.insert(pre);
            pre = mod;
        }
        return false;
    }
};
I understand that we are trying to store 0, a%k, (a+b)%k, (a+b+c)%k, etc. in the hash set (where k != 0), and that we insert each value one iteration late because the subarray size must be at least 2.
But how does this guarantee that we get a subarray whose elements sum up to a multiple of k? What mathematical property guarantees this?
The set modk is gradually populated with all sums (considered modulo k) of contiguous sub-arrays starting at the beginning of the array.
The key observation is that:
a-b = n*k for some integer n iff
a-b ≡ 0 mod k iff
a ≡ b mod k
so if a contiguous sub-array nums[i_0]..nums[i_1] sums up to 0 modulo k, then the two prefixes nums[0]..nums[i_0 - 1] and nums[0]..nums[i_1] have the same sum modulo k, and conversely.
Thus it's enough to find two distinct sub-arrays starting at the beginning of the array that have the same sum modulo k.
Luckily, there are only k such values, so you only need to use a set of size k.
Some nitpicks:
if n > k, you're going to have an appropriate sub-array anyway (the pigeon-hole principle), so the loop will actually never iterate more than k+1 times.
There should not be any sort of class involved here, that makes no sense.
contiguous, not continuous. Arrays and sub-arrays are discrete and can't be continuous...
The remainder mod k of a sum equals the remainder mod k of the sum of the remainders:
(a+b)%k = (a%k + b%k) % k
(23 + 2) % 6 = 1
( (23%6) + (2%6) ) % 6 = (5 + 2) % 6 = 1
modk stores all the remainders calculated so far. If at iteration i you get a remainder that was already calculated at iteration i-m, you have added a run of m elements whose sum is a multiple of k:
i=0 nums[0] = 23 sum = 23 sum%6 = 5 modk = [5]
i=1 nums[1] = 2 sum = 25 sum%6 = 1 modk = [5, 1]
i=2 nums[2] = 4 sum = 29 sum%6 = 5 5 already exists in modk, and (2+4)%6 = 0
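The same idea can be sketched in Python. This is an illustrative port of the C++ from the question, not a reference solution; the name check_subarray_sum and the variable names are chosen here. Each prefix remainder is inserted one step late so that a match always spans at least two elements.

```python
def check_subarray_sum(nums, k):
    """Return True if some contiguous run of at least two elements sums to a multiple of k."""
    seen = set()   # remainders of prefixes that end at least two positions back
    prev = 0       # remainder of the prefix ending one position back (empty prefix at the start)
    total = 0
    for x in nums:
        total += x
        rem = total % k if k != 0 else total
        if rem in seen:
            # an earlier prefix has the same remainder, so the elements
            # between the two prefixes sum to a multiple of k
            return True
        seen.add(prev)   # delayed insert enforces the "size at least 2" rule
        prev = rem
    return False


print(check_subarray_sum([23, 2, 4, 6, 7], 6))   # True, because 2 + 4 = 6
```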

Sum of difference of a number to an array of numbers

This is my problem.
Given an array of integers and another integer k, find the sum of differences of each element of the array and k.
For example if the array is 2, 4, 6, 8, 10 and k is 3
Sum of difference
= abs(2 - 3) + abs(4-3) + abs(6 - 3) + abs(8 - 3) + abs(10 - 3)
= 1 + 1 + 3 + 5 + 7
= 17
The array remains the same throughout and can contain up to 100000 elements and there will be 100000 different values of k to be tested. k may or may not be an element of the array. This has to be done within 1s or about 100M operations. How do I achieve this?
You can run multiple queries for sums of absolute differences in O(log N) if you add a preprocessing step which costs O(N * log N).
Sort the array, then for each item in the array store the sum of all numbers that are smaller than or equal to the corresponding item. This can be done in O(N * log N). Now you have a pair of arrays that look like this:
2 4 6 8 10 // <<== Original data
2 6 12 20 30 // <<== Partial sums
In addition, store the total T of all numbers in the array.
Now you can get sums of absolute differences by running a binary search on the original array, and using the sums from the partial sums array to compute the answer: subtract the sum of all numbers to the left of the target k from the count of numbers to the left of the target times k, then subtract the count times k from the sum to the right of the number, and add the two numbers together. The partial sum of the numbers to the right of the number can be computed by subtracting the partial sum on the left from the total T.
For k=3 binary search gets you to position 1.
Partial sum on the left is 2
Count of items on the left is 1
Partial sum on the right is (30-2)=28
Count of items on the right is 4
You compute (1*3-2) + (28-4*3) = 1 + 16 = 17
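A minimal Python sketch of the preprocessing and query steps described above, using the standard bisect and itertools modules. The function names build and sum_abs_diff are invented for this example, and the prefix array is shifted by one (prefix[i] is the sum of the i smallest values) to avoid special cases.

```python
from bisect import bisect_right
from itertools import accumulate


def build(values):
    """Sort once and precompute prefix sums: prefix[i] is the sum of the i smallest values."""
    a = sorted(values)
    prefix = [0] + list(accumulate(a))
    return a, prefix


def sum_abs_diff(a, prefix, k):
    """Sum of |x - k| over all x in a, in O(log N) per query."""
    idx = bisect_right(a, k)                              # number of values <= k
    left = k * idx - prefix[idx]                          # values at or below k
    right = (prefix[-1] - prefix[idx]) - k * (len(a) - idx)   # values above k
    return left + right


a, prefix = build([10, 2, 8, 4, 6])
print(sum_abs_diff(a, prefix, 3))   # 17, matching the worked example
```

With 100000 elements and 100000 queries this is one sort plus 100000 binary searches, which fits comfortably in the stated operation budget.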
First sort the array and then compute an array that stores the sums of the prefixes of the resulting sorted array. Let's denote this array p; you can compute p in linear time so that p[i] = a[0] + a[1] + ... + a[i]. With this array you can answer in constant time the question of what the sum a[x] + a[x+1] + ... + a[y] is (i.e. the elements with indices x to y). To do that you simply compute p[y] - p[x-1] (take special care when x is 0, where p[x-1] has to be treated as 0).
Now to answer a query of the type what is the sum of absolute differences with k, we will split the problem in two parts - what is the sum of the numbers greater than k and the numbers smaller than k. In order to compute these, perform a binary search to find the position of k in the sorted a(denote that idx), and compute the sum of the values in a before idx(denote that s) and after idx(denote that S). Now the sum of absolute differences with k is idx * k - s + S - (a.length - idx)* k. This of course is pseudo code and what I mean by a.length is the number of elements in a.
After performing a linearithmic precomputation, you will be able to answer a query with O(log(n)). Please note this approach only makes sense if you plan to perform multiple queries. If you are only going to perform a single query, you can not possibly go faster than O(n).
Just implementing dasblinkenlight's solution in "contest C++":
It does exactly what he says: it reads the values, sorts them, and stores the accumulated sum in V[i].second, except that here V[i].second is the accumulated sum up to index i-1 (to simplify the algorithm). It also stores a sentinel in V[n] for the case where the query is greater than max(V).
Then, for each query, it binary searches for the value. V[a].second is then the sum of the values less than the query, and V[n].second-V[a].second is the sum of the values greater than it.
#include<iostream>
#include<algorithm>
#define pii pair<int, int>
using namespace std;
pii V[100001]; // first = value, second = sum of all values before this index
int main() {
    int n;
    while(cin >> n) {
        for(int i=0; i<n; i++)
            cin >> V[i].first;
        sort(V, V+n);
        V[0].second = 0;
        for(int i=1; i<=n; i++) // i == n fills the sentinel with the total sum
            V[i].second = V[i-1].first + V[i-1].second;
        int k; cin >> k; // number of queries
        for(int i=0; i<k; i++) {
            int query; cin >> query;
            pii* res = upper_bound(V, V+n, pii(query, 0));
            int a = res-V, b=n-(res-V); // a values on the left, b on the right
            int left = query*a-V[a].second;
            int right = V[n].second-V[a].second-query*b;
            cout << left+right << endl;
        }
    }
}
It assumes a file with a format like this:
5
10 2 8 4 6
2
3 5
Then, for each query, it answers like this:
17
13

Finding number of subsets of an array that add up to a multiple of a specific number

I have an array A of length N of negative as well as positive integers. I need to count the number of subsets in this array which add up to a multiple of a number M (or 0 (mod M))
For example:
Let A = {1,2,8,4,5}, M = 9,
Then, there are 4 such subsets:
{}: Empty set, corresponding to the multiple 0,
{1,8}: corresponding to the multiple 9,
{4,5}: corresponding to the multiple 9
{1,8,4,5}: corresponding to the multiple 18.
I thought of generating all possible multiples and then applying dynamic programming subset sum, but the constraints won't allow me that.
Constraints:
1 =< N <= 10^5,
1 =< M <= 100,
-10^9 =< each entry of array <=10^9
What should be my approach for this sort of problem?
You can solve this problem by dynamic programming; it is expensive for large M but fast for small M. For each j satisfying 0 <= j <= M-1 and each integer k satisfying 0 <= k <= N, let f(k,j) be the number of subsets of the first k array elements whose sum is j mod M, with f(0,0) = 1 (the empty subset) and f(0,j) = 0 otherwise. To extend f(k,j) to f(k+1,j') for all j', take the (k+1)th element X of your sequence and set f(k+1,j') = f(k,j') + f(k,(j' - X) mod M). Iterating over all j' from 0 to M-1 for each k, and over k from 0 to N-1, gives the answer at f(N,0). The total complexity is O(MN), which for small M is basically linear in N and therefore optimal.
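The recurrence above can be sketched in Python as follows. The function name count_subsets_multiple is made up for this example, and only one row of f is kept at a time because f(k+1, ·) depends only on f(k, ·). Python's % always returns a non-negative remainder, so negative entries need no special handling.

```python
def count_subsets_multiple(a, m):
    """Count subsets of a (including the empty one) whose sum is 0 mod m."""
    f = [0] * m
    f[0] = 1                      # f(0, 0) = 1: the empty subset
    for x in a:
        r = x % m                 # non-negative even when x is negative
        # either skip x (f[j]) or take it (the previous sum was (j - r) mod m)
        f = [f[j] + f[(j - r) % m] for j in range(m)]
    return f[0]


print(count_subsets_multiple([1, 2, 8, 4, 5], 9))   # 4, as in the question's example
```

The counts grow as fast as 2^N, so a real submission would presumably need them modulo some number or with big integers; Python integers handle the latter automatically.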

Optimizing algorithm to find number of six digit numbers satisfying certain property

Problem: "An algorithm to find the number of six digit numbers where the sum of the first three digits is equal to the sum of the last three digits."
I came across this problem in an interview and want to know the best solution. This is what I have till now.
Approach 1: The Brute force solution is, of course, to check for each number (between 100,000 and 999,999) whether the sum of its first three and last three digits are equal. If yes, then increment certain counter which keeps count of all such numbers.
But this checks for all 900,000 numbers and so is inefficient.
Approach 2: Since we are asked "how many" such numbers and not "which numbers", we could do better. Divide the number into two parts: First three digits (these go from 100 to 999) and Last three digits (these go from 000 to 999). Thus, the sum of three digits in either part of a candidate number can range from 1 to 27.
* Maintain a std::map<int, int> for each part where key is the sum and value is number of numbers (3 digit) having that sum in the corresponding part.
* Now, for each number in the first part find out its sum and update the corresponding map.
* Similarly, we can get updated map for the second part.
* Now by multiplying the corresponding pairs (e.g. value in map 1 of key 4 and value in map 2 of key 4) and adding them up we get the answer.
In this approach, we end up checking 1K numbers.
My question is how could we further optimize? Is there a better solution?
For 0 <= s <= 18, there are exactly 10 - |s - 9| ways to obtain s as the sum of two digits.
So, for the first part
int first[28] = {0};
for(int s = 0; s <= 18; ++s) {
    int c = 10 - (s < 9 ? (9 - s) : (s - 9));
    for(int d = 1; d <= 9; ++d) {
        first[s+d] += c;
    }
}
That's 19*9 = 171 iterations. For the second half, do the same with the inner loop starting at 0 instead of 1, which is 19*10 = 190 iterations. Then sum first[i]*second[i] for 1 <= i <= 27.
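For comparison, here is a short Python sketch of the same counting scheme, covering both halves. The function name count_equal_halves is chosen for this illustration. It builds both tables from the 10 - |s - 9| count of digit pairs and then sums the products.

```python
def count_equal_halves():
    """Count six-digit numbers (no leading zero) whose first and last three digits have equal sums."""
    pair_ways = [10 - abs(s - 9) for s in range(19)]   # ways to write s as a sum of two digits
    first = [0] * 28    # first[t]: three-digit halves with leading digit 1-9 and digit sum t
    second = [0] * 28   # second[t]: three-digit halves with leading digit 0-9 and digit sum t
    for s, c in enumerate(pair_ways):
        for d in range(1, 10):
            first[s + d] += c
        for d in range(0, 10):
            second[s + d] += c
    return sum(f * g for f, g in zip(first, second))


print(count_equal_halves())   # 50412
```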
Generate all three-digit numbers; partition them into sets based on their sum of digits. (Actually, all you need to do is keep a vector that counts the size of the sets). For each set, the number of six-digit numbers that can be generated is the size of the set squared. Sum up the squares of the set sizes to get your answer.
int sumCounts[28] = {0}; // sums can go from 0 through 27
for (int i = 0; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += count * count;
}
EDIT Variation to eliminate counting leading zeroes:
int sumCounts[28] = {0};  // all values 000-999
int sumCounts2[28] = {0}; // values 000-099, i.e. those with a leading zero
for (int i = 0; i < 100; ++i) {
    int s = sumOfDigits(i);
    sumCounts[s]++;
    sumCounts2[s]++;
}
for (int i = 100; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += (count - sumCounts2[i]) * count;
}
Python Implementation
def equal_digit_sums():
    dists = {}
    for i in range(1000):
        digits = [int(d) for d in str(i)]
        dsum = sum(digits)
        if dsum not in dists:
            dists[dsum] = [0,0]
        dists[dsum][0 if len(digits) == 3 else 1] += 1
    def prod(dsum):
        t = dists[dsum]
        return (t[0]+t[1])*t[0]
    return sum(prod(dsum) for dsum in dists)
print(equal_digit_sums())
Result: 50412
One idea: For each number from 0 to 27, count the number of three-digit numbers that have that digit sum. This should be doable efficiently with a DP-style approach.
Now you just sum the squares of the results, since for each answer, you can make a six-digit number with one of those on each side.
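One possible reading of the "DP-style approach" is to build the digit-sum counts one digit at a time. The sketch below assumes that reading; the name digit_sum_counts is invented here. Summing the squares counts strings with leading zeros allowed on both halves, so the leading-zero correction is shown separately.

```python
def digit_sum_counts(num_digits):
    """counts[s] = number of digit strings of the given length (leading zeros allowed) with digit sum s."""
    counts = [1]                          # zero digits: exactly one way to reach sum 0
    for _ in range(num_digits):
        nxt = [0] * (len(counts) + 9)     # one more digit raises the maximum sum by 9
        for s, c in enumerate(counts):
            for d in range(10):
                nxt[s + d] += c
        counts = nxt
    return counts


three = digit_sum_counts(3)               # sums 0..27
two = digit_sum_counts(2) + [0] * 9       # padded to the same length

with_leading_zeros = sum(c * c for c in three)
# a first half starting with 0 is a two-digit string, so subtract those
without_leading_zeros = sum((t - z) * t for t, z in zip(three, two))
print(with_leading_zeros, without_leading_zeros)
```

Each added digit costs about 10 operations per reachable sum, so the three-digit table takes a few hundred additions in total.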
Assuming leading 0's aren't allowed, you want to calculate in how many different ways 3 digits can sum to n. To calculate that you can have a for loop inside a for loop. So:
firstHalf = 0
for i in range(1, 10): # first digit, 1-9
    for j in range(0, 10): # second digit, 0-9
        if 0 <= n - i - j <= 9:
            firstHalf += 1 # only one possible third digit
secondHalf = firstHalf + max(0, 10 - abs(n - 9))
If you are trying to sum to a number, then the last digit is always uniquely determined. Thus in the case where the first digit is 0 we are just calculating how many different values are possible for the second digit. This will be n+1 if n is less than 10. If n is greater, up until 18 it will be 19-n. Over 18 there are no ways to form the sum.
If you loop over all n, 1 through 27, and add up firstHalf * secondHalf, you will have your total.

How can I find the number of ways a number can be expressed as a sum of primes? [duplicate]

Possible Duplicate:
Generating the partitions of a number
Prime number sum
The number 7 can be expressed in 5 ways as a sum of primes:
2 + 2 + 3
2 + 3 + 2
2 + 5
3 + 2 + 2
5 + 2
Make a program that calculates in how many ways the number n can be expressed as a sum of primes. You can assume that n is a number between 0 and 100. Your program should print the answer in less than a second.
Example 1:
Give number: 7 Result: 5
Example 2:
Give number: 20 Result: 732
Example 3:
Give number: 80 Result: 10343662267187
I've been at this problem for hours. I can't figure out how to get n from (n-1).
Here are the sums from the first 30 numbers by a tree search
0 0 0 1 2 2 5 6 10 16 19 35 45 72 105 152 231 332 500 732 1081 1604 2351 3493 5136 7595 11212 16534 24441
I thought I had something with finding the biggest chain 7 = 5+2 and somehow using the knowledge that five can be written as 5, 3+2, 2+3, but somehow I need to account for the duplicate 2+3+2 replacement.
Look up dynamic programming, specifically Wikipedia's page and the examples there for the Fibonacci sequence, and think about how you might be able to adapt that to your problem here.
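One way that adaptation could look is sketched below, on the assumption (taken from the question's examples) that order matters and that a prime on its own does not count as a sum. The names primes_up_to and count_prime_sums are made up for this illustration. The table entry ways[t] counts ordered sums of primes that total t, built from the entries ways[t - p] for every prime p that could be the last term.

```python
def primes_up_to(n):
    """All primes <= n by trial division, which is plenty for n <= 100."""
    return [p for p in range(2, n + 1)
            if all(p % q for q in range(2, int(p ** 0.5) + 1))]


def count_prime_sums(n):
    """Ordered ways to write n as a sum of two or more primes."""
    if n == 0:
        return 0
    primes = primes_up_to(n)
    ways = [0] * (n + 1)
    ways[0] = 1                   # the empty sum, used as the base case
    for total in range(1, n + 1):
        # the last term can be any prime p <= total; the rest must sum to total - p
        ways[total] = sum(ways[total - p] for p in primes if p <= total)
    # ways[n] also counts n itself when n is prime; the question's examples leave that out
    return ways[n] - (1 if n in primes else 0)


print(count_prime_sums(7))    # 5
print(count_prime_sums(20))   # 732
```

The table has at most 101 entries and each is filled from at most 25 primes, so the one-second limit is not a concern.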
Okay, so this is a complicated problem. You are asking how to write code for the Partition Function; I suggest that you read up on the partition function itself first. Next you should look at algorithms to calculate partitions. It is a complex subject; here is a starting point ... Partition problem is [NP complete] --- This question has already been asked and answered here, and that may also help you get started with algorithms.
There are several options. Since you know the number is between 0-100, there is the obvious one: cheat, simply make an array and fill in the numbers.
The other way would be a loop. You'd need all the primes under 100, because a number which is smaller than 100 can't be expressed using the sum of a prime which is larger than 100. Eg. 99 can't be expressed as the sum of 2 and any prime larger than 100.
What you also know is: the maximum length of the sum for even numbers is the number divided by 2. Since 2 is the smallest prime. For odd numbers the maximum length is (number - 1) / 2.
Eg.
8 = 2 + 2 + 2 + 2, thus length of the sum is 4
9 = 2 + 2 + 2 + 3, thus length of the sum is 4
If you want performance you could cheat in another way by using GPGPU, which would significantly increase performance.
Then there is the shuffling method. If you know 7 = 2 + 2 + 3, you know 7 = 2 + 3 + 2. To do this you'd need a method of calculating the different possibilities of shuffling. You could store the combinations of possibilities or keep them in mind while writing your loop.
Here is a relatively brute force method (in Java):
int[] primes = new int[]{/* fill with primes < 100 */};
int number = 7; //Normally determined by user
int maxLength = number / 2; //2 is the smallest prime, so no sum has more than number / 2 terms (integer division also covers odd numbers)
int possibilities = 0;
for (int i = 1; i <= maxLength; i++){ //i = number of terms; start at 2 if a lone prime should not count (the question's examples do not count it)
    long combinations = 1;
    for (int c = 0; c < i; c++) combinations *= primes.length; //primes.length ^ i, as an integer
    for (long j = 0; j < combinations; j++){ //Loop through all the possibilities
        long rest = j; //j read as an i-digit number in base primes.length
        int sum = 0;
        for (int k = 0; k < i; k++){
            sum += primes[(int)(rest % primes.length)]; //Digit k of j selects the k-th term
            rest /= primes.length;
            if (sum > number) break; //The sum is greater than what we're trying to reach, break we've gone too far
        }
        if (sum == number) possibilities++;
    }
}
I understand this is complicated. I will try to use an analogy. Think of it as a combination lock. You know the maximum number of wheels you have to try, hence the "i" loop. Next you go through each possibility ("j" loop), then you set the individual numbers ("k" loop). The code in the "k" loop goes from the current possibility (the value of j) to the actual numbers. After each combination is entered, you check whether it was correct and, if so, increase the number of possibilities.
I apologize in advance if I made any errors in the code.