All pair Bitwise OR sum - bit-manipulation

Is there an algorithm to find the all-pair bitwise OR sum of an array in linear time complexity?
For example, if the array is {1,2,3}, then the all-pair sum is 1|2 + 2|3 + 1|3 = 9.
I can find the all-pair AND sum in O(n) using the following algorithm. How can I change this to get the all-pair OR sum?
int ans = 0;  // Initialize result

// Traverse over all bits
for (int i = 0; i < 32; i++)
{
    // Count number of elements with i'th bit set
    int k = 0;  // Initialize the count
    for (int j = 0; j < n; j++)
        if (arr[j] & (1 << i))
            k++;

    // There are k set bits, meaning k*(k-1)/2 pairs.
    // Every pair adds 2^i to the answer. Therefore,
    // we add "2^i * [k*(k-1)/2]" to the answer.
    ans += (1 << i) * (k * (k - 1) / 2);
}
From here: http://www.geeksforgeeks.org/calculate-sum-of-bitwise-and-of-all-pairs/

You can do it in linear time. The idea is as follows:
For each bit position, record the number of entries in your array that have that bit set to 1. In your example, you have 2 entries (1 and 3) with the ones bit set, and 2 entries with the two's bit set (2 and 3).
For each number, compute the sum of the number's bitwise OR with all other numbers in the array by looking at each bit individually and using your cached sums. For example, consider the sum 1|1 + 1|2 + 1|3 = 1 + 3 + 3 = 7.
Because 1's last bit is 1, the result of a bitwise OR with 1 will have its last bit set. Thus, all three of the numbers 1|1, 1|2, and 1|3 have the last bit equal to 1. Since 1's two's bit is 0, the two's bit of 1|y is set exactly when y's two's bit is set, which happens for two of the elements. By grouping the bits together, we obtain the sum 3*1 (three one's bits) + 2*2 (two two's bits) = 7.
Repeating this procedure for each element lets you compute the sum of all bitwise ors for all ordered pairs of elements in the array. So in your example, 1|2 and 2|1 will be computed, as will 1|1. So you'll have to subtract off all cases like 1|1 and then divide by 2 to account for double counting.
Let's try this algorithm out for your example.
Writing the numbers in binary, {1,2,3} = {01, 10, 11}. There are 2 numbers with the one's bit set, and 2 with the two's bit set.
For 01 we get 3*1 + 2*2 = 7 for the sum of ors.
For 10 we get 2*1 + 3*2 = 8 for the sum of ors.
For 11 we get 3*1 + 3*2 = 9 for the sum of ors.
Summing these, 7+8+9 = 24. We need to subtract off 1|1 = 1, 2|2 = 2 and 3|3 = 3, as we counted these in the sum. 24-1-2-3 = 18.
Finally, as we counted things like 1|3 twice, we need to divide by 2. 18/2 = 9, the correct sum.
This algorithm is O(n * max number of bits in any array element).
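For concreteness, here is a minimal C++ sketch of that per-element procedure (my own illustration rather than code from the answer; it assumes 32-bit non-negative values, and allPairOrSum is just an illustrative name):
#include <iostream>
#include <vector>
using namespace std;

long long allPairOrSum(const vector<int>& arr) {
    int n = arr.size();
    // bitCount[i] = number of elements with bit i set (the cached sums)
    vector<long long> bitCount(32, 0);
    for (int x : arr)
        for (int i = 0; i < 32; i++)
            if (x & (1 << i))
                bitCount[i]++;
    long long total = 0;
    for (int x : arr)
        for (int i = 0; i < 32; i++)
            // If x has bit i set, all n ORs contribute 2^i;
            // otherwise only the bitCount[i] elements with that bit set do.
            total += (1LL << i) * ((x & (1 << i)) ? n : bitCount[i]);
    // total counts every ordered pair plus each x|x once:
    for (int x : arr) total -= x;  // remove the x|x terms
    return total / 2;              // undo the double counting
}

int main() {
    vector<int> arr = {1, 2, 3};
    cout << allPairOrSum(arr) << endl;  // prints 9
}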
Edit: We can modify your posted algorithm by simply subtracting the count of all 0-0 pairs from all pairs to get all 0-1 or 1-1 pairs for each bit position. Like so:
int ans = 0;  // Initialize result

// Traverse over all bits
for (int i = 0; i < 32; i++)
{
    // Count number of elements with i'th bit not set
    int k = 0;  // Initialize the count
    for (int j = 0; j < n; j++)
        if (!(arr[j] & (1 << i)))
            k++;

    // There are k unset bits, meaning k*(k-1)/2 pairs that don't
    // contribute to the total sum, out of n*(n-1)/2 pairs.
    // So we subtract the count of non-contributing pairs from the total.
    ans += (1 << i) * (n * (n - 1) / 2 - k * (k - 1) / 2);
}

Related

Maximize XOR Equation

Problem statement:
Given an array of n elements and an integer k, find an integer x in
the range [0,k] such that Xor-sum(x) is maximized. Print the maximum
value of the equation.
Xor-sum(x) = (x XOR A[1]) + (x XOR A[2]) + (x XOR A[3]) + ... + (x XOR A[N])
Input Format
The first line contains an integer N denoting the number of elements in
A. The next line contains an integer k denoting the maximum value
of x. Each line i of the N subsequent lines (where 0 <= i < N) contains
an integer describing A[i].
Constraints
1<=n<=10^5
0<=k<=10^9
0<=A[i]<=10^9
Sample Input
3
7
1
6
3
Sample Output
14
Explanation
Xor_sum(4)=(4^1)+(4^6)+(4^3)=14.
This problem was asked in an Infosys recruitment test. I was going through previous years' papers and I came across this problem.
I was only able to come up with a brute-force solution, which just calculates the equation for every x in the range [0,k] and prints the maximum. That solution won't work for the given constraints.
My solution
#include <bits/stdc++.h>
using namespace std;

int main()
{
    int n, k, ans = 0;
    cin >> n >> k;
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];
    for (int i = 0; i <= k; i++) {
        int temp = 0;
        for (int j = 0; j < n; j++) {
            temp += (i ^ a[j]);
        }
        ans = max(temp, ans);
    }
    cout << ans;
    return 0;
}
I found a solution on a website (scroll down to question 3 there). I was unable to understand what the code does, and it gives incorrect answers for some test cases.
The trick here is that XOR works on bits in parallel, independently. You can optimize each bit of x. Brute-forcing this takes 2*32 tries, given the constraints.
As said in other comments, each bit of x gives an independent contribution to the sum, so the first step is to calculate the added value of each possible bit.
To do this for the i-th bit of x, count the number of 0s and 1s in the same position across all numbers in the array; if the difference N0 - N1 is positive, then the added value is also positive and equal to (N0 - N1) * 2^i. Let's call such bits "useful".
The number x will be a combination of useful bits only.
If k is not of the form 2^n - 1, we need a strategy to find the best combination (if you don't want to brute-force over the k possible values).
Consider then the binary representation of k and loop over its bits starting from the MSB, maintaining two variables initialized to zero: CAV (current added value) and BAV (best added value).
If the current bit is 0, move on to the next bit.
If the current bit is 1:
a) calculate the sum of the added values of all useful bits with lower index, plus the CAV; if the result is greater than the BAV, replace the BAV
b) if the current bit is not useful, quit the loop
c) add the current bit's added value to the CAV
When the loop is over, if the CAV is greater than the BAV, replace the BAV.
EDIT: A sample implementation (in Java, sorry :) )
import java.util.Scanner;

public class XorSum {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        int n = sc.nextInt();
        int k = sc.nextInt();
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = sc.nextInt();
        }
        // Determine the number of bits needed to represent k
        // (position of the most significant 1, plus 1)
        int msb = 0;
        for (int kcopy = k; kcopy != 0; kcopy = kcopy >>> 1) {
            msb++;
        }
        // Compute the added value of each possible bit in x
        int[] av = new int[msb];
        int bmask = 1;
        for (int bit = 0; bit < msb; bit++) {
            int count0 = 0;
            for (int i = 0; i < n; i++) {
                if ((a[i] & bmask) == 0) {
                    count0++;
                }
            }
            av[bit] = (count0 * 2 - n) * bmask;
            bmask = bmask << 1;
        }
        // Accumulated added value: the sum of all positive av bits
        // up to and including the index
        int[] aav = new int[msb];
        for (int bit = 0; bit < msb; bit++) {
            if (av[bit] > 0) {
                aav[bit] = av[bit];
            }
            if (bit > 0) {
                aav[bit] += aav[bit - 1];
            }
        }
        // Explore the space of possible combinations moving on the k boundary,
        // starting from the msb (bmask must shift down together with bit)
        int cval = 0;
        int bval = 0;
        bmask = bmask >>> 1;
        for (int bit = msb - 1; bit >= 0; bit--, bmask = bmask >>> 1) {
            // Exploring the space of bit combinations we have 3 possible cases:
            // bit of k is 0: we must choose 0 as well, since setting it to 1
            // would make x greater than k, so just loop over
            if ((k & bmask) == 0) {
                continue;
            }
            // bit of k is 1: we can choose between 0 and 1.
            // - Choosing 0, we can immediately explore the complete branch:
            //   all bits of lower index are free, so set to 1 every lower bit
            //   with positive av and get the maximum possible value for this branch
            int val = cval + (bit > 0 ? aav[bit - 1] : 0);
            if (val > bval) {
                bval = val;
            }
            // - Choosing 1, if the bit has no positive av, then it's forced to 0
            //   and the solution is found on the other branch, so stop here
            if (av[bit] <= 0) break;
            // - Choosing 1, with a positive av: store the value and continue
            //   down this branch
            cval += av[bit];
        }
        if (cval > bval) {
            bval = cval;
        }
        // Final sum
        for (int i = 0; i < n; i++) {
            bval += a[i];
        }
        System.out.println(bval);
    }
}
I think you can consider solving for each bit. The number X should be the one that can turn on the high-order bits where many array values have a 0. So count the number of set bits at each of the 32 positions (2^0, 2^1, ...), and for each position consider turning the bit on when many numbers have a 0 there.
Combining this with the limit K should give you an answer that runs in O(log K) time.
Assuming k is unbounded, this problem is trivial.
For each bit (assuming 64-bit words there would be 64 for example) accumulate the total count of 1's and 0's in all values in the array (for that bit), with c1_i and c0_i representing the former and latter respectively for bit i.
Then define each bit x_i in x as
x_i = 1 if c0_i > c1_i else 0
Constructing x as described above is guaranteed to give you the value of x that maximizes the sum of interest.
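A minimal sketch of that construction (my own code, with illustrative names; it assumes 32-bit values rather than 64):
#include <cstdint>
#include <vector>

// Set bit i of x exactly when c0_i > c1_i.
std::uint32_t bestXUnbounded(const std::vector<std::uint32_t>& a) {
    std::uint32_t x = 0;
    for (int i = 0; i < 32; ++i) {
        std::size_t ones = 0;
        for (std::uint32_t v : a)
            if (v & (1u << i))
                ++ones;
        std::size_t zeros = a.size() - ones;
        if (zeros > ones)       // c0_i > c1_i  =>  x_i = 1
            x |= (1u << i);
    }
    return x;
}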
When k is a specific number, this can be solved using a dynamic programming solution. To understand how, first derive a recurrence.
Let z_0, z_1, ..., z_n be the positions of the ones occurring in k's binary representation, with z_0 being the most significant position.
Let M[t] represent the maximum sum possible given the problem's array and defining any x such that x < t.
Important note: the optimal value of M[t] for t a power of 2 is obtained by following the procedure described above for an unbounded k, but limiting the largest bit used.
To solve this problem, we want to find
M[k] = max(M[2^z_0],M[k - 2^z_0] + C_0)
where C_i is defined to be the contribution to the final sum by setting the position z_i to one.
This of course continues as a recursion, with the next step being:
M[k - 2^z_0] = max(M[2^z_1],M[k - 2^z_0 - 2^z_1] + C_1)
and so on and so forth. The dynamic programming solution arises by converting this recursion to the appropriate DP algorithm.
Note that, due to the definition of M[k], it is still necessary to check whether the sum for x = k itself is greater than M[k], as it may be; this requires one extra pass.
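To make the recursion concrete, here is a hedged C++ sketch of how it unrolls into a single walk down the set bits of k (my own code, not from the answer; the contribution C_i is computed as (c0_i - c1_i) * 2^i as in the unbounded case, and the sample input from the question is hard-coded):
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    long long k = 7;
    std::vector<long long> a = {1, 6, 3};
    long long n = a.size();

    long long C[31];          // C[i]: contribution of setting bit i of x
    long long pref[32] = {};  // pref[t]: sum of max(0, C[i]) for i < t
    for (int i = 0; i <= 30; ++i) {
        long long ones = 0;
        for (long long v : a) ones += (v >> i) & 1;
        C[i] = (n - 2 * ones) * (1LL << i);   // (c0 - c1) * 2^i
        pref[i + 1] = pref[i] + std::max(0LL, C[i]);
    }

    // Walk the set bits of k from high to low (the recursion above).
    long long cum = 0, best = 0;
    for (int i = 30; i >= 0; --i) {
        if (!((k >> i) & 1)) continue;        // a 0 bit of k stays 0
        best = std::max(best, cum + pref[i]); // choose 0 at z_j: the M[2^{z_j}] branch
        cum += C[i];                          // choose 1 at z_j: stay on the boundary
    }
    best = std::max(best, cum);               // finally, the x == k case

    long long base = 0;                       // sum of the array itself
    for (long long v : a) base += v;
    std::cout << base + best << '\n';         // prints 14 for this sample
}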
At the bit level XOR is simple: 0 XOR 0 = 0, 1 XOR 1 = 0, and 0 XOR 1 = 1. But when these bits belong to a number, the XOR operation has an addition or subtraction effect. For example, if the third bit of a number is set and we XOR the number with 4 (0100), which also has its third bit set, the result is the number reduced by 2^(3-1): for num = 5, 0101 XOR 0100 = 0001, so 4 is subtracted from 5. Similarly, if the third bit of a number is not set and we XOR with 4, the result is an addition: for num = 2, 0010 XOR 0100 = 0110, so 4 is added to 2. Now let's look at this problem.
This problem can't be solved efficiently by applying XOR to each number individually; rather, the approach is to perform the XOR for a particular bit of all numbers in one go. Let's see how that can be done.
Fact 1: Suppose we have X and want to perform XOR on all numbers with it, and we know that the second bit of X is set. If we somehow also know how many of the numbers have their second bit set, then we know the result for that bit (1 XOR 1 = 0) without performing XOR on each number individually.
Fact 2: Following Fact 1, say M numbers have a particular bit set. If X also has that bit set, then M * 2^(pos-1) will be subtracted from the sum of all numbers. If N is the total number of elements in the array, then N - M numbers don't have that bit set, so (N - M) * 2^(pos-1) will be added to the sum.
From Fact 1 and Fact 2 we can calculate the overall XOR effect of a particular bit across all numbers as effect = (N - M) * 2^(pos-1) - M * 2^(pos-1), and we can do the same for every bit.
Now let's see the above theory in action. If we have array = {1, 6, 3} and k = 7, then:
1 = 0001 (there are 32 bits in total, but I am showing only the relevant bits; the others are zero)
6 = 0110
3 = 0011
So our bit count list = [0, 1, 2, 2]: as you can see, 1 and 3 have the first bit set, 6 and 3 have the second bit set, and only 6 has the third bit set.
X = 0, ..., 7, but X = 0 has effect = 0 on the sum, because a bit that is not set does not affect the other operand's bit in an XOR operation. So let's start from X = 1, which is 0001:
[0, 1, 2, 2] = count list
[0, 0, 0, 1] = X
As visible in the count list, two numbers have the first bit set, and X also has its first bit set, which means 2 * 2^(1-1) will be subtracted from the sum; and since there are three numbers in the array, (3 - 2) * 2^(1-1) will be added. So the XOR effect on the first bit is effect = (3 - 2) * 2^(1-1) - 2 * 2^(1-1) = 1 - 2 = -1. This is also the overall effect of X = 1, because only its first bit is set and the rest are zero. At this point we compare the effect produced by X = 1 with X = 0: since -1 < 0, X = 1 would reduce the sum of all numbers by 1, while X = 0 would not. So far, X = 0 produces the maximum sum.
The XOR performed for X = 1 can be performed the same way for all other values, so let me jump directly to X = 4, which is 0100:
[0, 1, 2, 2] = count list
[0, 1, 0, 0] = X
As visible, X has only its third bit set, and only one number in the array has its third bit set, which means 1 * 2^(3-1) will be subtracted and (3 - 1) * 2^(3-1) added, for an overall effect = (3 - 1) * 2^(3-1) - 1 * 2^(3-1) = 8 - 4 = 4. At this point we compare the effect of X = 4 with the known maximum effect, which is 0; since 4 > 0, X = 4 produces the maximum sum so far. When you perform this for all X = 0, ..., 7, you will find that X = 4 produces the maximum effect, so the answer is X = 4.
So
(x XOR arr[0]) + (x XOR arr[1]) + ... + (x XOR arr[n-1]) = effect + (arr[0] + arr[1] + ... + arr[n-1])
The complexity is O(32n) to find, for all 32 bits, how many numbers have a particular bit set, plus O(32k) to find the effect of every X in [0, k]. With 32 treated as a constant c,
Complexity = O(c*n) + O(c*k) = O(n + k)
#include <iostream>
#include <cmath>
#include <bitset>
#include <cstdint>
#include <utility>
#include <vector>
#include <numeric>

std::vector<std::uint32_t> bitCount(const std::vector<std::uint32_t>& numList){
    std::vector<std::uint32_t> countList(32, 0);
    for(std::uint32_t num : numList){
        std::bitset<32> bitList(num);
        for(unsigned i = 0; i < 32; ++i){
            if(bitList[i]){
                countList[i] += 1;
            }
        }
    }
    return countList;
}

std::pair<std::uint32_t, std::int64_t> prefXAndMaxEffect(std::uint32_t n, std::uint32_t k,
                                                         const std::vector<std::uint32_t>& bitCountList){
    std::uint32_t prefX = 0;
    std::int64_t xorMaxEffect = 0;
    std::vector<std::int64_t> xorBitEffect(32, 0);
    for(std::uint32_t x = 1; x <= k; ++x){
        std::bitset<32> xBitList(x);
        std::int64_t xorEffect = 0;
        for(unsigned i = 0; i < 32; ++i){
            if(xBitList[i]){
                if(0 != xorBitEffect[i]){
                    xorEffect += xorBitEffect[i];
                }
                else{
                    std::int64_t num = std::exp2(i);
                    xorBitEffect[i] = (n - bitCountList[i]) * num - (bitCountList[i] * num);
                    xorEffect += xorBitEffect[i];
                }
            }
        }
        if(xorEffect > xorMaxEffect){
            prefX = x;
            xorMaxEffect = xorEffect;
        }
    }
    return {prefX, xorMaxEffect};
}

int main(int, char*[]){
    std::uint32_t k = 7;
    std::vector<std::uint32_t> numList{1, 6, 3};
    std::pair<std::uint32_t, std::int64_t> xAndEffect = prefXAndMaxEffect(numList.size(), k, bitCount(numList));
    std::int64_t sum = 0;
    sum = std::accumulate(numList.cbegin(), numList.cend(), sum) + xAndEffect.second;
    std::cout << sum << '\n';
}
Output :
14

Intuition behind storing the remainders?

I am trying to solve a question on LeetCode.com:
Given a list of non-negative numbers and a target integer k, write a function to check if the array has a continuous subarray of size at least 2 that sums up to a multiple of k, that is, sums up to n*k where n is also an integer. For example, if [23, 2, 4, 6, 7] and k = 6, then the output should be True, since [2, 4] is a continuous subarray of size 2 that sums up to 6.
I am trying to understand the following solution:
class Solution {
public:
    bool checkSubarraySum(vector<int>& nums, int k) {
        int n = nums.size(), sum = 0, pre = 0;
        unordered_set<int> modk;
        for (int i = 0; i < n; ++i) {
            sum += nums[i];
            int mod = k == 0 ? sum : sum % k;
            if (modk.count(mod)) return true;
            modk.insert(pre);
            pre = mod;
        }
        return false;
    }
};
I understand that we are trying to store 0, a % k, (a+b) % k, (a+b+c) % k, etc. in the hash set (where k != 0), and that each value is inserted on the next iteration since we want the subarray size to be at least 2.
But how does this guarantee that we get a subarray whose elements sum to a multiple of k? What mathematical property guarantees this?
The set modk is gradually populated with all sums (considered modulo k) of contiguous sub-arrays starting at the beginning of the array.
The key observation is that:
a-b = n*k for some natural n iff
a-b ≡ 0 mod k iff
a ≡ b mod k
so if a contiguous sub-array nums[i_0 + 1]..nums[i_1] sums up to 0 modulo k, then the two prefixes nums[0]..nums[i_0] and nums[0]..nums[i_1] have the same sum modulo k.
Thus it's enough if two distinct sub-arrays starting at the beginning of the array have the same sum, modulo k.
Luckily, there are only k such values, so you only need to use a set of size k.
Some nitpicks:
if n > k, you're going to have an appropriate sub-array anyway (the pigeon-hole principle), so the loop will actually never iterate more than k+1 times.
There should not be any sort of class involved here, that makes no sense.
contiguous, not continuous. Arrays and sub-arrays are discrete and can't be continuous...
The modulus base k of a sum is the same as the modulus base k of the sum of the operands' moduli base k:
(a+b)%k = (a%k + b%k) % k
(23 + 2) % 6 = 1
((23%6) + (2%6)) % 6 = (5 + 2) % 6 = 1
modk stores all the remainders you calculated iteratively. If at iteration i you get a remainder already calculated at iteration i-m, that means that you added a subsequence of m elements whose sum is a multiple of k.
i=0: nums[0] = 23, sum = 23, sum%6 = 5, modk = [5]
i=1: nums[1] = 2,  sum = 25, sum%6 = 1, modk = [5, 1]
i=2: nums[2] = 4,  sum = 29, sum%6 = 5; 5 already exists in modk, and (2+4)%6 = 0

Sum of difference of a number to an array of numbers

This is my problem.
Given an array of integers and another integer k, find the sum of differences of each element of the array and k.
For example if the array is 2, 4, 6, 8, 10 and k is 3
Sum of difference
= abs(2 - 3) + abs(4-3) + abs(6 - 3) + abs(8 - 3) + abs(10 - 3)
= 1 + 1 + 3 + 5 + 7
= 17
The array remains the same throughout and can contain up to 100,000 elements, and there will be 100,000 different values of k to be tested. k may or may not be an element of the array. This has to be done within 1s, or about 100M operations. How do I achieve this?
You can run multiple queries for sums of absolute differences in O(log N) if you add a preprocessing step which costs O(N * log N).
Sort the array, then for each item in the array store the sum of all numbers that are smaller than or equal to the corresponding item. This can be done in O(N * log N). Now you have a pair of arrays that look like this:
2 4 6 8 10 // <<== Original data
2 6 12 20 30 // <<== Partial sums
In addition, store the total T of all numbers in the array.
Now you can get sums of absolute differences by running a binary search for k on the original array and using the partial sums to compute the answer: for the items to the left of the target, subtract their partial sum from their count times k; for the items to the right, subtract the count times k from their partial sum; then add the two results together. The partial sum of the numbers to the right of the target can be computed by subtracting the partial sum on the left from the total T.
For k=3 binary search gets you to position 1.
Partial sum on the left is 2
Count of items on the left is 1
Partial sum on the right is (30-2)=28
Count of items on the right is 4
You compute (1*3-2) + (28-4*3) = 1 + 16 = 17
First sort the array and then compute an array that stores the sums of the prefixes of the resulting sorted array. Let's denote this array p; you can compute p in linear time so that p[i] = a[0] + a[1] + ... + a[i]. Having this array, you can answer in constant time the question: what is the sum of elements a[x] + a[x+1] + ... + a[y] (i.e. with indices x to y)? To do that you simply compute p[y] - p[x-1] (take special care when x is 0).
Now, to answer a query of the type "what is the sum of absolute differences with k", we split the problem in two parts: the sum over the numbers greater than k, and the sum over the numbers smaller than k. To compute these, perform a binary search for the position of k in the sorted a (denote it idx), and compute the sum of the values in a before idx (denote it s) and after idx (denote it S). The sum of absolute differences with k is then idx * k - s + S - (a.length - idx) * k. This is of course pseudocode, and by a.length I mean the number of elements in a.
After performing a linearithmic precomputation, you will be able to answer each query in O(log(n)). Note this approach only makes sense if you plan to perform multiple queries; if you are only going to perform a single query, you cannot possibly go faster than O(n).
Just implementing dasblinkenlight's solution in "contest C++":
It does exactly as he says: it reads the values, sorts them, and stores the accumulated sum in V[i].second, where V[i].second is the accumulated sum up to i-1 (to simplify the algorithm). It also stores a sentinel in V[n] for cases when the query is greater than max(V).
Then, for each query, it binary searches for the value. Here V[a].second is the sum of values less than the query, and V[n].second - V[a].second is the sum of values greater than it.
#include <iostream>
#include <algorithm>
#define pii pair<int, int>
using namespace std;

pii V[100001];

int main() {
    int n;
    while (cin >> n) {
        for (int i = 0; i < n; i++)
            cin >> V[i].first;
        sort(V, V + n);
        V[0].second = 0;
        for (int i = 1; i <= n; i++)
            V[i].second = V[i-1].first + V[i-1].second;
        int k; cin >> k;
        for (int i = 0; i < k; i++) {
            int query; cin >> query;
            pii* res = upper_bound(V, V + n, pii(query, 0));
            int a = res - V, b = n - (res - V);
            int left = query * a - V[a].second;
            int right = V[n].second - V[a].second - query * b;
            cout << left + right << endl;
        }
    }
}
It assumes a file with a format like this:
5
10 2 8 4 6
2
3 5
Then, for each query, it answers like this:
17
13

Optimizing algorithm to find number of six digit numbers satisfying certain property

Problem: "An algorithm to find the number of six digit numbers where the sum of the first three digits is equal to the sum of the last three digits."
I came across this problem in an interview and want to know the best solution. This is what I have till now.
Approach 1: The Brute force solution is, of course, to check for each number (between 100,000 and 999,999) whether the sum of its first three and last three digits are equal. If yes, then increment certain counter which keeps count of all such numbers.
But this checks for all 900,000 numbers and so is inefficient.
Approach 2: Since we are asked "how many" such numbers and not "which numbers", we could do better. Divide the number into two parts: the first three digits (these go from 100 to 999) and the last three digits (these go from 000 to 999). Thus, the sum of the three digits ranges from 1 to 27 in the first part and from 0 to 27 in the last.
* Maintain a std::map<int, int> for each part where key is the sum and value is number of numbers (3 digit) having that sum in the corresponding part.
* Now, for each number in the first part find out its sum and update the corresponding map.
* Similarly, we can get updated map for the second part.
* Now by multiplying the corresponding pairs (e.g. value in map 1 of key 4 and value in map 2 of key 4) and adding them up we get the answer.
In this approach, we end up checking about 1.9K numbers (900 + 1,000) instead of 900K.
My question is how could we further optimize? Is there a better solution?
For 0 <= s <= 18, there are exactly 10 - |s - 9| ways to obtain s as the sum of two digits.
So, for the first part
int first[28] = {0};
for (int s = 0; s <= 18; ++s) {
    int c = 10 - (s < 9 ? (9 - s) : (s - 9));
    for (int d = 1; d <= 9; ++d) {
        first[s + d] += c;
    }
}
That's 19*9 = 171 iterations, for the second half, do it similarly, with the inner loop starting at 0 instead of 1, that's 19*10 = 190 iterations. Then sum first[i]*second[i] for 1 <= i <= 27.
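A possible completion of that sketch (my own code filling in the second table and the final sum, not the original answerer's; the total agrees with the 50412 reported by the Python answer below):
#include <cstdlib>
#include <iostream>

int main() {
    int first[28] = {0}, second[28] = {0};
    for (int s = 0; s <= 18; ++s) {
        int c = 10 - std::abs(s - 9);            // ways to write s with two digits
        for (int d = 1; d <= 9; ++d) first[s + d] += c;   // leading digit 1..9
        for (int d = 0; d <= 9; ++d) second[s + d] += c;  // trailing digit 0..9
    }
    long long total = 0;
    for (int i = 1; i <= 27; ++i) total += 1LL * first[i] * second[i];
    std::cout << total << '\n';                  // prints 50412
}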
Generate all three-digit numbers; partition them into sets based on their sum of digits. (Actually, all you need to do is keep a vector that counts the size of the sets). For each set, the number of six-digit numbers that can be generated is the size of the set squared. Sum up the squares of the set sizes to get your answer.
// Assuming a digit-sum helper like this:
int sumOfDigits(int n) {
    int s = 0;
    for (; n > 0; n /= 10) s += n % 10;
    return s;
}

int sumCounts[28] = {0}; // sums can go from 0 through 27
for (int i = 0; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += count * count;
}
EDIT Variation to eliminate counting leading zeroes:
int sumCounts[28] = {0};
int sumCounts2[28] = {0}; // counts for 0..99, i.e. "3-digit" strings with a leading zero
for (int i = 0; i < 100; ++i) {
    int s = sumOfDigits(i);
    sumCounts[s]++;
    sumCounts2[s]++;
}
for (int i = 100; i < 1000; ++i) {
    sumCounts[sumOfDigits(i)]++;
}
int total = 0;
for (int i = 0; i < 28; ++i) {
    int count = sumCounts[i];
    total += (count - sumCounts2[i]) * count;
}
Python Implementation
def equal_digit_sums():
    dists = {}
    for i in range(1000):
        digits = [int(d) for d in str(i)]
        dsum = sum(digits)
        if dsum not in dists:
            dists[dsum] = [0, 0]
        dists[dsum][0 if len(digits) == 3 else 1] += 1

    def prod(dsum):
        t = dists[dsum]
        return (t[0] + t[1]) * t[0]

    return sum(prod(dsum) for dsum in dists)

print(equal_digit_sums())
print(equal_digit_sums())
Result: 50412
One idea: For each number from 0 to 27, count the number of three-digit numbers that have that digit sum. This should be doable efficiently with a DP-style approach.
Now you just sum the squares of the results, since for each answer, you can make a six-digit number with one of those on each side.
Assuming leading 0's aren't allowed, you want to calculate how many different ways there are to sum to n with 3 digits. To calculate that you can use a for loop inside a for loop, with the bounds keeping every digit in range. So:
firstHalf = 0
for i in xrange(max(1, n - 18), min(9, n) + 1):            # first digit
    for j in xrange(max(0, n - i - 9), min(9, n - i) + 1): # second digit
        firstHalf += 1  # only one possible third digit
secondHalf = firstHalf + max(0, 10 - abs(n - 9))
If you are trying to reach a given sum, the last digit is always uniquely determined. Thus in the case where the first digit is 0, we are just counting how many values are possible for the second digit. This is n+1 if n is less than 10; if n is greater, up to 18, it is 19-n; over 18 there are no ways to form the sum.
If you loop over all n, 1 through 27, you will have your total count.

C++: function creation using array

Write a function which has:
input: an array of N pairs (unique id and weight), and K <= N
output: K random unique ids (from the input array)
Note: when called many times, the frequency of a given id appearing in the output should be greater the more weight it has.
Example: an id with weight 5 should appear in the output 5 times more often than an id with weight 1. Also, the amount of memory allocated should be known at compile time, i.e. no additional memory should be allocated.
My question is: how to solve this task?
EDIT
Thanks for the responses, everybody!
Currently I can't understand how the weight of a pair affects the frequency of its appearance in the output. Can you give me a clearer, "for dummies" explanation of how it works?
Assuming a good enough random number generator:
Sum the weights (total_weight)
Repeat K times:
Pick a number between 0 and total_weight (selection)
Find the first pair where the sum of all the weights from the beginning of the array to that pair is greater than or equal to selection
Write the first part of the pair to the output
You need enough storage to store the total weight.
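A minimal sketch of those steps for a single pick (my own illustration; it assumes positive integer weights and settles for rand() as the "good enough" RNG, and weightedPick is an illustrative name):
#include <cstdlib>
#include <utility>
#include <vector>

// One weighted pick from (id, weight) pairs, as described above.
int weightedPick(const std::vector<std::pair<int, int>>& items) {
    long long totalWeight = 0;
    for (const auto& p : items) totalWeight += p.second;   // sum the weights
    long long selection = std::rand() % totalWeight;       // in [0, totalWeight)
    long long running = 0;
    for (const auto& p : items) {
        running += p.second;
        // first pair whose running weight sum exceeds the selection
        if (running > selection) return p.first;
    }
    return items.back().first;  // not reached with positive weights
}
Repeating the pick K times, with picked ids removed as other answers describe, yields K unique ids.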
Ok so you are given input as follows:
(3, 7)
(1, 2)
(2, 5)
(4, 1)
(5, 2)
And you want to pick a random number so that the weight of each id is reflected in the picking, i.e. pick a random number from the following list:
3 3 3 3 3 3 3 1 1 2 2 2 2 2 4 5 5
Initially I created a temporary array, but this can be done in memory as well; you can calculate the size of the list by summing all the weights up = X, in this example 17.
Pick a random number between [0, X-1], and calculate which id should be returned by looping through the list, doing a cumulative addition on the weights. Say the random number is 8:
(3, 7) total = 7, which is < 8
(1, 2) total = 9, which is >= 8 **boom** 1 is your id!
Now, since you need K random unique ids, you can create a hashtable from the initial array passed to you. Once you find an id, remove it from the hash and proceed with the algorithm. Edit: Note that you create the hashmap only once! Your algorithm will work on it instead of looking through the array. I did not put this at the top to keep the answer clear.
As long as your random calculation is not secretly using any extra memory, you will need to store K random pickings, which are <= N, and a copy of the original array, so the max space requirement at runtime is O(2*N).
The asymptotic runtime is:
O(n) : create copy of original array into hashtable +
(
  O(n) : calculate sum of weights +
  O(1) : calculate random in range +
  O(n) : cumulative totals
) * K random pickings
= O(n*k) overall
This is a good question :)
This solution works with non-integer weights and uses constant extra space (i.e. space complexity = O(1)). It does, however, modify the input array, but the only difference in the end is that the elements will be in a different order.
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call it i.
Add input[i].id to the output.
Subtract input[i-1].weight from input[i].weight (unless i == 0). Now subtract input[i].weight from the following (> i) input weights and also from sum_weights.
Move input[i] to position [n-1] (sliding the intervening elements down one slot). This is the expensive part, as it's O(N) and we do it K times. You can skip this step on the last iteration.
Subtract 1 from n.
Fix all of the weights back, from n-1 down to 1, by subtracting the preceding input's weight.
Time complexity is O(K*N). The expensive part (of the time complexity) is shuffling the chosen elements. I suspect there's a clever way to avoid that, but haven't thought of anything yet.
Update
It's unclear what the question means by "output: K random unique Ids". The solution above assumes that this meant that the output ids are supposed to be unique/distinct, but if that's not the case then the problem is even simpler:
Add the weight of each input to the weight of the following input, starting from the bottom working your way up. Now each weight is actually the sum of that input's weight and all of the previous weights.
sum_weights = the sum of all of the weights, and n = N.
K times:
Choose a random number r in the range [0,sum_weights)
binary search the first n elements for the first slot where the (now summed) weight is greater than or equal to r; call it i.
Add input[i].id to output.
Fix all of the weights back, from n-1 down to 1, by subtracting the preceding input's weight.
Time complexity is O(K*log(N)).
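Here is a rough sketch of this simpler variant (my own code; for clarity it builds a separate prefix array instead of summing the weights in place, and the names are illustrative):
#include <algorithm>
#include <random>
#include <utility>
#include <vector>

// K weighted picks (not necessarily unique), via prefix sums + binary search.
std::vector<int> pickK(const std::vector<std::pair<int, double>>& input, int K) {
    std::vector<double> prefix(input.size());
    double sum = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        sum += input[i].second;   // the "summed" weights described above
        prefix[i] = sum;
    }
    std::mt19937 gen{std::random_device{}()};
    std::uniform_real_distribution<double> dist(0.0, sum);
    std::vector<int> out;
    for (int t = 0; t < K; ++t) {
        double r = dist(gen);
        // first slot whose summed weight is >= r
        std::size_t i = std::lower_bound(prefix.begin(), prefix.end(), r)
                        - prefix.begin();
        out.push_back(input[i].first);
    }
    return out;
}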
My short answer: there is no way to do it.
That's simply because the problem definition is contradictory. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is small relative to N, the calculated frequencies will be pretty close to the theoretical values.
The task may be split into two subtasks:
Generate random numbers with a given distribution (specified by weights)
Generate unique random numbers
Generate random numbers with a given distribution
Calculate sum of weights (sumOfWeights)
Generate random number from the range [1; sumOfWeights]
Find the first array element at which the sum of weights from the beginning of the array becomes greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <utility>

// 0 - id, 1 - weight
typedef unsigned Pair[2];

unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
    }
    const unsigned random = rand() % sumOfWeights + 1;
    sumOfWeights = 0;
    unsigned i = 0;
    for (; i < i_size; ++i)
    {
        const unsigned index = i_indexes[i];
        sumOfWeights += i_set[index][1];
        if (sumOfWeights >= random)
        {
            break;
        }
    }
    return i;
}
Generate unique random numbers
The well-known Durstenfeld-Fisher-Yates algorithm may be used to generate unique random numbers. See this great explanation.
It requires N bytes of space, so if the value of N is defined at compile time, we are able to allocate the necessary space at compile time.
Now we have to combine these two algorithms: we just need to use our own Random() function instead of the standard rand() in the unique-number generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
    unsigned deck[N];
    for (unsigned i = 0; i < N; ++i)
    {
        deck[i] = i;
    }
    unsigned max = N - 1;
    for (unsigned i = 0; i < K; ++i)
    {
        const unsigned index = Random(i_set, deck, max + 1);
        std::swap(deck[max], deck[index]);
        o_res[i] = i_set[deck[max]][0];
        --max;
    }
}
Usage
int main()
{
    srand((unsigned)time(0));
    const unsigned c_N = 5; // N
    const unsigned c_K = 2; // K
    Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
    unsigned result[c_K] = {};
    const unsigned c_total = 1000000; // number of iterations
    unsigned counts[c_N] = {0}; // frequency counters
    for (unsigned i = 0; i < c_total; ++i)
    {
        Generate<c_N, c_K>(input, result);
        for (unsigned j = 0; j < c_K; ++j)
        {
            ++counts[result[j]];
        }
    }
    unsigned sumOfWeights = 0;
    for (unsigned i = 0; i < c_N; ++i)
    {
        sumOfWeights += input[i][1];
    }
    for (unsigned i = 0; i < c_N; ++i)
    {
        std::cout << (double)counts[i]/c_K/c_total // empirical frequency
                  << " | "
                  << (double)input[i][1]/sumOfWeights // expected frequency
                  << std::endl;
    }
    return 0;
}
Output
N = 5, K = 2
Frequencies
Empirical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empirical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526
I do assume that the ids in the output must be unique. This makes the problem a specific instance of random sampling problems.
The first approach that I can think of solves this in O(K*N) time (O(N^2) worst case, since K <= N), using O(N) memory (the input array itself plus constant extra memory).
I assume that the weights are positive.
Let A be the array of pairs.
1) Set N to be A.length
2) Calculate the sum of all weights, W.
3) Loop K times:
3.1) r = rand(0, W)
3.2) Loop over A and find the first index i such that A[1].w + ... + A[i].w >= r
3.3) Add A[i].id to the output
3.4) W = W - A[i].w (do this before overwriting A[i])
3.5) A[i] = A[N-1] (or swap if the array contents should be preserved)
3.6) N = N - 1
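Finally, a hedged C++ sketch of this last procedure with the bookkeeping order fixed as above (my own illustration; it works destructively on a copy of the array, and the names are illustrative):
#include <random>
#include <vector>

struct IdWeight { int id; double w; };

// K unique weighted picks, O(K*N): a linear scan per pick, then swap-remove.
std::vector<int> sampleKUnique(std::vector<IdWeight> A, int K) {
    std::mt19937 gen{std::random_device{}()};
    double W = 0;
    for (const auto& p : A) W += p.w;       // step 2: total weight
    std::size_t N = A.size();               // step 1
    std::vector<int> out;
    for (int t = 0; t < K; ++t) {           // step 3
        double r = std::uniform_real_distribution<double>(0.0, W)(gen);
        std::size_t i = 0;
        double run = A[0].w;
        while (run < r && i + 1 < N) { ++i; run += A[i].w; }  // step 3.2
        out.push_back(A[i].id);             // step 3.3
        W -= A[i].w;                        // step 3.4: before the overwrite
        A[i] = A[N - 1];                    // step 3.5: swap-remove
        --N;                                // step 3.6
    }
    return out;
}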