I came across this piece of code to compute the least common multiple (LCM) of all numbers in an array, but could not understand the algorithm used. What is the use of __builtin_popcount here, which is used to count the number of set bits?
pair<long long, int> pre[200000]; // pre[mask] = { LCM of the subset, popcount of the mask }
long long a[25], N;

long long trunc_mul(long long a, long long b)
{
    // saturating multiply: INF is a cap constant defined elsewhere in the program
    return a <= INF / b ? a * b : INF;
}

void compute()
{
    int limit = 1 << N;
    limit--;
    for (int i = 1; i <= limit; i++) // every non-empty subset mask
    {
        long long lcm = 1;
        pre[i].second = __builtin_popcount(i); // subset size = number of set bits
        int k = 1;
        for (int j = N - 1; j >= 0; j--)
        {
            if (k & i) // a[j] belongs to this subset
            {
                lcm = trunc_mul(lcm / __gcd(lcm, a[j]), a[j]);
            }
            k = k << 1;
        }
        pre[i].first = lcm;
    }
    return;
}
The code snippet you provided handles up to 25 numbers. For each subset of the numbers it computes their LCM into pre[i].first and the number of elements in that subset into pre[i].second. The subset itself is represented as a bitmask, so to compute the number of elements in the subset the snippet uses __builtin_popcount. It has nothing to do with the computation of the LCM.
The LCM is computed using a rather standard approach: the LCM of two numbers equals their product divided by their GCD. The snippet folds this over the subset one element at a time, using the builtin GCD function __gcd.
The k&i and k = k<<1 part figures out which numbers belong to the set represented by a bitmask. If you don't fully understand it, try to see what happens for i = 0b11010, by running this loop on a piece of paper or in the debugger. You will notice that the k&i condition is true on the second, fourth and fifth iterations, precisely the positions at which i has ones in its binary representation.
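To make that concrete, here is a small stand-alone sketch (the driver and the output statements are mine, not part of the original snippet) that walks the mask i = 0b11010 exactly the way the inner loop does:
#include <iostream>

int main()
{
    int N = 5;        // pretend a[] has 5 elements
    int i = 0b11010;  // the example mask from above
    int k = 1;
    for (int j = N - 1; j >= 0; j--) {
        if (k & i)    // true on iterations 2, 4 and 5
            std::cout << "a[" << j << "] is in the subset\n";
        k = k << 1;
    }
}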
I am going to try and explain the problem as clearly as possible:
The user enters 2 numbers, n and q.
We take the first n Fibonacci numbers, reduce each of them modulo q, and put the results in arr. So arr now has n elements. Up to here the program works just fine.
We now have to sort arr using radix sort. When the test cases are small, for example n=5 q=100, n=15 q=13, or n=1000000 q=1000000, the radix sort works just fine and I get the correct output.
The program runs into an infinite loop while sorting when n=5000000 and q=1000000000 (the array length is 5000000 and the largest number in the array is 999999973).
After sorting there are some modular arithmetic calculations, which can be ignored as they work fine and have no errors.
Can someone kindly help me check my sorting algorithm? Also, right now I have chosen 2^20 as the base for the radix sort. On what basis do we choose the base? The length of the largest number in the array?
The correct output for n=5000000 and q=1000000000 is 973061125 (just for reference, if anyone decides to run the program and check).
#include<iostream>
#include<algorithm>
using namespace std;

void countsort(long int* arr, long int n, long int shift)
{
    long int* count = new long int[1048576];
    for (int i = 0; i < 1048576; i++)
        count[i] = 0;
    long int* output = new long int[n];
    long int i, last;
    for (i = 0; i < n; i++)
    {
        ++count[(arr[i] >> shift) & 1048575];
    }
    for (i = last = 0; i < 1048576; i++)
    {
        last += count[i];
        count[i] = last - count[i];
    }
    for (i = 0; i < n; i++)
    {
        output[count[(arr[i] >> shift) & 1048575]++] = arr[i];
    }
    for (i = 0; i < n; i++)
    {
        arr[i] = output[i];
    }
    delete[] output;
    delete[] count;
}

int main()
{
    int trials = 0;
    cin >> trials;
    while (trials--)
    {
        long int n = 0;
        long int q = 0;
        cin >> n;
        cin >> q;
        long int first = 0, second = 1, fib = 0;
        long int* arr = new long int[n];
        arr[0] = second;
        long int m = 0;
        for (long int i = 1; i < n; i++)
        {
            fib = (first + second) % q;
            first = second;
            second = fib;
            arr[i] = fib;
            if (m < arr[i])
                m = arr[i];
        }
        // m is the largest integer in the array
        // this is where radix sort starts
        for (long int shift = 0; (m >> shift) > 0; shift += 20)
        {
            countsort(arr, n, shift);
        }
        long long int sum = 0;
        for (long int i = 0; i < n; i++)
        {
            sum = sum + ((i + 1) * arr[i]) % q;
        }
        sum = sum % q;
        cout << sum << endl;
    }
}
The infinite loop problem is in this line:
for (long int shift = 0; (m >> shift) > 0; shift += 20)
Assuming this is run on an x86 processor, only the lower bits of a shift count are used: for a 32-bit integer only the lower 5 bits (0 to 31) of the shift count are used, and for a 64-bit integer only the lower 6 bits (0 to 63). Most compilers will not compensate for this limitation, and in C++ a shift count greater than or equal to the operand width is undefined behavior anyway. (The original 8086/8088/80186 did not mask the shift count; this started with the 80286.)
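As a quick illustration (a snippet of my own; since an over-wide shift is formally undefined behavior in C++, this only shows what the x86 hardware tends to do):
#include <iostream>

int main()
{
    unsigned int m = 999999973; // 32-bit operand
    int count = 40;             // a shift count wider than the operand
    // m >> 40 is undefined behavior for a 32-bit operand; x86 masks the
    // count to 5 bits, so in practice it behaves like m >> (40 & 31):
    std::cout << (m >> (count & 31)) << '\n'; // 999999973 >> 8, not 0
}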
The other issue, from your prior question, is that (i + 1) * arr[i] can exceed 32 bits. The prior question had sum defined as long long int. The code should also have i defined as long long int (or use a cast before doing the multiply). Fixes are noted in the comments. I don't know whether sum is supposed to be reduced % q, so I left it as a plain long long int value.
for (int shift = 0; m > 0; shift += 20)   // fix: test m itself, not m >> shift
{
    countsort(arr, n, shift);
    m >>= 20;                             // fix: consume one 20-bit digit per pass
}
long long int sum = 0;                    // fix (long long)
for (long long int i = 0; i < n; i++)     // fix (long long)
{
    sum = sum + ((i + 1) * arr[i]) % q;
}
cout << sum << endl;
I don't know what compiler you are using, so I don't know if long int is 32 bits or 64 bits. Your prior question's code declared sum as long long int, which in the case of Visual Studio declares a 64-bit integer; I don't know about other compilers. If long int is 32 bits, then this line is a potential problem:
fib = (first + second) % q;
because first + second can overflow into a negative number, and the sign of the remainder is the same as the sign of the dividend. Negative numbers will be an issue for the radix sort code you are using. Declaring fib (along with first and second) as unsigned int or as long long int avoids this issue.
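For illustration (a one-liner of my own): in C++ the remainder takes the sign of the dividend, so a negative intermediate value stays negative after % q:
#include <iostream>

int main()
{
    std::cout << (-7) % 5 << '\n'; // prints -2, not 3
}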
As for choosing a base, it would probably be better to have all of the logic in countsort and to rename it radix_sort. Using base 2^8 and making 4 passes would be faster (the counts / indexes then fit in L1 cache). As mentioned above, both arr and output should be declared as unsigned ints. The direction of the radix sort would change with each of the 4 passes: arr->output, output->arr, arr->output, output->arr, eliminating the need to copy. A sketch of that layout follows.
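Here is a minimal sketch of that layout (my code, not the answer's: 32-bit unsigned keys, base 2^8 = 256, four counting passes that ping-pong between the two buffers, so no final copy-back is needed):
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// LSD radix sort of 32-bit keys, one byte per pass.
void radix_sort(std::vector<uint32_t>& a)
{
    std::vector<uint32_t> tmp(a.size());
    uint32_t* src = a.data();
    uint32_t* dst = tmp.data();
    for (int shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};              // small enough to stay in L1 cache
        for (size_t i = 0; i < a.size(); i++)
            ++count[(src[i] >> shift) & 255];
        size_t last = 0;
        for (int d = 0; d < 256; d++) {       // exclusive prefix sum -> start indexes
            size_t c = count[d];
            count[d] = last;
            last += c;
        }
        for (size_t i = 0; i < a.size(); i++) // stable scatter into the other buffer
            dst[count[(src[i] >> shift) & 255]++] = src[i];
        std::swap(src, dst);                  // ping-pong the buffers
    }
    // four passes = an even number of swaps, so the sorted data ends up back in a
}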
Another optimization is a hybrid MSD (most significant digit) / LSD (least significant digit) radix sort for an array much larger than all of cache. Assuming base 2^8 == 256 is used, the first pass creates 256 logical bins, each of which then fits within the cache, and each of the 256 logical bins is then sorted using 3 LSD radix sort passes. On my system (Intel 3770K, Win 7 Pro 64-bit) this gave less than a 6% reduction in time for sorting 36 million 32-bit unsigned integers, from 0.37 seconds down to 0.35 seconds, a point of diminishing returns.
I am trying to solve the HackerRank problem Maximum Subarray Sum, described here: https://www.hackerrank.com/challenges/maximum-subarray-sum/problem
I am curious whether this problem can be solved with Kadane's algorithm.
The goal: given an n-element array of integers and an integer 'm', determine the maximum value of the sum of any of its subarrays modulo 'm'.
Input Format:
1) The first line contains an integer 'q' denoting the number of queries to perform. Each query is described over two lines:
a) The first line contains two space-separated integers describing the array length and the modulo number.
b) The second line contains space-separated integers describing the elements of the array.
Here is the C++ code that I came up with. It fails for some of the test cases (sorry, the test cases are too large to post here). Could you review it and comment on why this may not work? Thanks.
#include <bits/stdc++.h>

int main()
{
    uint64_t q = 0, n = 0, m = 0;
    std::cin >> q;
    std::cin >> n;
    std::cin >> m;
    while (q) {
        std::vector<uint64_t> vec;
        for (uint64_t i = 0; i < n; i++) {
            uint64_t num;
            std::cin >> num;
            vec.push_back(num);
        }
        uint64_t subArrayMax = 0;
        uint64_t maxMod = 0;
        for (uint64_t i = 0; i < n; i++) {
            // Kadane's algorithm.
            subArrayMax = std::max(subArrayMax, subArrayMax + vec[i]); // try (a+b)%m = (a%m + b%m)%m trick?
            maxMod = std::max(maxMod, subArrayMax % m);
        }
        std::cout << maxMod;
        --q;
    }
}
Kadane's algorithm does not work here because of the properties of modular arithmetic.
First you have to understand why Kadane's algorithm works: it is a simple dynamic programming solution which answers the following question:
If we know the maximum sum ending at index i-1, then the maximum sum ending at i either appends a[i] to the subarray yielding the answer at i-1, or does not append it.
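In code, that recurrence is plain Kadane (a minimal sketch of my own, with no modulo involved; it assumes a non-empty array):
#include <algorithm>
#include <cstddef>
#include <vector>

long long kadane(const std::vector<long long>& a)
{
    long long best = a[0];       // best sum over all subarrays seen so far
    long long endingHere = a[0]; // best sum of a subarray ending at the current index
    for (std::size_t i = 1; i < a.size(); i++) {
        // either extend the subarray ending at i-1, or start fresh at i
        endingHere = std::max(a[i], endingHere + a[i]);
        best = std::max(best, endingHere);
    }
    return best;
}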
With modular arithmetic, this does not work. For example:
Let A = {1,2,3,4}, M = 6
With Kadane's algorithm, of course, the maximum sum is obtained by adding all the elements, and it can be found using the idea quoted above: keep appending a[i] to the previous maximum sum found.
But if we are finding the maximum sum % 6, then the answer is (2+3)%6 = 5, not (1+2+3)%6 = 0 or (1+2+3+4)%6 = 4. A larger maximum sum does NOT imply a more optimal value of maximum sum % M. Therefore your goal here is not even to find the maximum sum.
This problem can be solved in O(N lg N) using a modified version of Kadane's algorithm.
For a specific index i:
Let DP(i) = the maximum subarray sum % M ending at i.
Let PS(i) = the prefix sum % M ending at i.
Naturally you will start to think about how to find some j < i for which (PS(i) - PS(j) + M) % M is maximum. (Assume you know how to precompute PS and basic modular arithmetic.)
Here is the core part: it turns out that
DP(i) = max(PS(i), (PS(i) - PS(j') + M) % M)
where PS(j') is the smallest value larger than PS(i) among all j < i.
Why? Look at the formula: if PS(j') < PS(i), then it is of course better NOT to subtract anything from PS(i).
However, if PS(j') > PS(i), then we can rewrite the formula as (M - x) % M with x = PS(j') - PS(i); we want x as small as possible, so that (M - x) % M is as large as possible.
As in Kadane's algorithm, we keep track of the maximum answer found along the way.
We can use a priority queue or a set data structure to find such a j' for every i online, achieving O(N lg N) in total. Details are in the accepted code below:
#include<bits/stdc++.h>
#define LL long long
using namespace std;

int T;
set<LL> pre;
LL n, M, a[100010], ans, sum;

int main() {
    cin >> T;
    while (T--) {
        ans = sum = 0;
        pre.clear();
        cin >> n >> M;
        for (int i = 0; i < n; i++) cin >> a[i];
        for (int i = 0; i < n; i++) {
            (sum += a[i]) %= M;
            ans = max(ans, sum);
            auto it = pre.upper_bound(sum); // smallest prefix sum strictly larger than sum
            if (it != pre.end())            // guard: dereferencing end() is undefined
                ans = max(ans, (sum - *it + M) % M);
            pre.insert(sum);
        }
        cout << ans << endl;
    }
    return 0;
}
While trying to find prime numbers in a range (see problem description), I came across the following code:
(Code taken from here)
// For each prime up to sqrt(N) we need to use it in the segmented sieve process.
for (i = 0; i < cnt; i++) {
    p = myPrimes[i]; // Store the prime.
    s = M / p;
    s = s * p; // The largest multiple of p that is <= M.
    for (int j = s; j <= N; j = j + p) {
        if (j < M) continue; // Composite numbers less than M are of no concern.
        /* j - M = index in the array primesNow; the maximum index allowed in the
           array is not N, it is DIFF_SIZE, so we store the numbers offset by M.
           While printing we will add M back to get the actual number. */
        primesNow[j - M] = false;
    }
}
// The loop above also marks the sieving primes themselves (for example 2 and 3) as false.
for (int i = 0; i < cnt; i++) { // Hence we need to print them in case they're in range.
    if (myPrimes[i] >= M && myPrimes[i] <= N) // Without this loop you will see that for
                                              // the range (1, 30), 2 and 3 don't get printed.
        cout << myPrimes[i] << endl;
}
// primesNow[] = false for all composite numbers; primes are found by checking for true.
for (int i = 0; i < N - M + 1; ++i) {
    // i + M != 1 ensures that for i = 0 and M = 1, 1 is not considered a prime number.
    if (primesNow[i] == true && (i + M) != 1)
        cout << i + M << endl; // Print our prime numbers in the range.
}
However, I didn't find this code intuitive and it was not easy to understand.
Can someone explain the general idea behind the above algorithm?
What alternative algorithms are there to mark non-prime numbers in a range?
That's overly complicated. Let's start with a basic Sieve of Eratosthenes, in pseudocode, that outputs all the primes less than or equal to n:
function primes(n)
    sieve := makeArray(2..n, True)
    for p from 2 to n
        if sieve[p]
            output(p)
            for i from p*p to n step p
                sieve[i] := False
This function calls output on each prime p; output can print the primes, or sum the primes, or count them, or do whatever you want to do with them. The outer for loop considers each candidate prime in turn; the sieving occurs in the inner for loop, where multiples of the current prime p are removed from the sieve.
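Here is a direct C++ translation of that pseudocode, as a sketch (printing stands in for output):
#include <iostream>
#include <vector>

void primes(int n)
{
    std::vector<bool> sieve(n + 1, true);
    for (int p = 2; p <= n; p++) {
        if (sieve[p]) {
            std::cout << p << '\n';                     // output(p)
            for (long long i = 1LL * p * p; i <= n; i += p)
                sieve[i] = false;                       // cross off multiples of p
        }
    }
}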
Once you understand how that works, go here for a discussion of the segmented Sieve of Eratosthenes over a range.
Have you considered working the sieve at the bit level? It can cover a somewhat larger range of primes, and with a buffer you could modify it to find, for example, the primes between 2 and 2^60 using 64-bit ints, by reusing the same buffer while preserving the offsets of the primes already discovered. The following uses an array of 32-bit integers.
Declarations
#include <math.h>  // sqrt(), used for the upper sieving limit
#include <stdio.h> // for printing; could use <iostream>
Macros to manipulate bits; the following assumes 32-bit ints
#define BIT_SET(d, n)   (d[n>>5] |= 1<<(n-((n>>5)<<5)))
#define BIT_GET(d, n)   (d[n>>5] & 1<<(n-((n>>5)<<5)))
#define BIT_CLEAR(d, n) (d[n>>5] &= ~(1<<(n-((n>>5)<<5))))
unsigned int n = 0x80000;  // number of 32-bit words in the buffer (2 MB);
                           // 0x80000 words * 32 bits covers the primes up to ~16.7 million
int *data = new int[n];    // allocate
unsigned int r = n * 0x20; // the actual number of bits available
We could initialize with zeros to save time, but using 1 (on) for prime is a bit more intuitive:
for (int i = 0; i < n; i++)
    data[i] = 0xFFFFFFFF;
unsigned int seed = 2;         // the seed starts at 2
unsigned int uLimit = sqrt(r); // the upper limit for crossing off the sieve
BIT_CLEAR(data, 1);            // one is not prime
Time to discover the primes; this took under half a second:
// until uLimit is reached
while (seed < uLimit) {
    // don't include the seed itself when eliminating candidates
    for (int i = seed + seed; i < r; i += seed)
        BIT_CLEAR(data, i);
    // find the next bit still active (set to 1), not including the current seed
    for (int i = seed + 1; i < r; i++) {
        if (BIT_GET(data, i)) {
            seed = i;
            break;
        }
    }
}
Now for the output; this will consume the most time:
unsigned long bit_index = 0; // the current bit
int w = 8;                   // the width of a column
unsigned pc = 0;             // prime count, to assist in creating columns
for (int i = 0; i < n; i++) {
    unsigned long long int b = 1; // double width, so there is no overflow
    // if a bit is still set, include it in the results
    while (b < 0xFFFFFFFF) {
        if (data[i] & b) {
            printf("%8lu ", bit_index);
            if (((pc++) % w) == 0)
                putchar('\n'); // start a new row
        }
        bit_index++;
        b <<= 1; // multiply by 2, to check the next bit
    }
}
Clean up:
delete [] data;
I have this problem:
There are K lines of N numbers (32-bit). I have to choose the line with the max product of numbers.
The main problem is that N can go up to 20.
I'm trying to do this with logarithms:
ld sum = 0, max = 0;
int index = 0;
for (int i = 0; i < k; i++) { // K lines
    sum = 0, c = 0;
    for (int j = 0; j < n; j++) { // N numbers
        cin >> t;
        if (t < 0)
            c++; // if the number is less than 0, I memorize it
        if (t == 1 || t == -1) { // if the number is 1 or -1
            sum += 0.00000001; // because log(1) = 0
            if (t == -1)
                c++;
        }
        else if (t == 0) { // if some number is equal to zero then the sum is 0
            sum = 0;
            break;
        }
        else {
            sum += log10(fabs(t));
        }
    }
    if (c % 2 == 1) // if c is odd then multiply by -1
        sum *= -1;
    if (sum >= max) {
        max = sum;
        index = i;
    }
    if ((sum - max) < eps) { // if sum is equal to max I also have to choose it
        max = sum;
        index = i;
    }
}
cout << index + 1 << endl;
The program works in 50% of test cases. Is there a way to optimize my code?
In the case of t == -1, you increment c twice.
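A minimal fix (a sketch of my own, keeping the question's variable names) is to classify each t exactly once inside the inner loop:
if (t == 0) {          // a zero makes the whole product zero
    sum = 0;
    break;
}
if (t < 0)             // count every negative factor exactly once
    c++;
if (t == 1 || t == -1)
    sum += 0.00000001; // log10(1) == 0, so keep the tiny epsilon as before
else
    sum += log10(fabs(t));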
If you want to avoid bignum libs, you can exploit the fact that multiplying a b1-bit number by a b2-bit number gives a result at most b1+b2 bits long.
So just sum the bit counts of all the multiplicands in a line together
and compare that.
Remember the results in some array.
typedef unsigned int DWORD; // DWORD is a 32-bit unsigned int

int bits(DWORD p) // count how many bits p occupies
{
    DWORD m = 0x80000000; int b = 32;
    for (; m; m >>= 1, b--)
        if (p >= m) break;
    return b;
}
Index-sort the lines by their total bit count, descending.
If the first bit count after sorting is strictly the maximum, then its line is the answer.
If you have more than one maximum (more lines share the same bit count and are the maximum),
only then do you have to multiply them out.
Now the multiplication:
You should multiply all the max lines at once.
Each time all the sub-results are divisible by the same prime,
divide them all by it.
This way the results are truncated to a much smaller bit count,
so they should fit into a 64-bit value.
You should check primes up to sqrt(max value);
when your max value is 32-bit, that means primes up to 65536,
so you can make a static table of primes to check, to speed things up.
Also there is no point in checking primes bigger than your actual sub-result.
If you know how, this can be sped up enormously by the Sieve of Eratosthenes,
but you will need to keep track of the index offset after each division and use periodic sieve tables, which is a bit complicated but doable.
If you do not check all the primes but just a few selected ones,
then the result can still overflow,
so you should handle that too (throw an error or something),
or divide all the sub-results by some value, but that can invalidate the result.
Another multiplication approach:
You can also sort the multiplicands by value
and check whether some are present in all the max lines.
If yes, then replace them with 1 (or delete them from the lists).
This can be combined with the previous approach.
Bignum multiplication:
You can write your own bignum multiplication.
The result is at most 20*32 = 640 bits,
so the result will be an array of unsigned ints (8, 16, 32 bits wide ... whatever you like).
You can also handle the number as a string.
Look here for how to compute a fast exact bignum square in C++;
it also contains the multiplication approaches.
And here, NTT-based Schönhage-Strassen multiplication in C++,
but that will be slower for numbers as small as yours.
At last you need to compare the results,
so compare from MSW to LSW, and whichever line has the bigger number in it is the max line
(MSW is the most significant word, LSW is the least significant word).
I think that this line is definitely wrong:
if (c % 2 == 1) // if c is odd then multiply by -1
    sum *= -1;
If your product is in the range (0,1] then its logarithm will be negative, and this line will flip it to positive. I think you should track the sign separately from the logarithm of the magnitude.
Here x, y <= 10^12 and y - x <= 10^6.
I have looped from left to right and checked each number for primality. This method is very slow when x and y are around 10^11 or 10^12. Is there a faster approach?
I have stored all the primes up to 10^6. Can I use them to find the primes between huge values like 10^10 and 10^12?
for (i = x; i <= y; i++)
{
    num = i;
    if (check(num))
    {
        res++;
    }
}
My check function:
int check(long long int num)
{
    long long int i;
    if (num <= 1)
        return 0;
    if (num == 2)
        return 1;
    if (num % 2 == 0)
        return 0;
    long long int sRoot = sqrt(num * 1.0);
    for (i = 3; i <= sRoot; i += 2)
    {
        if (num % i == 0)
            return 0;
    }
    return 1;
}
Use a segmented sieve of Eratosthenes.
That is, use a bit set to store the numbers between x and y, represented by x as an offset and a bit set for [0, y-x]. Then sieve (eliminate multiples) with all the primes less than or equal to the square root of y. The numbers that remain in the set are prime.
With y at most 10^12 you have to sieve with primes up to at most 10^6, which takes less than a second in a proper implementation.
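Concretely, the idea looks like this (a sketch of my own; countPrimesInRange and its parameters are not from the question, and basePrimes is assumed to hold the stored primes up to 10^6 in ascending order):
#include <algorithm>
#include <cstdint>
#include <vector>

// Counts the primes in [x, y], assuming y <= 10^12 and y - x <= 10^6.
long long countPrimesInRange(uint64_t x, uint64_t y,
                             const std::vector<uint64_t>& basePrimes)
{
    std::vector<bool> isPrime(y - x + 1, true); // isPrime[i] represents x + i
    if (x <= 1)                                 // 0 and 1 are not prime
        for (uint64_t v = x; v <= 1 && v <= y; v++)
            isPrime[v - x] = false;
    for (uint64_t p : basePrimes) {
        if (p * p > y) break;                   // primes up to sqrt(y) suffice
        // first multiple of p in the range, but never p itself
        uint64_t start = std::max(p * p, (x + p - 1) / p * p);
        for (uint64_t j = start; j <= y; j += p)
            isPrime[j - x] = false;
    }
    long long res = 0;
    for (bool b : isPrime) res += b;
    return res;
}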
This resource goes through a number of prime search algorithms of increasing complexity/efficiency. Here's the description of the best one, PG7.8 (you'll have to translate it back to C++; it shouldn't be too hard):
This algorithm efficiently selects potential primes by eliminating multiples of previously identified primes from consideration and minimizes the number of tests which must be performed to verify the primacy of each potential prime. While the efficiency of selecting potential primes allows the program to sift through a greater range of numbers per second the longer the program is run, the number of tests which need to be performed on each potential prime does continue to rise (but rises at a slower rate compared to other algorithms). Together, these processes bring greater efficiency to generating prime numbers, making the generation of even 10 digit verified primes possible within a reasonable amount of time on a PC.
Further skip sets can be developed to eliminate the selection of potential primes which can be factored by each prime that has already been identified. Although this process is more complex, it can be generalized and made somewhat elegant. At the same time, we can continue to eliminate from the set of test primes each of the primes which the skip sets eliminate multiples of, minimizing the number of tests which must be performed on each potential prime.
You can use the Sieve of Eratosthenes algorithm. This page has some links to implementations in various languages: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
Here is my implementation of the Sieve of Eratosthenes:
#include <string>
#include <iostream>
using namespace std;

const int k = 110000; // you can change this constant to whatever maximum int you need to handle
long int p[k + 1];    // here we store the sieve from 2 to k (k + 1 slots, so index k is valid)
long int j;

void init_prime() // in here we set up our array
{
    for (int i = 2; i <= k; i++)
    {
        if (p[i] == 0) // i was not marked by any smaller prime, so it is prime
        {
            j = i;
            while (j <= k)
            {
                p[j] = i; // mark every multiple of i with i itself
                j = j + i;
            }
        }
    }
    /*for (int i = 2; i <= k; i++)
        cout << p[i] << endl;*/ // uncomment this to see the result of the initialization
}

string prime(int first, int last) // an example of how to use the initialized array
{
    string result = "";
    for (int i = first; i <= last; i++)
    {
        if (p[i] == i) // p[i] == i exactly when i is prime
            result = result + to_string(i) + " ";
    }
    return result;
}

int main() // I wrote this code some time ago for a contest where the first input was the number of cases, hence "nocases"
{
    int nocases, first, last;
    init_prime();
    cin >> nocases;
    for (int i = 1; i <= nocases; i++)
    {
        cin >> first >> last;
        cout << prime(first, last) << endl;
    }
    return 0;
}
You can use this Sieve of Eratosthenes array for factorization too, since p[i] ends up holding a prime factor of i. This is actually the fastest version of the sieve I managed to create that day (it can compute the sieve for this range in less than a second).