Porting optimized Sieve of Eratosthenes from Python to C++

Some time ago I used the (blazing fast) primesieve in python that I found here: Fastest way to list all primes below N
To be precise, this implementation:
def primes2(n):
    """ Input n>=6, Returns a list of primes, 2 <= p < n """
    n, correction = n-n%6+6, 2-(n%6>1)
    sieve = [True] * (n/3)
    for i in xrange(1,int(n**0.5)/3+1):
        if sieve[i]:
            k=3*i+1|1
            sieve[k*k/3::2*k] = [False] * ((n/6-k*k/6-1)/k+1)
            sieve[k*(k-2*(i&1)+4)/3::2*k] = [False] * ((n/6-k*(k-2*(i&1)+4)/6-1)/k+1)
    return [2,3] + [3*i+1|1 for i in xrange(1,n/3-correction) if sieve[i]]
Now I can roughly grasp the idea of the optimization, automatically skipping multiples of 2, 3 and so on, but when it comes to porting this algorithm to C++ I get stuck (I have a good understanding of Python and a reasonable/bad understanding of C++, but good enough for rock 'n roll).
What I currently have rolled myself is this (isqrt() is just a simple integer square root function):
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T sievemax = (N-3 + (1-(N % 2))) / 2;
    T i;
    T sievemaxroot = isqrt(sievemax) + 1;
    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();
    primes.push_back(2);
    for (i = 0; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            primes.push_back(2*i+3);
            for (T j = 3*i+3; j < sievemax; j += 2*i+3) sieve[j] = 0; // filter multiples (j < sievemax: valid indices are 0..sievemax-1)
        }
    }
    for (; i < sievemax; i++) {
        if (sieve[i]) primes.push_back(2*i+3);
    }
}
This implementation is decent and automatically skips multiples of 2, but if I could port the Python implementation I think it could be much faster (30%-50% or so).
To compare the results (in the hope this question will be successfully answered): the current execution time with N=100000000, compiled with g++ -O3 on a Q6600 running Ubuntu 10.10, is 1230 ms.
Now I would love some help, either with understanding what the above Python implementation does or with porting it for me (though that would be less helpful).
EDIT
Some extra information about what I find difficult.
I have trouble with the techniques used, like the correction variable, and in general with how it all comes together. A link to a site explaining different Sieve of Eratosthenes optimizations (apart from the simple sites that say "well, you just skip multiples of 2, 3 and 5" and then slam you with a 1000-line C file) would be awesome.
I don't think I would have issues with a 100% direct and literal port, but since after all this is for learning that would be utterly useless.
EDIT
After looking at the code in the original numpy version, it actually is pretty easy to implement, and with some thinking not too hard to understand. This is the C++ version I came up with. I'm posting it here in full to help future readers in case they need a pretty efficient primesieve that is not two million lines of code. This primesieve finds all primes under 100000000 in about 415 ms on the same machine as above. That's a 3x speedup, better than I expected!
#include <vector>
#include <boost/dynamic_bitset.hpp>

// http://vault.embedded.com/98/9802fe2.htm - integer square root
unsigned short isqrt(unsigned long a) {
    unsigned long rem = 0;
    unsigned long root = 0;
    for (short i = 0; i < 16; i++) {
        root <<= 1;
        rem = ((rem << 2) + (a >> 30));
        a <<= 2;
        root++;
        if (root <= rem) {
            rem -= root;
            root++;
        } else root--;
    }
    return static_cast<unsigned short> (root >> 1);
}
// https://stackoverflow.com/questions/2068372/fastest-way-to-list-all-primes-below-n-in-python/3035188#3035188
// https://stackoverflow.com/questions/5293238/porting-optimized-sieve-of-eratosthenes-from-python-to-c/5293492
template <class T>
void primesbelow(T N, std::vector<T> &primes) {
    T i, j, k, l, sievemax, sievemaxroot;
    sievemax = N/3;
    if ((N % 6) == 2) sievemax++;
    sievemaxroot = isqrt(N)/3;
    boost::dynamic_bitset<> sieve(sievemax);
    sieve.set();
    primes.push_back(2);
    primes.push_back(3);
    for (i = 1; i <= sievemaxroot; i++) {
        if (sieve[i]) {
            k = (3*i + 1) | 1;
            l = (4*k - 2*k*(i&1)) / 3;
            for (j = k*k/3; j < sievemax; j += 2*k) {
                sieve[j] = 0;
                if (j + l < sievemax) sieve[j+l] = 0; // guard: j+l can run past the end of the bitset
            }
            primes.push_back(k);
        }
    }
    for (i = sievemaxroot + 1; i < sievemax; i++) {
        if (sieve[i]) primes.push_back((3*i+1)|1);
    }
}

I'll try to explain as much as I can. The sieve array has an unusual indexing scheme; it stores a bit for each number that is congruent to 1 or 5 mod 6. Thus, a number 6*k + 1 will be stored in position 2*k and 6*k + 5 will be stored in position 2*k + 1. The 3*i+1|1 operation is the inverse of that: it takes numbers of the form 2*n and converts them into 6*n + 1, and takes 2*n + 1 and converts it into 6*n + 5 (the +1|1 thing converts 0 to 1 and 3 to 5).

The main loop iterates k through all numbers with that property, starting with 5 (when i is 1); i is the corresponding index into sieve for the number k. The first slice update to sieve then clears all bits in the sieve with indexes of the form k*k/3 + 2*m*k (for m a natural number); the corresponding numbers for those indexes start at k^2 and increase by 6*k at each step. The second slice update starts at index k*(k-2*(i&1)+4)/3 (number k * (k+4) for k congruent to 1 mod 6, and k * (k+2) otherwise) and similarly increases the number by 6*k at each step.
Here's another attempt at an explanation: let candidates be the set of all numbers that are at least 5 and are congruent to either 1 or 5 mod 6. If you multiply two elements in that set, you get another element in the set. Let succ(k) for some k in candidates be the next element (in numerical order) in candidates that is larger than k. In that case, the inner loop of the sieve is basically (using normal indexing for sieve):
for k in candidates:
    for (l = k; ; l += 6) sieve[k * l] = False
    for (l = succ(k); ; l += 6) sieve[k * l] = False
Because of the limitations on which elements are stored in sieve, that is the same as:
for k in candidates:
    for l in candidates where l >= k:
        sieve[k * l] = False
which will remove all multiples of k in candidates (other than k itself) from the sieve at some point (either when the current k was used as l earlier or when it is used as k now).
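To make the index mapping concrete, here is a minimal sketch (the helper names are mine, not from the code above):

// Only numbers congruent to 1 or 5 mod 6 are stored:
// 6*k + 1 lives at index 2*k, 6*k + 5 at index 2*k + 1.
unsigned long to_index(unsigned long n) { return n / 3; }
// The inverse: 3*i + 1 | 1 turns index 2*k into 6*k + 1
// and index 2*k + 1 into 6*k + 5.
unsigned long to_number(unsigned long i) { return (3 * i + 1) | 1; }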

Piggy-backing onto Howard Hinnant's response: Howard, you don't have to test the numbers in the set of all natural numbers not divisible by 2, 3 or 5 for primality, per se. You need simply multiply each number in the array (except 1, which self-eliminates) by itself and by every subsequent number in the array. These overlapping products give you all the non-primes in the array, up to whatever point you extend the deterministic multiplicative process. Thus the first non-prime in the array is 7 squared, or 49; the second is 7 times 11, or 77; etc. A full explanation is here: http://www.primesdemystified.com
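For illustration, a rough sketch of that idea (my own, not the linked page's code): take the candidates coprime to 2, 3 and 5, and mark every overlapping pairwise product as composite.

#include <vector>

// Sketch: primes below n by crossing off products of wheel candidates.
// Every composite coprime to 30 is a product of two candidates >= 7,
// so the overlapping products cover all non-primes in the candidate set.
std::vector<long long> primes_by_products(long long n) {
    static const int wheel[8] = {1, 7, 11, 13, 17, 19, 23, 29};
    std::vector<long long> cand; // 7, 11, 13, ... (1 is skipped: it self-eliminates)
    for (long long base = 0; base < n; base += 30)
        for (int w : wheel) {
            long long v = base + w;
            if (v > 1 && v < n) cand.push_back(v);
        }
    std::vector<bool> composite(n, false);
    for (std::size_t i = 0; i < cand.size() && cand[i] * cand[i] < n; ++i)
        for (std::size_t j = i; j < cand.size() && cand[i] * cand[j] < n; ++j)
            composite[cand[i] * cand[j]] = true; // first hit is 7*7 = 49
    std::vector<long long> primes = {2, 3, 5};
    for (long long v : cand)
        if (!composite[v]) primes.push_back(v);
    return primes;
}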

As an aside, you can "approximate" prime numbers. Call the approximate prime P. Here are a few formulas:
P = 2*k+1 // not divisible by 2
P = 6*k + {1, 5} // not divisible by 2, 3
P = 30*k + {1, 7, 11, 13, 17, 19, 23, 29} // not divisible by 2, 3, 5
The property of the set of numbers found by these formulas is that P may not be prime, but all primes are in the set P. I.e. if you only test the numbers in the set P for primality, you won't miss any.
You can reformulate these formulas to:
P = X*k + {-i, -j, -k, k, j, i}
if that is more convenient for you.
Here is some code that uses this technique with a formula for P not divisible by 2, 3, 5, 7.
This link may represent the extent to which this technique can be practically leveraged.
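As a minimal sketch of the technique (my own illustration, separate from the linked code): generate the candidate set P = 30*k + {1, 7, 11, 13, 17, 19, 23, 29} and primality-test only those candidates; no prime above 5 is missed.

#include <vector>

std::vector<long long> primes_via_wheel(long long n) {
    static const int off[8] = {1, 7, 11, 13, 17, 19, 23, 29};
    std::vector<long long> primes = {2, 3, 5};
    for (long long k = 0; 30 * k < n; ++k)
        for (int o : off) {
            long long p = 30 * k + o;
            if (p < 7 || p >= n) continue; // skip 1 and out-of-range values
            bool isPrime = true;           // p is odd, so odd trial divisors suffice
            for (long long d = 3; d * d <= p; d += 2)
                if (p % d == 0) { isPrime = false; break; }
            if (isPrime) primes.push_back(p);
        }
    return primes;
}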


Maximize XOR Equation

Problem statement:
Given an array of n elements and an integer k, find an integer x in the range [0, k] such that Xor-sum(x) is maximized. Print the maximum value of the equation.
Xor-sum(x) = (x XOR A[1]) + (x XOR A[2]) + (x XOR A[3]) + ... + (x XOR A[N])
Input Format
The first line contains an integer N denoting the number of elements in A. The next line contains an integer k denoting the maximum value of x. Each line i of the N subsequent lines (where 0 <= i < N) contains an integer describing A[i].
Constraints
1<=n<=10^5
0<=k<=10^9
0<=A[i]<=10^9
Sample Input
3
7
1
6
3
Sample Output
14
Explanation
Xor-sum(4) = (4^1) + (4^6) + (4^3) = 5 + 2 + 7 = 14.
This problem was asked in an Infosys recruitment test. I was going through previous years' papers and I came across this problem.
I was only able to come up with a brute-force solution, which is just to calculate the equation for every x in the range [0, k] and print the maximum. But that solution won't work for the given constraints.
My solution
#include <bits/stdc++.h>
using namespace std;

int main()
{
    int n, k, ans = 0;
    cin >> n >> k;
    vector<int> a(n);
    for (int i = 0; i < n; i++) cin >> a[i];
    for (int i = 0; i <= k; i++) {
        int temp = 0;
        for (int j = 0; j < n; j++) {
            temp += (i ^ a[j]);
        }
        ans = max(temp, ans);
    }
    cout << ans;
    return 0;
}
I found the solution on a website. I was unable to understand what the code does, and besides, that solution gives incorrect answers for some test cases.
Scroll down to question 3
The trick here is that XOR works on bits in parallel, independently. You can optimize each bit of x. Brute-forcing this takes 2*32 tries, given the constraints.
As said in other comments, each bit of x gives an independent contribution to the sum, so the first step is to calculate the added value of each possible bit.
To do this for the i-th bit of x, count the number of 0s and 1s at the same position across the numbers in the array. If the difference N0 - N1 is positive, then the added value is also positive and equal to (N0 - N1) * 2^i; let's call such bits "useful".
The number x will be a combination of useful bits only.
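A small sketch of that per-bit computation (the function name is mine):

#include <vector>

// Added value of setting bit i of x: (N0 - N1) * 2^i, where N0 and N1
// count zeros and ones at position i across the array. Positive = "useful".
long long added_value(const std::vector<int>& a, int i) {
    long long n0 = 0, n1 = 0;
    for (int v : a) {
        if ((v >> i) & 1) ++n1; else ++n0;
    }
    return (n0 - n1) * (1LL << i);
}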
Since k is not necessarily of the form 2^n - 1, we need a strategy to find the best combination (if you don't want to use brute force on the k possible values).
Consider then the binary representation of k and loop over its bits starting from the MSB, initializing two variables: CAV (current added value) = 0 and BAV (best added value) = 0.
If the current bit is 0, loop over.
If the current bit is 1:
a) calculate the sum of the added values of all useful bits with lower index, plus the CAV; if the result is greater than the BAV, replace the BAV
b) if the current bit is not useful, quit the loop
c) add the current bit's added value to the CAV
When the loop is over, if the CAV is greater than the BAV, replace the BAV.
EDIT: A sample implementation (in Java, sorry :) )
import java.util.Scanner;

public class XorSum {
    public static void main(String[] args) {
        Scanner sc=new Scanner(System.in);
        int n=sc.nextInt();
        int k=sc.nextInt();
        int[] a=new int[n];
        for (int i=0;i<n;i++) {
            a[i]=sc.nextInt();
        }
        //Determine the number of bits to represent k (position of most significant 1 + 1)
        int msb=0;
        for (int kcopy=k; kcopy!=0; kcopy=kcopy>>>1) {
            msb++;
        }
        //Compute the added value of each possible bit in x
        int[] av=new int[msb];
        int bmask=1;
        for (int bit=0;bit<msb;bit++) {
            int count0=0;
            for (int i=0;i<n;i++) {
                if ((a[i]&bmask)==0) {
                    count0++;
                }
            }
            av[bit]=(count0*2-n)*bmask;
            bmask = bmask << 1;
        }
        //Accumulated added value, the value of all positive av bits up to the index
        int[] aav=new int[msb];
        for (int bit=0;bit<msb;bit++) {
            if (av[bit]>0) {
                aav[bit]=av[bit];
            }
            if (bit>0) {
                aav[bit]+=aav[bit-1];
            }
        }
        //Explore the space of possible combinations moving on the k boundary
        int cval=0;
        int bval=0;
        bmask = bmask >>> 1;
        //Start from the msb; shift the mask along with the bit index
        for (int bit=msb-1;bit>=0;bit--,bmask>>>=1) {
            //Exploring the space of bit combinations we have 3 possible cases:
            //bit of k is 0, then we must choose 0 as well, setting it to 1 would make x greater than k, so in this case just loop over
            if ((k&bmask)==0) {
                continue;
            }
            //bit of k is 1, we can choose between 0 and 1:
            //- choosing 0, we can immediately explore the complete branch considering that all following bits can be set to 1, so just set to 1 all lower-index bits with positive av
            //  and get the maximum possible value for this branch
            int val=cval+(bit>0?aav[bit-1]:0); //useful bits with lower index only
            if (val>bval) {
                bval=val;
            }
            //- choosing 1, if the bit has no positive av, then it's forced to 0 and the solution is found on the other branch, so we can stop here
            if (av[bit]<=0) break;
            //- choosing 1, with a positive av, then store the value and go on with this branch
            cval+=av[bit];
        }
        if (cval>bval) {
            bval=cval;
        }
        //Final sum
        for (int i=0;i<n;i++) {
            bval+=a[i];
        }
        System.out.println(bval);
    }
}
I think you can consider solving this for each bit. The number X should be the one that can turn on many high-order bits in the array. So you can count the number of 1 bits at positions 2^0, 2^1, ..., and for each of the 32 bit positions consider turning the bit on when many of the numbers have a 0 at that position.
Combining this with the limit K should give you an answer that runs in O(log K) time.
Assuming k is unbounded, this problem is trivial.
For each bit (assuming 64-bit words there would be 64 for example) accumulate the total count of 1's and 0's in all values in the array (for that bit), with c1_i and c0_i representing the former and latter respectively for bit i.
Then define each bit b_i in x as
x_i = 1 if c0_i > c1_i else 0
Constructing x as described above is guaranteed to give you the value of x that maximizes the sum of interest.
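A minimal sketch of that construction (assuming an unbounded k):

#include <cstdint>
#include <vector>

// Set bit i of x iff c0_i > c1_i, i.e. zeros outnumber ones at position i.
std::uint32_t best_x_unbounded(const std::vector<std::uint32_t>& a) {
    std::uint32_t x = 0;
    for (int i = 0; i < 32; ++i) {
        long long ones = 0;
        for (std::uint32_t v : a) ones += (v >> i) & 1u;
        if ((long long)a.size() - ones > ones) x |= (1u << i);
    }
    return x;
}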
When k is a specific number, this can be solved using a dynamic programming solution. To understand how, first derive a recurrence.
Let z_0, z_1, ..., z_n be the positions of the ones occurring in k's binary representation, with z_0 being the most significant position.
Let M[t] represent the maximum sum possible given the problem's array and defining any x such that x < t.
Important note: the optimal value of M[t] for t a power of 2 is obtained by following the procedure described above for an unbounded k, but limiting the largest bit used.
To solve this problem, we want to find
M[k] = max(M[2^z_0],M[k - 2^z_0] + C_0)
where C_i is defined to be the contribution to the final sum by setting the position z_i to one.
This of course continues as a recursion, with the next step being:
M[k - 2^z_0] = max(M[2^z_1],M[k - 2^z_0 - 2^z_1] + C_1)
and so on and so forth. The dynamic programming solution arises by converting this recursion to the appropriate DP algorithm.
Note that, due to the definition of M[k], it is still necessary to check whether the sum for x = k itself is greater than M[k], as it may well be; this requires one extra pass.
At the bit level it is simple: 0 XOR 0 = 0, 1 XOR 1 = 0, and 0 XOR 1 = 1. But when those bits belong to a number, the XOR operation has an addition or subtraction effect. For example, if the third bit of a number is set and we XOR the number with 4 (0100), which also has its third bit set, the result is a subtraction of 2^(3-1): for num = 5, 0101 XOR 0100 = 0001, so 4 is subtracted from 5. Similarly, if the third bit of a number is not set and we XOR it with 4, the result is an addition: for num = 2, 0010 XOR 0100 = 0110, so 4 is added to 2. Now let's look at this problem.
This problem can't be solved by applying XOR to each number individually; the approach is instead to perform the XOR on a particular bit of all the numbers in one go. Let's see how that can be done.
Fact 1: Let’s consider we have X and we want to perform XOR on all numbers with X and if we know second bit of X is set, now suppose somehow we also know that how many numbers in all numbers have second bit set then we know answer 1 XOR 1 = 0 and we don’t have to perform XOR on each number individually.
Fact 2: From fact 1, we know how many numbers have a particular bit set, let’s call it M and if X also have that particular bit set then M * 2^(pos -1) will be subtracted from sum of all numbers. If N is total element in array than N - M numbers don’t have that particular bit set and due to it (N – M) * 2^(pos-1) will be added in sum of all numbers.
From Fact 1 and Fact 2 we can calculate overall XOR effect on a particular bit on all Numbers by effect = (N – M)* 2^(pos -1) – (M * 2^(pos -1)) and can perform the same for all bits.
Now it’s time to see above theory in action, if we have array = {1, 6, 3}, k = 7 then,
1 = 0001 (There are total 32 bits but I am showing only relevant bits other bits are zero)
6 = 0110
3 = 0011
So our bit count list = [0, 1, 2, 2]: as you can see, 1 and 3 have the first bit set, 6 and 3 have the second bit set, and only 6 has the third bit set.
X = 0, ..., 7, but X = 0 has effect = 0 on the sum, because a bit that is not set does not affect the other operand's bit in an XOR operation. So let's start from X = 1, which is 0001:
[0, 1, 2, 2] = count list,
[0, 0, 0, 1] = X
As is visible in the count list, two numbers have the first bit set and X also has the first bit set, which means 2 * 2^(1-1) will be subtracted from the sum; and since the total number of elements in the array is three, (3 - 2) * 2^(1-1) will be added to the sum. The conclusion is that the XOR effect on the first bit is effect = (3 - 2) * 2^(1-1) - 2 * 2^(1-1) = 1 - 2 = -1. This is also the overall effect of X = 1, because only its first bit is set and the rest of its bits are zero. At this point we compare the effect produced by X = 1 with that of X = 0: since -1 < 0, X = 1 would reduce the sum of all numbers by 1 while X = 0 would not, so up to now X = 0 produces the max sum.
The XOR performed for X = 1 can be performed the same way for all other values, so I would like to jump directly to X = 4, which is 0100:
[0, 1, 2, 2] = count list,
[0, 1, 0, 0] = X
As is visible, X has only its third bit set, and only one number in the array has its third bit set, which means 1 * 2^(3-1) will be subtracted and (3 - 1) * 2^(3-1) will be added, for an overall effect = (3 - 1) * 2^(3-1) - 1 * 2^(3-1) = 8 - 4 = 4. At this point we compare the effect of X = 4 with the known max effect, which is effect = 0; since 4 > 0, X = 4 produces the max sum so far, so we keep it. When you perform this for all X = 0, ..., 7, you will find that X = 4 produces the max effect on the sum, so the answer is X = 4.
So
(x XOR arr[0]) + (x XOR arr[1]) + ... + (x XOR arr[n-1]) = effect + (arr[0] + arr[1] + ... + arr[n-1])
Complexity is,
O(32 n) to find, for all 32 bits, how many numbers have a particular bit set, plus,
O(32 k) to find the effect of every X in [0, k],
Complexity = O(32 n) + O(32 k) = O(c n) + O(c k), where c is a constant,
finally
Complexity = O(n + k)
#include <iostream>
#include <cmath>
#include <bitset>
#include <cstdint>
#include <utility>
#include <vector>
#include <numeric>

std::vector<std::uint32_t> bitCount(const std::vector<std::uint32_t>& numList){
    std::vector<std::uint32_t> countList(32, 0);
    for(std::uint32_t num : numList){
        std::bitset<32> bitList(num);
        for(unsigned i = 0; i < 32; ++i){
            if(bitList[i]){
                countList[i] += 1;
            }
        }
    }
    return countList;
}

std::pair<std::uint32_t, std::int64_t> prefXAndMaxEffect(std::uint32_t n, std::uint32_t k,
        const std::vector<std::uint32_t>& bitCountList){
    std::uint32_t prefX = 0;
    std::int64_t xorMaxEffect = 0;
    std::vector<std::int64_t> xorBitEffect(32, 0);
    for(std::uint32_t x = 1; x <= k; ++x){
        std::bitset<32> xBitList(x);
        std::int64_t xorEffect = 0;
        for(unsigned i = 0; i < 32; ++i){
            if(xBitList[i]){
                if(0 != xorBitEffect[i]){
                    xorEffect += xorBitEffect[i];
                }
                else{
                    std::int64_t num = std::exp2(i);
                    xorBitEffect[i] = (n - bitCountList[i]) * num - (bitCountList[i] * num);
                    xorEffect += xorBitEffect[i];
                }
            }
        }
        if(xorEffect > xorMaxEffect){
            prefX = x;
            xorMaxEffect = xorEffect;
        }
    }
    return {prefX, xorMaxEffect};
}

int main(int, char *[]){
    std::uint32_t k = 7;
    std::vector<std::uint32_t> numList{1, 6, 3};
    std::pair<std::uint32_t, std::int64_t> xAndEffect = prefXAndMaxEffect(numList.size(), k, bitCount(numList));
    std::int64_t sum = 0;
    sum = std::accumulate(numList.cbegin(), numList.cend(), sum) + xAndEffect.second;
    std::cout << sum << '\n';
}
Output :
14

Divide array into smaller consecutive parts such that NEO value is maximal

On this year's Bubble Cup (now finished) there was the problem NEO (which I couldn't solve), which asks:
Given an array with n integer elements, we divide it into several parts (possibly just 1), where each part is a consecutive run of elements. The NEO value is then the sum of the values of all parts, where the value of a part is the sum of all elements in that part multiplied by the part's length.
Example: We have array: [ 2 3 -2 1 ]. If we divide it like: [2 3] [-2 1]. Then NEO = (2 + 3) * 2 + (-2 + 1) * 2 = 10 - 2 = 8.
The number of elements in the array is smaller than 10^5 and the numbers are integers between -10^6 and 10^6.
I've tried something like divide and conquer: keep splitting the array into two parts while that increases the maximal NEO value, otherwise return the NEO of the whole array. But unfortunately that algorithm has worst case O(N^2) complexity (my implementation is below), so I'm wondering whether there is a better solution.
EDIT: My algorithm (greedy) doesn't work: taking for example [1,2,-6,2,1], my algorithm returns the whole array, while the maximal NEO value is obtained by taking the parts [1,2], [-6], [2,1], which gives a NEO value of (1+2)*2 + (-6) + (2+1)*2 = 6.
#include <iostream>

// Note: must return long long; the NEO value can overflow int.
long long int maxInterval(long long int suma[], int first, int N)
{
    long long int max = -1000000000000000000LL;
    long long int curr;
    if(first==N) return 0;
    int k;
    for(int i=first;i<N;i++)
    {
        if(first>0) curr = (suma[i]-suma[first-1])*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Split the array into elements from [first..i] and [i+1..N-1], store the corresponding NEO value
        else curr = suma[i]*(i-first+1)+(suma[N-1]-suma[i])*(N-1-i); // Same except that here first = 0, so suma[first-1] doesn't exist
        if(curr > max) max = curr,k=i; // find the maximal NEO value for splitting into two parts
    }
    if(k==N-1) return max; // If the max is when we take the whole array, then return the NEO value of the whole array
    else
    {
        return maxInterval(suma,first,k+1)+maxInterval(suma,k+1,N); // Split the 2 parts further if needed and return their sum
    }
}

int main() {
    int T;
    std::cin >> T;
    for(int j=0;j<T;j++) // Iterate over all the test cases
    {
        int N;
        long long int NEO[100010]; // Values, could be long int but just to be safe
        long long int suma[100010]; // suma[i] = sum of NEO values from NEO[0] to NEO[i]
        long long int sum=0;
        std::cin >> N;
        for(int i=0;i<N;i++)
        {
            std::cin >> NEO[i];
            sum+=NEO[i];
            suma[i] = sum;
        }
        std::cout << maxInterval(suma,0,N) << std::endl;
    }
    return 0;
}
This is not a complete solution but should provide some helpful direction.
1. Combining two groups that each have a positive sum (or where one of the sums is non-negative) always yields a bigger NEO than leaving them separate:
m * a + n * b < (m + n) * (a + b) where a, b > 0 (or a > 0, b >= 0); m and n are subarray lengths
2. Combining a group with a negative sum with an entire group of non-negative numbers always yields a greater NEO than combining it with only part of the non-negative group. But excluding the group with the negative sum could yield an even greater NEO:
[1, 1, 1, 1] [-2] => m * a + 1 * (-b)
Now, imagine we gradually move the dividing line to the left, increasing the sum b is combined with. While the expression on the right is negative, the NEO for the left group keeps decreasing. But if the expression on the right gets positive, relying on our first assertion (see 1.), combining the two groups would always be greater than not.
3. Combining negative numbers alone in sequence will always yield a smaller NEO than leaving them separate:
-a - b - c ... = -1 * (a + b + c ...)
l * (-a - b - c ...) = -l * (a + b + c ...)
-l * (a + b + c ...) < -1 * (a + b + c ...) where l > 1; a, b, c ... > 0
O(n^2) time, O(n) space JavaScript code:
function f(A){
    A.unshift(0);
    let negatives = [];
    let prefixes = new Array(A.length).fill(0);
    let m = new Array(A.length).fill(0);
    for (let i=1; i<A.length; i++){
        if (A[i] < 0)
            negatives.push(i);
        prefixes[i] = A[i] + prefixes[i - 1];
        m[i] = i * (A[i] + prefixes[i - 1]);
        for (let j=negatives.length-1; j>=0; j--){
            let negative = prefixes[negatives[j]] - prefixes[negatives[j] - 1];
            let prefix = (i - negatives[j]) * (prefixes[i] - prefixes[negatives[j]]);
            m[i] = Math.max(m[i], prefix + negative + m[negatives[j] - 1]);
        }
    }
    return m[m.length - 1];
}
console.log(f([1, 2, -5, 2, 1, 3, -4, 1, 2]));
console.log(f([1, 2, -4, 1]));
console.log(f([2, 3, -2, 1]));
console.log(f([-2, -3, -2, -1]));
Update
This blog shows that we can transform the dp queries from
dp_i = sum_i*i + max(for j < i) of ((dp_j + sum_j*j) + (-j*sum_i) + (-i*sum_j))
to
dp_i = sum_i*i + max(for j < i) of (dp_j + sum_j*j, -j, -sum_j) ⋅ (1, sum_i, i)
which means we could then look, at each iteration, for an already-seen vector that would generate the largest dot product with our current information. The math alluded to involves convex hulls and farthest-point queries, which are beyond my reach to implement at this point but which I will make a study of.

Efficiency of Sieve of Eratosthenes algorithm

I am trying to understand the "Sieve of Eratosthenes". Here is my algorithm (code below), and a list of features that I cannot understand (in order).
Why is i * i more efficient than i * 2? Yes, I can understand it means fewer iterations and is therefore more efficient, but doesn't it skip some numbers (for example i = 9 => j = 81 skips 18, 27, 36, ...)?
On Wikipedia I found that the space complexity is O(n), and that's understandable: whatever number we enter, it creates an array of the size entered. But the time complexity is where things get confusing. I found the notation O(n(logn)(loglogn)) -- what is that? According to my understanding we have 2 full iterations and 1 partial iteration, therefore O(n^2 * logn).
#include <iostream>
using namespace std;

int main() {
    cout << "Enter number:" << endl;
    int arrSize;
    cin >> arrSize;
    bool primesArr[arrSize];
    primesArr[0] = false;
    for (int i = 1; i < arrSize; i++) primesArr[i] = true;
    for (int i = 2; i < arrSize; i++)
        if (primesArr[i - 1]) {
            cout << i << endl;
            /* for (int j = i * 2; j < arrSize; j += i) less efficient */
            for (int j = i * i; j < arrSize; j += i)
                primesArr[j - 1] = false;
        }
    return 0;
}
Why is i * i more efficient than i * 2? Yes, I can understand it means fewer iterations and is therefore more efficient, but doesn't it skip some numbers (for example i = 9 => j = 81 skips 18, 27, 36, ...)?
You are referring to
for (int j = i * i; j < arrSize; j += i)
Note that i * i is the initial value for the loop counter j. So the values of j greater than i * i will all be marked off. The values which we skip from i * 2 to i * i have already been marked off during previous iterations. Let's think about the first few:
When i == 2, we mark off all multiples of 2 starting from 4 (4, 6, 8, etc.). When i == 3, if we started at j = 3 * 2 = 6 then we would mark off 6 again before reaching 9, 12, 15, etc. Since 6 is a multiple of 2 and was already marked off, we can skip straight to 3 * 3 == 9.
When we reach i == 5, if we started at j == 5 * 2 == 10, then we would mark off 10, which was already taken care of since it is a multiple of 2; 15, which is a multiple of 3; and 20, which is also a multiple of 2; before we finally reach 25, which is not a multiple of any prime less than 5.
time complexity here is where things get confusing. I found this notation O(n(logn)(loglogn)) -- what is that? According to my understanding we have 2 full iterations and 1 partial iteration, therefore O(n^2 * logn).
Your analysis reaches the correct conclusion that this algorithm is O(n^2 * log n). A more detailed analysis can prove the tighter upper bound O(n (log n)(log log n)). Note that O(n (log n)(log log n)) is a subset of O(n^2 * log n).
Why is i * i more efficient than i * 2? Doesn't it skip some numbers?
No, it doesn't, because the smaller multiples of i (for example 18, 27, etc. in your case) are already covered while running the loop for i = 2, i = 3, etc.
Every number has a unique prime factorization. If i is a prime number, any multiple of i greater than i and smaller than i * i is also a multiple of one or more primes smaller than i.
nasty notation O(n(logn)(loglogn))
From this answer
The number of operations is n * (1/2 + 1/3 + 1/5 + 1/7 + ...) ≈ n log log n.
If you count bit operations, since you're dealing with numbers up to n, they have about log n bits, which is where the factor of log n comes in, giving O(n log n log log n) bit operations.

Sieve of Eratosthenes on a segment

Sieve of Eratosthenes on the segment:
Sometimes you need to find all the primes that are in the range
[L...R] and not in [1...N], where R is a large number.
Conditions:
You are allowed to create an array of integers with size
(R−L+1).
Implementation:
bool isPrime[r - l + 1]; //filled by true
for (long long i = 2; i * i <= r; ++i) {
    for (long long j = max(i * i, (l + (i - 1)) / i * i); j <= r; j += i) {
        isPrime[j - l] = false;
    }
}
for (long long i = max(l, 2); i <= r; ++i) {
    if (isPrime[i - l]) {
        //then i is prime
    }
}
What is the logic behind setting the lower limit of j in the second for loop?
Thanks in advance!
Think about what we want to find. Ignore the i*i part. We have only (L + (i - 1)) / i * i to consider. (I wrote the L capital since l and 1 look quite similar.)
What should it be? Obviously it should be the smallest number within L..R that is divisible by i. That's where we want to start sieving out.
The last part of the formula, / i * i finds the next lower number that is divisible by i by using the properties of integer division.
Example: 35 div 4 * 4 = 8 * 4 = 32, 32 is the highest number that is (equal or) lower than 35 which is divisible by 4.
The L is where we want to start, obviously, and the + (i-1) makes sure that we don't find the highest number equal to or lower than L, but the smallest number equal to or bigger than L, that is divisible by i.
Example: (459 + (4-1)) div 4 * 4 = 462 div 4 * 4 = 115 * 4 = 460.
460 >= 459, 460 | 4, smallest number with that property
(the max( i*i, ...) is only so that i is not sieved out itself if it is within L..R, I think, although I wonder why it's not 2 * i)
For reasons of readability, I'd make this an inline function next_divisible(number, divisor) or the like. And I'd make it clear that integer division is used; if not, somebody clever might change it to regular division, with which it wouldn't work.
Also, I strongly recommend wrapping the array. It is not obvious from the outside that the property for a number X is stored at position X - L. Something like a class RangedArray that does the shift for you, allowing direct input of X instead of X - L, could easily take that responsibility. If you don't do that, at least make it a vector; outside the innermost internals of a class, you shouldn't use raw arrays in C++.
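A minimal sketch of both suggestions (the names are mine):

#include <vector>

// Smallest value >= number that is divisible by divisor; relies on
// integer division, as noted above.
inline long long next_divisible(long long number, long long divisor) {
    return (number + divisor - 1) / divisor * divisor;
}

// Hides the X - L shift so callers index with the number X itself.
class RangedArray {
    long long l;
    std::vector<bool> flags;
public:
    RangedArray(long long lo, long long hi) : l(lo), flags(hi - lo + 1, true) {}
    std::vector<bool>::reference operator[](long long x) { return flags[x - l]; }
};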

C++: What are some general ways to make code more efficient for use with large numbers?

Please when answering this question try to be as general as possible to help the wider community, rather than just specifically helping my issue (although helping my issue would be great too ;) )
I seem to be encountering this problem time and time again with the simple problems on Project Euler. Most common are the problems that require a computation of the prime numbers - these without fail always fail to terminate for numbers greater than about 60,000.
My most recent issue is with Problem 12:
The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
Let us list the factors of the first seven triangle numbers:
1: 1
3: 1,3
6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28
We can see that 28 is the first triangle number to have over five divisors.
What is the value of the first triangle number to have over five hundred divisors?
Here is my code:
#include <iostream>
#include <vector>
#include <cmath>
using namespace std;

int main() {
    int numberOfDivisors = 500;
    //I begin by looping from 1, with 1 being the 1st triangular number, 2 being the second, and so on.
    for (long long int i = 1;; i++) {
        long long int triangularNumber = (pow(i, 2) + i)/2;
        //Once I have the i-th triangular number, I loop from 1 to itself, and add 1 to count each time I encounter a divisor, giving the total number of divisors for each triangular number.
        int count = 0;
        for (long long int j = 1; j <= triangularNumber; j++) {
            if (triangularNumber%j == 0) {
                count++;
            }
        }
        //If the number of divisors is 500, print out the triangular number and break the code.
        if (count == numberOfDivisors) {
            cout << triangularNumber << endl;
            break;
        }
    }
}
This code gives the correct answers for smaller numbers, and then either fails to terminate or takes an age to do so!
So firstly, what can I do with this specific problem to make my code more efficient?
Secondly, what are some general tips both for myself and other new C++ users for making code more efficient? (I.e. applying what we learn here in the future.)
Thanks!
The key problem is that your end condition is bad. You are supposed to stop when count > 500, but you look for an exact match of count == 500, therefore you are likely to blow past the correct answer without detecting it, and keep going ... maybe forever.
If you fix that, you can post it to code review. They might say something like this:
Break it down into separate functions for finding the next triangle number, and counting the factors of some number.
When you find the next triangle number, you execute pow. A single addition would do.
For counting the number of factors of a number, a google search might help (e.g. http://www.cut-the-knot.org/blue/NumberOfFactors.shtml). You can build a list of prime numbers as you go, and use it to quickly find a prime factorization, from which you can compute the number of factors without actually counting them. When the numbers get big, that counting loop gets big.
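For illustration, a sketch of that idea (mine, not from the linked page): once you have n = p1^e1 * p2^e2 * ..., the number of divisors is (e1+1)*(e2+1)*..., which trial division can compute directly.

// Divisor count from the prime factorization: multiply (exponent + 1)
// over all prime factors instead of testing every candidate divisor.
long long divisor_count(long long n) {
    long long count = 1;
    for (long long p = 2; p * p <= n; ++p) {
        int e = 0;
        while (n % p == 0) { n /= p; ++e; }
        count *= e + 1;
    }
    if (n > 1) count *= 2; // one remaining prime factor with exponent 1
    return count;
}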
Tldr: 76576500.
About your Euler problem, some math:
Preliminary 1:
Let's call the n-th triangle number T(n).
T(n) = 1 + 2 + 3 + ... + n = (n^2 + n)/2 (sometimes attributed to Gauss, sometimes someone else). It's not hard to figure it out:
1+2+3+4+5+6+7+8+9+10 =
(1+10) + (2+9) + (3+8) + (4+7) + (5+6) =
11 + 11 + 11 + 11 + 11 =
55 =
110 / 2 =
(10*10 + 10)/2
Because of its definition, it's trivial that T(n) + n + 1 = T(n+1), and that with a<b, T(a)<T(b) is true too.
Preliminary 2:
Let's call the divisor count D. D(1)=1, D(4)=3 (because 1 2 4).
For an n with c non-repeating prime factors (not just any divisors, but prime factors, eg. n = 42 = 2 * 3 * 7 has c = 3), D(n) is 2^c: for each prime factor, there are two possibilities (use it or not). The 8 possible divisors in the example are: 1, 2, 3, 7, 6 (2*3), 14 (2*7), 21 (3*7), 42 (2*3*7).
More generally, with repeated factors, D(n) is found by multiplying (power + 1) together for each prime. Example 126 = 2^1 * 3^2 * 7^1: because it has two 3s, the question is not "use 3 or not", but "use it 1 time, 2 times, or not at all" (if one time, whether it's the "first" or "second" 3 doesn't change the result). With the powers 1 2 1, D(126) is 2*3*2 = 12.
Preliminary 3:
The numbers n and n+1 can't have any common factor x other than 1 (technically, 1 isn't a prime, but whatever), because if both n/x and (n+1)/x were natural numbers, (n+1)/x - n/x = 1/x would have to be one too.
Back to Gauss: if we know the prime factors of a certain n and n+1 (needed to calculate D(n) and D(n+1)), calculating D(T(n)) is easy. T(n) = (n^2 + n) / 2 = n * (n+1) / 2. As n and n+1 have no common prime factors, just throwing all the factors together and removing one 2 (because of the "/2") is enough. Example: n is 7, with factors 7 = 7^1, and n+1 = 8 = 2^3. Together that's 2^3 * 7^1; removing one 2 gives 2^2 * 7^1. The powers are 2 and 1, so D(T(7)) = 3*2 = 6. To check: T(7) = 28 = 2^2 * 7^1, and the 6 possible divisors are 1 2 4 7 14 28.
What the program can do now: loop over all n from 1 upward, always factorize n and n+1, use this to get the divisor count of the n-th triangle number, and check whether it is over 500.
There's just the tiny problem that there are no efficient algorithms for prime factorization. But for somewhat small numbers, today's computers are still fast enough, and keeping all found factorizations from 1 to n also helps in finding the next one (for n+1). A second potential problem is numbers too large for long long, but again, that is no problem here (as trying it out shows).
With the described process and the program below, I got
the 12375th triangle number is 76576500 and has 576 divisors
#include <iostream>
#include <vector>
#include <cstdint>
using namespace std;

const int limit = 500;
vector<uint64_t> knownPrimes; //2 3 5 7...

//eg. [14] is 1 0 0 1 ... because 14 = 2^1 * 3^0 * 5^0 * 7^1
vector<vector<uint32_t>> knownFactorizations;

void init()
{
    knownPrimes.push_back(2);
    knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 0 (dummy)
    knownFactorizations.push_back(vector<uint32_t>(1, 0)); //factors for 1 (dummy)
    knownFactorizations.push_back(vector<uint32_t>(1, 1)); //factors for 2
}

void addAnotherFactorization()
{
    uint64_t number = knownFactorizations.size();
    size_t len = knownPrimes.size();
    for(size_t i = 0; i < len; i++)
    {
        if(!(number % knownPrimes[i]))
        {
            //dividing by a prime gives an already factorized number
            knownFactorizations.push_back(knownFactorizations[number / knownPrimes[i]]);
            knownFactorizations[number][i]++;
            return;
        }
    }
    //if this failed, number is a newly found prime
    //because a) it has no known prime factors, so it must have others
    //and b) if it is not a prime itself, then its factors should've been
    //found already (because they are smaller than the number itself)
    knownPrimes.push_back(number);
    len = knownFactorizations.size();
    for(size_t s = 0; s < len; s++)
    {
        knownFactorizations[s].push_back(0);
    }
    knownFactorizations.push_back(knownFactorizations[0]);
    knownFactorizations[number][knownPrimes.size() - 1]++;
}

uint64_t calculateDivisorCountOfN(uint64_t number)
{
    //factors for number must be known
    uint64_t res = 1;
    size_t len = knownFactorizations[number].size();
    for(size_t s = 0; s < len; s++)
    {
        if(knownFactorizations[number][s])
        {
            res *= (knownFactorizations[number][s] + 1);
        }
    }
    return res;
}

uint64_t calculateDivisorCountOfTN(uint64_t number)
{
    //factors for number and number+1 must be known
    uint64_t res = 1;
    size_t len = knownFactorizations[number].size();
    vector<uint32_t> tmp(len, 0);
    size_t s;
    for(s = 0; s < len; s++)
    {
        tmp[s] = knownFactorizations[number][s]
               + knownFactorizations[number+1][s];
    }
    //remove /2
    tmp[0]--;
    for(s = 0; s < len; s++)
    {
        if(tmp[s])
        {
            res *= (tmp[s] + 1);
        }
    }
    return res;
}

int main()
{
    init();
    uint64_t number = knownFactorizations.size() - 2;
    uint64_t DTn = 0;
    while(DTn <= limit)
    {
        number++;
        addAnotherFactorization();
        DTn = calculateDivisorCountOfTN(number);
    }
    uint64_t tn;
    if(number % 2) tn = ((number+1)/2)*number;
    else tn = (number/2)*(number+1);
    cout << "the " << number << "th triangle number is "
         << tn << " and has " << DTn << " divisors" << endl;
    return 0;
}
About your general question about speed:
1) Algorithms.
How do you get to know them? For (relatively) simple problems, either by reading a book/Wikipedia/etc. or by figuring it out yourself if you can. For harder stuff, learning more basic things and gaining experience is necessary before it's even possible to understand them, eg. by studying CS and/or maths ... number theory helps a lot with your Euler problem. (It will help less with understanding how an MP3 file is compressed ... there are many areas; it's not possible to know everything.)
2a) Automated compiler optimizations of frequently used code parts / patterns
2b) Manually timing which program parts are the slowest, and (when not replacing them with another algorithm) changing them so that they eg. require less data sent to slow devices (HDD, network...), need less RAM access and fewer CPU cycles, work better together with the OS scheduler and memory management strategies, and use the CPU pipeline/caches better, etc. etc. ... this is both education and experience (and a big topic).
And because long variables have a limited size, sometimes it is necessary to use custom types that, eg., use a byte array to store a single digit in each byte. That way, it's possible to use the whole RAM for a single number if you want to, but the downside is that you/someone has to reimplement stuff like addition and so on for this kind of number storage. (Of course, libs for that exist already, without the need to write everything from scratch.)
Btw., pow is a floating-point function and may give you inaccurate results. It's not appropriate to use it in this case.
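For instance, the i-th triangle number can be computed exactly in integers, one of the two factors always being even (mirroring the tn computation in the program above):

long long triangular(long long i) {
    // i*(i+1)/2 without floating point; divide the even factor first
    // so the intermediate product stays as small as possible.
    return (i % 2 == 0) ? (i / 2) * (i + 1) : ((i + 1) / 2) * i;
}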