Sieve Of Atkin is surprisingly slow - c++

I recently became very interested in prime numbers and tried writing programs to calculate them. I wrote a sieve of Sundaram program that could calculate a million prime numbers in a couple of seconds. I believe that's pretty fast, but I wanted better. I went on to try a Sieve of Atkin and slapped together working C++ code in 20 minutes after copying the pseudocode from Wikipedia.
I knew it wouldn't be perfect because, after all, it's pseudocode. I was expecting at least better times than my Sundaram sieve, though, but I was so wrong. It's very, very slow. I have looked it over many times but I cannot find any significant changes that could be made. When looking at my code, remember: I know it's inefficient, I know I used system commands, I know it's all over the place, but this isn't a project or anything important, it's for me.
#include <iostream>
#include <fstream>
#include <time.h>
#include <Windows.h>
#include <vector>
using namespace std;
int main(){
float limit;
float slimit;
long int n;
int counter = 0;
int squarenum;
int starttime;
int endtime;
vector <bool> primes;
ofstream save;
save.open("primes.txt");
save.clear();
cout << "Find all primes up to: " << endl;
cin >> limit;
slimit = sqrt(limit);
primes.resize(limit);
starttime = time(0);
// sets all values to false
for (int i = 0; i < limit; i++){
primes[i] = false;
}
//puts in possible primes
for (int x = 1; x <= slimit; x++){
for (int y = 1; y <= slimit; y++){
n = (4*x*x) + (y*y);
if (n <= limit && (n%12 == 1 || n%12 == 5)){
primes[n] = !primes[n];
}
n = (3*x*x) + (y*y);
if (n <= limit && n% 12 == 7){
primes[n] = !primes[n];
}
n = (3*x*x) - (y*y);
if ( x > y && n <= limit && n%12 == 11){
primes[n] = !primes[n];
}
}
}
//square number mark all multiples not prime
for (float i = 5; i < slimit; i++){
if (primes[i] == true){
for (long int k = i*i; k < limit; k = k + (i*i)){
primes[k] = false;
}
}
}
endtime = time(0);
cout << endl << "Calculations complete, saving in text document" << endl;
// loads to document
for (int i = 0 ; i < limit ; i++){
if (primes[i] == true){
save << counter << ") " << i << endl;
counter++;
}
}
save << "Found in " << endtime - starttime << " seconds" << endl;
save.close();
system("primes.txt");
system ("Pause");
return 0;
}

This isn't exactly an answer (IMO, you've already gotten an answer in the comments), but a quick standard for comparison. A sieve of Eratosthenes should find a million primes in well under a second on a reasonably modern machine.
#include <vector>
#include <iostream>
#include <time.h>
#include <locale> // for std::locale, used with cout.imbue below
unsigned long primes = 0;
int main() {
// empirically derived limit to get 1,000,000 primes
int number = 15485865;
clock_t start = clock();
std::vector<bool> sieve(number,false);
sieve[0] = sieve[1] = true;
for(int i = 2; i<number; i++) {
if(!sieve[i]) {
++primes;
for (int temp = 2*i; temp<number; temp += i)
sieve[temp] = true;
}
}
clock_t stop = clock();
std::cout.imbue(std::locale(""));
std::cout << "Total primes: " << primes << "\n";
std::cout << "Time: " << double(stop - start) / CLOCKS_PER_SEC << " seconds\n";
return 0;
}
Running this on my laptop, I get a result of:
Total primes: 1000000
Time: 0.106 seconds
Obviously, speed will vary somewhat with processor, clock speed, etc., but with anything reasonably modern, I'd still expect a time of less than a second. Of course, if you decide to write the primes out to a file, you can expect that to add some time, but even with that I'd expect a total time under a second--with my laptop's relatively slow hard drive, writing out the numbers only gets the total up to about 0.6 seconds.

vector<bool> is a bitset. It is expensive to update bitset values that are not in cache. Try a vector with a wider element type (for example vector<char>); it is much cheaper to write to.
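As a rough illustration of that suggestion, here is a minimal sketch of a sieve of Eratosthenes that keeps one byte per number in a std::vector<char> instead of one bit per number. The function name countPrimesBelow is made up for this example, and whether it actually beats vector<bool> depends on the machine and the cache size, so it is worth timing both.

```cpp
#include <cstdint>
#include <vector>

// Counts the primes below `limit` using one byte per number.
// countPrimesBelow is a hypothetical name, not taken from the posts above.
std::uint64_t countPrimesBelow(std::uint64_t limit) {
    std::vector<char> composite(limit, 0);   // 0 = still assumed prime
    std::uint64_t count = 0;
    for (std::uint64_t i = 2; i < limit; ++i) {
        if (composite[i]) continue;
        ++count;
        // Start at i*i: smaller multiples were already marked by smaller primes.
        for (std::uint64_t m = i * i; m < limit; m += i)
            composite[m] = 1;
    }
    return count;
}
```

Calling countPrimesBelow(15485865) uses the same limit as the answer above and costs roughly 15 MB of memory, compared with about 2 MB for the bit-packed version.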

Related

Find One to N is Prime optimization

So I was inspired by a recent Youtube video from the Numberphile Channel. This one to be exact. Cut to around the 5 minute mark for the exact question or example that I am referring to.
TLDR; A number is created with all the digits corresponding to 1 to N. Example: 1 to 10 is the number 12,345,678,910. Find out if this number is prime. According to the video, N has been checked up to 1,000,000.
From the code below, I have taken the liberty of starting this process at 1,000,000 and only going to 10,000,000. I'm hoping to increase this to a larger number later.
So my question or the assistance that I need is optimization for this problem. I'm sure each number will still take very long to check but even a minimal percentage of optimization would go a long way.
Edit 1: Optimize which division numbers are used. Ideally this divisionNumber would only be prime numbers.
Here is the code:
#include <iostream>
#include <chrono>
#include <ctime>
namespace
{
int myPow(int x, int p)
{
if (p == 0) return 1;
if (p == 1) return x;
if (p == 2) return x * x;
int tmp = myPow(x, p / 2);
if (p % 2 == 0) return tmp * tmp;
else return x * tmp * tmp;
}
int getNumDigits(unsigned int num)
{
int count = 0;
while (num != 0)
{
num /= 10;
++count;
}
return count;
}
unsigned int getDigit(unsigned int num, int position)
{
int digit = num % myPow(10, getNumDigits(num) - (position - 1));
return digit / myPow(10, getNumDigits(num) - position);
}
unsigned int getTotalDigits(int num)
{
unsigned int total = 0;
for (int i = 1; i <= num; i++)
total += getNumDigits(i);
return total;
}
// Returns the 'index'th digit of number created from 1 to num
int getIndexDigit(int num, int index)
{
if (index <= 9)
return index;
for (int i = 10; i <= num; i++)
{
if (getTotalDigits(i) >= index)
return getDigit(i, getNumDigits(i) - (getTotalDigits(i) - index));
}
}
// Can this be optimized?
int floorSqrt(int x)
{
if (x == 0 || x == 1)
return x;
int i = 1, result = 1;
while (result <= x)
{
i++;
result = i * i;
}
return i - 1;
}
void PrintTime(double num, int i)
{
constexpr double SECONDS_IN_HOUR = 3600;
constexpr double SECONDS_IN_MINUTE = 60;
double totalSeconds = num;
int hours = totalSeconds / SECONDS_IN_HOUR;
int minutes = (totalSeconds - (hours * SECONDS_IN_HOUR)) / SECONDS_IN_MINUTE;
int seconds = totalSeconds - (hours * SECONDS_IN_HOUR) - (minutes * SECONDS_IN_MINUTE);
std::cout << "Elapsed time for " << i << ": " << hours << "h, " << minutes << "m, " << seconds << "s\n";
}
}
int main()
{
constexpr unsigned int MAX_NUM_CHECK = 10000000;
for (int i = 1000000; i <= MAX_NUM_CHECK; i++)
{
auto start = std::chrono::system_clock::now();
int digitIndex = 1;
// Simplifying this to move to the next i in the loop early:
// if i % 2 then the last digit is a 0, 2, 4, 6, or 8 and is therefore divisible by 2
// if i % 5 then the last digit is 0 or 5 and is therefore divisible by 5
if (i % 2 == 0 || i % 5 == 0)
{
std::cout << i << " not prime" << '\n';
auto end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
PrintTime(elapsed_seconds.count(), i);
continue;
}
bool isPrime = true;
int divisionNumber = 3;
int floorNum = floorSqrt(i);
while (divisionNumber <= floorNum && isPrime)
{
if (divisionNumber % 5 == 0)
{
divisionNumber += 2;
continue;
}
int number = 0;
int totalDigits = getTotalDigits(i);
// This section does the division necessary to iterate through each digit of the 1 to N number
// Example: Think of dividing 124 into 123456 on paper and how you would iterate through that process
while (digitIndex <= totalDigits)
{
number *= 10;
number += getIndexDigit(i, digitIndex);
number %= divisionNumber;
digitIndex++;
}
if (number == 0)
{
isPrime = false;
break;
}
divisionNumber += 2;
}
if (isPrime)
std::cout << "N = " << i << " is prime." << '\n';
else
std::cout << i << " not prime" << '\n';
auto end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end - start;
PrintTime(elapsed_seconds.count(), i);
}
}
It's nice to see you are working on the same question I pondered a few months ago.
Please refer to the question posted on Math Stack Exchange for better resources.
TL;DR:
The number you are looking for is called a Smarandache prime.
As per your code, it seems you are dividing by every number that is not a multiple of 2 or 5. To optimize, you can check only divisors of the form 6k ± 1 (k ∈ ℕ), since every prime greater than 3 has that form.
Unfortunately, that is still not a good approach for numbers of the size you are dealing with.
A better approach is to use a probabilistic primality test to screen for probable primes in the sequence and then check whether those candidates are actually prime. Such tests take much less time than trial division, roughly O(k log³ n) for k rounds on a number n.
There are several libraries that provide functions for primality checks.
For Python, you can use the gmpy2 library, which uses the Miller-Rabin primality test to find probable primes.
I recommend reading further about the different primality tests.
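To make the trial-division part concrete, here is a small sketch that reduces the 1-to-N number modulo a candidate divisor without ever building the big number, and that only tries 2, 3 and divisors of the form 6k ± 1. The names concatMod and smallFactor are invented for this example, and the code assumes the divisor is below 2^31 and N is below 10^9 so the intermediate product fits in 64 bits.

```cpp
#include <cstdint>

// Remainder of the decimal concatenation 1,2,...,n modulo d (d >= 1).
// Assumption: d < 2^31 and n < 10^9, so r * shift + i fits in 64 bits.
std::uint64_t concatMod(std::uint32_t n, std::uint32_t d) {
    std::uint64_t r = 0;
    std::uint64_t shift = 10;            // 10^(number of digits of i)
    for (std::uint32_t i = 1; i <= n; ++i) {
        if (i == shift) shift *= 10;     // i just gained a digit
        r = (r * shift + i) % d;         // append the digits of i
    }
    return r;
}

// Smallest divisor found among 2, 3 and 6k ± 1 up to maxDivisor,
// or 0 if none of them divides the concatenation of 1..n.
std::uint32_t smallFactor(std::uint32_t n, std::uint32_t maxDivisor) {
    if (maxDivisor >= 2 && concatMod(n, 2) == 0) return 2;
    if (maxDivisor >= 3 && concatMod(n, 3) == 0) return 3;
    for (std::uint32_t d = 5; d <= maxDivisor; d += 6) {
        if (concatMod(n, d) == 0) return d;                               // 6k - 1
        if (d + 2 <= maxDivisor && concatMod(n, d + 2) == 0) return d + 2; // 6k + 1
    }
    return 0;
}
```

Each call to concatMod walks over the N numbers once, so this only screens out small factors cheaply. It does not replace a probable-prime test for the candidates that survive.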
I believe you are missing one very important check, and it's the division by 3:
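A number can be divided by 3 if the sum of its digits can be divided by 3, and your number consists of the digits of all numbers from 1 to N. Since every number leaves the same remainder modulo 3 as its own digit sum, your number leaves the same remainder modulo 3 as the sum of all numbers from 1 to N.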
The sum of all numbers from 1 to N equals:
N * (N+1) / 2
This means that, if N or N+1 can be divided by 3, then your number cannot be prime.
So before you do anything, check MOD(N,3) and MOD(N+1,3). If either one of them equals zero, you can't have a prime number.
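A minimal sketch of that pre-check, with a brute-force digit-sum version next to it for comparison. Both function names are made up for this example.

```cpp
#include <cstdint>

// Shortcut: the concatenation of 1..n is divisible by 3
// exactly when n or n+1 is divisible by 3.
bool concatDivisibleBy3(std::uint64_t n) {
    return n % 3 == 0 || (n + 1) % 3 == 0;
}

// Brute-force reference: add up every digit of 1..n and test the sum mod 3.
bool concatDivisibleBy3Slow(std::uint64_t n) {
    std::uint64_t digitSum = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        for (std::uint64_t v = i; v != 0; v /= 10)
            digitSum += v % 10;
    return digitSum % 3 == 0;
}
```

With this check, two out of every three values of N can be skipped before any long division starts.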

Trying to create a multithreaded program to find the total primes from 0-100000000

Hello I am trying to write a C++ multithreaded program using POSIX thread library to find the number of prime numbers between 1 and 10,000,000 (10 million) and find out how many microseconds it takes...
Creating my threads and running them works completely fine, however I feel as if there is an error found in my Prime function when determining if a number is prime or not...
I keep receiving 78496 as my output, however I desire 664579. Below is my code. Any hints or pointers would be greatly appreciated.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h>
#include <iostream>
#include <sys/time.h> //measure the execution time of the computations
using namespace std;
//The number of thread to be generated
#define NUMBER_OF_THREADS 4
void * Prime(void* index);
long numbers[4] = {250000, 500000, 750000, 1000000};
long start_numbers[4] = {1, 250001, 500001, 750001};
int thread_numbers[4] = {0, 1, 2, 3};
int main(){
pthread_t tid[NUMBER_OF_THREADS];
int tn;
long sum = 0;
timeval start_time, end_time;
double start_time_microseconds, end_time_microseconds;
gettimeofday(&start_time, NULL);
start_time_microseconds = start_time.tv_sec * 1000000 + start_time.tv_usec;
for(tn = 0; tn < NUMBER_OF_THREADS; tn++){
if (pthread_create(&tid[tn], NULL, Prime, (void *) &thread_numbers[tn]) == -1 ) {
perror("thread fail");
exit(-1);
}
}
long value[4];
for(int i = 0; i < NUMBER_OF_THREADS; i++){
if(pthread_join(tid[i],(void **) &value[i]) == 0){
sum = sum + value[i]; //add four sums together
}else{
perror("Thread join failed");
exit(-1);
}
}
//get the end time in microseconds
gettimeofday(&end_time, NULL);
end_time_microseconds = end_time.tv_sec * 1000000 + end_time.tv_usec;
//calculate the time passed
double time_passed = end_time_microseconds - start_time_microseconds;
cout << "Sum is: " << sum << endl;
cout << "Running time is: " << time_passed << " microseconds" << endl;
exit(0);
}
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
for (int j=2; j*j <= i; j++)
{
if (i % j == 0)
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}
This is because you used 10^6 instead of 10^7 as the upper limit.
I also added some corner cases for the numbers 1, 2 and 3:
//Prime function
void* Prime(void* index){
int temp_index;
temp_index = *((int*)index);
long sum_t = 0;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
// Corner cases
if(i<=1)continue;
if (i <= 3){
sum_t++;
continue;
}
for (int j=2; j*j <= i; j++)
{
if ((i % j == 0) || (i %( j+2))==0 )
{
break;
}
else if (j+1 > sqrt(i)) {
sum_t++;
}
}
}
cout << "Thread " << temp_index << " terminates" << endl;
pthread_exit( (void*) sum_t);
}
I tested your code with the correct limit and got the correct number of primes as output:
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 4.69242e+07 microseconds
Thanks to #chux - Reinstate Monica for pointing this out
Besides making the thread ranges cover 10^7 instead of stopping at 10^6, there are a number of other small errors, and several optimizations could be made.
First of all, the start numbers could begin from 2 itself:
long start_numbers[4] = {2, 2500001, 5000001, 7500001};
The sum_t++ in your code may not work on edge cases. It is better to use the following algorithm in the Prime function:
bool flag = false;
for(long i = start_numbers[temp_index]; i <= numbers[temp_index]; i++){
flag = false;
for (long j=2; j*j <= i; j++){
if (i % j == 0 )
{
flag = true;
break;
}
}
if(!flag)
sum_t++;
}
After these two changes I get the following result:
Thread 0 terminates
Thread 1 terminates
Thread 2 terminates
Thread 3 terminates
Sum is: 664579
Running time is: 6.62618e+06 microseconds
Edit:
(Note: here j is declared as long, but int would work just as well in this example, since the tested compiler uses a 32-bit int.)
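Since the wrong count came from hard-coded range arrays, one option is to derive the per-thread ranges from the limit. The sketch below is only an illustration under that assumption: splitRange, countPrimesInRange and isPrime are hypothetical helper names, and the threading itself is left out so each thread would simply call countPrimesInRange on its own slice.

```cpp
#include <utility>
#include <vector>

// Trial-division primality check; returns false for n < 2.
bool isPrime(long n) {
    if (n < 2) return false;
    for (long j = 2; j * j <= n; ++j)
        if (n % j == 0) return false;
    return true;
}

// Counts the primes in the inclusive range [lo, hi].
long countPrimesInRange(long lo, long hi) {
    long count = 0;
    for (long i = lo; i <= hi; ++i)
        if (isPrime(i)) ++count;
    return count;
}

// Splits [1, limit] into `parts` contiguous inclusive ranges.
std::vector<std::pair<long, long>> splitRange(long limit, int parts) {
    std::vector<std::pair<long, long>> ranges;
    long lo = 1;
    for (int p = 1; p <= parts; ++p) {
        long hi = limit * p / parts;   // the last range ends exactly at limit
        ranges.push_back({lo, hi});
        lo = hi + 1;
    }
    return ranges;
}
```

With splitRange(10000000, 4) the second range comes out as 2500001 to 5000000, so changing the limit in one place keeps all four threads consistent.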

Why my Shell sorting is so slow

I am trying to implement the Shell sort algorithm myself. I wrote my own code and didn't look at any code samples; I only watched a video describing the algorithm.
My sort works but is very slow (bubble sort of 100 items: 0.007 s; Shell sort of 100 items: 4.83 s). How can I improve it?
void print(vector<float>vec)
{
for (float i : vec)
cout << i << " ";
cout << "\n\n";
}
void Shell_sorting(vector<float>&values)
{
int swapping = 0;
int step = values.size();
clock_t start;
double duration;
start = clock();
while (step/2 >= 1)
{
step /= 2;
for (int i = 0; i < values.size()-step; i++)
{
if ((i + step < values.size()))
{
if ((values[i + step] < values[i]))
{
swap(values[i], values[i + step]);
print(values);
++swapping;
int c = i;
while (c - step > 0)
{
if (values[c] < values[c - step])
{
swap(values[c], values[c - step]);
print(values);
++swapping;
c -= step;
}
else
break;
}
}
}
else
break;
}
}
duration = (clock() - start) / (double)CLOCKS_PER_SEC;
print(values);
cout << swapping << " " << duration;
print(values);
}
A better implementation could be:
#include <iostream>
#include <vector>
int main()
{
std::vector<int> vec = {
726,621,81,719,167,958,607,130,263,108,
134,235,508,407,153,162,849,923,996,975,
250,78,460,667,654,62,865,973,477,912,
580,996,156,615,542,655,240,847,613,497,
274,241,398,84,436,803,138,677,470,606,
226,593,620,396,460,448,198,958,566,599,
762,248,461,191,933,805,288,185,21,340,
458,592,703,303,509,55,190,318,310,189,
780,923,933,546,816,627,47,377,253,709,
992,421,587,768,908,261,946,75,682,948,
};
std::vector<int> gaps = {5, 2, 1};
int j;
for (int gap : gaps) {
for (int i = gap; i < vec.size(); i++)
{
j = i-gap;
while (j >= 0) {
if (vec[j+gap] < vec[j])
{
int temp = vec[j+gap];
vec[j+gap] = vec[j];
vec[j] = temp;
j = j-gap;
}
else break;
}
}
}
for (int item : vec) std::cout << item << " " << std::endl;
return 0;
}
I prefer to use a vector to store the gap values so that you do not need to compute the division (which is an expensive operation). Besides, this choice gives your code more flexibility.
The outer loop cycles over the gap values. Once the gap is chosen, you iterate over your vector, starting from vec[gap], and check whether the elements a gap before it are larger, according to the logic of Shell sort.
So, you start by setting j=i-gap and test the if condition. If it is true, swap the items and then repeat the while loop with j decremented by gap. Note: vec[j+gap] is the element that was swapped in the previous loop cycle. If the condition is false, there's no reason to continue in the loop, so you can exit from it with a break.
On my machine, it took 0.002s, measured using the time shell command (the time includes the process of printing the numbers).
P.S. To generate all those numbers and write them in the array, since I'm too lazy to write a random function, I used this link and then edited the output in the shell with:
sed -e 's/[[:space:]]/,/g' num | sed -e 's/$/,/'

Optimizing bubble sort - What am I missing?

I'm trying to understand possible optimization methods for the bubble sort algorithm. I know there are better sorting methods, but I'm just curious.
To test the efficiency I'm using std::chrono. The program sorts a 10000 number long int array 30 times and prints the average sorting time. The numbers are picked randomly(up to 10000) in every iteration. Here is the code, with no optimization:
#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;
int main() {
//bubble sort
srand(time(NULL));
chrono::time_point<chrono::steady_clock> start, end;
const int n = 10000;
int i,j, last, tests = 30,arr[n];
long long total = 0;
bool out;
while (tests-->0) {
for (i = 0; i < n; i++) {
arr[i] = rand() % 1000;
}
j = n;
start = chrono::high_resolution_clock::now();
while(1){
out = 0;
for (i = 0; i < j - 1; i++) {
if (arr[i + 1] < arr[i]) {
swap(arr[i + 1], arr[i]);
out = 1;
}
}
if (!out) {
break;
}
//j--;
}
end = chrono::high_resolution_clock::now();
total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
cout << "Remaining :"<<tests << endl;
}
cout << "Average :" << total / static_cast<double>(30)/1000000000<<" seconds"; // tests(30) + nanosec -> sec
cin.sync();
cin.ignore();
return 0;
}
I get 0.17 seconds average sorting time.
If I uncomment line 47(j--;) to avoid comparing numbers already sorted I get 0.12 sorting time which is understandable.
If I remember the last position where a swap took place, I know that after that index, elements are sorted, and can thus sort up to that position in further iterations. It's better explained in the second part of this post: https://stackoverflow.com/a/16196115/1967496.
This is the code that implements the new possible optimization:
#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;
int main() {
//bubble sort
srand(time(NULL));
chrono::time_point<chrono::steady_clock> start, end;
const int n = 10000;
int i,j, last, tests = 30,arr[n];
long long total = 0;
bool out;
while (tests-->0) {
for (i = 0; i < n; i++) {
arr[i] = rand() % 1000;
}
j = n;
start = chrono::high_resolution_clock::now();
while(1){
out = 0;
for (i = 0; i < j - 1; i++) {
if (arr[i + 1] < arr[i]) {
swap(arr[i + 1], arr[i]);
out = 1;
last = i;
}
}
if (!out) {
break;
}
j = last + 1;
}
end = chrono::high_resolution_clock::now();
total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
cout << "Remaining :"<<tests << endl;
}
cout << "Average :" << total / static_cast<double>(30)/1000000000<<" seconds"; // tests(30) + nanosec -> sec
cin.sync();
cin.ignore();
return 0;
}
Note lines 40 and 48. And here comes the problem: The average time is now again around 0.17 seconds.
Is there a problem in my code, or am I missing something ?
Update:
I did sorting with 10 times more numbers and get now following results:
No optimization: 19.3 seconds
First optimization(j--): 14.5 seconds
Second (supposed) optimization(j=last+1): 17.4 seconds;
From my understanding, the second method should be in any case better than the first, but the numbers tell something else.
Well... the problem is that there might not be a right or wrong answer to this question.
First of all, when you're sorting only 10000 elements, you cannot really call it an efficiency test. Try a much higher number of elements, maybe 500000 (although you will probably need to allocate the array dynamically for that).
Second of all, it might be the compiler. Compilers often try to optimize things so that the program runs faster.

Same random numbers being created everytime the loop goes around.

I am a beginner at C++ and one of my projects involves loops inside loops and creating random numbers. Here is what I have so far:
`
using namespace std;
int main()
{
srand((unsigned int)time(0));
{
cout << "Name of reservoir: ";
string reservior_name;
cin >> reservior_name;
cout << "Capacity in MAF: ";
double capacity;
cin >> capacity;
cout << "Maximum inflow in MAF: ";
int max;
cin>> max;
cout << "minimum inflow in MAF: ";
int min;
cin >> min;
if(min>max)
{cout<<endl<<"Error: The minimum inflow is higher than the maximum inflow."<<endl
<< "Please re-enter your minimum inflow: ";
cin>>min;
}
double inflow_range= max-min;
cout <<"required outflow in MAF: ";
double required;
cin >> required;
if (required > 0.9 * (min + max)/2)
{
cout<<endl<< "Warning: required ouflow is over 90% of the average inflow."<<endl
<< "Returning to main menu ";
}
else
{ const int simulations = 10;
int water_level = 0;
int years = 1;
cout << "Running simulation..." << endl;
for (int i = 1; i <= simulations; i++)
{
int x = (rand()% (max-min + 1)) + min;
while (water_level < capacity)
{
//double r = rand() * 1.0 / RAND_MAX;
//double x = min + inflow_range * r;
//int x = (rand()% (max-min + 1)) + min;
if (water_level + x > required)
{
water_level = water_level + x - required;
}
else
{
water_level= 0;
}
years++;
}
cout <<"Simulation "<< i <<" took " << years <<" years to finish"<< endl;
}
}
}
system ("pause");
return 0;
}
`
So my main question is I'm running into a wall concerning setting up the for loops underneath "Running simulation" where I need to set up the first for loop to run the internal for loop 10 times, with each of those 10 iterations of the internal for loop coming up with random numbers for the range of acceptable results from the query for a random value. I've been told that the idea is to use the Monte Carlo method, i.e. I put in here both the Monte Carlo method and the normal random number generating method. Here it is:
for (int i = 1; i <= simulations; i++)
{
int x = (rand()% (max-min + 1)) + min;
while (water_level < capacity)
{
//double r = rand() * 1.0 / RAND_MAX;
//double x = min + inflow_range * r;
//int x = (rand()% (max-min + 1)) + min;
so the program will create a random value for the inflow. The idea is that the internal for loop will continue to run until the fill_level of the reservoir, which starts at 0, hits the capacity. The process of simulating how many years (each iteration of the internal for loop representing a year) is to be repeated 10 times by the parent for loop of the water_level simulation for loop.
The problem is that the random numbers that are supposed to be created are all the same number. They are different every time I run it, but they are the same every time the loops repeat to make a new simulation. I have tried to figure out what the problem is for hours and am still stuck. Any help is very appreciated.
The x is random in your code; the problem is the algorithm and the calculation after that. See your code live.
You've forgotten to reset the simulation parameters at each iteration. Put these inside the simulation loop:
--------------------------------------------+
|
for (int i = 1; i <= simulations; i++) |
{ |
int water_level = 0; <--+
int years = 1; <--+
int x = (rand() % (max - min + 1)) + min;
See the code after this edit: live code. The output is
Simulation 1 took 68 years to finish
Simulation 2 took 101 years to finish
Simulation 3 took 8 years to finish
With the code as shown, each iteration (simulation) gets a single value of x for all the years that are simulated. Your commented out code generates a new value of x for each year. Which is the method you want? I'm inclined to think that the inflow varies from year to year, so you should generate a new value of x for each year.
It also looks like you should reset years and water_level for each simulation.
cout << "Running simulation..." << endl;
for (int i = 1; i <= simulations; i++)
{
int water_level = 0;
int years = 1;
while (water_level < capacity)
{
int x = (rand() % (max - min + 1)) + min;
if (water_level + x > required)
water_level += x - required;
else
water_level = 0;
years++;
}
cout <<"Simulation "<< i <<" took " << years <<" years to finish"<< endl;
}
And for debugging, I'd want to print the control parameters (min, max, capacity, required), and then print the key values (year, x, water_level) on each iteration of the inner while loop until I was satisfied it was working correctly.
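One way to make the reset hard to forget is to put a single simulation in its own function, so water_level and years are local to each run. The sketch below is only an illustration: simulateYears and the maxYears safety cap are made up for this example, and it swaps rand() for the <random> header, which is a substitution on my part and not something the original code uses.

```cpp
#include <random>

// Runs one reservoir simulation and returns the year counter when the level
// first reaches `capacity`, or `maxYears` if it never gets there.
// simulateYears and maxYears are hypothetical, not from the original program.
int simulateYears(double capacity, int minInflow, int maxInflow,
                  double required, std::mt19937& rng, int maxYears = 100000) {
    std::uniform_int_distribution<int> inflow(minInflow, maxInflow);
    double waterLevel = 0.0;   // starts empty for every simulation
    int years = 1;             // same starting value as the snippets above
    while (waterLevel < capacity && years < maxYears) {
        const int x = inflow(rng);   // new inflow for each simulated year
        if (waterLevel + x > required)
            waterLevel += x - required;
        else
            waterLevel = 0.0;
        ++years;
    }
    return years;
}
```

The outer loop then just calls simulateYears ten times with the same rng object, seeded once, for example with std::mt19937 rng(std::random_device{}()).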