Just like this question, I also am working on the sieve of Eratosthenes, also from the book "Programming: Principles and Practice Using C++", chapter 4. I was able to implement it correctly, and it is functioning exactly as the exercise asks.
#include <iostream>
#include <vector>
using namespace std;
int main() {
    unsigned int amount = 0;
    cin >> amount;
    vector<int> numbers;
    for (unsigned int i = 0; i <= amount; i++) {
        numbers.push_back(i);
    }
    for (unsigned int p = 2; p < amount; p++) {
        if (numbers[p] == 0)
            continue;
        cout << p << '\n';
        for (unsigned int i = p + p; i <= amount; i += p) {
            numbers[i] = false;
        }
    }
    return 0;
}
Now, how would I be able to handle really big numbers in the amount input? The unsigned int type should allow me to enter a number up to 2^32 = 4,294,967,296. But I can't; I run out of memory. Yes, I've done the math: storing 2^32 ints at 32 bits each means 32/8 * 2^32 = 16 GiB of memory. I have just 4 GiB...
So what I am really doing here is setting non-primes to zero, which means I could use a boolean instead. But a bool would still take 8 bits, i.e. 1 byte, each. Theoretically I could then go to the limit of unsigned int (8/8 * 2^32 = 4 GiB), using some swap space for the OS and overhead. But I have an x86_64 PC, so what about numbers larger than 2^32?
Knowing that primes are important in cryptography, there must be a more efficient way of doing this? And are there also ways to optimize the time needed to find all those primes?
In the sense of storage, you could use the std::vector<bool> container. Because of how it works, you trade some speed for storage: since it stores one bit per boolean, your storage becomes 8 times as efficient. You should be able to get close to 8 * 4,294,967,296 if you have all your RAM available for this one program. The only thing you need to do is use unsigned long long to unlock 64-bit numbers.
Note: Testing the program with the code example below, with an amount input of 8 billion, caused the program to run with a memory usage of approx. 975 MiB, confirming the theoretical number.
You can also gain some time, because you can declare the complete vector at once, without iteration: vector<bool> numbers(amount, true); creates a vector of size equal to the input amount, with all elements set to true. Now you can adjust the code to set non-primes to false instead of 0.
Furthermore, once you have followed the sieve up to the square root of amount, all numbers that remain true are primes. Insert if (p * p >= amount) as an additional continue condition, just after you output the prime number. This, too, is a modest improvement to your processing time.
Edit: In the inner loop, i can start at p squared, because all smaller multiples of p have already been crossed off by smaller primes.
All together you should get something like this:
#include <iostream>
#include <vector>
using namespace std;
int main() {
    unsigned long long amount = 0;
    cin >> amount;
    vector<bool> numbers(amount, true);
    for (unsigned long long p = 2; p < amount; p++) {
        if (!numbers[p])
            continue;
        cout << p << '\n';
        if (p * p >= amount)
            continue;
        for (unsigned long long i = p * p; i < amount; i += p) {
            numbers[i] = false;
        }
    }
    return 0;
}
You've asked a couple of different questions.
For primes up to 2**32, sieving is appropriate, but you need to work in segments instead of one big block. My answer here tells how to do that.
For cryptographic primes, which are very much larger, the process is to pick a number and then test it for primality, using a probabilistic test such as Miller-Rabin or Baillie-PSW. This process isn't perfect: occasionally a composite might be chosen instead of a prime, but such an occurrence is very rare.
I've written the following C++ code to factorize really large numbers efficiently (numbers up to 24997300729).
I have a vector containing approximately 41,000 primes. (I know having such a large vector isn't a good idea, but I couldn't figure out a way around it.)
This code produces the prime factorization of moderately large numbers in no time, but when it comes to numbers such as 24997300572, the program stalls.
Here's the program, with some screenshots of the output:
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cmath>
using namespace std;
vector<int> primes = {paste from
https://drive.google.com/file/d/1nGvtMMQSa9YIDkMW2jgEbJk67P7p54ft/view?usp=sharing
};
void factorize(int n) {
    if (n == 1)
        return;
    if (find(primes.begin(), primes.end(), n) != primes.end()) {
        cout << n << " "; // if n is prime, don't proceed further
        return;
    }
    // obtain an iterator to the location of the prime equal to or just greater than sqrt(n)
    auto s = sqrt(n);
    vector<int>::iterator it = lower_bound(primes.begin(), primes.end(), s);
    if (it == primes.end()) {
        return; // if no primes found then the factors are beyond range
    }
    for (auto i = it; i != primes.begin(); i--) {
        if (n % *i == 0) {
            cout << *i << " ";
            n = n / (*i);
            factorize(n);
            return; // the two consecutive for() loops should never run one after another
        }
    }
    for (auto i = it; i != primes.end(); i++) {
        if (n % *i == 0) {
            cout << *i << " ";
            n = n / (*i);
            factorize(n);
            return; // the two consecutive for() loops should never run one after another
        }
    }
}

int main() {
    unsigned int n;
    cout << "Enter a number between 1 and 24997300729 ";
    cin >> n;
    if (n > 24997300729) {
        cout << "Number out of range;";
        exit(-1);
    }
    factorize(n);
    return 0;
}
(Screenshots omitted: the output is correct for moderately large inputs, but the program hangs on 24997300572.)
I tried using long long int and long double wherever I could to overcome the problem of large numbers, but that didn't help much.
Any help would be greatly appreciated.
It's a little unclear (at least to me) exactly why you've structured the program the way you have.
You can fully factor a number by searching only for prime factors less than or equal to that number's square root: a number can have at most one prime factor larger than its square root, and any such factor pairs with a cofactor smaller than the square root. That one remaining factor, if any, can be obtained by simple division, not searching.
I'd probably generate the base of prime numbers on the fly (most likely using a sieve). The square root of 24'997'300'729 is (about) 158'105. A quick test shows that even without any work on optimization, a sieve of Eratosthenes will find the primes up to that limit in about 12 milliseconds.
Personally, I'd rather not have a fixed limit on the largest number the user can factor, other than the limit on the size of number we're working with, so if the user enters something close to the limit for a 64-bit number, we find all the primes that fit in 32 bits, and then use those to factor the number. This will obviously be slower than if we don't find as many primes, but a user probably won't be too surprised at the idea that factoring a larger number takes longer than factoring a smaller number.
So, implementing that, we might end up with code something like this:
#include <iostream>
#include <locale>
#include <vector>
#include <string>
#include <cmath>    // std::sqrt
#include <cstdlib>  // EXIT_FAILURE

using Number = unsigned long long;

auto build_base(Number limit) {
    std::vector<bool> sieve(limit / 2, true);
    for (Number i = 3; i < limit; i += 2) {
        if (sieve[i / 2]) {
            for (Number temp = i * i; temp < limit; temp += i)
                if (temp & 1)
                    sieve[temp / 2] = false;
        }
    }
    return sieve;
}

void factor(Number input, std::vector<bool> const &candidates)
{
    while (input % 2 == 0) {
        std::cout << 2 << "\t";
        input /= 2;
    }
    for (Number i = 1; i < candidates.size(); i++) {
        if (candidates[i]) {
            auto candidate = i * 2 + 1;
            while ((input % candidate) == 0) {
                std::cout << candidate << "\t";
                input /= candidate;
            }
        }
    }
    if (input != 1)
        std::cout << input;
}

int main(int argc, char **argv) {
    std::cout.imbue(std::locale(""));
    if (argc != 2) {
        std::cerr << "Usage: factor <number>\n";
        return EXIT_FAILURE;
    }
    auto number = std::stoull(argv[1]);
    auto limit = static_cast<Number>(std::sqrt(number)) + 1;
    auto candidates = build_base(limit);
    factor(number, candidates);
}
At a high level, the code works like this: we start by finding the primes up to the square root of the number the user entered. Since we want all the primes up to a limit, we use a sieve of Eratosthenes to find them. This builds a vector of bools, in which vector[n] will be true if n is prime, and false if n is composite. It does this starting from 3 (2 is a special case we kind of ignore for now) and crossing off the multiples of three. Then it finds the next number that hasn't been crossed off (which will be five, in this case), and crosses off its multiples. It continues doing that until it reaches the end of the array. To save some space, it leaves all the even numbers out of the array, because (other than that special case for 2) we already know none of them is prime.
Once we have that, we use those prime numbers to find prime factors of the number we want to factor. This proceeds pretty simply: walk through the vector of primes, and test whether each prime number divides evenly into the target number. If it does, print it out, divide it out of the target number, and continue.
At least for me, this seems to work pretty dependably, and is reasonably fast. If we wanted to do a better job of factoring larger numbers, the next big step would be to switch to a segmented sieve. This can improve the speed of the first part of the job by a pretty wide margin, allowing us (for example) to factor anything that'll fit into a 64-bit number in no more than about 10 seconds.
Edit: Seems like the error is simply 9,999,999,999,999 being too big a number for an array.
I get this error "program received signal sigsegv segmentation fault" on my code.
Basically my code is to do an integer factorisation, and the exercise can be seen on codeabbey here. Basically I will receive input like 1000 and output them as a product of their factors, 2*2*2*5*5*5 in this case.
I do the above by having a vector of prime numbers which I generate using the Sieve of Eratosthenes method.
According to the website, the number of digits of the input will not exceed 13, hence my highest number is 9,999,999,999,999. Below is my code.
#include <iostream>
#include <vector>
#include <cstring>
unsigned long long int MAX_LIMIT = 9999999999999;
std::vector<unsigned long long int> intFactorisation (unsigned long long int num) {
std::vector<unsigned long long int> answers;
static std::vector<unsigned long long int> primes;
if (primes.empty()) { // generate prime numbers using sieve method
bool *arr = new bool[MAX_LIMIT];
memset (arr, true, MAX_LIMIT);
for (unsigned long long int x = 2; x*x < MAX_LIMIT; x++) {
if (arr[x] == true) {
for (unsigned long long int y = x*x; y < MAX_LIMIT; y += x) {
arr[y] = false; // THIS LINE ALWAYS HAS AN ERROR!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
}
}
}
for (unsigned long long int x = 2; x <= MAX_LIMIT; x++) {
if (arr[x]) {
primes.push_back(x);
}
}
}
std::vector<unsigned long long int>::iterator it = primes.begin(); // start the factorisation
while(it != primes.end()) {
if (num % *it == 0) {
answers.push_back(*it);
num/=*it;
}
else {
it++;
}
if (num == 1) {
break;
}
}
return answers;
}
int main() {
int maxi;
std::cin >> maxi;
int out[maxi];
for (int x = 0; x < maxi; x++) {
std::cin >> out[x];
}
for (auto it : out) {
std::vector<unsigned long long int> temp = intFactorisation(it);
for (std::vector<unsigned long long int>::iterator it = temp.begin();
it != temp.end(); it++) {
if (it == temp.end() - 1) {
std::cout << *it << " ";
}
else {
std::cout << *it << "*";
}
}
}
}
However, for some reason the program always terminates at arr[y] = false in the function intFactorisation. I see a notification giving the segmentation fault message at the bottom left of my screen while using Code::Blocks.
I already used 'new' on my ridiculously large array, so the memory should be on the heap. I have tried a smaller MAX_LIMIT such as 100,000 and my function works. Does anyone know why?
Also, I am wondering why I don't need to dereference my pointer arr. For example, arr[y] = false works, but *arr[y] or (*arr)[y] doesn't. Hopefully this can be clarified too.
Thanks for reading this and I appreciate any help.
There are two types of issues in the posted code: memory management and the algorithm.
Resource acquisition
The program exhibits three kinds of memory allocation:
Variable length array. In main, out is declared as int out[maxi], maxi being a variable, not a compile-time constant. This is a C99 feature; it's never been part of any C++ standard, even if some compilers offer it as an extension.
bool *arr = new bool[MAX_LIMIT];. Modern guidelines suggest avoiding bare new to allocate memory, preferring smart pointers and standard containers and in general following the RAII idiom, but that's not even the major problem here. MAX_LIMIT is so big that the allocation will almost certainly throw std::bad_alloc, and delete is never called.
In the same function there's also a loop which would end up accessing the (unlikely to be) allocated memory out of bounds: for (unsigned long long int x = 2; x <= MAX_LIMIT; x++) { if (arr[x]) {. x will eventually equal MAX_LIMIT, one past the last element, but you'll run out of memory well before that.
std::vector. That would be the way to go, if only the program didn't try to fill the vector primes with all the primes up to MAX_LIMIT, which is 10^13 - 1, or almost 80 TB assuming a 64-bit type.
Algorithm
The idea is to first calculate all the possible primes and then, for each inputted number, check whether any of them is a factor. The problem is that the maximum possible number is really big, but the good news is that you don't need to calculate and store all the primes up to that number, just up to its square root.
Imagine having tried all the primes up to that square root (call it S), dividing the original number by each factor found. Whatever remains is not divisible by any of those primes, so it must itself be prime: every possible factor <= S has been ruled out, and it cannot be the product of two factors greater than S, because that product would exceed the original number.
From a design point of view, I'd also address how the primes are calculated and stored in the OP's code. The factorization function is basically written as follows.
unsigned long long int MAX_LIMIT = 9999999999999;
std::vector<unsigned long long int> intFactorisation (unsigned long long int num)
{
static std::vector<unsigned long long int> primes;
// ^^^^^^ ^
if (primes.empty())
{
// generate prime numbers up to MAX_LIMIT using sieve method
}
// Use the primes...
}
The static vector could have been initialized using a separate function and also declared const, but given the close relation with the factorization function, it could be better to wrap those data and functionalities into a class responsible for the correct allocation and initialization of the resources.
Since the introduction of lambdas into the language, we can avoid most of the boilerplate associated with a normal functor class, so the factorization function might be constructed as a stateful lambda returned by the following:
#include <cmath>
#include <cstdint>
#include <vector>

auto make_factorizer(uint64_t max_value)
{
    uint64_t sqrt_of_max = std::ceil(std::sqrt(max_value));
    // Assuming a function returning a vector of primes in ascending order.
    // The returned lambda accepts the number to be factorized and a
    // reference to a vector where the factors will be stored.
    return [primes = all_primes_up_to(sqrt_of_max)](uint64_t number,
                                                    std::vector<uint64_t>& factors)
    {
        uint64_t n{number};
        factors.clear();
        // Test against all the known primes, in ascending order.
        for (auto prime : primes)
        {
            // Stop early once prime > sqrt(n); note the strict '<', so that
            // a prime exactly equal to sqrt(n) is still tested (e.g. n == 49).
            if (n / prime < prime)
                break;
            // Keep "consuming" the number with the same prime;
            // a factor can have a multiplicity greater than one.
            while (n % prime == 0)
            {
                n /= prime;
                factors.push_back(prime);
            }
        }
        // If n == 1, all the factors have already been found and stored.
        // If we've run out of primes or broken out of the loop, either
        // what remains is a prime or number was <= 1.
        if (n != 1 || n == number)
        {
            factors.push_back(n);
        }
    };
}
A live implementation can be tested here.
A simpler solution is to find the prime factors on the fly rather than first listing all primes and then checking whether each is a factor of the given number.
Here is a function that finds the factors without allocating large amounts of memory:
std::vector<unsigned long long int> intFactorisation(unsigned long long int num) {
    std::vector<unsigned long long int> answers{};
    unsigned long long int Current_Prime = 2;
    bool found = false;
    while (num != 1 && num != 0)
    {
        // Push the same factor repeatedly until num is no longer divisible:
        // for input 12 this pushes 2, 2.
        while (num % Current_Prime == 0) {
            answers.push_back(Current_Prime);
            num /= Current_Prime;
        }
        // Find the next prime factor candidate.
        while (Current_Prime <= num) {
            Current_Prime++;
            found = true;
            // Trial division up to and including sqrt(Current_Prime).
            for (unsigned long long int x = 2; x * x <= Current_Prime; x++)
            {
                if (Current_Prime % x == 0) {
                    found = false;
                    break;
                }
            }
            if (found == true)
                break;
        }
    }
    return answers;
}
I'm given two numbers: first a natural number n, and second an n-digit number. The range of n is 1 <= n <= 50000. The problem is how to square big numbers with, for example, 49000 digits. I tried working with a string, which gives me an array of digits, but what then? Write a function that multiplies the numbers as strings? I have no idea how to start. Any ideas?
EDIT
I check whether a number is automorphic, but how do I adapt this to work with numbers up to 50000 digits?
#include <cstdlib>
#include <iostream>
using namespace std;
int main() {
    unsigned int n, m = 10, a, b;
    cin >> n;
    b = m;
    while (n > b) {
        b *= m;
    }
    a = (n * n) % b;
    if (a == n)
        cout << "OK";
    else
        cout << "NO";
    return 0;
}
There are various ways of dealing with big integers in C++:
Using a library, like boost::xint, Matt McCutchen's bigint, InfInt, etc.
Doing the needed operations by hand (if few operations are needed, you can implement them yourself). In this case you only need multiplication (the modulus is by a power of 10 and easily implemented): you could use, for example, std::vector<unsigned char> to store the digits of n and do the multiplication as taught in school, digit by digit, reporting the last digits needed.
Note: you only need to compute the part of the multiplication that produces the last digits needed (taking care with how many digits that requires). Even for 5000 digits, doing the full multiplication would finish lightning fast.
Here x, y <= 10^12 and y - x <= 10^6.
I have looped from left to right and checked each number for primality. This method is very slow when x and y are around 10^11 or 10^12. Is there a faster approach?
I have stored all primes up to 10^6. Can I use them to find primes between huge values like 10^10 and 10^12?
for (i = x; i <= y; i++)
{
    num = i;
    if (check(num))
    {
        res++;
    }
}
My check function:
int check(long long int num)
{
    long long int i;
    if (num <= 1)
        return 0;
    if (num == 2)
        return 1;
    if (num % 2 == 0)
        return 0;
    long long int sRoot = sqrt(num * 1.0);
    for (i = 3; i <= sRoot; i += 2)
    {
        if (num % i == 0)
            return 0;
    }
    return 1;
}
Use a segmented sieve of Eratosthenes.
That is, use a bit set to store the numbers between x and y, represented by x as an offset and a bit set for [0, y-x). Then sieve (eliminate multiples) with all the primes less than or equal to the square root of y. The numbers that remain in the set are prime.
With y at most 10^12 you have to sieve with primes up to at most 10^6, which will take less than a second in a proper implementation.
This resource goes through a number of prime-search algorithms of increasing complexity/efficiency. Here's the description of the best of them, PG7.8 (you'll have to translate it back to C++; that shouldn't be too hard):
This algorithm efficiently selects potential primes by eliminating multiples of previously identified primes from consideration and
minimizes the number of tests which must be performed to verify the
primacy of each potential prime. While the efficiency of selecting
potential primes allows the program to sift through a greater range of
numbers per second the longer the program is run, the number of tests
which need to be performed on each potential prime does continue to
rise, (but rises at a slower rate compared to other algorithms).
Together, these processes bring greater efficiency to generating prime
numbers, making the generation of even 10 digit verified primes
possible within a reasonable amount of time on a PC.
Further skip sets can be developed to eliminate the selection of potential primes which can be factored by each prime that has already
been identified. Although this process is more complex, it can be
generalized and made somewhat elegant. At the same time, we can
continue to eliminate from the set of test primes each of the primes
which the skip sets eliminate multiples of, minimizing the number of
tests which must be performed on each potential prime.
You can use the Sieve of Eratosthenes algorithm. This page has some links to implementations in various languages: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes.
Here is my implementation of the Sieve of Eratosthenes:
#include <string>
#include <iostream>
using namespace std;
const int k = 110000; // change this constant to the maximum int you need to calculate
long int p[k + 1];    // the sieve from 2 to k; p[i] ends up holding the largest prime factor of i

void init_prime() // fill the array
{
    for (int i = 2; i <= k; i++)
    {
        if (p[i] == 0) // i was not marked by any smaller prime, so it is prime
        {
            for (long int j = i; j <= k; j += i)
            {
                p[j] = i;
            }
        }
    }
    /*for (int i = 2; i <= k; i++)
        cout << p[i] << endl;*/ // uncomment this to see the result of the initialization
}

string prime(int first, int last) // an example of how to use the initialized array
{
    string result = "";
    for (int i = first; i <= last; i++)
    {
        if (p[i] == i) // i is prime exactly when its largest prime factor is itself
            result = result + to_string(i) + " ";
    }
    return result;
}

int main() // written for a contest: the first input is the number of cases ("nocases"), then each case is a range
{
    int nocases, first, last;
    init_prime();
    cin >> nocases;
    for (int i = 1; i <= nocases; i++)
    {
        cin >> first >> last;
        cout << prime(first, last);
    }
    return 0;
}
You can use the Sieve of Eratosthenes to compute factorizations too. This is actually the fastest implementation of the sieve I could manage to create that day (it can calculate the sieve for this range in less than a second).
I'm working on Euler Problem 14:
http://projecteuler.net/index.php?section=problems&id=14
I figured the best way would be to create a vector of numbers that kept track of how big the series was for each number. For example, from 5 there are 6 steps to 1, so if I ever reach the number 5 in a series, I know I have 6 more steps to go and no need to calculate them. With this idea I coded up the following:
#include <iostream>
#include <vector>
#include <iomanip>
using namespace std;
int main()
{
vector<int> sizes(1);
sizes.push_back(1);
sizes.push_back(2);
int series, largest = 0, j;
for (int i = 3; i <= 1000000; i++)
{
series = 0;
j = i;
while (j > (sizes.size()-1))
{
if (j%2)
{
j=(3*j+1)/2;
series+=2;
}
else
{
j=j/2;
series++;
}
}
series+=sizes[j];
sizes.push_back(series);
if (series>largest)
largest=series;
cout << setw(7) << right << i << "::" << setw(5) << right << series << endl;
}
cout << largest << endl;
return 0;
}
It seems to work relatively well for smaller numbers but this specific program stalls at the number 113382. Can anyone explain to me how I would go about figuring out why it freezes at this number?
Is there some way I could modify my algorithm to be better? I realize that I am creating duplicates the way I'm currently doing it:
for example, the series of 3 is 3,10,5,16,8,4,2,1. So I already figured out the sizes for 10,5,16,8,4,2,1 but I will duplicate those solutions later.
Thanks for your help!
Have you ruled out integer overflow? Can you guarantee that the result of (3*j+1)/2 will always fit into an int?
Does the result change if you switch to a larger data type?
EDIT: The last forum post at http://forums.sun.com/thread.jspa?threadID=5427293 seems to confirm this. I found this by googling for 113382 3n+1.
I think you are severely overcomplicating things. Why are you even using vectors for this?
Your problem, I think, is overflow. Use unsigned ints everywhere.
Here's working code that's much simpler (it doesn't work with signed ints, however).
#include <cstdio>

int main()
{
    unsigned int maxTerms = 0;
    unsigned int longest = 0;
    for (unsigned int i = 3; i <= 1000000; ++i)
    {
        unsigned int tempTerms = 1;
        unsigned int j = i;
        while (j != 1)
        {
            ++tempTerms;
            if (tempTerms > maxTerms)
            {
                maxTerms = tempTerms;
                longest = i;
            }
            if (j % 2 == 0)
            {
                j /= 2;
            }
            else
            {
                j = 3*j + 1;
            }
        }
    }
    printf("%u %u\n", maxTerms, longest);
    return 0;
}
Optimize from there if you really want to.
When i = 113383, your j overflows and becomes negative (thus never exiting the "while" loop).
I had to use "unsigned long int" for this problem.
The problem is overflow. Just because the sequence starts below 1 million does not mean that it cannot go above 1 million later. In this particular case, it overflows and goes negative resulting in your code going into an infinite loop. I changed your code to use "long long" and this makes it work.
But how did I find this out? I compiled your code and then ran it in a debugger. I paused the program execution while it was in the loop and inspected the variables. There I found that j was negative. That pretty much told me all I needed to know. To be sure, I added a cout << j; as well as an assert(j > 0) and confirmed that j was overflowing.
I would try using a large array rather than a vector, then you will be able to avoid those duplicates you mention as for every number you calculate you can check if it's in the array, and if not, add it. It's probably actually more memory efficient that way too. Also, you might want to try using unsigned long as it's not clear at first glance how large these numbers will get.
I stored the length of the chain for every number in an array, and during the brute force, whenever I got a number less than the one being evaluated, I just added the chain length for that lower number and broke out of the loop.
For example, I already know the Collatz sequence for 10 is 7 terms long.
Now when I'm evaluating 13, I get 40, then 20, then 10, which I have already evaluated, so the total count is 3 + 7.
The result on my machine (for up to 1 million) was 0.2 seconds; with pure brute force it was 5 seconds.