Sieve of Eratosthenes calculator -- running into memory issues and crashing with numbers >= 1,000,000

I'm not exactly sure why this is. I tried changing the variables to long long, and I tried a few other things, but it's either the inefficiency of my code (it literally finds all primes up to the number, then checks each one to see whether it divides the number -- very inefficient, but it's my first attempt at this and I feel pretty accomplished having it work at all...)
or the fact that it overflows the stack. I'm not sure where exactly it happens, but all I know is that it MUST be related to memory and the way it's dealing with the number.
If I had to guess, I'd say it's a memory issue that happens during the prime number generation up to that number -- that's where it dies even if I remove the check against the input number.
I'll post my code. Just be aware that I didn't change long long back to int in a few places, and I also have a SquareRoot variable that is not used; it was supposed to help memory efficiency but was not effective the way I tried it, and I just never deleted it. I will clean up the code when and if I can successfully finish it.
As far as I am aware, though, it DOES work pretty reliably for 999,999 and down. I checked it against other calculators of the same type and it seemingly generates the proper answers.
If anyone can help or explain what I screwed up here, you're helping a guy trying to learn on his own without any school or anything, so it's appreciated.
#include <iostream>
#include <cmath>
void sieve(int ubound, int primes[]);
int main()
{
long long n;
int i;
std::cout << "Input Number: ";
std::cin >> n;
if (n < 2) {
return 1;
}
long long upperbound = n;
int A[upperbound];
int SquareRoot = sqrt(upperbound);
sieve(upperbound, A);
for (i = 0; i < upperbound; i++) {
if (A[i] == 1 && upperbound % i == 0) {
std::cout << " " << i << " ";
}
}
return 0;
}
void sieve(int ubound, int primes[])
{
long long i, j, m;
for (i = 0; i < ubound; i++) {
primes[i] = 1;
}
primes[0] = 0, primes[1] = 0;
for (i = 2; i < ubound; i++) {
for(j = i * i; j < ubound; j += i) {
primes[j] = 0;
}
}
}

If you use legal C++ constructs instead of non-standard variable-length arrays, your code will run (whether it produces the correct answers is another question).
The issue is more than likely that you're exceeding the limits of the stack when you declare arrays with a million or more elements.
Therefore instead of this:
long long upperbound = n;
int A[upperbound];
Use std::vector:
#include <vector>
//...
long long upperbound = n;
std::vector<int> A(upperbound);
and then:
sieve(upperbound, A.data());
The std::vector does not use the stack space to allocate its elements (unless you have written an allocator for it that uses the stack).
As a matter of fact, you don't even need to pass upperbound to sieve, as a std::vector knows its own size by calling the size() member function. But I leave that as an exercise.
Live example using 2,000,000
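To illustrate the point about heap storage, here is a minimal sketch of a sieve that owns its table in a std::vector and takes only the limit as input. The function name make_sieve and the use of std::vector<bool> are choices made for this example, not part of the code above.

```cpp
#include <cstddef>
#include <vector>

// Returns a table in which is_prime[i] is true exactly when i is prime,
// for 0 <= i <= limit. The elements live on the heap (inside the vector),
// so a limit of several million does not exhaust the stack.
std::vector<bool> make_sieve(std::size_t limit)
{
    std::vector<bool> is_prime(limit + 1, true);
    is_prime[0] = false;
    if (limit >= 1) {
        is_prime[1] = false;
    }
    // Only values of i with i * i <= limit still have multiples to cross out.
    for (std::size_t i = 2; i * i <= limit; ++i) {
        if (!is_prime[i]) {
            continue;  // multiples of a composite are already crossed out
        }
        for (std::size_t j = i * i; j <= limit; j += i) {
            is_prime[j] = false;
        }
    }
    return is_prime;
}
```

A caller would write something like auto table = make_sieve(2000000); and then test table[i]. The table knows its own size through table.size(), so no separate bound has to be passed around.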

First of all, read and apply PaulMcKenzie's advice. That's the most important thing. I'm only addressing some teeny bits of your question that remained open.
It seems that you are trying to factor the number that you misleadingly called upperbound. The mysterious role of the square root of this number is related to this fact: if the number is composite at all - and hence can be computed as the product of some prime factors - then the smallest of these prime factors cannot be greater than the square root of the number. In fact, only one factor can possibly be greater, all others cannot exceed the square root.
However, in its present form your code cannot draw advantage from this fact. The trial division loop as it stands now has to run up to number_to_be_factored / 2 in order not to miss any factors because its body looks like this:
if (sieve[i] == 1 && number_to_be_factored % i == 0) {
std::cout << " " << i << " ";
}
You can factor much more efficiently if you refactor your code a bit: when you have found the smallest prime factor p of your number then the remaining factors to be found must be precisely those of rest = number_to_be_factored / p (or n = n / p, if you will), and none of the remaining factors can be smaller than p. However, don't forget that p might occur more than once as a factor.
During any round of the proceedings you only need to consider the prime factors between p and the square root of the current number; if none of those primes divides the current number then it must be prime. To test whether p exceeds the square root of some number n you can use if (p * p > n), which is computationally more efficient than actually computing the square root.
Hence the square root occurs in two different roles:
the square root of the number to be factored limits the amount of sieving that needs to be done
during the trial division loop, the square root of the current number gives an upper bound for the highest prime factor that you need to consider
That's two faces of the same coin but two different usages in the actual code.
Note: once you have your code working by applying PaulMcKenzie's advice, you might also consider posting it over on Code Review.
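The factoring scheme described above (divide out the smallest factor repeatedly and stop once p * p exceeds what is left) can be sketched as follows. This is only an illustration: the name prime_factors is made up for the example, and it divides by every integer rather than by sieved primes, which still works because a composite p can never divide the remainder once its prime factors have been removed.

```cpp
#include <vector>

// Returns the prime factors of n in ascending order, repeated by multiplicity.
std::vector<unsigned long long> prime_factors(unsigned long long n)
{
    std::vector<unsigned long long> factors;
    // "p <= n / p" is the same test as "p * p <= n" but cannot overflow.
    for (unsigned long long p = 2; p <= n / p; ++p) {
        while (n % p == 0) {  // p may occur more than once
            factors.push_back(p);
            n /= p;           // continue with the remaining cofactor
        }
    }
    if (n > 1) {
        factors.push_back(n);  // whatever is left has no factor <= its square root, so it is prime
    }
    return factors;
}
```

Because n shrinks every time a factor is found, the loop bound shrinks with it, which is the second role of the square root mentioned above.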

Problem with numbers and power of numbers

Problem:
In a given range (a, b) ( a <= b, 2 <= a, b <= 1000000 ) find all natural numbers that can be expressed in format x ^ n ( x and n are natural numbers ). If there are more than one possibility to present expressed number, present it with a bigger exponential value.
Input (U1.txt):
40 110
Output (screen):
49 = 7^2; 64 = 2^6; 81 = 3^4; 100 = 10^2;
#include <iostream>
#include <fstream>
#include <cmath>
int Power(int number, int base);
int main()
{
int a, b;
std::ifstream fin("U1.txt");
fin >> a >> b;
fin.close();
for (int i = a; i <= b; i++)
{
int max_power = 0;
int min_base = 10;
bool found = false;
for (int j = 2; j <= 10; j++)
{
int power = Power(i, j);
if (power > 0)
{
if (max_power < power) { max_power = power; }
if (min_base > j) { min_base = j; }
found = true;
}
}
if (found)
{
std::cout << i << " = " << min_base << " ^ " << max_power << "; ";
}
}
return 0;
}
int Power(int number, int base)
{
int power = (log(number) / log(base) + 0.5);
if (pow(base, power) == number)
{
return power;
}
return 0;
}
I solved the problem. However, I don't understand a few things:
How the int Power(int number, int base) function works. Why the log function is used? Why after division of two log functions the 0.5 is added? I found the Idea on the Internet.
I am not sure if this solution works on all cases. I didn't know what could be the biggest value of the base number so my for (int j = 2; j <= 10; j++) loop is going from 2 to 10. If there is a number that base is bigger the solution won't work.
Are there any easier ways to solve this problem?

How does the function work?
That's something the OP should have asked to the authors of that snippet (assuming it was copied verbatim or close).
The intent seems to be to check whether a whole number power exists such that, in combination with the integral arguments number and base, the following equation is satisfied:
number = base^power
The function returns it, or 0 if it doesn't exist, meaning that number is not an integral power of some integral base. To do so,
it uses a property of the logarithms:
n = b^p
log(n) = p log(b)
p = log(n) / log(b)
it rounds the number[1] to the "closest" integer, to avoid cases where the limited precision of floating-point types and operations would have yielded incorrect results in case of a simple truncation.
In the comments I've already made the example of std::log(1000)/std::log(10), which may produce a double result close to 3.0, but less than 3.0 (something like 2.9999999999999996). When stored in an int it would be truncated to 2.
It checks whether the number found is the exact power that solves the previous equation, but that comparison has the same problems I mentioned before.
pow(base, power) == number // It compares a double with an int
Just like std::log, std::pow returns a double value, making all the calculations performed with those functions prone to subtle numerical errors (either by rounding or by accumulation when multiple operations are involved). It's often preferable to use integral types and operations, if possible, when accuracy (or absolute exactness[2]) is needed.
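As an illustration of the integer-only alternative, the following sketch performs the same check as Power without std::log or std::pow. The name exact_power is hypothetical, and the sketch assumes number >= 1, base >= 2, and that number * base fits in a long long (true for the limits of this problem).

```cpp
// Returns the exponent p >= 1 for which base^p == number, or 0 if none exists.
// Only integer multiplication is used, so there is no rounding error.
int exact_power(long long number, long long base)
{
    int exponent = 0;
    long long value = 1;  // invariant: value == base^exponent
    while (value < number) {
        value *= base;
        ++exponent;
    }
    return value == number ? exponent : 0;
}
```

With this version exact_power(1000, 10) is 3 regardless of how the logarithms would have rounded.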
Is the algorithm correct?
I didn't know what could be the biggest value of the base number so my for loop is going from 2 to 10
That's just wrong. One of the constraints of the problem is b <= 1'000'000, but the posted solution cannot find any power whose base is greater than 10 (121 = 11^2, for example).
An estimate of the greatest possible base is the square root of said b.
Are there any easier ways to solve this problem?
Easiness is subjective and we don't know all the requirements and constraints of OP's assignment. I'll describe an alternative solution without posting the code I wrote to test it[3].
OP's code considers all the numbers between a and b checking for every (well, up to 10) base if there exists a whole power.
My proposal uses only integral variables, of a wide enough type, say long (any 32-bit integer is enough).
The outer loop starts from base = 2 and increments it by one at every step.
Inside this loop, exponent is set to 2 and value to base * base
If value is greater than b, the algorithm stops.
While value is less than a, update it (multiplying it by base) and increment the exponent by one. We need to find the first power of base which is greater than or equal to a.
While value is less than or equal to b, store the triplet of variables value, base and exponent in suitable container.
Consider a std::map<long, std::pair<long, long>>, it lets us associate all the values with the corresponding pair of base and exponent. Also, it could be later traversed to obtain all the values in ascending order.
The assignment requires, in case of multiple powers, presenting only the one with the bigger exponent. In the example, it shows 64 = 2^6, ignoring 64 = 4^3. Note that the needed one is the one with the smaller base, so it's enough to ignore any further value that is already present in the map.
value and exponent are updated as before.
Note that this algorithm only considers bases up to the square root of b (in the outer loop), and the number of iterations of the inner loop is much more limited (with base = 2, it would be fewer than 20, since 2^20 > 1'000'000; greater bases stop sooner and sooner).
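For readers who want to see the steps above in code, here is one possible sketch. It is not the code the answer's author tested; the function name perfect_powers is invented for the example, and it assumes 2 <= a <= b <= 1'000'000 so that every intermediate product fits in a 32-bit long.

```cpp
#include <map>
#include <utility>

// Maps every perfect power in [a, b] to the (base, exponent) pair
// with the largest exponent (equivalently, the smallest base).
std::map<long, std::pair<long, long>> perfect_powers(long a, long b)
{
    std::map<long, std::pair<long, long>> found;
    for (long base = 2; base * base <= b; ++base) {
        long value = base * base;
        long exponent = 2;
        while (value <= b) {
            if (value >= a) {
                // emplace does nothing if the key exists, so the entry made
                // by a smaller base (larger exponent) is never overwritten.
                found.emplace(value, std::make_pair(base, exponent));
            }
            value *= base;
            ++exponent;
        }
    }
    return found;
}
```

Iterating over the returned map visits the values in ascending order, which matches the order of the expected output for the sample input 40 110.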
[1] See e.g. Why do lots of (old) programs use floor(0.5 + input) instead of round(input)?
[2] See e.g. The most efficient way to implement an integer based power function pow(int, int)
[3] How do I ask and answer homework questions?

Odds of winning lottery C++

I have an assignment that asks us to make a program in C++ that takes input from a user for the amount of numbers on a lottery ticket and the amount of numbers in a lottery drawing. It should then calculate the odds of the user getting the numbers correct. This is (more or less) the first program I am writing in C++, so I am new to this. What I have so far is below. I am seeking help with making the program work. I can get values in for the declared variables, but cannot figure out how to write down what I actually need to do, which is a factorial function. I know the function, I just don't know how to say it in C++.
From what I understand at this point is that it should look something like this:
for (int i = 1; i <= k; i++) {
result = (result * (n+1-i)) / i;
or something to that effect?.... at least this is what I have come across in the past couple of hours of searching for an answer online. I think I am getting close to figuring it out but I am at a road block.
I don't want someone to just tell me the answer. If you could explain to me what I am doing wrong and what I can do to fix it that would be most helpful for me.
#include <iostream>
#include <iomanip>
using namespace std;
int main (int argc, char** argv)
{
int n, k;
int odds;
cout<< "How many numbers are printed on the lottery ticket? ";
cin >> n ;
cout<<"How may numbers are selected in the lottery drawing? ";
cin >> k ;
cout << "You entered " << n << " for how many numbers are printed on the lottery ticket, and "
<< k << " for how many numbers are selected in the lottery drawing." << endl;
for (int i = 1; i <= k; i++)
{
odds = (n * (n-k++))/k;
cout << odds;
}
return 0;
}
When I run this I just get an endless stream of "3-3-3-3....". It's non-stop. At one point I was getting a number as the output (one VERY large incorrect number), but while I was tinkering with it I couldn't get it back.
Any guidance would be appreciated.

This seems slightly difficult for a first assignment, unless you're most of the way through a computer science curriculum and only new to C++.
The formula for the odds, which is commonly known as "number of combinations", is frequently written in terms of factorials. But you can't manipulate those factorials effectively on a computer; they are far too large for any of the built-in data types.
Instead, it's important to cancel like terms from numerator and denominator. Interleaving multiplications and divisions can help even more.
I've previously posted working code for number of combinations on another question:
Number of combinations (N choose R) in C++
Your current code actually does have things interleaved pretty well, but you haven't been at all careful with the meanings of i and k and n, and you've also got undefined behavior from both reading and writing a variable between sequence points.
Specifically, this is illegal because the k in the denominator is unstable, since it is in the process of being incremented:
odds = n*(n-k++)/k;
You shouldn't be changing k here at all. The value varying from 1 to k is i. So this becomes:
odds = n * (n-i) / i;
You need all the terms to accumulate across loop iterations, so you should be multiplying by the previous odds value:
odds = odds * (n - i) / i;
But you do need n - 0 in the numerator, and no 0 in the denominator. You've chosen to make i one-based, so it's the numerator that needs to be adjusted:
odds = odds * (n + 1 - i) / i;
And now your code is extremely close to mine. Depending on your values of n and k you might still overflow. Changing the data type of odds to long long or double should help with that.
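Putting the corrected update into a complete function gives something like the sketch below. The name choose and the symmetry shortcut are additions for this example. Each division is exact, because after step i the running value equals the number of ways to pick i items out of n.

```cpp
// Number of ways to choose k items out of n, by interleaving multiplication
// and division. The intermediate product result * (n + 1 - i) can still
// overflow for large inputs, slightly before the final result would.
unsigned long long choose(unsigned long long n, unsigned long long k)
{
    if (k > n) {
        return 0;
    }
    if (k > n - k) {
        k = n - k;  // C(n, k) == C(n, n - k); the smaller k means fewer steps
    }
    unsigned long long result = 1;  // must start at 1, not be left uninitialized
    for (unsigned long long i = 1; i <= k; ++i) {
        result = result * (n + 1 - i) / i;
    }
    return result;
}
```

For a 6-out-of-49 lottery, choose(49, 6) gives 13983816, so the odds of matching all numbers are 1 in 13,983,816.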

This is the formula you need:
http://en.wikipedia.org/wiki/Lottery_mathematics
Make sure that you have the mathematics well in hand. Start with a function that implements that formula.
Once you have the formula in hand, you'll realize that the naive student factorial will never work. The biggest naive factorial you can have with a long is 20!; after that it overflows.
The right way to do it is logarithms and gamma function:
https://en.wikipedia.org/wiki/Gamma_function
So that formula will turn into:
ln(n!/(k!(n-k)!)) = ln(n!) - ln(k!) - ln((n-k)!)
But since gamma(n+1) = n!, this is
lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1)
The gamma function returns doubles, not integers or longs. It'll behave much better for you.
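A small sketch of the logarithmic approach, using std::lgamma from <cmath> (available since C++11). The function names are invented for the example, and the result is a floating-point approximation rather than an exact integer. The identity used is ln C(n, k) = lgamma(n+1) - lgamma(k+1) - lgamma(n-k+1).

```cpp
#include <cmath>

// Natural logarithm of "n choose k", computed without forming any factorial.
double log_choose(double n, double k)
{
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Approximate value of "n choose k" as a double.
double choose_approx(double n, double k)
{
    return std::exp(log_choose(n, k));
}
```

For moderate inputs, rounding choose_approx(n, k) with std::llround recovers the exact count; for very large inputs only the logarithm stays representable, which is the situation this approach is meant for.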

Problems with program which find prime number from 1 to 100

I wrote this program, which finds and displays the prime numbers from 1 to 100:
int ifprime (int n)
{
int i=1;
while (i<= n)
{
i++;
if (n%i == 0)
{
return false;
break;
}
else continue;
}
return true;
}
int prime_numbers (void)
{
bool result;
for (int i = 2; i<=100; ++i)
{
result = ifprime(i);
if (result==true) cout<<i<<endl;
else continue;
}
}
int main()
{
prime_numbers();
return 0;
}
The program displays nothing. Why?

Change your loop to:
for(int i=2; i<n; i++){
if(n%i==0){
return false;
}
}
Or your while end condition to:
while(i < n-1)

As pointed out in the comments, every non-zero number is divisible by 1 and itself, so the loop must never test i == n. Because i is incremented before the test, change this line (line 4)
while (i<= n)
to
while (i < n-1)

If that's your whole code, then you're simply missing a main() function.
Though it shouldn't link without a main() function.
There are some additional problems with your code, but that's probably the reason why you don't see any output.
Try adding this to your file:
int main()
{
prime_numbers();
return 0;
}

Many people have pointed out the problem with:
while (i <= n)
...because you allow i to reach n in your while loop, and every natural number is divisible by itself, so it wrongly accuses prime numbers of being composite. As people pointed out, the quick fix is to stop before i reaches n; if the i++ is moved to the end of the loop body, that is:
while (i < n)
But the reason I reply is that there are other things you can do to make your code better. The first improvement is that you don't need to try dividing by numbers greater than the square root of n, because if there is a divisor greater than it, then there is also a divisor less than it. So you could do something like this:
while (i*i <= n)
But there are further improvements you can do on that. For example, why should you have to compute i*i every iteration? If you pre-compute square root of n (rounded to int), then you can avoid that computation.
Another optimization is that you can avoid trial dividing by half of the numbers: if n is not divisible by 2, then no reason to try any other even numbers. So you can jump i by 2 every time in your inner loop. There are other tricks if you want to eliminate trial dividing by numbers divisible by 3.
Really, however, there is a nice super-duper fast algorithm to find all primes up to x if you don't mind using on the order of x bytes of memory. It is called the sieve of Eratosthenes, and it is really fun to implement. Once you get your current code optimized, I recommend trying the sieve.
The problem of finding prime numbers efficiently has received an enormous amount of attention in the academic literature, and it is now considered solved. But it takes a lot of study to learn it.
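The trial-division improvements mentioned above (stop at the square root, skip even divisors) could look like the sketch below. The name is_prime and the exact loop shape are choices for this example, not the original poster's code.

```cpp
// Trial division that stops at the square root and skips even divisors.
bool is_prime(int n)
{
    if (n < 2) {
        return false;  // 0 and 1 are not prime
    }
    if (n < 4) {
        return true;   // 2 and 3
    }
    if (n % 2 == 0) {
        return false;  // even numbers greater than 2
    }
    // "i <= n / i" is the same test as "i * i <= n" without overflow risk.
    for (int i = 3; i <= n / i; i += 2) {
        if (n % i == 0) {
            return false;
        }
    }
    return true;
}
```

Calling is_prime(i) for i from 2 to 100 and printing the hits gives the 25 primes below 100.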

Need a way to make this code run faster

I'm trying to solve Project Euler problem 401. The only way I could find to solve it was brute force. I've been running this code for about 10 minutes without any answer. Can anyone help me with ideas to improve it?
Code:
#include <iostream>
#include <cmath>
#define ull unsigned long long
using namespace std;
ull sigma2(ull n);
ull SIGMA2(ull n);
int main()
{
ull ans = SIGMA2(1000000000000000) % 1000000000;
cout << "Answer: " << ans << endl;
cin.get();
cin.ignore();
return 0;
}
ull sigma2(ull n)
{
ull sum = 0;
for(ull i = 1; i<=floor(sqrt(n)); i++)
{
if(n%i == 0)
{
sum += (i*i)+((n/i)*(n/i));
}
if(i*i == n)
{
sum -= n;
}
}
return sum;
}
ull SIGMA2(ull n)
{
ull sum = 0;
for(ull i = 1; i<=n; i++)
{
sum+=sigma2(i);
}
return sum;
}

You're missing some divisors: if a/b = c and b is a divisor of a, then c is also a divisor of a, but c might be greater than floor(sqrt(a)); for example, 3 > floor(sqrt(6)) but 3 divides 6.
Then you should store floor(sqrt(n)) in a variable and use that variable in the for loop; otherwise you recalculate it at every iteration, which is very expensive.

You can do some straightforward optimizations:
inline sigma2,
calculate floor(sqrt(n)) before the loop (but compiler may be doing it anyway, though),
precalculate squares of all ints from 1 to n and then use array lookup instead of multiplication
You will gain more by changing your approach. Think what you are trying to do - summing squares of all divisors of all integers from 1 to n. You grouped divisors by what they divide, but you can regroup terms in this sum. Let's group divisors by their value:
1 divides everything so it will appear n times in the sum, bringing 1*1*n total,
2 divides evens and will appear n/2 (integer division!) times, bringing 2*2*(n/2) total,
k ... will bring k*k*(n/k) total.
So we should just add up k*k*(n/k) for k from 1 to n.
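The regrouped sum can be sketched as below. The function name is made up for the example, and the sketch assumes mod is at most 2^32 so that every product of two reduced values fits in an unsigned long long. It is still a loop over all k up to n, so it shows the regrouping but is not fast enough for n = 10^15 as written.

```cpp
// Sum over k = 1..n of k*k*(n/k), reduced modulo mod at every step.
// This equals the sum of sigma2(i) for i = 1..n (mod mod), because the
// divisor k appears in exactly n/k of the integers from 1 to n.
unsigned long long sum_of_divisor_squares_mod(unsigned long long n,
                                              unsigned long long mod)
{
    unsigned long long sum = 0;
    for (unsigned long long k = 1; k <= n; ++k) {
        unsigned long long k_mod = k % mod;
        unsigned long long square = (k_mod * k_mod) % mod;  // k*k mod mod
        unsigned long long times = (n / k) % mod;           // how often k divides
        sum = (sum + square * times) % mod;
    }
    return sum;
}
```

For small n this reproduces the brute-force values, for instance 113 for n = 6.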

Think about the problem.
Bruteforce the way you tried is obviously not a good idea.
You should come up with something better...
Isn't there any method how to use some nice prime factorization method to speed up the computation? Isn't there any recursion pattern? Try to find something...

One simple optimization that you can carry out is that there will be many repeated factors in the numbers.
So first estimate in how many numbers would 1 be a factor ( all N numbers ).
In how many numbers would 2 be a factor ( N/2 ).
...
Similarly for others.
Just multiply their squares with their frequency.
Time complexity shall then straight-away reduce to O(N)

There are obvious micro-optimizations, such as using ++i rather than i++, hoisting floor(sqrt(n)) out of the loop (these are two floating-point operations, which are really expensive compared to the integer operations in the loop), and calculating n/i only once (store it in a temporary variable and then square that variable).
There are also rather obvious simplifications in the algorithm. For example SIGMA2(i) = SIGMA2(i-1) + sigma2(i). But do not use recursion since you need a really huge number, this would not work and your stack memory would be exhausted. Use loop instead of recursion. There is a huge potential for improvement.
And well, there is a bigger problem - 10^15 has 15 digits. This number squared has 30 digits. There is no way you can store this into unsigned long long, which has I think about 20 digits. So you need to employ somehow the modulo 10^9 (the end of the assignment) and get additional space for your calculations...
And when using brute force, print the intermediate result every million numbers, for example, to give you an idea of how fast you are approaching the final result. Waiting 10 minutes blindly is not a good idea.

Optimizing my code for finding the factors of a given integer

Here is my code, but I'd like to optimize it. I don't like the idea of it testing all the numbers up to the square root of n, considering that one could be faced with finding the factors of a large number. Your answers would be of great help. Thanks in advance.
unsigned int* factor(unsigned int n)
{
unsigned int tab[40];
int dim=0;
for(int i=2;i<=(int)sqrt(n);++i)
{
while(n%i==0)
{
tab[dim++]=i;
n/=i;
}
}
if(n>1)
tab[dim++]=n;
return tab;
}

Here's a suggestion on how to do this in 'proper' c++ (since you tagged as c++).
PS. Almost forgot to mention: I optimized the call to sqrt away :)
See it live on http://liveworkspace.org/code/6e2fcc2f7956fafbf637b54be2db014a
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef unsigned int uint;
std::vector<uint> factor(uint n)
{
std::vector<uint> tab;
for(unsigned long long i=2;i*i <= n; ++i) // 64-bit i, so i*i cannot overflow for 32-bit n
{
while(n%i==0)
{
tab.push_back(i);
n/=i;
}
}
if(n>1)
tab.push_back(n);
return tab;
}
void test(uint x)
{
auto v = factor(x);
std::cout << x << ":\t";
std::copy(v.begin(), v.end(), std::ostream_iterator<uint>(std::cout, ";"));
std::cout << std::endl;
}
int main(int argc, const char *argv[])
{
test(1);
test(2);
test(4);
test(43);
test(47);
test(9997);
}
Output
1:
2: 2;
4: 2;2;
43: 43;
47: 47;
9997: 13;769;

There's a simple change that will cut the run time somewhat: factor out all the 2's, then only check odd numbers.

If you use
... i*i <= n; ...
It may run much faster than i <= sqrt(n)
By the way, you should handle negative n, or at least make sure you never pass a negative number.

I'm afraid you cannot. No known method can factorize large integers in polynomial time. However, some methods can speed up your program slightly (not significantly). See Wikipedia for more references: http://en.wikipedia.org/wiki/Integer_factorization

As your solution shows, you effectively end up dividing only by prime numbers (the condition while (n%i == 0) works like that). Especially for large numbers, you could compute the prime numbers beforehand and check only those. The primes can be computed with the Sieve of Eratosthenes or some other efficient method.

unsigned int* factor(unsigned int n)
If unsigned int is the typical 32-bit type, the numbers are too small for any of the more advanced algorithms to pay off. The usual enhancements for the trial division are of course worthwhile.
If you're moving the division by 2 out of the loop, and divide only by odd numbers in the loop, as mentioned by Pete Becker, you're essentially halving the number of divisions needed to factor the input number, and thus speed up the function by a factor of very nearly 2.
If you carry that one step further and also eliminate the multiples of 3 from the divisors in the loop, you reduce the number of divisions and hence increase the speed by a factor close to 3 (on average; most numbers don't have any large prime factors, but are divisible by 2 or by 3, and for those the speedup is much smaller; but those numbers are quick to factor anyway. If you factor a longer range of numbers, the bulk of the time is spent factoring the few numbers with large prime divisors).
// if your compiler doesn't transform that to bit-operations, do it yourself
while(n % 2 == 0) {
tab[dim++] = 2;
n /= 2;
}
while(n % 3 == 0) {
tab[dim++] = 3;
n /= 3;
}
for(unsigned long long d = 5, s = 2; d*d <= n; d += s, s = 6-s) { // 64-bit d: d*d cannot overflow
while(n % d == 0) {
tab[dim++] = d;
n /= d;
}
}
If you're calling that function really often, it would be worthwhile to precompute the 6542 primes not exceeding 65535, store them in a static array, and divide only by the primes to eliminate all divisions that are a priori guaranteed to not find a divisor.
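A sketch of that precomputation, assuming 32-bit inputs: the table of primes up to 65535 is built once with a sieve and then reused for every call. The names small_primes and factor_with_primes are invented for this example, and returning a std::vector instead of filling tab is a simplification.

```cpp
#include <cstdint>
#include <vector>

// All primes not exceeding 65535, found with a sieve of Eratosthenes.
std::vector<std::uint32_t> small_primes()
{
    const std::uint32_t limit = 65535;
    std::vector<bool> composite(limit + 1, false);
    std::vector<std::uint32_t> primes;
    for (std::uint32_t i = 2; i <= limit; ++i) {
        if (composite[i]) {
            continue;
        }
        primes.push_back(i);
        // i * i is at most 65535 * 65535, which still fits in 32 bits.
        for (std::uint32_t j = i * i; j <= limit; j += i) {
            composite[j] = true;
        }
    }
    return primes;
}

// Factors a 32-bit number by dividing only by the precomputed primes.
std::vector<std::uint32_t> factor_with_primes(std::uint32_t n)
{
    static const std::vector<std::uint32_t> primes = small_primes();  // built on first call
    std::vector<std::uint32_t> result;
    for (std::uint32_t p : primes) {
        if (p > n / p) {
            break;  // p * p > n: the remainder is 1 or a prime
        }
        while (n % p == 0) {
            result.push_back(p);
            n /= p;
        }
    }
    if (n > 1) {
        result.push_back(n);
    }
    return result;
}
```

The table is a function-local static, so the sieve runs only on the first call; later calls pay only for the divisions.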
If unsigned int happens to be larger than 32 bits, then using one of the more advanced algorithms would be profitable. You should still begin with trial divisions to find the small prime factors (whether small should mean <= 1000, <= 10000, <= 100000 or perhaps <= 1000000 would need to be tested, my gut feeling says one of the smaller values would be better on average). If after the trial division phase the factorisation is not yet complete, check whether the remaining factor is prime using e.g. a deterministic (for the range in question) variant of the Miller-Rabin test. If it's not, search a factor using your favourite advanced algorithm. For 64 bit numbers, I'd recommend Pollard's rho algorithm or an elliptic curve factorisation. Pollard's rho algorithm is easier to implement and for numbers of that magnitude finds factors in comparable time, so that's my first recommendation.

Int is way too small to encounter any performance problems. I just tried to measure the time of your algorithm with boost but couldn't get any useful output (too fast). So you shouldn't worry about integers at all.
Using i*i, I was able to factor 1.000.000 9-digit integers in 15.097 seconds. It's good to optimize an algorithm, but instead of "wasting" time (depending on your situation) it's important to consider whether a small improvement is really worth the effort. Sometimes you have to ask yourself whether you really need to be able to process 1.000.000 ints in 10 seconds or whether 15 is fine as well.
