C++ Coprimes Problem. Optimize code - c++

Hi, I want to optimize the following code. It counts all the numbers in a given range [a, b] that are coprime to n by computing the GCD of each one with n. I want to make it run faster. Any ideas?
#include <iostream>
using namespace std;
int GCD(int a, int b)
{
while( 1 )
{
a = a % b;
if( a == 0 )
return b;
b = b % a;
if( b == 0 )
return a;
}
}
int main(void){
int t;
cin >> t;
for(int i=0; i<t; i++){
int n,a,b;
cin >> n >> a >> b;
int c = 0;
for(int j=a; j<=b; j++){
if(GCD(j, n) == 1) c++;
}
cout << c << endl;
}
return 0;
}

This smells like homework, so only a hint.
You don't need to calculate GCD here. If you can factorize n (even in the crudest way of trying to divide by 2 and by every odd number smaller than 2^16), then you can just count the numbers that are not divisible by any prime factor of n.
Note that a 32-bit number has at most 10 distinct prime factors (we don't need to remember how many times a given prime is used in the factorization).
How to do that? Try to count the non-coprimes using the inclusion–exclusion principle. You will have at most 1023 subsets of primes to check, and for every subset you need to calculate how many multiples of its product are in the range, which is constant time for each subset.
Anyway, my code works in no time now:
liori:~/gg% time ./moje <<< "1 1003917915 1 1003917915"
328458240
./moje <<< "1 1003917915 1 1003917915" 0,00s user 0,00s system 0% cpu 0,002 total
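The inclusion–exclusion hint above can be sketched roughly as follows. This is not the answerer's actual program, only one possible reading of the hint; the function names `distinct_primes`, `coprime_up_to` and `count_coprimes` are made up for illustration, and the sketch assumes n >= 1 and 1 <= a <= b.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Distinct prime factors of n, found by plain trial division (assumes n >= 1).
std::vector<std::int64_t> distinct_primes(std::int64_t n) {
    std::vector<std::int64_t> ps;
    for (std::int64_t p = 2; p * p <= n; ++p) {
        if (n % p == 0) {
            ps.push_back(p);
            while (n % p == 0) n /= p;  // multiplicity does not matter here
        }
    }
    if (n > 1) ps.push_back(n);  // whatever is left is a single prime
    return ps;
}

// Number of x in [1, m] that are divisible by none of the primes in ps.
std::int64_t coprime_up_to(std::int64_t m, const std::vector<std::int64_t>& ps) {
    std::int64_t total = 0;
    const std::size_t k = ps.size();
    for (std::uint32_t mask = 0; mask < (1u << k); ++mask) {
        std::int64_t prod = 1;
        int bits = 0;
        for (std::size_t i = 0; i < k; ++i) {
            if (mask & (1u << i)) {
                prod *= ps[i];  // prod always divides n, so it cannot overflow
                ++bits;
            }
        }
        // Subsets of even size are added, subsets of odd size are subtracted.
        total += (bits % 2 == 0 ? 1 : -1) * (m / prod);
    }
    return total;
}

// Number of x in [a, b] with gcd(x, n) == 1.
std::int64_t count_coprimes(std::int64_t n, std::int64_t a, std::int64_t b) {
    const std::vector<std::int64_t> ps = distinct_primes(n);
    return coprime_up_to(b, ps) - coprime_up_to(a - 1, ps);
}
```

For the input used in the timing run above, `count_coprimes(1003917915, 1, 1003917915)` evaluates to 328458240; that n has 8 distinct prime factors, so only 256 subsets are visited.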

On a single core computer it's not going to get much faster than it currently is. So you want to utilize multiple cores or even multiple computers. Parallelize and distribute.
Since each pair of numbers you want to calculate the GCD for isn't linked to any other pair of numbers you can easily modify your program to utilize multiple cores by using threads.
If this still isn't fast enough you'd better start thinking of using distributed computing, assigning the work to many computers. This is a bit trickier but should improve the performance the most if the search space is large.

Consider giving it a try with doubles. It is said that division with doubles is faster than integer division on typical Intel chips. Integer division is among the slowest instructions out there. This is a chicken-and-egg problem: nobody uses it because it's slow, and Intel doesn't make it faster because nobody uses it.

Related

Sieve of Eratosthenes calculator -- running into memory issues and crashing with numbers >= 1,000,000

I'm not exactly sure why this is. I tried changing the variables to long long, and I even tried a few other things, but it's either the inefficiency of my code (it literally goes through the whole process of finding all primes up to the number, then checks whether the number is divisible by each of those primes; very inefficient, but it's my first attempt at this and I feel pretty accomplished having it work at all)
or the fact that it overflows the stack. I'm not sure where exactly it happens, but all I know is that it MUST be related to memory and the way it's dealing with the number.
If I had to guess, I'd say it's a memory issue that happens while it generates the primes up to that number; that's where it dies even if I remove the check against the input number.
I'll post my code. Just be aware that I didn't change long long back to int in a few places, and I also have a SquareRoot variable that is not used; it was supposed to help memory efficiency but was not effective the way I tried to do it, and I just never deleted it. I will clean up the code when and if I can successfully finish it.
As far as I am aware, though, it DOES work pretty reliably for 999,999 and down. I checked it against other calculators of the same type and it seems to generate the proper answers.
If anyone can help or explain what I screwed up here, you're helping a guy who is trying to learn on his own without any school or anything, so it's appreciated.
#include <iostream>
#include <cmath>
void sieve(int ubound, int primes[]);
int main()
{
long long n;
int i;
std::cout << "Input Number: ";
std::cin >> n;
if (n < 2) {
return 1;
}
long long upperbound = n;
int A[upperbound];
int SquareRoot = sqrt(upperbound);
sieve(upperbound, A);
for (i = 0; i < upperbound; i++) {
if (A[i] == 1 && upperbound % i == 0) {
std::cout << " " << i << " ";
}
}
return 0;
}
void sieve(int ubound, int primes[])
{
long long i, j, m;
for (i = 0; i < ubound; i++) {
primes[i] = 1;
}
primes[0] = 0, primes[1] = 0;
for (i = 2; i < ubound; i++) {
for(j = i * i; j < ubound; j += i) {
primes[j] = 0;
}
}
}
If you used legal C++ constructs instead of non-standard variable length arrays, your code would run (whether it produces the correct answers is another question).
The issue is more than likely that you're exceeding the limits of the stack when you declare arrays with a million or more elements.
Therefore instead of this:
long long upperbound = n;
int A[upperbound];
Use std::vector:
#include <vector>
//...
long long upperbound = n;
std::vector<int> A(upperbound);
and then:
sieve(upperbound, A.data());
The std::vector does not use the stack space to allocate its elements (unless you have written an allocator for it that uses the stack).
As a matter of fact, you don't even need to pass upperbound to sieve, as a std::vector knows its own size by calling the size() member function. But I leave that as an exercise.
Live example using 2,000,000
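The exercise mentioned above (letting sieve take the vector itself rather than a pointer plus a bound) might look something like this. It is only a sketch: the signature is an assumption, and skipping composite values of i and stopping the outer loop once i * i reaches the size are small additions that the original code does not have.

```cpp
#include <cstddef>
#include <vector>

// After the call, primes[i] == 1 if i is prime and 0 otherwise,
// for every index i of the vector. The bound comes from primes.size().
void sieve(std::vector<int>& primes) {
    const std::size_t n = primes.size();
    for (std::size_t i = 0; i < n; ++i) primes[i] = 1;
    if (n > 0) primes[0] = 0;
    if (n > 1) primes[1] = 0;
    for (std::size_t i = 2; i * i < n; ++i) {
        if (!primes[i]) continue;  // multiples of a composite are already crossed out
        for (std::size_t j = i * i; j < n; j += i) primes[j] = 0;
    }
}
```

A caller would then write `std::vector<int> A(upperbound); sieve(A);` and the elements live on the heap, so a size of a few million no longer touches the stack limit.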
First of all, read and apply PaulMcKenzie's advice. That's the most important thing. I'm only addressing some teeny bits of your question that remained open.
It seems that you are trying to factor the number that you misleadingly called upperbound. The mysterious role of the square root of this number is related to this fact: if the number is composite at all - and hence can be computed as the product of some prime factors - then the smallest of these prime factors cannot be greater than the square root of the number. In fact, only one factor can possibly be greater, all others cannot exceed the square root.
However, in its present form your code cannot draw advantage from this fact. The trial division loop as it stands now has to run up to number_to_be_factored / 2 in order not to miss any factors because its body looks like this:
if (sieve[i] == 1 && number_to_be_factored % i == 0) {
std::cout << " " << i << " ";
}
You can factor much more efficiently if you refactor your code a bit: when you have found the smallest prime factor p of your number then the remaining factors to be found must be precisely those of rest = number_to_be_factored / p (or n = n / p, if you will), and none of the remaining factors can be smaller than p. However, don't forget that p might occur more than once as a factor.
During any round of the proceedings you only need to consider the prime factors between p and the square root of the current number; if none of those primes divides the current number then it must be prime. To test whether p exceeds the square root of some number n you can use if (p * p > n), which is computationally more efficient than actually computing the square root.
Hence the square root occurs in two different roles:
the square root of the number to be factored limits the amount of sieving that needs to be done
during the trial division loop, the square root of the current number gives an upper bound for the highest prime factor that you need to consider
That's two faces of the same coin but two different usages in the actual code.
Note: once you've got your code working by applying PaulMcKenzie's advice, you might also want to consider posting it over on Code Review.
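The divide-out-and-continue idea described above could be sketched like this. The name `factorize` and the choice to return (prime, exponent) pairs are assumptions made for the example, and for brevity it tries every integer p instead of only sieved primes. It also assumes n < 2^63 so that p * p cannot wrap around.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Returns the prime factorization of n as (prime, exponent) pairs in
// increasing order of the prime. Returns an empty vector for n < 2.
std::vector<std::pair<std::uint64_t, unsigned>> factorize(std::uint64_t n) {
    std::vector<std::pair<std::uint64_t, unsigned>> result;
    // Stop as soon as p exceeds the square root of what is LEFT of n.
    for (std::uint64_t p = 2; p * p <= n; ++p) {
        if (n % p != 0) continue;
        unsigned exponent = 0;
        while (n % p == 0) {  // p may occur more than once
            n /= p;
            ++exponent;
        }
        result.emplace_back(p, exponent);
    }
    // No factor up to sqrt(n) divides the rest, so the rest is prime.
    if (n > 1) result.emplace_back(n, 1u);
    return result;
}
```

Because n shrinks every time a factor is divided out, the loop bound shrinks with it, which is where most of the saving over a fixed bound comes from.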

How to calculate the sum of the bitwise xor values of all the distinct combination of the given numbers efficiently?

Given n(n<=1000000) positive integer numbers (each number is smaller than 1000000). The task is to calculate the sum of the bitwise xor ( ^ in c/c++) value of all the distinct combination of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
int n, i, sum, j;
scanf("%d", &n);
sum=0;
for(i=0;i<n;i++)
scanf("%d", &num[i]);
for(i=0;i<n-1;i++)
{
for(j=i+1;j<n;j++)
{
sum+=(num[i]^num[j]);
}
}
printf("%d\n", sum);
return 0;
}
But my code failed to run in 1 second. How can I write my code in a faster way, which can run in 1 second ?
Edit: Actually this is an Online Judge problem and I am getting Cpu Limit Exceeded with my above code.
You need to compute n(n-1)/2 xors in order to brute force this, which is around 5e11 for n = 1e6. Modern processors can do at most around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I don't know of one. In any case, on contemporary processors you'll be shaving off microseconds at best; that is because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important in repetitive branching, you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Despite the fact that "scanf" takes only 5 characters in your source code, in machine language this is the bulk of your program. Unfortunately, if the data actually changes each time you run your code, there is no way around the requirement of reading from stdin, and it will make no difference whether you use scanf, std::cin >>, or even attempt to implement your own method to read characters from input and convert them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you can be running your code via: myprogram < data.
This function grows quadratically (thanks @rici). At around 25,000 positive integers with each being 999,999 (worst case) the for loop calculation alone can finish in approximately a second. Trying to make this work with input as you have specified and for 1 million positive integers just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
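#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
std::size_t res = 0;
for (std::size_t b = 0; b != 32; ++b) {
// Number of inputs that have bit b set.
const std::size_t count_1 =
std::count_if(v.begin(), v.end(),
[b](std::uint32_t n) { return (n >> b) & 0x01; });
const std::size_t count_0 = v.size() - count_1;
// Only pairs that differ in bit b contribute, each adding 1 << b.
res += (count_0 * count_1) << b;
}
return res;
}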
Live Demo.
Explanation:
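x^y = Sum_b((x&b)^(y&b)), where b ranges over the single-bit masks (from 1<<0 to 1<<31).
For a given bit, let count_0 and count_1 be the number of inputs with that bit equal to 0 and to 1 respectively. Among the distinct pairs there are count_0 * (count_0 - 1) / 2 of the form 0^0, count_0 * count_1 of the form 0^1, and count_1 * (count_1 - 1) / 2 of the form 1^1. Since 0^0 and 1^1 are both 0, only the count_0 * count_1 mixed pairs contribute, and each of them adds the value of that bit to the sum.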

Odds of winning lottery C++

I have an assignment that asks us to make a program in C++ that takes input from a user for the amount of numbers on a lottery ticket and the amount of numbers in a lottery drawing. It should then calculate the odds of the user getting the numbers correct. This is (more or less) the first program I am writing in C++, so I am new to this. What I have so far is below. I am seeking help with making the program work. I can get values in for the declared variables, but cannot figure out how to write down what it is I actually need to do, which is a factorial function. I know the function, I just don't know how to say it in C++.
From what I understand at this point is that it should look something like this:
for (int i = 1; i <= k; i++) {
result = (result * (n+1-i)) / i;
}
or something to that effect?.... at least this is what I have come across in the past couple of hours of searching for an answer online. I think I am getting close to figuring it out but I am at a road block.
I don't want someone to just tell me the answer. If you could explain to me what I am doing wrong and what I can do to fix it that would be most helpful for me.
#include <iostream>
#include <iomanip>
using namespace std;
int main (int argc, char** argv)
{
int n, k;
int odds;
cout<< "How many numbers are printed on the lottery ticket? ";
cin >> n ;
cout<<"How may numbers are selected in the lottery drawing? ";
cin >> k ;
cout << "You entered " << n << " for how many numbers are printed on the lottery ticket, and "
<< k << " for how many numbers are selected in the lottery drawing." << endl;
for (int i = 1; i <= k; i++)
{
odds = (n * (n-k++))/k;
cout << odds;
}
return 0;
}
When I run this I just get an endless stream of "3-3-3-3....". It's non-stop. At one point I was getting a number as the output (one VERY large incorrect number), but while I was tinkering with it I couldn't get it back.
Any guidance would be appreciated.
This seems slightly difficult for a first assignment, unless you're most of the way through a computer science curriculum and only new to C++.
The formula for the odds, which is commonly known as "number of combinations", is frequently written in terms of factorials. But you can't manipulate those factorials effectively on a computer; they are far too large for any of the built-in data types.
Instead, it's important to cancel like terms from numerator and denominator. Interleaving multiplications and divisions can help even more.
I've previously posted working code for number of combinations on another question:
Number of combinations (N choose R) in C++
Your current code actually does have things interleaved pretty well, but you haven't been at all careful with the meanings of i and k and n, and you've also got undefined behavior from both reading and writing a variable between sequence points.
Specifically, this has undefined behavior because the k in the denominator is read while the k++ in the numerator is modifying it, with nothing ordering the two:
odds = n*(n-k++)/k;
You shouldn't be changing k here at all. Incrementing k on every iteration is also why the loop condition i <= k never becomes false and the output never stops. The value varying from 1 to k is i. So this becomes:
odds = n * (n-i) / i;
You need all the terms to accumulate across loop iterations, so you should be multiplying by the previous odds value (which means odds has to start at 1 before the loop):
odds = odds * (n - i) / i;
But you do need n - 0 in the numerator, but no 0 in the denominator. You've chosen to make i one-based, so it's the numerator that needs to be adjusted:
odds = odds * (n + 1 - i) / i;
And now your code is extremely close to mine. Depending on your values of n and k you might still overflow. Changing the data type of odds to long long or double should help with that.
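Putting the steps above together, a minimal sketch of the corrected loop could look like this. The function name `combinations` and the choice of unsigned long long are assumptions for the example; it is not the code from the linked answer.

```cpp
// Number of ways to choose k items out of n ("n choose k").
// Multiplications and divisions are interleaved so that the running value
// stays as small as possible; it can still overflow for large n and k.
unsigned long long combinations(unsigned n, unsigned k) {
    if (k > n) return 0;
    unsigned long long odds = 1;  // must start at 1, not uninitialized
    for (unsigned i = 1; i <= k; ++i) {
        // After this step odds equals "n choose i", so the division is exact.
        odds = odds * (n + 1 - i) / i;
    }
    return odds;
}
```

For a lottery with 49 numbers printed and 6 drawn, `combinations(49, 6)` gives 13983816, so the odds would be printed as 1 in 13983816.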
This is the formula you need:
http://en.wikipedia.org/wiki/Lottery_mathematics
Make sure that you have the mathematics well in hand. Start with a function that implements that formula.
Once you have the formula in hand, you'll realize that the naive student factorial will never work. The biggest factorial that fits in a 64-bit long is 20!; 21! already overflows.
The right way to do it is logarithms and gamma function:
https://en.wikipedia.org/wiki/Gamma_function
So that formula will turn into:
ln(n!/(k!(n-k)!)) = ln(n!) - ln(k!) - ln((n-k)!)
But since gamma(n+1) = n!, this is
lngamma(n+1) - lngamma(k+1) - lngamma(n-k+1)
The gamma function returns doubles, not integers or longs. It'll behave much better for you.
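As a rough illustration of the log-gamma formula above, the standard library's std::lgamma (from <cmath>, C++11 and later) can be used directly. The function names below are made up for the example, and the result is a floating point approximation, not an exact integer.

```cpp
#include <cmath>

// Natural logarithm of "n choose k", computed with log-gamma so that no
// factorial is ever formed explicitly. Assumes 0 <= k <= n.
double log_combinations(double n, double k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// Approximate "n choose k" as a double.
double combinations_approx(double n, double k) {
    return std::exp(log_combinations(n, k));
}
```

Rounding the result (for example with std::round) recovers the exact integer as long as the value is small enough for a double to represent it precisely; for very large n and k it is usually better to keep working with the logarithm.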

Need a way to make this code run faster

I'm trying to solve Project Euler problem 401. The only way I could find to solve it was brute force. I've been running this code for about 10 minutes without any answer. Can anyone help me with ideas to improve it?
Code:
#include <iostream>
#include <cmath>
#define ull unsigned long long
using namespace std;
ull sigma2(ull n);
ull SIGMA2(ull n);
int main()
{
ull ans = SIGMA2(1000000000000000) % 1000000000;
cout << "Answer: " << ans << endl;
cin.get();
cin.ignore();
return 0;
}
ull sigma2(ull n)
{
ull sum = 0;
for(ull i = 1; i<=floor(sqrt(n)); i++)
{
if(n%i == 0)
{
sum += (i*i)+((n/i)*(n/i));
}
if(i*i == n)
{
sum -= n;
}
}
return sum;
}
ull SIGMA2(ull n)
{
ull sum = 0;
for(ull i = 1; i<=n; i++)
{
sum+=sigma2(i);
}
return sum;
}
You're missing some divisors: if a/b = c and b is a divisor of a, then c is also a divisor of a, but c might be greater than floor(sqrt(a)); for example, 3 > floor(sqrt(6)) but 3 divides 6.
You should also store floor(sqrt(n)) in a variable and use that variable in the for condition; otherwise you recalculate it at every iteration, which is very expensive.
You can do some straightforward optimizations:
inline sigma2,
calculate floor(sqrt(n)) before the loop (but compiler may be doing it anyway, though),
precalculate squares of all ints from 1 to n and then use array lookup instead of multiplication
You will gain more by changing your approach. Think what you are trying to do - summing squares of all divisors of all integers from 1 to n. You grouped divisors by what they divide, but you can regroup terms in this sum. Let's group divisors by their value:
1 divides everything so it will appear n times in the sum, bringing 1*1*n total,
2 divides evens and will appear n/2 (integer division!) times, bringing 2*2*(n/2) total,
k ... will bring k*k*(n/k) total.
So we should just add up k*k*(n/k) for k from 1 to n.
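The regrouped sum above could be sketched as follows. The name `sigma2_sum_mod` is made up, and reducing everything modulo 10^9 is an assumption taken from the question's final `% 1000000000`; without the reductions, k*k alone would overflow a 64-bit integer for large k. Note that this is still a linear loop, so for n = 10^15 it only illustrates the regrouping and is not fast enough on its own.

```cpp
#include <cstdint>

// Sum over k = 1..n of k*k*(n/k), reduced modulo mod.
// Assumes mod <= 10^9 so that every intermediate product fits in 64 bits.
std::uint64_t sigma2_sum_mod(std::uint64_t n, std::uint64_t mod = 1000000000ULL) {
    std::uint64_t total = 0;
    for (std::uint64_t k = 1; k <= n; ++k) {
        const std::uint64_t km = k % mod;
        const std::uint64_t square = (km * km) % mod;  // k*k mod mod
        const std::uint64_t times = (n / k) % mod;     // how many numbers k divides
        total = (total + square * times) % mod;
    }
    return total;
}
```

For small inputs this can be checked by hand: n = 6 gives 1*6 + 4*3 + 9*2 + 16*1 + 25*1 + 36*1 = 113.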
Think about the problem.
Bruteforce the way you tried is obviously not a good idea.
You should come up with something better...
Isn't there any method how to use some nice prime factorization method to speed up the computation? Isn't there any recursion pattern? Try to find something...
One simple optimization that you can carry out is that there will be many repeated factors in the numbers.
So first estimate in how many numbers would 1 be a factor ( all N numbers ).
In how many numbers would 2 be a factor ( N/2 ).
...
Similarly for others.
Just multiply their squares with their frequency.
Time complexity shall then straight-away reduce to O(N)
There are obvious micro-optimizations such as ++i rather than i++, or getting floor(sqrt(n)) out of the loop (these are two floating point operations which are really expensive compared to the other integer operations in the loop), and calculating n/i only once (store it in a temporary variable and then calculate the square of that variable).
There are also rather obvious simplifications in the algorithm. For example SIGMA2(i) = SIGMA2(i-1) + sigma2(i). But do not use recursion since you need a really huge number, this would not work and your stack memory would be exhausted. Use loop instead of recursion. There is a huge potential for improvement.
And well, there is a bigger problem: 10^15 has 16 digits, and this number squared has 31 digits. There is no way you can store this in an unsigned long long, which holds at most about 1.8 * 10^19 (20 digits). So you need to somehow apply the modulo 10^9 (from the end of the assignment) during the calculation to keep the intermediate values small enough.
And when using brute force, print out the intermediate result every million numbers, for example, to give you an idea of how fast you are approaching the final result. Waiting 10 minutes blindly is not a good idea.

Optimizing my code for finding the factors of a given integer

Here is my code, but I'd like to optimize it. I don't like the idea of it testing all the numbers up to the square root of n, considering that one could be faced with finding the factors of a large number. Your answers would be of great help. Thanks in advance.
unsigned int* factor(unsigned int n)
{
unsigned int tab[40];
int dim=0;
for(int i=2;i<=(int)sqrt(n);++i)
{
while(n%i==0)
{
tab[dim++]=i;
n/=i;
}
}
if(n>1)
tab[dim++]=n;
return tab;
}
Here's a suggestion on how to do this in 'proper' c++ (since you tagged as c++).
PS. Almost forgot to mention: I optimized the call to sqrt away :)
See it live on http://liveworkspace.org/code/6e2fcc2f7956fafbf637b54be2db014a
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>
typedef unsigned int uint;
std::vector<uint> factor(uint n)
{
std::vector<uint> tab;
// 64-bit counter so that i*i cannot wrap around for n close to 2^32
for(unsigned long long i=2;i*i <= n; ++i)
{
while(n%i==0)
{
tab.push_back(i);
n/=i;
}
}
if(n>1)
tab.push_back(n);
return tab;
}
void test(uint x)
{
auto v = factor(x);
std::cout << x << ":\t";
std::copy(v.begin(), v.end(), std::ostream_iterator<uint>(std::cout, ";"));
std::cout << std::endl;
}
int main(int argc, const char *argv[])
{
test(1);
test(2);
test(4);
test(43);
test(47);
test(9997);
}
Output
1:
2: 2;
4: 2;2;
43: 43;
47: 47;
9997: 13;769;
There's a simple change that will cut the run time somewhat: factor out all the 2's, then only check odd numbers.
If you use
... i*i <= n; ...
It may run much faster than i <= sqrt(n)
By the way, you should try to handle factors of negative n, or at least be sure you never pass a negative number.
I'm afraid you cannot. There is no known method that can factorize large integers in polynomial time on a classical computer. However, there are some methods that can help you slightly (not significantly) speed up your program. Search Wikipedia for more references. http://en.wikipedia.org/wiki/Integer_factorization
As seen from your solution, the divisors you end up finding are all prime numbers (the loop while (n%i == 0) guarantees that). Especially for large numbers, you could compute the prime numbers beforehand and check only those. The prime number calculation could be done using the Sieve of Eratosthenes or some other efficient method.
unsigned int* factor(unsigned int n)
If unsigned int is the typical 32-bit type, the numbers are too small for any of the more advanced algorithms to pay off. The usual enhancements for the trial division are of course worthwhile.
If you're moving the division by 2 out of the loop, and divide only by odd numbers in the loop, as mentioned by Pete Becker, you're essentially halving the number of divisions needed to factor the input number, and thus speed up the function by a factor of very nearly 2.
If you carry that one step further and also eliminate the multiples of 3 from the divisors in the loop, you reduce the number of divisions and hence increase the speed by a factor close to 3 (on average; most numbers don't have any large prime factors, but are divisible by 2 or by 3, and for those the speedup is much smaller; but those numbers are quick to factor anyway. If you factor a longer range of numbers, the bulk of the time is spent factoring the few numbers with large prime divisors).
// if your compiler doesn't transform that to bit-operations, do it yourself
while(n % 2 == 0) {
tab[dim++] = 2;
n /= 2;
}
while(n % 3 == 0) {
tab[dim++] = 3;
n /= 3;
}
for(unsigned long long d = 5, s = 2; d*d <= n; d += s, s = 6-s) { // 64-bit d: d*d cannot overflow for 32-bit n
while(n % d == 0) {
tab[dim++] = d;
n /= d;
}
}
If you're calling that function really often, it would be worthwhile to precompute the 6542 primes not exceeding 65535, store them in a static array, and divide only by the primes to eliminate all divisions that are a priori guaranteed to not find a divisor.
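The precomputed-table idea from the previous paragraph might be sketched like this, assuming a 32-bit input. The names `small_primes` and `factor_with_table` are hypothetical, and building the table lazily with a small sieve on first use is just one way to obtain the static array.

```cpp
#include <cstdint>
#include <vector>

// All primes not exceeding 65535, computed once on first use.
const std::vector<std::uint32_t>& small_primes() {
    static const std::vector<std::uint32_t> table = [] {
        const std::uint32_t limit = 65535;
        std::vector<bool> composite(limit + 1, false);
        std::vector<std::uint32_t> primes;
        for (std::uint32_t i = 2; i <= limit; ++i) {
            if (composite[i]) continue;
            primes.push_back(i);
            // i*i <= 65535*65535 < 2^32, so this does not overflow.
            for (std::uint32_t j = i * i; j <= limit; j += i) composite[j] = true;
        }
        return primes;
    }();
    return table;
}

// Prime factors of a 32-bit n (with repetition), dividing only by primes.
std::vector<std::uint32_t> factor_with_table(std::uint32_t n) {
    std::vector<std::uint32_t> out;
    for (std::uint32_t p : small_primes()) {
        if (static_cast<std::uint64_t>(p) * p > n) break;
        while (n % p == 0) {
            out.push_back(p);
            n /= p;
        }
    }
    // Anything left above 1 has no prime factor up to its square root.
    if (n > 1) out.push_back(n);
    return out;
}
```

The table holds the 6542 primes mentioned above, so factoring a 32-bit number needs at most that many trial divisions plus the repeated divisions for factors that are found.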
If unsigned int happens to be larger than 32 bits, then using one of the more advanced algorithms would be profitable. You should still begin with trial divisions to find the small prime factors (whether small should mean <= 1000, <= 10000, <= 100000 or perhaps <= 1000000 would need to be tested, my gut feeling says one of the smaller values would be better on average). If after the trial division phase the factorisation is not yet complete, check whether the remaining factor is prime using e.g. a deterministic (for the range in question) variant of the Miller-Rabin test. If it's not, search a factor using your favourite advanced algorithm. For 64 bit numbers, I'd recommend Pollard's rho algorithm or an elliptic curve factorisation. Pollard's rho algorithm is easier to implement and for numbers of that magnitude finds factors in comparable time, so that's my first recommendation.
An int is way too small to run into any performance problems. I just tried to measure the time of your algorithm with boost but couldn't get any useful output (too fast). So you shouldn't worry about integers at all.
Using i*i, I was able to factor 1,000,000 9-digit integers in 15.097 seconds. It's good to optimize an algorithm, but instead of "wasting" time (depending on your situation) it's important to consider whether a small improvement really is worth the effort. Sometimes you have to ask yourself if you really need to be able to process 1,000,000 ints in 10 seconds or if 15 is fine as well.