Need a way to make this code run faster - c++

I'm trying to solve Project Euler problem 401. The only way I could find to solve it was brute force. I've been running this code for about 10 minutes without an answer. Can anyone help me with ideas to improve it?
Code:
#include <iostream>
#include <cmath>
#define ull unsigned long long
using namespace std;

ull sigma2(ull n);
ull SIGMA2(ull n);

int main()
{
    ull ans = SIGMA2(1000000000000000) % 1000000000;
    cout << "Answer: " << ans << endl;
    cin.get();
    cin.ignore();
    return 0;
}

// sum of the squares of all divisors of n
ull sigma2(ull n)
{
    ull sum = 0;
    for (ull i = 1; i <= floor(sqrt(n)); i++)
    {
        if (n % i == 0)
        {
            sum += (i * i) + ((n / i) * (n / i));
        }
        if (i * i == n)
        {
            sum -= n;  // i and n/i are the same divisor; don't count it twice
        }
    }
    return sum;
}

// sum of sigma2 over 1..n
ull SIGMA2(ull n)
{
    ull sum = 0;
    for (ull i = 1; i <= n; i++)
    {
        sum += sigma2(i);
    }
    return sum;
}

You're missing some divisors: if a/b = c and b is a divisor of a, then c is also a divisor of a, but c might be greater than floor(sqrt(a)); for example, 3 > floor(sqrt(6)) but 3 divides 6.
Then you should put floor(sqrt(n)) in a variable before the loop and use the variable in the loop condition; otherwise you recalculate it on every iteration, which is very expensive.

You can do some straightforward optimizations:
inline sigma2,
calculate floor(sqrt(n)) before the loop (though the compiler may be doing that anyway),
precalculate the squares of all ints from 1 to n and use an array lookup instead of a multiplication.
You will gain more by changing your approach. Think about what you are trying to do: summing the squares of all divisors of all integers from 1 to n. You grouped the divisors by what they divide, but you can regroup the terms in this sum. Let's group the divisors by their value:
1 divides everything, so it appears n times in the sum, contributing 1*1*n in total,
2 divides the even numbers, so it appears n/2 (integer division!) times, contributing 2*2*(n/2) in total,
k ... contributes k*k*(n/k) in total.
So we should just add up k*k*(n/k) for k from 1 to n, as in the sketch below.
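Here is a minimal sketch of that regrouped sum, reduced modulo 10^9 as the assignment asks (the small test value of n is my own choice for illustration; at n = 10^15 this O(n) loop is still too slow, so you would additionally have to group the k that share the same value of n/k):

#include <cstdint>
#include <iostream>

int main()
{
    const std::uint64_t n = 1000000;       // small n for illustration only
    const std::uint64_t MOD = 1000000000;  // the problem asks for the sum mod 10^9
    std::uint64_t sum = 0;
    for (std::uint64_t k = 1; k <= n; ++k)
    {
        // k divides exactly n/k of the numbers 1..n, contributing k*k each time
        std::uint64_t term = (k % MOD) * (k % MOD) % MOD * ((n / k) % MOD) % MOD;
        sum = (sum + term) % MOD;
    }
    std::cout << sum << '\n';
}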

Think about the problem.
Brute force the way you tried it is obviously not a good idea.
You should come up with something better...
Isn't there some nice prime-factorization method that could speed up the computation? Isn't there a recursion pattern? Try to find something...

One simple optimization you can carry out is based on the observation that factors repeat across many of the numbers.
So first estimate how many numbers have 1 as a factor (all N numbers).
How many numbers have 2 as a factor? (N/2, integer division.)
...
Similarly for the others.
Then just multiply each factor's square by its frequency.
The time complexity straight away drops to O(N).

There are obvious micro-optimizations such as ++i rather than i++, or moving floor(sqrt(n)) out of the loop (those are two floating-point operations, which are really expensive compared to the other integer operations in the loop), and calculating n/i only once (store it in a temporary variable and then square the temporary).
There are also rather obvious simplifications in the algorithm. For example, SIGMA2(i) = SIGMA2(i-1) + sigma2(i). But do not use recursion: you need a really huge number of steps, so this would not work and your stack memory would be exhausted. Use a loop instead of recursion. There is huge potential for improvement.
And there is a bigger problem: 10^15 has 15 digits, and this number squared has 30 digits. There is no way you can store that in an unsigned long long, which holds at most 20 digits (2^64 - 1 is about 1.8*10^19). So you need to apply the modulo 10^9 (from the end of the assignment) during the calculation to keep the intermediate values small...
And when using brute force, print out the intermediate result every millionth number, for example, to give you an idea of how fast you are approaching the final result. Waiting 10 minutes blindly is not a good idea.

Related

Can anyone reduce the complexity of my code? Problem E of Codeforces Round 113 Div. 2

Link to The Problem: https://codeforces.com/problemset/problem/166/E
Problem Statement:
*You are given a tetrahedron. Let's mark its vertices with letters A, B, C, and D correspondingly.
An ant is standing in the vertex D of the tetrahedron. The ant is quite active and he wouldn't stay idle. At each moment of time, he makes a step from one vertex to another one along some edge of the tetrahedron. The ant just can't stand on one place.
You do not have to do much to solve the problem: your task is to count the number of ways in which the ant can go from the initial vertex D to itself in exactly n steps. In other words, you are asked to find out the number of different cyclic paths with the length of n from vertex D to itself. As the number can be quite large, you should print it modulo 1000000007 (10^9 + 7).*
Input:
The first line contains the only integer n (1 ≤ n ≤ 10^7) — the required length of the cyclic path.
Output:
Print the only integer — the required number of ways modulo 1000000007 (10^9 + 7).
Example: Input n=2 , Output: 3
Input n=4, Output: 21
My Approach to Problem:
I have written recursive code that takes two inputs, n and the present vertex index, and then travels through and explores all possible combinations.
#include<iostream>
using namespace std;
#define mod 10000000
#define ll long long

ll count_moves = 0;

// explore every walk of length n starting from vertex 0 (= D)
void count(ll n, int present)
{
    if (n == 0 and present == 0)  // base condition: back at D after n steps
        count_moves += 1, count_moves %= mod;
    else if (n > 1) {             // generate all possible combinations
        count(n - 1, (present + 1) % 4);
        count(n - 1, (present + 2) % 4);
        count(n - 1, (present + 3) % 4);
    }
    else if (n == 1 and present)  // last step must return to D
        count(n - 1, 0);
}

int main()
{
    ll n;
    cin >> n;
    if (n == 1) {
        cout << "0";
        return 0;
    }
    count(n, 0);
    cout << count_moves % mod;
}
But the problem is that I am getting a Time Limit Error, since the time complexity of my code is very high. Can anyone suggest how I can optimize/memoize my code to reduce its complexity?
**Edit 1:** Some people are commenting about the macros and the division; that's not the issue. The range of n is 10^7 and the complexity of my code is exponential, so my actual question is how to decrease it to linear time, i.e. O(n).
Any time you run into a recursion and exceed the time limit, you have to suspect that the recursion itself is the problem.
The best solution is to not use recursion.
Look at the results you have:
3
6
21
60
183
546
1641
4920
⋮
While it might be hard to find a pattern in the first couple of terms, it gets easier later on.
Each term is roughly 3 times larger than the previous term, or more precisely:
count(n) = 3 * count(n-1) + 3 * (-1)^n
Now you could just write a for loop for it:
for (int i = 0; i < n - 1; i++)
{
    count_moves = count_moves * 3 + std::pow(-1, i) * 3;
}
or, to get rid of pow():
for (int i = 0; i < n - 1; i++)
{
    count_moves = count_moves * 3 + (i % 2 * 2 - 1) * -3;
}
Furthermore, you could even turn that into a closed-form formula and get rid of the for loop:
count(n) = (3^n + 3 * (-1)^n) / 4
or in code:
count_moves = (pow(3, n) + (n % 2 * 2 - 1) * -3) / 4;
However, you can't get rid of the pow() this time, or you would have to write a loop for that instead.
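For n up to 10^7 these values overflow long before the closed form helps, so in practice you would run the recurrence as a loop with the problem's modulo applied at every step. A minimal sketch of that (the variable names are my own):

#include <cstdint>
#include <iostream>

int main()
{
    const std::int64_t MOD = 1000000007;  // modulus from the problem statement
    std::int64_t n;
    std::cin >> n;
    std::int64_t count_moves = 0;         // count(1) = 0
    for (std::int64_t i = 2; i <= n; ++i)
    {
        // count(i) = 3*count(i-1) + 3*(-1)^i, kept in the range [0, MOD)
        count_moves = (3 * count_moves % MOD + (i % 2 == 0 ? 3 : MOD - 3)) % MOD;
    }
    std::cout << count_moves << '\n';
}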
I believe one of your issues is that you are recalculating things.
Take, for example, n=4: count(3, x) is called 3 times for each x in [0, 3].
However, if you made a std::map<int,int> you could save the value for each (n, present) pair and only calculate each value once.
This will take more space: the map will hold 4*(n-1) entries when you are done. That is still probably too large for 10^9?
Another thing you can do is multithread: each call to count can instigate its own thread. You then need to be careful to be thread-safe when changing the global count, and the state of the std::map if you decide to use it.
Edit:
Calculate count(n, x) once for each n in [1, n-1] and x in [0, 3]; then count(n, 0) = a*count(n-1, 1) + b*count(n-1, 2) + c*count(n-1, 3).
If you can figure out the pattern for what a, b, c are for a given n, or even the a, b, c for the n-1 case, then you may be able to solve this problem easily. One such bottom-up version is sketched below.
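By symmetry the three non-D vertices are interchangeable, so the memoization collapses to just two numbers per step. A minimal sketch of that bottom-up idea (my own formulation of this answer's suggestion, not the answerer's code):

#include <cstdint>
#include <iostream>

int main()
{
    const std::int64_t MOD = 1000000007;
    int n;
    std::cin >> n;
    std::int64_t at_d = 1;      // walks of length 0 ending at D
    std::int64_t at_other = 0;  // walks of length 0 ending at one specific non-D vertex
    for (int i = 0; i < n; ++i)
    {
        // D is reachable from each of the 3 other vertices; a specific non-D
        // vertex is reachable from D and from the 2 remaining non-D vertices
        std::int64_t new_at_d = 3 * at_other % MOD;
        std::int64_t new_at_other = (at_d + 2 * at_other) % MOD;
        at_d = new_at_d;
        at_other = new_at_other;
    }
    std::cout << at_d << '\n';  // n = 2 prints 3, n = 4 prints 21
}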

How to find the sum of largest odd factor of a number?

I have a problem: find the sum of the largest odd factors of the numbers up to x, F(x) = f(1)+f(2)+...+f(x), where f(n) is the largest odd factor of n. The largest odd factor of 1 is 1, of 2 it is 1, of 3 it is 3, of 4 it is 1, and so on...
E.g. the sum of the largest odd factors up to 6 is f(1)+f(2)+f(3)+f(4)+f(5)+f(6), that is 1+1+3+1+5+3 = 14.
And I want to handle numbers up to 2*10^9.
So this is my code for f(x); it scores 82/100 before timing out:
unsigned long long int biggestOddFactor(unsigned long long int n)
{
    // divide out every factor of 2; what remains is the largest odd factor
    while (n % 2 == 0) {
        n /= 2;
    }
    return n;
}
This is another method, removing the trailing zero bits of the number, but it only scores 77/100:
#include <bitset>
#include <string>

unsigned long long int biggestOddFactorUsingBinary(unsigned long long int n)
{
    std::string bin = std::bitset<32>(n).to_string();
    int delet = 0;
    // count the trailing '0' bits
    for (int i = bin.length() - 1; i >= 0; i--) {
        if (bin[i] == '0') {
            delet += 1;
        } else {
            break;
        }
    }
    bin = bin.substr(0, bin.length() - delet);
    return std::bitset<32>(bin).to_ulong();
}
Is there any way to optimize my algorithm?
You're asking the wrong question. The problem is not that you are finding the biggest odd factor too slowly; it's that your algorithm for finding the sum is too slow, not that this one part of it is.
For example, the largest odd factor of every odd number is the number itself, and there's a formula for the sum of the first k odd numbers (it's k^2). Why are you not using that to halve the number of times you call biggestOddFactor? That's just for starters.
The largest odd factor of any even number is the same as that of half that number. So the sum of the largest odd factors of, say, 16, 14, 12, and 10 is the same as that of 8, 7, 6, and 5. Yet you compute these two ranges separately. Why?
And so on. You need to optimize your algorithm, not your implementation of a bad algorithm. The observations above suggest several possible recursive implementations that will be much faster.
I just very quickly whipped up a solution to this problem using a better algorithm, and it's thousands of times faster than calling biggestOddFactor on every number. Note that my solution is recursive.
You should always consider algorithmic optimizations before you try to micro-optimize an implementation. The payoff tends to be much greater and the result is much less fragile. A sketch of such a recursion follows.
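For concreteness, here is a minimal sketch of the kind of recursion hinted at above (my reconstruction, not the answerer's actual code). The odd numbers up to x contribute themselves, and the sum of the first k odd numbers is k*k; the even numbers 2m contribute f(m), which folds into F(floor(x/2)):

#include <cstdint>
#include <iostream>

// F(x) = sum of the largest odd factor of every number in 1..x
// F(x) = ceil(x/2)^2 + F(floor(x/2)), so only O(log x) steps
std::uint64_t F(std::uint64_t x)
{
    if (x == 0)
        return 0;
    std::uint64_t odds = (x + 1) / 2;  // how many odd numbers are <= x
    return odds * odds + F(x / 2);
}

int main()
{
    std::cout << F(6) << '\n';           // prints 14, matching the example
    std::cout << F(2000000000) << '\n';  // instant even at 2*10^9
}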

How do determine Big-O of recursive code?

I have the following code, which is an answer to this question: https://leetcode.com/problems/add-digits/
class Solution {
public:
    int addDigits(int num) {
        if (!(num / 10))   // already a single digit (the original '!num/10' parses as '(!num)/10' and is always false)
            return num;
        long d = 1;
        int retVal = 0;
        while (num / d) {  // find the smallest power of 10 above num
            d *= 10;
        }
        for (; d >= 1; d /= 10) {  // peel off and sum the digits
            retVal += num / d;
            num %= d;
        }
        if (retVal > 9)
            retVal = addDigits(retVal);
        return retVal;
    }
};
As a follow-up to this, though, I'm trying to determine what the Big-O growth is. My first attempt at calculating it came out to O(n^n) (I assumed so since the growth of each depth is directly dependent on n every time), which is just depressing. Am I wrong? I hope I'm wrong.
In this case it's linear, O(n), because you call the addDigits method recursively only once in the method body, without any loop around the call.
More details:
Determining complexity for recursive functions (Big O notation)
Update:
It's linear in the sense that the recursive function calls itself only once. However, that alone is not the whole story, because the amount of work per call depends on the input parameter.
Let n be the number of digits of num in base 10.
I'd say that
T(1) = O(1)
T(n) = n + T(n') with n' <= n
which gives us
O(n^2).
But can we do better?
Note that the maximum number representable with 2 digits is 99, which reduces like this: 99 -> 18 -> 9.
Note also that we can always collapse 10 digits into 2: 9999999999 -> 90. For n > 10 we can decompose the number into n/10 segments of up to 10 digits each and reduce those segments to 2-digit numbers to be summed. The sum of n/10 2-digit numbers always has at most (n/10)*2 digits. Therefore
T(n) = n + T(n/5) for n >= 10.
The other base cases with n < 10 are easier. This gives
T(n) = O(1) for n < 10
T(n) = n + T(n/5) for n >= 10
Solving the recurrence (a geometric series: n + n/5 + n/25 + ... <= (5/4)n) gives
O(n) for n >= 10.
Looks like it's O(1) for values < 10, and O(n) for other values.
I'm not well versed enough in Big-O notation to say how these should be combined.
Most probably the first part is negligible in significance, and so the overall time complexity becomes O(n).

How to calculate the sum of the bitwise xor values of all the distinct combination of the given numbers efficiently?

Given n (n <= 1000000) positive integers (each smaller than 1000000), the task is to calculate the sum of the bitwise XOR (^ in C/C++) values over all distinct pairs of the given numbers.
The time limit is 1 second.
For example, if the 3 integers 7, 3, and 5 are given, the answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
#include <bits/stdc++.h>
using namespace std;

int num[1000001];

int main()
{
    int n, i, sum, j;
    scanf("%d", &n);
    sum = 0;
    for (i = 0; i < n; i++)
        scanf("%d", &num[i]);
    // O(n^2) loop over all distinct pairs
    for (i = 0; i < n - 1; i++)
    {
        for (j = i + 1; j < n; j++)
        {
            sum += (num[i] ^ num[j]);
        }
    }
    printf("%d\n", sum);
    return 0;
}
But my code fails to run in 1 second. How can I write it in a faster way, so that it runs in 1 second?
Edit: This is actually an online-judge problem, and I am getting CPU Limit Exceeded with the above code.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go down 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair combination and then summing them up, but I don't know it. In any case, on contemporary processors you'll be shaving off microseconds at best, because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important with repetitive branching, and you have nothing like that here.
The biggest problem in your algorithm is reading from the standard input. Although "scanf" takes only 5 characters in your source code, in machine language it is the bulk of your program. Unfortunately, if the data actually changes each time you run your code, there is no way around the requirement of reading from stdin, and it will make no difference whether you use scanf, std::cin >>, or even attempt to implement your own method of reading characters from input and converting them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you are running your code via: myprogram < data.
This function grows quadratically (thanks @rici). At around 25,000 positive integers, each being 999,999 (worst case), the for-loop calculation alone can finish in approximately a second. Making this work with the input as you have specified, for 1 million positive integers, just doesn't seem possible.
With the hint in Alan Stokes's answer, you can get linear complexity instead of quadratic with the following:
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
    std::size_t res = 0;
    for (std::size_t b = 0; b != 32; ++b) {
        // how many numbers have bit b set...
        const std::size_t count_1 =
            std::count_if(v.begin(), v.end(),
                          [b](std::uint32_t n) { return (n >> b) & 0x01; });
        // ...and how many have it clear
        const std::size_t count_0 = v.size() - count_1;
        // each (0,1) pair contributes 2^b to the total
        res += (count_0 * count_1) << b;
    }
    return res;
}
Explanation:
x ^ y = Sum_b((x & b) ^ (y & b)), where b ranges over the single-bit masks from 1<<0 to 1<<31.
For a given bit, with count_0 and count_1 the respective numbers of values having that bit 0 or 1, we have count_0 * (count_0 - 1) / 2 pairs of the form 0^0, count_0 * count_1 pairs of the form 0^1, and count_1 * (count_1 - 1) / 2 pairs of the form 1^1; since 0^0 and 1^1 are 0, only the count_0 * count_1 term contributes.
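A quick sanity check of xor_sum against the example from the question (assuming the function above is in scope):

#include <cassert>
#include <iostream>

int main()
{
    std::vector<std::uint32_t> v{7, 3, 5};
    assert(xor_sum(v) == 12);  // 7^3 + 7^5 + 3^5 = 4 + 2 + 6 = 12
    std::cout << xor_sum(v) << '\n';
}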

Optimizing my code for finding the factors of a given integer

Here is my code, but I'd like to optimize it. I don't like the idea of testing all the numbers up to the square root of n, considering that one could be faced with finding the factors of a large number. Your answers would be of great help. Thanks in advance.
unsigned int* factor(unsigned int n)
{
    unsigned int tab[40];  // note: tab is a local array, so the returned pointer dangles
    int dim = 0;
    for (int i = 2; i <= (int)sqrt(n); ++i)
    {
        while (n % i == 0)
        {
            tab[dim++] = i;
            n /= i;
        }
    }
    if (n > 1)
        tab[dim++] = n;
    return tab;
}
Here's a suggestion for how to do this in 'proper' C++ (since you tagged the question c++).
PS. Almost forgot to mention: I optimized the call to sqrt away :)
See it live on http://liveworkspace.org/code/6e2fcc2f7956fafbf637b54be2db014a
#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm>

typedef unsigned int uint;

std::vector<uint> factor(uint n)
{
    std::vector<uint> tab;
    for (unsigned long i = 2; i * i <= n; ++i)
    {
        while (n % i == 0)
        {
            tab.push_back(i);
            n /= i;
        }
    }
    if (n > 1)
        tab.push_back(n);
    return tab;
}

void test(uint x)
{
    auto v = factor(x);
    std::cout << x << ":\t";
    std::copy(v.begin(), v.end(), std::ostream_iterator<uint>(std::cout, ";"));
    std::cout << std::endl;
}

int main(int argc, const char *argv[])
{
    test(1);
    test(2);
    test(4);
    test(43);
    test(47);
    test(9997);
}
Output
1:
2: 2;
4: 2;2;
43: 43;
47: 47;
9997: 13;769;
There's a simple change that will cut the run time somewhat: factor out all the 2's, then only check odd numbers.
If you use
... i*i <= n; ...
it may run much faster than i <= sqrt(n).
By the way, you should try to handle factors of negative n, or at least make sure you never pass a negative number.
I'm afraid you cannot. There is no known method on the planet that can factorize large integers in polynomial time. However, there are some methods that can help you slightly (not significantly) speed up your program. Search Wikipedia for more references: http://en.wikipedia.org/wiki/Integer_factorization
As seen from your solution, you basically find all prime factors (the while (n % i == 0) condition works like that). Especially for large numbers, you could compute the primes beforehand and check only those as candidate divisors. The primes can be computed using the Sieve of Eratosthenes or some other efficient method; a sketch follows.
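A minimal sketch of that precomputation (my own illustration; sieving the primes up to 65535 is enough to trial-divide any 32-bit unsigned int):

#include <cstdint>
#include <vector>

// Sieve of Eratosthenes: returns all primes <= limit
std::vector<std::uint32_t> sieve(std::uint32_t limit)
{
    std::vector<bool> composite(limit + 1, false);
    std::vector<std::uint32_t> primes;
    for (std::uint32_t i = 2; i <= limit; ++i)
    {
        if (composite[i])
            continue;
        primes.push_back(i);
        // mark every multiple of i from i*i upward as composite
        for (std::uint64_t j = static_cast<std::uint64_t>(i) * i; j <= limit; j += i)
            composite[j] = true;
    }
    return primes;
}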
unsigned int* factor(unsigned int n)
If unsigned int is the typical 32-bit type, the numbers are too small for any of the more advanced algorithms to pay off. The usual enhancements for the trial division are of course worthwhile.
If you move the division by 2 out of the loop and divide only by odd numbers in the loop, as mentioned by Pete Becker, you essentially halve the number of divisions needed to factor the input number, and thus speed up the function by a factor of very nearly 2.
If you carry that one step further and also eliminate the multiples of 3 from the divisors in the loop, you reduce the number of divisions, and hence increase the speed, by a factor close to 3 (on average; most numbers don't have any large prime factors but are divisible by 2 or by 3, and for those the speedup is much smaller; but those numbers are quick to factor anyway. If you factor a long range of numbers, the bulk of the time is spent factoring the few numbers with large prime divisors).
// if your compiler doesn't transform that to bit operations, do it yourself
while (n % 2 == 0) {
    tab[dim++] = 2;
    n /= 2;
}
while (n % 3 == 0) {
    tab[dim++] = 3;
    n /= 3;
}
// d steps through 5, 7, 11, 13, 17, ... skipping multiples of 2 and 3
for (int d = 5, s = 2; d * d <= n; d += s, s = 6 - s) {
    while (n % d == 0) {
        tab[dim++] = d;
        n /= d;
    }
}
If you're calling that function really often, it would be worthwhile to precompute the 6542 primes not exceeding 65535, store them in a static array, and divide only by the primes, eliminating all divisions that are guaranteed a priori not to find a divisor.
If unsigned int happens to be larger than 32 bits, then using one of the more advanced algorithms would be profitable. You should still begin with trial division to find the small prime factors (whether "small" should mean <= 1000, <= 10000, <= 100000 or perhaps <= 1000000 would need to be tested; my gut feeling says one of the smaller values would be better on average). If the factorisation is not yet complete after the trial-division phase, check whether the remaining factor is prime using e.g. a deterministic (for the range in question) variant of the Miller-Rabin test. If it's not, search for a factor using your favourite advanced algorithm. For 64-bit numbers, I'd recommend Pollard's rho algorithm or elliptic curve factorisation. Pollard's rho algorithm is easier to implement and finds factors of numbers of that magnitude in comparable time, so that's my first recommendation.
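For reference, a rough sketch of Pollard's rho in its classic Floyd cycle-finding form (x = y = 2 and f(x) = x^2 + c are the conventional choices; __uint128_t is a GCC/Clang extension used here for the modular multiplication). It assumes n is odd and composite, with primality checked beforehand as described above:

#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// returns a nontrivial factor of an odd composite n
std::uint64_t pollard_rho(std::uint64_t n)
{
    for (std::uint64_t c = 1;; ++c)  // retry with a new constant if the walk degenerates
    {
        auto f = [n, c](std::uint64_t x) {
            return (std::uint64_t)(((__uint128_t)x * x + c) % n);
        };
        std::uint64_t x = 2, y = 2, d = 1;
        while (d == 1)
        {
            x = f(x);      // tortoise: one step
            y = f(f(y));   // hare: two steps
            d = std::gcd(x > y ? x - y : y - x, n);
        }
        if (d != n)
            return d;      // nontrivial factor found
    }
}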
unsigned int is way too small to run into any performance problems here. I tried to measure the time of your algorithm with Boost but couldn't get any useful output (too fast). So you shouldn't worry about this range of integers at all.
Using i*i, I was able to factor 1,000,000 9-digit integers in 15.097 seconds. It's good to optimize an algorithm, but instead of "wasting" time (it depends on your situation) it's important to consider whether a small improvement is really worth the effort. Sometimes you have to ask yourself if you really need to be able to process 1,000,000 ints in 10 seconds, or if 15 is fine as well.