Can anyone reduce the complexity of my code? Problem E of Codeforces Round 113 (Div. 2), C++

Link to the problem:
Problem Statement:
*You are given a tetrahedron. Let's mark its vertices with letters A, B, C, and D correspondingly.
An ant is standing in the vertex D of the tetrahedron. The ant is quite active and he wouldn't stay idle. At each moment of time, he makes a step from one vertex to another one along some edge of the tetrahedron. The ant just can't stand on one place.
You do not have to do much to solve the problem: your task is to count the number of ways in which the ant can go from the initial vertex D to itself in exactly n steps. In other words, you are asked to find out the number of different cyclic paths with the length of n from vertex D to itself. As the number can be quite large, you should print it modulo 1000000007 (10^9 + 7).*
The first line contains the only integer n (1 ≤ n ≤ 10^7), the required length of the cyclic path.
Print the only integer: the required number of ways modulo 1000000007 (10^9 + 7).
Example 1: input n = 2, output 3.
Example 2: input n = 4, output 21.
My approach to the problem:
I have written recursive code that takes two inputs, n and the present vertex index, and then explores all possible combinations of moves.
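#include <iostream>
using namespace std;
#define mod 1000000007   // 10^9 + 7 (was 10000000)
#define ll long long
ll count_moves=0;
void count(ll n, int present) // counts walks of n more steps from `present` ending at vertex 0 (= D)
{
    if(n==0 and present==0) count_moves+=1, count_moves%=mod; //base_condition
    else if(n>1){ //Generating All possible Combinations
        count(n-1,(present+1)%4);
        count(n-1,(present+2)%4);
        count(n-1,(present+3)%4);
    }
    else if(n==1 and present) count(n-1,0);
}
int main()
{
    ll n; cin>>n;
    if(n==1) {
        cout<<"0"; return 0;
    }
    count(n,0);
    cout<<count_moves;
}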
But the problem is that I am getting Time Limit Exceeded, since the time complexity of my code is very high. Can anyone suggest how I can optimize or memoize my code to reduce its complexity?
Edit 1: Some people are commenting about the macros and the division; that is not the issue. The range of n is up to 10^7 and the complexity of my code is exponential, so my actual question is how to bring it down to linear time, i.e. O(n).

Any time you build a solution on recursion and exceed the time limit, you have to understand that the recursion is likely the problem.
The best solution is to not use a recursion.
Look at the results you get for small n:
n    count
1    0
2    3
3    6
4    21
5    60
6    183
7    546
⋮    ⋮
While it might be hard to find a pattern in the first couple of terms, it gets easier later on.
Each term is roughly 3 times the previous one; more precisely, $a_n = 3a_{n-1} + 3(-1)^n$ with $a_1 = 0$.
Now you could just write a for loop for it:
for(int i = 0; i < n-1; i++)
    count_moves = ((count_moves * 3 + (long long)std::pow(-1, i) * 3) % mod + mod) % mod;  // reduce mod 10^9+7 every step
or to get rid of pow():
for(int i = 0; i < n-1; i++)
    count_moves = ((count_moves * 3 + (i % 2 * 2 - 1) * -3) % mod + mod) % mod;  // i even: +3, i odd: -3
Furthermore, you could even turn that into a closed-form formula and get rid of the for loop: $a_n = \dfrac{3^n + 3(-1)^n}{4}$
or in code:
count_moves = (pow(3, n) + (n % 2 * 2 - 1) * -3) / 4;  // no modulo, and exact only while 3^n fits in a double (n <= 33)
However, you can't get rid of pow() this time; otherwise you would have to write a loop for it after all.
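Under the modulus the problem asks for, the closed form still works if pow() and the division are replaced. A sketch, assuming the answer is needed modulo 10^9 + 7: binary exponentiation computes 3^n with O(log n) multiplications, and the division by 4 becomes a multiplication by the modular inverse of 4. That is valid because 10^9 + 7 is prime and 3^n + 3(-1)^n is always divisible by 4. The names mod_pow and closed_walks_mod are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t MOD = 1000000007;  // 10^9 + 7, a prime

// Binary exponentiation: base^exp mod MOD using O(log exp) multiplications.
std::int64_t mod_pow(std::int64_t base, std::int64_t exp) {
    std::int64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) result = result * base % MOD;
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// a_n = (3^n + 3 * (-1)^n) / 4 reduced mod MOD. Dividing by 4 becomes multiplying
// by 4^(MOD - 2), the modular inverse of 4 (Fermat's little theorem).
std::int64_t closed_walks_mod(std::int64_t n) {
    assert(n >= 1);
    const std::int64_t sign_term = (n % 2 == 0) ? 3 : MOD - 3;  // 3 * (-1)^n, kept non-negative
    const std::int64_t numerator = (mod_pow(3, n) + sign_term) % MOD;
    return numerator * mod_pow(4, MOD - 2) % MOD;
}
```

closed_walks_mod(2) and closed_walks_mod(4) return 3 and 21, matching the examples, and even for n = 10^7 a call needs only O(log n) multiplications.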

I believe one of your issues is that you are recalculating things.
Take n = 4 as an example: count(4,0) calls count(3,x) for the three other vertices x, and each of those calls count(2,·) for its own three neighbours, so pairs such as count(2,0) are evaluated several times.
If you instead kept a std::map<std::pair<long long,int>, long long> (with count returning its result instead of adding to a global), you could save the value for each (n, present) pair and calculate it only once.
This takes more space: the map will have about 4*(n-1) entries when you are done. That is probably still too large for n up to 10^7.
Another thing you can do is multithread: each call to count could start its own thread. You then need to be careful to stay thread-safe when changing the global count, and the std::map if you decide to use it.
Calculate count(k,x) once for each k in [1, n-1] and x in [0, 3]; then count(n,0) = a*count(n-1,1) + b*count(n-1,2) + c*count(n-1,3).
If you can figure out the pattern for what a,b,c are given n or maybe even the a,b,c for the n-1 case then you may be able to solve this problem easily.
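In this problem the coefficients turn out to be a = b = c = 1, because the ant's first step goes to exactly one of A, B or C, and by symmetry those three vertices never need to be tracked separately. A sketch of the resulting DP, assuming the answer modulo 10^9 + 7: it keeps two counters instead of a std::map, so it runs in O(n) time and O(1) memory. walks_back_to_d is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Walks of length n on a tetrahedron (every vertex joined to the other three)
// that start and end at D, modulo 10^9 + 7. A, B and C are symmetric, so one
// counter covers all three of them.
std::int64_t walks_back_to_d(std::int64_t n) {
    assert(n >= 0);
    const std::int64_t MOD = 1000000007;
    std::int64_t at_d = 1;      // walks of the current length ending at D (length 0: just D)
    std::int64_t at_other = 0;  // walks of the current length ending at A, B or C
    for (std::int64_t step = 0; step < n; ++step) {
        // Each of A, B, C has exactly one edge to D.
        const std::int64_t next_d = at_other;
        // D has three edges to {A, B, C}; each of A, B, C has two edges to the other two.
        const std::int64_t next_other = (3 * at_d + 2 * at_other) % MOD;
        at_d = next_d;
        at_other = next_other;
    }
    return at_d;
}
```

For n = 2 and n = 4 it returns 3 and 21; for n = 10^7 it is ten million iterations of a few integer operations.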


Finding the sequence so that the event is finished at the earliest

This is a problem from an informatics olympiad that I have been trying to solve for some time. It is important to me because it contains an underlying fundamental problem that I see in a lot of other problems.
N citizens take part in an event where each has to program on a single computer, then eat chocolates, then eat doughnuts. The time the i-th citizen takes for each task is given as input. Each citizen has to do the tasks in order: first program, then eat chocolates, then eat doughnuts. Any number of people can eat chocolates or doughnuts at the same time, but since there is only one computer, only one person can program at a time. Once a person is done programming, he moves on to the chocolates and the next person starts programming. The task is to find the order in which the citizens should be sent to program so that the event ends as early as possible, and to output that time.
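I worked on this problem using the following approach:
If I start with the i-th citizen and have already found the time t_{n-1} for the remaining n-1 citizens, then $t_n = \max(n_i[0] + n_i[1] + n_i[2],\; n_i[0] + t_{n-1})$, where n_i[0], n_i[1] and n_i[2] are the i-th citizen's programming, chocolate and doughnut times. E.g.: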
18 7 6
23 10 27
20 9 14
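Sending them in the order given, the finishing times are 18+7+6 = 31, 18+23+10+27 = 78 and 18+23+20+9+14 = 84, so the event takes 84. But if you start with the citizen whose times are 23 10 27 (order 2, 3, 1), the event takes 74, which is less.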
I implemented this approach; the code is below. However, the complexity of my approach is O(n!). I can see repeated subproblems, so I could use a DP approach. But then I need to store the time value for each sublist i to j such that it could begin with any k from i to j, and so on. This storage would again be complex and require n! space. How can I solve this problem and similar ones?
Here is my program for this approach:
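#include <iostream>
#include <vector>
#include <climits>
// Tries every citizen as the first one to program and recurses on the rest: O(N!) time.
int min_time_sequence(const std::vector<std::vector<int> >& Info, int N)
{
    if (N == 0) return 0;
    if (N == 1)
    {
        int val = Info[0][0] + Info[0][1] + Info[0][2];
        return val;
    }
    int mn = INT_MAX;
    for (int i = 0; i < N; ++i)
    {
        //prepare new list: everyone except citizen i
        std::vector<std::vector<int> > tmp = Info;
        tmp.erase(tmp.begin() + i);
        int rest = min_time_sequence(tmp, N-1);  // must not shadow mn
        int v1 = Info[i][0] + rest;
        int v2 = Info[i][0] + Info[i][1] + Info[i][2];
        int larger = v1 > v2 ? v1 : v2;
        if (mn > larger) mn = larger;
    }
    return mn;
}
int main()
{
    int N;
    std::cin >> N;
    std::vector<std::vector<int> > Info;
    for (int i = 0; i < N; ++i)
    {
        std::vector<int> row(3);  // programming, chocolate, doughnut times
        std::cin >> row[0] >> row[1] >> row[2];
        Info.push_back(row);
    }
    int mx = 0;
    if (N > 0)
        mx = min_time_sequence(Info, N);
    std::cout << mx << std::endl;
    return 0;
}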
Since you asked for general techniques, you might want to look at greedy algorithms, that is, algorithms that repeatedly make the locally best next selection. In this case, that might mean sending the remaining person who takes the longest total time (the sum of the three times) to program next, so he or she finishes eating sooner, and no one who starts later takes more time.
If such an algorithm were optimal, the program could simply sort the list by the sum of times, in decreasing order, which takes O(N log N) time.
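A minimal sketch of such a greedy, with one deliberate change of sort key: it orders citizens by the time they need after programming (chocolate + doughnut) rather than by the full sum. The programming times always add up to the same total, and only the eating times can overlap. Sorting by the full sum can be worse: with times (100, 1, 1) and (1, 10, 10), sending the first citizen first ends the event at 121, while the reverse order ends it at 103. An exchange argument like the one described next shows that swapping two adjacent citizens into this order never makes the event longer. Citizen and min_event_time are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical input type: one citizen's programming, chocolate and doughnut times.
struct Citizen {
    int program;
    int chocolate;
    int doughnut;
};

// Greedy sketch: send citizens to the computer in decreasing order of the time
// they need AFTER programming (chocolate + doughnut), then simulate the event.
long long min_event_time(std::vector<Citizen> citizens) {
    std::sort(citizens.begin(), citizens.end(),
              [](const Citizen& a, const Citizen& b) {
                  return a.chocolate + a.doughnut > b.chocolate + b.doughnut;
              });
    long long computer_free_at = 0;  // programming is sequential on one computer
    long long event_end = 0;         // latest finishing time seen so far
    for (const Citizen& c : citizens) {
        assert(c.program >= 0 && c.chocolate >= 0 && c.doughnut >= 0);
        computer_free_at += c.program;
        event_end = std::max(event_end, computer_free_at + c.chocolate + c.doughnut);
    }
    return event_end;
}
```

For the three citizens in the question, min_event_time({{18, 7, 6}, {23, 10, 27}, {20, 9, 14}}) returns 74, the better order found by hand, after an O(N log N) sort.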
You would, however, be expected to prove that your solution is valid. One way to do that is known as “Greedy Stays Ahead.” That is an inductive proof where you show that the solution your greedy algorithm produces is at least as good as any other (by some measure equivalent to optimality at the final step) after its first step, then that it is also at least as good after its second step, the step after that, and so on. Hint: you might try measuring the worst-case amount of time the event could still need after each person starts programming. At the final step, when the last person starts programming, this measure is equivalent to optimality.
Another method to prove an algorithm is optimal is “Proof by Exchange.” This is a form of proof by contradiction in which you hypothesize that some different solution is optimal, then show that exchanging a part of that solution with a part of your solution could improve the supposedly optimal solution. That contradicts the premise that it was ever optimal, which proves that no other solution is better. So: assume the optimal order is different, meaning the last person to finish started after someone else who took less time. What happens if you switch the positions of those two people?
Greedy solutions are not always best, so in cases where they are not, you would want to look at other techniques, such as symmetry-breaking and pruning the search tree early.

How to find the minimum number of primatics that sum to a given number

Given a number N (<=10000), find the minimum number of primatic numbers which sum up to N.
A primatic number is a number that is either a prime or a prime raised to the power of itself, i.e. prime^prime, e.g. 4, 27, etc.
I tried to find all the primatic numbers using a sieve and stored them in a vector (code below), but now I can't see how to find the minimum number of primatics that sum to a given number.
Here's my sieve:
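#include <algorithm>
#include <climits>
#include <vector>
#define MAX 10000
typedef long long int ll;
ll modpow(ll a, ll n, ll temp) {  // (a^n) % temp by binary exponentiation
    ll res=1, y=a;
    while (n>0) {
        if (n&1)
            res=(res*y)%temp;
        y=(y*y)%temp;
        n>>=1;
    }
    return res%temp;
}
int isprimeat[MAX+20];    // 0 = prime (for i >= 2), 1 = composite
std::vector<int> primeat; // all primatics <= MAX
//Finding all prime numbers till 10000
void seive()
{
    ll i,j;
    for (i=2; i<=MAX; i++) {
        if (isprimeat[i]==0) {
            for (j=i*i; j<=MAX; j+=i) isprimeat[j]=1;
        }
    }
    for (i=2; i<=MAX; i++) {
        if (isprimeat[i]==0) {
            primeat.push_back(i);
            // with MAX = 10000, p^p fits only for p = 2, 3, 5 (7^7 = 823543), and for those y*y cannot overflow
            if (i <= 5) primeat.push_back(modpow(i, i, LLONG_MAX));
        }
    }
}
int main()
{
    seive();
    std::sort(primeat.begin(), primeat.end());
}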
One method could be to store all primatics less than or equal to N in a sorted list - call this list L - and recursively search for the shortest sequence. The easiest approach is "greedy": pick the largest spans / numbers as early as possible.
For N = 14 you'd have L = {2,3,4,5,7,11,13}, so you'd want to make an algorithm / process that tries these sequences:
13 is too small
13 + 13 -> 13 + 2 will be too large
11 is too small
11 + 11 -> 11 + 4 will be too large
11 + 3 is a match.
You can continue the process by making the search function recurse each time it needs another primatic in the sum, which you would aim to have occur a minimum number of times. To do so you can pick the largest -> smallest primatic in each position (the 1st, 2nd etc primatic in the sum), and include another number in the sum only if the primatics in the sum so far are small enough that an additional primatic won't go over N.
I'd have to make a working example to find an N small enough that doesn't result in just 2 numbers in the sum. Since any natural number can be expressed as the sum of at most 4 squares, and L is a denser set than the set of squares, I'd think a result of 3 or more terms would be rare for any N you'd want to compute by hand.
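A sketch of that largest-first search, under a few assumptions: primatics is the list L sorted ascending, the terms of a sum are kept non-increasing so each combination is tried once, and the search is repeated with 1, 2, 3, ... terms (iterative deepening) so the first hit uses the fewest terms. The two pruning checks correspond to the "too large" and "too small" steps in the walk-through above. can_sum and min_terms are hypothetical names; the worst-case cost grows quickly with the number of terms, but these answers need only a few.

```cpp
#include <cassert>
#include <vector>

// Can `remaining` be written as exactly `terms` primatics, each no larger than
// primatics[max_idx]? Terms are tried largest first and never increase, so each
// multiset of terms is visited at most once. `primatics` must be sorted ascending.
bool can_sum(const std::vector<int>& primatics, int remaining, int terms, int max_idx) {
    if (terms == 0) return remaining == 0;
    for (int i = max_idx; i >= 0; --i) {
        const int p = primatics[i];
        if (p > remaining) continue;                        // too large: try a smaller one
        if (static_cast<long long>(p) * terms < remaining)  // too small: even `terms` copies
            break;                                          // of p fall short, and the rest are smaller
        if (can_sum(primatics, remaining - p, terms - 1, i)) return true;
    }
    return false;
}

// Iterative deepening: try 1 term, then 2, then 3, ... so the first success is minimal.
// Returns -1 if n cannot be written as a sum of primatics at all (e.g. n = 1).
int min_terms(const std::vector<int>& primatics, int n) {
    assert(n >= 1);
    if (primatics.empty()) return -1;
    const int last = static_cast<int>(primatics.size()) - 1;
    for (int k = 1; k <= n; ++k)
        if (can_sum(primatics, n, k, last)) return k;
    return -1;
}
```

With the primatics up to 14, min_terms({2, 3, 4, 5, 7, 11, 13}, 14) returns 2, finding 11 + 3.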
Dynamic Programming approach
I have to clarify that 'greedy' is not the same as 'dynamic programming'; greedy can give sub-optimal results. This problem does have a DP solution, though. Again, I won't write the final process in code, but will explain it as a point of reference for building a working DP solution.
To do this we need to build up solutions from the bottom up. What you need is a structure that can store known solutions for all numbers up to some N, this list can be incrementally added to for larger N in an optimal way.
Consider that for any N, if it's primatic then the number of terms for N is just 1. This applies for N = 2-5, 7, 11, 13, 17, 19. The number of terms for every other N must be at least two, which means it is either a sum of two primatics or a sum of a primatic and some other N.
The first few examples that aren't trivial:
6 can be either 2+4 or 3+3; all the terms here are primatic, so the minimum number of terms for 6 is 2.
10 can be 2+8, 3+7, 4+6 or 5+5. 8 and 6 are not primatic, but 3+7 and 5+5 consist of primatics, so the minimum is again 2 terms.
12 can be 2+10, 3+9, 4+8, 5+7 or 6+6. Of these, only 5+7 consists of primatics (10, 9, 8 and 6 are not), so again 2 terms is the minimum.
14: ditto, there are two-primatic solutions 3+11 and 7+7.
The structure for storing all of these solutions needs to be able to iterate across solutions of equal rank / number of terms. You already have a list of primatics, this is also the list of solutions that need only one term.
Sol[term_length] = list(numbers). You will also need a function / cache to look up some N's shortest term length, e.g. S(N) = term_length iff N is in Sol[term_length].
Sol[1] = {2,3,4,5,7,...} and Sol[2] = {6,8,9,10,12,14,15,...}, and so on for Sol[3] onwards.
Every primatic number is found in Sol[1], since it needs only one term. Any number requiring two primatics will be found in Sol[2], any requiring 3 in Sol[3], etc.
What you need to recognize here is that a number N with S(N) = 3 can be expressed as Sol[1][a] + Sol[1][b] + Sol[1][c] for some indices a, b, c, but it can also be expressed as Sol[1][a] + Sol[2][d], since every element of Sol[2] is expressible as Sol[1][x] + Sol[1][y].
This algorithm will in effect search Sol[1] for a given N, then look in Sol[1] + Sol[K] with increasing K, but to do this you will need S and Sol structures roughly in the form shown here (or able to be accessed / queried in a similar manner).
Working Example
Using the above as a guideline, I've put this together quickly; it even shows which multi-term sum it uses.
I can explain the code in-depth if you want but the real DP section is around lines 40-64. The recursion depth (also number of additional terms in the sum) is k, a simple dual-iterator while loop checks if a sum is possible using the kth known solutions and primatics, if it is then we're done and if not then check k+1 solutions, if any. Sol and S work as described.
The only confusing part might be the use of reverse iterators, it's just to make != end() checking consistent for the while condition (end is not a valid iterator position but begin is, so != begin would be written differently).
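Edit - FYI, counting prime powers such as 8, 9 and 16 as primatic, the first number that takes at least 3 terms is 959; I had to run my algorithm to 1000 numbers to find it. It's summed from 6 + 953 (primatic), and no matter how you split 6 it's still 3 terms. With the question's definition (primes plus p^p, i.e. 4, 27 and 3125 up to 10000) it happens much earlier: the only even primatics are 2 and 4, and 95 - 2 = 93 and 95 - 4 = 91 are both composite, so 95 needs 3 terms, e.g. 3 + 3 + 89.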
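For comparison, the same minimum can also be computed with a plain unbounded coin-change DP instead of the Sol/S structure described above: best[x] = 1 + min(best[x - p]) over primatics p ≤ x. This is a swapped-in technique, not the code referred to above. With N ≤ 10000 there are roughly 1,200 primatics, so the DP takes at most about 12 million steps. primatics_up_to and min_primatic_terms are hypothetical names, and the primatic list follows the question's definition (primes plus p^p).

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

// Primatics up to `limit`: every prime p, plus p^p when it fits (4, 27, 3125 for limit 10000).
std::vector<int> primatics_up_to(int limit) {
    std::vector<bool> composite(limit + 1, false);
    std::vector<int> result;
    for (int p = 2; p <= limit; ++p) {
        if (composite[p]) continue;
        result.push_back(p);
        for (long long m = static_cast<long long>(p) * p; m <= limit; m += p)
            composite[m] = true;
        long long power = 1;  // builds p^p, stopping early once it exceeds limit
        for (int e = 0; e < p && power <= limit; ++e) power *= p;
        if (power <= limit) result.push_back(static_cast<int>(power));
    }
    std::sort(result.begin(), result.end());
    return result;
}

// Unbounded coin-change DP: best[x] = fewest primatics summing to x, or INT_MAX if impossible.
std::vector<int> min_primatic_terms(int limit) {
    assert(limit >= 1);
    const std::vector<int> primatics = primatics_up_to(limit);
    std::vector<int> best(limit + 1, INT_MAX);
    best[0] = 0;
    for (int x = 1; x <= limit; ++x) {
        for (int p : primatics) {
            if (p > x) break;  // sorted ascending, so every later p is larger too
            if (best[x - p] != INT_MAX) best[x] = std::min(best[x], best[x - p] + 1);
        }
    }
    return best;
}
```

min_primatic_terms(10000)[14] is 2, and index 1 stays INT_MAX because 1 cannot be written as a sum of primatics.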

How to calculate the sum of the bitwise xor values of all the distinct combination of the given numbers efficiently?

Given n (n ≤ 1000000) positive integers (each smaller than 1000000), the task is to calculate the sum of the bitwise XOR (^ in C/C++) over all distinct pairs of the given numbers.
Time limit is 1 second.
For example, if 3 integers are given as 7, 3 and 5, answer should be 7^3 + 7^5 + 3^5 = 12.
My approach is:
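#include <bits/stdc++.h>
using namespace std;
int num[1000001];
int main()
{
    int n, i, j;
    long long sum = 0;   // the total can exceed the int range
    scanf("%d", &n);
    for (i = 0; i < n; i++)
        scanf("%d", &num[i]);
    for (i = 0; i < n; i++)          // brute force over all pairs: O(n^2)
        for (j = i + 1; j < n; j++)
            sum += num[i] ^ num[j];
    printf("%lld\n", sum);
    return 0;
}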
But my code does not run within 1 second. How can I make it fast enough to run in 1 second?
Edit: Actually this is an online judge problem, and I am getting CPU Limit Exceeded with the code above.
You need to compute around 1e12 xors in order to brute force this. Modern processors can do around 1e10 such operations per second. So brute force cannot work; therefore they are looking for you to figure out a better algorithm.
So you need to find a way to determine the answer without computing all those xors.
Hint: can you think of a way to do it if all the input numbers were either zero or one (one bit)? And then extend it to numbers of two bits, three bits, and so on?
When optimising your code you can go 3 different routes:
Optimising the algorithm.
Optimising the calls to language and library functions.
Optimising for the particular architecture.
There may very well be a quicker mathematical way of xoring every pair and then summing the results, but I don't know it. In any case, on contemporary processors you'll be shaving off microseconds at best, because you are doing basic operations (xor and sum).
Optimising for the architecture also makes little sense. It normally becomes important with repetitive branching, and you have nothing like that here.
The biggest problem in your algorithm is reading from standard input. Although "scanf" takes only 5 characters in your source code, in machine code it is the bulk of your program. Unfortunately, if the data actually changes each time you run your code, there is no way around reading from stdin, and it will make no difference whether you use scanf, std::cin >>, or even your own routine that reads characters from input and converts them into ints.
All this assumes that you don't expect a human being to enter thousands of numbers in less than one second. I guess you are running your code via: myprogram < data.
This function grows quadratically (thanks @rici). At around 25,000 positive integers, each 999,999 (worst case), the for loop calculation alone finishes in approximately a second. Making this work for input as you have specified, with 1 million positive integers, just doesn't seem possible.
With the hint in Alan Stokes's answer, you may have a linear complexity instead of quadratic with the following:
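#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>
// Sum of x ^ y over all unordered pairs {x, y} of v, in O(32 * n).
std::size_t xor_sum(const std::vector<std::uint32_t>& v)
{
    std::size_t res = 0;
    for (std::size_t b = 0; b != 32; ++b) {
        const std::size_t count_1 =   // numbers whose bit b is set
            std::count_if(v.begin(), v.end(),
                          [b](std::uint32_t n) { return (n >> b) & 0x01; });
        const std::size_t count_0 = v.size() - count_1;
        res += count_0 * count_1 << b;  // every (0, 1) pair contributes 2^b
    }
    return res;
}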
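$x \oplus y = \sum_b \big((x \mathbin{\&} b) \oplus (y \mathbin{\&} b)\big)$, where b runs over the single-bit masks 1<<0 to 1<<31.
For a given bit b, let count_0 and count_1 be the numbers of values with that bit 0 or 1. Then there are count_0 * (count_0 - 1) / 2 pairs giving 0^0, count_0 * count_1 pairs giving 0^1, and count_1 * (count_1 - 1) / 2 pairs giving 1^1. Since 0^0 and 1^1 are 0, bit b contributes count_0 * count_1 * 2^b to the sum.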

Need a way to make this code run faster

I'm trying to solve Project Euler problem 401. The only way I could find to solve it was brute force. I've been running this code for about 10 minutes without any answer. Can anyone help me with ideas to improve it?
#include <iostream>
#include <cmath>
#define ull unsigned long long
using namespace std;
ull sigma2(ull n);
ull SIGMA2(ull n);
int main()
{
    ull ans = SIGMA2(1000000000000000) % 1000000000;
    cout << "Answer: " << ans << endl;
    return 0;
}
// sum of the squares of the divisors of n
ull sigma2(ull n)
{
    ull sum = 0;
    for(ull i = 1; i<=floor(sqrt(n)); i++)
    {
        if(n%i == 0)
            sum += (i*i)+((n/i)*(n/i));
        if(i*i == n)
            sum -= n;
    }
    return sum;
}
// sum of sigma2(i) for i = 1..n (brute force)
ull SIGMA2(ull n)
{
    ull sum = 0;
    for(ull i = 1; i<=n; i++)
        sum += sigma2(i);
    return sum;
}
You're missing some divisors: if a/b = c and b is a divisor of a, then c is also a divisor of a, but c might be greater than floor(sqrt(a)); for example, 3 > floor(sqrt(6)) but 3 divides 6.
Then you should put floor(sqrt(n)) in a variable and use that variable in the for loop; otherwise you recalculate it at every iteration, which is very expensive.
You can do some straightforward optimizations:
inline sigma2,
calculate floor(sqrt(n)) before the loop (though the compiler may already be doing that),
precalculate squares of all ints from 1 to n and then use array lookup instead of multiplication
You will gain more by changing your approach. Think what you are trying to do - summing squares of all divisors of all integers from 1 to n. You grouped divisors by what they divide, but you can regroup terms in this sum. Let's group divisors by their value:
1 divides everything so it will appear n times in the sum, bringing 1*1*n total,
2 divides evens and will appear n/2 (integer division!) times, bringing 2*2*(n/2) total,
k ... will bring k*k*(n/k) total.
So we should just add up k*k*(n/k) for k from 1 to n.
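The regrouped sum still has n = 10^15 terms. A standard further step, not part of this answer, is that n/k (integer division) takes only about 2·sqrt(n) distinct values, so every run of k with the same quotient q can be handled at once using 1² + 2² + ... + m² = m(m+1)(2m+1)/6. A sketch, assuming the result is wanted modulo 10^9 as in the question's code and that the modulus fits in 32 bits; sum_squares_mod and sigma2_sum_mod are hypothetical names.

```cpp
#include <cassert>

// 1^2 + 2^2 + ... + m^2 = m(m+1)(2m+1)/6, reduced modulo `mod`. The factors 2 and 3
// are divided out exactly before reducing, so nothing overflows for m up to about
// 10^18 as long as mod fits in 32 bits.
unsigned long long sum_squares_mod(unsigned long long m, unsigned long long mod) {
    unsigned long long a = m, b = m + 1, c = 2 * m + 1;
    if (a % 2 == 0) a /= 2; else b /= 2;                               // m(m+1) is even
    if (a % 3 == 0) a /= 3; else if (b % 3 == 0) b /= 3; else c /= 3;  // one factor is a multiple of 3
    return (a % mod) * (b % mod) % mod * (c % mod) % mod;
}

// SIGMA2(n) = sum over k of k^2 * (n / k), modulo `mod`. Every k in [k, last] has the
// same quotient q = n / k, so each block costs one multiplication: O(sqrt(n)) blocks.
unsigned long long sigma2_sum_mod(unsigned long long n, unsigned long long mod) {
    assert(mod > 0);
    unsigned long long total = 0;
    for (unsigned long long k = 1; k <= n;) {
        const unsigned long long q = n / k;
        const unsigned long long last = n / q;  // largest k' with n / k' == q
        const unsigned long long squares =
            (sum_squares_mod(last, mod) + mod - sum_squares_mod(k - 1, mod)) % mod;
        total = (total + (q % mod) * squares) % mod;
        k = last + 1;
    }
    return total;
}
```

sigma2_sum_mod(6, 1000000000) gives 113 = 1 + 5 + 10 + 21 + 26 + 50; for n = 10^15 the loop runs about 6.3·10^7 times instead of 10^15.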
Think about the problem.
Bruteforce the way you tried is obviously not a good idea.
You should come up with something better...
Isn't there some nice prime factorization method you could use to speed up the computation? Isn't there a recursive pattern? Try to find something...
One simple optimization that you can carry out is that there will be many repeated factors in the numbers.
So first count in how many numbers 1 is a factor (all N numbers).
In how many numbers is 2 a factor? (N/2 of them.)
Similarly for the others.
Just multiply their squares by their frequency.
The time complexity then drops straight away to O(N).
There are obvious micro-optimizations, such as ++i rather than i++, taking floor(sqrt(n)) out of the loop (these are two floating-point operations, which are really expensive compared to the other integer operations in the loop), and calculating n/i only once (store it in a temporary variable and then square that).
There are also rather obvious simplifications in the algorithm. For example, SIGMA2(i) = SIGMA2(i-1) + sigma2(i). But do not use recursion for it: you need a really huge number, so the recursion would exhaust your stack memory. Use a loop instead. There is a huge potential for improvement.
And there is a bigger problem: 10^15 has 16 digits, and its square has 31 digits. There is no way you can store that in an unsigned long long, which holds at most about 1.8 * 10^19 (20 digits). So you need to apply the modulo 10^9 (from the end of the assignment) along the way to keep room for your calculations...
And when using brute force, print out the intermediate result every million numbers, for example, to get an idea of how fast you are approaching the final result. Waiting 10 minutes blindly is not a good idea.

How can I find number of consecutive sequences of various lengths satisfy a particular property?

I am given an array A[] of N elements, which are positive integers. I have to find the number of sequences of lengths 1, 2, 3, ..., N that satisfy a particular property.
I have built an interval tree with O(n log n) complexity. Now I want to count the number of sequences that satisfy a certain property.
All the properties required for the problem are related to the sum of the sequence.
Note that an array has N*(N+1)/2 such sequences. How can I iterate over all of them in O(n log n) or O(n)?
Let k be the index of the left end L, moving from 0 to N-1. For each k we run an algorithm that looks for the MIN R that satisfies the condition (let's call it I); every other sequence with L = k and R >= I then also satisfies it (this is your short circuit). After you find I, simply output (L=k, R>=I). This of course assumes that all values in your array are >= 0, so that extending a sequence to the right never decreases its sum, and that your property keeps holding once the sum is large enough.
To find I for every k, begin at element k + (N-k)/2. Check whether the sequence (L=k, R=k+(N-k)/2) satisfies your condition. If it does, decrement R until the condition is NOT met; then R+1 is your MIN (you could choose to print these results as you go, but they would come out backwards). If (L=k, R=k+(N-k)/2) does not satisfy your condition, INCREMENT R until it does, and that becomes your MIN for L=k. This halves the search space for each L=k, and as k increases and approaches N, the search space keeps shrinking.
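// This declaration won't work unless N is either a constant or a MACRO defined above
// TESTVALS(vals, L, R) is your property test on vals[L..R]
unsigned int myVals[N];
unsigned int Ndiv2 = N / 2;  // first guess for MIN R, roughly k + (N-k)/2
unsigned int R;              // MIN R for the current k (R == N means no R works)
for(unsigned int k = 0; k < N; k++){
    if(Ndiv2 > N - 1) Ndiv2 = N - 1;            // keep the guess inside the array
    if(TRUE == TESTVALS(myVals, k, Ndiv2)){     // It passes: walk R down while it still passes
        R = Ndiv2;
        while(R > k && TRUE == TESTVALS(myVals, k, R - 1)) R--;
    }else{                                      // It didn't pass: walk R up until it passes
        R = Ndiv2 + 1;
        while(R < N && FALSE == TESTVALS(myVals, k, R)) R++;
    }
    // PRINT ALL PAIRS from L=k, from R=R to R=N-1
    if((k & 0x00000001) == 0) Ndiv2++;
} // END --> for(unsigned int k = 0; k < N; k++)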
The complexity of the algorithm above is O(N^2): for each of the N values of k, up to N/2 values of R need testing. Big O notation isn't concerned with the factor 1/2, nor with the fact that the range really shrinks as k grows; it only captures the gross magnitude, N tests for each of N values, hence O(N^2).
There is an alternative approach which would be FASTER: whenever you need to move within the secondary (inner) loops, use a move-half-the-distance step instead. This gets you to O(N log N): for each k in N (all of which have to be tested), the half-distance approach finds your MIN R in log N time. As an example, let's say you have a 1000-element array. When k = 0, we begin the search for MIN R at index 500. If the test passes, instead of moving linearly downward from 500 to 0, we test 250. Let's say the actual MIN R for k = 0 is 300. Then each further test halves the interval between the last passing and the last failing index, closing in on 300 in about log2(1000) ≈ 10 tests.
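A minimal sketch of that half-distance search, assuming the test is monotone in R (it fails for every R below MIN R and passes from MIN R on). min_passing_r is a made-up name, and the predicate r >= 300 is a stand-in for the real TESTVALS check, used only so the probes of the 1000-element example can be traced.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Smallest R in [lo, hi) for which passes(R) is true, or hi if there is none.
// Assumes passes() is monotone: once it is true it stays true for larger R.
// Every probed index is appended to `probes` so the search can be traced.
unsigned int min_passing_r(unsigned int lo, unsigned int hi,
                           const std::function<bool(unsigned int)>& passes,
                           std::vector<unsigned int>& probes) {
    assert(lo <= hi);
    while (lo < hi) {
        const unsigned int mid = lo + (hi - lo) / 2;  // halve the remaining distance
        probes.push_back(mid);
        if (passes(mid)) hi = mid;  // mid passes: MIN R is mid or to its left
        else lo = mid + 1;          // mid fails: MIN R is to its right
    }
    return lo;
}
```

With lo = 0 and hi = 1000 the probes are 500, 250, 375, 313, 282, 298, 306, 302, 300, 299 and the result is 300: ten tests, about log2(1000), instead of up to 500 when walking linearly.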
While this is oversimplified, you are most likely going to have to optimize, and test 301 as well as 299 to make sure you're in the sweet spot. Another note: be careful when dividing by 2 when you have to move in the same direction more than once in a row.
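Putting the pieces together for one concrete property: a sketch, assuming the property is "sum of the sequence >= threshold" (a hypothetical stand-in, since the question does not say which property) and that all values are positive. With positive values the prefix sums are strictly increasing, so std::lower_bound performs the half-distance search for MIN R, and a difference array turns each L's passing range of R into counts per sequence length, answering "how many sequences of each length 1..N" in O(N log N). count_by_length is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For each length len = 1..N, counts the sequences A[L..R] of that length whose sum is
// >= threshold (a hypothetical stand-in for "the property"). Result index 0 is unused.
std::vector<long long> count_by_length(const std::vector<long long>& a, long long threshold) {
    const std::size_t n = a.size();
    std::vector<long long> prefix(n + 1, 0);  // prefix[i] = a[0] + ... + a[i-1]
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i] > 0);                     // positive values keep prefix strictly increasing
        prefix[i + 1] = prefix[i] + a[i];
    }
    std::vector<long long> diff(n + 2, 0);    // difference array over lengths
    for (std::size_t left = 0; left < n; ++left) {
        // Binary search for the first exclusive end e with prefix[e] - prefix[left] >= threshold.
        const auto it = std::lower_bound(prefix.begin() + left + 1, prefix.end(),
                                         prefix[left] + threshold);
        if (it == prefix.end()) continue;     // no R works for this L
        const std::size_t end = static_cast<std::size_t>(it - prefix.begin());
        diff[end - left] += 1;                // shortest passing length (R = end - 1)
        diff[n - left + 1] -= 1;              // longest passing length is n - left (R = N - 1)
    }
    std::vector<long long> by_length(n + 1, 0);
    long long running = 0;
    for (std::size_t len = 1; len <= n; ++len) {
        running += diff[len];
        by_length[len] = running;
    }
    return by_length;
}
```

For A = {1, 2, 3} and threshold 3, it reports one sequence of length 1 ({3}), two of length 2 ({1, 2} and {2, 3}) and one of length 3.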
