Sum of all Maximum and minimum in all segments - c++

I am doing a problem in which I need to find the sum of the maximum elements of all segments minus the sum of the minimum elements of all segments. I tried using a Sparse Table, but it is too slow for the time limit, so I did something like this:
For n = 4 the segments are [1,2], [1,3], [1,4], [2,3], [2,4], [3,4].
The problem is similar to an RMQ (range minimum/maximum query) problem, but I have to do it for all segments and find
sum = max(a[1],a[2]) + max(a[1],a[2],a[3]) + max(a[1],a[2],a[3],a[4]) + max(a[2],a[3]) + max(a[2],a[3],a[4]) + max(a[3],a[4])
    - (min(a[1],a[2]) + min(a[1],a[2],a[3]) + min(a[1],a[2],a[3],a[4]) + min(a[2],a[3]) + min(a[2],a[3],a[4]) + min(a[3],a[4]))
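// a[1..n] holds the input; ans should be a 64-bit integer (long long)
maxtilli[0] = INT_MIN;                    // sentinel: max of an empty prefix
mintilli[0] = INT_MAX;                    // sentinel: min of an empty prefix
for (i = 1; i < n; i++)                   // i = left end of the segment
{
    for (k = 1, j = i; j <= n; k++, j++)  // j = right end, k = segment length
    {
        // maxtilli[k] / mintilli[k] = max / min of a[i..j]
        maxtilli[k] = (a[j] > maxtilli[k-1]) ? a[j] : maxtilli[k-1];
        mintilli[k] = (a[j] < mintilli[k-1]) ? a[j] : mintilli[k-1];
        if (i != j)                       // only segments of length >= 2
            ans += (maxtilli[k] - mintilli[k]);
    }
}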
Here n is of the order of 100,000, so this O(n^2) loop is too slow. Is there any way to optimize it?

Let's first solve half of the problem: the sum of the maximum value over all segments. The sum of the minimums is computed the same way, with min in place of max.
Algorithm
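First, find the position i of the maximum value a[i] in the whole sequence [1, n]. Every segment that contains position i has a[i] as its maximum. There are i * (n - i + 1) such segments (left end in [1, i], right end in [i, n]), so add a[i] * i * (n - i + 1) to the answer. The problem then splits into two smaller sequences, [1, i - 1] and [i + 1, n], which are handled the same way. The count includes the single-element segment [i, i], but it contributes a[i] to both the max sum and the min sum, so it cancels in the final difference.
Code
void cal(int L, int R) {
    if (L > R) return;                          // empty range
    int max_index = find_max(L, R);             // O(1) with a Sparse Table, O(log N) with a Segment Tree
    long long all_segments = (long long)(max_index - L + 1) * (R - max_index + 1);
    ans += a[max_index] * all_segments;         // ans is a long long
    cal(L, max_index - 1);                      // note: recursion depth can reach N on sorted input
    cal(max_index + 1, R);
}
// find_max is called N times, so the total complexity is at most O(N log N)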
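A minimal C++ sketch of the divide-and-conquer above, applied to both the maximum and the minimum. Assumptions: 0-based indexing, values and products fit in long long, and the names ArgSparseTable, sum_of_extremes and sum_max_minus_min are illustrative. It replaces the recursion with an explicit stack, since a sorted input would otherwise recurse n levels deep.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Sparse table returning an index of the maximum (or minimum) of a[l..r]
// in O(1) per query after O(n log n) preprocessing.
struct ArgSparseTable {
    const std::vector<long long>& a;
    bool want_max;
    std::vector<std::vector<int>> t;  // t[k][i] = best index in a[i .. i + 2^k - 1]
    std::vector<int> lg;              // lg[len] = floor(log2(len))

    int better(int i, int j) const {
        if (want_max) return a[j] > a[i] ? j : i;
        return a[j] < a[i] ? j : i;
    }
    ArgSparseTable(const std::vector<long long>& v, bool mx) : a(v), want_max(mx) {
        int n = (int)a.size();
        lg.assign(n + 1, 0);
        for (int len = 2; len <= n; ++len) lg[len] = lg[len / 2] + 1;
        t.push_back(std::vector<int>(n));
        for (int i = 0; i < n; ++i) t[0][i] = i;
        for (int k = 1; (1 << k) <= n; ++k) {
            t.push_back(std::vector<int>(n - (1 << k) + 1));
            for (int i = 0; i + (1 << k) <= n; ++i)
                t[k][i] = better(t[k - 1][i], t[k - 1][i + (1 << (k - 1))]);
        }
    }
    int query(int l, int r) const {   // inclusive bounds, 0-based
        int k = lg[r - l + 1];
        return better(t[k][l], t[k][r - (1 << k) + 1]);
    }
};

// Sum over all segments [l, r] of the segment's max (want_max) or min.
// The extreme at position m of [L, R] is the extreme of every segment with
// L <= l <= m <= r <= R, which is (m - L + 1) * (R - m + 1) segments.
long long sum_of_extremes(const std::vector<long long>& a, bool want_max) {
    if (a.empty()) return 0;
    ArgSparseTable st(a, want_max);
    long long total = 0;
    std::vector<std::pair<int, int>> pending;   // explicit stack of ranges
    pending.push_back({0, (int)a.size() - 1});
    while (!pending.empty()) {
        std::pair<int, int> cur = pending.back();
        pending.pop_back();
        int L = cur.first, R = cur.second;
        if (L > R) continue;
        int m = st.query(L, R);
        total += a[m] * (long long)(m - L + 1) * (R - m + 1);
        pending.push_back({L, m - 1});
        pending.push_back({m + 1, R});
    }
    return total;
}

// Sum of (max - min) over all segments. Single-element segments contribute 0,
// so this equals the sum over segments of length >= 2.
long long sum_max_minus_min(const std::vector<long long>& a) {
    return sum_of_extremes(a, true) - sum_of_extremes(a, false);
}
```

For example, sum_max_minus_min({1, 2, 3, 4}) returns 10, matching the six segments of the n = 4 example (1 + 2 + 3 + 1 + 2 + 1).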

Related

Need help understanding this line in an FFT algorithm

In my program I have a function that performs the fast Fourier transform. I know there are very good implementations freely available, but this is a learning thing so I don't want to use those. I ended up finding this comment with the following implementation (it originated from the Italian entry for the FFT):
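void transform(complex<double>* f, int N) // needs <complex>, <vector>, <cmath>; N must be a power of two
{
    ordina(f, N); // first: bit-reversal reordering (helper not shown)
    // Twiddle table of length N/2: W[k] = exp(-2*pi*i*k/N).
    // std::vector replaces malloc/free; computing every entry directly also
    // avoids writing W[1] out of bounds when N == 2.
    vector<complex<double>> W(N / 2);
    for (int i = 0; i < N / 2; i++)
        W[i] = polar(1., -2. * M_PI * i / N);
    int n = 1;      // half-size of the butterflies in the current stage
    int a = N / 2;  // table stride for this stage; n * a == N / 2 throughout
    for (int j = 0; j < log2(N); j++) {
        for (int k = 0; k < N; k++) {
            if (!(k & n)) {
                complex<double> temp = f[k];
                complex<double> Temp = W[(k * a) % (n * a)] * f[k + n];
                f[k] = temp + Temp;
                f[k + n] = temp - Temp;
            }
        }
        n *= 2;
        a = a / 2;
    }
}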
I've made a lot of changes by now, but this was my starting point. One of the changes I made was to not cache the twiddle factors, because I wanted to see whether caching is needed first. Now I've decided I do want to cache them. This implementation seems to do it with an array W of length N/2, where every index k holds the value $e^{-2\pi i k/N}$. What I don't understand is this expression:
W[(k * a) % (n * a)]
Note that n * a is always equal to N/2. I get that this is supposed to be equal to $e^{-2\pi i k/(2n)}$, and I can see that $e^{-2\pi i k a/N} = e^{-2\pi i k/(2n)}$ (since $a = N/(2n)$), which this relies on. I also get that the modulo can be used here because the twiddle factors are cyclic. But there's one thing I don't get: this is a length-N DFT, and yet only N/2 twiddle factors are ever calculated. Shouldn't the array be of length N, and shouldn't the modulo be by N?
The twiddle factors are equally spaced points on the unit circle, and there is an even number of them because N is a power of two. After going around half of the circle starting at 1, the second half repeats the first half reflected through the origin: $W^{k + N/2} = -W^{k}$. That is why Temp is subtracted in the second output of the butterfly: the subtraction applies the negated twiddle factor, so the second half of the table never needs to be stored.
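To check the symmetry argument numerically, here is a small C++ sketch (the function names twiddles, second_half_is_negated and max_index_used are illustrative). It builds the full length-N table, confirms that entry k + N/2 is the negation of entry k, and confirms that the question's index expression (k * a) % (n * a) never reaches N/2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// Full twiddle table W[k] = exp(-2*pi*i*k/N) for k in [0, N).
std::vector<std::complex<double>> twiddles(int N) {
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> w(N);
    for (int k = 0; k < N; ++k) w[k] = std::polar(1.0, -2.0 * pi * k / N);
    return w;
}

// True if W[k + N/2] == -W[k] (up to rounding) for every k < N/2, i.e. the
// second half of the table is the first half reflected through the origin.
bool second_half_is_negated(int N) {
    std::vector<std::complex<double>> w = twiddles(N);
    for (int k = 0; k < N / 2; ++k)
        if (std::abs(w[k + N / 2] + w[k]) > 1e-12) return false;
    return true;
}

// Largest table index read by the question's butterfly loop: (k * a) % (n * a).
int max_index_used(int N) {
    int worst = 0;
    for (int n = 1, a = N / 2; n < N; n *= 2, a /= 2)
        for (int k = 0; k < N; ++k)
            if (!(k & n)) worst = std::max(worst, (k * a) % (n * a));
    return worst;
}
```

For N = 8 the largest index read is 3 = N/2 - 1, so a table of length N/2 is enough.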

Most efficient way to calculate a lower triangular matrix row index?

I am working with a lower triangular matrix, the function below calculates a row index of such matrix. How can I optimize it in terms of execution time?
The triangular matrix can hold at most N (N + 1) / 2 nonzero elements (where N is the matrix dimension - N x N).
I have a set of numbers 0, 1, 2, ..., N (N + 1) / 2 - 1 and from those, I have to calculate the matrix row index.
My current solution:
inline
unsigned int calc_row(unsigned int i)
{
return static_cast<unsigned int>(floor(sqrt(static_cast<double>(0.25 + 2 * i)) - 0.5));
}
// example:
calc_row(0) == 0;
calc_row(1) == 1;
calc_row(2) == 1;
calc_row(3) == 2;
calc_row(4) == 2;
calc_row(5) == 2;
Question:
1) Do you think my current solution is performance friendly?
2) If not how can I optimize it (in terms of the function execution time)?
If you believe an alternate method to calculate the row index would perform better, I am fine with it. Unfortunately, a lookup table is not an option in my case.
EDIT #1:
I just had an idea: is there a way to make a template metaprogramming version of the lookup table? A way to generate the row number at compile time could prove to be a significant optimization. The biggest unsigned int i would be around 10 million in my case.
EDIT #2:
I edited the entire question because it caused a major confusion. I am sorry about that.
EDIT #3:
calc_row() calculates floor((sqrt(1 + 8 * i) - 1) / 2), which is the solution of the quadratic equation x(x + 1) / 2 = i rounded down, where x is the row index and i is the linear index.
The main idea of this solution lies in the fact that the linear index of an element of a triangular matrix (including the diagonal) can be calculated as r (r + 1) / 2 + j, where r is the row index and j is the column index.
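As a sketch (an addition, not something from the question): the floating-point estimate from the same formula can be made exact with an integer correction step, and the column then follows from the row. The names calc_row_exact and calc_col are hypothetical. No claim is made about speed relative to calc_row; for i around 10 million the double result should already be exact, so the correction loops act only as a safety net for much larger indices.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Row of linear index i, where row r occupies indices r(r+1)/2 .. r(r+1)/2 + r.
// Starts from floor((sqrt(1 + 8i) - 1) / 2) in floating point, then corrects
// the estimate with exact integer arithmetic so rounding cannot make it off by one.
std::uint64_t calc_row_exact(std::uint64_t i) {
    std::uint64_t r = static_cast<std::uint64_t>(
        (std::sqrt(1.0 + 8.0 * static_cast<double>(i)) - 1.0) / 2.0);
    while (r > 0 && r * (r + 1) / 2 > i) --r;   // estimate too high
    while ((r + 1) * (r + 2) / 2 <= i) ++r;     // estimate too low
    return r;
}

// Column of linear index i, given its row: j = i - r(r+1)/2.
std::uint64_t calc_col(std::uint64_t i, std::uint64_t row) {
    return i - row * (row + 1) / 2;
}
```

For example, calc_row_exact(5) is 2 and calc_col(5, 2) is 2, the last element of row 2.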

Most efficient algorithm for finding this LCM summation

Problem: Find $\sum_{i=1}^{n} \frac{\operatorname{lcm}(i, n)}{i}$
Range of n: 1 <= n <= 10^7
The main challenge is handling the queries, since their number Q can be large.
Methods I have used so far :
Brute Force
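while (Q--)
{
    int N;
    cin >> N;
    long long ans = 0;                          // reset for every query
    for (int i = 1; i <= N; i++)
        ans += lcm((long long)i, (long long)N) / i;   // lcm can exceed 32 bits
    cout << ans << "\n";
}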
Complexity: O(Q*N)
Preprocessing and handling queries in O(sqrt(N))
First I build a table that holds the value of the Euler totient function for every N.
This can be done in O(N) with a linear sieve.
void sieve()
{
    // phi table holds the Euler totient function value
    // lp holds the lowest prime factor of each number
    // pr is a vector that contains the primes found so far
    phi[1] = 1;
    for (int i = 2; i <= MAX; i++)
    {
        if (lp[i] == 0)
        {
            // i is prime
            lp[i] = i;
            phi[i] = i - 1;
            pr.push_back(i);
        }
        else
        {
            if (lp[i] == lp[i / lp[i]])
                phi[i] = phi[i / lp[i]] * lp[i];        // lp[i] already divides i / lp[i]
            else
                phi[i] = phi[i / lp[i]] * (lp[i] - 1);  // new prime factor: phi is multiplicative
        }
        for (int j = 0; j < (int)pr.size() && pr[j] <= lp[i] && i * pr[j] <= MAX; j++)
            lp[i * pr[j]] = pr[j];
    }
}
For each query, enumerate the divisors d of N and add d*phi[d] to the result (the answer equals the sum of d*phi(d) over all divisors d of N).
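long long sum = 0;   // the answer can reach about n^2, so use 64 bits
for (int i = 1; i * i <= n; i++)
{
    if (n % i == 0)
    {
        // i is a factor, so n/i is a factor too
        sum += (long long)(n / i) * phi[n / i];
        if (i * i != n)
        {
            // count i itself unless it equals n/i
            sum += (long long)i * phi[i];
        }
    }
}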
This takes O(sqrt(N)) per query.
Complexity: O(Q*sqrt(N))
Handling queries in O(1)
To the sieve described above, I add a part that precomputes the answer for every N in O(N log N):
for (int i = 1; i <= MAX; ++i)
{
    //MAX is 10^7
    for (int j = i; j <= MAX; j += i)      // i divides every such j
    {
        ans[j] += (long long)i * phi[i];   // ans[] must be 64-bit
    }
}
This unfortunately times out for the given constraints and the time limit (1 second).
I think this involves some clever idea regarding the prime factorization of N.
I can prime factorize a number in O(log N) using the lp (lowest prime) table built above, but I can't figure out how to arrive at the answer using the factorization.
You can try the following algorithm:
lcm(i, n) / i = (i * n / gcd(i, n)) / i = n / gcd(i, n)
Now we need the sum of n / gcd(i, n) over i = 1..n.
Let n = p1^i1 * p2^i2 * ... * pk^ik, where p1, p2, ..., pk are distinct primes.
The number of i with gcd(i, n) == 1 is phi(n) = n*(p1-1)*(p2-1)*...*(pk-1)/(p1*p2*...*pk), and each of them contributes n, so add n*phi(n) to the sum.
The number of i with gcd(i, n) == p1 is phi(n/p1), and each of them contributes n/p1, so add (n/p1)*phi(n/p1) to the sum.
The number of i with gcd(i, n) == p1*p2 is phi(n/(p1*p2)), so add (n/(p1*p2))*phi(n/(p1*p2)) to the sum. In general, gcd(i, n) == g happens for exactly phi(n/g) values of i.
Now the answer is the sum of
(n/(p1^j1*p2^j2*...*pk^jk)) * phi(n/(p1^j1*p2^j2*...*pk^jk))
over all
j1 = 0, ..., i1
j2 = 0, ..., i2
....
jk = 0, ..., ik
The total number of terms in this sum is (i1+1)*(i2+1)*...*(ik+1), the number of divisors of n, which is significantly less than n.
To calculate this sum you can use a recursive function whose arguments are the original number n, the initial exponent map, the current exponent map, and the set of exponent maps already visited:
initial = {p1:i1, p2:i2, ..., pk:ik}
current = {p1:i1, p2:i2, ..., pk:ik}
visited = {}
int calc(n, initial, current, visited):
    if current in visited:
        return 0
    visited add copy of current
    int sum = 0
    // recurse on every exponent map obtained by lowering one exponent
    for pj in keys of current:
        if current[pj] == 0:
            continue
        current[pj]--
        sum += calc(n, initial, current, visited)
        current[pj]++
    // mult1 = d = n / (p1^current[p1] * ... * pk^current[pk])
    mult1 = n
    for pj in keys of current:
        mult1 /= pj^current[pj]
    // mult2 = phi(d): multiply by (pj - 1)/pj for every prime that divides d
    mult2 = mult1
    for pj in keys of current:
        if initial[pj] == current[pj]:
            continue
        mult2 = mult2 * (pj - 1) / pj
    sum += mult1 * mult2
    return sum
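The same divisor sum can be computed without a visited set by building each divisor prime by prime and carrying phi(d) along, since phi is multiplicative. A minimal C++ sketch, assuming 64-bit results suffice (the sum is at most n^2, about 10^14 for n up to 10^7). factorize uses plain trial division as a stand-in; with the question's lp table the factorization would take O(log N) instead. All names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using u64 = std::uint64_t;

// Trial-division factorization: (prime, exponent) pairs.
std::vector<std::pair<u64, int>> factorize(u64 n) {
    std::vector<std::pair<u64, int>> f;
    for (u64 p = 2; p * p <= n; ++p) {
        if (n % p != 0) continue;
        int e = 0;
        while (n % p == 0) { n /= p; ++e; }
        f.push_back({p, e});
    }
    if (n > 1) f.push_back({n, 1});   // leftover prime factor
    return f;
}

// Visit every divisor d = p1^j1 * ... * pk^jk, carrying phi(d) along
// (phi(p^j) = p^(j-1) * (p - 1)), and add d * phi(d) to total.
void add_divisor_terms(const std::vector<std::pair<u64, int>>& f, std::size_t idx,
                       u64 d, u64 phi_d, u64& total) {
    if (idx == f.size()) { total += d * phi_d; return; }
    const u64 p = f[idx].first;
    add_divisor_terms(f, idx + 1, d, phi_d, total);      // exponent 0 for p
    u64 pj = 1;
    for (int j = 1; j <= f[idx].second; ++j) {
        pj *= p;
        add_divisor_terms(f, idx + 1, d * pj, phi_d * (pj / p) * (p - 1), total);
    }
}

// sum_{i=1}^{n} lcm(i, n) / i = sum over divisors d of n of d * phi(d).
u64 lcm_sum(u64 n) {
    u64 total = 0;
    add_divisor_terms(factorize(n), 0, 1, 1, total);
    return total;
}
```

For example, lcm_sum(12) is 77: the divisors 1, 2, 3, 4, 6, 12 contribute 1 + 2 + 6 + 8 + 12 + 48.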
It's possible to quickly determine the sum if you know the prime factorization of N. Working from the same approach as the existing answer (the totient function times N divided by a factor), but applying some algebra to simplify terms, factoring the expression into sums of prime powers, and substituting the formula for a geometric series, we arrive at a much simpler solution.
Given the prime factorization of N in primes ps to powers qs, we can compute the result of the original equation for N via:
result = 1
for p, q in prime_factors
result *= p * (p-1) * (p**(2*q) - 1) / (p**2 - 1) + 1
Note that ** denotes exponentiation in the above pseudo-code.
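One way to see where the per-prime factor comes from (a derivation sketch consistent with the formula above): the sum of d*phi(d) over the divisors of N is multiplicative, so it is enough to evaluate it on a prime power p^q, using phi(p^e) = p^(e-1) (p-1) and the geometric series:

```latex
\sum_{e=0}^{q} p^{e}\,\varphi(p^{e})
  = 1 + \sum_{e=1}^{q} p^{e}\,p^{e-1}(p-1)
  = 1 + \frac{p-1}{p}\sum_{e=1}^{q} p^{2e}
  = 1 + \frac{p-1}{p}\cdot\frac{p^{2}\left(p^{2q}-1\right)}{p^{2}-1}
  = 1 + \frac{p\,(p-1)\left(p^{2q}-1\right)}{p^{2}-1}
```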
If one sieves for primes up to MAX as precomputation, storing at least one prime divisor for each composite discovered (as mentioned in the original problem), it's possible to then factor each subsequent N in O(log N) time by referencing the factor table. If one also precomputes a prime power table, the above algorithm then runs in O(log N) time per query, for an overall complexity of O(MAX*log(MAX)) precomputation time, O(Q*log(MAX)) query time, and O(MAX) space.
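A minimal C++ sketch of this query scheme, assuming n up to about 10^7 as in the question. build_spf is a plain sieve of Eratosthenes storing the smallest prime factor (an assumption; the question's linear sieve with its lp table would serve the same purpose), and both function names are illustrative. The division by p^2 - 1 is done before the multiplication: it is exact because (p^(2q) - 1) / (p^2 - 1) = 1 + p^2 + ... + p^(2q-2), and doing it first keeps intermediate values within 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// Smallest prime factor of every number in [0, max_n] (sieve of Eratosthenes).
std::vector<std::uint32_t> build_spf(std::uint32_t max_n) {
    std::vector<std::uint32_t> spf(max_n + 1, 0);
    for (std::uint32_t i = 2; i <= max_n; ++i) {
        if (spf[i] != 0) continue;              // composite: already has a factor
        for (u64 j = i; j <= max_n; j += i)     // i is prime: mark its multiples
            if (spf[j] == 0) spf[j] = i;
    }
    return spf;
}

// Product over prime powers p^q exactly dividing n of
// 1 + p (p - 1) (p^(2q) - 1) / (p^2 - 1). Requires 1 <= n < spf.size().
u64 lcm_sum_closed_form(std::uint32_t n, const std::vector<std::uint32_t>& spf) {
    u64 result = 1;
    while (n > 1) {
        const std::uint32_t p = spf[n];
        const u64 pp = static_cast<u64>(p) * p;  // p^2
        u64 p2q = 1;                              // becomes p^(2q)
        while (n % p == 0) { n /= p; p2q *= pp; }
        result *= (p2q - 1) / (pp - 1) * p * (p - 1) + 1;
    }
    return result;
}
```

Each query strips one prime power per loop iteration, so the work per query is proportional to the number of prime factors of n, which is O(log n).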

Cut rectangle in minimum number of squares

I'm trying to solve the following problem:
A rectangular paper sheet of M*N is to be cut down into squares such that:
The paper is cut along a line that is parallel to one of the sides of the paper.
The paper is cut such that the resultant dimensions are always integers.
The process stops when the paper can't be cut any further.
What is the minimum number of paper pieces cut such that all are squares?
Limits: 1 <= N <= 100 and 1 <= M <= 100.
Example: Let N=1 and M=2, then answer is 2 as the minimum number of squares that can be cut is 2 (the paper is cut horizontally along the smaller side in the middle).
My code:
cin >> n >> m;
int N = min(n, m);
int M = max(n, m);
int ans = 0;
while (N != M) {
    ans++;
    int x = M - N;
    int y = N;
    M = max(x, y);
    N = min(x, y);
}
if (N == M && M != 0)
    ans++;
But I can't see what's wrong with this approach; it gives wrong answers.
I think neither the DP nor the greedy solution is optimal. Here is a counterexample for the DP solution:
Consider a rectangle of size 13 x 11. The DP solution gives 8 as the answer, but the optimal solution has only 6 squares.
This thread has many counterexamples: https://mathoverflow.net/questions/116382/tiling-a-rectangle-with-the-smallest-number-of-squares
Also, have a look at this for correct solution: http://int-e.eu/~bf3/squares/
I'd write this as a dynamic (recursive) program.
Write a function which tries to split the rectangle at some position. Call the function recursively for both parts. Try all possible splits and take the one with the minimum result.
The base case would be when both sides are equal, i.e. the input is already a square, in which case the result is 1.
function min_squares(m, n):
// base case:
if m == n: return 1
// minimum number of squares if you split vertically:
min_ver := min { min_squares(m, i) + min_squares(m, n-i) | i ∈ [1, n/2] }
// minimum number of squares if you split horizontally:
min_hor := min { min_squares(i, n) + min_squares(m-i, n) | i ∈ [1, m/2] }
return min { min_hor, min_ver }
To improve performance, you can cache the recursive results:
function min_squares(m, n):
// base case:
if m == n: return 1
// check if we already cached this
if cache contains (m, n):
return cache(m, n)
// minimum number of squares if you split vertically:
min_ver := min { min_squares(m, i) + min_squares(m, n-i) | i ∈ [1, n/2] }
// minimum number of squares if you split horizontally:
min_hor := min { min_squares(i, n) + min_squares(m-i, n) | i ∈ [1, m/2] }
// put in cache and return
result := min { min_hor, min_ver }
cache(m, n) := result
return result
In a concrete C++ implementation, you could use int cache[101][101] for the cache data structure, since your input size is limited to 100 (indices 1 to 100 need 101 entries per dimension). Make it a static local variable, so it is automatically initialized with zeroes. Then interpret 0 as "not cached" (no input has 0 as its result).
Possible C++ implementation: http://ideone.com/HbiFOH
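A minimal C++ sketch of the cached recursion described above, assuming 1 <= m, n <= 100 as in the problem limits (the function name and layout are illustrative, not the linked ideone code). It only considers cuts that run all the way across a piece, as the problem statement requires.

```cpp
#include <algorithm>
#include <cassert>

// Minimum number of squares obtainable from an m x n sheet using straight
// cuts across the whole piece. Assumes 1 <= m, n <= 100.
int min_squares(int m, int n) {
    static int cache[101][101];        // zero-initialized; 0 means "not cached"
    if (m == n) return 1;              // already a square
    if (cache[m][n] != 0) return cache[m][n];
    int best = m * n;                  // upper bound: all 1x1 squares
    for (int i = 1; i <= n / 2; ++i)   // vertical cuts
        best = std::min(best, min_squares(m, i) + min_squares(m, n - i));
    for (int i = 1; i <= m / 2; ++i)   // horizontal cuts
        best = std::min(best, min_squares(i, n) + min_squares(m - i, n));
    cache[m][n] = best;
    return best;
}
```

With the cache, each of the at most 100 x 100 subproblems is solved once and tries O(max(m, n)) cuts, giving O(m n max(m, n)) work. On the 6x5 example discussed below it returns 5 (two 3x3 and three 2x2 squares), where the greedy approach uses 6.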
The greedy algorithm is not optimal. On a 6x5 rectangle, it uses a 5x5 square and 5 1x1 squares. The optimal solution uses 2 3x3 squares and 3 2x2 squares.
To get an optimal solution, use dynamic programming. The brute-force recursive solution tries all possible horizontal and vertical first cuts, recursively cutting the two pieces optimally. By caching (memoizing) the value of the function for each input, we get a polynomial-time dynamic program (O(m n max(m, n))).
This problem can be solved using dynamic programming.
Assume we have a rectangle with width N and height M.
If N == M, it is already a square and nothing needs to be done.
Otherwise, we can divide the rectangle into two smaller ones, (N - x, M) and (x, M), and solve each of them recursively.
Similarly, we can also divide it into (N, M - x) and (N, x).
Pseudo code:
int[][] dp;
boolean[][] check;
int cutNeeded(int n, int m) {
    if (n == m)
        return 1;                     // already a square
    if (check[n][m])
        return dp[n][m];              // memoized result
    check[n][m] = true;
    int result = n * m;               // upper bound: all 1x1 squares
    for (int i = 1; i <= n / 2; i++) {
        int tmp = cutNeeded(n - i, m) + cutNeeded(i, m);
        result = min(tmp, result);
    }
    for (int i = 1; i <= m / 2; i++) {
        int tmp = cutNeeded(n, m - i) + cutNeeded(n, i);
        result = min(tmp, result);
    }
    return dp[n][m] = result;
}
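Here is a greedy implementation. As @David mentioned, it is not optimal and gives wrong answers in some cases, so the dynamic programming approach (with caching) is the better choice.
def greedy(m, n):
    if m == n:
        return 1
    if m < n:
        m, n = n, m
    cuts = 0
    while n:
        cuts += m // n      # number of n x n squares cut from the long side
        m, n = n, m % n     # continue with the leftover strip (as in Euclid's algorithm)
    return cuts

print(greedy(2, 7))  # prints 5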
Here is a DP attempt in Python:
import sys

def cache(f):
    """Memoize f on its positional arguments."""
    db = {}
    def wrap(*args):
        key = str(args)
        if key not in db:
            db[key] = f(*args)
        return db[key]
    return wrap

@cache
def squares(m, n):
    if m == n:
        return 1
    xcuts = sys.maxsize   # best result over vertical cuts
    ycuts = sys.maxsize   # best result over horizontal cuts
    x, y = 1, 1
    while x * 2 <= n:
        xcuts = min(xcuts, squares(m, x) + squares(m, n - x))
        x += 1
    while y * 2 <= m:
        ycuts = min(ycuts, squares(y, n) + squares(m - y, n))
        y += 1
    return min(xcuts, ycuts)
This is essentially the classic integer (or 0-1) knapsack problem, which can be solved using a greedy or dynamic programming approach. You may refer to: Solving the Integer Knapsack

How to compute sum of evenly spaced binomial coefficients

How can I find the sum of evenly spaced binomial coefficients modulo M?
i.e. $\left(\binom{n}{a} + \binom{n}{a+r} + \binom{n}{a+2r} + \dots + \binom{n}{a+kr}\right) \bmod M = ?$
given: 0 <= a < r, a + kr <= n < a + (k+1)r, n < 10^5, r < 100
My first attempt was:
int res = 0;
int mod=1000000009;
for (int k = 0; a + r*k <= n; k++) {
res = (res + mod_nCr(n, a+r*k, mod)) % mod;
}
but this is not efficient. After reading here and this paper, I found out that the above sum is equivalent to:
$\frac{1}{r}\sum_{j=0}^{r-1} \omega^{-ja}\,(1 + \omega^{j})^{n}$, where $\omega = e^{2\pi i/r}$ is a primitive r-th root of unity.
What code can compute this sum in O(r)?
Edit:
n can go up to 10^5 and r can go up to 100.
Original problem source: https://www.codechef.com/APRIL14/problems/ANUCBC
Editorial for the problem from the contest: https://discuss.codechef.com/t/anucbc-editorial/5113
Revisiting this post 6 years later, I'm unable to recall how I transformed the original problem statement into my version; nonetheless, I've shared the link to the original solution in case anyone wants to look at the correct approach.
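The first attempt above calls mod_nCr without defining it. As an assumption, here is one way to supply such a helper when M is a prime larger than n (1000000009 is prime and n < 10^5): precompute factorials and inverse factorials once, using Fermat's little theorem for the inverse. The names Binom, pow_mod and spaced_sum are hypothetical. With the tables built, each query costs O(n/r) lookups.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;

// b^e mod m by repeated squaring (m < 2^32 so products fit in 64 bits).
u64 pow_mod(u64 b, u64 e, u64 m) {
    u64 result = 1;
    b %= m;
    while (e > 0) {
        if (e & 1) result = result * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return result;
}

// C(n, k) mod p for prime p > max_n, via factorial tables.
struct Binom {
    u64 p;
    std::vector<u64> fact, inv_fact;
    Binom(int max_n, u64 prime) : p(prime), fact(max_n + 1), inv_fact(max_n + 1) {
        fact[0] = 1;
        for (int i = 1; i <= max_n; ++i) fact[i] = fact[i - 1] * i % p;
        inv_fact[max_n] = pow_mod(fact[max_n], p - 2, p);  // Fermat: x^(p-2) = x^(-1) mod p
        for (int i = max_n; i > 0; --i) inv_fact[i - 1] = inv_fact[i] * i % p;
    }
    u64 nCr(int n, int k) const {
        if (k < 0 || k > n) return 0;
        return fact[n] * inv_fact[k] % p * inv_fact[n - k] % p;
    }
};

// (C(n, a) + C(n, a + r) + C(n, a + 2r) + ...) mod p, as in the first attempt.
u64 spaced_sum(const Binom& b, int n, int a, int r) {
    u64 res = 0;
    for (int k = a; k <= n; k += r) res = (res + b.nCr(n, k)) % b.p;
    return res;
}
```

For n = 20, r = 5, a = 1 this gives 211585 = 20C1 + 20C6 + 20C11 + 20C16 = 20 + 38760 + 167960 + 4845.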
Binomial coefficients are coefficients of the polynomial (1+x)^n. The sum of the coefficients of x^a, x^(a+r), etc. is the coefficient of x^a in (1+x)^n in the ring of polynomials mod x^r-1. Polynomials mod x^r-1 can be specified by an array of coefficients of length r. You can compute (1+x)^n mod (x^r-1, M) by repeated squaring, reducing mod x^r-1 and mod M at each step. This takes about log_2(n)r^2 steps and O(r) space with naive multiplication. It is faster if you use the Fast Fourier Transform to multiply or exponentiate the polynomials.
For example, suppose n=20 and r=5.
(1+x) = {1,1,0,0,0}
(1+x)^2 = {1,2,1,0,0}
(1+x)^4 = {1,4,6,4,1}
(1+x)^8 = {1,8,28,56,70,56,28,8,1}
{1+56,8+28,28+8,56+1,70}
{57,36,36,57,70}
(1+x)^16 = {3249,4104,5400,9090,13380,9144,8289,7980,4900}
{3249+9144,4104+8289,5400+7980,9090+4900,13380}
{12393,12393,13380,13990,13380}
(1+x)^20 = (1+x)^16 (1+x)^4
= {12393,12393,13380,13990,13380}*{1,4,6,4,1}
{12393,61965,137310,191440,211585,203373,149620,67510,13380}
{215766,211585,204820,204820,211585}
This tells you the sums for the 5 possible values of a. For example, for a=1, 211585 = 20c1+20c6+20c11+20c16 = 20+38760+167960+4845.
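A minimal C++ sketch of this repeated-squaring approach, with the naive O(r^2) multiplication (the names Poly, mul_mod and one_plus_x_pow are illustrative). It assumes M < 2^32 so that a product of two reduced coefficients fits in 64 bits. With n = 20, r = 5 and a large prime M it reproduces the vector {215766, 211585, 204820, 204820, 211585} from the worked example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u64 = std::uint64_t;
using Poly = std::vector<u64>;   // coefficients of x^0 .. x^(r-1)

// Product of two polynomials in Z_M[x] / (x^r - 1): exponents wrap around mod r.
Poly mul_mod(const Poly& p, const Poly& q, u64 M) {
    std::size_t r = p.size();
    Poly res(r, 0);
    for (std::size_t i = 0; i < r; ++i) {
        if (p[i] == 0) continue;
        for (std::size_t j = 0; j < r; ++j)
            res[(i + j) % r] = (res[(i + j) % r] + p[i] * q[j]) % M;
    }
    return res;
}

// (1 + x)^n in Z_M[x] / (x^r - 1) by repeated squaring.
// Entry a of the result is (C(n,a) + C(n,a+r) + C(n,a+2r) + ...) mod M.
Poly one_plus_x_pow(u64 n, std::size_t r, u64 M) {
    Poly result(r, 0), base(r, 0);
    result[0] = 1 % M;                      // the polynomial 1
    base[0] = 1 % M;
    base[1 % r] = (base[1 % r] + 1) % M;    // 1 + x (for r == 1 this wraps to 2)
    while (n > 0) {
        if (n & 1) result = mul_mod(result, base, M);
        base = mul_mod(base, base, M);
        n >>= 1;
    }
    return result;
}
```

This takes O(r^2 log n) time and O(r) extra space; replacing mul_mod with an FFT-based product would bring each multiplication down to O(r log r), as the answer notes.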
Something like the following, but you have to check a, n and r, because I just picked arbitrary values without regard to the constraints:
#include <complex>
#include <cmath>
#include <iostream>
using namespace std;
int main( void )
{
    const int r = 10;
    const int a = 2;
    const int n = 4;
    complex<double> i(0.,1.), res(0., 0.);
    complex<double> w = exp( i * 2. * M_PI / (double)r );   // primitive r-th root of unity
    for( int j(0); j<r; ++j )
        res += pow( w, -j * a ) * pow( 1. + pow( w, j ), n ) / (double)r;
    // res.real() approximates C(4,2) = 6; doubles lose exactness for large n and cannot reduce mod M
    cout << res.real() << endl;
    return 0;
}
The mod operation is expensive; try to avoid it as much as possible:
uint64_t res = 0;          // needs <cstdint>
int mod = 1000000009;
for (int k = 0; a + r*k <= n; k++) {
    res += mod_nCr(n, a + r*k, mod);   // each term is already < mod
    if (res >= mod)
        res -= mod;                     // so one subtraction keeps res < mod
}
I did not test this code.
I don't know whether you got anywhere with this question, but the key to implementing this formula is to realize that, because w^r = 1, the powers of w behave exactly like x in the ring Z[x]/(x^r-1). In simpler terms, you should think of implementing (1+x)^n mod (x^r-1), i.e. finding (1+x)^n in the ring Z[x]/(x^r-1).
If that is confusing, here is an easy implementation:
Make a vector of size r: O(r) space + O(r) time.
Initialize this vector with zeros everywhere: O(r) time.
Set the first two elements of that vector to 1 (this represents 1 + x): O(1).
Calculate (x+1)^n using fast exponentiation. Each multiplication takes O(r^2) and there are O(log n) multiplications, therefore O(r^2 log(n)).
Return element a of the vector (the first element when a = 0): O(1).
Complexity
O(r^2 log(n)) time and O(r) space.
This r^2 can be reduced to r log(r) using the fast Fourier transform.
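How is the multiplication done? It is regular polynomial multiplication, with the exponents taken mod r:
vector<long long> p1(r, 0);
vector<long long> p2(r, 0);
p1[0] = p1[1] = 1;   // 1 + x
p2[0] = p2[1] = 1;   // 1 + x
// now we want to do the multiplication p1 * p2 in Z_M[x]/(x^r - 1)
vector<long long> res(r, 0);
for (int i = 0; i < r; i++)
{
    for (int j = 0; j < r; j++)
    {
        // x^(i+j) wraps around to x^((i+j) mod r); reduce coefficients mod M
        res[(i + j) % r] = (res[(i + j) % r] + p1[i] * p2[j]) % M;
    }
}
return res[0];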
I have implemented this part before; if you are still confused about something, let me know. I would prefer that you implement the code yourself, but if you need the code, let me know.