Need help understanding this line in an FFT algorithm - c++

In my program I have a function that performs the fast Fourier transform. I know there are very good implementations freely available, but this is a learning thing so I don't want to use those. I ended up finding a comment with the following implementation (it originated from the Italian Wikipedia entry for the FFT):
void transform(complex<double>* f, int N)
{
    ordina(f, N);    // first: bit-reversal reordering of the input
    complex<double>* W;
    W = (complex<double>*)malloc(N / 2 * sizeof(complex<double>));
    W[1] = polar(1., -2. * M_PI / N);    // W[k] = e^(-2*pi*i*k/N)
    W[0] = 1;
    for(int i = 2; i < N / 2; i++)
        W[i] = pow(W[1], i);
    int n = 1;
    int a = N / 2;
    for(int j = 0; j < log2(N); j++) {
        for(int k = 0; k < N; k++) {
            if(!(k & n)) {
                complex<double> temp = f[k];
                complex<double> Temp = W[(k * a) % (n * a)] * f[k + n];
                f[k] = temp + Temp;
                f[k + n] = temp - Temp;
            }
        }
        n *= 2;
        a = a / 2;
    }
    free(W);
}
I've made a lot of changes by now, but this was my starting point. One of the changes I made was to not cache the twiddle factors, because I decided to see whether that's needed first. Now I've decided I do want to cache them. The way this implementation seems to do it is that it has this array W of length N/2, where index k holds the value W[k] = e^(-2πik/N). What I don't understand is this expression:
W[(k * a) % (n * a)]
Note that n * a is always equal to N/2. I get that this is supposed to be equal to e^(-2πik/(2n)), the twiddle factor for the current stage (whose butterflies span 2n points), and I can see that W[k * a] = e^(-2πi(k*a)/N) = e^(-2πik/(2n)) because 2 * n * a = N, which this relies on. I also get that modulo can be used here because the twiddle factors are cyclic. But there's one thing I don't get: this is a length-N DFT, and yet only N/2 twiddle factors are ever calculated. Shouldn't the array be of length N, and the modulo should be by N?

But there's one thing I don't get: this is a length-N DFT, and yet only N/2 twiddle factors are ever calculated. Shouldn't the array be of length N, and the modulo should be by N?
The twiddle factors are equally spaced points on the unit circle, and there is an even number of points because N is a power of two. After going around half of the circle (starting at 1 and, with this code's negative exponent, moving clockwise below the X-axis), the second half is a repeat of the first half, but with every point reflected through the origin, i.e. negated. That is why Temp is subtracted the second time: the subtraction accounts for the negated twiddle factor.
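To see that symmetry numerically, here is a small standalone sketch (mine, not part of the original code) that checks that the factor half a circle further on is the negation of the stored one, so a table of N/2 entries covers all N twiddle factors:
#include <complex>
#include <cmath>
#include <iostream>

// Illustrative check: with W[k] = e^(-2*pi*i*k/N), W[k + N/2] == -W[k].
int main()
{
    const int N = 8;
    for (int k = 0; k < N / 2; ++k) {
        std::complex<double> wk  = std::polar(1.0, -2.0 * M_PI * k / N);
        std::complex<double> wk2 = std::polar(1.0, -2.0 * M_PI * (k + N / 2) / N);
        std::cout << wk2 << " vs " << -wk << '\n';   // equal up to rounding
    }
    return 0;
}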

Related

Hyperbolic sine without math.h

I'm new to coding and C++. For a homework assignment I'm to create a function for sinh without the math header. I understand the math behind sinh, but I have no idea how to code it; any help would be highly appreciated.
According to Wikipedia, there is a Taylor series for sinh:
sinh(x) = x + (pow(x, 3) / 3!) + (pow(x, 5) / 5!) + pow(x, 7) / 7! + ...
One challenge is that you are not allowed to use the pow function. The other is calculating the factorial.
The series is a sum of terms, so you'll need a loop:
double sum = 0.0;
for (unsigned int i = 0; i < NUMBER_OF_TERMS; ++i)
{
    sum += Term(i);
}
You could implement Term as a separate function, but you may want to take advantage of declaring and using variables in the loop (that the function may not have access to).
Consider that pow(x, N) expands to x * x * x...
This means that in each iteration, the previous value is multiplied by x. (This will come in handy later.)
Consider that N! expands to 1 * 2 * 3 * 4 * 5 * ...
This means that in each iteration, the previous value is multiplied by the iteration number.
Let's revisit the loop:
double sum = 0.0;
double power = 1.0;
double factorial = 1.0;
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
    // Calculate pow(x, i)
    power = power * x;
    // Calculate i!
    factorial = factorial * i;
}
One issue with the above loop is that the pow and factorial values need to be calculated on every iteration, but the Taylor series terms only use the odd iterations. This is solved by accumulating the sum only on odd iterations:
for (unsigned int i = 1; i <= NUMBER_OF_TERMS; ++i)
{
    // Calculate pow(x, i)
    power = power * x;
    // Calculate i!
    factorial = factorial * i;
    // Calculate the sum for odd iterations only
    if ((i % 2) == 1)
    {
        // Calculate the term.
        sum += //...
    }
}
In summary, the pow and factorial functions are broken down into iterative pieces. The iterative pieces are placed into a loop. Since the Taylor Series terms are calculated with odd iteration values, a check is placed into the loop.
The actual calculation of the Taylor Series term is left as an exercise for the OP or reader.
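For completeness, here is a minimal sketch that just assembles the pieces described above (the function name and the number of terms are arbitrary choices of mine); the term itself is the running power divided by the running factorial:
#include <iostream>

// Sketch: sinh via its Taylor series, with pow(x, i) and i! built incrementally.
double sinh_taylor(double x, unsigned int terms = 10)
{
    double sum = 0.0;
    double power = 1.0;      // running value of pow(x, i)
    double factorial = 1.0;  // running value of i!
    for (unsigned int i = 1; i <= 2 * terms; ++i)
    {
        power *= x;          // pow(x, i), one factor at a time
        factorial *= i;      // i!, one factor at a time
        if ((i % 2) == 1)    // only odd powers appear in the sinh series
        {
            sum += power / factorial;
        }
    }
    return sum;
}

int main()
{
    std::cout << sinh_taylor(1.0) << '\n';   // prints roughly 1.1752, i.e. sinh(1)
    return 0;
}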

Cholesky decomposition in Halide

I'm trying to implement a Cholesky decomposition in Halide. Part of a common algorithm such as Crout's consists of an iteration over a triangular matrix: the diagonal elements of the decomposition are computed by subtracting a partial column sum from the diagonal element of the input matrix. The column sum is calculated over the squared elements of a triangular part of the input matrix, excluding the diagonal element.
Using BLAS, the code would look as follows in C++:
double* a; /* input matrix */
int n;     /* dimension */
const int c__1 = 1;
const double c_b12 = 1.;
const double c_b10 = -1.;
for (int j = 0; j < n; ++j) {
    double ajj = a[j + j * n] - ddot(&j, &a[j + n], &n, &a[j + n], &n);
    ajj = sqrt(ajj);
    a[j + j * n] = ajj;
    if (j < n) {
        int i__2 = n - j;
        dgemv("No transpose", &i__2, &j, &c_b10, &a[j + 1 + n], &n, &a[j + n], &n, &c_b12, &a[j + 1 + j * n], &c__1);
        double d__1 = 1. / ajj;
        dscal(&i__2, &d__1, &a[j + 1 + j * n], &c__1);
    }
}
My question is whether a pattern like this is expressible in Halide in general, and if so, what it would look like.
I think Andrew may have a more complete answer, but in the interest of a timely response, you can use an RDom predicate (introduced via RDom::where) to enumerate triangular regions (or their generalization to more dimensions). A sketch of the pattern is:
Halide::RDom triangular(0, extent, 0, extent);
triangular.where(triangular.x < triangular.y);
Then use triangular in a reduction.
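For a concrete, if simplified, illustration, here is a minimal sketch of a predicated RDom driving a reduction. It does not implement the full decomposition; it only sums the squares of the strictly triangular (x < y) elements of a buffer, and the buffer A, the size extent and the Func name are assumptions of mine:
#include "Halide.h"
#include <cstdio>
using namespace Halide;

int main()
{
    const int extent = 8;                     // illustrative size
    Buffer<float> A(extent, extent);          // stand-in for the input matrix
    A.fill(1.0f);

    RDom triangular(0, extent, 0, extent);
    triangular.where(triangular.x < triangular.y);   // keep only the strict triangle

    Func s("s");
    s() = 0.0f;                               // zero-dimensional accumulator
    s() += A(triangular.x, triangular.y) * A(triangular.x, triangular.y);

    Buffer<float> out = s.realize();
    printf("sum of squares over the triangle: %f\n", out());
    return 0;
}
The same predicate idea is what you would use to restrict the reduction to the partial column sums the decomposition needs.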
I once had a fast Cholesky written in Halide. Unfortunately I can't find the code. I put the outer loop in C and wrote a good block-panel update routine that operated on something like a 32-wide panel at a time. This was before Halide had triangular iteration, so maybe you can do better now.

Efficient C/C++ algorithm on 2-dimensional max-sum window

I have a c[N][M] matrix where I apply a max-sum operation over a (K+1)² window. I am trying to reduce the complexity of the naive algorithm.
In particular, here's my code snippet in C++:
int N, M, K;
std::cin >> N >> M >> K;
std::pair<unsigned, unsigned> opt[N][M];
unsigned c[N][M];
// Read values for c[i][j]
// Initialize all opt[i][j] at (0,0).
for ( int i = 0; i < N; i ++ ) {
    for ( int j = 0; j < M ; j ++ ) {
        unsigned max = 0;
        int posX = i, posY = j;
        for ( int ii = i; (ii >= i - K) && (ii >= 0); ii -- ) {
            for ( int jj = j; (jj >= j - K) && (jj >= 0); jj -- ) {
                // Ignore the (i,j) position
                if (( ii == i ) && ( jj == j )) {
                    continue;
                }
                if ( opt[ii][jj].second > max ) {
                    max = opt[ii][jj].second;
                    posX = ii;
                    posY = jj;
                }
            }
        }
        opt[i][j].first = opt[posX][posY].second;
        opt[i][j].second = c[i][j] + opt[posX][posY].first;
    }
}
The goal of the algorithm is to compute opt[N-1][M-1].
Example: for N = 4, M = 4, K = 2 and:
c[N][M] = 4 1 1 2
          6 1 1 1
          1 2 5 8
          1 1 8 0
... the result should be opt[N-1][M-1] = {14, 11}.
The running time of this snippet, however, is O(N M K²). My goal is to reduce that complexity. I have already seen posts like this, but it appears that my "filter" is not separable, probably because of the sum operation.
More information (optional): this is essentially an algorithm which develops the optimal strategy in a "game" where:
Two players lead a single team in a N × M dungeon.
Each position of the dungeon has c[i][j] gold coins.
Starting position: (N-1,M-1) where c[N-1][M-1] = 0.
The active player chooses the next position to move the team to, from position (x,y).
The next position can be any (x-i, y-j) with 0 <= i <= K, 0 <= j <= K and i + j > 0. In other words, they can move only left and/or up, by at most K steps per direction.
The player who just moved the team gets the coins in the new position.
The active player alternates each turn.
The game ends when the team reaches (0,0).
Optimal strategy for both players: maximize their own sum of gold coins, if they know that the opponent is following the same strategy.
Thus, opt[i][j].first represents the coins of the player who will now move from (i,j) to another position. opt[i][j].second represents the coins of the opponent.
Here is an O(N * M) solution.
Let's fix the bottom row r of the window. If the maximum over all rows between r - K and r is known for every column, the problem reduces to the well-known sliding window maximum problem, so the answer for a fixed row can be computed in O(M) time.
Let's iterate over all rows in increasing order. For each column, the maximum over all rows between r - K and r is itself a sliding window maximum problem, so processing one column takes O(N) time over all rows.
The total time complexity is O(N * M).
However, there is one issue with this solution: it does not exclude the (i, j) element. It is possible to fix this by running the algorithm described above twice (with K * (K + 1) and (K + 1) * K windows) and then merging the results (a (K + 1) * (K + 1) square without a corner is the union of two rectangles of size K * (K + 1) and (K + 1) * K).
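For reference, here is a minimal sketch of the 1D building block this answer relies on: the sliding-window maximum, computed with a monotonic deque in O(n) total time (the function name and types are illustrative):
#include <deque>
#include <vector>

// Sliding-window maximum: out[i] is the maximum of v[i], ..., v[i + k - 1].
std::vector<unsigned> sliding_max(const std::vector<unsigned>& v, int k)
{
    std::deque<int> idx;                    // candidate indices, values in decreasing order
    std::vector<unsigned> out;
    for (int i = 0; i < (int)v.size(); ++i) {
        if (!idx.empty() && idx.front() <= i - k)
            idx.pop_front();                // drop indices that have left the window
        while (!idx.empty() && v[idx.back()] <= v[i])
            idx.pop_back();                 // drop candidates dominated by v[i]
        idx.push_back(i);
        if (i >= k - 1)
            out.push_back(v[idx.front()]);  // the front always holds the window maximum
    }
    return out;
}
Running this along every column (window over rows) and then along every row of those column results gives the rectangular window maxima in O(N * M) total.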

How to compute sum of evenly spaced binomial coefficients

How to find sum of evenly spaced Binomial coefficients modulo M?
i.e. (C(n, a) + C(n, a+r) + C(n, a+2r) + C(n, a+3r) + ... + C(n, a+kr)) % M = ?
given: 0 <= a < r, a + kr <= n < a + (k+1)r, n < 10^5, r < 100
My first attempt was:
int res = 0;
int mod = 1000000009;
for (int k = 0; a + r*k <= n; k++) {
    res = (res + mod_nCr(n, a + r*k, mod)) % mod;
}
but this is not efficient. So after reading here
and this paper I found out the above sum is equivalent to:
sum over j of ω^(-ja) * (1 + ω^j)^n / r, for 0 <= j < r, where ω = e^(i2π/r) is a primitive r-th root of unity.
What would the code look like to find this sum in O(r)?
Edit:
n can go up to 10^5 and r can go up to 100.
Original problem source: https://www.codechef.com/APRIL14/problems/ANUCBC
Editorial for the problem from the contest: https://discuss.codechef.com/t/anucbc-editorial/5113
After revisiting this post 6 years later, I'm unable to recall how I transformed the original problem statement into my version; nonetheless, I have shared the link to the original solution in case anyone wants to look at the correct solution approach.
Binomial coefficients are coefficients of the polynomial (1+x)^n. The sum of the coefficients of x^a, x^(a+r), etc. is the coefficient of x^a in (1+x)^n in the ring of polynomials mod x^r-1. Polynomials mod x^r-1 can be specified by an array of coefficients of length r. You can compute (1+x)^n mod (x^r-1, M) by repeated squaring, reducing mod x^r-1 and mod M at each step. This takes about log_2(n)r^2 steps and O(r) space with naive multiplication. It is faster if you use the Fast Fourier Transform to multiply or exponentiate the polynomials.
For example, suppose n=20 and r=5.
(1+x) = {1,1,0,0,0}
(1+x)^2 = {1,2,1,0,0}
(1+x)^4 = {1,4,6,4,1}
(1+x)^8 = {1,8,28,56,70,56,28,8,1}
    reduce mod x^5-1: {1+56, 8+28, 28+8, 56+1, 70}
                    = {57,36,36,57,70}
(1+x)^16 = {57,36,36,57,70}^2 = {3249,4104,5400,9090,13380,9144,8289,7980,4900}
    reduce mod x^5-1: {3249+9144, 4104+8289, 5400+7980, 9090+4900, 13380}
                    = {12393,12393,13380,13990,13380}
(1+x)^20 = (1+x)^16 (1+x)^4
         = {12393,12393,13380,13990,13380} * {1,4,6,4,1}
         = {12393,61965,137310,191440,211585,203373,149620,67510,13380}
    reduce mod x^5-1: {215766,211585,204820,204820,211585}
This tells you the sums for the 5 possible values of a. For example, for a=1, 211585 = 20c1+20c6+20c11+20c16 = 20+38760+167960+4845.
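A minimal sketch of the same procedure in C++ (names are mine; M is whatever modulus the problem requires), computing the coefficients of (1+x)^n mod (x^r - 1, M) by repeated squaring:
#include <cstdint>
#include <vector>

// Entry a of the result is C(n,a) + C(n,a+r) + C(n,a+2r) + ... taken mod M.
using Poly = std::vector<std::uint64_t>;

// Multiply two length-r coefficient vectors, folding exponents mod r and values mod M.
Poly poly_mul(const Poly& p, const Poly& q, std::uint64_t M)
{
    const std::size_t r = p.size();
    Poly res(r, 0);
    for (std::size_t i = 0; i < r; ++i)
        for (std::size_t j = 0; j < r; ++j)
            res[(i + j) % r] = (res[(i + j) % r] + p[i] * q[j]) % M;
    return res;
}

// Repeated squaring of (1 + x) in Z_M[x] / (x^r - 1).
Poly one_plus_x_pow(std::uint64_t n, std::size_t r, std::uint64_t M)
{
    Poly base(r, 0), res(r, 0);
    base[0] = 1 % M;                       // constant term of (1 + x)
    base[1 % r] = (base[1 % r] + 1) % M;   // x term (folds onto the constant if r == 1)
    res[0] = 1 % M;                        // multiplicative identity
    while (n > 0) {
        if (n & 1) res = poly_mul(res, base, M);
        base = poly_mul(base, base, M);
        n >>= 1;
    }
    return res;
}
For the worked example above, one_plus_x_pow(20, 5, M) with a large enough M reproduces {215766, 211585, 204820, 204820, 211585}.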
Something like this, but you will have to check a, n and r, because I just plugged values in without paying attention to the constraints:
#include <complex>
#include <cmath>
#include <iostream>
using namespace std;
int main( void )
{
    const int r = 10;
    const int a = 2;
    const int n = 4;
    complex<double> i(0., 1.), res(0., 0.), w;
    w = exp( i * 2. * M_PI / (double)r );   // primitive r-th root of unity
    for( int j(0); j < r; ++j )
    {
        res += pow( w, -j * a ) * pow( 1. + pow( w, j ), n ) / (double)r;
    }
    cout << res << endl;   // real part is (approximately) the requested sum
    return 0;
}
The mod operation is expensive, so try to avoid it as much as possible:
uint64_t res = 0;
int mod = 1000000009;
for (int k = 0; a + r*k <= n; k++) {
    res += mod_nCr(n, a + r*k, mod);
    if (res >= mod)
        res %= mod;
}
I did not test this code
I don't know if you have reached something or not in this question, but the key to implementing this formula is to figure out that the powers w^i are independent and therefore can form a ring. In simpler terms, you should think of implementing
(1+x)^n % (x^r - 1), i.e. finding (1+x)^n in the ring Z[x]/(x^r - 1).
If that is confusing, here is an easy implementation outline:
Make a vector of size r: O(r) space + O(r) time.
Initialize this vector with zeros everywhere: O(r) space + O(r) time.
Set the first two elements of that vector to 1: O(1).
Calculate (x+1)^n using the fast exponentiation method; each multiplication takes O(r^2) and there are log n multiplications, therefore O(r^2 log(n)).
Return the element of the vector at index a (the first element when a = 0): O(1).
Complexity
O(r^2 log(n)) time and O(r) space.
This r^2 can be reduced to r log(r) using the Fourier transform.
How is the multiplication done? It is regular polynomial multiplication, with the exponents reduced mod r:
vector<long long> p1(r, 0);
vector<long long> p2(r, 0);
p1[0] = p1[1] = 1;
p2[0] = p2[1] = 1;
Now we do the multiplication:
vector<long long> res(r, 0);
for (int i = 0; i < r; i++)
{
    for (int j = 0; j < r; j++)
    {
        res[(i + j) % r] += p1[i] * p2[j];   // in practice, also reduce mod M here
    }
}
return res[0];
I have implemented this part before; if you are still confused about something, let me know. I would prefer that you implement the code yourself, but if you need the code, let me know.

How do you multiply a matrix by itself?

This is what I have so far, but I do not think it is right.
for (int i = 0; i < 5; i++)
{
    for (int j = 0; j < 5; j++)
    {
        matrix[i][j] += matrix[i][j] * matrix[i][j];
    }
}
Suggestion: if it's not a homework don't write your own linear algebra routines, use any of the many peer reviewed libraries that are out there.
Now, about your code: if you want to do a term-by-term product, then you're doing it wrong; what you're doing is assigning to each value its square plus the original value (n*n + n, or (1+n)*n, whichever you like best).
But if you want to do an authentic matrix multiplication in the algebraic sense, remember that you have to take the scalar product of the first matrix's rows with the second matrix's columns (or the other way around, I'm not entirely sure now)... something like:
for i in rows:
    for j in cols:
        result(i,j) = m(i,:)·m(:,j)
and the scalar product "·" is
v·w = sum(v(i)*w(i)) over all i in the range of the indices.
Of course, with this method you cannot do the product in place, because you'll need the values that you're overwriting in the next steps.
Also, explaining Tyler McHenry's comment a little bit further: as a consequence of having to multiply rows by columns, the "inner dimensions" (I'm not sure if that's the correct terminology) of the matrices must match (if A is m x n and B is n x o, then A*B is m x o), so in your case a matrix can be squared only if it is square (he he he).
And if you just want to play a little bit with matrices, then you can try Octave, for example; squaring a matrix is as easy as M*M or M**2.
I don't think you can multiply a matrix by itself in-place.
for (int i = 0; i < 5; i++) {
    for (int j = 0; j < 5; j++) {
        product[i][j] = 0;
        for (int k = 0; k < 5; k++) {
            product[i][j] += matrix[i][k] * matrix[k][j];
        }
    }
}
Even if you use a less naïve matrix multiplication (i.e. something other than this O(n^3) algorithm), you still need extra storage.
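If the goal is to end up with the square stored back in matrix, a minimal sketch of that workaround is to multiply into a temporary and copy it back afterwards (the fixed 5x5 size mirrors the question; the function name is mine):
#include <array>

// Square a 5x5 matrix "in place" by going through a temporary, because a true
// in-place square would overwrite values that are still needed.
void square_in_place(std::array<std::array<double, 5>, 5>& m)
{
    std::array<std::array<double, 5>, 5> product{};    // zero-initialized temporary
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 5; ++j)
            for (int k = 0; k < 5; ++k)
                product[i][j] += m[i][k] * m[k][j];     // row i of m times column j of m
    m = product;                                        // copy the result back
}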
That's not any matrix multiplication definition I've ever seen. The standard definition is
for (i = 1 to m)
    for (j = 1 to n)
        result(i, j) = 0
        for (k = 1 to s)
            result(i, j) += a(i, k) * b(k, j)
to give the algorithm in a sort of pseudocode. In this case, a is an m x s matrix, b is s x n, the result is m x n, and subscripts begin at 1.
Note that multiplying a matrix in place is going to get the wrong answer, since you're going to be overwriting values before using them.
It's been too long since I've done matrix math (and I only ever did a little of it), but the += operator takes the value of matrix[i][j] and adds to it the value of matrix[i][j] * matrix[i][j], which I don't think is what you want to do.
Well, it looks like what it's doing is squaring each entry, then adding that to the entry itself. Is that what you want it to do? If not, then change it.