FFT Multiple 1d transforms using FFTW - fortran

I have a 3-dimensional array U(z,y,x) and I want to perform a complex Fourier transform in z for all values of y and x. I am planning to use the FFTW library. I figured out from the FFTW manual that there is a way to perform multiple 1d transforms at once (shown below).
CALL dfftw_plan_many_dft(PLAN, rank, n, howmany, in, inembed, istride, idist, out, onembed, ostride, odist, FFTW_MEASURE)
I don't clearly understand what inembed and onembed mean. Could you provide more insight into this? I am new to Fortran and I am not entirely sure how to use this.
EDIT1: updated the Fortran code

It's described here actually quite well:
http://www.fftw.org/fftw3_doc/Advanced-Complex-DFTs.html
inembed and onembed allow one to embed the incoming and outgoing data into a larger dataset:
Imagine that you would like to FFT the sub-matrix of in denoted by the O elements, and possibly embed the result into the O fields of the out variable.
     X X X X X             X X X X X X
     X X X X X             X O O X X X
in = X O O X X       out = X O O X X X
     X O O X X
     X X X X X
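Per the FFTW manual, inembed and onembed hold the dimensions of the enclosing arrays, so here inembed would be [5, 5] and onembed [3, 6] (in FFTW's row-major order; Fortran reverses the order of the dimensions). The position of the first O is chosen by passing that element as the start of in or out. istride and ostride give the spacing between consecutive elements of one transform, while idist and odist take you from one transform to the next (slice to slice, volume to volume, etc.). Together, these parameters tell FFTW how to find the O elements of each sub-dataset to transform and, equally, where to put the results in a larger dataset.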
Hope this explains it. If you already know the BLAS interface, you will find that inembed and onembed correspond to the LDA and LDB arguments of many routines. Of course BLAS routines are limited to matrices, i.e. they assume 2-dimensional operations. FFTs you may of course do in as many dimensions as you like.
If you set inembed and onembed to NULL, then FFTW assumes that there are no X fields in either input or output respectively, i.e. that the physical dimensions equal n.
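For the array in the question, the addressing can be sketched in a few lines of Python. This is only an illustration of the layout rule from the FFTW manual (element j of transform k sits at offset j*stride + k*dist for a rank-1 transform), not a call into FFTW. The helper names transform_indices, dft and many_dft and the sizes nz, ny, nx are made up for this example. Because Fortran stores U(z,y,x) with z contiguous, the matching plan would presumably use rank = 1, n = nz, howmany = ny*nx, istride = 1 and idist = nz.

```python
import cmath


def transform_indices(n, howmany, stride, dist):
    """Flat offsets touched by each of the `howmany` length-n transforms."""
    return [[j * stride + k * dist for j in range(n)] for k in range(howmany)]


def dft(values):
    """Naive O(n^2) forward DFT, for illustration only (FFTW is far faster)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * j * m / n) for j, v in enumerate(values))
        for m in range(n)
    ]


def many_dft(data, n, howmany, stride, dist):
    """Apply a 1d DFT to every strided sub-sequence of a flat buffer."""
    out = list(data)
    for offsets in transform_indices(n, howmany, stride, dist):
        transformed = dft([data[i] for i in offsets])
        for position, value in zip(offsets, transformed):
            out[position] = value
    return out


# Hypothetical sizes for U(z, y, x); z is the fastest-varying index in Fortran.
nz, ny, nx = 4, 2, 3
flat = [complex(i % nz, 0) for i in range(nz * ny * nx)]  # each z-line is 0,1,2,3
result = many_dft(flat, n=nz, howmany=ny * nx, stride=1, dist=nz)
```

Swapping the roles (stride = number of lines, dist = 1) would instead walk along the slowest index, which is how the same interface handles transforms along a non-contiguous axis.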

Using LAPACK's DORMQR with non-square Q

I want to use LAPACK to calculate Q * x and Q^T * x, where Q comes from the reduced QR factorization of an m by n matrix A (m > n), stored in the form of Householder reflectors and a vector tau, as obtained from DGEQRF, and x is a vector of length n in the case of Q * x and length m in the case of Q^T * x.
The documentation of DORMQR states that x is overwritten with the result, which already confuses me, since x and Q * x obviously have different dimensions if the original matrix A and subsequently its reduced Q are not square. Furthermore it states that
"Q is of order M if SIDE = 'L' and of order N if SIDE = 'R'."
In my case, only the first half applies and M refers to the length of x. What do they mean by order? I have rarely ever heard the term "order" in the context of non-square matrices, and if so, it would be something like m by n, and not just a single number. Do they mean rank?
Can I even use DORMQR to calculate both Q * x and Q^T * x for a non-square Q, or is it not designed for this? Do I need to pad x with zeros?
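DORMQR applies only to a square Q. Although the input A to the procedure holds elementary reflectors, such as the output of DGEQRF, which can come from a more general (non-square) matrix, the documentation adds the restriction that Q "is a real orthogonal matrix".
Of course, to be orthogonal, Q must be square, so "order M" means that Q is M by M. To multiply by the reduced Q, pad x with zeros to length m before the call; for Q^T * x, keep only the first n entries of the result.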

Prolog get squares of NxN matrix

I have a list L = [[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]] that represents my matrix. The size can change dynamically, so the block size can differ: a 4x4 matrix has blocks of 4 elements, a 9x9 matrix has blocks of 9 elements.
I want to obtain the 4 squares that compose the list (in this case it's a 4 by 4 matrix). If I have that matrix:
5 6 7 8
10 11 12 13
1 2 3 4
14 15 16 17
The result should be:
R = [5,6,10,11],[7,8,12,13],[1,2,14,15],[3,4,16,17].
Any suggestions are welcomed. Thanks
The first thing you need is really a lever for turning a list of lists into a matrix. What distinguishes a 2-dimensional matrix from a list of lists? The idea of a coordinate system. So you need a way to relate a coordinate pair with the corresponding value in the matrix.
at(Matrix, X, Y, V) :- nth0(X, Matrix, Row), nth0(Y, Row, V).
This predicate makes it possible to index the matrix at (X,Y) and get the value V. This turns out to be, IMO, a massive demonstration of what makes Prolog powerful, because once you have this one, simple predicate, you gain:
The ability to obtain the value at the point supplied:
?- at([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], 1,3, V).
V = 13.
The ability to iterate the entire matrix (only instantiate Matrix and leave the other arguments as variables):
?- at([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], X,Y, V).
X = Y, Y = 0,
V = 5 ;
X = 0,
Y = 1,
V = 6 ;
...
X = 3,
Y = 2,
V = 16 ;
X = Y, Y = 3,
V = 17.
The ability to search the matrix for values:
?- at([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], X,Y, 14).
X = 3,
Y = 0 ;
false.
So this is a pretty useful lever! In a conventional language, you'd need three different functions to do all these things, but this is different, because in Prolog we just have to define the relationship between things (in this case, a data structure and a coordinate pair) and Prolog can do quite a bit of the heavy lifting.
It's easy to see how we could produce a particular submatrix now, by just defining the sets of X and Y values we'd like to see. For instance, to get the upper-left matrix we would do this:
?- between(0,1,X), between(0,1,Y),
at([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], X,Y, V).
X = Y, Y = 0,
V = 5 ;
X = 0,
Y = 1,
V = 6 ;
X = 1,
Y = 0,
V = 10 ;
X = Y, Y = 1,
V = 11.
We can of course use findall/3 to gather up the solutions in one place:
?- findall(V, (between(0,1,X), between(0,1,Y),
at([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], X,Y, V)),
Vs).
Vs = [5, 6, 10, 11].
What's left for your problem is basically some arithmetic. Let's see if we have a square matrix:
square_matrix(M, Degree) :-
    length(M, Degree),
    maplist(length, M, InnerDegrees),
    forall(member(I, InnerDegrees), I=Degree).
This is not a perfect predicate, in that it will not generate! But it will tell us whether a matrix is square and if so, what degree it has:
?- square_matrix([[5,6,7,8],[10,11,12,13],[1,2,3,4],[14,15,16,17]], D).
D = 4.
Once you have that, what you have to do is sort of formulaic:
1. Make sure the degree is a perfect square.
2. Take the square root of the degree. That's how many rows and columns of tiles you have (square root of 4 = 2, so 2 rows and 2 columns; square root of 9 = 3, so 3 rows and 3 columns).
3. Make a relationship between the (row,column) coordinate of a tile and the list of (x,y) coordinates of the matrix cells in that tile. For instance, in the 4x4 matrix you have four tiles: (0,0), (0,1), (1,0) and (1,1). The coordinates for tile (0,0) will be (0,0), (0,1), (1,0), (1,1), but the coordinates for tile (1,1) will be (2,2), (2,3), (3,2), (3,3). If you do a few of these by hand, you'll see it amounts to adding an x and y offset to all the combinations from 0 to the row/column count (minus one) for both coordinates.
4. Now that you have that relationship, you need to do the iteration and assemble your output. I think maplist/N will suffice for this.
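To check the offset arithmetic in the steps above, here is a minimal sketch in Python rather than Prolog. The function name tiles is made up for this example, and it assumes the input is a square list of lists whose side length is a perfect square. It is meant to show the coordinate relationship, not to replace the Prolog solution.

```python
import math


def tiles(matrix):
    """Split an N x N matrix into sqrt(N) x sqrt(N) tiles, each flattened row by row."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    block = math.isqrt(n)
    if block * block != n:
        raise ValueError("side length must be a perfect square")
    result = []
    for tile_row in range(block):
        for tile_col in range(block):
            # Offsets (dx, dy) range over 0..block-1 and are shifted by the
            # tile's position, exactly as in the by-hand example.
            result.append([
                matrix[tile_row * block + dx][tile_col * block + dy]
                for dx in range(block)
                for dy in range(block)
            ])
    return result


print(tiles([[5, 6, 7, 8], [10, 11, 12, 13], [1, 2, 3, 4], [14, 15, 16, 17]]))
# prints [[5, 6, 10, 11], [7, 8, 12, 13], [1, 2, 14, 15], [3, 4, 16, 17]]
```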
Hope this helps!

Efficient sparse matrix addition in Armadillo

I am trying to construct a sparse matrix L of the form L = sum over i = 1..p of Hi^T * Hi.
L and Hi are respectively a very sparse matrix and a very sparse row vector. The final L matrix should have a density of around 1%.
Armadillo provides an arma::sp_mat class that seems to suit my needs. The assembly of L then looks like this:
arma::sp_mat L(N,N);
arma::sp_mat Hi(1,N);
for (int i = 0; i < p; ++ i){
// The non-zero terms in Hi are populated here
L += Hi.t() * Hi;
}
The number of non-zero elements in Hi is constant with i. I do not have much experience with sparse matrices but I was expecting the incremental assembly of L to be relatively constant in speed.
Yet, it seems that the speed at which Hi.t() * Hi is added to L decreases over time. Am I doing something wrong in the way I assemble L? Should I preconstruct L by specifying which of its components I know will not be zero?
It seems that L starts out with no stored non-zero elements, so its storage effectively changes size each time it is incremented with Hi.t() * Hi. This was likely the cause of the decrease in speed.

Finding solution set of a Linear equation?

I need to find all possible solutions for this equation:
x+2y = N, x<100000 and y<100000.
given N=10, say.
I'm doing it like this in python:
for x in range(1, 100000):
    for y in range(1, 100000):
        if x + 2*y == 10:
            print(x, y)
How should I optimize this for speed? What should I do?
Essentially this is a Language-Agnostic question. A C/C++ answer would also help.
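If x+2y = N, then y = (N-x)/2 (supposing N-x is even). You don't need to iterate over all of range(1,100000).
Do it like this (for a given N):
if N % 2:
    x0 = 1  # N odd: x must be odd as well
else:
    x0 = 0  # N even: x must be even as well
# stop before x reaches N so that N - x stays positive
for x in range(x0, min(N, 100000), 2):
    print(x, (N - x) // 2)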
EDIT:
You have to take care that N-x does not turn negative. That's what min is supposed to do.
The answer of Lefteris E is actually better than mine because these special cases are taken care of in an elegant way.
We can iterate over the domain of y and calculate x. Taking into account that x also has a limited range, we further limit the domain of y to [1, N/2] (anything over N/2 for y would give a negative value for x).
x = N
for y in range(1, N // 2 + 1):  # y runs from 1 to N/2 inclusive
    x = x - 2
    print(x, y)
This just loops N/2 times (instead of 50000)
It doesn't even do those expensive multiplications and divisions
This runs in quadratic time. You can reduce it to linear time by rearranging your equation to the form y = .... This allows you to loop over x only, calculate y, and check whether it's an integer.
Lefteris E's answer is the way to go,
but I do feel y should be in the range [1,N/2] instead of [1,2*N].
Explanation:
x + 2*y = N
// solve for x
x = N - 2*y
// x must not be negative
N - 2*y >= 0
2*y <= N
// therefore, when x=0, y is maximum, and y = N/2
y = N/2
So now you can do:
for y in range(1, N // 2 + 1):  # y runs from 1 to N/2 inclusive
    x = N - (y << 1)
    print(x, y)
You could examine only even values of x, given N = 10.
The reason is that 2y is always even, so x must be even whenever N is even. This should cut the total running time to half of that for examining every x.
If you also require the answers to be natural numbers, negative values are ruled out. You then only need to examine the even values of x in [0,10], since neither x nor 2y can exceed 10.
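Putting the answers above together, a minimal sketch might look like the following. It assumes the same bounds as the loops in the question (x and y both start at 1 and stay below 100000). The function name solutions and the limit parameter are made up for this example.

```python
def solutions(n, limit=100000):
    """All (x, y) with x + 2*y == n and 1 <= x, y < limit."""
    pairs = []
    # y >= 1, and x = n - 2*y >= 1 forces y <= (n - 1) // 2.
    for y in range(1, min((n - 1) // 2, limit - 1) + 1):
        x = n - 2 * y
        if x < limit:  # respect the upper bound on x as well
            pairs.append((x, y))
    return pairs


print(solutions(10))
# prints [(8, 1), (6, 2), (4, 3), (2, 4)]
```

The loop runs at most about N/2 times, compared with roughly 10^10 iterations for the nested loops in the question.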

Multiple condition clause in C/ C++

My aim is to get two numbers and perform an operation corresponding to the values they hold:
For ex.. Let the vars be x and y.
I need to compute the value of another var z, as follows:
z = x + y // if x = y = 1
z = x - y // if x = 0 and y = 1
Since I need to use it several times, it would not be efficient to use if else within a loop.
What I basically require is something like a macro, preferably using #define, as if it were to be used once:
#define x + y 1 replaces x+y with 1, but does not depend on the values of x and/or y
Is there any way I could replace x+y with 1, x-y with 0 and so on...
You cannot know the values of x and y until runtime, so there is nothing you can do but use an if like you were going to, and think of the math that will require the fewest operations (taking into account that addition and subtraction are usually the cheapest, followed by multiplication and then division).
If I misunderstood and you simply want to replace x + y with 1 and x - y with 0, you can just replace them by hand or via find and replace.
If the vars are integers or any other primitive type, you can't improve the performance, since the compiler will usually translate the operation into a single assembly instruction, such as sub eax, ebx.
Anyhow, as mentioned before, you can't do it at compile time, since the values of x and y are not known at compile time.
You can hint to the compiler that x and y should be kept in CPU registers by using the register keyword, in order to perform faster calculations. Note, however, that modern compilers generally ignore this hint, and C++17 removed it from the language.