I'm trying to build an array / matrix from a command given through stdin. The command is formatted like this:
nameOfArray build numberOfDimensions : dimensionList : valueList
Another example:
B build 1 : 3 : 4,5,6
The command needs to work for up to three dimensions, and I am completely stumped as to how to implement it.
Since we are limited to three dimensions, the problem is easy. We simply treat every case as the 3-dimensional case, with height and depth set to one for the lower dimensions.
So we allocate width * height * depth elements with malloc() or std::vector::resize(), then read the values in one by one. In C, the job is done. In C++, you might then need to fiddle about to turn your vector into a multi-dimensional matrix class with a nice interface.
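The question targets C/C++, but the parsing and flat-indexing logic is the same in any language. Here is a minimal sketch in Python (the function name build_array and its return format are made up for illustration); a C++ version would fill the same flat buffer held in a std::vector.

def build_array(command):
    # "B build 1 : 3 : 4,5,6"  ->  name, "build", "numDims : dimensionList : valueList"
    name, _, rest = command.split(maxsplit=2)
    num_dims, dims_str, values_str = [part.strip() for part in rest.split(":")]
    dims = [int(d) for d in dims_str.split(",")]
    assert len(dims) == int(num_dims)
    dims += [1] * (3 - len(dims))              # pad height/depth with 1 for the 1-D / 2-D cases
    width, height, depth = dims
    values = [float(v) for v in values_str.split(",")]
    assert len(values) == width * height * depth
    # Flat storage: element (x, y, z) lives at index x + width * (y + height * z).
    return name, (width, height, depth), values

print(build_array("B build 1 : 3 : 4,5,6"))    # ('B', (3, 1, 1), [4.0, 5.0, 6.0])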
I am doing batch execution of a large number of 3x3 matrix inversions with CUDA.
The goal is to get a big matrix of 3x3 matrices (so I use a 4D array).
I have previously done the same operation with the numpy.linalg.inv function. That way, I directly get an array of 3x3 matrices; I show the code that performs this operation below.
Now, with the CUDA version, I would like to reshape the big 1D array that is produced in as few instructions as possible: I have to build an (N,N,3,3) array from an (N*N*3*3) 1D array.
For the moment, I can do this reshape in 2 steps (see the code below).
The original version with classical numpy.linalg.inv is carried out by:
for r_p in range(N):
    for s_p in range(N):
        # original version (without GPU)
        invCrossMatrix[:,:,r_p,s_p] = np.linalg.inv(arrayFullCross_vec[:,:,r_p,s_p])
invCrossMatrix represents a (3,3,N,N) array, and I get it directly from the (3,3,N,N) arrayFullCross array (dimBlocks = 3).
For the moment, when I use the GPU batch execution, I start from the 1D array:
# Declaration of inverse cross matrix
invCrossMatrix_temp = np.zeros((N**2,3,3))
# Create arrayFullCross_vec array
arrayFullCross_vec = np.zeros((3,3,N,N))
# Declare flattened GPU output array
invCrossMatrix_gpu = np.zeros((3*3*(N**2)))
# Build observables covariance matrix
arrayFullCross_vec = buildObsCovarianceMatrix3_vec(k_ref, mu_ref, ir)
## Performing batch inversion 3x3 :
invCrossMatrix_gpu = gpuinv3x3(arrayFullCross_vec.flatten('F'),N**2)
## First reshape
invCrossMatrix_temp = invCrossMatrix_gpu.reshape(N**2,3,3)
# Second reshape : don't forget ".T" transpose operator
invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T
Question 1) Why is the 'F' option in flatten('F') necessary?
If I only do gpuinv3x3(arrayFullCross_vec.flatten(), N**2), the code doesn't work. Is Python perhaps column-major, like Fortran?
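For reference, NumPy arrays are row-major (C order) by default, not column-major; 'F' asks for Fortran (column-major) order. A small, self-contained illustration of the difference (the last comment is a guess about what the batched kernel expects, not something stated in the question):

import numpy as np

# NumPy is row-major (C order) by default; 'F' flattens in column-major
# (Fortran) order instead. Only the element ordering changes:
a = np.arange(6).reshape(2, 3)
print(a.flatten())      # C order: [0 1 2 3 4 5]
print(a.flatten('F'))   # F order: [0 3 1 4 2 5]

# For a (3, 3, N, N) array, flatten('F') makes the first two axes vary fastest,
# so each 3x3 block is contiguous in the 1D output -- presumably the layout
# the batched gpuinv3x3 kernel expects.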
Question 2) Now, I would like to convert the following block:
## First reshape
invCrossMatrix_temp = invCrossMatrix_gpu.reshape(N**2,3,3)
# Second reshape : don't forget ".T" transpose operator
invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T
into a single reshape instruction. Is it possible?
The issue is how to convert the 1D array invCrossMatrix_gpu (of size N**2 * 3 * 3) directly into a (3,3,N,N) array.
I would like to reshape the original 1D array in a single step, since I call these routines many times.
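A sketch for question 2, under the assumption that invCrossMatrix_gpu is an ordinary contiguous NumPy array: reshaping to (N**2, 3, 3) and then to (N, N, 3, 3) is the same as reshaping to (N, N, 3, 3) directly, so the two steps collapse into a single reshape plus the transpose.

import numpy as np

N = 4                                                         # small size just for the check
invCrossMatrix_gpu = np.arange(N * N * 3 * 3, dtype=float)    # stand-in for the GPU output

two_step = invCrossMatrix_gpu.reshape(N**2, 3, 3).reshape(N, N, 3, 3).T
one_step = invCrossMatrix_gpu.reshape(N, N, 3, 3).T

print(np.array_equal(one_step, two_step))                     # True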
Update
Is it right to say that the array invCrossMatrix defined by:
invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T
has dimensions (3,3,N,N)?
@hpaulj: Is it equivalent to do this?
invCrossMatrix =(invCrossMatrix_temp.reshape(N,N,3,3)).transpose(2,3,0,1)
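Regarding the update: .T on a 4-D array reverses all axes, i.e. it is transpose(3, 2, 1, 0), so it does produce a (3,3,N,N) result, but it is not element-for-element the same as transpose(2, 3, 0, 1) unless the data happens to be symmetric in the swapped index pairs. A quick numerical check on random data (a sketch, not a statement about the real covariance data):

import numpy as np

N = 4
flat = np.random.rand(N * N * 3 * 3)

via_T  = flat.reshape(N, N, 3, 3).T                   # axes reversed: (3, 2, 1, 0)
via_tr = flat.reshape(N, N, 3, 3).transpose(2, 3, 0, 1)

print(via_T.shape, via_tr.shape)                      # both (3, 3, 4, 4)
print(np.allclose(via_T, via_tr))                     # False in general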
I have a numpy variable called rnn1 of dimension (37,512).
n, bins, patches = plt.hist(rnn1, histtype='stepfilled')
I got the following histogram shape.
What do the different colors refer to?
What is the difference between n and patches?
As the documentation of hist() states: input x can be an array of shape (n,) or a sequence of (n,) arrays. Since you are passing an array of shape (37,512), matplotlib interprets this as a sequence of 512 different (37,)-long arrays. It therefore draws 512 histograms, each with a different color. I'm guessing that's not actually what you were trying to achieve, but that's outside the scope of your question.
The returned value n is a list of 512 arrays, each containing the heights of the bars in one of the histograms.
The returned object patches is a list of 512 lists of patches, which are the actual graphical elements that compose the figure.
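If the goal was actually a single histogram over all 37*512 values (an assumption on my part, not something stated in the question), flattening the array first gives one dataset and one set of bars:

import numpy as np
import matplotlib.pyplot as plt

rnn1 = np.random.randn(37, 512)                 # stand-in for the real rnn1 data

# One histogram over all values instead of 512 separate ones:
n, bins, patches = plt.hist(rnn1.ravel(), bins=50, histtype='stepfilled')
print(n.shape)                                  # (50,): one count per bin
plt.show()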
so I have this class:
class Piece {
    int width;
    int height;
};
My problem is that I need to make a container-type class that can somehow store the layout of multiple, differently sized "Piece" objects (note that a Piece can only represent a rectangle).
Example:
________
| t |
| t jj |
| t jj |
_________
My goal with this is to be able to "fill" an empty rectangle with multiple "Piece" objects, with the ability to know whether a "Piece" can fit in.
I'm developing this in C++. I started with what I thought was the most logical solution, which was to use a "matrix" of vectors (vector< vector< Piece * > > mat), but that doesn't work because, as I said, "Piece" objects can have different sizes.
I hope you can give me some hints on how to find a solution for this, or links to any existing library or open-source project.
Thank you.
EDIT
I forgot this: I know the dimensions of the container beforehand, and the insertion (after validation) is sequential (Piece after Piece) with no predefined orientation.
You can use Piece p[width][height] and use memset to set everything to zero, or use a std::vector if you don't know the size of the grid beforehand. Then, when adding a new Piece at some position (x, y), you can check whether any of the cells it would cover already holds another Piece.
Edit: You can use a matrix char mem[sqrt(width)][sqrt(height)]; together with a single vector of Pieces. Use the matrix to tell whether there could be a collision; if not, just add the Piece. Otherwise, iterate through all the existing Pieces and check for a collision.
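A minimal sketch of the simple occupancy-grid check described above (shown in Python for brevity; a C++ version is the same pair of nested loops over, say, a std::vector<std::vector<char>>, and all names here are made up for illustration):

def can_place(grid, x, y, width, height):
    rows, cols = len(grid), len(grid[0])
    if x + width > cols or y + height > rows:
        return False                                  # piece would stick out of the container
    return all(grid[y + dy][x + dx] == 0              # every covered cell must be free
               for dy in range(height) for dx in range(width))

def place(grid, x, y, width, height):
    for dy in range(height):
        for dx in range(width):
            grid[y + dy][x + dx] = 1                  # mark the cells covered by the piece

grid = [[0] * 8 for _ in range(4)]                    # an 8-wide, 4-high container, all empty
if can_place(grid, 2, 1, 2, 3):                       # try a 2-wide, 3-high piece at (2, 1)
    place(grid, 2, 1, 2, 3)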
If you want to make the procedure faster (the simple one is only reasonable for small grids), you will need to use more "advanced" data structures. What I suggest is learning about 2D BITs (Fenwick trees); there are a lot of resources on Google. You can also use 2D segment trees. Then, when adding a new Piece at position (x, y), check the sum over all cells it covers (e.g. from (x, y) to (x + width, y + height)). If that sum is zero, the new Piece won't collide with previous ones, and you then update the grid by adding 1 to all cells covered by the Piece (i.e. to the corresponding values in the 2D segment tree). If the sum is greater than zero, there is some overlap and you must discard the new Piece.
I am working on an algorithm with many computations done on a GPU. I'm working mainly with oclMat structures and am trying to avoid copying from CPU to GPU and vice versa, yet I cannot find an easy way to:
compare all elements in an ocl matrix to a specific single value (be it float or double, for instance) and create a logical matrix accordingly
create an oclMat matrix with a given size and type initialized with all elements to a specific value (for example all elements are float and equal to 1.234567)
For example:
cv::ocl::oclMat M1 =...
// DO STUFF WITH M1
cv::ocl::oclMat logicalM1 = M1>1.55; // compare directly to a single value
cv::ocl::oclMat logicalM2 = ... ; // i.e. I want a 100x100 CV_32FC1 matrix with all elements set to be equal to 1.234567
By reading the documentation, it seems that cv::ocl::compare only works when both matrices have the same dimensions and type, so maybe my first request isn't feasible. On the other hand, I don't know how to initialize a specific matrix directly in ocl (with cv::Mat I know how it's done).
I assume an easy workaround exists, but haven't found one yet... Thanks!
You are right. It looks like cv::ocl::compare supports only two cv::oclMat inputs.
But you can create an oclMat filled with a specific value as follows:
cv::ocl::oclMat logicalM2(M1.size(), M1.type());
logicalM2.setTo(cv::Scalar(1.234567));
cv::ocl::oclMat logicalM1;
cv::ocl::compare(M1, logicalM2, logicalM1, cv::CMP_GT);
P.S. I also suggest trying the new OpenCV 3.0 with the Transparent API (T-API), which makes processing on the GPU using OpenCL much easier.
I am trying to do a 2D Real To Complex FFT using CUFFT.
I realize that doing this will give me W/2+1 complex values back per row (W being the "width" of my H*W matrix).
The question is: if I want to build out a full H*W version of this matrix after the transform, how do I go about copying values from the H*(W/2+1) result matrix back into a full-size matrix so that both halves and the DC value end up in the right place?
Thanks
I'm not familiar with CUDA, so take that into consideration when reading my response. I am familiar with FFTs and signal processing in general, though.
It sounds like you start out with an H (rows) x W (cols) matrix, and that you are doing a 2D FFT that essentially does an FFT on each row, and you end up with an H x (W/2+1) matrix. A W-wide FFT returns W values, but the CUDA function only returns W/2+1 because the spectrum of real data is conjugate-symmetric, so the negative-frequency data is redundant.
So, if you want to reproduce the missing W/2-1 points, simply mirror and conjugate the positive frequencies. For instance, if one of the rows is as follows:
Index   Data
0       12
1       5 + 2i
2       6
3       2 - 3i
...
The 0 index is your DC power, the 1 index is the lowest positive frequency bin, and so forth. You would thus make your closest-to-DC negative frequency bin the conjugate of bin 1, i.e. 5 - 2i, the next closest 6, and so on. Where you put those values in the array is up to you. I would do it the way MATLAB does it, with the negative frequency data after the positive frequency data.
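A NumPy illustration of that symmetry may help (this is not CUDA code; np.fft.rfft returns the same W/2+1 values per row that CUFFT's real-to-complex transform does):

import numpy as np

# 1-D illustration of the conjugate symmetry: the missing W/2-1 bins are the
# complex conjugates of the positive-frequency bins, in mirrored order.
W = 8
row = np.random.rand(W)

half = np.fft.rfft(row)                              # W/2 + 1 complex values, like CUFFT R2C
full = np.empty(W, dtype=complex)
full[:W // 2 + 1] = half                             # DC, positive frequencies, Nyquist
full[W // 2 + 1:] = np.conj(half[1:W // 2][::-1])    # mirrored, conjugated negative frequencies

print(np.allclose(full, np.fft.fft(row)))            # True

For the full 2D transform, the same idea applies, but the missing entry at (h, w) is the conjugate of the half-spectrum entry at ((H - h) % H, (W - w) % W), so the mirrored columns also swap rows accordingly.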
I hope that makes sense.
There are two ways this can be achieved. You will have to write your own kernel to achieve either of them.
1) You will need to take the conjugate of the (half) data you get to reconstruct the other half.
2) Since you want the full result anyway, it would be best to convert the input data from real to complex (by padding with a zero imaginary part) and perform a complex-to-complex transform.
In practice I have noticed that there is not much of a difference in speed either way.
I actually searched the NVIDIA forums and found a kernel that someone had written that did just what I was asking; that is what I used. If you search the CUDA forum for "redundant results fft" or similar, you will find it.