I have a mask with a certain number of rows and columns.
I would like to read just the (i,j) components that contain, for example, 1.
Then, once an (i,j) component equal to 1 is found, I want to perform four different actions, one for each of the neighbours (i+1,j), (i-1,j), (i,j+1), (i,j-1) that is also equal to 1.
I hope this is clear.
Any advice is welcome.
Thanks.
Let your mask be something like this:
integer, dimension(1:r,1:c) :: mask
Then the following fragment should point you in the right direction:
forall (i=1:r, j=1:c, mask(i,j)==1)
   if (mask(i-1,j)==1) then
      ! do the right thing
   else if (mask(i+1,j)==1) then
      ! I hope you get the picture now
   ...
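I'll leave it to you to deal with the edge cases where i-1, i+1, j-1 or j+1 steps outside the array bounds. Strictly speaking, a forall body may contain only assignments (plus where and nested forall), so treat the fragment as a sketch and fall back to nested do loops if your compiler rejects the if construct. forall is also not necessarily the fastest construct to use, and we could debate the elegance and readability of using it rather than a sequence of loops, but let's not.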
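For comparison, here is a minimal sketch of the same scan with the bounds checks written out. It is in Python rather than Fortran, uses 0-based indices, and the function names and direction labels are made up for illustration; it only reports which neighbours are set, so the four actions themselves are left to you.

```python
# Hypothetical helpers: names and direction labels are illustrative only.
def neighbours_set(mask, i, j):
    """Return which of the four neighbours of (i, j) equal 1, skipping out-of-range ones."""
    rows, cols = len(mask), len(mask[0])
    offsets = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    found = []
    for name, (di, dj) in offsets.items():
        ni, nj = i + di, j + dj
        # The range test runs first, so mask is never indexed out of bounds.
        if 0 <= ni < rows and 0 <= nj < cols and mask[ni][nj] == 1:
            found.append(name)
    return found


def scan(mask):
    """Map every cell that holds 1 to the list of its neighbours that also hold 1."""
    return {
        (i, j): neighbours_set(mask, i, j)
        for i, row in enumerate(mask)
        for j, value in enumerate(row)
        if value == 1
    }


mask = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
result = scan(mask)
```

Unlike an else-if chain, this reports every matching neighbour, so more than one action can fire for the same cell.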
From my understanding of fft functions (eg from questions like this one)
Assuming a 1D FFT, given N points of real data, I'll get a double-sided FFT of length N (but complex) + 1 for a zeroth frequency. If I take that same FFT output and run an IFFT on it, I'll get N real values, and in the ideal case these will exactly match the original input to the FFT.
In cufft, this appears to be much different.
According to Nvidia, giving N real components will result in N/2 + 1 complex components for an FFT, and N/2 + 1 complex components will result in N real components.
see here (R = real, C = complex, 2 = to):
Note that I recognize that half of the complex components are essentially duplicated (conjugated and reversed) and are thus not necessary for the input and output values to retain all the data needed for reconstruction. But that doesn't explain how Nvidia claims the input and output data lengths of the FFT should be structured: the cuFFT input and output lengths seem to be the opposite of what I would have expected from accounting for this.
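To make the lengths concrete, here is a small sketch using a naive DFT written with Python's standard library, not cuFFT itself. The function names are hypothetical. It shows N real samples giving N complex bins, of which only the first N/2 + 1 are independent, and those N/2 + 1 bins being enough to get the N real samples back.

```python
import cmath


def dft(x):
    """Naive forward DFT: N samples in, N complex bins out (bin 0 is the zero frequency)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def half_to_real(half, n):
    """Rebuild n real samples from the n//2 + 1 non-redundant bins."""
    # The dropped bins are conjugates of kept ones: X[k] = conj(X[n - k]).
    full = list(half) + [half[n - k].conjugate() for k in range(n // 2 + 1, n)]
    return [
        (sum(full[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]  # N = 8 real samples
n = len(signal)
spectrum = dft(signal)           # N complex bins
half = spectrum[: n // 2 + 1]    # the N/2 + 1 bins a real-to-complex transform keeps
restored = half_to_real(half, n)
```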
What you're looking at here is your browser not being able to properly render MathML content. The same table rendered in Firefox 66.0.2 seems to show what you'd expect:
I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying the learning process.
Here's my plan: to make a 2-D NN recognize images of digits. The images are 5 by 3 grids and I prepared 10 of them, from zero to nine. For example, this would be the number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s, it doesn't matter) and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros or ones only (nothing in between; which indexes are set depends on which image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit was recognized, one neuron will fire a value of one and the rest should be zeros (they shouldn't fire).
I am done implementing everything, but I have a problem with the computation and I would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is there. However, to break down my work I would like to hear some comments first: I want to know whether my design is reasonable.
Does my plan make sense?
All the posts talk about ranges (0 -> 1, -1 -> +1, 0.01 -> 0.5, etc.). Will it work with outputs that are exactly 0 or 1 on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent as my transfer function. Does it make a difference whether I use this, a sigmoid, or some other function?
Any ideas, comments or guidance are appreciated. Thanks in advance.
Well, from the description given above, I think that the design and approach taken are correct! With respect to the choice of the activation function, remember that these functions help to pick out the neurons with the largest activation, and that their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention refer to scaling the input: it is better to have your input images in the range 0 to 1. This scales the error surface and helps with the speed and convergence of the optimization process. When the input set is composed of ordinary images, each made of pixels, the minimum and maximum values a pixel can attain are 0 and 255, respectively, so to scale such input you divide each value by 255. (Your grids already contain only 0s and 1s, so they are already in that range.)
Now, with respect to the training problems: have you checked that your gradient calculation routine is correct, for example by comparing it against a numerical estimate obtained from the cost function J? If not, try building a toy vector theta that contains all the weight matrices involved in your neural network, and estimate the gradient at each component from the definition of the derivative. Sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
numgrad = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient (central difference)
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After running this, compare the numerical gradient with the gradient calculated by backpropagation. If the difference between the two is less than 3e-9, your implementation is most likely correct.
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory. There you can find a lot of information about neural networks and their paradigms; they are worth a look!
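If Matlab is not at hand, the same check can be sketched in plain Python. The cost function J below is a toy quadratic invented for the example (a real J would be your network's cost), and analytic_gradient stands in for whatever your backpropagation routine returns.

```python
def numerical_gradient(J, theta, eps=1e-4):
    """Central-difference estimate of dJ/dtheta, one component at a time."""
    grad = []
    for p in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[p] += eps
        minus[p] -= eps
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad


# Toy cost, an assumption made only for this example.
def J(theta):
    return sum(t * t for t in theta) + theta[0] * theta[1]


# Hand-derived gradient of the toy cost; stands in for the backpropagation output.
def analytic_gradient(theta):
    return [2 * theta[0] + theta[1], 2 * theta[1] + theta[0], 2 * theta[2]]


theta = [0.5, -1.5, 2.0]
numgrad = numerical_gradient(J, theta)
backprop_grad = analytic_gradient(theta)
max_diff = max(abs(a - b) for a, b in zip(numgrad, backprop_grad))
```

A max_diff that stays large no matter how small eps gets usually points at a bug in the analytic gradient rather than in the check.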
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/
I've Googled and found zero answers for "safety wall", so I'm pretty sure that's not the correct term. I'll explain myself:
From what I've read, I'm talking about taking a two-dimensional array and placing it inside a larger array with one extra cell added on each side, to make sure I stay safe and don't go outside the limits I've created.
What is the right term for this technique and how would I use it?
As others have said, you need to search for "sentinel" or something like "sentinel control".
You can use sentinel control when you don't know the size or limits of your input in advance. For example, you are writing a program that calculates the average grade of a class, but you don't know how many students are in the class. Or you are filling an array whose limits you don't know. Then you can use sentinel control for the job.
Let's look at this example:
int grade;
int totalgrade = 0;
int studentCount = 0;
std::cin >> grade;
while (grade != -1)
{
    totalgrade = totalgrade + grade;
    studentCount++;
    std::cin >> grade;
} // loop until the user enters -1
So if you don't know how many values the user will enter, you can use sentinel control for the job. You can also read more about sentinel values.
These are usually referred to as "ghost cells", and are often used in numerical simulations or image processing where you are applying a kernel (such as a smoothing or difference operator) to an array. They allow you to apply the kernel without special-casing the edges.
For example; suppose you want to smooth out an image - you could use a kernel like:
0.0 0.1 0.0
0.1 0.6 0.1
0.0 0.1 0.0
You apply this by taking the source image and, for every pixel, computing the value of the destination pixel by centering the kernel on the source pixel and adding up the weighted contributions of the 9 covered pixels (0.6 times the value of the source pixel, plus 0.1 times the value of each of the pixels above, below, and to the sides). Do this for every pixel and you'll end up with a smoothed version of your original image.
This works well, but the question is "what do you do at the border cells?" Rather than having complicated if/then logic for the border cases (which can be tricky and can degrade performance), you can just add 1 layer of ghost cells to each side.
Of course, you have to pick values for the ghost cells before you run your algorithm. How you pick their values depends on your algorithm. You might choose to set them all to zero, but in the case of the smoothing kernel this will darken your image at its borders, so that's probably not what you want. A better plan would be to fill the ghost cells with the value of the nearest non-ghost cell.
You also need to figure out how many ghost cells you need, which depends on the size of your kernel. For a 3x3 kernel like above, you need 1 layer of ghost cells (to take care of the part of the kernel that might "hang off" the edge). More complicated kernels might require more (a 5x5 kernel would require 2 layers, etc).
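Here is a minimal sketch of the idea in plain Python, assuming the image is a list of rows of floats. The function names are made up for the example. The ghost cells are filled with the nearest real cell, and the 3x3 kernel above is then applied with no edge-specific branches.

```python
def pad_with_ghost_cells(image, layers=1):
    """Surround image with `layers` rings of ghost cells copied from the nearest real cell."""
    rows, cols = len(image), len(image[0])

    def clamp(v, hi):
        return max(0, min(v, hi))

    return [
        [image[clamp(i, rows - 1)][clamp(j, cols - 1)]
         for j in range(-layers, cols + layers)]
        for i in range(-layers, rows + layers)
    ]


KERNEL = [
    [0.0, 0.1, 0.0],
    [0.1, 0.6, 0.1],
    [0.0, 0.1, 0.0],
]


def smooth(image):
    """Apply the 3x3 kernel to every pixel; the ghost ring removes all edge special cases."""
    padded = pad_with_ghost_cells(image, layers=1)
    rows, cols = len(image), len(image[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # Pixel (i, j) of the image sits at (i + 1, j + 1) in the padded array,
            # so the kernel window starts at padded[i][j].
            total = 0.0
            for ki in range(3):
                for kj in range(3):
                    total += KERNEL[ki][kj] * padded[i + ki][j + kj]
            out_row.append(total)
        out.append(out_row)
    return out


small = pad_with_ghost_cells([[1, 2], [3, 4]])
flat = [[10.0] * 4 for _ in range(3)]
smoothed = smooth(flat)
```

Because the ghost cells copy their nearest neighbour, a uniformly coloured image stays uniform after smoothing instead of darkening at the border.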
You can google "ghost cell computation" to find out more (add 'computation' or you'll get a lot of biology results!)
hell-o guys!
Well, I'm playing with random walks. Midpoint displacement gives some nice results, but I would like a random walk without walk loops, like the ones (in yellow) on this screenshot:
My first idea for dealing with that problem is to check each segment for an intersection with all the other segments, then delete the walk loop between the two segments and join the walk at the intersection point. But for some walks it would give a strange result, like this one:
where the yellow part is a loop, and we can see that a big part of the walk would be deleted if I did what I said.
Maybe another method would be to check, when the displacement of the midpoint is made, whether the segments intersect. If there is an intersection, draw another displacement. But it looks like it becomes very time consuming quickly as the number of subdivisions rises...
So I would like to know if there is a way to avoid these loops.
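For reference, the intersection check described above can be sketched with the usual orientation test. This is a plain Python illustration with hypothetical function names, points as (x, z) tuples, and a brute-force pair loop; it ignores touching and collinear cases.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def first_loop(path):
    """Return (i, j) for the first pair of non-adjacent segments that cross, else None."""
    n = len(path) - 1  # number of segments; segment k joins path[k] and path[k + 1]
    for i in range(n):
        for j in range(i + 2, n):
            if segments_cross(path[i], path[i + 1], path[j], path[j + 1]):
                return i, j
    return None


looping = [(0, 0), (2, 2), (2, 0), (0, 2)]
straight = [(0, 0), (1, 0), (2, 1), (3, 1)]
loop_found = first_loop(looping)
no_loop = first_loop(straight)
```

The pair loop is quadratic in the number of segments, which is the cost that grows quickly with the number of subdivisions.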
So... it seems that playing with the amplitudes of the random numbers is a good way to avoid overlaps:
The path without displacement is drawn in cyan. I didn't get overlaps with these displacements:
do {
    dx = (D>0) ? 0.5*sqrt((double)(rand()%D)) - sqrt((double)D)/2. : 0;
    dz = (D>0) ? 0.5*sqrt((double)(rand()%D)) - sqrt((double)D)/2. : 0;
} while (dx*dx + dz*dz > D);
where D is the squared distance between the two neighbours of the point we want to displace. The (D>0)? test is needed because rand()%D with D equal to 0 would raise a floating point exception.
I am trying to do a 2D Real To Complex FFT using CUFFT.
I realize that I will do this and get W/2+1 complex values back (W being the "width" of my H*W matrix).
The question is: what if I want to build out a full H*W version of this matrix after the transform? How do I go about copying values from the H*(W/2+1) result matrix back into a full-size matrix so that both halves and the DC value end up in the right place?
Thanks
I'm not familiar with CUDA, so take that into consideration when reading my response. I am familiar with FFTs and signal processing in general, though.
It sounds like you start out with an H (rows) x W (cols) matrix, and that you are doing a 2D FFT that essentially does an FFT on each row, and you end up with an H x (W/2+1) matrix. A W-wide FFT returns W values, but the CUDA function only returns W/2+1 because the spectrum of real data is conjugate-symmetric in the frequency domain, so the negative frequency data is redundant.
So, if you want to reproduce the missing W/2-1 points, simply mirror the positive frequencies and take their complex conjugates. For instance, if one of the rows is as follows:
Index Data
0 12 + i
1 5 + 2i
2 6
3 2 - 3i
...
The 0 index is your DC power, the 1 index is the lowest positive frequency bin, and so forth. You would thus make your closest-to-DC negative frequency bin 5 - 2i (the conjugate of bin 1), the next closest 6, and so on. Where you put those values in the array is up to you. I would do it the way Matlab does it, with the negative frequency data after the positive frequency data. Note that in a true 2D transform the row index is mirrored as well: X(h, w) = conj(X((H-h) mod H, (W-w) mod W)).
I hope that makes sense.
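As an illustration of the mirroring for the 2D case, here is a sketch in plain Python rather than a CUDA kernel. It assumes the half spectrum is stored as H rows of W//2 + 1 complex values in the usual order (DC first, negative frequencies last); the function names are hypothetical and the naive 2D DFT is only there to check the result.

```python
import cmath


def dft2(x):
    """Naive 2D DFT of an H x W real matrix (for checking only; far too slow for real use)."""
    H, W = len(x), len(x[0])
    return [
        [sum(x[r][c] * cmath.exp(-2j * cmath.pi * (u * r / H + v * c / W))
             for r in range(H) for c in range(W))
         for v in range(W)]
        for u in range(H)
    ]


def expand_half_spectrum(half, W):
    """Rebuild the full H x W spectrum from the H x (W//2 + 1) half of a real-input transform."""
    H = len(half)
    full = [row[:] + [0j] * (W - len(row)) for row in half]
    for u in range(H):
        for v in range(W // 2 + 1, W):
            # Hermitian symmetry of a real input: X[u][v] = conj(X[(H - u) % H][W - v])
            full[u][v] = half[(H - u) % H][W - v].conjugate()
    return full


data = [[1.0, 2.0, 3.0, 4.0],
        [0.0, -1.0, 2.0, 5.0],
        [3.0, 3.0, -2.0, 1.0]]
H, W = 3, 4
reference = dft2(data)                          # full H x W spectrum
half = [row[: W // 2 + 1] for row in reference]  # what a real-to-complex transform keeps
rebuilt = expand_half_spectrum(half, W)
max_err = max(abs(rebuilt[u][v] - reference[u][v]) for u in range(H) for v in range(W))
```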
There are two ways this can be achieved. You will have to write your own kernel for either of them.
1) Take the complex conjugate of the (half) data you get back to find the other half.
2) Since you want the full results anyway, it may be best to convert the input data from real to complex (by padding with a zero imaginary part) and perform a complex-to-complex transform.
From practice I have noticed that there is not much of a difference in speed either way.
I actually searched the Nvidia forums and found a kernel that someone had written that did just what I was asking. That is what I used. If you search the CUDA forum for "redundant results fft" or similar you will find it.