I'm trying to build a generalized batch normalization function in Tensorflow.
I learned about batch normalization from this article, which I found very helpful.
I have a problem with the dimensions of the scale and beta variables: in my case batch normalization is applied to the activations of each convolutional layer, so if the output of a convolutional layer is a tensor of size:
[57,57,96]
do scale and beta need to have the same dimensions as the convolutional layer output?
Here's my function. The program works, but I don't know if it is correct:
def batch_normalization_layer(batch):
    # Calculate batch mean and variance
    batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])
    # Apply the initial batch normalizing transform
    scale = tf.Variable(tf.ones([batch.get_shape()[1], batch.get_shape()[2], batch.get_shape()[3]]))
    beta = tf.Variable(tf.zeros([batch.get_shape()[1], batch.get_shape()[2], batch.get_shape()[3]]))
    normalized_batch = tf.nn.batch_normalization(batch, batch_mean, batch_var, beta, scale, 0.0001)
    return normalized_batch
From the documentation of tf.nn.batch_normalization:
mean, variance, offset and scale are all expected to be of one of two shapes:
In all generality, they can have the same number of dimensions as the input x, with identical sizes as x for the dimensions that are not normalized over (the 'depth' dimension(s)), and dimension 1 for the others which are being normalized over. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=True) during training, or running averages thereof during inference.
In the common case where the 'depth' dimension is the last dimension in the input tensor x, they may be one dimensional tensors of the same size as the 'depth' dimension. This is the case for example for the common [batch, depth] layout of fully-connected layers, and [batch, height, width, depth] for convolutions. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=False) during training, or running averages thereof during inference.
With your values (scale=1.0 and offset=0) you can also just provide the value None.
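Following the second case in the quoted documentation (where the 'depth' dimension is last), a per-channel variant of the function from the question could look like this; a minimal sketch, assuming the usual [batch, height, width, depth] layout:

import tensorflow as tf

def batch_normalization_layer(batch):
    # Normalize over batch, height and width; keep one scale/beta value per channel
    depth = batch.get_shape()[3]
    batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])
    scale = tf.Variable(tf.ones([depth]))
    beta = tf.Variable(tf.zeros([depth]))
    return tf.nn.batch_normalization(batch, batch_mean, batch_var, beta, scale, 0.0001)

For the [57, 57, 96] output in the question, scale and beta then have shape [96], matching the mean and variance returned by tf.nn.moments with axes=[0, 1, 2].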
The title pretty much sums it up; I'm trying to implement a GAN:
How can I create a tensor of batch size batch_size of uniformly distributed values between -1 and 1 with PyTorch?
def create_latent_batch_vectors(batch_size, latent_vector_size, device):
    '''
    The function creates a random batch of latent vectors with random values
    distributed uniformly between -1 and 1.
    Finally, it moves the tensor to the given ```device``` (cpu or gpu).
    The output should have a shape of [batch_size, latent_vector_size].
    '''
    # maybe torch.distributions.uniform.Uniform() somehow?
    return z.to(device)
Thanks!
Let us first define a uniform distribution with a low of -1 and a high of +1:
dist = torch.distributions.uniform.Uniform(-1,1)
sample_shape = torch.Size([2])
dist.sample(sample_shape)
>tensor([0.7628, 0.3497])
This is a tensor of shape [2] (the sample_shape). It doesn't have a batch_shape. Let's check:
dist.batch_shape
>torch.Size([])
Now let's use expand. It essentially creates a new distribution instance by expanding the batch_shape.
new_batch_shape = torch.Size([5]) # batch_shape of [5]
expanded_dist = dist.expand(new_batch_shape)
Check:
expanded_dist.batch_shape
>torch.Size([5])
Sampling now creates a tensor of shape [sample_shape, batch_shape]:
expanded_dist.sample(sample_shape)
>tensor([[0.1592, 0.3404, 0.3520, 0.3038, 0.0393],
[0.9368, 0.0108, 0.5836, 0.6156, 0.6704]])
The three types of shapes are defined as follows:
Sample shape describes independent, identically distributed draws from the distribution.
Batch shape describes independent, not identically distributed draws. Namely, we may have a set of (different) parameterizations to the same distribution. This enables the common use case in machine learning of a batch of examples, each modeled by its own distribution.
Event shape describes the shape of a single draw (event space) from the distribution; it may be dependent across dimensions.
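Tying this back to the original question, which asks for a [batch_size, latent_vector_size] tensor of values between -1 and 1, one possible sketch simply passes that shape as the sample_shape:

import torch

def create_latent_batch_vectors(batch_size, latent_vector_size, device):
    # Uniform(-1, 1) sampled with sample_shape [batch_size, latent_vector_size]
    # gives exactly the requested shape
    dist = torch.distributions.uniform.Uniform(-1.0, 1.0)
    z = dist.sample(torch.Size([batch_size, latent_vector_size]))
    return z.to(device)

z = create_latent_batch_vectors(4, 100, torch.device('cpu'))
print(z.shape)  # torch.Size([4, 100])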
I have some raw images to debayer and then apply colour corrections/transforms to. I use OpenCV and C++, and for the image sensor used the linear matrix coefficients are:
1.32 -0.46 0.14
-0.36 1.25 0.11
0.08 -1.96 1.88
I am not sure how to apply these to the image. It's not clear to me what I am supposed to do with them and why.
Can anyone explain what these colour reproduction or colour matrix values are, and how to use them to process an image?
Thank you!
Your question is not clear because it seems you also don't know what to do.
"what I am supposed to do with them"
The first thing that comes to my mind: you can convolve the image with that matrix using filter2D. According to the filter2D documentation:
Convolves an image with the kernel.
The function applies an arbitrary linear filter to an image. In-place
operation is supported. When the aperture is partially outside the
image, the function interpolates outlier pixel values according to the
specified border mode.
Here is an example code snippet showing how to use it:
Mat output;
Mat kernelMatrix = (Mat_<double>(3, 3) << 1.32, -0.46, 0.14,
                                          -0.36,  1.25, 0.11,
                                           0.08, -1.96, 1.88);
filter2D(rawImage, output, -1, kernelMatrix);
Before debayering you have an array B (-ayer) of MxN filtered "graylevel" values. They are physically filtered in the sense that the number of photons measured by each one of them is affected by the color filter on top of each sensor site.
After debayering you have an array C (-olor) of MxNx3 BGR values, obtained by (essentially) reindexing the B array. However, each of the 3 values at a (row, col) image location represents 3 physical measurements. This is not the final image because we still need to "convert" the physical measurements to numbers that are representative of color channels as perceived by a human (or, more generally, by the intended user, which could also be some kind of image processing software). That is, the physical values need to be mapped to a color space.
The 3x3 "color correction" matrix you have represents one possible mapping - a simple linear one. You need to apply it in turn to each BGR triple at all (row, col) pixel locations. For example (in python/numpy/cv2):
import numpy as np

def colorCorrect(img, M):
    """Applies a color correction M to a BGR image img"""
    rows, cols, depth = img.shape
    assert depth == 3
    assert M.shape == (3, 3)
    img_corr = np.zeros((rows, cols, 3), dtype=img.dtype)
    for r in range(rows):
        for c in range(cols):
            img_corr[r, c, :] = M.dot(img[r, c, :])
    return img_corr
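The per-pixel Python loop above is slow for large images; the same linear mapping can also be written vectorized. A sketch equivalent to colorCorrect, except that the result is left as float instead of being cast back to the input dtype:

import numpy as np

def colorCorrectVectorized(img, M):
    # Same per-pixel mapping as colorCorrect, applied to all pixels at once.
    rows, cols, depth = img.shape
    assert depth == 3 and M.shape == (3, 3)
    # Flatten to a (rows*cols, 3) list of BGR triples; M.dot(p) for each pixel p
    # equals p @ M.T, so one matrix product handles the whole image.
    corrected = img.reshape(-1, 3).astype(np.float64) @ M.T
    return corrected.reshape(rows, cols, 3)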
I have a Linear() layer in PyTorch after a few Conv() layers. All the images in my dataset are black and white. However, most of the images in my test set are of a different dimension than the images in my training set. Apart from resizing the images themselves, is there any way to define the Linear() layer in such a way that it takes a variable input dimension? For example, something similar to view(-1)?
Well, it doesn't make sense to have a Linear() layer with a variable input size, because in fact it is a learnable matrix of shape [n_in, n_out], and matrix multiplication is not defined for inputs whose feature dimension != n_in.
What you can do is apply pooling from the functional API. You'll need to specify kernel_size and stride such that the resulting output has feature dimension size = n_in, as in the sketch below.
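A minimal sketch of that idea, using adaptive average pooling (which chooses the kernel size and stride for you so the feature map entering Linear() always has the same size; the layer sizes here are made up for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNet(nn.Module):
    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)  # single-channel (black and white) input
        # Linear needs a fixed n_in, so every feature map is pooled down to 4x4 first
        self.fc = nn.Linear(16 * 4 * 4, n_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = F.adaptive_avg_pool2d(x, (4, 4))  # fixed 4x4 spatial size whatever the input resolution
        return self.fc(x.flatten(1))

net = ConvNet()
print(net(torch.rand(1, 1, 64, 64)).shape)   # torch.Size([1, 10])
print(net(torch.rand(1, 1, 100, 80)).shape)  # torch.Size([1, 10])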
I am trying to understand unpooling in Pytorch because I want to build a convolutional auto-encoder.
I have the following code:
import torch
import torch.nn as nn
from torch.autograd import Variable

data = Variable(torch.rand(1, 73, 480))
pool_t = nn.MaxPool2d(2, 2, return_indices=True)
unpool_t = nn.MaxUnpool2d(2, 2)

out, indices1 = pool_t(data)
out = unpool_t(out, indices1)
But I am constantly getting this error on the last line (unpooling).
IndexError: tuple index out of range
Although the data is simulated in this example, the input has to be of that shape because of the preprocessing that has to be done.
I am fairly new to convolutional networks, but I have even tried using a ReLU and a 2D convolutional layer before the pooling; however, the indices always seem to be incorrect when unpooling for this shape.
Your data is 1D and you are using 2D pooling and unpooling operations.
PyTorch interprets the first two dimensions of tensors as the "batch dimension" and the "channel"/"feature space" dimension. The remaining dimensions are treated as spatial dimensions.
So, in your example, data is a 3D tensor of size (1, 73, 480) and is interpreted by PyTorch as a single batch ("batch dimension" = 1) with 73 channels per sample and 480 samples.
For some reason MaxPool2d works for you and treats the channel dimension as a spatial dimension and samples over it as well - I'm not sure whether this is a bug or a feature.
If you do want to sample along the second dimension you can add an additional dimension, making data a 4D tensor:
out, indices1 = pool_t(data[None,...])
out = unpool_t(out, indices1, data[None, ...].size())
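Put together, a small sketch of the 4D version; the explicit output size is needed because 73 is odd, so the unpooled height cannot be inferred exactly:

import torch
import torch.nn as nn

data = torch.rand(1, 73, 480)
x = data[None, ...]                      # shape (1, 1, 73, 480): batch, channel, H, W

pool_t = nn.MaxPool2d(2, 2, return_indices=True)
unpool_t = nn.MaxUnpool2d(2, 2)

out, indices1 = pool_t(x)                # shape (1, 1, 36, 240)
out = unpool_t(out, indices1, x.size())  # back to (1, 1, 73, 480)
print(out.shape)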
Some details about my problem:
I'm trying to implement a corner detector in OpenCV (a different algorithm from the built-in ones: Canny, Harris, etc.).
I've got a matrix filled with response values: the bigger the response value, the higher the probability that a corner was detected.
My problem is that several corners are detected in the neighborhood of a point where there is really only one. I need to reduce the number of falsely detected corners.
Exact problem:
I need to slide a kernel over the matrix, find the maximum value inside each kernel window, keep that maximum, and set all the other values inside the window to zero.
Are there built-in OpenCV functions to do this?
This is how I would do it:
Create a kernel; it defines a pixel's neighbourhood.
Create a new image by dilating your image using this kernel. This dilated image contains the maximum neighbourhood value for every point.
Do an equality comparison between these two arrays. Wherever they are equal is a valid neighbourhood maximum, and is set to 255 in the comparison array.
Multiply the comparison array and the original array together (scaling appropriately).
This is your final array, containing only neighbourhood maxima.
This is illustrated by these zoomed-in images:
9 pixel by 9 pixel original image:
After processing with a 5 by 5 pixel kernel, only the local neighbourhood maxima remain (i.e. maxima separated by more than 2 pixels from a pixel with a greater value):
There is one caveat. If two nearby maxima have the same value then they will both be present in the final image.
Here is some Python code that does it (using the legacy cv API); it should be very easy to convert to C++:
import cv
im = cv.LoadImage('fish2.png',cv.CV_LOAD_IMAGE_GRAYSCALE)
maxed = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
comp = cv.CreateImage((im.width, im.height), cv.IPL_DEPTH_8U, 1)
#Create a 5*5 kernel anchored at 2,2
kernel = cv.CreateStructuringElementEx(5, 5, 2, 2, cv.CV_SHAPE_RECT)
cv.Dilate(im, maxed, element=kernel, iterations=1)
cv.Cmp(im, maxed, comp, cv.CV_CMP_EQ)
cv.Mul(im, comp, im, 1/255.0)
cv.ShowImage("local max only", im)
cv.WaitKey(0)
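For newer OpenCV versions the same steps translate to the cv2 API roughly as follows (a sketch, keeping the same hypothetical 'fish2.png' input):

import cv2

im = cv2.imread('fish2.png', cv2.IMREAD_GRAYSCALE)

# 5x5 rectangular kernel defining the neighbourhood
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))

# dilation puts the neighbourhood maximum at every pixel
maxed = cv2.dilate(im, kernel)

# 255 wherever a pixel equals its neighbourhood maximum, 0 elsewhere
mask = cv2.compare(im, maxed, cv2.CMP_EQ)

# keep only those local maxima, zero everything else
local_max_only = cv2.bitwise_and(im, im, mask=mask)

cv2.imshow("local max only", local_max_only)
cv2.waitKey(0)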
I didn't realise until now, but this is what @sansuiso suggested in their answer.
This is possibly better illustrated with this image, before:
after processing with a 5 by 5 kernel:
Solid regions are due to shared local maximum values.
I would suggest an original 2-step procedure (there may be more efficient approaches) that uses OpenCV built-in functions:
Step 1 : morphological dilation with a square kernel (corresponding to your neighborhood). This step gives you another image, after replacing each pixel value by the maximum value inside the kernel.
Step 2 : test if the cornerness value of each pixel of the original response image is equal to the max value given by the dilation step. If not, then obviously there exists a better corner in the neighborhood.
If you are looking for some built-in functionality, FilterEngine will help you make a custom filter (kernel).
http://docs.opencv.org/modules/imgproc/doc/filtering.html#filterengine
Also, I would recommend some kind of noise reduction, usually a blur, before all processing - unless you really want to work on the raw image.
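For example, a light pre-blur could look like this (a sketch using a Gaussian blur from cv2; 'input.png' and the kernel size are placeholder choices):

import cv2

im = cv2.imread('input.png', cv2.IMREAD_GRAYSCALE)
# small Gaussian blur to suppress noise before computing the corner response
smoothed = cv2.GaussianBlur(im, (3, 3), 0)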