Pass variable sized input to Linear layer in Pytorch - computer-vision

I have a Linear() layer in Pytorch after a few Conv() layers. All the images in my dataset are black and white. However most of the images in my test set are of a different dimension than the images in my training set. Apart from resizing the images themselves, is there any way to define the Linear() layer in such a way that it takes a variable input dimension? For example something similar to view(-1)

Well, it doesn't make sense to have a Linear() layer with a variable input size. Because in fact it's a learnable matrix of shape [n_in, n_out]. And matrix multiplication is not defined for inputs if theirs feature dimension != n_in
What you can do is to apply pooling from functional API. You'll need to specify kernel_size and stride such that resulting output will have feature dimension size = n_in.

Related

How to do this specific tensor transformation in Eigen?

I am looking for an idiomatic and efficient solution for this problem:
Let's say I have 3D Tensor where I want to represent an image with 100*100 pixels on 3 color channels,
Eigen::Tensor<int, 3> input(3,100,100);
The output I would like to get could be stored in
Eigen::Tensor<int, 4> output(3,3,100,100);
I would like to project the 3D input into the 4D output in a way that each color channel in the original tensor would have its own individual 3D tensor in the output, where each channel would contain the same values, that is
tensor(0,0,42,42) = tensor(0,1,42,42) = tensor(0,2,42,42)
tensor(0,0,12,12) = tensor(0,1,12,12) = tensor(0,2,12,12)
Illustrated on a picture:
Originally I wanted to solve this method:
Chip the individual color channels.
Broadcast the individual color channels into the size I need,
Reshape the broadcasted result into the desirable format(this is just a 3D Tensor at this point)
Concatenate the individual 3D Tensors into a big 4d one.
I have two problems with this approach.
Firstly, I just can not get the reshaping right, it always gives back a reshaped tensor with the dimensionality I want, but the coefficients get shuffled. I started to experiment with the layout of the Tensors, but it did not seem to help.
Secondly, this seems to be very tedious, I just feel like there should be a more convenient way to achieve this but I could not find any cue about that in the documentation.

Sum elements in a channel in caffe

If I have a 4-D blob, say of size (40,1024,300,1) and I want to average pool across the second channel and generate an output of size (40,1,300,1), how would I do it? I think the reduction layer collapses the whole blob and generates a blob of size (40) by summing elements in all other axises (after 1) also. Is there any work around for this without re-implementing a new layer?
The only easy workaround I found is as follows. Permute your blob to a shape (40,300,1,1024). Use reduction layer to compute the mean with axis = -1 and operation = MEAN. I think the blob will be of shape (40,300,1). You may need to use reshape to append an extra dimension at the end (check if this is needed) and then permute back to shape (40,1,300,1).
You can find an implementation of a Permute layer here or here. I hope this helps.

dimensions in batch normalization

I'm trying to build a generalized batch normalization function in Tensorflow.
I learn batch normalization in this article that i found very kind.
I have a problem with the dimensions of the scale and beta variables: In my case batch normalization is applied to each activations of each convolutional layer, thus if i have as output of the convolutional layer a tersor with size:
[57,57,96]
i need that scale and beta have same dimension as the convolutional layer output, correct?
here's my function, the program works but i don't know if is correct
def batch_normalization_layer(batch):
# Calculate batch mean and variance
batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])
# Apply the initial batch normalizing transform
scale = tf.Variable(tf.ones([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))
beta = tf.Variable(tf.zeros([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))
normalized_batch = tf.nn.batch_normalization(batch, batch_mean, batch_var, beta, scale, 0.0001)
return normalized_batch
from the documentation of tf.nn.batch_normalization:
mean, variance, offset and scale are all expected to be of one of two
shapes:
In all generality, they can have the same number of dimensions as the
input x, with identical sizes as x for the dimensions that are not
normalized over (the 'depth' dimension(s)), and dimension 1 for the
others which are being normalized over. mean and variance in this case
would typically be the outputs of tf.nn.moments(..., keep_dims=True)
during training, or running averages thereof during inference.
In the
common case where the 'depth' dimension is the last dimension in the
input tensor x, they may be one dimensional tensors of the same size
as the 'depth' dimension. This is the case for example for the common
[batch, depth] layout of fully-connected layers, and [batch, height,
width, depth] for convolutions. mean and variance in this case would
typically be the outputs of tf.nn.moments(..., keep_dims=False) during
training, or running averages thereof during inference.
With your values (scale=1.0 and offset=0) you can also just provide the value None.

Armadillo porting imagesc to save image bitmap from matrix

I have this matlab code to display image object after do super spectrogram (stft, couple plca...)
t = z2 *stft_options.hop/stft_options.sr;
f = stft_options.sr*[0:size(spec_t,1)-1]/stft_options.N/1000;
max_val = max(max(db(abs(spec_t))));
imagesc(t, f, db(abs(spec_t)),[max_val-60 max_val]);
And get this result:
I was porting to C++ successfully by using Armadillo lib and get the mat results:
mat f,t,spec_t;
The problem is that I don't have any idea for converting bitmap like imagesc in matlab.
I searched and found this answer, but seems it doesn't work in my case because:
I use a double matrix instead of integer matrix, which can't be mark as bitmap color
The imagesc method take 4 parameters, which has the bounds with vectors x and y
The imagesc method also support scale ( I actually don't know how it work)
Does anyone have any suggestion?
Update: Here is the result of save method in Armadillo. It doesn't look like spectrogram image above. Do I miss something?
spec_t.save("spec_t.png", pgm_binary);
Update 2: save spectrogram with db and abs
mat spec_t_mag = db(abs(spec_t)); // where db method: m = 10 * log10(m);
mag_spec_t.save("mag_spec_t.png", pgm_binary);
And the result:
Armadillo is a linear algebra package, AFAIK it does not provide graphics routines. If you use something like opencv for those then it is really simple.
See this link about opencv's imshow(), and this link on how to use it in a program.
Note that opencv (like most other libraries) uses row-major indexing (x,y) and Armadillo uses column-major (row,column) indexing, as explained here.
For scaling, it's safest to convert to unsigned char yourself. In Armadillo that would be something like:
arma::Mat<unsigned char> mat2=255*(mat-mat.min())/(mat.max()-mat.min());
The t and f variables are for setting the axes, they are not part of the bitmap.
For just writing an image you can use Armadillo. Here is a description on how to write portable grey map (PGM) and portable pixel map (PPM) images. PGM export is only possible for 2D matrices, PPM export only for 3D matrices, where the 3rd dimension (size 3) are the channels for red, green and blue.
The reason your matlab figure looks prettier is because it has a colour map: a mapping of every value 0..255 to a vector [R, G, B] specifying the relative intensity of red, green and blue. A photo has an RGB value at every point:
colormap(gray);
x=imread('onion.png');
imagesc(x);
size(x)
That's the 3rd dimension of the image.
Your matrix is a 2d image, so the most natural way to show it is as grey levels (as happened for your spectrum).
x=mean(x,3);
imagesc(x);
This means that the R, G and B intensities jointly increase with the values in mat. You can put a colour map of different R,G,B combinations in a variable and use that instead, i.e. y=colormap('hot');colormap(y);. The variable y shows the R,G,B combinations for the (rescaled) image values.
It's also possible to make your own colour map (in matlab you can specify 64 R, G, and B combinations with values between 0 and 1):
z[63:-1:0; 1:2:63 63:-2:0; 0:63]'/63
colormap(z);
Now for increasing image values, red intensities decrease (starting from the maximum level), green intensities quickly increase then decrease, and blue values increase from minuimum to maximum.
Because PPM appears (I don't know the format) not to support colour maps, you need to specify the R,G,B values in a 3D array. For a colour order similar to z you would neet to make a Cube<unsigned char> c(ysize, xsize, 3) and then for every pixel y, x in mat2, do:
c(y,x,0) = 255-mat2(y,x);
c(y,x,1) = 255-abs(255-2*mat2(y,x));
x(y,x,2) = mat2(y,x)
or something very similar.
You may use SigPack, a signal processing library on top of Armadillo. It has spectrogram support and you may save the plot to a lot of different formats (png, ps, eps, tex, pdf, svg, emf, gif). SigPack uses Gnuplot for the plotting.

How do images work in opencl kernel?

I'm trying to find ways to copy multidimensional arrays from host to device in opencl and thought an approach was to use an image... which can be 1, 2, or 3 dimensional objects. However I'm confused because when reading a pixle from an array, they are using vector datatypes. Normally I would think double pointer, but it doesn't sound like that is what is meant by vector datatypes. Anyway here are my questions:
1) What is actually meant to vector datatype, why wouldn't we just specify 2 or 3 indices when denoting pixel coordinates? It looks like a single value such as float2 is being used to denote coordinates, but that makes no sense to me. I'm looking at the function read_imageui and read_image.
2) Can the input image just be a subset of the entire image and sampler be the subset of the input image? I don't understand how the coordinates are actually specified here either since read_image() only seams to take a single value for input and a single value for sampler.
3) If doing linear algebra, should I just bite the bullet and translate 1-D array data from the buffer into multi-dim arrays in opencl?
4) I'm still interested in images, so even if what I want to do is not best for images, could you still explain questions 1 and 2?
Thanks!
EDIT
I wanted to refine my question and ask, in the following khronos documentation they define...
int4 read_imagei (
image2d_t image,
sampler_t sampler,
int2 coord)
But nowhere can I find what image2d_t's definition or structure is supposed to be. The samething for sampler_t and int2 coord. They seem like structs to me or pointers to structs since opencl is supposed to be based on ansi c, but what are the fields of these structs or how do I note the coord with what looks like a scala?! I've seen the notation (int2)(x,y), but that's not ansi c, that looks like scala, haha. Things seem conflicting to me. Thanks again!
In general you can read from images in three different ways:
direct pixel access, no sampling
sampling, normalized coordinates
sampling, integer coordinates
The first one is what you want, that is, you pass integer pixel coordinates like (10, 43) and it will return the contents of the image at that point, with no filtering whatsoever, as if it were a memory buffer. You can use the read_image*() family of functions which take no sampler_t param.
The second one is what most people want from images, you specify normalized image coords between 0 and 1, and the return value is the interpolated image color at the specified point (so if your coordinates specify a point in between pixels, the color is interpolated based on surrounding pixel colors). The interpolation, and the way out-of-bounds coordinates are handled, are defined by the configuration of the sampler_t parameter you pass to the function.
The third one is the same as the second one, except the texture coordinates are not normalized, and the sampler needs to be configured accordingly. In some sense the third way is closer to the first, and the only additional feature it provides is the ability to handle out-of-bounds pixel coordinates (for instance, by wrapping or clamping them) instead of you doing it manually.
Finally, the different versions of each function, e.g. read_imagef, read_imagei, read_imageui are to be used depending on the pixel format of your image. If it contains floats (in each channel), use read_imagef, if it contains signed integers (in each channel), use read_imagei, etc...
Writing to an image on the other hand is straightforward, there are write_image{f,i,ui}() functions that take an image object, integer pixel coordinates and a pixel color, all very easy.
Note that you cannot read and write to the same image in the same kernel! (I don't know if recent OpenCL versions have changed that). In general I would recommend using a buffer if you are not going to be using images as actual images (i.e. input textures that you sample or output textures that you write to only once at the end of your kernel).
About the image2d_t, sampler_t types, they are OpenCL "pseudo-objects" that you can pass into a kernel from C (they are reserved types). You send your image or your sampler from the C side into clSetKernelArg, and the kernel gets back a sampler_t or an image2d_t in the kernel's parameter list (just like you pass in a buffer object and it gets a pointer). The objects themselves cannot be meaningfully manipulated inside the kernel, they are just handles that you can send into the read_image/write_image functions, along with a few others.
As for the "actual" low-level difference between images and buffers, GPU's often have specially reserved texture memory that is highly optimized for "read often, write once" access patterns, with special texture sampling hardware and texture caches to optimize scatter reads, mipmaps, etc..
On the CPU there is probably no underlying difference between an image and a buffer, and your runtime likely implements both as memory arrays while enforcing image semantics.