I am trying to understand unpooling in PyTorch because I want to build a convolutional auto-encoder.
I have the following code:
import torch
import torch.nn as nn
from torch.autograd import Variable
data = Variable(torch.rand(1, 73, 480))
pool_t = nn.MaxPool2d(2, 2, return_indices=True)
unpool_t = nn.MaxUnpool2d(2, 2)
out, indices1 = pool_t(data)
out = unpool_t(out, indices1)
But I am constantly getting this error on the last line (unpooling).
IndexError: tuple index out of range
Although the data is simulated in this example, the input has to be of that shape because of the preprocessing that has to be done.
I am fairly new to convolutional networks, but I have even tried using a ReLU and a 2D convolutional layer before the pooling; however, the indices always seem to be incorrect when unpooling for this shape.

Your data is 1D and you are using 2D pooling and unpooling operations.
PyTorch interprets the first two dimensions of a tensor as the "batch" dimension and the "channel"/"feature space" dimension. The remaining dimensions are treated as spatial dimensions.
So, in your example, data is a 3D tensor of size (1, 73, 480) and is interpreted by PyTorch as a single batch ("batch dimension" = 1) with 73 channels per sample and 480 samples.
For some reason MaxPool2d works for you, treating the channel dimension as a spatial dimension and downsampling it as well; I'm not sure whether this is a bug or a feature.
If you do want to downsample along the second dimension, you can add an additional dimension, making data a 4D tensor:
out, indices1 = pool_t(data[None,...])
out = unpool_t(out, indices1, data[None,...].size())
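As a self-contained illustration of the fix above, here is a minimal sketch assuming a recent PyTorch release. The names data_4d, pooled and restored are made up for this example. Because 73 is odd, the pooled height is 36 and unpooling would default to 72 rows, so the original size is passed as output_size to recover the (1, 1, 73, 480) shape.

```python
import torch
import torch.nn as nn

# Same layers as in the question.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

data = torch.rand(1, 73, 480)

# Add a leading dimension so the layout is (batch, channel, height, width).
data_4d = data[None, ...]          # shape (1, 1, 73, 480)

pooled, indices = pool(data_4d)    # shape (1, 1, 36, 240)

# Pass the original size so the odd height (73) is restored instead of 72.
restored = unpool(pooled, indices, output_size=data_4d.size())

print(pooled.shape, restored.shape)
```

Positions that were not a window maximum are filled with zeros in restored, so it is not a copy of data; only the shape and the maxima are recovered.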


Optimal way to append to numpy array when dealing with large dimensions

I am working with a JSON file that consists of approximately 17,000 3x1 arrays denoting coordinates.
Currently, I have an image of 1024x1024 dimensions (which I have flattened), and I am using np.hstack to add the 3x1 array to that image. This gives me a 1d array of dimension 1048579x1.
My objective is to create a final array of dimension 1048579x17,000.
Unfortunately, list.append and np.append are not working in this case because they consume too much memory. I tried running this on Colab Pro, but the memory consumption is too high, which causes the session to crash.
My current code is as follows
#Here data consists of 17,000 entries each of which is a 3x1 list
with open('data.json') as f:
json1_str =
for i in range(len(json1_data)):
local_coordinates.append(new_arr) #
Is there an optimal way to stack all the 1d arrays of length 1048579 to create the final matrix, which can be used for training purposes?

Pass variable sized input to Linear layer in Pytorch

I have a Linear() layer in Pytorch after a few Conv() layers. All the images in my dataset are black and white. However most of the images in my test set are of a different dimension than the images in my training set. Apart from resizing the images themselves, is there any way to define the Linear() layer in such a way that it takes a variable input dimension? For example something similar to view(-1)
Well, it doesn't make sense to have a Linear() layer with a variable input size, because it is in fact a learnable matrix of shape [n_in, n_out], and matrix multiplication is not defined for inputs whose feature dimension != n_in.
What you can do is apply pooling from the functional API. You'll need to specify kernel_size and stride such that the resulting output has feature dimension size = n_in.
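One way to sketch this idea is with adaptive pooling, which is a swapped-in convenience rather than what the answer literally describes: F.adaptive_max_pool2d picks the kernel size and stride internally so that the output always has the requested spatial size. The SmallNet class, its layer sizes and the 4x4 target are all hypothetical choices for this example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Hypothetical network: one conv layer, adaptive pooling, one linear layer."""

    def __init__(self, n_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        # n_in is fixed at 8 channels * 4 * 4 pooled positions.
        self.fc = nn.Linear(8 * 4 * 4, n_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        # Pool every feature map down to 4x4, whatever the input size was.
        x = F.adaptive_max_pool2d(x, output_size=(4, 4))
        x = x.flatten(start_dim=1)
        return self.fc(x)

net = SmallNet()
out_small = net(torch.rand(2, 1, 28, 28))   # training-sized images
out_large = net(torch.rand(2, 1, 40, 52))   # differently sized images
print(out_small.shape, out_large.shape)
```

Both calls go through the same Linear() layer because the pooled feature dimension is the same in each case.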

SSIM for 3D image volume

I'm working on an image super-resolution problem (both 2D and 3D) using TensorFlow and am using SSIM as one of the eval_metrics.
I'm using image.ssim from TF and measure.compare_ssim from skimage. Both of them give the same results for 2D, but there's always a difference in results for 3D volumes.
I've looked into the source code for both the TF implementation and the skimage implementation. There seem to be some fundamental differences in how the input images are considered and handled in the two implementations.
Code to replicate the issue:
import numpy as np
import tensorflow as tf
from skimage import measure
# For 2-D case
a = np.random.random([32, 32, 64])
b = np.random.random([32, 32, 64])
a_ = tf.convert_to_tensor(a)
b_ = tf.convert_to_tensor(b)
ssim_2d_tf = tf.image.ssim(a_, b_, 1.0)
ssim_2d_sk = measure.compare_ssim(a, b, multichannel=True, gaussian_weights=True, data_range=1.0, use_sample_covariance=False)
print (tf.Session().run(ssim_2d_tf), ssim_2d_sk)
# For 3-D case
a = np.random.random([32, 32, 32, 64])
b = np.random.random([32, 32, 32, 64])
a_ = tf.convert_to_tensor(a)
b_ = tf.convert_to_tensor(b)
ssim_3d_tf = tf.image.ssim(a_, b_, 1.0)
ssim_3d_sk = measure.compare_ssim(a, b, multichannel=True, gaussian_weights=True, data_range=1.0, use_sample_covariance=False)
s_3d_tf = tf.Session().run(ssim_3d_tf)
print (np.mean(s_3d_tf), ssim_3d_sk)
I have to take the mean of the output in case of 3D, as Tensorflow computes SSIM over last three dimensions, and hence results in 32 SSIM values. This suggests that TF considers images for SSIM in NHWC format. Is this good for SSIM over 3D volumes?
skimage however, seems to be using 1D Gaussian filters. So clearly even this is not considering depth in 3D volumes.
Can someone shed some light on this and help me decide which one to use going forward, and why?
From a cursory look at the code, it seems that TensorFlow always computes a 2D SSIM, for each image in the batch and for each channel. It averages SSIM values across channels, and returns a value for each image in the batch. For TF, a 4D array is a collection of 2D images with multiple channels.
In contrast, SciKit-Image computes SSIM over all dimensions, except the last one if multichannel is set. So in the case of a 4D array, it computes a 3D SSIM for each channel and averages across channels.
This is consistent with your finding of similar results for a 3D array, but different results for a 4D array.
"skimage however, seems to be using 1D Gaussian filters."
I’m not sure where you got this from. SciKit-Image uses an nD Gaussian in the case of an nD image. However, a Gaussian is a separable filter, meaning it can be efficiently implemented by n applications of a 1D filter.
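The separability claim can be checked numerically. The sketch below uses scipy.ndimage purely as an illustration and is not taken from either library's SSIM source; the volume size and sigma are arbitrary. It filters a 3D volume once with an nD Gaussian and once with three successive 1D Gaussians, one per axis, and compares the results.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, gaussian_filter1d

rng = np.random.default_rng(0)
volume = rng.random((16, 16, 16))   # arbitrary 3D test volume
sigma = 1.5                         # arbitrary width for the demonstration

# One call with an nD Gaussian.
full = gaussian_filter(volume, sigma=sigma)

# The same smoothing as three 1D passes, one along each axis.
separable = volume
for axis in range(volume.ndim):
    separable = gaussian_filter1d(separable, sigma=sigma, axis=axis)

max_diff = np.abs(full - separable).max()
print(max_diff)
```

So seeing 1D filters in an implementation does not by itself mean that depth is being ignored.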

How to calculate dimensions of the dense and output layer in convolutional neural network?

Can someone please tell me why the sizes of the dense layer and the output layer are 256 and 10, respectively?
input = 1x28x28
conv2d1 (28-(5-1))=24 -> 32x24x24
maxpool1 32x12x12
conv2d2 (12-(3-1))=10 -> 32x10x10
maxpool2 32x5x5
dense 256
output 10
Convolution layers are different from fully connected layers. For a fully connected layer, you flatten the feature maps into a single one-dimensional vector and apply matrix multiplication with the FC layer weights (W*x+B).
You should carefully read and understand the concepts here (the best tutorial for understanding how convnets work):
For Dense Layer:
In your case, the first dense layer has weights of size [32*5*5, 256]. Reshape the output of the pool layer into one vector and feed it through the dense layers. The output of the first dense layer is a 256-dim vector; feed it through the second FC layer (weights_size = [256, 10]) to get a 10-dim vector.
All the details of Conv, Pool, Relu, fully-connected layers and calculation of output sizes of each layer are clearly explained in the above link.
Please go through it. I hope that helps.
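The shape arithmetic from the question can be traced with a short sketch. It assumes stride-1 convolutions without padding and 2x2 pooling with stride 2, which is what the numbers in the question imply; the helper names conv_out and pool_out are made up for this example. Note that 256 and 10 do not follow from this arithmetic: they are chosen by whoever designed the network (10 is presumably the number of classes), and only the 32*5*5 input size of the dense layer is derived.

```python
def conv_out(size, kernel):
    """Spatial size after a stride-1 convolution with no padding."""
    return size - (kernel - 1)

def pool_out(size, window=2):
    """Spatial size after non-overlapping pooling (floor division)."""
    return size // window

channels = 32
size = 28                    # input is 1x28x28
size = conv_out(size, 5)     # conv2d1  -> 32x24x24
size = pool_out(size)        # maxpool1 -> 32x12x12
size = conv_out(size, 3)     # conv2d2  -> 32x10x10
size = pool_out(size)        # maxpool2 -> 32x5x5

flat = channels * size * size        # length of the flattened vector
dense_weights = (flat, 256)          # first dense layer
output_weights = (256, 10)           # output layer

print(size, flat, dense_weights, output_weights)
```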

dimensions in batch normalization

I'm trying to build a generalized batch normalization function in TensorFlow.
I learned about batch normalization from this article, which I found very helpful.
I have a problem with the dimensions of the scale and beta variables. In my case batch normalization is applied to every activation of each convolutional layer, so if the output of the convolutional layer is a tensor with size:
I need scale and beta to have the same dimensions as the convolutional layer output, correct?
Here's my function. The program works, but I don't know if it is correct:
def batch_normalization_layer(batch):
    # Calculate batch mean and variance
    batch_mean, batch_var = tf.nn.moments(batch, axes=[0, 1, 2])
    # Apply the initial batch normalizing transform
    scale = tf.Variable(tf.ones([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))
    beta = tf.Variable(tf.zeros([batch.get_shape()[1],batch.get_shape()[2],batch.get_shape()[3]]))
    normalized_batch = tf.nn.batch_normalization(batch, batch_mean, batch_var, beta, scale, 0.0001)
    return normalized_batch
From the documentation of tf.nn.batch_normalization:
mean, variance, offset and scale are all expected to be of one of two shapes:
In all generality, they can have the same number of dimensions as the input x, with identical sizes as x for the dimensions that are not normalized over (the 'depth' dimension(s)), and dimension 1 for the others which are being normalized over. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=True) during training, or running averages thereof during inference.
In the common case where the 'depth' dimension is the last dimension in the input tensor x, they may be one dimensional tensors of the same size as the 'depth' dimension. This is the case for example for the common [batch, depth] layout of fully-connected layers, and [batch, height, width, depth] for convolutions. mean and variance in this case would typically be the outputs of tf.nn.moments(..., keep_dims=False) during training, or running averages thereof during inference.
With your values (scale=1.0 and offset=0) you can also just provide the value None.
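To make the shapes in the quoted documentation concrete, here is a minimal NumPy sketch of the "common case" for a [batch, height, width, depth] tensor. It is not the TensorFlow implementation; batch_norm_nhwc and the example sizes are made up for illustration. The moments are taken over axes (0, 1, 2), so mean, variance, scale (gamma) and offset (beta) are all 1D with length equal to the depth.

```python
import numpy as np

def batch_norm_nhwc(x, gamma, beta, eps=1e-4):
    """Normalize an NHWC tensor per channel, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2))   # shape (depth,)
    var = x.var(axis=(0, 1, 2))     # shape (depth,)
    # Broadcasting applies the per-channel values along the last axis.
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 5, 5, 16))  # hypothetical conv output

depth = x.shape[-1]
gamma = np.ones(depth)    # scale: one value per channel
beta = np.zeros(depth)    # offset: one value per channel

y = batch_norm_nhwc(x, gamma, beta)
print(y.shape, gamma.shape, beta.shape)
```

Under this layout, scale and beta need only one entry per channel rather than one per activation.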