Tensorflow maxpooling in conv2d filter instead of atrous_conv2d - computer-vision

I want to perform a convolution on a big patch of the image, but I don't want too many variables. One solution could be to use the atrous_conv2d function, but I would prefer to first apply a max_pool on the patch and then a regular conv2d. How can I do this?
I have to keep the same image size between input and output. Here is the code with the atrous_conv2d function:
x = tf.placeholder('float', shape=[None, size_x*size_y])
image = tf.reshape(x, [-1, size_x, size_y, 1])
W = weight_variable([9, 9, 1, n])
conv = tf.nn.atrous_conv2d(image, W, 10, padding='SAME')
If I understand correctly, the patch size of the atrous_conv2d convolution is (9*10 x 9*10), but it acts on pixels at 10-pixel intervals and needs only 9x9xn variables.
I would prefer to take the same patch size, apply a max_pool on it, then a conventional conv2d on the (9x9) patch resulting from the max_pool. In the end it would produce the same number of variables, but it could give smoother results. The code could look like this:
x = tf.placeholder('float', shape=[None, size_x*size_y])
image = tf.reshape(x, [-1,size_x , size_y,1])
W = weight_variable([9, 9, 1, n])
def maxp(patch):
    return tf.reduce_sum(tf.nn.max_pool(patch, ksize=[1, 10, 10, 1],
                                        strides=[1, 10, 10, 1], padding='SAME') * W)
conv = conv_func(image, maxp, patch_size=[1, 9*10, 9*10, 1], strides=[1, 1, 1, 1])
where conv_func takes as arguments the input, a function, and a patch size, and applies the function to each patch.
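One way to approximate this with existing TensorFlow ops is sketched below (an assumption on my part, not a confirmed answer; it reuses size_x, size_y, n and weight_variable from the question). A max_pool with stride 1 keeps the image size, and atrous_conv2d with rate 10 then reads the pooled values at 10-pixel intervals, so each output pixel is a 9x9 convolution over 10x10-pooled neighbourhoods of the 90x90 patch, still with only 9x9xn variables:
import tensorflow as tf
x = tf.placeholder('float', shape=[None, size_x * size_y])
image = tf.reshape(x, [-1, size_x, size_y, 1])
W = weight_variable([9, 9, 1, n])
# stride-1 max pooling keeps the spatial size; each pixel holds the max of its 10x10 neighbourhood
pooled = tf.nn.max_pool(image, ksize=[1, 10, 10, 1],
                        strides=[1, 1, 1, 1], padding='SAME')
# the dilated 9x9 convolution samples the pooled values every 10 pixels
conv = tf.nn.atrous_conv2d(pooled, W, 10, padding='SAME')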

Related

combined Scharr derivatives in opencv

I have few questions regarding Scharr derivatives and its OpenCV implementation.
I am interested in second order image derivatives with (3X3) kernels.
I started with the Sobel second derivative, which failed to find some thin lines in the images. After reading the Sobel and Scharr comparison at the bottom of this page, I decided to try Scharr instead by changing this line:
Sobel(gray, grad, ddepth, 2, 2, 3, scale, delta, BORDER_DEFAULT);
to this line:
Scharr(img, gray, ddepth, 2, 2, scale, delta, BORDER_DEFAULT );
My problem is that it seems cv::Scharr only allows computing a first-order derivative along one axis at a time, so I get the following error:
error: (-215) dx >= 0 && dy >= 0 && dx+dy == 1 in function getScharrKernels
(see assertion line here)
Following this restriction, I have a few questions regarding Scharr derivatives:
Is it considered bad practice to use high-order Scharr derivatives? Why did OpenCV choose to assert dx+dy == 1?
If I am to call Scharr twice, once for each axis, what is the correct way to combine the results?
I am currently using:
addWeighted( abs_grad_x, 0.5, abs_grad_y, 0.5, 0, grad );
but I am not sure that this is how the Sobel function combines the two axes, or in what order it should be done for all 4 derivatives.
If I am to compute the (dx=2, dy=2) derivative by using 4 different kernels, I would like to reduce processing time by unifying all 4 kernels into 1 before applying it to the image (I assume that this is what cv::Sobel does). Is there a reasonable way to create such a combined Scharr kernel and convolve it with my image?
Thanks!
I've never read the original Scharr paper (the dissertation is in German) so I don't know the answer to why the Scharr() function doesn't allow higher order derivatives. Maybe because of the first point I make in #3 below?
The Scharr function is supposed to be a derivative, and the total derivative of a multivariable function f(x) = f(x0, ..., xN) is
df = (df/dx0)*dx0 + ... + (df/dxN)*dxN
That is, the sum of the partials, each multiplied by the change. In the case of images, of course, the change dx in the input is a single pixel, so it's equivalent to 1. In other words, just sum the partials, without weighting them by half. You can use addWeighted() with 1s as the weights, or you can just sum them; but to make sure you don't saturate your image, you'll need to convert to a float or 16-bit image first. However, it's also pretty common to compute the Euclidean magnitude of the derivatives if you're trying to get the gradient instead of the derivative.
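For concreteness, here is a small sketch of that combination step (the file name is just an example, and the magnitude line is optional):
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)  # float so the sums don't saturate
gx = cv2.Scharr(img, -1, 1, 0)      # first-order derivative along x
gy = cv2.Scharr(img, -1, 0, 1)      # first-order derivative along y
total = gx + gy                     # total derivative: plain sum of the partials
magnitude = np.sqrt(gx**2 + gy**2)  # Euclidean gradient magnitude, often more useful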
However, that's just for the first-order derivative. For higher orders, you need to apply some chain ruling. See here for the details of combining second-order derivatives.
Note that an optimized kernel for first-order derivatives is not necessarily the optimal kernel for second-order derivatives obtained by applying it twice. Scharr himself has a paper on optimizing second-order derivative kernels; you can read it here.
With that said, filters are split into x and y directions to make linearly separable filters, which basically turns your 2D convolution problem into two 1D convolutions with smaller kernels. Think of the Sobel and Scharr kernels: for the x direction, they both just have a single column on either side with the same values (except one is negative). When you slide the kernel across the image, at the first location you're multiplying the first and third columns by the values in your kernel. Then two steps later, you're multiplying the third and the fifth; but the third was already computed, so that's wasteful. Instead, since both sides are the same, just multiply each column by the vector, since you know you'll need those values, and then you can look up the results for columns 1 and 3 and subtract them.
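To illustrate that separability (a side note, not part of the original answer), the Scharr x-kernel is the outer product of a smoothing column [3, 10, 3] and a derivative row [-1, 0, 1], so two 1D passes with sepFilter2D() give the same result as the full 2D kernel:
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)
kx = np.array([-1, 0, 1], dtype=np.float32)   # derivative, applied along each row
ky = np.array([3, 10, 3], dtype=np.float32)   # smoothing, applied along each column
separable = cv2.sepFilter2D(img, -1, kx, ky)
full = cv2.Scharr(img, -1, 1, 0)
print(np.allclose(separable, full))           # True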
In short, I don't think you can combine them with built-in separable filter functions, because certain values are positive sometimes, and negative otherwise; and the only way to know when applying a filter linearly is to do them separately. However, we can examine the result of applying both filters and see how they affect a single pixel, construct the 2D kernel, and then convolve with OpenCV.
Suppose we have a 3x3 image:
image
=====
a b c
d e f
g h i
And we have the Scharr kernels:
kernel_x
========
-3 0 3
-10 0 10
-3 0 3
kernel_y
========
-3 -10 -3
0 0 0
3 10 3
The result of applying each kernel to this image gives us:
image * kernel_x
================
-3a +0b +3c
-10d +0e +10f
-3g +0h +3i
image * kernel_y
================
-3a -10b -3c
+0d +0e +0f
+3g +10h +3i
These values are summed and placed into pixel e. Since the sum of both of these is the total derivative, we sum all these values into pixel e at the end of the day.
image * kernel_x + image * kernel_y
===================================
-3a +3c -10d +10f -3g +3i
-3a -10b -3c +3g +10h +3i
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
-6a -10b +0c -10d +10f +0g +10h +6i
And this is the same result we'd have gotten if we multiplied by the kernel
kernel_xy
=============
-6 -10 0
-10 0 10
0 10 6
So there's a 2D kernel that does a single-order derivative. Notice anything interesting? It's just the sum of the two kernels. Is that surprising? Not really, since x(a+b) = ax + bx. Now we can pass that into filter2D() to compute the addition of the derivatives. Does that actually give the same result?
import cv2
import numpy as np
img = cv2.imread('cameraman.png', 0).astype(np.float32)
kernel = np.array([[-6, -10, 0],
                   [-10, 0, 10],
                   [0, 10, 6]])
total_first_derivative = cv2.filter2D(img, -1, kernel)
scharr_x = cv2.Scharr(img, -1, 1, 0)
scharr_y = cv2.Scharr(img, -1, 0, 1)
print((total_first_derivative == (scharr_x + scharr_y)).all())
True
Yep. Now I guess you can just do it twice.

Python 2.7- how to save a picture through tuple or set values?

So I've been using opencv2 and PIL to get pixel values. They're saved as such
(0, 0, 0), (1, 1, 1)
I've tried like 7 different ways to use this data to create an image.
My biggest problem is I can't seem to get putdata to work with my tuple.
I would show code, but my laptop is flat and my code is broken anyway.
TL;DR: how do I save an image with PIL using pixel values stored in a tuple?
Something like this should work
from PIL import Image
W = 200
H = 200
img = Image.new("RGB", (W, H))
pixel_list = [(i % 256, i % 256, i % 256) for i in range(W * H)]
i_pixel = 0
for x in range(W):
    for y in range(H):
        img.putpixel((x, y), pixel_list[i_pixel])
        i_pixel += 1
img.save('result.png')
With the following result
Note: I read here the following:
In 1.1.6, the above is better written as:
pix = im.load()
for i in range(n):
    ...
    pix[x, y] = value
But I couldn't get that to work.
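As for the original putdata question: Image.putdata() accepts a flat sequence of per-pixel tuples and fills the image row by row, so something along these lines should also work (a sketch reusing the same data as above; note the result comes out transposed compared to the putpixel loop, which walks columns first):
from PIL import Image
W, H = 200, 200
img = Image.new("RGB", (W, H))
pixel_list = [(i % 256, i % 256, i % 256) for i in range(W * H)]  # one (R, G, B) tuple per pixel
img.putdata(pixel_list)   # expects exactly W * H tuples, filled left-to-right, top-to-bottom
img.save('result_putdata.png')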

getting "axes don't match array" error when trying to visualize all layers in caffe using classification.ipynb

I am a newbie in Python and have a very basic knowledge of the language. Having said that, I'm trying to get the visualization for all layers, both for the weights and their filters. For this, instead of repeating:
# the parameters are a list of [weights, biases]
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
and changing the layer name, I tried using a loop like this:
for layer_name, param in net.params.iteritems():
    # the parameters are a list of [weights, biases]
    filters = net.params[layer_name][0].data
    vis_square(filters.transpose(0, 2, 3, 1))
Now it works fine for the first layer, but it gives this error and stops running:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-cf7d5999a45c> in <module>()
2 # the parameters are a list of [weights, biases]
3 filters = net.params[layer_name][0].data
----> 4 vis_square(filters.transpose(0, 2, 3, 1))
ValueError: axes don't match array
And this is the definition of vis_square() (defined in classification.ipynb in the examples directory of caffe):
def vis_square(data):
    """Take an array of shape (n, height, width) or (n, height, width, 3)
    and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)"""
    # normalize data for display
    data = (data - data.min()) / (data.max() - data.min())
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = (((0, n ** 2 - data.shape[0]),
                (0, 1), (0, 1))                # add some space between filters
               + ((0, 0),) * (data.ndim - 3))  # don't pad the last dimension (if there is one)
    data = np.pad(data, padding, mode='constant', constant_values=1)  # pad with ones (white)
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    plt.imshow(data); plt.axis('off')
What is wrong here?
I'd be grateful if anyone could give me a hand on this.
For subsequent layers, the number of input channels is no longer 1 or 3. For instance, if you have num_output: 64 in the first layer and num_output: 64 in the second as well, the shape of the 4D matrix that stores the second layer's weights is 64 x 64 x height x width. After you do the transpose, it's 64 x height x width x 64.
Your function is not capable of handling a 64-channel object, though it's fine for 3-channel ones.
I would just do n = int(np.ceil(np.sqrt(data.shape[0] * data.shape[3]))) and reshape the whole thing into a single-channel object. I don't think visualising the convolution kernels as RGB will give you any insight.
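A possible way to use that suggestion in the loop (just a sketch; it folds the input channels into the filter axis so every (height, width) slice is shown as its own grayscale tile, and skips layers whose weights are not 4D):
for layer_name, param in net.params.iteritems():
    filters = net.params[layer_name][0].data          # shape (out, in, height, width) for conv layers
    if filters.ndim == 4:
        out_c, in_c, h, w = filters.shape
        vis_square(filters.reshape(out_c * in_c, h, w))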
For anyone having a similar problem ("axes don't match array" error): right before transposing, I put my data into a variable, giving the exact size. If my data is Data with a size of 10*12*15:
DataI = Data[0:10, 0:12, 0:15]
DataII = np.transpose(DataI, (0, 2, 1))
this worked for me.

How to find an Equivalent point in a Scaled down image?

I would like to calculate the corner points or contours of the star in a larger image. For that I'm scaling it down to a smaller size, and I'm able to get these points clearly. Now how do I map these points back to the original image? I'm using OpenCV C++.
Consider a trivial example: the image size is reduced exactly by half.
So the Cartesian coordinate (x, y) in the original image becomes coordinate (x/2, y/2) in the reduced image, and coordinate (x', y') in the reduced image corresponds to coordinate (x'*2, y'*2) in the original image.
Of course, fractional coordinates typically get rounded off in a reduced-scale image, so the exact mapping is only possible for even-numbered coordinates in this example's original image.
Generalizing this, if the image is scaled by a factor of w horizontally and h vertically, coordinate (x, y) becomes coordinate (x*w, y*h), rounded off. In the example I gave, both w and h are 1/2, or 0.5.
You should be able to figure out the values of w and h yourself, and be able to map the coordinates trivially. Of course, due to rounding off, you will not be able to compute the exact coordinates in the original image.
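As a small sketch of that mapping (hypothetical helper and sizes, in Python for brevity even though the question uses C++):
def to_original(pt, original_size, resized_size):
    # map a point found in the resized image back to the original image
    (x, y) = pt
    (W, H) = original_size
    (w, h) = resized_size
    return (int(round(x * float(W) / w)), int(round(y * float(H) / h)))
# e.g. a corner found at (120, 80) in a 320x240 thumbnail of a 1280x960 image
print(to_original((120, 80), (1280, 960), (320, 240)))   # (480, 320)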
I realize this is an old question. I just wanted to add to Sam's answer above, to deal with "rounding off", in case other readers are wondering the same thing I faced.
This rounding off becomes obvious for an even number of pixels along a coordinate axis. For instance, along a 1-D axis, a point demarcating the second quartile gets mapped to an inaccurate value:
axis_prev = [0, 1, 2, 3]
axis_new = [0, 1, 2, 3, 4, 5, 6, 7]
w_prev = len(axis_prev) # This is an axis of length 4
w_new = len(axis_new) # This is an axis of length 8
x_prev = 2
x_new = x_prev * w_new / w_prev
print(x_new)
>>> 4
### x_new should be 5
In Python, one strategy would be to linearly interpolate values from one axis resolution to the other. Say, for the above, we wish to map a point from the smaller image to its corresponding point of the star in the larger image:
import numpy as np
from scipy.interpolate import interp1d
x_old = np.linspace(0, 640, 641)   # coordinates on the smaller axis
x_new = np.linspace(0, 768, 641)   # corresponding coordinates on the larger axis (same number of samples)
f = interp1d(x_old, x_new)
x = 35
x_prime = f(x)                     # 42.0

Create color variations in cpp

I have a given color and want to create variations of it in terms of hue, saturation and lightness.
I found a webpage which creates variations the way I would like (see http://coloreminder.com/). However, I do not entirely understand how these variations are created for an arbitrary color. From what I can tell from considering created variations at this home page, it seems not to be enough to simply change the HSL values separately to create variations.
Hence, I wanted to ask if anybody knows an approach for creating these variations, or ideally knows where to get a piece of code to adapt this kind of color variation creation in my own program.
I am using C++ and Qt.
EDIT: Thank you for your replies! Actually, the variations on the given homepage really do only vary the HSL values separately, in 10% steps. I got confused because I compared the values with the HSV values in the color picker of my program.
From what I can tell from considering created variations at this home page, it seems not to be enough to simply change the HSL values separately to create variations.
Really? The interface seems clear enough about what modifications it makes. You can select "hue", "saturation" or "luminance", and it shows 9 variations on that channel. The following MATLAB script will plot the different variations in a similar way (although in the HSV color space, not HSL).
% display n variations of HTML-style color code.
function [] = colorwheel ( hex, n )
    % parse color code.
    rgb = hex2rgb(hex);
    % render all variations.
    h = figure();
    for j = 1 : 3,
        % build n variations on current channel.
        colors = variantsof(rgb, j, n);
        % display variations.
        for i = 1 : n,
            % generate patch of specified color.
            I = zeros(128, 128, 3);
            I(:,:,1) = colors(i, 1);
            I(:,:,2) = colors(i, 2);
            I(:,:,3) = colors(i, 3);
            % render patches side-by-side to show progression.
            imshow(I, 'parent', ...
                subplot(3, n, (j-1)*n+i, 'parent', h));
        end
    end
end

% parse HTML-style color code.
function [ rgb ] = hex2rgb ( hex )
    r = double(hex2dec(hex(1:2))) / 255;
    g = double(hex2dec(hex(3:4))) / 255;
    b = double(hex2dec(hex(5:6))) / 255;
    rgb = [r g b];
end

% generate n variants of color on j-th channel.
function [ colors ] = variantsof ( rgb, j, n )
    colors = zeros(n, 3);
    for i = 1 : n,
        % convert to HSV.
        color = rgb2hsv(rgb);
        % apply variation to selected channel.
        color(j) = color(j) + ((i-1) / n);
        if color(j) > 1.0,
            color(j) = color(j) - 1.0;
        end
        % convert to RGB.
        colors(i,:) = hsv2rgb(color);
    end
    % order colors with respect to channel.
    if j > 1,
        colors = sortrows(colors, j);
    end
end
Using the "goldenrod" sample color, as:
colorwheel('daa520', 9);
I get:
The first row is a variation on hue, the second on saturation and the third on value. The outputs don't correspond exactly to the ones on coloreminder.com, but this is explained by the difference in color space and the exact values used in the permutations.
Have you read through the documentation for QColor?
The QColor class itself provides plenty of useful functions for manipulating colors in pretty much any way you can think of, and the documentation itself explains some basic color theory as well.
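For completeness, the same separate-channel idea can be prototyped with Python's standard colorsys module before porting it to QColor (a sketch with a hypothetical helper, not from either post; note colorsys uses HLS channel order):
import colorsys
def variations(r, g, b, channel, steps=9):
    # channel: 0 = hue, 1 = lightness, 2 = saturation (colorsys HLS order)
    h, l, s = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    base = [h, l, s]
    out = []
    for i in range(steps):
        c = list(base)
        c[channel] = (c[channel] + i / float(steps)) % 1.0  # step and wrap around, like the MATLAB script
        out.append(tuple(int(round(v * 255)) for v in colorsys.hls_to_rgb(*c)))
    return out
# nine hue variations of "goldenrod" (#daa520)
print(variations(0xda, 0xa5, 0x20, channel=0))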