Reshaping the input layer to a single channel and multiple images - c++

In some reference code I have picked up, there is:
net_->input_blobs()[0]->Reshape(1, 3, height, width);
My prototxt has:
input_shape {
dim: 1
dim: 3
dim: 260
dim: 347
}
I have been indirectly informed that the model provided has been tuned for greyscale (we have both a colour and a greyscale prototxt), and the currently-used Python code uses a greyscaled input with three identical channels.
Now I want to do either both or separately process 4 images in a single call to net_->Forward(); and pass in these four images as one-channel greyscale. So, first, choosing a single channel:
net_->input_blobs()[0]->Reshape(1, 1, height, width);
What are the repercussions of changing the number of channels? How do all my layers react? Will it work? If it works, will a one-channel net be faster?
Second, choosing four images:
net_->input_blobs()[0]->Reshape(4, 3, height, width);
I have a feeling that won't work, and I should be looking at increasing the number of input_blobs, but how to do that? Or what is the correct approach?

working with a single channel rather than identical three should be faster (fewer multiplication-addition operations). Since this is done at the finest scale, this might even have noticeable impact on run time.
Feeding 4 images as a single batch is usually faster than processing each image separately as a batch with one image (due to internal optimization of the computation to work with batches).
Bottom line: you should get better run time running a single batch of four images. If the input is three identical channels - it is better to modify the model to work with only one.

Related

Position detection of a defined mark in a picture

I am still a beginner in coding. I am currently working on a program in C/C++ that is determining pixel position of a defined mark (which is a black circle with white surroundings) in a photo.
I made a mask from the mark and a vector, which contains mask's every pixel value as it's elements (using Magick++ I summed values for Red, Green and Blue). Vector contains aprox. 10 000 values since the mask is 100x100px. I also used threshold functions for simplifying the image.
Than I made a grid, that is doing the same for the picture, where I want to find the coordinates of the mark. It is basically a loop, that is going throught the image and when the program knows pixel values in the grid it immediately compares them with the mask. Main idea is to find lowest difference between the mask and one of the grid positions.
The problem is however that this procedure of evaluating all grids position takes huge amount of time (e.g. the image has 1920x1080px so more than 2 million vectors containing 10 000 values). I decided to cycle the grid not every pixel but for example every 10th column and row, and than for the best corellation from this procedure I selected area where I used every pixel loop. But, this still takes lot of time.
I would like to ask you, if there is some way of improving this method for better (faster) results or this whole idea is not time efficient and I should use different approach.
Thanks for every advice!
Edit: The program will be used for processing multiple images and on all of them the size will be same. This is the picture after threshold, the mark is the big black dot.
Image
The idea that I find interesting is a pyramidal scheme - or progressive refinement: you find the spot at a lower size image then search only a small rectangle in the larger image.
If you reduce your image by 2 in each dimension then you would reduce the time by 4 plus some search effort in the larger image.
This has some problems: the reduction will affect accuracy I expect. You might miss the spot.
You have to cut the sample (template) by the same so you create a half-size template in this case. As you half half half... the template will get blurred into the surrounding objects so it will not be possible to have a valid template; for half size once I guess the dot has a couple of pixels around it.
As you haven't specified a tool or OS, I will choose ImageMagick which is installed on most Linux distros and is available for OSX and Windows. I am just using it at the command-line here but there are C, C++, Python, Perl, PHP, Ruby, Java and .Net bindings available.
I would use a "Connect Components Analysis" or "Blob Analysis" like this:
convert image.png -negate \
-define connected-components:area-threshold=1200 \
-define connected-components:verbose=true \
-connected-components 8 -auto-level result.png
I have inverted your image with -negate because in morphological operations, the foreground is usually white rather than black. I have excluded blobs smaller than 1200 pixels because your circles seem to have a radius of 22 pixels which makes for an area of 1520 pixels (Pi * 22^2).
That gives this output, which means 7 blobs - one per line - with the bounding box and area of each:
Objects (id: bounding-box centroid area mean-color):
0: 1358x1032+0+0 640.8,517.0 1296947 gray(0)
3: 341x350+1017+287 1206.5,468.9 90143 gray(255)
106: 64x424+848+608 892.2,829.3 6854 gray(255)
95: 38x101+44+565 61.5,619.1 2619 gray(255)
49: 17x145+1341+379 1350.3,446.7 2063 gray(0)
64: 43x43+843+443 864.2,464.1 1451 gray(255)
86: 225x11+358+546 484.7,551.9 1379 gray(255)
Note that, as your circle is 42x42 pixels you will be looking for a blob that is square-ish and close to that size - so I am looking at the second to last line. I can draw that in in red on your original image like this:
convert image.png -fill none -stroke red -draw "rectangle 843,443 886,486" result.png
Also, note that as you are looking for a circle, you would expect the area to be pi * r^2 or around 1500 pixels and you can check that in the penultimate column of the output.
That runs in 0.4 seconds on a reasonable spec iMac. Note that you could divide the image into 4 and run each quarter in parallel to speed things up. So, if you do something like this:
#!/bin/bash
# Split image into 4 (maybe should allow 23 pixels overlap)
convert image.png -crop 1x4# tile-%02d.mpc
# Do Blob Analysis on 4 strips in parallel
for f in tile-*mpc; do
convert $f -negate \
-define connected-components:area-threshold=1200 \
-define connected-components:verbose=true \
-connected-components 8 info: &
done
# Wait for all 4 to finish
wait
That runs in around 0.14 seconds.

image IFFT of a zero padded FFT gives a washed out image

For practice, I am slowly implementing image processing concepts with the FFT, and I have started with zero-padding. The result is supposed to rescale the size of the image (in this case, double the width and height), but my output is washed out. I was thinking that it had to do with my normalization after the ifft, since the width and height had changed after the padding, but nothing I have tried has produced a better image. Any ideas on where I may be scaling the data incorrectly, or quick fixes to increase the power of my output? Before I save my image, I scale all the pixel data to a range between 0 and 255, but it almost seems like the output is between 128 and 255 instead.
Original:
Zero-padded FFT:
IFFT:
I should have tested some more before posting. All I was doing wrong was scaling the image between 0 and 255 one too many times. This messed up the data, especially since my scaling function uses a logarithmic scale to better display the fft.

Multiple scans by key

I have one 4-channel HSVL image - Hue, Saturation, Value (floats), Label (unsigned int).
The task is to compute an array of sums of Hues, Saturations, and Values, for each unique label. For example, I will be able to access the output Sum[of pixels with label 455] = { Hue: 500, Sat: 100, Val: 200 }. The size of the image is about 5 MP, and there are about 3000 different labels.
My idea is to have ~32 scans over parts of the image, that will produce 32 x nLabels sums. Then I can scan over the 32 partitions of the image, to arrive at nLabel sum structures.
Does a "scan by key?" algorithm exist that is a solution to this exact type of problem?
If you want to do this by CUDA, the following could help.
Since you only need the sum values, I think what you need is "reduce by key". Thrust provides an implementation thrust::reduce_by_key() which could meet your needs.
But before using it, you have to sort all the pixels by the labels. This can be done with thrust::sort_by_key()
You may also be interested in thrust::zip_iterator, which can zip the 3 channels HSV into a single value iterator for sorting and reduction.

Compressing BMP methods

I am working on a project to losslessly compress a specific style of BMP images that look like this
I have thought about doing pattern recognition, to find repetitive blocks of N x N pixels but I feel like it wont be fast enough execution time.
Any suggestions?
EDIT: I have access to the dataset that created these images too, I just use the image to visualize my data.
Optical illusions make it hard to tell for sure but are the colors only black/blue/red/green? If so, the most straightforward compression would be to simply make more efficient use of pixels. I'm thinking pixels use a fixed amount of space regardless of what color they are. Thus, chances are you are using 12x as many pixels as you really need to be. Since a pixel can be a lot more colors than just those four.
A simple way to do that would be to do label the pixels with the following base 4 numbers:
Black = 0
Red = 1
Green = 2
Blue = 3
Example:
The first four colors of the image seems to be Blue-Red-Blue-Blue. This is equal to 3233 in base 4, which is simply EF in base 16 or 239 in base 10. This is enough to define what the red color of the new pixel should be. The next 4 would define the green color and the final 4 define what the blue color is. Thus turning 12 pixels into a single pixel.
Beyond that you'll probably want to look into more conventional compression software.

C++: How to interpret a byte array representation of an image?

I'm trying to work with this camera SDK, and let's say the camera has this function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, modifies it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments) so I'm just guestimating here. Here's a code snippet on what I think works
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
I don't think there is a standard but you can try to identify which values are what by putting some solid color images in front of the camera. So all pixels would be approximately the same color. Having an idea of what color should be stored in each pixel you may understand how the color is represented in your array. I would go with black, white, reg, green, blue images.
But also consider finding a better SDK which has the documentation, because making just a big array is really bad design
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adhers to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth-resolution of 12 bit. Therefore, each pixel intensity was stored as 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels the factor 2 is for 16-bit per pixel. The width and height were known a-priori from the chosen camera mode.
If you have the dimensions of the result image, try to dump your byte array into a file and load the result either in Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and hope to get anything out from it.
Good luck!
I hope this question's solution will helps you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if you camera captires in 8-bit). What you need - is just determine width and height. after that you can try to restore bitmap image from you byte array.