As an educational exercise for myself, I'm writing an application that averages a bunch of images. This is often used in astrophotography to reduce noise.
The library I'm using is Magick++, and I've succeeded in writing the application. Unfortunately, it's slow. This is the code I'm using:
red.clear(); blue.clear(); green.clear();
ColorRGB rgb(image[i].pixelColor(column,row));
redVal = avg(red);
greenVal = avg(green);
blueVal = avg(blue);
redVal = redVal*MaxRGB; greenVal = greenVal*MaxRGB; blueVal = blueVal*MaxRGB;
Color newRGB(redVal,greenVal,blueVal);
The code averages 10 images by going through each pixel and adding each channel's pixel intensity into a vector of doubles. The function avg then takes the vector as a parameter and averages the result. This average is then used at the corresponding pixel in stackedImage, which is the resulting image. It works just fine, but as I mentioned, I'm not happy with the speed. It takes 2 minutes and 30 seconds on a Core i5 machine. The images are 8-megapixel, 16-bit TIFFs. I understand that it's a lot of data, but I have seen it done faster in other applications.
Is it my loop that's slow, or is pixelColor(x,y) a slow way to access pixels in an image? Is there a faster way?

Why use vectors/arrays at all?
Why not
double red=0.0, blue=0.0, green=0.0;
ColorRGB rgb(image[i].pixelColor(column,row));
red += rgb.red(); blue += rgb.blue(); green += rgb.green();
This avoids 36 function calls on vector objects per pixel.
And you may get even better performance by using a PixelCache of the whole image instead of the original Image objects. See the "Low-Level Image Pixel Access" section of the online Magick++ documentation for Image.
Then the inner loop becomes:
PixelPacket* pix = cache[i]+row*columns+column;
red+= pix->red;
blue+= pix->blue;
green+= pix->green;
Now you have also removed 10 calls to pixelColor, 10 ColorRGB constructors, and 30 accessor functions per pixel.
Note: this is all theory; I haven't tested any of it.

Why do you use vectors for red, blue and green? push_back can trigger reallocations and become a processing bottleneck. You could instead allocate three arrays of 10 colors just once.
Couldn't you declare rgb outside of the loops to avoid unnecessary constructions and destructions?
Doesn't Magick++ have a way to average images?

Just in case anyone else wants to average images to reduce noise and doesn't feel like too much of an "educational exercise" ;-)
ImageMagick can do averaging of a sequence of images like this:
convert image1.tif image2.tif ... image32.tif -evaluate-sequence mean result.tif
You can also do median filtering and other operations by changing the word mean in the above command, e.g.:
convert image1.tif image2.tif ... image32.tif -evaluate-sequence median result.tif
You can get a list of the available operations with:
identify -list evaluate


Backpropagation 2-Dimensional Neural Network C++

I am learning about two-dimensional neural networks, so I am facing many obstacles, but I believe it is worth it and I am really enjoying the learning process.
Here's my plan: make a 2-D NN recognize images of digits. The images are 5 by 3 grids, and I prepared 10 images, from zero to nine. For example, this would be the number 7:
1 1 1
0 0 1
0 0 1
0 0 1
0 0 1
The digit 7 has indexes 0, 1, 2, 5, 8, 11, 14 set to 1 (equivalently, 3, 4, 6, 7, 9, 10, 12, 13 set to 0), and so on for the other digits. Therefore, my input layer will be a 5 by 3 layer of neurons, and I will feed it only zeros or ones (nothing in between; which indexes are set depends on the image I am feeding the layer).
My output layer, however, will be a one-dimensional layer of 10 neurons. Depending on which digit is recognized, the corresponding neuron should output one and the rest should output zero (shouldn't fire).
I have finished implementing everything, but I have a problem with the computation, and I would really appreciate any help. I am getting an extremely high error rate and extremely low (negative) output values on all output neurons, and the values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my backpropagation methods, since I believe the problem is there. However, to break down my work, I would first like to hear some comments; I want to know whether my design is workable.
Does my plan make sense?
All the posts talk about ranges (0 to 1, -1 to +1, 0.01 to 0.5, etc.). Will it work with outputs that are either 0 or 1 on the output layer rather than a range? If yes, how can I control that?
I am using the hyperbolic tangent (tanh) as my transfer function. Does it make a difference whether I use this, the sigmoid, or other functions?
Any ideas/comments/guidance are appreciated, and thanks in advance.
Well, from the description given above, I think the design and approach you've taken are correct! With respect to the choice of activation function, remember that these functions help identify the neurons with the largest activation; also, their algebraic properties, such as an easy derivative, help with the definition of backpropagation. Taking this into account, you should not worry about your choice of activation function.
Now, with respect to the training problems: have you checked whether your gradient calculation routine is correct, i.e., by evaluating the cost function J numerically? If not, try generating a toy vector theta that contains all the weight matrices in your neural network, and evaluate the gradient at each point using the definition of the gradient. Sorry for the MATLAB example, but it should be easy to port to C++:
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute numerical gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient with the gradient calculated by backpropagation. If the difference between the two is less than 3e-9, your implementation should be correct.
I recommend checking out the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory; there you can find a lot of information on neural networks and their paradigms. It's worth taking a look!

Audio Visualizer from wav looks wrong

I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to be coming back with a value that usually bounces between -60 dB and -40 dB. This forms a flat bouncing line (usually in the higher frequencies).
I want to display 512 bins or fewer at 30 frames per second. I've been reading up on FFT and audio nonstop for a couple of weeks now, and so far my process has been:
1. Load PCM data from the WAV file. This comes in as 44100 samples per second with a range of ±32767. I'm assuming I treat these as real numbers when passing them to the FFT.
2. Divide these samples into frames of 1470 (44100 / 30). Only 1024 samples per frame are used, so 446 are ignored.
3. Take 1024 samples and apply a Hann window.
4. Pass the samples to the FFT as an array real[1024], along with another array of the same size filled with zeros for the imaginary part.
5. Get the magnitude by looping through the (samples/2) bins and computing sqrt(real[i]*real[i] + img[i]*img[i]).
6. Take 20 * log(magnitude) to get the decibel level of each bin.
7. Draw a rectangle for each bin, and draw these bins for each frame.
I've tested it with a couple of songs, and with a WAV file I generated that just plays a 440 Hz tone. With the WAV file, I do get a spike at the 440 Hz bin, but all the other bins form a line that isn't much shorter than the 440 Hz bin. Also, every other frame, the bins apart from 440 Hz look like a graphed log function with a dip at some other bin.
I'm writing this in C++, using STK to load only the left channel from the audio file:
//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
I'm using FFTReal to perform the FFT:
std::vector<std::vector <double> > leftChannelData;
int numberOfFrames = stkObject->getSize()/samplesPerFrame;
for(int i = 0; i < numberOfFrames; i++)
for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
real[j] = standardVector[j + (i*samplesPerFrame)];
applyHannWindow(real, FFT_SAMPLE_LENGTH);
//FFTReal instructions say to run this after an fft
for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
double dbValue = 20 * log(magnitude/maxMagnitude);
leftChannelData[i].at(j) = dbValue;
I'm at a loss as to what's causing this. I've tried various ways of using those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the PCM data before handing it to the FFT, and I've tried normalizing the magnitude before computing the decibels, but it doesn't seem to help. Any thoughts?
EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bins' values evenly downwards.
Here's what they look like, to give a visual:
Song playing low sounds - with log(mag)
Song playing low sounds - same but with log(mag/maxMag)
Again, log(mag) and log(mag/maxMag) generally look the same, except that the latter's values are negative. As MSalters said, the decibel value can approach minus infinity, so I can clamp those values to -100 dB, then take log(mag/maxMag) and add 100. That way the rectangles' heights range from 0 to 100 instead of -100 to 0.
Is this what I should do? I've tried it, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should, and the ones that do make it above 0 do so just barely.
You have to understand that you're not taking the Fourier transform of an infinite signal, but the FT of a windowed version of it. And your window isn't even a plain Hann window: discarding 446 points is effectively a rectangular window function. The FTs of both window functions will show up in your output.
Secondly, the dB scale is logarithmic. That indeed means it can go quite low in the absence of a signal. You mention -60 dB, but it could in fact hit minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.
The noise (stop band ripple) produced by a quantized Von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.
Also, try removing the rescale(real) function, as that could distort your complex vector before you take the log magnitude.
Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianness).

Check for similarity on different size images

I have a video source that produces many streams for different devices (such as HD televisions, tablets, smartphones, etc.), and each of them has to be checked against the others for similarity. The video stream produces 50 images per second, one image every 20 milliseconds.
Let's take, for instance, img1 coming from stream1 at time ts1=1, img2 coming from stream2 at ts2=1, and img1.1 taken from stream1 at ts=2 (20 milliseconds after ts=1). The comparison results should look something like this:
compare(img1, img1) = 1 same image same size
compare(img1, img2) = 0.9 same image different size
compare(img1, img1.1) = 0.8 different images same size
Ideally this should be done in real time, i.e. within 20 milliseconds. The goal is to detect whether the streams are out of synchronization. I have already implemented some comparison methods (none of which works for this case yet):
1) histogram (SSE and OpenCV CUDA): compare(img1, img2) ~= compare(img1, img1.1)
2) PSNR (SSE and OpenCV CUDA): compare(img1, img2) < compare(img1, img1.1)
3) SSIM (SSE and OpenCV CUDA): same result as PSNR
Maybe I get bad results because of the resize interpolation method?
Is it possible to design a comparison method that fulfills my requirements? Any ideas?
I'm afraid that you're running into a Real Problem (TM). This is not a trivial let's-give-it-to-the-intern problem.
The main challenge is that you can't do a brute-force comparison. HD images are 3 MB or more, and you're talking about O(N*M) comparisons (in time and across streams).
What you essentially need is a fingerprint that's robust against resizing but changes over time. And since you didn't realize that (the histogram idea, for instance, is quite stable over time), you didn't include the necessary information in this question.
So this isn't a C++ question, really. You need to understand your inputs.

Determine difference in stops between images with no EXIF data

I have a set of images of the same scene, but shot with different exposures. These images have no EXIF data, so there is no way to extract useful info such as f-stop, shutter speed, etc.
What I'm trying to do is determine the difference in stops between the images, e.g. Image1 is +1.3 stops relative to Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different numbers being used in the equation, but generally they should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-average luminance doesn't seem to give much indication of the exposure difference between the images.
Any ideas on how to determine exposure difference?
Edit: in C/C++.
You have to generally solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might be (sufficiently) linearized already, in which case you may skip to part 2. If your content came from a camera and it's a JPEG, this will almost certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model, if I recall correctly) is PFScalibrate.
That said, you may get away with simply applying an inverse gamma. A rough guesstimate of the right gamma value can be found like this:
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) on both images
multiply all the short exposure images with 2.0 and subtract them from the respective long exposure images
pick the gamma that led to the smallest overall difference
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1, p2) from the two images:
    if p1 or p2 is close to 0 or 255:
        skip this pair
    sum1 += p1 and sum2 += p2
return log2(sum1 / sum2)
On large images this will certainly work just as well and a lot faster if you sub-sample the images.
If the camera was static but the scene was not (moving objects), this starts to work less well. I got acceptable results in this case by simply repeating the above procedure several times, using the output of the previous run as an estimate of the correct scale factor, and discarding pixel pairs whose quotient is too far from the current estimate. So basically, replace the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations or if two consecutive estimates are sufficiently close to each other.
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D mentioned already), and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no demosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighting R, G and B equally) or maybe only use the green channel.

OpenCV, C++: Distance between two points

For a group project, we are attempting to make a game where functions are executed whenever a player forms a set of specific hand gestures in front of a camera. To process the images, we are using OpenCV 2.3.
During the image processing, we are trying to find the distance between two points.
We already know this can be done very easily with the Pythagorean theorem, though it is said that the Pythagorean theorem requires a lot of computing power, and we want to do this using as few resources as possible.
We would like to know whether there is any built-in function in OpenCV or the C++ standard library that can calculate the distance between two points with low resource usage.
We have the coordinates for the points, which are in pixel values (Of course).
Extra info:
Previous experience has taught us that OpenCV and other libraries are heavily optimized. As an example, we attempted to change the RGB values of the live image feed from the camera with a for loop going through each pixel. This produced a low frame-rate output. We then switched to a built-in OpenCV function, which gave us a high frame-rate output.
You should try this:
cv::Point a(1, 3);
cv::Point b(5, 6);
double res = cv::norm(a - b); // Euclidean distance
As you correctly pointed out, there's an OpenCV function that does some of the work for you :)
(Also check the other way below.)
It is called magnitude(), and it calculates the distance for you. If you have a vector of more than 4 vectors to calculate distances for, it will use SSE (I think) to make it faster.
Now, the catch is that it only computes sqrt(x*x + y*y) from the components; you have to compute the differences yourself (check the documentation). But if you also do that with OpenCV functions, it should be fast.
Mat pts1(nPts, 1, CV_32FC2), pts2(nPts, 1, CV_32FC2); // magnitude() needs floating-point data
// populate them
Mat diffPts = pts1 - pts2;
// split the differences into x and y vectors (or keep them separate from the start)
std::vector<Mat> xy;
split(diffPts, xy);
Mat dist;
magnitude(xy[0], xy[1], dist); // voila!
The other way is to use a very fast sqrt:
// 15 times faster than the classical float sqrt.
// Reasonably accurate up to root(32500)
// Source:
unsigned int root(unsigned int x){
unsigned int a,b;
b = x;
a = x = 0x3f;
x = b/x;
a = x = (x+a)>>1;
x = b/x;
a = x = (x+a)>>1;
x = b/x;
x = (x+a)>>1;
return x;
}
This ought to be a comment, but I don't have enough rep (50?) |-( so I'm posting it as an answer.
What the guys are trying to tell you in the comments on your question is that if it's only about comparing distances, then you can simply use
d = dx*dx + dy*dy = (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2)
thus avoiding the square root. But of course you can't skip the squaring.
Pythagoras is the fastest way, and it really isn't as expensive as you think. It used to be, because of the square-root. But modern processors can usually do this within a few cycles.
If you really need speed, use OpenCL on the graphics card for image processing.