super mysterious logic error for a planar fitting image processing algorithm - c++

so i have this image processing program where i am using a linear regression algorithm to find a plane that best fits all of the points (x,y,z: z being the pixel color intensity (0-255)
Simply speaking i have this picture of ? x ? dimension. I run this algorithm and i get these A, B, C values. (3 float values)
then i go every pixel in the program and minus the pixel value with mod_val where
mod_val = (-A * x -B * y ) / C
A,B,C are constants while x,y is the pixel location in a x,y plane.
When the dimension of the picture is divisible by 100 its perfect but when its not the picture fractures. The picture itself is the same as the original but there is a diagonal line with color contrast that goes across the picture. The program is supposed to make the pixel color uniform from the center.
I tried running the picture where mod_val = 0 for not divisble by 100 dimension pictures and it copies a new picture perfectly. So i doubt there is a problem with storing and writing the read data in terms of alignment. (fyi this picture is a grey scale 8 bit.bmp)
I have tried changing the A,B,C values but the diagonal remains the same. The color of the image fragments within the diagonals change.
when i run 1400 x 1100 picture it works perfectly with the mod_val equation written above which is the most baffling part.
I spent a lot of time looking for rounding errors. They are virtually all floats. The dimension i used for breaking picture is 1490 x 1170.
here is a gragment of the code where i think a error is occuring:
int img_row = row_length;
int img_col = col_length;
int i = 0;
float *pfAmultX = new float[img_row];
for (int x = 0; x < img_row; x++)
{
pfAmultX[x] = (A * x)/C;
}
for (int y = 0; y < img_col; y++)
{
float BmultY = B*y/C;
for (int x = 0; x < img_row; x++, i++)
{
modify_val = pfAmultX[x] + BmultY;
int temp = (int) data.data[i];
data.data[i] += (unsigned char) modify_val;
if(temp >= 250){
data.data[i] = 255;
}
else if(temp < 0){
data.data[i] = 0;
}
}
}
delete[] pfAmultX;
The img_row, img_col is correct according to VS debugger mode
Any help would be greatly appreciated. I've been trying to find this bug for many hours now and my boss is telling me that i can't go back home until i find this bug.....
before algorithm (1400 x 1100, works)
after
before (1490 x 1170, demonstrates the problem)
after
UPDATE:
well i have boiled down the problem as something with the x coordinate after extensive testing.
This is because when i use large A or B values or both (C value is always ~.999) for 1400x1100 it does not create diagonals.
However, for the other image, large B values do not create diagonals but a fairly small - avg A value creates diagonals.
Whats even more, when i test a picture where x is disivible by 100 but y is divisible by 10. the answer is correct.

well in the end i found the solution. It was a problem due to the padding the the bitmap. When the dimension on the x was not divisible by 4 it would use padding which would throw off all of the x coordinates. This also meant that the row_value i received from the bmp header was the same as the dimension but not really the same in reality. I had to make a edit where i had to do: 4 * (row_value_from_bmp_header + 3)/ 4.

Related

How can I draw smooth pixels in a netpbm image?

I just wrote a small netpbm parser and I am having fun with it, drawing mostly parametric equations. They look OK for a first time thing, but how can I expand upon this and have something that looks legit? This picture is how my method recreated the Arctic Monkeys logo which was just
0.5[cos(19t) - cos(21t)]
(I was trying to plot both cosines first before superpositioning them)
It obviously looks very "crispy" and sharp. I used as small of a step size as I could without it taking forever to finish. (0.0005, takes < 5 sec)
The only idea I had was that when drawing a white pixel, I should also draw its immediate neighbors with some slightly lighter gray. And then draw the neighbors of THOSE pixels with even lighter gray. Almost like the white color is "dissolving" or "dissipating".
I didn't try to implement this because it felt like a really bad way to do it and I am not even sure it'd produce anything near the desirable effect so I thought I'd ask first.
EDIT: here's a sample code that draws just a small spiral
the draw loop:
for (int t = 0; t < 6 * M_PI; t += 0.0005)
{
double r = t;
new_x = 10 * r * cosf(0.1 * M_PI * t);
new_y = -10 * r * sinf(0.1 * M_PI * t);
img.SetPixel(new_x + img.img_width / 2, new_y + img.img_height / 2, 255);
}
//img is a PPM image with magic number P5 (binary grayscale)
SetPixel:
void PPMimage::SetPixel(const uint16_t x, const uint16_t y, const uint16_t pixelVal)
{
assert(pixelVal >= 0 && pixelVal <= max_greys && "pixelVal larger than image's maximum max_grey\n%d");
assert(x >= 0 && x < img_width && "X value larger than image width\n");
assert(y >= 0 && y < img_height && "Y value larger than image height\n");
img_raster[y * img_width + x] = pixelVal;
}
This is what this code produces
A very basic form of antialiasing for a scatter plot (made of points rather than lines) can be achieved by applying something like stochastic rounding: consider the brush to be a pixel-sized square (but note the severe limitations of this model), centered at the non-integer coordinates of the plotted point, and compute its overlap with the four pixels that share the corner closest to that point. Treat that overlap fraction as a grayscale fraction and set each pixel to the largest value for a large number of points approximating a line, or do alpha blending for a small number of discrete points.

Modifying only the beginning of an image and not it fully, as I wish

I currently have some code that reads an image stored in the tga format, then do something with it and then store it in a new tga file.
The problem is that only the bottom one third is being modified, the other two thirds are equal to the original image. Here is the code:
int size = width*height*bpp;
char imageArray [size];
char * arrayPtr = &imageArray[0];
......
for (int x=0; x<width; x++) {
for (int y=0; y<height; y++) {
imageArray [x*height + 3*y] = 255;
imageArray [x*height + 3*y + 1] = 0;
imageArray [x*height + 3*y + 2] = 0;
}
}
fileWriter.write (arrayPtr, size);
As can be seen inside the loops, I am modifying each color value, in this case making it into a single color image. Unfortunately only the bottom third will be modified, even with the number of loop iterations being equal to the number of pixels, and doing three operations by iteration, the number of it is equal to the number of bytes of the original image.
So I have no idea of what I am doing wrong and would be thankful for any recommendations.
The whole offset has to be multiplied by bpp, not only y:
imageArray [bpp*(x*height + y)] = 255;
imageArray [bpp*(x*height + y) + 1] = 0;
....
I think I understand your problem now, but it relies on some assumptions about how you are bringing in your data and what bpp means.
You are trying to loop over every pixel here and update the 3 values.
You set size = width*height*bpp where I can only assume bpp means bits-per-pixel and is the 3 showing up in your loop. Try stepping through this with x=1 and y=0. If the data is being layed out contiguously like:
RGB # x=0,y=0; RGB # x=1,y=0; ... then you can see you end up writing over your data from the first iteration of the loop. Everytime you nest the loop, the index should get multiplied entirely by the next levels dimension. Just replace x*height + 3*y with (x*height + y)*bpp assuming bpp = 3.
It all depends on the order that bytes a stored in the image array.
Your formulation suggest a by-column/by-row/by-color. But it can also be by-row/by-column/by-color or even by-color/by-row/by-column.
The index formulation should be
x*(b*h)+y*b+c
y*(b*w)+x*b+c
c*(w*h)+y*h+x
(b, w ,and h are color bytes, width and height)
Note how indexes cumulate in the sums. You have at least forgotten one multiplication, assuming the order is correct.

Convolutional network filter always negative

I asked a question about a network which I've been building last week, and I iterated on the suggestions which lead me to finding a few problems. I've come back to this project and fixed up all the issues and learnt a lot more about CNNs in the process. Now I'm stuck on an issue were all of my weights move to massively negative values, which coupled with the RELU ends in the output image always being completely black (making it impossible for the classifier to do it's job).
On two labeled images:
These are passed into a two layer network, one classifier (which gets 100% on its own) and a one filter 3*3 convolutional layer.
On the first iteration the output from the conv layer looks like (images in same order as above):
The filter is 3*3*3, due to the images being RGB. The weights are all random numbers between 0.0f-1.0f. On the next iteration the images are completely black, printing the filters shows that they are now in range of -49678.5f (the highest I can see) and -61932.3f.
This issue in turn is due to the gradients being passed back from the Logistic Regression/Linear layer being crazy high for the cross (label 0, prediction 0). For the circle (label 1, prediction 0) the values are between roughly -12 and -5, but for the cross they are all in the positive high 1000 to high 2000 range.
The code which sends these back looks something like (some parts omitted):
void LinearClassifier::Train(float * x,float output, float y)
{
float h = output - y;
float average = 0.0f;
for (int i =1; i < m_NumberOfWeights; ++i)
{
float error = h*x[i-1];
m_pGradients[i-1] = error;
average += error;
}
average /= static_cast<float>(m_NumberOfWeights-1);
for (int theta = 1; theta < m_NumberOfWeights; ++theta)
{
m_pWeights[theta] = m_pWeights[theta] - learningRate*m_pGradients[theta-1];
}
// Bias
m_pWeights[0] -= learningRate*average;
}
This is passed back to the single convolution layer:
// This code is in three nested for loops (for layer,for outWidth, for outHeight)
float gradient = 0.0f;
// ReLu Derivative
if ( m_pOutputBuffer[outputIndex] > 0.0f)
{
gradient = outputGradients[outputIndex];
}
for (int z = 0; z < m_InputDepth; ++z)
{
for ( int u = 0; u < m_FilterSize; ++u)
{
for ( int v = 0; v < m_FilterSize; ++v)
{
int x = outX + u - 1;
int y = outY + v - 1;
int inputIndex = x + y*m_OutputWidth + z*m_OutputWidth*m_OutputHeight;
int kernelIndex = u + v*m_FilterSize + z*m_FilterSize*m_FilterSize;
m_pGradients[inputIndex] += m_Filters[layer][kernelIndex]*gradient;
m_GradientSum[layer][kernelIndex] += input[inputIndex]*gradient;
}
}
}
This code is iterated over by passing each image in a one at a time fashion. The gradients are obviously going in the right direction but how do I stop the huge gradients from throwing the prediction function?
RELU activations are notorious for doing this. You usually have to use a low learning rate. The reasoning behind this is that when the RELU returns positive numbers it can continue to learn freely, but if a unit gets in a position where the signal coming into it is always negative it can become a "dead" neuron and never activate again.
Also initializing your weights is more delicate with RELU. It appears that you are initializing to range 0-1 which creates a huge bias. Two tips here - Use a range centered around 0, and a range that is much smaller. A normal distribution with mean 0 and std 0.02 usually works well.
I fixed it by downscaling the gradients int the CNN layer, but now I'm confused as to why this works/is needed so if anyone has any intuition as to why this works that'd be great.

Subsampling an array of numbers

I have a series of 100 integer values which I need to reduce/subsample to 77 values for the purpose of fitting into a predefined space on screen. This gives a fraction of 77/100 values-per-pixel - not very neat.
Assuming the 77 is fixed and cannot be changed, what are some typical techniques for subsampling 100 numbers down to 77. I get a sense that it will be a jagged mapping, by which I mean the first new value is the average of [0, 1] then the next value is [3], then average [4, 5] etc. But how do I approach getting the pattern for this mapping?
I am working in C++, although I'm more interested in the technique than implementation.
Thanks in advance.
Either if you downsample or you oversample, you are trying to reconstruct a signal over nonsampled points in time... so you have to make some assumptions.
The sampling theorem tells you that if you sample a signal knowing that it has no frequency components over half the sampling frequency, you can continously and completely recover the signal over the whole timing period. There's a way to reconstruct the signal using sinc() functions (this is sin(x)/x)
sinc() (indeed sin(M_PI/Sampling_period*x)/M_PI/x) is a function that has the following properties:
Its value is 1 for x == 0.0 and 0 for x == k*Sampling_period with k == 0, +-1, +-2, ...
It has no frequency component over half of the sampling_frequency derived from Sampling_period.
So if you consider the sum of the functions F_x(x) = Y[k]*sinc(x/Sampling_period - k) to be the sinc function that equals the sampling value at position k and 0 at other sampling value and sum over all k in your sample, you'll get the best continous function that has the properties of not having components on frequencies over half the sampling frequency and have the same values as your samples set.
Said this, you can resample this function at whatever position you like, getting the best way to resample your data.
This is by far, a complicated way of resampling data, (it has also the problem of not being causal, so it cannot be implemented in real time) and you have several methods used in the past to simplify the interpolation. you have to constructo all the sinc functions for each sample point and add them together. Then you have to resample the resultant function to the new sampling points and give that as a result.
Next is an example of the interpolation method just described. It accepts some input data (in_sz samples) and output interpolated data with the method described before (I supposed the extremums coincide, which makes N+1 samples equal N+1 samples, and this makes the somewhat intrincate calculations of (in_sz - 1)/(out_sz - 1) in the code (change to in_sz/out_sz if you want to make plain N samples -> M samples conversion:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
/* normalized sinc function */
double sinc(double x)
{
x *= M_PI;
if (x == 0.0) return 1.0;
return sin(x)/x;
} /* sinc */
/* interpolate a function made of in samples at point x */
double sinc_approx(double in[], size_t in_sz, double x)
{
int i;
double res = 0.0;
for (i = 0; i < in_sz; i++)
res += in[i] * sinc(x - i);
return res;
} /* sinc_approx */
/* do the actual resampling. Change (in_sz - 1)/(out_sz - 1) if you
* don't want the initial and final samples coincide, as is done here.
*/
void resample_sinc(
double in[],
size_t in_sz,
double out[],
size_t out_sz)
{
int i;
double dx = (double) (in_sz-1) / (out_sz-1);
for (i = 0; i < out_sz; i++)
out[i] = sinc_approx(in, in_sz, i*dx);
}
/* test case */
int main()
{
double in[] = {
0.0, 1.0, 0.5, 0.2, 0.1, 0.0,
};
const size_t in_sz = sizeof in / sizeof in[0];
const size_t out_sz = 5;
double out[out_sz];
int i;
for (i = 0; i < in_sz; i++)
printf("in[%d] = %.6f\n", i, in[i]);
resample_sinc(in, in_sz, out, out_sz);
for (i = 0; i < out_sz; i++)
printf("out[%.6f] = %.6f\n", (double) i * (in_sz-1)/(out_sz-1), out[i]);
return EXIT_SUCCESS;
} /* main */
There are different ways of interpolation (see wikipedia)
The linear one would be something like:
std::array<int, 77> sampling(const std::array<int, 100>& a)
{
std::array<int, 77> res;
for (int i = 0; i != 76; ++i) {
int index = i * 99 / 76;
int p = i * 99 % 76;
res[i] = ((p * a[index + 1]) + ((76 - p) * a[index])) / 76;
}
res[76] = a[99]; // done outside of loop to avoid out of bound access (0 * a[100])
return res;
}
Live example
Create 77 new pixels based on the weighted average of their positions.
As a toy example, think about the 3 pixel case which you want to subsample to 2.
Original (denote as multidimensional array original with RGB as [0, 1, 2]):
|----|----|----|
Subsample (denote as multidimensional array subsample with RGB as [0, 1, 2]):
|------|------|
Here, it is intuitive to see that the first subsample seems like 2/3 of the first original pixel and 1/3 of the next.
For the first subsample pixel, subsample[0], you make it the RGB average of the m original pixels that overlap, in this case original[0] and original[1]. But we do so in weighted fashion.
subsample[0][0] = original[0][0] * 2/3 + original[1][0] * 1/3 # for red
subsample[0][1] = original[0][1] * 2/3 + original[1][1] * 1/3 # for green
subsample[0][2] = original[0][2] * 2/3 + original[1][2] * 1/3 # for blue
In this example original[1][2] is the green component of the second original pixel.
Keep in mind for different subsampling you'll have to determine the set of original cells that contribute to the subsample, and then normalize to find the relative weights of each.
There are much more complex graphics techniques, but this one is simple and works.
Everything depends on what you wish to do with the data - how do you want to visualize it.
A very simple approach would be to render to a 100-wide image, and then smooth scale the image down to a narrower size. Whatever graphics/development framework you're using will surely support such an operation.
Say, though, that your goal might be to retain certain qualities of the data, such as minima and maxima. In such a case, for each bin, you're drawing a line of darker color up to the minimum value, and then continue with a lighter color up to the maximum. Or, you could, instead of just putting a pixel at the average value, you draw a line from the minimum to the maximum.
Finally, you might wish to render as if you had 77 values only - then the goal is to somehow transform the 100 values down to 77. This will imply some kind of an interpolation. Linear or quadratic interpolation is easy, but adds distortions to the signal. Ideally, you'd probably want to throw a sinc interpolator at the problem. A good list of them can be found here. For theoretical background, look here.

Optimized float Blur variations

I am looking for optimized functions in c++ for calculating areal averages of floats. the function is passed a source float array, a destination float array (same size as source array), array width and height, "blurring" area width and height.
The function should "wrap-around" edges for the blurring/averages calculations.
Here is example code that blur with a rectangular shape:
/*****************************************
* Find averages extended variations
*****************************************/
void findaverages_ext(float *floatdata, float *dest_data, int fwidth, int fheight, int scale, int aw, int ah, int weight, int xoff, int yoff)
{
printf("findaverages_ext scale: %d, width: %d, height: %d, weight: %d \n", scale, aw, ah, weight);
float total = 0.0;
int spos = scale * fwidth * fheight;
int apos;
int w = aw;
int h = ah;
float* f_temp = new float[fwidth * fheight];
// Horizontal
for(int y=0;y<fheight ;y++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel (including wrap-around edge)
for (int kx = 0; kx <= w; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Wrap
for (int kx = (fwidth-w); kx < fwidth; ++kx)
if (kx >= 0 && kx < fwidth)
total += floatdata[y*fwidth + kx];
// Store first window
f_temp[y*fwidth] = (total / (w*2+1));
for(int x=1;x<fwidth ;x++) // x width changes with y
{
// Substract pixel leaving window
if (x-w-1 >= 0)
total -= floatdata[y*fwidth + x-w-1];
// Add pixel entering window
if (x+w < fwidth)
total += floatdata[y*fwidth + x+w];
else
total += floatdata[y*fwidth + x+w-fwidth];
// Store average
apos = y * fwidth + x;
f_temp[apos] = (total / (w*2+1));
}
}
// Vertical
for(int x=0;x<fwidth ;x++)
{
Sleep(10); // Do not burn your processor
total = 0.0;
// Process entire window for first pixel
for (int ky = 0; ky <= h; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Wrap
for (int ky = fheight-h; ky < fheight; ++ky)
if (ky >= 0 && ky < fheight)
total += f_temp[ky*fwidth + x];
// Store first if not out of bounds
dest_data[spos + x] = (total / (h*2+1));
for(int y=1;y< fheight ;y++) // y width changes with x
{
// Substract pixel leaving window
if (y-h-1 >= 0)
total -= f_temp[(y-h-1)*fwidth + x];
// Add pixel entering window
if (y+h < fheight)
total += f_temp[(y+h)*fwidth + x];
else
total += f_temp[(y+h-fheight)*fwidth + x];
// Store average
apos = y * fwidth + x;
dest_data[spos+apos] = (total / (h*2+1));
}
}
delete f_temp;
}
What I need is similar functions that for each pixel finds the average (blur) of pixels from shapes different than rectangular.
The specific shapes are: "S" (sharp edges), "O" (rectangular but hollow), "+" and "X", where the average float is stored at the center pixel on destination data array. Size of blur shape should be variable, width and height.
The functions does not need to be pixelperfect, only optimized for performance. There could be separate functions for each shape.
I am also happy if anyone can tip me of how to optimize the example function above for rectangluar blurring.
What you are trying to implement are various sorts of digital filters for image processing. This is equivalent to convolving two signals where the 2nd one would be the filter's impulse response. So far, you regognized that a "rectangular average" is separable. By separable I mean, you can split the filter into two parts. One that operates along the X axis and one that operates along the Y axis -- in each case a 1D filter. This is nice and can save you lots of cycles. But not every filter is separable. Averaging along other shapres (S, O, +, X) is not separable. You need to actually compute a 2D convolution for these.
As for performance, you can speed up your 1D averages by properly implementing a "moving average". A proper "moving average" implementation only requires a fixed amount of little work per pixel regardless of the averaging "window". This can be done by recognizing that neighbouring pixels of the target image are computed by an average of almost the same pixels. You can reuse these sums for the neighbouring target pixel by adding one new pixel intensity and subtracting an older one (for the 1D case).
In case of arbitrary non-separable filters your best bet performance-wise is "fast convolution" which is FFT-based. Checkout www.dspguide.com. If I recall correctly, there is even a chapter on how to properly do "fast convolution" using the FFT algorithm. Although, they explain it for 1-dimensional signals, it also applies to 2-dimensional signals. For images you have to perform 2D-FFT/iFFT transforms.
To add to sellibitze's answer, you can use a summed area table for your O, S and + kernels (not for the X one though). That way you can convolve a pixel in constant time, and it's probably the fastest method to do it for kernel shapes that allow it.
Basically, a SAT is a data structure that lets you calculate the sum of any axis-aligned rectangle. For the O kernel, after you've built a SAT, you'd take the sum of the outer rect's pixels and subtract the sum of the inner rect's pixels. The S and + kernels can be implemented similarly.
For the X kernel you can use a different approach. A skewed box filter is separable:
You can convolve with two long, thin skewed box filters, then add the two resulting images together. The center of the X will be counted twice, so will you need to convolve with another skewed box filter, and subtract that.
Apart from that, you can optimize your box blur in many ways.
Remove the two ifs from the inner loop by splitting that loop into three loops - two short loops that do checks, and one long loop that doesn't. Or you could pad your array with extra elements from all directions - that way you can simplify your code.
Calculate values like h * 2 + 1 outside the loops.
An expression like f_temp[ky*fwidth + x] does two adds and one multiplication. You can initialize a pointer to &f_temp[ky*fwidth] outside the loop, and just increment that pointer in the loop.
Don't do the division by h * 2 + 1 in the horizontal step. Instead, divide by the square of that in the vertical step.