Prewitt filter, edge detection - c++

I have this code that implements Prewitt edge detection. I need to implement it with only one buffer: instead of creating a copy of the image, I have to edit the original image in place. So if I want to change a pixel with value 78, I can't store the new value (e.g. 100) until all surrounding pixels have read the old value 78. (The values are the color values of the pixels.) I have tried all day to figure it out but couldn't. If someone could write me some kind of pseudocode, I would be very grateful.
void filter_serial_prewitt(int *inBuffer, int *outBuffer, int width, int height){
for (int i = 1; i < width - 1; i ++) {
for (int j = 1; j < height - 1; j ++) {
int Fx = 0;
int Fy = 0;
int F = 0;
for (int m = -1; m <= 1; m++) {
for (int n = -1; n <= 1; n++) {
Fx += inBuffer[(j + n) * width + (i + m)] * n;
Fy += inBuffer[(j + n) * width + (i + m)] * m;
}
}
F = abs(Fx) + abs(Fy);
if (F < THRESHOLD){
outBuffer[j * width + i] = 255;
} else{
outBuffer[j * width + i] = 0;
}
}
}
}

One thing to know about a Prewitt operator is that it is separable. See the Wikipedia article for details.
To calculate a single output row, you need to do the following (pseudocode):
int* buffer = malloc (sizeof(int) * width);
for (int i = 0; i < width; i++)
{
// Do the vertical pass of the convolution of the first 3 rows into
// the buffer.
buffer [ i ] = vertical_convolve(inBuffer [ i ], vertical_kernel);
}
// Next, do the horizontal convolution of the first row. We need to
// keep the previous value in a temp buffer while we work
int temp0 = horizontal_convolve(buffer [ 0 ], horizontal_kernel);
for (int i = 1; i < width; i++)
{
int temp1 = horizontal_convolve(buffer[ i ], horizontal_kernel);
inBuffer [ i - 1 ] = temp0;
temp0 = temp1;
}
inBuffer [ width - 1 ] = temp0; // store the last pending value
That requires a buffer that is 1 pixel tall and the width of the image.
To work on the whole image, keep two such row buffers. After you calculate a pixel on the third line, you can replace the first pixel of the first line of the image with the first pixel of the first buffer, then put the newly calculated value into that buffer.
So in this scenario you never hold an entire second image, only two 1-pixel-tall buffers that are as wide as the image.
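A minimal sketch of the two-row-buffer idea, applied to the non-separated 3x3 loop from the question. This is a variation, not the separable version described above: the two buffers hold untouched copies of the previous and current rows, and the next row is read straight from the image because it has not been overwritten yet. The name prewitt_in_place and the threshold parameter (standing in for THRESHOLD) are assumptions made for illustration.

```cpp
#include <cstdlib>
#include <vector>

// In-place Prewitt edge detection using two row buffers instead of a second image.
// Border pixels are left unchanged, as in the original loop bounds.
void prewitt_in_place(int* img, int width, int height, int threshold) {
    if (width < 3 || height < 3) return;
    std::vector<int> prev(img, img + width);              // original row j - 1
    std::vector<int> curr(img + width, img + 2 * width);  // original row j
    for (int j = 1; j < height - 1; ++j) {
        const int* next = img + (j + 1) * width;  // row j + 1, still untouched
        int* dst = img + j * width;               // safe to overwrite: curr holds a copy
        for (int i = 1; i < width - 1; ++i) {
            int fx = (prev[i + 1] + curr[i + 1] + next[i + 1])
                   - (prev[i - 1] + curr[i - 1] + next[i - 1]);
            int fy = (next[i - 1] + next[i] + next[i + 1])
                   - (prev[i - 1] + prev[i] + prev[i + 1]);
            dst[i] = (std::abs(fx) + std::abs(fy) < threshold) ? 255 : 0;
        }
        prev.swap(curr);                  // original row j becomes the previous row
        curr.assign(next, next + width);  // save row j + 1 before it is overwritten
    }
}
```

The extra memory is 2 * width ints regardless of the image height.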

Related

Getting top layer of 3d noise

I've generated a cubic world using FastNoiseLite, but I don't know how to mark the top-level blocks as grass and the ones below them as dirt when using 3D noise.
TArray<float> CalculateNoise(const FVector& ChunkPosition)
{
Densities.Reset();
// ChunkSize is 32
for (int z = 0; z < ChunkSize; z++)
{
for (int y = 0; y < ChunkSize; y++)
{
for (int x = 0; x < ChunkSize; x++)
{
const float Noise = GetNoise(FVector(ChunkPosition.X + x, ChunkPosition.Y + y, ChunkPosition.Z + z));
Densities.Add(Noise - ChunkPosition.Z);
}
}
}
return Densities;
}
void AddCubeMaterial(const FVector& ChunkPosition)
{
const int32 DensityIndex = GetIndex(ChunkPosition);
const float Density = Densities[DensityIndex];
if (Density < 1)
{
// Add Grass block
}
// Add dirt block
}
float GetNoise(const FVector& Position) const
{
const float Height = 280.f;
if (bIs3dNoise)
{
return FastNoiseLiteObj->GetNoise(Position.X, Position.Y, Position.Z) * Height;
}
return FastNoiseLiteObj->GetNoise(Position.X, Position.Y) * Height;
}
This is the result when using 3D noise:
[image: 3D Noise result]
But if I switch to 2D noise it works perfectly fine:
[image: 2D Noise result]
This answer applies to Perlin-like noise.
Your integer chunk positions are not contiguous in noise space.
'Position' needs to be scaled by 1/Height so that the noise is sampled as a contiguous block; the result is then scaled by Height.
If you were happy with the XY axes (2D), you could limit the scaling to the Z axis:
FastNoiseLiteObj->GetNoise(Position.X, Position.Y, Position.Z / Height) * Height;
This adjustment gives a block location that is continuous in noise along Z with respect to Position(X,Y).
Edit in response to comments
Contiguous:
The noise algorithm guarantees continuous output in all dimensions.
By sampling every 32 pixels (discontiguous sampling), the continuity is broken, on purpose(?), and augmented by the Density.
To guarantee a top level grass layer:
Densities.Add(Noise + ((ChunkPosition.Z > Threshold) ? 1 : 0));
In your code, subtracting ChunkPosition.Z made the grass thicker as it went down. Add it back if you wish.
To add random overhangs/underhangs reduce the Density threshold randomly:
if (Density < ((rnd() < 0.125) ? 0.5 : 1))
I leave the definition of rnd() to your preferred random distribution.
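One possible rnd(), as a sketch only: a uniform draw in [0, 1) from the standard library. The helper names grass_threshold and is_grass are made up for illustration; the point is that the conditional expression has to be fully parenthesized (or split out like this) so that it is evaluated before the comparison with Density.

```cpp
#include <random>

// Hypothetical rnd(): uniform double in [0, 1). Swap in any distribution you prefer.
double rnd() {
    static std::mt19937 gen{std::random_device{}()};
    static std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(gen);
}

// Density threshold for one random draw r: lowered to 0.5 about one time in eight.
double grass_threshold(double r) {
    return (r < 0.125) ? 0.5 : 1.0;
}

// Same test as "Density < ((rnd() < 0.125) ? 0.5 : 1)", split into named steps.
bool is_grass(float density) {
    return density < grass_threshold(rnd());
}
```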
To almost always have overhangs, you need to look ahead at the noise of the next and previous blocks along Z.
Precalculate the noise values for the next line into alternating arrays that are 2 wider than the chunk width, so that the edge entries can be set to 0.
The algorithm is:
// declare arrays: currentnoise[ChunkSize + 2] and nextnoise[ChunkSize +2] and alpha=.2; //see text
for (int y = 0; y < ChunkSize; y++) // note the reorder y-z-x
{
// pre load currentnoise for z=0
currentnoise[0] = 0;
currentnoise[ChunkSize+1] = 0;
for (int x = 0; x < ChunkSize; x++)
{
currentnoise[x + 1] = GetNoise(FVector(ChunkPosition.X + x, ChunkPosition.Y + y, ChunkPosition.Z));
}
for (int z = 0; z < ChunkSize - 1; z++) // current row is z, next row is z + 1
{
nextnoise[0] = 0;
nextnoise[ChunkSize+1] = 0;
// load next
for (int x = 0; x < ChunkSize; x++)
{
nextnoise[x + 1] = GetNoise(FVector(ChunkPosition.X + x, ChunkPosition.Y + y, ChunkPosition.Z + z+1));
}
// apply current with next
for (int x = 0; x < ChunkSize; x++)
{
Densities.Add(currentnoise[x + 1] * .75 + nextnoise[x+2] * alpha + nextnoise[x] * alpha);
}
// move next to current in a memory-safe manner:
// it is faster to swap pointers, but this is much safer for portability
for (int i = 1; i < ChunkSize + 1; i++)
currentnoise[i]=nextnoise[i];
}
// apply last z(no next)
for (int x = 0; x < ChunkSize; x++)
{
Densities.Add(currentnoise[x + 1]);
}
}
Where alpha is approximately between .025 and .25 depending on preferred fill amounts.
The two innermost x loops could be merged into one, but they are left separate for readability (merging them requires 2 preloads).
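For reference, here is a self-contained sketch of the same row blending for a single y slab. It is not the Unreal code: sample(x, z) is a hypothetical stand-in for GetNoise at a fixed y, and blend_slab is an invented name. It also shows that with std::vector the two rows can be exchanged with std::swap, which is both cheap and safe.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Blends each row with its successor along z. Returns chunkSize * chunkSize
// densities in z-major order. Rows carry one zero entry of padding per side.
std::vector<float> blend_slab(int chunkSize, float alpha,
                              const std::function<float(int, int)>& sample) {
    std::vector<float> out;
    out.reserve(static_cast<std::size_t>(chunkSize) * chunkSize);
    std::vector<float> current(chunkSize + 2, 0.0f);
    std::vector<float> next(chunkSize + 2, 0.0f);
    for (int x = 0; x < chunkSize; ++x) current[x + 1] = sample(x, 0);  // preload z = 0
    for (int z = 0; z < chunkSize - 1; ++z) {
        for (int x = 0; x < chunkSize; ++x) next[x + 1] = sample(x, z + 1);
        for (int x = 0; x < chunkSize; ++x)
            out.push_back(current[x + 1] * 0.75f + next[x + 2] * alpha + next[x] * alpha);
        std::swap(current, next);  // constant time; the padding zeros stay in place
    }
    for (int x = 0; x < chunkSize; ++x) out.push_back(current[x + 1]);  // last z, no next
    return out;
}
```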

How to reorder raw image color data to achieve a specific 2 by 2 format from four images? (C++)

I have the raw color data for four images, let's call them 1, 2, 3, and 4. I am storing the data in an unsigned char * with allocated memory. Individually I can manipulate or encode the images but when trying to concatenate or order them into a single image it works but takes more time than I would like.
I would like to create a 2 by 2 of the raw image data to encode as a single image.
1 2
3 4
For my example each image is 400 by 225 with RGBA (360000 bytes). I'm doing a for loop with memcpy where
for (int j = 0; j < 225; j++)
{
std::memcpy(dest + (j * (400 + 400) * 4), src + (j * 400 * 4), 400 * 4);
}
for each image with an offset for the starting position added in (the example above would only work for the top left of course).
This works but I'm wondering if this is a solved problem with a better solution, either in an algorithm described somewhere or a small library.
#include <iostream>
const int width = 6;
const int height = 4;
constexpr int n = width * height;
int main()
{
unsigned char a[n], b[n], c[n], d[n];
unsigned char dst[n * 4];
int i = 0, j = 0;
/* init data */
for (; i < n; i++) {
a[i] = 'a';
b[i] = 'b';
c[i] = 'c';
d[i] = 'd';
}
/* re-order */
i = 0;
for (int y = 0; y < height; y++) {
for (int x = 0; x < width; x++, i++, j++) {
dst[i ] = a[j];
dst[i + width] = b[j];
dst[i + n * 2 ] = c[j];
dst[i + n * 2 + width] = d[j];
}
i += width;
}
/* print result */
i = 0;
for (int y = 0; y < height * 2; y++) {
for (int x = 0; x < width * 2; x++, i++)
std::cout << dst[i];
std::cout << '\n';
}
return 0;
}
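The same reordering can be written with one memcpy per source row, which is essentially what the question already does; a row is the largest run of bytes that stays contiguous in the destination. The sketch below assumes four tightly packed images of equal size; tile2x2 and its parameter names are invented for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Tiles four equally sized images into a 2x2 grid:
//   tl tr
//   bl br
// Each source is width * height * bytesPerPixel bytes with no row padding.
std::vector<unsigned char> tile2x2(const unsigned char* tl, const unsigned char* tr,
                                   const unsigned char* bl, const unsigned char* br,
                                   std::size_t width, std::size_t height,
                                   std::size_t bytesPerPixel) {
    const std::size_t srcStride = width * bytesPerPixel;  // bytes per source row
    const std::size_t dstStride = 2 * srcStride;          // bytes per destination row
    std::vector<unsigned char> dst(dstStride * height * 2);
    for (std::size_t y = 0; y < height; ++y) {
        unsigned char* top = dst.data() + y * dstStride;
        unsigned char* bottom = dst.data() + (y + height) * dstStride;
        const std::size_t srcOffset = y * srcStride;
        std::memcpy(top, tl + srcOffset, srcStride);
        std::memcpy(top + srcStride, tr + srcOffset, srcStride);
        std::memcpy(bottom, bl + srcOffset, srcStride);
        std::memcpy(bottom + srcStride, br + srcOffset, srcStride);
    }
    return dst;
}
```

For the 400 by 225 RGBA case that is 900 memcpy calls of 1600 bytes each.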

grayscale Laplace sharpening implementation

I am trying to implement Laplace sharpening using C++. Here's my code so far:
img = imread("cow.png", 0);
Mat convoSharp() {
//creating new image
Mat res = img.clone();
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = 0.0;
}
}
//variable declaration
int filter[3][3] = { {0,1,0},{1,-4,1},{0,1,0} };
//int filter[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1} };
int height = img.rows;
int width = img.cols;
int filterHeight = 3;
int filterWidth = 3;
int newImageHeight = height - filterHeight + 1;
int newImageWidth = width - filterWidth + 1;
int i, j, h, w;
//convolution
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
}
}
}
}
//img - laplace
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = img.at<uchar>(y, x) - res.at<uchar>(y, x);
}
}
return res;
}
I don't really know what went wrong. I also tried a different filter, (1,1,1),(1,-8,1),(1,1,1), and the result is the same (more or less). I don't think that I need to normalize the result because it is in the range 0 - 255. Can anyone explain what went wrong in my code?
Problem: uchar is too small to hold the partial results of the filtering operation.
You should create a temporary variable and accumulate all the filtered positions into it. Then check whether the value of temp is in the range <0,255>; if not, you need to clamp the end result to fit <0,255>.
When the line below executes,
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
the partial result may be greater than 255 (the maximum value of uchar) or negative (your filter contains -4 or -8). temp has to be a signed integer type to handle the case when the partial result is negative.
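To make the wraparound concrete, here is a small sketch without OpenCV that sums the same filter terms twice: once in an unsigned char, as the original line effectively does, and once in an int that is clamped at the end. The function names are invented for illustration.

```cpp
#include <algorithm>
#include <vector>

// Accumulates in an unsigned char: every step is reduced modulo 256.
unsigned char sum_in_uchar(const std::vector<int>& terms) {
    unsigned char acc = 0;
    for (int t : terms) acc = static_cast<unsigned char>(acc + t);
    return acc;
}

// Accumulates in a signed int and clamps once at the end.
unsigned char sum_in_int_clamped(const std::vector<int>& terms) {
    int acc = 0;
    for (int t : terms) acc += t;
    return static_cast<unsigned char>(std::min(255, std::max(0, acc)));
}
```

For the terms {200, 200} the first function returns 144 (400 modulo 256) while the second returns 255. For four neighbours of 50 and a centre term of -4 * 100, the true sum is -200: the first function returns 56 and the second returns 0.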
Fix:
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
int temp = res.at<uchar>(i,j); // added
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
temp += filter[h - i][w - j] * img.at<uchar>(h,w); // add to temp
}
}
res.at<uchar>(i,j) = cv::saturate_cast<uchar>(temp); // clamp temp to <0,255>
}
}
You should also clamp values to <0,255> range when you do the subtraction of images.
The problem is partially that you’re overflowing your uchar, as rafix07 suggested, but that is not the full problem.
The Laplace of an image contains negative values. It has to. And you can’t clamp those to 0, you need to preserve the negative values. Also, it can take values up to 4*255 given your version of the filter. This means that you need to use a signed 16-bit type to store this output.
But there is a simpler and more efficient approach!
You are computing img - laplace(img). In terms of convolutions (*), this is 1 * img - laplace_kernel * img = (1 - laplace_kernel) * img. That is to say, you can combine both operations into a single convolution. The 1 kernel that doesn’t change the image is [(0,0,0),(0,1,0),(0,0,0)]. Subtract your Laplace kernel from that and you obtain [(0,-1,0),(-1,5,-1),(0,-1,0)].
So, simply compute the convolution with that kernel, and do it using int as intermediate type, which you then clamp to the uchar output range as shown by rafix07.
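A sketch of that single-pass version on a plain grayscale buffer, without OpenCV so that it stands alone. The function name sharpen and the choice to leave border pixels unchanged are assumptions; the kernel is the combined one derived above.

```cpp
#include <algorithm>
#include <vector>

// Sharpens a row-major 8-bit grayscale image with the kernel
// [(0,-1,0),(-1,5,-1),(0,-1,0)]. Accumulates in int, clamps to [0, 255].
std::vector<unsigned char> sharpen(const std::vector<unsigned char>& img,
                                   int width, int height) {
    static const int kernel[3][3] = {{0, -1, 0}, {-1, 5, -1}, {0, -1, 0}};
    std::vector<unsigned char> out(img);  // border pixels keep their input values
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int acc = 0;  // signed and wide enough for 5*255 and -4*255
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += kernel[dy + 1][dx + 1] * img[(y + dy) * width + (x + dx)];
            out[y * width + x] =
                static_cast<unsigned char>(std::min(255, std::max(0, acc)));
        }
    }
    return out;
}
```

A flat region passes through unchanged (5v - 4v = v), which is a quick sanity check for the kernel.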

C++ Pattern Matching with FFT cross-correlation (Images)

Hi everyone, I am trying to implement pattern matching with FFT, but I am not sure what the result should be. (I think I am missing something, even though I have read a lot about the problem and tried a lot of different implementations; this one is the best so far.) Here is my FFT correlation function.
void fft2d(fftw_complex**& a, int rows, int cols, bool forward = true)
{
fftw_plan p;
for (int i = 0; i < rows; ++i)
{
p = fftw_plan_dft_1d(cols, a[i], a[i], forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p); // release the plan so it does not leak
}
fftw_complex* t = (fftw_complex*)fftw_malloc(rows * sizeof(fftw_complex));
for (int j = 0; j < cols; ++j)
{
for (int i = 0; i < rows; ++i)
{
t[i][0] = a[i][j][0];
t[i][1] = a[i][j][1];
}
p = fftw_plan_dft_1d(rows, t, t, forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p); // release the plan so it does not leak
for (int i = 0; i < rows; ++i)
{
a[i][j][0] = t[i][0];
a[i][j][1] = t[i][1];
}
}
fftw_free(t);
}
int findCorrelation(int argc, char* argv[])
{
BMP bigImage;
BMP keyImage;
BMP result;
RGBApixel blackPixel = { 0, 0, 0, 1 };
const bool swapQuadrants = (argc == 4);
if (argc < 3 || argc > 4) {
cout << "correlation img1.bmp img2.bmp" << endl;
return 1;
}
if (!keyImage.ReadFromFile(argv[1])) {
return 1;
}
if (!bigImage.ReadFromFile(argv[2])) {
return 1;
}
//Preparations
const int maxWidth = std::max(bigImage.TellWidth(), keyImage.TellWidth());
const int maxHeight = std::max(bigImage.TellHeight(), keyImage.TellHeight());
const int rowsCount = maxHeight;
const int colsCount = maxWidth;
BMP bigTemp = bigImage;
BMP keyTemp = keyImage;
keyImage.SetSize(maxWidth, maxHeight);
bigImage.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
if (i < bigTemp.TellHeight() && j < bigTemp.TellWidth()) {
p1 = bigTemp.GetPixel(j, i);
} else {
p1 = blackPixel;
}
bigImage.SetPixel(j, i, p1);
RGBApixel p2;
if (i < keyTemp.TellHeight() && j < keyTemp.TellWidth()) {
p2 = keyTemp.GetPixel(j, i);
} else {
p2 = blackPixel;
}
keyImage.SetPixel(j, i, p2);
}
//Here is where the transforms begin
fftw_complex **a = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **b = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **c = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
for (int i = 0; i < rowsCount; ++i) {
a[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
b[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
c[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
p1 = bigImage.GetPixel(j, i);
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
RGBApixel p2;
p2 = keyImage.GetPixel(j, i);
b[i][j][0] = (0.299*p2.Red + 0.587*p2.Green + 0.114*p2.Blue);
b[i][j][1] = 0.0;
}
}
fft2d(a, rowsCount, colsCount);
fft2d(b, rowsCount, colsCount);
result.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
fftw_complex& y = a[i][j];
fftw_complex& x = b[i][j];
double u = x[0], v = x[1];
double m = y[0], n = y[1];
c[i][j][0] = u*m + n*v;
c[i][j][1] = v*m - u*n;
int fx = j;
if (fx>(colsCount / 2)) fx -= colsCount;
int fy = i;
if (fy>(rowsCount / 2)) fy -= rowsCount;
float r2 = (fx*fx + fy*fy);
const double cuttoffCoef = (maxWidth * maxHeight) / 37992.;
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
}
fft2d(c, rowsCount, colsCount, false);
const int halfCols = colsCount / 2;
const int halfRows = rowsCount / 2;
if (swapQuadrants) {
for (int i = 0; i < halfRows; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i + halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i + halfRows][j + halfCols][1]);
}
for (int i = halfRows; i < rowsCount; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i - halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i - halfRows][j + halfCols][1]);
}
}
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
const double& g = c[i][j][0];
RGBApixel pixel;
pixel.Alpha = 0;
int gInt = 255 - static_cast<int>(std::floor(g + 0.5));
pixel.Red = gInt;
pixel.Green = gInt;
pixel.Blue = gInt;
result.SetPixel(j, i, pixel);
}
BMP res;
res.SetSize(maxWidth, maxHeight);
result.WriteToFile("result.bmp");
return 0;
}
Sample output
This question would probably be more appropriately posted on another site like Cross Validated (metaoptimize.com used to also be a good one, but it appears to be gone).
That said:
There are two similar operations you can perform with an FFT: convolution and correlation. Convolution is used for determining how two signals interact with each other, whereas correlation can be used to express how similar two signals are to each other. Make sure you're doing the right operation, as they're both commonly implemented through a DFT.
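In the frequency domain the two differ by a single complex conjugate: convolution multiplies the spectra, correlation multiplies one spectrum by the conjugate of the other. Below is a 1D sketch with a naive O(n^2) DFT in place of FFTW so that it is self-contained. The function names are invented, and both inputs are assumed to have the same length.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive DFT. sign = -1 is the forward transform, sign = +1 the unnormalized inverse.
cvec dft(const cvec& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            out[k] += in[t] * std::polar(1.0, sign * 2.0 * pi * double(k * t) / double(n));
    return out;
}

// Circular cross-correlation r[k] = sum over n of a[n + k] * b[n].
std::vector<double> circular_correlation(const std::vector<double>& a,
                                         const std::vector<double>& b) {
    const cvec A = dft(cvec(a.begin(), a.end()), -1);
    const cvec B = dft(cvec(b.begin(), b.end()), -1);
    cvec C(a.size());
    for (std::size_t i = 0; i < C.size(); ++i)
        C[i] = A[i] * std::conj(B[i]);  // plain A[i] * B[i] would give convolution
    const cvec c = dft(C, +1);
    std::vector<double> r(a.size());
    for (std::size_t i = 0; i < r.size(); ++i)
        r[i] = c[i].real() / double(a.size());  // normalize the inverse transform
    return r;
}
```

If b is a single impulse at index 0 and a is the same impulse shifted to index 2, the result peaks at index 2, which is the shift.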
For this type of application of DFTs, you usually wouldn't extract any useful information from the Fourier spectrum itself unless you were looking for frequencies common to both data sources (e.g., if you were comparing two bridges to see if their supports are spaced similarly).
Your 3rd image looks a lot like the power domain; normally I see the correlation output entirely grey except where overlap occurred. Your code definitely appears to be computing the inverse DFT, so unless I'm missing something the only other explanation I've come up with for the fuzzy look could be some of the "fudge factor" code in there like:
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
As for what you should expect: wherever there are common elements between the two images you'll see a peak. The larger the peak, the more similar the two images are near that region.
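Locating the best match programmatically then comes down to finding the largest value in the correlation output. A minimal sketch, assuming the real parts have been copied into a row-major array; find_peak is an invented name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns (row, col) of the largest value in a non-empty row-major array
// that has cols entries per row.
std::pair<int, int> find_peak(const std::vector<double>& corr, int cols) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < corr.size(); ++i)
        if (corr[i] > corr[best]) best = i;
    const std::size_t width = static_cast<std::size_t>(cols);
    return {static_cast<int>(best / width), static_cast<int>(best % width)};
}
```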
Some comments and/or recommended changes:
1) Convolution & correlation are not scale invariant operations. In other words, the size of your pattern image can make a significant difference in your output.
2) Normalize your images before correlation.
When you get the image data ready for the forward DFT pass:
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
/* ... */
How you grayscale the image is your business (though I would've picked something like sqrt( r*r + b*b + g*g )). However, I don't see you doing anything to normalize the image.
The word "normalize" can take on a few different meanings in this context. Two common types:
normalize the range of values between 0.0 and 1.0
normalize the "whiteness" of the images
3) Run your pattern image through an edge enhancement filter. I've personally made use of Canny, Sobel, and I think I messed with a few others. As I recall, Canny was "quick'n dirty", Sobel was more expensive, but I got comparable results when it came time to do correlation. See chapter 24 of the "dsp guide" book that's freely available online. The whole book is worth your time, but if you're low on time then at a minimum chapter 24 will help a lot.
4) Re-scale the output image between [0, 255]; if you want to implement thresholds, do it after this step because the thresholding step is lossy.
My memory on this one is hazy, but as I recall (edited for clarity):
You can scale the final image pixels (before rescaling) between [-1.0, 1.0] by dividing off the largest power spectrum value from the entire power spectrum
The largest power spectrum value is, conveniently enough, the center-most value in the power spectrum (corresponding to the lowest frequency)
If you divide it off the power spectrum, you'll end up doing twice the work; since FFTs are linear, you can delay the division until after the inverse DFT pass to when you're re-scaling the pixels between [0..255].
If after rescaling most of your values end up so black you can't see them, you can use a solution to the ODE y' = y(1 - y) (one example is the sigmoid f(x) = 1 / (1 + exp(-c*x) ), for some scaling factor c that gives better gradations). This has more to do with improving your ability to interpret the results visually than anything you might use to programmatically find peaks.
Edit: I said [0, 255] above. I suggest you rescale to [128, 255] or some other lower bound that is gray rather than black.
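Both the range normalization from point 2 and the output rescaling from point 4 are the same linear mapping with different target ranges. A sketch with invented function names; mapping flat input to the lower bound is an arbitrary choice made here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly maps the values so that the minimum lands on lo and the maximum on hi.
std::vector<double> rescale(const std::vector<double>& v, double lo, double hi) {
    if (v.empty()) return {};
    const auto mm = std::minmax_element(v.begin(), v.end());
    const double vmin = *mm.first;
    const double vmax = *mm.second;
    std::vector<double> out(v.size(), lo);
    if (vmax == vmin) return out;  // flat input: everything maps to lo
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = lo + (v[i] - vmin) / (vmax - vmin) * (hi - lo);
    return out;
}

// Optional sigmoid for display, with scaling factor c as described above.
double sigmoid(double x, double c) {
    return 1.0 / (1.0 + std::exp(-c * x));
}
```

Use rescale(pixels, 0.0, 1.0) before the forward DFT and rescale(result, 128.0, 255.0) on the correlation output before writing it to the image.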

How to load nearby pixels using pointer

Suppose I have an image matrix and I am at a particular pixel [say 4] like this:
0 1 2
3 `4` 5
6 7 8
I am trying to cycle through all pixels and access neighbors 0,1,2, 3,5, 6,7,8, whose values I am storing in the array called Pixel. Here is my attempt at it using OpenCV; kindly tell me where I am going wrong.
I am using pointer temp_ptr to access the IplImage image.
uchar* temp_ptr=0 ;
CvScalar Pixel[3][3];
int rows=image->height,cols=image->width,row,col;
for( row = 0; row < rows-2; ++row)
{
for ( col = 0; col < cols-2; ++col)
{
temp_ptr = &((uchar*)(image->imageData + (image->widthStep*row)))[col*3];
for (int krow = -1 ; krow <= 1; krow++)
{
for (int kcol = -1; kcol <= 1; kcol++)
{
temp_ptr = &((uchar*)(image->imageData + (image->widthStep*row+krow)))[(col+kcol)*3];
for(int i=0; i < 3; i++)
{
for(int j=0; j < 3; j++)
{
for(int k=0; k < 3; k++)
{
Pixel[i][j].val[k]=temp_ptr[k];
}
}
}
}
}
}
}
I am not really sure how to load the surrounding pixels using temp_ptr, please help me out.
Well sir, it sounds like you want to do convolution, and doing it this way when you have OpenCV at your fingertips is a bit like hammering a can opener on your Spaghettios to burst it open by blunt force.
In fact, what you're doing is almost exactly the output of cv::blur(src, dst, cv::Size(3,3)) except it also includes the center pixel in the average.
If you want to exclude the center pixel then you can create a custom kernel - just a matrix with appropriate weights:
[.125 .125 .125
.125 0 .125
.125 .125 .125 ]
and apply this to the image with cv::filter2D(src, dst, -1, kernel).
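For comparison, this is roughly what that kernel computes, written as a plain loop over a single-channel buffer with no OpenCV. It is a sketch: neighbor_mean is an invented name, borders are left unchanged, and the integer division truncates where a library filter may round.

```cpp
#include <vector>

// Replaces each interior pixel with the mean of its 8 neighbours,
// excluding the pixel itself. Image is row-major, one byte per pixel.
std::vector<unsigned char> neighbor_mean(const std::vector<unsigned char>& img,
                                         int width, int height) {
    std::vector<unsigned char> out(img);  // border pixels keep their input values
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    if (dy != 0 || dx != 0)  // skip the centre pixel
                        sum += img[(y + dy) * width + (x + dx)];
            out[y * width + x] = static_cast<unsigned char>(sum / 8);
        }
    }
    return out;
}
```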
Assuming image->imageData is in RGB format, so there are 3 bytes for each pixel, you could do something like this:
int rows = image->height;
int cols = image->width;
int step = image->widthStep; // bytes per row, including any padding
const uchar* neighbors[8]; // each entry points at the first channel of one neighbor
// start at 1 and stop one short so that every pixel has all 8 neighbors
for (int row = 1; row < rows - 1; row++) {
for (int col = 1; col < cols - 1; col++) {
const uchar* center = (const uchar*)image->imageData + row * step + col * 3;
neighbors[0] = center - step - 3; // pixel 0 from your example
neighbors[1] = center - step; // 1
neighbors[2] = center - step + 3; // 2
neighbors[3] = center - 3; // 3
neighbors[4] = center + 3; // 5
neighbors[5] = center + step - 3; // 6
neighbors[6] = center + step; // 7
neighbors[7] = center + step + 3; // 8
// calculate averages here and store them somewhere (in a vector perhaps)
}
}
Note I didn't test this code.
First of all you have to start learning some programming. Your complete code is a mess.
Some major problems I could quickly find:
First of all, you have to start your first two for loops from 1 (because you subtract 1 when you apply the window); otherwise you will end up reading memory addresses that are not allocated.
Second, the first temp_ptr = &((uchar*)(image->imageData + (image->widthStep*row)))[col*3] is useless, so you can remove it.
The other one,
temp_ptr = &((uchar*)(image->imageData + (image->widthStep*row+krow)))[(col+kcol)*3];
has a small operator precedence problem. It should be:
temp_ptr = &((uchar*)(image->imageData + image->widthStep*(row+krow)))[(col+kcol)*3];
You don't need the other 3 inner loops.
Also, it is not clear what you want to do: do you want to get the neighborhood of a specific pixel (then you need no loops), or do you want to apply a kernel to each pixel of the image?
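Putting those fixes together, a sketch of loading the 3x3 neighborhood of one pixel. It works on a raw byte buffer rather than an IplImage so that it compiles on its own: data and widthStep stand in for image->imageData and image->widthStep, 3 interleaved channels are assumed, and the Neighborhood struct is invented for illustration. The caller must keep row and col at least 1 away from every border.

```cpp
// 3x3 window of 3-channel pixels: v[window row][window column][channel].
struct Neighborhood {
    unsigned char v[3][3][3];
};

// Copies the pixel at (row, col) and its 8 neighbours out of an interleaved
// 3-channel image whose rows are widthStep bytes apart.
Neighborhood gather_neighbors(const unsigned char* data, int widthStep,
                              int row, int col) {
    Neighborhood n{};
    for (int krow = -1; krow <= 1; ++krow) {
        // widthStep multiplies the whole (row + krow), not just row
        const unsigned char* line = data + widthStep * (row + krow);
        for (int kcol = -1; kcol <= 1; ++kcol) {
            const unsigned char* px = line + (col + kcol) * 3;
            for (int k = 0; k < 3; ++k)
                n.v[krow + 1][kcol + 1][k] = px[k];
        }
    }
    return n;
}
```

Only the two window loops and the channel loop remain; the window indices krow + 1 and kcol + 1 replace the extra i and j loops from the question.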