I am trying to implement Laplace sharpening in C++. Here's my code so far:
img = imread("cow.png", 0);
Mat convoSharp() {
//creating new image
Mat res = img.clone();
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = 0.0;
}
}
//variable declaration
int filter[3][3] = { {0,1,0},{1,-4,1},{0,1,0} };
//int filter[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1} };
int height = img.rows;
int width = img.cols;
int filterHeight = 3;
int filterWidth = 3;
int newImageHeight = height - filterHeight + 1;
int newImageWidth = width - filterWidth + 1;
int i, j, h, w;
//convolution
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
}
}
}
}
//img - laplace
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = img.at<uchar>(y, x) - res.at<uchar>(y, x);
}
}
return res;
}
I don't really know what went wrong. I also tried a different filter, (1,1,1),(1,-8,1),(1,1,1), and the result is more or less the same. I don't think I need to normalize the result, because it is already in the range 0 - 255. Can anyone explain what went wrong in my code?
Problem: uchar is too small to hold the partial results of the filtering operation.
You should create a temporary variable and accumulate all the filtered positions in it. Then check whether the value of temp is in the range <0,255>; if it is not, clamp the end result to fit <0,255>.
By executing below line
res.at<uchar>(i,j) += filter[h - i][w - j] * img.at<uchar>(h,w);
the partial result may be greater than 255 (the max value of uchar) or negative (your filter contains -4 or -8). temp has to be a signed integer type to handle the case where the partial result is negative.
Fix:
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
int temp = res.at<uchar>(i,j); // added
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
temp += filter[h - i][w - j] * img.at<uchar>(h,w); // add to temp
}
}
// clamp temp to <0,255> before storing it back into the uchar image
res.at<uchar>(i,j) = saturate_cast<uchar>(temp);
}
}
You should also clamp values to <0,255> range when you do the subtraction of images.
The problem is partially that you’re overflowing your uchar, as rafix07 suggested, but that is not the full problem.
The Laplace of an image contains negative values. It has to. And you can’t clamp those to 0; you need to preserve the negative values. It can also take values up to 4*255 given your version of the filter. This means you need a signed 16-bit type to store this output.
But there is a simpler and more efficient approach!
You are computing img - laplace(img). In terms of convolutions (*), this is 1 * img - laplace_kernel * img = (1 - laplace_kernel) * img. That is to say, you can combine both operations into a single convolution. The 1 kernel that doesn’t change the image is [(0,0,0),(0,1,0),(0,0,0)]. Subtract your Laplace kernel from that and you obtain [(0,-1,0),(-1,5,-1),(0,-1,0)].
So, simply compute the convolution with that kernel, and do it using int as intermediate type, which you then clamp to the uchar output range as shown by rafix07.
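As a rough illustration of the single-convolution approach, here is a minimal sketch. It assumes a row-major 8-bit grayscale buffer in a std::vector rather than a cv::Mat, and the name sharpen is hypothetical; border pixels are simply copied, which is only one of several possible conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sharpen a row-major 8-bit grayscale image with the combined kernel
// [(0,-1,0),(-1,5,-1),(0,-1,0)]. Border pixels are copied unchanged.
// Assumption: the image is a plain std::vector, not a cv::Mat.
std::vector<std::uint8_t> sharpen(const std::vector<std::uint8_t>& img,
                                  int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * height);
    static const int kernel[3][3] = {{0, -1, 0}, {-1, 5, -1}, {0, -1, 0}};
    std::vector<std::uint8_t> out(img);  // start from a copy so the border is kept
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int sum = 0;  // signed accumulator; possible range is -1020..1275
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    sum += kernel[dy + 1][dx + 1] * img[(y + dy) * width + (x + dx)];
                }
            }
            // Clamp only once, on the final value, then narrow to 8 bits.
            out[y * width + x] =
                static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
        }
    }
    return out;
}
```

On a flat region the kernel weights sum to 1, so the pixel value is unchanged; only pixels that differ from their neighbours are pushed further away from them.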
Related
I am trying to implement a Laplacian filter for sharpening an image,
but the result is kind of grey, and I don't know what went wrong with my code.
Here's my work so far
img = imread("moon.png", 0);
Mat convoSharp() {
//creating new image
Mat res = img.clone();
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
res.at<uchar>(y, x) = 0.0;
}
}
//variable declaration
//change -5 to -4 for original result.
int filter[3][3] = { {0,1,0},{1,-4,1},{0,1,0} };
//int filter[3][3] = { {-1,-2,-1},{0,0,0},{1,2,1} };
int height = img.rows;
int width = img.cols;
int **temp = new int*[height];
for (int i = 0; i < height; i++) {
temp[i] = new int[width];
}
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
temp[i][j] = 0;
}
}
int filterHeight = 3;
int filterWidth = 3;
int newImageHeight = height - filterHeight + 1;
int newImageWidth = width - filterWidth + 1;
int i, j, h, w;
//convolution
for (i = 0; i < newImageHeight; i++) {
for (j = 0; j < newImageWidth; j++) {
for (h = i; h < i + filterHeight; h++) {
for (w = j; w < j + filterWidth; w++) {
temp[i][j] += filter[h - i][w - j] * (int)img.at<uchar>(h, w);
}
}
}
}
//find max and min
int max = 0;
int min = 100;
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
if (temp[i][j] > max) {
max = temp[i][j];
}
if (temp[i][j] < min) {
min = temp[i][j];
}
}
}
//clamp 0 - 255
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
res.at<uchar>(i, j) = 0 + (temp[i][j] - min)*(255 - 0) / (max - min);
}
}
//empty the temp array
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
temp[i][j] = 0;
}
}
//img - res and store it in temp array
for (int y = 0; y < res.rows; y++) {
for (int x = 0; x < res.cols; x++) {
//int a = (int)img.at<uchar>(y, x) - (int)res.at<uchar>(y, x);
//cout << a << endl;
temp[y][x] = (int)img.at<uchar>(y, x) - (int)res.at<uchar>(y, x);
}
}
//find the new max and min
max = 0;
min = 100;
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
if (temp[i][j] > max) {
max = temp[i][j];
}
if (temp[i][j] < min) {
min = temp[i][j];
}
}
}
//clamp it back to 0-255
for (int i = 0; i < height; i++) {
for (int j = 0; j < width; j++) {
res.at<uchar>(i, j) = 0 + (temp[i][j] - min)*(255 - 0) / (max - min);
temp[i][j] = (int)res.at<uchar>(i, j);
}
}
return res;
}
And here's the result
As you can see in my code above, I already normalize the pixel values to 0-255. I still don't know what went wrong here. Can anyone explain why this happens?
The greyness is because, as Max suggested in his answer, you are scaling to the 0-255 range, not clamping (as your comments in the code suggest).
However, that is not all of the issues in your code. The output of the Laplace operator contains negative values. You nicely store these in an int. But then you scale and copy over to a char. Don't do that!
You need to combine the result of the Laplace, unchanged, with your image (with your kernel, whose centre weight is negative, that means subtracting it). This way, some pixels in your image will become darker, and some lighter. This is what causes the edges to appear sharper.
Simply skip some of the loops in your code, and keep one that does temp = img - temp. That result you can freely scale or clamp to the output range and cast to char.
To clamp, simply set any pixel values below 0 to 0, and any above 255 to 255. Don't compute min/max and scale as you do, because there you reduce contrast and create the greyish wash over your image.
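The difference between the two operations can be seen on a handful of made-up values. This is only a sketch: the function names are hypothetical, and the values stand in for an img - Laplace result stored in int.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Clamp: values outside [0, 255] are cut off, values inside are untouched.
std::vector<int> clampToByteRange(std::vector<int> v) {
    for (int& p : v) p = std::min(255, std::max(0, p));
    return v;
}

// Min/max rescale: maps [min, max] linearly onto [0, 255], so every
// in-range value moves whenever the filter overshoots on either side.
std::vector<int> rescaleToByteRange(std::vector<int> v) {
    assert(!v.empty());
    const auto mm = std::minmax_element(v.begin(), v.end());
    const int lo = *mm.first;
    const int hi = *mm.second;
    if (hi == lo) return std::vector<int>(v.size(), 0);  // flat input
    for (int& p : v) p = (p - lo) * 255 / (hi - lo);
    return v;
}
```

For the input {-255, 0, 128, 255, 510}, clamping gives {0, 0, 128, 255, 255}, while rescaling gives {0, 85, 127, 170, 255}: what used to be black (0) becomes dark gray (85) and what used to be white (255) becomes light gray (170), which is the washed-out look described above.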
Your recent question is quite similar (though the problem in the code was different), read my answer there again, it suggests a way to further simplify your code so that img-Laplace becomes a single convolution.
The problem is that you are rescaling the image instead of clamping it. Look at the bottom left border of the moon: there are very bright pixels next to very dark pixels, and then some gray pixels right beside the bright ones. Your sharpening filter will really spike on that bright border and increase the maximum. Similarly, the black pixels will be reduced even further.
You then determine minimum and maximum and rescale the entire image. This necessarily means the entire image will lose contrast when displayed in the previous gray scale, because your filter outputted pixel values above 255 and below 0.
Look closely at the border of the moon in the output image:
There is a black halo (the new 0) and a bright, sharp edge (the new 255). (The browser image scaling made it less crisp in this screenshot; look at your original output.) Everything else was squashed by the rescaling, so what was previously black (0) is now dark gray.
I was learning filters in OpenCV, but I'm a little confused about the Laplacian filter. My result is very different from the Laplacian filter in OpenCV lib.
First, I apply a Gaussian filter to the image:
Mat filtroGauss(Mat src){
Mat gauss = src.clone();
Mat temp(src.rows+2,src.cols+2,DataType<uchar>::type);
int y,x;
for (y=0; y<src.rows; y++){
for (x=0; x<src.cols; x++) temp.at<uchar>(y+1,x+1) = src.at<uchar>(y,x);
}
int mask[lenMask*lenMask];
mask[0] = mask[2] = mask[6] = mask[8] = 1;
mask[1] = mask[3] = mask[5] = mask[7] = 2;
mask[4] = 4;
int denominatore = 0;
for (int i=0; i<lenMask*lenMask; i++) denominatore += mask[i];
int value[lenMask*lenMask];
for(y=0; y<src.rows; y++){
for (x=0; x<src.cols; x++){
value[0] = temp.at<uchar>(y-1,x-1)*mask[0];
value[1] = temp.at<uchar>(y-1,x)*mask[1];
value[2] = temp.at<uchar>(y-1,x+1)*mask[2];
value[3] = temp.at<uchar>(y,x-1)*mask[3];
value[4] = temp.at<uchar>(y,x)*mask[4];
value[5] = temp.at<uchar>(y,x+1)*mask[5];
value[6] = temp.at<uchar>(y+1,x-1)*mask[6];
value[7] = temp.at<uchar>(y+1,x)*mask[7];
value[8] = temp.at<uchar>(y+1,x+1)*mask[8];
int avg = 0;
for(int i=0; i<lenMask*lenMask; i++)avg+=value[i];
avg = avg/denominatore;
gauss.at<uchar>(y,x) = avg;
}
}
return gauss;
}
Then I use the Laplacian function:
L(y,x) = f(y-1,x) + f(y+1,x) + f(y,x-1) + f(y,x+1) - 4*f(y,x)
Mat filtroLaplace(Mat src){
Mat output = src.clone();
Mat temp = src.clone();
int y,x;
for (y =1; y<src.rows-1; y++){
for(x =1; x<src.cols-1; x++){
output.at<uchar>(y,x) = temp.at<uchar>(y-1,x) + temp.at<uchar>(y+1,x) + temp.at<uchar>(y,x-1) + temp.at<uchar>(y,x+1) -4*( temp.at<uchar>(y,x));
}
}
return output;
}
And here is the final result from my code:
OpenCV result:
Let's rewrite the function a little, so it's easier to discuss:
cv::Mat filtroLaplace(cv::Mat src)
{
cv::Mat output = src.clone();
for (int y = 1; y < src.rows - 1; y++) {
for (int x = 1; x < src.cols - 1; x++) {
int sum = src.at<uchar>(y - 1, x)
+ src.at<uchar>(y + 1, x)
+ src.at<uchar>(y, x - 1)
+ src.at<uchar>(y, x + 1)
- 4 * src.at<uchar>(y, x);
output.at<uchar>(y, x) = sum;
}
}
return output;
}
The source of your problem is sum. Let's examine its range in scope of this algorithm, by taking the two extremes:
Black pixel, surrounded by 4 white. That means 255 + 255 + 255 + 255 - 4 * 0 = 1020.
White pixel, surrounded by 4 black. That means 0 + 0 + 0 + 0 - 4 * 255 = -1020.
When you perform output.at<uchar>(y, x) = sum; there's an implicit cast of the int back to unsigned char -- the high order bits simply get chopped off and the value overflows.
The correct approach to handle this situation (which OpenCV takes), is to perform saturation before the actual cast. Essentially
if (sum < 0) {
sum = 0;
} else if (sum > 255) {
sum = 255;
}
OpenCV provides function cv::saturate_cast<T> to do just this.
There's an additional problem: you're not handling the edge rows/columns of the input image; you just leave them at the original value. Since you're not asking about that, I'll leave solving it as an exercise to the reader.
Code:
cv::Mat filtroLaplace(cv::Mat src)
{
cv::Mat output = src.clone();
for (int y = 1; y < src.rows - 1; y++) {
for (int x = 1; x < src.cols - 1; x++) {
int sum = src.at<uchar>(y - 1, x)
+ src.at<uchar>(y + 1, x)
+ src.at<uchar>(y, x - 1)
+ src.at<uchar>(y, x + 1)
- 4 * src.at<uchar>(y, x);
output.at<uchar>(y, x) = cv::saturate_cast<uchar>(sum);
}
}
return output;
}
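For the edge rows and columns, one possible convention is to replicate the nearest valid pixel. The sketch below assumes a row-major std::vector buffer instead of cv::Mat, and the name laplaceReplicate is hypothetical. Replication is chosen here only for simplicity; cv::Laplacian uses a different default border mode (reflection), so the outermost pixels may not match OpenCV's output exactly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Laplacian of a row-major 8-bit image. Coordinates that fall outside the
// image are clamped to the nearest valid pixel (edge replication), so every
// output pixel, including the border, is computed.
std::vector<std::uint8_t> laplaceReplicate(const std::vector<std::uint8_t>& src,
                                           int width, int height) {
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);
    auto at = [&](int y, int x) -> int {
        y = std::min(height - 1, std::max(0, y));
        x = std::min(width - 1, std::max(0, x));
        return src[y * width + x];
    };
    std::vector<std::uint8_t> out(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int sum = at(y - 1, x) + at(y + 1, x)
                          + at(y, x - 1) + at(y, x + 1)
                          - 4 * at(y, x);
            // Saturate to [0, 255] before narrowing, as in the fix above.
            out[y * width + x] =
                static_cast<std::uint8_t>(std::min(255, std::max(0, sum)));
        }
    }
    return out;
}
```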
Sample input:
Output of corrected filtroLaplace:
Output of cv::Laplacian:
My code seems to have a bug somewhere but I just can't catch it. I'm passing a 2d array to three sequential functions. First function populates it, second function modifies the values to 1's and 0's, the third function counts the 1's and 0's. I can access the array easily inside the first two functions, but I get an access violation at the first iteration of the third one.
Main
text_image_data = new int*[img_height];
for (i = 0; i < img_height; i++) {
text_image_data[i] = new int[img_width];
}
cav_length = new int[numb_of_files];
// Start processing - load each image and find max cavity length
for (proc = 0; proc < numb_of_files; proc++)
{
readImage(filles[proc], text_image_data, img_height, img_width);
threshold = makeBinary(text_image_data, img_height, img_width);
cav_length[proc] = measureCavity(bullet[0], img_width, bullet[1], img_height, text_image_data);
}
Functions
int makeBinary(int** img, int height, int width)
{
int threshold = 0;
unsigned long int sum = 0;
for (int k = 0; k < width; k++)
{
sum = sum + img[1][k] + img[2][k] + img[3][k] + img[4][k] + img[5][k];
}
threshold = sum / (width * 5);
for (int i = 0; i < height; i++)
{
for (int j = 0; j < width; j++)
{
img[i][j] = img[i][j] > threshold ? 1 : 0;
}
}
return threshold;
}
// Count pixels - find length of cavity here
int measureCavity(int &x, int& width, int &y, int &height, int **img)
{
double mean = 1.;
int maxcount = 0;
int pxcount = 0;
int i = x - 1;
int j;
int pxsum = 0;
for (j = 0; j < height - 2; j++)
{
while (mean > 0.0)
{
for (int ii = i; ii > i - 4; ii--)
{
pxsum = pxsum + img[ii][j] + img[ii][j + 1];
}
mean = pxsum / 4.;
pxcount += 2;
i += 2;
pxsum = 0;
}
maxcount = std::max(maxcount, pxcount);
pxcount = 0;
j++;
}
return maxcount;
}
I keep getting an access violation in the measureCavity() function. I'm passing and accessing the array text_image_data the same way as in makeBinary() and readImage(), and it works just fine for those functions. The size is [550][70], I'm getting the error when trying to access [327][0].
Is there a better, more reliable way to pass this array between the functions?
I am trying to write an alpha-trimmed filter with the OpenCV library. My code is not working properly, and the resulting image does not look like a filtered image.
The filter should work in the following way:
Choose an array of pixels; in my example it is the 9 pixels of a 3x3 window.
Sort them in increasing order.
Trim alpha/2 elements from each end of the sorted array.
Calculate the arithmetic mean of the remaining pixels and insert it at the proper place.
int alphatrimmed(Mat img, int alpha)
{
Mat img9 = img.clone();
const int start = alpha/2 ;
const int end = 9 - (alpha/2);
//going through whole image
for (int i = 1; i < img.rows - 1; i++)
{
for (int j = 1; j < img.cols - 1; j++)
{
uchar element[9];
Vec3b element3[9];
int k = 0;
int a = 0;
//selecting elements for window 3x3
for (int m = i -1; m < i + 2; m++)
{
for (int n = j - 1; n < j + 2; n++)
{
element3[a] = img.at<Vec3b>(m*img.cols + n);
a++;
for (int c = 0; c < img.channels(); c++)
{
element[k] += img.at<Vec3b>(m*img.cols + n)[c];
}
k++;
}
}
//comparing and sorting elements in window (uchar element [9])
for (int b = 0; b < end; b++)
{
int min = b;
for (int d = b + 1; d < 9; d++)
{
if (element[d] < element[min])
{
min = d;
const uchar temp = element[b];
element[b] = element[min];
element[min] = temp;
const Vec3b temporary = element3[b];
element3[b] = element3[min];
element3[min] = temporary;
}
}
}
// index in resultant image( after alpha-trimmed filter)
int result = (i - 1) * (img.cols - 2) + j - 1;
for (int l = start ; l < end; l++)
img9.at<Vec3b>(result) += element3[l];
img9.at<Vec3b>(result) /= (9 - alpha);
}
}
namedWindow("AlphaTrimmed Filter", WINDOW_AUTOSIZE);
imshow("AlphaTrimmed Filter", img9);
return 0;
}
Without actual data this is somewhat of a guess, but a uchar can't hold the sum of 3 channels: the sum can reach 765, and uchar arithmetic works modulo 256 (at least on any platform OpenCV supports).
The proper solution is std::sort with a suitable comparator for your Vec3b:
bool L1(const Vec3b& a, const Vec3b& b) { return a[0]+a[1]+a[2] < b[0]+b[1]+b[2]; }
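To show how such a comparator fits into the filter, here is a minimal sketch of the trimmed mean for one 3x3 window. Pixel is a hypothetical stand-in for cv::Vec3b so the example compiles without OpenCV, alphaTrimmedMean is an illustrative name, and alpha is assumed to be even so that alpha/2 elements are dropped from each end.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

struct Pixel { std::uint8_t c[3]; };  // hypothetical stand-in for cv::Vec3b

// Channel sum computed in int, so it cannot wrap around at 256.
int channelSum(const Pixel& p) { return p.c[0] + p.c[1] + p.c[2]; }

// Alpha-trimmed mean of one 3x3 window: sort by channel sum, drop alpha/2
// pixels from each end, average the rest per channel.
Pixel alphaTrimmedMean(std::array<Pixel, 9> window, int alpha) {
    assert(alpha >= 0 && alpha < 9 && alpha % 2 == 0);
    std::sort(window.begin(), window.end(),
              [](const Pixel& a, const Pixel& b) {
                  return channelSum(a) < channelSum(b);
              });
    const int start = alpha / 2;
    const int end = 9 - alpha / 2;
    int acc[3] = {0, 0, 0};  // int accumulators: up to 9 * 255 per channel
    for (int i = start; i < end; ++i) {
        for (int ch = 0; ch < 3; ++ch) acc[ch] += window[i].c[ch];
    }
    Pixel out;
    for (int ch = 0; ch < 3; ++ch) {
        out.c[ch] = static_cast<std::uint8_t>(acc[ch] / (end - start));
    }
    return out;
}
```

The sums are accumulated in int and divided once at the end, which avoids the wrap-around that happens when adding directly into uchar or Vec3b elements.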
Hi everyone, I am trying to implement pattern matching with FFT, but I am not sure what the result should be. (I think I am missing something, even though I have read a lot about the problem and tried a lot of different implementations; this one is the best so far.) Here is my FFT correlation function.
void fft2d(fftw_complex**& a, int rows, int cols, bool forward = true)
{
fftw_plan p;
for (int i = 0; i < rows; ++i)
{
p = fftw_plan_dft_1d(cols, a[i], a[i], forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
}
fftw_complex* t = (fftw_complex*)fftw_malloc(rows * sizeof(fftw_complex));
for (int j = 0; j < cols; ++j)
{
for (int i = 0; i < rows; ++i)
{
t[i][0] = a[i][j][0];
t[i][1] = a[i][j][1];
}
p = fftw_plan_dft_1d(rows, t, t, forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
for (int i = 0; i < rows; ++i)
{
a[i][j][0] = t[i][0];
a[i][j][1] = t[i][1];
}
}
fftw_free(t);
}
int findCorrelation(int argc, char* argv[])
{
BMP bigImage;
BMP keyImage;
BMP result;
RGBApixel blackPixel = { 0, 0, 0, 1 };
const bool swapQuadrants = (argc == 4);
if (argc < 3 || argc > 4) {
cout << "correlation img1.bmp img2.bmp" << endl;
return 1;
}
if (!keyImage.ReadFromFile(argv[1])) {
return 1;
}
if (!bigImage.ReadFromFile(argv[2])) {
return 1;
}
//Preparations
const int maxWidth = std::max(bigImage.TellWidth(), keyImage.TellWidth());
const int maxHeight = std::max(bigImage.TellHeight(), keyImage.TellHeight());
const int rowsCount = maxHeight;
const int colsCount = maxWidth;
BMP bigTemp = bigImage;
BMP keyTemp = keyImage;
keyImage.SetSize(maxWidth, maxHeight);
bigImage.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
if (i < bigTemp.TellHeight() && j < bigTemp.TellWidth()) {
p1 = bigTemp.GetPixel(j, i);
} else {
p1 = blackPixel;
}
bigImage.SetPixel(j, i, p1);
RGBApixel p2;
if (i < keyTemp.TellHeight() && j < keyTemp.TellWidth()) {
p2 = keyTemp.GetPixel(j, i);
} else {
p2 = blackPixel;
}
keyImage.SetPixel(j, i, p2);
}
//Here is where the transforms begin
fftw_complex **a = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **b = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **c = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
for (int i = 0; i < rowsCount; ++i) {
a[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
b[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
c[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
p1 = bigImage.GetPixel(j, i);
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
RGBApixel p2;
p2 = keyImage.GetPixel(j, i);
b[i][j][0] = (0.299*p2.Red + 0.587*p2.Green + 0.114*p2.Blue);
b[i][j][1] = 0.0;
}
}
fft2d(a, rowsCount, colsCount);
fft2d(b, rowsCount, colsCount);
result.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
fftw_complex& y = a[i][j];
fftw_complex& x = b[i][j];
double u = x[0], v = x[1];
double m = y[0], n = y[1];
c[i][j][0] = u*m + n*v;
c[i][j][1] = v*m - u*n;
int fx = j;
if (fx>(colsCount / 2)) fx -= colsCount;
int fy = i;
if (fy>(rowsCount / 2)) fy -= rowsCount;
float r2 = (fx*fx + fy*fy);
const double cuttoffCoef = (maxWidth * maxHeight) / 37992.;
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
}
fft2d(c, rowsCount, colsCount, false);
const int halfCols = colsCount / 2;
const int halfRows = rowsCount / 2;
if (swapQuadrants) {
for (int i = 0; i < halfRows; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i + halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i + halfRows][j + halfCols][1]);
}
for (int i = halfRows; i < rowsCount; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i - halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i - halfRows][j + halfCols][1]);
}
}
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
const double& g = c[i][j][0];
RGBApixel pixel;
pixel.Alpha = 0;
int gInt = 255 - static_cast<int>(std::floor(g + 0.5));
pixel.Red = gInt;
pixel.Green = gInt;
pixel.Blue = gInt;
result.SetPixel(j, i, pixel);
}
BMP res;
res.SetSize(maxWidth, maxHeight);
result.WriteToFile("result.bmp");
return 0;
}
Sample output
This question would probably be more appropriately posted on another site like cross validated (metaoptimize.com used to also be a good one, but it appears to be gone)
That said:
There are two similar operations you can perform with FFT: convolution and correlation. Convolution is used for determining how two signals interact with each other, whereas correlation can be used to express how similar two signals are to each other. Make sure you're doing the right operation, as they're both commonly implemented through a DFT.
For this type of application of DFTs you usually wouldn't extract any useful information in the fourier spectrum unless you were looking for frequencies common to both data sources or whatever (eg, if you were comparing two bridges to see if their supports are spaced similarly).
Your 3rd image looks a lot like the power domain; normally I see the correlation output entirely grey except where overlap occurred. Your code definitely appears to be computing the inverse DFT, so unless I'm missing something the only other explanation I've come up with for the fuzzy look could be some of the "fudge factor" code in there like:
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
As for what you should expect: wherever there are common elements between the two images you'll see a peak. The larger the peak, the more similar the two images are near that region.
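The peak behaviour can be checked on a tiny 1D case. The sketch below is an assumption-laden illustration, not the asker's FFTW pipeline: it uses a naive O(n^2) DFT on std::complex instead of FFTW, and the names dft and crossCorrelate are hypothetical. Correlation is obtained by multiplying one spectrum with the complex conjugate of the other; without the conjugate the same code would compute a convolution.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(n^2) DFT. sign = -1 gives the forward transform, sign = +1 the
// unscaled inverse (the caller divides by n).
std::vector<cd> dft(const std::vector<cd>& in, int sign) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                sign * 2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            out[k] += in[t] * std::polar(1.0, angle);
        }
    }
    return out;
}

// Circular cross-correlation r[s] = sum over t of signal[t + s] * pattern[t],
// computed as inverse DFT of DFT(signal) * conj(DFT(pattern)).
std::vector<double> crossCorrelate(const std::vector<double>& signal,
                                   const std::vector<double>& pattern) {
    assert(signal.size() == pattern.size() && !signal.empty());
    const std::size_t n = signal.size();
    std::vector<cd> a(signal.begin(), signal.end());
    std::vector<cd> b(pattern.begin(), pattern.end());
    const std::vector<cd> A = dft(a, -1);
    const std::vector<cd> B = dft(b, -1);
    std::vector<cd> C(n);
    for (std::size_t k = 0; k < n; ++k) C[k] = A[k] * std::conj(B[k]);
    const std::vector<cd> c = dft(C, +1);
    std::vector<double> r(n);
    for (std::size_t s = 0; s < n; ++s) r[s] = c[s].real() / static_cast<double>(n);
    return r;
}
```

With pattern {1,2,3,0,0,0,0,0} and signal {0,0,0,1,2,3,0,0}, the largest value of the result sits at index 3, the offset at which the pattern occurs in the signal. Swapping which spectrum is conjugated mirrors the position of the peak.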
Some comments and/or recommended changes:
1) Convolution & correlation are not scale invariant operations. In other words, the size of your pattern image can make a significant difference in your output.
2) Normalize your images before correlation.
When you get the image data ready for the forward DFT pass:
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
/* ... */
How you grayscale the image is your business (though I would've picked something like sqrt( r*r + b*b + g*g )). However, I don't see you doing anything to normalize the image.
The word "normalize" can take on a few different meanings in this context. Two common types:
normalize the range of values between 0.0 and 1.0
normalize the "whiteness" of the images
3) Run your pattern image through an edge enhancement filter. I've personally made use of Canny, Sobel, and I think I messed with a few others. As I recall, Canny was "quick'n dirty" and Sobel was more expensive, but I got comparable results when it came time to do correlation. See chapter 24 of the "dsp guide" book that's freely available online. The whole book is worth your time, but if you're low on time then at a minimum chapter 24 will help a lot.
4) Re-scale the output image between [0, 255]; if you want to implement thresholds, do it after this step because the thresholding step is lossy.
My memory on this one is hazy, but as I recall (edited for clarity):
You can scale the final image pixels (before rescaling) between [-1.0, 1.0] by dividing off the largest power spectrum value from the entire power spectrum
The largest power spectrum value is, conveniently enough, the center-most value in the power spectrum (corresponding to the lowest frequency)
If you divide it off the power spectrum, you'll end up doing twice the work; since FFTs are linear, you can delay the division until after the inverse DFT pass to when you're re-scaling the pixels between [0..255].
If after rescaling most of your values end up so black you can't see them, you can use a solution to the ODE y' = y(1 - y) (one example is the sigmoid f(x) = 1 / (1 + exp(-c*x) ), for some scaling factor c that gives better gradations). This has more to do with improving your ability to interpret the results visually than anything you might use to programmatically find peaks.
edit I said [0, 255] above. I suggest you rescale to [128, 255] or some other lower bound that is gray rather than black.
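The rescaling and sigmoid steps could be combined roughly as follows. This is only a sketch for visualisation: the name toGray, the default lower bound of 128, and the choice to divide by the largest magnitude in the output (rather than by the centre value of the power spectrum) are assumptions made here for a self-contained example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Map correlation values to gray levels in [lowGray, 255].
// Values are first divided by the largest magnitude, which puts them in
// [-1, 1], then passed through the sigmoid 1 / (1 + exp(-c * x)).
std::vector<std::uint8_t> toGray(const std::vector<double>& corr, double c,
                                 int lowGray = 128) {
    assert(!corr.empty() && lowGray >= 0 && lowGray < 255);
    double peak = 0.0;
    for (double v : corr) peak = std::max(peak, std::fabs(v));
    if (peak == 0.0) peak = 1.0;  // all-zero input: avoid dividing by zero
    std::vector<std::uint8_t> out;
    out.reserve(corr.size());
    for (double v : corr) {
        const double s = 1.0 / (1.0 + std::exp(-c * v / peak));  // in (0, 1)
        out.push_back(static_cast<std::uint8_t>(
            std::lround(lowGray + s * (255 - lowGray))));
    }
    return out;
}
```

A larger c steepens the curve around zero, so weak positive correlations separate more clearly from the gray background; peak finding in code should still be done on the unscaled values.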