For practice, I am slowly implementing image processing concepts with the FFT, and I have started with zero-padding. The result is supposed to resize the image (in this case, double the width and height), but my output is washed out. I thought it had to do with my normalization after the IFFT, since the width and height had changed after the padding, but nothing I have tried has produced a better image. Any ideas on where I may be scaling the data incorrectly, or quick fixes to increase the power of my output? Before I save my image, I scale all the pixel data to a range between 0 and 255, but it almost seems like the output is between 128 and 255 instead.
Original:
Zero-padded FFT:
IFFT:
I should have tested some more before posting. All I was doing wrong was scaling the image between 0 and 255 one too many times. This messed up the data, especially since my scaling function uses a logarithmic scale to better display the FFT.
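For reference, a minimal sketch of the fixed display path (mag is a placeholder CV_32F magnitude Mat, not from my original code): log-compress first, then map to 0..255 exactly once.

mag += cv::Scalar::all(1);                                 // avoid log(0)
cv::log(mag, mag);                                         // compress dynamic range for display
cv::Mat disp;
cv::normalize(mag, disp, 0, 255, cv::NORM_MINMAX, CV_8U);  // the single 0..255 scaling pass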
I want to copy part of an image into another image. There are several easy ways to do that, for example cv::Mat OutImage = Image(cv::Rect(7,47,1912,980)); but the resulting image is too large. For example:
I have an image of 1920 x 1024 pixels. I want to cut a cv::Rect(7,47,1912,980) from it. I would expect the resulting image to have the size (1912 - 7 = 1905) x (980 - 47 = 933) pixels, but it has 1912 x 980. It seems that OpenCV just cuts at the lower right side and keeps the upper left area.
The dimensions of the image matter, because in the next step I'd like to perform a subtraction, which is only valid if the Mat objects have the same dimensions. I also don't want to write my own loop, because performance is very important.
Any ideas?
Regards,
Jan
It is actually cv::Rect(x, y, width, height), so the last two parameters are the desired output width and height, not a second corner. Mind the range you set, or it will cause errors.
I have also dealt with this issue; I'll give my example here, which works well for me. You may also try this one.
Rect const box(100, 295, 400, 185); // the first (top-left) corner is
                                    // (x, y) = (100, 295)
                                    // and the opposite (bottom-right) corner is
                                    // (x + width, y + height) = (100 + 400, 295 + 185)
Mat ROI = frame(box);
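Since two ROIs taken with the same cv::Rect have identical dimensions, the subtraction mentioned in the question then works directly; for example (imageA and imageB are hypothetical frames of equal size):

Mat diff = imageA(box) - imageB(box); // same Rect, same dimensions, so the subtraction is valid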
I am running the tutorial found here: https://software.intel.com/en-us/articles/using-librealsense-and-opencv-to-stream-rgb-and-depth-data
It gets the depth values from the r200 using the following lines:
cv::Mat depth16( _depth_intrin.height, _depth_intrin.width, CV_16U,(uchar *)_rs_camera.get_frame_data( rs::stream::depth ) );
cv::Mat depth8u = depth16;
depth8u.convertTo( depth8u, CV_8UC1, 255.0/1000 );
imshow( WINDOW_DEPTH, depth8u );
And the output image stream is:
https://imgur.com/EmdhFNk
You can see the color image as well. I've also put a tape measure across the bottom that goes as far as 3.5m (the range for the r200 is supposed to be up to 3.5m).
Why on earth is the color binary? I've tried adding different color images but it seems to not be depth values at all. Also it makes no sense that the floor is consistently black even though it spans from 1m to 5m away. Why are all objects white? The table and couch are obviously different distances away.
How can I improve this? I know you can get good depth values from the r200, as I get them in the examples. See (http://docs.ros.org/kinetic/api/librealsense/html/cpp-capture_8cpp_source.html), but these use glfw as opposed to OpenCV. I'm wondering why the depth values are so odd once they've been converted.
Ideally I would like to generate depth values and filter out any outside the range of 1m to 2m away. Thanks!
Edit: As #MSalters pointed out, the first half of my answer was erroneous and due to my misreading of the OP's code. The second half contains the right answer.
If your depth range is 1-3.5m, measured in millimetres (1000mm-3500mm), dividing the result by 1000 will give you data in the range 1.0-3.5. However, your source data is a 16-bit unsigned type, which can't represent decimal or floating-point values, only integers, so your values get truncated to one of {0,1,2,3}. You might get away with this in convertTo, as it may marshal the types internally, but it's a potential source of error.
There is a second problem though... CV_8U is an 8-bit unsigned char, which can also only represent integer values, this time in the range 0-255. Since your data can be in the range 0...3500, multiplying by 0.255 as you do in your example means anything over 1000mm of depth results in a value over 255 and so gets saturated there.
Instead of converting the raw depth image as you are above, you could use the cv::normalize function, with the NORM_MINMAX normalisation-type to normalise your data down to the 0...255 range. You can set the destination image format to CV_8U too.
This is probably only suitable for visualisation though, as it'll be affected by the source data input range. Instead, if you know your max value is 3500, and your min is 0, divide the source image by 3500 and multiply by 255. That said, where possible, it's probably best to keep it in the 16-bit format for the sake of depth resolution.
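A sketch of both options, starting from the depth16 Mat in the question (only the conversion lines change):

// Option A: per-frame min/max stretch via cv::normalize; good for visualisation only
cv::Mat depth8u;
cv::normalize( depth16, depth8u, 0, 255, cv::NORM_MINMAX, CV_8U );

// Option B: fixed scale, assuming a known 0..3500 mm working range
cv::Mat depth8uFixed;
depth16.convertTo( depth8uFixed, CV_8UC1, 255.0/3500 );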
I have two videos, one of a background and one of that same background with a person sitting in the frame. I generated two images from the video of just the background: the mean image of the background video (by accumulating the frames and dividing by the number of frames) and an image of standard deviations from the mean per pixel, taken over the frames. In other words, I have two images representing the Gaussian distribution of the background video. Now, I want to threshold an image, not using one fixed threshold value for all pixels, but using the standard deviations from the image (a different threshold per pixel). However, as far as I understand, OpenCV's threshold() function only allows for one fixed threshold. Are there functions I'm missing, or is there a workaround?
A cv::Mat provides the methodology to accomplish this.
The setTo() method takes an optional InputArray as a mask.
Assuming the following:
std is your standard-deviations cv::Mat, mean is the background-mean cv::Mat from the question, in is the cv::Mat you want to threshold, and thresh is the factor for your standard deviations.
Using these values the custom thresholding could be done like this:
// Compute per-pixel threshold values: background mean plus thresh standard deviations
cv::Mat mask = mean + std * thresh; // use mean - std * thresh for a lower bound instead
// Set in.at<>(x,y) to 0 if its value is lower than mask.at<>(x,y)
in.setTo(0, in < mask);
The in < mask expression creates a new MatExpr object, a matrix which is 0 at every pixel where the predicate is false, 255 otherwise.
This is a handy way to implement custom thresholding.
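For the symmetric case, flagging pixels that deviate from the background by more than thresh standard deviations in either direction, a sketch using cv::absdiff (same mean, std, in, and thresh as above):

cv::Mat diff;
cv::absdiff(in, mean, diff);        // |in - mean| per pixel
cv::Mat bound = std * thresh;       // per-pixel deviation limit
cv::Mat foreground = diff > bound;  // 255 where the deviation exceeds its per-pixel threshold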
I'm dealing with a small project with OpenCV & C++; maybe the following questions are naive, but I'll be very grateful if anyone can offer help.
Being new here, I don't have enough reputation to post images, so I'll try to make it clear.
I'm trying to denoise an image (an M x N = 200x200 Mat) in the frequency domain, and say I have a U x V = 3x3 Gaussian kernel {{0, -1, 0},{-1, 4, -1},{0, -1, 0}}; the expected steps are:
zero-pad the kernel up to (M+U-1) x (N+V-1)
take the 2-D FFT of the kernel
zero-pad the image up to (M+U-1) x (N+V-1)
take the 2-D FFT of the image
multiply the FFT of the kernel by the FFT of the image
take the inverse 2-D FFT of the result
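A minimal OpenCV sketch of those steps, where img and kernel are placeholder CV_32F Mats (not my original code):

int H = img.rows + kernel.rows - 1;   // (M + U - 1)
int W = img.cols + kernel.cols - 1;   // (N + V - 1)

// steps 1 & 3: zero-pad kernel and image
cv::Mat kerPad, imgPad;
cv::copyMakeBorder(kernel, kerPad, 0, H - kernel.rows, 0, W - kernel.cols, cv::BORDER_CONSTANT, cv::Scalar::all(0));
cv::copyMakeBorder(img, imgPad, 0, H - img.rows, 0, W - img.cols, cv::BORDER_CONSTANT, cv::Scalar::all(0));

// steps 2 & 4: forward 2-D FFTs
cv::Mat kerF, imgF;
cv::dft(kerPad, kerF, cv::DFT_COMPLEX_OUTPUT);
cv::dft(imgPad, imgF, cv::DFT_COMPLEX_OUTPUT);

// step 5: pointwise multiply in the frequency domain
cv::Mat prod;
cv::mulSpectrums(imgF, kerF, prod, 0);

// step 6: inverse 2-D FFT (DFT_SCALE handles the 1/(H*W) normalization)
cv::Mat result;
cv::dft(prod, result, cv::DFT_INVERSE | cv::DFT_SCALE | cv::DFT_REAL_OUTPUT);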
Both the result of step 2 (the FFT of the kernel) and the final result (the filtered image) seem right, but then I found this answer:
you do need to make K as big as I by padding it with zeros. Also, after padding, but before you take the FFT of the kernel, you need to translate it with wraparound, such that the center of the kernel (the peak of the Gaussian) is at (0,0). Otherwise, your filtered image will be translated.
That's what I didn't do. But how can the result still look acceptable?
So I'm now wondering: what difference does it make whether or not I move the kernel so that its center is at (0, 0) before the FFT?
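For concreteness, the wraparound translation the quote describes would be a circular shift of the padded kernel so its center lands at (0, 0); a naive sketch (kerPad as in the sketch above, (cy, cx) its center):

cv::Mat shifted(kerPad.size(), kerPad.type(), cv::Scalar::all(0));
for (int y = 0; y < kerPad.rows; ++y)
    for (int x = 0; x < kerPad.cols; ++x)
        shifted.at<float>((y - cy + kerPad.rows) % kerPad.rows,    // pixel (cy, cx) maps to (0, 0)
                          (x - cx + kerPad.cols) % kerPad.cols) = kerPad.at<float>(y, x);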
Here comes my 2nd question. If we have the shifted FFT of an image (with the 0 frequency in the middle), can I just do this to obtain a 'low-pass' effect:
For the pixels whose distance from the center is bigger than a threshold, set their value to 0.
(I think it's straightforward, but I haven't found similar methods widely used.)
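Something like this sketch is what I mean (spectrum being a hypothetical CV_32FC2 Mat with the zero frequency already shifted to the center, radius the distance threshold as an int):

cv::Mat mask = cv::Mat::zeros(spectrum.size(), CV_8U);
cv::circle(mask, cv::Point(spectrum.cols / 2, spectrum.rows / 2), radius, cv::Scalar(255), cv::FILLED);
spectrum.setTo(cv::Scalar::all(0), mask == 0);   // zero every frequency outside the circle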
Thank you VERY much for offering any help :-)
I'm trying to work with this camera SDK, and let's say the camera has a function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, fills it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments), so I'm just guesstimating here. Here's a code snippet of what I think works:
BYTE* data = new BYTE[10000000]; // an arbitrarily large array; I'm not
                                 // sure what the exact size needs to be, so
                                 // I made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
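A sketch of that trick, with the same BYTE buffer and SDK call as in the question:

const size_t bufSize = 10000000;
BYTE* data = new BYTE[bufSize];
for (size_t i = 0; i < bufSize; ++i)
    data[i] = "01234567"[i % 8];              // known repeating pattern

CameraGetImageData(data);

size_t written = bufSize;                     // scan back over bytes still matching the pattern
while (written > 0 && data[written - 1] == "01234567"[(written - 1) % 8])
    --written;                                // 'written' now estimates how many bytes the SDK filled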
I don't think there is a standard, but you can try to identify which values are what by putting solid-color images in front of the camera, so that all pixels would be approximately the same color. Having an idea of what color should be stored in each pixel, you may understand how the color is represented in your array. I would go with black, white, red, green, and blue images.
But also consider finding a better SDK that has documentation, because just allocating an arbitrarily big array is really bad design.
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adheres to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu, using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth resolution of 12 bits. Therefore, each pixel intensity was stored as a 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels and the factor 2 is for 16 bits per pixel. The width and height were known a priori from the chosen camera mode.
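If your data turns out to be laid out like that, a minimal OpenCV sketch for inspecting such a buffer (width, height, and data being placeholders):

cv::Mat raw(height, width, CV_16U, data);     // wraps the buffer, no copy
cv::Mat view;
raw.convertTo(view, CV_8U, 255.0 / 4095.0);   // stretch the 12-bit range for display
cv::imshow("raw", view);
cv::waitKey(0);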
If you have the dimensions of the result image, try to dump your byte array into a file and load the result either in Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and hope to get something out of it.
Good luck!
I hope this question's solution will help you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if your camera captures in 8-bit). What you need is just to determine the width and height; after that you can try to restore a bitmap image from your byte array.