I am a novice in OpenCV. My program reads image data as 16-bit unsigned int. I need to multiply the image data by some gain, which is also a 16-bit unsigned int, so the resulting data should be kept in a 32-bit image.
I tried the following, but I get an all-white 8-bit image. Please help.
Mat inputData = Mat(Size(width, height), CV_16U, inputdata);
inputData.convertTo(input1Data, CV_32F);
input1Data = input1Data * gain;//gain is ushort
As Micka noticed in the comment, first of all we need to scale inputData to have values between 0.0f and 1.0f by passing a scaling factor:
inputData.convertTo(input1Data, CV_32F, 1.0/65535.0f); // since in inputData
// we have values between 0 and
// 65535 so all resulted values
// will be between 0.0f and 1.0f
And now, the same with the multiplication:
input1Data = input1Data * gain * (1.0f / 65535.0f); // gain, of course, will be
// automatically cast to float
// therefore the resulted factor
// will have value from 0 to 1,
// so input1Data too!
And this should compile too, optimizing the first version a bit by not creating a temporary matrix:
input1Data *= gain * (1.0f / 65535.0f);
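As a side note, a minimal sketch (assuming gain is the ushort from the question): convertTo can fold both scale factors into a single pass via its alpha parameter, so no separate multiplication is needed at all:
// alpha = gain / 65535^2 combines the normalisation and the gain scaling,
// so the result again lies in [0.0f, 1.0f] because gain is at most 65535.
inputData.convertTo(input1Data, CV_32F, (double)gain / (65535.0 * 65535.0));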
I am accessing the image like so:
pDoc = GetDocument();
int iBitPerPixel = pDoc->_bmp->bitsperpixel; // used to see if grayscale(8 bits) or RGB (24 bits)
int iWidth = pDoc->_bmp->width;
int iHeight = pDoc->_bmp->height;
BYTE *pImg = pDoc->_bmp->point; // pointer used to point at pixels in the image
int Wp = iWidth;
const int area = iWidth * iHeight;
int r; // red pixel value
int g; // green pixel value
int b; // blue pixel value
int gray; // gray pixel value
BYTE *pImgGS = pImg; // grayscale image pixel array
and attempting to change the RGB image to gray like so:
// convert RGB values to grayscale at each pixel, then put in grayscale array
for (int i = 0; i < iHeight; i++)
    for (int j = 0; j < iWidth; j++)
    {
        r = pImg[i*iWidth * 3 + j * 3 + 2];
        g = pImg[i*iWidth * 3 + j * 3 + 1];
        b = pImg[i*Wp + j * 3];
        r * 0.299;
        g * 0.587;
        b * 0.144;
        gray = std::round(r + g + b);
        pImgGS[i*Wp + j] = gray;
    }
Finally, this is how I try to draw the image:
//draw the picture as grayscale
for (int i = 0; i < iHeight; i++) {
for (int j = 0; j < iWidth; j++) {
// this should set every corresponding grayscale picture to the current picture as grayscale
pImg[i*Wp + j] = pImgGS[i*Wp + j];
}
}
Original image:
And the resulting image that I get is this:
First, check that the image type really is 24 bits per pixel.
Second, allocate memory for pImgGS:
BYTE* pImgGS = (BYTE*)malloc(sizeof(BYTE) * iWidth * iHeight);
Please refer to this article to see how BMP data is stored: BMP images are saved upside down (bottom row first), and the first 54 bytes are header information (the BITMAPFILEHEADER plus the BITMAPINFOHEADER).
Hence you should access the values in the following way:
double r, g, b;
unsigned char gray;
for (int i = 0; i < iHeight; i++)
{
    for (int j = 0; j < iWidth; j++)
    {
        r = (double)pImg[(i*iWidth + j)*3 + 2];
        g = (double)pImg[(i*iWidth + j)*3 + 1];
        b = (double)pImg[(i*iWidth + j)*3 + 0];
        r = r * 0.299;
        g = g * 0.587;
        b = b * 0.114; // BT.601 blue coefficient is 0.114 (0.144 in the question is a typo)
        gray = floor(r + g + b + 0.5);
        pImgGS[(iHeight-i-1)*iWidth + j] = gray;
    }
}
If padding is present, first determine the pitch and access the data in a different way. Refer to this to understand pitch and padding.
double r, g, b;
unsigned char gray;
long index = 0;
for (int i = 0; i < iHeight; i++)
{
    for (int j = 0; j < iWidth; j++)
    {
        r = (double)pImg[index + j*3 + 2];
        g = (double)pImg[index + j*3 + 1];
        b = (double)pImg[index + j*3 + 0];
        r = r * 0.299;
        g = g * 0.587;
        b = b * 0.114; // again, 0.114 for the blue channel
        gray = floor(r + g + b + 0.5);
        pImgGS[(iHeight-i-1)*iWidth + j] = gray;
    }
    index = index + pitch;
}
While drawing the image, as pImg is 24 bpp, you need to copy each gray value three times, once to each of the R, G, and B channels (a sketch follows after the line below). If you ultimately want to save the grayscale image in BMP format, then you again have to write the BMP data upside down, or you can simply skip that step when converting to gray here:
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
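A minimal sketch of that drawing step, assuming a tightly packed 24 bpp buffer with no row padding:
// Copy each gray value into the B, G and R bytes of the 24 bpp image.
for (int i = 0; i < iHeight; i++)
{
    for (int j = 0; j < iWidth; j++)
    {
        BYTE v = pImgGS[i*iWidth + j];
        pImg[(i*iWidth + j)*3 + 0] = v; // blue
        pImg[(i*iWidth + j)*3 + 1] = v; // green
        pImg[(i*iWidth + j)*3 + 2] = v; // red
    }
}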
tl;dr:
Make one common path: convert everything to 32 bits in a well-defined manner, and do not use image dimensions or coordinates. Refactor the YCbCr conversion (= grey value calculation) into a separate function; this is easier to read and runs at exactly the same speed.
The lengthy stuff
First, you seem to have been confused by strides and offsets. The artefact that you see appears because you accidentally wrote out one value (and in total only one third of the data) when you should have written three values.
One can get confused by this easily, but here it happened because you are doing needless work. You are iterating coordinates left-to-right, top-to-bottom, and painstakingly calculating the correct byte offset in the data for each location.
However, you're doing a full-screen effect, so what you really want is to iterate over the complete image. Who cares about the width and height? You know the beginning of the data, and you know the length. One loop over the complete blob will do the same, only faster, with less obscure code, and fewer opportunities to get something wrong.
Next, 24-bit bitmaps are common as files, but they are rather unusual for in-memory representation because the format is nasty to access and unsuitable for hardware. Drawing such a bitmap will require a lot of work from the driver or the graphics hardware (it will work, but it will not work well). Therefore, 32-bit depth is usually a much better, faster, and more comfortable choice. It is much more "natural" to access program-wise.
You can convert 24-bit to 32-bit rather trivially: iterate over the complete bitmap data and write out a complete 32-bit word for every 3-byte tuple read. Windows bitmaps ignore the A channel (the highest-order byte), so just leave it zero, or whatever.
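A minimal sketch of that conversion, assuming a tightly packed BGR source with no row padding (the buffer names bgr24_data, bgra32_data and pixelCount are placeholders):
// Widen each 3-byte BGR tuple into a 32-bit BGRA word, leaving A zero.
const uint8_t* in = (const uint8_t*) bgr24_data;
uint32_t* out = (uint32_t*) bgra32_data;
for (size_t i = 0; i < pixelCount; ++i, in += 3)
    *out++ = (uint32_t)in[0] | ((uint32_t)in[1] << 8) | ((uint32_t)in[2] << 16);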
Also, there is no such thing as an 8-bit greyscale bitmap. It simply doesn't exist. Although there are bitmaps that look like greyscale bitmaps, they are in reality paletted 8-bit bitmaps where (incidentally) the bmiColors member contains all greyscale values.
Therefore, unless you can guarantee that you will only ever process images that you have created yourself, you cannot simply assume that e.g. the values 5 and 73 correspond to 5/255 and 73/255 greyscale intensity, respectively. That may be the case, but it is in general a wrong assumption.
In order to be on the safe side as far as correctness goes, you must convert your 8-bit greyscale bitmaps to real colours by looking up the indices (the bitmap's grey values are really indices) in the palette. Otherwise, you could be loading a greyscale image where the palette is the other way around (so 5 would mean 250 and 250 would mean 5), or a bitmap which isn't greyscale at all.
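A minimal sketch of that lookup, assuming the file's colour table has already been read into palette (pixels8, out32 and pixelCount are placeholder names):
// Resolve each 8-bit palette index through the colour table instead of
// trusting it as a literal grey value.
RGBQUAD palette[256];                    // filled from the bitmap's bmiColors table
for (size_t i = 0; i < pixelCount; ++i)
{
    RGBQUAD c = palette[pixels8[i]];     // the "grey value" is really an index
    out32[i] = ((uint32_t)c.rgbRed << 16) | ((uint32_t)c.rgbGreen << 8) | c.rgbBlue;
}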
So... you want to convert 24-bit and you want to convert 8-bit bitmaps, both to 32-bit depth. That means you do all the annoying what-if stuff once at the beginning, and the rest is one identical common path. That's a good thing.
What you will be showing on-screen is always a 32-bit bitmap where the topmost byte is ignored, and the lower three are all the same value, resulting in what looks like a shade of grey. That's simple, and simple is good.
Note that if you do a BT.601-style YCbCr conversion (as indicated by your use of the constants 0.299, 0.587, and 0.144, the last presumably a typo for 0.114), and if your 8-bit greyscale images are perceptive (this is something you must know; there is no way of telling from the file!), then for 100% correctness you need to do the inverse transformation when converting from paletted 8-bit to RGB. Otherwise, your final result will look almost right, but not quite. If your 8-bit greyscales are linear, i.e. were created without using the above constants (again, you must know this, you cannot tell from the image), you need to copy everything as-is (here, doing the conversion would make the result look almost-but-not-quite right).
About the RGB-to-greyscale conversion: you do not need an extra greyscale bitmap just to hold values that you never need again afterwards. You can read the three colour values from the loaded bitmap, calculate Y, and directly build the 32-bit ARGB word, which you then write out to the final bitmap. This saves an entirely unnecessary round-trip to memory.
Something like this:
// 'in' walks the 24 bpp source three bytes at a time; 'out' writes one
// 32-bit word per pixel.
const uint8_t* in = (const uint8_t*) input_bitmap_data;
uint32_t* out = (uint32_t*) output_bitmap_data;
for (int i = 0; i < inputSize; i += 3, in += 3)
{
    uint8_t Y = calc_greyscale(in[0], in[1], in[2]);
    *out++ = (Y << 16) | (Y << 8) | Y;
}
Alternatively, you can first do the from-whatever-to-32 conversion, and then do the to-greyscale conversion in-place. This, in turn, introduces an extra round-trip to memory, but the code becomes much, much simpler overall.
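A minimal sketch of that in-place variant, assuming the data is already 32 bpp (bitmap32_data and pixelCount are placeholders; calc_greyscale as above):
// Convert each 32-bit word to grey where it sits.
uint32_t* px = (uint32_t*) bitmap32_data;
for (size_t i = 0; i < pixelCount; ++i)
{
    uint32_t v = px[i];
    uint8_t Y = calc_greyscale((v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF);
    px[i] = (Y << 16) | (Y << 8) | Y;
}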
I have a problem downsampling an image with bilinear interpolation. I've read almost all the relevant articles on Stack Overflow and searched around on Google, trying to solve, or at least locate, the problem in my OpenCL kernel. This is my main source for the theory. I then implemented this code in OpenCL:
__kernel void downsample(__global uchar* image, __global uchar* outputImage,
                         __global int* width, __global int* height, __global float* factor)
{
    // image: vector containing the original RGB values
    // outputImage: vector containing the "downsampled" RGB mean values
    // factor: downsampling factor, downscaling the image by factor: 1024*1024 -> 1024/factor * 1024/factor
    int r = get_global_id(0);
    int c = get_global_id(1);       // current coordinates
    int oWidth = get_global_size(0);
    int olc, ohc, olr, ohr;         // coordinates of the original image used for bilinear interpolation
    int index;                      // linearized index of the point
    uchar q11, q12, q21, q22;
    float accurate_c, accurate_r;   // the exact scaled point
    int k;

    accurate_c = convert_float(c*factor[0]);
    olc = convert_int(accurate_c);
    ohc = olc + 1;
    if (!(ohc < width[0]))
        ohc = olc;

    accurate_r = convert_float(r*factor[0]);
    olr = convert_int(accurate_r);
    ohr = olr + 1;
    if (!(ohr < height[0]))
        ohr = olr;

    index = (c + r*oWidth)*3; // 3 bytes per pixel

    // Compute RGB values: take a central mean of the RGB values among four points
    for (k = 0; k < 3; k++) {
        q11 = image[(olc + olr*width[0])*3 + k];
        q12 = image[(olc + ohr*width[0])*3 + k];
        q21 = image[(ohc + olr*width[0])*3 + k];
        q22 = image[(ohc + ohr*width[0])*3 + k];
        outputImage[index+k] = convert_uchar(q11*(ohc - accurate_c)*(ohr - accurate_r) +
                                             q21*(accurate_c - olc)*(ohr - accurate_r) +
                                             q12*(ohc - accurate_c)*(accurate_r - olr) +
                                             q22*(accurate_c - olc)*(accurate_r - olr));
    }
}
The kernel works with factor = 2, 4, 5, 6 but not with factor = 3, 7 (I get missing pixels, and the image appears a little bit skewed), whereas the "identical" code written in C++ works fine with all factor values. I can't explain to myself why that happens in OpenCL. I attach my full code project here.
I am working with depth images retrieved from the Kinect, which are 16 bits. I ran into some difficulties making my own filters due to the indexing and the size of the images.
I am working with Textures because they allow working with images of any bit depth.
So, I am trying to compute an easy gradient to understand what is wrong, or why it doesn't work as I expected.
You can see that there is something wrong when I use the y direction.
For x:
For y:
That's my code:
typedef concurrency::graphics::texture<unsigned int, 2> TextureData;
typedef concurrency::graphics::texture_view<unsigned int, 2> Texture;
cv::Mat image = cv::imread("Depth247.tiff", CV_LOAD_IMAGE_ANYDEPTH);
//just a copy from another image
cv::Mat image2(image.clone() );
concurrency::extent<2> imageSize(640, 480);
int bits = 16;
const unsigned int nBytes = imageSize.size() * 2; // 614400
{
uchar* data = image.data;
// Result data
TextureData texDataD(imageSize, bits);
Texture texR(texDataD);
parallel_for_each(
imageSize,
[=](concurrency::index<2> idx) restrict(amp)
{
int x = idx[0];
int y = idx[1];
// 65535 is the maximum value a pixel can take with 16 bits (2^16 - 1)
int valX = (x / (float)imageSize[0]) * 65535;
int valY = (y / (float)imageSize[1]) * 65535;
texR.set(idx, valX);
});
//concurrency::graphics::copy(texR, image2.data, imageSize.size() *(bits / 8u));
concurrency::graphics::copy_async(texR, image2.data, imageSize.size() *(bits) );
cv::imshow("result", image2);
cv::waitKey(50);
}
Any help will be much appreciated.
Your indexes are swapped in two places.
int x = idx[0];
int y = idx[1];
Remember that C++AMP uses row-major indices for arrays. Thus idx[0] refers to row, y axis. This is why the picture you have for "For x" looks like what I would expect for texR.set(idx, valY).
Similarly, the extent of the image is also using swapped values.
int valX = (x / (float)imageSize[0]) * 65535;
int valY = (y / (float)imageSize[1]) * 65535;
Here imageSize[0] refers to the number of columns (the y value) not the number of rows.
I'm not familiar with OpenCV, but I'm assuming it also uses a row-major format for cv::Mat. It might invert the y axis, with (0, 0) at the top-left rather than the bottom-left. The Kinect data may do similar things but, again, it's row-major.
There may be other places in your code that have the same issue but I think if you double check how you are using index and extent you should be able to fix this.
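A minimal sketch of the fix described above (untested; texR and the rest as in the question), with idx[0] treated as the row and the extent declared rows-first:
concurrency::extent<2> imageSize(480, 640); // rows (height) first, then columns (width)
parallel_for_each(imageSize, [=](concurrency::index<2> idx) restrict(amp)
{
    int y = idx[0]; // row
    int x = idx[1]; // column
    int valX = (int)((x / (float)imageSize[1]) * 65535); // divide by the column count
    texR.set(idx, valX);
});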
I am trying to implement unsharp masking like it's done in Adobe Photoshop. I gathered a lot of information on the internet, but I'm not sure if I'm missing something. Here's the code:
void unsharpMask( cv::Mat* img, double amount, double radius, double threshold ) {
// create blurred img
cv::Mat img32F, imgBlur32F, imgHighContrast32F, imgDiff32F, unsharpMas32F, colDelta32F, compRes, compRes32F, prod;
double r = 1.5;
img->convertTo( img32F, CV_32F );
cv::GaussianBlur( img32F, imgBlur32F, cv::Size(0,0), radius );
cv::subtract( img32F, imgBlur32F, unsharpMas32F );
// increase contrast( original, amount percent )
imgHighContrast32F = img32F * amount / 100.0f;
cv::subtract( imgHighContrast32F, img32F, imgDiff32F );
unsharpMas32F /= 255.0f;
cv::multiply( unsharpMas32F, imgDiff32F, colDelta32F );
cv::compare( cv::abs( colDelta32F ), threshold, compRes, cv::CMP_GT );
compRes.convertTo( compRes32F, CV_32F );
cv::multiply( compRes32F, colDelta32F, prod );
cv::add( img32F, prod, img32F );
img32F.convertTo( *img, CV_8U );
}
At the moment I am testing with a grayscale image. If I try the exact same parameters in Photoshop, I get a much better result. My own code leads to noisy images. What am I doing wrong?
The 2nd question is: how can I apply unsharp masking to RGB images? Do I have to unsharp-mask each of the 3 channels, or would it be better in another colour space? How are these things done in Photoshop?
Thanks for your help!
I'm trying to replicate Photoshop's Unsharp Mask as well.
Let's ignore the Threshold for a second.
I will show you how to replicate Photoshop's Unsharp Mask using its Gaussian Blur.
Assuming O is the original image layer.
Create a new layer GB which is a Gaussian Blur applied on O.
Create a new layer which is O - GB (Using Apply Image).
Create a new layer by inverting GB - invGB.
Create a new layer which is O + invGB (using Apply Image).
Create a new layer which is inversion of the previous layer, namely inv(O + invGB).
Create a new layer which is O + (O - GB) - inv(O + invGB).
When you do that in Photoshop you'll get a perfect reproduction of the Unsharp Mask.
If you do the math, recalling that inv(L) = 1 - L, you get O + (O - GB) - (1 - (O + (1 - GB))) = 3O - 2GB, so the Unsharp Mask is
USM(O) = 3O - 2GB.
Yet when I do that directly in MATLAB I don't get Photoshop's results.
Hopefully someone will know the exact math.
Update
OK, I figured it out.
In Photoshop, USM(O) = O + (2 * (Amount / 100) * (O - GB)), where GB is a Gaussian-blurred version of O.
Yet, in order to replicate Photoshop's results, you must do the steps above and clip the result of each step to [0, 1], as Photoshop does.
According to the docs:
C++: void GaussianBlur(InputArray src, OutputArray dst, Size ksize, double sigmaX, double sigmaY=0, int borderType=BORDER_DEFAULT)
The 4th parameter is not "radius", it is "sigma", the Gaussian kernel's standard deviation; the radius is rather "ksize". Anyway, Photoshop is not open source, hence we cannot be sure it calculates the radius from sigma the same way OpenCV does.
Channels
Yes, you can apply sharpening to any or all of the channels, depending on your purpose. You can also use any colour space: if you want to sharpen only the brightness component and avoid increasing colour noise, you can convert to HSL or Lab space and sharpen the L channel only (Photoshop has all of these options too); a sketch follows below.
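For illustration, a minimal sketch of the Lab route, assuming an 8-bit BGR input named bgr and reusing the unsharpMask function from the question (amount, radius, threshold as there):
cv::Mat lab;
cv::cvtColor(bgr, lab, cv::COLOR_BGR2Lab);
std::vector<cv::Mat> ch;
cv::split(lab, ch);                             // ch[0] is L, ch[1] is a, ch[2] is b
unsharpMask(&ch[0], amount, radius, threshold); // sharpen the lightness channel only
cv::merge(ch, lab);
cv::cvtColor(lab, bgr, cv::COLOR_Lab2BGR);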
In response to @Royi: the 2x multiplier results from assuming no clamping in this formula:
USM(Original) = Original + Amount / 100 * ((Original - GB) - (1 - (Original + (1 - GB))))
Ignoring clamping this incorrectly reduces to:
USM(Original) = Original + 2 * Amount / 100 * (Original - GB)
However, as you also point out, (Original - GB) and (Original + inv(GB)) are clamped to [0, 1]:
USM(Original) = Original + Amount / 100 *
(Max(0, Min(1, Original - GB)) - (1 - (Max(0, Min(1, Original + (1 - GB))))))
This correctly reduces to:
USM(Original) = Original + Amount / 100 * (Original - GB)
Here is an example illustrating why:
https://legacy.imagemagick.org/discourse-server/viewtopic.php?p=133597#p133597
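A minimal sketch of the clamped formula above, assuming a CV_32F image normalised to [0, 1] so the clamps mirror Photoshop's layer behaviour:
cv::Mat usmClamped(const cv::Mat& O, const cv::Mat& GB, double amount)
{
    cv::Mat d1 = cv::min(cv::max(O - GB, 0.0), 1.0);         // Max(0, Min(1, Original - GB))
    cv::Mat d2 = cv::min(cv::max(O + (1.0 - GB), 0.0), 1.0); // Max(0, Min(1, Original + inv(GB)))
    cv::Mat result = O + (amount / 100.0) * (d1 - (1.0 - d2));
    return cv::min(cv::max(result, 0.0), 1.0);               // final clamp, as Photoshop would apply
}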
Here's the code for what I have done. I am using this code to implement Unsharp Mask and it is working well for me. Hope it is useful for you.
void USM(cv::Mat &O, int d, int amp, int threshold)
{
    cv::Mat GB;
    // Assumption: GB is meant to be a Gaussian blur of O with (odd) kernel size d;
    // the original snippet never filled GB before using it.
    cv::GaussianBlur(O, GB, cv::Size(d, d), 0);
    cv::Mat O_GB;
    cv::subtract(O, GB, O_GB);
    cv::Mat invGB = cv::Scalar(255) - GB;
    cv::add(O, invGB, invGB);
    invGB = cv::Scalar(255) - invGB;
    for (int i = 0; i < O.rows; i++)
    {
        for (int j = 0; j < O.cols; j++)
        {
            unsigned char o_rgb = O.at<unsigned char>(i, j);
            unsigned char d_rgb = O_GB.at<unsigned char>(i, j);
            unsigned char inv_rgb = invGB.at<unsigned char>(i, j);
            int newVal = o_rgb;
            if (d_rgb >= threshold)
            {
                newVal = o_rgb + (d_rgb - inv_rgb) * amp;
                if (newVal < 0) newVal = 0;
                if (newVal > 255) newVal = 255;
            }
            O.at<unsigned char>(i, j) = (unsigned char)newVal;
        }
    }
}
I use the following C++ code to read out the depth information from the Kinect:
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
// end pixel is start + width*height - 1
const USHORT * pBufferEnd = pBufferRun + (Width * Height);
// process data for display in main window.
while ( pBufferRun < pBufferEnd )
{
// discard the portion of the depth that contains only the player index
USHORT depth = NuiDepthPixelToDepth(*pBufferRun);
BYTE intensity = static_cast<BYTE>(depth % 256);
// Write out blue byte
*(rgbrun++) = intensity;
// Write out green byte
*(rgbrun++) = intensity;
// Write out red byte
*(rgbrun++) = intensity;
++rgbrun;
++pBufferRun;
}
What I'd like to know is: what is the easiest way to implement frame flipping (horizontal & vertical)? I couldn't find any function in the Kinect SDK, but maybe I missed it?
EDIT1: I'd like to avoid using any external libraries, so any solution that explains the depth data layout and how to invert rows/columns is highly appreciated.
So, you're using a standard 16 bpp single-channel depth map with player data. This is a nice, easy format to work with. The image buffer is arranged row-wise, and each pixel in the image data has the bottom 3 bits set to the player ID and the top 13 bits set to depth data.
Here's a quick'n'dirty way to read each row in reverse and write it out to an RGB-whatever image, with a simple depth visualisation that's a little nicer to look at than the wrapping output you currently use.
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
for (unsigned int y = 0; y < Height; y++)
{
for (unsigned int x = 0; x < Width; x++)
{
// shift off the player bits
USHORT depthIn = pBufferRun[(y * Width) + (Width - 1 - x)] >> 3;
// valid depth is (generally) in the range 0 to 4095.
// here's a simple visualisation to do a greyscale mapping, with white
// being closest. Set 0 (invalid pixel) to black.
BYTE intensity =
depthIn == 0 || depthIn > 4095 ?
0 : 255 - (BYTE)(((float)depthIn / 4095.0f) * 255.0f);
*(rgbrun++) = intensity;
*(rgbrun++) = intensity;
*(rgbrun++) = intensity;
++rgbrun;
}
}
Code untested, E&OE, etc ;-)
It is possible to parallelise the outer loop if, instead of using a single rgbrun pointer, you get a pointer to the beginning of the current row and write the output to that instead. A sketch of the vertical flip follows below.
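For the vertical flip, the same indexing trick applies: read the rows from the bottom up instead of mirroring within each row. A minimal sketch (untested, same buffers and assumptions as above), changing only the source index:
// Vertical flip: walk source rows bottom-up, columns left-to-right.
USHORT depthIn = pBufferRun[((Height - 1 - y) * Width) + x] >> 3;
Combining both gives a 180-degree rotation:
// Both flips at once.
USHORT depthIn = pBufferRun[((Height - 1 - y) * Width) + (Width - 1 - x)] >> 3;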