I am starting to learn CUDA GPU programming from a Udacity video course (the course is 2 years old). I am using CUDA 5.5 with Visual Studio Express 2012 (student edition, so not all CUDA debugging features are available) on an Nvidia GeForce GT 630M GPU.
I have just implemented vector addition and some other simple operations.
Now I am trying to convert an RGB image to grayscale. I am reading the image with the help of OpenCV. (Every method I tried failed, which is why I am here.)
Below is my .cpp file : https://gist.github.com/abidrahmank/7020863
Below is my .cu file : https://gist.github.com/abidrahmank/7020910
My input image is a simple 64x64 color image. (I used a 512x512 image first; it didn't work, so I brought it down to 64x64 to check whether size was the problem. It doesn't seem so.)
Problem
The output image of my CUDA implementation is almost entirely white: nearly every value is 255. Here and there are some gray pixels, maybe less than 1%. Everything else is white.
What I tried:
For three days, I tried following things:
I thought the problem might be due to the image size, so that the number of threads was not optimal or something like that, so I reduced the image size. Still the same result.
I tried a similar example: I created a 64x64 array, took its pixels two at a time, and computed the square of their sum, and it worked fine. Here is the code: https://gist.github.com/abidrahmank/7021023
I started checking the data one by one at each stage. The input image just before loading to the GPU is fine. But the input data, when I checked it inside the kernel, is always 255. (Check line 14 here)
Finally I set all GPU data to zero using cudaMemset and checked the input data inside the kernel; it is still 255.
So I don't have any option other than asking at StackOverflow.
Can anyone tell me what is the mistake I am making?
Your kernel signature says:
__global__ void kernel(unsigned char* d_in, unsigned char* d_out)
But you call it like:
kernel<<<rows,cols>>>(d_out, d_in);
Which one is in and which one is out?
Having done quite a bit of CUDA programming in the past, I would strongly recommend that you use Thrust instead of hand-crafting kernels. Even thrust::for_each is hard to beat with raw kernels.
Besides the parameter issue indicated by DanielKO, you also have problems with the thread/block settings.
Since you've already treated your 2-D image as a 1-D array, here's a good example showing how to set the thread/block configuration for data of arbitrary size.
https://developer.nvidia.com/content/easy-introduction-cuda-c-and-c
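Both points can be sketched in plain C++ so they run without a GPU. This is only an emulation under stated assumptions, not the code from the gists: the helper names `blocksFor` and `grayscaleEmulated`, the interleaved BGR layout (OpenCV's default channel order), and the integer luma weights are all assumptions made for illustration. The two loops stand in for the block and thread indices that a real launch would supply.

```cpp
#include <cstddef>
#include <vector>

// Number of blocks needed so that blocks * threadsPerBlock >= n (ceiling division).
inline std::size_t blocksFor(std::size_t n, std::size_t threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// CPU emulation of a 1-D launch with one "thread" per pixel.
// Assumes d_in holds interleaved B, G, R bytes and d_out one gray byte per pixel.
// Note the argument order: input first, output second, at the call site too.
inline void grayscaleEmulated(const unsigned char* d_in, unsigned char* d_out,
                              std::size_t numPixels, std::size_t threadsPerBlock) {
    const std::size_t blocks = blocksFor(numPixels, threadsPerBlock);
    for (std::size_t blockIdx = 0; blockIdx < blocks; ++blockIdx) {
        for (std::size_t threadIdx = 0; threadIdx < threadsPerBlock; ++threadIdx) {
            const std::size_t i = blockIdx * threadsPerBlock + threadIdx;
            if (i >= numPixels) continue;  // guard: the last block may be partly idle
            const unsigned char b = d_in[3 * i + 0];
            const unsigned char g = d_in[3 * i + 1];
            const unsigned char r = d_in[3 * i + 2];
            // Integer approximation of 0.299 R + 0.587 G + 0.114 B.
            d_out[i] = static_cast<unsigned char>((299 * r + 587 * g + 114 * b) / 1000);
        }
    }
}
```

Running the same input through a CPU reference like this and comparing it with the GPU output is a cheap way to catch swapped arguments or an out-of-range index.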
I am blurring the background of an image using the blur method. All the tutorials I have seen show the highest kernel size of (7,7). But that is not blurred enough for what I need it for.
I have used Size(33,33) and it works alright but I would like to go higher so currently I am using Size(77,77). Is this the most efficient way of blurring an image in OpenCV? And is it okay to go that high at all?
Another idea is to run the blur method more than once with a kernel size of (7,7), but that doesn't seem like it would be more efficient.
EDIT:
OpenCV version 3.2
Try cv::stackBlur().
It's an addition from v4.7.0. Its performance is almost flat, i.e. independent of kernel size. The pull request contains performance figures: https://github.com/opencv/opencv/pull/20379
GaussianBlur(sigmaX=22) (30 ms)
stackBlur(ksize=(101,101)) (0.4 ms)
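To see why a blur's cost does not have to grow with the kernel size, here is a minimal sliding-window sketch in plain C++. It is a simple 1-D box blur, not the stackBlur algorithm and not OpenCV's implementation; the function name `boxBlur1D` and the clamped border handling are assumptions for illustration. Each output value costs one addition and one subtraction, whatever the radius.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-D box blur of radius r using a running sum.
// Indices outside the array are clamped to the nearest edge element.
inline std::vector<int> boxBlur1D(const std::vector<int>& src, int r) {
    assert(r >= 0);
    const int n = static_cast<int>(src.size());
    std::vector<int> dst(src.size());
    if (n == 0) return dst;
    auto at = [&](int i) { return src[i < 0 ? 0 : (i >= n ? n - 1 : i)]; };
    const int window = 2 * r + 1;
    long long sum = 0;
    for (int k = -r; k <= r; ++k) sum += at(k);  // window centred on index 0
    for (int i = 0; i < n; ++i) {
        dst[i] = static_cast<int>(sum / window);
        sum += at(i + r + 1) - at(i - r);  // slide the window one step to the right
    }
    return dst;
}
```

For an image, the same pass would be applied to every row and then to every column.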
My aim is to capture all the frames (RGB) from Kinect at 30 fps and save them to my hard drive. For doing this I took the following approach.
Get the frames from Kinect and store them in an array buffer. Writing to disk (using imwrite()) takes a bit of time, and I may miss some frames while doing so. So instead of saving the frames directly to disk, I store them in an array. Another parallel thread accesses this array and writes the individual frames to disk as images.
Now I have used a static array of size 3000 and type Mat. This will suffice since I need to store frames for 1.5 minute videos (1.5 minutes = 2700 frames). I have declared the array as follows :
#define NUM_FRAMES 3000
Mat rgb[NUM_FRAMES];
I have already tested this limit by reading images and saving them to the array using the following code :
for(int i=0; i<NUM_FRAMES; i++)
{
Mat img = imread("image.jpg", CV_LOAD_IMAGE_COLOR);
rgb[i] = img;
imshow("Image", img);
cvWaitKey(10);
}
The above code executed flawlessly.
But one problem is that the code I am using for capturing image using Kinect, captures the image in an IplImage. Thus I need to convert the image to cv::Mat format before using it. I convert it using the following command:
IplImage* color = cvCreateImageHeader(cvSize(COLOR_WIDTH, COLOR_HEIGHT), IPL_DEPTH_8U, 4);
cvSetData(color, colorBuffer, colorLockedRect.Pitch); // colorBuffer and colorLockedRect.Pitch is something that Kinect uses. Not related to OpenCv
rgb[rgb_read++] = Mat(color, FLAG);
Now here lies my problem. Whenever I am setting #define FLAG true, it causes memory leaks and gives me OpenCv Error: Insufficient memory (failed to allocate 1228804 bytes) error.
But if I use #define FLAG false it works correctly, but the frames that I am getting are erroneous as shown below. They are three consecutive frames.
I was moving around my arm and the image got cut in between as can be seen from above.
Can someone please point out the reason for this weird behavior, or suggest an alternative way of obtaining the desired result? I have been struggling with this for a few days now. Please ask if any further clarification is required.
I am using OpenCV 2.4.8, Kinect SDK for Windows version-1.8.0 and Microsoft Visual Studio 2010.
Also, can someone please explain to me the role of the copyData parameter in Mat::Mat? I have already gone through this link, but it's still not completely clear. Maybe that's why I could not solve the above error in the first place, since its working is not very clear to me.
Thanks in advance.
first, do not use IplImages, stick with cv::Mat, please.
the equivalent code for that would be:
Mat img_borrowed = Mat( height, width, CV_8UC4, colorBuffer, colorLockedRect.Pitch );
note, that this does not do any allocation on its own, it's still the kinect's pixels, so you will have to clone() it:
rgb[rgb_read++] = img_borrowed.clone();
this is the same as setting the flag in your code above to 'true'. (deep-copy the data)
[edit] maybe it's a good idea to skip the useless 4th channel (also less mem required), so , instead of the above you could do:
cvtColor( img_borrowed, rgb[rgb_read++], CV_BGRA2BGR); // will make a 'deep copy', too.
so, here's the bummer: if you don't save a deep copy in your array, you'll end up with garbled (and all the same!) images, probably even with undefined behaviour due to the locking/unlocking of the kinect buffer. if you do copy it (and you must), you will need a lot of memory.
unlikely, that you can keep 3000 *1024*786*4 = 9658368000 bytes in memory, you'll have to cut it down one way or another.
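The difference between borrowing and deep-copying can be shown without OpenCV or the Kinect SDK. The sketch below is only an analogy: `BorrowedFrame`, `OwnedFrame`, `borrow`, `deepCopy` and `bytesNeeded` are hypothetical names, and a `std::vector` stands in for the sensor buffer that gets overwritten on every new frame.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a header that only points at someone else's pixels.
struct BorrowedFrame {
    const unsigned char* data;
    std::size_t size;
};

// Hypothetical stand-in for a frame that owns its own copy of the pixels.
struct OwnedFrame {
    std::vector<unsigned char> data;
};

// No allocation: the result changes whenever the buffer changes.
inline BorrowedFrame borrow(const std::vector<unsigned char>& buffer) {
    return BorrowedFrame{buffer.data(), buffer.size()};
}

// Allocates and copies: the result is independent of later buffer writes.
inline OwnedFrame deepCopy(const std::vector<unsigned char>& buffer) {
    return OwnedFrame{buffer};
}

// Bytes needed to keep `frames` deep copies of a width x height image in memory.
inline unsigned long long bytesNeeded(unsigned long long frames, unsigned long long width,
                                      unsigned long long height, unsigned long long channels) {
    return frames * width * height * channels;
}
```

With the figures used above, bytesNeeded(3000, 1024, 786, 4) gives 9658368000, and dropping the 4th channel reduces that to 7243776000, which is still far too much to hold at once.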
I want to read the contents of every pixel in an image I have and convert it to a bit-stream (raw bits) or store it in a 2-D array. Which would be the best place to start looking for such a conversion?
Specifics of the image : Standard test image called lena.bmp
size : 256 x 256
Bit depth of pixel : 8
Also I would like to know the importance of the number of bits per pixel with regards to this question since packing and unpacking will also be incorporated .
CImg is a nice simple, lightweight C++ library which can load and save a number of image formats (including BMP).
It's a single header file, so there's no need to compile or link the library. Just include the header, and you're good to go.
You should investigate OpenCV: a cross-platform computer vision library. It provides a C++ API as well as a C API, and it supports many image formats including bmp.
In the C++ interface, cv::Mat is the type that represents a 2D image. A simple application that loads and displays an image can be found here.
To learn how to access the matrix elements (pixels) you can check these threads:
OpenCV get pixel information from Mat image
Pixel access in OpenCV 2.2
Common Matrix Operations in OpenCV
OpenCV’s C++ interface offers a short introduction to cv::Mat. There have been many threads on Stackoverflow regarding OpenCV; there's a lot of valuable content around, and you can benefit a lot by using the search box.
This page has a collection of books/tutorials/install guides focused on OpenCV, but this is the newest official tutorial.
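Once a library has decoded the file into a flat buffer of pixel values, the conversion itself needs nothing beyond the standard library. The following is a minimal sketch under assumptions: the pixels are already in a row-major `std::vector`, the function names `packPixels`, `unpackPixels` and `pixelAt` are made up for illustration, and bits are written most significant first. It also shows where the bit depth matters: at 8 bits per pixel the stream is just the bytes, while at lower depths several pixels share one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack the low `bitsPerPixel` bits of each pixel into a bit-stream, MSB first.
inline std::vector<std::uint8_t> packPixels(const std::vector<std::uint8_t>& pixels,
                                            int bitsPerPixel) {
    assert(bitsPerPixel >= 1 && bitsPerPixel <= 8);
    std::vector<std::uint8_t> stream((pixels.size() * bitsPerPixel + 7) / 8, 0);
    std::size_t bitPos = 0;
    for (std::uint8_t p : pixels) {
        for (int b = bitsPerPixel - 1; b >= 0; --b, ++bitPos) {
            if ((p >> b) & 1u) {
                stream[bitPos / 8] |= static_cast<std::uint8_t>(0x80u >> (bitPos % 8));
            }
        }
    }
    return stream;
}

// Inverse of packPixels: read `count` pixels of `bitsPerPixel` bits each.
inline std::vector<std::uint8_t> unpackPixels(const std::vector<std::uint8_t>& stream,
                                              std::size_t count, int bitsPerPixel) {
    assert(bitsPerPixel >= 1 && bitsPerPixel <= 8);
    assert(stream.size() * 8 >= count * bitsPerPixel);
    std::vector<std::uint8_t> pixels(count, 0);
    std::size_t bitPos = 0;
    for (std::size_t i = 0; i < count; ++i) {
        for (int b = 0; b < bitsPerPixel; ++b, ++bitPos) {
            const unsigned bit = (stream[bitPos / 8] >> (7 - bitPos % 8)) & 1u;
            pixels[i] = static_cast<std::uint8_t>((pixels[i] << 1) | bit);
        }
    }
    return pixels;
}

// Treat the flat row-major buffer as a 2-D array: value at (row, col).
inline std::uint8_t pixelAt(const std::vector<std::uint8_t>& pixels, std::size_t width,
                            std::size_t row, std::size_t col) {
    return pixels[row * width + col];
}
```

For a 256 x 256 image at 8 bits per pixel the packed stream would be 65536 bytes, the same size as the pixel buffer.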
I'll first tell you the problem and then I'll tell you my solution.
Problem: I have a blank white PNG image approximately 900x900 pixels. I want to copy circles 30x30 pixels in size, which are essentially circles with a different colour. There are 8 different circles, and placed on the image depending on data values which I've created elsewhere.
Solution: I've used ImageMagick, which is supposed to be good for general-purpose image editing etc. I created a blank image
Image outimage("900x900","white");
I upload all other small 30x30 pixel images with 'read' function.
I upload the data and extract values.
I place the small 'circle' images on the blank one using the composite command.
outimage.composite("some file.png",pixelx,pixely,InCompositeOp);
This all works fine and the images come out the way I want them to.
However, it's painfully SLOW. It takes 20 seconds to do one image, and I have 1000 of them. Surely there must be a better way to do this. I've seen other researchers simulate images way more complex and way faster. It's quite possible I took the wrong approach. Maybe I should be 'drawing' circles instead of 'pasting' them or something. I'm quite baffled. Any input is appreciated.
I suspect that you just need some library that is capable of drawing circles on bitmap and saving that bitmap as png.
For example my Graphin library: http://code.google.com/p/graphin/
Or some such. With Graphin you can also draw one PNG on surface of another as in your case.
You did not give any information about the platform you are using (only "C++"), so if you are looking for a platform independent solution, the CImg library might be worth a try.
http://cimg.sourceforge.net/
By the way, did you try drawing the circles using the ImageMagick C++ API Magick++ instead of "composing" them? I cannot believe that it is that slow.
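As a rough illustration of "drawing" instead of "pasting", here is a minimal sketch that fills circles directly into an in-memory RGB buffer using only the standard library. The `Canvas` struct and `fillCircle` function are hypothetical names, not part of ImageMagick, Graphin or CImg, and writing the buffer out as a PNG is left to whichever library you pick.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A plain RGB raster: interleaved R, G, B bytes in row-major order.
struct Canvas {
    int width;
    int height;
    std::vector<std::uint8_t> rgb;

    Canvas(int w, int h, std::uint8_t fill)
        : width(w), height(h), rgb(static_cast<std::size_t>(w) * h * 3, fill) {}
};

// Colour every pixel within `radius` of (cx, cy); parts outside the canvas are clipped.
inline void fillCircle(Canvas& c, int cx, int cy, int radius,
                       std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    for (int y = cy - radius; y <= cy + radius; ++y) {
        if (y < 0 || y >= c.height) continue;
        for (int x = cx - radius; x <= cx + radius; ++x) {
            if (x < 0 || x >= c.width) continue;
            const int dx = x - cx;
            const int dy = y - cy;
            if (dx * dx + dy * dy > radius * radius) continue;  // outside the circle
            const std::size_t i = (static_cast<std::size_t>(y) * c.width + x) * 3;
            c.rgb[i] = r;
            c.rgb[i + 1] = g;
            c.rgb[i + 2] = b;
        }
    }
}
```

A 30x30 circle touches at most 31 x 31 pixels here, so the drawing step itself should be cheap next to encoding and saving the image.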
I need to convert 24bppRGB to 16bppRGB, 8bppRGB, 4bppRGB, 8bpp grayscale and 4bpp grayscale. Any good links or other suggestions?
preferably using Windows/GDI+
[EDIT] speed is more critical than quality. source images are screenshots
[EDIT1] color conversion is required to minimize space
You're better off getting yourself a library, as others have suggested. Aside from ImageMagick, there are others, such as OpenCV. The benefits of leaving this to a library are:
Save yourself some time -- by cutting out dev and testing time for the algorithm
Speed. Most libraries out there are optimized to a level far greater than a standard developer (such as ourselves) could achieve
Standards compliance. There are many image formats out there, and using a library cuts the problem of standards compliance out of the equation.
If you're doing this yourself, then your problem can be divided into the following sub-problems:
Simple color quantization. As @Alf P. Steinbach pointed out, this is just "downscaling" the number of colors. RGB24 has 8 bits for each of the R, G, B channels. For RGB16 you can do a number of conversions:
Equal number of bits for each of R, G, B. This typically means 4 or 5 bits each.
Favor the green channel (human eyes are more sensitive to green) and give it 6 bits. R and B get 5 bits.
You can even do the same thing for RGB24 to RGB8, but the results won't be as pretty as a palettized image:
4 bits green, 2 red, 2 blue.
3 bits green, 5 bits between red and blue
Palettization (indexed color). This is for going from RGB24 to RGB8 and RGB4. This is a hard problem to solve by yourself.
Color to grayscale conversion. Very easy. Convert your RGB24 to the Y'UV color space, and keep the Y' channel. That will give you 8bpp grayscale. If you want 4bpp grayscale, then you either quantize or do palettization.
Also be sure to check out chroma subsampling. Often, you can decrease the bitrate by a third without visible losses to image quality.
With that breakdown, you can divide and conquer. Problems 1 and 3 you can solve pretty quickly. That will allow you to see the quality you can get simply by doing coarser color quantization.
Whether or not you want to solve Problem 2 will depend on the result from above. You said that speed is more important, so if the quality of color quantization alone is good enough, don't bother with palettization.
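The quantization and grayscale steps can be sketched in a few lines of C++. These are assumptions for illustration rather than a GDI+ recipe: the function names are made up, the 16-bit layout is the common 5-6-5 packing, the 8-bit layout puts 2 bits of red, 4 of green and 2 of blue into one byte, and the luma weights are the usual 0.299 / 0.587 / 0.114 in integer form.

```cpp
#include <cstdint>

// RGB24 -> RGB16 (5-6-5): keep the top 5, 6 and 5 bits of R, G and B.
inline std::uint16_t toRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// RGB24 -> RGB8 with 2 bits red, 4 bits green, 2 bits blue (layout RRGGGGBB).
inline std::uint8_t toRgb242(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>(((r >> 6) << 6) | ((g >> 4) << 2) | (b >> 6));
}

// RGB24 -> 8bpp grayscale: integer approximation of Y' = 0.299 R + 0.587 G + 0.114 B.
inline std::uint8_t toGray8(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((299 * r + 587 * g + 114 * b) / 1000);
}

// 8bpp grayscale -> 4bpp grayscale by plain quantization (keep the top 4 bits).
inline std::uint8_t toGray4(std::uint8_t gray8) {
    return static_cast<std::uint8_t>(gray8 >> 4);
}
```

Since these are only shifts, masks and one small multiply-add per pixel, they suit the case where speed matters more than quality.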
Finally, you never mentioned WHY you are doing this. If this is for reducing storage space, then you should be looking at image compression. Even lossless compression will give you better results than reducing the color depth alone.
EDIT
If you're set on using PNG as the final format, then your options are quite limited, because both RGB16 and RGB8 are not valid combinations in the PNG header.
So what this means is: regardless of bit depth, you will have to switch to indexed color if you want RGB color images below 24bpp (8 bits per channel). This means you will NOT be able to take advantage of the color quantization and chroma decimation that I mentioned above -- it's not supported in PNG. So this means you will have to solve Problem 2 -- palettization.
But before you think about that, some more questions:
What are the dimensions of your images?
What sort of ideal file-size are you after?
How close to that ideal file-size do you get with straight RGB24 + PNG compression?
What is the source of your images? You've mentioned screenshots, but since you're so concerned about disk space, I'm beginning to suspect that you might be dealing with image sequences (video). If this is so, then you could do better than PNG compression.
Oh, and if you're serious about doing things with PNG, then definitely have a look at this library.
Find yourself a copy of the ImageMagick [sic] library. It's very configurable, so you can teach it about the details of some binary format that you need to process...
See: ImageMagick, which has a very practical license.
I got acceptable (preliminary) results with GDI+ v.1.1, which ships with Vista and Win7. It allows conversion to 16bpp (I used PixelFormat16bppRGB565) and to 8bpp and 4bpp using standard palettes. Better quality can be had with an "optimal palette": GDI+ calculates an optimal palette for each screenshot, but the conversion is two times slower. Grayscale was obtained by specifying a simple custom palette, e.g. as demonstrated here, except that I didn't need to modify pixels manually; Bitmap::ConvertFormat() did it for me.
[EDIT] results were really acceptable until I decided to check the solution on WinXP. Surprisingly, Microsoft decided to not ship GDI+ v.1.1 (required for Bitmap::ConvertFormat) to WinXP. Nice move! So I continue researching...
[EDIT] had to reimplement this on clean GDI hardcoding palettes from GDI+