How to access a pixel in OpenCV CUDA (GpuMat) - C++

cv::Mat image = cv::Mat::zeros(1920, 1080, CV_8UC4); // just an example (my image has 4 channels)
cv::Vec4b& pixel = image.at<cv::Vec4b>(i, j); // i and j are the row and column
I want to use CUDA (GpuMat), but there is no ".at".
How can I change my code to access the pixels?

The cv::cuda::GpuMat class has its data live on the GPU/device, and this cannot be directly accessed by CPU/host code. This is why there is no equivalent to cv::Mat.at(). Transferring data between the host and device is slow, and doing a per-pixel operation on a cv::cuda::GpuMat would therefore be far slower than on a cv::Mat.
It is, however, possible to write CUDA kernels which perform per-pixel operations. I can't give detailed advice on this, but it is apparently doable, and there are answers to similar problems, such as this one, that might be able to help you.
Outside of that, depending on exactly what you need to do, there might be a built-in function that does something similar.
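For illustration, here is a minimal sketch of such a kernel, assuming a CV_8UC4 image, an OpenCV build with CUDA support, and compilation with nvcc. The kernel name, the invert operation and the launch configuration are only an example, not an existing OpenCV API:

#include <opencv2/core/cuda.hpp>
#include <cuda_runtime.h>

// Per-pixel kernel: invert the colour channels of a CV_8UC4 image, keep alpha.
// PtrStepSz gives device code access to the GpuMat's pitched memory.
__global__ void invertKernel(cv::cuda::PtrStepSz<uchar4> img)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x; // column
    int y = blockIdx.y * blockDim.y + threadIdx.y; // row
    if (x < img.cols && y < img.rows)
    {
        uchar4 p = img(y, x); // device-side equivalent of Mat::at<Vec4b>(y, x)
        img(y, x) = make_uchar4(255 - p.x, 255 - p.y, 255 - p.z, p.w);
    }
}

void invertOnGpu(cv::cuda::GpuMat& img) // img must be CV_8UC4
{
    dim3 block(16, 16);
    dim3 grid((img.cols + block.x - 1) / block.x,
              (img.rows + block.y - 1) / block.y);
    invertKernel<<<grid, block>>>(img); // GpuMat converts to PtrStepSz<uchar4>
    cudaDeviceSynchronize();
}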

Related

Extract overlapping image patches from an image in CUDA

I am currently planning on writing a function that extracts overlapping image patches from a 2D image (width x height) into a 3D batch of these patches (batch_id x patch_width x patch_height). As far as I know, there are no utilities in CUDA or OpenCV CUDA which make that very easy. (Please correct me if I am wrong here)
Since I need to resort to writing my own CUDA kernel for this task, I need to decide how to tackle it. As far as I can see, there are two ways to write the kernel:
Create a GPU thread for each pixel and map this pixel to potentially multiple locations in the 3D batch.
Create a GPU thread for each pixel in the 3D batch and let it fetch its corresponding pixel from the image.
I didn't find a clear answer in the CUDA Programming Guide to whether any of these approaches has specific advantages or disadvantages. Would you favour one of these approaches or is there an even easier way of doing this?
I think 1 is better, because it can minimize memory transactions. Memory transactions are done in a fixed size (e.g. L1: 128 bytes), so grouping data loads and making as few cache transactions as possible can reduce processing time.
Of course, it's possible that the memory transactions are the same either way. Although I'm not sure about my choice, consider this when you write your kernel.
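For reference, here is a minimal sketch of a kernel written in style 2 (one thread per pixel of the 3D batch), shown only because its indexing is the simplest to illustrate; the function name, the stride parameter and the row-major layout are assumptions, not an existing API:

#include <cuda_runtime.h>

// image:   height x width, single channel, row-major
// patches: batch_id x patch_h x patch_w, patch origins spaced `step` pixels apart
__global__ void extractPatches(const unsigned char* image, int width, int height,
                               unsigned char* patches, int patch_w, int patch_h,
                               int patches_per_row, int step)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x; // column inside the patch
    int py = blockIdx.y * blockDim.y + threadIdx.y; // row inside the patch
    int batch_id = blockIdx.z;                      // which patch

    if (px >= patch_w || py >= patch_h) return;

    // top-left corner of this patch in the source image
    int ox = (batch_id % patches_per_row) * step;
    int oy = (batch_id / patches_per_row) * step;

    int sx = ox + px;
    int sy = oy + py;
    if (sx < width && sy < height)
        patches[(batch_id * patch_h + py) * patch_w + px] = image[sy * width + sx];
}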

OpenCV multithreading with Mats

Hey! So I am working on an assignment about multi-threading with OpenCV. My question is as follows: how can I get all my threads to work on the same image (stored in a Mat)? I know making copies would make it slow, and then multi-threading would make no sense. Also, I would like to control the number of threads I use, and even though I have seen the lambdas C++11 introduced, I do not know how to use them so that I control the number of threads.
I currently have a function that calculates every pixel to be put in the image, so my code running on serial looks something like this:
for (int i = 0; i < MyMat.rows; i++) {
    for (int j = 0; j < MyMat.cols; j++) {
        uchar value = (uchar) MyFunction(i, j);
        MyMat.ptr<uchar>(i)[j] = value; // row i, column j
    }
}
English is not my mother tongue; if I did not explain myself properly, please ask for clarifications. Any help is good help!
If you split the image into horizontal bands, each thread can work on its own band independently. If each thread does not change any image data beyond its band, it should work.
In fact, OpenCV supports this already.
Take a look at parallel_for_ and how it is used.
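As a rough sketch (not a drop-in solution), the serial loop above could look like this with cv::parallel_for_, which accepts a lambda directly from OpenCV 3.2 onwards; MyFunction below is only a placeholder for the asker's per-pixel function and is assumed to be thread-safe:

#include <opencv2/core.hpp> // cv::Mat, cv::parallel_for_, cv::setNumThreads

// Placeholder for the asker's per-pixel function.
static uchar MyFunction(int i, int j) { return (uchar)((i + j) % 256); }

void fillImageParallel(cv::Mat& MyMat, int numThreads)
{
    cv::setNumThreads(numThreads); // limit the size of OpenCV's thread pool

    // Each invocation of the lambda gets a band of rows to fill; bands do not
    // overlap, so the threads never write to the same part of MyMat.
    cv::parallel_for_(cv::Range(0, MyMat.rows), [&](const cv::Range& band) {
        for (int i = band.start; i < band.end; i++) {
            uchar* row = MyMat.ptr<uchar>(i);
            for (int j = 0; j < MyMat.cols; j++)
                row[j] = MyFunction(i, j);
        }
    });
}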

Change data of OpenCV matrix from pointer

I am trying to capture images from several cameras using the camera's driver, OpenCV and C++. My goal is to get as many FPS as possible, and to this end I have found that saving the images to the hard drive is the slowest operation. In order to speed up the process, I am doing each save in a separate thread. Problem is, I still have to wait for the saving to complete to avoid the captured image being overwritten. Doing this gives good results, but for unknown reasons, every 30-40 frames the write takes about 10x longer.
I am addressing this by creating a ring buffer where I store the images, since these sudden drops in write speed are very short. I have obtained very good results with this approach, but unfortunately with more than 3 cameras the camera driver can't handle the stress and my program halts, waiting for the first image of the 4th camera to be saved. I checked, and it's not the CPU, as 3 cameras plus a thread writing random data to the disk work fine.
Now, seeing how using OpenCV reduced the stress on the camera driver, I would like to create an OpenCV Mat buffer to hold the images while they are saved, without my camera overwriting them (well, not until the buffer has done a whole lap, which I will make sure won't happen).
I know I can do
cv::Mat colorFrame(cv::Size(width, height),CV_8UC3,pointerToMemoryOfCamera);
to initialize a frame from the memory written by the camera. This does not solve my problem, as it only points to the data, and the moment the camera overwrites it, the saved image will be corrupted.
How do I create a matrix with a given size and type, and then copy the contents of the memory to this matrix?
You need to create a deep copy. You can use clone:
cv::Mat colorFrame = cv::Mat(height, width, CV_8UC3, pointerToMemoryOfCamera).clone();
You can also speed up the process of saving the images using matwrite and matread functions.
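If the per-frame allocation that clone() implies ever becomes a problem, one option is a preallocated ring of Mats and copyTo, which reuses each slot's memory as long as size and type do not change. A rough sketch with illustrative names:

#include <opencv2/core.hpp>
#include <vector>

// Fixed-size ring buffer of deep copies; every slot owns its own memory.
struct FrameRing
{
    std::vector<cv::Mat> slots;
    size_t next = 0;

    FrameRing(size_t n, int height, int width)
    {
        slots.reserve(n);
        for (size_t i = 0; i < n; i++)
            slots.emplace_back(height, width, CV_8UC3); // allocate each slot separately
    }

    // Wrap the camera memory without copying, then deep-copy it into the next slot.
    cv::Mat& push(void* pointerToMemoryOfCamera, int height, int width)
    {
        cv::Mat view(height, width, CV_8UC3, pointerToMemoryOfCamera);
        cv::Mat& slot = slots[next];
        view.copyTo(slot); // no reallocation while size and type match
        next = (next + 1) % slots.size();
        return slot;
    }
};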

Extracting RGB channels in OpenCV under C++

I'm using OpenCV to convert image data captured using an IDS uEye camera into a useful format, using the following code:
IplImage* tmpImg = cvCreateImage(cvSize(width,height),IPL_DEPTH_8U,3);
tmpImg->imageData = pFrameBuffer[k];
frame = cv::cvarrToMat(tmpImg);
This works perfectly - I can then use imwrite(filename, frame); further downstream to write the processed images out in a sensible format. I would ideally like to be able to save the RGB channels as separate 'grayscale' image files, but I don't understand the OpenCV documentation regarding single-channel operations. Can anyone suggest a means of accomplishing this? Preferably something that is not overly computationally expensive (looping over an image pixel-by-pixel isn't an option - I'm working with 60-230 fps video at up to 1280x1064, and all the processing has to be done at the point of capture).
Running the latest Debian Testing if that makes any difference (I don't think it should).
Once you have a cv::Mat object it's pretty simple:
std::vector<cv::Mat> grayPlanes;
cv::split(frame, grayPlanes);
cv::imwrite("blue.png", grayPlanes[0]);
cv::imwrite("green.png", grayPlanes[1]);
cv::imwrite("red.png", grayPlanes[2]);
The split function can directly write to a standard vector and you don't really have to think about memory management and other stuff.
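If only one of the planes is needed, cv::extractChannel avoids allocating the other two; a small sketch (the wrapper function and filename handling are just an example):

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp> // imwrite lives here in OpenCV 2.x; use imgcodecs.hpp in 3.x+
#include <string>

// Save a single channel of a BGR frame as a grayscale image.
// coi is the channel index: 0 = blue, 1 = green, 2 = red in OpenCV's default BGR order.
void saveChannel(const cv::Mat& frame, int coi, const std::string& filename)
{
    cv::Mat plane;
    cv::extractChannel(frame, plane, coi);
    cv::imwrite(filename, plane);
}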

Capture image frames from Kinect and save to Hard drive

My aim is to capture all the frames (RGB) from Kinect at 30 fps and save them to my hard drive. For doing this I took the following approach.
Get the frames from Kinect and store them in an array buffer. Since writing to disk (using imwrite()) takes a bit of time and I may miss some frames while doing so, instead of saving them directly to disk, I store them in an array. A parallel thread then accesses this array and writes the individual frames to disk as images.
I have used a static array of size 3000 and type Mat. This will suffice, since I need to store frames for a 1.5-minute video (1.5 minutes at 30 fps = 2700 frames). I have declared the array as follows:
#define NUM_FRAMES 3000
Mat rgb[NUM_FRAMES];
I have already tested this limit by reading images and saving them to the array using the following code:
for (int i = 0; i < NUM_FRAMES; i++)
{
    Mat img = imread("image.jpg", CV_LOAD_IMAGE_COLOR);
    rgb[i] = img;
    imshow("Image", img);
    cvWaitKey(10);
}
The above code executed flawlessly.
But one problem is that the code I am using to capture images from the Kinect captures them as an IplImage. Thus I need to convert the image to cv::Mat format before using it. I convert it using the following commands:
IplImage* color = cvCreateImageHeader(cvSize(COLOR_WIDTH, COLOR_HEIGHT), IPL_DEPTH_8U, 4);
cvSetData(color, colorBuffer, colorLockedRect.Pitch); // colorBuffer and colorLockedRect.Pitch is something that Kinect uses. Not related to OpenCv
rgb[rgb_read++] = Mat(color, FLAG);
Now here lies my problem. Whenever I set #define FLAG true, it causes memory leaks and gives me the error OpenCV Error: Insufficient memory (failed to allocate 1228804 bytes).
If I use #define FLAG false it works correctly, but the frames I am getting are erroneous, as shown below. They are three consecutive frames.
I was moving my arm around, and the image got cut in between, as can be seen above.
Can someone please point out the reason for this weird behaviour, or suggest any other way of obtaining the desired result? I have been struggling with this for a few days now. Please ask if any further clarifications are required.
I am using OpenCV 2.4.8, Kinect SDK for Windows version-1.8.0 and Microsoft Visual Studio 2010.
Also, can someone please explain to me the role of the copyData parameter in Mat::Mat? I have already gone through this link, but it's still not completely clear. Maybe that's why I could not solve the above error in the first place, since its workings are not very clear to me.
Thanks in advance.
first, do not use IplImages, stick with cv::Mat, please.
the equivalent code for that would be:
Mat img_borrowed = Mat( height, width, CV_8UC4, colorBuffer, colorLockedRect.Pitch );
note that this does not do any allocation on its own - it's still the kinect's pixels, so you will have to clone() it:
rgb[rgb_read++] = img_borrowed.clone();
this is the same as setting the flag in your code above to 'true'. (deep-copy the data)
[edit] maybe it's a good idea to skip the useless 4th channel (also less mem required), so instead of the above you could do:
cvtColor( img_borrowed, rgb[rgb_read++], CV_BGRA2BGR); // will make a 'deep copy', too.
so, here's the bummer: if you don't save a deep copy in your array, you'll end up with garbled (and all the same!) images, probably even with undefined behaviour due to the locking/unlocking of the kinect buffer. if you do copy it (and you must), you will need a lot of memory.
it is unlikely that you can keep 3000 * 1024 * 786 * 4 = 9658368000 bytes (~9 GB) in memory, so you'll have to cut it down one way or another.
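Putting the pieces above together, a rough sketch of the capture step could look like this (the parameter names come from the question and the wrapper function is only illustrative); it wraps the Kinect buffer without copying and converts BGRA to BGR straight into the preallocated array slot, which both deep-copies the pixels and drops the 4th channel:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

void storeFrame(cv::Mat rgb[], int& rgb_read,
                void* colorBuffer, int width, int height, int pitch)
{
    // Borrowed view of the Kinect's BGRA buffer - no allocation, no copy yet.
    cv::Mat img_borrowed(height, width, CV_8UC4, colorBuffer, (size_t)pitch);

    // cvtColor writes into rgb[rgb_read], allocating it as CV_8UC3 on first use,
    // so the stored frame no longer references the Kinect's memory.
    cv::cvtColor(img_borrowed, rgb[rgb_read++], cv::COLOR_BGRA2BGR);
}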