I was browsing some code on the OpenCV page about accessing pixel data:
IplImage* img=cvCreateImage(cvSize(640,480),IPL_DEPTH_32F,3);
CvScalar s;
s=cvGet2D(img,i,j); // get the (i,j) pixel value
printf("B=%f, G=%f, R=%f\n",s.val[0],s.val[1],s.val[2]);
s.val[0]=111;
s.val[1]=111;
s.val[2]=111;
cvSet2D(img,i,j,s); // set the (i,j) pixel value
I had done something similar, but I used the provided template classes to access pixel data. Anyway, I'm not sure I understand the part s.val[0] = 111, etc.
If s.val[0] contains the B value, what exactly is s.val[0] = 111 doing? Is it setting it to black? I don't understand exactly what it's supposed to be.
I'm used to CvScalars and such, but I don't understand this format. Specifically, what does 111 mean?
thanks
The cvSet2D(img, i, j, s) function does not access the (i,j)th pixel; it accesses the (j,i)th pixel. That is because images are stored as a matrix: you specify the row first (the Y coordinate) and then the column (the X coordinate).
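For example, a minimal sketch of the row-first ordering (assuming x and y are valid pixel coordinates for img):
// cvGet2D/cvSet2D take (row, column), i.e. (y, x)
CvScalar s = cvGet2D(img, y, x); // read the pixel at column x, row y
s.val[0] = 111; // modify the first channel (B for BGR-ordered images)
cvSet2D(img, y, x, s); // write it back to the same location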
Instead of using the cvGet/Set functions, did you try using pointers to access data within an image?
If you want direct access to the pixels, after loading an image you could do something like:
// This example converts a colored image to its grayscale version.
// Let's say that rgb_img is your previously loaded image.
IplImage* gray_frame = 0;
gray_frame = cvCreateImage(cvSize(rgb_img->width, rgb_img->height), rgb_img->depth, rgb_img->nChannels);
if (!gray_frame)
{
    fprintf(stderr, "!!! cvCreateImage failed!\n");
    return NULL;
}
for (int i = 0; i < rgb_img->width * rgb_img->height * rgb_img->nChannels; i += rgb_img->nChannels)
{
    // Cast to unsigned char so channel values above 127 are not read as negative
    unsigned char gray = ((unsigned char)rgb_img->imageData[i] +
                          (unsigned char)rgb_img->imageData[i+1] +
                          (unsigned char)rgb_img->imageData[i+2]) / 3;
    gray_frame->imageData[i]   = gray; // B
    gray_frame->imageData[i+1] = gray; // G
    gray_frame->imageData[i+2] = gray; // R
}
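Note that this loop assumes the rows are tightly packed (widthStep == width * nChannels). If your image has row padding, something along these lines should be safer:
for (int y = 0; y < rgb_img->height; y++)
{
    unsigned char* src = (unsigned char*)(rgb_img->imageData + y * rgb_img->widthStep);
    unsigned char* dst = (unsigned char*)(gray_frame->imageData + y * gray_frame->widthStep);
    for (int x = 0; x < rgb_img->width; x++)
    {
        unsigned char gray = (src[3*x] + src[3*x+1] + src[3*x+2]) / 3; // average of B, G, R
        dst[3*x] = dst[3*x+1] = dst[3*x+2] = gray; // write the same value to all three channels
    }
}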
I've been trying to use cv::Mat::forEach, but the position argument seems to correspond to the (z, y, x) position instead of the (x, y, z) position. I can't find this stated anywhere, though, and OpenCV appears to imply the opposite in its documentation.
// Creating 3D matrix (255 x 255 x 255) typed uint8_t
// and initialize all elements by the value which equals elements position.
// i.e. pixels (x,y,z) = (1,2,3) is (b,g,r) = (1,2,3).
int sizes[] = { 255, 255, 255 };
typedef cv::Point3_<uint8_t> Pixel;
Mat_<Pixel> image = Mat::zeros(3, sizes, CV_8UC3);
image.forEach<Pixel>([&](Pixel& pixel, const int position[]) -> void {
    pixel.x = position[0];
    pixel.y = position[1];
    pixel.z = position[2];
});
I've looked this up outside of OpenCV's documentation, but apparently 99% of people don't use this for more than 1D matrices. I've seen no documentation state explicitly what order the position elements are in. And OpenCV is not shy about using either natural or C-order indexing; the at<T> function, for example, supports both.
Each pixel is stored in BGR format
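One way to check it empirically (a small sketch, not from the OpenCV docs): use a tiny 3D matrix with unequal dimensions, record each index in a separate channel inside forEach, then read an element back with at<> and see which index landed where:
#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    typedef cv::Point3_<uint8_t> Pixel;
    int sizes[] = { 2, 3, 4 }; // deliberately unequal so the axes can be told apart
    cv::Mat_<Pixel> image = cv::Mat::zeros(3, sizes, CV_8UC3);
    image.forEach<Pixel>([](Pixel& pixel, const int position[]) {
        pixel.x = (uint8_t)position[0]; // record each index in a separate channel
        pixel.y = (uint8_t)position[1];
        pixel.z = (uint8_t)position[2];
    });
    // at<> with (i0, i1, i2) uses the same dimension order as the sizes[] array,
    // so comparing the stored channels against (1, 2, 3) shows how position[] is ordered.
    Pixel p = image.at<Pixel>(1, 2, 3);
    std::printf("%d %d %d\n", p.x, p.y, p.z);
    return 0;
}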
I want to manipulate the color of loaded images, but I am having trouble when I try to back up the pixel data. My code looks something like this:
Uint32* pixels, oriPixels;
SDL_Surface* image;
void BackupPixelData()
{
    pixels = (Uint32*)image->pixels;
    oriPixels = new Uint32[image->w * image->h];
    for (int i = 0; i < image->w * image->h; i++)
    {
        oriPixels[i] = pixels[i]; // This causes an access violation midway through
        *(oriPixels + i) = *(pixels + i); // Using this method does not cause any crash, but the image will have artifacts
    }
}
I can get the code to work by changing oriPixels into a vector of Uint32, and I haven't experienced any issues doing that (the image can be restored to the original color using the oriPixels).
What should I do to properly load the pixel data?
The image is 32-bit.
You must take into account data alignment. Rows may be padded depending on the surface format.
Check the documentation about the pitch field of the SDL_Surface for more details https://wiki.libsdl.org/SDL_Surface.
You get the access violation because the size of the memory buffer is not width * height pixels but actually pitch * height bytes.
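A rough sketch of a row-by-row backup that respects the pitch (assuming a 32-bit surface as stated, and oriPixels declared as Uint32*):
void BackupPixelData()
{
    oriPixels = new Uint32[image->w * image->h];
    for (int y = 0; y < image->h; y++)
    {
        // pitch is the length of one surface row in bytes, which may be larger than w * 4
        Uint32* srcRow = (Uint32*)((Uint8*)image->pixels + y * image->pitch);
        memcpy(oriPixels + y * image->w, srcRow, image->w * sizeof(Uint32)); // copy only the w visible pixels of the row (needs <string.h>)
    }
}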
Currently I am developing a tool for the Kinect for Windows v2 (similar to the one in the XBOX ONE). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB using OpenCV. But I see that it duplicates my hand when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (the rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z / 100){
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue of how to solve this? How to prevent this duplication?
Thanks in advance
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen: no duplicate hand in the background. Is there a way to prevent this duplicate hand?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source to your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);
_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);
Array.Clear(_displayPixels, 0, _displayPixels.Length);
for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];
    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);
        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];
            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff;                      // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
Finally I got some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then a possible answer.
We should start by knowing how to go from a 3D point cloud, which has the depth camera as its coordinate-system origin, to an image in the image plane of the RGB camera. To do that it is enough to use the pinhole camera model:
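In its standard form the projection is roughly (where s is just the homogeneous scale factor):
$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\left[\, R \mid t \,\right]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}
$$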
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, a.k.a. the intrinsics of the RGB camera. The following matrix holds the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, more than one point can project to the same pixel...
To put it in other words, in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can be easily converted to 3D points. The SDK gives you the projection, but several points can map to the same pixel (usually, the larger the distance along the z axis between two neighboring points, the more easily this problem appears).
Now, the big question: how can you avoid this? Well, I am not sure it can be done using the Kinect SDK, since you do not know the Z value of the points after the extrinsics are applied, so it is not possible to use a technique like Z-buffering. However, you may assume the Z value will be quite similar and use the values from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them onto the image plane, marking in another matrix which point is mapped to which pixel; if a point is already mapped to that pixel, compare the z values and always keep the point closest to the camera. Then you will have a valid mapping without any problems. This is a somewhat naive approach; you can probably find better ones, since the problem is now clear :)
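A minimal sketch of that z-buffered projection (my own illustration; cameraMatrix, R, and t are assumed to come from your calibration, and points is the depth camera's point cloud):
#include <opencv2/core.hpp>
#include <vector>

// For each 3D point, keep only the closest one that lands on a given RGB pixel.
void projectWithZBuffer(const std::vector<cv::Point3d>& points,
                        const cv::Matx33d& cameraMatrix,
                        const cv::Matx33d& R, const cv::Vec3d& t,
                        cv::Mat& depthBuffer /* CV_64F, initialized to +infinity */,
                        cv::Mat& pointIndex  /* CV_32S, initialized to -1 */)
{
    for (int i = 0; i < (int)points.size(); i++)
    {
        cv::Vec3d p = R * cv::Vec3d(points[i].x, points[i].y, points[i].z) + t; // depth -> RGB camera coordinates
        if (p[2] <= 0) continue;                  // behind the camera
        cv::Vec3d uvw = cameraMatrix * p;         // pinhole projection
        int u = (int)(uvw[0] / uvw[2] + 0.5);
        int v = (int)(uvw[1] / uvw[2] + 0.5);
        if (u < 0 || u >= depthBuffer.cols || v < 0 || v >= depthBuffer.rows) continue;
        if (p[2] < depthBuffer.at<double>(v, u))  // keep the point closest to the camera
        {
            depthBuffer.at<double>(v, u) = p[2];
            pointIndex.at<int>(v, u) = i;
        }
    }
}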
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there is an update related to this issue or whether the same thing is still happening. I used the first released version (not the pre-release) of the SDK... so a lot of changes may have happened. If someone knows whether this was solved, just leave a comment :)
I'm working with the Kinect sensor and I'm trying to align the depth and color frames so that I can save them as images which "fit" into each other. I've spent a lot of time going through MSDN forums and the modest documentation of the Kinect SDK, and I'm getting absolutely nowhere.
Based on this answer: Kinect: Converting from RGB Coordinates to Depth Coordinates
I have the following function, where depthData and colorData are obtained from NUI_LOCKED_RECT.pBits and mappedData is the output containing new color frame, mapped to depth coordinates:
bool mapColorFrameToDepthFrame(unsigned char *depthData, unsigned char* colorData, unsigned char* mappedData)
{
    INuiCoordinateMapper* coordMapper;

    // Get coordinate mapper
    m_pSensor->NuiGetCoordinateMapper(&coordMapper);

    NUI_DEPTH_IMAGE_POINT* depthPoints = new NUI_DEPTH_IMAGE_POINT[640 * 480];

    HRESULT result = coordMapper->MapColorFrameToDepthFrame(NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480, NUI_IMAGE_RESOLUTION_640x480, 640 * 480, reinterpret_cast<NUI_DEPTH_IMAGE_PIXEL*>(depthData), 640 * 480, depthPoints);
    if (FAILED(result))
    {
        return false;
    }

    int pos = 0;
    int* colorRun = reinterpret_cast<int*>(colorData);
    int* mappedRun = reinterpret_cast<int*>(mappedData);

    // For each pixel of new color frame
    for (int i = 0; i < 640 * 480; ++i)
    {
        // Find the corresponding pixel in original color frame from depthPoints
        pos = (depthPoints[i].y * 640) + depthPoints[i].x;

        // Set pixel value if it's within frame boundaries
        if (pos < 640 * 480)
        {
            mappedRun[i] = colorRun[pos];
        }
    }

    return true;
}
All I get when running this code is an unchanged color frame in which all pixels where depthFrame had no information are removed (white).
With the OpenNI framework there is an option called registration.
IMAGE_REGISTRATION_DEPTH_TO_IMAGE – The depth image is transformed to have the same apparent vantage point as the RGB image.
OpenNI 2.0 and NiTE 2.0 work very well for capturing Kinect information, and there are a lot of tutorials.
You can have a look at this:
Kinect with OpenNI
And OpenNI has an example, SimplerViewer, that merges depth and color; maybe you can just look at that and try it.
This might not be the quick answer you're hoping for, but this transformation is done successfully within the ofxKinectNui addon for openFrameworks (see here).
It looks like ofxKinectNui delegates to the GetColorPixelCoordinatesFromDepthPixel function defined here.
I think the problem is that you're calling MapColorFrameToDepthFrame, when you should actually call MapDepthFrameToColorFrame.
The smoking gun is this line of code:
mappedRun[i] = colorRun[pos];
Reading from pos and writing to i is backwards, since pos = depthPoints[i] represents the depth coordinates corresponding to the color coordinates at i. You actually want to iterate over all depth coordinates, writing to each one while reading from the input color image at the corresponding color coordinates.
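A minimal sketch of the corrected direction (assuming colorPoints was filled by MapDepthFrameToColorFrame, so colorPoints[i] holds the color coordinates for depth pixel i):
// For each pixel of the depth frame, fetch the matching color pixel
for (int i = 0; i < 640 * 480; ++i)
{
    int x = colorPoints[i].x;
    int y = colorPoints[i].y;
    if (x >= 0 && x < 640 && y >= 0 && y < 480)
    {
        mappedRun[i] = colorRun[(y * 640) + x]; // write in depth order, read at the mapped color position
    }
}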
I think that there are several incorrect lines in your code.
First of all, which kind of depth map are you passing to your function?
Depth data is stored using two bytes per value, which means that the correct pointer type for your depth data is unsigned short.
The second point is that, from what I have understood, you want to map the depth frame to the color frame, so the correct function to call from the Kinect SDK is MapDepthFrameToColorFrame instead of MapColorFrameToDepthFrame.
Finally, the function returns a map of points where, for each depth value at position [i], you have the x and y position where that point should be mapped.
To do this you don't need the colorData pointer.
So your function should be modified as follows:
/** Method used to build a depth map aligned to the color frame
#param [in] depthData : pointer to your depth data;
#param [out] mappedData : pointer to your aligned depth map;
#return true if all is OK; false when something goes wrong
*/
bool DeviceManager::mapColorFrameToDepthFrame(unsigned short *depthData, unsigned short* mappedData){
    INuiCoordinateMapper* coordMapper;
    NUI_COLOR_IMAGE_POINT* colorPoints = new NUI_COLOR_IMAGE_POINT[640 * 480]; // color points
    NUI_DEPTH_IMAGE_PIXEL* depthPoints = new NUI_DEPTH_IMAGE_PIXEL[640 * 480]; // depth pixels

    /** Be sure that you are working with the right height and width */
    unsigned long refWidth = 0;
    unsigned long refHeight = 0;
    NuiImageResolutionToSize( NUI_IMAGE_RESOLUTION_640x480, refWidth, refHeight );
    int width = static_cast<int>( refWidth );   // get the image width in the right way
    int height = static_cast<int>( refHeight ); // get the image height in the right way

    m_pSensor->NuiGetCoordinateMapper(&coordMapper); // get the coord mapper

    // Fill the depth pixel buffer from the raw depth values
    // (this assumes depthData holds one plain depth value per pixel)
    for (int i = 0; i < width * height; i++){
        depthPoints[i].playerIndex = 0;
        depthPoints[i].depth = depthData[i];
    }

    // Map your frame
    HRESULT result = coordMapper->MapDepthFrameToColorFrame( NUI_IMAGE_RESOLUTION_640x480, width * height, depthPoints, NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480, width * height, colorPoints );
    if (FAILED(result)){
        delete[] colorPoints;
        delete[] depthPoints;
        return false;
    }

    // apply the map in terms of x and y (image coordinates)
    for (int i = 0; i < width * height; i++)
        if (colorPoints[i].x > 0 && colorPoints[i].x < width && colorPoints[i].y > 0 && colorPoints[i].y < height)
            *(mappedData + colorPoints[i].x + colorPoints[i].y * width) = *(depthData + i);

    // free your memory!!!
    delete[] colorPoints;
    delete[] depthPoints;
    return true;
}
Make sure that your mappedData has been initialized in the correct way, for example as follows:
mappedData = (USHORT*)calloc(width * height, sizeof(unsigned short));
Remember that the Kinect SDK does not provide an accurate alignment function between color and depth data.
If you want an accurate alignment between the two images you should use a calibration model.
In that case I suggest you use the Kinect Calibration Toolbox, based on Heikkilä's calibration model.
You can find it at the following link:
http://www.ee.oulu.fi/~dherrera/kinect/.
First of all, you must calibrate your device.
That means you should calibrate the RGB and the IR sensors and then find the transformation between RGB and IR.
Once you know this information, you can apply the function:
RGBPoint = RotationMatrix * DepthPoint + TranslationVector
Check OpenCV or ROS projects for further details on it.
Extrinsic Calibration
Intrinsic Calibration
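As a small OpenCV-flavored sketch of applying that transformation (illustrative only; R and t come from your own extrinsic calibration):
#include <opencv2/core.hpp>

// Map a 3D point from depth-camera coordinates to RGB-camera coordinates.
cv::Point3d depthToRgb(const cv::Point3d& depthPoint,
                       const cv::Matx33d& R, const cv::Vec3d& t)
{
    cv::Vec3d p = R * cv::Vec3d(depthPoint.x, depthPoint.y, depthPoint.z) + t; // RGBPoint = R * DepthPoint + T
    return cv::Point3d(p[0], p[1], p[2]);
}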
I'm using OpenCV for object detection and one of the operations I would like to be able to perform is a per-pixel square root. I imagine the loop would be something like:
IplImage* img_;
...
for (int y = 0; y < img_->height; y++) {
    for (int x = 0; x < img_->width; x++) {
        // Take pixel square root here
    }
}
My question is how can I access the pixel value at coordinates (x, y) in an IplImage object?
Assuming img_ is of type IplImage* and assuming 16-bit unsigned integer data, I would say:
unsigned short pixel_value = ((unsigned short *)&(img_->imageData[img_->widthStep * y]))[x];
See also here for IplImage definition.
An OpenCV IplImage stores its data as a one-dimensional array, so you must compute a single index to get at the image data. The position of your pixel depends on the color depth and the number of channels in your image.
// width step (number of bytes per image row)
int ws = img_->widthStep;
// the number of channels (colors)
int nc = img_->nChannels;
// the depth in bytes of one channel
int d = (img_->depth & 0x0000ffff) >> 3;
// assuming the depth is the size of a short
unsigned short* pixel_value = (unsigned short*)(img_->imageData + (y * ws) + (x * nc * d));
// this gives you a pointer to the first channel of the pixel;
// if you are rolling grayscale, just dereference the pointer.
You can pick a channel (color) by moving the pixel pointer with pixel_value++. I would suggest using a lookup table for pixel square roots if this is going to be any sort of real-time application.
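For example, a rough sketch of such a lookup table (assuming 16-bit unsigned pixel values as above):
#include <cmath>

// Precompute the square root of every possible 16-bit pixel value once.
static unsigned short sqrtLUT[65536];

void initSqrtLUT()
{
    for (int v = 0; v < 65536; v++)
        sqrtLUT[v] = (unsigned short)(std::sqrt((double)v) + 0.5);
}

// Then, inside the pixel loop: *pixel_value = sqrtLUT[*pixel_value];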
Please use the CV_IMAGE_ELEM macro.
Also, consider using cvPow with power = 0.5 instead of working on pixels yourself, which should be avoided anyway.
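For example, a quick sketch (assuming a single-channel floating-point image, since that is what cvPow handles most naturally):
// Take the per-pixel square root of the whole image in one call
IplImage* img = cvCreateImage(cvSize(640, 480), IPL_DEPTH_32F, 1);
// ... fill img with your data ...
cvPow(img, img, 0.5); // dst = src ^ 0.5, element-wise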
You may find several ways of accessing image elements in Gady Agam's nice OpenCV tutorial here.