sws_scale on raw yuv420p data - c++

I am getting camera data from my raspberry pi in yuv420p format. I am trying to use sws_scale to convert them into RGBA format. So this is how I initialize my context:
_sws_context = sws_getContext(CAMERA_WIDTH, CAMERA_HEIGHT, AV_PIX_FMT_YUV420P,
CAMERA_WIDTH, CAMERA_HEIGHT, AV_PIX_FMT_RGBA, 0, nullptr, nullptr, nullptr);
I am now a bit confused about how to set the data and line size for sws_scale. From the camera I just get a plain array of bytes without further structure. I assume I have to subdivide that into the planes somehow. My first approach was not to separate it at all and essentially have something like this:
const uint8_t *src_data[] = {data.data()};
const int src_strides[] = {(int) std::ceil((CAMERA_WIDTH * 6) / 8)};
This was based on:
there are 12 bits for a 2x2 grid of pixels
So I assumed one line would use half of this. But that causes a segmentation fault. So I think I somehow have to split src_data and src_strides into the respective YUV planes, but I am not sure how to do this, especially since one pixel for YUV420 data uses less than one byte per plane...

Turns out it is simpler than I thought! The planes are one after another which also makes the strides pretty obvious:
const auto y = data.data();
const auto u = y + CAMERA_WIDTH * CAMERA_HEIGHT;
const auto v = u + (CAMERA_WIDTH * CAMERA_HEIGHT) / 4;
const auto stride_y = CAMERA_WIDTH;
const auto stride_u = CAMERA_WIDTH / 2;
const auto stride_v = CAMERA_WIDTH / 2;
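To make the layout concrete, here is a minimal sketch of the plane split as a helper. It assumes the camera buffer is tightly packed (no row padding) and that width and height are even; the names Yuv420pPlanes, split_yuv420p and yuv420p_frame_size are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Plane pointers and strides in the order sws_scale expects (Y, U, V).
struct Yuv420pPlanes {
    const std::uint8_t* data[3];
    int strides[3];
};

// Splits a tightly packed yuv420p buffer into its three planes.
// Assumes even width and height, so each chroma plane is (width/2) x (height/2).
inline Yuv420pPlanes split_yuv420p(const std::uint8_t* buffer, int width, int height) {
    assert(buffer != nullptr);
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t luma_size = static_cast<std::size_t>(width) * height;
    const std::size_t chroma_size = luma_size / 4;
    Yuv420pPlanes p{};
    p.data[0] = buffer;                            // Y plane: one byte per pixel
    p.data[1] = buffer + luma_size;                // U plane: one byte per 2x2 block
    p.data[2] = buffer + luma_size + chroma_size;  // V plane: one byte per 2x2 block
    p.strides[0] = width;
    p.strides[1] = width / 2;
    p.strides[2] = width / 2;
    return p;
}

// Total bytes of one frame: 12 bits per pixel on average.
inline std::size_t yuv420p_frame_size(int width, int height) {
    return static_cast<std::size_t>(width) * height * 3 / 2;
}
```

The data and strides arrays can then be passed as the source arguments of sws_scale, with a single destination plane whose stride is CAMERA_WIDTH * 4 for RGBA. Checking the buffer length against yuv420p_frame_size first is a cheap guard against the segmentation fault described in the question.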

Related

How to create a custom winrt::Microsoft::AI::MachineLearning::TensorFloat16Bit?

How do I create a TensorFloat16Bit when manually doing a tensorization of the data?
We tensorized our data based on this Microsoft example, where we are converting 255-0 to 1-0, and changing the RGBA order.
...
std::vector<int64_t> shape = { 1, channels, height , width };
float* pCPUTensor;
uint32_t uCapacity;
// The channels of image stored in buffer is in order of BGRA-BGRA-BGRA-BGRA.
// Then we transform it to the order of BBBBB....GGGGG....RRRR....AAAA(dropped)
TensorFloat tf = TensorFloat::Create(shape);
com_ptr<ITensorNative> itn = tf.as<ITensorNative>();
CHECK_HRESULT(itn->GetBuffer(reinterpret_cast<BYTE**>(&pCPUTensor), &uCapacity));
// 2. Transform the data in buffer to a vector of float
if (BitmapPixelFormat::Bgra8 == pixelFormat)
{
for (UINT32 i = 0; i < size; i += 4)
{
// suppose the model expects BGR image.
// index 0 is B, 1 is G, 2 is R, 3 is alpha(dropped).
UINT32 pixelInd = i / 4;
pCPUTensor[pixelInd] = (float)pData[i];
pCPUTensor[(height * width) + pixelInd] = (float)pData[i + 1];
pCPUTensor[(height * width * 2) + pixelInd] = (float)pData[i + 2];
}
}
ref: https://github.com/microsoft/Windows-Machine-Learning/blob/2179a1dd5af24dff4cc2ec0fc4232b9bd3722721/Samples/CustomTensorization/CustomTensorization/TensorConvertor.cpp#L59-L77
I just converted our .onnx model to float16 to verify if that would provide some performance improvements on the inference when the available hardware provides support for float16. However, the binding is failing and the suggestion here is to pass a TensorFloat16Bit.
So if I swap the TensorFloat for TensorFloat16Bit I get an access violation exception at pCPUTensor[(height * width * 2) + pixelInd] = (float)pData[i + 2]; because the buffer behind pCPUTensor is half the size it was. It seems like I should reinterpret_cast to uint16_t** or something along those lines, so that pCPUTensor covers the same number of elements as when it was a TensorFloat, but then I get further errors that it can only be uint8_t** or BYTE**.
Any ideas on how I can modify this code so I can get a custom TensorFloat16Bit?
Try the factory methods on TensorFloat16Bit.
However, you will need to convert your data to float16:
https://stackoverflow.com/a/60047308/11998382
Also, I might recommend you instead do the conversion within the onnx model.
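If a manual conversion does turn out to be necessary, the following is a rough sketch of converting float values to half-precision bit patterns in portable C++. It does not use any WinML API; float_to_half and floats_to_halves are hypothetical helper names, and the code assumes float is IEEE 754 binary32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit IEEE 754 float");

// Converts one binary32 value to the bit pattern of a binary16 value,
// rounding to nearest (ties to even).
inline std::uint16_t float_to_half(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t mantissa = bits & 0x7FFFFFu;

    if (exponent == 0xFFu) {  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mantissa ? 0x0200u : 0u));
    }
    const int half_exp = static_cast<int>(exponent) - 127 + 15;
    if (half_exp >= 31) {  // too large for half precision: becomes infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (half_exp <= 0) {  // subnormal or zero in half precision
        if (half_exp < -10) return static_cast<std::uint16_t>(sign);
        mantissa |= 0x800000u;  // restore the implicit leading 1
        const int shift = 14 - half_exp;
        std::uint32_t half_mant = mantissa >> shift;
        const std::uint32_t rem = mantissa & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half_mant & 1u))) ++half_mant;
        return static_cast<std::uint16_t>(sign | half_mant);
    }
    std::uint32_t half = (static_cast<std::uint32_t>(half_exp) << 10) | (mantissa >> 13);
    const std::uint32_t rem = mantissa & 0x1FFFu;
    // A rounding carry into the exponent field is intended here.
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) ++half;
    return static_cast<std::uint16_t>(sign | half);
}

// Converts `count` floats into half-precision bit patterns.
inline std::vector<std::uint16_t> floats_to_halves(const float* src, std::size_t count) {
    assert(src != nullptr || count == 0);
    std::vector<std::uint16_t> out(count);
    for (std::size_t i = 0; i < count; ++i) out[i] = float_to_half(src[i]);
    return out;
}
```

Pixel values in the 0 to 255 range are exactly representable in half precision, so for this use case the conversion itself loses nothing; any loss comes from later arithmetic in the model.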

Flatten array of structs efficiently

I'm looking for the most efficient way to flatten an array of structs in C++ so that I can pass the flattened 1D array data as input to a cv::Mat. The struct looks as follows:
struct Color3
{
uint8_t red, green, blue;
};
My code then looks like this:
// Update color frame
cv::Mat colorMat = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
const Color3* colorPtr = colorFrame->getData(); // Get Frame from Library
std::vector<uchar> vecColorData;
vecColorData.reserve(1920 * 1080 * 3);
for (int i = 0; i < 1920 * 1080; ++i)
{
auto color = *colorPtr;
vecColorData.push_back(color.red);
vecColorData.push_back(color.green);
vecColorData.push_back(color.blue);
colorPtr++;
}
colorMat.data = vecColorData.data();
Is there a more efficient way than creating an intermediate std::vector and looping over the entire array? I guess I'm looking for something like:
colorMat.data = colorFrame->getData()
However, I'm getting the following error: a value of type Color3* cannot be assigned to an entity of type uchar*.
You don't need an intermediate vector.
If I understood, you want to assign the same RGB triple to all data.
It is also unclear to me if you have to allocate colorMat.data on your own or not.
If this is the case, once colorMat.data is allocated and sized 1920 * 1080 * 3, you can do something like the following:
uchar * data = colorMat.data;
for (int i = 0; i < 1920 * 1080; ++i)
{
*data++ = (uchar)colorPtr->red;
*data++ = (uchar)colorPtr->green;
*data++ = (uchar)colorPtr->blue;
}
The following answer is not technically portable but will work on the vast majority of platforms you will encounter in real life.
It is extremely likely that your Color3 struct has no padding. You can verify this by using a static_assert:
static_assert(sizeof(Color3) == sizeof(uint8_t) * 3);
With this confirmed you can cast an array of Color3 to an array of uint8_t and pass it directly to colorMat.data (assuming that member actually accepts uint8_t).
Your code therefore becomes:
cv::Mat colorMat = cv::Mat::zeros(cv::Size(1920, 1080), CV_8UC3);
const Color3* colorPtr = colorFrame->getData(); // Get Frame from Library
// cv::Mat::data is a non-const uchar*, so the const has to be cast away as well
colorMat.data = reinterpret_cast<uint8_t*>(const_cast<Color3*>(colorPtr));
Bear in mind I have never used the cv library and know nothing about the ownership requirements of the data pointer. The above just replicates what you're doing without the unnecessary std::vector.
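For anyone who does need an owned flat copy rather than a view, here is a small sketch of the same idea without OpenCV. It relies on the no-padding assumption checked by the static_assert, and flatten_colors is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Color3 {
    std::uint8_t red, green, blue;
};

// The whole approach relies on Color3 being exactly three packed bytes.
static_assert(sizeof(Color3) == 3, "Color3 must have no padding");

// Copies `count` pixels into a flat R,G,B,R,G,B,... byte buffer with one memcpy
// instead of three push_back calls per pixel.
inline std::vector<std::uint8_t> flatten_colors(const Color3* pixels, std::size_t count) {
    assert(pixels != nullptr || count == 0);
    std::vector<std::uint8_t> flat(count * sizeof(Color3));
    if (count != 0) {
        std::memcpy(flat.data(), pixels, flat.size());
    }
    return flat;
}
```

With OpenCV, the copy can be avoided entirely by using the cv::Mat constructor that takes an external data pointer, e.g. cv::Mat(1080, 1920, CV_8UC3, ptr). That constructor neither copies nor takes ownership of the buffer, so the frame must outlive the Mat. Note also that the bytes are in R, G, B order, whereas most OpenCV display and file functions expect B, G, R.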

(C++)(Visual Studio) Change RGB to Grayscale

I am accessing the image like so:
pDoc = GetDocument();
int iBitPerPixel = pDoc->_bmp->bitsperpixel; // used to see if grayscale(8 bits) or RGB (24 bits)
int iWidth = pDoc->_bmp->width;
int iHeight = pDoc->_bmp->height;
BYTE *pImg = pDoc->_bmp->point; // pointer used to point at pixels in the image
int Wp = iWidth;
const int area = iWidth * iHeight;
int r; // red pixel value
int g; // green pixel value
int b; // blue pixel value
int gray; // gray pixel value
BYTE *pImgGS = pImg; // grayscale image pixel array
and attempting to change the rgb image to gray like so:
// convert RGB values to grayscale at each pixel, then put in grayscale array
for (int i = 0; i<iHeight; i++)
for (int j = 0; j<iWidth; j++)
{
r = pImg[i*iWidth * 3 + j * 3 + 2];
g = pImg[i*iWidth * 3 + j * 3 + 1];
b = pImg[i*Wp + j * 3];
r * 0.299;
g * 0.587;
b * 0.144;
gray = std::round(r + g + b);
pImgGS[i*Wp + j] = gray;
}
finally, this is how I try to draw the image:
//draw the picture as grayscale
for (int i = 0; i < iHeight; i++) {
for (int j = 0; j < iWidth; j++) {
// this should set every corresponding grayscale picture to the current picture as grayscale
pImg[i*Wp + j] = pImgGS[i*Wp + j];
}
}
}
original image:
and the resulting image that I get is this:
First, check that the image type is 24 bits per pixel.
Second, allocate memory for pImgGS:
BYTE* pImgGS = (BYTE*)malloc(sizeof(BYTE)*iWidth *iHeight);
Please refer to this article to see how bmp data is stored. BMP images are normally stored upside down (bottom row first). Also, the first 54 bytes of a typical file are header information (a 14-byte BITMAPFILEHEADER followed by a 40-byte BITMAPINFOHEADER), not pixel data.
Hence you should access values in following way,
double r,g,b;
unsigned char gray;
for (int i = 0; i<iHeight; i++)
{
for (int j = 0; j<iWidth; j++)
{
r = (double)pImg[(i*iWidth + j)*3 + 2];
g = (double)pImg[(i*iWidth + j)*3 + 1];
b = (double)pImg[(i*iWidth + j)*3 + 0];
r= r * 0.299;
g= g * 0.587;
b= b * 0.114;
gray = floor((r + g + b + 0.5));
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
}
}
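If there is padding present, first determine the pitch (the number of bytes per row including padding; BMP rows are padded to a multiple of 4 bytes) and step through the rows using it. Refer to this to understand pitch and padding.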
double r,g,b;
unsigned char gray;
long index=0;
for (int i = 0; i<iHeight; i++)
{
for (int j = 0; j<iWidth; j++)
{
r = (double)pImg[index+ (j)*3 + 2];
g = (double)pImg[index+ (j)*3 + 1];
b = (double)pImg[index+ (j)*3 + 0];
r= r * 0.299;
g= g * 0.587;
b= b * 0.114;
gray = floor((r + g + b + 0.5));
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
}
index =index +pitch;
}
While drawing image,
as pImg is 24bpp, you need to copy each gray value three times, once into each of the R, G and B channels. If you ultimately want to save the grayscale image in bmp format, then you again have to write the bmp data upside down, or you can simply skip the flip when converting to gray here:
pImgGS[(iHeight-i-1)*iWidth + j] = gray;
tl; dr:
Make one common path. Convert everything to 32-bits in a well-defined manner, and do not use image dimensions or coordinates. Refactor the YCbCr conversion ( = grey value calculation) into a separate function, this is easier to read and runs at exactly the same speed.
The lengthy stuff
First, you seem to have been confused by strides and offsets. The artefact that you see is there because you accidentally wrote out one value per pixel (and in total only one third of the data) when you should have written three values.
One can get confused with this easily, but here it happened because you do useless stuff that you needed not do in the first place. You are iterating coordinates left to right, top-to-bottom and painstakingly calculate the correct byte offset in the data for each location.
However, you're doing a full-screen effect, so what you really want is iterate over the complete image. Who cares about the width and height? You know the beginning of the data, and you know the length. One loop over the complete blob will do the same, only faster, with less obscure code, and fewer opportunities of getting something wrong.
Next, 24-bit bitmaps are common as files, but they are rather unusual for in-memory representation because the format is nasty to access and unsuitable for hardware. Drawing such a bitmap will require a lot of work from the driver or the graphics hardware (it will work, but it will not work well). Therefore, 32-bit depth is usually a much better, faster, and more comfortable choice. It is much more "natural" to access program-wise.
You can rather trivially convert 24-bit to 32-bit. Iterate over the complete bitmap data and write out a complete 32-bit word for each 3 byte-tuple read. Windows bitmaps ignore the A channel (the highest-order byte), so just leave it zero, or whatever.
Also, there is no such thing as an 8-bit greyscale bitmap. This simply doesn't exist. Although there exist bitmaps that look like greyscale bitmaps, they are in reality paletted 8-bit bitmaps where (incidentally) the bmiColors member contains all greyscale values.
Therefore, unless you can guarantee that you will only ever process images that you have created yourself, you cannot just rely that e.g. the values 5 and 73 correspond to 5/255 and 73/255 greyscale intensity, respectively. That may be the case, but it is in general a wrong assumption.
In order to be on the safe side as far as correctness goes, you must convert your 8-bit greyscale bitmaps to real colors by looking up the indices (the bitmap's grey values are really indices) in the palette. Otherwise, you could be loading a greyscale image where the palette is the other way around (so 5 would mean 250 and 250 would mean 5), or a bitmap which isn't greyscale at all.
So... you want to convert 24-bit and you want to convert 8-bit bitmaps, both to 32-bit depth. That means you do all the annoying what-if stuff once at the beginning, and the rest is one identical common path. That's a good thing.
What you will be showing on-screen is always a 32-bit bitmap where the topmost byte is ignored, and the lower three are all the same value, resulting in what looks like a shade of grey. That's simple, and simple is good.
Note that if you do a BT.601 style YCbCr conversion (as indicated by your use of the constants 0.299, 0.587, and 0.144; the standard blue weight is 0.114), and if your 8-bit greyscale images are perceptual (this is something you must know, there is no way of telling from the file!), then for 100% correctness you need to do the inverse transformation when converting from paletted 8-bit to RGB. Otherwise, your final result will look almost right, but not quite. If your 8-bit greyscales are linear, i.e. were created without using the above constants (again, you must know, you cannot tell from the image), you need to copy everything as-is (here, doing the conversion would make it look almost-but-not-quite right).
About the RGB-to-greyscale conversion, you do not need an extra greyscale bitmap just to hold the values that you never need again afterwards. You can read the three color values from the loaded bitmap, calculate Y, and directly build the 32-bit ARGB word, which you then write out to the final bitmap. This saves one entirely useless round-trip to memory which is not necessary.
Something like this:
uint32_t* out = (uint32_t*) output_bitmap_data;
// in points at the first byte of the 24-bit input data
for(int i = 0; i < inputSize; i += 3, in += 3)
{
uint8_t Y = calc_greyscale(in[0], in[1], in[2]);
*out++ = (Y<<16) | (Y<<8) | Y;
}
Alternatively, you can also do the from-whatever-to-32 conversion, and then do the to-greyscale conversion in-place there. This, in turn, introduces an extra round-trip to memory, but the code becomes much, much easier overall.
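As a rough illustration of the single-pass idea, here is a self-contained sketch. The calc_greyscale body is only one possible implementation (integer BT.601 weights), bgr24_to_grey32 is a hypothetical name, and the code assumes tightly packed B, G, R input with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// BT.601 luma from 8-bit R, G, B, using integer weights that sum to 256
// (77/256 ~ 0.299, 150/256 ~ 0.587, 29/256 ~ 0.114), rounded to nearest.
inline std::uint8_t calc_greyscale(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint8_t>((77u * r + 150u * g + 29u * b + 128u) >> 8);
}

// Converts tightly packed 24-bit B,G,R pixels to 32-bit 0x00YYYYYY grey pixels
// in one pass over the data, without an intermediate greyscale buffer.
// Padded rows would have to be handled scanline by scanline instead.
inline std::vector<std::uint32_t> bgr24_to_grey32(const std::uint8_t* in, std::size_t pixel_count) {
    assert(in != nullptr || pixel_count == 0);
    std::vector<std::uint32_t> out(pixel_count);
    for (std::size_t i = 0; i < pixel_count; ++i, in += 3) {
        const std::uint32_t y = calc_greyscale(in[2], in[1], in[0]);
        out[i] = (y << 16) | (y << 8) | y;  // top byte left at zero
    }
    return out;
}
```

Because the three weights sum to exactly 256, white stays at 255 and the result can never overflow a byte, which is not true of the 0.299 / 0.587 / 0.144 combination in the question.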

C++AMP Computing gradient using texture on a 16 bit image

I am working with depth images retrieved from a Kinect, which are 16 bits. I ran into some difficulties writing my own filters because of the indexing or the size of the images.
I am working with textures because they allow working with images of any bit depth.
So, I am trying to compute a simple gradient to understand what is wrong or why it doesn't work as I expected.
You can see that there is something wrong when I use the y direction.
For x:
For y:
That's my code:
typedef concurrency::graphics::texture<unsigned int, 2> TextureData;
typedef concurrency::graphics::texture_view<unsigned int, 2> Texture;
cv::Mat image = cv::imread("Depth247.tiff", CV_LOAD_IMAGE_ANYDEPTH);
//just a copy from another image
cv::Mat image2(image.clone() );
concurrency::extent<2> imageSize(640, 480);
int bits = 16;
const unsigned int nBytes = imageSize.size() * 2; // 614400
{
uchar* data = image.data;
// Result data
TextureData texDataD(imageSize, bits);
Texture texR(texDataD);
parallel_for_each(
imageSize,
[=](concurrency::index<2> idx) restrict(amp)
{
int x = idx[0];
int y = idx[1];
// 65535 is the maximum value that a 16-bit pixel can take (2^16 - 1)
int valX = (x / (float)imageSize[0]) * 65535;
int valY = (y / (float)imageSize[1]) * 65535;
texR.set(idx, valX);
});
//concurrency::graphics::copy(texR, image2.data, imageSize.size() *(bits / 8u));
concurrency::graphics::copy_async(texR, image2.data, imageSize.size() *(bits) );
cv::imshow("result", image2);
cv::waitKey(50);
}
Any help will be very appreciated.
Your indexes are swapped in two places.
int x = idx[0];
int y = idx[1];
Remember that C++AMP uses row-major indices for arrays. Thus idx[0] refers to row, y axis. This is why the picture you have for "For x" looks like what I would expect for texR.set(idx, valY).
Similarly the extent of image is also using swapped values.
int valX = (x / (float)imageSize[0]) * 65535;
int valY = (y / (float)imageSize[1]) * 65535;
Here imageSize was declared as (640, 480), so imageSize[0] is 640, the number of columns, whereas dimension 0 of an extent is the number of rows (the y size).
I'm not familiar with OpenCV but I'm assuming that it also uses a row major format for cv::Mat. It might invert the y axis with 0, 0 top-left not bottom-left. The Kinect data may do similar things but again, it's row major.
There may be other places in your code that have the same issue but I think if you double check how you are using index and extent you should be able to fix this.
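To illustrate the convention without any GPU code, here is a CPU-only sketch of a horizontal ramp written with rows first. It is plain C++ rather than C++ AMP, and horizontal_ramp is a made-up name; the point is only which loop variable plays the role of idx[0] and which of idx[1].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fills a rows x cols 16-bit image with a horizontal ramp (0 at the left edge).
// Mirrors the row-major convention: dimension 0 is the row (y), dimension 1 the column (x).
inline std::vector<std::uint16_t> horizontal_ramp(int rows, int cols) {
    assert(rows > 0 && cols > 0);
    std::vector<std::uint16_t> image(static_cast<std::size_t>(rows) * cols);
    for (int y = 0; y < rows; ++y) {      // plays the role of idx[0]
        for (int x = 0; x < cols; ++x) {  // plays the role of idx[1]
            const float t = x / static_cast<float>(cols);
            image[static_cast<std::size_t>(y) * cols + x] =
                static_cast<std::uint16_t>(t * 65535.0f);
        }
    }
    return image;
}
```

For a 640 by 480 depth frame this would be called as horizontal_ramp(480, 640), which corresponds to declaring the extent as (480, 640) in the AMP code.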

Converting YUV into BGR or RGB in OpenCV

I have a TV capture card that has a feed coming in as a YUV format. I've seen other posts here similar to this question and tried every method suggested, but none of them produced a clear image. At the moment the best results were with the OpenCV cvCvtColor(src, dst, CV_YUV2BGR) function call.
I am currently unfamiliar with the YUV format and, to be honest, it confuses me a little, as it looks like it stores 4 channels but there are only 3? I have included an image from the capture card in the hope that someone can understand what is going on and help me fill in the blanks.
The feed is coming in through a DeckLink Intensity Pro card and being accessed in a C++ application using OpenCV in a Windows 7 environment.
Update
I have looked at a wikipedia article regarding this information and attempted to use the formula in my application. Below is the code block with the output received from it. Any advice is greatly appreciated.
BYTE* pData;
videoFrame->GetBytes((void**)&pData);
m_nFrames++;
printf("Num Frames executed: %d\n", m_nFrames);
for(int i = 0; i < 1280 * 720 * 3; i=i+3)
{
m_RGB->imageData[i] = pData[i] + pData[i+2]*((1 - 0.299)/0.615);
m_RGB->imageData[i+1] = pData[i] - pData[i+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[i+2]*((0.299*(1 - 0.299))/(0.615*0.587));
m_RGB->imageData[i+2] = pData[i] + pData[i+1]*((1 - 0.114)/0.436);
}
Newer versions of OpenCV have a built-in function that can do the YUV to RGB conversion:
cvtColor(src,dst,CV_YUV2BGR_YUY2);
Specify the YUV format after the underscore, like this: CV_YUV2BGR_xxxx
It looks to me like you're decoding a YUV422 stream as YUV444. Try this modification to the code you provided:
for(int i = 0, j=0; i < 1280 * 720 * 3; i+=6, j+=4)
{
m_RGB->imageData[i] = pData[j] + pData[j+3]*((1 - 0.299)/0.615);
m_RGB->imageData[i+1] = pData[j] - pData[j+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[j+3]*((0.299*(1 - 0.299))/(0.615*0.587));
m_RGB->imageData[i+2] = pData[j] + pData[j+1]*((1 - 0.114)/0.436);
m_RGB->imageData[i+3] = pData[j+2] + pData[j+3]*((1 - 0.299)/0.615);
m_RGB->imageData[i+4] = pData[j+2] - pData[j+1]*((0.114*(1-0.114))/(0.436*0.587)) - pData[j+3]*((0.299*(1 - 0.299))/(0.615*0.587));
m_RGB->imageData[i+5] = pData[j+2] + pData[j+1]*((1 - 0.114)/0.436);
}
I'm not sure you've got your constants correct, but at worst your colors will be off - the image should be recognizable.
I use the following C++ code using OpenCV to convert yuv data (YUV_NV21) to rgb image (BGR in OpenCV)
#include <fstream>
#include <opencv2/opencv.hpp>
int main()
{
const int width = 1280;
const int height = 800;
std::ifstream file_in;
file_in.open("../image_yuv_nv21_1280_800_01.raw", std::ios::binary);
std::filebuf *p_filebuf = file_in.rdbuf();
size_t size = p_filebuf->pubseekoff(0, std::ios::end, std::ios::in);
p_filebuf->pubseekpos(0, std::ios::in);
char *buf_src = new char[size];
p_filebuf->sgetn(buf_src, size);
cv::Mat mat_src = cv::Mat(height*1.5, width, CV_8UC1, buf_src);
cv::Mat mat_dst = cv::Mat(height, width, CV_8UC3);
cv::cvtColor(mat_src, mat_dst, cv::COLOR_YUV2BGR_NV21);
cv::imwrite("yuv.png", mat_dst);
file_in.close();
delete []buf_src;
return 0;
}
and the converted result is like the image yuv.png.
you can find the testing raw image from here and the whole project from my Github Project
It may be the wrong path, but many people (I mean, engineers) mix up YUV and YCbCr.
Try
cvCvtColor(src, dst, CV_YCbCr2RGB)
or CV_YCrCb2RGB or maybe a more exotic type.
The BlackMagic Intensity software returns YUVY' format in bmdFormat8BitYUV, so 2 source pixels are compressed into 4 bytes - I don't think OpenCV's cvtColor can handle this.
You can either do it yourself, or just call the Intensity software ConvertFrame() function
edit: Y U V is normally stored in repeating 4-byte groups, each holding two Y samples with one U and one V between them.
There is a Y (brightness) for each pixel but only a U and V (colour) for every alternate pixel in the row.
So if data is an unsigned char pointer to the start of the memory as shown above:
pixel 1, Y = data[0] U = data[+1] V = data[+3]
pixel 2, Y = data[+2] U = data[+1] V = data[+3]
Then use the YUV->RGB coefficients you used in your sample code.
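Here is a hedged sketch of that unpacking step. The byte order inside each 4-byte group is passed in as a parameter, because it should be checked against the capture card's documentation rather than taken from this answer. The coefficients are full-range BT.601 with chroma centred on 128, which is an assumption and not necessarily what the card delivers; all names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offsets of the samples inside one 4-byte group that encodes two pixels.
struct Packed422Layout { int y0, u, y1, v; };
constexpr Packed422Layout kYUYV{0, 1, 2, 3};  // Y0 U Y1 V, as described above
constexpr Packed422Layout kUYVY{1, 0, 3, 2};  // U Y0 V Y1

inline std::uint8_t clamp_to_byte(int value) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, value)));
}

// Writes one B,G,R triple. Assumed coefficients: full-range BT.601, fixed point / 1024.
inline void yuv_to_bgr(int y, int u, int v, std::uint8_t* bgr) {
    const int d = u - 128;
    const int e = v - 128;
    bgr[0] = clamp_to_byte(y + (1814 * d) / 1024);            // B ~ Y + 1.772 U
    bgr[1] = clamp_to_byte(y - (352 * d + 731 * e) / 1024);   // G ~ Y - 0.344 U - 0.714 V
    bgr[2] = clamp_to_byte(y + (1436 * e) / 1024);            // R ~ Y + 1.402 V
}

// Expands packed 4:2:2 data to 3 bytes per pixel; both pixels of a group share U and V.
inline std::vector<std::uint8_t> packed422_to_bgr(const std::uint8_t* src, int width,
                                                  int height, Packed422Layout layout) {
    assert(src != nullptr && width > 0 && height > 0 && width % 2 == 0);
    const std::size_t groups = static_cast<std::size_t>(width) * height / 2;
    std::vector<std::uint8_t> bgr(groups * 6);
    for (std::size_t g = 0; g < groups; ++g) {
        const std::uint8_t* in = src + g * 4;
        std::uint8_t* out = bgr.data() + g * 6;
        yuv_to_bgr(in[layout.y0], in[layout.u], in[layout.v], out);
        yuv_to_bgr(in[layout.y1], in[layout.u], in[layout.v], out + 3);
    }
    return bgr;
}
```

If the picture comes out recognisable but with wrong colours, trying the other layout constant is a quick way to tell a byte-order problem apart from a coefficient problem.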
Some people may be confusing the color models YCbCr and YUV.
OpenCV does not handle YCbCr. Instead it has YCrCb, which is implemented the same way as YUV in OpenCV.
From the opencv sources https://github.com/Itseez/opencv/blob/2.4/modules/imgproc/src/color.cpp#L3830:
case CV_BGR2YCrCb: case CV_RGB2YCrCb:
case CV_BGR2YUV: case CV_RGB2YUV:
// ...
// 1 if it is BGR, 0 if it is RGB
bidx = code == CV_BGR2YCrCb || code == CV_BGR2YUV ? 0 : 2;
//... converting to YUV with the only difference that brings
// order of Blue and Red channels (variable bidx)
But there is one more thing to say.
There is currently a bug in conversion CV_BGR2YUV and CV_RGB2YUV in OpenCV branch 2.4.* .
At present, this formula is used in implementation:
Y = 0.299B + 0.587G + 0.114R
U = 0.492(R-Y)
V = 0.877(B-Y)
What it should be (according to wikipedia):
Y = 0.299R + 0.587G + 0.114B
U = 0.492(B-Y)
V = 0.877(R-Y)
The channels Red and Blue are misplaced in the implemented formula.
Possible workaround to convert BGR->YUV while the bug is not fixed :
cv::Mat source = cv::imread(filename, CV_LOAD_IMAGE_COLOR);
cv::Mat yuvSource;
cvtColor(source, yuvSource, cv::COLOR_BGR2RGB); // rearranges B and R in the appropriate order
cvtColor(yuvSource, yuvSource, cv::COLOR_BGR2YUV);
// yuvSource will contain here correct image in YUV color space
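To check a converted pixel against the reference formula quoted above, independently of OpenCV, a small sketch like the following can be used. It works on normalised values in the range 0 to 1; Yuv and rgb_to_yuv are hypothetical names, and the code simply restates the Wikipedia formula from this answer.

```cpp
#include <cassert>
#include <cmath>

struct Yuv {
    double y, u, v;
};

// Reference RGB -> YUV conversion using the formula quoted above:
// Y = 0.299R + 0.587G + 0.114B, U = 0.492(B - Y), V = 0.877(R - Y).
inline Yuv rgb_to_yuv(double r, double g, double b) {
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);
    const double y = 0.299 * r + 0.587 * g + 0.114 * b;
    return Yuv{y, 0.492 * (b - y), 0.877 * (r - y)};
}
```

A pure red input is a convenient test case: with the correct formula V is clearly positive and U is negative, whereas swapping the red and blue channels flips that.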