Blackmagic frame: convert from YUV to RGB for use in OpenCV (C++)

I have a problem converting an image captured from a camera in YUV format to RGB format.
The function used to do it is the following:
int uwidth = 1920;
int uheight= 1080;
int i = 0,j = 0, r = 0, g = 0, b = 0;
typedef unsigned char BYTE;
IplImage* m_RGB = cvCreateImage(cvSize(uwidth, uheight), IPL_DEPTH_8U, 3);
unsigned char* pData = (unsigned char *) frameBytes;
for(i = 0, j=0; i < uwidth * uheight*3 ; i+=6, j+=4)
{
unsigned char u = pData[j];
unsigned char y = pData[j+1];
unsigned char v = pData[j+2];
b = 1.0*y + 8 + 1.402*(v-128);
g = 1.0*y - 0.34413*(u-128) - 0.71414*(v-128);
r = 1.0*y + 1.772*(u-128);
if(r>255) r =255;
if(g>255) g =255;
if(b>255) b =255;
if(r<0) r =0;
if(g<0) g =0;
if(b<0) b =0;
m_RGB->imageData[i] = (BYTE)(r*220/256);
m_RGB->imageData[i+1] = (BYTE)(g*220/256);
m_RGB->imageData[i+2] =(BYTE)(b*220/256);
}
cvNamedWindow("ck", CV_WINDOW_AUTOSIZE);
cvShowImage( "ck", m_RGB );
cvReleaseImage(&m_RGB);
The problem is that the window on my screen shows not one but two images, and that the colors are right but the aspect ratio is not.
Does anyone have an idea about what causes these problems?
Edit: Image output

Let's assume imageData is defined as
BYTE* imageData;
In this case this loop tells a lot:
for(i = 0, j=0; i < uwidth * uheight*3 ; i+=6, j+=4)
i+=6 means that each time you set a pixel you skip the next one (whereas what you intended was to set 2 pixels at a time).
j+=4
unsigned char u = pData[j];
unsigned char y = pData[j+1];
unsigned char v = pData[j+2];
This means that the format of your camera is UYVY:
It describes two successive pixels, P0 and P1.
The chroma channels are shared by P0 and P1: U = U0 = U1 and V = V0 = V1.
The luma channel differs: the first Y is for P0, the second for P1.
You need to set 2 pixels per iteration:
m_RGB->imageData[i] = r1;
m_RGB->imageData[i+1] = g1;
m_RGB->imageData[i+2] =b1;
m_RGB->imageData[i+3] = r2;
m_RGB->imageData[i+4] = g2;
m_RGB->imageData[i+5] =b2;
The difference between r1 and r2 (and others) is that you use two different Y in the conversion formula.
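A minimal sketch of the two-pixels-per-iteration idea, under some assumptions that are not in the question: the buffer is tightly packed UYVY (U0 Y0 V0 Y1) with an even width and no row padding, the samples are full range, and the output is written in B, G, R order because that is how OpenCV displays 3-channel images. The names uyvyToBgr, yuvToBgr and clampToByte are made up for illustration; the coefficients are the ones from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value to the 0..255 range of an 8-bit channel.
std::uint8_t clampToByte(double v) {
    return static_cast<std::uint8_t>(std::min(255.0, std::max(0.0, v)));
}

// Convert one Y/U/V sample to B, G, R (assumed full-range coefficients).
void yuvToBgr(int y, int u, int v, std::uint8_t* bgr) {
    bgr[0] = clampToByte(y + 1.772 * (u - 128));                          // B
    bgr[1] = clampToByte(y - 0.34413 * (u - 128) - 0.71414 * (v - 128));  // G
    bgr[2] = clampToByte(y + 1.402 * (v - 128));                          // R
}

// Convert a packed UYVY buffer to packed BGR, two pixels per 4 input bytes.
std::vector<std::uint8_t> uyvyToBgr(const std::uint8_t* uyvy, int width, int height) {
    std::vector<std::uint8_t> bgr(static_cast<std::size_t>(width) * height * 3);
    const std::size_t pairs = static_cast<std::size_t>(width) * height / 2;
    for (std::size_t p = 0; p < pairs; ++p) {
        const std::uint8_t* in = uyvy + p * 4;   // U0 Y0 V0 Y1
        std::uint8_t* out = bgr.data() + p * 6;  // 6 output bytes per pair
        yuvToBgr(in[1], in[0], in[2], out);      // first pixel uses Y0
        yuvToBgr(in[3], in[0], in[2], out + 3);  // second pixel uses Y1
    }
    return bgr;
}
```

Both pixels of a pair share U and V and differ only in Y, which is why the input advances by 4 bytes while the output advances by 6.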

If you're programming for Mac, have a look at the recent additions to the vImage conversion library in OS X 10.9 and even 10.10. The stuff in there is truly mind-blowing. I was doing my own 24-bit to 32-bit conversions and had them honed to about 6 ms/frame (1920x1080). vImage promptly blew that out of the water by more than 15x.

Related

C++ FFTW forward backward DFT values get wrapped

Hello StackOverflow community,
I have a problem with the DFT algorithm of the FFTW library.
All I want to do is transform a certain pattern forward and backward to get the input pattern back; of course, there will be some sort of filtering between the transformations later on.
So, what my program does at the moment is:
Create a test signal
Filter or "window" the test signal with a value of 1.0 or 0.5
Copy the test signal to a fftw_complex data type
Perform a forward and backward dft
Calculate the magnitude, which is called phase here
Copy and adjust data for display purposes, and finally display the images via OpenCV
My problem is that when I use no filtering, my backward-transformed image is wrapped somehow and I can't calculate the correct magnitude, which should be identical to my input image / test signal.
When I set the filter/"window" to a value of 0.5, the backward transformation works fine, but my input image is just half as bright as it should be.
The following image illustrates my problem: (from top left to bottom right)
1. Input signal, 2. Real part of backward transformation, 3. Magnitude calculated from the backward-transformed data, 4. Input signal multiplied by 0.5, 5. Real part of backward transformation, 6. Magnitude calculated from the backward-transformed data.
http://imageshack.com/a/img538/5426/nbL9YZ.png
Does anybody have an idea why the DFT behaves that way? It's kind of strange...
My code looks like this at the moment:
/***** parameters **************************************************************************/
int imSize = 256;
int imN = imSize * imSize;
char* interferogram = new char[imN];
double* spectrumReal = new double[imN];
double* spectrumImaginary = new double[imN];
double* outputReal = new double[imN];
double* outputImaginary = new double[imN];
double* phase = new double[imN];
char* spectrumRealChar = new char[imN];
char* spectrumImaginaryChar = new char[imN];
char* outputRealChar = new char[imN];
char* outputImaginaryChar = new char[imN];
char* phaseChar = new char[imN];
Mat interferogramMat = Mat(imSize, imSize, CV_8U, interferogram);
Mat spectrumRealCharMat = Mat(imSize, imSize, CV_8U, spectrumRealChar);
Mat spectrumImaginaryCharMat = Mat(imSize, imSize, CV_8U, spectrumImaginaryChar);
Mat outputRealCharMat = Mat(imSize, imSize, CV_8U, outputRealChar);
Mat outputImaginaryCharMat = Mat(imSize, imSize, CV_8U, outputImaginaryChar);
Mat phaseCharMat = Mat(imSize, imSize, CV_8U, phaseChar);
/***** compute interferogram ****************************************************************/
fill_n(interferogram, imN, 0);
double value = 0;
double window = 0;
for (int y = 0; y < imSize; y++)
{
for (int x = 0; x < imSize; x++)
{
value = 127.5 + 127.5 * cos((2*PI) / 10000 * (pow(double(x - imSize/2), 2) + pow(double(y - imSize/2), 2)));
window = 1;
value *= window;
interferogram[y * imSize + x] = (unsigned char)value;
}
}
/***** create fftw arays and plans **********************************************************/
fftw_complex* input;
fftw_complex* spectrum;
fftw_complex* output;
fftw_plan p_fw;
fftw_plan p_bw;
input = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * imN);
spectrum = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * imN);
output = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * imN);
p_fw = fftw_plan_dft_2d(imSize, imSize, input, spectrum, FFTW_FORWARD, FFTW_ESTIMATE);
p_bw = fftw_plan_dft_2d(imSize, imSize, spectrum, output, FFTW_BACKWARD, FFTW_ESTIMATE);
/***** copy data ****************************************************************************/
for (int i = 0; i < imN; i++)
{
input[i][0] = double(interferogram[i]) / 255.;
input[i][1] = 0.;
spectrum[i][0] = 0.;
spectrum[i][1] = 0.;
output[i][0] = 0.;
output[i][1] = 0.;
}
/***** FPS algorithm ************************************************************************/
fftw_execute(p_fw);
fftw_execute(p_bw);
for (int i = 0; i < imN; i++)
{
phase[i] = sqrt(pow(output[i][0], 2) + pow(output[i][1], 2));
}
/***** copy data ****************************************************************************/
for (int i = 0; i < imN; i++)
{
spectrumReal[i] = spectrum[i][0];
spectrumImaginary[i] = spectrum[i][1];
outputReal[i] = output[i][0] / imN;
outputImaginary[i] = output[i][1];
}
SaveCharImage(interferogram, imN, "01_interferogram_512px_8bit.raw");
SaveDoubleImage(spectrumReal, imN, "02_spectrum_real_512px_64bit.raw");
SaveDoubleImage(spectrumImaginary, imN, "03_spectrum_imaginary_512px_64bit.raw");
SaveDoubleImage(outputReal, imN, "03_output_real_512px_64bit.raw");
DoubleToCharArray(spectrumReal, spectrumRealChar, imSize);
DoubleToCharArray(spectrumImaginary, spectrumImaginaryChar, imSize);
DoubleToCharArray(outputReal, outputRealChar, imSize);
DoubleToCharArray(outputImaginary, outputImaginaryChar, imSize);
DoubleToCharArray(phase, phaseChar, imSize);
/***** show images **************************************************************************/
imshow("interferogram", interferogramMat);
imshow("spectrum real", spectrumRealCharMat);
imshow("spectrum imaginary", spectrumImaginaryCharMat);
imshow("out real", outputRealCharMat);
imshow("out imaginary", outputImaginaryCharMat);
imshow("phase", phaseCharMat);
int key = waitKey(0);
Here are some lines of your code :
char* interferogram = new char[imN];
...
double value = 0;
double window = 0;
for (int y = 0; y < imSize; y++)
{
for (int x = 0; x < imSize; x++)
{
value = 127.5 + 127.5 * cos((2*PI) / 10000 * (pow(double(x - imSize/2), 2) + pow(double(y - imSize/2), 2)));
window = 1;
value *= window;
interferogram[y * imSize + x] = (unsigned char)value;
}
}
The problem is that a char ranges from -128 to 127 (where char is signed, as it is on most desktop compilers), while unsigned char ranges from 0 to 255. In interferogram[y * imSize + x] = (unsigned char)value;, the result is implicitly converted to char when it is stored.
It does not affect the output if window=0.5, but it triggers a change if window=1, since value then exceeds 127. This is exactly the problem that you noticed in your question!
It does not affect the first displayed image since CV_8U corresponds to unsigned char: interferogram is therefore reinterpreted as an unsigned char*. Take a look at "Can I turn unsigned char into char and vice versa?" to learn more about char to unsigned char conversions.
The problem occurs at input[i][0] = double(interferogram[i]) / 255.;: if window=1, interferogram[i] may be negative, and input[i][0] then becomes negative.
Change all char to unsigned char and it should solve the problem.
You may also change
outputReal[i] = output[i][0] / imN;
outputImaginary[i] = output[i][1];
for
outputReal[i] = output[i][0];
outputImaginary[i] = output[i][1];
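The calls to FFTW seem to be fine. Note that FFTW computes unnormalized transforms, so a forward transform followed by a backward one returns the input multiplied by imN.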
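The signed/unsigned issue described in this answer can be reproduced in isolation. The sketch below is only an illustration: the function names throughSignedChar and throughUnsignedChar are made up, and signed char is spelled out explicitly because the signedness of plain char is implementation-defined.

```cpp
// Store a sample through a signed 8-bit buffer, as the question's char array
// does on compilers where char is signed, then read it back as a double.
// Before C++20 the result of converting an out-of-range value to signed char
// is implementation-defined; mainstream compilers wrap it (200 becomes -56).
double throughSignedChar(double value) {
    signed char stored = static_cast<signed char>(static_cast<unsigned char>(value));
    return static_cast<double>(stored) / 255.0;
}

// The same round trip through an unsigned 8-bit buffer keeps values above 127.
double throughUnsignedChar(double value) {
    unsigned char stored = static_cast<unsigned char>(value);
    return static_cast<double>(stored) / 255.0;
}
```

Samples up to 127 survive both paths unchanged, which matches the observation that window=0.5 hides the problem, while anything above 127 comes back negative through the signed buffer.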

AccessViolationException using BitmapData in C++

Below is my program. I am trying to apply a grayscale filter using the BitmapData class in Visual C++. I am getting an AccessViolationException on the line tagged by the comment. I have tried compiling with /clr:safe and /clr:pure, but to no avail. In C# this would be solved by using an unsafe block. Any suggestions? None of the solutions from related questions worked.
Bitmap^ bmp = gcnew Bitmap(pictureBox1->Image);
BitmapData^ data = bmp->LockBits(Rectangle(0,0,bmp->Width,bmp->Height), ImageLockMode::ReadWrite, PixelFormat::Format24bppRgb);
int blue=0, green=0, red=0;
System::IntPtr s = data->Scan0;
int* P = (int*)(void*)s;
for (int i =0; i<bmp->Height;i++)
{
for (int j = 0; j < bmp->Width*3; j++)
{
blue = (int)P[0]; //access violation exception
green =(int )P[1];
red = (int)P[2];
int avg = (int)((blue + green + red) / 3);
P[0] = avg;
P[1] = avg;
P[2] = avg;
P +=3;
}
}
bmp->UnlockBits(data);
pictureBox1->Image = bmp;
You are using an int* when you should be using a Byte*. Your pixels are three bytes each, one byte per channel. Your int is (likely) 4 bytes, so P[0] returns an entire pixel plus one byte past it. This is why you get an access violation: you are overrunning the bounds of the image buffer.
When you increment a pointer, you are adding sizeof *P bytes to it for each step. In this case, P += 3 advances the pointer P by 12 bytes. That is much too far, and you will never be able to read a single pixel (or channel) of a 24bpp image with an int*. You are also assuming that your stride is Width * 3, which may or may not be correct (bitmap rows are padded to a multiple of 4 bytes).
Byte* base = (Byte*)data->Scan0.ToPointer(); // Scan0 is an IntPtr; ToPointer() yields a void*
int stride = data->Stride;
for(int y = 0; y < data->Height; ++y) {
Byte* src = base + y * stride;
for(int x = 0; x < data->Width; ++x, src += 3) {
// bitmaps are stored in BGR order (though not really important here).
// I'm assuming a 24bpp bitmap.
Byte b = src[0];
Byte g = src[1];
Byte r = src[2];
int average = (r + g + b) / 3;
src[0] = src[1] = src[2] = (Byte)average;
}
}
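To make the stride remark concrete, here is a small native C++ sketch of the same loop that does not depend on System::Drawing. It assumes a 24bpp buffer whose rows are padded to a multiple of 4 bytes; paddedStride24 and grayscale24 are hypothetical helper names, not part of any library. With a real BitmapData you would pass data->Stride rather than computing the stride yourself.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes per row of a 24bpp bitmap when each row is padded to a multiple of 4.
int paddedStride24(int width) {
    return (width * 3 + 3) / 4 * 4;
}

// Replace every pixel by the average of its three channels, in place.
// Rows are addressed through the stride, so padding bytes are never touched.
void grayscale24(std::uint8_t* base, int width, int height, int stride) {
    for (int y = 0; y < height; ++y) {
        std::uint8_t* px = base + static_cast<std::ptrdiff_t>(y) * stride;
        for (int x = 0; x < width; ++x, px += 3) {  // a byte pointer steps 3 bytes per pixel
            const int average = (px[0] + px[1] + px[2]) / 3;
            px[0] = px[1] = px[2] = static_cast<std::uint8_t>(average);
        }
    }
}
```

For a width of 5 pixels the row holds 15 bytes of pixel data but the stride is 16, which is exactly the case where assuming Width * 3 would drift off the row start.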

Implement a near real-time CPU capability like glAlphaFunc(GL_GREATER) with RGB source and RGBA overlay

Latency is the biggest concern here. I have found that trying to render three 1920x1080 video feeds with RGBA overlays to individual windows via OpenGL has limits. I am able to render two windows with overlays or three windows without overlays just fine, but when the third window is introduced, rendering stalls are obvious. I believe that the issue is due to the overuse of glAlphaFunc() to overlay an RGBA-based texture on an RGB video texture. In order to reduce the overuse, my thought is to move some of the overlay function onto the CPU (as I have lots of CPU: dual hex-core Xeon). The ideal place to do this would be when copying the source RGB image to the mapped PBO, replacing the RGB values with the ones from the RGBA overlay where A > 0.
I have tried using Intel IPP methods, but every approach I found involves multiple calls and results in too much latency. I've tried straight C code, but this takes longer than the 33 ms that I am allowed. I need help with creating an optimized assembly or SSE-based routine that will provide minimal latency.
Compile the below code with > g++ -fopenmp -O2 -mtune=native
Basic C function for clarity:
void copyAndOverlay(const uint8_t* aSourceRGB, const uint8_t* aOverlayRGBA, uint8_t* aDestinationRGB, int aWidth, int aHeight) {
int i;
#pragma omp parallel for
for (i=0; i<aWidth*aHeight; ++i) {
if (0 == aOverlayRGBA[i*4+3]) {
aDestinationRGB[i*3] = aSourceRGB[i*3]; // R
aDestinationRGB[i*3+1] = aSourceRGB[i*3+1]; // G
aDestinationRGB[i*3+2] = aSourceRGB[i*3+2]; // B
} else {
aDestinationRGB[i*3] = aOverlayRGBA[i*4]; // R
aDestinationRGB[i*3+1] = aOverlayRGBA[i*4+1]; // G
aDestinationRGB[i*3+2] = aOverlayRGBA[i*4+2]; // B
}
}
}
uint64_t getTime() {
struct timeval tNow;
gettimeofday(&tNow, NULL);
return (uint64_t)tNow.tv_sec * 1000000 + (uint64_t)tNow.tv_usec;
}
int main(int argc, char **argv) {
int pixels = _WIDTH_ * _HEIGHT_ * 3;
uint8_t *rgba = new uint8_t[_WIDTH_ * _HEIGHT_ * 4];
uint8_t *src = new uint8_t[pixels];
uint8_t *dst = new uint8_t[pixels];
uint64_t tStart = getTime();
for (int t=0; t<1000; ++t) {
copyAndOverlay(src, rgba, dst, _WIDTH_, _HEIGHT_);
}
printf("delta: %lu\n", (getTime() - tStart) / 1000);
delete [] rgba;
delete [] src;
delete [] dst;
return 0;
}
Here is an SSE4 implementation that is a little more than 5 times faster than the code you posted with the question (without parallelization of the loop). As written, it only works on RGBA buffers that are 16-byte aligned and sized in multiples of 64 bytes, and on RGB buffers that are 16-byte aligned and sized in multiples of 48 bytes. The size requirements fit your 1920x1080 resolution exactly, but you may need to add code to ensure your buffers are 16-byte aligned. The intrinsics come from <smmintrin.h>, and GCC needs -msse4.1 (or -march=native on a CPU that supports it) to compile them.
void copyAndOverlay(const uint8_t* aSourceRGB, const uint8_t* aOverlayRGBA, uint8_t* aDestinationRGB, int aWidth, int aHeight) {
__m128i const ocmp = _mm_setzero_si128();
__m128i const omskshf1 = _mm_set_epi32(0x00000000, 0x0F0F0F0B, 0x0B0B0707, 0x07030303);
__m128i const omskshf2 = _mm_set_epi32(0x07030303, 0x00000000, 0x0F0F0F0B, 0x0B0B0707);
__m128i const omskshf3 = _mm_set_epi32(0x0B0B0707, 0x07030303, 0x00000000, 0x0F0F0F0B);
__m128i const omskshf4 = _mm_set_epi32(0x0F0F0F0B, 0x0B0B0707, 0x07030303, 0x00000000);
__m128i const ovalshf1 = _mm_set_epi32(0x00000000, 0x0E0D0C0A, 0x09080605, 0x04020100);
__m128i const ovalshf2 = _mm_set_epi32(0x04020100, 0x00000000, 0x0E0D0C0A, 0x09080605);
__m128i const ovalshf3 = _mm_set_epi32(0x09080605, 0x04020100, 0x00000000, 0x0E0D0C0A);
__m128i const ovalshf4 = _mm_set_epi32(0x0E0D0C0A, 0x09080605, 0x04020100, 0x00000000);
__m128i const blndmsk1 = _mm_set_epi32(0xFFFFFFFF, 0x00000000, 0x00000000, 0x00000000);
__m128i const blndmsk2 = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0x00000000, 0x00000000);
__m128i const blndmsk3 = _mm_set_epi32(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x00000000);
__m128i a, b, c, x, y, z, w, p, q, r, s;
uint8_t const *const aSourceRGBPast = aSourceRGB + 3 * aWidth * aHeight;
while (aSourceRGB != aSourceRGBPast) {
// source:
// aaabbbcccdddeeef
// ffggghhhiiijjjkk
// klllmmmnnnoooppp
//
// overlay:
// aaaabbbbccccdddd
// eeeeffffgggghhhh
// iiiijjjjkkkkllll
// mmmmnnnnoooopppp
// load source
a = _mm_load_si128((__m128i const*)(aSourceRGB ));
b = _mm_load_si128((__m128i const*)(aSourceRGB + 16));
c = _mm_load_si128((__m128i const*)(aSourceRGB + 32));
// load overlay
x = _mm_load_si128((__m128i const*)(aOverlayRGBA ));
y = _mm_load_si128((__m128i const*)(aOverlayRGBA + 16));
z = _mm_load_si128((__m128i const*)(aOverlayRGBA + 32));
w = _mm_load_si128((__m128i const*)(aOverlayRGBA + 48));
// compute blend mask, put 0xFF in bytes equal to zero
p = _mm_cmpeq_epi8(x, ocmp);
q = _mm_cmpeq_epi8(y, ocmp);
r = _mm_cmpeq_epi8(z, ocmp);
s = _mm_cmpeq_epi8(w, ocmp);
// align overlay to be condensed to 3-byte color
x = _mm_shuffle_epi8(x, ovalshf1);
y = _mm_shuffle_epi8(y, ovalshf2);
z = _mm_shuffle_epi8(z, ovalshf3);
w = _mm_shuffle_epi8(w, ovalshf4);
// condense overlay to 3-byte color
x = _mm_blendv_epi8(x, y, blndmsk1);
y = _mm_blendv_epi8(y, z, blndmsk2);
z = _mm_blendv_epi8(z, w, blndmsk3);
// align blend mask to be condensed to 3-byte color
p = _mm_shuffle_epi8(p, omskshf1);
q = _mm_shuffle_epi8(q, omskshf2);
r = _mm_shuffle_epi8(r, omskshf3);
s = _mm_shuffle_epi8(s, omskshf4);
// condense blend mask to 3-byte color
p = _mm_blendv_epi8(p, q, blndmsk1);
q = _mm_blendv_epi8(q, r, blndmsk2);
r = _mm_blendv_epi8(r, s, blndmsk3);
// select from overlay and source based on blend mask
x = _mm_blendv_epi8(x, a, p);
y = _mm_blendv_epi8(y, b, q);
z = _mm_blendv_epi8(z, c, r);
// write colors to destination
_mm_store_si128((__m128i*)(aDestinationRGB ), x);
_mm_store_si128((__m128i*)(aDestinationRGB + 16), y);
_mm_store_si128((__m128i*)(aDestinationRGB + 32), z);
// update pointers
aSourceRGB += 48;
aOverlayRGBA += 64;
aDestinationRGB += 48;
}
}
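One possible way to satisfy the 16-byte alignment requirement mentioned above is to allocate the buffers with std::aligned_alloc. This is only a sketch: it assumes a C++17 standard library that provides std::aligned_alloc (MSVC does not; _aligned_malloc would be the counterpart there), and AlignedBuffer, allocateAligned16 and isAligned16 are names invented for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Deleter that releases memory obtained from std::aligned_alloc.
struct FreeDeleter {
    void operator()(std::uint8_t* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<std::uint8_t[], FreeDeleter>;

// Allocate at least `bytes` bytes starting on a 16-byte boundary.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so the request is rounded up to the next multiple of 16.
AlignedBuffer allocateAligned16(std::size_t bytes) {
    const std::size_t rounded = (bytes + 15) / 16 * 16;
    return AlignedBuffer(static_cast<std::uint8_t*>(std::aligned_alloc(16, rounded)));
}

// True when the pointer is suitable for _mm_load_si128 / _mm_store_si128.
bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}
```

The pointers returned by get() on such buffers could then be passed as the source, overlay and destination arguments of copyAndOverlay; a debug check with isAligned16 before the loop would catch a misaligned buffer early.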

Issue with writing YUV image frame in C/C++

I am trying to convert an RGB frame, which is taken from OpenGL glReadPixels(), to a YUV frame, and write the YUV frame to a file (.yuv). Later on I would like to write it to a named_pipe as an input for FFMPEG, but as for now I just want to write it to a file and view the image result using a YUV Image Viewer. So just disregard the "writing to pipe" for now.
After running my code, I encountered the following errors:
The number of frames shown in the YUV Image Viewer software is always 1/3 of the number of frames I declared in my program. When I declare fps as 10, I can only view 3 frames. When I declare fps as 30, I can only view 10 frames. However, when I view the file in a text editor, I can see that the word "FRAME" appears the correct number of times in the file.
This is the example output that I got: http://www.bobdanani.net/image.yuv
I could not see the correct image, but just some distorted green, blue, yellow, and black pixels.
I read about YUV format from http://wiki.multimedia.cx/index.php?title=YUV4MPEG2 and http://www.fourcc.org/fccyvrgb.php#mikes_answer and http://kylecordes.com/2007/pipe-ffmpeg
Here is what I have tried so far. I know that this conversion approach is quite inefficient, and I can optimize it later. For now I just want to get this naive approach to work and have the image shown properly.
int frameCounter = 1;
int windowWidth = 0, windowHeight = 0;
unsigned char *yuvBuffer;
unsigned long bufferLength = 0;
unsigned long frameLength = 0;
int fps = 10;
void display(void) {
/* clear the color buffers */
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
/* DRAW some OPENGL animation, i.e. cube, sphere, etc
.......
.......
*/
glutSwapBuffers();
if ((frameCounter % fps) == 1){
bufferLength = 0;
windowWidth = glutGet(GLUT_WINDOW_WIDTH);
windowHeight = glutGet (GLUT_WINDOW_HEIGHT);
frameLength = (long) (windowWidth * windowHeight * 1.5 * fps) + 100; // YUV 420 length (width*height*1.5) + header length
yuvBuffer = new unsigned char[frameLength];
write_yuv_frame_header();
}
write_yuv_frame();
frameCounter = (frameCounter % fps) + 1;
if ( (frameCounter % fps) == 1){
snprintf(filename, 100, "out/image-%d.yuv", seq_num);
ofstream out(filename, ios::out | ios::binary);
if(!out) {
cout << "Cannot open file.\n";
}
out.write (reinterpret_cast<char*> (yuvBuffer), bufferLength);
out.close();
bufferLength = 0;
delete[] yuvBuffer;
}
}
void write_yuv_frame_header (){
char *yuvHeader = new char[100];
sprintf (yuvHeader, "YUV4MPEG2 W%d H%d F%d:1 Ip A0:0 C420mpeg2 XYSCSS=420MPEG2\n", windowWidth, windowHeight, fps);
memcpy ((char*)yuvBuffer + bufferLength, yuvHeader, strlen(yuvHeader));
bufferLength += strlen (yuvHeader);
delete (yuvHeader);
}
void write_yuv_frame() {
int width = glutGet(GLUT_WINDOW_WIDTH);
int height = glutGet(GLUT_WINDOW_HEIGHT);
memcpy ((void*) (yuvBuffer+bufferLength), (void*) "FRAME\n", 6);
bufferLength +=6;
long length = windowWidth * windowHeight;
long yuv420FrameLength = (float)length * 1.5;
long lengthRGB = length * 3;
unsigned char *rgb = (unsigned char *) malloc(lengthRGB * sizeof(unsigned char));
unsigned char *yuvdest = (unsigned char *) malloc(yuv420FrameLength * sizeof(unsigned char));
glReadPixels(0, 0, windowWidth, windowHeight, GL_RGB, GL_UNSIGNED_BYTE, rgb);
int r, g, b, y, u, v, ypos, upos, vpos;
for (int j = 0; j < windowHeight; ++j){
for (int i = 0; i < windowWidth; ++i){
r = (int)rgb[(j * windowWidth + i) * 3 + 0];
g = (int)rgb[(j * windowWidth + i) * 3 + 1];
b = (int)rgb[(j * windowWidth + i) * 3 + 2];
y = (int)(r * 0.257 + g * 0.504 + b * 0.098) + 16;
u = (int)(r * 0.439 + g * -0.368 + b * -0.071) + 128;
v = (int)(r * -0.148 + g * -0.291 + b * 0.439 + 128);
ypos = j * windowWidth + i;
upos = (j/2) * (windowWidth/2) + i/2 + length;
vpos = (j/2) * (windowWidth/2) + i/2 + length + length/4;
yuvdest[ypos] = y;
yuvdest[upos] = u;
yuvdest[vpos] = v;
}
}
memcpy ((void*) (yuvBuffer + bufferLength), (void*)yuvdest, yuv420FrameLength);
bufferLength += yuv420FrameLength;
free (yuvdest);
free (rgb);
}
This is just the very basic approach, and I can optimize the conversion algorithm later.
Can anyone tell me what is wrong with my approach? My guess is that one of the issues is with the outstream.write() call, because I cast the unsigned char* data to char*, which may lose precision. But if I don't cast it to char* I get a compile error. However, this doesn't explain why the output frames are corrupted (and only account for 1/3 of the total number of frames).
It looks to me like you have too many bytes per frame for 4:2:0 data. According to the spec you linked to, the number of bytes for a 200x200 pixel 4:2:0 frame should be 200 * 200 * 3 / 2 = 60,000, but you have ~90,000 bytes. Looking at your code, I don't see where you convert from 4:4:4 to 4:2:0. So you have two choices: either set the header to 4:4:4, or convert the YCbCr data to 4:2:0 before writing it out.
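The frame-size arithmetic above can be written down as a small helper. This is a sketch for planar 4:2:0 data (a full-resolution Y plane followed by quarter-resolution U and V planes), assuming even width and height; Yuv420Layout and yuv420Layout are hypothetical names used only here.

```cpp
#include <cstddef>

// Sizes and offsets (in bytes) of the three planes of one planar 4:2:0 frame.
struct Yuv420Layout {
    std::size_t ySize;       // width * height luma samples
    std::size_t chromaSize;  // (width / 2) * (height / 2) samples per chroma plane
    std::size_t uOffset;     // start of the U plane within the frame
    std::size_t vOffset;     // start of the V plane within the frame
    std::size_t frameSize;   // total bytes per frame, excluding any "FRAME\n" marker
};

// Compute the layout for a frame with even dimensions.
Yuv420Layout yuv420Layout(int width, int height) {
    Yuv420Layout l;
    l.ySize = static_cast<std::size_t>(width) * height;
    l.chromaSize = static_cast<std::size_t>(width / 2) * (height / 2);
    l.uOffset = l.ySize;
    l.vOffset = l.ySize + l.chromaSize;
    l.frameSize = l.ySize + 2 * l.chromaSize;
    return l;
}
```

For a 200x200 frame this gives 40,000 luma bytes plus two chroma planes of 10,000 bytes each, that is 60,000 bytes in total, so comparing the bytes written per frame against frameSize is a quick sanity check.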
I compiled your code, and there is definitely a problem in how the upos and vpos values are computed.
For me this worked (RGB to YUV NV12):
vpos = length + (windowWidth * (j/2)) + (i/2)*2;
upos = vpos + 1;

Locking a GDI+ Bitmap in Native C++?

I can find many examples of how to do this in managed C++ but none for unmanaged.
I want to get all the pixel data as efficiently as possible, but I need more information about Scan0 so that I can properly iterate through the pixel data and get each RGBA value from it.
Right now I have this:
Bitmap *b = new Bitmap(filename);
if(b == NULL)
{
return 0;
}
UINT w,h;
w = b->GetWidth();
h = b->GetHeight();
Rect *r = new Rect(0,0,w,h);
BitmapData *lockdat;
b->LockBits(r,ImageLockModeRead,PixelFormatDontCare,lockdat);
delete(r);
if(w == 0 && h == 0)
{
return 0;
}
Color c;
std::vector<GLubyte> pdata(w * h * 4,0.0);
for (unsigned int i = 0; i < h; i++) {
for (unsigned int j = 0; j < w; j++) {
b->GetPixel(j,i,&c);
pdata[i * 4 * w + j * 4 + 0] = (GLubyte) c.GetR();
pdata[i * 4 * w + j * 4 + 1] = (GLubyte) c.GetG();
pdata[i * 4 * w + j * 4 + 2] = (GLubyte) c.GetB();
pdata[i * 4 * w + j * 4 + 3] = (GLubyte) c.GetA();
}
}
delete(b);
return CreateTexture(pdata,w,h);
How do I use lockdat to do the equivalent of getpixel?
Thanks
lockdat->Scan0 is a pointer to the pixel data of the bitmap. Note that you really do care which pixel format you ask for; PixelFormatDontCare won't do, because how you use the pointer depends on the pixel format. PixelFormat32bppARGB is the easiest: each pixel is the size of an int, 4 bytes representing alpha, red, green and blue. The stride then equals the width of the bitmap in pixels times 4 bytes, with no row padding, which makes it likely that a simple memcpy() will get the job done. Beware that bitmaps may be stored upside-down, in which case the stride is negative.
Bitmap *m_image = new Bitmap(...); // a 24-bit RGB bitmap
BitmapData bmData;
Rect rect(0, 0, m_image->GetWidth(), m_image->GetHeight());
m_image->LockBits(&rect , ImageLockModeRead , PixelFormat24bppRGB,&bmData );
memcpy(your_bytes_buffer, bmData.Scan0, min(bmData.Height * bmData.Stride, your_buffer_size));
m_image->UnlockBits(&bmData);
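As an equivalent of the GetPixel loop in the question, the locked data can be walked row by row through the stride. The sketch below is plain C++ and does not call GDI+ itself: it assumes the bits were locked as PixelFormat32bppARGB, whose bytes sit in memory in B, G, R, A order, and lockedArgbToRgba is a hypothetical helper name. In real code scan0 and stride would come from bmData.Scan0 and bmData.Stride.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy locked 32bpp ARGB pixels (B, G, R, A in memory) into a tightly packed
// RGBA buffer. The stride is signed: a negative value walks a bottom-up bitmap.
std::vector<std::uint8_t> lockedArgbToRgba(const std::uint8_t* scan0,
                                           int width, int height, int stride) {
    std::vector<std::uint8_t> rgba(static_cast<std::size_t>(width) * height * 4);
    for (int y = 0; y < height; ++y) {
        const std::uint8_t* src = scan0 + static_cast<std::ptrdiff_t>(y) * stride;
        std::uint8_t* dst = rgba.data() + static_cast<std::size_t>(y) * width * 4;
        for (int x = 0; x < width; ++x, src += 4, dst += 4) {
            dst[0] = src[2];  // R
            dst[1] = src[1];  // G
            dst[2] = src[0];  // B
            dst[3] = src[3];  // A
        }
    }
    return rgba;
}
```

The returned vector has the same layout as pdata in the question, so it could be handed to CreateTexture directly, provided UnlockBits is called once the copy is done.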