So I have an X8R8G8B8-formatted IDirect3DSurface9 that contains the contents of the back buffer. When I call LockRect on it I get access to a struct containing pBits, which I assume is a pointer to the pixels, and an integer Pitch (whose purpose I am very unclear about).
How do I read the individual pixels?
Visual Studio 2008 C++
The locked area is described by a D3DLOCKED_RECT. I haven't ever used Pitch myself, but the documentation says it is the "Number of bytes in one row of the surface". People would normally call this the "stride" (some of these terms are explained on MSDN).
For example, if one pixel has 4 bytes (8 bits for each component of XRGB) and the texture width is 7, each row is usually stored as 8*4 bytes instead of 7*4 bytes, because memory can be accessed faster when every row starts on a DWORD-aligned address.
So, in order to read pixel [x, y] you would have to read
uint8_t *pixels = (uint8_t*)rect.pBits;
uint32_t *mypixel = (uint32_t*)&pixels[rect.Pitch*y + 4*x];
where 4 is the size of one pixel in bytes. *mypixel would then be the content of the pixel in my example.
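Putting it together, a minimal sketch (assuming surface is your IDirect3DSurface9 and x, y are valid coordinates; error handling is reduced to a bare SUCCEEDED check):

D3DLOCKED_RECT rect;
if (SUCCEEDED(surface->LockRect(&rect, NULL, D3DLOCK_READONLY)))
{
    const uint8_t *pixels = (const uint8_t*)rect.pBits;
    // 4 bytes per pixel for D3DFMT_X8R8G8B8
    uint32_t pixel = *(const uint32_t*)(pixels + rect.Pitch*y + 4*x);
    surface->UnlockRect();
}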
Yep, you would access the individual RGB components of the pixel like that.
The first byte of the pixel is not used, but it is more efficient to use 4 bytes per pixel so that each pixel is aligned on a 32-bit boundary (that's also why there's the pitch).
In your example, the X is not used, but note that there are also other pixel formats, for example ARGB, which stores the alpha value (transparency) in the first byte. Sometimes the colors are also reversed (BGR instead of RGB). If you're unsure which byte corresponds to which color, a good trick is to create a texture that is entirely red, green or blue and then check which of the 4 bytes has the value 255.
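For D3DFMT_X8R8G8B8 specifically, here is a little sketch of pulling the components out of the 32-bit value read above (on a little-endian machine the bytes in memory are B, G, R, X, so as a 32-bit value it reads 0x00RRGGBB):

uint32_t p = *mypixel;
uint8_t r = (p >> 16) & 0xFF; // red
uint8_t g = (p >> 8)  & 0xFF; // green
uint8_t b =  p        & 0xFF; // blue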
MSDN documentation seems to contradict itself:
Here it says:
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
While here it says:
The number of bytes in each scan line. This value must be divisible by 2, because the system assumes that the bit values of a bitmap form an array that is word aligned.
So sometimes MSDN wants a 4-byte aligned stride and sometimes it wants a 2-byte aligned stride. Which is right?
To be more specific, when saving a bitmap file should I use a 4-byte stride or a 2-byte stride?
The first quote is accurate. The second dates back to the 16-bit version of Windows and did not get edited as it should have been. Not entirely unusual; the GDI32 docs have had a fair number of mistakes.
Do note that the upvoted answer is not accurate. Monochrome bitmaps still have a stride that is a multiple of 4; there is no special rule that makes it 2. A bit of .NET code to demonstrate this:
var bmp = new Bitmap(1, 1, System.Drawing.Imaging.PixelFormat.Format1bppIndexed);
var bdata = bmp.LockBits(new Rectangle(0, 0, 1, 1), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
Console.WriteLine(bdata.Stride);
Output: 4
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
Bitmaps are not necessarily always uncompressed RGB; they might be monochrome. In the BITMAP structure, the bmBitsPixel member specifies the number of bits per pixel, so it is valid for it to be 1. So you should save RGB bitmaps with a byte stride that is a multiple of 4, and save monochrome bitmaps with a stride that is a multiple of 2.
CreateBitmap/CreateBitmapIndirect and the BITMAP struct are all pre-Windows 3.0 APIs that were meant for 16-bit processors. That's why they use this 16-bit-aligned stride.
All newer APIs use a 32-bit stride alignment (sizeof(DWORD)).
You can use "newer" APIs (post-Windows 3.0) like CreateDIBitmap or CreateCompatibleBitmap/SetDIBits if your buffer has 32-bit-aligned strides.
As for files: they use the BITMAPINFO/BITMAPINFOHEADER structures, which imply a 32-bit stride alignment.
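As a sketch, the DWORD-aligned stride these newer APIs and the file format expect can be computed like this (widthPixels and bitsPerPixel are placeholders for your image's values):

int Stride(int widthPixels, int bitsPerPixel)
{
    // round the row size up to the nearest multiple of 4 bytes (one DWORD)
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}

Stride(1, 1) returns 4, matching the .NET demonstration above.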
I have data for every pixel: one byte red, one byte green, one byte blue. I need to pack this into an 8-bit bitmap, so I have only one byte per pixel. How do I transform the RGB (three bytes) into one byte for the bitmap format?
(I am using C++ and cannot use any external libraries.)
I think you misunderstood how to form a bitmap structure. You do not need to pack (somehow) 3 bytes into one. That is not possible anyway, unless you throw away information (as special image formats like GL_R3_G3_B2 do).
The BMP file format wiki page shows the BMP format in detail: it is a header followed by data. Depending on what you set in the header, it is possible to form a BMP image containing RGB data where each component is one byte.
First you need to decide how many bits you want to allocate for each color:
3 bits per color will overflow a byte (9 bits);
2 bits per color will underflow it (only 6 of the 8 bits are used).
In a three-byte RGB bitmap you have one byte to represent each color's intensity, where 0 is the minimum and 255 the maximum intensity. When you convert it to a 1-byte bitmap (assuming you choose 2 bits per color), the transform for each channel should be:
1-byte color value / 64
i.e. you get only 4 shades out of a spectrum of 256 shades per color.
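A minimal sketch of that transform (the function name is mine; it packs the three 2-bit values into the low 6 bits of one byte, leaving the top 2 bits unused):

uint8_t PackRgbTo2Bits(uint8_t r, uint8_t g, uint8_t b)
{
    // each channel: 0..255 -> 0..3 (4 shades)
    return (uint8_t)(((r / 64) << 4) | ((g / 64) << 2) | (b / 64));
}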
First you have to produce a 256-color palette that best fits your source image.
Then you need to dither the image using the palette you've generated.
Both problems have many well-known solutions. However, it's impossible to produce a high-quality result completely automatically: different approaches work best for different source images. Photoshop, for example, exposes a UI for tuning the parameters of this process.
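As a rough sketch of only part of the second step (not the author's tool; palette generation and the error diffusion used for dithering are left out), mapping a pixel to its nearest entry in an already-built 256-color palette could look like:

struct Rgb { unsigned char r, g, b; };

unsigned char NearestPaletteIndex(Rgb px, const Rgb palette[256])
{
    int best = 0;
    int bestDist = INT_MAX; // from <climits>
    for (int i = 0; i < 256; ++i)
    {
        int dr = px.r - palette[i].r;
        int dg = px.g - palette[i].g;
        int db = px.b - palette[i].b;
        int d = dr*dr + dg*dg + db*db; // squared RGB distance
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return (unsigned char)best;
}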
I'm trying to work with this camera SDK, and let's say the camera has a function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, fills it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments), so I'm just guesstimating here. Here's a code snippet of what I think works:
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code with breakpoints in Visual Studio and can confirm that CameraGetImageData does indeed modify the array. Now my question is: is there a standard way for cameras to output data? How should I start using this data, and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million bytes, not 10 million, if you've got the memory, at least initially. A 10-megapixel camera using 24 bits per pixel will need 30 million bytes, bigger than your array. If it does something unusual like storing 16 bits per colour it could take up to 60 or 80 million bytes.
You could fill this big array with data before passing it, for example with '01234567' repeated. Then it's obvious which bytes have been written and which haven't, so you can work out the real size of what's returned.
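A sketch of that trick (CameraGetImageData is the SDK call from the question; the pattern and buffer size are arbitrary):

const size_t bufSize = 100000000; // 100 MB
BYTE* data = new BYTE[bufSize];
for (size_t i = 0; i < bufSize; ++i)
    data[i] = (BYTE)('0' + (i % 8)); // "01234567" repeated

CameraGetImageData(data);

size_t lastChanged = 0;
for (size_t i = 0; i < bufSize; ++i)
    if (data[i] != (BYTE)('0' + (i % 8)))
        lastChanged = i; // last byte that no longer matches the pattern
// lastChanged + 1 approximates the size of what the camera wrote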
I don't think there is a standard, but you can try to identify which values are what by putting solid-color images in front of the camera, so that all pixels are approximately the same color. Knowing what color should be stored in each pixel, you can work out how the color is represented in your array. I would go with black, white, red, green and blue images.
But also consider finding a better SDK that has documentation, because just allocating a big array is really bad design.
You should check the documentation of your camera SDK, since there's no "standard" or "common" way to output the data. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find libraries that handle the most common formats and pass them the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adheres to the IEEE 1394 (aka IIDC or DCAM) standard. I have personally worked with such a camera made by Hamamatsu using this library to interface with it.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth resolution of 12 bits. Therefore each pixel intensity was stored as a 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels and the factor 2 accounts for the 16 bits per pixel. The width and height were known a priori from the chosen camera mode.
If you have the dimensions of the result image, try to dump your byte array into a file, load the result in Python or Matlab and just try to visualize the content. Another possibility is to load the raw file with an image editor such as ImageJ and hope to get something out of it.
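A sketch of the dump step (the file name is arbitrary; this assumes a 16-bit monochrome image of known width and height as described above):

#include <fstream>

void DumpRaw(const unsigned char* data, int width, int height)
{
    std::ofstream out("frame.raw", std::ios::binary);
    // 2 bytes per pixel for 16-bit intensities
    out.write((const char*)data, (std::streamsize)width * height * 2);
}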
Good luck!
I hope this question's solution helps you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if your camera captures in 8-bit). What you need is just to determine the width and height. After that you can try to reconstruct a bitmap image from your byte array.
What are the disadvantages of always using an alignment of 1?
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
Will it impact performance on modern GPUs?
How can data not be 1-byte aligned?
This strongly suggests a lack of understanding of what the row alignment in pixel transfer operations means.
Image data that you pass to OpenGL is expected to be grouped into rows. Each row contains width pixels, with each pixel being the size defined by the format and type parameters. So a format of GL_RGB with a type of GL_UNSIGNED_BYTE will result in a pixel that is 24 bits in size. Pixels are otherwise expected to be packed, so a row of 16 of these pixels will take up 48 bytes.
Each row is expected to be aligned to a specific value, as defined by GL_PACK/UNPACK_ALIGNMENT. This means that the value you add to the pointer to get to the next row is align(pixel_size * width, GL_*_ALIGNMENT). If the pixel size is 3 bytes, the width is 2, and the alignment is 1, the row byte size is 6. If the alignment is 4, the row byte size is 8.
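That row-size rule, written out as a sketch:

size_t RowBytes(size_t pixelSize, size_t width, size_t alignment)
{
    // round pixelSize*width up to the next multiple of alignment (1, 2, 4 or 8)
    size_t unpadded = pixelSize * width;
    return ((unpadded + alignment - 1) / alignment) * alignment;
}

// RowBytes(3, 2, 1) == 6 and RowBytes(3, 2, 4) == 8, as in the example above.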
See the problem?
Image data, which may come from some image file format loaded with some image loader, has a row alignment. Sometimes this is 1-byte aligned, and sometimes it isn't. DDS images have an alignment specified as part of the format. In many cases, images have 4-byte row alignment; pixel sizes less than 32 bits will therefore have padding at the end of rows with certain widths. If the alignment you give OpenGL doesn't match the data's actual alignment, you get a malformed texture.
So you set the alignment to match the alignment of your image data. Unless you know, or can otherwise ensure, that your row alignment is always 1 (and that's unlikely unless you've written your own image format or loader), you need to set the row alignment to exactly what your image format uses.
Will it impact performance on modern GPUs?
No, because the pixel store settings are only relevant for the transfer of data to or from the GPU, namely the alignment of your data. Once in GPU memory it is aligned however the GPU and driver desire.
There will be no impact on performance. Setting a higher alignment (in OpenGL) doesn't improve or speed anything up.
All the alignment does is tell OpenGL where to expect the next row of pixels. You should always use an alignment of 1 if your image pixels are tightly packed, i.e. if there are no gaps between where one row of bytes ends and the next row starts.
The default alignment is 4 (i.e. OpenGL expects the next row of pixels to start at a memory offset divisible by 4), which may cause problems when you load R, RG or RGB textures whose components are not 4-byte floats, or whose width is not divisible by 4. If your image pixels are tightly packed you have to change the alignment to 1 for the unpacking to work, as sketched below.
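For example, a sketch of uploading tightly packed RGB bytes (pixels, width and height stand in for your data):

glPixelStorei(GL_UNPACK_ALIGNMENT, 1); // each row starts right after the previous one
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);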
You could (I personally haven't encountered them) have an image of, say, 3x3 RGB ubytes whose rows are 4-byte aligned, with 3 extra bytes used as padding at the end. Each row might look like this:
R - G - B - R - G - B - R - G - B - X - X - X (12 bytes in total)
The reason for this is that aligned data can improve the performance of the processor (I'm not sure how much that is true/justified on today's processors). If you have any control over how the original image is composed, then maybe aligning it one way or another will improve its handling. But this is done prior to OpenGL; OpenGL has no way of changing anything about it, it only cares about where to find the pixels.
So, back to the 3x3 image row above: setting the alignment to 4 would be good (and necessary) to jump over the padding at the end. If you set it to 1, it will mess up your result, so you need to keep/restore it to 4. (Note that you could also use GL_UNPACK_ROW_LENGTH to jump over it, as that is the parameter used when dealing with subsets of an image, in which case you sometimes have to jump much more than 3 or 7 bytes (which is the most the alignment parameter, at its maximum of 8, can give you). In our example, supplying a row length of 4 together with an alignment of 1 would also work.)
The same goes for packing. You can tell OpenGL to align the pixel rows to 1, 2, 4 or 8. If you're saving a 3x3 RGB ubyte image, you should set the alignment to 1. Technically, if you want the resulting rows to be tightly packed, you should always pass 1. If you want (for whatever reason) to create some padding, you can pass another value. Giving (in our example) a GL_PACK_ALIGNMENT of 4 would result in rows that look like the row above (with the 3 extra padding bytes at the end). Note that in that case your receiving object (OpenCV Mat, bitmap, etc.) must be able to accommodate that extra padding.
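And the packing side, sketched with glReadPixels (width and height again stand in for your values):

std::vector<unsigned char> out(width * height * 3); // std::vector from <vector>
glPixelStorei(GL_PACK_ALIGNMENT, 1); // write rows tightly packed, no padding
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, &out[0]);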
I'm trying to understand building a BMP from raw data in C++ and I have a few questions.
My BMP can be black and white, so I figured that I should use 1 in the bits-per-pixel field. However, in a lot of guides I see that the padding field adds bits to keep 32-bit alignment, which sounds as if my BMP will be the same file size as a 24-bit-per-pixel BMP.
Is this understanding correct, or is a 1-bit-per-pixel BMP in some way smaller than 24, 32, etc.?
Thanks
Monochrome bitmaps are aligned too, but they will not take as much space as 24/32-bpp ones.
A row of a 5-pixel-wide 24-bit bitmap will take 16 bytes: 5*3=15 for pixels, plus 1 byte of padding.
A row of a 5-pixel-wide 32-bit bitmap will take 20 bytes: 5*4=20 for pixels, no padding needed.
A row of a 5-pixel-wide monochrome bitmap will take 4 bytes: 1 byte for the pixels (it is not possible to use less than a byte, so a whole byte is taken even though 3 of its 8 bits are unused), plus 3 bytes of padding.
So a monochrome bitmap will of course be smaller than a 24-bit one.
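A quick check of those numbers, assuming the DWORD-aligned row size:

int RowSize(int widthPixels, int bitsPerPixel)
{
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}

// RowSize(5, 24) == 16, RowSize(5, 32) == 20, RowSize(5, 1) == 4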
The answer is already given above (bitmap rows are aligned/padded to a 32-bit boundary), but if you want more information you might want to read DIBs and Their Uses, the "DIB Header" section; it explains this in detail.
Every scanline is DWORD-aligned. The scanline is buffered to alignment; the buffering is not necessarily 0.
The scanlines are stored upside down, with the first scan (scan 0) in memory being the bottommost scan in the image. (See Figure 1.) This is another artifact of Presentation Manager compatibility. GDI automatically inverts the image during the Set and Get operations. Figure 1. (Embedded image showing memory and screen representations.)
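Since the scanlines are stored bottom-up, here is a sketch of addressing logical row y (0 = top) given the DWORD-aligned stride:

unsigned char* RowPointer(unsigned char* bits, int stride, int height, int y)
{
    // row 0 of the image is the last stored scanline
    return bits + (height - 1 - y) * stride;
}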