I'm trying to understand how to build a BMP from raw data in C++, and I have a few questions.
My BMP can be black and white, so I figured that in the bits-per-pixel field I should go with 1. However, in a lot of guides I see that padding adds bits to each row to keep 32-bit alignment, which would mean my BMP ends up the same file size as a 24-bit-per-pixel BMP.
Is this understanding correct, or is a 1-bit-per-pixel BMP in some way smaller than 24-bit, 32-bit, etc.?
Thanks
Monochrome bitmaps are aligned too, but they will not take as much space as 24/32-bpp ones.
A row of a 5-pixel-wide 24-bit bitmap takes 16 bytes: 5*3 = 15 for pixels, plus 1 byte of padding.
A row of a 5-pixel-wide 32-bit bitmap takes 20 bytes: 5*4 = 20 for pixels, no padding needed.
A row of a 5-pixel-wide monochrome bitmap takes 4 bytes: 1 byte for pixels (it is not possible to use less than a byte, so a whole byte is taken even though 3 of its 8 bits are unused), plus 3 bytes of padding.
So a monochrome bitmap will of course be smaller than a 24-bit one.
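For a quick check, here is a small C++ sketch (the helper name is mine, not from any BMP API) that computes the padded row size and reproduces the numbers above:

#include <cstdint>
#include <iostream>

// Bytes per row, rounded up to a 4-byte (DWORD) boundary.
std::uint32_t rowSize(std::uint32_t widthPx, std::uint32_t bitsPerPixel) {
    return ((widthPx * bitsPerPixel + 31) / 32) * 4;
}

int main() {
    std::cout << rowSize(5, 24) << "\n";  // 16
    std::cout << rowSize(5, 32) << "\n";  // 20
    std::cout << rowSize(5, 1)  << "\n";  // 4
}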
The answer is already given above (bitmap rows are aligned/padded to a 32-bit boundary), but if you want more information you might want to read DIBs and Their Uses, the "DIB Header" section; it explains this in detail.
Every scanline is DWORD-aligned. The scanline is buffered to alignment; the buffering is not necessarily 0.
The scanlines are stored upside down, with the first scan (scan 0) in memory being the bottommost scan in the image; see Figure 1 (an embedded image showing the memory and screen representations). This is another artifact of Presentation Manager compatibility. GDI automatically inverts the image during the Set and Get operations.
MSDN documentation seems to contradict itself:
Here it says:
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
While here it says:
The number of bytes in each scan line. This value must be divisible by 2, because the system assumes that the bit values of a bitmap form an array that is word aligned.
So sometimes MSDN wants a 4-byte aligned stride and sometimes it wants a 2-byte aligned stride. Which is right?
To be more specific, when saving a bitmap file should I use a 4-byte stride or a 2-byte stride?
The first quote is accurate. The second dates back to the 16-bit versions of Windows and did not get edited as it should have been. Not entirely unusual; the GDI32 docs have had a fair number of mistakes.
Do note that the upvoted answer is not accurate. Monochrome bitmaps still have a stride that is a multiple of 4; there is no special rule that makes it 2. A bit of .NET code to demonstrate this:
var bmp = new Bitmap(1, 1, System.Drawing.Imaging.PixelFormat.Format1bppIndexed);
var bdata = bmp.LockBits(new Rectangle(0, 0, 1, 1), System.Drawing.Imaging.ImageLockMode.ReadWrite, bmp.PixelFormat);
Console.WriteLine(bdata.Stride);
Output: 4
For uncompressed RGB formats, the minimum stride is always the image width in bytes, rounded up to the nearest DWORD.
Bitmaps are not necessarily always uncompressed RGB; they might be monochrome. In the BITMAP structure, the bmBitsPixel member specifies the number of bits per pixel, so it is valid for it to be 1. So you should save RGB bitmaps with a byte stride that is a multiple of 4, and save monochrome bitmaps with a stride that is a multiple of 2.
CreateBitmap/CreateBitmapIndirect and the BITMAP struct are all pre-Windows 3.0 APIs that were meant to be used on 16-bit processors. That's why they use the 16-bit aligned stride.
All newer APIs use 32-bit stride alignment (sizeof(DWORD)).
You can use "newer" APIs (post-Windows 3.0) like CreateDIBitmap or CreateCompatibleBitmap/SetDIBits if your buffer has 32-bit-aligned strides.
As for files: they use the BITMAPINFO/BITMAPINFOHEADER structures, which imply 32-bit stride alignment.
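For illustration, a small C++ sketch of the two stride rules (the helper names are mine, not from any Windows API):

#include <cstdint>

// WORD (2-byte) aligned stride, as the old CreateBitmap-era APIs expect.
std::uint32_t strideWordAligned(std::uint32_t widthPx, std::uint32_t bpp) {
    return ((widthPx * bpp + 15) / 16) * 2;
}

// DWORD (4-byte) aligned stride, as DIBs and .bmp files expect.
std::uint32_t strideDwordAligned(std::uint32_t widthPx, std::uint32_t bpp) {
    return ((widthPx * bpp + 31) / 32) * 4;
}

For a 1x1 monochrome bitmap, strideDwordAligned(1, 1) returns 4, matching the .NET output shown above.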
I have a 2D array of float values:
float values[1024][1024];
that I want to store as an image.
The values are in the range: [-range,+range].
I want to use a colortable that goes from red(-range) to white(0) to black(+range).
So far I have been storing each pixel as 32-bit RGBA using the BMP file format. The total memory for storing my array is then 1024*1024*4 bytes = 4 MB.
This seems very wasteful knowing that my colortable is "1-dimensional" whereas the 32-bit RGBA is "4-dimensional".
To see what I mean, let's assume that my colortable went from black(-range) to blue(+range).
In this case the only component that varies is clearly the B; all the others are fixed.
So I am only getting 8 bits of precision whereas I am "paying" for 32 :-(.
I am therefore looking for a "palette" based file format.
Ideally I would like each pixel to be a 16 bit index (unsigned short int) into a "palette" consisting of 2^16 RGBA values.
The total memory used for storing my array in this case would be: 1024*1024*2 bytes + 2^16*4bytes = 2.25 MB.
So I would get twice as good precision for almost half the "price"!
Which image formats support this?
At the moment I am using Qt's QImage to write the array to file as an image. QImage has an internal 8 bit indexed ("palette") format. I would like a 16 bit one. Also I did not understand from Qt's documentation which file formats support the 8 bit indexed internal format.
Store it as a 16-bit greyscale PNG and apply the colour table yourself.
You don't say why your image can be decomposed into 2^16 colours, but using your knowledge of this special image you could design the mapping so that indices that are near each other have similar colours, which also makes the image easier to compress.
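For instance, assuming Qt 5.13 or newer (which added QImage::Format_Grayscale16), a sketch of the greyscale-PNG approach could look like this; the function name and the linear mapping are mine:

#include <QImage>

// Map values[y][x] in [-range, +range] linearly to 16-bit grey and save as
// PNG; the red-white-black colour table is applied later, at display time.
void saveAsGrey16(const float values[1024][1024], float range) {
    QImage img(1024, 1024, QImage::Format_Grayscale16);
    for (int y = 0; y < 1024; ++y) {
        quint16 *row = reinterpret_cast<quint16 *>(img.scanLine(y));
        for (int x = 0; x < 1024; ++x) {
            float t = (values[y][x] + range) / (2.0f * range);  // 0..1
            row[x] = static_cast<quint16>(t * 65535.0f + 0.5f);
        }
    }
    img.save("values.png");  // PNG supports 16-bit greyscale
}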
"I want to use a colortable that goes from red(-range) to white(0) to black(+range)."
Okay, so you've got FF,00,00 (red) to FF,FF,FF (white) to 00,00,00 (black). In 24-bit RGB, that looks to me like 256 values from red to white and then another 256 from white to black. So you don't need a palette size of 2^16 (65536); you need 2^9 (512).
If you're willing to compromise and use a palette size of 2^8 then the GIF format could work. That's still relatively fine resolution: 128 shades of red on the negative side, plus 128 shades of grey on the positive. Each of a GIF's 256 palette entries can be an RGB value.
PNG is another candidate for palette-based color. You have more flexibility with PNG, including RGBA if you need an alpha channel.
You mentioned RGBA in your question but the use of the alpha channel was not explained.
So, independent of file format, if you can use a 256-entry palette then you will have a very well compressed image. Back to your mapping requirement (i.e. mapping floats [-range -> 0.0 -> +range] to [red -> white -> black]), here is a 256-entry palette that covers the red-white-black range you wanted:
float    entry# (dec/hex)   color   RGB
------   ----------------   -----   --------
-range        0 / 00        red     FF,00,00
              1 / 01                FF,02,02
              2 / 02                FF,04,04
            ...
            127 / 7F                FF,FD,FD
 0.0        128 / 80        white   FF,FF,FF
            129 / 81                FD,FD,FD
            ...
            253 / FD                04,04,04
            254 / FE                02,02,02
+range      255 / FF        black   00,00,00
If you double the size of the color table to 9 bits (512 values), you can make the increments between RGB entries finer: steps of 1 instead of 2. Such a 9-bit palette would give you full single-channel resolution on both the negative and positive sides of the range. It's not clear that allocating 16 bits of palette would really store any more visual information given the mapping you want to do. I hope I understand your question and maybe this is helpful.
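If it helps, here is a C++ sketch of building that 256-entry palette and mapping a float to an index (helper names are mine; the step sizes approximate the table above, which itself rounds):

#include <array>
#include <cstdint>

struct Rgb { std::uint8_t r, g, b; };

// Build the red -> white -> black palette sketched in the table above.
std::array<Rgb, 256> makePalette() {
    std::array<Rgb, 256> pal{};
    for (int i = 0; i <= 127; ++i)      // entries 0..127: red towards white
        pal[i] = { 255, std::uint8_t(2 * i), std::uint8_t(2 * i) };
    pal[128] = { 255, 255, 255 };       // exact white at the centre
    for (int i = 129; i <= 255; ++i) {  // entries 129..255: grey towards black
        std::uint8_t v = std::uint8_t(2 * (255 - i));
        pal[i] = { v, v, v };
    }
    return pal;
}

// Map a float in [-range, +range] to a palette index 0..255.
std::uint8_t toIndex(float v, float range) {
    float t = (v + range) / (2.0f * range);          // 0..1
    int i = static_cast<int>(t * 255.0f + 0.5f);
    return std::uint8_t(i < 0 ? 0 : (i > 255 ? 255 : i));
}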
The PNG format supports paletted images up to 8 bits, but it also supports grayscale images up to 16 bits. However, the 16-bit modes are less used and software support may be lacking, so test your tools first.
But you could also test with plain 24-bit RGB truecolor PNG images. They are compressed and should produce a better result than BMP in any case.
I have data for every pixel: one byte red, one byte green, one byte blue. I need to pack this into an 8-bit bitmap, so I have only one byte per pixel. How do I transform RGB (three bytes) into one byte for the bitmap format?
(I am using C++ and cannot use any external libraries.)
I think you misunderstood how to form a bitmap structure. You do not need to pack (somehow) 3 bytes into one. That is not possible after all, unless you throw away information (like the special image format GL_R3_G3_B2 does).
The BMP file format wiki page shows the detailed BMP format: it is a header, followed by data. Depending on what you set in the header, it is possible to form a BMP image containing RGB data, where each component is one byte.
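To make that concrete, here is a minimal sketch (not production code; no external libraries, as you asked) of writing raw RGB bytes out as a 24-bpp BMP, following the layout on the wiki page: a 14-byte file header, a 40-byte BITMAPINFOHEADER, then DWORD-padded rows stored bottom-up in BGR order:

#include <cstdint>
#include <cstdio>
#include <vector>

void writeBmp24(const char *path, const std::uint8_t *rgb, int w, int h) {
    const int stride = ((w * 3 + 3) / 4) * 4;      // rows padded to 4 bytes
    const std::uint32_t dataSize = stride * h;
    const std::uint32_t fileSize = 54 + dataSize;  // 14 + 40 byte headers

    std::uint8_t hdr[54] = { 'B', 'M' };           // rest is zero-filled
    auto put32 = [&](int off, std::uint32_t v) {   // little-endian write
        hdr[off] = std::uint8_t(v);       hdr[off+1] = std::uint8_t(v >> 8);
        hdr[off+2] = std::uint8_t(v >> 16); hdr[off+3] = std::uint8_t(v >> 24);
    };
    put32(2, fileSize);
    put32(10, 54);          // offset of pixel data
    put32(14, 40);          // BITMAPINFOHEADER size
    put32(18, std::uint32_t(w));
    put32(22, std::uint32_t(h));   // positive height = bottom-up rows
    hdr[26] = 1;            // planes
    hdr[28] = 24;           // bits per pixel
    put32(34, dataSize);

    std::vector<std::uint8_t> row(stride, 0);
    std::FILE *f = std::fopen(path, "wb");
    std::fwrite(hdr, 1, 54, f);
    for (int y = h - 1; y >= 0; --y) {             // bottom row first
        for (int x = 0; x < w; ++x) {
            const std::uint8_t *p = rgb + (y * w + x) * 3;
            row[x*3 + 0] = p[2];                   // B
            row[x*3 + 1] = p[1];                   // G
            row[x*3 + 2] = p[0];                   // R
        }
        std::fwrite(row.data(), 1, stride, f);
    }
    std::fclose(f);
}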
First you need to decide how many bits you want to allocate for each color.
3 bits per color would overflow a byte (9 bits);
2 bits per color will underflow (6 bits).
In a three-byte RGB bitmap you have one byte to represent each color's intensity, where 0 is the minimum and 255 the maximum. When you convert it to a 1-byte bitmap (assuming you choose 2 bits per color), the transform is:
1-byte red color / 64
i.e. you will get only 4 shades out of a spectrum of 256 shades per color.
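A sketch of that transform in C++ (helper names mine): quantize each channel to 2 bits and pack the three results into one byte, leaving the top two bits unused. A common alternative is the 3-3-2 split mentioned earlier (GL_R3_G3_B2):

#include <cstdint>

std::uint8_t pack222(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return std::uint8_t(((r / 64) << 4) | ((g / 64) << 2) | (b / 64));
}

// Rough inverse for display: expand each 2-bit value (0..3) back to 0..255.
void unpack222(std::uint8_t p, std::uint8_t &r, std::uint8_t &g, std::uint8_t &b) {
    r = std::uint8_t(((p >> 4) & 3) * 85);
    g = std::uint8_t(((p >> 2) & 3) * 85);
    b = std::uint8_t((p & 3) * 85);
}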
First you have to produce a 256-color palette that best fits your source image.
Then you need to dither the image using the palette you've generated.
Both problems have many well-known solutions. However, it's impossible to produce a high-quality result completely automatically: different approaches work best for different source images. (Photoshop, for example, exposes a UI for tuning the parameters of this process.)
I am reading the OpenGL SuperBible for OpenGL 3.x. I am having a hard time understanding the whole "pixel packing" concept. I get that typically a 199-px-wide image would require 597 bytes per row (199 * 3; 3 bytes per pixel, one for each RGB color channel). My first question is why this would only sometimes be right; the author says it only works on a system with 4-byte alignment. He goes on to say that an extra three bytes will be added to make the row size divisible by four, which I don't understand either. So really my question is: what is the significance of 4-byte alignment, and what does it actually mean? The author then says the alternative is 1-byte alignment, which I don't understand either.
The author says that .TGA is 1-byte aligned, or "tight", and .BMP is 4-byte aligned.
What is 4-byte alignment, what is 1-byte alignment, and why should I use one over the other? When should I use .tga or .bmp for texturing?
Well, it is about data alignment. It differs across architectures and memory management schemes. For example, when you code for a 32-bit processor the default padding of a structure/int is 4 bytes; on 64-bit it is 8 bytes. When you want to override the default padding of a structure you can use #pragma pack (on Visual Studio) or __attribute__((packed)) (GCC).
Look here for additional reference:
https://stackoverflow.com/a/10915310/1406063
https://stackoverflow.com/a/5398498/1406063
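A quick C++ sketch of what that padding looks like in practice (the sizes shown are typical, not guaranteed by the standard):

#include <cstdint>
#include <iostream>

struct Padded {            // default alignment
    std::uint8_t a;        // 1 byte, then 3 bytes of padding
    std::uint32_t b;       // 4 bytes
};                         // sizeof == 8 on typical platforms

#pragma pack(push, 1)
struct Packed {            // packed: no padding between members
    std::uint8_t a;
    std::uint32_t b;
};                         // sizeof == 5
#pragma pack(pop)

int main() {
    std::cout << sizeof(Padded) << ' ' << sizeof(Packed) << '\n';  // typically "8 5"
}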
To make matters worse, your description doesn't differentiate between pixel alignment and scan line alignment.
A 199-pixel-wide image with 24-bit color indeed requires 597 bytes per scan line. But the second scan line will start at offset 600, because it has to be aligned to a 4-byte boundary, and 600 is the first 4-byte boundary that doesn't overlap the first scan line.
This means that instead of each pixel starting at byte
(x + y * width) * 3
Each row will start at byte
y * 600
and each pixel at byte
y * 600 + x * 3
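The same formulas in C++, for the 199-pixel 24-bit example (helper names mine):

#include <cstdint>

std::uint32_t stride(std::uint32_t widthPx) {
    return (widthPx * 3 + 3) & ~3u;   // rounds 597 up to 600
}

std::uint32_t pixelOffset(std::uint32_t x, std::uint32_t y, std::uint32_t widthPx) {
    return y * stride(widthPx) + x * 3;
}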
But why not just make your textures a power of 2, which is supported on all cards in all OpenGL versions and doesn't cause alignment questions? You use texture coordinates to make sure that the padding isn't ever processed or rendered.
So I have an X8R8G8B8-formatted IDirect3DSurface9 that contains the contents of the back buffer. When I call LockRect on it I get access to a struct containing pBits, a pointer to the pixels I assume, and an integer Pitch (whose purpose I am very unclear about).
How to read the individual pixels?
Visual Studio 2008 C++
The locked area is described by a D3DLOCKED_RECT. I haven't ever used this, but the documentation says Pitch is the "Number of bytes in one row of the surface". People would normally call this the "stride" (some terms are explained on MSDN).
For example, if one pixel has 4 bytes (8 bits for each component of XRGB) and the texture width is 7, each row is usually stored as 8*4 bytes instead of 7*4 bytes, because the memory can be accessed faster if the data is DWORD-aligned.
So, in order to read pixel [x, y] you would have to read
uint8_t *pixels = (uint8_t*)rect.pBits;   // pBits is a void*, so cast it first
uint32_t *mypixel = (uint32_t*)&pixels[rect.Pitch*y + 4*x];
where 4 is the size of one pixel in bytes. *mypixel would be the content of the pixel in my example.
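A fuller sketch of the whole read in C++, with error handling omitted and assuming an X8R8G8B8 surface as in the question:

#include <d3d9.h>

DWORD readPixel(IDirect3DSurface9 *surface, int x, int y) {
    D3DLOCKED_RECT rect;
    surface->LockRect(&rect, NULL, D3DLOCK_READONLY);   // lock whole surface
    const unsigned char *pixels = static_cast<const unsigned char *>(rect.pBits);
    // Use Pitch, not width*4: rows may be padded for alignment.
    DWORD pixel = *reinterpret_cast<const DWORD *>(pixels + rect.Pitch * y + 4 * x);
    surface->UnlockRect();
    return pixel;   // 0x00RRGGBB; in memory the byte order is B, G, R, X
}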
Yep, you would access the individual RGB components of the pixel like that.
The X byte of the pixel is not used, but it is more efficient to use 4 bytes per pixel so that each pixel is aligned on a 32-bit boundary (that's also why there's the pitch).
In your example the X is not used, but note that there are also other pixel formats, for example ARGB, which stores the alpha value (transparency) in that byte. Sometimes the colors are also reversed (BGR instead of RGB). If you're unsure which byte corresponds to which color, a good trick is to create a texture that is entirely red, green, or blue and then check which of the 4 bytes has the value 255.