Sun Raster images: Why 1 byte row padding when width is odd? - c++

This may be waaay too specific for SO, but there seems to be a dearth of info on the Sun raster standard. (Even JWZ is frustrated by this!)
Intro: The Sun raster standard says that rows of pixels have padding at the end such that the number of bits in a row is a multiple of 16 (i.e. an even number of bytes). For example, if you had a 7-pixel-wide 24-bit image, a row would normally take 7 * 3 = 21 bytes, but Sun raster would pad it to 22 bytes so the number of bits is divisible by 16. The code below achieves this for 24-bit images of arbitrary width:
row_byte_length = num_cols * 3;
row_byte_length += row_byte_length % 2; // pad odd byte counts up to the next even number
Here's my question: both ImageMagick and Gimp follow this rule for 24-bit images, but for 32-bit images they do something weird that I don't understand. Since the bit depth gives 4-byte pixels, any image width takes an even number of bytes per row, which always complies with the "16-bit alignment" rule. But when they compute the row length, they add an extra byte for images with odd widths, making the row length odd (i.e. the number of bits in the row is not divisible by 16). The code below describes what they're doing for 32-bit images:
row_byte_length = num_cols * 4 + num_cols % 2; // one extra byte when the width is odd
Adding one appears to go against the "16-bit alignment" rule as specified by the Sun format, and seems to serve no purpose. However, if both Gimp and ImageMagick do it this way, I figure I must be misreading the Sun raster spec.
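For reference, here's a minimal sketch of the padding rule as I read the spec (the helper name sun_row_bytes is mine, not from the standard). Note that for 4-byte pixels the row % 2 term is always zero, which is exactly why the extra byte above looks wrong:

#include <cstddef>

// Bytes per Sun raster scanline: rows are padded so their length in bits
// is a multiple of 16, i.e. an even number of bytes.
std::size_t sun_row_bytes(std::size_t num_cols, std::size_t bytes_per_pixel)
{
    std::size_t row = num_cols * bytes_per_pixel;
    return row + (row % 2); // round odd byte counts up to the next even number
}
// sun_row_bytes(7, 3) == 22, as in the example above; sun_row_bytes(7, 4) == 28.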
Are there any Sun raster experts out there who know why this is done?
Edit
My mistake: Gimp only outputs up to 24-bit Sun raster files. Looks like this is only an ImageMagick issue, so probably a bug. I'm labeling this for closure; better to discuss it on the ImageMagick forums.

I'd say the image loading code in Gimp and ImageMagick has a bug. Simple as that.
Keep in mind that the Sun raster format isn't that widely used. It's very possible that you're one of the first who actually tried to use this format, found out that it doesn't work as expected, and didn't just ignore it.
If the spec says something along the lines of: "Regardless of width, the stored scanlines are rounded up to multiples of 16 bits", then there isn't much room for interpretation.

It seems like a mistake to me too. I even wonder whether Sun itself ever supported 32-bit RAS files. Basically, a 32-bit image would most likely add an alpha channel as a "fourth" colour, to support transparency. But like many image file formats, it's a bit old, and others have made adjustments to the format to support 32-bit images. So I think that whoever added the 32-bit support just implemented it wrong, and ever since we have had to live with that decision.
Compare it with the "referer" misspelling that has become part of the HTTP standard. :-) A mistake that simply became part of the standard.

Related

BMP File Format Confusion

I'm writing my own BMP file reader in C++, and there are some docs that I'm not so sure about.
In the Wiki page for the BMP File Format, there's a diagram of all the practical formats of BMP used out in the wild.
For the values that look like 8.8.8.0.8, like the 32 bpp BI_RGB in BITMAPINFOHEADER, does each value represent the number of bits that can be used to represent each color channel in RGBAX? If this is the case, what is the 'X'? And why are there 8 bits devoted to it? Could I use this for an alpha channel?
The R.G.B.A.X notation used to be documented on Wikipedia (even with some colorful diagrams), but it was deleted by some busybody there. You can still find it in the history of the article, though. See here.
Anyway, 8.8.8.0.8 means that there are 32 bits per pixel (the sum of all the digits equals 32), and the 0 means there are no bits for the alpha channel (alpha is not encoded in this format, nor in any of the palettized pixel formats). The last 8 (at the fifth, X position) means that there are 8 unused bits per pixel in this encoding.
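To make that concrete, here's a minimal sketch (my own illustration, not from the article) of unpacking one 32-bpp BI_RGB pixel in the 8.8.8.0.8 layout; the X byte is simply skipped:

#include <cstdint>

struct RGB { std::uint8_t r, g, b; };

// A 32-bpp BI_RGB pixel is a little-endian 0x00RRGGBB value: blue in the
// lowest byte, then green, then red, then the unused X byte on top.
RGB unpack_xrgb8888(std::uint32_t px)
{
    RGB out;
    out.b = static_cast<std::uint8_t>(px & 0xFF);
    out.g = static_cast<std::uint8_t>((px >> 8) & 0xFF);
    out.r = static_cast<std::uint8_t>((px >> 16) & 0xFF);
    // bits 24..31 are the X byte: present in the file but carrying no data
    return out;
}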
Also, the Wikipedia article and diagram below are more complete than anything MSDN has on the subject. See Chris Cox's criticism of the MSDN documentation here:

Converting 12 bit color values to 8 bit color values C++

I'm attempting to convert 12-bit RGGB color values into 8-bit RGGB color values, but my current method gives strange results.
Logically, I thought that simply dividing the 12-bit RGGB into 8-bit RGGB would work and be pretty simple:
// raw_color_array contains R,G1,G2,B in a Bayer pattern, with each element
// ranging from 0 to 4095 (12 bits)
for (int i = 0; i < array_size; i++)
{
    raw_color_array[i] /= 16; // 4095 becomes 255, and so on
}
However, in practice this does not work. Given, for example, a small image with water and a piece of ice in it, you can see what actually happens in the conversion (rightmost image).
Why does this happen, and how can I get the same image as on the left (or close to it), but as 8-bit values instead? Thanks!
EDIT: going off of @MSalters' answer, I get a better quality image but the colors are still drastically skewed. What resources can I look into for converting 12-bit data to 8-bit data without a steep loss in quality?
It appears that your raw 12-bit data isn't on a linear scale. That is quite common for images. For a non-linear scale, you can't use a linear transformation like dividing by 16.
A non-linear transform like sqrt(x*16) would also give you an 8-bit value. So would std::pow(x, 8.0/12.0).
A known problem with low-gradient images is banding. If your image has an area where the original value varies from, say, 100 to 200, the 12-to-8-bit reduction will shrink that to fewer than 100 different values. You get rounding, and with naive (local) rounding you get bands. Linear or non-linear, there will then be some inputs x that all map to y, and some that map to y+1. This can be mitigated by doing the transformation in floating point and adding a random value between -1.0 and +1.0 before rounding. This effectively breaks up the band structure.
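A minimal sketch of that idea (my own illustration; the sqrt curve is just the example transfer function from above, so substitute whatever matches your sensor): do the mapping in floating point and add noise in [-1.0, +1.0] before rounding:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <random>

// Convert one 12-bit sample (0..4095) to 8 bits with a non-linear transform
// plus random dithering to break up banding.
std::uint8_t to_8bit_dithered(std::uint16_t x, std::mt19937 &rng)
{
    std::uniform_real_distribution<double> dither(-1.0, 1.0);
    double y = std::sqrt(static_cast<double>(x) * 16.0); // 0..4095 -> ~0..255.9
    y += dither(rng);                                    // break the band structure
    return static_cast<std::uint8_t>(std::clamp(std::lround(y), 0L, 255L));
}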
After you clarified that this 12-bit data is only for one colour, here is my simple answer:
Since you want to convert its value to its 8-bit equivalent, it obviously means you lose some of the data (4 bits). This is why you are not getting the same output.
After clarification:
If you want to retain the actual colour values, apply de-mosaicking to the 12-bit image first and then scale the resulting data to 8 bits. That way the colour loss due to de-mosaicking will be smaller than with the previous approach.
You say that your 12 bits represent 2^12 levels of one colour. That is incorrect; there are reds, greens and blues in your image. Look at the histogram. I made this with ImageMagick at the command line:
convert cells.jpg histogram:png:h.png
If you want 8 bits per pixel, rather than trying to blindly/statically apportion 3 bits to green, 2 bits to red and 3 bits to blue, you would probably be better off going with an 8-bit palette so you can have 250+ colours of all variations, rather than restricting yourself to just 8 shades of blue, 4 of red and 8 of green. So, like this:
convert cells.jpg -colors 254 PNG8:result.png
Here is the result of that beside the original:
The process above is called "quantisation" and if you want to implement it in C/C++, there is a writeup here.

C++: How to interpret a byte array representation of an image?

I'm trying to work with this camera SDK, and let's say the camera has a function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, modifies it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments), so I'm just guesstimating here. Here's a code snippet of what I think works:
BYTE* data = new BYTE[10000000]; // an array of arbitrarily large size; I'm
                                 // not sure what the exact size needs to be,
                                 // so I made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million, if you've got the memory, at least initially. A 10-megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour, it could take up to 60 or 80 million bytes.
You could fill this big array with data before passing it, for example with '01234567' repeated. Then it's really obvious which bytes have been written and which haven't, so you can work out the real size of what's returned.
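A minimal sketch of that probing trick (CameraGetImageData and BYTE come from the asker's SDK; the rest is mine): fill the buffer with a known repeating pattern, then scan backwards for the last byte that changed:

#include <cstddef>

const std::size_t kBufSize = 100000000; // generously oversized, per the advice above
BYTE *data = new BYTE[kBufSize];

// Pre-fill with a repeating '0'..'7' sentinel so untouched bytes are obvious.
for (std::size_t i = 0; i < kBufSize; ++i)
    data[i] = static_cast<BYTE>('0' + (i % 8));

CameraGetImageData(data);

// Scan backwards for the last byte the SDK overwrote; everything up to and
// including that index is (roughly) the real payload size.
std::size_t written = 0;
for (std::size_t i = kBufSize; i-- > 0;)
{
    if (data[i] != static_cast<BYTE>('0' + (i % 8)))
    {
        written = i + 1;
        break;
    }
}
delete[] data;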
I don't think there is a standard, but you can try to identify which values are what by putting solid-colour images in front of the camera, so that all pixels are approximately the same colour. Having an idea of what colour should be stored in each pixel, you may understand how the colour is represented in your array. I would go with black, white, red, green and blue images.
But also consider finding a better SDK that has documentation, because just allocating a big array and guessing is really bad design.
You should check the documentation of your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle the most common formats and pass them the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adheres to the IIDC (aka DCAM) standard for IEEE 1394 cameras. I have personally worked with such a camera made by Hamamatsu, using this library to interface with it.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth resolution of 12 bits. Therefore, each pixel intensity was stored as a 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels and the factor 2 accounts for 16 bits per pixel. The width and height were known a priori from the chosen camera mode.
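As a minimal sketch of that layout (my own illustration; width and height assumed known from the camera mode): each pixel is one 16-bit unsigned value holding a 12-bit intensity, so scaling to 8 bits for display is a 4-bit right shift:

#include <cstdint>
#include <vector>

// Scale 12-bit-in-16-bit monochrome samples down to 8 bits for display.
std::vector<std::uint8_t> to_display(const std::uint16_t *raw, int width, int height)
{
    std::vector<std::uint8_t> img(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < img.size(); ++i)
        img[i] = static_cast<std::uint8_t>(raw[i] >> 4); // drop the low 4 bits
    return img;
}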
If you have the dimensions of the result image, try dumping your byte array to a file and loading the result in Python or Matlab to visualize the content. Another possibility is to load the raw file in an image editor such as ImageJ and hope to get something out of it.
Good luck!
I hope this question's solution will help you: https://stackoverflow.com/a/3340944/291372
Actually you've got an array of pixels (assume 1 byte per pixel if your camera captures in 8-bit). What you need is just to determine the width and height; after that you can try to restore a bitmap image from your byte array.

glPixelStorei(GL_UNPACK_ALIGNMENT, 1) Disadvantages?

What are the disadvantages of always using an alignment of 1?
glPixelStorei(GL_UNPACK_ALIGNMENT, 1)
glPixelStorei(GL_PACK_ALIGNMENT, 1)
Will it impact performance on modern GPUs?
How can data not be 1-byte aligned?
This strongly suggests a lack of understanding of what the row alignment in pixel transfer operations means.
Image data that you pass to OpenGL is expected to be grouped into rows. Each row contains width pixels, with each pixel being the size defined by the format and type parameters. So a format of GL_RGB with a type of GL_UNSIGNED_BYTE results in a pixel that is 24 bits in size. Pixels are otherwise expected to be packed, so a row of 16 of these pixels will take up 48 bytes.
Each row is expected to be aligned to a specific value, as defined by GL_PACK/UNPACK_ALIGNMENT. This means that the value you add to the pointer to get to the next row is align(pixel_size * width, GL_*_ALIGNMENT). If the pixel size is 3 bytes, the width is 2, and the alignment is 1, the row byte size is 6. If the alignment is 4, the row byte size is 8.
See the problem?
Image data, which may come from some image file format loaded by an image loader, has a row alignment. Sometimes this is 1-byte aligned, and sometimes it isn't. DDS images have an alignment specified as part of the format. In many cases, images have 4-byte row alignment; pixel sizes smaller than 32 bits will therefore have padding at the end of rows of certain widths. If the alignment you give OpenGL doesn't match the data's alignment, you get a malformed texture.
You set the alignment to match the image format's alignment. Unless you know or can otherwise ensure that your row alignment is always 1 (and that's unlikely unless you've written your own image format or DDS writer), you need to set the row alignment to exactly what your image format uses.
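For concreteness, a minimal sketch of that row-stride computation (the helper name row_stride is mine):

#include <cstddef>

// Bytes from the start of one row to the start of the next, per the
// align(pixel_size * width, alignment) rule described above.
std::size_t row_stride(std::size_t pixel_size, std::size_t width, std::size_t alignment)
{
    std::size_t row = pixel_size * width;
    return (row + alignment - 1) / alignment * alignment; // round up
}
// row_stride(3, 2, 1) == 6 and row_stride(3, 2, 4) == 8, matching the example above.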
Will it impact performance on modern GPUs?
No, because the pixel store settings are only relevant for the transfer of data to or from the GPU, namely the alignment of your data. Once in GPU memory it's aligned in whatever way the GPU and driver desire.
There will be no impact on performance. Setting a higher alignment (in OpenGL) doesn't improve anything or speed anything up.
All the alignment does is tell OpenGL where to expect the next row of pixels. Use an alignment of 1 if your image pixels are tightly packed, i.e. if there are no gaps between where one row of bytes ends and the next row starts.
The default alignment is 4 (i.e. OpenGL expects the next row of pixels to start after a jump in memory that is divisible by 4), which may cause problems when you load R, RG or RGB ubyte textures whose row size in bytes is not divisible by 4. If your image pixels are tightly packed, you have to change the alignment to 1 for the unpacking to work.
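A minimal sketch of that fix (texture setup details elided; pixels is assumed to be a tightly packed RGB ubyte buffer):

// Tightly packed RGB ubyte rows take width * 3 bytes, which is not
// necessarily divisible by 4, so drop the unpack alignment from 4 to 1.
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);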
You could (I personally haven't encountered them) have an image of, say, 3x3 RGB ubyte pixels whose rows are 4-byte aligned, with 3 extra bytes used as padding at the end. Each row might look like this:
R - G - B - R - G - B - R - G - B - X - X - X (12 bytes in total)
The reason for this is that aligned data improves the performance of the processor (I'm not sure how true/justified that is on today's processors). If you have any control over how the original image is composed, then maybe aligning it one way or another will improve its handling. But this is done prior to OpenGL. OpenGL has no way of changing anything about this; it only cares about where to find the pixels.
So, back to the 3x3 image row above: setting the alignment to 4 would be good (and necessary) to jump over the padding at the end. If you set it to 1, it will mess up your result, so you need to keep/restore it to 4. (Note that you could also use ROW_LENGTH to jump over the padding, as that is the parameter used when dealing with subsets of the image, in which case you sometimes have to jump much more than 3 or 7 bytes, the most that alignments of 4 or 8 can give you. In our example, supplying a row length of 4 together with an alignment of 1 would also work.)
The same goes for packing. You can tell OpenGL to align pixel rows to 1, 2, 4 or 8. If you're saving a 3x3 RGB ubyte image, you should set the alignment to 1. Technically, if you want the resulting rows tightly packed, you should always give 1. If you want (for whatever reason) to create some padding, you can give another value. Giving (in our example) a PACK_ALIGNMENT of 4 would result in rows that look like the one above (with the 3 extra padding bytes at the end). Note that in that case your receiving object (OpenCV Mat, bitmap, etc.) must be able to accommodate that extra padding.
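And a minimal sketch of the packing side (assumes you want the read-back rows tightly packed):

#include <vector>

// Read back a width x height RGB image with no row padding: with a pack
// alignment of 1, rows of width * 3 bytes are written back-to-back.
std::vector<unsigned char> buf(static_cast<std::size_t>(width) * height * 3);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buf.data());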

C++ Bitmap Bit per pixel

I'm trying to understand how to build a bmp from raw data in C++, and I have a few questions.
My bmp can be black and white, so I figured that in the bits-per-pixel field I should go with 1. However, in a lot of guides I see that row padding adds bits to keep 32-bit alignment, which made me think my bmp would end up the same file size as a 24-bit-per-pixel bmp.
Is this understanding correct, or is a 1-bit-per-pixel bmp in fact smaller than 24-bit, 32-bit, etc.?
Thanks
Monochrome bitmaps are aligned too, but they will not take as much space as 24/32-bpp ones.
A row of a 5-pixel-wide 24-bit bitmap will take 16 bytes: 5*3 = 15 for pixels, plus 1 byte of padding.
A row of a 5-pixel-wide 32-bit bitmap will take 20 bytes: 5*4 = 20 for pixels, no padding needed.
A row of a 5-pixel-wide monochrome bitmap will take 4 bytes: 1 byte for pixels (it is not possible to use less than a byte, so a whole byte is taken but 3 of its 8 bits are unused), plus 3 bytes of padding.
So a monochrome bitmap will of course be smaller than a 24-bit one.
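A minimal sketch of the general rule behind those numbers (the helper name bmp_row_bytes is mine): round the row size in bits up to a whole number of 32-bit DWORDs:

#include <cstdint>

// Bytes per BMP scanline: width pixels at bpp bits each, rounded up to the
// next multiple of 4 bytes (DWORD alignment).
std::uint32_t bmp_row_bytes(std::uint32_t width, std::uint32_t bpp)
{
    return ((width * bpp + 31) / 32) * 4;
}
// bmp_row_bytes(5, 24) == 16, bmp_row_bytes(5, 32) == 20, bmp_row_bytes(5, 1) == 4,
// matching the rows above.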
The answer is already given above (bitmap rows are aligned/padded to a 32-bit boundary); if you want more information, you might want to read DIBs and Their Uses, the "DIB Header" section, which explains it in detail.
Every scanline is DWORD-aligned. The scanline is buffered to alignment; the buffering is not necessarily 0.
The scanlines are stored upside down, with the first scan (scan 0) in memory being the bottommost scan in the image (see Figure 1: memory and screen representations of a DIB). This is another artifact of Presentation Manager compatibility. GDI automatically inverts the image during the Set and Get operations.