Right now I have an application that renders a geometry and allows you to select a PNG and overlay it. My question is this:
I have a scenario where I select a PNG where the width is not a multiple of 4 (the dimensions are 2719 x 1277).
My understanding is that if you try to store an image into a 2D texture whose dimensions aren't multiples of 4, this shouldn't work. At least not if your GL_UNPACK_ALIGNMENT is 4, which mine is.
I'm storing my texture with an internal format of RGB, so one row of pixels is 2719 * 3 = 8157 bytes. I've seen other people hit problems doing the same thing: their image width was not a multiple of 4, and they couldn't use the texture until they set the unpack alignment to 1. In my case it works with an alignment of 4. I just want to understand why.
I used this link as my primary learning source.
https://www.opengl.org/wiki/Pixel_Transfer
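To make the arithmetic concrete, here is a minimal sketch of the row stride OpenGL assumes when components are one byte each. The helper name `paddedRowBytes` is made up for illustration and is not an OpenGL call.

```cpp
#include <cstddef>

// Distance in bytes between the starts of two consecutive rows as OpenGL
// reads them from client memory, assuming byte-sized components:
// the tightly packed row size rounded up to a multiple of the alignment.
// `alignment` stands for the GL_UNPACK_ALIGNMENT value (1, 2, 4 or 8).
std::size_t paddedRowBytes(std::size_t width, std::size_t bytesPerPixel,
                           std::size_t alignment) {
    const std::size_t tight = width * bytesPerPixel;
    return (tight + alignment - 1) / alignment * alignment;
}
```

For the 2719-pixel RGB rows above, an alignment of 4 gives a stride of 8160 bytes, 3 more than the 8157 bytes of pixel data. The upload is only correct with that setting if the buffer in memory really carries those 3 padding bytes per row.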
Related
In my OpenGL program, I'm loading a 24BPP image with a width of 501. The GL_UNPACK_ALIGNMENT parameter is set to 4. From what I've read, this shouldn't work, because the size of each uploaded row (501*3 = 1503) is not divisible by 4. However, I see a normal texture without artifacts when displaying it.
So my code works. I want to understand why, so that this doesn't turn into a bug later in the project.
Maybe (?) it works because I'm not just calling glTexImage2D. Instead, I first create a blank texture with power-of-two dimensions, then upload the pixels with glTexSubImage2D.
EDIT:
But do you think it makes sense to write code like this?
// w - the width of the image, in pixels
// depth - the depth of the image, in bytes per pixel
bool change_alignment = false;
if (depth != 4 && !is_divisible(w * depth, 4)) // *
{
    // rows are not a multiple of 4 bytes, so read them tightly packed
    change_alignment = true;
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
}
// ... now use glTexImage2D
if (change_alignment) glPixelStorei(GL_UNPACK_ALIGNMENT, 4); // set to default
// * - of course we don't even need such a function, (w * depth) % 4 == 0 does the same,
// but I wanted to make the code as clear as possible
Would that prevent the application from crashing or malfunctioning?
It depends on where your image data is coming from.
The Windows BMP format, for example, enforces a 4-byte row alignment. Indeed, formats like this are exactly why OpenGL has a row-alignment field: because some image formats enforce a row alignment.
So how correct it is to use a 4-byte row alignment on your data depends entirely on how your data is aligned in memory. Some image loaders will automatically align to 4 bytes. And some will not.
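As a rough sketch of that idea: if you know how many bytes apart the loader places consecutive rows, you can check which alignment value describes that layout. The function name and parameters below are hypothetical, and the sketch assumes 8-bit components.

```cpp
#include <cstddef>

// Returns the smallest GL_UNPACK_ALIGNMENT value (1, 2, 4 or 8) for which
// the padded row size equals the stride the loader actually used,
// or 0 if no alignment value describes that layout.
// rowBytes    = width * bytes per pixel (tightly packed row size)
// strideBytes = distance between row starts in the loader's buffer
int chooseUnpackAlignment(std::size_t rowBytes, std::size_t strideBytes) {
    const int candidates[] = {1, 2, 4, 8};
    for (int a : candidates) {
        const std::size_t al = static_cast<std::size_t>(a);
        const std::size_t padded = (rowBytes + al - 1) / al * al;
        if (padded == strideBytes) return a;
    }
    return 0;
}
```

A non-zero result is what you would pass to glPixelStorei(GL_UNPACK_ALIGNMENT, ...) before the upload. A result of 0 means the rows are laid out in some other way, and you would need a different approach such as GL_UNPACK_ROW_LENGTH or repacking the data.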
I am working on a project to losslessly compress a specific style of BMP images that look like this
I have thought about doing pattern recognition to find repeated blocks of N x N pixels, but I feel like it won't run fast enough.
Any suggestions?
EDIT: I have access to the dataset that created these images too, I just use the image to visualize my data.
Optical illusions make it hard to tell for sure, but are the colors only black/blue/red/green? If so, the most straightforward compression would be to simply make more efficient use of pixels. Pixels use a fixed amount of space regardless of what color they are, and a pixel can hold far more than just those four colors. Assuming 24-bit pixels, four colors need only 2 bits each, so chances are you are using 12x as many pixels as you really need.
A simple way to do that would be to label the pixels with the following base 4 numbers:
Black = 0
Red = 1
Green = 2
Blue = 3
Example:
The first four colors of the image seem to be Blue-Red-Blue-Blue. With the labels above this is 3133 in base 4, which is DF in base 16 or 223 in base 10. This is enough to define the red value of the new pixel. The next 4 colors would define the green value and the final 4 the blue value, thus turning 12 pixels into a single pixel.
Beyond that you'll probably want to look into more conventional compression software.
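The labelling scheme can be sketched like this. It is only an illustration, assuming the image has already been converted to a flat list of labels 0..3; the function names are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs colour labels (each 0..3) four to a byte. The first label of each
// group goes into the highest two bits, so a byte reads like a 4-digit
// base-4 number.
std::vector<std::uint8_t> packLabels(const std::vector<std::uint8_t>& labels) {
    std::vector<std::uint8_t> packed((labels.size() + 3) / 4, 0);
    for (std::size_t i = 0; i < labels.size(); ++i) {
        const unsigned shift = 6 - 2 * static_cast<unsigned>(i % 4);
        packed[i / 4] = static_cast<std::uint8_t>(
            packed[i / 4] | ((labels[i] & 0x3u) << shift));
    }
    return packed;
}

// Inverse of packLabels; `count` is the number of labels originally packed.
std::vector<std::uint8_t> unpackLabels(const std::vector<std::uint8_t>& packed,
                                       std::size_t count) {
    std::vector<std::uint8_t> labels(count);
    for (std::size_t i = 0; i < count; ++i) {
        const unsigned shift = 6 - 2 * static_cast<unsigned>(i % 4);
        labels[i] = static_cast<std::uint8_t>((packed[i / 4] >> shift) & 0x3u);
    }
    return labels;
}
```

Three consecutive packed bytes then give the red, green and blue values of one output pixel. The packed bytes can still be fed to a conventional compressor afterwards.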
I'm trying to work with this camera SDK, and let's say the camera has this function called CameraGetImageData(BYTE* data), which I assume takes in a byte array, fills it with the image data, and then returns a status code based on success/failure. The SDK provides no documentation whatsoever (not even code comments), so I'm just guesstimating here. Here's a code snippet of what I think works:
BYTE* data = new BYTE[10000000]; // an array of an arbitrary large size, I'm not
// sure what the exact size needs to be so I
// made it large
CameraGetImageData(data);
// Do stuff here to process/output image data
I've run the code w/ breakpoints in Visual Studio and can confirm that the CameraGetImageData function does indeed modify the array. Now my question is, is there a standard way for cameras to output data? How should I start using this data and what does each byte represent? The camera captures in 8-bit color.
Take pictures of pure red, pure green and pure blue. See what comes out.
Also, I'd make the array 100 million, not 10 million if you've got the memory, at least initially. A 10 megapixel camera using 24 bits per pixel is going to use 30 million bytes, bigger than your array. If it does something crazy like store 16 bits per colour it could take up to 60 million or 80 million bytes.
You could fill this big array with data before passing it. For example fill it with '01234567' repeated. Then it's really obvious what bytes have been written and what bytes haven't, so you can work out the real size of what's returned.
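A minimal sketch of that trick, with made-up helper names. It uses a std::vector as the buffer; you would pass its data() pointer to the SDK call.

```cpp
#include <cstddef>
#include <vector>

// Sentinel byte expected at position i: the ASCII pattern "01234567" repeated.
static unsigned char sentinelAt(std::size_t i) {
    return static_cast<unsigned char>('0' + i % 8);
}

// Fills the whole buffer with the sentinel pattern.
void fillSentinel(std::vector<unsigned char>& buf) {
    for (std::size_t i = 0; i < buf.size(); ++i) buf[i] = sentinelAt(i);
}

// Returns one past the index of the last byte that no longer matches the
// pattern, or 0 if the buffer looks untouched.
std::size_t writtenExtent(const std::vector<unsigned char>& buf) {
    for (std::size_t i = buf.size(); i > 0; --i) {
        if (buf[i - 1] != sentinelAt(i - 1)) return i;
    }
    return 0;
}
```

Call fillSentinel before the capture and writtenExtent after it. The result is a lower bound on the frame size, since image bytes at the very end can match the pattern by chance, so it is worth repeating with a couple of captures.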
I don't think there is a standard, but you can try to identify which values are what by putting some solid color images in front of the camera, so that all pixels are approximately the same color. Knowing what color should be stored in each pixel, you may be able to work out how the color is represented in your array. I would go with black, white, red, green and blue images.
But also consider finding a better SDK that has documentation, because just allocating a big array and hoping it is large enough is really bad design.
You should check the documentation on your camera SDK, since there's no "standard" or "common" way for data output. It can be raw data, it can be RGB data, it can even be already compressed. If the camera vendor doesn't provide any information, you could try to find some libraries that handle most common formats, and try to pass the data you have to see what happens.
Without even knowing the type of the camera, this question is nearly impossible to answer.
If it is a scientific camera, chances are good that it adheres to the IIDC (aka DCAM) standard for IEEE 1394 cameras. I have personally worked with such a camera made by Hamamatsu using this library to interface with the camera.
In my case the camera output was just raw data. The camera itself was monochrome and each pixel had a depth resolution of 12 bits. Therefore, each pixel intensity was stored as a 16-bit unsigned value in the result array. The size of the array was simply width * height * 2 bytes, where width and height are the image dimensions in pixels and the factor 2 is for 16 bits per pixel. The width and height were known a priori from the chosen camera mode.
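As a sketch of what reading such a buffer looks like: the byte order below is an assumption (little-endian), since it depends on the camera, and the function names are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expected size of one raw frame in bytes.
std::size_t rawFrameBytes(std::size_t width, std::size_t height,
                          std::size_t bytesPerPixel) {
    return width * height * bytesPerPixel;
}

// Combines pairs of bytes into 16-bit intensities, low byte first.
// ASSUMPTION: little-endian storage; swap the two indices if the camera
// delivers the high byte first. A trailing odd byte is ignored.
std::vector<std::uint16_t> decode16LE(const std::vector<std::uint8_t>& raw) {
    std::vector<std::uint16_t> out(raw.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<std::uint16_t>(raw[2 * i] | (raw[2 * i + 1] << 8));
    }
    return out;
}
```

If the decoded values of a 12-bit camera never exceed 4095, the byte order guess is probably right. Values that look like noise across the full 16-bit range suggest the opposite order.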
If you have the dimensions of the result image, try to dump your byte array into a file and load the result either in Python or Matlab and just try to visualize the content. Another possibility is to load this raw file with an image editor such as ImageJ and hope to get anything out from it.
Good luck!
I hope the solution to this question helps you: https://stackoverflow.com/a/3340944/291372
What you've got is an array of pixels (assume 1 byte per pixel if your camera captures in 8-bit). All you need to do is determine the width and height. After that you can try to reconstruct a bitmap image from your byte array.
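One low-effort way to look at the result is to wrap the bytes in a binary PGM header, which most image viewers can open. This is a sketch that assumes 1 byte per pixel interpreted as grayscale, which may not match how the camera encodes color. The function name is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Builds a binary PGM (P5) image in memory from 8-bit pixels.
// Returns an empty string if the buffer holds fewer than width*height bytes.
std::string makePgm(const std::vector<std::uint8_t>& pixels,
                    std::size_t width, std::size_t height) {
    const std::size_t count = width * height;
    if (pixels.size() < count) return std::string();
    std::string out = "P5\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    out.append(reinterpret_cast<const char*>(pixels.data()), count);
    return out;
}
```

Write the returned string to a file opened in binary mode and try a few candidate widths. A wrong width shows up as a diagonally sheared picture, a correct one as a recognisable image.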
I'm somewhat new to OpenGL though I'm fairly sure my problem lies in the pixel format being used, or how my texture is being generated...
I'm drawing a texture onto a flat 2D quad using a 16bit RGB5_A1 pixel format, though I don't make use of any alpha at this stage. The problem I'm having is that each pair of horizontal pixel values have been swapped.
That is... if the pixel positions should be in this order (assume a 4x2 image)
0 1 2 3
4 5 6 7
they are instead drawn as
1 0 3 2
5 4 7 6
Or, more clearly from this image (below).
Left is what I get... Right is what I should get.
The question is... How have I ended up with this? Is there something wrong with the pixel format? Unlikely since the colours all appear correct, and I would expect all kinds of nasty if it were down to endian-ness. Suggestions greatly appreciated.
Update: Turns out the problem was in my source renderer. Interestingly, I've avoided the problem entirely by using 32-bit textures (haven't tried 24-bit at this point).
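For anyone chasing the same symptom, a quick diagnostic is to apply the same swap deliberately to the 16-bit source buffer before uploading it. If the picture then looks right, the pairs are being exchanged somewhere upstream of OpenGL. This is only a sketch with a made-up helper name, not the fix used above, and it assumes an even image width so that pairs never straddle two rows.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Exchanges every pair of neighbouring 16-bit pixels: 0 1 2 3 -> 1 0 3 2.
// A trailing unpaired pixel is left where it is.
void swapPixelPairs(std::vector<std::uint16_t>& pixels) {
    for (std::size_t i = 0; i + 1 < pixels.size(); i += 2) {
        std::swap(pixels[i], pixels[i + 1]);
    }
}
```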
This may be unrelated, and you have already found a workaround, but it could be caused by the OpenGL unpack alignment. Have you tried the following call? It sets the alignment of every image row to 1 byte (the default is 4).
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
I have an image which is representative of an Array2D:
template<class T = uint8_t>
Array2D<T> mPixData[4]; ///< 3 component channels + alpha channel.
The comment is from the library. I have no clue what it means.
Would someone:
explain what the 3 component channels + alpha channel are about
show how I could resize this image based on the mPixData
Without knowing what library this is, here is a stab in the dark:
The type definition implies that it is creating a 2D array of unsigned chars (allowing you to store values up to 255).
template<class T = uint8_t> Array2D<T>
Then, mPixData itself is an array of four of these, which implies that at each co-ordinate you have four values (bytes) to contend with: 3 for the colours (let's say RGB, but it could be something else) and 1 for alpha.
The "image" is basically this three dimensional array. Presumably when loading stuff into it, it resizes to the input - what you need to do is to find some form of resizing algorithm (not an image processing expert myself, but am sure google will reveal something), which will then allow you to take this data and do what you need...
1) The 3 component channels are the Red, Green and Blue channels. The alpha channel describes the image's transparency.
2) There are many algorithms you can use to resize the image. The simplest would be to discard extra pixels. Another simple one is interpolation.
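A minimal sketch of the "discard extra pixels" approach (nearest-neighbour sampling) for one channel. Since the Array2D interface is unknown, this assumes the channel has been copied into a flat row-major std::vector. The function name is made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Resizes one 8-bit channel by picking, for each destination pixel, the
// source pixel at the proportionally scaled position (nearest neighbour).
// src holds sw*sh values in row-major order; the result holds dw*dh values.
std::vector<std::uint8_t> resizeNearest(const std::vector<std::uint8_t>& src,
                                        std::size_t sw, std::size_t sh,
                                        std::size_t dw, std::size_t dh) {
    std::vector<std::uint8_t> dst(dw * dh);
    if (sw == 0 || sh == 0) return dst;  // nothing to sample from
    for (std::size_t y = 0; y < dh; ++y) {
        const std::size_t sy = y * sh / dh;
        for (std::size_t x = 0; x < dw; ++x) {
            const std::size_t sx = x * sw / dw;
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
    return dst;
}
```

You would run it once per channel, so four times for an RGBA image.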
The 3 component channels represent the Red Green Blue (aka RGB) channels. The 4th channel, ALPHA, is the transparency channel.
A pixel is defined by mPixData[4]
mPixData[0] -> R
mPixData[1] -> G
mPixData[2] -> B
mPixData[3] -> A
Therefore, an image can be represented as a vector or array of mPixData[4]. As you already stated, in this case it is Array2D<T> mPixData[4];
Resizing, rescaling or resampling an image is not a trivial process. There is a lot of material available on the web about it, and I think you should consider using a library to do this. Check CxImage (Windows/Linux).
There is some code here, but I haven't tested it. Check the resample() function.
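To illustrate the layout: each of the four array entries holds one whole channel, and a pixel is assembled by reading the same coordinate from all four. The Plane type below is a hypothetical stand-in for Array2D<uint8_t>, whose real interface is not shown in the question.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Array2D<uint8_t>: one channel, row-major.
struct Plane {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> values;
    std::uint8_t at(std::size_t x, std::size_t y) const {
        return values[y * width + x];
    }
};

struct Rgba {
    std::uint8_t r, g, b, a;
};

// Gathers one pixel from the four channel planes (index 0..3 = R, G, B, A).
Rgba pixelAt(const std::array<Plane, 4>& planes, std::size_t x, std::size_t y) {
    return Rgba{planes[0].at(x, y), planes[1].at(x, y),
                planes[2].at(x, y), planes[3].at(x, y)};
}
```

With this layout, a resize can process each plane independently and leave the other three untouched until their turn.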
Hi, the channels are RGB plus alpha: the red, green and blue channels and the alpha channel. There are several methods for downscaling. You could, for example, take every 4th pixel, but the result would look quite bad. Take a look at different interpolation methods, e.g. http://en.wikipedia.org/wiki/Bilinear_interpolation.
Or if you want to use a library use: http://www.imagemagick.org/Magick++/
or as mentioned by karlphillip:
http://www.xdp.it/cximage.htm
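If you would rather not pull in a library, bilinear interpolation for a single channel can be sketched as below. It assumes the channel is available as a flat row-major std::vector and maps the corner pixels of the source onto the corner pixels of the destination. The function name is made up, and a library will generally give better quality, especially when shrinking by a large factor.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resizes one 8-bit channel with bilinear interpolation.
// src holds sw*sh values in row-major order; the result holds dw*dh values.
std::vector<std::uint8_t> resizeBilinear(const std::vector<std::uint8_t>& src,
                                         std::size_t sw, std::size_t sh,
                                         std::size_t dw, std::size_t dh) {
    std::vector<std::uint8_t> dst(dw * dh);
    if (sw == 0 || sh == 0) return dst;  // nothing to sample from
    for (std::size_t y = 0; y < dh; ++y) {
        // Fractional source row for this destination row.
        const double fy = dh > 1 ? static_cast<double>(y) * (sh - 1) / (dh - 1) : 0.0;
        const std::size_t y0 = static_cast<std::size_t>(fy);
        const std::size_t y1 = std::min(y0 + 1, sh - 1);
        const double wy = fy - static_cast<double>(y0);
        for (std::size_t x = 0; x < dw; ++x) {
            const double fx = dw > 1 ? static_cast<double>(x) * (sw - 1) / (dw - 1) : 0.0;
            const std::size_t x0 = static_cast<std::size_t>(fx);
            const std::size_t x1 = std::min(x0 + 1, sw - 1);
            const double wx = fx - static_cast<double>(x0);
            // Blend horizontally on the two rows, then vertically.
            const double top = src[y0 * sw + x0] * (1.0 - wx) + src[y0 * sw + x1] * wx;
            const double bottom = src[y1 * sw + x0] * (1.0 - wx) + src[y1 * sw + x1] * wx;
            dst[y * dw + x] = static_cast<std::uint8_t>(top * (1.0 - wy) + bottom * wy + 0.5);
        }
    }
    return dst;
}
```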