glGetTexImage reads too much data with texture format GL_ALPHA - c++

I'm trying to retrieve the pixel information for an alpha-only texture via glGetTexImage.
The problem is, the glGetTexImage-Call seems to read more data than it should, leading to memory corruption and a crash at the delete[]-Call. Here's my code:
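int format;
glGetTexLevelParameteriv(target, level, GL_TEXTURE_INTERNAL_FORMAT, &format);
int w;
int h;
glGetTexLevelParameteriv(target, level, GL_TEXTURE_WIDTH, &w);
glGetTexLevelParameteriv(target, level, GL_TEXTURE_HEIGHT, &h);
if(w == 0 || h == 0)
    return false;
if(format != GL_ALPHA)
    return false;
unsigned int size = w * h * sizeof(unsigned char);
unsigned char *pixels = new unsigned char[size];
glGetTexImage(target, level, format, GL_UNSIGNED_BYTE, pixels);
delete[] pixels;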
glGetError reports no errors, and without the glGetTexImage-Call it doesn't crash.
'target' is GL_TEXTURE_2D (The texture is valid and bound before the shown code), 'w' is 19, 'h' is 24, 'level' is 0.
If I increase the array size to (w *h *100) it doesn't crash either. I know for a fact that GL_UNSIGNED_BYTE has the same size as an unsigned char on my system, so I don't understand what's going on here.
Where's the additional data coming from and how can I make sure that my array is large enough?

Each row written to or read from by OpenGL pixel operations like glGetTexImage is aligned to a 4-byte boundary by default, which may add some padding at the end of the row.
To modify the alignment, use glPixelStorei with the GL_[UN]PACK_ALIGNMENT setting. GL_PACK_ALIGNMENT affects operations that read from OpenGL memory (glReadPixels, glGetTexImage, etc.), while GL_UNPACK_ALIGNMENT affects operations that write to OpenGL memory (glTexImage, etc.).
The alignment can be any of 1 (tightly packed with no padding), 2, 4 (the default), or 8.
So in your case, run glPixelStorei(GL_PACK_ALIGNMENT, 1); before calling glGetTexImage.
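To see where the extra bytes come from, the row padding can be worked out by hand. The following is a minimal sketch, assuming single-byte components (GL_UNSIGNED_BYTE) and no other pack parameters such as GL_PACK_ROW_LENGTH; padded_row_bytes and packed_image_bytes are hypothetical helper names, not OpenGL functions.

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one row once it is padded up to `alignment`
// (1, 2, 4 or 8), the way GL_PACK_ALIGNMENT pads rows of byte data.
// Hypothetical helper, not part of OpenGL.
std::size_t padded_row_bytes(std::size_t width, std::size_t bytes_per_pixel,
                             std::size_t alignment) {
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t raw = width * bytes_per_pixel;
    return (raw + alignment - 1) / alignment * alignment;
}

// A buffer size that is safe for a 2D read-back with the given alignment.
std::size_t packed_image_bytes(std::size_t width, std::size_t height,
                               std::size_t bytes_per_pixel,
                               std::size_t alignment) {
    return padded_row_bytes(width, bytes_per_pixel, alignment) * height;
}
```

For the 19 x 24 GL_ALPHA texture in the question, each 19-byte row is padded to 20 bytes under the default alignment of 4, so 480 bytes are needed while only 456 were allocated.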


What does 'linesize alignment' mean?

I'm following ffmpeg tutorial in
I have just found that the avpicture_get_size function is deprecated.
So I checked FFmpeg's documentation and found its substitute, av_image_get_buffer_size.
But I can't understand the align parameter, which is described as 'linesize alignment'.
What does it mean?
Some parts of FFmpeg, notably libavcodec, require aligned linesizes[], which means that it requires:
assert(linesize[0] % 32 == 0);
assert(linesize[1] % 32 == 0);
assert(linesize[2] % 32 == 0);
This allows it to use fast/aligned SIMD routines (for example SSE2/AVX2 movdqa or vmovdqa instructions) for data access instead of their slower unaligned counterparts.
The align parameter to this av_image_get_buffer_size function is this line alignment, and you need it because the size of the buffer is affected by it. E.g., the size of a Y plane in a YUV buffer isn't actually width * height, it's linesize[0] * height. You'll see that (especially for image sizes that are not a multiple of 16 or 32), as you increase align to higher powers of 2, the return value slowly increases.
Practically speaking, if you're going to use this picture as output buffer for calls to e.g. avcodec_decode_video2, this should be 32. For swscale/avfilter, I believe there is no absolute requirement, but you're recommended to still make it 32.
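The effect of align on the buffer size can be sketched without calling FFmpeg at all. The code below only mirrors the arithmetic for a single 8-bit plane such as Y, under the assumption that the linesize is the width rounded up to a multiple of align; align_up and plane_bytes are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `align`, which must be a
// power of two. Same arithmetic as FFmpeg's FFALIGN macro.
std::size_t align_up(std::size_t value, std::size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

// Size of one 8-bit plane: linesize * height, not width * height.
// Hypothetical helper, not an FFmpeg function.
std::size_t plane_bytes(std::size_t width, std::size_t height,
                        std::size_t align) {
    return align_up(width, align) * height;
}
```

With a width of 100 and a height of 10, the plane takes 1000 bytes at align 1, 1120 bytes at align 16 and 1280 bytes at align 32, which is the slow growth described above.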
My practice:
1. The avpicture deprecation problem: I replaced the avpicture functions with AVFrame and imgutils functions. Code sample:
//AVPicture _picture;
AVFrame *_pictureFrame;
uint8_t *_pictureFrameData;

//_pictureValid = avpicture_alloc(&_picture,
// _videoCodecCtx->width,
// _videoCodecCtx->height) == 0;
_pictureFrame = av_frame_alloc();
_pictureFrame->width = _videoCodecCtx->width;
_pictureFrame->height = _videoCodecCtx->height;
_pictureFrame->format = AV_PIX_FMT_RGB24;
// last argument is the linesize alignment (see point 2)
int size = av_image_get_buffer_size((enum AVPixelFormat)_pictureFrame->format,
                                    _pictureFrame->width,
                                    _pictureFrame->height,
                                    1);
//dont forget to free _pictureFrameData at last
_pictureFrameData = (uint8_t*)av_malloc(size);

// cleanup, once the picture is no longer needed
if (_pictureFrame) {
    av_frame_free(&_pictureFrame);
}
if (_pictureFrameData) {
    av_free(_pictureFrameData);
    _pictureFrameData = NULL;
}
2. The align parameter
At first I set align to 32, but for some video streams that did not work and produced distorted images. Then I set it to 16 (my environment: Mac, Xcode, iPhone 6), and some streams worked well. In the end I set align to 1, because I found this:
Fill in the AVPicture fields, always assume a linesize alignment of 1.
If you look at the definition of avpicture_get_size in version 3.2 you see the following code:
int avpicture_get_size(enum AVPixelFormat pix_fmt, int width, int height)
{
    return av_image_get_buffer_size(pix_fmt, width, height, 1);
}
It simply calls the suggested function, av_image_get_buffer_size, with the align parameter set to 1. I did not go further to find out why 1 is used for the deprecated function. As usual with ffmpeg, one can probably figure it out by reading the right code, and enough of it, along with some code experiments.

How can I create a 3d texture with negative values and read it from a shader

I have written a volume rendering program that turns some 2d images into a 3d volume that can be rotated around by a user. I need to calculate a normal for each point in the 3d texture (for lighting) by taking the gradient in each direction around the point.
Calculating the normal requires six extra texture accesses within the fragment shader. The program is much faster without these extra texture accesses, so I am trying to precompute the gradients for each direction (x,y,z) as bytes and store them in the GBA channels of the original texture. My bytes seem to contain the right values when I test on the CPU, but when I get to the shader the result looks wrong. It's hard to tell from the shader why it fails; I think it is because some of the gradient values are negative. However, when I specify the texture type as GL_BYTE (as opposed to GL_UNSIGNED_BYTE) it is still wrong, and that also breaks how the original texture should look. I can't tell exactly what's going wrong just by rendering the data as colors. What is the right way to put negative values into a texture? How can I know that values are negative when I read from it in the fragment shader?
The following code shows how I compute the gradients from a byte array (byte[] all) and then wrap it in a byte buffer (ByteBuffer bb) that is read in as a 3d texture. The function 'toLoc(x,y,z,w,h,l)' simply returns (x+w*(y+z*h))*4; it converts 3d subscripts to a 1d index. The image is grayscale, so I discard gba and only use the r channel to hold the original value. The remaining channels (gba) store the gradient.
int pixelDiffxy=5;
int pixelDiffz=1;
int count=0;
Float r=0f;
byte t=r.byteValue();
for(int i=0;i<w;i++){
    for(int j=0;j<h;j++){
        for(int k=0;k<l;k++){
            if(i<pixelDiffxy || i>=w-pixelDiffxy || j<pixelDiffxy || j>=h-pixelDiffxy || k<pixelDiffz || k>=l-pixelDiffz){
                //set these all to zero since they are out of bounds
                all[toLoc(i,j,k,w,h,l)+1]=t;
                all[toLoc(i,j,k,w,h,l)+2]=t;
                all[toLoc(i,j,k,w,h,l)+3]=t;
            }else{
                int ri=(int)all[toLoc(i,j,k,w,h,l)+0] & 0xff;
                //find the values on the sides of this pixel in each direction (use red channel)
                int xgrad1=(all[toLoc(i-pixelDiffxy,j,k,w,h,l)])& 0xff;
                int xgrad2=(all[toLoc(i+pixelDiffxy,j,k,w,h,l)])& 0xff;
                int ygrad1=(all[toLoc(i,j-pixelDiffxy,k,w,h,l)])& 0xff;
                int ygrad2=(all[toLoc(i,j+pixelDiffxy,k,w,h,l)])& 0xff;
                int zgrad1=(all[toLoc(i,j,k-pixelDiffz,w,h,l)])& 0xff;
                int zgrad2=(all[toLoc(i,j,k+pixelDiffz,w,h,l)])& 0xff;
                //find the difference between the values on each side and divide by the distance between them
                int xgrad=(xgrad1-xgrad2)/(2*pixelDiffxy);
                int ygrad=(ygrad1-ygrad2)/(2*pixelDiffxy);
                int zgrad=(zgrad1-zgrad2)/(2*pixelDiffz);
                Vec3f grad=new Vec3f(xgrad,ygrad,zgrad);
                Integer xg=(int) (grad.x);
                Integer yg=(int) (grad.y);
                Integer zg=(int) (grad.z);
                //System.out.println("gs are: "+xg +", "+yg+", "+zg);
                byte gby= (byte) (xg.byteValue());//green channel
                byte bby= (byte) (yg.byteValue());//blue channel
                byte aby= (byte) (zg.byteValue());//alpha channel
                //System.out.println("gba is: "+(int)gby +", "+(int)bby+", "+(int)aby);
                //store the gradient in the gba channels
                all[toLoc(i,j,k,w,h,l)+1]=gby;
                all[toLoc(i,j,k,w,h,l)+2]=bby;
                all[toLoc(i,j,k,w,h,l)+3]=aby;
            }
        }
    }
}
ByteBuffer bb=ByteBuffer.wrap(all);
final GL gl = drawable.getGL();
final GL2 gl2 = gl.getGL2();
final int[] bindLocation = new int[1];
gl.glGenTextures(1, bindLocation, 0);
gl2.glBindTexture(GL2.GL_TEXTURE_3D, bindLocation[0]);
gl2.glPixelStorei(GL.GL_UNPACK_ALIGNMENT, 1);//-byte alignment
gl2.glTexImage3D( GL2.GL_TEXTURE_3D, 0,GL.GL_RGBA,
    w, h, l, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, bb );
Is there a better way to get a large array of signed data into the shader?
gl2.glTexImage3D( GL2.GL_TEXTURE_3D, 0,GL.GL_RGBA,
w, h, l, 0, GL.GL_RGBA, GL.GL_UNSIGNED_BYTE, bb );
Well, there are two ways to go about doing this, depending on how much work you want to do in the shader vs. what OpenGL version you want to limit things to.
The version that requires more shader work also requires a bit more out of your code. See, what you want to do is have your shader take unsigned bytes, then reinterpret them as signed bytes.
The way that this would typically be done is to pass unsigned normalized bytes (as you're doing), which produces floating-point values on the [0, 1] range, then simply expand that range by multiplying by 2 and subtracting 1, yielding numbers on the [-1, 1] range. This means that your uploading code needs to take its [-128, 127] signed bytes and convert them into [0, 255] unsigned bytes by adding 128 to them.
I have no idea how to do this in Java, which does not appear to have an unsigned byte type at all. You can't just pass a 2's complement byte and expect it to work in the shader; that's not going to happen. The byte value -1, for example, has the bit pattern 0xFF and would map to the floating-point value 1, which isn't helpful.
If you can manage to convert the data properly as I described above, then your shader access would have to unpack from the [0, 1] range to the [-1, 1] range.
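The bias-and-unpack scheme described above can be sketched on the CPU side as follows. This is only an illustration of the arithmetic, written in C++ rather than Java or GLSL; bias_encode and shader_unpack are hypothetical names, and shader_unpack stands in for what the fragment shader would compute from the sampled value.

```cpp
#include <cstdint>

// Upload side: shift a signed byte in [-128, 127] into [0, 255]
// so it can be stored in an unsigned normalized texture.
std::uint8_t bias_encode(std::int8_t v) {
    return static_cast<std::uint8_t>(static_cast<int>(v) + 128);
}

// Shader side, modelled on the CPU: the texture fetch yields u / 255
// in [0, 1], which is then expanded to [-1, 1].
float shader_unpack(std::uint8_t u) {
    const float normalized = static_cast<float>(u) / 255.0f;
    return normalized * 2.0f - 1.0f;
}
```

One consequence of this encoding is that a gradient of 0 is stored as 128 and unpacks to roughly 0.004 rather than exactly 0, which is usually acceptable for lighting but worth knowing about.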
If you have access to GL 3.x, then you can do this quite easily, with no shader changes:
gl2.glTexImage3D( GL2.GL_TEXTURE_3D, 0,GL.GL_RGBA8_SNORM,
w, h, l, 0, GL.GL_RGBA, GL.GL_BYTE, bb );
The _SNORM in the image format means that it is a signed, normalized format. So your bytes on the range [-128, 127] will be mapped to floats on the range [-1, 1]. Exactly what you want.
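For reference, the signed normalized mapping can be modelled like this. The sketch assumes the conversion rule used by OpenGL 4.2 and later, where both -128 and -127 map to -1; older 3.x versions specify a slightly different formula, so treat the exact values as an assumption. snorm8_to_float is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

// Model of how an 8-bit signed normalized texel is converted to a
// float when sampled: c / 127, clamped so that -128 also gives -1.
float snorm8_to_float(std::int8_t c) {
    return std::max(static_cast<float>(c) / 127.0f, -1.0f);
}
```

Unlike the bias scheme, a stored 0 comes back as exactly 0.0 here.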

C++ fwrite access violation when writing image file

I need to append an RGB frame to a file on each call.
Here is what I do:
size_t lenght=_viewWidth * _viewHeight * 3;
BYTE *bytes=(BYTE*)malloc(lenght);
/////////////// read pixels from OpenGL tex /////////////////////
///write it to file :
hOutFile = fopen( outFileName.c_str(), cfg.appendMode ? "ab" : "wb" );
fwrite(bytes, 1 ,w * h, hOutFile); // Write
Somehow I am getting an access violation when fwrite gets called. Probably I misunderstood how to use it.
How do you determine _viewWidth and _viewHeight? When reading back a texture, you should query them with glGetTexLevelParameteriv, using the GL_TEXTURE_WIDTH and GL_TEXTURE_HEIGHT parameters on the GL_TEXTURE_2D target.
Also the line
fwrite(bytes, 1 ,w * h, hOutFile);
is wrong. What is w, what is h? They never get initialized in the code and are not connected to the other allocations up there. Also if those are width and height of the image, it still lacks the number of elements of a pixel. Most likely 3.
It would make more sense to have something like
int elements = ...; // probably 3
int w = ...;
int h = ...;
size_t bytes_length = (size_t)w * elements * h;
BYTE *bytes = (BYTE*)malloc(bytes_length);
fwrite(bytes, w*elements, h, hOutFile);
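A slightly more defensive variant of the same idea is sketched below: the write is refused when the buffer is smaller than the size implied by the dimensions, and the return value of fwrite is checked. write_frame is a hypothetical helper, and the frame is assumed to be tightly packed (no row padding).

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Append one tightly packed frame to an already open file.
// Returns true only if all `height` rows were written.
bool write_frame(std::FILE* out, const std::vector<unsigned char>& pixels,
                 std::size_t width, std::size_t height, std::size_t elements) {
    const std::size_t row_bytes = width * elements;
    if (out == nullptr || pixels.size() < row_bytes * height) {
        return false;  // would read past the end of the buffer
    }
    if (row_bytes == 0 || height == 0) {
        return true;   // nothing to write
    }
    // One "item" per row, so the return value counts rows written.
    return std::fwrite(pixels.data(), row_bytes, height, out) == height;
}
```

Keeping the size calculation in one place means the allocation and the write cannot drift apart the way _viewWidth * _viewHeight * 3 and w * h did in the question.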
Is it caused by bytes?
maybe w * h is not what you think it is.
Is the width ever an odd number or not evenly divisible by 4?
By default OpenGL assumes that a row of pixel data is aligned to a four byte boundary. With RGB/BGR this isn't always the case, and if so you'll be writing beyond the malloc'ed block and clobbering something. Try putting
glPixelStorei(GL_PACK_ALIGNMENT, 1);
before reading the pixels and see if the problem goes away.
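If changing the pack alignment is not an option, an alternative is to allocate a buffer with padded rows and strip the padding before writing the file. This is a rough sketch under that assumption; strip_row_padding is a hypothetical helper, and stride is the padded row size in bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Copy `height` rows of `row_bytes` payload each out of a buffer whose
// rows are `stride` bytes apart. Returns an empty vector on bad input.
std::vector<unsigned char> strip_row_padding(
        const std::vector<unsigned char>& padded,
        std::size_t row_bytes, std::size_t stride, std::size_t height) {
    std::vector<unsigned char> tight;
    if (stride < row_bytes || padded.size() < stride * height) {
        return tight;
    }
    tight.resize(row_bytes * height);
    for (std::size_t y = 0; y < height; ++y) {
        // Skip the padding bytes at the end of each source row.
        std::copy_n(padded.begin() + y * stride, row_bytes,
                    tight.begin() + y * row_bytes);
    }
    return tight;
}
```

For an RGB image that is 3 pixels wide, row_bytes would be 9 and stride would be 12 under the default alignment of 4.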

Loading a 3D byte array from a .raw file

As in my previous question, I'm interested in loading a .raw file of a volume dataset into a byte array. I think using a 3D byte array would make things easier when indexing the X,Y,Z coordinates, but I'm not sure about the read size that I should use to load the volume. Would this size declaration allow me to index the volume data correctly?
const int XDIM=256, YDIM=256, ZDIM=256;
const int size = XDIM*YDIM*ZDIM;
bool LoadVolumeFromFile(const char* fileName) {
    FILE *pFile = fopen(fileName,"rb");
    if(NULL == pFile) {
        return false;
    }
    // pointer to YDIM x ZDIM slices, so that pVolume[x][y][z] can be indexed
    GLubyte (*pVolume)[YDIM][ZDIM] = new GLubyte[XDIM][YDIM][ZDIM];
    fread(pVolume,sizeof(GLubyte),size,pFile); // <-is this size ok?
    fclose(pFile);
    return true;
}
From the code you posted, the fread() call appears to be safe, but consider whether a 3D byte array is the best choice of data structure.
I assume you are doing some kind of rendering as you are using GLubyte. And of course to do any rendering you need to access a vertex defined in 3D space. That will lead to:
This will constantly cause your cache to be thrashed. The memory will be laid out with all the xs first, then all the ys, and then all the zs. Thus, each time you jump from an x to a y to a z you may hit a cache miss, which really slows performance.
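One common alternative is a single flat buffer with an explicit index function, so the memory order is visible in the code. The sketch below assumes the .raw file stores voxels with x varying fastest, then y, then z; that order is an assumption and should be checked against the dataset. Volume and its members are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// A volume stored as one contiguous block of bytes.
struct Volume {
    std::size_t xdim, ydim, zdim;
    std::vector<unsigned char> voxels;  // xdim * ydim * zdim bytes

    Volume(std::size_t x, std::size_t y, std::size_t z)
        : xdim(x), ydim(y), zdim(z), voxels(x * y * z) {}

    // Flat offset of voxel (x, y, z), with x varying fastest.
    std::size_t index(std::size_t x, std::size_t y, std::size_t z) const {
        return x + xdim * (y + ydim * z);
    }

    unsigned char& at(std::size_t x, std::size_t y, std::size_t z) {
        return voxels[index(x, y, z)];
    }
};
```

A file could then be read straight into voxels.data() with a single fread of voxels.size() bytes, and loops that iterate x in the innermost position walk memory sequentially.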

C++ memcpy and happy access violation

For some reason I can't figure out, I am getting an access violation.
memcpy_s (buffer, bytes_per_line * height, image, bytes_per_line * height);
This is whole function:
int Flip_Bitmap(UCHAR *image, int bytes_per_line, int height)
{
    // this function is used to flip bottom-up .BMP images
    UCHAR *buffer; // used to perform the image processing
    int index; // looping index
    // allocate the temporary buffer
    if (!(buffer = (UCHAR *) malloc (bytes_per_line * height)))
        return(0);
    // copy image to work area
    //memcpy(buffer, image, bytes_per_line * height);
    memcpy_s (buffer, bytes_per_line * height, image, bytes_per_line * height);
    // flip vertically
    for (index = 0; index < height; index++)
        memcpy(&image[((height - 1) - index) * bytes_per_line], &buffer[index * bytes_per_line], bytes_per_line);
    // release the memory
    free(buffer);
    // return success
    return(1);
} // end Flip_Bitmap
Whole code:
To run this you'll need a 24-bit bitmap in your source directory.
This is part of a larger program; I am trying to get the Load_Bitmap_File function to work...
So, any ideas?
You're getting an access violation because a lot of image programs don't set biSizeImage properly. The image you're using probably has biSizeImage set to 0, so you're not allocating any memory for the image data (in reality, you're probably allocating 4-16 bytes, since most malloc implementations will return a non-NULL value even when the requested allocation size is 0). So, when you go to copy the data, you're reading past the ends of that array, which results in the access violation.
Ignore the biSizeImage parameter and compute the image size yourself. Keep in mind that the size of each scan line must be a multiple of 4 bytes, so you need to round up:
// Pseudocode
#define ROUNDUP(value, power_of_2) (((value) + (power_of_2) - 1) & (~((power_of_2) - 1)))
bytes_per_line = ROUNDUP(width * bits_per_pixel/8, 4)
image_size = bytes_per_line * height;
Then just use the same image size for reading in the image data and for flipping it.
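The rounding above can also be written so that it works for bit depths below 8. This sketch uses the usual BMP stride formula, which rounds the row up to a whole number of 32-bit words; bmp_stride and bmp_image_size are hypothetical names, and height is assumed to be the absolute value of biHeight.

```cpp
#include <cstddef>

// Bytes per scan line of an uncompressed BMP: the row's bits are
// rounded up to a multiple of 32, then converted to bytes.
std::size_t bmp_stride(std::size_t width, std::size_t bits_per_pixel) {
    return ((width * bits_per_pixel + 31) / 32) * 4;
}

// Total size of the pixel data, to use instead of biSizeImage.
std::size_t bmp_image_size(std::size_t width, std::size_t height,
                           std::size_t bits_per_pixel) {
    return bmp_stride(width, bits_per_pixel) * height;
}
```

For a 24-bit image that is 19 pixels wide, the raw row is 57 bytes and the stride is 60, so each row carries 3 bytes of padding.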
As the comments have said, the image data is not necessarily width*height*bytes_per_pixel.
Memory access is generally faster on 32-bit boundaries, and when dealing with images speed generally matters. Because of this, the rows of an image are often padded so that each row starts on a 4-byte (32-bit) boundary.
If the image pixels are 32-bit (i.e. RGBA) this isn't a problem, but if you have 3 bytes per pixel (24-bit colour), then for image widths where the number of columns * 3 isn't a multiple of 4, extra blank bytes will be inserted at the end of each row.
The image format probably has a "stride" width or elemsize value to tell you this.
You allocate bitmap->bitmapinfoheader.biSizeImage for image but proceed to copy bitmap->bitmapinfoheader.biWidth * (bitmap->bitmapinfoheader.biBitCount / 8) * bitmap->bitmapinfoheader.biHeight bytes of data. I bet the two numbers aren't the same.