How to flip data from frame buffer - opengl

I am attaching a PBO to the OpenGL framebuffer and then using glMapBuffer() to get access to the data.
I am passing the data to a Bluefish card for SDI Output.
The issue is that the resultant output appears inverted.
How can I invert the y axis of the data pointed to by the PBO pointer?
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[writeIndex]);
// copy from framebuffer to PBO asynchronously. it will be ready in the NEXT frame
glReadPixels(0, 0, SCR_WIDTH, SCR_HEIGHT, GL_RGB, GL_UNSIGNED_BYTE, nullptr);
// now read other PBO which should be already in CPU memory
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[readIndex]);
// map buffer so we can access it
void* downsampleData = (unsigned char *)glMapBuffer(GL_PIXEL_PACK_BUFFER,GL_READ_ONLY);
This is how I am trying to flip the data after being advised by Nicol, and I get the desired result.
unsigned char OriginalData[width * height * 4];
unsigned char FlippedData[width * height * 4];
memcpy( OriginalData , downsampleData , sizeof( OriginalData) ); // copy data from the pointer.
// Walk the source backwards so the last byte becomes the first
for( int i = sizeof( OriginalData) - 1, k = 0; i >= 0 ; i--, k++ )
    FlippedData[k] = OriginalData[i];

You can't. OpenGL always considers the first row to be the bottom row of the image data for any image operation (sending/receiving pixel blocks, fetching texture samples/image data in a shader, etc). So if you want to invert the data you get, you will have to do that manually by copying the data around.
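A minimal sketch of the manual copy described above, assuming tightly packed rows (for example GL_PACK_ALIGNMENT set to 1, or a row size that is already a multiple of the pack alignment). The name flipRowsVertically and its parameters are hypothetical and not part of OpenGL. It swaps whole rows, so pixels within a row keep their order and channel layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: flips an image vertically in place by swapping rows.
// rowBytes is the size of one row in bytes (width * bytesPerPixel when the
// rows are tightly packed).
void flipRowsVertically(unsigned char* pixels, std::size_t rowBytes, std::size_t height)
{
    assert(pixels != nullptr || height == 0);
    for (std::size_t y = 0; y < height / 2; ++y) {
        unsigned char* top = pixels + y * rowBytes;
        unsigned char* bottom = pixels + (height - 1 - y) * rowBytes;
        // Exchange the two rows byte by byte; the middle row of an
        // odd-height image is left where it is.
        std::swap_ranges(top, top + rowBytes, bottom);
    }
}
```

With the mapped PBO from the question, the bytes would first need to be copied into your own buffer, since the buffer is mapped GL_READ_ONLY and should not be written through the mapped pointer.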


How to use glImportMemoryWin32HandleEXT to share an ID3D11Texture2D KeyedMutex Shared handle with OpenGL?

I am investigating how to do cross-process interop with OpenGL and Direct3D 11 using the EXT_external_objects, EXT_external_objects_win32 and EXT_win32_keyed_mutex OpenGL extensions. My goal is to share a B8G8R8A8_UNORM texture with 1 mip level, allocated and written to offscreen with D3D11 by one application, and render it with OpenGL in another. (An external library expects BGRA and I cannot change it; what's relevant here is that each pixel is 4 bytes.) Because the texture is being drawn to in another process, I cannot use WGL_NV_DX_interop2.
My actual code can be seen here and is written in C# with Silk.NET. For illustration purposes, though, I will describe my problem in pseudo-C(++).
First I create my texture in Process A with D3D11, and obtain a shared handle to it, and send it over to process B.
#define WIDTH 100
#define HEIGHT 100
#define BPP 4 // BGRA8 is 4 bytes per pixel
ID3D11Texture2D *texture;
D3D11_TEXTURE2D_DESC texDesc = {
    .Width = WIDTH,
    .Height = HEIGHT,
    .MipLevels = 1,
    .ArraySize = 1,
    .SampleDesc = { .Count = 1, .Quality = 0 },
    .CPUAccessFlags = 0,
};
device->CreateTexture2D(&texDesc, NULL, &texture);
HANDLE sharedHandle;
texture->CreateSharedHandle(NULL, DXGI_SHARED_RESOURCE_READ, NULL, &sharedHandle);
SendToProcessB(sharedHandle, pid);
In Process B, I first duplicate the handle to get one that's process-local.
HANDLE localSharedHandle;
HANDLE hProcA = OpenProcess(PROCESS_DUP_HANDLE, false, processAPID);
DuplicateHandle(hProcA, sharedHandle, GetCurrentProcess(), &localSharedHandle, 0, false, DUPLICATE_SAME_ACCESS);
At this point, I have a valid shared handle to the DXGI resource in localSharedHandle. I have a D3D11 implementation of ProcessB that is able to successfully render the shared texture after opening with OpenSharedResource1. My issue here is OpenGL however.
This is what I am currently doing for OpenGL
GLuint sharedTexture, memObj;
glCreateTextures(GL_TEXTURE_2D, 1, &sharedTexture);
glTextureParameteri(sharedTexture, GL_TEXTURE_TILING_EXT, GL_OPTIMAL_TILING_EXT); // D3D11 side is D3D11_TEXTURE_LAYOUT_UNDEFINED
// Create the memory object handle
glCreateMemoryObjectsEXT(1, &memObj);
// I am not actually sure what the size parameter here is referring to.
// Since the source texture is DX11, there's no way to get the allocation size,
// I make a guess of W * H * BPP
// According to docs for VkExternalMemoryHandleTypeFlagBitsNV, NtHandle Shared Resources use HANDLE_TYPE_D3D11_IMAGE_EXT
glImportMemoryWin32HandleEXT(memObj, WIDTH * HEIGHT * BPP, GL_HANDLE_TYPE_D3D11_IMAGE_EXT, (void*)localSharedHandle);
Checking for errors along the way seems to indicate the import was successful. However I am not able to bind the texture.
if (glAcquireKeyedMutexWin32EXT(memObj, 0, (UINT)-1)) {
    glTextureStorageMem2D(sharedTexture, 1, GL_RGBA8, WIDTH, HEIGHT, memObj, 0);
    glReleaseKeyedMutexWin32EXT(memObj, 0);
}
What goes wrong is the call to glTextureStorageMem2D. The shared KeyedMutex is being properly acquired and released. The extension documentation is unclear as to how I'm supposed to properly bind this texture and draw it.
After some more debugging, I managed to get the following message from the debug context: "[DebugSeverityHigh] DebugSourceApi: DebugTypeError, id: 1281: GL_INVALID_VALUE error generated. Memory object too small". By dividing my width in half I was able to get some garbled output on the screen.
It turns out the size needed to import the texture was not WIDTH * HEIGHT * BPP, (where BPP = 4 for BGRA in this case), but WIDTH * HEIGHT * BPP * 2. Importing the handle with size WIDTH * HEIGHT * BPP * 2 allows the texture to properly bind and render correctly.

Ring buffered SSBO with compute shader

I am performing view frustum culling and generating draw commands on the GPU in a compute shader, and I want to pass the bounding volumes in an SSBO. Currently I am using just a large uniform array, but I want to go bigger, hence the need to move to an SSBO.
What I want to accomplish is something akin to the AZDO approach of triple buffering: to avoid sync issues when updating the SSBO, only one third of the buffer is updated at a time while the rest is guarded with fences.
Is this possible to combine with the compute shader dispatch or should I just create three different SSBOs and then bind each of them accordingly?
The solution as I currently see it would be to somehow tell the following drawcall to only fetch data in the SSBO from a certain offset (0 * buffer_size, 1 * buffer_size, etc). Is this even possible?
Render loop
/* Fence creation omitted for clarity */
// Cycle round updating different parts of the buffer
const uint32_t buffer_idx = (frame % gl_state.bvb_num_partitions);
uint8_t* ptr = (uint8_t*)gl_state.bvbp + buffer_idx * gl_state.bvb_buffer_size;
std::memcpy(ptr,, gl_state.bvb_buffer_size);
const uint32_t gl_bv_binding_point = 3; // Shader hard coded
const uint32_t offset = buffer_idx * gl_state.bvb_buffer_size;
glBindBufferRange(GL_SHADER_STORAGE_BUFFER, gl_bv_binding_point, gl_state.bvb, offset, gl_state.bvb_buffer_size);
// OLD WAY: glUniform4fv(glGetUniformLocation(gl_state.cull_shader.gl_program, "spheres"), NUM_OBJECTS, &bounding_volumes[0].pos.x);
glUniform4fv(glGetUniformLocation(gl_state.cull_shader.gl_program, "frustum_planes"), 6, glm::value_ptr(frustum[0]));
glDispatchCompute(NUM_OBJECTS, 1, 1);
glMemoryBarrier(GL_COMMAND_BARRIER_BIT | GL_SHADER_STORAGE_BARRIER_BIT); // Buffer objects affected by this bit are derived from the GL_DRAW_INDIRECT_BUFFER binding.
Bounding volume SSBO creation
// Bounding volume buffer
glGenBuffers(1, &gl_state.bvb);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, gl_state.bvb);
gl_state.bvb_buffer_size = NUM_OBJECTS * sizeof(BoundingVolume);
gl_state.bvb_num_partitions = 3; // 1 for application, 1 for OpenGL driver, 1 for GPU
glBufferStorage(GL_SHADER_STORAGE_BUFFER, gl_state.bvb_num_partitions * gl_state.bvb_buffer_size, nullptr, flags);
gl_state.bvbp = glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, gl_state.bvb_buffer_size * gl_state.bvb_num_partitions, flags);
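
One detail worth checking with the offset approach: glBindBufferRange requires the offset for GL_SHADER_STORAGE_BUFFER to be a multiple of GL_SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT, so buffer_idx * bvb_buffer_size is only valid when the partition size happens to be aligned. The following is a sketch under that assumption, not a definitive implementation. RingPartition, alignUp and partitionForFrame are hypothetical names, and offsetAlignment is assumed to hold the value queried with glGetIntegerv.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of one partition of a ring-buffered SSBO.
struct RingPartition {
    std::size_t offset; // byte offset to pass to glBindBufferRange
    std::size_t size;   // byte size of the range visible to the shader
};

// Rounds value up to the next multiple of alignment.
std::size_t alignUp(std::size_t value, std::size_t alignment)
{
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Returns the partition used by a given frame. Partitions are spaced by the
// aligned stride, so every offset satisfies the binding alignment rule.
RingPartition partitionForFrame(std::uint64_t frame, std::size_t numPartitions,
                                std::size_t partitionSize, std::size_t offsetAlignment)
{
    assert(numPartitions > 0);
    const std::size_t stride = alignUp(partitionSize, offsetAlignment);
    const std::size_t index = static_cast<std::size_t>(frame % numPartitions);
    return RingPartition{index * stride, partitionSize};
}
```

With this layout the buffer would be allocated with numPartitions * alignUp(partitionSize, offsetAlignment) bytes, and both the memcpy destination and the glBindBufferRange offset would use the returned offset.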

How to use a .raw file in opengl

I'm trying to read a .raw image format and do some modifications on it in OpenGL. I can read the image like this:
int width, height;
BYTE * data;
FILE * file;
file = fopen( filename, "rb" );
if ( file == NULL ) return 0;
width = 256;
height = 256;
data = malloc( width * height * 3 );
fread( data, width * height * 3, 1, file );
fclose( file );
But I don't know how to use glDrawPixels to draw the picture.
My second problem is that I don't know how to access each pixel. In a .raw image, each pixel should have 3 integers storing its RGB values (am I right?). How can I access these RGB values directly?
There's no such thing as a .raw in the hard and fast sense. The name implies image data with no header but doesn't specify the format of the data. RGB is likely but so is RGBA and it's trivial to think of almost endless other possibilities.
Assuming RGB ordering, one byte per channel, then: each pixel is three bytes wide. So the nth pixel is:
r = data[n*3 + 0]
g = data[n*3 + 1]
b = data[n*3 + 2]
Assuming the data is set out so that the pixels are stored in left-to-right order, line by line, then on the first line the pixel at x=3 is at n=3, on the second it's at n=(width of first line)+3, on the third it's at n=(combined width of first two lines)+3, etc.
r = data[(x + y*width)*3 + 0]
g = data[(x + y*width)*3 + 1]
b = data[(x + y*width)*3 + 2]
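The indexing above can be wrapped in a small accessor. This is a sketch under the same assumptions as the answer (RGB order, one byte per channel, rows stored left to right with no padding); the Rgb struct and the pixelAt name are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical pixel type: one byte per channel, RGB order.
struct Rgb {
    std::uint8_t r, g, b;
};

// Reads the pixel at column x, row y from tightly packed 24-bit RGB data.
Rgb pixelAt(const std::uint8_t* data, std::size_t width, std::size_t x, std::size_t y)
{
    assert(data != nullptr && x < width);
    const std::size_t i = (x + y * width) * 3; // index of the red byte
    return Rgb{data[i], data[i + 1], data[i + 2]};
}
```

For the 256x256 image in the question, pixelAt(data, 256, 3, 1) would read the fourth pixel of the second row.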
To use glDrawPixels just follow what the manual tells you to specify as the parameters. It says:
void glDrawPixels( GLsizei width,
GLsizei height,
GLenum format,
GLenum type,
const GLvoid * data);
You say that width and height are 256. You've said that the format is RGB. Scan down the documentation and you'll see that the corresponding GLenum is GL_RGB. You're saying each channel is a single byte in size. So that's GL_UNSIGNED_BYTE. You've loaded the data to data. So:
glDrawPixels(256, 256, GL_RGB, GL_UNSIGNED_BYTE, data);
Further comments: obviously get this working first so you've something to build on but glDrawPixels is almost unused in practice. As a result it isn't even part of OpenGL ES or, correspondingly, WebGL. Look at the semantics of the thing. You supply your buffer every time you call. OpenGL can't know whether it has been modified since the last call. So every call transfers your data from CPU to GPU. Look into submitting your data once as a texture and drawing using geometry. That'll save the per-call transfer cost and therefore be a lot more efficient.

OpenGL: Shader storage buffer mapping/binding

I'm currently working on a program which supports depth-independent (also known as order-independent) alpha blending. To do that, I implemented a per-pixel linked list, using a texture for the header (which points, for every pixel, to the first entry in the linked list) and a texture buffer object for the linked list itself. While this works fine, I would like to replace the texture buffer object with a shader storage buffer as an exercise.
I think I almost got it, but it took me about a week to get to a point where I could actually use the shader storage buffer. My questions are:
Why I can't map the shader storage buffer?
Why is it a problem to bind the shader storage buffer again?
For debugging, I just display the contents of the shader storage buffer (which doesn't contain a linked list yet). I created the shader storage buffer in the following way:
glm::vec4* bufferData = new glm::vec4[windowOptions.width * windowOptions.height];
glm::vec4* readBufferData = new glm::vec4[windowOptions.width * windowOptions.height];
for(unsigned int y = 0; y < windowOptions.height; ++y)
for(unsigned int x = 0; x < windowOptions.width; ++x)
// Set the whole buffer to red
bufferData[x + y * windowOptions.width] = glm::vec4(1,0,0,1);
GLuint ssb;
// Get a handle
glGenBuffers(1, &ssb);
// Create buffer
glBufferData(GL_SHADER_STORAGE_BUFFER, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData, GL_DYNAMIC_COPY);
// Now bind the buffer to the shader
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
In the shader, the shader storage buffer is defined as:
layout (std430, binding = 0) buffer BufferObject
{
    vec4 points[];
};
In the rendering loop, I do the following:
for(unsigned int y = 0; y < windowOptions.height; ++y)
for(unsigned int x = 0; x < windowOptions.width; ++x)
// Create a green/red color gradient
bufferData[x + y * windowOptions.width] =
glm::vec4((float)x / (float)windowOptions.width,
(float)y / (float)windowOptions.height, 0.0f, 1.0f);
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
glBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), bufferData);
// Retrieving the buffer also works fine
// glMemoryBarrier(GL_ALL_BARRIER_BITS);
// glGetBufferSubData(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height * sizeof(glm::vec4), readBufferData);
glMemoryBarrier(GL_ALL_BARRIER_BITS); // Don't know if this is necessary, just a precaution
// Draw a quad which fills the screen
// ...
This code works, but when I replace glBufferSubData with the following code,
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, windowOptions.width * windowOptions.height, GL_WRITE_ONLY);
for(unsigned int x = 0; x < windowOptions.width; ++x)
for(unsigned int y = 0; y < windowOptions.height; ++y)
p[x + y * windowOptions.width] = glm::vec4(0,1,0,1);
the mapping fails with GL_INVALID_OPERATION. It seems like the shader storage buffer is still bound to something, so it can't be mapped. I read something about glGetProgramResourceIndex and glShaderStorageBlockBinding, but I don't really get it.
My second question is, why I can neither call
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssb);
, nor
in the render loop after glBufferSubData and glMemoryBarrier. This code should not change a thing, since these calls are the same as during the creation of the shader storage buffer. If I can't bind different shader storage buffers, I can only use one. But I know that more than one shader storage buffer is supported, so I think I'm missing something else (like "releasing" the buffer).
First of all, the glMapBufferRange fails simply because GL_WRITE_ONLY is not a valid argument to it. That was used for the old glMapBuffer, but glMapBufferRange uses a collection of flags for more fine-grained control. In your case you need GL_MAP_WRITE_BIT instead. And since you seem to completely overwrite the whole buffer, without caring for the previous values, an additional optimization would probably be GL_MAP_INVALIDATE_BUFFER_BIT. So replace that call with:
glm::vec4* p = (glm::vec4*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0,
    windowOptions.width * windowOptions.height * sizeof(glm::vec4), // length is given in bytes
    GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
The other error is not described that well in the question. But fix this one first and maybe it will already help with the following error.

Write BITMAPINFOHEADER image data to IDirect3DTexture9

I'm writing a DX9 renderer and currently working on the ability to play AVI movie files. I've been able to retrieve any specified frame using AVIStreamGetFrame(), which returns a packed DIB, and from there I want to be able to copy that bitmap data to an already existing IDirect3DTexture9 *.
My issue is a lack of understanding of the bitmap file format and knowing how to convert the pixel data given from a BITMAPINFOHEADER to a format that IDirect3DTexture9 can interpret.
I first create my DX9 texture like this:
LPBITMAPINFO bmpInfo = m_pVideoData->GetVideoFormat();
Questions I have here are listed as comments above. When I get the BITMAPINFO and for instance it reads bmpInfo.bmiHeader.biBitCount = 8 (or 16, etc.) does this mean I need to change the D3DFMT_* accordingly?
Later on when I get a LPBITMAPINFOHEADER for the frame I want to render, I'm lost on what to do with pBits returned from the IDirect3DTexture9::LockRect() function. Here is what I have so far:
// Retrieve a frame from the video data as a BITMAPINFOHEADER
LPBITMAPINFOHEADER pBmpInfoHeader;
m_pVideoData->GetVideoFrame(0, 0, &pBmpInfoHeader);
D3DLOCKED_RECT rect;
if(SUCCEEDED(m_pD3DTexture->LockRect(0, &rect, NULL, 0)))
{
    DWORD* pDest = (DWORD*)rect.pBits;
    // Now what to copy from pBmpInfoHeader?
    m_pD3DTexture->UnlockRect(0);
}
Are there any API calls that do this for me that I haven't seen? Or does anyone know of an easier way than this? Thanks for reading/helping.
Got it to work!
A couple of notes to consider. My AVI file (a single frame/bitmap in this case) was in a 16-bit format, so I had to create my destination texture with D3DFMT_X1R5G5B5. Also, bitmaps are stored upside down, so I had to start my pointer at the end of the data and copy the rows in reverse order.
Here's the code:
// Retrieve a frame from the video data as a BITMAPINFOHEADER
LPBITMAPINFOHEADER pBmpInfoHeader;
m_pVideoData->GetVideoFrame(0, 0, &pBmpInfoHeader);
// Get dimensions
long nWidth = pBmpInfoHeader->biWidth;
long nHeight = pBmpInfoHeader->biHeight;
WORD bitCount = pBmpInfoHeader->biBitCount;
// Each row of an uncompressed DIB is padded to a multiple of 4 bytes
int iNumBytesPerRowSrc = ((nWidth * bitCount + 31) / 32) * 4;
// Get pixel data (it follows the header in memory; biSize is the header size,
// and a 16-bit uncompressed bitmap has no color table after it)
BYTE *pPixelSrc = (BYTE *)pBmpInfoHeader + pBmpInfoHeader->biSize;
// Lock the texture so we can write this frame's texel data
D3DLOCKED_RECT lock;
if(SUCCEEDED(m_pD3DTexture->LockRect(0, &lock, NULL, 0)))
{
    int iNumBytesPerRowDst = lock.Pitch;
    int iNumBytesToCopyPerRow = min(iNumBytesPerRowSrc, iNumBytesPerRowDst);
    // Bitmap data is stored upside down
    // Start at the end and work backwards
    pPixelSrc += (iNumBytesPerRowSrc * nHeight);
    // Store a pointer to the texture pixel data and write new data
    BYTE* ucTexDst = (BYTE *)lock.pBits;
    for(int y = 0; y < nHeight; ++y)
    {
        pPixelSrc -= iNumBytesPerRowSrc;
        memcpy(ucTexDst, pPixelSrc, iNumBytesToCopyPerRow);
        ucTexDst += iNumBytesPerRowDst;
    }
    // Unlock texture so gfx card can resume its business
    m_pD3DTexture->UnlockRect(0);
}
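
The row-reversed copy can also be sketched without the Direct3D types, which makes the stride handling easier to check in isolation. This assumes an uncompressed bottom-up DIB, whose rows are padded to a multiple of 4 bytes. The names dibStride and copyBottomUpRows are hypothetical, and dstPitch stands in for the Pitch reported by LockRect.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Bytes per row of an uncompressed DIB: rows are padded to a 4-byte boundary.
std::size_t dibStride(std::size_t widthPixels, std::size_t bitsPerPixel)
{
    return ((widthPixels * bitsPerPixel + 31) / 32) * 4;
}

// Copies a bottom-up source image into a top-down destination that has its
// own row pitch. Only the bytes common to both row sizes are copied.
void copyBottomUpRows(const unsigned char* src, std::size_t srcStride,
                      unsigned char* dst, std::size_t dstPitch, std::size_t height)
{
    assert(src != nullptr && dst != nullptr);
    const std::size_t bytesPerRow = std::min(srcStride, dstPitch);
    for (std::size_t y = 0; y < height; ++y) {
        // Destination row y comes from source row (height - 1 - y).
        const unsigned char* srcRow = src + (height - 1 - y) * srcStride;
        std::memcpy(dst + y * dstPitch, srcRow, bytesPerRow);
    }
}
```

Indexing the source row from the top of the loop avoids the running pointer arithmetic, at the cost of one multiplication per row.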