I have software that intercepts the calls made by OpenGL applications (and in some cases performs other tasks). For some reason, when I intercept a particular application, this call:
glTexImage2D(target= GL_TEXTURE_2D, level= 0, internalformat= 4, width= 256, height= 2, border= 0, format= GL_RGBA, type= GL_UNSIGNED_BYTE, pixels= 0)
causes GL_INVALID_OPERATION. However, when I run it without interception, with just debug logging, there is no error. I've read the documentation for this function, and it seems to suggest not only that this particular error is impossible given these arguments, but also that it can only be caused by the arguments themselves, not by state-based issues.
Also, note that I've confirmed that the error does in fact occur only after the call to glTexImage2D().
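For reference, a minimal sketch of the kind of wrapper used to confirm that, assuming the interceptor forwards to the real entry point through a function pointer (next_glTexImage2D and the logging are hypothetical placeholders, not the real code):
#include <stdio.h>
/* GL headers / loader are assumed to be set up by the interception layer. */

static void intercepted_glTexImage2D(GLenum target, GLint level, GLint internalformat,
                                     GLsizei width, GLsizei height, GLint border,
                                     GLenum format, GLenum type, const GLvoid *pixels)
{
    /* Drain any error left over from earlier calls so it is not misattributed. */
    while (glGetError() != GL_NO_ERROR)
        ;

    /* Record state that changes how 'pixels' is interpreted, for the log. */
    GLint unpackBuffer = 0;
    glGetIntegerv(GL_PIXEL_UNPACK_BUFFER_BINDING, &unpackBuffer);

    next_glTexImage2D(target, level, internalformat, width, height,
                      border, format, type, pixels);   /* hypothetical forwarder */

    GLenum err = glGetError();
    fprintf(stderr, "glTexImage2D -> 0x%04X (GL_PIXEL_UNPACK_BUFFER binding: %d)\n",
            (unsigned)err, unpackBuffer);
}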
So what's going on here?
Related
I am uploading image data into a GL texture asynchronously.
In the debug output I am getting these warnings during rendering:
Source: OpenGL, type: Other, id: 131185, severity: Notification
Message: Buffer detailed info: Buffer object 1 (bound to GL_PIXEL_UNPACK_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) has been mapped WRITE_ONLY in SYSTEM HEAP memory (fast).

Source: OpenGL, type: Performance, id: 131154, severity: Medium
Message: Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.
I can't see any wrong usage of PBOs in my case, and there are no errors. So the question is: are these warnings safe to ignore, or am I actually doing something wrong?
My code for that part:
// start copying pixels into the PBO from RAM:
mPBOs[mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
const uint32_t buffSize = pipe->GetBufferSize();
GLubyte* ptr = (GLubyte*)mPBOs[mCurrentPBO].MapRange(0, buffSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr)
{
memcpy(ptr, pipe->GetBuffer(), buffSize);
mPBOs[mCurrentPBO].Unmap();
}
// copy pixels from the other, already filled PBO into the texture (skipped for the first frame):
mPBOs[1 - mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
//mCopyTex is bound to mCopyFBO as attachment
glTextureSubImage2D(mCopyTex->GetHandle(), 0, 0, 0, mClientSize.x, mClientSize.y,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
mCurrentPBO = 1 - mCurrentPBO;
Then I just blit the result to the default framebuffer. No rendering of geometry or anything like that.
glBlitNamedFramebuffer(
    mCopyFBO,                                  // read framebuffer (mCopyTex is its color attachment)
    0,                                         // default FBO id
    0, 0, mViewportSize.x, mViewportSize.y,    // source rectangle
    0, 0, mViewportSize.x, mViewportSize.y,    // destination rectangle
    GL_COLOR_BUFFER_BIT,
    GL_LINEAR);
Running on an NVIDIA GTX 960 card.
This performance warning is NVIDIA-specific and is intended as a hint that you are not going to use a separate hardware transfer queue, which is no wonder since you use a single-thread, single-GL-context model in which both the rendering (at least your blit) and the transfer are issued.
See this NVIDIA presentation for some details about how NVIDIA handles this; page 22 also explains this specific warning. Note that this warning does not mean your transfer is not asynchronous. It is still fully asynchronous with respect to the CPU thread. It will just be processed synchronously on the GPU, relative to the render commands in the same command queue, and you are not using the asynchronous copy engine, which could perform these copies in a separate command queue, independently of the rendering commands.
I can't see any wrong usage of PBOs in my case, and there are no errors. So the question is: are these warnings safe to ignore, or am I actually doing something wrong?
There is nothing wrong with your PBO usage.
It is not clear whether your specific application would even benefit from a more elaborate scheme with a separate transfer queue.
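If you ever did want the copy engine, the usual shape of that scheme is sketched below: the upload runs on its own thread with its own GL context that shares objects with the render context. This is only a rough sketch; CreateSharedContext, MakeCurrent, WaitForNextFrame and PublishFence are hypothetical placeholders (context creation and the hand-off are platform and application specific), and uploadPBO / sharedTexture are assumed to be objects created up front and shared between the contexts.
#include <cstring>   // memcpy; a GL function loader is assumed to be initialized per context

void UploadThreadMain(GLContext uploadContext)       // hypothetical context handle type
{
    MakeCurrent(uploadContext);                       // hypothetical: bind the upload context

    for (;;)
    {
        Frame frame = WaitForNextFrame();             // hypothetical producer of CPU-side pixels

        // Stage the pixels in the PBO, exactly as in the single-threaded version.
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, uploadPBO);
        void* dst = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, frame.size,
                                     GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
        std::memcpy(dst, frame.pixels, frame.size);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

        // Issue the transfer from this context; on NVIDIA this is what allows
        // the driver to route it through the copy engine.
        glTextureSubImage2D(sharedTexture, 0, 0, 0, frame.width, frame.height,
                            GL_RGBA, GL_UNSIGNED_BYTE, nullptr);

        // Fence + flush, then hand the fence to the render thread so it only
        // samples the texture once the transfer has completed.
        GLsync done = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        glFlush();
        PublishFence(done);                           // hypothetical hand-off
    }
}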
I implemented persistent mapped buffers in a renderer I wrote, very much like the one in this tutorial:
persistent-mapped-buffers-in-opengl
To keep it short, it works like this:
glGenBuffers(1, &vboID);
glBindBuffer(GL_ARRAY_BUFFER, vboID);
flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBufferStorage(GL_ARRAY_BUFFER, MY_BUFFER_SIZE, 0, flags);
Mapping (only once after creation...):
flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
myPointer = glMapBufferRange(GL_ARRAY_BUFFER, 0, MY_BUFFER_SIZE, flags);
Update:
// wait for the buffer
glClientWaitSync(Buffer.Sync[Index], GL_SYNC_FLUSH_COMMANDS_BIT, WaitDuration);
// modify underlying data...
Lock the buffer:
Buffer.Sync[Index] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
After I understood the idea, I was able to implement it without further problems, and gained quite a bit of performance from it.
However, due to the game engine I am working with, I can't avoid sometimes getting a lot of draw calls with very small chunks, just a few verts each. When this happens I get visual distortions, which clearly indicates that something goes wrong.
What I tried was adding more waits before and after the buffer update. Just for testing (even though it shouldn't have made any difference), I also tried removing GL_MAP_COHERENT_BIT and using glFlushMappedBufferRange instead, and I tried a single buffer instead of multiple buffers.
To me it looks like the fence isn't working correctly, but I can't see how that could happen.
Shouldn't fencing avoid any trouble like this, even if it would mean sacrificing performance?
The same situation works without any trouble if I use glBufferData instead or, as said already, if the chunks are a few hundred verts each with fewer draw calls.
Any hints as to what could cause this, or how I could get more information about what is failing, would be very helpful.
There are no OpenGL errors at all.
It turned out that the problem was in an entirely different place, and was a combination of things.
For multiple buffers, instead of setting an index in
glDrawArrays(GL_TRIANGLES, 0, VertSize);
it was necessary to set the offset for each buffer in
glVertexAttribPointer(VERTEX_COORD_ATTRIB, 3, GL_FLOAT, GL_FALSE, StrideSize, (void*) BeginOffset);
and the single-buffer distortions came from a missing wait, or rather a wrong "if", when there is only one buffer.
I hope this is useful for someone else running into such a problem, so they know it is not a general issue with persistent mapped buffers.
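For completeness, a minimal sketch of the corrected multi-buffer flow, under the assumptions of the snippets above (myPointer, Buffer.Sync, VertSize, StrideSize, WaitDuration and VERTEX_COORD_ATTRIB come from the question; Index, BufferCount, RegionVerts and vertexData are hypothetical names for the current region, the number of regions, the per-region vertex budget and the source vertex data):
// Each region of the persistently mapped buffer holds RegionVerts vertices.
const GLsizeiptr RegionBytes = (GLsizeiptr)RegionVerts * StrideSize;

// Wait until the GPU has finished with the region we are about to overwrite.
if (Buffer.Sync[Index])
    glClientWaitSync(Buffer.Sync[Index], GL_SYNC_FLUSH_COMMANDS_BIT, WaitDuration);

// Write the new vertices into this region through the persistent mapping.
memcpy((GLubyte*)myPointer + Index * RegionBytes, vertexData,
       (size_t)VertSize * StrideSize);

// The per-region offset goes into glVertexAttribPointer...
glVertexAttribPointer(VERTEX_COORD_ATTRIB, 3, GL_FLOAT, GL_FALSE, StrideSize,
                      (void*)(Index * RegionBytes));

// ...while glDrawArrays always starts at vertex 0 of that region.
glDrawArrays(GL_TRIANGLES, 0, VertSize);

// Fence the region so the next time we come around to it we wait correctly.
glDeleteSync(Buffer.Sync[Index]);
Buffer.Sync[Index] = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

Index = (Index + 1) % BufferCount;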
Trying to figure out what the issue (and error code) is for this call. To preface: this works just fine on AMD; it only has issues on NVIDIA.
unsigned char *buffer;
...
cl_int status;
cl::size_t<3> origin;
cl::size_t<3> region;
origin[0]=0;
origin[1]=0;
origin[2]=0;
region[0]=m_width;
region[1]=m_height;
region[2]=1;
status=clEnqueueWriteImage(m_commandQueue, m_image, CL_FALSE, origin, region, 0, 0, buffer, 0, NULL, NULL);
status returns -1000, which is not a standard OpenCL error code. All the other functions related to opening the device, context, and command queue succeed. The context is interop'ed with OpenGL, and again, this is all completely functional on AMD.
For future reference: it seems the error happens if the image is interop'ed with an OpenGL texture and the call is made before the image is acquired using clEnqueueAcquireGLObjects. I had acquired the image later, when the images were used, but not right before the image was written. AMD's driver does not appear to care about this little detail.
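A sketch of the ordering that seems to keep both drivers happy, assuming m_image is the raw cl_mem handle of the GL-backed image and origin/region/buffer are set up as in the question (the glFinish/clFinish calls are a conservative way to hand ownership back and forth; event-based synchronization would work too):
// Make sure GL has finished touching the texture before CL takes ownership.
glFinish();

// Acquire the GL-backed image for OpenCL use *before* writing to it.
cl_int status = clEnqueueAcquireGLObjects(m_commandQueue, 1, &m_image, 0, NULL, NULL);

status = clEnqueueWriteImage(m_commandQueue, m_image, CL_FALSE,
                             origin, region, 0, 0, buffer, 0, NULL, NULL);

// Hand the image back to GL before it is rendered/sampled again.
status = clEnqueueReleaseGLObjects(m_commandQueue, 1, &m_image, 0, NULL, NULL);
clFinish(m_commandQueue);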
I'm trying to write code that would offload a framebuffer from one card to the other, and I'm wondering whether it's possible to use compression efficiently, since memory bandwidth seems to be the bottleneck in my case.
At the moment, I use simple readback & display routines:
readback:
glWaitSync(...);
glReadPixels(..., GL_BGRA, GL_UNSIGNED_BYTE, NULL);
GLvoid *data = glMapBuffer(GL_PIXEL_PACK_BUFFER_EXT, GL_READ_ONLY);
display:
glGenBuffers(2, pbos);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, pbos[curr_ctx ^ 1]);
glBufferData(GL_PIXEL_UNPACK_BUFFER_EXT, width*height*4, NULL,GL_STREAM_DRAW);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, pbos[curr_ctx]);
glBufferData(GL_PIXEL_UNPACK_BUFFER_EXT, width*height*4, NULL,GL_STREAM_DRAW);
...
glBufferSubData(GL_PIXEL_UNPACK_BUFFER_EXT, 0, width*height*4, data);
glDrawPixels(width, height, GL_BGRA, GL_UNSIGNED_BYTE, NULL);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER_EXT, pbos[curr_ctx ^= 1]);
...
glXSwapBuffers(...);
There is also some synchronization via mutexes and other miscellaneous code in there, but this is the main body of the current code.
Unfortunately, it seems that memory bandwidth is the biggest problem here (on the display card side, which is effectively a sort of capable USB capture card).
Is there any way to optimize this via OpenGL compression (S3TC)?
Preferably, I would like to compress on render card, copy into RAM, and then send it downstream to capture (display) card.
I believe I've seen people do this by copying the framebuffer into a texture and asking GL to compress it, but quite frankly I'm new to GL programming, so I thought I would ask here.
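For what it's worth, the texture-based approach usually looks roughly like the sketch below: copy the framebuffer into a texture with an S3TC internal format so the driver compresses it on the render card, then read the compressed blocks back. This assumes EXT_texture_compression_s3tc is available and that a GL header/loader is already set up; driver-side compression quality and speed vary a lot, so treat it as a starting point rather than a recipe.
#include <vector>

// Returns the DXT1 blocks for the current read framebuffer, or an empty
// vector if the driver refused to compress.
std::vector<unsigned char> ReadbackCompressed(GLuint tex, GLsizei width, GLsizei height)
{
    // Copy the framebuffer into a DXT1-compressed texture; the driver performs
    // the compression on the render card's GPU.
    glBindTexture(GL_TEXTURE_2D, tex);
    glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT1_EXT,
                     0, 0, width, height, 0);

    // Check that the driver really compressed it and ask how big the result is.
    GLint isCompressed = 0, compressedSize = 0;
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED, &isCompressed);
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_COMPRESSED_IMAGE_SIZE,
                             &compressedSize);

    std::vector<unsigned char> blocks;
    if (isCompressed)
    {
        // Read the compressed blocks back to system RAM (a pixel-pack PBO could
        // be bound here as well, just like in the uncompressed readback path).
        blocks.resize(compressedSize);
        glGetCompressedTexImage(GL_TEXTURE_2D, 0, blocks.data());
    }
    // On the display card, upload the blocks with glCompressedTexImage2D and
    // draw a textured quad instead of using glDrawPixels.
    return blocks;
}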
I'm experiencing a difficult problem on certain ATI cards (Radeon X1650, X1550 + and others).
The message is: "Access violation at address 6959DD46 in module 'atioglxx.dll'. Read of address 00000000"
It happens on this line:
glGetTexImage(GL_TEXTURE_2D,0,GL_RGBA,GL_FLOAT,P);
Note:
Latest graphics drivers are installed.
It works perfectly on other cards.
Here is what I've tried so far (with assertions in the code):
That the pointer P is valid and allocated enough memory to hold the image
Texturing is enabled: glIsEnabled(GL_TEXTURE_2D)
Test that the currently bound texture is the one I expect: glGetIntegerv(GL_TEXTURE_BINDING_2D)
Test that the currently bound texture has the dimensions I expect: glGetTexLevelParameteriv( GL_TEXTURE_WIDTH / HEIGHT )
Test that no errors have been reported: glGetError
It passes all those tests and then still fails with the message.
I feel I've tried everything and have no more ideas. I really hope some GL-guru here can help!
EDIT:
After concluding that it is probably a driver bug, I posted about it here too: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=295137#Post295137
I also tried GL_PACK_ALIGNMENT and it didn't help.
With some more investigation I found that it only happened on textures that I had previously filled with pixels using a call to glCopyTexSubImage2D. So I could produce a workaround by replacing the glCopyTexSubImage2D call with calls to glReadPixels and then glTexImage2D instead.
Here is my updated code:
{
glCopyTexSubImage2D cannot be used here because the combination of calling
glCopyTexSubImage2D and then later glGetTexImage on the same texture causes
a crash in atioglxx.dll on ATI Radeon X1650 and X1550.
Instead we copy to the main memory first and then update.
}
// glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, PixelWidth, PixelHeight); //**
GetMem(P, PixelWidth * PixelHeight * 4);
glReadPixels(0, 0, PixelWidth, PixelHeight, GL_RGBA, GL_UNSIGNED_BYTE, P);
SetMemory(P,GL_RGBA,GL_UNSIGNED_BYTE);
You might want to take care of GL_PACK_ALIGNMENT. This parameter tells GL what byte alignment each pixel row is packed to. I.e., if you have an image whose rows are 645 bytes wide:
With GL_PACK_ALIGNMENT at 4 (the default value), each row is padded to 648 bytes.
With GL_PACK_ALIGNMENT at 1, each row stays at 645 bytes.
So ensure that the pack value is right by calling
glPixelStorei(GL_PACK_ALIGNMENT, 1);
before your glGetTexImage(), or lay out your destination buffer to match GL_PACK_ALIGNMENT.
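In other words, the packed row size follows this rule (which is where the 645 vs. 648 figures above come from):
#include <cstddef>

// Row size in bytes after packing, given GL_PACK_ALIGNMENT (1, 2, 4 or 8).
std::size_t PackedRowBytes(std::size_t rowBytes, std::size_t alignment)
{
    return (rowBytes + alignment - 1) / alignment * alignment;
}
// PackedRowBytes(645, 4) == 648, PackedRowBytes(645, 1) == 645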
This is most likely a driver bug. Having written 3D APIs myself, it is easy to see how: you are doing something weird and rare enough that it is unlikely to be covered by tests, namely converting between float client data and an 8-bit internal format during the transfer. Nobody is going to optimize that path; the generic CPU conversion function probably kicks in there, and somebody messed up a table that drives the allocation of temporary buffers for it.
You should reconsider what you are doing in the first place. Pairing an external float format with an internal 8-bit format forces a conversion, and conversions like that in the GL API usually point to programming errors. If your data is float and you want to keep it that way, use a float texture rather than 8-bit RGBA. If you want 8 bit, why is your input float?
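To illustrate that last point, a small sketch of keeping the data float end to end instead of bouncing through an 8-bit internal format (tex, width and height are assumed to exist; GL_RGBA32F requires GL 3.0 or ARB_texture_float):
#include <vector>

std::vector<float> pixels(static_cast<size_t>(width) * height * 4);
// ... fill 'pixels' with the float data ...

// Store the texture as float so no float <-> 8-bit conversion ever happens.
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0,
             GL_RGBA, GL_FLOAT, pixels.data());

// Reading it back as float is then a straight copy, not a conversion.
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_FLOAT, pixels.data());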