I'm experiencing a difficult problem on certain ATI cards (Radeon X1650, X1550, and others).
The message is: "Access violation at address 6959DD46 in module 'atioglxx.dll'. Read of address 00000000"
It happens on this line:
glGetTexImage(GL_TEXTURE_2D,0,GL_RGBA,GL_FLOAT,P);
Note:
Latest graphics drivers are installed.
It works perfectly on other cards.
Here is what I've tried so far (with assertions in the code):
That the pointer P is valid and allocated enough memory to hold the image
Texturing is enabled: glIsEnabled(GL_TEXTURE_2D)
Test that the currently bound texture is the one I expect: glGetIntegerv(GL_TEXTURE_2D_BINDING)
Test that the currently bound texture has the dimensions I expect: glGetTexLevelParameteriv( GL_TEXTURE_WIDTH / HEIGHT )
Test that no errors have been reported: glGetError
It passes all those tests and then still fails with the message.
I feel I've tried everything and have no more ideas. I really hope some GL-guru here can help!
EDIT:
After concluding that it is probably a driver bug, I posted about it here too: http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=295137#Post295137
I also tried GL_PACK_ALIGNMENT and it didn't help.
After some more investigation I found that it only happened on textures that I had previously filled with pixels using a call to glCopyTexSubImage2D. So I could produce a workaround by replacing the glCopyTexSubImage2D call with calls to glReadPixels and then glTexImage2D instead.
Here is my updated code:
{
glCopyTexSubImage2D cannot be used here because the combination of calling
glCopyTexSubImage2D and then later glGetTexImage on the same texture causes
a crash in atioglxx.dll on ATI Radeon X1650 and X1550.
Instead we copy to the main memory first and then update.
}
// glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, PixelWidth, PixelHeight); //**
GetMem(P, PixelWidth * PixelHeight * 4);
glReadPixels(0, 0, PixelWidth, PixelHeight, GL_RGBA, GL_UNSIGNED_BYTE, P);
SetMemory(P,GL_RGBA,GL_UNSIGNED_BYTE);
You might need to take care of GL_PACK_ALIGNMENT. This parameter sets the byte boundary to which the start of each row is aligned when pixels are packed into your memory. For example, if one row of your image takes 645 bytes:
With GL_PACK_ALIGNMENT set to 4 (the default value), each row occupies 648 bytes.
With GL_PACK_ALIGNMENT set to 1, each row occupies 645 bytes.
So ensure that the pack value is OK by calling:
glPixelStorei(GL_PACK_ALIGNMENT, 1)
before your glGetTexImage(), or size your texture memory according to GL_PACK_ALIGNMENT.
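As a rough illustration of the row padding described above, here is a minimal C++ sketch of the size calculation. The helper names `packed_row_stride` and `packed_image_size` are made up for this example, and it assumes GL_PACK_ROW_LENGTH and the GL_PACK_SKIP_* parameters are left at their defaults of 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bytes occupied by one row when OpenGL packs pixels
// into client memory. `alignment` stands for the GL_PACK_ALIGNMENT value.
// Assumes GL_PACK_ROW_LENGTH and the GL_PACK_SKIP_* values are all 0.
std::size_t packed_row_stride(std::size_t width_px,
                              std::size_t bytes_per_pixel,
                              std::size_t alignment) {
    // OpenGL only accepts 1, 2, 4 or 8 for this parameter.
    assert(alignment == 1 || alignment == 2 || alignment == 4 || alignment == 8);
    const std::size_t raw = width_px * bytes_per_pixel;
    // Round the raw row size up to the next multiple of `alignment`.
    return (raw + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: minimum buffer size for an image of `height_px` rows.
std::size_t packed_image_size(std::size_t width_px,
                              std::size_t height_px,
                              std::size_t bytes_per_pixel,
                              std::size_t alignment) {
    return packed_row_stride(width_px, bytes_per_pixel, alignment) * height_px;
}
```

With one byte per pixel, `packed_row_stride(645, 1, 4)` gives 648 and `packed_row_stride(645, 1, 1)` gives 645. For GL_RGBA with GL_FLOAT (16 bytes per pixel) every row is already a multiple of 4 bytes, so the alignment setting does not change the size in that case.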
This is most likely a driver bug. Having written 3D APIs myself, it is easy to see how it could happen. You are doing something weird and rare enough that it is unlikely to be covered by tests: converting between float data and 8-bit data during the transfer. Nobody is going to optimize that path. The generic CPU conversion function probably kicks in there, and somebody messed up a table that drives the allocation of temporary buffers for it. You should really reconsider using an external float format with an internal 8-bit format. Conversions like that in the GL API usually point to programming errors. If your data is float and you want to keep it as such, you should use a float texture and not RGBA. If you want 8 bit, why is your input float?
I am uploading image data into a GL texture asynchronously.
In the debug output I am getting these warnings during rendering:
Source: OpenGL, type: Other, id: 131185, severity: Notification
Message: Buffer detailed info: Buffer object 1 (bound to GL_PIXEL_UNPACK_BUFFER_ARB, usage hint is GL_DYNAMIC_DRAW) has been mapped WRITE_ONLY in SYSTEM HEAP memory (fast).
Source: OpenGL, type: Performance, id: 131154, severity: Medium
Message: Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is whether these warnings are safe to discard, or whether I am actually doing something wrong.
My code for that part:
// start copying pixels from RAM into the PBO:
mPBOs[mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
const uint32_t buffSize = pipe->GetBufferSize();
GLubyte* ptr = (GLubyte*)mPBOs[mCurrentPBO].MapRange(0, buffSize, GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
if (ptr)
{
memcpy(ptr, pipe->GetBuffer(), buffSize);
mPBOs[mCurrentPBO].Unmap();
}
// copy pixels from the other, already full PBO (except on the first frame) into the texture
mPBOs[1 - mCurrentPBO].Bind(GL_PIXEL_UNPACK_BUFFER);
//mCopyTex is bound to mCopyFBO as attachment
glTextureSubImage2D(mCopyTex->GetHandle(), 0, 0, 0, mClientSize.x, mClientSize.y,
GL_RGBA, GL_UNSIGNED_BYTE, 0);
mCurrentPBO = 1 - mCurrentPBO;
Then I just blit the result to default frame buffer. No rendering of geometry or anything like that.
glBlitNamedFramebuffer(
mCopyFBO,
0,//default FBO id
0,
0,
mViewportSize.x,
mViewportSize.y,
0,
0,
mViewportSize.x,
mViewportSize.y,
GL_COLOR_BUFFER_BIT,
GL_LINEAR);
Running on NVIDIA GTX 960 card.
This performance warning is NVIDIA-specific, and it is intended as a hint that you are not going to use a separate hardware transfer queue. That is no wonder, since you use a single-thread, single-GL-context model in which both rendering (at least your blit) and transfer are carried out.
See this NVIDIA presentation for some details about how NVIDIA handles this. Page 22 also explains this specific warning. Note that this warning does not mean that your transfer is not asynchronous. It is still fully asynchronous to the CPU thread. It will just be processed synchronously on the GPU with respect to the render commands, which are in the same command queue, and you are not using the asynchronous copy engine, which could do these copies independently of the rendering commands in a separate command queue.
I can't see any wrong usage of PBOs in my case, or any errors. So the question is whether these warnings are safe to discard, or whether I am actually doing something wrong.
There is nothing wrong with your PBO usage.
It is not clear if your specific application could even benefit from using a more elaborate separate transfer queue scheme.
I need to take screenshots at every frame and I need very high performance (I'm using freeGlut). What I figured out is that it can be done like this inside glutIdleFunc(thisCallbackFunction):
GLubyte *data = (GLubyte *)malloc(3 * m_screenWidth * m_screenHeight);
glReadPixels(0, 0, m_screenWidth, m_screenHeight, GL_RGB, GL_UNSIGNED_BYTE, data);
// and I can access pixel values like this: data[3*(x*512 + y) + color] or whatever
It does work, but I have a huge issue with it: it's really slow. When my window is 512x512, it runs no faster than 90 frames per second with only a cube being rendered; without these two lines it runs at 6500 FPS! Compare this with the Irrlicht graphics engine, where I can do this:
// irrlicht code
video::IImage *screenShot = driver->createScreenShot();
const uint8_t *data = (uint8_t*)screenShot->lock();
// I can access pixel values from data in a similar manner here
and a 512x512 window runs at 400 FPS even with a huge mesh (a Quake 3 map) loaded! Take into account that I'm using OpenGL as the driver inside Irrlicht. To my inexperienced eye it seems like glReadPixels is copying every pixel's data from one place to another, while (uint8_t*)screenShot->lock() is just copying a pointer to an already existing array. Can I do something similar to the latter using freeGlut? I expect it to be faster than Irrlicht.
Note that Irrlicht uses OpenGL too (it offers DirectX and other options as well, but in the example above I used OpenGL, which, by the way, was the fastest of the options).
OpenGL functions are used to manage the rendering pipeline. By its nature, while the graphics card is showing an image to the viewer, computations for the next frame are already being done. When you call glReadPixels, the graphics card waits for the current frame to be finished, reads the pixels, and only then starts computing the next frame. The pipeline therefore stalls and becomes sequential.
If you keep two buffers and tell the graphics card to read data into them, alternating each frame, then you can read back from your buffers one frame late but without stalling the pipeline. This is called double buffering. You can also do triple buffering with a two-frame-late read-back, and so on.
There is a relatively old web page describing the phenomenon and implementation here: http://www.songho.ca/opengl/gl_pbo.html
Also there are a lot of tutorials about framebuffers and rendering into a texture on the web. One of them is here: http://www.opengl-tutorial.org/intermediate-tutorials/tutorial-14-render-to-texture/
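The double-buffered read-back described above mostly comes down to index bookkeeping. Here is a minimal C++ sketch of just that part, with the OpenGL calls left as comments. The `ReadbackRing` type and the `pbo[]` array are hypothetical names, and the sketch assumes exactly two pixel buffer objects.

```cpp
#include <cassert>

// Hypothetical helper: tracks which of two pixel buffers is written and
// which is read on each frame. It makes no OpenGL calls itself.
struct ReadbackRing {
    int frame = 0;  // number of frames submitted so far

    // Buffer that receives this frame's pixels, e.g.
    //   glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo[write_index()]);
    //   glReadPixels(0, 0, w, h, GL_RGB, GL_UNSIGNED_BYTE, 0);
    int write_index() const { return frame % 2; }

    // Buffer filled during the previous frame, which is the one to map
    // (e.g. with glMapBuffer) and read on the CPU.
    // Returns -1 on the first frame, when nothing has been read back yet.
    int read_index() const { return frame == 0 ? -1 : (frame + 1) % 2; }

    // Call once at the end of every frame.
    void advance() { ++frame; }
};
```

On frame 0 the ring writes buffer 0 and has nothing to read; on frame 1 it writes buffer 1 and reads buffer 0; on frame 2 the roles swap again. The CPU always sees data that is one frame old, which is the price for not waiting on the GPU.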
My application is going to take the rendered results from OpenGL (both the depth map and the rendered 2D image) to CUDA for processing.
One way I did this is to retrieve the image/depth map with glReadPixels(..., image_array_HOST/depth_array_HOST)* and then pass image_array_HOST/depth_array_HOST to CUDA with cudaMemcpy(..., cudaMemcpyHostToDevice). I have done this part, although it sounds redundant (GPU>CPU>GPU).
*image_array_HOST/depth_array_HOST are arrays I define on the host.
Another way is to use OpenGL<>CUDA interop.
The first step is to create a buffer in OpenGL and then pass the image/depth information to that pixel buffer.
A CUDA token is also registered and linked to that buffer, and then the matrix on CUDA is linked to that CUDA token.
(As far as I know, there seems to be no direct way to link a pixel buffer to a CUDA matrix; there has to be a CUDA token for OpenGL to recognize. Please correct me if I am wrong.)
I have also done this part. I thought it should be fairly efficient, because the data CUDA is processing is not transferred anywhere but stays where it is located in OpenGL. It is data processing inside the device (GPU).
However, the time I measured for the second method is even (slightly) longer than for the first one (GPU>CPU>GPU).
That really confuses me.
I am not sure if I missed something, or maybe I didn't do it in an efficient way.
One thing I am also not sure about is glReadPixels(..., *data).
In my understanding, if *data is a pointer to memory on the host, then it will transfer the data from GPU>CPU.
If *data=0 and a buffer is bound, then the data will be transferred to that buffer, and it should be a GPU>GPU transfer.
Maybe some other method can pass the data more efficiently than glReadPixels(..., 0).
I hope someone can answer my question.
Following is my code:
--
// OpenGL has finished rendering and all the data is stored in OpenGL. It is ready to go.
...
// declare one pointer and memory location on cuda for later use.
float *depth_map_Device;
cudaMalloc((void**) &depth_map_Device, sizeof(float) * size);
// initiate cuda<>openGL
cudaGLSetGLDevice(0);
// generate a buffer, and link the cuda token to it -- buffer <>cuda token
GLuint gl_pbo;
cudaGraphicsResource_t cudaToken;
size_t data_size = sizeof(float)*number_data; // number_data is defined beforehand
void *data = malloc(data_size);
glGenBuffers(1, &gl_pbo);
glBindBuffer(GL_ARRAY_BUFFER, gl_pbo);
glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
cudaGraphicsGLRegisterBuffer(&cudaToken, gl_pbo, cudaGraphicsMapFlagsNone); // now there is a link between gl_buffer and cudaResource
free(data);
// now start to map (link) the data in the buffer to cuda
glBindBuffer(GL_PIXEL_PACK_BUFFER, gl_pbo);
glReadPixels(0, 0, width, height, GL_RED, GL_FLOAT, 0);
// read the rendered data into the buffer; since it is glReadPixels(..,0), it should still be fast? (GPU>GPU)
// width & height are defined beforehand. It can be GL_DEPTH_COMPONENT or others as well, just an example here.
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, gl_pbo);
cudaGraphicsMapResources(1, &cudaToken, 0); // map the cudaResource, which is linked to gl_buffer, into the current CUDA context
cudaGraphicsResourceGetMappedPointer((void **)&depth_map_Device, &data_size, cudaToken); // transfer data
cudaGraphicsUnmapResources(1, &cudaToken, 0); // unmap it, for the next round
// CUDA kernel
my_kernel <<<block_number, thread_number>>> (...,depth_map_Device,...);
I think I can partly answer my own question now, and I hope it is useful for some people.
I was binding the PBO to float CUDA (GPU) memory, but it seems the raw image data rendered by OpenGL is in unsigned char format, so (the following is my supposition) this data needs to be converted to float before it is passed to CUDA memory. I think OpenGL uses the CPU to do this format conversion, and that is why there is no big difference between using and not using a PBO.
By using unsigned char (glReadPixels(..., GL_UNSIGNED_BYTE, 0)), reading RGB data with a PBO bound is quicker than without a PBO. I then pass the data to a simple CUDA kernel that does the format conversion, which is more efficient than what OpenGL did. Doing this makes it much faster.
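For reference, the per-element work such a conversion kernel does can be sketched on the CPU in C++. This is only an illustration: the function name `bytes_to_floats` is made up, and normalizing to the range [0, 1] by dividing by 255 is an assumption, since the exact conversion used is not stated above. A CUDA kernel would do the same arithmetic with one thread per element instead of a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the unsigned char -> float conversion.
// Assumption: values are normalized so that 0 maps to 0.0 and 255 to 1.0.
std::vector<float> bytes_to_floats(const std::vector<std::uint8_t>& src) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        // In a CUDA kernel, `i` would come from the block and thread indices.
        dst[i] = static_cast<float>(src[i]) / 255.0f;
    }
    return dst;
}
```

Reading back one byte per channel and expanding on the GPU moves a quarter of the data that a GL_FLOAT read-back would, which may be part of why this route is faster.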
However, it doesn't work for the depth buffer.
For some reason, reading the depth map with glReadPixels (with or without a PBO) is slow.
I then found two old discussions:
http://www.opengl.org/discussion_boards/showthread.php/153121-Reading-the-Depth-Buffer-Why-so-slow
http://www.opengl.org/discussion_boards/showthread.php/173205-Saving-Restoring-Depth-Buffer-to-from-PBO
They pointed out the format question, and that is exactly what I found for RGB (unsigned char). But I have tried unsigned char, unsigned short, unsigned int, and float for reading the depth buffer, and they all perform at almost the same speed.
So I still have a speed problem when reading depth.
A while ago I converted a C# program of mine to use OpenGL and found it ran perfectly (and faster) on my computer at home. However, I have two issues. Firstly, the code I use to free textures from the graphics card doesn't work; it gives me a memory access violation exception at runtime. Secondly, most of the graphics don't work on any machine but mine.
By accident, I managed to convert some of the graphics to 8-bit PNGs (all the others are 32-bit), and these work fine on other machines. Recognising this, I attempted to regulate the quality when loading the images. My attempts failed (this was a while ago; I think they largely involved trying to format a bitmap and then using GDI to draw the texture onto it, creating a lower-quality version). Is there any way in .NET to take a bitmap and nicely change the quality? The code concerned is below. I recall it is largely based on some I found on Stack Overflow in the past, but which didn't quite suit my needs. 'img' is a .NET Image, and 'd' is an integer dimension, which I use to ensure the images are square.
uint[] output = new uint[1];
Bitmap bMap = new Bitmap(img, new Size(d, d));
System.Drawing.Imaging.BitmapData bMapData;
Rectangle rect = new Rectangle(0, 0, bMap.Width, bMap.Height);
bMapData = bMap.LockBits(rect, System.Drawing.Imaging.ImageLockMode.ReadOnly, bMap.PixelFormat);
gl.glGenTextures(1, output);
gl.glBindTexture(gl.GL_TEXTURE_2D, output[0]);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_MAG_FILTER, gl.GL_NEAREST);
gl.glTexParameteri(gl.GL_TEXTURE_2D,gl.GL_TEXTURE_MIN_FILTER, gl.GL_NEAREST);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_S, gl.GL_CLAMP);
gl.glTexParameteri(gl.GL_TEXTURE_2D, gl.GL_TEXTURE_WRAP_T, gl.GL_CLAMP);
gl.glPixelStorei(gl.GL_UNPACK_ALIGNMENT, 1);
if (use16bitTextureLimit)
gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGBA_FLOAT16_ATI, bMap.Width, bMap.Height, 0, gl.GL_BGRA, gl.GL_UNSIGNED_BYTE, bMapData.Scan0);
else
gl.glTexImage2D(gl.GL_TEXTURE_2D, 0, gl.GL_RGBA, bMap.Width, bMap.Height, 0, gl.GL_BGRA, gl.GL_UNSIGNED_BYTE, bMapData.Scan0);
bMap.UnlockBits(bMapData);
bMap.Dispose();
return output;
'use16bitTextureLimit' is a bool, and I rather hoped the code shown would reduce the quality to 16-bit, but I haven't noticed any difference. It may be that this works and the graphics cards still don't like it. I was unable to find any indication of a way to use 8-bit PNGs.
This is in a function which returns the uint array (as a texture address) for use when rendering. The faulty texture disposal simply involves gl.glDeleteTextures(1, imgsGL[i]); where imgsGL is an array of uint arrays.
As said, the rendering is fine on some computers, and the texture deletion causes a runtime error on all systems (except my netbook, where I can't create textures at all, though I think that may be linked to the quality issue).
If anyone can provide any relevant info, that would be great. I've spent many days on the program, and would really like it to be more compatible with less capable graphics cards.
The kind of access violation you encounter usually happens if the call to glTexImage2D causes a buffer overrun. Double-check that all the glPixelStore parameters related to unpacking are set properly and that the format parameter (the second one, that is) matches the type and size of the data you supply. I know this kind of bug very well, and those are the first checks I usually do whenever I encounter it.
For the texture not showing up: did you check that the texture's dimensions are actually powers of two? In C, the test for a power of two can be written as a macro like this (it boils down to testing that exactly one bit of the integer is set):
#define ISPOW2(x) ( (x) && !( (x) & ((x) - 1) ) )
It is not necessary for a texture image to be square, though. That is a common misconception; you really just have to make sure that each dimension is a power of 2. A 16×128 image is perfectly fine.
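The same check can be written as a small function instead of a macro, which avoids evaluating the argument more than once. This is a minimal C++ sketch; the names `is_pow2` and `texture_dims_ok` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// True if exactly one bit of x is set, i.e. x is a power of two.
// Zero is rejected explicitly because 0 & (0 - 1) is also 0.
constexpr bool is_pow2(std::uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

// Hypothetical helper: each dimension must be a power of two on its own;
// the image does not have to be square.
constexpr bool texture_dims_ok(std::uint32_t width, std::uint32_t height) {
    return is_pow2(width) && is_pow2(height);
}
```

For example, `texture_dims_ok(16, 128)` is true, while `texture_dims_ok(645, 645)` is false even though that image is square.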
Changing the internal format to GL_RGBA_FLOAT16_ATI will probably even increase quality, but one cannot be sure, as GL_RGBA may be coerced to anything the driver sees fit. Also, this is a vendor-specific format, so I discourage its use. There are all kinds of ARB formats, including a half-float one (which is what FLOAT16_ATI is).
I am displaying a texture that I want to manipulate without affecting the image data. I want to be able to clamp the texel values so that anything below the lower value becomes 0, anything above the upper value becomes 0, and anything in between is linearly mapped from 0 to 1.
Originally, to display my image I was using glDrawPixels, and to solve the problem above I would create a color map using glPixelMap. This worked beautifully. However, for performance reasons I have begun using textures to display my image. The glPixelMap approach no longer seems to work. Well, that approach may work, but I was unable to get it working.
I then tried using glPixelTransfer to set scales and biases. This seemed to have some sort of effect (not necessarily the desired one) on the first pass, but when the upper and lower constraints were changed, no effect was visible.
I was then told that fragment shaders would work. But after a call to glGetString(GL_EXTENSIONS), I found that GL_ARB_fragment_shader was not supported. Plus, a call to glCreateShaderObjectARB causes a NullReferenceException.
So now I am at a loss. What should I do? Please help.
Whatever might work, I am willing to try. The vendor is Intel and the renderer is Intel 945G. I am unfortunately confined to a graphics card that is integrated on the motherboard and only has GL 1.4.
Thanks for your response thus far.
Unless you have a pretty old graphics card, it's surprising that you don't have fragment-shader support. I'd suggest you try double-checking using this.
Also, are you sure you want anything above the max value to be 0? Perhaps you meant 1? If you did mean 1 and not 0, then there are quite long-winded ways to do what you're asking.
The condensed answer is that you use multiple rendering passes. First you render the image at normal intensity. Then you use subtractive blending (look up glBlendEquation) to subtract your minimum value. Then you use additive blending to multiply everything up by 1/(max-min) (which may need multiple passes).
If you really want to do this, please post back the GL_VENDOR and GL_RENDERER for your graphics card.
Edit: Hmm. The Intel 945G doesn't have ARB_fragment_shader, but it does have ARB_fragment_program, which will also do the trick.
Your fragment-code should look something like this (but it's been a while since I wrote any so it's probably bugged)
!!ARBfp1.0
ATTRIB tex = fragment.texcoord[0];
PARAM cbias = program.local[0];
PARAM cscale = program.local[1];
OUTPUT cout = result.color;
TEMP tmp;
TXP tmp, tex, texture[0], 2D;
SUB tmp, tmp, cbias;
MUL cout, tmp, cscale;
END
You load this into OpenGL like so:
GLuint prog;
glEnable(GL_FRAGMENT_PROGRAM_ARB);
glGenProgramsARB(1, &prog);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB, strlen(src), src);
glDisable(GL_FRAGMENT_PROGRAM_ARB);
Then, before rendering your geometry, you do this:
glEnable(GL_FRAGMENT_PROGRAM_ARB);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
colour4f cbias = cmin;
colour4f cscale = 1.0f / (cmax-cmin);
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0, cbias.r, cbias.g, cbias.b, cbias.a);
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 1, cscale.r, cscale.g, cscale.b, cscale.a);
//Draw your textured geometry
glDisable(GL_FRAGMENT_PROGRAM_ARB);
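To check the bias and scale values before handing them to the fragment program, the same arithmetic can be run on the CPU for a single channel. This is a minimal C++ sketch; `ScaleBias`, `make_scale_bias` and `remap` are hypothetical names. The final clamp to [0, 1] is an assumption that models what a fixed-point framebuffer does to the output colour, so values above the upper limit end up as 1 rather than 0.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical pair matching the two program.local parameters (one channel).
struct ScaleBias {
    float bias;   // value subtracted from the texel (the lower limit)
    float scale;  // 1 / (upper - lower)
};

// Builds the parameters from the lower and upper limits.
ScaleBias make_scale_bias(float lo, float hi) {
    assert(hi > lo);  // avoids division by zero and a negative scale
    return {lo, 1.0f / (hi - lo)};
}

// CPU reference for SUB followed by MUL, then the framebuffer clamp.
float remap(float texel, ScaleBias sb) {
    const float v = (texel - sb.bias) * sb.scale;
    return std::min(1.0f, std::max(0.0f, v));
}
```

With limits 0.25 and 0.75, a texel of 0.5 maps to 0.5, anything at or below 0.25 maps to 0, and anything at or above 0.75 maps to 1.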
Also see if the GL_ARB_fragment_program extension is supported. That extension supports the ASM style fragment programs. That is supposed to be supported in OpenGL 1.4.
It's really unfortunate that you're using such an ancient version of OpenGL. Can you upgrade with your card?
For a more modern OGL 2.x, this is exactly the kind of program that GLSL is for. Great documentation can be found here:
OpenGL Documentation
OpenGL Shading Language