I'm developing a Java graphical application with JOGL and OpenGL on Linux. My application contains over 30 shaders, and they work fine in most cases. But about once a week there is a driver (amdgpu pro) error (SIGSEGV).
Please tell me: is OpenGL a safe API, i.e. is it protected against errors in the application program, or can incorrect actions by the application corrupt the driver's memory (writing to someone else's memory, data races)? Where should I look for the cause of the SIGSEGV: in a faulty driver (amdgpu pro) or in bugs in the application itself? (glGetError reports that everything is fine at every step of the application.)
In general, there are going to be plenty of ways to crash your program when you are using the OpenGL API. Buggy drivers are an unfortunate reality that you cannot avoid completely, and misuse of the API in creative ways can cause crashes instead of errors. In fact, I have personally caused computers to completely hang (unresponsive) on multiple platforms and different GPU vendors, even when using WebGL which is supposedly "safe".
So the only possible answer is "no, OpenGL is not safe."
Some tips for debugging OpenGL:
Don't use glGetError, use KHR_debug instead (unless it's not available).
Create a debug context with GL_CONTEXT_FLAG_DEBUG_BIT (a minimal setup sketch follows these tips).
Use a less buggy OpenGL implementation when you are testing. In my experience, the Mesa implementation is very stable.
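For reference, a minimal debug-context setup might look roughly like this in C++ (a sketch, assuming GLFW and the glad loader, which are not part of the question's JOGL setup; JOGL has equivalent mechanisms):

#include <cstdio>
#include <glad/glad.h>
#include <GLFW/glfw3.h>

// Called by the driver for every debug message (errors, warnings, performance notes).
static void APIENTRY debugCallback(GLenum source, GLenum type, GLuint id,
                                   GLenum severity, GLsizei length,
                                   const GLchar* message, const void* userParam)
{
    std::fprintf(stderr, "GL debug [type 0x%x, severity 0x%x]: %s\n",
                 type, severity, message);
}

int main()
{
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 3);
    // Ask for a debug context; the driver is then allowed (and encouraged) to do
    // extra validation and report problems through the callback registered below.
    glfwWindowHint(GLFW_OPENGL_DEBUG_CONTEXT, GLFW_TRUE);
    GLFWwindow* window = glfwCreateWindow(640, 480, "debug", nullptr, nullptr);
    glfwMakeContextCurrent(window);
    gladLoadGLLoader(reinterpret_cast<GLADloadproc>(glfwGetProcAddress));

    // Verify we really got a debug context, then register the callback.
    GLint flags = 0;
    glGetIntegerv(GL_CONTEXT_FLAGS, &flags);
    if (flags & GL_CONTEXT_FLAG_DEBUG_BIT) {
        glEnable(GL_DEBUG_OUTPUT);
        glEnable(GL_DEBUG_OUTPUT_SYNCHRONOUS);  // report on the thread of the offending call
        glDebugMessageCallback(debugCallback, nullptr);
    }

    // ... render loop ...
    glfwTerminate();
    return 0;
}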
Is OpenGL 4.3 "safe"? Absolutely not. There are many things you can do that can crash the program. Having a rendering operation read from past the boundaries of a buffer, for example. 4.3 has plenty of ways of doing that.
Indeed, simply writing a shader that executes for too long can cause a GPU failure.
You could in theory read GPU memory that was written by some other application, just by reading from an uninitialized buffer.
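As a concrete illustration of the buffer case (a hedged sketch, assuming a GL 3.3+ context with a VAO already bound): the out-of-range index below raises no GL error, yet the draw reads memory the application never provided, and without robust buffer access the result is undefined and can crash:

// Sketch: 3 vertices are uploaded, but one index points far past the end of the
// vertex buffer. glGetError() stays clean; what the GPU reads at index 100000 is
// undefined, and on some drivers this kind of access crashes or hangs the machine.
const float verts[] = { -1.f, -1.f,   1.f, -1.f,   0.f, 1.f };
const GLuint indices[] = { 0, 1, 100000 };                     // out-of-range index

GLuint vbo = 0, ibo = 0;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

glGenBuffers(1, &ibo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, nullptr);
glDrawElements(GL_TRIANGLES, 3, GL_UNSIGNED_INT, nullptr);     // no GL error reported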
There is no simple way to tell whether a particular crash was caused by a driver bug or by user error. You have to actually debug it and have a working understanding of the OpenGL specification to know for sure.
Related
We are running into issues with an old closed-source game engine failing to compile shaders when memory usage nears 2GB.
The issue is usually with D3DXCreateEffect. Usually it returns an "out of memory" HRESULT, sometimes d3dx9_25.dll prints random errors in a popup, or it just outright segfaults.
I believe the issue is a lack of Large Address Awareness: I noticed one of the d3dx9_25.dll crashes doing something that hints at this. It took a valid pointer that looked like 0x8xxxxxx3, checked that the bits 0x80000003 were set, and if so, bit-inverted the pointer and dereferenced it. The resulting pointer pointed to unallocated memory. Forcing the engine to malloc 2GB before compilation makes the shaders fail to compile every time.
Unfortunately our knowledge of DX9 is very limited. I've seen that DX9 has a flag D3DXCONSTTABLE_LARGEADDRESSAWARE, but I'm not sure where exactly it's supposed to go. The only API call the game uses that I can find relying on it is D3DXGetShaderConstantTable, but the issues happen before it is ever called. Injecting the flag (1 << 17) = 0x20000 into D3DXCreateEffect makes the shaders fail compilation in a different way.
Is D3DXCreateEffect supposed to accept the Large Address Aware flag? I found a Wine test using it, but digging into the DX9 assembly, the error it throws is caused by an internal function returning an "invalid call" HRESULT when any bit within the mask 0xFFFFF800 is set in the flags, which leads me to believe CreateEffect is not supposed to accept this flag.
Is there anywhere else I should be injecting the Large Address Aware flag before this? I understand that the call to D3DXGetShaderConstantTable will need to be changed to D3DXGetShaderConstantTableEx, but it's not even reached yet.
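For reference, the constant-table change I expect to need later would look roughly like this (a sketch; pShaderByteCode stands for whatever byte code the engine already passes in):

// Old call the engine effectively makes:
//   D3DXGetShaderConstantTable(pShaderByteCode, &pConstantTable);
// LAA-safe replacement: the Ex variant with the large-address-aware flag.
LPD3DXCONSTANTTABLE pConstantTable = nullptr;
HRESULT hr = D3DXGetShaderConstantTableEx(pShaderByteCode,
                                          D3DXCONSTTABLE_LARGEADDRESSAWARE,
                                          &pConstantTable);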
LargeAddressAware is a bit of a hack, so it may or may not help your case. It really only helps if your application needs a little more room close to 2GB of VA, not if it needs a lot more.
A key problem with the legacy DirectX SDK Direct3D 9 era effects system is that it assumed the high bit of the effect "handle" was free for its own use; without that bit set, the handle was treated as the address of a string. This assumption does not hold under LargeAddressAware.
To enable this, you define D3DXFX_LARGEADDRESS_HANDLE before including d3dx9.h headers. You then must use the D3DXFX_LARGEADDRESSAWARE flag when creating all effects. You must also not use the alias trick where you can use a "string name" instead of a "handle" on all the effect methods. Instead you have to use GetParameterByName to get the handle and use that instead.
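A sketch of what that looks like in code (the device, file name, parameter name, and matrix are placeholders):

// Must be defined before including d3dx9.h so D3DXHANDLE becomes an LAA-safe type.
#define D3DXFX_LARGEADDRESS_HANDLE
#include <d3dx9.h>

ID3DXEffect* effect = nullptr;
ID3DXBuffer* errors = nullptr;

// Every effect must be created with the D3DXFX_LARGEADDRESSAWARE flag.
HRESULT hr = D3DXCreateEffectFromFileA(device, "scene.fx",
                                       nullptr, nullptr,
                                       D3DXFX_LARGEADDRESSAWARE,
                                       nullptr, &effect, &errors);

// No string aliases: fetch a real handle once and use it for all Set* calls.
D3DXHANDLE hWorldViewProj = effect->GetParameterByName(nullptr, "WorldViewProj");
effect->SetMatrix(hWorldViewProj, &worldViewProj);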
What I can't remember is when the LAA flag was added to Effects for Direct3D 9.
If you are using d3dx9_25.dll then that's the April 2005 release of the DirectX SDK. If you are using "Pixel Shader Model 1.x" then you can't use any version newer than d3dx9_31.dll (October 2006)--later versions of the DirectX SDK let you use D3DXSHADER_USE_LEGACY_D3DX9_31_DLL, which just passes shader compilation through to the older DLL for this scenario.
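If you do move to a newer D3DX while still needing ps_1_x, the pass-through is requested at compile time, roughly like this (a sketch; the source string and entry point are placeholders):

// Newer D3DX DLLs (d3dx9_32 and later) refuse ps_1_x targets unless you ask them
// to delegate compilation to the October 2006 compiler.
ID3DXBuffer* shader = nullptr;
ID3DXBuffer* errors = nullptr;
HRESULT hr = D3DXCompileShader(sourceText, sourceLength,
                               nullptr, nullptr,
                               "main", "ps_1_1",
                               D3DXSHADER_USE_LEGACY_D3DX9_31_DLL,
                               &shader, &errors, nullptr);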
A key reason that many 32-bit games would fail and then work with LAA enabled was virtual memory fragmentation. Improving your VA memory layout by making your allocations more uniform can help too.
The issue we were having with CreateEffect not accepting the LargeAddressAware flag is pretty obvious in hindsight: the D3DX9 version the engine is using (d3dx9_25.dll) simply did not have this feature yet.
Our options, other than optimizing our memory usage, are:
Convert all our 1.x pixel shaders to 2.0 and force the engine to load a newer version of d3dx9, hope the engine is not relying on bugs in d3dx9_25.dll or on the alias trick, then inject the LargeAddressAware flag bit there.
Wrap malloc, either avoiding handing out large addresses for handles (I am unsure whether this is also required inside the DLL) or placing enough other data at large addresses so that the DX9-related mallocs never reach them (a rough sketch of the latter follows below).
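A very rough, untested sketch of the second idea (the MEM_TOP_DOWN approach and the pool size are our assumptions, not something we have verified):

#include <windows.h>

// Rough idea: take the engine's own big, long-lived buffers from the TOP of the
// 32-bit address space at startup, so the CRT heap that d3dx9_25.dll allocates
// from keeps growing bottom-up and its pointers/handles stay below 2 GB.
static void* ReserveHighPool(SIZE_T bytes)
{
    return VirtualAlloc(nullptr, bytes,
                        MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN,
                        PAGE_READWRITE);
}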
When running a simple OpenGL application on Windows there are two unknown threads. I want to know what these threads in the application are. Is there any documentation about them? Our application crashes in one of these threads, so as a first step I want to know what these threads are.
And this is the dump of nvoglv64:
Those threads are not something specific to OpenGL; OpenGL doesn't know anything about threads, because technically it's just a piece of text, namely the specification.
However, in your case it's very likely that those threads are created by the OpenGL implementation (a.k.a. your graphics driver). As you can see, those threads seem to be tasked with copying some data, which suggests they crash because you either give OpenGL
some invalid pointer
or invalid metrics for the pointer (size of the buffer, stride, etc.)
or you're deallocating/freeing memory in a different thread while OpenGL still accesses it from the OpenGL context thread.
In any case it's not the threads' fault that the program crashes, but your failure to either supply OpenGL with valid data or properly lock/synchronize with OpenGL so that you don't invalidate the buffers mid-operation.
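A sketch of the kind of mistake I mean (hypothetical code using legacy client-side arrays for brevity; fillVertices is a made-up helper):

// The driver may copy client-side vertex data asynchronously on a worker thread;
// lying about the size, or freeing the data too early, makes that copy read
// invalid memory even though every GL call here "succeeds".
float* verts = new float[100 * 3];            // 100 vertices, 3 floats each
fillVertices(verts);                          // made-up helper
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, verts);

glDrawArrays(GL_TRIANGLES, 0, 300);           // claims 300 vertices, only 100 exist

delete[] verts;                               // or: another thread frees/overwrites it
glFlush();                                    // while the driver is still copying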
Update
The fact that this crash happens with Application Verifier suggests that something about Application Verifier messes up memory used in some way by OpenGL. This is very likely a bug in Application Verifier, but I think the best course of action would be to inform NVIDIA of the problem, so that they can address it with a workaround in their drivers.
I'm writing a little game using OpenGL (in Java through JOGL, but I hardly think that matters), and I have lately been getting quite a few error reports from people getting OpenGL error "1285", which seems to indicate "out of memory". I catch this when checking glGetError after having created new textures, which gives me the feeling that I'm running out of texture memory.
However, this surprises me a bit. Isn't OpenGL supposed to manage texture memory for me, swapping textures between the GPU and process memory as necessary? The specification for glTexImage2D certainly does not list any "out of memory" error among the possible error conditions.
Is this generally accepted practice for OpenGL drivers in spite of the specification? Are only some drivers doing this in spite of the specification? Do I need to take care to delete textures that haven't been used for a while if I catch this error after glTexImage2D? Or am I perhaps seeing something else entirely here that OpenGL's error reporting doesn't quite convey to me?
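The check I'm doing is essentially this (sketched in C-style GL for clarity, JOGL exposes the same entry points; textureId, width, height and pixels stand for my actual data):

// Upload a new texture, then ask GL whether anything went wrong.
glBindTexture(GL_TEXTURE_2D, textureId);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);

if (glGetError() == GL_OUT_OF_MEMORY) {   // 0x0505 == 1285, the code users report
    // Possible fallback I'm considering: delete textures that haven't been used
    // recently and retry the upload.
}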
Edit: For further information, I cannot, unfortunately, debug the problem myself, because I'm not getting it. Reading from the reports people send me, the vast majority of those afflicted by this appear to be using Intel cards, but I've spotted a few nVidia cards as well (even a 680).
It's only a guess, but your program may suffer from address space fragmentation. And if that is the case, it indeed matters that you're running in a Java runtime.
OpenGL, namely the implementation, must keep copies of all data objects so that it can swap them out on demand. But those copies need address space in your process. And if your process environment does a lot of allocations/deallocations, which is the very nature of Java (creating an object for just about everything), it can happen that your address space gets fragmented to the point that larger chunks can no longer be allocated.
A few points to check: is your program running on a 32-bit or a 64-bit JRE? If it's a 32-bit executable, try out what happens if you use a 64-bit JRE. If the issues vanish in a 64-bit environment while they are present on the same machine in a 32-bit environment, it's an address space fragmentation issue for sure.
I am creating a GUI program that will run 24/7. I couldn't find much online on the subject, but is OpenGL stable enough to run 24/7 for weeks on end without leaks, crashes, etc?
Should I have any concerns or anything to look into before delving too deep into using OpenGL?
I know that OpenGL and DirectX are primarily used for games or other programs that aren't run for very long stretches of time. Hopefully someone here has some experience with this or knowledge on the subject. Thanks.
EDIT: Sorry for the lack of detail. This will only be doing 2D rendering, and nothing too heavy; what I have now (which will be similar to production) already runs at a stable 900-1000 FPS on my i5 laptop with a Radeon 6850M.
Going into OpenGL just for making a GUI sounds insane. You should be worried more about what language you use if you are concerned about stuff like memory leaks. Remember that in C/C++ you manage memory on your own.
Furthermore, do you really need the GUI to be running 24/7? If you are making a service sort of application, you might as well leave it in the background and make a second application which provides the GUI. These two applications would communicate via some IPC (sockets?). That's how this sort of thing usually works, rather than having a window open all the time.
In the end, memory leaks are not caused by some graphical library, but by the programmer writing bad code. The library should be the last item on your list of possible reasons for memory leaks/crashes.
I work for a company that makes (Windows-based) quality assurance software (machine vision) using Delphi.
The main operator screen shows the camera images at up to 20 fps (2 x 10 fps) with an OpenGL overlay, and has essentially unbounded uptime (the longest uptimes are close to a year; longer is hard due to power-downs for maintenance). Higher-speed cameras have their display rates throttled.
I would avoid integrated Intel graphics for a while longer, though. Since the i5 generation it meets our minimal requirements (mostly non-power-of-two textures), but the initial drivers were bad, and while they have improved, there are still occasional stability and reliability problems.
I am getting a weird problem: the crash happens at random times. For example, I managed to use the 3D app for a while without crashing, but most of the time it crashes when I suddenly render a lot of objects at the same time.
I have recently noticed that switching between huge texture surfaces on this ATI card will crash my whole computer when I use huge texture sizes, a lot of them, and switch from one to another in a single frame. So it is possible that I have a broken ATI card, or just a buggy one. But that is improbable, since I've added some code lately and only now noticed this crash for the first time. I didn't use any special OpenGL calls, just the good old glBegin(), glEnd(), glColor(), etc.
If I comment out the line where it crashed previously, for example glBegin(GL_QUADS) ... glEnd(), then the next time I get a crash on a different OpenGL function call at a different place in my code, for example glColor4f(). Then I comment that out, and the next crash happens at glClear() in a totally different part of the rendering code!
What could be causing these? I'm using an ATI card, and I am aware that some OpenGL calls may crash the program if they are given improper values; for example glLineWidth(4) will crash some ATI cards on a random line of OpenGL code because the maximum line width is 3!
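For what it's worth, I know the supported range can be queried and clamped, roughly like this (a sketch; I'm not sure it is related to my actual crash):

// Query the line widths this implementation actually supports and clamp to them,
// instead of hard-coding a value like 4 that some drivers choke on.
GLfloat range[2] = { 1.0f, 1.0f };
glGetFloatv(GL_LINE_WIDTH_RANGE, range);   // range[0] = min, range[1] = max

GLfloat width = 4.0f;
if (width < range[0]) width = range[0];
if (width > range[1]) width = range[1];
glLineWidth(width);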
Edit:
When I run the program in debug mode with Application Verifier, it points me at this line:
if(!(PixelFormat = ChoosePixelFormat(hDC, &pfd))){
I don't understand: what could possibly be wrong with it?
pfd:
static PIXELFORMATDESCRIPTOR pfd = {
// *correct amount of elements*
};
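(For comparison, a typical fully populated descriptor for a 32-bit, double-buffered OpenGL window looks roughly like the following; this is an illustrative example, not my exact initializer:)

// Illustrative only: a common PIXELFORMATDESCRIPTOR for a 32-bit, double-buffered,
// OpenGL-capable window with a 24-bit depth buffer and 8-bit stencil.
static PIXELFORMATDESCRIPTOR examplePfd = {
    sizeof(PIXELFORMATDESCRIPTOR), 1,                       // size, version
    PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER,
    PFD_TYPE_RGBA,
    32,                                                     // color bits
    0, 0, 0, 0, 0, 0, 0, 0,                                 // per-channel bits/shifts
    0, 0, 0, 0, 0,                                          // accumulation buffer
    24, 8,                                                  // depth, stencil bits
    0,                                                      // aux buffers
    PFD_MAIN_PLANE,
    0, 0, 0, 0                                              // reserved / masks
};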
IMO, chances are pretty good that the crash in OpenGL is only a symptom, and the real problem lies elsewhere. In general, your description sounds more or less typical of resource misuse (e.g., memory leakage, using a dangling pointer, trashing the heap, etc.).
Something like a driver bug is certainly possible -- in fact a graphics driver is big and complex enough that some bugs are probably almost inevitable. The obvious test for that would be to run other code that uses OpenGL and see if it works dependably. It's always possible that you're using an execution path that contains a bug, but is so obscure almost nothing else uses it so the bug isn't otherwise triggered -- but given that the crash isn't happening in a fixed location, that seems fairly unlikely (still possible, just unlikely). If a graphics driver has a bug (especially one serious enough to cause crashes, not just mis-rendering), it usually becomes known pretty quickly.
Such random behaviour is usually a symptom of stack/heap corruption. You should check that you're not corrupting the heap and/or the stack. Buggy drivers are also an option, since crashing on an invalid value is a bug: the driver should not crash but instead produce a GL error.
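On Windows with the MSVC CRT, one cheap way to narrow this down is the debug heap (a sketch; tools like Application Verifier's heap checks or AddressSanitizer serve the same purpose):

#include <crtdbg.h>

// Call once at startup in a debug build: the CRT then validates the heap on every
// allocation/free, so the program stops at the corrupting call instead of crashing
// much later inside an unrelated OpenGL function.
void EnableHeapChecks()
{
    int flags = _CrtSetDbgFlag(_CRTDBG_REPORT_FLAG);
    flags |= _CRTDBG_ALLOC_MEM_DF | _CRTDBG_CHECK_ALWAYS_DF | _CRTDBG_LEAK_CHECK_DF;
    _CrtSetDbgFlag(flags);
}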