glDrawArray+VBO increasing memory footprint


I am writing a Windows-based OpenGL viewer application.
I render my meshes with VBOs, triangle strips, and glDrawArrays. Everything works correctly on all machines.
On Windows desktops with NVIDIA Quadro cards, the working set and peak working set jump sharply the first time I call glDrawArrays.
On laptops with NVIDIA mobile graphics cards, neither the working set nor the peak working set jumps. For the last few days I have been checking almost every forum, post, and tutorial about VBO memory issues. I have tried every VBO variation I could find: GL_STATIC_DRAW, GL_DYNAMIC_DRAW, GL_STREAM_DRAW, and glMapBuffer/glUnmapBuffer. Nothing stops the memory jump on my desktops.
I suspect that I am missing some flags for VBOs in OpenGL 1.5.
PS: I have roughly 500 to 600 VBOs in my application. I am using an array of structures (i.e. v, n, c, t together in one structure), and I am not aligning my VBOs to 16k memory.
Can anyone suggest how I should go about solving this issue? Any hints or pointers would be helpful.

Do you actually run out of memory, or does your application's memory consumption keep growing over time? If not, why bother? If the OpenGL implementation keeps a working copy of the data for itself, it probably does so for a reason. There is also little you can do on the OpenGL side to avoid this, since it's entirely up to the driver how it manages its resources. If you really want to keep the memory footprint low, I think the best course of action is to contact NVIDIA so that they can check whether this is a bug in their drivers.
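If the driver does keep a system-memory copy of every buffer (an assumption; nothing in the question confirms it), the jump in working set should be roughly the total size of the uploaded vertex data. Here is a rough sketch for checking that. The Vertex layout and the vertex count per VBO are hypothetical stand-ins for the asker's v, n, c, t structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical interleaved vertex (array of structures): v, n, c, t.
struct Vertex {
    float position[3];      // v
    float normal[3];        // n
    std::uint8_t color[4];  // c
    float texcoord[2];      // t
};

// Holds on common platforms with 4-byte floats and no padding.
static_assert(sizeof(Vertex) == 36, "expected a tightly packed 36-byte vertex");

// Bytes of vertex data stored in one VBO.
inline std::size_t vboBytes(std::size_t vertexCount) {
    return vertexCount * sizeof(Vertex);
}

// Bytes the process would hold if the driver shadowed every VBO in system
// memory (an assumption about driver behaviour, not a documented fact).
inline std::size_t shadowCopyBytes(std::size_t vboCount, std::size_t verticesPerVbo) {
    const std::size_t perVbo = vboBytes(verticesPerVbo);
    assert(perVbo == 0 || vboCount <= SIZE_MAX / perVbo);  // guard against overflow
    return vboCount * perVbo;
}
```

For example, 600 VBOs of 10,000 such vertices each come to about 216 MB. If the observed jump is in that range, a driver-side copy is a plausible explanation.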

Related

OpenGL: Memory (texture storage) gets corrupted when running lots of applications

I'm building an OpenGL-based application. It uses some high-res textures. Sometimes, when I switch to other applications running on my computer and then come back, the memory of some of my texture storage is corrupted. I am not 100% sure what is happening, but it feels like the driver is short on memory, so it takes some blocks from my application's texture storage and gives them to other applications. Once I come back to my application, there are black rectangular holes in some of my textures.
I can totally understand that the system runs out of VRAM and things like this happen, but I would like to be informed about this when this happens, so when the user returns to the application, I can restore memory buffers if they got invalidated.
Is this behaviour normal, or is the driver supposed to sort of swap out texture data and restore it later (and is not doing that or failing to do so)? If this behaviour is normal, how do I detect that this happened and how do I deal with this?
For the sake of completeness: I'm experiencing this on macOS Sierra on a MacBook Pro 8,1 which has an Intel HD Graphics 3000 and 16 GB of ram.

opengl fixed function uses gpu or cpu?

I have code that draws parallel coordinates using the OpenGL fixed-function pipeline.
The plot has 7 axes and draws 64k lines, so the output is cluttered, but when I run the code on my laptop, which has an Intel i5 processor and 8 GB of DDR3 RAM, it runs fine. A friend ran the same code on two different systems, both with an Intel i7, 8 GB of DDR3 RAM, and an NVIDIA GPU. On those systems the code stutters and sometimes the mouse pointer becomes unresponsive. Any ideas about why this is happening would be a great help. Initially I thought it would run even faster on those systems, since they have a dedicated GPU. My own laptop runs Ubuntu 12.04 and both of the other systems run Ubuntu 10.x.

In modern OpenGL drivers the fixed-function pipeline is implemented using the GPU's programmable features. This means most of the work is done by the GPU. Fixed-function OpenGL shouldn't be any slower than using GLSL to do the same things; it is just much less flexible.
What do you mean by the coordinate having 7 axes? Do you have screenshots of your application?
Mouse stuttering sounds like you are seriously taxing your display driver, most likely by making too many OpenGL calls. Are you using immediate mode (glBegin, glVertex, ...)? Some OpenGL drivers might not have the best implementation of immediate mode. You should use vertex buffer objects for your data.
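To illustrate the VBO suggestion: instead of issuing glBegin/glVertex for every line each frame, the line geometry can be built once into a flat array and uploaded in one go. This is a minimal sketch. The function name, the row-major data layout, and the assumption that values are already normalized to [0, 1] are mine, not the asker's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a flat (x, y) vertex array suitable for GL_LINES.
// `values` holds one row per record, `axes` values per row (row-major),
// each assumed to be normalized to [0, 1]. Axis a is placed at
// x = a / (axes - 1). Each record yields (axes - 1) segments,
// i.e. 2 * (axes - 1) vertices.
std::vector<float> buildLineVertices(const std::vector<float>& values, std::size_t axes) {
    assert(axes >= 2 && values.size() % axes == 0);
    const std::size_t records = values.size() / axes;
    std::vector<float> out;
    out.reserve(records * (axes - 1) * 4);  // 2 vertices * 2 floats per segment
    for (std::size_t r = 0; r < records; ++r) {
        for (std::size_t a = 0; a + 1 < axes; ++a) {
            const float x0 = static_cast<float>(a) / static_cast<float>(axes - 1);
            const float x1 = static_cast<float>(a + 1) / static_cast<float>(axes - 1);
            out.push_back(x0);
            out.push_back(values[r * axes + a]);
            out.push_back(x1);
            out.push_back(values[r * axes + a + 1]);
        }
    }
    return out;
}

// With a GL context, the result would be uploaded once, e.g.
//   glBufferData(GL_ARRAY_BUFFER, v.size() * sizeof(float), v.data(), GL_STATIC_DRAW);
// and drawn each frame with a single call:
//   glDrawArrays(GL_LINES, 0, static_cast<GLsizei>(v.size() / 2));
```

With 7 axes, each record becomes 6 segments (12 vertices), so the whole plot is drawn with one call per frame rather than several hundred thousand glVertex calls.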

Maybe I've misunderstood you, but here I go.
API calls such as glBegin and glEnd issue commands to the GPU, so they use GPU horsepower. Array accesses and other functions that are unrelated to the OpenGL API run on the CPU.
It is good practice to preload your models outside the OpenGL onDraw loop by storing the data in buffers (glGenBuffers etc.) and then using those buffers (VBO/IBO) inside the onDraw loop.
If managed correctly, this can decrease the load on both the GPU and the CPU. Hope this helps.
Oleg

Determine limit for data sent to VBO?

I'm writing a 3d application for the Playbook, which has a PowerVR SGX 540. I noticed that if I stuff enough data in the VBO through opengl, I can cause the device to crash (not just the application, but the entire device, requiring a hard reboot). To cause the crash, I sent data for a model with ~300k triangles and ~150k vertices. I sent normal data for the vertices as well.
I found that the problem doesn't occur if I send less data (tried another model with half the triangles and vertices). Also, the issue doesn't occur if I use vertex arrays (though it's incredibly slow).
I'd like to know:
Is what I'm seeing a common result for mobile hardware? That is, is a 300k tri model with 150k vertices and normals overkill?
Can I check how much memory I have available for VBO usage outside of testing a bunch of different model sizes (it takes a good five minutes to recover the device from a crash)?
Could anything else be causing this issue? I've provided some additional information:
I'm using Qt for my GUI, and drawing the 3D scene to an FBO before it's painted to the GUI. (I haven't yet checked whether the problem also occurs without a UI, by creating an EGL window and drawing to that directly; that will take a while.)
To verify it wasn't me using OpenGL poorly, I tried both using raw OpenGL calls for all the 3d stuff, and also doing everything with OpenSceneGraph. Both methods fail in the exact same way (VBO works with less data, vertex arrays work, increased VBO data causes a crash).
The program works fine on my desktop. Unfortunately, I don't have any other mobile devices I can test my application out on.

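Core OpenGL ES 2.0 only supports unsigned byte and unsigned short (16-bit) as index data types; 32-bit indices require the OES_element_index_uint extension. So if you're using an index array with ~150k vertices, you're over the 65,535 index limit unless that extension is available.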
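If 16-bit indices turn out to be the limit, one workaround is to split the mesh into chunks that each reference at most 65,536 distinct vertices. The following is a sketch of that idea, not something taken from the question. The Chunk type and function names are hypothetical, and the greedy split makes no attempt to minimize the number of duplicated vertices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// One draw call's worth of geometry (hypothetical helper type).
struct Chunk {
    std::vector<std::uint32_t> sourceVertices;  // original vertex index of each local vertex
    std::vector<std::uint16_t> indices;         // triangle list using local 16-bit indices
};

constexpr std::size_t kMaxVertices16 = 65536;  // local indices 0..65535

inline bool fitsIn16BitIndices(std::size_t vertexCount) {
    return vertexCount <= kMaxVertices16;
}

// Greedily splits a 32-bit indexed triangle list into chunks whose local
// indices fit in 16 bits. Vertices shared across chunks are duplicated.
std::vector<Chunk> splitFor16BitIndices(const std::vector<std::uint32_t>& indices,
                                        std::size_t maxVertices = kMaxVertices16) {
    assert(indices.size() % 3 == 0);
    assert(maxVertices >= 3 && maxVertices <= kMaxVertices16);
    std::vector<Chunk> chunks;
    std::unordered_map<std::uint32_t, std::uint16_t> remap;  // original -> local index
    for (std::size_t i = 0; i < indices.size(); i += 3) {
        // Conservative count of vertices this triangle would add to the chunk.
        std::size_t fresh = 0;
        for (std::size_t k = 0; k < 3; ++k) {
            if (remap.count(indices[i + k]) == 0) ++fresh;
        }
        if (chunks.empty() || chunks.back().sourceVertices.size() + fresh > maxVertices) {
            chunks.emplace_back();  // start a new chunk with an empty remap table
            remap.clear();
        }
        Chunk& chunk = chunks.back();
        for (std::size_t k = 0; k < 3; ++k) {
            const std::uint32_t v = indices[i + k];
            auto it = remap.find(v);
            if (it == remap.end()) {
                const auto local = static_cast<std::uint16_t>(chunk.sourceVertices.size());
                it = remap.emplace(v, local).first;
                chunk.sourceVertices.push_back(v);
            }
            chunk.indices.push_back(it->second);
        }
    }
    return chunks;
}
```

Each chunk would then get its own vertex and index buffer, with sourceVertices telling you which original vertices to copy into it. Whether this also avoids the device crash described in the question is untested.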

C++/opengl application running smoother with debugger attached

Have you experienced a situation where a C++ OpenGL application runs faster and smoother when executed from Visual Studio? When executed normally, without the debugger, I get a lower frame rate (50 instead of 80) and a strange lag where the frame rate dips to about 25 frames/sec every 20th to 30th frame. Is there a way to fix this?
Edit:
We are also using quite a lot of display lists (created with glNewList), and increasing the number of display lists seems to increase the lag.
Edit:
The problem seems to be caused by page faults. Adjusting process working set with SetProcessWorkingSetSizeEx() doesn't help.
Edit:
With some large models the problem is easy to spot in the GPU memory usage shown by the procexp utility. Memory usage is very unstable when there are many glCallList calls per frame. No new geometry is added and no textures are loaded, but the GPU memory allocation fluctuates by ±20 MB. After a while it becomes even worse, and may allocate something like 150 MB in one go.

I believe that what you are seeing is the debugger locking some pages so that they cannot be swapped out and remain immediately accessible to the debugger. This brings some caveats for the OS at process-switch time and is, in general, not recommended.
You will probably not like to hear me saying this, but there is no good way to fix this, even if you do.
Use VBOs, or at least vertex arrays, those can be expected to be optimized much better in the driver (let's face it - display lists are getting obsolete). Display lists can be easily wrapped to generate vertex buffers so only a little of the old code needs to be modified. Also, you can use "bindless graphics" which was designed to avoid page faults in the driver (GL_EXT_direct_state_access).
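As a rough illustration of wrapping display-list style code so that it produces vertex buffer data, a small recorder can stand in for the glBegin/glVertex3f/glEnd calls and collect the vertices instead of submitting them. The class below is hypothetical and assumes every primitive is an independent triangle (GL_TRIANGLES). Strips and fans would need to be kept apart or converted first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for glBegin/glVertex3f/glEnd that records geometry
// into a flat x, y, z array instead of sending it to the driver.
class GeometryRecorder {
public:
    void begin() {
        assert(!recording_);  // like glBegin, begin/end pairs may not nest
        recording_ = true;
    }
    void vertex3f(float x, float y, float z) {
        assert(recording_);   // vertices are only valid between begin and end
        data_.push_back(x);
        data_.push_back(y);
        data_.push_back(z);
    }
    void end() {
        assert(recording_);
        recording_ = false;
    }
    std::size_t vertexCount() const { return data_.size() / 3; }
    const std::vector<float>& data() const { return data_; }

private:
    bool recording_ = false;
    std::vector<float> data_;
};

// Once recording is finished, data() can be uploaded a single time with
// glBufferData and drawn with
//   glDrawArrays(GL_TRIANGLES, 0, static_cast<GLsizei>(recorder.vertexCount()));
// in place of the glCallList call.
```

The old drawing code then only needs its gl* calls redirected to the recorder, which keeps the change to existing code small.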

Do you have an nVidia graphics card by any chance? nVidia OpenGL appears to use a different implementation when attached to the debugger. For me, the non-debugger version is leaking memory at up to 1 MB/sec in certain situations where I draw to the front buffer and don't call glClear each frame. The debugger version is absolutely fine.
I have no idea why it needs to allocate and (sometimes) deallocate so much memory for a scene that's not changing.
And I'm not using display lists.

It's probably the thread or process priority. Visual Studio might launch your process with a slightly higher priority to make sure the debugger is responsive. Try using SetPriorityClass() in your app's code:
SetPriorityClass(GetCurrentProcess(), ABOVE_NORMAL_PRIORITY_CLASS);
The 'above normal' class just nudges it ahead of everything else with the 'normal' class. As the documentation says, don't slap on a super high priority or you can screw up the system's scheduler.
In an app running at 60 fps you only get 16ms to draw a frame (less at 80 fps!) - if it takes longer you drop the frame which can cause a small dip in framerate. If your app has the same priority as other apps, it's relatively likely another app could temporarily steal the CPU for some task and you drop a few frames or at least miss your 16 ms window for the current frame. The idea is boosting the priority slightly means Windows comes back to your app more often so it doesn't drop as many frames.
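The frame budget arithmetic above can be written down directly. This is only a sketch for checking measured frame times against the budget. The function names are mine, and the frame times in the example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Time available per frame, in milliseconds, at a target frame rate.
inline double frameBudgetMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Number of measured frame times that exceeded the budget for `fps`.
inline std::size_t countMissedFrames(const std::vector<double>& frameTimesMs, double fps) {
    const double budget = frameBudgetMs(fps);
    std::size_t missed = 0;
    for (double t : frameTimesMs) {
        if (t > budget) ++missed;
    }
    return missed;
}
```

At 60 fps the budget is about 16.7 ms and at 80 fps it is 12.5 ms, so a frame that takes 17 ms fits neither. Logging frame times with and without the priority change should show whether the number of missed frames actually goes down.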

What can cause a reduction in frame rate when upgrading a graphics card?

We have a two-screen DirectX application that previously ran at a consistent 60 FPS (the monitors' sync rate) using a NVIDIA 8400GS (256MB). However, when we swapped out the card for one with 512 MB of RAM the frame rate struggles to get above 40 FPS. (It only gets this high because we're using triple-buffering.) The two cards are from the same manufacturer (PNY). All other things are equal, this is a Windows XP Embedded application and we started from a fresh image for each card. The driver version number is 169.21.
The application is all 2D, i.e. just a bunch of textured quads and a whole lot of pre-rendered graphics (hence the need to upgrade the card's memory). We also have compressed animations which the CPU decodes on the fly, and this involves a texture lock. The locks take forever, but I've also tried having a separate system-memory texture for the CPU to update and then updating the rendered texture using the device's UpdateTexture method. That made no overall difference in performance.
Although I've read through every FAQ I can find on the internet about DirectX performance, this is still the first time I've worked on a DirectX project so any arcane bits of knowledge you have would be useful. :)
One other thing whilst I'm on the subject: when calling Present on the swap chains, DirectX seems to wait for the present to complete even though I'm using D3DPRESENT_DONOTWAIT in both the present parameters (PresentationInterval) and the flags of the call itself. Because this is a two-screen application, this is a problem, as the two monitors do not appear to be genlocked. I'm working around it by running the Present calls through a thread pool. What could the underlying cause of this be?

Are the cards exactly the same (both GeForce 8400GS), with only the memory size differing? Quite often, different memory sizes come with slightly different clock rates (i.e. your card with more memory might use slower memory!).
So the first thing to check would be GPU core & memory clock rates, using something like GPU-Z.

It's easy to test whether the surface lock is the problem: just comment out the texture update and see if the frame rate returns to 60 Hz. Unfortunately, writing to a locked surface and updating the resource kills performance, and always has. Are you using mipmaps with the textures? I know DX9 added automatic generation of mipmaps, and generating those could be taking up a lot of time. If you're constantly locking the same resource each frame, you could also try creating a pool of textures, kind of like triple-buffering except with textures. You would let the renderer use one texture, and on the next update you pick the next available texture in the pool that is not being used for rendering. Unless, of course, you're memory constrained or you're only making diffs to the animated texture.
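A minimal sketch of the texture pool idea, tracking only which slot is rendered from and which is written to. The class and its method names are hypothetical. In a real application each slot index would refer to one texture object in an array, and a pool of three is assumed to leave enough slack for the GPU to finish reading the previous slot.

```cpp
#include <cassert>
#include <cstddef>

// Round-robin bookkeeping for a pool of textures (hypothetical helper).
// The renderer samples from renderSlot() while the CPU locks and fills
// writeSlot(), so the two never touch the same texture in one frame.
class TexturePool {
public:
    explicit TexturePool(std::size_t count) : count_(count) {
        assert(count >= 2);  // need at least one slot to render and one to write
    }

    // Slot the renderer should use this frame.
    std::size_t renderSlot() const { return render_; }

    // Slot the CPU may lock and update next.
    std::size_t writeSlot() const { return (render_ + 1) % count_; }

    // Call once the write is complete: the freshly written slot becomes the
    // render slot, and the slot after it becomes the next write target.
    void advance() { render_ = writeSlot(); }

private:
    std::size_t count_;
    std::size_t render_ = 0;
};
```

Per animation update, the decoder would lock the texture at writeSlot(), fill it, unlock it, and call advance(). The draw code always binds the texture at renderSlot().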
