Determine limit for data sent to VBO? - c++

I'm writing a 3d application for the Playbook, which has a PowerVR SGX 540. I noticed that if I stuff enough data in the VBO through opengl, I can cause the device to crash (not just the application, but the entire device, requiring a hard reboot). To cause the crash, I sent data for a model with ~300k triangles and ~150k vertices. I sent normal data for the vertices as well.
I found that the problem doesn't occur if I send less data (tried another model with half the triangles and vertices). Also, the issue doesn't occur if I use vertex arrays (though it's incredibly slow).
I'd like to know:
Is what I'm seeing a common result for mobile hardware? That is, is a 300k tri model with 150k vertices and normals overkill?
Can I check how much memory I have available for VBO usage outside of testing a bunch of different model sizes (it takes a good five minutes to recover the device from a crash)?
Could anything else be causing this issue? I've provided some additional information:
I'm using Qt for my GUI, and drawing the 3d scene to an FBO before it's painted to the GUI (I haven't checked if redoing all this without a UI by creating an EGL window and drawing to that recreates the problem yet -- that'll take awhile).
To verify it wasn't me using OpenGL poorly, I tried both using raw OpenGL calls for all the 3d stuff, and also doing everything with OpenSceneGraph. Both methods fail in the exact same way (VBO works with less data, vertex arrays work, increased VBO data causes a crash).
The program works fine on my desktop. Unfortunately, I don't have any other mobile devices I can test my application out on.

OpenGL ES only supports unsigned short (16 bit) as data type for indices, so if you're using an index array, you're over that limit.

Related

CPU/GPU Shared Buffer in Direct3D12

I have no experience with Direct3D, so I may just be looking in the wrong places. However, I would like to convert a program I have written in OpenGL (using FreeGLUT) to a Windows IoT compatible UWP (running Direct3D, 12 'caus it's cool). I'm trying to port my program to a Raspberry Pi 3 and I don't want to convert to Linux.
Through the examples provided by Microsoft I have figured out most of what I believe I need to know to get started, but I can't figure out how to share a dynamic data buffer between the CPU and GPU.
What I want to know how to do:
Create a CPU/GPU shared circular buffer
Read and Draw with the GPU
Write / Replace sections with the CPU
Quick semi-pseudo code:
while (!buffer.inUse()){ //wait until buffer is not in use
updateBuffer(buffer.id, data, start, end); //insert data into buffer
drawToScreen(buffer.id); //draw using vertex data in buffer
}
This was previously done in OpenGL by simply using glBegin()/glEnd() and glVertex3f() for each value in an array when it wasn't being written to.
Update: I basically want a Direct3D12 equivalent of OpenGLs VBO editing using glBufferSubData(). If that makes more sense.
Update 2: I found that I can get away with discarding the vertex buffer every frame and re-uploading a new buffer to the GPU. There's a fair amount of overhead, as one would expect with transferring 10,000 - 200,000 doubles every frame. So I'm trying to find a way to use constant buffers to port the 5-10 updated vertexes into the shader, so I can copy from the constant buffer into the vertex buffer using the shader and not have to use map/unmap every frame. This way my circular buffer on the CPU is independent of the buffer being used on the GPU, but they will both share the same information through periodic updates. I'll do some more looking and post another more specific question on shaders if I don't find a solution.

Copy or move a FrameBufferObject

I want to find a way to send all the geometry from an opengl framebuffer to a remote computer, who would do the rendering. This would allow me to have very complex simulations running on some kind of a big supercomputer, and rendered on a small mobile or simply cheap client machine doing the rendering.
Before starting digging in my code, I though it would be relatively easy: let's copy the vertex arrays and send it through the network, using boost::serialisation for example, and that's it. But my geometry are encapsulated, which prevents me from accessing it from where I want to.
I have been able to render into a framebuffer instead of rendering directly on screen though, and I was wondering if there is a way to retrieve data from OpenGL's fbo's in anyway?
First your terminology is wrong. Frame Buffer Objects are encapsulations of off-screen images/surfaces and don't hold geometry.
Second: What you imagine has been implemented already by the VirtualGL project (however it's stuck at a rather old OpenGL profile and doesn't support modern GPUs).
Also X11/GLX always supported indirect OpenGL operation, i.e. a remote machine would send OpenGL commands to the local display server, which is exactly what you probably think of. But this has a major drawback: Network bandwidth becomes the major bottleneck.

glDrawArray+VBO increasing memory footprint

I am writing a Windows based OpenGL viewer application.
I am using VBO + triangle strip + glDrawArrays method to render my meshes. Every thing is perfectly working on all machines.
In case of Windows Desktop with nVidia Quadro cards the working/peak working memory shoots when i first call glDrawArray.
While in case of laptops having nvidia mobile graphic cards the working memory or peak working memory does not shoot. Since last few days i am checking almost all forums/post/tuts about VBO memory issue. Tried all combinations of VBO like GL_STATIC_DRAW/DYNAMIC/STREAM, glMapbuffer/glunmapbuffer. But nothing stops shooting memory on my desktops.
I suspect that for VBO with ogl 1.5 i am missing some flags.
PS: I have almost 500 to 600 VBO's in my application. I am using array of structures ( i.e. v,n,c,t together in a structure). And I am not aligning my VBOs to 16k memory.
Can any one suggest me how I should go ahead to solve this issue. Any hints/pointers would be helpful.
Do you actually run out of memory or does your application increasingly consume memory? If not, why bother? If the OpenGL implementation keeps a working copy for itself, then this is probably for a reason. Also there's little you can do on the OpenGL side to avoid this, since it's entirely up to the driver how it manages its stuff. I think the best course of action, if you really want to keep the memory footprint low, is contacting NVidia, so that they can double check if this may be a bug in their drivers.

Best technique to handle vertices in OpenGL? C++

I am implementing a map renderer for Quake. I am currently running through the arrays of vertices and sending them one at a time. I was told that by using vertex arrays, I can greatly speed up the rendering process by sending vertices in a batch. Now, I have just looked at display lists and finally VBO's or vertex buffer objects. VBO's mention a great advantage in relation to client/server communication. IF I am just going to be developing a client and not a server, are VBO's still applicable to what I am doing?
What are games using currently in the OpenGL spectrum for quick vertex processing?
When they say "client/server" communication, they are not talking about a internet network.
Client = The CPU
Server = The GFX hardware
These are two separate pieces of hardware. While they are (usually) attached to the same motherboard, they still need to communicate with each other. A graphics card doesn't usually have access to the main memory on your motherboard, so vertices (and textures, indices etc.) need to actually be sent to the graphics device.
Short story: Use Vertex Buffer Objects. For static data (like a Quake map) there is nothing better. The vertices are sent to the graphics device (server) once and stay there. When you draw things vertex by vertex (using something like glBegin(GL_TRIANGLES)) the vertices are sent across every frame, which, as you can imagine, is pretty inefficient.
VBOs are the modern answer.
Do not misunderstand the terms "client" and "server" in the OpenGL-sense. This is not the difference between Apache (HTTP server) and Chrome (HTTP client, among other things). In OpenGL terms, the client is you/your code. The server is the graphics driver/hardware.
VBOs allow you to store vertex information on the graphics hardware, which can let you avoid sending each vertex to the card every time you use it. This is a huge advantage.
You ask about vertex processing, but to answer your question we'd have to clarify what you meant. If you mean getting vertices to the graphics card, the answer is VBOs. If we're talking about manipulating vertex data this can efficiently be achieved through Shaders.

What can cause a reduction in frame rate when upgrading a graphics card?

We have a two-screen DirectX application that previously ran at a consistent 60 FPS (the monitors' sync rate) using a NVIDIA 8400GS (256MB). However, when we swapped out the card for one with 512 MB of RAM the frame rate struggles to get above 40 FPS. (It only gets this high because we're using triple-buffering.) The two cards are from the same manufacturer (PNY). All other things are equal, this is a Windows XP Embedded application and we started from a fresh image for each card. The driver version number is 169.21.
The application is all 2D. I.E. just a bunch of textured quads and a whole lot of pre-rendered graphics (hence the need to upgrade the card's memory). We also have compressed animations which the CPU decodes on the fly - this involves a texture lock. The locks take forever but I've also tried having a separate system memory texture for the CPU to update and then updating the rendered texture using the device's UpdateTexture method. No overall difference in performance.
Although I've read through every FAQ I can find on the internet about DirectX performance, this is still the first time I've worked on a DirectX project so any arcane bits of knowledge you have would be useful. :)
One other thing whilst I'm on the subject; when calling Present on the swap chains it seems DirectX waits for the present to complete regardless of the fact that I'm using D3DPRESENT_DONOTWAIT in both present parameters (PresentationInterval) and the flags of the call itself. Because this is a two-screen application this is a problem as the two monitors do not appear to be genlocked, I'm working around it by running the Present calls through a threadpool. What could the underlying cause of this be?
Are the cards exactly the same (both GeForce 8400GS), and only the memory size differ? Quite often with different memory sizes come slightly different clock rates (i.e. your card with more memory might use slower memory!).
So the first thing to check would be GPU core & memory clock rates, using something like GPU-Z.
It's an easy test to see if the surface lock is the problem, just comment out the texture update and see if the framerate returns to 60hz. Unfortunately, writing to a locked surface and updating the resource kills perfomance, always has. Are you using mipmaps with the textures? I know DX9 added automatic generation of mipmaps, could be taking up a lot of time to generate those. If your constantly locking the same resource each frame, you could also try creating a pool of textures, kinda like triple-buffering except with textures. You would let the render use one texture, and on the next update you pick the next available texture in the pool that's not being used in to render. Unless of course your memory constrained or your only making diffs to the animated texture.