A mobile application that I'm working on is expanding in scope. The client would like to have actual 3D objects in a product viewer within the app that a potential customer/dealer could zoom and rotate. I'm concerned about bringing a model into an OpenGL environment within a mobile device.
My biggest concern is complexity. I've looked at some of the engineering models for the products and some of them contain more than 360K faces! Does anyone know of any guidelines that discuss how complex an object OpenGL is able to handle?
Does anyone know of any guidelines that discuss how complex an object OpenGL is able to handle?
OpenGL is just a specification and doesn't deal with geometrical complexity. BTW: OpenGL doesn't treat geometry as coherent objects. For OpenGL it's just a bunch of loose points, lines or triangles that it throws (i.e. renders) to a framebuffer, one at a time.
Any considerations regarding performance only make sense with respect to an actual implementation. For example, a low-end GPU may process as few as 500k vertices per second, while high-end GPUs process tens of millions of vertices per second with ease.
Related
I'm working on a data visualisation application where I need to draw about 20 different time series overlaid in 2D, each consisting of a few million data points. I need to be able to zoom and pan the data left and right to inspect it, and to place cursors on the data for measuring time intervals and examining data points. It's very important that when zoomed all the way out I can easily spot outliers in the data and zoom in to look at them, so averaging the data can be problematic.
I have a naive implementation using a standard GUI framework on Linux which is way too slow to be practical. I'm thinking of using OpenGL instead (testing on a Radeon RX480 GPU) with an orthographic projection. I searched around and it seems drawing line strips from VBOs might work, but I have no idea whether this is the best solution (i.e. would give me the best frame rate).
What is the best way to send data sets consisting of millions of vertices to the GPU, assuming the data does not change, and will be redrawn each time the user interacts with it (pan/zoom/click on it)?
In modern OpenGL (versions 3/4 core profile) VBOs are the standard way to transfer geometric / non-image data to the GPU, so yes you will almost certainly end up using VBOs.
Alternatives would be uniform buffers, or texture buffer objects, but for the application you're describing I can't see any performance advantage in using them - might even be worse - and it would complicate the code.
The single biggest speedup will come from having all the data points stored on the GPU instead of being copied over each frame as a 2D GUI might be doing. So do the simplest thing that works and then worry about speed.
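To make that concrete, here is a minimal sketch, assuming OpenGL 3.3 core and a loader such as GLEW; the names upload_series, draw_series, points and num_points are placeholders for this example, not anything from your code. It uploads one time series once and draws it as a line strip each frame:

    #include <GL/glew.h>   /* or any other GL loader */
    #include <stddef.h>

    static GLuint vao, vbo;

    /* Upload a static time series once as a VBO (interleaved x,y floats). */
    void upload_series(const float *points, size_t num_points)
    {
        glGenVertexArrays(1, &vao);
        glGenBuffers(1, &vbo);

        glBindVertexArray(vao);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        /* GL_STATIC_DRAW: written once, drawn many times */
        glBufferData(GL_ARRAY_BUFFER, num_points * 2 * sizeof(float),
                     points, GL_STATIC_DRAW);

        /* attribute 0 = vec2 position (x = time, y = value) */
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, (const void *)0);
        glBindVertexArray(0);
    }

    /* Draw the series each frame without re-uploading anything. */
    void draw_series(size_t num_points)
    {
        glBindVertexArray(vao);
        glDrawArrays(GL_LINE_STRIP, 0, (GLsizei)num_points);
        glBindVertexArray(0);
    }

Panning and zooming then only change the orthographic projection you already planned to use; the vertex data itself never leaves the GPU.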
If you are new to OpenGL, I recommend the book "OpenGL SuperBible" 6th or 7th edition. There are many good online OpenGL guides and references, just make sure you avoid the older ones written for OpenGL 1/2.
Hope this helps.
I'm wondering how graphics (or game) engines do their job with lots of heterogeneous data, while a simple customized rendering loop turns into a nightmare as soon as you need a few small changes.
Example:
First, let's say we have some blocks in our scene.
Graphics engine: create cubes and move them.
Customized: create a cube template (vertices, normals, etc.), copy and translate it to each position, and pack everything into e.g. a VBO. One glDraw* call does the job.
Second, some weird logic: we want blocks 1, 4, 7, ... to rotate around the x-axis, 2, 5, 8, ... around the y-axis and 3, 6, 9 around the z-axis, with a rotation speed proportional to the camera distance.
Graphics engine: manipulate each object's matrix and it works.
Customized: (I think) one glDraw* call per object with a changing model-matrix uniform is not a good idea, so the transformation matrix should be something like an attribute? Either way, I have to update them every frame.
Third, a block should disappear if its distance to the camera is lower than some constant value Q.
Graphics engine: if (object.distance(camera) < Q) scene.drop(object);
Customized: (I think) our VBO is now invalid and we have to rebuild it?
Back to my very first sentence: it feels like engines do those manipulations for free, while we have to rethink how to provide and update our data. And while we do so, the engine might (but I actually don't know) say: 'update whatever you want, I'm going to send all the matrices anyway'.
Another example: what about a voxel-based world (e.g. Minecraft) where we only draw the visible surface, and the player can throw a bomb and destroy many voxels? If the world's view data sits in one huge buffer we only need one glDraw* call, but then we have to recreate that buffer every time. With smaller chunks we need many glDraw* calls, but the buffers we have to touch are smaller.
So is it a good trade to send, say, 10MB of buffer update data instead of 2 gl* calls with 1MB each? How many updates are okay? Should a rendering loop deal with lazy updates?
I'm searching for a guideline on what a 60fps application should be able to update/draw per frame, to get a feeling for what is possible. In my tests, every optimization attempt just reveals another bottleneck.
And I don't want those tutorials that say: hey, there is a cool new gl*Instanced call which is super fast, but you have to check whether your GPU supports it. I'd rather consider that an optimization than a meaningful first implementation anyway.
Do you have any ideas, sources, best practices or rules of thumb for how a rendering and updating routine best play together?
My questions are all nearly the same:
How many updates per frame are okay on today's hardware?
Can I lazy-load data so it becomes available after a few frames, without freezing my application?
Do I have to make small updates and profile my loop to see whether a few microseconds are left before the next render?
Maybe I should implement a real-time profiler that builds up a sense over time of how expensive updates are, and use that to determine the number of updates per frame?
Thank you.
It's unclear how any of your questions relate to your "graphics engine" vs. "customized" examples. All the updates you do with a "graphics engine" are translated into those OpenGL calls in the end.
In brief:
How many updates per frame are okay on today's hardware?
Today's PCIe bandwidth is huge (it can go as high as 30 GB/s). However, to utilize it fully you have to reduce the number of transactions by consolidating OpenGL calls. The exact number of updates depends entirely on the hardware, the drivers, and the way you use them, and graphics hardware is diverse.
This is the kind of answer you didn't want to hear, but unfortunately you have to face the truth: to reduce the number of OpenGL calls you have to use the newer API versions. E.g. instead of setting each uniform individually, you are better off submitting a bunch of them through uniform buffer objects. Instead of submitting each model's MVP matrix individually, it's better to use instanced rendering. And so on.
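For illustration, here is a hedged sketch of what instanced rendering can look like in practice; it assumes OpenGL 3.3+, and cube_vao, matrix_vbo and attribute locations 3-6 are placeholders for this example, not part of any particular engine:

    #include <GL/glew.h>            /* or any other GL loader */

    typedef float mat4[16];         /* column-major 4x4 matrix */

    /* Draw many cubes with a single call by feeding each cube's model
       matrix as a per-instance vertex attribute. */
    void draw_cubes_instanced(GLuint cube_vao, GLuint matrix_vbo,
                              const mat4 *models, GLsizei num_cubes)
    {
        glBindVertexArray(cube_vao);

        /* upload (or re-upload) the per-instance matrices for this frame */
        glBindBuffer(GL_ARRAY_BUFFER, matrix_vbo);
        glBufferData(GL_ARRAY_BUFFER, num_cubes * sizeof(mat4),
                     models, GL_DYNAMIC_DRAW);

        /* a mat4 occupies four consecutive vec4 attribute slots (3..6 here) */
        for (int i = 0; i < 4; ++i) {
            glEnableVertexAttribArray(3 + i);
            glVertexAttribPointer(3 + i, 4, GL_FLOAT, GL_FALSE,
                                  sizeof(mat4), (const void *)(sizeof(float) * 4 * i));
            glVertexAttribDivisor(3 + i, 1);   /* advance once per instance */
        }

        /* 36 vertices = 12 triangles = one cube, repeated num_cubes times */
        glDrawArraysInstanced(GL_TRIANGLES, 0, 36, num_cubes);
        glBindVertexArray(0);
    }

The per-block axis/speed logic from the earlier example could even be evaluated in the vertex shader from gl_InstanceID and the camera position, so only a couple of uniforms change per frame.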
An even more radical approach would be to move to a lower-level (and newer) API, i.e. Vulkan, which aims to solve exactly this problem: the cost of submitting work to the GPU.
Can I lazy-load data so it becomes available after a few frames, without freezing my application?
Yes, you can upload buffer objects asynchronously. See Buffer Object Streaming for details.
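As a rough illustration, this is the buffer "orphaning" pattern described on that wiki page; stream_vbo, data and size are placeholders here:

    #include <string.h>        /* memcpy */
    #include <GL/glew.h>       /* or any other GL loader */

    /* Re-specify the buffer's storage each frame ("orphaning"): the driver
       can hand back fresh memory immediately while in-flight draws keep
       reading the old contents, so the CPU never stalls on the GPU. */
    void stream_upload(GLuint stream_vbo, const void *data, GLsizeiptr size)
    {
        glBindBuffer(GL_ARRAY_BUFFER, stream_vbo);
        glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);

        /* map without synchronizing against the GPU and fill the new storage */
        void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size,
                                     GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT);
        if (ptr) {
            memcpy(ptr, data, (size_t)size);
            glUnmapBuffer(GL_ARRAY_BUFFER);
        }
    }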
Do I have to make small updates and profile my loop to see whether a few microseconds are left before the next render?
Maybe I should implement a real-time profiler that builds up a sense over time of how expensive updates are, and use that to determine the number of updates per frame?
You don't need any of these if you do it asynchronously.
I've been reading up on how some OpenGL-based architectures manage their objects, in an effort to create my own lightweight engine based on my application's specific needs (please, no "why don't you just use this existing product" responses). One architecture I've been studying is Qt's Quick Scene Graph, and while it makes a lot of sense I'm confused about something.
According to their documentation, opaque primitives are ordered front-to-back and non-opaque primitives are ordered back-to-front. The ordering enables early z-rejection, hopefully eliminating the need to shade pixels that end up behind others. This seems to be a fairly common practice and I get it. It makes sense.
Their documentation also talks about how items that use the same Material can be batched together to reduce the number of state changes. That is, a shared shader program can be bound once and then multiple items rendered using the same shader. This also makes sense and I'm good with it.
What I don't get is how these two techniques work together. Say I have 3 different materials (let's just say they are all opaque, for simplicity) and 100 items that each use one of the 3 materials; then I could theoretically create 3 batches based on the materials. But what if my 100 items are at different depths in the scene? Would I then need to create more than 3 batches so that I can properly sort the items and render them front-to-back?
Based on what I've read about other engines, like Ogre 3D, both techniques seem to be used pretty regularly; I just don't understand how they are used together.
If you really have 3 distinct materials, you can only batch objects that end up next to each other in the sorted order. Sometimes the sorting can be relaxed for objects that do not overlap each other on screen, to minimize material switches.
The real "trick" behind all that how ever is to combine the materials. If the engine is able to create one single material out of the 3 source materials and use the shaders to properly apply the material settings to the different objects (mostly that is transforming the texture coordinates), everything can be batched and ordered at the same time. But if that is not possible the engine can't optimize it further and has to switch the material every now and then.
You don't have to merge every material in your scene. But if it's possible to combine those materials that frequently alternate with each other, that alone can improve performance a lot.
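As an illustration only (this is not Qt's actual implementation), one common compromise is to fold both criteria into a single sort key: material in the high bits so batches stay intact, quantized depth in the low bits so items within a batch are still drawn roughly front-to-back. Engines that favour strict early-z instead put depth in the high bits. The names below are made up for the sketch:

    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint64_t key;
        int      item;      /* index into the scene's item list */
    } DrawCommand;

    /* material id in bits 24 and up, 24-bit quantized depth in the low bits */
    static uint64_t make_key(uint32_t material_id, float depth, float far_plane)
    {
        float t = depth / far_plane;
        if (t < 0.0f) t = 0.0f;
        if (t > 1.0f) t = 1.0f;
        return ((uint64_t)material_id << 24) | (uint32_t)(t * 0xFFFFFF);
    }

    static int cmp_key(const void *a, const void *b)
    {
        uint64_t ka = ((const DrawCommand *)a)->key;
        uint64_t kb = ((const DrawCommand *)b)->key;
        return (ka > kb) - (ka < kb);
    }

    /* After filling an array of DrawCommands:
         qsort(commands, count, sizeof(DrawCommand), cmp_key);
       then walk the array and bind a material only when the material bits
       change; everything in between is one batch. */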
I'm working on a 3D engine suitable for very large scene display.
Apart from the rendering itself (frustum culling, occlusion culling, etc.), I'm wondering what the best solution for scene management is.
Data is given as a huge list of 3D meshes with no relation between them, so I can't generate portals, I think...
The main goal is to be able to run this engine on systems with little RAM (500MB-1GB), while the scenes loaded into it are very large and can contain millions of triangles, which leads to very intensive memory usage. I'm currently working with a loose octree, built at load time; it works well on small and medium scenes, but many scenes are simply too huge to fit entirely in memory, so here comes my question:
How would you handle loading and unloading chunks of the scene dynamically (and ideally seamlessly), and what would you base the load/unload decision on? If needed, I can create a custom file format, as scenes are exported using a custom exporter from well-known 3D authoring tools.
Important information: Many scenes can't be effectively occluded, because of their construction.
Example: a huge pipe network, so there isn't much occlusion but there is a very high number of elements.
I think that the best solution will be a "solution pack", a pack of different techniques.
Level of detail (LOD) can reduce the memory footprint if unused levels are not loaded. Levels can be switched more or less seamlessly by alpha-blending between the old and the new detail. The easiest controller uses the mesh's distance to the camera (see the sketch after this list).
Free host memory (RAM) once an object has been uploaded to the GPU (device), and obviously free all unused memory (OpenGL resources too). Valgrind can help you with this one.
Use low-quality meshes and use tessellation to increase visual quality.
Use indexed VBOs; this should reduce VRAM usage and increase performance.
Don't use meshes where possible: terrain can be rendered from heightmaps, and some things can be generated procedurally.
Use bump and/or normal maps. They improve visual quality, so you can reduce the vertex count.
Divide those "pipes" into different meshes.
Fake 3D meshes with 2D images: impostors, skydomes...
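As a hedged sketch of the distance-based controller mentioned above, chunk residency can be driven by camera distance with a bit of hysteresis so chunks don't thrash at the boundary. Chunk, request_load, release and the radii are placeholders for whatever the engine actually provides:

    #include <math.h>
    #include <stdbool.h>

    #define LOAD_RADIUS    200.0f   /* start loading when closer than this   */
    #define UNLOAD_RADIUS  300.0f   /* only unload once clearly out of range */

    typedef struct {
        float center[3];    /* center of the chunk's bounding volume   */
        bool  resident;     /* is the mesh data currently in RAM/VRAM? */
    } Chunk;

    void request_load(Chunk *c);   /* hypothetical: queue chunk for async loading */
    void release(Chunk *c);        /* hypothetical: free CPU and GPU resources    */

    static float dist(const float a[3], const float b[3])
    {
        float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return sqrtf(dx * dx + dy * dy + dz * dz);
    }

    void update_streaming(Chunk *chunks, int count, const float cam[3])
    {
        for (int i = 0; i < count; ++i) {
            float d = dist(chunks[i].center, cam);
            if (!chunks[i].resident && d < LOAD_RADIUS)
                request_load(&chunks[i]);
            else if (chunks[i].resident && d > UNLOAD_RADIUS)
                release(&chunks[i]);
        }
    }

The same distance can also drive LOD selection, and doing the actual loading on a background thread keeps the render loop from stalling.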
If the bulk of the RAM is going to be used by textures, there are commercial packages available, such as the GraniteSDK, that offer seamless LOD-based texture streaming using a virtual texture cache. See http://graphinesoftware.com/granite . Alternatively you can look at http://ir-ltd.net/
In fact you can use the same technique to construct polygons on the fly from texture data in the shader, but it's going to be a bit more complicated.
For voxels there are techniques to construct octrees entirely in GPU memory and page in/out only the parts you really need. Rendering can then be done using raycasting. See this post: Use octree to organize 3D volume data in GPU, http://www.icare3d.org/research/GTC2012_Voxelization_public.pdf and http://www.cse.chalmers.se/~kampe/highResolutionSparseVoxelDAGs.pdf
It comes down to how static the scene is going to be and, following from that, how well you can pre-bake the data for your visualization needs. It already helps if you can determine visibility constraints up front (e.g. google Potentially Visible Sets) and organize the data so that you can stream it on request. Since the visualizer will have limits, you always end up with a strategy for fitting a section of the data into GPU memory as quickly and accurately as possible.
My straight answer would be NO. But I am curious how they created this video http://www.youtube.com/watch?v=HC3JGG6xHN8
They used video editing software. They recorded two nearly deterministic run-throughs of their engine and spliced them together.
As for the question posed by your title, not within the same window. It may be possible within the same application from two windows, but you'd be better off with two separate applications.
Yes, it is possible. I did this as an experiment for a graduate course; I implemented half of a deferred shading graphics engine in OpenGL and the other half in D3D10. You can share surfaces between OpenGL and D3D contexts using the appropriate vendor extensions.
Does it have any practical applications? Not many that I can think of. I just wanted to prove that it could be done :)
I digress, however. That video is just a side-by-side of two separately recorded videos of the Haven benchmark running in the two different APIs.
My straight answer would be NO.
My straight answer would be "probably yes, but you definitely don't want to do that."
But I am curious how they created this video http://www.youtube.com/watch?v=HC3JGG6xHN8
They pre-rendered the footage and simply combined it in a video editor. Because the camera follows a fixed path, that is easy to do.
Anyway, you could render both scenes (DirectX/OpenGL) into offscreen buffers and then combine them, using either API, to render the final result. You would read the data back from the render buffer in one API and transfer it into a renderable buffer in the other API. The dumbest way to do that is through system memory (which will be VERY slow), but it is possible that some vendors (NVIDIA in particular) provide extensions for this scenario.
On Windows you could also place two child windows/panels side by side in the main window (so you'd get the same effect as in that YouTube video), and create an OpenGL context for one of them and a DirectX device for the other. Unless there's some restriction I'm not aware of, that should work, because in order to render 3D graphics you need a window with a handle (HWND). However, the two windows will be completely independent of each other and will not share resources, so you'll need twice the memory for textures alone to run them both.