3D Picking in OpenGL

Is there a "standard" method for 3d picking? What do most game companies do? (for accurate picking)
I thought the fastest way is to use the GPU and render every object with a "color index", then use glReadPixels(), but then I heard that it's considered slow because of the glFlush() and glFinish() calls it forces.
There's also this ray casting approach, which is nice but isn't accurate because of the spheres/AABBs approximations.

Any question about what is "standard" is probably going to invoke some opinionated responses, but I would suggest that the closest to "standard" here is raycasting.
Take your watertight ray/triangle intersection function and test a ray that is unprojected from your mouse cursor position against the triangles in your scene.
Naively this would be quite slow, requiring time linear in the number of triangles. So the next step is to accelerate it to something better, like logarithmic time. This is typically achieved with a data structure such as an octree, BVH, K-D tree, or BSP. Sometimes people skip this step and just try to make the ray/tri intersection really fast and really parallel, possibly even using GPGPU.
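As a hedged illustration of the narrow-phase test (the Vec3 type and helpers below are placeholders for whatever math library you already use, e.g. glm, and this is the classic Möller-Trumbore formulation rather than a fully watertight variant), the core ray/triangle intersection looks roughly like this; picking then unprojects the cursor into a world-space ray (for instance with glm::unProject, or by transforming NDC points by the inverse view-projection matrix) and keeps the closest hit over all triangles tested.

// Minimal sketch of a Moller-Trumbore style ray/triangle intersection.
// Vec3 and the helpers stand in for your own math types.
struct Vec3 { float x, y, z; };
static Vec3  sub(Vec3 a, Vec3 b)   { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  cross(Vec3 a, Vec3 b) { return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x}; }
static float dot(Vec3 a, Vec3 b)   { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Returns true and the distance t along the ray if the ray (orig, dir) hits
// the triangle (v0, v1, v2); the closest t over all tested triangles wins.
bool rayTriangle(Vec3 orig, Vec3 dir, Vec3 v0, Vec3 v1, Vec3 v2, float& t)
{
    const float eps = 1e-7f;
    Vec3 e1 = sub(v1, v0);
    Vec3 e2 = sub(v2, v0);
    Vec3 p  = cross(dir, e2);
    float det = dot(e1, p);
    if (det > -eps && det < eps) return false;   // ray is parallel to the triangle
    float inv = 1.0f / det;
    Vec3 s = sub(orig, v0);
    float u = dot(s, p) * inv;
    if (u < 0.0f || u > 1.0f) return false;      // outside first barycentric bound
    Vec3 q = cross(s, e1);
    float v = dot(dir, q) * inv;
    if (v < 0.0f || u + v > 1.0f) return false;  // outside second barycentric bound
    t = dot(e2, q) * inv;
    return t > eps;                              // hit must be in front of the origin
}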
It takes a lot more work upfront than framebuffer-based solutions, but complex applications tend to go this route probably because:
Portability: it's decoupled from the rendering engine. It doesn't have to be tied to OpenGL or DirectX, for example, and that improves portability.
Generality: typically the accelerator and associated queries are needed for other things. For example, an FPS game might have players and enemies constantly shooting at each other. Figuring out what projectiles hit what tends to require these kinds of intersection queries occurring constantly, and not just from a uniform viewing angle.
Simplicity: the developers can afford the extra work upfront to simplify things later on.
"There's also this ray casting approach, which is nice but isn't accurate because of the spheres/AABBs approximations."
There should be nothing inaccurate about using AABBs or bounding spheres for acceleration purposes. They exist purely to speed up the tests: a cheap check against a bounding volume eliminates a large batch of triangles at once and reduces the number of the more costly ray/triangle intersections that need to occur. Normally they should be constructed to encompass the elements in the scene: do a ray/AABB intersection first, and only if that hits, test the elements encompassed within the AABB. Any acceleration structure that doesn't give the same results as brute force without the accelerator would typically be a glitchy one.
For example, a very basic form of acceleration is just to put a bounding box around one mesh element in a scene, like a character. Sometimes this basic form, without involving a full-blown accelerator, is useful for very dynamic elements in the scene (to avoid the cost of constantly updating the accelerator). If the ray intersects the character's bounding box, then check all the triangles making up the character. As long as you check the triangles within the AABB afterwards, it becomes about acceleration rather than approximation. Of course, if you only checked the AABB and nothing else, it would be a crude approximation.
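To make the "acceleration, not approximation" point concrete, here is a hedged sketch of the standard slab-based ray/AABB test (reusing the placeholder Vec3 from the snippet above; <algorithm> provides std::min/std::max). If it misses, you skip every triangle inside the box; if it hits, you fall through to the exact ray/triangle tests, so the final answer is unchanged.

// Slab test: does the ray (origin orig, per-component inverse direction invDir)
// intersect the box [bmin, bmax]? invDir is precomputed as 1/dir so that
// IEEE infinities handle rays parallel to an axis.
bool rayAABB(Vec3 orig, Vec3 invDir, Vec3 bmin, Vec3 bmax)
{
    float t1 = (bmin.x - orig.x) * invDir.x, t2 = (bmax.x - orig.x) * invDir.x;
    float tmin = std::min(t1, t2), tmax = std::max(t1, t2);

    t1 = (bmin.y - orig.y) * invDir.y; t2 = (bmax.y - orig.y) * invDir.y;
    tmin = std::max(tmin, std::min(t1, t2));
    tmax = std::min(tmax, std::max(t1, t2));

    t1 = (bmin.z - orig.z) * invDir.z; t2 = (bmax.z - orig.z) * invDir.z;
    tmin = std::max(tmin, std::min(t1, t2));
    tmax = std::min(tmax, std::max(t1, t2));

    return tmax >= std::max(tmin, 0.0f);   // slab intervals overlap in front of the ray
}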

Related

Ray tracing via Compute Shader vs Screen Quad

I was recently looking for ray tracing via OpenGL tutorials. Most of the tutorials prefer compute shaders. I wonder why they don't just render to a texture, then render the texture to the screen as a quad.
What are the advantages and disadvantages of the compute shader method over the screen quad?
Short answer: because compute shaders give you more effective tools to perform complex computations.
Long answer:
Perhaps the biggest advantage that they afford (in the case of tracing) is the ability to control exactly how work is executed on the GPU. This is important when you're tracing a complex scene. If your scene is trivial (e.g., Cornell Box), then the difference is negligible. Trace some spheres in your fragment shader all day long. Check http://shadertoy.com/ to witness the madness that can be achieved with modern GPUs and fragment shaders.
But. If your scene and shading are quite complex, you need to control how work is done. Rendering a quad and doing the tracing in a frag shader is going to, at best, make your application hang while the driver cries, changes its legal name, and moves to the other side of the world...and at worst, crash the driver. Many drivers will abort if a single operation takes too long (which virtually never happens under standard usage, but will happen awfully quickly when you start trying to trace 1M poly scenes).
So you're doing too much work in the frag shader...next logical thought? OK, limit the workload. Draw smaller quads to control how much of the screen you're tracing at once, or use glScissor. Make the workload smaller and smaller until your driver can handle it.
Guess what we've just re-invented? Compute shader work groups! Work groups are compute shader's mechanism for controlling job size, and they're a far better abstraction for doing so than fragment-level hackery (when we're dealing with this kind of complex task). Now we can very naturally control how many rays we dispatch, and we can do so without being tightly-coupled to screen-space. For a simple tracer, that adds unnecessary complexity. For a 'real' one, it means that we can easily do sub-pixel raycasting on a jittered grid for AA, huge numbers of raycasts per pixel for pathtracing if we so desire, etc.
Other features of compute shaders that are useful for performant, industrial-strength tracers:
Shared memory within a work group (allows, for example, packet tracing, wherein an entire packet of spatially-coherent rays is traced at the same time to exploit memory coherence and the ability to communicate with nearby rays)
Scatter Writes allow compute shaders to write to arbitrary image locations (note: image and texture are different in subtle ways, but the advantage remains relevant); you no longer have to trace directly from a known pixel location
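To make the work-group point concrete, here is a hedged host-side sketch (traceProgram and outputTex are hypothetical handles created elsewhere, the GL types and functions come from whatever loader you use, and the compute shader is assumed to declare layout(local_size_x = 8, local_size_y = 8) and to write its result with imageStore): the image is bound for scatter writes and the ray-generation workload is carved into 8x8 tiles by glDispatchCompute.

// Hypothetical dispatch of a tracing compute shader over the whole image.
void dispatchTrace(GLuint traceProgram, GLuint outputTex, int width, int height)
{
    glUseProgram(traceProgram);

    // Bind the output image for scatter writes (imageStore in the shader).
    glBindImageTexture(0, outputTex, 0, GL_FALSE, 0, GL_WRITE_ONLY, GL_RGBA32F);

    // One work group covers an 8x8 tile of rays; round up so the tiles cover
    // the full image even when the resolution isn't a multiple of 8.
    GLuint groupsX = (GLuint)((width  + 7) / 8);
    GLuint groupsY = (GLuint)((height + 7) / 8);
    glDispatchCompute(groupsX, groupsY, 1);

    // Make the image writes visible before the texture is sampled or blitted.
    glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
}

Splitting the frame into several smaller dispatches like this is also an easy way to keep each individual submission short enough that the driver's watchdog stays happy.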
In general, the architecture of modern GPUs is designed to support this kind of task more naturally using compute. Personally, I have written a real-time progressive path tracer using MLT, kd-tree acceleration, and a number of other computationally expensive techniques (PT is already extremely expensive). I tried to remain in a fragment shader / full-screen quad as long as I could. Once my scene was complex enough to require an acceleration structure, my driver started choking no matter what hackery I pulled. I re-implemented in CUDA (not quite the same as compute, but leveraging the same fundamental GPU architectural advances), and all was well with the world.
If you really want to dig in, have a glance at section 3.1 here: https://graphics.cg.uni-saarland.de/fileadmin/cguds/papers/2007/guenther_07_BVHonGPU/Guenter_et_al._-_Realtime_Ray_Tracing_on_GPU_with_BVH-based_Packet_Traversal.pdf. Frankly the best answer to this question would be an extensive discussion of GPU micro-architecture, and I'm not at all qualified to give that. Looking at modern GPU tracing papers like the one above will give you a sense of how deep the performance considerations go.
One last note: any performance advantage of compute over frag in the context of raytracing a complex scene has absolutely nothing to do with rasterization / vertex shader overhead / blending operation overhead, etc. For a complex scene with complex shading, bottlenecks are entirely in the tracing computations, which, as discussed, compute shaders have tools for implementing more efficiently.
I am going to add to Josh Parnell's information.
One problem with both the fragment shader and the compute shader is that they lack recursion.
A ray tracer is recursive by nature (yes, I know it is always possible to transform a recursive algorithm into a non-recursive one, but it is not always easy to do).
So another way to see the problem could be the following:
Instead of having "one thread" per pixel, one idea could be to have one thread per path (a path being the segment of your ray between two bounces).
Going that way, you dispatch over your "bunch" of rays instead of over your "pixel grid". Doing so simplifies the potential recursion of the ray tracer and avoids divergence in complex materials.
More information here:
http://research.nvidia.com/publication/megakernels-considered-harmful-wavefront-path-tracing-gpus

Level object collision

I'm using SDL2 for my game.
I have an std::vector of SDL_Rects (that is, rectangle objects) that holds the solid platforms (i.e. platforms that the player can't go through) of a level in my game.
When checking for collision, my current code does the following:
for (SDL_Rect rect : rects) {
    if (player.collides(rect)) {
        // handle collision
    }
}
Suppose I have a level with numerous (e.g. 500) solid platform rectangles: is it inefficient to go through all of them and check for collision? Is there a better way to do this?
The collides() function only checks for AABB collision (4 simple conditions).
I think this is reasonable. You have simple shapes and are doing simple collision checking. Imagine a more graphically intense game. Even then, they may have a complex skeletal mesh for the character, but just do the collision checking against an easy-to-calculate bounding shape, and they probably have a lot more than 500 things going on at once.
In a more complex game engine, different types may be blocking against some types and non-blocking against others, so not only will it be checking for simple overlap events, it has to know if the overlapping objects should interact or not. Or there might be different interactions for different objects.
With games, by and large your bottleneck is rendering and the associated calculations, so unless you know you're in danger of doing something incredibly slowly with the game logic (complex path finding or AI or something like that), I would concentrate my optimizing efforts on the rendering.
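For reference, here is a hedged sketch of the kind of 4-condition check the question's collides() presumably performs (rectsOverlap is just a placeholder name; SDL2's own SDL_HasIntersection does essentially the same test). A few hundred of these per frame is negligible work.

#include <SDL.h>   // for SDL_Rect; adjust the include path to your setup

// Two axis-aligned rectangles overlap unless one lies entirely to one side
// of, above, or below the other.
bool rectsOverlap(const SDL_Rect& a, const SDL_Rect& b)
{
    return a.x < b.x + b.w &&   // a's left edge is left of b's right edge
           b.x < a.x + a.w &&   // b's left edge is left of a's right edge
           a.y < b.y + b.h &&   // a's top edge is above b's bottom edge
           b.y < a.y + a.h;     // b's top edge is above a's bottom edge
}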

Approach to sphere sphere collision

I am trying to implement sphere-sphere collision. I understand the math behind it. However, I am still looking around at tutorials to see if there are better and faster approaches. I came across NeHe's collision detection tutorial ( http://nehe.gamedev.net/tutorial/collision_detection/17005/ ). In this tutorial, if I understood correctly, he checks whether two spheres collide within a frame and tries not to miss the collision by first checking whether their paths intersect, and then simulates the impact accordingly.
My approach was to check every frame whether the spheres are colliding and be done with it. I didn't consider checking intersecting paths, etc. Now I am somewhat confused about how to approach the problem.
My question is: is it really necessary to be that careful and check whether we missed the collision within a frame?
When writing collision detection algorithms, it is important to recognize that objects move in discrete time steps, unlike in the real world. In a typical modern game, objects will be moving with a time step of about 0.016 seconds per frame (often with smaller fixed or variable time steps).
It is possible for two spheres moving with very high velocities to pass through each other during a single frame and not be within each other's bounding spheres after integration is performed. This problem is called tunneling, and there are multiple ways to approach it, each with varying levels of complexity and cost. Some options are swept volumes and Minkowski addition.
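As a hedged sketch of the swept-volume idea (the names are placeholders, and Vec3 with its sub/dot helpers is assumed from your own math library): work in the frame of one sphere so the other moves along a segment during the frame, and solve the resulting quadratic for the earliest time of contact.

#include <cmath>   // std::sqrt

// Continuous sphere-sphere test over one frame of length dt.
// Returns true and the normalized time of impact t in [0, 1] if the spheres
// (centers pa/pb, radii ra/rb, velocities va/vb) touch during the frame.
// Vec3, sub and dot are the usual tiny vector helpers from your math library.
bool sweptSpheres(Vec3 pa, Vec3 va, float ra,
                  Vec3 pb, Vec3 vb, float rb,
                  float dt, float& t)
{
    Vec3  s = sub(pb, pa);                            // relative position
    Vec3  v = { (vb.x - va.x) * dt, (vb.y - va.y) * dt,
                (vb.z - va.z) * dt };                 // relative motion this frame
    float r = ra + rb;

    float c = dot(s, s) - r * r;
    if (c < 0.0f) { t = 0.0f; return true; }          // already overlapping
    float a = dot(v, v);
    if (a < 1e-8f) return false;                      // no relative motion
    float b = dot(v, s);
    if (b >= 0.0f) return false;                      // moving apart
    float disc = b * b - a * c;
    if (disc < 0.0f) return false;                    // paths never get close enough
    t = (-b - std::sqrt(disc)) / a;
    return t >= 0.0f && t <= 1.0f;                    // impact within this frame
}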
Choosing the right algorithm depends on your application. How precise does the collision need to be? Is it vital to your application, or can you get away with some false negatives/positives? Typically, the more precise the collision detection, the more you pay in performance.

Regarding object managers

I am designing a class that manages a collection of rectangles.
Each rectangle represents a button, so contains the following properties:
x position
y position
width
height
a callback function for what happens when it is pressed.
The concept itself is fairly straightforward and is managed through a command-line interface. In particular, if I type "100, 125", it looks up whether there is a rectangle (or several) that contains this point and performs their callback functions.
My proposal is to iterate over all rectangles in this collection and perform the callback of each individual rectangle which contains this point, or stop upon the first rectangle that matches (simulating z order).
I fear however that this solution is sloppy, as this iteration becomes longer the more rectangles I have.
This is fine for the console application: it can easily go over 10,000 rectangles and find which ones match, and although that computation is expensive, time-wise it is not really an issue there.
The issue is that if I were to implement this algorithm in a GUI application, which needs to perform this check every time the mouse is moved (to simulate a mouse-over effect), moving the mouse just 10 pixels over a panel with 10,000 objects would already require 100,000 checks, and that is nowhere near the 1,000-plus pixels people routinely sweep the mouse across.
Is there a more elegant solution to this issue, or will such programs always need to be so expensive?
Note: I understand that most GUIs do not have to deal with 10,000 active objects at once, but that is not my objective.
The reason I choose to explain this issue in terms of buttons is because they are simpler. Ideally, I would like a solution which would be able to work in GUIs as well as particle systems which interact with the mouse, and other demanding systems.
In a GUI, I can easily use indirection to reduce the number of checks drastically, but this does not alleviate the issue of needing to perform checks every single time the mouse is moved, which can be quite demanding even with only 25 buttons, since moving over 400 pixels with 25 objects (in ideal conditions) would be as bad as moving 1 pixel with 10,000 objects.
In a nutshell, this issue is twofold:
What can I do to reduce the number of checks below 10,000 per query (given that there are 10,000 objects)?
Is it possible to reduce the number of checks required in such a GUI application from every mouse move to something more reasonable?
Any help is appreciated!
There are any number of 2D-intersection acceleration structures you could apply. For example, you could use a quadtree (http://en.wikipedia.org/wiki/Quadtree) to recursively divide the viewport into quadrants. Subdivide each quadrant that doesn't fall entirely within or entirely outside every rectangle, and at each leaf store a pointer either to the topmost rectangle or to the list of rectangles covering it (or NULL if no rectangles land there). The structure isn't trivial, but it's fairly straightforward conceptually.
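A hedged sketch of the idea (Rect, the depth limit, and every name here are placeholders, and for brevity it stores straddling rectangles at internal nodes rather than strictly at leaves): each rectangle lives at the deepest node that fully contains it, so a point query only walks one root-to-leaf path and tests the rectangles stored along it.

#include <memory>
#include <vector>

struct Rect { int x, y, w, h; };

static bool containsPoint(const Rect& r, int px, int py) {
    return px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h;
}
static bool containsRect(const Rect& outer, const Rect& inner) {
    return inner.x >= outer.x && inner.y >= outer.y &&
           inner.x + inner.w <= outer.x + outer.w &&
           inner.y + inner.h <= outer.y + outer.h;
}

struct QuadNode {
    Rect bounds;
    std::vector<int> items;               // indices of rectangles stored at this node
    std::unique_ptr<QuadNode> child[4];

    explicit QuadNode(Rect b) : bounds(b) {}

    void insert(const std::vector<Rect>& rects, int index, int depth = 0) {
        if (depth < 6) {                   // arbitrary depth limit for the sketch
            if (!child[0]) subdivide();
            for (auto& c : child) {
                if (containsRect(c->bounds, rects[index])) {
                    c->insert(rects, index, depth + 1);   // fits entirely in one child
                    return;
                }
            }
        }
        items.push_back(index);            // straddles children (or max depth): keep here
    }

    void query(const std::vector<Rect>& rects, int px, int py,
               std::vector<int>& hits) const {
        for (int i : items)
            if (containsPoint(rects[i], px, py)) hits.push_back(i);
        for (const auto& c : child)
            if (c && containsPoint(c->bounds, px, py))
                c->query(rects, px, py, hits);   // at most one child contains the point
    }

private:
    void subdivide() {
        int hw = bounds.w / 2, hh = bounds.h / 2;
        child[0].reset(new QuadNode({bounds.x,      bounds.y,      hw,            hh}));
        child[1].reset(new QuadNode({bounds.x + hw, bounds.y,      bounds.w - hw, hh}));
        child[2].reset(new QuadNode({bounds.x,      bounds.y + hh, hw,            bounds.h - hh}));
        child[3].reset(new QuadNode({bounds.x + hw, bounds.y + hh, bounds.w - hw, bounds.h - hh}));
    }
};

Build it once over the window bounds, insert each rectangle's index, and on every mouse move call query with the cursor position; only the rectangles stored along a single root-to-leaf path ever get tested, and you only rebuild when rectangles move.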
"Is there a more elegant solution to this issue, or will such programs always need to be so expensive?"
Instead of doing a linear search through all the objects, you could use a data structure like a quadtree that lets you efficiently narrow the search down to the few objects at or near a given point.
Or, you could come up with a more realistic set of requirements based on the intended use for your algorithm. A GUI with 10,000 buttons visible at once is a poor design for many reasons, the main one being that the poor user will have a very hard time finding the right button. A linear search through a number of rectangles more typical of a UI, say somewhere between 2 and 100, will be no problem from a performance point of view.

Which is a larger performance drain: quantity of vertices in one draw call, or quantity of calls?

I am quickly finding that one of the organisational considerations you must make when preparing rendering in OpenGL is the type of topology and the arrangement of vertices.
Now there are some interesting methods out there for organising vertices into very long arrays, with nice uses of interleaved arrays, indices, etc., so that you can pour a lot of geometry into one OpenGL call.
But it's much easier in some cases to simply iterate and perform multiple calls with smaller vertex arrays.
While I agree with the notion that premature optimization is somewhat wasteful, just how important a consideration should it be to minimize OpenGL calls, especially if multiple calls would actually involve far fewer vertices per call?
I can see that this is one of those decisions that is important early in the development process, since it forms a lot of the structure of how vertices get created and organized.
There is an overhead for each command you send down to the GPU. Batching the vertices minimizes that overhead and also allows the driver to make small optimizations to your data before sending it to the hardware. It can make quite a difference, and it is the reason glBegin and glEnd were completely removed from newer iterations of OpenGL.
You should try to avoid making many driver state changes and many draw calls.
EDIT: Consider using degenerate vertices in your triangle strips (which also helps minimize the number of vertices processed) so that you can use just one draw call to render all your topology (unless you need to change some driver state between parts of the topology).
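A hedged host-side sketch of the difference (Mesh, batchedVao, and totalVertexCount are hypothetical, the GL types come from your loader header, and all buffer/attribute setup is assumed to have happened elsewhere): the first version pays the per-call driver overhead once per object, the second pays it once for the whole batch.

#include <vector>

// Hypothetical record of an already-uploaded mesh.
struct Mesh { GLuint vao; GLsizei vertexCount; };

// One draw call per object: each call goes through the driver's validation
// and submission path, which is where the overhead lives.
void drawUnbatched(const std::vector<Mesh>& meshes)
{
    for (const Mesh& m : meshes) {
        glBindVertexArray(m.vao);
        glDrawArrays(GL_TRIANGLES, 0, m.vertexCount);
    }
}

// Batched: the same geometry concatenated into a single VAO/VBO up front,
// so the whole set goes down in one call. (With GL_TRIANGLE_STRIP you would
// join the pieces with degenerate vertices, as suggested above.)
void drawBatched(GLuint batchedVao, GLsizei totalVertexCount)
{
    glBindVertexArray(batchedVao);
    glDrawArrays(GL_TRIANGLES, 0, totalVertexCount);
}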
You can find a balance for your specific needs, but there are many variables in the equation, and there's no simple solution (like "always render the scene as one big single batch!"). TraxNet gave you good advice, though: always try to minimize API calls (whether draw calls or state changes). But it doesn't have to be just a few calls: on a modern PC it could be thousands per frame; on a not-so-modern mobile phone, maybe just a few hundred.
Also, TraxNet mentioned degenerate triangles (which help form strips). Though they are still triangles (and so add to the 'total' triangle count rendered), they cost almost nothing while helping to minimize the number of draw calls.