Why does stereo 3D rendering require software written especially for it?

Given a naive take on 3D graphics rendering it seems that stereo 3D rendering should be essentially transparent to the developer and be entirely a feature of the graphics hardware and drivers. Wherever an OpenGL window is displaying a scene, it takes the geometry, lighting, camera and texture etc. information to render a 2D image of the scene.
Adding stereo 3D to the scene seems to essentially imply using two laterally offset cameras where there was originally one, with all other scene variables staying the same. The only additional information then would be how far apart to place the cameras and how far out to make their central rays converge. Given this, it would seem trivial to take a GL command sequence and interleave the appropriate commands at driver level to drive a 3D rendering.
It seems, though, that applications need to be specially written to make use of special 3D hardware architectures, which makes stereo cumbersome and prohibitive to implement. Should we expect this to be the future of stereo 3D implementations, or am I glossing over too many important details?
In my specific case we are using a .NET OpenGL viewport control. I originally hoped that simply having stereo-enabled hardware and drivers would be enough to enable stereo 3D.

Your assumptions are wrong. OpenGL does not "take geometry, lighting camera and texture information to render a 2D image". OpenGL takes commands to manipulate its state machine and commands to execute draw calls.
As Nobody mentions in his comment, the core profile does not even care about transformations at all. The only thing it really provides you with now is ways to provide arbitrary data to a vertex shader, and an arbitrary 3D cube to render into. Whether or not that corresponds to the actual view, GL does not care, nor should it.
Mind you, some people have noticed that a driver can try to guess what is the view and what is not, and this is what the NVIDIA driver tries to do when doing automatic stereo rendering. This requires some specific guesswork, which amounts to actual analysis of a game's rendering to tweak the algorithms so that the driver guesses right. So it's typically a per-title, in-driver change. And some developers have noticed that the driver can guess wrong, and when that happens, it starts to get confusing. See a first-hand account of those questions.
I really recommend you read that presentation, because it makes some further points about where the cameras should be pointing (whether the 2 view directions should be parallel and such).
Also, it turns out that stereo essentially costs twice as much rendering for everything that is view dependent. Some developers (including, for example, the Crytek guys, see Part 2) figured out that, to a great extent, you can do a single render and fudge the picture with additional data to generate the left and right eye pictures.
The amount of work saved this way is, by itself, reason enough for developers to handle stereo themselves.

Stereo 3D rendering is unfortunately more complex than just adding a lateral camera offset.
You can create stereo 3D from an original 'mono' rendered frame and the depth buffer. Given the range of (real world) depths in the scene, each depth buffer value tells you how far away the corresponding pixel is. Given a desired eye separation value, you can slide each pixel left or right depending on its distance. But...
Do you want parallel axis stereo (offset asymmetrical frustums) or 'toe in' stereo where the two cameras eventually converge? If the latter, you will want to tweak the camera angles scene by scene to avoid 'reversing' bits of geometry beyond the convergence point.
For objects very close to the viewer, the left and right eyes see quite different images of the same object, even down to the left eye seeing one side of the object and the right eye the other side - but the mono view will have averaged these out to just the front. If you want an accurate stereo 3D image, it really does have to be rendered from different eye viewpoints. Does this matter? FPS shooter game, probably not. Human surgery training simulator, you bet it does.
Similar problem if the viewer tilts their head to one side, so one eye is higher than the other. Again, probably not important for a game, really important for the surgeon.
Oh, and do you have anti-aliasing or transparency in the scene? Now you've got a pixel which really represents two pixel values at different depths. Move an anti-aliased pixel sideways and it probably looks worse because the 'underneath' color has changed. Move a mostly-transparent pixel sideways and the rear pixel will be moving too far.
And what do you do with gunsight crosses and similar HUD elements? If they were drawn with depth buffer disabled, the depth buffer values might make them several hundred metres away.
Given all these potential problems, OpenGL sensibly does not try to say how stereo 3D rendering should be done. In my experience modifying an OpenGL program to render in stereo is much less effort than writing it in the first place.
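To make the parallel axis option mentioned above a bit more concrete, here is a rough sketch of how the two offset asymmetrical frustums could be computed on the application side. The function name, the parameter names and the example values (such as 0.065 for the eye separation) are my own assumptions for illustration; the returned bounds are meant to be the kind of left/right/bottom/top values you would hand to glFrustum after shifting the camera sideways by eye_x.

```python
import math

def stereo_frustums(fov_y_deg, aspect, near, convergence, eye_separation):
    """Near-plane frustum bounds for a left and a right eye (parallel axis stereo).

    Both eyes look along the same direction; each eye is shifted sideways by
    eye_x and gets a skewed frustum so that both frustums cover the same
    window at the convergence distance.
    """
    half_tan = math.tan(math.radians(fov_y_deg) / 2.0)
    top = near * half_tan
    # Half width of the shared window at the convergence distance.
    half_width = aspect * half_tan * convergence
    # Scale factor that brings sizes at the convergence plane back to the near plane.
    to_near = near / convergence
    result = {}
    for name, eye_x in (("left", -eye_separation / 2.0),
                        ("right", eye_separation / 2.0)):
        result[name] = {
            "eye_x": eye_x,  # sideways camera offset for this eye
            "left": (-half_width - eye_x) * to_near,
            "right": (half_width - eye_x) * to_near,
            "bottom": -top,
            "top": top,
        }
    return result

# Illustrative values only: 60 degree field of view, convergence at 2 units.
frustums = stereo_frustums(60.0, 16.0 / 9.0, 0.1, 2.0, 0.065)
for eye, bounds in frustums.items():
    print(eye, bounds)
```

The scene then gets drawn twice, once per eye, which is exactly the doubled view-dependent cost described in the other answer.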
Shameless self promotion: this might help
http://cs.anu.edu.au/~Hugh.Fisher/3dteach/stereo3d-devel/index.html

Related

What tasks in 3d program does CPU / GPU handle?

When rotating a scene in a 3D modeling interface, which part of that task is the CPU responsible for and which part does the GPU take on? (moving mesh vertices, shading, keeping track of UV coords and perhaps offsetting them, lighting the triangles and rendering transparency correctly)
What rendering mode is normally used by such (realtime) modeling programs: immediate or retained?

First and foremost, the GPU is responsible for putting points, lines and triangles on the screen. However, this involves a certain amount of calculation.
The usual pipeline is that for each vertex (which is a combination of attributes that usually includes, but is not limited to, position, normals, texture coordinates and so on), the vertex position is transformed from model local space into normalized device coordinates. In most implementations this is a 3-stage process:
transformation from model local space into view (eye) space; eye space coordinates are later reused for things like illumination calculations
transformation from view space to clip space, also called projection; this determines which part of the view space will later be visible in the viewport, and it's also where perspective is set up
mapping into normalized device coordinates by coordinate homogenization (this last step is what actually creates the perspective effect when a perspective projection is used).
The above calculations are normally carried out by the GPU.
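As a rough illustration of the three stages above, here is a small CPU-side sketch in plain Python. On real hardware this work happens in the vertex stage of the GPU; the helper names (mat_vec, translation, perspective, to_ndc) are made up for this example, and the projection matrix follows the usual gluPerspective-style convention, which is an assumption about the application.

```python
import math

def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """A very simple modelview matrix: just a translation."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y_deg, aspect, near, far):
    """Perspective projection matrix in the gluPerspective convention."""
    f = 1.0 / math.tan(math.radians(fov_y_deg) / 2.0)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
        [0, 0, -1, 0],
    ]

def to_ndc(model_view, projection, position):
    """Run one vertex position through the three stages."""
    eye = mat_vec(model_view, list(position) + [1.0])  # 1. model -> eye space
    clip = mat_vec(projection, eye)                    # 2. eye -> clip space
    ndc = [clip[i] / clip[3] for i in range(3)]        # 3. divide by w
    return eye, clip, ndc

# Camera 5 units away from the model origin, looking down the negative z axis.
model_view = translation(0.0, 0.0, -5.0)
projection = perspective(90.0, 1.0, 1.0, 10.0)
eye, clip, ndc = to_ndc(model_view, projection, (1.0, 0.0, 0.0))
print("eye:", eye)
print("clip:", clip)
print("ndc:", ndc)
```

Rotating the view only changes the model_view matrix; the vertex positions themselves stay untouched.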
When rotating a scene in a 3d modeling interface, which part of such task is CPU responsible for and which part the GPU takes on?
Well, that depends on what kind of rotation you mean. If you mean a change of the view, then nothing in the scene's input data is actually changed. The only thing that gets altered is a parameter used in the first transformation step. This parameter is normally a 4×4 matrix. When rotating the view, a new modelview transformation matrix is calculated on the CPU. This matrix is then passed to the GPU and the whole scene is redrawn.
If however a model is actually modified in a modeller, then the calculations are usually carried out on the CPU.
(mesh verticies moving, shading, keeping track of UV coords - perhaps offsetting them, lighting the triangles and rendering transparency correctly)
In an online renderer this is normally done mostly by the GPU, but certain parts may be precalculated by the CPU.
It's impossible to make a definitive statement, because how the workload is shared depends on the actual application.

This question is really, and I mean REALLY, vague.
Generally the answer is "whatever comes", because it really depends on the program and its makers. Older programs did mostly everything on the CPU, since GPUs were nonexistent or too weak to handle massive scenes.
Today, however, GPUs are powerful enough to handle massive scenes. The program's creators can offer various solutions, but it's usually an abstracted system where you have your data and your view, and you can specify what you want in the editing viewport: realtime / immediate, preprocessed, high or low detail. Programs usually sacrifice accuracy for speed so you can edit with ease.
For example, 3ds Max uses rendering devices, viewport handlers and renderers. Renderers handle the production quality output, but today they are not limited to the CPU, since they can take advantage of the GPU (just think about OpenCL or CUDA) to lower rendering times while maintaining the quality. Secondly, anyone can make plugins to implement a viewport renderer the way they want it, be it a CPU, a GPU or a mixed renderer.
Since abstraction is common in modelling tools, the scene information is fed into a viewport renderer, which is usually very similar to a game engine's renderer. Because the program is a tool with a UI and various other systems to handle, its makers try to offload the CPU as much as possible by doing as much of the rendering work as possible on the GPU. Editing therefore takes place in memory on the "general" data structure, which is then displayed to you. I'm not sure, but I think they may also use the graphics API (be it OpenGL or DirectX) to do the picking (selection).
The rendering mode is what I would describe as "on-demand": it usually renders when it needs to. In a scene where realtime preview is off, it would only render if you modify something or move the camera, and as soon as you do something that needs constant updating, it updates constantly.
On top of all this there are hybrid methods. Users want production-like quality, which even to this day is difficult with GPUs, so they settle for a fast, watered-down version of a real renderer to get as close to production quality as possible. One simple example is 3ds Max's 'Realistic' viewport, which does ambient occlusion: it's not "realtime", but it is fast enough to be actually useful. In more advanced cases there are special extension cards for fast raytracing, which give fast, good-quality graphics. Even in these cases the main idea is the same: the editable data is stored in a generic internal format and fed into some sort of renderer. The output is not necessarily what you would get from a very high quality offline renderer, but it gives a good outline of what the result will look like.

OpenGL water refraction

I'm trying to create an OpenGL application with water waves and refraction. I need to either cast rays from the sun and then from the camera and figure out where they intersect, or start from the ocean floor and figure out in which direction(s), if any, I have to go in order to hit the sun or the camera. I'm kind of stuck. Can anyone give me an entry point into either OpenGL ray casting or a crash course in advanced geometry? I don't want the ocean floor to be at a constant depth and I don't want the water waves to be simple sinusoidal waves.

First things first: the effect you're trying to achieve can be implemented using OpenGL, but it is not a feature of OpenGL. OpenGL by itself is just a sophisticated triangle-to-screen drawing API. You have some input data and write a program that performs relatively simple rasterizing drawing operations based on that input data using the OpenGL API. Shaders give you some room, though: you can implement a raytracer in the fragment shader.
In your case that means you must implement some algorithm that generates the picture you intend. For water it must be some kind of raytracer or fake refraction method to get the effect of looking into the water. The caustics require either a full-featured photon mapper, or you settle for a fake effect based on the 2nd derivative of the water surface.
There is a WebGL demo, rendering stunningly good looking, interactive water: http://madebyevan.com/webgl-water/ And here's a video of it on YouTube http://www.youtube.com/watch?v=R0O_9bp3EKQ
This demo uses true raytracing (the water surface, the sphere and the pool are raytraced), the caustics are a "fake caustics" effect, based on projecting the 2nd derivative of the water surface heightmap.

There's nothing very OpenGL-specific about this.
Are you talking about caustics? Here's another good Gamasutra article.
Reflections are normally achieved by reflecting the camera in the plane of the mirror and rendering to a texture. You can then apply distortion and use the result to texture the water surface. This only works well for small waves.
What you're after here is lots of little ways to cheat :-)

Technically, all you perceive is the result of light waves/photons bouncing off surfaces and propagating through media. For the "real deal" you'll have to trace the light directly from the Sun, with each ray following the path:
hit the water surface
refract+reflect, reflected goes into the camera(*), refracted part goes further
hits the ocean bottom
reflects
hits the water from beneath
reflect+refracts, refracted part gets out of the water and hits the camera(*), reflected again goes to the ocean bottom, reflects etc.
(*) Actually, most of the rays will miss the camera, and tracing all of those would be overly expensive, so this is a cheat.
Do this for at least three wavelengths - "red", "green" and "blue". Each of them will refract and reflect differently. You'll get the whole picture by combining the three.
Then you just create a texture with the rays that got into the camera and overlay it in OpenGL.
That's a straightforward, simple and very computationally expensive way that gives an approximation to the physics behind the caustics.
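The "refract" and "reflect" steps in the path above boil down to two small vector formulas (Snell's law and mirror reflection). Here is a minimal sketch in plain Python, assuming unit-length vectors, a surface normal that points towards the side the ray comes from, and a refractive index of roughly 1.33 for water; the function names are just picked for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(incident, normal):
    """Mirror the incident direction about the surface normal."""
    d = dot(incident, normal)
    return [i - 2.0 * d * n for i, n in zip(incident, normal)]

def refract(incident, normal, n1, n2):
    """Bend a ray going from a medium with index n1 into one with index n2.

    incident: unit direction of travel, pointing towards the surface.
    normal:   unit surface normal, pointing towards the side the ray comes from.
    Returns the refracted unit direction, or None on total internal reflection.
    """
    eta = n1 / n2
    cos_i = -dot(incident, normal)
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection: only the reflected ray exists
    cos_t = math.sqrt(1.0 - sin2_t)
    return [eta * i + (eta * cos_i - cos_t) * n for i, n in zip(incident, normal)]

AIR, WATER = 1.0, 1.33  # approximate refractive indices

# A sun ray hitting a flat patch of water at 45 degrees.
s = math.sqrt(0.5)
down_ray = [s, -s, 0.0]
print("reflected:", reflect(down_ray, [0.0, 1.0, 0.0]))
print("refracted:", refract(down_ray, [0.0, 1.0, 0.0], AIR, WATER))
```

For the per-wavelength pass you would call refract with a slightly different index for each of "red", "green" and "blue"; for non-flat waves the normal comes from the water surface at the hit point.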

2D engine with OpenGL: Use Z buffer or own implementation for sprite sorting?

If I was making a 3D engine, the answer to this question would be clear: I'd go for using the depth buffer instead of thinking of sorting all my polygons on my own.
However, this is a different situation with 2D, because here layers can be implemented easily without the help of OpenGL - and you then could even sort and move sprites within layers. (Which isn't possible in OpenGL afaik)
(Why) should I use the OpenGL depth buffer instead of a C++ layer system running on the CPU?
How much slower would the depth buffer version be?
It is clear to me that a layer system in C++ would have next to no performance impact, as I have to iterate over the sprites for rendering in any case.

I would suggest doing it in software, since you probably want to use transparency on your sprites, and that implies rendering them from back to front. Also, sorting a couple of sprites shouldn't be that CPU demanding.

Use both, if you can.
Depth information is nice for post-processing and stuff like 3D-glasses, so you shouldn't throw it away. These kinds of effects can be very nice for 2D games.
Also, if you draw your (opaque) layers front to back, you can save fill-rate because the Z-Buffer can do the clipping for you (Depth tests are faster than actual drawing).
Depth testing is usually almost free, especially when you have hierarchical Z info. Because of this and the fill-rate savings, using depth testing will probably be even faster.
On the other hand, the software sorting is nice so you can actually do front to back rendering for opaque sprites and it's mandatory to do alpha-blending right (of course, you draw these sprites back to front).
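A small sketch of the ordering described above: opaque sprites sorted front to back (so the depth test can reject hidden pixels early), followed by the blended sprites sorted back to front. The Sprite class, its fields and the convention that a larger depth means farther away are assumptions made for this example, not part of any engine.

```python
from dataclasses import dataclass

@dataclass
class Sprite:
    name: str
    depth: float   # assumption: larger value = farther from the viewer
    opaque: bool

def draw_order(sprites):
    """Opaque sprites front to back, then transparent sprites back to front."""
    opaque = sorted((s for s in sprites if s.opaque), key=lambda s: s.depth)
    blended = sorted((s for s in sprites if not s.opaque),
                     key=lambda s: s.depth, reverse=True)
    return opaque + blended

scene = [
    Sprite("background", 10.0, True),
    Sprite("smoke", 4.0, False),
    Sprite("player", 5.0, True),
    Sprite("glass", 2.0, False),
]
for sprite in draw_order(scene):
    print(sprite.name)
```

The opaque pass would run with depth testing and depth writes enabled; for the blended pass you would typically keep the depth test on but rely on the sorted order for correct compositing.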

Direct answers:
allowing the GPU to use the depth buffer would allow you to dynamically adjust the draw order of things without any on-CPU shuffling and would free you from having to assign things to different layers in situations where doing so is a bit of a fiction — for example, you could have effects like projectiles that come from the background towards and then in front of the player, without having to figure out which layer to assign them to all the time
on the GPU, the use of a depth buffer would have no measurable cost, even if you're on an embedded chip, a plug-in card from more than a decade ago or an integrated part; depth buffers are so fundamental to modern GPUs that they've been optimised down to costing nothing in practical terms
However, I'd imagine you actually want to do it on the CPU for the simple reason of treating transparency correctly. A depth buffer stores one depth per pixel, so if you draw a near, transparent object and then attempt to draw something behind it, the thing behind won't be drawn even though it should be visible. In a 2D game it's likely that anti-aliasing will give your sprites partially transparent edges; if you submit drawing to the GPU in draw order then your partial transparencies will always be composited correctly. If you leave the z-buffer to do it then you risk weird looking fringing.

MipMapping problems in OpenGL

I'm loading 3D objects (obj, 3ds or collada files) into my OpenGL application. The 3D environment is quite large (a few hundred metres along all axes).
My problem is that smaller 3D objects (i.e. in the order of ~< 1-2m ) don't appear to be depth-tested properly. Depending on the zoom of the camera, I can sometimes see the back face of the object (I have been using a simple cube for testing) or other faces becoming visible/invisible/torn. Please see the attached images for a better explanation.
I am led to believe the problem is due to mipmapping being enabled. I would either like to disable mipmapping (can someone suggest a simple, fast way to do this) or set the resolution to be greater for the mipmapped objects. Or am I barking up the wrong tree completely?
Thanks
Chris

That's the result of insufficient z-buffer precision, which is an issue in games that have huge worlds but (relatively) small objects. The immediate solution would be to try using a 24-bit z-buffer instead of a 16-bit one. Another way to tackle this would be to render the game world in two steps: first draw the big distant objects, then clear the z-buffer, then draw the closer objects.
This specific problem is called z-fighting by the way, here's a great resource on this issue: http://www.codermind.com/articles/Depth-buffer-tutorial.html
The take-away is the last paragraph of the article above:
the true issue is that you can't draw both objects that are very far and objects that are very near with the same depth buffer equations. If you want to draw very far objects then you need to sacrifice your near view by pushing it further. To avoid clipping artifacts you can make your collision envelope large enough so that your clip plane will never intercept an existing object within your frustum. Or you can make object gradually disappear with transparency as they come near your clip plane.
If you want to keep near objects and at the same time draw mountains (or planets) in the far distance, then you can cut your rendering in parts. First drawing your far objects, then clearing the depth buffer and rendering the near objects with a different z buffer.

Like Julio, I believe that this is a depth precision issue, not something related to mip-mapping. However, I suggest you start by adjusting your near and far clipping planes before changing anything else (you are probably already using a 24-bit depth buffer anyway, as that is the default on most drivers/cards). In particular, the near plane should be as far away as possible for your scene. Look for calls to glFrustum or gluPerspective.
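To get a feeling for why the near plane matters so much, here is a back-of-the-envelope estimate of how far apart two surfaces must be before a depth buffer can tell them apart. It assumes a standard perspective projection (as set up by glFrustum or gluPerspective) and a fixed-point depth buffer, and it treats one depth step as 1 / 2^bits, so take the numbers as rough. The function name and the example distances are made up for illustration.

```python
def depth_resolution(z, near, far, bits):
    """Approximate smallest separable distance at eye distance z.

    With a perspective projection the stored depth is
        d(z) = far / (far - near) * (1 - near / z)
    so one depth step of 1 / 2**bits corresponds to an eye-space
    distance of about z^2 * (far - near) / (far * near * 2**bits).
    """
    steps = 2 ** bits
    return z * z * (far - near) / (far * near * steps)

# Illustrative scene: objects about 100 m away, far plane at 1000 m.
for near in (0.01, 0.1, 1.0):
    for bits in (16, 24):
        res = depth_resolution(100.0, near, 1000.0, bits)
        print(f"near={near:5.2f} m, {bits}-bit: about {res:.6f} m")
```

The resolution scales almost inversely with the near distance, which is why pushing the near plane out usually helps far more than pulling the far plane in.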

Drawing "point-like" shapes in OpenGL, indifferent to zoom

I'm working with Qt and QWt3D Plotting tools, and extending them to provide some 3-D and 2-D plotting functionality that I need, so I'm learning some OpenGL in the process.
I am currently able to plot points using OpenGL, but only as circles (or "squares" by turning anti-aliasing off). These points act the way I like - i.e. they don't change size as I zoom in, although their x/y/z locations move appropriately as I zoom, pan, etc.
What I'd like to be able to do is plot points using a myriad of shapes (^,<,>,*,., etc.). From what I understand of OpenGL (which isn't very much) this is not trivial to accomplish because OpenGL treats everything as a "real" 3-D object, so zooming in on any openGL shape but a "point" changes the object's projected size.
After doing some reading, I think there are (at least) 2 possible solutions to this problem:
Use OpenGL textures. This doesn't seem too difficult, but I believe that the texture images will get larger and smaller as I zoom in - is that correct?
Use OpenGL polygons, lines, etc. and draw *'s, triangles, or whatever. But here again I run into the same problem - how do I prevent OpenGL from re-sizing the "points" as I zoom?
Is the solution to simply bite the bullet and re-draw the whole data set each time the user zooms or pans to make sure that the points stay the same size? Is there some way to just tell openGL to not re-calculate an object's size?
Sorry if this is in the OpenGL doc somewhere - I could not find it.

What you want is called a "point sprite." OpenGL 1.4 supports these through the ARB_point_sprite extension.
Try this tutorial
http://www.ploksoftware.org/ExNihilo/pages/Tutorialpointsprite.htm
and see if it's what you're looking for.

The scene is re-drawn every time the user zooms or pans, anyway, so you might as well re-calculate the size.
You suggested using a textured poly, or using polygons directly, which sound like good ideas to me. It sounds like you want the plot points to remain in the correct position in the graph as the camera moves, but you want to prevent them from changing size when the user zooms. To do this, just resize the plot point polygons so the ratio between the polygon's size and the distance to the camera remains constant. If you've got a lot of plot points, computing the distance to the camera might get expensive because of the square-root involved, but a lookup table would probably solve that.
In addition to resizing, you'll want to keep the plot points facing the camera, so billboarding is your solution, there.
An alternative is to project each of the 3D plot point locations to find out their 2D screen coordinates. Then simply render the polygons at those screen coordinates. No re-scaling necessary. However, gluProject is quite slow, so I'd be very surprised if this wasn't orders of magnitude slower than simply rescaling the plot point polygons like I first suggested.
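For the rescaling approach, the size you need falls out of the projection directly. Below is a minimal sketch, assuming a perspective projection with a known vertical field of view and using the marker's depth along the view direction (rather than the straight-line distance, which avoids the square root). The function and parameter names are invented for this example. With an orthographic projection, which plots often use, the factor does not depend on depth at all and only changes with the zoom level.

```python
import math

def world_size_for_pixels(pixels, depth, fov_y_deg, viewport_height_px):
    """World-space size that covers `pixels` on screen at the given depth.

    depth is the distance along the view direction from the camera to the
    plot point; fov_y_deg is the vertical field of view of the projection.
    """
    # Height of the visible slice of the world at that depth.
    visible_height = 2.0 * depth * math.tan(math.radians(fov_y_deg) / 2.0)
    return pixels * visible_height / viewport_height_px

# A marker that should stay 12 pixels tall in an 800 pixel high viewport.
for depth in (1.0, 5.0, 20.0):
    size = world_size_for_pixels(12, depth, 45.0, 800)
    print(f"depth {depth:5.1f}: scale marker to {size:.5f} world units")
```

Each marker polygon is then scaled by this value (and billboarded) before drawing, so its projected size stays the same while its position still follows the data.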
Good luck!

There's no easy way to do what you want to do. You'll have to dynamically resize the primitives you're drawing depending on the camera's current zoom. You can use a technique known as billboarding to make sure that your objects always face the camera.
