Tracking camera movement and object orientation in xtk

This is a follow-up to Matt's previous question about camera orientation. I'm working with him on a JavaScript interface for a Python analysis code for 3D hydro simulations.
We've successfully used xtk to build a 3D model of the mesh structure in our simulation. The resulting demo looks a lot like the simple cube demo on the xtk website, so your advice based on that demo should be readily portable to our use case.
We were able to infer the view matrix at runtime from the XTK camera object. After a lot of poking and some trial and error, we figured out that the view matrix is really (in OpenGL nomenclature) the model-view matrix: it combines the camera's view and translation with the orientation and translation of the model the camera is looking at.
We are trying to infer the orientation of the camera relative to the (from our point of view) fixed model as we click, drag, and zoom with respect to the model. In the end, we'd like to save a set of keyframes from which we can generate a camera path that will eventually be exported to python to make a 3D volume rendering movie of the simulation data. We've tried a bunch of things but have been unable to invert the model-view matrix to infer the camera's orientation with respect to the model.
Do you have any insight into how this might be done? Is our inference about the view matrix correct or is it actually tracking something different from what I described above?
From our point of view it would be really great if xtk kept track of the camera's up, look, and position vectors with respect to the model so that we could just query for and use the values directly.
Thanks very much for your help with this and for making your visualization toolkit freely available.

This page might be useful, as long as I'm understanding your needs. It gives a very good explanation of the view matrix, which I needed myself, and about halfway down the page there is some information you could use.
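As a minimal sketch of the inversion asked about above: if the model-view matrix is rigid (a rotation R plus a translation t, with no scaling), the camera position in model coordinates is -R^T t, and the rows of R give the camera's right, up and backward axes. The column-major storage, the OpenGL-style "camera looks down -Z" convention, and the names `CameraPose` and `cameraPoseFromModelView` are assumptions made for illustration; none of this is xtk's API. The arithmetic is written in C++ here but ports line for line to JavaScript.

```cpp
#include <array>

struct Vec3 { double x, y, z; };

// Hypothetical result type: everything is expressed in model coordinates.
struct CameraPose {
    Vec3 position;
    Vec3 right;
    Vec3 up;
    Vec3 look;  // viewing direction
};

// m is a 4x4 model-view matrix stored column-major, so element
// (row, col) lives at m[col * 4 + row]. Assumes a rigid transform
// (rotation + translation only).
CameraPose cameraPoseFromModelView(const std::array<double, 16>& m) {
    auto at = [&m](int row, int col) { return m[col * 4 + row]; };
    const Vec3 t{at(0, 3), at(1, 3), at(2, 3)};

    CameraPose pose;
    // Rows of the rotation block are the eye axes seen from the model.
    pose.right = {at(0, 0), at(0, 1), at(0, 2)};
    pose.up    = {at(1, 0), at(1, 1), at(1, 2)};
    // The camera looks down the negative eye-space Z axis.
    pose.look  = {-at(2, 0), -at(2, 1), -at(2, 2)};
    // Solve R * c + t = 0 for the camera position c: c = -R^T * t.
    pose.position = {
        -(at(0, 0) * t.x + at(1, 0) * t.y + at(2, 0) * t.z),
        -(at(0, 1) * t.x + at(1, 1) * t.y + at(2, 1) * t.z),
        -(at(0, 2) * t.x + at(1, 2) * t.y + at(2, 2) * t.z)};
    return pose;
}
```

For a matrix that only translates the model by (0, 0, -5), this yields a camera at (0, 0, 5) looking along (0, 0, -1) with up (0, 1, 0). If the matrix also contains scaling, the rows would need to be normalized first.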


Rasterization and Secondary Reflections

I have been developing a GPU-based underwater imaging sonar simulation for real-time applications (see more details in my last paper). The missing part is the reverberation phenomenon, which can be represented by a multipath algorithm.
This work uses information (normal, depth and angle) precomputed in shaders during the rasterization pipeline to calculate the simulated sonar data; however, this approach is restricted to primary reflections, so I need to take secondary reflections into account. Could ray tracing be used only for this part, in a hybrid pipeline (rasterization and ray tracing)?
I hope I can help!
With raytracing, in order to calculate secondary reflections you normally need to first calculate each ray's primary reflection, and you then recursively shoot off another ray from that position. I guess you could skip the first reflection part of raytracing if you can use your shader results to figure out where each ray starts and in which direction it should reflect. You could shoot your rays out of the pixels in the shader's result, using depth information, pixel coordinates, and camera parameters to figure out where the ray's origin is, and using normal information to figure out which direction the ray should go in.
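A rough sketch of that idea, assuming a simple pinhole camera at the eye-space origin looking down -Z and a depth buffer that stores linear distance along the view axis. The function names, the pixel-row-0-at-top convention and the small surface offset are illustrative assumptions, not taken from the sonar paper.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, double s) { return {a.x * s, a.y * s, a.z * s}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 a) { return a * (1.0 / std::sqrt(dot(a, a))); }

struct Ray { Vec3 origin, direction; };

// Rebuild the eye-space position of a pixel from its linear depth.
// Assumes a symmetric perspective frustum and pixel row 0 at the top.
Vec3 eyeSpacePosition(int px, int py, int width, int height,
                      double fovYRadians, double linearDepth) {
    const double aspect = static_cast<double>(width) / height;
    const double tanHalf = std::tan(fovYRadians / 2.0);
    const double ndcX = 2.0 * (px + 0.5) / width - 1.0;
    const double ndcY = 1.0 - 2.0 * (py + 0.5) / height;
    return {ndcX * aspect * tanHalf * linearDepth,
            ndcY * tanHalf * linearDepth,
            -linearDepth};
}

// Mirror direction d about the unit normal n.
Vec3 reflect(Vec3 d, Vec3 n) { return d - n * (2.0 * dot(d, n)); }

// Build the secondary ray that leaves the surface seen at a pixel.
Ray secondaryRay(Vec3 hitPoint, Vec3 normal) {
    const Vec3 incoming = normalize(hitPoint);  // the eye sits at the origin
    const Vec3 n = normalize(normal);
    const double epsilon = 1e-4;  // nudge off the surface to avoid self-hits
    return {hitPoint + n * epsilon, reflect(incoming, n)};
}
```

The resulting rays could then be handed to whatever ray tracer handles the secondary bounce; only the primary hit is taken from the rasterized buffers.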
From looking at your project's paper, I think raytracing would be a very useful tool for this project, and I wonder if it might be better to just go for a full raytracing approach to simplify the process. Why exactly do you want to do the primary reflections through shaders? I would recommend looking into NVIDIA OptiX, which performs raytracing on the GPU, and into global illumination techniques in order to calculate reflections off all objects in the scene. By using Monte Carlo integration, global illumination techniques also take into account the fact that surfaces are not perfectly smooth, without the use of normal maps as mentioned in your paper.
I hope this helps, if you would like me to clarify anything or have any other questions feel free to ask!

ROS Rviz visualization of Tango Pose data

We have modified sample code for the C API so Tango pose data (position (x,y,z) and quaternion (x,y,z,w)) is published as PoseStamped ROS messages.
We are attempting to visualize the pose using Rviz. The pose data appears to need some transformation as the rotation of the Rviz arrow does not match the behavior of the Tango when we move it around.
We realize that in the sample code, before visualization on the Tango screen, the pose data is transformed into a 4x4 Pose matrix (function PoseData::GetExtrinsicsAppliedOpenGLWorldFrame), which is then multiplied left and right by various matrices representing changes of coordinate frames (for instance, Tango to OpenGL).
Ideally, we would be able to apply a similar transformation to the pose data before publishing it for visualization. However we must keep the pose data in the position (x,y,z) and quaternion (x,y,z,w) format in order to publish it in a PoseStamped message, and we do not see what transform to apply.
We have looked at the Tango coordinate systems conventions but the transformations the Tango developers suggest we apply are only suited for pose data in a Pose matrix format. We have also attempted to apply transformations applied by Ologic in their code to no avail.
Does anyone have any suggestions on how to transform Tango pose data, without changing its format, for correct visualization on the Rviz OpenGL interface?
If it's the OpenGL convention, you will basically need to apply a transformation on the left-hand side of the pose data. The C++ motion tracking example has a line doing this operation. You could ignore the rotation part, but just apply the following code:
glm::mat4 opengl_world_T_opengl_camera = tango_gl::conversions::opengl_world_T_tango_world() * start_service_T_device;
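Since the question asks to stay in position plus quaternion form, here is a sketch of what a left-multiplication by a pure rotation looks like without building a 4x4 matrix: the position is rotated by the frame-change quaternion, and the orientation is pre-multiplied by it. The types and the name `changeWorldFrame` are hypothetical, and which fixed rotation is actually right for Rviz is not established here; the quaternion is simply passed in.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // same (x, y, z, w) order as the pose data
struct Pose { Vec3 position; Quat orientation; };

// Hamilton product a * b: apply rotation b first, then a.
Quat multiply(Quat a, Quat b) {
    return {a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
            a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
            a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w,
            a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z};
}

Quat normalized(Quat q) {
    const double n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return {q.x / n, q.y / n, q.z / n, q.w / n};
}

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Rotate v by the unit quaternion q (equivalent to q * v * q^-1).
Vec3 rotate(Quat q, Vec3 v) {
    const Vec3 u{q.x, q.y, q.z};
    const Vec3 c = cross(u, v);
    const Vec3 t{2.0 * c.x, 2.0 * c.y, 2.0 * c.z};
    const Vec3 ut = cross(u, t);
    return {v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z};
}

// Re-express a pose in a new world frame that differs from the old one
// by a fixed rotation (new_R_old). Hypothetical helper, not a Tango call.
Pose changeWorldFrame(Quat new_R_old, Pose pose) {
    return {rotate(new_R_old, pose.position),
            normalized(multiply(new_R_old, pose.orientation))};
}
```

For example, a rotation of -90 degrees about X, the quaternion (-0.7071, 0, 0, 0.7071), maps a Z-up position (0, 0, 1) to the Y-up position (0, 1, 0). The result is still a position and a quaternion, so it can go straight into a PoseStamped message.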
I know this is a late answer, but maybe it can help other people.
If you want to visualize any data with Rviz, I assume that you want to use ROS. Then maybe the best way to do it is to use the rosjava library to build your Tango Android app. It works well for me. You just have to use PoseStamped, odometry and tf publishers on your Tango device and then display the topics with Rviz. Moreover, it is one of the best ways to keep the real-time aspect.
Here are two good ways to learn how to use rosjava:

OpenGL - selective world rendering

I'm building a miniature city with the bare minimum looks of a city (roads, buildings, trees, etc.) where you can move around. I know that rendering the whole model set in each frame doesn't work...
So can anyone give me some insight into the standard (but easiest) procedure for selectively rendering only the visible parts of the system? I mean, displaying only the visible stuff (with respect to the camera position) and not rendering the unseen part.
I'm using VC++ and the GLUT API.
Maybe this Wikipedia article provides a very basic introduction to the field of culling techniques.
A good starting point and one of the easiest techniques is view frustum culling. With this method you check for each object in your scene whether it is inside the viewing volume (the view frustum). This basically amounts to checking whether some simplified bounding volume of the geometry (like a box or a sphere that completely contains the geometry) lies inside the view frustum, which is defined by six planes.
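A minimal sketch of that test for bounding spheres, assuming the six planes are already available with unit-length normals pointing into the frustum. The `Plane`, `Sphere` and `Frustum` types are made up for illustration.

```cpp
#include <array>

// Plane a*x + b*y + c*z + d = 0 with a unit normal (a, b, c) pointing
// towards the inside of the frustum.
struct Plane { double a, b, c, d; };
struct Sphere { double x, y, z, radius; };
using Frustum = std::array<Plane, 6>;

// Returns false only when the sphere lies completely behind at least
// one plane; otherwise the object is treated as potentially visible.
bool sphereIntersectsFrustum(const Frustum& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        const double signedDistance = p.a * s.x + p.b * s.y + p.c * s.z + p.d;
        if (signedDistance < -s.radius) {
            return false;  // entirely outside this plane: cull
        }
    }
    return true;
}
```

In a draw loop you would skip the draw call for every object whose sphere fails this test. The test is conservative: a sphere near a frustum corner can pass even though it is just outside, which only costs a wasted draw.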
This can be optimized further by grouping objects by their position and creating a so-called bounding volume hierarchy. This way you first check, for example, whether a whole city block is inside the viewing volume (by using a bounding volume that contains the whole block), and only if it is do you check the individual houses.
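The hierarchy idea can be sketched as a small tree walk: a node whose bounding sphere is outside the frustum discards its whole subtree without looking at the children. The `Node` layout and the name `collectVisible` are assumptions for illustration, and the sphere-versus-plane test is the simple conservative one.

```cpp
#include <array>
#include <string>
#include <vector>

struct Plane { double a, b, c, d; };  // unit normal pointing into the frustum
struct Sphere { double x, y, z, radius; };
using Frustum = std::array<Plane, 6>;

bool outsideFrustum(const Frustum& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        if (p.a * s.x + p.b * s.y + p.c * s.z + p.d < -s.radius) return true;
    }
    return false;
}

// A city, a block or a single house: each node's sphere must enclose
// everything below it. Leaves are the drawable objects.
struct Node {
    std::string name;
    Sphere bounds;
    std::vector<Node> children;  // empty for a leaf
};

// Appends the names of all potentially visible leaves to 'visible'.
void collectVisible(const Node& node, const Frustum& frustum,
                    std::vector<std::string>& visible) {
    if (outsideFrustum(frustum, node.bounds)) {
        return;  // the whole subtree is culled with a single test
    }
    if (node.children.empty()) {
        visible.push_back(node.name);
        return;
    }
    for (const Node& child : node.children) {
        collectVisible(child, frustum, visible);
    }
}
```

With one node per city block and one leaf per house, a block behind the camera costs one test instead of one test per house.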
A more complicated technique is occlusion culling, which means checking if an object is completely hidden behind another object. Because these techniques can get substantially more complicated, occlusion culling should (if done at all) be done after the view frustum culling. OpenGL has hardware occlusion queries that can help you determine whether an object is actually visible, but they require some additional work to work well. Especially for cities there may be special two-dimensional occlusion culling techniques (I heard about that a long time ago, but I don't know the details).
This is just a very broad overview, feel free to google for individual keywords. It is always a good idea to carefully weigh whether the additional CPU overhead is worth it (especially with complicated occlusion culling techniques), considering that nowadays the trend is to batch as much geometry as possible into a single draw call (by the way, I hope you don't use immediate mode glBegin/glEnd; otherwise changing this to vertex arrays or, better, VBOs is the first point on your agenda). But view frustum culling might be a nice and easy starting point, especially if the city gets rather large.
Google "binary space partition trees".
BSP trees are a good means of determining what should be rendered from the camera's view angle and position. The old-school first-person shooters, e.g. Quake et al., used them (or at least some derivation of the principle).
Here is a good FAQ.
Other good resources:

OpenGL animation

If I have a human body 3d model, that I want to animate walking, what is the best way to achieve this? Here are the possible ways I see this being implemented:
Create several models with the legs in different positions and then interpolate between these models.
Load the model into openGL, and somehow figure which vertices correspond to the legs and perform the appropriate transformations.
Implement a skeleton or armature (similar to this: blender animation wiki).
The technique you described in the first option is called morph target animation. It is often used for details of an animation, like facial animation or the opening and closing of hands.
The second option is procedural or physical animation, which works something like robotics: you give the body of your character some velocity to move forward and calculate what the legs need to do to keep it from falling. But you wouldn't do it directly on vertices; you would do it on a skeleton. See the next one.
The third option is skeletal animation, which animates a skeleton while the vertices follow it according to a set of rules. Attaching vertices to the skeleton is called skinning.
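One common "set of rules" for skinning is linear blend skinning: each vertex stores a few bone indices with weights, and its animated position is the weighted average of the rest position transformed by each bone. The sketch below assumes row-major 3x4 affine matrices that already combine each bone's current pose with the inverse of its rest pose; the type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Row-major 3x4 affine transform: rotation in the left 3x3 block,
// translation in the last column.
using Affine = std::array<double, 12>;

Vec3 transformPoint(const Affine& m, Vec3 p) {
    return {m[0] * p.x + m[1] * p.y + m[2]  * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6]  * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

// One bone's pull on a vertex; the weights of a vertex should sum to 1.
struct Influence { std::size_t bone; double weight; };

// Linear blend skinning of a single vertex.
Vec3 skinVertex(Vec3 restPosition,
                const std::vector<Influence>& influences,
                const std::vector<Affine>& skinMatrices) {
    Vec3 result{0.0, 0.0, 0.0};
    for (const Influence& inf : influences) {
        assert(inf.bone < skinMatrices.size());
        const Vec3 moved = transformPoint(skinMatrices[inf.bone], restPosition);
        result.x += inf.weight * moved.x;
        result.y += inf.weight * moved.y;
        result.z += inf.weight * moved.z;
    }
    return result;
}
```

A vertex at a knee might be weighted half to the thigh bone and half to the shin bone, so it ends up halfway between where each bone alone would carry it.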
I suggest that, after getting the hang of the OpenGL basics (viewing and positioning models in space, camera, etc.), you start with skeletal animation.
You will need a rigged and animated model from your 3D app of choice. Then you can write an exporter to your custom format or choose an existing format that you want to read from your app. That file format should contain a description of the model, skeleton, skinning and key frames. Then you read that data in your code and use it to build the mesh and skeleton and to animate over the key frames.
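Animating over key frames boils down to finding the two key frames that bracket the current time and interpolating between them. Here is a minimal sketch for a single scalar channel (say, one joint angle), assuming the keys are sorted by time; the `Keyframe` layout and `sampleChannel` name are illustrative. Real formats usually store rotations as quaternions, which would be interpolated with slerp rather than the plain linear blend shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Keyframe { double time; double value; };

// Linearly interpolate a channel at time t. Keys must be sorted by
// strictly increasing time. Times outside the range clamp to the ends.
double sampleChannel(const std::vector<Keyframe>& keys, double t) {
    assert(!keys.empty());
    if (t <= keys.front().time) return keys.front().value;
    if (t >= keys.back().time) return keys.back().value;

    // Find the first key at or after t; the previous key is before t.
    std::size_t next = 1;
    while (keys[next].time < t) ++next;
    const Keyframe& a = keys[next - 1];
    const Keyframe& b = keys[next];

    const double u = (t - a.time) / (b.time - a.time);  // 0 at a, 1 at b
    return a.value + u * (b.value - a.value);
}
```

Each frame you would sample every channel of every bone at the current animation time, rebuild the bone transforms from those values, and then skin the mesh.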
If I were you, I'd download Blender and work through some animation tutorials. For example, this one:
Having done that, you can then export your model and animations using e.g. the Ogre exporter. I think this is the latest version, but check to make sure:
From there, you just need to write the C++ code to load everything in, interpolate between keyframes, etc. I have code I can show you for this if you're interested.

How do you determine when an object is drawn on-screen in OpenGL?

I'm extremely new to OpenGL. I'm writing a program that displays flying 3D text on screen. I need to know when certain text strings are drawn onto the screen and are visible to the user. The program needs to identify which text strings are displayed. (Note: although my problem deals with text, it could be generalized to any OpenGL object.)
At first, I started to think that I could use OpenGL's picking mechanism, but so far I've only seen examples where the selection area is focused on some sort of user interaction. I want to know what objects are displayed on the entire window area. This leads me to think I'm on the wrong track... Am I missing something?
Any suggestions are welcome.
You can use query objects (specifically those created using the GL_ARB_occlusion_query extension). These objects are used to query how many fragments pass the depth test during a sequence of OpenGL operations (begin/end, etc.).
Another scheme (software only) is to determine a bounding box for your rendered text, then compute mathematically whether the bounding box is inside the view frustum (derived from the current perspective used for rendering).
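A sketch of the software-only scheme, assuming you have the combined projection times model-view matrix in column-major order with OpenGL's clip-space convention (-w <= x, y, z <= w). The six frustum planes fall out of sums and differences of the matrix rows, and an axis-aligned box is rejected when its most favorable corner is behind any plane. The type and function names are made up for illustration.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<double, 16>;  // column-major: (row, col) at [col*4 + row]
struct Plane { double a, b, c, d; };  // inside when a*x + b*y + c*z + d >= 0
struct Aabb { double minX, minY, minZ, maxX, maxY, maxZ; };

// Derive the six clip planes from the combined matrix:
// row3 +/- row0 (left/right), row3 +/- row1 (bottom/top),
// row3 +/- row2 (near/far).
std::array<Plane, 6> frustumPlanes(const Mat4& mvp) {
    auto row = [&mvp](std::size_t r) {
        return Plane{mvp[r], mvp[4 + r], mvp[8 + r], mvp[12 + r]};
    };
    auto combine = [](Plane p, Plane q, double sign) {
        return Plane{p.a + sign * q.a, p.b + sign * q.b,
                     p.c + sign * q.c, p.d + sign * q.d};
    };
    const Plane r0 = row(0), r1 = row(1), r2 = row(2), r3 = row(3);
    return {{combine(r3, r0, 1.0), combine(r3, r0, -1.0),
             combine(r3, r1, 1.0), combine(r3, r1, -1.0),
             combine(r3, r2, 1.0), combine(r3, r2, -1.0)}};
}

// Conservative test: false means the box is certainly off-screen.
bool boxMayBeVisible(const Mat4& mvp, const Aabb& box) {
    for (const Plane& p : frustumPlanes(mvp)) {
        // Pick the box corner that lies farthest along the plane normal.
        const double x = p.a >= 0.0 ? box.maxX : box.minX;
        const double y = p.b >= 0.0 ? box.maxY : box.minY;
        const double z = p.c >= 0.0 ? box.maxZ : box.minZ;
        if (p.a * x + p.b * y + p.c * z + p.d < 0.0) return false;
    }
    return true;
}
```

The box must be given in the same coordinate space that the matrix expects as input (object space if the model-view part is included). This only tells you the text is inside the viewing volume, not whether something else is drawn in front of it.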
A note: using OpenGL picking doesn't necessarily imply the use of gluPickMatrix. You can render your scene "as is" and then query the rendered names (although picking is deprecated as of OpenGL 3).
Query objects are easy to use, and they are lightweight. Picking is another good solution for most hardware, but more schematic than query objects.
Hmm, is it actually in 3D? Or is it just 2D text on the screen in 2D space? In that case I would just keep track of it manually. How exactly are you drawing your text?
Generally the way you do this is with a "frustum check", where you basically just make a volume for the camera and test whether your 3D objects are inside it or not.
You can try OpenGL's feedback mechanism. In this mode, OpenGL does not rasterize; instead it writes the transformed and clipped primitives to a feedback buffer. If something is clipped away entirely, nothing is written for it. When the text becomes visible, you will find its primitives in the feedback buffer.
This link should get you started.
Here is another link, the Question 10.010 seems particularly relevant to what you want.
Run your object coordinates through your projection and modelview matrices to get screen-space coordinates. Compare the X/Y output against your screen extents to figure out if the text is on-screen.
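A small sketch of that approach, assuming the projection and model-view matrices have already been multiplied into one column-major matrix and that window coordinates start at the bottom-left corner, as in OpenGL. The `ScreenPoint` type and `projectToScreen` name are hypothetical.

```cpp
#include <array>
#include <cassert>

using Mat4 = std::array<double, 16>;  // column-major: (row, col) at [col*4 + row]

struct ScreenPoint {
    double x, y;    // window coordinates, origin at the bottom-left
    bool onScreen;  // false when outside the viewing volume
};

// Project an object-space point with the combined
// projection * model-view matrix and test it against the screen extents.
ScreenPoint projectToScreen(const Mat4& mvp, double x, double y, double z,
                            int width, int height) {
    assert(width > 0 && height > 0);
    const double cx = mvp[0] * x + mvp[4] * y + mvp[8]  * z + mvp[12];
    const double cy = mvp[1] * x + mvp[5] * y + mvp[9]  * z + mvp[13];
    const double cz = mvp[2] * x + mvp[6] * y + mvp[10] * z + mvp[14];
    const double cw = mvp[3] * x + mvp[7] * y + mvp[11] * z + mvp[15];
    if (cw <= 0.0) {
        return {0.0, 0.0, false};  // behind the camera
    }
    // Perspective divide gives normalized device coordinates in [-1, 1].
    const double ndcX = cx / cw, ndcY = cy / cw, ndcZ = cz / cw;
    const bool inside = ndcX >= -1.0 && ndcX <= 1.0 &&
                        ndcY >= -1.0 && ndcY <= 1.0 &&
                        ndcZ >= -1.0 && ndcZ <= 1.0;
    return {(ndcX * 0.5 + 0.5) * width, (ndcY * 0.5 + 0.5) * height, inside};
}
```

Running the corners of a text string's bounding box through this function tells you whether any of them land inside the window; like the frustum check, it does not account for other objects hiding the text.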