raytracing: problems with transformations - c++

I am having trouble incorporating transformations. For whatever reason, things are not coming out the way I think they should, and to be honest, all the transformations back and forth make me quite dizzy.
As I have read everywhere (although explicit explanations are rare, imho), the principal algorithm for transformations is as follows:
transform the ray (origin and direction) with the inverse of the object's transformation matrix
transform the resulting intersection point with the transformation matrix
transform the object's normal at the intersection point with the transpose of the inverse
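In code, the whole round trip might look something like the sketch below (Ray, Hit, Mat4 and the transformPoint/transformVector helpers are hypothetical stand-ins for whatever your framework provides):

// Minimal sketch of the three steps above. The important details:
//  - transformVector ignores translation (w = 0); transformPoint applies it (w = 1)
//  - the local direction is deliberately NOT re-normalized, so the returned t
//    means the same thing in world space and object space
Hit intersect(const Ray& worldRay, const Object& obj) {
    Mat4 inv = obj.transform.inverse();

    Ray local;
    local.origin    = inv.transformPoint(worldRay.origin);
    local.direction = inv.transformVector(worldRay.direction);

    Hit hit = obj.intersectLocal(local);          // intersect the untransformed shape
    if (!hit.found) return hit;

    hit.point  = obj.transform.transformPoint(hit.point);                   // step 2
    hit.normal = inv.transposed().transformVector(hit.normal).normalized(); // step 3
    return hit;
}

One classic cause of exactly the lighting symptom described below is re-normalizing the local direction (or building the shadow/light rays in world space but intersecting in object space without applying the same transform), because then the t values returned by the intersection function no longer agree between the two spaces.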
From what I understood, that should do the trick. I'm pretty sure that my problem arises when I calculate the lighting, since both the initial intersection and the lighting algorithm use the same function (obj.getIntersection()). But then again, I have no idea. :(
You can read parts of my code here:
main.cpp, scene.cpp, sphere.cpp, sdf-loader.cpp
Please let me know if you need more info to help me - and please help me! ;)
EDIT:
I rendered some test scenes; maybe someone can "see" from the results where I may be wrong:
untransformed scene
sphere scaled (2,4,2)
box translated (0,-200,0)
sphere translated (-300,0,0)
sphere x-rotated (45°)

Generally, for transformations in computer graphics, I would recommend having a look at scratchapixel.com, particularly this lesson:
http://scratchapixel.com/lessons/3d-basic-lessons/lesson-4-geometry/
and this one, where you can see how transformations (matrices) are used to transform rays and objects:
http://scratchapixel.com/lessons/3d-basic-lessons/lesson-8-putting-it-all-together-our-first-ray-tracer/
If you don't know this amazing resource yet, I would advise using it and maybe spreading the word at your university. Your teacher should have pointed it out to you.

Related

Perspective projection based on 4 points in 2D

I'm writing to ask about homography and perspective projection.
I'm trying to write a piece of code that will "warp" my image so that its corners align with 4 reference points in 3D space. However, the game engine I'm running it in already lets me get their screen positions, so I already have the screen-space coordinates of both sets of corners, (xi, yi) and (ui, vi), normalized to values between 0 and 1.
I have to mention that I don't have a degree in mathematics, which seems to be a prerequisite in the posts I've seen on this topic so far, but I'm hoping there is actually a solution to this problem that one can comprehend. I never had a chance to take classes in Computer Vision.
The reason I came here is that in all the posts I've seen online, the simplest explanation I came across is that each point must be put into a 3-component homogeneous vector and multiplied by a 3x3 homography matrix, which consists of 9 components h1, h2, h3 ... h9, and this transformation matrix will map each point to the correct perspective. And that's where I'm hitting a brick wall: how do I calculate the transformation matrix? It feels like it should be a relatively simple algebraic task, but apparently it's not.
At this point I have spent days reading on the topic. The solutions I've come across are either based on MATLAB (which has a ton of mathematical functions built in), or they include elaborations and discussions that don't really explain much; sometimes they suggest tons of different parameters and simplifications but rarely explain why or what their purpose is, or they reference books and studies that have since been removed from the web. I found myself more confused than I was at the beginning. Most of the resources I managed to find online were also written in a different context: image stitching and 3D engine development.
I also want to mention that I need to run this code each frame on the CPU, and I'm fairly concerned about the cost of running too many matrix transformations and solving a ton of linear-algebra equations.
I apologize for not asking about any specific code, but my general question is - can anyone point me in the right direction with this issue?
Limit the problem you're dealing with.
For example, if you always warp the entire rectangular image, you can treat the coordinates of the image corners as {(0,0), (1,0), (0,1), (1,1)}.
This simplifies the equations enough that you can solve them yourself and implement the answer directly.
Note: a homography is scale-invariant, so you can reduce the degrees of freedom to 8 (e.g. you can solve the equations under h9 = 1).
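For instance, with unit-square corners and h9 = 1 the system collapses to a closed form, so no general linear solver is needed. A sketch (my own illustration, not from the post; it assumes the target corners are given in the cyclic order (0,0), (1,0), (1,1), (0,1)):

#include <array>

struct Pt { double x, y; };

// Homography mapping the unit square (0,0),(1,0),(1,1),(0,1) onto the quad
// p[0..3] (same cyclic order). Returned row-major: { h1..h9 } with h9 = 1.
std::array<double, 9> unitSquareToQuad(const Pt p[4]) {
    double sx = p[0].x - p[1].x + p[2].x - p[3].x;
    double sy = p[0].y - p[1].y + p[2].y - p[3].y;
    double g = 0.0, h = 0.0;
    if (sx != 0.0 || sy != 0.0) {                     // not a parallelogram
        double dx1 = p[1].x - p[2].x, dx2 = p[3].x - p[2].x;
        double dy1 = p[1].y - p[2].y, dy2 = p[3].y - p[2].y;
        double den = dx1 * dy2 - dx2 * dy1;
        g = (sx * dy2 - sy * dx2) / den;              // h7
        h = (dx1 * sy - dy1 * sx) / den;              // h8
    }
    return { p[1].x - p[0].x + g * p[1].x,            // h1
             p[3].x - p[0].x + h * p[3].x,            // h2
             p[0].x,                                  // h3
             p[1].y - p[0].y + g * p[1].y,            // h4
             p[3].y - p[0].y + h * p[3].y,            // h5
             p[0].y,                                  // h6
             g, h, 1.0 };
}

// Warping a point: w = h7*x + h8*y + 1; u = (h1*x + h2*y + h3) / w; v = (h4*x + h5*y + h6) / w.

Since this is pure closed-form arithmetic, evaluating it once per frame on the CPU costs essentially nothing, which should put the performance worry to rest.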
Best advice I can give: read a good book on the subject, for example "Multiple View Geometry" by Hartley and Zisserman.

Estimating a cuboid's rotation from its parallel projection

I paint in my spare time, and that means I have a truly massive collection of reference images. Folders full of buildings, people, animals, cars, etc. It's gotten to the point where it'd be great to tag the objects by their pose, so I can find the right object at the right angle. CVAT, an image annotating tool for machine learning, allows you to mark images with cuboids, as you can see in this picture.
But suddenly I'm wondering... is it even possible for a computer to estimate the rotation of a cuboid based on a single image, when all I can feed it are the eight (x,y) pairs that define the image of said cuboid?
My thinking is that I need to somehow invert the transformation matrix so that this cuboid looks like a rectangle. That would mean that we're looking at it "on-axis", and I'm imagining that this inversion could furnish me with those XYZ rotations I'm looking for.
My best lead right now is OpenCV's getPerspectiveTransform function, which can create a matrix that will warp an image, but that transformation seems to be purely two-dimensional.
Wikipedia does mention the idea of using an "augmented matrix" to perform transformations in an extra dimension, which seems apropos here, since I want to go from a 2D representation to a 3D one.
A couple of constraints & advantages that might clarify the feasibility here:
The cuboids are rendered in a parallel projection. They don't match the perspective of the image, and that's okay! I just need a rough sense of their pose; a margin of error of 10 degrees on any given axis of rotation is fine by me, in case there are some inexact solutions that could work.
In the case of multiple cuboids in the scene, I don't care at all about their interrelations -- each case can be treated separately.
I always have a sense of the "rear wall" of the cuboid, because I'm careful in how I make these annotations, in case that symmetry-breaking helps.
The lengths of the edges are irrelevant; I'm not trying to measure the "aspect ratio" of these bounding cuboids.
Thank you for any advice or hints!

How flexible is OpenGL's quadric functionality and transformation matrices?

To give you an idea of where I'm coming from, this started as a teaching exercise to get a 12-year-old video game addict into coding. The 2D games I did with him in SDL, and that was fine because I wasn't planning on going into 3D. Yeah, right! So now I'm in at the deep end with OpenGL, mainly trying to figure out exactly what it can and cannot do. I understand the theory (still working on Béziers and NURBS, if the truth be told) and could code the whole thing by hand with calculated triangle vertices, but I'd hate to spend days on that only to be told that there's a built-in function/library that does the whole thing faster and easier.
Quadrics seem to be extremely powerful but not terribly flexible. Consider the human head, roughly speaking an ellipsoid with 3x4x3 radii, or a torso as a truncated cone that's taller than it is wide than it is thick. Again, a quadric shape with independent x, y and z radii. Since only one radius is provided, am I right in thinking that I would have to generate it around the origin and then apply a scaling matrix to adjust the radii? Furthermore, if this is so, am I also correct in thinking that saving the results into a vertex array rather than a display list results in the system neither knowing nor caring how they got there?
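For concreteness, the approach described above might look like this quick sketch in legacy OpenGL/GLU (glEnable(GL_NORMALIZE) compensates for the normals being distorted by the non-uniform scale):

#include <GL/glu.h>

// Draw an ellipsoid with independent x/y/z radii by scaling a unit gluSphere.
void drawEllipsoid(float rx, float ry, float rz) {
    GLUquadric* q = gluNewQuadric();
    glEnable(GL_NORMALIZE);      // fix up normals skewed by the non-uniform scale
    glPushMatrix();
    glScalef(rx, ry, rz);        // e.g. (3, 4, 3) for the head
    gluSphere(q, 1.0, 32, 32);   // unit sphere around the origin
    glPopMatrix();
    gluDeleteQuadric(q);
}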
Transformations: I'm familiar with the basic transformations, but, again, consider the torso. It can achieve, maybe, a 45-degree twist from the hips to the shoulders that is distributed linearly across the entire length, or even a sideways lean. These are applied around the Y or Z axis respectively, but I've obviously missed something about applying transformations that are based on an independent value (e.g. rot = dist x (max_rot / max_dist)). Again, I could do this by hand (and will probably have to, in order to apply the correct physics), but does OpenGL have this functionality built in somewhere?
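As far as I know, fixed-function OpenGL has nothing built in for a transformation that varies per vertex like this, so the rot = dist x (max_rot / max_dist) formula has to be applied to the vertices yourself, on the CPU or later in a vertex shader. A minimal sketch of the linear twist:

#include <cmath>

struct Vertex { float x, y, z; };

// Twist about the Y axis, distributed linearly along the height:
// angle(y) = y * (maxRot / maxDist), matching the formula above.
void twistY(Vertex* verts, int count, float maxRotRadians, float maxDist) {
    for (int i = 0; i < count; ++i) {
        float a = verts[i].y * (maxRotRadians / maxDist);
        float c = std::cos(a), s = std::sin(a);
        float x = verts[i].x, z = verts[i].z;
        verts[i].x =  c * x + s * z;   // rotate (x, z) by angle a
        verts[i].z = -s * x + c * z;
    }
}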
Any other areas of research I should look into would be appreciated in the comments.

moving object in the world towards a stationary camera

I want to move the camera forward, which is equivalent to moving the world back towards the camera. I'm using GLUT, and glTranslate would do the job, but my question is: how should I use it?
Suppose I initially start with glLoadIdentity(), then set up the viewpoint using gluLookAt, and then apply some translations/rotations to the model. In this case, how should I use glTranslate to translate the objects in the world so that they move with respect to the camera instead of their own origin/coordinate system?
I thought I could save the current matrix using glGet, load the identity matrix, do the translation I wanted, and then multiply the previous matrix back in using glMultMatrix. But this didn't work for me.
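For reference, that sequence in legacy OpenGL would look like the lines below; done in this order it really does produce a translation in eye coordinates (the modelview becomes T * M), so if it didn't work, the bug is likely elsewhere:

GLfloat m[16];
glGetFloatv(GL_MODELVIEW_MATRIX, m); // save the current modelview matrix M
glLoadIdentity();
glTranslatef(0.0f, 0.0f, 1.0f);      // move the world 1 unit toward the camera (it looks down -Z in eye space)
glMultMatrixf(m);                    // modelview is now T * M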
Also, if I want to enable yaw/pitch using glRotate, how should I do that? (Again in the sense of rotating the world so that it seems the camera is rotating.)
Sorry for my poor wording or any conceptual mistakes. I'm quite new to OpenGL and graphics programming in general, and I'm still trying to fully understand the OpenGL pipeline, especially the matrix part. Any detailed explanation of that would also be greatly appreciated!
From reading your question, it sounds to me like what you're trying to do is simulate camera movement by translating every other object in the world about a fixed point (the camera).
While you're correct in saying that moving the camera actually moves everything else in the world about it, you seem to be going about it the wrong way. After all, look how much difficulty you're having just moving one box. Now imagine you have hundreds! Not much fun :)
Fortunately, there is a function that can help you, and you're already using it! gluLookAt (http://www.opengl.org/sdk/docs/man2/xhtml/gluLookAt.xml) is your guy. Under the hood it creates a matrix (not sure what a matrix is? Give this a read: http://solarianprogrammer.com/2013/05/22/opengl-101-matrices-projection-view-model/) that every other point in the world is multiplied by. This multiplication translates each point until it's in its correct position relative to the camera. So you are correct in saying that moving the camera actually moves the whole world relative to the camera; this way we can do it all in one pass instead of having to calculate the new position of each point manually.
So, you want to move the camera forward along the z axis? Just call gluLookAt again, but pass in a value of eyez that is less than in your previous call. Here's an example:
gluLookAt(0,0,3,0,0,0,0,1,0); // This is our starting position, (0,0,3)
gluLookAt(0,0,2,0,0,0,0,1,0); // And this is our ending position. Notice that the eyez value has decreased by one
(Remember to reset the modelview matrix with glLoadIdentity() before the second call, so the two look-at matrices don't multiply together.)
As for how to rotate, take a look at the second group of three parameters, the "center" parameters. Those determine what point is in the center of the camera's view, that is, what the camera is looking at. In the previous example, the center point was (0,0,0). You can rotate the camera by moving this point around. How you do that is a pretty complicated topic with a good bit of math thrown in, but the following links (and the small sketch after them) should help a bit:
http://ogldev.atspace.co.uk/www/tutorial15/tutorial15.html
http://www.fastgraph.com/makegames/3drotation/
http://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation
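Just to give a flavor of it, here is a hypothetical helper that turns yaw/pitch angles into a center point for gluLookAt (the spherical-coordinate convention chosen here is an assumption; yours may differ):

#include <GL/glu.h>
#include <cmath>

// Point the camera using yaw/pitch (radians): center = eye + forward direction.
void lookAtFromAngles(float eyeX, float eyeY, float eyeZ, float yaw, float pitch) {
    float fx = std::cos(pitch) * std::sin(yaw);
    float fy = std::sin(pitch);
    float fz = -std::cos(pitch) * std::cos(yaw);  // yaw = 0, pitch = 0 looks down -Z
    gluLookAt(eyeX, eyeY, eyeZ,
              eyeX + fx, eyeY + fy, eyeZ + fz,    // the "center" parameters
              0.0f, 1.0f, 0.0f);                  // world up
}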
Don't get discouraged if it seems too hard; keep at it! Feel free to ask me if you need clarification on this answer.

GJK collision detection implementation from 2D to 3D

I apologize for the length of this question and give a pre-emptive thanks for anyone who reads through this!
So I've spent the last few days going over the GJK algorithm. I understand the general concepts behind it, and understand most of the nitty-gritty details of its implementation in 2D, thanks to the wonderful article by William Bittle at http://www.codezealot.org/archives/88 .
I've implemented his pseudocode (found at the end of the article) in my own C++ project; however, I want to make a 3D implementation. My weakness is in using the dot products to test the Voronoi regions and the triple products to get perpendicular lines. But I'm trying to read up more on that.
My problem comes down to the containsOrigin function. I'm having trouble visualizing and accounting for the new Voronoi regions that the z axis adds. I just can't seem to wrap my head around how to determine which region contains the origin. I assume there are 4 I have to account for, each extending from one of the triangular planes that comprise the 4 faces of the tetrahedron simplex. If the origin is not within any of those regions, then it is contained, and we have a collision.
How do I go about testing whether it is contained in a particular Voronoi region / which triangular face is pointing in the direction of the origin?
The current 2D algorithm checks whether a triangle has been made; if not, the simplex is a line and it finds the 3rd point. I assume the 3D algorithm will check whether a tetrahedron has been made; if not, it will check for a triangle, and if one exists it will try to find a 4th point to make a tetrahedron (how would I get this? using a normal in the direction of the origin?). If a triangle hasn't been made, it will find a 3rd point to make a triangle (do I still use the triple product for this like in 2D?).
Any suggestions, outlines, resources, code augmentations, or comments are much appreciated.
Depending on what result you expect from the GJK algorithm, you might want to look at this nice tutorial from Molly Rocket: https://mollyrocket.com/849
Be aware, though, that his implementation only outputs whether the shapes intersect (yes/no). But it might be a nice start.
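On the parenthetical in the question about finding the 4th point: yes, the usual choice is the triangle's normal, flipped so that it points toward the origin. A minimal sketch (with hypothetical Vec3 helpers, not Bittle's actual code):

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3  neg(Vec3 a)           { return { -a.x, -a.y, -a.z }; }
static float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Triangle simplex {a, b, c}: search for the 4th point along the face normal
// that faces the origin.
Vec3 nextSearchDirection(Vec3 a, Vec3 b, Vec3 c) {
    Vec3 n = cross(sub(b, a), sub(c, a)); // triangle normal
    return dot(n, neg(a)) > 0 ? n : neg(n); // flip it toward the origin
}

The tetrahedron case then reuses the same kind of test: for each of the three faces that contain the newest point, check dot(outwardFaceNormal, -a) > 0; if no face "sees" the origin, the origin is inside the tetrahedron and you have a collision.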