What's the math behind such animation trajectories? - C++

What's the math behind something like this? C++ perspective.
More examples on this MSDN page here.
UPDATE: I was asked for a more concrete question. What's the math/animation theory behind Penner's tweens? How do you come up with those formulas? What are the math principles they are based on?
Me and math, we are not BFFs! I'm working on a multi-FLOAT value animator for a UI thing I'm writing, and I was wondering what the math is, from a native C++ programmer's point of view, for generating such a trajectory.
I googled and found code, but I'm also looking for a bit of theory from a programming perspective... not just code or pure math. I can whip together the code I need from what I found online, but I'd like to understand it in the process. Like this site that lets one experiment with an easing function generator.
I could also use the Windows Animation Manager (and I might if things get bloody), but that operates on a single float. And just animating RGB requires animating each FLOAT by itself. It leads to huge code-bloat... very bad.
If anyone has some hints, I would very much appreciate it. I'm looking mainly for theory from a programming perspective. The end goal is to write a bunch of different animation algorithms that can animate a set of FLOATs from their initial values to their target values over a period of time, or at a given speed, and so on.
The plan is not just to get the code written, but also to understand what goes on behind it. And then maybe get creative with these animations... unless they prove to be some rigid standard math functions.

So think of the requirements for a tweening function.
The function should represent a valid smooth motion between two positions/states. For those who haven't read the relevant section of the book, this means that f should be a continuous and differentiable function such that f(0) == 0 and f(1) == 1; actual motions are constructed using this as a primitive.
"Ease" (in the animation tweening sense) means "derivative is zero"; this gives the effect that the motion starts and/or ends with zero velocity (i.e. a standstill). So "ease-in" means f'(0) == 0 and "ease-out" means f'(1) == 0.
Everything else is based on aesthetic considerations.
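To make the above concrete, here is a minimal C++ sketch of a few standard easing primitives (a quadratic ease-in, a quadratic ease-out and a smoothstep-style ease-in-out; these are standard formulas, not Penner's exact code), used to drive a FLOAT from its initial value to its target:

    // All easing primitives map t in [0, 1] to [0, 1], with f(0) == 0 and f(1) == 1.
    float easeInQuad(float t)     { return t * t; }                            // f'(0) == 0
    float easeOutQuad(float t)    { return 1.0f - (1.0f - t) * (1.0f - t); }   // f'(1) == 0
    float easeInOutCubic(float t) { return t * t * (3.0f - 2.0f * t); }        // "smoothstep": f'(0) == f'(1) == 0

    // Build an actual motion from a primitive: interpolate one FLOAT between
    // its initial and target value; a set of FLOATs is just a loop over this.
    float animate(float from, float to, float t, float (*ease)(float))
    {
        return from + (to - from) * ease(t);
    }

Here t is elapsed time divided by the total duration, clamped to [0, 1]; animating a set of FLOATs (e.g. R, G and B) is just a loop calling animate() with the same t.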
Cubic curves (e.g. Bezier/Hermite splines) are popular partly because they let you control both the position and the tangent (speed) at both ends of the curve, but also because they are close to the natural shape a flexible beam adopts if you constrain its position at a few points. The cubic shape minimises the internal stress of a flexed beam. (Unsurprisingly, these wooden beams are known to boat designers and other drafters as "splines", which is where the word comes from.)
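A cubic Hermite segment is probably the simplest concrete form of that "position and tangent at both ends" control; a sketch using the standard Hermite basis functions:

    // Cubic Hermite segment on t in [0, 1]: p0/p1 are the end positions,
    // m0/m1 are the tangents (velocities) at the ends.
    float hermite(float p0, float m0, float p1, float m1, float t)
    {
        float t2 = t * t;
        float t3 = t2 * t;
        float h00 =  2.0f * t3 - 3.0f * t2 + 1.0f;   // weight of p0
        float h10 =         t3 - 2.0f * t2 + t;      // weight of m0
        float h01 = -2.0f * t3 + 3.0f * t2;          // weight of p1
        float h11 =         t3 -        t2;          // weight of m1
        return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1;
    }

Setting m0 == m1 == 0 reproduces the ease-in/ease-out case above; non-zero tangents let the motion start or end while already moving.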
Historically, hand-drawn cartoon animators have always specified their tweens by feel, based on experience. Key animators draw a chart (called a "timing chart"; look it up on your favourite image search engine) on the side of their key drawings, which tells the inbetweeners how the intermediate cels should be timed.
Camera motion (pan, zoom, rotate), however, was a different matter. Layout/animation artists specified the start and the end of the motion (using a field chart), the number of frames over which the motion would happen, instructions on easing, and anything else the layout/animation team felt was important (e.g. if you had to "linger").
The actual motions needed to be calculated; the audience would notice if one frame of a rotation was out even by a couple of hundredths of a degree. Doing these calculations was part of the job of the camera department.
There's a wonderful book called "Basic Animation Stand Techniques" by Brian Salt which dates from back in the days of physical animation cameras, and describes in some detail the sort of thing they had to do, and to what extent. I recommend it if you're at all interested in this stuff.

Math is math is math.
A good tutorial on Riemann sums will demonstrate the concept.
At its most basic, you have a math equation that generates a Y value (height) for a given X (time). Periodically, say once a second, you plug in a new X (time) value and get the height back.
The more often you evaluate this function, the better the resolution (this is where the diagrams of a Riemann sum and calculus come in). The best you will get is an approximation to the curve that looks like stair steps.
In embedded systems, there are not a lot of resources available to evaluate a function like this very frequently. The curve can be approximated using line segments: the more line segments, the better the approximation. So one method is to break the curve up into line segments and, for a given x, use the linear formula for the appropriate segment. Evaluating a line usually takes fewer resources than evaluating a higher-degree equation.
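For instance, a minimal sketch of the line-segment (lookup table) idea, assuming the curve has already been sampled into a table:

    #include <vector>

    // Piecewise-linear lookup table: table[i] holds the curve value at
    // t = i / (table.size() - 1); between samples we interpolate linearly.
    float sampleCurve(const std::vector<float>& table, float t)
    {
        if (t <= 0.0f) return table.front();
        if (t >= 1.0f) return table.back();
        float pos  = t * (table.size() - 1);
        int   i    = static_cast<int>(pos);
        float frac = pos - static_cast<float>(i);
        return table[i] + (table[i + 1] - table[i]) * frac;
    }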
Your curves are usually generated from equations of physics, so not only do you need to brush up on the math, you should also brush up on the physics.
Otherwise you can search the web for libraries that handle trajectories.
Closer to the hardware, a timer can be used to call a method that evaluates the trajectory function for the given X. The timer helps produce a more accurate time value.
Search the web for "curve fit algorithm", "Bresenham algorithm", "graphics collision detection algorithms"

How flexible is OpenGL's quadric functionality and transformation matrices?

To give you an idea of where I'm coming from, this started as a teaching exercise to get a 12-year-old video game addict into coding. The 2D games I did in SDL with him, and that was fine because I wasn't planning on going into 3D. Yeah, right! So now I'm in at the deep end in OpenGL, mainly trying to figure out exactly what it can and cannot do. I understand the theory (still working on Beziers and NURBS, if the truth be told) and could code the whole thing by hand in calculated triangular vertices, but I'd hate to spend days on that only to be told that there's a built-in function/library that does the whole thing faster and easier.
Quadrics seem to be extremely powerful but not terribly flexible. Consider the human head - roughly speaking a 3x4x3 sphere - or a torso as a truncated cone that's taller than it is wide and wider than it is thick. Again, a quadric shape with independent x, y and z radii. Since only one radius is provided, am I right in thinking that I would have to generate it around the origin and then apply a scaling matrix to adjust the proportions? Furthermore, if this is so, am I also correct in thinking that saving the results into a vertex array rather than a display list results in the system neither knowing nor caring how they got there?
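For concreteness, this is the kind of thing I have in mind - generate the unit shape and scale it non-uniformly via the modelview matrix (legacy GLU/fixed-function, and it assumes a current GL context):

    #include <GL/gl.h>
    #include <GL/glu.h>

    // Draw a "3x4x3" head: a unit GLU sphere stretched by a non-uniform scale.
    // With non-uniform scaling, glEnable(GL_NORMALIZE) keeps the lighting normals sane.
    void drawHead(GLUquadric* quad)
    {
        glPushMatrix();
        glScalef(3.0f, 4.0f, 3.0f);      // independent x, y, z radii
        gluSphere(quad, 1.0, 32, 32);    // unit-radius sphere, 32 slices and stacks
        glPopMatrix();
    }

    // Once, at startup/shutdown:
    //   GLUquadric* quad = gluNewQuadric();  ...  gluDeleteQuadric(quad);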
Transformations: I'm familiar with the basic transformations but, again, consider the torso. It can achieve, maybe, a 45 degree twist from the hips to the shoulders that is distributed linearly across the entire length, or even a sideways lean. These are applied around the Y or Z axis respectively, but I've obviously missed something about applying transformations that are based on an independent value (e.g. rot = dist x (max_rot/max_dist)). Again, I could do this by hand (and will probably have to in order to apply the correct physics), but does OpenGL have this functionality built in somewhere?
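In other words, something like this per vertex - a rough sketch of the rot = dist x (max_rot/max_dist) idea done by hand (the names are mine, just for illustration):

    #include <cmath>

    struct Vec3 { float x, y, z; };

    // Twist a model-space vertex around the Y axis: the angle grows linearly with
    // the vertex's height, from 0 at y == 0 up to maxRot (radians) at y == maxDist.
    Vec3 twistY(Vec3 v, float maxRot, float maxDist)
    {
        float angle = v.y * (maxRot / maxDist);   // rot = dist * (max_rot / max_dist)
        float c = std::cos(angle), s = std::sin(angle);
        return { c * v.x + s * v.z, v.y, -s * v.x + c * v.z };
    }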
Any other areas of research I should look into would also be appreciated.

Optical flow vs keypoint matching: what are the differences?

I've spent some months studying and experimenting with keypoint detection, description and matching. Lately, I've also been looking into the concepts behind augmented reality, specifically "markerless" recognition and pose estimation.
Luckily, I've found that the previous concepts are still widely used in this setting. A common pipeline to create a basic augmented reality is the following (a rough code sketch follows below), without going into the details of each algorithm:
While capturing a video, at every frame...
Get some keypoints and create their descriptors
Find some matches between these points and the ones inside a previously saved "marker" (like a photo)
If there are enough matches, estimate the pose of the visible object and play with it
That is a very simplified version of the procedure used, for example, by this student(?) project.
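To fix ideas, the pipeline above maps roughly onto OpenCV calls like these (just a sketch; the parameters and thresholds are placeholders, not tuned values):

    #include <opencv2/opencv.hpp>
    #include <vector>

    // One frame against a stored marker image. markerKp/markerDesc are computed
    // once from the marker; frame is the current video frame (grayscale).
    void processFrame(const cv::Mat& frame,
                      const std::vector<cv::KeyPoint>& markerKp,
                      const cv::Mat& markerDesc)
    {
        cv::Ptr<cv::ORB> orb = cv::ORB::create();

        // 1. Get some keypoints and create their descriptors.
        std::vector<cv::KeyPoint> kp;
        cv::Mat desc;
        orb->detectAndCompute(frame, cv::noArray(), kp, desc);
        if (desc.empty()) return;

        // 2. Find matches between these points and the saved marker's points.
        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
        std::vector<cv::DMatch> matches;
        matcher.match(markerDesc, desc, matches);

        // 3. If there are enough matches, estimate the pose (here via a homography,
        //    assuming a planar marker; RANSAC throws away the bad matches).
        if (matches.size() >= 10) {
            std::vector<cv::Point2f> src, dst;
            for (const cv::DMatch& m : matches) {
                src.push_back(markerKp[m.queryIdx].pt);
                dst.push_back(kp[m.trainIdx].pt);
            }
            cv::Mat H = cv::findHomography(src, dst, cv::RANSAC);
            // ...use H (or cv::solvePnP with known marker geometry) to place the augmentation.
        }
    }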
Now the question: during my research I've also come across another method called "optical flow". I'm still at the beginning of studying it, but first I would like to know how different it is from the previous approach. Specifically:
What are the main concepts behind it? Does it use a "subset" of the algorithms roughly described before?
What are the main differences in terms of computational cost, performance, stability and accuracy? (I know this could be too general a question.)
Which one of those is more used in commercial AR tools? (junaio, Layar, ...)
Thanks for your cooperation.
Optical flow (OF) is a method built around the so-called "brightness constancy assumption". You assume that pixels - more specifically, their intensities (up to some delta) - do not change, they only shift. And you solve this equation:
I(x,y,t) = I(x+dx, y+dy, t+dt).
The first-order Taylor expansion is:
I(x+dx, y+dy, t+dt) = I(x,y,t) + I_x * dx + I_y * dy + I_t * dt.
Combining the two gives the constraint I_x * dx + I_y * dy + I_t * dt = 0; you solve this equation to get dx and dy - the shift for every pixel.
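Concretely, over a small patch you stack one such equation per pixel and solve a 2x2 least-squares system for (dx, dy); a bare-bones sketch, with the image gradients assumed to be precomputed:

    #include <cstddef>

    // Minimise the sum over the patch of (Ix*dx + Iy*dy + It)^2 with respect to (dx, dy).
    // Ix, Iy, It are the spatial/temporal gradients over the patch (n pixels each).
    bool solveFlowForPatch(const float* Ix, const float* Iy, const float* It,
                           std::size_t n, float& dx, float& dy)
    {
        float a = 0, b = 0, c = 0, p = 0, q = 0;
        for (std::size_t i = 0; i < n; ++i) {
            a += Ix[i] * Ix[i];
            b += Ix[i] * Iy[i];
            c += Iy[i] * Iy[i];
            p += Ix[i] * It[i];
            q += Iy[i] * It[i];
        }
        // Normal equations: [a b; b c] * [dx; dy] = -[p; q]
        float det = a * c - b * b;
        if (det < 1e-6f) return false;   // textureless patch / aperture problem
        dx = (-c * p + b * q) / det;
        dy = ( b * p - a * q) / det;
        return true;
    }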
Optical flow is mainly used for tracking and odometry.
Update: if applied not to the whole image but to a patch, optical flow is almost the same as the Kanade-Lucas-Tomasi (KLT) tracker.
The difference between this method and feature-based methods is density. With feature points you usually get the change in position of the feature points only, while optical flow estimates it for the whole image.
The drawback is that vanilla OF works only for small displacements. For handling larger ones, one can downscale the image and calculate OF on that - the "coarse-to-fine" method.
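If you just want to see a dense flow field (with the coarse-to-fine pyramid handled for you), OpenCV's Farneback implementation is essentially one call; a sketch:

    #include <opencv2/opencv.hpp>

    // Dense optical flow between two consecutive grayscale frames.
    // flow comes back as a 2-channel float image: flow(y, x) = (dx, dy) for that pixel.
    void denseFlow(const cv::Mat& prevGray, const cv::Mat& nextGray, cv::Mat& flow)
    {
        cv::calcOpticalFlowFarneback(prevGray, nextGray, flow,
                                     /*pyr_scale=*/0.5, /*levels=*/3,   // coarse-to-fine pyramid
                                     /*winsize=*/15, /*iterations=*/3,
                                     /*poly_n=*/5, /*poly_sigma=*/1.2, /*flags=*/0);
    }

(For the sparse, KLT-style case mentioned above, cv::calcOpticalFlowPyrLK does the per-patch version on a list of points.)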
One can replace the "brightness constancy assumption" with, for example, a "descriptor constancy assumption" and solve the same equation, but with descriptor values instead of raw intensities. SIFT Flow is an example of this.
Unfortunately, I don't know much about commercial augmented reality solutions and cannot answer the last question.
Optical flow computation is slow, while feature detection has become much faster recently. Commercial augmented reality solutions require real-time performance, so it is hard to apply optical-flow-based techniques (unless you use a good GPU). AR systems mostly use feature-based techniques. Most of the time their aim is to recover the 3D geometry of the scene, which can be robustly estimated from a set of points. Other differences have been mentioned by old-ufo.

Calibrating camera with a glass-covered checkerboard

I need to find the intrinsic calibration parameters of a single camera. To do this I take several images of a checkerboard pattern from different angles and then use calibration software.
To make the calibration pattern as flat as possible, I print it on paper and cover it with 3 mm glass. Obviously the image of the pattern is modified by the glass, because glass has a different refractive index than air.
The extrinsic parameters will be distorted by the glass, because the checkerboard is not where we see it. However, if the thickness of the glass and the refractive indices of glass and air are known, it seems possible to recover the extrinsic parameters.
So, the questions are:
Can the extrinsic parameters be calculated, and if so, how? (This is not necessary right now, just an interesting theoretical question.)
Are the intrinsic calibration parameters obtained from these images equivalent to the ones obtained from the usual calibration procedure (without a cover glass)?
With the glass, the calibration parameters reported by the GML Camera Calibration Toolbox (based on OpenCV) become much more accurate. (Does that make any sense at all?) But this approach has a small drawback - unwanted reflections, especially from light sources.
I commend you on choosing a very flat support (which is what I recommend myself here). But, forgive me for asking the obvious question, why did you cover the pattern with the glass?
Since the point of the exercise is to ensure the target's planarity and nothing else, you might as well glue the side of the paper sheet opposite to the pattern onto the glass and avoid all this trouble. Yes, in time the pattern will get dirty and worn and need replacement. So you just scrape it off and replace it: printing checkerboards is cheap.
If, for whatever reasons, you are stuck with the glass in the front, I recommend doing first a back-of-the-envelope calculation of the expected ray deflection due to the glass refraction, and check if it is actually measurable by your apparatus. Given the nominal focal length in mm of the lens you are using and the physical width and pixel density of the sensor, you can easily work it out at the image center, assuming an "extreme" angle of rotation of the target w.r.t the focal axis (say, 45 deg), and a nominal distance. To a first approximation, you may model the pattern as "painted" on the glass, and so ignore the first refraction and only consider the glass-to-air one.
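If it helps, this is roughly what that back-of-the-envelope calculation could look like in code, under the simplifications above (pattern "painted" on the glass, a single glass-to-air refraction, parallel faces); all the numbers are example values you would replace with your own:

    #include <cmath>
    #include <cstdio>

    int main()
    {
        const double pi          = 3.14159265358979323846;
        const double t_mm        = 3.0;     // glass thickness
        const double n_glass     = 1.5;     // approximate refractive index of glass
        const double tilt_deg    = 45.0;    // "extreme" target rotation w.r.t. the focal axis
        const double f_mm        = 8.0;     // nominal focal length
        const double sensor_w_mm = 6.4;     // physical sensor width
        const double sensor_w_px = 3000.0;  // pixels across that width
        const double dist_mm     = 500.0;   // nominal distance to the target

        const double th_air   = tilt_deg * pi / 180.0;                   // viewing angle in air
        const double th_glass = std::asin(std::sin(th_air) / n_glass);   // Snell's law

        // In-plane apparent shift of a pattern point seen through the slab.
        const double shift_mm = t_mm * (std::tan(th_air) - std::tan(th_glass));

        // Project onto the image: take the component perpendicular to the viewing ray
        // and scale by focal length in pixels over distance (first-order pinhole).
        const double f_px     = f_mm * sensor_w_px / sensor_w_mm;
        const double shift_px = shift_mm * std::cos(th_air) * f_px / dist_mm;

        std::printf("apparent shift: %.2f mm on the target, ~%.1f px in the image\n",
                    shift_mm, shift_px);
        return 0;
    }

With these particular numbers the shift works out to several pixels; substitute your own lens, sensor and distance to see whether it clears the 1-pixel threshold discussed next.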
If the above calculation suggests that the effect is measurable (deflection >= 1 pixel), you will need to add the glass to your scene model and solve for its parameters in the bundle adjustment phase, along with the intrinsics and extrinsics. To begin with, I'd use two parameters, thickness and refraction coefficient, and assume both faces are really planar and parallel. It will just make the computation of the corner projections in the cost function a little more complicated, as you'll have to take the ray deflection into account.
Given the extra complexity of the cost function, I'd definitely write the model's code to use Automatic Differentiation (AD).
If you really want to go through this exercise, I'd recommend writing the solver on top of Google Ceres bundle adjuster, which supports AD, among many nice things.
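To give an idea of the shape this could take, here is a skeleton of such a residual for Ceres with automatic differentiation - a plain pinhole reprojection with the two glass parameters added as an extra block; the actual ray bending is only indicated by a comment, so treat it as structure, not a finished model:

    #include <ceres/ceres.h>
    #include <ceres/rotation.h>

    struct GlassReprojectionError {
        GlassReprojectionError(double X, double Y, double u, double v)
            : X_(X), Y_(Y), u_(u), v_(v) {}

        template <typename T>
        bool operator()(const T* const camera,      // 6: angle-axis rotation + translation
                        const T* const intrinsics,  // 4: fx, fy, cx, cy
                        const T* const glass,       // 2: thickness, refractive index
                        T* residuals) const {
            // Board corner (X_, Y_, 0) in board coordinates -> camera coordinates.
            T pt[3] = { T(X_), T(Y_), T(0) };
            T p[3];
            ceres::AngleAxisRotatePoint(camera, pt, p);
            p[0] += camera[3];  p[1] += camera[4];  p[2] += camera[5];

            // Here you would bend the viewing ray using glass[0] (thickness) and
            // glass[1] (refractive index), as in the deflection estimate above.

            T xp = p[0] / p[2];
            T yp = p[1] / p[2];
            residuals[0] = intrinsics[0] * xp + intrinsics[2] - T(u_);
            residuals[1] = intrinsics[1] * yp + intrinsics[3] - T(v_);
            return true;
        }

        double X_, Y_, u_, v_;   // board corner (X, Y) and its observed pixel (u, v)
    };

    // Per observed corner:
    //   problem.AddResidualBlock(
    //       new ceres::AutoDiffCostFunction<GlassReprojectionError, 2, 6, 4, 2>(
    //           new GlassReprojectionError(X, Y, u, v)),
    //       nullptr, camera_pose, intrinsics, glass_params);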

Simplest way to simulate basic diffusion over a 3D matrix?

I'm currently writing a program that will simulate, in very basic terms, the diffusion and pressure of a gas in a 3D volume with boundaries throughout - think, for example, of an ant cave.
The effects I want to achieve:
1. Gas diffuses throughout the environment over time, respecting walls.
2. I'd like to measure pressure, or the compression of the gas, per grid point. The effect of this should be that if a room is opened, the gas will diffuse out of the opening at a speed that reflects the pressure difference.
My problem is that I lack the knowledge to fully understand theoretical math equations, and to be honest I'm really not looking for an accurate simulation. I just want to achieve some of the prominent effects of the physics at play. I'm not interested in fluid dynamics (for example, the simulation of smoke).
I'll be writing this program in OpenCL but happy to take any form of code examples, be it C or pseudo code.
I'm thinking I should pass in three 3D arrays - one for the gas, one that defines the walls (e.g. 1 at (x,y,z) = wall), and one to store the pressure.
I'm currently assuming checking for a wall is easy enough. One simply checks each neighbour cell and accounts for it if it's not a wall:
For each grid point (x, y, z):
    is wallmatrix[x+1][y][z] a wall?
        [diffusion math here]
    is wallmatrix[x-1][y][z] a wall?
        [diffusion math here]
    is wallmatrix[x][y+1][z] a wall?
        [diffusion math here]
    etc...
But I'm just not sure what to do with the diffusion math, nor how I would include pressure in all this?
Diffusion is one of the easiest things to simulate because it's self-smoothing.
For example, you could run your simulation with constant time steps, keeping track of individual particle positions, and at each time step move each particle a fixed (small) distance, but in a random direction.
There are other ways too; for example, you can use a grid-based approach, where you change the number of particles in each grid location.
A slight issue with your question is where you say, "diffuse out of the opening in a speed that reflects the pressure difference". Diffusion doesn't really do this, since diffusion is just the random motion of particles. I think, though, that even straight diffusion might look satisfying to you here, since the gas will diffuse out of an opening, and it will look faster. Really what will be happening though is that it will be diffusing out at the same speed as everywhere else, it's just that nothing will be diffusing back in. Still, if this isn't satisfying, then you will need to get into fluid dynamics, at least a bit, since this is how one describes how fluid behaves when there's a pressure gradient, not diffusion.
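A minimal sketch of that particle random walk, rejecting moves into wall cells (the names and step size are arbitrary):

    #include <cmath>
    #include <random>
    #include <vector>

    struct Particle { float x, y, z; };

    // One time step: each particle tries to move a fixed small distance in a
    // random direction; moves that would land inside a wall cell are rejected.
    void diffuseStep(std::vector<Particle>& particles,
                     const std::vector<char>& wall,   // flattened nx*ny*nz grid, 1 = wall
                     int nx, int ny, int nz,
                     float stepSize, std::mt19937& rng)
    {
        std::normal_distribution<float> g(0.0f, 1.0f);
        for (Particle& p : particles) {
            // A normalised Gaussian vector gives a uniformly random direction.
            float dx = g(rng), dy = g(rng), dz = g(rng);
            float len = std::sqrt(dx * dx + dy * dy + dz * dz);
            if (len < 1e-6f) continue;
            float x = p.x + stepSize * dx / len;
            float y = p.y + stepSize * dy / len;
            float z = p.z + stepSize * dz / len;

            int ix = (int)std::floor(x), iy = (int)std::floor(y), iz = (int)std::floor(z);
            bool inside = ix >= 0 && ix < nx && iy >= 0 && iy < ny && iz >= 0 && iz < nz;
            if (inside && !wall[(iz * ny + iy) * nx + ix]) {
                p.x = x;  p.y = y;  p.z = z;
            }
        }
    }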
Well, this is not an easy task!
First of all: do you want to simulate basic diffusion, or the complete motion of the gas? The second case isn't easy at all, but you can get an idea here.
If you just want to diffuse a gas in a static environment, things are easier, but you can't measure the total pressure; your only variable will be the local concentration of the gas.
This phenomenon is governed by Fick's laws of diffusion; the second one is probably what you are looking for.
Read up on finite difference methods to understand how to discretize the diffusion equation.
The subject is too big to write a complete answer here.
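Still, to make the finite-difference idea concrete, here is a minimal sketch of one explicit time step of the diffusion equation (Fick's second law) on a 3D grid, skipping wall cells; it's plain C++ for readability, but the per-cell update maps directly onto an OpenCL kernel with one work-item per cell:

    #include <vector>

    // One explicit finite-difference step of dC/dt = D * laplacian(C).
    // conc/next are flattened nx*ny*nz arrays; wall[i] == 1 marks a solid cell.
    // Wall neighbours contribute the cell's own value, i.e. no flux through walls.
    // Stability of the explicit scheme requires roughly D * dt / (h*h) < 1/6 in 3D.
    void diffusionStep(const std::vector<float>& conc, std::vector<float>& next,
                       const std::vector<char>& wall,
                       int nx, int ny, int nz, float D, float dt, float h)
    {
        auto idx = [&](int x, int y, int z) { return (z * ny + y) * nx + x; };
        float k = D * dt / (h * h);

        for (int z = 1; z < nz - 1; ++z)
            for (int y = 1; y < ny - 1; ++y)
                for (int x = 1; x < nx - 1; ++x) {
                    int i = idx(x, y, z);
                    if (wall[i]) { next[i] = 0.0f; continue; }
                    float c = conc[i];
                    auto nb = [&](int j) { return wall[j] ? c : conc[j]; };
                    float lap = nb(idx(x + 1, y, z)) + nb(idx(x - 1, y, z))
                              + nb(idx(x, y + 1, z)) + nb(idx(x, y - 1, z))
                              + nb(idx(x, y, z + 1)) + nb(idx(x, y, z - 1))
                              - 6.0f * c;
                    next[i] = c + k * lap;
                }
    }

In this model the local concentration plays the role of the question's "pressure": the flux through an opening is proportional to the concentration difference across it (Fick's first law), so a freshly opened, well-filled room does empty out faster.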

Which deconvolution algorithm is best suited for removing motion blur from text?

I'm using OpenCV to process pictures taken with a mobile phone. The pictures contain text, and they have small amounts of motion blur, which I need to remove.
What would be the most viable algorithm to use? So far I have tested Lucy-Richardson and Wiener deconvolution, but they did not yield satisfactory results.
I agree with @TheJuice: your problem lies in the PSF estimation. Usually, to be able to do this from a single frame, several assumptions need to be made about the factors leading to the blur (motion of the object, type of motion of the sensor, etc.).
You can find some pointers, especially on the one-dimensional case, here. They use a filtering method that leaves mostly the correlation caused by the blur, discarding the spatial correlation of the original image, and use this to deduce the motion direction and hence the PSF. For small blurs you might be able to consider the motion as constant; otherwise you will have to use a more complex accelerated-motion model.
Unfortunately, mobile phone blur is often a compound of CCD integration and non-linear motion (translation perpendicular to the line of sight, yaw from wrist motion, and rotation around the wrist), so Yitzhaky and Kopeika's method will probably only yield acceptable results in a minority of cases. I know there are methods to deal with that ("depth awareness" and others) but I have never had occasion to deal with them.
You can preview the results using photo recovery software such as Focus Magic; while it does not employ the YK estimator (the motion description is left to you), the remaining workflow is necessarily very similar. If your pictures are amenable to Focus Magic recovery, then the YK method will probably work. If they are not (or not enough, or not enough of them to be worthwhile), then there's no point even trying to implement it.
Motion blur is a difficult problem to overcome. The best results are gained when:
The speed of the camera relative to the scene is known
You have many pictures of the blurred object which you can correlate.
You do have one major advantage in that you are looking at text (which normally constitutes high-contrast features). If you only apply deconvolution to high-contrast areas of your image (I know that the theory often says to exclude high contrast), you should get results that may enable you to better recognise characters. A combination of sharpening/blurring filters for pre-/post-processing may also help.
I remember being impressed with this paper previously. Perhaps an adaptation of their implementation would be worth a go.
I think the estimation of your point-spread function is likely to be more important than the algorithm used. It depends on the kind of motion blur you're trying to remove; linear motion is likely to be the easiest, but is unlikely to be the kind you're dealing with: I imagine it's non-linear, caused by hand movement during the exposure.
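For what it's worth, once you do have a PSF estimate, the deconvolution step itself is straightforward. Here is a rough OpenCV sketch of Wiener deconvolution with a simple linear-motion PSF; the length, angle and noise-to-signal ratio are placeholders you would have to estimate or tune, and a straight-line PSF will not capture the hand-shake blur discussed above:

    #include <cmath>
    #include <opencv2/opencv.hpp>

    // Normalised linear-motion PSF of a given length and angle, centred in a ksize x ksize kernel.
    cv::Mat motionPSF(int ksize, double length, double angleDeg)
    {
        cv::Mat psf = cv::Mat::zeros(ksize, ksize, CV_32F);
        cv::Point c(ksize / 2, ksize / 2);
        double a = angleDeg * CV_PI / 180.0;
        cv::Point d(int(std::cos(a) * length / 2), int(std::sin(a) * length / 2));
        cv::line(psf, c - d, c + d, cv::Scalar(1.0), 1);
        return psf / cv::sum(psf)[0];
    }

    // Wiener deconvolution in the Fourier domain: F_hat = conj(H) / (|H|^2 + nsr) * G.
    cv::Mat wienerDeconvolve(const cv::Mat& blurredGray, const cv::Mat& psf, double nsr)
    {
        cv::Mat img;
        blurredGray.convertTo(img, CV_32F);

        // Pad the PSF to image size and circularly shift its centre to (0, 0).
        cv::Mat psfPad = cv::Mat::zeros(img.size(), CV_32F);
        psf.copyTo(psfPad(cv::Rect(0, 0, psf.cols, psf.rows)));
        cv::Mat tiled;
        cv::repeat(psfPad, 2, 2, tiled);
        psfPad = tiled(cv::Rect(psf.cols / 2, psf.rows / 2, img.cols, img.rows)).clone();

        cv::Mat H, G, R;
        cv::dft(psfPad, H, cv::DFT_COMPLEX_OUTPUT);
        cv::dft(img, G, cv::DFT_COMPLEX_OUTPUT);

        // Build the Wiener filter conj(H) / (|H|^2 + nsr).
        cv::Mat planes[2];
        cv::split(H, planes);
        cv::Mat mag2 = planes[0].mul(planes[0]) + planes[1].mul(planes[1]);
        mag2 += nsr;
        planes[0] = planes[0] / mag2;
        planes[1] = -planes[1] / mag2;
        cv::Mat W;
        cv::merge(planes, 2, W);

        cv::mulSpectrums(G, W, R, 0);      // element-wise complex multiplication
        cv::Mat result;
        cv::idft(R, result, cv::DFT_SCALE | cv::DFT_REAL_OUTPUT);
        return result;                      // CV_32F; normalise before displaying
    }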
You cannot eliminate motion blur. The information is lost forever. What you are dealing with is a CCD that has recorded multiple real objects onto a single pixel, smearing them together. In other words, if the pixel reads 56, you cannot magically determine that the actual reading should have been 37 at time 1, 62 at time 2, and 43 at time 3.
Another way to look at this: imagine you have 5 pictures. You then use Photoshop to blend the pictures together, averaging the value of each pixel. Can you now somehow tell from the blended picture what the original 5 pictures were? No, you cannot, because you do not have the information to do that.