How does Mathf.SmoothDamp() work? What is its algorithm? - c++

I was wondering how SmoothDamp works in Unity. I'm trying to re-create the function outside Unity, but I don't know how it works.

From Unity3d C# reference source code:
// Gradually changes a value towards a desired goal over time.
public static float SmoothDamp(float current, float target, ref float currentVelocity, float smoothTime, [uei.DefaultValue("Mathf.Infinity")] float maxSpeed, [uei.DefaultValue("Time.deltaTime")] float deltaTime)
{
    // Based on Game Programming Gems 4 Chapter 1.10
    smoothTime = Mathf.Max(0.0001F, smoothTime);
    float omega = 2F / smoothTime;

    float x = omega * deltaTime;
    float exp = 1F / (1F + x + 0.48F * x * x + 0.235F * x * x * x);
    float change = current - target;
    float originalTo = target;

    // Clamp maximum speed
    float maxChange = maxSpeed * smoothTime;
    change = Mathf.Clamp(change, -maxChange, maxChange);
    target = current - change;

    float temp = (currentVelocity + omega * change) * deltaTime;
    currentVelocity = (currentVelocity - omega * temp) * exp;
    float output = target + (change + temp) * exp;

    // Prevent overshooting
    if (originalTo - current > 0.0F == output > originalTo)
    {
        output = originalTo;
        currentVelocity = (output - originalTo) / deltaTime;
    }

    return output;
}
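For use outside Unity, the C# listing ports to C++ almost line for line. The following is only a sketch of such a port, not Unity's own code: the name smoothDamp, the float& that stands in for ref, and the explicit maxSpeed and deltaTime arguments (there is no Time.deltaTime to default to) are assumptions made for illustration. It needs C++17 for std::clamp.

```cpp
#include <algorithm>  // std::max, std::clamp (C++17)
#include <cassert>

// Hypothetical C++ port of the C# listing; names are illustrative.
float smoothDamp(float current, float target, float& currentVelocity,
                 float smoothTime, float maxSpeed, float deltaTime) {
    assert(deltaTime > 0.0f);  // the caller must supply the frame time

    smoothTime = std::max(0.0001f, smoothTime);
    const float omega = 2.0f / smoothTime;

    // Rational approximation of exp(-omega * deltaTime).
    const float x = omega * deltaTime;
    const float decay = 1.0f / (1.0f + x + 0.48f * x * x + 0.235f * x * x * x);

    float change = current - target;
    const float originalTo = target;

    // Clamp maximum speed.
    const float maxChange = maxSpeed * smoothTime;
    change = std::clamp(change, -maxChange, maxChange);
    target = current - change;

    const float temp = (currentVelocity + omega * change) * deltaTime;
    currentVelocity = (currentVelocity - omega * temp) * decay;
    float output = target + (change + temp) * decay;

    // Prevent overshooting: snap to the goal once it has been passed.
    if ((originalTo - current > 0.0f) == (output > originalTo)) {
        output = originalTo;
        currentVelocity = (output - originalTo) / deltaTime;  // evaluates to 0
    }
    return output;
}
```

Call it once per frame and keep the velocity variable alive between calls; passing a very large maxSpeed (or infinity) disables the speed clamp.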


Efficient floating point scaling in C++

I'm working on my fast (and accurate) sin implementation in C++, and I have a problem regarding the efficient angle scaling into the +- pi/2 range.
My sin function for +-pi/2 using Taylor series is the following
(Note: FLOAT is a macro expanded to float or double just for the benchmark)
// To switch between float and double
#define FLOAT float

// Sin for 'small' angles, accurate on [-pi/2, pi/2], fairly accurate on [-pi, pi]
FLOAT my_sin_small(FLOAT x)
{
    constexpr FLOAT C1 = 1. / (7. * 6. * 5. * 4. * 3. * 2.);
    constexpr FLOAT C2 = -1. / (5. * 4. * 3. * 2.);
    constexpr FLOAT C3 = 1. / (3. * 2.);
    constexpr FLOAT C4 = -1.;
    // Correction for sin(pi/2) = 1, due to the ignored Taylor terms
    constexpr FLOAT corr = -1. / 0.9998431013994987;
    const FLOAT x2 = x * x;
    return corr * x * (x2 * (x2 * (x2 * C1 + C2) + C3) + C4);
}
So far so good... The problem comes when I try to scale an arbitrary angle into the +-pi/2 range. My current solution is:
FLOAT my_sin(FLOAT x)
{
    constexpr FLOAT pi = 3.141592653589793238462;
    constexpr FLOAT rpi = 1 / pi;
    // convert to +-pi/2 range
    int n = std::nearbyint(x * rpi);
    FLOAT xbar = (n * pi - x) * (2 * (n & 1) - 1);
    // (2 * (n & 1) - 1) is a sign correction (see below)
    return my_sin_small(xbar);
}
I made a benchmark, and I'm losing a lot of time in the +-pi/2 scaling.
The int(angle/pi + 0.5) trick is a no-go since it is limited to int precision and also requires +- branching, and I try to avoid branches...
What should I try to improve the performance of this scaling? I'm out of ideas.
[Image: benchmark results for float. In the benchmark the angle could be outside the validity range of my_sin_small, but for the benchmark I don't care about that.]
[Image: benchmark results for double.]
[Image: sign correction for xbar in my_sin().]
[Image: accuracy of the algorithm compared to Python's sin() function.]
Candidate improvements
Convert the radians x to rotations by dividing by 2*pi.
Retain only the fraction so we have an angle (-1.0 ... 1.0). This simplifies the OP's modulo step to a simple "drop the whole number" step instead. Going forward, supporting different angle units simply involves a change of coefficient set. No need to scale back to radians.
For positive values, subtract 0.5 so we have (-0.5 ... 0.5) and then flip the sign. This centers the possible values about 0.0 and makes for better convergence of the approximating polynomial as compared to the math sine function. For negative values - see below.
Call my_sin_small1() that uses this (-0.5 ... 0.5) rotations range rather than [-pi ... +pi] radians.
In my_sin_small1(), fold constants together to drop the corr * step.
Rather than use the truncated Taylor series, use a more optimal set of coefficients. IMO, this will provide better answers, especially near +/-pi.
Notes: No int to/from float code. With more analysis, it is possible to get a better set of coefficients that brings my_sin(+/-pi) closer to 0.0. This is just a quick set of code to demo fewer FP steps and good potential results.
C like code for OP to port to C++
FLOAT my_sin_small1(FLOAT x) {
    static const FLOAT A1 = -5.64744881E+01;
    static const FLOAT A2 = +7.81017968E+01;
    static const FLOAT A3 = -4.11145353E+01;
    static const FLOAT A4 = +6.27923581E+00;
    const FLOAT x2 = x * x;
    return x * (x2 * (x2 * (x2 * A1 + A2) + A3) + A4);
}

FLOAT my_sin1(FLOAT x) {
    static const FLOAT pi = 3.141592653589793238462;
    static const FLOAT pi2i = 1/(pi * 2);
    x *= pi2i;  // radians to rotations
    FLOAT xfraction = 0.5f - (x - truncf(x));
    return my_sin_small1(xfraction);
}
For negative values, use -my_sin1(-x) or like code to flip the sign - or add 0.5 in the above minus 0.5 step.
#include <math.h>
#include <stdio.h>

int main(void) {
    for (int d = 0; d <= 360; d += 20) {
        FLOAT x = d / 180.0 * M_PI;
        FLOAT y = my_sin1(x);
        printf("%12.6f %11.8f %11.8f\n", x, sin(x), y);
    }
}
0.000000 0.00000000 -0.00022483
0.349066 0.34202013 0.34221691
0.698132 0.64278759 0.64255589
1.047198 0.86602542 0.86590189
1.396263 0.98480775 0.98496443
1.745329 0.98480775 0.98501128
2.094395 0.86602537 0.86603642
2.443461 0.64278762 0.64260530
2.792527 0.34202022 0.34183803
3.141593 -0.00000009 0.00000000
3.490659 -0.34202016 -0.34183764
3.839724 -0.64278757 -0.64260519
4.188790 -0.86602546 -0.86603653
4.537856 -0.98480776 -0.98501128
4.886922 -0.98480776 -0.98496443
5.235988 -0.86602545 -0.86590189
5.585053 -0.64278773 -0.64255613
5.934119 -0.34202036 -0.34221727
6.283185 0.00000017 -0.00022483
Alternate code below makes for better results near 0.0, yet might cost a tad more time. OP seems more inclined to speed.
FLOAT xfraction = 0.5f - (x - truncf(x));
// vs.
FLOAT xfraction = x - truncf(x);
if (xfraction >= 0.5f) xfraction -= 1.0f;
Below is a better set with about 10% reduced error.
Yet another approach:
Spend more time (code) to reduce the range to ±pi/4 (±45 degrees); then it is possible to use only 3 or 2 terms of a polynomial that resembles the usual Taylor series.
float sin_quick_small(float x) {
    const float x2 = x * x;
#if 0
    // max error about 7e-7
    static const FLOAT A2 = +0.00811656036940792f;
    static const FLOAT A3 = -0.166597759850666f;
    static const FLOAT A4 = +0.999994132743861f;
    return x * (x2 * (x2 * A2 + A3) + A4);
#else
    // max error about 0.00016
    static const FLOAT A3 = -0.160343346851626f;
    static const FLOAT A4 = +0.999031566686144f;
    return x * (x2 * A3 + A4);
#endif
}

float cos_quick_small(float x) {
    return cosf(x); // TBD code.
}

float sin_quick(float x) {
    if (x < 0.0) {
        return -sin_quick(-x);
    }
    int quo;
    // x90 is the remainder in [-pi/4, pi/4]; quo holds the low bits of the quotient
    float x90 = remquof(fabsf(x), 3.141592653589793238462f / 2, &quo);
    switch (quo % 4) {
        case 0:
            return sin_quick_small(x90);
        case 1:
            return cos_quick_small(x90);
        case 2:
            return sin_quick_small(-x90);
        case 3:
            return -cos_quick_small(x90);
    }
    return 0.0;
}

int main() {
    float max_x = 0.0;
    float max_error = 0.0;
    for (int d = -45; d <= 45; d += 1) {
        FLOAT x = d / 180.0 * M_PI;
        FLOAT y = sin_quick(x);
        double err = fabs(y - sin(x));
        if (err > max_error) {
            max_x = x;
            max_error = err;
        }
        printf("%12.6f %11.8f %11.8f err:%11.8f\n", x, sin(x), y, err);
    }
    printf("x:%.6f err:%.6f\n", max_x, max_error);
    return 0;
}
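cos_quick_small is left as "TBD code" and falls back on cosf. One possible way to fill it in is a truncated Taylor series, which converges quickly on ±pi/4. The coefficients below are plain Taylor terms, not a fitted set like the sine coefficients in the answer, and the name cos_taylor_small is made up for this sketch.

```cpp
#include <cassert>
#include <cmath>

// Sketch: cosine on [-pi/4, pi/4] from the Taylor terms
// 1 - x^2/2! + x^4/4! - x^6/6!.
// The first omitted term, x^8/8!, is about 3.6e-6 at x = pi/4.
float cos_taylor_small(float x) {
    assert(std::fabs(x) <= 0.7854f);  // intended range only (pi/4 is about 0.785398)
    const float C6 = -1.0f / 720.0f;
    const float C4 = 1.0f / 24.0f;
    const float C2 = -0.5f;
    const float x2 = x * x;
    // Horner form, same shape as sin_quick_small but for an even polynomial.
    return x2 * (x2 * (x2 * C6 + C4) + C2) + 1.0f;
}
```

A fitted coefficient set would likely reduce the maximum error further, in the same way the answer's sine coefficients improve on the Taylor ones.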

How do I resolve a collision's position properly in 2D collision detection?

My current implementation looks like this:
if (shapesCollide) {
    if (velocity.y > 0) entity.position.y = other.position.y - entity.size.y;
    else entity.position.y = other.position.y + other.size.y;
    velocity.y = 0;
    if (velocity.x > 0) entity.position.x = other.position.x - entity.size.x;
    else entity.position.x = other.position.x + other.size.x;
    velocity.x = 0;
}
However, this leads to weird handling when movement is happening on both axes. For example, if the entity is moving downward to the left of the object and is then moved sideways to collide with it, the horizontal collision is resolved correctly, but the vertical movement breaks.
I previously simply went
if (shapesCollide) {
    position = oldPosition;
    velocity = { 0, 0 };
}
But this led to another multi-axis issue: if my entity is resting atop the object, it will be unable to move, as the gravity-induced movement will constantly cancel out both velocities. I also tried considering both axes separately, but this led to issues whenever the collision only occurs when both velocities are taken into account.
What is the best solution to resolving collision on two axes?
I assume that the entities can be considered more or less round and that size is the radius of the entities?
We probably need a little vector math to resolve this. (I'm not sure of the name of the square-root function in C++, so double-check sqrt.) Try replacing your code inside if(shapesCollide) with this and see how it works for you.
float rEntity = sqrt(entity.size.x * entity.size.x + entity.size.y * entity.size.y);
float rOther = sqrt(other.size.x * other.size.x + other.size.y * other.size.y);
float midX = (entity.position.x + other.position.x) / 2.0;
float midY = (entity.position.y + other.position.y) / 2.0;
float dx = entity.position.x - midX;
float dy = entity.position.y - midY;
float D = sqrt(dx * dx + dy * dy);
rEntity and rOther are the radii of the objects, and midX and midY are the coordinates of the midpoint between them. dx and dy are the offsets from that midpoint to the entity.
Then do:
entity.position.x = midX + dx * rEntity / D;
entity.position.y = midY + dy * rEntity / D;
other.position.x = midX - dx * rOther / D;
other.position.y = midY - dy * rOther / D;
You should probably check that D is not 0, and if it is, just set dx = 1, dy = 0, D = 1 or something like that.
You should also still do:
velocity.x = 0;
velocity.y = 0;
if you want the entities to stop.
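Put together, the steps above could look like the following in C++, where the square-root function is std::sqrt from <cmath>. This is only a sketch: the Vec2 and Circle types and the name separateCircles are made up for illustration, position is assumed to be the center of the circle, and the radius is stored directly instead of being derived from size.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Circle { Vec2 position; float radius; };  // position = center (assumed)

// Moves both circles along the line through their centers so that they
// end up exactly touching, symmetric about the old midpoint.
void separateCircles(Circle& entity, Circle& other) {
    assert(entity.radius > 0.0f && other.radius > 0.0f);

    const float midX = (entity.position.x + other.position.x) / 2.0f;
    const float midY = (entity.position.y + other.position.y) / 2.0f;
    float dx = entity.position.x - midX;
    float dy = entity.position.y - midY;
    float D = std::sqrt(dx * dx + dy * dy);

    // Coincident centers have no direction; pick an arbitrary one.
    if (D == 0.0f) {
        dx = 1.0f;
        dy = 0.0f;
        D = 1.0f;
    }

    entity.position.x = midX + dx * entity.radius / D;
    entity.position.y = midY + dy * entity.radius / D;
    other.position.x = midX - dx * other.radius / D;
    other.position.y = midY - dy * other.radius / D;
}
```

After the call the distance between the centers equals the sum of the radii, so the circles touch without overlapping.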
For more accurate modelling, you could also try the following:
float rEntity = sqrt(entity.size.x * entity.size.x + entity.size.y * entity.size.y);
float rOther = sqrt(other.size.x * other.size.x + other.size.y * other.size.y);
float midX = (entity.position.x * rOther + other.position.x * rEntity) / (rEntity + rOther);
float midY = (entity.position.y * rOther + other.position.y * rEntity) / (rEntity + rOther);
float dxEntity = entity.position.x - midX;
float dyEntity = entity.position.y - midY;
float dEntity = sqrt(dxEntity * dxEntity + dyEntity * dyEntity);
float dxOther = other.position.x - midX;
float dyOther = other.position.y - midY;
float dOther = sqrt(dxOther * dxOther + dyOther * dyOther);
entity.position.x = midX + dxEntity * rEntity / dEntity;
entity.position.y = midY + dyEntity * rEntity / dEntity;
other.position.x = midX + dxOther * rOther / dOther;
other.position.y = midY + dyOther * rOther / dOther;
which finds the midpoints when the radii are taken into account. But I won't guarantee that that works. Also, the signs on the last additions are important.
I hope this helps (and works). Let me know if something is unclear.
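The shapes in the question look like axis-aligned boxes (a position plus a size) rather than circles. For that case, a commonly used alternative, which is not part of the answer above, is to push the entity out along the axis with the smaller overlap and zero only that velocity component. The sketch below assumes that position is the top-left corner, as the question's arithmetic suggests; Box and resolveAabb are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Box { Vec2 position; Vec2 size; };  // position = top-left corner (assumed)

// Returns true if the boxes overlapped and 'entity' was pushed out.
bool resolveAabb(Box& entity, Vec2& velocity, const Box& other) {
    assert(entity.size.x > 0.0f && entity.size.y > 0.0f);
    assert(other.size.x > 0.0f && other.size.y > 0.0f);

    // Signed distance between the box centers on each axis.
    const float dx = (entity.position.x + entity.size.x * 0.5f) -
                     (other.position.x + other.size.x * 0.5f);
    const float dy = (entity.position.y + entity.size.y * 0.5f) -
                     (other.position.y + other.size.y * 0.5f);

    // Penetration depth on each axis; non-positive means no overlap.
    const float overlapX = (entity.size.x + other.size.x) * 0.5f - std::fabs(dx);
    const float overlapY = (entity.size.y + other.size.y) * 0.5f - std::fabs(dy);
    if (overlapX <= 0.0f || overlapY <= 0.0f) return false;

    // Resolve along the axis of least penetration only.
    if (overlapX < overlapY) {
        entity.position.x += (dx < 0.0f) ? -overlapX : overlapX;
        velocity.x = 0.0f;
    } else {
        entity.position.y += (dy < 0.0f) ? -overlapY : overlapY;
        velocity.y = 0.0f;
    }
    return true;
}
```

With this scheme an entity resting on top of an object only loses its vertical velocity, so it can still slide sideways, which is the case the question describes as stuck.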

How does this lighting calculation work?

I have this piece of code that is responsible for lighting a pyramid.
float Geometric3D::calculateLight(int vert1, int vert2, int vert3) {
    // Two edge vectors of the triangle
    float ax = tabX[vert2] - tabX[vert1];
    float ay = tabY[vert2] - tabY[vert1];
    float az = tabZ[vert2] - tabZ[vert1];
    float bx = tabX[vert3] - tabX[vert1];
    float by = tabY[vert3] - tabY[vert1];
    float bz = tabZ[vert3] - tabZ[vert1];
    // Normal N = a x b (cross product)
    float Nx = (ay * bz) - (az * by);
    float Ny = (az * bx) - (ax * bz);
    float Nz = (ax * by) - (ay * bx);
    // Light vector L
    float Lx = -300.0f;
    float Ly = -300.0f;
    float Lz = -1000.0f;
    float lenN = sqrtf((Nx * Nx) + (Ny * Ny) + (Nz * Nz));
    float lenL = sqrtf((Lx * Lx) + (Ly * Ly) + (Lz * Lz));
    float res = ((Nx * Lx) + (Ny * Ly) + (Nz * Lz)) / (lenN * lenL);
    if (res < 0.0f)
        res = -res;
    return res;
}
I cannot understand the calculations at the end. Can someone explain the maths behind them? I know that the program first calculates two vectors in the plane to compute its normal (the vector N). Vector L stands for the light, but what happens next? Why do we calculate the lengths of the normal and the light, multiply them, and then divide by the result?
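For reference, the last lines of calculateLight match the dot-product identity N . L = |N| * |L| * cos(theta): dividing the dot product by both lengths leaves cos(theta), the cosine of the angle between the normal and the light vector, and the final sign flip takes its absolute value. A small sketch of just that step, with a hypothetical Vec3 type and function names that are not part of the question's class:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// |cos(theta)| between a and b: 1 when the vectors are parallel or
// anti-parallel, 0 when they are perpendicular.
float absCosAngle(const Vec3& a, const Vec3& b) {
    const float lenA = std::sqrt(dot(a, a));
    const float lenB = std::sqrt(dot(b, b));
    assert(lenA > 0.0f && lenB > 0.0f);  // zero-length vectors have no direction
    return std::fabs(dot(a, b) / (lenA * lenB));
}
```

Read this way, res is a brightness factor between 0 and 1: a face pointing straight along the light direction gets 1, and a face edge-on to the light gets 0.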

Optimizing circle-circle collision response

With SFML, I made an algorithm that calculates the trajectories of two balls after a collision. It works fine, but if I try it with more than 30 balls, it freezes instantly or after 10-20 seconds.
I tried to avoid doing the same calculations multiple times, but it didn't help.
Any suggestions? (I have a high-end PC, so the problem is not there.)
phi is the collision angle, dis is the distance:
void collisionResponse(Circle &a, Circle &b)
float mass1 = a.getMass();
float mass2 = b.getMass();
float disX = a.pos.x - b.pos.x;
float disY = a.pos.y - b.pos.y;
float phi = atan2(disY, disX);
float speed1 = a.getSpeed();
float speed2 = b.getSpeed();
float angle1 = a.getAngle();
float angle2 = b.getAngle();
float v1x = speed1*cos((angle1 - phi));
float v1y = speed1*sin((angle1 - phi));
float v2x = speed2*cos((angle2 - phi));
float v2y = speed2*sin((angle2 - phi));
float f1x = ((mass1 - mass2)*v1x + (mass2 + mass2)*v2x) / (mass1+mass2);
float f2x = ((mass1 + mass1)*v1x + (mass2 - mass1)*v2x) / (mass1+mass2);
float f1y = v1y;
float f2y = v2y;
float cosphi = cos(phi);
float sinphi = sin(phi);
float cosphiPI = cos(phi + PI / 2);
float sinphiPI = sin(phi + PI / 2);
a.speed.x = cosphi*f1x + cosphiPI*f1y;
a.speed.y = sinphi*f1x + sinphiPI*f1y;
b.speed.x = cosphi*f2x + cosphiPI*f2y;
b.speed.y = sinphi*f2x + sinphiPI*f2y;
while (sqr(a.pos.x - b.pos.x) + sqr(a.pos.y - b.pos.y) <= sqr(a.getRadius()+ b.getRadius()))
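One part of the function that can be trimmed is the trigonometry: the same one-dimensional elastic exchange along the line of centers can be written with dot products, which avoids atan2, cos and sin altogether. The sketch below assumes a hypothetical Ball struct with pos, vel and mass in place of the question's Circle class, and it does not cover the while loop, whose body is not shown.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };
struct Ball { Vec2 pos; Vec2 vel; float mass; };  // hypothetical stand-in for Circle

// Elastic collision response without angles: only the velocity component
// along the line of centers changes; the tangential component is kept.
void elasticResponse(Ball& a, Ball& b) {
    const float ddx = a.pos.x - b.pos.x;
    const float ddy = a.pos.y - b.pos.y;
    const float dist = std::sqrt(ddx * ddx + ddy * ddy);
    assert(dist > 0.0f);  // centers must not coincide

    // Unit vector from b to a (the collision normal).
    const float nx = ddx / dist;
    const float ny = ddy / dist;

    // Relative velocity projected onto the normal.
    const float vrel = (a.vel.x - b.vel.x) * nx + (a.vel.y - b.vel.y) * ny;

    const float total = a.mass + b.mass;
    const float ja = 2.0f * b.mass / total * vrel;
    const float jb = 2.0f * a.mass / total * vrel;

    a.vel.x -= ja * nx;
    a.vel.y -= ja * ny;
    b.vel.x += jb * nx;
    b.vel.y += jb * ny;
}
```

Along the normal this reproduces the f1x and f2x formulas from the question, while f1y and f2y (the tangential parts) are left unchanged.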

How to speed up bilinear interpolation of image?

I'm trying to rotate an image with interpolation, but it's too slow for real time on big images.
The code is something like:
for(int y=0;y<dst_h;++y)
for(int x=0;x<dst_w;++x)
//do inverse transform
fPoint pt(Transform(Point(x, y)));
//in coor of src
int x1= (int)floor(pt.x);
int y1= (int)floor(pt.y);
int x2= x1+1;
int y2= y1+1;
Mask[y][x]= 1; //show pixel
float dx1= pt.x-x1;
float dx2= 1-dx1;
float dy1= pt.y-y1;
float dy2= 1-dy1;
pd[x].blue= (dy2*(ps[y1*src_w+x1].blue*dx2+ps[y1*src_w+x2].blue*dx1)+
pd[x].green= (dy2*(ps[y1*src_w+x1].green*dx2+ps[y1*src_w+x2].green*dx1)+
pd[x].red= (dy2*(ps[y1*src_w+x1].red*dx2+ps[y1*src_w+x2].red*dx1)+
//nearest neighbour
//pd[x]= ps[((int)pt.y)*src_w+(int)pt.x];
Mask[y][x]= 0; //transparent pixel
pd+= dst_w;
How can I speed up this code? I tried to parallelize it, but there seems to be no speedup because of the memory access pattern (?).
The key is to do most of your computations as ints. The only thing that is necessary to do as a float is the weighting. See here for a good resource.
From that same resource:
int px = (int)x; // floor of x
int py = (int)y; // floor of y
const int stride = img->width;
const Pixel* p0 = img->data + px + py * stride; // pointer to first pixel
// load the four neighboring pixels
const Pixel& p1 = p0[0 + 0 * stride];
const Pixel& p2 = p0[1 + 0 * stride];
const Pixel& p3 = p0[0 + 1 * stride];
const Pixel& p4 = p0[1 + 1 * stride];
// Calculate the weights for each pixel
float fx = x - px;
float fy = y - py;
float fx1 = 1.0f - fx;
float fy1 = 1.0f - fy;
int w1 = fx1 * fy1 * 256.0f;
int w2 = fx * fy1 * 256.0f;
int w3 = fx1 * fy * 256.0f;
int w4 = fx * fy * 256.0f;
// Calculate the weighted sum of pixels (for each color channel)
int outr = p1.r * w1 + p2.r * w2 + p3.r * w3 + p4.r * w4;
int outg = p1.g * w1 + p2.g * w2 + p3.g * w3 + p4.g * w4;
int outb = p1.b * w1 + p2.b * w2 + p3.b * w3 + p4.b * w4;
int outa = p1.a * w1 + p2.a * w2 + p3.a * w3 + p4.a * w4;
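The snippet stops at the weighted sums. Because the four weights were scaled by 256, each sum still has to be divided by 256 (a right shift by 8) to get back into the 0..255 range. Here is a single-channel sketch of the whole sample, assuming a hypothetical row-major 8-bit grayscale buffer instead of the Pixel type above; sampleBilinear is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: samples a row-major 8-bit grayscale image at (x, y).
// Requires x >= 0, y >= 0 and that the pixel to the right and below exists.
std::uint8_t sampleBilinear(const std::vector<std::uint8_t>& img,
                            int width, int height, float x, float y) {
    const int px = static_cast<int>(x);  // floor, since x is non-negative
    const int py = static_cast<int>(y);
    assert(x >= 0.0f && y >= 0.0f && px + 1 < width && py + 1 < height);

    const std::uint8_t* p0 = img.data() + px + py * width;

    // Fractional parts, then 8-bit fixed-point weights (scaled by 256).
    const float fx = x - px;
    const float fy = y - py;
    const int w1 = static_cast<int>((1.0f - fx) * (1.0f - fy) * 256.0f);
    const int w2 = static_cast<int>(fx * (1.0f - fy) * 256.0f);
    const int w3 = static_cast<int>((1.0f - fx) * fy * 256.0f);
    const int w4 = static_cast<int>(fx * fy * 256.0f);

    // Integer weighted sum of the four neighbors, then undo the scaling.
    const int sum = p0[0] * w1 + p0[1] * w2 + p0[width] * w3 + p0[width + 1] * w4;
    return static_cast<std::uint8_t>(sum >> 8);
}
```

Since each weight is truncated to an int, the four weights can add up to slightly less than 256, so results may come out a fraction darker than a pure float computation.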
Wow, you are doing a lot inside the innermost loop, for example:
1. float to int conversions
You can do everything on floats; they are pretty fast these days. The conversion is what is killing you. You are also mixing floats and ints together (if I see it right), which amounts to the same thing.
2. calls inside the loop
Any unnecessary call causes heap thrashing and slows things down. Instead, add 2 variables xx, yy and interpolate them inside your for loops.
3. if ...
Why the heck are you adding an if? Limit the for ranges before the loop, not inside it. The background can be filled by other for loops before or after.
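The idea of interpolating xx and yy inside the loops works because a rotation is an affine map: one step to the right in the destination always moves the source coordinate by the same (cos, sin) offset, so the per-pixel transform call can be replaced by two additions. The sketch below only records the source coordinate for each destination pixel; rotation about the origin is assumed, and SrcCoord and rotatedCoords are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct SrcCoord { float x, y; };

// Source coordinates for a dst_w x dst_h destination image, rotated by
// 'angle' radians about the origin, computed incrementally.
std::vector<SrcCoord> rotatedCoords(int dst_w, int dst_h, float angle) {
    assert(dst_w > 0 && dst_h > 0);
    const float c = std::cos(angle);
    const float s = std::sin(angle);

    std::vector<SrcCoord> out;
    out.reserve(static_cast<std::size_t>(dst_w) * static_cast<std::size_t>(dst_h));

    // Source coordinate of destination pixel (0, y), updated once per row.
    float rowX = 0.0f;
    float rowY = 0.0f;
    for (int y = 0; y < dst_h; ++y) {
        float xx = rowX;
        float yy = rowY;
        for (int x = 0; x < dst_w; ++x) {
            out.push_back({xx, yy});
            xx += c;  // change in src.x per destination pixel to the right
            yy += s;  // change in src.y per destination pixel to the right
        }
        rowX -= s;  // change in src.x per destination row
        rowY += c;  // change in src.y per destination row
    }
    return out;
}
```

Restarting xx and yy from the row origin on every row keeps the accumulated floating-point error bounded by the row length rather than the whole image.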