Clamping a value to a range (sort of) - c++

I have a slider that returns values from 0.0f to 1.0f.
I want to use this value and clamp it to MIN and MAX, but not exactly clamp.
Say min is 0.2f and max is 0.3f. When the slider would be at 0, I want 0.2f. When the slider is at 0.5f, I want 0.25f, and so on.
It's just so that the effect of the slider is not as strong.
Given MIN, MAX and sliderVal, how could I clamp sliderVal?
Thanks

slider_range = slider_max - slider_min;
range = range_max - range_min;
value = (double)(slider_pos - slider_min) / slider_range * range + range_min;

Assuming you want the slider to linearly change between 0.2f and 0.3f, then the transformation from the interval [0.0 1.0] to [0.2 0.3] is trivial:
newVal = 0.2f + (sliderVal)*0.1f;
Looking at this from a mathematical perspective, you want the output to be linear with respect to the input, according to your description. Thus, the transfer function between the input and output values must be of the form:
y = mx + b
Consider the x value to be the input (the slider value), and the y value to be the output (the new, desired value). Thus, you have two points: (0.0, 0.2) and (1.0, 0.3) Substitute these points into the above equation:
0.2 = (0.0)m + b
0.3 = (1.0)m + b
You now have a system of linear equations which are trivial to solve for:
0.2 = (0.0)m + b --> b = 0.2
0.3 = (1.0)m + b --> 0.3 = m + 0.2 --> m = 0.1
Thus, the transfer function is:
y = 0.1 * x + 0.2
Q.E.D.
We can generalize the above process. Instead of using points (0.0, 0.2) and (1.0, 0.3), use points (minSlider, minValue) and (maxSlider, maxValue).
minValue = (minSlider)m + b
maxValue = (maxSlider)m + b
Eliminate the variable b:
minValue = (minSlider)m + b
-maxValue = -(maxSlider)m - b
--> minValue-maxValue = (minSlider-maxSlider)m
m = (minValue-maxValue)/(minSlider-maxSlider)
Eliminate the variable m:
minValue*maxSlider = (minSlider*maxSlider)m + b*maxSlider
-maxValue*minSlider = -(minSlider*maxSlider)m - b*minSlider
--> minValue*maxSlider - maxValue*minSlider = b(maxSlider-minSlider)
b = (minValue*maxSlider - maxValue*minSlider)/(maxSlider-minSlider)
You can verify that these equations give you the exact same values for m and b. If we assume that the minimum slider value will always be 0.0:
m = (minValue-maxValue)/(minSlider-maxSlider)
b = (minValue*maxSlider - maxValue*minSlider)/(maxSlider-minSlider)
--> m = (maxValue-minValue)/(maxSlider)
b = minValue
In C++:
const double maxSlider = 1.0;
const double minValue = 0.2;
const double maxValue = 0.3;
double value = (maxValue-minValue)/(maxSlider)*getSliderPosition() + minValue;
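For a slider whose minimum is not 0, the general m and b derived above can be wrapped in a small helper. This is only a sketch: the name remapLinear is made up for illustration, and it assumes minSlider != maxSlider.

```cpp
// Hypothetical helper (not part of the original answer): maps x linearly so that
// minSlider -> minValue and maxSlider -> maxValue.
// Assumes minSlider != maxSlider.
double remapLinear(double x, double minSlider, double maxSlider,
                   double minValue, double maxValue) {
    // Slope and intercept taken from the two-point derivation.
    const double m = (minValue - maxValue) / (minSlider - maxSlider);
    const double b = (minValue * maxSlider - maxValue * minSlider) / (maxSlider - minSlider);
    return m * x + b;
}
```

With the values from the question, remapLinear(sliderVal, 0.0, 1.0, 0.2, 0.3) should give 0.2 at the bottom of the slider and 0.3 at the top.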

Basically you have
0.0f -> MIN
1.0f -> MAX
and you want
clampedVal = sliderVal * ( MAX - MIN ) + MIN

std::lerp (C++20) does this. It takes three floating-point arguments and linearly interpolates between the first and the second, using the third as the interpolation parameter.
Quoting from cppreference:
#include <iostream>
#include <cmath>
int main()
{
float a=10.0f, b=20.0f;
std::cout << "a=" << a << ", " << "b=" << b << '\n'
<< "mid point=" << std::lerp(a,b,0.5f) << '\n'
<< std::boolalpha << (a == std::lerp(a,b,0.0f)) << ' '
<< std::boolalpha << (b == std::lerp(a,b,1.0f)) << '\n';
}
Output:
a=10, b=20
mid point=15
true true
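Note that std::lerp does not restrict the third argument to [0, 1], so values outside that range extrapolate. If the slider could ever report something slightly out of range, or if C++20 is not available, a hand-written version along these lines might do. The name clampedLerp is made up for this sketch.

```cpp
#include <algorithm>

// Hypothetical helper for pre-C++20 code: interpolates between a and b
// after restricting t to [0, 1], so the result never leaves [a, b].
float clampedLerp(float a, float b, float t) {
    t = std::min(1.0f, std::max(0.0f, t));  // keep t inside [0, 1]
    return a + t * (b - a);                 // plain linear interpolation
}
```

For the question's case that would be clampedLerp(MIN, MAX, sliderVal).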

Related

Why 0.2f + 0.3f == 0.5f?

I already know how floating-point numbers are stored in memory, and I understand why the expression 0.1 + 0.2 != 0.3 is true.
But I don't understand why 0.2f + 0.3f == 0.5f is true.
Here is my code:
cout << setprecision(64)
<< "0.3 = " << 0.3 << "\n"
<< "0.2 = " << 0.2 << "\n"
<< "0.2 + 0.3 = " << 0.2 + 0.3 << "\n"
<< "0.3f = " << 0.3f << "\n"
<< "0.2f = " << 0.2f << "\n"
<< "0.2f + 0.3f = " << 0.2f + 0.3f << "\n";
I get output:
0.3 = 0.299999999999999988897769753748434595763683319091796875
0.2 = 0.200000000000000011102230246251565404236316680908203125
0.2 + 0.3 = 0.5
0.3f = 0.300000011920928955078125
0.2f = 0.20000000298023223876953125
0.2f + 0.3f = 0.5
I agree that if we sum 0.3 + 0.2 with double types a result will be 0.5, because 0.299999999999999988897769753748434595763683319091796875 + 0.200000000000000011102230246251565404236316680908203125 = 0.5.
But I still don't understand why sum 0.2f + 0.3f is 0.5 too. I expect the result will be 0.50000001490116119384765625 (0.300000011920928955078125 + 0.20000000298023223876953125).
Could you please help me understand where I'm wrong?
The basic reason is that although .2f is a little above .2 and .3f is a little above .3, the sum of the excesses is less than halfway from .5 to the next representable float number.
First, let’s note the scales used for these numbers. In the IEEE-754 binary32 format, the step between representable numbers in [1, 2) is 2^-23. Each representable number in this interval is an integer multiple of 2^-23.
.3 is in [¼, ½), where the step is 2^-25.
.2 is in [⅛, ¼), where the step is 2^-26.
The literal 0.2f is .2 converted to float. This produces 13,421,773•2^-26, which equals 0.20000000298023223876953125. For 0.3f, we get 10,066,330•2^-25, which is 0.300000011920928955078125.
Let’s convert those scales to the scale used for numbers in [½, 1), where the step is 2^-24.
13,421,773•2^-26 becomes 3,355,443.25•2^-24, and 10,066,330•2^-25 becomes 5,033,165•2^-24. Adding those produces 8,388,608.25•2^-24. To get a representable result, we round that to the nearest integer multiple of 2^-24. As you can see, the fraction is .25, so we round down, yielding 8,388,608•2^-24, which is .5. The next representable number, 8,388,609•2^-24, which is 0.500000059604644775390625, is further away.
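The scaled values above can be checked with a few lines of code. This is a sketch that assumes float is IEEE-754 binary32 and that float arithmetic is rounded to float (as on typical x86-64 builds). The helper names are made up.

```cpp
#include <cmath>

// Expresses v in units of 2^-24, the spacing of floats in [1/2, 1).
// The conversion to double and the scaling by a power of two are both exact.
double inUnitsOfTwoToMinus24(float v) {
    return std::ldexp(static_cast<double>(v), 24);
}

// Adds two floats and returns the float-rounded result.
float addAsFloat(float a, float b) {
    float s = a + b;
    return s;
}
```

inUnitsOfTwoToMinus24(0.2f) and inUnitsOfTwoToMinus24(0.3f) should come out as 3355443.25 and 5033165, and addAsFloat(0.2f, 0.3f) should compare equal to 0.5f.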

how to wrap radians between -pi and pi with mod? [duplicate]

I'm looking for some nice C code that will accomplish effectively:
while (deltaPhase >= M_PI) deltaPhase -= M_TWOPI;
while (deltaPhase < -M_PI) deltaPhase += M_TWOPI;
What are my options?
Edit Apr 19, 2013:
Modulo function updated to handle boundary cases as noted by aka.nice and arr_sea:
static const double _PI= 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348;
static const double _TWO_PI= 6.2831853071795864769252867665590057683943387987502116419498891846156328125724179972560696;
// Floating-point modulo
// The result (the remainder) has same sign as the divisor.
// Similar to matlab's mod(); Not similar to fmod() - Mod(-3,4)= 1 fmod(-3,4)= -3
template<typename T>
T Mod(T x, T y)
{
static_assert(!std::numeric_limits<T>::is_exact , "Mod: floating-point type expected");
if (0. == y)
return x;
double m= x - y * floor(x/y);
// handle boundary cases resulted from floating-point cut off:
if (y > 0) // modulo range: [0..y)
{
if (m>=y) // Mod(-1e-16 , 360. ): m= 360.
return 0;
if (m<0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(106.81415022205296 , _TWO_PI ): m= -1.421e-14
}
}
else // modulo range: (y..0]
{
if (m<=y) // Mod(1e-16 , -360. ): m= -360.
return 0;
if (m>0 )
{
if (y+m == y)
return 0 ; // just in case...
else
return y+m; // Mod(-106.81415022205296, -_TWO_PI): m= 1.421e-14
}
}
return m;
}
// wrap [rad] angle to [-PI..PI)
inline double WrapPosNegPI(double fAng)
{
return Mod(fAng + _PI, _TWO_PI) - _PI;
}
// wrap [rad] angle to [0..TWO_PI)
inline double WrapTwoPI(double fAng)
{
return Mod(fAng, _TWO_PI);
}
// wrap [deg] angle to [-180..180)
inline double WrapPosNeg180(double fAng)
{
return Mod(fAng + 180., 360.) - 180.;
}
// wrap [deg] angle to [0..360)
inline double Wrap360(double fAng)
{
return Mod(fAng ,360.);
}
One-liner constant-time solution:
Okay, it's a two-liner if you count the second function for [min,max) form, but close enough — you could merge them together anyways.
/* change to `float/fmodf` or `long double/fmodl` or `int/%` as appropriate */
/* wrap x -> [0,max) */
double wrapMax(double x, double max)
{
/* integer math: `(max + x % max) % max` */
return fmod(max + fmod(x, max), max);
}
/* wrap x -> [min,max) */
double wrapMinMax(double x, double min, double max)
{
return min + wrapMax(x - min, max - min);
}
Then you can simply use deltaPhase = wrapMinMax(deltaPhase, -M_PI, +M_PI).
The solution is constant-time, meaning that the time it takes does not depend on how far your value is from [-PI,+PI), for better or for worse.
Verification:
Now, I don't expect you to take my word for it, so here are some examples, including boundary conditions. I'm using integers for clarity, but it works much the same with fmod() and floats:
Positive x:
wrapMax(3, 5) == 3: (5 + 3 % 5) % 5 == (5 + 3) % 5 == 8 % 5 == 3
wrapMax(6, 5) == 1: (5 + 6 % 5) % 5 == (5 + 1) % 5 == 6 % 5 == 1
Negative x:
Note: These assume that integer modulo copies left-hand sign; if not, you get the above ("Positive") case.
wrapMax(-3, 5) == 2: (5 + (-3) % 5) % 5 == (5 - 3) % 5 == 2 % 5 == 2
wrapMax(-6, 5) == 4: (5 + (-6) % 5) % 5 == (5 - 1) % 5 == 4 % 5 == 4
Boundaries:
wrapMax(0, 5) == 0: (5 + 0 % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
wrapMax(5, 5) == 0: (5 + 5 % 5) % 5 == (5 + 0) % 5== 5 % 5 == 0
wrapMax(-5, 5) == 0: (5 + (-5) % 5) % 5 == (5 + 0) % 5 == 5 % 5 == 0
Note: Possibly -0 instead of +0 for floating-point.
The wrapMinMax function works much the same: wrapping x to [min,max) is the same as wrapping x - min to [0,max-min), and then (re-)adding min to the result.
I don't know what would happen with a negative max, but feel free to check that yourself!
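The integer form mentioned in the code comment can be written out so that the verification cases above can be run directly. A minimal sketch, assuming max > 0. The name wrapMaxInt is made up here.

```cpp
// Integer counterpart of wrapMax: wraps x into [0, max).
// Assumes max > 0. Since C++11, x % max takes the sign of x,
// so adding max before the second % makes the result non-negative.
int wrapMaxInt(int x, int max) {
    return (max + x % max) % max;
}
```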
If ever your input angle can reach arbitrarily high values, and if continuity matters, you can also try
atan2(sin(x),cos(x))
This will preserve continuity of sin(x) and cos(x) better than modulo for high values of x, especially in single precision (float).
Indeed, exact_value_of_pi - double_precision_approximation ~= 1.22e-16
On the other hand, most library/hardware use a high precision approximation of PI for applying the modulo when evaluating trigonometric functions (though x86 family is known to use a rather poor one).
Result might be in [-pi,pi], you'll have to check the exact bounds.
Personally, I would prevent any angle from reaching several revolutions by wrapping systematically, and stick to an fmod solution like the one from Boost.
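For reference, the atan2 approach fits in one function. This is a sketch only. The name wrapViaAtan2 is made up, and whether the endpoints -pi and +pi are both reachable depends on the math library, as noted above.

```cpp
#include <cmath>

// Wraps an angle in radians by going through sin/cos and back.
// The result lies in [-pi, pi]; exact behaviour at the endpoints is
// library-dependent.
double wrapViaAtan2(double x) {
    return std::atan2(std::sin(x), std::cos(x));
}
```

It costs three transcendental calls per wrap, so it is likely slower than the fmod-based versions.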
There is also the fmod function in math.h, but the sign of its result causes trouble, so a subsequent operation is needed to make the result fit in the proper range (like you already do with the while loops). For big values of deltaPhase this is probably faster than subtracting/adding M_TWOPI hundreds of times.
deltaPhase = fmod(deltaPhase, M_TWOPI);
EDIT:
I didn't try it intensively but I think you can use fmod this way by handling positive and negative values differently:
if (deltaPhase>0)
deltaPhase = fmod(deltaPhase+M_PI, 2.0*M_PI)-M_PI;
else
deltaPhase = fmod(deltaPhase-M_PI, 2.0*M_PI)+M_PI;
The computational time is constant (unlike the while solution which gets slower as the absolute value of deltaPhase increases)
I would do this:
double wrap(double x) {
return x-2*M_PI*floor(x/(2*M_PI)+0.5);
}
There will be significant numerical errors. The best solution to the numerical errors is to store your phase scaled by 1/PI or by 1/(2*PI) and depending on what you are doing store them as fixed point.
Instead of working in radians, use angles scaled by 1/(2π) and use modf, floor etc. Convert back to radians to use library functions.
This also means that rotating ten thousand and a half revolutions gives the same result as rotating half a revolution and then ten thousand revolutions. That is not guaranteed when your angles are in radians: in scaled units whole revolutions are represented exactly in floating point, whereas in radians you are summing approximate representations:
#include <iostream>
#include <cmath>
float wrap_rads ( float r )
{
while ( r > M_PI ) {
r -= 2 * M_PI;
}
while ( r <= -M_PI ) {
r += 2 * M_PI;
}
return r;
}
float wrap_grads ( float r )
{
float i;
r = modff ( r, &i );
if ( r > 0.5 ) r -= 1;
if ( r <= -0.5 ) r += 1;
return r;
}
int main ()
{
for (int rotations = 1; rotations < 100000; rotations *= 10 ) {
{
float pi = ( float ) M_PI;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in radians " << a << " => " << wrap_rads ( a ) / two_pi << '\n' ;
}
{
float pi = ( float ) 0.5;
float two_pi = 2 * pi;
float a = pi;
a += rotations * two_pi;
std::cout << rotations << " and a half rotations in grads " << a << " => " << wrap_grads ( a ) / two_pi << '\n' ;
}
std::cout << '\n';
}
}
Here is a version for other people finding this question that can use C++ with Boost:
#include <boost/math/constants/constants.hpp>
#include <boost/math/special_functions/sign.hpp>
template<typename T>
inline T normalizeRadiansPiToMinusPi(T rad)
{
// copy the sign of the value in radians to the value of pi
T signedPI = boost::math::copysign(boost::math::constants::pi<T>(),rad);
// set the value of rad to the appropriate signed value between pi and -pi
rad = fmod(rad+signedPI,(2*boost::math::constants::pi<T>())) - signedPI;
return rad;
}
C++11 version, no Boost dependency:
#include <cmath>
// Bring the 'difference' between two angles into [-pi; pi].
template <typename T>
T normalizeRadiansPiToMinusPi(T rad) {
// Copy the sign of the value in radians to the value of pi.
T signed_pi = std::copysign(M_PI,rad);
// Set the value of difference to the appropriate signed value between pi and -pi.
rad = std::fmod(rad + signed_pi,(2 * M_PI)) - signed_pi;
return rad;
}
I encountered this question when searching for how to wrap a floating point value (or a double) between two arbitrary numbers. It didn't answer specifically for my case, so I worked out my own solution which can be seen here. This will take a given value and wrap it between lowerBound and upperBound where upperBound perfectly meets lowerBound such that they are equivalent (ie: 360 degrees == 0 degrees so 360 would wrap to 0)
Hopefully this answer is helpful to others stumbling across this question looking for a more generic bounding solution.
double boundBetween(double val, double lowerBound, double upperBound){
if(lowerBound > upperBound){std::swap(lowerBound, upperBound);}
val-=lowerBound; //adjust to 0
double rangeSize = upperBound - lowerBound;
if(rangeSize == 0){return upperBound;} //avoid dividing by 0
return val - (rangeSize * std::floor(val/rangeSize)) + lowerBound;
}
A related question for integers is available here:
Clean, efficient algorithm for wrapping integers in C++
A two-liner, non-iterative, tested solution for normalizing arbitrary angles to [-π, π):
double normalizeAngle(double angle)
{
double a = fmod(angle + M_PI, 2 * M_PI);
return a >= 0 ? (a - M_PI) : (a + M_PI);
}
Similarly, for [0, 2π):
double normalizeAngle(double angle)
{
double a = fmod(angle, 2 * M_PI);
return a >= 0 ? a : (a + 2 * M_PI);
}
In the case where fmod() is implemented through truncated division and has the same sign as the dividend, it can be taken advantage of to solve the general problem thusly:
For the case of (-PI, PI]:
if (x > 0) x = x - 2PI * ceil(x/2PI) #Shift to the negative regime
return fmod(x - PI, 2PI) + PI
And for the case of [-PI, PI):
if (x < 0) x = x - 2PI * floor(x/2PI) #Shift to the positive regime
return fmod(x + PI, 2PI) - PI
[Note that this is pseudocode; my original was written in Tcl, and I didn't want to torture everyone with that. I needed the first case, so had to figure this out.]
deltaPhase -= floor(deltaPhase/M_TWOPI)*M_TWOPI;
The way you suggested is best. It is fastest for small deflections. If angles in your program are constantly being wrapped back into the proper range, then you should only rarely run into big out-of-range values. Therefore paying the cost of complicated modular arithmetic code every round seems wasteful. Comparisons are cheap compared to modular arithmetic (http://embeddedgurus.com/stack-overflow/2011/02/efficient-c-tip-13-use-the-modulus-operator-with-caution/).
In C99:
float unwindRadians( float radians )
{
const bool radiansNeedUnwinding = radians < -M_PI || M_PI <= radians;
if ( radiansNeedUnwinding )
{
if ( signbit( radians ) )
{
radians = -fmodf( -radians + M_PI, 2.f * M_PI ) + M_PI;
}
else
{
radians = fmodf( radians + M_PI, 2.f * M_PI ) - M_PI;
}
}
return radians;
}
If linking against glibc's libm (including newlib's implementation) you can access
__ieee754_rem_pio2f() and __ieee754_rem_pio2() private functions:
extern __int32_t __ieee754_rem_pio2f (float,float*);
float wrapToPI(float xf){
const float p[4]={0,M_PI_2,M_PI,-M_PI_2};
float yf[2];
int q;
int qmod4;
q=__ieee754_rem_pio2f(xf,yf);
/* xf = q * M_PI_2 + yf[0] + yf[1] /
* yf[1] << y[0], not sure if it could be ignored */
qmod4= ((q % 4) + 4) % 4; /* q may be negative for negative xf; keep the index into p[] in 0..3 */
if (qmod4==2)
/* (yf[0] > 0) defines interval (-pi,pi]*/
return ( (yf[0] > 0) ? -p[2] : p[2] ) + yf[0] + yf[1];
else
return p[qmod4] + yf[0] + yf[1];
}
Edit: Just realised that you need to link to libm.a, I couldn't find the symbols declared in libm.so
I have used (in python):
def WrapAngle(Wrapped, UnWrapped ):
TWOPI = math.pi * 2
TWOPIINV = 1.0 / TWOPI
return UnWrapped + round((Wrapped - UnWrapped) * TWOPIINV) * TWOPI
c-code equivalent:
#define TWOPI 6.28318531
double WrapAngle(const double dWrapped, const double dUnWrapped )
{
const double TWOPIINV = 1.0/ TWOPI;
return dUnWrapped + round((dWrapped - dUnWrapped) * TWOPIINV) * TWOPI;
}
notice that this brings it in the wrapped domain +/- 2pi so for +/- pi domain you need to handle that afterward like:
if( angle > pi):
angle -= 2*math.pi

How to stretch points?

Let's say I have 5 points, where p0 and p4 are fixed with values 0.0 and 4.0:
0 | 1.0 | 2.0 | 3.0 | 4
The points in the middle can change, but moving one must stretch the others.
So for a stretch "to the right", it must spread out the previous values up to the moving point and compress the next ones between the moving point and the last point, keeping the proportions between the points.
I've written this code, which moves the 3rd point to 2.5 from its original x-position of 2.0:
const int numPoints = 5;
double points[numPoints] = { 0.0, 1.0, 2.0, 3.0, 4.0 };
int stretchedPoint = 2;
double prevX = points[stretchedPoint];
points[stretchedPoint] = 2.5;
std::cout<< points[0];
for (int prevPoint = 1; prevPoint < numPoints - 1; prevPoint++) {
// prev points
if (prevPoint < stretchedPoint) {
double ratio = points[stretchedPoint] / prevX;
points[prevPoint] *= ratio;
// next points
} else if (prevPoint > stretchedPoint) {
double ratio = (points[numPoints - 1] - prevX) / (points[numPoints - 1] - points[stretchedPoint]);
points[prevPoint] *= ratio;
}
std::cout << " | " << points[prevPoint];
}
std::cout << " | " << points[numPoints - 1];
which gives me the right result for the previous points:
0 | 1.25 | 2.5 | 0.76 | 4
but when I try to apply the same math to the next points, I get non-proportional scaling, which gives weird results (4?)
Can anyone help me?
You forgot about non-zero starting point
points[prevPoint] = points[stretchedPoint] + ratio * (points[prevPoint] - prevX)
Note that the same logic should be applied to the previous points if the start value is non-zero.
In general, to linearly map an initial interval X0..X1 onto a final interval X0new..X1new, one has to use
(Xnew - X0new) / (X1new - X0new) = (X - X0) / (X1 - X0)
so
XNew = X0new + (X1new - X0new) * (X - X0) / (X1 - X0)
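Applied to the points array, the formula above could look roughly like this. It is a sketch: the name stretchPoints is made up, and it assumes the points are sorted, the moved index is an interior one, and newX stays strictly between the first and last point. Note that on the right-hand side the scale factor works out to (last - newX) / (last - oldX), which is the inverse of the ratio computed in the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: moves points[moved] to newX and rescales the other
// interior points so that proportions on each side are preserved.
// Assumes sorted points, 0 < moved < points.size() - 1, and
// points.front() < newX < points.back().
void stretchPoints(std::vector<double>& points, std::size_t moved, double newX) {
    const double first = points.front();
    const double last = points.back();
    const double oldX = points[moved];
    for (std::size_t i = 1; i + 1 < points.size(); ++i) {
        if (i < moved) {
            // Interval [first, oldX] is mapped onto [first, newX].
            points[i] = first + (newX - first) * (points[i] - first) / (oldX - first);
        } else if (i > moved) {
            // Interval [oldX, last] is mapped onto [newX, last].
            points[i] = newX + (last - newX) * (points[i] - oldX) / (last - oldX);
        }
    }
    points[moved] = newX;
}
```

For the example in the question (moving the point at 2.0 to 2.5), this should give 0 | 1.25 | 2.5 | 3.25 | 4.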
What you did on the left side of the point (and which is working) can be rewritten somehow like this:
// double ratio = (points[stretchedPoint] - 0) / (prevX - 0);
// points[prevPoint] = 0 + ratio * (points[prevPoint] - 0);
To achieve exactly the dual on the right side, it should be:
} else if (prevPoint > stretchedPoint) {
double ratio = (points[numPoints - 1] - points[stretchedPoint]) /
(points[numPoints - 1] - prevX);
points[prevPoint] = points[numPoints - 1] -
ratio * (points[numPoints-1] - points[prevPoint]);
}

How to calculate Gaussian-weighted Circular Window?

I have a matrix with a value in every field. The size is e.g. 15x15 (225 fields). Now I want to calculate the weight of every field based on its distance from the center field of the matrix: the bigger the distance, the less the pixel's value is weighted in the calculation. This should look like a circle around the center field. Here is an example image:
The small rectangle is the center field. The weighting should be a Gaussian-weighted circular window with a sigma of 1.5. How could I get this done? My thought was something like this, where every weight is stored in a matrix of the same size for the calculation afterwards.
expf = 1.f/(2.f * 1.5 * 1.5);
[...]
W[k] = (i*i + j*j) * expf;
where i and j are the distances from the centre pixel (e.g. for the first iteration i = -7, j = -7).
For me this solution seemed to be fine, but the values I get are always very small e.g:
W[0]: 3.48362e-10
W[1]: 6.26123e-09
W[2]: 7.21553e-08
W[3]: 5.3316e-07
W[4]: 2.52596e-06
W[5]: 7.67319e-06
W[6]: 1.49453e-05
[...]
W[40]: 0.000523195
W[41]: 0.000110432
W[42]: 1.49453e-05
W[43]: 1.29687e-06
W[44]: 7.21553e-08
W[45]: 5.3316e-07
W[46]: 9.58266e-06
W[47]: 0.000110432
W[48]: 0.000815988
[...]
W[85]: 0.055638
W[86]: 0.0117436
W[87]: 0.00158933
W[88]: 0.000137913
[...]
W[149]: 7.67319e-06
W[150]: 2.52596e-06
W[151]: 4.53999e-05
W[152]: 0.000523195
W[153]: 0.00386592
Could it be, that the calculation of the weights is wrong?
The PDF of a multivariate normal distribution is
(2π)^(-k/2) |Σ|^(-1/2) exp(-(1/2) (x - μ)^T Σ^(-1) (x - μ))
For your case, this translates to
double weight(int i, int j, double var) {
return 1 / (2 * M_PI) * std::exp(-0.5 * (i * i + j * j) / var / var);
}
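where i and j are offsets from the center pixel (centered at 0 and 0), and var is used as the standard deviation sigma: the exponent divides by var twice, i.e. by sigma^2. The constant factor 1 / (2 * M_PI) leaves out the 1/sigma^2 of the true PDF, but that factor cancels when you divide by weight(0, 0, var).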
Note:
This is the PDF. If you want the value to be 1 at the center, use weight(i, j, var) / weight(0, 0, var). Otherwise, you will indeed get small numbers.
The decay is specified by var - lower values will show larger decay.
The following code prints
$ g++ --std=c++11 gs.cpp && ./a.out
1
0.884706
1
4.78512e-06
for example
#include <cmath>
#include <iostream>
double weight(int i, int j, double var) {
return 1 / (2 * M_PI) * std::exp(-0.5 * (i * i + j * j) / var / var);
}
int main() {
{
const double f = weight(0, 0, 20);
std::cout << weight(0, 0, 20) / f << std::endl;
std::cout << weight(-7, -7, 20) / f << std::endl;
}
{
const double f = weight(0, 0, 2);
std::cout << weight(0, 0, 2) / f << std::endl;
std::cout << weight(-7, -7, 2) / f << std::endl;
}
}
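To fill a whole window as in the question (15x15, sigma 1.5), something along these lines might work. It is a sketch: the name gaussianWindow is made up, size is assumed to be odd, and the weights are scaled so the centre is 1 instead of using the PDF constant.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch: returns size*size weights in row-major order, with the centre
// weight equal to 1. Assumes size is odd; sigma is the standard deviation
// in pixels.
std::vector<double> gaussianWindow(int size, double sigma) {
    const int half = size / 2;
    std::vector<double> w;
    w.reserve(static_cast<std::size_t>(size) * size);
    for (int i = -half; i <= half; ++i) {
        for (int j = -half; j <= half; ++j) {
            // Unnormalized Gaussian of the squared distance to the centre.
            w.push_back(std::exp(-(i * i + j * j) / (2.0 * sigma * sigma)));
        }
    }
    return w;
}
```

With size 15 and sigma 1.5 the corner weight is on the order of 1e-10, so very small values far from the centre are expected with such a narrow sigma.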

Sum exceeding permissible value in looping floats

I recently created this simple program to find average velocity.
Average velocity = Δx / Δt
I chose x as a function of t as x = t^2
Therefore v = 2t
also, avg v = (x2 - x1) / (t2 - t1)
I chose the interval to be t = 1s to 4s. Implies x goes from 1 to 16
Therefore avg v = (16 - 1) / (4 - 1) = 5
Now the program :
#include <iostream>
using namespace std;
int main() {
float t = 1, v = 0, sum = 0, n = 0; // t = time, v = velocity, sum = Sigma v, n = Sigma 1
float avgv = 0;
while( t <= 4 ) {
v = 2*t;
sum += v;
t += 0.0001;
n++;
}
avgv = sum/n;
cout << "\n----> " << avgv << " <----\n";
return 0;
}
I used very small increments of time to calculate velocity at many moments. Now, if the increment of t is 0.001, The avg v calculated is 4.99998.
Now if I set the increment of t to 0.0001, the avg v becomes 5.00007!
Further decreasing increment to 0.00001 yields avg v = 5.00001
Why is that so?
Thank you.
In base 2 0.0001 and 0.001 are periodic numbers, so they don't have an exact representation. One of them is being rounded up, the other one is rounded down, so when you sum lots of them you get different values.
This is the same thing that happens in decimal representation, if you choose the numbers to sum accordingly (assume each variable can hold 3 decimal digits).
Compare:
a = 1 / 3; // a becomes 0.333
b = a * 6; // b becomes 1.998
with:
a = 2 / 3; // a becomes 0.667
b = a * 3; // b becomes 2.001
Both should (theoretically) result in 2, but because of rounding error they give different results.
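The two decimal computations can be imitated in code by rounding to three decimals after every step. This is only an illustration of the idea. The helper names are made up.

```cpp
#include <cmath>

// Rounds x to three decimal digits, imitating a variable that can hold
// only 3 decimals.
double round3(double x) {
    return std::round(x * 1000.0) / 1000.0;
}

// a = 1/3 is stored as 0.333, then multiplied by 6: about 1.998.
double sixTimesOneThird() {
    return round3(round3(1.0 / 3.0) * 6.0);
}

// a = 2/3 is stored as 0.667, then multiplied by 3: about 2.001.
double threeTimesTwoThirds() {
    return round3(round3(2.0 / 3.0) * 3.0);
}
```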
In the decimal system, 10 factors into the primes 2 and 5, so only fractions whose denominators have no prime factors other than 2 and 5 can be represented with a finite number of decimal digits (all other fractions are periodic). In base 2, only fractions whose denominator is a power of 2 can be represented exactly. Try using 1.0/512.0 and 1.0/1024.0 as steps in your loop. Also, be careful: if you choose a step that is too small, float may not have enough digits to represent it, so use double.
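To try that suggestion, the loop from the question can be turned into a function that takes the step as a parameter. This is a sketch: the name averageVelocity is made up, and the step is passed as a float, which differs slightly from the double literal used in the question.

```cpp
// The question's loop with the time step passed in.
// Samples v = 2t for t = 1, 1 + step, ... while t <= 4 and returns the mean.
float averageVelocity(float step) {
    float t = 1.0f, sum = 0.0f, n = 0.0f;
    while (t <= 4.0f) {
        sum += 2.0f * t;  // v = 2t
        t += step;
        n += 1.0f;
    }
    return sum / n;
}
```

With a power-of-two step such as 1.0f / 1024.0f, every t and every partial sum should be exactly representable in float, so the result should come out as 5 with no drift. With 0.0001f it will still be close to 5 but not exact.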