Basically I have two variables:
double halfWidth = Width / 2;
double halfHeight = Height / 2;
As they are being divided by 2, they will either be a whole number or a decimal. How can I check whether they are a whole number or a .5?
You can use modf, this should be sufficient:
double intpart;
if( modf( halfWidth, &intpart) == 0 )
{
// your code here
}
First, you need to make sure that you're using double-precision floating-point math:
double halfWidth = Width / 2.0;
double halfHeight = Height / 2.0;
Because one of the operands is a double (namely, 2.0), this will force the compiler to convert Width and Height to doubles before doing the math (assuming they're not already doubles). Once converted, the division will be done in double-precision floating-point. So it will have a decimal, where appropriate.
The next step is to simply check it with modf.
double temp;
if(modf(halfWidth, &temp) != 0)
{
//Has fractional part.
}
else
{
//No fractional part.
}
You may discard a fractional part and compare the result with the original value using floor():
if (floor(halfWidth) == halfWidth) {
// halfWidth is a whole number
} else {
// halfWidth has a non-zero fractional part
}
As rightly pointed out by #Dávid Laczkó, it's a better solution than modf() because there's no need for an additional variable.
And according to my benchmarks (Linux, gcc 8.3.0, optimizations -O0...-O3), the floor() call consumes less CPU time than modf() on the modern notebook and server processors. The difference even growing with compiler optimizations enabled. Probably it's because the modf() has two arguments when the floor() has only one argument.
Related
What is the best way to constrain any value from -pi to pi ?
I currently have:
if (fAngle > XM_PI) {
fAngle = fAngle - XM_2PI;
}
else if (fAngle < -XM_PI) {
fAngle = fAngle - -XM_2PI;
}
However, I fear those if's should instead be while's
For reference, under the Exploit Symmetrical Functions section:
https://developer.arm.com/solutions/graphics-and-gaming/developer-guides/learn-the-basics/understanding-numerical-precision/mitigating-loss-of-precision
Extra bit of precision!
Adding or subtracting XM_2PI cannot restore any accuracy that has been lost. In fact, it adds noise, generally losing more accuracy, because XM_2PI is necessarily only an approximation of 2π. It has some error itself, so adding or subtracting it adds or subtracts the error in the approximation.
What it can do is keep you from losing more accuracy by ensuring that future results remain low in magnitude, thus remaining in a region where the floating-point format has more precision than if the number grew beyond 4, 8, 16, or other points where the exponent changes and the absolute precision becomes worse.
If you already have some value x outside [−π, π] and want its sine or cosine, you should get the best result by using sin(x) or cos(x) directly. Good implementations of sin and cos will reduce the argument using a high-precision value for 2π, so you will get a better result than using sin(x-XM_PI) or cos(x-XM_PI) (unless, by chance, the various errors in these happen to cancel).
So your task with trigonometric functions is not to reduce values you already have but to design your algorithms to keep values from growing. Adding or subtracting 2π is a reasonable way to do this. However, when you do it, add or subtract an extended-precision version of 2π, not just XM_2PI. You can do this by representing 2π as XM_2PI (which should be the value representable in floating-point that is closest to 2π) plus some residue r. r should be the value representable in floating-point that is closest to 2π−XM_2PI. You can calculate that with extended-precision software such as GMP or Maple and can likely find it online. (I do not have it handy or I would paste it here; anybody else is welcome to edit it in.) Then you would update your angle with fAngle = fAngle - XM_2PI - r; or fAngle = fAngle + XM_2PI + r;.
An exception is if you have the angle measured in some unit that you can represent or reduce exactly, such as in degrees (which you can reduce by 360º with no error as long as the number of degrees itself is represented with no error) or in time (such as number of seconds for some function with a period of a day or other rational number of seconds, so you can again reduce with no error). In that case, you can let the angle grow as long as you can represent it exactly, and you would reduce it modulo the period prior to converting it to radians.
The simplest coding way is to use the math library function remainder, as in
fAngle = remainder( fangle, XM_2PI);
STATIC_INLINE_PURE float const __vectorcall constrain(float const fAngle)
{
static constexpr double const
dPI(std::numbers::pi),
d2PI(2.0 * std::numbers::pi),
dResidue(-1.74845553146951715461909770965576171875e-07); // abs difference between d2PI(double precision) and XM_2PI(float precision)
double dAngle(fAngle);
dAngle = std::remainder(dAngle, d2PI);
if (dAngle > dPI) {
dAngle = dAngle - d2PI - dResidue;
}
else if (dAngle < -dPI) {
dAngle = dAngle + d2PI + dResidue;
}
return((float)dAngle);
}
I've written a simple program to calculate the first and second derivative of a function, using function pointers. My program computes the correct answers (more or less), but for some functions, the accuracy is less than I would like.
This is the function I am differentiating:
float f1(float x) {
return (x * x);
}
These are the derivative functions, using the central finite difference method:
// Function for calculating the first derivative.
float first_dx(float (*fx)(float), float x) {
float h = 0.001;
float dfdx;
dfdx = (fx(x + h) - fx(x - h)) / (2 * h);
return dfdx;
}
// Function for calculating the second derivative.
float second_dx(float (*fx)(float), float x) {
float h = 0.001;
float d2fdx2;
d2fdx2 = (fx(x - h) - 2 * fx(x) + fx(x + h)) / (h * h);
return d2fdx2;
}
Main function:
int main() {
pc.baud(9600);
float x = 2.0;
pc.printf("**** Function Pointers ****\r\n");
pc.printf("Value of f(%f): %f\r\n", x, f1(x));
pc.printf("First derivative: %f\r\n", first_dx(f1, x));
pc.printf("Second derivative: %f\r\n\r\n", second_dx(f1, x));
}
This is the output from the program:
**** Function Pointers ****
Value of f(2.000000): 4.000000
First derivative: 3.999948
Second derivative: 1.430511
I'm happy with the accuracy of the first derivative, but I believe the second derivative is too far off (it should be equal to ~2.0).
I have a basic understanding of how floating point numbers are represented and why they are sometimes inaccurate, but how can I make this second derivative result more accurate? Could I be using something better than the central finite difference method, or is there a way I can get better results with the current method?
The accuracy can be increased by choosing a type which has more precision. float is currently defined as an IEEE-754 32-bit number, giving you a precision of ~7.225 decimal places.
What you want is the 64-bit counterpart: double with ~15.955 decimal places accuracy.
That should be sufficient for your calculation, however worth mentioning is boosts implementation which offers a quadruple-precision floating point number (128-bit).
Finally The GNU Multiple Precision Arithmetic Library offers types with an arbitrary number of decimal places for precision.
Go analytical. ;-) probably not an option given "with the current
method".
Use double instead of float.
Vary the epsilon (h), and combine the results in some way. For example you could try 0.00001, 0.000001, 0.0000001 and average them. In fact, you'd want the result with the smallest h that doesn't overflow/underflow. But it's not clear how to detect overflow and underflow.
I have two polygons with their vertices stored as Double coordinates. I'd like to find the intersecting area of these polygons, so I'm looking at the Clipper library (C++ version). The problem is, Clipper only works with integer math (it uses the Long type).
Is there a way I can safely transform both my polygons with the same scale factor, convert their coordinates to Longs, perform the Intersection algorithm with Clipper, and scale the resulting intersection polygon back down with the same factor, and convert it back to a Double without too much loss of precision?
I can't quite get my head around how to do that.
You can use a simple multiplier to convert between the two:
/* Using power-of-two because it is exactly representable and makes
the scaling operation (not the rounding!) lossless. The value 1024
preserves roughly three decimal digits. */
double const scale = 1024.0;
// representable range
double const min_value = std::numeric_limits<long>::min() / scale;
double const max_value = std::numeric_limits<long>::max() / scale;
long
to_long(double v)
{
if(v < 0)
{
if(v < min_value)
throw out_of_range();
return static_cast<long>(v * scale - 0.5);
}
else
{
if(v > max_value)
throw out_of_range();
return static_cast<long>(v * scale + 0.5);
}
}
Note that the larger you make the scale, the higher your precision will be, but it also lowers the range. Effectively, this converts a floating-point number into a fixed-point number.
Lastly, you should be able to locate code to compute intersections between line segments using floating-point math easily, so I wonder why you want to use exactly Clipper.
I'm making a function that takes in 3 unsigned long longs, and applies the law of cosines to find out if the triangle is obtuse, acute or a right triangle. Should I just cast the variables to doubles before I use them?
void triar( unsigned long long& r,
unsigned long long x,
unsigned long long y,
unsigned long long z )
{
if(x==0 || y==0 || z==0) die("invalid triangle sides");
double t=(x*x + y*y -z*z)/(2*x*y);
t=acos (t) * (180.0 / 3.14159265);
if(t > 90) {
cout<<"Obtuse Triangle"<<endl;
r=t;
} else if(t < 90){
cout<<"Acute Triangle"<<endl;
r=t;
} else if(t == 90){
cout<<"Right Traingle"<<endl;
r=t;
}
}
There is generally no reason why you could not cast if you need floating point arithmetics. However, there is also an implicit conversion from unsigned long to double, so you can also often do completely without casting.
In many cases, including yours, you can cast only one of the arguments to force double arithmetics on a particular operation only. For example,
double t = (double)(x*x + y*y - z*z) / (2*x*y)
This way, all operations except for the division are computed in integer arithmetics and are therefore slighly faster. The cast is still necessary to avoid truncation during division.
Your code contains a comparison of floating point arguments. Floating point arithmetics however almost inevitably reduces accuracy. Avoid limited accuracy, or analyze and control accuracy.
Prefer an integer only solution as described in an excellent sister answer if you have a wide enough integral type at your disposal
Always avoid conversion from radians to degrees except for presentation to humans
Take the value of π from your mathematical library header files (unfortunately, this is platform dependent - try _USE_MATH_DEFINES + M_PI or, if already using boost libraries, boost::math::constants::pi<double>()), or express it analytically. For example, std::atan(1)*2 is the right angle.
If you choose double precision, and the ultimate difference value is less than, say, std::numeric_limits<double>::min() * 8, you can probably not tell anything about the triangle and the classification you return is basically bogus. (I made up the value of 8, you will possibly lose way more bits than three.)
You have a problem with obtuse triangles, x*x + y*y - z*z would mathematically give a negative result, that is then reduced modulo 2^WIDTH (where WIDTH is the number of value bits in unsigned long long, at least 64 and probably exactly that) yielding a - probably large - positive value (or in rare cases 0). Then the computed result of t = (x*x + y*y - z*z)/(2*x*y) can be larger than 1, and acos(t) would return a NaN.
The correct way to find out whether the triangle is obtuse/acute/right-angled with the given argument type is to check whether x*x + y*y < /* > / == */ z*z - if you can be sure the mathematical results don't exceed the unsigned long long range.
If you can't be sure of that, you can either convert the variables to double before the computation,
double xd = x, yd = y, zd = z;
double t = (xd*xd + yd*yd - zd*zd)/(2*xd*yd);
with possible loss of precision and incorrect results for nearly right-angled triangles (e.g. for the slightly obtuse triangle x = 2^29, y = 2^56-1, z = 2^56+2, both y and z would be converted to 2^56 with standard 64-bit doubles, xd*xd + yd*yd = 2^58 + 2^112 would be evaluated to 2^112, subtracting zd*zd then results in 0).
Or you can compare x*x + y*y to z*z - or x*x to z*z - y*y - using only integer arithmetic. If x*x is representable as an unsigned long long (I assume that 0 < x <= y <= z), it's relatively easy, first check whether (z - y)*(z + y) would exceed ULLONG_MAX, if yes, the triangle is obtuse, otherwise calculate and compare. If x*x is not representable, it becomes complicated, I think the easiest way (except for using a big integer library, of course) would be to compute the high and if necessary low 64 (or whatever width unsigned long long has) bits separately by splitting the numbers at half the width and compare those.
Further note: Your value for π, 3.14159265 is too inaccurate, right-angled triangles will be reported as obtuse.
Here is my problem, I have several parameters that I need to increment by 0.1.
But my UI only renders x.x , x.xx, x.xxx for floats so since 0.1f is not really 0.1 but something like 0.10000000149011612 on the long run my ui will render -0.00 and that doesn't make much sense. How to prevent that for all the possible cases of UI.
Thank you.
Use integers and divide by 10 (or 1000 etc...) just before displaying. Your parameters will store an integer number of tenths, and you'll increment them by 1 tenth.
If you know that your floating point value will always be a multiple of 0.1, you can round it after every increment to make sure it maintains a sensible value. It still won't be exact (because it physically can't be), but at least the errors won't accumulate and it will display properly.
Instead of:
x += delta;
Do:
x = floor((x + delta) / precision + 0.5) * precision;
Edit: It's useful to turn the rounding into a stand-alone function and decouple it from the increment:
inline double round(double value, double precision = 1.0)
{
return floor(value / precision + 0.5) * precision;
}
x = round(x + 0.1, 0.1);