Subtle differences in output values (+/-) between float and doubles


This is a follow-up to an older question, Chaining Function Calls, where user Mooing Duck provided an answer that works through the use of a proxy class and proxy functions. I have managed to template this class and it appears to be working. However, I'm getting different results between float and double...
Here is the non-templated version of the classes and application, shown for float:
For the double version, just replace every float with double within the classes, functions, and proxy functions... The main program doesn't change except for the arguments.
#include <cmath>
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE
#include <exception>
#include <iostream>
#include <type_traits> // std::decay_t
#include <utility>
namespace pipes {
const double PI = 4 * atan(1);
struct vec2 {
float x;
float y;
};
std::ostream& operator<<(std::ostream& out, vec2 v2) {
return out << v2.x << ',' << v2.y;
}
vec2 translate(vec2 in, float a) {
return vec2{ in.x + a, in.y + a };
}
vec2 rotate(vec2 in, float a) {
// convert a in degrees to radians:
a *= (float)(PI / 180.0);
return vec2{ in.x*cos(a) - in.y*sin(a),
in.x*sin(a) + in.y*cos(a) };
}
vec2 scale(vec2 in, float a) {
return vec2{ in.x*a, in.y*a };
}
// proxy class
template<class rhst, vec2(*f)(vec2, rhst)>
class vec2_op1 {
std::decay_t<rhst> rhs; // store the parameter until the call
public:
vec2_op1(rhst rhs_) : rhs(std::forward<rhst>(rhs_)) {}
vec2 operator()(vec2 lhs) { return f(lhs, std::forward<rhst>(rhs)); }
};
// proxy methods
vec2_op1<float, translate> translate(float a) { return { a }; }
vec2_op1<float, rotate> rotate(float a) { return { a }; }
vec2_op1<float, scale> scale(float a) { return { a }; }
// lhs is the object, rhs is the operation on the object
template<class rhst, vec2(*f)(vec2, rhst)>
vec2& operator|(vec2& lhs, vec2_op1<rhst, f>&& op) { return lhs = op(lhs); }
} // namespace pipes
int main() {
try {
pipes::vec2 a{ 1.0, 0.0 };
pipes::vec2 b = (a | pipes::rotate(90.0));
std::cout << b << '\n';
} catch (const std::exception& e) {
std::cerr << e.what() << "\n\n";
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Output for float:
-4.37114e-08,1
Output for double:
6.12323e-17,1
Here is the templated version...
#include <cmath>
#include <cstdlib> // EXIT_SUCCESS, EXIT_FAILURE
#include <exception>
#include <iostream>
#include <type_traits> // std::decay_t
#include <utility>
namespace pipes {
const double PI = 4 * atan(1);
template<typename Ty>
struct vec2_t {
Ty x;
Ty y;
};
template<typename Ty>
std::ostream& operator<<(std::ostream& out, vec2_t<Ty> v2) {
return out << v2.x << ',' << v2.y;
}
template<typename Ty>
vec2_t<Ty> translate(vec2_t<Ty> in, Ty a) {
return vec2_t<Ty>{ in.x + a, in.y + a };
}
template<typename Ty>
vec2_t<Ty> rotate(vec2_t<Ty> in, Ty a) {
// convert a in degrees to radians:
a *= (Ty)(PI / 180.0);
return vec2_t<Ty>{ in.x*cos(a) - in.y*sin(a),
in.x*sin(a) + in.y*cos(a) };
}
template<typename Ty>
vec2_t<Ty> scale(vec2_t<Ty> in, Ty a) {
return vec2_t<Ty>{ in.x*a, in.y*a };
}
// proxy class
template<class rhst, typename Ty, vec2_t<Ty>(*f)(vec2_t<Ty>, rhst)>
class vec2_op1 {
std::decay_t<rhst> rhs; // store the parameter until the call
public:
vec2_op1(rhst rhs_) : rhs(std::forward<rhst>(rhs_)) {}
vec2_t<Ty> operator()(vec2_t<Ty> lhs) { return f(lhs, std::forward<rhst>(rhs)); }
};
// proxy methods
template<typename Ty>
vec2_op1<Ty, Ty, translate<Ty>> translate(Ty a) { return { a }; }
template<typename Ty>
vec2_op1<Ty, Ty, rotate<Ty>> rotate(Ty a) { return { a }; }
template<typename Ty>
vec2_op1<Ty, Ty, scale<Ty>> scale(Ty a) { return { a }; }
// overloaded | operator for chaining function calls to vec2_t objects
// lhs is the object, rhs is the operation on the object
template<class rhst, typename Ty, vec2_t<Ty>(*f)(vec2_t<Ty>, rhst)>
vec2_t<Ty>& operator|(vec2_t<Ty>& lhs, vec2_op1<rhst, Ty, f>&& op) { return lhs = op(lhs); }
} // namespace pipes
// for double just instantiate with double...
int main() {
try {
pipes::vec2_t<float> a{ 1.0f, 0.0f };
pipes::vec2_t<float> b = (a | pipes::rotate(90.0f));
std::cout << b << '\n';
} catch (const std::exception& e) {
std::cerr << e.what() << "\n\n";
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
The output for floats:
-4.37114e-08,1
The output for doubles:
6.12323e-17,1
This goes to show that the conversion of my class to a class template appears to be working. I understand that a bit of precision may be lost when narrowing from double to float or widening from float to double in a cast; however, I can't wrap my mind around why the output values differ so much from one type to the other...
The rotation of the point or vector {1,0} by 90 degrees, or PI/2 radians, should be {0,1}. I understand how floating-point arithmetic works and that the generated x values are close enough to 0 that they should be considered 0 for all intents and purposes. I can add an epsilon-checking function that tests whether a value is close enough to 0 and sets it directly to 0, so that is not an issue...
What intrigues me is why it is -4.3...e-8 for float and +6.1...e-17 for double. In the float case I'm getting a negative value, and in the double case a positive one. In both cases they are extremely small and close to 0, which is fine, but the opposite signs have me scratching my head.
I'm seeking a better insight into why these values are generated the way they are... Is it coming from the type conversion, from the trig functions being used, or from a combination of both? I'm just trying to pinpoint where the divergence of signs comes from...
I need to be aware of what is causing this subtle difference, as it affects how I use this class and its outputs when precision is preferred over good-enough estimates.
Edit
While instantiating these function templates, I started to test the <int> type for my vector objects and got some compiler errors... The translate and scale functions were fine; only the rotate function had a problem, for similar reasons: loss of data, narrowing and widening conversions, etc...
I had to change my rotate function's implementation to this:
template<typename Ty>
vec2_t<Ty> rotate(vec2_t<Ty> in, Ty a) {
// convert a in degrees to radians:
auto angle = (double)(a * (PI / 180.0));
return vec2_t<Ty>{ static_cast<Ty>( in.x*cos(angle) - in.y*sin(angle) ),
static_cast<Ty>( in.x*sin(angle) + in.y*cos(angle) )
};
}
Here I'm forcing the angle to always be a double regardless of the type Ty. The rotate function still expects its argument to have the same type as the vec2_t object being instantiated. The issue was with the initialization of the vec2_t object that is created and returned from the calculations: I had to explicitly static_cast the x and y coordinates to Ty. Now when I try the same program as above for vec2_t<int>, passing in a rotation value of 90, I get exactly 0,1 for my output.
Another interesting fact: by forcing the angle to always be double and always casting the calculated values back to Ty, I always get the positive 6.123...e-17 result back whether I instantiate my vec2_t with double or with float... This should also allow me to simplify the design of the is_zero() function that tests whether these values are close enough to 0 to set them explicitly to 0.
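The is_zero() function is mentioned in the question but never shown. A minimal sketch of what it might look like follows; the default tolerance (4 times the machine epsilon of the type) and the companion snap_to_zero() are assumptions made for illustration, not part of the original code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper (not from the original post): true when |value| lies
// within a small absolute tolerance of zero. The factor 4 is an assumption
// and should be tuned to the magnitudes the program works with.
template <typename T>
bool is_zero(T value, T tolerance = 4 * std::numeric_limits<T>::epsilon()) {
    return std::fabs(value) <= tolerance;
}

// Hypothetical companion: replace near-zero values by an exact zero and
// leave everything else untouched.
template <typename T>
T snap_to_zero(T value) {
    return is_zero(value) ? T(0) : value;
}
```

With these defaults both residues from the question (-4.37114e-08f and 6.12323e-17) count as zero for their respective types. An absolute tolerance like this only makes sense for values of roughly unit magnitude, which is the case for a rotated unit vector.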

TL;DR: Small numbers are close to zero whatever their sign. The numbers you got are "almost zero" given the circumstances.
I'd call this "sign obsession". Two very small numbers are similar even if their signs differ. Here you're looking at numbers at the edge of accuracy of the computations you performed. They are both equally "small", given their types. Other answer(s) give hints about where exactly is the clbuttic mistake :)

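Your problem is in the line:
a *= (Ty)(PI / 180.0);
For the float case, this evaluates to 1.570796371, which is slightly greater than PI/2 (about 1.5707963268), so cos(a) comes out slightly negative.
For the double case, this evaluates to 1.570796327, a double that lies slightly below the true value of PI/2, so cos(a) comes out slightly positive.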
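To see the effect in isolation, here is a small sketch that repeats the question's degree-to-radian conversion for an arbitrary floating-point type and takes the cosine of the result. The function names to_radians and cos_of_90_degrees are made up for this illustration.

```cpp
#include <cmath>

// Same conversion as in the question's rotate(), parameterised on the
// floating-point type. PI is computed the same way as in the question.
template <typename T>
T to_radians(T degrees) {
    const double PI = 4 * std::atan(1.0);
    degrees *= static_cast<T>(PI / 180.0);
    return degrees;
}

// Cosine of "90 degrees" as each type actually represents that angle.
template <typename T>
T cos_of_90_degrees() {
    return std::cos(to_radians(T(90)));
}
```

cos_of_90_degrees<float>() is negative because the float angle overshoots PI/2, while cos_of_90_degrees<double>() is positive because the double angle undershoots it. Neither type can represent PI/2 exactly, so the sign of the residue only tells you on which side of PI/2 the rounded angle landed.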

Related

How to define C++ struct for configuration?

I have been trying to solve the following problem in C++. I would like to define a struct containing the configuration parameters for some software module. The configuration parameters are basically floating-point values, and they are of two types:
parameters which are independent, i.e. their values are given directly by some floating-point numbers
parameters which are dependent, i.e. their values are given by expressions whose operands are the independent parameters
Here is an example
struct Configuration {
float param_independent_01;
float param_independent_02;
float param_dependent_01; // param_independent_01 + param_independent_02
float param_dependent_02; // 1.5f*param_independent_01/(param_independent_01 + param_independent_02)
};
I have been looking for a solution that lets the client code set values only for the independent parameters, with the dependent parameter values calculated automatically behind the scenes.
Configuration config = {
param_independent_01 = 0.236f,
param_independent_02 = 0.728f
// param_dependent_01 = 0.236f + 0.728f
// param_dependent_02 = 1.5f*0.236f/(0.236f + 0.728f)
};
I suppose that the Configuration structure will be instantiated only once and that the values of the parameters are known at compile time. Can anybody give me advice on how to do that in C++?

One approach to achieve this behavior is to make use of C++'s constructor initialization list.
struct Configuration {
float param_independent_01;
float param_independent_02;
float param_dependent_01;
float param_dependent_02;
Configuration(float p1, float p2) :
param_independent_01(p1),
param_independent_02(p2),
param_dependent_01(p1 + p2),
param_dependent_02(1.5f * p1 / (p1 + p2))
{}
};
int main() {
Configuration config(0.236f, 0.728f);
return 0;
}

Or just inline constexpr variables in a namespace (can be put in a header).
This allows you to write some constexpr (consteval) functions to calculate the values too. (Not everything needs to be a class or a struct)
// header file
#pragma once
namespace configuration
{
inline constexpr float get_param_dependent_02(const float p1, const float p2)
{
return (1.5f * p1) / (p1+p2);
}
inline constexpr float param_independent_01{ 0.236f };
inline constexpr float param_independent_02{ 0.728f };
inline constexpr float param_dependent_01 = param_independent_01 + param_independent_02; // direct
inline constexpr float param_dependent_02 = get_param_dependent_02(param_independent_01, param_independent_02); // or through constexpr/consteval function
} // namespace configuration
int main()
{
float f = configuration::param_dependent_02;
}

If you know the configuration is not going to change at runtime, you can implement a constexpr constructor for Configuration, and then define a constexpr Configuration variable. The construction will be done at compile time (see the generated assembler code for the godbolt link below).
If you want to make sure the configuration is not going to change at runtime, I would change Configuration into a class with private members, and just provide accessors for those members.
Notice also that the constructor divides by p1 + p2, which may be zero. A floating-point division by zero does not throw a C++ exception, so a try-catch will not intercept it: at run time it typically produces infinity or NaN on IEEE 754 platforms, and inside a constant expression it is a compile error. If you want to take control of that situation, check the divisor explicitly before setting dependent parameter 2.
[Demo]
#include <fmt/format.h>
#include <iostream>
class Configuration {
float param_independent_01;
float param_independent_02;
float param_dependent_01;
float param_dependent_02;
public:
constexpr Configuration(float p1, float p2)
: param_independent_01{p1}
, param_independent_02{p2}
, param_dependent_01{p1 + p2}
, param_dependent_02{(p1 * 1.5f)/param_dependent_01}
{}
// accessors are const so they can be called on a constexpr (const) object
auto get_pi1() const { return param_independent_01; }
auto get_pi2() const { return param_independent_02; }
auto get_pd1() const { return param_dependent_01; }
auto get_pd2() const { return param_dependent_02; }
friend std::ostream& operator<<(std::ostream& os, const Configuration& c) {
return os << fmt::format("pi1: {}\npi2: {}\npd1: {}\npd2: {}\n",
c.param_independent_01, c.param_independent_02,
c.param_dependent_01, c.param_dependent_02);
}
};
int main() {
constexpr Configuration c{3.14, 9.8};
std::cout << c;
}
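One way to take control of a zero divisor is an explicit check in the constructor. The sketch below is an assumption about how that could be done, not the answer's own code: it shortens the member names, drops the fmt dependency, and throws std::invalid_argument when p1 + p2 is zero.

```cpp
#include <stdexcept>

// Hypothetical variant of the Configuration class above with a guarded divisor.
class GuardedConfiguration {
    float pi1_;
    float pi2_;
    float pd1_;
    float pd2_;
public:
    constexpr GuardedConfiguration(float p1, float p2)
        : pi1_{p1}
        , pi2_{p2}
        , pd1_{p1 + p2}
        // The throw branch is only evaluated when the divisor is zero.
        , pd2_{pd1_ != 0.0f
                   ? (1.5f * p1) / pd1_
                   : throw std::invalid_argument("p1 + p2 must not be zero")}
    {}
    constexpr float pd1() const { return pd1_; }
    constexpr float pd2() const { return pd2_; }
};
```

When the object is declared constexpr, reaching the throw branch makes the initialization a non-constant expression, so a zero divisor is rejected at compile time. For a run-time object the same constructor throws, and that exception can be caught in the usual way.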

No need for a class with a custom constructor, just do this:
struct Configuration
{
float param_independent_01 = 0; // Always initialize all class members.
float param_independent_02 = 0;
float param_dependent_01() const {return param_independent_01 + param_independent_02;}
float param_dependent_02() const {return 1.5f*param_independent_01/(param_independent_01 + param_independent_02);}
};
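A short sketch of how client code might use this member-function approach. Marking the member functions constexpr is an addition made here (an assumption, so that the struct also works for compile-time configuration), and the variable name config is illustrative.

```cpp
// Same idea as the struct above, with constexpr added to the member functions.
struct Configuration {
    float param_independent_01 = 0;
    float param_independent_02 = 0;
    constexpr float param_dependent_01() const {
        return param_independent_01 + param_independent_02;
    }
    constexpr float param_dependent_02() const {
        return 1.5f * param_independent_01 / param_dependent_01();
    }
};

// The client writes only the independent values. Aggregate initialization
// of a struct with default member initializers requires C++14 or later.
constexpr Configuration config{0.236f, 0.728f};
```

The dependent values are then read as config.param_dependent_01() and config.param_dependent_02(). Because they are recomputed on every call, they cannot get out of sync with the independent members, at the cost of a function-call syntax.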

templating a primitive in C++

I have some code with fairly complicated logic that passes around angles in both radians and degrees. All of the variables are doubles. It would be helpful to add some additional guards to prevent passing a value in radians to a function that requires the value in degrees. The code below uses a struct and does work, but requires .value to get the actual double back. Is it possible to template a primitive without using a struct? Is there a better way of doing this? I'm currently working in C++17.
enum class AngleType
{
Degree,
Radian
};
template <AngleType T>
struct Angle
{
double value;
};
void example_function(Angle<AngleType::Radian> angle_radians) { };

A somewhat common way to do this is provide a conversion operator. Example:
#include <iostream>
#include <math.h>
template<int N, typename T>
struct AngleType
{
T value;
AngleType(T val) : value(val) {}
operator T() const noexcept
{
return value;
}
};
using AngleRadians = AngleType<0, double>;
using AngleDegrees = AngleType<1, double>;
void example_func(AngleRadians angle) {
std::cout << "angle in radians = " << angle << "\n";
}
int main(int argc, char **argv)
{
AngleRadians rad = M_PI;
AngleDegrees deg = 180;
example_func(rad);
example_func(deg); // <-- compiler error
}
It has its drawbacks, but it may be good enough for what you're trying to do.
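One such drawback is that the implicit conversions let a value travel from degrees to radians through a plain double, for example via static_cast<double>(deg). A stricter variant, sketched here under assumptions, makes both the constructor and the conversion operator explicit and routes unit changes through named functions. The tag types, the aliases Radians and Degrees, kPi, and the conversion functions are all hypothetical names.

```cpp
// Tag types that distinguish the units (hypothetical names).
struct RadianTag {};
struct DegreeTag {};

template <typename Tag>
struct Angle {
    double value;
    // explicit: a bare double cannot silently become an angle
    constexpr explicit Angle(double v) : value(v) {}
    // explicit: getting the raw double back requires a visible cast
    constexpr explicit operator double() const { return value; }
};

using Radians = Angle<RadianTag>;
using Degrees = Angle<DegreeTag>;

constexpr double kPi = 3.141592653589793;

// Named functions are the intended way to cross between the two units.
constexpr Radians to_radians(Degrees d) { return Radians{d.value * kPi / 180.0}; }
constexpr Degrees to_degrees(Radians r) { return Degrees{r.value * 180.0 / kPi}; }

// Accepts radians only; passing Degrees or a plain double does not compile.
constexpr double example_function(Radians angle) { return angle.value; }
```

A call then reads example_function(to_radians(Degrees{180.0})). The trade-off is more typing at each call site in exchange for unit errors being caught by the compiler.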

Depending on what you really need, you could actually get rid of Angle and AngleType altogether with user-defined literals.
Before starting, you need to decide which base unit you want to use. For my example, I will use the radian as the base unit.
The idea here is that every time you write a number in degrees, it is automatically converted into radians.
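// PI is not defined in the original snippet; this definition is an assumption
constexpr long double PI = 3.141592653589793238462643383279502884L;
// User-defined literal
constexpr auto operator"" _deg (long double deg)
{
return deg * PI / 180;
}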
constexpr auto operator"" _deg (unsigned long long int deg)
{
return 1.0_deg * deg;
}
After defining these two, if you want to write a number in degrees, you can simply use:
auto a = 90.0_deg;
And it would be equivalent to:
long double a = ((long double)90.0 * PI / 180);
To make it more consistent, you can also define a literal for _rad, and just use:
constexpr auto operator"" _rad (long double rad)
{
return rad;
}
constexpr auto operator"" _rad (unsigned long long int rad)
{
return 1.0_rad * rad;
}
Now every time you assign a number to something, you would do:
auto a = 3.14_rad, b = 180_deg;
However, do note that you cannot apply literal suffixes to variables, so you can't write things like PI_rad. But since we already settled on the radian as the base unit, all variables are stored in radians anyway.
Also note that the parameters of those functions are long double and unsigned long long int, as required by the standard.

How to compare positions in sfml?

I'm new to SFML and I'm making a simple game. I need to compare two positions and I can't find out how to do it.
How can I do it? I thought that I could do it somehow like this:
if (somesprite.getPosition() < (some x,some y)) { some code}
So I just need to find out how to compare two positions.
Thank you in advance for answers that will get me closer to finding the right way to do it.
- Torsmel

getPosition() returns a sf::Vector2<T> which has overloads for subtraction.
Subtract one sf::Vector2<T> from another and the length of the resulting sf::Vector2<T> will be the distance between the positions.
#include <SFML/System/Vector2.hpp>
#include <cmath>
template<typename T>
T Vector2length(const sf::Vector2<T>& v) {
return std::sqrt(v.x * v.x + v.y * v.y);
}
void some_func() {
auto spos = somesprite.getPosition();
decltype(spos) xy(some_x, some_y);
auto distance_vec = spos - xy;
if( Vector2length(distance_vec) < max_distance ) {
// do stuff
}
}
Since sf::Vector2<T> is lacking length() and other common functions usually associated with Cartesian vectors, an alternative to the above is to inherit sf::Vector2<T> and extend it:
#include <SFML/System/Vector2.hpp>
#include <cmath>
// extending sf::Vector2<T> with functions that could be nice to have
template<typename T>
struct Vector2_ext : sf::Vector2<T> {
using sf::Vector2<T>::Vector2;
// converting from a sf::Vector2
Vector2_ext(const sf::Vector2<T>& o) : sf::Vector2<T>(o) {}
// converting back to a sf::Vector2
operator sf::Vector2<T>&() { return *this; }
operator sf::Vector2<T> const&() const { return *this; }
// your utility functions
T length() const { return std::sqrt(this->x * this->x + this->y * this->y); }
};
// deduction guide
template<typename T>
Vector2_ext(T, T)->Vector2_ext<T>;
With this, you can convert back and forth between sf::Vector2<T> and Vector2_ext<T> when needed.
void some_func(sf::Sprite& somesprite, float some_x, float some_y, float max_distance) {
auto distance =
// somesprite.getPosition() - sf::Vector2(some_x, some_y) returns a sf::Vector2<T>
// that we convert to a temporary Vector2_ext<T> and call its length() function:
Vector2_ext(somesprite.getPosition() - sf::Vector2(some_x, some_y)).length();
if(distance < max_distance) {
// do stuff
}
}
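A further variation on the distance check compares squared lengths, which avoids the std::sqrt call. The sketch below uses a stand-in struct Vec2f so that it compiles without SFML; both Vec2f and within_distance are hypothetical names, and the template should work with any type that has public x and y members, which includes sf::Vector2<T>.

```cpp
// Stand-in for sf::Vector2f so the sketch compiles without SFML (assumption).
struct Vec2f {
    float x;
    float y;
};

// True when a and b are closer together than max_distance.
// Assumes max_distance is non-negative, since both sides are squared.
template <typename V, typename T>
bool within_distance(const V& a, const V& b, T max_distance) {
    const auto dx = a.x - b.x;
    const auto dy = a.y - b.y;
    return dx * dx + dy * dy < max_distance * max_distance;
}
```

With SFML the call might look like within_distance(somesprite.getPosition(), sf::Vector2f(some_x, some_y), max_distance).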

Easily convert between different geometry classes in c++?

I work in robotics, which means I use a large number of open-source projects dealing with 3D geometry. Since the classes and math tend to be fairly simple, everyone seems to implement their own version of Vector3D, Quaternion, etc., each with slight variations, e.g. vec.x, vec.X, vec.x(). So within one project, one might need to convert between Eigen, ROS, Assimp, Bullet, and other versions of the same basic classes. Is there an easy or elegant way to do this in C++ that doesn't require an n^2 mapping from every library to every other library?
Similar to: This SO question, but I can't edit any of the source libraries.
Example:
namespace a
{
class Vector
{
public:
double x, y, z;
};
} // namespace a
namespace b
{
class Vector
{
public:
double X, Y, Z;
};
} // namespace b
namespace c
{
class Vector
{
public:
double& x() { return mx; }
double& y() { return my; }
double& z() { return mz; }
private:
double mx, my, mz;
};
} // namespace c
int main()
{
a::Vector va;
b::Vector vb;
c::Vector vc = va + vb; // Ideal, but probably unrealistic goal
return 0;
}
EDIT:
If there are ~10 different geometry libraries, a particular project may only use 2-4 of them, so I'd like to avoid introducing a dependency on all the unused libraries. I was hoping for something like static_cast<b::Vec>(a::Vec), or maybe
c::Vec vc = my_cvt<c::Vec>(vb + my_cvt<b::Vec>(va));
but my understanding of templates and type_traits is pretty weak.

If you write three helper functions for each vector type to access X, Y and Z:
double X(const a::Vector& v) { return v.x; }
double Y(const a::Vector& v) { return v.y; }
double Z(const a::Vector& v) { return v.z; }
// c::Vector's accessors are non-const, so take it by value instead of by const reference
double X(c::Vector v) { return v.x(); }
double Y(c::Vector v) { return v.y(); }
//...
then you can easily write template functions that work with any type. e.g:
template<typename V1, typename V2>
V1 operator+(const V1& v1, const V2& v2) {
return {X(v1)+X(v2), Y(v1)+Y(v2), Z(v1)+Z(v2)};
}
template<typename V1, typename V2>
V1 convert(const V2& v) {
return {X(v), Y(v), Z(v)};
}
int main() {
a::Vector va;
b::Vector vb;
auto vc = convert<c::Vector>(va + vb);
}
Live demo.
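For reference, here is a self-contained sketch of the accessor-helper idea. It rests on assumptions that differ from the question: the stand-in c::Vector has a three-argument constructor and const accessors, and the addition is a named function template add() rather than a catch-all operator+ template, so that it cannot hijack + for unrelated types.

```cpp
namespace a { struct Vector { double x, y, z; }; }
namespace b { struct Vector { double X, Y, Z; }; }
namespace c {
// Assumption: constructor and const accessors added for this sketch.
class Vector {
public:
    Vector(double x, double y, double z) : mx(x), my(y), mz(z) {}
    double x() const { return mx; }
    double y() const { return my; }
    double z() const { return mz; }
private:
    double mx, my, mz;
};
}  // namespace c

// One set of accessors per library type: n adapters instead of n^2 converters.
inline double X(const a::Vector& v) { return v.x; }
inline double Y(const a::Vector& v) { return v.y; }
inline double Z(const a::Vector& v) { return v.z; }
inline double X(const b::Vector& v) { return v.X; }
inline double Y(const b::Vector& v) { return v.Y; }
inline double Z(const b::Vector& v) { return v.Z; }
inline double X(const c::Vector& v) { return v.x(); }
inline double Y(const c::Vector& v) { return v.y(); }
inline double Z(const c::Vector& v) { return v.z(); }

// Convert between any two vector types that have the accessors above.
template <typename To, typename From>
To convert(const From& v) {
    return To{X(v), Y(v), Z(v)};
}

// Add two vectors of possibly different types, producing a third type.
template <typename To, typename V1, typename V2>
To add(const V1& v1, const V2& v2) {
    return To{X(v1) + X(v2), Y(v1) + Y(v2), Z(v1) + Z(v2)};
}
```

A project only needs to include the accessor overloads for the libraries it actually uses, which also addresses the concern about unused dependencies.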

Well, just define an operator+ function and your 'unrealistic goal' would be achieved:
c::Vector operator+(const a::Vector& a, const b::Vector& b) {
return {a.x+b.X, a.y+b.Y, a.z+b.Z};
}
And your small code snippet will work.
EDIT
If you do not want to define a whole lot of functions, and assuming you can't change the Vector versions from a and b, modify your vector class by adding these constructors:
Vector(a::Vector a) : mx(a.x), my(a.y), mz(a.z) {}
Vector(b::Vector b) : mx(b.X), my(b.Y), mz(b.Z) {}
And then define only one operator dealing only with the c class:
c::Vector operator+(c::Vector a, c::Vector b) {
return {a.x()+b.x(), a.y()+b.y(), a.z()+b.z()};
}
And your code snippet will work without declaring thousands of operators.
EDIT 2
If you want your type to be compatible with your library's types, you may add a conversion operator to your class. For example, if you want your type to be convertible to a::Vector, add this function inside your class:
operator a::Vector() const {
// return a a::Vector from our c::Vector
return a::Vector{mx, my, mz};
}

I see that this is an old question but check out Boost QVM.

Can we declare a function with the same signature but different return type in the base class?

The question may look silly, but I want to ask.
Is there any way to declare methods in a class with the same signature but different return types (like int fun(int) and float fun(int)), and to decide dynamically during object creation which function is executed? I got a compilation error... Is there any other way to achieve this logic, maybe using templates?

You can always take the return value as a template.
template<typename T> T fun(int);
template<> float fun<float>(int);
template<> int fun<int>(int);
Can you decide dynamically at run-time which to call? No.
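A compilable sketch of this approach follows. The function bodies are placeholders invented for the illustration; only the declarations come from the answer.

```cpp
// The primary template is only declared; each supported return type gets
// its own explicit specialization.
template <typename T> T fun(int);

// Placeholder bodies (assumptions), just to make the two versions distinguishable.
template <> inline int fun<int>(int x) { return x * 2; }
template <> inline float fun<float>(int x) { return x * 0.5f; }
```

The caller selects the version with an explicit template argument, as in fun<int>(21) or fun<float>(21). That choice is fixed at compile time; a run-time decision would need an ordinary branch that calls one or the other.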

@DeadMG proposed the template-based solution; however, you can simply "tweak" the signature (which is, arguably, what the template argument does).
The idea is simply to add a dummy argument:
struct Foo
{
float fun(float); // no name, it's a dummy
int fun(int); // no name, it's a dummy
};
Then for execution:
int main() {
Foo foo;
std::cout << foo.fun(int()) << ", " << foo.fun(float());
}
This can be used exactly like the template solution (i.e. invoked from a template method), but is much easier to pull off:
less wordy
function template specialization should be defined outside the class (although VC++ will accept inline definition in the class)
I prefer to avoid function template specialization, in general, as with specialization on arguments, the rules for selecting the right overload/specialization are tricky.
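To illustrate the "invoked from a template method" remark, here is a minimal sketch. The bodies of fun and the helper call_fun are assumptions made for the example; the member functions are also marked const here so the helper can take a const reference.

```cpp
struct Foo {
    // The argument value is ignored; only its type selects the overload.
    float fun(float) const { return 1.5f; }  // placeholder body (assumption)
    int fun(int) const { return 42; }        // placeholder body (assumption)
};

// Hypothetical helper: picks the overload from the requested return type
// by passing a value-initialized dummy of that type.
template <typename T>
T call_fun(const Foo& foo) {
    return foo.fun(T());
}
```

call_fun<int>(foo) resolves to the int overload and call_fun<float>(foo) to the float overload, using ordinary overload resolution rather than template specialization.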

You can (but shouldn't*) use a proxy class that overloads the conversion operators.
Long example with actual usecase *
Let me take my example from Dot & Cross Product Notation:
[...]
There is also the possibility of having operator* for both dot-product and cross-product.
Assume a basic vector-type (just for demonstration):
struct Vector {
float x,y,z;
Vector() : x(0), y(0), z(0) {} // zero-initialize so default-constructed vectors are safe to read
Vector (float x, float y, float z) : x(x), y(y), z(z) {}
};
We observe that the dot-product is a scalar, the cross-product is a vector. In C++, we may overload conversion operators:
struct VecMulRet {
public:
operator Vector () const {
return Vector (
lhs.y*rhs.z - lhs.z*rhs.y,
lhs.z*rhs.x - lhs.x*rhs.z,
lhs.x*rhs.y - lhs.y*rhs.x
);
}
operator float () const {
return lhs.x*rhs.x + lhs.y*rhs.y + lhs.z*rhs.z;
}
private:
// make construction private and only allow operator* to create an instance
Vector const lhs, rhs;
VecMulRet (Vector const &lhs, Vector const &rhs)
: lhs(lhs), rhs(rhs)
{}
friend VecMulRet operator * (Vector const &lhs, Vector const &rhs);
};
Only operator* is allowed to construct a VecMulRet, because its constructor is private (paranoia first).
Operator* is now defined as follows:
VecMulRet operator * (Vector const &lhs, Vector const &rhs) {
return VecMulRet (lhs, rhs);
}
Et voila, we can write:
int main () {
Vector a,b;
float dot = a*b;
Vector cross = a*b;
}
Btw, this is blessed by the Holy Standard as established in 1999.
If you read further in that thread, you'll find a benchmark that confirms that this comes at no performance penalty.
Short example for demonstration *
If that was too much to grasp, a more constructed example:
struct my_multi_ret {
operator unsigned int() const { return 0xdeadbeef; }
operator float() const { return 42.f; }
};
my_multi_ret multi () {
return my_multi_ret();
}
#include <iostream>
#include <iomanip>
int main () {
unsigned int i = multi();
float f = multi();
std::cout << std::hex << i << ", " << f << std::endl;
}
* You can, but shouldn't, because it does not conform to the principle of least surprise as it is not common practice. Still, it is funny.
