How to use Boost.Units with ratios of the same unit - C++

I have a very simple use case for Boost.Units, but I'm not sure whether there is a better/easier way to get the same done.
I want to convert between the same unit at different ratios, for example hertz to kilohertz to megahertz.
From my understanding, I first must define units with my specific ratios:
typedef boost::units::make_scaled_unit<si::frequency, scale<10, static_rational<0> > >::type Hertz_unit;
typedef boost::units::make_scaled_unit<si::frequency, scale<10, static_rational<3> > >::type KiloHertz_unit;
typedef boost::units::make_scaled_unit<si::frequency, scale<10, static_rational<6> > >::type MegaHertz_unit;
Then create quantities that represent the units:
typedef boost::units::quantity<Hertz_unit , double> Hertz;
typedef boost::units::quantity<KiloHertz_unit, double> KiloHertz;
typedef boost::units::quantity<MegaHertz_unit , double> MegaHertz;
Finally some constants and literals:
BOOST_UNITS_STATIC_CONSTANT( Hz, Hertz_unit );
BOOST_UNITS_STATIC_CONSTANT(KHz, KiloHertz_unit);
BOOST_UNITS_STATIC_CONSTANT(MHz, MegaHertz_unit );
Hertz operator"" _Hz (long double val) { return Hertz (val * Hz); }
KiloHertz operator"" _KHz (long double val) { return KiloHertz(val * KHz); }
MegaHertz operator"" _MHz (long double val) { return MegaHertz (val * MHz); }
Now I can use the quantities:
Hertz freq_1 = (10 * Hz);
KiloHertz freq_2 = (10 * KHz);
MegaHertz freq_3 = (10 * MHz);
// OR
Hertz freq_4 = 10.0_Hz;
KiloHertz freq_5 = 10.0_KHz;
MegaHertz freq_6 = 10.0_MHz;
// Convert between units
Hertz freq_7 = static_cast<Hertz>(10 * KHz);
Is this how Boost.Units should be used, or am I missing something that might make it easier to use?
Are there already-defined units/quantities that I can use, hidden somewhere in a header? Or should this be done for all the units that I use?
Do I need to know/remember that kilo is scale<10, static_rational<3> >, or is this already defined and available?

There are a few different predefined "systems" that make things easier to use and avoid needing to define your own units and scales.
While this code doesn't involve frequencies, you should be able to adapt it to your needs (there is a boost/units/systems/si/frequency.hpp header, for example):
#include <boost/units/quantity.hpp>
#include <boost/units/systems/si/length.hpp>
#include <boost/units/systems/si/prefixes.hpp>
using boost::units::si::meters;
using boost::units::si::milli;
typedef boost::units::quantity<boost::units::si::length> length;
static const auto millimeters = milli * meters;
// ...
auto x = length(5 * millimeters);
auto mm = double(x / meters * 1000.0);
You used to be able to do this without the explicit cast to length (although you then needed to explicitly type the variable as length instead of using auto), but at some point this was made to require an explicit cast.
In theory you shouldn't need to do the conversion from meters to mm manually in that second line, but the obvious construction x / millimeters produces compile errors for which I never managed to find a good workaround (the scale doesn't cancel out like it should).
(You can also use x.value() rather than x / meters, but I don't like that approach as it will still compile and give you surprising results if the base unit of x wasn't what you were expecting. And it still doesn't solve the mm conversion issue.)
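One workaround sketch of my own (using the same explicit-conversion trick as the frequency example below) is to define a quantity type over the scaled unit and convert through it; mm_length is my name, not Boost's:
#include <type_traits>
typedef boost::units::quantity<std::decay_t<decltype(millimeters)>> mm_length;
// ...
auto x = length(5 * millimeters);
auto x_mm = mm_length(x);   // explicit conversion back to millimetres
double mm = x_mm.value();   // 5.0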
Alternatively you might want to consider something like this answer, although that's mostly geared to using a single alternative scale as your base unit.
Here's another method using frequencies and multiple quantity types:
#include <boost/units/quantity.hpp>
#include <boost/units/systems/si/frequency.hpp>
#include <boost/units/systems/si/prefixes.hpp>
using boost::units::si::hertz;
using boost::units::si::kilo;
using boost::units::si::mega;
static const auto kilohertz = kilo * hertz;
static const auto megahertz = mega * hertz;
typedef boost::units::quantity<boost::units::si::frequency> Hertz;
typedef boost::units::quantity<decltype(kilohertz)> KiloHertz;
typedef boost::units::quantity<decltype(megahertz)> MegaHertz;
// ...
auto freq_1 = Hertz(10 * hertz);
auto freq_2 = KiloHertz(10 * kilohertz);
auto freq_3 = MegaHertz(10 * megahertz);
auto freq_4 = KiloHertz(freq_3);
// freq_1.value() == 10.0
// freq_2.value() == 10.0
// freq_3.value() == 10.0
// freq_4.value() == 10000.0
You can do conversions and maths on these fairly easily; in this context the value() is probably the most useful as it will naturally express the same unit as the variable.
One slightly unfortunate behaviour is that the default string output for these units is presented as inverse seconds (e.g. freq_2 prints as "10 k(s^-1)"), so you probably just want to avoid streaming them directly.
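If you do need readable output, the simplest workaround I can offer (the label string is mine; Boost will not generate it for you) is to print the value and append your own label:
// continuing from the declarations above
std::cout << freq_2.value() << " kHz\n";   // "10 kHz" rather than "10 k(s^-1)"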
And yes, the operator"" works as well, so you could substitute these in the above:
Hertz operator"" _Hz(long double val) { return Hertz(val * hertz); }
KiloHertz operator"" _kHz(long double val) { return KiloHertz(val * kilohertz); }
MegaHertz operator"" _MHz(long double val) { return MegaHertz(val * megahertz); }
auto freq_1 = 10.0_Hz;
auto freq_2 = 10.0_kHz;
auto freq_3 = 10.0_MHz;
auto freq_4 = KiloHertz(freq_3);
For consistency you can also define Hertz in terms of hertz, but due to quirks it's a little more tricky than the others; this works though:
typedef boost::units::quantity<std::decay_t<decltype(hertz)>> Hertz;
typedef boost::units::quantity<std::decay_t<decltype(kilohertz)>> KiloHertz;
typedef boost::units::quantity<std::decay_t<decltype(megahertz)>> MegaHertz;
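Putting it all together, here is a minimal self-contained sketch (assuming only that Boost.Units is installed; the 2.5 kHz example values are mine):
#include <boost/units/quantity.hpp>
#include <boost/units/systems/si/frequency.hpp>
#include <boost/units/systems/si/prefixes.hpp>
#include <iostream>
#include <type_traits>
using boost::units::si::hertz;
using boost::units::si::kilo;
static const auto kilohertz = kilo * hertz;
typedef boost::units::quantity<std::decay_t<decltype(hertz)>> Hertz;
typedef boost::units::quantity<std::decay_t<decltype(kilohertz)>> KiloHertz;
KiloHertz operator"" _kHz(long double v) { return KiloHertz(static_cast<double>(v) * kilohertz); }
int main()
{
    auto f = 2.5_kHz;
    Hertz h(f);                         // explicit conversion to the base unit
    std::cout << h.value() << " Hz\n";  // 2500 Hz
}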

Related

Can a C++ union perform math on the number it contains?

Is it possible in C++ to create a union that would let me do something like this ...
union myTime {
long millis;
double seconds;
};
BUT, have it somehow do the conversion so that if I input times in milliseconds, and then call seconds, it will take the number and divide it by 1000, or conversely, if I input the number in seconds, then call millis, it would multiply the number by 1000...
So that:
myTime.millis = 1340;
double s = myTime.seconds;
Where s would equal 1.34
or
myTime.seconds = 2.5;
long m = myTime.millis;
Where m would = 2500
Is this possible?
A union is just different representations for the same value (the same bytes), so you can't define any smart logic over that.
In this case, you can define a class with conversion functions (both for initialization and for getting the data).
class myTime {
    long millis_;
public:
    myTime(long millis) : millis_(millis) {}
    double as_seconds() const { return millis_ / 1000.0; }
    static myTime from_seconds(double seconds) { return myTime(static_cast<long>(seconds * 1000)); }
};
Notice that, as mentioned in other answers, for time conversions you can use std::chrono types (C++11 and above).
To answer the question as asked: No. Unions are a lower-level construct that simply allows multiple object representations to live in the same memory space. In your example, long and double share the same address.
They are not, however, smart enough to automatically perform a conversion of any kind. Accessing the inactive member of a union is actually undefined behavior in most cases (there are exceptions for when you have a common initial sequence in a standard-layout object).
Even if the behavior were well-defined, the value you would see in the double would be the double interpretation of the byte-pattern necessary to represent 1340.
If your problem is specifically to do with converting millis to seconds, as per your example, have you considered using std::chrono::duration units? These are designed specifically to do these conversions between time units for you automatically, and you can define durations with custom representations (such as double).
Your example in your problem could be rewritten:
using double_seconds = std::chrono::duration<double>;
const auto millis = std::chrono::milliseconds{1340};
const auto m = double_seconds{millis};
// m contains 1.340
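Going the other direction from your example (my own extension of the same idea), std::chrono::duration_cast converts back to integral milliseconds:
const auto s = double_seconds{2.5};
const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(s);
// ms.count() == 2500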
You can if you abuse the type system a bit:
union myTime {
    double seconds;
    class milli_t {
        double seconds;
    public:
        milli_t &operator=(double ms) {
            seconds = ms / 1000.0;
            return *this;
        }
        operator double() const { return seconds * 1000; }
    } millis;
};
Now if you do
myTime t;
t.millis = 1340;
double s = t.seconds;
s would equal 1.34
and
myTime t;
t.seconds = 2.5;
long m = t.millis;
m would be 2500, exactly as you desire.
Of course, reading a union member other than the one last written is formally undefined behaviour anyway, and why you would want to do this is unclear.

Is there a C++ preprocessor directive or similar that I can use to convert units

I am doing a bit of motor control, and instead of saying 39553 encoder ticks, it would be easier for my human brain to say 6.5 inches. I would like to save processor overhead by converting this at compile time. Is there a way to do this with preprocessor directives or templates maybe? Thanks.
I would use a constexpr function for this. As @ChrisMM mentioned, it is not the preprocessor, but it will be evaluated at compile time rather than run time. If the values you gave correspond to the actual relationship between ticks and inches, you can use:
constexpr int toTicks(double inches) {
    return int(6085 * inches); // ~6085 ticks per inch (39553 / 6.5)
}
By doing the cast to an int, you can make sure you do not ever ask the motor to move some fraction of a tick. The only thing to be cautious of with an approach like this is that over time if you ask the motor to move to a bunch of different locations without returning to the origin between each move your origin can slowly shift with rounding errors from the conversion to an int.
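To confirm the compile-time evaluation (my own check, reusing the 6085 ticks-per-inch figure above):
constexpr int ticks = toTicks(6.5);   // evaluated at compile time
static_assert(toTicks(6.5) == 39552, "conversion happens at compile time");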
You can use user-defined literals. For example, here's an implementation converting centimeters and millimeters to a meter unit type:
struct meter {
    double value;
};
constexpr auto operator"" _cm(long double cm) -> meter
{
    return meter{static_cast<double>(cm) / 100.0};
}
constexpr auto operator"" _mm(long double mm) -> meter
{
    return meter{static_cast<double>(mm) / 1000.0};
}
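Note that a literal operator taking long double only matches floating-point literals; to make integer literals such as 50_cm below work too, you also need unsigned long long overloads (my addition, same conversion logic):
constexpr auto operator"" _cm(unsigned long long cm) -> meter
{
    return meter{static_cast<double>(cm) / 100.0};
}
constexpr auto operator"" _mm(unsigned long long mm) -> meter
{
    return meter{static_cast<double>(mm) / 1000.0};
}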
Then use it like this:
auto length1 = 50_cm;
auto length2 = 500_mm;
std::cout << length1.value + length2.value; // prints 1
I would just use #defines:
#define CM_PER_TICK 0.01
#define toTicks(a) ((a) * CM_PER_TICK)
move(toTicks(6.5));
This would boil down to
move(6.5 * 0.01);
By using cm per tick instead of ticks per cm we can get rid of the division, but you won't be able to get rid of the multiplication.

Why does std::chrono::duration::operator*= not act like the built-in *=?

As described for std::chrono::duration::operator*=, the signature is
duration& operator*=(const rep& rhs);
This makes me wonder. I would assume that a duration literal can be used like any other built-in type, but it doesn't behave that way.
#include <chrono>
#include <iostream>
int main()
{
using namespace std::chrono_literals;
auto m = 10min;
m *= 1.5f;
std::cout << " 150% of 10min: " << m.count() << "min" << std::endl;
int i = 10;
i *= 1.5f;
std::cout << " 150% of 10: " << i << std::endl;
}
Output is
150% of 10min: 10min
150% of 10: 15
Why was the interface chosen that way? To my mind, an interface like
template<typename T>
duration& operator*=(const T& rhs);
would yield more intuitive results.
EDIT:
Thanks for your responses. I know that the implementation behaves that way and how I could handle it. My question is why it is designed that way.
I would expect the conversion to int to take place at the end of the operation. In the following example both operands get promoted to double before the multiplication happens. The intermediate result of 4.5 is converted to int afterwards, so that the result is 4.
int i = 3;
i *= 1.5;
assert(i == 4);
My expectation for std::duration would be that it behaves the same way.
The issue here is
auto m = 10min;
gives you a std::chrono::duration where rep is a signed integer type. When you do
m *= 1.5f;
the 1.5f is converted to the type rep and that means it is truncated to 1, which gives you the same value after multiplication.
To fix this you need to use
auto m = 10.0min;
to get a std::chrono::duration that uses a floating-point type for rep and won't truncate 1.5f when you do m *= 1.5f;.
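For completeness, here is your example with just the literal changed; this is the whole fix:
#include <chrono>
#include <iostream>
int main()
{
    using namespace std::chrono_literals;
    auto m = 10.0min;   // rep is now a floating-point type
    m *= 1.5f;          // 1.5f is no longer truncated
    std::cout << " 150% of 10min: " << m.count() << "min" << std::endl;   // prints 15
}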
My question is why it is designed that way.
It was designed this way (ironically) because the integral-based computations are designed to give exact results, or not compile. However in this case the <chrono> library exerts no control over what conversions get applied to arguments prior to binding to the arguments.
As a concrete example, consider the case where m is initialized to 11min, and presume that we had a templated operator*= as you suggest. The exact answer is now 16.5min, but the integral-based type chrono::minutes is not capable of representing this value.
A superior design would be to have this line:
m *= 1.5f; // compile-time error
not compile. That would make the library more self-consistent: Integral-based arithmetic is either exact (or requires duration_cast) or does not compile. This would be possible to implement, and the answer as to why this was not done is simply that I didn't think of it.
If you (or anyone else) feels strongly enough about this to try to standardize a compile-time error for the above statement, I would be willing to speak in favor of such a proposal in committee.
This effort would involve:
An implementation with unit tests.
Fielding it to get a feel for how much code it would break, and ensuring that it does not break code not intended.
Write a paper and submit it to the C++ committee, targeting C++23 (it is too late to target C++20).
The easiest way to do this would be to start with an open-source implementation such as gcc's libstdc++ or llvm's libc++.
Looking at the implementation of operator*=:
_CONSTEXPR17 duration& operator*=(const _Rep& _Right)
{ // multiply rep by _Right
_MyRep *= _Right;
return (*this);
}
the operator takes a const _Rep&. It comes from std::duration which looks like:
template<class _Rep, //<-
class _Period>
class duration
{ // represents a time Duration
//...
So now if we look at the definition of std::chrono::minutes:
using minutes = duration<int, ratio<60>>;
It is clear that _Rep is an int.
So when you call operator*=(const _Rep& _Right), 1.5f is being converted to an int - which equals 1 and therefore won't affect any multiplication with itself.
So what can you do?
you can split it up into m = m * 1.5f and use std::chrono::duration_cast to cast from std::chrono::duration<float, std::ratio<60>> to std::chrono::duration<int, std::ratio<60>>
m = std::chrono::duration_cast<std::chrono::minutes>(m * 1.5f);
150% of 10min: 15min
if you don't like always casting it, use a float for it as the first template argument:
std::chrono::duration<float, std::ratio<60>> m = 10min;
m *= 1.5f; //> 15min
or even quicker - auto m = 10.0min; m *= 1.5f; as #NathanOliver answered :-)

Specialization of STL Templates

I am trying to write high-performance code that uses random numbers, generated with a Mersenne Twister. Generating a random unsigned long long takes roughly ~5ns; generating a double from a distribution, however, takes ~40ns.
Viewing the STL code, the doubles generated by a distribution come from calls to std::generate_canonical, which involves a std::ceil and a std::log2 operation; I believe it is these that are costly.
These operations are unnecessary, as they are only used to calculate the number of bits needed for calls to the RNG implementation. As this is known at compile time, I have written my own implementation that does not make these calls, and the time to generate a double drops to ~15ns.
Is it possible to specialize a templated STL function? If so, how is this achieved? My attempts so far result in the original function still being used. I would like to specialize this STL function because I would still like to use the distributions in <random>.
This is in Visual C++, though once the code has been developed it will run on Linux and be compiled with either GCC or ICC. If the method for generating doubles on Linux is different (and quicker), this problem is irrelevant.
Edit 1:
I believe all distributions requiring a double make calls to std::generate_canonical. This function creates a double in the range [0,1), and the required precision is built up by iteratively accumulating calls to the RNG's operator(). The log2 and ceil are used to calculate the number of iterations.
MSVC std::generate_canonical
// FUNCTION TEMPLATE generate_canonical
template<class _Real,
size_t _Bits,
class _Gen>
_Real generate_canonical(_Gen& _Gx)
{ // build a floating-point value from random sequence
_RNG_REQUIRE_REALTYPE(generate_canonical, _Real);
const size_t _Digits = static_cast<size_t>(numeric_limits<_Real>::digits);
const size_t _Minbits = _Digits < _Bits ? _Digits : _Bits;
const _Real _Gxmin = static_cast<_Real>((_Gx.min)());
const _Real _Gxmax = static_cast<_Real>((_Gx.max)());
const _Real _Rx = (_Gxmax - _Gxmin) + static_cast<_Real>(1);
const int _Ceil = static_cast<int>(_STD ceil(
static_cast<_Real>(_Minbits) / _STD log2(_Rx)));
const int _Kx = _Ceil < 1 ? 1 : _Ceil;
_Real _Ans = static_cast<_Real>(0);
_Real _Factor = static_cast<_Real>(1);
for (int _Idx = 0; _Idx < _Kx; ++_Idx)
{ // add in another set of bits
_Ans += (static_cast<_Real>(_Gx()) - _Gxmin) * _Factor;
_Factor *= _Rx;
}
return (_Ans / _Factor);
}
My Simplified Version
template<size_t _Bits>
double generate_canonical(std::mt19937_64& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Gxmax = static_cast<double>((_Gx.max)());
const double _Rx = (_Gxmax - _Gxmin) + static_cast<double>(1);
double _Ans = (static_cast<double>(_Gx()) - _Gxmin);
return (_Ans / _Rx);
}
This function is written in namespace std {}
Edit 2:
I found a solution please see my answer below.
Sorry, specializing Standard Library functions is not allowed; doing so results in Undefined Behavior.
However, you can use alternative distributions; C++ has well-defined interfaces between generators and distributions.
Oh, and just to eliminate the possibility of a beginner's error (since you don't show code): you do not create a new distribution for every number.
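As an illustration of that interface (a sketch of my own, not your code): any callable that maps a generator to a value can stand in for std::generate_canonical at call sites you control:
#include <cstdint>
#include <random>
struct fast_canonical {
    // One 64-bit draw already exceeds a double's 53-bit mantissa, so keep
    // the top 53 bits and scale by 2^-53 to land in [0, 1).
    double operator()(std::mt19937_64& gen) const {
        return (gen() >> 11) * (1.0 / 9007199254740992.0);   // 2^53
    }
};
int main() {
    std::mt19937_64 gen(42);
    fast_canonical canon;
    double d = canon(gen);   // uniformly distributed in [0, 1)
    (void)d;
}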
By creating an explicit specialization with all template parameters set, and declaring the functions inline, it is possible to create a user-defined version of std::generate_canonical.
User defined std::generate_canonical:
namespace std {
template<>
inline double generate_canonical<double, static_cast<size_t>(-1), std::mt19937>(std::mt19937& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Rx = (static_cast<double>((_Gx.max)()) - _Gxmin) + static_cast<double>(1);
double _Ans = (static_cast<double>(_Gx()) - _Gxmin);
_Ans += (static_cast<double>(_Gx()) - _Gxmin) *_Rx;
return (_Ans / (_Rx * _Rx)); // divide by _Rx squared; _Ans / _Rx * _Rx would just give _Ans back
}
template<>
inline double generate_canonical<double, static_cast<size_t>(-1), std::mt19937_64>(std::mt19937_64& _Gx)
{ // build a floating-point value from random sequence
const double _Gxmin = static_cast<double>((_Gx.min)());
const double _Rx = (static_cast<double>((_Gx.max)()) - _Gxmin) + static_cast<double>(1);
return ((static_cast<double>(_Gx()) - _Gxmin) / _Rx);
}
}
The second parameter static_cast<size_t>(-1) should be modified to whatever value is used by the specific library; this is the value for VC++ but it may be different for GCC. This means it is not portable.
This function has been defined for std::mt19937 and std::mt19937_64 and seems to be used by the STL distributions correctly.
Results:
double using std::generate_canonical
Generating 400000000 doubles using standard MT took: 17625 milliseconds
This equivalent to: 44.0636 nanoseconds per value
Generating 400000000 doubles using 64bit MT took: 11958 milliseconds
This equivalent to: 29.8967 nanoseconds per value
double using new generate_canonical
Generating 400000000 doubles using standard MT took: 4843 milliseconds
This equivalent to: 12.1097 nanoseconds per value
Generating 400000000 doubles using 64bit MT took: 2645 milliseconds
This equivalent to: 6.61362 nanoseconds per value

How does this float square root approximation work?

I found a rather strange but working square root approximation for floats; I really don't get it. Can someone explain to me why this code works?
float sqrt(float f)
{
const int result = 0x1fbb4000 + (*(int*)&f >> 1);
return *(float*)&result;
}
I've tested it a bit and it outputs values off from std::sqrt() by about 1 to 3%. I know of Quake III's fast inverse square root and I guess it's something similar here (without the Newton iteration), but I'd really appreciate an explanation of how it works.
(nota: I've tagged it both c and c++ since it's both valid-ish (see comments) C and C++ code)
(*(int*)&f >> 1) right-shifts the bitwise representation of f. This almost divides the exponent by two, which is approximately equivalent to taking the square root.[1]
Why almost? In IEEE-754, the actual exponent is e - 127.[2] To divide this by two, we'd need e/2 - 64, but the above approximation only gives us e/2 - 127. So we need to add on 63 to the resulting exponent. This is contributed by bits 30-23 of that magic constant (0x1fbb4000).
I'd imagine the remaining bits of the magic constant have been chosen to minimise the maximum error across the mantissa range, or something like that. However, it's unclear whether it was determined analytically, iteratively, or heuristically.
It's worth pointing out that this approach is somewhat non-portable. It makes (at least) the following assumptions:
The platform uses single-precision IEEE-754 for float.
The endianness of float representation.
That you will be unaffected by undefined behaviour due to the fact this approach violates C/C++'s strict-aliasing rules.
Thus it should be avoided unless you're certain that it gives predictable behaviour on your platform (and indeed, that it provides a useful speedup vs. sqrtf!).
[1] sqrt(a^b) = (a^b)^0.5 = a^(b/2)
[2] See e.g. https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Exponent_encoding
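As a quick numeric sanity check of the above (the memcpy framing is mine; the constant is from the question):
#include <cstdint>
#include <cstring>
#include <iostream>
int main()
{
    const float f = 4.0f;
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);                 // bits == 0x40800000
    const std::uint32_t r = 0x1fbb4000 + (bits >> 1);    // == 0x3ffb4000
    float approx;
    std::memcpy(&approx, &r, sizeof approx);
    std::cout << approx << '\n';   // ~1.96289; true sqrt(4) is 2, about 1.9% low
}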
See Oliver Charlesworth’s explanation of why this almost works. I’m addressing an issue raised in the comments.
Since several people have pointed out the non-portability of this, here are some ways you can make it more portable, or at least make the compiler tell you if it won’t work.
First, C++ allows you to check std::numeric_limits<float>::is_iec559 at compile time, such as in a static_assert. You can also check that sizeof(int) == sizeof(float), which will not be true if int is 64-bits. But what you really want to do is use uint32_t, which if it exists will always be exactly 32 bits wide, will have well-defined behavior with shifts and overflow, and will cause a compilation error if your weird architecture has no such integral type. Either way, you should also static_assert() that the types have the same size. Static assertions have no run-time cost, and you should always check your preconditions this way if possible.
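Those checks look like this in code (a sketch of exactly what the paragraph describes):
#include <cstdint>
#include <limits>
static_assert(std::numeric_limits<float>::is_iec559, "requires IEEE-754 float");
static_assert(sizeof(std::uint32_t) == sizeof(float), "requires 32-bit float");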
Unfortunately, the test of whether converting the bits in a float to a uint32_t and shifting is big-endian, little-endian or neither cannot be computed as a compile-time constant expression. Here, I put the run-time check in the part of the code that depends on it, but you might want to put it in the initialization and do it once. In practice, both gcc and clang can optimize this test away at compile time.
You do not want to use the unsafe pointer cast, and there are some systems I’ve worked on in the real world where that could crash the program with a bus error. The maximally-portable way to convert object representations is with memcpy(). In my example below, I type-pun with a union, which works on any actually-existing implementation. (Language lawyers object to it, but no successful compiler will ever break that much legacy code silently.) If you must do a pointer conversion (see below) there is alignas(). But however you do it, the result will be implementation-defined, which is why we check the result of converting and shifting a test value.
Anyway, not that you’re likely to use it on a modern CPU, here’s a gussied-up C++14 version that checks those non-portable assumptions:
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>
using std::cout;
using std::endl;
using std::size_t;
using std::sqrt;
using std::uint32_t;
template <typename T, typename U>
inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it reads an inactive union member.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
union tu_pun {
U u = U();
T t;
};
const tu_pun pun{x};
return pun.t;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
const bool is_little_endian = after_rshift == target;
float est_sqrt(const float x)
/* A fast approximation of sqrt(x) that works less well for subnormal numbers.
*/
{
static_assert( std::numeric_limits<float>::is_iec559, "" );
assert(is_little_endian); // Could provide alternative big-endian code.
/* The algorithm relies on the bit representation of normal IEEE floats, so
* a subnormal number as input might be considered a domain error as well?
*/
if ( std::isless(x, 0.0F) || !std::isfinite(x) )
return std::numeric_limits<float>::signaling_NaN();
constexpr uint32_t magic_number = 0x1fbb4000UL;
const uint32_t raw_bits = reinterpret<uint32_t,float>(x);
const uint32_t rejiggered_bits = (raw_bits >> 1U) + magic_number;
return reinterpret<float,uint32_t>(rejiggered_bits);
}
int main(void)
{
static const std::vector<float> test_values{
4.0F, 0.01F, 0.0F, 5e20F, 5e-20F, 1.262738e-38F };
for ( const float& x : test_values ) {
const double gold_standard = sqrt((double)x);
const double estimate = est_sqrt(x);
const double error = estimate - gold_standard;
cout << "The error for (" << estimate << " - " << gold_standard << ") is "
<< error;
if ( gold_standard != 0.0 && std::isfinite(gold_standard) ) {
const double error_pct = error/gold_standard * 100.0;
cout << " (" << error_pct << "%).";
} else
cout << '.';
cout << endl;
}
return EXIT_SUCCESS;
}
Update
Here is an alternative definition of reinterpret<T,U>() that avoids type-punning. You could also implement the type-pun in modern C, where it’s allowed by standard, and call the function as extern "C". I think type-punning is more elegant, type-safe and consistent with the quasi-functional style of this program than memcpy(). I also don’t think you gain much, because you still could have undefined behavior from a hypothetical trap representation. Also, clang++ 3.9.1 -O -S is able to statically analyze the type-punning version, optimize the variable is_little_endian to the constant 0x1, and eliminate the run-time test, but it can only optimize this version down to a single-instruction stub.
But more importantly, this code isn’t guaranteed to work portably on every compiler. For example, some old computers can’t even address exactly 32 bits of memory. But in those cases, it should fail to compile and tell you why. No compiler is just suddenly going to break a huge amount of legacy code for no reason. Although the standard technically gives permission to do that and still say it conforms to C++14, it will only happen on an architecture very different from we expect. And if our assumptions are so invalid that some compiler is going to turn a type-pun between a float and a 32-bit unsigned integer into a dangerous bug, I really doubt the logic behind this code will hold up if we just use memcpy() instead. We want that code to fail at compile time, and to tell us why.
#include <cassert>
#include <cstdint>
#include <cstring>
using std::memcpy;
using std::uint32_t;
template <typename T, typename U> inline T reinterpret(const U &x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it modifies a variable.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
T temp;
memcpy( &temp, &x, sizeof(T) );
return temp;
}
constexpr float source = -0.1F;
constexpr uint32_t target = 0x5ee66666UL;
const uint32_t after_rshift = reinterpret<uint32_t,float>(source) >> 1U;
extern const bool is_little_endian = after_rshift == target;
However, Stroustrup et al., in the C++ Core Guidelines, recommend a reinterpret_cast instead:
#include <cassert>
template <typename T, typename U> inline T reinterpret(const U x)
/* Reinterprets the bits of x as a T. Cannot be constexpr
* in C++14 because it uses reinterpret_cast.
*/
{
static_assert( sizeof(T)==sizeof(U), "" );
const U temp alignas(T) alignas(U) = x;
return *reinterpret_cast<const T*>(&temp);
}
The compilers I tested can also optimize this away to a folded constant. Stroustrup’s reasoning is [sic]:
Accessing the result of an reinterpret_cast to a different type from the objects declared type is still undefined behavior, but at least we can see that something tricky is going on.
Update
From the comments: C++20 introduces std::bit_cast, which converts an object representation to a different type with unspecified, not undefined, behavior. This doesn’t guarantee that your implementation will use the same format of float and int that this code expects, but it doesn’t give the compiler carte blanche to break your program arbitrarily because there’s technically undefined behavior in one line of it. It can also give you a constexpr conversion.
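For illustration, a C++20 sketch of the same approximation using std::bit_cast (my code, under the same representation assumptions as before):
#include <bit>
#include <cstdint>
#include <limits>
constexpr float est_sqrt_bitcast(float x) noexcept
{
    static_assert(std::numeric_limits<float>::is_iec559);
    const auto bits = std::bit_cast<std::uint32_t>(x);        // no aliasing UB
    return std::bit_cast<float>(0x1fbb4000u + (bits >> 1));   // same magic constant
}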
Let y = sqrt(x).
It follows from the properties of logarithms that log2(y) = 0.5 * log2(x). (1)
Interpreting a normal float as an integer gives INT(x) = Ix = L * (log2(x) + B - σ), (2)
where L = 2^N, N is the number of bits of the significand, B is the exponent bias, and σ is a free factor to tune the approximation.
Combining (1) and (2) gives Iy = 0.5 * (Ix + L * (B - σ)),
which is written in the code as (*(int*)&x >> 1) + 0x1fbb4000;
Find the σ so that the constant equals 0x1fbb4000 and determine whether it's optimal.
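Working it through (my own arithmetic) with N = 23, so L = 2^23, and B = 127: the additive constant is 0.5 * L * (B - σ), so σ = B - 0x1fbb4000 / 2^22 = 127 - 126.92578125 = 0.07421875.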
Adding a wiki test harness to test all floats.
The approximation is within 4% for many floats, but very poor for sub-normal numbers. YMMV.
Worst:1.401298e-45 211749.20%
Average:0.63%
Worst:1.262738e-38 3.52%
Average:0.02%
Note that with argument of +/-0.0, the result is not zero.
printf("% e % e\n", sqrtf(+0.0), sqrt_apx(0.0)); // 0.000000e+00 7.930346e-20
printf("% e % e\n", sqrtf(-0.0), sqrt_apx(-0.0)); // -0.000000e+00 -2.698557e+19
Test code
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stddef.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
float sqrt_apx(float f) {
const int result = 0x1fbb4000 + (*(int*) &f >> 1);
return *(float*) &result;
}
double error_value = 0.0;
double error_worst = 0.0;
double error_sum = 0.0;
unsigned long error_count = 0;
void sqrt_test(float f) {
if (f == 0) return;
volatile float y0 = sqrtf(f);
volatile float y1 = sqrt_apx(f);
double error = (1.0 * y1 - y0) / y0;
error = fabs(error);
if (error > error_worst) {
error_worst = error;
error_value = f;
}
error_sum += error;
error_count++;
}
void sqrt_tests(float f0, float f1) {
error_value = error_worst = error_sum = 0.0;
error_count = 0;
for (;;) {
sqrt_test(f0);
if (f0 == f1) break;
f0 = nextafterf(f0, f1);
}
printf("Worst:%e %.2f%%\n", error_value, error_worst*100.0);
printf("Average:%.2f%%\n", error_sum / error_count);
fflush(stdout);
}
int main() {
sqrt_tests(FLT_TRUE_MIN, FLT_MIN);
sqrt_tests(FLT_MIN, FLT_MAX);
return 0;
}