I am doing a fixed-point implementation in C++ and I am trying to define "not-a-number" and support a function bool isnan( ... ) which returns true if the number is not-a-number and false otherwise.
Can someone give me some ideas for how to define "not-a-number" and how to implement a function bool isnan( ... ) in my fixed-point math implementation?
I have read about C++ NaN, but I couldn't find any source or reference on how to manually define and create a function nan() to use in a fixed-point implementation.
Can someone tell me how to proceed or point me to some references?
Thank you
UPDATE: Fixed-point header
#ifndef __fixed_point_header_h__
#define __fixed_point_header_h__

#include <boost/operators.hpp>
#include <boost/assert.hpp>
#include <boost/concept_check.hpp> // BOOST_CONCEPT_ASSERT, boost::Integer
#include <algorithm>               // std::swap (lives in <utility> since C++11)

namespace fp {

template<typename FP, unsigned char I, unsigned char F>
class fixed_point : boost::ordered_field_operators<fp::fixed_point<FP, I, F> >
{
    // Compute a power of 2 at compile time by template recursion.
    template<int P, typename T = void>
    struct power2
    {
        static const long long value = 2 * power2<P - 1, T>::value;
    };
    template<typename P>
    struct power2<0, P>
    {
        static const long long value = 1;
    };

    // Private constructor from a raw fixed-point value; the unused bool
    // distinguishes it from the public converting constructors.
    fixed_point(FP value, bool) : fixed_(value) { }

public:
    typedef FP base_type;                                 /// Underlying integer type of this fixed_point class.
    static const unsigned char integer_bit_count = I;    /// Integer part bit count.
    static const unsigned char fractional_bit_count = F; /// Fractional part bit count.

    /// Default constructor.
    fixed_point() { }

    /// Integer to fixed point.
    template<typename T>
    fixed_point(T value) : fixed_((FP)value << F)
    {
        BOOST_CONCEPT_ASSERT((boost::Integer<T>));
    }

    /// Floating point to fixed point.
    fixed_point(float value)       : fixed_((FP)(value * power2<F>::value)) { }
    fixed_point(double value)      : fixed_((FP)(value * power2<F>::value)) { }
    fixed_point(long double value) : fixed_((FP)(value * power2<F>::value)) { }

    /// Copy constructor, explicit definition.
    fixed_point(fixed_point<FP, I, F> const& rhs) : fixed_(rhs.fixed_) { }

    /// Assignment via the copy-and-swap idiom.
    fp::fixed_point<FP, I, F>& operator =(fp::fixed_point<FP, I, F> const& rhs)
    {
        fp::fixed_point<FP, I, F> temp(rhs); // make a copy of the right-hand side,
        swap(temp);                          // then swap the copy with this object's data
        return *this;                        // return a reference to this object
    }

    /// Exchanges the elements of two fixed_point objects.
    void swap(fp::fixed_point<FP, I, F>& rhs)
    {
        std::swap(fixed_, rhs.fixed_);
    }

    /// Less-than comparison with the right-hand side.
    bool operator <(fp::fixed_point<FP, I, F> const& rhs) const
    {
        return fixed_ < rhs.fixed_;
    }

    /// Equality comparison with the right-hand side.
    bool operator ==(fp::fixed_point<FP, I, F> const& rhs) const
    {
        return fixed_ == rhs.fixed_;
    }

    /// Addition.
    fp::fixed_point<FP, I, F>& operator +=(fp::fixed_point<FP, I, F> const& summation)
    {
        fixed_ += summation.fixed_;
        return *this; /// \return A reference to this object.
    }

    /// Subtraction.
    fp::fixed_point<FP, I, F>& operator -=(fp::fixed_point<FP, I, F> const& subtraction)
    {
        fixed_ -= subtraction.fixed_;
        return *this; /// \return A reference to this object.
    }

    /// Multiplication.
    fp::fixed_point<FP, I, F>& operator *=(fp::fixed_point<FP, I, F> const& factor)
    {
        fixed_ = ( fixed_ * (factor.fixed_ >> F) ) +
                 ( ( fixed_ * (factor.fixed_ & (power2<F>::value - 1)) ) >> F );
        return *this; /// \return A reference to this object.
    }

    /// Division.
    fp::fixed_point<FP, I, F>& operator /=(fp::fixed_point<FP, I, F> const& divisor)
    {
        fp::fixed_point<FP, I, F> fp_z = 1;
        fp_z.fixed_ = ( fp_z.fixed_ << (F - 2) ) / ( divisor.fixed_ >> 2 );
        *this *= fp_z;
        return *this; /// \return A reference to this object.
    }

private:
    /// The value in fixed point format.
    FP fixed_;
};

} // namespace fp

#endif // __fixed_point_header_h__
Usually fixed-point math is used on embedded hardware that has no FPU.
Often such hardware is also short on program space, data space, and/or processing power.
Are you sure you really need generic support for NaN, INF, or the like?
It may be sufficient to implement these explicitly, as separate flags on the operations that can produce such values.
When you use fixed-point arithmetic you have to know your data extremely well to avoid overflow or underflow in multiplications and divisions, so your algorithms have to be written in a way that avoids these special conditions anyway.
In addition, even when using double: once one of these special values enters your algorithm it spreads like a virus and the result is quite useless.
In conclusion: in my opinion, explicitly implementing this in your fixed-point class is a significant waste of processing power, because you have to add conditionals to every fixed-point operation, and conditionals are poison to the deep CPU pipelines of DSPs or microcontrollers.
Could you give us an example of what you mean by fixed point? Is it implemented as a class? Is it a fixed number of bytes, or do you support 8-, 16-, 32-, and 64-bit numbers? How do you represent negative values?
Depending on these factors you could implement it in a few different ways. IEEE floating-point numbers get away with it because they are encoded in a special format that lets flags be read off the bit pattern. In a fixed-point implementation that might not be possible, but if it is a class you could define the arithmetic operators for the class and then set the resulting number to NaN.
UPDATE
Looking at the code, it seems you are just storing the information in the value. So the best approach may be to add an isnan flag to the class, set it from the appropriate math operations, and check it before you perform an operation so that the NaN propagates (as sketched below).
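For illustration, a minimal sketch of that flag-based approach (the fixed16_16 name, the Q16.16 layout, and the simplified arithmetic are made up for the example, not taken from the poster's header):
#include <cstdint>

// Sketch: a fixed-point value carrying an explicit NaN flag.
class fixed16_16 {
public:
    fixed16_16() : raw_(0), nan_(false) { }
    explicit fixed16_16(double d) : raw_(static_cast<std::int32_t>(d * 65536.0)), nan_(false) { }

    static fixed16_16 nan() { fixed16_16 r; r.nan_ = true; return r; }
    bool isnan() const { return nan_; }

    fixed16_16& operator +=(const fixed16_16& rhs) {
        nan_ = nan_ || rhs.nan_;                    // NaN propagates through every operation
        if (!nan_) raw_ += rhs.raw_;
        return *this;
    }
    fixed16_16& operator /=(const fixed16_16& rhs) {
        nan_ = nan_ || rhs.nan_ || rhs.raw_ == 0;   // e.g. division by zero produces NaN
        if (!nan_) raw_ = static_cast<std::int32_t>((static_cast<std::int64_t>(raw_) << 16) / rhs.raw_);
        return *this;
    }

private:
    std::int32_t raw_;   // Q16.16 payload
    bool nan_;           // the "not-a-number" flag
};

inline bool isnan(const fixed16_16& x) { return x.isnan(); }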
Essentially, you must set aside some value or set of values to represent a NaN. In every operation on your objects (e.g., addition), you must test whether an input value is a NaN and respond accordingly.
Additionally, you must make sure no normal operation produces a NaN result inadvertently. So you have to handle overflows and such to ensure that, if a calculated result would be a bit pattern for a NaN, you produce an infinity and/or an exception indication and/or whatever result is desired.
That is basically it; there is no magic.
Generally, you would not want to use a single bit as a flag, because that wastes many bit combinations that could be used to represent values. IEEE 754 sets aside one value of the exponent field (all ones) to indicate infinity (if the significand field is all zeroes) or NaN (otherwise). That way, only a small portion of the bit combinations are used for NaNs. (For 32-bit, there are 2^24 - 2 NaNs out of 2^32 possible bit combinations, so less than 0.4% of the potential values are expended on NaNs.)
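Applied to a fixed-point type, one way to follow this advice is to reserve a single raw bit pattern, for example the most negative value of the underlying integer, as the NaN encoding. A rough sketch with hypothetical names (overflow handling is only hinted at):
#include <cstdint>
#include <limits>

typedef std::int32_t raw_t;

// The reserved pattern: plays the role of IEEE 754's all-ones exponent.
const raw_t FP_NAN_RAW = std::numeric_limits<raw_t>::min();

struct fixed_t { raw_t raw; };

inline fixed_t fp_nan()          { fixed_t r = { FP_NAN_RAW }; return r; }
inline bool    isnan(fixed_t x)  { return x.raw == FP_NAN_RAW; }

// Every operation checks its inputs and keeps results away from the
// reserved pattern (a real implementation would saturate or signal).
inline fixed_t fp_add(fixed_t a, fixed_t b)
{
    if (isnan(a) || isnan(b)) return fp_nan();
    std::int64_t sum = static_cast<std::int64_t>(a.raw) + b.raw;
    if (sum <= FP_NAN_RAW) sum = FP_NAN_RAW + 1;                            // clamp away from the NaN pattern
    if (sum > std::numeric_limits<raw_t>::max()) sum = std::numeric_limits<raw_t>::max();
    fixed_t r = { static_cast<raw_t>(sum) };
    return r;
}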
Related
This question requires a bit of context - if you're feeling impatient, skip past the line break... I have a Vector-3,4 and Matrix-3,4 library defined in terms of template specializations; i.e., Vector<n> and Matrix<n> are defined in Matrix.hh, while non-trivial implementations (e.g., matrix multiplication, matrix inverse) have explicit specializations or instantiations in Matrix.cc for N = {3,4}.
This approach has worked well. In theory, an app could instantiate a Matrix<100>, but it couldn't multiply or invert the matrix, as there are no implementation templates visible in the header; only N = {3,4} are instantiated in Matrix.cc.
Recently, I've been adding robust methods to complement any operation that involves an inner product - including matrix multiplications, matrix transforms of vectors, etc. Most 3D transforms (projections / orientations) are relatively well-conditioned, and any minor precision errors are not a problem since shared vertices / edges yield a consistent rasterization.
There are some operations that must be numerically robust. I can't do anything about how a GPU does dot products and matrix operations when rendering; but I cannot have control / camera parameters choke on valid geometry - and inner products are notorious for pathological cancellation errors, so the robust methods use compensated summation, products, dot products, etc.
This works fine for, say, the Vector inner product in Matrix.hh:
////////////////////////////////////////////////////////////////////////////////
//
// inner product:
template <int n> float
inner (const GL0::Vector<n> & v0, const GL0::Vector<n> & v1)
{
    float r = v0[0] * v1[0];
    for (int i = 1; i < n; i++)
        r += v0[i] * v1[i];
    return r; // the running sum for the inner product.
}
float
robust_inner (const GL0::Vector<3> &, const GL0::Vector<3> &);
float
robust_inner (const GL0::Vector<4> &, const GL0::Vector<4> &);
////////////////////////////////////////////////////////////////////////////////
The implementations in Matrix.cc are not trivial.
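For context, one plausible shape for such a robust inner product is compensated (Kahan) summation of the running sum. This is only a sketch, with plain float arrays standing in for GL0::Vector<n>; it is not the poster's actual Matrix.cc code, and only the summation (not the individual products) is compensated:
// Kahan-compensated inner product: a correction term recovers the
// low-order bits lost by each addition to the running sum.
template <int n>
float robust_inner_sketch(const float (&v0)[n], const float (&v1)[n])
{
    float sum = 0.0f;
    float c   = 0.0f;                  // running compensation
    for (int i = 0; i < n; i++) {
        float y = v0[i] * v1[i] - c;   // subtract the error captured so far
        float t = sum + y;             // low-order bits of y may be lost here...
        c = (t - sum) - y;             // ...and are recovered into c
        sum = t;
    }
    return sum;
}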
I'm in more dubious territory when adding a robust method for [A]<-[A][B] matrix multiplication; perhaps the naming is not ideal:
template <int n> GL0::Matrix<n> &
operator *= (GL0::Matrix<n> & m0, const GL0::Matrix<n> & m1);
// (external instantiation)
GL0::Matrix<3> &
robust_multiply (GL0::Matrix<3> &, const GL0::Matrix<3> &);
GL0::Matrix<4> &
robust_multiply (GL0::Matrix<4> &, const GL0::Matrix<4> &);
There is a N = {3,4} implementation for the operator *= in Matrix.cc, but it relies on the naive inner product and is not robust - though typically good enough for GL / visualization. The robust_multiply functions are also implemented in Matrix.cc.
Now of course, I want the Matrix multiplication operator:
template <int n> GL0::Matrix<n>
operator * (GL0::Matrix<n> m0, const GL0::Matrix<n> & m1) {
return (m0 *= m1);
}
Leading me to the problematic definitions:
inline GL0::Matrix<3>
robust_multiply (GL0::Matrix<3> m0, const GL0::Matrix<3> & m1) {
return robust_multiply(m0, m1);
}
inline GL0::Matrix<4>
robust_multiply (GL0::Matrix<4> m0, const GL0::Matrix<4> & m1) {
return robust_multiply(m0, m1);
}
The call to robust_multiply(m0, m1) is ambiguous. Q: How can I force the LHS argument to be interpreted as a reference, ensuring a call to the previous function that modifies the (m0) argument? Obviously I could rename robust_multiply to something else, but I'm more interested in utilizing the type system. I feel I'm missing something obvious in <utility> or <functional>. How do I force a call to the correct function?
(Sorry about the word count - I'm trying to clarify my own thinking as I write)
You named robust_multiply wrong.
*= and * are fundamentally different operations. They are related, but not the same operation - different verbs.
Overloading should be used when you are doing the same operation on different nouns.
If you do that, then your problems almost certainly evaporate. Sensible overloads are easy to write.
In your case, you want to change between writing to an argument or not based on its l/r value category. That leads to ambiguity problems.
I mean, there are workarounds to your problem -- use std::ref or pointers, for example, or &, && and const& overloads -- but they are patches here.
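For what it's worth, here is a minimal sketch of the & / && overload workaround, with a stand-in Matrix3 type in place of GL0::Matrix<3>:
#include <utility>

struct Matrix3 { float m[9]; };   // stand-in for GL0::Matrix<3>

// In-place version: binds only to lvalues.
Matrix3& robust_multiply(Matrix3& m0, const Matrix3& m1)
{
    (void)m1;   // ... compensated inner products writing into m0 ...
    return m0;
}

// Value-returning version: binds only to rvalues, so the two overloads
// never compete for the same argument.
Matrix3 robust_multiply(Matrix3&& m0, const Matrix3& m1)
{
    robust_multiply(m0, m1);          // m0 is an lvalue here: calls the overload above
    return std::move(m0);
}

int main()
{
    Matrix3 a = {}, b = {};
    robust_multiply(a, b);                        // picks the in-place overload
    Matrix3 c = robust_multiply(Matrix3(a), b);   // picks the rvalue overload
    (void)c;
    return 0;
}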
Naming things in programming is hard. And here is a case where you should do that hard bit.
...
Now one thing you could do is bless the arguments.
// (requires <utility> for std::forward)
template <class T>
struct robust {
    T t;

    explicit operator T&() &  { return t; }
    explicit operator T()  && { return std::forward<T>(t); }
    // also get() methods

    explicit robust(T&& tin) : t(std::forward<T>(tin)) { }
};
Then overload *= and * for robust-wrapped matrices:
robust{a}*=b;
(Requiring the LHS to be robust keeps the overload count down.)
Now the verb is clear, I just dressed up the nouns.
But this is just an idea, and not use-tested.
As seen in this question, there is a difference between the results MKL gives, between serial and distributed execution. For that reason, I would like to study that error. From my book I have:
|ε_(x_c)| = |x - x_c| <= 1/2 * 10^(-d), where d specifies the number of decimal digits that are accurate between the actual number x and the number the computer holds, x_c.
|ρ_(x_c)| = |x - x_c| / |x| <= 5 * 10^(-s) is the absolute relative error, where s specifies the number of significant digits.
So, we can write code like this:
#include <cmath>

double calc_error(double a, double x)
{
    return std::abs(x - a) / std::abs(a);
}
in order to compute the absolute relative error, for example, as seen here.
Are there more types of errors to study, except from the absolute error and the absolute relative error?
Here are some of my data to play with:
serial gives:
-250207683.634793 -1353198687.861288 2816966067.598196 -144344843844.616425 323890119928.788757
distributed gives:
-250207683.634692 -1353198687.861386 2816966067.598891 -144344843844.617096 323890119928.788757
and then I can expand the idea(s) to the actual data and results.
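For example, a small program that computes both error measures element-wise for the two result vectors above, treating the serial results as the reference values x:
#include <cmath>
#include <cstdio>

int main()
{
    const double serial[] = {
        -250207683.634793, -1353198687.861288, 2816966067.598196,
        -144344843844.616425, 323890119928.788757
    };
    const double distributed[] = {
        -250207683.634692, -1353198687.861386, 2816966067.598891,
        -144344843844.617096, 323890119928.788757
    };

    for (int i = 0; i < 5; ++i) {
        const double abs_err = std::abs(serial[i] - distributed[i]);
        const double rel_err = abs_err / std::abs(serial[i]);
        std::printf("%d: absolute error = %.6e, relative error = %.6e\n", i, abs_err, rel_err);
    }
    return 0;
}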
It doesn't get much more complicated than absolute and absolute relative errors. There is another method that compares integer-representations of floating-point formats, the idea being that you want your "tolerance" to adapt with the magnitude of the numbers you are comparing (specifically because there aren't "as many" numbers representable depending on the magnitude).
All in all, I think your question is very similar to floating-point comparison, for which there is this excellent guide, and this more exhaustive but much longer paper.
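As a rough illustration of the integer-representation idea (restricted to finite doubles of the same sign; the linked guide covers the general sign handling):
#include <cstdint>
#include <cstring>

// Number of representable doubles between a and b, assuming both are
// finite, not NaN, and of the same sign; adjacent doubles differ by 1.
std::int64_t ulp_distance_same_sign(double a, double b)
{
    std::int64_t ia, ib;
    std::memcpy(&ia, &a, sizeof a);   // reinterpret the bit patterns as integers
    std::memcpy(&ib, &b, sizeof b);
    return ia > ib ? ia - ib : ib - ia;
}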
It might also be worth throwing in these for comparing floating point values:
#include <limits>
#include <cmath>
#include <algorithm>   // std::min, std::max

template <class T>
struct fp_equal_strict
{
    inline bool operator() ( const T& a, const T& b )
    {
        return std::abs(a - b)
            <= std::max(
                std::numeric_limits<T>::min() * std::min( std::abs(a), std::abs(b) ),
                std::numeric_limits<T>::epsilon()
            );
    }
};

template <class T>
struct fp_equal_loose
{
    inline bool operator() ( const T& a, const T& b )
    {
        return std::abs(a - b)
            <= std::max(
                std::numeric_limits<T>::min() * std::max( std::abs(a), std::abs(b) ),
                std::numeric_limits<T>::epsilon()
            );
    }
};

template <class T>
struct fp_greater
{
    inline bool operator() ( const T& a, const T& b )
    {
        return (a - b) >= std::numeric_limits<T>::min() * std::max( std::abs(a), std::abs(b) );
    }
};

template <class T>
struct fp_lesser
{
    inline bool operator() ( const T& a, const T& b )
    {
        return (b - a) >= std::numeric_limits<T>::min() * std::max( std::abs(a), std::abs(b) );
    }
};
I would mention that it is also possible to perform an ULP (Units in the Last Place) comparison, which shows how far apart two floating-point numbers are in the binary representation. This is a nice indication of "closeness", since if two numbers are, for example, one ULP apart, it means there is no floating-point number between them, so they are as close as possible in the binary representation without actually being equal.
This method is described here, which is a more recent version of the article linked from the accepted answer, by the same author. Sample code is also provided.
As an aside, but related to the context of your work (comparing sequential vs parallel floating point computations) it is important to note that floating point operations are not associative which means parallel implementations may not in general give the same result as sequential implementations. Even changing the compiler and optimisation options can actually lead to different results (e.g. GCC vs ICC, -O0 vs -O3).
An example algorithm for reducing the error when summing floating-point numbers can be found here, and a comprehensive document by the author of that algorithm can be found here.
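A two-line demonstration of that non-associativity, which is exactly the kind of effect a different summation order in a distributed run can produce:
#include <cstdio>

int main()
{
    const double a = 1e16, b = -1e16, c = 1.0;
    std::printf("(a + b) + c = %.1f\n", (a + b) + c);   // prints 1.0
    std::printf("a + (b + c) = %.1f\n", a + (b + c));   // prints 0.0: c is absorbed by the magnitude of b
    return 0;
}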
This question already has answers here:
What is the best way to indicate that a double value has not been initialized?
(9 answers)
Closed 10 years ago.
I'm in a situation where I have a std::vector<double> but I want some of those doubles to be "nothing"/"non-existent". How is this done in C++? We can safely assume that all "normal" doubles are not negative for my purposes.
Should I let -1 (or some negative) denote "nothing"? That doesn't sound very elegant.
Should I create a Double class with a "nothing" bool member? That could work but seems rather lengthy and ugly.
Should I create a Double class and create a "NoDouble : public Double" subclass? That sounds even worse.
Any ideas would be appreciated.
If you have IEEE floating-point arithmetic then use std::numeric_limits<double>::quiet_NaN() as the value for "nothing". To check whether d is "nothing", use isnan(d); also, d != d is true only when d is NaN. The problem with NaN is that you may also get it from defective calculations, like dividing zero by zero or taking the square root of a negative number, and any calculation involving a NaN also results in NaN.
If you happen to use Boost you may use boost::optional<double>, which adds another level of "not available" alongside NaN. Then you have two bad states: invalid number and missing number. Boost contains a lot of useful libraries, so it is a worthwhile tool anyway.
If you need several possible reasons attached for why a value is "nothing", then use a special Fallible class instead of double. Fallible was invented by Barton and Nackman, the authors of the highly acclaimed book "Scientific and Engineering C++".
You mentioned that there may not be negative numbers. In that case, wrap the double in a class; what you have is then not technically a normal double, so your class can add restrictions to it.
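A minimal sketch of the quiet_NaN approach:
#include <cmath>
#include <iostream>
#include <limits>
#include <vector>

int main()
{
    std::vector<double> v(3, 1.5);
    v[1] = std::numeric_limits<double>::quiet_NaN();   // slot 1 holds "nothing"

    for (std::size_t i = 0; i < v.size(); ++i) {
        if (std::isnan(v[i]))
            std::cout << i << ": nothing\n";
        else
            std::cout << i << ": " << v[i] << "\n";
    }
    return 0;
}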
You could use std::vector<double *>. A NULL pointer would indicate an empty slot or value.
What you want to do is keep the vector the same, but also use a vector of bool, and then wrap both in a class.
On a decent compiler std::vector<bool> is specialised to about 1 bit per element, so that should take care of the space concern.
#include <vector>

class MyNullable {
public:
    double value;
    bool is_null;
};

class NullableDoubles {
public:
    std::vector<double> values;
    std::vector<bool> nulls;

    void push_back(double d, bool is_null) {
        values.push_back(d);
        nulls.push_back(is_null);
    }
    MyNullable GetValue(int index) {
        MyNullable result;
        result.value = values[index];
        result.is_null = nulls[index];
        return result;
    }
    bool IsNull(int index) { return nulls[index]; }
    void MakeNull(int index) { nulls[index] = true; }
};
And I am sure you can see the value (no pun intended) of wrapping that up in a template or two and then making nullable lists of anything.
template <class T>
struct MyNullableT {
    T value;
    bool is_null;
};

template <class T>
class NullablesClass {
public:
    std::vector<T> values;
    std::vector<bool> nulls;

    void push_back(T d, bool is_null) {
        values.push_back(d);
        nulls.push_back(is_null);
    }
    MyNullableT<T> GetNullable(int index) {
        MyNullableT<T> result;
        result.value = values[index];
        result.is_null = nulls[index];
        return result;
    }
    bool IsNull(int index) { return nulls[index]; }
    void MakeNull(int index) { nulls[index] = true; }
    T GetValue(int index) { return values[index]; }
};
I hope that can do. It seems like the best way to use every possible double value and still know whether an entry is null, while using the least memory and the best memory alignment. std::vector<bool> is a specialisation in the C++ standard library, so you really should get only 1 bit per bool.
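Usage might then look something like this (relying on the NullableDoubles and MyNullable definitions above):
int main()
{
    NullableDoubles v;
    v.push_back(3.14, false);   // a real value
    v.push_back(0.0, true);     // a "nothing" slot

    if (!v.IsNull(0)) {
        MyNullable x = v.GetValue(0);
        // use x.value here
    }
    v.MakeNull(0);              // mark slot 0 as null afterwards
    return 0;
}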
I need a bitset with slightly different behavior when assigning a variable of integer type to a specific bit: the bit should be set to zero if the assigned integer is smaller than one, and to one otherwise.
As a simple solution I copied the STL bitset, replaced the class name with altbitset, adjusted namespaces and the include guard, and added the following function after reference& operator=(bool __x) in the nested reference class:
template <typename T>
reference& operator=(T i) {
    if (i < 1) return operator=(false);
    return operator=(true);
}
It works as expected.
My question is whether there is a better way of doing this.
You shouldn't copy a library just to add a new function. Not only that, the new function is wildly unintuitive and could easily be a source of errors even when just reading the code, let alone writing it.
Before:
bv[n] = -1; // I know a Boolean conversion on -1 will take place
assert(bv[n]); // of course, since -1 as a Boolean is true
After:
bv[n] = -1; // I guess an integer < 1 means false?
assert(bv[n]); // Who changed my bitvector semantics?!
Just write it out so it makes sense in your domain:
bv[n] = (i >= 1);
Remember: simplest doesn't always mean fewest characters, it means clearest to read.
If you do want to extend the functionality of existing types, you should do so with free functions:
template <typename BitSet, typename Integer>
auto assign_bit_integer(BitSet& bits, const std::size_t bit, const Integer integer) ->
    typename std::enable_if<std::is_integral<Integer>::value,
                            typename BitSet::reference>::type
{
    return bits[bit] = (integer >= 1);
}
Giving:
std::bitset<8> bits;
assign_bit_integer(bits, 0, 5);
// ERROR: assign_bit_integer(bits, 0, 5.5);
But for such a small function with no clear "obvious" name that describes what it does concisely (assign_bit_false_if_less_than_one_otherwise_true is verbose, to say the least), just write out the code; it says the same thing anyway.
Basically I want to restrict variables to the values 0, 1 or 2.
I have tried doing this with the following:
enum Value
{
0,
1,
2
};
Value var;
But this is a compile error because the enum values are unlabelled. It just makes the code less readable to assign names like "ZERO", "ONE" and "TWO" rather than referring to the values as 0, 1 and 2. Is there any way around this or should I just get rid of the enum and enforce the rule elsewhere?
If you want to use an enum, then you need to name the values. Since you're just working with integer values, and you apparently want them to actually represent integer values, your best bet is to use an int parameter and do a quick range check at the top of the method. A comment on the method specifying this constraint would be welcome.
Note that if your values actually correspond to non-numeric settings, then you should just come up with good names and use the enum.
Just because you add identifiers for the values doesn't mean you have to use them... you can use Value(0), Value(2) etc. if that's more convenient, but there is a danger: enum doesn't restrict the value stored to those listed... e.g. it won't protect you against Value(3).
Inside structs/classes you can use bit fields to restrict the storage used for numbers, but even then:
- the range has to correspond to either the signed or unsigned values possible in the number of bits requested
- attempts to assign other values will result in high order bits being removed rather than any kind of compile- or run-time error
If your intention is to create a distinct type that enforces the restricted values 0 through 2, then you need a class with a specialised constructor and assignment operators:
#include <stdexcept>   // std::runtime_error

template <int MIN, int MAX>
class Bound
{
public:
    explicit Bound(int n) { *this = n; }

    Bound& operator=(int n)
    {
        if (n < MIN or n > MAX)
            throw std::runtime_error("out of bounds");
        n_ = n;
        return *this;
    }

    Bound& operator+=(int n) { *this = n_ + n; return *this; }

    // should "+" return int or Bound? entirely usage dependent...
    Bound operator+(int n) { return Bound(n_ + n); }

    // -=, -, *=, *, /=, /, %=, %, bitwise ops, pre/post ++/-- etc...

    operator int() const { return n_; }

private:
    int n_;
};
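Usage might then look like this (the exception message matches the operator= above):
#include <iostream>
#include <stdexcept>

int main()
{
    try {
        Bound<0, 2> var(1);   // OK
        var = 2;              // OK
        var += 1;             // 3 is out of range: throws
    } catch (const std::runtime_error& e) {
        std::cout << e.what() << "\n";   // prints "out of bounds"
    }
    return 0;
}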
You are looking for the builtin int type, AFAICT
If you really want to behave like a Java programmer and use ADTs religiously, you can always write:
template <typename ordinal = int>
struct Value
{
    ordinal _val;

    /*implicit*/ Value(ordinal val) : _val(val) {}
    /*implicit*/ operator const ordinal&() const { return _val; }
    /*implicit*/ operator ordinal&() { return _val; }
};

int main()
{
    Value<> x = 3;
    int y = x;
    x = y;
    x += 17;
    x++;
    return x;
}
This will return 21 (the value starts at 3, += 17 makes it 20, and ++ makes it 21).
Of course, it is entirely possible to make Value<> less generic and more useful in many ways, but you didn't really tell us much about what you want from it.