Representing nothing in C++ [duplicate]

This question already has answers here:
What is the best way to indicate that a double value has not been initialized?
I'm in a situation where I have a std::vector<double> but I want some of those doubles to be "nothing"/"non-existent". How is this done in C++? We can safely assume that all "normal" doubles are not negative for my purposes.
Should I let -1 (or some negative) denote "nothing"? That doesn't sound very elegant.
Should I create a Double class with a "nothing" bool member? That could work but seems rather lengthy and ugly.
Should I create a Double class and create a "NoDouble : public Double" subclass? That sounds even worse.
Any ideas would be appreciated.

If you have IEEE floating-point arithmetic, use std::numeric_limits<double>::quiet_NaN() as the value for "nothing". To check whether d is "nothing", use std::isnan(d) from <cmath>. Also, d != d is true only when d is NaN. The problem with NaN is that you can also get it from faulty calculations, such as dividing zero by zero or taking the square root of a negative number, and any calculation involving NaN also yields NaN, so a missing value and a computational error look the same.
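A minimal sketch of the NaN approach, assuming IEEE 754 doubles (checked below with std::numeric_limits<double>::is_iec559). The names kNothing, isNothing and sumPresent are illustrative, not from the answer.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// A quiet NaN only exists if doubles are IEEE 754.
static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 doubles required");

// Hypothetical helper: a quiet NaN stands for "nothing".
const double kNothing = std::numeric_limits<double>::quiet_NaN();

inline bool isNothing(double d) { return std::isnan(d); }

// Hypothetical helper: sums only the values that are present.
inline double sumPresent(const std::vector<double>& v) {
    double total = 0.0;
    for (double d : v)
        if (!isNothing(d)) total += d;  // skip the "nothing" slots
    return total;
}
```

Keep the caveat above in mind: a NaN produced by 0.0 / 0.0 would also be treated as "nothing" here.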
If you happen to use Boost, you can use boost::optional<double>, which adds a second kind of unavailability alongside NaN. Then you have two distinct bad states: an invalid number (NaN) and a missing number (an empty optional). Boost contains many useful libraries, so it is a worthwhile tool anyway.
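boost::optional was later standardized as std::optional in C++17. The sketch below uses the standard type (substitute boost::optional<double> on an older compiler); MaybeDouble and average are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// An empty optional means "missing"; a NaN inside an engaged optional
// would still mean "invalid number", giving the two separate bad states.
using MaybeDouble = std::optional<double>;

// Hypothetical helper: averages the present values, or returns an empty
// optional when none are present.
inline MaybeDouble average(const std::vector<MaybeDouble>& v) {
    double total = 0.0;
    int count = 0;
    for (const MaybeDouble& d : v) {
        if (d) {             // engaged: a value is present
            total += *d;
            ++count;
        }
    }
    if (count == 0) return std::nullopt;
    return total / count;
}
```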
If you need to record one of several possible reasons why a value is "nothing", use a special Fallible class instead of double. Fallible was invented by Barton and Nackman, the authors of the highly acclaimed book "Scientific and Engineering C++".
You mentioned that there may not be negative numbers. In that case, wrap the double in a class: what you have is not technically a normal double, so your class can enforce those limitations.
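A sketch of such a wrapper, assuming (as in the question) that every real value is non-negative: the -1 sentinel idea survives, but only as a private detail behind a checked interface. The class name NonNegativeOrMissing is hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical wrapper: holds a non-negative double, or "missing".
// The negative sentinel never escapes the class, so callers cannot
// mistake it for a real value.
class NonNegativeOrMissing {
public:
    NonNegativeOrMissing() : value_(kMissing) {}  // missing by default
    explicit NonNegativeOrMissing(double v) : value_(v) {
        // !(v >= 0.0) also rejects NaN.
        if (!(v >= 0.0)) throw std::invalid_argument("value must be non-negative");
    }

    bool missing() const { return value_ < 0.0; }
    double get() const {
        assert(!missing() && "get() called on a missing value");
        return value_;
    }

private:
    static constexpr double kMissing = -1.0;
    double value_;
};
```

A std::vector<NonNegativeOrMissing> then default-constructs every slot as missing, and callers cannot read the sentinel by accident.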

You could use std::vector<double *>. A NULL pointer would indicate an empty slot or value.

What you want to do is keep the vector of doubles as it is, add a parallel vector of bool, and wrap both in a class.
std::vector<bool> is a space-optimized specialization that typically packs one bit per element, so the null flags shouldn't add much memory.
#include <cstddef>
#include <vector>

class MyNullable {
public:
    double value;
    bool is_null;
};
class NullableDoubles {
public:
    std::vector<double> values;
    std::vector<bool> nulls;  // true means "this slot is null"
    void push_back(double d, bool is_null) {
        values.push_back(d);
        nulls.push_back(is_null);
    }
    MyNullable GetValue(std::size_t index) const {
        MyNullable result;
        result.value = values[index];
        result.is_null = nulls[index];
        return result;
    }
    bool IsNull(std::size_t index) const { return nulls[index]; }
    // Marks the slot as null: the flag must be set to true, and nothing is returned.
    void MakeNull(std::size_t index) { nulls[index] = true; }
};
And I'm sure you can see the value (no pun intended) of wrapping that up in a template or two and then making nullable lists of anything.
template <class T>
struct MyNullableT {
    T value;
    bool is_null;
};
template <class T>
class NullablesClass {
public:
    std::vector<T> values;
    std::vector<bool> nulls;
    void push_back(const T& d, bool is_null) {
        values.push_back(d);
        nulls.push_back(is_null);
    }
    MyNullableT<T> GetValue(std::size_t index) const {
        MyNullableT<T> result;
        result.value = values[index];
        result.is_null = nulls[index];
        return result;
    }
    bool IsNull(std::size_t index) const { return nulls[index]; }
    void MakeNull(std::size_t index) { nulls[index] = true; }
    // Renamed from a second GetValue: C++ cannot overload on return type alone.
    T GetRawValue(std::size_t index) const { return values[index]; }
};
I hope that helps. This seems like the best way to be able to use all possible double values and still know whether each one is NULL, while using little extra memory and keeping the doubles contiguous. std::vector<bool> is a template specialization in the C++ standard library, so you should really only get about 1 bit per bool.
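A compact sketch of the same parallel-vector idea with the null flag handled inside the class, plus a usage check. NullableVector, push_null and value are hypothetical names, not part of the answer's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical nullable container: values and null flags in parallel vectors.
template <class T>
class NullableVector {
public:
    void push_back(const T& v) { values_.push_back(v); nulls_.push_back(false); }
    void push_null() { values_.push_back(T()); nulls_.push_back(true); }  // T() fills the unused slot

    std::size_t size() const { return values_.size(); }
    bool is_null(std::size_t i) const { return nulls_[i]; }
    void make_null(std::size_t i) { nulls_[i] = true; }

    // Precondition: the slot is not null.
    const T& value(std::size_t i) const {
        assert(!nulls_[i] && "value() called on a null slot");
        return values_[i];
    }

private:
    std::vector<T> values_;
    std::vector<bool> nulls_;  // the space-optimized std::vector<bool> specialization
};
```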

Related

C++ How to create an array of template class? [duplicate]

I need to parse and store a somewhat (but not too) complex stream and need to store the parsed result somehow. The stream essentially contains name-value pairs with values possibly being of different type for different names. Basically, I end up with a map of key (always string) to a pair <type, value>.
I started with something like this:
typedef enum ValidType {STRING, INT, FLOAT, BINARY} ValidType;
map<string, pair<ValidType, void*>> Data;
However I really dislike void* and storing pointers. Of course, I can always store the value as binary data (vector<char> for example), in which case the map would end up being
map<string, pair<ValidType, vector<char>>> Data;
Yet, in this case I would have to parse the binary data every time I need the actual value, which would be quite expensive in terms of performance.
Considering that I am not too worried about memory footprint (the amount of data is not large), but I am concerned about performance, what would be the right way to store such data?
Ideally, I'd like to avoid using boost, as that would increase the size of the final app by a factor of 3 if not more and I need to minimise that.
You're looking for a discriminated (or tagged) union.
Boost.Variant is one example, and Boost.Any is another. Are you so sure Boost will increase your final app size by a factor of 3? I would have thought variant was header-only, in which case you don't need to link any libraries.
If you really can't use Boost, implementing a simple discriminated union isn't so hard (a general and fully-correct one is another matter), and at least you know what to search for now.
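Since C++17 the standard library has std::variant, modeled on Boost.Variant, so no Boost is needed on a recent compiler. A sketch of the question's map using it, assuming BINARY maps to std::vector<char> as in the question; the aliases Value and Data and the helper typeName are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <variant>
#include <vector>

// The variant's active index plays the role of the ValidType tag.
using Value = std::variant<std::string, int, float, std::vector<char>>;
using Data = std::map<std::string, Value>;

// Hypothetical helper: dispatches on the stored type with std::visit.
inline std::string typeName(const Value& v) {
    struct Namer {
        std::string operator()(const std::string&) const { return "STRING"; }
        std::string operator()(int) const { return "INT"; }
        std::string operator()(float) const { return "FLOAT"; }
        std::string operator()(const std::vector<char>&) const { return "BINARY"; }
    };
    return std::visit(Namer{}, v);
}
```

The variant's own index replaces the separate ValidType tag, so the tag and the value cannot disagree, and values are stored already parsed.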
For completeness, a naive discriminated union might look like:
class DU
{
public:
enum TypeTag { None, Int, Double };
class DUTypeError {};
private:
TypeTag type_;
union {
int i;
double d;
} data_;
void typecheck(TypeTag tt) const { if(type_ != tt) throw DUTypeError(); }
public:
DU() : type_(None) {}
DU(DU const &other) : type_(other.type_), data_(other.data_) {}
DU& operator= (DU const &other) {
type_=other.type_; data_=other.data_; return *this;
}
TypeTag type() const { return type_; }
bool istype(TypeTag tt) const { return type_ == tt; }
#define CONVERSIONS(TYPE, ENUM, MEMBER) \
explicit DU(TYPE val) : type_(ENUM) { data_.MEMBER = val; } \
operator TYPE & () { typecheck(ENUM); return data_.MEMBER; } \
operator TYPE const & () const { typecheck(ENUM); return data_.MEMBER; } \
DU& operator=(TYPE val) { type_ = ENUM; data_.MEMBER = val; return *this; }
CONVERSIONS(int, Int, i)
CONVERSIONS(double, Double, d)
};
Now, there are several drawbacks:
- you can't store non-POD types in the union (before C++11; since then it's allowed, but you must manage construction and destruction by hand)
- adding a type means modifying the enum and the union, and remembering to add a new CONVERSIONS line (it would be even worse without the macro)
- you can't use the visitor pattern with this (or you'd have to write your own dispatcher for it), which means lots of switch statements in the client code
- every one of these switches may also need updating if you add a type
- if you did write a visitor dispatch, that needs updating if you add a type, and so may every visitor
- you need to manually reproduce something like the built-in C++ type-conversion rules if you want to do anything like arithmetic with these (i.e., operator double could promote an Int instead of only handling Double ... but only if you hand-roll every operator)
- I haven't implemented operator== precisely because it needs a switch. You can't just memcmp the two unions if the types match, because identical 32-bit integers could still compare unequal if the extra bytes that only the double uses hold a different bit pattern.
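To illustrate the operator== point: compare the tags first, then only the active member, so the unused bytes are never read. This is a sketch on a stripped-down, hypothetical Tagged struct, not the DU class above.

```cpp
#include <cassert>

// Hypothetical stripped-down tagged union, mirroring DU's int/double case.
struct Tagged {
    enum TypeTag { None, Int, Double } type;
    union { int i; double d; } data;
};

// Tags first, then only the active member; a memcmp of the whole union
// could see different leftover bytes beyond the int.
inline bool operator==(const Tagged& a, const Tagged& b) {
    if (a.type != b.type) return false;
    switch (a.type) {
        case Tagged::None:   return true;
        case Tagged::Int:    return a.data.i == b.data.i;
        case Tagged::Double: return a.data.d == b.data.d;
    }
    return false;  // unreachable for valid tags
}

inline Tagged makeInt(int v) { Tagged t; t.type = Tagged::Int; t.data.i = v; return t; }
inline Tagged makeDouble(double v) { Tagged t; t.type = Tagged::Double; t.data.d = v; return t; }
```

Note that this switch is exactly the kind that has to be revisited whenever a type is added.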
Some of these issues can be addressed if you care about them, but it's all more work. Hence my preference for not re-inventing this particular wheel if it can be avoided.
Since your data types are fixed what about something like this...
Keep one std::vector per value type, and store in the map, as the second member of the pair, the index into the matching vector:
std::vector<int> vInt;
std::vector<float> vFloat;
// ... one vector per remaining type (strings, binary data)
map<std::string, std::pair<ValidType, int>> Data;
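A sketch of this layout with only the INT and FLOAT vectors filled in; Store, setInt, setFloat, getInt, getFloat and typeOf are hypothetical names, and the index is a std::size_t rather than an int.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum ValidType { STRING, INT, FLOAT, BINARY };

// One vector per value type; the map stores (type, index into that vector).
class Store {
public:
    void setInt(const std::string& key, int v) {
        vInt.push_back(v);
        data[key] = std::make_pair(INT, vInt.size() - 1);
    }
    void setFloat(const std::string& key, float v) {
        vFloat.push_back(v);
        data[key] = std::make_pair(FLOAT, vFloat.size() - 1);
    }
    int getInt(const std::string& key) const {
        const auto& entry = data.at(key);  // throws std::out_of_range if absent
        if (entry.first != INT) throw std::runtime_error("not an INT");
        return vInt[entry.second];
    }
    float getFloat(const std::string& key) const {
        const auto& entry = data.at(key);
        if (entry.first != FLOAT) throw std::runtime_error("not a FLOAT");
        return vFloat[entry.second];
    }
    ValidType typeOf(const std::string& key) const { return data.at(key).first; }

private:
    std::vector<int> vInt;
    std::vector<float> vFloat;
    std::map<std::string, std::pair<ValidType, std::size_t>> data;
};
```

One trade-off of this design: overwriting a key leaves the old slot in its vector unused.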
You can implement a multi-type map by leveraging std::tuple: since C++14, std::get<T> can access a tuple element by its type, so the type acts as a key. You can wrap this to add access by arbitrary keys. An in-depth explanation of this (and quite an interesting read) is available here:
https://jguegant.github.io/blogs/tech/thread-safe-multi-type-map.html
Modern C++ features provide great ways to solve old problems.
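A minimal, single-threaded sketch of the type-keyed idea (not the linked article's code): a std::tuple holds one std::map per value type, and std::get<T> selects the right one. MultiTypeMap, set and find are hypothetical names, and the value types must all be distinct.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical multi-type map: one std::map per value type, kept in a tuple.
// std::get by type requires every Ts to appear only once.
template <class... Ts>
class MultiTypeMap {
public:
    template <class T>
    void set(const std::string& key, const T& value) {
        std::get<std::map<std::string, T>>(maps_)[key] = value;
    }

    // Returns a pointer to the value, or nullptr if no T is stored under key.
    template <class T>
    const T* find(const std::string& key) const {
        const auto& m = std::get<std::map<std::string, T>>(maps_);
        auto it = m.find(key);
        return it == m.end() ? nullptr : &it->second;
    }

private:
    std::tuple<std::map<std::string, Ts>...> maps_;
};
```

Because each type has its own map, the same key can hold an int and a float at once; whether that is desirable depends on the stream format.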

How to iterate through variable members of a class C++

I'm currently trying to apply a complicated correction to a bunch of variables (based on normalizing in various phase spaces) for some data that I'm reading in. Since each correction follows the same process, I was wondering if there is any way to do this iteratively rather than handling each variable by itself (since I need to do this for about 18-20 variables). Can C++ handle this? Someone told me to try this in Python, but I feel like it could be done in C++ somehow... I'm just hitting a wall!
To give you an idea, given something like:
class VariableClass{
public :
//each object of this class represents an event for this particular data set
//containing the following variables
double x;
double y;
double z;
};
I want to do something along the lines of:
for (int i=0; i < num_variables; i++)
{
for (int j=0; j < num_events; j++)
{
//iterate through events
}
//correct variable here, then move on to next one
}
Thanks in advance for any advice!!!
I'm assuming your member variables will not all have the same type. Otherwise you can just throw them into a container. If you have C++11, one way you could solve this problem is a tuple. With some template metaprogramming you can simulate a loop over all elements of the tuple. The function std::tie will build a tuple with references to all of your members that you can "iterate" like this:
struct DoCorrection
{
template<typename T>
void operator()(T& t) const { /* code goes here */ }
};
for_each(std::tie(x, y, z), DoCorrection());
// see linked SO answer for the detailed code to make this special for_each work.
Then, you can specialize operator() for each member variable type. That will let you do the appropriate math automatically without manually keeping track of the types.
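The linked answer's for_each is not reproduced here. As one possible alternative, C++17's std::apply with a fold expression can visit each element of a std::tie tuple; for_each_member, Event and Scale are hypothetical names, and Event mixes double and int members as the answer assumes.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Calls f on every element of a tuple of references (e.g. from std::tie),
// in order, using a C++17 fold over the comma operator.
template <class Tuple, class F>
void for_each_member(Tuple&& t, F f) {
    std::apply([&f](auto&... elems) { (f(elems), ...); }, std::forward<Tuple>(t));
}

// Hypothetical event with mixed member types.
struct Event {
    double x;
    double y;
    int n;
};

// Hypothetical correction: one templated call operator handles every type.
struct Scale {
    double factor;
    template <class T>
    void operator()(T& v) const { v = static_cast<T>(v * factor); }
};
```

Usage: for_each_member(std::tie(e.x, e.y, e.n), Scale{2.0}) doubles all three members in place.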
Taken from GLM (detail vec3.incl):
template <typename T>
GLM_FUNC_QUALIFIER typename tvec3<T>::value_type &
tvec3<T>::operator[]
(
size_type i
)
{
assert(i < this->length());
return (&x)[i];
}
this would translate to your example:
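#include <cassert>
#include <iostream>
class VariableClass{
public :
//each object of this class represents an event for this particular data set
double x;
double y;
double z;
// Like GLM, this assumes x, y, z are contiguous with no padding;
// the standard does not guarantee that.
double & operator[](int i) {
assert(i >= 0 && i < 3);
return (&x)[i];
}
};
VariableClass foo; // not "foo()", which would declare a function
foo.x = 2.0;
std::cout << foo[0] << std::endl; // prints 2
Although I would recommend GLM if this is just about vector math.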
Yes, just put all your variables into a container, like std::vector, for example.
http://en.cppreference.com/w/cpp/container/vector
I recommend spending some time reading about all the std classes. There are many containers and many uses.
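A sketch of the container approach when all variables are doubles: a std::array indexed by an enum keeps readable names and lets you loop over variables, mirroring the question's outer loop over variables and inner loop over events. Var, Event and normalizeByMean are hypothetical, and dividing by the mean merely stands in for the real correction.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical variable names; kCount must stay last.
enum Var { X, Y, Z, kCount };

struct Event {
    std::array<double, kCount> v;  // v[X], v[Y], v[Z]
};

// Stand-in correction: divide each variable by its mean over all events.
inline void normalizeByMean(std::vector<Event>& events) {
    if (events.empty()) return;
    for (int var = 0; var < kCount; ++var) {              // loop over variables
        double sum = 0.0;
        for (const Event& e : events) sum += e.v[var];    // loop over events
        const double mean = sum / events.size();
        if (mean == 0.0) continue;                        // avoid dividing by zero
        for (Event& e : events) e.v[var] /= mean;
    }
}
```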
In general you cannot iterate over members without relying on implementation-defined details like padding, or the reordering of sections with different access specifiers (literally no compiler does the latter, but it is allowed).
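One well-defined way to loop over named members, sketched here as an alternative not mentioned in the answers, is a table of pointers to members. It does not depend on padding or member order, although the members still have to be listed once by hand. kMembers and shiftAll are hypothetical names.

```cpp
#include <cassert>

class VariableClass {
public:
    double x;
    double y;
    double z;
};

// The members to iterate over, listed once. Unlike (&x)[i], this is
// well defined regardless of padding or member order.
constexpr double VariableClass::* kMembers[] = {
    &VariableClass::x, &VariableClass::y, &VariableClass::z
};

// Hypothetical correction: shifts every listed member by the same offset.
inline void shiftAll(VariableClass& ev, double offset) {
    for (double VariableClass::* m : kMembers)
        ev.*m += offset;
}
```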
However, you can use the generalization of a record type: std::tuple. Iterating over a tuple isn't straightforward, but you will find plenty of code that does it. The worst part is losing the member names, which you can mimic with named accessor members.
If you use Boost, you can use Boost.Fusion's helper-macro BOOST_FUSION_ADAPT_STRUCT to turn a struct into a Fusion sequence and then you can use it with Fusion algorithms.

Cache gauss points for numerical integration in c++

This is a question on what people think the best way to lay out my class structure for my problem. I am doing some numerical analysis and require certain "elements" to be integrated. So I have created a class called "BoundaryElement" like so
class BoundaryElement
{
/* private members */
public:
double integrate(Point& pt); // return type not shown in the question; double assumed
};
The key function is 'integrate' which I need to evaluate for a whole variety of different points. What happens is that, depending on the point, I need to use a different number of integration points and weights, which are basically vectors of numbers. To find these, I have a class like so:
class GaussPtsWts
{
int numPts;
double* gaussPts;
double* gaussWts;
public:
GaussPtsWts(const int n);
GaussPtsWts(const GaussPtsWts& rhs);
~GaussPtsWts();
GaussPtsWts& operator=(const GaussPtsWts& rhs);
inline double gwt(const unsigned int i)
{
return gaussWts[i];
}
inline double gpt(const unsigned int i)
{
return gaussPts[i];
}
inline int numberGPs()
{
return numPts;
}
};
Using this, I could theoretically create a GaussPtsWts instance for every call to the integrate function. But I know that I may be using the same number of Gauss points many times, so I would like to cache this data. I'm not very confident about how this might be done: potentially a std::map that is a static member of the BoundaryElement class? If people could shed any light on this I would be very grateful. Thanks!
I had a similar issue once and used a map (as you suggested). What I would do is change the GaussPtsWts to contain the map:
typedef std::map<int, std::vector<std::pair<double, double>>> map_type;
Here I've taken your two arrays of the points and weights and put them into a single vector of pairs - which should apply if I remember my quadrature correctly. Feel free to make a small structure of the point and weight to make it more readable.
Then I'd create a single instance of the GaussPtsWts and store a reference to it in each BoundaryElement. Or perhaps a shared_ptr depending on how you like it. You'd also need to record how many points you are using.
When you ask for a weight, you might have something like this:
double gwt(const unsigned int numGPs, const unsigned int i)
{
map_type::const_iterator found = themap.find(numGPs);
if(found == themap.end())
calculatePoints(numGPs); // fills themap[numGPs]
// each pair is (point, weight), so the weight is .second
return themap[numGPs][i].second;
}
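A self-contained sketch of the cache: a std::map from the number of points to (point, weight) pairs, filled on first request. The Newton-iteration Gauss-Legendre routine is a standard textbook method standing in for the question's own computation, and GaussRuleCache, computeGaussLegendre and Rule are hypothetical names. References into a std::map stay valid when other entries are inserted, so callers can keep a returned rule.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef std::vector<std::pair<double, double>> Rule;  // (point, weight) on [-1, 1]

// Gauss-Legendre nodes and weights by Newton iteration on P_n(x).
inline Rule computeGaussLegendre(int n) {
    assert(n >= 1);
    const double pi = 3.14159265358979323846;
    Rule rule(n);
    for (int i = 0; i < (n + 1) / 2; ++i) {
        double z = std::cos(pi * (i + 0.75) / (n + 0.5));  // initial guess for a root
        double pp = 0.0;                                    // P_n'(z)
        for (int iter = 0; iter < 100; ++iter) {
            double p1 = 1.0, p2 = 0.0;
            for (int j = 1; j <= n; ++j) {                  // recurrence for P_j(z)
                double p3 = p2;
                p2 = p1;
                p1 = ((2.0 * j - 1.0) * z * p2 - (j - 1.0) * p3) / j;
            }
            pp = n * (z * p1 - p2) / (z * z - 1.0);
            double z1 = z;
            z = z1 - p1 / pp;                               // Newton step
            if (std::fabs(z - z1) < 1e-14) break;
        }
        double w = 2.0 / ((1.0 - z * z) * pp * pp);
        rule[i] = std::make_pair(-z, w);                    // roots are symmetric
        rule[n - 1 - i] = std::make_pair(z, w);
    }
    return rule;
}

// Computes each rule once and hands out references to the cached copy.
// Not thread-safe: guard get() with a mutex if several threads integrate.
class GaussRuleCache {
public:
    const Rule& get(int n) {
        auto it = cache_.find(n);
        if (it == cache_.end())
            it = cache_.emplace(n, computeGaussLegendre(n)).first;
        return it->second;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<int, Rule> cache_;
};
```

Each BoundaryElement could then hold a reference or shared_ptr to one shared GaussRuleCache, as the answer suggests.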
Alternatively you could mess around with templates with an integer parameter:
template <int N>
class GaussPtsWts...

C++ cast to array of a smaller size

Here's an interesting question about the various quirks of the C++ language. I have a pair of functions, which are supposed to fill an array of points with the corners of a rectangle. There are two overloads for it: one takes a Point[5], the other takes a Point[4]. The 5-point version refers to a closed polygon, whereas the 4-point version is when you just want the 4 corners, period.
Obviously there's some duplication of work here, so I'd like to be able to use the 4-point version to populate the first 4 points of the 5-point version, so I'm not duplicating that code. (Not that it's much to duplicate, but I have terrible allergic reactions whenever I copy and paste code, and I'd like to avoid that.)
The thing is, C++ doesn't seem to care for the idea of converting a T[m] to a T[n] where n < m. static_cast seems to think the types are incompatible for some reason. reinterpret_cast handles it fine, of course, but is a dangerous animal that, as a general rule, is better to avoid if at all possible.
So my question is: is there a type-safe way of casting an array of one size to an array of a smaller size where the array type is the same?
[Edit] Code, yes. I should have mentioned that the parameter is actually a reference to an array, not simply a pointer, so the compiler is aware of the type difference.
void RectToPointArray(const degRect& rect, degPoint(&points)[4])
{
points[0].lat = rect.nw.lat; points[0].lon = rect.nw.lon;
points[1].lat = rect.nw.lat; points[1].lon = rect.se.lon;
points[2].lat = rect.se.lat; points[2].lon = rect.se.lon;
points[3].lat = rect.se.lat; points[3].lon = rect.nw.lon;
}
void RectToPointArray(const degRect& rect, degPoint(&points)[5])
{
// I would like to use a more type-safe check here if possible:
RectToPointArray(rect, reinterpret_cast<degPoint(&)[4]> (points));
points[4].lat = rect.nw.lat; points[4].lon = rect.nw.lon;
}
[Edit2] The point of passing an array-by-reference is so that we can be at least vaguely sure that the caller is passing in a correct "out parameter".
I don't think it's a good idea to do this by overloading. The name of the function doesn't tell the caller whether it's going to fill an open array or not. And what if the caller has only a pointer and wants to fill coordinates (let's say he wants to fill multiple rectangles to be part of a bigger array at different offsets)?
I would do this with two functions that take pointers; the size isn't part of the pointer's type.
void fillOpenRect(degRect const& rect, degPoint *p) {
...
}
void fillClosedRect(degRect const& rect, degPoint *p) {
fillOpenRect(rect, p); p[4] = p[0];
}
I don't see what's wrong with this. Your reinterpret_cast should work fine in practice (I don't see what could go wrong: both alignment and representation will be correct, so the merely formal undefinedness shouldn't show up in practice, I think), but as I said above, I think there's no good reason to make these functions take the arrays by reference.
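A self-contained version of the pointer-based pair, with the elided body filled in from the corner assignments in the question's own RectToPointArray. The degPoint and degRect definitions are assumed minimal structs, since the question doesn't show them.

```cpp
#include <cassert>

// Assumed minimal definitions; the question does not show the real ones.
struct degPoint { double lat, lon; };
struct degRect { degPoint nw, se; };

// Writes the 4 corners to p[0..3]; the caller must provide room for 4.
inline void fillOpenRect(degRect const& rect, degPoint* p) {
    assert(p != nullptr);
    p[0].lat = rect.nw.lat; p[0].lon = rect.nw.lon;
    p[1].lat = rect.nw.lat; p[1].lon = rect.se.lon;
    p[2].lat = rect.se.lat; p[2].lon = rect.se.lon;
    p[3].lat = rect.se.lat; p[3].lon = rect.nw.lon;
}

// Writes the closed polygon to p[0..4]; the caller must provide room for 5.
inline void fillClosedRect(degRect const& rect, degPoint* p) {
    fillOpenRect(rect, p);
    p[4] = p[0];  // repeat the first corner to close the ring
}
```

Because the functions take plain pointers, filling a rectangle at an offset inside a bigger array (fillOpenRect(r, big + 4)) works without any cast.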
If you want to do it generically, you can write it by output iterators
template<typename OutputIterator>
OutputIterator fillOpenRect(degRect const& rect, OutputIterator out) {
// Use degPoint directly: for std::back_insert_iterator,
// std::iterator_traits<OutputIterator>::value_type is void.
degPoint pt[] = {
{ rect.nw.lat, rect.nw.lon },
{ rect.nw.lat, rect.se.lon },
{ rect.se.lat, rect.se.lon },
{ rect.se.lat, rect.nw.lon }
};
for(int i = 0; i < 4; i++)
*out++ = pt[i];
return out;
}
template<typename OutputIterator>
OutputIterator fillClosedRect(degRect const& rect, OutputIterator out) {
out = fillOpenRect(rect, out);
degPoint p1 = { rect.nw.lat, rect.nw.lon };
*out++ = p1;
return out;
}
You can then use it with vectors and also with arrays, whatever you prefer most.
std::vector<degPoint> points;
fillClosedRect(someRect, std::back_inserter(points));
degPoint pointArray[5];
fillClosedRect(someRect, pointArray);
If you want to write safer code, you can use the vector way with back-inserters, and if you work with lower level code, you can use a pointer as output iterator.
I would use std::vector. In some extreme cases you can even use plain arrays through a pointer like Point* (really not recommended), and then you won't run into such "casting" troubles.
Why don't you just pass a standard pointer, instead of a sized one, like this
void RectToPointArray(const degRect& rect, degPoint * points ) ;
I don't think your framing/thinking of the problem is correct. You don't generally need to concretely type an object that has 4 vertices vs an object that has 5.
But if you MUST type it, then you can use structs to concretely define the types instead.
struct Coord
{
float lat, lon ; // "long" is a keyword and can't be a member name
} ;
Then
struct Rectangle
{
Coord points[ 4 ] ;
} ;
struct Pentagon
{
Coord points[ 5 ] ;
} ;
Then,
// 4 pt version
void RectToPointArray(const degRect& rect, const Rectangle& rectangle ) ;
// 5 pt version
void RectToPointArray(const degRect& rect, const Pentagon& pent ) ;
I think this solution is a bit extreme, however; a std::vector<Coord> whose size you check with asserts (expecting 4 or 5) would do just fine.
I guess you could use function template specialization, like this (simplified example where first argument was ignored and function name was replaced by f(), etc.):
#include <iostream>
using namespace std;
class X
{
};
template<int sz, int n>
int f(X (&x)[sz])
{
cout<<"process "<<n<<" entries in a "<<sz<<"-element array"<<endl;
int partial_result=f<sz,n-1>(x);
cout<<"process last entry..."<<endl;
return n;
}
//template specialization for sz=5 and n=4 (number of entries to process)
template<>
int f<5,4>(X (&x)[5])
{
cout<<"process only the first "<<4<<" entries here..."<<endl;
return 4;
}
int main(void)
{
X u[5];
int res=f<5,5>(u);
return 0;
}
Of course, you would have to take care of other (potentially dangerous) special cases like n={0,1,2,3}, and you're probably better off using unsigned ints instead of ints.
"So my question is: is there a type-safe way of casting an array of one size to an array of a smaller size where the array type is the same?"
No. I don't think the language allows you to do this at all: consider casting int[10] to int[5]. You can always get a pointer to the array, but you can't 'trick' the compiler into thinking a fixed-size array has a different number of elements.
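One way to avoid the cast altogether, sketched here as an alternative not proposed in the answers, is to return std::array by value and build the 5-point version from the 4-point one. The sizes are part of the types, so a wrong size fails to compile. degPoint and degRect are assumed minimal structs; rectCorners and rectPolygon are hypothetical names.

```cpp
#include <array>
#include <cassert>

// Assumed minimal definitions; the question does not show the real ones.
struct degPoint { double lat, lon; };
struct degRect { degPoint nw, se; };

// The 4 corners, in the question's order.
inline std::array<degPoint, 4> rectCorners(const degRect& r) {
    return {{ {r.nw.lat, r.nw.lon}, {r.nw.lat, r.se.lon},
              {r.se.lat, r.se.lon}, {r.se.lat, r.nw.lon} }};
}

// The closed polygon reuses rectCorners, then repeats the first corner.
inline std::array<degPoint, 5> rectPolygon(const degRect& r) {
    const std::array<degPoint, 4> c = rectCorners(r);
    return {{ c[0], c[1], c[2], c[3], c[0] }};
}
```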
If you're not going to use std::vector or some other container which can properly identify the number of points inside at runtime and do this all conveniently in one function instead of two function overloads which get called based on the number of elements, rather than trying to do crazy casts, consider this at least as an improvement:
void RectToPointArray(const degRect& rect, degPoint* points, unsigned int size);
If you're set on working with arrays, you can still define a generic function like this:
template <class T, size_t N>
std::size_t array_size(const T(&/*array*/)[N])
{
return N;
}
... and use that when calling RectToPointArray to pass the argument for 'size'. Then you have a size you can determine at runtime and it's easy enough to work with size - 1, or more appropriate for this case, just put a simple if statement to check if there are 5 elements or 4.
Later, if you change your mind and use std::vector, Boost.Array, etc., you can still use this same function without modifying it. It only requires that the data is contiguous and mutable. You can get fancy and apply very generic solutions that, say, only require forward iterators. Yet I don't think this problem is complicated enough to warrant such a solution: it'd be like using a cannon to kill a fly; a fly swatter will do.
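A sketch of that single pointer-and-size function combined with array_size, assuming minimal degPoint and degRect structs since the question doesn't show their definitions.

```cpp
#include <cassert>
#include <cstddef>

// Assumed minimal definitions; the question does not show the real ones.
struct degPoint { double lat, lon; };
struct degRect { degPoint nw, se; };

template <class T, std::size_t N>
std::size_t array_size(const T (&)[N]) { return N; }

// Fills the 4 corners, plus a closing 5th point when size allows.
inline void RectToPointArray(const degRect& rect, degPoint* points, std::size_t size) {
    assert(points != nullptr && size >= 4);
    points[0].lat = rect.nw.lat; points[0].lon = rect.nw.lon;
    points[1].lat = rect.nw.lat; points[1].lon = rect.se.lon;
    points[2].lat = rect.se.lat; points[2].lon = rect.se.lon;
    points[3].lat = rect.se.lat; points[3].lon = rect.nw.lon;
    if (size >= 5)
        points[4] = points[0];
}
```

With a std::vector<degPoint> v, the same call is RectToPointArray(rect, v.data(), v.size()).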
If you're really set on the solution you have, then it's easy enough to do this:
template <size_t N>
void RectToPointArray(const degRect& rect, degPoint(&points)[N])
{
assert(N >= 4 && "points requires at least 4 elements!");
points[0].lat = rect.nw.lat; points[0].lon = rect.nw.lon;
points[1].lat = rect.nw.lat; points[1].lon = rect.se.lon;
points[2].lat = rect.se.lat; points[2].lon = rect.se.lon;
points[3].lat = rect.se.lat; points[3].lon = rect.nw.lon;
if (N >= 5) {
// braces needed: without them only the .lat assignment would be conditional
points[4].lat = rect.nw.lat; points[4].lon = rect.nw.lon;
}
}
Yeah, there is one unnecessary runtime check but trying to do it at compile time is probably analogous to taking things out of your glove compartment in an attempt to increase your car's fuel efficiency. With N being a compile-time constant expression, the compiler is likely to recognize that the condition is always false when N < 5 and just eliminate that whole section of code.
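On C++17 and later, the check can be made explicitly compile-time: static_assert rejects arrays shorter than 4, and if constexpr discards the 5th-point branch entirely when N is 4. A sketch, again assuming minimal degPoint and degRect structs.

```cpp
#include <cassert>
#include <cstddef>

// Assumed minimal definitions; the question does not show the real ones.
struct degPoint { double lat, lon; };
struct degRect { degPoint nw, se; };

template <std::size_t N>
void RectToPointArray(const degRect& rect, degPoint (&points)[N]) {
    static_assert(N >= 4, "points requires at least 4 elements");
    points[0].lat = rect.nw.lat; points[0].lon = rect.nw.lon;
    points[1].lat = rect.nw.lat; points[1].lon = rect.se.lon;
    points[2].lat = rect.se.lat; points[2].lon = rect.se.lon;
    points[3].lat = rect.se.lat; points[3].lon = rect.nw.lon;
    if constexpr (N >= 5) {  // compiled only when there is room for the closing point
        points[4] = points[0];
    }
}
```

Passing a degPoint[3] now fails at compile time instead of tripping an assert at run time.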