Union vs. static_cast(void*) - c++

I'm writing code and until now I was using structures like this:
struct s {
    enum Types { zero = 0, one, two };
    unsigned int type;
    void* data;
};
I needed a generic structure to store data from different classes, and I wanted to use it in a std::vector, which is why I can't use templates. Which is the better option: unions or void pointers?
A void pointer allocates only as much space as I need, but C++ is a strongly typed language for a reason, and casting everywhere I need to use the data is not how C++ code should be designed. From what I've read, void pointers shouldn't be used unless there's no alternative.
That alternative could be unions. They are part of C++ and use the same memory for every member, much like void pointers. However, they come at a price: the allocated space is the size of the largest member of the union, and in my case the differences between sizes are big.
This is mostly a question of style and correct use of the language, as both ways accomplish what I need, but I can't decide whether nicely styled C++ code is worth the wasted memory (even though memory isn't a big concern these days).

Consider boost::any or boost::variant if you want to store objects of heterogeneous types.
And before deciding which one to use, have a look at the comparison:
Boost.Variant vs. Boost.Any
Hopefully it will help you make the right decision. Choose one, then use any container from the standard library to store the objects: std::vector<boost::any>, std::vector<boost::variant>, or any other.

boost::variant.
Basically, it is a type-safe union, and in this case, it seems like unions are by far the most appropriate answer. A void* could be used, but that would mean dynamic allocation, and you would have to maintain the Types enum, and the table for casting.
Memory constraints could make void* an acceptable choice, but it's not the 'neat' answer, and I wouldn't go for it until both boost::variant and just a plain union have shown to be unacceptable.
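As a rough illustration of the type-safe union idea, here is a minimal sketch that uses std::variant, the C++17 standard counterpart of boost::variant (an assumption; the answers above refer to Boost). The payload types Small and Large and the helper names are hypothetical stand-ins for the asker's classes.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical payload types standing in for the asker's "different classes".
struct Small { int id; };
struct Large { double samples[64]; };

// One element type for the vector; the variant itself tracks the active alternative,
// so no separate Types enum or cast table has to be maintained by hand.
using Item = std::variant<Small, Large, std::string>;

// Returns the id stored in a Small, or -1 if the item holds something else.
inline int small_id_or_default(const Item& item) {
    if (const Small* s = std::get_if<Small>(&item)) {
        return s->id;
    }
    return -1;
}

// Builds a vector holding one value of each alternative.
inline std::vector<Item> make_items() {
    std::vector<Item> items;
    items.push_back(Small{7});
    items.push_back(Large{});
    items.push_back(std::string("text"));
    return items;
}
```

Note that sizeof(Item) is at least sizeof(Large), which is exactly the memory cost the question worries about; the trade is that no heap allocation or manual casting is needed.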

If your classes have enough in common to be put in the same container, give them a base class with a virtual destructor, and possibly a virtual member function that returns your type code. At that point, though, dynamic_cast would be more appropriate than a type code, and it would be reasonable to explore whether your classes have enough in common to give them a more complete common interface.
Otherwise consider providing a custom container class with appropriately typed data members to hold instances of all the different classes you need to put into it.
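The base-class suggestion might look roughly like the sketch below. The names Shape, Circle and Square are invented for illustration, and std::unique_ptr (C++11, with std::make_unique from C++14) is an assumption about how ownership would be handled.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical common interface; the real classes decide what belongs here.
class Shape {
public:
    virtual ~Shape() = default;            // lets the container delete through Shape*
    virtual std::string name() const = 0;  // plays the role of the "type code"
};

class Circle : public Shape {
public:
    std::string name() const override { return "circle"; }
};

class Square : public Shape {
public:
    std::string name() const override { return "square"; }
};

// Builds a container holding objects of both derived types.
inline std::vector<std::unique_ptr<Shape>> make_shapes() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>());
    shapes.push_back(std::make_unique<Square>());
    return shapes;
}

// dynamic_cast yields nullptr when the object is not actually a Circle.
inline bool is_circle(const Shape& s) {
    return dynamic_cast<const Circle*>(&s) != nullptr;
}
```

If most call sites end up needing is_circle-style checks, that is usually a hint that the common interface is still too thin.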

Related

Cast char array struct vector to a POD vector?

Say I want to create a vector type only for holding POD structures and regular data types. Can I do the following? It looks very unsafe, but it works. If it is unsafe, what sort of issues might arise?
template <size_t N>
struct Bytes {
char data[N];
};
std::vector<Bytes<sizeof(double)> > d_byte_vector;
std::vector<double>* d_vectorP = reinterpret_cast<std::vector<double>*>(&d_byte_vector);
for (int i=0;i<50;i++) {
d_vectorP->push_back(rand()/RAND_MAX);
}
std::cout << d_vectorP->size() << ":" << d_byte_vector.size() << std::endl;
No, this is not safe. Specific compilers may make some guarantee that is particular to that dialect of C++, but according to the Standard you invoke undefined behavior by instantiating an array of char and pretending it's actually something else completely unrelated.
Usually when I see code like this, the author was going for one of three things and missed one of the Right Ways to go about it:
Maybe the author wanted an abstract data structure. A better way to go about that is to use an abstract base class, and move the implementation elsewhere.
Maybe the author wanted an opaque data structure. In that case I would employ some variant of the pimpl idiom, where the void* (or char*) presented to the user actually points to some real data type.
Maybe the author wanted some kind of memory pool. The best way to accomplish that is to allocate a large buffer of char, and then use placement-new to construct real objects within it.
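For the memory-pool case, a minimal sketch of placement-new into a raw byte buffer could look like this. DoublePool and its capacity are hypothetical, and the buffer uses unsigned char with explicit alignment (rather than plain char) as an assumption to keep the storage suitably aligned for double.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity pool of doubles built on raw, suitably aligned bytes.
class DoublePool {
public:
    static constexpr std::size_t capacity = 8;

    // Constructs a double in the next free slot; returns nullptr when the pool is full.
    double* create(double value) {
        if (used_ == capacity) return nullptr;
        void* slot = buffer_ + used_ * sizeof(double);
        ++used_;
        return new (slot) double(value);  // placement-new starts the object's lifetime
    }

    std::size_t size() const { return used_; }

private:
    alignas(double) unsigned char buffer_[capacity * sizeof(double)];
    std::size_t used_ = 0;
};
```

Because double is trivially destructible, the sketch never calls destructors; a pool for class types would have to invoke them explicitly before reusing or releasing the buffer.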
No, this is not safe and generally not recommended, because compilers aren't required to operate in a way that allows it. With that said, I've found exactly one reason to ever do this (very recently as well): a variant of the pimpl idiom where I wanted to avoid pointers, so that my data access would not need to allocate memory for, deallocate memory for, and dereference the extra pointer. That code isn't in production yet, and I'm still keeping an eye on that section of code to make sure it doesn't start causing any other problems.
Unless you're writing code that has to be extremely optimized, I would recommend finding some other way of doing whatever it is you need to do.

Storing elements of different type in a vector/array in C++?

I'm trying to create a simple dynamic language interpreter in C++. I'd like to be able to declare dynamically typed arrays, but I'm not sure how to store them in some object in C++.
In Ruby/Python I can store anything I want, but what's an efficient way of doing this in C++?
(Also, if someone has a link to a simple open source lexer/parser/interpreter for dynamic languages like Ruby, I'd appreciate a link).
You will have to roll some custom solution based on your language's semantics. For example, you can use boost::any to store any object, but you won't be able to perform, for example, name lookups. A knowledge of some assembler is useful here because you're basically emulating that. What most people do is something like
struct Object {
boost::any cppobject;
std::unordered_map<std::string, std::function<void(boost::any&, std::vector<boost::any>&)>> funcs;
};
std::vector<Object> stuff;
When, in your hypothetical language, you have something like
stuff[0].hi();
Then you can convert it into something like
std::vector<boost::any> args;
// fill args
stuff.at(0).funcs["hi"](stuff.at(0).cppobject, args);
// now args holds the result
It's quite possible to optimize this scheme further, but not to generalize it further, as it's already maximally general.
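A compilable version of this scheme might look like the sketch below. It swaps in std::any (C++17) for boost::any as an assumption, and the factory make_int_object, the method name "add" and the helper call_add are all invented for illustration.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// A method receives the wrapped C++ object and an argument list;
// as in the scheme above, the result is written back into args.
using Method = std::function<void(std::any&, std::vector<std::any>&)>;

struct Object {
    std::any cppobject;
    std::unordered_map<std::string, Method> funcs;
};

// Hypothetical factory: an integer object with a single "add" method.
inline Object make_int_object(int value) {
    Object obj;
    obj.cppobject = value;
    obj.funcs["add"] = [](std::any& self, std::vector<std::any>& args) {
        int sum = std::any_cast<int>(self) + std::any_cast<int>(args.at(0));
        args.clear();
        args.emplace_back(sum);  // args now holds the result
    };
    return obj;
}

// Looks the method up by name at run time, like `obj.add(rhs)` in the interpreted language.
inline int call_add(Object& obj, int rhs) {
    std::vector<std::any> args;
    args.emplace_back(rhs);
    obj.funcs.at("add")(obj.cppobject, args);
    return std::any_cast<int>(args.at(0));
}
```

Using funcs.at(...) instead of operator[] means a call to an unknown method throws std::out_of_range rather than silently inserting an empty std::function.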
Dynamic languages store universal objects via pointers, and you can do the same thing in C++: store a pointer to a generic "object" type that you define in your C++ classes. That's an efficient way of doing it.
An alternative to using unions or dynamically allocating objects of a common base type (and downcasting them as appropriate via dynamic_cast or an equivalent construct) is boost::variant, which allows you to write code such as:
typedef boost::variant<int, float, char, std::string> LangType;
std::vector<LangType> langObjects;
If your design allows for such an implementation, this has the advantage of being compile-time safe and avoiding any penalty imposed by use of the heap, virtual functions and polymorphic downcasts.
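To act on the stored values without downcasting, a visitor can be applied to each element. The sketch below uses std::variant and std::visit from C++17 in place of boost::variant and boost::apply_visitor (an assumption); the visitor TypeName and the helper type_name are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

using LangType = std::variant<int, float, char, std::string>;

// Visitor with one overload per alternative of LangType.
struct TypeName {
    std::string operator()(int) const { return "int"; }
    std::string operator()(float) const { return "float"; }
    std::string operator()(char) const { return "char"; }
    std::string operator()(const std::string&) const { return "string"; }
};

// Dispatches on the alternative currently stored in value.
inline std::string type_name(const LangType& value) {
    return std::visit(TypeName{}, value);
}
```

An interpreter would typically have one such visitor per operation (printing, arithmetic, comparison), each listing the cases it supports.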

Is it inefficient to use a std::vector when it only contains two elements?

I am building a C++ class A that needs to contain a bunch of pointers to other objects B.
In order to make the class as general as possible, I am using a std::vector<B*> inside this class. This way any number of different B can be held in A (there are no restrictions on how many there can be).
Now this might be a bit of overkill because most of the time, I will be using objects of type A that only hold either 2 or 4 B*'s in the vector.
Since there is going to be a lot of iterative calculations going on, involving objects of class A, I was wondering if there is a lot of overhead involved in using a vector of B's when there are only two B's needed.
Should I overload the class to use another container when there are fewer than three B's present?
To make things clearer: the A's are multipoles and the B's are magnetic coils that constitute the multipoles.
Premature optimization. Get it working first. If you profile your application and see that you need more efficiency (in memory or performance), then you can change it. Otherwise, it's a potential waste of time.
I would use a vector for now, but typedef a name for it instead of spelling std::vector out directly where it's used:
// A typedef must name a complete type, so the element type is part of it.
typedef std::vector<B*> vec_type;
class A {
    vec_type whatever;
};
Then, when/if it becomes a problem, you can change that typedef name to refer to a vector-like class that's optimized for a small number of contained objects (e.g., does something like the small-string optimization that's common with many implementations of std::string).
Another possibility (though I don't like it quite as well) is to continue to use the name "vector" directly, but use a using declaration to specify what vector to use:
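// A using-declaration for a namespace member is not allowed at class scope,
// so it goes at namespace (or block) scope instead.
using std::vector;
class A {
    vector<B*> whatever;
};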
In this case, when/if necessary, you put your replacement vector into a namespace, and change the using declaration to point to that instead:
using my_optimized_version::vector;
class A {
    // the rest of the code remains unchanged:
    vector<B*> whatever;
};
As far as how to implement the optimized class, the typical way is something like this:
template <class T>
class pseudo_vector {
T small_data[5];
T *data;
size_t size;
size_t allocated;
public:
// ...
};
Then, if you have 5 or fewer items to store, you put them in small_data. When/if your vector contains more items than that fixed limit, you allocate space on the heap, and use data to point to it.
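To make the small-buffer idea concrete, here is a deliberately reduced sketch. It assumes T is default-constructible and copy-assignable (true for B*), supports only push_back and indexing, disables copying for brevity, and renames the size member to size_ so that a size() accessor can exist; the SmallCapacity parameter, the on_heap helper and the doubling growth policy are assumptions, not part of the outline above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

template <class T, std::size_t SmallCapacity = 5>
class pseudo_vector {
    T small_data[SmallCapacity];
    T* data = small_data;                  // points at small_data until the first heap allocation
    std::size_t size_ = 0;
    std::size_t allocated = SmallCapacity;

public:
    pseudo_vector() = default;
    pseudo_vector(const pseudo_vector&) = delete;             // copying omitted for brevity
    pseudo_vector& operator=(const pseudo_vector&) = delete;
    ~pseudo_vector() { if (on_heap()) delete[] data; }

    void push_back(const T& value) {
        if (size_ == allocated) grow();
        data[size_++] = value;
    }

    T& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return data != small_data; }

private:
    // Moves the elements to a heap block twice as large as the current capacity.
    void grow() {
        std::size_t new_capacity = allocated * 2;
        T* bigger = new T[new_capacity];
        std::copy(data, data + size_, bigger);
        if (on_heap()) delete[] data;
        data = bigger;
        allocated = new_capacity;
    }
};
```

With two or four B* elements, such a container never touches the heap; whether that matters in practice is something only profiling can show.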
Depending a bit on what you're trying to optimize, you may want to use an abstract base class, with two descendants, one for small vectors and the other for large vectors, with a pimpl-like class to wrap them and make either one act like something you can use directly.
Yet another possibility that can be useful for some situations is to continue to use std::vector, but provide a custom Allocator object for it to use when obtaining storage space. Googling for "small object allocator" should turn up a number of candidates that have already been written. Depending on the situation, you may want to use one of those directly, or you may want to use them as inspiration to write your own.
If you need an array of B* that will never change its size, you won't need the dynamic shrinking and growing abilities of the std::vector.
So, probably not for reasons of efficiency, but for reasons of intuition, you could consider using a fixed length array:
struct A {
enum { ndims = 2 };
B* b[ndims];
};
or std::array (if available):
struct A {
std::array<B*, 2> b;
};
see also this answer on that topic.
Vectors are pretty lean as far as overhead goes. I'm sure someone here can give more detailed information about what that really means. But if you've got performance issues, they're not going to come from vector.
In addition I'd definitely avoid the tactic of using different containers depending on how many items there are. That's just begging for a disaster and won't really give you anything in return.

Encapsulation vs structs - is this considered bad style?

I have a bunch of classes in a CUDA project that are mostly glorified structs and are dependent on each other by composition:
class A {
public:
    typedef boost::shared_ptr<A> Ptr;
    A(uint n_elements) { ... } // allocate element_indices
    DeviceVector<int>::iterator get_element_indices();
private:
    DeviceVector<int> element_indices;
};
class B {
public:
    B(uint n_elements) {
        ... // initialize members
    }
    A::Ptr get_a();
    DevicePointer<int>::iterator get_other_stuff();
private:
    A::Ptr a;
    DeviceVector<int> other_stuff;
};
DeviceVector is just a wrapper around thrust::device_vectors and the ::iterator can be cast to a raw device pointer. This is needed, as custom kernels are called and require handles to device memory.
Now, I do care about encapsulation, but
raw pointers to the data have to be exposed, so the classes using A and B can run custom kernels on the GPU
a default constructor is not desired, device memory should be allocated automatically --> shared_ptr<T>
only very few methods on A and B are required
So, one could make life much simpler by simply using structs
struct A {
    void initialize(uint n_elements);
    DeviceVector<int> element_indices;
};
struct B {
    void initialize(uint n_elements);
    A a;
    DeviceVector<int> other_stuff;
};
I'm wondering whether I'm correct that in the sense of encapsulation this is practically equivalent. If so, is there anything that is wrong with the whole concept and might bite at some point?
Make it simple. Don't introduce abstraction and encapsulation before you need it.
It is a good habit to always make your data members private. It may seem at first that your struct is tiny, has no or a couple of member functions, and needs to expose the data members. However, as your program evolves, these "structs" tend to grow and proliferate. Before you know it, all of your code depends on the internals of one of these structs, and a slight change to it will reverberate throughout your code base.
Even if you need to expose raw pointers to the data, it is still a good idea to do that through getters. You may want to change how the data is handled internally, e.g. replace a raw array with a std::vector. If your data member is private and you are using a getter, you can do that without affecting any code using your class. Furthermore, getters let you enforce const-ness and make a particular piece of data read-only by returning a pointer to const.
It is a bit more work up front, but most of the time it pays off in the long run.
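As a host-side illustration of the getter argument, the sketch below exposes raw pointers through a const and a non-const accessor. The class Indices and the function sum are hypothetical, and std::vector stands in for the question's DeviceVector, whose interface is not shown in the original.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a class that must hand out raw pointers to its data.
class Indices {
public:
    explicit Indices(std::size_t n) : values_(n, 0) {}

    int* data() { return values_.data(); }              // writable access
    const int* data() const { return values_.data(); }  // read-only access on const objects
    std::size_t size() const { return values_.size(); }

private:
    std::vector<int> values_;  // could become another container without changing callers
};

// A consumer that only reads takes a const reference, so it can only get the const pointer.
inline int sum(const Indices& idx) {
    int total = 0;
    const int* p = idx.data();
    for (std::size_t i = 0; i < idx.size(); ++i) total += p[i];
    return total;
}
```

Code that writes through idx.data() needs a non-const Indices, so read-only call sites are visible in the function signatures.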
It's a trade off.
Using value structs can be a beautifully simple way to group a bunch of data together. They can be very kludgy if you start tacking on a lot of helper routines and rely on them beyond their intended use. Be strict with yourself about when and how to use them and they are fine. Having zero methods on these objects is a good way to make this obvious to yourself.
You may have some set of classes that you use to solve a problem; I'll call it a module. Value structs within the module are easy to reason about. Outside of the module you have to hope for good behavior. You don't have strict interfaces on them, so you have to hope the compiler will warn you about misuse.
Given that statement, I think they are more appropriate in anonymous or detail namespaces. If they end up in public interfaces, people tend to add sugar to them. Delete the sugar or refactor it into a first-class object with an interface.
I think they are more appropriate as const objects. The problem you fall into is that you are (trying to) maintain the invariants of this "object" everywhere that it's used, for its entire lifetime. If a different level of abstraction wants them with slight mutations, make a copy. The named parameter idiom is good for this.
Domain Driven Design gives a thoughtful, thorough treatment of the subject. It characterizes it in a more practical sense of how to understand and facilitate design.
Clean Code also discusses the topic, though from a different perspective. It is more of a morality book.
Both are awesome books and generally recommended outside of this topic.

Dynamic type dereferencing?

In attempting to answer another question, I was struck by a bout of curiosity and wanted to find out if an idea was possible.
Is it possible to dynamically dereference either a void * pointer (we assume it points to a valid referenced dynamically allocated copy) or some other type during run time to return the correct type?
Is there some way to store a supplied type (as in, the class knows the void * points to an int), if so how?
Can said stored type (if possible) be used to dynamically dereference?
Can a type be passed on its own as an argument to a function?
Generally the concept (no code available) is a doubly-linked list of void * pointers (or similar) that can dynamically allocate space and that also keep with them a record of what type they hold for later dereferencing.
1) Dynamic references:
No. Instead of having your variables hold just pointers, have them hold a struct containing both the actual pointer and a tag defining what type the pointer is pointing to
struct Ref{
int tag;
void *ref;
};
and then, when "dereferencing", first check the tag to find out what you want to do.
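A minimal sketch of that tag check follows. The Tag enumerators and the to_text function are hypothetical; the answer only says that the tag identifies the pointee type.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag values identifying what ref points to.
enum Tag { TagInt, TagDouble, TagString };

struct Ref {
    int tag;
    void* ref;
};

// "Dereference" by checking the tag first, then casting to the matching type.
inline std::string to_text(const Ref& r) {
    switch (r.tag) {
        case TagInt:    return std::to_string(*static_cast<int*>(r.ref));
        case TagDouble: return std::to_string(*static_cast<double*>(r.ref));
        case TagString: return *static_cast<std::string*>(r.ref);
        default:        return "<unknown>";
    }
}
```

Every operation on a Ref needs a switch like this one, and nothing stops a caller from storing a tag that does not match the pointer, which is the main weakness of the approach.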
2) Storing types in your variables, passing them to functions.
This doesn't really make sense, as types aren't values that can be stored around. Perhaps what you just want is to pass around a class / constructor function and that is certainly feasible.
In the end, C and C++ are bare-bones languages. While a variable assignment in a dynamic language looks a lot like a variable assignment in C (they are just a = after all) in reality the dynamic language is doing a lot of extra stuff behind the scenes (something it is allowed to do, since a new language is free to define its semantics)
Sorry, this is not really possible in C++ due to lack of type reflection and lack of dynamic binding. Dynamic dereferencing is especially impossible due to these.
You could try to emulate this behavior by storing types as enums or std::type_info* pointers, but these are far from practical. They require registration of types, and huge switch..case or if..else statements every time you want to do something with them. A common container class and several wrapper classes might help achieve this (I'm sure this is some design pattern; any idea of its name?)
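As a sketch of the std::type_info idea, the handle below records the type it was created from and refuses mismatched casts. TypedPtr is a hypothetical name, and std::type_index (C++11) is used as a copyable wrapper around std::type_info.

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>

// Hypothetical handle that remembers the static type it was created from.
class TypedPtr {
public:
    template <class T>
    explicit TypedPtr(T* p) : ptr_(p), type_(typeid(T)) {}

    // Returns the pointer as T* only if T matches the recorded type; otherwise nullptr.
    template <class T>
    T* get() const {
        return type_ == std::type_index(typeid(T)) ? static_cast<T*>(ptr_) : nullptr;
    }

private:
    void* ptr_;
    std::type_index type_;
};
```

The caller still has to name T at compile time in get<T>(), so this only makes the cast checked; it does not make the dereference dynamic.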
You could also use inheritance to solve your problem if it fits.
Or perhaps you need to reconsider your current design. What exactly do you need this for?