Is this a proper usage of a union? - C++

I want to have named fields rather than indexed fields, but for some usage I have to iterate on the fields. Dumb simplified example:
struct named_states {float speed; float position;};
#define NSTATES (sizeof(struct named_states)/sizeof(float))
union named_or_indexed_states {
    struct named_states named;
    float indexed[NSTATES];
};
...
union named_or_indexed_states states, derivatives;
states.named.speed = 0;
states.named.position = 0;
...
derivatives.named.speed = acceleration;
derivatives.named.position = states.named.speed;
...
/* This code is in a generic library (consider nstates=NSTATES) */
for (i = 0; i < nstates; i++)
    states.indexed[i] += time_step * derivatives.indexed[i];
This avoids a copy from the named struct to the indexed array and vice versa, replacing it with a generic solution that is easier to maintain (I have very few places to change when I augment the state vector). It also works well with the various compilers I tested (several versions of gcc/g++ and MSVC).
But theoretically, as I understand it, it does not strictly adhere to proper union usage, since I write the named fields and then read the indexed field, and I'm not at all sure we can say that the struct fields and the array elements share the same storage...
Can you confirm that it's theoretically bad (non-portable)?
Would it be better to use a cast, a memcpy(), or something else?
Theory apart, from a pragmatic POV, is there any REAL portability issue (some incompatible compiler, exotic struct alignment, planned evolutions...)?
EDIT: your answers deserve a bit more clarification about my intentions, which were:
to let the programmer focus on the domain-specific equations and free them from maintaining conversion functions (I don't know how to write a generic one, apart from cast or memcpy tricks, which do not seem any more robust)
to add a bit more coding safety by using a struct (fully checked by the compiler) rather than arrays (whose declaration and access are more subject to programmer mistakes)
to avoid polluting the namespace too much with enums or #defines
I need to know
how portable/dangerous my steering off the standard is (maybe some compiler with aggressive inlining will keep everything in registers and avoid any memory exchange, ruining the trick),
and whether I missed a standard solution that addresses the above concerns in part or in whole.

There's no requirement that the two fields in named_states line up the same way as the array elements. There's a good chance that they do, but you've got a compiler dependency there.
Here's a simple implementation in C++ of what you're trying to do:
struct named_or_indexed_states {
    named_or_indexed_states() : speed(indexed[0]), position(indexed[1]) { }
    float &speed;
    float &position;
    float indexed[2];
};
If the size increase caused by the reference members is too much, use accessors instead:
struct named_or_indexed_states {
    float indexed[2];
    float& speed() { return indexed[0]; }
    float& position() { return indexed[1]; }
};
The compiler will have no problem inlining the accessors, so reading or writing speed() and position() will be just as fast as if they were member data. You still have to write those annoying parentheses, though.

Only accessing the last-written member of a union is well-defined; as far as standard C (or C++) is concerned, the code you presented relies on undefined behavior - it may work, but it's the wrong way to do it. It doesn't really matter that the struct uses the same type as the array's element type - there may be padding involved, as well as other invisible tricks used by the compiler.
Some compilers, like GCC, do define it as an allowed way to achieve type punning. So the question arises: are we talking about standard C (or C++), or about GNU or other extensions?
As for what you should use - proper conversion operators and/or constructors.
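A minimal sketch of the conversion-function route, assuming (as the question's NSTATES macro already does) that named_states contains only floats and no padding; the declarations from the question are repeated so the snippet stands alone, and the function names to_indexed/from_indexed are made up for illustration:
#include <string.h>

struct named_states { float speed; float position; };
#define NSTATES (sizeof(struct named_states)/sizeof(float))

void to_indexed(const struct named_states *s, float out[NSTATES])
{
    memcpy(out, s, sizeof *s);   /* copy the struct's bytes into a plain array */
}

void from_indexed(struct named_states *s, const float in[NSTATES])
{
    memcpy(s, in, sizeof *s);    /* and back again */
}
The generic update loop then works on a plain float[NSTATES] buffer, and these two calls are the only places that depend on the struct layout.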

This may be a little old-fashioned, but what I would do in this situation is:
enum
{
    F_POSITION,
    F_SPEED,
    F_COUNT
};
float states[F_COUNT];
Then you can reference them as:
states[F_POSITION] and states[F_SPEED].
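For instance, the generic update loop from the question then reads like this (the values of acceleration and time_step are made up just for illustration):
float acceleration = 9.81f;   // example value only
float time_step = 0.01f;      // example value only

float states[F_COUNT] = {0};
float derivatives[F_COUNT];
derivatives[F_SPEED]    = acceleration;
derivatives[F_POSITION] = states[F_SPEED];
for (int i = 0; i < F_COUNT; i++)
    states[i] += time_step * derivatives[i];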
That's one way that I might write this. I'm sure that there are many other possibilities.

Related

Repacking a struct in C/C++

I would like to have a single definition for packed and unpacked structs.
The intent is to use reflection to translate one to the other. I do this currently by redefinition for performance versus wire formatting. It is necessary to have two definitions and memberwise copy between them to stop those rude segfaults.
It would be nice to not have two definitions in the spirit of DRY.
The C++ standard doesn't help, which is unfortunate given how important packing is in wire formats.
I can't seem to find an implementation-defined way of doing this.
An __attribute__((packed)) within a using alias or a #pragma pack(1) wrapping the same doesn't change the packing.
I was hoping there would be some alternative to wrapping stuff in YAM (yet another macro).
FWIW adding an alignment via the using A __attribute__((aligned(256))) = UnA; does work on gcc v10.2. Changing the packing does not, sadly :-(.
Any clever trick I haven't found would be much appreciated.
--Matt.
It would be nice to not have two definitions in the spirit of DRY.
From a technical point of view it's not a question of DRY. Two structs with the same members in the same order but different padding are two distinct types. They have nothing in common because they may have different sizes and the members may live at different offsets. That's as distinct as it gets.
So, separate definitions are unavoidable. To mitigate the source code duplication you could use the preprocessor. But you’re right of course, YAM is generally ugly, probably error prone, and altogether awful.
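For completeness, the preprocessor ("YAM") route would look roughly like this X-macro sketch; the member list BAR_MEMBERS and the members themselves are invented for illustration, and __attribute__((packed)) is the gcc/clang spelling:
#include <cstdint>

// Single source of truth for the member list.
#define BAR_MEMBERS       \
    X(std::uint8_t,  tag) \
    X(std::uint32_t, value)

struct Bar {                                 // natural alignment / padding
#define X(type, name) type name;
    BAR_MEMBERS
#undef X
};

struct __attribute__((packed)) PackedBar {   // packed wire-format layout
#define X(type, name) type name;
    BAR_MEMBERS
#undef X
};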
Instead take your preferred code generator and get rid of the duplication that way. The details depend on what that generator can do, of course. The most convenient solution for the programmer would be to write the non-packed struct manually. Something like this:
// header.hpp
namespace foo
{
    struct Bar
    {
        // members ...
    };
}
Possibly you need some kind of tag (macro? custom annotation?) to mark the types that need a packed version. Then let the code generator parse that header (possibly using libClang/libTooling) and generate the rest:
// header_packed.hpp
namespace foo::packed
{
    struct __attribute__((packed)) Bar
    {
        // members ...
    };
    // conversion functions ...
}
The second best option is to write the relevant structs not in C++ but in some kind of DSL your code generator is familiar with and then generate all the C++ from that.

C++ make member object unnamed

I want to write a convenient Color management class that would allow me to use different orders of components and basically different setups. I want it to be distinguishable at compile time.
Let's say I have this code:
template <typename _valueType>
struct RGBAColorData {
    using ValueType = _valueType;
    union {
        struct { ValueType r, g, b, a; };
        ValueType components[4];
    };
};
This (even if anonymous structs are non-standard) works fine when I want to use it like this:
RGBAColorData<float> color;
color.r = whatever;
However, this is not the final form of my code. I want it to have an "owning" class template that would select between XYZColorData schemes at compile time. Let's say it looks like this:
template <typename _valueType, template <typename> class _dataScheme>
struct Color
{
    using ValueType = _valueType;
    using DataScheme = _dataScheme<ValueType>;
    // what now?
    // DataScheme data; // ???
};
This creates a problem, because I want my code to be used like this:
using RGBAColorF = Color<float, RGBAColorData>;
RGBAColorF brushColor;
brushColor.r = whatever;
This would make a really convenient way to use colors; however, I can't think of any solution to this problem.
Finally, maybe I have the wrong approach to this and maybe it can be done with less effort, but I can't think of any other method that wouldn't involve a massive amount of template class specializations.
Don't do it!
Tricking around to obtain some nice syntactic effects is full of danger and might compromise future evolution.
First of all, in C++ only one union member can be active at any moment, so switching between the array and the struct is not guaranteed to work, even if on many compilers it may produce the expected results.
Then, there is no guarantee that structure members are contiguous, so even if mixing array and struct access did work, it might still not give a correct result - though, again, on many compilers it will work as expected.
Or do it with a safer approach...
If you still want to mix the use of the specific color components r, g, b and of the array, you should consider a safer approach:
template <typename _valueType>
struct RGBAColorData {
    using ValueType = _valueType;
    ValueType components[4];
    ValueType &r = components[0], &g = components[1],
              &b = components[2], &a = components[3]; // ATTENTION (see explanations)
};
ATTENTION: I made it quick and dirty. You should really implement the rule of three, with a proper constructor, a copy constructor and an assignment operator, to make sure that the references don't get messed up.
I do not like this solution so much, but it works safely (online demo): the trick is to make r, g, b, a references to specific array items. You are then sure that you can mix the access way, and you are absolutely sure that the mapping between the two is correct.
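A sketch of what that rule-of-three housekeeping could look like: the copy operations copy only the array values, while the references are always rebound (via their default member initializers) to this object's own array:
template <typename _valueType>
struct RGBAColorData {
    using ValueType = _valueType;
    ValueType components[4];
    ValueType &r = components[0], &g = components[1],
              &b = components[2], &a = components[3];

    RGBAColorData() : components{} {}              // zero-initialize the array
    RGBAColorData(const RGBAColorData &other)
    {
        *this = other;                             // references already bind to *our* array
    }
    RGBAColorData &operator=(const RGBAColorData &other)
    {
        for (int i = 0; i < 4; ++i)
            components[i] = other.components[i];   // copy values only, never the references
        return *this;
    }
};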
But prefer clean encapsulation
The problem with your initial approach and my workaround is that they both break encapsulation: you have to know the inner structure of your color in order to use it.
With this approach, you'll never be able to evolve. For example, switching to a CMYK color scheme or adopting a bit-field encoding would be compromised.
The proper way would be to have a set of getters and setters to completely hide the inner structure to the outside world. Of course, syntactically it does not look so nice, but then you'd really be able to write truly generic color code, where the encoding scheme could be a compile-time strategy.
Finally, I decided to use inheritance (as Ben Voigt said).
I fixed the other problem, with unions that made code unsafe, using the brilliant method proposed by this answer:
https://stackoverflow.com/a/494760/4386320
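For reference, a minimal sketch of that inheritance route (keeping the question's template parameter names): Color derives from the chosen data scheme, so whatever members the scheme exposes (r, g, b, a, components, ...) become available directly on Color.
template <typename _valueType, template <typename> class _dataScheme>
struct Color : _dataScheme<_valueType>
{
    using ValueType  = _valueType;
    using DataScheme = _dataScheme<ValueType>;
};

// usage, as in the question:
// using RGBAColorF = Color<float, RGBAColorData>;
// RGBAColorF brushColor;
// brushColor.r = 0.5f;   // member inherited from RGBAColorData<float>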

Overuse of redefining primitive data types?

My current project code base has a typedef for every unit and its friend.
Extract:
...
typedef int m; // meter
typedef int htz;
typedef int s; // second
...
Good or Bad?
I hate it! It's a pain, there is no benefit, and "m" is globally defined, omg!
But I want to state the reasons why I hate it in a slightly more technical/articulate manner... hello readers!
Can people list For/Against arguments for this pattern? Many thanks.
Better to make them custom types, as then you can control conversions and overload operators. Right now, I can do meaningless things like multiply a metre by a hertz. Ideally, m / s would yield a velocity, but it won't. It's meaningless to just typedef them like that.
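A minimal sketch of that custom-type idea; the wrapper names (Meters, Seconds, MetersPerSecond) are made up for illustration:
struct Meters          { double value; };
struct Seconds         { double value; };
struct MetersPerSecond { double value; };

// Only combinations that make physical sense get an operator.
inline MetersPerSecond operator/(Meters d, Seconds t)
{
    return MetersPerSecond{ d.value / t.value };
}

// Meters distance{100.0};
// Seconds time{9.58};
// MetersPerSecond v = distance / time;    // fine
// auto nonsense = distance * Seconds{2};  // error: no such operator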
Presumably they are trying to document intent, but without type safety there is no enforcing it. It is just clutter that increases the barrier of entry for reasoning about the code.
Even if they did try and create type safety, trying to abstract data at low levels just adds complexity. It doesn't make solving problems easier. The variable name describes the contents well enough anyway.

Is it okay to forgo getters and setters for simple classes?

I'm making a very simple class to represent positions in 3D space.
Currently, I'm just letting the user access and modify the individual X, Y and Z values directly. In other words, they're public member variables.
template <typename NumericType = double>
struct Position
{
    NumericType X, Y, Z;
    // Constructors, operators and stuff...
};
The reasoning behind this is that, because NumericType is a template parameter, I can't rely on there being a decent way to check values for sanity. (How do I know the user won't want a position to be represented with negative values?) Therefore, there's no point in adding getters or setters to complicate the interface, and direct access should be favored for its brevity.
Pos.X = Pos.Y + Pos.Z; // Versus...
Pos.SetX(Pos.GetY() + Pos.GetZ());
Is this an okay exception to good practice? Will a (hypothetical) future maintainer of my code hunt me down and punch me in the face?
The idea behind using getters and setters is to be able to perform other behavior than just setting a value. This practice is recommended because there are a multitude of things you might want to retrofit into your class.
Common reasons to use a setter (there are probably more):
Validation: not all values allowed by the type of the variable are valid for the member: validation is required before assignment.
Invariants: dependent fields might need to be adjusted (e.g. re-sizing an array might require re-allocation, not just storing the new size).
Hooks: there is extra work to perform before/after assignment, such as triggering notifications (e.g. observers/listeners are registered on the value).
Representation: the field is not stored in the format "published" by the getters and setters. The field might not even be stored in the object itself; the value might be forwarded to some other internal member or stored in separate components.
If you think your code will never, ever use or require any of the above, then writing getters and setters by principle is definitely not good practice. It just results in code bloat.
Edit: contrarily to popular belief, using getters and setters is unlikely to help you in changing the internal representation of the class unless these changes are minor. The presence of setters for individual members, in particular, makes this change very difficult.
Getters and setters are really only an important design choice if they get/set an abstract value that you may have implemented in any number of ways. But if your class is so straight-forward and the data members so fundamental that you need to expose them directly, then just make them public! You get a nice, cheap aggregate type without any frills and it's completely self-documenting.
If you really do want to make a data member private but still give full access to it, just make a single accessor function overloaded once as T & access() and once as const T & access() const.
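That might look like this (the class name Widget and the member value_ are purely illustrative):
class Widget {
    double value_ = 0.0;
public:
    double &access()             { return value_; }   // read/write access
    const double &access() const { return value_; }   // read-only on const objects
};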
Edit: In a recent project I simply used tuples for coordinates, with global accessor functions. Perhaps this could be useful:
#include <tuple>

template <typename T> inline T cX(const std::tuple<T,T,T> & t) { return std::get<0>(t); }
template <typename T> inline T cY(const std::tuple<T,T,T> & t) { return std::get<1>(t); }
template <typename T> inline T cZ(const std::tuple<T,T,T> & t) { return std::get<2>(t); }
typedef std::tuple<double, double, double> coords;
//template <typename T> using coords = std::tuple<T,T,T>; // if I had GCC 4.8
coords c{1.2, -3.4, 5.6};
// Now we can access cX(c), cY(c), cZ(c).
Took me a while, but I tracked this old Stroustrup interview down, where he discusses exposed-data structs versus encapsulated classes himself: http://www.artima.com/intv/goldilocks3.html
Getting more heavily into specifics, there are dimensions to this that may be missing / understated in existing answers. The benefits of encapsulation increase with:
re-compilation/link dependency: low-level library code that is used by large numbers of applications, where those apps may be time-consuming and/or difficult to recompile and redeploy
it's usually easier if the implementation is out-of-line (which may require the pImpl idiom and performance compromises) so you only have to relink, and easier still if you can deploy new shared libraries and simply bounce the app
by way of contrast, there's massively less benefit from encapsulation if the object is only used in the "non-extern" implementation of a specific translation unit
interface stability despite implementation volatility: code where the implementation is more experimental / volatile, but the API requirement is well understood
note that by being careful it may be possible to give direct access to member variables while using typedefs for their types, such that a proxy object can be substituted and support identical client-code usage while invoking different implementation
If you do some very easy stuff your solution could be just fine.
If you later realize that calculations in a spherical coordinate system are much easier or faster (and you need performance), you can count on that punch.
It is OK for such a well-known structure that:
can have any possible value, like an int;
should operate like a built-in type when manipulating its value, for performance reasons.
However, if you need more than a type that "just is a 3D vector", then you should wrap it in another class as a private member, which would then expose x, y and z through member functions, along with member functions for the additional features.
The reasoning behind this is that, because NumericType is a template parameter, I can't rely on there being a decent way to check values for sanity. (How do I know the user won't want a position to be represented with negative values?)
The language and compilers support this case well (via specialization).
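A hedged sketch of what "via specialization" might mean here: a checking helper with a permissive generic default and a specialization for the types where a sanity check makes sense (the trait name Sanity is made up):
#include <cmath>

template <typename NumericType>
struct Sanity {
    static bool ok(NumericType) { return true; }           // generic case: accept anything
};

template <>
struct Sanity<double> {
    static bool ok(double v) { return std::isfinite(v); }  // e.g. reject NaN / infinities
};

// A setter in Position<NumericType> could then call Sanity<NumericType>::ok(value).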
Therefore, there's no point in adding getters or setters to complicate the interface, and direct access should be favored for its brevity.
Moot argument -- see above.
Is this an okay exception to good practice?
I don't think it is. Your question implies validation should exist, but that it's not worth implementing/supporting because you've chosen to use a template in your implementation and not to specialize it appropriately for the language feature you've chosen. With that approach, the interface only appears to be partially supported -- the missing implementations will just pollute clients' code.

Do the accessors affect the performance of an application?

I was wondering if the use of accessors can significantly affect the performance of an application. Let's say we have a class Point with two private fields. We can get access to these fields by calling public functions such as GetX().
class Point
{
public:
    Point(void);
    double GetX();
    double GetY();
    void SetX(double x);
    void SetY(double y);
    ~Point(void);
private:
    double x, y;
};
However, if we need to get the value of field x a lot of times (e.g. when processing images), wouldn't this construction affect the performance of the application? Maybe it would be faster just to make the fields x and y public?
First and foremost, this is probably premature optimization, and in the general case accessors are not the source of application-level bottlenecks. However, they're not magic pixie dust. It's generally not the case that accessors will hurt performance. There are a few things to consider:
If the implementation is inline or if you have a toolchain that supports link-time optimization, it's likely that there will be 0 impact. Here's an example that lets you get absolutely the same performance on a compiler that doesn't suck.
class Point {
public: double GetX() const;
private: double x;
};
inline double Point::GetX() const { return x; }
If the implementation is out-of-line, then you have the added cost of a function call. If, as you say, the function is being called many times, then at least the code is more or less guaranteed to be in the cache, but the relative % of overhead may be high: the work to perform the function call is higher than the work of moving a double around, and there's a pointer indirection because the function actually uses this as a parameter.
If the implementation is both out-of-line and part of a relocatable library (Linux *.so or Windows *.dll), there's an additional indirection that occurs in order to manage the relocation.
Both of the latter costs are reduced on x86-64 hardware relative to x86 32-bit; so much so that you should just not worry about it. I can't speak about other architectures.
Penultimately, if you have many trivial objects with trivial getters and setters, and if you have no profile-guided optimization or link-time optimization, there may be caching effects due to large numbers of tiny functions. It's likely that each function requires a minimum of one cache line, and the functions are not going to be naturally organized in a way that groups commonly-used sections together. This cost is something you should probably ignore unless you're writing a very large-scale C++ project or core component, such as the KDE base system.
Ultimately, don't worry about it.
Such methods should always be inlined by the compiler and the performance of that will be identical to making them public. You can use the inline keyword to help the compiler along, but that's just a hint. If it's really critical that you avoid function call overhead, read the generated assembly. If they're getting inlined you're ok. Otherwise you might want to consider loosening their visibility.
In a typical case, no, there will not be a difference in performance (unless you've fairly specifically told the compiler not to inline any functions). If you allow it to inline functions, however, chances are that it'll generate identical assembly language for both.
That should not, however, be seen as an excuse for ruining your design by including these abominations. First of all, a class should generally provide higher-level operations - for example a move_relative and a move_absolute - so that instead of something like this:
Point whatever;
whatever.SetX(whatever.GetX() + 3);
whatever.SetY(whatever.GetY() + 4);
...you'd do something like this:
Point whatever;
whatever.move_relative(3, 4);
There are times, however, that exposing something as data really does make sense and work well. If/when you are going to do that, C++ already provides a good way to encapsulate access to the data: a class. It also provides a predefined name for SetXXX and GetXXX -- they're operator= and operator T respectively. The right way to do this is something like this:
template <class T>
class encapsulate {
    T value;
public:
    encapsulate() : value() {}                 // so Point below can be default-constructed
    encapsulate(T const &t) : value(t) {}
    encapsulate &operator=(encapsulate const &t) { value = t.value; return *this; }
    operator T() { return value; }
};
Using this, your Point class looks like:
struct Point {
encapsulate<double> x, y;
};
With this, the data you want to be public looks and acts as if it is. At the same time, you retain full control over getting/setting the values by changing the encapsulate to something that does whatever you need done.
Point whatever;
whatever.x = whatever.x + 3;
whatever.y = whatever.y + 4;
Though I haven't bothered to in the demo template above, it's fairly easy to support the normal compound assignment operators (+=, -=, *=, /=, etc.) as well. Depending on the situation, it's often useful to eliminate many of these though. Just for example, adding/subtracting to an X/Y coordinate often makes sense -- but multiplication and division frequently won't, so you can just add += and -=, and if somebody accidentally types in /= or |= (for just a couple of examples), their code simply won't compile.
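For instance, adding just += and -= could look like this (the whole template is repeated so the sketch stands on its own):
template <class T>
class encapsulate {
    T value;
public:
    encapsulate() : value() {}
    encapsulate(T const &t) : value(t) {}
    encapsulate &operator=(encapsulate const &t) { value = t.value; return *this; }
    encapsulate &operator+=(T const &t) { value += t; return *this; }   // addition allowed
    encapsulate &operator-=(T const &t) { value -= t; return *this; }   // subtraction allowed
    // no *= or /=: code that tries them simply won't compile
    operator T() { return value; }
};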
This also provides better enforcement of whatever constraints you need on the data. With private data and an accessor/mutator, other code in the class can (and almost inevitably will) modify the data in ways you didn't want. With a class dedicated to nothing but enforcing the correct constraints, that issue is virtually eliminated. Instead, code both inside and outside the class does a simple assignment (or uses the value, as the case may be) and it's routed through the operator=/operator T automatically -- code inside the class can't bypass whatever checking is needed.
Since you're (apparently) concerned with efficiency, I'll add that this won't normally have any run-time cost either. In fact, being a template gives it a slight advantage in that regard. Where code in a normal function could (even if only by accident) be rewritten in a way that prevented inline expansion, using a template eliminates that -- if you try to rewrite it in a way that otherwise wouldn't generate inline code, with a template it won't compile at all.
As long as you define the functions in the header so the compiler can inline them there should be no difference at all. But even if they aren't inlined you still shouldn't make them public unless profiling indicates that it's a significant bottleneck and that making the variables public improves the problem. Making variables public decreases encapsulation and maintainability. For a bit more on public variables, see my answer on What good are public variables then?
The short answer is yes, this will affect the performance. Whether you will notice the difference or not is another matter that depends on how much code you have in the accessors, among other things.
The more important questions, though, is do you need what you gain from using accessors? If you make the fields public, then you lose control over their values. Do you want to allow x or y to be NaN? or +-infinity? Making them public would make such cases possible.
If you decide later that a double is not acceptable for your point class (maybe you need more precision or the precision isn't necessary), then accessing the fields directly would cause trouble. While this change might also require changes in the accessors, the setters should be fine with overloaded methods. And you may still be fine with a public representation of a double whereas the internal representation is not a double (although this is not so likely with a Point class, I imagine).
There are other cases where you might want to have side effects on accessors and setters as well that making the fields public would circumvent. Maybe you want to create events for when your point changes, but if the fields are public, then your class won't know when the values change.
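A small sketch of those side effects: a setter that rejects non-finite values and fires a change notification (the onChange callback is an assumption, purely for illustration):
#include <cmath>
#include <functional>

class Point
{
public:
    double GetX() const { return x; }
    void SetX(double newX)
    {
        if (!std::isfinite(newX)) return;     // disallow NaN and +/- infinity
        x = newX;
        if (onChange) onChange(*this);        // the class knows its value changed
    }
    std::function<void(const Point &)> onChange;
private:
    double x = 0.0, y = 0.0;
};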
ADDED
OK, so my glossing over with my "yes" so that I could get to the non-performance issues (which I felt were more important) wasn't appreciated.
In many cases the yes is correct, even though the difference will probably be imperceptible. True, using inline and a kick-ass compiler may very well end up with the same code (assuming an accessor like double GetX() { return x; }), but there are a lot of ifs there. Compilers will only inline things that end up in the same object file (often created from a single code file). So you also need a kick-ass linker to optimize the references in other object files (by the time you get to the linker, the inline hint may not even still be present in the code). So some, but not necessarily all, of the code may end up being identical, but that's something you can only confirm after the fact, which isn't useful.
If you're concerned about image processing, then it might be worth allowing for friend classes so that an image class you write can access the fields directly, but even in that case I don't think the accessors will be adding much to your runtime.