Performance implications of C++ unions

In Agner Fog's "Optimizing software in C++" it is stated that union forces a variable to be stored in memory even in cases where it otherwise could have been stored in a register, which might have performance implications. (e.g. page 148)
I often see code that looks like this:
struct Vector {
union {
struct {
float x, y, z, w;
};
float v[4];
};
};
This can be quite convenient, but now I'm wondering whether there might be a performance hit.
I wrote a small benchmark that compares Vector implementations with and without a union, and there were cases where the Vector without a union apparently performed better, although I don't know how trustworthy my benchmark is. (I compared three implementations: union; x, y, z, w; v[4]. For example, v[4] seemed to be slower when passed by value, although the structs all have the same size.)
My question now is, whether this is something that people consider when writing actual production code? Do you know of cases where it was decided against unions specifically for this reason?

It appears the goal is to provide friendly names for the elements of a vector type, and a union is not the best way to do that. Comments have already pointed out the undefined behavior, and even if it works, it's a form of aliasing, which limits optimization opportunities.
Instead, avoid the whole mess and just add accessors that name the elements.
struct quaternion
{
float vec[4];
float &x() { return vec[0]; }
float &y() { return vec[1]; }
float &z() { return vec[2]; }
float &w() { return vec[3]; }
const float &x() const { return vec[0]; }
const float &y() const { return vec[1]; }
const float &z() const { return vec[2]; }
const float &w() const { return vec[3]; }
};
In fact, much as Eigen does for its quaternion implementation:
https://eigen.tuxfamily.org/dox/Quaternion_8h_source.html
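A minimal sketch of how the accessor approach might be used. The names Vec4 and dot are hypothetical, not taken from Eigen or from the question. The static_assert checks that naming the elements through member functions adds no storage on top of the raw array.

```cpp
#include <cassert>

// Hypothetical 4-component vector: array storage plus named accessors.
struct Vec4 {
    float vec[4];

    float& x() { return vec[0]; }
    float& y() { return vec[1]; }
    float& z() { return vec[2]; }
    float& w() { return vec[3]; }
    const float& x() const { return vec[0]; }
    const float& y() const { return vec[1]; }
    const float& z() const { return vec[2]; }
    const float& w() const { return vec[3]; }
};

// Member functions occupy no space inside the object.
static_assert(sizeof(Vec4) == 4 * sizeof(float), "accessors must not add storage");

// Index-based code and name-based code operate on the same data.
inline float dot(const Vec4& a, const Vec4& b) {
    float sum = 0.0f;
    for (int i = 0; i < 4; ++i) {
        sum += a.vec[i] * b.vec[i];
    }
    return sum;
}
```

Loops can keep using vec[i], while call sites that care about meaning can write a.x() or a.w(), with no union involved.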

My question now is, whether this is something that people consider when writing actual production code?
No. That's premature optimization (and so is the union construct itself). Once the code is written in a reasonably clean and reliable way, it can be profiled and the true bottlenecks addressed. There is no need to reason about some union for 5 minutes to guess whether it will affect performance somewhere in the future. It either will or will not, and only profiling can tell.

Related

Can I iterate over class members as though they were an array in C++?

Suppose I have a class akin to the following:
struct Potato {
Eigen::Vector3d position;
double weight, size;
string name;
};
and a collection of Potatos
std::vector<Potato> potato_farm = {potato1, potato2, ...};
This is pretty clearly an array-of-structures (AoS) layout because, let's say for most purposes, it makes sense to have all of a Potato's data lumped together. However, I might like to do a calculation of the most common name where a structure-of-arrays (SoA) design makes things agnostic to the type of thing with a name (an array of people all with names, an array of places, all with names, etc.) Does C++ have any tools or tricks that makes an AoS layout look like a SoA to do something like this, or is there a better design that accomplishes the same thing?

You can use lambdas to access particular member in algos that work over range:
double mean = std::accumulate( potato_farm.begin(), potato_farm.end(), 0.0, []( double val, const Potato &p ) { return val + p.weight; } ) / potato_farm.size();
If that is not enough: you cannot make it look like an array of data, as that requires the objects to be in contiguous memory, but you can make it look like a container. So you can implement custom iterators (for example, a random access iterator with value type double that iterates over the weight member). How to implement custom iterators is described here. You can probably even make that generic, but it is not clear whether that would be worth the effort, as it is not very simple to implement properly.
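Short of writing full iterators, the lambda idea can be made generic with a pointer to data member. The following is only a sketch: sum_member is a hypothetical helper, and Potato is simplified here (the Eigen member is omitted) so the example stands alone.

```cpp
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Simplified stand-in for the question's Potato (Eigen member omitted).
struct Potato {
    double weight;
    double size;
    std::string name;
};

// Sums one data member across a vector of structs.
// `member` is a pointer to data member, e.g. &Potato::weight.
template <typename T, typename M>
M sum_member(const std::vector<T>& items, M T::*member) {
    return std::accumulate(items.begin(), items.end(), M{},
                           [member](M acc, const T& item) { return acc + item.*member; });
}
```

A call such as sum_member(potato_farm, &Potato::weight) / potato_farm.size() would then give the mean weight. The data stays in AoS layout, so this only changes the syntax, not the memory access pattern.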

Unfortunately, there is no language tool to generically change a struct into SoA. This is actually one of big obstacles when you try to bring SIMD programming into higher level.
You will need to create a SoA manually. However, you can help yourself by creating a reference to SoA objects acting as if it was a regular Potato.
struct Potato {
float position;
double weight, size;
std::string name;
};
struct PotatoSoARef {
float& position;
double& weight;
double& size;
std::string& name;
};
class PotatoSoA {
private:
float* position;
double* weight;
double* size;
std::string* name;
public:
PotatoSoA(std::size_t size) { /* allocate the SoA */ }
PotatoSoARef operator[](std::size_t idx) {
return PotatoSoARef{position[idx], weight[idx], size[idx], name[idx]};
}
};
This way, regardless if you have an AoS or SoA of Potatos, you can access its fields as arr[idx].position etc. (both as r- and l-value). The compiler is likely to optimize the proxy away.
You might want to add other constructors and accessors as well.
You might also be interested in implementing a regular AoS with an operator[] returning a PotatoSoARef if you want functions to have a uniform interface for both AoS and SoA access patterns.
If you are willing to depart from C++, though, you might be interested in language extensions such as Sierra.
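For completeness, here is one way the elided allocation could be filled in, using std::vector for storage instead of raw pointers. This is a sketch under assumptions: the position field is dropped for brevity, and the names PotatoRef, count and weights are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Proxy that looks like one Potato but refers into the separate arrays.
struct PotatoRef {
    double& weight;
    double& size;
    std::string& name;
};

class PotatoSoA {
public:
    // Allocates `n` value-initialized entries in every field array.
    explicit PotatoSoA(std::size_t n) : weight_(n), size_(n), name_(n) {}

    std::size_t count() const { return weight_.size(); }

    PotatoRef operator[](std::size_t idx) {
        assert(idx < count());
        return PotatoRef{weight_[idx], size_[idx], name_[idx]};
    }

    // Direct access to one contiguous field, which is the point of SoA.
    const std::vector<double>& weights() const { return weight_; }

private:
    std::vector<double> weight_;
    std::vector<double> size_;
    std::vector<std::string> name_;
};
```

With this, farm[i].weight = 2.5 writes straight into the weight array, and algorithms that only need weights can iterate over farm.weights() without touching names or sizes.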

As Slava has said you aren't going to get SoA-like access out of AoS data without writing your own iterators, and I would think really hard about whether the use of STL algorithms is that important before doing that, especially if this isn't meant to be a generic solution. The primary benefit of SoA data is cache performance anyway, not the particular syntax of whatever containers you're using, and nothing besides actual SoA data is going to get you that.

With range-v3 (not in C++17 :-/ ), you may use Projection or transformation view:
ranges::accumulate(potato_farm, 0., ranges::v3::plus{}, &Potato::weight);
or
auto weightsView = potato_farm | ranges::view::transform([](auto& p) { return p.weight; });
ranges::accumulate(weightsView, 0.);

Is reading inactive union member of the same type as active one well-defined? [duplicate]

This question already has an answer here:
Accessing same-type inactive member in unions
(1 answer)
Closed 6 years ago.
Consider the following structure:
struct vec4
{
union{float x; float r; float s;};
union{float y; float g; float t;};
union{float z; float b; float p;};
union{float w; float a; float q;};
};
Something like this seems to be used in e.g. GLM to provide GLSL-like types like vec4, vec2 etc..
But although the intended usage is to make this possible
vec4 a(1,2,4,7);
a.x=7;
a.b=a.r;
, it seems to be undefined behavior, because, as quoted here,
In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.
Wouldn't it be better to just define the structure as something like the following?
struct vec4
{
float x,y,z,w;
float &r,&g,&b,&a;
float &s,&t,&p,&q;
vec4(float X,float Y,float Z,float W)
:x(X),y(Y),z(Z),w(W),
r(x),g(y),b(z),a(w),
s(x),t(y),p(z),q(w)
{}
vec4()
:r(x),g(y),b(z),a(w),
s(x),t(y),p(z),q(w)
{}
vec4(const vec4& rhs)
:x(rhs.x),y(rhs.y),z(rhs.z),w(rhs.w),
r(x),g(y),b(z),a(w),
s(x),t(y),p(z),q(w)
{}
vec4& operator=(const vec4& rhs)
{
x=rhs.x;
y=rhs.y;
z=rhs.z;
w=rhs.w;
return *this;
}
};
Or am I working around a non-existent issue? Is there maybe some special statement allowing access to identically-typed inactive union members?

I think the quote you are referring to is directed at having different types within the union.
struct foo {
union {
float x;
int y;
double z;
};
};
These are different data, conveniently stored in the same structure; unions are not supposed to be a casting mechanism.
GLM's approach uses the same data and uses the union as an aliasing mechanism.
Your approach might be 'better' C++, but it's worse 'engineering'. Vector math needs to be fast, and the smaller the better in this case.
Your implementation makes the vector at least 3 times bigger: sizeof(glm::vec4) is 16, while sizeof(your_vec4) is 48 with 4-byte pointers (and typically 80 with 8-byte pointers). Ouch. If you were processing a large amount of these, which is often the case, you would get correspondingly more cache misses with your_vec4.
I think you are right, though: GLM's use of unions as aliases is a bit much. I'm not sure whether it's undefined, but I've seen this type of thing a lot without much issue, and GLM is widely used.
I don't really see the need to emulate GLSL in C++, and struct { float x,y,z,w; } would be better (at least in my mind).
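The size argument can be checked at compile time. The sketch below uses hypothetical types AliasVec4 and RefVec4 (neither is GLM's actual definition). It compares member-function aliases, which cost no storage, against the reference-member layout from the question, reduced to one alias set.

```cpp
#include <cassert>

// Aliases as member functions: no union, no extra storage.
struct AliasVec4 {
    float x, y, z, w;

    float& r() { return x; }
    float& g() { return y; }
    float& b() { return z; }
    float& a() { return w; }
};
static_assert(sizeof(AliasVec4) == 4 * sizeof(float), "aliases must not add storage");

// Aliases as reference members, as proposed in the question.
struct RefVec4 {
    float x, y, z, w;
    float &r, &g, &b, &a;

    RefVec4(float X, float Y, float Z, float W)
        : x(X), y(Y), z(Z), w(W), r(x), g(y), b(z), a(w) {}
};
// Not guaranteed by the standard, but true on common implementations,
// where a reference member is stored like a pointer.
static_assert(sizeof(RefVec4) > sizeof(AliasVec4), "reference members take space here");
```

The function-call syntax v.r() is less GLSL-like than v.r, which is presumably why GLM chose unions. The trade-off is between syntax and staying inside what the standard clearly defines.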

Passing multiple variables back from a single function?

I have an assignment (see below) for a beginners' C++ class, where I am asked to pass 2 values back from a single function. I am fairly confident in my understanding of how to use functions and of the general structure the program should have, but I am having trouble figuring out how to pass two variables back to "main" from the function.
Assignment:
Write a program that simulates an airplane race. The program will display a table showing the speed in km/hour and distance in km traveled by two airplanes every second until one of them has gone 10 kilometers.
These are the requirements for the program:
-The program will use a function that has the following parameters: time and acceleration.
-The function will pass back two data items: speed and distance.

You have two options (well, three really, but I'm leaving pointers out).
1. Take references to output arguments and assign them within the function.
2. Return a data structure which contains all of the return values.
Which option is best depends on your program. If this is a one-off function that isn't called from many places, then you may choose to use option #1. I assume by "speed" you mean the "constant velocity" which is reached after "time" of acceleration.
void calc_velocity_profile(double accel_time,
double acceleration,
double &out_velocity, // these last two are
double &out_distance); // assigned in the function
If this is a more general purpose function and/or a function which will be called by many clients I would probably prefer option #2.
struct velocity_profile {
double velocity;
double distance;
};
velocity_profile calc_velocity_profile(double accel_time, double acceleration);
Everything being equal, I prefer option #2. Given the choice, I like a function which returns a value instead of a function which mutates its input.
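A possible body for the struct-returning version, as a sketch only. It assumes the airplane starts from rest with constant acceleration and that units are consistent (the assignment states neither), and VelocityProfile is just a renamed illustration of the struct above.

```cpp
#include <cassert>

struct VelocityProfile {
    double velocity;  // speed reached after accelerating for `accel_time`
    double distance;  // distance covered during that time
};

// Assumed model: start from rest, constant acceleration.
//   v = a * t
//   d = 0.5 * a * t^2
VelocityProfile calc_velocity_profile(double accel_time, double acceleration) {
    assert(accel_time >= 0.0);
    VelocityProfile result;
    result.velocity = acceleration * accel_time;
    result.distance = 0.5 * acceleration * accel_time * accel_time;
    return result;
}
```

The caller reads both results by name, for example calc_velocity_profile(t, a).distance, which is harder to mix up than two positional output arguments.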

2017 Update: This is discussed in the C++ Core Guidelines:
F.21 To return multiple "out" values, prefer returning a tuple or struct
However, I would lean towards returning a struct over a tuple due to named, order-independent access that is encapsulated and reusable as an explicit strong type.
In the special case of returning a bool and a T, where the T is only filled if the bool is true, consider returning a std::optional<T>. See this CppCon 2017 video for an extended discussion.
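As a sketch of the std::optional case (C++17 or later), consider a hypothetical helper time_to_cover that has no meaningful answer for some inputs. It assumes motion from rest under constant acceleration; that model is an illustration, not part of the assignment.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Time needed to cover `distance` starting from rest at constant `acceleration`.
// Returns std::nullopt when no answer exists, instead of a bool plus an out value.
std::optional<double> time_to_cover(double distance, double acceleration) {
    if (distance < 0.0 || acceleration <= 0.0) {
        return std::nullopt;
    }
    // From d = 0.5 * a * t^2, solved for t.
    return std::sqrt(2.0 * distance / acceleration);
}
```

The caller writes if (auto t = time_to_cover(d, a)) { use(*t); }, so the "valid" flag and the value cannot get out of sync.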
Struct version:
struct SpeedInfo{
float speed;
float distance;
};
SpeedInfo getInfo()
{
SpeedInfo si;
si.speed = //...
si.distance = //...
return si;
}
The benefit of this is that you get an encapsulated type with named access.
Reference version:
void getInfo(float& speed, float& distance)
{
speed = //...
distance = //...
}
You have to pass in the output vars:
float s;
float d;
getInfo(s, d);
Pointer version:
void getInfo(float* speed, float* distance)
{
if(speed)
{
*speed = //...
}
if(distance)
{
*distance= //...
}
}
Pass the memory address of the output variable:
float s;
float d;
getInfo(&s, &d);
Pointer version is interesting because you can just pass a nullptr/NULL/0 for things you aren't interested in; this can become useful when you are using such a function that potentially takes a lot of params, but are not interested in all the output values. e.g:
float d;
getInfo(nullptr, &d);
This is something which you can't do with references, although they are safer.

There is already such a data structure in C++ that is named as std::pair. It is declared in header <utility>. So the function could look the following way
std::pair<int, int> func( int time, int acceleration )
{
// some calculations
std::pair<int, int> ret_value;
ret_value.first = speed_value;
ret_value.second = distance_value;
return ( ret_value );
}
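A self-contained variant of the std::pair approach, as a sketch. The function name speed_and_distance and the physics (start from rest, constant acceleration) are assumptions made for illustration, and double is used instead of int to avoid truncation.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// first  = speed after `time` seconds
// second = distance covered in that time
// Assumed model: start from rest, constant acceleration.
std::pair<double, double> speed_and_distance(double time, double acceleration) {
    return std::make_pair(acceleration * time,
                          0.5 * acceleration * time * time);
}
```

At the call site, std::tie(speed, distance) = speed_and_distance(t, a); unpacks both values into existing variables. The drawback compared with a struct is that first and second carry no meaning on their own, so the order has to be documented.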

Overloading operator[] to start at 1 and performance overhead

I am doing some C++ computational mechanics (don't worry, no physics knowledge required here) and there is something that really bothers me.
Suppose I want to represent a 3D math Vector (nothing to do with std::vector):
class Vector {
public:
Vector(double x=0., double y=0., double z=0.) {
coordinates[0] = x;
coordinates[1] = y;
coordinates[2] = z;
}
private:
double coordinates[3];
};
So far so good. Now I can overload operator[] to extract coordinates:
double& Vector::operator[](int i) {
return coordinates[i] ;
}
So I can type:
Vector V;
… //complex computation with V
double x1 = V[0];
V[1] = coord2;
The problem is, indexing from 0 is NOT natural here. I mean, when sorting arrays, I don't mind, but the fact is that the conventional notation in every paper, book or whatever always subscripts coordinates beginning with 1.
It may seem a quibble, but the fact is that in formulas, it always takes a double-take to understand what we are talking about. Of course, this is much worse with matrices.
One obvious solution is just a slightly different overloading :
double& Vector::operator[](int i) {
return coordinates[i-1] ;
}
so I can type
double x1 = V[1];
V[2] = coord2;
It seems perfect except for one thing: this i-1 subtraction, which seems a good candidate for a small overhead. Very small, you would say, but I am doing computational mechanics, so this is typically something we couldn't afford.
So now (finally) my question: do you think a compiler can optimize this, or is there a way to make it optimize ? (templates, macro, pointer or reference kludge...)
Logically, in
double xi = V[i];
the integer between the bracket being a literal most of the time (except in 3-iteration for loops), inlining operator[] should make it possible, right ?
(sorry for this looong question)
EDIT:
Thanks for all your comments and answers
I kind of disagree with people telling me that we are used to 0-indexed vectors.
From an object-oriented perspective, I see no reason for a math Vector to be 0-indexed just because it is implemented with a 0-indexed array. We're not supposed to care about the underlying implementation. Now, suppose I don't care about performance and use a map to implement the Vector class. Then I would find it natural to map '1' to the '1st' coordinate.
That said, I tried it out with 1-indexed vectors and matrices, and after writing some code, I find that it does not interact nicely whenever there is an array around. I thought Vector and containers (std::array, std::vector...) would not interact often (meaning, transferring data between one another), but it seems I was wrong.
Now I have a solution that I think is less controversial (please give me your opinion):
Every time I use a Vector in some physical context, I think of using an enum :
enum Coord {
x = 0,
y = 1,
z = 2
};
Vector V;
V[x] = 1;
The only disadvantage I see is that these x, y and z can be redefined without even a warning...
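One way to limit that name-clash risk is to nest the enum in a scope, so the enumerators have to be written with a prefix. This is a sketch only: the Axis wrapper is a hypothetical name, and taking Axis::Index as the parameter type is a design choice that rejects plain integers.

```cpp
#include <cassert>

// Enumerators live inside Axis, so a local variable named x cannot hide them silently.
struct Axis {
    enum Index { x = 0, y = 1, z = 2 };
};

class Vector {
public:
    Vector(double x = 0., double y = 0., double z = 0.) {
        coordinates[0] = x;
        coordinates[1] = y;
        coordinates[2] = z;
    }

    // Only Axis::x, Axis::y, Axis::z are accepted; V[1] does not compile.
    double& operator[](Axis::Index i) { return coordinates[i]; }

private:
    double coordinates[3];
};
```

Usage then reads V[Axis::x] = 1;. If integer indexing is still needed for loops, a second overload taking int could be added alongside.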

This one should be measured or verified by looking at the disassembly, but my guess is: The getter function is tiny and its arguments are constant. There is a high chance the compiler will inline the function and constant-fold the subtraction. In that case the runtime cost would be zero.
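A small sketch that shows the subtraction is foldable in principle: with constexpr, a 1-based lookup with a literal index can be evaluated entirely at compile time. This does not prove what an optimizer does with the non-constexpr original, which still has to be checked in the disassembly. The constant unit_y is a hypothetical example value.

```cpp
#include <cassert>

class Vector {
public:
    constexpr Vector(double x, double y, double z) : coordinates{x, y, z} {}

    // 1-based read access; usable in constant expressions.
    constexpr double operator[](int i) const { return coordinates[i - 1]; }

private:
    double coordinates[3];
};

constexpr Vector unit_y(0.0, 1.0, 0.0);

// Evaluated by the compiler: no subtraction is left for run time here.
static_assert(unit_y[2] == 1.0, "1-based index resolved at compile time");
```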

Why not to try this:
class Vector {
public:
Vector(double x=0., double y=0., double z=0.) {
coordinates[1] = x;
coordinates[2] = y;
coordinates[3] = z;
}
private:
double coordinates[4];
};
If you are not instantiating your objects in quantities of millions, then the memory waste might be affordable.

Have you actually profiled it or examined the generated code? That's how this question is answered.
If the operator[] implementation is visible then this is likely to be optimized to have zero overhead.

I recommend you define this in the header (.h) for your class. If you define it in the .cpp then the compiler can't optimize as much. Also, your index should not be an "int" which can have negative values... make it a size_t:
class Vector {
// ...
public:
double& operator[](const size_t i) {
return coordinates[i-1] ;
}
};

You cannot say anything objective about performance without benchmarking. On x86, this subtraction can be compiled using relative addressing, which is very cheap. If operator[] is inlined, then the overhead is zero—you can encourage this with inline or with compiler-specific instructions such as GCC’s __attribute__((always_inline)).
If you must guarantee it, and the offset is a compile-time constant, then using a template is the way to go:
template<size_t I>
double& Vector::get() {
return coordinates[I - 1];
}
double x = v.get<1>();
For all practical purposes, this is guaranteed to have zero overhead thanks to constant-folding. You could also use named accessors:
double Vector::x() const { return coordinates[0]; }
double Vector::y() const { return coordinates[1]; }
double Vector::z() const { return coordinates[2]; }
double& Vector::x() { return coordinates[0]; }
double& Vector::y() { return coordinates[1]; }
double& Vector::z() { return coordinates[2]; }
And for loops, iterators:
const double* Vector::begin() const { return coordinates; }
const double* Vector::end() const { return coordinates + 3; }
double* Vector::begin() { return coordinates; }
double* Vector::end() { return coordinates + 3; }
// (x, y, z) -> (x + 1, y + 1, z + 1)
for (auto& i : v) ++i;
Like many of the others here, however, I disagree with the premise of your question. You really should simply use 0-based indexing, as it is more natural in the realm of C++. The language is already very complex, and you need not complicate things further for those who will maintain your code in the future.
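Putting the template accessor and the iterators together in one class might look like the sketch below. The static_assert is an addition not present in the snippets above; it turns an out-of-range constant index into a compile error.

```cpp
#include <cassert>
#include <cstddef>

class Vector {
public:
    Vector(double x = 0., double y = 0., double z = 0.) {
        coordinates[0] = x;
        coordinates[1] = y;
        coordinates[2] = z;
    }

    // 1-based access with the index as a template argument,
    // so I - 1 is a compile-time constant.
    template <std::size_t I>
    double& get() {
        static_assert(I >= 1 && I <= 3, "Vector index must be 1, 2 or 3");
        return coordinates[I - 1];
    }

    // Iterators for loops that visit every coordinate.
    double* begin() { return coordinates; }
    double* end() { return coordinates + 3; }

private:
    double coordinates[3];
};
```

With this, v.get<1>() reads the first coordinate, v.get<4>() fails to compile, and for (double& c : v) covers the loop case without any index at all.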

Seriously, benchmark this all three ways (ie, compare the subtraction and the double[4] methods to just using zero-based indices in the caller).
It's entirely possible you'll get a huge win from forcing 16-byte alignment on some cache architectures, and equally possible the subtraction is effectively free on some compiler/instruction set/code path combinations.
The only way to tell is to benchmark realistic code.

Class design: arrays vs multiple variables

I have a bit of a theoretical question; however, it is a problem I sometimes face when designing classes, and I see it done differently when reading others' code. Which of the following would be better and why:
example 1:
class Color
{
public:
Color(float, float, float);
~Color();
friend bool operator==(Color& lhs, Color& rhs);
void multiply(Color);
// ...
float get_r();
float get_g();
float get_b();
private:
float color_values[3];
};
example 2:
class Color
{
public:
// as above
private:
float r;
float g;
float b;
};
Is there a general rule one should follow in cases like this or is it just up to a programmer and what seems to make more sense?

Both!
Use this:
class Color {
// ...
private:
union {
struct {
float r, g, b;
};
float c[3];
};
};
Then c[0] will be equivalent to r, et cetera.

It depends, do you intend to iterate over the whole array ?
In that case, I think solution 1 is more appropriate.
It is very useful to have an array like that when you have functions that operate in a loop on the data
e.g.
void BumpColors(float idx)
{
for (int i = 0; i < 3; ++i)
color_values[i] += idx;
}
vs
void BumpColors(float idx)
{
color_values[0] += idx;
color_values[1] += idx;
color_values[2] += idx;
}
Of course this is trivial, and I think it really is a matter of preference. On some rare occasions you might have APIs that take a pointer to the data, though, and while you can do
awesomeAPI((float*)&r);
I would much prefer doing
awesomeAPI((float*)&color_values[0]);
because the array guarantees its contiguity, whereas you can break the contiguity by mistakenly adding another, unrelated member variable after float r.
Performance wise there would be no difference.
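A sketch of how the array-backed version can still offer named access. The member names values_, bump and data are hypothetical, and the getters are simplified compared with the question's get_r() style.

```cpp
#include <cassert>

class Color {
public:
    Color(float r, float g, float b) : values_{r, g, b} {}

    // Named read access keeps call sites readable.
    float r() const { return values_[0]; }
    float g() const { return values_[1]; }
    float b() const { return values_[2]; }

    // Loop-style operations stay short because storage is an array.
    void bump(float delta) {
        for (float& v : values_) {
            v += delta;
        }
    }

    // Contiguous pointer for APIs that expect float[3].
    const float* data() const { return values_; }

private:
    float values_[3];
};
```

Since the storage is private, switching between the array and three separate floats later only affects the inside of the class.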

I'd say the second one is the best one.
First, the data your variables contain isn't supposed (physically) to be in an array. If you had, for example, a class with 3 students, no more, no less, you'd put them in an array, because they are an array of students, but here, it's just colors.
Second, someone who reads your code can also understand really fast in the second case what your variables contain (r is red, etc.). That isn't the case with an array.
Third, you'll have fewer bugs: you won't have to remember "oh, in my array, red is 0, g is 1, b is 2", and you won't replace by mistake
return color_values[0]
by
return color_values[1]
in your code.

I think that you are right: "It is just up to a programmer and what seems to make more sense." If this were my program, I would choose one form or the other without worrying too much about it, then write some other parts of the program, then revisit the matter later.
One of the benefits of class-oriented design is that it makes internal implementation details of this kind private, which makes it convenient to alter them later.
I think that your question does matter, only I doubt that one can answer it well until one has written more code. In the abstract, there are only three elements, and the three have names -- red, green and blue -- so I think that you could go either way with this. If forced to choose, I choose example 2.

Is there a general rule one should follow in cases like this or is it just up to a programmer and what seems to make more sense?
It's definitely up to the programmer and whatever makes more sense.
In your case, the second option seems more appropriate. After all, logically thinking, your member isn't an array of values, but values for r, g and b.

Advantages of using an array:
Maintainability: You can use the values in the array to loop
Maintainability: When a value should be added (like yellow?), then you don't have to change a lot of code.
Disadvantage:
Readability: Separate variables have clearer names (namely r, g, b in this case).
In your case the r, g, b variables are probably best, since it's unlikely that a color will be added, and looping over 3 elements is probably less important than readability.

Sometimes a programmer will use an array ( or data structure )
in order to save the data faster to disk (or memory) using 1 write operation.
This is especially useful if you are reading and writing a lot of data.
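A sketch of what such a single-write save could look like, assuming a trivially copyable struct and hypothetical helpers save_colors and load_colors. The resulting bytes depend on the platform's float format, endianness and struct padding, so this suits caches and same-machine files rather than a portable file format.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <type_traits>
#include <vector>

struct Color {
    float r, g, b;
};
static_assert(std::is_trivially_copyable<Color>::value,
              "raw byte I/O needs a trivially copyable type");

// Writes the whole array with one write call.
void save_colors(std::ostream& out, const std::vector<Color>& colors) {
    out.write(reinterpret_cast<const char*>(colors.data()),
              static_cast<std::streamsize>(colors.size() * sizeof(Color)));
}

// Reads `count` colors back with one read call.
std::vector<Color> load_colors(std::istream& in, std::size_t count) {
    std::vector<Color> colors(count);
    in.read(reinterpret_cast<char*>(colors.data()),
            static_cast<std::streamsize>(count * sizeof(Color)));
    return colors;
}
```

The same functions work with a std::ofstream or std::ifstream opened with std::ios::binary. The in-memory layout of the members (array or separate floats) does not matter here as long as the type stays trivially copyable.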
