Is there a significant inherent cost of object instantiation in C++? - c++

I was recently told in a code review (by an older and wiser C++ developer) to rewrite a class I'd written, turning it into a set of static methods instead. He justified this by saying that although my object did contain a very small amount of internal state, it could be derived at runtime anyway, and if I changed to static methods I'd avoid the cost of instantiating objects all over the place.
I have now made this change, but it got me thinking: what is the cost of instantiation in C++? I'm aware that in managed languages there's the cost of garbage collecting the object, which would be significant. However, my C++ object was simply on the stack. It didn't contain any virtual methods, so there would be no runtime function lookup cost. I'd used the new C++11 delete mechanism to delete the default copy/assignment operators, so there was no copying involved. It was just a simple object with a constructor that did a small amount of work (required anyway with static methods) and a destructor which did nothing. Can anyone tell me what these instantiation costs would be? (The reviewer is a bit intimidating and I don't want to look stupid by asking him!) ;-)

Short answer: object instantiation is inherently cheap, but it can get expensive in certain cases.
Long Answer
In C++ the cost of instantiating an object is the same as instantiating a struct in C. An object is just a block of memory big enough to store the v-table pointer (if the class has virtual functions) and all the data members. Methods consume no per-object memory; the v-table itself exists once per class.
A non-virtual method is a simple function with an implicit this as its first parameter. Calling a virtual function is a bit more complicated, since it must do a v-table lookup in order to know which function of which class to call.
This means that instantiating an object on the stack involves a simple decrement of the stack pointer (for a full descending stack).
When an object is instantiated on the heap, the cost can go up substantially. But this is something inherent with any heap related allocation. When allocating memory on the heap, the heap needs to find a free block big enough to hold your object. Finding such a block is a non-constant time operation and can be expensive.
C++ constructors may allocate more memory for certain pointer data members, and those allocations are normally heap allocations. This is further compounded if those data members perform heap allocations themselves. All of this can add up to a substantial number of instructions.
So the bottom line is that the cost depends on how and where you are instantiating the object, and on what its constructor does.

If your object-type must invoke a non-trivial constructor and destructor during its lifetime, then the cost is going to be the minimum cost of creating any C++ object that has a non-trivial constructor and destructor. Making the rest of your methods static will not reduce that cost. The "price" of space will be at least 1 byte since your class is not a base class of a derived class, and the only cost saving in the static method calls will be the omission of the implicit this pointer passed as the hidden first argument of the call, something that is required for non-static member functions.
If the methods your reviewer is asking you to re-designate as static never touch the non-static data-members of your class-type, then the passing of the implicit this pointer is a wasted resource, and the reviewer has a good point. Otherwise, you would have to add an argument to the static methods that would take the class-type as either a reference or pointer, nullifying the gained performance from the omission of the implicit this pointer.

Probably not a lot, and I'd be amazed if it were any sort of bottleneck. But there's the principle of the thing if nothing else.
However, you should ask the guy; never be afraid to do that. It's also not entirely clear that losing the stored state and instead deriving it each time (if that's what you're doing now) isn't going to make things worse. And, if it's not, you'd think a namespace of free functions would be better than static methods.
A test case or example would make this easier to answer categorically, beyond "you should ask him".

It depends on what your application does. Is it a real-time system on a device with limited memory? If not, most of the time object instantiation won't be an issue, unless you are instantiating millions of these and keeping them around, or some similarly weird design.
Most systems will have a lot more bottlenecks such as:
user input
network calls
database access
computation intensive algos
thread switching costs
system calls
I think that in most cases the design benefit of encapsulation into a class trumps the small cost of instantiation. Of course there can be that 1% of cases where this doesn't hold, but is yours one of those?

As a general rule, if a function can be made static it probably should be. It is cheaper. How much cheaper? That depends on what the object does in its constructor, but the base cost of constructing a C++ object is not that high (dynamic memory allocation of course is more expensive).
The point is not to pay for that which you do not need. If a function can be static, why make it a member function? It makes no sense to be a member function in that case. Will the penalty of creating an object kill the performance of your application? Probably not, but again, why pay for what you don't need?

As others have suggested talk to your colleague and ask him to explain his reasoning. If practical, you should investigate with a small test program the performance of the two versions. Doing both of these will help you grow as a programmer.
In general I agree with the advice to make a member function static if practical. Not because of performance reasons but because it reduces the amount of context you need to remember to understand the behaviour of the function.
It is worth noting that there is one case where using a member function will result in faster code: when the compiler can perform inlining. This is kind of an advanced topic, but it is stuff like this that makes it hard to write categorical rules about programming.
#include <algorithm>
#include <iostream>
#include <vector>
#include <stdlib.h>
#include <time.h>
bool int_lt(int a, int b)
{
    return a < b;
}

int main()
{
    size_t const N = 50000000;
    std::vector<int> c1;
    c1.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        int r = rand();
        c1.push_back(r);
    }
    std::vector<int> c2 = c1;
    std::vector<int> c3 = c1;
    clock_t t1 = clock();
    std::sort(c2.begin(), c2.end(), std::less<int>());
    clock_t t2 = clock();
    std::sort(c3.begin(), c3.end(), int_lt);
    clock_t t3 = clock();
    std::cerr << (t2 - t1) / double(CLOCKS_PER_SEC) << '\n';
    std::cerr << (t3 - t2) / double(CLOCKS_PER_SEC) << '\n';
    return 0;
}
On my i7 Linux box, because g++ can't inline the function int_lt but can inline std::less<int>::operator(), the non-member-function version is about 50% slower.
> g++-4.5 -O2 p3.cc
> ./a.out
3.85
5.88
To understand why there is such a big difference, you need to consider what type the compiler infers for the comparator. In the case of int_lt it infers the function pointer type bool (*)(int, int), whereas with std::less it infers std::less<int>. With the function pointer, the function to be called is only known at run time, which means it is impossible for the compiler to inline its definition at compile time. In contrast, with std::less the compiler has access to the type and its definition at compile time, so it can inline std::less<int>::operator(). That makes a significant difference to performance in this case.
Is this behaviour only related to templates? No, it relates to a loss of abstraction when passing functions as objects. A function pointer does not include as much information as a function object type for the compiler to make use of. Here is a similar example using no templates (well aside from std::vector for convenience).
#include <iostream>
#include <time.h>
#include <vector>
#include <stdlib.h>
typedef long (*fp_t)(long, long);
inline long add(long a, long b)
{
    return a + b;
}

struct add_fn {
    long operator()(long a, long b) const
    {
        return a + b;
    }
};

long f(std::vector<long> const& x, fp_t const add, long init)
{
    for (size_t i = 0, sz = x.size(); i < sz; ++i)
        init = add(init, x[i]);
    return init;
}

long g(std::vector<long> const& x, add_fn const add, long init)
{
    for (size_t i = 0, sz = x.size(); i < sz; ++i)
        init = add(init, x[i]);
    return init;
}

int main()
{
    size_t const N = 5000000;
    size_t const M = 100;
    std::vector<long> c1;
    c1.reserve(N);
    for (size_t i = 0; i < N; ++i) {
        long r = rand();
        c1.push_back(r);
    }
    std::vector<long> c2 = c1;
    std::vector<long> c3 = c1;
    clock_t t1 = clock();
    for (size_t i = 0; i < M; ++i)
        long s2 = f(c2, add, 0);
    clock_t t2 = clock();
    for (size_t i = 0; i < M; ++i)
        long s3 = g(c3, add_fn(), 0);
    clock_t t3 = clock();
    std::cerr << (t2 - t1) / double(CLOCKS_PER_SEC) << '\n';
    std::cerr << (t3 - t2) / double(CLOCKS_PER_SEC) << '\n';
    return 0;
}
Cursory testing indicates that the function-pointer version is more than twice as slow as the function-object version.
> g++ -O2 p5.cc
> ./a.out
0.87
0.32
Bjarne Stroustrup provided an excellent lecture recently on C++11 which touches on this. You can watch it at the link below.
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Keynote-Bjarne-Stroustrup-Cpp11-Style

Related

Efficient functor dispatcher

I need help understanding two different versions of functor dispatcher, see here:
#include <cmath>
#include <complex>
double* psi;
double dx = 0.1;
int range;
struct A
{
    double operator()(int x) const
    {
        return dx * (double)x * x;
    }
};

template <typename T>
void dispatchA()
{
    constexpr T op{};
    for (int i = 0; i < range; i++)
        psi[i] += op.operator()(i);
}

template <typename T>
void dispatchB(T op)
{
    for (int i = 0; i < range; i++)
        psi[i] += op.operator()(i);
}

int main(int argc, char** argv)
{
    range = argc;
    psi = new double[range];
    dispatchA<A>();
    // dispatchB<A>(A{});
}
Live at https://godbolt.org/z/93h5T46oq
The dispatcher will be called many times in a big loop, so I need to make sure that I'm doing it right.
Both versions seem to me unnecessarily complex, since the type of the functor is known at compile time.
DispatchA, because it unnecessarily creates a (constexpr) object.
DispatchB, because it passes the object over and over.
Of course those could be solved by a) making the call a static function in the functor,
but static functions are bad practice, right?
b) making a static instance of the functor inside the dispatcher, but then the lifetime of the object grows to the lifetime of the program.
That being said, I don't know enough assembly to meaningfully compare the two approaches.
Is there a more elegant/efficient approach?
This likely isn't the answer you are looking for, but the general advice you are going to get from almost any seasoned developer is to just write the code in a natural/understandable way, and only optimize if you need to.
This may sound like a non-answer, but it's actually good advice.
The majority of the time, the cost you may (if at all) incur due to small decisions like this will be inconsequential overall. Generally, you'll see more gains when optimizing an algorithm more so than optimizing a few instructions. There are, indeed, exceptions to this rule -- but generally such optimizations are part of a tight loop -- and this is the type of thing you can retroactively look at by profiling and benchmarking.
It's better to write code in a way that can be maintained in the future, and only really optimizing it if this proves to be an issue down the line.
For the code in question, both snippets produce identical assembly when optimized, meaning that both approaches should perform equally well in practice (provided the calling characteristics are the same). But even then, benchmarking would be the only real way to verify this.
Since the dispatchers are function template definitions, they are implicitly inline, and their definition will always be visible before invoking. Often, this is enough for an optimizer to both introspect and inline such code (if it deems this is better than not).
... static functions are bad practice, right?
No; static functions are not bad practice. Like any utility in C++, they can surely be misused -- but there is nothing inherently bad about them.
DispatchA, ... unnecessarily creates an (constexpr) object
constexpr objects are constructed at compile-time -- and so you would not see any real cost to this other than perhaps a bit more space on the stack being reserved. This cost would really be minimal.
You could also make this static constexpr instead if you really wanted to avoid this. Although logically the "lifetime of the object grows to the lifetime of the program" as you mentioned, constexpr objects cannot have exit-time behavior in C++, so the cost is virtually nonexistent.
Assuming A is stateless, as it is in your example, and has no non-static data members, they are identical. The compiler is smart enough to see that construction of the object is a no-op and omits it. Let's clear up your code a bit to get clean assembly we can easily reason about:
struct A {
    double operator()(int) const noexcept;
};

void useDouble(double);
int genInt();

void dispatchA() {
    constexpr A op{};
    auto const range = genInt();
    for (int i = 0; i < range; i++) useDouble(op(genInt()));
}

void dispatchB(A op) {
    auto const range = genInt();
    for (int i = 0; i < range; i++) useDouble(op(genInt()));
}
Here, where input comes from and where the output goes is abstracted away. Generated assembly can only differ because of how the op object is created. Compiling it with GCC 11.1, I get identical assembly generation. No creation or initialization of A takes place.

Can you call the destructor without calling the constructor?

I've been trying not to initialize memory when I don't need to, and am using malloc arrays to do so:
This is what I've run:
#include <iostream>
struct test
{
    int num = 3;
    test() { std::cout << "Init\n"; }
    ~test() { std::cout << "Destroyed: " << num << "\n"; }
};

int main()
{
    test* array = (test*)malloc(3 * sizeof(test));
    for (int i = 0; i < 3; i += 1)
    {
        std::cout << array[i].num << "\n";
        array[i].num = i;
        // new (array + i) test; // placement new is not being used
        std::cout << array[i].num << "\n";
    }
    for (int i = 0; i < 3; i += 1)
    {
        (array + i)->~test();
    }
    free(array);
    return 0;
}
Which outputs:
0 -> 0
0 -> 1
0 -> 2
Destroyed: 0
Destroyed: 1
Destroyed: 2
Despite not having constructed the array indices. Is this "healthy"? That is to say, can I simply treat the destructor as "just a function"?
(besides the fact that the destructor has implicit knowledge of where the data members are located relative to the pointer I specified)
Just to specify: I'm not looking for warnings about the proper usage of C++. I would simply like to know if there are things I should be wary of when using this no-constructor method.
(footnote: the reason I don't want to use constructors is that many times memory simply does not need to be initialized, and doing so is slow)
No, this is undefined behaviour. An object's lifetime starts after the call to a constructor is completed, hence if a constructor is never called, the object technically never exists.
This likely "seems" to behave correctly in your example because your struct is trivial (int::~int is a no-op).
You are also leaking memory (destructors destroy the given object, but the original memory allocated via malloc still needs to be freed).
Edit: You might want to look at this question as well, as this is an extremely similar situation, simply using stack allocation instead of malloc. This gives some of the actual quotes from the standard around object lifetime and construction.
I'll add this as well: in the case where you don't use placement new and it clearly is required (e.g. struct contains some container class or a vtable, etc.) you are going to run into real trouble. In this case, omitting the placement-new call is almost certainly going to gain you 0 performance benefit for very fragile code - either way, it's just not a good idea.
Yes, the destructor is nothing more than a function. You can call it at any time. However, calling it without a matching constructor is a bad idea.
So the rule is: If you did not initialize memory as a specific type, you may not interpret and use that memory as an object of that type; otherwise it is undefined behavior. (with char and unsigned char as exceptions).
Let us do a line by line analysis of your code.
test* array = (test*)malloc(3 * sizeof(test));
This line initializes the pointer array with a memory address provided by the system. Note that the memory it points to has not been initialized as any kind of type. This means you should not treat that memory as any object (even as a scalar like int, let alone your test class type).
Later, you wrote:
std::cout << array[i].num << "\n";
This uses the memory as test type, which violates the rule stated above, leading to undefined behavior.
And later:
(array + i)->~test();
You used the memory as a test type again! Calling the destructor also uses the object! This is also UB.
In your case you are lucky that nothing harmful happens and you get something reasonable. However, the consequences of UB depend entirely on your compiler's implementation. It could even decide to format your disk and still be standard-conforming.
That is to say, can I simply treat the destructor as "just a function"?
No. While it is like other functions in many ways, there are some special features of the destructor. These boil down to a pattern similar to manual memory management. Just as memory allocation and deallocation need to come in pairs, so do construction and destruction. If you skip one, skip the other. If you call one, call the other. If you insist upon manual memory management, the tools for construction and destruction are placement new and explicitly calling the destructor. (Code that uses new and delete combine allocation and construction into one step, while destruction and deallocation are combined into the other.)
Do not skip the constructor for an object that will be used. This is undefined behavior. Furthermore, the less trivial the constructor, the more likely that something will go wildly wrong if you skip it. That is, as you save more, you break more. Skipping the constructor for a used object is not a way to be more efficient — it is a way to write broken code. Inefficient, correct code trumps efficient code that does not work.
One bit of discouragement: this sort of low-level management can become a big investment of time. Only go this route if there is a realistic chance of a performance payback. Do not complicate your code with optimizations simply for the sake of optimizing. Also consider simpler alternatives that might get similar results with less code overhead. Perhaps a constructor that performs no initializations other than somehow flagging the object as not initialized? (Details and feasibility depend on the class involved, hence extend outside the scope of this question.)
One bit of encouragement: If you think about the standard library, you should realize that your goal is achievable. I would present vector::reserve as an example of something that can allocate memory without initializing it.
You currently have UB, as you access a field of a non-existing object.
You can leave the field uninitialized by dropping its default member initializer; the compiler can then easily perform no initialization, for example:
struct test
{
    int num; // no "= 3" default member initializer
    test() { std::cout << "Init\n"; } // num not initialized
    ~test() { std::cout << "Destroyed: " << num << "\n"; }
};
Demo
For readability, you should probably wrap it in a dedicated class, something like:
struct uninitialized_tag {};

struct uninitializable_int
{
    uninitializable_int(uninitialized_tag) {} // no initialization
    uninitializable_int(int num) : num(num) {}
    int num;
};
Demo

C++ smart pointer performance and difference with a simple wrapped pointer

I came across this test that someone did on C++ smart pointers, and I was wondering a couple of things. First of all, I've heard that make_shared and make_unique are faster than normal construction of a shared or unique pointer. But my results, and the results of the guy who created the test, showed that make_unique and make_shared are slightly slower (probably nothing significant).
But I was also wondering: in debug mode, a unique_ptr is about 3 times slower for me than a normal pointer, and indeed also much slower than simply wrapping a pointer in a class myself. In release mode the raw pointers, my wrapped class and unique_ptrs were roughly the same. I was wondering, does unique_ptr do anything special that I would lose if I used my own smart pointer? It seems to be rather heavy; at least in debug mode it seems to be doing a lot. The test is below:
#include <chrono>
#include <iostream>
#include <memory>
static const long long numInt = 100000000;
template <typename T>
struct SmartPointer
{
    SmartPointer(T* pointee) : ptr(pointee) {}
    T* ptr;
    ~SmartPointer() { delete ptr; }
};

int main() {
    auto start = std::chrono::system_clock::now();
    for (long long i = 0; i < numInt; ++i) {
        //int* tmp(new int(i));
        //delete tmp;
        //SmartPointer<int> tmp(new int(i));
        //std::shared_ptr<int> tmp(new int(i));
        //std::shared_ptr<int> tmp(std::make_shared<int>(i));
        //std::unique_ptr<int> tmp(new int(i));
        //std::unique_ptr<int> tmp(std::make_unique<int>(i));
    }
    std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
    std::cout << "time native: " << dur.count() << " seconds" << std::endl;
    system("pause");
}
The link where I found this is at
http://www.modernescpp.com/index.php/memory-and-performance-overhead-of-smart-pointer
As best I can tell, the actual question is:
I was wondering, does unique_ptr do anything special that I would lose if I used my own smart pointer? It seems to be rather heavy; at least in debug mode it seems to be doing a lot.
It is possible that unique_ptr may have more trivial function calls or something like that, which doesn't get fully inlined, leading to worse performance in debug mode. However, as you said yourself, the performance when it matters, with optimizations enabled, is the same.
Even though unique_ptr is the simplest owning smart pointer to write, it still does a lot of things that your trivial wrapper does not:
It allows custom deleters, while ensuring that stateless custom deleters don't use extra space through Empty Base Class Optimization
It handles moves and copies correctly
It handles all kinds of conversions correctly; for instance unique_ptr<Derived> will implicitly convert to unique_ptr<Base>
it's const correct
Although most decent C++ programmers can implement a decent unique_ptr, I don't think most can implement one that is fully correct. And those edge cases will hurt you.
Just use unique_ptr, rolling your own for better performance with optimizations off is not a good reason.

Setting size of custom C++ container as template parameter vs constructor

I've written a fixed-size container (a ring buffer, to be exact) in C++. Currently I'm setting the size of the container in the constructor and then allocate the actual buffer on the heap. However, I've been thinking about moving the size parameter out of the constructor and into the template.
Going from this (RingBuffer fitting 100 integers)
RingBuffer<int> buffer(100);
to this
RingBuffer<int, 100> buffer;
This would allow me to allocate the whole buffer on the stack, which is faster than heap allocation, as far as I know. Mainly it's a matter of readability and maintainability though. These buffers often appear as members of classes. I have to initialize them with a size, so I have to initialize them in the initializer-list of every single constructor of the class. That means if I want to change the capacity of the RingBuffer I have to either remember to change it in every initializer-list or work with awkward static const int BUFFER_SIZE = 100; member variables.
My question is, is there any downside to specifying the container size as a template parameter as opposed to in the constructor? What are the pros and cons of either method?
As far as I know the compiler will generate a new type for each differently-sized RingBuffer. This could turn out to be quite a few. Does that hurt compile times much? Does it bloat the code or prevent optimizations? Of course I'm aware that much of this depends on the exact use case but what are the things I need to be aware of when making this decision?
My question is, is there any downside to specifying the container size as a template parameter as opposed to in the constructor? What are the pros and cons of either method?
If you give the size as template parameter, then it needs to be a constexpr (compile time constant expression). Thus your buffer size cannot depend on any run time characteristics (like user input).
Being a compile time constant opens up doors for some optimizations (loop unrolling and constant folding come to my mind) to be more efficient.
As far as I know the compiler will generate a new type for each differently-sized RingBuffer.
This is true. But I wouldn't worry about that, as having many different types per se won't have any impact on performance or code size (but probably on compile time).
Does that hurt compile times much?
It will make compilation slower. Though I doubt that in your case (this is a pretty simple template) this will even be noticeable. Thus it depends on your definition of "much".
Does it bloat the code or prevent optimizations?
Prevent optimizations? No. Bloat the code? Possibly. That depends on both how exactly you implement your class and what your compiler does. Example:
// doIt is declared before Buffer so the call inside
// doSomethingDifferently can be found when the template is defined.
void doIt(char const* data, size_t size, std::function<void(char)> f);

template<size_t N>
struct Buffer {
    std::array<char, N> data;

    void doSomething(std::function<void(char)> f) {
        for (size_t i = 0; i < N; ++i) {
            f(data[i]);
        }
    }

    void doSomethingDifferently(std::function<void(char)> f) {
        doIt(data.data(), N, f);
    }
};

void doIt(char const* data, size_t size, std::function<void(char)> f) {
    for (size_t i = 0; i < size; ++i) {
        f(data[i]);
    }
}
doSomething might get compiled to (perhaps completely) unrolled loop code, and you'd have a Buffer<100>::doSomething, a Buffer<200>::doSomething and so on, each a possibly large function. doSomethingDifferently might get compiled to little more than a simple jump instruction, so having multiple of those wouldn't be much of an issue. Though your compiler could also implement doSomething similarly to doSomethingDifferently, or the other way around.
So in the end:
Don't try to make this decision depend on performance, optimizations, compile time or code bloat. Decide what's more meaningful in your situation. Will there only ever be buffers with compile time known sizes?
Also:
These buffers often appear as members of classes. I have to initialize them with a size, so I have to initialize them in the initializer-list of every single constructor of the class.
Do you know "delegating constructors"?
As Daniel Jour already said, code bloat is not a huge issue and can be dealt with if needed.
The good thing about having the size as a constexpr is that it allows you to detect some errors at compile time that would otherwise happen at runtime.
This would allow me to allocate the whole buffer on the stack, which is faster than heap allocation, as far as I know.
These buffers often appear as members of classes
This will happen only if the owning class is itself allocated in automatic storage, which is often not the case. Consider the following example:
struct A {
    int myArray[10];
};

struct B {
    B(): dynamic(new A()) {}
    A automatic; // lives in the "stack" (automatic storage)
    A* dynamic;  // points into the "heap"
};

int main() {
    B b1;
    b1;                       // automatic memory
    b1.automatic;             // automatic memory
    b1.automatic.myArray;     // automatic memory
    b1.dynamic;               // automatic memory (the pointer itself)
    (*b1.dynamic);            // dynamic memory
    (*b1.dynamic).myArray;    // dynamic memory

    B* b2 = new B();
    b2;                       // automatic memory (the pointer itself)
    (*b2);                    // dynamic memory
    (*b2).automatic;          // dynamic memory
    (*b2).automatic.myArray;  // dynamic memory
    (*b2).dynamic;            // dynamic memory
    (*(*b2).dynamic).myArray; // dynamic memory
}

Comparison between constant accessors of private members

The main portion of this question concerns the proper and most computationally efficient method of creating a public read-only accessor for a private data member inside of a class. Specifically, using a const reference to access the variable, such as:
class MyClassReference
{
private:
    int myPrivateInteger;
public:
    const int& myIntegerAccessor;
    // Bind the constant accessor to myPrivateInteger.
    MyClassReference() : myIntegerAccessor(myPrivateInteger) {}
};
However, the current established method for solving this problem is to utilize a constant "getter" function as seen below:
class MyClassGetter
{
private:
    int myPrivateInteger;
public:
    int getMyInteger() const { return myPrivateInteger; }
};
The necessity (or lack thereof) for "getters/setters" has already been hashed out time and again on questions such as: Conventions for accessor methods (getters and setters) in C++ That however is not the issue at hand.
Both of these methods offer the same functionality using the syntax:
MyClassGetter a;
MyClassReference b;
int SomeValue = 5;
int A_i = a.getMyInteger(); // Allowed.
a.getMyInteger() = SomeValue; // Not allowed.
int B_i = b.myIntegerAccessor; // Allowed.
b.myIntegerAccessor = SomeValue; // Not allowed.
After discovering this, and finding nothing on the internet concerning it, I asked several of my mentors and professors for which is appropriate and what are the relative advantages/disadvantages of each. However, all responses I received fell nicely into two categories:
I have never even thought of that, but use a "getter" method as it is "Established Practice".
They function the same (They both run with the same efficiency), but use a "getter" method as it is "Established Practice".
While both of these answers were reasonable, they both failed to explain the "why", so I was left unsatisfied and decided to investigate further. I conducted several tests, such as average character usage (roughly the same) and average typing time (again roughly the same), but one test showed an extreme discrepancy between the two methods: a run-time test calling the accessor and assigning the result to an integer. Without any -OX flags (in debug mode), MyClassReference performed roughly 15% faster. However, once an -OX flag was added, both methods ran much faster and with the same efficiency.
My question thus has two parts.
How do these two methods differ, and what causes one to be faster/slower than the others only with certain optimization flags?
Why is it that established practice is to use a constant "getter" function, while using a constant reference is rarely known let alone utilized?
As comments pointed out, my benchmark testing was flawed, and irrelevant to the matter at hand. However, for context it can be located in the revision history.
The answer to question #2 is that sometimes, you might want to change class internals. If you made all your attributes public, they're part of the interface, so even if you come up with a better implementation that doesn't need them (say, it can recompute the value on the fly quickly and shave the size of each instance so programs that make 100 million of them now use 400-800 MB less memory), you can't remove it without breaking dependent code.
With optimization turned on, the getter function should be indistinguishable from direct member access when the code for the getter is just a direct member access anyway. But if you ever want to change how the value is derived to remove the member variable and compute the value on the fly, you can change the getter implementation without changing the public interface (a recompile would fix up existing code using the API without code changes on their end), because a function isn't limited in the way a variable is.
There are semantic/behavioral differences that are far more significant than your (broken) benchmarks.
Copy semantics are broken
A live example:
#include <iostream>

class Broken {
public:
    Broken(int i): read_only(read_write), read_write(i) {}

    int const& read_only;

    void set(int i) { read_write = i; }

private:
    int read_write;
};

int main() {
    Broken original(5);
    Broken copy(original);
    std::cout << copy.read_only << "\n";

    original.set(42);
    std::cout << copy.read_only << "\n";

    return 0;
}
Yields:
5
42
The problem is that when doing a copy, copy.read_only points to original.read_write. This may lead to dangling references (and crashes).
This can be fixed by writing your own copy constructor, but it is painful.
Assignment is broken
A reference cannot be reseated (you can alter the content of its referee but not switch it to another referee), leading to:
int main() {
    Broken original(5);
    Broken copy(4);
    copy = original;
    std::cout << copy.read_only << "\n";

    original.set(42);
    std::cout << copy.read_only << "\n";

    return 0;
}
generating an error:
prog.cpp: In function 'int main()':
prog.cpp:18:7: error: use of deleted function 'Broken& Broken::operator=(const Broken&)'
copy = original;
^
prog.cpp:3:7: note: 'Broken& Broken::operator=(const Broken&)' is implicitly deleted because the default definition would be ill-formed:
class Broken {
^
prog.cpp:3:7: error: non-static reference member 'const int& Broken::read_only', can't use default assignment operator
This can be fixed by writing your own copy assignment operator (one that assigns read_write and leaves the reference alone), but it is painful.
Unless you fix it, Broken can only be used in very restricted ways; you may never manage to put it inside a std::vector for example.
Increased coupling
Giving away a reference to your internals increases coupling. You leak an implementation detail (the fact that you are using an int and not a short, long or long long).
With a getter returning a value, you can switch the internal representation to another type, or even elide the member and compute it on the fly.
This is only significant if the interface is exposed to clients expecting binary/source-level compatibility; if the class is only used internally and you can afford to change all users if it changes, then this is not an issue.
Now that semantics are out of the way, we can speak about performance differences.
Increased object size
While references can sometimes be elided, it is unlikely to ever happen here. This means that each reference member will increase the size of an object by at least sizeof(void*), plus potentially some padding for alignment.
The original class MyClassA has a size of 4 on x86 or x86-64 platforms with mainstream compilers.
The Broken class has a size of 8 on x86 and 16 on x86-64 platforms (the latter because of padding, as pointers are aligned on 8-bytes boundaries).
An increased size hurts CPU cache utilization; with a large number of items you may quickly experience slowdowns because of it (not that it will be easy to have vectors of Broken anyway, given its deleted assignment operator).
Better performance in debug
As long as the implementation of the getter is inline in the class definition, then the compiler will strip the getter whenever you compile with a sufficient level of optimizations (-O2 or -O3 generally, -O1 may not enable inlining to preserve stack traces).
Thus, the performance of access should only vary in debug code, where performance is least necessary (and otherwise so crippled by plenty of other factors that it matters little).
In the end, use a getter. It's established convention for a good number of reasons :)
When you expose a constant reference (or constant pointer), every object also stores a pointer, which makes it bigger. An accessor method, on the other hand, exists only once in the program and will most likely be optimized out (inlined), unless it is virtual or part of an exported interface.
By the way, a getter method can also be made virtual, which a reference member cannot.
To answer question 2:
const_cast<int&>(mcb.myIntegerAccessor) = 4;
is a pretty good reason to hide the value behind a getter function instead. Exposing a const reference is a clever way to get getter-like syntax, but as the cast shows, it lets any caller write straight through it and completely break the class's encapsulation.