Comparison between constant accessors of private members - c++

The main portion of this question is in regards to the proper and most computationally efficient method of creating a public read-only accessor for a private data member inside of a class. Specifically, utilizing a const type & reference to access the variables such as:
class MyClassReference
{
private:
int myPrivateInteger;
public:
const int & myIntegerAccessor;
// Assign myPrivateInteger to the constant accessor.
MyClassReference() : myIntegerAccessor(myPrivateInteger) {}
};
However, the current established method for solving this problem is to utilize a constant "getter" function as seen below:
class MyClassGetter
{
private:
int myPrivateInteger;
public:
int getMyInteger() const { return myPrivateInteger; }
};
The necessity (or lack thereof) for "getters/setters" has already been hashed out time and again on questions such as: Conventions for accessor methods (getters and setters) in C++ That however is not the issue at hand.
Both of these methods offer the same functionality using the syntax:
MyClassGetter a;
MyClassReference b;
int SomeValue = 5;
int A_i = a.getMyInteger(); // Allowed.
a.getMyInteger() = SomeValue; // Not allowed.
int B_i = b.myIntegerAccessor; // Allowed.
b.myIntegerAccessor = SomeValue; // Not allowed.
After discovering this, and finding nothing on the internet concerning it, I asked several of my mentors and professors for which is appropriate and what are the relative advantages/disadvantages of each. However, all responses I received fell nicely into two categories:
I have never even thought of that, but use a "getter" method as it is "Established Practice".
They function the same (They both run with the same efficiency), but use a "getter" method as it is "Established Practice".
While both of these answers were reasonable, as they both failed to explain the "why" I was left unsatisfied and decided to investigate this issue further. While I conducted several tests such as average character usage (they are roughly the same), average typing time (again roughly the same), one test showed an extreme discrepancy between these two methods. This was a run-time test for calling the accessor, and assigning it to an integer. Without any -OX flags (In debug mode), the MyClassReference performed roughly 15% faster. However, once a -OX flag was added, in addition to performing much faster both methods ran with the same efficiency.
My question is thus has two parts.
How do these two methods differ, and what causes one to be faster/slower than the others only with certain optimization flags?
Why is it that established practice is to use a constant "getter" function, while using a constant reference is rarely known let alone utilized?
As comments pointed out, my benchmark testing was flawed, and irrelevant to the matter at hand. However, for context it can be located in the revision history.

The answer to question #2 is that sometimes, you might want to change class internals. If you made all your attributes public, they're part of the interface, so even if you come up with a better implementation that doesn't need them (say, it can recompute the value on the fly quickly and shave the size of each instance so programs that make 100 million of them now use 400-800 MB less memory), you can't remove it without breaking dependent code.
With optimization turned on, the getter function should be indistinguishable from direct member access when the code for the getter is just a direct member access anyway. But if you ever want to change how the value is derived to remove the member variable and compute the value on the fly, you can change the getter implementation without changing the public interface (a recompile would fix up existing code using the API without code changes on their end), because a function isn't limited in the way a variable is.

There are semantic/behavioral differences that are far more significant than your (broken) benchmarks.
Copy semantics are broken
A live example:
#include <iostream>
class Broken {
public:
Broken(int i): read_only(read_write), read_write(i) {}
int const& read_only;
void set(int i) { read_write = i; }
private:
int read_write;
};
int main() {
Broken original(5);
Broken copy(original);
std::cout << copy.read_only << "\n";
original.set(42);
std::cout << copy.read_only << "\n";
return 0;
}
Yields:
5
42
The problem is that when doing a copy, copy.read_only points to original.read_write. This may lead to dangling references (and crashes).
This can be fixed by writing your own copy constructor, but it is painful.
Assignment is broken
A reference cannot be reseated (you can alter the content of its referee but not switch it to another referee), leading to:
int main() {
Broken original(5);
Broken copy(4);
copy = original;
std::cout << copy.read_only << "\n";
original.set(42);
std::cout << copy.read_only << "\n";
return 0;
}
generating an error:
prog.cpp: In function 'int main()':
prog.cpp:18:7: error: use of deleted function 'Broken& Broken::operator=(const Broken&)'
copy = original;
^
prog.cpp:3:7: note: 'Broken& Broken::operator=(const Broken&)' is implicitly deleted because the default definition would be ill-formed:
class Broken {
^
prog.cpp:3:7: error: non-static reference member 'const int& Broken::read_only', can't use default assignment operator
This can be fixed by writing your own copy constructor, but it is painful.
Unless you fix it, Broken can only be used in very restricted ways; you may never manage to put it inside a std::vector for example.
Increased coupling
Giving away a reference to your internals increases coupling. You leak an implementation detail (the fact that you are using an int and not a short, long or long long).
With a getter returning a value, you can switch the internal representation to another type, or even elide the member and compute it on the fly.
This is only significant if the interface is exposed to clients expecting binary/source-level compatibility; if the class is only used internally and you can afford to change all users if it changes, then this is not an issue.
Now that semantics are out of the way, we can speak about performance differences.
Increased object size
While references can sometimes be elided, it is unlikely to ever happen here. This means that each reference member will increase the size of an object by at least sizeof(void*), plus potentially some padding for alignment.
The original class MyClassA has a size of 4 on x86 or x86-64 platforms with mainstream compilers.
The Broken class has a size of 8 on x86 and 16 on x86-64 platforms (the latter because of padding, as pointers are aligned on 8-bytes boundaries).
An increased size can bust up CPU caches, with a large number of items you may quickly experience slow downs due to it (well, not that it'll be easy to have vectors of Broken due to its broken assignment operator).
Better performance in debug
As long as the implementation of the getter is inline in the class definition, then the compiler will strip the getter whenever you compile with a sufficient level of optimizations (-O2 or -O3 generally, -O1 may not enable inlining to preserve stack traces).
Thus, the performance of access should only vary in debug code, where performance is least necessary (and otherwise so crippled by plenty of other factors that it matters little).
In the end, use a getter. It's established convention for a good number of reasons :)

When implementing constant reference (or constant pointer) your object also stores a pointer, which makes it bigger in size. Accessor methods, on the other hand, are instantiated only once in program and are most likely optimized out (inlined), unless they are virtual or part of exported interface.
By the way, getter method can also be virtual.

To answer question 2:
const_cast<int&>(mcb.myIntegerAccessor) = 4;
Is a pretty good reason to hide it behind a getter function. It is a clever way to do a getter-like operation, but it completely breaks abstraction in the class.

Related

In what sense const allows only atomic changes to mutable member variables?

I'm reading Functional Programming in C++ from Ivan Čukić, and I am having a hard time interpreting a point in the summary of Chapter 5:
When you make a member function const, you promise that the function won't change any data in the class (not a bit of the object will change), or that any changes to the object (to members declared as mutable) will be atomic as far as the users of the object are concerned.
If the part in italic was simply are limited to members declared as mutable I would have been happy with it. However, this rewording of mine seems to correspond to what the author put in parenthesis. What is out of parenthesis is what is puzzling me: what is the meaning of atomic in that sentence?
The author is making a claim about best practices, not about the rules of the language.
You can write a class in which const methods alter mutable members in ways that are visible to the user, like this:
struct S {
mutable int value = 0;
int get() const {
return value++;
}
};
const S s;
std::cout << s.get(); // prints 0
std::cout << s.get(); // prints 1
// etc
You can do that, and it wouldn't break any of the rules of the language. However, you shouldn't. It violates the user's expectation that the const method should not change the internal state in an observable way.
There are legitimate uses for mutable members, such as memoization that can speed up subsequent executions of a const member function.
The author suggests that, as a matter of best practices, such uses of mutable members by const member functions should be atomic, since users are likely to expect that two different threads can call const member functions on an object concurrently.
If you violate this guideline, then you're not directly breaking any rules of the language. However, it makes it likely that users will use your class in a way that will cause data races (which are undefined behaviour). It takes away the user's ability to use the const qualifier to reason about the thread-safety of your class.
or that any changes to the object (to members declared as mutable)
will be atomic as far as the users of the object are concerned.
I think the author (or editor) of the book worded his statement poorly there -- const and mutable make no guarantees about thread-safety; indeed, they were part of the language back when the language had no support for multithreading (i.e. back when multithreading specifications were not part of the C++ standard and therefore anything you did with multithreading in your C++ program was therefore technically undefined behavior).
I think what the author intended to convey is that changes to mutable member-variables from with a const-tagged method should be limited only to changes that don't change the object's state as far as the calling code can tell. The classic example of this would be memo-ization of an expensive computation for future reference, e.g.:
class ExpensiveResultGenerator
{
public:
ExpensiveResultGenerator()
: _cachedInputValue(-1)
{
}
float CalculateResult(int inputValue) const
{
if ((_cachedInputValue < 0)||(_cachedInputValue != inputValue))
{
_cachedInputValue = inputValue;
_cachedResult = ReallyCPUExpensiveCalculation(inputValue);
}
return _cachedResult;
}
private:
float ReallyCPUExpensiveCalculation(int inputValue) const
{
// Code that is really expensive to calculate the value
// corresponding to (inputValue) goes here....
[...]
return computedResult;
}
mutable int _cachedInputValue;
mutable float _cachedResult;
}
Note that as far as the code using the ExpensiveResultGenerator class is concerned, CalculateResult(int) const doesn't change the state of the ExpensiveResultGenerator object; it is simply computing a mathematical function and returning the result. But internally we are making a memo-ization optimization so that if the user calls CalculateResult(x) with the same value for x multiple times in a row, we can skip the expensive calculation after the first time and just return the _cachedResult instead, for a speedup.
Of course, making that memo-ization optimization can introduce race conditions in a multi-threaded environment, since now we are changing state variables even if the calling code can't see us doing it. So to do this safely in a multithreaded environment, you would need to employ a Mutex of some sort to serialize accesses to the two mutable variables -- either that, or require the calling code to serialize any calls to CalculateResult().

C++ - Loop Efficiency: Storing temporarly values of a class member Vs pointer to this member

Within a class method, I'm accessing private attributes - or attributes of a nested class. Moreover, I'm looping over these attributes.
I was wondering what is the most efficient way in terms of time (and memory) between:
copying the attributes and accessing them within the loop
Accessing the attributes within the loop
Or maybe using an iterator over the attribute
I feel my question is related to : Efficiency of accessing a value through a pointer vs storing as temporary value. But in my case, I just need to access a value, not change it.
Example
Given two classes
class ClassA
{
public:
vector<double> GetAVector() { return AVector; }
private:
vector<double> m_AVector;
}
and
class ClassB
{
public:
void MyFunction();
private:
vector<double> m_Vector;
ClassA m_A;
}
I. Should I do:
1.
void ClassB::MyFunction()
{
vector<double> foo;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(m_Vector[i]));
}
/// do something ...
}
2.
void ClassB::MyFunction()
{
vector<double> foo;
vector<double> VectorCopy = m_Vector;
for(int i=0; i<... ; i++)
{
foo.push_back(SomeFunction(VectorCopy[i]));
}
/// do something ...
}
3.
void ClassB::MyFunction()
{
vector<double> foo;
for(vector<double>::iterator it = m_Vector.begin(); it != m_Vector.end() ; it++)
{
foo.push_back(SomeFunction((*it)));
}
/// do something ...
}
II. What if I'm not looping over m_vector but m_A.GetAVector()?
P.S. : I understood while going through other posts that it's not useful to 'micro'-optimize at first but my question is more related to what really happens and what should be done - as for standards (and coding-style)
You're in luck: you can actually figure out the answer all by yourself, by trying each approach with your compiler and on your operating system, and timing each approach to see how long it takes.
There is no universal answer here, that applies to every imaginable C++ compiler and operating system that exists on the third planet from the sun. Each compiler, and hardware is different, and has different runtime characteristics. Even different versions of the same compiler will often result in different runtime behavior that might affect performance. Not to mention various compilation and optimization options. And since you didn't even specify your compiler and operating system, there's literally no authoritative answer that can be given here.
Although it's true that for some questions of this type it's possible to arrive at the best implementation with a high degree of certainty, for most use cases, this isn't one of them. The only way you can get the answer is to figure it out yourself, by trying each alternative yourself, profiling, and comparing the results.
I can categorically say that 2. is less efficient than 1. Copying to a local copy, and then accessing it like you would the original would only be of potential benefit if accessing a stack variable is quicker than accessing a member one, and it's not, so it's not (if you see what I mean).
Option 3. is trickier, since it depends on the implementation of the iter() method (and end(), which may be called once per loop) versus the implementation of the operator [] method. I could irritate some C++ die-hards and say there's an option 4: ask the Vector for a pointer to the array and use a pointer or array index on that directly. That might just be faster than either!
And as for II, there is a double-indirection there. A good compiler should spot that and cache the result for repeated use - but otherwise it would only be marginally slower than not doing so: again, depending on your compiler.
Without optimizations, option 2 would be slower on every imaginable platform, becasue it will incur copy of the vector, and the access time would be identical for local variable and class member.
With optimization, depending on SomeFunction, performance might be the same or worse for option 2. Same performance would happen if SomeFunction is either visible to compiler to not modify it's argument, or it's signature guarantees that argument will not be modified - in this case compiler can optimize away the copy altogether. Otherwise, the copy will remain.

Task ownership in Factory-Processor model

Factory supplies Tasks of different types to Processor asynchronously. Processor doesn't know details of Tasks and executes them via known Interface. Dynamic allocation is prohibited due to performance reasons. Factory should not own Tasks because otherwise Processor would need to inform Factory when he finishes execution of Task to do the cleanup. Processor should know only Interface, but not Tasks themselves. Processor may own Tasks as opaque objects while he processes them.
One possible solution is: store all kinds of Tasks inside the union of "Interface & padding buffer". Please, consider the following working example (C++11):
#include <iostream>
struct Interface
{
virtual void execute() {}
};
union X
{
X() {}
Interface i;
char padding[1024];
template <class T>
X& operator= (T &&y)
{
static_assert (sizeof(T) <= sizeof(padding), "X capacity is not enough!");
new (padding) T(y);
}
};
struct Task : public Interface
{
Task() : data(777) {}
virtual void execute() { std::cout << data << std::endl; }
int data;
};
int main()
{
Task t;
X x;
x = std::move(t);
Interface *i = &x.i;
i->execute();
};
The snippet works well (prints 777). But are there any dangers (like virtual inheritance) in such approach? Maybe any better solution is possible?
Your solution seems to involve both an unnecessary copy operation, and making assumptions about the layout of your objects in memory that are not guaranteed to be correct in all circumstances. It further invokes undefined behaviour by using memcpy to copy an object with virtual methods, which is explicitly disallowed by the c++ spec. It also has the potential to cause confusion over when object destructors run.
I would use an arrangement like this:
class Processor has an array of buffers, each of which is large enough to contain any defined subclass of your task interface. It has two methods used in submitting tasks:
one to return a pointer to a currently available buffer
one to submit a job
The job interface is extended with a requirement to track the pointer to the buffer that contains it (which will be supplied as a constructor parameter), and has a method to return that pointer.
Submitting a new task is now done like this:
void * buffer = processor.getBuffer();
Task * task = new (buffer) Task(buffer);
processor.submitJob(task);
(this could be simplified using a template method in Processor if required). Then, the processor simply executes jobs, and when it's done with them it asks them for their buffer, runs their destructor, and adds the buffer back into its free buffer list.
Updated answer.
See: std::aligned_union (en.cppreference.com). It is designed to be used together with placement new and explicit destructor call.
Below is the earlier answer, now retracted.
From the design perspective,
Avoiding dynamic allocation seems a drastic requirement. It requires some extraordinary justification.
In case one does not trust the standard allocator for any reason, one could still implement a custom allocator, in order to have full control of its behavior.
If there is a class or method that "owns" all instances of everything: all Factories, all Processors, and all Tasks (as is the case in your main() method), then it is not necessary to copy anything. Just pass references or pointers around, since this "class or method that owns everything" will take care of object lifetime.
My answer is only applicable to the question about "memcpy".
I do not try to cover the issue of memcpy-ing between "Task which has Interface as base class, and X which has Interface as member". This doesn't seem universally valid for all of the C++ compilers, but I don't know offhand which C++ compilers would fail this code.
Short answer, which is applicable to all C++ compilers:
To use memcpy on a type, the type needs to be trivially copyable.
Currently, trivially copyable lists "no virtual functions" as one of the necessary conditions, so the "according to the spec" answer is that your struct Task is not trivially copyable.
The longer, non-standard answer is whether your particular compiler will synthesize the struct and the machine code that would be effectively copyable (i.e. without ill-effects), despite the C++ specification saying no. Obviously this answer will be compiler-specific, and will depend on a lot of circumstances (such as optimization flags and minor code changes).
Remember that compiler optimization and code generation can change from version to version. There is no guarantee that the next version of the compiler will behave exactly the same.
To give an example of something that would be likely to be unsafe for memcpy-ing between two instances, consider:
struct Task : public Interface
{
Task(std::string&& s)
: data(std::move(s))
{}
virtual void execute() { std::cout << data << std::endl; }
std::string data;
};
The reason this is problematic is that, for sufficiently long strings, std::string will allocate dynamic memory to store its content. If there are two instances of Task, and memcpy is used to copy its bytes from one instance to another instance (which would have copied over the internal fields of the std::string class), their pointers will point to the same address, and therefore their destructors will both try to delete the same memory, leading to undefined behavior. In addition, if the instance that was being overwritten had an earlier string value, the memory will not be freed.
Since you have said that "dynamic allocation is prohibited", my guess is that you will not be using std::string or anything similar, instead opting to write C-like code exclusively. So this concern may not be relevant to you.
Speaking of "low level C-like code", here is my idea:
struct TaskBuffer
{
typedef void (*ExecuteFunc) (TaskBuffer*);
ExecuteFunc executeFunc;
char padding[1024];
};
void ProcessMethod(TaskBuffer* tb)
{
(tb->executeFunc)(tb);
}

C++ const-reference semantics?

Consider the sample application below. It demonstrates what I would call a flawed class design.
#include <iostream>
using namespace std;
struct B
{
B() : m_value(1) {}
long m_value;
};
struct A
{
const B& GetB() const { return m_B; }
void Foo(const B &b)
{
// assert(this != &b);
m_B.m_value += b.m_value;
m_B.m_value += b.m_value;
}
protected:
B m_B;
};
int main(int argc, char* argv[])
{
A a;
cout << "Original value: " << a.GetB().m_value << endl;
cout << "Expected value: 3" << endl;
a.Foo(a.GetB());
cout << "Actual value: " << a.GetB().m_value << endl;
return 0;
}
Output:
Original value: 1
Expected value: 3
Actual value: 4
Obviously, the programmer is fooled by the constness of b. By mistake b points to this, which yields the undesired behavior.
My question: What const-rules should you follow when designing getters/setters?
My suggestion: Never return a reference to a member variable if it can be set by reference through a member function. Hence, either return by value or pass parameters by value. (Modern compilers will optimize away the extra copy anyway.)
Obviously, the programmer is fooled by the constness of b
As someone once said, You keep using that word. I do not think it means what you think it means.
Const means that you cannot change the value. It does not mean that the value cannot change.
If the programmer is fooled by the fact that some other code else can change something that they cannot, they need a better grounding in aliasing.
If the programmer is fooled by the fact that the token 'const' sounds a bit like 'constant' but means 'read only', they need a better grounding in the semantics of the programming language they are using.
So if you have a getter which returns a const reference, then it is an alias for an object you don't have the permission to change. That says nothing about whether its value is immutable.
Ultimately, this comes down to a lack of encapsulation, and not applying the Law of Demeter. In general, don't mutate the state of other objects. Send them a message to ask them to perform an operation, which may (depending on their own implementation details) mutate their state.
If you make B.m_value private, then you can't write the Foo you have. You either make Foo into:
void Foo(const B &b)
{
m_B.increment_by(b);
m_B.increment_by(b);
}
void B::increment_by (const B& b)
{
// assert ( this != &b ) if you like
m_value += b.m_value;
}
or, if you want to ensure that the value is constant, use a temporary
void Foo(B b)
{
m_B.increment_by(b);
m_B.increment_by(b);
}
Now, incrementing a value by itself may or may not be reasonable, and is easily tested for within B::increment_by. You could also test whether &m_b==&b in A::Foo, though once you have a couple of levels of objects and objects with references to other objects rather than values (so &a1.b.c == &a2.b.c does not imply that &a1.b==&a2.b or &a1==&a2), then you really have to just be aware that any operation is potentially aliased.
Aliasing means that incrementing by an expression twice is not the same as incrementing by the value of the expression the first time you evaluated it; there's no real way around it, and in most systems the cost of copying the data isn't worth the risk of avoiding the alias.
Passing in arguments which have the least structure also works well. If Foo() took a long rather than an object which it has to get a long from, then it would not suffer aliasing, and you wouldn't need to write a different Foo() to increment m_b by the value of a C.
I propose a slightly different solution to this that has several advantages (especially in an every increasing, multi-threaded world). Its a simple idea to follow, and that is to "commit" your changes last.
To explain via your example you would simply change the 'A' class to:
struct A
{
const B& GetB() const { return m_B; }
void Foo(const B &b)
{
// copy out what we are going to change;
int itm_value = m_b.m_value;
// perform operations on the copy, not our internal value
itm_value += b.m_value;
itm_value += b.m_value;
// copy over final results
m_B.m_value = itm_value ;
}
protected:
B m_B;
};
The idea here is to place all assignment to memory viewable above the current function at the end, where they pretty much can't fail. This way, if an error is thrown (say there was a divide in the middle of those 2 operations, and if it just happens to be 0) in the middle of the operation, then we aren't left with half baked data in the middle.
Furthermore, in a multi-threading situation, you can do all of the operation, and then just check at the end if anything has changed before your "commit" (an optimistic approach, which will usually pass and usually yield much better results than locking the structure for the entire operation), if it has changed, you simply discard the values and try again (or return a value saying it has failed if there is something it can do instead).
On top of this, the compiler can usually optimise this better, because it is no longer required to write the variables being modified to memory (we are only forcing one read of the value to be changed and one write). This way, the compiler has the option of just keeping the relevant data in a register, saves L1 cache access if not cache misses. Otherwise the compiler will probably make it write to the memory as it doesn't know what aliasing might be taking place (so it can't ensure those values stay the same, if they are all local, it knows it can't be aliasing because the current function is the only one that knows about it).
There's a lot of different things that can happen with the original code posted. I wouldn't be surprised if some compilers (with optimizations enabled) will actually produce code that produces the "expected" result, whereas others won't. All of this is simply because the point at which variables, that aren't 'volatile', are actually written/read from memory isn't well defined within the c++ standards.
The real problem here is atomicity. The precondition of the Foo function is that it's argument doesn't change while in use.
If e.g. Foo had been specified with a value-argument i.s.o. reference argument, no problem would have shown.
Frankly, A::Foo() rubs me the wrong way more than your original problem. Anyhow I look at it, it must be B::Foo(). And inside B::Foo() check for this wouldn't be that outlandish.
Otherwise I do not see how one can specify a generic rule to cover that case. And keep teammates sane.
From past experience, I would treat that as a plain bug and would differentiate two cases: (1) B is small and (2) B is large. If B is small, then simply make A::getB() to return a copy. If B is large, then you have no choice but to handle the case that objects of B might be both rvalue and lvalue in the same expression.
If you have such problems constantly, I'd say simpler rule would be to always return a copy of an object instead of a reference. Because quite often, if object is large, then you have to handle it differently from the rest anyway.
My stupid answer, I leave it here just in case someone else comes up with the same bad idea:
The problem is I think that the object referred to is not const (B const & vs const B &), only the reference is const in your code.

Returning Large Objects in Functions

Compare the following two pieces of code, the first using a reference to a large object, and the second has the large object as the return value. The emphasis on a "large object" refers to the fact that repeated copies of the object, unnecessarily, is wasted cycles.
Using a reference to a large object:
void getObjData( LargeObj& a )
{
a.reset() ;
a.fillWithData() ;
}
int main()
{
LargeObj a ;
getObjData( a ) ;
}
Using the large object as a return value:
LargeObj getObjData()
{
LargeObj a ;
a.fillWithData() ;
return a ;
}
int main()
{
LargeObj a = getObjData() ;
}
The first snippet of code does not require copying the large object.
In the second snippet, the object is created inside the function, and so in general, a copy is needed when returning the object. In this case, however, in main() the object is being declared. Will the compiler first create a default-constructed object, then copy the object returned by getObjData(), or will it be as efficient as the first snippet?
I think the second snippet is easier to read but I am afraid it is less efficient.
Edit: Typically, I am thinking of cases LargeObj to be generic container classes that, for the sake of argument, contains thousands of objects inside of them. For example,
typedef std::vector<HugeObj> LargeObj ;
so directly modifying/adding methods to LargeObj isn't a directly accessible solution.
The second approach is more idiomatic, and expressive. It is clear when reading the code that the function has no preconditions on the argument (it does not have an argument) and that it will actually create an object inside. The first approach is not so clear for the casual reader. The call implies that the object will be changed (pass by reference) but it is not so clear if there are any preconditions on the passed object.
About the copies. The code you posted is not using the assignment operator, but rather copy construction. The C++ defines the return value optimization that is implemented in all major compilers. If you are not sure you can run the following snippet in your compiler:
#include <iostream>
class X
{
public:
X() { std::cout << "X::X()" << std::endl; }
X( X const & ) { std::cout << "X::X( X const & )" << std::endl; }
X& operator=( X const & ) { std::cout << "X::operator=(X const &)" << std::endl; }
};
X f() {
X tmp;
return tmp;
}
int main() {
X x = f();
}
With g++ you will get a single line X::X(). The compiler reserves the space in the stack for the x object, then calls the function that constructs the tmp over x (in fact tmp is x. The operations inside f() are applied directly on x, being equivalent to your first code snippet (pass by reference).
If you were not using the copy constructor (had you written: X x; x = f();) then it would create both x and tmp and apply the assignment operator, yielding a three line output: X::X() / X::X() / X::operator=. So it could be a little less efficient in cases.
Use the second approach. It may seem that to be less efficient, but the C++ standard allows the copies to be evaded. This optimization is called Named Return Value Optimization and is implemented in most current compilers.
Yes in the second case it will make a copy of the object, possibly twice - once to return the value from the function, and again to assign it to the local copy in main. Some compilers will optimize out the second copy, but in general you can assume at least one copy will happen.
However, you could still use the second approach for clarity even if the data in the object is large without sacrificing performance with the proper use of smart pointers. Check out the suite of smart pointer classes in boost. This way the internal data is only allocated once and never copied, even when the outer object is.
The way to avoid any copying is to provide a special constructor. If you
can re-write your code so it looks like:
LargeObj getObjData()
{
return LargeObj( fillsomehow() );
}
If fillsomehow() returns the data (perhaps a "big string" then have a constructor that takes a "big string". If you have such a constructor, then the compiler will very likelt construct a single object and not make any copies at all to perform the return. Of course, whether this is userful in real life depends on your particular problem.
A somewhat idiomatic solution would be:
std::auto_ptr<LargeObj> getObjData()
{
std::auto_ptr<LargeObj> a(new LargeObj);
a->fillWithData();
return a;
}
int main()
{
std::auto_ptr<LargeObj> a(getObjData());
}
Alternatively, you can avoid this issue all together by letting the object get its own data, i. e. by making getObjData() a member function of LargeObj. Depending on what you are actually doing, this may be a good way to go.
Depending on how large the object really is and how often the operation happens, don't get too bogged down in efficiency when it will have no discernible effect either way. Optimization at the expense of clean, readable code should only happen when it is determined to be necessary.
The chances are that some cycles will be wasted when you return by copy. Whether it's worth worrying about depends on how large the object really is, and how often you invoke this code.
But I'd like to point out that if LargeObj is a large and non-trivial class, then in any case its empty constructor should be initializing it to a known state:
LargeObj::LargeObj() :
m_member1(),
m_member2(),
...
{}
That wastes a few cycles too. Re-writing the code as
LargeObj::LargeObj()
{
// (The body of fillWithData should ideally be re-written into
// the initializer list...)
fillWithData() ;
}
int main()
{
LargeObj a ;
}
would probably be a win-win for you: you'd have the LargeObj instances getting initialized into known and useful states, and you'd have fewer wasted cycles.
If you don't always want to use fillWithData() in the constructor, you could pass a flag into the constructor as an argument.
UPDATE (from your edit & comment) : Semantically, if it's worthwhile to create a typedef for LargeObj -- i.e., to give it a name, rather than referencing it simply as typedef std::vector<HugeObj> -- then you're already on the road to giving it its own behavioral semantics. You could, for example, define it as
class LargeObj : public std::vector<HugeObj> {
// constructor that fills the object with data
LargeObj() ;
// ... other standard methods ...
};
Only you can determine if this is appropriate for your app. My point is that even though LargeObj is "mostly" a container, you can still give it class behavior if doing so works for your application.
Your first snippet is especially useful when you do things like have getObjData() implemented in one DLL, call it from another DLL, and the two DLLs are implemented in different languages or different versions of the compiler for the same language. The reason is because when they are compiled in different compilers they often use different heaps. You must allocate and deallocate memory from within the same heap, else you will corrupt memory. </windows>
But if you don't do something like that, I would normally simply return a pointer (or smart pointer) to memory your function allocates:
LargeObj* getObjData()
{
LargeObj* ret = new LargeObj;
ret->fillWithData() ;
return ret;
}
...unless I have a specific reason not to.