Getter for large member variables w/o copying - c++

I have a class containing large member variables. In my case, the large member variable is a container of many objects and it must be private as I don't want to allow a user to modify it directly
class Example {
public:
std::vector<BigObject> get_very_big_object() const { return very_big_object; }
private:
std::vector<BigObject> very_big_object;
}
I want a user to be able to view the object without making a copy:
Example e();
auto very_big_object = e.get_very_big_object(); // Uh oh, made a copy
cout << very_big_object[11]; // Look at any element in the vector etc
I'm a bit confused about the best way to do it. I thought about returning a constant reference, i.e., make my getter:
const std::vector<BigObject>& get_very_big_object() const { return very_big_object; }
I read this article that suggests it could be risky and that that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics. But I found that a bit cryptic.
What's the modern best practice for doing this?

I read this article that suggests it could be risky and that that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics.
On this point, the article is flat-out wrong. A smart pointer does not remove the "risk".
Quick summary of relevant parts of the article
If a class returns a const reference to a data member, client code may introduce a const_cast and thereby change the data member without going through the class' API.
The article proposes (incorrectly) that the above can be avoided by using a smart pointer. The setup is for the class to maintain a shared pointer to the data, and have the getter return that pointer cast to a shared pointer to const data.
Critique of the points
First of all, this does not work. All one has to do is de-reference the smart pointer to get a const reference to the data, which can then be const_cast as before. Using the author's own example, instead of
std::string &evil = const_cast<std::string&>(obj.someStr());
use
std::string &evil = const_cast<std::string&>(*obj.str_ptr());
to get the same data-changing results when returning a smart pointer. The entire article is not wrong, but it does get several points wrong. This is one of them.
Second of all, this is not your concern. When you return a const reference, you are telling client code that this value is not to be changed. If the client code does so anyway, it's the client code that broke the agreement. Essentially, the client code invoked undefined behavior, so your class is free to do anything, even crash the program.
What's the modern best practice for doing this?
Simply return a const reference. (Most rules have exceptions, but in my experience, this one seems to be on target 95-99.9% of the time.)

What I did when I was working on my BDD-library for school is to create a wrapper class called VeryBigObject, which contacts the singleton upon instantiation and hides a reference-counting pointer, from there you can override the operator->() method to allow for direct access to the class's methods.
So something like this
class VeryBigObject {
private:
vector<BigObject>* obj;
public:
VeryBigObject() {
// find a way to instantiate with a pointer, not by copying
}
VeryBigObject(const VeryBigObject& o) {
// Update reference counts
obj = o.obj;
}
virtual VeryBigObject operator->(const VeryBigObject&); // I don't remember how to do this one, just google it.
... // Do other overloads as you see fit to mask working with the pointer directly.
};
This allows you to create a small portable class that you don't have to worry about copying, but also has access to the larger object easily. You'll still need to worry about things like caching and such though

Related

Using smart pointers as a class member

I have been reading up on smart pointers and recently in class my TA said that we should never use raw pointers. Now, I've done a lot of reading online and looked at different questions on this website but I'm still confused on some aspects of smart pointers. My question is: which smart pointer would I use if I want it to be used across my program? I'll show some code.
So I have a basic Application class that makes declarations of objects from class AI. Note: I have two different smart pointers, a unique one and a shared one, for testing reasons.
// Application class in Application.h
class Application
{
public:
Application(){}
~Application(){}
//... additional non-important variables and such
unique_ptr<AI> *u_AI; // AI object using a unique pointer
shared_ptr<AI> *s_AI; // AI object using a shared pointer
//... additional non-important variables and such
void init();
void update();
};
// AI class in AI.h
class AI
{
public:
AI(){}
~AI(){}
bool isGoingFirst;
};
In the Application init function, I want to create the AI object, and then I want to use it in the update function. I am not sure if I am declaring my pointer right at all, but I know for a fact that it compiles and it works for assigning and printing out data in the init function. More code below.
void Application::init()
{
//.. other initialization's.
std::shared_ptr<AI> temp(new AI());
sh_AI = &temp;
sh_AI->isGoingFirst = true;
//.. other initialization's.
// Function ends.
}
void Application::update()
{
if(sh_AI->get()->isGoingFirst == true)
{
// Do something
}
else
{
// Do something else
}
// Other code below
}
Later in my program, the update function is called, which uses the same AI smart pointer that I declared in my class Application. What I have found out is that the smart pointer AI object is being deleted. I understand that smart pointers have automatic memory management, but is there a smart pointer that will allow you to use a it in different functions without creating any major problems, such as memory leaks or dangling references? Or am I missing the whole point of smart pointers?
I'm sorry if this was answered in another question but I read into a lot of the other questions, and while I understand more about smart pointers, I'm still learning. Thank you!
As Neil Kirk pointed out in the comments, these declarations are not what you want:
unique_ptr<AI> *u_AI; // AI object using a unique pointer
shared_ptr<AI> *s_AI; // AI object using a shared pointer
u_AI and s_AI are still objects to raw pointers. The whole point is to remove the need to manage the raw pointer directly. So now you replace them with:
unique_ptr<AI> u_AI; // AI object using a unique pointer
shared_ptr<AI> s_AI; // AI object using a shared pointer
to assign your created pointer, you use the function make_unique or make_shared:
u_AI = unique_ptr<AI>(new AI()); // Yu may be able to use make_unique like
// make_shared but it's new to C++14. may not be available
s_AI = make_shared<AI>();
Then, when you need to access them, you just pretend they are pointers, so in your update function:
if(sh_AI->get()->isGoingFirst == true)
becomes:
if(sh_AI->isGoingFirst == true)
As for when to use unique_ptr vs shared_ptr, you answer that by answering the following question: What do I want to happen when someone makes a copy of Application? i.e.:
Application app1;
app1.init();
Application app2 = app1; // ?? what happens to AI object in app2?
There are 3 possible answers:
I want there to be an extra copy of AI in app2. In this case you use unique_ptr and make sure you implement a copy constructor that does the copying.
I want app2 and app1 to share a copy of AI. In this case you use shared_ptr and the default copy constructor will do the job for you.
I don't want there ever to be a copy of Application. (Which makes sense for a class called Application). In this case it doesn't really matter (in which case I would default to unique_ptr) and remove the copy constructor:
Application(const Application&) = delete;
Short answer: Since your pointer is public, I suggest you use a shared_ptr. However, your pointer does not need to be public so if it was private you could use a unique_ptr since you only use it in your own instance.
The truth is though that it does not really matter much (and I know I'll get some downvotes with this). There are two reasons to use unique_ptr:
it never leaves your module and you just need a replacement for a naked pointer
you want to explicitly show that it is not supposed to leave your module.
On the other hand if you need to ever share the pointer (even in a read-only way) then you will have to use a shared_ptr.
A lot of times it is more convenient to use shared_ptr to begin with but for reason 2) above it is worth using unique_ptr.
Not a reason to use unique_ptr: performance. All I say is make_shared.
Now to your code
This is how you define a smart pointer:
std::shared_ptr<AI> s_AI;
std::unique_ptr<AI> u_AI;
This is how you initialize it:
s_AI = std::make_shared<AI>(); // or
s_AI = std::shared_ptr<AI>(new AI);
u_AI = std::unique_ptr<AI>(new AI);
Note that there is no std::make_unique in C++11. It's going to be in C++14 and it's not that hard to write a replacement but fact is that in C++11 there is none.
This is how you use the pointers:
s_AI->isGoingFirst;
That's it, it behaves like a normal pointer. Only if you have to pass it to a function that needs a pointer you need to use .get().
here is how you delete (empty) the pointer:
s_AI.reset();
Again, I suggest you make your pointer private. If you need to pass it out make sure you use a shared_ptr and write a getter method:
std::shared_ptr<AI> getAI() const {
return s_AI;
}
Remember that if you do this you can't assume that your AI object will be destroyed when your Application object is.

Is it alright to return a reference to a non-pointer member variable as a pointer?

I recently came across some C++ code that looked like this:
class SomeObject
{
private:
// NOT a pointer
BigObject foobar;
public:
BigObject * getFoobar() const
{
return &foobar;
}
};
I asked the programmer why he didn't just make foobar a pointer, and he said that this way he didn't have to worry about allocating/deallocating memory. I asked if he considered using some smart pointer, he said this worked just as well.
Is this bad practice? It seems very hackish.
That's perfectly reasonable, and not "hackish" in any way; although it might be considered better to return a reference to indicate that the object definitely exists. A pointer might be null, and might lead some to think that they should delete it after use.
The object has to exist somewhere, and existing as a member of an object is usually as good as existing anywhere else. Adding an extra level of indirection by dynamically allocating it separately from the object that owns it makes the code less efficient, and adds the burden of making sure it's correctly deallocated.
Of course, the member function can't be const if it returns a non-const reference or pointer to a member. That's another advantage of making it a member: a const qualifier on SomeObject applies to its members too, but doesn't apply to any objects it merely has a pointer to.
The only danger is that the object might be destroyed while someone still has a pointer or reference to it; but that danger is still present however you manage it. Smart pointers can help here, if the object lifetimes are too complex to manage otherwise.
You are returning a pointer to a member variable not a reference. This is bad design.
Your class manages the lifetime of foobar object and by returning a pointer to its members you enable the consumers of your class to keep using the pointer beyond the lifetime of SomeObject object. And also it enables the users to change the state of SomeObject object as they wish.
Instead you should refactor your class to include the operations that would be done on the foobar in SomeObject class as methods.
ps. Consider naming your classes properly. When you define it is a class. When you instantiate, then you have an object of that class.
It's generally considered less than ideal to return pointers to internal data at all; it prevents the class from managing access to its own data. But if you want to do that anyway I see no great problem here; it simplifies the management of memory.
Is this bad practice? It seems very hackish.
It is. If the class goes out of scope before the pointer does, the member variable will no longer exist, yet a pointer to it still exists. Any attempt to dereference that pointer post class destruction will result in undefined behaviour - this could result in a crash, or it could result in hard to find bugs where arbitrary memory is read and treated as a BigObject.
if he considered using some smart pointer
Using smart pointers, specifically std::shared_ptr<T> or the boost version, would technically work here and avoid the potential crash (if you allocate via the shared pointer constructor) - however, it also confuses who owns that pointer - the class, or the caller? Furthermore, I'm not sure you can just add a pointer to an object to a smart pointer.
Both of these two points deal with the technical issue of getting a pointer out of a class, but the real question should be "why?" as in "why are you returning a pointer from a class?" There are cases where this is the only way, but more often than not you don't need to return a pointer. For example, suppose that variable needs to be passed to a C API which takes a pointer to that type. In this case, you would probably be better encapsulating that C call in the class.
As long as the caller knows that the pointer returned from getFoobar() becomes invalid when the SomeObject object destructs, it's fine. Such provisos and caveats are common in older C++ programs and frameworks.
Even current libraries have to do this for historical reasons. e.g. std::string::c_str, which returns a pointer to an internal buffer in the string, which becomes unusable when the string destructs.
Of course, that is difficult to ensure in a large or complex program. In modern C++ the preferred approach is to give everything simple "value semantics" as far as possible, so that every object's life time is controlled by the code that uses it in a trivial way. So there are no naked pointers, no explicit new or delete calls scattered around your code, etc., and so no need to require programmers to manually ensure they are following the rules.
(And then you can resort to smart pointers in cases where you are totally unable to avoid shared responsibility for object lifetimes.)
Two unrelated issues here:
1) How would you like your instance of SomeObject to manage the instance of BigObject that it needs? If each instance of SomeObject needs its own BigObject, then a BigObject data member is totally reasonable. There are situations where you'd want to do something different, but unless that situation arises stick with the simple solution.
2) Do you want to give users of SomeObject direct access to its BigObject? By default the answer here would be "no", on the basis of good encapsulation. But if you do want to, then that doesn't change the assessment of (1). Also if you do want to, you don't necessarily need to do so via a pointer -- it could be via a reference or even a public data member.
A third possible issue might arise that does change the assessment of (1):
3) Do you want to give users of SomeObject direct access to an instance of BigObject that they continue using beyond the lifetime of the instance of SomeObject that they got it from? If so then of course a data member is no good. The proper solution might be shared_ptr, or for SomeObject::getFooBar to be a factory that returns a different BigObject each time it's called.
In summary:
Other than the fact it doesn't compile (getFooBar() needs to return const BigObject*), there is no reason so far to suppose that this code is wrong. Other issues could arise that make it wrong.
It might be better style to return const & rather than const *. Which you return has no bearing on whether foobar should be a BigObject data member.
There is certainly no "just" about making foobar a pointer or a smart pointer -- either one would necessitate extra code to create an instance of BigObject to point to.

private object pointer vs object value, and returning object internals

Related to: C++ private pointer "leaking"?
According to Effective C++ (Item 28), "avoid returning handles (references, pointers, or iterators) to object internals. It increases encapsulation, helps const member functions act const, and minimizes the creation of dangling handles."
Returning objects by value is the only way I can think of to avoid returning handles. This to me suggests I should return private object internals by value as much as possible.
However, to return object by value, this requires the copy constructor which goes against the Google C++ Style Guide of "DISALLOW_COPY_AND_ASSIGN" operators.
As a C++ newbie, unless I am missing something, I find these two suggestions to conflict each other.
So my questions are: is there no silver bullet which allows efficient reference returns to object internals that aren't susceptible to dangling pointers? Is the const reference return as good as it gets? In addition, should I not be using pointers for private object fields that often? What is a general rule of thumb for choosing when to store private instance fields of objects as by value or by pointer?
(Edit) For clarification, Meyers' example dangling pointer code:
class Rectangle {
public:
const Point& upperLeft() const { return pData->ulhc; }
const Point& lowerRight() const { return pData->lrhc; }
...
};
class GUIObject { ... };
const Rectangle boundingBox(const GUIObject& obj);
If the client creates a function with code such as:
GUIObject *pgo; // point to some GUIObject
const Point *pUpperLeft = &(boundingBox(*pgo).upperLeft());
"The call to boundingBox will return a new, temporary Rectangle object [(called temp from here.)] upperLeft will then be called on temp, and that call will return a reference to an internal part of temp, in particular, to one of the Points making it up...at the end of the statement, boundingBox's return value temp will be destroyed, and that will indirectly lead to the destruction of temp's Points. That, in turn, will leave pUpperLeft pointing to an object that no longer exists." Meyers, Effective C++ (Item 28)
I think he is suggesting to return Point by value instead to avoid this:
const Point upperLeft() const { return pData->ulhc; }
The Google C++ style guide is, shall we say, somewhat "special" and has led to much discussion on various C++ newsgroups. Let's leave it at that.
Under normal circumstances I would suggest that following the guidelines in Effective C++ is generally considered to be a good thing; in your specific case, returning an object instead of any sort of reference to an internal object is usually the right thing to do. Most compilers are pretty good at handling large return values (Google for Return Value Optimization, pretty much every compiler does it).
If measurements with a profiler suggest that returning a value is becoming a bottleneck, then I would look at alternative methods.
First, let's look at this statement in context:
According to Effective C++ (Item 28),
"avoid returning handles (references,
pointers, or iterators) to object
internals. It increases encapsulation,
helps const member functions act
const, and minimizes the creation of
dangling handles."
This is basically talking about a class's ability to maintain invariants (properties that remain unchanged, roughly speaking).
Let's say you have a button widget wrapper, Button, which stores an OS-specific window handle to the button. If the client using the class had access to the internal handle, they could tamper with it using OS-specific calls like destroying the button, making it invisible, etc. Basically by returning this handle, your Button class sacrifices any control it originally had over the button handle.
You want to avoid these situations in such a Button class by providing everything you can do with the button as methods in this Button class. Then you don't need to ever return a handle to the OS-specific button handle.
Unfortunately, this doesn't always work in practice. Sometimes you have to return the handle or pointer or some other internal by reference for various reasons. Let's take boost::scoped_ptr, for instance. It is a smart pointer designed to manage memory through the internal pointer it stores. It has a get() method which returns this internal pointer. Unfortunately, that allows clients to do things like:
delete my_scoped_ptr.get(); // wrong
Nevertheless, this compromise was required because there are many cases where we are working with C/C++ APIs that require regular pointers to be passed in. Compromises are often necessary to satisfy libraries which don't accept your particular class but does accept one of its internals.
In your case, try to think if your class can avoid returning internals this way by instead providing functions to do everything one would want to do with the internal through your public interface. If not, then you've done all you can do; you'll have to return a pointer/reference to it but it would be a good habit to document it as a special case. You should also consider using friends if you know which places need to gain access to the class's internals in advance; this way you can keep such accessor methods private and inaccessible to everyone else.
Returning objects by value is the only
way I can think of to avoid returning
handles. This to me suggests I should
return private object internals by
value as much as possible.
No, if you can return a copy, then you can equally return by const reference. The clients cannot (under normal circumstances) tamper with such internals.
It really depends on the situation. If you plan to see changes in the calling method you want to pass by reference. Remember that passing by value is a pretty heavy operation. It requires a call to the copy constructor which in essence has to allocate and store enough memory to fit size of your object.
One thing you can do is fake pass by value. What that means is pass the actual parameter by value to a method that accepts const your object. This of course means the caller does not care to see changes to your object.
Try to limit pass by value if you can unless you have to.

Implement Register/Unregister-Pattern in C++

I often come accross the problem that I have a class that has a pair of Register/Unregister-kind-of-methods. e.g.:
class Log {
public:
void AddSink( ostream & Sink );
void RemoveSink( ostream & Sink );
};
This applies to several different cases, like the Observer pattern or related stuff. My concern is, how safe is that? From a previous question I know, that I cannot safely derive object identity from that reference. This approach returns an iterator to the caller, that they have to pass to the unregister method, but this exposes implementation details (the iterator type), so I don't like it. I could return an integer handle, but that would require a lot of extra internal managment (what is the smallest free handle?). How do you go about this?
You are safe unless the client object has two derivations of ostream without using virtual inheritance.
In short, that is the fault of the user -- they should not be multiply inheriting an interface class twice in two different ways.
Use the address and be done with it. In these cases, I take a pointer argument rather than a reference to make it explicit that I will store the address. It also prevents implicit conversions that might kick in if you decided to take a const reference.
class Log {
public:
void AddSink( ostream* Sink );
void RemoveSink( ostream* Sink );
};
You can create an RAII object that calls AddSink in the constructor, and RemoveSink in the destructor to make this pattern exception-safe.
You could manage your objects using smart pointers and compare the pointers for equality inside your register / deregister functions.
If you only have stack allocated objects that are never copied between an register and deregister call you could also pass a pointer instead of the reference.
You could also do:
typedef iterator handle_t;
and hide the fact that your giving out internal iterators if exposing internal data structures worries you.
In your previous question, Konrad Rudolph posted an answer (that you did not accept but has the highest score), saying that everything should be fine if you use base class pointers, which you appear to do.

AddRef and function signature

I've always used the following rule for signatures of functions that return ref-counted objects based on whether they do an AddRef or not, but want to explain it to my colleagues too... So my question is, is the rule described below a widely followed rule? I'm looking for pointers to (for example) coding rules that advocate this style.
If the function does not add a reference to the object, it should be returned as the return value of the function:
class MyClass
{
protected:
IUnknown *getObj() { return m_obj; }
private:
IUnknown *m_obj;
};
However, if the function adds a reference to the object, then a pointer-to-pointer of the object is passed as a parameter to the function:
class MyClass
{
public:
void getObj(IUnknown **outObj) { *outObj = m_obj; (*outObj)->AddRef(); }
private:
IUnknown *m_obj;
};
It's much more typical to use the reference-counting smart pointers for cases when a new object is created and the caller has to take ownership of it.
I've used this same style on projects with a lot of COM. It was taught to me by a couple of people that learned it when they worked at NuMega on a little thing called SoftICE. I think this is also the style taught in the book "Essential COM", by Don Box (here it is at Amazon). At one point in time this book was considered the Bible for COM. I think the only reason this isn't still the case is that COM has become so much more than just COM.
All that said, I prefer CComPtr and other smart pointers.
One approach is to never use the function's return value. Only use output parameters, as in your second case. This is already a rule anyway in published COM interfaces.
Here's an "official" reference but, as is typical, it doesn't even mention your first case: http://support.microsoft.com/kb/104138
But inside a component, banning return values makes for ugly code. It is much nicer to have composability - i.e. putting functions together conveniently, passing the return value of one function directly as an argument to another.
Smart pointers allow you to do that. They are banned in public COM interfaces but then so are non-HRESULT return values. Consequently, your problem goes away. If you want to use a return value to pass back an interface pointer, do it via a smart pointer. And store members in smart pointers as well.
However, suppose for some reason you didn't want to use smart pointers (you're crazy, by the way!) then I can tell you that your reasoning is correct. Your function is acting as a "property getter", and in your first example it should not AddRef.
So your rule is correct (although there's a bug in your implementation which I'll come to in a second, as you may not have spotted it.)
This function wants an object:
void Foo(IUnknown *obj);
It doesn't need to affect obj's refcount at all, unless it wants to store it in a member variable. It certainly should NOT be the responsibility of Foo to call Release on obj before it returns! Imagine the mess that would create.
Now this function returns an object:
IUnknown *Bar();
And very often we like to compose functions, passing the output of one directly to another:
Foo(Bar());
This would not work if Bar had bumped up the refcount of whatever it returned. Who's going to Release it? So Bar does not call AddRef. This means that it is returning something that it stores and manages, i.e. it's effectively a property getter.
Also if the caller is using a smart pointer, p:
p = Bar();
Any sane smart pointer is going to AddRef when it is assigned an object. If Bar had also AddRef-ed well, we have again leaked one count. This is really just a special case of the same composability problem.
Output parameters (pointer-to-pointer) are different, because they aren't affected by the composability problem in the same way:
Again, smart pointers provide the most common case, using your second example:
myClass.getObj(&p);
The smart pointer isn't going to do any ref-counting here, so getObj has to do it.
Now we come to the bug. Suppose smart pointer p already points to something when you pass it to getObj...
The corrected version is:
void getObj(IUnknown **outObj)
{
if (*outObj != 0)
(*outObj)->Release();
*outObj = m_obj;
(*outObj)->AddRef(); // might want to check for 0 here also
}
In practise, people make that mistake so often that I find it simpler to make my smart pointer assert if operator& is called when it already has an object.