Tradeoffs when returning a collection - c++

There are various ways of returning a collection of items from a method of a class in C++.
For example, consider the class MessageSpy that listens on all Messages sent over a connection. A client could access the messaging information in a number of ways.
1. const CollectionClass MessageSpy::getMessages()
2. Iterator MessageSpy::begin(), Iterator MessageSpy::end()
3. void MessageSpy::getMessages(OutputIterator)
4. void MessageSpy::eachMessage(Functor)
5. others...
Each approach has its trade-offs. For example, approach 1 requires copying the whole collection, which is expensive for large collections, while approach 2 makes the class look like a collection, which is inappropriate for a view...
Since I'm always struggling to choose the most appropriate approach, I wonder what you consider the trade-offs/costs of these approaches?

I suggest an iterator based/callback based approach in cases where you demand the most lightweight solution possible.
The reason is that it decouples the supplier from the usage patterns by the consumer.
In particular, slamming the result into a collection [1] (even though the return may be "optimized", likely via (N)RVO or by moving instead of copying the object) would still allocate a complete container for the full capacity.
Edit: [1] An excellent addition to the "obligatory papers" (they're not; they're just incredibly helpful if you want to understand things): Want Speed? Pass by Value by Dave Abrahams.
Now
this is overkill if the consumer actually stops processing data after the first few elements
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
if (!doProcessing(*f))
break;
}
this can be suboptimal even if the consumer processes all elements eventually: there might not be a need to have all elements copied at any particular moment, so the 'slot' for the 'current element' can be reused, reducing memory requirements and increasing cache locality. E.g.:
for(auto f=myType.begin(), l=myType.end(); f!=l; ++f)
{
myElementType const& slot = *f; // making the temp explicit
doProcessing(slot);
}
Note that iterator interfaces are simply still superior if the consumer did want a collection containing all elements:
std::vector<myElementType> v(myType.begin(), myType.end());
// look: the client gets to _decide_ what container he wants!
std::set<myElementType, myComparer> s(myType.begin(), myType.end());
Try getting this flexibility otherwise.
Finally, there are some elements of style:
by nature it's easy to expose (const) references to the elements using iterators; this makes it much easier to avoid object slicing and to enable clients to use the elements polymorphically.
iterator-style interfaces could be overloaded to return non-const references on dereference. A container returned by value couldn't contain references (directly).
if you adhere to the requirements of range-based-for in C++11 you can have some syntactic sugar:
for (auto& slot : myType)
{
doProcessing(slot);
}
Finally, (as shown above), in the general sense iterators work nicely with the standard library.
The callback style (and similarly the output-iterator style) has many of the benefits of the iterator style (namely, you could use return values to abort iteration halfway, and you could do processing without allocating copies of all elements up front), but it seems to me to be slightly less flexible in use. Of course, there may be situations where you want to encourage a particular usage pattern, and this might be a good way to go.
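For comparison, the callback and output-iterator styles could look roughly like the sketch below. The class body, the record() helper, and the convention that the functor returns false to stop are assumptions made for illustration; they are not taken from the asker's actual MessageSpy.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the MessageSpy from the question.
class MessageSpy {
public:
    void record(const std::string& msg) { messages_.push_back(msg); }

    // Callback style: the functor returns false to stop the iteration early.
    template <typename Functor>
    void eachMessage(Functor f) const {
        for (const std::string& m : messages_) {
            if (!f(m)) break;
        }
    }

    // Output-iterator style: the caller decides where the elements end up.
    template <typename OutputIterator>
    void getMessages(OutputIterator out) const {
        for (const std::string& m : messages_) {
            *out++ = m;
        }
    }

private:
    std::vector<std::string> messages_;
};

// Usage sketch: visit only the first two messages, then copy all of them.
void example() {
    MessageSpy spy;
    spy.record("a");
    spy.record("b");
    spy.record("c");

    int visited = 0;
    spy.eachMessage([&visited](const std::string&) { return ++visited < 2; });
    assert(visited == 2);

    std::vector<std::string> copy;
    spy.getMessages(std::back_inserter(copy));
    assert(copy.size() == 3);
}
```

Neither variant forces the supplier to build a container; the output-iterator form only fills one if the caller passes something like std::back_inserter.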

The first thing (you somehow didn't mention at all) I would think about is
const CollectionClass& MessageSpy::getMessages()
Note the &. That returns a const reference, which can't be used to modify the collection but can be freely read.
No copying, unless you really want to copy.
If that's not suitable, Qt, for example, uses "implicit data sharing" for plenty of classes.
I.e. your classes are "kinda" returned by value, BUT their internal data is shared until you attempt to perform a write operation on one of them. In that case, the object you're trying to write into performs a deep copy, and the data stops being shared. That means less data is moved around.
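The sharing scheme described above can be sketched with std::shared_ptr. This is a simplified illustration and not Qt's actual implementation; the class name SharedMessages and its members are hypothetical, and the use_count() check assumes single-threaded use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical copy-on-write list; not Qt's actual implementation.
class SharedMessages {
public:
    SharedMessages() : data_(std::make_shared<std::vector<std::string>>()) {}

    // Read access never copies.
    std::size_t size() const { return data_->size(); }
    const std::string& at(std::size_t i) const { return (*data_)[i]; }

    // Write access detaches first if the buffer is shared.
    void append(const std::string& msg) {
        detach();
        data_->push_back(msg);
    }

    // True if both objects still point at the same buffer.
    bool sharesDataWith(const SharedMessages& other) const {
        return data_ == other.data_;
    }

private:
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<std::string>>(*data_);
        }
    }
    std::shared_ptr<std::vector<std::string>> data_;
};

void example() {
    SharedMessages a;
    a.append("hello");
    SharedMessages b = a;            // cheap: only the pointer is copied
    assert(a.sharesDataWith(b));
    b.append("world");               // b performs the deep copy here
    assert(!a.sharesDataWith(b));
    assert(a.size() == 1 && b.size() == 2);
}
```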
And there's return value optimization, which some people on SO seem to love too much. Basically, when you return something big by value, some compilers can in certain situations eliminate the extra copy and construct the value directly in its destination, which may be faster than returning by reference. I wouldn't rely on it too much, but if you profiled your code and figured out that using RVO provides a good speed-up, then it is worth using.
I wouldn't recommend "iterators", because using them on a C++03 compiler without the auto keyword is a royal pain in the #&#: long names or many typedefs. I would return a const reference to the container itself instead.

Related

Why should methods return a new instance, rather than modify the instance itself?

Suppose I have a Vector3 class, that contains a normalize() method. Should that method return a new Vector3, or modify the Vector3 instance it is called on (therefore returning a reference to itself (Vector3&)?) What are some instances where one would be preferred over the other? What about performance?
The answer depends on the design of your class.
For mutable classes rotate should rotate the vector itself. This is viewed as somewhat more efficient, and in case of large objects it lets you avoid copying large volumes of data when vectors have many items in them.
Immutable classes, on the other hand, must return only new objects, because they cannot be mutated themselves. This adds some overhead, but it has a lot of pluses, especially when objects must be used concurrently.
A common naming convention is to use a verb for mutating operations, as in
myVector.rotate(angle);
myVector.scale(factor);
while operations that return new objects should be named with past participles, as in
auto newVector = myVector.rotated(angle).scaled(factor);
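A minimal sketch of that naming convention, using a hypothetical three-component Vector3 and only the scale/scaled pair (a rotate/rotated pair would follow the same shape):

```cpp
#include <cassert>

// Hypothetical minimal Vector3, only to illustrate the naming convention.
struct Vector3 {
    double x, y, z;

    // Verb: mutates this object and returns a reference for chaining.
    Vector3& scale(double factor) {
        x *= factor; y *= factor; z *= factor;
        return *this;
    }

    // Past participle: leaves this object untouched and returns a new one.
    Vector3 scaled(double factor) const {
        Vector3 copy = *this;
        copy.scale(factor);
        return copy;
    }
};

void example() {
    Vector3 v = {1.0, 2.0, 3.0};
    Vector3 w = v.scaled(2.0);   // v is unchanged
    assert(v.x == 1.0 && w.x == 2.0);
    v.scale(2.0);                // v itself changes
    assert(v.x == 2.0);
}
```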
Competing goals: Correctness vs Performance (sometimes).
If you use immutable types, it is easier to write correct (parallel) programs. If you use mutable types, you sometimes get a certain performance benefit, which might well be lost once you try to go parallel. Then there is the 80/20 rule: 80% of the code need not be optimized. So why use mutable types by default?
First go immutable, then see if it has enough performance, then optimize, if not.
Vector3 rotate(const Angle& angle)
is probably fine, but it depends on whether Vector3 is implemented correctly, especially regarding std::move() behavior.
Side effects. If I pass an object that I retrieved from a map, for example, and you do something to the object I pass, you have changed the thing that is inside my map, and the next time I ask for it I get something that does not look the same as I got last time. Keeping objects immutable prevents accidents like this, especially when multiple devs are working on the same app.

Observer with multiple subjects using std::unordered_set

I have seen implementations of the Observer design pattern in which the Observer is responsible for multiple Subjects. Most of these implementations use a std::vector<Subject*> in order to keep track of the Subjects.
Would it be possible for me to do a similar thing, using a std::unordered_set<weak_ptr<Subject>> instead?
The reason I want to use an unordered_set is that I will not need duplicates, and I don't need an ordered container. From what I understand, an unordered_set is the way to go in this situation. Also, the reason I am using a weak_ptr is that it should be safer?
If you disagree, leave an answer explaining what container I should use instead. If I did use the unordered_set, I would have to declare a hash function for the weak_ptr, but could this be accomplished by just using the hash function for the pointer inside, obtained with subjects.lock().get()?
First of all, in my answer I will use Subject as the one who sends messages to registered Observers, since that is the common use of these two terms.
Would it be possible for me to do a similar thing, using a std::unordered_set<weak_ptr<Observer>> instead?
It is possible. However, remember that the object referenced by a weak_ptr can be freed at any time, so the weak_ptr must be converted to a shared_ptr (via lock()) before accessing the underlying object. This ensures the object is not freed while you are handling it.
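A small sketch of that lock-before-use step; the Observer struct and the helper name notifyIfAlive are made up for the example:

```cpp
#include <cassert>
#include <memory>

struct Observer {            // hypothetical observer type
    int notifications = 0;
    void notify() { ++notifications; }
};

// Returns true if the observer was still alive and has been notified.
bool notifyIfAlive(const std::weak_ptr<Observer>& weak) {
    // lock() yields an empty shared_ptr if the object is already gone;
    // otherwise it keeps the object alive for the duration of the call.
    if (std::shared_ptr<Observer> strong = weak.lock()) {
        strong->notify();
        return true;
    }
    return false;
}

void example() {
    auto owner = std::make_shared<Observer>();
    std::weak_ptr<Observer> weak = owner;
    assert(notifyIfAlive(weak));
    assert(owner->notifications == 1);
    owner.reset();                 // last owner releases the object
    assert(!notifyIfAlive(weak));
}
```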
Would it be possible for me to do a similar thing, using a std::unordered_set<weak_ptr<Observer>> instead?
If you need to enforce uniqueness, the unordered_set looks like a good choice to me. If you don't need to, then a vector is a more straightforward solution. Some would say that unordered_set is slower and requires more memory than a vector, but unless you need very high frequency registration of Observers or have thousands of them, you won't notice the difference.
About the weak pointer: it gives you the flexibility of having your Observers deallocated while registered, so it should be fine. This behaviour may be unexpected if you come from a memory-managed language like Java. If you want to keep them alive while they are registered in your Subject, you may use a shared_ptr instead.
I would have to declare a hash function for the weak_ptr, but could this be accomplished by just using the hash function for the pointer inside, obtained with observer.lock().get()?
Be careful when creating hash functions. I don't recommend using the object's pointer for the hash function, especially if your Subjects can be copied/moved. Instead you may create a unique identifier for every Subject upon creation using a counter, and remember to write copy/move constructors and operators accordingly.
If you cannot write an identifying hash function, then you shouldn't use the unordered_set, since you lose the advantages it brings.
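One way the counter-based identifier could be wired up is sketched below. It deviates from the question in one labeled respect: instead of hashing the weak_ptr itself (whose target may already be gone), it keys a std::unordered_map by the id. The Observer and Registry classes and all member names are assumptions for illustration, and the counter is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical observer that receives a unique id from a counter.
class Observer {
public:
    Observer() : id_(nextId()) {}
    Observer(const Observer&) : id_(nextId()) {}            // a copy is a new identity
    Observer& operator=(const Observer&) { return *this; }  // the id stays fixed
    std::size_t id() const { return id_; }
private:
    static std::size_t nextId() {
        static std::size_t counter = 0;
        return counter++;
    }
    std::size_t id_;
};

// Registry keyed by id, so no hash of the weak_ptr itself is needed.
class Registry {
public:
    // Returns false if an observer with the same id is already registered.
    bool add(const std::shared_ptr<Observer>& o) {
        return observers_.emplace(o->id(), o).second;
    }
    void remove(const Observer& o) { observers_.erase(o.id()); }
    std::size_t size() const { return observers_.size(); }
private:
    std::unordered_map<std::size_t, std::weak_ptr<Observer>> observers_;
};

void example() {
    auto a = std::make_shared<Observer>();
    auto b = std::make_shared<Observer>();
    Registry r;
    assert(r.add(a));
    assert(!r.add(a));      // duplicate registration is rejected
    assert(r.add(b));
    r.remove(*a);
    assert(r.size() == 1);
}
```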
As a footnote, the beauty of object containers is that you can fit them to your needs, every solution is the correct solution if it does what you really want.
There isn't really one 'correct' answer for the choice of container; it depends upon what you are aiming for from the point of view of performance. And whether or not performance is really all that important for this.
It also depends upon memory efficiency. If you only have a few of these unordered_set objects and need very fast lookup then it may be a good choice. Since it is a hash table it will use a fairly large amount of memory per unordered_set object.
If you have a lot of unordered_set objects with fairly few items then it may get a bit expensive in terms of memory budget. If you need fast insertion and removal then std::set may be better in this case. However if the collection only ever contains a handful of items then the lookup will probably actually be faster with a linear search of std::vector due to the processor cache (i.e. better locality of reference of the vector elements compared to std::set - which may result in more elements being on the same cache line). Memory usage of vector will be lower than either of std::set or std::unordered_set.
If you need fast lookup of specific objects for some reason, use std::vector, and typically have a moderate number of elements, then you could insert items into the vector in sorted order. You can then use std::lower_bound to do an O(log n) binary search lookup. However, this has a potentially high cost for inserting and removing elements.
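The sorted-vector idea might look like this. The helper names are made up, and int keys are used only to keep the sketch short; the same approach would apply to whatever handle identifies an observer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Inserts value at its sorted position; returns false if already present.
bool insertSorted(std::vector<int>& v, int value) {
    std::vector<int>::iterator pos = std::lower_bound(v.begin(), v.end(), value);
    if (pos != v.end() && *pos == value) return false;
    v.insert(pos, value);   // O(n): later elements are shifted
    return true;
}

// O(log n) lookup via binary search.
bool containsSorted(const std::vector<int>& v, int value) {
    std::vector<int>::const_iterator pos = std::lower_bound(v.begin(), v.end(), value);
    return pos != v.end() && *pos == value;
}

void example() {
    std::vector<int> v;
    insertSorted(v, 3);
    insertSorted(v, 1);
    insertSorted(v, 2);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 3);
    assert(containsSorted(v, 2));
    assert(!containsSorted(v, 5));
}
```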
I'd probably just go for std::vector in most cases: you generally have few observers, so you may as well keep memory usage tighter.
And using weak_ptr is certainly a good option if these objects are used elsewhere with shared_ptr.
In the Observer pattern the Observer subscribes to notifications about changes on a Subject. The Subject is responsible for updating all subscribed Observers whenever its observable state changes. For that to work the Observers do not need to keep track of the Subjects. Instead the Subject must keep track of all subscribed Observers.
A nice explanation of the Observer pattern can be found here: https://sourcemaking.com/design_patterns/observer
Code outline:
#include <set>

class Subject;
class Observer
{
public:
    // when notified about a change, the Observer
    // knows which Subject changed, because of the parameter s
    virtual void subjectChanged(Subject* s) = 0;
    virtual ~Observer() {}
};
class Subject
{
private:
    int m_internalState;
    std::set<Observer*> m_observers;
public:
    void subscribe(Observer* o)
    {
        m_observers.insert(o);
    }
    void unsubscribe(Observer* o)
    {
        m_observers.erase(o);
    }
    void setInternalState(int state)
    {
        m_internalState = state;
        // notify every subscribed Observer about the change
        auto end = m_observers.end();
        for (auto it = m_observers.begin(); it != end; ++it)
            (*it)->subjectChanged(this);
    }
};
In most cases it won't matter which exact collection type you choose for storing the Observers, because there will be very few Observers. However, choosing a set type has the advantage that each Observer will receive only one notification. With a vector, the same Observer could receive multiple notifications about the same change if (for some reason) it was subscribed multiple times.
I really think that using std::unordered_set is a bit over-kill.
What is this observer pattern? When an event or a change of state occurs, iterate over an array of state-checkers and make them do something if the state is invalid or special in any way.
That being said, you want to iterate over an array of objects with an overridden virtual function and call it. Why would a set give us any benefit?
Also, I don't get the weak_ptr idea: the owner of the observers is the array which holds them, and the owner of that array is the Subject.
Now that all that has been said, I would go with std::vector<std::unique_ptr<Observer>>.
Edit:
using C++11, I'd even go with std::vector<std::function<void(Subject&)>> and avoid the boilerplate of inheriting+overriding.
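A rough sketch of the std::function variant follows. The Subject members shown here (state, setState, subscribe) are assumed names, not part of the question.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical Subject that stores plain callables instead of Observer objects.
class Subject {
public:
    typedef std::function<void(Subject&)> Callback;

    void subscribe(Callback cb) { callbacks_.push_back(std::move(cb)); }

    int state() const { return state_; }

    void setState(int s) {
        state_ = s;
        for (Callback& cb : callbacks_) cb(*this);  // notify every subscriber
    }

private:
    int state_ = 0;
    std::vector<Callback> callbacks_;
};

void example() {
    Subject subject;
    int lastSeen = -1;
    subject.subscribe([&lastSeen](Subject& s) { lastSeen = s.state(); });
    subject.setState(42);
    assert(lastSeen == 42);
}
```

One design consequence: std::function objects cannot be compared with each other, so unsubscribing would need something extra, such as a token returned from subscribe().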
The simplest thing to do is to roll with boost::signals2, which already implemented this for you, for all signatures. The fundamental problem with your approach is that the implementation is tied to a particular signature with a particular subject and observer, which is virtually worthless compared to a generic solution that applies for all cases.
The Observer pattern is not a pattern, it's a class template.

How to return smart pointers (shared_ptr), by reference or by value?

Let's say I have a class with a method that returns a shared_ptr.
What are the possible benefits and drawbacks of returning it by reference or by value?
Two possible clues:
Early object destruction. If I return the shared_ptr by (const) reference, the reference counter is not incremented, so I incur the risk of having the object deleted when it goes out of scope in another context (e.g. another thread). Is this correct? What if the environment is single-threaded, can this situation happen as well?
Cost. Pass-by-value is certainly not free. Is it worth avoiding it whenever possible?
Thanks everybody.
Return smart pointers by value.
As you've said, if you return it by reference, you won't properly increment the reference count, which opens up the risk of deleting something at the improper time. That alone should be enough reason to not return by reference. Interfaces should be robust.
The cost concern is nowadays moot thanks to return value optimization (RVO), so you won't incur an increment-increment-decrement sequence or something like that in modern compilers. So the best way to return a shared_ptr is to simply return by value:
shared_ptr<T> Foo()
{
return shared_ptr<T>(/* acquire something */);
}
This is a dead-obvious RVO opportunity for modern C++ compilers. I know for a fact that Visual C++ compilers implement RVO even when all optimizations are turned off. And with C++11's move semantics, this concern is even less relevant. (But the only way to be sure is to profile and experiment.)
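To illustrate the lifetime argument (rather than the RVO one), here is a small sketch in which a class hands out a shared_ptr member by value. Widget and Holder are hypothetical names.

```cpp
#include <cassert>
#include <memory>

struct Widget { int value; };   // hypothetical payload type

class Holder {
public:
    Holder() : widget_(std::make_shared<Widget>()) { widget_->value = 7; }

    // By value: the caller receives its own owning handle.
    std::shared_ptr<Widget> get() const { return widget_; }

private:
    std::shared_ptr<Widget> widget_;
};

void example() {
    std::shared_ptr<Widget> kept;
    {
        Holder h;
        kept = h.get();
        assert(kept.use_count() == 2);   // h and kept share ownership
    }                                    // h is destroyed here
    assert(kept.use_count() == 1);       // the Widget is still alive
    assert(kept->value == 7);
}
```

Had get() returned a const reference and the caller kept only that reference, it would dangle once h went out of scope.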
If you're still not convinced, Dave Abrahams has an article that makes an argument for returning by value. I reproduce a snippet here; I highly recommend that you go read the entire article:
Be honest: how does the following code make you feel?
std::vector<std::string> get_names();
...
std::vector<std::string> const names = get_names();
Frankly, even though I should know better, it makes me nervous. In principle, when get_names()
returns, we have to copy a vector of strings. Then, we need to copy it again when we initialize
names, and we need to destroy the first copy. If there are N strings in the vector, each copy
could require as many as N+1 memory allocations and a whole slew of cache-unfriendly data accesses as the string contents are copied.
Rather than confront that sort of anxiety, I’ve often fallen back on pass-by-reference to avoid
needless copies:
get_names(std::vector<std::string>& out_param );
...
std::vector<std::string> names;
get_names( names );
Unfortunately, this approach is far from ideal.
The code grew by 150%
We’ve had to drop const-ness because we’re mutating names.
As functional programmers like to remind us, mutation makes code more complex to reason about by undermining referential transparency and equational reasoning.
We no longer have strict value semantics for names.
But is it really necessary to mess up our code in this way to gain efficiency? Fortunately, the answer turns out to be no (and especially not if you are using C++0x).
Regarding any smart pointer (not just shared_ptr), I don't think it's ever acceptable to return a reference to one, and I would be very hesitant to pass them around by reference or raw pointer. Why? Because you cannot be certain that it will not be shallow-copied via a reference later. Your first point defines the reason why this should be a concern. This can happen even in a single-threaded environment. You don't need concurrent access to data to put bad copy semantics in your programs. You don't really control what your users do with the pointer once you pass it off, so don't encourage misuse by giving your API users enough rope to hang themselves.
Secondly, look at your smart pointer's implementation, if possible. Construction and destruction should be darn close to negligible. If this overhead isn't acceptable, then don't use a smart pointer! But beyond this, you will also need to examine the concurrency architecture that you've got, because mutually exclusive access to the mechanism that tracks the uses of the pointer is going to slow you down more than mere construction of the shared_ptr object.
Edit, 3 years later: with the advent of the more modern features in C++, I would tweak my answer to be more accepting of cases when you've simply written a lambda that never lives outside of the calling function's scope, and isn't copied somewhere else. Here, if you wanted to save the very minimal overhead of copying a shared pointer, it would be fair and safe. Why? Because you can guarantee that the reference will never be mis-used.

How to avoid out parameters?

I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced of the reasons why to avoid them, but I find myself unsure if I'm running into cases where it is unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy //Defines some interface I can't change.
{
public:
    ExpensiveCopy(); // needed for "ExpensiveCopy toModify;" in the calling code
    ExpensiveCopy(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ }
    ExpensiveCopy& operator=(const ExpensiveCopy& toCopy){ /*Ouch! This hurts.*/ return *this; }
    void addToData(SomeData);
    SomeData getData();
};
class B
{
public:
    static void doWork(ExpensiveCopy& ec_out, int someParam);
    //or
    // Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
std::vector<ComplexType> theCollection;
for(std::size_t index = 0; index < theCollection.size(); ++index)
{
doWork(theCollection[index]);
}
void doWork(ComplexType& ct_out)
{
//Do work on the individual element.
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much for these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it then you've got a couple of options
Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out)
return a (smart) pointer from doWork instead of copying it. (which you don't want to do as you want to keep things on the stack)
Rely on RVO (which others have pointed out is supported by pretty much all modern compilers)
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
// ... no branched returns here
return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
Expensive e; // named object
// ... no branched returns here
return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is so prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all the comments here, I'd mention that in C++0x you'd rarely use an output parameter for optimization purposes, because of move constructors.
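The effect of elision plus move constructors can be checked with a type that counts its own deep copies. The sketch assumes C++11; the Expensive type and its copies counter are invented for the demonstration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical expensive type that counts how often it is deep-copied.
struct Expensive {
    static int copies;
    std::vector<int> data;

    Expensive() {}
    Expensive(const Expensive& other) : data(other.data) { ++copies; }
    Expensive(Expensive&& other) : data(std::move(other.data)) {}
};
int Expensive::copies = 0;

Expensive work(int n) {
    Expensive e;                 // named local object
    for (int i = 0; i < n; ++i) e.data.push_back(i);
    return e;                    // elided (NRVO) or moved, never deep-copied
}

void example() {
    Expensive result = work(1000);
    assert(result.data.size() == 1000);
    assert(Expensive::copies == 0);
}
```

Even when the compiler does not apply NRVO, returning a local object by name selects the move constructor in C++11, so the copy counter stays at zero either way.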
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminates an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and values, in which a list of symbols is provided by the bind and a list of values is returned. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
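For reference, this is roughly what the std::pair approach looks like, together with std::tuple and std::tie from C++11, which this answer does not mention and which ease the more-than-two-values case somewhat. The function names are made up.

```cpp
#include <cassert>
#include <tuple>
#include <utility>

// Two results: std::pair is enough.
std::pair<int, int> divide(int dividend, int divisor) {
    return std::make_pair(dividend / divisor, dividend % divisor);
}

// Three results: std::tuple (C++11) avoids a hand-written struct.
std::tuple<int, int, double> stats(int a, int b) {
    return std::make_tuple(a < b ? a : b, a < b ? b : a, (a + b) / 2.0);
}

void example() {
    std::pair<int, int> qr = divide(17, 5);
    assert(qr.first == 3 && qr.second == 2);

    int lo, hi;
    double mean;
    std::tie(lo, hi, mean) = stats(4, 2);   // unpack into existing variables
    assert(lo == 2 && hi == 4 && mean == 3.0);
}
```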
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
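The difference between atoi() and strtol() can be made concrete with a small wrapper. The name parseLong and the decision to reject trailing characters are choices made for this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Returns true and stores the number in out only if text is a valid long.
bool parseLong(const char* text, long& out) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text) return false;      // no digits consumed, e.g. "frog"
    if (*end != '\0') return false;     // trailing garbage, e.g. "12abc"
    if (errno == ERANGE) return false;  // value out of range for long
    out = value;
    return true;
}

void example() {
    long n = -1;
    assert(parseLong("0", n) && n == 0);
    assert(!parseLong("frog", n));
    // atoi() cannot tell these two inputs apart:
    assert(std::atoi("0") == 0 && std::atoi("frog") == 0);
}
```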
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have some method that performs work on this expensive-to-copy object in the first place. Say you have a tree: would you send the tree off into some building method, or else give the tree its own building method? Situations like this come up constantly when your design is a little bit off, but tend to fold into themselves when you have it down pat.
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).

Is it a good (correct) way to encapsulate a collection?

class MyContainedClass {
};
class MyClass {
public:
MyContainedClass * getElement() {
// ...
std::list<MyContainedClass>::iterator it = ... // retrieve somehow
return &(*it);
}
// other methods
private:
std::list<MyContainedClass> m_contained;
};
Though msdn says std::list should not perform relocations of elements on deletion or insertion, is it a good and common way to return pointer to a list element?
PS: I know that I can use collection of pointers (and will have to delete elements in destructor), collection of shared pointers (which I don't like), etc.
I don't see the use of encapsulating this, but that may be just me. In any case, returning a reference instead of a pointer makes a lot more sense to me.
In a general sort of way, if your "contained class" is truly contained in your "MyClass", then MyClass should not be allowing outsiders to touch its private contents.
So, MyClass should be providing methods to manipulate the contained class objects, not returning pointers to them. So, for example, a method such as "increment the value of the umpteenth contained object", rather than "here is a pointer to the umpteenth contained object, do with it as you wish".
It depends...
It depends on how much encapsulated you want your class to be, and what you want to hide, or show.
The code I see seems ok for me. You're right about the fact the std::list's data and iterators won't be invalidated in case of another data/iterator's modification/deletion.
Now, returning the pointer would hide the fact that you're using a std::list as an internal container, and would not let the user navigate the list. Returning the iterator would give users of the class more freedom to navigate this list, but they would "know" they are accessing an STL container.
It's your choice, there, I guess.
Note that if it == std::list<>.end(), then you'll have a problem with this code, but I guess you already know that, and that this is not the subject of this discussion.
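One way to handle the end() case is to return a null pointer when nothing is found. In the sketch below, the id field, the add() helper, and the lookup by id are assumptions, since the question elides how the element is retrieved.

```cpp
#include <cassert>
#include <list>

struct MyContainedClass { int id; };   // hypothetical payload with an id

class MyClass {
public:
    void add(int id) {
        MyContainedClass c = { id };
        m_contained.push_back(c);
    }

    // Returns a pointer to the element with the given id, or nullptr if absent.
    MyContainedClass* getElement(int id) {
        for (std::list<MyContainedClass>::iterator it = m_contained.begin();
             it != m_contained.end(); ++it) {
            if (it->id == id) return &(*it);
        }
        return nullptr;   // never dereference end()
    }

private:
    std::list<MyContainedClass> m_contained;
};

void example() {
    MyClass c;
    c.add(1);
    MyContainedClass* first = c.getElement(1);
    c.add(2);                               // list insertion keeps 'first' valid
    assert(first == c.getElement(1));
    assert(c.getElement(99) == nullptr);
}
```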
Still, there are alternatives, which I summarize below:
Using const will help...
The fact that you return a non-const pointer lets the user of your object silently modify any MyContainedClass he/she can get his/her hands on, without telling your object.
Instead of returning a pointer, you could return a pointer to const (and suffix your method with const) to stop the user from modifying the data inside the list without using an accessor approved by you (a kind of setElement?).
const MyContainedClass * getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return &(*it);
}
This will increase somewhat the encapsulation.
What about a reference?
If your method cannot fail (i.e. it always returns a valid pointer), then you should consider returning a reference instead of a pointer. Something like:
const MyContainedClass & getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return *it;
}
This has nothing to do with encapsulation, though..
:-p
Using an iterator?
Why not return the iterator instead of the pointer? If for you, navigating the list up and down is ok, then the iterator would be better than the pointer, and is used mostly the same way.
Make the iterator a const_iterator if you want to avoid the user modifying the data.
std::list<MyContainedClass>::const_iterator getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return it;
}
The good side would be that the user would be able to navigate the list. The bad side is that the user would know it is a std::list, so...
Scott Meyers in his book Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library says it's just not worth trying to encapsulate your containers since none of them are completely replaceable for another.
Think good and hard about what you really want MyClass for. I've noticed that some programmers write wrappers for their collections just as a matter of habit, regardless of whether they have any specific needs above and beyond those met by the standard STL collections. If that's your situation, then typedef std::list<MyContainedClass> MyClass and be done with it.
If you do have operations you intend to implement in MyClass, then the success of your encapsulation will depend more on the interface you provide for them than on how you provide access to the underlying list.
No offense meant, but... With the limited information you've provided, it smells like you're punting: exposing internal data because you can't figure out how to implement the operations your client code requires in MyClass... or possibly, because you don't even know yet what operations will be required by your client code. This is a classic problem with trying to write low-level code before the high-level code that requires it; you know what data you'll be working with, but haven't really nailed down exactly what you'll be doing with it yet, so you write a class structure that exposes the raw data all the way to the top. You'd do well to re-think your strategy here.
#cos:
Of course I'm encapsulating MyContainedClass not just for the sake of encapsulation. Let's take more specific example:
Your example does little to allay my fear that you are writing your containers before you know what they'll be used for. Your example container wrapper - Document - has a total of three methods: NewParagraph(), DeleteParagraph(), and GetParagraph(), all of which operate on the contained collection (std::list), and all of which closely mirror operations that std::list provides "out of the box". Document encapsulates std::list in the sense that clients need not be aware of its use in the implementation... but realistically, it is little more than a facade - since you are providing clients raw pointers to the objects stored in the list, the client is still tied implicitly to the implementation.
If we put objects (not pointers) to container they will be destroyed automatically (which is good).
Good or bad depends on the needs of your system. What this implementation means is simple: the document owns the Paragraphs, and when a Paragraph is removed from the document any pointers to it immediately become invalid. Which means you must be very careful when implementing something like:
other objects than use collections of paragraphs, but don't own them.
Now you have a problem. Your object, ParagraphSelectionDialog, has a list of pointers to Paragraph objects owned by the Document. If you are not careful to coordinate these two objects, the Document - or another client by way of the Document - could invalidate some or all of the pointers held by an instance of ParagraphSelectionDialog! There's no easy way to catch this - a pointer to a valid Paragraph looks the same as a pointer to a deallocated Paragraph, and may even end up pointing to a valid - but different - Paragraph instance! Since clients are allowed, and even expected, to retain and dereference these pointers, the Document loses control over them as soon as they are returned from a public method, even while it retains ownership of the Paragraph objects.
This... is bad. You end up with an incomplete, superficial encapsulation, a leaky abstraction, and in some ways it is worse than having no abstraction at all. Because you hide the implementation, your clients have no idea of the lifetime of the objects pointed to by your interface. You would probably get lucky most of the time, since most std::list operations do not invalidate references to items they don't modify. And all would be well... until the wrong Paragraph gets deleted, and you find yourself stuck with the task of tracing through the callstack looking for the client that kept that pointer around a little bit too long.
The fix is simple enough: return values or objects that can be stored for as long as they need to be, and verified prior to use. That could be something as simple as an ordinal or ID value that must be passed to the Document in exchange for a usable reference, or as complex as a reference-counted smart pointer or weak pointer... it really depends on the specific needs of your clients. Spec out the client code first, then write your Document to serve.
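The ID variant of that fix could look roughly like the sketch below. Everything in it is an assumption made for illustration: the ParagraphId typedef, the std::map storage, the member names and demoStaleId() do not come from the original, which names Document and Paragraph but shows no code for them.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for the Paragraph class discussed above.
struct Paragraph {
    std::string text;
};

class Document {
public:
    typedef std::size_t ParagraphId;

    // Hands out an ID that can be stored indefinitely, not a pointer.
    ParagraphId newParagraph(const std::string& text) {
        ParagraphId id = nextId_++;  // IDs are never reused in this sketch
        paragraphs_[id].text = text;
        return id;
    }

    void deleteParagraph(ParagraphId id) { paragraphs_.erase(id); }

    // Exchanges an ID for a short-lived pointer; nullptr means "deleted".
    const Paragraph* find(ParagraphId id) const {
        std::map<ParagraphId, Paragraph>::const_iterator it = paragraphs_.find(id);
        return it == paragraphs_.end() ? nullptr : &it->second;
    }

private:
    std::map<ParagraphId, Paragraph> paragraphs_;
    ParagraphId nextId_ = 0;
};

void demoStaleId() {
    Document doc;
    Document::ParagraphId a = doc.newParagraph("first");
    Document::ParagraphId b = doc.newParagraph("second");
    doc.deleteParagraph(a);
    assert(doc.find(a) == nullptr);         // stale ID is detected, not dereferenced
    assert(doc.find(b)->text == "second");  // live ID still resolves
}
```

A client such as the selection dialog would store IDs and call find() each time it needs the paragraph, treating nullptr as "this paragraph is gone". The pointer returned by find() should be used immediately rather than stored.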
The Easy way
#cos, for the example you have shown, I would say the easiest way to create this system in C++ would be to not bother with reference counting. All you have to do is make sure that the program flow first destroys the objects (views) which hold direct references to the objects (paragraphs) in the collection, before the root Document gets destroyed.
The Tough Way
However, if you still want to control the lifetimes by reference tracking, you might have to hold references deeper into the hierarchy, such that Paragraph objects hold reverse references to the root Document object and the Document object gets destroyed only when the last Paragraph object gets destroyed.
Additionally, the paragraph references, when used inside the Views class and when passed to other classes, would also have to be passed around as reference-counted interfaces.
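As a rough illustration of reference-counted handles, here is a sketch that assumes C++11's std::shared_ptr and std::weak_ptr, which the answer itself does not mention. It leaves out the reverse reference from Paragraph to Document and only shows a view checking its handle before use. ParagraphView, textOr(), deleteFirst() and demoWeakHandle() are hypothetical names.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>

struct Paragraph {
    std::string text;
};

class Document {
public:
    // The Document keeps the owning reference; callers get a non-owning handle.
    std::weak_ptr<Paragraph> newParagraph(const std::string& text) {
        std::shared_ptr<Paragraph> p = std::make_shared<Paragraph>();
        p->text = text;
        paragraphs_.push_back(p);
        return p;
    }

    void deleteFirst() {
        if (!paragraphs_.empty()) paragraphs_.pop_front();
    }

private:
    std::list<std::shared_ptr<Paragraph> > paragraphs_;
};

// Hypothetical view: stores a handle and checks it before every use.
class ParagraphView {
public:
    explicit ParagraphView(const std::weak_ptr<Paragraph>& p) : paragraph_(p) {}

    std::string textOr(const std::string& fallback) const {
        // lock() yields an empty shared_ptr once the Paragraph has been destroyed.
        std::shared_ptr<Paragraph> p = paragraph_.lock();
        return p ? p->text : fallback;
    }

private:
    std::weak_ptr<Paragraph> paragraph_;
};

void demoWeakHandle() {
    Document doc;
    ParagraphView view(doc.newParagraph("hello"));
    assert(view.textOr("<gone>") == "hello");
    doc.deleteFirst();
    assert(view.textOr("<gone>") == "<gone>");
}
```

With weak handles the Document stays the sole owner, so deleting a paragraph takes effect immediately and views notice it on their next access. Handing out shared_ptr instead would keep a deleted paragraph alive for as long as any view still holds it.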
Toughness
This is too much overhead compared to the simple scheme I listed at the beginning. The simple scheme avoids all kinds of object-counting overhead and, more importantly, someone who inherits your program does not get trapped in the reference-dependency threads that criss-cross your system.
Alternative Platforms
This kind of tooling might be easier to use on a platform that supports and promotes this style of programming, like .NET or Java.
You still have to worry about memory
Even with a platform such as this you would still have to ensure that references to your objects are released properly. Otherwise outstanding references could eat up your memory in the blink of an eye. So you see, reference counting is not a panacea for good programming practices, though it helps avoid lots of error checks and cleanups, which, when applied across the whole system, considerably eases the programmer's task.
Recommendation
That said, coming back to your original question, which gave rise to all the reference counting doubts: is it OK to expose your objects directly from the collection?
Programs cannot exist where all classes / all parts of the program are truly independent of each other. No, that would be impossible, as a program is the running manifestation of how your classes / modules interact. The ideal design can only minimize the dependencies, not remove them totally.
So my opinion would be: yes, it is not a bad practice to expose references to the objects from your collection to other objects that need to work with them, provided you do this in a sane manner:
Ensure that only a few classes / parts of your program can get such references to ensure minimum interdependency.
Ensure that the references / pointers passed are interfaces and not concrete objects so that the interdependency is avoided between concrete classes.
Ensure that the references are not further passed along deeper into the program.
Ensure that the program logic takes care of destroying the dependent objects, before cleaning up the actual objects that satisfy those references.
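The point about handing out interfaces rather than concrete objects might be sketched as follows. ReadableParagraph, first(), textLength() and demoInterface() are hypothetical names chosen for the example, and the nested private Paragraph class is only one possible way to keep the concrete type out of client code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical interface: the only type clients are allowed to depend on.
class ReadableParagraph {
public:
    virtual ~ReadableParagraph() {}
    virtual std::string text() const = 0;
};

class Document {
public:
    void add(const std::string& text) { paragraphs_.push_back(Paragraph(text)); }

    // Assumes the document is not empty; a real class would have to handle that.
    const ReadableParagraph& first() const { return paragraphs_.front(); }

private:
    // The concrete type stays private to Document.
    class Paragraph : public ReadableParagraph {
    public:
        explicit Paragraph(const std::string& t) : text_(t) {}
        std::string text() const override { return text_; }
    private:
        std::string text_;
    };

    std::list<Paragraph> paragraphs_;
};

// Client code sees only the interface.
std::size_t textLength(const ReadableParagraph& p) {
    return p.text().size();
}

void demoInterface() {
    Document d;
    d.add("abc");
    assert(textLength(d.first()) == 3);
}
```

The reference returned by first() is still only valid while the Document and that paragraph exist, so the last rule in the list (destroy the dependents first) applies unchanged.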
I think the bigger problem is that you're hiding the type of collection so even if you use a collection that doesn't move elements you may change your mind in the future. Externally that's not visible so I'd say it's not a good idea to do this.
std::list will not invalidate any iterators, pointers or references when you add or remove things from the list (apart from any that point to the item being removed, obviously), so using a list in this way isn't going to break.
As others have pointed out, you may not want to be handing out direct access to the private bits of this class. So changing the function to:
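A small check of that guarantee, as a sketch with a hypothetical function name: a pointer to one element is taken, other elements are inserted and erased around it, and the pointer is then compared with the element's current address.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Returns true if a pointer to the element 20 survives edits to other elements.
bool pointerSurvivesEdits() {
    std::list<int> values;
    values.push_back(10);
    values.push_back(20);
    values.push_back(30);

    int* second = &*std::next(values.begin());  // points at 20

    values.push_front(5);          // insert elsewhere: list is 5, 10, 20, 30
    values.pop_back();             // remove 30:        list is 5, 10, 20
    values.erase(values.begin());  // remove 5:         list is 10, 20

    // 20 is the second element again and must live at the same address.
    return *second == 20 && second == &*std::next(values.begin());
}

void demoListStability() {
    assert(pointerSurvivesEdits());
}
```

The same test written against std::vector would not be safe, because a vector may reallocate on insertion and shifts elements on erase.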
const MyContainedClass * getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return &(*it);
}
may be better, or if you always return a valid MyContainedClass object then you could use
const MyContainedClass& getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return *it;
}
to avoid the calling code having to cope with NULL pointers.
The STL will be more familiar to a future programmer than your custom encapsulation, so you should avoid doing this if you can. There will be edge cases that you haven't thought about which will come up later in the app's lifetime, whereas the STL is fairly well reviewed and documented.
Additionally, most containers support somewhat similar operations such as begin, end, push and so on. So it should be fairly trivial to change the container type in your code should you need to, e.g. vector to deque or map to hash_map.
Assuming you still want to do this for a deeper reason, I would say the correct way is to implement all the methods and iterator classes that list implements. Forward the calls to the member list where you need no changes. Modify and forward, or do some custom action, where you need to do something special (the reason why you decided to do this in the first place).
It would be easier if STL classes were designed to be inherited from, but for efficiency's sake it was decided not to do so. Google for "inherit from STL classes" for more thoughts on this.
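A minimal sketch of that forwarding approach, covering only a handful of members rather than the whole std::list interface. NameList and its rule of rejecting empty strings are made up for the example; the rule stands in for whatever "something special" motivated the wrapper.

```cpp
#include <cassert>
#include <list>
#include <string>

class NameList {
public:
    // Re-export the nested types so generic code written for containers still works.
    typedef std::list<std::string>::const_iterator const_iterator;
    typedef std::list<std::string>::size_type size_type;

    // Forwarded unchanged to the member list.
    const_iterator begin() const { return names_.begin(); }
    const_iterator end() const { return names_.end(); }
    size_type size() const { return names_.size(); }
    bool empty() const { return names_.empty(); }

    // Customized: validate first, then forward.
    bool push_back(const std::string& name) {
        if (name.empty()) return false;  // hypothetical rule for this sketch
        names_.push_back(name);
        return true;
    }

private:
    std::list<std::string> names_;
};

void demoForwarding() {
    NameList names;
    assert(names.push_back("alice"));
    assert(!names.push_back(""));  // rejected by the custom check
    assert(names.size() == 1);
    assert(*names.begin() == "alice");
}
```

Each forwarded member is a one-liner, but a complete wrapper needs one for every operation clients use, which is the maintenance cost this answer warns about.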