Efficient alternatives for exposing a Collection - c++

In C++, what alternatives do I have for exposing a collection, from the point of view of performance and data integrity?
My problem is that I want to return an internal list of data to the caller, but I don't want to generate a copy. Thant leaves me with either returning a reference to the list, or a pointer to the list. However, I'm not crazy about letting the caller change the data, I just want to let it read the data.
Do I have to choose between performance and data integrity?
If so, is in general better to go one way or is it particular to the case?
Are there other alternatives?

Many times the caller wants access just to iterate over the collection. Take a page out of Ruby's book and make the iteration a private aspect of your class.
#include <algorithm>
#include <boost/function.hpp>
class Blah
{
public:
void for_each_data(const std::function<void(const mydata&)>& f) const
{
std::for_each(myPreciousData.begin(), myPreciousData.end(), f);
}
private:
typedef std::vector<mydata> mydata_collection;
mydata_collection myPreciousData;
};
With this approach you're not exposing anything about your internals, i.e. that you even have a collection.

RichQ's answer is a reasonable technique, if you're using an array, vector, etc.
If you're using a collection that isn't indexed by ordinal values... or think you might need to at some point in the near future... then you might want to consider exposing your own iterator type(s), and associated begin()/end() methods:
class Blah
{
public:
typedef std::vector<mydata> mydata_collection;
typedef myDataCollection::const_iterator mydata_const_iterator;
// ...
mydata_const_iterator data_begin() const
{ return myPreciousData.begin(); }
mydata_const_iterator data_end() const
{ return myPreciousData.end(); }
private:
mydata_collection myPreciousData;
};
...which you can then use in the normal fashion:
Blah blah;
for (Blah::mydata_const_iterator itr = blah.data_begin();
itr != blah.data_end();
++itr)
{
// ...
}

Maybe something like this?
const std::vector<mydata>& getData()
{
return _myPrivateData;
}
The benefit here is that it's very, very simple, and as safe as you getin C++. You can cast this, like RobQ suggests, but there's nothing you can do that would prevent someone from that if you're not copying. Here, you would have to use const_cast, which is pretty easy to spot if you're looking for it.
Iterators, alternatively, might get you pretty much the same thing, but it's more complicated. The only added benefit of using iterators here (that I can think of) is that you can have better encapsulation.

Use of const reference or shared pointer will only help if the contents of underlying collection do not change over time.
Consider your design. Does the caller really need to see the internal array? Can you restructure the code so that the caller tells object what to do with the array? E.g., if the caller intends to search the array, could the owner object do it?
You could pass a reference to result vector to the function. On some compilers that may result in marginally faster code.
I would recommend trying to redesign first, going with a clean solution second, optimizing for performance third (if necessary).

One advantage of both #Shog9's and #RichQ's solutions is that they de-couple the client from the collection implementation.
If you decide th change your collection type to something else, your clients will still work.

What you want is read-only access without copying the entire blob of data. You have a couple options.
Firstly, you could just return a const refererence to whatever your data container is, like suggested above:
const std::vector<T>& getData() { return mData; }
This has the disadvantage of concreteness: you can't change how you store the data internally without changing the interface of your class.
Secondly, you can return const-ed pointers to the actual data:
const T* getDataAt(size_t index)
{
return &mData[index];
}
This is a bit nicer, but also requires that you provide a getNumItems call, and protect against out-of-bounds indices. Also, the const-ness of your pointers is easily cast away, and your data is now read-write.
Another option is to provide a pair of iterators, which is a bit more complex. This has the same advantages of pointers, as well as not (necessarily) needing to provide a getNumItems call, and there's considerably more work involved to strip the iterators of their const-ness.
Probably the easiest way to manage this is by using a Boost Range:
typedef vector<T>::const_iterator range_iterator_type;
boost::iterator_range< range_iterator_type >& getDataRange()
{
return boost::iterator_range(mData.begin(), mData.end());
}
This has the advantages of ranges being composable, filterable, etc, as you can see on the website.

Using const is a reasonable choice.
You may also wish to check out the boost C++ library for their shared pointer implementation. It provides the advantages of pointers i.e. you may have the requirement to return a shared pointer to "null" which a reference would not allow.
http://www.boost.org/doc/libs/1_36_0/libs/smart_ptr/smart_ptr.htm
In your case you would make the shared pointer's type const to prohibit writes.

If you have a std::list of plain old data (what .NET would call 'value types'), then returning a const reference to that list will be fine (ignoring evil things like const_cast)
If you have a std::list of pointers (or boost::shared_ptr's) then that will only stop you modifying the collection, not the items in the collection. My C++ is too rusty to be able to tell you the answer to that at this point :-(

I suggest using callbacks along the lines of EnumChildWindows. You will have to find some means to prevent the user from changing your data. Maybe use a const pointer/reference.
On the other hand, you could pass a copy of each element to the callback function overwriting the copy each time. (You do not want to generate a copy of your entire collection. I am only suggesting making a copy one element at a time. That shouldn't take much time/memory).
MyClass tmp;
for(int i = 0; i < n; i++){
tmp = elements[i];
callback(tmp);
}

The following two articles elaborate on some of the issues involved in, and the need for, encapsulating container classes. Although they do not provide a complete worked solution, they essentially lead to the same approach as given by Shog9.
Part 1: Encapsulation and Vampires
Part 2 (free registration is now required to read this): Train Wreck Spotting
by Kevlin Henney

Related

Getter for large member variables w/o copying

I have a class containing large member variables. In my case, the large member variable is a container of many objects and it must be private as I don't want to allow a user to modify it directly
class Example {
public:
std::vector<BigObject> get_very_big_object() const { return very_big_object; }
private:
std::vector<BigObject> very_big_object;
}
I want a user to be able to view the object without making a copy:
Example e();
auto very_big_object = e.get_very_big_object(); // Uh oh, made a copy
cout << very_big_object[11]; // Look at any element in the vector etc
I'm a bit confused about the best way to do it. I thought about returning a constant reference, i.e., make my getter:
const std::vector<BigObject>& get_very_big_object() const { return very_big_object; }
I read this article that suggests it could be risky and that that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics. But I found that a bit cryptic.
What's the modern best practice for doing this?
I read this article that suggests it could be risky and that that a smart pointer std::unique_ptr could be better, but that this problem can be best solved using modern C++11 move semantics.
On this point, the article is flat-out wrong. A smart pointer does not remove the "risk".
Quick summary of relevant parts of the article
If a class returns a const reference to a data member, client code may introduce a const_cast and thereby change the data member without going through the class' API.
The article proposes (incorrectly) that the above can be avoided by using a smart pointer. The setup is for the class to maintain a shared pointer to the data, and have the getter return that pointer cast to a shared pointer to const data.
Critique of the points
First of all, this does not work. All one has to do is de-reference the smart pointer to get a const reference to the data, which can then be const_cast as before. Using the author's own example, instead of
std::string &evil = const_cast<std::string&>(obj.someStr());
use
std::string &evil = const_cast<std::string&>(*obj.str_ptr());
to get the same data-changing results when returning a smart pointer. The entire article is not wrong, but it does get several points wrong. This is one of them.
Second of all, this is not your concern. When you return a const reference, you are telling client code that this value is not to be changed. If the client code does so anyway, it's the client code that broke the agreement. Essentially, the client code invoked undefined behavior, so your class is free to do anything, even crash the program.
What's the modern best practice for doing this?
Simply return a const reference. (Most rules have exceptions, but in my experience, this one seems to be on target 95-99.9% of the time.)
What I did when I was working on my BDD-library for school is to create a wrapper class called VeryBigObject, which contacts the singleton upon instantiation and hides a reference-counting pointer, from there you can override the operator->() method to allow for direct access to the class's methods.
So something like this
class VeryBigObject {
private:
vector<BigObject>* obj;
public:
VeryBigObject() {
// find a way to instantiate with a pointer, not by copying
}
VeryBigObject(const VeryBigObject& o) {
// Update reference counts
obj = o.obj;
}
virtual VeryBigObject operator->(const VeryBigObject&); // I don't remember how to do this one, just google it.
... // Do other overloads as you see fit to mask working with the pointer directly.
};
This allows you to create a small portable class that you don't have to worry about copying, but also has access to the larger object easily. You'll still need to worry about things like caching and such though

shared_ptr with vector

I currently have vectors such as:
vector<MyClass*> MyVector;
and I access using
MyVector[i]->MyClass_Function();
I would like to make use of shared_ptr. Does this mean all I have to do is change my vector to:
typedef shared_ptr<MyClass*> safe_myclass
vector<safe_myclass>
and I can continue using the rest of my code as it was before?
vector<shared_ptr<MyClass>> MyVector; should be OK.
But if the instances of MyClass are not shared outside the vector, and you use a modern C++11 compiler, vector<unique_ptr<MyClass>> is more efficient than shared_ptr (because unique_ptr doesn't have the ref count overhead of shared_ptr).
Probably just std::vector<MyClass>. Are you
working with polymorphic classes or
can't afford copy constructors or have a reason you can't copy and are sure this step doesn't get written out by the compiler?
If so then shared pointers are the way to go, but often people use this paradigm when it doesn't benefit them at all.
To be complete if you do change to std::vector<MyClass> you may have some ugly maintenance to do if your code later becomes polymorphic, but ideally all the change you would need is to change your typedef.
Along that point, it may make sense to wrap your entire std::vector.
class MyClassCollection {
private : std::vector<MyClass> collection;
public : MyClass& at(int idx);
//...
};
So you can safely swap out not only the shared pointer but the entire vector. Trade-off is harder to input to APIs that expect a vector, but those are ill-designed as they should work with iterators which you can provide for your class.
Likely this is too much work for your app (although it would be prudent if it's going to be exposed in a library facing clients) but these are valid considerations.
Don't immediately jump to shared pointers. You might be better suited with a simple pointer container if you need to avoid copying objects.

How do I return hundreds of values from a C++ function?

In C++, whenever a function creates many (hundreds or thousands of) values, I used to have the caller pass an array that my function then fills with the output values:
void computeValues(int input, std::vector<int>& output);
So, the function will fill the vector output with the values it computes. But this is not really good C++ style, as I'm realizing now.
The following function signature is better because it doesn't commit to using a std::vector, but could use any container:
void computeValues(int input, std::insert_iterator<int> outputInserter);
Now, the caller can call with some inserter:
std::vector<int> values; // or could use deque, list, map, ...
computeValues(input, std::back_inserter(values));
Again, we don't commit to using std::vector specifically, which is nice, because the user might just need the values in a std::set etc. (Should I pass the iterator by value or by reference?)
My question is: Is the insert_iterator the right or standard way to do it? Or is there something even better?
EDIT: I edited the question to make it clear that I'm not talking about returning two or three values, but rather hundreds or thousands. (Imagine you have return all the files you find in a certain directory, or all the edges in a graph etc.)
Response to Edit: Well, if you need to return hundreds and thousands if values, a tuple of course would not be the way to go. Best pick the solution with the iterator then, but it's best not use any specific iterator type.
If you use iterators, you should use them as generic as possible. In your function you have used an insert iterator like insert_iterator< vector<int> >. You lost any genericity. Do it like this:
template<typename OutputIterator>
void computeValues(int input, OutputIterator output) {
...
}
Whatever you give it, it will work now. But it will not work if you have different types in the return set. You can use a tuple then. Also available as std::tuple in the next C++ Standard:
boost::tuple<int, bool, char> computeValues(int input) {
....
}
If the amount of values is variadic and the type of the values is from a fixed set, like (int, bool, char), you can look into a container of boost::variant. This however implies changes only on the call-side. You can keep the iterator style of above:
std::vector< boost::variant<int, bool, char> > data;
computeValues(42, std::back_inserter(data));
You could return a smart pointer to a vector. That should work and no copy of the vector will be made.
If you don't want to keep the smart pointer for the rest of your program, you could simply create a vector before calling the function, and swap both vectors.
Actually, your old method of passing in the vector has a lot to recommend it -- it's efficient, reliable, and easy to understand. The disadvantages are real but don't apply equally in all cases. Are people really going to want the data in an std::set or list? Are they really going to want to use the long list of numbers without bothering to assign it to a variable first (one of the reasons to return something via 'return' rather than a parameter)? Being generic is nice, but there is a cost in your programming time that may not be redeemed.
If you ever have a group of objects, chances are you have at least a few methods that work on that group of objects (otherwise, what are you doing with them?)
If that's the case, it would make sense to have those methods in a class that contain both said objects and methods.
If that makes sense and you have such a class, return it.
I virtually never find myself thinking that I wish I could return more than one value. By the fact that a method should only do one small thing, your parameters and return values tend to have a relationship, and so are more often than not deserving of a class that contains them, so returning more than one value is rarely interesting (Maybe I wished for it 5 times in 20 years--each time I refactored instead, came up with a better result and realized my first attempt was sub-standard.)
One other option is boost::tuple: http://www.boost.org/doc/libs/1_38_0/libs/tuple/doc/tuple_users_guide.html
int x, y;
boost::tie(x,y) = bar();
A stadard container works for homogenous objects 9which you can return).
The standard library way is to abstract an algorithm from the container and use iterators to bridge the gap between.
If you need to pass more than a single type think of structs/classes.
My question is: Is the insert_iterator the right or standard way to do it?
Yes. Otherwise, if you are not going to have at least as many elements in your container as there will be computed values. This is not always possible, specially, if you want to write to a stream. So, you are good.
Your example with insert_iterator won't work, because insert_iterator is a template requiring a container for a parameter. You could declare it
void computeValues(int input, std::insert_iterator<vector<int> > outputInserter);
or
template<class Container>
void computeValues(int input, std::insert_iterator<Container> outputInserter);
The first will tie you back to a vector<int> implementation, without any obvious advantages over your initial code. The second is less restrictive, but implementing as a template will give you other constraints that might make it a less desirable choice.
I'd use something like
std::auto_ptr<std::vector<int> > computeValues(int input);
{
std::auto_ptr<std::vector<int> > r(new std::vector<int>);
r->push_back(...) // Hundreds of these
return r;
}
No copying overhead in the return or risk of leaking (if you use auto_ptr correctly in the caller).
I'd say your new solution is more general, and better style. I'm not sure I'd worry too much about style in c++, more about usability and efficiency.
If you're returning a lot of items, and know the size, using a vector would allow you to reserve the memory in one allocation, which may or may not be worth it.

What you think about throwing an exception for not found in C++?

I know most people think that as a bad practice but when you are trying to make your class public interface only work with references, keeping pointers inside and only when necessary, I think there is no way to return something telling that the value you are looking doesn't exist in the container.
class list {
public:
value &get(type key);
};
Let's think that you don't want to have dangerous pointers being saw in the public interface of the class, how do you return a not found in this case, throwing an exception?
What is your approach to that? Do you return an empty value and check for the empty state of it? I actually use the throw approach but I introduce a checking method:
class list {
public:
bool exists(type key);
value &get(type key);
};
So when I forget to check that the value exists first I get an exception, that is really an exception.
How would you do it?
The STL deals with this situation by using iterators. For example, the std::map class has a similar function:
iterator find( const key_type& key );
If the key isn't found, it returns 'end()'. You may want to use this iterator approach, or to use some sort of wrapper for your return value.
The correct answer (according to Alexandrescu) is:
Optional and Enforce
First of all, do use the Accessor, but in a safer way without inventing the wheel:
boost::optional<X> get_X_if_possible();
Then create an enforce helper:
template <class T, class E>
T& enforce(boost::optional<T>& opt, E e = std::runtime_error("enforce failed"))
{
if(!opt)
{
throw e;
}
return *opt;
}
// and an overload for T const &
This way, depending on what might the absence of the value mean, you either check explicitly:
if(boost::optional<X> maybe_x = get_X_if_possible())
{
X& x = *maybe_x;
// use x
}
else
{
oops("Hey, we got no x again!");
}
or implicitly:
X& x = enforce(get_X_if_possible());
// use x
You use the first way when you’re concerned about efficiency, or when you want to handle the failure right where it occurs. The second way is for all other cases.
The problem with exists() is that you'll end up searching twice for things that do exist (first check if it's in there, then find it again). This is inefficient, particularly if (as its name of "list" suggests) your container is one where searching is O(n).
Sure, you could do some internal caching to avoid the double search, but then your implementation gets messier, your class becomes less general (since you've optimised for a particular case), and it probably won't be exception-safe or thread-safe.
Don't use an exception in such a case. C++ has a nontrivial performance overhead for such exceptions, even if no exception is thrown, and it additially makes reasoning about the code much harder (cf. exception safety).
Best-practice in C++ is one of the two following ways. Both get used in the STL:
As Martin pointed out, return an iterator. Actually, your iterator can well be a typedef for a simple pointer, there's nothing speaking against it; in fact, since this is consistent with the STL, you could even argue that this way is superior to returning a reference.
Return a std::pair<bool, yourvalue>. This makes it impossible to modify the value, though, since a copycon of the pair is called which doesn't work with referende members.
/EDIT:
This answer has spawned quite some controversy, visible from the comments and not so visible from the many downvotes it got. I've found this rather surprising.
This answer was never meant as the ultimate point of reference. The “correct” answer had already been given by Martin: execeptions reflect the behaviour in this case rather poorly. It's semantically more meaningful to use some other signalling mechanism than exceptions.
Fine. I completely endorse this view. No need to mention it once again. Instead, I wanted to give an additional facet to the answers. While minor speed boosts should never be the first rationale for any decision-making, they can provide further arguments and in some (few) cases, they may even be crucial.
Actually, I've mentioned two facets: performance and exception safety. I believe the latter to be rather uncontroversial. While it's extremely hard to give strong exceptions guarantees (the strongest, of course, being “nothrow”), I believe it's essential: any code that is guaranteed to not throw exceptions makes the whole program easier to reason about. Many C++ experts emphasize this (e.g. Scott Meyers in item 29 of “Effective C++”).
About speed. Martin York has pointed out that this no longer applies in modern compilers. I respectfully disagree. The C++ language makes it necessary for the environment to keep track, at runtime, of code paths that may be unwound in the case of an exception. Now, this overhead isn't really all that big (and it's quite easy to verify this). “nontrivial” in my above text may have been too strong.
However, I find it important to draw the distinction between languages like C++ and many modern, “managed” languages like C#. The latter has no additional overhead as long as no exception is thrown because the information necessary to unwind the stack is kept anyway. By and large, stand by my choice of words.
STL Iterators?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigation through a container. Not as an simple accessor.
If you're accessor is one among many, then iterators are the way to go, because you will be able to use them to move in the container. But if your accessor is a simple getter, able to return either the value or the fact there is no value, then your iterator is perhaps only a glorified pointer...
Which leads us to...
Smart pointers?
The point of smart pointers is to simplify pointer ownership. With a shared pointer, you'll get a ressource (memory) which will be shared, at the cost of an overhead (shared pointers needs to allocate an integer as a reference counter...).
You have to choose: Either your Value is already inside a shared pointer, and then, you can return this shared pointer (or a weak pointer). Or Your value is inside a raw pointer. Then you can return the row pointer. You don't want to return a shared pointer if your ressource is not already inside a shared pointer: A World of funny things will happen when your shared pointer will get out of scope an delete your Value without telling you...
:-p
Pointers?
If your interface is clear about its ownership of its ressources, and by the fact the returned value can be NULL, then you could return a simple, raw pointer. If the user of your code is dumb enough ignore the interface contract of your object, or to play arithmetics or whatever with your pointer, then he/she will be dumb enough to break any other way you'll choose to return the value, so don't bother with the mentally challenged...
Undefined Value
Unless your Value type really has already some kind of "undefined" value, and the user knows that, and will accept to handle that, it is a possible solution, similar to the pointer or iterator solution.
But do not add a "undefined" value to your Value class because of the problem you asked: You'll end up raising the "references vs. pointer" war to another level of insanity. Code users want the objects you give them to either be Ok, or to not exist. Having to test every other line of code this object is still valid is a pain, and will complexify uselessly the user code, by your fault.
Exceptions
Exceptions are usually not as costly as some people would like them to be. But for a simple accessor, the cost could be not trivial, if your accessor is used often.
For example, the STL std::vector has two accessors to its value through an index:
T & std::vector::operator[]( /* index */ )
and:
T & std::vector::at( /* index */ )
The difference being that the [] is non-throwing . So, if you access outside the range of the vector, you're on your own, probably risking memory corruption, and a crash sooner or later. So, you should really be sure you verified the code using it.
On the other hand, at is throwing. This means that if you access outside the range of the vector, then you'll get a clean exception. This method is better if you want to delegate to another code the processing of an error.
I use personnaly the [] when I'm accessing the values inside a loop, or something similar. I use at when I feel an exception is the good way to return the current code (or the calling code) the fact something went wrong.
So what?
In your case, you must choose:
If you really need a lightning-fast access, then the throwing accessor could be a problem. But this means you already used a profiler on your code to determinate this is a bottleneck, doesn't it?
;-)
If you know that not having a value can happen often, and/or you want your client to propagate a possible null/invalid/whatever semantic pointer to the value accessed, then return a pointer (if your value is inside a simple pointer) or a weak/shared pointer (if your value is owned by a shared pointer).
But if you believe the client won't propagate this "null" value, or that they should not propagate a NULL pointer (or smart pointer) in their code, then use the reference protected by the exception. Add a "hasValue" method returning a boolean, and add a throw should the user try the get the value even if there is none.
Last but not least, consider the code that will be used by the user of your object:
// If you want your user to have this kind of code, then choose either
// pointer or smart pointer solution
void doSomething(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
}
MyValue * doSomethingElseAndReturnValue(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
return pValue ;
}
// ==========================================================
// If you want your user to have this kind of code, then choose the
// throwing reference solution
void doSomething(MyClass & p_oMyClass)
{
if(p_oMyClass.hasValue())
{
MyValue & oValue = p_oMyClass.getValue() ;
}
}
So, if your main problem is choosing between the two user codes above, your problem is not about performance, but "code ergonomy". Thus, the exception solution should not be put aside because of potential performance issues.
:-)
Accessor?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigation through a container. Not as an simple accessor.
I agree with paercebal, an iterator is to iterate. I don't like the way STL does. But the idea of an accessor seems more appealing. So what we need? A container like class that feels like a boolean for testing but behaves like the original return type. That would be feasible with cast operators.
template <T> class Accessor {
public:
Accessor(): _value(NULL)
{}
Accessor(T &value): _value(&value)
{}
operator T &() const
{
if (!_value)
throw Exception("that is a problem and you made a mistake somewhere.");
else
return *_value;
}
operator bool () const
{
return _value != NULL;
}
private:
T *_value;
};
Now, any foreseeable problem? An example usage:
Accessor <type> value = list.get(key);
if (value) {
type &v = value;
v.doSomething();
}
How about returning a shared_ptr as the result. This can be null if the item wasn't found. It works like a pointer, but it will take care of releasing the object for you.
(I realize this is not always the right answer, and my tone a bit strong, but you should consider this question before deciding for other more complex alternatives):
So, what's wrong with returning a pointer?
I've seen this one many times in SQL, where people will do their earnest to never deal with NULL columns, like they have some contagious decease or something. Instead, they cleverly come up with a "blank" or "not-there" artificial value like -1, 9999 or even something like '#X-EMPTY-X#'.
My answer: the language already has a construct for "not there"; go ahead, don't be afraid to use it.
what I prefer doing in situations like this is having a throwing "get" and for those circumstances where performance matter or failiure is common have a "tryGet" function along the lines of "bool tryGet(type key, value **pp)" whoose contract is that if true is returned then *pp == a valid pointer to some object else *pp is null.
#aradtke, you said.
I agree with paercebal, an iterator is
to iterate. I don't like the way STL
does. But the idea of an accessor
seems more appealing. So what we need?
A container like class that feels like
a boolean for testing but behaves like
the original return type. That would
be feasible with cast operators. [..] Now,
any foreseeable problem?
First, YOU DO NOT WANT OPERATOR bool. See Safe Bool idiom for more info. But about your question...
Here's the problem, users need to now explict cast in cases. Pointer-like-proxies (such as iterators, ref-counted-ptrs, and raw pointers) have a concise 'get' syntax. Providing a conversion operator is not very useful if callers have to invoke it with extra code.
Starting with your refence like example, the most concise way to write it:
// 'reference' style, check before use
if (Accessor<type> value = list.get(key)) {
type &v = value;
v.doSomething();
}
// or
if (Accessor<type> value = list.get(key)) {
static_cast<type&>(value).doSomething();
}
This is okay, don't get me wrong, but it's more verbose than it has to be. now consider if we know, for some reason, that list.get will succeed. Then:
// 'reference' style, skip check
type &v = list.get(key);
v.doSomething();
// or
static_cast<type&>(list.get(key)).doSomething();
Now lets go back to iterator/pointer behavior:
// 'pointer' style, check before use
if (Accessor<type> value = list.get(key)) {
value->doSomething();
}
// 'pointer' style, skip check
list.get(key)->doSomething();
Both are pretty good, but pointer/iterator syntax is just a bit shorter. You could give 'reference' style a member function 'get()'... but that's already what operator*() and operator->() are for.
The 'pointer' style Accessor now has operator 'unspecified bool', operator*, and operator->.
And guess what... raw pointer meets these requirements, so for prototyping, list.get() returns T* instead of Accessor. Then when the design of list is stable, you can come back and write the Accessor, a pointer-like Proxy type.
Interesting question. It's a problem in C++ to exclusively use references I guess - in Java the references are more flexible and can be null. I can't remember if it's legal C++ to force a null reference:
MyType *pObj = nullptr;
return *pObj
But I consider this dangerous. Again in Java I'd throw an exception as this is common there, but I rarely see exceptions used so freely in C++.
If I was making a puclic API for a reusable C++ component and had to return a reference, I guess I'd go the exception route.
My real preference is to have the API return a pointer; I consider pointers an integral part of C++.

Is it a good (correct) way to encapsulate a collection?

class MyContainedClass {
};
class MyClass {
public:
MyContainedClass * getElement() {
// ...
std::list<MyContainedClass>::iterator it = ... // retrieve somehow
return &(*it);
}
// other methods
private:
std::list<MyContainedClass> m_contained;
};
Though msdn says std::list should not perform relocations of elements on deletion or insertion, is it a good and common way to return pointer to a list element?
PS: I know that I can use collection of pointers (and will have to delete elements in destructor), collection of shared pointers (which I don't like), etc.
I don't see the use of encapsulating this, but that may be just me. In any case, returning a reference instead of a pointer makes a lot more sense to me.
In a general sort of way, if your "contained class" is truly contained in your "MyClass", then MyClass should not be allowing outsiders to touch its private contents.
So, MyClass should be providing methods to manipulate the contained class objects, not returning pointers to them. So, for example, a method such as "increment the value of the umpteenth contained object", rather than "here is a pointer to the umpteenth contained object, do with it as you wish".
It depends...
It depends on how much encapsulated you want your class to be, and what you want to hide, or show.
The code I see seems ok for me. You're right about the fact the std::list's data and iterators won't be invalidated in case of another data/iterator's modification/deletion.
Now, returning the pointer would hide the fact you're using a std::list as an internal container, and would not let the user to navigate its list. Returning the iterator would let more freedom to navigate this list for the users of the class, but they would "know" they are accessing a STL container.
It's your choice, there, I guess.
Note that if it == std::list<>.end(), then you'll have a problem with this code, but I guess you already know that, and that this is not the subject of this discussion.
Still, there are alternative I summarize below:
Using const will help...
The fact you return a non-const pointer lets the user of you object silently modify any MyContainedClass he/she can get his/her hands on, without telling your object.
Instead or returning a pointer, you could return a const pointer (and suffix your method with const) to stop the user from modifying the data inside the list without using an accessor approved by you (a kind of setElement ?).
const MyContainedClass * getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return &(*it);
}
This will increase somewhat the encapsulation.
What about a reference?
If your method cannot fail (i.e. it always return a valid pointer), then you should consider returning the reference instead of the pointer. Something like:
const MyContainedClass & getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return *it;
}
This has nothing to do with encapsulation, though..
:-p
Using an iterator?
Why not return the iterator instead of the pointer? If for you, navigating the list up and down is ok, then the iterator would be better than the pointer, and is used mostly the same way.
Make the iterator a const_iterator if you want to avoid the user modifying the data.
std::list<MyContainedClass>::const_iterator getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return it;
}
The good side would be that the user would be able to navigate the list. The bad side is that the user would know it is a std::list, so...
Scott Meyers in his book Effective STL: 50 Specific Ways to Improve Your Use of the Standard Template Library says it's just not worth trying to encapsulate your containers since none of them are completely replaceable for another.
Think good and hard about what you really want MyClass for. I've noticed that some programmers write wrappers for their collections just as a matter of habit, regardless of whether they have any specific needs above and beyond those met by the standard STL collections. If that's your situation, then typedef std::list<MyContainedClass> MyClass and be done with it.
If you do have operations you intend to implement in MyClass, then the success of your encapsulation will depend more on the interface you provide for them than on how you provide access to the underlying list.
No offense meant, but... With the limited information you've provided, it smells like you're punting: exposing internal data because you can't figure out how to implement the operations your client code requires in MyClass... or possibly, because you don't even know yet what operations will be required by your client code. This is a classic problem with trying to write low-level code before the high-level code that requires it; you know what data you'll be working with, but haven't really nailed down exactly what you'll be doing with it yet, so you write a class structure that exposes the raw data all the way to the top. You'd do well to re-think your strategy here.
#cos:
Of course I'm encapsulating
MyContainedClass not just for the sake
of encapsulation. Let's take more
specific example:
Your example does little to allay my fear that you are writing your containers before you know what they'll be used for. Your example container wrapper - Document - has a total of three methods: NewParagraph(), DeleteParagraph(), and GetParagraph(), all of which operate on the contained collection (std::list), and all of which closely mirror operations that std::list provides "out of the box". Document encapsulates std::list in the sense that clients need not be aware of its use in the implementation... but realistically, it is little more than a facade - since you are providing clients raw pointers to the objects stored in the list, the client is still tied implicitly to the implementation.
If we put objects (not pointers) to
container they will be destroyed
automatically (which is good).
Good or bad depends on the needs of your system. What this implementation means is simple: the document owns the Paragraphs, and when a Paragraph is removed from the document any pointers to it immediately become invalid. Which means you must be very careful when implementing something like:
other objects than use collections of
paragraphs, but don't own them.
Now you have a problem. Your object, ParagraphSelectionDialog, has a list of pointers to Paragraph objects owned by the Document. If you are not careful to coordinate these two objects, the Document - or another client by way of the Document - could invalidate some or all of the pointers held by an instance of ParagraphSelectionDialog! There's no easy way to catch this - a pointer to a valid Paragraph looks the same as a pointer to a deallocated Paragraph, and may even end up pointing to a valid - but different - Paragraph instance! Since clients are allowed, and even expected, to retain and dereference these pointers, the Document loses control over them as soon as they are returned from a public method, even while it retains ownership of the Paragraph objects.
This... is bad. You've end up with an incomplete, superficial, encapsulation, a leaky abstraction, and in some ways it is worse than having no abstraction at all. Because you hide the implementation, your clients have no idea of the lifetime of the objects pointed to by your interface. You would probably get lucky most of the time, since most std::list operations do not invalidate references to items they don't modify. And all would be well... until the wrong Paragraph gets deleted, and you find yourself stuck with the task of tracing through the callstack looking for the client that kept that pointer around a little bit too long.
The fix is simple enough: return values or objects that can be stored for as long as they need to be, and verified prior to use. That could be something as simple as an ordinal or ID value that must be passed to the Document in exchange for a usable reference, or as complex as a reference-counted smart pointer or weak pointer... it really depends on the specific needs of your clients. Spec out the client code first, then write your Document to serve.
The Easy way
#cos, For the example you have shown, i would say the easiest way to create this system in C++ would be to not trouble with the reference counting. All you have to do would be to make sure that the program flow first destroys the objects (views) which holds the direct references to the objects (paragraphs) in the collection, before the root Document get destroyed.
The Tough Way
However if you still want to control the lifetimes by reference tracking, you might have to hold references deeper into the hierarchy such that Paragraph objects holds reverse references to the root Document object such that, only when the last paragraph object gets destroyed will the Document object get destructed.
Additionally the paragraph references when used inside the Views class and when passed to other classes, would also have to passed around as reference counted interfaces.
Toughness
This is too much overhead, compared to the simple scheme i listed in the beginning. It avoids all kinds of object counting overheads and more importantly someone who inherits your program does not get trapped in the reference dependency threads traps that criss cross your system.
Alternative Platforms
This kind-of tooling might be easier to perform in a platform that supports and promotes this style of programming like .NET or Java.
You still have to worry about memory
Even with a platform such as this you would still have to ensure your objects get de-referenced in a proper manner. Else outstanding references could eat up your memory in the blink of an eye. So you see, reference counting is not the panacea to good programming practices, though it helps avoid lots of error checks and cleanups, which when applied the whole system considerably eases the programmers task.
Recommendation
That said, coming back to your original question which gave raise to all the reference counting doubts - Is it ok to expose your objects directly from the collection?
Programs cannot exist where all classes / all parts of the program are truly interdependent of each other. No, that would be impossible, as a program is the running manifestation of how your classes / modules interact. The ideal design can only minimize the dependencies and not remove them totally.
So my opinion would be, yes it is not a bad practice to expose the references to the objects from your collection, to other objects that need to work with them, provided you do this in a sane manner
Ensure that only a few classes / parts of your program can get such references to ensure minimum interdependency.
Ensure that the references / pointers passed are interfaces and not concrete objects so that the interdependency is avoided between concrete classes.
Ensure that the references are not further passed along deeper into the program.
Ensure that the program logic takes care of destroying the dependent objects, before cleaning up the actual objects that satisfy those references.
I think the bigger problem is that you're hiding the type of collection so even if you use a collection that doesn't move elements you may change your mind in the future. Externally that's not visible so I'd say it's not a good idea to do this.
std::list will not invalidate any iterators, pointers or references when you add or remove things from the list (apart from any that point the item being removed, obviously), so using a list in this way isn't going to break.
As others have pointed out, you may want not want to be handing out direct access to the private bits of this class. So changing the function to:
const MyContainedClass * getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return &(*it);
}
may be better, or if you always return a valid MyContainedClass object then you could use
const MyContainedClass& getElement() const {
// ...
std::list<MyContainedClass>::const_iterator it = ... // retrieve somehow
return *it;
}
to avoid the calling code having to cope with NULL pointers.
STL will be more familiar to a future programmer than your custom encapsulation, so you should avoid doing this if you can. There will be edge cases that you havent thought about which will come up later in the app's lifetime, wheras STL is failry well reviewed and documented.
Additionally most containers support somewhat similar operations like begin end push etc. So it should be fairly trivial to change the container type in your code should you change the container. eg vector to deque or map to hash_map etc.
Assuming you still want to do this for a more deeper reason, i would say the correct way to do this is to implement all the methods and iterator classes that list implements. Forward the calls to the member list calls when you need no changes. Modify and forward or do some custom actions where you need to do something special (the reason why you decide to this in the first place)
It would be easier if STl classes where designed to be inherited from but for efficiency sake it was decided not to do so. Google for "inherit from STL classes" for more thoughts on this.