Is using auto_ptr references as out variables idiomatic in C++?

Suppose I want to write a factory method that allocates heterogeneous objects on the heap and returns them to the caller. I am thinking of designing the API like this:
bool MakeEm(auto_ptr<Foo>& outFoo, auto_ptr<Bar>& outBar) {
...
if (...) {
return false;
}
outFoo.reset(new Foo(...));
outBar.reset(new Bar(...));
return true;
}
This allows a caller to do this:
auto_ptr<Foo> foo;
auto_ptr<Bar> bar;
MakeEm(foo, bar);
My question is: "Is this idiomatic? If not, what is the right way to do this?"
The alternative approaches I can think of include returning a struct of auto_ptrs or writing the factory API to take raw pointer references. Both require writing more code, and the latter has other gotchas when it comes to exception safety.

Asking if something is idiomatic can get you some very subjective answers.
In general, however, I think auto_ptr is a great way to convey ownership, so as the return value of a class factory, it's probably a Good Thing.
I would want to refactor this:
1. Return one object instead of two. If you need two objects that are so tightly coupled that neither can exist without the other, I'd say you have a strong case for an is-a or has-a refactoring.
2. This is C++. Really ask yourself whether you should return a value indicating success, forcing the consumer of your factory to check it every time. Throw exceptions, or let exceptions from the constructors of your classes propagate out of the factory. Would you ever want to be OK with false and then try to operate on empty auto_ptrs?
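A minimal sketch of that advice, assuming C++11: std::unique_ptr stands in for auto_ptr (deprecated in C++11 and removed in C++17), and Widget, MakeWidget and the size check are hypothetical. The factory returns a single object and throws instead of returning false, so a caller can never end up holding an empty pointer.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical product type; think of it as a Foo that owns its Bar (has-a).
struct Widget {
    int size;
    explicit Widget(int s) : size(s) {}
};

// Returns a fully constructed object or throws; there is no "false" path
// that leaves the caller holding an empty pointer.
std::unique_ptr<Widget> MakeWidget(int size) {
    if (size <= 0) {
        throw std::invalid_argument("MakeWidget: size must be positive");
    }
    return std::unique_ptr<Widget>(new Widget(size));  // std::make_unique needs C++14
}
```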

You don't have to make up your own struct to return two values - you can use std::pair. In that case there isn't much syntactic overhead in returning the two values. This solution does have the problem that ".first" and ".second" aren't very descriptive names, but if the types involved and the name of the function make the intent clear enough then that's not necessarily a problem.
If you are using C++0x you could use unique_ptr instead of auto_ptr, and the caller can use auto instead of having to type the longer std::pair<std::unique_ptr<A>, std::unique_ptr<B>>. If you are not using C++0x, you might consider using a typedef for that instead.
If you return the two values then you won't have space for the bool. You could use a C++0x tuple to return all three values. You could also indicate error by throwing an exception or by returning null pointers. I would prefer an exception assuming that the error is rare/exceptional.
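A sketch of the pair-plus-exception variant, assuming C++11; Foo, Bar, FooBarPair and the failure condition are hypothetical. Building each object into its own unique_ptr before forming the pair also means that if new Bar throws, the Foo is freed rather than leaked.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical types standing in for the question's Foo and Bar.
struct Foo { int id; explicit Foo(int i) : id(i) {} };
struct Bar { int id; explicit Bar(int i) : id(i) {} };

typedef std::pair<std::unique_ptr<Foo>, std::unique_ptr<Bar>> FooBarPair;

// Returns both objects or throws; there is no bool for the caller to check.
FooBarPair MakeEm(int seed) {
    if (seed < 0) {
        throw std::runtime_error("MakeEm: cannot create objects");
    }
    std::unique_ptr<Foo> foo(new Foo(seed));     // owned immediately
    std::unique_ptr<Bar> bar(new Bar(seed + 1)); // if this throws, foo is freed
    return FooBarPair(std::move(foo), std::move(bar));
}
```

At the call site, `auto result = MakeEm(1);` gives `result.first` and `result.second` without spelling out the pair type.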
As other answers have pointed out, it is often preferable to have two separate functions that each return a single object. If you can't do that because the initialization of the two objects is inextricably linked then you could make a class that encapsulates the initialization. You could pass the necessary information to make the two objects to the constructor (requires exception to signal errors) and then have two methods on that class that yield one object each.
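One way the encapsulating class could look, as a hypothetical sketch (FooBarBuilder, its spec string and the product types are made up for illustration): the constructor does the shared work and throws on failure, and each member function yields one object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical product types.
struct Foo { std::string name; };
struct Bar { std::size_t length; };

// Hypothetical builder: the constructor performs the linked initialization
// and throws on failure; each method then yields one object.
class FooBarBuilder {
public:
    explicit FooBarBuilder(const std::string& spec) : spec_(spec) {
        if (spec_.empty()) {
            throw std::invalid_argument("FooBarBuilder: empty spec");
        }
    }
    std::unique_ptr<Foo> MakeFoo() const {
        return std::unique_ptr<Foo>(new Foo{spec_});
    }
    std::unique_ptr<Bar> MakeBar() const {
        return std::unique_ptr<Bar>(new Bar{spec_.size()});
    }
private:
    std::string spec_;  // state shared by both objects
};
```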

Let's assume that a return value of false means "don't look at the output parameters".
Then what I would do is get rid of the bool return value, return a struct or pair that holds the auto_ptrs you want, and throw in the error condition.
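A sketch with a named struct rather than std::pair, so the call site reads fb.foo and fb.bar instead of .first and .second. It assumes C++11 unique_ptr in place of auto_ptr; FooAndBar and the canCreate flag are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

struct Foo { int value; };
struct Bar { int value; };

// Named members read better at the call site than .first/.second.
struct FooAndBar {
    std::unique_ptr<Foo> foo;
    std::unique_ptr<Bar> bar;
};

// Hypothetical factory: throws instead of returning false.
FooAndBar MakeEm(bool canCreate) {
    if (!canCreate) {
        throw std::runtime_error("MakeEm: preconditions not met");
    }
    FooAndBar result;
    result.foo.reset(new Foo{1});
    result.bar.reset(new Bar{2});
    return result;  // moved (or elided), never copied
}
```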

Usually when you have auto_ptr parameters they are not references.
This is because when you pass something to a function that takes an auto_ptr, you expect that function to take ownership. If you pass by reference, the function does not necessarily take the object (it may or may not).
It's a subtle point, but in the end you need to look at what your interface is trying to say to the user.
Also, you seem to be using it as an out parameter.
Personally I have never seen this use case (though I can see it working); just document what you are trying to do and, more importantly, why.
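To illustrate the by-value versus by-reference distinction, here is a sketch with std::unique_ptr, whose move-only semantics make the ownership transfer visible at the call site (with auto_ptr the transfer happens silently on copy). Registry, Adopt and MaybeTake are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Foo { int id; };

// Hypothetical registry. Taking unique_ptr BY VALUE says "I take ownership":
// the caller must std::move, and afterwards its pointer is empty.
class Registry {
public:
    void Adopt(std::unique_ptr<Foo> foo) { owned_.push_back(std::move(foo)); }
    std::size_t size() const { return owned_.size(); }
private:
    std::vector<std::unique_ptr<Foo>> owned_;
};

// Taking it BY REFERENCE says nothing definite: this function may or may not
// take the object, so the caller has to read the documentation.
void MaybeTake(std::unique_ptr<Foo>& foo, bool take, Registry& r) {
    if (take) {
        r.Adopt(std::move(foo));
    }
}
```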

As a general rule, if it involves auto_ptr, it's not idiomatic. The structure isn't idiomatic either: normally you'd make one function for each object, return by value and throw an exception if it fails, and if you need to share variables, make it an object.

Related

How to make sure that a function won't get a garbage pointer?

I have a function that accepts a pointer and returns an enum depending on some conditions related to that pointer:
my_enum function(char* ptr)
{
/* function body*/
if (condition1) return enum1;
if (condition2) return enum2;
if (condition3) return enum3;
if (condition4) return enum4;
else return enum5;
}
I have another function which also accepts a pointer, calls function() and reacts to the obtained value:
void another_function(char* ptr)
{
my_enum result = function(ptr);
if (result == MY_VALUE) std::cout << "OK" << std::endl;
}
I'm running Valgrind to check for memory leaks. The code above results in the following error:
Conditional jump or move depends on uninitialised value(s)
In fact, it is possible to pass an uninitialized pointer to function().
My question is: what is the best way of dealing with this situation (apart from using references instead)? I can't make sure that everyone who uses that code will initialize the pointer they pass to the function. Nor can I check inside my function whether the pointer points to garbage (I do check whether it is a null pointer, though).
Should I ignore such errors? If they are useless, why does Valgrind bother to inform me about them? There must be something I can do.
How far are you willing to go? If someone WANTS to break your code, they will, you can't help it.
The more effective the protections you apply, the more difficult they get.
The simplest one is to check for NULL. That doesn't catch garbage pointers, but it does catch ones that were deliberately invalidated. Most people are satisfied with that.
Then you may give the pointer a wrapper class. Instantiating this class requires a valid object pointed to (or some hopeless jumping through hoops to give it an invalid one, which amounts to purposefully shooting your foot), so no scenario of uninitialized pointer can occur - but the object can cease to exist before its pointer is used.
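A minimal sketch of such a wrapper, assuming C++11; NotNull and Buffer are hypothetical names. Because the only way to build one is from a reference to an existing object, it cannot hold a null or uninitialized pointer, but, as the paragraph says, it cannot stop that object from being destroyed first.

```cpp
#include <cassert>

// Hypothetical wrapper: constructible only from a reference to an existing
// object, so it never holds a null or uninitialized pointer.
template <class T>
class NotNull {
public:
    explicit NotNull(T& obj) : ptr_(&obj) {}
    NotNull(T&&) = delete;  // refuse temporaries, which would dangle at once
    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
private:
    T* ptr_;
};

// Hypothetical payload type standing in for whatever the char* pointed at.
struct Buffer { int size; };

// No null check needed: a NotNull<const Buffer> always refers to a Buffer.
int SizeOf(NotNull<const Buffer> buf) { return buf->size; }
```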
Then you can maintain a factory/manager class for these objects and their pointers. Every time a pointed-to object is created or destroyed, its pointer is registered or invalidated. This will be fail-proof unless your code is multithreaded and destruction can occur after your function has passed the checks but before it uses the validated value.
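Such a manager does not have to be hand-rolled: this sketch swaps in the standard library's std::shared_ptr for ownership and std::weak_ptr as the handle passed to the function. weak_ptr::lock() returns either an owning pointer that keeps the object alive while it is in use or an empty one, which also covers the check-then-use race for the destruction case. Resource, ReadValue and the -1 sentinel are hypothetical.

```cpp
#include <cassert>
#include <memory>

struct Resource { int value; };

// Returns -1 if the Resource has already been destroyed (hypothetical convention).
int ReadValue(const std::weak_ptr<Resource>& handle) {
    // lock() atomically yields either an owning pointer that keeps the object
    // alive for the rest of this scope, or an empty pointer if it is gone.
    if (std::shared_ptr<Resource> res = handle.lock()) {
        return res->value;
    }
    return -1;
}
```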
Then you can add thread safety, wrapping both your function and the manager in mutexes. This adds all kinds of headaches related to deadlocks and synchronization. But the user must really try very hard to create a class derived from yours (probably with #define private public first) that overrides its safety features...
With each step your overhead climbs to levels where the effect really stops being worth the effort. So just check that pointer for NULL and stop worrying about others out to get you.
Opinions will vary on what is the "best" approach, since it is impossible to prevent someone passing a bad (e.g. uninitialised, dangling) pointer at all.
A common solution is to avoid raw pointers altogether, and write the function in a way that does not accept a pointer at all.
One way is to accept a reference. Writing your code so it doesn't use raw pointers at all makes it harder to call your function with a bad parameter. The limitation is that the caller can still create a bad reference (e.g. by dereferencing a bad pointer) but it takes more effort (or a longer sequence of mistakes if done unwittingly) to pass a bad reference to a function than it does to pass a bad pointer.
Another way is to accept some class object by value (or reference, in some cases) to hold your pointer. Then implement all member functions of that so that they prevent a situation of holding a bad pointer. Give that class no member functions that accept a pointer. Ensure the constructors and other member functions maintain consistency (formally, the constructors establish a rigorous set of invariants, other member functions maintain that set of invariants). This includes techniques like throwing an exception if an attempt is made to construct an object using bad data (if an exception is thrown in the process of constructing an object, that object never exists, and cannot be passed in any manner to your function). As a result, your function can assume - if it is successfully called - that the data it receives is valid.
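A sketch of such a class, under the assumption that the invariant is simply "non-empty text"; NonEmptyText and LengthOf are hypothetical. The constructor throws on a null or empty pointer, so LengthOf never receives an invalid object. It still cannot detect a non-null garbage pointer handed to the constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical value type whose constructor establishes the invariant
// "text is non-empty"; there is no way to build one that violates it.
class NonEmptyText {
public:
    explicit NonEmptyText(const char* text) {
        if (text == nullptr || *text == '\0') {
            throw std::invalid_argument("NonEmptyText: null or empty input");
        }
        text_ = text;
    }
    const std::string& str() const { return text_; }
private:
    std::string text_;
};

// Can assume its argument is valid: an invalid one could never have been built.
std::size_t LengthOf(const NonEmptyText& t) { return t.str().size(); }
```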
The thing is, the above make it harder to accidentally pass bad data to your function. No technique can absolutely prevent someone who is determined enough (whether through genius or foolishness) to find a way to bypass all the safeguards, and to pass bad data to your function.
There are essentially two solutions.
Expect a valid pointer and state that clearly in the documentation of your API. Then any invalid use will cause UB, but it's not your fault. However, handling raw pointers is C-style and frowned upon by C++ programmers.
Take (the reference to) an encapsulated pointer type, which is always sensibly initialised, such as std::string (instead of const char*), std::unique_ptr, or std::shared_ptr. For example,
my_enum function(std::string const&str)
{
/* function body*/
if (str.empty()) // deal with improper input
std::cerr<<"warning: empty string in function()"<<std::endl;
if (condition1) return enum1;
if (condition2) return enum2;
if (condition3) return enum3;
if (condition4) return enum4;
else return enum5;
}
or
my_enum function(std::unique_ptr<SomeType> const&ptr)
{
/* function body*/
if (!ptr) { // deal with improper input
std::cerr<<"warning: invalid pointer in function()"<<std::endl;
return enum_error;
}
if (condition1) return enum1;
if (condition2) return enum2;
if (condition3) return enum3;
if (condition4) return enum4;
else return enum5;
}
This avoids raw pointers and is the C++ way of dealing with this sort of situation. One problem with the latter code is that it only works for unique_ptr arguments. One could generalise it (using SFINAE or otherwise) to take a const reference to any smart-pointer-like object (for instance, any type P with a member P::get() const returning a pointer to P::element_type).
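A sketch of that generalisation using SFINAE in a default template argument, assuming C++11. The template only participates in overload resolution for types with a const get() member (unique_ptr, shared_ptr and similar), so a raw pointer is rejected at compile time. enum_error and enum_ok are hypothetical enumerators.

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum my_enum { enum_error, enum_ok };  // hypothetical values

// Enabled only when ptr.get() is a valid expression on a const Ptr,
// i.e. for smart-pointer-like types; a raw pointer has no get() member.
template <class Ptr,
          class = decltype(std::declval<const Ptr&>().get())>
my_enum function(const Ptr& ptr) {
    if (!ptr.get()) {
        return enum_error;  // empty smart pointer
    }
    return enum_ok;  // real conditions would inspect *ptr here
}
```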

C++ function that can't return a meaningful value under some conditions

I have a member function with an object type as the return value type:
MyObject myfunction(parameters) {
if (some condition) {
return MyObject(parameters);
} else {
... no valid object can be created ...
}
}
Under certain conditions (checked for in the function body) an object of type MyObject cannot be created and returned.
Being just an occasional C++ programmer, I can spontaneously come up with three solutions:
1. Change the return type to MyObject* and return nullptr if no valid object can be created (C++11), then check for nullptr in the calling code.
2. Throw an exception if no object can be created and catch it in the calling code.
3. Create an object with some values that I define as invalid and check for that before using the returned object.
What would be the standard way of dealing with such a situation and the best solution in terms of performance? ... or some obvious work-around that I just don't see ...
A state-of-the-art C++11 solution would be perfect :-)
My thoughts so far:
Solution 1 seems OK, but it is C++11 only, and I would have to create the returned object on the heap in order to pass it to the main program (returning the object itself, and thus keeping it on the stack, might be quicker for small objects?).
Solution 2 might be slower and leads to verbose coding in the main program.
Solution 3 is probably the slowest (an object is created in vain) and not very convenient to check for in the main program.
For my code, no valid return object is the default situation rather than the exception, and the created object is rather small, but general considerations covering different cases are certainly useful for other readers' applications ...
Thanks a lot to all of you for help :-)
In the usual case, returning a Boost.Optional works:
boost::optional<MyObject> myfunction(parameters) {
if (some condition) {
return MyObject(parameters);
} else {
return boost::none;
}
}
And at the call site:
auto ret = myfunction(...);
if(ret)
// use '*ret' or 'ret.get()'
But as R. Martinho mentions, there are drawbacks to this solution (namely, move-only types don't work, because at the time of writing Boost.Optional had not been updated to support move semantics).
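Since C++17 the standard library offers std::optional, which this sketch swaps in for Boost.Optional; it does support move-only types. The object is stored inside the optional, so no heap allocation is involved. MyObject's single member and the empty-name condition are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for MyObject.
struct MyObject { std::string name; };

// Returns an empty optional when no valid object can be created.
std::optional<MyObject> myfunction(const std::string& name) {
    if (!name.empty()) {
        return MyObject{name};
    }
    return std::nullopt;
}
```

The call site is the same as with Boost: `if (auto ret = myfunction(...))` and then use `*ret` or `ret->name`.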
All 3 of your suggested solutions are valid and common, depending on the circumstances.
If being unable to create the object is an error condition that is likely to cause the calling function to have to abort, back up and retry, or take other drastic measures, then throw an exception.
If inability to create the object is a routine event, and you expect the caller to check if an object was created and proceed fairly normally in either case, returning null is a good solution.
If there's a reasonable dummy or blank object that can be created, that's a fine solution. But this is pretty rare. You should only do this if the caller will actually process the dummy object.
If you return a null pointer and then you find that every place you call this function you are writing
MyObject* myobject = myfunction(whatever);
if (myobject == nullptr) throw PanicException();
Then you might as well just throw the exception inside the function.
Worse, if you are writing:
MyObject* myobject=myfunction(whatever);
if (myobject != nullptr)
{
... process it ...
}
else
{
... display error message ...
}
Then you are just simulating exception handling with an IF statement. Use a real exception.
On the other hand, if you throw an exception and then you find you are regularly writing:
MyObject* myobject;
try
{
myobject=myfunction(whatever);
}
catch (const PanicException&)
{
myobject = nullptr;
}
Well then, you would have been better off to just return the null.
I've occasionally created dummy objects. The most common case is when a function returns a collection, like an array or linked list, and if I find no data to put in the collection, then return a collection with zero elements. Then the caller loops through the elements in the collection, and if there are none, that's just fine. I've had a few cases where I've returned an object with a zero-length string for the name or customer id or whatever. But in general, if you're just returning a dummy object so that the caller can test and say, oh, it's a dummy object, and then throw it away, I think you're better off to return null.
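A small sketch of the empty-collection case; FindCustomers and its matching rule are hypothetical. The caller's loop simply runs zero times when nothing matched, so no null check is needed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical lookup: "no data" is an empty vector, not a null pointer,
// so callers can loop over the result without a special case.
std::vector<std::string> FindCustomers(const std::vector<std::string>& all, char initial) {
    std::vector<std::string> matches;
    for (const std::string& name : all) {
        if (!name.empty() && name[0] == initial) {
            matches.push_back(name);
        }
    }
    return matches;
}
```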
BTW, I'm not sure what you meant when you said that you could only return a null pointer in C++11. The ability to pass around nulls goes back to the earliest version of C++ that I ever saw.
Since your question is phrased in generalities, I will also respond in generalities.
If you have a function whose job is to create and return an object, then that is its job.
Now if you want to design this function so that it does not return the object when certain conditions needed to build it are not met, you have actually changed the semantics of the function. Instead of just one responsibility, it now has three:
1. Determine whether the right conditions exist to construct the object.
2. If yes, construct and return the object.
3. If no, return nothing, or some value that indicates non-creation.
The "Single Responsibility Principle" suggests that in general, good design dictates that one function (or class or what have you) should have one job to do. Here, your function has three.
I would suggest that none of your suggested approaches is best in general. Rather, I would go with:
4. Implement a separate function to determine whether the object can be constructed. If that function returns true, call myFunction, which constructs the object and returns it.
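A sketch of option 4, with hypothetical Params and MyObject types and an assumed eligibility rule. What myFunction should do if called without the check is not specified here; this sketch throws std::logic_error, treating that as a caller bug.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical parameters and object.
struct Params { int width; int height; };
struct MyObject { int area; };

// One job: decide whether the object can be built.
bool canCreateMyObject(const Params& p) {
    return p.width > 0 && p.height > 0;
}

// One job: build it. Calling it without checking first is a caller bug,
// reported with an exception.
MyObject myFunction(const Params& p) {
    if (!canCreateMyObject(p)) {
        throw std::logic_error("myFunction: check canCreateMyObject first");
    }
    return MyObject{p.width * p.height};
}
```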
The first solution should be used unless the conditions that prevent the creation of the object are exceptional; in the non-exceptional case, returning a NULL pointer is a perfectly valid solution, even in C++11.
The "pointer" should of course be wrapped in an RAII container such as std::unique_ptr. That should be common practice in your code anyway.
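A minimal sketch of returning a null unique_ptr, assuming C++11; MyObject and the negative-value condition are hypothetical. An empty unique_ptr compares equal to nullptr, and a non-empty one deletes its object automatically when the caller is done with it.

```cpp
#include <cassert>
#include <memory>

struct MyObject { int value; };  // hypothetical

// Returns an empty unique_ptr when no object can be created; otherwise the
// caller owns the new object and it is freed automatically.
std::unique_ptr<MyObject> myfunction(int value) {
    if (value < 0) {
        return nullptr;
    }
    return std::unique_ptr<MyObject>(new MyObject{value});
}
```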
The third solution is a total waste of resources if you ask me. You would have to create an invalid (not useful) object, and copy it for the return value... only for it to be discarded.

How to avoid out parameters?

I've seen numerous arguments that using a return value is preferable to out parameters. I am convinced by the reasons to avoid them, but I'm unsure whether I'm running into cases where they are unavoidable.
Part One of my question is: What are some of your favorite/common ways of getting around using an out parameter? Stuff along the lines: Man, in peer reviews I always see other programmers do this when they could have easily done it this way.
Part Two of my question deals with some specific cases I've encountered where I would like to avoid an out parameter but cannot think of a clean way to do so.
Example 1:
I have a class with an expensive copy that I would like to avoid. Work can be done on the object and this builds up the object to be expensive to copy. The work to build up the data is not exactly trivial either. Currently, I will pass this object into a function that will modify the state of the object. This to me is preferable to new'ing the object internal to the worker function and returning it back, as it allows me to keep things on the stack.
class ExpensiveCopy // Defines some interface I can't change.
{
public:
ExpensiveCopy();
ExpensiveCopy(const ExpensiveCopy& toCopy) { /* Ouch! This hurts. */ }
ExpensiveCopy& operator=(const ExpensiveCopy& toCopy) { /* Ouch! This hurts. */ return *this; }
void addToData(SomeData);
SomeData getData();
};
class B
{
public:
static void doWork(ExpensiveCopy& ec_out, int someParam);
//or
// Your Function Here.
};
Using my function, I get calling code like this:
const int SOME_PARAM = 5;
ExpensiveCopy toModify;
B::doWork(toModify, SOME_PARAM);
I'd like to have something like this:
ExpensiveCopy theResult = B::doWork(SOME_PARAM);
But I don't know if this is possible.
Second Example:
I have an array of objects. The objects in the array are a complex type, and I need to do work on each element, work that I'd like to keep separated from the main loop that accesses each element. The code currently looks like this:
std::vector<ComplexType> theCollection;
for (std::size_t index = 0; index < theCollection.size(); ++index)
{
doWork(theCollection[index]);
}
void doWork(ComplexType& ct_out)
{
//Do work on the individual element.
}
Any suggestions on how to deal with some of these situations? I work primarily in C++, but I'm interested to see if other languages facilitate an easier setup. I have encountered RVO as a possible solution, but I need to read up more on it and it sounds like a compiler specific feature.
I'm not sure why you're trying to avoid passing references here. It's pretty much these situations that pass-by-reference semantics exist.
The code
static void doWork(ExpensiveCopy& ec_out, int someParam);
looks perfectly fine to me.
If you really want to modify it, you've got a couple of options:
1. Move doWork so that it's a member of ExpensiveCopy (which you say you can't do, so that's out).
2. Return a (smart) pointer from doWork instead of copying it (which you don't want to do, as you want to keep things on the stack).
3. Rely on RVO (which others have pointed out is supported by pretty much all modern compilers).
Every useful compiler does RVO (return value optimization) if optimizations are enabled, thus the following effectively doesn't result in copying:
Expensive work() {
// ... no branched returns here
return Expensive(foo);
}
Expensive e = work();
In some cases compilers can apply NRVO, named return value optimization, as well:
Expensive work() {
Expensive e; // named object
// ... no branched returns here
return e; // return named object
}
This however isn't exactly reliable, only works in more trivial cases and would have to be tested. If you're not up to testing every case, just use out-parameters with references in the second case.
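A quick way to check this on your own compiler is to count copies. In this sketch (Expensive, copies and work are hypothetical names) the return statement constructs a temporary, which is the RVO case: C++17 guarantees that copy is elided, and mainstream compilers already elided it before that. NRVO, the named-object case, is still not guaranteed, so a similar check is worth running for it.

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied.
struct Expensive {
    static int copies;
    int payload;
    explicit Expensive(int p) : payload(p) {}
    Expensive(const Expensive& other) : payload(other.payload) { ++copies; }
};
int Expensive::copies = 0;

// Returns a temporary: the RVO case. The result is constructed directly
// in the caller's object, so the copy constructor is not called.
Expensive work(int foo) {
    return Expensive(foo);
}
```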
IMO the first thing you should ask yourself is whether copying ExpensiveCopy really is prohibitively expensive. And to answer that, you will usually need a profiler. Unless a profiler tells you that the copying really is a bottleneck, simply write the code that's easier to read: ExpensiveCopy obj = doWork(param);.
Of course, there are indeed cases where objects cannot be copied for performance or other reasons. Then Neil's answer applies.
In addition to all the comments here, I'd mention that in C++0x you'd rarely use an output parameter for optimization purposes, because of move constructors.
Unless you are going down the "everything is immutable" route, which doesn't sit too well with C++, you cannot easily avoid out parameters. The C++ Standard Library uses them, and what's good enough for it is good enough for me.
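A sketch of what move constructors buy you, assuming C++11; Big and build are hypothetical. Returning a named local either gets NRVO or, failing that, is treated as an rvalue and moved, so the copy counter should stay at zero even though a copy constructor exists.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type: cheap to move (steals the vector's buffer), costly to copy.
struct Big {
    static int copies;
    std::vector<int> data;
    Big() = default;
    Big(const Big& other) : data(other.data) { ++copies; }
    Big(Big&&) noexcept = default;
};
int Big::copies = 0;

// Returning a named local: either NRVO applies or the local is moved.
// It is never copied.
Big build(int n) {
    Big b;
    for (int i = 0; i < n; ++i) b.data.push_back(i);
    return b;
}
```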
As to your first example: return value optimization will often allow the returned object to be created directly in-place, instead of having to copy the object around. All modern compilers do this.
What platform are you working on?
The reason I ask is that many people have suggested Return Value Optimization, which is a very handy compiler optimization present in almost every compiler. Additionally Microsoft and Intel implement what they call Named Return Value Optimization which is even more handy.
In standard Return Value Optimization your return statement is a call to an object's constructor, which tells the compiler to eliminate the temporary values (not necessarily the copy operation).
In Named Return Value Optimization you can return a value by its name and the compiler will do the same thing. The advantage to NRVO is that you can do more complex operations on the created value (like calling functions on it) before returning it.
While neither of these really eliminates an expensive copy if your returned data is very large, they do help.
In terms of avoiding the copy the only real way to do that is with pointers or references because your function needs to be modifying the data in the place you want it to end up in. That means you probably want to have a pass-by-reference parameter.
Also I figure I should point out that pass-by-reference is very common in high-performance code for specifically this reason. Copying data can be incredibly expensive, and it is often something people overlook when optimizing their code.
As far as I can see, the reasons to prefer return values to out parameters are that it's clearer, and it works with pure functional programming (you can get some nice guarantees if a function depends only on input parameters, returns a value, and has no side effects). The first reason is stylistic, and in my opinion not all that important. The second isn't a good fit with C++. Therefore, I wouldn't try to distort anything to avoid out parameters.
The simple fact is that some functions have to return multiple things, and in most languages this suggests out parameters. Common Lisp has multiple-value-bind and values: the bind provides a list of symbols, and values returns a list of values. In some cases, a function can return a composite value, such as a list of values which will then get deconstructed, and it isn't a big deal for a C++ function to return a std::pair. Returning more than two values this way in C++ gets awkward. It's always possible to define a struct, but defining and creating it will often be messier than out parameters.
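C++11's std::tuple and std::tie reduce that awkwardness somewhat; this sketch swaps them in for a hand-written struct. ParseRecord and its "id:name" format are hypothetical, and std::stoi throws std::invalid_argument if the id part is not numeric. In C++17, structured bindings (auto [ok, id, name] = ParseRecord(line);) shorten the call site further.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical parser returning three values at once: success flag, id, name.
std::tuple<bool, int, std::string> ParseRecord(const std::string& line) {
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) {
        return std::make_tuple(false, 0, std::string());
    }
    int id = std::stoi(line.substr(0, colon));
    return std::make_tuple(true, id, line.substr(colon + 1));
}
```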
In some cases, the return value gets overloaded. In C, getchar() returns an int, with the idea being that there are more int values than char (true in all implementations I know of, false in some I can easily imagine), so one of the values can be used to denote end-of-file. atoi() returns an integer, either the integer represented by the string it's passed or zero if there is none, so it returns the same thing for "0" and "frog". (If you want to know whether there was an int value or not, use strtol(), which does have an out parameter.)
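A sketch of the strtol() approach; ParseLong is a hypothetical helper. The end pointer that strtol() writes through its second argument is what lets the caller tell "0" apart from "frog", and the helper itself reports success through a bool return plus a long* out parameter, in the spirit of this answer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: returns true only if the whole string is a valid long.
bool ParseLong(const char* text, long* out) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;  // no digits, trailing junk, or out of range
    }
    *out = value;
    return true;
}
```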
There's always the technique of throwing an exception in case of an error, but not all multiple return values are errors, and not all errors are exceptional.
So, overloaded return values cause problems, multiple-value returns aren't easy to use in all languages, and single returns don't always exist. Throwing an exception is often inappropriate. Using out parameters is very often the cleanest solution.
Ask yourself why you have a method that performs work on this expensive-to-copy object in the first place. Say you have a tree: would you send the tree off to some building function, or give the tree its own building method? Situations like this come up constantly when your design is a little off, but tend to disappear once you have it down pat.
I know in practicality we don't always get to change every object at all, but passing in out parameters is a side effect operation, and it makes it much harder to figure out what's going on, and you never really have to do it (except as forced by working within others' code frameworks).
Sometimes it is easier, but it's definitely not desirable to use it for no reason (if you've suffered through a few large projects where there's always half a dozen out parameters you'll know what I mean).

AddRef and function signature

I've always used the following rule for signatures of functions that return ref-counted objects based on whether they do an AddRef or not, but want to explain it to my colleagues too... So my question is, is the rule described below a widely followed rule? I'm looking for pointers to (for example) coding rules that advocate this style.
If the function does not add a reference to the object, it should be returned as the return value of the function:
class MyClass
{
protected:
IUnknown *getObj() { return m_obj; }
private:
IUnknown *m_obj;
};
However, if the function adds a reference to the object, then a pointer-to-pointer of the object is passed as a parameter to the function:
class MyClass
{
public:
void getObj(IUnknown **outObj) { *outObj = m_obj; (*outObj)->AddRef(); }
private:
IUnknown *m_obj;
};
It's much more typical to use the reference-counting smart pointers for cases when a new object is created and the caller has to take ownership of it.
I've used this same style on projects with a lot of COM. It was taught to me by a couple of people who learned it when they worked at NuMega on a little thing called SoftICE. I think this is also the style taught in the book "Essential COM" by Don Box. At one point in time this book was considered the Bible for COM. I think the only reason this isn't still the case is that COM has become so much more than just COM.
All that said, I prefer CComPtr and other smart pointers.
One approach is to never use the function's return value. Only use output parameters, as in your second case. This is already a rule anyway in published COM interfaces.
Here's an "official" reference but, as is typical, it doesn't even mention your first case: http://support.microsoft.com/kb/104138
But inside a component, banning return values makes for ugly code. It is much nicer to have composability - i.e. putting functions together conveniently, passing the return value of one function directly as an argument to another.
Smart pointers allow you to do that. They are banned in public COM interfaces but then so are non-HRESULT return values. Consequently, your problem goes away. If you want to use a return value to pass back an interface pointer, do it via a smart pointer. And store members in smart pointers as well.
However, suppose for some reason you didn't want to use smart pointers (you're crazy, by the way!) then I can tell you that your reasoning is correct. Your function is acting as a "property getter", and in your first example it should not AddRef.
So your rule is correct (although there's a bug in your implementation which I'll come to in a second, as you may not have spotted it.)
This function wants an object:
void Foo(IUnknown *obj);
It doesn't need to affect obj's refcount at all, unless it wants to store it in a member variable. It certainly should NOT be the responsibility of Foo to call Release on obj before it returns! Imagine the mess that would create.
Now this function returns an object:
IUnknown *Bar();
And very often we like to compose functions, passing the output of one directly to another:
Foo(Bar());
This would not work if Bar had bumped up the refcount of whatever it returned. Who's going to Release it? So Bar does not call AddRef. This means that it is returning something that it stores and manages, i.e. it's effectively a property getter.
Also if the caller is using a smart pointer, p:
p = Bar();
Any sane smart pointer is going to AddRef when it is assigned an object. If Bar had also AddRef-ed, we would again leak one count. This is really just a special case of the same composability problem.
Output parameters (pointer-to-pointer) are different, because they aren't affected by the composability problem in the same way:
Again, smart pointers provide the most common case, using your second example:
myClass.getObj(&p);
The smart pointer isn't going to do any ref-counting here, so getObj has to do it.
Now we come to the bug. Suppose smart pointer p already points to something when you pass it to getObj...
The corrected version is:
void getObj(IUnknown **outObj)
{
if (*outObj != 0)
(*outObj)->Release();
*outObj = m_obj;
(*outObj)->AddRef(); // might want to check for 0 here also
}
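The rule and the fix can be exercised without COM. In this sketch RefCounted is a hypothetical stand-in for IUnknown that only tracks a count, and MyClass offers both a getter that does not AddRef and the corrected out-parameter version. In real COM code you would normally let CComPtr or a similar smart pointer do this bookkeeping.

```cpp
#include <cassert>

// Hypothetical stand-in for IUnknown: just a reference count.
class RefCounted {
public:
    void AddRef() { ++refs_; }
    void Release() { if (--refs_ == 0) delete this; }
    int refs() const { return refs_; }
private:
    int refs_ = 1;  // the creator holds the first reference
};

class MyClass {
public:
    MyClass() : m_obj(new RefCounted) {}
    ~MyClass() { m_obj->Release(); }
    MyClass(const MyClass&) = delete;
    MyClass& operator=(const MyClass&) = delete;

    // Property getter: no AddRef, so Foo(getObj()) composes without leaking.
    RefCounted* getObj() const { return m_obj; }

    // Out parameter: release whatever the slot held, then AddRef for the caller.
    void getObj(RefCounted** outObj) const {
        if (*outObj != nullptr) (*outObj)->Release();
        *outObj = m_obj;
        (*outObj)->AddRef();
    }
private:
    RefCounted* m_obj;
};
```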
In practise, people make that mistake so often that I find it simpler to make my smart pointer assert if operator& is called when it already has an object.

What do you think about throwing an exception for "not found" in C++?

I know most people consider that bad practice, but when you are trying to make your class's public interface work only with references, keeping pointers inside and using them only when necessary, I think there is no way to return something saying that the value you are looking for doesn't exist in the container.
class list {
public:
value &get(type key);
};
Suppose you don't want dangerous pointers to be visible in the public interface of the class. How do you return "not found" in this case? By throwing an exception?
What is your approach to that? Do you return an empty value and check for the empty state of it? I actually use the throw approach but I introduce a checking method:
class list {
public:
bool exists(type key);
value &get(type key);
};
So when I forget to check that the value exists first I get an exception, that is really an exception.
How would you do it?
The STL deals with this situation by using iterators. For example, the std::map class has a similar function:
iterator find( const key_type& key );
If the key isn't found, it returns 'end()'. You may want to use this iterator approach, or to use some sort of wrapper for your return value.
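A sketch of the iterator approach for a container like the one in the question; KeyedList is a hypothetical name (backed here by std::map) chosen to avoid clashing with std::list. find() returns end() for a missing key, and a found iterator still lets the caller modify the element in place.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical container following the std::map convention:
// find() returns end() instead of throwing when the key is absent.
class KeyedList {
public:
    typedef std::map<std::string, int>::iterator iterator;
    void set(const std::string& key, int value) { items_[key] = value; }
    iterator find(const std::string& key) { return items_.find(key); }
    iterator end() { return items_.end(); }
private:
    std::map<std::string, int> items_;
};
```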
The correct answer (according to Alexandrescu) is:
Optional and Enforce
First of all, do use the Accessor, but in a safer way without inventing the wheel:
boost::optional<X> get_X_if_possible();
Then create an enforce helper:
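// E defaults to std::runtime_error: a template parameter cannot be deduced from a default argument
template <class T, class E = std::runtime_error>
T& enforce(boost::optional<T>& opt, E e = E("enforce failed"))
{
if(!opt)
{
throw e;
}
return *opt;
}
// and an overload for T const &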
This way, depending on what might the absence of the value mean, you either check explicitly:
if(boost::optional<X> maybe_x = get_X_if_possible())
{
X& x = *maybe_x;
// use x
}
else
{
oops("Hey, we got no x again!");
}
or implicitly:
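boost::optional<X> maybe_x = get_X_if_possible();
X& x = enforce(maybe_x); // needs an lvalue: binding x into a temporary optional would dangle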
// use x
You use the first way when you’re concerned about efficiency, or when you want to handle the failure right where it occurs. The second way is for all other cases.
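The same pattern with C++17's std::optional swapped in for boost::optional, as a sketch; the bool parameter of get_X_if_possible is hypothetical. Note that std::optional::value() already throws std::bad_optional_access when the optional is empty, so enforce is mainly useful when you want to choose the exception type.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// E defaults to std::runtime_error so enforce(opt) works without a second argument.
template <class T, class E = std::runtime_error>
T& enforce(std::optional<T>& opt, const E& e = E("enforce failed")) {
    if (!opt) {
        throw e;
    }
    return *opt;
}

// Hypothetical accessor.
std::optional<int> get_X_if_possible(bool available) {
    if (available) return 42;
    return std::nullopt;
}
```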
The problem with exists() is that you'll end up searching twice for things that do exist (first check if it's in there, then find it again). This is inefficient, particularly if (as its name of "list" suggests) your container is one where searching is O(n).
Sure, you could do some internal caching to avoid the double search, but then your implementation gets messier, your class becomes less general (since you've optimised for a particular case), and it probably won't be exception-safe or thread-safe.
Don't use an exception in such a case. In C++, exceptions have a nontrivial performance overhead, even when no exception is thrown, and they additionally make reasoning about the code much harder (cf. exception safety).
Best practice in C++ is one of the following two ways; both are used in the STL:
As Martin pointed out, return an iterator. In fact, your iterator can simply be a typedef for a raw pointer; nothing speaks against it. Since this is consistent with the STL, you could even argue that this way is superior to returning a reference.
Return a std::pair<bool, yourvalue>. This makes it impossible to modify the stored value, though, since the pair holds a copy, and std::pair does not work well with reference members.
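A minimal sketch of the pair-returning lookup, assuming a std::map<int, std::string> container; `lookup` is a hypothetical name. It also shows the drawback mentioned above: the second member is a copy, so changing it does not touch the container. The value type must be default-constructible to fill the "not found" case.

```cpp
#include <map>
#include <string>
#include <utility>

// Illustrative lookup returning (found, copy of value). In the "not found"
// case the string is only a default-constructed placeholder.
std::pair<bool, std::string> lookup(const std::map<int, std::string>& m, int key) {
    std::map<int, std::string>::const_iterator it = m.find(key);
    if (it == m.end()) {
        return std::make_pair(false, std::string());
    }
    return std::make_pair(true, it->second);
}
```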
/EDIT:
This answer has spawned quite a bit of controversy, visible in the comments and, less visibly, in the many downvotes it got. I found this rather surprising.
This answer was never meant as the ultimate point of reference. The “correct” answer had already been given by Martin: exceptions reflect the behaviour in this case rather poorly. It's semantically more meaningful to use some signalling mechanism other than exceptions.
Fine. I completely endorse this view. No need to mention it once again. Instead, I wanted to give an additional facet to the answers. While minor speed boosts should never be the first rationale for any decision-making, they can provide further arguments and in some (few) cases, they may even be crucial.
Actually, I've mentioned two facets: performance and exception safety. I believe the latter to be rather uncontroversial. While it's extremely hard to give strong exception guarantees (the strongest, of course, being “nothrow”), I believe it's essential: any code that is guaranteed not to throw exceptions makes the whole program easier to reason about. Many C++ experts emphasize this (e.g. Scott Meyers in Item 29 of “Effective C++”).
About speed: Martin York has pointed out that this no longer applies to modern compilers. I respectfully disagree. The C++ language makes it necessary for the environment to keep track, at runtime, of code paths that may be unwound in the case of an exception. Now, this overhead isn't really all that big (and it's quite easy to verify this), so “nontrivial” in my text above may have been too strong.
However, I find it important to draw a distinction between C++ and many modern “managed” languages like C#. The latter have no additional overhead as long as no exception is thrown, because the information necessary to unwind the stack is kept anyway. By and large, I stand by my choice of words.
STL Iterators?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigating through a container, not serving as a simple accessor.
If your accessor is one among many, then iterators are the way to go, because you will be able to use them to move through the container. But if your accessor is a simple getter, able to return either the value or the fact that there is no value, then your iterator is perhaps only a glorified pointer...
Which leads us to...
Smart pointers?
The point of smart pointers is to simplify pointer ownership. With a shared pointer, you get a resource (memory) that will be shared, at the cost of some overhead (shared pointers need to allocate an integer as a reference counter...).
You have to choose: either your value is already inside a shared pointer, and then you can return that shared pointer (or a weak pointer); or your value is inside a raw pointer, and then you can return the raw pointer. You don't want to return a shared pointer if your resource is not already inside a shared pointer: a world of funny things will happen when your shared pointer goes out of scope and deletes your value without telling you...
:-p
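A minimal sketch of the case where the value is already owned by a shared pointer, so returning a shared_ptr (or a weak_ptr) is safe. `Cache` is a hypothetical class and std::string stands in for the value type. An empty shared_ptr signals "not found", and a caller's copy keeps the value alive even after the container drops it.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical cache whose values are already owned by shared_ptr.
class Cache {
public:
    void put(const std::string& key, std::shared_ptr<std::string> value) {
        items_[key] = std::move(value);
    }

    // Returns an empty shared_ptr when the key is absent.
    std::shared_ptr<std::string> get(const std::string& key) const {
        auto it = items_.find(key);
        if (it == items_.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Non-owning alternative: the caller must lock() it before use.
    std::weak_ptr<std::string> peek(const std::string& key) const {
        return get(key);
    }

    void erase(const std::string& key) { items_.erase(key); }

private:
    std::map<std::string, std::shared_ptr<std::string>> items_;
};
```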
Pointers?
If your interface is clear about the ownership of its resources, and about the fact that the returned value can be NULL, then you could return a simple, raw pointer. If the user of your code is dumb enough to ignore the interface contract of your object, or to do arithmetic or whatever with your pointer, then he/she will be dumb enough to break any other way you choose to return the value, so don't bother designing for them...
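A minimal sketch of the raw-pointer contract described above, with a hypothetical `Catalog` that owns its items in a std::vector. The returned pointer never transfers ownership, nullptr means "not found", and (an assumption of this particular container) the pointer is invalidated by the next add().

```cpp
#include <cstddef>
#include <vector>

struct Item {
    int id;
    double price;
};

// Hypothetical container that owns its Items and hands out non-owning pointers.
class Catalog {
public:
    void add(const Item& item) { items_.push_back(item); }

    // Contract: returns nullptr if no item has this id. The caller must not
    // delete the pointer; it is invalidated by the next add().
    Item* find(int id) {
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (items_[i].id == id) {
                return &items_[i];
            }
        }
        return nullptr;
    }

private:
    std::vector<Item> items_;
};
```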
Undefined Value
If your Value type really already has some kind of "undefined" value, and the user knows that and will accept handling it, this is a possible solution, similar to the pointer or iterator solution.
But do not add an "undefined" value to your Value class because of the problem you asked about: you'll end up raising the "references vs. pointers" war to another level of insanity. Code users want the objects you give them to either be OK or not exist. Having to test on every other line whether the object is still valid is a pain, and it needlessly complicates the user code, through your fault.
Exceptions
Exceptions are usually not as costly as some people would like them to be. But for a simple accessor that is called often, the cost could be non-trivial.
For example, the STL std::vector has two accessors to its value through an index:
T & std::vector::operator[]( /* index */ )
and:
T & std::vector::at( /* index */ )
The difference is that [] does not check the index and never throws. So if you access outside the range of the vector, you're on your own: it is undefined behaviour, probably memory corruption and a crash sooner or later. So you should be really sure you have verified the code using it.
On the other hand, at checks the index and throws. This means that if you access outside the range of the vector, you'll get a clean exception (std::out_of_range). This method is better if you want to delegate the processing of an error to other code.
Personally, I use [] when I'm accessing the values inside a loop, or something similar. I use at when I feel an exception is the right way to tell the current code (or the calling code) that something went wrong.
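A short sketch of that division of labour; both function names are hypothetical. Inside a loop bounded by size() the unchecked operator[] is safe, while an index coming from outside goes through at(), which reports a bad index as std::out_of_range and leaves the handling to the caller.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hot loop: indices are known to be in range, so unchecked operator[] is fine.
int sum(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Index comes from outside (e.g. user input): at() throws std::out_of_range
// on a bad index and lets the calling code decide what to do.
int element_at(const std::vector<int>& v, std::size_t index) {
    return v.at(index);
}
```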
So what?
In your case, you must choose:
If you really need lightning-fast access, then the throwing accessor could be a problem. But this means you have already used a profiler on your code to determine that this is a bottleneck, doesn't it?
;-)
If you know that a missing value can happen often, and/or you want your client to propagate a possibly null/invalid/whatever pointer to the accessed value, then return a pointer (if your value is held by a simple pointer) or a weak/shared pointer (if your value is owned by a shared pointer).
But if you believe the client won't propagate this "null" value, or that they should not propagate a NULL pointer (or smart pointer) in their code, then use the reference protected by the exception. Add a "hasValue" method returning a boolean, and throw if the user tries to get the value when there is none.
Last but not least, consider the code that will be used by the user of your object:
// If you want your user to have this kind of code, then choose either
// pointer or smart pointer solution
void doSomething(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
}
MyValue * doSomethingElseAndReturnValue(MyClass & p_oMyClass)
{
MyValue * pValue = p_oMyClass.getValue() ;
if(pValue != NULL)
{
// Etc.
}
return pValue ;
}
// ==========================================================
// If you want your user to have this kind of code, then choose the
// throwing reference solution
void doSomething(MyClass & p_oMyClass)
{
if(p_oMyClass.hasValue())
{
MyValue & oValue = p_oMyClass.getValue() ;
}
}
So, if your main problem is choosing between the two user codes above, your problem is not about performance but about "code ergonomics". Thus, the exception solution should not be dismissed because of potential performance issues.
:-)
Accessor?
The "iterator" idea proposed before me is interesting, but the real point of iterators is navigating through a container, not serving as a simple accessor.
I agree with paercebal: an iterator is for iterating. I don't like the way the STL does it. But the idea of an accessor seems more appealing. So what do we need? A container-like class that feels like a boolean for testing but behaves like the original return type. That would be feasible with cast operators.
#include <cstddef>
#include <stdexcept>
template <class T> class Accessor {
public:
Accessor(): _value(NULL)
{}
Accessor(T &value): _value(&value)
{}
operator T &() const
{
if (!_value)
throw std::logic_error("that is a problem and you made a mistake somewhere.");
else
return *_value;
}
operator bool () const
{
return _value != NULL;
}
private:
T *_value;
};
Now, is there any foreseeable problem? An example usage:
Accessor <type> value = list.get(key);
if (value) {
type &v = value;
v.doSomething();
}
How about returning a shared_ptr as the result. This can be null if the item wasn't found. It works like a pointer, but it will take care of releasing the object for you.
(I realize this is not always the right answer, and my tone a bit strong, but you should consider this question before deciding for other more complex alternatives):
So, what's wrong with returning a pointer?
I've seen this one many times in SQL, where people will do their utmost never to deal with NULL columns, as if they had some contagious disease or something. Instead, they cleverly come up with a "blank" or "not-there" artificial value like -1, 9999 or even something like '#X-EMPTY-X#'.
My answer: the language already has a construct for "not there"; go ahead, don't be afraid to use it.
What I prefer doing in situations like this is having a throwing "get" and, for those circumstances where performance matters or failure is common, a "tryGet" function along the lines of "bool tryGet(type key, value **pp)", whose contract is that if true is returned then *pp is a valid pointer to some object, else *pp is null.
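A minimal sketch of that get/tryGet pair, assuming a std::map<std::string, std::string> backing store; the class `Settings` and its `set` method are hypothetical. The throwing get() is built on top of tryGet(), so each call does a single lookup.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical container offering both a throwing and a non-throwing accessor.
class Settings {
public:
    void set(const std::string& key, const std::string& value) { items_[key] = value; }

    // Contract: returns true and sets *pp to the stored value, or returns
    // false and sets *pp to nullptr.
    bool tryGet(const std::string& key, std::string** pp) {
        std::map<std::string, std::string>::iterator it = items_.find(key);
        if (it == items_.end()) {
            *pp = nullptr;
            return false;
        }
        *pp = &it->second;
        return true;
    }

    // Throwing accessor for callers that treat a missing key as an error.
    std::string& get(const std::string& key) {
        std::string* p = nullptr;
        if (!tryGet(key, &p)) {
            throw std::out_of_range("Settings::get: missing key " + key);
        }
        return *p;
    }

private:
    std::map<std::string, std::string> items_;
};
```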
@aradtke, you said:
I agree with paercebal: an iterator is for iterating. I don't like the way the STL does it. But the idea of an accessor seems more appealing. So what do we need? A container-like class that feels like a boolean for testing but behaves like the original return type. That would be feasible with cast operators. [..] Now, is there any foreseeable problem?
First, YOU DO NOT WANT OPERATOR bool. See the Safe Bool idiom for more info. But about your question...
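To show the problem concretely: an implicit operator bool lets the wrapper slip into arithmetic, because bool promotes to int. The Safe Bool idiom was the C++03 workaround; since C++11 the usual replacement is an explicit conversion operator, which is what this sketch uses instead of the idiom itself. `ImplicitFlag` and `ExplicitFlag` are hypothetical types.

```cpp
#include <type_traits>

// Implicit conversion: the wrapper also converts to int (bool -> int promotion).
struct ImplicitFlag {
    bool ok;
    operator bool() const { return ok; }
};

// C++11 explicit conversion: still usable in if/while/!/&&/|| ("contextual
// conversion"), but not in arbitrary expressions such as f + 1.
struct ExplicitFlag {
    bool ok;
    explicit operator bool() const { return ok; }
};

static_assert(std::is_convertible<ImplicitFlag, int>::value,
              "implicit operator bool leaks into integer contexts");
static_assert(!std::is_convertible<ExplicitFlag, int>::value,
              "explicit operator bool does not");
```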
Here's the problem: users now need an explicit cast in some cases. Pointer-like proxies (such as iterators, ref-counted pointers, and raw pointers) have a concise 'get' syntax. Providing a conversion operator is not very useful if callers have to invoke it with extra code.
Starting with your reference-like example, the most concise way to write it is:
// 'reference' style, check before use
if (Accessor<type> value = list.get(key)) {
type &v = value;
v.doSomething();
}
// or
if (Accessor<type> value = list.get(key)) {
static_cast<type&>(value).doSomething();
}
This is okay, don't get me wrong, but it's more verbose than it has to be. Now consider the case where we know, for some reason, that list.get will succeed. Then:
// 'reference' style, skip check
type &v = list.get(key);
v.doSomething();
// or
static_cast<type&>(list.get(key)).doSomething();
Now let's go back to iterator/pointer behavior:
// 'pointer' style, check before use
if (Accessor<type> value = list.get(key)) {
value->doSomething();
}
// 'pointer' style, skip check
list.get(key)->doSomething();
Both are pretty good, but pointer/iterator syntax is just a bit shorter. You could give 'reference' style a member function 'get()'... but that's already what operator*() and operator->() are for.
The 'pointer' style Accessor now has operator 'unspecified bool', operator*, and operator->.
And guess what... a raw pointer meets these requirements, so for prototyping, have list.get() return T* instead of an Accessor. Then, when the design of list is stable, you can come back and write the Accessor, a pointer-like proxy type.
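A minimal sketch of such a pointer-style Accessor, under these assumptions: it uses a C++11 explicit operator bool in place of the "unspecified bool" idiom, it keeps the original Accessor's behaviour of throwing (here std::logic_error) when an empty one is dereferenced, and `Widget`/`WidgetList` are hypothetical stand-ins for the question's list.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Pointer-like proxy: explicit operator bool, operator* and operator->.
template <class T>
class Accessor {
public:
    Accessor() : ptr_(nullptr) {}
    explicit Accessor(T& value) : ptr_(&value) {}

    explicit operator bool() const { return ptr_ != nullptr; }

    // Dereferencing an empty Accessor throws instead of being undefined behaviour.
    T& operator*() const {
        if (!ptr_) throw std::logic_error("Accessor: no value");
        return *ptr_;
    }
    T* operator->() const { return &operator*(); }

private:
    T* ptr_;
};

struct Widget {
    int hits = 0;
    void poke() { ++hits; }
};

// Hypothetical list whose get() returns an empty Accessor for a missing key.
class WidgetList {
public:
    void add(const std::string& key) { items_[key] = Widget(); }
    Accessor<Widget> get(const std::string& key) {
        std::map<std::string, Widget>::iterator it = items_.find(key);
        return it == items_.end() ? Accessor<Widget>() : Accessor<Widget>(it->second);
    }

private:
    std::map<std::string, Widget> items_;
};
```

Swapping `Accessor<Widget>` for `Widget*` in get() keeps the call sites (`if (auto w = list.get(k)) w->poke();`) unchanged, apart from losing the throw on an empty dereference.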
Interesting question. Using references exclusively is a problem in C++, I guess; in Java, references are more flexible and can be null. I can't remember if it's legal C++ to force a null reference:
MyType *pObj = nullptr;
return *pObj;
But I consider this dangerous (dereferencing a null pointer is in fact undefined behaviour). Again, in Java I'd throw an exception, as this is common there, but I rarely see exceptions used so freely in C++.
If I were making a public API for a reusable C++ component and had to return a reference, I guess I'd go the exception route.
My real preference is to have the API return a pointer; I consider pointers an integral part of C++.