Approaching STL algorithms, lambda, local classes and other approaches

Approaching STL algorithms, lambda, local classes and other approaches - c++

One of the things that seems to be necessary with use of STL is a way to specify local functions.
Many of the functions that I would normally provide cannot be created using STL function object creation tools ( eg bind ), I have to hand roll my function object.
Since the C++ standard forbids local types to be used as arguments in template instantiations the best I was able to use was to create a small library, ( just showing relevant parts )
// library header
class MyFunctionBase<R,T>
{
public:
virtual ~MyFunctionBase();
virtual R operator()(const T &) const=0;
};
class MyFunction<R,T>
{
MyFunctionBase<R,T> *b;
public:
~MyFunction()
{
delete b;
}
virtual R operator()(const T &) const
{
return (*b)(T);
}
};
// source file
....
class func: public MyFunctionBase ...
std::stl_alg(.... MyFunction(new funct));
This has always seemed unwieldly to me. I guess to the people on the ISO committee believe so too and added a lambda to C++.
In the meantime how have compilers addressed this problem? ( Especially Windows compilers. )
A correction which might clarify a bit.
Changelog:
Nov 2
replaced to clarify
Since the C++ standard forbids local classes as function objects

The standard way is a "functor" - basically, a struct that supplies an operator()
For example:
struct MyMinFunctor {
bool operator()(const int& a, const int& b) { return a > b; }
};
vector<int> v;
sort(v.begin(), v.end(), MyMinFunctor());
Because it is a struct/class, you can subclass any of the things like 'binary_operator' as well as maintain state for more advanced functors.

Boost.Bind, Boost.Function, and Boost.Lambda are your friends.

With C++0x you can use lambda's (as you mentioned):
for_each(container.begin(), container.end(),
[](auto item) {
// do something with item
}
);
This is already available in MS Visual C++ 2010 (currently in Community Tech Preview) and GCC 4.3.x (with the -std=c++0x compiler flag). However, without lambda's, you just need to provide a type that:
Is default constructible
Is copy constructible
Defines a function operator overload
There are some algorithms that require binary function objects while there are some that require unary function objects. Refer your vendor's STL documentation to find out exactly which algorithms require binary function objects and which ones require unary function objects.
One thing you might also want to look into are the newer implementations of bind and function in TR1 (based on Boost.Bind and Boost.Function).

Related

C++ Techniques: Type-Erasure vs. Pure Polymorphism

What are the advantages/disadvantages of the two techniques in comparison ? And more importantly: Why and when should one be used over the other ? Is it just a matter of personal taste/preference ?
To the best of my abilities, I haven't found another post that explicitly addresses my question. Among many questions regarding the actual use of polymorphism and/or type-erasure, the following seems to be closest, or so it seemed, but it doesn't really address my question either:
C++ -& CRTP . Type erasure vs polymorphism
Please, note that I very well understand both techniques. To this end, I provide a simple, self-contained, working example below, which I'm happy to remove, if it is felt unnecessary. However, the example should clarify what the two techniques mean with respect to my question. I'm not interested in discussing nomenclatures. Also, I know the difference between compile- and run-time polymorphism, though I wouldn't consider this to be relevant to the question. Note that my interest is less in performance-differences, if there are any. However, if there was a striking argument for one or the other based on performance, I'd be curious to read it. In particular, I would like to hear about concrete examples (no code) that would really only work with one of the two approaches.
Looking at the example below, one primary difference is the memory-management, which for polymorphism remains on the user-side, and for type-erasure is neatly tucked away requiring some reference-counting (or boost). Having said that, depending on the usage scenarios, the situation might be improved for the polymorphism-example by using smart-pointers with the vector (?), though for arbitrary cases this may very well turn out to be impractical (?). Another aspect, potentially in favor of type-erasure, may be the independence of a common interface, but why exactly would that be an advantage (?).
The code as given below was tested (compiled & run) with MS VisualStudio 2008 by simply putting all of the following code-blocks into a single source-file. It should also compile with gcc on Linux, or so I hope/assume, because I see no reason why not (?) :-) I have split/divided the code here for clarity.
These header-files should be sufficient, right (?).
#include <iostream>
#include <vector>
#include <string>
Simple reference-counting to avoid boost (or other) dependencies. This class is only used in the type-erasure-example below.
class RefCount
{
RefCount( const RefCount& );
RefCount& operator= ( const RefCount& );
int m_refCount;
public:
RefCount() : m_refCount(1) {}
void Increment() { ++m_refCount; }
int Decrement() { return --m_refCount; }
};
This is the simple type-erasure example/illustration. It was copied and modified in part from the following article. Mainly I have tried to make it as clear and straightforward as possible.
http://www.cplusplus.com/articles/oz18T05o/
class Object {
struct ObjectInterface {
virtual ~ObjectInterface() {}
virtual std::string GetSomeText() const = 0;
};
template< typename T > struct ObjectModel : ObjectInterface {
ObjectModel( const T& t ) : m_object( t ) {}
virtual ~ObjectModel() {}
virtual std::string GetSomeText() const { return m_object.GetSomeText(); }
T m_object;
};
void DecrementRefCount() {
if( mp_refCount->Decrement()==0 ) {
delete mp_refCount; delete mp_objectInterface;
mp_refCount = NULL; mp_objectInterface = NULL;
}
}
Object& operator= ( const Object& );
ObjectInterface *mp_objectInterface;
RefCount *mp_refCount;
public:
template< typename T > Object( const T& obj )
: mp_objectInterface( new ObjectModel<T>( obj ) ), mp_refCount( new RefCount ) {}
~Object() { DecrementRefCount(); }
std::string GetSomeText() const { return mp_objectInterface->GetSomeText(); }
Object( const Object &obj ) {
obj.mp_refCount->Increment(); mp_refCount = obj.mp_refCount;
mp_objectInterface = obj.mp_objectInterface;
}
};
struct MyObject1 { std::string GetSomeText() const { return "MyObject1"; } };
struct MyObject2 { std::string GetSomeText() const { return "MyObject2"; } };
void UseTypeErasure() {
typedef std::vector<Object> ObjVect;
typedef ObjVect::const_iterator ObjVectIter;
ObjVect objVect;
objVect.push_back( Object( MyObject1() ) );
objVect.push_back( Object( MyObject2() ) );
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
std::cout << iter->GetSomeText();
}
As far as I'm concerned, this seems to achieve pretty much the same using polymorphism, or maybe not (?).
struct ObjectInterface {
virtual ~ObjectInterface() {}
virtual std::string GetSomeText() const = 0;
};
struct MyObject3 : public ObjectInterface {
std::string GetSomeText() const { return "MyObject3"; } };
struct MyObject4 : public ObjectInterface {
std::string GetSomeText() const { return "MyObject4"; } };
void UsePolymorphism() {
typedef std::vector<ObjectInterface*> ObjVect;
typedef ObjVect::const_iterator ObjVectIter;
ObjVect objVect;
objVect.push_back( new MyObject3 );
objVect.push_back( new MyObject4 );
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
std::cout << (*iter)->GetSomeText();
for( ObjVectIter iter = objVect.begin(); iter != objVect.end(); ++iter )
delete *iter;
}
And finally for testing all of the above together.
int main() {
UseTypeErasure();
UsePolymorphism();
return(0);
}

C++ style virtual method based polymorphism:
You have to use classes to hold your data.
Every class has to be built with your particular kind of polymorphism in mind.
Every class has a common binary-level dependency, which restricts how the
compiler creates the instance of each class.
The data you are abstracting must explicitly describe an interface that describes
your needs.
C++ style template based type erasure (with virtual method based polymorphism doing the erasure):
You have to use template to talk about your data.
Each chunk of data you are working on may be completely unrelated to other options.
The type erasure work is done within public header files, which bloats compile time.
Each type erased has its own template instantiated, which can bloat binary size.
The data you are abstracting need not be written as being directly dependent on your needs.
Now, which is better? Well, that depends if the above things are good or bad in your particular situation.
As an explicit example, std::function<...> uses type erasure which allows it to take function pointers, function references, output of a whole pile of template-based functions that generate types at compile time, myraids of functors which have an operator(), and lambdas. All of these types are unrelated to one another. And because they aren't tied to having a virtual operator(), when they are used outside of the std::function context the abstraction they represent can be compiled away. You couldn't do this without type erasure, and you probably wouldn't want to.
On the other hand, just because a class has a method called DoFoo, doesn't mean that they all do the same thing. With polymorphism, it isn't just any DoFoo you are calling, but the DoFoo from a particular interface.
As for your sample code... your GetSomeText should be virtual ... override in the polymorphism case.
There is no need to reference count just because you are using type erasure. There is no need not to use reference counting just because you are using polymorphsm.
Your Object could wrap T*s like how you stored vectors of raw pointers in the other case, with manual destruction of their contents (equivalent to having to call delete). Your Object could wrap a std::shared_ptr<T>, and in the other case you could have vector of std::shared_ptr<T>. Your Object could contain a std::unique_ptr<T>, equivalent to having a vector of std::unique_ptr<T> in the other case. Your Object's ObjectModel could extract copy constructors and assignment operators from the T and expose them to Object, allowing full-on value semantics for your Object, which corresponds to the a vector of T in your polymorphism case.

Here's one view: The question seems to ask how one should choose between late binding ("runtime polymorphism") and early binding ("compile-time polymorphism").
As KerrekSB points out in his comments, there are some things you can do with late binding that it just isn't realistic to do with early binding. Many uses of the Strategy pattern (decoding network I/O) or the Abstract Factory pattern (runtime-selected class factories) fall into this category.
If both approaches are viable, then choosing is a matter of the trade offs involved. In C++ applications, the main tradeoffs I see between early and late binding are implementation maintainability, binary size, and performance.
There are at least some people who feel that C++ templates in any shape or form are impossible to comprehend. Or possibly have some other, less dramatic reservation with templates. C++ templates have many little gotchas ("when do I need to use the 'typename' and 'template' keywords?"), and non-obvious tricks (SFINAE comes to mind).
Another tradeoff is optimization. When you bind early, you give the compiler more information about your program, and so it can (potentially) do a better job optimizing. When you bind late, the compiler (probably) doesn't know ahead of time as much information -- some of that information may be in other compilation units, and so the optimizer can't do as much.
Another tradeoff is program size. In C++ at least, using "compile-time polymorphism" sometimes balloons binary size, as the compiler creates, optimizes, and emits different code for each used specialization. In contrast, when binding late, there's only one code path.
It's interesting to compare the same tradeoff being made in a different context. Take web applications, where one uses (some type of) polymorphism to deal with differences between browsers, and possibly for internationalization (i18n)/localization. Now, a hand-written JavaScript web application would likely use what amounts to late binding here, by having methods which detect capabilities at runtime to figure out what to do. Libraries like jQuery take this tack.
Another approach is to write different code for each possible browser/i18n possibility. While this sounds absurd, it is far from unheard of. The Google Web Toolkit uses this approach. GWT has its "deferred binding" mechanism, used to specialize the compiler's output to different browsers and different localizations. GWT's "deferred binding" mechanism uses early binding: The GWT Java-to-JavaScript compiler figures out all possible ways the polymorphism might be needed, and spits out an entirely different "binary" for each.
The tradeoffs are similar. Wrapping your head around how you extend GWT using deferred binding can be a headache and a half; Having knowledge at compile time allows GWT's compiler to optimize each specialization separately, possibly yielding better performance, and smaller size for each specialization; The whole of a GWT application can end up being many times the size of a comparable jQuery application, due to all of the precompiled specializations.

One benefit to runtime generics that no-one here has mentioned (?) is the possibility for code that is generated and injected into a running application, to use the same List, Hashmap / Dictionary etc. that everything else in that application is already using. Why you'd want to do that, is another question.

What is a C++ delegate?

What is the general idea of a delegate in C++? What are they, how are they used and what are they used for?
I'd like to first learn about them in a 'black box' way, but a bit of information on the guts of these things would be great too.
This is not C++ at its purest or cleanest, but I notice that the codebase where I work has them in abundance. I'm hoping to understand them enough, so I can just use them and not have to delve into the horrible nested template awfulness.
These two The Code Project articles explain what I mean but not particularly succinctly:
Member Function Pointers and the Fastest Possible C++ Delegates
The Impossibly Fast C++ Delegates

You have an incredible number of choices to achieve delegates in C++. Here are the ones that came to my mind.
Option 1 : functors:
A function object may be created by implementing operator()
struct Functor
{
// Normal class/struct members
int operator()(double d) // Arbitrary return types and parameter list
{
return (int) d + 1;
}
};
// Use:
Functor f;
int i = f(3.14);
Option 2: lambda expressions (C++11 only)
// Syntax is roughly: [capture](parameter list) -> return type {block}
// Some shortcuts exist
auto func = [](int i) -> double { return 2*i/1.15; };
double d = func(1);
Option 3: function pointers
int f(double d) { ... }
typedef int (*MyFuncT) (double d);
MyFuncT fp = &f;
int a = fp(3.14);
Option 4: pointer to member functions (fastest solution)
See Fast C++ Delegate (on The Code Project).
struct DelegateList
{
int f1(double d) { }
int f2(double d) { }
};
typedef int (DelegateList::* DelegateType)(double d);
DelegateType d = &DelegateList::f1;
DelegateList list;
int a = (list.*d)(3.14);
Option 5: std::function
(or boost::function if your standard library doesn't support it). It is slower, but it is the most flexible.
#include <functional>
std::function<int(double)> f = [can be set to about anything in this answer]
// Usually more useful as a parameter to another functions
Option 6: binding (using std::bind)
Allows setting some parameters in advance, convenient to call a member function for instance.
struct MyClass
{
int DoStuff(double d); // actually a DoStuff(MyClass* this, double d)
};
std::function<int(double d)> f = std::bind(&MyClass::DoStuff, this, std::placeholders::_1);
// auto f = std::bind(...); in C++11
Option 7: templates
Accept anything as long as it matches the argument list.
template <class FunctionT>
int DoSomething(FunctionT func)
{
return func(3.14);
}

A delegate is a class that wraps a pointer or reference to an object instance, a member method of that object's class to be called on that object instance, and provides a method to trigger that call.
Here's an example:
template <class T>
class CCallback
{
public:
typedef void (T::*fn)( int anArg );
CCallback(T& trg, fn op)
: m_rTarget(trg)
, m_Operation(op)
{
}
void Execute( int in )
{
(m_rTarget.*m_Operation)( in );
}
private:
CCallback();
CCallback( const CCallback& );
T& m_rTarget;
fn m_Operation;
};
class A
{
public:
virtual void Fn( int i )
{
}
};
int main( int /*argc*/, char * /*argv*/ )
{
A a;
CCallback<A> cbk( a, &A::Fn );
cbk.Execute( 3 );
}

The need for C++ delegate implementations are a long lasting embarassment to the C++ community.
Every C++ programmer would love to have them, so they eventually use them despite the facts that:
std::function() uses heap operations (and is out of reach for serious embedded programming).
All other implementations make concessions towards either portability or standard conformity to larger or lesser degrees (please verify by inspecting the various delegate implementations here and on codeproject). I have yet to see an implementation which does not use wild reinterpret_casts, Nested class "prototypes" which hopefully produce function pointers of the same size as the one passed in by the user, compiler tricks like first forward declare, then typedef then declare again, this time inheriting from another class or similar shady techniques. While it is a great accomplishment for the implementers who built that, it is still a sad testimoney on how C++ evolves.
Only rarely is it pointed out, that now over 3 C++ standard revisions, delegates were not properly addressed. (Or the lack of language features which allow for straightforward delegate implementations.)
With the way C++11 lambda functions are defined by the standard (each lambda has anonymous, different type), the situation has only improved in some use cases. But for the use case of using delegates in (DLL) library APIs, lambdas alone are still not usable. The common technique here, is to first pack the lambda into a std::function and then pass it across the API.

Very simply, a delegate provides functionality for how a function pointer SHOULD work. There are many limitations of function pointers in C++. A delegate uses some behind-the-scenes template nastyness to create a template-class function-pointer-type-thing that works in the way you might want it to.
ie - you can set them to point at a given function and you can pass them around and call them whenever and wherever you like.
There are some very good examples here:
http://www.codeproject.com/Articles/7150/Member-Function-Pointers-and-the-Fastest-Possible
http://www.codeproject.com/Articles/11015/The-Impossibly-Fast-C-Delegates
http://www.codeproject.com/Articles/13287/Fast-C-Delegate

An option for delegates in C++ that is not otherwise mentioned here is to do it C style using a function ptr and a context argument. This is probably the same pattern that many asking this question are trying to avoid. But, the pattern is portable, efficient, and is usable in embedded and kernel code.
class SomeClass
{
in someMember;
int SomeFunc( int);
static void EventFunc( void* this__, int a, int b, int c)
{
SomeClass* this_ = static_cast< SomeClass*>( this__);
this_->SomeFunc( a );
this_->someMember = b + c;
}
};
void ScheduleEvent( void (*delegateFunc)( void*, int, int, int), void* delegateContext);
...
SomeClass* someObject = new SomeObject();
...
ScheduleEvent( SomeClass::EventFunc, someObject);
...

Windows Runtime equivalent of a function object in standard C++. One can use the whole function as a parameter (actually that is a function pointer). It is mostly used in conjunction with events. The delegate represents a contract that event handlers much fulfill. It facilitate how a function pointer can work for.

How do boost::variant and boost::any work?

How do variant and any from the boost library work internally? In a project I am working on, I currently use a tagged union. I want to use something else, because unions in C++ don't let you use objects with constructors, destructors or overloaded assignment operators.
I queried the size of any and variant, and did some experiments with them. In my platform, variant takes the size of its longest possible type plus 8 bytes: I think it my just be 8 bytes o type information and the rest being the stored value. On the other hand, any just takes 8 bytes. Since i'm on a 64-bit platform, I guess any just holds a pointer.
How does Any know what type it holds? How does Variant achieve what it does through templates? I would like to know more about these classes before using them.

If you read the boost::any documentation they provide the source for the idea: http://www.two-sdg.demon.co.uk/curbralan/papers/ValuedConversions.pdf
It's basic information hiding, an essential C++ skill to have. Learn it!
Since the highest voted answer here is totally incorrect, and I have my doubts that people will actually go look at the source to verify that fact, here's a basic implementation of an any like interface that will wrap any type with an f() function and allow it to be called:
struct f_any
{
f_any() : ptr() {}
~f_any() { delete ptr; }
bool valid() const { return ptr != 0; }
void f() { assert(ptr); ptr->f(); }
struct placeholder
{
virtual ~placeholder() {}
virtual void f() const = 0;
};
template < typename T >
struct impl : placeholder
{
impl(T const& t) : val(t) {}
void f() const { val.f(); }
T val;
};
// ptr can now point to the entire family of
// struct types generated from impl<T>
placeholder * ptr;
template < typename T >
f_any(T const& t) : ptr(new impl<T>(t)) {}
// assignment, etc...
};
boost::any does the same basic thing except that f() actually returns typeinfo const& and provides other information access to the any_cast function to work.

The key difference between boost::any and boost::variant is that any can store any type, while variant can store only one of a set of enumerated types. The any type stores a void* pointer to the object, as well as a typeinfo object to remember the underlying type and enforce some degree of type safety. In boost::variant, it computes the maximum sized object, and uses "placement new" to allocate the object within this buffer. It also stores the type or the type index.
Note that if you have Boost installed, you should be able to see the source files in "any.hpp" and "variant.hpp". Just search for "include/boost/variant.hpp" and "include/boost/any.hpp" in "/usr", "/usr/local", and "/opt/local" until you find the installed headers, and you can take a look.
Edit
As has been pointed out in the comments below, there was a slight inaccuracy in my description of boost::any. While it can be implemented using void* (and a templated destroy callback to properly delete the pointer), the actualy implementation uses any<T>::placeholder*, with any<T>::holder<T> as subclasses of any<T>::placeholder for unifying the type.

boost::any just snapshots the typeinfo while the templated constructor runs: it has a pointer to a non-templated base class that provides access to the typeinfo, and the constructor derived a type-specific class satisfying that interface. The same technique can actually be used to capture other common capabilities of a set of types (e.g. streaming, common operators, specific functions), though boost doesn't offer control of this.
boost::variant is conceptually similar to what you've done before, but by not literally using a union and instead taking a manual approach to placement construction/destruction of objects in its buffer (while handling alignment issues explicitly) it works around the restrictions that C++ has re complex types in actual unions.

What do we need unary_function and binary_function for?

I read the tutorials about the binary and unary functions. I understood the structure of them, but I couldn't imagine in which case I need these functions. Can you give an example for usage of them.
http://www.cplusplus.com/reference/std/functional/unary_function/
http://www.cplusplus.com/reference/std/functional/binary_function/

These aren't functions, these are classes (structs, actually, but doesn't matter). When you define your own binary functions to use with STL algorithms, you derive them from these classes in order to automatically get all the typedefs.
E.g.
struct SomeFancyUnaryFunction: public std::unary_function<Arg_t, Result_t>
{
Result_t operator ()(Arg_t const &)
{
...
}
};
now you don't need to manually provide the typedefs for argument_type, result_type etc. These structs, just like the iterator struct are there just for our convenience, in order to reuse the typedefs needed for algorithms.
Update for C++11:
As of C++11, the new std::bind does not really need any typedefs, so there are, in a way, obsolete.

Basically, they provide all the typedefs necessary to allow composition of higher-order functions from unary and binary function objects using function adaptors. For example, this allows using a binary functor where a unary is needed, binding one of the arguments to a literal value:
std::find_if( begin, end, std::bind1st(greater<int>(),42) );
std::bind1st relies on the functor passed to it to provide those types.
AFAIK the new std::bind doesn't need them, so it seems in new code you can use std::bindand do away with them.

There's an explanation on the sgi STL documentation of Function Objects. In summary, unary_function and binary_function are used to make functors adaptable. This allows them to be used with function object adaptors such as unary_negate.

What are they?
std::unary_function and std::binary_function are base structs for creation adaptable function objects. The word adaptable means that they provide necessary typedefs for being used in conjunction with standard function adaptors like std::not1, std::not2, std::bind1st, std::bind2nd.
When I need to use them?
You may use them every time you need to use your custom function object together with standard function adaptor.
Do you have an example?
Lets consider some examples (I know, they are artificial. From the other side I hope, that they are rather descriptive).
Example 1.
Suppose you want to print all strings in a vector with their lengths not less than a particular threshold and print them to std::cout.
One might use the next function object:
class LengthThreshold
{
public:
LengthThreshold(std::size_t threshold) : threshold(threshold) {}
bool operator()(const std::string& instance) const
{
return (instance.size() < threshold);
}
private:
const std::size_t threshold;
};
Now the task is pretty simple and can be performed by std::remove_copy_if algorithm:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
LengthThreshold(threshold)
);
What if you want to use the same function object to print all the strings with their lengths strictly less than the threshold?
The obvious solution we can come up with is the usage of std::not1 function adaptor:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
std::not1(LengthThreshold(threshold))
);
In fact, the code above won't compile because our LengthThreshold is not adaptable and has no typedefs which are necessary for std::not1.
To make it adaptable we need to inherit from std::unary_function:
class LengthThreshold : public std::unary_function<std::string, bool>
{
// Function object's body remains the same
}
Now our first example works like a charm.
Example 2.
Lets change our previous example. Suppose we don't want to store a threshold inside the function object. In such case we may change the function object from unary predicate to binary predicate:
class LengthThreshold : public std::binary_function<std::string, std::size_t, bool>
{
public:
bool operator()(const std::string& lhs, std::size_t threshold) const
{
return lhs.size() < threshold;
}
};
And make use of std::bind2nd function adaptor:
// std::size_t threshold is defined somewhere
std::remove_copy_if(some_strings.begin(), some_strings.end(),
std::ostream_iterator<std::string>(std::cout, "\n"),
std::bind2nd(LengthThreshold(), threshold)
);
What about C++11 and higher?
All the examples above intentionally use only C++ 03.
The reason is that std::unary_function and std::binary_function are deprecated since C++ 11 and completely removed from C++ 17.
It happened with the advent of more generalized and flexible functions like std::bind which make inheriting from std::unary_function and std::binary_function superfluous.

Optional function parameters: Use default arguments (NULL) or overload the function?

I have a function that processes a given vector, but may also create such a vector itself if it is not given.
I see two design choices for such a case, where a function parameter is optional:
Make it a pointer and make it NULL by default:
void foo(int i, std::vector<int>* optional = NULL) {
if(optional == NULL){
optional = new std::vector<int>();
// fill vector with data
}
// process vector
}
Or have two functions with an overloaded name, one of which leaves out the argument:
void foo(int i) {
std::vector<int> vec;
// fill vec with data
foo(i, vec);
}
void foo(int i, const std::vector<int>& optional) {
// process vector
}
Are there reasons to prefer one solution over the other?
I slightly prefer the second one because I can make the vector a const reference, since it is, when provided, only read, not written. Also, the interface looks cleaner (isn't NULL just a hack?). And the performance difference resulting from the indirect function call is probably optimized away.
Yet, I often see the first solution in code. Are there compelling reasons to prefer it, apart from programmer laziness?

I would not use either approach.
In this context, the purpose of foo() seems to be to process a vector. That is, foo()'s job is to process the vector.
But in the second version of foo(), it is implicitly given a second job: to create the vector. The semantics between foo() version 1 and foo() version 2 are not the same.
Instead of doing this, I would consider having just one foo() function to process a vector, and another function which creates the vector, if you need such a thing.
For example:
void foo(int i, const std::vector<int>& optional) {
// process vector
}
std::vector<int>* makeVector() {
return new std::vector<int>;
}
Obviously these functions are trivial, and if all makeVector() needs to do to get it's job done is literally just call new, then there may be no point in having the makeVector() function. But I'm sure that in your actual situation these functions do much more than what is being shown here, and my code above illustrates a fundamental approach to semantic design: give one function one job to do.
The design I have above for the foo() function also illustrates another fundamental approach that I personally use in my code when it comes to designing interfaces -- which includes function signatures, classes, etc. That is this: I believe that a good interface is 1) easy and intuitive to use correctly, and 2) difficult or impossible to use incorrectly . In the case of the foo() function we are implictly saying that, with my design, the vector is required to already exist and be 'ready'. By designing foo() to take a reference instead of a pointer, it is both intuitive that the caller must already have a vector, and they are going to have a hard time passing in something that isn't a ready-to-go vector.

I would definitely favour the 2nd approach of overloaded methods.
The first approach (optional parameters) blurs the definition of the method as it no longer has a single well-defined purpose. This in turn increases the complexity of the code, making it more difficult for someone not familiar with it to understand it.
With the second approach (overloaded methods), each method has a clear purpose. Each method is well-structured and cohesive. Some additional notes:
If there's code which needs to be duplicated into both methods, this can be extracted out into a separate method and each overloaded method could call this external method.
I would go a step further and name each method differently to indicate the differences between the methods. This will make the code more self-documenting.

While I do understand the complaints of many people regarding default parameters and overloads, there seems to be a lack of understanding to the benefits that these features provide.
Default Parameter Values:
First I want to point out that upon initial design of a project, there should be little to no use for defaults if well designed. However, where defaults' greatest assets comes into play is with existing projects and well established APIs. I work on projects that consist of millions of existing lines of code and do not have the luxury to re-code them all. So when you wish to add a new feature which requires an extra parameter; a default is needed for the new parameter. Otherwise you will break everyone that uses your project. Which would be fine with me personally, but I doubt your company or users of your product/API would appreciate having to re-code their projects on every update. Simply, Defaults are great for backwards compatibility! This is usually the reason you will see defaults in big APIs or existing projects.
Function Overrides:
The benefit of function overrides is that they allow for the sharing of a functionality concept, but with with different options/parameters. However, many times I see function overrides lazily used to provide starkly different functionality, with just slightly different parameters. In this case they should each have separately named functions, pertaining to their specific functionality (As with the OP's example).
These, features of c/c++ are good and work well when used properly. Which can be said of most any programming feature. It is when they are abused/misused that they cause problems.
Disclaimer:
I know that this question is a few years old, but since these answers came up in my search results today (2012), I felt this needed further addressing for future readers.

I agree, I would use two functions. Basically, you have two different use cases, so it makes sense to have two different implementations.
I find that the more C++ code I write, the fewer parameter defaults I have - I wouldn't really shed any tears if the feature was deprecated, though I would have to re-write a shed load of old code!

A references can't be NULL in C++, a really good solution would be to use Nullable template.
This would let you do things is ref.isNull()
Here you can use this:
template<class T>
class Nullable {
public:
Nullable() {
m_set = false;
}
explicit
Nullable(T value) {
m_value = value;
m_set = true;
}
Nullable(const Nullable &src) {
m_set = src.m_set;
if(m_set)
m_value = src.m_value;
}
Nullable & operator =(const Nullable &RHS) {
m_set = RHS.m_set;
if(m_set)
m_value = RHS.m_value;
return *this;
}
bool operator ==(const Nullable &RHS) const {
if(!m_set && !RHS.m_set)
return true;
if(m_set != RHS.m_set)
return false;
return m_value == RHS.m_value;
}
bool operator !=(const Nullable &RHS) const {
return !operator==(RHS);
}
bool GetSet() const {
return m_set;
}
const T &GetValue() const {
return m_value;
}
T GetValueDefault(const T &defaultValue) const {
if(m_set)
return m_value;
return defaultValue;
}
void SetValue(const T &value) {
m_value = value;
m_set = true;
}
void Clear()
{
m_set = false;
}
private:
T m_value;
bool m_set;
};
Now you can have
void foo(int i, Nullable<AnyClass> &optional = Nullable<AnyClass>()) {
//you can do
if(optional.isNull()) {
}
}

I usually avoid the first case. Note that those two functions are different in what they do. One of them fills a vector with some data. The other doesn't (just accept the data from the caller). I tend to name differently functions that actually do different things. In fact, even as you write them, they are two functions:
foo_default (or just foo)
foo_with_values
At least I find this distinction cleaner in the long therm, and for the occasional library/functions user.

I, too, prefer the second one. While there are not much difference between the two, you are basically using the functionality of the primary method in the foo(int i) overload and the primary overload would work perfectly without caring about existence of lack of the other one, so there is more separation of concerns in the overload version.

In C++ you should avoid allowing valid NULL parameters whenever possible. The reason is that it substantially reduces callsite documentation. I know this sounds extreme but I work with APIs that take upwards of 10-20 parameters, half of which can validly be NULL. The resulting code is almost unreadable
SomeFunction(NULL, pName, NULL, pDestination);
If you were to switch it to force const references the code is simply forced to be more readable.
SomeFunction(
Location::Hidden(),
pName,
SomeOtherValue::Empty(),
pDestination);

I'm squarely in the "overload" camp. Others have added specifics about your actual code example but I wanted to add what I feel are the benefits of using overloads versus defaults for the general case.
Any parameter can be "defaulted"
No gotcha if an overriding function uses a different value for its default.
It's not necessary to add "hacky" constructors to existing types in order to allow them to have default.
Output parameters can be defaulted without needing to use pointers or hacky global objects.
To put some code examples on each:
Any parameter can be defaulted:
class A {}; class B {}; class C {};
void foo (A const &, B const &, C const &);
inline void foo (A const & a, C const & c)
{
foo (a, B (), c); // 'B' defaulted
}
No danger of overriding functions having different values for the default:
class A {
public:
virtual void foo (int i = 0);
};
class B : public A {
public:
virtual void foo (int i = 100);
};
void bar (A & a)
{
a.foo (); // Always uses '0', no matter of dynamic type of 'a'
}
It's not necessary to add "hacky" constructors to existing types in order to allow them to be defaulted:
struct POD {
int i;
int j;
};
void foo (POD p); // Adding default (other than {0, 0})
// would require constructor to be added
inline void foo ()
{
POD p = { 1, 2 };
foo (p);
}
Output parameters can be defaulted without needing to use pointers or hacky global objects:
void foo (int i, int & j); // Default requires global "dummy"
// or 'j' should be pointer.
inline void foo (int i)
{
int j;
foo (i, j);
}
The only exception to the rule re overloading versus defaults is for constructors where it's currently not possible for a constructor to forward to another. (I believe C++ 0x will solve that though).

I would favour a third option:
Separate into two functions, but do not overload.
Overloads, by nature, are less usable. They require the user to become aware of two options and figure out what the difference between them is, and if they're so inclined, to also check the documentation or the code to ensure which is which.
I would have one function that takes the parameter,
and one that is called "createVectorAndFoo" or something like that (obviously naming becomes easier with real problems).
While this violates the "two responsibilities for function" rule (and gives it a long name), I believe this is preferable when your function really does do two things (create vector and foo it).

Generally I agree with others' suggestion to use a two-function approach. However, if the vector created when the 1-parameter form is used is always the same, you could simplify things by instead making it static and using a default const& parameter instead:
// Either at global scope, or (better) inside a class
static vector<int> default_vector = populate_default_vector();
void foo(int i, std::vector<int> const& optional = default_vector) {
...
}

The first way is poorer because you cannot tell if you accidentally passed in NULL or if it was done on purpose... if it was an accident then you have likely caused a bug.
With the second one you can test (assert, whatever) for NULL and handle it appropriately.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js