Scenario
I have a class which I want to be able to compare for equality. The class is large (it contains a bitmap image) and I will be comparing it multiple times, so for efficiency I'm hashing the data and only doing a full equality check if the hashes match. Furthermore, I will only be comparing a small subset of my objects, so I'm only calculating the hash the first time an equality check is done, then using the stored value for subsequent calls.
Example
class Foo
{
public:
    Foo(int data) : fooData(data), notHashed(true) {}
private:
    void calculateHash()
    {
        hash = 0; // Replace with hashing algorithm
        notHashed = false;
    }
    int getHash()
    {
        if (notHashed) calculateHash();
        return hash;
    }
    inline friend bool operator==(Foo& lhs, Foo& rhs)
    {
        if (lhs.getHash() == rhs.getHash())
        {
            return (lhs.fooData == rhs.fooData);
        }
        else return false;
    }
    int fooData;
    int hash;
    bool notHashed;
};
Background
According to the guidance on this answer, the canonical form of the equality operator is:
inline bool operator==(const X& lhs, const X& rhs);
Furthermore, the following general advice for operator overloading is given:
Always stick to the operator’s well-known semantics.
Questions
My function must be able to mutate its operands in order to perform the hashing, so I have had to make them non-const. Are there any potential negative consequences of this (examples might be standard library functions or STL containers which will expect operator== to have const operands)?
Should a mutating operator== function be considered contrary to its well-known semantics, if the mutation doesn't have any observable effects (because there's no way for the user to see the contents of the hash)?
If the answer to either of the above is "yes", then what would be a more appropriate approach?
It seems like a perfectly valid usecase for a mutable member. You can (and should) still make your operator== take the parameters by const reference and give the class a mutable member for the hash value.
Your class would then have a getter for the hash value that is itself marked as a const method and that lazily evaluates the hash value when called for the first time. It's actually a good example of why mutable was added to the language: it does not change the object from the user's perspective, it's only an implementation detail for caching the value of a costly operation internally.
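A minimal sketch of that suggestion applied to the question's Foo, assuming std::hash<int> as a stand-in for the real (expensive) image hash. As written, the lazy initialisation is not thread-safe: two threads comparing the same object for the first time could race on the mutable members.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

class Foo {
public:
    explicit Foo(int data) : fooData(data) {}

    // Logically const: callers cannot observe the cache, so this is a const method.
    std::size_t getHash() const {
        if (!hashed) {
            hash = std::hash<int>{}(fooData);  // stand-in for the real hashing algorithm
            hashed = true;
        }
        return hash;
    }

    // Canonical signature: both operands taken by const reference.
    friend bool operator==(const Foo& lhs, const Foo& rhs) {
        // Cheap hash check first; full data comparison only when the hashes match.
        return lhs.getHash() == rhs.getHash() && lhs.fooData == rhs.fooData;
    }

private:
    int fooData;
    mutable std::size_t hash = 0;  // cached value, an implementation detail
    mutable bool hashed = false;   // whether the cache has been filled yet
};

inline void fooExample() {
    const Foo a(1), b(1), c(2);  // const objects can still be compared
    assert(a == b);
    assert(!(a == c));
}
```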
Use mutable for the data that you want to cache but which does not affect the public interface.
You know: “mutate” → mutable.
Then think in terms of logical constness: what guarantees the object offers to the code that uses it.
You should never modify the object on comparison. However, this function does not logically modify the object. Simple solution: make hash mutable, as computing the hash is a form of caching. See:
Does the 'mutable' keyword have any purpose other than allowing the variable to be modified by a const function?
Having side effects in the comparison function or operator is not recommended. It would be better if you could compute the hash as part of the initialization of the class. Another option is to have a manager class that is responsible for that. Note that even a seemingly innocent mutation will require locking in a multithreaded application.
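A minimal sketch of the first suggestion, computing the hash once in the constructor so that comparison needs no mutation at all. The Image class, its byte-vector pixel storage and the simple multiplicative hash are placeholders for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

class Image {
public:
    explicit Image(std::vector<unsigned char> pixels)
        : pixels_(std::move(pixels)), hash_(computeHash(pixels_)) {}

    // No mutation needed: the hash already exists, so operands can be const.
    friend bool operator==(const Image& lhs, const Image& rhs) {
        return lhs.hash_ == rhs.hash_ && lhs.pixels_ == rhs.pixels_;
    }
    friend bool operator!=(const Image& lhs, const Image& rhs) { return !(lhs == rhs); }

private:
    static std::size_t computeHash(const std::vector<unsigned char>& px) {
        std::size_t h = 0;
        for (unsigned char b : px) h = h * 31 + b;  // placeholder hash
        return h;
    }
    std::vector<unsigned char> pixels_;  // declared first, so initialised before hash_
    std::size_t hash_;
};

inline void imageExample() {
    std::vector<unsigned char> p{1, 2, 3}, q{3, 2, 1};
    assert(Image(p) == Image(p));
    assert(Image(p) != Image(q));
}
```

The trade-off is that every object pays for hashing, including the ones that are never compared, which is what the question was trying to avoid.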
I would also recommend avoiding the equality operator for classes whose data structure is not absolutely trivial. As a project progresses, a need for a comparison policy (extra arguments) very often appears, and the interface of the equality operator becomes insufficient. A compare method or functor added for this purpose does not have to follow the standard operator== interface regarding the immutability of its arguments.
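A sketch of such a comparison functor, using a hypothetical Point type whose comparison policy (a tolerance) is passed to the functor's constructor, something operator== has no way to accept.

```cpp
#include <cassert>
#include <cmath>

struct Point { double x; double y; };

// The comparison policy (a tolerance) is state of the functor,
// which operator==(const Point&, const Point&) cannot take.
class PointEquals {
public:
    explicit PointEquals(double tolerance = 0.0) : tol_(tolerance) {}
    bool operator()(const Point& a, const Point& b) const {
        return std::fabs(a.x - b.x) <= tol_ && std::fabs(a.y - b.y) <= tol_;
    }
private:
    double tol_;
};

inline void pointEqualsExample() {
    const Point p{1.0, 2.0}, q{1.05, 2.0};
    assert(!PointEquals()(p, q));    // exact comparison
    assert(PointEquals(0.1)(p, q));  // tolerant comparison
}
```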
If 1 and 2 seem like overkill for your case, you could use the C++ keyword mutable for the hash value member. This allows you to modify it even from a const member function or through a variable declared const.
Yes, introducing semantically unexpected side-effects is always a bad idea. Apart from the other reasons mentioned: always assume that any code you write will forever only be used by other people who haven't even heard of your name, and then consider your design choices from that angle.
When someone using your code library finds out his application is slow, and tries to optimize it, he will waste ages trying to find the performance leak if it is inside an == overload, since he doesn't expect it, from a semantic point of view, to do more than a simple object comparison. Hiding potentially costly operations within semantically cheap operations is a bad form of code obfuscation.
You can go the mutable route, but I'm not sure if that is needed. You can do a local cache when needed without having to use mutable. For example:
#include <iostream>
#include <string>
#include <functional> // for std::hash
using namespace std;
template<typename ReturnType>
class HashCompare{
public:
    ReturnType getHash()const{
        // Note: these function-local statics exist once per ReturnType
        // instantiation, not once per object, so every object sharing this
        // base reuses the first hash that was computed.
        static bool isHashed = false;
        static ReturnType cachedHashValue = ReturnType();
        if(!isHashed){
            isHashed = true;
            cachedHashValue = calculate();
        }
        return cachedHashValue;
    }
protected:
    // derived classes implement this; callers should go through getHash()
    virtual ReturnType calculate()const = 0;
};
class ReadOnlyString: public HashCompare<size_t>{
private:
    std::string s; // by value: a const std::string& would dangle when built from a const char*
public:
    ReadOnlyString(const char * s):s(s){}
    ReadOnlyString(const std::string& s): s(s){}
    bool equals(const ReadOnlyString& str)const{
        return getHash() == str.getHash();
    }
protected:
    size_t calculate()const{
        std::cout << "in hash calculate " << endl;
        std::hash<std::string> str_hash;
        return str_hash(this->s);
    }
};
bool operator==(const ReadOnlyString& lhs, const ReadOnlyString& rhs){ return lhs.equals(rhs); }
int main(){
    ReadOnlyString str = "test";
    ReadOnlyString str2 = "TEST";
    cout << (str == str2) << endl;
    cout << (str == str2) << endl;
}
Output:
in hash calculate
1
1
Can you give me a good reason why keeping isHashed as a member variable is necessary instead of making it local to where it's needed? Note that we can get away from 'static' entirely if we really want; all we have to do is make a dedicated structure/class.
Related
I'm working on learning C++ with Stroustrup's (Programming Principles & Practice Using C++) book. In an exercise we define a simple struct:
template<typename T>
struct S {
    explicit S(T v):val{v} { };
    T& get();
    const T& get() const;
    void set(T v);
    void read_val(T& v);
    T& operator=(const T& t); // deep copy assignment
private:
    T val;
};
We're then asked to define a const and a non-const member function to get val.
I was wondering: Is there any case where it makes sense to have non-const get function that returns val?
It seems much cleaner to me that we can't change the value in such situations indirectly. What might be use cases where you need a const and a non-const get function to return a member variable?
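For reference, a minimal sketch of how the exercise's two definitions could be written outside the class, with S reduced to the members needed here (set, read_val and operator= are omitted).

```cpp
#include <cassert>

template<typename T>
struct S {
    explicit S(T v) : val{v} {}
    T& get();
    const T& get() const;
private:
    T val;
};

// Chosen for non-const objects: the caller may modify val through the reference.
template<typename T>
T& S<T>::get() { return val; }

// Chosen for const objects: read-only access.
template<typename T>
const T& S<T>::get() const { return val; }

inline void sExample() {
    S<int> s(1);
    s.get() = 2;            // non-const overload
    const S<int> cs(3);
    assert(s.get() == 2);
    assert(cs.get() == 3);  // const overload
}
```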
Non-const getters?
Getters and setters are merely a convention. Instead of providing a getter and a setter, a sometimes-used idiom is to provide something along the lines of
struct foo {
    int val() const { return val_; }
    int& val() { return val_; }
private:
    int val_;
};
Such that, depending on the constness of the instance, you get a reference or a copy:
void bar(const foo& a, foo& b) {
    auto x = a.val(); // calls the const method returning an int
    b.val() = x;      // calls the non-const method returning an int&
}
Whether this is good style in general is a matter of opinion. There are cases where it causes confusion and other cases where this behaviour is just what you would expect (see below).
In any case, it is more important to design the interface of a class according to what the class is supposed to do and how you want to use it, rather than blindly following conventions about setters and getters (e.g. you should give the method a meaningful name that expresses what it does, not just in terms of "pretend to be encapsulated and now provide me access to all your internals via getters", which is what using getters everywhere actually means).
Concrete example
Consider that element access in containers is usually implemented like this. As a toy example:
struct my_array {
    int operator[](unsigned i) const { return data[i]; }
    int& operator[](unsigned i) { return data[i]; }
private:
    int data[10];
};
It is not the container's job to hide the elements from the user (even data could be public). You don't want different methods to access elements depending on whether you want to read or write the element, hence providing a const and a non-const overload makes perfect sense in this case.
non-const reference from get vs encapsulation
Maybe it's not that obvious, but it is somewhat controversial whether providing getters and setters supports encapsulation or the opposite. While in general this matter is to a large extent opinion-based, for getters that return non-const references it is not so much about opinions: they do break encapsulation. Consider
struct broken {
    void set(int x) {
        counter++;
        value = x;
    }
    int& get() { return value; }
    int get() const { return value; }
private:
    int counter = 0;
    int value = 0;
};
This class is broken, as the name suggests. Clients can simply grab a reference, and the class has no chance to count the number of times the value is modified (as set suggests). Once you return a non-const reference, there is little difference, as far as encapsulation is concerned, from making the member public. Hence this is used only in cases where such behaviour is natural (e.g. containers).
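For contrast, a minimal sketch of a version that keeps its invariant: there is no int& overload, so every modification has to go through set(). The modifications() accessor is a hypothetical addition used only to observe the counter.

```cpp
#include <cassert>

struct counted {
    void set(int x) {
        ++counter;  // every write passes through here
        value = x;
    }
    int get() const { return value; }              // read access only
    int modifications() const { return counter; }  // hypothetical observer
private:
    int counter = 0;
    int value = 0;
};

inline void countedExample() {
    counted c;
    c.set(1);
    c.set(2);
    assert(c.get() == 2);
    assert(c.modifications() == 2);
}
```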
PS
Note that your example returns a const T& rather than a value. This is reasonable for template code, where you don't know how expensive a copy is, while for an int you won't gain much by returning a const int& instead of an int. For the sake of clarity I used non-template examples, though for templated code you would probably rather return a const T&.
First let me rephrase your question:
Why have a non-const getter for a member, rather than just making the member public?
Several possible reasons:
1. Easy to instrument
Whoever said the non-const getter needs to be just:
T& get() { return val; }
? It could well be something like:
T& get() {
    if (check_for_something_bad()) {
        throw std::runtime_error{
            "Attempt to mutate val when bad things have happened"};
    }
    return val;
}
However, as @BenVoigt suggests, it is more appropriate to wait until the caller actually tries to mutate the value through the reference before spewing an error.
2. Cultural convention / "the boss said so"
Some organizations enforce coding standards. These coding standards are sometimes authored by people who are possibly overly defensive. So, you might see something like:
Unless your class is a "plain old data" type, no data members may be public. You may use getter methods for such non-public members as necessary.
and then, even if it makes sense for a specific class to just allow non-const access, it won't happen.
3. Maybe val just isn't there?
You've given an example in which val actually exists in an instance of the class. But actually, it doesn't have to! The get() method could return some sort of proxy object which, upon assignment, mutation etc., performs some computation (e.g. storing or retrieving data in a database, or flipping a bit, which itself is not addressable the way an object needs to be).
4. Allows changing class internals later without changing user code
Now, reading items 1 or 3 above, you might ask "but my struct S does have val!" or "but my get() doesn't do anything interesting!" Well, true, they don't; but you might want to change this behavior in the future. Without a get(), all of your class's users will need to change their code. With a get(), you only need to change the implementation of struct S.
Now, I don't advocate this kind of design approach, but some programmers do.
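To make item 3 concrete, a small sketch of a non-const get() that returns a proxy rather than a reference, since a single bit cannot be bound to a T&. The Bits and BitRef names are hypothetical; std::vector<bool>::reference is a standard-library instance of the same idea.

```cpp
#include <cassert>
#include <cstdint>

class Bits {
public:
    // Proxy standing in for "a reference to one bit".
    class BitRef {
    public:
        BitRef(std::uint32_t& word, unsigned pos)
            : word_(word), mask_(std::uint32_t{1} << pos) {}
        BitRef& operator=(bool b) {  // assignment sets or clears the bit
            if (b) word_ |= mask_; else word_ &= ~mask_;
            return *this;
        }
        operator bool() const { return (word_ & mask_) != 0; }  // reading via the proxy
    private:
        std::uint32_t& word_;
        std::uint32_t mask_;
    };

    BitRef get(unsigned pos) { return BitRef(word_, pos); }         // assumes pos < 32
    bool get(unsigned pos) const { return (word_ >> pos) & 1u; }   // const: plain value
    std::uint32_t raw() const { return word_; }
private:
    std::uint32_t word_ = 0;
};

inline void bitsExample() {
    Bits b;
    b.get(0) = true;  // writes through the proxy
    b.get(2) = true;
    assert(b.raw() == 5u);
    assert(b.get(2));  // reads through the proxy's conversion to bool
    b.get(0) = false;
    assert(b.raw() == 4u);
}
```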
get() is callable on non-const objects, which are allowed to be mutated, so you can do:
S<int> r(0);
r.get() = 1;
But if you make r const, as in const S<int> r(0), the line r.get() = 1 no longer compiles, and without a const overload you could not even call get() just to read the value. That's why you need a const version, const T& get() const, so that you can at least retrieve the value of a const object. Doing so allows you to do:
const S<int> r(0);
int val = r.get();
The const versions of member functions try to be consistent with the constness of the object the call is made on: if the object is immutable because it is const, and the member function returns a reference, it should reflect the constness of the caller by returning a const reference, thus preserving the immutability of the object.
It depends on the purpose of S. If it's some kind of thin wrapper, it might be appropriate to allow the user to access the underlying value directly.
One of the real-life examples is std::reference_wrapper.
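A short sketch of that wrapper's get() in use: std::reference_wrapper<T>::get() returns a T&, so writes go straight through to the wrapped object. The bump_all function is only an illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Adds one to every int the wrappers refer to.
inline void bump_all(const std::vector<std::reference_wrapper<int>>& refs) {
    for (const auto& r : refs) r.get() += 1;  // get() yields int&, even on a const wrapper
}

inline void refExample() {
    int a = 1, b = 2;
    std::vector<std::reference_wrapper<int>> refs{std::ref(a), std::ref(b)};
    bump_all(refs);
    assert(a == 2 && b == 3);  // the original variables changed
}
```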
No. If a getter simply returns a non-const reference to a member, like this:
private:
    Object m_member;
public:
    Object &getMember() {
        return m_member;
    }
Then m_member should be public instead, and the accessor is not needed. There is absolutely no point in making this member private and then creating an accessor that gives full access to it.
If you call getMember(), you can store the resulting reference in a pointer or reference, and afterwards you can do whatever you want with m_member; the enclosing class will know nothing about it. It's the same as if m_member had been public.
Note that if getMember() does some additional task (for example, instead of simply returning m_member, it lazily constructs it), then getMember() could be useful:
Object &getMember() {
    // here m_member is a pointer, e.g. Object *m_member = nullptr;
    if (!m_member) m_member = new Object;
    return *m_member;
}
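A variant of the lazy getter that avoids the raw new (and the delete it would otherwise require) by holding the member in a std::unique_ptr. Object, Holder and isConstructed() are placeholders for illustration; std::make_unique requires C++14.

```cpp
#include <cassert>
#include <memory>

struct Object { int payload = 42; };  // placeholder for the real member type

class Holder {
public:
    // Lazily constructs the member on first access; this extra work is what
    // justifies a getter over a public data member.
    Object& getMember() {
        if (!m_member) m_member = std::make_unique<Object>();
        return *m_member;
    }
    bool isConstructed() const { return m_member != nullptr; }
private:
    std::unique_ptr<Object> m_member;  // owns the object; no manual delete needed
};

inline void holderExample() {
    Holder h;
    assert(!h.isConstructed());
    h.getMember().payload = 7;  // first access constructs the Object
    assert(h.isConstructed());
    assert(h.getMember().payload == 7);
}
```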
First of all, I'm new here and English isn't my native language, so I apologize for any grammatical mistakes, but I find this community really nice, so I will try to ask my question as precisely as I can.
I want to add objects of my own class to an STL multiset container and sort them with my own overloaded less-than operator defined in my class. I really tried several solutions but nothing worked, so I hope someone can give me some useful hints to solve it.
Here is my general idea of my class definition:
class object {
public:
    int first;
    string second;
    object(int f, string s) {
        first = f;
        second = s;
    }
    bool operator<(const object &comp) {
        return first < comp.first;
    }
};
This was my first try and it didn't work, so I also tried declaring the overloaded operator as a friend function, but that didn't work either.
Here is a short code extract from my main function:
includes ...
//code omitted
int main() {
    multiset<object*> mmset;
    mmset.insert(new object(10, "test"));
    mmset.insert(new object(11, "test"));
    return 0;
}
After a while I started debugging my code to figure out where the problem is, and I came across the following thing, which made me a bit suspicious.
code extract from the stl:
// TEMPLATE STRUCT less
template<class _Ty>
struct less : public binary_function<_Ty, _Ty, bool>
{   // functor for operator<
    bool operator()(const _Ty& _Left, const _Ty& _Right) const
    {   // apply operator< to operands
        return (_Left < _Right);
    }
};
I set a breakpoint on this line and observed what the program does here, and I don't know why, but it only compares the addresses of the two objects and returns false. It never calls my overloaded less-than operator, although the operator exists and the _Left and _Right variables contain the addresses of my objects.
I would really appreciate it if someone could help me.
Best Greetings
Tom
You are not storing objects in your multiset. You are storing object*s. These are pointers to objects. This means the set will order the pointers that you're inserting into it.
It seems like you really just want a multiset<object>:
multiset<object> mmset;
mmset.emplace(10, "test");
mmset.emplace(11, "test");
Now it will use < to compare the objects themselves.
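A self-contained sketch of that approach. One detail matters: the question's operator< is not const, and std::less<object> invokes it through const references, so it has to be marked const for multiset<object> to compile.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

class object {
public:
    int first;
    std::string second;
    object(int f, std::string s) : first(f), second(std::move(s)) {}
    // const is required: std::less<object> calls this through const references.
    bool operator<(const object& comp) const { return first < comp.first; }
};

inline void multisetExample() {
    std::multiset<object> mmset;
    mmset.emplace(11, "test");
    mmset.emplace(10, "test");
    assert(mmset.begin()->first == 10);  // ordered by first, not by insertion or address
}
```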
If you really want to store pointers, you'll need to provide a custom comparator to the multiset. In C++11, you can do this easily with a lambda:
auto f = [](int* a, int* b) { return *a < *b; };
std::multiset<int*, decltype(f)> mmset(f);
Pre-C++11, you can create a function object that implements operator() with the same body as this lambda function.
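A sketch of such a pre-C++11 function object, assuming the question's object class with its operator< made const; the name ObjectPtrLess is hypothetical. The set still stores pointers but orders them by the objects they point to.

```cpp
#include <cassert>
#include <set>
#include <string>

class object {
public:
    int first;
    std::string second;
    object(int f, const std::string& s) : first(f), second(s) {}
    bool operator<(const object& comp) const { return first < comp.first; }
};

// C++03-style function object: compares the pointed-to objects, not the addresses.
struct ObjectPtrLess {
    bool operator()(const object* a, const object* b) const { return *a < *b; }
};

inline void ptrSetExample() {
    object x(11, "test"), y(10, "test");
    std::multiset<object*, ObjectPtrLess> mmset;
    mmset.insert(&x);
    mmset.insert(&y);
    assert((*mmset.begin())->first == 10);
}
```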
Thank you for your help. That seems to be a good solution to this problem.
I have searched a bit deeper in the new C++11 standard and found out that there is another possible solution to solve this with a little bit simpler implementation but the same result :)
I will post it just as information for other seekers with the same problem.
You can give an ordered STL container a so-called comparison object, which the container will use to arrange its contents.
The only thing you have to do is define an overloaded operator() in your class and "misuse" it as a comparison operator.
class object {
public:
    int first;
    string second;
    object() { };
    object(int f, string s) {
        first = f;
        second = s;
    }
    bool operator()(const object *comp1, const object *comp2) const {
        return comp1->first < comp2->first;
    }
};
The other thing you have to do now is pass the class as the second template argument in your definition of the container:
multiset<object*, object> mmset;
You can also use a separate class just for comparison, because otherwise your class needs a default constructor to be used this way.
I need to use an unordered_multimap for my Note objects, and the keys will be the measureNumber member of my objects. I'm trying to implement it as shown here, but I'm stuck.
First off, I don't understand why I have to overload operator== before I can use it. I'm also confused about why I need a hash and how to implement it. In this example here, none of those two things is done.
So based on the first example, this is what I have:
class Note {
private:
    int measureNumber;
public:
    inline bool operator== (const Note &noteOne, const Note &noteTwo);
};
inline bool Note::operator ==(const Note& noteOne, const Note& noteTwo){
    return noteOne.measureNumber == noteTwo.measureNumber;
}
I don't know how to implement the hash part though. Any ideas?
std::multimap is based on a sorted binary tree, which uses a less-than operation to sort the nodes.
std::unordered_multimap is based on a hash table, which uses hash and equality operations to organize the nodes without sorting them.
The sorting or hashing is based on the key values. If the objects are the keys, then you need to define these operations. If the keys are of predefined type like int or string, then you don't need to worry about it.
The problem with your pseudocode is that measureNumber is private, so the user of Note cannot easily specify the key to the map. I would recommend making measureNumber public or rethinking the design. (Is measure number really a good key value? I'm guessing this is musical notation.)
std::multimap< int, Note > notes;
Note myNote( e_sharp, /* octave */ 3, /* measure */ 5 );
notes.insert( std::make_pair( myNote.measureNumber, myNote ) );
The objects can be keys and values at the same time, if you use std::multiset or std::unordered_multiset, in which case you would want to define the operator overload (and possibly hash). If operator== (or operator<) is a member function, then the left-hand side becomes this and the right-hand side becomes the sole argument. Usually these functions should be non-member friends. So then you would have
class Note {
private:
    int measureNumber;
public:
    friend bool operator< (const Note &noteOne, const Note &noteTwo);
};
inline bool operator <(const Note& noteOne, const Note& noteTwo){
    return noteOne.measureNumber < noteTwo.measureNumber;
}
This class could be used with std::multiset. To perform a basic lookup, you can construct a dummy object with uninitialized values except for measureNumber — this only works for simple object types.
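A sketch of that lookup, using equal_range with a dummy Note whose only meaningful field is measureNumber. The constructor, the pitch member and notesInMeasure are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <utility>

class Note {
public:
    Note(int measure, std::string p) : measureNumber(measure), pitch(std::move(p)) {}
    friend bool operator<(const Note& a, const Note& b) {
        return a.measureNumber < b.measureNumber;  // pitch is ignored by the ordering
    }
private:
    int measureNumber;
    std::string pitch;  // hypothetical extra data
};

// Counts the notes in a measure by searching for a dummy Note that only
// has a meaningful measureNumber.
inline std::size_t notesInMeasure(const std::multiset<Note>& notes, int measure) {
    auto range = notes.equal_range(Note(measure, ""));
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}

inline void noteLookupExample() {
    std::multiset<Note> notes;
    notes.insert(Note(5, "E#"));
    notes.insert(Note(5, "G"));
    notes.insert(Note(6, "A"));
    assert(notesInMeasure(notes, 5) == 2);
    assert(notesInMeasure(notes, 7) == 0);
}
```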
I need to use an unordered_multimap for my Note objects and the keys will be the measureNumber member of my objects.
OK - I'm not sure whether you're after a multiset, unordered_multiset, multimap, or unordered_multimap. I know your title refers to unordered_multimap, but the link you provided leads to unordered_multiset. There are a multitude of considerations which should be taken into account when choosing a container, but second-guessing which will be the best-performing without profiling is a risky business.
I don't understand why I have to overload operator== before I can use it.
I'm also confused about why I need a hash and how to implement it.
In this example here, none of those two things is done.
You need the operator== and std::hash as they're used internally by unordered_multimap and unordered_multiset. In the example you linked to, the key is of type int, so operator== and std::hash<int> are already defined. If you choose to use Note as a key, you have to define these yourself.
I'd recommend starting with a multiset if you don't need to change the elements frequently. If you do want to be able to change Notes without erasing and inserting, I'd recommend removing measureNumber as a member of Note and using a multimap<int, Note>.
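A sketch of the multimap<int, Note> option, assuming Note has had measureNumber removed and carries some hypothetical pitch data instead. The measure lives only in the key, and the mapped Note can be edited in place because only the key part of each element is const.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical Note without measureNumber: the measure is stored only as the key.
struct Note {
    std::string pitch;
};

// Edits every note in one measure in place; no erase/insert needed.
inline void setPitchInMeasure(std::multimap<int, Note>& notes, int measure,
                              const std::string& pitch) {
    auto range = notes.equal_range(measure);
    for (auto it = range.first; it != range.second; ++it)
        it->second.pitch = pitch;
}

inline void multimapExample() {
    std::multimap<int, Note> notes;
    notes.insert(std::make_pair(5, Note{"E#"}));
    notes.insert(std::make_pair(5, Note{"G"}));
    notes.insert(std::make_pair(6, Note{"A"}));
    setPitchInMeasure(notes, 5, "C");
    assert(notes.count(5) == 2);
    assert(notes.find(6)->second.pitch == "A");  // other measures untouched
}
```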
If you feel an unordered_ version of your container would better suit your needs, you still have the set vs map choice. If you choose unordered_multimap<int, Note> (having removed measureNumber from Note), then as in your linked example, the key is int. So you won't have to define anything special for this to work. If you choose to keep measureNumber as a member of Note and use unordered_multiset<Note>, then Note is the key and so you need to do further work, e.g.
#include <functional>
#include <unordered_set>
class Note; // Forward declaration to allow specialisation of std::hash<>
namespace std {
    template<>
    class hash<Note> {
    public:
        size_t operator()(const Note &) const; // declaration of operator() to
                                               // allow befriending by Note
    };
}
class Note {
private:
    int measureNumber;
public:
    // functions befriended to allow access to measureNumber
    friend bool operator== (const Note &, const Note &);
    friend std::size_t std::hash<Note>::operator()(const Note &) const;
};
inline bool operator== (const Note &noteOne, const Note &noteTwo) {
    return noteOne.measureNumber == noteTwo.measureNumber;
}
std::size_t std::hash<Note>::operator()(const Note &note) const {
    return std::hash<int>()(note.measureNumber);
}
This lets you create and use std::unordered_multiset<Note>. However, I'm not sure this is really what you need; you could even find that a sorted std::vector<Note> is best for you. Further research and thought as to how you'll use your container along with profiling should give the best answer.
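As an alternative to specialising std::hash, the hash and equality can be passed to the container as template arguments. This sketch assumes a hypothetical public measure() accessor, which removes the need for friendship; NoteHash and NoteEqual are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

class Note {
public:
    explicit Note(int measure) : measureNumber(measure) {}
    int measure() const { return measureNumber; }  // hypothetical public accessor
private:
    int measureNumber;
};

// Passed to the container as template arguments; no std::hash specialisation.
struct NoteHash {
    std::size_t operator()(const Note& n) const { return std::hash<int>()(n.measure()); }
};
struct NoteEqual {
    bool operator()(const Note& a, const Note& b) const { return a.measure() == b.measure(); }
};

using NoteSet = std::unordered_multiset<Note, NoteHash, NoteEqual>;

inline void noteSetExample() {
    NoteSet notes;
    notes.insert(Note(5));
    notes.insert(Note(5));
    notes.insert(Note(6));
    assert(notes.count(Note(5)) == 2);
    assert(notes.count(Note(7)) == 0);
}
```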
I have a function that processes a given vector, but may also create such a vector itself if it is not given.
I see two design choices for such a case, where a function parameter is optional:
Make it a pointer and make it NULL by default:
void foo(int i, std::vector<int>* optional = NULL) {
    if(optional == NULL){
        optional = new std::vector<int>(); // note: never deleted, so this leaks
        // fill vector with data
    }
    // process vector
}
Or have two functions with an overloaded name, one of which leaves out the argument:
void foo(int i, const std::vector<int>& optional) {
    // process vector
}
void foo(int i) {
    std::vector<int> vec;
    // fill vec with data
    foo(i, vec); // the two-argument overload must be declared before this call
}
Are there reasons to prefer one solution over the other?
I slightly prefer the second one because I can make the vector a const reference, since it is, when provided, only read, not written. Also, the interface looks cleaner (isn't NULL just a hack?). And the performance difference resulting from the indirect function call is probably optimized away.
Yet, I often see the first solution in code. Are there compelling reasons to prefer it, apart from programmer laziness?
I would not use either approach.
In this context, the purpose of foo() seems to be to process a vector. That is, foo()'s job is to process the vector.
But in the second version of foo(), it is implicitly given a second job: to create the vector. The semantics between foo() version 1 and foo() version 2 are not the same.
Instead of doing this, I would consider having just one foo() function to process a vector, and another function which creates the vector, if you need such a thing.
For example:
void foo(int i, const std::vector<int>& optional) {
    // process vector
}
std::vector<int> makeVector() {
    return std::vector<int>();
}
Obviously these functions are trivial, and if all makeVector() needs to do to get its job done is literally just construct a vector, then there may be no point in having the makeVector() function. But I'm sure that in your actual situation these functions do much more than what is being shown here, and my code above illustrates a fundamental approach to semantic design: give one function one job to do.
The design I have above for the foo() function also illustrates another fundamental approach that I personally use when designing interfaces, which includes function signatures, classes, etc. It is this: I believe that a good interface is 1) easy and intuitive to use correctly, and 2) difficult or impossible to use incorrectly. In the case of the foo() function, we are implicitly saying that, with my design, the vector is required to already exist and be 'ready'. Because foo() takes a reference instead of a pointer, it is intuitive that the caller must already have a vector, and they will have a hard time passing in something that isn't a ready-to-go vector.
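A sketch of the "one function, one job" split, with one deliberate deviation that is an assumption rather than the answer's code: makeVector() returns the vector by value instead of a pointer from new, so the caller never has to delete anything. The processing logic (counting elements greater than i) and the vector contents are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Only job: process a vector the caller already has.
// Hypothetical processing: count the elements greater than i.
std::size_t foo(int i, const std::vector<int>& v) {
    std::size_t count = 0;
    for (int x : v) {
        if (x > i) ++count;
    }
    return count;
}

// Only job: create a vector. Returned by value (assumption), so there
// is no owning raw pointer for the caller to manage.
std::vector<int> makeVector() {
    return std::vector<int>{4, 8, 15, 16, 23, 42};  // hypothetical contents
}
```

A caller composes the two explicitly, e.g. foo(10, makeVector()), which makes the "create, then process" sequence visible at the call site.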
I would definitely favour the 2nd approach of overloaded methods.
The first approach (optional parameters) blurs the definition of the method as it no longer has a single well-defined purpose. This in turn increases the complexity of the code, making it more difficult for someone not familiar with it to understand it.
With the second approach (overloaded methods), each method has a clear purpose. Each method is well-structured and cohesive. Some additional notes:
If there's code which needs to be duplicated into both methods, this can be extracted out into a separate method and each overloaded method could call this external method.
I would go a step further and name each method differently to indicate the differences between the methods. This will make the code more self-documenting.
While I do understand the complaints of many people regarding default parameters and overloads, there seems to be a lack of understanding of the benefits that these features provide.
Default Parameter Values:
First, I want to point out that when a project is initially designed, there should be little to no need for defaults if it is well designed. Where defaults really pay off is in existing projects and well-established APIs. I work on projects that consist of millions of lines of existing code, and we do not have the luxury of re-coding them all. So when you want to add a feature that requires an extra parameter, the new parameter needs a default; otherwise you will break everyone who uses your project. That would be fine with me personally, but I doubt your company or the users of your product/API would appreciate having to re-code their projects on every update. Simply put, defaults are great for backwards compatibility! This is usually why you see defaults in big APIs and existing projects.
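A small sketch of the backwards-compatibility argument, using a hypothetical greet() function: suppose version 1 of the API took only a name, and version 2 adds a parameter with a default. Every existing call site still compiles and behaves as before, while new callers can opt in.

```cpp
#include <string>

// Hypothetical API. Version 1 was:
//     std::string greet(const std::string& name);
// Version 2 adds 'excited' with a default, so old calls are unaffected.
std::string greet(const std::string& name, bool excited = false) {
    return "Hello, " + name + (excited ? "!" : ".");
}
```

An old call such as greet("Ann") still produces "Hello, Ann.", exactly as it did before the parameter was added.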
Function Overloads:
The benefit of function overloads is that they let a single functional concept be shared across different options/parameters. However, I often see overloads used lazily to provide starkly different functionality with only slightly different parameters. In that case each function should have its own name describing its specific functionality (as with the OP's example).
These features of C++ are good and work well when used properly, which can be said of almost any programming feature. It is when they are abused or misused that they cause problems.
Disclaimer:
I know that this question is a few years old, but since these answers came up in my search results today (2012), I felt this needed further addressing for future readers.
I agree, I would use two functions. Basically, you have two different use cases, so it makes sense to have two different implementations.
I find that the more C++ code I write, the fewer parameter defaults I have - I wouldn't really shed any tears if the feature was deprecated, though I would have to re-write a shed load of old code!
References can't be NULL in C++, so a really good solution would be to use a Nullable template.
This would let you test whether a value was supplied, e.g. with !ref.GetSet().
Here is one you can use:
template<class T>
class Nullable {
public:
Nullable() {
m_set = false;
}
explicit
Nullable(T value) {
m_value = value;
m_set = true;
}
Nullable(const Nullable &src) {
m_set = src.m_set;
if(m_set)
m_value = src.m_value;
}
Nullable & operator =(const Nullable &RHS) {
m_set = RHS.m_set;
if(m_set)
m_value = RHS.m_value;
return *this;
}
bool operator ==(const Nullable &RHS) const {
if(!m_set && !RHS.m_set)
return true;
if(m_set != RHS.m_set)
return false;
return m_value == RHS.m_value;
}
bool operator !=(const Nullable &RHS) const {
return !operator==(RHS);
}
bool GetSet() const {
return m_set;
}
const T &GetValue() const {
return m_value;
}
T GetValueDefault(const T &defaultValue) const {
if(m_set)
return m_value;
return defaultValue;
}
void SetValue(const T &value) {
m_value = value;
m_set = true;
}
void Clear()
{
m_set = false;
}
private:
T m_value;
bool m_set;
};
Now you can have
void foo(int i, const Nullable<AnyClass> &optional = Nullable<AnyClass>()) {
    // const is required: a default temporary cannot bind to a non-const reference
    if (!optional.GetSet()) {
        // no value was supplied
    }
}
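Since C++17 the standard library provides std::optional, which plays the same role as this hand-written Nullable template. A minimal sketch of the same optional-parameter pattern with it, where the processing (summing, or returning -1 when nothing was passed) is hypothetical:

```cpp
#include <optional>
#include <vector>

// The parameter is a const reference, so the default temporary built
// from std::nullopt can bind to it. Requires C++17.
int foo(int i, const std::optional<std::vector<int>>& values = std::nullopt) {
    if (!values.has_value()) {
        return -1;  // hypothetical "nothing supplied" result
    }
    int sum = i;
    for (int x : *values) sum += x;
    return sum;
}
```

Note that passing a std::vector converts it into a temporary std::optional, which copies the vector; for large data a pointer or a separate overload avoids that copy.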
I usually avoid the first case. Note that those two functions differ in what they do: one of them fills a vector with some data; the other doesn't (it just accepts the data from the caller). I tend to give different names to functions that actually do different things. In fact, even as you write them, they are two functions:
foo_default (or just foo)
foo_with_values
At least, I find this distinction cleaner in the long term, both for myself and for the occasional library/function user.
I, too, prefer the second one. While there isn't much difference between the two, you are basically reusing the functionality of the primary method in the foo(int i) overload, and the primary overload would work perfectly well without caring about the existence or absence of the other one, so the overload version has better separation of concerns.
In C++ you should avoid making NULL a valid parameter value whenever possible. The reason is that it substantially reduces how much the call site documents itself. I know this sounds extreme, but I work with APIs that take upwards of 10-20 parameters, half of which can validly be NULL. The resulting code is almost unreadable:
SomeFunction(NULL, pName, NULL, pDestination);
If you switch to const references, the code is forced to become more readable, because every "empty" argument has to be named:
SomeFunction(
Location::Hidden(),
pName,
SomeOtherValue::Empty(),
pDestination);
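A self-contained sketch of the named-placeholder idea. Location, SomeOtherValue and SomeFunction are hypothetical types and functions modelled on the call above (with pDestination dropped for brevity); each type offers a named "empty" value, so the call site says what it means instead of passing a bare NULL.

```cpp
#include <string>

// Hypothetical parameter types, each with a named "empty" value.
struct Location {
    bool hidden;
    static Location Hidden() { return Location{true}; }
};

struct SomeOtherValue {
    std::string text;
    static SomeOtherValue Empty() { return SomeOtherValue{""}; }
};

// Hypothetical function: reports which arguments carry real data.
std::string SomeFunction(const Location& where, const std::string& name,
                         const SomeOtherValue& extra) {
    std::string result = name;
    if (where.hidden) result += " (hidden)";
    if (!extra.text.empty()) result += " " + extra.text;
    return result;
}
```

SomeFunction(Location::Hidden(), "doc", SomeOtherValue::Empty()) reads as a sentence, whereas SomeFunction(NULL, "doc", NULL) forces the reader to look up what each NULL means.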
I'm squarely in the "overload" camp. Others have added specifics about your actual code example but I wanted to add what I feel are the benefits of using overloads versus defaults for the general case.
Any parameter can be "defaulted"
No gotcha if an overriding function uses a different value for its default.
It's not necessary to add "hacky" constructors to existing types in order to allow them to have defaults.
Output parameters can be defaulted without needing to use pointers or hacky global objects.
Some code examples for each:
Any parameter can be defaulted:
class A {}; class B {}; class C {};
void foo (A const &, B const &, C const &);
inline void foo (A const & a, C const & c)
{
foo (a, B (), c); // 'B' defaulted
}
No danger of overriding functions having different values for the default:
class A {
public:
virtual void foo (int i = 0);
};
class B : public A {
public:
virtual void foo (int i = 100);
};
void bar (A & a)
{
a.foo (); // Always uses '0', regardless of the dynamic type of 'a'
}
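A runnable sketch of that gotcha next to the overload-based alternative. Default arguments are taken from the static type of the expression, so B's "= 100" is ignored when foo() is called through an A&. In the alternative (hypothetical classes C and D), a non-virtual zero-argument forwarder in the base class is the single place the default lives. The return values (adding 1000 in the derived classes) are made up so the tests can tell which body ran.

```cpp
// Default-argument version: the default depends on the static type.
struct A {
    virtual ~A() = default;
    virtual int foo(int i = 0) { return i; }
};
struct B : A {
    int foo(int i = 100) override { return i + 1000; }
};

// Overload version: one default, supplied by a non-virtual forwarder.
struct C {
    virtual ~C() = default;
    int foo() { return foo(0); }          // the only place the default lives
    virtual int foo(int i) { return i; }
};
struct D : C {
    using C::foo;                          // keep foo() visible in D
    int foo(int i) override { return i + 1000; }
};
```

With B b; A& a = b;, a.foo() calls B::foo(0) while b.foo() calls B::foo(100). With the overload version, foo() passes 0 no matter whether it is called through a C& or a D.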
It's not necessary to add "hacky" constructors to existing types in order to allow them to be defaulted:
struct POD {
int i;
int j;
};
void foo (POD p); // Adding default (other than {0, 0})
// would require constructor to be added
inline void foo ()
{
POD p = { 1, 2 };
foo (p);
}
Output parameters can be defaulted without needing to use pointers or hacky global objects:
void foo (int i, int & j); // Default requires global "dummy"
// or 'j' should be pointer.
inline void foo (int i)
{
int j;
foo (i, j);
}
The only exception to the rule regarding overloading versus defaults is constructors, where it's currently not possible for one constructor to forward to another. (C++0x, which became C++11, does solve this with delegating constructors.)
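A minimal sketch of a C++11 delegating constructor, using a hypothetical Widget class: the one-argument constructor forwards to the two-argument one, so the default size (10, an arbitrary value) is written in exactly one place, just as with a forwarding overload for ordinary functions.

```cpp
#include <string>

// Hypothetical class illustrating constructor delegation (C++11).
class Widget {
public:
    Widget(const std::string& name, int size) : name_(name), size_(size) {}
    explicit Widget(const std::string& name) : Widget(name, 10) {}  // delegates

    const std::string& name() const { return name_; }
    int size() const { return size_; }

private:
    std::string name_;
    int size_;
};
```

Widget w("a") ends up with size 10, and Widget v("b", 3) with size 3.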
I would favour a third option:
Separate into two functions, but do not overload.
Overloads, by nature, are less usable. They require the user to become aware of two options and figure out what the difference between them is, and if they're so inclined, to also check the documentation or the code to ensure which is which.
I would have one function that takes the parameter,
and one that is called "createVectorAndFoo" or something like that (obviously naming becomes easier with real problems).
While this violates the "two responsibilities for function" rule (and gives it a long name), I believe this is preferable when your function really does do two things (create vector and foo it).
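A sketch of this "two functions, two names" option. The processing (i times the element count) and the default data (five zeros) are hypothetical placeholders; the point is that createVectorAndFoo is not an overload of foo, so its name states that it does both jobs.

```cpp
#include <vector>

// Only reads the caller's data. Hypothetical processing: i times the
// number of elements.
int foo(int i, const std::vector<int>& data) {
    return i * static_cast<int>(data.size());
}

// Distinctly named, not an overload: creates the vector, then foos it.
int createVectorAndFoo(int i) {
    std::vector<int> data(5, 0);  // hypothetical default data: five zeros
    return foo(i, data);
}
```

A reader seeing createVectorAndFoo(3) at a call site knows a vector is being built, without having to check which foo overload was chosen.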
Generally I agree with others' suggestion to use a two-function approach. However, if the vector created when the 1-parameter form is used is always the same, you could simplify things by making it static and using a default const& parameter instead:
// Either at global scope, or (better) inside a class
static const std::vector<int> default_vector = populate_default_vector();
void foo(int i, std::vector<int> const& optional = default_vector) {
    // ...
}
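A complete sketch of the static-default idea. populate_default_vector() is hypothetical (it returns four placeholder values), and the processing step (adding i to the element sum) is made up. One caveat worth keeping in mind: if foo() were called with its default from another translation unit's static initializer, default_vector might not be initialized yet.

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper that builds the shared default data.
std::vector<int> populate_default_vector() {
    return std::vector<int>{1, 2, 3, 4};
}

// Built once at namespace scope; internal linkage because of 'static'.
static const std::vector<int> default_vector = populate_default_vector();

// The default argument binds a const reference to the shared vector:
// no copy and no second overload.
int foo(int i, const std::vector<int>& optional = default_vector) {
    return i + std::accumulate(optional.begin(), optional.end(), 0);
}
```

foo(0) uses the shared default and returns 10; foo(1, someVector) processes the caller's data instead.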
The first way is poorer because you cannot tell whether NULL was passed in accidentally or on purpose; if it was an accident, you have likely caused a bug.
With the second one you can test (assert, whatever) for NULL and handle it appropriately.