About the implementations of c++ stl predicate - c++

I wonder how c++ stl predicate is implemented? For example in copy_if()
http://www.cplusplus.com/reference/algorithm/copy_if/
According to Effective STL, predicate is passed by value. For the following code for int,
struct my_predicate{
    int var_1;
    float var_2;
    bool operator()(const int& arg){
        // some processing here
        return true; // placeholder result
    }
};
How is copy_if() implemented with regard to passing my_predicate by value? There are var_1 and var_2 here. For other predicates, there may be different variables in the struct.
If passing by reference or pointer, that is very reasonable to me.
Thanks a lot!

(I hope I'm not misunderstanding your question.)
The reason why it can be passed by value is that the 'my_predicate' struct has an implicit copy constructor automatically generated by the compiler. You can pass it by value because it has a copy constructor.
In practice, it is very likely the compiler will optimise away the copy. In fact, it is very likely the compiler will optimise away the entire predicate object and, for example in the case of std::copy_if, reduce the code to the equivalent of a for loop + if statement.
By convention predicates are passed by value. They are not meant to be heavyweight objects, and for small objects, even if the entire predicate isn't optimised away, it is faster to pass by value anyway.
Also generally you cannot pass temporary values by non-const reference (let alone pointer) so:
std::copy_if(begin(..),end(..),my_predicate{});
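would not compile if the predicate were taken by non-const reference, because a temporary cannot bind to one. Taking it by const reference would not help either, since your operator() is not a const member function and so could not be called. With pass by value you can get away with this.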
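To make the mechanics concrete, here is a minimal sketch of how an algorithm in the style of copy_if can take its predicate by value. The names my_copy_if, greater_than and keep_greater are hypothetical and exist only for illustration; real standard library implementations differ in detail.

```cpp
#include <vector>

// Hypothetical stand-in for std::copy_if. The predicate type is a template
// parameter, so any copyable callable is accepted, whatever members it has.
template <class InputIt, class OutputIt, class UnaryPredicate>
OutputIt my_copy_if(InputIt first, InputIt last, OutputIt d_first,
                    UnaryPredicate pred)  // pred is a copy of the caller's object
{
    for (; first != last; ++first) {
        if (pred(*first)) {
            *d_first = *first;
            ++d_first;
        }
    }
    return d_first;
}

// Hypothetical predicate with state, similar in shape to my_predicate.
struct greater_than {
    int limit;
    bool operator()(const int& arg) const { return arg > limit; }
};

// Returns the elements of `in` that exceed `limit`.
std::vector<int> keep_greater(const std::vector<int>& in, int limit)
{
    std::vector<int> out(in.size());
    // A temporary predicate is fine here because it is passed by value.
    auto end = my_copy_if(in.begin(), in.end(), out.begin(), greater_than{limit});
    out.erase(end, out.end());
    return out;
}
```

Because UnaryPredicate is deduced per call, the compiler knows the exact size and members of the predicate at each call site, which is also what makes inlining the whole thing straightforward.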

Related

C++ overloading the equality operator. Should I write my function to accept argument passed by reference or value?

I want to overload the == operator for a simple struct
struct MyStruct {
public:
int a;
float b;
bool operator==( ) { }
}
All the examples I'm seeing seem to pass the value by reference using a &.
But I really want to pass these structs by value.
Is there anything wrong with me writing this as
bool operator== (MyStruct another) { return ( (a==another.a) && (b==another.b) ); }
It should really not matter, except that you pay the penalty of a copy when you pass by value. This applies if the struct is really heavy. In the simple example you quote, there may not be a big difference.
That being said, passing by const reference makes more sense since it expresses the intent of the overloaded function == clearly. const makes sure that the overloaded function accidentally doesn't modify the object and passing by reference saves you from making a copy. For == operator, there is no need to pass a copy just for comparison purposes.
If you are concerned about consistency, it's better to switch the other pass by value instances to pass by const ref.
While being consistent is laudable goal, one shouldn't overdo it. A program containing only 'A' characters would be very consistent, but hardly useful. Argument passing mechanism is not something you do out of consistency, it is a technical decision based on certain technical aspects.
For example, in your case, passing by value could potentially lead to better performance, since the struct is small enough and on AMD64 ABI (the one which is used on any 64bit Intel/AMD chip) it will be passed in a register, thus saving time normally associated with dereferencing.
On the other hand, in your case it is reasonable to assume that the function will be inlined, and then the passing scheme will not matter at all (since nothing will actually be passed). This is shown by the generated code here (no call to operator== exists in the generated assembly): https://gcc.godbolt.org/z/G7oEgE
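For reference, the two options discussed above might be written as follows. This is only a sketch: the free function equal_by_value is a hypothetical name added so that both variants can sit side by side.

```cpp
struct MyStruct {
    int a;
    float b;

    // By const reference: no copy is made, and the const qualifiers document
    // that neither operand is modified.
    bool operator==(const MyStruct& other) const {
        return a == other.a && b == other.b;
    }
};

// Hypothetical by-value alternative, written as a free function for comparison.
inline bool equal_by_value(MyStruct lhs, MyStruct rhs) {
    return lhs.a == rhs.a && lhs.b == rhs.b;
}
```

For a struct this small both forms give the same result, and once inlined they typically compile to the same code.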

comparator for sorting a vector containing pointers to objects of custom class

By this question I am also trying to understand fundamentals of C++, as I am very new to C++. There are many good answers to problem of sorting a vector/list of custom classes, like this. In all of the examples the signature of comparator functions passed to sort are like this:
(const ClassType& obj1, const ClassType& obj2)
Is this signature mandatory for comparator functions? Or we can give some thing like this also:
(ClassType obj1, ClassType obj2)
Assuming I will modify the body of comparator accordingly.
If the first signature is mandatory, then why?
I want to understand reasons behind using const and reference'&'.
What I can think is const is because you don't want the comparator function to be able to modify the element. And reference is so that no multiple copies are created.
How should my signature be if I want to sort a vector which contains pointers to objects of custom class? Like (1) or (2) (see below) or both will work?
The vector to be sorted is of type vector<ClassType*>.
(1)
(const ClassType*& ptr1, const ClassType*& ptr2)
(2)
(ClassType* ptr1, ClassType* ptr2)
I recommend looking through This Documentation.
It explains that the signature of the compare function must be equivalent to:
bool cmp(const Type1& a, const Type2& b);
To be more precise, it then goes on to explain that each parameter needs to be of a type that is implicitly convertible from the object obtained by dereferencing an iterator passed to the sort function.
So if your iterator is std::vector<ClassType*>::iterator then your arguments need to be implicitly convertible from ClassType*.
If you are using something relatively small like an int or a pointer then I would accept them by value:
bool cmp(const ClassType* ptr1, const ClassType* ptr2) // this is more efficient
NOTE: I made them pointers to const because a comparison function should not modify the objects it is comparing.
(ClassType obj1, ClassType obj2)
In most situations this signature will also work, for comparators. The reason it is not used is because you have to realize that this is passing the objects by value, which requires the objects to be copied.
This will be a complete waste. The comparator function does not need to have its own copies of its parameters. All it needs are references to two objects it needs to compare, that's it. Additionally, a comparator function does not need to modify the objects it is comparing. It should not do that. Hence, explicitly using a const reference forces the compiler to issue a compilation error, if the comparator function is coded, in error, to modify the object.
And one situation where this will definitely not work is for classes that have deleted copy constructors. Instances of those classes cannot be copied, at all. You can still emplace them into the containers, but they cannot be copied. But they still can be compared.
const is so you know not to change the values while you're comparing them. Reference is because you don't want to make a copy of the value while you're trying to compare them -- they may not even be copyable.
It should look like your first example -- it's always a reference to the const type of the elements of the vector.
If you have vector<T>, it's always:
T const & left, T const & right
So, if T is a pointer, then the signature for the comparison includes the pointer.
There's nothing really special about the STL. I use it for two main reasons, as a slightly more convenient array (std::vector) and because a balanced binary search tree is a hassle to implement. STL has a standard signature for comparators, so all the algorithms are written to operate on the '<' operation (so they test for equality with if(!( a < b || b < a)) ). They could just as easily have chosen the '>' operation or the C qsort() convention, and you can write your own templated sort routines to do that if you want. However it's easier to use C++ if everything uses the same conventions.
The comparators take const references because a comparator shouldn't modify what it is comparing, and because references are more efficient for objects than passing by value. If you just want to sort integers (rarely you need to sort just raw integers in a real program, though it's often done as an exercise) you can quite possibly write your own sort that passes by value and is a tiny bit faster than the STL sort as a consequence.
You can define the comparator with the following signature:
bool com(ClassType* const & lhs, ClassType* const & rhs);
Note the difference from your first option. (What is needed is a const reference to a ClassType* instead of a reference to a const ClassType*)
The second option should also be good.
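Putting the answers together, a sort over a vector of pointers could look like the sketch below. ClassType's key member and the names by_key and sort_by_key are assumptions made for illustration; the question does not say what the class contains.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical class; the `key` member is an assumption for illustration.
struct ClassType {
    int key;
};

// Pointers are cheap to copy, so they are taken by value. Pointing to const
// documents that the comparator does not modify the objects.
bool by_key(const ClassType* lhs, const ClassType* rhs)
{
    return lhs->key < rhs->key;
}

// Sorts the pointers in place by the key of the pointed-to objects.
void sort_by_key(std::vector<ClassType*>& v)
{
    std::sort(v.begin(), v.end(), by_key);
}
```

This works because ClassType* converts implicitly to const ClassType*, which is all the comparator signature requires.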

Is it costly to pass an initializer_list as a list by value?

I want to pass a std::list as a parameter to fn(std::list<int>), so I do fn({10, 21, 30}) and everybody is happy.
However, I've come to learn that one shouldn't pass list by value, cause it's costly. So, I redefine my fn as fn(std::list<int> &). Now, when I do the call fn({10, 21, 30}), I get an error: candidate function not viable: cannot convert initializer list argument to 'std::list<int> &'.
QUESTION TIME
Is the "you shall not pass a costly object by value" rule valid here? We aren't passing a list after all, but an initializer_list, no?
If the rule still applies, what's the easy fix here?
I guess my doubt comes from the fact that I don't know clearly what happens when one passes an initializer_list argument to a function that accepts a list.
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
However, I've come to learn that one shouldn't pass list by value, cause it's costly.
That's not entirely accurate. If you need to pass in a list that the function can modify, where the modifications shouldn't be externally visible, you do want to pass a list by value. This gives the caller the ability to choose whether to copy or move from an existing list, so gives you the most reasonable flexibility.
If the modifications should be externally visible, you should prevent temporary list objects from being passed in, since passing in a temporary list object would prevent the caller from being able to see the changes made to the list. The flexibility to silently pass in temporary objects is the flexibility to shoot yourself in the foot. Don't make it too flexible.
If you need to pass in a list that the function will not modify, then const std::list<T> & is the type to use. This allows either lvalues or rvalues to be passed in. Since there won't be any update to the list, there is no need for the caller to see any update to the list, and there is no problem passing in temporary list objects. This again gives the caller the most reasonable flexibility.
Is the "you shall not pass a costly object by value" rule valid here? We aren't passing a list after all, but an initializer_list, no?
You're constructing a std::list from an initializer list. You're not copying that std::list object, but you are copying the list items from the initializer list to the std::list. If the copying of the list items is cheap, you don't need to worry about it. If the copying of the list items is expensive, then it should be up to the caller to construct the list in some other way, it still doesn't need to be something to worry about inside your function.
If the rule still applies, what's the easy fix here?
Both passing std::list by value or by const & allow the caller to avoid pointless copies. Which of those you should use depends on the results you want to achieve, as explained above.
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
Passing the list by value constructs a new std::list object in the location of the function parameter, using the function argument to specify how to construct it. This may or may not involve a copy or a move of an existing std::list object, depending on what the caller specifies as the function argument.
The expression {10, 21, 30} will construct an initializer_list<int>.
This in turn will be used to create a list<int>.
That list will be a temporary, and a temporary will not bind to a
non-const reference.
One fix would be to change the prototype for your function to
fn(const std::list<int>&)
This means that you can't edit it inside the function, and you probably don't need to.
However, if you must edit the parameter inside the function, taking it by value would be appropriate.
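A short sketch of both cases follows. The function names sum and with_front_removed are hypothetical; they stand in for a read-only fn and for an fn that needs to edit its parameter.

```cpp
#include <list>

// Read-only access: a const reference binds to lvalues and also to the
// temporary list built from a braced list such as {10, 21, 30}.
int sum(const std::list<int>& values)
{
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// The function needs its own modifiable list, so it takes one by value.
// The caller's list, if any, is left untouched.
std::list<int> with_front_removed(std::list<int> values)
{
    if (!values.empty()) values.pop_front();
    return values;
}
```

Both can be called as sum({10, 21, 30}) or with_front_removed({10, 21, 30}); only a non-const lvalue reference parameter rejects the braced list.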
Also note: don't optimize prematurely. You should always use idiomatic
constructs that clearly represent what you want to do, and for functions,
that almost always means parameters by const& and return by value.
This is easy to use right, hard to use wrong, and almost always fast enough.
Optimization should only be done after profiling, and only for the parts of the program that you have measured to need it.
Quoting the C++14 standard draft (emphasis mine):
18.9 Initializer lists [support.initlist]
2: An object of type initializer_list<E> provides access to an array of
objects of type const E. [ Note: A pair of pointers or a pointer plus
a length would be obvious representations for initializer_list.
initializer_list is used to implement initializer lists as specified
in 8.5.4. Copying an initializer list does not copy the underlying
elements. —end note ]
std::list has a constructor which is used to construct from std::initializer_list. As you can see, it takes it by value.
list(initializer_list<T>, const Allocator& = Allocator());
If you are never going to modify your parameter, then fn(const std::list<int>&) will do just fine. Otherwise, fn(std::list<int>) will do.
To answer your questions:
Is the "you shall not pass a costly object by value" rule valid here?
We aren't passing a list after all, but an initializer_list, no?
std::initializer_list is not a costly object. But std::list<int> surely sounds like a costly object.
If the rule still applies, what's the easy fix here?
Again, it's not costly
Is list generated on the spot and then passed by value? If not, what is it that actually happens?
Yes, it is... your list object is created on the spot at run-time right before the program enters your function scope
However, I've come to learn that one shouldn't pass list by value, cause it's costly. So, I redefine my fn as fn(std::list<int> &). Now, when I do the call fn({10, 21, 30}), I get an error: candidate function not viable: cannot convert initializer list argument to 'std::list<int> &'.
A way to fix the problem would be:
void fn(std::list<int>& v) {
    std::cout << v.size();
}
void fn(std::list<int>&& v) {
    fn(v); // v has a name, so it is an lvalue: this calls the overload above
}
Now fn({1, 2, 3}); works as well: it calls the second overload, which accepts the list by rvalue reference, and then fn(v); calls the first one, which accepts lvalue references.
void fn(std::list<int> v)
{
}
The problem with this function is that it can be called like:
std::list<int> biglist;
fn(biglist);
And it will make a copy. And it will be slow. That's why you want to avoid it.
I would suggest one of the following solutions:
Overload your fn function to accept both rvalues and lvalues properly, as shown before.
Only use the second function (the one that accepts only rvalue references). The problem with this approach is that it produces a compile error when called with an lvalue, which is something you want to allow.
Like the other answers and comments you can use a const reference to the list.
void fn(const std::list<int>& l)
{
for (auto it = l.begin(); it != l.end(); ++it)
{
*it; //do something
}
}
If this fn function is heavily used and you are worried about the overhead of constructing and destructing the temporary list object, you can create a second function that receives the initializer_list directly that doesn't involve any copying whatsoever. Using a profiler to catch such a performance hot spot is not trivial in many cases.
void fn(const std::initializer_list<int>& l)
{
for (auto it = l.begin(); it != l.end(); ++it)
{
*it; //do something
}
}
You can keep the std::list<int> parameter, because in fact you are constructing a temporary list, and passing the initializer_list by value is cheap. Accessing that list later can also be faster than going through a reference, because you avoid a dereference.
You could also work around the error by taking const std::list<int>& as the parameter, or by passing a named list like this:
void foo( std::list<int> &list ) {}
int main() {
std::list<int> list{1,2,3};
foo( list );
}
The list is created in the calling function's scope, and this constructor is called:
list (initializer_list<value_type> il,
const allocator_type& alloc = allocator_type())
So the list itself is not passed by value here. But if the function took std::list<int> by value and you passed an existing list as the argument, that list would be copied.

Why pass by value and not by const reference?

Passing by const reference is pretty much the same as passing by value, but without creating a copy (to my understanding). So is there a case where a copy of the argument is actually needed, so that we would have to pass by value?
There are situations where you don't modify the input, but you still need an internal copy of the input, and then you may as well take the arguments by value. For example, suppose you have a function that returns a sorted copy of a vector:
template <typename V> V sorted_copy_1(V const & v)
{
V v_copy = v;
std::sort(v_copy.begin(), v_copy.end());
return v_copy;
}
This is fine, but if the user has a vector that they never need for any other purpose, then you have to make a mandatory copy here that may be unnecessary. So just take the argument by value:
template <typename V> V sorted_copy_2(V v)
{
std::sort(v.begin(), v.end());
return v;
}
Now the entire process of producing, sorting and returning a vector can be done essentially "in-place".
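As a sketch of what "in-place" means from the caller's side: when the caller no longer needs its vector, it can move it into the by-value parameter. The names sorted_copy and make_sorted below are hypothetical; sorted_copy has the same shape as sorted_copy_2.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Same shape as sorted_copy_2: the by-value parameter is the working copy.
template <typename V> V sorted_copy(V v)
{
    std::sort(v.begin(), v.end());
    return v;
}

// Hypothetical caller that no longer needs its input. std::move lets the
// parameter be move-constructed instead of copied.
std::vector<int> make_sorted()
{
    std::vector<int> data{3, 1, 2};
    return sorted_copy(std::move(data));
}
```

A caller that does want to keep its original simply passes it without std::move and pays for exactly one copy, the one it needed anyway.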
Less expensive examples are algorithms which consume counters or iterators which need to be modified in the process of the algorithm. Again, taking those by value allows you to use the function parameter directly, rather than requiring a local copy.
It's usually faster to pass basic data types such as ints, floats and pointers by value.
Your function may want to modify the parameter locally, without altering the state of the variable passed in.
C++11 introduces move semantics. To move an object into a function parameter, its type cannot be const reference.
Like so many things, it's a balance.
We pass by const reference to avoid making a copy of the object.
When you pass a const reference, you pass a pointer (references are pointers with extra sugar to make them taste less bitter). And assuming the object is trivial to copy, of course.
To access a reference, the compiler will have to dereference the pointer to get to the content [assuming it can't be inlined and the compiler optimises away the dereference, but in that case, it will also optimise away the extra copy, so there's no loss from passing by value either].
So, if your copy is "cheaper" than the sum of dereferencing and passing the pointer, then you "win" when you pass by value.
And of course, if you are going to make a copy ANYWAY, then you may just as well make the copy when constructing the argument, rather than copying explicitly later.
The best example is probably the Copy and Swap idiom:
C& operator=(C other)
{
swap(*this, other);
return *this;
}
Taking other by value instead of by const reference makes it much easier to write a correct assignment operator that avoids code duplication and provides a strong exception guarantee!
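For context, here is a minimal sketch of a class using that assignment operator. Buffer and its members are hypothetical, chosen only to have a resource worth managing; the idiom itself is the operator= shown above.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class used to illustrate the idiom.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }
    ~Buffer() { delete[] data_; }

    // `other` is already a fresh copy of the right-hand side. If making that
    // copy threw, *this was never touched. The swap itself cannot throw, and
    // other's destructor then releases the old contents of *this.
    Buffer& operator=(Buffer other) {
        swap(*this, other);
        return *this;
    }

    friend void swap(Buffer& a, Buffer& b) noexcept {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```

The copy constructor and destructor do all the real work, so the assignment operator contains no duplicated allocation or cleanup logic.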
Also passing iterators and pointers is done by value since it makes those algorithms much more reasonable to code, since they can modify their parameters locally. Otherwise something like std::partition would have to immediately copy its input anyway, which is both inefficient and looks silly. And we all know that avoiding silly-looking code is the number one priority:
template<class BidirIt, class UnaryPredicate>
BidirIt partition(BidirIt first, BidirIt last, UnaryPredicate p)
{
while (1) {
while ((first != last) && p(*first)) {
++first;
}
if (first == last--) break;
while ((first != last) && !p(*last)) {
--last;
}
if (first == last) break;
std::iter_swap(first++, last);
}
return first;
}
The object behind a const& cannot be changed through that reference without a const_cast, but it can still be changed by other code. At any point where code leaves the "analysis range" of your compiler (maybe a function call to a different compilation unit, or through a function pointer it cannot determine the value of at compilation time) it must assume that the value referred to may have changed.
This costs optimization. And it can make it harder to reason about possible bugs or quirks in your code: a reference is non-local state, and functions that operate only on local state and produce no side effects are really easy to reason about. Making your code easy to reason about is a large boon: more time is spent maintaining and fixing code than writing it, and effort spent on performance is fungible (you can spend it where it matters, instead of wasting time on micro optimizations everywhere).
On the other hand, a value requires that the value be copied into local automatic storage, which has costs.
But if your object is cheap to copy, and you don't want the above effect to occur, always take by value as it makes the compiler's job of understanding the function easier.
Naturally only when the value is cheap to copy. If expensive to copy, or even if the copy cost is unknown, that cost should be enough to take by const&.
The short version of the above: taking by value makes it easier for you and the compiler to reason about the state of the parameter.
There is another reason. If your object is cheap to move, and you are going to store a local copy anyhow, taking by value opens up efficiencies. If you take a std::string by const&, then make a local copy, one std::string may be created in order to pass the parameter, and another created for the local copy.
If you took the std::string by value, only one copy will be created (and possibly moved).
For a concrete example:
std::string some_external_state;
void foo( std::string const& str ) {
some_external_state = str;
}
void bar( std::string str ) {
some_external_state = std::move(str);
}
then we can compare:
int main() {
foo("Hello world!");
bar("Goodbye cruel world.");
}
the call to foo creates a std::string containing "Hello world!". It is then copied again into the some_external_state. 2 copies are made, 1 string discarded.
The call to bar directly creates the std::string parameter. Its state is then moved into some_external_state. 1 copy created, 1 move, 1 (empty) string discarded.
There are also certain exception safety improvements caused by this technique, as any allocation happens outside of bar, while foo could throw a resource exhausted exception.
This only applies when perfect forwarding would be annoying or fail, when moving is known to be cheap, when copying could be expensive, and when you know you are almost certainly going to make a local copy of the parameter.
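The difference between foo and bar can be observed with a type that counts its own copies and moves. The sketch below is an assumption-laden illustration: Tracer, stored, store_by_ref and store_by_value are hypothetical names, with stored playing the role of some_external_state.

```cpp
#include <utility>

// Hypothetical type that counts how often it is copied and moved.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
    Tracer& operator=(const Tracer&) { ++copies; return *this; }
    Tracer& operator=(Tracer&&) noexcept { ++moves; return *this; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

Tracer stored;  // plays the role of some_external_state

// Like foo: the argument can only be copied into the stored object.
void store_by_ref(const Tracer& t) { stored = t; }

// Like bar: the parameter is owned by the function, so it can be moved from.
void store_by_value(Tracer t) { stored = std::move(t); }
```

Called with a temporary, store_by_ref(Tracer{}) performs one copy, while store_by_value(Tracer{}) performs no copies and moves instead.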
Finally, there are some small types (like int) for which the non-optimized ABI for direct copies is faster than the non-optimized ABI for const& parameters. This mainly matters when coding interfaces that cannot or will not be optimized, and is usually a micro optimization.

What is the difference between these two parameters in C++?

I am new to C++ and currently am learning about templates and iterators.
I saw some code implementing custom iterators and I'm curious to know what the difference between these two iterator parameters is:
iterator & operator=(iterator i) { ... i.someVar }
bool operator==(const iterator & i) { ... i.someVar }
They implement the = and == operators for the particular iterator. Assuming the iterator class has a member variable 'someVar', why is one operator implemented using "iterator i" and another with "iterator & i"? Is there any difference between the two "i.someVar" expressions?
I googled a little and found this question
Address of array - difference between having an ampersand and no ampersand
to which the answer was "the array is converted to a pointer and its value is the address of the first thing in the array." I'm not sure this is related, but it seems like the only valid explanation I could find.
Thank you!
operator= takes its argument by value (a.k.a. by copy). operator == takes its argument by const reference (a.k.a. by address, albeit with a guarantee that the object will not be modified).
An iterator may be/contain a pointer into an array but it is not itself an array.
The ampersand (&) has different contextual meanings. Used in an expression, it behaves as an operator. Used in a declaration such as iterator & i, it forms part of the type iterator & and indicates that i is a reference, as opposed to an object.
For more discussion (with pictures!), see Pass by Reference / Value in C++ and What's the difference between passing by reference vs. passing by value? (this one is language agnostic).
the assignment operator = takes the iterator i as value, which means a copy of the original iterator is made and passed to the function so any changes applied to the iterator i inside the operator method won't affect the original.
the comparison operator == takes a constant reference, which denotes that the original object can't/shouldn't be changed in the method. This makes sense since a comparison operator usually only compares objects without changing them. The reference allows to pass a reference to the original iterator which lives outside the method. This means that the actual object won't be copied which is usually faster.
First, you don't have an address of an array here.
There's no semantic difference, unless you try to make a local change to the local variable i: iterator i will allow a local change, while const iterator & i will not.
Many people are used to writing const type & var for function parameters because passing by reference can be faster than by value, especially if type is big and expensive to copy, but in your case, an iterator should be small and cheap to copy, so there's no gain from avoiding copying. (Actually, having a local copy can enhance locality of reference and help optimization, so I would just pass small values by value (by copying).)