How do modern compilers optimize function objects in C++?

As I learned from the book Effective C++, passing a function object by value performs better than passing a function reference or function pointer in C++. So how do modern compilers optimize that kind of scenario?
Or, put another way: we usually do not recommend passing an object of our own custom class by value, yet a function object is just a normal object whose class happens to implement operator(). So there must be something different in how the compiler treats these two things when they are passed by value, right?
Below is a case comparing a function object with a function pointer.
#include <algorithm>
#include <cstdlib>     // std::rand
#include <functional>  // std::less
#include <vector>
bool cmp(int a, int b) { return a < b; }
int main() {
    std::vector<int> v;
    v.reserve(10000000); // v(10000000) would first create 10000000 zeros, then push_back 10000000 more
    for (std::size_t i = 0; i < 10000000; ++i)
        v.push_back(std::rand());
    std::vector<int> v2(v);
    std::sort(v.begin(), v.end(), std::less<int>()); // This way would be faster than below;
    std::sort(v2.begin(), v2.end(), cmp);
}

With a function pointer, the compiler is likely to pass the pointer along and perform an indirect function call, instead of making a direct call or even inlining.
In contrast, the operator() of a function object is likely to be inlined, or at least called directly, since the function itself is not passed; only the data it uses is passed (by value or by reference). A function object without data passes nothing at all (it compiles to a dummy integer, or even to nothing).
This is especially true of std::function: when it wraps a function pointer, there is almost no way for the implementation to avoid a double indirect function call.
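The difference comes down to types. Here is a minimal sketch; the names `Less`, `less_fn` and `count_less_than` are hypothetical, and `count_less_than` only stands in for an algorithm such as std::sort. A stateless function object's type identifies the exact operator() to call. A function pointer's type only records a signature. Whether the pointer call ends up inlined anyway is still up to the optimizer.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical stateless function object: its type alone says which code runs.
struct Less {
    bool operator()(int a, int b) const { return a < b; }
};

// Hypothetical plain function with the same behaviour.
bool less_fn(int a, int b) { return a < b; }

// Generic algorithm parameterized on the comparator type, like std::sort.
// For Compare = Less, comp(x, y) is a direct call to Less::operator(), easy to inline.
// For Compare = bool (*)(int, int), comp holds an address known only at run time,
// unless the optimizer can prove which function it points to.
template <typename Compare>
int count_less_than(const int* first, const int* last, int pivot, Compare comp) {
    int n = 0;
    for (; first != last; ++first)
        if (comp(*first, pivot)) ++n;
    return n;
}

// A stateless functor has no data members, so passing it by value costs next to nothing.
static_assert(std::is_empty<Less>::value, "Less carries no state");

// Every function with this signature shares the same pointer type, so the type
// alone cannot tell the compiler which function will be called.
static_assert(std::is_same<decltype(&less_fn), bool (*)(int, int)>::value,
              "a function pointer's type only encodes the signature");
```

Each distinct functor type gets its own instantiation of the template with the call resolved at compile time. All `bool (*)(int, int)` arguments share a single instantiation.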
A lambda is the easiest way to get this optimization. Here is your example with a one-character difference between the two sort calls:
#include <algorithm>
#include <cstdlib>  // std::rand
#include <vector>
int main() {
    std::vector<int> v;
    v.reserve(10000000); // reserve, not size: v(10000000) would add 10000000 zeros first
    for (std::size_t i = 0; i < 10000000; ++i)
        v.push_back(std::rand());
    std::vector<int> v2(v);
    std::sort(v.begin(), v.end(), [] (int a, int b) { return a < b; }); // This way would be faster than below;
    // Unary + converts the captureless lambda to a bool (*)(int, int) function pointer
    std::sort(v2.begin(), v2.end(), +[] (int a, int b) { return a < b; });
}
Modern compilers have not gone much further than older ones in this regard, although you can try your example on different modern compilers to check for sure (for example, on https://godbolt.org/, by inspecting the disassembly).
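For such a check, something like the following can be pasted into https://godbolt.org/ and compiled with optimizations enabled (for example `-O2`). Then look for indirect calls in each function's inlined sort loop, such as a `callq *%r13` instruction in gcc output. The function names are hypothetical, and what you see will vary by compiler and version.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

bool cmp(int a, int b) { return a < b; }

// Comparator type std::less<int>: operator() is known at compile time.
void sort_functor(std::vector<int>& v) { std::sort(v.begin(), v.end(), std::less<int>()); }

// Comparator type bool (*)(int, int): the callee is a run-time value.
void sort_pointer(std::vector<int>& v) { std::sort(v.begin(), v.end(), cmp); }

// Comparator type std::function<bool(int, int)>: type-erased, so each comparison
// typically goes through std::function's internal indirection before reaching cmp.
void sort_std_function(std::vector<int>& v) {
    std::function<bool(int, int)> f = cmp;
    std::sort(v.begin(), v.end(), f);
}
```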

With gcc 7.5, std::sort internally uses the __gnu_cxx::__ops::_Iter_comp_iter template, which looks like this:
template<typename _Compare>
struct _Iter_comp_iter
{
_Compare _M_comp;
explicit _GLIBCXX14_CONSTEXPR
_Iter_comp_iter(_Compare __comp) : _M_comp(_GLIBCXX_MOVE(__comp)) { }
template<typename _Iterator1, typename _Iterator2>
_GLIBCXX14_CONSTEXPR bool
operator()(_Iterator1 __it1, _Iterator2 __it2)
{ return bool(_M_comp(*__it1, *__it2)); }
};
In the first case _Compare is std::less<int>; in the second, it is bool (*)(int, int).
In the first case gcc inlines the comparison, while in the second it generates something like callq *%r13 to call through the pointer stored in _M_comp.
Update:
After more digging prompted by the comments, it turns out that the problem is not the type of _Compare (gcc 7.5 can inline small pure functions called through function pointers too, even without the inline specifier) but rather the recursion in the internal workings of std::sort. That throws the compiler off, and it generates an indirect call. The good news is that gcc 8+ seems to be free of this drawback.

Related

Type-agnostic abstraction to handle forward and reverse iterators and ranges using the same runtime interface?

By design, forward and reverse iterators and ranges are fundamentally different types. That is nice for the compile-time optimizations it allows. Sometimes, though, it would be nice to hide that type difference behind an abstraction so that both can be passed to the same run-time interface.
Are there any adapters in Boost or the STL that make this easy? (Ideally, but not strictly, C++11.)
The following code shows both the known/expected failure and the desired hypothetical:
#include <boost/range.hpp>
#include <vector>
using Ints = std::vector<int>;
void real(boost::iterator_range<Ints::iterator> range){}
void fake(boost::agnostic_range<Ints::iterator> range){} // imaginary desired
int main()
{
auto ints = Ints{1,2,3,4,5};
real(boost::make_iterator_range(ints.begin(), ints.end()));
real(boost::make_iterator_range(ints.rbegin(), ints.rend())); // Error
fake(boost::make_agnostic_range(ints.begin(), ints.end())); // imaginary
fake(boost::make_agnostic_range(ints.rbegin(), ints.rend())); // imaginary
return 0;
}
Yes! boost::any_range type-erases the iterated object type and exposes only the output type and the iterator access type.
Note that the type erasure here requires a call through a virtual function to dereference the iterator, so there is a performance cost; but as long as non-trivial operations are performed inside the loop, this cost is likely to be irrelevant.
BUG WARNING: Boost.Range had a serious bug from roughly 1.55 until release 1.74 (2020-08): items passed through any_range could be accessed after they had been destroyed, causing UB (undefined behavior, probably a crash). The code below works around this by explicitly passing the so-called reference type through the template parameters as const, which keeps some of the internal machinery from tripping over the mistake.
#include <boost/range/adaptor/type_erased.hpp>
#include <boost/range/adaptor/reversed.hpp>
#include <boost/range/any_range.hpp>
#include <vector>
#include <list>
#include <iostream>
// note const int bug workaround
using GenericBiDirIntRange =
boost::any_range<int, boost::bidirectional_traversal_tag, const int>;
void possible(GenericBiDirIntRange const &inputRange) {
for(auto item: inputRange)
std::cout << item << "\n";
}
// note const int bug workaround
using type_erased_bi =
boost::adaptors::type_erased<int, boost::bidirectional_traversal_tag, const int>;
using boost::adaptors::reversed; // reversed is an adaptor object, not a type, so an alias declaration would not compile
auto main() -> int {
auto intVec = std::vector<int>{1, 2, 3, 4};
auto intList = std::list<int>{1, 2, 3, 4};
possible(intVec | type_erased_bi());
possible(intList | reversed | type_erased_bi());
return 0;
}
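If Boost is not an option, a much cruder version of the same type-erasure idea can be sketched in standard C++11. This is a substitute technique, not a Boost API. The hypothetical `IntVisitRange` hides the iterator type behind std::function and only supports visiting the elements in order, with no iterators of its own. Like any_range, it pays for an indirect call per element.

```cpp
#include <cassert>
#include <functional>
#include <list>
#include <vector>

// Hypothetical minimal type-erased view: it stores a callable that walks [first, last)
// and hands each element to a visitor. It does not own the elements, so the
// underlying container must outlive it.
class IntVisitRange {
public:
    template <typename It>
    IntVisitRange(It first, It last)
        : visit_([first, last](const std::function<void(int)>& f) {
              for (It it = first; it != last; ++it) f(*it);
          }) {}

    void for_each(const std::function<void(int)>& f) const { visit_(f); }

private:
    std::function<void(const std::function<void(int)>&)> visit_;
};

// One non-template function serves forward and reverse ranges alike.
int sum(const IntVisitRange& r) {
    int total = 0;
    r.for_each([&total](int x) { total += x; });
    return total;
}
```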

Efficient way to return a std::vector in C++

How much data is copied when returning a std::vector from a function, and how big an optimization would it be to place the std::vector in the free store (on the heap) and return a pointer instead? That is, is:
std::vector<int> *f()
{
    std::vector<int> *result = new std::vector<int>();
    /*
    Insert elements into result
    */
    return result; // the caller becomes responsible for delete
}
more efficient than:
std::vector<int> f()
{
    std::vector<int> result;
    /*
    Insert elements into result
    */
    return result;
}
?
In C++11, this is the preferred way:
std::vector<X> f();
That is, return by value.
With C++11, std::vector has move-semantics, which means the local vector declared in your function will be moved on return and in some cases even the move can be elided by the compiler.
You should return by value.
The standard has a specific feature to improve the efficiency of returning by value. It's called "copy elision", and more specifically in this case the "named return value optimization (NRVO)".
Compilers don't have to implement it, but then again compilers don't have to implement function inlining (or perform any optimization at all). But the performance of the standard libraries can be pretty poor if compilers don't optimize, and all serious compilers implement inlining and NRVO (and other optimizations).
When NRVO is applied, there will be no copying in the following code:
std::vector<int> f() {
    std::vector<int> result;
    // ... populate the vector ...
    return result;
}
std::vector<int> myvec = f();
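One way to see what actually happens is to count copies and moves with an instrumented type. `Counter` and `make` are hypothetical names for this sketch. With a C++11 compiler the copy count stays at zero whether or not NRVO is applied, because a returned local is moved when the copy is not elided. With NRVO, which mainstream compilers typically apply to a function this simple, the move count is usually zero as well.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical instrumented type that counts how often it is copied or moved.
struct Counter {
    static int copies;
    static int moves;
    std::vector<int> data;

    Counter() = default;
    Counter(const Counter& other) : data(other.data) { ++copies; }
    Counter(Counter&& other) noexcept : data(std::move(other.data)) { ++moves; }
};
int Counter::copies = 0;
int Counter::moves = 0;

Counter make() {
    Counter result;
    result.data.assign(1000, 7);  // ... populate ...
    return result;  // NRVO if the compiler applies it, otherwise an implicit move
}
```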
But the user might want to do this:
std::vector<int> myvec;
// ... some time later ...
myvec = f();
Copy elision does not prevent a copy here because this is an assignment rather than an initialization. However, you should still return by value. In C++11, the assignment is optimized by something different, called "move semantics". In C++03, the above code does cause a copy, and although in theory an optimizer might be able to avoid it, in practice it's too difficult. So instead of myvec = f(), in C++03 you should write this:
std::vector<int> myvec;
// ... some time later ...
f().swap(myvec);
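Both forms can be sketched side by side. `refill_cpp11` and `refill_cpp03` are hypothetical names, and `f` fills the vector with placeholder data.

```cpp
#include <cassert>
#include <vector>

std::vector<int> f() {
    std::vector<int> result(1000, 1);  // placeholder for "populate the vector"
    return result;
}

// C++11: move assignment takes over the temporary's buffer; no elements are copied.
void refill_cpp11(std::vector<int>& myvec) { myvec = f(); }

// C++03 idiom: swap buffers with the temporary; myvec's old contents are released
// when the temporary is destroyed at the end of the statement.
void refill_cpp03(std::vector<int>& myvec) { f().swap(myvec); }
```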
There is another option, which is to offer a more flexible interface to the user:
template <typename OutputIterator> void f(OutputIterator it) {
    // ... write elements to the iterator like this ...
    *it++ = 0;
    *it++ = 1;
}
You can then also support the existing vector-based interface on top of that:
std::vector<int> f() {
std::vector<int> result;
f(std::back_inserter(result));
return result;
}
This might be less efficient than your existing code, if your existing code uses reserve() in a way more complex than just a fixed amount up front. But if your existing code basically calls push_back on the vector repeatedly, then this template-based code ought to be as good.
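Here is a small runnable variant of this interface, assuming a hypothetical generator `fill_squares` that writes the first n squares. It shows the same template writing into a vector, a list, or a plain array. Returning the output iterator is a design choice borrowed from std::copy, not something the answer requires. Because n is known up front here, the vector wrapper can still call reserve().

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical generator written against an output iterator, so the caller picks the destination.
// Assumes n >= 0.
template <typename OutputIterator>
OutputIterator fill_squares(OutputIterator it, int n) {
    for (int i = 0; i < n; ++i)
        *it++ = i * i;
    return it;  // lets callers keep writing after the generated elements
}

// Vector-returning convenience wrapper built on top of the generator.
std::vector<int> squares(int n) {
    std::vector<int> result;
    result.reserve(static_cast<std::vector<int>::size_type>(n));
    fill_squares(std::back_inserter(result), n);
    return result;
}
```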
It's time I posted an answer about RVO, too...
If you return an object by value, the compiler often optimizes this so the object doesn't get constructed twice, since it's superfluous to construct it in the function as a temporary and then copy it. This is called return value optimization: the object is constructed directly in the caller's storage, so it is neither copied nor moved. (When the optimization is not applied, C++11 moves a returned local instead of copying it.)
A common pre-C++11 idiom is to pass a reference to the object being filled.
Then there is no copying of the vector.
void f( std::vector<int> & result )
{
/*
Insert elements into result
*/
}
If the compiler supports the Named Return Value Optimization (http://msdn.microsoft.com/en-us/library/ms364057(v=vs.80).aspx), you can return the vector directly, provided that none of the following applies:
1) Different paths return different named objects.
2) There are multiple return paths (even if the same named object is returned on all paths) with EH states introduced.
3) The named object returned is referenced in an inline asm block.
NRVO optimizes out the redundant copy constructor and destructor calls and thus improves overall performance.
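The first restriction can be illustrated with two hypothetical functions: one returns different named objects on different paths, the other returns a single named object on every path. Whether NRVO is actually applied is up to the compiler, but the second shape is the one the optimization is designed for.

```cpp
#include <cassert>
#include <vector>

// Different named objects returned on different paths: NRVO generally cannot be
// applied, because both locals cannot live in the caller's return slot at once.
// Since C++11 the returned local is then moved, not copied.
std::vector<int> pick_two_names(bool big) {
    std::vector<int> small_v(3, 0);
    std::vector<int> big_v(1000, 0);
    if (big)
        return big_v;
    return small_v;
}

// NRVO-friendly rewrite: one named object, returned on every path.
std::vector<int> pick_one_name(bool big) {
    std::vector<int> result(big ? 1000 : 3, 0);
    return result;
}
```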
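There should be no real difference in your example. A signature that returns by value is fine:
vector<string> getseq(char * db_file);
And if you want to print the result in main(), do it in a loop (main() must declare argc/argv to use argv[1]):
int main(int argc, char* argv[]) {
    if (argc < 2) return 1; // no file name given
    vector<string> str_vec = getseq(argv[1]);
    for(vector<string>::iterator it = str_vec.begin(); it != str_vec.end(); ++it) {
        cout << *it << endl;
    }
}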
The following code works without calling the copy constructor.
Your routine:
std::vector<unsigned char> foo()
{
    std::vector<unsigned char> v;
    v.resize(16, 0);
    return v; // NRVO, or an implicit move if NRVO is not applied; return std::move(v) would inhibit NRVO
}
Afterwards, you can use the foo routine to get the vector without copying it:
std::vector<unsigned char> moved_v(foo()); // move-constructed from the temporary, or elided entirely
Result: moved_v has size 16 and is filled with 0.
As nice as "return by value" might be, it's the kind of code that can lead one into error. Consider the following program:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
static std::vector<std::string> strings;
std::vector<std::string> vecFunc(void) { return strings; };
int main(int argc, char * argv[]){
// set up the vector of strings to hold however
// many strings the user provides on the command line
for(int idx=1; (idx<argc); ++idx){
strings.push_back(argv[idx]);
}
// now, iterate the strings and print them using the vector function
// as accessor
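// BUG (deliberate): each vecFunc() call returns a different temporary copy, so begin() and end()
// belong to different vectors, and the begin() iterator dangles once its temporary is destroyed
for(std::vector<std::string>::iterator idx=vecFunc().begin(); (idx!=vecFunc().end()); ++idx){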
cout << "Addr: " << idx->c_str() << std::endl;
cout << "Val: " << *idx << std::endl;
}
return 0;
};
Q: What will happen when the above is executed? A: Undefined behavior; in practice, typically a core dump.
Q: Why didn't the compiler catch the mistake? A: Because the program is syntactically, although not semantically, correct.
Q: What happens if you modify vecFunc() to return a reference? A: The program runs to completion and produces the expected result.
Q: What is the difference? A: The compiler does not have to create and manage anonymous objects. The programmer has instructed the compiler to use exactly one object for the iterator and for endpoint determination, rather than two different objects as the broken example does.
The above erroneous program produces no warnings even with the GNU g++ reporting options -Wall -Wextra -Weffc++.
If you must produce a value, then the following works in place of calling vecFunc() twice:
std::vector<std::string> lclvec(vecFunc());
for(std::vector<std::string>::iterator idx=lclvec.begin(); (idx!=lclvec.end()); ++idx){ /* ... */ }
The above also creates no anonymous objects during the loop, but it requires a possible copy operation (which, as some note, might be optimized away under some circumstances). The reference method, however, guarantees that no copy will be produced. Believing the compiler will perform RVO is no substitute for trying to build the most efficient code you can. If you can moot the need for the compiler to do RVO, you are ahead of the game.
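Here is a sketch of the safe alternatives, reusing `strings` and `vecFunc` from the program above and adding a hypothetical reference-returning accessor `vecRef`; the `total_length_*` helpers are hypothetical too. It also includes a C++11 range-based for loop, which is safe over a returned value because the loop binds the temporary to a reference that lives for the whole loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

static std::vector<std::string> strings;

// Returns a copy each time: two calls yield two different vectors.
std::vector<std::string> vecFunc() { return strings; }

// Returns a reference: every call refers to the same object.
const std::vector<std::string>& vecRef() { return strings; }

// By-value return, made safe: take the copy once and iterate that one object.
std::size_t total_length_local_copy() {
    const std::vector<std::string> lclvec(vecFunc());
    std::size_t n = 0;
    for (std::vector<std::string>::const_iterator it = lclvec.begin(); it != lclvec.end(); ++it)
        n += it->size();
    return n;
}

// Also safe (C++11): begin() and end() are taken from the same bound temporary.
std::size_t total_length_range_for() {
    std::size_t n = 0;
    for (const std::string& s : vecFunc())
        n += s.size();
    return n;
}

// Reference return: begin() and end() from separate calls refer to the same vector.
std::size_t total_length_ref() {
    std::size_t n = 0;
    for (std::vector<std::string>::const_iterator it = vecRef().begin(); it != vecRef().end(); ++it)
        n += it->size();
    return n;
}
```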
vector<string> func1() const
{
vector<string> parts;
return vector<string>(parts.begin(),parts.end()) ;
}
This is still efficient from C++11 onwards, as the compiler automatically uses a move instead of making a copy.

Why is it bad to have a local functor?

For example, what's wrong with declaring the class doubler within the main function, if the predicate will only be used once?
#include <list>
#include <algorithm>
#define SIZE 10
int main()
{
std::list<int> myList;
for(int i=0; i<SIZE ;++i)
{
myList.push_back(i);
}
class doubler
{
public:
doubler(){}
int operator()(int a)
{
return a + a;
}
} pred;
std::for_each(myList.begin(), myList.end(), pred);
return 0;
}
The problem with this setup is that, at least in C++03, you cannot use a local functor as a template argument, because it doesn't have external linkage. Technically speaking, this means the above code isn't legal. However, C++0x removes this pretty silly restriction, and since VS2010 has rudimentary C++0x support, the above code is totally fine there.
In short, the answer to your question is that there's nothing wrong with it if you're using C++0x-compliant compilers, but otherwise you should probably refrain from doing so to maximize cross-compiler compatibility.
It is illegal before C++0x.
In C++0x, there is a better solution (lambdas/closures).
So in either case you should use a different solution.
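Here is a sketch of both options in C++11; the function names are hypothetical. The local class takes each element by reference so the doubling is actually stored. In the question's version, operator() returns the doubled value, which std::for_each simply discards, so the list is never modified.

```cpp
#include <algorithm>
#include <cassert>
#include <list>

// C++11 and later: a local class can be used as a template argument.
void double_all_local_class(std::list<int>& xs) {
    struct doubler {
        // Take the element by reference so the change is written back.
        void operator()(int& a) const { a += a; }
    };
    std::for_each(xs.begin(), xs.end(), doubler());
}

// The same operation written as a lambda.
void double_all_lambda(std::list<int>& xs) {
    std::for_each(xs.begin(), xs.end(), [](int& a) { a += a; });
}
```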

C++ sort method

I want to sort a vector using std::sort, but my comparison method is a static method of a class, and I want to call std::sort outside that class. I'm having trouble doing it this way.
On the class:
static int CompareIt(void *sol1, void *sol2) { ... }
std::sort call:
sort(distanceList.at(q).begin(),
distanceList.at(q).end(),
&DistanceNodeComparator::CompareIt);
Shouldn't it be possible to do this way?
std::sort takes a comparator that accepts values of the type held in the collection and returns bool. It should generally implement some notion of <. For example, assuming your distanceList elements are collections of integers (I assume they aren't, but for the sake of the example):
static bool CompareIt(int sol1, int sol2) { ... }
And of course you only need to supply a comparator if there isn't already a < operator that does the right thing for your scenario.
It should be a boolean method (sort uses operator<() by default to compare values).
The comparison function you've provided has the signature of the one needed by qsort, which is the sorting function that C provided before C++ came along. sort requires a completely different function.
For example if your declaration of distanceList is std::vector<DistanceNode> your function would look like:
static bool CompareIt(const DistanceNode &sol1, const DistanceNode &sol2)
{
return sol1.key < sol2.key;
}
Note that std::sort cannot be used on a std::list at all, because it requires random-access iterators; that is why list supplies its own sort member function.
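Putting the pieces together: this sketch assumes DistanceNode has an int member `key` (as in the answer above) and that CompareIt is a static member of DistanceNodeComparator (as in the question). `sort_nodes` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

// Hypothetical element type; only the 'key' member from the answer above is assumed.
struct DistanceNode {
    int key;
};

struct DistanceNodeComparator {
    // Strict weak ordering: true if sol1 should be ordered before sol2.
    static bool CompareIt(const DistanceNode& sol1, const DistanceNode& sol2) {
        return sol1.key < sol2.key;
    }
};

void sort_nodes(std::vector<DistanceNode>& v) {
    std::sort(v.begin(), v.end(), &DistanceNodeComparator::CompareIt);
}

// std::sort needs random-access iterators, so a std::list uses its member sort.
void sort_nodes(std::list<DistanceNode>& l) {
    l.sort(&DistanceNodeComparator::CompareIt);
}
```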
As others have mentioned, it needs a boolean return type. Here's an example which works:
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <vector>
using namespace std;
class MyClass
{
public:
    static bool CompareIt(const void *a1, const void *a2)
    {
        // std::less gives a total order even for unrelated pointers,
        // which the built-in < operator does not guarantee
        return std::less<const void*>()(a1, a2);
    }
};
int main()
{
    // Create a vector that contains elements of type void*
    vector<void*> myvector;
    // Add data to the vector
    myvector.push_back((void*)0x00000005);
    myvector.push_back((void*)0x00000001);
    // Sort the vector
    std::sort(myvector.begin(), myvector.end(), MyClass::CompareIt);
    // Display some results (%p is the portable format for pointers)
    for (std::size_t i = 0; i < myvector.size(); i++)
    {
        printf("%zu = %p\n", i, myvector[i]);
    }
    return 0;
}
[Edit] Updated the code above to make it a little simpler. I'm not suggesting it's nice code, but without knowing more about the OP's real implementation, it's difficult to give a better example!
First, the return type should be bool. Actually, the requirement is only that the return type be convertible to bool, which int is. But the fact that you're returning int suggests that you might have written a three-way comparator instead of the strict weak ordering required by std::sort.
Your CompareIt function takes two void* pointers as parameters. Is distanceList.at(q) a vector<void*> (or vector of something convertible to void*)? If not, then the comparator inputs aren't right either. Using void* with algorithms also suggests that you're doing something wrong, because much of the point of generic programming is that you don't need opaque pointers that later get cast back to their original type.
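If a qsort-style three-way comparator already exists, a hypothetical adapter can turn it into the bool "less than" that std::sort needs. The adapter must test for a negative result explicitly, since any nonzero int, including a "greater" result, converts to true.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// qsort-style three-way comparator: negative, zero or positive.
int compare3(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Hypothetical adapter producing the strict weak ordering std::sort expects.
bool less_from_compare3(int a, int b) {
    return compare3(&a, &b) < 0;  // only "strictly less" counts as true
}
```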

C++ STL - iterate through everything in a sequence

I have a sequence, e.g.
std::vector< Foo > someVariable;
and I want a loop which iterates through everything in it.
I could do this:
for (int i=0;i<someVariable.size();i++) {
blah(someVariable[i].x,someVariable[i].y);
woop(someVariable[i].z);
}
or I could do this:
for (std::vector< Foo >::iterator i=someVariable.begin(); i!=someVariable.end(); i++) {
blah(i->x,i->y);
woop(i->z);
}
Both these seem to involve quite a bit of repetition / excessive typing. In an ideal language I'd like to be able to do something like this:
for (i in someVariable) {
blah(i->x,i->y);
woop(i->z);
}
It seems like iterating through everything in a sequence would be an incredibly common operation. Is there a way to do it in which the code isn't twice as long as it should have to be?
You could use for_each from the standard library and pass a functor or a function to it. The solution I like is BOOST_FOREACH, which works just like foreach in other languages. C++0x is going to have one, by the way.
For example:
#include <iostream>
#include <vector>
#include <algorithm>
#include <boost/foreach.hpp>
#define foreach BOOST_FOREACH
void print(int v)
{
std::cout << v << std::endl;
}
int main()
{
std::vector<int> array;
for(int i = 0; i < 100; ++i)
{
array.push_back(i);
}
std::for_each(array.begin(), array.end(), print); // using STL
foreach(int v, array) // using Boost
{
std::cout << v << std::endl;
}
}
Not counting BOOST_FOREACH which AraK already suggested, you have the following two options in C++ today:
void function(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
std::for_each(someVariable.begin(), someVariable.end(), function);
struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());
Both require you to specify the "body" of the loop elsewhere, either as a function or as a functor (a class which overloads operator()). That might be a good thing (if you need to do the same thing in multiple loops, you only have to define the function once), but it can be a bit tedious too. The function version may be a bit less efficient, because the compiler is generally unable to inline the function call. (A function pointer is passed as the third argument, and the compiler has to do some more detailed analysis to determine which function it points to.)
The functor version is basically zero overhead. Because an object of type functor is passed to for_each, the compiler knows exactly which function to call: functor::operator(), and so it can be trivially inlined and will be just as efficient as your original loop.
C++0x will introduce lambda expressions which make a third form possible.
std::for_each(someVariable.begin(), someVariable.end(), [](Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
});
Finally, it will also introduce a range-based for loop:
for(Foo& arg : someVariable)
{
blah(arg.x, arg.y);
woop(arg.z);
}
So if you've got access to a compiler which supports subsets of C++0x, you might be able to use one or both of the last forms. Otherwise, the idiomatic solution (without using Boost) is to use for_each as in one of the first two examples.
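Here is a runnable version of the last two forms. Foo, blah() and woop() are hypothetical stand-ins for the question's types and functions; they just accumulate into a counter so the two loops can be compared.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the question's Foo, blah() and woop().
struct Foo { int x, y, z; };
int total = 0;
void blah(int x, int y) { total += x + y; }
void woop(int z) { total += z; }

// C++11 range-based for.
void visit_range_for(std::vector<Foo>& someVariable) {
    for (Foo& arg : someVariable) {
        blah(arg.x, arg.y);
        woop(arg.z);
    }
}

// C++11 lambda passed to std::for_each.
void visit_lambda(std::vector<Foo>& someVariable) {
    std::for_each(someVariable.begin(), someVariable.end(), [](Foo& arg) {
        blah(arg.x, arg.y);
        woop(arg.z);
    });
}
```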
By the way, MSVS 2008 has a "for each" C++ keyword. Look at How to: Iterate Over STL Collection with for each.
int main() {
int retval = 0;
vector<int> col(3);
col[0] = 10;
col[1] = 20;
col[2] = 30;
for each( const int& c in col )
retval += c;
cout << "retval: " << retval << endl;
}
Prefer algorithm calls to hand-written loops
There are three reasons:
1) Efficiency: Algorithms are often more efficient than the loops programmers produce
2) Correctness: Writing loops is more subject to errors than is calling algorithms.
3) Maintainability: Algorithm calls often yield code that is clearer and more
straightforward than the corresponding explicit loops.
Prefer almost every other algorithm to for_each()
There are two reasons:
for_each is extremely general, telling you nothing about what's really being done, just that you're doing something to all the items in a sequence.
A more specialized algorithm will often be simpler and more direct
Consider an example from an earlier reply:
void print(int v)
{
std::cout << v << std::endl;
}
// ...
std::for_each(array.begin(), array.end(), print); // using STL
Using std::copy instead, that whole thing turns into:
std::copy(array.begin(), array.end(), std::ostream_iterator<int>(std::cout, "\n")); // needs <iterator>
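A testable variant of the same std::copy call writes into a std::ostringstream instead of std::cout; `join_lines` is a hypothetical name. The element type in ostream_iterator<int> has to be spelled out, because it cannot be deduced from the constructor arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Copies each element to the stream, followed by a newline, and returns the text.
std::string join_lines(const std::vector<int>& array) {
    std::ostringstream out;
    std::copy(array.begin(), array.end(), std::ostream_iterator<int>(out, "\n"));
    return out.str();
}
```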
"struct functor {
void operator()(Foo& arg){
blah(arg.x, arg.y);
woop(arg.z);
}
};
std::for_each(someVariable.begin(), someVariable.end(), functor());"
I think approaches like these are often needlessly baroque for a simple problem.
do i=1,N
call blah( X(i),Y(i) )
call woop( Z(i) )
end do
is perfectly clear, even if it's 40 years old (and not C++, obviously).
If the container is always a vector (STL name), I see nothing wrong with an index and nothing wrong with calling that index an integer.
In practice, often one needs to iterate over multiple containers of the same size simultaneously and peel off a datum from each, and do something with the lot of them. In that situation, especially, why not use the index?
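Here is a sketch of that parallel-container case, with hypothetical names (`weighted_sum`, `X`, `W`) echoing the Fortran loop above. An index is a natural way to step through two same-sized vectors together; std::size_t avoids signed/unsigned comparison warnings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index-based loop that reads one element from each of two same-sized containers.
double weighted_sum(const std::vector<double>& X, const std::vector<double>& W) {
    assert(X.size() == W.size());  // the containers must line up element by element
    double s = 0.0;
    for (std::size_t i = 0; i < X.size(); ++i)
        s += X[i] * W[i];
    return s;
}
```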
As for SSS's points #2 and #3 above, I'd say that may be so for complex cases, but iterating 1...N is often as simple and clear as anything else.
If you had to explain the algorithm on the whiteboard, could you do it faster with, or without, using 'i'? I think if your meatspace explanation is clearer with the index, use it in codespace.
Save the heavy C++ firepower for the hard targets.