Loop fusion in C++ (how to help the compiler?)

I am trying to understand under what circumstances a C++ compiler is able to perform loop fusion and when it is not.
The following code measures the performance of two different ways to calculate the squared doubles (f(x) = (2*x)^2) of all values in a vector.
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>
constexpr int square( int x )
{
return x * x;
}
constexpr int times_two( int x )
{
return 2 * x;
}
// map ((^2) . (*2)) $ [1,2,3]
int manual_fusion( const std::vector<int>& xs )
{
std::vector<int> zs;
zs.reserve( xs.size() );
for ( int x : xs )
{
zs.push_back( square( times_two( x ) ) );
}
return zs[0];
}
// map (^2) . map (*2) $ [1,2,3]
int two_loops( const std::vector<int>& xs )
{
std::vector<int> ys;
ys.reserve( xs.size() );
for ( int x : xs )
{
ys.push_back( times_two( x ) );
}
std::vector<int> zs;
zs.reserve( ys.size() );
for ( int y : ys )
{
zs.push_back( square( y ) );
}
return zs[0];
}
template <typename F>
void test( F f )
{
const std::vector<int> xs( 100000000, 42 );
const auto start_time = std::chrono::high_resolution_clock::now();
const auto result = f( xs );
const auto end_time = std::chrono::high_resolution_clock::now();
const auto elapsed = end_time - start_time;
const auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
std::cout << elapsed_us / 1000 << " ms - " << result << std::endl;
}
int main()
{
test( manual_fusion );
test( two_loops );
}
The version with two loops takes about twice as much time as the version with one loop, even with -O3 for GCC and Clang.
Is there a way to allow the compiler to optimize two_loops into being as fast as manual_fusion without operating in-place in the second loop? The reason I'm asking is that I want to make chained calls to my library FunctionalPlus, like fplus::enumerate(fplus::transform(f, xs));, faster.

You can try modifying your two_loops function as follows:
int two_loops( const std::vector<int>& xs )
{
std::vector<int> zs;
zs.reserve( xs.size() );
for ( int x : xs )
{
zs.push_back( times_two( x ) );
}
for ( std::size_t i = 0; i < zs.size(); ++i )
{
zs[i] = square( zs[i] );
}
return zs[0];
}
The point is to avoid allocating memory twice and calling push_back on a second vector.

Related

C++ overload [][] for a list

I have a class Matrix with a member std::list<Element> listMatrix;. Element is a class with 3 int members: line, column, value. I store in the list the non-zero elements of a matrix by saving the line, column and value of each element. I want to overload the operator [][] so I can do something like Matrix a; a[2][3] = 5;. I know you can't overload [][] directly.
Overload Element& operator()(int, int) (and its const variant) instead, so you can write
matrix(2, 3) = 5;
If you absolutely need the [2][3] syntax, you'd need to define a proxy class so that matrix[2] returns a proxy value and proxy[3] returns the desired reference. But this approach comes with a lot of problems. The basic idea would be:
class naive_matrix_2x2
{
int data[4];
struct proxy
{
naive_matrix_2x2& matrix;
int x;
int& operator[](int y) { return matrix.data[x*2+y]; }
};
public:
proxy operator[](int x) { return {*this, x}; }
};
Full demo: https://coliru.stacked-crooked.com/a/fd053610e56692f6
A list is not a suitable container for the subscript operator, because it provides no direct access to its elements. Reaching the n-th element means advancing an iterator through the list.
So the operator will be inefficient (linear time per access).
It is better to use the standard container std::vector, which already has a subscript operator.
Nevertheless, to answer your question, the operator can be defined as follows. You can also make the operators throw an exception when an index points outside the list.
#include <cstddef>
#include <iostream>
#include <iterator>
#include <list>
struct A
{
int x, y, z;
int & operator []( size_t n )
{
return n == 0 ? x : n == 1 ? y : z;
}
const int & operator []( size_t n ) const
{
return n == 0 ? x : n == 1 ? y : z;
}
};
struct B
{
std::list<A> lst;
A & operator []( size_t n )
{
auto it = std::begin( lst );
std::advance( it, n );
return *it;
}
const A & operator []( size_t n ) const
{
auto it = std::begin( lst );
std::advance( it, n );
return *it;
}
};
int main()
{
B b = { { { 1, 2, 3 }, { 4, 5, 6 }, { 7, 8, 9 } } };
std::cout << b[0][0] << '\n';
std::cout << b[0][1] << '\n';
std::cout << b[0][2] << '\n';
b[2][1] += 20;
std::cout << b[2][1] << '\n';
}
The program output is
1
2
3
28

How do I get the value OUT of my maybe<> monad?

For educational purposes, I'm trying to implement a maybe monad in C++14. My (perhaps overly simplistic) understanding of monads is that they let you define a computation as a series of composable function calls. The Wikipedia article on monads calls them "programmable semicolons" because they let you define what happens between what would otherwise be a set of discrete function calls. The maybe monad is a monad that interrupts the computation if a failure occurs.
#include <cstdio>
template<class T>
struct maybe
{
maybe( const T& t ) : argument( t ), valid( true ) {}
maybe() : argument(), valid( false ) {}
T argument;
bool valid;
};
template<class T>
maybe<T> just( const T& t ) { return maybe<T>(t); }
template<class T>
maybe<T> nothing() { return maybe<T>(); }
auto terminal_maybe = [] ( auto term ) {
return [=] ( auto func ) {
return func( term );
};
};
auto fmap_maybe = [] ( auto f ) {
return [=] ( auto t ) {
if( t.valid ) {
try {
t.argument = f( t.argument );
printf("argument = %d\n",t.argument);
}
catch(...) {
t.valid = false;
}
}
return (t.valid) ? terminal_maybe( just( t.argument ) ) : terminal_maybe( nothing<decltype(t.argument)>() );
};
};
int main( int argc, char* argv[] )
{
auto plus_2 = [] ( auto arg ) { return arg + 2; };
auto minus_2 = [] ( auto arg ) { return arg - 2; };
maybe<int> forty = just(40);
terminal_maybe(forty)
(fmap_maybe( plus_2 ))
(fmap_maybe( plus_2 ));
printf("result = %d\n",forty.argument);
return 0;
}
As you can see I am super close! I can chain multiple calls together monadically (and I can tell from printf that my value does what I expect (increments from 40 to 42 and then from 42 to 44)). The problem is that I have no way to get the final value OUT! I tried making terminal_maybe accept a reference (auto&) and that forced me to modify fmap's return statement (to just return terminal_maybe( t ) rather than a new maybe). But it still didn't have the correct value for the final printf.
This works, but I don't know if it makes sense from an FP point of view.
auto unwrap = [](auto const &f) {
return f;
};
int main( int argc, char* argv[] )
{
auto plus_2 = [] ( auto arg ) { return arg + 2; };
auto minus_2 = [] ( auto arg ) { return arg - 2; };
maybe<int> forty = just(40);
auto const &outv = terminal_maybe(forty)
(fmap_maybe( plus_2 ))
(fmap_maybe( plus_2 ))
(unwrap);
std::printf("result = %d\n",outv.argument);
return 0;
}

Fill vector using function specified by enum

The functionality that I want is like:
std::vector<float> GetFuncVec(int N, FuncType type)
{
std::vector<float> fn(N);
float tmp = (N - 1) / 2.0;
switch (type) {
case SIN:
for (int i=0; i<N; ++i)
fn[i] = sin(M_PI * i / tmp);
break;
case SINC:
for (int i=0; i<N; ++i)
fn[i] = sin(M_PI * i / tmp) / (M_PI * i / tmp);
break;
...
}
return fn;
}
I find this unsatisfactory because there is a lot of code duplication. Looking around, I found the STL algorithm std::generate(), which fills a vector using a functor. The functor can hold a counter member that plays the role of i.
I see two potential routes. The first is to use a factory to initialize the functor. The problem with this method is that it separates the code (above, the different cases are kept nicely together). It also adds overhead, because multiple new classes are needed.
The second is to use lambda functions (which I have very little experience with). This is nice because I can define each function in a single line in the switch statement. But I don't see how I can avoid a scoping problem (the lambda function is not accessible outside the scope of the switch statement).
Is there a solution using lambda functions? What is the best option, from an efficiency viewpoint and from a readability viewpoint?
Maybe you want something like this?
#include <iostream>
#include <vector>
#include <cmath>
#include <functional>
enum Func { Sin, Sinc };
std::vector<float> f(int n, Func func)
{
std::vector<float> results(n);
float tmp = (n - 1) / 2.0;
int i;
std::function<float()> fns[] = {
[&] { return sin(M_PI * i / tmp); },
[&] { return sin(M_PI * i / tmp) / (M_PI * i / tmp); }
};
auto& fn = fns[func];
for (i=0; i<n; ++i)
results[i] = fn();
return results;
}
int main()
{
std::vector<float> x = f(10, Sin);
for (auto& v : x) std::cout << v << ' '; std::cout << '\n';
std::vector<float> y = f(10, Sinc);
for (auto& v : y) std::cout << v << ' '; std::cout << '\n';
}
Output:
0 0.642788 0.984808 0.866025 0.34202 -0.34202 -0.866025 -0.984808 -0.642788 -2.44929e-16
-nan 0.920725 0.705317 0.413497 0.122477 -0.0979816 -0.206748 -0.201519 -0.115091 -3.89817e-17
One option is to create a std::map<FuncType, std::function<float(int,float)>>. It may not be the fastest, because there is an indirection on each function call, but it is more flexible. You can't use std::generate() directly because you need the index i to calculate the result, but writing your own version is not hard:
template <typename Iterator, typename Generator, typename Index, typename... Args>
void generate_i(Iterator first, Iterator last, Generator gen, Index i, Args... args)
{
while (first != last) {
*first = gen(i, args...);
++i;
++first;
}
}
Now that we have this, we need to populate a map of functors:
using FuncTypeFunction = std::function<float(int,float)>;
using FuncTypeFunctionMap = std::map<FuncType, FuncTypeFunction>;
FuncTypeFunctionMap create_functype_map()
{
FuncTypeFunctionMap functions;
functions[SIN] = [] (int i, float tmp) {
return sin(M_PI * i / tmp);
};
functions[SINC] = [] (int i, float tmp) {
return sin(M_PI * i / tmp) / (M_PI * i / tmp);
};
// ...
return functions;
}
FuncTypeFunctionMap const FuncTypeFunctions = create_functype_map();
(If you prefer, you can use Boost.Assign to improve the readability of this part.)
And finally, we can use this map:
std::vector<float> GetFuncVec(int N, FuncType type)
{
std::vector<float> fn(N);
float tmp = (N - 1) / 2.0;
auto func = FuncTypeFunctions.find(type);
if (func != FuncTypeFunctions.end()) {
generate_i(fn.begin(), fn.end(), func->second, 0, tmp);
}
return fn;
}
Adding new functions only requires populating the map in create_functype_map(). Note that each iteration in the generate_i() loop is going to invoke the operator() on std::function, which will require a level of indirection to resolve the call, similar to the overhead of a virtual method invocation. This will cost a bit in terms of performance but may not be an issue for you.
You can write a general class to be used with the standard algorithm std::iota.
For example
#include <iostream>
#include <functional>
#include <vector>
#include <numeric>
class Value
{
public:
Value() : i( 0 ), fn( []( size_t i ) { return ( float )i; } ) {}
Value & operator ++() { ++i; return *this; }
operator float () const { return fn( i ); }
Value & operator =( std::function<float( size_t )> fn )
{
this->fn = fn;
return *this;
}
private:
size_t i;
std::function<float( size_t )> fn;
};
enum E { First, Second };
std::vector<float> f( size_t N, E e )
{
Value value;
float tmp = N / 2.0f;
switch( e )
{
case First:
value = [tmp] ( size_t i ) { return i * tmp; };
break;
case Second:
value = [tmp] ( size_t i ) { return i * tmp + tmp; };
break;
}
std::vector<float> v( N );
std::iota( v.begin(), v.end(), value );
return v;
}
int main()
{
for ( float x : f( 10, First ) ) std::cout << x << ' ';
std::cout << std::endl;
for ( float x : f( 10, Second ) ) std::cout << x << ' ';
std::cout << std::endl;
return 0;
}
The output is
0 5 10 15 20 25 30 35 40 45
5 10 15 20 25 30 35 40 45 50
Of course, you can use your own lambda expressions that call mathematical functions such as sin.

Changing all elements in vector(list, deque...) using C++11 Lambda functions

I have the following code:
#include <iostream>
#include <vector>
#include <algorithm>
int main( int argc, char* argv[] )
{
std::vector< int > obj;
obj.push_back( 10 );
obj.push_back( 20 );
obj.push_back( 30 );
std::for_each( obj.begin(), obj.end(), []( int x )
{
return x + 2;
} );
for( int &v : obj )
std::cout << v << " ";
std::cout << std::endl;
return 0;
}
The result is: 10 20 30
I want to change all elements in the vector (obj) using lambda functions from the new C++11 standard.
This is the implementation of the for_each function:
template<class InputIterator, class Function>
Function for_each(InputIterator first, InputIterator last, Function f)
{
for ( ; first!=last; ++first )
f(*first);
return f;
}
*first is passed by value, so only a copy of the element is changed. What alternative to for_each must I use to get the result 12, 22, 32?
You have to do this:
std::for_each( obj.begin(), obj.end(), [](int & x)
{ // take the argument by reference (int & x)
x += 2;
});
In your (not my) code, the return type of the lambda is deduced as int, but the return value is ignored as nobody uses it. That is why there is no return statement in my code, and the return type is deduced as void for this code.
By the way, I find the range-based for loop less verbose than std::for_each for this purpose:
for( int &v : obj ) v += 2;
You should use transform:
std::transform( obj.begin(), obj.end(), obj.begin(), []( int x )
{
return x + 2;
} );
Pass the argument by reference and modify the reference:
std::for_each( obj.begin(), obj.end(), [](int & x){ x += 2; } );
// the argument is taken by reference (int & x)
std::for_each( obj.begin(), obj.end(), []( int& x )
{
x += 2;
} );
In addition to the already existing (and perfectly correct) answers, you can also use your existing lambda function, that returns the result instead of modifying the argument, and just use std::transform instead of std::for_each:
std::transform(obj.begin(), obj.end(), obj.begin(), []( int x )
{
return x + 2;
} );

Mistake in calling std::stable_sort?

struct SimGenRequest {
int wakeup_mfm_;
double value_;
bool operator < ( const SimGenRequest & r2 ) const
{ return ( wakeup_mfm_ < r2.wakeup_mfm_ ) ; }
};
Usage:
std::stable_sort ( all_requests_.begin ( ), all_requests_.end ( ) );
works (it compiles). But
struct SimGenRequest {
int wakeup_mfm_;
double value_;
};
bool CompareByWakeTime ( const SimGenRequest & r1, const SimGenRequest & r2 ) {
return ( r1.wakeup_mfm_ < r2.wakeup_mfm_ ) ;
}
Usage:
std::stable_sort ( all_requests_.begin ( ), all_requests_.end ( ),
CompareByWakeTime );
does not work.
Any pointers?
The following is more or less your code. It compiles, and produces the expected output. To further help you, we need more information as to what isn't working.
#include <algorithm>
#include <iostream>
#include <set>
#include <vector>
struct SimGenRequest {
int wakeup_mfm_;
double value_;
SimGenRequest(int w, double v) :
wakeup_mfm_(w),
value_(v)
{ }
};
bool CompareByWakeTime ( const SimGenRequest & r1, const SimGenRequest & r2 ) {
return ( r1.wakeup_mfm_ < r2.wakeup_mfm_ ) ;
}
int main()
{
std::vector<SimGenRequest> all_requests_;
all_requests_.push_back(SimGenRequest(3, 1));
all_requests_.push_back(SimGenRequest(4, 3));
all_requests_.push_back(SimGenRequest(3, 2));
all_requests_.push_back(SimGenRequest(1, 4));
std::stable_sort(all_requests_.begin(), all_requests_.end(), CompareByWakeTime);
for(std::vector<SimGenRequest>::const_iterator i = all_requests_.begin();
i != all_requests_.end();
++i)
{
std::cout << '(' << i->wakeup_mfm_ << ", " << i->value_ << ')' << std::endl;
}
return 0;
}
By default, the STL sorting algorithms use only operator<. Otherwise you can pass any boolean comparator (a function or a functor), but you have to supply it in the stable_sort call.