Are efficient "repeatedly used intermediates" possible in C++ expression template programming? - c++

Here's one thing I haven't seen explicitly addressed in C++ expression template programming, a technique whose purpose is to avoid building unnecessary temporaries (by creating trees of "inlinable templated objects" that only get collapsed at the assignment operator). Suppose, for illustration, we're modeling 1-D sequences of values, with elementwise application of arithmetic operators like +, *, etc. Call the basic class for fully-created sequences Seq (which, for the sake of concreteness, holds a fixed-length list of doubles) and consider the following illustrative pseudo-C++ code.
void f(Seq &a,Seq &b,Seq &c,Seq &d,Seq &e){
AType t=(a+2*b)/(a+b+c); // question is about what AType can be
Seq f=d*t;
Seq g=e*e*t;
//do something with f and g
}
where there are expression templated overloads for +, etc, elsewhere. For the line defining t:
I can implement this code if I make AType be Seq, but then I've created this full intermediate variable when I don't need it (except in how it enables computation of f and g). But at least it's only calculated once.
I can also implement this making AType be the appropriate templated expression type, so that a full Seq isn't created at the commented line, but consumed chunk-by-chunk in f and g. But then the same computation involved in creating every particular chunk will be repeated in both f and g. (I suppose in theory an incredibly smart compiler might realise the same computation is being done twice and CSE-it, but I don't think any do and I wouldn't want to rely on an optimiser always being able to spot the opportunities.)
My understanding is that there's no clever code rewriting and/or usage of templates that allows each chunk of t to be calculated only once while t is still calculated chunkwise rather than all at once. Is that right?
(I can vaguely imagine AType could be some kind of object that contains both an expression template type and a cached value that gets written after it's evaluated the first time, but that doesn't seem to help with the need to synchronise the two implicit loops in the assignments to f and g.)
In googling, I have come across one Masters thesis on another subject that mentions in passing that manual "common subexpression elimination" should be avoided with expression templates, but I'd like to find a more authoritative "it's not possible" or a "here's how to do it".
The closest Stack Overflow question is "Intermediate results using expression templates", which seems to be about the type-naming issue rather than the efficiency issue in creating a full intermediate.

Since you obviously don't want to do the entire calculation twice, you have to cache it somehow. The easiest way to cache it seems to be for AType to be a Seq. You say this has the downside of creating a full intermediate variable, but that's exactly what you want in this case. That full intermediate is your cache, and cannot be trivially avoided.
If you profile the code and this is a bottleneck, then the only faster way I can think of is to write a special function that calculates f and g in parallel (in a single loop), but that'd be super-confusing, and very much not recommended.
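// Expr stands for whatever expression-template type the library produces.
// The helper has its own name so that it does not clash with the locals f and g.
void compute_f_and_g(Seq &d, Seq &e, Expr &t, Seq &f, Seq &g)
{
for(int i=0; i<d.size(); ++i) {
auto ti = t[i]; // each element of t is evaluated exactly once
f[i] = d[i]*ti;
g[i] = e[i]*e[i]*ti;
}
}
void f(Seq &a,Seq &b,Seq &c,Seq &d,Seq &e)
{
Expr t = (a+2*b)/(a+b+c);
Seq f, g;
compute_f_and_g(d, e, t, f, g);
//do something with f and g
}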
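To make the fused-loop idea concrete, here is a minimal self-contained sketch. It is not the asker's library: Seq is reduced to a thin wrapper around std::vector<double>, and Ratio is a hypothetical hand-written stand-in for the expression type that (a+2*b)/(a+b+c) would produce. The point is only that each t[i] is computed once, consumed by both outputs, and never stored as a full sequence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Seq: a plain sequence of doubles.
struct Seq {
    std::vector<double> v;
    double operator[](std::size_t i) const { return v[i]; }
    std::size_t size() const { return v.size(); }
};

// Hypothetical lazy node: yields (a + 2*b) / (a + b + c) one element at a time.
struct Ratio {
    const Seq& a;
    const Seq& b;
    const Seq& c;
    double operator[](std::size_t i) const {
        return (a[i] + 2.0 * b[i]) / (a[i] + b[i] + c[i]);
    }
};

// Fused loop: t[i] is evaluated once per index and feeds both outputs,
// so no full-length temporary for t is ever materialised.
template <typename Expr>
void fill_f_and_g(const Seq& d, const Seq& e, const Expr& t, Seq& f, Seq& g) {
    f.v.resize(d.size());
    g.v.resize(d.size());
    for (std::size_t i = 0; i < d.size(); ++i) {
        const double ti = t[i];
        f.v[i] = d[i] * ti;
        g.v[i] = e[i] * e[i] * ti;
    }
}
```

The cost of this design is the one the answer points out: the two assignments are no longer independent statements, so the code reads less like the maths it implements.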

Related

Why is C++ auto risky [duplicate]

It seems that auto was a fairly significant feature to be added in C++11 that seems to follow a lot of the newer languages. As with a language like Python, I have not seen any explicit variable declaration (I am not sure if it is possible using Python standards).
Is there a drawback to using auto to declare variables instead of explicitly declaring them?
The question is about drawbacks of auto, so this answer highlights some of those. A drawback of using a programming language feature (in this case, a facility associated with a language keyword) does not mean that feature is unacceptable, nor does it mean that feature should be avoided entirely. It means there are disadvantages along with advantages, so a decision to use auto type deduction over alternatives must consider engineering trade-offs.
When used well, auto has several advantages as well - which is not the subject of the question. The drawbacks result from ease of abuse, and from increased potential for code to behave in unintended or unexpected ways.
The main drawback is that, by using auto, you don't necessarily know the type of object being created. There are also occasions where the programmer might expect the compiler to deduce one type, but the compiler adamantly deduces another.
Given a declaration like
auto result = CallSomeFunction(x,y,z);
you don't necessarily have knowledge of what type result is. It might be an int. It might be a pointer. It might be something else. All of those support different operations. You can also dramatically change the code by a minor change like
auto result = CallSomeFunction(a,y,z);
because, depending on what overloads exist for CallSomeFunction(), the type of result might be completely different, and subsequent code may therefore behave completely differently than intended. You might suddenly trigger error messages in later code (e.g. subsequently trying to dereference an int, or trying to change something which is now const). The more sinister change is where your change sails past the compiler, but subsequent code behaves in different and unknown - possibly buggy - ways. For example (as noted by sashoalm in comments), the deduced type of a variable may change from an integral type to a floating-point type, so that subsequent code is unexpectedly and silently affected by loss of precision.
Not having explicit knowledge of the type of some variables therefore makes it harder to rigorously justify a claim that the code works as intended. This means more effort to justify claims of "fit for purpose" in high-criticality (e.g. safety-critical or mission-critical) domains.
The other, more common drawback, is the temptation for a programmer to use auto as a blunt instrument to force code to compile, rather than thinking about what the code is doing, and working to get it right.
This isn't a drawback of auto in a principled way exactly, but in practical terms it seems to be an issue for some. Basically, some people either: a) treat auto as a savior for types and shut their brain off when using it, or b) forget that auto always deduces to value types. This causes people to do things like this:
auto x = my_obj.method_that_returns_reference();
Oops, we just deep copied some object. It's often either a bug or a performance fail. Then, you can swing the other way too:
const auto& stuff = *func_that_returns_unique_ptr();
Now you get a dangling reference: the temporary unique_ptr is destroyed at the end of the statement, and the object it owned goes with it. These problems aren't caused by auto at all, so I don't consider them legitimate arguments against it. But it does seem like auto makes these issues more common (from my personal experience), for the reasons I listed at the beginning.
I think given time people will adjust, and understand the division of labor: auto deduces the underlying type, but you still want to think about reference-ness and const-ness. But it's taking a bit of time.
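A small sketch of the copy-versus-reference point, using a hypothetical Holder class (not from any answer above) whose accessor returns a reference. The only difference between the two helper functions is the & after auto.

```cpp
#include <vector>

// Hypothetical class whose accessor returns a reference to internal state.
class Holder {
public:
    std::vector<int>& items() { return items_; }
private:
    std::vector<int> items_{1, 2, 3};
};

// Plain auto deduces std::vector<int>, so this appends to a copy.
inline void append_to_copy(Holder& h) {
    auto copy = h.items();
    copy.push_back(4);
}

// auto& keeps the reference, so this appends to the vector inside Holder.
inline void append_in_place(Holder& h) {
    auto& ref = h.items();
    ref.push_back(4);
}
```

After append_to_copy the Holder still has three items; after append_in_place it has four.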
Other answers are mentioning drawbacks like "you don't really know what the type of a variable is." I'd say that this is largely related to sloppy naming convention in code. If your interfaces are clearly-named, you shouldn't need to care what the exact type is. Sure, auto result = callSomeFunction(a, b); doesn't tell you much. But auto valid = isValid(xmlFile, schema); tells you enough to use valid without having to care what its exact type is. After all, with just if (callSomeFunction(a, b)), you wouldn't know the type either. The same with any other subexpression temporary objects. So I don't consider this a real drawback of auto.
I'd say its primary drawback is that sometimes, the exact return type is not what you want to work with. In effect, sometimes the actual return type differs from the "logical" return type as an implementation/optimisation detail. Expression templates are a prime example. Let's say we have this:
SomeType operator* (const Matrix &lhs, const Vector &rhs);
Logically, we would expect SomeType to be Vector, and we definitely want to treat it as such in our code. However, it is possible that for optimisation purposes, the algebra library we're using implements expression templates, and the actual return type is this:
MultExpression<Matrix, Vector> operator* (const Matrix &lhs, const Vector &rhs);
Now, the problem is that MultExpression<Matrix, Vector> will in all likelihood store a const Matrix& and const Vector& internally; it expects that it will convert to a Vector before the end of its full-expression. If we have this code, all is well:
extern Matrix a, b, c;
extern Vector v;
void compute()
{
Vector res = a * (b * (c * v));
// do something with res
}
However, if we had used auto here, we could get in trouble:
void compute()
{
auto res = a * (b * (c * v));
// Oops! Now `res` is referring to temporaries (such as (c * v)) which no longer exist
}
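The mechanism can be sketched with a toy expression template. Everything here (Vec, Sum, sum3) is a hypothetical illustration rather than the API of any real algebra library. The static_assert shows that auto names the proxy type, while writing Vec forces evaluation before the intermediate nodes are destroyed.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical lazy sum node; it stores only references to its operands.
template <typename L, typename R>
struct Sum {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec {
    std::vector<double> data;
    Vec() = default;
    explicit Vec(std::vector<double> d) : data(std::move(d)) {}
    // Converting constructor: this is where an expression is actually evaluated.
    template <typename L, typename R>
    Vec(const Sum<L, R>& e) : data(e.size()) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    }
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }

inline Vec sum3(const Vec& a, const Vec& b, const Vec& c) {
    // auto names the proxy, not Vec. This one is safe only because it refers
    // to a and b, which outlive it; a nested expression would hold temporaries.
    auto partial = a + b;
    static_assert(!std::is_same<decltype(partial), Vec>::value,
                  "auto deduced the proxy type, not Vec");
    (void)partial;
    // Naming Vec evaluates the whole tree inside this full-expression,
    // while the intermediate Sum node still exists.
    Vec result = a + b + c;
    return result;
}
```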
It makes your code a little harder, or tedious, to read.
Imagine something like that:
auto output = doSomethingWithData(variables);
Now, to figure out the type of output, you'd have to track down the signature of the doSomethingWithData function.
One of the drawbacks is that sometimes you can't declare const_iterator with auto. You will get ordinary (non const) iterator in this example of code taken from this question:
map<string,int> usa;
//...init usa
auto city_it = usa.find("New York");
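One way around this, sketched below under the assumption that a read-only lookup is all that is needed: call find through a const reference to the map, so that overload resolution picks the const member and auto deduces const_iterator. The names CityMap and lookup are made up for the example.

```cpp
#include <map>
#include <string>
#include <type_traits>

using CityMap = std::map<std::string, int>;

// Returns the mapped value, or -1 if the city is absent.
inline int lookup(CityMap& usa, const std::string& city) {
    auto it = usa.find(city);  // non-const map: deduces CityMap::iterator
    static_assert(std::is_same<decltype(it), CityMap::iterator>::value,
                  "non-const map gives a mutable iterator");
    (void)it;

    const CityMap& view = usa;  // same map, seen through a const reference
    auto cit = view.find(city); // deduces CityMap::const_iterator
    static_assert(std::is_same<decltype(cit), CityMap::const_iterator>::value,
                  "const view gives a const_iterator");
    return cit != view.end() ? cit->second : -1;
}
```

For begin/end the standard containers also offer cbegin() and cend(), which return const iterators directly.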
Like this developer, I hate auto. Or rather, I hate how people misuse auto.
I'm of the (strong) opinion that auto is for helping you write generic code, not for reducing typing.
C++ is a language whose goal is to let you write robust code, not to minimize development time.
This is fairly obvious from many features of C++, but unfortunately a few of the newer ones like auto that reduce typing mislead people into thinking they should start being lazy with typing.
In pre-auto days, people used typedefs, which was great because typedef allowed the designer of the library to help you figure out what the return type should be, so that their library works as expected. When you use auto, you take away that control from the class's designer and instead ask the compiler to figure out what the type should be, which removes one of the most powerful C++ tools from the toolbox and risks breaking their code.
Generally, if you use auto, it should be because your code works for any reasonable type, not because you're just too lazy to write down the type that it should work with.
If you use auto as a tool to help laziness, then what happens is that you eventually start introducing subtle bugs in your program, usually caused by implicit conversions that did not happen because you used auto.
Unfortunately, these bugs are difficult to illustrate in a short example here because their brevity makes them less convincing than the actual examples that come up in a user project -- however, they occur easily in template-heavy code that expect certain implicit conversions to take place.
If you want an example, there is one here. A little note, though: before being tempted to jump and criticize the code: keep in mind that many well-known and mature libraries have been developed around such implicit conversions, and they are there because they solve problems that can be difficult if not impossible to solve otherwise. Try to figure out a better solution before criticizing them.
auto does not have drawbacks per se, and I advocate to (hand-wavily) use it everywhere in new code. It allows your code to consistently type-check, and consistently avoid silent slicing. (If B derives from A and a function returning A suddenly returns B, then auto behaves as expected to store its return value)
Although, pre-C++11 legacy code may rely on implicit conversions induced by the use of explicitly-typed variables. Changing an explicitly-typed variable to auto might change code behaviour, so you'd better be cautious.
The auto keyword simply deduces the type from the initializer. Therefore, an auto variable is not equivalent to a Python object, e.g.
# Python
a
a = 10 # OK
a = "10" # OK
a = ClassA() # OK
// C++
auto a; // Unable to deduce variable a
auto a = 10; // OK
a = "10"; // Value of const char* can't be assigned to int
a = ClassA{}; // Value of ClassA can't be assigned to int
a = 10.0; // OK, but an implicit double-to-int conversion (the compiler may warn)
Since auto is deduced during compilation, it won't have any drawback at runtime whatsoever.
Here is something no one has mentioned so far, and which I think deserves an answer of its own.
Since (even if everyone should be aware that C != C++) code written in C can easily be designed to provide a base for C++ code and therefore be designed without too much effort to be C++ compatible, this could be a requirement for design.
I know about some rules where some well defined constructs from C are invalid for C++ and vice versa. But this would simply result in broken executables and the known UB-clause applies which most times is noticed by strange loopings resulting in crashes or whatever (or even may stay undetected, but that doesn't matter here).
But auto is the first time [1] this changes!
Imagine you used auto as storage-class specifier before and transfer the code. It would not even necessarily (depending on the way it was used) "break"; it actually could silently change the behaviour of the program.
That's something one should keep in mind.
[1] At least the first time I'm aware of.
As I described in this answer auto can sometimes result in funky situations you didn't intend.
You have to explicitly write auto& to get a reference type; plain auto never deduces a reference, although it can deduce a pointer type. Omitting the & altogether can therefore cause confusion, because you end up with a copy of the referenced object instead of an actual reference.
One reason that I can think of is that you lose the opportunity to coerce the value that is returned. If your function or method returns a 64-bit long and you only want a 32-bit unsigned int, then you lose the opportunity to control that conversion.
I think auto is good when used in a localized context, where the reader can easily and obviously deduce its type, or where the type is documented with a comment or with a name that implies it. Those who don't understand how it works might use it in the wrong ways, like using it instead of a template or similar. Here are some good and bad use cases in my opinion.
void test (const int & a)
{
// b is not const
// b is not a reference
auto b = a;
// b's type is deduced by the compiler from the type of a, minus const and reference
// b is int
}
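The deduction rules in the snippet above can be checked at compile time. This is a minimal sketch with a made-up function name (bump_copy); the static_asserts document what each spelling of auto deduces from a const int& parameter.

```cpp
#include <type_traits>

inline int bump_copy(const int& a) {
    auto b = a;        // int: the reference and the top-level const are dropped
    auto& r = a;       // const int&: const belongs to the referred-to type, so it is kept
    const auto c = a;  // const int: const is added back explicitly
    static_assert(std::is_same<decltype(b), int>::value, "b is a mutable copy");
    static_assert(std::is_same<decltype(r), const int&>::value, "r is a const reference");
    static_assert(std::is_same<decltype(c), const int>::value, "c is a const copy");
    (void)r;
    (void)c;
    b += 1;  // allowed, and does not affect the caller's variable
    return b;
}
```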
Good Uses
Iterators
std::vector<boost::tuple<ClassWithLongName1,std::vector<ClassWithLongName2>,int> > v;
..
std::vector<boost::tuple<ClassWithLongName1,std::vector<ClassWithLongName2>,int> >::iterator it = v.begin();
// VS
auto vi = v.begin();
Function Pointers
int test (ClassWithLongName1 a, ClassWithLongName2 b, int c)
{
..
}
..
int (*fp)(ClassWithLongName1, ClassWithLongName2, int) = test;
// VS
auto *f = test;
Bad Uses
Data Flow
auto input = "";
..
auto output = test(input);
Function Signature
auto test (auto a, auto b, auto c)
{
..
}
Trivial Cases
for(auto i = 0; i < 100; i++)
{
..
}
Another irritating example:
for (auto i = 0; i < s.size(); ++i)
generates a warning (comparison between signed and unsigned integer expressions [-Wsign-compare]), because i is a signed int. To avoid this you need to write e.g.
for (auto i = 0U; i < s.size(); ++i)
or perhaps better:
for (auto i = 0ULL; i < s.size(); ++i)
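An alternative that avoids guessing a literal suffix is to name the container's own index type, either directly or through decltype. A minimal sketch, with a made-up helper name:

```cpp
#include <cstddef>
#include <string>

// Counts the spaces in s; the index has exactly the type that size() returns,
// so the comparison never mixes signed and unsigned operands.
inline std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (decltype(s.size()) i = 0; i < s.size(); ++i) {
        if (s[i] == ' ') ++n;
    }
    return n;
}
```

When the index itself is not needed, a range-based for loop sidesteps the question entirely.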
I'm surprised nobody has mentioned this, but suppose you are calculating the factorial of something:
#include <iostream>
using namespace std;
int main() {
auto n = 40;
auto factorial = 1;
for(int i = 1; i <=n; ++i)
{
factorial *= i;
}
cout << "Factorial of " << n << " = " << factorial <<endl;
cout << "Size of factorial: " << sizeof(factorial) << endl;
return 0;
}
This code will output this:
Factorial of 40 = 0
Size of factorial: 4
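That was definitely not the expected result. It happened because auto deduced the type of the variable factorial as int, since it was initialized with the literal 1. The product then overflows a 32-bit int, which is undefined behaviour for signed types and here happens to print 0.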
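Spelling out the intended width removes the surprise. Note that 40! does not fit in 64 bits either, so the sketch below (a hypothetical factorial helper) is only valid up to 20!, the largest factorial representable in std::uint64_t.

```cpp
#include <cstdint>

// Valid for n <= 20; larger inputs wrap around modulo 2^64.
inline std::uint64_t factorial(unsigned n) {
    std::uint64_t result = 1;  // explicit width, instead of auto deducing int from the literal 1
    for (unsigned i = 2; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```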

Eigen custom expression type (or CWise Op functor?)

consider this simple function (kind of pseudo-code):
template <typename Derived>
void f (const MatrixBase<Derived>& input1,
const MatrixBase<Derived>& input2,
const MatrixBase<Derived>& input3,
Vec6& output)
{
part1(output) = <an expr using the three inputs>;
part2(output) = <another expr using the three inputs>;
}
Where part1 and part2 are block expressions selecting the top 3 rows and bottom 3 rows of output, respectively. I know that output is always a column vector with 6 coefficients.
I would like to avoid passing the output argument. But I would also like to avoid returning an explicit Vec6, to avoid copies of temporaries.
Thus I would like to return one of those magical Eigen expression objects, which only store references to the inputs and actually do the computation only when required.
My questions:
Is it worth it? Given that it is a "small" 6-dimensional vector (for example, the docs say that with small 3-vectors the compilers are usually able to optimize away temporaries, thus there is not much point of avoiding them)
Can I implement my return expression type with a custom functor and CwiseNullaryOp, or do I really need a custom expression type?
Thanks
EDIT: so we established that the temporaries are optimized away (see comments); that answers question 1, and makes 2 not relevant.
However, for the sake of curiosity and learning a bit more about Eigen, does anyone have some hints about a viable expression type for my function? (Assuming it were worth devising one.)
Assuming Vec6 has been designed "properly" (See e.g. Rule Of Zero) then if your function f() takes the form
template <typename Derived>
Vec6 f(const MatrixBase<Derived>& input1,
const MatrixBase<Derived>& input2,
const MatrixBase<Derived>& input3)
{
Vec6 answer;
//Manipulate answer
//...
return answer;
}
then with optimization enabled on most modern compilers no copy will happen. See the Wikipedia section on Return Value Optimization here.
(To test the theory with your object and compiler, you could wrap a Vec6 in your own class, make all constructors print when they are executed, and check you are getting the number of constructor calls you expect.)
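The suggested experiment might look like the sketch below. Tracked is a hypothetical stand-in for a wrapped Vec6 (six doubles, no Eigen dependency) that counts copy and move constructions instead of printing them.

```cpp
// Hypothetical stand-in for a wrapped Vec6 that counts how it is constructed.
struct Tracked {
    static int copies;
    static int moves;
    double v[6] = {};

    Tracked() = default;
    Tracked(const Tracked& o) {
        for (int i = 0; i < 6; ++i) v[i] = o.v[i];
        ++copies;
    }
    Tracked(Tracked&& o) noexcept {
        for (int i = 0; i < 6; ++i) v[i] = o.v[i];
        ++moves;
    }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Returns a named local by value. Compilers usually elide the transfer (NRVO);
// if they do not, the move constructor is selected, never the copy constructor.
inline Tracked make_answer() {
    Tracked answer;
    answer.v[0] = 1.0;
    return answer;
}
```

Whether the move count is 0 or 1 depends on the compiler and optimisation settings, so it is worth inspecting rather than assuming; the copy count should stay at 0 either way.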

What can C++ offer as far as functional programming?

Are the following things, considered intrinsic to FP, possible in C++?
higher order functions
lambdas (closures/anonymous functions)
function signatures as types
type polymorphism (generics)
immutable data structures
algebraic data types (variants)
ad hoc data structures (tuples)
partial function applications
type inference
tail recursion
pattern matching
garbage collection
Let me start by noting that most of these are not "intrinsic", or shall we say, "required"; many of these are absent from notable functional languages, and in theory, many of these features can be used to implement the others (such as higher order functions in untyped lambda calculus).
However, let's go through these:
Closures
Closures are not necessary, and are syntactical sugar: by the process of Lambda Lifting, you can convert any closure into a function object (or even just a free function).
Named Functors (C++03)
Just to show that this isn't a problem to begin with, here's a simple way to do this without lambdas in C++03:
Isn't A Problem:
struct named_functor
{
void operator()( int val ) { std::cout << val; }
};
vector<int> v;
for_each( v.begin(), v.end(), named_functor());
Anonymous functions (C++11)
However, anonymous functions in C++11 (also called lambda functions, a name that goes back to LISP), which are implemented as function objects of a unique, compiler-generated type, provide the same usability (and are in fact referred to as closures, so yes, C++11 does have closures):
No problem:
vector<int> v;
for_each( v.begin(), v.end(), [] (int val)
{
std::cout << val;
} );
Polymorphic anonymous functions (C++14)
Even less of a problem, we don't need to care about the parameter types anymore in C++14:
Even Less Problem:
auto lammy = [] (auto val) { std::cout << val; };
vector<int> v;
for_each( v.begin(), v.end(), lammy);
forward_list<double> w;
for_each( w.begin(), w.end(), lammy);
I should note that this fully supports closure semantics, such as grabbing variables from scope, both by reference and by value, as well as being able to grab ALL variables, not merely specified ones. Lambdas are implicitly defined as function objects, providing the necessary context for these to work; usually this is done via lambda lifting.
Higher Order Functions
No problem:
std::function<void()> foo_returns_fun( void );
Is that not sufficient for you? Here's a lambda factory:
std::function<void()> foo_lambda( int foo ) { return [=] () { std::cout << foo; }; }
You can't create functions, but you can create function objects, which can be passed around as std::function the same as normal functions. So all the functionality is there; it's just up to you to put it together. I might add that much of the STL is designed around giving you reusable components with which to form ad-hoc function objects, approximating creating functions out of whole cloth.
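As a small illustration of both directions (a function that returns a function, and a function that takes functions), here is a sketch. The names make_adder and compose are invented for this example and are not part of the standard library.

```cpp
#include <functional>

// Returns a new function object that remembers n (captured by value).
std::function<int(int)> make_adder(int n)
{
    return [n](int x) { return x + n; };
}

// Takes two functions and returns their composition: compose(f, g)(x) == f(g(x)).
std::function<int(int)> compose(std::function<int(int)> f,
                                std::function<int(int)> g)
{
    return [f, g](int x) { return f(g(x)); };
}
```

For example, compose(make_adder(1), make_adder(10)) is itself a std::function<int(int)> that adds 11 to its argument.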
Partial Function Applications
No problem
std::bind fully supports this feature, and is quite adept at transformations of functions into arbitrarily different ones as well:
void f(int n1, int n2, int n3, const int& n4, int n5)
{
std::cout << n1 << ' ' << n2 << ' ' << n3 << ' ' << n4 << ' ' << n5 << '\n';
}
int n = 7;
// (_1 and _2 are from std::placeholders, and represent future
// arguments that will be passed to f1)
auto f1 = std::bind(f, _2, _1, 42, std::cref(n), n);
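A smaller sketch may make the two uses of std::bind easier to see: fixing an argument, and reordering arguments. The function names below are illustrative only.

```cpp
#include <functional>

int subtract(int a, int b) { return a - b; }

// Fix the first argument: the bound object computes subtract(10, x).
int ten_minus(int x)
{
    using namespace std::placeholders;
    auto bound = std::bind(subtract, 10, _1);
    return bound(x);
}

// Swap the argument order: flipped(a, b) computes subtract(b, a).
int flipped_subtract(int a, int b)
{
    using namespace std::placeholders;
    auto flipped = std::bind(subtract, _2, _1);
    return flipped(a, b);
}
```

Since C++11, a lambda such as [](int x) { return subtract(10, x); } expresses the same partial application and is often easier to read.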
For memoization and other partial function specialization techniques, you have to code it yourself using a wrapper:
template <typename ReturnType, typename... Args>
std::function<ReturnType (Args...)>
memoize(ReturnType (*func) (Args...))
{
auto cache = std::make_shared<std::map<std::tuple<Args...>, ReturnType>>();
return ([=](Args... args) mutable
{
std::tuple<Args...> t(args...);
if (cache->find(t) == cache->end())
(*cache)[t] = func(args...);
return (*cache)[t];
});
}
It can be done, and in fact it can be done relatively automatically, but no one has yet done it for you.
Combinators
No problem:
Let's start with the classics: map, filter, fold.
vector<int> startvec(100,5);
vector<int> endvec(100,1);
// map startvec through negate
std::transform(startvec.begin(), startvec.end(), endvec.begin(), std::negate<int>());
// fold startvec through add
int sum = std::accumulate(startvec.begin(), startvec.end(), 0, std::plus<int>());
// filter startvec, copying only the non-zero elements
std::copy_if (startvec.begin(), startvec.end(), endvec.begin(), [](int i){return !(i==0);} );
These are quite simple, but the headers <functional>, <algorithm>, and <numeric> provide dozens of functors (objects callable as functions) which can be placed into these generic algorithms, as well as other generic algorithms. Together, these form a powerful ability to compose features and behavior.
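Wrapped into functions that return fresh containers, the same three building blocks look closer to their functional counterparts. This is a sketch with made-up function names; it uses std::back_inserter so the output does not have to be pre-sized.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <numeric>
#include <vector>

// map: apply a function to every element, producing a new vector.
std::vector<int> negate_all(const std::vector<int>& in)
{
    std::vector<int> out;
    out.reserve(in.size());
    std::transform(in.begin(), in.end(), std::back_inserter(out),
                   std::negate<int>());
    return out;
}

// filter: keep only the elements that satisfy the predicate.
std::vector<int> drop_zeros(const std::vector<int>& in)
{
    std::vector<int> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [](int i) { return i != 0; });
    return out;
}

// fold: reduce the whole sequence to a single value.
int sum_of(const std::vector<int>& in)
{
    return std::accumulate(in.begin(), in.end(), 0, std::plus<int>());
}
```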
Let's try something more functional though: SKI can easily be implemented, and is very functional, deriving from untyped lambda calculus:
template < typename T >
T I(T arg)
{
return arg;
}
template < typename T >
std::function<T(void*)> K(T arg)
{
return [=](void*) -> T { return arg; };
}
template < typename T >
T S(T arg1, T arg2, T arg3)
{
return arg1(arg3)(arg2(arg3)); // S x y z = x z (y z)
}
These are very fragile; in effect, these must be of a type which returns its own type and takes a single argument of its own type; such constraints would then allow for all the functional reasoning of the SKI system to be applied safely to the composition of these. With a little work, and some template metaprogramming, much of this could even be done at compile time through the magic of expression templates to form highly optimized code.
Expression templates, as an aside, are a technique in which an expression, usually in the form of a series of operations or sequential order of code, is passed as an argument to a template. Expression templates therefore are compile time combinators; they are highly efficient, type safe, and effectively allow for domain specific languages to be embedded directly into C++. While these are high level topics, they are put to good use in the standard library and in boost::spirit, as shown below.
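To make the idea concrete, here is a deliberately tiny sketch of the technique, not taken from any library: the types Vec and Sum are invented for this example. operator+ only builds a typed tree of references, and the loop runs once, when the tree is converted to a Vec.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Node representing "l + r"; it stores references and computes nothing yet.
template <typename L, typename R>
struct Sum
{
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

// A concrete sequence that owns its data.
struct Vec
{
    std::vector<double> data;

    explicit Vec(std::vector<double> d) : data(std::move(d)) {}

    // Converting from an expression is where the single loop runs.
    template <typename L, typename R>
    Vec(const Sum<L, R>& e) : data(e.size())
    {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = e[i];
    }

    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// The operators only build the expression tree; its shape is encoded in the type.
inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }
```

With this in place, Vec d = a + b + c; evaluates a[i] + b[i] + c[i] in one pass without creating a temporary Vec for a + b. Note that the nodes hold references, so the expression has to be consumed before its operands go away.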
Spirit Parser Combinators
template <typename Iterator>
bool parse_numbers(Iterator first, Iterator last)
{
using qi::double_;
using qi::char_;
using qi::phrase_parse;
using ascii::space;
bool r = phrase_parse(
first,
last,
double_ >> *(char_(',') >> double_), // a double followed by zero or more ",double"
space
);
if (first != last) // fail if we did not get a full match
return false;
return r;
}
This identifies a comma-delimited list of numbers. double_ and char_ are individual parsers that identify a single double or a single char, respectively. Using the >> operator, each one passes itself to the next, forming a single large combined parser. They pass themselves via templates, the "expression" of their combined action building up. This is exactly analogous to traditional combinators, and is fully compile time checked.
Valarray
valarray, a part of the C++ standard library, is allowed to use expression templates (but is not required to, for some odd reason) in order to facilitate efficiency of transforms. In theory, any number of operations could be strung together, which would form quite a large messy expression which can then be aggressively inlined for speed. This is another form of combinator.
I suggest this resource if you wish to know more about expression templates; they are absolutely fantastic at getting all the compile time checks you wish done, as well as improving the re-usability of code. They are hard to program, however, which is why I would advise you find a library that contains the idioms you want instead of rolling your own.
Function Signatures As Types
No problem
void my_int_func(int x)
{
printf( "%d\n", x );
}
void (*foo)(int) = &my_int_func;
or, in C++, we'd use std::function:
std::function<void(int)> func_ptr = &my_int_func;
Type Inference
No problem
Simple variables typed by inference:
// var is int, inferred via constant
auto var = 10;
// y is int, inferred via var
decltype(var) y = var;
Generic type inference in templates:
template < typename T, typename S >
auto multiply (const T& t, const S& s) -> decltype( t * s )
{
return t * s;
}
Furthermore, this can be used in lambdas, function objects, basically any compile time expression can make use of decltype for compile time type inference.
But that's not what you are really after here, are you? You want type deduction as well as type restriction, you want type reconstruction and type derivations. All of this can be done with concepts, but they are not part of the language yet.
So, why don't we just implement them? boost::concepts, boost::typeerasure, and type traits (descendant from boost::tti and boost::typetraits) can do all of this.
Want to restrict a function based on some type? std::enable_if to the rescue!
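A minimal sketch of what that looks like in practice follows. The names is_even and can_call_is_even are made up for this example; only std::enable_if, std::is_integral and std::declval come from the standard library.

```cpp
#include <type_traits>
#include <utility>

// Only participates in overload resolution when T is an integral type.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, bool>::type
is_even(T value)
{
    return value % 2 == 0;
}

// Hypothetical detector: true when is_even(T) is a valid call.
template <typename T, typename = void>
struct can_call_is_even : std::false_type {};

template <typename T>
struct can_call_is_even<T, decltype(void(is_even(std::declval<T>())))>
    : std::true_type {};

static_assert(can_call_is_even<int>::value, "integers are accepted");
static_assert(!can_call_is_even<double>::value, "doubles are rejected");
```

Calling is_even(2.5) is a compile-time error here, because the only candidate is removed from the overload set.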
Ah, but that's ad hoc right? That would mean for any new type you'd want to construct, you'd need to do boilerplate, etc etc. Well, no, but here's a better way!
template<typename RanIter>
BOOST_CONCEPT_REQUIRES(
((Mutable_RandomAccessIterator<RanIter>))
((LessThanComparable<typename Mutable_RandomAccessIterator<RanIter>::value_type>)),
(void)) // return type
stable_sort(RanIter,RanIter);
Now your stable_sort can only work on types that match your stringent requirements. boost::concept has tons of prebuilt ones, you just need to put them in the right place.
If you want to call different functions or do different things off types, or disallow types, use type traits, it's now standard. Need to select based on parts of the type, rather than the full type? Or allow many different types, which have a common interface, to be only a single type with that same interface? Well then you need type erasure, illustrated below:
Type Polymorphism
No problem
Templates, for compile time type polymorphism:
std::vector<int> intvector;
std::vector<float> floatvector;
...
Type erasure, for run time and adaptor based type polymorphism:
boost::any can_contain_any_type;
std::function can_call_any_function;
any_iterator can_iterator_any_container;
...
Type erasure is possible in any OO language, and involves setting up small function objects which derive from a common interface, and translate internal objects to it. With a little boost MPL boilerplate, this is fast, easy, and effective. Expect to see this become real popular soon.
Immutable Datastructures
No syntax for explicit construction, but possible:
Can be done via not using mutators or template metaprogramming. As this is a lot of code (a full ADT can be quite large), I will link you here, to show how to make an immutable singly linked list.
To do this at compile time would require a good amount of template magic, but can be done more easily with constexpr. This is an exercise for the reader; I don't know of any compile time libraries for this off the top of my head.
However, making an immutable datastructure from the STL is quite easy:
const vector<int> myvector;
There you are; a data structure that cannot be changed! In all seriousness, finger tree implementations do exist and are probably your best bet for associative array functionality. It's just not done for you by default.
Algebraic data types
No problem:
The amazing boost::mpl allows you to constrain uses of types, which, along with boost::fusion and boost::functional, lets you do anything at compile time that you would want with regard to ADTs. In fact, most of it is done for you:
#include <boost/mpl/void.hpp>
//A := 1
typedef boost::mpl::void_ A;
As stated earlier, a lot of the work isn't done for you in a single place; for example, you'd need to use boost::optional to get optional types, and mpl to get unit type, as seen above. But using relatively simple compile time template mechanics, you can do recursive ADT types, which means you can implement generalized ADT's. As the template system is turing complete, you have a turing complete type checker and ADT generator at your disposal.
It's just waiting for you to bring the pieces together.
Variant based ADT's
boost::variant provides type checked unions, in addition to the original unions in the language. These can be used with no fuss, drop in:
boost::variant< int, std::string > v;
This variant, which can be int or string, can be assigned either way with checking, and you can even do run time variant based visitation:
class times_two_visitor
: public boost::static_visitor<>
{
public:
void operator()(int & i) const
{
i *= 2;
}
void operator()(std::string & str) const
{
str += str;
}
};
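For comparison, the same visitor can be written against std::variant and std::visit, which were added in C++17. This is a swapped-in standard-library equivalent, not the boost::variant API shown above; the names TimesTwo and times_two are invented for the sketch.

```cpp
#include <string>
#include <variant>

// One overload per alternative, as in the boost::static_visitor version.
struct TimesTwo
{
    void operator()(int& i) const { i *= 2; }
    void operator()(std::string& s) const { s += s; }
};

// Applies the visitor to whichever alternative the variant currently holds.
inline void times_two(std::variant<int, std::string>& v)
{
    std::visit(TimesTwo{}, v);
}
```

If an alternative is added to the variant without a matching overload in the visitor, the call to std::visit fails to compile, which is the exhaustiveness check that pattern matching gives you in functional languages.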
Anonymous/Ad-hoc data structures
No problem:
Of course we have tuples! You could use structs if you like, or:
std::tuple<int,char> foo (10,'x');
You can also perform a good deal of operations on tuples:
// Make them
auto mytuple = std::make_tuple(3.14,"pi");
std::pair<int,char> mypair (10,'a');
// Concatenate them
auto mycat = std::tuple_cat ( mytuple, std::tuple<int,char>(mypair) );
// Unpack them
int a, b;
std::tie (a, std::ignore, b, std::ignore) = mycat;
Tail Recursion
No explicit support, iteration is sufficient
This is not supported or mandated in Common LISP, though it is in Scheme, and therefore I don't know if you can say it's required. However, you can easily do tail recursion in C++:
std::size_t get_a_zero(vector<int>& myints, std::size_t a ) {
if ( myints.at(a) == 0 ) {
return a;
}
if(a == 0) return myints.size() + 1;
return get_a_zero(myints, a - 1 ); // tail recursion
}
Oh, and GCC will compile this into an iterative loop, no harm no foul. While this behavior is not mandated, it is allowable and is done in at least one case I know of (possibly Clang as well).
But we don't need tail recursion: C++ totally is fine with mutations:
std::size_t get_a_zero(vector<int>& myints, std::size_t a ) {
for(std::size_t i = 0; i < myints.size(); ++i){
if(myints.at(i) == 0) return i;
}
return myints.size() + 1;
}
Tail recursion is optimized into iteration, so you have exactly as much power.
Furthermore, through the usage of boost::coroutine, one can easily provide usage for user defined stacks and allow for unbounded recursion, making tail recursion unnecessary. The language is not actively hostile to recursion nor to tail recursion; it merely demands you provide the safety yourself.
Pattern Matching
No problem:
This can easily be done via boost::variant, as detailed elsewhere in this, via the visitor pattern:
class Match : public boost::static_visitor<> {
public:
Match();//I'm leaving this part out for brevity!
void operator()(const int& _value) const {
boost::unordered_map<int,boost::function<void(void)> >::const_iterator operand
= m_IntMatch.find(_value);
if(operand != m_IntMatch.end()){
(operand->second)(); // call the handler stored for this value
}
else{
defaultCase();
}
}
private:
void defaultCase() const { std::cout << "Hey, what the..." << std::endl; }
boost::unordered_map<int,boost::function<void(void)> > m_IntMatch;
};
This example, from this very charming website shows how to gain all the power of Scala pattern matching, merely using boost::variant. There is more boilerplate, but with a nice template and macro library, much of that would go away.
In fact, here is a library that has done all that for you:
#include <utility>
#include "match.hpp" // Support for Match statement
typedef std::pair<double,double> loc;
// An Algebraic Data Type implemented through inheritance
struct Shape
{
virtual ~Shape() {}
};
struct Circle : Shape
{
Circle(const loc& c, const double& r) : center(c), radius(r) {}
loc center;
double radius;
};
struct Square : Shape
{
Square(const loc& c, const double& s) : upper_left(c), side(s) {}
loc upper_left;
double side;
};
struct Triangle : Shape
{
Triangle(const loc& a, const loc& b, const loc& c) : first(a), second(b), third(c) {}
loc first;
loc second;
loc third;
};
loc point_within(const Shape* shape)
{
Match(shape)
{
Case(Circle) return matched->center;
Case(Square) return matched->upper_left;
Case(Triangle) return matched->first;
Otherwise() return loc(0,0);
}
EndMatch
}
int main()
{
point_within(new Triangle(loc(0,0),loc(1,0),loc(0,1)));
point_within(new Square(loc(1,0),1));
point_within(new Circle(loc(0,0),1));
}
As provided by this lovely stackoverflow answer
As you can see, it is not merely possible but also pretty.
Garbage Collection
Future standard, allocators, RAII, and shared_ptr are sufficient
While C++ does not have a GC, there is a proposal for one that was voted down in C++11 but may be included in C++1y. There are a wide variety of user-defined ones you can use, but C++ does not need garbage collection.
C++ has an idiom known as RAII to deal with resources and memory; for this reason, C++ has no need for a GC as it does not produce garbage; everything is cleaned up promptly and in the correct order by default. This does introduce the problem of who owns what, but this is largely solved in C++11 via shared pointers, weak pointers, and unique pointers:
// One shared pointer to some shared resource
std::shared_ptr<int> my_int (new int);
// Now we both own it!
std::shared_ptr<int> shared_int(my_int);
// I can use this int, but I cannot prevent its destruction
std::weak_ptr<int> weak_int (shared_int);
// Only I can ever own this int
std::unique_ptr<int> unique_int (new int);
These allow you to provide a much more deterministic and user controlled form of garbage collection, that does not invoke any stop the world behavior.
That not easy enough for you? Use a custom allocator, such as boost::pool or roll your own; it's relatively easy to use a pool or arena based allocator to get the best of both worlds: you can easily allocate as freely as you like, then simply delete the pool or arena when you are done. No fuss, no muss, and no stopping the world.
However, in modern C++11 design, you would almost never use new anyway except when allocating into a *_ptr, so the wish for a GC is not necessary anyway.
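The deterministic cleanup can be observed directly. In the sketch below, Tracked is a made-up type that counts its live instances; everything created inside the function is destroyed when the scope ends, without any collector running.

```cpp
#include <memory>

// Hypothetical resource that records how many instances are currently alive.
struct Tracked
{
    static int& alive() { static int count = 0; return count; }
    Tracked() { ++alive(); }
    ~Tracked() { --alive(); }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
};

// Returns the number of live objects seen inside the scope. All of them are
// destroyed, in reverse order of declaration, before the caller continues.
inline int live_inside_scope()
{
    std::unique_ptr<Tracked> sole(new Tracked);                    // single owner
    std::shared_ptr<Tracked> first = std::make_shared<Tracked>();  // shared owner
    std::shared_ptr<Tracked> second = first;                       // same object, second owner
    return Tracked::alive();                                       // two objects alive here
}
```

After live_inside_scope() returns, Tracked::alive() is back to zero: the unique_ptr releases its object, and the shared object is destroyed when the last shared_ptr goes away.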
In Summary
C++ has plenty of functional language features, and all of the ones you listed can be done, with the same power and expression ability of Haskell or Lisp. However, most of these features are not built in by default; this is changing, with the introduction of lambdas (which fill in the functional parts of the STL), and with the absorption of boost into the standard language.
Not all of these idioms are the most palatable, but none of them are particularly onerous to me, or unamenable to a few macros to make them easier to swallow. But anyone who says they are not possible has not done their research, and would seem to me to have limited experience with actual C++ programming.
From your list, C++ can do:
function signatures as types
type polymorphism (but not first-class like in many functional languages)
immutable data structures (but they require more work)
It can do only very limited forms of:
higher order functions / closures (basically, without GC most of the more interesting higher-order functional idioms are unusable)
adhoc data structures (if you mean in the form of light-weight structural types)
You can essentially forget about:
algebraic data types & pattern matching
partial function applications (requires implicit closures in general)
type inference (despite what people call "type inference" in C++ land, it's a far cry from what you get with Hindley/Milner a la ML or Haskell)
tail calls (some compilers can optimise some limited cases of tail self-recursion, but there is no guarantee, and the language is actively hostile to the general case (pointers to the stack, destructors, and all that))
garbage collection (you can use Boehm's conservative collector, but it's no real substitute and rather unlikely to coexist peacefully with third-party code)
Overall, trying to do anything functional that goes beyond trivialities will be either a major pain in C++ or outright unusable. And even the things that are easy enough often require so much boilerplate and heavy notation that they are not very attractive. (Some C++ aficionados like to claim the opposite, but frankly, most of them seem to have rather limited experience with actual functional programming.)
(Just to add a little to Alice's answer, which is excellent.)
I'm far from a functional programming expert, but the compile-time template metaprogramming language in C++ is often seen as being "functional", albeit with a very arcane syntax. In this language, "functions" become (often recursive) class template instantiations. Partial specialisation serves the purpose of pattern matching, to terminate recursion and so on. So a compile-time factorial might look something like so:
template <int I>
struct fact
{
static const int value = I * fact<I-1>::value;
};
template <>
struct fact<1>
{
static const int value = 1;
};
Of course, this is pretty hideous, but many people (particularly the Boost developers) have done incredibly clever and complex things with just these tools.
It's possibly also worth mentioning the C++11 keyword constexpr, which denotes functions which may be evaluated at compile time. In C++11, constexpr functions are restricted to (basically) just a bare return statement; but the ternary operator and recursion are allowed, so the above compile-time factorial can be restated much more succinctly (and understandably) as:
constexpr int fact(int i)
{
return i == 1 ? 1 : i * fact(i-1);
}
with the added benefit that fact() can now be called at run-time too. Whether this constitutes programming in a functional style is left for the reader to decide :-)
(C++14 looks likely to remove many of the restrictions from constexpr functions, allowing a very large subset of C++ to be called at compile-time)
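C++14 did relax those rules, so a constexpr function may contain local variables and loops. A sketch of the same factorial in that style (it needs a C++14 or later compiler; the unsigned long long return type is a choice made here, not part of the original answer):

```cpp
// C++14 relaxed constexpr: local variables and loops are allowed.
constexpr unsigned long long fact(unsigned n)
{
    unsigned long long result = 1;
    for (unsigned i = 2; i <= n; ++i)
        result *= i;
    return result;
}

// Evaluated by the compiler; the build fails if either value is wrong.
static_assert(fact(0) == 1, "0! == 1");
static_assert(fact(5) == 120, "5! == 120");
```

Unlike the recursive versions above, this one also handles 0 and reads like ordinary imperative code, which is rather the opposite of a functional style.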
On a funny note, if there's a <functional> standard header, that means that there's at least some substantial support for functional programming.
Indeed, a great and important part of the C++ language is, in fact, template meta-programming, which is a powerful tool when one needs to write generic code. But TMP is compile-time and, most importantly, is about type computation. And types can't be changed, so once you "declare a variable holding a type", it will not hold any other type (more on the matter here); it's immutable, so you have to think in terms of functional programming principles to work with and to understand TMP. To cite Louis Dionne (from the intro to his Boost.Hana's documentation),
Programming with heterogeneous objects is inherently functional – since it is impossible to modify the type of an object, a new object must be introduced instead, which rules out mutation. Unlike previous metaprogramming libraries whose design was modeled on the STL, Hana uses a functional style of programming which is the source for a good portion of its expressiveness. However, as a result, many concepts presented in the reference will be unfamiliar to C++ programmers without a knowledge of functional programming. The reference attempts to make these concepts approachable by using intuition whenever possible, but bear in mind that the highest rewards are usually the fruit of some effort.
With reference to the list in the question, I would suggest reading Why Functional Programming Matters, which highlights that the truly fundamental features of such a programming paradigm are mainly 2:
higher order functions,
lazy evaluation.
And C++ gives you both. At least today:
That C++ has higher-order functions has not been a secret for a long time. Most if not all <algorithm>s accept a function or function object to customize their behavior, so algorithms are higher-order functions. Some "standard" function objects you might want to pass to higher-order functions are defined in <functional>, and with the help of lambdas you can write as many and as varied as you want.
As stated in a comment, you can do all you want with a Turing-complete language, and C++ offers tools to make lazy evaluation possible with human-level efforts (no, I'm not saying I'd be able to do it). A library which leverages a lot of C++ power to enable lazy evaluation is Range-v3 (which C++20's <ranges> is just a small part of). To give a silly example, if you were to execute
somelist = join $ map (take 1) $ chunk 2 $ drop 10 $ [0..] in Haskell
you'd have in somelist a proxy for an infinite list that would materialize to [10,12,14,16,…] if you were to try traversing it. Similarly, with Range-v3 you could do the same thing by writing something very similar, such as auto somelist = iota(0) | drop(10) | chunk(2) | transform(take(1)) | join; (working code for a similar example is here), where the differences are minimal, if you think about it.
Furthermore, I would suggest referring to Ivan Čukić's Functional Programming in C++ for some practical examples of how you can do functional programming in C++.
And since I mentioned it, I would strongly suggest to read QuickStart of Louis Dionne's Boost.Hana (I'll make some reference to specific bits of the doc in the rest of the answer).
Now, some comments on some of the points in the list.
higher order functions
I'd say C++ has had this since… the '90s? Having higher-order functions in a language simply means that functions are first class or, in other words, that they can be passed to and returned by other function calls. Now, strictly speaking, C++ functions proper are not like that: you can't pass a function to another one, only a pointer to it, which in many scenarios works the same but is still a different thing. On the other hand, C++ has operator overloading, which allows you to write a struct+operator(), and an object of that class can be passed around and behaves just like a function. So yes, C++ has had higher-order functions for a long time; at least since operator overloading was introduced (1985, apparently).
lambdas (closures/anonymous functions)
Lambdas were introduced in C++11, but they have become more powerful with each standard. To give some examples, C++14 introduced generic lambdas, C++17 made stateless lambdas constexprable, and C++20 allowed an explicit list of template parameters. They obviously are more restricted than hand-written struct+operator()s, but as far as functional programming is concerned, they are just good. Personally, I only see them come short pre-C++20 because you can't make them accept all types satisfying a concept: you either have [](the type){} or [](auto){}. With C++20 you can have []<SomeConcept T>(T){}, so I don't know why I'd ever want to write a struct+operator().
immutable data structures
Well, I would say that mutating data structures is a choice, more than a tool. I'm happy I can mutate things if I want to, but I can still write code by adhering to functional programming principles.
partial function applications
As soon as you can pass functions around, you can write higher-order functions to curry or partially apply functions. I think there's an example in the book I mentioned above, but more practically, you can just make use of Boost.Hana's abstractions. It offers boost::hana::partial to partially apply a function, satisfying partial(f, x...)(y...) == f(x..., y...); but also reverse_partial, which satisfies reverse_partial(f, x...)(y...) == f(y..., x...). In reality, it offers quite a few combinators which are common to the functional programming language par excellence, Haskell, and which I list below¹.
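To show that nothing magical is involved, here is a hand-rolled sketch of the two laws using C++14 generic lambdas. It is not Boost.Hana's implementation, and it is simplified to fix a single argument x rather than a pack x...; the helper difference is invented for the example.

```cpp
#include <utility>

// partial(f, x)(ys...) == f(x, ys...)
template <typename F, typename X>
auto partial(F f, X x)
{
    return [f, x](auto&&... ys) {
        return f(x, std::forward<decltype(ys)>(ys)...);
    };
}

// reverse_partial(f, x)(ys...) == f(ys..., x)
template <typename F, typename X>
auto reverse_partial(F f, X x)
{
    return [f, x](auto&&... ys) {
        return f(std::forward<decltype(ys)>(ys)..., x);
    };
}

// Hypothetical non-commutative function, so argument order is visible.
inline int difference(int a, int b) { return a - b; }
```

Here partial(difference, 10)(3) computes difference(10, 3), while reverse_partial(difference, 10)(3) computes difference(3, 10).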
tail recursion
I suspect this is more about how good compilers can be at understanding your code and producing the most appropriate binary.
pattern matching
Not there yet, but this talk by Herb Sutter is a "must watch"!
garbage collection
C++11 introduced std::unique_ptr, std::shared_ptr, std::weak_ptr, which have (all?) improved over time. They all together provide what you need to have a deterministic garbage collector in C++.
(¹) Here are some of the combinators offered by Boost.Hana.
flip, satisfying flip(f)(x, y, z...) == f(y, x, z...) and, if you are familiar with Haskell, corresponding to Haskell's namesake,
id, which corresponds to C++20 std::identity and to Haskell's namesake
on, which satisfies on(f, g)(x...) == f(g(x)...) and corresponds to Haskell's Data.Function.on, but is actually more general!
compose, which corresponds to Haskell's namesake
always, which corresponds to Haskell's const
demux, which I don't dare explaining in words, but which obeys demux(f)(g...)(x...) == f(g(x...)...)
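A few of these laws can be checked with hand-written equivalents. The sketch below is not Boost.Hana code: the functions are written from scratch with C++14 generic lambdas and are simplified (flip is binary only, compose takes exactly two functions).

```cpp
#include <utility>

// flip(f)(x, y) == f(y, x)
template <typename F>
auto flip(F f)
{
    return [f](auto&& x, auto&& y) {
        return f(std::forward<decltype(y)>(y), std::forward<decltype(x)>(x));
    };
}

// compose(f, g)(x) == f(g(x))
template <typename F, typename G>
auto compose(F f, G g)
{
    return [f, g](auto&& x) { return f(g(std::forward<decltype(x)>(x))); };
}

// always(v)(anything...) == v, like Haskell's const
template <typename T>
auto always(T v)
{
    return [v](auto&&...) { return v; };
}
```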

Expression templates: improving performance in evaluating expressions?

By the expression templates technique, a matrix expression like
D = A*B+sin(C)+3.;
is pretty much equivalent, in terms of computing performance, to a hand-written for loop.
Now, suppose that I have the following two expressions
D = A*B+sin(C)+3.;
F = D*E;
cout << F << "\n";
In a "classical" implementation by expression templates, the computing performance will be pretty much the same as that of two for loops in sequence. This is because the expressions are evaluated immediately after the = operators are encountered.
My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused and that the values of interest are the sole elements of F, so that only the expression
F = E*(A*B+sin(C)+3.);
is evaluated and the whole performance is equivalent to that of a single for loop?
Of course, such an hypothetical technique should also be able to return back to evaluate the expression
D = A*B+sin(C)+3.;
if later in the code the values of D are needed.
Thank you in advance for any help.
EDIT: Results experimenting the solution suggested by Evgeny
Original instruction:
Result D=A*B-sin(C)+3.;
Computing time: 32ms
Two steps instruction:
Result Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;
Computing time: 43ms
Solution with auto:
auto&& Intermediate=A*B;
Result D=Intermediate-sin(C)+3.;
Computing time: 32ms.
In conclusion, auto&& restored the original computing time of the single-instruction case.
EDIT: Summarizing relevant links, following the suggestions by Evgeny
Copy Elision
What does auto tell us
Universal References in C++11
C++ Rvalue References Explained
C++ and Beyond 2012: Scott Meyers - Universal References in C++11
Evaluation of an expression template typically happens when you save the result to some special type, like:
Result D = A*B+sin(C)+3.;
Result type of expression:
A*B+sin(C)+3.
is not Result, but something that is convertible to Result. Evaluation happens during that conversion.
My question is: is there any technique (for example, using placeholders?) to recognize that the values of D are actually unused
This kind of "transformation":
Result D = A*B+sin(C)+3.;
Result F = D*E;
to
Result F = (A*B+sin(C)+3.)*E;
is possible when you do not evaluate D. To do this, you should typically capture D as its real expression type. For instance, with the help of auto:
auto &&D = A*B+sin(C)+3.;
Result F = D*E;
However, you should be careful: sometimes an expression template captures references to its operands, and you may have some rvalue that expires at the end of its expression:
auto &&D = A*get_large_rvalue();
// At this point, the result of get_large_rvalue is destroyed
// and D holds an expired (dangling) reference
Result F = D*E;
Where get_large_rvalue is:
LargeMatrix get_large_rvalue();
Its result is an rvalue; it expires at the end of the full expression in which get_large_rvalue was called. If something within the expression stores a pointer/reference to it (for later evaluation) and you "defer" evaluation, the pointer/reference will outlive the pointed-to/referenced object.
In order to prevent this, you should do:
auto &&intermediate = get_large_rvalue(); // it would live till the end of scope
auto &&D = A*intermediate ;
Result F = D*E;
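The following sketch puts the safe pattern into compilable form. It is only an illustration under assumptions: Seq, Product, evaluate and make_seq are invented stand-ins (make_seq plays the role of get_large_rvalue), and the expression nodes are assumed to hold their operands by const reference, which is the case that makes lifetime matter.

```cpp
#include <cstddef>
#include <vector>

struct Seq
{
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// Lazy elementwise product; holds references, so operands must outlive it.
template <typename L, typename R>
struct Product
{
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] * r[i]; }
    std::size_t size() const { return l.size(); }
};

inline Product<Seq, Seq> operator*(const Seq& l, const Seq& r) { return {l, r}; }

template <typename L, typename R>
Product<Product<L, R>, Seq> operator*(const Product<L, R>& l, const Seq& r)
{
    return {l, r};
}

// Stand-in for get_large_rvalue() from the text.
inline Seq make_seq() { return Seq{{1.0, 2.0, 3.0}}; }

// Forces evaluation: one loop over the whole expression.
template <typename Expr>
Seq evaluate(const Expr& e)
{
    Seq out{std::vector<double>(e.size())};
    for (std::size_t i = 0; i < e.size(); ++i)
        out.data[i] = e[i];
    return out;
}

inline Seq safe_deferred(const Seq& a, const Seq& e)
{
    // Naming the temporary extends its lifetime to the end of this scope.
    auto&& intermediate = make_seq();
    // d is an unevaluated expression referring to a and intermediate.
    auto&& d = a * intermediate;
    // Both operands are still alive here, so the deferred evaluation is valid.
    return evaluate(d * e);
}
```

Writing auto&& d = a * make_seq(); instead would compile just as well, but d would then refer to a Seq that is destroyed at the end of that statement.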
I'm not familiar with C++11 but, as I understand, auto asks the compiler to determine the type of a variable from its initialization
Yes, exactly. This is called Type Inference/Deduction.
C++98/03 had type deduction only for template functions, in C++11 there is auto.
Do you know how do CUDA and C++11 interact each other?
I haven't used CUDA (though I have used OpenCL), but I guess that there will be no problems with C++11 in host code. Maybe some C++11 features are not supported within device code, but for your purpose you need auto only in host code.
Finally, is there any possibility with only C++?
Do you mean pre-C++11? I.e. C++98/C++03?
Yes, it is possible, but it has more syntax noise, maybe that would be reason to reject it:
// somewhere
{
use_D(A*B+sin(C)+3.);
}
// ...
template<typename Expression>
void use_D(Expression D) // depending on your expression template library
// it may be better to use (const Expression &e)
{
Result F = D*E;
}
I'm now using CUDA/Visual Studio 2010 under Windows. Could you please recommend a compiler/toolset/environment for both OS' to use C++11 in the framework of my interest (GPGPU and CUDA, in you know any)
MSVC 2010 does support some parts of C++11. In particular, it supports auto. So, if you need only auto from C++11, MSVC 2010 is OK.
But if you can use MSVC 2012, I would recommend sticking with it: it has much better C++11 support.
Also, the trick auto &&intermediate = get_large_rvalue(); seems not to be "transparent" to a third-party user (who is not supposed to know about such an issue). Am I right? Any alternative?
If an expression template stores references to some values and you defer its evaluation, you must make sure that all of those references are still alive at the point of evaluation. Use whatever method you want; it can be done without auto, like:
LargeMatrix temp = get_large_rvalue();
Or maybe even a global/static variable (a less preferred method).
A last comment/question: to use auto &&D = A*B+sin(C)+3.; it seems that I should overload the operator= for assignments between two expressions, right?
No, this form requires neither a copy/move assignment operator nor a copy/move constructor.
Basically it just names the temporary value and prolongs its lifetime to the end of the scope. Check this SO.
But if you use another form:
auto D = A*B+sin(C)+3.;
In that case a copy/move/conversion constructor may be required in order to compile (though the actual copy can be optimized away by the compiler via copy elision).
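The difference between the two forms can be seen with a type that can be neither copied nor moved. PinnedExpr and make_expr are hypothetical names invented for this sketch; a real expression template type may well be copyable:

```cpp
#include <cassert>

// Hypothetical expression node that can be neither copied nor moved.
struct PinnedExpr {
    int value;
    PinnedExpr(int v) : value(v) {}
    PinnedExpr(const PinnedExpr&) = delete;
    PinnedExpr& operator=(const PinnedExpr&) = delete;
};

// Returning a braced-init-list initializes the returned object in place,
// so no copy/move constructor is needed here.
PinnedExpr make_expr() { return {42}; }

int read_through_reference() {
    // Binds a reference to the temporary and extends its lifetime;
    // no constructor of PinnedExpr is called on this line.
    auto&& D = make_expr();

    // auto D2 = make_expr();  // ill-formed before C++17: needs a copy/move ctor

    assert(D.value == 42);     // the extended temporary is still alive here
    return D.value;
}
```

Since C++17 the commented-out line compiles as well, because copy elision is guaranteed in that situation; under C++11 and C++14 it is rejected for this type.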
Also, switching between using auto (for the intermediate expressions) and Result to force calculation seems to be non-transparent to a third party user. Any alternative?
I am not sure there is any alternative. This is the nature of expression templates: while you use them in expressions, they return some internal intermediate types, but when you store the result into some "special" type, evaluation is triggered.
In C++11 you can use auto:
auto D = A*B+sin(C)+3.;
Assuming you are using expression templates, the type of D would be <some template type which represents an expression>.
Now, you have to use this carefully: you are saving some memory (no need to allocate space for a matrix), but depending on how you use D this may not be the best choice.
Think about
F = D*E
An element D[i][j] needs to be "visited" many times when computing D*E (n times, in fact, where n is the size of the matrices). If D is a plain matrix type this is no problem. If D is an expression, you are re-evaluating it many, many times.
On the contrary, doing
F = D + E
is fine.
Think about this: you cannot write F = E*(A*B+sin(C)+3.); using only two nested loops.
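The cost difference can be made visible by counting element evaluations. This is only a sketch under stated assumptions: LazySum is a hypothetical lazy element-wise expression (a simple sum instead of A*B+sin(C)+3.), matrices are stored as flat row-major std::vector<double>, and multiply is a naive triple loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lazy expression: every access recomputes the element and
// bumps a counter so the cost becomes visible.
struct LazySum {
    const std::vector<double>& a;
    const std::vector<double>& b;
    std::size_t n;
    mutable long evaluations;

    LazySum(const std::vector<double>& a_, const std::vector<double>& b_, std::size_t n_)
        : a(a_), b(b_), n(n_), evaluations(0) {}

    double operator()(std::size_t i, std::size_t j) const {
        ++evaluations;
        return a[i * n + j] + b[i * n + j];
    }
};

// Naive n x n product F = D * E, with D given as anything callable as d(i, j).
template <typename D>
std::vector<double> multiply(const D& d, const std::vector<double>& e, std::size_t n) {
    std::vector<double> f(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                f[i * n + j] += d(i, k) * e[k * n + j];
    return f;
}

// D stays lazy during the product: each element is recomputed n times.
long lazy_evaluations(std::size_t n) {
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), e(n * n, 1.0);
    LazySum d(a, b, n);
    std::vector<double> f = multiply(d, e, n);
    assert(f[0] == 3.0 * n);   // n products of 3 * 1
    return d.evaluations;
}

// D is first materialized into a plain matrix: each element is computed once.
long materialized_evaluations(std::size_t n) {
    std::vector<double> a(n * n, 1.0), b(n * n, 2.0), e(n * n, 1.0);
    LazySum d(a, b, n);
    std::vector<double> dm(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            dm[i * n + j] = d(i, j);
    std::vector<double> f = multiply(
        [&dm, n](std::size_t i, std::size_t j) { return dm[i * n + j]; }, e, n);
    assert(f[0] == 3.0 * n);
    return d.evaluations;
}
```

With the lazy operand the counter grows as n*n*n, while materializing D first costs n*n evaluations plus the storage for one matrix, which is the trade-off described above.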