Why does TensorFlow recommends the "functional style for constructing operations"? - c++

In TensorFlow's documentation, it is possible to find the following text:
// Not recommended
MatMul m(scope, a, b);
// Recommended
auto m = MatMul(scope, a, b);
I see no obvious benefit from using the "recommended" style. The first version is shorter at least. Also the "recommended" version might include more actions related to the unnecessary assignment operation.
I have read that documentation page no less than six times and still cannot get the rationale behind their reasoning.
Is this recommendation just a matter of style or may the second version have some benefits?

Also the "recommended" version might include more actions related to the unnecessary assignment operation.
There is no assignment. It's initialization. And any extra object that may exist in principle is elided entirely in any compiler worth using.
That recommendation you cite is inline with Herb Sutter's "Almost Always Auto" advice. The strongest point in favor of this style (which in full disclosure, I do not follow) is that it makes it impossible to leave a variable uninitialized.
auto foo; // ill-formed.
int foo; // indeterminate value
auto foo = int(); // a zero integer
Granted, a good static analysis tool can warn about it too, but it's still a very strong point that requires no additional analysis by a compiler or external tools.
Beyond that, the stylistic argument is that it also keep your code consistent with cases where you use auto for sanity's sake, such as
auto it = myVec.begin();
But YMMV on all counts. It is ultimately going to be a stylistic choice, and regardless of the chosen style, exceptions to both styles exist.

Related

Why are std::source_location's getters not marked as [[nodiscard]]? [duplicate]

I've recently read about [[nodiscard]] in C++17, and as far as I understand it's a new feature (design by contract?) which forces you to use the return value. This makes sense for controversial functions like std::launder (nodiscard since C++20), but I wonder why std::move isn't defined like so in C++17/20. Do you know a good reason or is it because C++20 isn't finalised yet?
The MSVC standard library team went ahead and added several thousand instances of [[nodiscard]] since VS 2017 15.6, and have reported wild success with it (both in terms of finding lots of bugs and generating no user complaints). The criteria they described were approximately:
Pure observers, e.g. vector::size(), vector::empty, and even std::count_if()
Things that acquire raw resources, e.g. allocate()
Functions where discarding the return value is extremely likely to lead to incorrect code, e.g. std::remove()
MSVC does mark both std::move() and std::forward() as [[nodiscard]] following these criteria.
While it's not officially annotated as such in the standard, it seems to provide clear user benefit and it's more a question of crafting such a paper to mark all the right things [[nodiscard]] (again, several thousand instances from MSVC) and apply them -- it's not complex work per se, but the volume is large. In the meantime, maybe prod your favorite standard library vendor and ask them to [[nodiscard]] lots of stuff?
AFAIK P0600R1 is the only proposal for adding [[nodiscard]] to the standard library that was applied to C++20. From that paper:
We suggest a conservative approach:
[...]
It should not be added when:
[...]
not using the return value makes no sense but doesn’t hurt and is usually not an error
[...]
So, [[nodiscard]] should not signal bad code if this
[...]
doesn’t hurt and probably no state change was meant that doesn’t happen
So the reason is that the standard library uses a conservative approach and a more aggresive one is not yet proposed.

Why is std::move not [[nodiscard]] in C++20?

I've recently read about [[nodiscard]] in C++17, and as far as I understand it's a new feature (design by contract?) which forces you to use the return value. This makes sense for controversial functions like std::launder (nodiscard since C++20), but I wonder why std::move isn't defined like so in C++17/20. Do you know a good reason or is it because C++20 isn't finalised yet?
The MSVC standard library team went ahead and added several thousand instances of [[nodiscard]] since VS 2017 15.6, and have reported wild success with it (both in terms of finding lots of bugs and generating no user complaints). The criteria they described were approximately:
Pure observers, e.g. vector::size(), vector::empty, and even std::count_if()
Things that acquire raw resources, e.g. allocate()
Functions where discarding the return value is extremely likely to lead to incorrect code, e.g. std::remove()
MSVC does mark both std::move() and std::forward() as [[nodiscard]] following these criteria.
While it's not officially annotated as such in the standard, it seems to provide clear user benefit and it's more a question of crafting such a paper to mark all the right things [[nodiscard]] (again, several thousand instances from MSVC) and apply them -- it's not complex work per se, but the volume is large. In the meantime, maybe prod your favorite standard library vendor and ask them to [[nodiscard]] lots of stuff?
AFAIK P0600R1 is the only proposal for adding [[nodiscard]] to the standard library that was applied to C++20. From that paper:
We suggest a conservative approach:
[...]
It should not be added when:
[...]
not using the return value makes no sense but doesn’t hurt and is usually not an error
[...]
So, [[nodiscard]] should not signal bad code if this
[...]
doesn’t hurt and probably no state change was meant that doesn’t happen
So the reason is that the standard library uses a conservative approach and a more aggresive one is not yet proposed.

Why do I have to cast an enum element when assigning it to a same enum variable type in C?

I have the following:
typedef enum
{
FLS_PROG_SUCCESS,
FLS_PROG_FAIL,
FLS_ERASE_SUCCESS2U,
FLS_ERASE_FAIL,
FLS_READ_SUCCESS,
FLS_READ_FAIL,
FLS_FORMAT_SUCCESS,
FLS_FORMAT_FAIL
}FLS_JobResult_t;
void Foo(void)
{
FLS_JobResult_t ProgramStatus;
/* Then I try to initialize the variable value */
ProgramStatus = FLS_PROG_SUCCESS;
...
}
Innocent uh, but when compiling MISRA C gives the error:
The value of an expression shall not be assigned to an object with a narrower essential type or of a different essential type category
And I found out that I shall write the initialization as follows:
ProgramStatus = (FLS_JobResult_t)FLS_PROG_SUCCESS;
And this just doesn't look good to me, it's like MISRA wants me to throw casts in all the code and that's too much.
Do you know why is this? I don't think this should be an issue but I've tried everything that comes to my mind and this was the only way of getting rid of this error, but it simply does not make any sense, does it?
Regards.
(Hi, this is a new account so I cannot use the comments section yet to ask for further clarification, so, my answer may be broader than needed)
Based on the text of the warning message I assume you are talking about MISRA-C:2012 (the latest standard) which is a great improvement over the prior ones in that much more effort in stating the rationale along with many more compliant and non-compliant examples have been added. This being Rule 10.3, the rationale is: since C permits assignments between different arithmetic types to be performed automatically, the use of these implicit conversions can lead to unintended results, with the potential for loss of value, sign or precision.
Thus MISRA-C:2012 requires the use of stronger typing, as enforced by its essential type model, which reduces the likelihood of these problems occurring.
Unfortunately many tools have not properly implemented the rules and the type model. In this case, your tool is incorrect, this is not a violation of essential type rules because ProgramStatus and FLS_PROG_SUCCESS are both the same essential type. In fact a similar example is shown in the standard itself, under the rule’s list of compliant examples:
enum enuma { A1, A2, A3 } ena;
ena = A1;
If your tool vendor disagrees you can post your question on the "official" MISRA forum to get the official answer and forward that to the vendor.

value semantics vs output params with large data structures

2013 Keynote: Chandler Carruth: Optimizing the Emergent Structures of C++
42:45
You don't need output parameters, we have value semantics in C++. ... Anytime you see someone arguing that nonono I'm not going to return by value because copy would cost too much, someone working on an optimizer says they're wrong. All right? I have never yet seen a piece of code where that argument was correct. ... People don't realize how important value semantics are to the optimizer because it completely clarifies the aliasing scenarios.
Can anyone put this in the context of this answer: https://stackoverflow.com/a/14229152
I hear that being repeated on and on but, well, for me a function returning something is a source. Output parameters by reference take that characteristic out of a function, and removing such hard coded characteristic from a function allows one to manage outside instead, how output will be stored/reused.
My question is, even in the context of that SO answer, is there a way to tell, restructuring the code some other equivalent way, "ok now see, value semantics in this way doesn't lose for the output param version", or Chandler comments were specific to some contrived situations? I had even seen Andrei Alexandrescu arguing this in a talk and telling you can't escape using by ref output for better performance.
For another take on Andrei's comments see Eric Niebler: Out Parameters, Move Semantics, and Stateful Algorithms.
It's either an overstatement, generalization, a joke, or Chandler's idea of "Perfectly Reasonable Performance" (using modern C++ toolchains/libs) is unacceptable to my programs.
I find it a fairly narrow scope of optimizations. Penalties exist beyond that scope which cannot be ignored due to the actual complexities and designs found in programs - heap allocations were an example for the getline example. The specific optimizations may or may not always be applicable to the program in question, despite your attempts to reduce them. Real world structures will reference memory which may alias. You can reduce that, but it's not practical to believe that you can eliminate aliasing (from an optimizer's perspective).
Of course, RBV can be a great thing - it's just not appropriate for all cases. Even the link you referenced pointed out how one can avoid a ton of allocations/frees. Real programs and the data structures found in them are far more complex.
Later in the talk, he goes on to criticize the use of member functions (ref: S::compute()). Sure, there is a point to take away, but is it really reasonable to avoid using these language features entirely because it makes the optimizer's job easier? No. Will it always result in more readable programs? No. Will these code transformations always result in measurably faster programs? No. Are the changes required to transform your codebase worth the time you invest? Sometimes. Can you take away some points and make better informed decisions which impact some of your existing or future codebase? Yes.
Sometimes it helps to break down how exactly your program would execute, or what it would look like in C.
The optimizer will not solve all performance problems, and you should not rewrite programs with the assumption that the programs you are dealing with are "completely brain dead and broken designs", nor should you believe that using RBV will always result in "Perfectly Reasonable Performance". You can utilize new language features and make the optimizer's job easier, though there is much to gain there are often more important optimizations to invest your time on.
It's fine to consider the proposed changes; ideally you would measure the impact of such changes in real world execution times and impact on your source code before adopting these suggestions.
For your example: Even copying+assigning large structures by value can have significant costs. Beyond the cost of running constructors and destructors (along with their associated creation/cleanup of the resources they acquire and own, as pointed out in the link you reference), even things as simple as avoiding unnecessary structure copies can save you a ton of CPU time if you use references (where appropriate). The structure copy may be as simple as a memcpy. These aren't contrived problems; they appear in actual programs and the complexity can increase greatly with your program's complexity. Is reducing aliasing of some memory and other optimizations worth the costs, and does it result in "Perfectly Reasonable Performance"? Not always.
The problem with output parameters as described in the linked question is that they generally make the common calling case (i.e., you don't have a vector storage to use already) much more verbose than normal. For example, if you used return by value:
auto a = split(s, r);
if you used output parameters:
std::vector<std::string> a;
split(s,r,a);
The second one looks much less pretty to my eyes. Also, as Chandler mentioned, the optimizer can do much more with the first one than the second, depending on the rest of your code.
Is there a way we can get the best of both worlds? Emphatically yes, using move semantics:
std::vector<std::string> split(const std::string &s, const std::regex &r, std::vector<std::string> v = {})
{
auto rit = std::sregex_token_iterator(s.begin(), s.end(), r, -1);
auto rend = std::sregex_token_iterator();
v.clear();
while(rit != rend)
{
v.push_back(*rit);
++rit;
}
return v;
}
Now, in the common case, we can call split as normal (i.e., the first example) and it will allocate a new vector storage for us. In the important but rare case when we have to split repeatedly and we want to re-use the same storage, we can just move in a storage that persists between calls:
int main()
{
const std::regex r(" +");
std::vector<std::string> a;
for(auto i=0; i < 1000000; ++i)
a = split("a b c", r, std::move(a));
return 0;
}
This runs just as fast as the output-argument method and what's going on is quite clear. You don't have to make your function hard to use all the time just to get good performance some of the time.
I was about to implement a solution using string_view and ranges and then found this:
std::split(): An algorithm for splitting strings
This endorses value semantics output by return and I accept it as prettier than the current choice in the referred SO answer. This design also would take the characteristic of being a source out of a function even though it is returning. Simple illustration: one could have an outer reserved vector being repetitively filled by a returned range.
In any case, I'm not sure whether such version of split would help the optimizer in any sense (I'm talking in context to Chandler's talk here).
Notice
An output param version enforces the existence of a named variable at call site, which may be ugly to the eyes, but may turn debugging the call site always easier.
Sample solution
While std::split doesn't arrive, I've exercised the value semantics output by return version this way:
#include <string>
#include <string_view>
#include <boost/regex.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/iterator/transform_iterator.hpp>
using namespace std;
using namespace std::experimental;
using namespace boost;
string_view stringfier(const cregex_token_iterator::value_type &match) {
return {match.first, static_cast<size_t>(match.length())};
}
using string_view_iterator =
transform_iterator<decltype(&stringfier), cregex_token_iterator>;
iterator_range<string_view_iterator> split(string_view s, const regex &r) {
return {
string_view_iterator(
cregex_token_iterator(s.begin(), s.end(), r, -1),
stringfier
),
string_view_iterator()
};
}
int main() {
const regex r(" +");
for (size_t i = 0; i < 1000000; ++i) {
split("a b c", r);
}
}
I have used Marshall Clow's string_view libc++ implementation found at https://github.com/mclow/string_view.
I have posted the timings at the bottom of the referred answer.

Expressions with no side effects in C++

See, what I don't get is, why should programs like the following be legal?
int main()
{
static const int i = 0;
i < i > i;
}
I mean, surely, nobody actually has any current programs that have expressions with no side effects in them, since that would be very pointless, and it would make parsing & compiling the language much easier. So why not just disallow them? What benefit does the language actually gain from allowing this kind of syntax?
Another example being like this:
int main() {
static const int i = 0;
int x = (i);
}
What is the actual benefit of such statements?
And things like the most vexing parse. Does anybody, ever, declare functions in the middle of other functions? I mean, we got rid of things like implicit function declaration, and things like that. Why not just get rid of them for C++0x?
Probably because banning then would make the specification more complex, which would make compilers more complex.
it would make parsing & compiling the
language much easier
I don't see how. Why is it easier to parse and compile i < i > i if you're required to issue a diagnostic, than it is to parse it if you're allowed to do anything you damn well please provided that the emitted code has no side-effects?
The Java compiler forbids unreachable code (as opposed to code with no effect), which is a mixed blessing for the programmer, and requires a little bit of extra work from the compiler than what a C++ compiler is actually required to do (basic block dependency analysis). Should C++ forbid unreachable code? Probably not. Even though C++ compilers certainly do enough optimization to identify unreachable basic blocks, in some cases they may do too much. Should if (foo) { ...} be an illegal unreachable block if foo is a false compile-time constant? What if it's not a compile-time constant, but the optimizer has figured out how to calculate the value, should it be legal and the compiler has to realise that the reason it's removing it is implementation-specific, so as not to give an error? More special cases.
nobody actually has any current
programs that have expressions with no
side effects in them
Loads. For example, if NDEBUG is true, then assert expands to a void expression with no effect. So that's yet more special cases needed in the compiler to permit some useless expressions, but not others.
The rationale, I believe, is that if it expanded to nothing then (a) compilers would end up throwing warnings for things like if (foo) assert(bar);, and (b) code like this would be legal in release but not in debug, which is just confusing:
assert(foo) // oops, forgot the semi-colon
foo.bar();
things like the most vexing parse
That's why it's called "vexing". It's a backward-compatibility issue really. If C++ now changed the meaning of those vexing parses, the meaning of existing code would change. Not much existing code, as you point out, but the C++ committee takes a fairly strong line on backward compatibility. If you want a language that changes every five minutes, use Perl ;-)
Anyway, it's too late now. Even if we had some great insight that the C++0x committee had missed, why some feature should be removed or incompatibly changed, they aren't going to break anything in the FCD unless the FCD is definitively in error.
Note that for all of your suggestions, any compiler could issue a warning for them (actually, I don't understand what your problem is with the second example, but certainly for useless expressions and for vexing parses in function bodies). If you're right that nobody does it deliberately, the warnings would cause no harm. If you're wrong that nobody does it deliberately, your stated case for removing them is incorrect. Warnings in popular compilers could pave the way for removing a feature, especially since the standard is authored largely by compiler-writers. The fact that we don't always get warnings for these things suggests to me that there's more to it than you think.
It's convenient sometimes to put useless statements into a program and compile it just to make sure they're legal - e.g. that the types involve can be resolved/matched etc.
Especially in generated code (macros as well as more elaborate external mechanisms, templates where Policies or types may introduce meaningless expansions in some no-op cases), having less special uncompilable cases to avoid keeps things simpler
There may be some temporarily commented code that removes the meaningful usage of a variable, but it could be a pain to have to similarly identify and comment all the variables that aren't used elsewhere.
While in your examples you show the variables being "int" immediately above the pointless usage, in practice the types may be much more complicated (e.g. operator<()) and whether the operations have side effects may even be unknown to the compiler (e.g. out-of-line functions), so any benefit's limited to simpler cases.
C++ needs a good reason to break backwards (and retained C) compatibility.
Why should doing nothing be treated as a special case? Furthermore, whilst the above cases are easy to spot, one could imagine far more complicated programs where it's not so easy to identify that there are no side effects.
As an iteration of the C++ standard, C++0x have to be backward compatible. Nobody can assert that the statements you wrote does not exist in some piece of critical software written/owned by, say, NASA or DoD.
Anyway regarding your very first example, the parser cannot assert that i is a static constant expression, and that i < i > i is a useless expression -- e.g. if i is a templated type, i < i > i is an "invalid variable declaration", not a "useless computation", and still not a parse error.
Maybe the operator was overloaded to have side effects like cout<<i; This is the reason why they cannot be removed now. On the other hand C# forbids non-assignment or method calls expresions to be used as statements and I believe this is a good thing as it makes the code more clear and semantically correct. However C# had the opportunity to forbid this from the very beginning which C++ does not.
Expressions with no side effects can turn up more often than you think in templated and macro code. If you've ever declared std::vector<int>, you've instantiated template code with no side effects. std::vector must destruct all its elements when releasing itself, in case you stored a class for type T. This requires, at some point, a statement similar to ptr->~T(); to invoke the destructor. int has no destructor though, so the call has no side effects and will be removed entirely by the optimizer. It's also likely it will be inside a loop, then the entire loop has no side effects, so the entire loop is removed by the optimizer.
So if you disallowed expressions with no side effects, std::vector<int> wouldn't work, for one.
Another common case is assert(a == b). In release builds you want these asserts to disappear - but you can't re-define them as an empty macro, otherwise statements like if (x) assert(a == b); suddenly put the next statement in to the if statement - a disaster! In this case assert(x) can be redefined as ((void)0), which is a statement that has no side effects. Now the if statement works correctly in release builds too - it just does nothing.
These are just two common cases. There are many more you probably don't know about. So, while expressions with no side effects seem redundant, they're actually functionally important. An optimizer will remove them entirely so there's no performance impact, too.