How to detect correct function call pairs - c++

I am looking for a tool able to detect ordered function call pairs in a nested fashion, as shown below:
f() // depth 0
    f() // depth 1
    g()
g()
At each depth, a call of f() must be matched by a call of g(), forming a function call pair. This is particularly important for critical section entry and exit.

In C++, one option is to wrap the calls to f() and g() in the constructor and destructor of a class and only call those functions by instantiating an instance of that class. For example,
struct FAndGCaller
{
    FAndGCaller() { f(); }
    ~FAndGCaller() { g(); }
};
This can then be used in any scope block like so:
{
    FAndGCaller call_f_then_later_g; // calls f()
} // calls g()
Obviously in real code you'd want to name things more appropriately, and often you'll simply want to have the contents of f() and g() in the constructor and destructor bodies, rather than in separate functions.
This idiom, Scope-Bound Resource Management (SBRM, more commonly referred to as Resource Acquisition Is Initialization, RAII), is quite common.

You may abuse a for-loop for this.
#define SAVETHEDAY for (bool seen = ((void)f(), true); seen; seen = ((void)g(), false))
The comma operator always lets your function f be executed before the dependent statement and g afterwards. E.g.
SAVETHEDAY {
    SAVETHEDAY {
    }
}
Pros:
Makes nesting levels clear.
Works for C++ and C99.
The pseudo for-loop will be optimized away by any decent compiler.
Cons:
You might have surprises with break, return and continue inside the blocks, so g might not be called in such a situation.
For C++, this is not safe against a throw inside; again g might not be called.
Will be frowned upon by many people since it is in some sense extending the language(s).
Will be frowned upon by many people, especially for C++, since generally such macros that hide code are thought to be evil.
The problem with continue can be repaired by doing things a bit more cleverly.
The first two cons can be circumvented in C++ by using a dummy type as the for-variable that just has f and g in its constructor and destructor.
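The following is an added example, not code from either answer above. It puts the RAII guard from the first answer and the "dummy type as the for-variable" fix from the second side by side, and checks the property that matters for a critical section: the exit call runs on early return, on break, and on throw. The enter()/leave() functions and the depth counters are constructed stand-ins for the question's f() and g(), not a real locking API.

```cpp
#include <stdexcept>

namespace cs {
// Constructed instrumentation standing in for f()/g(): the counters let a
// test observe that every enter() was matched by exactly one leave().
int depth = 0;       // current nesting depth
int max_depth = 0;   // deepest nesting observed
int exits = 0;       // total leave() calls

void enter() { ++depth; if (depth > max_depth) max_depth = depth; }
void leave() { --depth; ++exits; }

// The RAII guard: enter() in the constructor, leave() in the destructor.
// Non-copyable, so a guard cannot be duplicated and leave() twice.
class Guard {
public:
    Guard() { enter(); }
    ~Guard() { leave(); }
    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

void reset() { depth = 0; max_depth = 0; exits = 0; }
} // namespace cs

// Nested sections: the inner guard is destroyed first, so the leave() calls
// happen in reverse order of the enter() calls, the f()/f()/g()/g() nesting
// of the question.
int nested_work() {
    cs::Guard outer;
    {
        cs::Guard inner;
        // work at depth 2
    }
    return cs::depth;   // 1: inner already left, outer still held
}

// Early return: the return value is computed while the guard is alive,
// then the destructor runs, so leave() is not skipped.
int early_return(bool bail) {
    cs::Guard g;
    if (bail) return cs::depth;   // 1
    return cs::depth * 10;        // 10
}

// Exception: stack unwinding destroys the guard before the handler runs.
bool leave_runs_on_throw() {
    try {
        cs::Guard g;
        throw std::runtime_error("failure inside critical section");
    } catch (const std::runtime_error&) {
        // the guard is already gone here
    }
    return cs::depth == 0;
}

// The for-variable fix from the macro answer: the guard lives in the for-init,
// so break/return/throw inside the block still run its destructor, unlike the
// plain comma-operator macro where g() is simply never reached.
struct GuardOnce { cs::Guard g; bool once = true; };
#define CRITICAL_SECTION for (GuardOnce s_; s_.once; s_.once = false)

int macro_with_break() {
    CRITICAL_SECTION {
        if (cs::depth == 1) break;   // leave() still runs when s_ goes out of scope
    }
    return cs::depth;   // 0
}
```

The guard is the form to prefer; the macro variant is shown only to make the point that the for-init object, not the comma operator, is what gives the exception safety.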

Scan through the code (that's the hard part) and every time you see an invocation of f(), increment a counter. Every time you see an invocation of g(), decrement the counter. At the end, the counter should be back to zero. If it ever goes negative, that's a problem as well (you had a call to g() that wasn't preceded by a matching call to f()).
Scanning the code accurately is the hard part though -- with C and (especially) C++, writing code to understand source code is extremely difficult. Offhand, I don't know of an existing tool for this particular job. You could undoubtedly get clang (for one example) to do it, but while it'll be a lot easier than doing it entirely on your own, it still won't be trivial.
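As an added illustration of the counter scan this answer describes (not code from the answer itself), here is the check done at the token level rather than with a full C++ front end. It skips comments and string/character literals so that an f() inside a comment or a string does not count, and it ignores member calls such as obj.f(). It does not understand macros, function pointers, or declarations (a definition `void f()` is counted as a call, so point it at call sites, not at the file that defines f), so it is a lint rather than a proof. The function and struct names are constructed for this demonstration.

```cpp
#include <cctype>
#include <string>

struct PairReport {
    int final_depth;       // 0 means every open call was matched by a close call
    int first_underflow;   // byte offset of the first close call with nothing open, or -1
    int max_depth;         // deepest nesting seen
};

static bool is_ident_start(char c) { return std::isalpha((unsigned char)c) || c == '_'; }
static bool is_ident_char(char c)  { return std::isalnum((unsigned char)c) || c == '_'; }

// Scan src for calls named `open` (f) and `close` (g), tracking nesting depth.
PairReport check_call_pairs(const std::string& src,
                            const std::string& open, const std::string& close) {
    PairReport r{0, -1, 0};
    size_t i = 0, n = src.size();
    while (i < n) {
        char c = src[i];
        // skip // line comments and /* block */ comments
        if (c == '/' && i + 1 < n && src[i + 1] == '/') {
            while (i < n && src[i] != '\n') ++i;
            continue;
        }
        if (c == '/' && i + 1 < n && src[i + 1] == '*') {
            i += 2;
            while (i + 1 < n && !(src[i] == '*' && src[i + 1] == '/')) ++i;
            i += 2;
            continue;
        }
        // skip string and character literals, honouring backslash escapes
        if (c == '"' || c == '\'') {
            char q = c;
            ++i;
            while (i < n && src[i] != q) { if (src[i] == '\\') ++i; ++i; }
            ++i;
            continue;
        }
        if (is_ident_start(c) && (i == 0 || !is_ident_char(src[i - 1]))) {
            size_t start = i;
            while (i < n && is_ident_char(src[i])) ++i;
            std::string ident = src.substr(start, i - start);
            // a call is an identifier followed (after whitespace) by '('
            size_t j = i;
            while (j < n && std::isspace((unsigned char)src[j])) ++j;
            bool is_call = j < n && src[j] == '(';
            // obj.f() and p->f() are member calls, not the free function f()
            bool member = start > 0 && (src[start - 1] == '.' ||
                          (start > 1 && src[start - 2] == '-' && src[start - 1] == '>'));
            if (is_call && !member) {
                if (ident == open) {
                    ++r.final_depth;
                    if (r.final_depth > r.max_depth) r.max_depth = r.final_depth;
                } else if (ident == close) {
                    --r.final_depth;
                    if (r.final_depth < 0 && r.first_underflow < 0)
                        r.first_underflow = (int)start;
                }
            }
            continue;
        }
        ++i;
    }
    return r;
}
```

A report with final_depth != 0 means an unmatched f() (a held lock); first_underflow >= 0 points at a g() that had no open f() before it, which the counter would otherwise hide if a later f() brought the total back to zero.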

The Coccinelle tool for semantic searching and patching of C code is designed for this sort of task (see also this LWN article on the tool).

Related

Any reason not to use global lambdas?

We had a function that used a non-capturing lambda internal to itself, e.g.:
void foo() {
    auto bar = [](int a, int b){ return a + b; };
    // code using bar(x,y) a bunch of times
}
Now the functionality implemented by the lambda became needed elsewhere, so I am going to lift the lambda out of foo() into the global/namespace scope. I can either leave it as a lambda, making it a copy-paste option, or change it to a proper function:
auto bar = [](int a, int b){ return a + b; }; // option 1
int bar(int a, int b){ return a + b; }        // option 2
void foo() {
    // code using bar(x,y) a bunch of times
}
Changing it to a proper function is trivial, but it made me wonder if there is some reason not to leave it as a lambda? Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
There's one very important reason not to use global lambdas: because it's not normal.
C++'s regular function syntax has been around since the days of C. Programmers have known for decades what said syntax means and how they work (though admittedly that whole function-to-pointer decay thing sometimes bites even seasoned programmers). If a C++ programmer of any skill level beyond "utter newbie" sees a function definition, they know what they're getting.
A global lambda is a different beast altogether. It has different behavior from a regular function. Lambdas are objects, while functions are not. They have a type, but that type is distinct from the type of their function. And so forth.
So now, you've raised the bar in communicating with other programmers. A C++ programmer needs to understand lambdas if they're going to understand what this function is doing. And yes, this is 2019, so a decent C++ programmer should have an idea what a lambda looks like. But it is still a higher bar.
And even if they understand it, the question on that programmer's mind will be... why did the writer of this code write it that way? And if you don't have a good answer for that question (for example, because you explicitly want to forbid overloading and ADL, as in Ranges customization points), then you should use the common mechanism.
Prefer expected solutions to novel ones where appropriate. Use the least complicated method of getting your point across.
I can think of a few reasons you'd want to avoid global lambdas as drop-in replacements for regular functions:
regular functions can be overloaded; lambdas cannot (there are techniques to simulate this, however)
Despite the fact that they are function-like, even a non-capturing lambda like this will occupy memory (generally 1 byte for non-capturing).
as pointed out in the comments, modern compilers will optimize this storage away under the as-if rule
"Why shouldn't I use lambdas to replace stateful functors (classes)?"
classes simply have fewer restrictions than lambdas and should therefore be the first thing you reach for (public/private data, overloading, helper methods, etc.)
if the lambda has state, then it is all the more difficult to reason about when it becomes global. We should prefer to create an instance of a class at the narrowest possible scope
it's already difficult to convert a non-capturing lambda into a function pointer, and it is impossible for a lambda that specifies anything in its capture. classes give us a straightforward way to create function pointers, and they're also what many programmers are more comfortable with
Lambdas with any capture cannot be default-constructed (in C++20; before C++20 no lambda had a default constructor at all)
Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
A problem of a certain level of complexity requires a solution of at least the same complexity. But if there is a less complex solution for the same problem, then there is really no justification for using the more complex one. Why introduce complexity you don't need?
Between a lambda and a function, a function is simply the less complex kind of entity of the two. You don't have to justify not using a lambda. You have to justify using one. A lambda expression introduces a closure type, which is an unnamed class type with all the usual special member functions, a function call operator, and, in this case, an implicit conversion operator to function pointer, and creates an object of that type. Copy-initializing a global variable from a lambda expression simply does a lot more than just defining a function. It defines a class type with six implicitly-declared functions, defines two more operator functions, and creates an object. The compiler has to do a lot more. If you don't need any of the features of a lambda, then don't use a lambda…
After asking, I thought of a reason to not do this: Since these are variables, they are prone to Static Initialization Order Fiasco (https://isocpp.org/wiki/faq/ctors#static-init-order), which could cause bugs down the line.
if there is some reason not to leave it as a lambda? Is there any reason not to just use lambdas everywhere instead of "regular" global functions?
We used to use functions instead of global functors, so it breaks coherency and the Principle of Least Astonishment.
The main differences are:
functions can be overloaded, whereas functors cannot.
functions can be found with ADL, not functors.
Lambdas are anonymous functions.
If you are using a named lambda, it means you are basically using a named anonymous function. To avoid this oxymoron, you might as well use a function.

Why is C++'s void type only half-heartedly a unit type?

C++'s void type is not uninhabited. The problem is that while it has precisely one inhabitant, very much like the Unit type (a.k.a. ()) in ML-like languages, that inhabitant cannot be named or passed around as an ordinary value. For example, the following code fails to compile:
void foo(void a) { return; }
void bar() { foo(foo()); }
whereas equivalent (say) Rust code would compile just fine:
fn foo(a : ()) { return; }
fn bar() { foo(foo(())); }
In effect, void is like a unit type, but only half-heartedly so. Why is this the case?
Does the C++ standard explicitly state that one cannot create values of type void? If yes, what is the rationale behind this decision? If not, why does the code above not compile?
If it is some backwards-compatibility related reason, please give a code example.
To be clear, I'm not looking for work-arounds to the problem (e.g. using an empty struct/class). I want to know the historical reason(s) behind the status quo.
EDIT: I've changed the syntax in the code examples slightly to make it clear that I'm not trying to hijack existing syntax like void foo(void) (consequently, some comments may be out of date). The primary motivation behind the question is "why is the type system not like X" and not "why does this bit of syntax not behave as I'd like it to". Please keep this point in mind if you're writing an answer talking about breaking backwards compatibility.
"Does the C++ standard explicitly state that one cannot create values of type void?"
Yes. It states that void is an incomplete type and cannot be completed. You can't create objects or values with an incomplete type.
This is an old rule; as the comments note it's inherited from C. There are minor extensions in C++ to simplify the writing of generic code, e.g. void f(); void g() { return f(); } is legal.
There seems to be little gain in changing the status quo. C++ is not an academic language. Purity is not a goal. Writing useful programs is, but how does such a proposal help with that? To quote Raymond Chen, every proposal starts at -100 and has to justify its addition; you don't justify the lack of features.
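The "return f();" extension this answer mentions is easy to miss, so here is an added example (not from the answer) of what it buys in generic code: one forwarding wrapper that works for both void and non-void callables, with no separate specialisation for void. The wrapper name, the counter and the two callees are constructed for this demonstration.

```cpp
#include <utility>

int calls_seen = 0;   // instrumentation so a test can observe the wrapper

// Counts the call and forwards to f. When f returns void, "return f(...);"
// returns a void expression, which C++ permits (C does not); when f returns
// a value, the same statement returns that value. No void special case needed.
template <class F, class... Args>
auto counted(F&& f, Args&&... args)
    -> decltype(std::forward<F>(f)(std::forward<Args>(args)...)) {
    ++calls_seen;
    return std::forward<F>(f)(std::forward<Args>(args)...);
}

int add(int a, int b) { return a + b; }   // non-void callee
void bump(int& x) { x += 1; }             // void callee

int demo_nonvoid() { return counted(add, 20, 22); }   // 42, passes through

int demo_void() {
    int x = 0;
    counted(bump, x);   // legal: the wrapper's return type is void here
    counted(bump, x);
    return x;           // 2
}
```

Without that rule the wrapper would need a second overload or a SFINAE branch for void, which is the kind of boilerplate a true unit type would remove; the extension gets most of that benefit without changing void's status as an incomplete type.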
That is really a historical question. Old (pre-C) languages used to differentiate functions, which returned values, from subroutines, which did not (ooh, the good old taste of Fortran IV and Basic...). AFAIK, early C only allowed functions; simply, functions were by default returning int, and it was legal to have no return statement (meaning return an unspecified value) and legal to ignore any return value, so that the programmer could write coherent code... In those early days, C was used more or less as a powerful macro assembler, and anything was allowed provided the compiler could translate it into machine instructions (no strict aliasing rule, for example...). As the memory unit was char, there was no need for a void * pointer; char * was enough.
Then people felt the need to make clear that a buffer was expected to contain anything and not a character string, and that some functions would never return a value. And void came to fill the gap.
The drawback is that when you declare a void function, you declare what was called a subroutine, that is, something that can never be used as a value, in particular never be used as a function parameter. So void is not only a special type that can never be instantiated; it really declares that the result cannot be part of an expression.
And because of language inheritance, and because the C standard library is still a subset of the C++ standard one, C++ still processes void the way ANSI C did.
Other languages can use different conventions. In Python, for example, a function will always return something; it simply returns the special None value if no return statement is encountered. And Rust seems to have still another convention.

Passing inline functions as arguments

I'm wondering if C++ will still obey the inline keyword when a function is passed as an argument. In the following example, would a new frame for onFrame be pushed onto the stack every time frame() is called in the while loop?
#include <functional>

bool interrupt = false;
void run(std::function<void()> frame) {
    while(!interrupt) frame();
}
inline void onFrame() {
    // do something each frame
}
int main() {
    run(onFrame);
}
Or would changing to this have any effect?
void run(std::function<inline void()> frame) {
while(!interrupt) frame();
}
If you have no definitive answer, can you help me find a way to test this? Possibly using memory addresses or some sort of debugger?
It's going to be pretty hard for the compiler to inline your function if it has to go through std::function's type-erased dispatch to get there. It's possible it'll happen anyway, but you're making it as hard as possible. Your proposed alternative (taking a std::function<inline void()> argument) is ill-formed.
If you don't need type erasure, don't use type erasure. run() can simply take an arbitrary callable:
template <class F>
void run(F frame) {
    while(!interrupt) frame();
}
That is much easier for the compiler to inline. Although, simply having an inline function does not in and of itself guarantee that the function gets inlined. See this answer.
Note also that when you're passing a function pointer, that also makes it less likely to get inlined, which is awkward. I'm trying to find an answer on here that had a great example, but until then, if inlining is super important, wrapping it in a lambda may be the way to go:
run([]{ onFrame(); });
still obey the inline keyword ... would a new frame ... be pushed onto the stack
That isn't what the inline keyword does in the first place (see this question for extensive reference).
Assuming, as Barry does, that you're hoping to persuade the optimiser to inline your function call (once more for luck: this is nothing to do with the inline keyword), function template+lambda is probably the way to go.
To see why this is, consider what the optimiser has to work with in each of these cases:
function template + lambda
template <typename F>
void run(F frame) { while(!interrupt) frame(); }
// ... call site ...
run([]{ onFrame(); });
here, the function only exists at all (is instantiated from the template) at the call site, with everything the optimizer needs to work in scope and well-defined.
Note the optimizer may still reasonably choose not to inline a call if it thinks the extra instruction cache pressure will outweigh the saving of stack frame
function pointer
void run(void (*frame)()) { while(!interrupt) frame(); }
// ... call site ...
run(onFrame);
here, run may have to be compiled as a standalone function (although that copy may be thrown away by the linker if it can prove no-one used it), and same for onFrame, especially since its address is taken. Finally, the optimizer may need to consider whether run is called with many different function pointers, or just one, when deciding whether to inline these calls. Overall, it seems like more work, and may end up as a link-time optimisation.
NB. I used "standalone function" to mean the compiler likely emits the code & symbol table entry for a normal free function in both cases.
std::function
This is already getting long. Let's just notice that this class goes to great lengths (the type erasure Barry mentioned) to make the function
void run(std::function<void()> frame);
not depend on the exact type of the function, which means hiding information from the compiler at the point it generates the code for run, which means less for the optimiser to work with (or conversely, more work required to undo all that careful information hiding).
As for testing what your optimiser does, you need to examine this in the context of your whole program: it's free to choose different heuristics depending on code size and complexity.
To be totally sure what it actually did, just disassemble with source or compile to assembler. (Yes, that's potentially a big "just", but it's platform-specific, not really on-topic for the question, and a skill worth learning anyway).
Compile for release and check the list files, or turn on disassembly in debugger. Best way to know is to check the generated code.

Pure functions in C++11

Can one in C++11 somehow in gcc mark a function (not a class method) as const telling that it is pure and does not use the global memory but only its arguments?
I've tried gcc's __attribute__((const)) and it is precisely what I want. But it does not produce any compile time error when the global memory is touched in the function.
Edit 1
Please be careful. I mean pure functions. Not constant functions. GCC's attribute is a little bit confusing. Pure functions only use their arguments.
Are you looking for constexpr? This tells the compiler that the function may be evaluated at compile time. A constexpr function must have literal return and parameter types and the body can only contain static asserts, typedefs, using declarations and directives and one return statement. A constexpr function may be called in a constant expression.
constexpr int add(int a, int b) { return a + b; }
int x[add(3, 6)];
Having looked at the meaning of __attribute__((const)), the answer is no, you cannot do this with standard C++. Using constexpr will achieve the same effect, but only on a much more limited set of functions. There is nothing stopping a compiler from making these optimizations on its own, however, as long as the compiled program behaves the same way (the as-if rule).
Because it has been mentioned a lot here, let's forget about metaprogramming for now, which is purely functional anyway and off topic. However, a constexpr function foo can be called with non-constexpr arguments, and in this context foo is actually a pure function evaluated at runtime (I am ignoring global variables here). But you can write many pure functions that you cannot make constexpr; this includes any function throwing exceptions, for example.
Second I assume the OP means marking pure as an assertion for the compiler to check. GCC's pure attribute is the opposite, a way for the coder to help the compiler.
While the answer to the OP's question is NO, it is very interesting to read about the history of attempts to introduce a pure keyword (or impure and let pure be the default).
The d-lang community quickly figured out that the meaning of "pure" is not clear. Logging should not make a function impure. Mutable variables that do not escape the function call should be allowed in pure functions. Equal return values having different addresses should not be considered impure. But D goes even further than that in stretching purity.
So the d-lang community introduced the term "weakly pure" and "strongly pure". But later disputes showed that weak and strong is not black and white and there are grey zones. see purity in D
Rust introduced the "pure" keyword early on; and they dropped it because of its complexity. see purity in Rust.
Among the great benefits of a "pure" keyword there is an ugly consequence though. A templated function can be pure or not depending on its type parameters. This can explode the number of template instantiations. Those instantiations may only need to exist temporarily in the compiler and not get into the executable but they can still explode compile times.
A syntax highlighting editor could be of some help here without modifying the language. Optimizing C++ compilers do actually reason about the pureness of a function, they just do not guarantee catching all cases.
I find it sad that this feature seems to have low priority. It makes reasoning about code so much easier. I would even argue that it would improve software design by the way it incentivizes programmers to think differently.
using just standard C++11:
namespace g{ int x; }
constexpr int foo()
{
    //return g::x = 42; Nah, not constant
    return 42; // OK
}
int main()
{}
here's another example:
constexpr int foo( int blah = 0 )
{
    return blah + 42; // OK
}
int main( int argc, char** )
{
    int bah[foo(2)];               // Very constant.
    int const troll = foo( argc ); // Very non-constant.
}
The meaning of GCC's __attribute__( const ) is documented in the GNU compiler docs as …
Many functions do not examine any values except their arguments, and have no effects except the return value. Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
One may take that to mean that the function result should only depend on the arguments, and that the function should have no side effects.
This allows a more general class of functions than C++11 constexpr, which makes the function inline, restricts arguments and function result to literal types, and restricts the "active" statements of the function body to a single return statement, where (C++11 §7.1.5/3)
— every constructor call and implicit conversion used in initializing the return value (6.6.3, 8.5) shall be one of those allowed in a constant expression (5.19)
As an example, it is difficult (I would think not impossible, but difficult) to make a constexpr sin function.
But the purity of the result matters only to two parties:
When known to be pure, the compiler can elide calls with known results.
This is mostly an optimization of macro-generated code. Replace macros with inline functions to avoid silly generation of identical sub-expressions.
When known to be pure, a programmer can remove a call entirely.
This is just a matter of proper documentation. :-)
So instead of looking for a way to express the purity of e.g. sin in the language, I suggest just avoid code generation via macros, and document pure functions as such.
And use constexpr for the functions where it's practically possible (unfortunately, as of Dec. 2012 the latest Visual C++ compiler doesn't yet support constexpr).
There is a previous SO question about the relationship between pure and constexpr. Mainly, every constexpr function is pure, but not vice versa.
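Since the const and pure attributes quoted above are promises to the compiler rather than checked properties, it helps to see which functions may honestly carry which one. The following is an added example, not code from any of the answers; the functions and the lookup table are constructed for the demonstration, and the attribute macros fall back to nothing on non-GCC compilers.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__GNUC__)
#  define ATTR_CONST __attribute__((const))
#  define ATTR_PURE  __attribute__((pure))
#else
#  define ATTR_CONST
#  define ATTR_PURE
#endif

// Depends only on its argument and reads no memory at all: this is the
// strict class __attribute__((const)) describes, so the compiler may fold or
// merge repeated calls with the same argument.
ATTR_CONST unsigned popcount32(uint32_t v) {
    unsigned n = 0;
    for (; v; v &= v - 1) ++n;   // clear the lowest set bit each round
    return n;
}

// Constructed lookup table (bits set in each 4-bit value). A function that
// reads it has no side effects but does read global memory, so it may be
// marked pure but must NOT be marked const: const promises no memory reads.
const uint8_t nibble_bits[16] = {0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4};

ATTR_PURE unsigned popcount_table(uint32_t v) {
    unsigned n = 0;
    for (int i = 0; i < 8; ++i) { n += nibble_bits[v & 0xF]; v >>= 4; }
    return n;
}

// Reads through a pointer argument: again pure, not const, because the
// result depends on memory, not only on the pointer value. Marking this
// const would compile without complaint and let the optimizer reuse a
// result across a change to the buffer, which is exactly the "no compile
// time error" the question complains about.
ATTR_PURE unsigned popcount_buf(const uint8_t* p, size_t n) {
    unsigned total = 0;
    for (size_t i = 0; i < n; ++i)
        total += nibble_bits[p[i] & 0xF] + nibble_bits[p[i] >> 4];
    return total;
}
```

The rule of thumb: const for functions of their scalar arguments only; pure for functions that additionally read memory (globals, tables, pointed-to data) but write nothing; neither for anything that can throw, log, or touch mutable state, since GCC will not catch the lie and will optimize on the assumption that it is true.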

Why is C++11 constexpr so restrictive?

As you probably know, C++11 introduces the constexpr keyword.
C++11 introduced the keyword constexpr, which allows the user to guarantee that a function or object constructor is a compile-time constant. [...] This allows the compiler to understand, and verify, that [function name] is a compile-time constant.
My question is why are there such strict restrictions on form of the functions that can be declared. I understand desire to guarantee that function is pure, but consider this:
The use of constexpr on a function imposes some limitations on what that function can do. First, the function must have a non-void return type. Second, the function body cannot declare variables or define new types. Third, the body may only contain declarations, null statements and a single return statement. There must exist argument values such that, after argument substitution, the expression in the return statement produces a constant expression.
That means that this pure function is illegal:
constexpr int maybeInCppC1Y(int a, int b)
{
    if (a>0)
        return a+b;
    else
        return a-b;
    //can be written as return (a>0) ? (a+b):(a-b); but that isnt the point
}
Also you can't define local variables... :(
So I'm wondering is this a design decision, or do compilers suck when it comes to proving function a is pure?
The reason you'd need to write statements instead of expressions is that you want to take advantage of the additional capabilities of statements, particularly the ability to loop. But to be useful, that would require the ability to declare variables (also banned).
If you combine a facility for looping, with mutable variables, with logical branching (as in if statements) then you have the ability to create infinite loops. It is not possible to determine if such a loop will ever terminate (the halting problem). Thus some sources would cause the compiler to hang.
By using recursive pure functions it is possible to cause infinite recursion, which can be shown to be equivalently powerful to the looping capabilities described above. However, C++ already has that problem at compile time - it occurs with template expansion - and so compilers already have to have a switch for "template stack depth" so they know when to give up.
So the restrictions seem designed to ensure that this problem (of determining if a C++ compilation will ever finish) doesn't get any thornier than it already is.
The rules for constexpr functions are designed such that it's impossible to write a constexpr function that has any side-effects.
By requiring constexpr to have no side-effects it becomes impossible for a user to determine where/when it was actually evaluated. This is important since constexpr functions are allowed to happen at both compile time and run time at the discretion of the compiler.
If side-effects were allowed then there would need to be some rules about the order in which they would be observed. That would be incredibly difficult to define - even harder than the static initialisation order problem.
A relatively simple set of rules for guaranteeing these functions to be side-effect free is to require that they be just a single expression (with a few extra restrictions on top of that). This sounds limiting initially and rules out the if statement as you noted. Whilst that particular case would have no side-effects it would have introduced extra complexity into the rules and given that you can write the same things using the ternary operator or recursively it's not really a huge deal.
n2235 is the paper that proposed the constexpr addition in C++. It discusses the rationale for the design; the relevant quote seems to be this one from a discussion on destructors, but it is relevant generally:
The reason is that a constant-expression is intended to be evaluated by the compiler at translation time just like any other literal of built-in type; in particular no observable side-effect is permitted.
Interestingly the paper also mentions that a previous proposal suggested that the compiler figure out automatically which functions were constexpr without the new keyword, but this was found to be unworkably complex, which seems to support my suggestion that the rules were designed to be simple.
(I suspect there will be other quotes in the references cited in the paper, but this covers the key point of my argument about the no side-effects)
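This answer's remark that the if/else form can be written with the ternary operator or recursively is worth seeing on something less trivial than the question's two-branch function. The following is an added example, not from the answer: the question's function in C++11 single-return form, and a C++11-legal constexpr FNV-1a hash written by recursion instead of a loop, used both at compile time (static_assert, case labels) and at run time. The function names are constructed for this demonstration.

```cpp
#include <cstddef>
#include <cstdint>

// The question's function, legal in C++11: one return, a conditional expression.
constexpr int maybe_in_cpp11(int a, int b) { return a > 0 ? a + b : a - b; }

// FNV-1a 32-bit in C++11 style: no loop and no local variable, so the
// running hash is threaded through a recursive parameter instead.
// Unsigned arithmetic wraps mod 2^32, so the multiply is well defined.
constexpr uint32_t fnv1a_32(const char* s, size_t n, uint32_t h = 2166136261u) {
    return n == 0 ? h
                  : fnv1a_32(s + 1, n - 1, (h ^ static_cast<uint8_t>(*s)) * 16777619u);
}

// Compile-time evaluation: both functions fold to literals when every
// argument is a constant expression.
static_assert(maybe_in_cpp11(3, 4) == 7, "positive branch");
static_assert(maybe_in_cpp11(-3, 4) == -7, "negative branch");
static_assert(fnv1a_32("", 0) == 2166136261u, "FNV offset basis");

// Hashes of string literals are constant expressions, so they can serve as
// case labels. Caveat for untrusted input: equal hash is not equal string;
// confirm with a real comparison before acting on the match.
int classify(const char* s, size_t n) {
    switch (fnv1a_32(s, n)) {
        case fnv1a_32("admin", 5): return 1;
        case fnv1a_32("guest", 5): return 2;
        default:                   return 0;
    }
}

// The same constexpr function on run-time data: no longer a constant
// expression, just an ordinary call.
uint32_t hash_runtime(const char* s, size_t n) { return fnv1a_32(s, n); }
```

The recursion is bounded by the string length, which is the compile-time analogue of the loop the C++11 rules forbid; C++14 (n3597) lets the same function be written with a for loop and a local variable.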
Actually the C++ standardization committee is thinking about removing several of these constraints for c++14. See the following working document http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3597.html
The restrictions could certainly be lifted quite a bit without enabling code which cannot be executed during compile time, or which cannot be proven to always halt. However I guess it wasn't done because
it would complicate the compiler for minimal gain. C++ compilers are quite complex as is
specifying exactly how much is allowed without violating the restrictions above would have been time consuming, and given that desired features have been postponed in order to get the standard out of the door, there probably was little incentive to add more work (and further delay of the standard) for little gain
some of the restrictions would have been either rather arbitrary or rather complicated (especially on loops, given that C++ doesn't have the concept of a native incrementing for loop, but both the end condition and the increment code have to be explicitly specified in the for statement, making it possible to use arbitrary expressions for them)
Of course, only a member of the standards committee could give an authoritative answer whether my assumptions are correct.
I think constexpr is just for const objects. I mean, you can now have static const objects like String::empty_string constructed statically (without hacking!). This may reduce the time before 'main' is called. And static const objects may have functions like .length(), operator==, ... so this is why 'expr' is needed. In C you can create static constant structs like below:
static const Foos foo = { .a = 1, .b = 2, };
The Linux kernel has tons of this type of class. In C++ you could do this now with constexpr.
note: I dunno, but the code below should not be accepted either, just like the if version:
constexpr int maybeInCppC1Y(int a, int b) { return (a > 0) ? (a + b) : (a - b); }