As I understand it, the noreturn attribute tells the compiler that a function does not return. What is the benefit of specifying this when a function has a void return type?
How are these two functions different?
void foo(){}
__attribute__((noreturn)) void foo(){}
It doesn't mean that the function doesn't return a value - it means that the function doesn't return at all. Calling the function is a one-way trip.
There's some limited use of this in hardware-related programming, where you don't want any function call overhead to be generated on the caller-side stack, since you aren't going to return there anyway.
For example when calling void main (void) from the CRT start-up code inside a microcontroller embedded system - these are systems which keep running until you pull the power, and they will never return from main(). So if the compiler for such a system supports _Noreturn void main (void), that creates less wasted stack space. Otherwise the CRT will just push x bytes on the stack which will remain there forever, as dead space.
It might also be useful for diagnostic purposes, such as when examining the behavior of compilers. As was done here, when demonstrating some major bugs in the clang compiler.
There is no relation between the return type, and a "no-return" behavior of the function.
The void return type is merely telling that this function returns no value (in case it returns).
Some functions never return to the caller. For example some message-processing infinite loops, the exit() function, or the functions that replace the current process (such as the exec* family). The __attribute__((noreturn)) hints to the compiler that it may apply certain optimizations to this specific function which are valid only for functions that do not return - such as eliminating the overhead of calling such a function (like saving the context of the caller, the return address and so on).
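For illustration, here is a minimal sketch (the function names are my own) of the usual pattern: a fatal-error helper that logs and exits, so control never comes back to the caller:

#include <stdio.h>
#include <stdlib.h>

__attribute__((noreturn)) void die(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);        /* exit() is itself declared noreturn */
}

int must_be_positive(int x)
{
    if (x <= 0)
        die("x must be positive");
    return x;                  /* no "missing return" warning: the compiler knows die() never comes back */
}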
I was just looking up function attributes for gcc
(http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Function-Attributes.html)
and came across the returns_twice attribute.
And I am absolutely clueless about the cases in which a function can return twice... I quickly looked up the mentioned vfork() and setjmp(), but I still have no idea what an applicable scenario looks like - has any of you used it, or can you explain a bit?
The setjmp function is analogous to creating a label (in the goto sense); as such, you first return from setjmp when you set the label, and then again each time you actually jump to it.
If it seems weird, rest assured, you should not be using setjmp in your daily programming. Or actually... you should probably not be using it at all. It is a very low-level facility that breaks the expected execution flow (much like goto) and, especially in C++, most of the invariants you could expect.
When you call setjmp, it establishes that as a return point, then execution continues at the code immediately following the setjmp call.
At some point later in the code, calling longjmp (with the jump buffer initialized by the previous call to setjmp) returns execution to start from that same point again (i.e., the code immediately following the call to setjmp).
Therefore, the original call returns normally, then at arbitrary later times, execution returns (or at least may return) to the same point again.
The attribute simply warns the compiler of that fact.
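A minimal sketch of what that looks like in practice (the value 42 is just for illustration):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void fail(void)
{
    longjmp(env, 42);          /* "return" from setjmp a second time */
}

int main(void)
{
    int rc = setjmp(env);      /* first return: rc == 0 */
    if (rc == 0)
    {
        puts("jump point established");
        fail();                /* control never comes back here normally */
    }
    else
    {
        printf("setjmp returned again with %d\n", rc);
    }
    return 0;
}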
In the following code:
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
int ready = 0;
void wait()
{
    int i;
    do
    {
        usleep(1000);
        pthread_mutex_lock(&mutex);
        i = ready;
        pthread_mutex_unlock(&mutex);
    } while (i == 0);
    printf("Finished\n");
}

void signal()
{
    pthread_mutex_lock(&mutex);
    ready = 1;
    pthread_mutex_unlock(&mutex);
}
We spawn two threads: we call wait in one thread and then call signal in the other. We also get the compiler to optimize aggressively.
Now will the code behave as expected or will we need to make ready volatile to get this to work? Will different compilers and libraries handle this differently?
Edit: I am hoping that there might be something about the mutex functions that prevents optimization around them, or that the compiler generally does not optimize across function calls.
Note: I have not compiled and tested the code yet, will do so when I have a chance.
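For reference, the two functions would be driven by something like the following; the start-routine wrappers are only a sketch I'm adding here, since pthread start routines need the void *(*)(void *) signature:

static void *wait_thread(void *arg)   { (void)arg; wait();   return NULL; }
static void *signal_thread(void *arg) { (void)arg; signal(); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, wait_thread, NULL);
    pthread_create(&t2, NULL, signal_thread, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}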
Some perspective from the kernel kings:
http://kernel.org/doc/Documentation/volatile-considered-harmful.txt
I'd be surprised if the compiler assumes anything about a global variable in the presence of library function calls. That being said, volatile will not cost you anything, and it shows your intentions. I'd put it there.
Volatile is neither required nor sufficient. So there is no reason to use it. It is not sufficient because no standard specifies that it will provide visibility between threads. It is not necessary because the pthreads standard says mutexes are sufficient.
Worse, using it suggests that you are an incompetent programmer who tries to sprinkle magic dust over your code to get it to work. It smacks of cargo cult programming and anyone looking at the code will infer that you didn't know it wasn't required. Worse, they may think that you felt it was sufficient, and they'll be suspicious of any other code you have written, fearing you used "volatile" to hide multi-threading bugs rather than fixing them.
Now will the code behave as expected or will we need to make ready volatile to get this to work?
I would advise using volatile in this situation, though in this case it seems it is not needed.
In other words, personally I would have added volatile and removed the locking: it is not needed for setting/reading the variable's value.
Will different compilers and libraries handle this differently?
I am hoping that there might be something about the mutex functions that prevents optimization around them, or that the compiler generally does not optimize across function calls.
In your case the call to the function (pthread_mutex_lock()) has side effects and changes the execution environment. Thus the compiler has to reread the global variable, which might have been changed by the call to the function.
To be really sure, you want to consult C99's 5.1.2.3 Program execution, from which I have borrowed the terminology. To give you a taste:
[...] Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place. (A summary of the sequence points is given in annex C.)
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object). [...]
Excerpt from the annex C:
The following are the sequence points described in 5.1.2.3:
— The call to a function, after the arguments have been evaluated (6.5.2.2).
And from there on.
In my experience, compilers are sufficiently smart nowadays that even when optimizing aggressively they won't do anything fancy with a loop control variable that is a global variable.
Volatile is needed here in the presence of optimization. Otherwise the read of ready can be legally moved out of the while loop.
Assuming limitations to optimization that are not promised by the standard may be fine now, but cause great grief to future maintainers as compilers improve.
yes, volatile int ready = 0; is always needed here.
Update
If you would like no optimization around some code snippet, you can either use #pragma directives around that code (if your compiler supports them) or - which is totally portable - move the code to separate files and compile these files with no or little optimizations.
The latter approach might still require usage of the volatile keyword, since the ready variable might be used by other modules which you might want to optimize.
Most of the time, the definition of reentrancy is quoted from Wikipedia:
A computer program or routine is described as reentrant if it can be safely called again before its previous invocation has been completed (i.e. it can be safely executed concurrently). To be reentrant, a computer program or routine:
1. Must hold no static (or global) non-constant data.
2. Must not return the address to static (or global) non-constant data.
3. Must work only on the data provided to it by the caller.
4. Must not rely on locks to singleton resources.
5. Must not modify its own code (unless executing in its own unique thread storage).
6. Must not call non-reentrant computer programs or routines.
How is safely defined?
If a program can be safely executed concurrently, does it always mean that it is reentrant?
What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
Also,
Are all recursive functions reentrant?
Are all thread-safe functions reentrant?
Are all recursive and thread-safe functions reentrant?
While writing this question, one thing comes to mind:
Are the terms like reentrance and thread safety absolute at all i.e. do they have fixed concrete definitions? For, if they are not, this question is not very meaningful.
1. How is safely defined?
Semantically. In this case, this is not a hard-defined term. It just means "You can do that, without risk".
2. If a program can be safely executed concurrently, does it always mean that it is reentrant?
No.
For example, let's have a C++ function that takes both a lock, and a callback as a parameter:
#include <mutex>
typedef void (*callback)();
std::mutex m;
void foo(callback f)
{
    m.lock();
    // use the resource protected by the mutex
    if (f) {
        f();
    }
    // use the resource protected by the mutex
    m.unlock();
}
Another function could well need to lock the same mutex:
void bar()
{
    foo(nullptr);
}
At first sight, everything seems ok… But wait:
int main()
{
    foo(bar);
    return 0;
}
If the lock on mutex is not recursive, then here's what will happen, in the main thread:
main will call foo.
foo will acquire the lock.
foo will call bar, which will call foo.
the 2nd foo will try to acquire the lock, fail and wait for it to be released.
Deadlock.
Oops…
Ok, I cheated, using the callback thing. But it's easy to imagine more complex pieces of code having a similar effect.
3. What exactly is the common thread between the six points mentioned that I should keep in mind while checking my code for reentrant capabilities?
You can smell a problem if your function has/gives access to a modifiable persistent resource, or has/gives access to a function that smells.
(Ok, 99% of our code should smell, then… See last section to handle that…)
So, studying your code, one of those points should alert you:
The function has state (i.e. it accesses a global variable, or even a class member variable)
This function can be called by multiple threads, or could appear twice in the stack while the process is executing (i.e. the function could call itself, directly or indirectly). Functions taking callbacks as parameters smell a lot.
Note that non-reentrancy is viral: a function that could call a possibly non-reentrant function cannot be considered reentrant.
Note, too, that C++ methods smell because they have access to this, so you should study the code to be sure they have no funny interaction.
4.1. Are all recursive functions reentrant?
No.
In multithreaded cases, a recursive function accessing a shared resource could be called by multiple threads at the same moment, resulting in bad/corrupted data.
In singlethreaded cases, a recursive function could use a non-reentrant function (like the infamous strtok), or use global data without handling the fact the data is already in use. So your function is recursive because it calls itself directly or indirectly, but it can still be recursive-unsafe.
4.2. Are all thread-safe functions reentrant?
In the example above, I showed how an apparently threadsafe function was not reentrant. OK, I cheated because of the callback parameter. But then, there are multiple ways to deadlock a thread by having it acquire twice a non-recursive lock.
4.3. Are all recursive and thread-safe functions reentrant?
I would say "yes" if by "recursive" you mean "recursive-safe".
If you can guarantee that a function can be called simultaneously by multiple threads, and can call itself, directly or indirectly, without problems, then it is reentrant.
The problem is evaluating this guarantee… ^_^
5. Are the terms like reentrance and thread safety absolute at all, i.e. do they have fixed concrete definitions?
I believe they do, but then, evaluating whether a function is thread-safe or reentrant can be difficult. This is why I used the term smell above: you can find that a function is not reentrant, but it can be difficult to be sure a complex piece of code is reentrant.
6. An example
Let's say you have an object, with one method that needs to use a resource:
struct MyStruct
{
    P * p;
    void foo()
    {
        if (this->p == nullptr)
        {
            this->p = new P();
        }
        // lots of code, some using this->p
        if (this->p != nullptr)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
The first problem is that if somehow this function is called recursively (i.e. this function calls itself, directly or indirectly), the code will probably crash, because this->p will be deleted at the end of the last call, and still probably be used before the end of the first call.
Thus, this code is not recursive-safe.
We could use a reference counter to correct this:
struct MyStruct
{
    size_t c;
    P * p;
    void foo()
    {
        if (c == 0)
        {
            this->p = new P();
        }
        ++c;
        // lots of code, some using this->p
        --c;
        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
    }
};
This way, the code becomes recursive-safe… But it is still not reentrant because of multithreading issues: We must be sure the modifications of c and of p will be done atomically, using a recursive mutex (not all mutexes are recursive):
#include <mutex>
struct MyStruct
{
    std::recursive_mutex m;
    size_t c;
    P * p;
    void foo()
    {
        m.lock();
        if (c == 0)
        {
            this->p = new P();
        }
        ++c;
        m.unlock();
        // lots of code, some using this->p
        m.lock();
        --c;
        if (c == 0)
        {
            delete this->p;
            this->p = nullptr;
        }
        m.unlock();
    }
};
And of course, this all assumes the lots of code is itself reentrant, including the use of p.
And the code above is not even remotely exception-safe, but this is another story… ^_^
7. Hey 99% of our code is not reentrant!
It is quite true for spaghetti code. But if you partition your code correctly, you will avoid reentrancy problems.
7.1. Make sure all functions have NO state
They must only use the parameters, their own local variables, other functions without state, and return copies of the data if they return at all.
7.2. Make sure your object is "recursive-safe"
An object method has access to this, so it shares a state with all the methods of the same instance of the object.
So, make sure the object can be used at one point in the stack (i.e. calling method A), and then, at another point (i.e. calling method B), without corrupting the whole object. Design your object to make sure that upon exiting a method, the object is stable and correct (no dangling pointers, no contradicting member variables, etc.).
7.3. Make sure all your objects are correctly encapsulated
No one else should have access to their internal data:
// bad
int & MyObject::getCounter()
{
    return this->counter;
}

// good
int MyObject::getCounter()
{
    return this->counter;
}

// good, too
void MyObject::getCounter(int & p_counter)
{
    p_counter = this->counter;
}
Even returning a const reference could be dangerous if the user retrieves the address of the data, as some other portion of the code could modify it without the code holding the const reference being told.
7.4. Make sure the user knows your object is not thread-safe
Thus, the user is responsible for using mutexes when working with an object shared between threads.
The objects from the STL are designed to be not thread-safe (because of performance issues), and thus, if a user wants to share a std::string between two threads, the user must protect its access with concurrency primitives.
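A minimal sketch of what that caller-side protection might look like (the names here are my own, not part of any library):

#include <mutex>
#include <string>

std::mutex g_mutex;        // owned by the user of the string, not by std::string
std::string g_shared;

void append_from_any_thread(const std::string & part)
{
    std::lock_guard<std::mutex> lock(g_mutex);   // the caller serialises access
    g_shared += part;
}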
7.5. Make sure your thread-safe code is recursive-safe
This means using recursive mutexes if you believe the same resource can be used twice by the same thread.
"Safely" is defined exactly as the common sense dictates - it means "doing its thing correctly without interfering with other things". The six points you cite quite clearly express the requirements to achieve that.
The answers to your 3 questions are 3× "no".
Are all recursive functions reentrant?
NO!
Two simultaneous invocations of a recursive function can easily screw up each other if they access the same global/static data, for example.
Are all thread-safe functions reentrant?
NO!
A function is thread-safe if it doesn't malfunction if called concurrently. But this can be achieved e.g. by using a mutex to block the execution of the second invocation until the first finishes, so only one invocation works at a time. Reentrancy means executing concurrently without interfering with other invocations.
Are all recursive and thread-safe functions reentrant?
NO!
See above.
The common thread:
Is the behavior well defined if the routine is called while it is interrupted?
If you have a function like this:
int add( int a , int b ) {
    return a + b;
}
Then it is not dependent upon any external state. The behavior is well defined.
If you have a function like this:
int gValue;   // global, shared state

int add_to_global( int a ) {
    return gValue += a;
}
The result is not well defined on multiple threads. Information could be lost if the timing was just wrong.
The simplest form of a reentrant function is something that operates exclusively on the arguments passed and constant values. Anything else takes special handling or, often, is not reentrant. And of course the arguments must not reference mutable globals.
Now I have to elaborate on my previous comment. @paercebal's answer is incorrect. In the example code, didn't anyone notice that the mutex, which was supposed to be a parameter, wasn't actually passed in?
I dispute the conclusion, I assert: for a function to be safe in the presence of concurrency it must be re-entrant. Therefore concurrent-safe (usually written thread-safe) implies re-entrant.
Neither thread safe nor re-entrant have anything to say about arguments: we're talking about concurrent execution of the function, which can still be unsafe if inappropriate parameters are used.
For example, memcpy() is thread-safe and re-entrant (usually). Obviously it will not work as expected if called with pointers to the same targets from two different threads. That's the point of the SGI definition, placing the onus on the client to ensure accesses to the same data structure are synchronised by the client.
It is important to understand that in general it is nonsense to have thread-safe operation include the parameters. If you've done any database programming you will understand. The concept of what is "atomic" and might be protected by a mutex or some other technique is necessarily a user concept: processing a transaction on a database can require multiple un-interrupted modifications. Who can say which ones need to be kept in sync but the client programmer?
The point is that "corruption" doesn't have to be messing up the memory on your computer with unserialised writes: corruption can still occur even if all individual operations are serialised. It follows that when you're asking if a function is thread-safe, or re-entrant, the question means for all appropriately separated arguments: using coupled arguments does not constitute a counter-example.
There are many programming systems out there: OCaml is one, and I think Python as well, which have lots of non-reentrant code in them, but which use a global lock to interleave thread access. These systems are not re-entrant and they're not thread-safe or concurrent-safe; they operate safely simply because they prevent concurrency globally.
A good example is malloc. It is not re-entrant and not thread-safe. This is because it has to access a global resource (the heap). Using locks doesn't make it safe: it's definitely not re-entrant. If the interface to malloc had been designed properly, it would be possible to make it re-entrant and thread-safe:
malloc(heap*, size_t);
Now it can be safe because it transfers the responsibility for serialising shared access to a single heap to the client. In particular, no work is required if there are separate heap objects. If a common heap is used, the client has to serialise access. Using a lock inside the function is not enough: just consider a malloc locking a heap*, and then a signal comes along and calls malloc on the same pointer: deadlock. The signal handler can't proceed, and the client can't either, because it is interrupted.
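To make the idea concrete, here is a hedged sketch of such an interface; the names and the trivial bump-pointer implementation (which ignores alignment) are purely my own illustration, not a proposal for a real malloc:

#include <stddef.h>

typedef struct heap
{
    unsigned char *base;   /* client-provided arena           */
    size_t         size;   /* total size of the arena         */
    size_t         used;   /* allocation state, held per heap */
} heap;

void *heap_malloc(heap *h, size_t n)
{
    if (h->used + n > h->size)
        return NULL;               /* out of arena space */
    void *p = h->base + h->used;
    h->used += n;
    return p;                      /* no globals touched, no internal lock */
}

Two threads using two different heap objects need no synchronisation at all; two threads sharing one heap must serialise their calls themselves.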
Generally speaking, locks do not make things thread-safe... they actually destroy safety by inappropriately trying to manage a resource that is owned by the client. Locking has to be done by the object manufacturer; that's the only code that knows how many objects are created and how they will be used.
The "common thread" (pun intended!?) amongst the points listed is that the function must not do anything that would affect the behaviour of any recursive or concurrent calls to the same function.
So for example static data is an issue because it is owned by all threads; if one call modifies a static variable, then all threads use the modified data, thus affecting their behaviour. Self-modifying code (although rarely encountered, and in some cases prevented) would be a problem, because although there are multiple threads, there is only one copy of the code; the code is essentially static data too.
Essentially to be re-entrant, each thread must be able to use the function as if it were the only user, and that is not the case if one thread can affect the behaviour of another in a non-deterministic manner. Primarily this involves each thread having either separate or constant data that the function works on.
All that said, point (1) is not necessarily true; for example, you might legitimately and by design use a static variable to retain a recursion count to guard against excessive recursion or to profile an algorithm.
A thread-safe function need not be reentrant; it may achieve thread safety by specifically preventing reentrancy with a lock, and point (6) says that such a function is not reentrant. Regarding point (6), a function that calls a thread-safe function that locks is not safe for use in recursion (it will deadlock), and is therefore not said to be reentrant, though it may nonetheless be safe for concurrency, and would still be re-entrant in the sense that multiple threads can have their program counters in such a function simultaneously (just not within the locked region). Maybe this helps to distinguish thread-safety from reentrancy (or maybe adds to your confusion!).
The answers to your "Also" questions are "No", "No" and "No". Just because a function is recursive and/or thread-safe doesn't make it re-entrant.
Each of these types of function can fail on all the points you quote. (Though I'm not 100% certain of point 5.)
A non-reentrant function means that there is a static context maintained by the function. When you first enter it, a new context is created for you, and on subsequent calls you do not pass that state in as a parameter again; this is convenient for things like token analysis, e.g. strtok in C. If you have not cleared that context, there might be errors.
/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "- This, a sample string.";
    char * pch;
    printf ("Splitting string \"%s\" into tokens:\n", str);
    pch = strtok (str, " ,.-");
    while (pch != NULL)
    {
        printf ("%s\n", pch);
        pch = strtok (NULL, " ,.-");
    }
    return 0;
}
In contrast to a non-reentrant function, a reentrant function can be called at any time and gives the same result without side effects, because there is no hidden context.
From the point of view of thread safety, it just means that only one modification of a shared variable happens at a time within the process, so you should add a lock guard to ensure only one change to the shared field happens at a time.
So thread safety and reentrancy are two different things viewed from different angles: reentrancy says you should clear the context before the next round of analysis; thread safety says you should order accesses to shared fields.
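For comparison, POSIX provides strtok_r, where the hidden static context is replaced by a caller-owned pointer, which is what makes it reentrant:

/* strtok_r example */
#include <stdio.h>
#include <string.h>

int main ()
{
    char str[] = "- This, a sample string.";
    char *saveptr;                            /* caller-owned context */
    char *pch = strtok_r (str, " ,.-", &saveptr);
    while (pch != NULL)
    {
        printf ("%s\n", pch);
        pch = strtok_r (NULL, " ,.-", &saveptr);
    }
    return 0;
}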
The terms "Thread-safe" and "re-entrant" mean only and exactly what their definitions say. "Safe" in this context means only what the definition you quote below it says.
"Safe" here certainly doesn't mean safe in the broader sense that calling a given function in a given context won't totally hose your application. Altogether, a function might reliably produce a desired effect in your multi-threaded application but not qualify as either re-entrant or thread-safe according to the definitions. Oppositely, you can call re-entrant functions in ways that will produce a variety of undesired, unexpected and/or unpredictable effects in your multi-threaded application.
A recursive function can be anything, and re-entrant has a stronger definition than thread-safe, so the answers to your numbered questions are all no.
Reading the definition of re-entrant, one might summarize it as meaning a function which will not modify anything beyond what you call it to modify. But you shouldn't rely on the summary alone.
Multi-threaded programming is just extremely difficult in the general case. Knowing which parts of one's code are re-entrant is only part of this challenge. Thread safety is not additive. Rather than trying to piece together re-entrant functions, it's better to use an overall thread-safe design pattern and use this pattern to guide your use of every thread and shared resource in your program.
In certain cases - especially when an exception escapes a destructor during stack unwinding - C++ runtime calls terminate() which must do something reasonable post-mortem and then exit the program. When a question "why so harsh" arises the answer is usually "there's nothing more reasonable to do in such error situations". That sounds reasonable if the whole program is in C++.
Now what if the C++ code is in a library and the program that uses the library is not in C++? This happens quite often - for example I might have a native C++ COM component consumed by a .NET program. Once terminate() is called inside the component code, the .NET program suddenly ends abnormally. The program author will first of all think "I don't care about C++, why the hell is this library making my program exit?"
How do I handle the latter scenario when developing libraries in C++? Is it reasonable that terminate() unexpectedly ends the program? Is there a better way to handle such situations?
Why is the C++ runtime calling terminate()? It doesn't do it at random, or due to circumstances which cannot be defined and/or avoided when the code is written. It does it because your code does something that is defined to result in a call to terminate(), such as throwing an exception from a destructor during stack unwinding.
There is a list in the C++ standard of all the situations which are defined to result in call to terminate(). If you don't want terminate() to be called, don't do any of those things in your code. The same applies to unexpected(), abort(), and so on.
I don't think this is any different really from the fact that you have to avoid undefined behavior, or in general avoid writing code which is wrong. You also have to avoid behavior which is defined but undesirable.
Perhaps you have a particular example where it is difficult to avoid a call to terminate(), but throwing an exception from a destructor during stack unwinding isn't it. Just don't throw exceptions out of destructors, ever. This means designing your destructors such that if they do something which might fail, the destructor catches the exception and your code continues in a defined state.
There are some situations where your system will impolitely hack your process away at the knees because of something that your C++ code does (although not by calling terminate()). For example, if the system is overcommitting memory and the VMM can't honour the promises malloc/new have made, then your process might be killed. But that's a system feature, and probably applies equally to the other language that's calling your C++. I don't think there's anything you can (or need to) do about it, as long as your caller is aware that your library might allocate memory. In that circumstance it's not your code's fault the process died, it's the defined response of the operating system to low-memory conditions.
I think the more fundamental issue isn't specifically what terminate does, but the library's design. The library may be designed to be used only with C++ in which case exceptions can be caught and handled as appropriate within the app.
If the library is intended to be used in conjunction with non-C++ apps, it needs to provide an interface that ensures no exceptions leave the library, for example an interface that does a catch(...).
Suppose you have a function in C++ called cppfunc and you are invoking it from another language (such as C or .NET). I suggest you create a wrapper function, let's say exportedfunc, like so:
int exportedfunc(resulttype* outresult, paramtype1 param1, /* ... */)
{
    try {
        *outresult = cppfunc(param1, /* ... */);
        return 0;                  // indicate success
    } catch (...) {                // may want to have other handlers
        /* possibly set other error status info */
        return -1;                 // indicate failure
    }
}
Basically, you need to ensure that exceptions do not cross language boundaries... so you need to wrap your C++ functions with a function that catches all exceptions and reports a status code or does something acceptable other than invoking std::terminate.
The default terminate handler will call abort. If you don't want this behavior, define your own terminate handler and set it using set_terminate.
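A minimal sketch of that (the message text is my own; note that a terminate handler must not return, so it still has to end the process, typically via abort()):

#include <cstdlib>
#include <exception>
#include <iostream>

void my_terminate()
{
    std::cerr << "library error: terminating\n";   // last-chance logging
    std::abort();                                  // a terminate handler must not return
}

int main()
{
    std::set_terminate(my_terminate);
    throw 1;   // no matching handler anywhere -> terminate() -> my_terminate()
}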