Effect of compiler optimization in C++

I have the following code, which works in a debug build but not in a release build with g++ optimizations turned on. (By "works" I mean that when the main thread sets the flag stop to true, the looping thread exits.) I know this issue can be fixed by adding the volatile keyword. My question, however, is to understand what exactly is happening in this case. This is the code:
#include <iostream>
#include <thread>
#include <unistd.h> // POSIX sleep()
int main() {
bool stop = false; // plain bool, written by main and read by the reader thread
std::thread reader_thread;
auto reader = [&]() {
std::cout << "starting reader" << std::endl;
//BindToCPU(reader_thread, 0);
while(!stop) {
}
std::cout << "exiting reader" << std::endl;
};
reader_thread = std::thread(reader);
sleep(2);
stop = true;
std::cout << "stopped" << std::endl;
reader_thread.join();
}
Why does this happen? Is it because of compiler optimizations, or because of cache coherency issues? Some details on what exactly happens underneath would be appreciated.

The behavior of the program is undefined. The problem is that two threads are both accessing the variable named stop, and one of them is writing to it. In the C++ standard, that's the definition of a data race, and the result is undefined behavior. To get rid of the data race you have to introduce synchronization in some form. The simplest way to do this is to change the type of stop from bool to std::atomic<bool>. Alternatively, you could use a mutex, but for this particular example, that would be overkill.
Marking stop as volatile might make the symptoms go away, but it does not fix the problem.
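For comparison, the mutex alternative mentioned above could look roughly like this. It is a sketch, not a recommendation: every read and write of stop happens while holding the same mutex, which is what creates the required synchronization. The function name run_reader_with_mutex and the 200 ms delay are illustrative assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

// Sketch: stop is only touched while m is held, so the write in main
// happens-before the reader's read that observes true.
bool run_reader_with_mutex() {
    std::mutex m;
    bool stop = false;           // protected by m
    bool reader_exited = false;  // protected by m

    std::thread reader([&] {
        std::cout << "starting reader" << std::endl;
        for (;;) {
            {
                std::lock_guard<std::mutex> lock(m);
                if (stop) break;           // lock is released on break
            }
            std::this_thread::yield();     // give the writer a chance at m
        }
        std::cout << "exiting reader" << std::endl;
        std::lock_guard<std::mutex> lock(m);
        reader_exited = true;
    });

    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    {
        std::lock_guard<std::mutex> lock(m);
        stop = true;
    }
    std::cout << "stopped" << std::endl;
    reader.join();
    return reader_exited;  // read after join(), so no race
}
```

Calling run_reader_with_mutex() returns true once the reader has seen the flag and exited. For a single flag, std::atomic<bool> does the same job with less machinery, which is why the answer calls the mutex overkill here.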

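The problem is that the compiler, specifically its optimization phase, cannot tell that the program actually does anything. In particular, it cannot tell that "stop" can ever be anything except false: nothing inside the reader's loop writes to it, and without synchronization the compiler may assume no other thread does either, so it is allowed to read stop once and spin forever. The best and simplest solution is to make "stop" atomic. Here is the corrected program, with a bonus "sleep" routine at no extra charge.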
#include <iostream>
#include <thread>
#include <chrono>
#include <atomic>
inline void std_sleep(long double seconds) noexcept
{
using duration_t = std::chrono::duration<long long, std::nano>;
const auto duration = duration_t(static_cast<long long> (seconds * 1e9));
std::this_thread::sleep_for(duration);
}
int main() {
std::atomic<bool> stop{false}; // brace-init; "= false" only compiles from C++17 on
std::thread reader_thread;
auto reader = [&stop]() {
std::cout << "starting reader" << std::endl;
//BindToCPU(reader_thread, 0);
while(!stop) {
std_sleep(.25);
}
std::cout << "exiting reader" << std::endl;
};
reader_thread = std::thread(reader);
std_sleep(2.0);
stop = true;
std::cout << "stopped" << std::endl;
reader_thread.join();
return 0;
}

You have several threads which access the same variable, and one of them writes to the variable. This situation is called a data race. A data race is undefined behavior, and compilers tend to do funny/catastrophic things in these cases.
One example, which happens to match your description, is given here, in section 2.3:
... violate the compiler's assumption that ordinary variables do not change without being assigned to ... If the variable is not annotated at all,
the loop waiting for another thread to set flag:
while (!flag) {}
could even be transformed to the, now likely infinite, but sequentially equivalent, loop:
tmp = flag; // tmp is local
while (!tmp) {}
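Making the flag a std::atomic rules out that transformation: each loop iteration must perform a fresh atomic load. The sketch below goes one step further and uses an explicit release store and acquire load, so that an ordinary variable written before the flag is also guaranteed to be visible to the waiting thread. The names publish_and_consume and payload are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch: an atomic flag cannot be hoisted into a local "tmp" like a plain
// bool, and release/acquire also publishes the ordinary write to payload.
int publish_and_consume() {
    std::atomic<bool> flag{false};
    int payload = 0;   // plain int, handed over via the flag
    int seen = -1;

    std::thread waiter([&] {
        // acquire: once true is observed, the write to payload is visible
        while (!flag.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    payload = 42;                                 // ordinary write
    flag.store(true, std::memory_order_release);  // publish it
    waiter.join();
    return seen;  // safe: join() synchronizes with the end of waiter
}
```

publish_and_consume() returns 42 on every run. With a plain bool for flag, the same code would be a data race, and both the infinite loop and a stale payload would be permitted outcomes.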
Another article about this type of race condition is this one.

Related

thread::detach during concurrent thread::join in C++

In the following program, while one thread (main) is performing t.join(), another thread (x) calls t.detach():
#include <chrono>
#include <thread>
#include <iostream>
int main(void) {
auto t = std::thread([] {
std::this_thread::sleep_for( std::chrono::milliseconds(1000) );
} );
auto x = std::thread([&t] {
std::this_thread::sleep_for( std::chrono::milliseconds(500) );
if ( t.joinable() )
{
std::cout << "detaching t..." << std::endl;
t.detach();
}
} );
std::cout << "joining t..." << std::endl;
t.join();
x.join();
std::cout << "Ok" << std::endl;
return 0;
}
It works fine with GCC's libstdc++ and Clang's libc++, printing
joining t...
detaching t...
Ok
but in Visual Studio the program terminates with a non-zero exit code before printing Ok. Online demo: https://gcc.godbolt.org/z/v1nEfaP7a
Is it a bug in Visual Studio, or does the program contain some undefined behavior?

Neither join nor detach are const-qualified and therefore the implementation is allowed to modify internal memory of the thread object without having to provide any guarantees of write/write or write/read data race avoidance on unsynchronized calls to these member functions per the default data race avoidance requirements of [res.on.data.races].
There is also no exception to this rule mentioned in [thread.threads] or anywhere else for these functions.
Therefore calling join and detach without establishing a happens-before relation between the two calls is a data race and causes undefined behavior.
Even without the detach call, there is still a write/read data race on the join/joinable pair of calls.
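One way to establish the missing happens-before relation is to route every call on the std::thread through a single mutex. The GuardedThread class below is a hypothetical wrapper, not a standard type. It holds the mutex for the entire join, so a concurrent detach_if_joinable simply waits and then finds the thread no longer joinable (or, if it wins the race, detaches first and the join becomes a no-op). The sleep durations are shortened, illustrative values.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical wrapper: serializes join/detach/joinable on one std::thread.
class GuardedThread {
public:
    template <class F>
    explicit GuardedThread(F&& f) : t_(std::forward<F>(f)) {}
    ~GuardedThread() { join_if_joinable(); }  // never destroy a joinable thread

    bool join_if_joinable() {
        std::lock_guard<std::mutex> lock(m_);
        if (!t_.joinable()) return false;
        t_.join();  // the mutex stays held while waiting
        return true;
    }
    bool detach_if_joinable() {
        std::lock_guard<std::mutex> lock(m_);
        if (!t_.joinable()) return false;
        t_.detach();
        return true;
    }
    bool joinable() {
        std::lock_guard<std::mutex> lock(m_);
        return t_.joinable();
    }

private:
    std::mutex m_;   // declared before t_, so it is constructed first
    std::thread t_;
};

bool run_join_detach_demo() {
    GuardedThread t([] {
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    });
    std::thread x([&t] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        if (t.detach_if_joinable()) std::cout << "detached t" << std::endl;
    });
    std::cout << "joining t..." << std::endl;
    t.join_if_joinable();
    x.join();
    std::cout << "Ok" << std::endl;
    return !t.joinable();  // either joined or detached, never both
}
```

The design choice here is that ownership questions ("who joins?") are decided under one lock; in practice, giving a single thread sole ownership of the std::thread avoids the problem altogether.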

Unexpected behavior when std::thread.detach is called

I've been trying to develop a better understanding of C++ threading, to which end I have written the following example:
#include <functional>
#include <iostream>
#include <thread>
class Test {
public:
Test() { x = 5; }
void act() {
std::cout << "1" << std::endl;
std::thread worker(&Test::changex, this);
worker.detach();
std::cout << "2" << std::endl;
}
private:
void changex() {
std::cout << "3" << std::endl;
x = 10;
std::cout << "4" << std::endl;
}
int x;
};
int main() {
Test t;
t.act();
return 0;
}
I expected the following output when compiling with g++ and linking with -pthread:
1
2
3
4
as the cout calls are in that order. However, the output is inconsistent. 1 and 2 are always printed in order, but sometimes 3 and/or 4 are either omitted or printed twice, e.g. 12, 123, 1234, or 12344.
My working theory is that the main thread exits before the worker thread begins or finishes its work, so some of the output is lost. One solution that comes to mind is a global boolean variable, set by the worker thread when it has completed, that the main thread waits on before exiting. This alleviates the issue.
However, this feels like a messy approach, and there is likely a cleaner solution, especially for an issue that probably comes up often in threading.

Just some general advice, which holds both for using raw pthreads in C++ and for pthreads wrapped in std::thread: the best way to get readable, comprehensible and debuggable behavior is to make thread synchronization and lifetime management explicit. I.e. avoid using pthread_kill and pthread_cancel, and in most cases avoid detaching threads; join them explicitly instead.
One design pattern I like is using a std::atomic flag. When the main thread wants to quit, it sets the atomic flag to true. The worker threads typically do their work in a loop and check the atomic flag reasonably often, e.g. once per lap of the loop. When they find main has ordered them to quit, they clean up and return. The main thread then join:s with all workers.
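A minimal sketch of that atomic-flag pattern follows. The per-lap work is a placeholder sleep, and run_workers is a hypothetical name; it returns how many workers reached their clean-up code.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

// Sketch: workers check an atomic quit flag once per lap; main sets the
// flag and then joins every worker explicitly (no detach).
int run_workers(int n_workers) {
    std::atomic<bool> quit{false};
    std::atomic<int> exited{0};
    std::vector<std::thread> workers;

    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&quit, &exited] {
            while (!quit.load()) {
                // one lap of (placeholder) work
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
            }
            // clean-up would go here
            exited.fetch_add(1);
        });
    }

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    quit.store(true);                  // order all workers to quit
    for (auto& w : workers) w.join();  // wait for each one to return
    return exited.load();
}
```

run_workers(4) returns 4: join() guarantees every worker has finished before the count is read, and no thread outlives the data it references.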
There are some special cases that require extra care, for example when one worker is stuck in a blocking syscall and/or C library function. Usually, the platform provides ways of getting out of such blocking calls without resorting to e.g. pthread_cancel, since thread cancellation works very badly with C++. One example of how to avoid blocking is the Linux manpage for getaddrinfo_a, i.e. asynchronous network address translation.
Another nice design pattern applies when workers are sleeping in, e.g., select(). You can then add an extra control pipe between main and the worker. Main signals the worker to quit by write():ing one byte into the pipe, thus waking up the worker if it sleeps in select().
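The control-pipe idea could be sketched as below. This is POSIX-only, and run_control_pipe_demo is a hypothetical name; a real worker would also put its sockets into the same fd_set and service them in the loop.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <thread>
#include <unistd.h>

// POSIX sketch: the worker sleeps in select() on the read end of a control
// pipe; main wakes it by writing one byte.
bool run_control_pipe_demo() {
    int fds[2];
    if (pipe(fds) != 0) return false;
    const int read_end = fds[0];
    const int write_end = fds[1];
    bool got_quit_byte = false;  // written by worker, read after join()

    std::thread worker([read_end, &got_quit_byte] {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(read_end, &readable);
            // No timeout: sleep until some descriptor is readable.
            int n = select(read_end + 1, &readable, nullptr, nullptr, nullptr);
            if (n < 0) {
                if (errno == EINTR) continue;  // interrupted by a signal: retry
                return;                        // real error: give up
            }
            if (FD_ISSET(read_end, &readable)) {
                char byte;
                got_quit_byte = (read(read_end, &byte, 1) == 1);
                return;  // main asked us to quit: clean up and return
            }
            // ...other descriptors would be serviced here...
        }
    });

    const char quit = 'q';
    const bool sent = (write(write_end, &quit, 1) == 1);  // wake the worker
    close(write_end);  // if the write failed, EOF still wakes the worker
    worker.join();
    close(read_end);
    return sent && got_quit_byte;
}
```

It does not matter whether the worker has reached select() yet when main writes: the byte stays in the pipe buffer, so the next select() returns immediately.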

Example of how this could be done:
#include <functional>
#include <iostream>
#include <thread>
class Test {
std::thread worker; // worker is now a member
public:
Test() { x = 5; } // worker deliberately left without a function to run.
~Test()
{
if (worker.joinable()) // worker can be joined (act was called successfully)
{
worker.join(); // wait for worker thread to exit.
// Note destructor cannot complete if thread cannot be exited.
// Some extra brains needed here for production code.
}
}
void act() {
std::cout << "1" << std::endl;
worker = std::thread(&Test::changex, this); // give worker some work
std::cout << "2" << std::endl;
}
// rest unchanged.
private:
void changex() {
std::cout << "3" << std::endl;
x = 10;
std::cout << "4" << std::endl;
}
int x;
};
int main() {
Test t;
t.act();
return 0;
} // test destroyed here. Destruction halts and waits for thread.

What's the proper way of implementing 'sleeping' technique using C++?

Two threads. The main one is constantly gathering notifications while the other one processes some of them.
I've been told that the way I implemented it is not correct. What problems does it cause, and what's wrong with it?
#include <iostream>
#include <atomic>
#include <thread>
#include <mutex>
#include <chrono>
#include <condition_variable>
std::condition_variable foo;
std::mutex mtx;
void secondThread()
{
while (true)
{
foo.wait(std::unique_lock<std::mutex>(mtx));
std::cout << " ----------------------------" << std::endl;
std::cout << "|processing a notification...|" << std::endl;
std::cout << " ----------------------------" << std::endl;
}
}
int main()
{
std::thread subThread = std::thread(&secondThread);
int count = 0;
while (true)
{
if (count % 10 == 0)
{
foo.notify_one();
}
std::cout << "Main thread working on gathering notifications..." << std::endl;
std::this_thread::sleep_for(std::chrono::milliseconds(300));
count++;
}
return 0;
}
I was told that this foo.wait(std::unique_lock<std::mutex>(mtx)) line of code is not good practice according to the C++ spec, and that this is not a proper way of solving this kind of problem. This kind of waiting is also called sleeping (as opposed to busy waiting).

Before you call wait, you must check that the thing you are waiting for hasn't already happened. And before you stop calling wait, you must check that the thing you are waiting for has happened. Condition variables are stateless and have no idea what you're waiting for. It's your job to code that.
Also, the associated mutex must protect the thing you're waiting for. The entire point of a condition variable is to provide an atomic "unlock and wait" operation to prevent this problem:
1. You check if you need to wait under the protection of a mutex.
2. You decide you do need to wait.
3. You unlock the mutex so other threads can make progress.
4. You wait.
But what if the thing you're waiting for happens after you unlocked the mutex but before you waited? You'll be waiting for something that already happened.
This is why the wait function takes a lock holder -- so that it can perform steps 3 and 4 atomically.
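Applied to the notification program, that could look like the sketch below. It assumes a hypothetical pending counter as "the thing you are waiting for" and a done flag for shutdown, both protected by the mutex. The consumer uses a named unique_lock (not a temporary) and the predicate overload of wait(), which checks the condition before sleeping and again after every wakeup.

```cpp
#include <cassert>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

// Sketch: the mutex protects the state being waited for (pending, done);
// wait(lock, pred) re-checks it, so missed or spurious wakeups are harmless.
int process_notifications(int total) {
    std::mutex mtx;
    std::condition_variable cv;
    int pending = 0;    // notifications gathered but not yet processed
    bool done = false;  // producer has finished
    int processed = 0;

    std::thread consumer([&] {
        std::unique_lock<std::mutex> lock(mtx);
        for (;;) {
            cv.wait(lock, [&] { return pending > 0 || done; });
            if (pending == 0 && done) return;
            --pending;
            ++processed;
            lock.unlock();  // do the slow work without holding the mutex
            std::cout << "processing a notification..." << std::endl;
            lock.lock();
        }
    });

    for (int i = 0; i < total; ++i) {
        {
            std::lock_guard<std::mutex> guard(mtx);
            ++pending;  // change the state first...
        }
        cv.notify_one();  // ...then notify
    }
    {
        std::lock_guard<std::mutex> guard(mtx);
        done = true;
    }
    cv.notify_one();
    consumer.join();
    return processed;
}
```

Because the state lives in pending rather than in the notification itself, a notify_one() that arrives while the consumer is busy printing is not lost: process_notifications(5) always returns 5.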

C++ condition_variable wait_for returns instantly

This is a simple example from http://www.cplusplus.com/reference/condition_variable/condition_variable/wait_for/
Why does wait_for() return instantly if I comment out the line that starts the thread?
Like this:
// condition_variable::wait_for example
#include <iostream> // std::cout
#include <thread> // std::thread
#include <chrono> // std::chrono::seconds
#include <mutex> // std::mutex, std::unique_lock
#include <condition_variable> // std::condition_variable, std::cv_status
std::condition_variable cv;
int value;
void read_value() {
std::cin >> value;
cv.notify_one();
}
int main ()
{
std::cout << "Please, enter an integer (I'll be printing dots): ";
//std::thread th (read_value);
std::mutex mtx;
std::unique_lock<std::mutex> lck(mtx);
while (cv.wait_for(lck,std::chrono::seconds(1))==std::cv_status::timeout) {
std::cout << '.';
}
std::cout << "You entered: " << value << '\n';
//th.join();
return 0;
}
Update:
Please don't look for other problems in this example (related to buffering cout ...). The original question was about why wait_for is skipped.

Short answer: compile with -pthread and your issue will go away.
Update:
This is a confirmed bug/issue in libstdc++. Without -pthread being passed in as a compiler flag, the timed wait call will return immediately. Given the history of the issue (3 years), it's not likely to be fixed anytime soon. Anyway, read my message below on why you should be using condition variables with predicates to avoid the spurious wakeup problem. It still holds true even if you are linking with the posix threads library.
That sample code on cplusplus.com has several issues. For starters, amend this line:
std::cout << '.';
To be like this:
std::cout << '.';
std::cout.flush();
Otherwise, you won't see any dots if stdout isn't getting flushed.
If you compile your program (with the thread commented out) like this:
g++ yourcode.cpp -std=c++11
Then the resulting a.out program exhibits the issue you described when the thread is not used. That is, there's a spurious wakeup when the thread is not used. It's like there's a phantom notify() call being invoked on the condition variable from some unknown source. This is odd, but not impossible.
But as soon as you uncomment the declaration of the thread variable, the program will throw an exception (and crash) because multithreading was not enabled when building it:
terminate called after throwing an instance of 'std::system_error'
what(): Enable multithreading to use std::thread: Operation not permitted
Please, enter an integer (I'll be printing dots): Aborted (core dumped)
Interesting, so let's fix that by recompiling with -pthread
g++ yourcode.cpp -std=c++11 -pthread
Now everything works as expected with or without the thread. No more spurious wakeup it seems.
Now let's talk about why you are seeing the behavior you are seeing. Programs using condition variables should always be written to deal with spurious wakeup. And preferably, use a predicate statement. That is, you might get a phantom notify causing your wait or wait_for statement to return early. The example code on the web from cplusplus.com doesn't use a predicate nor does it deal with this possibility.
Let's amend it as follows:
Change this block of code:
while (cv.wait_for(lck,std::chrono::seconds(1))==std::cv_status::timeout) {
std::cout << '.';
}
To be this:
while (cv.wait_for(lck,std::chrono::seconds(1), condition_check)==false) {
std::cout << '.';
std::cout.flush();
}
And then, outside of main (after the declaration of value, but before main), add this function:
bool condition_check() {
return (value != 0);
}
Now the wait loop will wake up every second and/or when the notify call is made by the input thread. The wait loop will continue until value != 0. (Technically, value should be synchronized between threads, either with the lock or as a std::atomic value, but that's a minor detail).
Now the mystery is why the non-predicate version of wait_for suffers from the spurious wakeup problem. My guess is that it's an issue with the single-threaded C++ runtime that goes away when the multithreaded runtime (-pthread) is used. Perhaps condition_variable has different behavior or a different implementation when the POSIX thread library is linked in.

There are several issues with this code:
First, as you have noticed, the program has to be built with the -pthread option.
Second, you need to flush the output if you want to see the dots printed.
Most importantly, this is entirely incorrect usage of mutex and condition variable. A condition variable notification indicates a change of value in a user-specified predicate/condition: the changing of the condition and examining it must be atomic and serialized: otherwise there is a data race and the behavior of the program would be undefined.
As is the case with the example program: value is read and written by two threads without any concurrency control mechanism; to put it differently, there's no "happens-before" relation between the operation that reads value and the operation that writes value.
Fixed example follows:
// condition_variable::wait_for example
#include <chrono> // std::chrono::seconds
#include <condition_variable> // std::condition_variable, std::cv_status
#include <iostream> // std::cout
#include <mutex> // std::mutex, std::unique_lock
#include <thread> // std::thread
std::mutex mtx;
std::condition_variable cv;
int value;
void read_value() {
int v;
std::cin >> v;
std::unique_lock<std::mutex> lck(mtx);
value = v;
cv.notify_one();
}
int main() {
std::cout << "Please, enter an integer (I'll be printing dots): ";
std::thread th(read_value);
std::unique_lock<std::mutex> lck(mtx);
while (cv.wait_for(lck, std::chrono::seconds(1)) == std::cv_status::timeout) {
std::cout << '.' << std::flush;
}
std::cout << "You entered: " << value << '\n';
th.join();
return 0;
}
So, what are the changes:
the mutex is moved to global scope (for the sake of the example), so the thread that reads value can lock it in order to modify value;
the input is read into a separate variable; it cannot be read directly into value, because value must be modified only under the protection of the mutex, but holding the mutex while waiting for input from std::cin would prevent the main thread from printing dots, as it will try to acquire the mutex upon timeout;
after each dot is output, std::cout is flushed.
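The fixed example above still uses the non-predicate wait_for, so a spurious wakeup could end the loop before anything was entered. A sketch combining both answers follows: value is written under the mutex, and the predicate overload of wait_for checks a ready flag. Two assumptions to note: the std::cin read is replaced by a hypothetical delayed producer (produce_value) so the sketch runs unattended, and the 100 ms timeout is illustrative. A separate ready flag also lets 0 be a legitimate input, which the value != 0 predicate does not.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mtx;
std::condition_variable cv;
int value = 0;       // protected by mtx
bool ready = false;  // protected by mtx; set together with value

// Hypothetical stand-in for read_value(): sleeps instead of reading std::cin.
void produce_value(int v, std::chrono::milliseconds delay) {
    std::this_thread::sleep_for(delay);
    {
        std::lock_guard<std::mutex> lck(mtx);
        value = v;
        ready = true;
    }
    cv.notify_one();
}

int wait_with_dots(int v, std::chrono::milliseconds delay) {
    {
        std::lock_guard<std::mutex> reset(mtx);
        ready = false;
    }
    std::thread th(produce_value, v, delay);
    std::unique_lock<std::mutex> lck(mtx);
    // wait_for(lock, timeout, pred) returns pred(): false means the timeout
    // expired with the condition still unmet, so print a dot and keep waiting.
    while (!cv.wait_for(lck, std::chrono::milliseconds(100), [] { return ready; })) {
        std::cout << '.' << std::flush;
    }
    const int result = value;  // read while still holding the mutex
    lck.unlock();
    th.join();
    std::cout << "\nYou entered: " << result << '\n';
    return result;
}
```

For example, wait_with_dots(0, std::chrono::milliseconds(250)) prints a couple of dots and returns 0.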

Handling mutual exclusion in C++11

I have a class representing a finite-state machine, which should run in an endless loop and check its current state. In each state, the machine sets its next state and either falls into the idle state or does some work. I would like to allow another thread to change the state of the machine while it's working. As expected, this causes a race condition, so I added a mutex lock/unlock around the machine's loop and around the public method that allows other threads to change the machine's current state.
class Robot
{
public:
enum StateType {s1,s2,s3,idle,finish};
void run();
void move();
private:
StateType currentState;
StateType nextState;
StateType previousState;
std::mutex mutal_state;
};
Implementation:
void Robot::run()
{
this->currentState = s1;
while(true)
{
mutal_state.lock();
switch(currentState)
{
case s1:
// do some useful stuff here...
currentState = idle;
nextState = s3;
break;
case s2:
// do some other useful stuff here...
currentState = idle;
nextState = finish;
break;
case s3:
// again, do some useful things...
currentState = idle;
nextState = s2;
break;
case idle:
// busy waiting...
std::cout << "I'm waiting" << std::endl;
break;
case finish:
std::cout << "Bye" << std::endl;
mutal_state.unlock();
return;
}
mutal_state.unlock();
}
}
And the move method that allows other threads to change current state:
void Robot::move()
{
mutal_state.lock();
previousState = currentState; // Booommm
currentState = nextState;
mutal_state.unlock();
}
I can't figure out what I'm doing wrong! The program crashes on the first line of the move() function. On top of that, GDB isn't working with C++11, so tracing the code is not possible...
UPDATE:
Playing around with the code, I can see that the problem is in the move function: the program crashes when it tries to lock the mutex inside move(). For example, if move is like this:
void Robot::move()
{
std::cout << "MOVE IS CALLED" << std::endl;
mutal_state.lock();
//previousState = currentState;
//std::cout << "MOVING" << std::endl;
//currentState = nextState;
mutal_state.unlock();
}
Output is:
s1
I'm waiting
I'm waiting
MOVE IS CALLED1
The program has unexpectedly finished.
But when move is a simple function, not doing anything:
void Robot::move()
{
std::cout << "MOVE IS CALLED" << std::endl;
//mutal_state.lock();
//previousState = currentState;
//std::cout << "MOVING" << std::endl;
//currentState = nextState;
//mutal_state.unlock();
}
Program runs concurrently.

My suggestions:
1) If you have no debugger, how can you be so sure it is the first line of move that crashes? It is always worth questioning any assumptions you have made about the code unless you have hard evidence to back them up.
2) I would look at whatever interesting code is in state s3, as this is what the first call to move will perform. Up to that point the code in s3 has not been run. Either that or remove all code bar what is in the posted example, to rule this out.
3) The compiler may make copies of the variables in registers, you should declare all the states as volatile so it knows not to optimise in this way.

I can't tell you why your code "explodes", but I suspect the problem is not in the code you posted, as it runs fine for me.
For me it outputs:
I'm working
...
Bye
Code:
int main() {
Robot r;
auto async_moves = [&] () { // simulate some delayed interaction
std::this_thread::sleep_for(std::chrono::seconds(2)); //See note
for(auto i = 0; i != 3; ++i)
r.move();
};
auto handle = std::async(std::launch::async, async_moves);
r.run();
}
(Note: You have to compile with -D_GLIBCXX_USE_NANOSLEEP assuming you are using gcc, see this question.)
Note that the code above (and maybe yours, too) is still vulnerable to the problem that the states may get invalidated if move is called twice or more before the loop triggers again.
Like one of the comments already mentioned, prefer to use lock_guards:
std::lock_guard<std::mutex> lock(mutal_state);
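A lock_guard version of the Robot could be sketched as follows. Each lock is scoped to a block, so it is released on every path, including the early return in the finish state. The sketch adds two hypothetical changes that are not in the original: move() only acts when the machine is idle (one way to address the "move called twice" problem noted above), and there is a state() accessor plus a run_robot_demo driver so the result can be checked.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>

class Robot {
public:
    enum StateType { s1, s2, s3, idle, finish };

    void run() {
        for (;;) {
            {   // the lock_guard releases the mutex at the end of this block
                std::lock_guard<std::mutex> lock(mutal_state);
                switch (currentState) {
                case s1: currentState = idle; nextState = s3; break;
                case s2: currentState = idle; nextState = finish; break;
                case s3: currentState = idle; nextState = s2; break;
                case idle: break;
                case finish: return;  // the lock is released here too
                }
            }
            // Pause outside the lock so move() can acquire it.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    // Hypothetical guard: only leave idle, so a second early move is ignored.
    bool move() {
        std::lock_guard<std::mutex> lock(mutal_state);
        if (currentState != idle) return false;
        previousState = currentState;
        currentState = nextState;
        return true;
    }

    StateType state() {
        std::lock_guard<std::mutex> lock(mutal_state);
        return currentState;
    }

private:
    StateType currentState = s1;
    StateType nextState = idle;
    StateType previousState = idle;
    std::mutex mutal_state;
};

bool run_robot_demo() {
    Robot r;
    std::thread mover([&r] {
        int moves = 0;
        while (moves < 3) {  // s1 -> s3 -> s2 -> finish takes three moves
            if (r.move()) ++moves;
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
        }
    });
    r.run();
    mover.join();
    std::cout << "Bye" << std::endl;
    return r.state() == Robot::finish;
}
```

Also note that every member is initialized here. In the original, the first move() could copy an uninitialized nextState if it ran before the s1 case.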

If you're using g++ on linux, you need to link with -lpthread in order for mutexes or threading stuff to work properly. If you don't, it won't fail to link, but will instead behave badly or crash at runtime...

I'm answering my own question, because I found the problem, and it was not related to locking or to the C++0x mutex implementation. There is an ImageProcess class that should control the state of Robot. It has a pointer to its parent of type Robot*, and it uses that pointer to move its parent. For that I've implemented a workhorse and a starter function. The start function spawns a std::thread and runs workhorse on it:
void ImageProcess::start()
{
std::thread x(&ImageProcess::workhorse, *this);
x.detach();
}
I realized that this->parent in workhorse is a dangling pointer. Obviously calling parent->move() should crash. But it doesn't crash immediately! Surprisingly, program control enters the move() function and then tries to change previousState of a non-existent Robot (or lock a mutex of a non-existent Robot).
I found that when starting a thread like std::thread x(&ImageProcess::workhorse, *this); followed by x.join() or x.detach(), the code is no longer running on the calling object. To test this, I printed the addresses of this and &image in both Robot::run() and ImageProcess::workhorse. They were different. I also added a public boolean foo to ImageProcess, set it to true in Robot, and then printed it in workhorse and run: in workhorse the value is always 0, but in Robot it is 1.
I believe this is very strange behavior. I don't know whether it's related to the memory model, or whether ownership of ImageProcess somehow changes after std::thread x(&ImageProcess::workhorse, *this)...
I made ImageProcess a factory-pattern class (everything is static!), and now it works.
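The different addresses are consistent with how std::thread treats its arguments: it copies them into the new thread, so passing *this runs workhorse on a copy of the ImageProcess object, whose parent pointer may be uninitialized or stale. The sketch below uses a hypothetical Worker class as a stand-in to show the difference between passing *this and passing this. Passing this shares the object, which in turn requires the object to outlive the thread; joining rather than detaching is the simple way to guarantee that.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for ImageProcess.
struct Worker {
    int counter = 0;
    void workhorse() { ++counter; }

    int start_with_copy() {
        std::thread x(&Worker::workhorse, *this);  // *this is copied
        x.join();
        return counter;  // unchanged: only the copy was incremented
    }
    int start_with_pointer() {
        std::thread x(&Worker::workhorse, this);   // same object
        x.join();
        return counter;  // incremented
    }
};
```

With a fresh Worker w, w.start_with_copy() returns 0 and a subsequent w.start_with_pointer() returns 1.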
