Multithreaded program works only with print statements

Multithreaded program works only with print statements - c++

I wish I could have thought of a more descriptive title, but that is the case. I've got some code that I'd like to do some image processing with. I also need to get some statistical data from those processed images, and I'd like to do that on a separate thread so my main thread can keep doing image processing.
All that aside, here's my code. It shouldn't really be relevant, other than the fact that my Image class wraps an OpenCV Mat (though I'm not using OMP or anything, as far as I'm aware):
#include <thread>
#include <iostream>
#include <vector>
using namespace std;
//Data struct
struct CenterFindData{
//Some images I'd like to store
Image in, bpass, bpass_thresh, local_max, tmp;
//Some Data (Particle Data = float[8])
vector<ParticleData> data;
//My thread flag
bool goodToGo{ false };
//Constructor
CenterFindData(const Image& m);
};
vector<ParticleData> statistics(CenterFindData& CFD);
void operate(vector<CenterFindData> v_CFD);
..........................................
..........................................
..........................................
void operate(vector<CenterFindData> v_CFD){
//Thread function, gathers statistics on processed images
thread T( [&](){
int nProcessed(0);
for (auto& cfd : v_CFD){
//Chill while the images are still being processed
while (cfd.goodToGo == false){
//This works if I uncomment this print statement
/*cout << "Waiting" << endl;*/
}
cout << "Statistics gathered from " << nProcessed++ << " images" << endl;
//This returns vector<ParticleData>
cfd.data = m_Statistics(cfd);
}
});
//Run some filters on the images before statistics
int nProcessed(0);
for (auto& cfd : v_CFD){
//Preprocess images
RecenterImage(cfd.in);
m_BandPass(cfd);
m_LocalMax(cfd);
RecenterImage(cfd.bpass_thresh);
//Tell thread to do statistics, on to the next
cfd.goodToGo = true;
cout << "Ran filters on " << nProcessed++ << " images" << endl;
}
//Join thread
T.join();
}
I figure that the delay from cout is avoid some race condition I otherwise run into, but what? Because only one thread modified the bool goodToGo, while the other checks it, should that be a thread safe way of "gating" the two functions?
Sorry if anything is unclear, I'm very new to this and seem to make a lot of obvious mistakes WRT multithreaded programming.
Thanks for your help
john

When you have:
while (cfd.goodToGo == false){ }
the compiler doesn't see any reason to "reload" the value of goodToGo (it doesn't know that this value is affected by other threads!). So it reads it once, and then loops forever.
The reason printing something makes a difference is, that the compiler doesn't know what the print function actually will and won't affect, so "just in case", it reloads that value (if the compiler could "see inside" all of the printing code, it could in fact decide that goodToGo is NOT changed by the printing, and not needing to reload - but there are limits to how much time [or some proxy for time, such as "number of levels of calls" or "number of intermediate instructions"] the compiler spends on figuring these sort of thing out [and there may of course be calls to code that the compiler doesn't actually have access to the source code of, such as the system calls to write or similar].
The solution, however, is to use thread-safe mechanisms to update the goodToGo - we could just throw a volatile attribute to the variable, but that will not guarantee that, for example, another processor gets "told" that the value has updated, so could delay the detection of the updated value significantly [or even infinitely under some conditions].
Use std::atomic_bool goodToGo and the store and load functions to access the value inside. That way, you will be guaranteed that the value is updated correctly and "immediately" (as in, a few dozen to hundreds of clock-cycles later) seen by the other thread.
As a side-note, which probably should have been the actual answer: Busy-waiting for threads is a bad idea in general, you should probably look at some thread-primitives to wait for a condition_variable or similar.

Related

Race Condition in detached Thread

I have tried to find a similar problem, but was unable to find it, or my knowledge is not enough to recognize the similarity.
I have a main loop creating an object, whereas this object has an infinite loop to process a matrix and do stuff with this matrix. I call this process function within a separate thread an detach it, so it is able to process the matrix multiple times, while the main loop might just wait for something and does nothing.
After a while the main loop receives a new matrix, while I represented it by just creating a new matrix and passes this new matrix into the object. The idea is that, due to waiting a few seconds before processing in the infinite while loop again, the update function can lock the mutex and the mutex is not (almost) frequently locked.
Below I tried to code a minimal Example.
class Test
{
public:
Test();
~Test();
void process(){
while(1){
boost::mutes::scoped_lock locker(mtx);
std::cout << "A" << std::endl;
// do stuff with Matrix
std::cout << "B" << std::endl;
mtx.unlock();
//wait for few microseconds
}
}
void updateMatrix(matrix MatrixNew){
boost::mutes::scoped_lock locker(mtx);
std::cout << "1" << std::endl;
Matrix = MatrixNew;
std::cout << "2" << std::endl;
}
private:
boost::mutex mtx;
matrix Matrix;
}
int main(){
Test test;
boost::thread thread_;
thread_ = boost::thread(&Test::process,boost::ref(test));
thread_.detach();
while(once_in_a_while){
Matrix MatrixNew;
test.updateMatrix(MatrixNew);
}
}
Unfortunately a race condition occurs. Process and Update have multiple steps within the locked mutex environment, while I print stuff to the console between these steps. I found that, both, the matrix is messed up and Letters/Numbers to occur parallel and not consecutive.
Any ideas why this occurs?
Best wishes and thanks in advance

while(1){
boost::mutes::scoped_lock locker(mtx);
std::cout << "A" << std::endl;
// do stuff with Matrix
std::cout << "B" << std::endl;
mtx.unlock();
//wait for few microseconds
}
Here you manually unlock mtx. Then sometime later the scoped_lock (called locker) also unlocks the mutex in its destructor (which is the point of that class). I dont know what the guarantee's boost::mutex requires but unlocking it more times than you locked it can't lead to anything good.
Instead of
mtx.unlock(); you presumably want locker.unlock();
Edit: A recommendation here would be to avoid using boost for this and use standard c++ instead. threading has been part of the standard since C++11 (8 years!) so presumably all your tools will now support it. Using standardised code/tools gives you better documentation and better help as they're much more well known. I'm not knocking boost (a lot of the standard started in boost) but once something has been consumed into the standard you should strongly think about using it.

List Iterator from STL library malfunctioning

The following is a snippet of a larger program and is done using Pthreads.
The UpdateFunction reads from a text file. The FunctionMap is just used to output (key,1). Here essentially UpdateFunction and FunctionMap run on different threads.
queue <list<string>::iterator> mapperpool;
void *UpdaterFunction(void* fn) {
std::string *x = static_cast<std::string*>(fn);
string filename = *x;
ifstream file (filename.c_str());
string word;
list <string> letterwords[50];
char alphabet = '0';
bool times = true;
int charno=0;
while(file >> word) {
if(times) {
alphabet = *(word.begin());
times = false;
}
if (alphabet != *(word.begin())) {
alphabet = *(word.begin());
mapperpool.push(letterwords[charno].begin());
letterwords[charno].push_back("xyzzyspoon");
charno++;
}
letterwords[charno].push_back(word);
}
file.close();
cout << "UPDATER DONE!!" << endl;
pthread_exit(NULL);
}
void *FunctionMap(void *i) {
long num = (long)i;
stringstream updaterword;
string toQ;
int charno = 0;
fprintf(stderr, "Print me %ld\n", num);
sleep(1);
while (!mapperpool.empty()) {
list<string>::iterator it = mapperpool.front();
while(*it != "xyzzyspoon") {
cout << "(" << *it << ",1)" << "\n";
cout << *it << "\n";
it++;
}
mapperpool.pop();
}
pthread_exit(NULL);
}
If I add the while(!mapperpool.empty()) in the UpdateFunction then it gives me the perfect output. But when I move it back to the FunctionMap then it gives me a weird out and Segfaults later.
Output when used in UpdateFunction:
Print me 0
course
cap
class
culture
class
cap
course
course
cap
culture
concurrency
.....
[Each word in separate line]
Output when used in FunctionMap (snippet shown above):
Print me 0
UPDATER DONE!!
(course%0+�0#+�0�+�05P+�0����cap%�+�0�+�0,�05�+�0����class5P?�0
����xyzzyspoon%�+�0�+�0(+�0%P,�0,�0�,�05+�0����class%p,�0�,�0-�05�,�0����cap%�,�0�,�0X-�05�,�0����course%-�0 -�0�-�050-�0����course%-�0p-�0�-�05�-�0����cap%�-�0�-�0H.�05�-�0����culture%.�0.�0�.�05 .�0
����concurrency%P.�0`.�0�.�05p.�0����course%�.�0�.�08/�05�.�0����cap%�.�0/�0�/�05/�0Segmentation fault (core dumped)
How do I fix this issue?

list <string> letterwords[50] is local to UpdaterFunction. When UpdaterFunction finishes, all its local variables got destroyed. When FunctionMap inspects iterator, that iterator already points to deleted memory.
When you insert while(!mapperpool.empty()) UpdaterFunction waits for FunctionMap completion and letterwords stays 'alive'.

Here essentially UpdateFunction and FunctionMap run on different threads.
And since they both manipulate the same object (mapperpool) and neither of them uses either pthread_mutex nor std::mutex (C++11), you have a data race. If you have a data race, you have Undefined Behaviour and the program might do whatever it wants. Most likely it will write garbage all over memory until eventually crashing, exactly as you see.
How do I fix this issue?
By locking the mapperpool object.
Why is list not thread-safe?
Well, in vast majority of use-cases, a single list (or any other collection) won't be used by more than one thread. In significant part of the rest the lock will have to extend over more than one operation on the collection, so the client will have to do its own locking anyway. The remaining tiny percentage of cases where locking in the operations themselves would help is not worth adding the overhead for everyone; C++ key design principle is that you only pay for what you use.
The collections are only reentrant, meaning that using different instances in parallel is safe.
Note on pthreads
C++11 introduced threading library that integrates well with the language. Most notably, it uses RAII for locking of std::mutex via std::lock_guard, std::unique_lock and std::shared_lock (for reader-writer locking). Consistently using these can eliminate large class of locking bugs that otherwise take considerable time to debug.
If you can't use C++11 yet (on desktop you can, but some embedded platforms did not get a compiler update yet), you should first consider Boost.Thread as it provides the same benefits.
If you can't use even then, still try to find, or write, a simple RAII wrapper for locking like the C++11/Boost do. The basic wrapper is just a couple of lines, but it will save you a lot of debugging.
Note that C++11 and Boost also have atomic operations library that pthreads sorely miss.

strange proplem using two Threads and Boolean

(I hate having to put a title like this. but I just couldn't find anything better)
I have two classes with two threads. first one detects motion between two frames:
void Detector::run(){
isActive = true;
// will run forever
while (isActive){
//code to detect motion for every frame
//.........................................
if(isThereMotion)
{
if(number_of_sequence>0){
theRecorder.setRecording(true);
theRecorder.setup();
// cout << " motion was detected" << endl;
}
number_of_sequence++;
}
else
{
number_of_sequence = 0;
theRecorder.setRecording(false);
// cout << " there was no motion" << endl;
cvWaitKey (DELAY);
}
}
}
second one will record a video when started:
void Recorder::setup(){
if (!hasStarted){
this->start();
}
}
void Recorder::run(){
theVideoWriter.open(filename, CV_FOURCC('X','V','I','D'), 20, Size(1980,1080), true);
if (recording){
while(recording){
//++++++++++++++++++++++++++++++++++++++++++++++++
cout << recording << endl;
hasStarted=true;
webcamRecorder.read(matRecorder); // read a new frame from video
theVideoWriter.write(matRecorder); //writer the frame into the file
}
}
else{
hasStarted=false;
cout << "no recording??" << endl;
changeFilemamePlusOne();
}
hasStarted=false;
cout << "finished recording" << endl;
theVideoWriter.release();
}
The boolean recording gets changed by the function:
void Recorder::setRecording(bool x){
recording = x;
}
The goal is to start the recording once motion was detected while preventing the program from starting the recording twice.
The really strange problem, which honestly doesn't make any sense in my head, is that the code will only work if I cout the boolean recording ( marked with the "++++++"). Else recording never changes to false and the code in the else statment never gets called.
Does anyone have an idea on why this is happening. I'm still just begining with c++ but this problem seems really strange to me..

I suppose your variables isThereMotion and recording are simple class members of type bool.
Concurrent access to these members isn't thread safe by default, and you'll face race conditions, and all kinds of weird behaviors.
I'd recommend to declare these member variables like this (as long you can make use of the latest standard):
class Detector {
// ...
std::atomic<bool> isThereMotion;
};
class Recorder {
// ...
std::atomic<bool> hasStarted;
};
etc.
The reason behind the scenes is, that even reading/writing a simple boolean value splits up into several assembler instructions applied to the CPU, and those may be scheduled off in the middle for a thread execution path change of the process. Using std::atomic<> provides something like a critical section for read/write operations on this variable automatically.
In short: Make everything, that is purposed to be accessed concurrently from different threads, an atomic value, or use an appropriate synchronization mechanism like a std::mutex.
If you can't use the latest c++ standard, you can perhaps workaround using boost::thread to keep your code portable.
NOTE:
As from your comments, your question seems to be specific for the Qt framework, there's a number of mechanisms you can use for synchronization as e.g. the mentioned QMutex.
Why volatile doesn't help in multithreaded environments?
volatile prevents the compiler to optimize away actual read access just by assumptions of values set formerly in a sequential manner. It doesn't prevent threads to be interrupted in actually retrieving or writing values there.
volatile should be used for reading from addresses that can be changed independently of the sequential or threading execution model (e.g. bus addressed peripheral HW registers, where the HW changes values actively, e.g. a FPGA reporting current data throughput at a register inteface).
See more details about this misconception here:
Why is volatile not considered useful in multithreaded C or C++ programming?

You could use a pool of nodes with pointers to frame buffers as part of a linked list fifo messaging system using mutex and semaphore to coordinate the threads. A message for each frame to be recorded would be sent to the recording thread (appended to it's list and a semaphore released), otherwise the node would be returned (appended) back to the main thread's list.
Example code using Windows based synchronization to copy a file. The main thread reads into buffers, the spawned thread writes from buffers it receives. The setup code is lengthy, but the actual messaging functions and the two thread functions are simple and small.
mtcopy.zip

Could be a liveness issue. The compiler could be re-ordering instructions or hoisting isActive out of the loop. Try marking it as volatile.
From MSDN docs:
Objects that are declared as volatile are not used in certain optimizations because their values can change at any time. The system always reads the current value of a volatile object when it is requested, even if a previous instruction asked for a value from the same object. Also, the value of the object is written immediately on assignment.
Simple example:
#include <iostream>
using namespace std;
int main() {
bool active = true;
while(active) {
cout << "Still active" << endl;
}
}
Assemble it:
g++ -S test.cpp -o test1.a
Add volatile to active as in volatile bool active = true
Assemble it again g++ -S test.cpp -o test2.a and look at the difference diff test1.a test2.a
< testb $1, -5(%rbp)
---
> movb -5(%rbp), %al
> testb $1, %al
Notice the first one doesn't even bother to read the value of active before testing it since the loop body never modifies it. The second version does.

C++ - Threads without coordinating mechanism like mutex_Lock

I attended one interview two days back. The interviewed guy was good in C++, but not in multithreading. When he asked me to write a code for multithreading of two threads, where one thread prints 1,3,5,.. and the other prints 2,4,6,.. . But, the output should be 1,2,3,4,5,.... So, I gave the below code(sudo code)
mutex_Lock LOCK;
int last=2;
int last_Value = 0;
void function_Thread_1()
{
while(1)
{
mutex_Lock(&LOCK);
if(last == 2)
{
cout << ++last_Value << endl;
last = 1;
}
mutex_Unlock(&LOCK);
}
}
void function_Thread_2()
{
while(1)
{
mutex_Lock(&LOCK);
if(last == 1)
{
cout << ++last_Value << endl;
last = 2;
}
mutex_Unlock(&LOCK);
}
}
After this, he said "these threads will work correctly even without those locks. Those locks will reduce the efficiency". My point was without the lock there will be a situation where one thread will check for(last == 1 or 2) at the same time the other thread will try to change the value to 2 or 1. So, My conclusion is that it will work without that lock, but that is not a correct/standard way. Now, I want to know who is correct and in which basis?

Without the lock, running the two functions concurrently would be undefined behaviour because there's a data race in the access of last and last_Value Moreover (though not causing UB) the printing would be unpredictable.
With the lock, the program becomes essentially single-threaded, and is probably slower than the naive single-threaded code. But that's just in the nature of the problem (i.e. to produce a serialized sequence of events).

I think the interviewer might have thought about using atomic variables.
Each instantiation and full specialization of the std::atomic template defines an atomic type. Objects of atomic types are the only C++ objects that are free from data races; that is, if one thread writes to an atomic object while another thread reads from it, the behavior is well-defined.
In addition, accesses to atomic objects may establish inter-thread synchronization and order non-atomic memory accesses as specified by std::memory_order.
[Source]
By this I mean the only thing you should change is remove the locks and change the lastvariable to std::atomic<int> last = 2; instead of int last = 2;
This should make it safe to access the last variable concurrently.
Out of curiosity I have edited your code a bit, and ran it on my Windows machine:
#include <iostream>
#include <atomic>
#include <thread>
#include <Windows.h>
std::atomic<int> last=2;
std::atomic<int> last_Value = 0;
std::atomic<bool> running = true;
void function_Thread_1()
{
while(running)
{
if(last == 2)
{
last_Value = last_Value + 1;
std::cout << last_Value << std::endl;
last = 1;
}
}
}
void function_Thread_2()
{
while(running)
{
if(last == 1)
{
last_Value = last_Value + 1;
std::cout << last_Value << std::endl;
last = 2;
}
}
}
int main()
{
std::thread a(function_Thread_1);
std::thread b(function_Thread_2);
while(last_Value != 6){}//we want to print 1 to 6
running = false;//inform threads we are about to stop
a.join();
b.join();//join
while(!GetAsyncKeyState('Q')){}//wait for 'Q' press
return 0;
}
and the output is always:
1
2
3
4
5
6
Ideone refuses to run this code (compilation errors)..
Edit: But here is a working linux version :) (thanks to soon)

The interviewer doesn't know what he is talking about. Without the locks you get races on both last and last_value. The compiler could for example reorder the assignment to last before the print and increment of last_value, which could lead to the other thread executing on stale data. Furthermore you could get interleaved output, meaning things like two numbers not being seperated by a linebreak.
Another thing, which could go wrong is that the compiler might decide not to reload last and (less importantly) last_value each iteration, since it can't (safely) change between those iterations anyways (since data races are illegal by the C++11 standard and aren't acknowledged in previous standards). This means that the code suggested by the interviewer actually has a good chance of creating infinite loops of doing absoulutely doing nothing.
While it is possible to make that code correct without mutices, that absolutely needs atomic operations with appropriate ordering constraints (release-semantics on the assignment to last and acquire on the load of last inside the if statement).
Of course your solution does lower efficiency due to effectivly serializing the whole execution. However since the runtime is almost completely spent inside the streamout operation, which is almost certainly internally synchronized by the use of locks, your solution doesn't lower the efficiency anymore then it already is. Waiting on the lock in your code might actually be faster then busy waiting for it, depending on the availible resources (the nonlocking version using atomics would absolutely tank when executed on a single core machine)

BOOST threading : cout behavior

I am new to Boost threading and I am stuck with how output is performed from multiple threads.
I have a simple boost::thread counting down from 9 to 1; the main thread waits and then prints "LiftOff..!!"
#include <iostream>
#include <boost/thread.hpp>
using namespace std;
struct callable {
void operator() ();
};
void callable::operator() () {
int i = 10;
while(--i > 0) {
cout << "#" << i << ", ";
boost::this_thread::yield();
}
cout.flush();
}
int main() {
callable x;
boost::thread myThread(x);
myThread.join();
cout << "LiftOff..!!" << endl;
return 0;
}
The problem is that I have to use an explicit "cout.flush()" statement in my thread to display the output. If I don't use flush(), I only get "LiftOff!!" as the output.
Could someone please advise why I need to use flush() explicitly?

This isn't specifically thread related as cout will buffer usually on a per thread basis and only output when the implementation decides to - so in the thread the output will only appear on a implementation specific basic - by calling flush you are forcing the buffers to be flushed.
This will vary across implementations - usually though it's after a certain amount of characters or when a new line is sent.
I've found that multiple threads writing too the same stream or file is mostly OK - providing that the output is performed as atomically as possible. It's not something that I'd recommend in a production environment though as it is too unpredictable.

This behaviour seems to depend on OS specific implementation of the cout stream. I guess that write operations on cout are buffered to some thread specific memory intermediatly in your case, and the flush() operation forces them being printed on the console. I guess this, since endl includes calling the flush() operation and the endl in your main function doesn't see your changes even after the thread has been joined.
BTW it would be a good idea to synchronize outputs to an ostream shared between threads anyway, otherwise you might see them intermigled. We do so for our logging classes which use a background thread to write the logging messages to the associated ostream.

Given the short length of your messages, there's no reason anything should appear without a flush. (Don't forget that std::endl is the equivalent of << '\n' << std::flush.)

I get the asked behaviour with and without flush (gcc 4.3.2 boost 1.47 Linux RH5)
I assume that your cygwin system chooses to implement several std::cout objects with associated std::streambuf. This I assume is implementation specific.
Since flush or endl only forces its buffer to flush onto its OS controlled output sequence the cout object of your thread remains buffered.
Sharing a reference of an ostream between the threads should solve the problem.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js