In Microsoft Visual C++ I can call CreateThread() to create a thread by starting a function with one void * parameter. I pass a pointer to a struct as that parameter, and I see a lot of other people do that as well.
My question is: if I am passing a pointer to my struct, how do I know that the structure members have actually been written to memory before CreateThread() is called? Is there any guarantee they won't just be cached? For example:
struct bigapple { string color; int count; } apple;
apple.count = 1;
apple.color = "red";
hThread = CreateThread( NULL, 0, myfunction, &apple, 0, NULL );
DWORD WINAPI myfunction( void *param )
{
    struct bigapple *myapple = (struct bigapple *)param;
    // how do I know that apple's struct was actually written to memory before CreateThread?
    cout << "Apple count: " << myapple->count << endl;
    return 0; // a thread procedure must return a DWORD exit code
}
This afternoon while I was reading I saw a lot of Windows code on this website and others that passes in data that is not volatile to a thread, and there doesn't seem to be any memory barrier or anything else. I know C++ or at least older revisions are not "thread aware" so I'm wondering if maybe there's some other reason. My guess would be the compiler sees that I've passed a pointer &apple in a call to CreateThread() so it knows to write out members of apple before the call.
Thanks
No. The relevant Win32 thread functions all take care of the necessary memory barriers. All writes prior to CreateThread are visible to the new thread. Obviously the reads in that newly created thread cannot be reordered before the call to CreateThread.
volatile would not add any useful constraints on the compiler; it would merely slow down the code. In practice this wouldn't be noticeable compared to the cost of creating a new thread, though.
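As a rough portable analogue of the guarantee described above, here is a minimal sketch using C++11 std::thread rather than CreateThread. The helper name count_seen_by_new_thread is made up for illustration. The C++ standard specifies that the completion of the std::thread constructor synchronizes with the start of the thread function, so plain (non-volatile) writes made before the thread is created are visible inside it.

```cpp
#include <cassert>
#include <string>
#include <thread>

struct BigApple {
    std::string color;
    int count = 0;
};

// Hypothetical helper: fills the struct, starts a thread, and returns
// the value of `count` that the new thread observed.
int count_seen_by_new_thread() {
    BigApple apple;
    apple.count = 1;      // plain writes, no volatile, no explicit barrier
    apple.color = "red";

    int seen = -1;
    // Thread creation orders the writes above before the thread body runs.
    std::thread worker([&apple, &seen] { seen = apple.count; });
    worker.join();        // join() orders the worker's write to `seen` before the read below
    return seen;
}
```

Calling count_seen_by_new_thread() is expected to return 1, since the write to apple.count happens before the thread starts.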
No, it should not be volatile. At the same time, you are pointing at a valid issue. The detailed operation of the cache is described in the Intel/ARM/etc. architecture manuals.
Nevertheless, you can safely assume that the data WILL BE WRITTEN. Otherwise too many things would break. Several decades of experience show that this is so.
If the scheduler starts the new thread on the same core, the state of the cache will be fine; if it starts on another core, cache coherency makes the written data visible there. Otherwise, nothing would work.
Never use volatile for interaction between threads. It is an instruction on how to handle data inside the thread only (use a register copy or always reread, etc).
First, I think the optimizer cannot change the order at the expense of correctness. CreateThread() is a function, and parameter binding for a function call happens before the call is made.
Secondly, volatile is not very helpful for the purpose you intend. Check out this article.
You're struggling with a non-problem, and creating at least two others...
Don't worry about the parameters given to CreateThread: if they exist at the time the thread is created, they exist until CreateThread returns. And since the thread that creates them does not destroy them, they are also available to the other thread.
The problem now becomes who destroys them, and when. You create them with new, so they will exist until delete is called (or until the process terminates: a good memory leak!).
The process terminates when its main thread returns from main (all other threads are then terminated by the OS as well). And there is nothing in your main that makes it wait for the other thread to complete.
Beware when using a low-level API like CreateThread from languages that have their own library that also interfaces with threads. The C runtime has _beginthreadex. It calls CreateThread and also performs other initialization tasks for the C/C++ library that you would otherwise miss. Some C (and C++) library functions may not work properly without that initialization, which is also required to properly free the runtime resources at termination. Using CreateThread is like using malloc in a context where delete is then used to clean up.
The proper main thread behavior should be
// create the data
// create the other thread
// perform other tasks
// wait for the other thread to terminate
// destroy the data
What the Win32 API documentation doesn't say clearly is that a thread HANDLE (like many kernel object handles) is waitable, and becomes signaled when the thread terminates.
To wait for the other thread's termination, your main thread just has to call
WaitForSingleObject(hthread,INFINITE);
So the main thread will be more properly:
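{
    data* pdata = new data;
    HANDLE hthread = (HANDLE)_beginthreadex(0, 0, yourprocedure, pdata, 0, 0);
    WaitForSingleObject(hthread, INFINITE); // block until the thread has finished
    CloseHandle(hthread);                   // handles from _beginthreadex must be closed by the caller
    delete pdata;
}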
or even
{
    data d;
    HANDLE hthread = (HANDLE)_beginthreadex(0, 0, yourprocedure, &d, 0, 0);
    WaitForSingleObject(hthread, INFINITE); // d must outlive the thread, so wait before leaving scope
    CloseHandle(hthread);
}
I think the question is valid in another context.
As others have pointed out, using a struct and its contents is safe (although access to the data should be synchronized).
However, I think the question is valid if you have an atomic variable (or a pointer to one) that can be changed outside the thread. My opinion is that volatile should be used in that case.
Edit:
I think the examples on the wiki page are a good explanation http://en.wikipedia.org/wiki/Volatile_variable
I have a Fortran program that calls a C++ dll to do some mathematical operations on 10000 sets of data. The data sets are totally independent from each other. I was planning to create a thread pool and then send tasks to it. However, the call to the dll will be made more than 1000 times (each call the 10000 sets of data are being processed).
My question is: when I create the thread pool during the first call to the dll, what happens to this thread pool after the function in the dll returns? Can the second call (and the remaining 998 calls) access the pool that was created during the first call?
You can indeed use the same thread pool, if you set things up right.
Objects created on the stack of the FORTRAN->C++ calling thread will be destroyed as that stack unwinds and control returns to FORTRAN, so it's not a good idea to have the thread pool management data on that stack. You can, however:
launch another thread that creates the thread pool management data/object, or
allocate on the heap (using new) to decouple lifetime from the FORTRAN->C++ calls.
The latter is probably easier and cleaner... a pointer to the heap object/data managing the thread pool can be returned to FORTRAN and used as a "handle" for future calls, indicating the same thread pool should be used.
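The heap-plus-handle idea can be sketched roughly as follows. All names here (PoolState, pool_create, pool_process, pool_destroy) are hypothetical, and the worker threads and task queue are deliberately left out: the sketch only shows how state allocated with new survives between calls and is reached again through an opaque pointer. It assumes the Fortran side can hold a C pointer, for example as type(c_ptr) from ISO_C_BINDING.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pool state. A real pool would also own worker threads
// and a task queue; those are omitted to keep the sketch short.
struct PoolState {
    std::size_t worker_count;
    std::size_t calls_served;
};

extern "C" {

// First call: allocate on the heap so the state outlives this function.
void* pool_create(int worker_count) {
    return new PoolState{static_cast<std::size_t>(worker_count), 0};
}

// Later calls: recover the same state from the opaque handle.
// The "math" here is a placeholder that doubles each input value.
int pool_process(void* handle, const double* in, double* out, int n) {
    PoolState* pool = static_cast<PoolState*>(handle);
    for (int i = 0; i < n; ++i) out[i] = 2.0 * in[i];
    return static_cast<int>(++pool->calls_served);
}

// Final call: release the state explicitly.
void pool_destroy(void* handle) {
    delete static_cast<PoolState*>(handle);
}

} // extern "C"
```

The caller keeps the pointer returned by pool_create, passes it to every pool_process call, and hands it to pool_destroy once at the end.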
If you have control over the fortran code, you can save yourself some sneaky hiding of your state you maintain by using 3 functions instead of one.
someStateHandle PrepareBackgroundWork();
// Then you do your actual call series...
DoMyMath(someStateHandle, args...);
// And when you are done with all that, you call
FinalizeBackgroundWork(someStateHandle);
If you do not have control over the fortran code, you will have to decide what you want to keep around (Threadpool stuff or thread handles and a few synchronization objects) and lazily initialize them.
struct MyWorkerContext
{
    size_t numberOfWorkerThreads;
    std::vector<HANDLE> workerHandles;
    // ...
};
static MyWorkerContext* s_context = NULL; // Sorry - looks like a singleton to me.
void DoMyMath( args..)
{
    if(NULL == s_context) InitializeContext();
    if( NULL != s_context )
    {
        // do the calculations using all that infrastructure.
    }
}
E.g. in DllMain() or hopefully earlier: clean up s_context.
Last but not least, I think there is a "default thread pool" that you might be able to use instead of creating your own.
I write a DLL MyDLL.dll with Visual C++ 2008, as follows:
(1) MFC static linked
(2) Using multi-thread runtime library.
In the DLL, there is a global data item m_Data shared by two exported functions, as follows:
ULONGLONG WINAPI MyFun1(LPVOID *lpCallbackFun1)
{
...
Write m_Data(using Critical section to protect)
…
return xxx;
}
ULONGLONG WINAPI MyFun2(LPVOID *lpCallbackFun2)
{
...
Suspend MyThread1 to prevent conflict.
Read m_Data(using Critical section to protect)
Resume MyThread1.
…
return xxx;
}
In my main application, it will first call LoadLibrary to load MyDLL.dll, then get the address of MyFun1 and MyFun2, then do the following thing:
(1) Start a new thread MyThread1, which will invoke MyFun1 to do a time-consuming task.
(2) Start a new thread MyThread2, which will invoke MyFun2 for several times, as follows:
for (nIndex = 0; nIndex < 20; nIndex++)
{
nResult2 = MyFun2(lpCallbackFun2);
NextStatement2;
}
Although MyThread1 and MyThread2 using critical section to protect the shared data m_Data, I will still suspend MyThread1 before accessing the shared data, to prevent any possible conflicts.
The problem is:
(1) On the first invocation of MyFun2, everything is OK, and the return value of MyFun2 (that is, nResult2) is 1, which is expected.
(2) On the second, third and fourth invocations of MyFun2, the operations in MyFun2 execute successfully, but the return value of MyFun2 (that is, nResult2) is a random value instead of the expected value 1. I traced into MyFun2 with the debugger and confirmed that the last return statement really returns 1, but the caller receives a random value instead of 1 when inspecting nResult2.
(3) After the fourth invocation of MyFun2, on returning to the statement that follows the MyFun2 call, I always get a “buffer overrun detected” error, whatever that next statement is.
I think this looks like a stack corruption, so try to make some tests:
I confirm the /GS (Stack security check) feature in the compiler is ON.
If MyFun2 is invoked after MyFun1 in MyThread1 is completed, then everything will be OK.
In debug mode, the codeline in MyFun2 that reads the shared data m_Data will not cause any errors or exceptions. Neither will the codeline in MyFun1 that writes the shared Data.
So, how can I solve this problem?
Thank you!
I suppose at this line
Suspend MyThread1 to prevent conflict.
you are using the SuspendThread() function. Here is what its documentation says:
This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread. To avoid this situation, a thread within an application that is not a debugger should signal the other thread to suspend itself. The target thread must be designed to watch for this signal and respond appropriately.
So, in short: don't use it. Critical sections and other synchronization objects do their job just fine.
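A minimal sketch of "the lock alone is enough", written with portable std::mutex instead of a Win32 critical section. The names g_data, writer_task, read_data and run_writer_and_reader are made up for illustration, with g_data standing in for m_Data. No thread is ever suspended: the reader simply takes the same lock as the writer.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

static std::mutex g_data_mutex;        // plays the role of the critical section
static unsigned long long g_data = 0;  // stand-in for the shared m_Data

// Writer side (the long-running MyFun1 in the question).
void writer_task(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::mutex> lock(g_data_mutex);
        ++g_data;
    }
}

// Reader side (MyFun2): takes the same lock and never suspends the writer.
unsigned long long read_data() {
    std::lock_guard<std::mutex> lock(g_data_mutex);
    return g_data;
}

// Hypothetical driver: reads concurrently while the writer runs.
unsigned long long run_writer_and_reader() {
    g_data = 0;  // no other thread exists yet, so no lock is needed here
    std::thread writer(writer_task, 100000);
    for (int i = 0; i < 20; ++i) {
        (void)read_data();  // safe concurrent read
    }
    writer.join();
    return read_data();
}
```

Because every access goes through the same mutex, the reader can never observe a half-written value, and the writer is never frozen while holding a lock.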
Never use SuspendThread!!! NEVER!
SuspendThread is intended only for debugging purposes.
The reason is simple: you don't know where you suspend the thread. It may happen just when the thread holds a resource that you want to use. Also, a bunch of CRT functions use thread synchronisation internally.
Just use critical sections or mutexes.
Just see the simple sample here: http://blog.kalmbachnet.de/?postid=6 and here
http://blog.kalmbachnet.de/?postid=16
Since this is a windows program you could use windows based mutex or semaphore and WaitForSingleObject when reading or writing shared data.
In a code review today, I stumbled across the following bit of code (slightly modified for posting):
while (!initialized)
{
// The thread can start before the constructor has finished initializing the object.
// Can lead to strange behavior.
continue;
}
This is the first few lines of code that runs in a new thread. In another thread, once initialization is complete, it sets initialized to true.
I know that the optimizer could turn this into an infinite loop, but what's the best way to avoid that?
volatile - considered harmful
calling an isInitialized() function instead of using the variable directly - would this guarantee a memory barrier? What if the function was declared inline?
Are there other options?
Edit:
Should have mentioned this sooner, but this is portable code that needs to run on Windows, Linux, Solaris, etc. We mostly use Boost.Thread as our portable threading library.
Calling a function won't help at all; even if a function is not declared inline, its body can still be inlined (barring something extreme, like putting your isInitialized() function in another library and dynamically linking against it).
Two options that come to mind:
Declare initialized as an atomic flag (in C++0x, you can use std::atomic_flag; otherwise, you'll want to consult the documentation for your threading library for how to do this)
Use a semaphore; acquire it in the other thread and wait for it in this thread.
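The first option might look roughly like this. Note that the sketch uses std::atomic<bool> rather than std::atomic_flag, because in C++11 atomic_flag can only be tested by also setting it. Widget, payload and wait_for_initialization are hypothetical names. The release store paired with the acquire load is what makes the plain write to payload visible to the waiting thread, and the optimizer cannot hoist an atomic load out of the loop.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical object that is still being initialized when its thread starts.
struct Widget {
    int payload = 0;
    std::atomic<bool> initialized{false};
};

int wait_for_initialization() {
    Widget w;
    int observed = -1;

    std::thread worker([&w, &observed] {
        // Atomic load: re-read on every iteration, never cached in a register.
        while (!w.initialized.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        observed = w.payload;  // ordered after the release store below
    });

    w.payload = 42;                                        // finish "construction"
    w.initialized.store(true, std::memory_order_release);  // publish it
    worker.join();
    return observed;
}
```

This still spins, so it only suits very short waits. A blocking primitive avoids burning CPU while waiting.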
@Karl's comment is the answer. Don't start processing in thread A until thread B has finished initialization. The key to doing this is sending a signal from thread B to thread A that it is up & running.
You mentioned no OS, so I will give you some Windows-ish pseudocode. Transcode to the OS/library of your choice.
First create a Windows Event object. This will be used as the signal:
Thread A:
HANDLE running = CreateEvent(0, TRUE, FALSE, 0);
Then have Thread A start Thread B, passing the event along to it:
Thread A:
DWORD thread_b_id = 0;
HANDLE thread_b = CreateThread(0, 0, ThreadBMain, (void*)running, 0, &thread_b_id);
Now in Thread A, wait until the event is signaled:
Thread A:
DWORD rc = WaitForSingleObject(running, INFINITE);
if( rc == WAIT_OBJECT_0 )
{
// thread B is up & running now...
// MAGIC HAPPENS
}
Thread B's startup routine does its initialization, and then signals the event:
Thread B:
DWORD WINAPI ThreadBMain(void* param)
{
    HANDLE running = (HANDLE)param;
    do_expensive_initialization();
    SetEvent(running); // this will tell Thread A that we're good to go
    return 0;          // a thread procedure must return a DWORD exit code
}
Synchronization primitives are the solution to this problem, not spinning in a loop... But if you must spin in a loop and can't use a semaphore, event, etc, you can safely use volatile. It's considered harmful because it hurts the optimizer. In this case that's exactly what you want to do, no?
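Since the question asks for portable code, here is one possible translation of the event pattern above into standard C++11, using a mutex, a condition variable and a flag. StartupSignal and start_and_wait are hypothetical names, and the assignment to state stands in for do_expensive_initialization(). As far as I know, Boost.Thread offers boost::mutex and boost::condition_variable with essentially the same interface, but treat that mapping as an assumption to verify.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot signal, similar in spirit to a manual-reset event.
class StartupSignal {
public:
    void set() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ready_ = true;
        }
        cv_.notify_all();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        // The predicate handles spurious wakeups and a set() that came first.
        cv_.wait(lock, [this] { return ready_; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool ready_ = false;
};

int start_and_wait() {
    StartupSignal running;
    int state = 0;

    std::thread b([&] {
        state = 7;      // stands in for do_expensive_initialization()
        running.set();  // tell thread A that we're good to go
    });

    running.wait();     // thread A blocks here instead of spinning
    int seen = state;   // safe: the write happened before set()
    b.join();
    return seen;
}
```

The waiting thread sleeps inside wait() rather than polling, and the flag guarded by the mutex means a signal sent before the wait begins is not lost.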
Boost has something comparable to atomic_flag for this purpose: once_flag, used with boost::call_once. It may well be what you want here.
Effectively, if you want something to be constructed the first time it is needed (e.g. lazy loading) and that can happen in multiple threads, you get boost::call_once to call your function the first time it is reached. The post-condition is that it has been initialized, so there is no need for any kind of looping or locking.
What you do need to ensure is that your initialization logic does not throw exceptions.
This is a well known problem when working with threads. Creation/Initialization of objects takes relatively little time. When the thread actually starts running though... That can take quite a long time in terms of executed code.
Everyone keeps mentioning semaphores...
You may want to look at POSIX 1003.1b semaphores. Under Linux, try man sem_init. E.g.:
http://manpages.ubuntu.com/manpages/dapper/man3/sem_init.3.html
http://www.skrenta.com/rt/man/sem_init.3.html
http://docs.oracle.com/cd/E23824_01/html/821-1465/sem-init-3c.html
These semaphores have the advantage that, once Created/Initialized, one thread can block indefinitely until signaled by another thread. More critically, that signal can occur BEFORE the waiting thread starts waiting. (A significant difference between Semaphores and Condition Variables.) Also, they can handle the situation where you receive multiple signals before waking up.
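A small sketch of the "signal before wait" property, assuming a platform where unnamed POSIX semaphores are supported (Linux, for example; sem_init is not functional everywhere). The function name posts_before_wait is hypothetical. The signalling thread posts twice and finishes before anyone waits, and both posts are still there to be consumed afterwards.

```cpp
#include <cassert>
#include <semaphore.h>
#include <thread>

// Hypothetical demo: returns how many posts were still pending
// when the waiting side finally got around to waiting.
int posts_before_wait() {
    sem_t sem;
    if (sem_init(&sem, 0, 0) != 0) {
        return -1;  // unnamed semaphores not supported on this platform
    }

    // Signal twice from another thread, and let that thread finish first.
    std::thread signaller([&sem] {
        sem_post(&sem);
        sem_post(&sem);
    });
    signaller.join();

    // Neither signal was lost: each successful sem_trywait consumes one post.
    int woke = 0;
    while (sem_trywait(&sem) == 0) {
        ++woke;
    }

    sem_destroy(&sem);
    return woke;
}
```

With a bare condition variable and no accompanying flag, a notification sent before the wait starts would simply be missed. The semaphore's counter is what remembers it.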
Recently I heard that memory on the stack is not shared with other threads, while memory on the heap is shared with other threads.
I normally do:
HWND otherThreadHwnd;
DWORD commandId;
// initialize commandId and otherThreadHwnd
struct MyData {
int data1_;
long data2_;
void* chunk_;
};
int abc() {
    MyData myData;
    // initialize myData
    SendMessage(otherThreadHwnd, commandId, 0, (LPARAM)&myData); // blocks until the message is processed
    // read myData
    return 0;
}
Is it alright to do this?
Yes, it is safe in this instance.
Data on the stack only exists for the lifetime of the function call. Since SendMessage is a synchronous, blocking call, the data will be valid for the duration of that call.
This code would be broken if you replace SendMessage with a call to PostMessage, SendNotifyMessage, or SendMessageCallback, since they would not block and the function may have returned before the target window received the message.
I think two different issues are being confused in the claim you heard, that "memory in the stack is not shared with other thread":
object lifetime - the data on the stack is only valid as long as the thread doesn't leave the scope of the variable. In the example you give, you're handling this by making the call to the other thread synchronously.
memory address visibility - the address space of a process is shared among the various threads in that process. So variables addressable by one thread are addressable by the other threads in that process. If you are passing the address to a thread in a different process, the situation is quite different and you'd need to use some other mechanism (which might be to ensure that the memory block is mapped into both processes, though I don't think that can normally be done with stack memory).
Yes, it is okay.
SendMessage is working in blocking mode. Even if myData is allocated in stack, its address is still visible to all threads in the process. Each thread has its own private stack; but data in the stack can be explicitly shared, for example, by your code. However, as you guess, do not use PostThreadMessage in such case.
What you heard about is "potential infringement of privacy", which is sharing the data on one thread's private stack with another thread.
Although it is not encouraged, it is only a "potential" problem--with correct synchronization, it can be done safely. In your case, this synchronization is done by ::SendMessage(); it will not return until the message is processed in the other thread, so the data will not go out of scope on the main thread's stack. But beware that whatever you do with this pointer in the worker thread, it must be done before returning from the message handler (if you're storing it somewhere, be sure to make a copy).
As others have said already, how you have it written is just fine, and in general, nothing will immediately fail when passing a pointer to an object on the stack to another thread as long as everything's synchronized. However, I tend to cringe a little when doing so because things that seem threadsafe can get out of their intended order when an exception occurs or if one of the threads is involved with asynchronous IO callbacks. In the case of an exception in the other thread during your call to SendMessage, it may return 0 immediately. If the exception is later handled in the other thread, you may have an access violation. Yet another potential hazard is that whatever's being stored on the stack can never be forcibly disposed of from another thread. If it gets stuck waiting for some callback, object, etc, forever and the user has decided to cancel or quit the application, there is no way for the working thread to be sure the stalled thread has tidied up whatever objects are on its stack.
My point is this: In simple scenarios as you've described where everything works perfectly, nothing ever changes, and no outside dependencies fail, sharing pointers to the local stack is safe - but since allocating on the heap is really just as simple, and it gives you the opportunity to explicitly control the object's lifetime from any thread in extenuating circumstances, why not just use the heap?
Finally, I strongly suggest that you be very careful with the void* chunk_ member of your MyData structure, as it is not threadsafe as described if it's copied in the other thread.
Is the following safe?
I am new to threading and I want to delegate a time consuming process to a separate thread in my C++ program.
Using the boost libraries I have written code something like this:
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
Where finished_flag is a boolean member of my class. When the thread is finished it sets the value and the main loop of my program checks for a change in that value.
I assume that this is okay because I only ever start one thread, and that thread is the only thing that changes the value (except for when it is initialised before I start the thread)
So is this okay, or am I missing something, and do I need to use locks, mutexes, etc.?
You never mentioned the type of finished_flag...
If it's a straight bool, then it might work, but it's certainly bad practice, for several reasons. First, some compilers will cache the reads of the finished_flag variable, since the compiler doesn't always pick up the fact that it's being written to by another thread. You can get around this by declaring the bool volatile, but that's taking us in the wrong direction. Even if reads and writes are happening as you'd expect, there's nothing to stop the OS scheduler from interleaving the two threads half way through a read / write. That might not be such a problem here where you have one read and one write op in separate threads, but it's a good idea to start as you mean to carry on.
If, on the other hand, it's a thread-safe type, like a CEvent in MFC (or the equivalent in boost), then you should be fine. This is the best approach: use thread-safe synchronization objects for inter-thread communication, even for simple flags.
Instead of using a member variable to signal that the thread is done, why not use a condition? You are already using the boost libraries, and condition is part of the thread library.
Check it out. It allows the worker thread to 'signal' that it has finished, and the main thread can check during execution whether the condition has been signaled and then do whatever it needs to do with the completed work. There are examples in the link.
As a general rule, I would never assume that a resource will only be modified by the thread. You might know what it is for, but someone else might not, causing no end of grief when the main thread thinks the work is done and tries to access data that is not correct! It might even delete the data while the worker thread is still using it, causing the app to crash. Using a condition will help with this.
Looking at the thread documentation, you could also call thread.timed_join in the main thread. timed_join will wait for a specified amount of time for the thread to 'join' (join means that the thread has finished).
I don't mean to be presumptive, but it seems like the purpose of your finished_flag variable is to pause the main thread (at some point) until the thread thrd has completed.
The easiest way to do this is to use boost::thread::join
// launch the thread...
thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
// ... do other things maybe ...
// wait for the thread to complete
thrd->join(); // thrd is a pointer, so use ->
If you really want to get into the details of communication between threads via shared memory, even declaring a variable volatile won't be enough, even if the compiler does use appropriate access semantics to ensure that it won't get a stale version of data after checking the flag. The CPU can issue reads and writes out of order (x86 usually doesn't, but PPC definitely does), and there is nothing in C++98/03 that allows the compiler to generate code to order memory accesses appropriately.
Herb Sutter's Effective Concurrency series has an extremely in depth look at how the C++ world intersects the multicore/multiprocessor world.
Having the thread set a flag (or signal an event) before it exits is a race condition. The thread has not necessarily returned to the OS yet, and may still be executing.
For example, consider a program that loads a dynamic library (pseudocode):
lib = loadLibrary("someLibrary");
fun = getFunction("someFunction");
fun();
unloadLibrary(lib);
And let's suppose that this library uses your thread:
void myclass::someFunction() {
    volatile bool finished_flag = false;
    thrd = new boost::thread(boost::bind(&myclass::mymethod, this, &finished_flag));
    while(!finished_flag) { // ignore the polling loop, it's besides the point
        sleep();
    }
    delete thrd;
}
void myclass::mymethod(volatile bool *finished_flag) {
    // do stuff
    *finished_flag = true;
}
When myclass::mymethod() sets finished_flag to true, myclass::mymethod() hasn't returned yet. At the very least, it still has to execute a "return" instruction of some sort (if not much more: destructors, exception handler management, etc.). If the thread executing myclass::mymethod() gets pre-empted before that point, someFunction() will return to the calling program, and the calling program will unload the library. When the thread executing myclass::mymethod() gets scheduled to run again, the address containing the "return" instruction is no longer valid, and the program crashes.
The solution would be for someFunction() to call thrd->join() before returning. This would ensure that the thread has returned to the OS and is no longer executing.
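A minimal sketch of that fix, using std::thread in place of boost::thread and a hypothetical free function some_function. The flag is kept only to show that it is not what makes returning safe: the join() call is what guarantees the worker has fully left its function before the caller moves on.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for someFunction(): start a worker, then make sure
// it has really finished before returning to the caller.
int some_function() {
    std::atomic<bool> finished{false};
    int result = 0;

    std::thread worker([&] {
        result = 5;            // "do stuff"
        finished.store(true);  // flag says the work is done...
        // ...but the thread is still running here (return, cleanup, etc.)
    });

    // The flag may be polled for progress, but only join() guarantees
    // that the thread is no longer executing any of our code.
    worker.join();

    return finished.load() ? result : -1;
}
```

After join() returns, it is safe for the caller to destroy the data the thread used, or to unload the library that contains the thread's code.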