I'm creating a pool of threads that consumes a buffer and performs some actions in my application. I've created a std::list m_buffer, and sometimes the application crashes at the end of the buffer.
Here's my code:
MyObject* myObject = 0;
bool hasMore = true;
while(hasMore)
{
    {
        boost::unique_lock lck(m_loadMutex1);
        if(!m_buffer.size())
            break;
        myObject = m_buffer.front();
        m_buffer.pop_front();
        hasMore = m_buffer.size();
    }
}
if(myObject)
    loadMyObject(myObject);
I'm sure the list never starts empty, and many threads execute this piece of code at the same time. When testing on Windows, the application sometimes crashes and the debugger says it was in pop_front. But I can't believe it's there, because I check that the size is greater than 0.
Thanks for helping.
Set first-chance exceptions in the debugger and see what the exception is that is being thrown.
I can't see a way that the code as posted would contain such a problem.
Most likely somewhere ELSE you're adding or removing from the list, unprotected, at the same time this function is popping.
Alternately there could be random memory corruption. Check the size of the buffer in the debugger when the pop_front fails.
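One way to rule out unprotected access is to route every operation on the list through the same mutex. The sketch below is only an illustration: TaskBuffer, push and tryPop are hypothetical names that do not appear in the question, and std::mutex is used in place of the Boost mutex. The emptiness check and the pop share one critical section, and no other code can reach the list.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical wrapper: every access to the list takes the same mutex.
template <typename T>
class TaskBuffer
{
public:
    void push(const T& item)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_items.push_back(item);
    }

    // Returns false when the buffer is empty; otherwise copies the front
    // element into 'out' and removes it. Check and pop share one lock.
    bool tryPop(T& out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_items.empty())
            return false;
        out = m_items.front();
        m_items.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;
    std::list<T> m_items;
};
```

A worker could then loop on tryPop and call loadMyObject on each popped object after the lock has been released.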
I would review how the program terminates. Maybe the list has been deallocated. This could happen in a variety of ways. For example: the list is created on the main thread, the main thread finishes and deallocates the list, and the other threads are still using the list. I mention this particular example because I do not see any logic in the thread method above that lets each thread know it is 'shutdown' time.
I wrote a DLL, MyDLL.dll, with Visual C++ 2008, as follows:
(1) MFC statically linked
(2) Using the multi-threaded runtime library.
In the DLL, there is global data m_Data shared by two exported functions, as follows:
ULONGLONG WINAPI MyFun1(LPVOID *lpCallbackFun1)
{
...
Write m_Data(using Critical section to protect)
…
return xxx;
}
ULONGLONG WINAPI MyFun2(LPVOID *lpCallbackFun2)
{
...
Suspend MyThread1 to prevent conflict.
Read m_Data(using Critical section to protect)
Resume MyThread1.
…
return xxx;
}
In my main application, I first call LoadLibrary to load MyDLL.dll, then get the addresses of MyFun1 and MyFun2, and then do the following:
(1) Start a new thread MyThread1, which will invoke MyFun1 to do a time-consuming task.
(2) Start a new thread MyThread2, which will invoke MyFun2 several times, as follows:
for (nIndex = 0; nIndex <= 20; nIndex++)
{
nResult2 = MyFun2(lpCallbackFun2);
NextStatement2;
}
Although MyThread1 and MyThread2 use a critical section to protect the shared data m_Data, I still suspend MyThread1 before accessing the shared data, to prevent any possible conflicts.
The problem is:
(1) On the first invocation of MyFun2, everything is OK, and the return value of MyFun2 (that is, nResult2) is 1, which is expected.
(2) On the second, third and fourth invocations of MyFun2, the operations in MyFun2 execute successfully, but the return value of MyFun2 (that is, nResult2) is a random value instead of the expected value 1. I traced into MyFun2 in the debugger and confirmed that the last return statement returns a value of 1, but the caller receives a random value instead of 1 when inspecting nResult2.
(3) After the fourth invocation of MyFun2, on returning to the statement that follows MyFun2, I always get a “buffer overrun detected” error, whatever the next statement is.
I think this looks like stack corruption, so I ran some tests:
I confirm the /GS (Stack security check) feature in the compiler is ON.
If MyFun2 is invoked after MyFun1 in MyThread1 is completed, then everything will be OK.
In debug mode, the codeline in MyFun2 that reads the shared data m_Data will not cause any errors or exceptions. Neither will the codeline in MyFun1 that writes the shared Data.
So, how can I solve this problem?
Thank you!
I suppose at this line
Suspend MyThread1 to prevent conflict.
you are using SuspendThread() function. That's what its documentation says:
This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread. To avoid this situation, a thread within an application that is not a debugger should signal the other thread to suspend itself. The target thread must be designed to watch for this signal and respond appropriately.
So, in short: don't use it. Critical sections and other synchronization objects do their job just fine.
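As a minimal sketch of relying on the lock alone: the example below uses std::mutex as a portable stand-in for a Windows critical section, and the names SharedData, writeData and readData are hypothetical, not taken from the question. Reader and writer take the same lock, and neither thread is ever suspended from outside.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared state, standing in for m_Data.
struct SharedData
{
    std::mutex mutex;                 // stand-in for the critical section
    unsigned long long value = 0;
};

// Writer side (the role of MyFun1): modifies the data under the lock.
void writeData(SharedData& data, int iterations)
{
    for (int i = 0; i < iterations; ++i)
    {
        std::lock_guard<std::mutex> lock(data.mutex);
        ++data.value;
    }
}

// Reader side (the role of MyFun2): takes the same lock, copies the value
// and returns the copy. No SuspendThread/ResumeThread is involved.
unsigned long long readData(SharedData& data)
{
    std::lock_guard<std::mutex> lock(data.mutex);
    return data.value;
}
```

Because the writer releases the lock after every update, a reader on another thread waits only for the short time an update is in progress.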
Never use SuspendThread!!! NEVER!
SuspendThread is only meant for debugging purposes.
The reason is simple. You don't know where you suspend the thread. It may be just at the moment when the thread holds a resource that you want to use. Also, a number of CRT functions use thread synchronisation internally.
Just use critical sections or mutexes.
Just see the simple sample here: http://blog.kalmbachnet.de/?postid=6 and here
http://blog.kalmbachnet.de/?postid=16
Since this is a Windows program, you could use a Windows mutex or semaphore with WaitForSingleObject when reading or writing shared data.
For 2 days now, I have been trying to find out where the problem in my code is. I have isolated the problem like this:
There is a loop which looks like this:
int test_counter = 0; //Debug purpose only
for (const_iterator i = begin(); i != end(); i++, test_counter++){
    if ((*i)->isSoloed()) {
        soloed = (*i);
        break;
    }
}
It is in a method of a class that inherits from std::list. The list contains pointers to some dynamically allocated instances of some class, but that is likely not important here.
The list contains exactly two pointers.
The problem is that in about 20% of runs, the second pass (test_counter == 1) crashes on (*i)->isSoloed() with an access violation. In this case, the iterator value is 0xfeeefeee. This exact value is used by Visual Studio to indicate that the memory has been freed. Well, that doesn't make any sense for at least 3 reasons:
No memory gets deallocated here or in other threads.
Even if it did, how would the iterator get that value???
If, in the case of a crash (the exception window), I click Break and look at the list, the second item looks intact and everything seems OK.
Note that this is multithreaded code, which is likely to be the problem here, but the loop is read-only (I even used the const_iterator) and the other thread that has the pointer to this list does not write while the loop is running. But even so, how could that affect the value of the iterator, which is a local variable here!
Thanks a lot.
//edit:
I have also noticed 2 more interesting things:
1) If I break into the debugger after the access violation occurs, I can go back (by dragging the next-statement arrow) to before the loop and run it again without any problem. So the problem is unfortunately pretty nondeterministic.
2) I have never been able to reproduce the problem in release build.
The signature of the method:
MidiMessageSequence::MidiEventHolder* getNextActiveEvent();
and it is called like this:
currentEvent = workingTrackList->getNextActiveEvent();
nothing special really. The application uses JUCE library, but that shouldn't be a problem. I can post more code, just tell me what should I post.
Two possible reasons.
1: The memory that i points to is deleted before it is accessed through (*i). Try adding a null check, if (*i), before calling (*i)->isSoloed().
2: Try to add a lock before you access the list or list item each time.
I haven't found out exactly where the problem was, but I dropped the inheritance and aggregated the std::list instead. With this, the TrackList class became just a sort of wrapper around the std::list. I put a scoped lock in every method that accesses the list (in the wrapper, so from the outside it works the same and I do not need to care about locking from outside), and this pretty much solved the problem.
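The wrapper described above might look roughly like the following sketch. It is not the actual TrackList from the application: the Track type, the method names and the use of std::mutex are assumptions made for illustration. The list is a private member rather than a base class, so callers can only reach it through methods that take the lock.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical stand-in for the element type stored in the list.
struct Track
{
    bool soloed;
    bool isSoloed() const { return soloed; }
};

// The wrapper owns the list instead of inheriting from it.
class TrackList
{
public:
    void add(Track* track)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tracks.push_back(track);
    }

    // Returns the first soloed track, or nullptr if there is none.
    Track* findSoloed() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (Track* track : m_tracks)
        {
            if (track->isSoloed())
                return track;
        }
        return nullptr;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_tracks.size();
    }

private:
    mutable std::mutex m_mutex;   // mutable so const methods can lock
    std::list<Track*> m_tracks;
};
```

Note that the lock only protects the list structure; the lifetime of the objects the pointers refer to still has to be managed separately.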
I am having problems with stack corruption in a new module I am working on which is part of a large legacy project. My code is written in C++ using Borland C++Builder 5.0.
I have tracked the problem to the following function:
// Note: Class TMarshalServerClientThread has the following objects defined
// CRITICAL_SECTION FCriticalSection;
// std::vector<TMarshalTagInfo*> FTagChangeQueue;
void __fastcall TMarshalServerClientThread::SendChangeNotifications()
{
    EnterCriticalSection(&FCriticalSection);
    try {
        if (FTagChangeQueue.size() == 0) {
            return;
        }
        // Process items in change queue
        FTagChangeQueue.clear();
    } __finally {
        LeaveCriticalSection(&FCriticalSection);
    }
}
This function is called in the context of a worker thread (which descends from TThread). A different thread populates the change queue with data as it becomes available. The change queue is protected by a critical section object.
When the code is run, I sporadically get access violations when attempting to leave the critical section. From what I can tell, sometimes when the __finally section is entered, the stack is corrupted. The class instance on the heap is fine, but the pointers to the class (such as the "this" pointer) appear to be invalid.
If I remove the call to return if the change queue is empty, the problem goes away. Additionally, the code to process the items in the queue is not the source of the problem, as I can comment it out and the problem remains.
So my question is are there known issues when using __finally in C++Builder 5? Is it wrong to call return from within a try __finally block? If so, why?
Please note that I realize that there are different/better ways to do what I am doing, and I am refactoring as such. However, I fail to see why this code should be causing stack corruption.
As @duDE pointed out, you should use the pair __try / __finally instead of mixing the C++ try with the Borland extension __finally.
I know it is a long time after the original question was posted, but as a warning to others, I can vouch for the symptom that Jonathan Wiens is reporting. I experienced it with Builder XE4. It does not happen frequently, but it seems that Borland/Embarcadero's implementation of try / finally blocks in a multi-threaded process very occasionally corrupts the stack. I was also using critical sections, although that may be coincidental.
I was able to resolve my problem by discarding the try / finally. I was fortunate that I was only deleting class instances in the finally block, so I was able to replace the try / finally with scope braces, using std::auto_ptr fields to delete the objects in question.
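The same idea of replacing try / finally with scope-based cleanup can be applied to the lock itself. The sketch below assumes a C++11 compiler (which C++Builder 5 is not) and uses std::mutex and std::lock_guard as stand-ins for CRITICAL_SECTION and the Enter/Leave calls; ChangeNotifier and its members are hypothetical names. With an older compiler, a small hand-written guard class around EnterCriticalSection and LeaveCriticalSection would have the same shape.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical reduction of the worker class from the question.
class ChangeNotifier
{
public:
    void queueChange(int tag)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queue.push_back(tag);
    }

    // Returns the number of items processed. The early return is safe:
    // the guard's destructor releases the lock on every exit path.
    std::size_t sendChangeNotifications()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
            return 0;
        const std::size_t processed = m_queue.size();
        // ... process items in the change queue here ...
        m_queue.clear();
        return processed;
    }

private:
    std::mutex m_mutex;          // stand-in for CRITICAL_SECTION
    std::vector<int> m_queue;    // stand-in for the tag change queue
};
```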
Or at least I think the problem involves some kind of memory error. I'm making a program in SFML and I'm currently working on the menus using a GUI class that I made just for SFML. Internally, the GUI class uses std::shared_ptr to manage all of its internal pointers. The program consistently crashes after main() exits and all global destructors have been called, and gdb says a break point was triggered in ntdll!WaitForAlpCompletion, which leads me to believe that the problem is memory corruption. Whenever I remove the GUI instantiation from the menu function, it exits and closes with no errors. This seems to indicate GUI as the cause of the crash, except that sub-menus which create and destroy their own instances of GUI can be called and exited without any crashes or break points.
Some pseudocode:
SubMenu
{
Create GUI
Do Menu
Destroy GUI
}
Menu
{
Create GUI
Do Menu -> SubMenu
Destroy GUI
}
main
{
Init Stuff
Menu
UnInit Stuff
Destroy GUI
return 0
}
//after return
Global Dtors
Breakpoint triggered???
I'm at a loss as to what this could be. I plan on using some memory debugger like valgrind sometime today, but I was wondering if anyone else had any ideas on what this could be.
Finally figured it out!!!!! It turns out that std::map calls the destructors of its objects every time it is re-sized, causing the shared_ptr's within to delete their data several times. A few "quick" design changes and fixed :) Thanks guys!
A heap corruption can be caused with this code:
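#include <new>   // for std::nothrow

int main()
{
    int *A(new(std::nothrow) int(10));
    int *B(A);   // B points at the same object as A
    delete B;
    delete A;    // second delete of the same object: undefined behavior
}
Does any of your code contain a similar situation?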
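For contrast, here is a small sketch of single ownership with std::unique_ptr, where a second pointer can only take the object over, not share it. The Tracked type and the transferOwnership function are hypothetical and exist only to make the destructor count observable.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Counts destructor calls so a double delete would be visible.
struct Tracked
{
    static int destroyed;
    ~Tracked() { ++destroyed; }
};
int Tracked::destroyed = 0;

// Moves ownership from 'a' to 'b' and lets both go out of scope.
// Returns true if 'a' was left empty by the move.
bool transferOwnership()
{
    std::unique_ptr<Tracked> a(new Tracked());
    std::unique_ptr<Tracked> b(std::move(a));   // 'a' is now null
    return a == nullptr;
}   // only 'b' deletes the object, exactly once
```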
[Edit: (copied from a comment) As it turns out, the problem was elsewhere, but thank you all for your input.]
I have a shared container class which uses a single mutex to lock the push() and pop() functions, since I don't want to simultaneously modify the head and tail. Here's the code:
int Queue::push( WorkUnit unit )
{
    pthread_mutex_lock( &_writeMutex );
    int errorCode = 0;
    try
    {
        _queue.push_back( unit );
    }
    catch( std::bad_alloc )
    {
        errorCode = 1;
    }
    pthread_mutex_unlock( &_writeMutex );
    return errorCode;
}
When I run this in debug mode, everything is peachy. When I run in release mode, I get crashes at roughly the time when the driver program starts pushing and popping "simultaneously". Does the try/catch block immediately force an exit if it catches a std::bad_alloc exception? If so, should I enclose the remainder of the function in a finally block?
Also, is it possible that the slower debug mode only succeeds because my push() and pop() calls never actually occur at the same time?
In C++ we use Resource Acquisition Is Initialization (RAII) for guarding against exceptions.
Is this actually bombing after an exception? Far more likely from your snippet is that you just have bad synchronization in place. That starts with the name of your mutex: "writeMutex". This is not going to work if there is also a "readMutex". All reading, peeking and writing operations need to be locked by the same mutex.
Does the try/catch block immediately force an exit if it catches a std::bad_alloc exception?
No. If a std::bad_alloc is thrown inside the try {...} block, the code in the catch {...} block will fire.
If your program is actually crashing, then it seems like either your push_back call is throwing some exception other than bad_alloc (which isn't handled in your code), or the bad_alloc is being thrown somewhere outside the try {...} block.
By the way, are you sure you really want to use a try...catch block here?
Plus:
What does the pop look like?
Create a lock wrapper class that will automatically free the lock when it goes out of scope (as in the RAII comment).
C++ does not have finally (thanks to Mr. Stroustrup being stroppy).
I would catch std::exception or nothing at all (ducks head down for flame war). If you catch nothing, then you need the wrapper.
Regarding release/debug: Yes, you will often find that race conditions change between the two types of builds. When you deal with synchronization, your threads will run with different timing. Well-written threading will mostly run concurrently, while with poorly written threading the threads will run in a highly synchronous manner relative to each other. All types of synchronization yield some level of synchronous behavior. It is as if synchronous and synchronization come from the same root word...
So yes, given the slightly different run-time performance between debug and release, those points where the threads synchronize can sometimes cause bad code to manifest in one type of build and not the other.
You need to use RAII
This basically means using the constructor/destructor to lock/unlock the resource.
This guarantees that the mutex will always be unlocked even when exceptions are around.
You should only be using one mutex for access to the list.
Even if you have a read-only mutex that is used by a thread that only reads, that does not mean it is safe to read while another thread is updating the queue. The queue could be in some intermediate state caused by a thread calling push() while another thread is trying to navigate that invalid intermediate state.
class Locker
{
public:
    Locker(pthread_mutex_t &lock)
        :m_mutex(lock)
    {
        pthread_mutex_lock(&m_mutex);
    }
    ~Locker()
    {
        pthread_mutex_unlock(&m_mutex);
    }
private:
    pthread_mutex_t& m_mutex;
};
int Queue::push( WorkUnit unit )
{
    // building the object lock calls the constructor thus locking the mutex.
    Locker lock(_writeMutex);
    int errorCode = 0;
    try
    {
        _queue.push_back( unit );
    }
    catch( std::bad_alloc ) // Other exceptions may happen here.
    {                       // You catch one that you handle locally via error codes.
        errorCode = 1;      // That is fine. But there are other exceptions to think about.
    }
    return errorCode;
} // lock destructor called here. Thus unlocking the mutex.
PS. I hate the use of leading underscore.
Though technically it is OK here (assuming member variables), it is so easy to mess up that I prefer not to prepend '_' to identifiers. See "What are the rules about using an underscore in a C++ identifier?" for a whole list of rules about '_' in identifier names.
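With a C++11 standard library, the hand-written Locker can be replaced by std::lock_guard. The sketch below is an illustration under assumptions, not the original Queue: the WorkUnit layout, the pop() signature and the choice of std::deque as the underlying container are all made up here. What it is meant to show is push() and pop() sharing one mutex, and the lock being released on every return path.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <new>

// Hypothetical payload type standing in for WorkUnit.
struct WorkUnit
{
    int id;
};

class Queue
{
public:
    // Returns 0 on success, 1 if the allocation failed.
    int push(const WorkUnit& unit)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        try
        {
            m_queue.push_back(unit);
        }
        catch (const std::bad_alloc&)
        {
            return 1;   // the guard still unlocks on this path
        }
        return 0;
    }

    // Returns false if the queue was empty; uses the same mutex as push().
    bool pop(WorkUnit& out)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_queue.empty())
            return false;
        out = m_queue.front();
        m_queue.pop_front();
        return true;
    }

private:
    std::mutex m_mutex;             // one mutex for all queue operations
    std::deque<WorkUnit> m_queue;   // assumed container type
};
```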
Previous code sample with Locker class has a major problem:
What do you do when and if pthread_mutex_lock() fails?
The answer is you must throw an exception at this point, from constructor, and it can be caught.
Fine.
However,
According to c++ exception specs throwing an exception from a destructor is a no-no.
HOW DO YOU HANDLE pthread_mutex_unlock FAILURES?
Running code under any instrumentation software serves no purpose whatsoever.
You have to write code that works, not run it under valgrind.
In C it works perfectly fine:
pthread_cleanup_pop( 0 );
r = pthread_mutex_unlock( &mutex );
if ( r != 0 )
{
    /* Explicit error handling at point of occurrence goes here. */
}
But because c++ is a software abortion there is just no reasonable way to deal with threaded code failures with any degree of certainty. Brain-dead ideas like wrapping pthread_mutex_t into a class that adds some sort of state variable is just that - brain dead. The following code just does not work:
Locker::~Locker()
{
    if ( pthread_mutex_unlock( &mutex ) != 0 )
    {
        failed = true; // Nonsense.
    }
}
And the reason for that is that after pthread_mutex_unlock() returns this thread very well may be sliced out of cpu - preempted. That means that the .failed public variable will be still false. Other threads looking at it will get wrong information - the state variable says no failures, meanwhile pthread_mutex_unlock() failed. Even if, by some stroke of luck, these two statements run in one go, this thread may be preempted before ~Locker() returns and other threads may modify the value of .failed. Bottom line these schemes do not work - there is no atomic test-and-set mechanism for application defined variables.
Some say, destructors should never have code that fails. Anything else is a bad design. Ok, fine. I am just curious to see what IS a good design to be 100% exception and thread safe in c++.
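One possible shape for such a design, offered as a sketch rather than a definitive answer: give the guard an explicit unlock() that returns the pthread error code, so the failure can be handled at the point of occurrence as in the C version, and keep the destructor only as a non-throwing fallback for early exits and exceptions. CheckedLocker is a hypothetical name. Its m_locked flag belongs to a guard object that is assumed to live on the stack of the thread that locked the mutex, so no other thread reads or writes it.

```cpp
#include <cassert>
#include <pthread.h>
#include <stdexcept>

// Hypothetical guard: lock failures throw from the constructor, unlock
// failures are reported by unlock(), the destructor never throws.
class CheckedLocker
{
public:
    explicit CheckedLocker(pthread_mutex_t& mutex)
        : m_mutex(mutex), m_locked(false)
    {
        if (pthread_mutex_lock(&m_mutex) != 0)
            throw std::runtime_error("pthread_mutex_lock failed");
        m_locked = true;
    }

    // Returns the pthread error code (0 on success) so the caller can
    // handle a failure where it happens. Calling it twice is harmless.
    int unlock()
    {
        if (!m_locked)
            return 0;
        m_locked = false;
        return pthread_mutex_unlock(&m_mutex);
    }

    ~CheckedLocker()
    {
        if (m_locked)
            pthread_mutex_unlock(&m_mutex);   // result ignored: must not throw
    }

private:
    CheckedLocker(const CheckedLocker&);              // non-copyable
    CheckedLocker& operator=(const CheckedLocker&);

    pthread_mutex_t& m_mutex;
    bool m_locked;   // only touched by the thread that owns this guard
};
```

On the normal path the caller ends the critical section with an explicit check of unlock(); the destructor only covers the paths where that call was never reached.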