I posted a previous question, "Seg Fault when using std::string on an embedded Linux platform", where I got some very useful advice. I have been away on other projects since then and have recently returned to this issue.
To reiterate, I am restricted to using the arm-linux cross compiler (version 2.95.2) as this is what is supplied and supported by the embedded platform vendor. I understand that the issue is likely because the stdlib is very old, and not particularly thread safe.
The problem is that whenever I use the STL containers in multiple threads, I end up with a segmentation fault. The code below will consistently seg fault unless I use pthread_mutex_lock and scope operators around the container declarations (as in the other post).
It is not feasible to use this approach in my application, as I pass the containers around to different methods and classes. I would ideally like to solve this problem, or find a suitable alternative. I have tried STLport and SGI's Standard Template Library with the same results. I can only assume that because they are built with the very old gcc, they cannot solve the problem.
Does anyone have any possible recommendations or solutions? Or perhaps you can suggest an implementation of vector (and string) that I can drop into my code?
Thanks in advance for any guidance.
#include <stdio.h>
#include <pthread.h>
#include <vector>
#include <list>
#include <string>

using namespace std;

/////////////////////////////////////////////////////////////////////////////
class TestSeg
{
    static pthread_mutex_t _logLock;

public:
    TestSeg()
    {
    }

    ~TestSeg()
    {
    }

    static void* TestThread( void *arg )
    {
        int i = 0;
        while ( i++ < 10000 )
        {
            printf( "%d\n", i );
            WriteBad( "Function" );
        }
        pthread_exit( NULL );
    }

    static void WriteBad( const char* sFunction )
    {
        //pthread_mutex_lock( &_logLock );
        //{
            printf( "%s\n", sFunction );
            string sKiller;          // <---------------------------------- Bad
            //list<char> killer;     // <---------------------------------- Bad
            //vector<char> killer;   // <---------------------------------- Bad
        //}
        //pthread_mutex_unlock( &_logLock );
        return;
    }

    void RunTest()
    {
        int threads = 100;
        pthread_t _rx_thread[threads];
        for ( int i = 0 ; i < threads ; i++ )
        {
            pthread_create( &_rx_thread[i], NULL, TestThread, NULL );
        }
        for ( int i = 0 ; i < threads ; i++ )
        {
            pthread_join( _rx_thread[i], NULL );
        }
    }
};

pthread_mutex_t TestSeg::_logLock = PTHREAD_MUTEX_INITIALIZER;

int main( int argc, char *argv[] )
{
    TestSeg seg;
    seg.RunTest();
    pthread_exit( NULL );
}
The issue is not with the containers, it's with your code.
It is completely unnecessary to make the containers themselves threadsafe, because what you need, first and foremost, is transaction-like semantics.
Let's assume, for the sake of demonstration, that you have a threadsafe implementation of vector, for example.
Thread 1: if (!vec.empty())
Thread 2: vec.clear();
Thread 1: foo = vec.front();
This interleaving leads to undefined behavior: the vector can be cleared between the empty() check and the call to front().
The issue is that making each operation on the container thread-safe is pretty much pointless, because you are still required to lock the container itself across several operations in a row. So you would lock for your sequence of operations, and then lock again on each and every operation?
As I said: completely unnecessary.
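To make that concrete, here is a minimal sketch of the transaction-style locking (vecLock is a made-up pthread_mutex_t guarding vec):

pthread_mutex_lock( &vecLock );
// The emptiness check and the read happen under one critical section,
// so no other thread can clear the vector in between.
if ( !vec.empty() )
    foo = vec.front();
pthread_mutex_unlock( &vecLock );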
Part of your query might be answered in another thread. The design of C++, including the standard library, is influenced by many factors. Efficiency is a repeated theme. Thread safety mechanisms often are at odds with an objective of efficiency. The age of the library is not really the issue.
For your situation, you may be able to wrap the STL vector in your own vector class (you might consider a Decorator) that contains the locking mechanism and provides the lock/unlock logic around accesses.
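A rough sketch of what such a wrapper could look like (the class and member names are hypothetical, and only one forwarded operation is shown):

#include <pthread.h>
#include <vector>

template <typename T>
class LockedVector
{
public:
    LockedVector()  { pthread_mutex_init( &m_lock, NULL ); }
    ~LockedVector() { pthread_mutex_destroy( &m_lock ); }

    void push_back( const T& v )
    {
        pthread_mutex_lock( &m_lock );
        m_vec.push_back( v );
        pthread_mutex_unlock( &m_lock );
    }

    // Expose lock/unlock so callers can group several operations
    // into a single critical section, as discussed above.
    void lock()   { pthread_mutex_lock( &m_lock ); }
    void unlock() { pthread_mutex_unlock( &m_lock ); }

private:
    std::vector<T>  m_vec;
    pthread_mutex_t m_lock;
};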
Related
I've been trying to use C++ primitive variables and operators, like int, if, and while, to develop a thread-safe mechanism.
My idea is to use two integer variables called sync and lock, incrementing and checking sync and after that incrementing and checking lock. If all the checks succeed, the lock is guaranteed, but if any check fails it tries again.
It seems that my idea is not working properly as it is asserting in the final verification.
#include <assert.h>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

class Resource {
    // Shared resource to be thread safe.
    int resource;
    // Mutual exclusion variables.
    volatile int lock;
    volatile int sync;

public:
    Resource() : resource( 0 ), lock( 0 ), sync( 0 ) {}
    ~Resource() {}

    int sharedResource() {
        return resource;
    }

    void sharedResourceAction( std::string id ) {
        bool done = false;
        do {
            int oldSync = sync;
            // ++ should be atomic.
            sync++;
            if ( sync == oldSync + 1 ) {
                // ++ should be atomic.
                lock++;
                if ( lock == 1 ) {
                    // For the sake of the example, the read-modify-write
                    // is not atomic and not thread safe if there is no
                    // mutex surrounding it.
                    int oldResource = resource;
                    resource = oldResource + 1;
                    done = true;
                }
                // -- should be atomic.
                lock--;
            }
            if ( !done ) {
                // Pseudo-random sleep to unlock the race condition
                // between the threads.
                std::this_thread::sleep_for(
                    std::chrono::microseconds( resource % 5 ) );
            }
        } while( !done );
    }
};

static const int maxThreads = 10;
static const int maxThreadActions = 1000;

void threadAction( Resource& resource, std::string name ) {
    for ( int i = 0; i < maxThreadActions; i++) {
        resource.sharedResourceAction( name );
    }
}

int main() {
    std::vector< std::thread* > threadVec;
    Resource resource;
    // Create the threads. The name is passed by value because the
    // local string goes out of scope before the threads are joined.
    for (int i = 0; i < maxThreads; ++i) {
        std::string name = "t";
        name += std::to_string( i );
        std::thread *thread = new std::thread( threadAction,
                                               std::ref( resource ),
                                               name );
        threadVec.push_back( thread );
    }
    // Join (and free) the threads.
    for ( auto threadVecIter = threadVec.begin();
          threadVecIter != threadVec.end(); threadVecIter++ ) {
        (*threadVecIter)->join();
        delete *threadVecIter;
    }
    std::cout << "Shared resource is " << resource.sharedResource()
              << std::endl;
    assert( resource.sharedResource() == ( maxThreads * maxThreadActions ) );
    return 0;
}
Is there a thread-safe mechanism to protect shared resources using only primitive variables and operators?
No, there are a few reasons why this doesn't work.
Firstly, the standard says it doesn't work. You've (explicitly) got a read/write and a write/write race condition, and the standard forbids both.
Secondly, ++i is in no way atomic. Even on mainstream Intel processors it isn't: it will usually compile to an inc instruction when it would need to be a lock inc instruction.
Thirdly, volatile has no threading meaning in C++ like it does in Java or C#. It is neither necessary nor sufficient to achieve anything to do with thread safety (outside of nasty compiler extensions like MSVC's /volatile:ms). See this answer for more information about volatile in C++.
There may be more issues in your code but this list should be enough to dissuade you.
Edit: And to actually answer your final question: no, I don't think it's possible to implement thread-safety mechanisms from primitive types and operations in a standard-compliant way. Basically, you need the memory subsystem, the CPU, AND the compiler to all agree not to perform certain kinds of transformations when implementing thread-safety mechanisms. This generally means you need compiler hooks or guarantees outside of the standard, as well as knowledge of the target CPU's guarantees or intrinsics, to achieve it.
volatile is absolutely no good for multithreading:
Within a thread of execution, accesses (reads and writes) through volatile glvalues cannot be reordered past observable side-effects (including other volatile accesses) that are sequenced-before or sequenced-after within the same thread, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.
In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).
If you want to have atomic operations on an integer, the proper way to do it is with std::atomic<int>. That gives you guarantees on memory ordering that will be observed by other threads. If you really want to do this sort of lock-free programming, you should sit and absorb the memory model documentation, and, if you're anything like me, strongly reconsider attempting lock-free programming as you try to stop your head exploding.
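For illustration, a minimal sketch of the question's counter rewritten with std::atomic<int> (the surrounding class is omitted):

#include <atomic>

std::atomic<int> resource( 0 );

void increment()
{
    // Atomic read-modify-write, sequentially consistent by default,
    // so every thread observes the increments.
    resource.fetch_add( 1 );
}

int read()
{
    return resource.load(); // atomic read
}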
For example, I've got some work that is computed simultaneously by multiple threads.
For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, and before the next iteration begins a counter should be incremented once.
My problem is that the counter is updated by each thread.
As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it?
Here is some sample code to illustrate the issue and help the discussion along.
(I'm using Boost threads.)
#include <boost/thread/mutex.hpp>

class someTask {
public:
    int mCounter; //initialized to 0
    int mTotal;   //initialized to i.e. 100000
    boost::mutex cntmutex;

    int getCount()
    {
        boost::mutex::scoped_lock lock( cntmutex );
        return mCounter;
    }

    void process( int thread_id, int numThreads )
    {
        while ( getCount() < mTotal )
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            // Wait for all threads to get to this point

            cntmutex.lock();
            mCounter++; // < ---- how to ensure this is only updated once?
            cntmutex.unlock();
        }
    }
};
The main problem I see here is that you are reasoning at too low a level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.
The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all of this within the do phase, which is quite hard. Your pattern can be expressed much more easily using the opposite approach.
First we isolate the work to be done in its own routine:
void process_thread(size_t id, size_t numThreads) {
    // do something
}
Now, we can easily invoke this routine:
#include <future>
#include <thread>
#include <vector>
void process(size_t const total, size_t const numThreads) {
    for (size_t count = 0; count != total; ++count) {
        std::vector< std::future<void> > results;

        // Create all threads, launch the work!
        for (size_t id = 0; id != numThreads; ++id) {
            results.push_back(std::async(process_thread, id, numThreads));
        }

        // The destruction of `std::future`
        // requires waiting for the task to complete (*)
    }
}
(*) See this question.
You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!
You protected the counter with a mutex, ensuring that no two threads can access the counter at the same time. Your other option would be using Boost.Atomic, C++11 atomic operations, or platform-specific atomic operations.
However, your code seems to access mCounter without holding the mutex:
while ( mCounter < mTotal )
That's a problem. You need to hold the mutex to access the shared state.
You may prefer to use this idiom (a sketch follows the list):
1. Acquire the lock.
2. Do tests and other things to decide whether we need to do work or not.
3. Adjust accounting to reflect the work we've decided to do.
4. Release the lock. Do the work. Acquire the lock.
5. Adjust accounting to reflect the work we've done.
6. Loop back to step 2 unless we're totally done.
7. Release the lock.
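A loose sketch in terms of the question's class (doWork is a hypothetical stand-in for the real sub-task):

void process( int thread_id, int numThreads )
{
    for (;;)
    {
        boost::mutex::scoped_lock lock( cntmutex ); // 1. acquire the lock
        if ( mCounter >= mTotal )                   // 2. decide whether work remains
            break;                                  // 7. release the lock (via scope) and stop
        int item = mCounter++;                      // 3. account for the work we claim
        lock.unlock();                              // 4. release the lock...

        doWork( item, thread_id, numThreads );      //    ...and do the work unlocked

        // Steps 4-6: the next iteration reacquires the lock, updates
        // accounting, and loops back to the test in step 2.
    }
}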
You need to use a message-passing solution. This is more easily enabled by libraries like TBB or PPL. PPL is included for free in Visual Studio 2010 and above, and TBB can be downloaded for free under a FOSS licence from Intel.
concurrent_queue<unsigned int> done;
std::vector<Work> work;
// fill work here

parallel_for(0u, static_cast<unsigned int>(work.size()), [&](unsigned int i) {
    processWorkItem(work[i]);
    done.push(i);
});
It's lockless and you can have an external thread monitor the done variable to see how much, and what, has been completed.
I would like to disagree with David on doing multiple lock acquisitions to do the work.
Mutexes are expensive, and with more threads contending for a mutex it basically falls back to a system call, which results in a user-space-to-kernel-space context switch, with the calling thread(s) forced to sleep: thus a lot of overhead.
So if you are using a multiprocessor system, I would strongly recommend using spin locks instead [1].
So what I would do is:
=> Get rid of the scoped lock acquisition just to check the condition.
=> Make your counter volatile to support the above.
=> In the while loop, do the condition check again after acquiring the lock.
class someTask {
public:
    volatile int mCounter; //initialized to 0 : Make your counter volatile
    int mTotal;            //initialized to i.e. 100000
    boost::mutex cntmutex;

    void process( int thread_id, int numThreads )
    {
        while ( mCounter < mTotal ) //compare without acquiring lock
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            cntmutex.lock();

            // Now compare again to make sure that the condition still holds.
            // This saves all those acquisitions and lock releases we did just
            // to check whether the condition was true.
            if ( mCounter < mTotal )
            {
                mCounter++;
            }
            cntmutex.unlock();
        }
    }
};
[1] http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock
My program is randomly crashing in a small scenario I can reproduce, but it happens in mlock.c (which is a VC++ runtime file) from ntdll.dll, and I can't see the stack trace. I do know that it happens in one of my thread functions, though.
This is the mlock.c code where the program crashes:
void __cdecl _unlock (
        int locknum
        )
{
        /*
         * leave the critical section.
         */
        LeaveCriticalSection( _locktable[locknum].lock );
}
The error is "invalid handle specified". If I look at locknum, it's a number larger than _locktable's size, so this makes some sense.
This seems to be related to Critical Section usage. I do use CRITICAL_SECTIONS in my thread, via a CCriticalSection wrapper class and its associated RAII guard, CGuard. Definitions for both here to avoid even more clutter.
This is the thread function that's crashing:
unsigned int __stdcall CPlayBack::timerThread( void * pParams ) {
#ifdef _DEBUG
    DRA::CommonCpp::SetThreadName( -1, "CPlayBack::timerThread" );
#endif
    CPlayBack * pThis = static_cast<CPlayBack*>( pParams );
    bool bContinue = true;
    while( bContinue ) {
        float m_fActualFrameRate = pThis->m_fFrameRate * pThis->m_fFrameRateMultiplier;
        if( m_fActualFrameRate != 0 && pThis->m_bIsPlaying ) {
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent,
                              static_cast<DWORD>( 1000.0f / m_fActualFrameRate ) ) == WAIT_TIMEOUT );
            CImage img;
            if( pThis->m_bIsPlaying && pThis->nextFrame( img ) )
                pThis->sendImage( img );
        }
        else
            bContinue = ( ::WaitForSingleObject( pThis->m_hEndThreadEvent, 10 ) == WAIT_TIMEOUT );
    }
    ::GetErrorLoggerInstance()->Log( LOG_TYPE_NOTE, "CPlayBack", "timerThread", "Exiting thread" );
    return 0;
}
Where does CCriticalSection come in? Every CImage object contains a CCriticalSection object which it uses through a CGuard RAII lock. Moreover, every CImage contains a CSharedMemory object which implements reference counting. To that end, it contains two CCriticalSection's as well, one for the data and one for the reference counter. A good example of these interactions is best seen in the destructors:
CImage::~CImage() {
    CGuard guard(m_csData);
    if( m_pSharedMemory != NULL ) {
        m_pSharedMemory->decrementUse();
        if( !m_pSharedMemory->isBeingUsed() ){
            delete m_pSharedMemory;
            m_pSharedMemory = NULL;
        }
    }
    m_cProperties.ClearMin();
    m_cProperties.ClearMax();
    m_cProperties.ClearMode();
}

CSharedMemory::~CSharedMemory() {
    CGuard guardUse( m_cs );
    if( m_pData && m_bCanDelete ){
        delete []m_pData;
    }
    m_use = 0;
    m_pData = NULL;
}
Has anyone bumped into this kind of error? Any suggestions?
Edit: I got to see some call stack: the call comes from ~CSharedMemory. So there must be some race condition there
Edit: More CSharedMemory code here
The "invalid handle specified" return code paints a pretty clear picture that your critical section object has been deallocated; assuming of course that it was allocated properly to begin with.
Your RAII class seems like a likely culprit. If you take a step back and think about it, your RAII class violates the Separation of Concerns principle, because it has two jobs:
It provides allocate/destroy semantics for the CRITICAL_SECTION
It provides acquire/release semantics for the CRITICAL_SECTION
Most implementations of a CS wrapper I have seen violate the SoC principle in the same way, but it can be problematic, especially when you have to start passing around instances of the class in order to get to the acquire/release functionality. Consider a simple, contrived example in pseudocode:
void WorkerThreadProc(CCriticalSection cs)
{
    cs.Enter();
    // MAGIC HAPPENS
    cs.Leave();
}

int main()
{
    CCriticalSection my_cs;
    std::vector<NeatStuff> stuff_used_by_multiple_threads;

    // Create 3 threads, passing the entry point "WorkerThreadProc"
    for( int i = 0; i < 3; ++i )
        CreateThread(... &WorkerThreadProc, my_cs);

    // Join the 3 threads...
    wait();
}
The problem here is CCriticalSection is passed by value, so the destructor is called 4 times. Each time the destructor is called, the CRITICAL_SECTION is deallocated. The first time works fine, but now it's gone.
You could kludge around this problem by passing references or pointers to the critical section class, but then you muddy the semantic waters with ownership issues. What if the thread that "owns" the crit sec dies before the other threads? You could use a shared_ptr, but now nobody really "owns" the critical section, and you have given up a little control in one area in order to gain a little in another.
The true "fix" for this problem is to seperate concerns. Have one class for allocation & deallocation:
class CCriticalSection : public CRITICAL_SECTION
{
public:
    CCriticalSection()  { InitializeCriticalSection(this); }
    ~CCriticalSection() { DeleteCriticalSection(this); }
};
...and another to handle locking & unlocking...
class CSLock
{
public:
    CSLock(CRITICAL_SECTION& cs) : cs_(cs) { EnterCriticalSection(&cs_); }
    ~CSLock() { LeaveCriticalSection(&cs_); }
private:
    CRITICAL_SECTION& cs_;
};
Now you can pass around raw pointers or references to a single CCriticalSection object, possibly const, and have the worker threads instantiate their own CSLocks on it. The CSLock is owned by the thread that created it, which is as it should be, but ownership of the CCriticalSection is clearly retained by some controlling thread; also a good thing.
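Usage might then look something like this (a sketch reusing the names from the earlier pseudocode):

CCriticalSection cs; // owned by the controlling thread
std::vector<NeatStuff> stuff_used_by_multiple_threads;

void WorkerThreadProc( CCriticalSection& cs ) // passed by reference, never by value
{
    CSLock lock( cs ); // enter on construction
    // MAGIC HAPPENS
}                      // leave on destruction, even on early returns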
Make sure the critical section object is not inside a #pragma pack(1) (or any non-default packing) region.
Ensure that no other thread (or the same thread) is corrupting the CS object. Run a static analysis tool to check for any buffer overrun problems.
If you have a runtime analysis tool, run it to find the issue.
I decided to adhere to the KISS principle and simplify things. I figured I'd replace the CSharedMemory class with a std::tr1::shared_ptr<BYTE> and a CCriticalSection which protects it from concurrent access. Both are members of CImage now, and concerns are better separated, IMHO.
That solved the weird critical section, but now it seems I have a memory leak caused by std::tr1::shared_ptr, you might see me post about it soon... It never ends!
This is related to an issue I have been discussing here and here, but as my investigations have led me away from the STL as the potential issue, and towards "new" as my nemesis, I thought it best to start a new thread.
To reiterate, I am using an arm-linux cross compiler (version 2.95.2) supplied by the embedded platform vendor.
When I run the application below on my Linux PC, it of course never fails. However, when running it on the embedded device, I get segmentation faults every time. Using "malloc" never fails. Synchronising the "new" allocation using the mutex stops the issue, but this is not practical in my main application.
Can anyone suggest why this might be occurring, or have any ideas how I can get around this issue?
Thanks.
#include <stdio.h>
#include <pthread.h>

pthread_mutex_t _logLock = PTHREAD_MUTEX_INITIALIZER;

static void* Thread(void *arg)
{
    int i = 0;
    while (i++ < 500)
    {
        // pthread_mutex_lock(&_logLock);
        char* myDyn = new char[1023];
        // char* buffer = (char*) malloc(1023);
        // if (buffer == NULL)
        //     printf("Out of mem\n");
        // free(buffer);
        delete[] myDyn;
        // pthread_mutex_unlock(&_logLock);
    }
    pthread_exit(NULL);
}

int main(int argc, char** argv)
{
    int threads = 50;
    pthread_t _rx_thread[threads];
    for (int i = 0; i < threads; i++)
    {
        printf("Start Thread: %i\n", i);
        pthread_create(&_rx_thread[i], NULL, Thread, NULL);
    }
    for (int i = 0; i < threads; i++)
    {
        pthread_join(_rx_thread[i], NULL);
        printf("End Thread: %i\n", i);
    }
}
If the heap on your device isn't thread-safe, then you need to lock. You could just write your own new and delete functions that lock for the duration of new or delete -- you don't need to hold the lock across the whole lifetime of the allocated memory.
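A minimal sketch of that idea, assuming the platform's malloc is itself safe to call from multiple threads (as the question suggests); the replacement global operators simply serialize allocation and deallocation:

#include <pthread.h>
#include <cstdlib>
#include <new>

static pthread_mutex_t g_heapLock = PTHREAD_MUTEX_INITIALIZER;

void* operator new( std::size_t size )
{
    // Hold the lock only for the duration of the allocation itself.
    pthread_mutex_lock( &g_heapLock );
    void* p = std::malloc( size );
    pthread_mutex_unlock( &g_heapLock );
    if ( p == NULL )
        throw std::bad_alloc();
    return p;
}

void operator delete( void* p )
{
    pthread_mutex_lock( &g_heapLock );
    std::free( p );
    pthread_mutex_unlock( &g_heapLock );
}

// operator new[] and operator delete[] would be wrapped the same way.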
Check to see if there are compiler switches to make the allocator thread-safe.
As others have stated, chances are that the toolset's default memory allocation behavior is not thread-safe. I just checked with the 2 ARM cross-development toolsets I use the most, and indeed this is the case with one of them.
Most toolsets offer ways of making the libraries thread-safe, either by re-implementing functions or by linking in a different (thread-safe) version of the library. Without more information from you, it's hard to say what your particular toolset is doing.
Hate to say it, but the best information is probably in your toolset documentation.
Yeah, new is probably not thread-safe. You need synchronization mechanisms around the memory allocation and, separately, around the deletion. Look into the Boost.Thread library, which provides mutex types that should help you out.
How about using malloc (you say it never fails on the embedded platform) to get the required memory, then using placement new (void* operator new[](std::size_t size, void* ptr) throw()), assuming it is available, for construction? See the new[] operator reference, this Stack Overflow article, and MSDN.
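For example, a sketch of the malloc-plus-placement-new idea with a simple made-up type (note that a type which allocates internally, like std::string, would still reach operator new for its own buffers):

#include <cstdlib>
#include <new>

struct Point
{
    int x, y;
    Point( int x_, int y_ ) : x( x_ ), y( y_ ) {}
};

void example()
{
    // Get raw storage from (reportedly thread-safe) malloc...
    void* raw = std::malloc( sizeof( Point ) );
    if ( raw == NULL )
        return;

    // ...construct in place, bypassing the global operator new...
    Point* p = new ( raw ) Point( 1, 2 );

    // ...then destroy explicitly and release the raw storage.
    p->~Point();
    std::free( raw );
}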
So I'm looking at using a simple producer/consumer queue in C++. I'll end up using Boost for threading, but this example just uses pthreads. I'll also end up using a far more OO approach, but I think that would obscure the details I'm interested in at the moment.
Anyway the particular issues I'm worried about are
Since this code is using push_back and pop_front of std::deque - it's probably doing allocation and deallocation of the underlying data in different threads - I believe this is bad (undefined behaviour) - what's the easiest way to avoid this?
Nothing is marked volatile. But the important bits are mutex protected. Do I need to mark anything as volatile and if so what? - I don't think I do as I believe the mutex contains appropriate memory barriers etc., but I'm unsure.
Are there any other glaring issues?
Anyway, here's the code:
#include <pthread.h>
#include <deque>
#include <iostream>

struct Data
{
    std::deque<int> * q;
    pthread_mutex_t * mutex;
};

void* producer( void* arg )
{
    std::deque<int> &q = *(static_cast<Data*>(arg)->q);
    pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
    for(unsigned int i=0; i<100; ++i)
    {
        pthread_mutex_lock( m );
        q.push_back( i );
        std::cout<<"Producing "<<i<<std::endl;
        pthread_mutex_unlock( m );
    }
    return NULL;
}

void* consumer( void * arg )
{
    std::deque<int> &q = *(static_cast<Data*>(arg)->q);
    pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
    for(unsigned int i=0; i<100; ++i)
    {
        pthread_mutex_lock( m );
        int v = q.front();
        q.pop_front();
        std::cout<<"Consuming "<<v<<std::endl;
        pthread_mutex_unlock( m );
    }
    return NULL;
}

int main()
{
    Data d;

    std::deque<int> q;
    d.q = &q;

    pthread_mutex_t mutex;
    pthread_mutex_init( &mutex, NULL );
    d.mutex = & mutex;

    pthread_t producer_thread;
    pthread_t consumer_thread;

    pthread_create( &producer_thread, NULL, producer, &d );
    pthread_create( &consumer_thread, NULL, consumer, &d );

    pthread_join( producer_thread, NULL );
    pthread_join( consumer_thread, NULL );
}
EDIT:
I did end up throwing away this implementation; I'm now using a modified version of the code from here, by Anthony Williams. My modified version can be found here. This modified version uses a more sensible condition-variable based approach.
Since this code is using push_back and pop_front of std::deque - it's probably doing allocation and deallocation of the underlying data in different threads - I believe this is bad (undefined behaviour) - what's the easiest way to avoid this?
As long as only one thread can modify the container at a time, this is okay.
Nothing is marked volatile. But the important bits are mutex protected. Do I need to mark anything as volatile and if so what? - I don't think I do as I believe the mutex contains appropriate memory barriers etc., but I'm unsure.
So long as you correctly control access to the container using a mutex, it does not need to be volatile (this is dependent upon your threads library, but it wouldn't be a very good mutex if it didn't provide a correct memory barrier).
It is perfectly valid to allocate memory in one thread and free it in another if both threads are in the same process.
Using a mutex to protect access to the deque should provide the correct memory access configuration.
EDIT: The only other thing to think about is the nature of the producer and consumer. Your synthesized example lacks some of the subtleties involved with a real implementation. For example, how will you synchronize the producer with the consumer if they are not operating at the exact same rate? You might want to consider using something like a pipe or an OS queue instead of a deque so that the consumer can block on read if there is no data ready to process.
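For completeness, the blocking-consumer idea can be sketched with a pthreads condition variable (the names here are made up; the queue mirrors the one in the question):

#include <pthread.h>
#include <deque>

std::deque<int> q;
pthread_mutex_t m  = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

void push( int v )
{
    pthread_mutex_lock( &m );
    q.push_back( v );
    pthread_cond_signal( &cv ); // wake one waiting consumer
    pthread_mutex_unlock( &m );
}

int pop()
{
    pthread_mutex_lock( &m );
    while ( q.empty() )         // loop guards against spurious wakeups
        pthread_cond_wait( &cv, &m );
    int v = q.front();
    q.pop_front();
    pthread_mutex_unlock( &m );
    return v;
}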