What are the problems with this producer/consumer implementation? - c++

So I'm looking at using a simple producer/consumer queue in C++. I'll end up using boost for threading but this example is just using pthreads. I'll also end up using a far more OO approach, but I think that would obscure the details I'm interested in at the moment.
Anyway, the particular issues I'm worried about are:
Since this code uses push_back and pop_front on a std::deque, it's probably allocating and deallocating the underlying data in different threads. I believe this is bad (undefined behaviour). What's the easiest way to avoid it?
Nothing is marked volatile, but the important bits are mutex protected. Do I need to mark anything as volatile, and if so, what? I don't think I do, as I believe the mutex provides the appropriate memory barriers etc., but I'm unsure.
Are there any other glaring issues?
Anyway, here's the code:
#include <pthread.h>
#include <deque>
#include <iostream>

struct Data
{
    std::deque<int> * q;
    pthread_mutex_t * mutex;
};

void* producer( void* arg )
{
    std::deque<int> &q = *(static_cast<Data*>(arg)->q);
    pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
    for(unsigned int i=0; i<100; ++i)
    {
        pthread_mutex_lock( m );
        q.push_back( i );
        std::cout<<"Producing "<<i<<std::endl;
        pthread_mutex_unlock( m );
    }
    return NULL;
}

void* consumer( void * arg )
{
    std::deque<int> &q = *(static_cast<Data*>(arg)->q);
    pthread_mutex_t * m = (static_cast<Data*>(arg)->mutex);
    for(unsigned int i=0; i<100; ++i)
    {
        pthread_mutex_lock( m );
        int v = q.front();
        q.pop_front();
        std::cout<<"Consuming "<<v<<std::endl;
        pthread_mutex_unlock( m );
    }
    return NULL;
}

int main()
{
    Data d;
    std::deque<int> q;
    d.q = &q;

    pthread_mutex_t mutex;
    pthread_mutex_init( &mutex, NULL );
    d.mutex = & mutex;

    pthread_t producer_thread;
    pthread_t consumer_thread;
    pthread_create( &producer_thread, NULL, producer, &d );
    pthread_create( &consumer_thread, NULL, consumer, &d );
    pthread_join( producer_thread, NULL );
    pthread_join( consumer_thread, NULL );
}
EDIT:
I did end up throwing away this implementation; I'm now using a modified version of the code from here by Anthony Williams. My modified version can be found here. This modified version uses a more sensible condition-variable-based approach.

Since this code uses push_back and pop_front on a std::deque, it's probably allocating and deallocating the underlying data in different threads. I believe this is bad (undefined behaviour). What's the easiest way to avoid it?
As long as only one thread can modify the container at a time, this is okay.
Nothing is marked volatile, but the important bits are mutex protected. Do I need to mark anything as volatile, and if so, what? I don't think I do, as I believe the mutex provides the appropriate memory barriers etc., but I'm unsure.
So long as you correctly control access to the container using a mutex, it does not need to be volatile (this is dependent upon your threads library, but it wouldn't be a very good mutex if it didn't provide a correct memory barrier).

It is perfectly valid to allocate memory in one thread and free it in another if both threads are in the same process.
Using a mutex to protect access to the deque should provide the correct memory ordering.
EDIT: The only other thing to think about is the nature of the producer and consumer. Your synthesized example lacks some of the subtleties involved with a real implementation. For example, how will you synchronize the producer with the consumer if they are not operating at the exact same rate? You might want to consider using something like a pipe or an OS queue instead of a deque so that the consumer can block on read if there is no data ready to process.
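In the same spirit, the condition-variable approach mentioned in the question's EDIT lets the consumer block on the deque itself, which also removes the original consumer's implicit assumption that the queue is never empty when it calls front(). A minimal sketch, assuming C++11 (the class and member names here are illustrative, not taken from the linked implementation):

#include <condition_variable>
#include <deque>
#include <mutex>

template <typename T>
class BlockingQueue
{
    std::deque<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
public:
    void push(T v)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(v));
        }
        cv_.notify_one(); // wake one waiting consumer
    }

    T pop()
    {
        std::unique_lock<std::mutex> lock(m_);
        // Block until an item is available; the predicate protects
        // against spurious wakeups.
        cv_.wait(lock, [this]{ return !q_.empty(); });
        T v = std::move(q_.front());
        q_.pop_front();
        return v;
    }
};

With this, the consumer simply calls pop() and sleeps until the producer has published something, instead of racing ahead of the producer.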

Related

Is it ok to use atomics to reduce locking in a read dominant multithread program?

Recently I have often found myself in situations where shared data is read a lot but written rarely, so I began to wonder whether it is possible to speed up the synchronization a little.
Take the following as an example, in which multiple threads occasionally write the data and a single thread frequently reads it, all synchronized with a normal mutex.
#include <cstdlib>
#include <iostream>
#include <unistd.h>
#include <unordered_map>
#include <mutex>
#include <thread>
using namespace std;

unordered_map<int, int> someData({{1,10}});
mutex mu;

void writeData(){
    while(true){
        {
            lock_guard<mutex> lock(mu);
            int r = rand()%10;
            someData[1] = r;
            printf("data changed to %d\n", r);
        }
        usleep(rand()%100000000 + 100000000);
    }
}

void readData(){
    while(true){
        {
            lock_guard<mutex> lock(mu);
            for(auto &i:someData){
                printf("%d:%d\n", i.first, i.second);
            }
        }
        usleep(100);
    }
}

int main() {
    thread writeT1(&writeData);
    thread writeT2(&writeData);
    thread readT(&readData);
    readT.join();
}
Using the normal locking mechanism, every read requires a lock, so I'm thinking of speeding this up to a single atomic read in most cases:
#include <atomic>

unordered_map<int, int> someData({{1,10}});
mutex mu;
atomic_int dataVersion{0};

void writeData2(){
    while(true){
        {
            lock_guard<mutex> lock(mu);
            dataVersion.fetch_add(1, memory_order_acquire);
            int r = rand()%10;
            someData[1] = r;
            printf("data changed to %d\n", r);
        }
        usleep(rand()%100000000 + 100000000);
    }
}

void readData2(){
    mu.lock();
    int versionCopy = dataVersion.load();
    auto dataCopy = someData;
    mu.unlock();

    while(true){
        if (versionCopy != dataVersion.load(memory_order_relaxed)){
            lock_guard<mutex> lock(mu);
            versionCopy = dataVersion.load(memory_order_relaxed);
            dataCopy = someData;
        }
        else{
            for(auto &i:dataCopy){
                printf("%d:%d\n", i.first, i.second);
            }
            usleep(100);
        }
    }
}
The unordered_map data type here is just an example; it could be any type, and I'm not looking for a pure lock-free algorithm, as that might be a whole other story. For normal lock-based synchronization, in a situation where most operations are reads, is a trick like this logically okay? Are there any established approaches for this?
[edit]
I'm aware of the shared mutex, but it isn't really the situation I was talking about. Firstly, a shared lock is not cheap, probably more expensive than a plain mutex and certainly heavier than atomics; secondly, the example shows a single reading thread, which can't take much advantage of it.
I was interested particularly in the cost of the locking operation itself. Reducing blocking and shrinking the critical section are of course the first things to look at in a real case, but I wasn't targeting that here.
The unordered_map data type is just an example; I'm not looking for a data structure that better suits a specific task, or for a lock-free algorithm. The data type could be anything.
The sleep times are there to demonstrate that reads happen far more often than writes, to the degree that we stop caring about the extra lock-and-copy time in the if block.
Thanks~
You are storing the data in an unordered_map. What guarantees does the unordered_map class make about concurrent access for readers and writers? If it is unhappy with that prospect, the atomics are not your friend.
In most (every?) OS, the locking primitives themselves are handled with atomics in the uncontested case, only falling back to a kernel call when contested. With that in mind, you are best off minimizing the amount of code run while the lock is held, so your first loop should be:
int r = rand()%10;
mu.lock();
someData[1] = r;
mu.unlock();
printf("data changed to %d\n", r);
I don't know how you would fix the read side, but if you chose a friendlier data store, you could minimize access to it in the same way.
I will first try to describe my own understanding of your idea:
Frequent reads, occasional writes.
Locks are expensive ... this should be benchmarked; try std::shared_mutex or Slim Reader/Writer (SRW) Locks (Windows only), or some other slim implementation. These usually use a cheap, optimistic (atomic/spin-lock) mechanism that has little to no impact when there is no collision (no writer most of the time).
You don't seem to care how old/recent your copy is. That is acceptable for some informative performance counters, but I would think twice about it - it is not something somebody else maintaining your code would expect or even think about. The consequences can be catastrophic.
You only access the writable data under the lock, and you read from a copy you create while holding the lock. That means your approach is safe from a simple thread-synchronization view, except for the above point (readers working with old data; multiple readers can have different copies ... is it worth it?).
Anyway, you should really benchmark first and then look for a better solution that somebody else has already written (slim rw-locks), before attempting to come up with your own synchronization mechanism (that is generally very hard to do correctly).
EDIT: Found an article with a concrete shared_mutex implementation using std::atomic:
Code Project: We make a std::shared_mutex 10 times faster
Coliru test here
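For completeness, a minimal reader/writer sketch using std::shared_mutex (C++17); note that, as the question's edit points out, this mainly pays off when there are several concurrent readers. The function names are illustrative:

#include <shared_mutex>
#include <unordered_map>

std::unordered_map<int, int> someData({{1, 10}});
std::shared_mutex sm;

void writeValue(int r)
{
    std::unique_lock<std::shared_mutex> lock(sm); // exclusive: blocks readers and writers
    someData[1] = r;
}

int readValue()
{
    std::shared_lock<std::shared_mutex> lock(sm); // shared: readers do not block each other
    return someData.at(1);
}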

Mutual exclusion for an asynchronous thread using primitive operators in c++

I've been trying to use primitive C++ operators and variables, like int, if, and while, to develop a thread-safe mechanism.
My idea is to use two integer variables called sync and lock, incrementing and checking sync and after that incrementing and checking lock. If all the checks are successful, then the lock is guaranteed; otherwise it tries again.
It seems that my idea is not working properly, as the final assertion fails.
#include <assert.h>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

class Resource {
    // Shared resource to be thread safe.
    int resource;
    // Mutual exclusion variables.
    volatile int lock;
    volatile int sync;

public:
    Resource() : resource( 0 ), lock( 0 ), sync( 0 ) {}
    ~Resource() {}

    int sharedResource() {
        return resource;
    }

    void sharedResourceAction( std::string id ) {
        bool done = false;
        do {
            int oldSync = sync;
            // ++ should be atomic.
            sync++;
            if ( sync == oldSync + 1 ) {
                // ++ should be atomic.
                lock++;
                if ( lock == 1 ) {
                    // For the sake of the example, the read-modify-write
                    // is not atomic and not thread safe if there is no
                    // mutex surrounding it.
                    int oldResource = resource;
                    resource = oldResource + 1;
                    done = true;
                }
                // -- should be atomic.
                lock--;
            }
            if ( !done ) {
                // Pseudo-random sleep to break the race condition
                // between the threads.
                std::this_thread::sleep_for(
                    std::chrono::microseconds( resource % 5 ) );
            }
        } while( !done );
    }
};

static const int maxThreads = 10;
static const int maxThreadActions = 1000;

void threadAction( Resource& resource, std::string& name ) {
    for ( int i = 0; i < maxThreadActions; i++) {
        resource.sharedResourceAction( name );
    }
}

int main() {
    std::vector< std::thread* > threadVec;
    Resource resource;

    // Create the threads.
    for (int i = 0; i < maxThreads; ++i) {
        std::string name = "t";
        name += std::to_string( i );
        std::thread *thread = new std::thread( threadAction,
                                               std::ref( resource ),
                                               std::ref( name ) );
        threadVec.push_back( thread );
    }

    // Join the threads.
    for ( auto threadVecIter = threadVec.begin();
          threadVecIter != threadVec.end(); threadVecIter++ ) {
        (*threadVecIter)->join();
    }

    std::cout << "Shared resource is " << resource.sharedResource()
              << std::endl;
    assert( resource.sharedResource() == ( maxThreads * maxThreadActions ) );
    return 0;
}
Is there a thread-safe mechanism to protect shared resources using only primitive variables and operators?
No, there are a few reasons why this doesn't work.
Firstly, the standard says it doesn't work: you have (explicit) read/write and write/write race conditions, and the standard forbids this.
Secondly, ++i is in no way atomic. Even on mainstream Intel processors it isn't - it will usually compile to an inc instruction when it would need to be a lock inc instruction.
Thirdly, volatile has no threading meaning in C++ like it does in Java or C#. It is neither necessary nor sufficient to achieve anything to do with thread safety (outside of nasty compiler extensions like MSVC's /volatile:ms). See this answer for more information about volatile in C++.
There may be more issues in your code, but this list should be enough to dissuade you.
Edit: And to actually answer your final question - no, I don't think it's possible to implement thread-safety mechanisms from primitive types and operations in a standard-compliant way. Basically you need the memory subsystem, the CPU, AND the compiler to all agree not to perform certain kinds of transformations when implementing thread-safety mechanisms. This generally means you need compiler hooks or guarantees outside of the standard, plus knowledge of the target CPU's guarantees or intrinsics, to achieve it.
volatile is absolutely no good for multithreading:
Within a thread of execution, accesses (reads and writes) through volatile glvalues cannot be reordered past observable side-effects (including other volatile accesses) that are sequenced-before or sequenced-after within the same thread, but this order is not guaranteed to be observed by another thread, since volatile access does not establish inter-thread synchronization.
In addition, volatile accesses are not atomic (concurrent read and write is a data race) and do not order memory (non-volatile memory accesses may be freely reordered around the volatile access).
If you want to have atomic operations on an integer, the proper way to do it is with std::atomic<int>. That gives you guarantees on memory ordering that will be observed by other threads. If you really want to do this sort of lock-free programming, you should sit and absorb the memory model documentation, and, if you're anything like me, strongly reconsider attempting lock-free programming as you try to stop your head exploding.
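For instance, here is a minimal sketch of the questioner's counter rewritten with std::atomic instead of volatile int and plain ++ (a sketch of the standard approach, not of any lock-free trickery):

#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> counter{0};

int main()
{
    std::vector<std::thread> threads;
    for (int t = 0; t < 10; ++t)
    {
        threads.emplace_back([]{
            for (int i = 0; i < 1000; ++i)
                counter.fetch_add(1, std::memory_order_relaxed); // a genuinely atomic increment
        });
    }
    for (auto& th : threads)
        th.join();
    // counter.load() is now exactly 10000: no increment is lost.
}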

Thread safety in std::map of std::shared_ptr

I know there are a lot of similar questions with answers around, but since I still don't understand this particular case, I decided to pose a question.
What I have is a map of shared_ptrs to a dynamically allocated array (MyVector). What I want is limited concurrent access without the need to lock. I know that the map per se is not thread safe, but I always thought what I'm doing here should be ok, which is:
I fill the map in a single threaded environment like that:
typedef shared_ptr<MyVector<float>> MyVectorPtr;

for (int i = 0; i < numElements; i++)
{
    content[i] = MyVectorPtr(new MyVector<float>(numRows));
}
After the initialization, I have one thread that reads from the elements and one that replaces what the shared_ptrs point to.
Thread 1:
for(auto i=content.begin();i!=content.end();i++)
{
    MyVectorPtr p(i->second);
    if (p)
    {
        memory_use+=sizeof(int) + sizeof(float) * p->number;
    }
}
Thread 2:
for (auto itr=content.begin();content.end()!=itr;++itr)
{
    itr->second.reset(new MyVector<float>(numRows));
}
After a while I get either a seg fault or a double free in one of the two threads. Not entirely surprising, perhaps, but I still don't really get why.
The reasons why I thought this would work, are:
I don't add or remove any items of the map in the multi-threaded
environment, so the iterators should always point to something valid.
I thought concurrently changing a single element of the map is fine as long as the operation is atomic.
I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic. SO Question
Obviously, either one or more of my assumptions are wrong, or I'm not doing what I think I am. I suspect that reset actually is not thread safe; would std::atomic_exchange help?
Can someone release me? Thanks a lot!
If someone wants to try out, here is the full code example:
#include <stdio.h>
#include <iostream>
#include <memory>
#include <string>
#include <map>
#include <unistd.h>
#include <pthread.h>

using namespace std;

template<class T>
class MyVector
{
public:
    MyVector(int length)
        : number(length)
        , array(new T[length])
    {
    }

    ~MyVector()
    {
        if (array != NULL)
        {
            delete[] array;
        }
        array = NULL;
    }

    int number;

private:
    T* array;
};

typedef shared_ptr<MyVector<float>> MyVectorPtr;

static map<int,MyVectorPtr> content;
const int numRows = 1000;
const int numElements = 10;

//pthread_mutex_t write_lock;

double get_cache_size_in_megabyte()
{
    double memory_use=0;
    //BlockingLockGuard guard(write_lock);
    for(auto i=content.begin();i!=content.end();i++)
    {
        MyVectorPtr p(i->second);
        if (p)
        {
            memory_use+=sizeof(int) + sizeof(float) * p->number;
        }
    }
    return memory_use/(1024.0*1024.0);
}

void* write_content(void*)
{
    while(true)
    {
        //BlockingLockGuard guard(write_lock);
        for (auto itr=content.begin();content.end()!=itr;++itr)
        {
            itr->second.reset(new MyVector<float>(numRows));
            cout << "one new written" << endl;
        }
    }
    return NULL;
}

void* loop_size_checker(void*)
{
    while (true)
    {
        cout << get_cache_size_in_megabyte() << endl;
    }
    return NULL;
}

int main(int argc, const char* argv[])
{
    for (int i = 0; i < numElements; i++)
    {
        content[i] = MyVectorPtr(new MyVector<float>(numRows));
    }

    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);

    pthread_t *grid_proc3 = new pthread_t;
    pthread_create(grid_proc3, &attr, &loop_size_checker, NULL);
    pthread_t *grid_proc = new pthread_t;
    pthread_create(grid_proc, &attr, &write_content, (void*)NULL);

    // to keep alive and avoid content being deleted
    sleep(10000);
}
I thought concurrently changing a single element of the map is fine as long as the operation is atomic.
Changing an element in a map is not atomic unless you use an atomic type like std::atomic.
I thought the operations I do on the shared_ptr (increment ref count, decrement ref count in Thread 1, reset in Thread 2) are atomic.
That is correct. Unfortunately you are also changing the underlying pointer. That pointer is not atomic. Since it is not atomic you need synchronization.
One thing you can do though is use the atomic free functions that are introduced with std::shared_ptr. This will let you avoid having to use a mutex.
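For example, here is a minimal sketch using those free functions on one slot, reusing the question's MyVector and numRows (note these overloads are deprecated in C++20 in favor of std::atomic<std::shared_ptr<T>>):

#include <memory>

std::shared_ptr<MyVector<float>> slot;

// Writer thread: atomically publish a replacement vector.
void replace_slot()
{
    std::atomic_store(&slot, std::shared_ptr<MyVector<float>>(new MyVector<float>(numRows)));
}

// Reader thread: atomically take a reference before touching the object.
void read_slot()
{
    std::shared_ptr<MyVector<float>> p = std::atomic_load(&slot);
    if (p)
    {
        // *p stays alive at least as long as p does.
    }
}

This only makes the pointer swap itself safe; the map structure must still not be modified concurrently (which, in your code, it isn't after initialization).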
Let's expand MyVectorPtr p(i->second);, which is running on thread 1.
The constructor called for this is:
template< class Y >
shared_ptr( const shared_ptr<Y>& r ) = default;
This probably boils down to two assignments: of the underlying pointer and of the reference count.
It may very well happen that thread 2 deletes the shared pointer while thread 1 is assigning it to p. The underlying pointer stored inside shared_ptr is not atomic.
Thus, your usage of std::shared_ptr is not thread safe. It is thread safe only as long as you do not update or modify the underlying pointer.
TL;DR:
Modifying a std::map isn't thread safe, while taking additional references via std::shared_ptr is.
You should protect read/write access to your map using an appropriate synchronization mechanism, e.g. a std::mutex.
Also, if the state of an instance referenced by a std::shared_ptr can change, it needs to be protected against data races when it's accessed from concurrent threads.
BTW, the MyVector you are showing is far too naive an implementation: it owns a raw array but has no copy constructor or copy assignment operator, so any copy would lead to a double delete.
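As a minimal sketch of the mutex approach (essentially what the commented-out BlockingLockGuard in your code was doing), assuming the globals from the question; the _locked function names are illustrative:

#include <mutex>

std::mutex content_mutex; // guards every access to 'content'

double get_cache_size_in_megabyte_locked()
{
    std::lock_guard<std::mutex> guard(content_mutex);
    double memory_use = 0;
    for (auto i = content.begin(); i != content.end(); ++i)
        if (i->second)
            memory_use += sizeof(int) + sizeof(float) * i->second->number;
    return memory_use / (1024.0 * 1024.0);
}

void write_content_locked()
{
    std::lock_guard<std::mutex> guard(content_mutex);
    for (auto itr = content.begin(); itr != content.end(); ++itr)
        itr->second.reset(new MyVector<float>(numRows));
}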

Safe multi-thread counter increment

For example, I've got some work that is computed simultaneously by multiple threads.
For demonstration purposes the work is performed inside a while loop. In a single iteration each thread performs its own portion of the work, before the next iteration begins a counter should be incremented once.
My problem is that the counter is updated by each thread.
As this seems like a relatively simple thing to want to do, I presume there is a 'best practice' or common way to go about it?
Here is some sample code to illustrate the issue and help the discussion along.
(I'm using Boost threads.)
class someTask {
public:
    int mCounter; //initialized to 0
    int mTotal;   //initialized to e.g. 100000
    boost::mutex cntmutex;

    int getCount()
    {
        boost::mutex::scoped_lock lock( cntmutex );
        return mCounter;
    }

    void process( int thread_id, int numThreads )
    {
        while ( getCount() < mTotal )
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            // Wait for all threads to get to this point

            cntmutex.lock();
            mCounter++; // < ---- how to ensure this is only updated once?
            cntmutex.unlock();
        }
    }
};
The main problem I see here is that you reason at a too-low level. Therefore, I am going to present an alternative solution based on the new C++11 thread API.
The main idea is that you essentially have a schedule -> dispatch -> do -> collect -> loop routine. In your example you try to reason about all this within the do phase which is quite hard. Your pattern can be much more easily expressed using the opposite approach.
First we isolate the work to be done in its own routine:
void process_thread(size_t id, size_t numThreads) {
    // do something
}
Now, we can easily invoke this routine:
#include <future>
#include <thread>
#include <vector>

void process(size_t const total, size_t const numThreads) {
    for (size_t count = 0; count != total; ++count) {
        std::vector< std::future<void> > results;

        // Create all threads, launch the work!
        for (size_t id = 0; id != numThreads; ++id) {
            results.push_back(std::async(process_thread, id, numThreads));
        }

        // The destruction of `std::future`
        // requires waiting for the task to complete (*)
    }
}
(*) See this question.
You can read more about std::async here, and a short introduction is offered here (they appear to be somewhat contradictory on the effect of the launch policy, oh well). It is simpler here to let the implementation decide whether or not to create OS threads: it can adapt depending on the number of available cores.
Note how the code is simplified by removing shared state. Because the threads share nothing, we no longer have to worry about synchronization explicitly!
You protected the counter with a mutex, ensuring that no two threads can access it at the same time. Your other options would be Boost.Atomic, C++11 atomic operations, or platform-specific atomic operations.
However, your code seems to access mCounter without holding the mutex:
while ( mCounter < mTotal )
That's a problem. You need to hold the mutex to access the shared state.
You may prefer to use this idiom:
Acquire lock.
Do tests and other things to decide whether we need to do work or not.
Adjust accounting to reflect the work we've decided to do.
Release lock. Do work. Acquire lock.
Adjust accounting to reflect the work we've done.
Loop back to step 2 unless we're totally done.
Release lock.
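A simplified sketch of that idiom (using std::mutex; Boost's mutex and scoped_lock are analogous), where the "accounting" is just claiming the next iteration:

#include <mutex>

class someTask {
    std::mutex cntmutex;
    int mCounter = 0;           // iterations claimed so far
    const int mTotal = 100000;

public:
    // Steps 1-4: acquire the lock, test, adjust accounting, release.
    bool claimWork()
    {
        std::lock_guard<std::mutex> lock(cntmutex);
        if (mCounter >= mTotal)
            return false;       // totally done
        mCounter++;             // account for the work we are about to do
        return true;
    }

    void process(int thread_id, int numThreads)
    {
        while (claimWork())
        {
            // Do the actual work outside the lock.
        }
    }
};

Here the counter is read and written only while the mutex is held, so no test of the shared state ever happens unlocked.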
You need to use a message-passing solution. This is more easily enabled by libraries like TBB or PPL. PPL is included for free in Visual Studio 2010 and above, and TBB can be downloaded for free under a FOSS licence from Intel.
concurrent_queue<unsigned int> done;
std::vector<Work> work;
// fill work here
parallel_for(0, work.size(), [&](unsigned int i) {
    processWorkItem(work[i]);
    done.push(i);
});
It's lockless and you can have an external thread monitor the done variable to see how much, and what, has been completed.
I would like to disagree with David on doing multiple lock acquisitions to do the work.
Mutexes are expensive: with more threads contending for one, acquisition falls back to a system call, which means a user-space to kernel-space context switch, with the calling thread(s) forced to sleep. That is a lot of overhead.
So if you are using a multiprocessor system, I would strongly recommend using spin locks instead [1].
So what I would do is:
=> Get rid of the scoped lock acquisition for checking the condition.
=> Make the counter volatile to support the above.
=> In the while loop, do the condition check again after acquiring the lock.
class someTask {
public:
    volatile int mCounter; //initialized to 0 : make your counter volatile
    int mTotal;            //initialized to e.g. 100000
    boost::mutex cntmutex;

    void process( int thread_id, int numThreads )
    {
        while ( mCounter < mTotal ) //compare without acquiring lock
        {
            // The main task is performed here and is divided
            // into sub-tasks based on the thread_id and numThreads

            cntmutex.lock();
            // Now compare again to make sure that the condition still holds.
            // This saves all those acquisitions and lock releases we would
            // otherwise do just to check whether the condition was true.
            if(mCounter < mTotal)
            {
                mCounter++;
            }
            cntmutex.unlock();
        }
    }
};
[1] http://www.alexonlinux.com/pthread-mutex-vs-pthread-spinlock

A thread-safe vector and string container?

I posted a previous question, "Seg Fault when using std::string on an embedded Linux platform", where I got some very useful advice. I have been away on other projects since then and have recently returned to looking at this issue.
To reiterate, I am restricted to using the arm-linux cross compiler (version 2.95.2) as this is what is supplied and supported by the embedded platform vendor. I understand that the issue is likely because the stdlib is very old, and not particularly thread safe.
The problem is that whenever I use the STL containers in multiple threads, I end up with a segmentation fault. The code below will consistently seg fault unless I use pthread_mutex_lock and scope operators around the container declarations (as in the other post).
It is not feasible to use this approach in my application as I pass the containers around to different methods and classes. I would ideally like to solve this problem, or find a suitable alternative. I have tried STLPort and SGI's Standard Template Library with the same results. I can only assume that because they are being linked by the very old gcc, they cannot solve the problem.
Does anyone have any possible recommendations or solutions? Or perhaps you can suggest an implementation of vector (and string) that I can drop into my code?
Thanks in advance for any guidance.
#include <stdio.h>
#include <pthread.h>
#include <vector>
#include <list>
#include <string>

using namespace std;

/////////////////////////////////////////////////////////////////////////////

class TestSeg
{
    static pthread_mutex_t _logLock;

public:
    TestSeg()
    {
    }

    ~TestSeg()
    {
    }

    static void* TestThread( void *arg )
    {
        int i = 0;
        while ( i++ < 10000 )
        {
            printf( "%d\n", i );
            WriteBad( "Function" );
        }
        pthread_exit( NULL );
    }

    static void WriteBad( const char* sFunction )
    {
        //pthread_mutex_lock( &_logLock );
        //{
            printf( "%s\n", sFunction );
            string sKiller;         // <---------------------------------- Bad
            //list<char> killer;    // <---------------------------------- Bad
            //vector<char> killer;  // <---------------------------------- Bad
        //}
        //pthread_mutex_unlock( &_logLock );
        return;
    }

    void RunTest()
    {
        int threads = 100;
        pthread_t _rx_thread[threads];

        for ( int i = 0 ; i < threads ; i++ )
        {
            pthread_create( &_rx_thread[i], NULL, TestThread, NULL );
        }

        for ( int i = 0 ; i < threads ; i++ )
        {
            pthread_join( _rx_thread[i], NULL );
        }
    }
};

pthread_mutex_t TestSeg::_logLock = PTHREAD_MUTEX_INITIALIZER;

int main( int argc, char *argv[] )
{
    TestSeg seg;
    seg.RunTest();
    pthread_exit( NULL );
}
The issue is not with the containers, it's with your code.
It is completely unnecessary to make the containers themselves threadsafe, because what you need, first and foremost, is transaction-like semantics.
Let's assume, for the sake of demonstration, that you have a threadsafe implementation of vector, for example.
Thread 1: if (!vec.empty())
Thread 2: vec.clear();
Thread 1: foo = vec.front();
This leads to undefined behavior.
The issue is that making each individual operation on the container threadsafe is pretty much pointless, because you are still required to lock the container across several operations in a row. So you would lock around the sequence of operations, and then lock again inside each and every operation?
As I said: completely unnecessary.
Part of your query might be answered in another thread. The design of C++, including the standard library, is influenced by many factors. Efficiency is a repeated theme. Thread safety mechanisms often are at odds with an objective of efficiency. The age of the library is not really the issue.
For your situation, you may be able to wrap the STL vector in your own vector class (you might consider a Decorator) that contains the locking mechanism and provides the lock/unlock logic around accesses.
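A minimal sketch of such a wrapper, using pthreads to match the 2.95-era toolchain (the class and member names are illustrative, only the operations you actually use need wrapping, and copying of the wrapper is not handled here):

#include <pthread.h>
#include <vector>
#include <cstddef>

template <typename T>
class LockedVector
{
    std::vector<T> vec;
    pthread_mutex_t mutex;

public:
    LockedVector()  { pthread_mutex_init( &mutex, NULL ); }
    ~LockedVector() { pthread_mutex_destroy( &mutex ); }

    void push_back( const T& value )
    {
        pthread_mutex_lock( &mutex );
        vec.push_back( value );
        pthread_mutex_unlock( &mutex );
    }

    T at( std::size_t index )
    {
        pthread_mutex_lock( &mutex );
        T value = vec.at( index ); // copy out while holding the lock
        pthread_mutex_unlock( &mutex );
        return value;
    }
};

Keep in mind the caveat from the previous answer: per-operation locking like this still does not give you transaction-like semantics across several calls, so for multi-step sequences you would also need to expose a lock/unlock (or scoped-lock) interface.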