What level of locking granularity is good in concurrent data structures? - c++

I am quite new to multi-threading. I have a single-threaded data analysis app with a good bit of potential for parallelization, and while the data sets are large, the app does not come close to saturating the hard-disk read/write bandwidth, so I figure I should take advantage of the threading support that is now in the standard and try to speed the beast up.
After some research I decided that producer-consumer was a good approach for reading data from disk and processing it, and I started writing an object pool that would become part of the circular buffer where the producers put data and the consumers get it. As I was writing the class, it felt like I was being too fine-grained in how I was handling the locking and releasing of data members. It feels like half the code is locking and unlocking, and like there are an insane number of synchronization objects floating around.
So I come to you with a class declaration, a sample function, and this question: is this too fine-grained? Not fine-grained enough? Poorly thought out?
struct PoolArray
{
public:
Obj* arr;
uint32 used;
uint32 refs;
std::mutex locker;
};
class SegmentedPool
{
public: /*Construction and destruction cut out*/
void alloc(uint32 cellsNeeded, PoolPtr& ptr);
void dealloc(PoolPtr& ptr);
void clearAll();
private:
void expand();
//stores all the segments of the pool
std::vector<PoolArray> pools;
ReadWriteLock poolLock;
//stores pools that are empty
std::queue< int > freePools;
std::mutex freeLock;
int currentPool;
ReadWriteLock currentLock;
};
void SegmentedPool::dealloc(PoolPtr& ptr)
{
//find and access the segment
poolLock.lockForRead();
PoolArray* temp = &(pools[ptr.getSeg()]);
poolLock.unlockForRead();
//reduce the count of references in the segment
temp->locker.lock();
--(temp->refs);
//if the number of references is now zero then set the segment back to unused
//and push it onto the queue of empty segments so that it can be reused
if(temp->refs==0)
{
temp->used=0;
freeLock.lock();
freePools.push(ptr.getSeg());
freeLock.unlock();
}
temp->locker.unlock();
ptr.set(NULL,-1);
}
A few explanations:
First, PoolPtr is a simple pointer-like object that stores the pointer and the number of the pool segment the pointer came from.
Second, this is all templatized, but I took those lines out to reduce the length of the code block.
Third, ReadWriteLock is something I put together using a mutex and a pair of condition variables.
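Roughly, it looks like this (a simplified sketch of that mutex-plus-two-condition-variables design; writer starvation is not addressed):
#include <mutex>
#include <condition_variable>
class ReadWriteLock
{
public:
    void lockForRead()
    {
        std::unique_lock<std::mutex> lk(m);
        readCond.wait(lk, [this]{ return !writing; }); // wait out any writer
        ++readers;
    }
    void unlockForRead()
    {
        std::unique_lock<std::mutex> lk(m);
        if (--readers == 0)
            writeCond.notify_one(); // last reader lets a writer in
    }
    void lockForWrite()
    {
        std::unique_lock<std::mutex> lk(m);
        writeCond.wait(lk, [this]{ return !writing && readers == 0; });
        writing = true;
    }
    void unlockForWrite()
    {
        std::unique_lock<std::mutex> lk(m);
        writing = false;
        writeCond.notify_one();
        readCond.notify_all();
    }
private:
    std::mutex m;
    std::condition_variable readCond, writeCond;
    int readers = 0;
    bool writing = false;
};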

Locks are inefficient no matter how fine-grained they are, so avoid them at all costs.
Both a queue and a vector can be implemented lock-free fairly easily using the compare-and-swap (CAS) primitive; there are a number of papers on the topic.
Lock free queue:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8674&rep=rep1&type=pdf
http://www.par.univie.ac.at/project/peppher/publications/Published/opodis10lfq.pdf
Lock free vector:
http://www.stroustrup.com/lock-free-vector.pdf
Stroustrup's paper also refers to a lock-free allocator, but don't jump at it right away; standard allocators are pretty good these days.
UPD
If you don't want to bother writing your own containers, use Intel's Threading Building Blocks (TBB) library; it provides both a thread-safe vector and a thread-safe queue. They are NOT lock-free, but they are optimized to use the CPU cache efficiently.
UPD
Regarding PoolArray: you don't need a lock there either. If you can use C++11, use std::atomic for atomic increments and swaps; otherwise use compiler built-ins (the Interlocked* functions in MSVC, or the __sync_* built-ins in GCC: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html)
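For example, a sketch of that for the refs counter (C++11 assumed; fetch_sub returns the previous value, so exactly one thread observes the count dropping to zero):
#include <atomic>
#include <cstdint>
struct PoolArray
{
    Obj* arr;                        // Obj as in the question
    std::atomic<std::uint32_t> used;
    std::atomic<std::uint32_t> refs; // replaces the per-segment mutex
};
// In dealloc(), the locked decrement then becomes:
//   if (temp->refs.fetch_sub(1) == 1)  // we released the last reference
//   {
//       temp->used.store(0);
//       freeLock.lock();               // the free-pool queue still needs its own lock
//       freePools.push(ptr.getSeg());
//       freeLock.unlock();
//   }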

A good start - you lock things when needed and free them as soon as you're finished.
Your ReadWriteLock is pretty much a CCriticalSection object - depending on your needs it might improve performance to use that instead.
One thing I would say is: call temp->locker.lock() before you release the pool lock with poolLock.unlockForRead(); otherwise you're performing operations on the pool object when it's not under synchronisation control, and it could be in use by another thread at that point. A minor point, but with multi-threading it's the minor points that trip you up in the end.
A good approach to multi-threading is to wrap any controlled resources in objects or functions that do the locking and unlocking inside them, so anyone who wants access to the data doesn't have to worry about which lock to lock or unlock, or when to do it. For example:
...
if(temp->refs==0)
{
temp->used=0;
freeLock.lock();
freePools.push(ptr.getSeg());
freeLock.unlock();
}
...
would be...
...
if(temp->refs==0)
{
temp->used=0;
addFreePool(ptr.getSeg());
}
...
void SegmentedPool::addFreePool(unsigned int seg)
{
freeLock.lock();
freePools.push(seg);
freeLock.unlock();
}
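If C++11's std::lock_guard (or boost::lock_guard) is available, the same wrapper can use an RAII guard, which also releases the mutex if push ever throws; a minimal variant:
void SegmentedPool::addFreePool(unsigned int seg)
{
    std::lock_guard<std::mutex> guard(freeLock); // unlocked automatically on scope exit
    freePools.push(seg);
}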
There are plenty of multi-threading benchmarking tools out there too. You can play around with controlling your resources in different ways, run it through one of the tools, and see where any bottlenecks are if you feel like performance is becoming an issue.

Related

Concurrency Problem: how to have only one thread go through critical section

I want a multithreaded function that allocates some memory for an object and returns the allocated memory. My current single-threaded and multi-threaded versions are below.
The multi-threaded version has no race conditions but runs slowly when a lot of threads are trying to get the lock: even after the allocation and pointer update, each waiting thread still has to acquire and release the same lock in turn. That causes a multi-threading performance drop. I wonder if there are other ways to improve performance.
struct multi_level_tree {
    multi_level_tree* ptr; // pointer to the next level's array of 256 nodes
    std::mutex mtx;
};
multi_level_tree tree[256]; // a global array that every thread needs to access and update
/* Single Threaded */
multi_level_tree* get_ptr(multi_level_tree* cur, int idx) {
    if (!cur[idx].ptr)
        // new[] rather than malloc, so the mutexes are constructed and the child ptrs start null
        cur[idx].ptr = new multi_level_tree[256]();
    return cur[idx].ptr;
}
/* Multi Threaded with mutex */
multi_level_tree* get_ptr(multi_level_tree* cur, int idx) {
    if (!cur[idx].ptr) {
        cur[idx].mtx.lock(); // other threads wait here, and go one by one
        /* Critical Section Start */
        if (!cur[idx].ptr)
            cur[idx].ptr = new multi_level_tree[256](); // the allocation takes a while
        /* Critical Section End */
        cur[idx].mtx.unlock();
    }
    return cur[idx].ptr;
}
The code I am looking for should have the following properties.
When the first thread has allocated the memory, it should alert all other threads waiting for it.
All other threads should be unblocked at the same time.
No race condition.
The challenges in the problem
* The tree is sparse with multiple levels; initializing all of it up front is impossible given the memory we have
* Similar to the double-checked locking problem, but I was trying to avoid std::atomic
The point of this code is to implement a multi-level array as a global variable. Except at the lowest level, each array is a list of pointers to the next-level arrays. Since this data structure needs to grow dynamically, I ran into this problem.
how to have only one thread go through critical section
You could use a mutex. There's an example in your question.
It is not the most optimal solution for synchronised initialisation, though. A simple improvement is to use a local static, in which case the compiler is responsible for implementing the synchronisation:
T& get_T() {
static T instance;
return instance;
}
but runs slow when a lot of threads are trying to get the lock
This problem is inherent with serialising the access to the same data structure. A way to improve performance is to avoid doing that in the first place.
In this particular example, it appears that you could simply initialise the resource while the process is still single threaded, and start the parallel threads only after the initialisation is complete. That way no locking is required to access the pointer.
If that is not an option, another approach is to simply call get_ptr once in each thread, and store a copy locally. That way the locking overhead remains minimal.
Even better would be to have separate data structures in each thread. This is useful when threads only produce data, and don't need to access results from other threads.
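For instance, a sketch of that per-thread caching (the worker function and the index 42 are made up for illustration):
void worker_thread()
{
    // Resolve the subtree this thread works on once, paying the locking
    // cost a single time, then reuse the cached pointer.
    multi_level_tree* my_subtree = get_ptr(tree, 42); // may lock, at most once
    for (int i = 0; i < 1000000; ++i)
    {
        // ... operate on my_subtree without touching the mutex again ...
    }
}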
Regarding edited example: You might benefit from a lock free tree implementation. It may be difficult to implement however.
Since you cannot easily fix this (it is inherent to concurrency), I have an idea that may improve or decrease performance rather substantially, though.
If this resource really is used that often and contention is detrimental, you could try an Active Object (https://en.wikipedia.org/wiki/Active_object) together with a Boost lock-free queue (https://www.boost.org/doc/libs/1_66_0/doc/html/lockfree/reference.html#header.boost.lockfree.queue_hpp). Use atomic store/load on future objects, and you make this process completely lockless; on the other hand, it requires a dedicated thread to maintain. The performance of such a solution depends heavily on how often this resource is used.
Following the comment from @WilliamClements, I see this is a double-checked locking problem itself. The original multi-threaded code in my question may be broken. To program it correctly, I switched to atomic pointers to prevent ordering problems with load/store instructions.
However, that example still uses a lock that I want to get rid of. Therefore, I chose std::atomic::compare_exchange_strong to update the pointer only when its value is still nullptr (the weak variant can fail spuriously even when the pointer is still null, which would wrongly discard the allocation). This way, exactly one thread successfully updates the pointer, and any other thread releases the memory it allocated when its compare_exchange fails.
This code is doing very well for me so far.
struct multi_level_tree {
    std::atomic<multi_level_tree*> ptr{nullptr};
};
multi_level_tree tree[256];
multi_level_tree* get_ptr(multi_level_tree* cur, int idx) {
    if (!cur[idx].ptr.load()) {
        multi_level_tree* tmp = new multi_level_tree[256](); // speculative allocation
        multi_level_tree* expected = nullptr;
        if (cur[idx].ptr.compare_exchange_strong(expected, tmp)) {
            /* successfully published our allocation, nothing to do */
        }
        else {
            /* already updated by another thread, release our copy */
            delete[] tmp;
        }
    }
    return cur[idx].ptr.load();
}

Lockfree Reloading and Sharing of const objects

My application's configuration is a const object that is shared with multiple threads. The configuration is stored in a centralized location and any thread may reach it. I've tried building a lockfree implementation that would allow me to load new configuration while still allowing other threads to read the last known configuration.
My current implementation has a race between updating the shared_ptr and reading from it.
template<typename T>
class ConfigurationHolder
{
public:
typedef std::shared_ptr<T> SPtr;
typedef std::shared_ptr<const T> CSPtr;
ConfigurationHolder() : m_active(new T()) {}
CSPtr get() const { return m_active; } // RACE - read
template<typename Reloader>
bool reload(Reloader reloader)
{
SPtr tmp(new T());
if (!tmp)
return false;
if (!reloader(tmp))
return false;
m_active=tmp; // RACE - write
return true;
}
private:
CSPtr m_active;
};
I can add a shared_mutex for the problematic read/write access to the shared_ptr, but I'm looking for a solution that will keep the implementation lockfree.
EDIT: My version of GCC does not support atomic_exchange on shared_ptr
EDIT2: Requirements clarification: I have multiple readers and may have multiple reloaders (although that is less common). Readers need to hold a configuration object, and it must not change while they are reading it. Old configuration objects must be freed when the last reader is done with them.
You should just update your compiler to get atomic shared pointer ops.
Failing that, wrap it in a shared_timed_mutex. Then test how much it costs you.
Both of these are going to be less work than correctly writing your own lock-free shared pointer system.
If you have to:
This is a hack, but it might work. It is a read-copy-update style on the pointer itself.
Have a std::vector<std::unique_ptr<std::shared_ptr<T>>>. Have a std::atomic<std::shared_ptr<T> const*> "current" pointer, and a std::atomic<std::size_t> active_readers.
The vector stores your still living shared_ptrs. When you want to change, push a new one on the back. Keep a copy of this shared_ptr.
Now swap the "current" pointer for the new one. Busy-wait until active_readers hits zero, or until you get bored.
If active_readers hit zero, filter your vector for shared_ptrs with a use-count of 1. Remove them from the vector.
Regardless, now drop the extra shared_ptr you kept to the state you created, and you're done writing.
If you need more than one writer, lock this process using a separate mutex.
On the reader side, increment active_readers. Now atomically load the "current" pointer, make a local copy of the pointed-to shared_ptr, then decrement active_readers.
However, I just wrote this. So it probably contains bugs. Proving concurrent design correct is hard.
By far the easiest way to make this reliable is to upgrade your compiler and get atomic operations on a shared_ptr.
This is probably overly complex. I think it could be set up so that each T is cleaned up when the last reader goes away, but I aimed for correctness rather than efficiency in recycling T.
With minimal synchronisation on the readers, you could use a condition variable to signal that a reader is done with a given T, but that involves a bit of contention with the writer thread.
Practically, lock-free algorithms are often slower than lock-based ones, because the overhead of a mutex isn't as high as you might fear.
A shared_timed_mutex wrapping a shared_ptr, where the writer is simply overwriting the variable, is going to be pretty darn fast. Existing readers are going to keep their old shared_ptr just fine.
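A minimal sketch of that wrapper, shaped like the question's interface (my shaping, not tested against the asker's GCC; std::shared_timed_mutex and std::shared_lock are C++14):
#include <memory>
#include <shared_mutex>
template<typename T>
class ConfigurationHolder
{
public:
    typedef std::shared_ptr<const T> CSPtr;
    ConfigurationHolder() : m_active(new T()) {}
    CSPtr get() const
    {
        std::shared_lock<std::shared_timed_mutex> lock(m_mutex); // many concurrent readers
        return m_active; // each reader's copy keeps the old config alive until dropped
    }
    template<typename Reloader>
    bool reload(Reloader reloader)
    {
        std::shared_ptr<T> tmp(new T());
        if (!reloader(tmp))
            return false;
        std::unique_lock<std::shared_timed_mutex> lock(m_mutex); // exclusive for the swap
        m_active = tmp;
        return true;
    }
private:
    mutable std::shared_timed_mutex m_mutex;
    CSPtr m_active;
};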

Implementing thread-safe arrays

I want to implement an array-like data structure that allows multiple threads to modify/insert items simultaneously. How can I achieve that with good performance? I implemented a wrapper class around std::vector and used critical sections for synchronizing threads. Please have a look at my code below. Each time a thread wants to work on the internal data, it may have to wait for other threads. Hence, I think its performance is NOT good. :( Any ideas?
class parallelArray{
private:
std::vector<int> data;
zLock dataLock; // my predefined class for synchronizing
public:
void insert(int val){
dataLock.lock();
data.push_back(val);
dataLock.unlock();
}
void modify(unsigned int index, int newVal){
dataLock.lock();
data[index]=newVal; // assuming that the index is valid
dataLock.unlock();
}
};
Take a look at shared_mutex in the Boost library. This allows you to have multiple readers, but only one writer
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types.shared_mutex
The best way is to use a fast reader-writer lock. You perform shared locking for read-only access and exclusive locking for write access; that way read-only accesses can run simultaneously.
In the user-mode Win32 API, Slim Reader/Writer (SRW) Locks are available on Vista and later.
Before Vista you have to implement the reader-writer functionality yourself, which is a pretty simple task. You can do it with one critical section, one event, and one enum/int value, though a good implementation requires more effort; I would use a hand-crafted linked list of local (stack-allocated) structures to implement a fair waiting queue.
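A sketch of the reader-writer approach applied to the wrapper from the question, using boost::shared_mutex (the read accessor is added here for illustration):
#include <vector>
#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>
class parallelArray
{
private:
    std::vector<int> data;
    mutable boost::shared_mutex dataLock;
public:
    void insert(int val)
    {
        boost::unique_lock<boost::shared_mutex> lock(dataLock); // exclusive
        data.push_back(val);
    }
    void modify(unsigned int index, int newVal)
    {
        boost::unique_lock<boost::shared_mutex> lock(dataLock); // exclusive
        data[index] = newVal; // assuming the index is valid
    }
    int read(unsigned int index) const
    {
        boost::shared_lock<boost::shared_mutex> lock(dataLock); // shared: readers run concurrently
        return data[index];
    }
};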

Lockless reader/writer

I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically
struct SValues
{
double a;
double b;
double c;
double d;
};
class Test
{
public:
Test()
{
m_pValues = &m_values;
}
SValues* LockAndGet()
{
// Spin forever until we get ownership of the pointer
while (true)
{
SValues* pValues = (SValues*)::InterlockedExchange((volatile long*)&m_pValues, 0xffffffff);
if (pValues != (SValues*)0xffffffff)
{
return pValues;
}
}
}
void Unlock(SValues* pValues)
{
// Return the pointer so other threads can lock it
::InterlockedExchange((volatile long*)&m_pValues, (long)pValues);
}
private:
SValues* m_pValues;
SValues m_values;
};
void TestFunc()
{
Test test;
SValues* pValues = test.LockAndGet();
// Update or read values
test.Unlock(pValues);
}
The data is protected by stealing the pointer for every read and write, which should make it thread-safe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes, and I cannot tell in advance whether there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless that inflicts no penalty on writing.
I was thinking of having reads acquire the pointer without an interlocked instruction (along with a sequence number), copy the data, and then have a way of telling whether the sequence number had changed, in which case the read should be retried. This would require some memory barriers, though, and I don't know whether it would improve speed.
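That scheme is essentially a "seqlock". A sketch of the idea, assuming a single writer (multiple writers would also need a writer mutex) and relaxed atomics for the fields so the concurrent accesses are well-defined:
#include <atomic>
struct Snapshot { double a, b, c, d; };
class SeqLockValues
{
public:
    void Write(const Snapshot& v) // single writer assumed
    {
        unsigned s = m_seq.load(std::memory_order_relaxed);
        m_seq.store(s + 1, std::memory_order_relaxed);       // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release); // counter visible before data
        m_a.store(v.a, std::memory_order_relaxed);
        m_b.store(v.b, std::memory_order_relaxed);
        m_c.store(v.c, std::memory_order_relaxed);
        m_d.store(v.d, std::memory_order_relaxed);
        m_seq.store(s + 2, std::memory_order_release);       // even: write complete
    }
    Snapshot Read() const
    {
        for (;;)
        {
            unsigned s1 = m_seq.load(std::memory_order_acquire);
            if (s1 & 1u)
                continue;                                    // write in progress, retry
            Snapshot copy = { m_a.load(std::memory_order_relaxed),
                              m_b.load(std::memory_order_relaxed),
                              m_c.load(std::memory_order_relaxed),
                              m_d.load(std::memory_order_relaxed) };
            std::atomic_thread_fence(std::memory_order_acquire); // data read before recheck
            if (m_seq.load(std::memory_order_relaxed) == s1)
                return copy;                                 // no write overlapped the copy
        }
    }
private:
    std::atomic<unsigned> m_seq{0};
    std::atomic<double> m_a{0}, m_b{0}, m_c{0}, m_d{0};
};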
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
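A sketch of that copy-on-write scheme, with shared_ptr handling the reclamation concern (an old instance dies when the last reader drops its reference; the C++11 std::atomic_load/atomic_store overloads for shared_ptr are assumed available, and the mutex only serialises writers, so readers never block):
#include <memory>
#include <mutex>
struct SValues { double a, b, c, d; };
class CowValues
{
public:
    CowValues() : m_current(std::make_shared<SValues>()) {}
    std::shared_ptr<const SValues> Read() const
    {
        return std::atomic_load(&m_current);             // atomic snapshot of the pointer
    }
    void Write(const SValues& v)
    {
        std::lock_guard<std::mutex> guard(m_writeMutex); // one writer at a time
        std::shared_ptr<SValues> fresh = std::make_shared<SValues>(v);
        std::atomic_store(&m_current, fresh);            // readers switch to the new copy
    }
private:
    std::shared_ptr<SValues> m_current;
    std::mutex m_writeMutex;
};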
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system is.
Remember that, in C++, the compiler can reorder operations around your synchronization points unless you constrain it with proper atomic operations or barriers.
Generally I would solve the issue like this:
Multiple-producer-single-consumer: have one single-producer-single-consumer queue per writing thread. Each thread writes into its own queue, and a single consumer thread gathers the produced data and stores it in single-consumer-multiple-reader data storage. The implementation is a lot of work and is only recommended if you are writing a time-critical application and have the time to put into this solution.
There is more to read up on about this, since the implementation is platform specific:
Atomic operations on Windows/Xbox 360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2

what is the best way to synchronize container access between multiple threads in real-time application

I have std::list<Info> infoList in my application that is shared between two threads. These 2 threads are accessing this list as follows:
Thread 1: uses push_back(), pop_front() or clear() on the list (Depending on the situation)
Thread 2: uses an iterator to iterate through the items in the list and do some actions.
Thread 2 is iterating the list like the following:
for(std::list<Info>::iterator i = infoList.begin(); i != infoList.end(); ++i)
{
DoAction(i);
}
The code is compiled using GCC 4.4.2.
Sometimes ++i causes a segfault and crashes the application. The error is caused in std_list.h line 143 at the following line:
_M_node = _M_node->_M_next;
I guess this is a race condition: the list might have been changed, or even cleared, by Thread 1 while Thread 2 was iterating over it.
I used a mutex to synchronize access to this list, and all went OK during my initial test. But the system just freezes under stress testing, making that solution totally unacceptable. This is a real-time application, and I need to find a solution so both threads can run as fast as possible without hurting total application throughput.
My question is this:
Thread 1 and Thread 2 need to execute as fast as possible since this is a real-time application. What can I do to prevent this problem and still maintain the application's performance? Are there any lock-free algorithms available for such a problem?
It's OK if I miss some newly added Info objects in Thread 2's iteration, but what can I do to prevent the iterator from becoming a dangling pointer?
Thanks
Your for() loop can potentially keep a lock for a relatively long time, depending on how many elements it iterates. You can get in real trouble if it "polls" the queue, constantly checking if a new element became available. That makes the thread own the mutex for an unreasonably long time, giving few opportunities to the producer thread to break in and add an element. And burning lots of unnecessary CPU cycles in the process.
You need a "bounded blocking queue". Don't write it yourself; the lock design is not trivial. Good examples are hard to find, and most of what is out there is .NET code. This article looks promising. (A rough sketch of the shape of such a queue follows.)
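For illustration only (the advice above to not write it yourself stands), the shape of such a queue with C++11 primitives:
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
// Producers block when the queue is full; consumers block when it is
// empty. The mutex is held only briefly per operation.
template<typename T>
class BoundedBlockingQueue
{
public:
    explicit BoundedBlockingQueue(std::size_t capacity) : m_capacity(capacity) {}
    void push(const T& item)
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this]{ return m_queue.size() < m_capacity; });
        m_queue.push(item);
        m_notEmpty.notify_one();
    }
    T pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this]{ return !m_queue.empty(); });
        T item = m_queue.front();
        m_queue.pop();
        m_notFull.notify_one();
        return item;
    }
private:
    std::queue<T> m_queue;
    std::size_t m_capacity;
    std::mutex m_mutex;
    std::condition_variable m_notFull, m_notEmpty;
};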
In general it is not safe to use STL containers this way. You will have to implement a specific method to make your code thread-safe; the solution you choose depends on your needs. I would probably solve this by maintaining two lists, one in each thread, and communicating the changes through a lock-free queue (mentioned in the comments to this question). You could also limit the lifetime of your Info objects by wrapping them in boost::shared_ptr, e.g.:
typedef boost::shared_ptr<Info> InfoReference;
typedef std::list<InfoReference> InfoList;
enum CommandValue
{
    Insert,
    Delete
};
struct Command
{
    CommandValue operation;
    InfoReference reference;
};
typedef LockFreeQueue<Command> CommandQueue;
class Thread1
{
public:
    Thread1(CommandQueue& queue) : m_commands(queue) {}
    void run()
    {
        while (!finished) // 'finished' is a termination flag defined elsewhere
        {
            // Process items and use
            // deleteInfo() or addInfo()
        }
    }
    void deleteInfo(InfoReference reference)
    {
        Command command;
        command.operation = Delete;
        command.reference = reference;
        m_commands.produce(command);
    }
    void addInfo(InfoReference reference)
    {
        Command command;
        command.operation = Insert;
        command.reference = reference;
        m_commands.produce(command);
    }
private:
    CommandQueue& m_commands;
    InfoList m_infoList;
};
class Thread2
{
public:
    Thread2(CommandQueue& queue) : m_commands(queue) {}
    void run()
    {
        while (!finished)
        {
            processQueue();
            processList();
        }
    }
private:
    void processQueue()
    {
        Command command;
        while (m_commands.consume(command))
        {
            switch (command.operation)
            {
            case Insert:
                m_infoList.push_back(command.reference);
                break;
            case Delete:
                m_infoList.remove(command.reference);
                break;
            }
        }
    }
    void processList()
    {
        // Iterate over m_infoList
    }
    CommandQueue& m_commands;
    InfoList m_infoList;
};
int main()
{
    CommandQueue commands;
    Thread1 thread1(commands);
    Thread2 thread2(commands);
    thread1.start();
    thread2.start();
    waitforTermination();
}
This has not been compiled. You still need to make sure that access to your Info objects is thread safe.
I would like to know what the purpose of this list is; it would be easier to answer the question then.
As Hoare said, it is generally a bad idea to try to share data to communicate between two threads; rather, you should communicate to share data: i.e., messaging.
If this list is modelling a queue, for example, you might simply use one of the various ways to communicate (such as sockets) between the two threads. Consumer / Producer is a standard and well-known problem.
If your items are expensive, then only pass the pointers around during communication, you'll avoid copying the items themselves.
In general, it's exquisitely difficult to share data, although it is unfortunately the only way of programming we hear of in school. Normally only low-level implementations of communication "channels" should ever worry about synchronization; you should learn to use the channels to communicate instead of trying to emulate them.
To prevent your iterator from being invalidated, you have to lock the whole for loop. But then the first thread may have difficulty updating the list, so I would give it a chance to do its job on each (or every Nth) iteration.
In pseudo-code that would look like:
mutex_lock();
for(...){
doAction();
mutex_unlock();
thread_yield(); // give first thread a chance
mutex_lock();
if(iterator_invalidated_flag) // set by first thread
reset_iterator();
}
mutex_unlock();
You have to decide which thread is the more important. If it is the update thread, then it must signal the iterator thread to stop, wait and start again. If it is the iterator thread, it can simply lock the list until iteration is done.
The best way to do this is to use a container that is internally synchronized. TBB and Microsoft's concurrent_queue do this. Anthony Williams also has a good implementation on his blog here
Others have already suggested lock-free alternatives, so I'll answer as if you were stuck using locks...
When you modify a list, existing iterators can become invalidated because they no longer point to valid memory (for std::list this happens only to iterators pointing at erased elements, but vector-like containers invalidate all iterators when they reallocate to grow). To avoid working with invalidated iterators, you could make the producer block on a mutex while your consumer traverses the list, but that would be needless waiting for the producer.
Life would be easier if you used a queue instead of a list, and have your consumer use a synchronized queue<Info>::pop_front() call instead of iterators that can be invalidated behind your back. If your consumer really needs to gobble chunks of Info at a time, then use a condition variable that'll make your consumer block until queue.size() >= minimum.
The Boost library has a nice portable implementation of condition variables (that even works with older versions of Windows), as well as the usual threading library stuff.
For a producer-consumer queue that uses (old-fashioned) locking, check out the BlockingQueue template class of the ZThreads library. I have not used ZThreads myself, being worried about lack of recent updates, and because it didn't seem to be widely used. However, I have used it as inspiration for rolling my own thread-safe producer-consumer queue (before I learned about lock-free queues and TBB).
A lock-free queue/stack library seems to be in the Boost review queue. Let's hope we see a new Boost.Lockfree in the near future! :)
If there's interest, I can write up an example of a blocking queue that uses std::queue and Boost thread locking.
EDIT:
The blog referenced by Rick's answer already has a blocking queue example that uses std::queue and Boost condvars. If your consumer needs to gobble chunks, you can extend the example as follows:
void wait_for_data(size_t how_many)
{
boost::mutex::scoped_lock lock(the_mutex);
while(the_queue.size() < how_many)
{
the_condition_variable.wait(lock);
}
}
You may also want to tweak it to allow time-outs and cancellations.
You mentioned that speed was a concern. If your Infos are heavyweight, you should consider passing them around by shared_ptr. You can also try making your Infos fixed size and use a memory pool (which can be much faster than the heap).
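As a sketch of the memory-pool idea (FixedPool here is hypothetical and illustrative only; it is not thread-safe by itself, so it would sit behind whatever lock already guards the queue):
#include <cstddef>
template<typename T, std::size_t N>
class FixedPool
{
    // A free list threaded through preallocated slots: allocate/release
    // are pointer swaps instead of heap calls.
    union Slot { Slot* next; alignas(T) unsigned char storage[sizeof(T)]; };
    Slot m_slots[N];
    Slot* m_free;
public:
    FixedPool() : m_free(m_slots)
    {
        for (std::size_t i = 0; i + 1 < N; ++i)
            m_slots[i].next = &m_slots[i + 1];
        m_slots[N - 1].next = 0;
    }
    void* allocate() // returns 0 when the pool is exhausted
    {
        if (!m_free) return 0;
        Slot* s = m_free;
        m_free = s->next;
        return s->storage;
    }
    void release(void* p)
    {
        Slot* s = static_cast<Slot*>(p);
        s->next = m_free;
        m_free = s;
    }
};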
As you mentioned that you don't care if your iterating consumer misses some newly-added entries, you could use a copy-on-write list underneath. That allows the iterating consumer to operate on a consistent snapshot of the list as of when it first started, while, in other threads, updates to the list yield fresh but consistent copies, without perturbing any of the extant snapshots.
The trade-off here is that each update to the list requires locking for exclusive access long enough to copy the entire list. This technique is biased toward having many concurrent readers and less frequent updates.
Trying to add intrinsic locking to the container first requires you to think about which operations need to behave in atomic groups. For instance, checking if the list is empty before trying to pop off the first element requires an atomic pop-if-not-empty operation; otherwise, the answer to the list being empty can change in between when the caller receives the answer and attempts to act upon it.
It's not clear in your example above what guarantees the iteration must obey. Must every element in the list eventually be visited by the iterating thread? Does it make multiple passes? What does it mean for one thread to remove an element from the list while another thread is running DoAction() against it? I suspect that working through these questions will lead to significant design changes.
You're working in C++, and you mentioned needing a queue with a pop-if-not-empty operation. I wrote a two-lock queue many years ago using the ACE Library's concurrency primitives, as the Boost thread library was not yet ready for production use, and the chance for the C++ Standard Library including such facilities was a distant dream. Porting it to something more modern would be easy.
This queue of mine -- called concurrent::two_lock_queue -- allows access to the head of the queue only via RAII. This ensures that acquiring the lock to read the head will always be mated with a release of the lock. A consumer constructs a const_front (const access to head element), a front (non-const access to head element), or a renewable_front (non-const access to head and successor elements) object to represent the exclusive right to access the head element of the queue. Such "front" objects can't be copied.
Class two_lock_queue also offers a pop_front() function that waits until at least one element is available to be removed, but, in keeping with std::queue's and std::stack's style of not mixing container mutation and value copying, pop_front() returns void.
In a companion file, there's a type called concurrent::unconditional_pop, which allows one to ensure through RAII that the head element of the queue will be popped upon exit from the current scope.
The companion file error.hh defines the exceptions that arise from use of the function two_lock_queue::interrupt(), used to unblock threads waiting for access to the head of the queue.
Take a look at the code and let me know if you need more explanation as to how to use it.
If you're using C++0x, you could internally synchronize list iteration this way, assuming the class has a templated container named objects_ and a boost::mutex named mutex_. The toAll method is a member of the list wrapper:
void toAll(std::function<void (T*)> lambda)
{
    // The guard must be a named object; an unnamed temporary would be
    // destroyed immediately, leaving the loop unsynchronized.
    boost::mutex::scoped_lock lock(this->mutex_);
    for (auto it = this->objects_.begin(); it != this->objects_.end(); ++it)
    {
        T* object = *it;
        if (object != nullptr)
        {
            lambda(object);
        }
    }
}
Calling:
synchronizedList1->toAll(
[&](T* object)->void // Or the class that your list holds
{
for(auto it = this->knownEntities->begin(); it != this->knownEntities->end(); it++)
{
// Do something
}
}
);
You must be using some threading library. If you are using Intel TBB, you can use concurrent_vector or concurrent_queue. See this.
If you want to continue using std::list in a multi-threaded environment, I would recommend wrapping it in a class with a mutex that provides locked access. Depending on the exact usage, it might make sense to switch to an event-driven queue model where messages are passed on a queue that multiple worker threads are consuming (hint: producer-consumer).
I would seriously take Matthieu's thought into consideration. Many problems that are being solved using multi-threaded programming are better solved using message-passing between threads or processes. If you need high throughput and do not absolutely require that the processing share the same memory space, consider using something like the Message-Passing Interface (MPI) instead of rolling your own multi-threaded solution. There are a bunch of C++ implementations available - OpenMPI, Boost.MPI, Microsoft MPI, etc. etc.
I don't think you can get away without any synchronisation at all in this case, as certain operations will invalidate the iterators you are using. With a list this is fairly limited (basically, only if both threads are manipulating iterators to the same element at the same time), but there is still a danger that you'll be removing an element at the same time you're trying to append one to it.
Are you by any chance holding the lock across DoAction(i)? You obviously only want to hold the lock for the absolute minimum of time that you can get away with in order to maximise the performance. From the code above I think you'll want to decompose the loop somewhat in order to speed up both sides of the operation.
Something along the lines of:
while (processItems) {
    Info item;
    bool haveItem = false;
    lock(mutex);
    if (!infoList.empty()) {
        item = infoList.front();
        infoList.pop_front();
        haveItem = true;
    }
    unlock(mutex);
    if (haveItem)
        DoAction(item);
    delayALittle();
}
And the insert function would still have to look like this:
lock(mutex);
infoList.push_back(item);
unlock(mutex);
Unless the queue is likely to be massive, I'd be tempted to use something like a std::vector<Info>, or even a std::vector<boost::shared_ptr<Info> >, to minimize the copying of the Info objects (assuming these are somewhat more expensive to copy than a boost::shared_ptr). Generally, operations on a vector tend to be a little faster than on a list, especially if the objects stored in the vector are small and cheap to copy.