In one of my projects, I identified a call to std::deque<T>::clear() as a major bottleneck.
I therefore decided to move this operation to a dedicated, low-priority thread:
template <class T>
void SomeClass::parallelClear(T& c)
{
    if (!c.empty())
    {
        T* temp = new T;
        c.swap(*temp); // swap contents (fast)
        // deallocate on a separate thread
        boost::thread deleteThread([=] () { delete temp; });
        // Windows-specific: lower the deleter thread's priority
        SetThreadPriority(deleteThread.native_handle(), THREAD_PRIORITY_BELOW_NORMAL);
    }
}
void SomeClass::clear(std::deque<ComplexObject>& hugeDeque)
{
    parallelClear(hugeDeque);
}
This seems to work fine (Visual C++ 2010), but I wonder if I have overlooked any major flaw. I would welcome your comments on the above code.
Additional information:
SomeClass::clear() is called from a GUI thread, and the user interface is unresponsive until the call returns. hugeDeque, on the other hand, is unlikely to be accessed by that thread for several seconds after clearing.
This is only valid if you guarantee that access to the heap is serialized. Windows serializes access to the primary heap by default, but it's possible to turn this behaviour off, and there's no guarantee that it holds across platforms or libraries. As such, I'd be careful about depending on it: make sure it's explicitly noted that the code depends on the heap being shared between threads and on the heap being thread-safe to access.
Personally, I would suggest that using a custom allocator to match the allocation/deallocation pattern would be the best performance improvement here; remember that creating a thread has non-trivial overhead.
Edit: If you are using GUI/Worker thread style threading design, then really, you should create, manage and destroy the deque on the worker thread.
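Going back to the allocator suggestion, here is a minimal sketch of the idea using Boost.Pool (an assumption on my part, not the poster's code; boost::pool_allocator is the array-oriented variant, which suits std::deque's chunked allocations, and its default singleton pool is mutex-protected, so it satisfies the shared-heap caveat above):
#include <deque>
#include <boost/pool/pool_alloc.hpp>
struct ComplexObject { /* ... */ };
// clear() now hands memory back to the pool rather than to the
// general-purpose heap; whether this actually wins depends on your
// allocation pattern, so measure before committing.
typedef std::deque<ComplexObject,
                   boost::pool_allocator<ComplexObject> > PooledDeque;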
Be aware that it is not certain this will improve the overall performance of your application. The Windows standard heap (the low-fragmentation heap included) is not designed for frequently handing allocations from one thread to another; it will work, but it may incur quite a lot of processing overhead.
The documentation of the Hoard memory allocator might be a starting point for digging deeper into this:
http://www.cs.umass.edu/~emery/hoard/hoard-documentation.html
Your approach will, however, improve responsiveness.
In addition to the things mentioned by the other posters, you should consider whether the objects contained in the collection have thread affinity; for example, COM objects in a single-threaded apartment may not be amenable to this kind of trick.
Related
I am quite new to multi-threading. I have a single-threaded data analysis app with a good bit of potential for parallelization, and while the data sets are large, it does not come close to saturating the hard disk's read/write speed, so I figure I should take advantage of the threading support now in the standard and try to speed the beast up.
After some research I decided that producer-consumer was a good approach for reading data from disk and processing it, and I started writing an object pool that would become part of the circular buffer where the producers put data and the consumers take it. As I was writing the class, it felt like I was being too fine-grained in how I was handling the locking and releasing of data members. It feels like half the code is locking and unlocking, and like there are an insane number of synchronization objects floating around.
So I come to you with a class declaration and a sample function and this question: Is this too fine-grained? Not fine grained enough? Poorly thought out?
struct PoolArray
{
public:
    Obj* arr;
    uint32 used;
    uint32 refs;
    std::mutex locker;
};
class SegmentedPool
{
public: /*Construction and destruction cut out*/
    void alloc(uint32 cellsNeeded, PoolPtr& ptr);
    void dealloc(PoolPtr& ptr);
    void clearAll();
private:
    void expand();
    //stores all the segments of the pool
    std::vector< PoolArray<Obj> > pools;
    ReadWriteLock poolLock;
    //stores pools that are empty
    std::queue< int > freePools;
    std::mutex freeLock;
    int currentPool;
    ReadWriteLock currentLock;
};
void SegmentedPool::dealloc(PoolPtr& ptr)
{
    //find and access the segment
    poolLock.lockForRead();
    PoolArray* temp = &(pools[ptr.getSeg()]);
    poolLock.unlockForRead();
    //reduce the count of references in the segment
    temp->locker.lock();
    --(temp->refs);
    //if the number of references is now zero then set the segment back to unused
    //and push it onto the queue of empty segments so that it can be reused
    if(temp->refs==0)
    {
        temp->used=0;
        freeLock.lock();
        freePools.push(ptr.getSeg());
        freeLock.unlock();
    }
    temp->locker.unlock();
    ptr.set(NULL,-1);
}
A few explanations:
First, PoolPtr is a stupid little pointer-like object that stores the pointer and the segment number in the pool that the pointer came from.
Second, this is all "templatized", but I took those lines out to reduce the length of the code block.
Third, ReadWriteLock is something I put together using a mutex and a pair of condition variables.
Locks are inefficient no matter how fine-grained they are, so avoid them at all costs.
Both the queue and the vector can easily be implemented lock-free using the compare-and-swap primitive. There are a number of papers on the topic.
Lock free queue:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.53.8674&rep=rep1&type=pdf
http://www.par.univie.ac.at/project/peppher/publications/Published/opodis10lfq.pdf
Lock free vector:
http://www.stroustrup.com/lock-free-vector.pdf
Stroustrup's paper also refers to a lock-free allocator, but don't jump at it right away; standard allocators are pretty good these days.
UPD
If you don't want to bother writing your own containers, use Intel's Threading Building Blocks library; it provides both a thread-safe vector and a thread-safe queue. They are NOT lock-free, but they are optimized to use the CPU cache efficiently.
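A quick usage sketch (the container names are TBB's own; the surrounding functions are illustrative):
#include <tbb/concurrent_queue.h>
#include <tbb/concurrent_vector.h>
tbb::concurrent_queue<int> q;
tbb::concurrent_vector<int> v;
void producer()
{
    q.push(42);        // safe from any thread
    v.push_back(7);    // grows without invalidating other threads' accesses
}
void consumer()
{
    int value;
    if (q.try_pop(value))  // non-blocking; returns false when empty
    {
        /* use value */
    }
}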
UPD
Regarding PoolArray: you don't need a lock there either. If you can use C++11, use std::atomic for atomic increments and swaps; otherwise use compiler built-ins (the Interlocked* functions in MSVC, the __sync_* built-ins in GCC: http://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html)
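A sketch of what that looks like for PoolArray in C++11 (Obj and uint32 as in the question; the fetch_sub idiom is the key point):
#include <atomic>
struct Obj; // as in the question
typedef unsigned int uint32;
struct PoolArray
{
    Obj* arr;
    std::atomic<uint32> used;
    std::atomic<uint32> refs;   // replaces the per-segment mutex
};
// In dealloc(), fetch_sub returns the previous value, so "== 1" means we
// just dropped the last reference and may recycle the segment:
//     if (temp->refs.fetch_sub(1) == 1) { temp->used = 0; /* recycle */ }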
A good start - you lock things when needed and free them as soon as you're finished.
Your ReadWriteLock is pretty much a CCriticalSection object - depending on your needs it might improve performance to use that instead.
One thing I would say: call temp->locker.lock() before you release the lock on the pool with poolLock.unlockForRead(); otherwise you're performing operations on the pool object when it's not under synchronisation control, and it could be being used by another thread at that point. A minor point, but with multi-threading it's the minor points that trip you up in the end.
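In terms of the question's own dealloc() code, that reordering is simply:
poolLock.lockForRead();
PoolArray* temp = &(pools[ptr.getSeg()]);
temp->locker.lock();        // take the segment lock first...
poolLock.unlockForRead();   // ...then drop the pool lock
--(temp->refs);
// rest of dealloc() unchanged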
A good approach to take to multi-threading is to wrap any controlled resources in objects or functions that do the locking and unlocking inside them, so anyone who wants access to the data doesn't have to worry about which lock to lock or unlock, and when to do it. For example:
...
if(temp->refs==0)
{
    temp->used=0;
    freeLock.lock();
    freePools.push(ptr.getSeg());
    freeLock.unlock();
}
...
would be...
...
if(temp->refs==0)
{
    temp->used=0;
    addFreePool(ptr.getSeg());
}
...
void SegmentedPool::addFreePool(unsigned int seg)
{
    freeLock.lock();
    freePools.push(seg);
    freeLock.unlock();
}
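If you can use C++11, a std::lock_guard makes the helper exception-safe as well, since the unlock happens automatically on scope exit (a small variation on the function above):
void SegmentedPool::addFreePool(unsigned int seg)
{
    std::lock_guard<std::mutex> guard(freeLock); // unlocks even if push throws
    freePools.push(seg);
}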
There are plenty of multi-threading benchmarking tools out there too. You can play around with controlling your resources in different ways, run it through one of the tools, and see where any bottlenecks are if you feel like performance is becoming an issue.
I wrote a threaded Renderer for SFML which takes pointers to drawable objects and stores them in a vector to be drawn each frame. Starting out, adding objects to the vector and removing objects from the vector would frequently cause segmentation faults (SIGSEGV). To try to combat this, I would put objects that needed to be removed/added into a queue to be handled later (before drawing the frame). This seemed to fix it, but lately I have noticed that if I add many objects at one time (or add/remove them fast enough) I will get the same SIGSEGV.
Should I be using locks when I add/remove from the vector?
You need to understand the thread-safety guarantees the C++ standard (and implementations of C++ 2003 for possibly concurrent systems) gives you. The standard containers are thread-safe in the following sense:
It is OK to have multiple concurrent threads reading the same container.
If there is one thread modifying a container there shall be no concurrent threads reading or writing the same container.
Different containers are independent of each other.
Many people misunderstand the thread-safety of containers to mean that these rules are imposed by the container implementation: they are not! It is your responsibility to obey these rules.
The reason they aren't, and actually can't be, imposed by the containers is that the containers don't have an interface suitable for this. Consider, for example, the following trivial piece of code:
if (!c.empty()) {
    auto value = c.back();
    // do something with the read value
}
The container can control access within the calls to empty() and back(). However, between these calls it necessarily has to release any sort of synchronization facility, i.e. by the time the thread tries to read c.back(), the container may be empty again! There are essentially two ways to deal with this problem:
You use external locking, spanning the entire range of accesses which are interdependent in some form, whenever there is a possibility that a concurrent thread may be changing the container.
You change the interface of the containers to become monitors. However, the container interface isn't at all suitable for a change in this direction, because monitors essentially only support a "fire and forget" style of interface.
Both strategies have their advantages, and the standard library containers clearly support the first style: they require external locking when used concurrently with the potential of at least one thread modifying the container. They don't require any kind of locking (neither internal nor external) if there is only ever one thread using them in the first place. This is actually the scenario they were designed for. The thread-safety guarantees given for them are in place to guarantee that no internal facilities are used which are not thread-safe, say, a per-object iterator object or a memory allocation facility shared by multiple threads without being thread-safe, etc.
To answer the original question: yes, you need to use external synchronization, e.g. in the form of mutex locks, if you modify the container in one thread and read it in another thread.
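As a concrete illustration (a minimal sketch; the container and mutex names are mine), the empty()/back() pair from above becomes safe when one lock spans the whole sequence:
#include <mutex>
#include <vector>
std::vector<int> c;     // shared container (illustrative)
std::mutex c_mutex;     // guards every access to c, in all threads
void consume_one()
{
    // The lock must cover the whole empty()/back()/pop_back() sequence,
    // so the answer from empty() is still valid when back() runs.
    std::lock_guard<std::mutex> guard(c_mutex);
    if (!c.empty())
    {
        int value = c.back();
        c.pop_back();
        (void)value; // stand-in for real work on the value
    }
}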
Should I be using locks when I add/remove from the vector?
Yes. If you're using the vector from two threads at the same time and you reallocate, then the backing allocation may be swapped out and freed behind the other thread's back. The other thread would then be reading from or writing to freed memory, or memory in use by another, unrelated allocation.
We've been looking to use a lock-free queue in our code to reduce lock contention between a single producer and consumer in our current implementation. There are plenty of queue implementations out there, but I haven't been too clear on how best to manage the memory of the nodes.
For example, the producer looks like this:
queue.Add( new WorkUnit(...) );
And the consumer looks like:
WorkUnit* unit = queue.RemoveFront();
unit->Execute();
delete unit;
We currently use a memory pool for allocation. You'll notice that the producer allocates the memory and the consumer deletes it. Since we're using pools, we need to add another lock to the memory pool to protect it properly. This seems to negate the performance advantage of a lock-free queue in the first place.
So far, I think our options are:
Implement a lock-free memory pool.
Dump the memory pool and rely on a threadsafe allocator.
Are there any other options we can explore? We're trying to avoid implementing a lock-free memory pool, but we may have to take that path.
Thanks.
Only the producer should be able to create objects and destroy them when they aren't needed anymore. The consumer only uses objects and marks them as used. That's the point: you don't need to share memory ownership in this case. This is the only way I know of to implement an efficient lock-free queue.
Read this great article that describes such an algorithm in detail.
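A rough sketch of that ownership scheme (all names mine; single producer, single consumer, and fixed capacity assumed). The consumer never frees anything; it only flips a flag, and the producer reuses slots in place:
#include <atomic>
#include <cstddef>
struct WorkUnit            // stands in for the question's work type
{
    int payload;
    void Execute() { /* ... */ }
};
template <std::size_t N>
class OwnedRing
{
public:
    OwnedRing() : write_(0), read_(0) {}
    // Producer side: reuse a slot in place; no allocation after construction.
    bool try_add(int payload)
    {
        Slot& s = slots_[write_ % N];
        if (!s.consumed.load(std::memory_order_acquire))
            return false;                            // ring is full
        s.unit.payload = payload;                    // (re)fill in place
        s.consumed.store(false, std::memory_order_release);
        ++write_;
        return true;
    }
    // Consumer side: execute, then mark the slot reusable; never delete.
    bool try_consume()
    {
        Slot& s = slots_[read_ % N];
        if (s.consumed.load(std::memory_order_acquire))
            return false;                            // ring is empty
        s.unit.Execute();
        s.consumed.store(true, std::memory_order_release);
        ++read_;
        return true;
    }
private:
    struct Slot
    {
        WorkUnit unit;
        std::atomic<bool> consumed;
        Slot() : consumed(true) {}
    };
    Slot slots_[N];
    std::size_t write_;   // touched only by the producer
    std::size_t read_;    // touched only by the consumer
};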
One more possibility is to have a separate memory pool for each thread, so only one thread is ever using a given heap, and you can avoid locking for the allocation.
That leaves you needing a lock-free way to free a block. Fortunately, that's pretty easy to manage: you maintain a linked list of free blocks for each heap. You put the block's own address into (the memory you'll use as) the link field for the linked list, and then do an atomic exchange with the pointer holding the head of the list.
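A sketch of that free path (names mine; a compare-and-swap loop stands in for the "atomic exchange", since the link field must be written before the new head is published; a production version also has to deal with the ABA problem on the pop side):
#include <atomic>
struct FreeList
{
    std::atomic<void*> head;
    FreeList() : head(nullptr) {}
    // Called from any thread to return a block to its owning heap.
    void push(void* block)
    {
        void* old = head.load(std::memory_order_relaxed);
        do {
            // The block's own first bytes serve as the "next" link.
            *static_cast<void**>(block) = old;
        } while (!head.compare_exchange_weak(old, block,
                                             std::memory_order_release,
                                             std::memory_order_relaxed));
    }
};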
You should look at Intel's TBB. I don't know how much, if anything, it costs for commercial projects, but they already have concurrent queues, concurrent memory allocators, that sort of thing.
Your queue interface also looks seriously dodgy. For example, your RemoveFront() call: what if the queue is empty? The new and delete calls also look quite redundant. Intel's TBB and Microsoft's PPL (included in VS2010) do not suffer from these issues.
I am not sure exactly what your requirements are, but if you have a scenario where the producer creates data and pushes it onto a queue, and a single consumer takes the data, uses it, and then destroys it, you just need a thread-safe queue, or you can make your own singly linked list that is thread-safe in this scenario (the last step is sketched below):
create the list element
append the data to the list element
change the next pointer of the previous element from null to the new element (or the equivalent in other languages)
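A sketch of that last step in C++11 (names mine; with a single producer a plain release store suffices, no CAS loop needed):
#include <atomic>
struct Node
{
    int data;
    std::atomic<Node*> next;
    explicit Node(int d) : data(d), next(nullptr) {}
};
// Producer only: publish the new element by linking it after the tail.
// A consumer that follows next with an acquire load sees fully built nodes.
void append(Node*& tail, int value)
{
    Node* fresh = new Node(value);                        // steps 1 and 2
    tail->next.store(fresh, std::memory_order_release);   // step 3
    tail = fresh;
}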
The consumer can be implemented in any way, and most linked lists are by default thread-safe for this kind of operation (but check that with the implementation).
In this scenario, the consumer should free the memory, or it can hand it back to the producer the same way, by setting a predefined flag. The producer does not need to check all the flags (if there are more than, say, 1000 of them) to find a free bucket; the flags can be arranged as a tree, enabling O(log n) search for an available pool. I am sure this can be done in O(1) time without locking, but I have no idea how.
I had the exact same concern, so I wrote my own lock-free queue (single-producer, single-consumer) that manages the memory allocation for you (it allocates a series of contiguous blocks, sort of like std::vector).
I recently released the code on GitHub. (Also, I posted on my blog about it.)
If you create the node on the stack, enqueue it, and then dequeue it into another node on the stack, you won't need to use pointers or manual allocation at all. Additionally, if your node implements move semantics for construction and assignment, it will be automatically moved instead of copied :-)
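For example (a fragment; the queue object and the exact enqueue/try_dequeue names depend on the implementation linked above):
WorkUnit unit = makeWorkUnit();   // built on the producer's stack
queue.enqueue(std::move(unit));   // moved into the queue, not copied
WorkUnit out;                     // lives on the consumer's stack
if (queue.try_dequeue(out))       // moved out of the queue into 'out'
    out.Execute();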
I'm having some issues using pooled memory allocators for std::list objects in a multi-threaded application.
The part of the code I'm concerned with runs each thread function in isolation (i.e. there is no communication or synchronization between threads), and therefore I'd like to set up separate memory pools for each thread, where each pool is not thread-safe (and hence fast).
I've tried using a shared thread-safe singleton memory pool and found the performance to be poor, as expected.
This is a heavily simplified version of the type of thing I'm trying to do. A lot has been included in a pseudo-code kind of way, sorry if it's confusing.
/* The thread functor - one instance of MAKE_QUADTREE created for each thread
 */
class make_quadtree
{
private:
    /* A non-thread-safe memory pool for int linked list items, let's say that it's
     * something along the lines of BOOST::OBJECT_POOL
     */
    pooled_allocator<int> item_pool;

    /* The problem! - a local class that would be constructed within each std::list as the
     * allocator but really just delegates to ITEM_POOL
     */
    class local_alloc
    {
    public:
        //!! I understand that I can't access ITEM_POOL from within a nested class like
        //!! this, that's really my question - can I get something along these lines to
        //!! work??
        pointer allocate (size_t n) { return ( item_pool.allocate(n) ); }
    };

public:
    make_quadtree (): item_pool() // only construct 1 instance of ITEM_POOL per
                                  // MAKE_QUADTREE object
    {
        /* The kind of data structures - vectors of linked lists
         * The idea is that all of the linked lists should share a local pooled allocator
         */
        std::vector<std::list<int, local_alloc>> lists;

        /* The actual operations - too complicated to show, but in general:
         *
         * - The vector LISTS is grown as a quadtree is built, its size is the number of
         *   quadtree "boxes"
         *
         * - Each element of LISTS (each linked list) represents the IDs of items
         *   contained within each quadtree box (say they're xy points), as the quadtree
         *   is grown a lot of ID pop/push-ing between lists occurs, hence the memory pool
         *   is important for performance
         */
    }
};
So really my problem is that I'd like to have one memory pool instance per thread functor instance, but within each thread functor share the pool between multiple std::list objects.
Why not just construct an instance of local_alloc with a reference to make_quadtree?
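A sketch of how that can look (my construction, not the poster's code; the pool is reduced to a raw byte pool because std::list rebinds the allocator to its internal node type, so a pool typed only for int wouldn't be enough). C++11's allocator_traits fills in the rest of the allocator interface; pre-C++11 you would need the full boilerplate (rebind, pointer typedefs, construct/destroy):
#include <cstddef>
#include <list>
// Hypothetical non-thread-safe pool; a real one would recycle blocks.
class byte_pool
{
public:
    void* allocate(std::size_t bytes) { return ::operator new(bytes); }
    void deallocate(void* p, std::size_t) { ::operator delete(p); }
};
template <class T>
class local_alloc
{
public:
    typedef T value_type;
    explicit local_alloc(byte_pool* pool) : pool_(pool) {}
    template <class U>
    local_alloc(const local_alloc<U>& other) : pool_(other.pool_) {}
    T* allocate(std::size_t n)
    { return static_cast<T*>(pool_->allocate(n * sizeof(T))); }
    void deallocate(T* p, std::size_t n)
    { pool_->deallocate(p, n * sizeof(T)); }
    byte_pool* pool_;   // shared by every list of one make_quadtree
};
template <class T, class U>
bool operator==(const local_alloc<T>& a, const local_alloc<U>& b)
{ return a.pool_ == b.pool_; }
template <class T, class U>
bool operator!=(const local_alloc<T>& a, const local_alloc<U>& b)
{ return !(a == b); }
// Inside make_quadtree:
//   byte_pool item_pool;
//   std::list<int, local_alloc<int> > list((local_alloc<int>(&item_pool)));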
A thread-specific allocator is quite the challenge.
I spent some time looking for a thread-specific allocator "off the shelf". The best I found was Hoard (hoard.org). This provided a significant performance improvement; however, Hoard has some serious drawbacks:
I experienced some crashes during testing
Commercial licensing is expensive
It 'hooks' system calls to malloc, a technique I consider dodgy.
So I decided to roll my own thread-specific memory allocator, based on boost::pool and boost::thread_specific_ptr. This required a small amount of, IMHO, seriously advanced C++ code, but it now seems to be working well.
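The core of that combination looks roughly like this (a sketch from memory, not the actual code; the fixed chunk size is illustrative):
#include <cstddef>
#include <boost/pool/pool.hpp>
#include <boost/thread/tss.hpp>
// Each thread lazily creates its own pool, so the hot allocation path takes
// no lock. Caveat: a block must be freed by the thread that allocated it,
// since each pool is private to its thread.
void* thread_local_alloc()
{
    static boost::thread_specific_ptr<boost::pool<> > tls_pool;
    if (!tls_pool.get())
        tls_pool.reset(new boost::pool<>(64));   // fixed 64-byte chunks
    return tls_pool->malloc();
}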
It is now quite a few months since I looked at the details of this, but perhaps I can take a look at it once more.
You commented that you are looking for a thread-specific but not thread-safe allocator. This makes sense: if the allocator is thread-specific, it does not need to be thread-safe. However, in my experience, the extra burden of being thread-safe is trivial, so long as contention does NOT occur.
However, all that theory is fun, but I think we should move on to practicalities. I believe that we need a small, instrumented stand-alone program that demonstrates the problem you need to solve. I had what sounded like a very similar problem with std::multiset allocation and wrote the program you can see here: Parallel reads from STL containers
If you can write something similar, showing your problem, then I can check if my thread specific memory allocator can be applied with advantage in your case.
I have std::list<Info> infoList in my application that is shared between two threads. These 2 threads are accessing this list as follows:
Thread 1: uses push_back(), pop_front() or clear() on the list (Depending on the situation)
Thread 2: uses an iterator to iterate through the items in the list and do some actions.
Thread 2 is iterating the list like the following:
for(std::list<Info>::iterator i = infoList.begin(); i != infoList.end(); ++i)
{
    DoAction(i);
}
The code is compiled using GCC 4.4.2.
Sometimes ++i causes a segfault and crashes the application. The error occurs in stl_list.h, line 143, at the following line:
_M_node = _M_node->_M_next;
I guess this is a race condition. The list might have been changed, or even cleared, by thread 1 while thread 2 was iterating over it.
I used a mutex to synchronize access to this list, and all went well during my initial test. But the system just freezes under stress testing, making that solution totally unacceptable. This application is a real-time application, and I need to find a solution so that both threads can run as fast as possible without hurting the total application throughput.
My question is this:
Thread 1 and thread 2 need to execute as fast as possible, since this is a real-time application. What can I do to prevent this problem and still maintain the application's performance? Are there any lock-free algorithms available for such a problem?
It's OK if I miss some newly added Info objects in thread 2's iteration, but what can I do to prevent the iterator from becoming a dangling pointer?
Thanks
Your for() loop can potentially keep a lock for a relatively long time, depending on how many elements it iterates over. You can get into real trouble if it "polls" the queue, constantly checking whether a new element has become available. That makes the thread own the mutex for an unreasonably long time, giving the producer thread few opportunities to break in and add an element, and it burns lots of unnecessary CPU cycles in the process.
You need a "bounded blocking queue". Don't write it yourself; the lock design is not trivial. Good examples are hard to find, and most of what's out there is .NET code. This article looks promising.
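For a feel of what is involved, here is a sketch in C++11 (illustration only; a vetted implementation such as the one in the linked article handles corner cases, like time-outs and shutdown, that this one glosses over):
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
template <class T>
class BoundedBlockingQueue
{
public:
    explicit BoundedBlockingQueue(std::size_t capacity) : capacity_(capacity) {}
    void push(T value)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [&] { return queue_.size() < capacity_; });
        queue_.push(std::move(value));
        notEmpty_.notify_one();   // wake a blocked consumer
    }
    T pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return !queue_.empty(); });
        T value = std::move(queue_.front());
        queue_.pop();
        notFull_.notify_one();    // wake a blocked producer
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::queue<T> queue_;
    std::size_t capacity_;
};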
In general it is not safe to use STL containers this way. You will have to implement a specific method to make your code thread-safe. The solution you choose depends on your needs. I would probably solve this by maintaining two lists, one in each thread, and communicating the changes through a lock-free queue (mentioned in the comments to this question). You could also limit the lifetime of your Info objects by wrapping them in boost::shared_ptr, e.g.:
typedef boost::shared_ptr<Info> InfoReference;
typedef std::list<InfoReference> InfoList;

enum CommandValue
{
    Insert,
    Delete
};

struct Command
{
    CommandValue operation;
    InfoReference reference;
};

typedef LockFreeQueue<Command> CommandQueue;

class Thread1
{
public:
    Thread1(CommandQueue& queue) : m_commands(queue) {}

    void run()
    {
        while (!finished)
        {
            // Process items and use
            // deleteInfo() or addInfo()
        }
    }

    void deleteInfo(InfoReference reference)
    {
        Command command;
        command.operation = Delete;
        command.reference = reference;
        m_commands.produce(command);
    }

    void addInfo(InfoReference reference)
    {
        Command command;
        command.operation = Insert;
        command.reference = reference;
        m_commands.produce(command);
    }

private:
    CommandQueue& m_commands;
};

class Thread2
{
public:
    Thread2(CommandQueue& queue) : m_commands(queue) {}

    void run()
    {
        while (!finished)
        {
            processQueue();
            processList();
        }
    }

    void processQueue()
    {
        Command command;
        while (m_commands.consume(command))
        {
            switch (command.operation)
            {
            case Insert:
                m_infoList.push_back(command.reference);
                break;
            case Delete:
                m_infoList.remove(command.reference);
                break;
            }
        }
    }

    void processList()
    {
        // Iterate over m_infoList
    }

private:
    CommandQueue& m_commands;
    InfoList m_infoList;
};

int main()
{
    CommandQueue commands;
    Thread1 thread1(commands);
    Thread2 thread2(commands);
    thread1.start();
    thread2.start();
    waitforTermination();
}
This has not been compiled. You still need to make sure that access to your Info objects is thread-safe.
I would like to know what the purpose of this list is; it would be easier to answer the question then.
As Hoare said, it is generally a bad idea to try to share data to communicate between two threads; rather, you should communicate to share data: i.e., messaging.
If this list is modelling a queue, for example, you might simply use one of the various ways to communicate (such as sockets) between the two threads. Consumer / Producer is a standard and well-known problem.
If your items are expensive, then only pass the pointers around during communication, you'll avoid copying the items themselves.
In general, it's exquisitely difficult to share data correctly, although it is unfortunately the only way of programming we hear of in school. Normally, only low-level implementations of communication "channels" should ever worry about synchronization; you should learn to use the channels to communicate instead of trying to emulate them.
To prevent your iterator from being invalidated you have to lock the whole for loop. But then the first thread may have difficulty updating the list, so I would try to give it a chance to do its job on each (or every Nth) iteration.
In pseudo-code that would look like:
mutex_lock();
for(...)
{
    doAction();
    mutex_unlock();
    thread_yield();  // give the first thread a chance
    mutex_lock();
    if(iterator_invalidated_flag)  // set by the first thread
        reset_iterator();
}
mutex_unlock();
You have to decide which thread is the more important. If it is the update thread, then it must signal the iterator thread to stop, wait and start again. If it is the iterator thread, it can simply lock the list until iteration is done.
The best way to do this is to use a container that is internally synchronized. TBB's and Microsoft's concurrent_queue classes do this. Anthony Williams also has a good implementation on his blog here.
Others have already suggested lock-free alternatives, so I'll answer as if you were stuck using locks...
When you modify a list, existing iterators can become invalidated because they no longer point to valid memory (this happens, for example, when elements are erased or the list is cleared). To prevent invalidated iterators, you could make the producer block on a mutex while your consumer traverses the list, but that would mean needless waiting for the producer.
Life would be easier if you used a queue instead of a list, and had your consumer use a synchronized queue<Info>::pop_front() call instead of iterators that can be invalidated behind your back. If your consumer really needs to gobble chunks of Info at a time, then use a condition variable that makes your consumer block until queue.size() >= minimum.
The Boost library has a nice portable implementation of condition variables (that even works with older versions of Windows), as well as the usual threading library stuff.
For a producer-consumer queue that uses (old-fashioned) locking, check out the BlockingQueue template class of the ZThreads library. I have not used ZThreads myself, being worried about lack of recent updates, and because it didn't seem to be widely used. However, I have used it as inspiration for rolling my own thread-safe producer-consumer queue (before I learned about lock-free queues and TBB).
A lock-free queue/stack library seems to be in the Boost review queue. Let's hope we see a new Boost.Lockfree in the near future! :)
If there's interest, I can write up an example of a blocking queue that uses std::queue and Boost thread locking.
EDIT:
The blog referenced by Rick's answer already has a blocking queue example that uses std::queue and Boost condvars. If your consumer needs to gobble chunks, you can extend the example as follows:
void wait_for_data(size_t how_many)
{
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.size() < how_many)
    {
        the_condition_variable.wait(lock);
    }
}
You may also want to tweak it to allow time-outs and cancellations.
You mentioned that speed is a concern. If your Info objects are heavyweight, you should consider passing them around by shared_ptr. You can also try making your Info objects fixed-size and using a memory pool (which can be much faster than the heap).
As you mentioned that you don't care if your iterating consumer misses some newly-added entries, you could use a copy-on-write list underneath. That allows the iterating consumer to operate on a consistent snapshot of the list as of when it first started, while, in other threads, updates to the list yield fresh but consistent copies without perturbing any of the extant snapshots.
The trade-off here is that each update to the list requires locking for exclusive access long enough to copy the entire list. This technique is biased toward having many concurrent readers and less frequent updates.
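A minimal shape for such a copy-on-write wrapper (my sketch, using shared_ptr so old snapshots stay alive for as long as readers hold them):
#include <list>
#include <memory>
#include <mutex>
struct Info { /* ... */ };   // the question's element type
class CowInfoList
{
public:
    CowInfoList() : current_(std::make_shared<std::list<Info> >()) {}
    // Readers: grab an immutable snapshot, then iterate with no further locking.
    std::shared_ptr<const std::list<Info> > snapshot() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return current_;
    }
    // Writers: copy, modify, swap in; extant snapshots remain valid.
    void push_back(const Info& value)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        std::shared_ptr<std::list<Info> > copy(
            std::make_shared<std::list<Info> >(*current_));
        copy->push_back(value);
        current_ = copy;
    }
private:
    mutable std::mutex mutex_;
    std::shared_ptr<const std::list<Info> > current_;
};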
Trying to add intrinsic locking to the container first requires you to think about which operations need to behave in atomic groups. For instance, checking if the list is empty before trying to pop off the first element requires an atomic pop-if-not-empty operation; otherwise, the answer to the list being empty can change in between when the caller receives the answer and attempts to act upon it.
It's not clear in your example above what guarantees the iteration must obey. Must every element in the list eventually be visited by the iterating thread? Does it make multiple passes? What does it mean for one thread to remove an element from the list while another thread is running DoAction() against it? I suspect that working through these questions will lead to significant design changes.
You're working in C++, and you mentioned needing a queue with a pop-if-not-empty operation. I wrote a two-lock queue many years ago using the ACE Library's concurrency primitives, as the Boost thread library was not yet ready for production use, and the chance for the C++ Standard Library including such facilities was a distant dream. Porting it to something more modern would be easy.
This queue of mine -- called concurrent::two_lock_queue -- allows access to the head of the queue only via RAII. This ensures that acquiring the lock to read the head will always be mated with a release of the lock. A consumer constructs a const_front (const access to head element), a front (non-const access to head element), or a renewable_front (non-const access to head and successor elements) object to represent the exclusive right to access the head element of the queue. Such "front" objects can't be copied.
Class two_lock_queue also offers a pop_front() function that waits until at least one element is available to be removed, but, in keeping with std::queue's and std::stack's style of not mixing container mutation and value copying, pop_front() returns void.
In a companion file, there's a type called concurrent::unconditional_pop, which allows one to ensure through RAII that the head element of the queue will be popped upon exit from the current scope.
The companion file error.hh defines the exceptions that arise from use of the function two_lock_queue::interrupt(), used to unblock threads waiting for access to the head of the queue.
Take a look at the code and let me know if you need more explanation as to how to use it.
If you're using C++0x, you could internally synchronize list iteration this way:
Assuming the class has a templated list named objects_ and a boost::mutex named mutex_, the toAll method is a member method of the list wrapper:
void toAll(std::function<void (T*)> lambda)
{
    // The lock must be a named object; an unnamed temporary would be
    // destroyed (and the mutex released) on the very same line.
    boost::mutex::scoped_lock lock(this->mutex_);
    for(auto it = this->objects_.begin(); it != this->objects_.end(); it++)
    {
        T* object = *it;
        if(object != nullptr)
        {
            lambda(object);
        }
    }
}
Calling:
synchronizedList1->toAll(
    [&](T* object)->void // Or the class that your list holds
    {
        for(auto it = this->knownEntities->begin(); it != this->knownEntities->end(); it++)
        {
            // Do something
        }
    }
);
You must be using some threading library. If you are using Intel TBB, you can use concurrent_vector or concurrent_queue. See this.
If you want to continue using std::list in a multi-threaded environment, I would recommend wrapping it in a class with a mutex that provides locked access to it. Depending on the exact usage, it might make sense to switch to an event-driven queue model where messages are passed on a queue that multiple worker threads are consuming (hint: producer-consumer).
I would seriously take Matthieu's thought into consideration. Many problems that are being solved using multi-threaded programming are better solved using message-passing between threads or processes. If you need high throughput and do not absolutely require that the processing share the same memory space, consider using something like the Message-Passing Interface (MPI) instead of rolling your own multi-threaded solution. There are a bunch of C++ implementations available - OpenMPI, Boost.MPI, Microsoft MPI, etc. etc.
I don't think you can get away without any synchronisation at all in this case, as certain operations will invalidate the iterators you are using. With a list this is fairly limited (basically, only if both threads are manipulating iterators to the same element at the same time), but there is still a danger that you'll be removing an element at the same time you're trying to append one to it.
Are you by any chance holding the lock across DoAction(i)? You obviously only want to hold the lock for the absolute minimum of time that you can get away with in order to maximise the performance. From the code above I think you'll want to decompose the loop somewhat in order to speed up both sides of the operation.
Something along the lines of:
while (processItems) {
    Info item;
    bool haveItem = false;
    lock(mutex);
    if (!infoList.empty()) {
        item = infoList.front();
        infoList.pop_front();
        haveItem = true;
    }
    unlock(mutex);
    if (haveItem)
        DoAction(item);
    delayALittle();
}
And the insert function would still have to look like this:
lock(mutex);
infoList.push_back(item);
unlock(mutex);
Unless the queue is likely to be massive, I'd be tempted to use something like a std::vector<Info>, or even a std::vector<boost::shared_ptr<Info> >, to minimize the copying of the Info objects (assuming these are somewhat more expensive to copy than a boost::shared_ptr). Generally, operations on a vector tend to be a little faster than on a list, especially if the objects stored in the vector are small and cheap to copy.