Document locking in multithreading environment [closed] - c++

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
We have an application that supports binary plugins (dynamically loaded libraries) as well as a number of plugins for this application. The application is itself multithreaded and the plugins may also start threads. There's a lot of locking going on to keep data structures consistent.
One major problem is that sometimes locks are held across calls from the application into a plugin. This is problematic because the plugin code might want to call back into the application, producing a deadlock. This problem is aggravated by the fact that different teams work on the base application and the plugins.
The question is: Is there a "standard" or at least widely used way of documenting locking schemes apart from writing tons of plain text?

It is a theorical approach, I hope it will help you a little.
To me you can avoid this situation by redesigning the way plugins and your application are communicating (if possible).
A plugin's code is not secure. To ensure the application's flexibility and its stability you must build a standard way to exchange informations and make critical actions with plugins.
The easiest way is to avoid to manage each specific plugin behavior by defining a lock free api.
To do that you can make the critical parts of your plugins asynchronous by using ring buffer / disruptor or just an action buffer.
EDIT
Sorry if I argue again in the same way, but this seems to me to be like an "IO" problem.
You have concurrent access on some resources (memory/disc/network .... don't know which ones) and the need to expose them with high availability. And finally these resources cannot be access randomly without locking your application.
With a manager dedicated on the critical parts, the wait can be short enough to be imperceptible.
However this is not easily applicable to an already existing application, mostly if it is a large one.
if you don't already know this kind of stuff, I encourage you to look to the "disruptor". To me it is one of the modern basic to consider every time I work with threads.

I suggest to use Petri Net which are simple to learn and can describe very well the cooperation among the different parts of your software. In this question are described several models and tools useful to document concurrency: https://stackoverflow.com/questions/164187/what-tools-diagrams-do-you-use-for-modelling-multithreaded-systems. You can choose the right model according your needs.

If your locking scheme is simple enough that you can describe it in documentation, then by all means do so. However, if deadlocks are occurring in practice, the problem may not be lack of documentation, but that the API is not serving the needs of your plugin authors. Documenting the limitations is a good first step, but removing the limitations is better.
Consider the possibilities for a deadlock on a single lock held by your code and requested by the plugin:
Your code is not in the middle of reading or writing, but is still holding the lock just because that's how the code was written. In that case, your code should release the lock before calling into the plugin.
Your code and the plugin are both reading data, and using the lock to prevent concurrent writers. In that case, use a readers-writers lock.
Your code is in the middle of changing data, and the plugin wants to read it. This is not generally safe; there's a reason you're using a lock to protect the entire modification, after all. Most attempts to make this safe fail in practice (it is as hard as writing lock-free code). In this case, the best thing to do is change your design so your code finishes changes before calling the plugin, or starts changes after calling the plugin.
Your code is in the middle of reading data, and the plugin wants to change it. Like the previous case, this is also not safe. Your code should release the lock before calling the plugin and acquire it again afterward, and assume the data have changed, re-reading anything you need to continue.
This is the best advice I can give without knowing anything more about your application and its specific needs.
For most applications, software companies shy away from 3rd party binary plugins in the same process because when something goes wrong, it is very difficult to figure out why. Users usually blame the application, not the plugin, and the perception of the quality of your application is poor. It can be made to work by keeping very close relationships with your plugin authors, usually including exchanging all source code (optionally under restrictive licenses or NDAs).

Yes, there is a standard way of documenting locking schemes using in university.
1/ use diagram
you must draw a diagram. each point on the diagram is a lock link to other thread.
ex: T1 T2
1 -R-> A
2 <-W- B
2/ use table
you must write down each point and thread on each row
ex: T1 T2
lockX(A) lockS(B)
read(A) read(B)
A<-A50 unlock(B)
Conclude: this is very complex task and take many time to trace.

Related

When to use multithreading in C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I am C++ programmer (intermediate) and learning multi-threading now. I found it quite confusing when to use multi-threading in C++? How will i come to know that i need to use multi-threading in which part of section?
When to use multithreading in C++?
When you have resource intensive task like huge mathematical calculation , or I/O intensive task like reading or writing to file, use should your multithreading.
Purpose should be, you can be able to run multiple things (tasks) together, so that it will increase performance and responsiveness of your application. Also, learn about synchronization before implementing multithreading in your application.
When to use multithreading in C++?`
Well - the general rule of thumb is: use it when it can speed up your application. The answer isn't really language-dependant.
If you want to get an in-depth answer, then you have to consider a few things:
Is multithreading possible to implement inside your code? Do you have fragments which can be calulated at the same time and are intependent of other calculations?
Is multithreading worth implementing? Does your program run slow even when you did all you could to make it as fast as possible?
Will your code be run on machines that support multithreading (so have multiple processing units)? If you're designing code for some kind of machine with only one core, using multithreading is a waste of time.
Is there a different option? A better algorithm, cleaning the code, etc? If so - maybe it's better to use that instead of multithreading?
Do you have to handle things that are hard to predict in time, while the whole application has to constantly run? For example - receiving some information from a server in a game?
This is a slightly subjective subject... But I tend to use multi-threading in one of two situations.
1 - In a performance critical situation where the utmost power is needed (and the algorithm of course supports parallelism), for me, matrix multiplications.
2 - Rarely where it may be easier to have a thread managing something fairly independent. The classic is networking, perhaps have a thread blocking waiting for connections and spawning threads to manage each thread as it comes in. This is useful as the threads can block and respond in a timely manner. Say you have a server, one request might need disk access which is slow, another thread can jump in an field a different request while the first is waiting for its data.
As has been said by others, only when you need to should you think about doing it, it gets complicated fast and can be difficult to debug.
Multithreading is a specialized form of multitasking and a multitasking is the feature that allows your computer to run two or more programs concurrently.
I think this link can help you.
http://www.tutorialspoint.com/cplusplus/cpp_multithreading.htm
Mostly when you want things to be done at the same time. For instance, you may want a window to still respond to user input when a level is loading in a game or when you're downloading multiple files at once, etc. It's for things that really can't wait until other processing is done. Of course, both probably go slower as a result, but it really gives the illusion of multiple things happening at once.
Use multithreading when you can speed up your algorithms by doing things in parallel. Use it in opposition to multiprocessing when the threads need access to the parent process's resources.
My two cents.
Use cases:
Integrate your application in a lib/app that already runs a loop. You would need a thread of your own to run your code concurrently if you cannot integrate into the other app.
Task splitting. It makes sense to organize disjoint tasks in threads sometimes, such as in separating sound from image processing, for example.
Performance. When you want to improve the throghput of some task.
Recommendations:
In the general case, don't do multithreading if a single threaded solution will suffice. It adds complexity.
When needed, start with higher-order primitives, such as std::future and std::async.
When possible, avoid data sharing, which is the source of contention.
When going to lower level abstractions, such as mutexes and so on, encapsulate it in some pattern. You can take a look at these slides.
Decouple your functions from threading and compose the threading into the functions at a later point. Namely, don't embed thread creation into the logic of your code.

Application of Shared Read Locks

what is the need for a read shared lock?
I can understand that write locks have to be exclusive only. But what is the need for many clients to access the document simultaneously and still share only read privilege? Practical applications of Shared read locks would be of great help too.
Please move the question to any other forum you'd find it appropriate to be in.
Though this is a question purely related to ABAP programming and theory I'm doing, I'm guessing the applications are generic to all languages.
Thanks!
If you do complex and time-consuming calculations based on multiple datasets (e. g. postings), you have to ensure that none of these datasets is changed while you're working - otherwise the calculations might be wrong. Most of the time, the ACID principles will ensure this, but sometimes, that's not enough - for example if the datasource is so large that you have to break it up into parallel subtasks or if you have to call some function that performs a database commit or rollback internally. In this case, the transaction isolation is no longer enough, and you need to lock the entity on a logical level.

How to convert my project to become a multi threaded application

I have a project and I want to convert it to multi-threaded application. What are the things that can be done to make it a multi threaded application
List out things to be done to convert into multithreaded application
e.g mutex lock on shared variables.
I was not able to find a question which list all those under single hood.
project is in C
Single threaded application need not be concerned about being thread safe.
This issue arises when you have multiple threads which are trying to access a commonly shared resource. At that time, you must be concerned.
So, no need to worry.
EDIT (after question been edited ) :
You need to go through the following links.
Single threaded to multithreaded application
Single threaded to multithreaded application - What we need to consider ?
Advice - Single threaded to multithreaded application
Also a good advice for converting single to multithreaded application.Check out.
Single threaded -> Multithreaded application :: Good advice.
The big issue is that, in general, when designing your application it is very difficult to choose single thread and then later on add multi-threading. The choice is fundamental to the design idioms you are going to strive towards. Here's a brief but poor guide of some of the things you should be paying attention towards and how to modify your code (note, none of these are set in stone, there's always a way around):
Remove all mutable global variables. I'd say this goes for single threaded applications too but that's just me.
Add "const" to as many variables as you can as a first pass to decide where there are state changes and take notes from the compilation errors. This is not to say "turn all your variables to const." It is just s simple hack to figure out where your problem areas are going to be.
For those items which are mutable and which will be shared (that is, you can't leave them as const without compilation warnings) put locks around them. Each lock should be logged.
Next, introduce your threads. You're probably about to suffer a lot of deadlocks, livelocks, race conditions, and what not as your single threaded application made assumptions about the way and order your application would run.
Start by paring away unneeded locks. That is, look to the mutable state which isn't shared amongst your threads. Those locks are superfluous and need to go.
Next, study your code. At this point, determining where your threaded issues are is more art than science. Although, there are decent principals about how to go about this, that's about all I can say.
If that sounds like too much effort, it's time to look towards the Actor model for concurrency. This would be akin to creating several different applications which call one another through a message passing scheme. I find that Actors are not only intuitive but also massively friendly to determining where and how you might encounter threading issues. When setting up Actors, it's almost impossible not to think about all the "what ifs."
Personally, when dealing with a single threaded to multi threaded conversion, I do as little as possible to meet project goals. It's just safer.
This depends very heavily on exactly how you intend to use threads. What does your program do? Where do you want to use threads? What will those threads be doing?
You will need to figure out what resources these threads will be sharing, and apply appropriate locking. Since you're starting with a single-threaded application, it's a good idea to minimize the shared resources to make porting easier. For example, if you have a single GUI thread right now, and need to do some complex computations in multiple threads, spawn those threads, but don't have them directly touch any data for the GUI - instead, send a asynchronous message to the GUI thread (how you do this depends on the OS and GUI library) and have it handle any changes to GUI-thread data in a serialized fashion on the GUI thread itself.
As general advice, don't simply add threads willy-nilly. You should know exactly which variables and data structures are shared between threads, where they are accessed, and why. And you should be keeping said sharing to the minimum.
Without a much more detailed description of your application, it's nearly impossible to give you a complete answer.
It will be a good idea to give some insight in your understanding of threading aswell.
However, the most important is that each time a global variable is accessed or a pointer is used, there's a good chance you'll need to do that inside of a mutex.
This wikipedia page should be a good start : http://en.wikipedia.org/wiki/Thread_safety

What should I know about multithreading and when to use it, mainly in c++

I have never come across multithreading but I hear about it everywhere. What should I know about it and when should I use it? I code mainly in c++.
Mostly, you will need to learn about MT libraries on OS on which your application needs to run. Until and unless C++0x becomes a reality (which is a long way as it looks now), there is no support from the language proper or the standard library for threads. I suggest you take a look at the POSIX standard pthreads library for *nix and Windows threads to get started.
This is my opinion, but the biggest issue with multithreading is that it is difficult. I don't mean that from an experienced programmer point of view, I mean it conceptually. There really are a lot of difficult concurrency problems that appear once you dive into parallel programming. This is well known, and there are many approaches taken to make concurrency easier for the application developer. Functional languages have become a lot more popular because of their lack of side effects and idempotency. Some vendors choose to hide the concurrency behind API's (like Apple's Core Animation).
Multitheaded programs can see some huge gains in performance (both in user perception and actual amount of work done), but you do have to spend time to understand the interactions that your code and data structures make.
MSDN Multithreading for Rookies article is probably worth reading. Being from Microsoft, it's written in terms of what Microsoft OSes support(ed in 1993), but most of the basic ideas apply equally to other systems, with suitable renaming of functions and such.
That is a huge subject.
A few points...
With multi-core, the importance of multi-threading is now huge. If you aren't multithreading, you aren't getting the full performance capability of the machine.
Multi-threading is hard. Communicating and synchronization between threads is tricky to get right. Problems are often intermittent, hard to diagnose, and if the design isn't right for multi-threading, hard to fix.
Multi-threading is currently mostly non-portable and platform specific.
There are portable libraries with wrappers around threading APIs. Boost is one. wxWidgets (mainly a GUI library) is another. It can be done reasonably portably, but you won't have all the options you get from platform-specific APIs.
I've got an introduction to multithreading that you might find useful.
In this article there isn't a single
line of code and it's not aimed at
teaching the intricacies of
multithreaded programming in any given
programming language but to give a
short introduction, focusing primarily
on how and especially why and when
multithreaded programming would be
useful.
Here's a link to a good tutorial on POSIX threads programming (with diagrams) to get you started. While this tutorial is pthread specific, many of the concepts transfer to other systems.
To understand more about when to use threads, it helps to have a basic understanding of parallel programming. Here's a link to a tutorial on the very basics of parallel computing intended for those who are just becoming acquainted with the subject.
The other replies covered the how part, I'll briefly mention when to use multithreading.
The main alternative to multithreading is using a timer. Consider for example that you need to update a little label on your form with the existence of a file. If the file exists, you need to draw a special icon or something. Now if you use a timer with a low timeout, you can achieve basically the same thing, a function that polls if the file exists very frequently and updates your ui. No extra hassle.
But your function is doing a lot of unnecessary work, isn't it. The OS provides a "hey this file has been created" primitive that puts your thread to sleep until your file is ready. Obviously you can't use this from the ui thread or your entire application would freeze, so instead you spawn a new thread and set it to wait on the file creation event.
Now your application is using as little cpu as possible because of the fact that threads can wait on events (be it with mutexes or events). Say your file is ready however. You can't update your ui from different threads because all hell would break loose if 2 threads try to change the same bit of memory at the same time. In fact this is so bad that windows flat out rejects your attempts to do it at all.
So now you need either a synchronization mechanism of sorts to communicate with the ui one after the other (serially) so you don't step on eachother's toes, but you can't code the main thread part because the ui loop is hidden deep inside windows.
The other alternative is to use another way to communicate between threads. In this case, you might use PostMessage to post a message to the main ui loop that the file has been found and to do its job.
Now if your work can't be waited upon and can't be split nicely into little bits (for use in a short-timeout timer), all you have left is another thread and all the synchronization issues that arise from it.
It might be worth it. Or it might bite you in the ass after days and days, potentially weeks, of debugging the odd race condition you missed. It might pay off to spend a long time first to try to split it up into little bits for use with a timer. Even if you can't, the few cases where you can will outweigh the time cost.
You should know that it's hard. Some people think it's impossibly hard, that there's no practical way to verify that a program is thread safe. Dr. Hipp, author of sqlite, states that thread are evil. This article covers the problems with threads in detail.
The Chrome browser uses processes instead of threads, and tools like Stackless Python avoid hardware-supported threads in favor of interpreter-supported "micro-threads". Even things like web servers, where you'd think threading would be a perfect fit, and moving towards event driven architectures.
I myself wouldn't say it's impossible: many people have tried and succeeded. But there's no doubt writting production quality multi-threaded code is really hard. Successful multi-threaded applications tend to use only a few, predetermined threads with just a few carefully analyzed points of communication. For example a game with just two threads, physics and rendering, or a GUI app with a UI thread and background thread, and nothing else. A program that's spawning and joining threads throughout the code base will certainly have many impossible-to-find intermittent bugs.
It's particularly hard in C++, for two reasons:
the current version of the standard doesn't mention threads at all. All threading libraries and platform and implementation specific.
The scope of what's considered an atomic operation is rather narrow compared to a language like Java.
cross-platform libraries like boost Threads mitigate this somewhat. The future C++0x will introduce some threading support. But boost also has good interprocess communication support you could use to avoid threads altogether.
If you know nothing else about threading than that it's hard and should be treated with respect, than you know more than 99% of programmers.
If after all that, you're still interested in starting down the long hard road towards being able to write a multi-threaded C++ program that won't segfault at random, then I recommend starting with Boost threads. They're well documented, high level, and work cross platform. The concepts (mutexes, locks, futures) are the same few key concepts present in all threading libraries.

Multithreaded job queue manager

I need to manage CPU-heavy multitaskable jobs in an interactive application. Just as background, my specific application is an engineering design interface. As a user tweaks different parameters and options to a model, multiple simulations are run in the background and results displayed as they complete, likely even as the user is still editing values. Since the multiple simulations take variable time (some are milliseconds, some take 5 seconds, some take 10 minutes), it's basically a matter of getting feedback displayed as fast as possible, but often aborting jobs that started previously but are now no longer needed because of the user's changes have already invalidated them. Different user changes may invalidate different computations so at any time I may have 10 different simulations running. Somesimulations have multiple parts which have dependencies (simulations A and B can be seperately computed, but I need their results to seed simulation C so I need to wait for both A and B to finish first before starting C.)
I feel pretty confident that the code-level method to handle this kind of application is some kind of multithreaded job queue. This would include features of submitting jobs for execution, setting task priorities, waiting for jobs to finish, specifying dependencies (do this job, but only after job X and job Y have finished), canceling subsets of jobs that fit some criteria, querying what jobs remain, setting worker thread counts and priorities, and so on. And multiplatform support is very useful too.
These are not new ideas or desires in software, but I'm at the early design phase of my application where I need to make a choice about what library to use for managing such tasks. I've written my own crude thread managers in the past in C (I think it's a rite of passage) but I want to use modern tools to base my work on, not my own previous hacks.
The first thought is to run to OpenMP but I'm not sure it's what I want. OpenMP is great for parallelizing at a fine level, automatically unrolling loops and such. While multiplatform, it also invades your code with #pragmas. But mostly it's not designed for managing large tasks.. especially cancelling pending jobs or specifying dependencies. Possible, yes, but it's not elegant.
I noticed that Google Chrome uses such a job manager for even the most trivial tasks. The design goal seems to be to keep the user interaction thread as light and nimble as possible, so anything that can get spawned off asynchronously, should be. From looking at the Chrome source this doesn't seem to be a generic library, but it still is interesting to see how the design uses asynchronous launches to keep interaction fast. This is getting to be similar to what I'm doing.
There are a still other options:
Surge.Act: a Boost-like library for defining jobs. It builds on OpenMP, but does allow chaining of dependencies which is nice. It doesn't seem to feel like it's got a manager that can be queried, jobs cancelled, etc. It's a stale project so it's scary to depend on it.
Job Queue is quite close to what I'm thinking of, but it's a 5 year old article, not a supported library.
Boost.threads does have nice platform independent synchronization but that's not a job manager. POCO has very clean designs for task launching, but again not a full manager for chaining tasks. (Maybe I'm underestimating POCO though).
So while there are options available, I'm not satisfied and I feel the urge to roll my own library again. But I'd rather use something that's already in existence. Even after searching (here on SO and on the net) I haven't found anything that feels right, though I imagine this must be a kind of tool that is often needed, so surely there's some community library or at least common design.
On SO there's been some posts about job queues, but nothing that seems to fit.
My post here is to ask you all what existing tools I've missed, and/or how you've rolled your own such multithreaded job queue.
We had to build our own job queue system to meet requirements similar to yours ( UI thread must always respond within 33ms, jobs can run from 15-15000ms ), because there really was nothing out there that quite met our needs, let alone was performant.
Unfortunately our code is about as proprietary as proprietary gets, but I can give you some of the most salient features:
We start up one thread per core at the beginning of the program. Each pulls work from a global job queue. Jobs consist of a function object and a glob of associated data (really an elaboration on a func_ptr and void *). Thread 0, the fast client loop, isn't allowed to work on jobs, but the rest grab as they can.
The job queue itself ought to be a lockless data structure, such as a lock-free singly linked list (Visual Studio comes with one). Avoid using a mutex; contention for the queue is surprisingly high, and grabbing mutexes is costly.
Pack up all the necessary data for the job into the job object itself -- avoid having pointer from the job back into the main heap, where you'll have to deal with contention between jobs and locks and all that other slow, annoying stuff. For example, all the simulation parameters should go into the job's local data blob. The results structure obviously needs to be something that outlives the job: you can deal with this either by a) hanging onto the job objects even after they've finished running (so you can use their contents from the main thread), or b) allocating a results structure specially for each job and stuffing a pointer into the job's data object. Even though the results themselves won't live in the job, this effectively gives the job exclusive access to its output memory so you needn't muss with locks.
Actually I'm simplifying a bit above, since we need to choreograph exactly which jobs run on which cores, so each core gets its own job queue, but that's probably unnecessary for you.
I rolled my own, based on Boost.threads. I was quite surprised by how much bang I got from writing so little code. If you don't find something pre-made, don't be afraid to roll your own. Between Boost.threads and your experience since writing your own, it might be easier than you remember.
For premade options, don't forget that Chromium is licensed very friendly, so you may be able to roll your own generic library around its code.
Microsoft is working on a set of technologies for the next Version of Visual Studio 2010 called the Concurrency Runtime, the Parallel Pattern Library and the Asynchronous Agents Library which will probably help. The Concurrency Runtime will offer policy based scheduling, i.e. allowing you to manage and compose multiple scheduler instances (similar to thread pools but with affinitization and load balancing between instances), the Parallel Pattern Library will offer task based programming and parallel loops with an STL like programming model. The Agents library offers an actor based programming model and has support for building concurrent data flow pipelines, i.e. managing those dependencies described above. Unfortunately this isn't released yet, so you can read about it on our team blog or watch some of the videos on channel9 there is also a very large CTP that is available for download as well.
If you're looking for a solution today, Intel's Thread Building Blocks and boost's threading library are both good libraries and available now. JustSoftwareSolutions has released an implementation of std::thread which matches the C++0x draft and of course OpenMP is widely available if you're looking at fine-grained loop based parallelism.
The real challenge as other folks have alluded to is to correctly identify and decompose work into tasks suitable for concurrent execution (i.e. no unprotected shared state), understand the dependencies between them and minimize the contention that can occur on bottlenecks (whether the bottleneck is protecting shared state or ensuring the dispatch loop of a work queue is low contention or lock-free)... and to do this without scheduling implementation details leaking into the rest of your code.
-Rick
Would something like threadpool be useful to you? It's based on boost::threads and basically implements a simple thread task queue that passes worker functions off to the pooled threads.
I've been looking for near the same requirements. I'm working on a game with 4x-ish mechanics and scheduling different parts of what gets done almost exploded my brain. I have a complex set of work that needs to get accomplished at different time resolutions, and to a different degree of actual simulation depending on what system/region the player has actively loaded. This means as the player moves from system to system, I need to load a system to the current high resolution simulation, offload the last system to a lower resolution simulation, and do the same for active/inactive regions of systems. The different simulations are big lists of population, political, military, and economic actions based on profiles of each entity. I'm going to try to describe my issue and my approach so far and I hope it's useful at describe an alternative for you or someone else. The rough outline of the structure I'm building will use the following:
cpp-taskflow (A Modern C++ Parallel Task Programming Library) I'm going to make a library of modules that will be used as job construction parts. Each entry will have an API for initializing and destruction as well as pointers for communication. I'm hoping to write it in a way that they will be nest-able using the cpp-taskflow API to set-up all the dependencies at job creation time, but provide a means of live adjustment and having a kill-switch available. Most of what I'm making will be decision trees of state machines, or state machines of behavior trees so the job data structure will be settings and states of time-resolution tagged data pointing to actual stats and object values.
FlatBuffers I'm looking to use this library to build a "job list entry" as well as an "object wrapper" system. Each entry in the job queues will be a flatbuffer object describing the work needed done(settings for the module), as well as containing the data(or shared pointers to the data) for the work that needs done. The object storage flatbuffers will contain the data that represents entity tables. For me, most of the actual data will me arrays that need deciding/working on. I'm also looking to use flatbuffers as a communication/control channel between threads. I'm torn on making a master "router" thread all the others communicate through, or each one containing their own, and having some mechanism of discovery.
SQLite Since only the active regions/systems need higher resolution work done, some of the background job lists the game will create(for thousands of systems and their entities) will be pretty large and long lived. 100's of thousands - millions of jobs(big in my mind), each requiring an unknown amount of time to complete. In my case, I don't care when they get done, as long as they all do(long campains). I plan on each thread getting a table of an in-memory sqlite db as a job queue. Each entry will contain a blob of flatbuffer work, a pointer to a buffer to notify upon completion, a pointer to a control buffer for updates, and other fields decorating the job item(location, data ranges, priority) that will get filled as the job entry makes new jobs, and as the items are consumed into the database. This give me a way I can create relational ties between jobs and simply construct queries if I need to re-work/update jobs, remove them and their dependencies, or update/re-order priorities or dependencies. All this being used in an sqlite db also means that at any time I can dump the whole thing to disk and reload it later, or switch to attaching to and processing it from disk. Additionally, this gives me access to a lot of search and ordering algorithmic work I'd normally need a bunch of different types of containers for. Being able to use SQL queries gives me a lot of options to process the jobs.
The communication queue(as a db) is what I'm torn as to whether I should make access via the corresponding thread(each thread contains it's own messaging db, and the module API has locks/mutex abstracted for access), or have all updates, adds/removes, and communication via some master router thread into one large db. I have no idea which will give me the least headaches as far as mutexing and locks. I got a few days into making a monster spaghetti beast of shared pointers to sbuffer pools and lookup tables, so each thread had it's own buffer in, and separate out buffers. That's when I decided to just offload the giant list keeping to sqlite. Then I thought, why not just feed the flatbuffer objects of everything else into tables.
Having almost everything in a db means from each module, I can write sql statements that represent the view of the data I need to work on as well as pivot on the fly as to how the data is worked on. Having the jobs themselves in a db means I can do the same for them as well. SQLite has multi-threading access, so using it as a Multithreaded job queue manager shouldn't be too much of a stretch.
In summary, Cpp-Taskflow will allow you to setup complicated nested loops with dependency chaining and job-pool multithreading. Out of the box it comes with most of the structure you need. FlatBuffers will allow you to create job declarations and object wrappers easy to feed into stream-buffers as one unit of work and pass them between job threads, and SQLite will allow you to tag and queue the stream-buffer jobs into blob entries in a way that should allow adding, searching, ordering, updating, and removal with minimal work on your end. It also makes saving and reloading a breeze. Snapshots and roll-backs should also be doable, you just have to keep your mind wrapped around the order and resolution of events for the db.
Edit: Take this with a grain of salt though, I found your question because I'm trying to accomplish what Crashworks described. I'm thinking of using affinity to open long living threads and have the master thread run the majority of the Cpp-Taskflow hierarchy work, feeding jobs to the others. I've yet to use the sqlite meothod of job-queue/control communication, that's just my plan so far.
I hope someone finds this helpful.
You might want to look at Flow-Based Programming - it is based on data chunks streaming between asynchronous components. There are Java and C# versions of the driver, plus a number of precoded components. It is intrinsically multithreaded - in fact the only single-threaded code is within the components, although you can add timing constraints to the standard scheduling rules. Although it may be at too fine-grained a level for what you need, there may be stuff here you can use.
Take a look at boost::future (but see also this discussion and proposal) which looks like a really nice foundation for parallelism (in particular it seems to offer excellent support for C-depends-on-A-and-B type situations).
I looked at OpenMP a bit but (like you) wasn't convinced it would work well for anything but Fortran/C numeric code. Intel's Threading Building Blocks looked more interesting to me.
If it comes to it, it's not too hard to roll your own on top of boost::thread.
[Explanation: a thread farm (most people would call it a pool) draws work from a thread-safe queue of functors (tasks or jobs). See the tests and benchmark for examples of use. I have some extra complication to (optionally) support tasks with priorities, and the case where executing tasks can spawn more tasks into the work queue (this makes knowing when all the work is actually completed a bit more problematic; the references to "pending" are the ones which can deal with the case). Might give you some ideas anyway.]
You may like to look at Intel Thread Building Blocks. I beleave it does what you want and with version 2 it's Open Source.
There's plenty of distributed resource managers out there. The software that meets nearly all of your requirements is Sun Grid Engine. SGE is used on some of the worlds largest supercomputers and is in active development.
There's also similar solutions in Torque, Platform LSF, and Condor.
It sounds like you may want to roll your own but there's plenty of functionality in all of the above.
I don't know if you're looking for a C++ library (which I think you are), but Doug Lea's Fork/Join framework for Java 7 is pretty nifty, and does exactly what you want. You'd probably be able to implement it in C++ or find a pre-implemented library.
More info here:
http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W1
A little late to the punch perhaps, but take a look also at ThreadWeaver:
http://en.wikipedia.org/wiki/ThreadWeaver