Suppose I have several data structures in C++ on Linux:
Data1, Data2, Data3, Data4, and many more.
Next, I read a network trace file (a Wireshark capture) and send each packet to all of the data structures above. If any one of them sets a flag for a packet, I want all the others to stop processing that packet and move on to the next packet in the trace file.
In my scenario, which is better to use:
Pthreads or Linux processes (fork...)?
Processes have individual address spaces, each with its own heap, stack, and code. Loading a process requires the OS to create and manage memory resources. Transferring data from one process to another requires OS support through inter-process communication mechanisms, such as shared memory or pipes on Linux. Also, accessing data protected by a shared semaphore can require a system call each time, which can reduce your speed considerably. Processes are protected from each other by the OS: if one process works correctly, it is hard for another to break it. Processes create a sandbox in which your code is secured from others.
Threads are more lightweight. Creating and deleting them takes less time and effort. They don't have separate address spaces (page tables), so sharing data between them is easy and doesn't require OS support. But threads are more vulnerable to the mistakes of other threads, and you still need concurrency tools such as semaphores or mutexes for shared data.
A small example: most browsers use threads to manage tabs, so when one tab fails, the whole application usually crashes. Chrome, however, runs each tab and extension as a separate process; if one crashes, the others keep running without major problems.
Go with threads if you are not sure. They will satisfy your needs stated in the question without problem.
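A minimal sketch of the threaded approach, using std::thread and std::atomic (C++11 wrappers that sit on top of pthreads on Linux). `Packet`, `Checker`, and `process_packet` are hypothetical names, and each checker stands in for one of Data1, Data2, and so on. A thread cannot be safely killed from outside, so each checker is assumed to poll the shared `stop` flag and return early once another checker has flagged the packet. Spawning threads per packet keeps the sketch short; a real trace processor would keep one long-lived thread per data structure.

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one packet parsed from the trace file.
struct Packet { int id; };

// A checker inspects a packet and returns true to flag it. Long-running
// checkers should poll `stop` and bail out early once it becomes true.
using Checker = std::function<bool(const Packet&, const std::atomic<bool>& stop)>;

// Runs every checker on one packet in its own thread.
// Returns true if any checker flagged the packet.
bool process_packet(const Packet& p, const std::vector<Checker>& checkers) {
    std::atomic<bool> stop{false};
    std::vector<std::thread> workers;
    for (const auto& check : checkers) {
        workers.emplace_back([&p, &check, &stop] {
            // Skip work entirely if another checker already flagged the packet.
            if (!stop.load() && check(p, stop)) stop.store(true);
        });
    }
    for (auto& t : workers) t.join();  // all done before moving to the next packet
    return stop.load();
}
```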
I'm working on a game server, written in C++, and I'm trying to decide how many threads to use and which tasks to put on threads. The basic server skeleton consists of keyboard I/O and output to a console, accepting incoming connections, making outgoing connections, and doing the game "stuff".
What I'd like to know is which tasks should get a separate thread. Should each connection have its own thread? I know this varies from project to project, but I would like the server to support a fairly large number of players (somewhere in the hundreds if possible).
The standard answer should always be: Try it the simplest way first, and only look for ways to improve performance if the simple way isn't good enough. However, re-architecting a large C++ program can be a painful experience, so some guesses about performance in advance may be appropriate.
Theoretically, hundreds of threads are probably OK on modern machines. The NPTL implementation for Linux was tested with tens of thousands of threads, as I recall. If that's the easiest way for you to implement, it may be the right answer.
However, high-performance web servers and similar typically use event-driven models instead. Consider a library like libevent. I'm sure there are C++ libraries for the same purpose.
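libevent wraps OS readiness APIs such as epoll. The sketch below does not use libevent; it calls POSIX poll() directly to show the core of the event-driven model: one thread waits until any of many descriptors is readable and then handles only those. `ready_for_read` is a hypothetical helper name.

```cpp
#include <poll.h>
#include <unistd.h>  // read()/write()/pipe() used by callers and handlers
#include <vector>

// Blocks for up to timeout_ms until at least one descriptor is readable (or
// has hung up) and returns those descriptors. An event-driven server calls
// something like this in a loop and dispatches each ready socket to a handler,
// so a single thread can serve hundreds of connections.
std::vector<int> ready_for_read(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) pfds.push_back(pollfd{fd, POLLIN, 0});
    std::vector<int> ready;
    int n = ::poll(pfds.data(), pfds.size(), timeout_ms);
    if (n <= 0) return ready;  // 0 = timeout, -1 = error (check errno)
    for (const pollfd& p : pfds)
        if (p.revents & (POLLIN | POLLHUP)) ready.push_back(p.fd);
    return ready;
}
```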
I personally believe that languages without first-class continuations, or at least coroutines, are poor choices for this kind of work, but the C language family is how we get work done today, so off we go. :-)
A good solution could be to use a thread pool.
The idea is to let the main thread distribute all connections evenly across a fixed number of threads.
With a good design, you can easily set the number of threads at runtime.
For CPU-bound work, creating more threads than you have CPU cores is not productive, and adding too many threads decreases performance because of the time spent switching between them.
For example, when compiling a large project (not exactly the same situation, but the reasoning applies to both), it is often recommended to use no more threads than the number of CPU cores + 1.
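A minimal sketch of the fixed-size dispatch described above, assuming connections are identified by integer ids (a hypothetical simplification of real socket handles). The worker count is chosen at runtime from std::thread::hardware_concurrency(), which may return 0 when the value is unknown. `worker_count` and `distribute` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Pick the number of worker threads at runtime: one per hardware thread.
// hardware_concurrency() may return 0 if unknown, so fall back to 1.
unsigned worker_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}

// Deal connection ids out round-robin so each worker gets an (almost) equal
// share; worker i would then serve only the connections in buckets[i].
std::vector<std::vector<int>> distribute(const std::vector<int>& connections,
                                         unsigned workers) {
    std::vector<std::vector<int>> buckets(std::max(1u, workers));
    for (std::size_t i = 0; i < connections.size(); ++i)
        buckets[i % buckets.size()].push_back(connections[i]);
    return buckets;
}
```

Round-robin is the simplest "equitable" policy; a server whose connections vary a lot in activity might instead assign each new connection to the least-loaded worker.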
A very common technique is to have the game server use one thread to monitor several connections (i.e. sockets) by calling select() over all of them. When data is available, grab it and enqueue it in a producer/consumer model for the game engine to pick up.
This is by no means the be-all-end-all implementation, but it should be enough to get you started. Sounds like a cool project. Good luck!
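The producer/consumer hand-off can be sketched as a small mutex-protected inbox: the thread watching the sockets pushes messages, and the game engine drains everything that has arrived once per tick. `NetMessage` and `Inbox` are hypothetical names. Draining by swapping out the whole buffer is one common design choice, not the only one.

```cpp
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message produced by the socket-watching thread.
struct NetMessage {
    int connection;
    std::string payload;
};

// Producer (network thread) calls push(); consumer (game loop) calls drain()
// once per tick. The lock is held only long enough to append or swap.
class Inbox {
public:
    void push(NetMessage m) {
        std::lock_guard<std::mutex> lk(m_);
        pending_.push_back(std::move(m));
    }
    // Takes every message received since the last call, in arrival order.
    std::vector<NetMessage> drain() {
        std::vector<NetMessage> out;
        std::lock_guard<std::mutex> lk(m_);
        out.swap(pending_);
        return out;
    }
private:
    std::mutex m_;
    std::vector<NetMessage> pending_;
};
```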
If you set up the connections and use them so that the thread blocks waiting on I/O from any of them, you should be able to service all of the connections and the keyboard on one thread. You may not want to put the console output on that same thread, though: I've seen cases (on Windows at least) where the speed of writing to the console is actually a bottleneck (i.e. if the console window is minimized, the process runs considerably faster).
If the work of your game engine parallelizes well, you probably want to use as many threads as there are CPUs, minus one (for the OS and the other two threads). If you expect the client to run on the same machine, the server should detect that and scale back the number of threads it uses.
I'm working on an application (C++ combined with Qt for the graphical part) that will run on an embedded Linux platform. I need to know how to divide the application into different "cores", each taking care of a different part of the application, in a way that improves the stability, efficiency, and security of the application itself.
My question is: is it better to divide functionality into threads or to fork different processes?
Let me give a functional view of the application: there are several user interfaces, each allowing users to do more or less the same things (never mind data consistency; I've already solved that problem). Each of these interfaces must act as a stand-alone unit (like different terminals of the same system). I want all of them to send messages to and receive messages from the same "core", which will take care of updating application data or doing other appropriate work.
What's the best way to implement the division between the inner "core" and a user interface?
I'm surely missing some knowledge, but so far I have come up with two alternatives:
1 - fork a child from the parent "core" and let the child execute a specific UI program. I have no practical experience doing this, so in this case, how can I make the parent and child communicate, bearing in mind that the child is a new process?
2 - create separate threads for the core and for each UI.
I need this division because the application is required to be as stable as possible and capable of restarting a UI in case of a crash. Keep in mind also that the overall application won't have unlimited memory and resources available.
Thanks in advance for your help, regards.
There are several reasons why going down the separate-process route might be a good choice in an embedded system:
Decoupling of components: running components as separate processes is the ultimate decoupling. This is often useful when projects become very large.
Security and privilege management: in an embedded system it is quite likely that some components need elevated privileges in order to control devices, whereas others are potential security hazards (for instance, network-facing components) that you want to run with as little privilege as possible. Other likely scenarios are components that need real-time threading or the ability to mmap() a lot of system memory. Overallocating either will lock up your system in a way it won't recover from.
Reliability: you can potentially respawn parts of the system if they fail, leaving the remainder running.
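The respawn idea can be sketched with plain fork()/execv()/waitpid(), independent of D-Bus: a supervisor launches a UI program and relaunches it whenever it dies abnormally. `supervise` and its restart limit are hypothetical; a production supervisor would also back off between restarts and handle signals sent to itself.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Launches `path` with `argv` as a child process and relaunches it each time
// it terminates abnormally (crash or non-zero exit), allowing at most
// `max_restarts` relaunches. A clean exit (status 0) stops supervision.
// Returns the total number of launches performed.
int supervise(const char* path, char* const argv[], int max_restarts) {
    int launches = 0;
    for (;;) {
        pid_t pid = fork();
        if (pid < 0) return launches;        // fork failed
        if (pid == 0) {                      // child: become the UI program
            execv(path, argv);
            _exit(127);                      // only reached if execv failed
        }
        ++launches;
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return launches;
        bool clean = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (clean || launches > max_restarts) return launches;
    }
}
```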
Building such an arrangement is actually easier than others here suggest: Qt has really good support for D-Bus, which nicely takes care of your IPC and is used extensively in the Linux desktop for system management functionality.
As for the scenario you describe, you might want to daemonise the "core" of the application and then allow client connections from UI components over D-Bus.
Running the UI in a different thread won't give you much in the way of additional stability -- the other thread can trash the engine's heap, and even if you terminate the thread, any resources it holds won't be reclaimed.
You can improve stability a bit by having a really strong wall of abstraction between the Engine and the UI. So this isn't completely futile.
Multiple processes require lots of hoops to jump through -- you need a method of IPC (interprocess communication).
Note that IPC and to a lesser extent walls of abstraction can add to the overhead of your program.
An important question to ask is "how much data has to pass between the UI and the Engine?" -- if it is little enough data (like "start the task" from UI to engine, and "this task is 50% done" from engine to UI), IPC is less of a hassle. If you are writing an interactive painting application with real-time full-screen updates of an image, IPC is more annoying and less practical.
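For the "how do parent and child communicate" part of the question, here is a minimal sketch using a socketpair() created before fork(), which gives each process one end of a bidirectional channel. `round_trip` is a hypothetical helper. The child here only echoes the request with an "ack:" prefix instead of running a real engine loop, and it assumes a single read() receives the whole short message, which a real protocol should not rely on.

```cpp
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sends `request` to a forked child over a Unix-domain socketpair and returns
// the child's reply ("ack:" + request). Returns "" if setup fails.
std::string round_trip(const std::string& request) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return "";
    pid_t pid = fork();
    if (pid < 0) { close(sv[0]); close(sv[1]); return ""; }
    if (pid == 0) {                               // child: plays the "engine"
        close(sv[0]);
        char buf[256];
        ssize_t n = read(sv[1], buf, sizeof buf);  // assumes one short message
        if (n > 0 && write(sv[1], "ack:", 4) == 4) {
            ssize_t w = write(sv[1], buf, static_cast<size_t>(n));
            (void)w;
        }
        _exit(0);                                 // exiting closes the child's end
    }
    close(sv[1]);                                 // parent: plays the "UI"
    std::string reply;
    if (write(sv[0], request.data(), request.size()) ==
        static_cast<ssize_t>(request.size())) {
        char buf[256];
        ssize_t n;
        while ((n = read(sv[0], buf, sizeof buf)) > 0)  // until the child closes
            reply.append(buf, static_cast<size_t>(n));
    }
    close(sv[0]);
    waitpid(pid, nullptr, 0);                     // reap the child
    return reply;
}
```

For UI-to-engine traffic as small as "start the task" and "50% done", a channel like this is cheap. A real split would replace the echo with a loop that exchanges framed messages.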
Now, a quick Google search on Qt and IPC tells me that there is a Qt extension for embedded Linux that allows Qt signals and slots to pass messages between processes: the Qt COmmunications Protocol (QCOP). One issue I have had with frameworks like this is that they can easily lead to entanglements between client and server state that can compromise stability on the other end of the communications pipe, compared to relatively simple protocols.
I have a small architecture question about organizing code into separate functional units (most probably threads?). The application being developed is supposed to do the following tasks:
Display some images on a screen (i.e. slideshow)
Read the data from external device through the USB port
Match received data against the corresponding image (stimulus)
Do some data analysis
Plot the results of data analysis
My thoughts were to organize the application into the following modules:
GUI thread (+ image slideshow)
USB thread buffering the received data
Thread for analyzing/plotting data (main GUI thread should not be blocked while plotting the data which might consume some more time)
So, what do you generally think about this concept? Is there anything else you think that might be a better fit in this particular scenario?
You can probably get away with combining 1 & 2, since the slide-show feature is essentially gui oriented anyway.
For #3, you may be able to make do with some kind of asynchronous I/O methodology, so that you don't need to dedicate a polling thread. Not sure if you can do this with USB, but you can certainly get async I/O with serial and network interfaces, so it's worth looking into.
It's probably a good idea to move heavyweight tasks like 4 & 5 to their own thread. If you aren't doing the analysis and plotting concurrently, maybe use one thread for both. However, you should really consider how much CPU time these activities will need. If the worst-case analyze-and-plot takes much less than half a second, you might even just perform these actions with a call from the GUI. Conversely, if there are cases where it will take longer than that, a separate thread is preferable because your users won't like a laggy GUI.
Just bear in mind that the dark side of threads lies in the inevitable challenge of coordinating them.
Because of the way the Windows API works, especially with regard to user input and window ownership, you can really only do UI on a single thread. If you try to use multiple threads, they just end up locking each other out, and only one thread runs at a time. There are some specialized exceptions, but you have to be a real master of the API to pull them off.
So.
GUI thread: owns the window and handles all user input.
USB listening thread: you would know better than I whether this makes sense.
Thread(s) for analyzing/plotting data: again, I can't speak to this, but I'm skeptical that they will really both be running at the same time. It seems more likely that it would be analyze, then plot, so one thread.
Thread for rendering frames for a slideshow.
I'm not sure how plotting isn't the same thing as the slideshow, but I do think you can have a background thread for drawing the slideshow as long as it doesn't display the images.
You can render (i.e. draw to a bitmap or DirectX surface) in a background thread, you just can't show it in a window. But you could hand completed bitmaps off to the GUI thread and have it do the actual displaying of the bitmap. This is essentially how a lot of video playback code works.
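The hand-off can be sketched in portable C++ (no Windows API) with std::async and std::future: a background task renders into memory, and the GUI thread checks, without blocking, whether a finished bitmap is ready to display. `Bitmap`, `render_slide`, and `try_take` are hypothetical names. The step that actually displays the bitmap, for example in the GUI's paint handler, is left out.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical off-screen bitmap: width * height 32-bit pixels.
struct Bitmap {
    int width = 0, height = 0;
    std::vector<std::uint32_t> pixels;
};

// Runs on a background thread: draws into memory only, never into a window.
Bitmap render_slide(int width, int height, std::uint32_t color) {
    Bitmap b;
    b.width = width;
    b.height = height;
    b.pixels.assign(static_cast<std::size_t>(width) * height, color);  // "drawing"
    return b;
}

// Called by the GUI thread on each pass of its loop. Returns true and fills
// `out` once the render has finished; never blocks. get() invalidates the
// future, so a given render is handed over exactly once.
bool try_take(std::future<Bitmap>& pending, Bitmap& out) {
    if (!pending.valid() ||
        pending.wait_for(std::chrono::seconds(0)) != std::future_status::ready)
        return false;
    out = pending.get();
    return true;
}
```

Usage: the GUI thread starts a render with `auto pending = std::async(std::launch::async, render_slide, 800, 600, 0xFF000000u);` and calls `try_take(pending, bmp)` from its message loop or timer, displaying `bmp` once the call returns true.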
A lot of this depends on how much is involved in performing 3 (Do some data analysis.) and 4 (Plot analyzed data.)
My instincts would be:
Definitely have a separate thread for reading the data off the USB. Assuming for a moment that 3 depends on reading the data, I would do 3 in the same thread that reads the data. This will simplify signaling the GUI when the data is ready. It also assumes the processing is quick and won't block the USB port (how is that being read? I/O completion ports?). If the processing takes time, you need a separate thread.
Likewise, if the image slideshow processing takes a long time, it should be done in a separate thread. If it can be recalculated quickly, say, in a paint function, I would keep it as part of the main GUI.
There is some overhead in context switching between threads, and each added thread adds signaling complexity. So I would only add a thread to keep the GUI and the USB port from blocking. It may be possible to do all of this in just two threads.
4 and 5 are definitely good ideas. That being said, avoid using low-level threads unless you absolutely must.
I'd check out Boost, and Boost.Thread in particular. Not only does it make your code more portable, but I haven't worked with an easier library for threading.
If you are using C++Builder 2009, you should look at TThread. It has some features that simplify thread coding.
I can't help thinking that you may be going a bit overboard here. A USB port can't really deliver data terribly quickly -- its theoretical bandwidth is only 480 Mbits/second, and realistically, it's a pretty rare USB device that can get very close to that.
Unless the analysis you've mentioned is quite a bit more complex than you've implied, my guess is that a single thread is probably entirely adequate. I'd think hard about using overlapped I/O to read the data, and MsgWaitForMultipleObjects for the main message loop.
It seems to me that the main place you stand a good chance of gaining a lot is in plotting the data after it's processed. It might be worth considering something like OpenGL or DirectX Graphics to do the drawing. Especially if you're producing quite a bit of output, this can give a really substantial speed improvement. In an ideal situation, multiple threads might multiply your speed by the number of available cores -- typically 2 or 4 on today's machines. Drawing the output is likely to be the slowest part of the job, and hardware acceleration can easily speed that up by a considerably larger factor -- 10x is at the low end of what you can typically expect, and 100x is fairly common.
I need to manage CPU-heavy, parallelizable jobs in an interactive application. As background, my specific application is an engineering design interface. As a user tweaks different parameters and options of a model, multiple simulations run in the background and their results are displayed as they complete, likely even while the user is still editing values. Since the simulations take variable time (some take milliseconds, some 5 seconds, some 10 minutes), it's basically a matter of getting feedback displayed as fast as possible, but often aborting jobs that started earlier and are no longer needed because the user's changes have already invalidated them. Different user changes may invalidate different computations, so at any time I may have 10 different simulations running. Some simulations have multiple parts with dependencies (simulations A and B can be computed separately, but I need their results to seed simulation C, so I need to wait for both A and B to finish before starting C).
I feel pretty confident that the code-level method to handle this kind of application is some kind of multithreaded job queue. This would include features of submitting jobs for execution, setting task priorities, waiting for jobs to finish, specifying dependencies (do this job, but only after job X and job Y have finished), canceling subsets of jobs that fit some criteria, querying what jobs remain, setting worker thread counts and priorities, and so on. And multiplatform support is very useful too.
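One common way to get the "abort jobs invalidated by the user's edits" behaviour is cooperative cancellation with a generation counter: each edit bumps the counter, each job remembers the generation it was started for, and a job stops at its next check once the two differ. This is a sketch under that assumption. `Generation`, `JobToken`, and `run_until_cancelled` are hypothetical names, and cancelling only a subset of jobs would need one counter per independently editable input.

```cpp
#include <atomic>
#include <cstdint>

// Bumped every time the user changes an input the jobs depend on.
class Generation {
public:
    std::uint64_t current() const { return gen_.load(std::memory_order_acquire); }
    void invalidate() { gen_.fetch_add(1, std::memory_order_acq_rel); }
private:
    std::atomic<std::uint64_t> gen_{0};
};

// Captured when a job starts; the job is stale once the generation moves on.
struct JobToken {
    const Generation* source;
    std::uint64_t started_at;
    bool still_wanted() const { return source->current() == started_at; }
};

// Placeholder long-running job: performs up to `steps` steps, checking the
// token before each one, and returns how many steps actually ran.
template <class Step>
int run_until_cancelled(const JobToken& token, int steps, Step step) {
    int done = 0;
    while (done < steps && token.still_wanted()) {
        step(done);
        ++done;
    }
    return done;
}
```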
These are not new ideas or desires in software, but I'm at the early design phase of my application where I need to make a choice about what library to use for managing such tasks. I've written my own crude thread managers in the past in C (I think it's a rite of passage) but I want to use modern tools to base my work on, not my own previous hacks.
My first thought is to turn to OpenMP, but I'm not sure it's what I want. OpenMP is great for fine-grained parallelism, such as automatically splitting loops across threads. While it is multiplatform, it also invades your code with #pragmas. But mostly it's not designed for managing large tasks, especially cancelling pending jobs or specifying dependencies. Possible, yes, but not elegant.
I noticed that Google Chrome uses such a job manager for even the most trivial tasks. The design goal seems to be to keep the user interaction thread as light and nimble as possible, so anything that can get spawned off asynchronously, should be. From looking at the Chrome source this doesn't seem to be a generic library, but it still is interesting to see how the design uses asynchronous launches to keep interaction fast. This is getting to be similar to what I'm doing.
There are still other options:
Surge.Act: a Boost-like library for defining jobs. It builds on OpenMP but does allow chaining of dependencies, which is nice. It doesn't seem to have a manager that can be queried, jobs cancelled, etc. It's a stale project, so it's scary to depend on it.
Job Queue is quite close to what I'm thinking of, but it's a 5 year old article, not a supported library.
Boost.threads does have nice platform independent synchronization but that's not a job manager. POCO has very clean designs for task launching, but again not a full manager for chaining tasks. (Maybe I'm underestimating POCO though).
So while there are options available, I'm not satisfied and I feel the urge to roll my own library again. But I'd rather use something that's already in existence. Even after searching (here on SO and on the net) I haven't found anything that feels right, though I imagine this must be a kind of tool that is often needed, so surely there's some community library or at least common design.
On SO there's been some posts about job queues, but nothing that seems to fit.
My post here is to ask you all what existing tools I've missed, and/or how you've rolled your own such multithreaded job queue.
We had to build our own job queue system to meet requirements similar to yours ( UI thread must always respond within 33ms, jobs can run from 15-15000ms ), because there really was nothing out there that quite met our needs, let alone was performant.
Unfortunately our code is about as proprietary as proprietary gets, but I can give you some of the most salient features:
We start up one thread per core at the beginning of the program. Each pulls work from a global job queue. Jobs consist of a function object and a glob of associated data (really an elaboration on a func_ptr and void *). Thread 0, the fast client loop, isn't allowed to work on jobs, but the rest grab as they can.
The job queue itself ought to be a lockless data structure, such as a lock-free singly linked list (Visual Studio comes with one). Avoid using a mutex; contention for the queue is surprisingly high, and grabbing mutexes is costly.
Pack up all the necessary data for the job into the job object itself -- avoid having pointers from the job back into the main heap, where you'll have to deal with contention between jobs, locks, and all that other slow, annoying stuff. For example, all the simulation parameters should go into the job's local data blob. The results structure obviously needs to be something that outlives the job: you can deal with this either by a) hanging onto the job objects even after they've finished running (so you can use their contents from the main thread), or b) allocating a results structure specially for each job and stuffing a pointer into the job's data object. Even though the results themselves won't live in the job, this effectively gives the job exclusive access to its output memory, so you needn't muss with locks.
Actually I'm simplifying a bit above, since we need to choreograph exactly which jobs run on which cores, so each core gets its own job queue, but that's probably unnecessary for you.
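A rough sketch of two of those points, assuming a batch of jobs known up front: each `Job` carries its inputs by value and writes only to its own result slot, and worker threads claim jobs with a single atomic fetch_add instead of a mutex. This is a simpler lock-free dispatch than the lock-free list the answer mentions. Unlike the design described, it also starts threads per batch rather than keeping one alive per core. `Job` and `run_batch` are hypothetical names. Following the answer, a caller would pass roughly one worker per core, minus one for the UI loop.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical job: inputs packed by value, output written to a slot that
// only this job touches and that outlives the job, so no locks are needed.
struct Job {
    double param;
    double* result;
    void run() const { *result = param * param; }  // placeholder "simulation"
};

// Workers repeatedly claim the next unclaimed job index with fetch_add, a
// lock-free way to share one pre-filled batch between threads.
void run_batch(const std::vector<Job>& jobs, unsigned workers) {
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (std::size_t i = next.fetch_add(1); i < jobs.size(); i = next.fetch_add(1))
            jobs[i].run();
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < std::max(1u, workers); ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```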
I rolled my own, based on Boost.threads. I was quite surprised by how much bang I got from writing so little code. If you don't find something pre-made, don't be afraid to roll your own. Between Boost.threads and your experience since writing your own, it might be easier than you remember.
For premade options, don't forget that Chromium has a very permissive license, so you may be able to build your own generic library around its code.
Microsoft is working on a set of technologies for the next version of Visual Studio, Visual Studio 2010, called the Concurrency Runtime, the Parallel Pattern Library, and the Asynchronous Agents Library, which will probably help. The Concurrency Runtime will offer policy-based scheduling, i.e. allowing you to manage and compose multiple scheduler instances (similar to thread pools but with affinitization and load balancing between instances); the Parallel Pattern Library will offer task-based programming and parallel loops with an STL-like programming model. The Agents Library offers an actor-based programming model and supports building concurrent data flow pipelines, i.e. managing the dependencies described above. Unfortunately, this isn't released yet, but you can read about it on our team blog or watch some of the videos on Channel 9; there is also a very large CTP available for download.
If you're looking for a solution today, Intel's Thread Building Blocks and boost's threading library are both good libraries and available now. JustSoftwareSolutions has released an implementation of std::thread which matches the C++0x draft and of course OpenMP is widely available if you're looking at fine-grained loop based parallelism.
The real challenge as other folks have alluded to is to correctly identify and decompose work into tasks suitable for concurrent execution (i.e. no unprotected shared state), understand the dependencies between them and minimize the contention that can occur on bottlenecks (whether the bottleneck is protecting shared state or ensuring the dispatch loop of a work queue is low contention or lock-free)... and to do this without scheduling implementation details leaking into the rest of your code.
-Rick
Would something like threadpool be useful to you? It's based on boost::threads and basically implements a simple thread task queue that passes worker functions off to the pooled threads.
I've been looking for something with nearly the same requirements. I'm working on a game with 4X-ish mechanics, and scheduling the different parts of what gets done almost exploded my brain. I have a complex set of work that needs to be accomplished at different time resolutions, and with different degrees of actual simulation depending on which system/region the player has actively loaded. This means that as the player moves from system to system, I need to load a system into the current high-resolution simulation, offload the last system to a lower-resolution simulation, and do the same for active/inactive regions of systems. The different simulations are big lists of population, political, military, and economic actions based on profiles of each entity. I'm going to try to describe my issue and my approach so far, and I hope it's useful in describing an alternative for you or someone else. The rough outline of the structure I'm building will use the following:
cpp-taskflow (A Modern C++ Parallel Task Programming Library): I'm going to make a library of modules that will be used as job construction parts. Each entry will have an API for initialization and destruction, as well as pointers for communication. I'm hoping to write it in a way that makes them nestable, using the cpp-taskflow API to set up all the dependencies at job creation time, while providing a means of live adjustment and a kill switch. Most of what I'm making will be decision trees of state machines, or state machines of behavior trees, so the job data structure will be settings and states of time-resolution-tagged data pointing to actual stats and object values.
FlatBuffers: I'm looking to use this library to build a "job list entry" as well as an "object wrapper" system. Each entry in the job queues will be a flatbuffer object describing the work that needs to be done (settings for the module), as well as containing the data (or shared pointers to the data) for that work. The object storage flatbuffers will contain the data that represents entity tables. For me, most of the actual data will be arrays that need deciding/working on. I'm also looking to use flatbuffers as a communication/control channel between threads. I'm torn between making a master "router" thread that all the others communicate through, or having each one contain its own channel with some mechanism of discovery.
SQLite: Since only the active regions/systems need higher-resolution work done, some of the background job lists the game creates (for thousands of systems and their entities) will be pretty large and long-lived: hundreds of thousands to millions of jobs (big in my mind), each requiring an unknown amount of time to complete. In my case, I don't care when they get done, as long as they all do (long campaigns). I plan on giving each thread a table of an in-memory SQLite db as a job queue. Each entry will contain a blob of flatbuffer work, a pointer to a buffer to notify upon completion, a pointer to a control buffer for updates, and other fields decorating the job item (location, data ranges, priority) that will get filled as the job entry makes new jobs and as the items are consumed into the database. This gives me a way to create relational ties between jobs and simply construct queries if I need to rework/update jobs, remove them and their dependencies, or update/re-order priorities or dependencies. Having all this in an SQLite db also means that at any time I can dump the whole thing to disk and reload it later, or switch to attaching to and processing it from disk. Additionally, this gives me access to a lot of search and ordering algorithms I'd normally need a bunch of different types of containers for. Being able to use SQL queries gives me a lot of options for processing the jobs.
The communication queue (as a db) is where I'm torn: should access go through the corresponding thread (each thread contains its own messaging db, and the module API abstracts away the locks/mutexes), or should all updates, adds/removes, and communication go via some master router thread into one large db? I have no idea which will give me the fewest headaches as far as mutexes and locks go. I got a few days into making a monster spaghetti beast of shared pointers to buffer pools and lookup tables, so that each thread had its own input buffer and separate output buffers. That's when I decided to just offload the giant list-keeping to SQLite. Then I thought, why not just feed the flatbuffer objects of everything else into tables too?
Having almost everything in a db means that from each module I can write SQL statements representing the view of the data I need to work on, as well as pivot on the fly in how the data is worked on. Having the jobs themselves in a db means I can do the same for them. SQLite supports multi-threaded access, so using it as a multithreaded job queue manager shouldn't be too much of a stretch.
In summary, Cpp-Taskflow will allow you to set up complicated nested loops with dependency chaining and job-pool multithreading. Out of the box, it comes with most of the structure you need. FlatBuffers will allow you to create job declarations and object wrappers that are easy to feed into stream buffers as one unit of work and pass between job threads, and SQLite will allow you to tag and queue the stream-buffer jobs into blob entries in a way that should allow adding, searching, ordering, updating, and removal with minimal work on your end. It also makes saving and reloading a breeze. Snapshots and rollbacks should also be doable; you just have to keep your mind wrapped around the order and resolution of events for the db.
Edit: Take this with a grain of salt, though: I found your question because I'm trying to accomplish what Crashworks described. I'm thinking of using affinity to open long-lived threads and have the master thread run the majority of the Cpp-Taskflow hierarchy work, feeding jobs to the others. I've yet to use the SQLite method of job-queue/control communication; that's just my plan so far.
I hope someone finds this helpful.
You might want to look at Flow-Based Programming - it is based on data chunks streaming between asynchronous components. There are Java and C# versions of the driver, plus a number of precoded components. It is intrinsically multithreaded - in fact the only single-threaded code is within the components, although you can add timing constraints to the standard scheduling rules. Although it may be at too fine-grained a level for what you need, there may be stuff here you can use.
Take a look at boost::future (but see also this discussion and proposal) which looks like a really nice foundation for parallelism (in particular it seems to offer excellent support for C-depends-on-A-and-B type situations).
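The C-depends-on-A-and-B case can be sketched with the standard std::async and std::shared_future (C++11), used here instead of boost::future. `simulate_a`, `simulate_b`, `simulate_c`, and `schedule_abc` are hypothetical placeholders. C's task simply blocks on A and B, which ties up a thread while it waits; continuation support such as boost::future's then() avoids that.

```cpp
#include <future>

// Hypothetical simulations: A and B are independent; C needs both results.
double simulate_a() { return 2.0; }
double simulate_b() { return 3.0; }
double simulate_c(double a, double b) { return a + b; }

// Starts A and B concurrently and returns a future for C. C runs on its own
// thread and waits in get() for A and B, so the caller (e.g. the UI thread)
// is never blocked and can poll or wait on the returned future as it likes.
std::future<double> schedule_abc() {
    std::shared_future<double> a = std::async(std::launch::async, simulate_a).share();
    std::shared_future<double> b = std::async(std::launch::async, simulate_b).share();
    return std::async(std::launch::async, [a, b] { return simulate_c(a.get(), b.get()); });
}
```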
I looked at OpenMP a bit but (like you) wasn't convinced it would work well for anything but Fortran/C numeric code. Intel's Threading Building Blocks looked more interesting to me.
If it comes to it, it's not too hard to roll your own on top of boost::thread.
[Explanation: a thread farm (most people would call it a pool) draws work from a thread-safe queue of functors (tasks or jobs). See the tests and benchmark for examples of use. I have some extra complication to (optionally) support tasks with priorities, and the case where executing tasks can spawn more tasks into the work queue (this makes knowing when all the work is actually completed a bit more problematic; the references to "pending" are the ones which can deal with the case). Might give you some ideas anyway.]
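A compact sketch of such a farm, written against the standard <thread> facilities rather than boost::thread (the two are very close). It supports optional priorities and tasks that submit more tasks. The `pending_` counter covers both queued and running tasks, so `wait_idle()` cannot return while a running task might still add work. `Farm` and its members are hypothetical names, not the code the answer refers to.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class Farm {
public:
    explicit Farm(unsigned n) {
        for (unsigned i = 0; i < n; ++i) workers_.emplace_back([this] { loop(); });
    }
    ~Farm() {  // finishes everything still queued, then joins the workers
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    // Higher priority runs first; safe to call from inside a running task.
    void submit(std::function<void()> f, int priority = 0) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(Item{priority, seq_++, std::move(f)});
            ++pending_;  // counted before the submitting task can finish
        }
        cv_.notify_one();
    }
    // Blocks until no task is queued or running.
    void wait_idle() {
        std::unique_lock<std::mutex> lk(m_);
        idle_.wait(lk, [this] { return pending_ == 0; });
    }
private:
    struct Item {
        int priority;
        unsigned long seq;
        std::function<void()> f;
        // priority_queue pops the "largest": higher priority, then FIFO order.
        bool operator<(const Item& o) const {
            return priority != o.priority ? priority < o.priority : seq > o.seq;
        }
    };
    void loop() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
                if (q_.empty()) return;  // stopping and nothing left to do
                f = q_.top().f;
                q_.pop();
            }
            f();  // run without holding the lock; may call submit()
            std::lock_guard<std::mutex> lk(m_);
            if (--pending_ == 0) idle_.notify_all();
        }
    }
    std::mutex m_;
    std::condition_variable cv_, idle_;
    std::priority_queue<Item> q_;
    std::vector<std::thread> workers_;
    unsigned long seq_ = 0, pending_ = 0;
    bool stop_ = false;
};
```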
You may like to look at Intel Thread Building Blocks. I believe it does what you want, and as of version 2 it's open source.
There are plenty of distributed resource managers out there. The software that meets nearly all of your requirements is Sun Grid Engine. SGE is used on some of the world's largest supercomputers and is in active development.
There's also similar solutions in Torque, Platform LSF, and Condor.
It sounds like you may want to roll your own but there's plenty of functionality in all of the above.
I don't know if you're looking for a C++ library (which I think you are), but Doug Lea's Fork/Join framework for Java 7 is pretty nifty, and does exactly what you want. You'd probably be able to implement it in C++ or find a pre-implemented library.
More info here:
http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W1
A little late to the punch perhaps, but take a look also at ThreadWeaver:
http://en.wikipedia.org/wiki/ThreadWeaver