It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 11 years ago.
Preemptive multitasking in C/C++: can a running thread be interrupted by some timer and switch between tasks?
Many VMs and other language runtimes using green-threading and such are implemented in these terms; can C/C++ apps do the same?
If so, how?
This is going to be platform dependent, so please discuss this in terms of the support particular platforms have for this; e.g. if there's some magic you can do in a SIGALRM handler on Linux to swap some kind of internal stack (perhaps using longjmp?), that'd be great!
I ask because I am curious.
I have been working for several years making async IO loops. When writing async IO loops I have to be very careful not to put expensive to compute computation into the loop as it will DOS the loop.
I therefore have an interest in the various ways an async IO loop can be made to recover or even fully support some kind of green threading or such approach. For example, sampling the active task and the number of loop iterations in a SIGALRM, and then if a task is detected to be blocking, move the whole of everything else to a new thread, or some cunning variation on this with the desired result.
There was some complaints about node.js in this regard recently, and elsewhere I've seen tantalizing comments about other runtimes such as Go and Haskell. But lets not go too far away from the basic question of whether you can do preemptive multitasking in a single thread in C/C++
Windows has fibers that are user-scheduled units of execution sharing the same thread.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms682661%28v=vs.85%29.aspx
UPD: More information about user-scheduled context switching can be found in LuaJIT sources, it supports coroutines for different platforms, so looking at the sources can be useful even if you are not using lua at all. Here is the summary: http://coco.luajit.org/portability.html,
As far as i understand you are mixing things that are usually not mixed:
Asynchrounous Singals
A signal is usually delivered to the program (thus in your description one thread) on the same stack that is currently running and runs the registered signal handler... in BSD unix there is an option to let the handler run on a separate so-called "signal stack".
Threads and Stacks
The ability to run a thread on its own stack requires the ability to allocate stack space and save and restore state information (that includes all registers...) - otherwise clean "context switch" between threads/processes etc. is impossible. Usually this is implemented in the kernel and very often using some form of assembler since that is a very low-level and very time-sensitive operation.
Scheduler
AFAIK every system capable of running threads has some sort of scheduler... which is basically a piece of code running with the highest privileges. Often it has subscribed to some HW signal (clock or whatever) and makes sure that no other code ever registers directly (only indirectly) to that same signal. The scheduler has thus the ability to preemt anything on that system. Main conern is usually to give the threads enough CPU cycles on the available cores to do their job. Implementation usually includes some sort of queues (often more than one), priority handling and several other stuff. Kernel-side threads usually have a higher priority than anything else.
Modern CPUs
On modern CPUs the implementation is rather complicated since involves dealing with several cores and even some "special threads" (i.e. hyperthreads)... since modern CPUs usually have several levels of Cache etc. it is very important to deal with these appropriately to achieve high performance.
All the above means that your thread can and most probably will be preempted by OS on a regular basis.
In C you can register signal handlers which in turn preempt your thread on the same stack... BEWARE that singal handlers are problematic if reentered... you can either put the processing into the signal handler or fill some structure (for example a queue) and have that queue content consumed by your thread...
Regarding setjmp/longjmp you need to be aware that they are prone to several problems when used with C++.
For Linux there is/was a "full preemption patch" available which allows you to tell the scheduler to run your thread(s) with even higher priority than kernel thread (disk I/O...) get!
For some references see
http://www.kernel.org/doc/Documentation/scheduler/sched-rt-group.txt
http://www.kernel.org/doc/Documentation/scheduler/sched-design-CFS.txt
http://www.kernel.org/doc/Documentation/scheduler/
http://www.kernel.org/doc/Documentation/rt-mutex.txt
http://www.kernel.org/doc/Documentation/rt-mutex-design.txt
http://www.kernel.org/doc/Documentation/IRQ.txt
http://www.kernel.org/doc/Documentation/IRQ-affinity.txt
http://www.kernel.org/doc/Documentation/preempt-locking.txt
http://tldp.org/LDP/LG/issue89/vinayak2.html
http://lxr.linux.no/#linux+v3.1/kernel/sched.c#L3566
Can my thread help the OS decide when to context switch it out?
https://www.osadl.org/Realtime-Preempt-Kernel.kernel-rt.0.html
http://www.rtlinuxfree.com/
C++: Safe to use longjmp and setjmp?
Use of setjmp and longjmp in C when linking to C++ libraries
http://www.cs.utah.edu/dept/old/texinfo/glibc-manual-0.02/library_21.html
For seeing an acutal implementation of a scheduler etc. checkout the linux serouce code at https://kernel.org .
Since your question isn't very specific I am not sure whether this is a real answer but I suspect it has enough information to get you started.
REMARK:
I am not sure why you might want to implement something already present in the OS... if it for a higher performance on some async I/O then there are several options with maximum performance usually available on the kernel-level (i.e. write kernel-mode code)... perhaps you can clarify so that a more specific answer is possible.
Userspace threading libraries are usually cooperative (e.g: GNU pth, SGI's statethreads, ...). If you want preemptiveness, you'd go to kernel-level threading.
You could probably use getcontext()/setcontext()... from a SIGALARM signal handler, but if it works, it would be messy at best. I don't see what advantage has this approach over kernel threading or event-based I/O: you get all the non-determinism of preemptiveness, and you don't have your program separated into sequential control flows.
As others have outlined, preemptive is likely not very easy to do.
The usual pattern for this is using co-procedures.
Coprocedures are a very nice way to express finite state machines (e.g. text parsers, communication handlers).
You can 'emulate' the syntax of co-procedures with a modicum of preprocessor macro magic.
Regarding optimal input/output scheduling
You could have a look at Boost Asio: The Proactor Design Pattern: Concurrency Without Threads
Asio also has a co-procedure 'emulation' model based on a single (IIRC) simple preprocessor macro, combined with some amount of cunningly designed template facilities that come things eerily close to compiler support for _stack-less co procedures.
The sample HTTP Server 4 is an example of the technique.
The author of Boost Asio (Kohlhoff) explains the mechanism and the sample on his Blog here: A potted guide to stackless coroutines
Be sure to look for the other posts in that series!
What you're asking makes no sense. What would your one thread be interrupted by? Any executing code has to be in a thread. And each thread is basically a sequential execution of code. For a thread to be interrupted, it has to be interrupted by something. You can't just jump around randomly inside your existing thread as a response to an interrupt. Then it's no longer a thread in the usual sense.
What you normally do is this:
either you have multiple threads, and one of your threads is suspended until the alarm is triggered,
alternatively, you have one thread, which runs in some kind of event loop, where it receives events from (among other sources) the OS. When the alarm is triggered, it sends a message to your thread's event loop. If your thread is busy doing something else, it won't immediately see this message, but once it gets back into the event loop and processing events, it'll get it, and react.
The title is an oxymoron, a thread is an independent execution path, if you have two such paths, you have more than one thread.
You can do a kind of "poor-man's" multitasking using setjmp/longjmp, but I would not recommend it and it is cooperative rather than pre-emptive.
Neither C nor C++ intrinsically support multi-threading, but there are numerous libraries for supporting it, such as native Win32 threads, pthreads (POSIX threads), boost threads, and frameworks such as Qt and WxWidgets have support for threads also.
Related
Sorry, my first question here. I'm not sure to be the first to ask this, but I could not find answers anywhere.
Modern CPU are heavily multi-threaded/cored but Linux does not garantee processes/threads to physically run in the same time (time sharing).
I'd like my (C++) programs to take advantage of this hardware: spawn small tasks (to update a Hash, copy a data) while going on with the main thread. The goal is to run the program faster. As it does not make sense to spawn a 500ns task and to wait 1ms for its execution I'd like to be (almost) sure that the task will be really executed in the same time as the main thread.
I could not find any paper or discussion on this subject, but I'm not sure to search properly, I just don't know how this thing would be named.
Could someone tell me:
- what's the name of such parallel (same time) executions ?
- is this possible on Linux (or which kind of OS offer such service) ?
Thanks
I realized that my question was more OS than programming oriented, and that I should ask it on a more appropriated Programmers site, here:
https://softwareengineering.stackexchange.com/questions/325257/possiblity-to-request-several-linux-threads-scheduled-together-in-the-same-time
Thanks for answers, they made me advance and better define what I'm looking for.
What you are looking for is cpu thread affinity and cgroups under Linux. There is a lot of complexity to it and you will need to experiment with your particular requirements.
A common strategy in low latency applications is to assign a CPU resource solely to a particular process or thread. The thread runs 'hot' thereby never releasing the CPU to any other process including the kernel.
The Reactor pattern is useful for this model. A queue is setup on a hot thread with other threads feeding the queue. The queue itself can be a bottleneck but thats a whole other discussion. So in your example, the main (hot?) thread will be writing events into the queue of other hot worker threads. Some sort of rendezvous event will indicate to the main thread that the worker threads are finished.
This strategy is only useful for CPU bound applications. If your application is I/O bound then the kernel will likely do a better job than a custom algorithm.
To directly answer your question, yes this is possible in Linux with C/C++ but it is not trivial.
I have some old articles on my blog that may be of interest.
http://matthewericfisher.tumblr.com/post/6462629082/low-latency-highly-scalable-robust-systems
--Matt
This question already has answers here:
How can multithreading speed up an application (when threads can't run concurrently)?
(9 answers)
Closed 9 years ago.
My professor causally mentioned that we should program multi-thread programs even if we are using a unicore processor however because of the lack of time , he did not elaborate on it .
I would like to know what are the benefits of a multi-thread program in a unicore processor ??
It won't be as significant as a multi-core system but it can still provide some benefits.
Mainly all the benefits that you are going to get will be regarding to the context switch that will happen after a input miss to the already executing thread. Executing thread may be waiting for anything such as a hardware resource or a branch mis-prediction or even data transfer after a cache miss.
At this point the waiting thread can be executed to benefit from this "waiting time". But of course context switch will take some time. Also managing threads inside the code rather than sequential computation can create some extra complexity to your program. And as it has been said, some applications needs to be multi-threaded so there is no escape from the context switch in some cases.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
Benefits could be different.
One of the widely used examples is the application with GUI, which supposed to perform some kind of computations. If you will have a single thread - the user will have to wait the result before dealing something else with the application, but if you start it in the separate thread - user interface could be still available for user during the computation process. So, multi-thread program could emulate multi-task environment even on a unicore system. That's one of the points.
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. "Future" and boost::asio provide ways of doing non-blocking stuff without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and which logic is actually in the thread. Some benefits you can even get on a single core:
A thread can wrap a blocking/long-during call you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code. For example background polling for updates, monitoring some resource (e.g. free storage), checking internet connectivity. If you keep them in a separate thread you can keep the code relatively simple in its own 'runtime' without caring too much about the impact on the main program, the sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS scheduling system works, but if this allocates time per thread, the more threads you have the more your app will be scheduled.
Some benefits long-term:
Where it's not hard to do you benefit if your hardware evolves. You never know what's going to happen, today your app runs on a single-core embedded device, tomorrow that embedded device gets a quad core. Programming threaded from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash all related operations end up in the same thread. The advantage for single cores is 'small' but it's not hard to do as you need little synchronization primitives so the overhead stays small.
That said, I think there are situations where it's very ill advise:
As soon as your required synchronization mechanism with other threads becomes complex (e.g. multiple locks, lots of critical sections, ...). It might still be then that multi-threading gives you a benefit when effectively moving to multiple CPUs, but the overhead is huge both for your single core and your programming time.
For instance think about operations that block because of slow peripheral devices (harddisk access etc.). While these are waiting, even the single core can do other things asyncronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for completion of IO requests (user input, network/disk IO), critical resources to be available, or any sort of asynchroneously triggered events, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multi-tasking concepts like asynchroneous IO, coroutines, or fibers come into mind.
If however the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel upfront.
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores will double every 18 months in the future. People have learned single-threaded programming for 50 years now, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This refers to low-level aspects like avoiding race conditions and proper synchronization, as well as the high-level aspects like the general algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, scalability and the development of the skills that are required to achieve these goals.
I'm creating a concurrent memory reclamation algorithm in C++. Periodically, the stacks of executing mutator threads need to be inspected, so that I can see what references the threads are currently holding. In the process of doing this, I need to also check the registers of the mutator thread to check any references that might be in there.
Clearly many JVM's and C# vm's have no problem doing this as part of their garbage collection cycles. However, I haven't been able to find a definitive solution to this issue.
I can't quite tease apart what is going on in the Bohem garbage collector in order to inspect the root set, if you can (or know how its done), I'd really like to know.
Ideally I would be able to cause the mutator thread to be interrupted, and execute a piece of handler code which would report it's PC and also flush any register-based references into the stack, and then perhaps help finish the collection cycle. I believe that most compilers in most systems will automatically flush the registers when interrupt or signal handlers are called, but I'm not clear on the specifics, or how to access that data. It seems that separate stacks might be used for interrupt and signal handlers. Additionally, I can't find any information about how to target a particular thread, or how to send a signal. Windows does not seem to support this form of signaling anyway, and I would like my system to run on both Linux and Windows on x86-64 processors.
Edit: SuspendThread() is used in some situations, although safepoints seem to be preferred. Any ideas on why? Is there any way to deal with long-lasting I/O waits or other waits for kernel code to return?
I thought this was a very interesting question, so I dug into it a bit. It turns out that the Hotspot JVM uses a mechanism called "safepoints" which cause the threads of the JVM to cooperatively all stop themselves so that the GC can begin. In other words, the thread initiating GC doesn't forcibly stop the other threads, the other threads voluntarily suspend themselves by various clever mechanisms.
I don't believe the JVM scans registers, because a safepoint is defined such that all roots are known (I presume this means in memory).
For more information see:
HotSpot Glossary -- which defines safepoints
safepoint.cpp -- the source in HotSpot that implements safepoints
A slide deck that describes safepoints in some detail (look 10 slides or so in)
In regards to your desire to "interrupt" all threads, according to the slide deck I referenced above, thread suspension is "unreliable on Solaris and Linux, e.g., spurious signals." I'm not sure what mechanism even exists for thread suspension that the slides would be referring to.
On windows you should be able to get this done use SuspendThread (and ResumeThread) along with GetThreadContext (as Hans mentioned). All of these functions take handles to the specific thread you intend to target.
To get a list of all threads in the current process, see this(toolhlp32 works on x64, despite its bad naming scheme...).
As a point of interest, one way to flush registers to the stack on x86 is to use the PUSHAD assembly instruction.
Our (Windows native C++) app is composed of threaded objects and managers. It is pretty well written, with a design that sees Manager objects controlling the lifecycle of their minions. Various objects dispatch and receive events; some events come from Windows, some are home-grown.
In general, we have to be very aware of thread interoperability so we use hand-rolled synchronization techniques using Win32 critical sections, semaphores and the like. However, occasionally we suffer thread deadlock during shut-down due to things like event handler re-entrancy.
Now I wonder if there is a decent app shut-down strategy we could implement to make this easier to develop for - something like every object registering for a shutdown event from a central controller and changing its execution behaviour accordingly? Is this too naive or brittle?
I would prefer strategies that don't stipulate rewriting the entire app to use Microsoft's Parallel Patterns Library or similar. ;-)
Thanks.
EDIT:
I guess I am asking for an approach to controlling object life cycles in a complex app where many threads and events are firing all the time. Giovanni's suggestion is the obvious one (hand-roll our own), but I am convinced there must be various off-the-shelf strategies or frameworks, for cleanly shutting down active objects in the correct order. For example, if you want to base your C++ app on an IoC paradigm you might use PocoCapsule instead of trying to develop your own container. Is there something similar for controlling object lifecycles in an app?
This seems like a special case of the more general question, "how do I avoid deadlocks in my multithreaded application?"
And the answer to that is, as always: make sure that any time your threads have to acquire more than one lock at a time, that they all acquire the locks in the same order, and make sure all threads release their locks in a finite amount of time. This rule applies just as much at shutdown as at any other time. Nothing less is good enough; nothing more is necessary. (See here for a relevant discussion)
As for how to best do this... the best way (if possible) is to simplify your program as much as you can, and avoid holding more than one lock at a time if you can possibly help it.
If you absolutely must hold more than one lock at a time, you must verify your program to be sure that every thread that holds multiple locks locks them in the same order. Programs like helgrind or Intel thread checker can help with this, but it often comes down to simply eyeballing the code until you've proved to yourself that it satisfies this constraint. Also, if you are able to reproduce the deadlocks easily, you can examine (using a debugger) the stack trace of each deadlocked thread, which will show where the deadlocked threads are forever-blocked at, and with that information, you can that start to figure out where the lock-ordering inconsistencies are in your code. Yes, it's a major pain, but I don't think there is any good way around it (other than avoiding holding multiple locks at once). :(
One possible general strategy would be to send an "I am shutting down" event to every manager, which would cause the managers to do one of three things (depending on how long running your event-handlers are, and how much latency you want between the user initiating shutdown, and the app actually exiting).
1) Stop accepting new events, and run the handlers for all events received before the "I am shutting down" event. To avoid deadlocks you may need to accept events that are critical to the completion of other event handlers. These could be signaled by a flag in the event or the type of the event (for example). If you have such events then you should also consider restructuring your code so that those actions are not performed through event handlers (as dependent events would be prone to deadlocks in ordinary operation too.)
2) Stop accepting new events, and discard all events that were received after the event that the handler is currently running. Similar comments about dependent events apply in this case too.
3) Interrupt the currently running event (with a function similar to boost::thread::interrupt()), and run no further events. This requires your handler code to be exception safe (which it should already be, if you care about resource leaks), and to enter interruption points at fairly regular intervals, but it leads to the minimum latency.
Of course you could mix these three strategies together, depending on the particular latency and data corruption requirements of each of your managers.
As a general method, use an atomic boolean to indicate "i am shutting down", then every thread checks this boolean before acquiring each lock, handling each event etc. Can't give a more detailed answer unless you give us a more detailed question.
I have 2 versions of a function which are available in a C++ library which do the same task. One is a synchronous function, and another is of asynchronous type which allows a callback function to be registered.
Which of the below strategies is preferable for giving a better memory and performance optimization?
Call the synchronous function in a worker thread, and use mutex synchronization to wait until I get the result
Do not create a thread, but call the asynchronous version and get the result in callback
I am aware that worker thread creation in option 1 will cause more overhead. I am wanting to know issues related to overhead caused by thread synchronization objects, and how it compares to overhead caused by asynchronous call. Does the asynchronous version of a function internally spin off a thread and use synchronization object, or does it uses some other technique like directly talk to the kernel?
"Profile, don't speculate." (DJB)
The answer to this question depends on too many things, and there is no general answer. The role of the developer is to be able to make these decisions. If you don't know, try the options and measure. In many cases, the difference won't matter and non-performance concerns will dominate.
"Premature optimisation is the root of all evil, say 97% of the time" (DEK)
Update in response to the question edit:
C++ libraries, in general, don't get to use magic to avoid synchronisation primitives. The asynchronous vs. synchronous interfaces are likely to be wrappers around things you would do anyway. Processing must happen in a context, and if completion is to be signalled to another context, a synchronisation primitive will be necessary to do that.
Of course, there might be other considerations. If your C++ library is talking to some piece of hardware that can do processing, things might be different. But you haven't told us about anything like that.
The answer to this question depends on context you haven't given us, including information about the library interface and the structure of your code.
Use asynchronous function because will probably do what you want to do manually with synchronous one but less error prone.
Asynchronous: Will create a thread, do work, when done -> call callback
Synchronous: Create a event to wait for, Create a thread for work, Wait for event, On thread call sync version , transfer result, signal event.
You might consider that threads each have their own environment so they use more memory than a non threaded solution when all other things are equal.
Depending on your threading library there can also be significant overhead to starting and stopping threads.
If you need interprocess synchronization there can also be a lot of pain debugging threaded code.
If you're comfortable writing non threaded code (i.e. you won't burn a lot of time writing and debugging it) then that might be the best choice.