Monitors and mutual exclusion - concurrency

Just wanted to know if mutual exclusion in monitors is at a procedure/method level or if it is at a monitor level.
I mean, in the first case, there might be 2 threads accessing the monitor, but they would be running in different procedures. In the second case, there won't be 2 threads acceding the monitor even if two different procedures are called.
Assuming it is at a monitor level, it seems much safer. However, aren't there situations in which there could be running 2 different threads in a monitor each modifying different shared variables and the monitor won't allow this to happen? If that is the case, wouldn't this lower the concurrency in the system? Or would it mean that 2 monitors should have been created instead of one?
Ok, I better finish philosophizing :) Thanks in advance

Monitor locking is completely independent of the calling method.
Had that not been the case, it would have been very difficult to prevent two different methods from executing concurrently.
If you don't want it to be, use multiple monitors.

The mechanism of monitor operation is serializing of threads or processes.
So Monitor is a skill to guarantee mutual exclusion on concurrency situation.

Related

Is ITC faster than IPC of the same kind?

I am trying to find out if there is an inherent speed difference between inter-thread and inter-process communication.
I know that when using threads the threads share the same memory, can use the same global variables, and the such while processes have to use other tricks, which basically means queues.
But take the following case:
An application is comprised of several completely separate .exe files. When all are run they form a producer/consumer (or publisher/subscriber) architecture, with some processes producing some values and other processes reading and using those values and maybe producing some other values.
This communication is done with conventional ways of IPC.
My question is: if I were to move the code around so that it's one process with multiple threads (assuming no conflicts with variable names and the such), but keep the communication methods the same, queues with all the locks and semaphores behind them, will the thread-based application be faster than the process-based one?
The startup costs of processes vs. threads are not important because the application is meant to run for a long time (hours) so a few milliseconds will not be important.
Google has yielded no conclusive answers to this.
To clarify some aspects of the question:
The factor I want to maximize is throughput.
Some external factor (an arduino sensor for example) produces an input for one of the nodes and the entire network takes some time while all the nodes consume and produce values. Then a new input can be processed. I would like to be able to process more inputs per minute/second.
The data being passed back and forth are mostly numbers or small arrays of numbers.
The entire network can have lets say between 5 and 25 nodes.
As for platform (if it is relevant) I would like answers for both Linux and Windows.
The specific use-case is too large to be described here so consider the use-case provided above. This is as much, if not more, an educational question for my own knowledge as it is a question about a specific problem.
Please ask for any other relevant information that I have not included here.
If I were to move the code around so that it's one process with multiple threads (assuming no conflicts with variable names and the such), but keep the communication methods the same, queues with all the locks and semaphores behind them, will the thread-based application be faster than the process-based one?
That is not possible. The multiple thread version will take advantage of the shared memory space and the multi-process version cannot do so. For example, you can take an ordinary object in one thread and access it through a pointer in another thread, and all the referenced sub-objects will "just work". Anything not modified can be accessed just as easily in one thread as another with no special effort.
This simply won't work across processes at all since they don't share an address space.

Benefits of a multi thread program in a unicore system [duplicate]

This question already has answers here:
How can multithreading speed up an application (when threads can't run concurrently)?
(9 answers)
Closed 9 years ago.
My professor causally mentioned that we should program multi-thread programs even if we are using a unicore processor however because of the lack of time , he did not elaborate on it .
I would like to know what are the benefits of a multi-thread program in a unicore processor ??
It won't be as significant as a multi-core system but it can still provide some benefits.
Mainly all the benefits that you are going to get will be regarding to the context switch that will happen after a input miss to the already executing thread. Executing thread may be waiting for anything such as a hardware resource or a branch mis-prediction or even data transfer after a cache miss.
At this point the waiting thread can be executed to benefit from this "waiting time". But of course context switch will take some time. Also managing threads inside the code rather than sequential computation can create some extra complexity to your program. And as it has been said, some applications needs to be multi-threaded so there is no escape from the context switch in some cases.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
Benefits could be different.
One of the widely used examples is the application with GUI, which supposed to perform some kind of computations. If you will have a single thread - the user will have to wait the result before dealing something else with the application, but if you start it in the separate thread - user interface could be still available for user during the computation process. So, multi-thread program could emulate multi-task environment even on a unicore system. That's one of the points.
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. "Future" and boost::asio provide ways of doing non-blocking stuff without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and which logic is actually in the thread. Some benefits you can even get on a single core:
A thread can wrap a blocking/long-during call you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code. For example background polling for updates, monitoring some resource (e.g. free storage), checking internet connectivity. If you keep them in a separate thread you can keep the code relatively simple in its own 'runtime' without caring too much about the impact on the main program, the sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS scheduling system works, but if this allocates time per thread, the more threads you have the more your app will be scheduled.
Some benefits long-term:
Where it's not hard to do you benefit if your hardware evolves. You never know what's going to happen, today your app runs on a single-core embedded device, tomorrow that embedded device gets a quad core. Programming threaded from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash all related operations end up in the same thread. The advantage for single cores is 'small' but it's not hard to do as you need little synchronization primitives so the overhead stays small.
That said, I think there are situations where it's very ill advise:
As soon as your required synchronization mechanism with other threads becomes complex (e.g. multiple locks, lots of critical sections, ...). It might still be then that multi-threading gives you a benefit when effectively moving to multiple CPUs, but the overhead is huge both for your single core and your programming time.
For instance think about operations that block because of slow peripheral devices (harddisk access etc.). While these are waiting, even the single core can do other things asyncronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for completion of IO requests (user input, network/disk IO), critical resources to be available, or any sort of asynchroneously triggered events, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multi-tasking concepts like asynchroneous IO, coroutines, or fibers come into mind.
If however the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel upfront.
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores will double every 18 months in the future. People have learned single-threaded programming for 50 years now, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This refers to low-level aspects like avoiding race conditions and proper synchronization, as well as the high-level aspects like the general algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, scalability and the development of the skills that are required to achieve these goals.

Simulating Thread with fork()

What's your idea about simulating thread with "fork() function" and a "shared memory" block ...
Is it possible ?
How much is it reasonable to do this for a program ? ( I mean , Will it work well..?)
For starters, don't mix a thread and fork().
A fork gives you a brand new process, which is a copy of the current process, with the same code segments. As the memory image changes (typically this is due to different behavior of the two processes) you get a separation of the memory images, however the executable code remains the same. Tasks do not share memory unless they use some Inter Process Communication (IPC) primitive.
In contrast a thread is another execution thread of the same task. One task can have multiple threads, and the task memory object are shared among threads, therefore shared data must be accessed through some primitive and synchronization objects that allow you to avoid data corruption.
Yes, it is possible, but I cannot imagine it being a good idea, and it would be a real pain to test.
If you have a shared heap, and you make sure all semaphores etc. are allocated in the heap, and not the stack, then there's no inherent reason you couldn't do something like it. There would be some tricky differences though.
For example, anything you do in an interrupt handler in a multi-threaded program can change data used by all the threads, while in a forked program, you would have to send multiple interrupts, which would be caught at different times, and might lead to unintended effects.
If you want threading behavior, just use a thread.
AFAIK, fork will create a separate process with its own context, stack and so on. Depends what you mean by "simulating"...
You might want to check this out : http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them
A few of the answers here focus on "don't mix fork and threads". But the way I read your question is: "can you use two different processes, and still communicate quickly and conveniently with shared memory between them, just like how threads have access to each others' memory?"
And the answer is, yes you can, but you have to remember to explicitly mark which memory areas you want shared. You can not just share your variables between the processes. Also, you can communicate this way between processes not related to each other at all. It is not limited to processes forked from each other.
Have a look at shared memory or "shm".

Impact of hundreds of idle threads

I am considering the use of potentially hundreds of threads to implement tasks that manage devices over a network.
This is a C++ application running on a powerpc processor with a linux kernel.
After an initial phase when each task does synchronization to copy data from the device into the task, the task becomes idle, and only wakes up when it receives an alarm, or needs to change some data (configuration), which is rare after the start phase. Once all tasks reach the "idle" phase, I expect that only a few per second will need to wake.
So, my main concern is, if I have hundreds of threads will they have a negative impact on the system once they become idle?
Thanks.
amso
edit:
I'm updating the question based on the answers that I got. Thanks guys.
So it seems that having a ton of threads idling (IO blocked, waiting, sleeping, etc), per se , will not have an impact on the system in terms of responsiveness.
Of course, they will spend extra money for each thread's stack and TLS data but that's okay as long as we throw more memory at the thing (making it more €€€)
But then, other issues have to be accounted for. Having 100s of threads waiting will likely increase memory usage on the kernel, due to the need of wait queues or other similar resources. There's also a latency issue, which looks non-deterministic. To check the responsiveness and memory usage of each solution one should measure it and compare.
Finally, the whole idea of hundreds of threads that will be mostly idling may be modeled like a thread pool. This reduces a bit of code linearity but dramatically increases the scalability of the solution and with propper care can be easily tunable to adjust the compromise between performance and resource usage.
I think that's all. Thanks everyone for their input.
--
amso
Each thread has overhead - most importantly each one has its own stack and TLS. Performance is not that much of a problem since they will not get any time slices unless they actually do anything. You may still want to consider using thread pools.
Chiefly they will use up address space and memory for stacks; once you get, say, 1000 threads, this gets quite significant as I've seen that 10M per thread is typical for stacks (on x86_64). It is changable, but only with care.
If you have a 32-bit processor, address space will be the main limitation once you hit 1000s of threads, you can easily exhaust the AS.
They use up some kernel memory, but probably not as much as userspace.
Edit: of course threads share address space with each other only if they are in the same process; I am assuming that they are.
I'm not a Linux hacker, but assuming that Linux's thread scheduling is similar to Windows'...
Yes, of course the will be some impact. Every bit of memory you consume will potentially have some impact.
However, in a time-sliced environment, threads that are in a Wait/Sleep/Join state will not consume CPU cycles until they are awoken.
I would be worried about offering 1:1 thread-connections mappings, if nothing else because it leaves you rather exposed to denial of service attacks. (pthread_create() is a fairly expensive operation compared to just a call to accept())
EboMike has already answered the question directly - provided threads are blocked and not busy-waiting then they won't consume much in the way of resources although they will occupy memory and swap for all the per-thread state.
I'm learning the basics of the kernel now. I can't give you a specific answer yet; I'm still a noob... but here are some things for you to chew on.
Linux implements each POSIX thread as a unique process. This will create overhead as others have mentioned. In addition to this, your waiting model appears flawed any way you do it. If you create one conditional variable for each thread, then I think (based off of my interpretation of the website below) that you'll actually be expending a lot of kernel memory, as each thread would be placed into its own wait queue. If instead you break your threads up for each group of X threads to share a conditional variable, then you've got problems as well because every time the variable signals, you must wake up _EVERY_DARN_PROCESS_ in that variable's wait queue.
I also assume that you will need to do some object sharing an synchronization. In this case, your code may get slower because of the need to wake up all processes waiting on a resource, as I mentioned earlier.
I know this wasn't much help, but as I said, I'm a kernel noob. Hope it helped a little.
http://book.chinaunix.net/special/ebook/PrenticeHall/PrenticeHallPTRTheLinuxKernelPrimer/0131181637/ch03lev1sec7.html
I'm not sure what "device" you are talking about, but if it's a file descriptor, I'd suggest that you look at starting to migrate to using either poll or epoll (Id suggest the latter given the description of how active you expect each file descriptor to be). That way, you could use one process which would be responsible for all the fds.

More threads, better performance?

When I write a message driven app. much like a standard windows app only that it extensively uses messaging for internal operations, what would be the best approach regarding to threading?
As I see it, there are basically three approaches (if you have any other setup in mind, please share):
Having a single thread process all of the messages.
Having separate threads for separate message types (General, UI, Networking, etc...)
Having multiple threads that share and process a single message queue.
So, would there be any significant performance differences between the three?
Here are some general thoughts:
Obviously, the last two options benefit from a situation where there's more than one processor. Plus, if any thread is waiting for an external event, other threads can still process unrelated messages. But ignoring that, seems that multiple threads only add overhead (Thread switches, not to mention more complicated sync situations).
And another question: Would you recommend to implement such a system upon the standard Windows messaging system, or to implement a separate queue mechanism, and why?
The specific choice of threading model should be driven by the nature of the problem you are trying to solve. There isn't necessarily a single "correct" approach to designing the threading model for such an application. However, if we adopt the following assumptions:
messages arrive frequently
messages are independent and don't rely too heavily on shared resources
it is desirable to respond to an arriving message as quickly as possible
you want the app to scale well across processing architectures (i.e. multicode/multi-cpu systems)
scalability is the key design requirement (e.g. more message at a faster rate)
resilience to thread failure / long operations is desirable
In my experience, the most effective threading architecture would be to employ a thread pool. All messages arrive on a single queue, multiple threads wait on the queue and process messages as they arrive. A thread pool implementation can model all three thread-distribution examples you have.
#1 Single thread processes all messages => thread pool with only one thread
#2 Thread per N message types => thread pool with N threads, each thread peeks at the queue to find appropriate message types
#3 Multiple threads for all messages => thread pool with multiple threads
The benefits of this design is that you can scale the number of threads in the thread in proportion to the processing environment or the message load. The number of threads can even scale at runtime to adapt to the realtime message load being experienced.
There are many good thread pooling libraries available for most platforms, including .NET, C++/STL, Java, etc.
As to your second question, whether to use standard windows message dispatch mechanism. This mechanism comes with significant overhead and is really only intended for pumping messages through an windows application's UI loop. Unless this is the problem you are trying to solve, I would advise against using it as a general message dispatching solution. Furthermore, windows messages carry very little data - it is not an object-based model. Each windows message has a code, and a 32-bit parameter. This may not be enough to base a clean messaging model on. Finally, the windows message queue is not design to handle cases like queue saturation, thread starvation, or message re-queuing; these are cases that often arise in implementing a decent message queing solution.
We can't tell you much for sure without knowing the workload (ie, the statistical distribution of events over time) but in general
single queue with multiple servers is at least as fast, and usually faster, so 1,3 would be preferable to 2.
multiple threads in most languages add complexity because of the need to avoid contention and multiple-writer problems
long duration processes can block processing for other things that could get done quicker.
So horseback guess is that having a single event queue, with several server threads taking events off the queue, might be a little faster.
Make sure you use a thread-safe data structure for the queue.
It all depends.
For example:
Events in a GUI queue are best done by a single thread as there is an implied order in the events thus they need to be done serially. Which is why most GUI apps have a single thread to handle events, though potentially multiple events to create them (and it does not preclude the event thread from creating a job and handling it off to a worker pool (see below)).
Events on a socket can potentially by done in parallel (assuming HTTP) as each request is stateless and can thus by done independently (OK I know that is over simplifying HTTP).
Work Jobs were each job is independent and placed on queue. This is the classic case of using a set of worker threads. Each thread does a potentially long operation independently of the other threads. On completion comes back to the queue for another job.
In general, don't worry about the overhead of threads. It's not going to be an issue if you're talking about merely a handful of them. Race conditions, deadlocks, and contention are a bigger concern, and if you don't know what I'm talking about, you have a lot of reading to do before you tackle this.
I'd go with option 3, using whatever abstractions my language of choice offers.
Note that there are two different performance goals, and you haven't stated which you are targetting: throughput and responsiveness.
If you're writing a GUI app, the UI needs to be responsive. You don't care how many clicks per second you can process, but you do care about showing some response within a 10th of a second or so (ideally less). This is one of the reasons it's best to have a single thread devoted to handling the GUI (other reasons have been mentioned in other answers). The GUI thread needs to basically convert windows messages into work-items and let your worker queue handle the heavy work. Once the worker is done, it notifies the GUI thread, which then updates the display to reflect any changes. It does things like painting a window, but not rendering the data to be displayed. This gives the app a quick "snapiness" that is what most users want when they talk about performance. They don't care if it takes 15 seconds to do something hard, as long as when they click on a button or a menu, it reacts instantly.
The other performance characteristic is throughput. This is the number of jobs you can process in a specific amount of time. Usually this type of performance tuning is only needed on server type applications, or other heavy-duty processing. This measures how many webpages can be served up in an hour, or how long it takes to render a DVD. For these sort of jobs, you want to have 1 active thread per CPU. Fewer than that, and you're going to be wasting idle clock cycles. More than that, and the threads will be competing for CPU time and tripping over each other. Take a look at the second graph in this article DDJ articles for the trade-off you're dealing with. Note that the ideal thread count is higher than the number of available CPUs due to things like blocking and locking. The key is the number of active threads.
A good place to start is to ask yourself why you need multiple threads.
The well-thought-out answer to this question will lead you to the best answer to the subsequent question, "how should I use multiple threads in my application?"
And that must be a subsequent question; not a primary question. The fist question must be why, not how.
I think it depends on how long each thread will be running. Does each message take the same amount of time to process? Or will certain messages take a few seconds for example. If I knew that Message A was going to take 10 seconds to complete I would definitely use a new thread because why would I want to hold up the queue for a long running thread...
My 2 cents.
I think option 2 is the best. Having each thread doing independant tasks would give you best results. 3rd approach can cause more delays if multiple threads are doing some I/O operation like disk reads, reading common sockets and so on.
Whether to use Windows messaging framework for processing requests depends on the work load each thread would have. I think windows restricts the no. of messages that can be queued at the most to 10000. For most of the cases this should not be an issue. But if you have lots of messages to be queued this might be some thing to take into consideration.
Seperate queue gives a better control in a sense that you may reorder it the way you want (may be depending on priority)
Yes, there will be performance differences between your choices.
(1) introduces a bottle-neck for message processing
(3) introduces locking contention because you'll need to synchronize access to your shared queue.
(2) is starting to go in the right direction... though a queue for each message type is a little extreme. I'd probably recommend starting with a queue for each model in your app and adding queues where it makes since to do so for improved performance.
If you like option #2, it sounds like you would be interested in implementing a SEDA architecture. It is going to take some reading to understand what is going on, but I think the architecture fits well with your line of thinking.
BTW, Yield is a good C++/Python hybrid implementation.
I'd have a thread pool servicing the message queue, and make the number of threads in the pool easily configurable (perhaps even at runtime). Then test it out with expected load.
That way you can see what the actual correlation is - and if your initial assumptions change, you can easily change your approach.
A more sophisticated approach would be for the system to introspect its own performance traits and adapt it's use of resources, threads in particular, as it goes. Probably overkill for most custom application code, but I'm sure there are products that do that out there.
As for the windows events question - I think that's probably an application specific question that there is no right or wrong answer to in the general case. That said, I usually implement my own queue as I can tailor it to the specific characteristics of the task at hand. Sometimes that might involve routing events via the windows message queue.