When is lock free programming faster and slower than lock/mutex based programming? - concurrency

Hi, I learned that lock-free programming gives each thread the freedom to make its own progress: one thread's progress is not blocked by other threads if it chooses so.
But let's set flexibility aside and talk only about performance, meaning how fast the entire program runs. Which one is faster, and under what conditions: lock-free programming or lock/mutex-based programming?
Thanks.

Related

c/c++ maximum number of mutexes allowed in Linux

I've been searching, trying to find out what the maximum number of mutexes is in Linux for a C/C++ process, without success. Also, is there a way to modify this number? The book I'm reading explains how to find the maximum number of threads allowed in Linux and how to modify it, but makes no mention of mutexes.
Check the documentation for pthread_mutex_init:
Why No Limits are Defined
Defining symbols for the maximum number of mutexes and condition variables was considered but rejected because the number of these objects may change dynamically. Furthermore, many implementations place these objects into application memory; thus, there is no explicit maximum.
EDIT: In the comments you asked about the costs a mutex may have other than memory. Well, I don't know, but I found some interesting material about that:
This article on How does a Mutex Work says this about the costs:
The Costs
There are a few points of interest when it comes to the cost of a mutex. The first, and very vital point, is waiting time. Your threads should spend only a fraction of their time waiting on mutexes. If they are waiting too often then you are losing concurrency. In a worst case scenario many threads always trying to lock the same mutex may result in performance worse than a single thread serving all requests. This really isn’t a cost of the mutex itself, but a serious concern with concurrent programming.
The overhead costs of a mutex relate to the test-and-set operation and the system call that implements a mutex. The test-and-set is likely very low cost; being essential to concurrent processing, CPUs have good reason to make it efficient. We've kind of omitted another important instruction however: the fence. This is used in all high-level mutexes and may have a higher cost than the test-and-set operation. Costlier still, however, is the system call. Not only do you suffer the context-switch overhead of the system call, but the kernel now also spends some time in its scheduling code.
So I'm guessing the costs they talk about on the EAGAIN error involves either the CPU or internal kernel structures. Maybe both. Maybe some kernel error... I honestly don't know.
StackOverflow resources
I picked some SO Q&A that might interest you. Good reading!
How efficient is locking an unlocked mutex? What is the cost of a mutex?
How pthread_mutex_lock is implemented
How do mutexes really work?
When should we use mutex and when should we use semaphore

When to use multithreading in C++? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am an intermediate C++ programmer and am learning multi-threading now. I find it quite confusing: when should I use multi-threading in C++? How do I know which parts of my program need it?
When to use multithreading in C++?
When you have a resource-intensive task like a huge mathematical calculation, or an I/O-intensive task like reading from or writing to a file, you should use multithreading.
The purpose is to be able to run multiple tasks together, which increases the performance and responsiveness of your application. Also, learn about synchronization before implementing multithreading in your application.
When to use multithreading in C++?
Well - the general rule of thumb is: use it when it can speed up your application. The answer isn't really language-dependent.
If you want to get an in-depth answer, then you have to consider a few things:
Is multithreading possible to implement inside your code? Do you have fragments which can be calculated at the same time and are independent of other calculations?
Is multithreading worth implementing? Does your program run slow even when you did all you could to make it as fast as possible?
Will your code be run on machines that support multithreading (so have multiple processing units)? If you're designing code for some kind of machine with only one core, using multithreading is a waste of time.
Is there a different option? A better algorithm, cleaning the code, etc? If so - maybe it's better to use that instead of multithreading?
Do you have to handle things that are hard to predict in time, while the whole application has to constantly run? For example - receiving some information from a server in a game?
This is a slightly subjective subject... But I tend to use multi-threading in one of two situations.
1 - In a performance-critical situation where the utmost power is needed (and the algorithm, of course, supports parallelism); for me, matrix multiplication.
2 - Rarely, where it may be easier to have a thread managing something fairly independent. The classic is networking: perhaps have a thread block waiting for connections and spawn a thread to manage each connection as it comes in. This is useful as the threads can block and respond in a timely manner. Say you have a server: one request might need disk access, which is slow; another thread can jump in and field a different request while the first is waiting for its data.
As has been said by others, only do it when you need to; it gets complicated fast and can be difficult to debug.
Multithreading is a specialized form of multitasking, and multitasking is the feature that allows your computer to run two or more programs concurrently.
I think this link can help you.
http://www.tutorialspoint.com/cplusplus/cpp_multithreading.htm
Mostly when you want things to be done at the same time. For instance, you may want a window to still respond to user input when a level is loading in a game or when you're downloading multiple files at once, etc. It's for things that really can't wait until other processing is done. Of course, both probably go slower as a result, but it really gives the illusion of multiple things happening at once.
Use multithreading when you can speed up your algorithms by doing things in parallel. Use it in opposition to multiprocessing when the threads need access to the parent process's resources.
My two cents.
Use cases:
Integrate your application in a lib/app that already runs a loop. You would need a thread of your own to run your code concurrently if you cannot integrate into the other app.
Task splitting. It makes sense to organize disjoint tasks in threads sometimes, such as in separating sound from image processing, for example.
Performance. When you want to improve the throughput of some task.
Recommendations:
In the general case, don't do multithreading if a single threaded solution will suffice. It adds complexity.
When needed, start with higher-order primitives, such as std::future and std::async.
When possible, avoid data sharing, which is the source of contention.
When going to lower-level abstractions, such as mutexes and so on, encapsulate them in some pattern. You can take a look at these slides.
Decouple your functions from threading and compose the threading into the functions at a later point. Namely, don't embed thread creation into the logic of your code.

Benefits of a multi thread program in a unicore system [duplicate]

This question already has answers here:
How can multithreading speed up an application (when threads can't run concurrently)?
(9 answers)
Closed 9 years ago.
My professor casually mentioned that we should write multi-threaded programs even if we are using a unicore processor, but because of a lack of time he did not elaborate on it.
I would like to know: what are the benefits of a multi-threaded program on a unicore processor?
It won't be as significant as a multi-core system but it can still provide some benefits.
Mainly, the benefits you get relate to the context switch that happens when the currently executing thread has to wait. The executing thread may be waiting for anything, such as a hardware resource, a branch misprediction, or a data transfer after a cache miss.
At that point a waiting thread can be executed to make use of this "waiting time". But of course the context switch itself takes some time. Also, managing threads inside the code, rather than doing a sequential computation, can add some extra complexity to your program. And, as has been said, some applications need to be multi-threaded, so in some cases there is no escaping the context switch.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
The benefits can vary.
One of the widely used examples is an application with a GUI that is supposed to perform some kind of computation. With a single thread, the user would have to wait for the result before doing anything else with the application; if you run the computation in a separate thread, the user interface remains available during the computation. So a multi-threaded program can emulate a multi-tasking environment even on a unicore system. That's one of the points.
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. Futures and boost::asio provide ways of doing non-blocking work without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and which logic is actually in the thread. Some benefits you can even get on a single core:
A thread can wrap a blocking or long-running call that you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code, for example background polling for updates, monitoring some resource (e.g. free storage), or checking internet connectivity. If you keep these in a separate thread, you can keep the code relatively simple in its own 'runtime' without worrying too much about the impact on the main program; the sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS's scheduler works, but if it allocates time per thread, then the more threads you have, the more often your app will be scheduled.
Some benefits long-term:
Where it's not hard to do, you benefit if your hardware evolves. You never know what's going to happen: today your app runs on a single-core embedded device, tomorrow that device gets a quad core. Programming with threads from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash, so that all related operations end up in the same thread. The advantage on a single core is small, but it's not hard to do, since you need few synchronization primitives and the overhead stays small.
That said, I think there are situations where it's ill-advised:
As soon as the synchronization you need between threads becomes complex (e.g. multiple locks, lots of critical sections, ...). Multi-threading might still give you a benefit when you eventually move to multiple CPUs, but the overhead is huge, both for your single core and for your programming time.
For instance, think about operations that block because of slow peripheral devices (hard disk access, etc.). While these are waiting, even a single core can do other things asynchronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for completion of I/O requests (user input, network/disk I/O), for critical resources to become available, or for any sort of asynchronously triggered event, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multi-tasking concepts like asynchronous I/O, coroutines, or fibers come to mind.
If however the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel upfront.
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores is expected to keep doubling every 18 months. People have learned single-threaded programming for 50 years, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This includes low-level aspects like avoiding race conditions and ensuring proper synchronization, as well as high-level aspects like overall algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, about scalability, and about developing the skills required to achieve these goals.

Threading vs Task-Based vs Asynchronous Programming

I'm new to this concept. Are these the same or different things? What is the difference? I really like the idea of being able to run two processes at once; for example, if I have several large files to load into my program, I'd love to load as many of them simultaneously as possible instead of waiting for one at a time. And when working with a large file, such as a WAV file, it would be great to break it into pieces, process several chunks at once, and then put them back together. What do I want to look into to learn how to do this sort of thing?
Edit: Also, I know using more than one core on a multicore processor fits in here somewhere, but apparently asynchronous programming doesn't necessarily mean you are using multiple cores? Why would you do this if you didn't have multiple cores to take advantage of?
They are related but different.
Threading, normally called multi-threading, refers to the use of multiple threads of execution within a single process. This usually refers to the simple case of using a small set of threads each doing different tasks that need to be, or could benefit from, running simultaneously. For example, a GUI application might have one thread draw elements, another thread respond to events like mouse clicks, and another thread do some background processing.
However, when the number of threads, each doing their own thing, is taken to an extreme, we usually start to talk about an Agent-based approach.
The task-based approach refers to a specific strategy in software engineering where, in abstract terms, you dynamically create "tasks" to be accomplished, and these tasks are picked up by a task manager that assigns them to threads that can accomplish them. This is more of a software-architecture thing. The advantage here is that the execution of the whole program is a succession of tasks being relayed (task A finishes -> trigger task B; when both task B and task C are done -> trigger task D; etc.), instead of having to write one big function or program that executes each task one after the other. This gives flexibility when it is unclear which tasks will take more time than others, and when tasks are only loosely coupled. This is usually implemented with a thread pool (threads waiting to be assigned a task) and some message-passing mechanism to communicate data and task "contracts".
Asynchronous programming does not refer to multi-threaded programming, although the two are very often associated (and work well together). A synchronous program must complete each step before moving on to the next. An asynchronous program starts a step, moves on to other steps that don't require the result of the first step, then checks on the result of the first step when its result is required.
That is, a synchronous program might go a little bit like this: "do this task", "wait until done", "do something with the result", and "move on to something else". By contrast, an asynchronous program might go a little more like this: "I'm gonna start a task, and I'll need the result later, but I don't need it just now", "in the meantime, I'll do something else", "I can't do anything else until I have the result of the first step now, so I'll wait for it, if it isn't ready", and "move on to something else".
Notice that "asynchronous" refers to a very broad concept, that always involves some form of "start some work and tell me when it's done" instead of the traditional "do it now!". This does not require multi-threading, in which case it just becomes a software design choice (which often involves callback functions and things like that to provide "notification" of the asynchronous result). With multiple threads, it becomes more powerful, as you can do various things in parallel while the asynchronous task is working. Taken to the extreme, it can become a more full-blown architecture like a task-based approach (which is one kind of asynchronous programming technique).
I think the thing that you want corresponds more to yet another concept: Parallel Computing (or parallel processing). This approach is more about splitting a large processing task into smaller parts and processing all parts in parallel, and then combining the results. You should look into libraries like OpenMP or OpenCL/CUDA (for GPGPU). That said, you can use multi-threading for parallel processing.
but apparently asynchronous programming doesn't necessarily mean you are using multiple cores?
Asynchronous programming does not necessarily involve anything happening concurrently in multiple threads. It could mean that the OS is doing things on your behalf behind the scenes (and will notify you when that work is finished), like in asynchronous I/O, which happens without you creating any threads. It boils down to being a software design choice.
Why would you do this if you didn't have multiple cores to take advantage of?
If you don't have multiple cores, multi-threading can still improve performance by reusing "waiting time" (e.g., don't "block" the processing waiting on file or network I/O, or waiting on the user to click a mouse button). That means the program can do useful work while waiting on those things. Beyond that, it can provide flexibility in the design and make things seem to run simultaneously, which often makes users happier. Still, you are correct that before multi-core CPUs, there wasn't as much of an incentive to do multi-threading, as the gains often do not justify the overhead.
I think that, in general, all of these are design-related rather than language-related. The same applies to multicore programming.
To echo Jim, it's not only the file-load scenario. Generally, you need to design the whole piece of software to run concurrently in order to feel the real benefit of multi-threading, task-based, or asynchronous programming.
Try to see things from a big-picture point of view. Understand the overall modelling of a specific example and see how these methodologies are implemented. It'll be easy to see the differences, and it will help you understand when and where to use which.

Semaphore Mutex Concurrency Issue in multithreaded C program

How do I design a multithreaded C program to avoid semaphore/mutex concurrency issues?
There are a few ways.
The best way is to eliminate all shared data between threads. While this isn't always practical, it's always good to eliminate as much shared data as possible.
After that, you need to start looking into lockless programming. Lockless programming is a bit of a fad right now, but the dirty secret is that it's often a much better idea to use lock-based concurrency like mutexes and semaphores. Lockless programming is very hard to get right. Look up Herb Sutter's articles on the subject, or the Wikipedia page. There are a lot of good resources about lockless synchronization out there.
Somewhere in between are critical sections. If you're programming on Windows, critical sections should be preferred to mutexes, as they do some work to avoid the overhead of a full mutex lock and unlock. Try those first, and if your performance is unacceptable (or you're targeting platforms without critical sections), then you can look into lockless techniques.
It would be best to write lock-free code. It's hard, but possible.
GCC Atomic Builtins
Article by Andrei Alexandrescu (C++)
Be sure, when passing data structures between threads, to always know which thread the data exclusively belongs to. If you use (as mentioned by Dan before) e.g. lockless queues to pass your data around, you shouldn't run into too many concurrency issues, as your code then behaves much like any other code waiting for some data to arrive.
If you are, however, migrating single-threaded code to multithreaded code, that is an entirely different beast. It's very hard, and most of the time there are no elegant solutions.
Also, have a look at the InterlockedXXX series of functions to perform atomic operations in a multi-threaded environment.
InterlockedIncrement Function
InterlockedDecrement Function
InterlockedExchange Function