When to use multithreading in C++? [closed] - c++

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I am an intermediate C++ programmer and am learning multithreading now. I find it quite confusing to decide when to use multithreading in C++. How do I know which parts of my program need it?
When to use multithreading in C++?

When you have a resource-intensive task, such as a huge mathematical calculation, or an I/O-intensive task, such as reading or writing a file, you should use multithreading.
The purpose is to be able to run multiple tasks together, so that the performance and responsiveness of your application improve. Also, learn about synchronization before implementing multithreading in your application.

When to use multithreading in C++?
Well, the general rule of thumb is: use it when it can speed up your application. The answer isn't really language-dependent.
If you want to get an in-depth answer, then you have to consider a few things:
Is multithreading possible to implement inside your code? Do you have fragments which can be calculated at the same time and are independent of other calculations?
Is multithreading worth implementing? Does your program run slow even when you did all you could to make it as fast as possible?
Will your code be run on machines that support multithreading (so have multiple processing units)? If you're designing code for some kind of machine with only one core, using multithreading is a waste of time.
Is there a different option? A better algorithm, cleaning the code, etc? If so - maybe it's better to use that instead of multithreading?
Do you have to handle things that are hard to predict in time, while the whole application has to constantly run? For example - receiving some information from a server in a game?
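To make the first point of that checklist concrete (fragments that can be calculated at the same time and are independent of each other), here is a minimal sketch using std::thread. The names parallel_sum and max_threads are made up for illustration and do not come from the answer. Each thread writes only to its own slot, so no synchronization is needed beyond join().

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `data` by giving each worker thread its own contiguous, independent slice.
// `parallel_sum` and `max_threads` are hypothetical names used only for this sketch.
long long parallel_sum(const std::vector<int>& data, unsigned max_threads) {
    // hardware_concurrency() may return 0 when the core count is unknown.
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    const unsigned n_threads = std::max(1u, std::min(max_threads, hw));

    std::vector<long long> partial(n_threads, 0);  // one result slot per thread
    std::vector<std::thread> workers;
    const std::size_t chunk = (data.size() + n_threads - 1) / n_threads;

    for (unsigned t = 0; t < n_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each thread reads its own slice and writes only partial[t].
            partial[t] = std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();  // wait for every slice to finish

    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

On a single-core machine this degrades to one thread, which matches the point above that extra threads buy nothing for pure computation there.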

This is a slightly subjective subject... But I tend to use multi-threading in one of two situations.
1 - In a performance critical situation where the utmost power is needed (and the algorithm of course supports parallelism), for me, matrix multiplications.
2 - Rarely, where it may be easier to have a thread managing something fairly independent. The classic is networking: perhaps have a thread blocking while it waits for connections, and spawning threads to manage each connection as it comes in. This is useful because the threads can block and still respond in a timely manner. Say you have a server: one request might need disk access, which is slow, and another thread can jump in and field a different request while the first is waiting for its data.
As has been said by others, you should think about doing it only when you need to. It gets complicated fast and can be difficult to debug.

Multithreading is a specialized form of multitasking, and multitasking is the feature that allows your computer to run two or more programs concurrently.
I think this link can help you.
http://www.tutorialspoint.com/cplusplus/cpp_multithreading.htm

Mostly when you want things to be done at the same time. For instance, you may want a window to still respond to user input when a level is loading in a game or when you're downloading multiple files at once, etc. It's for things that really can't wait until other processing is done. Of course, both probably go slower as a result, but it really gives the illusion of multiple things happening at once.

Use multithreading when you can speed up your algorithms by doing things in parallel. Use it in opposition to multiprocessing when the threads need access to the parent process's resources.

My two cents.
Use cases:
Integrate your application in a lib/app that already runs a loop. You would need a thread of your own to run your code concurrently if you cannot integrate into the other app.
Task splitting. It makes sense to organize disjoint tasks in threads sometimes, such as in separating sound from image processing, for example.
Performance. When you want to improve the throughput of some task.
Recommendations:
In the general case, don't do multithreading if a single threaded solution will suffice. It adds complexity.
When needed, start with higher-order primitives, such as std::future and std::async.
When possible, avoid data sharing, which is the source of contention.
When going to lower level abstractions, such as mutexes and so on, encapsulate it in some pattern. You can take a look at these slides.
Decouple your functions from threading and compose the threading into the functions at a later point. Namely, don't embed thread creation into the logic of your code.
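A small sketch of two of these recommendations together: starting with std::async and std::future, and keeping thread creation out of the function that does the work. The function names sum_range and sum_in_two_halves are invented for this example.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Plain function: it knows nothing about threads, so it can be tested on its own.
long long sum_range(const std::vector<int>& v, std::size_t begin, std::size_t end) {
    return std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
}

// Threading is composed in at the call site, not inside sum_range.
long long sum_in_two_halves(const std::vector<int>& v) {
    const std::size_t mid = v.size() / 2;
    // std::launch::async requests that the first half runs on another thread.
    std::future<long long> first =
        std::async(std::launch::async, sum_range, std::cref(v), std::size_t{0}, mid);
    const long long second = sum_range(v, mid, v.size());  // second half on this thread
    return first.get() + second;  // get() blocks until the first half is done
}
```

Because the two halves share only read-only data, there is no mutex anywhere, which is the "avoid data sharing" recommendation in practice.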

Benefits of a multi thread program in a unicore system [duplicate]

Closed 9 years ago.
My professor casually mentioned that we should write multi-threaded programs even if we are using a unicore processor. However, because of a lack of time, he did not elaborate on it.
I would like to know: what are the benefits of a multi-threaded program on a unicore processor?
It won't be as significant as a multi-core system but it can still provide some benefits.
Most of the benefit comes from the context switch that happens when the currently executing thread has to wait. The executing thread may be waiting for anything, such as a hardware resource, a branch misprediction, or a data transfer after a cache miss.
At that point another, waiting thread can be run to make use of this "waiting time". Of course, the context switch itself takes some time. Managing threads in the code, rather than computing sequentially, can also add complexity to your program. And as has been said, some applications need to be multi-threaded, so in some cases there is no escaping the context switch.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
Benefits could be different.
One widely used example is a GUI application that has to perform some kind of computation. With a single thread, the user has to wait for the result before doing anything else with the application; if you start the computation in a separate thread, the user interface stays available during the computation. So a multi-threaded program can emulate a multitasking environment even on a unicore system. That's one of the points.
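A rough sketch of that GUI scenario without any real GUI toolkit: the computation runs in the background through std::async while the calling thread keeps spinning its own loop. slow_computation and run_with_responsive_loop are hypothetical names, and the sleep only stands in for real work.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Stand-in for a long computation; the sleep only simulates work.
int slow_computation() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 42;
}

// Runs the computation in the background while the caller keeps "handling events".
// Returns how many times the caller's loop body ran before the result arrived.
int run_with_responsive_loop(int& result) {
    std::future<int> pending = std::async(std::launch::async, slow_computation);
    int iterations = 0;
    // wait_for gives up after about 1 ms, so the loop is never blocked for long.
    while (pending.wait_for(std::chrono::milliseconds(1)) != std::future_status::ready) {
        ++iterations;  // a real GUI would process user input here
    }
    result = pending.get();
    return iterations;
}
```

This works the same way on one core: the background thread spends its time waiting, and the scheduler lets the loop run in between.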
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. "Future" and boost::asio provide ways of doing non-blocking stuff without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and which logic is actually in the thread. Some benefits you can even get on a single core:
A thread can wrap a blocking or long-running call that you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code, for example background polling for updates, monitoring some resource (e.g. free storage), or checking internet connectivity. If you keep these in a separate thread, you can keep the code relatively simple in its own 'runtime' without caring too much about the impact on the main program. The sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS scheduling system works, but if this allocates time per thread, the more threads you have the more your app will be scheduled.
Some benefits long-term:
Where it's not hard to do, you benefit if your hardware evolves. You never know what's going to happen: today your app runs on a single-core embedded device, tomorrow that embedded device gets a quad core. Programming with threads from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash, so that all related operations end up in the same thread. The advantage for single cores is 'small', but it's not hard to do, because you need few synchronization primitives and so the overhead stays small.
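The hash-based assignment could look roughly like the sketch below. Update, worker_for and apply_sharded are names invented here, and worker_count is assumed to be greater than zero. Since every key is owned by exactly one worker, the workers need no mutex at all.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Update = std::pair<std::string, int>;  // (key, amount), a hypothetical work item

// Maps a key to a worker index; the same key always maps to the same worker
// within one run of the program. Assumes worker_count > 0.
std::size_t worker_for(const std::string& key, std::size_t worker_count) {
    return std::hash<std::string>{}(key) % worker_count;
}

// Applies all updates. Each worker owns the totals for its own keys.
std::map<std::string, int> apply_sharded(const std::vector<Update>& updates,
                                         std::size_t worker_count) {
    // Route every update to the inbox of the worker that owns its key.
    std::vector<std::vector<Update>> inbox(worker_count);
    for (const auto& u : updates) inbox[worker_for(u.first, worker_count)].push_back(u);

    std::vector<std::map<std::string, int>> totals(worker_count);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&inbox, &totals, w] {
            // Only this thread touches inbox[w] and totals[w].
            for (const auto& u : inbox[w]) totals[w][u.first] += u.second;
        });
    }
    for (auto& t : workers) t.join();

    // Key sets are disjoint across workers, so merging is a plain insert.
    std::map<std::string, int> merged;
    for (const auto& part : totals) merged.insert(part.begin(), part.end());
    return merged;
}
```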
That said, I think there are situations where it's very ill-advised:
As soon as your required synchronization mechanism with other threads becomes complex (e.g. multiple locks, lots of critical sections, ...). It might still be then that multi-threading gives you a benefit when effectively moving to multiple CPUs, but the overhead is huge both for your single core and your programming time.
For instance, think about operations that block because of slow peripheral devices (hard disk access etc.). While these are waiting, even a single core can do other things asynchronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for completion of I/O requests (user input, network/disk I/O), for critical resources to become available, or for any sort of asynchronously triggered event, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multitasking concepts like asynchronous I/O, coroutines, or fibers come to mind.
If however the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel upfront.
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores will double every 18 months in the future. People have learned single-threaded programming for 50 years now, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This refers to low-level aspects like avoiding race conditions and proper synchronization, as well as the high-level aspects like the general algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, scalability and the development of the skills that are required to achieve these goals.

Document locking in multithreading environment [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
We have an application that supports binary plugins (dynamically loaded libraries) as well as a number of plugins for this application. The application is itself multithreaded and the plugins may also start threads. There's a lot of locking going on to keep data structures consistent.
One major problem is that sometimes locks are held across calls from the application into a plugin. This is problematic because the plugin code might want to call back into the application, producing a deadlock. This problem is aggravated by the fact that different teams work on the base application and the plugins.
The question is: Is there a "standard" or at least widely used way of documenting locking schemes apart from writing tons of plain text?
This is a theoretical approach; I hope it will help you a little.
To me, you can avoid this situation by redesigning the way the plugins and your application communicate (if possible).
A plugin's code is not safe. To ensure the application's flexibility and stability, you must build a standard way to exchange information and perform critical actions with plugins.
The easiest way is to avoid managing each specific plugin's behavior by defining a lock-free API.
To do that, you can make the critical parts of your plugins asynchronous by using a ring buffer / disruptor or just an action buffer.
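To give an idea of what "just an action buffer" could mean, here is a minimal sketch. Note the assumptions: ActionBuffer, post and drain are hypothetical names, and this is a simple mutex-protected queue standing in for the idea, not a lock-free ring buffer or the Disruptor itself. The point is only that the plugin-facing API exposes no locks: plugins enqueue requests and return, and the application runs them later on its own thread.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

class ActionBuffer {
public:
    // Called by plugins from any thread: records the request and returns immediately.
    void post(std::function<void()> action) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(action));
    }

    // Called by the application on the thread that owns the data.
    // Actions run after the internal mutex is released, so an action may call post().
    std::size_t drain() {
        std::queue<std::function<void()>> batch;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(batch, pending_);  // take everything queued so far
        }
        std::size_t executed = 0;
        while (!batch.empty()) {
            batch.front()();
            batch.pop();
            ++executed;
        }
        return executed;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};
```

An action posted while drain() is running lands in the next batch, so a plugin calling back into the application cannot deadlock on this buffer.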
EDIT
Sorry if I argue again in the same way, but this seems to me to be like an "IO" problem.
You have concurrent access to some resources (memory/disk/network... I don't know which ones) and the need to expose them with high availability. And finally, these resources cannot be accessed randomly without locking your application.
With a manager dedicated on the critical parts, the wait can be short enough to be imperceptible.
However this is not easily applicable to an already existing application, mostly if it is a large one.
If you don't already know this kind of stuff, I encourage you to look at the "disruptor". To me it is one of the modern basics to consider every time I work with threads.
I suggest using Petri nets, which are simple to learn and can describe very well the cooperation among the different parts of your software. This question describes several models and tools useful for documenting concurrency: https://stackoverflow.com/questions/164187/what-tools-diagrams-do-you-use-for-modelling-multithreaded-systems. You can choose the right model according to your needs.
If your locking scheme is simple enough that you can describe it in documentation, then by all means do so. However, if deadlocks are occurring in practice, the problem may not be lack of documentation, but that the API is not serving the needs of your plugin authors. Documenting the limitations is a good first step, but removing the limitations is better.
Consider the possibilities for a deadlock on a single lock held by your code and requested by the plugin:
Your code is not in the middle of reading or writing, but is still holding the lock just because that's how the code was written. In that case, your code should release the lock before calling into the plugin.
Your code and the plugin are both reading data, and using the lock to prevent concurrent writers. In that case, use a readers-writers lock.
Your code is in the middle of changing data, and the plugin wants to read it. This is not generally safe; there's a reason you're using a lock to protect the entire modification, after all. Most attempts to make this safe fail in practice (it is as hard as writing lock-free code). In this case, the best thing to do is change your design so your code finishes changes before calling the plugin, or starts changes after calling the plugin.
Your code is in the middle of reading data, and the plugin wants to change it. Like the previous case, this is also not safe. Your code should release the lock before calling the plugin and acquire it again afterward, and assume the data have changed, re-reading anything you need to continue.
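The first and last cases share one remedy: release the lock before calling into the plugin. As a sketch, assuming a hypothetical Document class with a single listener callback standing in for the plugin, it could look like this. The state is copied while the lock is held, and the callback runs only after the lock is gone, so the plugin may safely call back into the application.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical application object; the listener plays the role of the plugin.
class Document {
public:
    using Listener = std::function<void(const std::string&)>;

    void set_listener(Listener l) {
        std::lock_guard<std::mutex> lock(mutex_);
        listener_ = std::move(l);
    }

    std::string text() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return text_;
    }

    void set_text(std::string new_text) {
        Listener to_call;
        std::string snapshot;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            text_ = std::move(new_text);
            snapshot = text_;     // copy what the plugin needs while protected
            to_call = listener_;
        }  // lock released here, before any plugin code runs
        if (to_call) to_call(snapshot);
    }

private:
    mutable std::mutex mutex_;
    std::string text_;
    Listener listener_;
};
```

If set_text invoked the listener while still holding mutex_, a listener that calls text() would try to lock the same std::mutex again on the same thread, which is exactly the deadlock described in the question.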
This is the best advice I can give without knowing anything more about your application and its specific needs.
For most applications, software companies shy away from 3rd party binary plugins in the same process because when something goes wrong, it is very difficult to figure out why. Users usually blame the application, not the plugin, and the perception of the quality of your application is poor. It can be made to work by keeping very close relationships with your plugin authors, usually including exchanging all source code (optionally under restrictive licenses or NDAs).
Yes, there is a standard way of documenting locking schemes, as used in universities.
1/ Use a diagram
You must draw a diagram. Each point on the diagram is a lock linked to another thread.
ex:
T1          T2
1   -R->    A
2   <-W-    B
2/ Use a table
You must write down each point and thread on each row.
ex:
T1          T2
lockX(A)    lockS(B)
read(A)     read(B)
A<-A50      unlock(B)
Conclusion: this is a very complex task and takes a lot of time to trace.

Thread per connection vs Reactor pattern (with a thread pool)?

I want to write a simple multiplayer game as part of my C++ learning project.
So I thought, since I am at it, I would like to do it properly, as opposed to just getting-it-done.
If I understood correctly: Apache uses a Thread-per-connection architecture, while nginx uses an event-loop and then dedicates a worker [x] for the incoming connection. I guess nginx is wiser, since it supports a higher concurrency level. Right?
I have also come across this clever analogy, but I am not sure if it could be applied to my situation. The analogy also seems to be very idealist. I have rarely seen my computer run at 100% CPU (even with a umptillion Chrome tabs open, Photoshop and what-not running simultaneously)
Also, I have come across a SO post (somehow it vanished from my history) where a user asked how many threads they should use, and one of the answers was that it's perfectly acceptable to have around 700, even up to 10,000 threads. This question was related to JVM, though.
So, let's estimate a fictional user base of around 5,000 users. Which approach would be the "most concurrent" one?
A reactor pattern running everything in a single thread.
A reactor pattern with a thread pool (approximately how big do you suggest the thread pool should be?).
Creating a thread per connection and then destroying the thread when the connection closes.
I admit option 2 sounds like the best solution to me, but I am very green in all of this, so I might be a bit naive and missing some obvious flaw. Also, it sounds like it could be fairly difficult to implement.
PS: I am considering using POCO C++ Libraries. Suggesting any alternative libraries (like boost) is fine with me. However, many say POCO's library is very clean and easy to understand. So, I would preferably use that one, so I can learn about the hows of what I'm using.
Reactive Applications certainly scale better, when they are written correctly. This means
Never blocking in a reactive thread:
Any blocking will seriously degrade the performance of your server. You typically use a small number of reactive threads, so blocking can also quickly cause deadlock.
No mutexes, since these can block, so no shared mutable state. If you require shared state you will have to wrap it with an actor or similar so that only one thread has access to the state.
All work in the reactive threads should be cpu bound
All IO has to be asynchronous or be performed in a different thread pool and the results feed back into the reactor.
This means using either futures or callbacks to process replies, this style of code can quickly become unmaintainable if you are not used to it and disciplined.
All work in the reactive threads should be small
To maintain responsiveness of the server all tasks in the reactor must be small (bounded by time)
On an 8-core machine you cannot allow 8 long tasks to arrive at the same time, because no other work will start until they are complete.
If a task could take a long time, it must be broken up (cooperative multitasking).
Tasks in reactive applications are scheduled by the application not the operating system, that is why they can be faster and use less memory. When you write a Reactive application you are saying that you know the problem domain so well that you can organise and schedule this type of work better than the operating system can schedule threads doing the same work in a blocking fashion.
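To illustrate "tasks must be small" and "a long task must be broken up", here is a toy single-threaded task loop. EventLoop and sum_in_chunks are hypothetical names, this is not the API of any real reactor framework, and chunk is assumed to be greater than zero. The long job processes one slice and then re-posts itself, so tasks queued in the meantime get their turn.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A minimal single-threaded task loop (illustrative only).
class EventLoop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }

    // Runs tasks in FIFO order until the queue is empty.
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();  // a task may post further tasks
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Adds up `data` in slices of `chunk` elements. After each slice it re-posts
// the remainder instead of looping, which yields to other queued tasks.
void sum_in_chunks(EventLoop& loop, const std::vector<int>& data, std::size_t pos,
                   std::size_t chunk, long long& total, std::vector<std::string>& log) {
    const std::size_t end = std::min(data.size(), pos + chunk);
    for (std::size_t i = pos; i < end; ++i) total += data[i];
    log.push_back("chunk");
    if (end < data.size()) {
        loop.post([&loop, &data, end, chunk, &total, &log] {
            sum_in_chunks(loop, data, end, chunk, total, log);
        });
    }
}
```

Here the application, not the operating system, decides when the long job gives way, which is the trade described above: less overhead, but the discipline is entirely yours.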
I am a big fan of reactive architectures but they come with costs. I am not sure I would write my first c++ application as reactive, I normally try to learn one thing at a time.
If you decide to use a reactive architecture use a good framework that will help you design and structure your code or you will end up with spaghetti. Things to look for are:
What is the unit of work?
How easy is it to add new work? Can it only come in from an external event (e.g. a network request)?
How easy is it to break work up into smaller chunks?
How easy is it to process the results of this work?
How easy is it to move blocking code to another thread pool and still process the results?
I cannot recommend a C++ library for this. I now do my server development in Scala and Akka, which provide all of this, with an excellent composable futures library to keep the code clean.
Best of luck learning C++ and with whichever choice you make.
Option 2 will most efficiently occupy your hardware. Here is the classic article, ten years old but still good.
http://www.kegel.com/c10k.html
The best library combination these days for structuring an application with concurrency and asynchronous waiting is Boost Thread plus Boost ASIO. You could also try a C++11 std thread library, and std mutex (but Boost ASIO is better than mutexes in a lot of cases, just always callback to the same thread and you don't need protected regions). Stay away from std future, cause it's broken:
http://bartoszmilewski.com/2009/03/03/broken-promises-c0x-futures/
The optimal number of threads in the thread pool is one thread per CPU core. 8 cores -> 8 threads. Plus maybe a few extra, if you think it's possible that your threadpool threads might call blocking operations sometimes.
FWIW, Poco supports option 2 (ParallelReactor) since version 1.5.1
I think that option 2 is the best one. As for tuning of the pool size, I think the pool should be adaptive. It should be able to spawn more threads (with some high hard limit) and remove excessive threads in times of low activity.
As the analogy you linked to (and its comments) suggest, this is somewhat application-dependent. Now, what you are building here is a game server. Let's analyze that.
Game servers (generally) do a lot of I/O and relatively few calculations, so they are far from 100% CPU applications.
On the other hand, they also usually change values in some database (a "game world" model). All players create reads and writes to this database, which is exactly the intersection problem in the analogy.
So while you may gain some from handling the I/O in separate threads, you will also lose from having separate threads accessing the same database and waiting for its locks.
So either option 1 or option 2 is acceptable in your situation. For scalability reasons I would not recommend option 3.

C++ Server - To Thread or not to Thread?

I'm working on a game server, written in C++, and I'm trying to decide how many threads to use and what tasks to thread. The basic server skeleton consists of keyboard I/O and output to a console, accepting incoming connects, sending outgoing connects, and doing the game "stuff".
What I'd like to know is which things should be given a separate thread. Should each connect have its own thread? I know this is variable, it depends on the project or so, but I would like it to support a pretty decent number of players (somewhere in the hundreds if possible).
The standard answer should always be: Try it the simplest way first, and only look for ways to improve performance if the simple way isn't good enough. However, re-architecting a large C++ program can be a painful experience, so some guesses about performance in advance may be appropriate.
Theoretically, hundreds of threads are probably OK on modern machines. The NPTL implementation for Linux was tested with tens of thousands of threads, as I recall. If that's the easiest way for you to implement, it may be the right answer.
However, high-performance web servers and similar typically use event-driven models instead. Consider a library like libevent. I'm sure there are C++ libraries for the same purpose.
I personally believe that languages without first-class continuations, or at least coroutines, are poor choices for this kind of work, but the C language family is how we get work done today, so off we go. :-)
A good solution could be to use a Thread pool.
The idea is to let the main thread dispatch all connections evenly among a fixed number of threads.
With a good design, you can easily set the number of threads at runtime.
You can find more information here.
Creating more threads than you have CPU cores is not productive, and adding too many threads decreases performance because of the time taken to switch between threads.
For example, when compiling a large project (it's not exactly the same thing, but it's valid for both cases), it's often recommended to use no more threads than the number of CPU cores + 1.
A very common technique is to have the game server run on one thread to monitor several connections (i.e. sockets) by using a select on each socket. When data is available, grab the data and enqueue it in a producer/consumer type model for the game engine to pick up.
This is by no means the be-all-end-all implementation, but it should be enough to get you started. Sounds like a cool project. Good luck!
If you set up the connections and use them in a manner that causes the thread to block waiting on I/O, then you should be able to service all of the connections and the keyboard on one thread. You may not want to put the console output on that same thread, as I've seen cases (on Windows at least) where the speed of writing to the console is actually a bottleneck (i.e. if the console window is minimized, the process runs considerably faster).
If the work of your game engine parallelizes well, then you probably want to use as many threads as there are CPUs, less one (for the OS and the other two threads). If you expect the client to run on the same machine, the server will want to detect that and scale back the number of threads it uses.

Large number of simultaneous long-running operations in Qt

I have some long-running operations that number in the hundreds. At the moment they are each on their own thread. My main goal in using threads is not to speed these operations up. The more important thing in this case is that they appear to run simultaneously.
I'm aware of cooperative multitasking and fibers. However, I'm trying to avoid anything that would require touching the code in the operations, e.g. peppering them with things like yieldToScheduler(). I also don't want to prescribe that these routines be stylized to be coded to emit queues of bite-sized task items...I want to treat them as black boxes.
For the moment I can live with these downsides:
Maximum # of threads tend to be O(1000)
Cost per thread is O(1MB)
To address the bad cache performance due to context-switches, I did have the idea of a timer which would juggle the priorities such that only idealThreadCount() threads were ever at Normal priority, with all the rest set to Idle. This would let me widen the timeslices, which would mean fewer context switches and still be okay for my purposes.
Question #1: Is that a good idea at all? One certain downside is it won't work on Linux (docs say no QThread::setPriority() there).
Question #2: Any other ideas or approaches? Is QtConcurrent thinking about this scenario?
(Some related reading: how-many-threads-does-it-take-to-make-them-a-bad-choice, many-threads-or-as-few-threads-as-possible, maximum-number-of-threads-per-process-in-linux)
IMHO, this is a very bad idea. If I were you, I would try really, really hard to find another way to do this. You're combining two really bad ideas: creating a truck load of threads, and messing with thread priorities.
You mention that these operations only need to appear to run simultaneously. So why not try to find a way to make them appear to run simultaneously, without literally running them simultaneously?
It's been 6 months, so I'm going to close this.
Firstly I'll say that threads serve more than one purpose. One is speedup...and a lot of people are focusing on that in the era of multi-core machines. But another is concurrency, which can be desirable even if it slows the system down when taken as a whole. Yet concurrency can be achieved using mechanisms more lightweight than threads, although it may complicate the code.
So this is just one of those situations where the tradeoff of programmer convenience against user experience must be tuned to fit the target environment. It's how Google's approach to a process-per-tab with Chrome would have been ill-advised in the era of Mosaic (even if process isolation was preferable with all else being equal). If the OS, memory, and CPU couldn't give a good browsing experience...they wouldn't do it that way now.
Similarly, creating a lot of threads when there are independent operations you want to be concurrent saves you the trouble of sticking in your own scheduler and yield() operations. It may be the cleanest way to express the code, but if it chokes the target environment then something different needs to be done.
So I think I'll settle on the idea that in the future when our hardware is better than it is today, we'll probably not have to worry about how many threads we make. But for now I'll take it on a case-by-case basis. i.e. If I have 100 of concurrent task class A, and 10 of concurrent task class B, and 3 of concurrent task class C... then switching A to a fiber-based solution and giving it a pool of a few threads is probably worth the extra complication.