Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed last month.
Improve this question
I have around 50 similar tasks - each one performs network calls to some server and writes the response to the db. Each task has its own set of calls it should make. Each task talks to one server, if that's relevant.
I can have one process, each running its own task with its own set of calls it should make (so no memory sharing is needed between the threads). Another option is a process for each task. Or a combination.
Should I choose multiple threads because switching between processes is more costly?
It runs on Ubuntu.
I don't care about the cost of the creation of the thread/process. I want to start them and let them run for a long time.
Strictly in terms of performance, it comes down to initialization time because as far as execution time goes, in this particular case that there is no interprocess communication required, execution time will be very similar.
Threads are faster to initialize since there is less duplication of resources to initialize.
The so much advertised fact that "threads switch faster than process" like in this StackOverflow question/answer is a preposterous non-sequitur. If a batch is running at 100% cpu and there are enough free cores to supply the demand, the OS will not ever bother to switch threads into new cores. This will only happen when threads block and wait for input (or output chance) from network or disk - in which case the price of 3-20 microseconds will completely overshadow the few nanoseconds of advantage of an additional memory cache miss.
The only reason why the operating system would preempt a thread or process other than I/O block is when the machine is running out of computing resources, ie when it is running at more than 100% on every single available CPU. But this is such a special, degenerated case that should not belong in a generic response.
However if the running time is much larger than the initialization time, say the process runs for several minutes while it takes a few milliseconds to start up, then performance should definitely be the same and your requirement should be other like facility of administration of the process batch. Another point to keep in mind is that if one process dies, the others finish, while if one thread crashes, the entire batch dies as well.
There might be some extra benefits of using threads as well due to the lesser use of system resources but I believe these would be negligible.
Related
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 months ago.
Improve this question
I am using Pagmo, a C+ API for my optimization problem. Pagmo launches a new thread when a new optimization is launched, via invocation of island.evolve(). MY point is that I don't have fine-grained control here over the type of thread that's launched "under the hood" of Pagmo. I can query Pagmo threads on their status - as to whether they've completed their run. I have a machine with 28 physical cores and I'm thinking that the optimum number of my running threads would be on the order of 28. Right now, my code just dumps the whole lot of threads onto the machine and that's substantially more than the number of cores - and hence, likely very inefficient.
I'm thinking of using a std::counting_semaphore (C++ 20) and setting the counter to 28. each time I launch a thread in Pagmo, I would decrement the semaphore counter. When it hits 0, the remaining threads would block and wait in the queue until the counter was incremented.
I'm thinking I could run a loop which queried the Pagmo threads as to their statuses and increment the std::counting_semaphore's counter each time a thread went idle (meaning its task was completed). Of course, the Pagmo threads are ultimately joined. Each time the counter goes above 0, a new thread is allowed to start - so I believe.
My questions are:
is the the best practice as to how to limit/throttle the number of running threads using modern C++?
Is there a better way I haven't thought of?
Is there a way to query Linux in real time, as to the number of running threads?
Thanks in advance!
Phil
I had tried a simple loop to launch and throttle theads creation but it didn't prove to work well and threads were launched too quickly.
First of all, your post could use some editing and even perhaps provide a code snippet that would help us understand the problem more. Right now I'm only going through the documentation based on a wild guess of what you are doing there.
I've quickly checked what Pagmo is about and I would at first advise to be careful when limiting any library that is designed for parallel computation, from outside of the library.
I will try to answer your questions:
I do not think that this is the best way to throttle threads created by an external library
Yes, first of all I've checked the Pagmo API documentation and if I understand you correctly you are using an island class - based on what they state in their documentation the default class that inherits island and is constructed by the default ctor is thread_island (at least on non-POSIX systems - which may not be your case). However thread_island can be constructed via the thread_island(bool use_pool) ctor which indicates that you can specify to these island classes a common thread pool which they can use. And if it is done for non-POSIX systems, it is most likely done for POSIX systems as well. So if you wish to limit the number of threads I would do this via a thread pool.
You can limit the maximum number of threads running on linux via /proc/sys/kernel/threads-max, you can also instruct a Linux system to treat some processes with less importance via niceness
Hope it helps!
EDIT: As a foot note I will also mention that the documentation actually encourages the use of thread_island even on POSIX systems. See this link here
EDIT2: In case that you must use fork_island due to their mentioned issues when thread_safety cannot be guaranteed. Then another option would be to limit available resources via setrlimit see this link right here - you are interested in setting RLIMIT_NPROC
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 1 year ago.
Improve this question
Is it a real hardware thread?
I have a program which reads data from 30 COM devices every second and so far I only have access to 7. It works great when I implemented multithreading, one thread for each device and it doesn't block off my GUI while it waits to read data (it takes 30ms). I'm wondering though what will happen if I exceed the amount of threads I have on my CPU? If this isn't possible how would I approach this?
std::thread represents a thread, managed by the operating system. It has its stack, registers, instruction pointer etc. However, it is still managed by the OS. The operating system handles all the scheduling, assigning the thread to the hardware core and then preempting it if necessary to do another work on that core.
In a regular program you can't really lock core to do your work without any OS intervention. Otherwise, it could have negative impact on the stability of the system.
If you launch more threads than there are cores on your CPU, and they all run all the time, the OS will start swapping them in and out, effectively keeping them all running. However, this swapping is not for free, and you can slow everything down if you have too many of them.
However, if your threads are halted for whatever reason -- for example, you threads stop on a mutex, wait on a condition variable, or simply goes to sleep (e.g. via std::this_thread::sleep_for), then it no longer consume the hardware resources during that wait. In that scenario it is perfectly fine to have much more threads than there are cores on your CPU.
std::thread is not a "hardware thread". std::thread is a class in the C++ standard library. Instance of the std::thread class is an object that acts as a RAII container for a "native thread" i.e. a thread of execution provided by the API of the operating system.
When you create a std::thread (assuming you don't use the default constructor), the constructor will use the operating system API to create a native thread, and calls the passed function.
I'm wondering though what will happen if I exceed the amount of threads I have on my CPU?
The operating system has a subsystem called "process scheduler" which allocates the time of the hardware CPU cores (or logical core in case of hyper threading, which I assume is what you mean by "hardware thread") time for each of the threads running on the system. The number of (logical) cores the CPUs of the system has affects how many threads can be executed in parallel, but doesn't limit how many threads the operating system can manage.
As such, nothing in particular will happen or will stop happening. If your system has more threads ready to run than number of (logical) CPU cores, then the operating system will not be able to give CPU time to all of the threads in parallel.
Note that creating native threads has a performance penalty, and having more threads waiting to run (excluding those waiting for disk or network) than the number of cores to execute them will reduce the performance of the system.
I'm wondering though what will happen if I exceed the amount of threads I have on my CPU?
You might experience a lot of (unneeded) task switching, each taking 1us to 22ms depending on what exactly ran before.
Depending on your OS or setup you can get the serial ports to do a lot of work for you. Also the amount of actual work on the individual receiving COM port matters.
OS sends a message that there are some data you can read from a given buffer.
OS wakes your thread as there is something to read on the port
Interrupt, wakes the thread and returns
In all cases a single thread might be able to handle all 30 COM's as most of the time is waiting for the very slow serial ports to send the data.
Most serial ports are buffered and only need to be emptied after several chars have been received. Some serial cards have DMA so you don't even need to empty it yourself.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
A feasible way to achieve a suitable time delay is to use busy-waiting, however
what are the advantages and disadvantages of using a busy-waiting or timer interrupts approach programming?
There are probably many, I will try to address what seems most important to me.
Advantages of busy-waiting:
The execution flow is usually easier to comprehend and thus less error prone.
Timing can be determined more accurately in some cases
Disadvantages:
No other code (except perhaps other interrupt routines) can be executed.
CPU-time is wasted: If no other work must be processed it is still more efficient to set some powersaving-state and let a timer interrupt wake it up in time.
A disadvantage of busy-waiting in embedded devices is the increased power consumption. In a busy wait, the processor is running full-blast, consuming power with no result. Most low power processors have the ability to put the processor to sleep while waiting for a timer interrupt, reducing power consumption dramatically. Lower power consumption = longer battery life.
Unless you have nothing else to do in your application or the result needs to be handled immediately (which is rather rare), you don't want to busy wait. It eats up cycles that could be used doing something else or sleeping.
A simple example is let's say you're making a wifi thermostat that communicates to a wifi chip via UART. Your application will need to read and process the temperature, update when new data is available, send out wifi messages, receive wifi messages and receive updates from button pushes just to name a few. If you ware busy waiting for any one of these to happen, than your thermostat can't do anything else unless it is though interrupt.
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 8 years ago.
Improve this question
I am developing a C++ application, using pthreads library. Every thread in the program accesses a common unordered_map. The program runs slower with 4 threads than 1. I commented all the code in thread, and left only the part that tokenizes a string. Still the single-threading execution is faster, so I came to the conclusion that the map wasn't the problem.
After that I printed to the screen the threads' Ids, and they seemed to execute sequentially.
In the function that calls the threads, I have a while loop, which creates threads in an array, which size is the number of threads (let's say 'tn'). And every time tn threads are created, I execute a for loop to join them. (pthread_join). While runs many times(not only 4).
What may be wrong?
If you are running a small trivial program this tends to be the case because the work to start the threads, schedule priority, run, context switch, then sync could actually take more time then running it as a single process.
The point here is that when dealing with trivial problems it can run slower. BUT another factor might be how many cores you actually have in your CPU.
When you run a multitthreaded program, each thread will be processed sequentially according to the given CPU clock.
You will only have true multithreading if you have multiple cores. And in such scenario the only multithreading will be 1 thread /core.
Now, given the fact that you (most likely) have both threads on one core, try to keep in mind the overhead generated to the CPU for :
allocating different clock time for each thread
synchronizing thread accesses to various internal CPU operations
other thread priority operations
So in other words, for a simple application, multithreading is actually a downgrade in terms of performance.
Multithreading comes in handy when you need a asynchronous operation (meaning you don't want to wait for a rezult, such as loading an image from an url or streaming geomtery from HDD which is slower then ram) .
In such scenarios, applying multithreading will lead to better user experience, because your program won't hung up when a slow operation occurrs.
Without seeing the code it's difficult to tell for sure, but there could be a number of issues.
Your threads might not be doing enough work to justify their creation. Creating and running threads is expensive, so if your workload is too small, they won't pay for themselves.
Execution time could be spent mostly doing memory accesses on the map, in which case mutually excluding the threads means that you aren't really doing much parallel work in practice (Amdahl's Law).
If most of your code is running under a mutex that it will run serially and not in parllel
This question already has answers here:
How can multithreading speed up an application (when threads can't run concurrently)?
(9 answers)
Closed 9 years ago.
My professor causally mentioned that we should program multi-thread programs even if we are using a unicore processor however because of the lack of time , he did not elaborate on it .
I would like to know what are the benefits of a multi-thread program in a unicore processor ??
It won't be as significant as a multi-core system but it can still provide some benefits.
Mainly all the benefits that you are going to get will be regarding to the context switch that will happen after a input miss to the already executing thread. Executing thread may be waiting for anything such as a hardware resource or a branch mis-prediction or even data transfer after a cache miss.
At this point the waiting thread can be executed to benefit from this "waiting time". But of course context switch will take some time. Also managing threads inside the code rather than sequential computation can create some extra complexity to your program. And as it has been said, some applications needs to be multi-threaded so there is no escape from the context switch in some cases.
Some applications need to be multi-threaded. Multi-threading isn't just about improving performance by using more cores, it's also about performing multiple tasks at once.
Take Skype for example - The GUI needs to be able to accept the text you're entering, display it on the screen, listen for new messages coming from the user you're talking to, and display them. This wouldn't be a trivial task in a single threaded application.
Even if there's only one core available, the OS thread scheduler will give you the illusion of parallelism.
Usually it is about not blocking. Running many threads on a single core still gives the illusion of concurrency. So you can have, say, a thread doing IO while another one does user interactions. The user interaction thread is not blocked while the other does IO, so the user is free to carry on interacting.
Benefits could be different.
One of the widely used examples is the application with GUI, which supposed to perform some kind of computations. If you will have a single thread - the user will have to wait the result before dealing something else with the application, but if you start it in the separate thread - user interface could be still available for user during the computation process. So, multi-thread program could emulate multi-task environment even on a unicore system. That's one of the points.
As others have already mentioned, not blocking is one application. Another one is separation of logic for unrelated tasks that are to be executed simultaneously. Using threads for that leaves handling of scheduling these tasks to the OS.
However, note that it may also be possible to implement similar behavior using asynchronous operations in a single thread. "Future" and boost::asio provide ways of doing non-blocking stuff without necessarily resorting to multiple threads.
I think it depends a bit on how exactly you design your threads and which logic is actually in the thread. Some benefits you can even get on a single core:
A thread can wrap a blocking/long-during call you can't circumvent otherwise. For some operations there are polling mechanisms, but not for all.
A thread can wrap an almost standalone part of your application that has virtually no interaction with other code. For example background polling for updates, monitoring some resource (e.g. free storage), checking internet connectivity. If you keep them in a separate thread you can keep the code relatively simple in its own 'runtime' without caring too much about the impact on the main program, the sole communication with the main logic is usually a single 'event'.
In some environments you might get more processing time. This mainly depends on how your OS scheduling system works, but if this allocates time per thread, the more threads you have the more your app will be scheduled.
Some benefits long-term:
Where it's not hard to do you benefit if your hardware evolves. You never know what's going to happen, today your app runs on a single-core embedded device, tomorrow that embedded device gets a quad core. Programming threaded from the beginning improves your future scalability.
One example is an environment where you can deterministically assign work to a thread, e.g. based on some hash all related operations end up in the same thread. The advantage for single cores is 'small' but it's not hard to do as you need little synchronization primitives so the overhead stays small.
That said, I think there are situations where it's very ill advise:
As soon as your required synchronization mechanism with other threads becomes complex (e.g. multiple locks, lots of critical sections, ...). It might still be then that multi-threading gives you a benefit when effectively moving to multiple CPUs, but the overhead is huge both for your single core and your programming time.
For instance think about operations that block because of slow peripheral devices (harddisk access etc.). While these are waiting, even the single core can do other things asyncronously.
In a lot of applications the bottleneck is not CPU processing power. So when the program flow is waiting for completion of IO requests (user input, network/disk IO), critical resources to be available, or any sort of asynchroneously triggered events, the CPU can be scheduled to do other work instead of just blocking.
In this case you don't necessarily need multiple threads that can actually run in parallel. Cooperative multi-tasking concepts like asynchroneous IO, coroutines, or fibers come into mind.
If however the application's bottleneck is CPU processing power (constantly 100% CPU usage), then it makes sense to increase the number of CPUs available to the application. At that point it is easier to scale the application up to use more CPUs if it was designed to run in parallel upfront.
As far as I can see, one answer was not yet given:
You will have to write multithreaded applications in the future!
The average number of cores will double every 18 months in the future. People have learned single-threaded programming for 50 years now, and now they are confronted with devices that have multiple cores. The programming style in a multi-threaded environment differs significantly from single-threaded programming. This refers to low-level aspects like avoiding race conditions and proper synchronization, as well as the high-level aspects like the general algorithm design.
So in addition to the points already mentioned, it's also about writing future-proof software, scalability and the development of the skills that are required to achieve these goals.