Kernel threads in POSIX - C++

As far as I understand, the kernel has kernel threads for each core in a computer, and threads from user space are scheduled onto these kernel threads (the OS decides which thread from an application gets connected to which kernel thread). Let's say I want to create an application that uses X cores on a computer with X cores. If I use regular pthreads, I think it would be possible for the OS to schedule all the threads I created onto a single core. How can I ensure that each thread is one-to-one with a kernel thread?

You should basically trust the kernel you are using (in particular because there could be another heavy process running; the kernel scheduler chooses which tasks run during each quantum of time).
Perhaps you are interested in CPU affinity, via non-portable functions like pthread_attr_setaffinity_np.

Your understanding is a bit off. "Kernel threads" on Linux are basically kernel tasks that are scheduled alongside other processes and threads. When the kernel's scheduler runs, the scheduling algorithm decides which process/thread, out of the pool of runnable threads, will run next on a given CPU core. As @Basile Starynkevitch mentioned, you can tell the kernel to pin individual threads of your application to a particular core, which means the scheduler will only consider running them on that core, alongside other threads that are not pinned to a particular core.
In general with multithreading, you don't want the number of threads to equal the number of cores unless you're doing exclusively CPU-bound processing; otherwise you want more threads than cores. When waiting for network or disk I/O (i.e. while blocked in accept(2), recv(2), or read(2)), your thread is not considered runnable. If you have more threads than cores, the operating system may be able to schedule another of your threads to do work while one waits for that I/O.

What you describe is one possible model for implementing threading, but a given POSIX thread implementation need not follow such a hierarchical model at all. Since somebody already mentioned Linux: it doesn't have one; there, all threads are equal from the scheduler's point of view. They compete for the same resources unless you specify something extra.
The last time I saw such a hierarchical model was on a machine running IRIX, a long time ago.
So in summary: there is no general rule under POSIX for this; you'd have to look up the documentation of your particular OS, or ask a more specific question about it.

Related

What exactly is an std::thread? [closed]

Is it a real hardware thread?
I have a program which reads data from 30 COM devices every second, though so far I only have access to 7. It works great now that I've implemented multithreading, one thread per device, and it doesn't block my GUI while it waits to read data (a read takes 30 ms). I'm wondering, though, what will happen if I exceed the number of threads my CPU has? If that isn't possible, how would I approach this?
std::thread represents a thread, managed by the operating system. It has its own stack, registers, instruction pointer, etc. However, it is still managed by the OS. The operating system handles all the scheduling, assigning the thread to a hardware core and then preempting it if necessary to do other work on that core.
In a regular program you can't really lock core to do your work without any OS intervention. Otherwise, it could have negative impact on the stability of the system.
If you launch more threads than there are cores on your CPU, and they all run all the time, the OS will start switching them in and out, effectively keeping them all running. However, this switching is not free, and you can slow everything down if you have too many of them.
However, if your threads are halted for whatever reason -- for example, a thread blocks on a mutex, waits on a condition variable, or simply goes to sleep (e.g. via std::this_thread::sleep_for) -- then it no longer consumes hardware resources during that wait. In that scenario it is perfectly fine to have many more threads than there are cores on your CPU.
std::thread is not a "hardware thread". std::thread is a class in the C++ standard library. An instance of std::thread is an object that acts as a RAII-style handle for a "native thread", i.e. a thread of execution provided by the operating system's API.
When you create a std::thread (assuming you don't use the default constructor), the constructor uses the operating system API to create a native thread, which then calls the passed function.
I'm wondering though what will happen if I exceed the amount of threads I have on my CPU?
The operating system has a subsystem called the "process scheduler", which allocates the time of the hardware CPU cores (or logical cores in the case of hyper-threading, which I assume is what you mean by "hardware thread") among all the threads running on the system. The number of (logical) cores the system's CPUs have affects how many threads can execute in parallel, but doesn't limit how many threads the operating system can manage.
As such, nothing in particular will happen or will stop happening. If your system has more threads ready to run than number of (logical) CPU cores, then the operating system will not be able to give CPU time to all of the threads in parallel.
Note that creating native threads has a performance penalty, and having more threads waiting to run (excluding those waiting for disk or network) than the number of cores to execute them will reduce the performance of the system.
I'm wondering though what will happen if I exceed the amount of threads I have on my CPU?
You might experience a lot of (unneeded) task switching, each switch taking anywhere from 1 µs to 22 ms depending on what exactly ran before.
Depending on your OS or setup, you can get the serial ports to do a lot of the work for you. The amount of actual work on each receiving COM port also matters. Typical notification mechanisms are:
The OS sends a message that there is data you can read from a given buffer.
The OS wakes your thread because there is something to read on the port.
An interrupt wakes the thread and returns.
In all cases a single thread might be able to handle all 30 COM ports, since most of the time is spent waiting for the very slow serial ports to deliver data.
Most serial ports are buffered and only need to be emptied after several chars have been received. Some serial cards have DMA so you don't even need to empty it yourself.

Make sure that the main thread runs on its own core alone

I have a main thread which does some not-so-heavy work, and I'm also creating worker threads which do very heavy work. All documentation and examples show how to create a number of threads equal to std::thread::hardware_concurrency(). But since the main thread already exists, the total number of threads becomes std::thread::hardware_concurrency() + 1. For example:
my machine supports 2 hardware threads.
in the main thread I create these 2 threads, and the total number of threads becomes 3.
a core running the main thread does its job plus (probably) a worker's job.
Of course I don't want this, because the UI (which runs in the main thread) becomes unresponsive due to the latency. What will happen if I create std::thread::hardware_concurrency() - 1 threads? Will that guarantee that the main thread, and only the main thread, runs on a single core? How can I check this?
P.S.: I'm using a sort of pool -- I start the threads at program start and stop them on exit. During execution, all worker threads run an infinite while loop.
As others have written in the comments, you should carefully consider whether you can do a better job than the OS.
That being said, it is technically possible:
Use the native_handle method to get the OS's handle to your thread.
Consult your OS's documentation for setting the thread affinity. E.g., using pthreads, you'd want pthread_setaffinity_np.
This gives you full control over where each thread runs. In particular, you can give one of the threads a core of its own.
Note that this isn't part of the standard, as it operates at a level that is not portable. That might serve as another hint that it's possibly not what you're looking for.
No - std::thread::hardware_concurrency() only gives you a hint about the number of cores potentially available for multithreading. You might be interested in CPU affinity masks (putting threads on different CPUs). This works at the pthread level, which you can reach via std::thread::native_handle (http://en.cppreference.com/w/cpp/thread/thread/native_handle).
Depending on your OS, you can get the thread's native handle, and control their priority levels using pthread_setschedparam(), for example giving the worker threads a lower priority than the main thread. This can be one solution to the UI problem. In general, number of threads need not match number of available HW cores.
There are definitely cases where you want to be able to gain full control, and reliably analyze what is going on. You are using Windows, but as an example, it is possible on a multicore machine to exclude e.g. one core from the normal Linux OS scheduler, and use that core for time-critical hard real-time tasks. In essence, you will own that core and handle interrupts for it, thereby enabling something close to hard real-time response times and predictability. Requires careful programming and analysis, and takes a significant effort. But very attractive if done right.

How to run each thread on a different core?

I have a UDP server that receives data and processes it.
I have one thread for each role (two threads total).
My CPU has 8 cores, and I send data at various speeds.
But at maximum I use only 14% of my CPU: two cores at 50% each. If I send a greater volume of data, my buffer fills up and no more CPU is used.
Why does each core rise to only 50% and not more?
I'm thinking of dividing these two roles across multiple cores.
I want to be sure that each one runs on a different core.
How can I explicitly choose which core each thread runs on?
My program is written in C++ with Visual Studio 9, runs on Windows 7, and uses boost::thread.
The scheduler decides where your threads run. This is OS specific, so if you want to influence how your code is run you need an OS-specific API that lets you set a thread's affinity, etc.
Also, it depends what your application is like; it's a client-server by the looks of it, so it's not totally CPU bound. How many threads do you have in total - you mention 2 per role? A thread can only run on one CPU at a time. Try to make units of work that can truly run in parallel, so that they can run independently, ideally on different cores.
The OS will generally do a good job of running your code since it will have a better overall picture.
You cannot make one thread use more than one core. To achieve better CPU utilization you need to redesign your program to create more threads and let the OS schedule them for you. There's no need to manually restrict the threads to specific cores. OSes are really good at figuring out how to allocate cores to threads.
In your case, if the data computing tasks are CPU heavy, you could spawn a new thread per request or have a worker thread pool that would be picking incoming tasks and processing them. This is just one of ideas. It's difficult to say without knowing more about your application architecture and the problems it's trying to solve.
In each thread you can use SetThreadAffinityMask to choose which CPUs that thread may run on. But I suggest you create a new worker thread for each incoming request (and if you use a thread pool, you'll see a considerable performance boost).
Take care that the compiler and linker settings enable multithreading.
Best practice is also not to start many short-lived threads, but long-lived threads that work through a queue of computations or downloads.

C++ thread division across processors

I have a question... I need to build a multi-threaded app, and my question is: if I have a 2-CPU machine, will my 2 threads automatically be separated, one per processor?
And if I have 4 threads and my PC has 4 CPUs, is it again 1 per processor? And if I have 4 threads and 2 CPUs, how are they divided?
thanks in advance
This is not really a question which can be answered unless you specify the operating system at a minimum.
C++ itself knows nothing of threads, they are a service provided by the OS to the execution environment, and depend on that OS for its implementation.
As a general observation, I'm pretty certain that Linux schedules threads independently so that multiple threads can be spread across different CPUs and/or cores. I suspect Windows would do the same.
Some OS' will allow you to specify thread affinity, the ability for threads (and sometimes groups of threads) to stick with a single CPU but, again, that's an OS issue rather than a C++ one.
For Windows (as per your comment), you may want to read this introduction. Windows provides a SetProcessAffinityMask() function for controlling affinity of all threads in a given process or SetThreadAffinityMask() for controlling threads independently.
But, usually, you'll find it's best to leave these alone and let the OS sort it out - unless you have a specific need for different behaviour, the OS will almost certainly make the right decisions.
How threads get allocated to processors is specific to the OS your application is running on. Typically most OS's don't make any guarantees about how your threads are split across the processors, although some do have some low level APIs to allow you to specify thread affinity.
If your threads are CPU bound, then they will certainly tend to be scheduled on all available CPUs.
If your threads are I/O bound, then with only one thread per CPU, most of the CPUs will sit idle. This is why, when attempting to maximize performance, it is important to measure what is happening and either find a hard-coded ratio of threads per CPU, or use the operating system's thread-pooling mechanism, which has access to enough information to keep exactly as many threads active as there are CPU cores.
You generally don't want more active threads than CPUs (i.e. threads that aren't blocked waiting for I/O to complete), as the act of switching between active threads on a CPU incurs a small cost that can add up.

Allocate more processor cycles to my program

I've been working with Win32 and C/C++ for a while. I code in Visual Studio. Most of the time I see the System Idle Process using most of the CPU. Is there a way to allocate more processor cycles to my program to make it run faster? I understand there might be limitations from I/O; in those cases this question doesn't make sense.
OR
did I misunderstand the Task Manager numbers? I'm confused; please help me out.
And I want to do this within the program itself. BTW, I'd be happy with answers specific to Windows.
Thanks in advance
~calvin
If your program is the only program that has something to do (i.e. it is not waiting for I/O), its thread will always be assigned to a processor core.
However, if you have a multi-core processor and a single-threaded program, the CPU usage of your process displayed in the task manager will always be limited to 100/Ncores percent.
For example, on a quad-core machine your process will sit at 25% (using one core) and the idle process at around 75%. You can only gain additional CPU power by dividing your work into chunks that can be handled by separate threads, which will then run on the idle cores.
The idle process only "runs" when no other process needs to. If you want to use more CPU cycles, then use them.
If your program is idling, it isn't doing anything, i.e. there is nothing that could be done any faster. So the CPU is probably not the bottleneck in your case.
Are you maybe waiting for data coming from the disk or network?
In case your processor has multiple cores and your program uses only one core to its full extent, making your program multi-threaded could work.
In a multitasking/multithreaded OS, processor time is split among threads.
If you want a specific thread to get a bigger slice of time, you can raise its priority with the SetThreadPriority function, though it is not wise to do so.
Only special software (should) mess with those settings.
It's common for Windows applications to have a low CPU usage percentage (which we see in the task manager)
because most of the time they are just waiting for messages.
Use threads to:
abstract away all the I/O waits.
assign work to all cores.
also, remove all sleep-wait states from main thread.
Defer all I/O to a thread so that wait states are confined within it. Keep the actual computations in the foreground thread, and use synchronization mechanisms that make the I/O slave thread wait for your main thread when communicating.
If your CPU is multi-core and your problem is parallelizable, create as many threads as you have cores, research the "set affinity" functions to distribute them among the cores, and still keep a separate thread for all I/O.
Also take care not to wait in your main thread - usleep(1) doesn't send you into the background for 1 microsecond but for "no less than" that, which in practice may mean anything between 1 ms and 100 ms, hardly ever less, and never anything close to a microsecond.