How to run each thread on a different core?

I have a UDP server that receives data and processes it.
I have two threads, one for each role.
My CPU has 8 cores, and I send data at various speeds.
But at maximum load I use only 14% of my CPU: two cores at about 50% each. If I send a larger volume of data, my buffer fills up and CPU usage does not increase.
Why does each core reach only 50% and not more?
I am thinking of splitting these two roles across multiple cores.
I want to be sure that each thread runs on a different core.
How can I explicitly choose which core each thread runs on?
My program is written in C++ with Visual Studio 9, runs on Windows 7, and uses boost::thread.

The scheduler decides where your threads run. This is OS-specific, so if you want to change how your code is scheduled you need an OS-specific API that lets you set a thread's affinity.
It also depends on what your application is like. It looks like a client/server application, so it is not entirely CPU-bound. How many threads do you have in total? You mention two per role? A thread can run on only one CPU at a time. Try to create units of work that can truly run in parallel; that way they can run independently, ideally on different cores.
The OS will generally do a good job of scheduling your code, since it has a better overall picture of the system.

You cannot make one thread use more than one core. To achieve better CPU utilization you need to redesign your program to create more threads and let the OS schedule them for you. There's no need to manually restrict the threads to specific cores. OSes are really good at figuring out how to allocate cores to threads.
In your case, if the data computing tasks are CPU-heavy, you could spawn a new thread per request or have a worker thread pool that picks up incoming tasks and processes them. This is just one idea. It's difficult to say more without knowing your application's architecture and the problems it's trying to solve.

In each thread you can use SetThreadAffinityMask to choose the CPUs that the thread should run on. But I suggest you create a new worker thread for each incoming request (and if you use a thread pool you will see a considerable performance boost).

Make sure that the compiler and linker settings enable multithreading.
It is also best practice not to start many short-lived threads, but to use long-lived threads that each process queued work such as computations or downloads.

Related

Allocating specific logical cores to specific processes exclusively, Windows, C++

If possible, I want to allocate a logical core to a single process exclusively.
I am aware that Winbase.h contains Get/SetProcessAffinityMask and SetThreadAffinityMask.
I can get all processes running when the specific process is started and set their affinities to other logical cores, however, I do not want to check all processes in a periodic manner, for instance in order to deal with processes launched after the initiation of my process.
Furthermore, there will be other processes that each need exclusive use of a specific logical core (no other process shall waste resources on that logical core). For instance, my process shall run only on core 15, while another shall run only on core 14.
Is there a better and more permanent way to allocate specific logical cores to specific processes than the Get/SetProcessAffinityMask scheme mentioned above?

Windows is not a real-time operating system. Windows is designed to do preemptive multitasking with isolated processes, like basically any other modern desktop OS. A process is not supposed to just lock out every other process from a particular core, therefore, there is no API to explicitly do so (at least I'm not aware of one). It's up to the OS scheduler to decide which threads get to run when and where. That's the whole idea. You can use thread priorities to tell the scheduler that certain threads should be given a chance to run over others. You can use affinity masks to tell the scheduler which cores a thread can be scheduled to. You can even set a preferred core for your thread. But you don't get to schedule threads yourself.
Note that there's apparently a way to get something a bit like what you're looking for to work on Linux (see this question for more). I don't think similar possibilities exist on Windows. Yes you could try to hack together some solution based on a background task that continuously monitors and adjusts the priorities and affinity masks of all the threads in the system to approximate the desired behavior (like the person in the question linked by Ben Voigt above has apparently tried, and failed to achieve). But why would you want to do that? It goes completely against the very nature of everything an OS like Windows is designed to do. To me, what you are asking sounds a lot like what you're really looking for is a completely different kind of operating system, or maybe even no operating system at all. Boot the CPU straight into your own image and you get to drive all the cores in whatever way you fancy…

Parallel Thread Execution to achieve performance

I am a little confused about multithreading. We usually create multiple threads to break the main process into subtasks, in order to stay responsive and eliminate waiting time.
But here I have a situation where I have to execute the same task on multiple threads in parallel.
My processor can execute 4 threads in parallel, so will it improve performance if I create more than 4 threads (10 or more)? When I put this question to my colleague, he said that nothing bad will happen because we are already running many threads in other applications (browser threads, kernel threads, etc.), so he told me to create multiple threads for the same task.
But if I create more than 4 threads to execute in parallel, won't that cause more context switches and decrease performance?
Or, even though we create multiple threads to execute in parallel, will they execute one after the other, so that the performance stays the same?
So what should I do in this situation, and are these assumptions correct?
Edit:
1 thread: time to process is 120 seconds.
2 threads: time to process is about 60 seconds.
3 threads: time to process is about 60 seconds (no change from 2 threads).
Is it because my hardware can only run 2 threads at once (being dual-core)?
Software thread = piece of code.
Hardware thread = core (processor) that runs a software thread.
So my CPU supports only 2 concurrent threads. If I purchase an AMD CPU with 8 or 12 cores, can I achieve higher performance?

Multi-Tasking is pretty complex and performance gains usually depend a lot on the problem itself:
Only a part of the application can be worked in parallel (there is always a first part that splits up the work into multiple tasks). So the first question is: How much of the work can be done in parallel and how much of it needs to be synchronized (in some cases, you can stop here because so little can be done in parallel that the whole work isn't worth it).
Multiple tasks may depend on each other (one task may need the result of another task). These tasks cannot be executed in parallel.
Multiple tasks may work on the same data/resources (read/write situation). Here we need to synchronize access to this data/resources. If all tasks need write access to the same object during the WHOLE process, then we cannot work in parallel.
Basically this means that without the exact definition of the problem (dependencies between tasks, dependencies on data, amount of parallel tasks, ...) it's very hard to tell how much performance you'll gain by using multiple threads (and if it's really worth it).

http://en.wikipedia.org/wiki/Amdahl%27s_law
Amdahl's law states, in a nutshell, that the performance boost you receive from parallel execution is limited by the part of your code that must run sequentially.
Without knowing your problem space here are some general things you should look at:
Refactor to eliminate mutex/locks. By definition they force code to run sequentially.
Reduce context switch overhead by pinning threads to physical cores. This becomes more complicated when threads must wait for work (i.e. blocking on I/O), but in general you want to keep each core as busy as possible running your program, not switching between threads.
Unless you absolutely need to use threads and sync primitives directly, try to use a task scheduler or parallel algorithms library to parallelize your work. Examples would be Intel TBB, Thrust or Apple's libdispatch.

Kernel threads in posix

As far as I understand, the kernel has kernel threads for each core in a computer, and threads from user space are scheduled onto these kernel threads (the OS decides which thread from an application gets connected to which kernel thread). Let's say I want to create an application that uses X cores on a computer with X cores. If I use regular pthreads, I think it would be possible for the OS to schedule all the threads I created onto a single core. How can I ensure that each thread is mapped one-to-one to a kernel thread?

You should basically trust the kernel you are using (in particular, because there could be another heavy process running; the kernel scheduler will choose tasks to be run during a quantum of time).
Perhaps you are interested in CPU affinity, with non-portable functions like pthread_attr_setaffinity_np

Your understanding is a bit off. "Kernel threads" on Linux are basically kernel tasks that are scheduled alongside other processes and threads. When the kernel's scheduler runs, the scheduling algorithm decides which process/thread, out of the pool of runnable threads, will be scheduled to run next on a given CPU core. As @Basile Starynkevitch mentioned, you can tell the kernel to pin individual threads from your application to a particular core, which means the operating system's scheduler will only consider running them on that core, along with other threads that are not pinned to a particular core.
In general with multithreading, you don't want your number of threads to be equal to your number of cores unless you're doing exclusively CPU-bound processing; otherwise you want number of threads > number of cores. While waiting for network or disk I/O (i.e. when you're blocked in accept(2), recv(2), or read(2)), your thread is not considered runnable. If N threads > N cores, the operating system may be able to schedule a different thread of yours to do work while the first one waits for that I/O.

What you mention is one possible model for implementing threading. But such a hierarchical model may not be followed at all by a given POSIX thread implementation. Since somebody already mentioned Linux: it doesn't have it. All threads are equal from the point of view of the scheduler there, and they compete for the same resources unless you specify something extra.
The last time I saw such a hierarchical model was on a machine running IRIX, a long time ago.
So in summary, there is no general rule under POSIX for that. You'd have to look up the documentation of your particular OS or ask a more specific question about it.

Allocate more processor cycles to my program

I've been working with Win32, C and C++ for a while. I code in Visual Studio. Most of the time I see that the System Idle Process has the highest CPU utilization. Is there a way to allocate more processor cycles to my program to make it run faster? I understand there might be limitations from I/O; in those cases this question doesn't make any sense.
OR
did I misunderstand the Task Manager numbers? I'm confused, please help me out.
I want to do something in the program itself, and by the way I would be happy if answers are specific to Windows.
Thanks in advance
~calvin

If your program is the only program that has something to do (not waiting for I/O), its thread will always be assigned to a processor core.
However, if you have a multi-core processor and a single-threaded program, the CPU usage of your process displayed in Task Manager will always be limited to 100/Ncores percent.
For example, if you have a quad-core machine, your process will be at 25% (using one core), and the idle process at around 75%. You can only use additional CPU power by dividing your task into chunks that can be worked on by separate threads, which will then run on the idle cores.

The idle process only "runs" when no other process needs to. If you want to use more CPU cycles, then use them.

If your program is idling, it doesn't do anything, i.e. there is nothing that could be done any faster. So the CPU is probably not the bottle-neck in your case.
Are you maybe waiting for data coming from the disk or network?
In case your processor has multiple cores and your program uses only one core to its full extent, making your program multi-threaded could work.

In a multitasking / multithreaded OS the processor time is split among threads.
If you want a specific thread to get a bigger time slice you can set its priority with the SetThreadPriority function, though it is not wise to do so.
Only special software should mess with those settings.
It's common for Windows applications to show a low CPU usage percentage in Task Manager, because most of the time they just wait for messages.

Use threads to:
abstract away all the I/O waits.
assign work to all cores.
also, remove all sleep-wait states from main thread.
Defer all I/O to a separate thread, so that wait states are confined to it. Keep the actual computations in the foreground thread, and use synchronization mechanisms that make the I/O thread wait for your main thread when they communicate.
If your CPU is multi-core and your problem is parallelizable, create as many threads as you have cores, research the "set affinity" functions to distribute them across the cores, and still keep a separate thread for all I/O.
Also pay attention not to wait in your main thread - usleep(1) doesn't send you into background for 1 microsecond, but for "no less than..." and that may mean anything between 1ms and 100ms but hardly ever less than that, and never anything close to a microsecond.

My thread pool only creates 4~5 threads. Why?

I use the QueueUserWorkItem() function to invoke the thread pool.
I queued a lot of work items with it (about 30,000).
But according to Task Manager, my application only creates 4~5 threads after I push the start button.
I read on MSDN that the default thread limit is about 500.
Why are only a few threads created in my application?
I'm trying to speed up my application, and I suspect this thread pool is one of the reasons it is slow.
Thanks

It is important to understand how the threadpool scheduler works. It was designed to fine-tune the number of running threads against the capabilities of your machine. Your machine probably can run only two threads at the same time, dual-core CPUs are the current standard. Maybe four.
So when you dump a bunch of threads in its lap, it starts out by activating only two threads. The rest of them are in a queue, waiting for CPU cores to become available. As soon as one of those two threads completes, it activates another one. Twice a second, it evaluates what's going on with active threads that didn't complete. It makes the rough assumption that those threads are blocking and thus not making progress, and allows another thread to activate. You've now got three running threads. Getting up to 500 threads, the default maximum number of threads, will take 249 seconds.
Clearly, this behavior spells out what a thread should do to be suitable to run as a threadpool thread. It should complete quickly and not block often. Note that blocking on I/O requests is dealt with separately.
If this behavior doesn't suit you then you can use a regular Thread. It will start running right away and compete with other threads in your program (and the operating system) for CPU time. Creating 30,000 of such threads is not possible, there isn't enough virtual memory available for that. A 32-bit operating system poops out somewhere south of 2000 threads, consuming all available virtual memory. You can get about 50,000 threads on a 64-bit operating system before the paging file runs out. Testing these limits in a production program is not recommended.

I think you may have misunderstood the use of the threadpool. Spawning and killing threads involves the Windows kernel and is an expensive operation. If you continuously create threads to perform an asynchronous operation and then throw them away, you end up making many system calls.
So the threadpool is actually a group of threads that are created once and that, instead of exiting when they complete their task, wait for another item queued with QueueUserWorkItem. The threadpool will then tune itself based on how many threads are required concurrently for your process. If you wish to test this, write this code:
for (int i = 0; i < 30000; i++)
{
    ThreadPool.QueueUserWorkItem(myMethod);
}
You will see this will create a whole bunch of threads. Maybe not 30000 as some of the threads that are created will be reused as the ThreadPool starts to work through your function calls.

The threadpool is there so you can avoid creating a thread for every asynchronous operation for the very reason that threads are expensive. If you want 30,000 threads you're going to use a lot of memory for the thread stacks plus waste a lot of CPU time doing context switches. Now creating that many threads would be justified if you had 30,000 CPU cores...
