I am trying to run a highly multi-threaded application and want to measure its performance with different numbers of cores (0, 1, 2, ..., 12). I came across taskset while googling:
taskset 0x00000003 ./my_app
but when I look at Fedora's system monitor, it shows only one core at 100% and the others at 12%, 0%, etc.
Is there any way to tell the process to run on certain cores? I have also heard of an option like -t <number of cores>, like
./my_app -t2
for cores 0 and 1, but this also has no effect.
What am I doing wrong? Can anyone please point me in the right direction?
taskset 0x00000003 ./my_app sets the affinity of the my_app process to CPUs 0 and 1 (the mask 0x3 has bits 0 and 1 set). If your application is multithreaded, the threads inherit the affinity, but how they are distributed between those two cores is not fixed.
To set the affinity of each thread within your process, you can either use taskset after the process is running (i.e. run my_app, examine the thread ids and call taskset -pc <core> <tid> for each) or set the affinity at thread creation (with sched_setaffinity, or pthread_setaffinity_np if you are using pthreads, etc.).
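For illustration, here is a minimal sketch (assuming Linux with glibc; the worker function and the one-thread-per-core mapping are placeholders) of pinning each thread to its own core right after creation with pthread_setaffinity_np:

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

#define NTHREADS 2

static void *worker(void *arg)
{
    /* ... application work ... */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (int i = 0; i < NTHREADS; i++) {
        pthread_create(&tid[i], NULL, worker, NULL);

        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(i, &set);  /* pin thread i to core i */
        if (pthread_setaffinity_np(tid[i], sizeof(set), &set) != 0)
            fprintf(stderr, "failed to set affinity for thread %d\n", i);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

Compile with gcc -pthread. Alternatively, each thread can pin itself at the top of worker via pthread_setaffinity_np(pthread_self(), ...), which avoids the short window in which the thread runs unpinned.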
Whatever ./my_app -t2 does is specific to your application.
I have a virtual machine with 32 cores.
I am running some simulations for which I need to utilize 16 cores at one time.
I use the below command to run a job on 16 cores :
mpirun -n 16 program_name args > log.out 2>&1
This program runs on 16 cores.
Now if I want to run the same program on the rest of the cores, with different arguments, I use the same kind of command:
mpirun -n 8 program_name diff_args > log_1.out 2>&1
The second job's processes end up on the same 16 cores that the first job is already using.
How can I use mpirun to run this job on 8 different cores, not the 16 that the first job is using?
I am using headless Ubuntu 16.04.
Open MPI's launcher supports restricting the CPU set via the --cpu-set option. It accepts a set of logical CPUs expressed as a list of the form s0,s1,s2,..., where each list entry is either a single logical CPU number or a range of CPUs n-m.
Provided that the logical CPUs in your VM are numbered consecutively, what you have to do is:
mpirun --cpu-set 0-15 --bind-to core -n 16 program_name args > log.out 2>&1
mpirun --cpu-set 16-23 --bind-to core -n 8 program_name diff_args > log_1.out 2>&1
--bind-to core tells Open MPI to bind the processes to separate cores each while respecting the CPU set provided in the --cpu-set argument.
It might be helpful to use a tool such as lstopo (part of hwloc, which Open MPI uses internally) to obtain the topology of the system; this helps in choosing the right CPU numbers and, e.g., prevents binding to hyperthreads, although it is less meaningful in a virtualised environment.
(Note that lstopo uses a confusing naming convention and calls the OS logical CPUs physical, so look for the numbers in the (P#n) entries. lstopo -p hides the hwloc logical numbers and prevents confusion.)
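If you want to double-check where the ranks actually land, Open MPI can also print the bindings it applies via --report-bindings (the exact output format varies between versions), e.g.:

mpirun --report-bindings --cpu-set 0-15 --bind-to core -n 16 program_name args > log.out 2>&1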
I have a CentOS minimal hexa-core 3.5 GHz machine and I do not understand why a SCHED_FIFO realtime thread pinned to one core only freezes the terminal. How can I avoid this while keeping the realtime behaviour of the thread, without using sleep in the loop or blocking it? To simplify the problem: this thread tries to dequeue items from a non-blocking, lock-free, concurrent queue in an infinite loop.
The kernel runs on core 0; all the other cores are free. All other threads, and my process too, are SCHED_OTHER with the same priority, 20. This is the only thread where I need ultra-low latency for some high-frequency calculations. After starting the application everything seems to work, but my terminal freezes (I connect remotely through ssh). I am able to see the threads created and to force-close my app from htop. The RT thread runs its assigned core at 100%, as expected. When I kill the app, the terminal is released and I can use it again.
It looks like the thread has higher priority than everything else across all cores, but I only want that on the core I pinned it to.
Thank you
Hi victor, you need to isolate the core from the Linux scheduler so that it does not try to assign lower-priority tasks, such as your terminal session, to a core that is running SCHED_* jobs with higher priority. In your case you can isolate core 1 by adding the kernel option isolcpus=1 to your grub.cfg (or whatever boot loader config you are using).
After rebooting, you can confirm that core 1 was successfully isolated by running
dmesg | grep isol
and checking that your kernel was booted with the option.
Here is some more info on isolcpus:
https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
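Since isolcpus only keeps other tasks off the core, the hot thread still has to be pinned there and switched to SCHED_FIFO explicitly. A minimal sketch (assuming Linux/glibc; the priority value 80 and the loop body are placeholders, and SCHED_FIFO normally requires root or CAP_SYS_NICE):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *rt_loop(void *arg)
{
    /* ... busy-poll the lock-free queue here ... */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, rt_loop, NULL);

    /* Pin the thread to the isolated core 1. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    pthread_setaffinity_np(t, sizeof(set), &set);

    /* Switch it to SCHED_FIFO; 80 is an arbitrary example priority. */
    struct sched_param sp = { .sched_priority = 80 };
    if (pthread_setschedparam(t, SCHED_FIFO, &sp) != 0)
        fprintf(stderr, "SCHED_FIFO needs root or CAP_SYS_NICE\n");

    pthread_join(t, NULL);
    return 0;
}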
I have a process that is launched on a Linux-based machine with exactly two cores.
Let's assume my process is the only process in the system (I will ignore other processes and even the system's ones).
My process is divided to two parts:
Critical performance code
Low priority code
Also let's assume my main process was launched on Core 0, and I want to exclusively reserve Core 1 for the critical performance code.
I'd like to divide the question in two:
How can I make sure that every thread in my process (including threads created by 3rd-party libraries I have linked against, which might call pthread_create etc.) will always be created on Core 0?
How can I write a test that can verify that Core 1 is doing absolutely nothing besides the performance critical path ?
I am familiar with APIs such as:
pthread_setaffinity_np
that can set a specific thread's affinity, but I want to know if there is a lower-level way to make sure that even threads created by 3rd-party libraries (from inside the process) will also be pinned to Core 0.
Perhaps I can set the default affinity for the process to be Core 0 and for a specific thread - pin it to Core 1?
You have already described the solution you want:
Perhaps I can set the default affinity for the process to be Core 0 and for a specific thread - pin it to Core 1?
But perhaps the question is that you are not sure how to achieve this.
Linux provides sched_setaffinity to set the affinity of the current process; threads created afterwards inherit a copy of the creator's affinity mask, which takes care of the 3rd-party libraries.
To get a newly created thread to run on a specific core, the easiest way is to initialize a pthread_attr_t and set the desired core affinity with pthread_attr_setaffinity_np.
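Putting both steps together, a minimal sketch (assuming Linux/glibc; the function name critical is a placeholder):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

static void *critical(void *arg)
{
    /* ... performance-critical path on Core 1 ... */
    return NULL;
}

int main(void)
{
    /* 1. Restrict the whole process to Core 0 as early as possible.
       Threads created later, including those from 3rd-party
       libraries, inherit this mask. */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(0, &set);
    sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */

    /* 2. Create the one critical thread with its affinity preset
       to Core 1 via the thread attributes. */
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    CPU_ZERO(&set);
    CPU_SET(1, &set);
    pthread_attr_setaffinity_np(&attr, sizeof(set), &set);

    pthread_t t;
    pthread_create(&t, &attr, critical, NULL);
    pthread_attr_destroy(&attr);
    pthread_join(t, NULL);
    return 0;
}

Because the process-wide mask is installed before any library code runs, even threads you never see created land on Core 0; only the thread launched with the prepared attributes runs on Core 1.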
One solution is to install (if you do not already have it) and run the cpuset utility. Details can be found here
I have a multithreaded (but not MPI-parallel) application that I now want to execute on different nodes, synchronised using Open MPI.
When I run the application directly on a node I get 300% CPU utilisation (top command). I assume this means that 3 cores are used at 100% (it is a 4-core node).
When I run the same process through Open MPI I only get 100% CPU utilisation, which I assume means that all my threads are confined to only 1 CPU on the node.
Is there any way to get the program to make use of all CPUs on a node for the single task scheduled on that node?
I have looked at OMP_NUM_THREADS, but that does not help; I guess it only applies when the application spawns its own OpenMP threads to parallelise work.
I found that the affinity of each process can be set using --cpus-per-proc and it did solve the problem.
BUT: --cpus-per-proc is deprecated (I am using 1.8.3) and I get the following message:
Command line options:
Deprecated: --cpus-per-proc, -cpus-per-proc, --cpus-per-rank, -cpus-per-rank
Replacement: --map-by <obj>:PE=N, default <obj>=NUMA
I had to use the following to get the same functionality
--map-by socket:pe=4
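For illustration, with the 4-core nodes from the question a launch might look like this (the rank count of 1 per node is just an example):

mpirun --map-by socket:pe=4 -n 1 program_name args > log.out 2>&1

Each rank is then bound to 4 processing elements, giving its threads 4 cores to spread over instead of being confined to 1.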
I found that using the -pernode flag turns off binding and allows each node to use its existing threads (in Open MPI on openSUSE 15.2).
However, I could not tell exactly how many threads each node used, as I am fairly sure my OpenMP setting of 30 threads was ignored, and slots=32 and max_slots=32 did not help. I will next try your solution via --map-by.
What library function can I call to get a mapping of processes to cores, or, given a process id, to tell me which core it is running on, last ran on, or is scheduled to run on? Something like this:
core 1: 14232,42323
core 2: 42213,63434,434
core 3: 34232,34314
core 4: 42325,6353,1434,4342
core 5: 43432,64535,14345,34233
core 6: 23242,53422,4231,34242
core 7: 78789
core 8: 23423,23124,5663
sched_getcpu returns the core number of the calling process. A function that, given a process id, returned the core number would be good too, but I have not found one. sched_getaffinity is not useful either; it just tells you which cores a given process may run on, which is not what I am interested in.
I don't know that you can get information about what CPU any particular process is running on, but if you look in /proc, you'll find one entry for each running process. Under that, in /proc/<pid>/cpuset you'll find information about the set of CPUs that can be used to run that process.
Your question does not have a precise answer. The scheduler can migrate a process from one processor core to another at any time (and it actually does), so by the time you get the answer it may already be wrong. A process is usually not tied to any particular core unless its CPU affinity has been set, e.g. with sched_setaffinity(2), which is unusual (see also cpuset(7) for more).
Why are you asking? Why does that matter?
You probably want to dig inside /proc; see the proc(5) man page.
In other words, if the kernel gives out that information at all, it is through /proc/, but I guess it is not exposed because it would not make much sense.
NB. The kernel will schedule processes on the various processor cores much better than you can, so even if you obtain that information, you should not care which core is running some pid.
Yes, the virtual file /proc/[pid]/stat seems to have this info; from man 5 proc:
/proc/[pid]/stat
Status information about the process. This is used by ps(1). It is
defined in /usr/src/linux/fs/proc/array.c.
(...fields description...)
processor %d (since Linux 2.2.8)
CPU number last executed on.
On my dual-core machine:
cat /proc/*/stat | awk '{printf "%-32s %d\n", $2 ":", $(NF-5)}'
(su): 0
(bash): 0
(tail): 1
(hd-audio0): 1
(chromium-browse): 0
(bash): 1
(upstart-socket-): 1
(rpcbind): 1
..though I can't say if it's pertinent and/or accurate..
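If you want the same information programmatically, here is a minimal sketch in C (assuming the field layout documented in proc(5); pid 1 is just an example). The comm field is skipped by searching for its closing parenthesis, because process names may contain spaces:

#include <stdio.h>
#include <string.h>

/* Return the CPU a task last ran on, or -1 on error. */
int last_cpu(int pid)
{
    char path[64], buf[4096];
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* comm (field 2) may contain spaces; skip past its ')'. */
    char *p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;                       /* now at field 3 ("state") */

    int cpu = -1, field = 3;
    for (char *tok = strtok(p, " "); tok; tok = strtok(NULL, " ")) {
        if (field == 39) {     /* "processor" per proc(5) */
            sscanf(tok, "%d", &cpu);
            break;
        }
        field++;
    }
    return cpu;
}

int main(void)
{
    printf("pid 1 last ran on CPU %d\n", last_cpu(1));
    return 0;
}

Counting fields forward from the ')' is more robust than the awk one-liner above: the total number of fields in /proc/[pid]/stat has grown across kernel versions, so $(NF-5) only lands on the processor field for some kernels.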