I have a CentOS minimal install on a hexa-core 3.5 GHz machine and I do not understand why a SCHED_FIFO realtime thread pinned to only 1 core freezes the terminal. How can I avoid this while keeping the realtime behaviour of the thread, without using sleep in the loop or blocking it? To simplify my problem: this thread tries to dequeue items from a non-blocking, lock-free, concurrent queue in an infinite loop.
The kernel runs on core 0; all the other cores are free. All other threads, and my process too, are SCHED_OTHER with the same priority, 20. This is the only thread where I need ultra-low latency for some high-frequency calculations. After starting the application everything seems to work, but my terminal freezes (I connect remotely through SSH). I am able to see the threads created and to force-close my app from htop. The RT thread seems to run at 100%, burning the assigned core as expected. When I kill the app, the frozen terminal is released and I can use it again.
It looks like that thread has a higher priority than everything else across all cores, but I want this only on the core I pinned it to.
Thank you
Hi Victor, you need to isolate the core from the Linux scheduler so that it does not try to assign lower-priority tasks, such as the one running your terminal, to a core that is running SCHED_* jobs with higher priority. You can isolate core 1 in your case by adding the kernel option isolcpus=1 to your grub.cfg (or whatever boot loader config you are using).
After rebooting, you can confirm that you have successfully isolated core 1 by running dmesg | grep isol and checking that your kernel was booted with that option.
Here is some more info on isolcpus:
https://www.linuxtopia.org/online_books/linux_kernel/kernel_configuration/re46.html
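One thing to keep in mind: once a core is isolated, nothing is scheduled onto it unless it is pinned there explicitly, so your realtime thread still has to ask for core 1 and for SCHED_FIFO itself. A minimal sketch of that pinning (the core number and the priority value of 80 are just assumptions to match your setup):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Pin the calling thread to the isolated core 1 and switch it to SCHED_FIFO.
   Needs root (or CAP_SYS_NICE) for the realtime priority. */
static int make_realtime_on_core1(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(1, &set);                    /* the isolated core */
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;

    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = 80;              /* SCHED_FIFO priorities are 1..99 */
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}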
Related
How can I execute some script (in my case it would be a script which copies logs to flash or copies logs remotely) before the watchdog executes its reset?
Should I modify the Linux kernel watchdog driver? If so, where?
Or maybe it is possible somehow to configure this by:
/etc/default/watchdog
/etc/watchdog.conf
However, we have BusyBox installed, where the watchdog configuration is limited.
I cannot find anything on Google, which is surprising, as this is a basic problem that needs to be solved - everybody wants to have logs after a watchdog reset in persistent memory (flash), not under /var/log/.
Of course, copying the logs to flash from time to time during the normal device lifecycle is not a good idea; there should be some way to do this when the watchdog timeout on feeding /dev/watchdog expires.
On a Linux kernel newer than 4.9 you have the pretimeout governor framework available, which would allow you to write a kernel driver that reacts when a pre-timeout is detected. A solution like this is well beyond the scope of a simple question and answer, so I'm leaving my original answer as it stands.
TL;DR:
If the problem is detectable while the OS is still running, you can flush the logs. If the problem is caused by the OS locking up, then you won't have an opportunity to fix the issue, as the hardware will reset the box.
There are two things here:
Watchdog device
Watchdog program
The watchdog device is typically a hardware timer that will do something specific and low level when its timer expires. The most common low-level action is to reset the box. There is no OS involvement in this if it happens in hardware. You will have no opportunity to do anything high level once that timer runs out - e.g. writing log files somewhere.
The watchdog program is a tool that reassures the watchdog device periodically, as long as its check conditions are met.
The BusyBox watchdog's check is a simple loop (pseudocode):
while (1) {
    # reassure watchdog
    # sleep some time
}
so if the program stops running - e.g. because of an OS lockup or termination of the program - the underlying hardware will simply reset the box.
The 'bigger' watchdog binary provides a bunch of checks, and if they fail, it will trigger the repair-binary option in /etc/watchdog.conf to try to recover. This would be a potential point to flush the logs.
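To make the idea concrete, here is a rough sketch of a user-space feeder that kicks /dev/watchdog while a health check passes and flushes the logs once it fails; the check command and the log destination are placeholders, not part of BusyBox or the watchdog package:

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/watchdog", O_WRONLY);    /* opening starts the timer */
    if (fd < 0)
        return 1;

    /* keep feeding while the (placeholder) health check passes */
    while (system("/usr/local/bin/health_check") == 0) {
        write(fd, "\0", 1);                      /* feed the watchdog */
        sleep(10);                               /* must stay below the timeout */
    }

    /* check failed: copy logs to flash while the box is still up */
    system("cp /var/log/messages /mnt/flash/");  /* placeholder destination */
    sync();
    return 0;  /* feeding stops; the hardware resets the box when the timeout expires */
}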
I am working with a CPU-intensive real time application, and therefore I am trying to reserve a whole core for it.
To accomplish this in Windows, I am trying to set the CPU affinity of all running processes to the other cores, and then set the affinity of my real time application to the "free" core. Additionally, I am setting the priority to high.
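For the real-time process itself, the "free core plus high priority" part looks roughly like this (the mask 0x80 for the last core and HIGH_PRIORITY_CLASS are only examples of what I mean, not necessarily the final values):

#include <windows.h>

// Keep the real-time process on the "free" core only and raise its priority.
// 0x80 (the last of 8 logical cores) is an assumption - use whichever core is left free.
void ReserveFreeCore()
{
    SetProcessAffinityMask(GetCurrentProcess(), 0x80);
    SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS);
}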
Unfortunately, the following code (129 is for testing, as it means the first and last core on my system) is not changing the affinity of all running processes:
while (Process32Next(hSnapShot, processInfo) != FALSE)
{
    hProcess = OpenProcess(PROCESS_ALL_ACCESS, TRUE, processInfo->th32ProcessID);
    if (hProcess != NULL)
    {
        SetProcessAffinityMask(hProcess, 129);
        CloseHandle(hProcess);   // don't leak the handle
    }
}
Some system processes, like svchost.exe or csrss.exe, report an affinity of 0xCCCCCCCC (it looks like the value is not initialized and not used at all). And, of course, they keep it after a failed SetProcessAffinityMask().
Also, using Task Manager is not possible, as it denies access when trying to change the affinity of those system processes.
Is it possible to change affinity for those processes as well?
Additional Information:
Windows 7 64bit
Real-time app has only one thread, therefore one core is "enough".
[Screenshots in the original post show the difference between the not-working and working affinity settings.]
I have a time-critical application which processes a sequence of images coming from a camera. It is written in C++ and it uses the Qt, OpenCV and Boost libraries. It is going to run on a dedicated PC.
Currently, the GUI runs in the main thread and I open a new thread for image processing. I didn't bother to divide the processing section into threads because I think OpenCV is already doing that. However, I am having trouble maintaining the maximum tolerable delay.
My question is: how can I tell whether my application is using all the cores to their maximum?
When I look at the performance monitor, the pattern I see is really strange. The CPU usage is around 35-40%; all the cores are working, but not at full throttle.
Am I doing something wrong?
You are not doing anything wrong; however, you could change your code to make full use of the CPU cores by:
1 - setting the core affinity so that the thread does not migrate from one core to another; this can improve cache usage (L1 and maybe L2)
2 - setting the thread's scheduling policy to FIFO so it does not get context-switched before finishing its processing
3 - running that thread in a higher-priority process (this would require root privileges for the process); a sketch of points 1 and 2 follows below
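Here is a minimal sketch of points 1 and 2 for a worker created with std::thread; the core index and the FIFO priority of 50 are placeholders:

#include <thread>
#include <pthread.h>
#include <sched.h>

// Pin a worker thread to one core and give it SCHED_FIFO so ordinary
// SCHED_OTHER tasks cannot preempt it (needs root / CAP_SYS_NICE).
void pin_and_prioritize(std::thread &worker, int core)
{
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(core, &cpus);
    pthread_setaffinity_np(worker.native_handle(), sizeof(cpus), &cpus);

    sched_param sp{};
    sp.sched_priority = 50;   // placeholder priority; SCHED_FIFO allows 1..99
    pthread_setschedparam(worker.native_handle(), SCHED_FIFO, &sp);
}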
Cheers
I have run strace on my multi-threaded C++ application running on Linux.
After a couple of hours of running, none of the threads got to run for about 12 seconds.
I can see that a select system call, which is called with a timeout, was reported as unfinished before the thread was suspended; after the thread resumed, strace reported that the operation took 11.x seconds to finish (the timeout is only 900 ms).
This is a clear indication that the process was starved for a long time.
All threads in the process are created with the default Linux scheduling policy (SCHED_OTHER) and default priority.
There are another 5 similar apps running on the same box which are also heavily I/O bound, like this app, due to heavy data received on the socket. But most of the time it is this app that suffers the scheduling delay. The other apps are created with the same scheduling policy and priority as this one, i.e. the defaults. Why does only this process get blocked almost all of the time?
Could it be because this process is more I/O intensive, i.e. busier due to a possibly higher data rate? Is the Linux dynamic priority adjustment in play here, pushing this process down?
Priority and process scheduling in Linux are only related to CPU time. In fact, the process scheduler only cares about processes which are waiting to run on the CPU. Processes/threads which are waiting for I/O are not handled by the process scheduler but by the I/O scheduler.
I have written a program that captures and displays video from three video cards. For every frame I spawn a thread that compresses the frame to JPEG and then puts it in a queue for writing to disk. I also have other threads that read from these files and decode them in their own threads. Usually this works fine; it's a pretty CPU-intensive program using about 70-80 percent of all six CPU cores. But after a while the encoding suddenly slows down, and the program can't handle the video fast enough and starts dropping frames. If I check the CPU utilization I can see that one core (usually core 5) is not doing much anymore.
When this happens, it doesn't matter if I quit and restart my program: CPU 5 will still have low utilization and the program starts dropping frames immediately. Deleting all saved video doesn't have any effect either. Restarting the computer is the only thing that helps. Oh, and if I set the affinity of my program to use all but the semi-idling core, it works until the same happens to another core. Here is my setup:
AMD X6 1055T (Cool & Quiet OFF)
GA-790FX-UD5 motherboard
4 GB RAM, unganged, 1333 MHz
Blackmagic Decklink DUO capture cards (x2)
Linux - Ubuntu x64 10.10 with kernel 2.6.32.29
My app uses:
libjpeg-turbo
POSIX threads
DeckLink API
Qt
Written in C/C++
All libraries linked dynamically
It seems to me like it could be some kind of problem with the way Linux schedules threads on the cores. Or is there some way my program can mess up so badly that restarting the program doesn't help?
Thank you for reading, any and all input is welcome. I'm stuck :)
First of all, make sure it's not your program - maybe you are running into a convoluted concurrency bug, even though that's not all that likely given your program architecture and the fact that a reboot helps. I've found that post-mortem debugging is usually a good approach: compile with debugging symbols, kill the program with -SEGV when it is behaving strangely, and examine the core dump with gdb.
I would try to choose a core round-robin when a new frame processing thread is spawned and pin the thread to that core. Keep statistics on how long it takes the thread to run. If this is in fact a bug in the Linux scheduler, your threads will take roughly the same time to run on any core. If the core is actually busy with something else, your threads pinned to that core will get less CPU time.
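A rough sketch of that experiment, assuming a pthread-based worker; compress_frame stands in for your existing JPEG compression, and the per-core timing just goes to stderr:

#include <pthread.h>
#include <sched.h>
#include <time.h>
#include <cstdio>
#include <atomic>

extern void compress_frame(void *frame);    // placeholder for your existing work

static std::atomic<unsigned> next_core{0};

void *frame_worker(void *frame)
{
    unsigned core = next_core++ % 6;        // six cores on the X6 1055T
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    compress_frame(frame);                  // the actual per-frame work
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    fprintf(stderr, "core %u: %.2f ms\n", core, ms);   // per-core timing statistics
    return nullptr;
}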