I want to lower the priority of a thread.
The default policy of my thread is SCHED_OTHER, and the priority range on my system (Ubuntu) is [0,0] (obtained from sched_get_priority_min(SCHED_OTHER) and sched_get_priority_max(SCHED_OTHER)), which means all SCHED_OTHER threads have the same priority.
Is there any way to lower the priority under SCHED_OTHER? I've been searching for a while and found the nice value mechanism, but I'm not sure it's the correct way to do this, since the man page says the nice value applies to a process rather than a thread. I'm confused...
Could anyone give the correct solution to do this, and maybe with a short code snippet? Thanks!
Added:
why I want to lower the priority of thread:
I have a worker thread that does some intensive computation periodically (say, a few seconds every minute, which causes a CPU usage peak), and my whole system experiences a periodic drop in performance. But the priority of this worker thread is low: as long as it finishes its computation before the next minute, it should be fine. So I want to spread the computation of this task smoothly over that time window.
Assuming you are running a fairly recent version of the Linux kernel, you can try setting your thread to SCHED_IDLE as shown at this link, i.e.:
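/* needs <pthread.h>, <sched.h>, <stdio.h>, <string.h>; SCHED_IDLE is Linux-specific */
void set_idle_priority() {
    struct sched_param param;
    param.sched_priority = 0;  /* SCHED_IDLE requires a priority of 0 */
    /* pthread_setschedparam returns the error number instead of setting errno */
    int err = pthread_setschedparam(pthread_self(), SCHED_IDLE, &param);
    if (err != 0)
        fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
}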
In that mode, your thread will only run when nothing else in the system wants to run.
... that said, I'm not confident that doing so will actually solve your problem, since from your description you shouldn't be having that problem in the first place. In particular, the presence of a CPU-hogging thread running at normal/default priority should not significantly slow down your system, since the scheduler should automatically detect its CPU-hogging nature and implicitly deprioritize it, without you having to take any special steps. That makes me think that your problem probably isn't the thread's CPU usage, but rather something else, like maybe your thread is using up all of the system's available RAM capacity, causing the system to have to page memory to disk. That would definitely cause the system to slow down considerably. Another possibility would be if your thread is doing a lot of disk I/O (although that seems less likely, since in that case it would probably not be pinning a CPU core).
You might try temporarily replacing your thread's computations with a trivial CPU-burning loop, e.g.:
void my_thread_entry_func()
{
while(1) {/* empty */}
}
... and run that just to see if it also provokes the slowdown. If not, then it's not the CPU-usage itself that is causing the slowdown, but rather something else your thread is doing, and you'll want to do further testing to narrow down exactly which part(s) of your thread's execution-path are the culprits.
Indeed, the situation with scheduling priorities on Linux is a huge mess of confusion over what applies to processes vs threads. At the specification level, nice and setpriority apply to processes, but Linux doesn't actually support doing that, so it interprets the argument as a kernel-level thread id instead (not the same as pthread_t, and there's no standard userspace API to request the kernel-level tid of a thread!).
You might be able to achieve what you want with SCHED_IDLE or SCHED_BATCH, but they don't really work right either.
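As a rough illustration of the per-thread nice behaviour described above, here is a minimal sketch. It assumes Linux, where setpriority with PRIO_PROCESS and a kernel thread id affects only that thread (this is not POSIX behaviour). The helper name lower_current_thread_priority is hypothetical, and the raw syscall(SYS_gettid) is used to obtain the kernel tid.

```cpp
#include <sys/resource.h>  // setpriority, PRIO_PROCESS
#include <sys/syscall.h>   // SYS_gettid
#include <unistd.h>        // syscall, pid_t

// Hypothetical helper: raises the nice value (lowers the priority) of the
// calling thread only. Relies on Linux-specific handling of PRIO_PROCESS + tid.
// Returns true on success.
bool lower_current_thread_priority(int nice_value) {
    // Kernel-level thread id of the caller; not the same thing as pthread_t.
    pid_t tid = static_cast<pid_t>(syscall(SYS_gettid));
    return setpriority(PRIO_PROCESS, tid, nice_value) == 0;
}
```

A worker would call this once at the top of its thread function, for example lower_current_thread_priority(10). Raising the nice value normally needs no privileges, while lowering it again usually does.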
Suppose I have a multi-threaded program in C++11, in which each thread controls the behavior of something displayed to the user.
I want to ensure that for every time period T during which one of the threads of the given program have run, each thread gets a chance to execute for at least time t, so that the display looks as if all threads are executing simultaneously. The idea is to have a mechanism for round robin scheduling with time sharing based on some information stored in the thread, forcing a thread to wait after its time slice is over, instead of relying on the operating system scheduler.
Preferably, I would also like to ensure that each thread is scheduled in real time.
In case there is no way other than relying on the operating system, is there any solution for Linux?
Is it possible to do this? How?
No, that's not possible in a cross-platform way with C++11 threads. How often and for how long a thread runs isn't up to the application. It's up to the operating system you're using.
However, there are still functions with which you can tell the OS that a particular thread/process is especially important, and so influence its share of CPU time for your purposes.
You can acquire the platform dependent thread handle to use OS functions.
native_handle_type std::thread::native_handle //(since C++11)
Returns the implementation defined underlying thread handle.
I want to stress again: this requires an implementation that is different for each platform!
Microsoft Windows
According to the Microsoft documentation:
SetThreadPriority function
Sets the priority value for the specified thread. This value, together
with the priority class of the thread's process determines the
thread's base priority level.
Linux/Unix
For Linux things are more difficult because there are several different ways threads can be scheduled. Microsoft Windows uses a priority system, but on Linux that doesn't seem to be the default scheduling.
For more information, please take a look at this Stack Overflow question (it should be the same for std::thread because of this).
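To make the native_handle route concrete, here is a minimal sketch for Linux. It assumes that native_handle() returns a pthread_t (which is the case for the common standard libraries on Linux); set_thread_policy and run_policy_demo are hypothetical names used only for illustration.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <thread>

// Hypothetical helper: applies a scheduling policy to a std::thread through
// its native handle. Assumes native_handle() is a pthread_t.
// Returns 0 on success or an errno-style error number (for example EPERM).
int set_thread_policy(std::thread& t, int policy, int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(t.native_handle(), policy, &param);
}

// Usage sketch: start a worker, adjust its policy while it runs, then stop it.
int run_policy_demo() {
    std::atomic<bool> stop{false};
    std::thread worker([&stop] {
        while (!stop.load()) std::this_thread::yield();
    });
    // SCHED_OTHER with priority 0 needs no privileges;
    // SCHED_FIFO or SCHED_RR usually require elevated rights.
    int err = set_thread_policy(worker, SCHED_OTHER, 0);
    stop.store(true);
    worker.join();
    return err;
}
```

On Windows the same handle would instead be passed to SetThreadPriority, which is why this part of the code has to be written per platform.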
I want to ensure that for every time period T during which one of the threads of the given program have run, each thread gets a chance to execute for at least time t, so that the display looks as if all threads are executing simultaneously.
You are using threads to make it seem as though different tasks are executing simultaneously. That is not recommended for the reasons stated in Arthur's answer, to which I really can't add anything.
If, instead of having long-lived threads that each do their own task, you can express the work as tasks that can be executed without mutual exclusion, you can have a single queue of tasks and a thread pool dequeuing and executing them.
If you cannot, you might want to look into wait free data structures and algorithms. In a wait free algorithm/data structure, every thread is guaranteed to complete its work in a finite (and even specified) number of steps. I can recommend the book The Art of Multiprocessor Programming where this topic is discussed at length. The gist of it is: every lock free algorithm/data structure can be modified to be wait free by adding communication between threads over which a thread that's about to do work makes sure that no other thread is starved/stalled. Basically, prefer fairness over total throughput of all threads. In my experience this is usually not a good compromise.
I've looked around a fair amount and can't seem to find what I'm looking for, but let me first stress that I'm not looking for a high-precision sleep function.
Here's the background for the problem I'm trying to solve:
I've made a memory mapping library that operates a lot like a named pipe. You can put bytes into it, get bytes out of it, and query how many bytes are available to read/write, all that good stuff.
It's fast (mostly): processes communicating through it will average 4GB/s if they're passing chunks of 8KB or larger. Performance drops to around 300MB/s as you approach a 512B chunk size.
The problem:
Very occasionally, on heavily loaded servers, very large lag times will occur (Upwards of 5s). My running theory for the cause of this issue is that when large transfers are taking place (larger than the size of the mapped memory), the process that's writing data will tight poll to wait for more space to be available in the circular buffer that's implemented on top of the memory map. There are no calls to sleep, so the polling process could be hogging the CPU for no good reason! The issue is that even the smallest call to sleep (1ms) would absolutely demolish performance. The memmap size is 16KB, so if it slept for 1ms every 16KB, performance would drop to a best-case scenario of 16MB/s.
The solution:
I want a function that I can call that will relinquish the CPU, but makes no limitations on when it gets rescheduled by the operating system (Windows 7 in this case).
Has anyone got any bright alternatives?/Does anyone know if such a function exists?
Thanks.
According to the MSDN documentation, on XP or newer, calling Sleep with a timeout of 0 will yield to other threads of equal priority.
A value of zero causes the thread to relinquish the remainder of its
time slice to any other thread of equal priority that is ready to
run. If there are no other threads of equal priority ready to run, the
function returns immediately, and the thread continues execution.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx
Another option that will require more work but that will work more reliably would be to share an event handle between the producer and consumer process. You can use CreateEvent to create your event and DuplicateHandle to get it into your other process. As the producer fills the buffer, it will call ResetEvent on the event handle and call WaitForSingleObject with it. When the consumer has removed some data from the full shared buffer, it will call SetEvent, which will wake the producer which was waiting in WaitForSingleObject.
std::this_thread::yield() probably does what you want. I believe it just calls Sleep with 0 in most implementations.
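You need the SwitchToThread() function, which yields to any thread that is ready to run on the current processor and returns immediately if there is none. Sleep(0) also returns immediately when nothing else can run, but according to its documentation it only yields to ready threads of equal priority.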
If you're writing code that's designed to take advantage of hyperthreading, YieldProcessor might do something for you too, but I doubt that'll be helpful.
You're incorrectly assuming a binary choice. Right now you always busy-wait because you've concluded that sleeping is always a bad idea.
The better solution is to try a few times without sleeping. If that still fails (because the map is full, and the other thread isn't running), then you can issue a true sleep. This will be sufficiently rare that on average you'll be sleeping microseconds. You could even check the time-stamp counter (RDTSC) to determine how long you've spent busy-waiting before surrendering your timeslice.
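A minimal sketch of this "spin a few times, then really sleep" approach, in portable C++. The helper name spin_then_sleep, the spin limit and the nap length are all assumptions to be tuned, and std::this_thread::yield is used in place of a platform-specific yield call.

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: polls `ready` up to `spin_limit` times without sleeping,
// then falls back to real sleeps between polls.
// Returns true if it had to fall back to sleeping.
template <typename Predicate>
bool spin_then_sleep(Predicate ready, int spin_limit,
                     std::chrono::microseconds nap) {
    for (int i = 0; i < spin_limit; ++i) {
        if (ready()) return false;   // fast path: no sleep was needed
        std::this_thread::yield();   // let another ready thread run
    }
    while (!ready()) {
        std::this_thread::sleep_for(nap);  // slow path: the other side is stalled
    }
    return true;
}
```

In the circular-buffer case, `ready` would be a check that enough space is available for the next write.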
If you're operating under .Net, you can look into the Thread::Yield() method.
It may or may not help with your specific scenario, but it's the correct way to notify the scheduler that you want to relinquish the remainder of your timeslice.
If you're running in a pre-.Net environment (seems unlikely if you're on Windows 7), you can look into the Win32 SwitchToThread() function instead.
I have a program with several threads: one thread changes a global when it finishes, and the other thread repeatedly polls the global. There is no protection on the globals.
The program works fine on a uniprocessor. On a dual-core machine, it works for a while and then hangs on either Sleep(0) or SuspendThread(). Would anyone be able to help me out with this?
The code would be like this:
Thread 1:
do something...
while(1)
{
.....
flag_thread1_running=false;
SuspendThread(GetCurrentThread());
continue;
}
Thread 2
flag_thread1_running=true;
ResumeThread(thread1);
.....do some other work here....
while(flag_thread1_running) Sleep(0);
....
The fact that you don't see any problem on a uniprocessor machine, but see problems on a multiproc machine is an artifact of the relatively large granularity of thread context switching on a uniprocessor machine. A thread will execute for N amount of time (milliseconds, nanoseconds, whatever) before the thread scheduler switches execution to a different thread. A lot of CPU instructions can execute in the typical thread timeslice. You can think of it as having a fairly large chunk of "free play" exclusive processor time during which you probably won't run into resource collisions because nothing else is executing on the processor.
When running on a multiproc machine, though, CPU instructions in two threads execute exactly at the same time. The size of the "free play" chunk of time is near zero.
To reproduce a resource contention issue between two threads, you need to get thread 1 to be accessing the resource and thread 2 to be accessing the resource at the same time, or very nearly the same time.
In the large-granularity thread switching that takes place on a uniprocessor machine, the chances that a thread switch will happen exactly in the right spot are slim, so the program may never exhibit a failure under normal use on a uniproc machine.
In a multiproc machine, the instructions are executing at the same time in the two threads, so the chances of thread 1 and thread 2 accessing the same resource at the same time are much, much greater - thousands of times more likely than the uniprocessor scenario.
I've seen it happen many times: an app that has been running fine for years on uniproc machines suddenly starts failing all over the place when executed on a new multiproc machine. The cause is a latent threading bug in the original code that simply never hit the right coincidence of timeslicing to repro on the uniproc machines.
When working with multithreaded code, it is absolutely imperative to test the code on multiproc hardware. If you have thread collision issues in your code, they will quickly present themselves on a multiproc machine.
As others have noted, don't use SuspendThread() unless you are a debugger. Use mutexes or other synchronization objects to coordinate between threads.
Try using something more like WaitForSingleObjectEx instead of SuspendThread.
You are hitting a race condition. Thread 2 may execute flag_thread1_running=true before thread 1 executes flag_thread1_running=false.
This is not likely to happen on a single CPU, because with the usual scheduling quantum of 10-20 ms you are unlikely to hit the problem. It can happen there as well, but very rarely.
Using proper synchronization primitives is a must here. Instead of bool, use event. Instead of checking the bool in a loop, use WaitForSingleObject (or WaitForMultipleObjects for more elaborate stuff later).
It is possible to perform synchronization between threads using plain variables, but it is rarely a good idea and it is quite hard to do it right - cf. How can I write a lock free structure?. It is definitely not a good idea to perform scheduling using Sleep, Suspend or Resume.
I guess that you already know that polling a global flag is a "Bad Idea™" so I'll skip that little speech. Try adding volatile to the flag declaration. That should force each read of it to read from memory. Without volatile, the implementation could be reading the flag into a register and not fetching it from memory.
I am working on a threaded application on Linux in C++ which attempts to be real time, doing an action on a heartbeat, or as close to it as possible.
In practice, I find the OS is swapping out my thread and causing delays of up to a tenth of a second while it is switched out, causing the heartbeat to be irregular.
Is there a way my thread can hint to the OS that now is a good time to context switch it out? I could make this call right after doing a heartbeat, and thus minimize the delay due to an ill timed context switch.
It is hard to say what the main problem is in your case, but it is most certainly not something that can be corrected with a call to sched_yield() or pthread_yield(). The only well-defined use for yielding, in Linux, is to allow a different ready thread to preempt the currently CPU-bound running thread at the same priority on the same CPU under SCHED_FIFO scheduling policy. Which is a poor design decision in almost all cases.
If you're serious about your goal of "attempting to be real-time" in Linux, then first of all, you should be using a real-time sched_setscheduler setting (SCHED_FIFO or SCHED_RR, FIFO preferred).
Second, get the full preemption patch for Linux (from kernel.org if your distro does not supply one). It will also give you the ability to reschedule device driver threads and to execute your thread higher than, say, hard disk or ethernet driver threads.
Third, see RTWiki and other resources for more hints on how to design and set up a real-time application.
This should be enough to get you under 10 microseconds response time, regardless of system load on any decent desktop system. I have an embedded system where I only squeeze out 60 us response idle and 150 us under heavy disk/system load, but it's still orders of magnitude faster than what you're describing.
You can tell the current executing thread to pause execution with various commands such as yield.
Just telling the thread to pause is non-deterministic: 999 times it might provide good intervals and 1 time it won't.
You will probably want to look at real-time scheduling for consistent results. This site http://www2.net.in.tum.de/~gregor/docs/pthread-scheduling.html seems to be a good starting point for researching thread scheduling.
Use sched_yield.
And for threads there is also pthread_yield: http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_yield.3.html
I'm a bit confused by the question. If your program is just waiting on a periodic heartbeat and then doing some work, then the OS should know to schedule other things when you go back to waiting on the heartbeat.
You aren't spinning on a flag to get your "heartbeat" are you?
You are using a timer function such as setitimer(), right? RIGHT???
If not, then you are doing it all wrong.
You may need to specify a timer interval that is just a little shorter than what you really need. If you are using a real-time scheduler priority and a timer, your process will almost always be woken up on time.
I would say always on time, but Linux isn't a perfect real-time OS yet.
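As a rough sketch of a timer-driven heartbeat that blocks between beats instead of spinning, the following uses std::this_thread::sleep_until with absolute deadlines. This is a swapped-in portable technique, not setitimer() itself; the helper name run_heartbeat is hypothetical, and it provides no real-time guarantee on its own.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: runs `action` `beats` times, one per `period`.
// Sleeping until an absolute deadline keeps late wake-ups from
// accumulating as drift over many beats.
void run_heartbeat(std::chrono::milliseconds period, int beats,
                   const std::function<void()>& action) {
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < beats; ++i) {
        action();
        next += period;                       // deadline derived from the start time
        std::this_thread::sleep_until(next);  // blocks; no spinning on a flag
    }
}
```

Because the thread is blocked between beats, the scheduler is free to run other work then, and combining this with a real-time policy is what makes on-time wake-ups likely.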
I'm not too sure about Linux, but on Windows it's been explained that you can't ask the system to not interrupt you for several reasons (first paragraph mostly). Off the top of my head, one of the reasons is hardware interrupts that can occur at any time and over which you have no control.
EDIT Some guy just suggested the use of sched_yield then deleted his answer. It'll relinquish time for your whole process though. You can also use sched_setscheduler to hint the kernel about what you need.
I've written a C++ library that does some seriously heavy CPU work (all of it math and calculations) and if left to its own devices, will easily consume 100% of all available CPU resources (it's also multithreaded to the number of available logical cores on the machine).
As such, I have a callback inside the main calculation loop that software using the library is supposed to call:
while(true)
{
//do math here
callback(percent_complete);
}
In the callback, the client calls Sleep(x) to slow down the thread.
Originally, the clientside code was a fixed Sleep(100) call, but this led to bad unreliable performance because some machines finish the math faster than others, but the sleep is the same on all machines. So now the client checks the system time, and if more than 1 second has passed (which == several iterations), it will sleep for half a second.
Is this an acceptable way of slowing down a thread? Should I be using a semaphore/mutex instead of Sleep() in order to maximize performance? Is sleeping x milliseconds for each 1 second of processing work fine or is there something wrong that I'm not noticing?
The reason I ask is that the machine still gets heavily bogged down even though taskman shows the process taking up ~10% of the CPU. I've already explored hard disk and memory contention to no avail, so now I'm wondering if the way I'm slowing down the thread is causing this problem.
Thanks!
Why don't you use a lower priority for the calculation threads? That will ensure other threads are scheduled while allowing your calculation threads to run as fast as possible if no other threads need to run.
What is wrong with the CPU at 100%? That's what you should strive for, not try to avoid. These math calculations are important, no? Unless you're trying to avoid hogging some other resource not explicitly managed by the OS (a mutex, the disk, etc) and used by the main thread, generally trying to slow your thread down is a bad idea. What about on multicore systems (which almost all systems will be, going forward)? You'd be slowing down a thread for absolutely no reason.
The OS has a concept of a thread quantum. It will take care of ensuring that no important thread on your system is starved. And, as I mentioned, on multicore systems spiking one thread on one CPU does not hurt performance for other threads on other cores at all.
I also see in another comment that this thread is also doing a lot of disk I/O - these operations will already cause your thread to yield while it's waiting for the results, so the sleeps will do nothing.
In general, if you're calling Sleep(x), there is something wrong/lazy with your design, and if x==0, you're opening yourself up to live locks (the thread calling Sleep(0) can actually be rescheduled immediately, making it a noop).
Sleep should be fine for throttling an app, which from your comments is what you're after. Perhaps you just need to be more precise how long you sleep for.
The only software in which I use a feature like this is the BOINC client. I don't know what mechanism it uses, but it's open-source and multi-platform, so help yourself.
It has a configuration option ("limit CPU use to X%"). The way I'd expect to implement that is to use platform-dependent APIs like clock() or GetSystemTimes(), and compare processor time against elapsed wall clock time. Do a bit of real work, check whether you're over or under par, and if you're over par sleep for a while to get back under.
The BOINC client plays nicely with priorities, and doesn't cause any performance issues for other apps even at 100% max CPU. The reason I use the throttle is that otherwise the client runs the CPU flat-out all the time, and drives up the fan speed and noise. So I run it at the level where the fan stays quiet. With better cooling maybe I wouldn't need it :-)
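The "compare processor time against elapsed wall clock time" idea can be sketched as a small calculation. This is only my guess at one way to do it, not BOINC's actual mechanism; the function name seconds_to_sleep is hypothetical, and the caller is assumed to sample CPU time and wall time with whatever platform API it prefers.

```cpp
#include <algorithm>

// Hypothetical helper: given CPU seconds consumed and wall-clock seconds
// elapsed since the same starting point, returns how many seconds to sleep so
// that cpu_seconds / (wall_seconds + sleep) does not exceed target_fraction.
// target_fraction is assumed to be in (0, 1].
double seconds_to_sleep(double cpu_seconds, double wall_seconds,
                        double target_fraction) {
    // Wall time that must have passed for the usage to be at the target.
    double required_wall = cpu_seconds / target_fraction;
    return std::max(0.0, required_wall - wall_seconds);
}
```

For example, after 1 second of CPU time in 1 second of wall time with a 50% target, the thread is "over par" and should sleep for 1 more second. If it is already under par the result is 0.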
Another, not so elaborate, method could be to time one iteration and let the thread sleep for (x * t) milliseconds before the next iteration where t is the millisecond time for one iteration and x is the chosen sleep time fraction (between 0 and 1).
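A minimal sketch of this per-iteration throttle, using std::chrono. The names throttle_delay and run_throttled are hypothetical, and microseconds are used instead of milliseconds only to keep short iterations from rounding to zero.

```cpp
#include <chrono>
#include <thread>

// Sleep length for an iteration that took `iteration_time`, i.e. x * t.
// x is the caller-chosen sleep factor.
std::chrono::microseconds throttle_delay(std::chrono::microseconds iteration_time,
                                         double x) {
    return std::chrono::duration_cast<std::chrono::microseconds>(iteration_time * x);
}

// Runs one iteration of `work`, times it, then sleeps for x times as long.
template <typename Work>
void run_throttled(Work work, double x) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto t = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    std::this_thread::sleep_for(throttle_delay(t, x));
}
```

Because the sleep scales with the measured iteration time, fast and slow machines end up with roughly the same duty cycle, which a fixed Sleep(100) cannot give.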
Have a look at cpulimit. It sends SIGSTOP and SIGCONT as required to keep a process below a given CPU usage percentage.
Even still, WTF at "crazy complaints and outlandish reviews about your software killing PC performance". I'd be more likely to complain that your software was slow and not making the best use of my hardware, but I'm not your customer.
Edit: on Windows, SuspendThread() and ResumeThread() can probably produce similar behaviour.