I am reading about real-time concepts at the following link:
http://www.embeddedlinux.org.cn/RTConforEmbSys/5107final/LiB0024.html
In section 4.4.4 it is mentioned that:
The dispatcher is the part of the scheduler that performs context
switching and changes the flow of execution. At any time an RTOS is
running, the flow of execution, also known as flow of control, is
passing through one of three areas: through an application task,
through an ISR, or through the kernel. When a task or ISR makes a
system call, the flow of control passes to the kernel to execute one
of the system routines provided by the kernel. When it is time to
leave the kernel, the dispatcher is responsible for passing control
to one of the tasks in the user’s application. It will not necessarily
be the same task that made the system call.
It is the scheduling algorithms (to be discussed shortly) of the
scheduler that determines which task executes next. It is the
dispatcher that does the actual work of context switching and passing
execution control.
Depending on how the kernel is first entered, dispatching can happen
differently. When a task makes system calls, the dispatcher is used to
exit the kernel after every system call completes. In this case, the
dispatcher is used on a call-by-call basis so that it can coordinate
task-state transitions that any of the system calls might have
caused. (One or more tasks may have become ready to run, for example.)
On the other hand, if an ISR makes system calls, the dispatcher is
bypassed until the ISR fully completes its execution. This process is
true even if some resources have been freed that would normally
trigger a context switch between tasks. These context switches do not
take place because the ISR must complete without being interrupted by
tasks. After the ISR completes execution, the kernel exits through the
dispatcher so that it can then dispatch the correct task.
My question on the above text is:
What does the author mean by "When a task makes system calls, the dispatcher is used to exit the kernel after every system call completes. In this case, the dispatcher is used on a call-by-call basis so that it can coordinate task-state transitions that any of the system calls might have caused."? Specifically, what does the author mean by "the dispatcher is used to exit the kernel"?
Thanks!
The author is presenting a simplified explanation of a real-time system architecture where a thread of control can be in one of three states - kernel mode (system calls), application mode (TASK), or interrupt service routine (ISR) mode.
The dispatcher in this example is a kernel routine that decides which application TASK is to receive control after each system call made by one of the application TASKs completes. This could be the TASK that issued the system call, or it could be a different TASK, depending on the dispatching algorithms and rules being followed.
There are many variations of dispatching rules and algorithms based on the planned usage of the system. As an example, you might think of giving each TASK an equal amount of CPU time per minute; if 3 application TASKs are being executed, each one is supposed to receive 20 seconds of CPU time every minute. The dispatcher in this case could decide that the next TASK to receive control is the TASK with the smallest accumulated CPU time during the last minute, in an attempt to distribute the CPU time equally across the TASKs.
After deciding which TASK is to receive control next, the dispatcher exits the system call and transfers control to that application TASK, so invoking the dispatcher is the equivalent of "exiting" the kernel and transferring control to an eligible application TASK.
The author states that this is a real-time system, which means emphasis is given to the quick processing of interrupts (via ISRs) over the processing of applications (TASKs). To minimize the amount of time consumed by each interrupt, when an ISR issues a system call the kernel returns directly to that ISR and does not "exit via the dispatcher", which would allow control to be given to an application TASK.
When the ISR has completed its processing, its exit is performed in a manner that causes the kernel to invoke the dispatcher (hence it exits via the dispatcher) so an application TASK can once again use the CPU.
As an additional note: one of the hidden assumptions in this explanation is that the same kernel routines (system calls) can be invoked by both application TASKs and interrupt service routines (ISRs). This is very common, but security and performance concerns might require different sets of kernel routines for ISRs and TASKs.
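To make the "exit via the dispatcher" idea concrete, here is a minimal sketch of such a kernel exit path; all of the names (kernel_exit, scheduler_pick_next, context_switch, and so on) are hypothetical and not taken from any particular RTOS:

// Hypothetical kernel exit path; every name below is illustrative only.
struct task_t;                                     // a task control block
extern task_t* current_task;                       // the task currently owning the CPU
task_t* scheduler_pick_next();                     // scheduling algorithm: choose the next ready task
void context_switch(task_t* from, task_t* to);     // save/restore CPU state

enum context_t { CONTEXT_TASK, CONTEXT_ISR };

// Called whenever the kernel is about to return to application code.
void kernel_exit(context_t caller)
{
    if (caller == CONTEXT_ISR)
        return;                                    // inside an ISR: bypass the dispatcher entirely
    task_t* next = scheduler_pick_next();          // some task may have become ready during the call
    if (next != current_task)
        context_switch(current_task, next);        // the dispatcher's actual work
    // control now returns to 'next', not necessarily the task that made the system call
}

Only when the ISR finally finishes does the kernel take the dispatcher path, which is the behaviour the quoted paragraph describes.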
After the system call finishes executing, control has to be passed back to a user-space task. There are likely many user-space tasks waiting to run, and they may all have different priorities. The dispatcher uses its algorithm to evaluate the waiting tasks based on priority and other criteria (how long have they been waiting? how long do I anticipate they will need?) and then starts one of them.
For example, you might have an application that needs to read input from the command line. Your application calls the read() system call, which passes control to the kernel. After the read() is complete, the dispatcher evaluates the tasks waiting to run and may decide that a task other than the one that called read() should run next.
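As a rough illustration of that selection step, a priority-based dispatcher could look like the sketch below; the ready-list layout and the tie-breaking rule are made up for the example:

// Illustrative ready-list walk; structure and field names are invented for this sketch.
struct task_t {
    int     priority;     // higher value = more urgent
    long    wait_ticks;   // how long the task has been ready to run
    task_t* next;         // next entry in the ready list
};

task_t* pick_next_task(task_t* ready_list)
{
    task_t* best = ready_list;
    for (task_t* t = ready_list; t != 0; t = t->next) {
        // primary criterion: priority; tie-breaker: the longest-waiting task
        if (t->priority > best->priority ||
            (t->priority == best->priority && t->wait_ticks > best->wait_ticks))
            best = t;
    }
    return best;   // may or may not be the task that just returned from read()
}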
Related
I've inherited a large code base that contains multiple serial interface classes to various hardware components. Each of these serial classes uses non-overlapped serial I/O for its communication. I have an issue where I get random CPU spikes to 100%, which causes the threads to stall briefly, and then the CPU goes back to normal usage after ~10-20 seconds.
My theory is that, due to the blocking nature of non-overlapped serial I/O, there are times when multiple threads are calling readFile() and blocking each other.
My question is: if multiple threads are calling readFile() (or writeFile()) at the same time, will they block each other? Based on my research I believe that's true, but I would like confirmation.
The platform is Windows XP with C++03, so I don't have many modern tools available.
"if multiple threads are calling readFile() (or writeFile()) at the same time will they block each other?"
As far as I know, they will block each other.
I suggest you refer to the documentation: Synchronization and Overlapped Input and Output
When a function is executed synchronously, it does not return until
the operation has been completed. This means that the execution of the
calling thread can be blocked for an indefinite period while it waits
for a time-consuming operation to finish. Functions called for
overlapped operation can return immediately, even though the operation
has not been completed. This enables a time-consuming I/O operation to
be executed in the background while the calling thread is free to
perform other tasks.
Using the same event on multiple threads can lead to a race condition
in which the event is signaled correctly for the thread whose
operation completes first and prematurely for other threads using that
event.
And the operating system is in charge of the CPU. Your code only gets to run when the operating system schedules it. The OS will not bother running threads that are blocked; blocking will not occupy the CPU. I suggest you try using the Windows Performance Toolkit to check CPU utilization.
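If the blocking reads do turn out to be the culprit, one common alternative on Windows is overlapped I/O, which is what the quoted documentation describes. Below is a minimal sketch; the COM port name and the 5-second timeout are placeholders, and error handling is reduced to the bare minimum:

#include <windows.h>
#include <stdio.h>

int main()
{
    // Port name is just a placeholder for this sketch.
    HANDLE h = CreateFileA("\\\\.\\COM3", GENERIC_READ | GENERIC_WRITE, 0, NULL,
                           OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE)
        return 1;

    char buf[256];
    DWORD bytesRead = 0;
    OVERLAPPED ov = {0};
    ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);   // one event per outstanding read

    if (!ReadFile(h, buf, sizeof(buf), NULL, &ov)) {
        if (GetLastError() == ERROR_IO_PENDING) {
            // ReadFile returned immediately; the thread can do other work here
            // instead of sitting blocked inside the call.
            WaitForSingleObject(ov.hEvent, 5000);        // wait up to 5 s for completion
            GetOverlappedResult(h, &ov, &bytesRead, FALSE);
        }
    }
    printf("read %lu bytes\n", (unsigned long)bytesRead);
    CloseHandle(ov.hEvent);
    CloseHandle(h);
    return 0;
}

This only uses the classic Win32 API, so it works on Windows XP with C++03.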
I want to use asynchronous function calls. I chose boost::deadline_timer.
For me, a hardware timer is a specific piece of hardware (surprisingly) that works independently from the CPU and whose only duty is to keep track of time. At the same time, if I understand correctly, it can also be used for setting a timeout and generating an interrupt when the timeout has been reached. (timers)
The primary advantage of that is asynchronous execution: the thread that set a timer can continue working, and the callback function will be triggered in the same thread where the timer was set.
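For reference, setting such a timer with boost::asio looks roughly like the sketch below. One detail worth noting: the completion handler is invoked by whichever thread is running io_service::run(), which is often, but not necessarily, the thread that created the timer:

#include <boost/asio.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

void on_timeout(const boost::system::error_code& ec)
{
    if (!ec)
        std::cout << "timer expired" << std::endl;
}

int main()
{
    boost::asio::io_service io;
    boost::asio::deadline_timer timer(io, boost::posix_time::seconds(5));
    timer.async_wait(&on_timeout);   // returns immediately, the wait happens asynchronously

    // ... the thread is free to do other work here ...

    io.run();   // the callback is invoked from a thread that calls run()
    return 0;
}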
Let me describe it as I see it in action.
The application contains one or more worker threads, e.g. they process input items and filter them. Let's consider that the application has 5 threads and each thread sets one timer (5 seconds).
1. The application is working; e.g. the current thread is thread-3.
2. The timer set by thread-0 expires and generates (probably the wrong term) an interrupt.
3. Thread context switch (thread-3 -> thread-0).
4. Callback function execution.
5. The timer set by thread-1 expires and generates an interrupt.
6. ...
7. And so on.
P.S. I understand that this is not the only possible scenario for a multi-threaded application.
Questions:
1. Did I describe the working process correctly?
2. Do I understand correctly that even if the current thread is thread-0, it still leads to a context switch, since the thread has to stop executing its current code and switch to executing the code from the callback function?
3. If each thread sets 100k or 500k timers, how will that affect performance?
4. Does the hardware have a limit on the number of timers?
5. How expensive is it to update the timeout of a timer?
A hardware timer is, at its core, just a count-up counter and a set of comparators (or a count-down counter that uses the borrow of the MSb as an implicit comparison with 0).
Picture it as a register with a specialized operation Increment (or Decrement) that is started at every cycle of a clock (the easiest kind of counter with this operation is the Ripple-counter).
Each cycle the counter value is also fed to the comparator, previously loaded with a value, and its output will be the input to the CPU (as an interrupt or in a specialized pin).
In the case of a count-down counter, the borrow from the MSb acts as the signal that the value rolled over zero.
These timers usually have more functions, like the ability to stop after they reach the desired value (one-shot), to reset (periodic), to alternate the output state between low and high (square-wave generator), and other fancy features.
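As a purely illustrative sketch of how such a peripheral is typically programmed for a periodic interrupt, here is a hypothetical memory-mapped timer; the addresses, register names, and control bits are invented and do not correspond to any real MCU:

#include <stdint.h>

// Hypothetical memory-mapped timer peripheral; addresses and bits are made up.
#define TIMER_BASE   0x40001000u
#define TIMER_LOAD   (*(volatile uint32_t*)(TIMER_BASE + 0x00)) // value counted down from
#define TIMER_CTRL   (*(volatile uint32_t*)(TIMER_BASE + 0x04)) // control bits
#define TIMER_FLAG   (*(volatile uint32_t*)(TIMER_BASE + 0x08)) // interrupt flag, write 1 to clear

#define CTRL_ENABLE    (1u << 0)   // start counting
#define CTRL_PERIODIC  (1u << 1)   // reload automatically when reaching zero
#define CTRL_IRQ_EN    (1u << 2)   // raise an interrupt on expiry

void timer_init_1ms(uint32_t clock_hz)
{
    TIMER_LOAD = clock_hz / 1000u;                     // one reload per millisecond
    TIMER_CTRL = CTRL_ENABLE | CTRL_PERIODIC | CTRL_IRQ_EN;
}

void timer_isr(void)        // registered in the vector table by other code
{
    TIMER_FLAG = 1u;        // acknowledge the interrupt
    // ... tick bookkeeping goes here ...
}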
There is no limit on how many timers you can put in a package; of course, even though they are simple circuits, they still have a cost in terms of money and space.
Most MCUs have one or two timers, when two, the idea is to use one for generic scheduling and the other for high-priority tasks orthogonal to the OS scheduling.
It's worth noting that having many hardware timers (to be used by the software) is useless unless there are also many CPUs/MCUs since it's easier to use software timers.
On x86 the HPET timer is actually made of at most 32 timers, each with 8 comparators, for a total of 256 timers as seen from the software POV.
The idea was to assign each timer to a specific application.
Applications in an OS don't use the hardware timers directly, because there can possibly be a lot of applications but just one or two timers.
So what the OS does is share the timer.
It does this by programming the timer to generate an interrupt every X units of time and by registering an ISR (Interrupt Service Routine) for such an event.
When a thread/task/program sets up a timer, the OS appends the timer information (periodic vs one-shot, period, ticks left, and callback) to a priority queue keyed by the absolute expiration time, or to a plain list for simple OSes.
Each time the ISR is called the OS will peek at the queue and see if the element on top is expired.
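A minimal sketch of that idea follows; the data structure and names are chosen only for illustration (a real kernel would use a static, interrupt-safe structure rather than the STL), but it shows the tick ISR peeking at a queue ordered by absolute expiration time:

#include <stdint.h>
#include <queue>
#include <vector>

// Illustrative software-timer record; the field names are made up for this sketch.
struct SwTimer {
    uint64_t expires_at;        // absolute tick count at which the timer fires
    uint64_t period;            // 0 for one-shot, reload value for periodic
    void   (*callback)(void*);  // function to run on expiry
    void*    arg;
};

struct Later {                  // order the queue so the earliest expiry is on top
    bool operator()(const SwTimer& a, const SwTimer& b) const {
        return a.expires_at > b.expires_at;
    }
};

static std::priority_queue<SwTimer, std::vector<SwTimer>, Later> g_timers;
static uint64_t g_ticks = 0;

// Called from the hardware timer ISR on every tick.
void timer_tick_isr()
{
    ++g_ticks;
    while (!g_timers.empty() && g_timers.top().expires_at <= g_ticks) {
        SwTimer t = g_timers.top();
        g_timers.pop();
        t.callback(t.arg);                     // small OSes may call this directly from the ISR
        if (t.period) {                        // periodic timers are re-armed
            t.expires_at = g_ticks + t.period;
            g_timers.push(t);
        }
    }
}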
What happens when a software timer is expired is OS-dependent.
Some embedded and small OS may call the timer's callback directly from the context of the ISR.
This is often true if the OS doesn't really have a concept of thread/task (and so of context switch).
Other OSes may append the timer's callback to a list of "to be called soon" functions.
This list will be walked and processed by a specialized task. This is how FreeRTOS does it if the timer task is enabled.
This approach keeps the ISR short and allows programming the hardware timer with a shorter period (in many architectures interrupts are ignored while in an ISR, either by the CPU automatically masking interrupts or by the interrupt controller).
IIRC Windows does something similar: it posts an APC (Asynchronous Procedure Call) in the context of the thread that set the software timer that just expired. When the thread is scheduled, the APC will (as a form of window message or not, depending on the specific API used) call the callback. If the thread was waiting on the timer, I think it is just set to the ready state. In any case, it's not scheduled right away, but it may get a priority boost.
Where the ISR will return is still OS-dependent.
An OS may continue executing the interrupted thread/task until it is scheduled out. In this case, you don't get the context switch of step 3 immediately after the interrupt of step 2; instead, thread-3 will run until its quantum expires.
The other way around, an OS may signal the end of the ISR to the hardware and then schedule the thread with the callback.
This approach doesn't work if two or more timers expire in the same tick, so a better approach would be to execute a rescheduling, letting the scheduler pick the most appropriate thread.
The scheduling may also take into account other hints given by the thread during the creation of the software timer.
The OS may also just switch context, execute the callback and get back to the ISR context where it continues peeking at the queue.
The OS may even do any of that based on the period of the timer and other hints.
So it works pretty much like you imagined, except that a thread may not be called immediately upon the timer's expiration.
Updating a timer is not expensive.
While all in all the total work is not much, the timer ISR is meant to be called many many times a second.
In fact, I'm not even sure an OS will allow you to create such a huge number (500k) of timers.
Windows can manage a lot of timers (and their backing threads) but probably not 500k.
The main problem with having a lot of timers is that even if each one performs little work, the total work performed may be too much to keep up with the rate of ticking.
If every X units of time (e.g. 1 ms) 100 timers expire, you have X/100 units of time (e.g. 10 µs) to execute each callback, and the callback's code may simply be too long to execute in that slice of time.
When this happens the callbacks will be called less often than desired.
More CPUs/cores will allow some callbacks to execute in parallel and would alleviate the pressure.
In general, you need different timers if they run at different rates, otherwise, a single timer that walks a data structure filled with elements of work/data is fine.
Multi-threading can provide concurrency if your tasks are IO-bound (files, network, input, and so on), or parallelism if you have a multi-processor system.
Let's say you have two tasks. Each one has its own complex modules running schedule-based and event-based systems. When thinking about context switching, exactly when and how does a task scheduler decide to switch tasks, and at what point can it do this? Will a task switch while in the middle of executing a block of code? Right in the middle of a function?
For reference I am working in a vxworks environment.
Generally, operating system schedulers have no concern for blocks of C code. They switch when various events occur, including:
A timer measuring how long your process has been using the CPU expires.
A device connected to the computer reports it has completed a task, and some process with higher priority than yours was waiting for this.
Your process makes a request that cannot be satisfied immediately, such as requesting input from the keyboard, and the user has not typed it yet.
In the last case, the switch of course occurs at the point of your request. The others are effectively random with regard to where your process is executing. The associated interrupt can occur at any instruction in your process.
In some processor architectures, an interrupt can even occur during certain instructions: The instruction may be interrupted when it has only been partially executed, and the registers will be updated so that execution can be resumed to continue the instruction later.
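If a particular block of code must not be preempted by other tasks, VxWorks lets you ask for that explicitly. A minimal sketch is below; the work inside the locked regions is a placeholder, and note that taskLock() keeps other tasks out but interrupts can still fire:

#include <vxWorks.h>
#include <taskLib.h>   // taskLock() / taskUnlock()
#include <intLib.h>    // intLock() / intUnlock()

void update_shared_state(void)
{
    taskLock();                 // disable task preemption: no other task will run here...
    /* ...critical work on shared data goes here (placeholder)... */
    taskUnlock();               // ...but interrupts can still occur in between

    int key = intLock();        // stronger: also lock out interrupts (keep this very short)
    /* ...placeholder for the few instructions that must be truly atomic... */
    intUnlock(key);
}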
As far as I know, Crystal switches between Fibers on IO, meaning that if one fiber is waiting for IO, Crystal will switch to another fiber.
What if we spawn two fibers but one of them does constant computation/looping with no IO?
For example, with the code below the server doesn't respond to any HTTP requests:
spawn do
Kemal.run
end
spawn do
# constant computation/loop with no IO
some_func
end
Fiber.yield
# or sleep
By default, Crystal uses cooperative multitasking. To implement it, the Crystal runtime provides Fibers. Due to their cooperative nature, you have to yield execution from time to time (e.g. with Fiber.yield):
Fibers are cooperative. That means execution can only be drawn from a fiber when it offers it. It can't be interrupted in its execution at random. In order to make concurrency work, fibers must make sure to occasionally provide hooks for the scheduler to swap in other fibers. [...]
When a computation-intensive task has none or only rare IO operations, a fiber should explicitly offer to yield execution from time to time using Fiber.yield to break up tight loops. The frequency of this call depends on the application and concurrency model.
Note that CPU-intense operations are not the only source of starving other Fibers. When calling C libraries that may block, the Fiber will also wait for the operation to complete. An example would be a long-polling operation, which waits for the next event or eventually times out (e.g. rd_kafka_poll in Kafka). To prevent that, prefer an async API version (if available), or use a short polling interval (e.g. 0 for the Kafka poll) and shift the sleep operation to the Crystal runtime, so the other Fibers can run.
In 2019, Crystal introduced support for parallelism. By running multiple worker threads, you can also prevent one expensive computation from starving all other operations. However, you have to be careful, as the responsiveness (and maybe even the correctness) of the program could then depend on the number of workers (e.g. with only one worker, it will still hang). Overall, yielding occasionally in time-intensive operations seems to be the better solution, even if you end up using multiple workers for the improved performance on multi-core machines.
There is an easy way to calculate the duration of any function, which is described here: How to Calculate Execution Time of a Code Snippet in C++
start_timestamp = get_current_uptime();
// measured algorithm
duration_of_code = get_current_uptime() - start_timestamp;
But it does not give a clean duration, because time spent executing other threads will be included in the measured time.
So the question is: how do I account for the time the code spends in other threads?
OSX code preferred, although it would be great to see Windows or Linux code as well...
Update: the ideal(?) concept of the code:
start_timestamp = get_this_thread_current_uptime();
// measured algorithm
duration_of_code = get_this_thread_current_uptime() - start_timestamp;
I'm sorry to say that in the general case there is no way to do what you want. You are looking for worst-case execution time, and there are several methods to get a good approximation for this, but there is no perfect way as WCET is equivalent to the Halting problem.
If you want to exclude the time spent in other threads then you could disable task context switches upon entering the function that you want to measure. This is RTOS dependent but one possibility is to raise the priority of the current thread to the maximum. If this thread is max priority then other threads won't be able to run. Remember to reset the thread priority again at the end of the function. This measurement may still include the time spent in interrupts, however.
Another idea is to disable interrupts altogether. This could remove other threads and interrupts from your measurement. But with interrupts disabled the timer interrupt may not function properly. So you'll need to set up a hardware timer appropriately and rely on the timer's counter value register (rather than any time value derived from a timer interrupt) to measure the time. Also make sure your function doesn't call any RTOS routines that allow for a context switch. And remember to restore interrupts at the end of your function.
Another idea is to run the function many times and record the shortest duration measured over those many times. Longer durations probably include time spent in other threads but the shortest duration may be just the function with no other threads.
Another idea is to set a GPIO pin upon entry to and clear it upon exit from the function. Then monitor the GPIO pin with an oscilloscope (or logic analyzer). Use the oscilloscope to measure the period for when the GPIO pin is high. In order to remove the time spent in other threads you would need to modify the RTOS scheduler routine that selects the thread to run. Clear the GPIO pin in the scheduler when another thread runs and set it when the scheduler returns to your function's thread. You might also consider clearing the GPIO pin in interrupt handlers.
Your question is entirely OS dependent. The only way you can accomplish this is to somehow get a guarantee from the OS that it won't preempt your process to perform some other task, and to my knowledge this is simply not possible in most consumer OSes.
RTOSes often do provide ways to accomplish this, though. With Windows CE, anything running at priority 0 will (in theory) not be preempted by another thread unless it makes a function/OS API/library call that requires servicing from another thread.
I'm not super familiar with OSX, but after glancing at the documentation, OSX is a "soft" real-time operating system. This means that technically what you want can't be guaranteed: the OS may decide that there is "something" more important than your process that NEEDS to be done.
OSX does, however, allow you to specify a real-time process, which means the OS will make every effort to honor your request not to be interrupted and will only do so if it deems it absolutely necessary.
The Mac OS X scheduling documentation provides examples of how to set up real-time threads.
OSX is not an RTOS, so the question is mistitled and mistagged.
In a true RTOS you can lock the scheduler, disable interrupts, or raise the task to the highest priority (with round-robin scheduling disabled if other tasks share that priority) to prevent preemption, although only disabling interrupts will truly prevent preemption by interrupt handlers. In a GPOS, even if it has a priority scheme, that normally only controls the number of timeslices allowed to a process in what is otherwise round-robin scheduling, and it does not prevent preemption.
One approach is to make many repeated tests and take the smallest value obtained, since that is likely to be the one where the fewest preemptions occurred. It will also help to set the process to the highest priority in order to minimise the number of preemptions. But bear in mind that on a GPOS many interrupts from devices such as the mouse, keyboard, and system clock will occur and consume a small (and possibly negligible) amount of time.
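As a footnote on the "ideal concept" from the question: on POSIX systems a per-thread CPU clock can approximate get_this_thread_current_uptime(), since it only advances while the calling thread is actually on the CPU. A minimal sketch follows; CLOCK_THREAD_CPUTIME_ID is available on Linux and only on newer macOS releases, so its availability on the target OSX version is an assumption to verify:

#include <time.h>
#include <stdio.h>

// Per-thread CPU time in nanoseconds; it does not advance while other threads run.
static long long thread_cpu_time_ns()
{
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main()
{
    long long start = thread_cpu_time_ns();
    /* ... measured algorithm goes here ... */
    long long duration = thread_cpu_time_ns() - start;
    printf("thread CPU time: %lld ns\n", duration);
    return 0;
}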