"Blinky" using C++ condition_variable for interruptible delays - c++

A class stores the state of the program: ready or busy. An LED indicator shows a steady colour when ready; otherwise it blinks to indicate that the device is busy. I want the LED to "immediately" reflect changes in the program's state, i.e. it should not attempt to finish its blink cycle when the state changes to ready.
A routine running in its own thread acts on this information as follows:
using namespace std::chrono_literals;
// This runs in a thread ... only snippet given here
while (!m_should_close) {
    if (!m_ready) {
        // Blink
        std::unique_lock<std::mutex> lk(m_needs_update_mtx);
        led_on();
        // wait_for returns true if the predicate was satisfied (an update
        // arrived), false if the 300 ms elapsed quietly.
        if (m_needs_update_cv.wait_for(lk, 300ms, [this]{ return m_needs_update; })) continue;
        led_off();
        if (m_needs_update_cv.wait_for(lk, 300ms, [this]{ return m_needs_update; })) continue;
    } else {
        // Steady on
        led_on();
    }
}
The m_needs_update flag (a std::atomic<bool>) is set to true to notify the blinker thread that a change has occurred to either m_should_close or m_ready, and m_needs_update_cv is a std::condition_variable. I have two problems with my design:
It doesn't feel right. Substituting those two plain delays with long lines of "gibberish" feels convoluted.
The timer, with an effective period of 600 ms, is not very accurate and is at the whim of Linux scheduling, contention, etc.
Do you have any architectural advice? Thanks.
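One possible direction, offered as a hedged sketch rather than a tested implementation: factor the interruptible delay into a named helper so the loop reads cleanly, and schedule the on/off edges against absolute deadlines with wait_until so the 600 ms period does not accumulate drift. The member names mirror the snippet above; interruptible_wait_until and the deadline re-anchoring are my assumptions, not existing code.

// Sketch only: returns true if an update interrupted the wait,
// false if the deadline passed quietly.
bool interruptible_wait_until(std::unique_lock<std::mutex>& lk,
                              std::chrono::steady_clock::time_point deadline)
{
    return m_needs_update_cv.wait_until(lk, deadline, [this]{ return m_needs_update; });
}

// Blink loop using absolute deadlines: jitter in one cycle does not
// shift the later cycles, unlike back-to-back relative waits.
auto next = std::chrono::steady_clock::now();
while (!m_should_close) {
    if (!m_ready) {
        std::unique_lock<std::mutex> lk(m_needs_update_mtx);
        led_on();
        next += 300ms;
        if (interruptible_wait_until(lk, next)) {
            m_needs_update = false;                  // consume the notification
            next = std::chrono::steady_clock::now(); // re-anchor after the interruption
            continue;                                // re-check state
        }
        led_off();
        next += 300ms;
        if (interruptible_wait_until(lk, next)) {
            m_needs_update = false;
            next = std::chrono::steady_clock::now();
            continue;
        }
    } else {
        // Steady on
        led_on();
        next = std::chrono::steady_clock::now(); // re-anchor for the next blink phase
    }
}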

Related

How can I clear any pending interrupts in the NodeMCU ESP32?

I have built a simple coin sensor with two copper plates that detect when a coin hits them. Upon striking the two plates, I fire off an interrupt which looks like:
attachInterrupt( digitalPinToInterrupt(INPUT_PIN_COIN), Interrupt_CoinDeposit, FALLING );
This works fine and I am able to detect when the coin strikes the two plates. To avoid the same coin being registered multiple times due to contact bounce, I detach the interrupt within the Interrupt_CoinDeposit() function, like so:
void IRAM_ATTR Interrupt_CoinDeposit()
{
    detachInterrupt(digitalPinToInterrupt(17));
    g_crOSCore.EnqueueCoin();
}
EnqueueCoin simply increments a counter and returns to where the interrupt left off. Afterwards, I check whether the counter has increased, and if it has, I reattach the interrupt. However, upon reattaching the interrupt, it fires immediately. I learnt that reattaching the interrupt delivers all the pending interrupts. I do not want this to happen. On the Arduino UNO R3, I believe you can solve this problem by resetting the EIFR register. I'm wondering if there is something similar for the NodeMCU ESP32?
You could use a flag instead of disabling the interrupt. This way you also avoid calling detachInterrupt() inside the ISR.
volatile bool coinRegistered = false; // volatile: shared between the ISR and the main code

void IRAM_ATTR Interrupt_CoinDeposit()
{
    if (!coinRegistered) {
        coinRegistered = true;
        g_crOSCore.EnqueueCoin();
    }
}

/* ... somewhere else in the code ... */
coinRegistered = false;
You can either start a timer in the ISR that resets the flag, or reset it manually.
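A minimal sketch of the manual-reset option (the 50 ms bounce window and the coinTimeMs variable are my assumptions, not from the post above):

volatile bool coinRegistered = false;
volatile unsigned long coinTimeMs = 0;

void IRAM_ATTR Interrupt_CoinDeposit()
{
    if (!coinRegistered) {
        coinRegistered = true;
        coinTimeMs = millis(); // timestamp of the hit
        g_crOSCore.EnqueueCoin();
    }
}

void loop()
{
    // Re-arm detection once the contact bounce has had time to settle.
    if (coinRegistered && millis() - coinTimeMs > 50) {
        coinRegistered = false;
    }
}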

Posix Timer on SCHED_RR Thread is using 100% CPU

I have the following code snippet:
#include <iostream>
#include <thread>
#include <cstdint>
#include <pthread.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>

void func15ns() {} // dummy stand-in so the snippet compiles

int main() {
    std::thread rr_thread([](){
        struct sched_param params = {5};
        pthread_setschedparam(pthread_self(), SCHED_RR, &params);

        struct itimerspec ts;
        struct epoll_event ev;
        int tfd, epfd;
        uint64_t missed;

        ts.it_interval.tv_sec = 0;
        ts.it_interval.tv_nsec = 0;
        ts.it_value.tv_sec = 0;
        ts.it_value.tv_nsec = 20000; // 50 kHz timer

        tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        timerfd_settime(tfd, 0, &ts, NULL);

        epfd = epoll_create(1);
        ev.events = EPOLLIN;
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        while (true) {
            epoll_wait(epfd, &ev, 1, -1); // wait forever for the timer
            read(tfd, &missed, sizeof(missed));
            // Here I have a blocking function (dummy in this example) which
            // takes on average 15 ns to execute, less than the timer period anyway
            func15ns();
        }
    });
    rr_thread.join();
}
I have a POSIX thread using the SCHED_RR policy, and on this thread a POSIX timer is running with a timeout of 20000 ns = 50 kHz = 50000 ticks/sec.
After the timer fires I execute a function that takes roughly 15 ns, so less than the timer period, but this doesn't really matter.
When I run this I get 100% CPU usage and the whole system becomes slow, but I don't understand why this is happening, and some things are confusing:
Why 100% CPU usage, since the thread is supposed to be sleeping while waiting for the timer to fire, so other tasks can be scheduled, in theory, right? Even if this is a high-priority thread.
I checked the number of context switches using pidstat and it seems to be very small, close to 0, both voluntary and involuntary ones. Is this normal? While waiting for the timer to fire the scheduler should schedule other tasks, right? I should see at least 20000 * 2 context switches / sec.
As presented, your program does not behave as you describe. This is because you program the timer as a one-shot, not a repeating timer. For a timer that fires every 20000 ns, you want to set a 20000-ns interval:
ts.it_interval.tv_nsec = 20000;
Having modified that, I get a program that produces a heavy load on one core, as you describe.
Why 100% CPU usage, since the thread is supposed to be sleeping while waiting for the timer to fire, so other tasks can be scheduled, in theory, right? Even if this is a high-priority thread.
Sure, your thread blocks in epoll_wait() to await timer ticks, if in fact it manages to loop back there before the timer ticks again. On my machine, your program consumes only about 30% of one core, which seems to confirm that such blocking will indeed happen. That you see 100% CPU use suggests that my computer runs the program more efficiently than yours does, for whatever reason.
But you have to appreciate that the load is very heavy. You are asking to perform all the processing of the timer itself, the epoll call, the read, and func15ns() once every 20000 ns. Yes, whatever time may be left, if any, is available to be scheduled for another task, but the task swap takes a bit more time again. 20000 ns is not very much time. Consider that just fetching a word from main memory costs about 100 ns (though reading one from cache is of course faster).
In particular, do not neglect the work other than func15ns(). If the latter indeed takes only 15 ns to run then it's the least of your worries. You're performing two system calls, and these are expensive. Just how expensive depends on a lot of factors, but consider that removing the epoll_wait() call reduces the load for me from 30% to 25% of a core (and note that the whole epoll setup is superfluous here because simply allowing the read() to block serves the purpose).
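To illustrate that last point, here is a sketch of the loop with the epoll layer removed, letting the blocking read() pace the iterations (same variables as in the snippet above):

uint64_t missed;
while (true) {
    // Blocks until the next timer expiration; 'missed' receives the number
    // of expirations since the previous read (useful for detecting overruns).
    read(tfd, &missed, sizeof(missed));
    func15ns();
}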
I checked the number of context switches using pidstat and it seems to be very small, close to 0, both voluntary and involuntary ones. Is this normal? While waiting for the timer to fire the scheduler should schedule other tasks, right? I should see at least 20000 * 2 context switches / sec.
You're occupying a full CPU with a high priority task, so why do you expect switching?
On the other hand, I'm also observing a low number of context switches for the process running your (modified) program, even though it's occupying only 25% of a core. I'm not prepared at the moment to reason about why that is.

Light event in WinAPI / C++

Is there some light (thus fast) event in WinAPI / C++? Particularly, I'm interested in minimizing the time spent waiting for the event (like WaitForSingleObject()) when the event is set. Here is a code example to clarify further what I mean:
#include <Windows.h>
#include <chrono>
#include <stdio.h>

int main()
{
    const int64_t nIterations = 10 * 1000 * 1000;
    HANDLE hEvent = CreateEvent(nullptr, true, true, nullptr);
    auto start = std::chrono::high_resolution_clock::now();
    for (int64_t i = 0; i < nIterations; i++) {
        WaitForSingleObject(hEvent, INFINITE);
    }
    auto elapsed = std::chrono::high_resolution_clock::now() - start;
    double nSec = 1e-6 * std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
    printf("%.3lf Ops/sec\n", nIterations / nSec);
    return 0;
}
On a 3.85 GHz Ryzen 1800X I'm getting 7209623.405 operations per second, meaning 534 CPU clocks (or 138.7 nanoseconds) are spent on average on a check whether the event is set.
However, I want to use the event in performance-critical code where most of the time the event is actually set, so it's just a check for a special case, and in that case the control flow goes to code which is not performance-critical (because this situation is seldom).
WinAPI events which I know (created with CreateEvent) are heavy-weight because of security attributes and names. They are intended for inter-process communication. Perhaps WaitForSingleObject() is so slow because it switches from user to kernel mode and back, even when the event is set. Furthermore, this function has to behave differently for manual- and auto-reset events, and a check for the type of the event takes time too.
I know that a fast user-mode mutex (spin lock) can be implemented with std::atomic_flag. Its spinning loop can be extended with std::this_thread::yield() in order to let other threads run while spinning.
With the event I wouldn't like a complete equivalent of a spin lock, because when the event is not set, it may take substantial time till it becomes set again. If every thread that needs the event set starts spinning till it's set again, that would be an epic waste of CPU electricity (though it shouldn't affect system performance if they call std::this_thread::yield()).
So I would rather like an analogue of a critical section, which usually just does the work in user mode, and when it realizes it needs to wait (runs out of spins), it switches to kernel mode and waits on a heavy synchronization object like a mutex.
UPDATE1: I've found that .NET has ManualResetEventSlim, but I couldn't find an equivalent in WinAPI / C++.
UPDATE2: Because details of the event usage were requested, here they are. I'm implementing a knowledge base that can be switched between regular and maintenance mode. Some operations are maintenance-only, some are regular-only, some can work in both modes, but of those some are faster in maintenance mode and some are faster in regular mode. Upon its start each operation needs to know whether it is in maintenance or regular mode, as the logic changes (or the operation refuses to execute at all). From time to time the user can request a switch between maintenance and regular mode. This is rare. When this request arrives, no new operations in the old mode can start (a request to do so fails) and the app waits for the current operations in the old mode to finish, then it switches mode. So the light event is part of this data structure: the operations other than mode switching have to be fast, so they need to set/reset/wait on the event quickly.
Beginning with Windows 8, the best solution is to use WaitOnAddress (in place of WaitForSingleObject), WakeByAddressAll (works like SetEvent for a notification event) and WakeByAddressSingle (works like SetEvent for a synchronization event). For more, read: WaitOnAddress lets you create a synchronization object.
The implementation can look like this:
class LightEvent
{
    BOOLEAN _Signaled;
public:
    LightEvent(BOOLEAN Signaled)
    {
        _Signaled = Signaled;
    }

    void Reset()
    {
        _Signaled = FALSE;
    }

    void Set(BOOLEAN bWakeAll)
    {
        _Signaled = TRUE;
        // Wake either every waiter or just one, depending on bWakeAll.
        (bWakeAll ? WakeByAddressAll : WakeByAddressSingle)(&_Signaled);
    }

    BOOL Wait(DWORD dwMilliseconds = INFINITE)
    {
        BOOLEAN Signaled = FALSE; // the "undesired" value to wait on
        while (!_Signaled)
        {
            // Sleeps only while _Signaled still equals the compare value (FALSE).
            if (!WaitOnAddress(&_Signaled, &Signaled, sizeof(BOOLEAN), dwMilliseconds))
            {
                return FALSE; // timeout
            }
        }
        return TRUE;
    }
};
Don't forget to add Synchronization.lib to the linker input.
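For what it's worth, a minimal usage sketch of the class above (the surrounding thread bodies are assumed, not part of the answer):

LightEvent ev(FALSE);      // start non-signaled

// waiting thread
if (ev.Wait(1000)) {       // block up to 1 s for the event
    // ... event was signaled ...
}

// signaling thread
ev.Set(TRUE);              // wake all current waiters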
The code behind this new API is very efficient: it does not create internal kernel objects for the wait (like an event) but uses the new APIs ZwAlertThreadByThreadId and ZwWaitForAlertByThreadId, specially designed for this purpose.
How would you implement this yourself, before Windows 8? At first glance it looks trivial: a boolean variable plus an event handle. It might look like this:
void Set()
{
    SetEvent(_hEvent);
    // Sleep(1000); // simulate the thread being interrupted here
    _Signaled = true;
}

void Reset()
{
    _Signaled = false;
    // Sleep(1000); // simulate the thread being interrupted here
    ResetEvent(_hEvent);
}

void Wait(DWORD dwMilliseconds = INFINITE)
{
    if (!_Signaled) WaitForSingleObject(_hEvent, dwMilliseconds);
}
But this code is really incorrect. The problem is that we do two operations in Set (and in Reset): we change the state of _Signaled and the state of _hEvent, and there is no way to do this from user mode as a single atomic/interlocked operation. This means a thread can be interrupted between these two operations. Assume that two different threads call Set and Reset concurrently. In most cases the operations will be executed in the following order, for example:
SetEvent(_hEvent);
_Signaled = true;
_Signaled = false;
ResetEvent(_hEvent);
Here everything is fine. But the following order is also possible (uncomment one of the Sleep calls to test this):
SetEvent(_hEvent);
_Signaled = false;
ResetEvent(_hEvent);
_Signaled = true;
As a result, _hEvent ends up in the reset state while _Signaled is true.
Implementing this atomically yourself, without OS support, is not simple, although it is possible. But first I would ask: what for? Is event-like behaviour really what you need for this task?
The other answer is very good if you can drop support for Windows 7.
However, on Win7, if you set/reset the event many times from multiple threads but only rarely need to sleep, the proposed method is quite slow.
Instead, I use a boolean guarded by a critical section, with a condition variable to wake/sleep.
The wait method will go to the kernel to sleep in the SleepConditionVariableCS API; that's expected and what you want.
However, the set and reset methods work entirely in user mode: setting a single boolean variable is very fast, i.e. in 99% of cases the critical section will do its user-mode lock-free magic.
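A hedged sketch of that approach (my reconstruction, not the answerer's actual code; the APIs used exist since Windows Vista):

#include <Windows.h>

class Win7Event {
    CRITICAL_SECTION _cs;
    CONDITION_VARIABLE _cv;
    bool _signaled;
public:
    explicit Win7Event(bool signaled) : _signaled(signaled) {
        InitializeCriticalSection(&_cs);
        InitializeConditionVariable(&_cv);
    }
    ~Win7Event() { DeleteCriticalSection(&_cs); }

    void Set() {
        EnterCriticalSection(&_cs);
        _signaled = true;
        LeaveCriticalSection(&_cs);
        WakeAllConditionVariable(&_cv); // wakes only threads actually sleeping
    }

    void Reset() {
        EnterCriticalSection(&_cs);
        _signaled = false;
        LeaveCriticalSection(&_cs);
    }

    // Timeout handling is simplified: the budget applies per sleep, not in total.
    bool Wait(DWORD dwMilliseconds = INFINITE) {
        EnterCriticalSection(&_cs);
        BOOL ok = TRUE;
        while (!_signaled && ok)
            ok = SleepConditionVariableCS(&_cv, &_cs, dwMilliseconds);
        bool signaled = _signaled;
        LeaveCriticalSection(&_cs);
        return signaled;
    }
};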

C++ multithreaded application using std::thread works fine on Windows but not Ubuntu

I have a somewhat simple multithreaded application written using the C++ std::thread library for both Ubuntu 14.04 and Windows 8.1. The code is nearly identical, except that I'm using the respective operating-system headers windows.h and unistd.h for Sleep/sleep to pause execution for a time. Both versions actually begin to run, and the Ubuntu version does keep running for a short time, but then it hangs. I am using the proper arguments to the sleep/Sleep functions, since I know Windows Sleep takes milliseconds while Unix sleep takes seconds.
I've run the code multiple times, and on Ubuntu it never makes it past two minutes, whereas I've run it on Windows twice for 20 minutes and then multiple times for roughly five minutes each, to see if I was just lucky. Is this an incompatibility with the thread library, does sleep not do what I think it does, or is it something else? The infinite loops are there because this is a school project and is expected to run without deadlocks or crashing.
The gist is that this is a modified 4-way stop where cars that arrive first don't have to slow down and stop. We only had to let one car through the intersection at a time, which takes 3 seconds to cross, hence Sleep(3000), and we don't have to worry about turns. Three threads run the spawnCars function, and there are four other threads that each monitor one of the four directions N, E, S, and W. I hope it's understandable why I can't post the entire code, in case some other student stumbles upon this. These two functions are the only places where the code differs, aside from the operating-system-dependent library inclusion at the top. Thanks.
edit: Since I've just gone and posted all the code for the project, if the problem does end up being a deadlock, may I request that you only say so, and not post an in-depth solution? I'm new here, so if that's against the spirit of SO then fire away and I'll try to figure it out without reading the details.
/* function clearIntersection
   Makes a car go through the intersection. The sleep comes before the removal from the queue
   because my understanding is that the wait condition simulates the go signal for drivers.
   It wouldn't make sense for the sensors to tell a car to go if the intersection isn't yet
   clear, even if the lock here would prevent that.
*/
void clearIntersection(int direction)
{
    lock->lock();
    Sleep(3000);
    dequeue(direction);
    lock->unlock();
}

/* function isAtFront(int direction)
   Checks whether the car waiting at the intersection from a particular direction
   has permission to pass, meaning it is at the front of the list of ALL waiting cars.
   This is the waiting condition.
*/
bool isAtFront(int direction)
{
    lock->lock();
    bool isAtFront = cardinalDirections[direction].front() == list->front();
    lock->unlock();
    return isAtFront;
}

void waitInLine()
{
    unique_lock<mutex> conditionLock(*lock);
    waitForTurn->wait(conditionLock);
    conditionLock.unlock();
}

// function broadcast(): Let all waiting threads know they can check whether or not their car can go.
void broadcast()
{
    waitForTurn->notify_all();
}
};
/* function monitorDirection(intersectionQueue, int, int)
   Threads will run this function. There are four threads that run this function
   in total, one for each of the cardinal directions. The threads check to see
   if the car at the front of the intersectionQueue, which contains the arrival order
   of cars regardless of direction, is the car at the front of the queue for the
   direction the thread is assigned to monitor. If not, it waits on a condition
   variable until it is the case. It then calls the function to clear the intersection.
   Broadcast is then used on the condition variable so all drivers will check if they
   are allowed to pass, which one will unless there are 0 waiting cars, waiting again if not the case.
*/
void monitorDirection(intersectionQueue *intersection, int direction, int id)
{
    while (true) // Do forever to see if crashes can occur.
    {
        // Do nothing if there are no cars coming from this direction.
        // Possibly add more condition_variables for each direction?
        if (!intersection->empty(direction))
        {
            while (!intersection->isAtFront(direction))
                intersection->waitInLine();
            intersection->clearIntersection(direction);
            cout << "A car has gone " << numberToDirection(direction) << endl;
            // All cars at the intersection will check the signal to see if it's time to go, so broadcast is used.
            intersection->broadcast();
        }
    }
}
The likely culprit is your while (!isAtFront(...)) loop. If another thread gets scheduled between the check and the subsequent call to waitInLine(), the state of your queues can change, causing all of your consumer threads to end up waiting. At that point there is no thread left to signal your condition_variable, so they all wait forever.
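A sketch of the usual fix, under the assumption that the check moves inside the wait (isAtFrontLocked is a hypothetical helper doing the existing front-of-queue comparison without re-taking the lock; the other names mirror the posted code):

void waitUntilAtFront(int direction)
{
    unique_lock<mutex> conditionLock(*lock);
    // The predicate is re-evaluated under the lock after every wakeup, so a
    // queue change between the check and the sleep can no longer be missed.
    waitForTurn->wait(conditionLock, [&]{ return isAtFrontLocked(direction); });
}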

How can I use Boost condition variables in producer-consumer scenario?

EDIT: see below.
I have one thread responsible for streaming data from a device in buffers. In addition, I have N threads doing some processing on that data. In my setup, I would like the streamer thread to fetch data from the device and then wait until the N threads are done with the processing (or a timeout is reached) before fetching new data. The N threads should wait until new data has been fetched before continuing to process. This framework should work, given that I don't want the N threads to repeat processing on a buffer and I want all buffers to be processed without skipping any.
After careful reading, I found that condition variables are what I need. I have followed tutorials and other Stack Overflow questions, and this is what I have:
global variables:
boost::condition_variable cond;
boost::mutex mut;
member variables:
std::vector<double> buffer;
std::vector<bool> data_ready; // size equal to the number of threads
data receiver loop (1 thread runs this):
while (!gotExitSignal())
{
    {
        boost::unique_lock<boost::mutex> ll(mut);
        while (any(data_ready))
            cond.wait(ll);
    }
    receive_data(buffer);
    {
        boost::lock_guard<boost::mutex> ll(mut);
        set_true(data_ready);
    }
    cond.notify_all();
}
data processing loop (N threads run this)
while (!gotExitSignal())
{
    {
        boost::unique_lock<boost::mutex> ll(mut);
        while (!data_ready[thread_id])
            cond.wait(ll);
    }
    process_data(buffer);
    {
        boost::lock_guard<boost::mutex> ll(mut);
        data_ready[thread_id] = false;
    }
    cond.notify_all();
}
These two loops are in their own member functions of the same class. The variable buffer is a member variable, so it can be shared across threads.
The receiver thread is launched first. The data_ready variable is a vector of bools of size N: data_ready[i] is true if data is ready to be processed and false if thread i has already processed it. The function any(data_ready) returns true if any element of data_ready is true, and false otherwise. The set_true(data_ready) function sets all elements of data_ready to true. The receiver thread checks whether any processing thread is still processing; if not, it fetches data, sets the data_ready flags, notifies the threads, and continues with the loop, which will block at the beginning until processing is done. Each processing thread checks its respective data_ready flag; once it is true, the thread does some computations, sets its respective data_ready flag back to false, and continues with the loop.
If I only have one processing thread, the program runs fine. Once I add more threads, I run into issues where the output of the processing is garbage. In addition, the order of the processing threads matters for some reason; in other words, the LAST thread I launch will output correct data, whereas the previous threads will output garbage, no matter what the input parameters are for the processing (assuming valid parameters). I don't know if the problem is due to my threading code or if there is something wrong with my device or data-processing setup. I tried using couts at the processing and receiving steps, and with N processing threads I see the output as it should be:
receive data
process 1
process 2
...
process N
receive data
process 1
process 2
...
Is the usage of the condition variables correct? What could be the problem?
EDIT: I followed fork's suggestions and changed the code to:
data receiver loop (1 thread runs this):
while (!gotExitSignal())
{
    if (!any(data_ready))
    {
        receive_data(buffer);
        boost::lock_guard<boost::mutex> ll(mut);
        set_true(data_ready);
        cond.notify_all();
    }
}
data processing loop (N threads run this)
while (!gotExitSignal())
{
    // boost::unique_lock<boost::mutex> ll(mut);
    boost::mutex::scoped_lock ll(mut);
    cond.wait(ll);
    process_data(buffer);
    data_ready[thread_id] = false;
}
It works somewhat better. Am I using the correct locks?
I did not read your whole story, but looking at the code quickly I see that you are using conditions wrong.
A condition is like a state: once you put a thread into a waiting condition, it gives up the CPU. Your thread will effectively stop running until some other process/thread notifies it.
In your code you have a while loop, and each time you check for data you wait. That is wrong; it should be an if instead of a while. But then again, it should not be there at all. The checking for data should be done somewhere else, and your worker thread should put itself into the waiting condition after it has done its work.
Your worker threads are the consumers, and the producers are the ones that deliver the data.
I think a better construction would be to have a thread check whether there is data and notify the worker(s).
PSEUDO CODE:
// producer
while (true) {
    1. lock mutex
    2. is data available
    3. unlock mutex
    if (dataAvailableVariable) {
        4. notify a worker
        5. set waiting condition
    }
}

// consumer
while (true) {
    1. lock mutex
    2. do some work
    3. unlock mutex
    4. notify producer that work is done
    5. set wait condition
}
You should also take care that at least one thread stays runnable, in order to avoid a deadlock, i.e. a state where all threads are waiting.
I hope that helps you a little.
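For completeness, a hedged sketch of the classic predicate-wait shape for this hand-off, using the question's own names (whether to keep this structure or move the data check elsewhere, as suggested above, is a design choice):

// producer (1 thread)
while (!gotExitSignal())
{
    {
        boost::unique_lock<boost::mutex> ll(mut);
        // Sleep until every consumer has cleared its flag; the predicate is
        // re-checked under the lock, handling spurious wakeups.
        cond.wait(ll, [&]{ return !any(data_ready); });
    }
    receive_data(buffer);
    {
        boost::lock_guard<boost::mutex> ll(mut);
        set_true(data_ready);
    }
    cond.notify_all();
}

// consumer (N threads)
while (!gotExitSignal())
{
    {
        boost::unique_lock<boost::mutex> ll(mut);
        // Sleep until this thread's flag is set.
        cond.wait(ll, [&]{ return data_ready[thread_id]; });
    }
    process_data(buffer); // note: buffer itself is read concurrently here
    {
        boost::lock_guard<boost::mutex> ll(mut);
        data_ready[thread_id] = false;
    }
    cond.notify_all();
}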