How to selectively ignore some inotify events? - c++

Let's say I have two processes (simulated in this example with two threads) in a producer-consumer setup. That is, one process writes data to a file, and the other process consumes the data in the file, then clears the file.
The setup I currently have, pieced together from various resources online, uses a lock file to ensure that only one process can access the data file at a time. The producer acquires the lock, writes to the file, then releases the lock. Meanwhile, the consumer waits for modify events with inotify; when one arrives, it acquires the lock, consumes the data, and empties the file.
This seems relatively straightforward, but the part that's tripping me up is that when I empty the file in my consumer thread, it triggers an inotify modify event again, which sets off the whole flow again and ends with the data file being cleared again, repeating forever.
I've tried a few ways to work around this problem, but none of them seem quite right. I'm worried that doing this wrong will introduce race conditions or cause me to skip modify events.
Here is my current code:
#include <fstream>
#include <iostream>
#include <string>
#include "pthread.h"
#include "sys/file.h"
#include "sys/inotify.h"
#include "sys/stat.h"
#include "unistd.h"
const char* lock_filename = "./test_lock_file";
const char* data_filename = "./test_data_file";
int AquireLock(char const* lockName) {
mode_t m = umask(0);
int fd = open(lockName, O_RDWR | O_CREAT, 0666);
umask(m);
// Nothing to close if open() itself failed.
if (fd < 0) return -1;
if (flock(fd, LOCK_EX) < 0) {
close(fd);
return -1;
}
return fd;
}
void ReleaseLock(int fd, char const* lockName) {
if (fd < 0) return;
remove(lockName);
close(fd);
}
void* ConsumerThread(void*) {
// Set up inotify.
int file_descriptor = inotify_init();
if (file_descriptor < 0) return nullptr;
int watch_descriptor =
inotify_add_watch(file_descriptor, data_filename, IN_MODIFY);
if (watch_descriptor < 0) return nullptr;
char buf[4096] __attribute__((aligned(__alignof__(inotify_event))));
while (true) {
// Read new events.
const inotify_event* event;
ssize_t numRead = read(file_descriptor, buf, sizeof(buf));
if (numRead <= 0) return nullptr;
// For each event, do stuff.
for (int i = 0; i < numRead; i += sizeof(inotify_event) + event->len) {
event = reinterpret_cast<inotify_event*>(&buf[i]);
// Critical section!
int fd = AquireLock(lock_filename);
// Read from the file.
std::string line;
std::ifstream data_file(data_filename);
if (data_file.is_open()) {
while (getline(data_file, line)) {
std::cout << line << std::endl;
}
data_file.close();
// Clear the file by opening then closing without writing to it.
std::ofstream erase_data_file(data_filename);
erase_data_file.close();
std::cout << "file cleared." << std::endl;
}
ReleaseLock(fd, lock_filename);
// Critical section over!
}
}
return nullptr;
}
int main(int argc, char** argv) {
// Set up other thread.
pthread_t thread;
int rc = pthread_create(&thread, NULL, ConsumerThread, nullptr);
if (rc) return rc;
// Producer thread: Periodically write to a file.
int counter = 0;  // declared outside the loop so it keeps counting across writes
while (true) {
sleep(3);
// Critical section!
int fd = AquireLock(lock_filename);
// Write some text to a file
std::ofstream data_file(data_filename);
if (data_file.is_open()) {
std::cout << "Writing to file.\n";
data_file << "This is some example data. " << counter++ << "\n";
data_file.close();
}
ReleaseLock(fd, lock_filename);
// Critical section over!
}
pthread_exit(NULL);
return 0;
}
One idea I had was to disable tracking of modify events at the start of the consumer thread's critical section with inotify_rm_watch, then re-add the watch right before leaving the critical section. This doesn't seem to work, though. Even with the watch removed, modify events are still being delivered, and I'm not sure why.
I've also considered using a boolean to record whether the file had any contents while consuming it, and only clearing the file if it wasn't empty. This felt kind of hacky since it still runs a second, unnecessary iteration of the loop, but if I can't find a better solution I might just go with that. Ideally there would be a way to have only the producer thread's modifications trigger events, while the consumer's own file modifications are somehow ignored or disabled, but I'm not sure how to achieve that effect.
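The boolean idea described above could be sketched roughly as follows. This is only an illustration, not a definitive fix: the helper name ConsumeIfNotEmpty is hypothetical, and it assumes the caller already holds the lock. The extra event produced by clearing the file still arrives, but the second pass finds nothing to consume and does not clear again, so the cycle stops.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (not part of the original code): prints every line of
// `path` and truncates the file only if it actually held data.
// Returns true when data was consumed. Assumes the caller holds the lock.
bool ConsumeIfNotEmpty(const std::string& path) {
  std::ifstream in(path);
  if (!in.is_open()) return false;

  bool had_data = false;
  std::string line;
  while (std::getline(in, line)) {
    had_data = true;
    std::cout << line << '\n';
  }
  in.close();

  if (had_data) {
    // Truncating is itself a modification, so it produces one more event.
    // On that next pass had_data stays false and nothing is written.
    std::ofstream clear(path, std::ios::trunc);
  }
  return had_data;
}
```

In the consumer loop, the body of the critical section would call this helper instead of unconditionally reopening the file for writing.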

Related

Program with pipes and threads exiting before completion

I am writing a C++ program with threading and pipes. I am implementing a parallelized algorithm, and the idea is that I have a main thread that writes data to child threads. Each child thread must read this data, process it, and write the result back to the main thread.
I have stripped the core communication logic down to a minimal version that compiles and reproduces the problem, and commented out the places where I have more code. The program runs and exits without printing "complete". Usually, the last value of i that is printed is between 1 and 9, and the program just terminates without saying anything. I would expect the program to run to completion, but I am not getting any errors and the program exits gracefully, so I am not sure how to debug it.
NOTE: Pipes and pthreads are mandated from somewhere else and are hard requirements. Please don't suggest using std::thread or communicating between threads within the same address space.
#include <iostream>
#include "pthread.h"
#include "unistd.h"
#include <vector>
using namespace std;
void* func (void* args)
{
std::vector<int> v = * (std::vector<int>*)(args);
auto FH = fdopen(v[0], "r");
char buffer[1024];
int buffer_len = 1024;
while (fgets(buffer, buffer_len, FH))
{
std::string x{buffer};
}
// process the result and return it to the parent
return NULL;
}
int main()
{
std::vector<std::vector<int> *> pipes{};
std::vector<pthread_t *> threads{};
for (int i=0; i<20; i++)
{
std::cout<<i<<std::endl;
int fd[2];
if (pipe(fd) < 0)
{
std::cout<<"failed"<<std::endl;
exit(0);
}
int fd2[2];
if (pipe(fd2) < 0)
{
std::cout<<"failed"<<std::endl;
exit(0);
}
std::vector<int> *pipe_info = new std::vector<int>{fd[0], fd[1], fd2[0], fd2[1]};
auto F = fdopen(fd[1], "w");
pthread_t *thread = new pthread_t;
threads.push_back(thread);
pipes.push_back(pipe_info);
pthread_create(thread, NULL, func, (void*) pipe_info);
for (int i=0; i<100; i++)
fprintf(F, "%d", 3);
}
// read the data returned from the child threads
// using fd2 (indices 2,3) in each pipe in pipes.
// free all allocated memory
for (auto thread: threads)
{
pthread_join(*thread, NULL);
delete thread;
}
std::cout<<"complete"<<std::endl;
return 0;
}
I cannot reproduce the problem, and the symptoms you describe seem improbable. On my system the program prints all the numbers and then hangs instead of terminating.
The reason is that pipe() is not a constructor, and neither is fdopen(). They are C interfaces, and nothing they open is closed just because a variable leaves scope. You have to close fds and FILEs manually. You don't, so the threads patiently wait in fgets for more data or EOF, and there will be no EOF until main closes the writing end of each pipe.
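As a minimal sketch of that point, assuming a single pipe and a single worker: the writer calls fclose on its FILE, which flushes the buffered data and closes the write end, so the reader's fgets loop finally sees EOF and the join returns. The names CountBytes and RunOnce are hypothetical and exist only for this illustration.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

// Hypothetical worker: reads from the pipe's read end until EOF and returns
// a heap-allocated byte count that the caller must delete.
void* CountBytes(void* arg) {
  int read_fd = *static_cast<int*>(arg);
  FILE* in = fdopen(read_fd, "r");
  if (!in) return nullptr;
  long* total = new long(0);
  char buffer[1024];
  while (fgets(buffer, sizeof(buffer), in)) {
    *total += static_cast<long>(std::strlen(buffer));
  }
  fclose(in);  // also closes read_fd
  return total;
}

// Hypothetical driver: writes `count` copies of "3" into a pipe and returns
// how many bytes the worker saw before EOF, or -1 on error.
long RunOnce(int count) {
  int fd[2];
  if (pipe(fd) < 0) return -1;
  FILE* out = fdopen(fd[1], "w");
  if (!out) {
    close(fd[0]);
    close(fd[1]);
    return -1;
  }
  pthread_t thread;
  if (pthread_create(&thread, nullptr, CountBytes, &fd[0]) != 0) {
    fclose(out);
    close(fd[0]);
    return -1;
  }
  for (int i = 0; i < count; ++i) fprintf(out, "%d", 3);
  fclose(out);  // flushes and closes the write end: the reader now gets EOF

  void* result = nullptr;
  pthread_join(thread, &result);
  if (!result) {
    close(fd[0]);  // the worker never took ownership of the read end
    return -1;
  }
  long* total = static_cast<long*>(result);
  long bytes = *total;
  delete total;
  return bytes;
}
```

Without the fclose(out) call before pthread_join, the join would block forever, which matches the hang described above.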

How to properly wait for condition variable in C++?

I'm trying to create an asynchronous I/O file reader in C++ under Linux. The example I have has two buffers. The first read blocks. Then, each time around the main loop, I asynchronously launch the I/O and call process(), which simulates processing of the current block. When processing is done, we wait on the condition variable. The idea is that the asynchronous handler should notify the condition variable.
Unfortunately, the notify seems to happen before the wait, and it seems that this is not how the condition variable's wait() function works. How should I rewrite the code so that the loop waits until the asynchronous I/O has completed?
#include <aio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <condition_variable>
#include <cstring>
#include <iostream>
#include <thread>
using namespace std;
using namespace std::chrono_literals;
constexpr uint32_t blockSize = 512;
mutex readMutex;
condition_variable cv;
int fh;
int bytesRead;
void process(char* buf, uint32_t bytesRead) {
cout << "processing..." << endl;
usleep(100000);
}
void aio_completion_handler(sigval_t sigval) {
struct aiocb* req = (struct aiocb*)sigval.sival_ptr;
// check whether asynch operation is complete
if (aio_error(req) == 0) {
int ret = aio_return(req);
bytesRead = req->aio_nbytes;
cout << "ret == " << ret << endl;
cout << (char*)req->aio_buf << endl;
}
{
unique_lock<mutex> readLock(readMutex);
cv.notify_one();
}
}
void thready() {
char* buf1 = new char[blockSize];
char* buf2 = new char[blockSize];
aiocb cb;
char* processbuf = buf1;
char* readbuf = buf2;
fh = open("smallfile.dat", O_RDONLY);
if (fh < 0) {
throw std::runtime_error("cannot open file!");
}
memset(&cb, 0, sizeof(aiocb));
cb.aio_fildes = fh;
cb.aio_nbytes = blockSize;
cb.aio_offset = 0;
// Fill in callback information
/*
Using SIGEV_THREAD to request a thread callback function as a notification
method
*/
cb.aio_sigevent.sigev_notify_attributes = nullptr;
cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb.aio_sigevent.sigev_notify_function = aio_completion_handler;
/*
The context to be transmitted is loaded into the handler (in this case, a
reference to the aiocb request itself). In this handler, we simply refer to
the arrived sigval pointer and use the AIO function to verify that the request
has been completed.
*/
cb.aio_sigevent.sigev_value.sival_ptr = &cb;
int currentBytesRead = read(fh, buf1, blockSize); // read the 1st block
while (true) {
cb.aio_buf = readbuf;
aio_read(&cb); // each next block is read asynchronously
process(processbuf, currentBytesRead); // process while waiting
{
unique_lock<mutex> readLock(readMutex);
cv.wait(readLock);
}
currentBytesRead = bytesRead; // make local copy of global modified by the asynch code
if (currentBytesRead < blockSize) {
break; // last time, get out
}
cout << "back from wait" << endl;
swap(processbuf, readbuf); // switch to other buffer for next time
currentBytesRead = bytesRead; // create local copy
}
delete[] buf1;
delete[] buf2;
}
int main() {
try {
thready();
} catch (std::exception& e) {
cerr << e.what() << '\n';
}
return 0;
}
A condition variable should generally be used for
waiting until it is possible that the predicate (for example a shared variable) has changed, and
notifying waiting threads that the predicate may have changed, so that waiting threads should check the predicate again.
However, you seem to be attempting to use the state of the condition variable itself as the predicate. This is not how condition variables are supposed to be used and may lead to race conditions such as those described in your question. Another reason to always check the predicate is that spurious wakeups are possible with condition variables.
In your case, it would probably be appropriate to create a shared variable
bool operation_completed = false;
and use that variable as the predicate for the condition variable. Access to that variable should always be controlled by the mutex.
You can then change the lines
{
unique_lock<mutex> readLock(readMutex);
cv.notify_one();
}
to
{
unique_lock<mutex> readLock(readMutex);
operation_completed = true;
cv.notify_one();
}
and change the lines
{
unique_lock<mutex> readLock(readMutex);
cv.wait(readLock);
}
to:
{
unique_lock<mutex> readLock(readMutex);
while ( !operation_completed )
cv.wait(readLock);
}
Instead of
while ( !operation_completed )
cv.wait(readLock);
you can also write
cv.wait( readLock, []{ return operation_completed; } );
which is equivalent. See the documentation of std::condition_variable::wait for further information.
Of course, operation_completed should also be set back to false when appropriate, while the mutex is locked.
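Putting the pieces together, here is a small self-contained sketch of the pattern, including the reset of the flag. It is an illustration under simplifying assumptions rather than a drop-in replacement for the AIO code: a plain std::thread stands in for the completion handler, and the function name WaitForCompletions is hypothetical.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical demo: runs `rounds` hand-offs and returns how many completions
// the waiting side observed. A notify that happens before the wait is not
// lost, because the waiter checks the flag rather than the notification.
int WaitForCompletions(int rounds) {
  std::mutex mutex;
  std::condition_variable cv;
  bool operation_completed = false;  // the predicate, guarded by mutex
  int observed = 0;

  for (int i = 0; i < rounds; ++i) {
    // Stands in for the asynchronous completion handler.
    std::thread worker([&] {
      std::unique_lock<std::mutex> lock(mutex);
      operation_completed = true;
      cv.notify_one();
    });

    {
      std::unique_lock<std::mutex> lock(mutex);
      // Returns immediately if the flag is already set; otherwise sleeps and
      // rechecks the flag on every wakeup, spurious or not.
      cv.wait(lock, [&] { return operation_completed; });
      operation_completed = false;  // reset for the next round, still locked
      ++observed;
    }
    worker.join();
  }
  return observed;
}
```

Calling WaitForCompletions(3) completes three hand-offs regardless of whether the worker or the waiter reaches the mutex first.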

C++ asynchronous I/O in Linux that waits on condition_variable, not waiting. What are we doing wrong?

In a previous question, "Trying to write asynchronous I/O in C++ using locks and condition variables. This code calls terminate on the first lock() why?", we tried to use two mutexes to have asynchronous code that reads one block of a file into memory, then asynchronously tries to read the next block while processing the current one. Someone commented that using read was not the best way to do that. This is an attempt to use POSIX aio_read: we wait on a condition_variable and call notify on it in the callback after the I/O completes, but it's not working. In the debugger we can see it blows right past the wait.
#include <aio.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#include <condition_variable>
#include <cstring>
#include <iostream>
#include <thread>
using namespace std;
using namespace std::chrono_literals;
constexpr uint32_t blockSize = 512;
mutex readMutex;
mutex procMutex;
condition_variable cv;
int fh;
int bytesRead;
void process(char* buf, uint32_t bytesRead) {
cout << "processing..." << endl;
usleep(100000);
}
void aio_completion_handler(sigval_t sigval) {
struct aiocb* req = (struct aiocb*)sigval.sival_ptr;
// check whether asynch operation is complete
if (aio_error(req) == 0) {
int ret = aio_return(req);
cout << "ret == " << ret << endl;
cout << (char*)req->aio_buf << endl;
}
cv.notify_one();
}
void thready() {
char* buf1 = new char[blockSize];
char* buf2 = new char[blockSize];
aiocb cb;
char* processbuf = buf1;
char* readbuf = buf2;
fh = open("smallfile.dat", O_RDONLY);
if (fh < 0) {
throw std::runtime_error("cannot open file!");
}
memset(&cb, 0, sizeof(aiocb));
cb.aio_fildes = fh;
cb.aio_nbytes = blockSize;
cb.aio_offset = 0;
// Fill in callback information
/*
Using SIGEV_THREAD to request a thread callback function as a notification
method
*/
cb.aio_sigevent.sigev_notify_attributes = nullptr;
cb.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb.aio_sigevent.sigev_notify_function = aio_completion_handler;
/*
The context to be transmitted is loaded into the handler (in this case, a
reference to the aiocb request itself). In this handler, we simply refer to
the arrived sigval pointer and use the AIO function to verify that the request
has been completed.
*/
cb.aio_sigevent.sigev_value.sival_ptr = &cb;
int currentBytesRead = read(fh, buf1, blockSize); // read the 1st block
unique_lock<mutex> readLock(readMutex);
while (true) {
cb.aio_buf = readbuf;
aio_read(&cb); // each next block is read asynchronously
process(processbuf, currentBytesRead); // process while waiting
cv.wait(readLock);
if (currentBytesRead < blockSize) {
break; // last time, get out
}
cout << "back from wait" << endl;
swap(processbuf, readbuf); // switch to other buffer for next time
currentBytesRead = bytesRead; // create local copy
}
delete[] buf1;
delete[] buf2;
}
int main() {
try {
thready();
} catch (std::exception& e) {
cerr << e.what() << '\n';
}
return 0;
}

How to write data to stdin to be consumed by a separate thread waiting on input from stdin?

I am trying to read some data from stdin in a thread separate from the main thread. The main thread should be able to communicate with this waiting thread by writing to stdin, but when I run the test code (included below) nothing happens except that the message ('do_some_work' in my test code) is printed directly on the terminal instead of being output by the waiting thread.
I have tried a couple of solutions listed on SO, but with no success. My code mimics one of the solutions from the following SO question, and it works perfectly fine by itself, but not when coupled with my read_stdin_thread:
Is it possible to write data into own stdin in Linux
#include <sys/select.h>
#include <unistd.h>
#include <cstring>
#include <string>
#include <iostream>
#include <sstream>
#include <thread>
bool terminate_read = true;
void readStdin() {
static const int INPUT_BUF_SIZE = 1024;
char buf[INPUT_BUF_SIZE];
while (terminate_read) {
fd_set readfds;
struct timeval tv;
int data;
FD_ZERO(&readfds);
FD_SET(STDIN_FILENO, &readfds);
tv.tv_sec=2;
tv.tv_usec=0;
int ret = select(16, &readfds, 0, 0, &tv);
if (ret == 0) {
continue;
} else if (ret == -1) {
perror("select");
continue;
}
data=FD_ISSET(STDIN_FILENO, &readfds);
if (data>0) {
int bytes = read(STDIN_FILENO,buf,INPUT_BUF_SIZE);
if (bytes == -1) {
perror("input poll: read");
continue;
}
if (bytes) {
std::cout << "Execute: " << buf << std::endl;
if (strncmp(buf, "quit", 4)==0) {
std::cout << "quitting reading from stdin." << std::endl;
break;
}
else {
continue;
}
}
}
}
}
int main() {
std::thread threadReadStdin([] () {
readStdin();
});
usleep(1000000);
std::stringstream msg;
msg << "do_some_work" << std::endl;
auto s = msg.str();
write(STDIN_FILENO, s.c_str(), s.size());
usleep(1000000);
terminate_read = false;
threadReadStdin.join();
return 0;
}
A code snippet illustrating how to write to stdin that in turn is read by threadReadStdin would be extremely helpful.
Thanks much in advance!
Edit:
One thing I forgot to mention here that code within readStdin() is a third party code and any kind of communication that takes place has to be on its terms.
Also, I am pretty easily able to redirect std::cin and std::cout to either fstream or stringstream. Problem is that when I write to redirected cin buffer nothing really appears on the reading thread.
Edit2:
This is a single process application and spawning is not an option.
If you want to use a pipe to communicate between different threads in the same program, you shouldn't try using stdin or stdout. Instead, just use the pipe function to create your own pipe. I'll walk you through doing this step-by-step!
Opening the channel
Let's create a helper function to open the channel using pipe. This function takes two ints by reference - the read end and the write end. It tries opening the pipe, and if it can't, it prints an error.
#include <unistd.h>
#include <cstdio>
#include <thread>
#include <string>
void open_channel(int& read_fd, int& write_fd) {
int vals[2];
int errc = pipe(vals);
if(errc) {
fputs("Bad pipe", stderr);
read_fd = -1;
write_fd = -1;
} else {
read_fd = vals[0];
write_fd = vals[1];
}
}
Writing a message
Next, we define a function to write the message. This function is given as a lambda, so that we can pass it directly to the thread.
auto write_message = [](int write_fd, std::string message) {
ssize_t amnt_written = write(write_fd, message.data(), message.size());
if(amnt_written < 0 || static_cast<size_t>(amnt_written) != message.size()) {
fputs("Bad write", stderr);
}
close(write_fd);
};
Reading a message
We should also make a function to read the message. Reading the message will be done on a different thread. This lambda reads the message 1000 bytes at a time and prints it to standard out.
auto read_message = [](int read_fd) {
constexpr int buffer_size = 1000;
char buffer[buffer_size];
ssize_t amnt_read;
// Stop on EOF (0) or error (-1) so a failed read never indexes the buffer.
while ((amnt_read = read(read_fd, buffer, buffer_size)) > 0) {
fwrite(buffer, 1, amnt_read, stdout);
}
close(read_fd);
};
Main method
Finally, we can write the main method. It opens the channel, writes the message on one thread, and reads it on the other thread.
int main() {
int read_fd;
int write_fd;
open_channel(read_fd, write_fd);
std::thread write_thread(
write_message, write_fd, "Hello, world!");
std::thread read_thread(
read_message, read_fd);
write_thread.join();
read_thread.join();
}
It seems like I have stumbled upon the answer with the help of very constructive responses from @Jorge Perez, @Remy Lebeau and @Kamil Cuk. This solution is built upon @Jorge Perez's extremely helpful code. For brevity's sake I am not including the whole code; part of it comes from the code I posted and a large part comes from @Jorge Perez's code.
What I have done is take his approach using pipes and replace STDIN_FILENO with the pipe's read fd using dup. The following link was really helpful:
https://en.wikipedia.org/wiki/Dup_(system_call)
I would really appreciate your input on whether this is a hack or a good enough approach given the constraints I have in production code.
int main() {
int read_fd;
int write_fd;
open_channel(read_fd, write_fd);
close(STDIN_FILENO);
if(dup(read_fd) == -1)
return -1;
std::thread write_thread(write_message, write_fd, "Whatsup?");
std::thread threadReadStdin([] () {
readStdin();
});
write_thread.join();
threadReadStdin.join();
return 0;
}

timerfd and read

I have an application that periodically (by timer) checks some data storage.
Like this:
#include <iostream>
#include <cerrno>
#include <cstring>
#include <cstdlib>
#include <sys/fcntl.h>
#include <unistd.h>
// EPOLL & TIMER
#include <sys/epoll.h>
#include <sys/timerfd.h>
int main(int argc, char **argv)
{
/* epoll instance */
int efd = epoll_create1(EPOLL_CLOEXEC);
if (efd < 0)
{
std::cerr << "epoll_create error: " << strerror(errno) << std::endl;
return EXIT_FAILURE;
}
struct epoll_event ev;
struct epoll_event events[128];
/* timer instance */
int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
struct timespec ts;
// first expiration 3 seconds after program start
ts.tv_sec = 3;
ts.tv_nsec = 0;
struct itimerspec new_timeout;
struct itimerspec old_timeout;
bzero(&new_timeout, sizeof(new_timeout));
bzero(&old_timeout, sizeof(old_timeout));
// value
new_timeout.it_value = ts;
// no interval;
// timer will be armed in epoll_wait event trigger
new_timeout.it_interval.tv_sec =
new_timeout.it_interval.tv_nsec = 0;
// Add the timer descriptor to epoll.
if (tfd != -1)
{
ev.events = EPOLLIN | EPOLLERR /*| EPOLLET*/;
ev.data.ptr = &tfd;
epoll_ctl(efd, EPOLL_CTL_ADD, tfd, &ev);
}
int flags = 0;
if (timerfd_settime(tfd, flags, &new_timeout, &old_timeout) < 0)
{
std::cerr << "timerfd_settime error: " << strerror(errno) << std::endl;
}
int numEvents = 0;
int timeout = 0;
bool checkTimer = false;
while (1)
{
checkTimer = false;
numEvents = epoll_wait(efd, events, 128, timeout);
if (numEvents > 0)
{
for (int i = 0; i < numEvents; ++i)
{
if (events[i].data.ptr == &tfd)
{
std::cout << "timeout" << std::endl;
checkTimer = true;
}
}
}
else if(numEvents == 0)
{
continue;
}
else
{
std::cerr << "An error occurred: " << strerror(errno) << std::endl;
}
if (checkTimer)
{
/* Check data storage */
uint64_t value;
ssize_t readBytes;
//while ( (readBytes = read(tfd, &value, 8)) > 0)
//{
// std::cout << "\tread: '" << value << "'" << std::endl;
//}
itimerspec new_timeout;
itimerspec old_timeout;
new_timeout.it_value.tv_sec = rand() % 3 + 1;
new_timeout.it_value.tv_nsec = 0;
new_timeout.it_interval.tv_sec =
new_timeout.it_interval.tv_nsec = 0;
timerfd_settime(tfd, flags, &new_timeout, &old_timeout);
}
}
return EXIT_SUCCESS;
}
This is a simplified description of my app.
After each timeout, the timer needs to be rearmed with a value that differs each time.
My questions are:
Is it necessary to add timerfd to epoll (epoll_ctl) with EPOLLET flag?
Is it necessary to read timerfd after each timeout?
Is it necessary to epoll_wait infinitely (timeout = -1)?
You can do this in one of two modes, edge triggered or level triggered. If you choose the edge triggered route then you must pass EPOLLET and do not need to read the timerfd after each wakeup. The fact that you receive an event from epoll means one or more time outs have fired. Optionally you may read the timerfd and it will return the number of time outs that have fired since you last read it.
If you choose the level triggered route then you don't need to pass EPOLLET, but you must read the timerfd after each wakeup. If you do not then you will immediately be woken up again until you consume the time out.
You should either pass -1 to epoll as the time out or some positive value. If you pass 0, like you do in the example, then you will never go to sleep, you'll just spin waiting for the time out to fire. That's almost certainly undesirable behaviour.
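A minimal sketch of the level-triggered route with a blocking wait, under the assumption of a single one-shot timer: no EPOLLET, a timeout of -1 so the thread sleeps instead of spinning, and a read of the timerfd after the wakeup to consume the expiration. The function name WaitForTimer is hypothetical.

```cpp
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical helper: arms a one-shot timer for `millis` milliseconds,
// sleeps in epoll_wait until it fires, and returns the expiration count
// read from the timerfd, or -1 on error.
long WaitForTimer(long millis) {
  if (millis <= 0) return -1;  // a zero it_value would disarm the timer

  int efd = epoll_create1(EPOLL_CLOEXEC);
  if (efd < 0) return -1;
  int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
  if (tfd < 0) {
    close(efd);
    return -1;
  }

  epoll_event ev{};
  ev.events = EPOLLIN;  // level-triggered: EPOLLET is deliberately not set
  ev.data.fd = tfd;

  itimerspec spec{};  // it_interval stays zero, so the timer fires once
  spec.it_value.tv_sec = millis / 1000;
  spec.it_value.tv_nsec = (millis % 1000) * 1000000L;

  long result = -1;
  if (epoll_ctl(efd, EPOLL_CTL_ADD, tfd, &ev) == 0 &&
      timerfd_settime(tfd, 0, &spec, nullptr) == 0) {
    epoll_event fired{};
    // -1 blocks until an event arrives instead of busy-polling.
    int n = epoll_wait(efd, &fired, 1, -1);
    if (n == 1 && fired.data.fd == tfd) {
      std::uint64_t expirations = 0;
      // Reading consumes the expiration; without it, a level-triggered
      // epoll_wait would report the timerfd as readable again at once.
      ssize_t got = read(tfd, &expirations, sizeof(expirations));
      if (got == static_cast<ssize_t>(sizeof(expirations))) {
        result = static_cast<long>(expirations);
      }
    }
  }
  close(tfd);
  close(efd);
  return result;
}
```

To rearm with a different delay each time, as in the question, the timerfd_settime call would be repeated after the read with a new it_value.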
Answers to the questions:
Is it necessary to add timerfd to epoll (epoll_ctl) with EPOLLET flag?
No. Adding EPOLLET (edge-triggered) does change how events are delivered. Without EPOLLET, epoll_wait keeps returning the timerfd event until you've read() from the timerfd. With EPOLLET, you will NOT receive additional events beyond the first one, even if a new expiration occurs, until you've read() from the timerfd and a new expiration occurs.
Is it necessary to read timerfd after each timeout?
Yes, in order to keep receiving events (only) when a new expiration occurs (see above). No, when a periodic timer is not used (single expiration only) and you close the timerfd without reading.
Is it necessary to epoll_wait infinitely (timeout = -1)?
No. You can use epoll_wait's timeout instead of timerfd. I personally think it is easier to use timerfd than to keep calculating the next timeout for epoll, especially if you expect multiple timeout intervals; keeping track of your next task when a timeout occurs is much easier when it is tied to the specific event that woke you up.