I have a local file which some process continuously appends to. I would like to serve that file with boost::beast.
So far I'm using boost::beast::http::response<boost::beast::http::file_body> and boost::beast::http::async_write to send the file to the client. That works very well and it is nice that boost::beast takes care of everything. However, when the end of the file is reached, it stops the asynchronous writing. I assume that is because is_done of the underlying serializer returns true at this point.
Is it possible to keep the asynchronous writing ongoing so new contents are written to the client as the local file grows (similar to how tail -f would keep writing the file's contents to stdout)?
I've figured that I might need to use boost::beast::http::response_serializer<boost::beast::http::file_body> for that kind of customization but I'm not sure how to use it correctly. And do I need to use chunked encoding for that purpose?
Note that keeping the HTTP connection open is not the problem, only writing further output as soon as the file grows.
After some research this problem seems not easily solvable, at least not under GNU/Linux which I'm currently focusing on.
It is possible to use chunked encoding as described in boost::beast's documentation. I've implemented serving chunks asynchronously from file contents which are also read asynchronously with the help of boost::asio::posix::stream_descriptor. That works quite well. However, it also stops with an end-of-file error as soon as the end of the file is reached. When using async_wait via the descriptor I'm getting the error "Operation not supported".
So it simply seems not possible to asynchronously wait for more bytes to be written to a file. That's strange considering tail -f does exactly that. So I've straceed tail -f and it turns out that it calls inotify_add_watch(4, "path_to_file", IN_MODIFY). Hence I assume one actually needs to use inotify to implement this.
For me it seems easier and more efficient to take control over the process which writes to the file so far to let it prints to stdout. Then I can stream the pipe (similarly to how I've attempted streaming the file) and write the file myself.
However, if one wanted to go down the road, I suppose using inotify and boost::asio::posix::stream_descriptor is the answer to the question, at least under GNU/Linux.
Related
I have a program in C++ which runs a loop like this, grabbing frames from a video device, using a proprietary driver which I have no access to.
while(true) {
mybuf = getNextFrame(); // blocks
}
I would like to build some logic using other programming languages, so I was thinking of using the following interface. (I need only Linux support)
I was thinking of having a file somewhere, like:
/my/video/device
And every time I call read() on it, it would give me the current frame. Also, if I call read() again, I would like it to block until the next frame is available and return that for me. Also, if I don't call open() for awhile, I don't want the frames in-between to be buffered.
What would be the best approach?
I tried to use FUSE to implement a filesystem, but it was trying to seek inside the file, if it was a regular file, and would only read up to the size I specified for the file. I then made a character device, but it would never call my read() function, instead it would say permission denied...
I was thinking about trying CUSE, or something along the lines. Am I over complicating things? I just need to be able to work with a stream of frames which are constantly coming from my C++ loop, but I want to parse them in a different language, like Python or Go. I also do not want to mix the compilation of my C++ code with Go or python, I want the two to be completely separate. I thought having some sort of file API between the two would make things easier. What would be a good way of handling this?
I would write the program using named pipes. One thing to keep in mind is that if the recieving end disconnects in the middle of a write the server will recieve a SIGPIPE signal and unless this signal is handled or blocked the server will be terminated.
I have a fortran code which needs to read a series of ascii data files (which all together are about 25 Gb). Basically the code opens a given ascii file, reads the information and use it to do some operations, and then close it. Then opens another file, reads the information, do some operations, and close it again. And so on with the rest of ascii files.
Overall each complete run takes about 10h. I usually need to run several independent calculations with different parameters, and the way I do is to run each independent calculation sequentially, so that at the end if I have 10 independent calculations, the total CPU time is 100h.
A more rapid way would be to run the 10 independent calculations at the same time using different processors on a cluster machine, but the problem is that if a given calculation needs to open and read data from a given ascii file which has been already opened and it's being used by another calculation, then the code gives obviously an error.
I wonder whether there is a way to verify if a given ascii file is already being used by another calculation, and if so to ask the code to wait until the ascii file is finally closed.
Any help would be of great help.
Many thanks in advance.
Obamakoak.
Two processes should be able to read the same file. Perhaps action="read" on the open statement might help. Must the files be human readable? The I/O would very likely be much faster with unformatted (sometimes call binary) files.
P.S. If your OS doesn't support multiple-read access, you might have to create your own lock system. Create a master file that a process opens to check which files are in use or not, and to update said list. Immediately closing after a check or update. To handle collisions on this read/write file, use iostat on the open statement and retry after a delay if there is an error.
I know this is an old thread but I've been struggling with the same issue for my own code.
My first attempt was creating a variable on a certain process (e.g. the master) and accessing this variable exclusively using one-sided passive MPI. This is fancy and works well, but only with newer versions of MPI.
Also, my code seemed happy to open (with READWRITE status) files that were also open in other processes.
Therefore, the easiest workaround, if your program has file access, is to make use of an external lock file, as described here. In your case, the code might look something like this:
A process checks whether the lock file exists using the NEW statement, which fails if a file already exists. It will look something like:
file_exists = .true.
do while (file_exists)
open(STATUS='NEW',unit=11,file=lock_file_name,iostat=open_stat)
if (open_stat.eq.0) then
file_exists = .false.
open(STATUS='OLD',ACTION=READWRITE',unit=12,file=data_file_name,iostat=ierr)
if (ierr.ne.0) stop
else
call sleep(1)
end if
end do
The file is now opened exclusively by the current process. Do the operations you need to do, such as reading, writing.
When you are done, close the data file and finally the lock file
close(12,iostat=ierr)
if (ierr.ne.0) stop
close(11,status='DELETE',iostat=ierr)
if (ierr.ne.0) stop
The data file is now again unlocked for the other processes.
I hope this may be useful for other people who have the same problem.
I am going to read a file in C++. The reading itself is happening in a while-loop, and is reading from one file.
When the function reads information from the file, it is going to push this information up some place in the system. The problem is that this file may change while the loop is ongoing.
How may I catch that new information in the file? I tried out std::ifstream reading and changing my file manually on my computer as the endless-loop (with a sleep(2) between each loop) was ongoing, but as expected -- nothing happend.
EDIT: the file will overwrite itself at each new entry of data to the file.
Help?
Running on virtual box Ubuntu Linux 12.04, if that may be useful info. And this is not homework.
The usual solution is something along the lines of what MichaelH
proposes: the writing process opens the file in append mode, and
always writes to the end. The reading process does what
MichaelH suggests.
This works fine for small amounts of data in each run. If the
processes are supposed to run a long time, and generate a lot of
data, the file will eventually become too big, since it will
contain all of the processed data as well. In this case, the
solution is to use a directory, generating numbered files in it,
one file per data record. The writing process will write each
data set to a new file (incrementing the number), and the
reading process will try to open the new file, and delete it
when it has finished. This is considerably more complex than
the first suggestion, but will work even for processes
generating large amounts of data per second and running for
years.
EDIT:
Later comments by the OP say that the device is actually a FIFO.
In that case:
you can't seek, so MichaelH's suggestion can't be used
literally, but
you don't need to seek, since data is automatically removed
from the FIFO whenever it has been read, and
depending on the size of the data, and how it is written, the
writes may be atomic, so you don't have to worry about partial
records, even if you happen to read exactly in the middle of
a write.
With regards to the latter: make sure that both the read and
write buffers are large enough to contain a complete record, and
that the writer flushes after each record. And make sure that
the records are smaller than the size needed to guarantee
atomicity. (Historically, on the early Unix I know, this was
4096, but I would be surprised if it hasn't increased since
then. Although... Under Posix, this is defined by PIPE_BUF,
which is only guaranteed to be at least 512, and is only 4096
under modern Linux.)
Just read the file, rename the file, open the renamed file. do the processing of data to your system, and at the end of the loop close the file. After a sleep, re-open the file at the top of the white loop, rename it and repeat.
That's the simplest way to approach the problem and saves having to write code to process dynamic changes to the file during the processing stage.
To be absolutely sure you don't get any corruption it's best to rename the file. This guarantees that any changes from another process do not affect the processing. It may not be necessary to do this - it depends on the processing and how the file is updated. But it's a safer approach. A move or rename operation is guaranteed to be atomic - so there should be no concurreny issues if using this approach.
You can use inotify to watch file changes.
If you need simpler solution - read file attributes ( with stat(), and check last_write_time of a file ).
However you still may miss some file modification, while you'll be opening and rereading file. So if you have control over the application which writes to a file, i'd recommend you using something else to communicate between these processes, pipes for example.
To be more explicit, if you want tail-like behavior you'll want to:
Open the file, read in the data. Save the length. Close the file.
Wait for a bit.
Open the file again, attempt to seek to the last read position, read the remaining data, close.
rinse and repeat
I have a program that creates pipes between two processes. One process constantly monitors the output of the other and when specific output is encountered it gives input through the other pipe with the write() function. The problem I am having, though is that the contents of the pipe don't go through to the other process's stdin stream until I close() the pipe. I want this program to infinitely loop and react every time it encounters the output it is looking for. Is there any way to send the input to the other process without closing the pipe?
I have searched a bit and found that named pipes can be reopened after closing them, but I wanted to find out if there was another option since I have already written the code to use unnamed pipes and I haven't yet learned to use named pipes.
Take a look at using fflush.
How are you reading the other end? Are you expecting complete strings? You aren't sending terminating NULs in the snippet you posted. Perhaps sending strlen(string)+1 bytes will fix it. Without seeing the code it's hard to tell.
Use fsync. http://pubs.opengroup.org/onlinepubs/007908799/xsh/fsync.html
From http://www.delorie.com/gnu/docs/glibc/libc_239.html:
Once write returns, the data is enqueued to be written and can be read back right away, but it is not necessarily written out to permanent storage immediately. You can use fsync when you need to be sure your data has been permanently stored before continuing. (It is more efficient for the system to batch up consecutive writes and do them all at once when convenient. Normally they will always be written to disk within a minute or less.) Modern systems provide another function fdatasync which guarantees integrity only for the file data and is therefore faster. You can use the O_FSYNC open mode to make write always store the data to disk before returning.
I have this tool in which a single log-like file is written to by several processes.
What I want to achieve is to have the file truncated when it is first opened, and then have all writes done at the end by the several processes that have it open.
All writes are systematically flushed and mutex-protected so that I don't get jumbled output.
First, a process creates the file, then starts a sequence of other processes, one at a time, that then open the file and write to it (the master sometimes chimes in with additional content; the slave process may or may not be open and writing something).
I'd like, as much as possible, not to use more IPC that what already exists (all I'm doing now is writing to a popen-created pipe). I have no access to external libraries other that the CRT and Win32 API, and I would like not to start writing serialization code.
Here is some code that shows where I've gone:
// open the file. Truncate it if we're the 'master', append to it if we're a 'slave'
std::ofstream blah(filename, ios::out | (isClient ? ios:app : 0));
// do stuff...
// write stuff
myMutex.acquire();
blah << "stuff to write" << std::flush;
myMutex.release();
Well, this does not work: although the output of the slave process is ordered as expected, what the master writes is either bunched together or at the wrong place, when it exists at all.
I have two questions: is the flag combination given to the ofstream's constructor the right one ? Am I going the right way anyway ?
If you'll be writing a lot of data to the log from multiple threads, you'll need to rethink the design, since all threads will block on trying to acquire the mutex, and in general you don't want your threads blocked from doing work so they can log. In that case, you'd want to write your worker thread to log entries to queue (which just requires moving stuff around in memory), and have a dedicated thread to pull entries off the queue and write them to the output. That way your worker threads are blocked for as short a time as possible.
You can do even better than this by using async I/O, but that gets a bit more tricky.
As suggested by reinier, the problem was not in the way I use the files but in the way the programs behave.
The fstreams do just fine.
What I missed out is the synchronization between the master and the slave (the former was assuming a particular operation was synchronous where it was not).
edit: Oh well, there still was a problem with the open flags. The process that opened the file with ios::out did not move the file pointer as needed (erasing text other processes were writing), and using seekp() completely screwed the output when writing to cout as another part of the code uses cerr.
My final solution is to keep the mutex and the flush, and, for the master process, open the file in ios::out mode (to create or truncate the file), close it and reopen it using ios::app.
I made a 'lil log system that has it's own process and will handle the writing process, the idea is quite simeple. The proccesses that uses the logs just send them to a pending queue which the log process will try to write to a file. It's like batch procesing in any realtime rendering app. This way you'll grt rid of too much open/close file operations. If I can I'll add the sample code.
How do you create that mutex?
For this to work this needs to be a named mutex so that both processes actually lock on the same thing.
You can check that your mutex is actually working correctly with a small piece of code that lock it in one process and another process which tries to acquire it.
I suggest blocking such that the text is completely written to the file before releasing the mutex. I've had instances where the text from one task is interrupted by text from a higher priority thread; doesn't look very pretty.
Also, put the format into Comma Separated format, or some format that can be easily loaded into a spreadsheet. Include thread ID and timestamp. The interlacing of the text lines shows how the threads are interacting. The ID parameter allows you to sort by thread. Timestamps can be used to show sequential access as well as duration. Writing in a spreadsheet friendly format will allow you to analyze the log file with an external tool without writing any conversion utilities. This has helped me greatly.
One option is to use ACE::logging. It has an efficient implementation of concurrent logging.