We have a project where multiple nodes write data to a file in sequence, and the file resides on NFS.
We were using synchronous NFS before, so flushing the file streams just worked. Now we have asynchronous NFS and it's not working: caching obviously comes into the picture, and the other nodes don't see the changes made by a particular node.
I wanted to know if there is a way to forcefully flush the data from the cache to disk. I know this is not efficient but it will get things working until we get the real solution in place.
I've had a similar problem using NFS with VxWorks. After some experimentation I've found a way to surely flush data to the device:
int fd;
fd = open("/ata0a/test.dat", O_RDWR | O_CREAT, 0644);
write(fd, "Hallo", 5);
/* data is having a great time in some buffers... */
ioctl(fd, FIOSYNC, 0); // <-- may last quite a while...
/* data is flushed to file */
I've never worked with ofstreams, nor do I know whether your OS provides something similar to the code shown above...
But one thing to try is to simply close the file. This will cause all buffers to be flushed. Be aware, though, that the "close" call may return before the data is actually written, so there can be a window between closing the file and all data reaching the disk that your application does not see. Additionally this creates a lot of overhead, since you have to re-open the file afterwards.
If this is no option you can also write as much "dummy-data" after your data to cause the buffers to fill up. This will also result in the data being written to the file. But this may waste a lot of disk space depending on the size of your data.
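If your platform is POSIX rather than VxWorks, the closest thing I know of to the ioctl above is fsync() on the file descriptor; with std::ofstream you would have to get at the underlying descriptor in a platform-specific way, or fall back to close/reopen as described above. A minimal sketch, assuming a Unix-like OS (the path is just a placeholder):
#include <fcntl.h>
#include <unistd.h>

int main()
{
    int fd = open("/mnt/nfs/test.dat", O_RDWR | O_CREAT, 0644); // placeholder path
    write(fd, "Hallo", 5);
    /* data may still sit in the client-side cache... */
    fsync(fd);  // block until the data has been pushed out -- may take a while
    close(fd);
    return 0;
}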
I have a local file which some process continuously appends to. I would like to serve that file with boost::beast.
So far I'm using boost::beast::http::response<boost::beast::http::file_body> and boost::beast::http::async_write to send the file to the client. That works very well and it is nice that boost::beast takes care of everything. However, when the end of the file is reached, it stops the asynchronous writing. I assume that is because is_done of the underlying serializer returns true at this point.
Is it possible to keep the asynchronous writing ongoing so new contents are written to the client as the local file grows (similar to how tail -f would keep writing the file's contents to stdout)?
I've figured that I might need to use boost::beast::http::response_serializer<boost::beast::http::file_body> for that kind of customization but I'm not sure how to use it correctly. And do I need to use chunked encoding for that purpose?
Note that keeping the HTTP connection open is not the problem, only writing further output as soon as the file grows.
After some research this problem seems not easily solvable, at least not under GNU/Linux which I'm currently focusing on.
It is possible to use chunked encoding as described in boost::beast's documentation. I've implemented serving chunks asynchronously from file contents which are also read asynchronously with the help of boost::asio::posix::stream_descriptor. That works quite well. However, it also stops with an end-of-file error as soon as the end of the file is reached. When using async_wait via the descriptor I'm getting the error "Operation not supported".
So it simply seems not possible to asynchronously wait for more bytes to be written to a file. That's strange, considering tail -f does exactly that. So I strace'd tail -f, and it turns out that it calls inotify_add_watch(4, "path_to_file", IN_MODIFY). Hence I assume one actually needs to use inotify to implement this.
For me it seems easier and more efficient to take control of the process which writes to the file and have it print to stdout instead. Then I can stream the pipe (similarly to how I attempted streaming the file) and write the file myself.
However, if one wanted to go down the road, I suppose using inotify and boost::asio::posix::stream_descriptor is the answer to the question, at least under GNU/Linux.
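For reference, a minimal sketch of that combination, assuming Linux and Boost.Asio; the path, buffer size and the on_file_grew callback are placeholders, and this is untested against the original setup:
#include <boost/asio.hpp>
#include <sys/inotify.h>
#include <array>
#include <iostream>

namespace asio = boost::asio;

void on_file_grew()  // placeholder: re-read the file and send the new bytes to the client
{
    std::cout << "file modified\n";
}

void watch(asio::posix::stream_descriptor& inotify_fd, std::array<char, 4096>& buf)
{
    inotify_fd.async_read_some(asio::buffer(buf),
        [&](const boost::system::error_code& ec, std::size_t /*bytes*/)
        {
            if (ec) return;
            // one or more inotify_event records arrived; for this sketch we only
            // care that the file changed, not about the event details
            on_file_grew();
            watch(inotify_fd, buf);  // re-arm the watch
        });
}

int main()
{
    asio::io_context ioc;
    int fd = inotify_init1(IN_CLOEXEC);                 // raw inotify descriptor
    inotify_add_watch(fd, "path_to_file", IN_MODIFY);   // same call tail -f makes
    asio::posix::stream_descriptor inotify_fd(ioc, fd); // let asio own the descriptor
    std::array<char, 4096> buf{};
    watch(inotify_fd, buf);
    ioc.run();
}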
I'm running a test, and found that the file doesn't actually get written until I control-C to abort the program. Can anyone explain why that would happen?
I expected it to write at the same time, so I could read the file in the middle of the process.
import os
from time import sleep
f = open("log.txt", "a+")
i = 0
while True:
    f.write(str(i))
    f.write("\n")
    i += 1
    sleep(0.1)
Writing to disk is slow, so many programs store up writes into large chunks which they write all-at-once. This is called buffering, and Python does it automatically when you open a file.
When you write to the file, you're actually writing to a "buffer" in memory. When it fills up, Python will automatically write it to disk. You can tell it "write everything in the buffer to disk now" with
f.flush()
This isn't quite the whole story, because the operating system will probably buffer writes as well. You can tell it to write the buffer of the file with
os.fsync(f.fileno())
Finally, you can tell Python not to buffer a particular file with open(filename, "wb", 0) (unbuffered mode requires binary mode in Python 3), or to keep only a one-line buffer with open(filename, "w", 1). Naturally, this will slow down all operations on that file, because writes are slow.
You need to f.close() to flush the file write buffer out to the file. Or in your case you might just want to do f.flush(); os.fsync(f.fileno()); so you can keep looping with the opened file handle.
Don't forget to import os.
You have to force the write, so I use the following lines to make sure a file is written:
# Two commands together force the OS to store the file buffer to disc
f.flush()
os.fsync(f.fileno())
You will want to check out file.flush() - although take note that this might not write the data to disk, to quote:
Note:
flush() does not necessarily write the file’s data to disk. Use flush() followed by os.fsync() to ensure this behavior.
Closing the file (file.close()) will also ensure that the data is written - using with will do this implicitly, and is generally a better choice for more readability and clarity - not to mention solving other potential problems.
This is a Windows-ism. If you add an explicit .close() when you're done with the file, it'll appear in Explorer at that time. Even just flushing it might be enough (I don't have a Windows box handy to test). But basically f.write does not actually write, it just appends to the write buffer - until the buffer gets flushed you won't see the data.
On unix the files will typically show up as a 0-byte file in this situation.
The file handle needs to be flushed:
f.flush()
The file does not get written because the output buffer is not flushed until garbage collection takes effect and flushes the I/O buffer (most likely by calling f.close()).
Alternately, in your loop, you can call f.flush() followed by os.fsync(), as documented here.
f.flush()
os.fsync(f.fileno())
All that being said, if you ever plan on sharing the data in that file with other portions of your code, I would highly recommend using a StringIO object.
I am reading a file in C++. The reading happens in a while-loop, always from the same file.
When the function reads information from the file, it pushes that information somewhere else in the system. The problem is that this file may change while the loop is running.
How can I pick up the new information in the file? I tried std::ifstream, changing the file manually on my computer while the endless loop (with a sleep(2) between iterations) was running, but as expected -- nothing happened.
EDIT: the file will overwrite itself at each new entry of data to the file.
Help?
Running on VirtualBox with Ubuntu Linux 12.04, if that may be useful info. And this is not homework.
The usual solution is something along the lines of what MichaelH
proposes: the writing process opens the file in append mode, and
always writes to the end. The reading process does what
MichaelH suggests.
This works fine for small amounts of data in each run. If the
processes are supposed to run a long time, and generate a lot of
data, the file will eventually become too big, since it will
contain all of the processed data as well. In this case, the
solution is to use a directory, generating numbered files in it,
one file per data record. The writing process will write each
data set to a new file (incrementing the number), and the
reading process will try to open the new file, and delete it
when it has finished. This is considerably more complex than
the first suggestion, but will work even for processes
generating large amounts of data per second and running for
years.
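A writer-side sketch of that scheme, assuming a spool directory and a simple naming convention (both placeholders):
#include <fstream>
#include <sstream>
#include <string>

void write_record(const std::string& dir, unsigned long seq, const std::string& data)
{
    std::ostringstream name;
    name << dir << "/record_" << seq << ".dat";   // e.g. spool/record_42.dat
    std::ofstream out(name.str(), std::ios::binary);
    out << data;                                  // one data set per file
    // closed at end of scope; the reading process opens record_<seq>.dat,
    // processes it, then deletes it
}
In practice you may also want to write to a temporary name and rename it to the final name once the file is complete, so the reader never sees a half-written file.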
EDIT:
Later comments by the OP say that the device is actually a FIFO.
In that case:
- you can't seek, so MichaelH's suggestion can't be used literally, but
- you don't need to seek, since data is automatically removed from the FIFO whenever it has been read, and
- depending on the size of the data, and how it is written, the writes may be atomic, so you don't have to worry about partial records, even if you happen to read exactly in the middle of a write.
With regards to the latter: make sure that both the read and
write buffers are large enough to contain a complete record, and
that the writer flushes after each record. And make sure that
the records are smaller than the size needed to guarantee
atomicity. (Historically, on the early Unix I know, this was
4096, but I would be surprised if it hasn't increased since
then. Although... Under Posix, this is defined by PIPE_BUF,
which is only guaranteed to be at least 512, and is only 4096
under modern Linux.)
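To illustrate the atomicity point, a small sketch assuming a mkfifo-style FIFO on the filesystem (the path and record layout are placeholders): as long as each record goes out in a single write() of at most PIPE_BUF bytes, the reader never sees a partial record.
#include <fcntl.h>
#include <limits.h>
#include <unistd.h>

struct Record {
    long   id;
    double values[16];
};
static_assert(sizeof(Record) <= PIPE_BUF, "record must fit in one atomic write");

int main()
{
    int fd = open("/tmp/data.fifo", O_WRONLY);  // FIFO created elsewhere with mkfifo
    Record rec = {1, {0.0}};
    write(fd, &rec, sizeof rec);                // one write() == one atomic record
    close(fd);
    return 0;
}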
Just rename the file, open the renamed file and read it, do the processing of the data into your system, and close the file at the end of the loop. After a sleep, rename and re-open the file again at the top of the while loop and repeat.
That's the simplest way to approach the problem and saves having to write code to process dynamic changes to the file during the processing stage.
To be absolutely sure you don't get any corruption it's best to rename the file. This guarantees that any changes from another process do not affect the processing. It may not be necessary to do this - it depends on the processing and how the file is updated. But it's a safer approach. A move or rename operation is guaranteed to be atomic - so there should be no concurrency issues with this approach.
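A rough sketch of that loop, assuming the writer recreates the file after it has been renamed away (file names are placeholders):
#include <cstdio>
#include <fstream>
#include <string>
#include <thread>
#include <chrono>

int main()
{
    for (;;) {
        if (std::rename("input.log", "input.processing") == 0) {  // atomic grab
            std::ifstream in("input.processing");
            std::string line;
            while (std::getline(in, line)) {
                // process(line);  // placeholder for the real processing
            }
            in.close();
            std::remove("input.processing");  // done with this batch
        }
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }
}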
You can use inotify to watch file changes.
If you need a simpler solution - read the file attributes with stat() and check the file's last modification time (st_mtime).
However, you may still miss some file modifications while you are opening and rereading the file. So if you have control over the application which writes to the file, I'd recommend using something else to communicate between these processes - pipes, for example.
To be more explicit, if you want tail-like behavior you'll want to:
1. Open the file, read in the data. Save the length. Close the file.
2. Wait for a bit.
3. Open the file again, attempt to seek to the last read position, read the remaining data, close.
4. Rinse and repeat.
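A sketch of those steps with std::ifstream, assuming the writer only appends (the file name is a placeholder):
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <chrono>

int main()
{
    std::streampos last_pos = 0;
    for (;;) {
        std::ifstream in("log.txt");
        if (in) {
            in.seekg(last_pos);                // continue where we left off
            std::string line;
            while (std::getline(in, line))
                std::cout << line << '\n';     // placeholder: push into your system
            in.clear();                        // clear eof so tellg() is valid
            last_pos = in.tellg();             // remember how far we got
            // note: a partially written last line may be picked up early
        }
        std::this_thread::sleep_for(std::chrono::seconds(2));
    }
}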
I'm reading a big file using fread. When I interrupt the program during the read using Ctrl+C, the program hangs and is not killable, not even with kill -9. It simply sticks at 100% CPU, keeping the RAM it had already allocated. It would be great to get that fixed, but it would also be okay just to be able to kill that application from outside (the main problem being that I can't restart the machine myself).
Is there a way of doing that in Unix?
Thanks!
Here is the source:
int Read_New_Format(const char* prefix, const char* folder)
{
    char filename[500];
    long count_pos;
    //open files for reading.
    sprintf(filename, "%s/%s.pos.mnc++", folder, prefix);
    FILE *pos = fopen(filename, "r");
    if (pos == NULL)
    {
        printf("Could not open pos file %s\n", filename);
        return -1;
    }
    //read the number count of entries in each of the three files.
    fread(&count_pos, sizeof(long), 1, pos);
    printf("[...]");
    //read the complete file into an array.
    float *data_pos = new float[3 * count_pos];
    fread(data_pos, 3 * sizeof(float), count_pos, pos);
    printf("Read files.\n");
    [...]
}
If your program cannot be interrupted by a signal, that almost surely means it's in an uninterruptible sleep state. This is normally an extremely short-lived state that exists only momentarily while waiting for the physical disk to perform a read or write, either due to an explicit read or write call that can't be satisfied by the cache, or one resulting from a page fault where a disk-backed page is not swapped into physical memory.
If the uninterruptible sleep state persists, this is almost surely indicative of either extremely high load on the storage device (a huge number of IO requests all happening at once) or, much more likely, damaged hardware.
I suspect you have a failing hard disk or scratched optical disc.
The problem wasn't reproducible after some days. Maybe a problem with the file system. As a workaround, direct use of the Unix library routines instead of fread worked.
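For what it's worth, a sketch of that workaround for the header-reading part, using open()/read() directly instead of the stdio fread path (the file name handling is as in the original code):
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int read_count(const char* filename, long* count_out)
{
    int fd = open(filename, O_RDONLY);
    if (fd < 0) { perror("open"); return -1; }
    ssize_t n = read(fd, count_out, sizeof(long));  // unbuffered, low-level read
    close(fd);
    return n == (ssize_t)sizeof(long) ? 0 : -1;
}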
I have this tool in which a single log-like file is written to by several processes.
What I want to achieve is to have the file truncated when it is first opened, and then have all writes done at the end by the several processes that have it open.
All writes are systematically flushed and mutex-protected so that I don't get jumbled output.
First, a process creates the file, then starts a sequence of other processes, one at a time, that then open the file and write to it (the master sometimes chimes in with additional content; the slave process may or may not be open and writing something).
I'd like, as much as possible, not to use more IPC than what already exists (all I'm doing now is writing to a popen-created pipe). I have no access to external libraries other than the CRT and Win32 API, and I would like not to start writing serialization code.
Here is some code that shows where I've gone:
// open the file. Truncate it if we're the 'master', append to it if we're a 'slave'
std::ofstream blah(filename, std::ios::out | (isClient ? std::ios::app : std::ios::openmode(0)));
// do stuff...
// write stuff
myMutex.acquire();
blah << "stuff to write" << std::flush;
myMutex.release();
Well, this does not work: although the output of the slave process is ordered as expected, what the master writes is either bunched together or at the wrong place, when it exists at all.
I have two questions: is the flag combination given to the ofstream's constructor the right one? Am I going about this the right way at all?
If you'll be writing a lot of data to the log from multiple threads, you'll need to rethink the design, since all threads will block on trying to acquire the mutex, and in general you don't want your threads blocked from doing work so they can log. In that case, you'd want to write your worker thread to log entries to queue (which just requires moving stuff around in memory), and have a dedicated thread to pull entries off the queue and write them to the output. That way your worker threads are blocked for as short a time as possible.
You can do even better than this by using async I/O, but that gets a bit more tricky.
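To make the idea concrete, here is a minimal single-process sketch of such a queue-based logger (the class and file handling are illustrative, not from the original post): worker threads only touch memory in log(), and one dedicated thread drains the queue to the file.
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger {
public:
    explicit AsyncLogger(const std::string& path)
        : out_(path, std::ios::app), writer_([this] { run(); }) {}

    ~AsyncLogger() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        writer_.join();
    }

    void log(std::string line) {                 // called by the worker threads
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(line));        // cheap: memory only
        }
        cv_.notify_one();
    }

private:
    void run() {                                 // dedicated writer thread
        std::unique_lock<std::mutex> lock(m_);
        while (!done_ || !queue_.empty()) {
            cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
            while (!queue_.empty()) {
                std::string line = std::move(queue_.front());
                queue_.pop();
                lock.unlock();                   // write without holding the lock
                out_ << line << '\n';
                lock.lock();
            }
            out_.flush();                        // flush once per drained batch
        }
    }

    std::ofstream out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool done_ = false;
    std::thread writer_;                         // started last, joined in dtor
};
A single shared AsyncLogger and calls like logger.log("stuff to write") would replace the mutex-protected stream writes inside one process; for multiple processes you still need one process to own the file (or a named mutex, as discussed further down).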
As suggested by reinier, the problem was not in the way I use the files but in the way the programs behave.
The fstreams do just fine.
What I missed out is the synchronization between the master and the slave (the former was assuming a particular operation was synchronous where it was not).
edit: Oh well, there still was a problem with the open flags. The process that opened the file with ios::out did not move the file pointer as needed (erasing text other processes were writing), and using seekp() completely screwed the output when writing to cout as another part of the code uses cerr.
My final solution is to keep the mutex and the flush, and, for the master process, open the file in ios::out mode (to create or truncate the file), close it and reopen it using ios::app.
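In code, that final approach looks roughly like this (a sketch, not the original source):
#include <fstream>

std::ofstream open_log_as_master(const char* filename)
{
    {
        std::ofstream create(filename, std::ios::out | std::ios::trunc);
    }   // closed here: the file now exists and is empty
    return std::ofstream(filename, std::ios::app);  // reopen for appending only
    // the slave processes simply open with std::ios::app
}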
I made a little log system that has its own process and handles the writing; the idea is quite simple. The processes that use the logs just send entries to a pending queue, which the log process then writes to a file. It's like batch processing in any realtime rendering app. This way you get rid of too many open/close file operations. If I can, I'll add some sample code.
How do you create that mutex?
For this to work this needs to be a named mutex so that both processes actually lock on the same thing.
You can check that your mutex is actually working correctly with a small piece of code that lock it in one process and another process which tries to acquire it.
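For example, a sketch of a Win32 named mutex that every process opens by the same name (the mutex name here is made up):
#include <windows.h>
#include <fstream>

// The name is what makes all processes lock on the same underlying mutex.
HANDLE log_mutex = CreateMutexA(NULL, FALSE, "MyAppLogMutex");

void locked_write(std::ofstream& blah, const char* text)
{
    WaitForSingleObject(log_mutex, INFINITE);  // acquire across processes
    blah << text << std::flush;                // flush while still holding the lock
    ReleaseMutex(log_mutex);
}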
I suggest blocking such that the text is completely written to the file before releasing the mutex. I've had instances where the text from one task is interrupted by text from a higher priority thread; doesn't look very pretty.
Also, put the format into Comma Separated format, or some format that can be easily loaded into a spreadsheet. Include thread ID and timestamp. The interlacing of the text lines shows how the threads are interacting. The ID parameter allows you to sort by thread. Timestamps can be used to show sequential access as well as duration. Writing in a spreadsheet friendly format will allow you to analyze the log file with an external tool without writing any conversion utilities. This has helped me greatly.
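A small sketch of such a log line (the field order is just an example):
#include <chrono>
#include <sstream>
#include <string>
#include <thread>

std::string csv_log_line(const std::string& message)
{
    using namespace std::chrono;
    auto now_ms = duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
    std::ostringstream line;
    line << now_ms << ','                       // timestamp (ms since epoch)
         << std::this_thread::get_id() << ','   // thread id, for sorting
         << message;                            // the actual log text
    return line.str();
}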
One option is to use ACE::logging. It has an efficient implementation of concurrent logging.