fopen/fwrite and multi-threading?
Some multi-threaded programs open the same file, and each thread creates its own file pointer to that file.
One thread, created by a particular program, will update the file at some random time, whilst other threads, created by a different program, will simply read the contents of the file.
I guess this creates a race/data-inconsistency problem if the writing thread changes the contents of the file whilst the other threads are trying to read them.
The problem here is that the thread updating the file is compiled into a different exe than the program that creates the reading threads, so within-program thread control becomes impossible.
My solution is to create a very small "flag" file on the hard disk that indicates one of 3 states of the file:
1) the writing thread is updating the contents of the file;
2) the reading threads are reading the contents of the file;
3) neither 1) nor 2).
The flag file would then be used to block threads whenever necessary.
Is there a more compact/neat solution to this problem?

It might be easier to use a process-global "named" semaphore that all the processes know about. Plus then you could use thread/process-blocking semaphore mechanisms instead of spin-looping on file-open-close and file contents...
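A minimal sketch of that idea, assuming a POSIX system; the semaphore name "/myfile_lock" is illustrative, and the writer program and the reader program must agree on it:
#include <semaphore.h>
#include <fcntl.h>
#include <cstdio>
int main() {
    // Open (or create) a named semaphore visible to all processes;
    // the initial value of 1 makes it act as a cross-process mutex.
    sem_t* lock = sem_open("/myfile_lock", O_CREAT, 0644, 1);
    if (lock == SEM_FAILED) { perror("sem_open"); return 1; }
    sem_wait(lock);   // blocks until no other process holds the semaphore
    // ... read or write the shared file here ...
    sem_post(lock);   // release it for the next process
    sem_close(lock);
    return 0;
}
Both programs wrap their file access in the same wait/post pair, so no flag file or spin-looping is needed.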

Is seek_ptr unique per file?

Sorry, but I didn't find a clear answer to my question.
I know that each open file has its own seek_ptr. Let's suppose the main process opens a connection to file_A and then, before doing anything, calls fork().
The forked process then reads 2 chars. Which is correct:
1) seek_ptr will be equal to 2 for both processes?
2) seek_ptr will be equal to 2 for the child process and still 0 for the main process?
Only if the answer is 1):
How can I open 2 files in Notepad with each file's indicator/cursor in a different location?
In Unix, (pid, fd) acts as a pointer into the kernel's table of open file descriptions. When a process is forked, the child process will have a different PID, call it pid2. So (pid2, fd) is a different key from (pid, fd). However, these two pointers actually point to the same open file description: fork does not fork the open file descriptions themselves. Therefore, they share a single offset. If one process seeks, it affects the other process as well. If one process reads, it affects the other process as well.
However, either process is free to call close to dissociate fd from the existing open file description, then call open to create a new open file description which may refer to the same file. After this is done, the two processes will have different open file descriptions, and seeking in one does not affect the other.
Each successful call to open always creates a new open file description.
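A minimal sketch demonstrating the shared offset, assuming a POSIX system and an existing file file_A that is at least 2 bytes long:
#include <fcntl.h>
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>
int main() {
    int fd = open("file_A", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    if (fork() == 0) {        // child
        char buf[2];
        read(fd, buf, 2);     // advances the shared offset to 2
        return 0;
    }
    wait(nullptr);            // let the child finish first
    // The parent never read, yet the offset is 2: fork() duplicated the
    // descriptor but not the open file description behind it.
    printf("parent offset: %ld\n", (long)lseek(fd, 0, SEEK_CUR));
    return 0;
}
This also answers the Notepad question: opening each file is a separate open call, so each gets its own open file description and therefore its own offset/cursor.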

How to get correct file size only on the completion of a detected file change, not at the beginning?

I'm using libuv's uv_fs_event_t to monitor file changes, and once a change is detected, I open the file in the callback uv_fs_event_cb.
However, my program also needs the full file size when opening the file, so that it knows how much memory to allocate. I found that whether I use libuv's uv_fs_fstat, POSIX's stat/stat64, or fseek+ftell, I never get the correct file size immediately. That's because when my program opens the file, the file is still being updated.
My program runs in a tight single thread with callbacks, so delay/sleep isn't the best option here (and offers no guaranteed correctness either).
Is there any way to handle this, with or without leveraging libuv, so that I can, say, hold off opening and reading the file until the write to the file has completed? In other words, instead of immediately detecting the start of a file-change event, can I in some way detect the completion of a file change?
One approach is to have the writer create an intermediate file and finish I/O by renaming it to the target file. This is what happens in most browsers, e.g.: the file keeps a "downloading.tmp" name until the download is complete, to discourage you from opening it (a sketch of this pattern follows below).
Another approach is to write/touch a "finished" file after writing the main target file, and have the reader wait to see that file before starting its job.
The last option I can see, if the file format can be altered slightly: have the writer put the file size in the first bytes of the file. The reader can then preallocate correctly even if the file is not fully written, and it then simply insists on reading all the data.
Overall, I'm suggesting that instead of a completion event, you make the writer produce any event that can be monitored after it has completed its task, and have the reader wait/synchronize on that event.
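A minimal sketch of the first (write-then-rename) approach, assuming POSIX rename() semantics, where a rename within one filesystem atomically replaces the target; the names are illustrative:
#include <cstdio>
#include <fstream>
void write_atomically(const char* target, const char* data) {
    const char* tmp = "downloading.tmp";  // temp name in the same directory/filesystem
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << data;
    }                                     // stream closed: data handed to the OS
    // (for crash safety you would also fsync the file before renaming)
    std::rename(tmp, target);             // readers see the old file or the complete new one
}
With this scheme, a watcher on the target name should only ever observe a complete file, so stat reports the final size.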

boost interprocess race condition prevention

I am having some issues with code that occasionally and sporadically throws the following exception:
boost interprocess: no such file or directory
There are multiple programs accessing the same set of files, but some of them will move the files around to different directories in real time.
The programs handling and moving the files use file locks, e.g.
boost::interprocess::file_lock
The sequence of events in the code is the following:
1) Program 1 checks to see the file it wants to lock exists
2) If the above check passes, it then locks the file using file_lock
The problem, I think, is that between steps 1) and 2), Program 2 can use boost::filesystem::rename on the file Program 1 is working on and move it.
If both programs are running simultaneously, is there any way to prevent this from happening?
Don't check whether the file exists before locking it. Instead, just attempt to lock it; if the file doesn't exist, the file_lock constructor will throw an interprocess_exception alerting you to that fact.
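A minimal sketch of that pattern (the function name is illustrative):
#include <boost/interprocess/sync/file_lock.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
bool process_file(const char* path) {
    try {
        boost::interprocess::file_lock lock(path);  // throws if path no longer exists
        boost::interprocess::scoped_lock<boost::interprocess::file_lock> guard(lock);
        // ... safely work on the file here ...
        return true;
    } catch (const boost::interprocess::interprocess_exception&) {
        return false;  // another program renamed/moved the file; retry or skip
    }
}
The point is that "no such file or directory" becomes an expected, handled outcome rather than a crash.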

UNIX File Descriptors Reuse

Though I'm reasonably used to UNIX and have programmed on it for a long time, I'm not used to file manipulation.
I know that 0/1/2 file descriptors are standard in, out, and error. I'm aware that whenever a process opens a file, it is given a descriptor with the smallest value that isn't yet used - and I understand some things about using dup/dup2.
I get confused about file descriptors between processes though. Does each process have its own 0/1/2 descriptors for in/out/error, or are those 3 descriptors shared between all processes? How come you can run 3 programs in 3 different shells and each gets only its own program's output, if the descriptors are shared?
If two programs open myfile.txt after start-up, will they both use file descriptor #3, or would the second program use #4 since 3 was taken?
I know I asked the same question in a couple of ways there, but I just wanted to be clear. The more detail the better :) I've never run into problems with these things while programming, but I'm reading through a UNIX book to understand more, and I suddenly realized that this confused me a lot and I'd never thought about it in detail before.
Each file descriptor is local to the process. However, some file descriptors can refer to the same file: for example, if you create a child process using fork(), it shares the files opened by the parent. It has its own set of file descriptors, initially identical to the parent's, but they can diverge through closing/dup-ing, etc.
If two programs open the same file, in general they get separate file descriptors pointing to separate internal structures. However, using certain techniques (fork, FD passing, etc.) you can have file descriptors in different processes point to the same internal entity. Generally, though, that is not the case.
So, answering your question: both programs would get FD #3 for the newly opened file.
File descriptors in Unix (normally) persist through fork() and exec() calls. So yes, several processes can share file descriptors.
For example, a shell might do a command like:
foo | bar
In this case, foo's stdout must be connected to bar's stdin. To do this, the shell will most likely use pipe() to create reader and writer file descriptors, then fork() twice; the descriptors persist across the forks. The fork that will exec foo does close(1); dup(writer_fd); to make writer_fd become descriptor 1. It then calls exec(), and process foo outputs to the pipe we created. For bar, we do close(0); dup(reader_fd); then exec(). And voilà, foo's output goes to bar.
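A minimal sketch of that wiring, assuming POSIX; it uses dup2(), which is equivalent to the close()-then-dup() pair described above, and "foo" and "bar" stand in for real programs:
#include <unistd.h>
#include <sys/wait.h>
int main() {
    int fds[2];
    pipe(fds);                    // fds[0] = read end, fds[1] = write end
    if (fork() == 0) {            // child that becomes foo
        dup2(fds[1], 1);          // stdout now writes into the pipe
        close(fds[0]); close(fds[1]);
        execlp("foo", "foo", (char*)nullptr);
        _exit(127);               // only reached if exec fails
    }
    if (fork() == 0) {            // child that becomes bar
        dup2(fds[0], 0);          // stdin now reads from the pipe
        close(fds[0]); close(fds[1]);
        execlp("bar", "bar", (char*)nullptr);
        _exit(127);
    }
    close(fds[0]); close(fds[1]); // the parent keeps no pipe ends open
    while (wait(nullptr) > 0) {}  // reap both children
    return 0;
}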
Don't confuse the file descriptors with the resources they represent. You can have ten different processes, each with a file descriptor of '3' open, and each refer to a different open file. When a process does I/O using its file descriptor, the OS knows which process is doing the I/O and is able to disambiguate which file is being referred to.

Writing concurrently to a file

I have this tool in which a single log-like file is written to by several processes.
What I want to achieve is to have the file truncated when it is first opened, and then have all writes done at the end by the several processes that have it open.
All writes are systematically flushed and mutex-protected so that I don't get jumbled output.
First, a process creates the file, then starts a sequence of other processes, one at a time, that then open the file and write to it (the master sometimes chimes in with additional content; the slave process may or may not be open and writing something).
I'd like, as much as possible, not to use more IPC than what already exists (all I'm doing now is writing to a popen-created pipe). I have no access to external libraries other than the CRT and Win32 API, and I would like not to start writing serialization code.
Here is some code that shows where I've gone:
// open the file: truncate it if we're the 'master', append to it if we're a 'slave'
std::ofstream blah(filename, isClient ? (std::ios::out | std::ios::app)
                                      : std::ios::out);
// do stuff...
// write stuff
myMutex.acquire();
blah << "stuff to write" << std::flush;
myMutex.release();
Well, this does not work: although the output of the slave process is ordered as expected, what the master writes is either bunched together or at the wrong place, when it exists at all.
I have two questions: is the flag combination given to the ofstream's constructor the right one? And am I going about this the right way?
If you'll be writing a lot of data to the log from multiple threads, you'll need to rethink the design, since all threads will block trying to acquire the mutex, and in general you don't want your threads blocked from doing work just so they can log. In that case, you'd want your worker threads to post log entries to a queue (which just requires moving stuff around in memory), and have a dedicated thread pull entries off the queue and write them to the output (a sketch follows below). That way your worker threads are blocked for as short a time as possible.
You can do even better than this by using async I/O, but that gets a bit more tricky.
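A minimal sketch of that queue-plus-dedicated-writer design, assuming C++11 threads within one process (the class name is illustrative):
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
class AsyncLog {
    std::queue<std::string> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread writer_;
public:
    explicit AsyncLog(const char* path)
        : writer_([this, path] {
              std::ofstream out(path, std::ios::app);
              std::unique_lock<std::mutex> lk(m_);
              while (!done_ || !queue_.empty()) {
                  cv_.wait(lk, [this] { return done_ || !queue_.empty(); });
                  while (!queue_.empty()) {
                      std::string line = std::move(queue_.front());
                      queue_.pop();
                      lk.unlock();                // do the slow I/O unlocked
                      out << line << '\n' << std::flush;
                      lk.lock();
                  }
              }
          }) {}
    void log(std::string line) {                  // cheap: just enqueue
        { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(line)); }
        cv_.notify_one();
    }
    ~AsyncLog() {                                 // flush remaining entries, then stop
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        writer_.join();
    }
};
Worker threads call log(), which holds the mutex only long enough to push a string; the writer thread does the file I/O with the lock released.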
As suggested by reinier, the problem was not in the way I use the files but in the way the programs behave.
The fstreams do just fine.
What I missed out is the synchronization between the master and the slave (the former was assuming a particular operation was synchronous where it was not).
edit: Oh well, there was still a problem with the open flags. The process that opened the file with ios::out did not move the file pointer as needed (it was erasing text other processes were writing), and using seekp() completely screwed the output when writing to cout, as another part of the code uses cerr.
My final solution is to keep the mutex and the flush and, for the master process, to open the file in ios::out mode (to create or truncate the file), close it, and reopen it using ios::app.
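A minimal sketch of that final open sequence (the helper name is illustrative):
#include <fstream>
void open_log(std::ofstream& blah, const char* filename, bool isMaster) {
    if (isMaster) {
        std::ofstream creator(filename, std::ios::out); // create/truncate the file...
    }                                                   // ...and close it on scope exit
    blah.open(filename, std::ios::app);                 // everyone appends from now on
}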
I made a li'l log system that has its own process and handles the writing; the idea is quite simple. The processes that use the logs just send entries to a pending queue, which the log process then tries to write to a file. It's like batch processing in any realtime rendering app. This way you get rid of too many open/close file operations. If I can, I'll add the sample code.
How do you create that mutex?
For this to work, it needs to be a named mutex so that both processes actually lock on the same thing.
You can check that your mutex is actually working correctly with a small piece of code that locks it in one process while another process tries to acquire it. A sketch of creating such a mutex follows below.
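A minimal sketch of a named mutex using the Win32 API (which the asker has available); the name "MyLogMutex" is illustrative, and every process passing the same name contends on the same lock:
#include <windows.h>
int main() {
    // Creates the mutex, or opens the existing one if another process made it first.
    HANDLE hMutex = CreateMutexA(nullptr, FALSE, "MyLogMutex");
    if (!hMutex) return 1;
    WaitForSingleObject(hMutex, INFINITE);  // acquire across processes
    // ... write to the shared log file here ...
    ReleaseMutex(hMutex);
    CloseHandle(hMutex);
    return 0;
}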
I suggest blocking so that the text is completely written to the file before releasing the mutex. I've had instances where the text from one task was interrupted by text from a higher-priority thread; it doesn't look very pretty.
Also, put the output in comma-separated format, or some other format that can be easily loaded into a spreadsheet. Include the thread ID and a timestamp. The interlacing of the text lines shows how the threads are interacting; the ID lets you sort by thread, and the timestamps can be used to show sequential access as well as duration. Writing in a spreadsheet-friendly format lets you analyze the log file with an external tool without writing any conversion utilities. This has helped me greatly.
One option is to use ACE::logging. It has an efficient implementation of concurrent logging.