Multiple Pointers To Same FILE With Different Access Mode C++ - c++

Is it possible to have multiple FILE * pointers refer to the same file with different access modes? For example,
say I call fp1 = fopen("File1.bin", "wb") and perform write operations, and then, WITHOUT closing the file with fclose, I call fp2 = fopen("File1.bin", "rb") and try to use write operations on fp2. This should fail, but fp2 still writes content to the file even though it was opened with a different access mode. Why?

fopen() opens a file stream, which is an abstraction of a file. Sure, a file handle is opened underneath but it is perfectly acceptable to have concurrent access to the same file through different handles (which may even be in different processes).
A file is a shared resource.
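As a rough illustration of two independent streams on one file, here is a minimal sketch. The function name write_then_read_via_two_streams is made up for this example, and it assumes the platform lets the same file be opened for writing and reading at once (typical on Linux; sharing rules on other systems may differ).

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Writes `text` through one stream and reads it back through a second,
// independently opened stream on the same file. Returns what the reader saw.
// (Hypothetical helper, for illustration only.)
std::string write_then_read_via_two_streams(const char* path, const char* text) {
    std::FILE* writer = std::fopen(path, "wb");  // creates or truncates the file
    if (!writer) return {};
    std::FILE* reader = std::fopen(path, "rb");  // second, read-only stream
    if (!reader) { std::fclose(writer); return {}; }

    std::fwrite(text, 1, std::strlen(text), writer);
    std::fflush(writer);  // hand the buffered bytes to the OS so the reader can see them

    char buf[256] = {};
    std::size_t n = std::fread(buf, 1, sizeof buf - 1, reader);

    std::fclose(reader);
    std::fclose(writer);
    return std::string(buf, n);
}
```

Each stream keeps its own buffer and its own file position, which is why the explicit fflush is needed before the second stream can observe the data.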

Related

Recover stdio stream open mode

Is there a way for a function receiving a value of type FILE * to get the open mode used on the call to fopen() used to create the stream?
This question was motivated by the need to extend a C++ class that works as a wrapper to stdio's FILE pointers, in a way that I can clone an already open stream into a new wrapped one, while the original would continue to be used unwrapped by other parts of the program.
Under POSIX, I know that I can use fileno() to get the stream's underlying file descriptor in order to clone (dup()) it, but using the underlying descriptor's file flags would not be an exact replacement for the stream open mode, since it is possible that the stream would have stricter access restrictions than the descriptor it's bound to. So, do you have any suggestions?
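For reference, the descriptor-based approximation described above might look like the sketch below. It assumes a POSIX system; approximate_mode is a hypothetical name, and, as the question itself points out, the result reflects the descriptor's flags, not necessarily the mode string originally passed to fopen() (for instance, "w+" and "r+" cannot be told apart this way).

```cpp
#include <stdio.h>
#include <fcntl.h>
#include <string>

// Derives an fopen-style mode string from the flags of the descriptor
// underlying `stream`. This is only an approximation of the stream's mode.
// (Hypothetical helper, POSIX only.)
std::string approximate_mode(FILE* stream) {
    int fd = fileno(stream);
    if (fd < 0) return {};
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0) return {};

    const bool append = (flags & O_APPEND) != 0;
    switch (flags & O_ACCMODE) {
        case O_RDONLY: return "r";
        case O_WRONLY: return append ? "a" : "w";
        case O_RDWR:   return append ? "a+" : "r+";
    }
    return {};
}
```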

Can you have multiple "cursors" for the same ifstream ? Would that be thread-safe?

I have multiple threads, and I want each of them to process a part of my file. Can I have a single ifstream object for that and make the threads concurrently read different parts? The parts are non-overlapping, so the same line will not be processed by two threads. If yes, how do I get multiple cursors?
A single std::ifstream is associated with exactly one cursor (there's a seekg and tellg method associated with the std::ifstream directly).
If you want the same std::ifstream object to be shared across multiple threads, you'll need some sort of synchronization mechanism between the threads, which might defeat the purpose (in each thread, you'll have to lock, seek, read and unlock each time).
To solve your problem, you can open one std::ifstream to the same file per thread. In each thread, you'd seek to whatever position you want to start reading from. This would only require you to be able to "easily" compute the seek position for each thread though (Note: this is a pretty strong requirement).
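A minimal sketch of the one-stream-per-reader idea, assuming byte offsets for each part are already known. The helper name read_chunk is hypothetical; each thread would call it with its own offset and length, so no stream object is ever shared.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads up to `length` bytes starting at byte `offset`, using a stream that is
// private to the caller, so its cursor is independent of every other reader.
// (Hypothetical helper, for illustration only.)
std::string read_chunk(const std::string& path, std::streamoff offset, std::size_t length) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    in.seekg(offset);

    std::string chunk(length, '\0');
    in.read(&chunk[0], static_cast<std::streamsize>(length));
    chunk.resize(static_cast<std::size_t>(in.gcount()));  // keep only the bytes actually read
    return chunk;
}
```

Because every call opens its own std::ifstream, the calls can run in different threads (for example via std::thread) without locking, as long as nobody is writing to the file at the same time.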
C++ file streams are not guaranteed to be thread safe.
The typical solution is to open separate streams on the same file anyway; each instance comes with its own "cursor". However, you need to ensure shared access, and concurrency becomes platform specific.
For ifstream (i.e. only reading from the file), the concurrency issues are usually tame. Even if someone else modifies the file, both streams might see different content, but you do have some kind of eventual consistency.
Reads and writes are usually not atomic, i.e. you might read only part of a write. Writes might not even execute in the order they are issued (see write combining).
Looking at the FILE struct (in at least one implementation), there is a pointer inside FILE, char* curp, the current active pointer, which may mean that each FILE object keeps track of its own position in the file.
This is C, and I don't know how ifstream works or whether it uses a FILE object or is built like one. It might not help you at all, but I thought this was worth sharing in case it helps someone.

Concurrent File write between processes

I need to write log data into a single file from different processes.
I am using Windows Mutex which needs Common Language Runtime support for it.
Mutex^ m = gcnew Mutex( false,"MyMutex" );
m->WaitOne();
//... File Open and Write ..
m->ReleaseMutex();
Do I really need to change from C++ to C++/CLI for synchronization?
It is ok if the atomic is not used. But I need to know whether using this Mutex will slow down the performance compared to local mutex.
Adding CLR support to your C++ application just to get the Mutex class is overkill. There are several options available to you to synchronize your file access between two applications.
Option 1: Mutex
If you need to write a file from multiple processes, using a mutex is a good way to do it. Use the mutex functions in the Win32 API. (The .Net Mutex class is just a wrapper around those functions anyway.)
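// Requires <windows.h>. Creates the named mutex, or opens it if another process already created it.
HANDLE mutex = CreateMutexA(NULL, FALSE, "MyMutex");
if (mutex != NULL)
{
    DWORD waitResult = WaitForSingleObject(mutex, INFINITE);
    // WAIT_ABANDONED: the previous owner exited without releasing; this thread owns the mutex now.
    if (waitResult == WAIT_OBJECT_0 || waitResult == WAIT_ABANDONED)
    {
        // TODO: Write the file
        WriteFile(...);
        ReleaseMutex(mutex);
    }
    CloseHandle(mutex);
}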
As the other answer noted, you will need to open the file with sharing, so that both of your applications can open it at once. However, that by itself may not be enough: If both of your applications are trying to write to the same area of the file, then you'll still need to make sure that only one application writes at a time. Imagine if both applications look at the size of the file, then both try to write to that byte offset at the same time: Even though both tried to just append to the end of the file, they ended up clobbering each other.
Option 2: Open as append only
If you're purely writing to the end of the file, and never attempting to read anything or to write anywhere other than the very end of the file, then there is a special mode you can use that lets you avoid a mutex. If you open the file with dwDesiredAccess set to FILE_APPEND_DATA | SYNCHRONIZE and nothing else (don't include FILE_WRITE_DATA), then the OS will make sure that all data is written at the end of the file, and that the two applications writing data do not overwrite each other. This behavior is documented on MSDN:
If only the FILE_APPEND_DATA and SYNCHRONIZE flags are set, the caller can write only to the end of the file, and any offset information about writes to the file is ignored. However, the file will automatically be extended as necessary for this type of write operation.
Option 3: LockFile
One other path you can take is to use the LockFile method. With LockFile (or LockFileEx), you can have both applications open the file, and have each app lock the section of the file that it wants to write to. This gives you more granularity than the mutex, allowing non-overlapping writes to happen at the same time. (Using LockFile on the entire file will give you the same basic effect as the mutex, with the added benefit that it will prevent other applications from writing the file while you're doing so.) There's a good example of how to use LockFile on Raymond Chen's blog.
Actually you don't need to use a separate mutex at all, you can just use the file itself. When a file is opened with the CreateFile API call (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396), the call takes a parameter called dwShareMode which specifies what concurrent access is allowed by other processes. A value of 0 would prevent other processes from opening the file completely.
Pretty much all APIs that open a file map to CreateFile under the hood, so the CLR might already be doing the right thing for you when you open a file for writing.
In the C runtime there is also _fsopen, which allows you to open a file with sharing flags.
I'd recommend you to test what the default sharing mode is when you open your file from C#. If it does not prevent simultaneous open for writing by default, use _fsopen from C (or maybe there is an appropriate C# function).

Is a file guaranteed to be openable for reading immediately after ofstream::close() has returned?

I need my code (C++, on linux) to call a second executable, having previously written an output file which is read by the second program. Does the naïve approach,
std::ofstream out("myfile.txt");
// write output here
out.close();
system("secondprogram myfile.txt");
suffer from a potential race condition, where even though out.close() has executed, the file cannot immediately be read by secondprogram? If so, what is the best practice for resolving this?
Three notes:
If this is file-system-dependent, I'm interested in the behaviour on ext3 and tmpfs.
Clearly there are other reasons (file permissions etc.) why the second program might fail to open the file; I'm just interested in the potential for a race condition.
The hardcoded filename in the example above is for simplicity; in reality I use mkstemp.
Once the file has been closed, all the written data is guaranteed to have been flushed from the buffers of the ofstream object (at that point you can destroy it without any risk of losing data, and the destructor closes the file itself if needed). This does not mean that the data is physically on the disk at this point (it probably is not, because of the caching behavior of the OS), but any program running on the same OS will be able to read the file consistently (the OS will serve the reads from its cached data). If you need to flush the OS buffers to the disk (which is not needed for your secondprogram to correctly read the input file), then you might want to look at the sync() function in <unistd.h>.
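As a small sketch of the close-then-hand-off pattern, the helper below (write_and_close is a hypothetical name) also checks the stream state after close(), since a failed flush is only reported through the stream's failbit. It makes no attempt to force the data onto the physical disk.

```cpp
#include <fstream>
#include <string>

// Writes `data` to `path` and closes the stream. Returns true only if the
// stream's buffer was flushed to the OS and the file was closed without error.
// (Hypothetical helper, for illustration only.)
bool write_and_close(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out << data;
    out.close();         // flushes the ofstream buffer to the OS, then closes the file
    return !out.fail();  // close() sets failbit if flushing or closing failed
}
```

Only when this returns true would the caller go on to launch the second program, for example with system("secondprogram myfile.txt") as in the question.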
There is a potential failure mode that I missed earlier: You don't seem to have a way of recovering when the file cannot be opened by secondprogram. The problem is not that the file might be locked/inconsistent after close() returns, but that another program, completely unrelated to yours, might open the file between close() and system() (say, an AV scanner, someone grepping through the directory containing the file, a backup process). If that happens, secondprogram will fail even though your program behaves correctly.
TL;DR: Even though everything works as expected, you have to account for the case that secondprogram may not be able to open the file!
According to cplusplus.com, close() writes any pending output to the file before it returns, so there should be no race condition.

lock file so that it cannot be deleted

I'm working with two independent c/c++ applications on Windows where one of them constantly updates an image on disk (from a webcam) and the other reads that image for processing. This works fine and dandy 99.99% of the time, but every once in a while the reader app is in the middle of reading the image when the writer deletes it to refresh it with a new one.
The obvious solution to me seems to be to have the reader put some sort of a lock on the file so that the writer can see that it can't delete it and thus spin-lock on it until it can delete and update. Is there any way to do this? Or is there another simple design pattern I can use to get the same sort of constant image refreshing between two programs?
Thanks,
-Robert
Try using a synchronization object, probably a mutex will do. Whenever a process wants to read or write to a file it should first acquire the mutex lock.
Yes, a locking mechanism would help. There are, unfortunately, several to choose from. Linux/Unix e.g. has flock (2), Windows has a similar (but different) mechanism.
Another (somewhat hacky) solution is to just write the file under a temporary name, then rename it. Many filesystems guarantee that a rename is atomic, so this may work. This however depends on the fs, so it's a bit hacky.
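A minimal sketch of the write-then-rename idea follows. The name publish_atomically and the ".tmp" suffix are made up for this example. It assumes POSIX semantics, where rename() replaces an existing destination in one step; on Windows, std::rename fails if the destination exists, so something like MoveFileEx with MOVEFILE_REPLACE_EXISTING would be needed instead.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Writes `data` to a temporary file next to `final_path`, then renames it into
// place so readers see either the old file or the complete new one.
// (Hypothetical helper; the ".tmp" suffix is an arbitrary choice.)
bool publish_atomically(const std::string& final_path, const std::string& data) {
    const std::string tmp_path = final_path + ".tmp";
    {
        std::ofstream out(tmp_path, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        out << data;
        out.close();
        if (out.fail()) {
            std::remove(tmp_path.c_str());
            return false;
        }
    }
    // Same directory, hence same filesystem: the rename does not copy data.
    if (std::rename(tmp_path.c_str(), final_path.c_str()) != 0) {
        std::remove(tmp_path.c_str());
        return false;
    }
    return true;
}
```

A reader that already has the old file open keeps reading the old contents on POSIX systems, which is what makes this pattern attractive for the image-refresh case.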
If you are willing to go with the Windows API, opening the file with CreateFile and passing in 0 for the dwShareMode will not allow any other application to open the file.
From the documentation:
Prevents other processes from opening a file or device if they
request delete, read, or write access.
Then you'd have to use ReadFile, WriteFile, CloseHandle, etc. rather than the C standard library functions.
Or, as a really simple kludge, the reader creates a temp file (say, .lock) before it starts reading and deletes it afterwards. The writer doesn't manipulate the file so long as .lock exists.
That's how Open Office does it (and others) and it's probably the simplest to implement, no matter which platform.
Joe, many solutions have been proposed; I commented on some of them but I'd like to chime in with an overall view and some specifics and recommendations:
You have the following options:
1. use filesystem locking: under Windows have both the reader and writer open (and create with the CREATE_ALWAYS disposition, respectively) the shared file in OF_SHARE_EXCLUSIVE mode; have both the reader and writer ready to handle ERROR_SHARING_VIOLATION and retry after some predefined period of time (e.g. 250ms)
2. use file renaming to essentially transfer file ownership: have the writer create a writer-private file (e.g. shared_file.tmpwrite), write to it, close it, then make it publicly available to the reader by renaming it to an agreed-upon "public" name (e.g. simply shared-file); have the reader periodically test for the existence of a file with the agreed-upon "public" name (e.g. shared-file) and, when one is found, attempt to first rename it to a reader-private name (e.g. shared_file.tmpread) before having the reader open it (under the reader-private name); under Windows use MOVEFILE_REPLACE_EXISTING; the rename operation does not have to be atomic for this to work
3. use other forms of interprocess communication (IPC): under Windows you can create a named mutex, and have both the reader and writer attempt to create (the existing mutex will be returned if it already exists) then acquire the named mutex before opening the shared file for reading or writing
4. implement your own filesystem-backed locking: take advantage of open(O_CREAT|O_EXCL) or, under Windows, of the CREATE_NEW disposition to atomically create an application lock file; unlike the OF_SHARE_EXCLUSIVE approach above, it would be up to you to deal with stale lock files (i.e. lock files left by a process which did not shut down gracefully, such as after a crash)
I would implement method 1.
Method 2 would also work, but it is in a sense reinventing the wheel.
Method 3 arguably has the advantage of allowing your reader process to wait on the writer process and vice-versa, eliminating the need for the arbitrary sleep delays between the retries of methods 1 and 2 (polling); however, if you are OK with polling then you should still use method 1.
Method 4 is listed for completeness only, as it is complex to implement: when the lock file is detected to be stale (e.g. by checking whether the PID contained therein still exists), multiple processes can potentially be competing for its removal, which introduces a race condition requiring a second lock, which in turn can become stale, and so on. For example:
process A creates the lock file but dies without removing the lock file
process A restarts and tries to acquire the lock file but realizes it is stale
process B comes out of a sleep delay and also tries to acquire the lock file but realizes it is stale
process A removes the lock file, which it knew to be stale, and recreates it essentially reacquiring the lock
process B removes the lock file, which it (still) thinks is stale (although at this point it is no longer stale and owned by process A) -- violation
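For completeness, the atomic-creation step behind the lock-file approach can be sketched as below on POSIX. The function names are hypothetical, and the sketch deliberately does nothing about stale lock files, which is exactly the hard part described in the sequence above.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

// Tries to create `lock_path` exclusively. O_CREAT | O_EXCL makes the call fail
// if the file already exists, so at most one caller can succeed.
// (Hypothetical helper; no stale-lock detection is attempted.)
bool try_acquire_lock_file(const std::string& lock_path) {
    int fd = open(lock_path.c_str(), O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0) return false;  // typically errno == EEXIST: someone else holds the lock

    // Record the owner's PID so that other tools can at least inspect it.
    const std::string pid = std::to_string(getpid()) + "\n";
    ssize_t written = write(fd, pid.c_str(), pid.size());
    (void)written;  // a short write only affects the informational PID
    close(fd);
    return true;
}

// Releases the lock by removing the lock file.
void release_lock_file(const std::string& lock_path) {
    unlink(lock_path.c_str());
}
```

A caller that fails to acquire the lock would sleep and retry, just as with the polling in methods 1 and 2.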
Instead of deleting images, what about appending them to the end of the file? This would allow you to keep adding to the file while the reader is still operating, without destroying the file. The reader can then delete the image when it's done with it (if necessary) and move on to the next image. The other option would be to store the image in a buffer for writing and test the file pointer. If it's set to the head of the file, then you can go ahead and write from the buffer to the file. Otherwise, wait until the reader finishes and puts the pointer back at the head of the file.
couldn't you store a few images? ('n' sounds like a good number :-)
Not too many to fill your disk, but surely 3 would be enough? if not, you are writing faster than you can process and have a fundamental problem anyhoo (tune to discover 'n').
Cyclically overwrite.