lock file so that it cannot be deleted - c++

I'm working with two independent c/c++ applications on Windows where one of them constantly updates an image on disk (from a webcam) and the other reads that image for processing. This works fine and dandy 99.99% of the time, but every once in a while the reader app is in the middle of reading the image when the writer deletes it to refresh it with a new one.
The obvious solution to me seems to be to have the reader put some sort of a lock on the file so that the writer can see that it can't delete it and thus spin-lock on it until it can delete and update. Is there anyway to do this? Or is there another simple design pattern I can use to get the same sort of constant image refreshing between two programs?
Thanks,
-Robert

Try using a synchronization object, probably a mutex will do. Whenever a process wants to read or write to a file it should first acquire the mutex lock.

Yes, a locking mechanism would help. There are, unfortunately, several to choose from. Linux/Unix e.g. has flock (2), Windows has a similar (but different) mechanism.
Another (somewhat hacky) solution is to just write the file under a temporary name, then rename it. Many filesystems guarantee that a rename is atomic, so this may work. This however depends on the fs, so it's a bit hacky.

If you are willing to go with the Windows API, opening the file with CreateFile and passing in 0 for the dwShareMode will not allow any other application to open the file.
From the documentation:
Prevents other processes from opening a file or device if they
request delete, read, or write access.
Then you'd have to use ReadFile, WriteFile, CloseFile, etc rather than the C standard library functions.

Or, as a really simple kludge, the reader creates a temp file (says, .lock) before starting reading and deletes it afterwards. The write doesn't manipulate the file so long as .lock exists.
That's how Open Office does it (and others) and it's probably the simplest to implement, no matter which platform.

Joe, many solutions have been proposed; I commented on some of them but I'd like to chime in with an overall view and some specifics and recommendations:
You have the following options:
use filesystem locking: under Windows have both the reader and writer open (and create with the CREATE_ALWAYS disposition, respectively) the shared file in OF_SHARE_EXCLUSIVE mode; have both the reader and writer ready to handle ERROR_SHARING_VIOLATION and retry after some predefined period of time (e.g. 250ms)
use file renaming to essentially transfer file ownership: have the writer create a writer-private file (e.g. shared_file.tmpwrite), write to it, close it, then make it publicly available to the reader by renaming it to an agreed-upon "public" name (e.g. simply shared-file); have the reader periodically test for the existence of a file with the agreed-upon "public" name (e.g. shared-file) and, when one is found, attempt to first rename it to a reader-private name (e.g. shared_file.tmpread) before having the reader open it (under the reader-private name); under Windows use MOVEFILE_REPLACE_EXISTING; the rename operation does not have to be atomic for this to work
use other forms of interprocess communication (IPC): under Windows you can create a named mutex, and have both the reader and writer attempt to create (the existing mutex will be returned if it already exists) then acquire the named mutex before opening the shared file for reading or writing
implement your own filesystem-backed locking: take advantage of open(O_CREAT|O_EXCL) or, under Windows, of the CREATE_NEW disposition to atomically create an application lock file; unlike OF_SHARE_EXCLUSIVE approach above, it would be up to you to deal with stale lock files (i.e. lock files left by a process which did not shut down gracefully such as after a crash.)
I would implement method 1.
Method 2 would also work, but it is in a sense reinventing the wheel.
Method 3 arguably has the advantage of allowing your reader process to wait on the writer process and vice-versa, eliminating the need for the arbitrary sleep delays between the retries of methods 1 and 2 (polling); however, if you are OK with polling then you should still use method 1
Method 4 is listed for completeness only, as it is complex to implement (when the lock file is detected to be stale, e.g. by checking whether the PID contained therein still exists, multiple processes can potentially be competing for its removal, which introduces a race condition requiring a second lock, which in turn can become stale etc. etc., e.g.:
process A creates the lock file but dies without removing the lock file
process A restarts and tries to acquire the lock file but realizes it is stale
process B comes out of a sleep delay and also tries to acquire the lock file but realizes it is stale
process A removes the lock file, which it knew to be stale, and recreates it essentially reacquiring the lock
process B removes the lock file, which it (still) thinks is stale (although at this point it is no longer stale and owned by process A) -- violation

Instead of deleting images, what about appending them to the end of the file? This would allow you to keep adding to the file while the reader is still operating without destroying the file. The reader can then delete the image when it's done with it (provided it is necessary) and move onto the next image. Or, the other option would be store the image in a buffer, for writing, and you test the file pointer. If it's set to the head of the file then you can go ahead and write from the buffer to the file. Otherwise, wait until reader finishes and puts the pointer back at the head of the file.

couldn't you store a few images? ('n' sounds like a good number :-)
Not too many to fill your disk, but surely 3 would be enough? if not, you are writing faster than you can process and have a fundamental problem anyhoo (tune to discover 'n').
Cyclically overwrite.

Related

Synchronization between two threads in shared folder

We have two threads one for writing and another for removing and one shared directory where one thread creates a new file and writes inside some message, but another scans this directory with a delay and reads the message from the new file and removes it.
So basically I understand that it should be synchronized because if one created the file but has not written yet and another thread takes this empty file and will remove then we will have a problem.
but how long does creating or writing can take time in case if there 4kb for example?
the main question can we avoid synchronization or in this case, synchronization should be?
if one created the file but has not written yet and another thread takes this empty file and will remove then we will have a problem
The solution to this problem is for the file to never be incomplete or invalid.
File creation followed by file population is not an atomic sequence. To make it atomic for readers, the writer should:
Create a temporary file in the destination directory, e.g. <filename>~. So named temporary files (ending with ~) mustn't be read by any other process. When there are multiple competing writers, throw in a combination of (pid or tid) and (tsc or nanoseconds since epoch) into the temporary filename, e.g. <filename>.<pid>.<nsec>~. On Linux you can alterantively use mkstemp to create a unique file for you.
Populate <filename>~.
Rename <filename>~ to <filename>. On Linux renameat2 is an atomic filesystem operation, with option RENAME_NOREPLACE it fails when <filename> already exists, so that multiple writers can detect this condition and act accordingly. std::rename, unfortunately, doesn't require any particular behaviour when <filename> already exists.
This way when <filename> exists it is complete and valid.

Can you have multiple "cursors" for the same ifstream ? Would that be thread-safe?

I have multiple threads, and I want each of them to process a part of my file. Can I have a single ifstream object for that and make them read concurrently read different parts ? The parts are non overlapping, so the same line will not be processed by two threads. If yes, how to get multiple cursors ?
A single std::ifstream is associated with exactly one cursor (there's a seekg and tellg method associated with the std::ifstream directly).
If you want the same std::ifstream object to be shared accross multiple threads, you'll have to have some sort of synchronization mechanism between the threads, which might defeat the purpose (in each thread, you'll have to lock, seek, read and unlock each time).
To solve your problem, you can open one std::ifstream to the same file per thread. In each thread, you'd seek to whatever position you want to start reading from. This would only require you to be able to "easily" compute the seek position for each thread though (Note: this is a pretty strong requirement).
C++ file streams are not guaranteed to be thread safe (see e.g. this answer).
The typical solution is anyway to open separate streams on the same file, each instance comes with their own "cursor". However, you need to ensure shared access, and concurrency becomes platform specific.
For ifstream (i.e. only reading from the file), the concurrency issues are usually tame. Even if someone else modifies the file, both streams might see different content, but you do have some kind of eventual consistency.
Reads and writes are usually not atomic, i.e. you might read only part of a write. Writes might not even execute in the order they are issued (see write combining).
Looking at FILE struct it seems like there is a pointer inside FILE, char* curp, pointing to the current active pointer, which may mean that for each FILE object, you'd have one particular part of the file.
This being in C, I don't know how ifstream works and if it uses FILE object/it is built like a FILE object. Might not help you at all, but I thought it would be interesting to share this little information, and that it could may be help someone.

Concurrent File write between processes

I need to write log data into a single file from different processes.
I am using Windows Mutex which needs Common Language Runtime support for it.
Mutex^ m = gcnew Mutex( false,"MyMutex" );
m->WaitOne();
//... File Open and Write ..
m->ReleaseMutex()
Do I really need to change from C++ to C++/CLI for synchronization?
It is ok if the atomic is not used. But I need to know whether using this Mutex will slow down the performance compared to local mutex.
Adding CLR support to your C++ application just to get the Mutex class is overkill. There are several options available to you to synchronize your file access between two applications.
Option 1: Mutex
If you need to write a file from multiple processes, using a mutex is a good way to do it. Use the mutex functions in the Win32 API. (The .Net Mutex class is just a wrapper around those functions anyway.)
HANDLE mutex = CreateMutex(NULL, false, "MyMutex");
DWORD waitResult = WaitForSingleObject(mutex, INFINITE);
if (waitResult == WAIT_OBJECT_0)
{
// TODO: Write the file
WriteFile(...);
ReleaseMutex(mutex);
}
As the other answer noted, you will need to open the file with sharing, so that both of your applications can open it at once. However, that by itself may not be enough: If both of your applications are trying to write to the same area of the file, then you'll still need to make sure that only one application writes at a time. Imagine if both applications look at the size of the file, then both try to write to that byte offset at the same time: Even though both tried to just append to the end of the file, they ended up clobbering each other.
Option 2: Open as append only
If you're purely writing to the end of the file, and not ever attempting to read anything or to write anywhere other than the very end of the file, then there is a special mode you can use that will let you not use a mutex. If you open the file with dwDesiredAccess set to FILE_APPEND_DATA | SYNCHRONIZE and nothing else (don't include FILE_WRITE_DATA), then the OS will take care of making sure that all the data that gets written to the file at the end, and the two applications writing data do not overwrite each other. This behavior is documented on MSDN:
If only the FILE_APPEND_DATA and SYNCHRONIZE flags are set, the caller can write only to the end of the file, and any offset information about writes to the file is ignored. However, the file will automatically be extended as necessary for this type of write operation.
Option 3: LockFile
One other path you can take is to use the LockFile method. With LockFile (or LockFileEx), you can have both applications open the file, and have each app lock the section of the file that it wants to write to. This gives you more granularity than the mutex, allowing non-overlapping writes to happen at the same time. (Using LockFile on the entire file will give you the same basic effect as the mutex, with the added benefit that it will prevent other applications from writing the file while you're doing so.) There's a good example of how to use LockFile on Raymond Chen's blog.
Actually you don't need to use a separate mutex at all, you can just use the file itself. When a file is opened with the CreateFile API call (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396), the call takes a parameter called dwShareMode which specifiew what concurrent access is allowed by other processes. A value of 0 would prevent other processes from opening the file completely.
Pretty much all APIs to open a file map to CreateFile under the hood, so clr might be doing the right thing for you when you open a file for writing already.
In the C runtime there is also _fsopen which allows you to open a a file with the sharing flags.
I'd recommend you to test what the default sharing mode is when you open your file from C#. If it does not prevent simultaneous open for writing by default, use _fsopen from C (or maybe there is an appropriate C# function).

proper way to use lock file(s) as locks between multiple processes

I have a situation where 2 different processes(mine C++, other done by other people in JAVA) are a writer and a reader from some shared data file. So I was trying to avoid race condition by writing a class like this(EDIT:this code is broken, it was just an example)
class ReadStatus
{
bool canRead;
public:
ReadStatus()
{
if (filesystem::exists(noReadFileName))
{
canRead = false;
return;
}
ofstream noWriteFile;
noWriteFile.open (noWriteFileName.c_str());
if ( ! noWriteFile.is_open())
{
canRead = false;
return;
}
boost::this_thread::sleep(boost::posix_time::seconds(1));
if (filesystem::exists(noReadFileName))
{
filesystem::remove(noWriteFileName);
canRead= false;
return;
}
canRead= true;
}
~ReadStatus()
{
if (filesystem::exists(noWriteFileName))
filesystem::remove(noWriteFileName);
}
inline bool OKToRead()
{
return canRead;
}
};
usage:
ReadStatus readStatus; //RAII FTW
if ( ! readStatus.OKToRead())
return;
This is for one program ofc, other will have analogous class.
Idea is:
1. check if other program created his "I'm owner file", if it has break else go to 2.
2. create my "I'm the owner" file, check again if other program created his own, if it has delete my file and break else go to 3.
3. do my reading, then delete mine "I'm the owner file".
Please note that rare occurences when they both dont read or write are OK, but the problem is that I still see a small chance of race conditions because theoretically other program can check for the existence of my lock file, see that there isnt one, then I create mine, other program creates his own, but before FS creates his file I check again, and it isnt there, then disaster occurs. This is why I added the one sec delay, but as a CS nerd I find it unnerving to have code like that running.
Ofc I don't expect anybody here to write me a solution, but I would be happy if someone does know a link to a reliable code that I can use.
P.S. It has to be files, cuz I'm not writing entire project and that is how it is arranged to be done.
P.P.S.: access to data file isn't reader,writer,reader,writer.... it can be reader,reader,writer,writer,writer,reader,writer....
P.P.S: other process is not written in C++ :(, so boost is out of the question.
On Unices the traditional way of doing pure filesystem based locking is to use dedicated lockfiles with mkdir() and rmdir(), which can be created and removed atomically via single system calls. You avoid races by never explicitly testing for the existence of the lock --- instead you always try to take the lock. So:
lock:
while mkdir(lockfile) fails
sleep
unlock:
rmdir(lockfile)
I believe this even works over NFS (which usually sucks for this sort of thing).
However, you probably also want to look into proper file locking, which is loads better; I use F_SETLK/F_UNLCK fcntl locks for this on Linux (note that these are different from flock locks, despite the name of the structure). This allows you to properly block until the lock is released. These locks also get automatically released if the app dies, which is usually a good thing. Plus, these will let you lock your shared file directly without having to have a separate lockfile. This, too, work on NFS.
Windows has very similar file locking functions, and it also has easy to use global named semaphores that are very convenient for synchronisation between processes.
As far as I've seen it, you can't reliably use files as locks for multiple processes. The problem is, while you create the file in one thread, you might get an interrupt and the OS switches to another process because I/O is taking so long. The same holds true for deletion of the lock file.
If you can, take a look at Boost.Interprocess, under the synchronization mechanisms part.
While I'm generally against making API calls which can throw from a constructor/destructor (see docs on boost::filesystem::remove) or making throwing calls without a catch block in general that's not really what you were asking about.
You could check out the Overlapped IO library if this is for windows. Otherwise have you considered using shared memory between the processes instead?
Edit: Just saw the other process was Java. You may still be able to create a named mutex that can be shared between processes and used that to create locks around the file IO bits so they have to take turns writing. Sorry I don't know Java so no I idea if that's more feasible than shared memory.

Mutex lock on write only

I have a multithreaded C++ application which holds a complex data structure in memory (cached data).
Everything is great while I just read the data. I can have as many threads as I want access the data.
However the cached structure is not static.
If the requested data item is not available it will be read from database and is then inserted into the data tree. This is probably also not problematic and even if I use a mutex while I add the new data item to the tree that will only take few cycles (it's just adding a pointer).
There is a Garbage Collection process that's executed every now and then. It removes all old items from the tree. To do so I need to lock the whole thing down to make sure that no other process is currently accessing any data that's going to be removed from memory. I also have to lock the tree while I read from the cache so that I don't remove items while they are processed (kind of "the same thing the other way around").
"Pseudocode":
function getItem(key)
lockMutex()
foundItem = walkTreeToFindItem(key)
copyItem(foundItem, safeCopy)
unlockMutex()
return safeCopy
end function
function garbageCollection()
while item = nextItemInTree
if (tooOld) then
lockMutex()
deleteItem(item)
unlockMutex()
end if
end while
end function
What's bothering me: This means, that I have to lock the tree while I'm reading (to avoid the garbage collection to start while I read). However - as a side-effect - I also can't have two reading processes at the same time anymore.
Any suggestions?
Is there some kind of "this is a readonly action that only collides with writes" Mutex?
Look into read-write-lock.
You didn't specify which framework can you use but both pThread and boost have implemented that pattern.
The concept is a "shared reader, single writer" lock as others have stated. In Linux environments you should be able to use pthread_rwlock_t without any framework. I would suggest looking into boost::shared_lock as well.
I suggest a reader-writer lock. The idea is you can acquire a lock for "reading" or for "writing", and the lock will allow multiple readers, but only one writer. Very handy.
In C++17 this type of access (multiple read, single write) is supported directly with std::shared_mutex. I think this was adopted from boost::shared_lock. The topic is also described with an example in Anthony Williams's C++ Concurrency in Action.
https://livebook.manning.com/book/c-plus-plus-concurrency-in-action-second-edition/chapter-3/185.
The essential bits are:
use std::shared_lock on reads where shared access is allowed
existing shared_locks don't prevent a new shared_lock from locking
use std::unique_lock on update/write functions, where exclusive access is needed
will wait until all existing shared reads complete, as well as other writes
precludes any other reads or writes while locked