We have two threads, one for writing and one for removing, and one shared directory. One thread creates a new file and writes a message into it; the other scans the directory with a delay, reads the message from the new file, and removes it.
As I understand it, this should be synchronized: if one thread has created the file but has not written to it yet, and the other thread picks up this empty file and removes it, we have a problem.
But how long can creating or writing take if the file is, for example, 4 KB?
The main question: can we avoid synchronization, or is synchronization required in this case?
if one created the file but has not written yet and another thread takes this empty file and will remove then we will have a problem
The solution to this problem is for the file to never be incomplete or invalid.
File creation followed by file population is not an atomic sequence. To make it atomic for readers, the writer should:
1. Create a temporary file in the destination directory, e.g. <filename>~. Temporary files named this way (ending with ~) mustn't be read by any other process. When there are multiple competing writers, add a combination of (pid or tid) and (tsc or nanoseconds since epoch) to the temporary filename, e.g. <filename>.<pid>.<nsec>~. On Linux you can alternatively use mkstemp to create a unique file for you.
2. Populate <filename>~.
3. Rename <filename>~ to <filename>. On Linux renameat2 is an atomic filesystem operation; with the option RENAME_NOREPLACE it fails when <filename> already exists, so that multiple writers can detect this condition and act accordingly. std::rename, unfortunately, doesn't require any particular behaviour when <filename> already exists.
This way when <filename> exists it is complete and valid.
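The create, populate, rename sequence above can be sketched in portable C++17. This is only an illustration under stated assumptions: the names publish_atomically and is_published are hypothetical, a single writer is assumed (so the plain <filename>~ temporary name is enough), and std::filesystem::rename is used in place of renameat2, which means an existing destination is replaced rather than reported as an error.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

// Writes `contents` to dir/name so that a reader never sees a partial file.
// Assumes a single writer; with several writers the temporary name must be unique.
void publish_atomically(const fs::path& dir, const std::string& name,
                        const std::string& contents) {
    const fs::path tmp = dir / (name + "~");   // temporary name, skipped by readers
    const fs::path dst = dir / name;
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << contents;
        out.flush();
        if (!out) throw std::runtime_error("write failed: " + tmp.string());
    }                                          // stream is closed before the rename
    fs::rename(tmp, dst);                      // on POSIX this replaces dst atomically
}

// Reader side: true for names that are safe to consume (not ending in '~').
bool is_published(const fs::path& p) {
    const std::string n = p.filename().string();
    return !n.empty() && n.back() != '~';
}
```

The scanning thread would call is_published on each directory entry and ignore everything else, so it needs no lock shared with the writer.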
I have multiple threads, and I want each of them to process a part of my file. Can I have a single ifstream object for that and have the threads concurrently read different parts? The parts are non-overlapping, so the same line will not be processed by two threads. If yes, how do I get multiple cursors?
A single std::ifstream is associated with exactly one cursor (there's a seekg and tellg method associated with the std::ifstream directly).
If you want the same std::ifstream object to be shared across multiple threads, you'll have to have some sort of synchronization mechanism between the threads, which might defeat the purpose (in each thread, you'll have to lock, seek, read and unlock each time).
To solve your problem, you can open one std::ifstream to the same file per thread. In each thread, you'd seek to whatever position you want to start reading from. This would only require you to be able to "easily" compute the seek position for each thread though (Note: this is a pretty strong requirement).
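A minimal sketch of the one-stream-per-thread approach, assuming the file can be split by byte offset. The function names read_range and read_in_parts are hypothetical, and the split ignores line boundaries, which is exactly the "easily compute the seek position" requirement mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

// Reads up to `length` bytes starting at `offset` through a stream owned by the caller's thread.
std::string read_range(const std::string& path, std::streamoff offset, std::size_t length) {
    std::ifstream in(path, std::ios::binary);            // own stream, own cursor
    in.seekg(offset);
    std::string buf(length, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(length));
    buf.resize(static_cast<std::size_t>(in.gcount()));   // shorter near the end of the file
    return buf;
}

// Splits [0, file_size) into `parts` byte ranges and reads them in parallel.
std::vector<std::string> read_in_parts(const std::string& path, std::size_t file_size,
                                       std::size_t parts) {
    assert(parts > 0);
    std::vector<std::string> out(parts);
    std::vector<std::thread> workers;
    const std::size_t chunk = (file_size + parts - 1) / parts;   // ceiling division
    for (std::size_t i = 0; i < parts; ++i) {
        workers.emplace_back([&, i] {
            // Each thread writes only its own slot, so no locking is needed.
            out[i] = read_range(path, static_cast<std::streamoff>(i * chunk), chunk);
        });
    }
    for (auto& t : workers) t.join();
    return out;
}
```

No synchronization is needed here because no stream object is shared; the only shared state is the result vector, and each thread touches a distinct element.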
C++ file streams are not guaranteed to be thread safe (see e.g. this answer).
The typical solution is to open separate streams on the same file anyway; each instance comes with its own "cursor". However, you need to ensure shared access, and concurrency becomes platform specific.
For ifstream (i.e. only reading from the file), the concurrency issues are usually tame. Even if someone else modifies the file, both streams might see different content, but you do have some kind of eventual consistency.
Reads and writes are usually not atomic, i.e. you might read only part of a write. Writes might not even execute in the order they are issued (see write combining).
Looking at FILE struct it seems like there is a pointer inside FILE, char* curp, pointing to the current active pointer, which may mean that for each FILE object, you'd have one particular part of the file.
This being C, I don't know how ifstream works or whether it uses a FILE object or is built like one. It might not help you at all, but I thought it would be interesting to share this bit of information, and it may help someone.
I need to write log data into a single file from different processes.
I am using Windows Mutex which needs Common Language Runtime support for it.
Mutex^ m = gcnew Mutex( false,"MyMutex" );
m->WaitOne();
//... File Open and Write ..
m->ReleaseMutex();
Do I really need to change from C++ to C++/CLI for synchronization?
It is ok if the atomic is not used. But I need to know whether using this Mutex will slow down the performance compared to local mutex.
Adding CLR support to your C++ application just to get the Mutex class is overkill. There are several options available to you to synchronize your file access between two applications.
Option 1: Mutex
If you need to write a file from multiple processes, using a mutex is a good way to do it. Use the mutex functions in the Win32 API. (The .Net Mutex class is just a wrapper around those functions anyway.)
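// A named mutex is visible to every process that opens it under the same name.
HANDLE mutex = CreateMutexA(NULL, FALSE, "MyMutex");
if (mutex != NULL)
{
    DWORD waitResult = WaitForSingleObject(mutex, INFINITE);
    // WAIT_ABANDONED: the previous owner exited without releasing; we own the mutex now.
    if (waitResult == WAIT_OBJECT_0 || waitResult == WAIT_ABANDONED)
    {
        // TODO: Write the file
        WriteFile(...);
        ReleaseMutex(mutex);
    }
    CloseHandle(mutex); // once the process no longer needs the mutex
}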
As the other answer noted, you will need to open the file with sharing, so that both of your applications can open it at once. However, that by itself may not be enough: If both of your applications are trying to write to the same area of the file, then you'll still need to make sure that only one application writes at a time. Imagine if both applications look at the size of the file, then both try to write to that byte offset at the same time: Even though both tried to just append to the end of the file, they ended up clobbering each other.
Option 2: Open as append only
If you're purely writing to the end of the file, and never attempting to read anything or to write anywhere other than the very end of the file, then there is a special mode you can use that lets you avoid the mutex. If you open the file with dwDesiredAccess set to FILE_APPEND_DATA | SYNCHRONIZE and nothing else (don't include FILE_WRITE_DATA), then the OS will make sure that all the data gets written at the end of the file, and that the two applications writing data do not overwrite each other. This behavior is documented on MSDN:
If only the FILE_APPEND_DATA and SYNCHRONIZE flags are set, the caller can write only to the end of the file, and any offset information about writes to the file is ignored. However, the file will automatically be extended as necessary for this type of write operation.
Option 3: LockFile
One other path you can take is to use the LockFile method. With LockFile (or LockFileEx), you can have both applications open the file, and have each app lock the section of the file that it wants to write to. This gives you more granularity than the mutex, allowing non-overlapping writes to happen at the same time. (Using LockFile on the entire file will give you the same basic effect as the mutex, with the added benefit that it will prevent other applications from writing the file while you're doing so.) There's a good example of how to use LockFile on Raymond Chen's blog.
Actually you don't need to use a separate mutex at all, you can just use the file itself. When a file is opened with the CreateFile API call (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396), the call takes a parameter called dwShareMode which specifies what concurrent access is allowed by other processes. A value of 0 would prevent other processes from opening the file completely.
Pretty much all APIs to open a file map to CreateFile under the hood, so the CLR might already be doing the right thing for you when you open a file for writing.
In the C runtime there is also _fsopen, which allows you to open a file with the sharing flags.
I'd recommend you to test what the default sharing mode is when you open your file from C#. If it does not prevent simultaneous open for writing by default, use _fsopen from C (or maybe there is an appropriate C# function).
I have one file that is updated every second: I append a line to the end of it, and another thread reads it each time. So I have two pointers to this file for this work. Is that possible?
(I use two while(1) for updating and reading in two function)
thanks.
Here's a good example for reading a single file with multiple threads : Mutlitple thread reading a single file
You could start from here.
As @MatsPetersson said, you have to be really sure of what you're doing in each thread. If you don't want to read incomplete data, you will need to make sure the other thread is not writing to the file. There are several ways of doing this: you can use, for example, a mutex, a signal, or a bool in a shared memory segment.
I think that in your case, even if it's not stated explicitly, you need to read only when no other thread is writing. For this I recommend using a mutex. Here's the doc : Mutex function documentation .
So we have readThread and writeThread. Here is pseudocode for how you could handle your problem:
main(){
putTheMutexTo(1);
}
readThread(){
consumeMutex(1);
openTheFile();
readTheFile();
closeTheFile();
loadMutex(1);
}
writeThread(){
consumeMutex(1);
openTheFile();
writeTheFile();
closeTheFile();
loadMutex(1);
}
But if you don't really know how a mutex works, don't start coding right away; go read some documentation on the Internet first, because this is a bit complex to understand when you start.
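For two threads in the same process, the pseudocode above might translate to standard C++ roughly as follows. This is a sketch, not the answerer's code: std::mutex stands in for the consumeMutex/loadMutex pair, and the names file_mutex, append_line and read_lines are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

std::mutex file_mutex;   // shared by both threads; guards every access to the file

// Writer side: append one line while holding the lock.
void append_line(const std::string& path, const std::string& line) {
    std::lock_guard<std::mutex> lock(file_mutex);
    std::ofstream out(path, std::ios::app);
    out << line << '\n';
}   // the stream is closed first, then the lock is released

// Reader side: read every line while holding the lock, so no partial line is seen.
std::vector<std::string> read_lines(const std::string& path) {
    std::lock_guard<std::mutex> lock(file_mutex);
    std::ifstream in(path);
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);) lines.push_back(line);
    return lines;
}
```

The writer thread would call append_line in its loop and the reader thread would call read_lines in its own. A std::mutex only protects threads of one process; separate processes need an OS-level primitive.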
How can I lock a file for read and write operations? That is, if the file named "ABC" is write-locked, the lock should also act as a read lock on that same file. In the normal case we want to wait until the write operation completes. Is there any way to acquire this kind of lock?
Many programs simply use a lock file to signify that a certain file is currently in use for writing.
The lock file is later removed when done writing.
For example, when process #1 is about to start writing to file example, it creates file example.lock. Later when done writing, it simply removes example.lock.
When process #2 wants to read from file example, it first checks if file example.lock exists. If it does, then the file is locked for write operations and process #2 will have to wait.
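A small sketch of the lock-file idea, under a few assumptions: the helper names try_lock_file and unlock_file are hypothetical, and the C library is assumed to implement the C11 "x" (exclusive create) mode of fopen. Creating the lock file exclusively, rather than checking for it and then creating it, avoids the window in which two processes both see no lock file.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Tries to take the lock for `target` by creating "<target>.lock" exclusively.
// Returns false if the lock file already exists or cannot be created.
bool try_lock_file(const std::string& target) {
    const std::string lock_path = target + ".lock";
    std::FILE* f = std::fopen(lock_path.c_str(), "wx");   // 'x': fail if the file exists
    if (f == nullptr) return false;
    std::fclose(f);
    return true;
}

// Releases the lock by removing the lock file.
void unlock_file(const std::string& target) {
    std::remove((target + ".lock").c_str());
}
```

A process that wants the file would call try_lock_file in a retry loop with a short sleep, and call unlock_file when done. A lock file left behind by a crashed process stays in place until someone removes it.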
shared_mutex from Boost implements read/write locking.
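As an illustration of read/write locking, here is a sketch that uses std::shared_mutex from C++17 instead of the Boost class; the two are used in much the same way. The names abc_mutex, read_file and write_file are hypothetical, and the lock only coordinates threads inside one process.

```cpp
#include <cassert>
#include <fstream>
#include <mutex>
#include <shared_mutex>
#include <sstream>
#include <string>

std::shared_mutex abc_mutex;   // one mutex per protected file

// Any number of readers may hold the shared lock at the same time.
std::string read_file(const std::string& path) {
    std::shared_lock<std::shared_mutex> lock(abc_mutex);
    std::ifstream in(path, std::ios::binary);
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// A writer holds the lock exclusively; readers wait until the write completes.
void write_file(const std::string& path, const std::string& data) {
    std::unique_lock<std::shared_mutex> lock(abc_mutex);
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << data;
}
```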
I'm working with two independent c/c++ applications on Windows where one of them constantly updates an image on disk (from a webcam) and the other reads that image for processing. This works fine and dandy 99.99% of the time, but every once in a while the reader app is in the middle of reading the image when the writer deletes it to refresh it with a new one.
The obvious solution to me seems to be to have the reader put some sort of lock on the file so that the writer can see that it can't delete it, and thus spin-lock on it until it can delete and update. Is there any way to do this? Or is there another simple design pattern I can use to get the same sort of constant image refreshing between two programs?
Thanks,
-Robert
Try using a synchronization object, probably a mutex will do. Whenever a process wants to read or write to a file it should first acquire the mutex lock.
Yes, a locking mechanism would help. There are, unfortunately, several to choose from. Linux/Unix e.g. has flock (2), Windows has a similar (but different) mechanism.
Another (somewhat hacky) solution is to just write the file under a temporary name, then rename it. Many filesystems guarantee that a rename is atomic, so this may work. This however depends on the fs, so it's a bit hacky.
If you are willing to go with the Windows API, opening the file with CreateFile and passing in 0 for the dwShareMode will not allow any other application to open the file.
From the documentation:
Prevents other processes from opening a file or device if they
request delete, read, or write access.
Then you'd have to use ReadFile, WriteFile, CloseHandle, etc. rather than the C standard library functions.
Or, as a really simple kludge, the reader creates a temp file (say, .lock) before it starts reading and deletes it afterwards. The writer doesn't manipulate the file as long as .lock exists.
That's how Open Office does it (and others) and it's probably the simplest to implement, no matter which platform.
Joe, many solutions have been proposed; I commented on some of them but I'd like to chime in with an overall view and some specifics and recommendations:
You have the following options:
1. Use filesystem locking: under Windows have both the reader and writer open (and create with the CREATE_ALWAYS disposition, respectively) the shared file in OF_SHARE_EXCLUSIVE mode; have both the reader and writer ready to handle ERROR_SHARING_VIOLATION and retry after some predefined period of time (e.g. 250ms).
2. Use file renaming to essentially transfer file ownership: have the writer create a writer-private file (e.g. shared_file.tmpwrite), write to it, close it, then make it publicly available to the reader by renaming it to an agreed-upon "public" name (e.g. simply shared-file); have the reader periodically test for the existence of a file with the agreed-upon "public" name (e.g. shared-file) and, when one is found, attempt to first rename it to a reader-private name (e.g. shared_file.tmpread) before having the reader open it (under the reader-private name); under Windows use MOVEFILE_REPLACE_EXISTING; the rename operation does not have to be atomic for this to work.
3. Use other forms of interprocess communication (IPC): under Windows you can create a named mutex, and have both the reader and writer attempt to create (the existing mutex will be returned if it already exists) then acquire the named mutex before opening the shared file for reading or writing.
4. Implement your own filesystem-backed locking: take advantage of open(O_CREAT|O_EXCL) or, under Windows, of the CREATE_NEW disposition to atomically create an application lock file; unlike the OF_SHARE_EXCLUSIVE approach above, it would be up to you to deal with stale lock files (i.e. lock files left by a process which did not shut down gracefully, such as after a crash).
I would implement method 1.
Method 2 would also work, but it is in a sense reinventing the wheel.
Method 3 arguably has the advantage of allowing your reader process to wait on the writer process and vice-versa, eliminating the need for the arbitrary sleep delays between the retries of methods 1 and 2 (polling); however, if you are OK with polling then you should still use method 1
Method 4 is listed for completeness only, as it is complex to implement: when the lock file is detected to be stale (e.g. by checking whether the PID contained therein still exists), multiple processes can potentially be competing for its removal, which introduces a race condition requiring a second lock, which in turn can become stale, and so on. For example:
process A creates the lock file but dies without removing the lock file
process A restarts and tries to acquire the lock file but realizes it is stale
process B comes out of a sleep delay and also tries to acquire the lock file but realizes it is stale
process A removes the lock file, which it knew to be stale, and recreates it essentially reacquiring the lock
process B removes the lock file, which it (still) thinks is stale (although at this point it is no longer stale and owned by process A) -- violation
Instead of deleting images, what about appending them to the end of the file? This would allow you to keep adding to the file while the reader is still operating, without destroying the file. The reader can then delete the image when it's done with it (provided that is necessary) and move on to the next image. The other option would be to store the image in a buffer for writing and test the file pointer. If it's set to the head of the file, then you can go ahead and write from the buffer to the file. Otherwise, wait until the reader finishes and puts the pointer back at the head of the file.
couldn't you store a few images? ('n' sounds like a good number :-)
Not too many to fill your disk, but surely 3 would be enough? if not, you are writing faster than you can process and have a fundamental problem anyhoo (tune to discover 'n').
Cyclically overwrite.
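One way to read "cyclically overwrite" is a fixed set of n files reused in turn. The sketch below only computes which slot a given frame goes to; the name slot_name and the "<base>.<index>" naming scheme are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Name of the file that holds frame number `seq` when `n` slots are reused in turn.
std::string slot_name(const std::string& base, std::size_t seq, std::size_t n) {
    assert(n > 0);
    return base + "." + std::to_string(seq % n);
}
```

The writer would store frame k in slot_name("image", k, n), so a slot is only overwritten n frames later. This helps only as long as the reader stays fewer than n frames behind the writer.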