proper way to use lock file(s) as locks between multiple processes - c++

I have a situation where two different processes (mine in C++, the other written by other people in Java) act as a writer and a reader of a shared data file. So I was trying to avoid the race condition by writing a class like this (EDIT: this code is broken, it was just an example):
class ReadStatus
{
    bool canRead;
public:
    ReadStatus()
    {
        if (filesystem::exists(noReadFileName))
        {
            canRead = false;
            return;
        }
        ofstream noWriteFile;
        noWriteFile.open(noWriteFileName.c_str());
        if ( ! noWriteFile.is_open())
        {
            canRead = false;
            return;
        }
        boost::this_thread::sleep(boost::posix_time::seconds(1));
        if (filesystem::exists(noReadFileName))
        {
            filesystem::remove(noWriteFileName);
            canRead = false;
            return;
        }
        canRead = true;
    }
    ~ReadStatus()
    {
        if (filesystem::exists(noWriteFileName))
            filesystem::remove(noWriteFileName);
    }
    inline bool OKToRead()
    {
        return canRead;
    }
};
usage:
ReadStatus readStatus; // RAII FTW
if ( ! readStatus.OKToRead())
    return;
This is for one program, of course; the other will have an analogous class.
The idea is:
1. Check if the other program has created its "I'm the owner" file; if it has, break, else go to 2.
2. Create my "I'm the owner" file, then check again whether the other program has created its own; if it has, delete my file and break, else go to 3.
3. Do my reading, then delete my "I'm the owner" file.
Please note that the rare occurrences when neither of them reads or writes are OK, but the problem is that I still see a small chance of a race condition: theoretically, the other program can check for the existence of my lock file, see that there isn't one, then I create mine, the other program creates its own, but before the filesystem creates its file I check again, and it isn't there; then disaster occurs. This is why I added the one-second delay, but as a CS nerd I find it unnerving to have code like that running.
Of course I don't expect anybody here to write me a solution, but I would be happy if someone knows a link to reliable code that I can use.
P.S. It has to be files, because I'm not writing the entire project and that is how it is arranged to be done.
P.P.S. Access to the data file isn't reader, writer, reader, writer, ...; it can be reader, reader, writer, writer, writer, reader, writer, ...
P.P.P.S. The other process is not written in C++ :(, so Boost is out of the question.

On Unices the traditional way of doing pure filesystem-based locking is to use dedicated lockfiles with mkdir() and rmdir(), which can be created and removed atomically via single system calls. You avoid races by never explicitly testing for the existence of the lock; instead you always try to take the lock. So:
lock:
    while mkdir(lockfile) fails
        sleep
unlock:
    rmdir(lockfile)
I believe this even works over NFS (which usually sucks for this sort of thing).
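In C++ that might look roughly like this (a minimal sketch, assuming POSIX headers; the lockfile path and retry interval are arbitrary):

#include <sys/stat.h>   // mkdir()
#include <unistd.h>     // rmdir(), sleep()
#include <cerrno>

// Take the lock: keep trying to create the directory until we succeed.
// mkdir() is atomic, so exactly one process can win each round.
bool lock(const char* lockdir)
{
    while (mkdir(lockdir, 0700) != 0) {
        if (errno != EEXIST)
            return false;   // real error (permissions, bad path, ...)
        sleep(1);           // someone else holds the lock; retry
    }
    return true;
}

void unlock(const char* lockdir)
{
    rmdir(lockdir);         // atomically releases the lock
}

Since the Java side only needs mkdir/rmdir semantics (java.io.File.mkdir() returns false if the directory already exists), the same protocol works from both processes.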
However, you probably also want to look into proper file locking, which is loads better; I use fcntl locks (F_SETLK/F_SETLKW) for this on Linux (note that these are different from flock() locks, despite the name of the struct flock structure). This allows you to properly block until the lock is released. These locks also get automatically released if the app dies, which is usually a good thing. Plus, they let you lock your shared file directly without needing a separate lockfile. This, too, works on NFS.
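For example, taking a whole-file write lock with fcntl (a sketch for Linux/POSIX; most error handling omitted):

#include <fcntl.h>
#include <unistd.h>
#include <cstring>

void with_file_locked(const char* path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0) return;

    struct flock fl;
    std::memset(&fl, 0, sizeof fl);
    fl.l_type   = F_WRLCK;    // exclusive; use F_RDLCK for a shared read lock
    fl.l_whence = SEEK_SET;
    fl.l_start  = 0;
    fl.l_len    = 0;          // 0 = lock the whole file

    fcntl(fd, F_SETLKW, &fl); // F_SETLKW blocks until the lock is granted

    // ... read or write the shared file here ...

    fl.l_type = F_UNLCK;      // release the lock
    fcntl(fd, F_SETLK, &fl);
    close(fd);                // closing the fd also drops any locks held on it
}

Conveniently for your situation, Java's FileChannel.lock() generally maps onto the same advisory locks on Unix, so the Java process can participate in this scheme.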
Windows has very similar file-locking functions, and it also has easy-to-use global named semaphores that are very convenient for synchronisation between processes.

As far as I've seen, you can't reliably use files as locks between multiple processes. The problem is that while you create the file in one thread, you might get interrupted and the OS may switch to another process because the I/O is taking so long. The same holds true for deletion of the lock file.
If you can, take a look at Boost.Interprocess, under the synchronization mechanisms part.
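For example, Boost.Interprocess has a file_lock that wraps the native file-locking primitives (a sketch; the lock file name is arbitrary, and the file must already exist):

#include <boost/interprocess/sync/file_lock.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <fstream>

namespace bip = boost::interprocess;

void access_shared_data()
{
    // file_lock requires the file to exist, so "touch" it first.
    { std::ofstream touch("shared.lock", std::ios::app); }

    bip::file_lock flock("shared.lock");
    bip::scoped_lock<bip::file_lock> guard(flock); // blocks until acquired

    // ... access the shared data file here ...
}   // the lock is released automatically when guard goes out of scope

Since file_lock is implemented in terms of the native file locks, a non-C++ process can still interoperate by locking the same file through its own platform facilities.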

While I'm generally against making API calls that can throw from a constructor or destructor (see the docs on boost::filesystem::remove), or making throwing calls without a catch block in general, that's not really what you were asking about.
You could check out the Overlapped I/O library if this is for Windows. Otherwise, have you considered using shared memory between the processes instead?
Edit: Just saw that the other process is Java. You may still be able to create a named mutex that can be shared between processes and use that to create locks around the file I/O bits so they have to take turns writing. Sorry, I don't know Java, so no idea if that's more feasible than shared memory.

Related

Shared file logging between threads in C++ 11 [duplicate]

This question already has answers here:
Is cout synchronized/thread-safe?
Recently I started learning C++11. I only studied C/C++ for a brief period when I was in college. I come from another ecosystem (web development), so as you can imagine I'm relatively new to C++.
At the moment I'm studying threads and how I could accomplish logging from multiple threads with a single writer (file handle). So I wrote the following code based on tutorials and various articles.
My first question and request would be to point out any bad practices/mistakes that I have overlooked (although the code works with VC 2015).
Secondly, and this is my main concern: I'm not closing the file handle, and I'm not sure if that causes any issues. If it does, when and how would be the most appropriate way to close it?
Lastly, and correct me if I'm wrong, I don't want to "pause" a thread while another thread is writing. I'm writing line by line each time. Is there any case where the output gets messed up at some point?
Thank you very much for your time; below is the source (currently, for learning purposes, everything is inside main.cpp).
#include <iostream>
#include <fstream>
#include <memory>
#include <string>
#include <thread>

static const int THREADS_NUM = 8;

class Logger
{
public:
    Logger(const std::string &path) : filePath(path)
    {
        this->logFile.open(this->filePath);
    }

    void write(const std::string &data)
    {
        this->logFile << data;
    }

private:
    std::ofstream logFile;
    std::string filePath;
};

void spawnThread(int tid, const std::shared_ptr<Logger> &logger)
{
    std::cout << "Thread " + std::to_string(tid) + " started" << std::endl;
    logger->write("Thread " + std::to_string(tid) + " was here!\n");
}

int main()
{
    std::cout << "Master started" << std::endl;
    std::thread threadPool[THREADS_NUM];
    auto logger = std::make_shared<Logger>("test.log");
    for (int i = 0; i < THREADS_NUM; ++i)
    {
        threadPool[i] = std::thread(spawnThread, i, logger);
        threadPool[i].join();
    }
    return 0;
}
PS1: In this scenario there will always be only 1 file handle open for threads to log data.
PS2: The file handle ideally should close right before the program exits... Should it be done in the Logger destructor?
UPDATE
The current output with 1000 threads is the following:
Thread 0 was here!
Thread 1 was here!
Thread 2 was here!
Thread 3 was here!
.
.
.
.
Thread 995 was here!
Thread 996 was here!
Thread 997 was here!
Thread 998 was here!
Thread 999 was here!
I don't see any garbage so far...
My first question and request would be to point out any bad practices/mistakes that I have overlooked (although the code works with VC 2015).
Subjective, but the code looks fine to me, although you are not synchronizing the threads (some std::mutex in the logger would do the trick).
Also note that this:
std::thread threadPool[THREADS_NUM];
auto logger = std::make_shared<Logger>("test.log");
for (int i = 0; i < THREADS_NUM; ++i)
{
    threadPool[i] = std::thread(spawnThread, i, logger);
    threadPool[i].join();
}
is pointless: you create a thread, join it, and then create a new one. I think this is what you are looking for:
std::vector<std::thread> threadPool;
auto logger = std::make_shared<Logger>("test.log");

// create all threads
for (int i = 0; i < THREADS_NUM; ++i)
    threadPool.emplace_back(spawnThread, i, logger);

// after all are created, join them
for (auto& th : threadPool)
    th.join();
Now you create all threads and then wait for all of them. Not one by one.
Secondly, and this is my main concern: I'm not closing the file handle, and I'm not sure if that causes any issues. If it does, when and how would be the most appropriate way to close it?
And when do you want to close it? After each write? That would be redundant OS work with no real benefit. The file is supposed to be open for the program's entire lifetime, so there is no reason to close it manually at all. On a graceful exit, the std::ofstream destructor runs and closes the file. On a non-graceful exit, the OS will close all remaining handles anyway.
Flushing the file's buffer (possibly after each write?) would be helpful, though.
Lastly, and correct me if I'm wrong, I don't want to "pause" a thread while another thread is writing. I'm writing line by line each time. Is there any case where the output gets messed up at some point?
Yes, of course. You are not synchronizing writes to the file, so the output might be garbage. You can easily check this yourself: spawn 10000 threads and run the code. It's very likely you will get a corrupted file.
There are many different synchronization mechanisms, but all of them are either lock-free or lock-based (or possibly a mix). Anyway, a simple std::mutex (basic lock-based synchronization) in the logger class should be fine.
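A minimal sketch of that, based on the Logger class from the question:

#include <fstream>
#include <mutex>
#include <string>

class Logger
{
public:
    explicit Logger(const std::string &path) { logFile.open(path); }

    void write(const std::string &data)
    {
        std::lock_guard<std::mutex> lock(mtx); // one writer at a time
        logFile << data;
    }

private:
    std::mutex    mtx;
    std::ofstream logFile;
};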
The first massive mistake is saying "it works with MSVC, I see no garbage", even more so as it only works because your test code is broken (well, it's not broken, but it's not concurrent, so of course it works fine).
But even if the code were concurrent, saying "I don't see anything wrong" is a terrible mistake. With multithreaded code, not seeing anything wrong proves nothing; it is incorrect unless proven correct.
The goal of not blocking ("pausing") one thread while another is writing is unachievable if you want correctness, at least if they concurrently write to the same descriptor. You must synchronize properly (call it whatever you like, and use any method you like), or the behavior will be incorrect. Or worse, it will look correct for as long as you look at it, and it will behave wrongly six months later when your most important customer uses it for a multi-million dollar project.
Under some operating systems, you can "cheat" and get away without synchronization, as these offer syscalls with atomicity guarantees (e.g. writev). That is, however, not what you may think it is: there is indeed heavyweight synchronization, you just don't see it.
A better (more efficient) strategy than using a mutex or atomic writes might be to have a single consumer thread which writes to disk, and to push log tasks onto a concurrent queue from however many producer threads you like. This gives minimum latency for the threads you don't want to block, and blocking only where you don't care. Plus, you can coalesce several small writes into one.
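A rough sketch of that idea, using a plain std::queue guarded by a mutex and a condition variable as a stand-in for a real concurrent queue (the class and member names are made up for illustration):

#include <condition_variable>
#include <fstream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

class AsyncLogger
{
public:
    explicit AsyncLogger(const std::string &path)
        : file(path), worker(&AsyncLogger::run, this) {}

    ~AsyncLogger()
    {
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_one();
        worker.join();                  // flush remaining entries, then stop
    }

    void log(std::string line)          // called from any producer thread
    {
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(line)); }
        cv.notify_one();
    }

private:
    void run()                          // the single consumer thread
    {
        std::unique_lock<std::mutex> lk(m);
        while (!done || !q.empty()) {
            cv.wait(lk, [this] { return done || !q.empty(); });
            while (!q.empty()) {        // coalesce: drain the whole queue
                std::string line = std::move(q.front());
                q.pop();
                lk.unlock();            // write without holding the lock
                file << line;
                lk.lock();
            }
        }
    }

    std::ofstream           file;
    std::mutex              m;
    std::condition_variable cv;
    std::queue<std::string> q;
    bool                    done = false;
    std::thread             worker;     // declared last: it uses the members above
};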
Closing or not closing a file seems like a non-issue. After all, when the program exits, files are closed anyway. Well, yes, except there are three layers of caching (four, actually, if you count the physical disk's caches), two of them within your application and one within the operating system.
When data has made it at least into the OS buffers, all is good unless power fails unexpectedly. Not so for the other two levels of cache!
If your process dies unexpectedly, its memory will be released, which includes anything cached within iostreams and anything cached within the CRT. So if you need any amount of reliability, you will either have to flush regularly (which is expensive) or use a different strategy. File mapping may be such a strategy, because whatever you copy into the mapping is automatically (by definition) within the operating system's buffers, and unless power fails or the computer explodes, it will be written to disk.
That being said, there exist dozens of free and readily available logging libraries (such as e.g. spdlog) which do the job very well. There's really not much of a reason to reinvent this particular wheel.
Hello and welcome to the community!
A few comments on the code, and a few general tips on top of that.
Don't use native arrays if you do not absolutely have to.
Eliminating the native std::thread[] array and replacing it with a std::array would allow you to do a range-based for loop, which is the preferred way of iterating over things in C++. A std::vector would also work since you have to generate the threads (which you can do with std::generate in combination with std::back_inserter).
Don't use smart pointers if you do not have specific memory management requirements; in this case a reference to a stack-allocated logger would be fine (the logger would probably live for the duration of the program anyway, hence no need for explicit memory management). In C++ you try to use the stack as much as possible; dynamic memory allocation is slow in many ways, and shared pointers introduce overhead (unique pointers are zero-cost abstractions).
The join in the for loop is probably not what you want: it waits for the previously spawned thread and only then spawns another one. If you want parallelism, you need a second for loop for the joins, but the preferred way would be to use std::for_each(begin(pool), end(pool), [](auto& thread) { thread.join(); }) or something similar.
Use the C++ Core Guidelines and a recent C++ standard (C++17 is the current one); C++11 is old, and you probably want to learn the modern stuff instead of learning how to write legacy code. http://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines
C++ is not Java; use the stack as much as possible, as this is one of the biggest advantages of using C++. Make sure you understand by heart how the stack, constructors, and destructors work.
The first question is subjective, so someone else may want to give advice there, but I don't see anything awful.
Nothing in the C++ standard library is thread-safe except for some rare cases. A good answer on using ofstream in a multithreaded environment is given here.
Not closing a file is indeed an issue. You have to get familiar with RAII, as it is one of the first things to learn. The answer by Detonar is a good piece of advice.

Concurrent File write between processes

I need to write log data into a single file from different processes.
I am using a Windows Mutex, which needs Common Language Runtime support:
Mutex^ m = gcnew Mutex(false, "MyMutex");
m->WaitOne();
// ... File Open and Write ...
m->ReleaseMutex();
Do I really need to change from C++ to C++/CLI for synchronization?
It is OK if atomics are not used. But I need to know whether using this Mutex will slow down performance compared to a local mutex.
Adding CLR support to your C++ application just to get the Mutex class is overkill. There are several options available to you to synchronize your file access between two applications.
Option 1: Mutex
If you need to write a file from multiple processes, using a mutex is a good way to do it. Use the mutex functions in the Win32 API. (The .Net Mutex class is just a wrapper around those functions anyway.)
HANDLE mutex = CreateMutex(NULL, FALSE, TEXT("MyMutex"));
DWORD waitResult = WaitForSingleObject(mutex, INFINITE);
if (waitResult == WAIT_OBJECT_0)
{
    // TODO: Write the file
    WriteFile(...);
    ReleaseMutex(mutex);
}
As the other answer noted, you will need to open the file with sharing, so that both of your applications can open it at once. However, that by itself may not be enough: If both of your applications are trying to write to the same area of the file, then you'll still need to make sure that only one application writes at a time. Imagine if both applications look at the size of the file, then both try to write to that byte offset at the same time: Even though both tried to just append to the end of the file, they ended up clobbering each other.
Option 2: Open as append only
If you're purely writing to the end of the file, and never attempting to read anything or to write anywhere other than the very end of the file, then there is a special mode you can use that will let you avoid the mutex. If you open the file with dwDesiredAccess set to FILE_APPEND_DATA | SYNCHRONIZE and nothing else (don't include FILE_WRITE_DATA), then the OS will make sure that all data gets written at the end of the file, and the two applications writing data will not overwrite each other. This behavior is documented on MSDN:
If only the FILE_APPEND_DATA and SYNCHRONIZE flags are set, the caller can write only to the end of the file, and any offset information about writes to the file is ignored. However, the file will automatically be extended as necessary for this type of write operation.
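A sketch of such an open-and-append call (the file name is arbitrary; error handling mostly omitted):

#include <windows.h>

void append_line(const char* text, DWORD len)
{
    // FILE_APPEND_DATA | SYNCHRONIZE and nothing else: the OS serializes
    // the appends, so two processes can share the file without a mutex.
    HANDLE h = CreateFileA(
        "shared.log",                       // the shared log file
        FILE_APPEND_DATA | SYNCHRONIZE,     // note: no FILE_WRITE_DATA
        FILE_SHARE_READ | FILE_SHARE_WRITE, // let the other process open it too
        NULL,
        OPEN_ALWAYS,                        // create the file if missing
        FILE_ATTRIBUTE_NORMAL,
        NULL);
    if (h == INVALID_HANDLE_VALUE)
        return;

    DWORD written = 0;
    WriteFile(h, text, len, &written, NULL); // lands at the current end of file
    CloseHandle(h);
}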
Option 3: LockFile
One other path you can take is to use the LockFile method. With LockFile (or LockFileEx), you can have both applications open the file, and have each app lock the section of the file that it wants to write to. This gives you more granularity than the mutex, allowing non-overlapping writes to happen at the same time. (Using LockFile on the entire file will give you the same basic effect as the mutex, with the added benefit that it will prevent other applications from writing the file while you're doing so.) There's a good example of how to use LockFile on Raymond Chen's blog.
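For reference, locking the entire file around a write might look like this (a sketch; error handling omitted):

#include <windows.h>

void write_locked(HANDLE file, const char* data, DWORD len)
{
    OVERLAPPED ov = {};   // lock region starts at offset 0

    // Lock "the whole file" by locking a huge byte range; this call
    // blocks until the other process releases its lock.
    LockFileEx(file, LOCKFILE_EXCLUSIVE_LOCK, 0, MAXDWORD, MAXDWORD, &ov);

    DWORD written = 0;
    WriteFile(file, data, len, &written, NULL);

    UnlockFileEx(file, 0, MAXDWORD, MAXDWORD, &ov);
}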
Actually, you don't need to use a separate mutex at all; you can just use the file itself. When a file is opened with the CreateFile API call (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa363858%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396), the call takes a parameter called dwShareMode which specifies what concurrent access is allowed by other processes. A value of 0 prevents other processes from opening the file at all.
Pretty much all APIs that open a file map to CreateFile under the hood, so the CLR might already be doing the right thing for you when you open a file for writing.
In the C runtime there is also _fsopen, which allows you to open a file with sharing flags.
I'd recommend you test what the default sharing mode is when you open your file from C#. If it does not prevent simultaneous opening for writing by default, use _fsopen from C (or maybe there is an appropriate C# function).
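A sketch of the _fsopen route (MSVC CRT; the file name is arbitrary):

#include <share.h>   // _SH_DENYWR and friends
#include <stdio.h>

void append_log_line(const char* line)
{
    // Open for append, denying other processes write access while we hold it.
    FILE* f = _fsopen("shared.log", "a", _SH_DENYWR);
    if (f) {
        fputs(line, f);
        fclose(f);   // releases the sharing restriction
    }
}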

Where to put the mutex in a logging class?

First off, it's been a while since I've used any sort of mutex or semaphore, so go easy on me.
I have implemented a generic logging class that right now only receives a message from other classes and prepends that message with date/time and the level of debug, and then prints the message to stdout.
I would like to implement some sort of queue or buffer that will hold many messages that are sent to the logging class and then write them to a file.
The problem I'm running into is that I can't decide how/where to protect the queue.
Below is some pseudo-code of what I've come up with so far:
logMessage(char *msg, int debugLevel) {
    formattedMsg = formatMsg(msg, debugLevel) // formats the msg to include date/time & debugLevel
    lockMutex()
    queue.add(formattedMsg)
    unlockMutex()
}
writeToFile() {
    if (isMessageAvailable()) { // checks whether there is a message in the queue
        lockMutex()
        file << queue.getFirst() // appends the first available msg from the queue to the file
        unlockMutex()
    }
}
My questions are:
Do I really need to use the mutex in both places?
Is a mutex really what I'm looking for?
I'm thinking I may need a thread for the writing to the file part - does that sound like a good idea?
FYI, I'm looking for a way to do this without using Boost or any other 3rd-party library.
EDIT The intended platform is Linux.
EDIT 2: Moved formatMsg to before the mutex lock (thank you, @Paul Rubel).
With respect to whether you really need the mutex: think about what could happen if you didn't lock things. Unless your queue is thread-safe, you probably need to protect both insertion and removal.
Imagine execution contexts changing as you are removing the first element. The add could find the queue in an inconsistent state, and then who knows what could happen.
Regarding creating the message: unless formatMsg makes use of shared resources, you can probably move it out of the locked section, which can increase your parallelism.
Extracting the writing-to-file part into its own thread sounds like a reasonable choice; that way the logging threads will not have to make the file calls themselves.
Correct me if I'm wrong: multiple callers from multiple threads are all trying to access the same resource concurrently.
Maybe you could just have one mutex wrapping the entirety of your logging functionality.
Watch out for race conditions.
Edit
Readers take a look at the comments to this answer for some valuable discussion
You can define a global variable which contains the number of elements present in the queue or buffer. That means you need to increment or decrement this variable while adding data to or removing data from the buffer or queue, so this variable must be protected by the mutex as well in your logging framework.

check file exists once in n mins c++

I have created a class which reads a file, does some operations on the contents, and saves a new file with a timestamp. But I now have a requirement that the code should check every minute whether the file is present; if yes, it should process the file. It needs to work cross-platform.
I am a novice in C++ and need to know what approach to follow for this. Do I need to create a process or something? I am completely blank.
class inputHandler
{
public:
    void readInput();
    void performTask();
    void saveFile();
};
Since the code implementation is too large, I am just posting the structure. I am ready to spend time on this, so I need a sample tutorial which can guide me to achieve this.
This is not addressed by the C++ standard. Thus, you'll have to implement code for each supported system, or use a library.
As far as I understand, the most general solution is to create a thread which loops every minute, checking the file's timestamps. Naturally, depending on your code, you could do it another way, avoiding threads altogether. Using a notification system such as inotify could be much better. Also, you could use alarm() on POSIX-compatible systems, being alarmed whenever a minute has passed.
Anyway, if you go with the thread solution on POSIX-compatible systems, check out pthread_create() and stat(). On Windows, check out CreateThread() and GetFileTime(). To get a one-minute delay, sleep(60) (POSIX, seconds) or Sleep(60000) (Windows, milliseconds) respectively should do the trick.
Just to clarify, "to create a process" is systems-programming jargon meaning roughly "to launch a new program" (or sometimes "thread"). In that sense, if you follow the above, you'll be creating a new thread.
The simple part is checking whether a file exists: when you open a std::ifstream, it will be in a good state only if the file exists:
std::ifstream in(filename);
if (in) {
// the file exists and can be processed here
}
The more interesting part is doing something at regular intervals. The basic idea is to set up a timer in some form. Depending on whether anything else needs to be done, you may need a separate thread: if the program just waits until the file exists and doesn't do anything in the meantime, you can simply sleep and there is no need to spawn another thread. Otherwise, you probably want to spawn a thread which is just sleeping.
Assuming you need a separate thread, you probably want to be able to interrupt its waiting, e.g., to exit cleanly upon a condition set from another thread. Thus, I would use a condition variable with a timed wait, i.e., something like this:
std::mutex guard;
std::condition_variable condition;
bool done(false);

std::unique_lock<std::mutex> lock(guard);
while (!done) {
    // wakes up after n minutes, or earlier if another thread sets
    // done to true and notifies the condition variable
    condition.wait_for(lock, std::chrono::minutes(n));
    if (!done) {
        do_whatever_needs_to_be_done_once_every_n_minutes();
    }
}
The code above uses C++ 2011 facilities. If you can't use the corresponding classes, you can use suitable alternatives, e.g., the Boost classes.

lock file so that it cannot be deleted

I'm working with two independent C/C++ applications on Windows, where one of them constantly updates an image on disk (from a webcam) and the other reads that image for processing. This works fine and dandy 99.99% of the time, but every once in a while the reader app is in the middle of reading the image when the writer deletes it to refresh it with a new one.
The obvious solution to me seems to be to have the reader put some sort of lock on the file so that the writer can see that it can't delete it and thus spin on it until it can delete and update. Is there any way to do this? Or is there another simple design pattern I can use to get the same sort of constant image refreshing between two programs?
Thanks,
-Robert
Try using a synchronization object; a mutex will probably do. Whenever a process wants to read or write the file, it should first acquire the mutex lock.
Yes, a locking mechanism would help. There are, unfortunately, several to choose from. Linux/Unix has e.g. flock(2); Windows has a similar (but different) mechanism.
Another (somewhat hacky) solution is to just write the file under a temporary name and then rename it. Many filesystems guarantee that a rename is atomic, so this may work. This does depend on the filesystem, though, so it's a bit hacky.
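On Windows, that rename approach might look like this (a sketch; the file names are arbitrary). One caveat: if the reader has the target open without FILE_SHARE_DELETE, the replace fails with a sharing violation and the writer has to retry:

#include <windows.h>

// Writer: produce the new image under a temporary name, then swap it in.
// The reader only ever opens "image.jpg", so it never sees a partial file.
void publish_image(const void* data, DWORD len)
{
    HANDLE h = CreateFileA("image.tmp", GENERIC_WRITE, 0, NULL,
                           CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD written = 0;
    WriteFile(h, data, len, &written, NULL);
    CloseHandle(h);

    // Replaces any existing "image.jpg" in one step.
    MoveFileExA("image.tmp", "image.jpg", MOVEFILE_REPLACE_EXISTING);
}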
If you are willing to go with the Windows API, opening the file with CreateFile and passing in 0 for the dwShareMode will not allow any other application to open the file.
From the documentation:
Prevents other processes from opening a file or device if they
request delete, read, or write access.
Then you'd have to use ReadFile, WriteFile, CloseHandle, etc., rather than the C standard library functions.
Or, as a really simple kludge, the reader creates a temp file (say, .lock) before starting to read and deletes it afterwards. The writer doesn't manipulate the image file so long as .lock exists.
That's how OpenOffice does it (and others), and it's probably the simplest to implement, no matter which platform.
Joe, many solutions have been proposed; I commented on some of them, but I'd like to chime in with an overall view, some specifics, and recommendations:
You have the following options:
use filesystem locking: under Windows, have both the reader and the writer open (and create with the CREATE_ALWAYS disposition, respectively) the shared file in OF_SHARE_EXCLUSIVE mode; have both the reader and the writer ready to handle ERROR_SHARING_VIOLATION and retry after some predefined period of time, e.g. 250 ms (see the sketch after this list)
use file renaming to essentially transfer file ownership: have the writer create a writer-private file (e.g. shared_file.tmpwrite), write to it, close it, then make it publicly available to the reader by renaming it to an agreed-upon "public" name (e.g. simply shared-file); have the reader periodically test for the existence of a file with the agreed-upon "public" name (e.g. shared-file) and, when one is found, attempt to first rename it to a reader-private name (e.g. shared_file.tmpread) before having the reader open it (under the reader-private name); under Windows use MOVEFILE_REPLACE_EXISTING; the rename operation does not have to be atomic for this to work
use other forms of interprocess communication (IPC): under Windows you can create a named mutex, and have both the reader and writer attempt to create (the existing mutex will be returned if it already exists) then acquire the named mutex before opening the shared file for reading or writing
implement your own filesystem-backed locking: take advantage of open(O_CREAT|O_EXCL) or, under Windows, of the CREATE_NEW disposition to atomically create an application lock file; unlike OF_SHARE_EXCLUSIVE approach above, it would be up to you to deal with stale lock files (i.e. lock files left by a process which did not shut down gracefully such as after a crash.)
I would implement method 1.
Method 2 would also work, but it is in a sense reinventing the wheel.
Method 3 arguably has the advantage of allowing your reader process to wait on the writer process and vice-versa, eliminating the need for the arbitrary sleep delays between the retries of methods 1 and 2 (polling); however, if you are OK with polling then you should still use method 1
Method 4 is listed for completeness only, as it is complex to implement: when the lock file is detected to be stale, e.g. by checking whether the PID contained therein still exists, multiple processes can potentially compete for its removal, which introduces a race condition requiring a second lock, which in turn can become stale, etc. For example:
process A creates the lock file but dies without removing the lock file
process A restarts and tries to acquire the lock file but realizes it is stale
process B comes out of a sleep delay and also tries to acquire the lock file but realizes it is stale
process A removes the lock file, which it knew to be stale, and recreates it essentially reacquiring the lock
process B removes the lock file, which it (still) thinks is stale (although at this point it is no longer stale and is owned by process A): violation
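Putting method 1 into code, the retry loop might look roughly like this (a sketch; the retry interval and function name are made up):

#include <windows.h>

// Open the shared file exclusively; on a sharing violation, back off and
// retry. Both reader and writer use the same pattern (the writer passes
// CREATE_ALWAYS as the disposition, the reader OPEN_EXISTING).
HANDLE open_shared_exclusive(const char* path, DWORD disposition)
{
    for (;;) {
        HANDLE h = CreateFileA(path,
                               GENERIC_READ | GENERIC_WRITE,
                               0,               // no sharing: exclusive access
                               NULL, disposition,
                               FILE_ATTRIBUTE_NORMAL, NULL);
        if (h != INVALID_HANDLE_VALUE)
            return h;                           // we own the file now
        if (GetLastError() != ERROR_SHARING_VIOLATION)
            return INVALID_HANDLE_VALUE;        // real error: give up
        Sleep(250);                             // the other side has it; retry
    }
}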
Instead of deleting images, what about appending them to the end of the file? This would allow you to keep adding to the file while the reader is still operating, without destroying the file. The reader can then delete the image when it's done with it (provided that is necessary) and move on to the next image. Alternatively, store the image in a buffer for writing and test the file pointer: if it's set to the head of the file, you can go ahead and write from the buffer to the file; otherwise, wait until the reader finishes and puts the pointer back at the head of the file.
Couldn't you store a few images? ('n' sounds like a good number :-)
Not too many to fill your disk, but surely 3 would be enough? If not, you are writing faster than you can process and you have a fundamental problem anyhow (tune to discover 'n').
Cyclically overwrite.