Trying to understand read/write locks and the need for two mutexes instead of one - concurrency

I have been reading this wiki article about read/write locks, and it says that implementing one requires two mutexes. Why can't we do it with one mutex? I am wondering if my understanding is correct:
we need to make sure only one write is happening at any moment
we need to make sure no one is reading while the data is being written
many reads can happen concurrently, and that is completely fine
Write:
// wait until no one is reading concurrently
if (wait(write_lock))
{
    acquire(write_lock)
    // Do the write operation
    release(write_lock)
}
Read:
// wait until no one is currently writing
if (wait(write_lock))
{
    // Do the read operation atomically
}

Related

Lock stepping pthread mutex

I don't know if this is good practice or not, but I am working on a real-time stream of input data and using pthreads in lockstep order, so that only one thread at a time performs each operation while different operations run at the same time. This is my program flow for each thread:
void * my_thread() {
    pthread_mutex_lock(&read_mutex);
    /*
    read data from a stream such as stdin into global buffer
    */
    pthread_mutex_lock(&operation_mutex);
    pthread_mutex_unlock(&read_mutex);
    /*
    perform some work on the data you read
    */
    pthread_mutex_lock(&output_mutex);
    pthread_mutex_unlock(&operation_mutex);
    /*
    Write the data to output such as stdout
    */
    pthread_mutex_unlock(&output_mutex);
    return NULL;
}
I know there is pthread conditional lock, but is my approach a good idea or a bad idea? I tested this on various size streams and I am trying to think of corner cases to make this deadlock, produce race condition, or both. I know mutexes don't guarantee thread order execution but I need help to think of scenarios that will break this.
UPDATE:
I stepped away from this, but recently had some time to rethink it. I rewrote the code using C++ threads and mutexes. I am trying to use condition variables but have had no luck. This is my approach to the problem:
void my_thread_v2() {
    //Let only 1 thread read in at a time
    std::unique_lock<std::mutex> stdin_lock(stdin_mutex);
    stdin_cond.wait(stdin_lock);
    /*
    Read from stdin stream
    */
    //Unlock the stdin mutex
    stdin_lock.unlock();
    stdin_cond.notify_one();
    //Lock step
    std::unique_lock<std::mutex> operation_lock(operation_mutex);
    operation_cond.wait(operation_lock);
    /*
    Perform work on the data that you read in
    */
    operation_lock.unlock();
    operation_cond.notify_one();
    std::unique_lock<std::mutex> stdout_lock(stdout_mutex);
    stdout_cond.wait(stdout_lock);
    /*
    Write the data out to stdout
    */
    //Unlock the stdout mutex
    stdout_lock.unlock();
    stdout_cond.notify_one();
}
I know the issue with this code is that there is no way to signal the first condition. I am definitely not understanding the proper use of condition variables. I looked at various examples in C++ references, but can't get away from the thought that the initial approach may be the only way of doing what I want, which is to lock-step the threads. Can someone shed some light on this?
UPDATE 2:
So I implemented a simple Monitor class that utilizes C++ condition_variable and unique_lock:
class ThreadMonitor{
public:
    ThreadMonitor() : is_occupied(false) {}
    void Wait() {
        std::unique_lock<std::mutex> lock(mx);
        while(is_occupied) {
            cond.wait(lock);
        }
        is_occupied = true;
    }
    void Notify() {
        std::unique_lock<std::mutex> lock(mx);
        is_occupied = false;
        cond.notify_one();
    }
private:
    bool is_occupied;
    std::condition_variable cond;
    std::mutex mx;
};
This is my initial approach, assuming I have three ThreadMonitors called stdin_mon, operation_mon, and stdout_mon:
void my_thread_v3() {
    //Let only 1 thread read in at a time
    stdin_mon.Wait();
    /*
    Read from stdin stream
    */
    stdin_mon.Notify();
    operation_mon.Wait();
    /*
    Perform work on the data that you read in
    */
    operation_mon.Notify();
    stdout_mon.Wait();
    /*
    Write the data out to stdout
    */
    //Unlock the stdout
    stdout_mon.Notify();
}
The issue with this was that the data was still being corrupted so I had to change back to the original logic of lock stepping the threads:
void my_thread_v4() {
    //Let only 1 thread read in at a time
    stdin_mon.Wait();
    /*
    Read from stdin stream
    */
    operation_mon.Wait();
    stdin_mon.Notify();
    /*
    Perform work on the data that you read in
    */
    stdout_mon.Wait();
    operation_mon.Notify();
    /*
    Write the data out to stdout
    */
    //Unlock the stdout
    stdout_mon.Notify();
}
I am beginning to suspect that if thread order matters that this is the only way to handle it. I am also questioning what the benefit is of using a Monitor that utilizes condition_variable over just using a mutex.
The problem with your approach is that you still can modify the data while another thread is reading it:
Thread A acquired read, then operation and released read again, and starts writing some data, but is interrupted.
Now thread B operates, acquires read and can read the partially modified, possibly inconsistent data!
I assume you want to allow multiple threads to read the same data without blocking, but as soon as a thread writes, the data must be protected. Finally, while outputting, we are only reading the modified data again, so this can happen concurrently, but simultaneous writes must still be prevented.
Instead of having multiple mutex instances, you can do this better with a read/write mutex:
Any function only reading the data acquires the read lock.
Any function intending to write acquires write lock right from the start (be aware that first acquiring read, then write lock without releasing the read lock in between can result in dead-lock; if you release read lock in between, though, your data handling needs to be robust against data being modified by another thread in between as well!).
Reducing write lock to shared without releasing in between is safe, so we can do so now before outputting. If data must not be modified in between writing data and outputting it, we even need to do this without entirely releasing the lock.
This last point is problematic, as it is supported neither by the C++ standard's thread support library nor by the pthreads library.
For C++, boost provides a solution; if you don't want to or cannot (C!) use boost, a simple but possibly not the most efficient approach is to protect acquisition of the write lock with another mutex:
acquire standard (non-rw) mutex protecting the read write mutex
acquire RW mutex for writing
release protecting mutex
read data, write modified data
acquire protecting mutex
release RW mutex
re-acquire RW mutex for reading; it does not matter if another thread acquired for reading as well, we only need to protect against locking for write here
release protecting mutex
output
release RW mutex (no need to protect)...
Non-modifying functions can just acquire the read lock without any further protection, there aren't any conflicts with...
In C++, you would prefer the thread support library, which additionally gives you platform-independent code for free; in C, you would use a standard pthread mutex to protect acquiring the write lock, just as you did before, and the RW variants from pthread for the read/write lock.
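As a rough illustration of the basic read/write locking described above (shared lock for read-only functions, exclusive lock taken right from the start by writers), here is a minimal sketch using std::shared_mutex from C++17. It deliberately does not attempt the write-to-read downgrade, and the names SharedData, read_value and add_to_value are made up for this example.

```cpp
#include <mutex>
#include <shared_mutex>

// Hypothetical shared state; the names are illustrative only.
struct SharedData {
    mutable std::shared_mutex rw;  // read/write mutex (C++17)
    int value = 0;
};

// Read-only access: takes the shared (read) lock, so many readers may run at once.
int read_value(const SharedData& d) {
    std::shared_lock<std::shared_mutex> lock(d.rw);
    return d.value;
}

// Modifying access: takes the exclusive (write) lock right from the start,
// rather than upgrading from a read lock, which could deadlock.
int add_to_value(SharedData& d, int delta) {
    std::unique_lock<std::shared_mutex> lock(d.rw);
    d.value += delta;
    return d.value;
}
```

Both lock objects release on scope exit, so an early return or an exception cannot leave the mutex held.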

append and read one file at same time

I have one file that is updated every second: one thread appends lines to the end of it while another thread reads it each time. So I have two pointers to this file for these tasks. Is this possible?
(I use two while(1) loops for updating and reading, in two functions)
Thanks.
Here's a good example for reading a single file with multiple threads : Mutlitple thread reading a single file
You could start from here.
As @MatsPetersson said, you have to be really sure of what you're doing in each thread. If you don't want to read incomplete data, you will need to make sure the other thread is not writing to the file. There are several ways of doing this; for example, you can use a mutex, a signal, or a shared memory segment holding a bool.
I think in your case, even if it's not explicit, you need to read only when no other thread is writing; for this I would recommend using a mutex. Here's the doc : Mutex function documentation .
So we have readThread and writeThread. Here's some pseudo-code showing how to approach your problem:
main(){
    putTheMutexTo(1);
}
readThread(){
    consumeMutex(1);
    openTheFile();
    readTheFile();
    closeTheFile();
    loadMutex(1);
}
writeThread(){
    consumeMutex(1);
    openTheFile();
    writeTheFile();
    closeTheFile();
    loadMutex(1);
}
But if you don't really know how a mutex works, don't start coding right away; go read some documentation on the Internet first, because this is a bit complex to understand when you start.
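In C++ the pseudo-code above could look roughly like the following sketch, assuming both threads live in the same process and share one std::mutex. The function names append_line and read_lines and the global file_mutex are invented for this example.

```cpp
#include <fstream>
#include <mutex>
#include <string>
#include <vector>

std::mutex file_mutex;  // shared by the reader thread and the writer thread

// Writer side: append one line to the file while holding the mutex.
bool append_line(const std::string& path, const std::string& line) {
    std::lock_guard<std::mutex> lock(file_mutex);
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << line << '\n';
    out.flush();  // push the data out before the lock is released
    return static_cast<bool>(out);
}

// Reader side: read every line while holding the same mutex,
// so a half-written line is never observed.
std::vector<std::string> read_lines(const std::string& path) {
    std::lock_guard<std::mutex> lock(file_mutex);
    std::vector<std::string> lines;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}
```

Each function opens and closes the file inside the critical section, mirroring the open/read-or-write/close sequence of the pseudo-code.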

proper way to use lock file(s) as locks between multiple processes

I have a situation where two different processes (mine in C++, the other written by other people in Java) are a writer and a reader of a shared data file. So I was trying to avoid race conditions by writing a class like this (EDIT: this code is broken, it was just an example):
class ReadStatus
{
    bool canRead;
public:
    ReadStatus()
    {
        if (filesystem::exists(noReadFileName))
        {
            canRead = false;
            return;
        }
        ofstream noWriteFile;
        noWriteFile.open (noWriteFileName.c_str());
        if ( ! noWriteFile.is_open())
        {
            canRead = false;
            return;
        }
        boost::this_thread::sleep(boost::posix_time::seconds(1));
        if (filesystem::exists(noReadFileName))
        {
            filesystem::remove(noWriteFileName);
            canRead= false;
            return;
        }
        canRead= true;
    }
    ~ReadStatus()
    {
        if (filesystem::exists(noWriteFileName))
            filesystem::remove(noWriteFileName);
    }
    inline bool OKToRead()
    {
        return canRead;
    }
};
usage:
ReadStatus readStatus; //RAII FTW
if ( ! readStatus.OKToRead())
return;
This is for one program ofc, other will have analogous class.
Idea is:
1. check if the other program created its "I'm the owner" file; if it has, break, else go to 2.
2. create my "I'm the owner" file and check again whether the other program created its own; if it has, delete my file and break, else go to 3.
3. do my reading, then delete my "I'm the owner" file.
Please note that rare occurrences where neither of them reads or writes are OK, but the problem is that I still see a small chance of a race condition: theoretically the other program can check for the existence of my lock file and see that there isn't one, then I create mine, the other program creates its own, but before the FS creates its file I check again and it isn't there, and then disaster occurs. This is why I added the one-second delay, but as a CS nerd I find it unnerving to have code like that running.
Ofc I don't expect anybody here to write me a solution, but I would be happy if someone does know a link to a reliable code that I can use.
P.S. It has to be files, cuz I'm not writing entire project and that is how it is arranged to be done.
P.P.S.: access to data file isn't reader,writer,reader,writer.... it can be reader,reader,writer,writer,writer,reader,writer....
P.P.S: other process is not written in C++ :(, so boost is out of the question.
On Unices the traditional way of doing pure filesystem based locking is to use dedicated lockfiles with mkdir() and rmdir(), which can be created and removed atomically via single system calls. You avoid races by never explicitly testing for the existence of the lock --- instead you always try to take the lock. So:
lock:
    while mkdir(lockfile) fails
        sleep
unlock:
    rmdir(lockfile)
I believe this even works over NFS (which usually sucks for this sort of thing).
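On the C++ side the pseudo-code above might be sketched like this, using the POSIX mkdir() and rmdir() calls. The function names, the 50 ms sleep and the max_attempts cut-off are my own additions for illustration; the pseudo-code simply loops forever.

```cpp
#include <sys/stat.h>  // mkdir
#include <unistd.h>    // rmdir
#include <chrono>
#include <thread>

// Try once to take the lock. mkdir() either creates the directory or fails,
// in a single system call, so there is no separate "does it exist?" test.
// Note: it also fails for other reasons (e.g. permissions), not only when
// the lock is held.
bool try_lock_dir(const char* lock_path) {
    return ::mkdir(lock_path, 0700) == 0;
}

// Keep trying, sleeping between attempts; gives up after max_attempts tries.
bool lock_dir(const char* lock_path, int max_attempts) {
    for (int i = 0; i < max_attempts; ++i) {
        if (try_lock_dir(lock_path)) return true;
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }
    return false;
}

// Release the lock by removing the directory.
bool unlock_dir(const char* lock_path) {
    return ::rmdir(lock_path) == 0;
}
```

The Java process would need to follow the same convention with its own directory-creation call on the same path.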
However, you probably also want to look into proper file locking, which is loads better; I use F_SETLK/F_UNLCK fcntl locks for this on Linux (note that these are different from flock locks, despite the name of the structure). This allows you to properly block until the lock is released (using the F_SETLKW variant). These locks also get automatically released if the app dies, which is usually a good thing. Plus, these will let you lock your shared file directly without having to have a separate lockfile. This, too, works on NFS.
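A minimal sketch of such an fcntl lock, assuming a POSIX system; the helper name set_file_lock is hypothetical, and it always locks the whole file rather than a byte range.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Apply an fcntl record lock covering the whole file.
// type is F_RDLCK (shared), F_WRLCK (exclusive) or F_UNLCK (release).
// F_SETLKW blocks until the lock can be granted; returns true on success.
bool set_file_lock(int fd, short type) {
    struct flock fl = {};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;  // 0 means "up to end of file", however large it grows
    return ::fcntl(fd, F_SETLKW, &fl) != -1;
}
```

A reader would call set_file_lock(fd, F_RDLCK) before reading and set_file_lock(fd, F_UNLCK) afterwards; a writer would use F_WRLCK instead. The descriptor must be open for reading to take F_RDLCK and for writing to take F_WRLCK.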
Windows has very similar file locking functions, and it also has easy to use global named semaphores that are very convenient for synchronisation between processes.
As far as I've seen it, you can't reliably use files as locks for multiple processes. The problem is, while you create the file in one thread, you might get an interrupt and the OS switches to another process because I/O is taking so long. The same holds true for deletion of the lock file.
If you can, take a look at Boost.Interprocess, under the synchronization mechanisms part.
While I'm generally against making API calls which can throw from a constructor/destructor (see docs on boost::filesystem::remove) or making throwing calls without a catch block in general that's not really what you were asking about.
You could check out the Overlapped IO library if this is for windows. Otherwise have you considered using shared memory between the processes instead?
Edit: Just saw the other process was Java. You may still be able to create a named mutex that can be shared between processes and use that to create locks around the file IO bits so they have to take turns writing. Sorry, I don't know Java, so I have no idea if that's more feasible than shared memory.

Thread safety in C++

I have a question regarding thread safety, shown below (I have only two threads: one of them only reads from the map, and the other both writes and reads, as shown):
//Thread 2: the reading and writing thread
unordered_map<int, unordered_map<int, classA*>*> testMap;
//need lock because writing to the map?
testMap[1] = new unordered_map<int, classA*>;
//do not need lock because only reading and the other thread is only reading?
unordered_map<int, classA*>* ptr = testMap[1];
//need lock because writing?
(*ptr)[1] = new classA;
//do not need lock because only reading and the other thread is only reading?
classA* ptr2 = (*ptr)[1];
//didn't modify the map, but modified the data pointed to by the pointer stored in the map, do I need lock?
ptr2->field1 = 5;
ptr2->field2 = 6;
//end of reading and writing thread
What is the correct way to lock to unordered_map? Also, should I use a single lock or multiple locks?
Thanks.
If your map is the only shared resource, a single mutex is sufficient.
You need to lock the writing in the first thread, and lock the reading in the second one. If you lock the map only when writing on it, the second thread could read it while you are writing in it.
You don't need a lock in the last example regarding the pointers, since you don't deal with any data stored in the map.
Edit : in fact, it depends on what you are doing with the pointers and in which thread you do it.
You should read this great article : http://herbsutter.com/2010/09/24/effective-concurrency-know-when-to-use-an-active-object-instead-of-a-mutex/
You need to lock both, reading and writing. If you do not lock reading then a write can occur while you are reading and you may access the map in an inconsistent state.
What would be best in your situation would be a reader-writer-lock. Such a lock allows multiple readers to read at the same time but only one writer at the same time and no readers while a writer writes and vice versa.
Couple of things:
You should consider smart pointers to store in your map.
What you are doing is potentially quite dangerous: you may not be modifying the main map, but you are modifying what's stored there, and if you do this outside of a lock, the end result could be anything. Let's say that thread one has also read the same pointer and starts iterating whilst thread two is writing the instance of classA: what happens then?
I would have a lock around the main map, and then another lock for each payload map. Any operation on either map should require obtaining the lock at the correct level. I'd also be careful not to return iterators outside of the class that manages the lock, so basically you should implement all the methods you'd need within the class.
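To make the "keep everything inside the class that manages the lock" idea concrete, here is a small sketch. For brevity it uses a single mutex for both map levels rather than one lock per payload map, stores smart pointers instead of raw ones, and copies values out instead of returning pointers or iterators. GuardedMap, set_fields and get_fields are made-up names, and ClassA is a stand-in for the question's classA.

```cpp
#include <memory>
#include <mutex>
#include <unordered_map>

struct ClassA {  // stand-in for the question's classA
    int field1 = 0;
    int field2 = 0;
};

class GuardedMap {
public:
    // Writer: creates the entry if needed and updates it, all under the lock.
    void set_fields(int outer, int inner, int f1, int f2) {
        std::lock_guard<std::mutex> lock(mx_);
        std::unique_ptr<ClassA>& slot = map_[outer][inner];
        if (!slot) slot.reset(new ClassA);
        slot->field1 = f1;
        slot->field2 = f2;
    }

    // Reader: copies the value out under the lock instead of handing out
    // pointers or iterators. Returns false if the entry does not exist.
    bool get_fields(int outer, int inner, ClassA& out) const {
        std::lock_guard<std::mutex> lock(mx_);
        auto o = map_.find(outer);
        if (o == map_.end()) return false;
        auto i = o->second.find(inner);
        if (i == o->second.end() || !i->second) return false;
        out = *i->second;
        return true;
    }

private:
    mutable std::mutex mx_;
    std::unordered_map<int, std::unordered_map<int, std::unique_ptr<ClassA>>> map_;
};
```

Because the reader gets a copy, the writer can keep modifying the stored object afterwards without the reader ever seeing a half-updated ClassA.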

Lockless reader/writer

I have some data that is both read and updated by multiple threads. Both reads and writes must be atomic. I was thinking of doing it like this:
// Values must be read and updated atomically
struct SValues
{
    double a;
    double b;
    double c;
    double d;
};
class Test
{
public:
    Test()
    {
        m_pValues = &m_values;
    }
    SValues* LockAndGet()
    {
        // Spin forever until we get ownership of the pointer
        while (true)
        {
            // Atomically swap the pointer itself (note &m_pValues) for the "locked" sentinel
            SValues* pValues = (SValues*)::InterlockedExchangePointer((void**)&m_pValues, (void*)0xffffffff);
            if (pValues != (SValues*)0xffffffff)
            {
                return pValues;
            }
        }
    }
    void Unlock(SValues* pValues)
    {
        // Return the pointer so other threads can lock it
        ::InterlockedExchangePointer((void**)&m_pValues, pValues);
    }
private:
    SValues* m_pValues;
    SValues m_values;
};
void TestFunc()
{
    Test test;
    SValues* pValues = test.LockAndGet();
    // Update or read values
    test.Unlock(pValues);
}
The data is protected by stealing the pointer to it for every read and write, which should make it threadsafe, but it requires two interlocked instructions for every access. There will be plenty of both reads and writes and I cannot tell in advance if there will be more reads or more writes.
Can it be done more efficiently than this? This also locks when reading, but since it's quite possible to have more writes than reads, there is no point in optimizing for reading unless it does not inflict a penalty on writing.
I was thinking of reads acquiring the pointer without an interlocked instruction (along with a sequence number), copying the data, and then having a way of telling if the sequence number had changed, in which case it should retry. This would require some memory barriers, though, and I don't know whether or not it could improve the speed.
----- EDIT -----
Thanks all, great comments! I haven't actually run this code, but I will try to compare the current method with a critical section later today (if I get the time). I'm still looking for an optimal solution, so I will get back to the more advanced comments later. Thanks again!
What you have written is essentially a spinlock. If you're going to do that, then you might as well just use a mutex, such as boost::mutex. If you really want a spinlock, use a system-provided one, or one from a library rather than writing your own.
Other possibilities include doing some form of copy-on-write. Store the data structure by pointer, and just read the pointer (atomically) on the read side. On the write side then create a new instance (copying the old data as necessary) and atomically swap the pointer. If the write does need the old value and there is more than one writer then you will either need to do a compare-exchange loop to ensure that the value hasn't changed since you read it (beware ABA issues), or a mutex for the writers. If you do this then you need to be careful how you manage memory --- you need some way to reclaim instances of the data when no threads are referencing it (but not before).
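A rough sketch of that copy-on-write idea, under a few assumptions of mine: std::shared_ptr handles reclaiming old instances, writers are serialized with a mutex (the simpler of the two options above), and the pointer is swapped with the std::atomic_load / std::atomic_store overloads for shared_ptr. Those overloads are deprecated since C++20 in favour of std::atomic<std::shared_ptr>, and they may use locks internally, so this shows the structure rather than a guaranteed lock-free read path. CowValues is a made-up name.

```cpp
#include <memory>
#include <mutex>

struct SValues { double a, b, c, d; };

class CowValues {
public:
    CowValues() : current_(std::make_shared<SValues>()) {}

    // Reader: atomically grab the current snapshot. It stays valid and
    // unchanged for as long as the caller holds the returned shared_ptr.
    std::shared_ptr<const SValues> read() const {
        return std::atomic_load(&current_);
    }

    // Writer: copy the old data, modify the copy, then atomically publish it.
    // The mutex ensures no other writer's update is lost in between.
    template <typename F>
    void update(F modify) {
        std::lock_guard<std::mutex> lock(writer_mx_);
        SValues next = *std::atomic_load(&current_);
        modify(next);
        std::shared_ptr<const SValues> fresh = std::make_shared<SValues>(next);
        std::atomic_store(&current_, fresh);
    }

private:
    std::shared_ptr<const SValues> current_;
    std::mutex writer_mx_;
};
```

A reader calls read() once and then uses all four values from the same snapshot; the old instance is freed automatically when the last reader drops its shared_ptr.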
There are several ways to resolve this, specifically without mutexes or locking mechanisms. The problem is that I'm not sure what the constraints on your system are.
Remember that atomic operations are something that compilers often move around in C++.
Generally I would solve the issue like this:
Build the multiple-producer-single-consumer setup out of one single-producer-single-consumer queue per writing thread. Each thread writes into its own queue. A single consumer thread gathers the produced data and stores it in a single-consumer-multiple-reader data storage. The implementation for this is a lot of work and only recommended if you are doing a time-critical application and have the time to put into this solution.
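To give an idea of the building block, here is a minimal sketch of one single-producer-single-consumer queue as a fixed-size ring buffer on top of std::atomic (C++11). It is only safe with exactly one pushing thread and one popping thread per queue; SpscQueue is an illustrative name, and the gathering consumer and the reader-side storage are not shown.

```cpp
#include <atomic>
#include <cstddef>

// Bounded SPSC ring buffer. Holds at most N - 1 items, because one slot is
// kept empty to tell "full" apart from "empty".
template <typename T, std::size_t N>
class SpscQueue {
public:
    // Called by the single producer thread only.
    bool push(const T& v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the item
        return true;
    }

    // Called by the single consumer thread only.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to write; modified by producer
    std::atomic<std::size_t> tail_{0};  // next slot to read; modified by consumer
};
```

With one such queue per writing thread, the consumer thread would poll each queue in turn with pop().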
There are more things to read up about this, since the implementation is platform specific:
Atomic etc operations on windows/xbox360:
http://msdn.microsoft.com/en-us/library/ee418650(VS.85).aspx
The multithreaded single-producer-single-consumer without locks:
http://www.codeproject.com/KB/threads/LockFree.aspx#heading0005
What "volatile" really is and can be used for:
http://www.drdobbs.com/cpp/212701484
Herb Sutter has written a good article that reminds you of the dangers of writing this kind of code:
http://www.drdobbs.com/cpp/210600279;jsessionid=ZSUN3G3VXJM0BQE1GHRSKHWATMY32JVN?pgno=2