Ive looked online and have not been able to satisfy myself with an answer.
Is memcpy threadsafe? (in Windows)
What I mean is if I write to an area of memory shared between processes (using boost::shared_memory_object) using a single memcpy and then try read that area from another
process using a single memcpy then will one process be blocked automatically
while that write is happening? Where can I read about this?
memcpy is typically coded for raw speed. It will not be thread safe. If you require this, you need to perform the memcpy call inside of a critical section or use some other semaphor mechanism.
take_mutex(&mutex);
memcpy(dst, src, count);
yield_mutex(&mutex);
memcpy is not thread/process safe
Routines like memcpy() (or memmove()) are part of standard C library, are included through standard <string.h> header and know nothing about any locking mechanics. Locking should be provided by some external way like inter-process mutexes, semaphores or things like this.
You are confusing "atomic" and "thread safe". If you read and write data (with or without memcpy) concurrently in a shared region, that is not safe. But of course copying data itself is thread safe.
memcpy itself is also thread safe, at least on POSIX systems see this one, and therefore I guess it is also on Windows. Anything else would make it quite useless.
If it would be "automatically blocking", it would have to be atomic (or at least manage it's own locks) which would slow down your system. So in your case you should use your own locks.
Related
Visual Studio's fread "locks out other threads." There is an alternate version _fread_nolock, which reads "without locking other threads", which should only be used "in thread-safe contexts such as single-threaded applications or where the calling scope already handles thread isolation."
Even after reading other somewhat relevant discussions on the two, I'm confused if the locking fread implements is on a specific FILE struct, a specific actual file, or on all fread calls on totally different files.
If you use the nolock versions, what level of locking do you need to provide? Can multiple threads in parallel be reading separate files without any locking? Can multiple threads in parallel be writing separate files without any locking? Or are there global or static variables involved that would be corrupted?
So, by using the nolock versions, are you able to potentially achieve better I/O throughput (if you aren't needlessly moving heads, like reading off separate drives, or a SSD drive), or is the potential gain just reducing redundant locks to a single lock (which should be negligible.)
Does VS' ifstream.read function work just like the regular fread? (I don't see a nolock version of it.)
The MS standard library implementation fully supports multi-threading. The C++ standard explain this requirement:
27.2.3: Concurrent access to a stream object, stream buffer object, or C Library stream by multiple threads may result in a data
race unless otherwise specified.
If one thread makes a library call a that writes a value to a stream
and, as a result, another thread reads this value from the stream
through a library call b such that this does not result in a data
race, then a’s write synchronizes with b’s read.
This means that if you write on a stream, a locking (not file locking, but concurrent access locking to the in-memory stream data structure) is done, to be sure that concurrency is well manageged for all the other threads using the same stream.
This locking overhead is always there, even if not needed. This could have a performance aspect, according to Microsoft:
the performance of the multithreaded libraries has been improved and
is close to the performance of the now-eliminated single-threaded
libraries. For those situations when even higher performance is
required, there are several new features.
This is why _nolock functions are provided. They access the stream directly without thread locking. It must be used with extreme care, for example:
if your application is single threaded (another process using the same stream has its own data structure, and OS manageds concurrency here)
if you're sure that no two threads use the same stream (for example if you have only one reader thread and writing is done outside your porgramme).
if you have other synchronisation mechasnism that protect a critical section of your code. For example, if you use a mutex lock, or an thread safe non blocking algorithm that makes use of atomics.
In such cases, the additional lock for stream access is not needed/redundant. For file intensive functions, it could be worth using the no_lock then.
Note: as you've pointed out: it's only worth using the nolock for intensive file accesses where you make millions of accesses.
fread_no_lock() appears to be used once you make sure that the file is locked with an external mechanism (some form of mutex, probably), and then you use it to reduce overhead: related: What's the intended use of _fread_nolock, _fseek_nolock?
This may also answer any further questions you might have: it may or may not be possible for your hard-drive to actually perform more than I/O operation at the same time depending on what type of hard drive you have: https://superuser.com/questions/252959/which-is-faster-copying-everything-at-once-or-one-thing-at-a-time
I've been playing around with gtkmm and multi-threaded GUIs and stumbled into the concept of a mutex. From what I've been able to gather, it serves the purpose of locking access to a variable for a single thread in order to avoid concurrency issues. This I understand, seems rather natural, however I still don't get how and when one should use a mutex. I've seen several uses where the mutex is only locked to access particular variables (e.g.like this tutorial). For which type of variables/data should a mutex be used?
PS: Most of the answers I've found on this subject are rather technical, and since I am far from an expert on this I was looking more for a conceptual answer.
If you have data that is accessed from more than a single thread, you probably need a mutex. You usually see something like
theMutex.lock()
do_something_with_data()
theMutex.unlock()
or a better idiom in c++ would be:
{
MutexGuard m(theMutex)
do_something_with_data()
}
where MutexGuard c'tor does the lock() and d'tor does the unlock()
This general rule has a few exceptions
if the data you are using can be accessed in an atomic manner, you don't need a lock. In Visual Studio you have functions like InterlockedIncrement() that do this. gcc has it's own facilities to do this.
If you are accessing the data to only ever read it and never change it, it's usually safe to do without locking. but if even a single thread does any change to the data, all the other threads need to make sure they don't try to read the data while it is being changed. You can also read about Reader-Writer lock for this kind of situations.
variables that are changed among multiple threads. So data that is not modified (immutable) or data that is not shared does not need
I have a piece of shared memory that contains a char string and an integer between two processes.
Process A writes to it and Process B reads it (and not vice versa)
What is the most efficient and effective way to make sure that Process A doesn't happen to update (write to it) that same time Process B is reading it? (Should I just use flags in the shared memory, use semaphores, critical section....)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows
thread-safe synchronization of data between threads. A thread obtains
ownership of a mutex object by calling one of the lock functions and
relinquishes ownership by calling the corresponding unlock function.
If you are using boost thread, you can use it's mutex and locking, more to read see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
Could you help me to understand how to use mutexes in multithread Linux application, where:
during data writing it is need to lock variable on write and read
during data reading from the variable it is need to lock it on write.
So it is possible to read simultaneously, but writing opertion is a single opertaion in the same time. During writing, all other operation should wait before it finishes.
You're asking about something that is a bit higher level than mutexes. A mutex is a simple, low-level device. When you lock a thread with a mutex, the CPU is either executing code in the thread that obtained the lock or it is executing some other process entirely. In other words, the mutex has locked out all other threads that belong to the same (heavyweight) process.
You are asking about a read-write lock. Read-write locks use mutexes underneath the hood. The POSIX functions that deal with read-write locks start with pthread_rwlock_. Since you are on a Linux machine, just type man pthread and look for the section marked "READ/WRITE LOCK ROUTINES".
You need a reader/writer lock to allow multiple readers/single writer.
Boost.Thread has one of these (boost::shared_mutex), if you have no other preferred threading library. This uses PThreads primitives under the covers, and will probably save you time in wrapping the raw APIs yourself.
I would not recommend implementing this yourself - it's easy to get something that appears to work, but under load either crashes or kills performance or (worst of all) silently modifies your data in a way it should not be, so you get bad results.
A simple boost::mutex can also be used here as noted by #Als, but won't allow multiple concurrent reads. That is simpler to implement, and may be sufficient for your needs, depending on your read/write access profile.
You will need to use mutexes, if you have global or static objects which are being accessed(read and written to) from different threads.
I intend to perform opening for reading a single file from many threads using std::ifstream. My concern is if std::ifstream is thread-safe & lock-free?
More details:
I use g++ 4.4 on Ubuntu & Windows XP, 4.0 on Leopard.
Each thread creates its own instance of std::ifstream
Thanks in advance!
That is implementation defined. Standard C++ says absolutely nothing about threading, and therefore any assumptions about threads inherently invoke unspecified or implementation defined behavior.
We need the platform you are using to be more specific, but it's probably unreasonable to assume ifstream is either thread safe or lock free. If nothing else, there are probably locks involved in the OS level calls that actually do the reading from the file, in which case no true lock-free implementation is possible. Even without that, each read from an ifstream needs to check several format flags, and needs to update the flags bits depending on what occurs during the read. (i.e. istream::good() and istream::operator bool) Since there is no way all of that can be done atomicly, it's unreasonable to assume much about istream's thread safety characteristics.
See http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html.
As of the writing of that manual page, GCC's standard library defers to the operating system's C stdio file buffering. They avoid keeping state outside the C FILE structure and achieve some level of safety through it.
Since the C stdio library implements a buffer of a single range within the file around the last I/O operation, I don't see how a lock-free implementation is possible. The operations on a file must be processed serially. Perhaps unbuffered mode could help; that's a little more research than I'd like to do right now.
All std libraries are thread safe but not "async" safe. So you can call the same functions from different threads but not on the same objects.