What is the equivalent of Windows .NET (C++) SpinWait on Linux and Mac OS X?

Windows .NET (C++) provides SpinWait for Hyper-threading friendly busy waiting with YIELD/PAUSE instructions. What is the equivalent function on Linux and Mac OS X? If a system call isn't available, how can an equivalent be implemented in user space?
See Windows Thread::SpinWait
See Long Duration Spin-wait Loops on Hyper-Threading Technology Enabled Intel Processors for a discussion of performance issues with spin waits.

See https://www.codeproject.com/articles/184046/spin-lock-in-c by sameer_87
#include "SpinLock.h"
#include <iostream>
using namespace LockFree;
using namespace std;
void tSpinWait::Lock(tSpinLock &LockObj)
{
m_iterations = 0;
while(true)
{
// A thread that already owns the lock must not wait to acquire it again (reentrant safe)
if(LockObj.dest == GetCurrentThreadId())
break;
/*
Spinning in a loop of interlockedxxx calls can reduce the available memory bandwidth and slow
down the rest of the system. Interlocked calls are expensive in their use of the system memory
bus. It is better to see if the 'dest' value is what it is expected and then retry interlockedxx.
*/
if(InterlockedCompareExchange(&LockObj.dest, LockObj.exchange, LockObj.compare) == 0)
{
//assign CurrentThreadId to dest to make it re-entrant safe
LockObj.dest = GetCurrentThreadId();
// lock acquired
break;
}
// spin wait to acquire
while(LockObj.dest != LockObj.compare)
{
if(HasThreasholdReached())
{
if(m_iterations + YIELD_ITERATION >= MAX_SLEEP_ITERATION)
Sleep(0);
if(m_iterations >= YIELD_ITERATION && m_iterations < MAX_SLEEP_ITERATION)
{
m_iterations = 0;
SwitchToThread();
}
}
// Yield processor on multi-processor but if on single processor then give other thread the CPU
m_iterations++;
if(Helper::GetNumberOfProcessors() > 1) { YieldProcessor(/*no op*/); }
else { SwitchToThread(); }
}
}
}
//
void tSpinWait::Unlock(tSpinLock &LockObj)
{
if(LockObj.dest != GetCurrentThreadId())
throw std::runtime_error("Unexpected thread-id in release");
// lock released
InterlockedCompareExchange(&LockObj.dest, LockObj.compare, GetCurrentThreadId());
}
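The code above is Windows-specific (InterlockedCompareExchange, YieldProcessor, SwitchToThread). On Linux and Mac OS X a similar effect can be approximated in user space with the CPU's pause hint plus a scheduler yield. The following is a minimal sketch, assuming GCC, Clang or MSVC on x86, where `_mm_pause` from `<immintrin.h>` emits PAUSE; on other targets it falls back to `std::this_thread::yield()`. The names `cpu_relax`, `spin_wait`, `SpinLock` and the threshold `kSpinsBeforeYield` are hypothetical and not part of any library.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#define SPIN_HAVE_PAUSE 1
#endif

// Hint to the CPU that this thread is in a spin-wait loop (hypothetical helper).
inline void cpu_relax() {
#ifdef SPIN_HAVE_PAUSE
    _mm_pause();                   // x86 PAUSE instruction
#else
    std::this_thread::yield();     // assumption: no pause intrinsic known for this target
#endif
}

// Rough analogue of Thread::SpinWait(iterations): issue a bounded number of pause hints.
inline void spin_wait(unsigned iterations) {
    for (unsigned i = 0; i < iterations; ++i) cpu_relax();
}

class SpinLock {
    std::atomic<bool> locked_{false};
    static constexpr unsigned kSpinsBeforeYield = 1024;  // arbitrary threshold
public:
    void lock() {
        unsigned spins = 0;
        for (;;) {
            // One atomic exchange attempts to take the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Wait on a plain load so spinning does not issue atomic writes.
            while (locked_.load(std::memory_order_relaxed)) {
                if (++spins < kSpinsBeforeYield) {
                    cpu_relax();
                } else {
                    std::this_thread::yield();  // let another thread run
                    spins = 0;
                }
            }
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }
};
```

Because `SpinLock` provides `lock()` and `unlock()`, it can be used with `std::lock_guard`. Unlike the code above, this sketch is not reentrant.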

Related

Interprocess signalling : POSIX Semaphores/Shared memory vs TCP Socket

I am making an application that has to communicate with another process on the same machine. I need to signal an error from a root-level process to the UI process, and I am using a semaphore for that. The root-level process runs in a loop and, on error, locks the semaphore and writes the error type into a shared memory segment. The UI process monitors the semaphore in a loop; if it finds the semaphore locked, it notifies the user of the error using the error type stored in the shared memory segment.
//err_sem= named semaphore
P1_root_lvl()
{
while(1)
{
//Do Work
if(error_type1)
{
err_flag=true;
sem_trywait(&err_sem);
shared_mem=type1;
}
else if(error_type2)
{
err_flag=true;
sem_trywait(&err_sem);
shared_mem=type2;
}
else if(ok_state==true && err_flag==true)
{
sem_post(&err_sem);
err_flag=false;
}
}
}
P2_UI()
{
long sleep_time=10000;
while(1)
{
sem_getvalue(&err_sem,&ok);
if(ok==1)
{
//All OK
}
else if(ok==0)
{
err_type=shared_mem;
}
usleep(sleep_time);
}
}
P1 is running on high priority.
I have many other modules running in the system at the same time.
What is the performance difference when I increase/decrease the sleep_time?
What is the performance difference if I use sockets to signal the UI process (P1 as client and P2 as server)?
What else can I do to optimise the mechanism I am using?
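With the polling loop above, a larger sleep_time means fewer wakeups in P2 but a longer delay before the UI notices an error. One alternative is to let the UI side block on the semaphore instead of polling it. The sketch below uses a different technique from the question's code: the semaphore counts pending errors, P1 posts it after writing the error type, and P2 sleeps in `sem_wait`. To stay self-contained it uses an unnamed semaphore and a thread standing in for the second process; `shared_error_type`, `report_error`, `wait_for_error` and `demo_error_signal` are hypothetical names, and a real setup would keep the named semaphore and the shared memory segment.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <semaphore.h>
#include <thread>

// Hypothetical stand-in for the shared memory segment in the question.
static std::atomic<int> shared_error_type{0};

// P1 side: publish the error type first, then wake the waiter.
inline void report_error(sem_t* sem, int type) {
    assert(sem != nullptr);
    shared_error_type.store(type, std::memory_order_release);
    sem_post(sem);
}

// P2 side: sleep in the kernel until an error is posted; no usleep() polling.
inline int wait_for_error(sem_t* sem) {
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);   // retry if a signal interrupted the wait
    return rc == 0 ? shared_error_type.load(std::memory_order_acquire) : -1;
}

// Demo: a thread plays the UI process so the example runs in one program.
inline int demo_error_signal() {
    sem_t sem;
    if (sem_init(&sem, 0, 0) != 0) return -1;   // count 0 means no error pending
    int seen = 0;
    std::thread ui([&] { seen = wait_for_error(&sem); });
    report_error(&sem, 2);                      // pretend error_type2 occurred
    ui.join();
    sem_destroy(&sem);
    return seen;
}
```

In this arrangement the waiting side uses no CPU while nothing is wrong, so there is no sleep_time to tune.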

CPU consumption with Serial Port Thread

I am writing a professional application and I have a problem with the serial port thread.
CPU consumption is too high. When I add SerialCtrl.h (from the SerialCtrl project, http://www.codeproject.com/Articles/99375/CSerialIO-A-Useful-and-Simple-Serial-Communication ) to my project, CPU usage climbs to 100% or more, whereas without it usage is near 40%.
I use Visual Studio C++ 2012 Professional, building a 32-bit ANSI MFC application (MT).
SerialCtrl.cpp
const unsigned short MAX_MESSAGE = 300;
IMPLEMENT_DYNCREATE(SerialThread,CWinThread)
SerialThread::SerialThread() :m_serialIO(NULL)
{
}
SerialThread::~SerialThread()
{
m_serialIO = NULL;
}
BOOL SerialThread::InitInstance()
{
return TRUE;
}
int SerialThread::Run()
{
// Check signal controlling and status to open serial communication.
while(1)
{
while(m_serialIO->GetProcessActivateValue()==TRUE)
{
if ((serialCtrl().GetPortStatus()==FALSE)&&m_serialIO->GetPortActivateValue()==TRUE)
{
if(serialCtrl().OpenPort(m_serialIO->m_DCB,m_serialIO->m_strPortName)==TRUE)
{
m_serialIO->OnEventOpen(TRUE);
}
else
{
m_serialIO->OnEventOpen(FALSE);
m_serialIO->SetPortActivate(FALSE);
}
}
else if (m_serialIO->GetPortActivateValue()==TRUE)
{
char message[MAX_MESSAGE]={0};
unsigned int lenBuff = MAX_MESSAGE;
unsigned long lenMessage;
if(serialCtrl().Read(message,lenBuff,lenMessage)==TRUE)
{
if(lenMessage>0)
m_serialIO->OnEventRead(message,lenMessage);
}
else
{
m_serialIO->SetProcessActivate(FALSE);
}
}
if (m_serialIO->GetSendActivateValue()==TRUE)
{
unsigned long nWritten;
if(serialCtrl().Write(m_serialIO->m_sendBuffer,m_serialIO->m_sendSize,nWritten)==TRUE)
{
m_serialIO->OnEventWrite(nWritten);
}
else
{
m_serialIO->OnEventWrite(-1);
}
m_serialIO->SetSendActivate(FALSE);
}
if (m_serialIO->m_bClosePort==TRUE)
{
if (serialCtrl().ClosePort()==TRUE)
{
m_serialIO->OnEventClose(TRUE);
}
else
{
m_serialIO->OnEventClose(FALSE);
}
m_serialIO->m_bClosePort=FALSE;
}
}
break;
}
return 0;
}
void SerialThread::ClosePort()
{
serialCtrl().ClosePort();
}
I guess that SerialThread::Run is where the issue is, but I could not find how to solve it (even after using profiling and other tools).
Do you have any ideas?
Thank you
I took a look at your code, and unfortunately the problem comes from the library/project you are using. Basically the all-in-one thread is just looping and never waiting anywhere, and this leads to 100% CPU consumption.
What you can do:
Add a Sleep() of 1 to 10 ms at the end of the inner while loop in the Run() method. This is the worst option; it just patches over the underlying problem.
Use another, better-designed library.
Make your own library suited to your use.
Some advice for making your own serial communication wrapper:
Everything you need to know about serial ports on Windows is here : Serial Communications.
An IO thread should always wait somewhere. It can be on a blocking IO call like ReadFile(), or on a Windows waitable object.
If you can, use overlapped IO, even if you don't use asynchronous calls. It will enable simultaneous read and write, and make the reads and writes cancellable (cleanly).
You only need a separate thread to read. And optionally another one to write via a message queue, if you want a completely asynchronous library.
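The rule that an I/O thread should always wait somewhere can be illustrated in portable C++. This is only a sketch of the principle, not a serial port implementation: a writer thread sleeps on a condition variable until a message is queued, where a Windows version would block in ReadFile() or on a waitable object instead. `BlockingWorker` and its members are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class BlockingWorker {
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> pending_;
    std::vector<std::string> handled_;
    bool stopping_ = false;
    std::thread worker_;   // declared last so the other members exist before it starts

    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            // The thread sleeps here; it uses no CPU while the queue is empty.
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;        // stop requested and queue drained
            std::string msg = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            // A real implementation would write msg to the port here.
            lock.lock();
            handled_.push_back(std::move(msg));
        }
    }

public:
    BlockingWorker() : worker_(&BlockingWorker::run, this) {}
    ~BlockingWorker() { stop(); }

    void send(std::string msg) {
        {
            std::lock_guard<std::mutex> guard(m_);
            pending_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Ask the worker to finish the queued messages, wait for it, and return what it handled.
    std::vector<std::string> stop() {
        {
            std::lock_guard<std::mutex> guard(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        if (worker_.joinable()) worker_.join();
        return handled_;
    }
};
```

The message queue here corresponds to the optional write thread mentioned above; a read thread would block on the read call itself.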

Waiting until another process locks and then unlocks a Win32 mutex

I am trying to tell when a producer process accesses a shared Windows mutex. After this happens, I need to lock that same mutex and process the associated data. Is there a built-in way in Windows to do this, short of a ridiculous loop?
I know the result is achievable by creating a custom Windows event in the producer process, but I want to avoid changing this program's code as much as possible.
What I believe will work (in a ridiculously inefficient way) would be this (NOTE: this is not my real code, I know there are like 10 different things very wrong with this; I want to avoid doing anything like this):
#include <Windows.h>
int main() {
HANDLE h = CreateMutex(NULL, 0, "name");
if(!h) return -1;
int locked = 0;
while(true) {
if(locked) {
//can assume it wont be locked longer than a second, but even if it does should work fine
if(WaitForSingleObject(h, 1000) == WAIT_OBJECT_0) {
// do processing...
locked = 0;
ReleaseMutex(h);
}
// oh god this is ugly, and wastes so much CPU...
} else if(!(locked = WaitForSingleObject(h, 0) == WAIT_TIMEOUT)) {
ReleaseMutex(h);
}
}
return 0;
}
If there is an easier way in C++, that is fine: my real code is C++. This example was just easier to construct in C.
You will not be able to avoid changing the producer if efficient sharing is needed. Your design is fundamentally flawed for that.
A producer needs to be able to signal a consumer when data is ready to be consumed, and to make sure it does not alter the data while it is busy being consumed. You cannot do that with a single mutex alone.
The best way is to have the producer set an event when data is ready, and have the consumer reset the event when the data has been consumed. Use the mutex only to sync access to the data, not to signal the data's readiness.
#include <Windows.h>
int main()
{
HANDLE readyEvent = CreateEvent(NULL, TRUE, FALSE, "ready");
if (!readyEvent) return -1;
HANDLE mutex = CreateMutex(NULL, FALSE, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(readyEvent, 1000) == WAIT_OBJECT_0)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
// process as needed...
ResetEvent(readyEvent);
ReleaseMutex(mutex);
}
}
}
return 0;
}
If you can't change the producer to use an event, then at least add a flag to the data itself. The producer can lock the mutex, update the data and flag, and unlock the mutex. Consumers will then have to periodically lock the mutex, check the flag and read the new data if the flag is set, reset the flag, and unlock the mutex.
#include <Windows.h>
int main()
{
HANDLE mutex = CreateMutex(NULL, FALSE, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
if (ready) // 'ready' is the flag stored alongside the shared data
{
// process as needed...
ready = false;
}
ReleaseMutex(mutex);
}
}
return 0;
}
So either way, your logic will have to be tweaked in both the producer and consumer.
Otherwise, if you can't change the producer at all, then you have no choice but to change the consumer alone to simply check the data for changes periodically:
#include <Windows.h>
int main()
{
HANDLE mutex = CreateMutex(NULL, 0, "name");
if (!mutex) return -1;
while(true)
{
if (WaitForSingleObject(mutex, 1000) == WAIT_OBJECT_0)
{
// check data for changes
// process new data as needed
// cache results for next time...
ReleaseMutex(mutex);
}
}
return 0;
}
Tricky. I'm going to answer the underlying question: when is the memory written?
This can be observed via a four step solution:
Inject a DLL in the watched process
Add a vectored exception handler for STATUS_GUARD_PAGE_VIOLATION
Set the guard page bit on the 2 MB memory range (finding it could be a challenge)
From the vectored exception handler, inform your process and re-establish the guard bit (it's one-shot)
You may need only a single guard page if the image is always fully rewritten.

c++ multithreading and affinity

I'm writing a simple thread pool for my application, which I test on a dual-core processor. Usually it works well, but I noticed that when other processes are using more than 50% of the processor, my application almost halts. This made me curious, so I decided to reproduce the situation and created an auxiliary application that simply runs an infinite loop (without multithreading), taking 50% of the processor. While the auxiliary one is running, the multithreaded application almost halts, as before (processing speed falls from 300-400 tasks per second to 5-10 tasks per second). But when I changed the process affinity of my multithreaded program to use only one core (the auxiliary still uses both), it started working, of course using at most the 50% of the processor that was left. When I disabled multithreading in my application (still processing the same tasks, but without the thread pool), it worked like a charm, without any slowdown from the auxiliary, which was still running (and that's how two applications should behave when running on two cores). But when I enable multithreading, the problem comes back.
I've made special code for testing this particular ThreadPool:
header
#ifndef THREADPOOL_H_
#define THREADPOOL_H_
typedef double FloatingPoint;
#include <queue>
#include <vector>
#include <mutex>
#include <atomic>
#include <condition_variable>
#include <thread>
using namespace std;
struct ThreadTask
{
int size;
ThreadTask(int s)
{
size = s;
}
~ThreadTask()
{
}
};
class ThreadPool
{
protected:
queue<ThreadTask*> tasks;
vector<std::thread> threads;
std::condition_variable task_ready;
std::mutex variable_mutex;
std::mutex max_mutex;
std::atomic<FloatingPoint> max;
std::atomic<int> sleeping;
std::atomic<bool> running;
int threads_count;
ThreadTask * getTask();
void runWorker();
void processTask(ThreadTask*);
bool isQueueEmpty();
bool isTaskAvailable();
void threadMethod();
void createThreads();
void waitForThreadsToSleep();
public:
ThreadPool(int);
virtual ~ThreadPool();
void addTask(int);
void start();
FloatingPoint getValue();
void reset();
void clearTasks();
};
#endif /* THREADPOOL_H_ */
and .cpp
#include "stdafx.h"
#include <climits>
#include <float.h>
#include "ThreadPool.h"
ThreadPool::ThreadPool(int t)
{
running = true;
threads_count = t;
max = FLT_MIN;
sleeping = 0;
if(threads_count < 2) //one worker thread has no sense
{
threads_count = (int)thread::hardware_concurrency(); //default value
if(threads_count == 0) //in case it fails ('If this value is not computable or well defined, the function returns 0')
threads_count = 2;
}
printf("%d worker threads\n", threads_count);
}
ThreadPool::~ThreadPool()
{
running = false;
reset(); //it will make sure that all worker threads are sleeping on condition variable
task_ready.notify_all(); //let them finish in natural way
for (auto& th : threads)
th.join();
}
void ThreadPool::start()
{
createThreads();
}
FloatingPoint ThreadPool::getValue()
{
waitForThreadsToSleep();
return max;
}
void ThreadPool::createThreads()
{
threads.clear();
for(int i = 0; i < threads_count; ++i)
threads.push_back(std::thread(&ThreadPool::threadMethod, this));
}
void ThreadPool::threadMethod()
{
while(running)
runWorker();
}
void ThreadPool::runWorker()
{
ThreadTask * task = getTask();
processTask(task);
}
void ThreadPool::processTask(ThreadTask * task)
{
if(task == NULL)
return;
//do something to simulate processing
vector<int> v;
for(int i = 0; i < task->size; ++i)
v.push_back(i);
delete task;
}
void ThreadPool::addTask(int s)
{
ThreadTask * task = new ThreadTask(s);
std::lock_guard<std::mutex> lock(variable_mutex);
tasks.push(task);
task_ready.notify_one();
}
ThreadTask * ThreadPool::getTask()
{
std::unique_lock<std::mutex> lck(variable_mutex);
if(tasks.empty())
{
++sleeping;
task_ready.wait(lck);
--sleeping;
if(tasks.empty()) //in case of ThreadPool being deleted (destructor calls notify_all), or spurious notifications
return NULL; //return to main loop and repeat it
}
ThreadTask * task = tasks.front();
tasks.pop();
return task;
}
bool ThreadPool::isQueueEmpty()
{
std::lock_guard<std::mutex> lock(variable_mutex);
return tasks.empty();
}
bool ThreadPool::isTaskAvailable()
{
return !isQueueEmpty();
}
void ThreadPool::waitForThreadsToSleep()
{
while(isTaskAvailable())
std::this_thread::yield(); //wait for all tasks to be taken
while(true) //wait for all threads to finish their last tasks
{
if(sleeping == threads_count)
break;
std::this_thread::yield();
}
}
void ThreadPool::clearTasks()
{
std::unique_lock<std::mutex> lock(variable_mutex);
while(!tasks.empty()) tasks.pop();
}
void ThreadPool::reset() //don't call this when var_mutex is already locked by this thread!
{
clearTasks();
waitForThreadsToSleep();
max = FLT_MIN;
}
how it's tested:
ThreadPool tp(2);
tp.start();
int iterations = 1000;
int task_size = 1000;
for(int j = 0; j < iterations; ++j)
{
printf("\r%d left", iterations - j);
tp.reset();
for(int i = 0; i < 1000; ++i)
tp.addTask(task_size);
tp.getValue();
}
return 0;
I've built this code with MinGW (GCC 4.8.1) and with Visual Studio 2012 (VC11) on Windows 7 64-bit, both in the debug configuration.
The two programs built with these compilers behave totally differently.
a) The program built with MinGW works much faster than the one built with VS when it can take the whole processor (the system shows almost 100% CPU usage by this process, so I don't think MinGW is secretly setting affinity to one core). But when I run the auxiliary program (using 50% of the CPU), it slows down greatly (by a factor of several dozen). CPU usage in this case is about 50%-50% for the main program and the auxiliary one.
b) The program built with VS 2012, when using the whole CPU, is even slower than a) with the slowdown (when I set task_size = 1, their speeds were similar). But when the auxiliary is running, the main program even takes most of the CPU (usage is about 66% main, 33% aux) and the resulting slowdown is barely noticeable.
When set to use only one core, both programs speed up noticeably (about 1.5 to 2 times), and the MinGW one stops being vulnerable to competition.
Well, now I don't know what to do. My program behaves differently when built by two different toolsets. Is this a flaw in my code (which I suppose is true), or something to do with the compilers having problems with C++11?
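One thing that stands out in the code above is that waitForThreadsToSleep() polls with std::this_thread::yield(), so the waiting thread competes for a core with the workers and with the auxiliary program. Whether this explains the slowdown is an assumption, not a confirmed diagnosis. A minimal sketch of waiting without polling, using a hypothetical `CompletionTracker` helper that is not part of the original pool:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Counts tasks that were added but not yet finished (hypothetical helper).
class CompletionTracker {
    std::mutex m_;
    std::condition_variable idle_;
    int outstanding_ = 0;

public:
    // Call when a task is queued.
    void task_added() {
        std::lock_guard<std::mutex> guard(m_);
        ++outstanding_;
    }

    // Call from a worker after it has processed a task.
    void task_finished() {
        std::lock_guard<std::mutex> guard(m_);
        assert(outstanding_ > 0);
        if (--outstanding_ == 0) idle_.notify_all();
    }

    // Blocks the caller until every added task has finished; no yield() loop.
    void wait_idle() {
        std::unique_lock<std::mutex> lock(m_);
        idle_.wait(lock, [this] { return outstanding_ == 0; });
    }

    int outstanding() {
        std::lock_guard<std::mutex> guard(m_);
        return outstanding_;
    }
};
```

In the pool above, addTask() would call task_added(), processTask() would call task_finished(), and getValue() or reset() would call wait_idle() in place of the two yield loops.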

Win32 Read/Write Lock Using Only Critical Sections

I have to implement a read/write lock in C++ using the Win32 api as part of a project at work. All of the existing solutions use kernel objects (semaphores and mutexes) that require a context switch during execution. This is far too slow for my application.
I would like to implement one using only critical sections, if possible. The lock does not have to be process safe, only thread safe. Any ideas on how to go about this?
If you can target Vista or later, you should use the built-in SRWLocks. They are lightweight like critical sections and run entirely in user mode when there is no contention.
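On Windows the SRWLock API consists of InitializeSRWLock, AcquireSRWLockShared / ReleaseSRWLockShared and AcquireSRWLockExclusive / ReleaseSRWLockExclusive. The sketch below shows the same shared/exclusive usage pattern with C++17's `std::shared_mutex` instead, so that it compiles on any platform; it is a substitute for illustration, and whether a given standard library builds `std::shared_mutex` on SRWLock is not something the example relies on. `SharedTable` is a hypothetical class.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class SharedTable {
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> values_;

public:
    // Writers take the lock exclusively (compare AcquireSRWLockExclusive).
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        values_[key] = value;
    }

    // Readers take the lock in shared mode (compare AcquireSRWLockShared),
    // so several readers can be inside get() at the same time.
    bool get(const std::string& key, int& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = values_.find(key);
        if (it == values_.end()) return false;
        out = it->second;
        return true;
    }
};
```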
Joe Duffy's blog has some recent entries on implementing different types of non-blocking reader/writer locks. These locks do spin, so they would not be appropriate if you intend to do a lot of work while holding the lock. The code is C#, but should be straightforward to port to native.
You can implement a reader/writer lock using critical sections and events - you just need to keep enough state to only signal the event when necessary to avoid an unnecessary kernel mode call.
I don't think this can be done without using at least one kernel-level object (Mutex or Semaphore), because you need the help of the kernel to make the calling process block until the lock is available.
Critical sections do provide blocking, but the API is too limited. e.g. you cannot grab a CS, discover that a read lock is available but not a write lock, and wait for the other process to finish reading (because if the other process has the critical section it will block other readers which is wrong, and if it doesn't then your process will not block but spin, burning CPU cycles.)
However what you can do is use a spin lock and fall back to a mutex whenever there is contention. The critical section is itself implemented this way. I would take an existing critical section implementation and replace the PID field with separate reader & writer counts.
Old question, but this is something that should work. It doesn't spin on contention. Readers incur limited extra cost if they have little or no contention, because SetEvent is called lazily (look at the edit history for a more heavyweight version that doesn't have this optimization).
#include <windows.h>
#include <assert.h>
typedef struct _RW_LOCK {
CRITICAL_SECTION countsLock;
CRITICAL_SECTION writerLock;
HANDLE noReaders;
int readerCount;
BOOL waitingWriter;
} RW_LOCK, *PRW_LOCK;
void rwlock_init(PRW_LOCK rwlock)
{
InitializeCriticalSection(&rwlock->writerLock);
InitializeCriticalSection(&rwlock->countsLock);
/*
* Could use a semaphore as well. There can only be one waiter ever,
* so I'm showing an auto-reset event here.
*/
rwlock->noReaders = CreateEvent (NULL, FALSE, FALSE, NULL);
rwlock->readerCount = 0;
rwlock->waitingWriter = FALSE;
}
void rwlock_rdlock(PRW_LOCK rwlock)
{
/*
* We need to lock the writerLock too, otherwise a writer could
* do the whole of rwlock_wrlock after the readerCount changed
* from 0 to 1, but before the event was reset.
*/
EnterCriticalSection(&rwlock->writerLock);
EnterCriticalSection(&rwlock->countsLock);
++rwlock->readerCount;
LeaveCriticalSection(&rwlock->countsLock);
LeaveCriticalSection(&rwlock->writerLock);
}
void rwlock_wrlock(PRW_LOCK rwlock)
{
EnterCriticalSection(&rwlock->writerLock);
/*
* readerCount cannot become non-zero within the writerLock CS,
* but it can become zero...
*/
if (rwlock->readerCount > 0) {
EnterCriticalSection(&rwlock->countsLock);
/* ... so test it again. */
if (rwlock->readerCount > 0) {
rwlock->waitingWriter = TRUE;
LeaveCriticalSection(&rwlock->countsLock);
WaitForSingleObject(rwlock->noReaders, INFINITE);
} else {
/* How lucky, no need to wait. */
LeaveCriticalSection(&rwlock->countsLock);
}
}
/* writerLock remains locked. */
}
void rwlock_rdunlock(PRW_LOCK rwlock)
{
EnterCriticalSection(&rwlock->countsLock);
assert (rwlock->readerCount > 0);
if (--rwlock->readerCount == 0) {
if (rwlock->waitingWriter) {
/*
* Clear waitingWriter here to avoid taking countsLock
* again in wrlock.
*/
rwlock->waitingWriter = FALSE;
SetEvent(rwlock->noReaders);
}
}
LeaveCriticalSection(&rwlock->countsLock);
}
void rwlock_wrunlock(PRW_LOCK rwlock)
{
LeaveCriticalSection(&rwlock->writerLock);
}
You could decrease the cost for readers by using a single CRITICAL_SECTION:
countsLock is replaced with writerLock in rdlock and rdunlock
rwlock->waitingWriter = FALSE is removed in rdunlock
wrlock's body is changed to
EnterCriticalSection(&rwlock->writerLock);
rwlock->waitingWriter = TRUE;
while (rwlock->readerCount > 0) {
LeaveCriticalSection(&rwlock->writerLock);
WaitForSingleObject(rwlock->noReaders, INFINITE);
EnterCriticalSection(&rwlock->writerLock);
}
rwlock->waitingWriter = FALSE;
/* writerLock remains locked. */
However, this is less fair, so I prefer the solution above.
Take a look at the book "Concurrent Programming on Windows" which has lots of different reference examples for reader/writer locks.
Check out the spin_rw_mutex from Intel's Threading Building Blocks ...
spin_rw_mutex is strictly in user-land
and employs spin-wait for blocking
This is an old question, but perhaps someone will find this useful. We developed a high-performance, open-source RWLock for Windows that automatically uses the Vista+ SRWLock that Michael mentioned, if available, and otherwise falls back to a userspace implementation.
As an added bonus, there are four different "flavors" of it (though you can stick to the basic one, which is also the fastest), each providing more synchronization options. They range from the basic RWLock(), which is non-reentrant, limited to single-process synchronization, and does not support swapping of read/write locks, to a full-fledged cross-process IPC RWLock with re-entrance support and read/write de-elevation.
As mentioned, they dynamically swap out to the Vista+ slim read-write locks for best performance when possible, but you don't have to worry about that at all as it'll fall back to a fully-compatible implementation on Windows XP and its ilk.
If you already know of a solution that only uses mutexes, you should be able to modify it to use critical sections instead.
We rolled our own using two critical sections and some counters. It suits our needs - we have a very low writer count, writers get precedence over readers, etc. I'm not at liberty to publish ours but can say that it is possible without mutexes and semaphores.
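The implementation described above was not published, so the following is not that code. It is only a minimal sketch of one way to give writers precedence over readers with counters, written with `std::mutex` and `std::condition_variable` rather than Win32 critical sections so that it is portable. `WriterPreferringRWLock` and its members are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class WriterPreferringRWLock {
    std::mutex m_;
    std::condition_variable cv_;
    int active_readers_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;

public:
    void lock_shared() {
        std::unique_lock<std::mutex> lock(m_);
        // New readers hold back while any writer is active or waiting.
        cv_.wait(lock, [this] { return !writer_active_ && waiting_writers_ == 0; });
        ++active_readers_;
    }

    void unlock_shared() {
        std::lock_guard<std::mutex> guard(m_);
        assert(active_readers_ > 0);
        if (--active_readers_ == 0) cv_.notify_all();   // a waiting writer may proceed
    }

    void lock() {
        std::unique_lock<std::mutex> lock(m_);
        ++waiting_writers_;
        cv_.wait(lock, [this] { return !writer_active_ && active_readers_ == 0; });
        --waiting_writers_;
        writer_active_ = true;
    }

    void unlock() {
        std::lock_guard<std::mutex> guard(m_);
        assert(writer_active_);
        writer_active_ = false;
        cv_.notify_all();   // wake both waiting writers and waiting readers
    }
};
```

With a steady stream of writers, readers can starve; that is the trade-off of writer precedence and suits a workload with a very low writer count.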
Here is the smallest solution that I could come up with:
http://www.baboonz.org/rwlock.php
And pasted verbatim:
/** A simple Reader/Writer Lock.
This RWL has no events - we rely solely on spinlocks and sleep() to yield control to other threads.
I don't know what the exact penalty is for using sleep vs events, but at least when there is no contention, we are basically
as fast as a critical section. This code is written for Windows, but it should be trivial to find the appropriate
equivalents on another OS.
**/
class TinyReaderWriterLock
{
public:
volatile uint32 Main;
static const uint32 WriteDesireBit = 0x80000000;
void Noop( uint32 tick )
{
if ( ((tick + 1) & 0xfff) == 0 ) // Sleep after 4k cycles. Crude, but usually better than spinning indefinitely.
Sleep(0);
}
TinyReaderWriterLock() { Main = 0; }
~TinyReaderWriterLock() { ASSERT( Main == 0 ); }
void EnterRead()
{
for ( uint32 tick = 0 ;; tick++ )
{
uint32 oldVal = Main;
if ( (oldVal & WriteDesireBit) == 0 )
{
if ( InterlockedCompareExchange( (LONG*) &Main, oldVal + 1, oldVal ) == oldVal )
break;
}
Noop(tick);
}
}
void EnterWrite()
{
for ( uint32 tick = 0 ;; tick++ )
{
if ( (tick & 0xfff) == 0 ) // Set the write-desire bit every 4k cycles (including cycle 0).
_InterlockedOr( (LONG*) &Main, WriteDesireBit );
uint32 oldVal = Main;
if ( oldVal == WriteDesireBit )
{
if ( InterlockedCompareExchange( (LONG*) &Main, -1, WriteDesireBit ) == WriteDesireBit )
break;
}
Noop(tick);
}
}
void LeaveRead()
{
ASSERT( Main != -1 );
InterlockedDecrement( (LONG*) &Main );
}
void LeaveWrite()
{
ASSERT( Main == -1 );
InterlockedIncrement( (LONG*) &Main );
}
};
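The comment in the code above says equivalents on other operating systems should be easy to find. As a sketch of what such a port might look like, here is the same scheme written with `std::atomic` and `std::this_thread::yield()` in place of the Interlocked functions and Sleep(0). The class name `AtomicRWLock`, the method names, and the chosen memory orders are assumptions of this sketch, not part of the original.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

class AtomicRWLock {
    std::atomic<uint32_t> state_{0};                      // reader count, or flags below
    static constexpr uint32_t kWriteDesire = 0x80000000u; // a writer wants the lock
    static constexpr uint32_t kWriteHeld   = 0xFFFFFFFFu; // a writer holds the lock

    static void backoff(uint32_t tick) {
        if (((tick + 1) & 0xfff) == 0) std::this_thread::yield();  // yield every 4k spins
    }

public:
    void enter_read() {
        for (uint32_t tick = 0;; ++tick) {
            uint32_t old = state_.load(std::memory_order_relaxed);
            if ((old & kWriteDesire) == 0 &&
                state_.compare_exchange_weak(old, old + 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
            backoff(tick);
        }
    }

    void leave_read() {
        assert(state_.load(std::memory_order_relaxed) != kWriteHeld);
        state_.fetch_sub(1, std::memory_order_release);
    }

    void enter_write() {
        for (uint32_t tick = 0;; ++tick) {
            // Announce the wish to write every 4k spins (including spin 0).
            if ((tick & 0xfff) == 0) state_.fetch_or(kWriteDesire, std::memory_order_relaxed);
            uint32_t expected = kWriteDesire;   // desire bit set and no readers left
            if (state_.compare_exchange_weak(expected, kWriteHeld,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return;
            backoff(tick);
        }
    }

    void leave_write() {
        assert(state_.load(std::memory_order_relaxed) == kWriteHeld);
        state_.fetch_add(1, std::memory_order_release);   // 0xFFFFFFFF + 1 wraps to 0
    }
};
```

Like the original, this lock only spins and yields, so it is a poor fit when locks are held for a long time.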
I wrote the following code using only critical sections.
class ReadWriteLock {
volatile LONG writelockcount;
volatile LONG readlockcount;
CRITICAL_SECTION cs;
public:
ReadWriteLock() {
InitializeCriticalSection(&cs);
writelockcount = 0;
readlockcount = 0;
}
~ReadWriteLock() {
DeleteCriticalSection(&cs);
}
void AcquireReaderLock() {
retry:
while (writelockcount) {
Sleep(0);
}
EnterCriticalSection(&cs);
if (!writelockcount) {
readlockcount++;
}
else {
LeaveCriticalSection(&cs);
goto retry;
}
LeaveCriticalSection(&cs);
}
void ReleaseReaderLock() {
EnterCriticalSection(&cs);
readlockcount--;
LeaveCriticalSection(&cs);
}
void AcquireWriterLock() {
retry:
while (writelockcount||readlockcount) {
Sleep(0);
}
EnterCriticalSection(&cs);
if (!writelockcount&&!readlockcount) {
writelockcount++;
}
else {
LeaveCriticalSection(&cs);
goto retry;
}
LeaveCriticalSection(&cs);
}
void ReleaseWriterLock() {
EnterCriticalSection(&cs);
writelockcount--;
LeaveCriticalSection(&cs);
}
};
To perform a spin-wait, comment out the lines with Sleep(0).
See my implementation here:
https://github.com/coolsoftware/LockLib
VRWLock is a C++ class that implements single-writer, multiple-reader logic.
See also the test project TestLock.sln.
UPD. Below is simple code for the reader and the writer:
LONG gCounter = 0;
// reader
for (;;) //loop
{
LONG n = InterlockedIncrement(&gCounter);
// n = value of gCounter after increment
if (n <= MAX_READERS) break; // writer does not write anything - we can read
InterlockedDecrement(&gCounter);
}
// read data here
InterlockedDecrement(&gCounter); // release reader
// writer
for (;;) //loop
{
LONG n = InterlockedCompareExchange(&gCounter, (MAX_READERS+1), 0);
// n = value of gCounter before attempt to replace it by MAX_READERS+1 in InterlockedCompareExchange
// if gCounter was 0 - no readers/writers and in gCounter will be MAX_READERS+1
// if gCounter was not 0 - gCounter stays unchanged
if (n == 0) break;
}
// write data here
InterlockedExchangeAdd(&gCounter, -(MAX_READERS+1)); // release writer
The VRWLock class supports a spin count and a thread-specific reference count, which allows releasing the locks of terminated threads.