IPC through shared memory (SHM) in C++ and concurrent memory accesses

I am trying to improve the performance of a framework I use.
Currently it makes use of a shared memory space (SHM) to allow for inter-process communication between two C++ threads. Control of the SHM is passed through a semaphore. This is the current system, which works quite well, albeit slightly slower than I'd like and with a need for flags for communication.
I have thought of using a master/slave configuration where each signal is only ever driven by one side. Hence a signal such as Slave_Ready would be written by the slave and read by the master to show that the slave can take a request.
I would expect this behaviour to be supported, since only one side ever writes to a given signal. However, while the slave is polling the master-driven signals, the master seems unable to change their values. I've been debugging this in Eclipse, and when I step through the write instruction it simply never takes effect. This is what it looks like:
shmp->MREADY = true; // at the same time, the slave is polling this signal
So this instruction never goes through.
By my understanding, a concurrent read should be irrelevant: the write should go through, or it should be handled as an atomic request by the memory controller. Even if a read happens halfway through a write, I do not have an issue with corrupted data. If the read sees a true, it goes through and accesses data that was written before the ready signal was asserted. If it reads a false, it accesses the signals the next cycle. Either way the data integrity is preserved. Hence I would expect this to work without issue, but there is obviously something else at play here.
Is it that concurrent reads/writes aren't supported? Does the constant stream of polling read requests drown out the write request?

I would suggest using C++11's std::atomic<> for the MREADY member, or at least issuing std::atomic_thread_fence(std::memory_order_acq_rel) after the assignment.
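A minimal sketch of that suggestion, assuming shmp points to a struct of roughly this shape placed in the shared segment (the member name MREADY comes from the question; note that std::atomic is only safe across processes when it is lock-free, which you can check with is_lock_free()):

#include <atomic>

struct SharedBlock {
    std::atomic<bool> MREADY{false};   // master-driven ready flag
    // ... payload the master fills in before raising MREADY ...
};

// Master side: write the payload, then publish it by releasing the flag.
void master_publish(SharedBlock* shmp) {
    // ... fill in the payload here ...
    shmp->MREADY.store(true, std::memory_order_release);
}

// Slave side: poll with acquire semantics; once the flag reads true,
// everything written before the release store above is visible.
void slave_wait(SharedBlock* shmp) {
    while (!shmp->MREADY.load(std::memory_order_acquire)) {
        // spin or yield
    }
    // ... consume the payload ...
}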

Related

`sqlite3` ignores `sqlite3_busy_timeout`?

I use sqlite3 in a multi-threaded application (it is compiled with SQLITE_THREADSAFE=2). In the watch window I see that sqlite->busyTimeout == 600000, i.e. it is supposed to have a 10-minute timeout. However, sqlite3_step returns SQLITE_BUSY obviously faster than 10 minutes (it actually returns instantly, as if I had never called sqlite3_busy_timeout).
What is the reason that sqlite3 ignores the timeout and returns an error instantly?
One possibility: SQLite ignores the timeout when it detects a deadlock.
The scenario is as follows. Transaction A starts as a reader and later attempts to perform a write. Transaction B is a writer (it either started that way, or started as a reader and was promoted to a writer first). B holds a RESERVED lock, waiting for readers to clear so it can start writing. A holds a SHARED lock (it's a reader) and tries to acquire a RESERVED lock (so it can start writing). For a description of the various lock types, see http://sqlite.org/lockingv3.html
The only way to make progress in this situation is for one of the transactions to roll back. No amount of waiting will help, so when SQLite detects this situation, it doesn't honor the busy timeout.
There are two ways to avoid the possibility of a deadlock:
Switch to WAL mode - it allows one writer to co-exist with multiple readers.
Use BEGIN IMMEDIATE to start a transaction that may eventually need to write - this way, it starts as a writer right away. This of course reduces the potential concurrency in the system, as the price of avoiding deadlocks. (A sketch of both options follows below.)
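A hedged sketch of both suggestions through the SQLite C API (the database path "app.db" and the table t are placeholders; error handling is trimmed for brevity):

#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    if (sqlite3_open("app.db", &db) != SQLITE_OK) return 1;

    // WAL mode: one writer can now coexist with multiple readers.
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", nullptr, nullptr, nullptr);

    // Acquire the write lock up front so the transaction can never be
    // promoted from reader to writer mid-flight (the deadlock scenario above).
    sqlite3_exec(db, "BEGIN IMMEDIATE;", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "UPDATE t SET n = n + 1;", nullptr, nullptr, nullptr);
    sqlite3_exec(db, "COMMIT;", nullptr, nullptr, nullptr);

    sqlite3_close(db);
    return 0;
}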
I ran a lot of tests and share them here for other people who use SQLite in a multithreaded environment. SQLite threading support is not well documented; there is no good tutorial that describes all the threading issues in one place. I made a test program that creates 100 threads and sends random queries (INSERT, SELECT, UPDATE, DELETE) concurrently to a single database. My answer is based on observations from this program.
The only really thread-safe journal mode is WAL. It allows multiple connections to do anything they need with the same database within one process, in the same manner as a single-threaded application does. The other modes are not thread safe, regardless of timeouts, busy handlers and the SQLITE_THREADSAFE preprocessor definition. They generate SQLITE_BUSY periodically, and it is a complex programming task to expect and handle that error everywhere. If you need thread-safe SQLite that never returns SQLITE_BUSY, as a single thread does, you have to set the WAL journal mode.
Additionally, you have to build with the SQLITE_THREADSAFE=2 or SQLITE_THREADSAFE=1 preprocessor definition.
Once that is done, you have to choose between two options:
You can call sqlite3_busy_timeout. It is enough; you are not required to also call sqlite3_busy_handler, even though that is not obvious from the documentation. It gives you the default, built-in timeout functionality (see the sketch below).
You can call sqlite3_busy_handler and implement the timeout yourself. I don't see why, but maybe it is required on some nonstandard OS. Note that calling sqlite3_busy_handler resets the timeout to 0 (i.e. disabled). For desktop Linux and Windows you don't need it unless you like writing more complex code.
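For illustration, a minimal per-connection setup along the lines described above (WAL journal mode plus the built-in busy timeout; the 600000 ms value mirrors the question, and the helper name is made up):

#include <sqlite3.h>

sqlite3* open_connection(const char* path) {
    sqlite3* db = nullptr;
    if (sqlite3_open(path, &db) != SQLITE_OK) return nullptr;
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", nullptr, nullptr, nullptr);
    sqlite3_busy_timeout(db, 600000);  // built-in timeout; no separate busy handler needed
    return db;
}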

C/C++ - ring buffer in shared memory (POSIX compatible)

I have an application where producers and consumers ("clients") want to send broadcast messages to each other, i.e. an n:m relationship. All of them could be different programs, so they are different processes and not threads.
To reduce the n:m relationship to something more maintainable, I was thinking of introducing a small central server. That server would offer a socket that each client connects to.
And each client would send a new message through that socket to the server - resulting in 1:n.
The server would also offer a shared memory that is read only for the clients. It would be organized as a ring buffer where the new messages would be added by the server and overwrite older ones.
This would give the clients some time to process a message - but if a client is too slow, it's bad luck; the message wouldn't be relevant anymore anyway...
The advantage I see in this approach is that I avoid synchronisation as well as unnecessary data copying and buffer hierarchies; the central one should be enough, shouldn't it?
That's the architecture so far - I hope it makes sense...
Now to the more interesting aspect of implementing that:
The index of the newest element in the ring buffer is a variable in shared memory and the clients would just have to wait till it changes. Instead of a stupid while( central_index == my_last_processed_index ) { /* do nothing */ } I want to free CPU resources, e.g. by using a pthread_cond_wait().
But that needs a mutex that I think I don't need - on the other hand, "Why do pthreads' condition variable functions require a mutex?" gave me the impression that I'd better ask whether my architecture makes sense and could be implemented like that...
Can you give me a hint if all of that makes sense and could work?
(Side note: the client programs could also be written in the common scripting languages like Perl and Python. So the communication with the server has to be recreated there and thus shouldn't be too complicated or even proprietary)
If memory serves, the reason for the mutex accompanying a condition variable is that under POSIX, signalling the condition variable causes the kernel to wake up all waiters on it. In these circumstances, the first thing consumer threads need to do is check that there is something to consume - by accessing a variable shared between producer and consumer threads. The mutex protects against concurrent access to the variable used for this purpose. This of course means that if there are many consumers, n-1 of them are needlessly awoken.
Having implemented precisely the arrangement described above, the choice of IPC object to use is not obvious. We were buffering audio between high-priority real-time threads in separate processes, and didn't want to block the consumer. As the audio was produced and consumed in real time, we were already getting scheduled regularly on both ends, and if there was nothing to consume (or no space to produce into) we discarded the data because we'd already missed the deadline.
In the arrangement you describe, you will need a mutex to prevent the consumers concurrently consuming items that are queued (and believe me, on a lightly loaded SMP system, they will). However, you don't need to have the producer contend on this as well.
I don't understand your comment about the consumer having read-only access to the shared memory. In the classic lockless ring buffer implementation, the producer writes the queue tail pointer and the consumer(s) the head - whilst all parties need to be able to read both.
You might of course arrange for the queue head and tails to be in a different shared memory region to the queue data itself.
Also be aware that there is a theoretical data coherency hazard on SMP systems when implementing a ring buffer such as this - namely that write-back to memory of the queue content may occur out of order with respect to the head or tail pointer (the data sits in caches, usually per CPU core). There are other variants on this theme to do with synchronization of caches between CPUs. To guard against these, you need memory, load and store barriers to enforce ordering. See Memory Barrier on Wikipedia. You avoid this hazard entirely by using kernel synchronisation primitives such as mutexes and condition variables.
The C11 atomic operations can help with this.
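To make that concrete, here is an illustrative single-producer/single-consumer ring buffer using C++11 atomics with acquire/release ordering (C11's stdatomic.h offers the equivalent operations). The structure, names and fixed capacity are assumptions for this sketch, and the struct would live in the shared segment:

#include <atomic>
#include <cstddef>

struct Ring {
    static constexpr std::size_t N = 1024;     // capacity (power of two)
    std::atomic<std::size_t> head{0};          // written only by the consumer
    std::atomic<std::size_t> tail{0};          // written only by the producer
    int data[N];
};

bool push(Ring& r, int value) {                // producer only
    std::size_t t = r.tail.load(std::memory_order_relaxed);
    if (t - r.head.load(std::memory_order_acquire) == Ring::N)
        return false;                          // full
    r.data[t % Ring::N] = value;
    r.tail.store(t + 1, std::memory_order_release);   // publish the element
    return true;
}

bool pop(Ring& r, int& value) {                // consumer only
    std::size_t h = r.head.load(std::memory_order_relaxed);
    if (h == r.tail.load(std::memory_order_acquire))
        return false;                          // empty
    value = r.data[h % Ring::N];
    r.head.store(h + 1, std::memory_order_release);   // free the slot
    return true;
}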
You do need a mutex with pthread_cond_wait() as far as I know. The reason is that checking your condition and then waiting on it is not atomic by itself: the condition could change between the check and the wait, and you would miss the wakeup, unless both steps are protected by the mutex.
It's possible that you can ignore this situation - the client might sleep past message 1, but when the subsequent message is sent the client will wake up and find two messages to process. If that's unacceptable, then use a mutex.
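For reference, the canonical pattern both answers describe looks roughly like this (variable names are placeholders; for use across processes the mutex and condition variable would additionally need the PTHREAD_PROCESS_SHARED attribute and would have to live in the shared segment):

#include <pthread.h>

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
int central_index = 0;               // written by the server
int my_last_processed_index = 0;     // per-client

void wait_for_new_messages() {       // client side
    pthread_mutex_lock(&lock);
    while (central_index == my_last_processed_index)
        pthread_cond_wait(&changed, &lock);    // atomically unlocks and sleeps
    // ... read the new ring-buffer entries here ...
    my_last_processed_index = central_index;
    pthread_mutex_unlock(&lock);
}

void publish_new_index(int new_index) {        // server side
    pthread_mutex_lock(&lock);
    central_index = new_index;
    pthread_mutex_unlock(&lock);
    pthread_cond_broadcast(&changed);          // wake all waiting clients
}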
You could probably have a somewhat different design by using sem_t, if your system has it; some POSIX systems are still stuck on the 2001 version of POSIX.
You don't necessarily need a mutex/condition pair; that is just how it was designed a long time ago for POSIX.
Modern C (C11) and C++ (C++11) now bring you (or will bring you) atomic operations, a feature that is implemented in all modern processors but lacked support from most higher-level languages. Atomic operations are part of the answer for resolving races in a ring buffer such as the one you want to implement. But they are not sufficient on their own, because with them you can only do an active wait through polling, which is probably not what you want.
Linux, as an extension to POSIX, has the futex, which resolves both problems: it avoids races on updates by using atomic operations, and it provides the ability to put waiters to sleep via a system call. Futexes are often considered too low level for everyday programming, but I think they actually aren't too difficult to use. I have written up things here.
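A very rough, Linux-only sketch of the futex idea (raw syscall, since glibc provides no wrapper; the helper names are made up, and the waited-on word would live in the shared memory region):

#include <atomic>
#include <climits>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

// Sleep only if *addr still holds `expected`, so a concurrent update
// cannot be missed between the check and the sleep.
void futex_wait(std::atomic<int>* addr, int expected) {
    syscall(SYS_futex, addr, FUTEX_WAIT, expected, nullptr, nullptr, 0);
}

// Wake every process/thread currently sleeping on addr.
void futex_wake_all(std::atomic<int>* addr) {
    syscall(SYS_futex, addr, FUTEX_WAKE, INT_MAX, nullptr, nullptr, 0);
}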

VNC viewer implementation

Our team is implementing a VNC viewer (=VNC client) on Windows. The protocol (called RFB) is stateful, meaning that the viewer has to read 1 byte, see what it is, then read either 3 or 10 bytes more, parse them, and so on.
We've decided to use asynchronous sockets and a single (UI) thread. Consequently, there are 2 ways to go:
1) state machine -- if we get a block on socket reading, just remember the current state and quit. Later on, a socket notification will arrive and the interrupted logic will resume from the proper stage;
2) inner message loop -- once we determine that reading from the socket would block, we enter an inner message loop and spin there until all the necessary data is finally received.
The UI is thus not frozen in case of a block.
As experience showed, the second approach is bad, as any message can come while we're in the inner message loop. I cannot tell the full story here, but it simply is not reliable enough. Crashes and kludges.
The first option seems to be quite acceptable, but it is not easy to program in such a style. One has to remember the state of an algorithm and values of all the local variables required for further processing.
It is quite possible to use multiple threads, but we thought the problems in that case would be even harder: synchronization of frame-buffer access, general multi-threading issues, etc. Moreover, even in this variant it seems necessary to use asynchronous sockets as well.
So, which way is, in your opinion, the best?
The problem is quite a general one. This is the problem of organizing asynchronous communication through stateful protocols.
Edit 1: We use C++ and MFC as UI framework.
I've done a few parallel computing projects and it seems that MPI (Message Passing Interface) might be helpful to your VNC project. You're probably not so interested in the parallel computing power provided by MPI, but you may want to use the simplified socket-like interface for asynchronous communication over a network.
http://www.open-mpi.org/
You can find other implementations of MPI and tons of use examples from google.
Don't bother with CSocket; you'll move to CAsyncSocket in the end because of the extra control you get (interrupting, shutting down, etc.). I'd also recommend using a separate thread to manage the communication; it adds complexity, but keeping the UI responsive should be a top priority.
I think you will find that your design will be simplified greatly by using a separate thread to handle a blocking socket.
The main reason for this is that you don't need to spin and wait. The UI remains responsive, while the network thread blocks when it has nothing to do and resumes when it does. You are effectively offloading a large portion of your overhead to the OS.
Remember, RFB does not require a whole lot of state information to work. Client-to-server messages are short, and there is nothing requiring you to receive a frame-buffer update before you send your next pointer input.
My point is that messages in RFB can be intermixed; the server will work on your schedule.
Now, Windows provides easy-to-use synchronization APIs that, while not always the most efficient, are more than enough for your purposes and will ease getting a proof of concept up and running.
Take a look at Windows Synchronization, and specifically at Critical Sections.
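As an illustration of that suggestion, a hedged sketch guarding the frame buffer with a critical section between the network thread and the UI thread (buffer size and function names are assumptions):

#include <windows.h>
#include <cstddef>
#include <cstring>

CRITICAL_SECTION g_fbLock;
unsigned char g_frameBuffer[640 * 480 * 4];    // assumed 32-bit frame buffer

void init_sync() { InitializeCriticalSection(&g_fbLock); }

// Network thread: apply a decoded RFB rectangle into the frame buffer.
void apply_update(const unsigned char* pixels, std::size_t len, std::size_t offset) {
    EnterCriticalSection(&g_fbLock);
    std::memcpy(g_frameBuffer + offset, pixels, len);
    LeaveCriticalSection(&g_fbLock);
}

// UI thread: take a consistent copy of the buffer before painting.
void snapshot(unsigned char* dst) {
    EnterCriticalSection(&g_fbLock);
    std::memcpy(dst, g_frameBuffer, sizeof(g_frameBuffer));
    LeaveCriticalSection(&g_fbLock);
}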
Just my 2 cents; I've implemented both a VNC server and a client on Windows, and these were my impressions.

Proper message queue usage in POSIX

I'm quite bewildered by the use of message queues in a realtime OS. The code that was given to me seems to use message queues down to the bone: even passing variables to another class object is done through an MQ. I always thought of MQs as being used for IPC. The question is: what is the proper use of a message queue?
In realtime OS environments you often face the problem that you have to guarantee execution of code on a fixed schedule. E.g. you may have a function that gets called exactly every 10 milliseconds. Not earlier, not later.
To guarantee such hard timing constraints you have to write code that must not block the time critical code under any circumstances.
The POSIX thread synchronization primitives from <pthread.h> cannot be used here.
You must never lock a mutex or acquire a semaphore from time-critical code, because a different process/thread may already have it locked. However, you are often allowed to unblock some other thread from time-critical code (e.g. releasing a semaphore is okay).
In such environments message queues are a nice choice to exchange data because they offer a clean way to pass data from one thread to another without ever blocking.
Using queues to just set variables may sound like overkill, but it is very good software design. If you do it that way you have a well-defined interface to your time critical code.
Also it helps you write deterministic code, because you'll never run into race conditions. If you set variables via message queues, you can be sure that the time-critical code sees the messages in the same order as they were sent. When you mix direct memory access and messages, you can't guarantee this.
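As a concrete illustration of that pattern (not taken from the code in question), a POSIX message-queue sketch in which the time-critical side posts without ever blocking and a worker blocks on the receive; the queue name, sizes and struct are placeholders (link with -lrt on Linux):

#include <mqueue.h>
#include <fcntl.h>

struct Command { int id; double value; };

// Time-critical side: with O_NONBLOCK, mq_send never blocks;
// if the queue is full the send simply fails and the message is dropped.
void post_command(mqd_t q, const Command& c) {
    mq_send(q, reinterpret_cast<const char*>(&c), sizeof(c), 0);
}

int main() {
    mq_attr attr{};
    attr.mq_maxmsg  = 16;
    attr.mq_msgsize = sizeof(Command);

    mqd_t q = mq_open("/ctrl", O_CREAT | O_RDWR | O_NONBLOCK, 0600, &attr);

    post_command(q, Command{1, 3.14});

    // A worker thread/process would open "/ctrl" without O_NONBLOCK and
    // block in mq_receive(), processing commands in the order they were sent.
    mq_close(q);
    return 0;
}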
Message queues are predominantly used as an IPC mechanism whenever data needs to be exchanged between two different processes. However, sometimes message queues are also used to switch processing to another thread's context. For example:
You register some callback with a software layer that sits on top of a driver. The callback is invoked in the context of the driver, i.e. on a thread spawned by the driver. You cannot hog this driver thread by doing a lot of processing in it, so you may push the data returned in the callback onto a message queue, on which application threads are blocked and ready to process the data.
I don't see why one should use message queues to replace plain function calls.

Is there a way to abort an SQLite call?

I'm using SQLite3 in a Windows application. I have the source code (so-called SQLite amalgamation).
Sometimes I have to execute heavy queries. That is, I call sqlite3_step on a prepared statement, and it takes a lot of time to complete (due to the heavy I/O load).
I wonder if there's a possibility to abort such a call. I would also be glad if there was an ability to do some background processing in the middle of the call within the same thread (since most of the time is spent in waiting for the I/O to complete).
I thought about modifying the SQLite code myself. In the simplest scenario I could check some condition (like an abort event handle for instance) before every invocation of either ReadFile/WriteFile, and return an error code appropriately. And in order to allow the background processing the file should be opened in the overlapped mode (this enables asynchronous ReadFile/WriteFile).
Is there a chance that interruption of WriteFile may in some circumstances leave the database in the inconsistent state, even with the journal enabled? I guess not, since the whole idea of the journal file is to be prepared for any error of any kind. But I'd like to hear more opinions about this.
Also, has anyone tried something similar?
Thanks in advance.
EDIT:
Thanks to ereOn. I wasn't aware of the existence of sqlite3_interrupt. This probably answers my question.
Now, for all of you who wonder how (and why) one would do some background processing during the I/O within the same thread:
Unfortunately not many people are familiar with so-called "Overlapped I/O".
http://en.wikipedia.org/wiki/Overlapped_I/O
Using it, one issues an I/O operation asynchronously, and the calling thread is not blocked. One then receives the I/O completion status through one of the completion mechanisms: a waitable event, a completion routine queued as an APC, or a completion port.
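To make the idea concrete, a hedged Win32 sketch using the event-based completion mechanism (the handle must have been opened with FILE_FLAG_OVERLAPPED; names and error handling are simplified):

#include <windows.h>

// Issue an asynchronous read; returns true if it completed or is in flight.
bool start_read(HANDLE file, void* buf, DWORD size, OVERLAPPED& ov) {
    ZeroMemory(&ov, sizeof(ov));
    ov.hEvent = CreateEvent(nullptr, TRUE, FALSE, nullptr);  // manual-reset event
    if (!ReadFile(file, buf, size, nullptr, &ov) &&
        GetLastError() != ERROR_IO_PENDING)
        return false;                  // real failure
    return true;                       // pending (or already satisfied)
}

DWORD finish_read(HANDLE file, OVERLAPPED& ov) {
    // ... do background processing here, then collect the result ...
    DWORD transferred = 0;
    GetOverlappedResult(file, &ov, &transferred, TRUE);  // TRUE = wait now
    CloseHandle(ov.hEvent);
    return transferred;
}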
Using this technique one doesn't have to create extra threads. Actually, the only real justification for creating threads is when your bottleneck is computation time (i.e. CPU load) and the machine has several CPUs (or cores).
Creating a thread just to let it be blocked by the OS most of the time doesn't make sense. It leads to an unjustified waste of OS resources and complicates the program (need for synchronization, etc.).
Unfortunately, not all libraries/APIs allow an asynchronous mode of operation, which makes creating extra threads a necessary evil.
EDIT2:
I've already found the solution, thanks to ereOn.
For all those who nevertheless insist that it's not worth doing things "in the background" while "waiting" for the I/O to complete using overlapped I/O: I disagree, and I think there's no point in arguing about this. In any case, it is not related to the subject.
I'm a Windows programmer (as you may have noticed), and I have very extensive experience with all kinds of multitasking. I'm also a driver writer, so I know how things work "behind the scenes".
I know that it's "common practice" to create several threads to do several things "in parallel". But that doesn't mean it is good practice. Please allow me not to follow the "common practice".
I don't understand why you want the interruption to come from the same thread, and I don't even understand how that would be possible: if the current thread is blocked waiting for some I/O, you can't execute any other code. (Yes, that's what "blocked" means.)
Perhaps if you give us more hints about why you want this, we might help further.
Usually, I use sqlite3_interrupt() to cancel calls. But this, obviously, requires that the call be made from another thread.
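A minimal sketch of that approach, using std::thread as the "other thread" (the fixed five-second watchdog is purely illustrative; an interrupted step returns SQLITE_INTERRUPT):

#include <sqlite3.h>
#include <chrono>
#include <thread>

void run_heavy_query(sqlite3* db, sqlite3_stmt* stmt) {
    // Watchdog thread: after the deadline, abort whatever the handle is doing.
    std::thread watchdog([db] {
        std::this_thread::sleep_for(std::chrono::seconds(5));
        sqlite3_interrupt(db);
    });

    int rc;
    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        // ... process the row ...
    }
    // rc is SQLITE_DONE on success, or SQLITE_INTERRUPT if the watchdog fired.
    watchdog.join();
}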
By default, SQLite is threadsafe. It sounds to me like the easiest thing to do would be to start the SQLite command on a background thread and let SQLite do the necessary locking for that to work.
From your perspective, then, the SQLite call looks like an asynchronous bit of I/O, and you can continue normal processing on this thread, such as a loop with an interruptible sleep and a bit of occasional background processing (e.g. updating a liveness indicator). When the SQLite statement completes, the background thread should set a state variable to indicate this, wake the main thread (if necessary), and terminate.