C++ Boost.Asio - TCP socket asynchronous write

Scenario:
Inside object A (thread A), boost::asio::ip::tcp::socket is being read from and written to asynchronously.
Object B (thread B) posts data to object A's data queue.
Object A should write the data in its data queue as soon as possible.
How to achieve the third point efficiently?
Right now I'm doing this:
There might be no data in the queue.
socket->async_send(data, handler);
inside handler: back to point two.
I'm worried that this approach is highly inefficient - calling async_send with zero-length data most of the time until actual data can be sent.
Might a better approach be to have an additional thread inside object A that performs synchronous writes on the socket as soon as new data is posted? Performing the write from object B's thread is out of the question.

Well, firstly, unless you have a good reason to do so, I personally wouldn't break it down into one thread per object.
Instead, have a shared io_service (just pass it by reference to the constructors of both A and B), then have a single thread call io_service.run().
Assuming one of the objects is also doing an async_read, you needn't write zero-length data and loop in the handler. Just schedule the async_write as and when data comes in.
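The "write only when data comes in" idea can be sketched without a socket. The block below is a single-threaded model, not Boost.Asio code: FakeLoop stands in for io_service, and Connection, start_write and on_write_done are hypothetical names. In real code start_write would call boost::asio::async_write and on_write_done would be its completion handler.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Stand-in for an io_service: a plain list of pending handlers.
// This is NOT Boost.Asio, only a single-threaded model of "post and run".
struct FakeLoop {
    std::deque<std::function<void()>> pending;
    void post(std::function<void()> f) { pending.push_back(std::move(f)); }
    void run() {
        while (!pending.empty()) {
            auto f = std::move(pending.front());
            pending.pop_front();
            f();
        }
    }
};

// Hypothetical connection object (object A in the question).
class Connection {
public:
    explicit Connection(FakeLoop& loop) : loop_(loop) {}

    // Runs on the loop whenever a producer has data for us.
    void send(std::string msg) {
        queue_.push_back(std::move(msg));
        if (!writing_) start_write();  // start only if no write is in flight
    }

    const std::vector<std::string>& wire() const { return wire_; }
    int writes_started() const { return writes_started_; }

private:
    void start_write() {
        writing_ = true;
        ++writes_started_;
        // Simulated async write: the "completion handler" runs later.
        loop_.post([this] {
            wire_.push_back(queue_.front());
            on_write_done();
        });
    }

    void on_write_done() {
        queue_.pop_front();
        if (!queue_.empty()) start_write();  // more data: chain the next write
        else writing_ = false;               // idle: nothing is scheduled
    }

    FakeLoop& loop_;
    std::deque<std::string> queue_;
    std::vector<std::string> wire_;  // what "went out" on the fake socket
    bool writing_ = false;
    int writes_started_ = 0;
};

// Three messages posted, three writes issued, no zero-length writes.
inline int demo_write_queue() {
    FakeLoop loop;
    Connection conn(loop);
    loop.post([&] { conn.send("a"); });
    loop.post([&] { conn.send("b"); });
    loop.post([&] { conn.send("c"); });
    loop.run();
    return conn.writes_started();
}
```

Object B would hand data over by posting a call to send() onto the shared loop, so the queue is only ever touched from the loop's thread and needs no lock of its own.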

"Object A should write the data in its data queue as soon as possible" may be understood as waiting for a C++ future, so you check that answer and that boost::asio::example and last but not least I presume that some improvements will be required on your "data queue" you could have a look to that answer.

Related

safely distributing a pointer update between threads

tl;dr:
class Controller
{
public:
volatile Netconsole* nc;
void init(); //initialize the threads
void calculate(); // handler for the "mothership app"
void senderThreadLoop(); //also calls reinitNet() if connection is broken.
void listenerThreadLoop();
inline void reinitNet(){ delete nc; nc = new Netconsole(); }
};
// inside
Json::Value header = nc->Recv();
error: passing 'volatile Netconsole' as 'this' argument discards qualifiers [-fpermissive]
Pointer to an instance of a utility class (Netconsole) shared between two threads must be updated inside both threads if the utility class is re-instantiated, but declaring it as volatile generates the above error. If it's updated just inside one thread, the other thread may still use old, invalid pointer. How to assure it's updated in both but using methods through the pointer doesn't trigger the above error?
Extended info:
The "smart glue logic" library I'm writing is used to pass and convert messages between a 3rd party software and a custom device. It consists of three essential threads:
a handler: the main thread of the 3rd party app periodically calls a "calculate" function in my library to handle new updates - data to send, data received
a sender thread that converts and sends whatever the handler pushed into the send buffer
a listener thread that converts and pushes any data received from the device into receive buffer.
Both the sender and the listener threads use the same utility class that handles network communication with the device; upon initialization the class creates a connection to the device, and the two threads perform blocking reads or await for new data to send respectively. In case of any problems, the sender thread performs all "maintenance" work, while the listener thread enters a safe state awaiting return of connectivity.
Now, since the two threads share one connection to the device, they both share the same instance of the communication class, as a pointer to that class.
The problem is in the procedure of reconnect - it involves destroying and creating the helper class instance exploiting safe shutdown and initialization already present in the destructor and constructor. As result the pointer changes. Without volatile it's quite likely the listener won't receive the updated pointer. With volatile, it protests - needlessly, because nc (the pointer) won't change at a random moment - first the listener is notified of a problem, then it enters a safe state where it doesn't perform any operations on 'nc' and notifies the sender it's ready. Only then the sender performs the repair and notifies the listener to resume normal operation.
So what's the right solution in this situation?
What you need is a sequence of operations. The producing thread has 2 relevant operations : "initialize new Netconsole" and "write pointer". The consuming thread also has two operations: "read pointer" and "use new Netconsole object". Those 4 operations must be sequenced in exactly that order for the update to be visible.
By far the simplest way to achieve this is to use two memory barriers. A write barrier (std::memory_order_release on the pointer write) prevents the first two operations from being reordered, and a read barrier (std::memory_order_acquire on the pointer load) prevents the last two operations from being reordered.
As the two threads run independently, your program correctness shouldn't depend on whether a particular object update happened before a particular object use. The updating thread might just have been a bit slow, and that should not break your program. So the third ordering between write and read isn't really relevant and you shouldn't try to "fix" it.
To summarize: yes, the four operations have to happen in exactly the right order for the result to be visible, but if the second and third operations are reordered, the update is simply invisible to the consuming thread. It's an atomic update: all or nothing.
There's still the matter of cleaning up the old object. The producing thread cannot just assume that the consuming thread has already seen the pointer update. There must be synchronization to ensure both threads agree that the old object is unused. The easiest is if the producing thread strictly does not use the old object after the new object has been created (the memory barrier helps here), and the consuming thread cleans up the old object as soon as it knows there's a new object (because that happens strictly after the read barrier, thus after the write barrier and in turn after the last use by the producing thread).
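A minimal sketch of the release/acquire pairing, assuming the pointer is changed from volatile Netconsole* to std::atomic<Netconsole*>. The Netconsole here is a stand-in with one made-up field (generation), and publish, read_generation and demo_publish are hypothetical names. Reclaiming an old object is deliberately left out; it still needs the handshake described above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Stand-in for the real Netconsole class (the field is an assumption).
struct Netconsole {
    int generation;
    explicit Netconsole(int g) : generation(g) {}
};

// Shared pointer slot; replaces "volatile Netconsole* nc".
std::atomic<Netconsole*> nc{nullptr};

// Producer side: (1) fully construct the object, (2) publish the pointer.
// The release store keeps step 1 from being reordered after step 2.
void publish(int generation) {
    Netconsole* fresh = new Netconsole(generation);
    nc.store(fresh, std::memory_order_release);
}

// Consumer side: (3) load the pointer, (4) use the object.
// The acquire load keeps step 4 from being reordered before step 3.
// Returns -1 while nothing has been published yet.
int read_generation() {
    Netconsole* p = nc.load(std::memory_order_acquire);
    return p ? p->generation : -1;
}

// Spins a consumer until it sees the published object, then cleans up.
int demo_publish() {
    int seen = -1;
    std::thread consumer([&] {
        while ((seen = read_generation()) == -1) std::this_thread::yield();
    });
    publish(7);
    consumer.join();
    delete nc.exchange(nullptr);  // safe here: the only other thread has exited
    return seen;
}
```

Calling methods through the loaded pointer p does not trigger the "discards qualifiers" error, because the pointee type is a plain Netconsole; only the pointer itself is atomic.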

What is the best way to implement an echo server with async i/o and IOCP?

As we all know, an echo server is a server that reads from a socket, and writes that very data into another socket.
Since Windows I/O Completion Ports give you different ways to do things, I was wondering what the best (most efficient) way to implement an echo server is. I'm sure someone here has tested the approaches I describe below and can contribute.
My classes are Stream which abstracts a socket, named pipe, or whatever, and IoRequest which abstracts both an OVERLAPPED structure and the memory buffer to do the I/O (of course, suitable for both reading and writing). In this way when I allocate an IoRequest I'm simply allocating memory for memory buffer for data + OVERLAPPED structure in one shot, so I call malloc() only once.
In addition to this, I also implement fancy and useful things in the IoRequest object, such as an atomic reference counter, and so on.
That said, let's explore the ways to build the best echo server:
-------------------------------------------- Method A. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Let's copy the buffer just received with the "reader" IoRequest to the "writer" IoRequest. (this will involve a memcpy() or whatever).
3) Let's fire again a new reading with ReadFile() in the "reader", with the same IoRequest used for reading.
4) Let's fire a new writing with WriteFile() in the "writer".
-------------------------------------------- Method B. ------------------------------------------
1) The "reader" socket completes its reading, the IOCP callback returns, and you have an IoRequest just completed with the memory buffer.
2) Instead of copying data, pass that IoRequest to the "writer" for writing, without copying data with memcpy().
3) The "reader" now needs a new IoRequest to continue reading, allocate a new one or pass one already allocated before, maybe one just completed for writing before the new writing does happen.
So, in the first case, every Stream objects has its own IoRequest, data is copied with memcpy() or similar functions, and everything works fine.
In the second case the 2 Stream objects pass IoRequest objects to each other without copying data, but it's a little more complex: you have to manage the "swapping" of IoRequest objects between the 2 Stream objects, with the possible drawback of synchronization problems (what if those completions happen in different threads?).
My questions are:
Q1) Is avoiding copying data really worth it!?
Copying 2 buffers with memcpy() or similar, is very fast, also because the CPU cache is exploited for this very purpose.
Let's consider that with the first method, I have the possibility to echo from a "reader" socket to multiple "writer" sockets, but with the second one I can't do that, since I should create N new IoRequest objects for each N writers, since each WriteFile() needs its own OVERLAPPED structure.
Q2) I guess that when I fire N new writes for N different sockets with WriteFile(), I have to provide N different OVERLAPPED structures AND N different buffers from which to read the data.
Or, I can fire N WriteFile() calls with N different OVERLAPPED taking the data from the same buffer for the N sockets?
Is avoiding copying data really worth it!?
Depends on how much you are copying. 10 bytes, not so much. 10MB, then yes, it's worth avoiding the copying!
In this case, since you already have an object that contains the rx data and an OVERLAPPED block, it seems somewhat pointless to copy it - just reissue it to WSASend(), or whatever.
but with the second one I can't do that
You can, but you need to separate the 'IoRequest' class from a 'Buffer' class. The buffer holds the data, an atomic int reference count and any other management info shared by all calls; the IoRequest holds the OVERLAPPED block, a pointer to the buffer and any other management information for each call.
The IoRequest is the class that is used for each send call. Since it contains only a pointer to the buffer, there is no need to copy the data, so it's reasonably small and its size is O(1) with respect to the data size.
When the tx completions come in, the handler threads get the IOrequest, deref the buffer and dec the atomic int in it towards zero. The thread that manages to hit 0 knows that the buffer object is no longer needed and can delete it, (or, more likely, in a high-performance server, repool it for later reuse).
Or, I can fire N WriteFile() calls with N different OVERLAPPED taking the data from the same buffer for the N sockets?
Yes, you can. See above.
Re. threading - sure, if your 'management data' can be reached from multiple completion-handler threads, then yes, you may want to protect it with a critical-section, but an atomic int should do for the buffer refcount.
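The Buffer/IoRequest split might look roughly like this. It is a portable sketch with no Windows calls: the OVERLAPPED member is only indicated by a comment, and fan_out, on_send_complete and stream_id are made-up names. In a real server on_send_complete would run in whichever thread dequeues the completion.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Shared payload: one copy of the data plus an atomic reference count.
struct Buffer {
    std::string data;
    std::atomic<int> refs;
    Buffer(std::string d, int n) : data(std::move(d)), refs(n) {}
};

// Per-send request: small, and its size does not depend on the data size.
struct IoRequest {
    // On Windows an OVERLAPPED member would live here (omitted in this sketch).
    Buffer* buffer;
    int stream_id;  // hypothetical: which writer this send belongs to
};

// Creates one Buffer and one IoRequest per writer, all pointing at it.
// Assumes n_writers >= 1 (with 0 writers nobody would free the buffer).
std::vector<IoRequest*> fan_out(std::string payload, int n_writers) {
    Buffer* buf = new Buffer(std::move(payload), n_writers);
    std::vector<IoRequest*> reqs;
    for (int i = 0; i < n_writers; ++i) reqs.push_back(new IoRequest{buf, i});
    return reqs;
}

// Completion handler: drop this request's reference. Whoever takes the
// count to zero deletes (or would repool) the buffer.
// Returns true if this call released the last reference.
bool on_send_complete(IoRequest* req) {
    Buffer* buf = req->buffer;
    delete req;
    if (buf->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete buf;
        return true;
    }
    return false;
}

// Completes three sends of one payload; exactly one of them frees the buffer.
int demo_fan_out() {
    int last_releases = 0;
    for (IoRequest* r : fan_out("echo", 3))
        if (on_send_complete(r)) ++last_releases;
    return last_releases;
}
```

Because fetch_sub returns the previous value atomically, exactly one completion observes 1, regardless of which threads the completions arrive on.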

Is this method of inter-thread-communication safe?

I have 3 objects (inherited from QObject) that each contain a separate std::list. Each object gets created in the main GUI thread (with no parent) and is then pushed to its own thread (using Qt's QObject::moveToThread()).
Each thread is hooked up to a GUI and messages are sent between the different threads with data. Each thread is to essentially handle its own list. For example:
Obj 1 : Consumer of data. It pops the front off of its list (if data is present) to use. It also has a SLOT available so that other threads can push data to it. No other object can access this list directly, only the original QObject class.
Obj 2 : Producer of data. It pushes data to its list. It has SLOTS available for others to 'ping' it for data which will in turn emit a SIGNAL popping data from its list. No other object can access this list directly.
Obj 3: Produces data for obj 1 and consumes data from obj 2. It has its own internal data structures that keep track of the data sent to obj 1 and the data coming from obj 2. It finally will push both data sets to some QwtPlots after it does some analysis.
Obj's 1 and 2 are real-time critical and use QueryPerformanceCounter-style 'timing', which will essentially suck down a CPU each while they're running. They run QCoreApplication::processEvents() every loop to handle the events that come down through.
Is this an okay way to handle cross-thread data sharing? If it isn't, where are the holes and how would you correct them? I understand this will create a lot of 'copies' of data flying around, but memory bloat isn't a concern at this point.
thanks in advance :)
It's hard to say exactly whether it's thread-safe or not without all the implementation details as there are a lot of things that can go wrong when using threads.
Obj 1 : Consumer of data. It pops the front off of its list (if data is present) to use. It also has a SLOT available so that other threads can push data to it. No other object can access this list directly, only the original QObject class.
If this slot is connected to signals in other threads (such as Obj 3) using queued or auto connection type, then the Obj 1 is probably safe. If the slot is called directly from other threads, then it obviously isn't thread safe unless you explicitly synchronize everything.
Obj 2 : Producer of data. It pushes data to its list. It has SLOTS available for others to 'ping' it for data which will in turn emit a SIGNAL popping data from its list. No other object can access this list directly.
You don't mention how "pinging" is implemented or which threads call those slots. If other threads call them directly and if pinging involves accessing the internal std::list, then you're in trouble. If those slots are only called via queued or auto connections (to some signal in Obj 3, for example), then it's fine. If those slots are thread safe (for example, they only put a "ping" message into some sort of internal synchronized message queue), then it's fine too. The latter way looks like custom reimplementation of the queued connection mechanism, though.
Overall, this whole thing looks too dangerous to me as slots can be called from anywhere by mistake. I'd try to avoid this kind of thing by putting some safety checks there, like this:
void Obj2::ping() {
if (QThread::currentThread() != this->thread()) {
// not sure how efficient it is
QMetaObject::invokeMethod(this, "ping", Qt::QueuedConnection);
return;
}
// thread unsafe code goes here
}

how to pass data to running thread

When using pthread, I can pass data at thread creation time.
What is the proper way of passing new data to an already running thread?
I'm considering making a global variable and make my thread read from that.
Thanks
That will certainly work. Basically, threads are just lightweight processes that share the same memory space. Global variables, being in that memory space, are available to every thread.
The trick is not with the readers so much as the writers. If you have a simple chunk of global memory, like an int, then assigning to that int will probably be safe. But consider something a little more complicated, like a struct. Just to be definite, let's say we have
struct S { int a; float b; } s1, s2;
Now s1,s2 are variables of type struct S. We can initialize them
s1 = { 42, 3.14f };
and we can assign them
s2 = s1;
But when we assign them the processor isn't guaranteed to complete the assignment to the whole struct in one step -- we say it's not atomic. So let's now imagine two threads:
thread 1:
while (true){
printf("{%d,%f}\n", s2.a, s2.b );
sleep(1);
}
thread 2:
while(true){
sleep(1);
s2 = s1;
s1.a += 1;
s1.b += 3.14f ;
}
We can see that we'd expect s2 to have the values {42, 3.14}, {43, 6.28}, {44, 9.42} ....
But what we see printed might be anything like
{42,3.14}
{43,3.14}
{43,6.28}
or
{43,3.14}
{44,6.28}
and so on. The problem is that thread 1 may get control and "look at" s2 at any time during that assignment.
The moral is that while global memory is a perfectly workable way to do it, you need to take into account the possibility that your threads will cross over one another. There are several solutions to this, with the basic one being to use semaphores. A semaphore has two operations, confusingly named from Dutch as P and V.
P simply waits until a variable is 0 and then goes on, adding 1 to the variable; V subtracts 1 from the variable. The only thing special is that they do this atomically -- they can't be interrupted.
Now, write your code as
thread 1:
while (true){
P();
printf("{%d,%f}\n", s2.a, s2.b );
V();
sleep(1);
}
thread 2:
while(true){
sleep(1);
P();
s2 = s1;
V();
s1.a += 1;
s1.b += 3.14f ;
}
and you're guaranteed that you'll never have thread 2 half-completing an assignment while thread 1 is trying to print.
(Pthreads has semaphores, by the way.)
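In C++ the same guard is usually written with a std::mutex rather than raw P and V calls. The sketch below swaps in std::mutex and std::lock_guard for the semaphore, and uses a struct of two ints so that a torn read is easy to detect (the invariant b == 2 * a is made up for the demo). g_state, write_pair, read_pair and demo_torn_reads are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Two fields that must always change together: b == 2 * a.
struct S { int a; int b; };

S g_state{1, 2};           // shared between the threads
std::mutex g_state_mutex;  // plays the role of the P()/V() pair

// Writer: both fields are updated under the lock, so no reader can
// observe the struct halfway through the update.
void write_pair(int a) {
    std::lock_guard<std::mutex> lock(g_state_mutex);
    g_state.a = a;
    g_state.b = 2 * a;
}

// Reader: takes a consistent snapshot under the same lock.
S read_pair() {
    std::lock_guard<std::mutex> lock(g_state_mutex);
    return g_state;
}

// Runs a writer thread against a reading loop and counts snapshots
// that violate the invariant.
int demo_torn_reads() {
    int torn = 0;
    std::thread writer([] {
        for (int i = 1; i <= 100000; ++i) write_pair(i);
    });
    for (int i = 0; i < 100000; ++i) {
        S s = read_pair();
        if (s.b != 2 * s.a) ++torn;
    }
    writer.join();
    return torn;
}
```

The lock_guard releases the mutex when it goes out of scope, so the V step cannot be forgotten on an early return.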
I have been using the message-passing, producer-consumer queue-based, comms mechanism, as suggested by asveikau, for decades without any problems specifically related to multiThreading. There are some advantages:
1) The 'threadCommsClass' instances passed on the queue can often contain everything required for the thread to do its work - member/s for input data, member/s for output data, methods for the thread to call to do the work, somewhere to put any error/exception messages and a 'returnToSender(this)' event to call, so returning everything to the requester by some thread-safe means that the worker thread does not need to know about. The worker thread then runs asynchronously on one set of fully encapsulated data that requires no locking. 'returnToSender(this)' might queue the object onto another P-C queue, it might PostMessage it to a GUI thread, it might release the object back to a pool or just dispose() it. Whatever it does, the worker thread does not need to know about it.
2) There is no need for the requesting thread to know anything about which thread did the work - all the requestor needs is a queue to push on. In an extreme case, the worker thread on the other end of the queue might serialize the data and communicate it to another machine over a network, only calling returnToSender(this) when a network reply is received - the requestor does not need to know this detail - only that the work has been done.
3) It is usually possible to arrange for the 'threadCommsClass' instances and the queues to outlive both the requester thread and the worker thread. This greatly eases those problems when the requester or worker are terminated and dispose()'d before the other - since they share no data directly, there can be no AV/whatever. This also blows away all those 'I can't stop my work thread because it's stuck on a blocking API' issues - why bother stopping it if it can be just orphaned and left to die with no possibility of writing to something that is freed?
4) A threadpool reduces to a one-line for loop that creates several work threads and passes them the same input queue.
5) Locking is restricted to the queues. The more mutexes, condVars, critical-sections and other synchro locks there are in an app, the more difficult it is to control it all and the greater the chance of an intermittent deadlock that is a nightmare to debug. With queued messages, (ideally), only the queue class has locks. The queue class must work 100% with multiple producers/consumers, but that's one class, not an app full of uncoordinated locking, (yech!).
6) A threadCommsClass can be raised anytime, anywhere, in any thread and pushed onto a queue. It's not even necessary for the requester code to do it directly, eg. a call to a logger class method, 'myLogger.logString("Operation completed successfully");' could copy the string into a comms object, queue it up to the thread that performs the log write and return 'immediately'. It is then up to the logger class thread to handle the log data when it dequeues it - it may write it to a log file, it may find after a minute that the log file is unreachable because of a network problem. It may decide that the log file is too big, archive it and start another one. It may write the string to disk and then PostMessage the threadCommsClass instance on to a GUI thread for display in a terminal window, whatever. It doesn't matter to the log requesting thread, which just carries on, as do any other threads that have called for logging, without significant impact on performance.
7) If you do need to kill off a thread waiting on a queue, rather than waiting for the OS to kill it on app close, just queue it a message telling it to terminate.
There are surely disadvantages:
1) Shoving data directly into thread members, signaling it to run and waiting for it to finish is easier to understand and will be faster, assuming that the thread does not have to be created each time.
2) Truly asynchronous operation, where the thread is queued some work and, sometime later, returns it by calling some event handler that has to communicate the results back, is more difficult to handle for developers used to single-threaded code and often requires state-machine type design where context data must be sent in the threadCommsClass so that the correct actions can be taken when the results come back. If there is the occasional case where the requestor just has to wait, it can send an event in the threadCommsClass that gets signaled by the returnToSender method, but this is obviously more complex than simply waiting on some thread handle for completion.
Whatever design is used, forget the simple global variables as other posters have said. There is a case for some global types in thread comms - one I use very often is a thread-safe pool of threadCommsClass instances, (this is just a queue that gets pre-filled with objects). Any thread that wishes to communicate has to get a threadCommsClass instance from the pool, load it up and queue it off. When the comms is done, the last thread to use it releases it back to the pool. This approach prevents runaway new(), and allows me to easily monitor the pool level during testing without any complex memory-managers, (I usually dump the pool level to a status bar every second with a timer). Leaking objects, (level goes down), and double-released objects, (level goes up), are easily detected and so get fixed.
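A pre-filled pool with a readable level could be sketched as below. Message stands in for threadCommsClass and MessagePool is a made-up name. This version returns nullptr when the pool is empty instead of blocking, which is an assumption, and it hands out std::unique_ptr so a double release cannot be expressed, which differs from the raw-object pool described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Stand-in for threadCommsClass.
struct Message { std::string payload; };

class MessagePool {
public:
    // Pre-fill the pool so no allocation happens while the app is running.
    explicit MessagePool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            free_.push_back(std::make_unique<Message>());
    }

    // Returns nullptr when the pool is exhausted (it never grows).
    std::unique_ptr<Message> acquire() {
        std::lock_guard<std::mutex> lock(m_);
        if (free_.empty()) return nullptr;
        std::unique_ptr<Message> msg = std::move(free_.back());
        free_.pop_back();
        return msg;
    }

    // The last user of a message hands it back here.
    void release(std::unique_ptr<Message> msg) {
        msg->payload.clear();
        std::lock_guard<std::mutex> lock(m_);
        free_.push_back(std::move(msg));
    }

    // Current pool level; a level that keeps falling suggests a leak.
    std::size_t level() const {
        std::lock_guard<std::mutex> lock(m_);
        return free_.size();
    }

private:
    mutable std::mutex m_;
    std::vector<std::unique_ptr<Message>> free_;
};
```

A timer that prints level() once a second gives the same cheap leak monitor as the status-bar trick described above.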
MultiThreading can be safe and deliver scaleable, high-performance apps that are almost a pleasure to maintain/enhance, (almost:), but you have to lay off the simple globals - treat them like Tequila - quick and easy high for now but you just know they'll blow your head off tomorrow.
Good luck!
Martin
Global variables are bad to begin with, and even worse with multi-threaded programming. Instead, the creator of the thread should allocate some sort of context object that's passed to pthread_create, which contains whatever buffers, locks, condition variables, queues, etc. are needed for passing information to and from the thread.
You will need to build this yourself. The most typical approach requires some cooperation from the other thread as it would be a bit of a weird interface to "interrupt" a running thread with some data and code to execute on it... That would also have some of the same trickiness as something like POSIX signals or IRQs, both of which it's easy to shoot yourself in the foot while processing, if you haven't carefully thought it through... (Simple example: You can't call malloc inside a signal handler because you might be interrupted in the middle of malloc, so you might crash while accessing malloc's internal data structures which are only partially updated.)
The typical approach is to have your thread creation routine basically be an event loop. You can build a queue structure and pass that as the argument to the thread creation routine. Then other threads can enqueue things and the thread's event loop will dequeue it and process the data. Note this is cleaner than a global variable (or global queue) because it can scale to have multiple of these queues.
You will need some synchronization on that queue data structure. Entire books could be written about how to implement your queue structure's synchronization, but the most simple thing would have a lock and a semaphore. When modifying the queue, threads take a lock. When waiting for something to be dequeued, consumer threads would wait on a semaphore which is incremented by enqueuers. It's also a good idea to implement some mechanism to shut down the consumer thread.
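Here is a minimal sketch of such a queue in C++. It swaps the semaphore for a std::condition_variable, which is the more common standard-library choice, and uses a close() call as the shutdown mechanism. BlockingQueue and demo_worker_sum are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    // Producers: take the lock, enqueue, then wake one waiting consumer.
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Shutdown: consumers drain what is left, then pop() returns false.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Consumers: block until an item arrives or the queue is closed.
    // Returns false only when the queue is closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

// The worker's thread routine is just an event loop over the queue.
int demo_worker_sum() {
    BlockingQueue<int> q;
    int sum = 0;
    std::thread worker([&] {
        int v;
        while (q.pop(v)) sum += v;
    });
    for (int i = 1; i <= 10; ++i) q.push(i);
    q.close();
    worker.join();
    return sum;
}
```

With pthreads the queue pointer would be passed as the void* argument to pthread_create; here the lambda captures it by reference instead.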

Passing data structures to different threads

I have an application that will be spawning multiple threads. However, I feel there might be an issue with threads accessing data that they shouldn't be.
Here is the structure of the threaded application (sorry for the crudeness):
                  MainThread
                 /          \
                /            \
               /              \
          Thread A          Thread B
          /     \            /     \
         /       \          /       \
        /         \        /         \
Thread A_1   Thread A_2  Thread B_1  Thread B_2
Under each lettered thread (of which there could be many), there will only be two threads, and they are fired off sequentially. The issue I'm having is that I'm not entirely sure how to pass a data structure into these threads.
So, the datastructure is created in MainThread, will be modified in the lettered thread (Thread A, etc) specific to that thread and then a member variable from that datastructure is sent to Letter_Numbered threads.
Currently, the lettered thread class has a member variable and when the class is constructed, the datastructure from mainthread is passed in by reference, invoking the copy constructor so the lettered thread has its own copy to play with.
The lettered_numbered thread simply takes in a string variable from the data structure within the lettered thread. My question is, is this acceptable? Is there a much better way to ensure each lettered thread gets its own data structure to play with?
Sorry for the somewhat poor explanation; please leave comments and I'll try to clarify.
EDIT:
So my lettered thread constructor should take the VALUE of the data structure, not the reference?
I would have each thread create its own copy of the datastructure, e.g. you pass the structure in the constructor and then explicitly create a local copy. Then you are guaranteed that the threads have distinct copies. (You say that it's passed by reference, and that this invokes the copy constructor. I think you mean pass by value? I feel it's better to explicitly make a copy, to leave no doubt and to make your intent clear. Otherwise someone might later come along and change your pass by value to pass by reference as a "smart optimization".)
EDIT: Removed comment about strings. For some reason, I was assuming .NET.
To ensure strings are privately owned, follow the same procedure, create a copy of the string, which you can then freely modify.
There is a pattern called the Active Object pattern, wherein each object executes in its own thread. Frameworks like ACE support this. If you have access to such frameworks, you should use those. In any case, I believe creating a new instance of an object and allowing it to execute in its own thread is much cleaner than invoking the copy constructor to make a copy of the object. Otherwise, see if you can fit a solution that uses Thread Local Storage.
Have you looked at boost threads?
You would basically create a callable class that has a constructor that takes the parameters the thread is to work on and then launch the thread by passing objects of your callable class, initialized and ready to go.
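The callable-class approach might look like the sketch below. It uses std::thread, which follows the same model as boost::thread, rather than Boost itself. Config, Worker and the result pointer are made up for illustration. The constructor takes the data by value, so each thread owns a private copy.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>

// Hypothetical data structure created in the main thread.
struct Config {
    std::string name;
    int counter;
};

// Callable class: the constructor receives everything the thread needs.
class Worker {
public:
    // cfg is taken BY VALUE, so this object owns a private copy.
    Worker(Config cfg, int* result) : cfg_(std::move(cfg)), result_(result) {}

    // Thread body: mutate the private copy freely, no locking required.
    void operator()() {
        cfg_.counter += 1;
        cfg_.name += "_done";
        *result_ = cfg_.counter;
    }

private:
    Config cfg_;
    int* result_;  // written once by the thread, read by the caller after join()
};

// Launches one worker and checks that the caller's copy is untouched.
// Returns the worker's result, or -1 if the original was modified.
int demo_private_copy() {
    Config original{"job", 41};
    int result = 0;
    std::thread t{Worker(original, &result)};
    t.join();
    return (original.counter == 41 && original.name == "job") ? result : -1;
}
```

Spelling the copy out in the constructor signature also guards against the "smart optimization" risk mentioned earlier: changing Config to Config& would be a visible change to the interface.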
This is very similar to how Java implements threads and it makes a good amount of sense most of the time from a design point of view.
You apparently are making a copy of the data for each thread and everything works? Then no problem.
Here are some additional thoughts:
If data is read only, you can share a single struct and everything will be ok, as long as each read is small and fast (basic types)
If data needs to be written, but is "private" (or contained) to each thread, then send a copy to each thread (what you are doing). Caveat: I assume the data is not too big and a copy does not eat too many resources.
If the data needs to be written and the new values shared between threads, then you need to think about it (read up on it) and create a proper design. I like a transactional object to centralize each thread's read/write operations, like a tiny database in memory. (Check on thread mutexes, semaphores and critical sections.) Dealing with huge data sets, I have used a database to centralize requests (see ODBM). You can also check existing message queuing libraries (like MSMQ) to have data changes ordered and synchronized.
Hope this helps.
It seems unlikely that you would want each thread to operate on the data and never have one thread react to what another thread has done with it. If the threads are truly independent, meaning that no thread will ever care about work that another thread has done, then I suggest making a copy of the data. Otherwise, where you want to do work in one thread and make the result available to another, I would suggest that you pass a reference/pointer to the object around and protect access to it with locks so that the threads can work with it properly. I suggest a multi-reader, single-writer lock implementation.
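One way to get a multi-reader, single-writer lock in standard C++ is std::shared_mutex, assuming a C++17 compiler. In the sketch below SharedCounter and demo_readers_writer are hypothetical names: readers take a shared lock and may run concurrently, while writers take an exclusive lock.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Shared object passed by reference to several threads.
class SharedCounter {
public:
    // Readers: any number may hold the shared lock at the same time.
    int get() const {
        std::shared_lock<std::shared_mutex> lock(m_);
        return value_;
    }

    // Writers: the exclusive lock shuts out readers and other writers.
    void add(int n) {
        std::unique_lock<std::shared_mutex> lock(m_);
        value_ += n;
    }

private:
    mutable std::shared_mutex m_;
    int value_ = 0;
};

// Two writers and four readers hammer the same object.
int demo_readers_writer() {
    SharedCounter c;
    std::vector<std::thread> threads;
    for (int w = 0; w < 2; ++w)
        threads.emplace_back([&c] {
            for (int i = 0; i < 1000; ++i) c.add(1);
        });
    for (int r = 0; r < 4; ++r)
        threads.emplace_back([&c] {
            for (int i = 0; i < 1000; ++i) (void)c.get();
        });
    for (std::thread& t : threads) t.join();
    return c.get();
}
```

A reader/writer lock tends to pay off only when reads clearly outnumber writes; for short critical sections a plain std::mutex is often just as fast.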