ZeroMQ socket.recv() raised a STACK_OVERFLOW exception - c++

If I use this code in a .dll, the call to socket.recv() raises a STACK_OVERFLOW exception, but when the same code is compiled as an .exe it works.
Why?
I run the .dll test with "C:\windows\system32\rundll32.exe myDll.dll StartUp":
void StartUp()
{
    zmq::context_t context(1);
    zmq::socket_t  socket(context, ZMQ_REP);
    socket.bind("tcp://127.0.0.1:3456");

    zmq::message_t msgIN, msgOUT("test", 4);
    while (true) {
        socket.recv(&msgIN);
        socket.send(msgOUT);
    }
}
Call stack:
libzmq-v120-mt-gd-4_2_2.dll!zmq::mailbox_t::recv(zmq::command_t * cmd_=0x0231f700, int timeout_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!zmq::io_thread_t::in_event()
libzmq-v120-mt-gd-4_2_2.dll!zmq::select_t::loop()
libzmq-v120-mt-gd-4_2_2.dll!zmq::select_t::worker_routine(void * arg_=0x002f1778)
libzmq-v120-mt-gd-4_2_2.dll!thread_routine(void * arg_=0x002f17c0)
Main thread call stack:
libzmq-v120-mt-gd-4_2_2.dll!zmq::signaler_t::wait(int timeout_=0xffffffff)
libzmq-v120-mt-gd-4_2_2.dll!zmq::mailbox_t::recv(zmq::command_t * cmd_=0x0019f3c0, int timeout_=0xffffffff)
libzmq-v120-mt-gd-4_2_2.dll!zmq::socket_base_t::process_commands(int timeout_, bool throttle_)
libzmq-v120-mt-gd-4_2_2.dll!zmq::socket_base_t::recv(zmq::msg_t * msg_=0x0019f628, int flags_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!s_recvmsg(zmq::socket_base_t * s_=0x006f6c70, zmq_msg_t * msg_=0x0019f628, int flags_=0x00000000)
libzmq-v120-mt-gd-4_2_2.dll!zmq_msg_recv(zmq_msg_t * msg_=0x0019f628, void * s_=0x006f6c70, int flags_=0x00000000)
mydll.dll!zmq::socket_t::recv(zmq::message_t * msg_=0x0019f628, int flags_=0x00000000)
mydll.dll!StartUp()
Update:
This example also crashes for the same reason. Does anyone know what could cause the stack-overflow exception?
zmq::context_t context(1);
zmq::socket_t  socket(context, ZMQ_REP);
socket.bind("tcp://*:7712");
while (1) {
    Sleep(10);
}
A reverse problem-isolation MCVE:
And how did this myDll.dll-test work,
if run by C:\windows\system32\rundll32.exe myDll.dll StartUp? Post the screen outputs.
void StartUp()
{
    std::cout << "INF:: ENTRY POINT ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    std::cout << "INF:: WILL SLEEP  ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    Sleep( 10 );
    std::cout << "INF:: SLEPT WELL  ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
    std::cout << "INF:: WILL RETURN ( C:\\windows\\system32\\rundll32.exe myDll.dll StartUp )" << std::endl;
}

The reason for the crash is the SizeOfStackCommit value in the OPTIONAL_HEADER of the rundll32.exe file.
It is too small (0xC000); I changed it to 0x100000 and now everything works.
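If patching the host executable is not desirable, a workaround is to run the ZeroMQ loop on a thread whose stack size the DLL requests explicitly, so the small stack that rundll32.exe commits for its own thread never comes into play. A minimal sketch, assuming Win32 and nothing else about the original project; the 1 MB figure is an illustrative choice, not a measured requirement:

#include <windows.h>

// Hypothetical worker: the ZMQ_REP context / socket / recv-send loop
// from the question would live here, on a stack whose size we choose.
static DWORD WINAPI ZmqWorker(LPVOID /*param*/)
{
    // zmq::context_t context(1); ... while (true) { recv / send } ...
    return 0;
}

// rundll32 calls exported entry points with this documented signature.
extern "C" __declspec(dllexport)
void CALLBACK StartUp(HWND, HINSTANCE, LPSTR, int)
{
    // Second argument: commit a 1 MB stack for the worker thread,
    // instead of inheriting the host's SizeOfStackCommit (0xC000 here).
    HANDLE hWorker = CreateThread(nullptr, 0x100000, ZmqWorker, nullptr, 0, nullptr);
    if (hWorker != nullptr) {
        WaitForSingleObject(hWorker, INFINITE);   // keep the rundll32 process alive
        CloseHandle(hWorker);
    }
}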

ZeroMQ objects require a certain respect to work with:
there are many features under the radar that may wreak havoc, as you have already seen on your screen.
Best read with due care both the ZeroMQ C++ binding reference documentation and the original ZeroMQ API documentation ( which the C++ binding often refers to ).
Both emphasise never handling zmq::message_t instances directly, but rather through "service" functions ( often re-wrapped as instance methods in C++ ).
zmq::message_t messageIN,
               messageOUT;
bool           successFlag;

while (true) {
      successFlag = socket.recv( &messageIN );
      assert( successFlag && "EXC: .recv( &messageIN )" );
      /* The zmq_recv() function shall receive a message
         from the socket referenced by the socket argument
         and store it in the message referenced by the msg
         argument.
         Any content previously stored in msg shall be
         properly deallocated.
         If there are no messages available on the specified
         socket the zmq_recv() function shall block
         until the request can be satisfied.
      */
      messageOUT.copy( messageIN );

      successFlag = socket.send( messageOUT );
      assert( successFlag && "EXC: .send( messageOUT )" );
      /* The zmq_send() function shall queue the message
         referenced by the msg argument to be sent to
         the socket referenced by the socket argument.
         The flags argument is a combination of the flags
         defined { ZMQ_NOBLOCK, ZMQ_SNDMORE }.
         The zmq_msg_t structure passed to zmq_send()
         is nullified during the call.
         If you want to send the same message to multiple
         sockets you have to copy it ( e.g.
         using zmq_msg_copy() ).
         A successful invocation of zmq_send()
         does not indicate that the message
         has been transmitted to the network,
         only that it has been queued on the socket
         and ØMQ has assumed responsibility for the message.
      */
}
My suspect is reference counting: more and more instances produced by a zmq::message_t message; constructor in an infinite while( true ){...} loop, none of which ever meets its own destructor. The STACK, having a physically limited capacity and no STACK-management care inside the DLL, will fail sooner or later.
zmq::message_t instances are quite an expensive toy, so good resource-management practices ( pre-allocation, reuse, controlled destruction ) are always welcome in professional code.
Q.E.D.
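As a sketch of the pre-allocation / reuse idea ( assuming the cppzmq binding, whose zmq::message_t::rebuild() re-initialises an existing instance in place, and the same ZMQ_REP socket as in the question ), the loop can keep touching the same two message objects instead of constructing fresh ones on every pass:

zmq::message_t request;                      // allocated once, outside the loop
zmq::message_t reply;

while (true) {
    if (!socket.recv(&request))              // .recv() releases any previous content itself
        break;

    reply.rebuild(request.size());           // reuse the reply object, resized in place
    memcpy(reply.data(), request.data(), request.size());

    if (!socket.send(reply))
        break;
}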
Tail remarks for clarity purposes:
Paraphrasing Dijkstra's view on error hunting and software testing a bit: "If I see no error, that does not mean there is none in the piece of code ( even less so if any external functions are linked in addition to it )."
No stack allocations?
Yes, no visible ones.
The ZeroMQ API sheds more light on it:
"The zmq_msg_init_size() function shall allocate any resources required to store a message size bytes long and initialise the message object referenced by msg to represent the newly allocated message.
The implementation shall choose whether to store message content on the stack (small messages) or on the heap (large messages). For performance reasons zmq_msg_init_size() shall not clear the message data."
Many years spent using cross-platform distributed systems based on the ZeroMQ API since v2.1+ have taught me a lot about being careful with explicit resource control - all the more so when you did not develop your own language binding for the native API.
After all the unsupported criticism, let's add one more citation from ZeroMQ.
It shows how proper, indirect manipulation of the message_t content is done by the library's C++ bindings themselves, wrapped into trivial helper functions:
from zhelpers.hpp:
// Receive 0MQ string from socket and convert into string
static std::string
s_recv (zmq::socket_t & socket) {
    zmq::message_t message;
    socket.recv(&message);
    return std::string(static_cast<char*>(message.data()), message.size());
}

// Convert string to 0MQ string and send to socket
static bool
s_send (zmq::socket_t & socket, const std::string & string) {
    zmq::message_t message(string.size());
    memcpy (message.data(), string.data(), string.size());
    bool rc = socket.send (message);
    return (rc);
}

// Sends string as 0MQ string, as multipart non-terminal
static bool
s_sendmore (zmq::socket_t & socket, const std::string & string) {
    zmq::message_t message(string.size());
    memcpy (message.data(), string.data(), string.size());
    bool rc = socket.send (message, ZMQ_SNDMORE);
    return (rc);
}
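For comparison, the REP loop from the question could lean on these helpers instead of touching zmq::message_t directly; a minimal sketch, assuming zhelpers.hpp is on the include path:

#include <zmq.hpp>
#include <zhelpers.hpp>

void StartUp()
{
    zmq::context_t context(1);
    zmq::socket_t  socket(context, ZMQ_REP);
    socket.bind("tcp://127.0.0.1:3456");

    while (true) {
        std::string request = s_recv(socket);   // message_t handled inside the helper
        s_send(socket, "test");                 // reply goes out via the helper as well
    }
}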

Related

App crashes when it takes too long to reply in a ZMQ REQ/REP pattern

I am writing a plugin that interfaces with a desktop application through a ZeroMQ REQ/REP request-reply communication archetype. I can currently receive a request, but the application seemingly crashes if a reply is not sent quickly enough.
I receive the request on a spawned thread and put it in a queue. This queue is processed in another thread, in which the processing function is invoked by the application periodically.
The message is correctly being received and processed, but the response cannot be sent until the next iteration of the function, as I cannot get the data from the application until then.
When this function is conditioned to send the response on the next iteration, the application will crash. However, if I send fake data as the response soon after receiving the request, in the first iteration, the application will not crash.
Constructing the socket
zmq::socket_t socket(m_context, ZMQ_REP);
socket.bind("tcp://*:" + std::to_string(port));
Receiving the message in the spawned thread
void ZMQReceiverV2::receiveRequests() {
    nInfo(*m_logger) << "Preparing to receive requests";
    while (m_isReceiving) {
        zmq::message_t zmq_msg;
        bool ok = m_respSocket.recv(&zmq_msg, ZMQ_NOBLOCK);
        if (ok) {
            // msg_str will be a binary string
            std::string msg_str;
            msg_str.assign(static_cast<char *>(zmq_msg.data()), zmq_msg.size());
            nInfo(*m_logger) << "Received the message: " << msg_str;
            std::pair<std::string, std::string> pair("", msg_str);
            // adding to message queue
            m_mutex.lock();
            m_messages.push(pair);
            m_mutex.unlock();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    nInfo(*m_logger) << "Done receiving requests";
}
Processing function on separate thread
void ZMQReceiverV2::exportFrameAvailable()
{
    // checking messages
    // if the queue is not empty
    m_mutex.lock();
    if (!m_messages.empty()) {
        nInfo(*m_logger) << "Reading message in queue";
        smart_target::SMARTTargetCreateRequest id_msg;
        std::pair<std::string, std::string> pair = m_messages.front();
        std::string topic = pair.first;
        std::string msg_str = pair.second;
        processMsg(msg_str);
        // removing just read message
        m_messages.pop();
        // m_respSocket.send(zmq::message_t()); // won't crash if I reply here in this invocation
    }
    m_mutex.unlock();
    // sending back the ID that has just been made, for it to be mapped
    if (timeToSendReply()) {
        sendReply(); // will crash if I wait for this to be executed on next invocation
    }
}
My research shows that there is no time limit for the response to be sent, so this apparent timing issue is strange.
Is there something that I am missing that will let me send the response on the second iteration of the processing function?
Revision 1:
I have edited my code so that the responding socket only ever exists on one thread. Since I need to get information from the processing function to send, I created another queue, which is checked in the revised function running on its own thread.
void ZMQReceiverV2::receiveRequests() {
    zmq::socket_t socket = setupBindSocket(ZMQ_REP, 5557, "responder");
    nInfo(*m_logger) << "Preparing to receive requests";
    while (m_isReceiving) {
        zmq::message_t zmq_msg;
        bool ok = socket.recv(&zmq_msg, ZMQ_NOBLOCK);
        if (ok) {
            // does not crash if I call send helper here
            // msg_str will be a binary string
            std::string msg_str;
            msg_str.assign(static_cast<char *>(zmq_msg.data()), zmq_msg.size());
            NLogger::nInfo(*m_logger) << "Received the message: " << msg_str;
            std::pair<std::string, std::string> pair("", msg_str);
            // adding to message queue
            m_mutex.lock();
            m_messages.push(pair);
            m_mutex.unlock();
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        if (!sendQueue.empty()) {
            sendEntityCreationMessage(socket, sendQueue.front());
            sendQueue.pop();
        }
    }
    nInfo(*m_logger) << "Done receiving requests";
    socket.close();
}
The function sendEntityCreationMessage() is a helper function that ultimately calls socket.send().
void ZMQReceiverV2::sendEntityCreationMessage(zmq::socket_t &socket, NUniqueID id) {
    socket.send(zmq::message_t());
}
This code seems to be following the thread safety guidelines for sockets. Any suggestions?
Q : "Is there something that I am missing"
Yes, the ZeroMQ evangelisation, called the Zen-of-Zero, has always promoted: never try to share a Socket-instance, never try to block, and never expect the world to act as one wishes.
This said, avoid touching the same Socket-instance from any non-local thread, except the one that has instantiated and owns the socket.
Last, but not least, the REQ/REP Scalable Formal Communication Pattern Archetype is prone to fall into a deadlock: a mandatory two-step dance must be obeyed, in which one keeps the strictly alternating sequence of .send()-.recv()-.send()-.recv()-... method calls, otherwise the principally distributed tandem of Finite State Automata (FSA) will unsalvageably end up in a mutual self-deadlock state of the dFSA.
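A minimal sketch of that strict alternation on the REP side ( schematic only; buildReply() is a hypothetical helper, and it assumes the reply payload is already available when it is time to .send(), which is exactly what the plugin above cannot guarantee within a single invocation ):

// REP side: every successful .recv() must be answered by exactly one .send()
// before the next .recv() becomes legal on the same socket.
while (m_isReceiving) {
    zmq::message_t request;
    if (!socket.recv(&request, ZMQ_NOBLOCK)) {        // nothing asked yet
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        continue;                                     // state still allows .recv()
    }
    zmq::message_t reply = buildReply(request);       // hypothetical helper
    socket.send(reply);                               // only now is .recv() legal again
}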
In case one is planning to professionally build on ZeroMQ, the best next step is to re-read Pieter HINTJENS' fabulous book "Code Connected: Volume 1". A hard read in places, yet definitely worth one's time, sweat, tears & efforts put in.

Crash in a modified version of an official ZeroMQ multithreaded example

I'm new to zmq and cppzmq. While trying to run the multithreaded example in the official guide: http://zguide.zeromq.org/cpp:mtserver
My setup
macOS Mojave, Xcode 10.3
libzmq 4.3.2 via Homebrew
cppzmq GitHub HEAD
I hit a few problems.
Problem 1
When running source code in the guide, it hangs forever without any stdout output shown up.
Here is the code directly copied from the Guide.
/*
    Multithreaded Hello World server in C
*/

#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <iostream>
#include <zmq.hpp>

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;
    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        // Wait for next request from client
        zmq::message_t request;
        socket.recv (&request);
        std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;

        // Do some 'work'
        sleep (1);

        // Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        socket.send (reply);
    }
    return (NULL);
}

int main ()
{
    // Prepare our context and sockets
    zmq::context_t context (1);
    zmq::socket_t clients (context, ZMQ_ROUTER);
    clients.bind ("tcp://*:5555");
    zmq::socket_t workers (context, ZMQ_DEALER);
    workers.bind ("inproc://workers");

    // Launch pool of worker threads
    for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
        pthread_t worker;
        pthread_create (&worker, NULL, worker_routine, (void *) &context);
    }
    // Connect work threads to client threads via a queue
    zmq::proxy (static_cast<void*>(clients),
                static_cast<void*>(workers),
                nullptr);
    return 0;
}
It crashes soon after I put a breakpoint in the while loop of the worker.
Problem 2
Noticing that the compiler prompted me to replace deprecated API calls, I modified the above sample code to make the warnings disappear.
/*
    Multithreaded Hello World server in C
*/

#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <array>
#include <string>
#include <iostream>
#include <cstdio>
#include <zmq.hpp>

void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;
    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        // Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        zmq::mutable_buffer request(buf.data(), buf.size());
        socket.recv(request, zmq::recv_flags::dontwait);
        std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;

        // Do some 'work'
        sleep (1);

        // Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %X\n", e.num());
        }
    }
    return (NULL);
}

int main ()
{
    // Prepare our context and sockets
    zmq::context_t context (1);
    zmq::socket_t clients (context, ZMQ_ROUTER);
    clients.bind ("tcp://*:5555"); // who i talk to.
    zmq::socket_t workers (context, ZMQ_DEALER);
    workers.bind ("inproc://workers");

    // Launch pool of worker threads
    for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
        pthread_t worker;
        pthread_create (&worker, NULL, worker_routine, (void *) &context);
    }
    // Connect work threads to client threads via a queue
    zmq::proxy (clients, workers);
    return 0;
}
I'm not pretending to have a literal translation of the original broken example, but it's my effort to make things compile and run without obvious memory errors.
This code keeps giving me error number 156384763 (0x9523DFB in hex) from the try-catch block. I can't find the definition of the error number in the official docs, but I learned from this question that it's the native ZeroMQ error EFSM:
The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state. This error may occur with socket types that switch between several states, such as ZMQ_REP.
I'd appreciate it if anyone can point out where I did wrong.
UPDATE
I tried polling according to #user3666197's suggestion, but the program still hangs. Inserting any breakpoint effectively crashes the program, making it difficult to debug.
Here is the new worker code
void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;
    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    zmq::pollitem_t items[1] = { { socket, 0, ZMQ_POLLIN, 0 } };
    while (true) {
        if (zmq::poll(items, 1, -1) < 1) {
            printf("Terminating worker\n");
            break;
        }
        // Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
        std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;

        // Do some 'work'
        sleep (1);

        // Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %s\n", e.what());
        }
    }
    return (NULL);
}
Welcome to the domain of the Zen-of-Zero
Suspect #1: the code jumps straight into an unresolvable live-lock, due to a move into an ill-directed state of the distributed Finite-State-Automaton:
While I have always advocated preferring non-blocking .recv()-s, the code above simply commits suicide right by this step:
socket.recv( request, zmq::recv_flags::dontwait ); // socket being == ZMQ_REP
It kills all chances for any other future life but the very error The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state,
since going into the .send()-able state is possible if and only if a previous .recv() has delivered a real message.
The Best Next Step :
Review the code and either use a blocking form of the .recv() before going to .send(), or better, use a { blocking | non-blocking } form of .poll( { 0 | timeout }, ZMQ_POLLIN ) before attempting to .recv(), and keep doing other things if there is nothing to receive yet ( so as to avoid suicidally throwing the dFSA into an unresolvable collision, flooding your stdout/stderr with a once-per-second flow of printf(" ERROR: %X\n", e.num() ); ).
Error Handling :
Better to use const char *zmq_strerror( int errnum ); fed by int zmq_errno(void);
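A small sketch of that idiom, applied to the catch block above ( zmq_strerror() and zmq_errno() are plain C-API calls, already visible through zmq.hpp ):

catch (const zmq::error_t& e) {
    // e.num() carries the native ZeroMQ errno; zmq_strerror() renders it as text
    // instead of the raw hex number printed before.
    std::fprintf(stderr, "ERROR: %d - %s\n", e.num(), zmq_strerror(e.num()));
}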
The Problem 1 :
In contrast to the suicidal ::dontwait flag behind the Problem 2 root cause, the Problem 1 root cause is that a blocking form of the first .recv() here moves all the worker threads into an indeterministically long, possibly infinite, waiting state, as the .recv() blocks any further step until a real message arrives ( which, from the MCVE, it does not seem it ever will ), and so your pool of threads remains in a pool-wide blocked-waiting state and nothing will ever happen until a message arrives.
Update on how the REQ/REP works :
The REQ/REP Scalable Communication Pattern Archetype works like a distributed pair of people. One, let's call her Mary, asks ( Mary .send()-s the REQ ), while the other one, say Bob the REP, listens in a potentially infinitely long blocking .recv() ( or takes due care, using .poll(), to orderly and regularly check whether Mary has asked about something, and continues his own hobbies or gardening otherwise ). Once Bob's end gets a message, Bob can go and .send() Mary a reply ( not before, as he knows nothing about when and what Mary would ( or would not ) ask in the nearer or farther future ). Mary, in turn, is fair enough not to ask her next REQ.send() question of Bob any sooner than after Bob has ( REP.send() ) replied and Mary has received Bob's message ( REQ.recv() ) - which is fair and more symmetric than real life may exhibit among real people under one roof :o)
The code?
The code is not a reproducible MCVE. The main() creates five Bobs ( hanging, waiting for a call from Mary, somewhere over the inproc:// transport class ), but no Mary ever calls, or does she? There is no visible sign of any Mary trying to do so, let alone of her ( or their - it could even be a dynamic N:M herd-of-Mary(s):herd-of-5-Bobs community ) attempt(s) to handle the REP-ly(s) coming from any of the 5 Bobs.
Persevere. ZeroMQ took me some time of scratching my own head, yet the years since I took due care to learn the Zen-of-Zero have been a rewarding, eternal walk in the Gardens of Paradise. No localhost serial-code IDE will ever be able to "debug" a distributed system ( unless a distributed-inspector infrastructure is in place; a due architecture for a distributed-system monitor/tracer/debugger is another layer of distributed messaging/signalling atop of the debugged distributed messaging/signalling system - so do not expect it from a trivial localhost serial-code IDE ).
If still in doubt, isolate potential troublemakers - replace inproc:// with tcp://; if the toys do not work with tcp:// ( where one can wire-line trace the messages ), they won't work with the inproc:// memory-zone tricks either.
About the hanging that I saw in my UPDATED question, I finally figured out what's going on. It's a false expectation on my part.
This very sample code in my question was never meant to be self-contained server/client code: it is a server-only app with a ZMQ_REP socket. It just waits for any client code to send requests through ZMQ_REQ sockets. So the "hang" that I was seeing is completely normal!
As soon as I hook up a client app to it, things start rolling instantly. This chapter is somewhere in the middle of the Guide and I was only concerned with multithreading so I skipped many code samples and messaging patterns, which led to my confusion.
The code comments even said it's a server, but I expected to see explicit confirmation from the program. So to be fair the lack of visual cue and the compiler deprecation warning caused me to question the sample code as a new user, but the story that the code tells is valid.
Such a shame on wasted time! But all of a sudden all #user3666197 says in his answer starts to make sense.
For the completeness of this question, the updated server thread worker code that works:
// server.cpp
void *worker_routine (void *arg)
{
    zmq::context_t *context = (zmq::context_t *) arg;
    zmq::socket_t socket (*context, ZMQ_REP);
    socket.connect ("inproc://workers");

    while (true) {
        // Wait for next request from client
        std::array<char, 1024> buf{'\0'};
        socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
        std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;

        // Do some 'work'
        sleep (1);

        // Send reply back to client
        zmq::message_t reply (6);
        memcpy ((void *) reply.data (), "World", 6);
        try {
            socket.send (reply, zmq::send_flags::dontwait);
        }
        catch (zmq::error_t& e) {
            printf("ERROR: %s\n", e.what());
        }
    }
    return (NULL);
}
The much needed client code:
// client.cpp
int main (void)
{
    void *context = zmq_ctx_new ();

    // Socket to talk to server
    void *requester = zmq_socket (context, ZMQ_REQ);
    zmq_connect (requester, "tcp://localhost:5555");

    int request_nbr;
    for (request_nbr = 0; request_nbr != 10; request_nbr++) {
        zmq_send (requester, "Hello", 6, 0);
        char buf[6];
        zmq_recv (requester, buf, 6, 0);
        printf ("Received reply %d [%s]\n", request_nbr, buf);
    }
    zmq_close (requester);
    zmq_ctx_destroy (context);
    return 0;
}
The server worker does not have to poll manually because it has been wrapped into the zmq::proxy.

boost::asio write: Broken pipe

I have a TCP server that handles new connections; when there's a new connection, two threads will be created (std::thread, detached).
void Gateway::startServer(boost::asio::io_service& io_service, unsigned short port) {
    tcp::acceptor TCPAcceptor(io_service, tcp::endpoint(tcp::v4(), port));
    bool UARTToWiFiGatewayStarted = false;
    for (;;) { std::cout << "\nstartServer()\n";
        auto socket(std::shared_ptr<tcp::socket>(new tcp::socket(io_service)));
        /*!
         * Accept a new connected WiFi client.
         */
        TCPAcceptor.accept(*socket);
        socket->set_option( tcp::no_delay( true ) );
        // This will set the boolean `Gateway::communicationSessionStatus` variable to true.
        Gateway::enableCommunicationSession();
        // start one thread
        std::thread(WiFiToUARTWorkerSession, socket, this->SpecialUARTPort, this->SpecialUARTPortBaud).detach();
        // start the second thread
        std::thread(UARTToWifiWorkerSession, socket, this->UARTport, this->UARTbaud).detach();
    }
}
The first of the two worker functions looks like this (here I'm reading using the shared socket):
void Gateway::WiFiToUARTWorkerSession(std::shared_ptr<tcp::socket> socket, std::string SpecialUARTPort, unsigned int baud) {
    std::cout << "\nEntered: WiFiToUARTWorkerSession(...)\n";
    std::shared_ptr<FastUARTIOHandler> uart(new FastUARTIOHandler(SpecialUARTPort, baud));
    try {
        while(true == Gateway::communicationSessionStatus) { std::cout << "WiFi->UART\n";
            unsigned char WiFiDataBuffer[max_incoming_wifi_data_length];
            boost::system::error_code error;
            /*!
             * Read the TCP data.
             */
            size_t length = socket->read_some(boost::asio::buffer(WiFiDataBuffer), error);
            /*!
             * Handle possible read errors.
             */
            if (error == boost::asio::error::eof) {
                // this will set the shared boolean variable from "true" to "false", causing the while loop (from the both functions and threads) to stop.
                Gateway::disableCommunicationSession();
                break; // Connection closed cleanly by peer.
            }
            else if (error) {
                Gateway::disableCommunicationSession();
                throw boost::system::system_error(error); // Some other error.
            }
            uart->write(WiFiDataBuffer, length);
        }
    }
    catch (std::exception &exception) {
        std::cerr << "[APP::exception] Exception in thread: " << exception.what() << std::endl;
    }
    std::cout << "\nExiting: WiFiToUARTWorkerSession(...)\n";
}
And the second one (here I'm writing using the thread-shared socket):
void Gateway::UARTToWifiWorkerSession(std::shared_ptr<tcp::socket> socket, std::string UARTport, unsigned int baud) {
    std::cout << "\nEntered: UARTToWifiWorkerSession(...)\n";
    /*!
     * Buffer used for storing the UART-incoming data.
     */
    unsigned char UARTDataBuffer[max_incoming_uart_data_length];
    std::vector<unsigned char> outputBuffer;
    std::shared_ptr<FastUARTIOHandler> uartHandler(new FastUARTIOHandler(UARTport, baud));
    while(true == Gateway::communicationSessionStatus) { std::cout << "UART->WiFi\n";
        /*!
         * Read the UART-available data.
         */
        auto bytesReceived = uartHandler->read(UARTDataBuffer, max_incoming_uart_data_length);
        /*!
         * If there was some data, send it over TCP.
         */
        if(bytesReceived > 0) {
            boost::asio::write((*socket), boost::asio::buffer(UARTDataBuffer, bytesReceived));
            std::cout << "\nSending data to app...\n";
        }
    }
    std::cout << "\nExited: UARTToWifiWorkerSession(...)\n";
}
To stop these two threads I do the following: in the WiFiToUARTWorkerSession(...) function, if the read(...) fails (with an error like boost::asio::error::eof, or any other error) I set the Gateway::communicationSessionStatus boolean switch (which is shared (global) between both functions) to false; this way the functions should return and the threads should terminate gracefully.
When I connect for the first time this works well, but when I disconnect from the server, the execution flow in WiFiToUARTWorkerSession(...) goes through the else if (error) branch, sets the while-condition variable to false, and then throws boost::system::system_error(error) (which actually means Connection reset by peer).
Then when I try to connect again, I get the following exception and the program terminates:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): write: Broken pipe
What could be the problem?
EDIT: From what I found about this error, it seems that I write(...) after the client disconnects, but how could this be possible?
EDIT2: I have debugged the code further and it seems that one thread (the one running the UARTToWifiWorkerSession(...) function) won't actually exit, because there's a blocking read(...) call where the execution flow stops. That thread will hang until the read(...) receives some data, and when I reconnect another two threads are created, causing data races.
Can someone confirm me that this could be the problem?
The actual problem was that the function UARTToWifiWorkerSession(...) didn't actually exit (because of a blocking read(...) call), which caused two threads (the hanging one and one of the two newly created ones) to write(...) to the same socket without any concurrency control.
The solution was to set a read(...) timeout, so the function can return (and the thread be destroyed) without waiting forever for input.
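A minimal sketch of the timeout idea, under the assumption that the (custom) FastUARTIOHandler ultimately reads from a POSIX file descriptor; select() bounds the blocking read so the worker loop can re-check Gateway::communicationSessionStatus periodically (the helper name and timeout value are illustrative, not part of the original code):

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: read at most 'len' bytes, but give up after 'timeoutMs'
// so the calling worker loop can notice that the session has been closed.
ssize_t readWithTimeout(int fd, unsigned char *buf, size_t len, int timeoutMs)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    timeval tv;
    tv.tv_sec  = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;

    int ready = select(fd + 1, &readSet, nullptr, nullptr, &tv);
    if (ready <= 0)        // 0 = timeout, -1 = error: let the caller re-check the flag
        return 0;
    return read(fd, buf, len);
}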

ZeroMQ PUSH/PULL

Some part of zmq is not behaving in a predictable manner.
I'm using VS2013 and zmq 3.2.4. In order not to 'lose' messages in my pub-sub framework [aside: I believe this is a design flaw; I should be able to start my subscriber first, then the publisher, and still receive all messages], I must synchronise the publisher with the subscriber à la durapub/durasub etc. I am using the durasub.cpp and durapub.cpp examples found in the ZeroMQ guide.
If I use the examples as-is, the system works perfectly.
If I now add scoping braces around the ZMQ_PUSH socket in durasub.cpp
{
    zmq::socket_t sync (context, ZMQ_PUSH);
    sync.connect(syncstr.c_str());
    s_send (sync, "sync");
}
the system stops working. The matching 'ZMQ_PULL' signal never reaches the application level in durapub.cpp.
I have stepped through the C++ wrapper to check the return values from zmq_close and all is well. As far as ZMQ is concerned it has delivered the message to the endpoint.
Hopefully I've done something obviously stupid?
There's more. The addition of
std::this_thread::sleep_for(std::chrono::milliseconds(1));
allows the system (i.e. the pub/sub) to start working again. So it's clearly a race condition, presumably in the reaper thread as it destroys the socket.
More digging around. I think LIBZMQ-179 refers to the problem as well.
EDIT#2 2014-08-13 03:00 [UTC+0000]
Publisher.cpp:
#include <zmq.hpp>
#include <zhelpers.hpp>
#include <string>

int main (int argc, char *argv[])
{
    zmq::context_t context(1);

    std::string bind_point("tcp://*:5555");
    std::string sync_bind("tcp://*:5554");

    zmq::socket_t sync(context, ZMQ_PULL);
    sync.bind(sync_bind.c_str());

    // We send updates via this socket
    zmq::socket_t publisher(context, ZMQ_PUB);
    publisher.bind(bind_point.c_str());

    // Wait for synchronization request
    std::string tmp = s_recv (sync);
    std::cout << "Received: " << tmp << std::endl;

    int numbytessent = s_send (publisher, "END");
    std::cout << numbytessent << " bytes sent" << std::endl;
}
Subscriber.cpp
#include <zmq.hpp>
#include <zhelpers.hpp>
#include <string>

int main (int argc, char *argv[])
{
    std::string connectstr("tcp://127.0.0.1:5555");
    std::string syncstr("tcp://127.0.0.1:5554");

    zmq::context_t context(1);

    zmq::socket_t subscriber (context, ZMQ_SUB);
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);
    subscriber.connect(connectstr.c_str());

#if ENABLE_PROBLEM
    {
#endif // ENABLE_PROBLEM
        zmq::socket_t sync (context, ZMQ_PUSH);
        sync.connect(syncstr.c_str());
        s_send (sync, "sync");
#if ENABLE_PROBLEM
    }
#endif // ENABLE_PROBLEM

    while (1)
    {
        std::cout << "Receiving..." << std::endl;
        std::string s = s_recv (subscriber);
        std::cout << s << std::endl;
        if (s == "END")
        {
            break;
        }
    }
}
Compile each cpp to its own exe.
Start both exes (starting order is irrelevant)
If ENABLE_PROBLEM is defined:
Publisher: (EMPTY prompt)
Subscriber: 'Receiving...'
And then you have to kill both processes because they're hung...
If ENABLE_PROBLEM is not defined:
Publisher: 'Received: sync'
'3 bytes sent'
Subscriber: 'Receiving...'
'END'
EDIT#1 2014-08-11: Original post has changed, without leaving revisions visible
What is the goal?
With all due respect, it is quite hard to isolate the goal and mock up any PASS/FAIL test to validate it from just the three SLOCs above.
So let's start step by step.
What ZMQ-primitives are used there?
T.B.D.
post-EDIT#1: ZMQ_PUSH + ZMQ_PULL + ( hidden ZMQ_PUB + ZMQ_SUB ... next time rather post ProblemDOMAIN-context-complete sources, best enriched with self-test-case outputs, like:
...
// <code>-debug-isolation-framing ------------------------------------------------
std::cout << "---[Pre-test]: sync.connect(syncstr.c_str()) argument" << std::endl;
std::cout << syncstr.c_str() << std::endl;
std::cout << "---[Use/exec]: " << std::endl;
sync.connect( syncstr.c_str());
// <code>-debug-isolation-framing ------------------------------------------------
...
)
What ZMQ-context create/terminate life-cycle-policy is deployed?
T.B.D.
post-EDIT#1: n.b.: ZMQ_LINGER rather influences .close() of the resource, which may take place way before a ZMQ_Context termination appears. ( And may block ... which hurts ... )
A note on "When does ZMQ_LINGER really matter?"
This parameter comes into action once a Context is about to be terminated while a sending queue is not yet empty and an attempt to zmq_close() is being handled.
In most architectures ( ... the more so in low-latency / high-performance ones, where microseconds and nanoseconds count ... ) the (shared/restricted) resources setup/disposal operations appear, for many reasons, either at the very beginning or at the very end of the system life-cycle. Needless to tell more about why; just imagine the overheads directly associated with all the setup / discard operations, which are altogether simply not feasible to take place ( the less to take place repetitively ... ) during the routine flow of operations in near-real-time system designs.
So, when the system processes come to the final "tidy-up" phase ( just before exit ):
setting ZMQ_LINGER == 0 simply ignores whatever is still inside the <sender>'s queue and allows a prompt zmq_close() + zmq_term();
similarly, ZMQ_LINGER == -1 puts a label of [having the utmost value] on whatever is still hanging inside the <sender>'s queue, so the whole system has to wait ad infinitum until ( hopefully some ) <receiver> retrieves & "consumes" all the en-queued messages, before any zmq_close() + zmq_term() is allowed to take place ... which could take pretty long and is fully out of your control;
and finally, ZMQ_LINGER > 0 serves as a compromise: wait a defined amount of [msec]-s, should some <receiver> come and retrieve the en-queued message(s). However, at the given TimeDOMAIN milestone, the system proceeds to zmq_close() + zmq_term() for a graceful, clean release of all reserved resources and exits in accord with the system-design timing constraints.
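For illustration of where this knob is applied ( not as a fix for the race above - a zero linger would drop the still-undelivered "sync" message rather than wait for it ), the option is set per socket, before the socket gets closed:

int linger = 2000;                                      // wait up to 2 s at close time
sync.setsockopt(ZMQ_LINGER, &linger, sizeof(linger));   // applies to this one socket only
s_send(sync, "sync");
// leaving the scope now runs zmq_close(), which honours the 2 s linger window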

How to pass user-defined data to a worker thread using IOCP?

Hey... I created a small test server using I/O completion ports and winsock.
I can successfully connect and associate a socket handle with the completion port.
But I don't know how to pass user-defined data structures into the worker thread...
What I've tried so far was passing a user structure as (ULONG_PTR)&structure as the completion key in the association call of CreateIoCompletionPort().
But that did not work.
Now I tried defining my own OVERLAPPED structure and using CONTAINING_RECORD() as described here http://msdn.microsoft.com/en-us/magazine/cc302334.aspx and http://msdn.microsoft.com/en-us/magazine/bb985148.aspx.
But that does not work either (I get garbage values for the contents of pHelper).
So my question is: how can I pass data to the worker thread using WSARecv(), GetQueuedCompletionStatus() and the completion packet or the OVERLAPPED structure?
EDIT: How can I successfully transmit "per-connection data"? It seems I got the art of doing it (as explained in the two links above) wrong.
Here goes my code (yes, it's ugly and it's only TEST code):
struct helper
{
    SOCKET       m_sock;
    unsigned int m_key;
    OVERLAPPED   over;
};

///////

SOCKET newSock = INVALID_SOCKET;
WSABUF wsabuffer;
char cbuf[250];
wsabuffer.buf = cbuf;
wsabuffer.len = 250;
DWORD flags, bytesrecvd;

while(true)
{
    newSock = accept(AcceptorSock, NULL, NULL);
    if(newSock == INVALID_SOCKET)
        ErrorAbort("could not accept a connection");

    //associate socket with the CP
    if(CreateIoCompletionPort((HANDLE)newSock, hCompletionPort, 3, 0) != hCompletionPort)
        ErrorAbort("Wrong port associated with the connection");
    else
        cout << "New Connection made and associated\n";

    helper* pHelper = new helper;
    pHelper->m_key  = 3;
    pHelper->m_sock = newSock;
    memset(&(pHelper->over), 0, sizeof(OVERLAPPED));

    flags = 0;
    bytesrecvd = 0;
    if(WSARecv(newSock, &wsabuffer, 1, NULL, &flags, (OVERLAPPED*)pHelper, NULL) != 0)
    {
        if(WSAGetLastError() != WSA_IO_PENDING)
            ErrorAbort("WSARecv didnt work");
    }
}

//Cleanup
CloseHandle(hCompletionPort);
cin.get();
return 0;
}
DWORD WINAPI ThreadProc(HANDLE h)
{
    DWORD dwNumberOfBytes = 0;
    OVERLAPPED* pOver = nullptr;
    helper* pHelper = nullptr;
    WSABUF RecvBuf;
    char cBuffer[250];
    RecvBuf.buf = cBuffer;
    RecvBuf.len = 250;
    DWORD dwRecvBytes = 0;
    DWORD dwFlags = 0;
    ULONG_PTR Key = 0;

    GetQueuedCompletionStatus(h, &dwNumberOfBytes, &Key, &pOver, INFINITE);

    //Extract helper
    pHelper = (helper*)CONTAINING_RECORD(pOver, helper, over);
    cout << "Received Overlapped item" << endl;

    if(WSARecv(pHelper->m_sock, &RecvBuf, 1, &dwRecvBytes, &dwFlags, pOver, NULL) != 0)
        cout << "Could not receive data\n";
    else
        cout << "Data Received: " << RecvBuf.buf << endl;

    ExitThread(0);
}
If you pass your struct like this it should work just fine:
helper* pHelper = new helper;
CreateIoCompletionPort((HANDLE)newSock, hCompletionPort, (ULONG_PTR)pHelper, 0);
...
helper* pHelper = NULL;
GetQueuedCompletionStatus(h, &dwNumberOfBytes, (PULONG_PTR)&pHelper, &pOver, INFINITE);
Edit to add per-I/O data:
One of the frequently abused features of the asynchronous APIs is that they don't copy the OVERLAPPED struct; they simply use the provided one - hence the OVERLAPPED struct returned from GetQueuedCompletionStatus points to the originally provided struct. So:
struct helper {
    OVERLAPPED m_over;
    SOCKET     m_socket;
    UINT       m_key;
};

if(WSARecv(newSock, &wsabuffer, 1, NULL, &flags, &pHelper->m_over, NULL) != 0)
Notice that, again, in your original sample, you were getting your casting wrong. (OVERLAPPED*)pHelper was passing a pointer to the START of the helper struct, but the OVERLAPPED part was declared last. I changed it to pass the address of the actual overlapped part, which means that the code compiles without a cast, which lets us know we are doing the correct thing. I also moved the overlapped struct to be the first member of the struct.
To catch the data on the other side:
OVERLAPPED* pOver;
ULONG_PTR   key;
if(GetQueuedCompletionStatus(h, &dw, &key, &pOver, INFINITE))
{
    // c cast
    helper* pConnData = (helper*)pOver;
On this side it is particularly important that the OVERLAPPED struct is the first member of the helper struct, as that makes it easy to cast back from the OVERLAPPED* the API gives us to the helper* we actually want.
You can send special-purpose data of your own to the completion port via PostQueuedCompletionStatus.
The I/O completion packet will satisfy an outstanding call to the GetQueuedCompletionStatus function. This function returns with the three values passed as the second, third, and fourth parameters of the call to PostQueuedCompletionStatus. The system does not use or validate these values. In particular, the lpOverlapped parameter need not point to an OVERLAPPED structure.
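A minimal sketch of that mechanism, used here ( as an assumed convention, not something from the question's code ) to wake a worker with a custom shutdown packet:

// Producer side: post a packet that carries no real I/O, only our own values.
const ULONG_PTR SHUTDOWN_KEY = 0xFFFFFFFF;             // hypothetical sentinel value
PostQueuedCompletionStatus(hCompletionPort, 0, SHUTDOWN_KEY, NULL);

// Worker side: recognise the sentinel before treating the packet as socket I/O.
DWORD       bytes = 0;
ULONG_PTR   key   = 0;
OVERLAPPED* pOver = NULL;
if (GetQueuedCompletionStatus(hCompletionPort, &bytes, &key, &pOver, INFINITE)) {
    if (key == SHUTDOWN_KEY && pOver == NULL) {
        // not a completed WSARecv()/WSASend() - it is our own packet,
        // so leave the worker loop / exit the thread here
    }
}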
I use the standard socket routines (socket, closesocket, bind, accept, connect ...) for creating/destroying and ReadFile/WriteFile for I/O as they allow use of the OVERLAPPED structure.
After your socket has accepted or connected, you should associate it with the session context that it services. Then you associate your socket with an IOCP and (in the third parameter) provide it with a reference to the session context. The IOCP does not know what this reference is, and doesn't care for that matter. The reference is for YOUR use, so that when you get an I/O completion through GetQueuedCompletionStatus, the variable pointed to by parameter 3 is filled in with the reference; you immediately find the context associated with the socket event and can begin servicing it. I usually use an indexed structure containing (among other things) the socket declaration, the overlapped structure, and other session-specific data. The reference I pass to CreateIoCompletionPort in parameter 3 is the index of the structure member containing the socket.
You need to check whether GetQueuedCompletionStatus returned a completion or a timeout. With a timeout you can run through your indexed structure and see (for example) whether one of the sessions has timed out or something else has happened, and take appropriate housekeeping actions.
The overlapped structure also needs to be checked to see that the I/O completed correctly.
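A minimal sketch of those two checks, assuming a finite timeout is used instead of INFINITE:

DWORD       bytes = 0;
ULONG_PTR   key   = 0;
OVERLAPPED* pOver = NULL;

BOOL ok = GetQueuedCompletionStatus(hCompletionPort, &bytes, &key, &pOver, 5000);
if (!ok && pOver == NULL) {
    // Nothing was dequeued: a timeout (GetLastError() == WAIT_TIMEOUT) or a failed call.
    // This is the moment for the per-session housekeeping pass.
}
else if (!ok && pOver != NULL) {
    // A packet was dequeued, but the underlying I/O itself failed;
    // GetLastError() says why, and pOver identifies the affected OVERLAPPED/session.
}
else {
    // Successful completion: 'bytes' and 'key' describe the finished I/O.
}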
The function servicing the IOCP should be a separate, multi-threaded entity. Use the same number of threads as you have cores in your system, or at least no more than that, since anything more wastes system resources (you don't have more resources for servicing the events than the number of cores in your system, right?).
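A minimal sketch of sizing the pool to the core count ( GetSystemInfo() is the classic way to query it; ThreadProc is the worker from the question, which already takes the completion-port handle as its argument ):

SYSTEM_INFO si;
GetSystemInfo(&si);   // dwNumberOfProcessors = number of logical processors

for (DWORD i = 0; i < si.dwNumberOfProcessors; ++i) {
    HANDLE hThread = CreateThread(NULL, 0,
                                  (LPTHREAD_START_ROUTINE)ThreadProc,
                                  hCompletionPort, 0, NULL);
    if (hThread != NULL)
        CloseHandle(hThread);   // the worker keeps running; we only drop our handle
}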
IOCPs really are the best of all worlds (almost too good to be true), and anyone who says "one thread per socket" or "wait on a multiple-socket list in one function" doesn't know what they are talking about. The former stresses your scheduler and the latter is polling - and polling is ALWAYS extremely wasteful.