I'm trying to publish more than 100 messages per second to the Azure IoT Hub built-in event hub using the libmosquitto 1.6.8 library. I'm on the Free tier of Azure IoT Hub, and I know there is a throttle limit of 100 messages per second, but that is not related to this issue: I have not been able to publish even half of the messages to Azure IoT Hub.
Basically, I have a list of multiple values in a multimap that needs to be sent. The metric list:
std::multimap< const std::string, std::tuple< const std::string, const std::string, float> > calculatedMetricList;
I iterate through the multimap, construct an object payload from each value, and send it. This means that the mosquitto_publish method is called multiple times.
Following is the code for publishing the messages:
void MosquittoClient::sendDataToUpstreamSystem(){
StatisticalMethod statisticalMethod;
int rc;
MosquittoClient pub_mosq(
"<IoT Hub Name>",
printf("Using MQTT to get data payload from host: %s and on port: %d.\r\n", pub_mosq.get_host(), pub_mosq.get_port());
// init the mosquitto lib
// create the mosquito object
struct mosquitto* mosq = mosquitto_new(pub_mosq.get_deviceID(), false, NULL);
// add callback functions
mosquitto_connect_callback_set(mosq, MosquittoClient::connect_callback);
mosquitto_publish_callback_set(mosq, MosquittoClient::publish_callback);
mosquitto_message_callback_set(mosq, MosquittoClient::on_message);
mosquitto_disconnect_callback_set(mosq, MosquittoClient::on_disconnect_callback);
// set mosquitto username, password and options
mosquitto_username_pw_set(mosq, pub_mosq.get_userName(), pub_mosq.get_password());
// specify the certificate to use
std::ifstream infile(pub_mosq.get_certificate());
bool certExists = infile.good();
if (!certExists)
printf("Warning: Could not find file '%s'! The mosquitto loop may fail.\r\n", pub_mosq.get_certificate());
printf("Using certificate: %s\r\n", pub_mosq.get_certificate());
mosquitto_tls_set(mosq, pub_mosq.get_certificate(), NULL, NULL, NULL, NULL);
// specify the mqtt version to use
int option = MQTT_PROTOCOL_V311; // a stack variable avoids leaking a heap allocation
rc = mosquitto_opts_set(mosq, MOSQ_OPT_PROTOCOL_VERSION, &option);
rc = pub_mosq.mosquitto_error(rc, "Error: opts_set protocol version");
printf("Setting up options OK\r\n");
// connect
rc = mosquitto_connect_async(mosq, pub_mosq.get_host(), pub_mosq.get_port(), 4);
rc = pub_mosq.mosquitto_error(rc, NULL);
printf("Connect returned OK\r\n");
rc = mosquitto_loop_start(mosq);
rc = pub_mosq.mosquitto_error(rc, NULL);
for (auto itr = Metrics::calculatedMetricList.begin(); itr != Metrics::calculatedMetricList.end(); itr++) {
int msgId = rand();
std::string test1= itr->first;
std::string test2 = std::get<0>(itr->second);
std::string test3= std::get<1>(itr->second); // metric type
float value = std::get<2>(itr->second); // value
DataPayload objectPayload(
std::string dataPayload = objectPayload.getPayload();
std::cout << "dataPayload: " << dataPayload << std::endl;
std::cout << "dataPayload Size: " << dataPayload.size() << std::endl;
// once connected, we can publish (send) a Telemetry message
printf("Publishing to topic: %s\r\n", pub_mosq.get_topic());
rc = pub_mosq.publishToTopic(mosq, &msgId, dataPayload.size(), (char *)dataPayload.c_str());
printf("Publish returned OK\r\n");
rc = pub_mosq.mosquitto_error(rc, NULL);
} while (rc != MOSQ_ERR_SUCCESS);
mosquitto_loop_stop(mosq, true);
Publish method:
int MosquittoClient::publishToTopic(struct mosquitto *mosq, int *msgId, int sizeOfData, char *data)
return mosquitto_publish(mosq, msgId, p_topic, sizeOfData, data, 1, true);
When running the program all the messages published return ok, according to the console. But only one or two messages are appearing on the Azure IoT Hub side.
The following screenshot of the IoT Hub monitoring shows that only one message got through at that time.
[Image: Device Explorer Twin Monitoring]
I have tried so many different solutions, but the program was unable to publish all the messages. It looks like the publish method is waiting to complete the first message but the iteration is moving onto the next message, causing it to be dropped. If that is the cause of the dropped messages, what is the best way to sequence the message sending? Otherwise, what else could be causing messages to be dropped?
The problem was that the program didn't wait until the messages were successfully published to the broker (Azure IoT Hub). You know a message has been successfully published to the broker when publish_callback is invoked.
void MosquittoClient::publish_callback(struct mosquitto* mosq, void* userdata, int mid)
printf("Publish OK.\r\n");
The solution was to start the Mosquitto loop before the connection is established, and to sleep the thread before the destroy and cleanup calls.
mosquitto_loop_stop(mosq, true);

mosquitto_publish() is asynchronous: having it return MOSQ_ERR_SUCCESS simply means that the publication of the message has properly been passed to the Mosquitto thread. So at the moment you are enqueuing lots of messages, and then have your program terminate before it had a chance to actually send the packets.
You can use your MosquittoClient::publish_callback callback to check that all the messages have actually been sent before stopping the loop and terminating your program.
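To make the callback-based check concrete, here is a minimal sketch of a counter that tracks publishes still in flight. PublishTracker and its method names are hypothetical and not part of libmosquitto. The sketch assumes that onQueued() is called just before each mosquitto_publish() and that onPublished() is called from publish_callback (or by the caller, if mosquitto_publish() itself returns an error).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not a libmosquitto type): counts messages that were
// handed to mosquitto_publish() but not yet confirmed by the publish callback.
class PublishTracker {
public:
    // Assumed call site: just before mosquitto_publish().
    void onQueued() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++pending_;
    }

    // Assumed call site: inside publish_callback, which runs on the thread
    // started by mosquitto_loop_start().
    void onPublished() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(pending_ > 0); // every confirmation must match a queued message
            --pending_;
        }
        drained_.notify_all();
    }

    // Blocks until nothing is pending or the timeout expires.
    // Returns true if every queued message was confirmed.
    bool waitUntilDrained(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return drained_.wait_for(lock, timeout, [this] { return pending_ == 0; });
    }

    int pending() {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_;
    }

private:
    std::mutex mutex_;
    std::condition_variable drained_;
    int pending_ = 0;
};
```

With such a helper, the sending loop could call waitUntilDrained(...) before mosquitto_loop_stop(mosq, true) instead of sleeping for a fixed time, and treat a false return value as "some messages were never acknowledged".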


Writing Data to Poll Invalid Socket Causes Uncatchable Exception

I am working on a game server that uses sockets and implemented a polling function that sends the message "[POLL]" over all player sockets in a lobby every second to notify the player clients that their connection is still alive.
If I disconnect on the client-side the socket is still polled with no errors, however, if I create a new connection with the same client (Gets a new FD and is added to the map as a second player), the whole server crashes without any exceptions/warnings/messages when it attempts to write to the previous socket FD. My call to Write on the socket is wrapped in a try/catch that doesn't catch any exceptions and, when debugging using gdb, I am not given any error messaging.
This is the Socket Write function:
int Socket::Write(ByteArray const& buffer)
if (!open)
return -1;
// Convert buffer to raw char array
char* raw = new char[buffer.v.size()];
for (int i=0; i < buffer.v.size(); i++)
raw[i] = buffer.v[i];
// Perform the write operation
int returnValue = write(GetFD(), raw, buffer.v.size()); // <- Crashes program
delete[] raw; // free the temporary copy so it is not leaked on every call
if (returnValue <= 0)
open = false;
return returnValue;
And this is the Poll function (Players are stored in a map of uint -> Socket*):
// Polls all connected players to tell them
// to keep their connections alive.
void Lobby::Poll()
for (auto it = players.begin(); it != players.end(); it++)
if (it->second != nullptr && it->second->IsOpen())
catch (...)
std::cout << "Failed to write to " << it->first << std::endl;
I would expect to see the "Failed to write to " message but instead the entire server program exits with no messaging. What could be happening here?
I was unable to find a reason for the program crashing in the call to write but I was able to find a workaround.
I perform a poll operation on the file descriptor prior to calling write and I query the POLLNVAL event. If I receive a nonzero value, the FD is now invalid.
// Check if FD is valid
struct pollfd pollFd;
pollFd.fd = GetFD();
pollFd.events = POLLNVAL;
if (poll(&pollFd, 1, 0) > 0)
open = false;
return -1;
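The cause of the silent exit is not confirmed in this thread. One explanation that fits the symptoms is SIGPIPE: on POSIX systems, writing to a socket whose peer has closed raises that signal, and its default action terminates the process. A signal is not a C++ exception, so catch (...) never sees it. The sketch below assumes Linux and uses hypothetical helper names (fdIsInvalid, writeNoSignal). It shows the POLLNVAL check from above next to a send() call with MSG_NOSIGNAL, which reports the failure as EPIPE instead of raising the signal.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Returns true if the kernel reports that fd is not an open descriptor.
// POLLNVAL is reported in revents regardless of what is requested in events.
bool fdIsInvalid(int fd) {
    struct pollfd p{};
    p.fd = fd;
    p.events = 0;
    return poll(&p, 1, 0) > 0 && (p.revents & POLLNVAL) != 0;
}

// Hypothetical replacement for the raw write() call on a socket.
// MSG_NOSIGNAL makes send() fail with errno == EPIPE when the peer is gone,
// instead of delivering SIGPIPE to the process.
ssize_t writeNoSignal(int fd, const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    return send(fd, data, size, MSG_NOSIGNAL);
}
```

If this is indeed the cause, Socket::Write could call writeNoSignal(GetFD(), raw, buffer.v.size()) and handle a negative return value as before. Ignoring the signal process-wide with signal(SIGPIPE, SIG_IGN) is a common alternative.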

gRPC: What are the best practices for long-running streaming?

We've implemented a Java gRPC service that runs in the cloud, with a unidirectional (client to server) streaming RPC which looks like:
rpc PushUpdates(stream Update) returns (Ack);
A C++ client (a mobile device) calls this rpc as soon as it boots up, to continuously send an update every 30 or so seconds, perpetually as long as the device is up and running.
ChannelArguments chan_args;
// this will be secure channel eventually
auto channel_p = CreateCustomChannel(remote_addr, InsecureChannelCredentials(), chan_args);
auto stub_p = DialTcc::NewStub(channel_p);
// ...
Ack ack;
auto strm_ctxt_p = make_unique<ClientContext>();
auto strm_p = stub_p->PushUpdates(strm_ctxt_p.get(), &ack);
// ...
while (true) {
// wait until we are ready to send a new update
Update updt;
// populate updt;
if(!strm_p->Write(updt)) {
// stream is not kosher, create a new one and restart
Now different kinds of network interruptions happen while this is happening:
the gRPC service running in the cloud may go down (for maintenance) or may simply become unreachable.
the device's own ip address keeps changing as it is a mobile device.
We've seen that on such events, neither the channel nor the Write() API is able to detect the network disconnection reliably. At times the client keeps calling Write() (which doesn't return false) but the server doesn't receive any data (Wireshark doesn't show any activity at the outgoing port of the client device).
What are the best practices to recover in such cases, so that the server starts receiving the updates within X seconds from the time when such an event occurs? It is understandable that there would be a loss of X seconds' worth of data whenever such an event happens, but we want to recover reliably within X seconds.
gRPC version: 1.30.2, Client: C++14/Linux, Server: Java/Linux
Here's how we've hacked this. I want to check if this can be made any better or anyone from gRPC can guide me about a better solution.
The protobuf for our service looks like this. It has an RPC for pinging the service, which is used frequently to test connectivity.
// Message used in IsAlive RPC
message Empty {}
// Acknowledgement sent by the service for updates received
message UpdateAck {}
// Messages streamed to the service by the client
message Update {
service GrpcService {
// for checking if we're able to connect
rpc Ping(Empty) returns (Empty);
// streaming RPC for pushing updates by client
rpc PushUpdate(stream Update) returns (UpdateAck);
Here is how the c++ client looks, which does the following:
Create the stub for calling the RPCs, if the stub is nullptr.
Call Ping() in regular intervals until it is successful.
On success call PushUpdate(...) RPC to create a new stream.
On failure reset the stream to nullptr.
Stream(): Do the following in a while(true) loop:
Get the update to be pushed.
Call Write(...) on the stream with the update to be pushed.
If Write(...) fails for any reason break and the control goes back to Connect().
Once every 30 minutes (or some other regular interval; the code below uses 15 minutes), reset everything (stub, channel, stream) to nullptr to start afresh. This is required because at times Write(...) does not fail even if there is no connection between the client and the service: the Write(...) calls succeed, but the outgoing port on the client shows no activity in Wireshark!
Here is the code:
constexpr int GRPC_TIMEOUT_S = 10;
constexpr int RESTART_INTERVAL_M = 15;
constexpr int GRPC_KEEPALIVE_TIME_MS = 10000;
string root_ca, tls_key, tls_cert; // for SSL
string remote_addr = "";
void ResetStreaming() {
if (stub_p) {
if (strm_p) { // graceful restart/stop, this pair of API are called together, in this order
if (!strm_p->WritesDone()) {
// Log a message
strm_p->Finish(); // Log if return value of this is NOT grpc::OK
strm_p = nullptr;
strm_ctxt_p = nullptr;
stub_p = nullptr;
channel_p = nullptr;
void CreateStub() {
if (!stub_p) {
ChannelArguments chan_args;
channel_p = CreateCustomChannel(
SslCredentials(SslCredentialsOptions{root_ca, tls_key, tls_cert}),
stub_p = GrpcService::NewStub(channel_p);
void Stream() {
const auto restart_time = steady_clock::now() + minutes(RESTART_INTERVAL_M);
while (!stop) {
// restart every RESTART_INTERVAL_M (15m) even if ALL IS WELL!!
if (steady_clock::now() > restart_time) {
Update updt = GetUpdate(); // get the update to be sent
if (!stop) {
if (channel_p->GetState(true) == GRPC_CHANNEL_SHUTDOWN ||
!strm_p->Write(updt)) {
// could not write!!
return; // we will Connect() again
// stopped due to stop = true or interval to create new stream has expired
ResetStreaming(); // channel, stub, stream are recreated once in every 15m
bool PingRemote() {
ClientContext ctxt;
ctxt.set_deadline(system_clock::now() + seconds(GRPC_TIMEOUT_S));
Empty req, resp;
if (stub_p->Ping(&ctxt, req, &resp).ok()) {
static UpdateAck ack;
strm_ctxt_p = make_unique<ClientContext>(); // need new context
strm_p = stub_p->PushUpdate(strm_ctxt_p.get(), &ack);
return true;
if (strm_p) {
strm_p = nullptr;
strm_ctxt_p = nullptr;
return false;
void Connect() {
while (!stop) {
if (PingRemote() || stop) {
sleep_for(seconds(5)); // wait before retrying
// set to true from another thread when we want to stop
atomic<bool> stop{false}; // brace-init: copy-initializing an atomic from a bool is ill-formed before C++17
void StreamUntilStopped() {
if (stop) {
strm_thread_p = make_unique<thread>([&] {
while (!stop) {
// called by the thread that sets stop = true
void Finish() {
With this we are seeing that the streaming recovers within 15 minutes (or RESTART_INTERVAL_M) whenever there is a disruption for any reason. This code runs in a fast path, so I am curious to know if this can be made any better.
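One way to bring the recovery time closer to X seconds, rather than RESTART_INTERVAL_M, might be to keep calling Ping() with a deadline while streaming and to reset as soon as no ping has succeeded for X seconds. This is a sketch under that assumption only: the code above pings solely before opening a stream, and LivenessWatchdog is a hypothetical helper, not a gRPC type.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: remembers when the connection was last proven alive.
class LivenessWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    LivenessWatchdog(Clock::duration maxSilence, Clock::time_point now)
        : maxSilence_(maxSilence), lastAlive_(now) {
        assert(maxSilence > Clock::duration::zero());
    }

    // Assumed call site: whenever Ping() (or another RPC with a deadline) succeeds.
    void noteAlive(Clock::time_point now) { lastAlive_ = now; }

    // True when there has been no proof of life for longer than maxSilence.
    // The caller would then call ResetStreaming() and go back to Connect().
    bool isStale(Clock::time_point now) const {
        return now - lastAlive_ > maxSilence_;
    }

private:
    Clock::duration maxSilence_;
    Clock::time_point lastAlive_;
};
```

In Stream(), the check `steady_clock::now() > restart_time` could then be complemented by `watchdog.isStale(steady_clock::now())`, with the periodic restart kept as a fallback.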

ZeroMQ IPC across several instances of a program

I am having some problems with inter-process communication in ZMQ between several instances of a program.
I am using Linux OS
I am using zeromq/cppzmq, header-only C++ binding for libzmq
If I run two instances of this application (say, in two terminals), passing one an argument that makes it a listener and the other an argument that makes it a sender, the listener never receives a message. I have tried TCP and IPC to no avail.
#include <zmq.hpp>
#include <string>
#include <iostream>
int ListenMessage();
int SendMessage(std::string str);
zmq::context_t global_zmq_context(1);
int main(int argc, char* argv[] ) {
std::string str = "Hello World";
if (atoi(argv[1]) == 0) ListenMessage();
else SendMessage(str);
global_zmq_context.close(); // context_t owns the handle; its address must not be passed to zmq_ctx_destroy()
return 0;
int SendMessage(std::string str) {
std::cout << "Sending \n";
zmq::socket_t publisher(global_zmq_context, ZMQ_PUB);
int linger = 0;
int rc = zmq_setsockopt(publisher, ZMQ_LINGER, &linger, sizeof(linger));
rc = zmq_connect(publisher, "tcp://");
if (rc == -1) {
printf ("E: connect failed: %s\n", strerror (errno));
return -1;
zmq::message_t message(static_cast<const void*>(str.data()), str.size());
rc = publisher.send(message);
if (rc == -1) {
printf ("E: send failed: %s\n", strerror (errno));
return -1;
return 0;
int ListenMessage() {
std::cout << "Listening \n";
zmq::socket_t subscriber(global_zmq_context, ZMQ_SUB);
int rc = zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, "", 0);
int linger = 0;
rc = zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof(linger));
rc = zmq_bind(subscriber, "tcp://");
if (rc == -1) {
printf ("E: bind failed: %s\n", strerror (errno));
return -1;
std::vector<zmq::pollitem_t> p = {{subscriber, 0, ZMQ_POLLIN, 0}};
while (true) {
zmq::message_t rx_msg;
// when timeout (the third argument here) is -1,
// then block until ready to receive
std::cout << "Still Listening before poll \n";
zmq::poll(p.data(), 1, -1);
std::cout << "Found an item \n"; // not reaching
if (p[0].revents & ZMQ_POLLIN) {
// received something on the first (only) socket
std::string rx_str;
rx_str.assign(static_cast<char *>(rx_msg.data()), rx_msg.size());
std::cout << "Received: " << rx_str << std::endl;
return 0;
This code works if I run one instance of the program with two threads:
std::thread t_sub(ListenMessage);
sleep(1); // Slow joiner in ZMQ PUB/SUB pattern
std::thread t_pub(SendMessage, str);
But I am wondering why when running two instances of the program the code above won't work?
Thanks for your help!
In case one has never worked with ZeroMQ, one may enjoy first looking at "ZeroMQ Principles in less than Five Seconds" before diving into further details.
Q : wondering why when running two instances of the program the code above won't work?
This code will never fly - and it has nothing to do with either thread-based or process-based [CONCURRENT] processing.
It was caused by a wrong design of the Inter Process Communication.
ZeroMQ can provide for this either one of the supported transport-classes :{ ipc:// | tipc:// | tcp:// | norm:// | pgm:// | epgm:// | vmci:// } plus having even smarter one for in-process comms, an inproc:// transport-class ready for inter-thread comms, where a stack-less communication may enjoy the lowest ever latency, being just a memory-mapped policy.
The selection of L3/L2-based networking stack for an Inter-Process-Communication is possible, yet sort of the most "expensive" option.
The Core Mistake :
Given that choice, any single process ( not speaking about a pair of processes ) will collide on an attempt to .bind() its AccessPoint onto the very same TCP/IP-address:port#
The Other Mistake :
Even for the sake of a solo programme launched, both of the spawned threads attempt to .bind() its private AccessPoint, yet none does an attempt to .connect() a matching "opposite" AccessPoint.
At least one has to successfully .bind(), and
at least one has to successfully .connect(), so as to get a "channel", here of the PUB/SUB Archetype.
decide about a proper, right-enough Transport-Class ( best avoid an overkill to operate the full L3/L2-stack for localhost/in-process IPC )
refactor the Address:port# management ( for 2+ processes not to fail on .bind()-(s) to the same ( hard-wired ) address:port# )
always detect and handle appropriately the returned {PASS|FAIL}-s from API calls
always set LINGER to zero explicitly ( you never know )

Local attestation with Intel SGX

I'm trying to perform a local attestation between two enclaves created from two different applications.
The provided sample code for Linux here creates 3 different enclaves and then establishes secure connections between them. But those enclaves have all been created by the same application which therefore is aware of all enclave IDs.
If two different applications each create their own enclave and these enclaves should communicate with one another, how would the source enclave get to know the ID of the destination enclave? Would that ID have to be transmitted from one application to the other in a "general" way (IPC)?
I tried a simple test by starting a destination enclave and printing its ID: "26ce00000002"
Then I used this ID in the local attestation example to try to connect to this running destination enclave:
uint64_t wrapper(const char *c) {
errno = 0;
uint64_t result = strtoull(c, NULL, 16);
if (errno == EINVAL) {
cout << "WRONG NUMBER" << endl;
} else if (errno == ERANGE) {
cout << "Too big\n";
return result;
uint32_t load_enclaves() {
uint32_t enclave_temp_no;
int ret, launch_token_updated;
sgx_launch_token_t launch_token;
enclave_temp_no = 0;
ret = sgx_create_enclave(ENCLAVE1_PATH, SGX_DEBUG_FLAG, &launch_token, &launch_token_updated, &e1_enclave_id, NULL);
if (ret != SGX_SUCCESS) {
return ret;
g_enclave_id_map.insert(std::pair<sgx_enclave_id_t, uint32_t>(e1_enclave_id, enclave_temp_no));
const char *test = "26ce00000002";
e2_enclave_id = wrapper(test);
g_enclave_id_map.insert(std::pair<sgx_enclave_id_t, uint32_t>(e2_enclave_id, enclave_temp_no));
int main(int argc, char **argv) {
uint32_t ret_status;
sgx_status_t status;
if(load_enclaves() != SGX_SUCCESS) {
printf("\nLoad Enclave Failure");
printf("\nAvailable Enclaves");
printf("\nEnclave1 - EnclaveID %lx",e1_enclave_id);
printf("\nEnclave2 - EnclaveID %lx",e2_enclave_id);
do {
//Test Create session between Enclave1(Source) and Enclave2(Destination)
status = Enclave1_test_create_session(e1_enclave_id, &ret_status, e1_enclave_id, e2_enclave_id);
if (status!=SGX_SUCCESS)
printf("Enclave1_test_create_session Ecall failed: Error status code is %x", status);
printf("\n\nSecure Channel Establishment between Source (E1) and Destination (E2) Enclaves successful !!!");
printf("\nSession establishment and key exchange failure between Source (E1) and Destination (E2): Error return status is %x\n", ret_status);
When executing the local attestation program with the source enclave I receive a "SGX_ERROR_INVALID_ENCLAVE_ID" error. This error is not thrown by the local attestation example program but comes from somewhere in the SGX libraries, and I don't know why, since the destination enclave is still running and therefore the ID should exist.
We do not need a secure connection to exchange the enclave IDs. The application can store the enclave ID, along with the enclave name, in a registry or on disk, from where the corresponding application can retrieve it to obtain the ID of the required enclave. The application then initiates a session between the source enclave and the destination enclave by doing an ECALL into the source enclave, passing in the enclave ID of the destination enclave. Upon receiving the enclave ID of the destination enclave, the source enclave does an OCALL into the core untrusted code, which then does an ECALL into the destination enclave to exchange the messages required to establish a session using the ECDH key exchange protocol.
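The "store the ID on disk along with the enclave name" step could look roughly like the sketch below. The file format and the function names saveEnclaveId and loadEnclaveId are assumptions made for illustration, and std::uint64_t stands in for sgx_enclave_id_t. Whether an ID read back this way is accepted by an ECALL issued from a different process is not established in this thread. The SGX_ERROR_INVALID_ENCLAVE_ID in the question suggests it may not be, in which case the two untrusted applications would have to relay the attestation messages over IPC themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Appends one "name hex-id" line to a plain-text registry file.
// The format is an assumption; names must not contain whitespace.
bool saveEnclaveId(const std::string& path, const std::string& name, std::uint64_t id) {
    assert(name.find_first_of(" \t\n") == std::string::npos);
    std::ofstream out(path, std::ios::app);
    if (!out) return false;
    out << name << ' ' << std::hex << id << '\n';
    return static_cast<bool>(out);
}

// Looks up the most recent entry for name; returns false if there is none.
bool loadEnclaveId(const std::string& path, const std::string& name, std::uint64_t& id) {
    std::ifstream in(path);
    std::string entryName;
    std::uint64_t entryId = 0;
    bool found = false;
    while (in >> entryName >> std::hex >> entryId) {
        if (entryName == name) {
            id = entryId; // later lines override earlier ones
            found = true;
        }
    }
    return found;
}
```

Reading the value as hexadecimal through the stream also avoids the hand-rolled strtoull wrapper from the question, because a failed parse simply ends the loop.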

czmq asynchronous send/receive (PUB/SUB)

I'm using czmq for interprocess communication.
There are 2 processes :
The server, receiving requests and sending replies but also sending events.
The client, sending requests and receiving replies but also listening to the events.
I have already successfully implemented the "request/reply" pattern with REQ/REP (details below).
Now I want to implement the notification mechanism.
I want my server to send its events without caring whether anyone receives them or not and without being blocked in any way.
The client listens to those events but should it crash, it mustn't have any impact on the server.
I believe PUB/SUB is the most appropriate pattern, but if not do not hesitate to enlighten me.
Here's my implementation (cleaned from checks and logs) :
The server publishes the events
Server::eventIpcPublisher = zsock_new_pub("#ipc:///tmp/events.ipc");
void Server::OnEvent(uint8_t* eventData, size_t dataSize) {
if (Server::eventIpcPublisher != nullptr) {
int retCode = zsock_send(Server::eventIpcPublisher, "b", eventData, dataSize);
The client listens to them in a dedicated thread
void Client::RegisterToEvents(const std::function<void(uint8_t*, size_t)>& callback) {
zsock_t* eventIpcSubscriber = zsock_new_sub(">ipc:///tmp/events.ipc", "");
listening = true;
while (listening) {
byte* receptionBuffer;
size_t receptionBufferSize;
int retCode = zsock_recv(eventIpcSubscriber, "b", &receptionBuffer, &receptionBufferSize);
if (retCode == 0) {
callback(static_cast<uint8_t*>(receptionBuffer), receptionBufferSize);
It doesn't work:
The server sends with return code 0, as if everything is ok,
The client doesn't receive anything (blocked on receive).
Help would be much appreciated, thanks in advance!
PS: here is the REQ/REP that I have already implemented with success (no help needed here, just for comprehension)
The client sends a request and then waits for the answer.
uint8_t* MulticamApi::GetDatabase(size_t& sizeOfData) {
zsock_t* requestSocket = zsock_new_req(">ipc:///tmp/requests.ipc");
if (requestSocket == nullptr)
return nullptr;
byte* receptionBuffer;
size_t receptionBufferSize;
int retCode = zsock_send(requestSocket, "i", static_cast<int>(IpcComm_GetClipDbRequest));
if (retCode != 0) {
sizeOfData = 0;
return nullptr;
retCode = zsock_recv(requestSocket, "b", &receptionBuffer, &receptionBufferSize);
databaseData.reset(new MallocGuard(static_cast<void*>(receptionBuffer)));
sizeOfData = receptionBufferSize;
return static_cast<uint8_t*>(databaseData->Data());
A dedicated thread in the server listens to requests, processes them and replies. (don't worry, delete is handled somewhere else)
U32 Server::V_OnProcessing(U32 waitCode) {
protocolIpcWriter = zsock_new_rep("#ipc:///tmp/requests.ipc");
while (running) {
int receptionInt = 0;
int retCode = zsock_recv(protocolIpcWriter, "i", &receptionInt);
if ((retCode == 0) && (receptionInt == static_cast<int>(IpcComm_GetClipDbRequest))) {
return 0;
void Server::GetDatabase() {
uint32_t dataSize = 10820 * 340;
uint8_t* data = new uint8_t[dataSize];
uint32_t nbBytesWritten = DbHelper::SaveDbToBuffer(data, dataSize);
int retCode = zsock_send(protocolIpcWriter, "b", data, nbBytesWritten);
I know my question's old, but for the record: I switched from czmq to the base zmq API and everything went smoothly. A colleague of mine also had issues with the czmq layer and switched to zmq to fix them, so that's definitely what I recommend.