I am having some problems with inter process communication in ZMQ between several instances of a program
I am using Linux OS
I am using zeromq/cppzmq, header-only C++ binding for libzmq
If I run two instances of this application (say on a terminal), I provide one with an argument to be a listener, then providing the other with an argument to be a sender. The listener never receives a message. I have tried TCP and IPC to no avail.
#include <zmq.hpp>
#include <string>
#include <iostream>
int ListenMessage();
int SendMessage(std::string str);
zmq::context_t global_zmq_context(1);
int main(int argc, char* argv[] ) {
std::string str = "Hello World";
if (atoi(argv[1]) == 0) ListenMessage();
else SendMessage(str);
zmq_ctx_destroy(& global_zmq_context);
return 0;
}
int SendMessage(std::string str) {
assert(global_zmq_context);
std::cout << "Sending \n";
zmq::socket_t publisher(global_zmq_context, ZMQ_PUB);
assert(publisher);
int linger = 0;
int rc = zmq_setsockopt(publisher, ZMQ_LINGER, &linger, sizeof(linger));
assert(rc==0);
rc = zmq_connect(publisher, "tcp://127.0.0.1:4506");
if (rc == -1) {
printf ("E: connect failed: %s\n", strerror (errno));
return -1;
}
zmq::message_t message(static_cast<const void*> (str.data()), str.size());
rc = publisher.send(message);
if (rc == -1) {
printf ("E: send failed: %s\n", strerror (errno));
return -1;
}
return 0;
}
int ListenMessage() {
assert(global_zmq_context);
std::cout << "Listening \n";
zmq::socket_t subscriber(global_zmq_context, ZMQ_SUB);
assert(subscriber);
int rc = zmq_setsockopt(subscriber, ZMQ_SUBSCRIBE, "", 0);
assert(rc==0);
int linger = 0;
rc = zmq_setsockopt(subscriber, ZMQ_LINGER, &linger, sizeof(linger));
assert(rc==0);
rc = zmq_bind(subscriber, "tcp://127.0.0.1:4506");
if (rc == -1) {
printf ("E: bind failed: %s\n", strerror (errno));
return -1;
}
std::vector<zmq::pollitem_t> p = {{subscriber, 0, ZMQ_POLLIN, 0}};
while (true) {
zmq::message_t rx_msg;
// when timeout (the third argument here) is -1,
// then block until ready to receive
std::cout << "Still Listening before poll \n";
zmq::poll(p.data(), 1, -1);
std::cout << "Found an item \n"; // not reaching
if (p[0].revents & ZMQ_POLLIN) {
// received something on the first (only) socket
subscriber.recv(&rx_msg);
std::string rx_str;
rx_str.assign(static_cast<char *>(rx_msg.data()), rx_msg.size());
std::cout << "Received: " << rx_str << std::endl;
}
}
return 0;
}
This code will work if I running one instance of the program with two threads
std::thread t_sub(ListenMessage);
sleep(1); // Slow joiner in ZMQ PUB/SUB pattern
std::thread t_pub(SendMessage str);
t_pub.join();
t_sub.join();
But I am wondering why when running two instances of the program the code above won't work?
Thanks for your help!
In case one has never worked with ZeroMQ,one may here enjoy to first look at "ZeroMQ Principles in less than Five Seconds"before diving into further details
Q : wondering why when running two instances of the program the code above won't work?
This code will never fly - and it has nothing to do with thread-based nor the process-based [CONCURENT] processing.
It was caused by a wrong design of the Inter Process Communication.
ZeroMQ can provide for this either one of the supported transport-classes :{ ipc:// | tipc:// | tcp:// | norm:// | pgm:// | epgm:// | vmci:// } plus having even smarter one for in-process comms, an inproc:// transport-class ready for inter-thread comms, where a stack-less communication may enjoy the lowest ever latency, being just a memory-mapped policy.
The selection of L3/L2-based networking stack for an Inter-Process-Communication is possible, yet sort of the most "expensive" option.
The Core Mistake :
Given that choice, any single processes ( not speaking about a pair of processes ) will collide on an attempt to .bind() its AccessPoint onto the very same TCP/IP-address:port#
The Other Mistake :
Even for the sake of a solo programme launched, both of the spawned threads attempt to .bind() its private AccessPoint, yet none does an attempt to .connect() a matching "opposite" AccessPoint.
At least one has to successfully .bind(), and
at least one has to successfully .connect(), so as to get a "channel", here of the PUB/SUB Archetype.
ToDo:
decide about a proper, right-enough Transport-Class ( best avoid an overkill to operate the full L3/L2-stack for localhost/in-process IPC )
refactor the Address:port# management ( for 2+ processes not to fail on .bind()-(s) to the same ( hard-wired ) address:port#
always detect and handle appropriately the returned {PASS|FAIL}-s from API calls
always set LINGER to zero explicitly ( you never know )
Related
I am working on a project where I have to use zmq_poll. But I did not completely understand what it does.
So I also tried to implement it:
zmq_pollitem_t timer_open(void){
zmq_pollitem_t items[1];
if( items[0].socket == nullptr ){
printf("error socket %s: %s\n", zmq_strerror(zmq_errno()));
return;
}
else{
items[0].socket = gsock;
}
items[0].fd = -1;
items[0].events = ZMQ_POLLIN;
// get a timer
items[0].fd = timerfd_create( CLOCK_REALTIME, 0 );
if( items[0].fd == -1 )
{
printf("timerfd_create() failed: errno=%d\n", errno);
items[0].socket = nullptr;
return;
}
int rc = zmq_poll(items,1,-1);
if(rc == -1){
printf("error poll %s: %s\n", zmq_strerror(zmq_errno()));
return;
}
else
return items[0];
}
I am very new to this topic and I have to modify an old existing project and replace the functions with the one of zmq. On other websites I saw examples where they used two items and the zmq_poll function in an endless loop. I have read the documentation but still could not properly understand how this works. And these are the other two functions I have implemented. I do not know if it is the correct way to implement it like this:
void timer_set(zmq_pollitem_t items[] , long msec, ipc_timer_mode_t mode ) {
struct itimerspec t;
...
timerfd_settime( items[0].fd , 0, &t, NULL );
}
void timer_close(zmq_pollitem_t items[]){
if( items[0].fd != -1 )
close(items[0].fd);
items[0].socket = nullptr;
}
I am not sure if I need the zmq_poll function because I am using a timer.
EDIT:
void some_function_timer_example() {
// We want to wait on two timers
zmq_pollitem_t items[2] ;
// Setup first timer
ipc_timer_open_(&items[0]);
ipc_timer_set_(&items[0], 1000, IPC_TIMER_ONE_SHOT);
// Setup second timer
ipc_timer_open_(&items[1]);
ipc_timer_set_(&items[1], 1000, IPC_TIMER_ONE_SHOT);
// Now wait for the timers in a loop
while (1) {
//ipc_timer_set_(&items[0], 1000, IPC_TIMER_REPEAT);
//ipc_timer_set_(&items[1], 5000, IPC_TIMER_REPEAT);
int rc = zmq_poll (items, 2, -1);
assert (rc >= 0); /* Returned events will be stored in items[].revents */
if (items [0].revents & ZMQ_POLLIN) {
// Process task
std::cout << "revents: 1" << std::endl;
}
if (items [1].revents & ZMQ_POLLIN) {
// Process weather update
std::cout << "revents: 2" << std::endl;
}
}
}
Now it still prins very fast and is not waiting. It is still waiting only in the beginning. And when the timer_set is inside the loop it waits properly, only if the waiting time is the same like: ipc_timer_set(&items[1], 1000,...) and ipctimer_set(&items[0], 1000,...)
So how do I have to change this? Or is this the correct behavior?
zmq_poll works like select, but it allows some additional stuff. For instance you can select between regular synchronous file descriptors, and also special async sockets.
In your case you can use the timer fd as you have tried to do, but you need to make a few small changes.
First you have to consider how you will invoke these timers. I think the use case is if you want to create multiple timers and wait for them. This would be typically the function in yuor current code that might be using a loop for the timer (either using select() or whatever else they might be doing).
It would be something like this:
void some_function() {
// We want to wait on two timers
zmq_pollitem items[2];
// Setup first timer
ipc_timer_open(&item[0]);
ipc_timer_set(&item[0], 1000, IPC_TIMER_ONE_REPEAT);
// Setup second timer
ipc_timer_open(&item[1]);
ipc_timer_set(&item[1], 5000, IPC_TIMER_ONE_SHOT);
// Now wait for the timers in a loop
while (1) {
int rc = zmq_poll (items, 2, -1);
assert (rc >= 0); /* Returned events will be stored in items[].revents */
}
}
Now, you need to fix the ipc_timer_open. It will be very simple - just create the timer fd.
// Takes a pointer to pre-allocated zmq_pollitem_t and returns 0 for success, -1 for error
int ipc_timer_open(zmq_pollitem_t *items){
items[0].socket = NULL;
items[0].events = ZMQ_POLLIN;
// get a timer
items[0].fd = timerfd_create( CLOCK_REALTIME, 0 );
if( items[0].fd == -1 )
{
printf("timerfd_create() failed: errno=%d\n", errno);
return -1; // error
}
return 0;
}
Edit: Added as reply to comment, since this is long:
From the documentation:
If both socket and fd are set in a single zmq_pollitem_t, the ØMQ socket referenced by socket shall take precedence and the value of fd shall be ignored.
So if you are passing the fd, you have to set socket to NULL. I am not even clear where gsock is coming from. Is this in the documentation? I couldn't find it.
And when will it break out of the while(1) loop?
This is application logic, and you have to code according to what you require. zmq_poll just keeps returning everytime one of the timer hits. In this example, every second the zmq_poll returns because the first timer (which is a repeat) keeps triggering. But at 5 seconds, it will also return because of the second timer (which is a one shot). Its up to you to decide when you exit the loop. Do you want this to go infinitely? Do you need to check for a different condition to exit the loop? Do you want to do this for say 100 times and then return? You can code whatever logic you want on top of this code.
And what kind of events are returned back
ZMQ_POLLIN since timer fds behave like readable file descriptors.
I'm new to zmq and cppzmq. While trying to run the multithreaded example in the official guide: http://zguide.zeromq.org/cpp:mtserver
My setup
macOS Mojave, Xcode 10.3
libzmq 4.3.2 via Homebrew
cppzmq GitHub HEAD
I hit a few problems.
Problem 1
When running source code in the guide, it hangs forever without any stdout output shown up.
Here is the code directly copied from the Guide.
/*
Multithreaded Hello World server in C
*/
#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <iostream>
#include <zmq.hpp>
void *worker_routine (void *arg)
{
zmq::context_t *context = (zmq::context_t *) arg;
zmq::socket_t socket (*context, ZMQ_REP);
socket.connect ("inproc://workers");
while (true) {
// Wait for next request from client
zmq::message_t request;
socket.recv (&request);
std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;
// Do some 'work'
sleep (1);
// Send reply back to client
zmq::message_t reply (6);
memcpy ((void *) reply.data (), "World", 6);
socket.send (reply);
}
return (NULL);
}
int main ()
{
// Prepare our context and sockets
zmq::context_t context (1);
zmq::socket_t clients (context, ZMQ_ROUTER);
clients.bind ("tcp://*:5555");
zmq::socket_t workers (context, ZMQ_DEALER);
workers.bind ("inproc://workers");
// Launch pool of worker threads
for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
pthread_t worker;
pthread_create (&worker, NULL, worker_routine, (void *) &context);
}
// Connect work threads to client threads via a queue
zmq::proxy (static_cast<void*>(clients),
static_cast<void*>(workers),
nullptr);
return 0;
}
It crashes soon after I put a breakpoint in the while loop of the worker.
Problem 2
Noticing that the compiler prompted me to replace deprecated API calls, I modified the above sample code to make the warnings disappear.
/*
Multithreaded Hello World server in C
*/
#include <pthread.h>
#include <unistd.h>
#include <cassert>
#include <string>
#include <iostream>
#include <cstdio>
#include <zmq.hpp>
void *worker_routine (void *arg)
{
zmq::context_t *context = (zmq::context_t *) arg;
zmq::socket_t socket (*context, ZMQ_REP);
socket.connect ("inproc://workers");
while (true) {
// Wait for next request from client
std::array<char, 1024> buf{'\0'};
zmq::mutable_buffer request(buf.data(), buf.size());
socket.recv(request, zmq::recv_flags::dontwait);
std::cout << "Received request: [" << (char*) request.data() << "]" << std::endl;
// Do some 'work'
sleep (1);
// Send reply back to client
zmq::message_t reply (6);
memcpy ((void *) reply.data (), "World", 6);
try {
socket.send (reply, zmq::send_flags::dontwait);
}
catch (zmq::error_t& e) {
printf("ERROR: %X\n", e.num());
}
}
return (NULL);
}
int main ()
{
// Prepare our context and sockets
zmq::context_t context (1);
zmq::socket_t clients (context, ZMQ_ROUTER);
clients.bind ("tcp://*:5555"); // who i talk to.
zmq::socket_t workers (context, ZMQ_DEALER);
workers.bind ("inproc://workers");
// Launch pool of worker threads
for (int thread_nbr = 0; thread_nbr != 5; thread_nbr++) {
pthread_t worker;
pthread_create (&worker, NULL, worker_routine, (void *) &context);
}
// Connect work threads to client threads via a queue
zmq::proxy (clients, workers);
return 0;
}
I'm not pretending to have a literal translation of the original broken example, but it's my effort to make things compile and run without obvious memory errors.
This code keeps giving me error number 9523DFB (156384763in Hex) from the try-catch block. I can't find the definition of the error number in official docs, but got it from this question that it's the native ZeroMQ error EFSM:
The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state. This error may occur with socket types that switch between several states, such as ZMQ_REP.
I'd appreciate it if anyone can point out where I did wrong.
UPDATE
I tried polling according to #user3666197 's suggestion. But still the program hangs. Inserting any breakpoint effectively crashes the program, making it difficult to debug.
Here is the new worker code
void *worker_routine (void *arg)
{
zmq::context_t *context = (zmq::context_t *) arg;
zmq::socket_t socket (*context, ZMQ_REP);
socket.connect ("inproc://workers");
zmq::pollitem_t items[1] = { { socket, 0, ZMQ_POLLIN, 0 } };
while (true) {
if(zmq::poll(items, 1, -1) < 1) {
printf("Terminating worker\n");
break;
}
// Wait for next request from client
std::array<char, 1024> buf{'\0'};
socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;
// Do some 'work'
sleep (1);
// Send reply back to client
zmq::message_t reply (6);
memcpy ((void *) reply.data (), "World", 6);
try {
socket.send (reply, zmq::send_flags::dontwait);
}
catch (zmq::error_t& e) {
printf("ERROR: %s\n", e.what());
}
}
return (NULL);
}
Welcome to the domain of the Zen-of-Zero
Suspect #1: the code jumps straight into an unresolveable live-lock due to a move into ill-directed state of the distributed-Finite-State-Automaton:
While I since ever advocate for preferring non-blocking .recv()-s, the code above simply commits suicide right by using this step:
socket.recv( request, zmq::recv_flags::dontwait ); // socket being == ZMQ_REP
kills all chances for any other future life but the very error The zmq_send() operation cannot be performed on this socket at the moment due to the socket not being in the appropriate state.
as
going into the .send()-able state is possible if and only if a previous .recv()-ed has delivered a real message.
The Best Next Step :
Review the code and may either use a blocking-form of the .recv() before going to .send() or, better, use a { blocking | non-blocking }-form of .poll( { 0 | timeout }, ZMQ_POLLIN ) before entering into an attempt to .recv() and keep doing other things, if there is nothing to receive yet ( so as to avoid the self suicidal throwing the dFSA into an uresolvable collision, flooding your stdout/stderr with a second-spaced flow of printf(" ERROR: %X\n", e.num() ); )
Error Handling :
Better use const char *zmq_strerror ( int errnum ); being fed by int zmq_errno (void);
The Problem 1 :
On the contrary to the suicidal ::dontwait flag in the Problem 2 root cause, the Problem 2 root cause is, that a blocking-form of the first .recv() here moves all the worker-threads into an undeterministically long, possibly infinite, waiting-state, as the .recv()-blocks proceeding to any further step until a real message arrives ( which it does not seem from the MCVE, that it ever will ) and so your pool-of-threads remains in a pool-wide blocked-waiting-state and nothing will ever happen until any message arrived.
Update on how the REQ/REP works :
The REQ/REP Scalable Communication Pattern Archetype works like a distributed pair of people - one, let's call her Mary, asks ( Mary .send()-s the REQ ), while the other one, say Bob the REP listens in a potentially infinitely long blocking .recv() ( or takes a due care, using .poll() to orderly and regularly check, if Mary has asked about something or not and continues to do his own hobbies or gardening otherwise ) and once the Bob's end gets a message, Bob can go and .send() Mary a reply ( not before, as he knows nothing when and what Mary would ( or would not ) ask in the nearer of farther future ) ) and Mary is fair not to ask her next REQ.send()-question to Bob anytime sooner but after Bob has ( REP.send() ) replied and Mary has received Bob's message ( REQ.recv() ) - which is fair and more symmetric, than a real life may exhibit among real people under one roof :o)
The code?
The code is not a reproducible MCVE. The main() creates five Bobs ( hanging waiting a call from Mary, somewhere over inproc:// transport-class ), but no Mary ever calls, or does she? Not visible sign of any Mary trying to do so, the less her ( their, could be a (even a dynamic) community of N:M herd-of-Mary(s):herd-of-5-Bobs relation ) attempt(s) to handle REP-ly(s) coming from either one of the 5-Bobs.
Persevere, ZeroMQ took me some time of scratching my own head, yet the years after I took a due care to learn the Zen-of-Zero are still a rewarding eternal walk in the Gardens of Paradise. No localhost serial-code IDE will ever be able to "debug" a distributed-system (unless a distributed-inspector infrastructure is inplace, a due architecture for a distributed-system monitor/tracer/debugger is another layer of distributed messaging/signaling layer atop of the debugged distributed messaging/signaling system - so do not expect it from a trivial localhost serial-code IDE.
If still in doubts, isolate potential troublemakers - replace inproc:// with tcp:// and if toys do not work with tcp:// (where one can wire-line trace the messages) it won't with inproc:// memory-zone tricks.
About the hanging that I saw in my UPDATED question, I finally figured out what's going on. It's a false expectation on my part.
This very sample code in my question is never meant to be a self-contained service/client code: It is a server-only app with ZMQ_REP socket. It just waits for any client code to send request through ZMQ_REQ sockets. So the "hang" that I was seeing is completely normal!
As soon as I hook up a client app to it, things start rolling instantly. This chapter is somewhere in the middle of the Guide and I was only concerned with multithreading so I skipped many code samples and messaging patterns, which led to my confusion.
The code comments even said it's a server, but I expected to see explicit confirmation from the program. So to be fair the lack of visual cue and the compiler deprecation warning caused me to question the sample code as a new user, but the story that the code tells is valid.
Such a shame on wasted time! But all of a sudden all #user3666197 says in his answer starts to make sense.
For the completeness of this question, the updated server thread worker code that works:
// server.cpp
void *worker_routine (void *arg)
{
zmq::context_t *context = (zmq::context_t *) arg;
zmq::socket_t socket (*context, ZMQ_REP);
socket.connect ("inproc://workers");
while (true) {
// Wait for next request from client
std::array<char, 1024> buf{'\0'};
socket.recv(zmq::buffer(buf), zmq::recv_flags::none);
std::cout << "Received request: [" << (char*) buf.data() << "]" << std::endl;
// Do some 'work'
sleep (1);
// Send reply back to client
zmq::message_t reply (6);
memcpy ((void *) reply.data (), "World", 6);
try {
socket.send (reply, zmq::send_flags::dontwait);
}
catch (zmq::error_t& e) {
printf("ERROR: %s\n", e.what());
}
}
return (NULL);
}
The much needed client code:
// client.cpp
int main (void)
{
void *context = zmq_ctx_new ();
// Socket to talk to server
void *requester = zmq_socket (context, ZMQ_REQ);
zmq_connect (requester, "tcp://localhost:5555");
int request_nbr;
for (request_nbr = 0; request_nbr != 10; request_nbr++) {
zmq_send (requester, "Hello", 6, 0);
char buf[6];
zmq_recv (requester, buf, 6, 0);
printf ("Received reply %d [%s]\n", request_nbr, buf);
}
zmq_close (requester);
zmq_ctx_destroy (context);
return 0;
}
The server worker does not have to poll manually because it has been wrapped into the zmq::proxy.
In this code the Subscriber ( in subscriber.cpp code ) socket binds to port 5556.
It receives updates/messages from publisher ( in subscriber.cpp ), and publisher socket connects to the subscriber at 5556 and sends updates/messages to it.
I know that convention is to .bind() a publisher and not to call .connect() on it. But by theory every socket type can .bind() or .connect().
But, both the codes give zmq error when run. Why?
This is CPP code.
publisher.cpp
#include <iostream>
#include <zmq.hpp>
#include <zhelpers.hpp>
using namespace std;
int main () {
zmq::context_t context (1);
zmq::socket_t publisher(context, ZMQ_PUB);
publisher.connect("tcp://*:5556");
while (1) {
zmq::message_t request (12);
memcpy (request.data (), "Pub-1 Data", 12);
sleep(1);
publisher.send (request);
}
return 0;
}
subcriber.cpp
#include <iostream>
#include <zmq.hpp>
int main (int argc, char *argv[])
{
zmq::context_t context (1);
zmq::socket_t subscriber (context, ZMQ_SUB);
subscriber.bind("tcp://localhost:5556");
subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0); // subscribe to all messages
// Process 10 updates
int update_nbr;
for (update_nbr = 0; update_nbr < 10 ; update_nbr++) {
zmq::message_t update;
subscriber.recv (&update);
std::string updt = std::string(static_cast<char*>(update.data()), update.size());
std::cout << "Received Update/Messages/TaskList " << update_nbr <<" : "<< updt << std::endl;
}
return 0;
}
No, there is no problem in reversed .bind()/.connect()
This principally works fine.
Yet, the PUB/SUB Formal Archetype is subject to a so called late-joiner syndrome.
Without thorough debugging details, as were requested above, one may just repeat general rules of thumb:
In newer API versions one may
add rc = <aSocket>.setsockopt( ZMQ_CONFLATE, 1 ); assert( rc & "CONFLATE" );
add rc = <aSocket>.setsockopt( ZMQ_IMMEDIATE, 1 ); assert( rc & "IMMEDIATE" );
and so forth,
all that to better tune the context-instance + socket-instance attributes, so as to minimise the late-joiner syndrome effects.
There is no problem with reversed bind()/connect().
The code is working when I changed the line - subscriber.bind("tcp://localhost:5556");
to
subscriber.bind("tcp://:5556");
and
publisher.connect("tcp://:5556");
to
publisher.connect("tcp://localhost:5556");
I have written simple server/client programs, in which the client sends some hardcoded data in small chunks to the server program, which is waiting for the data so that it can print it to the terminal. In the client, I'm calling send() in a loop while there is more data to send, and on the server, I'm doing the same with read(), that is, while the number of bytes returned is > 0, I continue to read.
This example works perfectly if I specifically call close() on the client's socket after I've finished sending, but if I don't, the server won't actually exit the read() loop until I close the client and break the connection. On the server side, I'm using:
while((bytesRead = read(socket, buffer, BUFFER_SIZE)) > 0)
Shouldn't bytesRead be 0 when all the data has been received? And if so, why will it not exit this loop until I close the socket? In my final application, it will be beneficial to keep the socket open between requests, but all of the sample code and information I can find calls close() immediately after sending data, which is not what I want.
What am I missing?
When the other end of the socket is connected to some other network system halfway around the world, the only way that the receiving socket knows "when all the data has been received" is precisely when the other side of the socket is closed. That's what tells the other side of the socket that "all the data has been received".
All that a socket knows about is that it's connected to some other socket endpoint. That's it. End of story. The socket has no special knowledge of the inner workings of the program that has the other side of the socket connection. Nor should it know. That happens to be the responsibility of the program that has the socket open, and not the socket itself.
If your program, on the receiving side, has knowledge -- by the virtue of knowing what data it is expected to receive -- that it has now received everything that it needs to receive, then it can close its end of the socket, and move on to the next task at hand.
You will have to incorporate in your program's logic, a way to determine, in some form or fashion, that all the data has been transmitted. The exact nature of that is going to be up to you to define. Perhaps, before sending all the data on the socket, your sending program will send in advance, on the same socket, the number of bytes that will be in the data to follow. Then, your receiving program reads the number of bytes first, followed by the data itself, and then knows that it has received everything, and can move on.
That's one simplistic approach. The exact details is up to you. Alternatively, you can also implement a timeout: set a timer and if any data is not received in some prescribed period of time, assume that there is no more.
You can set a flag on the recv call to prevent blocking.
One way to detect this easily is to wrap the recv call:
enum class read_result
{
// note: numerically in increasing order of severity
ok,
would_block,
end_of_file,
error,
};
template<std::size_t BufferLength>
read_result read(int socket_fd, char (&buffer)[BufferLength], int& bytes_read)
{
auto result = recv(socket_fd, buffer, BufferLength, MSG_DONTWAIT);
if (result > 0)
{
return read_result::ok;
}
else if (result == 0)
{
return read_result::end_of_file;
}
else {
auto err = errno;
if (err == EAGAIN or err == EWOULDBLOCK) {
return read_result::would_block;
}
else {
return read_result ::error;
}
}
}
One use case might be:
#include <unistd.h>
#include <sys/socket.h>
#include <cstdlib>
#include <cerrno>
#include <iostream>
enum class read_result
{
// note: numerically in increasing order of severity
ok,
would_block,
end_of_file,
error,
};
template<std::size_t BufferLength>
read_result read(int socket_fd, char (&buffer)[BufferLength], int& bytes_read)
{
auto result = recv(socket_fd, buffer, BufferLength, MSG_DONTWAIT);
if (result > 0)
{
return read_result::ok;
}
else if (result == 0)
{
return read_result::end_of_file;
}
else {
auto err = errno;
if (err == EAGAIN or err == EWOULDBLOCK) {
return read_result::would_block;
}
else {
return read_result ::error;
}
}
}
struct keep_reading
{
keep_reading& operator=(read_result result)
{
result_ = result;
}
const operator bool() const {
return result_ < read_result::end_of_file;
}
auto get_result() const -> read_result { return result_; }
private:
read_result result_ = read_result::ok;
};
int main()
{
int socket; // = open my socket and wait for it to be connected etc
char buffer [1024];
int bytes_read = 0;
keep_reading should_keep_reading;
while(keep_reading = read(socket, buffer, bytes_read))
{
if (should_keep_reading.get_result() != read_result::would_block) {
// read things here
}
else {
// idle processing here
}
}
std::cout << "reason for stopping: " << should_keep_reading.get_result() << std::endl;
}
I'm trying to do some simple packet capturing with pcap, and so I've created a handle to listen through eth0. My issue is with the pcap_loop(handle, 10, myCallback, NULL); line near the end of my code. I'm trying to use pcap_loop.
The expected output is supposed to be:
eth0
Activated!
1
2
3
...
10
Done processing packets!
Current output is missing the increments:
eth0
Activated!
Done processing packets!
Currently it's just skipping right through to "Done processing packets!" and I have no idea why. Even if it doesn't go to the callback, it should still be waiting on packets as the ;count' parameter (see documentation for pcap_loop) is set to 10.
#include <iostream>
#include <pcap.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <arpa/inet.h>
void myCallback(u_char *useless, const struct pcap_pkthdr* hdr, const u_char*packet){
static int count = 1;
std::cout <<count<<std::endl;
count ++;
}
int main(){
char errbuf[PCAP_ERRBUF_SIZE];
char * devName;
char* net;
char* mask;
const u_char*packet;
struct in_addr addr;
struct pcap_pkthdr hdr;
bpf_u_int32 netp;
bpf_u_int32 maskp;
pcap_if_t *devs;
pcap_findalldevs(&devs, errbuf);
devName = pcap_lookupdev(errbuf);
std::cout <<devName<<std::endl;
int success = pcap_lookupnet(devName, &netp, &maskp, errbuf);
if(success<0){
exit(EXIT_FAILURE);
}
pcap_freealldevs(devs);
//Create a handle
pcap_t *handle = pcap_create(devName, errbuf);
pcap_set_promisc(handle, 1);
pcap_can_set_rfmon(handle);
//Activate the handle
if(pcap_activate(handle)){
std::cout <<"Activated!"<<std::endl;
}
else{
exit(EXIT_FAILURE);
}
pcap_loop(handle, 10, myCallback, NULL);
std::cout <<"Done processing packets!"<<std::endl;
//close handle
pcap_close(handle);
}
pcap_findalldevs(&devs, errbuf);
That call isn't doing anything useful, as you're not doing anything with devs other than freeing it. (You also aren't checking whether it succeeds or fails.) You might as well remove it unless you have some need to know what all the devices on which you can capture are.
pcap_can_set_rfmon(handle);
That all isn't doing anything useful, as you're not checking its return value. If you are capturing on a Wi-Fi device, and you want to capture in monitor mode, you call pcap_set_rfmon() - not pcap_can_set_rfmon() - on the handle after creating and before activating the handle.
//Activate the handle
if(pcap_activate(handle)){
std::cout <<"Activated!"<<std::endl;
}
else{
exit(EXIT_FAILURE);
}
To quote the pcap_activate() man page:
RETURN VALUE
pcap_activate() returns 0 on success without warnings, PCAP_WARN-
ING_PROMISC_NOTSUP on success on a device that doesn't support promis-
cuous mode if promiscuous mode was requested, PCAP_WARNING on success
with any other warning, PCAP_ERROR_ACTIVATED if the handle has already
been activated, PCAP_ERROR_NO_SUCH_DEVICE if the capture source speci-
fied when the handle was created doesn't exist, PCAP_ERROR_PERM_DENIED
if the process doesn't have permission to open the capture source,
PCAP_ERROR_RFMON_NOTSUP if monitor mode was specified but the capture
source doesn't support monitor mode, PCAP_ERROR_IFACE_NOT_UP if the
capture source is not up, and PCAP_ERROR if another error occurred. If
PCAP_WARNING or PCAP_ERROR is returned, pcap_geterr() or pcap_perror()
may be called with p as an argument to fetch or display a message
describing the warning or error. If PCAP_WARNING_PROMISC_NOTSUP,
PCAP_ERROR_NO_SUCH_DEVICE, or PCAP_ERROR_PERM_DENIED is returned,
pcap_geterr() or pcap_perror() may be called with p as an argument to
fetch or display an message giving additional details about the problem
that might be useful for debugging the problem if it's unexpected.
This means that the code above is 100% wrong - if pcap_activate() returns a non-zero value, it may have failed, and if it returns 0, it succeeded.
If the return value is negative, it's an error value, and it has failed. If it's non-zero but positive, it's a warning value; it has succeeded, but, for example, it might not have turned promiscuous mode on, as the OS or device might not let promiscuous mode be set.
So what you want is, instead:
//Activate the handle
int status;
status = pcap_activate(handle);
if(status >= 0){
if(status == PCAP_WARNING){
// warning
std:cout << "Activated, with warning: " << pcap_geterror(handle) << std::endl;
}
else if (status != 0){
// warning
std:cout << "Activated, with warning: " << pcap_statustostr(status) << std::endl;
}
else{
// no warning
std::cout <<"Activated!"<<std::endl;
}
}
else{
if(status == PCAP_ERROR){
std:cout << "Failed to activate: " << pcap_geterror(handle) << std::endl;
}
else{
std:cout << "Failed to activate: " << pcap_statustostr(status) << std::endl;
}
exit(EXIT_FAILURE);
}