Clearing a read() buffer while using a socket - C++

Recently I've been messing around with sockets, trying to make a client/server program. So far I have been successful, but it seems I've hit a roadblock. For some quick background information: I made a server that can accept a connection, and once everything is set up and a connection to a client is made, this block of code begins to execute:
while(1){
    read(newsockfd, &inbuffer, 256);
    std::cout << "Message from client " << inet_ntoa(cli_addr.sin_addr) << " : ";
    for(int i = 0; i < sizeof(inbuffer); i++){
        std::cout << inbuffer[i];
    }
    std::cout << std::endl;
}
When executed, the client simply connects to the server, writes to the socket, and then exits. So since one message was sent, this loop should run only once and then wait for another message, if what I read was correct.
But what ends up happening is that this loop continues over and over, printing the same message each time. From what I read (on this site and others) about the read() function, after it is called once it waits for another message to be received. I may be making a stupid mistake here, but is there any way I can have this read() call wait for a new message, instead of reusing the same old message over and over? Or is there another function that could replace read() to do what I want?
Thanks for any help.

You don't check the return value of read. So if the other end closes the connection or there's an error, you'll just loop forever outputting whatever happened to be in the buffer. You probably want:
while(1){
    int msglen = read(newsockfd, &inbuffer, 256);
    if (msglen <= 0) break;  // 0 means the peer closed the connection; -1 means an error
    std::cout << "Data from client " << inet_ntoa(cli_addr.sin_addr) << " : ";
    for(int i = 0; i < msglen; i++){
        std::cout << inbuffer[i];
    }
    std::cout << std::endl;
}
Notice that I changed the word "message" to "data". Here's why:
So since one message was sent, this loop should only run once, and then wait for another message if what I read was correct.
This is incorrect. The code above does not have any concept of a "message", and TCP does not preserve application message boundaries. So not only is this wrong, there's no way it could be correct, because the word "message" has no meaning that could possibly apply in this context. TCP does not "glue together" the bytes that happened to be passed in a single call to a sending function.
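If you want message semantics on top of TCP, you have to add your own framing. This is not part of the original answer, but a common approach is a length prefix; here is a minimal sketch, where readExactly and the 4-byte big-endian prefix are illustrative choices of mine:
// Keep calling read() until exactly `len` bytes arrive; a single
// read() on a TCP socket may return fewer bytes than requested.
#include <arpa/inet.h>  // ntohl
#include <unistd.h>     // read
#include <cstdint>
#include <string>

static bool readExactly(int fd, char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n <= 0) return false;  // error or peer closed the connection
        buf += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// One "message" = a 4-byte big-endian length prefix + payload.
static bool readMessage(int fd, std::string& out) {
    uint32_t netlen;
    if (!readExactly(fd, reinterpret_cast<char*>(&netlen), sizeof netlen))
        return false;
    out.resize(ntohl(netlen));
    return out.empty() || readExactly(fd, &out[0], out.size());
}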

Related

How to use asio buffer after async_read_until for consecutive reads

I am reading from a serial device where each message must be specifically requested. E.g. you send a request and get a response with the serialised payload.
Each message contains these parts in order:
PREAMBLE (2 bytes, "$M")
HEADER (3 bytes, containing payload length N)
PAYLOAD+CRC (N+1 bytes)
My approach with asio is to detect the start (PREAMBLE) of a message by using asio::async_read_until and afterwards using asio::async_read for reading the exact amount of bytes for HEADER and PAYLOAD+CRC. Since there is no static pattern at the end of the message, I cannot use async_read_until to read the full message.
After receiving PREAMBLE, the handler for async_read_until gets called and the buffer contains the PREAMBLE bytes and might contain additional bytes from HEADER and PAYLOAD+CRC.
The asio documentation for async_read_until says:
After a successful async_read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent async_read_until operation to examine.
I interpret this as meaning that you should only consume the requested bytes and leave all remaining bytes in the buffer for further reads. However, all consecutive reads block, since the data is already in the buffer and there is nothing left on the device.
The reading is implemented as a small state machine processState, where different handlers are registered depending on which part of the message is to be read. All reading is done with the same buffer (asio::streambuf). processState is called in an infinite loop.
void processState() {
    // register handler for incoming messages
    std::cout << "state: " << parser_state << std::endl;
    switch (parser_state) {
    case READ_PREAMBLE:
        asio::async_read_until(port, buffer, "$M",
            std::bind(&Client::onPreamble, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case READ_HEADER:
        asio::async_read(port, buffer, asio::transfer_exactly(3),
            std::bind(&Client::onHeader, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case READ_PAYLOAD_CRC:
        asio::async_read(port, buffer, asio::transfer_exactly(request_received->length + 1),
            std::bind(&Client::onDataCRC, this, std::placeholders::_1, std::placeholders::_2));
        break;
    case PROCESS_PAYLOAD:
        onProcessMessage();
        break;
    case END:
        parser_state = READ_PREAMBLE;
        break;
    }
    // wait for incoming data
    io.run();
    io.reset();
}
The PREAMBLE handler onPreamble is called when receiving the PREAMBLE:
void onPreamble(const asio::error_code& error, const std::size_t bytes_transferred) {
    std::cout << "onPreamble START" << std::endl;
    if(error) { return; }
    std::cout << "buffer: " << buffer.in_avail() << "/" << buffer.size() << std::endl;
    // ignore and remove the preamble bytes
    buffer.consume(bytes_transferred);
    std::cout << "buffer: " << buffer.in_avail() << "/" << buffer.size() << std::endl;
    buffer.commit(buffer.size());
    std::cout << "onPreamble END" << std::endl;
    parser_state = READ_HEADER;
}
After this handler, no other handlers get called since the data is in the buffer and no data is left on the device.
What is the correct way to use asio::streambuf such that the handlers of consecutive async_read get called and I can process bytes in order of the state machine? I don't want to process the remaining bytes in onPreamble since it is not guaranteed that these will contain the full message.
You don't need the call to buffer.commit() in the onPreamble() handler. Calling buffer.consume() will remove the header bytes as you expect and leave the remaining bytes (if any were received) in the asio::streambuf for the next read. The streambuf's prepare() and commit() calls are used when you are filling it with data to send to the remote party.
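In other words, a minimal sketch of the corrected handler (same member names as in the question, with the commit() call dropped):
void onPreamble(const asio::error_code& error, const std::size_t bytes_transferred) {
    if (error) { return; }
    // Discard everything up to and including the "$M" preamble; any bytes
    // already received beyond it stay in the streambuf for the subsequent
    // async_read of the header.
    buffer.consume(bytes_transferred);
    parser_state = READ_HEADER;
}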
I just finished a blog post and codecast about using the asio::streambuf to perform a simple HTTP GET with a few web servers. It might give you a better idea of how to use async_read_until() and async_read().

Using boost::asio::async_read_until with boost::asio::streambuf

I have an application that I am currently developing for communicating with a device using serial communication. For this I am using the boost library basic_serial_port. Right now I am just attempting to read from the device, using the async_read_until function coupled with an async_wait from the deadline_timer class. The code that sets up the read and the wait looks like this:
async_read_until(port, readData, io_params.delim,
    boost::bind(&SerialComm::readCompleted,
                this, boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred));
timer.expires_from_now(boost::posix_time::seconds(1));
timer.async_wait(boost::bind(&SerialComm::timeoutExpired, this,
                             boost::asio::placeholders::error));
The callback on the async_read_until looks like
void SerialComm::readCompleted(const boost::system::error_code& error,
                               const size_t bytesTransferred){
    if (!error){
        wait_result = success;
        bytes_transferred = bytesTransferred;
    }
    else {
        // 125 is ECANCELED on Linux, i.e. the operation was cancelled
        if (error.value() != 125) wait_result = error_out;
        else wait_result = op_canceled;
        cout << "Port handler called with error code " + to_string(error.value()) << endl;
    }
}
and the following code is triggered on successful read
string msg;
getline(istream(&readData), msg, '\r');
boost::trim_right_if(msg, boost::is_any_of("\r"));
In the case of this device, all messages are terminated with a carriage return, so specifying the carriage return in the async_read_until should retrieve a single message. However, what I am seeing is that, while the handler is triggered, new data is not necessarily entered into the buffer. So if the handler is triggered 20 times, I might see:
one line pumped into the buffer in the first call
none in the next 6 calls
6 lines in the next call
no data in the next 10
10 lines following
...
I am obviously not doing something correctly, but what is it?
async_read_until does not guarantee that it reads only up to the first delimiter.
Due to the underlying implementation details, it will just "read what is available" on most systems and will return if the streambuf contains the delimiter. Additional data will be in the streambuf. Moreover, EOF might be returned even if you didn't expect it yet.
See for background Read until a string delimiter in boost::asio::streambuf
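In the completion handler, bytes_transferred tells you how far the first delimiter reaches; extract that much and leave the rest in the streambuf. A minimal sketch, reusing the question's readData streambuf (error handling elided):
void SerialComm::readCompleted(const boost::system::error_code& error,
                               std::size_t bytes_transferred) {
    if (error) return;
    // bytes_transferred counts the bytes up to and including the first
    // delimiter; the streambuf may hold additional data beyond it.
    std::string line(boost::asio::buffers_begin(readData.data()),
                     boost::asio::buffers_begin(readData.data()) + bytes_transferred);
    readData.consume(bytes_transferred); // keep the surplus for the next read
    // ... process `line` (it still ends with '\r') ...
}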
So, found the problem here. The way this program is intended to work is that it should:
1. Send a request for data.
2. Start an async_read_until to read data on the port.
3. Start an async_wait so that we don't wait forever.
4. Use io_service::run_one to wait for a timeout or a successful read.
The code for step four looked like this:
for (;;){
    // This blocks until an event on io_service_ is set.
    n_handlers = io_service_.run_one();
    // Brackets in success case limit scope of new variables
    switch(wait_result){
    case success: {
        char c_[1024];
        //string msg;
        string delims = "\r";
        std::string msg{buffers_begin(readData.data()),
                        buffers_begin(readData.data()) + bytes_transferred - delims.size()};
        // Consume through the first delimiter.
        readData.consume(bytes_transferred);
        data_out = msg;
        cout << msg << endl;
        data_handler(msg);
        return data_out;
    }
    case timeout_expired:
        // Set up for wait and read.
        wait_result = in_progress;
        cout << "Time is up..." << endl;
        return data_out;
    case error_out:
        cout << "Error out..." << endl;
        return data_out;
    case op_canceled:
        return data_out;
    case in_progress:
        cout << "In progress..." << endl;
        break;
    }
}
Only two cases should trigger an exit from the loop - timeout_expired and success. But, as you can see, the system will exit if an operation is cancelled (op_canceled) or if there is an error (error_out).
The problem is that when an async operation is cancelled (e.g. by deadline_timer::cancel()), it still triggers an event that is picked up by io_service::run_one, which sets the state evaluated by the switch statement to op_canceled. This can leave async operations stacking up in the event loop. The simple fix is to just comment out the return statement in all cases except success and timeout_expired, as sketched below.
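For instance, a sketch of just the two affected cases (everything else stays as in the loop above):
case error_out:
    cout << "Error out..." << endl;
    // return data_out;  // commented out: keep looping until success/timeout
    break;
case op_canceled:
    // return data_out;  // commented out: a cancel is expected when the timer
    break;               // and the read race; loop until success/timeout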

How to send a struct over a pipe C++

I am very new to writing C++ and am working on using pipes to communicate between processes. I have written a very simple program that works when I am sending strings or integers, but when I try to send a struct (message in this case) I get null when I try to read it on the other side. Does anyone have some insight into this they would share? Thanks for your time.
#include <unistd.h>
#include <iostream>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <string>
#include <sys/wait.h>

#define BUFFER_LEN sizeof(message)

using namespace std;

struct message{
    int from;
    string msg;
};

void childCode(int *pipeOUT, int *pipeIN, message buffer){
    // Local Buffer for input from pipeIN
    cout << "Child: Sending Message" << endl;
    buffer.msg = "Child: I am the child!!";
    write(pipeOUT[1], (char*) &buffer, BUFFER_LEN); // Test Child -> Parent comms
    cout << "Child: Message Sent" << endl;
    read(pipeIN[0], (char*) &buffer, BUFFER_LEN);   // Test Child <- Parent comms
    cout << "Child: Received: " << buffer.msg << endl;
    cout << "Child Exiting..." << endl;
    exit(0); // Child process End
}

int main(int argCount, char** argVector){
    pid_t pid;
    int childPipeIN[2];
    int childPipeOUT[2];
    message buffer; // Buffer for reading from pipe

    // Make Parent <- Child pipe
    int ret = pipe(childPipeIN);
    if (ret == -1){
        perror("There was an error creating the childPipeIN. Exiting...");
        exit(1);
    }
    // Make Parent -> Child pipe
    ret = pipe(childPipeOUT);
    if (ret == -1){
        perror("There was an error creating the childPipeOUT. Exiting...");
        exit(1);
    }
    // Fork off Child
    pid = fork();
    if (pid == -1){
        perror("There has been an issue forking off the child. Exiting...");
        exit(1);
    }
    if (pid == 0){ // Child code
        cout << "Child PID = " << getpid() << endl;
        childCode(childPipeIN, childPipeOUT, buffer);
    }
    else{ // Parent Code
        cout << "Parent PID = " << getpid() << endl;
        // Test Parent <- Child comms
        read(childPipeIN[0], (char*) &buffer, BUFFER_LEN);
        cout << "Parent: I received this from the child...\n" << buffer.msg << endl;
        buffer.msg = "Parent: Got your message!";
        // Test Parent -> Child comms
        write(childPipeOUT[1], (char*) &buffer, BUFFER_LEN);
        wait(nullptr);
        cout << "Parent: Children are done. Exiting..." << endl;
    }
    exit(0);
}
Yeah, I voted to close. Then I read the proposed duplicate more closely and realized it didn't explain the problem or the solution very well, and its solution didn't really fit with the OP's intent.
The problem:
One does not simply write a std::string into a pipe. std::string is not a trivial piece of data. There are pointers there that do not sleep.
Come to think of it, it's bloody dangerous to write a std::string into anything. Including another std::string. I would not, could not with a file. This smurf is hard to rhyme, so I'll go no further with Dr. Seuss.
To another process, the pointer that references the storage containing the string's data, the magic that allows strings to be resizable, likely means absolutely nothing, and if it does mean something, you can bet it's not something you want to mess with because it certainly isn't the string's data.
Even in the same process, in another std::string, the two strings cannot peacefully coexist pointing to the same memory. When one goes out of scope, resizes, or does practically anything else that mutates the string, badness will ensue.
Don't believe me? Check BUFFER_LEN. No matter how big your message gets, BUFFER_LEN never changes.
This applies to everything you want to write that isn't a simple hunk of data. Integer, write away. Structure of integers and an array of characters of fixed size, write away. std::vector? No such luck. You can write std::vector::data if and only if whatever it contains is trivial.
std::is_pod may help you decide what you can and cannot read and write the easy way.
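For example, a quick compile-time check in that spirit (a sketch; is_trivially_copyable is my choice of trait, in the same vein as the answer's std::is_pod):
#include <cstdint>
#include <string>
#include <type_traits>

struct wire_header {   // fixed-size, trivial: safe to write byte-for-byte
    int32_t  from;
    uint32_t msg_len;
};
static_assert(std::is_trivially_copyable<wire_header>::value,
              "wire_header may be written to a pipe as raw bytes");

struct message { int from; std::string msg; };  // from the question
// This would fail to compile, which is exactly the point:
// static_assert(std::is_trivially_copyable<message>::value, "nope");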
Solution:
Serialize the data. Establish a communications protocol that defines the format of the data, then use that protocol as the basis of your reading and writing code.
Typical solutions for moving a string are null terminating the buffer just like in the good ol' days of C and prepending the size of the string to the characters in the string like the good old days of Pascal.
I like the Pascal approach because it allows you to size the receiver's buffer ahead of time. With null termination you have to play a few dozen rounds of Getta-byte looking for the null terminator and hope your buffer's big enough or compound the ugliness with the dynamic allocation and copying that comes with buffer resizes.
Writing is pretty much what you are doing now, but structure member by structure member. In the above case:
Write message.from to pipe.
Write length of message.msg to pipe.
Write message.msg.data() to pipe.
Two caveats:
Watch your endian! Firmly establish the byte order used by your protocol. If the native endian does not match the protocol endian, some bit shifting may be required to re-orient the message.
One man's int may be the size of another man's long, so use fixed-width integers.
Reading is a bit more complicated because a single call to read will return up to the requested length. It may take more than one read to get all the data you need, so you'll want a function that loops until all of the data arrives or cannot arrive because the pipe, file, socket, whatever is closed.
Loop on read until all of message.from has arrived.
Loop on read until all of the length of message.msg has arrived.
Use message.msg.resize(length) to size message.msg to hold the message.
Loop on read until all of message.msg has arrived. You can read the message directly into message.msg.data().
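Putting the pieces together, a minimal sketch of this protocol for the message struct above (the helper names and the use of htonl/ntohl for byte order are illustrative choices, not fixed by this answer):
#include <arpa/inet.h>  // htonl, ntohl
#include <unistd.h>     // read, write
#include <cstdint>
#include <string>

struct message {        // as in the question
    int from;
    std::string msg;
};

// Loop until exactly `len` bytes have been read; a single read()
// may return fewer bytes than requested.
static bool readAll(int fd, void* buf, size_t len) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return false;  // error or pipe closed
        p += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Write the struct member by member: from, then msg length, then msg data.
static bool writeMessage(int fd, const message& m) {
    const uint32_t from = htonl(static_cast<uint32_t>(m.from));
    const uint32_t len  = htonl(static_cast<uint32_t>(m.msg.size()));
    return write(fd, &from, sizeof from) == (ssize_t)sizeof from
        && write(fd, &len,  sizeof len)  == (ssize_t)sizeof len
        && write(fd, m.msg.data(), m.msg.size()) == (ssize_t)m.msg.size();
}

// Read it back in the same order, sizing the string before the final read.
static bool readMessage(int fd, message& m) {
    uint32_t from, len;
    if (!readAll(fd, &from, sizeof from)) return false;
    if (!readAll(fd, &len,  sizeof len))  return false;
    m.from = static_cast<int>(ntohl(from));
    m.msg.resize(ntohl(len));
    return m.msg.empty() || readAll(fd, &m.msg[0], m.msg.size());
}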

Strange SIGPIPE in loop

After dealing with a very strange error in a C++ program I was writing, I decided to write the following test code, confirming my suspicion. In the original program, calling send() and this_thread::sleep_for() (with any amount of time) in a loop 16 times caused send to fail with a SIGPIPE signal. In this example however, it fails after 4 times.
I have a server running on port 25565 bound to localhost. The original program was designed to communicate with this server. I'm using the same one in this test code because it doesn't terminate connections early.
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <cstring>
#include <cerrno>
#include <iostream>
#include <thread>
#include <chrono>

using namespace std;

int main()
{
    struct sockaddr_in sa;
    memset(sa.sin_zero, 0, 8);
    sa.sin_family = AF_INET;
    inet_pton(AF_INET, "127.0.0.1", &(sa.sin_addr));
    sa.sin_port = htons(25565);
    cout << "mark 1" << endl;

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    connect(sock, (struct sockaddr *) &sa, sizeof(sa));
    cout << "mark 2" << endl;

    for (int i = 0; i < 16; i++)
    {
        cout << "mark 3" << endl;
        cout << "sent " << send(sock, &i, 1, 0) << " byte" << endl;
        cout << "errno == " << errno << endl;
        cout << "i == " << i << endl;
        this_thread::sleep_for(chrono::milliseconds(2));
    }
    return 0;
}
Running it in GDB is how I discovered it was emitting SIGPIPE. Here is the output of that: http://pastebin.com/gXg2Y6g1
In another test, I called this_thread::sleep_for() 16 times in a loop, THEN called send() once. This did NOT produce the same error. It ran without issue.
In yet another test, I commented out the thread sleeping line, and it ran all the way through just fine. I did this in both the original program and the above test code.
These results make me believe it's not a case of the server closing the connection, even though that's usually what SIGPIPE means (why did it run fine when there was no call to this_thread::sleep_for()?).
Any ideas as to what could be causing this? I've been messing around with it for a week and have gotten no further.
Running this on my machine prints up to mark 3 once, as I expected it to. The fact that it does run several times on your end tells me that you have a server listening on port 25565, which you have not included in this question.
Your problem is that you are not testing to see whether the server, of which you have not told us, closed the connection. When it does, your process gets a SIGPIPE. Since you do not handle that signal, your process quits.
What you can do in order to fix this:
Start checking return values of functions. It wouldn't have helped in this particular case, but you ignore potential errors from both connect and send. I'm hoping this is because of minimizing the program, but it is worth mentioning.
Handle the signal. If you prefer to handle server closes from the main flow of your code, you can either set the signal's disposition so it is ignored, or pass the flag MSG_NOSIGNAL to send. In both cases, send will return -1 with errno set to EPIPE; see the sketch below.
RTFM. Seriously. A simple man send and a search for SIGPIPE would give you this answer.
As for why the server closed, I cannot answer this without knowing what server it is and what protocol it is running. No, don't answer that question in the comments. It is irrelevant to this question. The simple truth of the matter is that a server you are talking to might close the connection at any time, and your code must be able to deal with that.
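A minimal sketch of both approaches (the sendByte helper is mine, for illustration; the socket setup is as in the question):
#include <sys/socket.h>
#include <cerrno>
#include <csignal>
#include <iostream>

// Send one byte without ever raising SIGPIPE. Returns false once the
// peer has closed the connection.
bool sendByte(int sock, char byte) {
    // MSG_NOSIGNAL turns the signal into a normal error return.
    ssize_t n = send(sock, &byte, 1, MSG_NOSIGNAL);
    if (n == -1 && errno == EPIPE) {
        std::cout << "server closed the connection" << std::endl;
        return false;
    }
    return n == 1;
}

int main() {
    // Alternative: ignore SIGPIPE process-wide, once, at startup; send()
    // on a closed connection then fails with -1/EPIPE instead of killing us.
    signal(SIGPIPE, SIG_IGN);
    // ... socket(), connect(), and the send loop as in the question ...
    return 0;
}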

Strange performance issues reading from stdout

I'm working on some code that will be used to test other executables. For convenience I'll refer to my code as the tester and the code being tested as the client. The tester will spawn the client and send commands to the client's stdin and receive results from the client's stdout.
I wanted to do some performance testing first, so I wrote a very simple example tester and client. The tester waits for the client to write "READY" to its stdout and in response it sends "GO" to the client's stdin. The client then writes some number of bytes to stdout, configured via a command line flag, and then writes "\nREADY\n", at which point the tester will again write "GO". This repeats 10,000 times, after which I calculate the time it took to complete the test and the "throughput": 10,000 divided by the time to complete.
I ran the above test having the client send 0, 10, 100, 1000, 10000, and 100000 bytes of data before it sends "READY". For each payload size I repeated the test 10 times and took the average. When run on my laptop in an Ubuntu VMware instance, I got a throughput of about 100k GO/READY pairs per second. The performance was fairly stable and had virtually no dependence on the number of binary bytes the client sends to the tester. I then repeated the test on a very fast, 24-core server running CentOS. With a 0-byte payload I observed only about 55k GO/READY pairs per second, and the performance degraded noticeably as the number of bytes the client sent increased. When the client sends 100k bytes between "GO" and "READY", the throughput was only about 6k operations per second.
So I have three questions:
Why would the same code run much more slowly on a faster machine?
Why would the performance in the virtual machine be independent of payload size, but the performance on the fast server be heavily dependent on payload size?
Is there anything I can do to make things faster on the server?
One possible explanation is that I recompiled the code on the fast server and it is using a different version of the C++ libraries. The VMware machine is running Ubuntu 11.10 and the fast server is running CentOS 6. Both are 64-bit machines.
The relevant tester code is as follows:
ios_base::sync_with_stdio(false);
const int BUFFER_SIZE = 2 << 20;
char buffer[BUFFER_SIZE];
process_stdout->rdbuf()->pubsetbuf(buffer, BUFFER_SIZE);

Timer timer;
// Wait until the process is ready
string line;
line.reserve(2 << 20);
getline(*process_stdout, line);
CHECK(line == "READY");

timer.Start();
for (int i = 0; i < num_trials; ++i) {
    *process_stdin << "GO\n";
    process_stdin->flush();
    line = "";
    while (line != "READY") {
        getline(*process_stdout, line);
    }
}
double elapsed = timer.Elapsed();
cout << "Done. Did " << num_trials << " iterations in "
     << elapsed << " seconds. Throughput: "
     << double(num_trials) / elapsed << " per second." << endl;
I also tried versions using read() calls (from unistd.h) into a 1MB buffer and calls to memchr to find the "\n" characters and look for READY but got the same performance results.
The relevant client code is as follows:
// Create a vector of binary data. Some portion of the data will be sent
// to stdout each time a "GO" is received before sending "READY"
vector<char> byte_source;
const int MAX_BYTES = 1 << 20;
for (int i = 0; i < MAX_BYTES; ++i) {
    byte_source.push_back(i % 256);
}

cout << "READY" << endl;
while (cin.good()) {
    string line;
    getline(cin, line);
    if (line == "GO") {
        // The value of response_bytes comes from a command line flag
        OutputData(response_bytes, byte_source);
        cout << "READY" << endl;
    }
}

// write `bytes` worth of data from byte_source to stdout
void OutputData(unsigned int bytes,
                const vector<char>& byte_source) {
    if (bytes == 0) {
        return;
    }
    cout.write(&byte_source[0], bytes);
    cout << "\n";
}
Any help would be greatly appreciated!
The fact that the speed in the VM is independent of the payload size indicates that you're doing something wrong. These are not complete programs, so it's hard to pinpoint what. Use strace to see what's going on, i.e., whether the client actually sends all the data you believe it does (and also check that the tester is receiving all the data it should).
100k READY/GO pairs per second is way too much; that's basically at the upper limit of the number of context switches per second a machine can do, without doing anything else.