Use socket io in C++ for dynamic number of bytes - c++

I'm trying to implement a simple client-server application where a client or the server can send a dynamic number of bytes in a single write() call.
For example, let's assume that the client sends a byte stream of 1500 bytes. And server reads every 1000 bytes.
int BUFFER_SIZE = 1000;
...
read( iSockFD, cBuffer, BUFFER_SIZE );
I can use a loop and call read until its return value is 0. But the client may have multiple write() calls in a loop (i.e. sending multiple messages).
My question is, will it affect the read() on the server side? Meaning, will two consecutive write() of 1000 bytes in the client side, be read by a single read() with 2000 bytes buffer size at the server side?
If that's the case, what are the recommended ways of implementing such a scenario? Should I use a separator for messages (using an encoding algorithm)?
I understand this more related to sockets itself rather than C++. But, your help and guidance are highly appreciated.
UPDATE:
The intention is to implement a simple middleware system to send different types of messages, where the messages will be encoded in binary before sending.

Nobody can guarantee you that a write(x) will trigger a read(x) at the receiver side. If x is larger than your socket receive buffer, or if you call read() before the entire message has been received in the socket receive buffer, then read() will only return a fraction of the data and require you to issue a subsequent read() to get the rest.
The recommended way of doing this is to define a message buffer of sufficient size. Every successful call to read() returns 1 or more bytes (0 means the peer closed the connection, -1 signals an error), which you keep appending to the buffer. Once the buffer holds at least 4 bytes + the be32toh(integer) stored in the first 4 bytes of the buffer, consume the integer plus the following x bytes from the beginning of the buffer (and process them further). This lets you handle the case where a single read() returns the end of a previously unfinished packet together with the beginning of the next (incomplete) packet.
Just make sure that every payload you transmit is prepended by a htobe32(length).
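A minimal sketch of the length-prefix scheme described above, assuming a 4-byte big-endian length header. The names frame_message and FrameDecoder are hypothetical, and the byte order is handled by hand instead of with htobe32/be32toh so that the sketch does not depend on <endian.h>:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sender side: prepend a 4-byte big-endian length to the payload.
std::string frame_message(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += payload;
    return out;
}

// Receiver side: accumulate whatever read() returned, hand back complete messages.
class FrameDecoder {
public:
    // Append the bytes produced by one read() call.
    void feed(const char* data, std::size_t len) { buffer_.append(data, len); }

    // Remove and return every complete payload; a trailing partial frame stays buffered.
    std::vector<std::string> drain() {
        std::vector<std::string> messages;
        std::size_t pos = 0;
        while (buffer_.size() - pos >= 4) {
            const std::uint32_t n = decode_length(pos);
            if (buffer_.size() - pos - 4 < n) break;  // payload not complete yet
            messages.push_back(buffer_.substr(pos + 4, n));
            pos += 4 + static_cast<std::size_t>(n);
        }
        buffer_.erase(0, pos);
        return messages;
    }

    // Number of buffered bytes that do not yet form a complete frame.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::uint32_t decode_length(std::size_t pos) const {
        auto byte = [&](std::size_t i) {
            return static_cast<std::uint32_t>(static_cast<unsigned char>(buffer_[pos + i]));
        };
        return (byte(0) << 24) | (byte(1) << 16) | (byte(2) << 8) | byte(3);
    }

    std::string buffer_;
};
```

In a real receiver you would call feed() with the buffer and return value of each read(), then process whatever drain() returns. A production version would probably also reject lengths above some agreed maximum.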

Related

Does WSASend send all WSABUF buffers as a single packet?

The title probably explains itself best.
Anyway, I have a data buffer received from an another source, and I want to send it in a single UDP packet that contains a sequence number (as the first byte) -> I want to add the sequence number to the given buffer!
Instead of allocating a new buffer, setting its size to size+4, setting the sequence number as the first byte and copying the data into the buffer, I would like to just use the scatter-gather mechanism of WSA.
Sadly though, no WSA document specifies explicitly that WSASend guarantees that all buffers will be sent in a single packet (the packet size will be kept below 1500 bytes).
Can I be certain that it will work that way? Or should I re-build the packet?
Best,
Daniel
It is documented in a round-about way:
For message-oriented sockets, do not exceed the maximum message size of the underlying provider, which can be obtained by getting the value of socket option SO_MAX_MSG_SIZE. If the data is too long to pass atomically through the underlying protocol the error WSAEMSGSIZE is returned, and no data is transmitted.
So clearly it combines the data from the buffers to make a single UDP packet. If it didn't, there would be no point in returning the WSAEMSGSIZE error.

How can I use boost::asio for a TCP protocol without a header which tells me the message's size?

The Boost chat server example demonstrates handling a simple TCP message protocol in which each message is preceded by a fixed-size header which tells you the size of the message which follows. This means you always know exactly how many bytes to read in your next call to async_read(); you alternate between reading a header whose size is always the same, and a message whose size is given in the header. This works well with the Boost i/o service model, which promises to call a handler when exactly the expected number of bytes have been received from the socket.
How can I use Boost to run a TCP protocol which doesn't use a header like this? My client has a protocol which uses special byte sequences to represent the start and end of each message, so I won't know how many bytes to read in each call to async_read(); I have to just get bytes from the socket as they arrive and watch for the special byte sequences. If I pick a sensible buffer size like 256 bytes, and if my handler will only be called when that many bytes have been read, I believe the i/o service will generally end up receiving the last few bytes of the most recent message from the network, but not passing them to my handler until the next message comes along and brings the byte total up to the number I'm expecting. The next message may not arrive for some time, and I want to handle the current message as soon as it arrives.
Reading one byte at a time isn't a good idea for performance reasons, correct?
http://www.boost.org/doc/libs/1_45_0/doc/html/boost_asio/examples.html
There are a few options:
You can use async_read_until to read until your "ending sequence" (that is, until the end of the message).
If your "ending sequence" depends on the "starting sequence", you can read a fixed-size buffer (equal to the starting sequence length), calculate the ending sequence, and then set up async_read_until.
You can also call async_read_some to read however many bytes have arrived in the socket buffer, then check your buffer with your own function to see whether it contains a complete packet or whether you need to read the next part.
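The "check your buffer with your own function" step of the last option could look roughly like the sketch below. It is plain C++ with no Boost calls. The function name extract_messages is hypothetical, and the marker strings are whatever your protocol defines. The idea is to append every chunk delivered by async_read_some to one buffer and call this after each append:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pull every complete start...end message out of `buffer`.
// Bytes belonging to an unfinished message are left in `buffer` for the next call.
std::vector<std::string> extract_messages(std::string& buffer,
                                          const std::string& start,
                                          const std::string& end) {
    assert(!start.empty() && !end.empty());
    std::vector<std::string> messages;
    for (;;) {
        const std::size_t s = buffer.find(start);
        if (s == std::string::npos) {
            // No start marker yet: keep only a tail that could be the beginning of one.
            const std::size_t keep = std::min(buffer.size(), start.size() - 1);
            buffer.erase(0, buffer.size() - keep);
            break;
        }
        const std::size_t body = s + start.size();
        const std::size_t e = buffer.find(end, body);
        if (e == std::string::npos) {
            // Start marker seen but no end marker: drop noise before it and wait for more bytes.
            buffer.erase(0, s);
            break;
        }
        messages.push_back(buffer.substr(body, e - body));
        buffer.erase(0, e + end.size());
    }
    return messages;
}
```

This sketch assumes the end marker can never occur inside a message body. If it can, the protocol needs some escaping rule, which is not handled here.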

Read from socket less than is available to read

I cannot find the answer for this one: what will happen if I read 4 bytes from a socket (I set the limit to 4 bytes) but there are actually 256 bytes waiting to be read? Will they be lost, or will they wait until the next call of the read function?
If it's a TCP socket, then no data will get lost; it'll get queued up.
Bear in mind that you have to be prepared to deal with partial reads, i.e. where you get fewer bytes than requested and have to call read() again to get more.
It depends on what kind of socket you use. If it is a stream socket (created with SOCK_STREAM), it carries a stream of data, and you can read it even 1 byte at a time (though that is not efficient); on the other hand, you may request 1024 bytes but get only 1. The portions in which the sender put the data into the stream are almost irrelevant (there is a dependency, but you should not rely on it). So with a stream you need to define the end of the data in a higher-level protocol. You may send strings with \n at the end, use zero-terminated strings, or send a few bytes holding the size of the upcoming data before that data.
On the other hand, if you use a datagram protocol (created with SOCK_DGRAM), you get the data in packets, each the size the sender sent. If you provide a buffer smaller than the available datagram, the data is truncated and the remainder is discarded.
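The difference can be observed locally with a small sketch. It is an assumption-laden stand-in: it uses an AF_UNIX socketpair() instead of real TCP/UDP sockets, relies on MSG_DONTWAIT (available on Linux and the BSDs), and the names ReadResult and read_in_two_steps are hypothetical. It sends 256 bytes, reads 4, then tries to read whatever is left:

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

struct ReadResult {
    ssize_t first;   // result of the 4-byte recv()
    ssize_t second;  // result of the follow-up non-blocking recv()
};

// `type` is SOCK_STREAM or SOCK_DGRAM.
ReadResult read_in_two_steps(int type) {
    ReadResult result = {-1, -1};
    int fds[2];
    if (socketpair(AF_UNIX, type, 0, fds) != 0) return result;

    char payload[256];
    std::memset(payload, 'x', sizeof payload);
    if (send(fds[0], payload, sizeof payload, 0) == static_cast<ssize_t>(sizeof payload)) {
        char small[4];
        char large[512];
        // Ask for only 4 of the 256 waiting bytes.
        result.first = recv(fds[1], small, sizeof small, 0);
        // Stream socket: the other 252 bytes are still queued.
        // Datagram socket: the rest of the datagram was discarded, so nothing is
        // left and the non-blocking call fails with -1 (EAGAIN/EWOULDBLOCK).
        result.second = recv(fds[1], large, sizeof large, MSG_DONTWAIT);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

With SOCK_STREAM the two calls should return 4 and 252; with SOCK_DGRAM they should return 4 and -1.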

Reading arbitrary-length messages? How do you know you're finished

EDIT!
I just read that read will block until the buffer is full. How on earth do I receive smaller packets without having to send 1MB (my max buffer length) each time? What if I want to send arbitrary-length messages?
In Java you seem to be able to just send a char array without any worries. But in C++ with the boost sockets I seem to either have to keep calling socket.read(...) until I think I have everything, or send my full buffer length of data, which seems wasteful.
Old original question for context.
Yet again boost sockets have me completely stumped. I am using boost::asio::ssl::stream<boost::asio::ip::tcp::socket> socket; I used the boost SSL example for guidance, but I have dedicated a thread to it rather than using the async calls.
The first socket.read_some(...) on the socket is fine and it reads all the bytes. After that it reads 1 byte and then all the rest on the next socket.read_some(...), which had me really confused. I then noticed that read_some typically has this behaviour. So I moved to boost::asio::read, as socket does not have a member function read, which surprised me. However, I noticed boost::asio has a read function that takes a socket and a buffer. However, it is permanently blocking.
// read blocking data method
// now
bytesread = boost::asio::read(socket, buffer(readBuffer, max_length)); // permanently blocks, never seems to read
// was
// bytesread = socket.read_some(buffer(readBuffer, max_length)); // after the 1st read it always reads one byte and needs another socket.read_some(...) call to read the rest
What do I need to do make boost::asio::read(...) work?
Note: I have used Wireshark to make sure that the server is not sending the data broken up. The server is not faulty.
Read with read_some() in a loop merging the buffers until you get a complete application message. Assume you can get back anything between 1 byte and full length of your buffer.
Regarding "knowing when you are finished" - that goes into your application level protocol, which could use either delimited messages, fixed length messages, fixed length headers that tell payload length, etc.

How C++ `recv` function acts at data receving? Could it receive a partial "packet"?

static void HandlePackets(void* pParams)
{
    int iResult = 0;
    char recvbuf[MAX_PACKET_LENGTH];
    printf("Packet handling started\n");
    // recv() returns the number of bytes read, 0 on graceful close, or SOCKET_ERROR
    while((iResult = recv(lhSocket, recvbuf, MAX_PACKET_LENGTH, 0)) > 0)
        printf("Bytes received: %d\n", iResult);
    printf("Packet handling stopped with reason %i", WSAGetLastError());
}
For now, it only prints the amount of bytes received.
Could it happen that recv receives only half of a packet? Or one full packet and half of the next packet, if the server sent them one after another quickly?
For example, server sent a single packet with 512 bytes length, is it possible that recv first got 500 bytes, and the remain 12 will receive from second attempt?
If the server is sending a lot of packets of 512 bytes each, is it possible that recv will get 700 bytes on the first call and the remaining bytes on the second?
MAX_PACKET_LENGTH is 1024
(I talk here about application layer packets, not transport layer.)
The whole problem is - do I need to make some possibility for client to combine received bytes into one packet or split over-received bytes to different packets?
Is it possible that recv first got 500 bytes, and the remain 12 will receive from second attempt?
Yes, definitely.
You have no guarantee that, when the sending end sends a burst of X bytes, the receiving end will pick them all up in a single call to recv. recv doesn't even know how many bytes are in your application-layer "packet".
The whole problem is - do I need to make some possibility for client to combine received bytes into one packet or split over-received bytes to different packets?
Your application will definitely have to accumulate data from possibly sequential reads, fill up a buffer, and implement parsing to look for a full packet.
TCP/IP doesn't know your application's protocol; as David said, if you want to split the incoming data stream into "packets" then you must do that yourself.
Yes, with TCP, it can happen. But it's not a problem. If you receive too little, call receive again. If you receive too much, well that's great because it just saves you the trouble of having to call receive again.
The networking stack knows TCP, but it doesn't know the protocol you are implementing. If you want to divide the byte stream into messages, that's your job.
If you don't make the client do it, how will it possibly happen? The networking stack has no idea what your application layer packets are like. It has no idea what constitutes a complete application layer packet since it's not at the application layer.
Note that is the rule for TCP and other byte-stream protocols. Other protocols may have different semantics.
In TCP communication the sender uses write() (possibly in a loop) to send data. On the receiver side, read() copies received data from a socket buffer into your buffer at application level. If one write() sends, let's say, 900 bytes, TCP can break it into multiple chunks of various sizes, e.g. 300, 400, and 200 bytes, so on the receiving side you may need to call read() three times in order to receive all the data.
Now, if you put recv() in a loop and each call fills the entire buffer or only part of it, how do you know when to stop receiving? When the sender has sent all its data and gracefully closes the connection, your recv() will return 0. There is nothing more to receive, and you can close your socket.
I mentioned filling the buffer in a loop. If you're not processing data from receiving buffer in the recv() loop, you need to preserve it somewhere, otherwise each iteration might overwrite it. (You can advance buffer pointer in each iteration but that would work only if you know in advance the length of the packet.) You can copy each received chunk into queue or some other data structure. Data processing usually goes in parallel with data receiving - and in another, processing thread.
But let's go back to the recv() loop. Apart from waiting for 0, there is another way for the receiver to know when to stop receiving: sender and receiver can agree (know) that e.g. the first two bytes sent will carry the length of the message. So at the beginning, the receiver waits only for two bytes. Once it receives them, it unpacks the message size, let's say 900 bytes. Now the receiver can adjust its buffer size to 900 and receive in a loop until all 900 bytes have arrived. Each recv() returns the number of bytes received, and the receiver can advance the buffer pointer by that number so the next recv() writes into the free part of the buffer.
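The two-byte-length approach from the paragraph above might be sketched as follows for POSIX sockets. The helper names recv_exact and recv_message are hypothetical, and the header is assumed to be big-endian, which the original text does not specify:

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Call recv() until exactly `len` bytes have arrived.
// Returns false if the peer closed the connection or an error occurred first.
bool recv_exact(int fd, char* out, std::size_t len) {
    std::size_t got = 0;
    while (got < len) {
        // Write into the free part of the buffer, asking only for what is missing.
        const ssize_t n = recv(fd, out + got, len - got, 0);
        if (n == 0) return false;  // graceful close by the peer
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return false;
        }
        got += static_cast<std::size_t>(n);
    }
    return true;
}

// Read one message: a 2-byte big-endian length followed by that many payload bytes.
bool recv_message(int fd, std::string& message) {
    unsigned char header[2];
    if (!recv_exact(fd, reinterpret_cast<char*>(header), sizeof header)) return false;
    const std::size_t len = (static_cast<std::size_t>(header[0]) << 8) | header[1];
    message.resize(len);
    return len == 0 || recv_exact(fd, &message[0], len);
}
```

A two-byte header limits a message to 65535 bytes; a protocol with larger messages would need a wider length field.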
Btw, this shared knowledge (contract) between client and server (or receiver and sender) is your communication protocol at the application level. It comes on the top of TCP protocol.