Winsock2 tcp/ip - some data packets are ignored probably due to null terminator from the previous packet - c++

I wrote a simple client-server program. Network.h is a header file which uses Winsock2.h (TCP/IP mode) to create socket, accept/connect in blocking mode, send/recv in non-blocking mode. I made it so that the function string TNetwork::Recv(int size) will return the string "Nothing" if it gets WSAWOULDBLOCK error (no data is received yet)
Here is my main function:
int main(){
string Ans;
TNetwork::StartUp(); //WSA start up, etc
cin >> Ans;
if (Ans == "0"){ // 0 --> server
TNetwork::SetupAsServer(); //accept connection (in blocking mode!)
while (true){
TNetwork::Send("\nAss" + '\0'); //without null terminator, the client may read extra bytes, causing undefined behavior (?)
TNetwork::Send("embly" + '\0');
cin >> Ans;
}
}
else{ // others --> regard Ans as IP address. e.g. I can type "127.0.0.1"
TNetwork::SetupAsClient(Ans);
string Rec;
while (true){
Rec = TNetwork::Recv(1000);
if (Rec != "Nothing"){
cout << Rec;
}
}
}
system("PAUSE");
}
Supposedly, the client would print "Assembly" when connected, and when the server enters anything to its console window. Sometimes, though, the client would only print out "\nAss" in the console without the "embly.
To my understanding, TCP/IP ensures all data to be sent and in the correct order, so I guess what happens is that both packets arrive at the same time, which happen quite often over the unstable internet. And due to this null terminator, the client would ignore the "embly", since the Recv() function stopped reading when it hits a null terminator.
So, how can I ensure that the client will always read all data packets correctly?

Yes, the network stack will send the data in the correct order and doesn't care what termination type you use. This has to do with how you're receiving and processing the data stream (note: not packets, stream). If you receive all 11 bytes and print it to the screen, the print function will stop when it reaches the zero, but the rest of the data is still there.
Note: since it's a stream, what happens if you received only 10 bytes of data from the stream? You need to scan what you receive for the zero to know if you've received a full "zero-terminated string" if that's how you want to communicate your data.
EDIT: Also, I don't think "\nAss" + '\0' is doing what you think it is. Instead of adding a 0 character to the end of the string (which already has one, by the way), it's adding 0 to your string pointer.

As #mark points out, TCP is all about streams, not packets. TCP takes care of ensuring that data is reliably transmitted from A to B and that the data is delivered to the consumer in the order in which it was transmitted. Yes, the data is packetized on the wire, but the TCP stack on the system takes those packets and builds the stream which it makes available to you through the recv() function. The TCP stack handles out-of-order data, missing data, and duplicated data such that by the time your application sees it, the stream is a mirror-copy of when the sender sent.
To properly receive TCP data, you will typically need some kind of loop that reads data from the socket when it becomes available. The way I normally do this is to have a thread that is dedicated to servicing the socket. In the thread function is a loop that reads data from the socket when it becomes available and is idle otherwise. This loop reads data into a buffer of, say, 1 KB. Once the data is received from the socket into this buffer, the buffer is copied to another thread for processing. In the thread function for the processing thread is a loop that receives the 1 KB buffers from the socket thread and adds them to the back end of a master buffer of, say, 1 MB. The processing thread then processes the messages out of this master buffer and makes them available to the application.
For a simple demo application, two threads may be overkill. The two threads I've described could be certainly be combined into one, but for my application, it is more efficient to have two threads and take advantage of the multiple cores on my system. The point is, if you're going to have a front-end UI, there's not going to be a way around using at least one thread and still have the UI be responsive.
One other thing. There are two commonly-used mechanisms for protocol design. You're using one, namely, a marker (e.g., a null terminator, etc.) to signal the begin/end of a message. I don't prefer this mechanism mainly because the marker may actually need to be part of the message at some point. The other mechanism is to have a header on each message that tells, at a minimum, how long the message is. I prefer this mechanism and include in my headers a sync word and the message type as well. For example,
struct Header
{
__int16 _sync; // a hex pattern, e.g., 0xABCD
__int16 _type;
__int32 _length;
}
That's a total of 8 bytes. So when processing from the master buffer, I read the first 8 bytes, verify the sync word, and get the length. I determine if there are 'length' bytes available in the master buffer. If not, I have to wait until the socket thread provides me more data before checking again. If so, I extract 'length' bytes from the master buffer and pass that to an object created according to the specified type, which knows how to interpret that particular message. Then repeat.
As I mentioned, I use a master buffer of 1 MB or so. As messages are processed, it is important to remove them from the master buffer so there is additional space available for new data on the back end. This involves simply copying the unprocessed data, if any, to the beginning of the buffer. In cases where data comes in faster than you can process it, the master buffer may need the ability to resize itself to accommodate the additional data.
I hope that's not overwhelming. Start simple and add as you go.

Related

How to tell if SSL_read has received and processed all the records from single message

Following is the dilemma,
SSL_read, on success returns number of bytes read, SSL_pending is used to tell if the processed record has more that to be read, that means probably buffer provided is not sufficient to contain the record.
SSL_read may return n > 0, but what if this happens when first records has been processed and message effectively is multi record communication.
Question: I am using epoll to send/receive messages, which means I have to queue up event in case I expect more data. What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
PS: This code hasn't been tested so it may be incorrect. Purpose of the code is to share the idea that I am trying to implement.
Following is code snippet for the read -
//read whatever is available.
while (1)
{
auto n = SSL_read(ssl_, ptr_ + tail_, sz_ - tail_);
if (n <= 0)
{
int ssle = SSL_get_error(ch->ssl_, rd);
auto old_ev = evt_.events;
if (ssle == SSL_ERROR_WANT_READ)
{
//need more data to process, wait for epoll notification again
evt_.events = EPOLLIN | EPOLLERR;
}
else if (err == SSL_ERROR_WANT_WRITE)
{
evt_.events = EPOLLOUT | EPOLLERR;
}
else
{
/* connection closed by peer, or
some irrecoverable error */
done_ = true;
tail_ = 0; //invalidate the data
break;
}
if (old_ev != evt_.events)
if (epoll_ctl(epoll_fd_, EPOLL_CTL_MOD, socket_fd_, &evt_) < 0)
{
perror("handshake failed at EPOLL_CTL_MOD");
SSL_free(ssl_);
ssl_ = nullptr;
return false;
}
}
else //some data has been read
{
tail_ = n;
if (SSL_pending(ssl_) > 0)
//buffer wasn't enough to hold the content. resize and reread
resize();
else
break;
}
}
```
enter code here
SSL_read() returns the number of decrypted bytes returned in the caller's buffer, not the number of bytes received on the connection. This mimics the return value of recv() and read().
SSL_pending() returns the number of decrypted bytes that are still in the SSL's buffer and haven't been read by the caller yet. This would be equivalent to calling ioctl(FIONREAD) on a socket.
There is no way to know how many SSL/TLS records constitute an "application message", that is for the decrypted protocol data to dictate. The protocol needs to specify where a message ends and a new message begins. For instance, by including the message length in the message data. Or delimiting messages with terminators.
Either way, the SSL/TLS layer has no concept of "messages", only an arbitrary stream of bytes that it encrypts and decrypts as needed, and transmits in "records" of its choosing. Similar to how TCP breaks up a stream of arbitrary bytes into IP frames, etc.
So, while your loop is reading arbitrary bytes from OpenSSL, it needs to process those bytes to detect separations between protocol messages, so it can then act accordingly per message.
What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
I'd have hoped that your message has a header with the number of records in it. Otherwise the protocol you've got is probably unparseable.
What you'd need is to have a stateful parser that consumes all the available bytes and outputs records once they are complete. Such a parser needs to suspend its state once it reaches the last byte of decrypted input, and then must be called again when more data is available to be read. But in all cases if you can't predict ahead of time how much data is expected, you won't be able to tell when the message is finished - that is unless you're using a self-synchronizing protocol. Something like ATM headers would be a starting point. But such complication is unnecessary when all you need is just to properly delimit your data so that the packet parser can know exactly whether it's got all it needs or not.
That's the problem with sending messages: it's very easy to send stuff that can't be decoded by the receiver, since the sender is perfectly fine with losing data - it just doesn't care. But the receiver will certainly need to know how many bytes or records are expected - somehow. It can be told this a-priori by sending headers that include byte counts or fixed-size record counts (it's the same size information just in different units), or a posteriori by using unique record delimiters. For example, when sending printable text split into lines, such delimiters can be Unicode paragraph separators (U+2029).
It's very important to ensure that the record delimiters can't occur within the record data itself. Thus you need some sort of a "stuffing" mechanism, where if a delimiter sequence appears in the payload, you can alter it so that it's not a valid delimiter anymore. You also need an "unstuffing" mechanism so that such altered delimiter sequences can be detected and converted back to their original form, of course without being interpreted as a delimiter. A very simple example of such delimiting process is the octet-stuffed framing in the PPP protocol. It is a form of HDLC framing. The record separator is 0x7E. Whenever this byte is detected in the payload, it is escaped - replaced by a 0x7D 0x5E sequence. On the receiving end, the 0x7D is interpreted to mean "the following character has been XOR'd with 0x20". Thus, the receiver converts 0x7D 0x5E to 0x5E first (it removes the escape byte), and then XORs it with 0x20, yielding the original 0x7E. Such framing is easy to implement but potentially has more overhead than framing with a longer delimiter sequence, or even a dynamic delimiter sequence whose form differs for each position within the stream. This could be used to prevent denial-of-service attacks, when the attacker may maliciously provide a payload that will incur a large escaping overhead. The dynamic delimiter sequence - especially if unpredictable, e.g. by negotiating a new sequence for every connection - prevents such service degradation.

c++ streaming udp data into a queue?

I am streaming data as a string over UDP, into a Socket class inside Unreal engine. This is threaded, and runs in the background.
My read function is:
float translate;
void FdataThread::ReceiveUDP()
{
uint32 Size;
TArray<uint8> ReceivedData;
if (ReceiverSocket->HasPendingData(Size))
{
int32 Read = 0;
ReceivedData.SetNumUninitialized(FMath::Min(Size, 65507u));
ReceiverSocket->RecvFrom(ReceivedData.GetData(), ReceivedData.Num(), Read, *targetAddr);
}
FString str = FString(bytesRead, UTF8_TO_TCHAR((const UTF8CHAR *)ReceivedData));
translate = FCString::Atof(*str);
}
I then call the translate variable from another class, on a Tick, or timer.
My test case sends an incrementing number from another application.
If I print this number from inside the above Read function, it looks as expected, counting up incrementally.
When i print it from the other thread, it is missing some of the numbers.
I believe this is because I call it on the Tick, so it misses out some data due to processing time.
My question is:
Is there a way to queue the incoming data, so that when i pull the value, it is the next incremental value and not the current one? What is the best way to go about this?
Thank you, please let me know if I have not been clear.
Is this the complete code? ReceivedData isn't used after it's filled with data from the socket. Instead, an (in this code) undefined variable 'buffer' is being used.
Also, it seems that the while loop could run multiple times, overwriting old data in the ReceivedData buffer. Add some debugging messages to see whether RecvFrom actually reads all bytes from the socket. I believe it reads only one 'packet'.
Finally, especially when you're using UDP sockets over the network, note that the UDP protocol isn't guaranteed to actually deliver its packets. However, I doubt this is causing your problems if you're using it on a single computer or a local network.
Your read loop doesn't make sense. You are reading and throwing away all datagrams but the last in any given sequence that happen to be in the socket receive buffer at the same time. The translate call should be inside the loop, and the loop should be while(true), or while (running), or similar.

TCP Socket - read most recent data from input queue [duplicate]

I've been reading through Beej's Guide to Network Programming to get a handle on TCP connections. In one of the samples the client code for a simple TCP stream client looks like:
if ((numbytes = recv(sockfd, buf, MAXDATASIZE-1, 0)) == -1) {
perror("recv");
exit(1);
}
buf[numbytes] = '\0';
printf("Client: received '%s'\n", buf);
close(sockfd);
I've set the buffer to be smaller than the total number of bytes that I'm sending. I'm not quite sure how I can get the other bytes. Do I have to loop over recv() until I receive '\0'?
*Note on the server side I'm also implementing his sendall() function, so it should actually be sending everything to the client.
See also 6.1. A Simple Stream Server in the guide.
Yes, you will need multiple recv() calls, until you have all data.
To know when that is, using the return status from recv() is no good - it only tells you how many bytes you have received, not how many bytes are available, as some may still be in transit.
It is better if the data you receive somehow encodes the length of the total data. Read as many data until you know what the length is, then read until you have received length data. To do that, various approaches are possible; the common one is to make a buffer large enough to hold all data once you know what the length is.
Another approach is to use fixed-size buffers, and always try to receive min(missing, bufsize), decreasing missing after each recv().
The first thing you need to learn when doing TCP/IP programming: 1 write/send call might take
several recv calls to receive, and several write/send calls might need just 1 recv call to receive. And anything in-between.
You'll need to loop until you have all data. The return value of recv() tells you how much data you received. If you simply want to receive all data on the TCP connection, you can loop until recv() returns 0 - provided that the other end closes the TCP connection when it is done sending.
If you're sending records/lines/packets/commands or something similar, you need to make your own protocol over TCP, which might be as simple as "commands are delimited with \n".
The simple way to read/parse such a command would be to read 1 byte at a time, building up a buffer with the received bytes and check for a \n byte every time. Reading 1 byte is extremely inefficient, so you should read larger chunks at a time.
Since TCP is stream oriented and does not provide record/message boundaries it becomes a bit more tricky - you'd
have to recv a piece of bytes, check in the received buffer for a \n byte, if it's there - append the bytes to previously received bytes and output that message. Then check the remainder of the buffer after the \n - which might contain another whole message or just the start of another message.
Yes, you have to loop over recv() until you receive '\0' or an
error happen (negative value from recv) or 0 from recv().
For the first option: only if this zero is part of your
protocol (the server sends it). However from your code it seems that
the zero is just to be able to use the buffer content as a
C-string (on the client side).
The check for a return value of 0 from recv:
this means that the connection was closed (it could be part
of your protocol that this happens.)

Socket Commuication with High frequency

I need to send data to another process every 0.02s.
The Server code:
//set socket, bind, listen
while(1){
sleep(0.02);
echo(newsockfd);
}
void echo (int sock)
{
int n;
char buffer[256]="abc";
n=send(sock,buffer,strlen(buffer),0);
if (n < 0) error("ERROR Sending");
}
The Client code:
//connect
while(1)
{
bzero(buffer,256);
n = read(sock,buffer,255);
printf("Recieved data:%s\n",buffer);
if (n < 0)
error("ERROR reading from socket");
}
The problem is that:
The client shows something like this:
Recieved data:abc
Recieved data:abcabcabc
Recieved data:abcabc
....
How does it happen? When I set sleep time:
...
sleep(2)
...
It would be ok:
Recieved data:abc
Recieved data:abc
Recieved data:abc
...
TCP sockets do not guarantee framing. When you send bytes over a TCP socket, those bytes will be received on the other end in the same order, but they will not necessarily be grouped the same way — they may be split up, or grouped together, or regrouped, in any way the operating system sees fit.
If you need framing, you will need to send some sort of packet header to indicate where each chunk of data starts and ends. This may take the form of either a delimiter (e.g, a \n or \0 to indicate where each chunk ends), or a length value (e.g, a number at the head of each chunk to denote how long it is).
Also, as other respondents have noted, sleep() takes an integer, so you're effectively not sleeping at all here.
sleep takes unsigned int as argument, so sleep(0.02) is actually sleep(0).
unsigned int sleep(unsigned int seconds);
Use usleep(20) instead. It will sleep in microseconds:
int usleep(useconds_t usec);
The OS is at liberty to buffer data (i.e. why not just send a full packet instead of multiple packets)
Besides sleep takes a unsigned integer.
The reason is that the OS is buffering data to be sent. It will buffer based on either size or time. In this case, you're not sending enough data, but you're sending it fast enough the OS is choosing to bulk it up before putting it on the wire.
When you add the sleep(2), that is long enough that the OS chooses to send a single "abc" before the next one comes in.
You need to understand that TCP is simply a byte stream. It has no concept of messages or sizes. You simply put bytes on the wire on one end and take them off on the other. If you want to do specific things, then you need to interpret the data special ways when you read it. Because of this, the correct solution is to create an actual protocol for this. That protocol could be as simple as "each 3 bytes is one message", or more complicated where you send a size prefix.
UDP may also be a good solution for you, depending on your other requirements.
sleep(0.02)
is effectively
sleep(0)
because argument is unsigned int, so implicit conversion does it for you. So you have no sleep at all here. You can use sleep(2) to sleep for 2 microseconds.Next, even if you had, there is no guarantee that your messages will be sent in a different frames. If you need this, you should apply some sort of delimiter, I have seen
'\0'
character in some implementation.
TCPIP stacks buffer up data until there's a decent amount of data, or until they decide that there's no more coming from the application and send what they've got anyway.
There are two things you will need to do. First, turn off Nagle's algorithm. Second, sort out some sort of framing mechanism.
Turning off Nagle's algorithm will cause the stack to "send data immediately", rather than waiting on the off chance that you'll be wanting to send more. It actually leads to less network efficiency because you're not filling up Ethernet frames, something to bare in mind on Gigabit where jumbo frames are required to get best throughput. But in your case timeliness is more important than throughput.
You can do your own framing by very simple means, eg by send an integer first that says how long the rest if the message will be. At the reader end you would read the integer, and then read that number of bytes. For the next message you'd send another integer saying how long that message is, etc.
That sort of thing is ok but not hugely robust. You could look at something like ASN.1 or Google Protocol buffers.
I've used Objective System's ASN.1 libraries and tools (they're not free) and they do a good job of looking after message integrity, framing, etc. They're good because they don't read data from a network connection one byte at a time so the efficiency and speed isn't too bad. Any extra data read is retained and included in the next message decode.
I've not used Google Protocol Buffers myself but it's possible that they have similar characteristics, and there maybe other similar serialisation mechanisms out there. I'd recommend avoiding XML serialisation for speed/efficiency reasons.

Handling partial return from recv() TCP in C

I've been reading through Beej's Guide to Network Programming to get a handle on TCP connections. In one of the samples the client code for a simple TCP stream client looks like:
if ((numbytes = recv(sockfd, buf, MAXDATASIZE-1, 0)) == -1) {
perror("recv");
exit(1);
}
buf[numbytes] = '\0';
printf("Client: received '%s'\n", buf);
close(sockfd);
I've set the buffer to be smaller than the total number of bytes that I'm sending. I'm not quite sure how I can get the other bytes. Do I have to loop over recv() until I receive '\0'?
*Note on the server side I'm also implementing his sendall() function, so it should actually be sending everything to the client.
See also 6.1. A Simple Stream Server in the guide.
Yes, you will need multiple recv() calls, until you have all data.
To know when that is, using the return status from recv() is no good - it only tells you how many bytes you have received, not how many bytes are available, as some may still be in transit.
It is better if the data you receive somehow encodes the length of the total data. Read as many data until you know what the length is, then read until you have received length data. To do that, various approaches are possible; the common one is to make a buffer large enough to hold all data once you know what the length is.
Another approach is to use fixed-size buffers, and always try to receive min(missing, bufsize), decreasing missing after each recv().
The first thing you need to learn when doing TCP/IP programming: 1 write/send call might take
several recv calls to receive, and several write/send calls might need just 1 recv call to receive. And anything in-between.
You'll need to loop until you have all data. The return value of recv() tells you how much data you received. If you simply want to receive all data on the TCP connection, you can loop until recv() returns 0 - provided that the other end closes the TCP connection when it is done sending.
If you're sending records/lines/packets/commands or something similar, you need to make your own protocol over TCP, which might be as simple as "commands are delimited with \n".
The simple way to read/parse such a command would be to read 1 byte at a time, building up a buffer with the received bytes and check for a \n byte every time. Reading 1 byte is extremely inefficient, so you should read larger chunks at a time.
Since TCP is stream oriented and does not provide record/message boundaries it becomes a bit more tricky - you'd
have to recv a piece of bytes, check in the received buffer for a \n byte, if it's there - append the bytes to previously received bytes and output that message. Then check the remainder of the buffer after the \n - which might contain another whole message or just the start of another message.
Yes, you have to loop over recv() until you receive '\0' or an
error happen (negative value from recv) or 0 from recv().
For the first option: only if this zero is part of your
protocol (the server sends it). However from your code it seems that
the zero is just to be able to use the buffer content as a
C-string (on the client side).
The check for a return value of 0 from recv:
this means that the connection was closed (it could be part
of your protocol that this happens.)