How to determine length of buffer at client side - c++

I have a server sending a multi-dimensional character array
char buff1[][3] = { {0xff,0xfd,0x18} , {0xff,0xfd,0x1e} , {0xff,0xfd,21} }
In this case buff1 carries 3 messages (each having 3 characters). There could be several such buffers on the server side, each holding a different number of messages (note: each message will always have 3 characters), e.g.
char buff2[][3] = { {0xff,0xfd,0x20},{0xff,0xfd,0x27}}
How should I store the size of these buffers on the client side, given that it must be known when the client code is compiled?

The server should send information about the length (and any other structure) of the message with the message as part of the message.
An easy way to do that is to send the number of bytes in the message first, then the bytes in the message. Often you also want to send the version of the protocol (so you can detect mismatches) and maybe even a message id header (so you can send more than one kind of message).
If blazing fast performance isn't the goal (and you are talking over a network interface, which tends to be slower than computers: parsing may be cheap enough that you don't care), using a higher level protocol or format is sometimes a good idea (json, xml, whatever). This also helps with debugging problems, because instead of debugging your custom protocol, you get to debug the higher level format.
Alternatively, you can send some sign that the sequence has terminated. If there is a value that is never a valid sequence element (such as 0,0,0), you could send that to say "no more data". Or you could send each element with a header saying if it is the last element, or the header could say that this element doesn't exist and the last element was the previous one.
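As a rough illustration of the "send the length first" idea, here is a minimal sketch for the 3-byte messages from the question. The wire layout (a single count byte followed by the triples) and the names encode_triples and decode_triple_count are assumptions made up for this example, not part of any existing protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format: [count][3 bytes][3 bytes]...
// 'count' is a single byte, so at most 255 triples fit in one buffer.
constexpr std::size_t kTripleSize = 3;

// Server side: prepend the number of triples to the raw bytes.
std::vector<std::uint8_t> encode_triples(const std::uint8_t (*triples)[kTripleSize],
                                         std::size_t count) {
    assert(count <= 255);
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(count));
    for (std::size_t i = 0; i < count; ++i)
        out.insert(out.end(), triples[i], triples[i] + kTripleSize);
    return out;
}

// Client side: returns the number of triples announced by the header,
// or -1 if the bytes received so far do not yet hold the whole buffer.
int decode_triple_count(const std::vector<std::uint8_t>& wire) {
    if (wire.empty()) return -1;
    const std::size_t count = wire[0];
    if (wire.size() < 1 + count * kTripleSize) return -1;
    return static_cast<int>(count);
}
```

With this layout the client never needs to know the buffer size at compile time: it reads the count byte, then waits until count * 3 further bytes have arrived.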

Related

How to tell if SSL_read has received and processed all the records from single message

Following is the dilemma,
SSL_read, on success, returns the number of bytes read. SSL_pending tells whether the processed record has more data to be read, which probably means the buffer provided was not large enough to hold the record.
SSL_read may return n > 0, but what if that happens when only the first record has been processed and the message actually spans multiple records?
Question: I am using epoll to send/receive messages, which means I have to queue up an event in case I expect more data. What check will ensure that all the records of a single message have been read, and that it's time to remove this event and queue up a response event that will write the response back to the client?
PS: This code hasn't been tested so it may be incorrect. Purpose of the code is to share the idea that I am trying to implement.
Following is code snippet for the read -
// Read whatever is available.
while (true)
{
    auto n = SSL_read(ssl_, ptr_ + tail_, sz_ - tail_);
    if (n <= 0)
    {
        // Pass the same SSL object and the return value of the failed call.
        int ssle = SSL_get_error(ssl_, n);
        auto old_ev = evt_.events;
        if (ssle == SSL_ERROR_WANT_READ)
        {
            // Need more data to process; wait for epoll notification again.
            evt_.events = EPOLLIN | EPOLLERR;
        }
        else if (ssle == SSL_ERROR_WANT_WRITE)
        {
            evt_.events = EPOLLOUT | EPOLLERR;
        }
        else
        {
            /* connection closed by peer, or
               some irrecoverable error */
            done_ = true;
            tail_ = 0; // invalidate the data
            break;
        }
        if (old_ev != evt_.events &&
            epoll_ctl(epoll_fd_, EPOLL_CTL_MOD, socket_fd_, &evt_) < 0)
        {
            perror("handshake failed at EPOLL_CTL_MOD");
            SSL_free(ssl_);
            ssl_ = nullptr;
            return false;
        }
        break; // go back to the event loop until the socket is ready again
    }
    else // some data has been read
    {
        tail_ += n; // append after the bytes that are already buffered
        if (SSL_pending(ssl_) > 0)
            // Buffer wasn't big enough to hold the content; resize and read again.
            resize();
        else
            break;
    }
}
SSL_read() returns the number of decrypted bytes returned in the caller's buffer, not the number of bytes received on the connection. This mimics the return value of recv() and read().
SSL_pending() returns the number of decrypted bytes that are still in the SSL's buffer and haven't been read by the caller yet. This would be equivalent to calling ioctl(FIONREAD) on a socket.
There is no way to know how many SSL/TLS records constitute an "application message", that is for the decrypted protocol data to dictate. The protocol needs to specify where a message ends and a new message begins. For instance, by including the message length in the message data. Or delimiting messages with terminators.
Either way, the SSL/TLS layer has no concept of "messages", only an arbitrary stream of bytes that it encrypts and decrypts as needed, and transmits in "records" of its choosing. This is similar to how TCP breaks up a stream of arbitrary bytes into segments that travel in IP packets.
So, while your loop is reading arbitrary bytes from OpenSSL, it needs to process those bytes to detect separations between protocol messages, so it can then act accordingly per message.
What check will ensure that all the records have been read from single message and it's time to remove this event and queue up an response event that will write the response back to client?
I'd have hoped that your message has a header with the number of records in it. Otherwise the protocol you've got is probably unparseable.
What you'd need is to have a stateful parser that consumes all the available bytes and outputs records once they are complete. Such a parser needs to suspend its state once it reaches the last byte of decrypted input, and then must be called again when more data is available to be read. But in all cases if you can't predict ahead of time how much data is expected, you won't be able to tell when the message is finished - that is unless you're using a self-synchronizing protocol. Something like ATM headers would be a starting point. But such complication is unnecessary when all you need is just to properly delimit your data so that the packet parser can know exactly whether it's got all it needs or not.
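A stateful parser of this kind can be quite small. The sketch below assumes a made-up framing of a 2-byte big-endian length followed by the payload; the class name FrameParser and its feed() method are hypothetical. It keeps its state between calls, so it does not matter where the decrypted byte stream happens to be cut.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical framing: 2-byte big-endian payload length, then the payload.
class FrameParser {
public:
    // Consume 'len' bytes; every message completed by them is appended to 'out'.
    // Partial headers and payloads are remembered until the next call.
    void feed(const char* data, std::size_t len, std::vector<std::string>& out) {
        for (std::size_t i = 0; i < len; ++i) {
            const auto byte = static_cast<std::uint8_t>(data[i]);
            if (header_bytes_ < 2) {
                // Still collecting the length header.
                expected_ = (expected_ << 8) | byte;
                if (++header_bytes_ == 2 && expected_ == 0)
                    finish(out);  // zero-length message
            } else {
                payload_.push_back(static_cast<char>(byte));
                if (payload_.size() == expected_)
                    finish(out);
            }
        }
    }

private:
    // Emit the current message and reset to "waiting for a header".
    void finish(std::vector<std::string>& out) {
        out.push_back(payload_);
        payload_.clear();
        header_bytes_ = 0;
        expected_ = 0;
    }

    std::size_t header_bytes_ = 0;  // header bytes seen so far (0..2)
    std::size_t expected_ = 0;      // payload length announced by the header
    std::string payload_;           // payload bytes collected so far
};
```

In an epoll loop, each successful SSL_read would hand its bytes to feed(), and a response would be queued whenever the output vector is non-empty.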
That's the problem with sending messages: it's very easy to send stuff that can't be decoded by the receiver, since the sender is perfectly fine with losing data - it just doesn't care. But the receiver will certainly need to know how many bytes or records are expected - somehow. It can be told this a-priori by sending headers that include byte counts or fixed-size record counts (it's the same size information just in different units), or a posteriori by using unique record delimiters. For example, when sending printable text split into lines, such delimiters can be Unicode paragraph separators (U+2029).
It's very important to ensure that the record delimiters can't occur within the record data itself. Thus you need some sort of a "stuffing" mechanism, where if a delimiter sequence appears in the payload, you can alter it so that it's not a valid delimiter anymore. You also need an "unstuffing" mechanism so that such altered delimiter sequences can be detected and converted back to their original form, of course without being interpreted as a delimiter.
A very simple example of such a delimiting process is the octet-stuffed framing in the PPP protocol. It is a form of HDLC framing. The record separator is 0x7E. Whenever this byte is detected in the payload, it is escaped - replaced by a 0x7D 0x5E sequence. The escape byte 0x7D itself is escaped the same way, as 0x7D 0x5D. On the receiving end, the 0x7D is interpreted to mean "the following character has been XOR'd with 0x20". Thus, the receiver converts 0x7D 0x5E to 0x5E first (it removes the escape byte), and then XORs it with 0x20, yielding the original 0x7E.
Such framing is easy to implement but potentially has more overhead than framing with a longer delimiter sequence, or even a dynamic delimiter sequence whose form differs for each position within the stream. The latter could be used to prevent denial-of-service attacks, where an attacker maliciously provides a payload that incurs a large escaping overhead. The dynamic delimiter sequence - especially if unpredictable, e.g. by negotiating a new sequence for every connection - prevents such service degradation.
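The escaping scheme described above can be sketched in a few lines. This is a simplified illustration only: it escapes just the flag byte 0x7E and the escape byte 0x7D, whereas real PPP can also escape negotiated control characters and adds a checksum. The function names stuff and unstuff are made up for this example.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFlag = 0x7E;    // frame delimiter
constexpr std::uint8_t kEscape = 0x7D;  // escape marker
constexpr std::uint8_t kXor = 0x20;     // value XOR'd into an escaped byte

using Bytes = std::vector<std::uint8_t>;

// Escape the payload and terminate the frame with a flag byte.
Bytes stuff(const Bytes& payload) {
    Bytes out;
    for (std::uint8_t b : payload) {
        if (b == kFlag || b == kEscape) {
            out.push_back(kEscape);
            out.push_back(static_cast<std::uint8_t>(b ^ kXor));
        } else {
            out.push_back(b);
        }
    }
    out.push_back(kFlag);
    return out;
}

// Reverse of stuff(): decode one frame, stopping at the first unescaped flag.
Bytes unstuff(const Bytes& frame) {
    Bytes out;
    bool escaped = false;
    for (std::uint8_t b : frame) {
        if (escaped) {
            out.push_back(static_cast<std::uint8_t>(b ^ kXor));
            escaped = false;
        } else if (b == kEscape) {
            escaped = true;
        } else if (b == kFlag) {
            break;
        } else {
            out.push_back(b);
        }
    }
    return out;
}
```

In the worst case (a payload made entirely of 0x7E or 0x7D bytes) the stuffed frame is about twice the size of the payload, which is the overhead the paragraph above refers to.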

Serialize and deserialize the message using google protobuf in socket programming in C++

Message format to send to server side as below :
package test;
message Test {
required int32 id = 1;
required string name = 2;
}
Server.cpp to do encoding :
string buffer;
test::Test original;
original.set_id(0);
original.set_name("original");
original.AppendToString(&buffer);
send(acceptfd,buffer.c_str(), buffer.size(),0);
With this send call the data should reach the client, I hope, and I am not getting any errors from this code.
But my concern is this:
How do I decode the above message with Google Protocol Buffers on the client side, so that I can see/print it?
You should send more than just the protobuf message to be able to decode it on the client side.
A simple solution would be to send the value of buffer.size() over the socket as a 4-byte integer using network byte order, and then send the buffer itself.
The client should first read the buffer's size from the socket and convert it from network to host byte order. Let's denote the resulting value s. The client must then preallocate a buffer of size s and read s bytes from the socket into it. After that, just use MessageLite::ParseFromString to reconstruct your protobuf.
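The size-prefix scheme can be sketched independently of protobuf and of the socket calls. In the example below, frame_message and take_message are hypothetical helper names; the length is written byte by byte in big-endian (network) order, which gives the same result as htonl/ntohl without depending on a platform header.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Sender: prepend a 4-byte big-endian length to an already serialized message.
std::string frame_message(const std::string& body) {
    const auto n = static_cast<std::uint32_t>(body.size());
    std::string out;
    out.push_back(static_cast<char>((n >> 24) & 0xFF));
    out.push_back(static_cast<char>((n >> 16) & 0xFF));
    out.push_back(static_cast<char>((n >> 8) & 0xFF));
    out.push_back(static_cast<char>(n & 0xFF));
    out += body;
    return out;
}

// Receiver: try to extract one message from the bytes received so far.
// On success the message is stored in 'body', removed from 'rx', and true
// is returned; false means more bytes are needed.
bool take_message(std::string& rx, std::string& body) {
    if (rx.size() < 4) return false;
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<std::uint8_t>(rx[i]);
    if (rx.size() - 4 < n) return false;
    body = rx.substr(4, n);
    rx.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

On the server, the string produced by AppendToString would be passed through frame_message before send(); on the client, each body returned by take_message would be handed to ParseFromString.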
See here for more info on protobuf message methods.
Also, this document discourages the usage of required:
You should be very careful about marking fields as required. If at
some point you wish to stop writing or sending a required field, it
will be problematic to change the field to an optional field – old
readers will consider messages without this field to be incomplete and
may reject or drop them unintentionally. You should consider writing
application-specific custom validation routines for your buffers
instead. Some engineers at Google have come to the conclusion that
using required does more harm than good; they prefer to use only
optional and repeated. However, this view is not universal.

Socket Communication with High frequency

I need to send data to another process every 0.02s.
The Server code:
//set socket, bind, listen
while(1){
sleep(0.02);
echo(newsockfd);
}
void echo (int sock)
{
int n;
char buffer[256]="abc";
n=send(sock,buffer,strlen(buffer),0);
if (n < 0) error("ERROR Sending");
}
The Client code:
//connect
while(1)
{
bzero(buffer,256);
n = read(sock,buffer,255);
printf("Recieved data:%s\n",buffer);
if (n < 0)
error("ERROR reading from socket");
}
The problem is that:
The client shows something like this:
Recieved data:abc
Recieved data:abcabcabc
Recieved data:abcabc
....
How does it happen? When I set sleep time:
...
sleep(2)
...
It would be ok:
Recieved data:abc
Recieved data:abc
Recieved data:abc
...
TCP sockets do not guarantee framing. When you send bytes over a TCP socket, those bytes will be received on the other end in the same order, but they will not necessarily be grouped the same way — they may be split up, or grouped together, or regrouped, in any way the operating system sees fit.
If you need framing, you will need to send some sort of packet header to indicate where each chunk of data starts and ends. This may take the form of either a delimiter (e.g., a \n or \0 to indicate where each chunk ends), or a length value (e.g., a number at the head of each chunk to denote how long it is).
Also, as other respondents have noted, sleep() takes an integer, so you're effectively not sleeping at all here.
sleep takes an unsigned int argument, so sleep(0.02) is actually sleep(0).
unsigned int sleep(unsigned int seconds);
Use usleep(20000) instead. Its argument is in microseconds, and 0.02 s is 20,000 microseconds:
int usleep(useconds_t usec);
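If C++11 is available, a portable way to get the 0.02 s delay from the question is std::this_thread::sleep_for, which takes a typed duration and so avoids the seconds-versus-microseconds confusion. The helper name pause_ms below is made up for this sketch; it also reports how long the pause really took.

```cpp
#include <chrono>
#include <thread>

// Pause for 'ms' milliseconds and return how long the pause actually took,
// measured on a monotonic clock. The real delay may be longer than requested.
std::chrono::milliseconds pause_ms(int ms) {
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

In the server loop this would replace sleep(0.02) with pause_ms(20), or simply with the sleep_for call itself.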
The OS is at liberty to buffer data (i.e., why not send one full packet instead of multiple small ones?).
Besides, sleep takes an unsigned integer.
The reason is that the OS is buffering data to be sent. It will buffer based on either size or time. In this case, you're not sending enough data, but you're sending it fast enough the OS is choosing to bulk it up before putting it on the wire.
When you add the sleep(2), that is long enough that the OS chooses to send a single "abc" before the next one comes in.
You need to understand that TCP is simply a byte stream. It has no concept of messages or sizes. You simply put bytes on the wire on one end and take them off on the other. If you want to do specific things, then you need to interpret the data in specific ways when you read it. Because of this, the correct solution is to create an actual protocol for this. That protocol could be as simple as "each 3 bytes is one message", or more complicated where you send a size prefix.
UDP may also be a good solution for you, depending on your other requirements.
sleep(0.02) is effectively sleep(0), because the argument is an unsigned int and the implicit conversion truncates 0.02 to 0. So you have no sleep at all here. sleep() only counts whole seconds (sleep(2) sleeps for 2 seconds); for a 0.02 s delay you can use usleep(20000), which takes microseconds. Next, even if you did sleep, there is no guarantee that your messages will be sent in different frames. If you need this, you should apply some sort of delimiter; I have seen the '\0' character used in some implementations.
TCP/IP stacks buffer up data until there's a decent amount of data, or until they decide that there's no more coming from the application and send what they've got anyway.
There are two things you will need to do. First, turn off Nagle's algorithm. Second, sort out some sort of framing mechanism.
Turning off Nagle's algorithm will cause the stack to "send data immediately", rather than waiting on the off chance that you'll be wanting to send more. It actually leads to less network efficiency because you're not filling up Ethernet frames, something to bear in mind on Gigabit where jumbo frames are required to get best throughput. But in your case timeliness is more important than throughput.
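On POSIX systems, Nagle's algorithm is turned off per socket with the TCP_NODELAY option. The sketch below assumes a Linux or BSD-style sockets API; the helper names disable_nagle and nagle_disabled are made up for this example. Winsock offers the same option, but with slightly different argument types.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm on a TCP socket. Returns true on success.
bool disable_nagle(int fd) {
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on) == 0;
}

// Report whether TCP_NODELAY is currently set on the socket.
bool nagle_disabled(int fd) {
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0)
        return false;
    return value != 0;
}
```

In the server from the question, disable_nagle(newsockfd) would be called once after accept(). This only affects when the sender pushes data out; the receiver can still see several messages in one read, so framing is needed either way.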
You can do your own framing by very simple means, e.g. by sending an integer first that says how long the rest of the message will be. At the reader end you would read the integer, and then read that number of bytes. For the next message you'd send another integer saying how long that message is, etc.
That sort of thing is ok but not hugely robust. You could look at something like ASN.1 or Google Protocol buffers.
I've used Objective Systems' ASN.1 libraries and tools (they're not free) and they do a good job of looking after message integrity, framing, etc. They're good because they don't read data from a network connection one byte at a time, so the efficiency and speed aren't too bad. Any extra data read is retained and included in the next message decode.
I've not used Google Protocol Buffers myself, but it's possible that they have similar characteristics, and there may be other similar serialisation mechanisms out there. I'd recommend avoiding XML serialisation for speed/efficiency reasons.

Handling TCP Streams

Our server is seemingly packet based. It is an adaptation from an old serial based system. It has been added, modified, re-built, etc over the years. Since TCP is a stream protocol and not a packet protocol, sometimes the packets get broken up. The ServerSocket is designed in such a way that when the Client sends data, part of the data contains the size of our message such as 55. Sometimes these packets are split into multiple pieces. They arrive in order but since we do not know how the messages will be split, our server sometimes does not know how to identify the split message.
So, with that background: what is the best method to rebuild the packets as they come in if they are split? We are using C++ Builder 5 (yes, I know, an old IDE, but this is all we can work with at the moment; it would be a lot of work to redesign in .NET or a newer technology).
TCP guarantees that the data will arrive in the same order it was sent.
That being said, you can just append all the incoming data to a buffer. Then check whether your buffer contains one or more complete packets, and remove them from the buffer, keeping all the remaining data in the buffer for a future check.
This, of course, supposes that your packets have some header that indicates the size of the following data.
Let's consider packets with the following structure:
[LEN] X X X...
Where LEN is the size of the data and each X is a byte.
If you receive:
4 X X X
[--1--]
The packet is not complete, you can leave it in the buffer. Then, other data arrives, you just append it to the buffer:
4 X X X X 3 X X X
[---2---]
You then have 2 complete messages that you can easily parse.
If you do it, don't forget to send any length in a host-independent form (htons and htonl when sending, ntohs and ntohl when receiving, can help).
This is often accomplished by prefixing messages with a one or two-byte length value which, like you said, gives the length of the remaining data. If I've understood you correctly, you're sending this as plain text (i.e., '5', '5') and this might get split up. Since you don't know the length of a decimal number, it's somewhat ambiguous. If you absolutely need to go with plain text, perhaps you could encode the length as a 16-bit hex value, i.e.:
00ff <255 bytes data>
000a <10 bytes data>
This way, the length of the size header is fixed to 4 bytes and can be used as a minimum read length when receiving on the socket.
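Parsing such a fixed-width hex header is straightforward. The sketch below assumes exactly four hex digits with no separator before the payload; the function name parse_hex_header is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

constexpr std::size_t kHeaderLen = 4;  // four hex digits, e.g. "00ff"

// Parse the 4-character hex length header at the start of 'rx'.
// Returns the payload length, or -1 if the header is incomplete or malformed.
long parse_hex_header(const std::string& rx) {
    if (rx.size() < kHeaderLen) return -1;
    long value = 0;
    for (std::size_t i = 0; i < kHeaderLen; ++i) {
        const unsigned char c = static_cast<unsigned char>(rx[i]);
        if (!std::isxdigit(c)) return -1;
        const int digit = std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
        value = value * 16 + digit;
    }
    return value;
}
```

A return value of -1 on a buffer shorter than four characters simply means "keep reading"; on a longer buffer it indicates a corrupt header, which a real server would probably treat as a reason to drop the connection.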
Edit: Perhaps I misunderstood -- if reading the length value isn't a problem, deal with splits by concatenating incoming data to a string, byte buffer, or whatever until its length is equal to the value you read in the beginning. TCP will take care of the rest.
Take extra precautions to make sure that you can't get stuck in a blocking read state should the client not send a complete message. For example, say you receive the length header, and start a loop that keeps reading through blocking recv() calls until the buffer is filled. If a malicious client intentionally stops sending data, your server might be locked until the client either disconnects, or starts sending.
I would have a function called readBytes or something that takes a buffer and a length parameter and reads until that many bytes have been read. You'll need to capture the number of bytes actually read and if it's less than the number you're expecting, advance your buffer pointer and read the rest. Keep looping until you've read them all.
Then call this function once for the header (containing the length), assuming that the header is a fixed length. Once you have the length of the actual data, call this function again.
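A minimal sketch of such a readBytes function is shown below, written against the POSIX sockets API as an assumption (the name read_bytes and its signature are illustrative). With Winsock, as used by C++ Builder, recv returns an int and errors are reported through WSAGetLastError rather than errno, so the error handling would need adapting.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Read exactly 'len' bytes from a blocking socket into 'buf'.
// Returns true on success, false if the peer closed the connection or an
// error occurred before 'len' bytes arrived.
bool read_bytes(int fd, char* buf, std::size_t len) {
    std::size_t got = 0;
    while (got < len) {
        const ssize_t n = recv(fd, buf + got, len - got, 0);
        if (n > 0) {
            got += static_cast<std::size_t>(n);  // advance past what arrived
        } else if (n < 0 && errno == EINTR) {
            continue;                            // interrupted by a signal: retry
        } else {
            return false;                        // 0 = closed by peer, <0 = error
        }
    }
    return true;
}
```

The function would be called once for the fixed-length header and once more for the body. Because it blocks until everything has arrived, a receive timeout on the socket is worth considering so that a stalled client cannot hold the server indefinitely.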

recv windows, one byte per call, what the?

c++
#define BUF_LEN 1024
The code below only receives one byte when it's called, then immediately moves on.
output = new char[BUF_LEN];
bytes_recv = recv(cli, output, BUF_LEN, 0);
output[bytes_recv] = '\0';
Any idea how to make it receive more bytes?
EDIT: the client connecting is Telnet.
The thing to remember about networking is that you will be able to read as much data as has been received. Since your code is asking for 1024 bytes and you only read 1, then only 1 byte has been received.
Since you are using a telnet client, it sounds like you have it configured in character mode. In this mode, as soon as you type a character, it will be sent.
Try to reconfigure your telnet client in line mode. In line mode, the telnet client will wait until you hit return before it sends the entire line.
On my telnet client, I do that by first typing ctrl-] to get to the telnet prompt and then typing "mode line" to configure telnet in line mode.
Update
On further thought, this is actually a very good problem to have.
In the real world, your data can get fragmented in unexpected ways. The client may make a single send() call of N bytes but the data may not arrive in a single packet. If your code can handle bytes arriving one by one, then you know it will work no matter how the data arrives.
What you need to do is make sure that you accumulate your data across multiple receives. After your recv call returns, you should append the data to a buffer. Something like:
char *accumulate_buffer = new char[BUF_LEN];
size_t accumulate_buffer_len = 0;
...
bytes_recv = recv(fd,
accumulate_buffer + accumulate_buffer_len,
BUF_LEN - accumulate_buffer_len,
0);
if (bytes_recv > 0)
accumulate_buffer_len += bytes_recv;
if (can_handle_data(accumulate_buffer, accumulate_buffer_len))
{
handle_data(accumulate_buffer, accumulate_buffer_len);
accumulate_buffer_len = 0;
}
This code keeps accumulating the recv into a buffer until there is enough data to handle. Once you handle the data, you reset the length to 0 and you start accumulating afresh.
First, in this line:
output[bytes_recv] = '\0';
you need to check if bytes_recv < 0 first before you do that because you might have an error. And the way your code currently works, you'll just stomp on some unrelated piece of memory (likely the byte just before the buffer).
Secondly, the fact you are null terminating your buffer indicates that you're expecting to receive ASCII text with no embedded null characters. Never assume that, you will be wrong at the worst possible time.
Lastly stream sockets have a model that's basically a very long piece of tape with lots of letters stamped on it. There is no promise that the tape is going to be moving at any particular speed. When you do a recv call you're saying "Please give me as many letters from the tape as you have so far, up to this many.". You may get as many as you ask for, you may get only 1. No promises. It doesn't matter how the other side spit bits of the tape out, the tape is going through an extremely complex bunch of gears and you just have no idea how many letters are going to be coming by at any given time.
If you care about certain groupings of characters, you have to put things in the stream (on the tape) saying where those units start and/or end. There are many ways of doing this. Telnet itself uses several different ones in different circumstances.
And on the receiving side, you have to look for those markers and put the sequences of characters you want to treat as a unit together yourself.
So, if you want to read a line, you have to read until you get a '\n'. If you try to read 1024 bytes at a time, you have to take into account that the '\n' might end up in the middle of your buffer and so your buffer may contain the line you want and part of the next line. It might even contain several lines. The only promise is that you won't get more characters than you asked for.
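The line-reading case can be sketched as a small helper that is called after every recv. The name take_lines and the use of std::string as the pending buffer are assumptions made for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Append newly received bytes to 'pending' and return every complete line
// (without its '\n'). A trailing partial line stays in 'pending' until the
// rest of it arrives in a later call.
std::vector<std::string> take_lines(std::string& pending,
                                    const char* data, std::size_t len) {
    pending.append(data, len);
    std::vector<std::string> lines;
    std::size_t start = 0;
    std::size_t nl;
    while ((nl = pending.find('\n', start)) != std::string::npos) {
        lines.push_back(pending.substr(start, nl - start));
        start = nl + 1;
    }
    pending.erase(0, start);  // drop the bytes that were returned as lines
    return lines;
}
```

One call may return no lines, one line, or several, which matches the cases described above. Telnet clients normally end lines with "\r\n", so a caller would probably also want to strip a trailing '\r' from each returned line.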
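Force the sending side to send more bytes at a time by relying on Nagle's algorithm; then you will tend to receive them in larger packets, although TCP still makes no guarantee about how the bytes are grouped.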