C++ TCP recv unknown buffer size

I want to use the function recv(socket, buf, len, flags) to receive an incoming packet. However, I do not know the length of this packet prior to runtime, so the first 8 bytes are supposed to tell me the length of the packet. I don't want to just allocate an arbitrarily large len to accomplish this, so is it possible to set len = 8 and have buf be of type uint64_t? Then afterwards
memcpy(dest, &buf, buf)?

Since TCP is stream-based, I'm not sure what kind of packets you mean. I will assume that you are referring to application-level packets, that is, packets defined by your application and not by an underlying protocol such as TCP. I will call them messages instead to avoid confusion.
I will show two possibilities. The first reads a message without knowing its length until reading has finished. The second makes two calls: first it reads the size of the message, then it reads the whole message at once.
Read data until the message is complete
Since TCP is stream-based, you will not lose any data when your buffer is not big enough. So you can read a fixed number of bytes and, if something is missing, call recv again. Here is an extensive example. I just wrote it without testing. I hope everything works.
#include <cerrno>
#include <cstring>
#include <vector>
#include <sys/socket.h>
#include <sys/types.h>

// IOException and ProtocolException are application-defined exception types.
std::size_t offset = 0;       // number of valid bytes currently stored in buf
std::vector<char> buf(512);

std::vector<char> readMessage(int fd) {
    while (true) {
        // A complete message may already be buffered from an earlier read.
        if (isMessageComplete(buf, offset)) {
            std::size_t msgLen = getSizeOfMessage(buf, offset);
            std::vector<char> msg(buf.begin(), buf.begin() + msgLen);
            // buf may already contain the beginning of the next message.
            // Move it to the front of buf.
            std::memmove(buf.data(), buf.data() + msgLen, offset - msgLen);
            offset -= msgLen;
            return msg;
        }
        if (offset == buf.size()) {
            buf.resize(2 * buf.size()); // buffer is full: double available memory
        }
        ssize_t ret = recv(fd, buf.data() + offset, buf.size() - offset, 0);
        if (ret < 0) {
            if (errno == EINTR) {
                // Interrupted, just try again ...
                continue;
            } else {
                // Error occurred. Throw exception.
                throw IOException(strerror(errno));
            }
        } else if (ret == 0) {
            // No data available anymore.
            if (offset == 0) {
                // Client did just close the connection
                return std::vector<char>(); // return empty vector
            } else {
                // Client closed the connection in the middle of a message.
                // It is not a clean shutdown. Throw exception.
                throw ProtocolException("Unexpected end of stream");
            }
        } else {
            // Got more data. The check at the top of the loop decides
            // whether the message is complete now.
            offset += static_cast<std::size_t>(ret);
        }
    }
}
You have to define bool isMessageComplete(const std::vector<char>&, std::size_t) and std::size_t getSizeOfMessage(const std::vector<char>&, std::size_t) yourself. The second argument is the number of valid bytes in the buffer.
Read the header and check the length of the message
The second possibility is to read the header first, that is, just the 8 bytes which contain the size of the message in your case. After that, you know the size of the message. This means you can allocate enough storage and read the whole message at once:
/// Reads n bytes from fd.
bool readNBytes(int fd, void *buf, std::size_t n) {
    if (n == 0) return true; // nothing to read
    std::size_t offset = 0;
    char *cbuf = static_cast<char*>(buf);
    while (true) {
        ssize_t ret = recv(fd, cbuf + offset, n - offset, MSG_WAITALL);
        if (ret < 0) {
            if (errno != EINTR) {
                // Error occurred
                throw IOException(strerror(errno));
            }
            // Interrupted by a signal: try again
        } else if (ret == 0) {
            // No data available anymore
            if (offset == 0) return false;
            else throw ProtocolException("Unexpected end of stream");
        } else if (offset + ret == n) {
            // All n bytes read
            return true;
        } else {
            offset += ret;
        }
    }
}

/// Reads message from fd
std::vector<char> readMessage(int fd) {
    std::uint64_t size;
    if (readNBytes(fd, &size, sizeof(size))) {
        std::vector<char> buf(size);
        if (readNBytes(fd, buf.data(), size)) {
            return buf;
        } else {
            throw ProtocolException("Unexpected end of stream");
        }
    } else {
        // connection was closed
        return std::vector<char>();
    }
}
The flag MSG_WAITALL requests that the function block until the full amount of data is available. However, you cannot rely on that: the call may still return fewer bytes, for example when a signal is caught. You have to check the return value and read again if something is missing, just as I did above.
readNBytes(fd, buf, n) reads n bytes. As long as the connection is not closed by the other side, the function will not return without having read n bytes. If the connection was closed by the other side, the function returns false. If the connection was closed in the middle of a message, an exception is thrown. If an I/O error occurred, another exception is thrown.
readMessage reads 8 bytes [sizeof(std::uint64_t)] and uses them as the size of the next message. Then it reads the message.
If you want platform independence, you should convert size to a defined byte order. Computers with x86 architecture use little endian, while big endian is common in network traffic.
Note: With MSG_PEEK it is possible to implement this functionality for UDP. You can request the header with this flag set, then allocate enough space for the whole packet.
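On the byte-order point above, here is a minimal sketch of one way to make the 8-byte size field independent of the host's endianness. It assumes both sides agree to send the size as big endian; the names decodeBigEndian64 and encodeBigEndian64 are hypothetical helpers, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: interprets 8 bytes as an unsigned big-endian integer.
// Works the same on little-endian and big-endian hosts because it only uses
// shifts, never a raw memcpy into the integer.
std::uint64_t decodeBigEndian64(const unsigned char* bytes) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        value = (value << 8) | bytes[i];
    }
    return value;
}

// Hypothetical counterpart for the sender: writes value as 8 big-endian bytes.
void encodeBigEndian64(std::uint64_t value, unsigned char* out) {
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}
```

With these, the receiver would read the 8 header bytes into an unsigned char array and pass that array to decodeBigEndian64 instead of reading directly into a std::uint64_t.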

A fairly common technique is to read the leading message-length field, then issue a read for the exact size of the expected message.
HOWEVER! Do not assume that the first read will give you all eight bytes (see Note), or that the second read will give you the entire message/packet.
You must always check the number of bytes read and issue another read (or two (or three, or...)) to get all the data you want.
Note: Because TCP is a streaming protocol and because the packet size "on the wire" varies in accordance with a very arcane algorithm designed to maximize network performance, you could easily issue a read for eight bytes and the read could return having only read three (or seven or ...) bytes. The guarantee is that unless there is an unrecoverable error you will receive at least one byte and at most the number of bytes you requested. Because of this you must be prepared to do byte address arithmetic and issue all reads in a loop that repeats until the desired number of bytes is returned.

Since TCP is streaming there isn't really any end to the data you receive, not until the connection is closed or there is an error.
Instead you need to implement your own protocol on top of TCP, one that either contains a specific end-of-message marker, a length-of-data header field, or possibly a command-based protocol where the data of each command is of a well-known size.
That way you can read into a small fixed-size buffer and append to a larger (possibly expanding) buffer as needed. The "possibly expanding" part is ridiculously easy in C++, what with std::vector and std::string (depending on the data you have).
There is another important thing to remember: since TCP is stream-based, a single read or recv call may not actually fetch all the data you request. You need to receive the data in a loop until you have received everything.
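As a rough illustration of the end-of-message-marker variant with an expanding buffer, the sketch below keeps received bytes in a std::string and splits off every complete message. The function name extractMessages and the choice of delimiter are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: removes every complete, delimiter-terminated message
// from the front of `pending` and returns them without the delimiter.
// Bytes after the last delimiter stay in `pending` for the next call.
std::vector<std::string> extractMessages(std::string& pending, char delimiter) {
    std::vector<std::string> messages;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = pending.find(delimiter, start)) != std::string::npos) {
        messages.push_back(pending.substr(start, pos - start));
        start = pos + 1;  // skip the delimiter itself
    }
    pending.erase(0, start);  // keep only the incomplete tail
    return messages;
}
```

In a receive loop, each successful recv into the small fixed-size buffer would be followed by pending.append(buffer, bytesReceived) and a call to extractMessages(pending, '\n').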

In my personal opinion, I suggest receiving the "size of message" (a fixed 4-byte integer) first:
recv(socket, "size of message written in integer", "size of integer")
Then receive the real message:
recv(socket, "real message", "size of message written in integer")
This technique can also be used for sending files, images, and long messages.
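For the sending side of this scheme, a minimal sketch could build one buffer that holds the 4-byte size followed by the message. The big-endian byte order and the names makeFrame and frameLength are assumptions for illustration; the answer itself does not specify them.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: prepends a 4-byte big-endian length to the payload.
// Assumes the payload is smaller than 4 GiB.
std::vector<unsigned char> makeFrame(const std::string& payload) {
    const std::uint32_t size = static_cast<std::uint32_t>(payload.size());
    std::vector<unsigned char> frame;
    frame.reserve(4 + payload.size());
    frame.push_back(static_cast<unsigned char>((size >> 24) & 0xFF));
    frame.push_back(static_cast<unsigned char>((size >> 16) & 0xFF));
    frame.push_back(static_cast<unsigned char>((size >> 8) & 0xFF));
    frame.push_back(static_cast<unsigned char>(size & 0xFF));
    frame.insert(frame.end(), payload.begin(), payload.end());
    return frame;
}

// Hypothetical helper: reads the length back from the first 4 bytes of a frame.
std::uint32_t frameLength(const unsigned char* header) {
    return (static_cast<std::uint32_t>(header[0]) << 24) |
           (static_cast<std::uint32_t>(header[1]) << 16) |
           (static_cast<std::uint32_t>(header[2]) << 8) |
            static_cast<std::uint32_t>(header[3]);
}
```

The receiver would first collect exactly 4 bytes, call frameLength on them, and then collect that many payload bytes, looping on recv in both steps.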

Related

How to send image data over linux socket

I have a relatively simple web server I have written in C++. It works fine for serving text/html pages, but the way it is written it seems unable to send binary data and I really need to be able to send images.
I have been searching and searching but can't find an answer specific to this question which is written in real C++ (fstream as opposed to using file pointers etc.), and whilst this kind of thing is necessarily low level and may well require handling bytes in a C-style array, I would like the code to be as C++ as possible.
I have tried a few methods, this is what I currently have:
int sendFile(const Server* serv, const ssocks::Response& response, int fd)
{
    // some other stuff to do with headers etc. ........ then:
    // open file
    std::ifstream fileHandle;
    fileHandle.open(serv->mBase + WWW_D + resource.c_str(), std::ios::binary);
    if(!fileHandle.is_open())
    {
        // error handling code
        return -1;
    }
    // send file
    ssize_t buffer_size = 2048;
    char buffer[buffer_size];
    while(!fileHandle.eof())
    {
        fileHandle.read(buffer, buffer_size);
        status = serv->mSock.doSend(buffer, fd);
        if (status == -1)
        {
            std::cerr << "Error: socket error, sending file\n";
            return -1;
        }
    }
    return 0;
}
And then elsewhere:
int TcpSocket::doSend(const char* message, int fd) const
{
    if (fd == 0)
    {
        fd = mFiledes;
    }
    ssize_t bytesSent = send(fd, message, strlen(message), 0);
    if (bytesSent < 1)
    {
        return -1;
    }
    return 0;
}
As I say, the problem is that when the client requests an image it won't work. I get in std::cerr "Error: socket error sending file"
EDIT : I got it working using the advice in the answer I accepted. For completeness and to help those finding this post I am also posting the final working code.
For sending I decided to use a std::vector rather than a char array. Primarily because I feel it is a more C++ approach and it makes it clear that the data is not a string. This is probably not necessary but a matter of taste. I then counted the bytes read for the stream and passed that over to the send function like this:
// send file
std::vector<char> buffer(SEND_BUFFER);
while(!fileHandle.eof())
{
    fileHandle.read(&buffer[0], SEND_BUFFER);
    status = serv->mSock.doSend(&buffer[0], fd, fileHandle.gcount());
    if (status == -1)
    {
        std::cerr << "Error: socket error, sending file\n";
        return -1;
    }
}
Then the actual send function was adapted like this:
int TcpSocket::doSend(const char* message, int fd, size_t size) const
{
    if (fd == 0)
    {
        fd = mFiledes;
    }
    ssize_t bytesSent = send(fd, message, size, 0);
    if (bytesSent < 1)
    {
        return -1;
    }
    return 0;
}

The first thing you should change is the while (!fileHandle.eof()) loop, because it will not work as you expect. It will iterate once too often, because the eof flag isn't set until after you try to read beyond the end of the file. Instead do e.g. while (fileHandle.read(...)). Note that read also reports failure for a final block that is shorter than requested, so check gcount() for that last block as well.
The second thing you should do is to check how many bytes were actually read from the file, and only send that many bytes.
Lastly, you read binary data, not text, so you can't use strlen on the data you read from the file.
A little explanation of the binary file problem: as you should hopefully know, C-style strings (the ones you use strlen to get the length of) are terminated by a zero character '\0' (in short, a zero byte). Random binary data can contain lots of zero bytes anywhere inside it, and a zero is a valid byte there without any special meaning.
When you use strlen to get the length of binary data there are two possible problems:
There's a zero byte in the middle of the data. This will cause strlen to terminate early and return the wrong length.
There's no zero byte in the data. That will cause strlen to go beyond the end of the buffer to look for the zero byte, leading to undefined behavior.

recv the first few bytes from a socket to determine buffer size

I'm writing a distributed system in c++ using TCP/IP and sockets.
For each of my messages, I need to receive the first 5 bytes to know the full length of the incoming message.
What's the best way to do this?
1) recv() only 5 bytes, then recv() again. If I choose this, would it be safe to assume I'll get 0 or 5 bytes in the recv (aka not write a loop to keep trying)?
2) use MSG_PEEK
3) recv() some larger buffer size, then read the first 5 bytes and allocate the final buffer then.

You don't need to know anything. TCP is a stream protocol, and at any given moment you can get as little as one byte, or as much as multiple megabytes of data. The correct and only way to use a TCP socket is to read in a loop.
char buf[4096]; // or whatever
std::deque<char> data;
for (;;)
{
    ssize_t res = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
    if (res == -1)
    {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
        {
            break; // done reading
        }
        else
        {
            // error, break, die
            break;
        }
    }
    else if (res == 0)
    {
        // socket closed, finalise, break
        break;
    }
    else
    {
        data.insert(data.end(), buf, buf + res);
    }
}
The only purpose of the loop is to transfer data from the socket buffer to your application. Your application must then decide independently if there's enough data in the queue to attempt extraction of some sort of higher-level application message.
For example, in your case you would check if the queue's size is at least 5, then inspect the first five bytes, and then check if the queue holds a complete application message. If not, you abort, and if so, you extract the entire message and pop it off the front of the queue.
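The check described above might look like the following sketch. The question does not say how the 5 header bytes encode the length, so this example assumes five ASCII decimal digits that give the payload length (header excluded); tryExtractMessage is a hypothetical name.

```cpp
#include <cstddef>
#include <deque>
#include <stdexcept>
#include <string>

const std::size_t kHeaderSize = 5;  // header length taken from the question

// ASSUMPTION: the header is five ASCII digits holding the payload length.
// Returns true and fills `out` if a whole message is queued; otherwise
// returns false and leaves the queue untouched.
bool tryExtractMessage(std::deque<char>& data, std::string& out) {
    if (data.size() < kHeaderSize) return false;  // header not complete yet
    std::size_t payloadSize = 0;
    for (std::size_t i = 0; i < kHeaderSize; ++i) {
        const char c = data[i];
        if (c < '0' || c > '9') throw std::runtime_error("malformed header");
        payloadSize = payloadSize * 10 + static_cast<std::size_t>(c - '0');
    }
    const std::size_t total = kHeaderSize + payloadSize;
    if (data.size() < total) return false;  // payload not complete yet
    const std::ptrdiff_t headerEnd = static_cast<std::ptrdiff_t>(kHeaderSize);
    const std::ptrdiff_t messageEnd = static_cast<std::ptrdiff_t>(total);
    out.assign(data.begin() + headerEnd, data.begin() + messageEnd);
    data.erase(data.begin(), data.begin() + messageEnd);  // pop the message
    return true;
}
```

After each pass of the reading loop, the application would call tryExtractMessage repeatedly until it returns false.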

Use a state machine with two states:
First State.
Receive bytes as they arrive into a buffer. When there are 5 or more bytes perform your check on those first 5 and possibly process the rest of the buffer. Switch to the second state.
Second State.
Receive and process bytes as they arrive to the end of the message.
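A two-state parser along these lines could be sketched as follows. As before, the encoding of the 5 header bytes is an assumption (five ASCII decimal digits giving the payload length), and the class name MessageParser is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical two-state parser. ASSUMPTION: the 5-byte header holds the
// payload length as five ASCII decimal digits.
class MessageParser {
public:
    // Feeds newly received bytes; returns every message they complete.
    std::vector<std::string> feed(const char* bytes, std::size_t count) {
        std::vector<std::string> done;
        for (std::size_t i = 0; i < count; ++i) {
            if (state_ == State::Header) {
                const char c = bytes[i];
                if (c < '0' || c > '9') throw std::runtime_error("malformed header");
                remaining_ = remaining_ * 10 + static_cast<std::size_t>(c - '0');
                if (++headerBytes_ == 5) {
                    headerBytes_ = 0;
                    state_ = State::Body;  // first state finished
                }
            } else {
                body_.push_back(bytes[i]);
                --remaining_;
            }
            if (state_ == State::Body && remaining_ == 0) {
                done.push_back(body_);     // second state finished
                body_.clear();
                state_ = State::Header;
            }
        }
        return done;
    }

private:
    enum class State { Header, Body };
    State state_ = State::Header;
    std::size_t headerBytes_ = 0;  // header digits consumed so far
    std::size_t remaining_ = 0;    // payload bytes still expected
    std::string body_;
};
```

The parser keeps its state between calls, so it does not matter how recv happens to split the stream.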

To answer your questions specifically:
1) It's not safe to assume you'll get 0 or 5. It is possible to get 1-4 as well. Loop until you get 5 or an error, as others have suggested.
2) I wouldn't bother with MSG_PEEK. Most of the time you'll block (assuming blocking calls) or get 5, so skip the extra call into the stack.
3) This is fine too, but adds complexity for little gain.

Partial receipt of packets from socket C++

I have a problem: my server application sends a packet 8 bytes long, AABBCC1122334455, but my application receives this packet in two parts, AABBCC1122 and 334455, via the "recv" function. How can I fix that?
Thanks!

To sum up a little bit:
TCP connection doesn't operate with packets or messages on the application level, you're dealing with stream of bytes. From this point of view it's similar to writing and reading from a file.
Both send and recv can send and receive less data than provided in the argument. You have to deal with it correctly (usually by applying proper loop around the call).
As you're dealing with streams, you have to find the way to convert it to meaningful data in your application. In other words, you have to design serialisation protocol.
From what you've already mentioned, you most probably want to send some kind of messages (well, it's usually what people do). The key thing is to discover the boundaries of messages properly. If your messages are of fixed size, you simply grab the same amount of data from the stream and translate it to your message; otherwise, you need a different approach:
If you can come up with a character which cannot exist in your message, it could be your delimiter. You can then read the stream until you reach the character and it'll be your message. If you transfer ASCII characters (strings) you can use zero as a separator.
If you transfer binary data (raw integers etc.), all characters can appear in your message, so nothing can act as a delimiter. Probably the most common approach in this case is to use a fixed-size prefix containing the size of your message. The size of this extra field depends on the maximum size of your message (you will probably be safe with 4 bytes, but if you know the maximum size, you can use a smaller field). Then your packet would look like SSSS|PPPPPPPPP... (stream of bytes), where S is the additional size field and P is your payload (the real message in your application; the number of P bytes is determined by the value of S). You know every packet starts with 4 special bytes (S bytes), so you can read them as a 32-bit integer. Once you know the size of the encapsulated message, you read all the P bytes. After you're done with one packet, you're ready to read another one from the socket.
Good news though, you can come up with something completely different. All you need to know is how to deserialise your message from a stream of bytes and how send/recv behave. Good luck!
EDIT:
Example of function receiving arbitrary number of bytes into array:
bool recv_full(int sock, char *buffer, size_t size)
{
    size_t received = 0;
    while (received < size)
    {
        ssize_t r = recv(sock, buffer + received, size - received, 0);
        if (r <= 0) break;
        received += r;
    }
    return received == size;
}
And example of receiving packet with 2-byte prefix defining size of payload (size of payload is then limited to 65kB):
uint16_t msgSize = 0;
char msg[0xffff];
if (recv_full(sock, reinterpret_cast<char *>(&msgSize), sizeof(msgSize)) &&
    recv_full(sock, msg, msgSize))
{
    // Got the message in msg array
}
else
{
    // Something bad happened to the connection
}

That's just how recv() works on most platforms. You have to check the number of bytes you receive and continue calling it in a loop until you get the number that you need.

You "fix" that by reading from TCP socket in a loop until you get enough bytes to make sense to your application.

my server application sends a packet 8 bytes long
Not really. Your server sends 8 individual bytes, not a packet 8 bytes long. TCP data is sent over a byte stream, not a packet stream. TCP neither respects nor maintains any "packet" boundary that you might have in mind.
If you know that your data is provided in quanta of N bytes, then call recv in a loop:
std::vector<char> read_packet(int N) {
    std::vector<char> buffer(N);
    int total = 0, count;
    // Write each chunk after the bytes already received
    while ( total < N && (count = recv(sock_fd, &buffer[total], N-total, 0)) > 0 )
        total += count;
    return buffer;
}
std::vector<char> packet = read_packet(8);
If your packet is variable length, try sending its length before the data itself:
int read_int() {
    std::vector<char> buffer = read_packet(sizeof (int));
    int result;
    memcpy(&result, &buffer[0], sizeof(int));
    return result;
}
int length = read_int();
std::vector<char> data = read_packet(length);

how to receive the large data using recv()?

I developed a client-server program in C++ and want to receive more than 500 KB. My client's message is terminated with "!", so I want to keep receiving until the last byte ("!") arrives.
This is my code; it doesn't work. What is wrong with it?
do
{
    int num = recv(*csock, buf, bytesLeft,0);
    if (num == 0)
    {
        break;
    }
    else if (num < 0 && errno != EINTR)
    {
        fprintf(stderr, "Exit %d\n", __LINE__);
        exit(1);
    }
    else if (num > 0)
    {
        numRd += num;
        buf += num;
        bytesLeft -= num;
        fprintf(stderr, "read %d bytes - remaining = %d\n", num, bytesLeft);
    }
}
while (bytesLeft != 0);
fprintf(stderr, "read total of %d bytes\n", numRd);

While I'm not sure exactly what your problem is because of the wording of your question, you generally can't use strcat to append raw buffers received over the network unless you know specifically that they will be null-terminated, and even then, that's not really "safe" in the event you get an unexpected data transmission. The assumption with C strings is that they are null-terminated, but a raw network buffer may not be, and using strcat will cause you to overrun the input buffer should it not be null-terminated. Instead of strcat, use a known fixed-size buffer of N bytes to receive the data into, and increment a temporary pointer through the buffer until you reach the end of the buffer or the end of the packet transmission. That way you will always read from the network up to N bytes and no more, and prevent buffer overrun situations from occurring.
For instance, you can do the following (this is not the fastest or most efficient solution because of all the copying, but it works):
unsigned char buf[10000]; //10Kb fixed-size buffer
unsigned char buffer[MAXRECV]; //temporary buffer
unsigned char* temp_buf = buf;
unsigned char* end_buf = buf + sizeof(buf);
int iByteCount = 0;
do
{
    iByteCount = recv(GetSocketId(), buffer,MAXRECV,0);
    if ( iByteCount > 0 )
    {
        //make sure we're not about to go over the end of the buffer
        if (!((temp_buf + iByteCount) <= end_buf))
            break;
        fprintf(stderr, "Bytes received: %d\n",iByteCount);
        memcpy(temp_buf, buffer, iByteCount);
        temp_buf += iByteCount;
    }
    else if ( iByteCount == 0 )
    {
        if(temp_buf != buf)
        {
            //do process with received data
        }
        else
        {
            fprintf(stderr, "receive failed");
            break;
        }
    }
    else
    {
        fprintf(stderr, "recv failed: ");
        break;
    }
} while(iByteCount > 0 && temp_buf < end_buf); //check for end of buffer

Do you need all 1MB+ of data in one contiguous byte buffer? If so, and you stick with that protocol that has a terminating '!' and does not have a header that includes the length, then you are stuck with a lot of memcpy() and realloc(), or with some other buffer type like std::vector, which really just does the same thing.
If you don't need all those bytes in one string, you can store them in some other way, e.g. a vector of *buffer, and so avoid copying.

Assuming you are using a blocking socket (which is the default mode for sockets), then recv() will block waiting for the full MAXRECV number of bytes to arrive. If the client sends less than that number of bytes, recv() will block waiting for data that does not arrive.
To work around that, you need to either:
1) call recv() with a 1-byte buffer, calling recv() until you encounter your ! byte.
2) call select() before calling recv() to detect when the socket actually has data to read, then call ioctlsocket(FIONREAD) to determine how many bytes can actually be read with recv() without blocking, then have recv() read that number of bytes.

One problem with read function in c++

I am using the read function to read data from a socket, but when the data is more than 4k, read returns only part of the data, for example less than 4k. Here is the key code:
mSockFD = socket(AF_INET, SOCK_STREAM, 0);
if (connect(mSockFD, (const sockaddr*)(&mSockAdd), sizeof(mSockAdd)) < 0)
{
    cerr << "Error connecting in Crawl" << endl;
    perror("");
    return false;
}
n = write(mSockFD, httpReq.c_str(), httpReq.length());
bzero(mBuffer, BUFSIZE);
n = read(mSockFD, mBuffer, BUFSIZE);
Note that BUFSIZE is much larger than 4k.
When the data is just a few hundred bytes, the read function works as expected.

This is by design and to be expected.
The short answer to your question is you should continue calling "read" until you get all the data you expect. That is:
int total_bytes = 0;
int expected = BUFSIZE;
int bytes_read;
char *buffer = static_cast<char*>(malloc(BUFSIZE+1)); // +1 for null at the end
while (total_bytes < expected)
{
    bytes_read = read(mSockFD, buffer+total_bytes, BUFSIZE-total_bytes);
    if (bytes_read <= 0)
        break;
    total_bytes += bytes_read;
}
buffer[total_bytes] = 0; // null terminate - good for debugging as a string
From my experience, one of the biggest misconceptions (resulting in bugs) is that you'll receive as much data as you ask for. I've seen shipping code in real products written with the expectation that sockets work this way (and no one certain as to why it doesn't work reliably).
When the other side sends N bytes, you might get lucky and receive it all at once. But you should plan for receiving N bytes spread out across multiple recv calls. With the exception of a real network error, you'll eventually get all N bytes. Segmentation, fragmentation, TCP window size, MTU, and the socket layer's data chunking scheme are the reasons for all of this. When partial data is received, the TCP layer doesn't know about how much more is yet to come. It just passes what it has up to the app. It's up to the app to decide if it got enough.
Likewise, the data from several "send" calls can be coalesced into the same packet.
There may be ioctls and such that will make a socket block until all the expected data is received. But I don't know of any off hand.
Also, don't use read and write for sockets. Use recv and send.
Read this book. It will change your life with regards to sockets and TCP:
