Can `recv()` result in a buffer overflow? - c++

I'm introducing myself to socket programming in C/C++, and am using send() and recv() to exchange data between a client and server program over TCP sockets.
Here are some relevant excerpts from my code:
server.c:
char recv_data[1024];
// Socket setup and so on omitted...
bytes_recieved = recv(connected, recv_data, 1024, 0);
recv_data[bytes_recieved] = '\0';
client.c:
char send_data[1024];
// Setup omitted...
send(connected, send_data, strlen(send_data), 0);
Does recv() itself provide any protection against buffer overflows? For instance if I changed the 3rd argument to recv() to something higher than the buffer pointed to by recv_data (e.g. 4000) - would this cause a buffer overflow? (I've actually tried doing this, but can't seem to trigger a segfault).
I'm actually trying to create an intentionally vulnerable server program to better understand these issues, which is why I've tried to overflow via recv().
Amendment:
A related question: why would client.c above ever send more than 1024 bytes, given that the length passed to send() is strlen(send_data)? I'm using gets(send_data) to populate that buffer from standard input, but if I enter many more than 1024 bytes on standard input, the server.c program shows that it receives ALL THE BYTES! :) Does the strlen(send_data) passed to send() not restrict the number of bytes sent?

For instance if I changed the 3rd argument to recv() to something higher than the buffer pointed to by recv_data (e.g. 4000) - would this cause a buffer overflow?
Of course, yes. If 4000 bytes are waiting in the network buffer, recv() will copy all of them to the address you gave it, running past the end of your 1024-byte array. The key point is that recv(), like any other C API that takes a buffer and its length, trusts the caller to pass the actual length of the buffer. If the caller passes an incorrect length, the fault lies with the caller, and the result is undefined behavior. Undefined behavior does not guarantee a segfault, which is why your experiment may appear to work.
In C, when you pass an array to a function it decays to a pointer, so the called function has no way to know the size of the array. APIs therefore rely entirely on the length you provide.
char recv_data[1024];
// Socket setup and so on omitted...
bytes_recieved = recv(connected, recv_data, 1024, 0);
recv_data[bytes_recieved] = '\0';
The above code can cause trouble in more ways than one. It leads to undefined behavior under the following conditions:
(a) If recv returns -1, you index recv_data with -1, because the return value is used without being checked.
(b) If recv returns 1024, you again access the array out of bounds, as an array of size 1024 only has valid indices 0 to 1023.

This
recv_data[bytes_recieved] = '\0';
could result in a buffer overflow, if 1024 bytes were received.
You might like to change this
bytes_recieved = recv(connected, recv_data, 1024, 0);
to become
bytes_recieved = recv(connected, recv_data, 1024 - 1, 0);
so that bytes_recieved would never become larger than 1023, which is the maximum valid index to recv_data.
Also your system calls (recv()/send()) lack error checking. Test them for having returned -1 prior to using the result in any other way.
Regarding your amendment:
strlen() returns the number of characters from the character pointed to by its argument up to, but not including, the first NUL (0) character. This number could be any value, depending on where the terminating 0 was placed.
If the search for this 0-terminator runs past the memory allocated for strlen()'s argument, the program runs into undefined behaviour, and strlen() could therefore return any value.
So to answer your question: if send_data is not 0-terminated within its 1024 bytes, strlen() makes the app run into undefined behaviour, so it might crash, or strlen() might return a value greater than 1024, in which case send() would try to send that many characters. In your case gets() has already written past the end of send_data, since it does no bounds checking either.
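One way to keep strlen(send_data) within bounds is to fill the buffer with fgets() instead of gets(), since fgets() is told the buffer size. This is a minimal sketch; read_line is a hypothetical helper, and it assumes that splitting an over-long input line across several reads is acceptable.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: bounded replacement for gets(). Reads at most
// cap - 1 characters of one line from `in`, always NUL-terminates, strips a
// trailing newline, and returns the resulting length (-1 on EOF or error).
long read_line(std::FILE* in, char* buf, std::size_t cap) {
    if (in == nullptr || buf == nullptr || cap < 2 || cap > INT_MAX) {
        return -1;
    }
    if (std::fgets(buf, static_cast<int>(cap), in) == nullptr) {
        return -1;
    }
    std::size_t len = std::strlen(buf);  // safe: fgets guarantees a terminator
    if (len > 0 && buf[len - 1] == '\n') {
        buf[--len] = '\0';
    }
    return static_cast<long>(len);
}
```

After `read_line(stdin, send_data, sizeof send_data)` succeeds, strlen(send_data) can be at most 1023, so the length passed to send() stays inside the array.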

Even if you send more bytes than the recv() buffer can hold, you can still receive them on succeeding calls to recv(). That is why you saw all the bytes arrive. Say you send 5000 bytes and your receive buffer is 1000 bytes: the first call to recv() returns at most 1000 bytes, the next call at most 1000 more, and so on until all your data has been received. So, as long as the length you pass matches the buffer, I think there's no buffer overflow here. This is, by the way, how TCP works: it delivers a byte stream, not discrete messages.

Related

C++ char pointer size exceeds after malloc

I have a char pointer & have used malloc like
char *message;
message=(char *)malloc(4000*sizeof(char));
Later I'm receiving data from a socket into message. What happens if the data exceeds 4000 bytes?
I'll assume you are asking what will happen if you do something like this:
recv(socket,message,5000,0);
and the amount of data read is greater than 4000.
This will be undefined behavior, so you need to make sure that it can't happen. Whenever you read from a socket, you should be able to specify the maximum number of characters to read.
Your question leaves out many details about the network protocol; see the answer by @DavidSchwartz.
But focusing on the buffer in which you store it: if you try to write more than 4000 chars into the memory allocated for message, your program could crash.
If you track the size of the message being received, you could grow the buffer with realloc:
int buf_len = 4000;
char *message;
message = static_cast<char*>(malloc(buf_len));
/* read message, and after you have read 4000 chars, do */
buf_len *= 2;
message = static_cast<char*>(realloc(message, buf_len));
/* rinse and repeat if buffer is still too small */
free(message); // don't forget to clean-up
But this is very labor-intensive. Just use a std::string
int buf_len = 4000;
std::string message;
message.reserve(buf_len); // allocate 4K to save on repeated allocations
/* read message, std::string will automatically expand, no worries! */
// destructor will automatically clean-up!
It depends on a few factors. Assuming there's no bug in your code, it will depend on the protocol you're using.
If TCP, you will never get more bytes than you asked for. You'll get more of the data the next time you call the receive function.
If UDP, you may get truncation, or you may get an error or a truncation indicator (such as the MSG_TRUNC flag). This depends on the specifics of your platform and how you're invoking a receive function. I know of no platform that will save part of a datagram for your next invocation of a receive function.
Of course, if there's a bug in your code and you actually overflow the buffer, very bad things can happen. So make sure you pass only sane values to whatever receive function you're using.
In the best case, you get a segmentation fault.
see
What is a segmentation fault?
dangers of heap overflows?

Why is vsnprintf safe?

I have looked at this question as well as these PDFs (1 and 2) and this page, and I pretty much understand what happens if I do printf(SOME_TEST_STRING). What I do not understand is why exactly passing the size of the buffer makes vsnprintf safe compared to vsprintf.
What happens in these 2 cases?
Case 1
char buf[3];
vsprintf(buf, "%s", args);
Case 2
char buf[3];
vsnprintf(buf, sizeof buf, "%s", args);
In case 1, if the string you're formatting has a length of 3 or greater, you have a buffer overrun: vsprintf writes to memory past the storage of the buf array, which is undefined behavior, possibly causing havoc, security concerns, crashes, etc.
In case 2, vsnprintf knows how big the buffer that will contain the result is, and it will make sure not to go past that (instead truncating the result to fit within buf).
It's because vsnprintf has an additional size_t count parameter that vsprintf (and other non-n *sprintf methods) does not have. The implementation uses this to ensure that the data it writes to your buffer will not run off the end.
Data that runs off the end of a buffer can result in data corruption, or when maliciously exploited can be used as a buffer overrun attack.
The "n" in vsnprintf() means it takes the max size of the output string to avoid a buffer overflow. This makes it safe from buffer overflow, but does not make it safe if the format string comes from unsanitized user input. If your user gives you a giant format string, you'll avoid overflowing the target string, but if the user gives you %s and you don't pass a C string in the argument list at compile time, you are still left with undefined behavior.
I'm not sure what the problem is, since your question basically contains the answer already.
By passing your buffer size to vsnprintf you provide that function with information about your buffer size. The function now knows where the buffer ends and can make sure that it does not write past the end of the buffer.
vsprintf does not have information about buffer size, which is why it does not know where the buffer ends and cannot prevent buffer overflow.

C++ winsock send arrays

I'm trying to send arrays through the net with winsock2. Now, I read that Microsoft disabled sending raw pointers, but you can still send unedited binary data by casting the pointer to char*:
send(rsock, (char*)&counter, len, 0);
However, the problem is putting the data back in an array when it reaches the client. Here, pass is the binary data. This is how I do it for integers, bools and doubles:
recv(sock, pass, sizeof(int), 0);
refresh = (int((void*)&pass));
recv(sock, pass, sizeof(bool[4800][254]), 0);
**key = (bool)&pass;
recv(sock, pass, sizeof(double[4800][254]), 0);
**mil = (double)&pass;
Integers aren't arrays, while the bools and doubles are stored in 2-dimensional arrays. Now, the compiler says this code works for int and bool, but for doubles it says "'type cast' : cannot convert from 'char **' to 'double'" ("invalid type conversion"), even though I'm trying to put raw data in it. Have I done something wrong? Is there any other workaround to send arrays? Thanks in advance.
EDIT: also, I still haven't tried the code with another PC, so I highly doubt the conversion for ints and bools is done right.
Microsoft didn't disable sending anything. The fact is that sending a pointer will simply be of no use to the remote peer. A pointer is simply a memory address, and it is useless to know the address if the information is not there.
The problem you are probably facing is that this array is too big to fit in the send buffer, which by default can hold only 64KB.
Pay attention to the return values of send() and recv() to know how much data you actually sent or received in that transaction. It will not always be the size you requested, as the data is often split into pieces smaller than 4KB. You will have to manage the transmission of this information in pieces to fill your entire array.
Have I done something wrong?
Well...
By default, send and recv don't guarantee to return only after the whole buffer you've supplied has been sent or received. They may return as soon as they've enqueued a bit more data for sending, or after receipt of a bit more data that you might be able to process. The buffer size supplied is just an upper limit on your request, not a minimum. If you want to ensure recv doesn't return until the full buffer has been populated, pass the MSG_WAITALL flag as the final parameter. For send you must loop, sending further parts of your output buffer.
Check your return codes. send and recv tell you about errors and have pretty little numbers that give you clues as to the cause and resolution.
"the compiler says this code works": no it doesn't. It says your code requests something it's prepared to compile, full of casts that it isn't meant to try to verify, most of which will crash at runtime.
Then there's this:
recv(sock, pass, sizeof(int), 0);
refresh = (int((void*)&pass));
recv(sock, pass, sizeof(bool[4800][254]), 0);
**key = (bool)&pass;
recv(sock, pass, sizeof(double[4800][254]), 0);
**mil = (double)&pass;
I'm not even going to begin to say what's wrong with all that... let's just talk about what might work (might being discussed below):
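#include <stdexcept>  // std::runtime_error
// Reads exactly sizeof(T) bytes from the socket straight into t, or throws.
template <typename T>
void get(int sock, T& t)
{
    // MSG_WAITALL asks recv to block until the whole object has arrived.
    if (recv(sock, (char*)&t, sizeof t, MSG_WAITALL) != sizeof t)
        throw std::runtime_error("error while reading data from socket");
}
int refresh;
get(sock, refresh);  // the socket has to be passed on every call
bool key[4800][254];
get(sock, key);
double mil[4800][254];  // about 9.7 MB: too big for a typical stack, so give it static or heap storage
get(sock, mil);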
If your sending and receiving systems, compilers, compiler flags, executables etc. differ in any way then this may not work anyway as:
up until C++03 compilers weren't required to use any particular type to store bool, so who knows if your sending and receiving side will match
big and little endian systems have different byte ordering which can break naive binary transfers like this
the size of int may vary
Ultimately, a more robust way to do this would be to use the boost serialisation library.
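Short of a full serialisation library, the byte-order and int-size problems can be sidestepped for individual integers by writing a fixed-width value in a fixed byte order. This is only an illustrative sketch; put_u32_be and get_u32_be are hypothetical names, and big-endian is chosen here because it is the conventional network byte order.

```cpp
#include <cstdint>

// Hypothetical helper: writes v into out[0..3], most significant byte first,
// regardless of the host's own byte order.
void put_u32_be(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Hypothetical helper: reassembles the value from four big-endian bytes.
std::uint32_t get_u32_be(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

The sender would transmit the four bytes produced by put_u32_be, and the receiver would call get_u32_be on the four bytes it read, so both sides agree on the layout whatever their native int looks like.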

c++ winsock2 bad pointers breakpoint triggered

I have a server and a client. I am using winsock2. The client sends 4 bytes:
char *ack = new char[4];
sprintf( ack, "%d", counter );
sendto( clientSocket, ack, 4, 0, ( struct sockaddr* )&remote, sizeof( remote ) );
and the server receives these 4 bytes:
char* acks = new char[4];
if( ( bytes = recvfrom( serverSocket, acks, 4, 0, ( struct sockaddr* )&remote, &remote_size ) ) == SOCKET_ERROR ) {
cout << "socket error = " << WSAGetLastError() << endl;
break;
}
if( bytes > 0 ) {
sscanf( acks, "%d", &i );
}
I am getting this error and I can't figure out how to fix it:
>Critical error detected c0000374
>
>server.exe has triggered a breakpoint.
I know there is a problem with the pointer and the memory allocation. But my c++ skills are basic.
String formatting overflow
The most pressing issue is that you are using sprintf and sscanf. Avoid using sprintf and sscanf - they make it far too easy to accidentally create the type of bug you're seeing here, which is buffer overflow (on both your client and your server).
Consider what happens on your client when your 'counter' value is 1729. Your code will run
sprintf(ack, "%d", 1729);
The C-style-string representation of 1729 is five bytes long - one byte each for the char values '1', '7', '2', '9', and '\0'. But your ack buffer is only 4 bytes long! Now you've written that last zero byte into some chunk of memory you never allocated. In C/C++, this is undefined behavior, which means your program might crash, or it might not, and if it doesn't crash, it might end up subtly wrong later, or it might work perfectly well, or it might work most of the time except it breaks on Tuesdays.
It's not a good place to be.
You might be wondering, "if this is so awful, why didn't sprintf just return an error or something when I called it with a buffer that was too small?" The answer[1] is that sprintf can't make that check because it doesn't give you any way to tell it how big ack actually is. When your code here is calling sprintf, you know that ack is 4 bytes long (since you just created it), but all sprintf sees is a pointer to some memory, somewhere. You haven't told it a length, so it just has to blindly hope the chunk of memory you give it is big enough.
Blindly hoping is a pretty bad way to write software.
There are a few alternatives you could consider here.
If you are actually just trying to send an int across the wire, there's not really any need to stringify the int at all. Just send it in its native format by passing reinterpret_cast<char*>(&counter) as your buffer to sendto[2], with sizeof(counter) as the corresponding buffer length. Use a similar construction in recvfrom on the other end. Note that this will break if your sender and your receiver have different underlying representations of ints (for example, if they use different endiannesses), but since you're talking about Winsock here I'm assuming both ends are reasonably recent versions of Windows where that won't be a problem.
If you really do need to stringify the content first, use size-cognizant string conversion functions, like boost::format (which is implicitly size-cognizant because it deals in std::string instead of raw char* buffers) or _snprintf_s/_snscanf_s (which explicitly take in buffer length parameters, but are Microsoft-specific).
Recvfrom access violation
The overflow in sscanf/sprintf doesn't necessarily explain this, however:
I just want to add that the error occurs in the sscanf line. If I comment that line the error occurs in the recvfrom line.
One possible explanation for this could be not providing adequate space for the remote address, though so long as your remote_size is a correct reflection of your remote, I'd expect this to cause recvfrom to return an error[3], not crash. Another possibility is passing bad memory/handles (for example, if you've set up the new operator to not throw on failure, or if your socket initialization failed and you didn't bail out). It's impossible to say exactly without seeing the code initializing all the variables in play, and ideally the actual error you get in that scenario.
[1] Even though sprintf can't catch this, static analysis tools (like those included in Visual Studio 2012/2013) are very capable of catching this particular bug. If you run the posted code through the default Visual Studio 2012 Code Analyzer, it will complain with:
error C4996: 'sprintf': This function or variable may be unsafe
[2] Some people prefer static_cast<char*>(static_cast<void*>(&counter)) to reinterpret_cast<char*>(&counter). Both work; it's essentially a coding convention choice.
[3] For example, if you were initializing remote as a SOCKADDR_IN instead of a SOCKADDR_STORAGE, you might encounter such an error if you happened to receive from an IPv6 address. This answer goes through some of the relevant gory details.

Qt QIODevice::write / QTcpSocket::write and bytes written

We are quite confused about the behavior of QIODevice::write in general and the QTcpSocket implementation specifically. There is a similar question already, but the answer is not really satisfactory. The main confusion stems from the bytesWritten signal and the waitForBytesWritten method mentioned there. Those two seem to report the bytes that were written from the buffer employed by the QIODevice to the actual underlying device (there must be such a buffer, otherwise the method would not make much sense). The question, then, is whether the number returned by QIODevice::write corresponds to this number, or whether it instead indicates the number of bytes that were stored in the internal buffer, not the bytes written to the underlying device. If the returned number indicated the bytes written to the internal buffer, we would need to employ a pattern like the following to ensure all our data is written:
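void writeAll(QIODevice& device, const QByteArray& data) {
    qint64 written = 0;
    while (written < data.size()) {
        // write() returns the number of bytes accepted by this call, or -1 on error.
        const qint64 n = device.write(data.constData() + written, data.size() - written);
        if (n < 0)
            return;  // error: give up rather than loop forever
        written += n;  // accumulate the total instead of overwriting it
    }
}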
However, this will insert duplicate data if the return value of QIODevice::write corresponds with the meaning of the bytesWritten signal. The documentation is very confusing about this, as both methods use the word device, even though the logical reading, and the general understanding, is that one of them actually means written to the buffer and not to the device.
So to summarize, the question is: is the number returned by QIODevice::write the number of bytes written to the underlying device, and is it hence safe to call QIODevice::write without checking the returned number of bytes, as everything is stored in the internal buffer? Or does it indicate how many bytes it could store internally, so that a pattern like the above writeAll has to be employed to safely write all data to the device?
(UPDATE: Looking at the source, the QTcpSocket::write implementation actually will never return fewer bytes than one wanted to write, so the writeAll above is not needed. However, that is specific to the socket and this Qt version; the documentation is still confusing...)
QTcpSocket is a buffered QAbstractSocket. An internal buffer is allocated inside QAbstractSocket, and data is copied in that buffer. The return value of write is the size of the data passed to write().
waitForBytesWritten waits until the data in the internal buffer of QAbstractSocket is written to the native socket.
That previous question answers your question, as does the QIODevice::write(const char * data, qint64 maxSize) documentation:
Writes at most maxSize bytes of data from data to the device. Returns the number of bytes that were actually written, or -1 if an error occurred.
This can (and will in real life) return less than what you requested, and it's up to you to call write again with the remainder.
As for waitForBytesWritten:
For buffered devices, this function waits until a payload of buffered written data has been written to the device...
It applies only to buffered devices. Not all devices are buffered. If they are, and you wrote less than what the buffer can hold, write can return successfully before the device has finished sending all the data.
Devices are not necessarily buffered.