Sending Large Base64 String over TCP Socket - c++

I am trying to send a Base64 encoded image from TCP client using GO and TCP server in C++.
Here is the code snippet for C++ Receiver
std::string recieve(int bufferSize=1024,const char *eom_flag = "<EOF>"){
char buffer[bufferSize];
std::string output;
int iResult;
char *eom;
do{
iResult = recv(client, buffer, sizeof(buffer), 0);
//If End OF MESSAGE flag is found.
eom = strstr(buffer,eom_flag);
//If socket is waiting, do not append the json, keep on waiting.
if(iResult == 0){
continue;
}
output+=buffer;
//Erase null character, if exist.
output.erase(std::find(output.begin(), output.end(), '\0'), output.end());
//is socket connection is broken or end of message is reached.
}while(iResult > -1 and eom == NULL);
//Trim <EOF>
std::size_t eom_pos = output.rfind(eom_flag);
return output.substr(0,eom_pos);}
Idea is to receive the message until End of Message is found, thereafter continue to listen for another message on the same TCP connection.
Golang TCP client code snippet.
//Making connection
connection, _ := net.Dial("tcp", "localhost"+":"+PortNumber)
if _, err := fmt.Fprintf(connection, B64img+"<EOF>"); err != nil {
log.Println(err)
panic(err)
}
Tried approaches:
Increasing the buffer size in the C++ receiver.
Removing the null character from the end of the string in the C++ receiver.
Observations:
Length of string sent by the client is fixed, while the length of the string after receive function is larger and
random. Example: Go client string length is 25243. For the same string, length after receive when i
run send and receive in the loop is 25243, 26743, 53092, 41389, 42849.
On Saving the received string in a file, I see <0x7f> <0x02> character in the string.
I am using winsock2.h for c++ socket.

You are treating the received data as a C string - a sequence of bytes ending with a 0 byte - which is not correct.
recv receives some bytes and puts them in buffer. Let's say it received 200 bytes.
You then do strstr(buffer,eom_flag);. strstr doesn't know that 200 bytes were received. strstr starts from the beginning of the buffer, and keeps looking until it finds either <EOF>, or a 0 byte. There is a chance that it might find a <EOF> in the other 824 bytes of the buffer, even though you didn't receive one.
Then you do output += buffer;. This also treats the buffer as if it ends with a 0 byte. This will look through the whole buffer (not just the first 200 bytes) to find a 0 byte. It will then add everything up to that point into output. Again, it might find a 0 byte in the last 824 bytes of the buffer, and add too much data. It might not find a 0 byte in the buffer at all, and then it will keep on adding extra data from other variables that are stored next to buffer in memory. Or it might find a 0 byte in the first 200 bytes, and stop there (but only if you sent a 0 byte).
What you should do is pay attention to the number of bytes received (which is iResult) and add that many bytes to the output. You could use:
output.insert(output.end(), buffer, buffer+iResult);
Also (as Phillipe Thomassigny has pointed out in a comment), the "<EOF>" might not be received all at once. You might receive "<EO" and "F>" separately. You should check whether output has an "<EOF>" instead of checking whether buffer has an "<EOF>". (The performance implications of this are left as an exercise to the reader)
By the way, this line doesn't do anything at the moment:
output.erase(std::find(output.begin(), output.end(), '\0'), output.end());
because '\0' never gets added to output, because with output += buffer;, a '\0' tells it where to stop adding.
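Putting those points together, here is a minimal sketch of a delimiter-based receive loop. It is not the asker's code: RecvFn, receive_until and pending are hypothetical names, and RecvFn stands in for a call such as recv(client, buf, len, 0) so the logic can be exercised without a real socket. It appends exactly the number of bytes received, searches the accumulated data rather than the temporary buffer, and only re-scans the tail where a split delimiter could start.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical wrapper around recv(): returns bytes read, 0 on close, <0 on error.
using RecvFn = std::function<int(char*, int)>;

// Reads until `delim` appears in the accumulated data. Bytes that arrive after
// the delimiter stay in `pending` for the next call. Returns false if the
// connection closes or fails before a complete message has arrived.
bool receive_until(const RecvFn& recv_some, const std::string& delim,
                   std::string& pending, std::string& message) {
    assert(!delim.empty());
    char buffer[1024];
    std::size_t search_from = 0;
    for (;;) {
        std::size_t pos = pending.find(delim, search_from);
        if (pos != std::string::npos) {
            message = pending.substr(0, pos);
            pending.erase(0, pos + delim.size());
            return true;
        }
        // Everything before this index has been ruled out as a delimiter start,
        // so the next search only covers the tail plus the new bytes.
        search_from = pending.size() >= delim.size()
                          ? pending.size() - delim.size() + 1
                          : 0;
        int n = recv_some(buffer, static_cast<int>(sizeof(buffer)));
        if (n <= 0) return false;
        // Append exactly n bytes; never treat the buffer as a C string.
        pending.append(buffer, static_cast<std::size_t>(n));
    }
}
```

With a real socket the callable could be something like [&](char* p, int n) { return recv(client, p, n, 0); }, and the same pending string would be passed to every call so that bytes belonging to the next message are not lost.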

Related

Garbage values and Buffers differences in TCP

First question: I am confused between Buffers in TCP. I am trying to explain my problem, i read this documentation TCP Buffer, author said a lot about TCP Buffer, thats fine and a really good explanation for a beginner. What i need to know is this TCP Buffer is same buffer with the one we use in our basic client server program (Char *buffer[Some_Size]) or its some different buffer hold by TCP internally ?
My second question is that i am sending a string data with prefix length (This is data From me) from client over socket to server, when i print my data at console along with my string it prints some garbage value also like this "This is data From me zzzzzz 1/2 1/2....." ?. However i fixed it by right shifting char *recvbuf = new char[nlength>>3]; nlength to 3 bits but why i need to do it in this way ?
My third question is in relevance with first question if there is nothing like TCP Buffer and its only about the Char *buffer[some_size] then whats the difference my program will notice using such static memory allocation buffer and by using dynamic memory allocation buffer using char *recvbuf = new char[nlength];. In short which is best and why ?
Client Code
int bytesSent;
int bytesRecv = SOCKET_ERROR;
char sendbuf[200] = "This is data From me";
int nBytes = 200, nLeft, idx;
nLeft = nBytes;
idx = 0;
uint32_t varSize = strlen (sendbuf);
bytesSent = send(ConnectSocket,(char*)&varSize, 4, 0);
assert (bytesSent == sizeof (uint32_t));
std::cout<<"length information is in:"<<bytesSent<<"bytes"<<std::endl;
// code to make sure all data has been sent
while (nLeft > 0)
{
bytesSent = send(ConnectSocket, &sendbuf[idx], nLeft, 0);
if (bytesSent == SOCKET_ERROR)
{
std::cerr<<"send() error: " << WSAGetLastError() <<std::endl;
break;
}
nLeft -= bytesSent;
idx += bytesSent;
}
std::cout<<"Client: Bytes sent:"<< bytesSent;
Server code:
int bytesSent;
char sendbuf[200] = "This string is a test data from server";
int bytesRecv;
int idx = 0;
uint32_t nlength;
int length_received = recv(m_socket,(char*)&nlength, 4, 0);//Data length info
char *recvbuf = new char[nlength];//dynamic memory allocation based on data length info
//code to make sure all data has been received
while (nlength > 0)
{
bytesRecv = recv(m_socket, &recvbuf[idx], nlength, 0);
if (bytesRecv == SOCKET_ERROR)
{
std::cerr<<"recv() error: " << WSAGetLastError() <<std::endl;
break;
}
idx += bytesRecv;
nlength -= bytesRecv;
}
cout<<"Server: Received complete data is:"<< recvbuf<<std::endl;
cout<<"Server: Received bytes are"<<bytesRecv<<std::endl;
WSACleanup();
system("pause");
delete[] recvbuf;
return 0;
}
You send 200 bytes from the client, unconditionally, but in the server you only receive the actual length of the string, and that length does not include the string terminator.
So first of all you don't receive all data that was sent (which means you will fill up the system buffers), and then you don't terminate the string properly (which leads to "garbage" output when trying to print the string).
To fix this, in the client only send the actual length of the string (the value of varSize), and in the receiving server allocate one more character for the terminator, which you of course need to add.
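A rough sketch of the receiving side of that fix, under some assumptions: RecvFn, recv_exact, recv_prefixed and the max_len cap are hypothetical names, and RecvFn stands in for a call such as recv(m_socket, p, n, 0). Reading the payload into a std::string sidesteps the manual terminator, because the string tracks its own length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>

// Hypothetical wrapper around recv(): returns bytes read, 0 on close, <0 on error.
using RecvFn = std::function<int(char*, int)>;

// Loops until exactly `size` bytes have been read into `out`.
bool recv_exact(const RecvFn& recv_some, char* out, std::size_t size) {
    assert(out != nullptr || size == 0);
    std::size_t got = 0;
    while (got < size) {
        int n = recv_some(out + got, static_cast<int>(size - got));
        if (n <= 0) return false;  // closed or failed before all bytes arrived
        got += static_cast<std::size_t>(n);
    }
    return true;
}

// Reads a 4-byte length (host byte order, as in the question) and then
// exactly that many payload bytes.
bool recv_prefixed(const RecvFn& recv_some, std::string& text,
                   std::uint32_t max_len = 1u << 20) {
    char raw[sizeof(std::uint32_t)];
    if (!recv_exact(recv_some, raw, sizeof(raw))) return false;
    std::uint32_t len = 0;
    std::memcpy(&len, raw, sizeof(len));
    if (len > max_len) return false;  // assumed sanity cap on the length field
    text.assign(len, '\0');
    return len == 0 || recv_exact(recv_some, &text[0], len);
}
```

This only lines up with the sender if the client sends varSize bytes after the length field, not the full 200-byte array.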
First question: I am confused between Buffers in TCP. I am trying to
explain my problem, i read this documentation TCP Buffer, author said a
lot about TCP Buffer, thats fine and a really good explanation for a
beginner. What i need to know is this TCP Buffer is same buffer with
the one we use in our basic client server program (Char
*buffer[Some_Size]) or its some different buffer hold by TCP internally ?
When you call send(), the TCP stack will copy some of the bytes out of your char array into an in-kernel buffer, and send() will return the number of bytes that it copied. The TCP stack will then handle the transmission of those in-kernel bytes to its destination across the network as quickly as it can. It's important to note that send()'s return value is not guaranteed to be the same as the number of bytes you specified in the length argument you passed to it; it could be less. It's also important to note that send()'s return value does not imply that that many bytes have arrived at the receiving program; rather it only indicates the number of bytes that the kernel has accepted from you and will try to deliver.
Likewise, recv() merely copies some bytes from an in-kernel buffer to the array you specify, and then drops them from the in-kernel buffer. Again, the number of bytes copied may be less than the number you asked for, and generally will be different from the number of bytes passed by the sender on any particular call of send(). (E.g if the sender called send() and his send() returned 1000, that might result in you calling recv() twice and having recv() return 500 each time, or recv() might return 250 four times, or (1, 990, 9), or any other combination you can think of that eventually adds up to 1000)
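To illustrate the point about send() accepting fewer bytes than requested, here is a small sketch of a loop that keeps calling it until everything has been handed to the kernel. SendFn and send_all are hypothetical names; SendFn stands in for a call such as send(ConnectSocket, p, n, 0).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical wrapper around send(): returns bytes accepted, or <0 on error.
using SendFn = std::function<int(const char*, int)>;

// Keeps sending until all `size` bytes have been accepted by the kernel.
bool send_all(const SendFn& send_some, const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::size_t sent = 0;
    while (sent < size) {
        int n = send_some(data + sent, static_cast<int>(size - sent));
        // Treat 0 as a failure too, so the loop cannot spin without progress.
        if (n <= 0) return false;
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

A true return still only means the kernel accepted the bytes, not that the peer has received them.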
My second question is that i am sending a string data with prefix
length (This is data From me) from client over socket to server, when
i print my data at console along with my string it prints some garbage
value also like this "This is data From me zzzzzz 1/2 1/2....." ?.
However i fixed it by right shifting char *recvbuf = new
char[nlength>>3]; nlength to 3 bits but why i need to it in this way ?
Like Joachim said, this happens because C strings depend on the presence of a NUL-terminator byte (i.e. a zero byte) to indicate their end. You are receiving strlen(sendbuf) bytes, and the value returned by strlen() does not include the NUL byte. When the receiver's string-printing routine tries to print the string, it keeps printing until it finds a NUL byte (by chance) somewhere later on in memory; in the meantime, you get to see all the random bytes that are in memory before that point. To fix the problem, either increase your sent-bytes counter to (strlen(sendbuf)+1), so that the NUL terminator byte gets received as well, or alternatively have your receiver manually place the NUL byte at the end of the string after it has received all of the bytes of the string. Either way is acceptable (the latter way might be slightly preferable as that way the receiver isn't depending on the sender to do the right thing).
Note that if your sender is going to always send 200 bytes rather than just the number of bytes in the string, then your receiver will need to always receive 200 bytes if it wants to receive more than one block; otherwise when it tries to receive the next block it will first get all the extra bytes (after the string) before it gets the next block's send-length field.
My third question is in relevance with first question if there is
nothing like TCP Buffer and its only about the Char *buffer[some_size]
then whats the difference my program will notice using such static
memory allocation buffer and by using dynamic memory allocation buffer
using char *recvbuf = new char[nlength];. In short which is best and
why ?
In terms of performance, it makes no difference at all. send() and recv() don't care a bit whether the pointers you pass to them point at the heap or the stack.
In terms of design, there are some tradeoffs: if you use new, there is a chance that you can leak memory if you don't always call delete[] when you're done with the buffer. (This can particularly happen when exceptions are thrown, or when error paths are taken). Placing the buffer on the stack, on the other hand, is guaranteed not to leak memory, but the amount of space available on the stack is finite so a really huge array could cause your program to run out of stack space and crash. In this case, a single 200-byte array on the stack is no problem, so that's what I would use.
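If the buffer size is only known at run time, one middle ground (a sketch, not something the question requires) is a std::vector<char>, which lives on the heap but is released automatically on every exit path, including exceptions. The function name make_receive_buffer and the max_length cap are assumptions made for illustration.

```cpp
#include <cstdint>
#include <vector>

// Returns a zero-initialized buffer of `length` bytes, or an empty buffer if
// the requested length exceeds the (assumed) sanity cap. The vector frees its
// memory when it goes out of scope, so no delete[] is needed.
std::vector<char> make_receive_buffer(std::uint32_t length,
                                      std::uint32_t max_length = 1u << 20) {
    if (length > max_length) return {};
    return std::vector<char>(length);
}
```

The pointer for recv() would then be buffer.data(), and the cap guards against allocating whatever a corrupt length field happens to say.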

Partial receipt of packets from socket C++

I have a problem: my server application sends an 8-byte packet, AABBCC1122334455, but my application receives it in two parts, AABBCC1122 and 334455, via the "recv" function. How can I fix that?
Thanks!
To sum up a little bit:
TCP connection doesn't operate with packets or messages on the application level, you're dealing with stream of bytes. From this point of view it's similar to writing and reading from a file.
Both send and recv can send and receive less data than provided in the argument. You have to deal with it correctly (usually by applying proper loop around the call).
As you're dealing with streams, you have to find the way to convert it to meaningful data in your application. In other words, you have to design serialisation protocol.
From what you've already mentioned, you most probably want to send some kind of messages (well, it's usually what people do). The key thing is to discover the boundaries of messages properly. If your messages are of fixed size, you simply grab the same amount of data from the stream and translate it to your message; otherwise, you need a different approach:
If you can come up with a character which cannot exist in your message, it could be your delimiter. You can then read the stream until you reach the character and it'll be your message. If you transfer ASCII characters (strings) you can use zero as a separator.
If you transfer binary data (raw integers etc.), all characters can appear in your message, so nothing can act as a delimiter. Probably the most common approach in this case is to use a fixed-size prefix containing the size of your message. The size of this extra field depends on the maximum size of your message (you will probably be safe with 4 bytes, but if you know the maximum size, you can use a smaller field). Then your packet would look like SSSS|PPPPPPPPP... (stream of bytes), where S is the additional size field and P is your payload (the real message in your application; the number of P bytes is determined by the value of S). You know every packet starts with 4 special bytes (S bytes), so you can read them as a 32-bit integer. Once you know the size of the encapsulated message, you read all the P bytes. After you're done with one packet, you're ready to read another one from the socket.
Good news though, you can come up with something completely different. All you need to know is how to deserialise your message from a stream of bytes and how send/recv behave. Good luck!
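As a sketch of the SSSS|PPPP... layout described above, the function below pulls every complete frame out of a buffer of bytes accumulated from the stream and leaves any incomplete tail in place for the next read. The name extract_frames is hypothetical, and the 4-byte size is assumed to be in host byte order on both ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Removes every complete [4-byte size][payload] frame from the front of
// `stream` and returns the payloads; an incomplete tail stays in `stream`.
std::vector<std::string> extract_frames(std::string& stream) {
    std::vector<std::string> frames;
    std::size_t offset = 0;
    while (stream.size() - offset >= sizeof(std::uint32_t)) {
        std::uint32_t size = 0;
        // Size field in host byte order (an assumption of this sketch).
        std::memcpy(&size, stream.data() + offset, sizeof(size));
        if (stream.size() - offset - sizeof(size) < size) break;  // payload incomplete
        frames.emplace_back(stream, offset + sizeof(size), size);
        offset += sizeof(size) + size;
    }
    stream.erase(0, offset);
    return frames;
}
```

The caller would append whatever each recv call returns to stream and then call extract_frames, so a message split across several reads is simply picked up once its last byte arrives.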
EDIT:
Example of function receiving arbitrary number of bytes into array:
bool recv_full(int sock, char *buffer, size_t size)
{
size_t received = 0;
while (received < size)
{
ssize_t r = recv(sock, buffer + received, size - received, 0);
if (r <= 0) break;
received += r;
}
return received == size;
}
And example of receiving packet with 2-byte prefix defining size of payload (size of payload is then limited to 65kB):
uint16_t msgSize = 0;
char msg[0xffff];
if (recv_full(sock, reinterpret_cast<char *>(&msgSize), sizeof(msgSize)) &&
recv_full(sock, msg, msgSize))
{
// Got the message in msg array
}
else
{
// Something bad happened to the connection
}
That's just how recv() works on most platforms. You have to check the number of bytes you receive and continue calling it in a loop until you get the number that you need.
You "fix" that by reading from TCP socket in a loop until you get enough bytes to make sense to your application.
my server application sends packet 8 bytes length
Not really. Your server sends 8 individual bytes, not a packet 8 bytes long. TCP data is sent over a byte stream, not a packet stream. TCP neither respects nor maintains any "packet" boundary that you might have in mind.
If you know that your data is provided in quanta of N bytes, then call recv in a loop:
std::vector<char> read_packet(int N) {
std::vector<char> buffer(N);
int total = 0, count;
// Write each chunk right after the bytes already received.
while ( total < N && (count = recv(sock_fd, &buffer[total], N-total, 0)) > 0 )
total += count;
return buffer;
}
std::vector<char> packet = read_packet(8);
If your packet is variable length, try sending its length before the data itself:
int read_int() {
std::vector<char> buffer = read_packet(sizeof (int));
int result;
memcpy((void*)&result, (void*)&buffer[0], sizeof(int));
return result;
}
int length = read_int();
std::vector<char> data = read_packet(length);

receive from unix local socket and buffer size

I'm having a problem with unix local sockets. While reading a message that's longer than my temp buffer size, the request takes too long (maybe indefinitely).
Added after some tests:
There is still a problem with a freeze at ::recv. When I send (1023*8) bytes or less to the UNIX socket, everything is OK, but when I send more than (1023*9) bytes, recv freezes.
Maybe it's a FreeBSD default UNIX socket limit or a C++ default socket setting? Who knows?
I made some additional tests and I am 100% sure that it freezes on the 9th (last) iteration when executing ::recv, when trying to read a message >= (1023*9) bytes long. (The first 8 iterations go well.)
What I'm doing:
The idea is to read in a do/while loop from a socket with
::recv (current_socket, buf, 1024, 0);
and check buf for a SPECIAL SYMBOL. If not found:
merge content of buffer to stringxxx += buf;
bzero temp buf
continue the ::recv loop
How do I fix the issue with the request taking too long in the while loop?
Is there a better way to clear the buffer? Currently, it's:
char buf [1025];
bzero(buf, 1025);
But I know bzero is deprecated in the new c++ standard.
EDIT:
*"Why need to clean the buffer*
I see questions about this in the comments. Without buffer cleanup, on the next (last) iteration of reading into the buffer, it will contain the "tail" of the first part of the message.
Example:
// message at the socket is "AAAAAACDE"
char buf [6];
::recv (current_socket, buf, 6, 0); // read 6 symbols, buf = "AAAAAA"
// no cleanup, read the last part of the message with recv
::recv (current_socket, buf, 6, 0);
// asked for 6 symbols, but only the 3 unread symbols were available, therefore
// buf now contains "CDEAAA" (not correct, we are expecting CDE only)
When your recv() enters an infinite loop, this probably means that it's not making any progress whatsoever on the iterations (i.e., you're always getting a short read of zero size immediately, so your loop never exits, because you're not getting any data). For stream sockets, a recv() of zero size means that the remote end has disconnected (it's something like read()ing from a file when the input is positioned at EOF also gets you zero bytes), or at least that it has shut down the sending channel (that's for TCP specifically).
Check whether your PHP script is actually sending the amount of data you claim it sends.
To add a small (non-sensical) example for properly using recv() in a loop:
char buf[1024];
std::string data;
while( data.size() < 10000 ) { // what you wish to receive
::ssize_t rcvd = ::recv(fd, buf, sizeof(buf), 0);
if( rcvd < 0 ) {
std::cout << "Failed to receive\n"; // Receive failed - something broke, see errno.
std::abort();
} else if( !rcvd ) {
break; // No data to receive, remote end closed connection, so quit.
} else {
data.append(buf, rcvd); // Received into buffer, attach to data buffer.
}
}
if( data.size() < 10000 ) {
std::cout << "Short receive, sender broken\n";
std::abort();
}
// Do something with the buffer data.
Instead of bzero, you can just use
memset(buf, 0, 1025);
These are 2 separate issues. The long time is probably some infinite loop due to a bug in your code and has nothing to do with the way you clear your buffer. As a matter of fact you shouldn't need to clear the buffer; receive returns the number of bytes read, so you can scan the buffer for your SPECIAL_SYMBOL up to that point.
If you paste the code, maybe I can help more.
Just to clarify: bzero is not deprecated in C++ 11. Rather, it's never been part of any C or C++ standard. C started out with memset 20+ years ago. For C++, you might consider using std::fill_n instead (or just using std::vector, which can zero-fill automatically). Then again, I'm not sure there's a good reason to zero-fill the buffer in this case at all.
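To make the "no need to zero-fill" point concrete, here is a small sketch that looks for the special symbol only within the bytes recv actually returned, so leftovers from an earlier read are never examined. The name find_symbol is hypothetical; received is assumed to be the positive return value of recv.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Returns the index of `symbol` within the first `received` bytes of `buf`,
// or -1 if it is not there. Stale bytes past `received` are never looked at.
long find_symbol(const char* buf, std::size_t received, char symbol) {
    assert(buf != nullptr || received == 0);
    const char* end = buf + received;
    const char* hit = std::find(buf, end, symbol);
    return hit == end ? -1 : static_cast<long>(hit - buf);
}
```

In the "CDEAAA" example from the question, a call with received set to 3 only sees "CDE", regardless of what the rest of the buffer still holds.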

Reading socket reply in loop

I have:
char buf[320];
read(soc, buf, sizeof(buf));
//print buf;
However, sometimes the reply is much bigger than 320 characters, so I'm trying to run the read in a loop to avoid taking up too much memory space. I tried read(soc, buf, sizeof(buf)) but that only prints the same first x characters over again. How would I print the leftover characters that did not fit into the first 320 characters in a loop?
Thanks
Change your loop to something like:
int numread;
while(1) {
if ((numread = read(soc, buf, sizeof(buf) - 1)) == -1) {
perror("read");
exit(1);
}
if (numread == 0)
break;
buf[numread] = '\0';
printf("Reply: %s\n", buf);
}
for the reasons Nikola states.
Every time you call read( s, buf, buf_size ) the kernel copies min( buf_size, bytes_available ) into the buf, where bytes_available is the number of bytes already received and waiting in socket receive buffer. The read(2) system call returns the number of bytes placed into application buffer, or -1 on error, or 0 to signal EOF, i.e. a close(2) of the socket on the sending end. Thus when you reuse the buffer, only part of it might be overwritten with new data. Also note that -1 evaluates to true in C and C++. This is probably the case you are hitting.
printf(3) expects zero-terminated string for the %s format specifier. The bytes read from the socket might not contain the '\0' byte, thus letting printf(3) print till it finds zero further down somewhere. This might lead to buffer overrun.
The points here are:
Always check the value returned from read(2)
If you print strings read from a socket - always zero-terminate them manually.
Hope this helps.
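In C++, an alternative to terminating the buffer by hand is to build a std::string from exactly the number of bytes read(2) returned. This is only a sketch: chunk_to_string and print_chunk are hypothetical helpers, and numread is assumed to be a positive return value from read.

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Copies exactly `numread` bytes out of the buffer; anything left over from
// an earlier, longer read is ignored.
std::string chunk_to_string(const char* buf, std::size_t numread) {
    return std::string(buf, numread);
}

// Prints one chunk without relying on a '\0' terminator in the buffer.
void print_chunk(const char* buf, std::size_t numread) {
    std::cout << "Reply: " << chunk_to_string(buf, numread) << '\n';
}
```

Inside the loop this would be called as print_chunk(buf, numread) after the -1 and 0 checks, and the read could then use the full sizeof(buf).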

Unusual HTTP Response in Basic C++ Socket Programming

I've got a basic HTTP client set up in C++, which works ok so far. It's for a school assignment, so there's lots more to do, but I'm having a problem.
I use the recv() function in a while loop, to repeatedly add pieces of the response to my response buffer, and then output that buffer each time. The problem is, at the end of each piece of the response, the HTTP Request is getting tacked on as well.
For example, the response will be a chunk of the page's source code, followed by "GET / HTTP/1.1...", followed by the next chunk, and then the "GET..." again, and so on.
Here's my relevant code:
// Prepare request
char request[] = "HEAD /index.html HTTP/1.1\r\nHOST: www.google.com\r\nCONNECTION: close\r\n\r\n";
// Send request
len = send(sockfd, request, sizeof(request), 0);
// Write/output response
while (recv(sockfd, buf, sizeof(buf), 0) != 0)
{
// Read & output response
printf("%s", buf);
}
The buffer isn't null terminated, which is required for C-style strings. When you see the "extra GET", you are seeing memory that you shouldn't be because the stdlib tried to print your buffer, but never found a '\0' character.
A quick fix is to force the buffer to be terminated:
int n = 1;
while (n > 0) {
n = recv(sockfd, buf, sizeof(buf) - 1, 0); // leave room for the terminator
if (n > 0) {
// null terminate the buffer so that we can print it
buf[n] = '\0';
// output response
printf("%s", buf);
}
}
I suspect it's because your buf is allocated in memory just below your request. When you call printf on the buffer, printf will print as much as it can before finding a NUL character (which marks the end of the string). If there isn't one, it'll go right on through into request. And generally, there won't be one, because recv is for receiving binary data and doesn't know that you want to treat its output a string.
One quick fix would be to limit the receive operation to sizeof(buf)-1, and to explicitly add the NUL terminator yourself, using the size of the returned data:
while ((nr = recv(sockfd, buf, sizeof(buf) - 1, 0)) > 0)
{
buf[nr] = 0;
...
}
Of course, for this to be (marginally) safe you need to be sure that you'll always receive printable data.
recv does not add a \0 string terminator to the buffer received - it just works in raw binary. So your printf is running off the end of your buf buffer (and apparently ending up looking at your request buffer).
Either add a nul-terminator to the end of buf, or print the buffer one character at a time using putchar() (both of these approaches will make it necessary to store the value returned by recv()).
The recv call will not null-terminate buf; instead, it will just provide you with the raw data received from the wire. You need to save the return value of recv, and then add a null-terminating byte yourself into buf before printing it. Consequently, you can only ask for sizeof(buf)-1 bytes.
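Since the request asks for the connection to be closed, another option is to collect the whole response before printing it. The sketch below assumes a hypothetical RecvFn callable standing in for recv(sockfd, p, n, 0), and read_until_close is a made-up name; it appends exactly the bytes received each time and stops when recv reports that the server has closed the connection.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical wrapper around recv(): returns bytes read, 0 on close, <0 on error.
using RecvFn = std::function<int(char*, int)>;

// Appends every received byte to `response` until the peer closes the
// connection. Returns true on an orderly close, false on a receive error.
bool read_until_close(const RecvFn& recv_some, std::string& response) {
    char buf[512];
    for (;;) {
        int n = recv_some(buf, static_cast<int>(sizeof(buf)));
        if (n == 0) return true;   // server closed the connection
        if (n < 0) return false;   // receive error
        response.append(buf, static_cast<std::size_t>(n));
    }
}
```

Because std::string stores its own length, the result can be printed or parsed afterwards without any terminator bookkeeping.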