read handle problem - c++

I am working on network programming using epoll and I have this code...
int read = read(socket, buf, bufsize);
I have a huge buffer size and I assumed it will receive everything clients sent.
However, I started facing problems like packet segmentation.
One example is that if a client sent 500 bytes but it somehow got into two 250 bytes packets then there is no way to handle this situation.
I looked up online and found this code
int handle_read(client *cli, struct epoll_event *ev) {
size_t len = 4096;
char *p;
ssize_t received;
cli->state = 1;
if (cli->buffer != NULL) {
//free(cli->buffer);
//printf("Buff not null %s\n", cli->buffer);
}
//allocate space for data
cli->buffer = (char*)malloc( (size_t)(sizeof(char) * 4096) );
p = cli->buffer;
do { //read until loop conditions
received = recv(ev->data.fd, p, len, 0);
if (received < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
//if error, remove from epoll and close socket
printf("Handle error!!!\nClient disconnected!\n");
epoll_ctl(epollfd, EPOLL_CTL_DEL, ev->data.fd, ev);
close(ev->data.fd);
}
p = &cli->buffer[received];
} while (received >= len && errno != EAGAIN && errno != EWOULDBLOCK);
return received;
}
Do you think it handles all the error cases that might happen while receiving? Also, could you point me to tutorials or examples that handle socket errors? The sample code I find online doesn't cover the details. Thanks in advance.

recv can return any of three things, and your code needs to handle each one correctly:
1) Positive number. This means it read some bytes.
2) Negative number. This means an "error" occurred.
3) Zero. This means the other end of the connection performed a successful shutdown() (or close()) on the socket. (In general, a return of 0 from read() or recv() means EOF.)
The "error" case further breaks down into "EAGAIN or EWOULDBLOCK" and "everything else". The first two just mean it is a non-blocking socket and there was no data to give you at this time. You probably want to go back and call poll() (or select() or epoll_wait()) again to avoid busy waiting...
"Everything else" means a real error. You need to handle those too; see the POSIX spec for recv() for a complete list.
Given all this, I would say your sample code is bad for several reasons. It does not handle 0 (closed connection) properly. It does not handle any errors. It does a busy-loop when the recv() returns EAGAIN/EWOULDBLOCK.
Oh, and it uses sizeof(char), which is a sure sign it was written by somebody who is not familiar with the C or C++ programming languages.
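As a rough sketch of the three cases above (not a drop-in replacement for the code in the question), a read handler could look something like this. The names ReadStatus and drainSocket are made up for illustration, and MSG_DONTWAIT is used so the example does not depend on how the socket was configured:

```cpp
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Outcome of one attempt to drain a socket (illustrative names, not a real API).
enum class ReadStatus { WouldBlock, Closed, Error };

// Appends everything currently available on fd to `out`.
// WouldBlock: nothing more to read right now, go back to epoll_wait().
// Closed:     recv() returned 0, the peer shut down the connection.
// Error:      a real error, inspect errno and close the socket.
ReadStatus drainSocket(int fd, std::string &out) {
    char chunk[4096];
    for (;;) {
        ssize_t n = recv(fd, chunk, sizeof chunk, MSG_DONTWAIT);
        if (n > 0) {
            out.append(chunk, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return ReadStatus::Closed;
        } else if (errno == EINTR) {
            continue; // interrupted by a signal, just retry
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return ReadStatus::WouldBlock;
        } else {
            return ReadStatus::Error;
        }
    }
}
```

The caller removes the descriptor from epoll and closes it on Closed or Error, and simply returns to the event loop on WouldBlock.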

Normally you can't know how much data the client sent. You should either use a self-describing data format (one that carries the data length in a header) or a separator between data tokens. For example, you could put \xff between one piece of data and the next. Or you could use a fixed-size data format.
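To make the "length in the header" option concrete, here is a small sketch. The wire format (a 4-byte big-endian length followed by the payload) and the name popFrame are assumptions for the example, not something from the question:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Tries to take one complete frame off the front of `buffer`.
// Assumed format: 4-byte big-endian payload length, then the payload.
// Returns false (and leaves `buffer` untouched) if the frame is incomplete.
bool popFrame(std::string &buffer, std::string &payload) {
    const std::size_t kHeader = 4;
    if (buffer.size() < kHeader) return false;
    std::uint32_t len = 0;
    for (std::size_t i = 0; i < kHeader; ++i)
        len = (len << 8) | static_cast<unsigned char>(buffer[i]);
    if (buffer.size() - kHeader < len) return false;
    payload.assign(buffer, kHeader, len);
    buffer.erase(0, kHeader + len);
    return true;
}
```

Everything recv() returns is appended to `buffer`, and popFrame is called in a loop until it returns false, so it does not matter how the bytes were split across reads.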

Related

Epoll reverse proxy stuck while writing client

I am trying to write reverse proxy with nonblocking socket and epoll. That seems ok at first, but when I tried to open a big jpg file, I got stuck.
When I try to write to the client, the socket is sometimes not writable. How can I handle this properly?
Additional Notes:
this->getFd() = ProxyFd
this->clientHandler->getFd = clientFd
I am using the EPOLLET flag on both the proxy and the client
if( (flag & EPOLLIN) ){
char buffer[1025] = {'\0'};
int readSize;
while( (readSize = read(this->getFd(),buffer,1024)) > 0){
this->headerParse(buffer);
this->readSize += readSize;
int check = 0;
do{
check = write(this->clientHandler->getFd(),buffer,readSize);
}while(check < 0);
}
if(this->headerEnd == 1 && this->readSize >= this->headerLenght ){
close(this->clientHandler->getFd());
close(this->getFd());
delete this->clientHandler;
delete this;
}
}
Thanks for taking time to read.
Assuming your headerParse() method doesn't change buffer in a size-extending way (you'd need to update readsize, at least, not to mention the buffer full scenario), it seems like your write() path is broken.
If the socket you're writing to is also in nonblocking mode, it's perfectly legal for write() to return -1 (and set errno to EAGAIN or EWOULDBLOCK or whatever your platform has) before you have written all the data.
In that case, you must store the remaining data (the remainder of buffer minus what was written if one or more calls to write() succeeded), program epoll to notify you when the clientHandler->getFd() descriptor becomes writable, if not already done, and when you get a subsequent "write ready" event, write the data you stored. In this case, write() may again be unable to flush all your data, so you must cycle until all data is sent.
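A minimal sketch of that "store the remainder and retry" idea follows. The function name flushPending and the use of std::string as the pending buffer are illustrative choices, and send() with MSG_DONTWAIT stands in for write() on a descriptor that is already non-blocking:

```cpp
#include <cerrno>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>

// Sends as much of `pending` as the socket accepts right now and removes
// the sent bytes from it.
// Returns  1 when everything was sent,
//          0 when the socket would block (keep EPOLLOUT armed, retry later),
//         -1 on a real error (or an unexpected zero-byte send).
int flushPending(int fd, std::string &pending) {
    while (!pending.empty()) {
        ssize_t n = send(fd, pending.data(), pending.size(), MSG_DONTWAIT);
        if (n > 0) {
            pending.erase(0, static_cast<std::size_t>(n));
        } else if (n < 0 && errno == EINTR) {
            continue; // interrupted, retry
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;
        } else {
            return -1;
        }
    }
    return 1;
}
```

In the proxy, data read from the upstream socket would be appended to the client's pending buffer, flushPending would be called right away, and again on every EPOLLOUT event until it returns 1.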

C++ TCP recv unknown buffer size

I want to use the function recv(socket, buf, len, flags) to receive an incoming packet. However I do not know the length of this packet prior to runtime so the first 8 bytes are supposed to tell me the length of this packet. I don't want to just allocate an arbitrarily large len to accomplish this so is it possible to set len = 8 have buf be a type of uint64_t. Then afterwards
memcpy(dest, &buf, buf)?
Since TCP is stream-based, I'm not sure what type of packages you mean. I will assume that you are referring to application level packages. I mean packages which are defined by your application and not by underlying protocols like TCP. I will call them messages instead to avoid confusion.
I will show two possibilities. First I will show how you could read a message without knowing its length before you have finished reading. The second example makes two calls: first it reads the size of the message, then it reads the whole message at once.
Read data until the message is complete
Since TCP is stream-based, you will not lose any data when your buffer is not big enough. So you can read a fixed number of bytes. If something is missing, you can call recv again. Here is an extensive example. I just wrote it without testing. I hope everything works.
std::size_t offset = 0;
std::vector<char> buf(512);
std::vector<char> readMessage() {
while (true) {
ssize_t ret = recv(fd, buf.data() + offset, buf.size() - offset, 0);
if (ret < 0) {
if (errno == EINTR) {
// Interrupted, just try again ...
continue;
} else {
// Error occurred. Throw exception.
throw IOException(strerror(errno));
}
} else if (ret == 0) {
// No data available anymore.
if (offset == 0) {
// Client did just close the connection
return std::vector<char>(); // return empty vector
} else {
// Client did close connection while sending package?
// It is not a clean shutdown. Throw exception.
throw ProtocolException("Unexpected end of stream");
}
} else if (isMessageComplete(buf)) {
// Message is complete.
buf.resize(offset + ret); // Truncate buffer to the bytes received so far
std::vector<char> msg = std::move(buf);
buf.clear(); // moved-from vector: put it back into a known state
std::size_t msgLen = getSizeOfMessage(msg);
if (msg.size() > msgLen) {
// msg already contains the beginning of the next message.
// Write it back to buf.
buf.assign(msg.begin() + msgLen, msg.end());
msg.resize(msgLen);
}
offset = buf.size(); // leftover bytes stay at the front of buf
buf.resize(std::max<std::size_t>(2 * buf.size(), 512)); // prepare buffer for next message
return msg;
} else {
// Message is not complete right now. Read more...
offset += ret;
buf.resize(std::max(buf.size(), 2 * offset)); // double available memory
}
}
}
You have to define bool isMessageComplete(std::vector<char>) and std::size_t getSizeOfMessage(std::vector<char>) yourself.
Read the header and check the length of the package
The second possibility is to read the header first: just the 8 bytes which contain the size of the package in your case. After that, you know the size of the package. This means you can allocate enough storage and read the whole message at once:
/// Reads n bytes from fd.
bool readNBytes(int fd, void *buf, std::size_t n) {
std::size_t offset = 0;
char *cbuf = reinterpret_cast<char*>(buf);
while (true) {
ssize_t ret = recv(fd, cbuf + offset, n - offset, MSG_WAITALL);
if (ret < 0) {
if (errno != EINTR) {
// Error occurred
throw IOException(strerror(errno));
}
} else if (ret == 0) {
// No data available anymore
if (offset == 0) return false;
else throw ProtocolException("Unexpected end of stream");
} else if (offset + ret == n) {
// All n bytes read
return true;
} else {
offset += ret;
}
}
}
/// Reads message from fd
std::vector<char> readMessage(int fd) {
std::uint64_t size;
if (readNBytes(fd, &size, sizeof(size))) {
std::vector<char> buf(size);
if (readNBytes(fd, buf.data(), size)) {
return buf;
} else {
throw ProtocolException("Unexpected end of stream");
}
} else {
// connection was closed
return std::vector<char>();
}
}
The flag MSG_WAITALL requests that the function blocks until the full amount of data is available. However, you cannot rely on that. You have to check it and read again if something is missing. Just like I did it above.
readNBytes(fd, buf, n) reads n bytes. As long as the connection is not closed by the other side, the function will not return without reading n bytes. If the connection was closed by the other side, the function returns false. If the connection was closed in the middle of a message, an exception is thrown. If an I/O error occurred, another exception is thrown.
readMessage reads 8 bytes [sizeof(std::uint64_t)] and uses them as the size of the next message. Then it reads the message.
If you want platform independence, you should convert size to a defined byte order. Computers with x86 architecture use little endian. It is common to use big endian in network traffic.
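One way to do that conversion without relying on the host's byte order is to build the value byte by byte. This is only a sketch; the helper names decodeLengthBE and encodeLengthBE are invented here, and it assumes both sides agree on an 8-byte big-endian length field:

```cpp
#include <cstddef>
#include <cstdint>

// Decodes an 8-byte big-endian (network order) length field.
std::uint64_t decodeLengthBE(const unsigned char *bytes) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value = (value << 8) | bytes[i];
    return value;
}

// Encodes a length as 8 big-endian bytes, for the sending side.
void encodeLengthBE(std::uint64_t value, unsigned char *out) {
    for (int i = 7; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(value & 0xFF);
        value >>= 8;
    }
}
```

With helpers like these, readMessage would read the 8 header bytes into an unsigned char array and pass them through decodeLengthBE instead of reading directly into a std::uint64_t.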
Note: With MSG_PEEK it is possible to implement this functionality for UDP. You can request the header while using this flag. Then you can allocate enough space for the whole package.
A fairly common technique is to read leading message length field, then issue a read for the exact size of the expected message.
HOWEVER! Do not assume that the first read will give you all eight bytes (see Note), or that the second read will give you the entire message/packet.
You must always check the number of bytes read and issue another read (or two (or three, or...)) to get all the data you want.
Note: Because TCP is a streaming protocol and because the packet size "on the wire" varies in accordance with a very arcane algorithm designed to maximize network performance, you could easily issue a read for eight bytes and the read could return having only read three (or seven or ...) bytes. The guarantee is that unless there is an unrecoverable error you will receive at least one byte and at most the number of bytes you requested. Because of this you must be prepared to do byte address arithmetic and issue all reads in a loop that repeats until the desired number of bytes is returned.
Since TCP is streaming there isn't really any end to the data you receive, not until the connection is closed or there is an error.
Instead you need to implement your own protocol on top of TCP, one that either contains a specific end-of-message marker, a length-of-data header field, or possibly a command-based protocol where the data of each command is of a well-known size.
That way you can read into a small fixed-sized buffer and append to a larger (possibly expanding) buffer as needed. The "possibly expanding" part is ridiculously easy in C++, what with std::vector and std::string (depending on the data you have)
There is another important thing to remember, that since TCP is stream-based, a single read or recv call may not actually fetch all the data you request. You need to receive the data in a loop until you have received everything.
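For the end-of-message marker variant, the "append to a larger buffer" part might be sketched like this. The function name popDelimited is hypothetical, and the delimiter is whatever byte your protocol reserves for this purpose:

```cpp
#include <string>

// Removes the first delimiter-terminated message from `buffer` and stores
// it (without the delimiter) in `message`.
// Returns false when no complete message has arrived yet.
bool popDelimited(std::string &buffer, char delimiter, std::string &message) {
    std::string::size_type pos = buffer.find(delimiter);
    if (pos == std::string::npos) return false;
    message.assign(buffer, 0, pos);
    buffer.erase(0, pos + 1);
    return true;
}
```

Each chunk returned by recv() is appended to `buffer` with buffer.append(chunk, n), and popDelimited is then called repeatedly until it returns false, since one chunk can carry several messages or only part of one.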
In my personal opinion, you should receive the size of the message (a fixed 4-byte integer) first:
recv(socket, "size of message written in integer", "size of integer")
Then receive the real message:
recv(socket, "real message", "size of message written in integer")
This technique can also be used for sending files, images, and long messages.

recv the first few bytes from a socket to determine buffer size

I'm writing a distributed system in c++ using TCP/IP and sockets.
For each of my messages, I need to receive the first 5 bytes to know the full length of the incoming message.
What's the best way to do this?
1) recv() only 5 bytes, then recv() again. If I choose this, would it be safe to assume I'll get 0 or 5 bytes in the recv (i.e. not write a loop to keep trying)?
2) Use MSG_PEEK.
3) recv() into some larger buffer, then read the first 5 bytes and allocate the final buffer then.
You don't need to know anything. TCP is a stream protocol, and at any given moment you can get as little as one byte, or as much as multiple megabytes of data. The correct and only way to use a TCP socket is to read in a loop.
char buf[4096]; // or whatever
std::deque<char> data;
for (int res ; ; )
{
res = recv(fd, buf, sizeof buf, MSG_DONTWAIT);
if (res == -1)
{
if (errno == EAGAIN || errno == EWOULDBLOCK)
{
break; // done reading
}
else
{
// error, break, die
break;
}
}
else if (res == 0)
{
// socket closed, finalise, break
break;
}
else
{
data.insert(data.end(), buf, buf + res);
}
}
The only purpose of the loop is to transfer data from the socket buffer to your application. Your application must then decide independently if there's enough data in the queue to attempt extraction of some sort of higher-level application message.
For example, in your case you would check if the queue's size is at least 5, then inspect the first five bytes, and then check if the queue holds a complete application message. If not, you abort, and if so, you extract the entire message and pop it off the front of the queue.
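A possible shape for that extraction step is sketched below. The question does not say how the 5 bytes encode the length, so this assumes 5 ASCII decimal digits giving the payload length; that format and the name popMessage are both hypothetical:

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical format: 5 ASCII decimal digits with the payload length,
// followed by the payload.
// Returns true and fills `message` when a full message is queued.
// Returns false when more bytes are needed, or sets `bad` on a malformed header.
bool popMessage(std::deque<char> &data, std::string &message, bool &bad) {
    const std::size_t kHeader = 5;
    bad = false;
    if (data.size() < kHeader) return false;
    std::size_t len = 0;
    for (std::size_t i = 0; i < kHeader; ++i) {
        char c = data[i];
        if (c < '0' || c > '9') { bad = true; return false; }
        len = len * 10 + static_cast<std::size_t>(c - '0');
    }
    if (data.size() - kHeader < len) return false;
    std::deque<char>::iterator bodyBegin = data.begin() + static_cast<std::ptrdiff_t>(kHeader);
    std::deque<char>::iterator bodyEnd = bodyBegin + static_cast<std::ptrdiff_t>(len);
    message.assign(bodyBegin, bodyEnd);
    data.erase(data.begin(), bodyEnd); // pop header and payload off the front
    return true;
}
```

After each pass of the receive loop you would call popMessage repeatedly until it returns false, because a single read can deliver more than one message.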
Use a state machine with two states:
First State.
Receive bytes as they arrive into a buffer. When there are 5 or more bytes perform your check on those first 5 and possibly process the rest of the buffer. Switch to the second state.
Second State.
Receive and process bytes as they arrive to the end of the message.
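The two states could be sketched as a small parser class. As before, the header encoding is not given in the question, so the 5-digit ASCII length is an assumption, and MessageParser is an invented name:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Two-state parser: Header collects 5 length digits (assumed format),
// Body collects that many payload bytes.
class MessageParser {
public:
    // Feeds newly received bytes; completed messages are appended to `out`.
    // Returns false if a header byte is not a decimal digit.
    bool feed(const char *bytes, std::size_t n, std::vector<std::string> &out) {
        for (std::size_t i = 0; i < n; ++i) {
            if (state_ == State::Header) {
                if (bytes[i] < '0' || bytes[i] > '9') return false;
                need_ = need_ * 10 + static_cast<std::size_t>(bytes[i] - '0');
                if (++headerSeen_ == 5) {
                    headerSeen_ = 0;
                    if (need_ == 0) out.push_back(std::string()); // empty message
                    else state_ = State::Body;
                }
            } else {
                body_.push_back(bytes[i]);
                if (body_.size() == need_) {
                    out.push_back(body_);
                    body_.clear();
                    need_ = 0;
                    state_ = State::Header; // ready for the next header
                }
            }
        }
        return true;
    }

private:
    enum class State { Header, Body };
    State state_ = State::Header;
    std::size_t headerSeen_ = 0;
    std::size_t need_ = 0;
    std::string body_;
};
```

Because the parser keeps its state between calls, you can hand it whatever each recv() returns, whether that is one byte or several messages at once.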
To answer your question specifically:
1) It's not safe to assume you'll get 0 or 5. It is possible to get 1-4 as well. Loop until you get 5 or an error, as others have suggested.
2) I wouldn't bother with MSG_PEEK. Most of the time you'll block (assuming blocking calls) or get 5, so skip the extra call into the stack.
3) This is fine too, but adds complexity for little gain.

receive from unix local socket and buffer size

I'm having a problem with unix local sockets. While reading a message that's longer than my temp buffer size, the request takes too long (maybe indefinitely).
Added after some tests:
The freeze at ::recv is still a problem. When I send (1023*8) bytes or less to the UNIX socket, everything is fine, but when I send more than (1023*9) bytes, recv freezes.
Maybe it's the FreeBSD default UNIX socket limit, or the C++ default socket settings? Does anyone know?
I ran some additional tests and I am 100% sure that it freezes on the 9th (last) iteration, when executing ::recv while trying to read a message >= (1023*9) bytes long. (The first 8 iterations go well.)
What I'm doing:
The idea is to read in a do/while loop from a socket with
::recv (current_socket, buf, 1024, 0);
and check buf for a SPECIAL SYMBOL. If not found:
merge content of buffer to stringxxx += buf;
bzero temp buf
continue the ::recv loop
How do I fix the issue with the request taking too long in the while loop?
Is there a better way to clear the buffer? Currently, it's:
char buf [1025];
bzero(buf, 1025);
But I know bzero is deprecated in the new c++ standard.
EDIT:
Why I need to clean the buffer:
I see this question in the comments. Without cleaning the buffer, on the next (last) iteration of reading into it, it will still contain the "tail" of the first part of the message.
Example:
// message at the socket is "AAAAAACDE"
char buf [6];
::recv (current_socket, buf, 6, 0); // read 6 symbols, buf = "AAAAAA"
// no cleanup, read the last part of the message with recv
::recv (current_socket, buf, 6, 0);
// asks for 6 symbols, but only the 3 symbols not yet read remain, therefore
// buf now contains "CDEAAA" (not correct, we are expecting CDE only)
When your recv() enters an infinite loop, this probably means that it's not making any progress whatsoever on the iterations (i.e., you're always getting a short read of zero size immediately, so your loop never exits, because you're not getting any data). For stream sockets, a recv() of zero size means that the remote end has disconnected (much like read()ing from a file positioned at EOF, which also gets you zero bytes), or at least that it has shut down the sending channel (that's for TCP specifically).
Check whether your PHP script is actually sending the amount of data you claim it sends.
To add a small (non-sensical) example for properly using recv() in a loop:
char buf[1024];
std::string data;
while( data.size() < 10000 ) { // what you wish to receive
::ssize_t rcvd = ::recv(fd, buf, sizeof(buf), 0);
if( rcvd < 0 ) {
std::cout << "Failed to receive\n"; // Receive failed - something broke, see errno.
std::abort();
} else if( !rcvd ) {
break; // No data to receive, remote end closed connection, so quit.
} else {
data.append(buf, rcvd); // Received into buffer, attach to data buffer.
}
}
if( data.size() < 10000 ) {
std::cout << "Short receive, sender broken\n";
std::abort();
}
// Do something with the buffer data.
Instead of bzero, you can just use
memset(buf, 0, 1025);
These are 2 separate issues. The long time is probably some infinite loop due to a bug in your code and has nothing to do with the way you clear your buffer. As a matter of fact you shouldn't need to clear the buffer; receive returns the number of bytes read, so you can scan the buffer for your SPECIAL_SYMBOL up to that point.
If you paste the code, maybe I can help more.
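To illustrate scanning only the bytes that recv() actually returned, here is a short sketch. The helper name appendUntilMarker is made up, and it assumes the special symbol is a single byte:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Appends the `n` bytes just received to `message`, stopping at `marker`.
// Returns true once the marker has been seen. Bytes after the marker in
// this chunk are ignored in this sketch.
bool appendUntilMarker(const char *buf, std::size_t n, char marker, std::string &message) {
    const void *hit = std::memchr(buf, marker, n);
    if (hit == nullptr) {
        message.append(buf, n); // no marker yet, keep the whole chunk
        return false;
    }
    const char *end = static_cast<const char *>(hit);
    message.append(buf, static_cast<std::size_t>(end - buf));
    return true;
}
```

Here `n` is the return value of recv(), so stale bytes left in the buffer from an earlier read are never looked at and the buffer does not need to be zeroed between reads.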
Just to clarify: bzero is not deprecated in C++ 11. Rather, it's never been part of any C or C++ standard. C started out with memset 20+ years ago. For C++, you might consider using std::fill_n instead (or just using std::vector, which can zero-fill automatically). Then again, I'm not sure there's a good reason to zero-fill the buffer in this case at all.

One problem with read function in c++

I am using read function to read data from a socket, but when the data is more than 4k, read function just read part of the data, for example, less than 4k. Here is the key code:
mSockFD = socket(AF_INET, SOCK_STREAM, 0);
if (connect(mSockFD, (const sockaddr*)(&mSockAdd), sizeof(mSockAdd)) < 0)
{
cerr << "Error connecting in Crawl" << endl;
perror("");
return false;
}
n = write(mSockFD, httpReq.c_str(), httpReq.length());
bzero(mBuffer, BUFSIZE);
n = read(mSockFD, mBuffer, BUFSIZE);
Note that BUFSIZE is much larger than 4k.
When data is just a few hundred bytes, read function works as expected.
This is by design and to be expected.
The short answer to your question is you should continue calling "read" until you get all the data you expect. That is:
int total_bytes = 0;
int expected = BUFSIZE;
char *buffer = static_cast<char*>(malloc(BUFSIZE + 1)); // +1 for null at the end
while (total_bytes < expected)
{
int bytes_read = read(mSockFD, buffer + total_bytes, BUFSIZE - total_bytes);
if (bytes_read <= 0)
break; // 0 means the peer closed the connection, negative means an error
total_bytes += bytes_read;
}
buffer[total_bytes] = 0; // null terminate - good for debugging as a string
From my experience, one of the biggest misconceptions (resulting in bugs) is that you'll receive as much data as you ask for. I've seen shipping code in real products written with the expectation that sockets work this way (and no one certain as to why it doesn't work reliably).
When the other side sends N bytes, you might get lucky and receive it all at once. But you should plan for receiving N bytes spread out across multiple recv calls. With the exception of a real network error, you'll eventually get all N bytes. Segmentation, fragmentation, TCP window size, MTU, and the socket layer's data chunking scheme are the reasons for all of this. When partial data is received, the TCP layer doesn't know about how much more is yet to come. It just passes what it has up to the app. It's up to the app to decide if it got enough.
Likewise, "send" calls can get conglomerated into the same packet together.
There may be ioctls and such that will make a socket block until all the expected data is received. But I don't know of any off hand.
Also, don't use read and write for sockets. Use recv and send.
Read this book. It will change your life with regards to sockets and TCP: