How to iterate through a fd_set - c++

I'm wondering if there's an easy way to iterate through an fd_set. The reason I want to do this is to avoid looping through all connected sockets, since select() alters these fd_sets to include only the ones I'm interested in. I also know that relying on the implementation of a type that is not meant to be accessed directly is generally a bad idea, since it may vary across systems. However, I need some way to do this, and I'm running out of ideas. So, my question is:
How do I iterate through an fd_set? If this is a really bad practice, are there any other ways to solve my "problem" except from looping through all connected sockets?
Thanks

You have to fill in an fd_set struct before calling select(); you cannot pass in your original std::set of sockets directly. select() then modifies the fd_set accordingly, removing any sockets that are not "set", and returns how many sockets remain. You have to loop through the resulting fd_set, not your std::set. On Windows (Winsock), where fd_set exposes the fd_count and fd_array members used below, there is no need to call FD_ISSET() because the resulting fd_set contains only the sockets that are ready. These members do not exist in the fd_set of most other platforms. E.g.:
fd_set read_fds;
FD_ZERO(&read_fds);
int max_fd = 0;
read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i )
{
read_fds.fd_array[i] = connected_sockets[i];
if (read_fds.fd_array[i] > max_fd)
max_fd = read_fds.fd_array[i];
}
if (select(max_fd+1, &read_fds, NULL, NULL, NULL) > 0)
{
for( int i = 0; i < read_fds.fd_count; ++i )
do_socket_operation( read_fds.fd_array[i] );
}
Where FD_ISSET() comes into play more often is when using error checking with select(), eg:
fd_set read_fds;
FD_ZERO(&read_fds);
fd_set error_fds;
FD_ZERO(&error_fds);
int max_fd = 0;
read_fds.fd_count = connected_sockets.size();
for( int i = 0; i < read_fds.fd_count; ++i )
{
read_fds.fd_array[i] = connected_sockets[i];
if (read_fds.fd_array[i] > max_fd)
max_fd = read_fds.fd_array[i];
}
error_fds.fd_count = read_fds.fd_count;
for( int i = 0; i < read_fds.fd_count; ++i )
{
error_fds.fd_array[i] = read_fds.fd_array[i];
}
if (select(max_fd+1, &read_fds, NULL, &error_fds, NULL) > 0)
{
for( int i = 0; i < read_fds.fd_count; ++i )
{
if( !FD_ISSET(read_fds.fd_array[i], &error_fds) )
do_socket_operation( read_fds.fd_array[i] );
}
for( int i = 0; i < error_fds.fd_count; ++i )
{
do_socket_error( error_fds.fd_array[i] );
}
}

select() leaves set only the bits corresponding to the file descriptors that are ready, so you need not iterate through all the fds if you are interested in only a few (and can ignore the others); just test the file descriptors you care about.
if (select(fdmax+1, &read_fds, NULL, NULL, NULL) == -1) {
perror("select");
exit(4);
}
if(FD_ISSET(fd0, &read_fds))
{
//do things
}
if(FD_ISSET(fd1, &read_fds))
{
//do more things
}
EDIT
Here is the fd_set struct:
typedef struct fd_set {
u_int fd_count; /* how many are SET? */
SOCKET fd_array[FD_SETSIZE]; /* an array of SOCKETs */
} fd_set;
Here fd_count is the number of sockets set (so you can add an optimization using this). Note that this struct is the Winsock definition, in which fd_array holds the socket handles themselves; on most Unix-like systems fd_set is instead a bit-vector whose size is machine dependent. On my machine, it is 64 * 64 = 4096 bits.
So, your question is essentially: what is the most efficient way to find the bit positions of 1s in a bit-vector (of size around 4096 bits)?
I want to clear up one thing here:
"looping through all the connected sockets" doesn't mean that you are actually reading from or doing anything to a connection. FD_ISSET() only checks whether the bit in the fd_set at the position of the connection's file descriptor number is set or not. If efficiency is your aim, then isn't this the most efficient approach, perhaps with some heuristics?
Please tell us what's wrong with this method, and what you are trying to achieve with the alternative method.

It's fairly straight-forward:
for( int fd = 0; fd <= max_fd; fd++ ) // max_fd is the highest descriptor, so include it
if ( FD_ISSET(fd, &my_fd_set) )
do_socket_operation( fd );
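A slightly fuller sketch of the same loop, for what it's worth. It assumes POSIX and that only one fd_set was passed to select(), so that the return value equals the number of set bits and the scan can stop early. The function name collect_ready and its parameters are made up for illustration.

```cpp
#include <sys/select.h>
#include <cassert>
#include <vector>

// Collect the descriptors that are set in `set`, scanning 0..max_fd.
// `n_ready` is the value select() returned; when a single set was passed
// to select() it equals the number of set bits, so the scan can stop
// as soon as that many descriptors have been found.
// (Hypothetical helper, not part of any library.)
std::vector<int> collect_ready(fd_set& set, int max_fd, int n_ready)
{
    assert(max_fd >= 0 && max_fd < FD_SETSIZE);
    std::vector<int> ready;
    for (int fd = 0; fd <= max_fd && n_ready > 0; ++fd) {
        if (FD_ISSET(fd, &set)) {
            ready.push_back(fd);
            --n_ready;
        }
    }
    return ready;
}
```

The early exit only helps when the ready descriptors have low numbers; in the worst case the loop still visits every descriptor up to max_fd.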

This looping is a limitation of the select() interface. The underlying implementations of fd_set are usually a bit set, which obviously means that looking for a socket requires scanning over the bits.
It is for precisely this reason that several alternative interfaces have been created - unfortunately, they are all OS-specific. For example, Linux provides epoll, which returns a list of only the file descriptors that are active. FreeBSD and Mac OS X both provide kqueue, which accomplishes the same result.
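For comparison, here is a minimal Linux-only sketch of the epoll approach. The helper names watch_for_read and wait_ready and the batch size of 64 are my own choices, not anything mandated by the API; error handling is reduced to the bare minimum.

```cpp
#include <sys/epoll.h>
#include <unistd.h>   // close(), for releasing the epoll fd when done
#include <cassert>
#include <vector>

// Register fd for readability on the epoll instance; false on failure.
// (Hypothetical helper.)
bool watch_for_read(int epfd, int fd)
{
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Wait up to timeout_ms and return only the descriptors that are ready.
// No scan over the idle descriptors is needed.
std::vector<int> wait_ready(int epfd, int timeout_ms)
{
    assert(epfd >= 0);
    epoll_event events[64];
    std::vector<int> ready;
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    for (int i = 0; i < n; ++i)   // n is -1 on error, so the loop is skipped
        ready.push_back(events[i].data.fd);
    return ready;
}
```

The epoll instance itself would come from epoll_create1(0) and should be close()d when no longer needed.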

See this section 7.2 of Beej's guide to networking - '7.2. select()—Synchronous I/O Multiplexing' by using FD_ISSET.
In short, you must iterate through an fd_set to determine whether each file descriptor is ready for reading or writing.

I don't think what you are trying to do is a good idea.
Firstly, it's system dependent, but I believe you already know that.
Secondly, internally these sets are stored as an array of integers, with each fd represented by a set bit. According to the select man page, FD_SETSIZE is 1024.
Even if you wanted to iterate over the set to find the fds you are interested in, you would have to loop over that many bits, along with the mess of bit manipulation.
So unless you are waiting on more than FD_SETSIZE fds in select(), which I don't think is possible, it's not a good idea.
Oh wait! In any case it's not a good idea.

I don't think you can do much to make the select() call efficient. The information in "The C10K problem" is still valid.
You will need a platform-specific solution:
Linux => epoll
FreeBSD => kqueue
Or you could use an event library such as libev to hide the platform details for you.

ffs() may be used on POSIX or 4.3BSD systems to iterate over bits, though it expects an int (the long and long long versions are glibc extensions). Of course, you have to check whether ffs() is optimized as well as, e.g., strlen and strchr.
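To illustrate the ffs() idea, here is a sketch that walks the set bits of a plain array of unsigned int words. It deliberately does not touch the internals of fd_set, whose layout is not portable; the function name set_bit_positions is invented for this example.

```cpp
#include <strings.h>   // ffs()
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Return the positions of all 1 bits in an array of words, lowest first.
// (Hypothetical helper operating on our own bit vector, not on fd_set.)
std::vector<int> set_bit_positions(const std::vector<unsigned int>& words)
{
    const int bits_per_word = static_cast<int>(sizeof(unsigned int) * CHAR_BIT);
    std::vector<int> positions;
    for (std::size_t w = 0; w < words.size(); ++w) {
        unsigned int word = words[w];
        while (word != 0) {
            int bit = ffs(static_cast<int>(word)) - 1;   // ffs() is 1-based
            assert(bit >= 0 && bit < bits_per_word);
            positions.push_back(static_cast<int>(w) * bits_per_word + bit);
            word &= word - 1;                            // clear the lowest set bit
        }
    }
    return positions;
}
```

Words that are zero are skipped with a single comparison, which is where this beats testing every bit individually when the set is sparse.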

Related

how can i get the number of bytes available to read on async socket on linux?

I have a simple TCP/IP server written in C++ on Linux. I'm using asynchronous sockets and epoll. Is it possible to find out how many bytes are available for reading when I get the EPOLLIN event?
From man 7 tcp:
int value;
error = ioctl(sock, FIONREAD, &value);
Or alternatively SIOCINQ, which is a synonym of FIONREAD.
Anyway, I'd recommend just calling recv in non-blocking mode in a loop until it fails with EWOULDBLOCK.
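A rough sketch of such a loop, assuming a POSIX system. The names DrainResult, set_non_blocking and drain_socket are made up for this example, and the 4096-byte chunk size is arbitrary.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <fcntl.h>
#include <cerrno>
#include <cassert>
#include <string>

enum class DrainResult { WouldBlock, Closed, Error };

// Put the descriptor into non-blocking mode; false on failure.
bool set_non_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Append everything currently queued on a non-blocking socket to `out`.
// (Hypothetical helper.)
DrainResult drain_socket(int fd, std::string& out)
{
    assert(fd >= 0);
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            return DrainResult::Closed;        // peer closed the connection
        } else if (errno == EINTR) {
            continue;                          // interrupted by a signal, retry
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return DrainResult::WouldBlock;    // nothing left for now
        } else {
            return DrainResult::Error;         // errno holds the reason
        }
    }
}
```

recv() itself returns -1 in the "would block" case; the EWOULDBLOCK (or EAGAIN) value is found in errno.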
UPDATE:
From your comments below I think that this is not the appropriate solution for your problem.
Imagine that your header is 8 bytes and you receive just 4; then your poll/select will return EPOLLIN, you will check FIONREAD, see that the header is not yet complete and wait for more bytes. But if these bytes never arrive, you keep getting EPOLLIN on every call to poll/select and you have a no-op busy loop. That is, poll/select are level-triggered. Not that an edge-triggered function solves your problem either.
In the end you are far better off doing a bit of work: add a buffer per connection and queue the bytes until you have enough. It is not as difficult as it seems and it works far better. For example, something like this:
struct ConnectionData
{
int sck;
std::vector<uint8_t> buffer;
size_t offset, pending;
};
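void OnPollIn(ConnectionData *d)
{
    // Read at most the bytes still missing, appending at the current offset.
    ssize_t res = recv(d->sck, d->buffer.data() + d->offset, d->pending, 0);
    if (res <= 0)
    {
        // res == 0 means the peer closed the connection, res < 0 is an error.
        handle_error();
        return;
    }
    d->offset += res;
    d->pending -= res;
    if (d->pending == 0)
        DoSomethingUseful(d);
}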
And whenever you want to get a number of bytes:
void PrepareToRecv(ConnectionData *d, size_t size)
{
d->buffer.resize(size);
d->offset = 0;
d->pending = size;
}

How can we get sockets list from select() function?

I'm working on a network project and I know that select() (with the FD_XXX macros) returns the total number of socket handles that are ready and contained in the fd_set structures, but how do we know which sockets they are (as SOCKET or int)? The only way to get the list of sockets is a for loop that checks FD_ISSET, am I right? If not, how?
Despite what others say about the return value of select(), I use it this way when dealing with a lot of sockets. It does not guarantee that you avoid walking the whole list (the only ready socket may happen to be the last one), but it saves some work when the ready sockets come early in the list.
int i;
int nReady;
int biggest = 0;
fd_set sfds;
struct timeval timeout = {0, 0};
FD_ZERO(&sfds);
for (i = 0; i < NumberOfSockets; i++)
{
    FD_SET(SocketList[i], &sfds);
    if (SocketList[i] > biggest) biggest = SocketList[i];
}
timeout.tv_sec = 30;
timeout.tv_usec = 0;
// biggest is only necessary when dealing with Berkeley sockets;
// Winsock (e.g. Visual C++ on Windows) ignores this parameter.
if ((nReady = select(biggest + 1, &sfds, NULL, NULL, &timeout)) > 0)
{
    // Stop as soon as every ready socket has been handled.
    for (i = 0; i < NumberOfSockets && nReady > 0; i++)
    {
        if (FD_ISSET(SocketList[i], &sfds)) {
            // SocketList[i] has data to be read
            // ... your code to process the socket when it's readable ...
            nReady--;
        }
    }
}

Elegant way to add/remove descriptors to/from poll

I have to handle around 1000 descriptors in one poll (I can't use epoll as it's Linux specific) and I have to be able to dynamically add/remove them (handle new connections and remove closed ones).
This means I should recombine the descriptors array on each iteration.
This is rather obvious from a technical point of view, but does anybody know a beautiful way to do that?
I'd keep the dead descriptors in the array, and purge once in a while.
I'd also maintain the location of each descriptor, for easy removal, but this can be optimized further.
The trick is to keep invalid descriptors in the array instead of rearranging the array every time. poll() ignores entries whose fd is negative, so these slots are harmless until they are purged.
For instance:
struct pollfd pfds[MY_MAX_FDS];
int nfds = 0;
enum { INVALID_FD = -1 };
....
typedef std::map<int,int> desc_to_index_t;
desc_to_index_t d2i;
....
// add descriptor
if (nfds == MY_MAX_FDS){
// purge old fds
// go through pfds and remove everything that has invalid fd
// pfds should point to a condensed array of valid objects and nfds should be updated as well, as should d2i.
}
pfds[nfds] = { desc, events, revents};
d2i.insert(std::make_pair(desc,nfds));
++nfds;
....
// remove descriptor
desc_to_index_t::iterator it = d2i.find(desc);
assert(it != d2i.end());
pfds[it->second] = { INVALID_FD, 0, 0 };
d2i.erase(it);
This way you only need to purge once a certain threshold is crossed, and you don't need to build the array every time.
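The purge step left as a comment above could look roughly like this. It is only a sketch: it uses a std::vector<pollfd> instead of the fixed array, the name purge_invalid is invented, and it assumes each live descriptor appears at most once.

```cpp
#include <poll.h>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Drop every entry whose fd is negative, keep the rest in order, and
// rebuild the descriptor -> index map to match the new positions.
// Assumes each live descriptor appears at most once in pfds.
// (Hypothetical helper.)
void purge_invalid(std::vector<pollfd>& pfds, std::map<int, int>& fd_to_index)
{
    std::size_t out = 0;
    fd_to_index.clear();
    for (std::size_t in = 0; in < pfds.size(); ++in) {
        if (pfds[in].fd < 0)
            continue;                         // dead slot, drop it
        pfds[out] = pfds[in];
        fd_to_index[pfds[out].fd] = static_cast<int>(out);
        ++out;
    }
    pfds.resize(out);
    assert(fd_to_index.size() == pfds.size());
}
```

The pass is linear in the size of the array, so running it only when the array fills up keeps the amortized cost per add/remove small.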

Speeding up non-blocking Unix Sockets (C++)

I've implemented a simple socket wrapper class. It includes a non-blocking function:
void Socket::set_non_blocking(const bool b) {
mNonBlocking = b; // class member for reference elsewhere
int opts = fcntl(m_sock, F_GETFL);
if(opts < 0) return;
if(b)
opts |= O_NONBLOCK;
else
opts &= ~O_NONBLOCK;
fcntl(m_sock, F_SETFL, opts);
}
The class also contains a simple receive function:
int Socket::recv(std::string& s) const {
char buffer[MAXRECV + 1];
s = "";
memset(buffer,0,MAXRECV+1);
int status = ::recv(m_sock, buffer, MAXRECV,0);
if(status == -1) {
if(!mNonBlocking)
std::cout << "Socket, error receiving data\n";
return 0;
} else if (status == 0) {
return 0;
} else {
s = buffer;
return status;
}
}
In practice, there seems to be a ~15ms delay when Socket::recv() is called. Is this delay avoidable? I've seen some non-blocking examples that use select(), but don't understand how that might help.
It depends on how you are using sockets. If you have multiple sockets and you loop over all of them checking for data, that may account for the delay.
With non-blocking recv you are depending on data being there. If your application needs to use more than one socket, you will have to constantly poll each socket in turn to find out whether any of them has data available.
This is bad for system resources because it means your application is constantly running even when there is nothing to do.
You can avoid that with select. You basically set up your sockets, add them to group and select on the group. When anything happens on any of the selected sockets select returns specifying what happened and on which socket.
For some code about how to use select look at beej's guide to network programming
select will let you specify a timeout, and can test whether the socket is ready to be read from. So you can use something smaller than 15ms. Incidentally, you need to be careful with that code you have: if the data on the wire can contain embedded NULs, s won't contain all the read data. You should use something like s.assign(buffer, status);.
In addition to stefanB, I see that you are zeroing out your buffer every time. Why bother? recv returns how many bytes were actually read. Just zero out the one byte after the data ( buffer[status] = '\0' ).
How big is your MAXRECV? It might just be that you incur a page fault on the stack growth. Others already mentioned that zeroing out the receive buffer is completely unnecessary. You also take a memory allocation and copy hit when you create a std::string out of the received character data.
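Putting the suggestions above together, a sketch could look like the following. It assumes POSIX; the helper names wait_readable and recv_into and the 512-byte buffer are my own choices and not taken from the asker's class.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <cassert>
#include <string>

// Block for at most timeout_ms until fd is readable.
// Returns 1 if readable, 0 on timeout, -1 on error (select()'s result).
// (Hypothetical helper.)
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0 && fd < FD_SETSIZE);
    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(fd, &read_fds);
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(fd + 1, &read_fds, nullptr, nullptr, &tv);
}

// One recv() into s. No memset and no terminator are needed because
// assign() copies exactly n bytes, embedded NULs included.
// Returns recv()'s result: > 0 bytes read, 0 peer closed, -1 error.
ssize_t recv_into(int fd, std::string& s)
{
    char buffer[512];
    ssize_t n = recv(fd, buffer, sizeof buffer, 0);
    if (n > 0)
        s.assign(buffer, static_cast<std::size_t>(n));
    else
        s.clear();
    return n;
}
```

The process sleeps inside select() until data arrives or the timeout expires, instead of spinning on a non-blocking recv().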

Iterating a read() from a socket

Is this the proper way to iterate over a read on a socket? I am having a hard time getting this to work properly. data.size is an unsigned int that is populated from the socket as well. It is correct. data.data is an unsigned char *.
if ( data.size > 0 ) {
data.data = (unsigned char*)malloc(data.size);
memset(&data.data, 0, data.size);
int remainingSize = data.size;
unsigned char *iter = data.data;
int count = 0;
do {
count = read(connect_fd, iter, remainingSize);
iter += count;
remainingSize -= count;
} while (count > 0 && remainingSize > 0);
}
else {
data.data = 0;
}
Thanks in advance.
You need to check the return value from read before you start adding it to other values.
You'll get a zero when the socket reports EOF, and -1 on error. Keep in mind that for a socket EOF is not the same as closed.
Low level socket programming is very tedious and error prone. If you use C++ you should try to use higher level libraries like Boost or ACE.
I would also suggest to read C++ Network Programming: Mastering Complexity Using ACE and Patterns and C++ Network Programming: Systematic Reuse with ACE and Frameworks
Put the read as part of the while condition.
while((remainingSize > 0) && (count = read(connect_fd, iter, remainingSize)) > 0)
{
iter += count;
remainingSize -= count;
}
This way if it fails you immediately stop the loop.
It is very common pattern to use the read as part of the loop condition otherwise you need to check the state inside the loop which makes the code uglier.
Personally:
I would move the whole test above into a separate function for readability, but your mileage may vary.
Also, using malloc (and company) is going to lead to a whole boatload of memory management issues. I would use a std::vector. This also future-proofs the code: when you modify it to start throwing exceptions, it will already be exception safe.
So assuming you change data.data to have a type of std::vector<unsigned char> then
if ( data.size > 0 )
{
std::vector<unsigned char> buffer(data.size);
unsigned char *iter = &buffer[0];
while(... read(connect_fd, iter, remainingSize) )
{
.....
}
... handle error as required
buffer.resize(buffer.size() - remainingSize);
data.data.swap(buffer);
}
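One possible way to flesh out the outline above as a standalone function. This is a sketch under my own assumptions: the name read_exact is invented, EINTR is retried, and a short result is how the caller learns that EOF or an error cut the transfer short.

```cpp
#include <unistd.h>
#include <cerrno>
#include <cassert>
#include <cstddef>
#include <vector>

// Try to read exactly `size` bytes from fd. The returned vector is
// shorter than `size` if EOF or an error ended the transfer early.
// (Hypothetical helper.)
std::vector<unsigned char> read_exact(int fd, std::size_t size)
{
    assert(fd >= 0);
    std::vector<unsigned char> buffer(size);
    std::size_t done = 0;
    while (done < size) {
        ssize_t count = read(fd, buffer.data() + done, size - done);
        if (count > 0) {
            done += static_cast<std::size_t>(count);
        } else if (count == -1 && errno == EINTR) {
            continue;      // interrupted by a signal, retry
        } else {
            break;         // 0 means EOF, -1 means error (see errno)
        }
    }
    buffer.resize(done);   // keep only the bytes actually received
    return buffer;
}
```

With the data.data member changed to a std::vector<unsigned char> as suggested, the result could simply be swapped or moved into it.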
Keep in mind that read() calls are system calls and thus a source of possible blocking, and even if you use non-blocking I/O, are inherently heavyweight. I would recommend minimising them.
A good way to go that has always served me well in over a decade of BSD socket programming in C is to use non-blocking I/O and issue a FIONREAD ioctl() to get the total amount of data waiting at a given polling interval (assuming you're using some sort of synchronous I/O mux like select()) and then just read() that amount as many times as necessary to capture all of it, and then return the function for the moment until the next timer tick.