C++ asynchronous hostname resolving - c++

I have a epoll server which sometimes opens outgoing connections, using their hostnames representation. Because of high rate of incoming connections flow, I don't want to block somewhere like getaddrinfo() or gethostbyname(). Sure, I could implement a cache plus a new thread, where the hostname resolution would be performed. Is there a single-threaded non-blocking way to resolve a hostname to an IP?

There are various libraries for the purpose, e.g. libevent contains a resolver.
I sort of agree with #Puciek though, doing this in a single thread adds quite a bit of complexity for questionable benefits. Using a dedicated resolving thread and communicating with it through pipes might be the best solution.
Since you mention epoll I guess you're using Linux. It has a getaddrinfo_a function that if I understand correctly does part of this for you. It clones a thread and runs getaddrinfo there. I never used it though so can't help beyond that.

Related

Does it make sense to call async_connect on a unix socket?

I'm using asio to build a network library that can connect to remote systems via TCP or unix sockets. I use asio::generic::stream_protocol::socket which has both a connect and async_connect methods. Does it make sense to use async_connect when connecting as a client to a unix socket?
Define "make sense". Both ways work, however with different implications on style, architecture and performance.
Data Input/Output especially over a network has high delays i.e. latency, and lead to the whole fuzzy async programming style, boost asio even got its name from.
So using the blocking connect would be a very bad choice in the TCP case, since a whole thread is waiting several ms until something happens and can't do anything else, while it would be ok in the case of IPC i.e. unix sockets.
But since you want to both, I recommend to simply use async_connect, since you have to organise you whole program in the "async" style for the TCP case anyway, and it doesn't have any drawbacks for the unix socket case (might be also faster or have higher throughput than the blocking one). And to not making unnecessary branches special cases, is considered by many people incl. me as making sense.

TCP Two-Way Communication using Qt

I am trying to setup a TCP communication framework between two computers. I would like each computer to send data to the other. So computer A would perform a calculation, and send it to computer B. Computer B would then read this data, perform a calculation using it, and send a result back to computer A. Computer A would wait until it receives something from computer B before proceeding with performing another calculation, and sending it to computer B.
This seems conceptually straightforward, but I haven't been able to locate an example that details two-way (bidirectional) communication via TCP. I've only found one-way server-client communication, where a server sends data to a client. These are some examples that I have looked at closely so far:
Server-Client communication
Synchronized server-client communication
I'm basically looking to have two "servers" communicate with each other. The synchronized approach above is, I believe, important for what I'm trying to do. But I'm struggling to setup a two-way communication framework via a single socket.
I would appreciate it greatly if someone could point me to examples that describe how to setup bidirectional communication with TCP, or give me some pointers on how to set this up, from the examples I have linked above. I am very new to TCP and network communication frameworks and there might be a lot that I could be misunderstanding, so it would be great if I could get some clear pointers on how to proceed.
This answer does not go into specifics, but it should give you a general idea, since that's what you really seem to be asking for. I've never used Qt before, I do all my networking code with BSD-style sockets directly or with my own wrappers.
Stuff to think about:
Protocol. Hand-rolled or existing?
Existing protocols can be heavyweight, depending on what your payload looks like. Examples include HTTP and Google ProtoBuf; there are many more.
Handrolled might mean more work, but more controlled. There are two general approaches: length-based and sentinel-based.
Length-based means embedding the length into the first bytes. Requires caring about endianness. Requires thinking about what if a message is longer than can be embedded in the length byte. If you do this, I strongly recommend that you define your packet formats in some data file, and then generate the low-level packet encoding logic.
Sentinel-based means ending the message when some character (or sequence) is seen. Common sentinels are '\0', '\n', and "\r\n". If the rest of your protocol is also text-based, this means it is much easier to debug.
For both designs, you have to think about what happens if the other side tries to send more data than you are willing (or able) to store in memory. In either case, limiting the payload size to a 16-bit unsigned integer is probably a good idea; you can stream replies with multiple packets. Note that serious protocols (based on UDP + crypto) typically have a protocol-layer size limit of 512-1500 bytes, though application-layer may be larger of course.
For both designs, EOF on the socket without having a sentinel means you must drop the message and log an error.
Main loop. Qt probably has one you can use, but I don't know about it.
It's possible to develop simple operations using solely blocking operations, but I don't recommend it. Always assume the other end of a network connection is a dangerous psychopath who knows where you live.
There are two fundamental operations in a main loop:
Socket events: a socket reports being ready for read, or ready to write. There are also other sorts of events that you probably won't use, since most useful information can be found separately in the read/write handlers: exceptional/priority, (write)hangup, read-hangup, error.
Timer events: when a certain time delta has passed, interrupt the wait-for-socket-events syscall and dispatch to the timer heap. If you don't have any, either pass the syscalls notion of "infinity". But if you have long sleeps, you might want some arbitrary, relatively number like "10 seconds" or "10 minutes" depending on your application, because long timer intervals can do all sorts of weird things with clock changes, hibernation, and such. It's possible to avoid those if you're careful enough and use the right APIs, but most people don't.
Choice of multiplex syscall:
The p versions below include atomic signal mask changing. I don't recommend using them; instead if you need signals either add signalfd to the set or else emulate it using signal handlers and a (nonblocking, be careful!) pipe.
select/pselect is the classic, available everywhere. Cannot have more than FD_SETSIZE file descriptors, which may be very small (but can be #defined on the command-line if you're careful enough. Inefficient with sparse sets. Timeout is microseconds for select and nanonseconds for pselect, but chances are you can't actually get that. Only use this if you have no other choice.
poll/ppoll solves the problems of sparse sets, and more significantly the problem of listening to more than FD_SETSIZE file descriptors. It does use more memory, but it is simpler to use. poll is POSIX, ppoll is GNU-specific. For both, the API provides nanosecond granularity for the timeout, but you probably can't get that. I recommend this if you need BSD compatibility and don't need massive scalability, or if you only have one socket and don't want to deal with epoll's headaches.
epoll solves the problem of having to respecify the file descriptor and event list every time. by keeping the list of file descriptors. Among other things, this means that when, the low-level kernel event occurs, the epoll can immediately be made aware, regardless of whether the user program is already in a syscall or not. Supports edge-triggered mode, but don't use it unless you're sure you understand it. Its API only provides millisecond granularity for the timeout, but that's probably all you can rely on anyway. If you are able to only target Linux, I strongly suggest you use this, except possibly if you can guarantee only a single socket at once, in which case poll is simpler.
kqueue is found on BSD-derived systems, including Mac OS X. It is supposed to solve the same problems as epoll, but instead of keeping things simple by using file descriptors, it has all sorts of strange structures and does not follow the "do only one thing" principle. I have never used it. Use this if you need massive scalability on BSD.
IOCP. This only exists on Windows and some obscure Unixen. I have never used it and it has significantly different semantics. Use this, but be aware that much of this post is not applicable because Windows is weird. But why would you use Windows for any sort of serious system?
io_uring. A new API in Linux 5.1. Significantly reducing the number of syscalls and memory copies. Worth it if you have a lot of sockets, but since it's so new, you must provide a fallback path.
Handler implementation:
When the multiplex syscall signifies an event, look up the handler for that file number (some class with virtual functions) and call the relevant events (note there may be more than one).
Make sure all your sockets have O_NONBLOCK set and also disable Nagle's algorithm (since you're doing buffering yourself), except possibly connect's before the connection is made, since that requires confusing logic, especially if you want to play nice with multiple DNS results.
For TCP sockets, all you need is accept in the listening socket's handler, and read/write family in the accept/connected handler. For other sorts of sockets, you need the send/recv family. See the "see also" in their man pages for more info - chances are one of them will be useful to you sometimes, do this before you hard-code too much into your API design.
You need to think hard about buffering. Buffering reads means you need to be able to check the header of a packet to see if there are enough bytes to do anything with it, or if you have to store the bytes until next time. Also remember that you might receive more than one packet at once (I suggest you rethink your design so that you don't mandate blocking until you get the reply before sending the next packet). Buffering writes is harder than you think, since you don't want to be woken when there is a "can write" even on a socket for which you have no data to write. The application should never write itself, only queue a write. Though TCP_CORK might imply a different design, I haven't used it.
Do not provide a network-level public API of iterating over all sockets. If needed, implement this at a higher level; remember that you may have all sorts of internal file descriptors with special purposes.
All of the above applies to both the server and the client. As others have said, there is no real difference once the connection is set up.
Edit 2019:
The documentation of D-Bus and 0MQ are worth reading, whether you use them or not. In particular, it's worth thinking about 3 kinds of conversations:
request/reply: a "client" makes a request and the "server" does one of 3 things: 1. replies meaningfully, 2. replies that it doesn't understand the request, 3. fails to reply (either due to a disconnect, or due to a buggy/hostile server). Don't let un-acknowledged requests DoS the "client"! This can be difficult, but this is a very common workflow.
publish/subscribe: a "client" tells the "server" that it is interested in certain events. Every time the event happens, the "server" publishes a message to all registered "clients". Variations: , subscription expires after one use. This workflow has simpler failure modes than request/reply, but consider: 1. the server publishes an event that the client didn't ask for (either because it didn't know, or because it doesn't want it yet, or because it was supposed to be a oneshot, or because the client sent an unsubscribe but the server didn't process it yet), 2. this might be a magnification attack (though that is also possible for request/reply, consider requiring requests to be padded), 3. the client might have disconnected, so the server must take care to unsubscribe them, 4. (especially if using UDP) the client might not have received an earlier notification. Note that it might be perfectly legal for a single client to subscribe multiple times; if there isn't naturally discriminating data you may need to keep a cookie to unsubscribe.
distribute/collect: a "master" distributes work to multiple "slaves", then collects the results, aka map/reduce any many other reinvented terms for the same thing. This is similar to a combination of the above (a client subscribes to work-available events, then the server makes a unique request to each clients instead of a normal notification). Note the following additional cases: 1. some slaves are very slow, while others are idle because they've already completed their tasks and the master might have to store the incomplete combined output, 2. some slaves might return a wrong answer, 3. there might not be any slaves, 4.
D-Bus in particular makes a lot of decisions that seem quite strange at first, but do have justifications (which may or may not be relevant, depending on the use case). Normally, it is only used locally.
0MQ is lower-level and most of its "downsides" are solved by building on top of it. Beware of the MxN problem; you might want to artificially create a broker node just for messages that are prone to it.
#include <QAbstractSocket>
#include <QtNetwork>
#include <QTcpServer>
#include <QTcpSocket>
QTcpSocket* m_pTcpSocket;
Connect to host: set up connections with tcp socket and implement your slots. If data bytes are available readyread() signal will be emmited.
void connectToHost(QString hostname, int port){
if(!m_pTcpSocket)
{
m_pTcpSocket = new QTcpSocket(this);
m_pTcpSocket->setSocketOption(QAbstractSocket::KeepAliveOption,1);
}
connect(m_pTcpSocket,SIGNAL(readyRead()),SLOT(readSocketData()),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(error(QAbstractSocket::SocketError)),SIGNAL(connectionError(QAbstractSocket::SocketError)),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(stateChanged(QAbstractSocket::SocketState)),SIGNAL(tcpSocketState(QAbstractSocket::SocketState)),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(disconnected()),SLOT(onConnectionTerminated()),Qt::UniqueConnection);
connect(m_pTcpSocket,SIGNAL(connected()),SLOT(onConnectionEstablished()),Qt::UniqueConnection);
if(!(QAbstractSocket::ConnectedState == m_pTcpSocket->state())){
m_pTcpSocket->connectToHost(hostname,port, QIODevice::ReadWrite);
}
}
Write:
void sendMessage(QString msgToSend){
QByteArray l_vDataToBeSent;
QDataStream l_vStream(&l_vDataToBeSent, QIODevice::WriteOnly);
l_vStream.setByteOrder(QDataStream::LittleEndian);
l_vStream << msgToSend.length();
l_vDataToBeSent.append(msgToSend);
m_pTcpSocket->write(l_vDataToBeSent, l_vDataToBeSent.length());
}
Read:
void readSocketData(){
while(m_pTcpSocket->bytesAvailable()){
QByteArray receivedData = m_pTcpSocket->readAll();
}
}
TCP is inherently bidirectional. Get one way working (client connects to server). After that both ends can use send and recv in exactly the same way.
Have a look at QWebSocket, this is based on HTTP and it also allows for HTTPS

Socket hostname lookup timeout: how to implement it?

I write portable Windows/Linux application that uses sockets. I use gethostbyname function to perform DNS lookups.
However, I don't see how to set gethostbyname timeout and secure my application against hanging during name lookup.
Of course, it is possible to run gethostbyname on another thread, and that what I do. However, it is solution for trivial applications only.
My application uses 1000-3000 connections in parallel. In such situation the question is: what to do with timeouted threads? I don't see good solution. We can "forget" them, however, we are at risk that our program threads count will grow up to infinity on bad networks. We can terminate them, but the idea looks terrible. From my experience, Windows can crash after thousands of threads termination, and I don't know how Linux will behave in such conditions.
Also, thread creation needs many resources; it is not good idea to create 3000 threads just to run gethostbyname function and exit.
So, separate thread doesn't look like good idea for really complex application. Another alternative is to write own DNS client of course, however, it also doesn't look good.
Is there any "official" way on Windows and Linux (or better portable way) to get host address with custom timeout?
First of all: don't use gethostbyname(), it's obsolete. Use getaddrinfo() instead.
What you want is asynchronous name resolution. It's a common requirement, but unfortunately there is no "standard" way, how to do it. Here my hints for finding the best solution for you:
Don't implement a DNS client. Name resolution is more than just DNS. Think of mDNS, hosts files and so on. Use a system function like getaddrinfo() that abstracts the different name resolution mechanisms for you.
Some systems offer asynchronous versions of the resolution functions, like glibc offers getaddrinfo_a().
There are asynchronous resolution libraries, that wrap around the synchronous system resolver functions. At first libasyncns comes to my mind.
Boost.Asio supports to use the resolver with a thread pool. See here.
The most portable solution is indeed to use a separate thread for name resolution. This is also documented in MSDN.
Non portable solutions exist that perform asynchronous name looksups.
Linux glibc 2.2.3+: getaddrinfo_a
Windows: WSAAsyncGetHostByName IPv4 only :(
There are asynchronous DNS libraries out there: TADNS for example looks easy enough.

Scheduling Packet Retransmission

I'm programming a network protocol over UDP, using C/C++ in Linux. The protocol must provide reliability, so I'm going to simulate something like TCP retransmission over UDP.
This can be done using pthreads or fork, but I believe that's an overkill and consumes a lot of system resources. A better approach is to exploit a scheduler.
I probably can't use Linux internal scheduler, since I'm programming in user space. Are there standard C/C++ libraries to accomplish this? How about 3rd party libraries?
Edit: Some people asked why I'm doing this. Why not use the TCP instead?
The answer is, since I'm implementing a tunneling protocol. If someone tunnels TCP over TCP, the efficiency will drop considerably. Here's more info Why TCP Over TCP Is A Bad Idea.
The "scheduler" you're after is called "select", and it's a user-space call available in linux. Type "man 2 select" to read the help page for how to use it.
If you need a timeout, just call select() with a timeout value. The select call will return either when new data has arrived, or a timeout has expired. You can then do retransmissions if there was a timeout.
Here's a sample of how to accomplish asynchronous coroutines with Boost. Boost manages the overhead of creating a thread to run the coroutine in this case so that you don't need to. If you would like the kernel to manage your interrupts, you can use alarm & setitimer, but they're very limited in what they can do.
Any solution will include threads, forks, or some variant of them at some level, unless you synchronously manage the transmission in the main thread using something like select().
Not clear what exactly you are trying to schedule. You can use libevent for efficient and somewhat portable interface. This is basically similar to Matthew's suggestion of using select, but using the most efficient interface (which select is not) on FreeBSD, Linux and MacOS X (actually their page now claims Windows support as well but I'm not too familiar with that).
This will give ability to do non-blocking event-driven network calls. It will not solve the scheduling part. DOing it in a separate thread is not going to hurt your performance. I think running a pthread per connection is not the best approach, but having a single scheduling thread and some worker threads dealing with the network events and maybe some non-trivial processing usually works well.

c++ send data to multiple UDP sockets

I got a c++ non-blocking server socket, with all the clients stored in a std::map structure.
I can call the send() method for each clientObject to send something to the connected client and that works pretty good already.
But for sending a message to all (broadcast?) i wanna know:
there is something better than do a for/loop with all the clients and call to ClientObject->send("foo") each iteration?
Or should i just try having a peek on multicast sockets?
Thanks in advance.
Rag.
Multicast is only an option if you're communicating over a LAN. It won't work over the Internet.
What you may want to do here is to demultiplex the sockets using asynchronous I/O. This allows you to send data to multiple sockets at the same time, and use asynchronous event handlers to deal with each transmission.
I would recommend looking into Boost ASIO for a portable way to do this. You can also use OS specific system calls, (such as poll/select on UNIX or epoll on Linux) to do this, but it is a lot more complicated.
Multicast would be much preferable... as long as you are talking about local nodes i.e. within the "broadcast/multicast" domain on the LAN.
Of course there are multicast distribution protocols for wider dispersion of such messages but they are seldom used and, depending on your specific case, you could/couldn't reliability depend on such facility.
The use of Multicast translates to lots of savings from a sender point of view: only one send operation needs to occur instead of n*send.
You'd better off to do udp unicast to each host unless you have those very expensive switches. Yes, broadcast/multicast can actually be slower for most switches that have much wimpier CPU than your pcs. Doing anything other than simple forwarding would slow them down tremendously.
Do a benchmark to find out.
Asynch socket programming is definitely the way to go! :)