Does it make sense to call async_connect on a unix socket? - c++

I'm using asio to build a network library that can connect to remote systems via TCP or unix sockets. I use asio::generic::stream_protocol::socket, which has both connect and async_connect methods. Does it make sense to use async_connect when connecting as a client to a unix socket?

Define "make sense". Both ways work, but with different implications for style, architecture, and performance.
Data input/output, especially over a network, involves high delays, i.e. latency, which led to the whole asynchronous programming style that Boost.Asio even got its name from.
Using the blocking connect would therefore be a very bad choice in the TCP case, since a whole thread waits several milliseconds for something to happen and can't do anything else in the meantime, while it would be fine in the IPC case, i.e. unix sockets.
But since you want both, I recommend simply using async_connect: you have to organize your whole program in the "async" style for the TCP case anyway, and it has no drawbacks in the unix socket case (it might even be faster or have higher throughput than the blocking variant). And avoiding unnecessary branches and special cases is considered by many people, me included, to "make sense".

Related

Is there a better way to use asynchronous TCP sockets in C++ rather than poll or select?

I recently started writing some C++ code that uses sockets, which I'd like to be asynchronous. I've read many posts about how poll and select can be used to make my sockets asynchronous (using poll or select to wait for a send or recv buffer), but on my server side I have an array of struct pollfd, where every time the listening socket accepts a connection, it adds it to the array of struct pollfd so that it can monitor that socket's recv (POLLIN).
My problem is that if I have 5000 sockets connected to my listening socket on my server, then the array of struct pollfd would be of size 5000, since it would be monitoring all the connected sockets. But the only way I know to check whether a recv for a socket is ready is by looping through all the items in the array of struct pollfd to find the ones whose revents equals POLLIN. This just seems inefficient when the number of connected sockets becomes very large. Is there a better way to do this?
How does the boost::asio library handle async_accept, async_send, etc...? How should I handle it?
What the heck, I will go ahead and write up an answer.
I am going to ignore the "asynchronous" vs "non-blocking" terminology because I believe it is irrelevant to your question.
You are worried about performance when handling thousands of network clients, and you are right to be worried. You have rediscovered the C10K problem. Back when the Web was young, people saw a need for a small number of fast servers to handle a large number of (relatively) slow clients. The existing select/poll type interfaces require linear scans -- in both kernel and user space -- across all sockets to determine which are ready. If many sockets are often idle, your server can wind up spending more time figuring out what work to do than doing actual work.
Fast-forward to today, where we have basically two approaches for dealing with this problem:
1) Use one thread per socket and just issue blocking reads and writes. This is usually the simplest to code, in my opinion, and modern operating systems are quite good at letting idle threads sleep peacefully out of the way without imposing any significant performance overhead. In my experience, this approach works very well for hundreds of clients; I cannot personally say how it will work for thousands.
2) Use one of the platform-specific interfaces that were introduced to tackle the C10K problem. That means epoll (Linux), kqueue (BSD/Mac), or completion ports (Windows). (If you think epoll is the same as poll, look again.) All of these will only notify your application about sockets that are actually ready, avoiding the wasteful linear scan across idle connections. There are several libraries that make these platform-specific interfaces easier to use, including libevent, libev, and Boost.Asio. You will find that all of them ultimately invoke epoll on Linux, kqueue on BSD, and so on, whenever such interfaces are available.

C++ asynchronous hostname resolving

I have an epoll server which sometimes opens outgoing connections, identified by their hostnames. Because of the high rate of incoming connections, I don't want to block somewhere like getaddrinfo() or gethostbyname(). Sure, I could implement a cache plus a new thread where the hostname resolution would be performed. Is there a single-threaded, non-blocking way to resolve a hostname to an IP?
There are various libraries for the purpose, e.g. libevent contains a resolver.
I sort of agree with @Puciek though; doing this in a single thread adds quite a bit of complexity for questionable benefit. Using a dedicated resolving thread and communicating with it through pipes might be the best solution.
Since you mention epoll I guess you're using Linux. It has a getaddrinfo_a function that if I understand correctly does part of this for you. It clones a thread and runs getaddrinfo there. I never used it though so can't help beyond that.

Can I use Boost::Asio and not worry about network programming problems?

I have to make a server in my current project but I have little to no experience in this area. My question is: can I just use Asio in my project and it will simply handle any problems a normal server has to face (partial reads, multithreading problems, ...)?
(My server will have to handle hundreds of clients at the same time)
ASIO takes care of the low-level socket programming and polling code. You still have to provide all the functionality to process raw network data. Ultimately, you get an unpredictable number of bytes from the network any time a read callback is called, and it is up to you to take those bytes and reconstruct your application message from them.
But indeed, as far as receiving an unspecified number of bytes is concerned, you won't have to worry about how that is implemented.
Multithreading is "easy" in the sense that you can run the ASIO processor multiple times concurrently, but it is your responsibility to provide a read callback that can deal with being run multiple times at once.
Asio is intentionally not multithreaded. It handles concurrency by multiplexing via the operating system's select(), kqueue, epoll, or other mechanism.
As for partial receives, there is no automatic way to get TCP to respect message boundaries. Asio can't do anything about that, so you'll need some technique at the application level to indicate completion. HTTP traditionally handles this by closing the socket when it's finished, though it's also possible to pre-send the size of the message.

Difference between non-blocking mode and async socket in a C++ winsock server

In C++, I've read some tutorials on creating a server that can accept connections from multiple clients. They suggest using an async socket, but I don't really know why we should choose async over non-blocking mode. And what about the idea of using multi-threading? Is it better than using an async socket? Thanks!!
Since you're requesting a solution in C++, Boost.Asio is, in my opinion, the best async I/O library there is.
I assume you're talking about the "one thread per client" solution when referring to "multi-threading", which is generally a very bad idea for servers that expect many clients in a short time frame or connected at the same time.
Threads are far too resource-consuming for this use, plus you have to take care of mutual exclusion, which in combination with blocking calls can drive you into deadlocks very quickly. And that's the mildest of the things you can run into.
On top of that, it's very easy for an attacker to make such a server get stuck. You will spend a lot of time trying to design your code so that this is avoided, which leads to unreadable, hard-to-maintain, error-prone code.
In Boost.Asio the specified thread(s) (those that call io_service::run) only do work when there is actually work to do, dispatching directly to the object assigned to the task.
So technically async is also blocking, with the difference that only the scheduler waits for work to do, while the functions you hand work to (connect, send, receive, ...) return immediately.
I'll assume you're talking TCP and not UDP. I definitely recommend skipping async sockets; those are favored by Microsoft and its supporters but are not portable. Instead use the vanilla stuff: here's an example with server and client.

Is epoll a bad idea for udp client?

I have created a Linux server using epoll, and I realized that the clients will use UDP packets...
I just erased the "listen" part from my code and it seems to be working, but I was wondering about any hidden issues or problems I might face.
Also, is it a bad idea to use epoll for a server if the clients are sending UDP packets?
If the respective thread does not need to do anything else but receive UDP packets, you can just as well block on recvfrom; this will have the exact same effect with one less syscall and less code complexity.
On the other hand, if you need to do other things periodically or under timing constraints that should not depend on whether packets arrive on the wire, it's better to use epoll anyway, even if it seems like overkill.
The big advantage of epoll is that besides being reasonably efficient, it is comfortable and extensible (you can plug in a signalfd, timerfd or eventfd and many other things).