I have a C++ non-blocking server socket, with all the clients stored in a std::map.
I can call the send() method on each ClientObject to send something to the connected client, and that works pretty well already.
But for sending a message to all of them (a broadcast?) I want to know:
is there something better than doing a for loop over all the clients and calling ClientObject->send("foo") on each iteration?
Or should I just take a peek at multicast sockets?
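For reference, this is roughly what the loop looks like right now (simplified; the real ClientObject wraps a non-blocking socket, and the map's key type here is just for illustration):

    #include <map>
    #include <string>

    class ClientObject {
    public:
        void send(const std::string& msg);   // writes to this client's non-blocking socket
    };

    std::map<int, ClientObject*> clients;    // key type is illustrative

    void broadcast(const std::string& msg)
    {
        for (auto& entry : clients)
            entry.second->send(msg);
    }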
Thanks in advance.
Rag.
Multicast is only an option if you're communicating over a LAN. It won't work over the Internet.
What you may want to do here is multiplex the sockets using asynchronous I/O. This allows you to send data to multiple sockets at the same time, and use asynchronous event handlers to deal with each transmission.
I would recommend looking into Boost ASIO for a portable way to do this. You can also use OS-specific system calls (such as poll/select on UNIX or epoll on Linux) to do this, but it is a lot more complicated.
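For example, a broadcast with Boost.Asio still loops over the clients, but each write is asynchronous, so one slow client can't stall the others. A minimal sketch, assuming the clients live in a hypothetical std::map of connected Boost.Asio TCP sockets:

    #include <boost/asio.hpp>
    #include <map>
    #include <memory>
    #include <string>

    using boost::asio::ip::tcp;

    // clients: id -> connected socket (assumed to be filled in by the accept logic)
    void broadcast(std::map<int, std::shared_ptr<tcp::socket>>& clients,
                   const std::string& text)
    {
        // Keep the buffer alive until every completion handler has run.
        auto message = std::make_shared<std::string>(text);

        for (auto& entry : clients) {
            boost::asio::async_write(*entry.second, boost::asio::buffer(*message),
                [message](const boost::system::error_code& ec, std::size_t /*bytes*/) {
                    if (ec) {
                        // a failed write usually means the client disconnected
                    }
                });
        }
    }

In a real server each socket would also want its own write queue, so that two overlapping broadcasts don't issue concurrent async_write calls on the same socket.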
Multicast would be much preferable... as long as you are talking about local nodes i.e. within the "broadcast/multicast" domain on the LAN.
Of course there are multicast distribution protocols for wider dispersion of such messages, but they are seldom used and, depending on your specific case, you may or may not be able to rely on such a facility.
The use of multicast translates to big savings from the sender's point of view: only one send operation needs to occur instead of n sends.
You'd be better off doing UDP unicast to each host unless you have very expensive switches. Yes, broadcast/multicast can actually be slower on most switches, which have much wimpier CPUs than your PCs; doing anything other than simple forwarding slows them down tremendously.
Do a benchmark to find out.
Asynch socket programming is definitely the way to go! :)
I recently started writing some C++ code that uses sockets, which I'd like to be asynchronous. I've read many posts about how poll and select can be used to make my sockets asynchronous (using poll or select to wait for a send or recv buffer), but on my server side I have an array of struct pollfd, where every time the listening socket accepts a connection, it adds it to the array of struct pollfd so that it can monitor that socket's recv (POLLIN).
My problem is that if I have 5000 sockets connected to my listening socket on my server, then the array of struct pollfd would be of size 5000, since it would be monitoring all the connected sockets. BUT the only way I know how to check if a recv for a socket is ready is by looping through all the items in the array of struct pollfd to find the ones whose revents equals POLLIN. This just seems kind of inefficient when the number of connected sockets becomes very large. Is there a better way to do this?
How does the boost::asio library handle async_accept, async_send, etc...? How should I handle it?
What the heck, I will go ahead and write up an answer.
I am going to ignore the "asynchronous" vs "non-blocking" terminology because I believe it is irrelevant to your question.
You are worried about performance when handling thousands of network clients, and you are right to be worried. You have rediscovered the C10K problem. Back when the Web was young, people saw a need for a small number of fast servers to handle a large number of (relatively) slow clients. The existing select/poll type interfaces require linear scans -- in both kernel and user space -- across all sockets to determine which are ready. If many sockets are often idle, your server can wind up spending more time figuring out what work to do than doing actual work.
Fast-forward to today, where we have basically two approaches for dealing with this problem:
1) Use one thread per socket and just issue blocking reads and writes. This is usually the simplest to code, in my opinion, and modern operating systems are quite good at letting idle threads sleep peacefully out of the way without imposing any significant performance overhead. In my experience, this approach works very well for hundreds of clients; I cannot personally say how it will work for thousands.
2) Use one of the platform-specific interfaces that were introduced to tackle the C10K problem. That means epoll (Linux), kqueue (BSD/Mac), or completion ports (Windows). (If you think epoll is the same as poll, look again.) All of these will only notify your application about sockets that are actually ready, avoiding the wasteful linear scan across idle connections. There are several libraries that make these platform-specific interfaces easier to use, including libevent, libev, and Boost.Asio. You will find that all of them ultimately invoke epoll on Linux, kqueue on BSD, and so on, whenever such interfaces are available.
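To make the difference concrete, here is a minimal epoll sketch (Linux-only, error handling omitted): epoll_wait returns only the descriptors that are actually ready, so there is no scan across thousands of idle sockets.

    #include <sys/epoll.h>
    #include <unistd.h>

    void event_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);

        epoll_event ev{};
        ev.events  = EPOLLIN;
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        epoll_event ready[64];
        for (;;) {
            // Blocks until at least one registered fd is ready; only those
            // fds are returned, however many thousands are registered.
            int n = epoll_wait(epfd, ready, 64, -1);
            for (int i = 0; i < n; ++i) {
                if (ready[i].data.fd == listen_fd) {
                    // accept() the new connection and EPOLL_CTL_ADD its fd here
                } else {
                    // recv() from ready[i].data.fd here
                }
            }
        }
    }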
I'm developing a peer-to-peer message parsing application, so one peer may need to handle many clients. There is also the possibility of sending and receiving large data (~20 MB as one message), and there can be situations where many peers send large data to the same peer. I've heard there are many solutions to handle this kind of situation:
Use a thread per peer
Use a loop to go through the peers and receive if there is data
Use the select function
etc.
What is the most suitable methodology, or the most common and accepted way, to handle this kind of situation? Any advice or hints are welcome.
Updated: Is there a good peer-to-peer distributed computing library or framework for C++ on the Windows platform?
Don't use a thread per peer; past the number of processors, additional threads are likely only to hurt performance. You'd also be expected to tweak dwStackSize so that 1000 idle peers don't cost you 1000 MB of RAM.
You can use a thread pool (X threads handling Y sockets) to get a performance boost (or, ideally, I/O completion ports), but this tends to work incredibly well for certain kinds of applications and not at all for others. Unless you're certain that yours is suited to this, I couldn't justify taking the risk.
It's entirely permissible to use a single thread and poll/send from a large number of sockets. I don't know precisely when "large" starts to carry a noticeable overhead, but I'd (conservatively) ballpark it somewhere between 2k and 5k sockets (on below-average hardware).
The workaround for WSAEWOULDBLOCK is to keep a std::queue<BYTE> of bytes (not a queue of "packet objects") for each socket in your application (you populate this queue with the data you want to send), and have a single background thread whose sole purpose is to drain the queues into the respective sockets with send (X bytes at a time). You can use blocking sockets for this now (since it's a background worker), but if you do use a non-blocking socket and get WSAEWOULDBLOCK, you can just keep trying to drain the queue (here it won't obstruct the flow of your application).
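A rough sketch of that pattern (Winsock assumed; a std::deque<BYTE> is used here instead of std::queue so a partial send can drop just the bytes that actually went out, and all names are illustrative):

    #include <winsock2.h>
    #include <algorithm>
    #include <deque>
    #include <map>
    #include <mutex>

    struct Outgoing {
        std::deque<BYTE> bytes;
        std::mutex lock;
    };

    // One outgoing buffer per socket. (A real version would also guard the
    // map itself against concurrent insertion of new sockets.)
    std::map<SOCKET, Outgoing> g_outgoing;

    // Called from the application threads: only appends, never blocks.
    void queueSend(SOCKET s, const BYTE* data, int len)
    {
        Outgoing& out = g_outgoing[s];
        std::lock_guard<std::mutex> guard(out.lock);
        out.bytes.insert(out.bytes.end(), data, data + len);
    }

    // One pass of the single background sender thread.
    void drainOnce()
    {
        for (auto& entry : g_outgoing) {
            Outgoing& out = entry.second;
            std::lock_guard<std::mutex> guard(out.lock);
            while (!out.bytes.empty()) {
                char chunk[4096];
                int n = (int)std::min(out.bytes.size(), sizeof(chunk));
                std::copy(out.bytes.begin(), out.bytes.begin() + n, chunk);

                int sent = send(entry.first, chunk, n, 0);
                if (sent > 0)
                    out.bytes.erase(out.bytes.begin(), out.bytes.begin() + sent);
                if (sent < n)        // WSAEWOULDBLOCK or an error: retry next pass
                    break;
            }
        }
    }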
You could use libtorrent.org, which is built on top of Boost (Boost.Asio). It focuses on efficiency and scalability.
I don't have much experience developing sockets in C++, but in C# I had really good results accepting connections asynchronously and passing them to their own thread from a thread pool.
I have created a Linux server using epoll, and then I realized that the clients will use UDP packets...
I just erased the "listen" part from my code and it seems to work, but I was wondering about any hidden issues or problems I might face.
Also, is it a bad idea to use epoll for a server if clients are sending UDP packets?
If the respective thread does not need to do anything else but receive UDP packets, you can just as well block on recvfrom; this has the exact same effect with one less syscall and less code complexity.
On the other hand, if you need to do other things periodically, or with timing constraints that should not depend on whether packets arrive on the wire, it's better to use epoll anyway, even if it seems like overkill.
The big advantage of epoll is that besides being reasonably efficient, it is comfortable and extensible (you can plug in a signalfd, timerfd or eventfd and many other things).
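As a small illustration of that extensibility (Linux-specific; error handling omitted; udp_fd is assumed to be an already-bound UDP socket), a timerfd can sit in the same epoll set as the socket, so packet arrival and periodic work share one wait loop:

    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <sys/timerfd.h>
    #include <unistd.h>
    #include <cstdint>

    void serve(int udp_fd)
    {
        int epfd = epoll_create1(0);

        // A timer that fires after one second and then once per second.
        int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
        itimerspec every_second{{1, 0}, {1, 0}};   // {interval, initial expiration}
        timerfd_settime(tfd, 0, &every_second, nullptr);

        epoll_event ev{};
        ev.events  = EPOLLIN;
        ev.data.fd = udp_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, udp_fd, &ev);
        ev.data.fd = tfd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, tfd, &ev);

        epoll_event ready[16];
        for (;;) {
            int n = epoll_wait(epfd, ready, 16, -1);
            for (int i = 0; i < n; ++i) {
                if (ready[i].data.fd == tfd) {
                    uint64_t expirations;
                    read(tfd, &expirations, sizeof expirations);   // acknowledge the timer
                    // periodic housekeeping goes here
                } else {
                    char buf[2048];
                    sockaddr_storage peer{};
                    socklen_t len = sizeof peer;
                    recvfrom(udp_fd, buf, sizeof buf, 0,
                             reinterpret_cast<sockaddr*>(&peer), &len);
                    // handle the datagram here
                }
            }
        }
    }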
I'm programming a network protocol over UDP, using C/C++ in Linux. The protocol must provide reliability, so I'm going to simulate something like TCP retransmission over UDP.
This can be done using pthreads or fork, but I believe that's overkill and consumes a lot of system resources. A better approach is to exploit a scheduler.
I probably can't use the Linux internal scheduler, since I'm programming in user space. Are there standard C/C++ libraries to accomplish this? How about 3rd-party libraries?
Edit: Some people asked why I'm doing this. Why not use TCP instead?
The answer is that I'm implementing a tunneling protocol. If someone tunnels TCP over TCP, the efficiency drops considerably. Here's more info: Why TCP Over TCP Is A Bad Idea.
The "scheduler" you're after is called "select", and it's a user-space call available in linux. Type "man 2 select" to read the help page for how to use it.
If you need a timeout, just call select() with a timeout value. The select call will return either when new data has arrived, or a timeout has expired. You can then do retransmissions if there was a timeout.
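A minimal sketch of that pattern (POSIX; sock is assumed to be the UDP socket awaiting an acknowledgement):

    #include <sys/select.h>

    // Returns true if data is ready to read, false if the timeout expired
    // (i.e. it's time to retransmit).
    bool wait_for_reply(int sock, int timeout_ms)
    {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(sock, &readfds);

        timeval tv{};
        tv.tv_sec  = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int ready = select(sock + 1, &readfds, nullptr, nullptr, &tv);
        return ready > 0;    // 0 means timeout, negative means error (errno is set)
    }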
Here's a sample of how to accomplish asynchronous coroutines with Boost. Boost manages the overhead of creating a thread to run the coroutine in this case so that you don't need to. If you would like the kernel to manage your interrupts, you can use alarm & setitimer, but they're very limited in what they can do.
Any solution will include threads, forks, or some variant of them at some level, unless you synchronously manage the transmission in the main thread using something like select().
It's not clear what exactly you are trying to schedule. You can use libevent for an efficient and somewhat portable interface. This is basically similar to Matthew's suggestion of using select, but using the most efficient interface (which select is not) on FreeBSD, Linux and Mac OS X (their page now claims Windows support as well, but I'm not too familiar with that).
This will give you the ability to do non-blocking, event-driven network calls. It will not solve the scheduling part. Doing it in a separate thread is not going to hurt your performance. I think running a pthread per connection is not the best approach, but having a single scheduling thread and some worker threads dealing with the network events, and maybe some non-trivial processing, usually works well.
I am coding a socket server for 1000 clients maximum; the server is for my game. I'm using non-blocking sockets and about 10 threads that receive data simultaneously from different sockets (the first thread receives from 0-100, the second from 101-200, and so on).
But if thread 1 wants to send data to all 1000 clients and thread 2 also wants to send data to all 1000 clients at the same time, is that safe? Is there any chance of the data getting messed up on the other (client) side?
If yes, I guess the only problem that can happen is that sometimes the client would receive 2 or 10 packets as 1 packet. Is that correct? If so, is there any solution to that? :(
The usual pattern for dealing with many sockets is to have a dedicated thread polling for I/O events with select(2), poll(2), or better kqueue(2) or epoll(4) (depending on the platform), acting as a socket event dispatcher. The sockets are usually handled in non-blocking mode. Then one might have a pool of threads reacting to the events and either doing reads and writes directly or via lower-level buffers/queues.
All sorts of techniques are applicable here - from queues to event subscription whiteboards. It gets tricky with multiplexing accepts/reads/writes/EOFs on the I/O level and with event arbitration on the application level. Several libraries like libevent and boost::asio help structure the lower level (the ACE library is also in this space, but I'd hate recommending it to anybody). You would have to come up with application-level protocols and state machines yourself (again boost::statechart might be of help).
Some good links to get a better understanding of what you are up against (this is probably the millionth time they are mentioned here on SO):
The C10K problem
High-Performance Server Architecture
Apologies for not offering a concrete solution, but this is a very wide design question and most decisions depend heavily on the context (lots of fun though). Hope this helps a bit.
Since you are sending data using different sockets, there shouldn't be any problem. Rather, when these different threads access the same data, you have to ensure data integrity.
Are you using UDP or TCP sockets?
If UDP, each write should be encapsulated in a separate packet and should be carried to the other side intact. The order may be swapped (as it may for any UDP packet) but they should be whole.
If TCP, there's no concept of message boundaries in the byte stream, and any 10 writes on one side may be bundled up into a single read on the other side. TCP writes may also only accept part of your buffer, so even if the send() function is atomic, your write isn't necessarily. In this case you'd need to synchronize it.
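One common way to do that synchronization is a per-socket lock held for each complete message, plus a length prefix so the receiver can split the byte stream back into messages. A rough POSIX sketch (names are illustrative):

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <cstdint>
    #include <mutex>
    #include <string>

    std::mutex write_lock;   // in a real server: one mutex per socket

    // Sends a 4-byte length prefix followed by the payload, looping on short
    // writes, while holding the lock so two threads can't interleave messages.
    bool sendMessage(int sock, const std::string& payload)
    {
        std::lock_guard<std::mutex> guard(write_lock);

        uint32_t len = htonl(static_cast<uint32_t>(payload.size()));
        const char*  parts[2] = { reinterpret_cast<const char*>(&len), payload.data() };
        const size_t sizes[2] = { sizeof len, payload.size() };

        for (int p = 0; p < 2; ++p) {
            size_t off = 0;
            while (off < sizes[p]) {
                ssize_t n = send(sock, parts[p] + off, sizes[p] - off, 0);
                if (n <= 0)
                    return false;   // error or connection closed
                off += static_cast<size_t>(n);
            }
        }
        return true;
    }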
send() is not atomic in most implementations, so sending to 1000 different sockets from multiple threads could lead to mixed-up messages arriving on the client side, and all kinds of weirdness. (I know nothing; see Nicolai's and Robert's comments below. The rest of my comment still stands though, in terms of being a solution to your problem.)
What I would do is use threads for sending like you use them for receiving: one thread to manage sending to one (or more) sockets, which ensures that you don't write to one socket from multiple threads at the same time.
Also look here for some additional discussion and more interesting links.
If you're on Windows, the Winsock Programmer's FAQ is an invaluable resource; for your issue, see here.