Problems implementing a multi-threaded UDP server (threadpool?)

Problems implementing a multi-threaded UDP server (threadpool?) - c++

I am writing an audio streamer (client-server) as a project of mine (C/C++),
and I decided to make a multi threaded UDP server for this project.
The logic behind this is that each client will be handled in his own thread.
The problems I`m having are the interference of threads to one another.
The first thing my server does is create a sort of a thread-pool; it creates 5
threads that all are blocked automatically by a recvfrom() function,
though it seems that, on most of the times when I connect another device
to the server, more than one thread is responding and later on
that causes the server to be blocked entirely and not operate further.
It's pretty difficult to debug this as well so I write here in order
to get some advice on how usually multi-threaded UDP servers are implemented.
Should I use a mutex or semaphore in part of the code? If so, where?
Any ideas would be extremely helpful.

Take a step back: you say
each client will be handled in his own thread
but UDP isn't connection-oriented. If all clients use the same multicast address, there is no natural way to decide which thread should handle a given packet.
If you're wedded to the idea that each client gets its own thread (which I would generally counsel against, but it may make sense here), you need some way to figure out which client each packet came from.
That means either
using TCP (since you seem to be trying for connection-oriented behaviour anyway)
reading each packet, figuring out which logical client connection it belongs to, and sending it to the right thread. Note that since the routing information is global/shared state, these two are equivalent:
keep a source IP -> thread mapping, protected by a mutex, read & access from all threads
do all the reads in a single thread, use a local source IP -> thread mapping
The first seems to be what you're angling for, but it's poor design. When a packet comes in you'll wake up one thread, then it locks the mutex and does the lookup, and potentially wakes another thread. The thread you want to handle this connection may also be blocked reading, so you need some mechanism to wake it.
The second at least gives a seperation of concerns (read/dispatch vs. processing).
Sensibly, your design should depend on
number of clients
I/O load
amount of non-I/O processing (or IO:CPU ratio, or ...)

The first thing my server does is create a sort of a thread-pool; it creates 5 threads that all are blocked automatically by a recvfrom() function, though it seems that, on most of the times when I connect another device to the server, more than one thread is responding and later on that causes the server to be blocked entirely and not operate further
Rather than having all your threads sit on a recvfrom() on the same socket connection, you should protect the connection with a semaphore, and have your worker threads wait on the semaphore. When a thread acquires the semaphore, it can call recvfrom(), and when that returns with a packet, the thread can release the semaphore (for another thread to acquire) and handle the packet itself. When it's done servicing the packet, it can return to waiting on the semaphore. This way you avoid having to transfer data between threads.

Your recvfrom should be in the master thread and when it gets data you should pass the address IP:Port and data of the UDP client to the helper threads.
Passing the IP:port and data can be done by spawning a new thread everytime the master thread receives a UDP packet or can be passed to the helper threads through a message queue

I think that your main problem is the non-persistent udp connection. Udp is not keeping your connections alive, it exchanges only two datagrams per session. Depending on your application, in the worst case, it will have concurrent threads reading from the first available information, ie, recvfrom() will unblock even if it is not it's turn to do it.
I think the way to go is using select in the main thread and, with a concurrent buffer, manage what wich thread will do.
In this solution, you can have one thread per client, or one thread per file, assuming that you keep the clients necessary information to make sure you're sending the right file part.
TCP is another way to do it, since it keeps the connection alive for every thread you run, but is not the best transmission way on data lost allowed applications.

Related

C++ select()/threads or an alternative to handle multiple clients in a server

I am trying to learn C++ (with prior programming knowledge) by creating a server application with multiple clients. Server application will run on Raspberry Pi/Debian (Raspbian). I thought this would also be a good opportunity to learn about low-level concurrent programming with threads (e.g. POSIX). Then I came across with select() function which basically allows usage of blocking functions in a single thread to handle multiple clients, which is interesting. Some people here on StackOverflow mentioned that threads cause a lot of overhead and select() seems to be a nice alternative.
In reality, I will have 1-3 clients connected but I would like to keep my application flexible. As a structure design, I was thinking about a main thread invoking a data thread (processing stuff non-stop) and a server thread (listening for incoming connections). Since accept() call is blocking, the latter one needs to be a separate thread. If a client connects, then for each client, I may need a separate thread as well.
At the end, worker thread will write to the shared memory and client threads will read from there and communicate with the clients. Some people were opposing to the usage of threads but in my understanding, threads are good if they are invoked rarely (and long living) and if there are blocking function calls. For the last one as it seems there is the select() function, which used in a loop, allows for handling of multiple sockets in a single thread.
I think at least for the data processing and server accept() call, I will need 2 separate threads initiated at the beginning. I may handle all clients with select() in a single thread or separate threads. What would be the correct approach and are there smarter alternatives?

communication between main() and thread()

I have a question about threads communication.
there is client and server.
server:
main function- its job is to listen to some port (TCP communication) and get commands from the client
the thread job is to transmit video fluently to the client.
client:
main function- transmit commands to server
thread- watch the video
the TCP\video part works fine.
after the main function of the server, got the command from the client, I need to send the command to the video thread and send back from the video thread to the server's main- "o.k" .
the problem is to send commands from the server's main, to the video thread and vice versa.
its enough that the command will be one variable..
any ideas?
thanks!

The pipe is a bad approach towards two way communication, you could use, shared memory.
In shared memory, both processes have access to some memory that can be used read or write, such that writes in one are visible in the reads of the other and vice versa.
for more details on shared memory http://www.cs.cf.ac.uk/Dave/C/node27.html

if threads and one variable then use atomic variable. if object then use locking (trylock inside video streaming loop and lock write command inside main). if you want commands as queue-ed then use safe concurent queue
I think on your case:
I would do it with two Wait-free ring buffer from boost examples. Making two single producer -consumer.In one producer will be main function consumer other thread,on another one vice versa. (it be like using two pipes in unix but efficient one)
A wait-free ring buffer provides a mechanism for relaying objects from one single "producer" thread to one single "consumer" thread without any locks. The operations on this data structure are "wait-free" which means that each operation finishes within a constant number of steps. This makes this data structure suitable for use in hard real-time systems or for communication with interrupt/signal handlers.
Wait-free ring buffer
But considering that I am not aware of your situation there are might be more right ways.Maybe redesining at all

Receive/Send in single thread or separate threads?

I was in a discussion about Multiple Threads in a client application and was told that using a separate thread for receiving data and another thread for sending data is not the way to go.
Why?
From what I know TCP is Full-Duplex so this would be a performance improvement, or not?

Having a dedicated send thread and a dedicated receive thread is bad for two reasons.
First, it means that a context switch is required every time you go from receiving to sending unless you are doing both at the same time.
Second, it means that in the the typical path where you receive a query, formulate a response, and then send that response, data will need to be handed from one thread to another, blowing out caches.
That said, if performance isn't super-critical and it fits into your design well, it certainly works. It's just that there's usually no advantage.

I suppose it depends on the scale of your application. If you are doing a small app for a class project, it might be enough to have the send and receive on the same thread. Then you don't have to worry about threading issues.
However, I worked on an application that had to listen for several thousand incoming connections, and each connection might be sending a significant amount of data. We had a thread whose sole purpose was listening for socket connections and putting the new connections into a pool, and a variable number of threads (depending on how busy the app was) just for reading off of sockets, and a different pool of threads for writing.
The problem is that if your listening socket isn't reading the data off of the wire fast enough and the buffer fills up, an error is returned, and in the case of thousands of clients, caused there to be a lot of reconnects and re-sends of data, which compounded the problem that the data was not being read fast enough in the first place.
So it comes back to what I said in the first place - it depends on the scale of your application, but why not add in the ability now? Just make sure that you are thread safe, and you should be OK.

Waiting on a condition (pthread_cond_wait) and a socket change (select) simultaneously

I'm writing a POSIX compatible multi-threaded server in c/c++ that must be able to accept, read from, and write to a large number of connections asynchronously. The server has several worker threads which perform tasks and occasionally (and unpredictably) queue data to be written to the sockets. Data is also occasionally (and unpredictably) written to the sockets by the clients, so the server must also read asynchronously. One obvious way of doing this is to give each connection a thread which reads and writes from/to its socket; this is ugly, though, since each connection may persist for a long time and the server thus may have to hold hundred or thousand threads just to keep track of connections.
A better approach would be to have a single thread that handled all communications using the select()/pselect() functions. I.e., a single thread waits on any socket to be readable, then spawns a job to process the input that will be handled by a pool of other threads whenever input is available. Whenever the other worker threads produce output for a connection, it gets queued, and the communication thread waits for that socket to be writable before writing it.
The problem with this is that the communication thread may be waiting in the select() or pselect() function when output is queued by the worker threads of the server. It's possible that, if no input arrives for several seconds or minutes, a queued chunk of output will just wait for the communication thread to be done select()ing. This shouldn't happen, however--data should be written as soon as possible.
Right now I see a couple solutions to this that are thread-safe. One is to have the communication thread busy-wait on input and update the list of sockets it waits on for writing every tenth of a second or so. This isn't optimal since it involves busy-waiting, but it will work. Another option is to use pselect() and send the USR1 signal (or something equivalent) whenever new output has been queued, allowing the communication thread to update the list of sockets it is waiting on for writable status immediately. I prefer the latter here, but still dislike using a signal for something that should be a condition (pthread_cond_t). Yet another option would be to include, in the list of file descriptors on which select() is waiting, a dummy file that we write a single byte to whenever a socket needs to be added to the writable fd_set for select(); this would wake up the communications server because that particular dummy file would then be readable, thus allowing the communications thread to immediately update it's writable fd_set.
I feel intuitively, that the second approach (with the signal) is the 'most correct' way to program the server, but I'm curious if anyone knows either which of the above is the most efficient, generally speaking, whether either of the above will cause race conditions that I'm not aware of, or if anyone knows of a more general solution to this problem. What I really want is a pthread_cond_wait_and_select() function that allows the comm thread to wait on both a change in sockets or a signal from a condition.
Thanks in advance.

This is a fairly common problem.
One often used solution is to have pipes as a communication mechanism from worker threads back to the I/O thread. Having completed its task a worker thread writes the pointer to the result into the pipe. The I/O thread waits on the read end of the pipe along with other sockets and file descriptors and once the pipe is ready for read it wakes up, retrieves the pointer to the result and proceeds with pushing the result into the client connection in non-blocking mode.
Note, that since pipe reads and writes of less then or equal to PIPE_BUF are atomic, the pointers get written and read in one shot. One can even have multiple worker threads writing pointers into the same pipe because of the atomicity guarantee.

Unfortunately, the best way to do this is different for each platform. The canonical, portable way to do it is to have your I/O thread block in poll. If you need to get the I/O thread to leave poll, you send a single byte on a pipe that the thread is polling. That will cause the thread to exit from poll immediately.
On Linux, epoll is the best way. On BSD-derived operating systems (including OSX, I think), kqueue. On Solaris, it used to be /dev/poll and there's something else now whose name I forget.
You may just want to consider using a library like libevent or Boost.Asio. They give you the best I/O model on each platform they support.

Your second approach is the cleaner way to go. It's totally normal to have things like select or epoll include custom events in your list. This is what we do on my current project to handle such events. We also use timers (on Linux timerfd_create) for periodic events.
On Linux the eventfd lets you create such arbitrary user events for this purpose -- thus I'd say it is quite accepted practice. For POSIX only functions, well, hmm, perhaps one of the pipe commands or socketpair I've also seen.
Busy-polling is not a good option. First you'll be scanning memory which will be used by other threads, thus causing CPU memory contention. Secondly you'll always have to return to your select call which will create a huge number of system calls and context switches which will hurt overall system performance.

send over IP immediately on different thread

This is probably impossible, but i'm going to ask anyways. I have a multi-threaded program (server) that receives a request on a thread dedicated to IP communications and then passes it on to worker threads to do work, then I have to send a reply back with answers to the client and send it when it is actually finished, with as little delay as possible. Currently I am using a consumer/producer pattern and placing replies on a queue for the IP Thread to take off and send back to my client. This, however gives me no guarantee about WHEN this is going to happen, as the IP thread might not get scheduled any time soon, I cannot know. This makes my client, that is blocking for this call, think that the request has failed, which is obviously not the point.
Due to the fact I am unable to make changes in the client, I need to solve this sending issue on my side, the problem that I'm facing is that I do not wish to start sharing my IP object (currently only on 1 thread) with the worker threads, as then things get overly complicated. I wondered if there is some way I can use thread sync mechanisms to ensure that the moment my worker thread is finished, the IP thread will execute my send the reply back to the client?
Will manual/autoreset events do this for me or are these not guaranteed to wake up the thread immediately?

If you need it sent immediately, your best bet is to bite the bullet and start sharing the connection object. Lock it before accessing it, of course, and be sure to think about what you'll do if the send buffer is already full (the connection thread will need to deal with sending the portion of the message that didn't fit the first time, or the worker thread will be blocked until the client accepts some of the data you've sent). This may not be too difficult if your clients only have one request running at a time; if that's the case you can simply pass ownership of the client object to the worker thread when it begins processing, and pass it back when you're done.
Another option is using real-time threads. The details will vary between operating systems, but in most cases, if your thread has a high enough priority, it will be scheduled in immediately if it becomes ready to run, and will preempt all other threads with lower priority until done. On Linux this can be done with the SCHED_RR priority class, for example. However, this can negatively impact performance in many cases; as well as crashing the system if your thread gets into an infinite loop. It also usually requires administrative rights to use these scheduling classes.
That said, if scheduling takes long enough that the client times out, you might have some other problems with load. You should also really put a number on how fast the response needs to be - there's no end of things you can do if you want to speed up the response, but there'll come a point where it doesn't matter anymore (do you need response in the tens of ms? single-digit ms? hundreds of microseconds? single-digit microseconds?).

There is no synchronization mechanism that will wake a thread immediately. When a synchronization mechanism for which a thread is waiting is signaled, the thread is placed in a ready queue for its priority class. It can be starved there for several seconds before it's scheduled (Windows does have mechanisms that deal with starvation over 3-4 second intervals).
I think that for out-of-band, critical communications you can have a higher priority thread to which you can enqueue the reply message and wake it up (with a condition variable, MRE or any other synchronization mechanism). If that thread has higher priority than the rest of your application's threads, waking it up will immediately effect a context switch.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js