UDP send and receive in different threads - c++

How independent is the handling of UDP send and receive on the same socket in the Linux kernel? The use case I have is a worker thread sending UDP test traffic on (up to) 1000 sockets, and receiving the UDP replies in another worker thread. The receiver will be an epoll loop that also receives hardware send and receive timestamps on the socket error queue.
To clarify, when doing a sendmsg() syscall, will this temporarily block (or generate EAGAIN/EWOULDBLOCK) on the receiver thread receiving on the same socket? (i.e. if the send and receive happen to overlap in time) All sockets are set to non-blocking mode.
Another question is the granularity of locking in the kernel: if I send and receive with sendmmsg/recvmmsg, is the lock for that socket taken once per sendmmsg call, or once per UDP datagram in the sendmmsg?
UPDATE: I took a look at the original patch for sendmmsg in the Linux kernel; it seems the main benefit is avoiding multiple user/kernel space transitions. If any locking is done, it is probably done inside the individual calls to __sys_sendmsg:
https://lwn.net/Articles/441169/

Each system call is thread independent. So, as long as you don't involve per-process kernel data, both will run independently without disturbing each other.
A different matter is what the kernel does with system calls that refer to the same inode (in this case, the virtual node assigned to the socket you use to communicate). To serialize filesystem calls and make them atomic, the kernel normally takes an inode lock for the whole system call, whether it is a read, write or ioctl: even if a single write call writes a zillion bytes, the inode stays locked for the duration of that call.
In the TCP/IP stack this is done at the socket level, and in your case it is controlled by the AF_INET socket implementation. As far as UDP is concerned, sending or receiving a packet doesn't by itself touch shared resources that need a lock, but you'll have to look at your UDP implementation (or the socket layer) to see whether any locking is done and what its granularity is. Normally the lock, if there is one, should be held only while datagrams are loaded into or unloaded from the socket's buffers. (UDP has no stream buffering the way TCP does, but each socket still has a bounded send buffer and receive buffer for queued datagrams.)
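A rough way to see this from user space is to drive one socket from two threads at once. The following is only a sketch under assumptions that are mine, not the question's: a single non-blocking UDP socket on loopback that is connected to its own address (so everything it sends comes back to it), poll() instead of the epoll loop, and no timestamping. The helper names open_self_connected_udp and send_and_receive_concurrently are made up for illustration. It says nothing about lock granularity inside the kernel; it only shows that concurrent send() and recv() on the same descriptor are permitted and that the receiver seeing EAGAIN simply means "nothing queued yet".

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <cerrno>
#include <thread>

// Hypothetical helper: a non-blocking UDP socket on loopback that is
// connected to its own address, so it receives whatever it sends.
int open_self_connected_udp() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0 ||
        connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One thread sends `count` small datagrams while another thread receives
// on the same descriptor. Returns the number of datagrams received.
int send_and_receive_concurrently(int count) {
    assert(count > 0);
    int fd = open_self_connected_udp();
    if (fd < 0) return -1;
    std::atomic<int> received{0};

    std::thread receiver([&] {
        char buf[64];
        pollfd pfd{fd, POLLIN, 0};
        // Stop once everything arrived, or if nothing shows up for 2 s.
        while (received < count && poll(&pfd, 1, 2000) > 0) {
            // Drain until recv() reports EAGAIN (queue empty).
            while (recv(fd, buf, sizeof(buf), 0) > 0) ++received;
        }
    });

    std::thread sender([&] {
        const char payload[] = "probe";
        for (int i = 0; i < count; ++i) {
            // EAGAIN on send means the send buffer is full; retry.
            while (send(fd, payload, sizeof(payload), 0) < 0 &&
                   (errno == EAGAIN || errno == EWOULDBLOCK)) {
                std::this_thread::yield();
            }
        }
    });

    sender.join();
    receiver.join();
    close(fd);
    return received.load();
}
```

Calling send_and_receive_concurrently(50) on an idle machine should normally hand back every datagram, but since this is UDP the count is never guaranteed.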

Related

accept a socket in one thread and write data in different thread [duplicate]

I am implementing a simple server, that accepts a single connection and then uses that socket to simultaneously read and write messages from the read and write threads.
What is the safe and easy way to simultaneously read and write from the same socket descriptor in C/C++ on Linux?
I don't need to worry about multiple threads reading and writing from the same socket, as there will be a single dedicated read thread and a single dedicated write thread using the socket.
In the above scenario, is any kind of locking required?
Does the above scenario require non blocking socket?
Is there any opensource library, that would help in the above scenario?
In the above scenario, is any kind of locking required?
None.
Does the above scenario require non blocking socket?
The bit you're probably worried about - the read/recv and write/send threads on an established connection - do not need to be non-blocking if you're happy for those threads to sit there waiting to complete. That's normally one of the reasons you'd use threads rather than select, epoll, async operations, or io_uring - keeps the code simpler too.
If the thread accepting new clients is happy to block in the call to accept(), then you're all good there too.
Still, there's one subtle issue with TCP servers you might want to keep in the back of your mind... if your program grows to handle multiple clients and have some periodic housekeeping to do. It's natural and tempting to use a select or epoll call with a timeout to check for readability on the listening socket - which indicates a client connection attempt - then accept the connection. There's a race condition there: the client connection attempt may have dropped between select() and accept(), in which case accept() will block if the listening socket's not non-blocking, and that can prevent a timely return to the select() loop and halt the periodic on-timeout processing until another client connects.
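The usual way around that race is to make the listening socket non-blocking, so that accept() returns immediately when the pending connection has vanished. A minimal sketch, assuming a loopback TCP listener on a kernel-chosen port; make_nonblocking_listener, try_accept and AcceptResult are hypothetical names, not part of any API:

```cpp
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cerrno>

// Hypothetical helper: TCP listener on an ephemeral loopback port,
// switched to non-blocking mode before it is handed to select()/epoll.
int make_nonblocking_listener() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0 ||
        fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

enum class AcceptResult { Accepted, NothingPending, Error };

// Call this after select()/epoll reported the listener readable.
// If the client gave up in the meantime, accept() fails with
// EAGAIN/EWOULDBLOCK (or ECONNABORTED) instead of blocking the loop.
AcceptResult try_accept(int listener, int& client_out) {
    assert(listener >= 0);
    int client = accept(listener, nullptr, nullptr);
    if (client >= 0) {
        client_out = client;
        return AcceptResult::Accepted;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK ||
        errno == ECONNABORTED || errno == EINTR) {
        return AcceptResult::NothingPending;  // go back to the select() loop
    }
    return AcceptResult::Error;
}
```

With this in place the periodic on-timeout processing keeps running: a NothingPending result just means "return to select() and try again later".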
Is there any opensource library, that would help in the above scenario?
There are hundreds of libraries for writing basic servers (and asking for 3rd party lib recommendations is off-topic on SO so I won't get into it), but ultimately what you've asked for is easily achieved atop an OS-provided BSD sockets API or the Windows bastardisation ("winsock").
Sockets are BI-DIRECTIONAL. If you've ever actually dissected an Ethernet or Serial cable or seen the low-level hardware wiring diagram for them, you can actually SEE distinct copper wires for the "TX" (transmit) and "RX" (receive) lines. The software for sending the signals, from the device controller up to most OS APIs for a 'socket', reflects this and it is the key difference between a socket and an ordinary pipe on most systems (e.g. Linux).
To really get the most out of sockets, you need:
1) Async IO support that uses IO Completion Ports, epoll(), or some similar async callback or event system to 'wake up' whenever data comes in on the socket. This then must call your lowest-level 'ReadData' API to read the message off the socket connection.
2) A 2nd API that supports the low-level writes, a 'WriteData' (transmit) that pushes bytes onto the socket and does not depend on anything the 'ReadData' logic needs. Remember, your send and receive are independent even at the hardware level, so don't introduce locking or other synchronization at this level.
3) A pool of Socket IO threads, which blindly do any processing of data that is read from or will be written to a socket.
4) PROTOCOL CALLBACK: A callback object the socket threads have smart pointers to. It handles any PROTOCOL layer, such as parsing your data blob into a real HTTP request, that sits on top of the basic socket connection. Remember, a socket is just a data pipe between computers, and data sent over it will often arrive as a series of fragments (the packets). In protocols like UDP the packets aren't even guaranteed to arrive in order. The low-level 'ReadData' and 'WriteData' will call back from their threads into here, because it is where content-aware data processing actually begins.
5) Any callbacks the protocol handler itself needs. For HTTP, you package the raw request buffers into nice objects that you hand off to a real servlet, which should return a nice response object that can be serialized into an HTTP spec-compliant response.
Notice the basic pattern: You have to make the whole system fundamentally async (an 'onion of callbacks') if you wish to take full advantage of bi-directional, async IO over sockets. The only way to read and write simultaneously to the socket is with threads, so you could still synchronize between a 'writer' and 'reader' thread, but I'd only do it if the protocol or other considerations forced my hand. The good news is that you can get great performance with sockets using highly async processing, the bad is that building such a system in a robust way is a serious effort.
You don't have to worry about it. One thread reading and one thread writing will work as you expect. Sockets are full duplex, so you can read while you write and vice-versa. You'd have to worry if you had multiple writers, but this is not the case.
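To make the "one reader, one writer, no lock" case concrete, here is a small sketch. It assumes a socketpair() standing in for the accepted TCP connection and an echoing peer thread playing the client; write_all and echo_round_trip are illustrative names. The reader and writer threads share the descriptor `ours` without any mutex.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Writes the whole buffer, coping with short writes.
static bool write_all(int fd, const char* buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n <= 0) return false;
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Sends `total` bytes through one end of a socket pair while a second
// thread reads the echoed bytes from that same end. Returns bytes read.
std::size_t echo_round_trip(std::size_t total) {
    assert(total > 0);
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return 0;
    const int ours = fds[0], peer = fds[1];

    // Stand-in for the remote client: echoes everything back.
    std::thread echo([peer] {
        char buf[4096];
        ssize_t n;
        while ((n = read(peer, buf, sizeof(buf))) > 0) {
            if (!write_all(peer, buf, static_cast<std::size_t>(n))) break;
        }
        shutdown(peer, SHUT_WR);  // signal end-of-stream to our reader
    });

    std::size_t received = 0;
    std::thread reader([ours, &received] {  // dedicated read thread
        char buf[4096];
        ssize_t n;
        while ((n = read(ours, buf, sizeof(buf))) > 0) {
            received += static_cast<std::size_t>(n);
        }
    });
    std::thread writer([ours, total] {      // dedicated write thread
        std::vector<char> chunk(4096, 'x');
        std::size_t left = total;
        while (left > 0) {
            std::size_t n = left < chunk.size() ? left : chunk.size();
            if (!write_all(ours, chunk.data(), n)) break;
            left -= n;
        }
        shutdown(ours, SHUT_WR);  // lets the echo thread see EOF
    });

    writer.join();
    reader.join();
    echo.join();
    close(ours);
    close(peer);
    return received;
}
```

Note that the two directions really do have to run concurrently here: with a large transfer, a single thread that wrote everything first and only then read would stall once both socket buffers filled up.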

boost::asio internal queue capacity

I was attempting to understand Boost Asio implementation and limitations. As I understand from here - https://www.boost.org/doc/libs/1_75_0/doc/html/boost_asio/overview/core/basics.html
When you do an async_receive_from call on a socket, the following things happen
1. The socket forwards the request to the I/O execution context.
2. The I/O execution context signals to the operating system that it should start an asynchronous connect.
3. The operating system indicates that the connect operation has completed by placing the result on a queue, ready to be picked up by the I/O execution context.
When using an io_context as the I/O execution context, your program must make a call to io_context::run() (or to one of the similar io_context member functions) in order for the result to be retrieved. A call to io_context::run() blocks while there are unfinished asynchronous operations, so you would typically call it as soon as you have started your first asynchronous operation.
Assuming I have very high throughput of data coming in, what I'm trying to understand is
Is there a possibility of data loss in step 2 above where IO execution context signals OS to perform the async receive operation? Can the OS get somehow overwhelmed with the volume of asynchronous reads?
In step 3 above, OS puts completed reads in a queue. What is the capacity of this queue? Can this queue overflow if for example, there was a burst of network traffic and all the threads running io_context::run() are occupied, hence read data keeps accumulating in the queue? Is this queue bounded or unbounded?
The ASIO code is open-source, but I'm fairly new to C++ and am finding it a little difficult to understand the code. Appreciate any help on these questions. Thanks!
There's no buffering in ASIO whatsoever; ASIO is a thin wrapper around native OS select/epoll/kqueue/IOCP (depending on OS) as well as non-blocking send/recv calls.
Your question can thus be rephrased as "what happens when I don't call recv fast enough?". As it turns out, that question has already been asked before, see What happens if one doesn't call POSIX's recv “fast enough”?.
Anyway, to answer the specific questions:
1. Is there a possibility of data loss in step 2 above where IO execution context signals OS to perform the async receive operation? Can the OS get somehow overwhelmed with the volume of asynchronous reads?
The OS can't get overwhelmed with async receive calls because you can have at most 1 active async receive and send per socket, and the number of sockets is limited.
2. ... What is the capacity of this queue? Can this queue overflow if for example, there was a burst of network traffic and all the threads running io_context::run() are occupied, hence read data keeps accumulating in the queue? Is this queue bounded or unbounded?
The queueing characteristics of a TCP stream are determined by the TCP receive buffer and TCP receive window. These are configurable in most modern OSes, and can even be dynamic. The receive buffer is bounded, and if you don't receive fast enough, TCP has built-in mechanisms to signal the sending side to slow down/retransmit (a.k.a. TCP Flow Control).
Similarly UDP has a receive buffer. When that one gets full, new incoming packets are dropped.
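The UDP case is easy to observe without ASIO at all. The sketch below uses plain POSIX sockets on loopback and is built on my own assumptions: 1024-byte payloads, a deliberately small SO_RCVBUF request (the kernel is free to round or adjust the value), and a hypothetical function name deliverable_after_flood. It sends a burst to a socket nobody is reading and only afterwards counts how many datagrams can still be read.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>

// Sends `datagrams` 1 KiB datagrams to an unread UDP socket whose receive
// buffer was requested to be `rcvbuf_bytes`, then counts what is readable.
int deliverable_after_flood(int datagrams, int rcvbuf_bytes) {
    assert(datagrams > 0 && rcvbuf_bytes > 0);
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port
    socklen_t len = sizeof(addr);
    if (rx < 0 || tx < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVBUF, &rcvbuf_bytes,
                   sizeof(rcvbuf_bytes)) < 0 ||
        bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return -1;
    }

    // Burst of traffic while the "application" is busy elsewhere.
    char payload[1024] = {};
    for (int i = 0; i < datagrams; ++i) {
        sendto(tx, payload, sizeof(payload), 0,
               reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    }

    // Only now start reading: whatever did not fit is already gone.
    pollfd pfd{rx, POLLIN, 0};
    poll(&pfd, 1, 200);
    int delivered = 0;
    char buf[2048];
    while (recv(rx, buf, sizeof(buf), MSG_DONTWAIT) > 0) ++delivered;

    close(rx);
    close(tx);
    return delivered;
}
```

The sender gets no error for the datagrams that were dropped, which is exactly the difference from the TCP flow-control behaviour described above.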

UDP send() to localhost under Winsock throwing away packets?

Scenario is rather simple... not allowed to use "sendto()" so using "send()" instead...
Under winsock2.2, normal operation on a brand new i7 machine running Windows 7 Professional...
Using SOCK_DGRAM socket, Client and Server console applications connect over localhost (127.0.0.1) to test things ...
Have to use packets of constant size...
Client socket uses connect(), Server socket uses bind()...
Client sends N packets using series of BLOCKING send() calls. Server only uses ioctlsocket call with FIONREAD, running in a while loop to constantly printf() number of bytes awaiting to be received...
PACKETS GET LOST UNLESS I PUT SLEEP() WITH CONSIDERABLE AMOUNT OF TIME... What I mean is the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
Have played with changing buffer sizes, situation did not change much, except now there is no overflow, but the problem with the delay remains the same ...
I have seen many discussions about the issue between send() and recv(), but in this scenario, recv() is not even involved...
Thoughts anyone?
(P.S. The constraints under which I am programming are required for reasons beyond my control, so no WSA, .NET, MFC, STL, BOOST, QT or other stuff)
It is NOT an issue of buffer overflow for three reasons:
1. Both incoming and outgoing buffers are set and checked to be significantly larger than ALL of the information being sent.
2. There is no recv(), only checking of the incoming buffer via ioctl() call, recv() is called long after, upon user input.
3. When Sleep() of >40ms is added between send()-s, the whole thing works, i.e. if there was an overflow no amount of Sleep() would have helped (again, see point (2) )
PACKETS GET LOST UNLESS I PUT SLEEP() WITH CONSIDERABLE AMOUNT OF TIME... What I mean is the number of bytes on the receiving socket differs between runs if I do not use SLEEP()...
This is expected behavior; as others have said in the comments, UDP packets can and do get dropped for any reason. In the context of localhost-only communication, however, the reason is usually that a fixed-size packet buffer somewhere is full and can't hold the incoming UDP packet. Note that UDP has no concept of flow control, so if your receiving program can't keep up with your sending program, packet loss is definitely going to occur as soon as the buffers get full.
As for what to do about it, the insert-a-call-to-sleep() solution isn't particularly good because you have no good way of knowing what the "right" sleep-duration ought to be. (Too short a sleep() and you'll still drop packets; too long a sleep() and you're transferring data more slowly than you might otherwise do; and of course the "best" value will likely vary from one computer to the next, or one moment to the next, in non-obvious ways).
One thing you could do is switch to a different transport protocol such as TCP, or (since you're only communicating within localhost), a simple pipe or socketpair. These protocols have the lossless FIFO semantics that you are looking for, so they might be the right tool for the job.
Assuming you are required to use UDP, however, UDP packet loss will be a fact of life for you, but there are some things you can do to reduce packet loss:
send() in blocking mode, or if using non-blocking send(), be sure to wait until the UDP socket select()'s as ready-for-write before calling send(). (I know you said you send() in blocking mode; I'm just including this for completeness)
Make your SO_RCVBUF setting as large as possible on the receiving UDP socket(s). The larger the buffer, the lower the chance of it filling up to capacity.
In the receiving program, be sure that the thread that calls recv() does nothing else that would ever hold it off from getting back to the next recv() call. In particular, no blocking operations (even printf() is a blocking operation that can slow your thread down, especially under Windows where the DOS prompt is infamous for slow scrolling under load)
Run your receiver's network recv() loop in a separate thread that does nothing else but call recv() and place the received data into a FIFO queue (or other shared data structure) somewhere. Then another thread can do the less time-critical work of examining and parsing the data in the FIFO, without fear of causing a dropped packet.
Run the UDP-receive thread at the highest priority you can convince the OS to let you run at. The fewer other tasks that can hold off the UDP-receive thread, the fewer opportunities for packets to get dropped during those hold-off periods.
Just keep in mind that no matter how clever you are at reducing the chances for UDP packet loss, UDP packet loss will still happen. So regardless you need to come up with a design that allows your programs to still function in a reasonably useful manner even when packets are lost. This could be done by implementing some kind of automatic-resend mechanism, or (depending on what you are trying to accomplish) by designing the protocol such that packet loss can simply be ignored.
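The "receive thread that does nothing but recv() and feed a FIFO" suggestion from the list above could look roughly like this. It is a sketch with assumptions of my own: POSIX sockets rather than Winsock and the C++ standard library for threads and the queue (which the question's constraints rule out, so treat it as an illustration of the structure only), loopback traffic, and made-up names DatagramFifo and receive_through_fifo.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical FIFO shared by the receive thread and the processing thread.
class DatagramFifo {
public:
    void push(std::string d) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(std::move(d));
        }
        cv_.notify_one();
    }
    // Waits up to `timeout` for a datagram; returns false on timeout.
    bool pop(std::string& out, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(m_);
        if (!cv_.wait_for(lock, timeout, [this] { return !q_.empty(); }))
            return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
};

// Sends `count` datagrams over loopback. A dedicated thread only calls
// recv() and queues; the calling thread does the (slow) processing.
// Returns how many datagrams were processed.
int receive_through_fifo(int count) {
    assert(count > 0);
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    socklen_t len = sizeof(addr);
    if (rx < 0 || tx < 0 ||
        bind(rx, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        getsockname(rx, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        if (rx >= 0) close(rx);
        if (tx >= 0) close(tx);
        return -1;
    }

    DatagramFifo fifo;
    std::atomic<bool> stop{false};
    std::thread receiver([&] {
        char buf[2048];
        pollfd pfd{rx, POLLIN, 0};
        while (!stop) {
            if (poll(&pfd, 1, 50) <= 0) continue;  // wake up to check `stop`
            ssize_t n = recv(rx, buf, sizeof(buf), MSG_DONTWAIT);
            if (n >= 0) fifo.push(std::string(buf, static_cast<std::size_t>(n)));
        }
    });

    for (int i = 0; i < count; ++i) {
        std::string msg = "datagram " + std::to_string(i);
        sendto(tx, msg.data(), msg.size(), 0,
               reinterpret_cast<sockaddr*>(&addr), sizeof(addr));
    }

    // Processing side: parsing, printf() and so on would go here.
    int processed = 0;
    std::string d;
    while (processed < count && fifo.pop(d, std::chrono::milliseconds(500)))
        ++processed;

    stop = true;
    receiver.join();
    close(rx);
    close(tx);
    return processed;
}
```

The point of the split is that the receive thread's only work between two recv() calls is one short push, so slow processing delays the consumer rather than the socket.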

Problems implementing a multi-threaded UDP server (threadpool?)

I am writing an audio streamer (client-server) as a project of mine (C/C++),
and I decided to make a multi threaded UDP server for this project.
The logic behind this is that each client will be handled in his own thread.
The problems I'm having are the interference of threads with one another.
The first thing my server does is create a sort of a thread-pool; it creates 5
threads that all are blocked automatically by a recvfrom() function,
though it seems that, on most of the times when I connect another device
to the server, more than one thread is responding and later on
that causes the server to be blocked entirely and not operate further.
It's pretty difficult to debug this as well so I write here in order
to get some advice on how usually multi-threaded UDP servers are implemented.
Should I use a mutex or semaphore in part of the code? If so, where?
Any ideas would be extremely helpful.
Take a step back: you say
each client will be handled in his own thread
but UDP isn't connection-oriented. If all clients use the same multicast address, there is no natural way to decide which thread should handle a given packet.
If you're wedded to the idea that each client gets its own thread (which I would generally counsel against, but it may make sense here), you need some way to figure out which client each packet came from.
That means either
using TCP (since you seem to be trying for connection-oriented behaviour anyway)
reading each packet, figuring out which logical client connection it belongs to, and sending it to the right thread. Note that since the routing information is global/shared state, these two are equivalent:
keep a source IP -> thread mapping, protected by a mutex, read & access from all threads
do all the reads in a single thread, use a local source IP -> thread mapping
The first seems to be what you're angling for, but it's poor design. When a packet comes in you'll wake up one thread, then it locks the mutex and does the lookup, and potentially wakes another thread. The thread you want to handle this connection may also be blocked reading, so you need some mechanism to wake it.
The second at least gives a separation of concerns (read/dispatch vs. processing).
Sensibly, your design should depend on
number of clients
I/O load
amount of non-I/O processing (or IO:CPU ratio, or ...)
The first thing my server does is create a sort of a thread-pool; it creates 5 threads that all are blocked automatically by a recvfrom() function, though it seems that, on most of the times when I connect another device to the server, more than one thread is responding and later on that causes the server to be blocked entirely and not operate further
Rather than having all your threads sit on a recvfrom() on the same socket connection, you should protect the connection with a semaphore, and have your worker threads wait on the semaphore. When a thread acquires the semaphore, it can call recvfrom(), and when that returns with a packet, the thread can release the semaphore (for another thread to acquire) and handle the packet itself. When it's done servicing the packet, it can return to waiting on the semaphore. This way you avoid having to transfer data between threads.
Your recvfrom should be in the master thread and when it gets data you should pass the address IP:Port and data of the UDP client to the helper threads.
Passing the IP:port and data can be done by spawning a new thread every time the master thread receives a UDP packet, or they can be passed to the helper threads through a message queue.
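For the master-thread variant, the only shared piece of state is the mapping from a client's IP:port to the helper thread that owns it, and if only the master thread touches that mapping it needs no mutex. A minimal sketch of such a table follows; ClientRouter and make_endpoint are hypothetical names, IPv4 is assumed, and new clients are simply assigned to workers round-robin. The master thread would call worker_for() on the address returned by recvfrom() and push the packet onto that worker's queue.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical routing table owned by the single reader thread.
// Because no other thread touches it, it needs no locking.
class ClientRouter {
public:
    explicit ClientRouter(std::size_t workers) : workers_(workers) {
        assert(workers > 0);
    }

    // Returns the index of the worker that handles the client at `from`.
    // Unknown clients are assigned to workers round-robin.
    std::size_t worker_for(const sockaddr_in& from) {
        Key key{from.sin_addr.s_addr, from.sin_port};
        auto it = table_.find(key);
        if (it == table_.end()) {
            it = table_.emplace(key, next_++ % workers_).first;
        }
        return it->second;
    }

    std::size_t known_clients() const { return table_.size(); }

private:
    using Key = std::pair<std::uint32_t, std::uint16_t>;  // (IP, port)
    std::map<Key, std::size_t> table_;
    std::size_t workers_;
    std::size_t next_ = 0;
};

// Hypothetical convenience for building an IPv4 endpoint by hand.
sockaddr_in make_endpoint(const char* ip, std::uint16_t port) {
    sockaddr_in a{};
    a.sin_family = AF_INET;
    a.sin_port = htons(port);
    inet_pton(AF_INET, ip, &a.sin_addr);
    return a;
}
```

Every datagram from the same IP:port then lands on the same worker, which is what gives each client "its own thread" without several threads competing in recvfrom().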
I think that your main problem is that UDP is connectionless. UDP does not keep a connection alive for you; every datagram stands on its own. Depending on your application, in the worst case you will have concurrent threads all reading whatever arrives first, i.e. recvfrom() will unblock in a thread even if it is not its turn.
I think the way to go is using select in the main thread and, with a concurrent buffer, managing what each thread will do.
In this solution, you can have one thread per client, or one thread per file, assuming that you keep the necessary per-client information to make sure you're sending the right file part.
TCP is another way to do it, since it keeps a connection alive for every thread you run, but it is not the best transport for applications that can tolerate data loss.
