What I understand about DPDK is that the NIC's ring buffer gets mapped to a userspace address and the data there gets processed on a polling basis. (Please correct me if I'm wrong.)
For that, how is the periodic polling carried out? Is there a process running in the background that periodically triggers the polling through an API provided by the PMD (poll mode driver)?
The polling is done in a straightforward loop, i.e.:
int main(int argc, char **argv) {
    // Init EAL and ports
    ...
    // Main loop
    while (!quit_flag) {
        // Receive a burst of packets (poll)
        struct rte_mbuf *bufs[BURST_SIZE];
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
        // Process packets
        ...
        // Send a burst of packets
        uint16_t nb_tx = rte_eth_tx_burst(port_id, queue_id, bufs, nb_rx);
        // Free any packets the TX queue did not accept
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}
Sure, it could be done on a separate thread or several (in DPDK we call them lcores), but the idea stays the same: the application model is up to the developer.
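For illustration, here is a minimal sketch of launching that same polling loop on the worker lcores with rte_eal_remote_launch (port and queue setup are assumed to happen elsewhere; lcore_main and quit_flag are placeholder names):

#include <stdlib.h>
#include <rte_eal.h>
#include <rte_launch.h>
#include <rte_lcore.h>
#include <rte_debug.h>

static volatile int quit_flag = 0;

// Placeholder worker: runs the rx/tx burst loop from above on its own lcore.
static int lcore_main(void *arg) {
    (void)arg;
    while (!quit_flag) {
        // rte_eth_rx_burst() / process / rte_eth_tx_burst(), as in the loop above
    }
    return 0;
}

int main(int argc, char **argv) {
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "Cannot init EAL\n");

    // ... port and queue setup ...

    unsigned lcore_id;
    // Launch the polling loop on every worker lcore
    // (RTE_LCORE_FOREACH_SLAVE in pre-20.11 DPDK releases).
    RTE_LCORE_FOREACH_WORKER(lcore_id)
        rte_eal_remote_launch(lcore_main, NULL, lcore_id);

    rte_eal_mp_wait_lcore();   // block until all workers return
    return 0;
}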
There are lots of examples in the DPDK repo.
DPDK also has a few frameworks that facilitate implementing an event-driven or pipeline application architecture.
For more details, see the DPDK Programmer's Guide.
I'm using Boost::Asio to create multiple UDP sockets and serial ports, and I use a single boost::asio::io_context that is shared among all of them (given in the constructor). All devices are configured to run with async reads and writes. Then, I only call io_context.run(); to let it run forever. It works very well most of the time.
However, at some point, for example when a sudden burst of traffic reaches one socket, my process suddenly jumps to 100% CPU and stays there forever, with all communication dying. When this happens, using perf, I see that the process is stuck 99.99% of the time at the same place, with a stack looking like:
main
asio::detail::scheduler::run
asio::detail::epoll_reactor::descriptor_state::do_complete
asio::descriptor_read_op<asio::mutable_buffers_1, std::_Bind<void
my_serial_port::on_async_data_received <- this is my receiving function for serial ports
...
So it seems that it is stuck processing only one serial port in a loop and nothing else, as if the same event were being processed endlessly, while there is still a lot of other data coming into the system.
Is there something I'm doing wrong by sharing the io_context?
Is there a way to debug such event-handling issues with Boost::Asio?
I have seen a similar hang, but where the stack only shows a function called by a timer event instead of the serial port (i.e. a timer sending a statistics packet at 1 Hz, but taking 100% of the CPU and blocking everything else).
Context:
On an embedded system using ROS and Linux, I'm running a process (ROS node) that acts as a communication router. It has 7 inputs/outputs: 2 serial ports (3 Mb/s), 2 network UDP sockets and 3 local UDP sockets (UNIX domain). It also listens to some ROS topics coming from other processes.
Packets can be received on all ports and a custom protocol is used to decode the packets, read their destination and send them out further on the given port. Some packets are also generated in the process and sent out on some ports, based on data subscribed through ROS.
To keep things simple, to avoid concurrency and because I only have one core available, I try to run this process on a single main thread.
To merge ROS and Boost::Asio together in a single thread, I'm using librosasio to forward events from ROS to the asio::io_context.
Thanks!
Defining BOOST_ASIO_ENABLE_HANDLER_TRACKING may give you some insight into where the problem lies.
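For reference, a minimal sketch of turning it on: the macro has to be defined before any Boost.Asio header is included (or passed as -DBOOST_ASIO_ENABLE_HANDLER_TRACKING to the compiler). With tracking enabled, Asio writes one line per handler creation/invocation to stderr, prefixed with @asio, which should reveal which handler keeps getting re-invoked:

#define BOOST_ASIO_ENABLE_HANDLER_TRACKING   // must precede Boost.Asio includes
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>

int main() {
    boost::asio::io_context io;
    boost::asio::steady_timer timer(io, std::chrono::milliseconds(100));
    // The async operation below and its completion are both logged to stderr.
    timer.async_wait([](const boost::system::error_code& ec) {
        std::cout << "timer fired: " << ec.message() << "\n";
    });
    io.run();
}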
I am starting to work with and understand the basics of DPDK and how it works with VMware (VMXNET3 PMD). I started browsing through the code base and found references to 3 ring structures in vmxnet3_tx_queue_t (at vmxnet3_ring.h), namely cmd_ring, data_ring and comp_ring.
I tried searching to understand their use cases and how they work, but couldn't quite find documentation on them, or wasn't able to understand it.
Any pointers / direction would be of great help.
The vmxnet3 is pretty decently described in the DPDK NIC documentation:
http://doc.dpdk.org/guides/nics/vmxnet3.html
The driver pre-allocates the packet buffers and loads the command ring descriptors in advance. The hypervisor fills those packet buffers on packet arrival and writes completion ring descriptors, which are eventually pulled by the PMD. After reception, the DPDK application frees the descriptors and loads new packet buffers for the coming packets.
In the transmit routine, the DPDK application fills packet buffer pointers into the descriptors of the command ring and notifies the hypervisor. In response, the hypervisor takes the packets, passes them to the vSwitch, and writes into the completion descriptor ring. The rings are read by the PMD in the next transmit routine call, and the buffers and descriptors are freed from memory.
Not sure though if those details are the "basics of DPDK", as those low level queues are abstracted by the DPDK Poll Mode Driver API:
https://doc.dpdk.org/guides/prog_guide/poll_mode_drv.html
So you'd better refer to this document and use this API, as you won't be able to use the vmxnet3 rings directly in your app anyway...
Is it possible to configure DPDK so that the NIC sends an interrupt whenever a packet is received (rather than turning off interrupts and having the core poll on the RX queue)? I know this seems counterintuitive but there is a use case I have in mind that could benefit from this.
DPDK claims to allow you to use interrupts for RX queues (you can call rte_eth_dev_rx_intr_enable and pass a port/queue pair as arguments), but upon digging through the code, it seems that this is misleading. There is a polling thread that calls epoll_wait, and upon receipt of a packet, calls eal_intr_process_interrupts. This function then goes through a list of callback functions (which are supposed to be the interrupt handlers) and executes each one. The function then calls epoll_wait again (i.e. it is in an infinite loop).
Is my understanding of how DPDK handles "interrupts" correct? In other words, even if you turn "interrupts" on, DPDK is really just polling in the background and then executing callback functions (so there are no interrupts)?
Is my understanding of how DPDK handles "interrupts" correct?
DPDK is a user space application. Unfortunately, there is no magic way to deliver an interrupt callback directly to a user space application.
So NIC interrupts get serviced in the kernel anyway; the kernel then notifies user space using an eventfd. A user space thread waits for the eventfd notification using epoll_wait.
In other words, even if you turn "interrupts" on, DPDK is really just polling in the background and then executing callback functions (so there are no interrupts)?
If there is no data to receive, the DPDK polling thread should block on epoll_wait.
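For reference, here is a rough sketch of the usual interrupt-mode receive pattern (along the lines of the l3fwd-power example): poll while there is traffic and, when the queue goes idle, arm the RX interrupt and sleep in rte_epoll_wait until the kernel signals the eventfd. It assumes the port was configured with RX queue interrupts enabled (intr_conf.rxq = 1); port/queue IDs and the burst size are placeholders:

#include <rte_ethdev.h>
#include <rte_interrupts.h>

#define BURST_SIZE 32

static void rx_loop(uint16_t port_id, uint16_t queue_id) {
    // Register the RX queue's interrupt eventfd with this thread's epoll instance.
    rte_eth_dev_rx_intr_ctl_q(port_id, queue_id, RTE_EPOLL_PER_THREAD,
                              RTE_INTR_EVENT_ADD, NULL);

    struct rte_mbuf *bufs[BURST_SIZE];
    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, bufs, BURST_SIZE);
        if (nb_rx > 0) {
            // ... process, transmit or free the mbufs ...
            continue;                          // keep polling while traffic flows
        }
        // Queue looks idle: arm the interrupt and sleep until the eventfd fires.
        // (A production loop would poll once more after enabling the interrupt
        // to close the race with packets that arrived in between.)
        rte_eth_dev_rx_intr_enable(port_id, queue_id);
        struct rte_epoll_event event;
        rte_epoll_wait(RTE_EPOLL_PER_THREAD, &event, 1, -1 /* block */);
        rte_eth_dev_rx_intr_disable(port_id, queue_id);
    }
}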
The program is a client-server socket application being developed in C on Linux. There is a remote server to which each client connects and logs itself as being online. There will most likely be several clients online at any given point in time, all trying to connect to the server to log themselves as online/busy/idle, etc. So how can the server handle these concurrent requests? What's a good design approach (forking/multithreading for each connection request, maybe)?
Personally, I would use the event-driven approach for servers. There you register a callback that is called as soon as a connection arrives, and event callbacks whenever a socket is ready to read or write.
With a huge number of connections you will get a great performance and resource benefit compared to threads, but I would also prefer this approach for a smaller number of connections.
I would only use threads if you really need to use multiple cores, or if you have requests that could take long to process and where it is too complicated to handle them without threads.
I use libev as the base library to handle event-driven networking.
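For illustration, a bare-bones single-threaded libev sketch of that model; listen_fd is assumed to be an already bound, listening TCP socket, and error handling is trimmed:

#include <ev.h>
#include <unistd.h>
#include <sys/socket.h>
#include <stdlib.h>

// Called whenever a client socket becomes readable.
static void read_cb(struct ev_loop *loop, ev_io *w, int revents) {
    (void)revents;
    char buf[4096];
    ssize_t n = recv(w->fd, buf, sizeof(buf), 0);
    if (n <= 0) {                      // client closed or error: stop watching
        ev_io_stop(loop, w);
        close(w->fd);
        free(w);
        return;
    }
    // ... parse the request, update the client's online/busy/idle state, reply ...
}

// Called whenever the listening socket has a pending connection.
static void accept_cb(struct ev_loop *loop, ev_io *w, int revents) {
    (void)revents;
    int client_fd = accept(w->fd, NULL, NULL);
    if (client_fd < 0)
        return;
    ev_io *client = (ev_io *)malloc(sizeof(*client));
    ev_io_init(client, read_cb, client_fd, EV_READ);
    ev_io_start(loop, client);         // start watching the new client
}

int run_server(int listen_fd) {        // listen_fd: bound + listening socket
    struct ev_loop *loop = EV_DEFAULT;
    ev_io accept_watcher;
    ev_io_init(&accept_watcher, accept_cb, listen_fd, EV_READ);
    ev_io_start(loop, &accept_watcher);
    ev_run(loop, 0);                   // dispatch events until ev_break()
    return 0;
}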
Generally speaking, you want a thread pool to service requests.
A typical structure will start with a single thread that does nothing but queue up incoming requests. Since it doesn't do very much, it's typically pretty easy for one thread to keep up with the maximum speed of the network.
That thread puts the items into some sort of concurrent queue. Then you have a pool of other threads reading items from the queue, doing what's needed, then depositing the result in another queue (and repeating, and repeating until the server shuts down).
Finally, you have another single thread that just takes items from the result queue, and sends replies out to the clients.
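As a rough sketch of that structure (C++11 or later; the request/reply types and the network I/O are placeholders, and only the worker pool plus the shared queues are shown):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Minimal thread-safe queue: workers block until an item is available.
template <typename T>
class ConcurrentQueue {
public:
    void push(T item) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(item)); }
        cv_.notify_one();
    }
    T pop() {                                  // blocks when empty
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

struct Request { int client_fd; std::string payload; };  // placeholder types
struct Reply   { int client_fd; std::string payload; };

int main() {
    ConcurrentQueue<Request> requests;   // filled by the network reader thread
    ConcurrentQueue<Reply>   replies;    // drained by the network writer thread

    const unsigned num_workers = 4;      // size the pool for your hardware
    std::vector<std::thread> pool;
    for (unsigned i = 0; i < num_workers; ++i) {
        pool.emplace_back([&] {
            for (;;) {
                Request req = requests.pop();                     // blocks while idle
                Reply rep{req.client_fd, "ack: " + req.payload};  // do the work
                replies.push(std::move(rep));
            }
        });
    }
    // ... a reader thread pushes into `requests`, a writer thread pops `replies` ...
    for (auto &t : pool) t.join();
}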
The best approach is a combination of the event-driven model and the multithreaded model.
You create a bunch of non-blocking sockets, but the thread count should be much lower, e.g. 10 sockets per thread.
Then each thread just listens for events (incoming requests) on its sockets in non-blocking mode and processes them as they happen.
This technique usually performs better than either non-blocking sockets or the multithreaded model on its own.
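A minimal sketch of that hybrid layout, assuming epoll on Linux and an already accepted set of non-blocking client sockets divided across the threads (handle_ready_socket is a placeholder):

#include <sys/epoll.h>
#include <unistd.h>
#include <algorithm>
#include <thread>
#include <vector>

// Placeholder: read from the socket, decode the request, send a reply.
static void handle_ready_socket(int fd) { (void)fd; }

// Each worker thread runs its own epoll loop over its share of the sockets.
static void worker(std::vector<int> fds) {
    int epfd = epoll_create1(0);
    for (int fd : fds) {
        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }
    struct epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);   // block until activity
        for (int i = 0; i < n; ++i)
            handle_ready_socket(events[i].data.fd);
    }
}

// Split the client sockets into groups of ~10 and give each group a thread.
void start_workers(const std::vector<int>& all_fds) {
    const size_t per_thread = 10;
    std::vector<std::thread> threads;
    for (size_t i = 0; i < all_fds.size(); i += per_thread) {
        std::vector<int> chunk(all_fds.begin() + i,
                               all_fds.begin() + std::min(i + per_thread, all_fds.size()));
        threads.emplace_back(worker, std::move(chunk));
    }
    for (auto& t : threads) t.join();
}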
Take a look at Comer's "Internetworking with TCP/IP", volume 3 (BSD sockets version); it has detailed examples of different ways of writing servers and clients. The full code (sans explanations, unfortunately) is on the web. Or rummage around in http://tldp.org, where you'll find a collection of tutorials.
select or poll or epoll
These are facilities on *nix systems to aggregate multiple event sources (connections) into a single waiting point. The server adds the connections to a data structure, and then waits by calling select etc. It gets woken up when stuff happens on any of these connections, figures out which one, handles it, and then goes back to sleep. See manual for details.
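For example, a bare-bones select()-based loop along those lines might look like this (client_fds is a hypothetical array of connected sockets; accepting new connections and error handling are omitted):

#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Wait on all connections at once, then service whichever ones are readable.
void serve(int client_fds[], int nfds_in_use) {
    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        int maxfd = -1;
        for (int i = 0; i < nfds_in_use; ++i) {
            FD_SET(client_fds[i], &readfds);
            if (client_fds[i] > maxfd)
                maxfd = client_fds[i];
        }
        // Sleep until at least one connection has data (no timeout).
        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) <= 0)
            continue;
        for (int i = 0; i < nfds_in_use; ++i) {
            if (FD_ISSET(client_fds[i], &readfds)) {
                char buf[1024];
                ssize_t n = recv(client_fds[i], buf, sizeof(buf), 0);
                if (n > 0) {
                    // ... decode the message, update client state, reply ...
                }
            }
        }
    }
}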
There are several higher-level libraries built on top of these mechanisms that make programming them somewhat easier, e.g. libevent, libev, etc.
Given: multithreaded (~20 threads) C++ application under RHEL 5.3.
When testing under load, top shows that CPU usage jumps around in the 10-40% range every second.
The design is mostly pretty simple: most of the threads implement the active object design pattern. Each thread has a thread-safe queue, requests from other threads are pushed onto that queue, and the thread just polls the queue and processes the incoming requests. A processed request causes a new request to be pushed to the next processing thread.
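(In code terms, each such thread boils down to roughly the following sketch; the real application differs, and this version blocks on a condition variable rather than busy-polling the queue.)

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Rough sketch of one "active object": a worker thread owning a request queue.
class ActiveObject {
public:
    ActiveObject() : worker_([this] { run(); }) {}

    // Called from other threads to hand a request to this object.
    void post(std::function<void()> request) {
        { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(request)); }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !queue_.empty(); });  // sleep while idle
            auto request = std::move(queue_.front());
            queue_.pop();
            lk.unlock();
            request();   // process; may post() a follow-up request to the next stage
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> queue_;
    std::thread worker_;
};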
The process has several TCP/UDP connections, over each of which data is received/sent under high load.
I know I did not provide sufficient data. This is a pretty big application, and I'm not familiar with all of its parts. It was ported from Windows to Linux over the ACE library (used for the networking part).
Supposing the problem is in the application and not an external one, what techniques/tools/approaches can be used to discover the problem? For example, I suspect that this may be caused by some mutex contention.
I have faced a similar problem some time back, and here are the steps that helped me.
1) Start by using strace to see where the application is spending its time executing system calls.
2) Use OProfile to profile both the application and the kernel.
3) If you are using an SMP system, look at the NUMA settings; in my case they caused havoc.
/proc/<appPID>/numa_maps will give a quick look at how memory accesses are happening.
NUMA misses can cause the jumps.
4) You mentioned TCP connections in your app.
Look at the MTU size and check that it's set to the right value, and, depending on the type of data being transferred, use Nagle's delay (Nagle's algorithm) appropriately; a sketch for toggling it is below.
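For reference, toggling Nagle's algorithm on a socket is a one-line setsockopt (a sketch; whether you want it on or off depends on whether you send many small latency-sensitive packets or large bulk transfers):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Turn Nagle's algorithm off (enable_nodelay = 1) so small packets are sent
// immediately instead of being coalesced; pass 0 to turn it back on.
int set_tcp_nodelay(int sock_fd, int enable_nodelay) {
    return setsockopt(sock_fd, IPPROTO_TCP, TCP_NODELAY,
                      &enable_nodelay, sizeof(enable_nodelay));
}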