I've been hitting the performance limits of the kernel TUN driver and I'm looking at the DPDK KNI driver as an alternative, since its documentation describes it as a replacement for Linux TUN/TAP and it provides a more efficient interface for getting packets to the network stack.
I've been experimenting with it as a replacement, but it operates at L2, and I'm unsure how to configure the interface or what scaffolding I need to build to make it behave as an L3 point-to-point TUN device. I can see packets (with Ethernet headers) being presented to the interface, including a DHCP packet from the OS, but I'm not sure how to respond to these messages, and a short while after adding an IP to the interface, the kernel drops it.
Is it possible to use the KNI driver as a replacement for TUN and, if so, are there any existing tools or open-source projects that use the KNI interface as an L3 TUN device?
I have DPDK-20.11.3 installed.
Given that a DPDK application is a process from the Linux point of view, I would assume that there should be ways (possibly with constraints) for a DPDK application to communicate with a native Linux application (one capable of using the Linux sockets interface).
What are the possible options I could explore? Pipes, shared memory? I don't have a TCP/IP stack ported to DPDK, so I likely can't use sockets at all?
I'd appreciate links and documents that could shed some light on this. Thanks!
You can use the KNI interface. Here is the sample app for it:
https://doc.dpdk.org/guides-20.11/sample_app_ug/kernel_nic_interface.html
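For reference, a minimal sketch of bringing the KNI sample app up; the module path, core list, and port mask are illustrative assumptions for a meson build of DPDK 20.11:
# Load the KNI kernel module built with DPDK (requires -Denable_kmods=true)
sudo insmod ./build/kernel/linux/kni/rte_kni.ko kthread_mode=single carrier=on
# Run the KNI sample app: port 0, lcore 4 for RX, lcore 5 for TX (illustrative)
sudo ./build/examples/dpdk-kni -l 4-6 -n 4 -- -P -p 0x1 --config="(0,4,5)"
# The app creates a KNI netdev (e.g. vEth0_0; check `ip link` for the exact name);
# a native Linux application can then use ordinary sockets over it
sudo ip addr add 192.168.1.1/24 dev vEth0_0
sudo ip link set vEth0_0 up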
As clarified in the comments, the real intention is to send and receive full or partial packets to and from the kernel network subsystem. The easiest way is to use the DPDK PCAP PMD or TAP PMD.
How to use:
TAP:
Ensure the DPDK application is running in a Linux environment.
Using DPDK testpmd, l2fwd, or skeleton, update the DPDK EAL arguments with --vdev=net_tap0 (a full command sketch follows this list).
Starting the DPDK application will create a TAP interface named dtap0.
Bring the interface up with sudo ip link set dtap0 up.
One can assign an IP address or use it as a raw promiscuous device.
Pinging between the kernel thread and the DPDK TAP PMD thread, up to 4 Gbps of throughput can be achieved for small packets.
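A minimal sketch of those steps with testpmd; the core list and address are illustrative assumptions:
# Start testpmd with a TAP vdev; the TAP PMD creates dtap0 on probe
sudo dpdk-testpmd -l 0-1 -n 4 --vdev=net_tap0 -- -i
# In another terminal, bring the kernel side up and address it
sudo ip link set dtap0 up
sudo ip addr add 10.10.10.1/24 dev dtap0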
PCAP:
Create a veth interface pair in Linux with ip link add dev v1 type veth peer name v2.
Use v1 in the Linux network subsystem.
Use v2 in the DPDK application via --vdev=net_pcap0,iface=v2 (a full command sketch follows this list).
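Putting those steps together; the core list and address are illustrative assumptions:
# Create the veth pair and bring both ends up
sudo ip link add dev v1 type veth peer name v2
sudo ip link set v1 up
sudo ip link set v2 up
# Hand v2 to the DPDK application through the PCAP PMD
sudo dpdk-testpmd -l 0-1 -n 4 --vdev=net_pcap0,iface=v2 -- -i
# v1 stays in the kernel network subsystem; address it as usual
sudo ip addr add 10.10.20.1/24 dev v1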
Note:
My recommendation is to use the TAP interface, since it is a dedicated PMD whose interface is probed and removed together with the DPDK application. Assigning an IP address from Linux also lets it take part in local termination, firewall, and netfilter processing. All the kernel network knobs for IPv4 forwarding, TCP, UDP, and SCTP can be exercised too (see the sketch below).
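For example, the usual knobs apply to dtap0 like any other Linux interface; the iptables rule is an illustrative assumption:
# Let the kernel route between dtap0 and other interfaces
sudo sysctl -w net.ipv4.ip_forward=1
# Netfilter processing applies as well, e.g. permit forwarded traffic from dtap0
sudo iptables -A FORWARD -i dtap0 -j ACCEPT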
I do not recommend the KNI PMD: it is deprecated and will be removed, it needs an additional kernel thread to handle buffer management and Netlink, and it has an external kernel-module dependency that must be built separately (not done for most distros' package distributions).
I need to create a veth device as the slow path for control packets.
What I have tried so far:
I created the veth interfaces using the command below:
sudo ip link add veth1-2 type veth peer name veth2-1
Then I ran sudo dpdk-devbind.py --bind=igb_uio veth1-2 to bind the veth to DPDK.
It gives me the error "Unknown device: veth2-1".
Is there any way to add veth interfaces to DPDK?
If you want a "dpdk-solution", then what you'll want to look at is KNI: https://doc.dpdk.org/guides/prog_guide/kernel_nic_interface.html
From their docs:
The DPDK Kernel NIC Interface (KNI) allows userspace applications access to the Linux* control plane.
The benefits of using the DPDK KNI are:
Faster than existing Linux TUN/TAP interfaces (by eliminating system calls and copy_to_user()/copy_from_user() operations).
Allows management of DPDK ports using standard Linux net tools such as ethtool, ifconfig and tcpdump.
Allows an interface with the kernel network stack.
If you're fine using a non-DPDK solution, then a TUN/TAP device is a typical way to interface with the networking stack. Your application would receive packets on the DPDK-controlled NIC, and if a packet is a control packet you would simply forward it on to the TUN/TAP device (or KNI, if using DPDK's version of TUN/TAP). Similarly for the other direction: if the TUN/TAP/KNI device receives a packet from the networking stack, you simply send it out the DPDK physical NIC. (A quick TUN/TAP setup sketch follows.)
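If you take that route, setting up the TUN/TAP device from the shell is straightforward; the device name and address are illustrative assumptions:
# Create a TAP device owned by the current user
sudo ip tuntap add dev tap0 mode tap user $USER
sudo ip addr add 10.0.0.1/24 dev tap0
sudo ip link set tap0 up
# The application then opens /dev/net/tun, attaches to tap0, and
# reads/writes frames to exchange control packets with the kernel stack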
I want to implement LACP/LAG in my OVS-DPDK setup and offload it to hardware (hardware LAG). But I can't find any related patches; can you suggest anything with which I can proceed?
Details:
1). OVS version: 2.13.0
2). DPDK version 19.11.0
3). OS "CentOS Linux 7"
4). Using virtual DPDK NIC
5). Trying to implement it using the LAG PMD (I'm new to this area and don't have much in-depth knowledge, so I'm searching for patches, if any exist).
6). Running Lag on a switch which is created using OVS
[Edit-2] Based on the update from the comments: the NIC is an "X722", but in DPDK the IFC modules are in use.
The PCIe NIC in use is an Intel FPGA 100G VF, which provides virtio vDPA acceleration by DMA copy to the virtio VF ports, skipping the need for a virtual switch like OVS-DPDK.
The short answer: there is no ready-made support for hardware LAG or RTE_FLOW through the IFC PMD.
The detailed answer: you can do it if you accomplish the following.
RTE_FLOW match-action rules can be offloaded to the NIC by enabling the DPDK OVS build at compile time and running with other_config:hw-offload=true (as suggested by #stackinside, and sketched below). But this is not for LAG; it is for exact-match table offload to the FPGA.
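For reference, the offload toggle is set like this; the systemd service name is an assumption for CentOS 7:
# Enable hardware offload of exact-match flows in OVS
sudo ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
# Restart OVS so the setting takes effect
sudo systemctl restart openvswitch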
For hardware LAG enablement, you will need to work with an FPGA engineer to create a HW LAG FPGA binary by programming the LUT, and then expose this feature to OVS, for example via a custom IOCTL call (through the admin queue) to the PF.
Once that step is done, you can expose it to DPDK by modifying the IFCVF driver to support hardware LAG.
Note: this will change the actual HW function from DMA virtio RX/TX to something different. There is no patch in OVS-DPDK that can create a binary image for the FPGA.
Hence the answers to your queries are:
[Question-1] I want to implement LACP/LAG in my OVS-DPDK and offload it to hardware (hardware LAG). (from the description)
[Answer] There are 3 modes of LAG/LACP that can be achieved with OVS-DPDK:
via OVS-DPDK (software) logic
via the DPDK library (software)
via OVS-DPDK hardware offload (OVS-DPDK agnostic and OVS-DPDK aware)
For the OVS-DPDK software logic, I request you to check the Red Hat OVS-DPDK configuration guide and verify with show lacp 1 for lag-1 details; a sketch follows below.
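A minimal sketch of the software bond in OVS; the bridge name, port names, and PCI addresses are illustrative assumptions:
# Create a LACP bond of two DPDK ports on an existing netdev bridge
sudo ovs-vsctl add-bond br0 dpdkbond0 dpdk0 dpdk1 lacp=active \
    -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:01:00.0 \
    -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:01:00.1
# Verify LACP negotiation on the bond
sudo ovs-appctl lacp/show dpdkbond0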
For the DPDK library LAG (software), add a DPDK bonding interface with the option --vdev 'net_bonding0,bond_opt0=..,bond_opt1=..' as mentioned in the dpdk test lag url; a sketch follows below.
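For example, with testpmd on DPDK 19.11; mode=4 is 802.3ad/LACP, and the member PCI addresses are illustrative assumptions (the binary may be named testpmd in a make-based build):
# Create a bonded vdev over two physical ports in LACP mode
sudo dpdk-testpmd -l 0-1 -n 4 \
    --vdev 'net_bonding0,mode=4,slave=0000:01:00.0,slave=0000:01:00.1' \
    -- -i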
For HW offload LAG:
On the Intel side, ASICs like the FM10K, FM12K, and Snow Ridge SoC have LAG, but it has to be configured via the SDK using the IES API.
On Mellanox, the embedded switch can create and maintain HW LAG; please refer to OVS (and not DPDK-OVS) LAG over BRIDGE-PHYSICAL.
The case is similar for Broadcom, Marvell, and Netronome.
But the HW LAG is not part of the DPDK port or DPDK port representation. Hence the HW vendor or ASIC vendor will have custom calls via their SDK.
Note: see the list of IP FPGA firmware offered by Intel.
[Question-2] Is it possible to create bonds using OVS and integrate them with DPDK for the LAG implementation? (from comments)
[Answer] I believe there are vendor-specific patches for OVS (and not OVS-DPDK) for Broadcom, Marvell, Mellanox, and Netronome. In this model there is a specific bridge name, br-phy, that points to the ASIC or embedded switch. Get in touch with the vendor or check the vendor's GitHub pages to get access to the patches for OVS. Hence the steps will be:
Identify the ASIC in use.
Find the patch for OVS from the vendor or the vendor's GitHub.
Apply the patches and rebuild OVS-DPDK.
Based on the patch, use the right bridge to set up the LAG, for example br-phy (bridge physical).
Note: I requested information on the ASIC and vendor; since it was not shared, it is difficult to look up on GitHub.
[EDIT-1] OVS-DPDK generally relies on RTE_FLOW for any hardware-based offload. Vendor- or ASIC-specific offloads/patches are available for OVS/OVS-DPDK from the vendor by integrating the ASIC SDK.
@DeepakSahoo, I have shared the link to the DPDK NIC in the comments. Please try to identify the ASIC with lshw -c net -businfo. If generic access is available, either via RTE_FLOW or a NIC-specific DPDK API, we can offload LAG to the HW embedded switch. If it is not, you need access to the SDK and libraries for configuring the HW ASIC or embedded switch, and then invoke those calls from the OVS-DPDK code base. I have shared in a comment above how this is done by Mellanox for OVS today. Hence, there are no vendor- or ASIC-specific HW offload patches.
I am a newbie to Intel DPDK.
I am planning to write an HTTP web server.
Can it be implemented with DPDK using the following logic?
Get the packets and send them to worker logical cores.
A worker logical core reconstructs the 'http request' sent by the client from the incoming packets.
Process the 'http request' in the worker logical core and produce an 'http response'.
Create packets for the 'http response' and dispatch them to the output software rings.
I am not sure whether the above is feasible or not.
Is it possible to write a web server using Intel DPDK?
It is a lot of work, since you'll need a TCP/IP stack on top of DPDK. Even once you have ported a TCP/IP stack onto DPDK (or reused a port from an OS), you won't get the performance: it is easy to write C code that runs, but writing a TCP/IP stack that sustains good performance is a very difficult development effort.
You can try http://www.6wind.com/6windgate-performance/tcp-termination/ : they do not provide an HTTP server, but they provide L7-like TCP socket support to build the fastest HTTP servers.
Yes, it's possible to build a web server using DPDK. You could use the KNI interface provided by DPDK. All packets received on a KNI interface are still routed through the kernel network stack; however, and here's the catch, this is still faster than receiving packets directly from the kernel (which requires multiple copies). With DPDK you can also pin some lcores to RX and different lcores to TX, and then instruct your OS not to use those lcores for anything else, so you really have dedicated lcores for packet TX and RX. Ensure that the TX and RX lcores lie on different CPU sockets.
More information at:
http://dpdk.org/doc/guides/sample_app_ug/kernel_nic_interface.html
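A sketch of the dedicated-lcore layout with the KNI sample app; the core numbers and port mask are illustrative assumptions:
# Port 0: lcore 4 handles RX, lcore 6 handles TX
sudo ./dpdk-kni -l 4-6 -n 4 -- -P -p 0x1 --config="(0,4,6)"
# Keep the OS scheduler off those cores via the kernel command line:
#   isolcpus=4-6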
I want to develop a bandwidth allocator for a network that will be behind my machine.
Now, I've read about NDIS, but I am not sure whether network traffic that neither originates from my machine nor is destined for my machine will enter my TCP/IP stack, so that I can block/unblock packets via NDIS on a Windows machine.
NDIS (kernel) drivers live in the Windows network stack, and so can only intercept packets which are handled by this stack.
You cannot filter packets which are not sent to your computer.
(When the computer acts as a router, the packets are sent to the computer and the computer forwards them to the actual recipient, if that was the question.)
In the normal operation mode, irrelevant traffic will be dropped by the NIC driver/firmware, as pointed out above. However, this is a software issue, so the behavior can be changed by adding appropriate logic to the device driver and/or firmware. This is how sniffers operate, for example.