Running DPDK C program without root privileges - dpdk

I have a C application that uses DPDK 19.11. Currently, the application runs with root permissions (using the sudo command). In addition, my application uses huge pages (1 GB).
Network devices using DPDK-compatible driver:
0000:02:00.0 'Ethernet Controller X710 for 10GbE backplane 1581'
drv=igb_uio unused=
I would like to run my application without root permissions, i.e. get rid of the "sudo" command.
I changed the permissions of these files/folders:
/sys/class/uio/uio*/device/resource*
/sys/class/uio/uio*/device/config
/dev/uio*
/dev/hugepages/*
When I run my application without "sudo", the rte_eal_init function fails with this error:
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Cannot use IOVA as 'PA' since physical addresses are not available
My OS is Ubuntu 18.04, kernel 4.15.0-128-generic. I noticed that the DPDK docs have a remark under Running DPDK Applications Without Root Privileges: "since version 4.0, the kernel does not allow unprivileged processes to read the physical address information from the pagemaps file, making it impossible for those processes to be used by non-privileged users. In such cases, using the VFIO driver is recommended."
After reading the comments, I tried to use vfio-pci.
I loaded the module using:
sudo modprobe vfio-pci enable_unsafe_noiommu_mode=1
I also changed the permissions of /dev/hugepages/* and /dev/vfio/*.
Running with vfio-pci and sudo was successful.
When running without sudo, I got the same error:
EAL: FATAL: Cannot use IOVA as 'PA' since physical addresses are not available
EAL: Cannot use IOVA as 'PA' since physical addresses are not available
See also: https://doc.dpdk.org/guides/linux_gsg/enable_func.html#running-dpdk-applications-without-root-privileges
Does anyone have experience running a DPDK application without root on kernel 4.0 and above?
As an alternative, I could launch a simple DPDK application with root privileges that initializes DPDK and, in parallel, run another application without root privileges that consumes the packets and performs the business logic. Is that possible?
thanks

First, it makes sense to check if you really need to use the unsafe mode with vfio-pci. Perhaps you just need to add intel_iommu=on iommu=pt to the kernel parameters for making the device work safely, i.e.:
modprobe vfio-pci
I haven't used the unsafe mode so far, perhaps the kernel even unconditionally disallows mappings for the vfio device, if unsafe mode is enabled, for (obvious?) security reasons.
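A minimal sketch, assuming a Linux host, for checking whether the kernel parameters mentioned above actually reached the running kernel. The helper name has_kernel_params is hypothetical; /proc/cmdline is the standard file holding the boot command line.

```python
from pathlib import Path


def has_kernel_params(cmdline, wanted):
    """Return {param: bool} telling which wanted parameters appear as whole tokens."""
    tokens = set(cmdline.split())
    return {param: param in tokens for param in wanted}


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        # /proc/cmdline holds the parameters the running kernel was booted with.
        found = has_kernel_params(proc_cmdline.read_text(),
                                  ["intel_iommu=on", "iommu=pt"])
        for param, present in found.items():
            print(f"{param}: {'present' if present else 'missing'}")
```

If either parameter is reported as missing, the bootloader configuration has not been applied yet (or the machine has not been rebooted since the change).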
For running a dpdk application without root privileges you need to adjust the permissions of the right vfio device. For example, when the permissions look like this
# ls -l /dev/vfio/
total 0
crw-------. 1 root root 235, 0 2021-08-21 15:13 17
crw-rw-rw-. 1 root root 10, 196 2021-08-21 15:13 vfio
then /dev/vfio/17 is the device you've bound for dpdk, thus adjust its permission like this:
chown juser /dev/vfio/17
A user process doesn't need extra permissions for mapping huge pages. You don't even have to mount hugetlbfs if you supply the --in-memory option to your dpdk program.
However, some hugepages must be reserved by root, e.g. during system boot. Example:
echo 4096 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
echo 8 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
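As a rough sketch of what those two reservations amount to, the snippet below converts page counts into GiB and reads back the current per-node values. The function names are hypothetical; the sysfs layout is the one used in the commands above, and node0 is assumed to exist.

```python
from pathlib import Path

# Same sysfs directory as in the echo commands above (NUMA node 0 assumed).
NODE0 = Path("/sys/devices/system/node/node0/hugepages")


def reserved_gib(nr_pages, page_size_kib):
    """Memory covered by nr_pages huge pages of page_size_kib KiB each, in GiB."""
    return nr_pages * page_size_kib / (1024 * 1024)


def read_reservations(base=NODE0):
    """Map page size in KiB to the number of pages currently reserved."""
    result = {}
    for entry in sorted(base.glob("hugepages-*kB")):
        size_kib = int(entry.name[len("hugepages-"):-len("kB")])
        result[size_kib] = int((entry / "nr_hugepages").read_text())
    return result


if __name__ == "__main__":
    for size_kib, count in read_reservations().items():
        print(f"{size_kib} KiB pages: {count} reserved "
              f"({reserved_gib(count, size_kib):.1f} GiB)")
```

With the example values, 4096 pages of 2048 kB and 8 pages of 1048576 kB each reserve 8 GiB.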
Besides permissions, the default resource limits are likely too low, especially the memlock limit. If it is too low, the kernel logs something like this when the dpdk application starts:
kernel: vfio_pin_pages_remote: RLIMIT_MEMLOCK (65536) exceeded
And the dpdk application prints:
EAL: cannot set up DMA remapping, error 12 (Cannot allocate memory)
EAL: 0000:05:00.1 DMA remapping failed, error 12 (Cannot allocate memory)
Increasing the limits fixes this issue, e.g.:
cat /etc/security/limits.d/24-memlock.conf
# memlock unit: KiB
juser hard memlock 16777216
juser soft memlock 1048576
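To verify that the new limits are in effect for the user's session, something like the following could be run as that user. This is only a sketch: memlock_ok and the 1 GiB requirement are illustrative assumptions, not values DPDK mandates. Note that limits.conf uses KiB while getrlimit reports bytes.

```python
import resource


def memlock_ok(required_bytes, soft_limit):
    """True if the soft RLIMIT_MEMLOCK value allows locking required_bytes."""
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= required_bytes


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    required = 1 << 30  # hypothetical requirement: 1 GiB of DMA-mapped memory
    print(f"soft={soft} hard={hard} (bytes; RLIM_INFINITY means unlimited)")
    print("sufficient" if memlock_ok(required, soft) else "too low")
```

The default of 65536 bytes shown in the kernel log above would be reported as too low for any realistic hugepage mapping.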

DPDK should detect whether to use IOVA as VA or as PA. Loading vfio-pci with enable_unsafe_noiommu_mode=1 tells DPDK that there is no IOMMU, so it will use IOVA as PA.
The problem is that running in PA mode requires root privileges, because you need access to the physical addresses.
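The restriction can be observed directly in /proc/self/pagemap. The sketch below assumes the entry layout described in the kernel's pagemap documentation (bit 63 = page present, bits 0-54 = page frame number); the helper names are hypothetical. On kernels that enforce the restriction, an unprivileged process either gets a permission error or sees the PFN field as zero.

```python
import ctypes
import os
import struct

PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number
PRESENT_BIT = 1 << 63      # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit pagemap entry into (present, pfn)."""
    return bool(entry & PRESENT_BIT), entry & PFN_MASK


def lookup_self(vaddr):
    """Read the pagemap entry covering vaddr in the current process."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/self/pagemap", "rb", buffering=0) as pagemap:
        pagemap.seek((vaddr // page_size) * 8)  # one 8-byte entry per page
        (entry,) = struct.unpack("=Q", pagemap.read(8))
    return decode_pagemap_entry(entry)


if __name__ == "__main__" and os.path.exists("/proc/self/pagemap"):
    buf = ctypes.create_string_buffer(b"x", 4096)  # touched, so it is mapped
    try:
        present, pfn = lookup_self(ctypes.addressof(buf))
        print(f"present={present} pfn={pfn:#x} (0 means the PFN is hidden)")
    except PermissionError:
        print("pagemap is not readable without privileges")
```

A PFN of zero for a present page is what leads to the "physical addresses are not available" message from EAL.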
That dpdk.org document you cited should do the trick. I was able to get DPDK running without root privileges in 20.02 in a docker container. However, there was another problem with the software we were running on top of DPDK and its interaction with the hugepage backing.
In the end, we decided to keep running DPDK as root, but we limited the capabilities of the container to the bare minimum needed to run DPDK.

Related

EAL: Error - exiting with code: 1 Cause: No Ethernet ports - bye

I was trying to get the l2fwd application to work, but it keeps showing this error. I don't understand why: I have the NICs properly bound and hugepages configured.
The error
`
./dpdk-l2fwd -l 0-3 -n 1 --no-telemetry -- -q 8 -p ffff
EAL: Detected CPU lcores: 6
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
TELEMETRY: No legacy callbacks, legacy socket not created
MAC updating enabled
EAL: Error - exiting with code: 1
Cause: No Ethernet ports - bye
root#dpdku:~/dpdk/build/examples#
`
Hugepages
`
dpdk-hugepages.py -s
Node Pages Size Total
0 600 2Mb 1Gb
Hugepages mounted on /dev/hugepages
`
NICs:
`
dpdk-devbind.py -s
Network devices using DPDK-compatible driver
============================================
0000:00:08.0 'RTL-8100/8101L/8139 PCI Fast Ethernet Adapter 8139' drv=uio_pci_generic unused=8139too,vfio-pci
0000:00:09.0 'RTL-8100/8101L/8139 PCI Fast Ethernet Adapter 8139' drv=uio_pci_generic unused=8139too,vfio-pci
Network devices using kernel driver
===================================
0000:00:03.0 'RTL-8100/8101L/8139 PCI Fast Ethernet Adapter 8139' if=ens3 drv=8139cp unused=8139too,vfio-pci,uio_pci_generic *Active*
`
The DPDK application l2fwd is doing the right thing by not identifying the NICs you have bound. This is the expected behaviour, for the following reasons:
The NIC RTL-8100 is a Realtek Semiconductor RTL-8100/8101L/8139 PCI Fast Ethernet Adapter.
The DPDK supported-devices list does not list any support for Realtek NICs.
One can check the NIC vs. features matrix for the various PF and VF PMDs; a Realtek driver is absent.
Checking the net PMD drivers in the online source tree also shows that no Realtek driver is supported.
Hence the only way to use the Realtek NIC with DPDK is to:
bind the NIC back to the kernel driver
use the PCAP PMD to access the device from DPDK
or use the AF_XDP PMD (if the Realtek kernel driver supports it) or the AF_PACKET PMD.
Note: please at least read the documentation or the supported hardware list on dpdk.org to understand the library and the supported PMDs.

DPDK l2fwd - How to forward ethernet interface to my PMD

I have a board with one ethernet interface (eth0) running Linux.
I'm trying to forward all incoming traffic from eth0 to my PMD driver, using dpdk-l2fwd example application.
Here is what I've tried:
./dpdk-l2fwd -c 0x3 --vdev={my_pmd}0 -- -p 0x3 -T 0
I can see that my rx_pkt_burst callback is polled by the application, but that's it.
How can I forward all incoming eth0 packets to my PMD?
I tried to use net_tap, using the following command:
./dpdk-l2fwd -c 0xff --vdev=net_tap0 --vdev={my_pmd}0 -- -p 0x7 -T 0 --portmap="(1,2)"
My tx_pkt_burst callback is called occasionally, but not when I think it should be.
For example, if I ping this board from another one, the ping is successful, but the tx_pkt_burst callback is not called.
I tried to use the devbind tool, but no devices are detected:
./usertools/dpdk-devbind.py --status
No 'Network' devices detected
=============================
No 'Baseband' devices detected
==============================
No 'Crypto' devices detected
============================
No 'Eventdev' devices detected
==============================
No 'Mempool' devices detected
=============================
No 'Compress' devices detected
==============================
No 'Misc (rawdev)' devices detected
===================================
No 'Regex' devices detected
===========================
Update
DPDK version - 20.11.
My HW is an embedded device based on NXP's Layerscape.
$ lshw -class network
*-network
description: Ethernet interface
physical id: 3
logical name: eth0
serial: 00:11:22:44:11:44
size: 1Gbit/s
capacity: 1Gbit/s
capabilities: ethernet physical tp mii 10bt-fd 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=fsl_dpaa2_eth driverversion=5.10.35-00002-g3434eea0e1e7-dir duplex=full firmware=7.17 ip=192.168.15.157 link=yes multicast=yes port=twisted pair speed=1Gbit/s
I'm trying to bypass all traffic to the PMD I'm currently developing.
Thanks.
[EDIT-1] clarification of using same interface for DPDK and Kernel routing
Answer> As discussed in the comments, please refer to DPDK + kernel on same interface.
Based on the information shared, the single query (I'm trying to bypass all traffic to the PMD I'm currently developing) contains multiple questions. Each one is addressed separately below.
question 1: using dpdk-l2fwd example application
Answer> The DPDK l2fwd application uses the basic API with almost no HW offloads. Based on your environment (I have a board with one ethernet interface (eth0)), the right set of parameters should be -p 0x1 --no-mac-updating -T 1. This configures the application to receive and transmit packets using a single DPDK interface (that is, eth0 on your board).
Note: DPDK Application can work with DPDK PMD both physical and virtual
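Since -p (and the EAL -c option) take a hexadecimal bitmask, with bit N selecting port (or core) N, a small helper makes the masks used in this thread easier to read. This is an illustrative sketch; mask_to_ids is a hypothetical name and not part of DPDK.

```python
def mask_to_ids(mask):
    """Expand a hex bitmask such as '0x3' into the list of selected indices."""
    value = int(mask, 16)  # base 16 accepts an optional 0x prefix
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    for mask in ("0x1", "0x3", "0x7"):
        print(mask, "->", mask_to_ids(mask))
```

So -p 0x1 enables only port 0, -p 0x3 enables ports 0 and 1, and -p 0x7 enables ports 0, 1 and 2.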
question 2: I tried to use net_tap, using the following command:
Answer> If the intent is to intercept the traffic from the physical interface and then forward it to a tap interface, then one needs to modify the EAL arguments to ./build/l2fwd --vdev=net_tap0,iface="my_eth0" -- -p 0x3 -T 1 --no-mac-updating. This allows the application to probe the physical NXP interface (eth0) and use the Linux TAP interface as a secondary interface. Thus any traffic from NXP and TAP will be cross-connected, i.e. NXP (eth0) <==> TAP (my_eth0).
question 3: ./usertools/dpdk-devbind.py --status returns empty
Answer> According to the supported NIC list on the DPDK site, the NXP PMDs are dpaa, dpaa2, enetc, enetfec and pfe. Cross-checking against the kernel driver fsl_dpaa2_eth, I think it is safe to assume that the dpaa2 PMD is supported. Since you mentioned that the NIC is not enumerated, there may be caveats such as model revision, supported board, BSP package, or vendor/sub-vendor ID checks. More details can be found in the Board Support Package and the DPAA2 NIC guide.
Debug & Alternative solutions:
To start with, use the kernel driver to bring in packets.
Use extra logging and debug output to identify why the NIC is not shown in the application.
Approach 1:
Make sure the NIC is bound to the kernel driver fsl_dpaa2_eth.
Ensure the NIC is connected and the link is up with ethtool eth0.
Set promiscuous mode with ifconfig eth0 promisc up.
Start the DPDK application with the PCAP PMD: ./build/l2fwd --vdev=net_pcap0,iface=eth0 -- -p 1 --no-mac-updating -T 1
Check that packets are received and redirected to the PCAP eth0 PMD by checking the statistics.
Approach 2:
Ideally the NIC should be categorized as a network device in order to be probed by dpdk-devbind.py.
Check the device details using lshw -c net -businfo for network.
Try checking with lspci -Dvmmnnk [PCIe BUS:Slot:Function id] for network details.
If the above does not show it as a network device, this might be the reason it is not listed.
Suggestions or workaround: you can try to forcefully bind it to igb_uio or vfio-pci (I am not very familiar with NXP SoCs) with dpdk-devbind -b vfio-pci [PCIe S:B:F]. Then cross-check with lspci -ks [PCIe S:B:F]. Once this succeeds, one can start dpdk l2fwd in PMD debug mode with ./build/l2fwd -a [PCIe S:B:F] --log-level=pmd,8 -- -p 1 --no-mac-updating | more. By intercepting and interpreting the logs one can identify what is going on.
Note:
It is assumed the application is built with static libraries and not dynamic ones. To build with static libraries, use make static for l2fwd.
For the described use case recommended application is basicfwd/skeleton rather than l2fwd.
Found the problem.
I had to unbind eth0 from Linux kernel.
Now I can simply run:
./dpdk-l2fwd -c 0x3 --vdev={MY_PMD}0 -- -p 0x3 -T 1
And all traffic in the physical port is forwarded to my PMD.

How to run valgrind for the server?

I am new to valgrind. I need to run valgrind on a server written in C++. The server listens on a port. When I run the server under Valgrind, I can't communicate with it: the port is not listening.
valgrind --tool=memcheck --leak-check=yes --log-file=valgrind_log.txt /binary_path-c
I need the server to listen on the port when I run it with valgrind.
If you have already confirmed that the exact same binary does open() the desired network socket outside Valgrind and it doesn’t work under Valgrind, then read on.
Valgrind has to launch the binary itself; it cannot attach to an already running process (as explained here).
Valgrind is also sensitive to changes of effective UID, particularly when running from the root UID. You cannot use sudo with valgrind (detailed here).
You cannot run Valgrind on an executable binary that has Linux capability bits set (details here).
Valgrind cannot handle root setuid on an NFS filesystem (even when mounted to allow this). The workaround is to move your build or binary to a non-NFS partition.
Having said all of the above, it is most likely a timing problem: Valgrind runs things SLOWER, so the control flow of your code is “missing its mark” before it opens the network socket. The only way to pin this down is to put debug print statements throughout your code and nail that timing logic.
Alternatively...
To see what a production grade daemon is doing from the very beginning of startup, execute:
valgrind --trace-children=yes /usr/sbin/<your-server-binary>
There’s another way to monitor network socket in action, read on ...
Tracing from start of execution
You can perform strace from the start and find out what network socket got opened (and described later, show its buffer content) by:
strace -eopen <your-server-binary> <server-arguments>
Make a note of the desired fd (file descriptor) number.
As with any process started under strace, pressing Ctrl-C will stop that process. But when strace is attached to a live process, Ctrl-C safely detaches from the target (the process continues running) and returns you to your shell prompt.
Attaching to already running server
You can also monitor an already running production daemon using strace, but it is harder to find the fd number of your network socket. Do the previous step briefly to get that fd.
Find out your PID using ps auxw.
Then plug in your server/daemon’s PID here:
strace -f -p <your-server-PID> -e trace=network
to find out its fd number.
Exact socket monitoring
With the identified fd on hand, rerun strace to attach to that production server with:
strace -f -eread=<fd> -ewrite=<fd> -p<your-daemon-PID>
Network troubleshooting checklist
lsof -i -n (a list of open ports)
strace
netstat -lt
tcpdump/wireshark
A list of network troubleshooting tools for Linux is given here, here and most comprehensively here.
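Before reaching for the tools above, a quick way to confirm from the outside whether the server under Valgrind ever starts listening is to probe the port. This is a minimal sketch; is_listening is a hypothetical helper, and the host and port are whatever your server uses. Because Valgrind slows startup considerably, it may be worth polling for a while before concluding the port is closed.

```python
import socket
import sys


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__" and len(sys.argv) == 3:
    host, port = sys.argv[1], int(sys.argv[2])
    print("listening" if is_listening(host, port) else "not listening")
```

Usage would be along the lines of python3 probe.py 127.0.0.1 <port>, where probe.py is just an example file name.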

"Incompatible hardware version" error when running DPDK on VMWare with VMXNET3 interface

We're trying to run DPDK example apps in a guest machine running Centos 7.5. The host is ESXi version 6.5.
I'm building dpdk on the guest machine where I'm trying to run it. I've tried both DPDK versions 18.05 and 18.08.
We have created five interfaces on ESXi for connection to our guest: one management port and four data ports. We're binding these four data ports to DPDK. The ports are all VMXNET3 interfaces. They are basically set up like the VMXNET3 interfaces in [https://doc.dpdk.org/guides/nics/vmxnet3.html], using a vswitch to connect to a physical interface. However, note that we do not have any VF interfaces as shown in this document, only VMXNET3 interfaces. Unfortunately this document does not show any details on how to do the setup.
This document from vmware also shows a very similar setup. But again no details on how to setup.
Fundamentally, the roadblock we are hitting is that the VMXNET3 interfaces are failing initialization when starting the DPDK example app. Here is what we see:
[root#rg-vm ~]# ./dpdk-18.08/examples/packet_ordering/build/packet_ordering -c 0x0e0 -- -p 0xf
EAL: Detected 24 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Probing VFIO support...
EAL: PCI device 0000:04:00.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 15ad:7b0 net_vmxnet3
eth_vmxnet3_dev_init(): Incompatible hardware version: 0
EAL: Requested device 0000:04:00.0 cannot be used
We see this for all four interfaces that we are trying to bind to DPDK. However, strangely, sometimes after a reboot, the first two interfaces initialize correctly. But after that first attempt, all four interfaces then fail the same way.
Here are the commands we're using to setup DPDK.
modprobe uio
insmod ./dpdk-18.08/build/build/kernel/linux/igb_uio/igb_uio.ko
./dpdk-18.08/usertools/dpdk-devbind.py --bind=igb_uio 04:00.0
./dpdk-18.08/usertools/dpdk-devbind.py --bind=igb_uio 0c:00.0
./dpdk-18.08/usertools/dpdk-devbind.py --bind=igb_uio 13:00.0
./dpdk-18.08/usertools/dpdk-devbind.py --bind=igb_uio 1b:00.0
Note that we have also tried using the uio_pci_generic with the same results. We have not been able to get the vfio-pci driver to bind to the VMXNET3 interfaces.
I'm not sure it matters, but the physical interfaces on the other side of the vswitch that we're connecting to are:
17:00.0 Ethernet controller: Intel Corporation I350 Gigabit Fiber Network Connection (rev 01)
We have also tried using Ethernet cards based on the Intel 82576 chipset (this is the chipset DPDK shows being used in their documentation), and one based on the Intel X710. We see the same error using either of these cards as we did with the i350. So I think that eliminates the Ethernet hardware, which makes sense, as using the vswitch between us and the Ethernet controller should make us agnostic to what it actually is.
We are running on a Dell R540. Also note that when we run Centos 7.5 with DPDK on this hardware without VMWare, everything works fine. Also if we run in VMWare, but "passthrough" the i350 interfaces to the VM (instead of using vswitch and vmxnet) everything also works fine in that case.
I've tried updating the kernel (3.10) to the latest (4.18) but still get the same error.
If I try to read the version register (VRRS) (the one that causes this error) in the vmxnet3 pci bar registers (before I bind to DPDK) using ethtool, it looks fine (0xf). I've googled around a lot but can't seem to find much help on this. It is very possible the issue is with how I'm setting things up but I can't find any info that gives details on how else to do it.
Any help would be greatly appreciated. Thanks!
Try these steps:
cd /etc/default
vi grub
Edit GRUB_CMDLINE_LINUX and add "nopku":
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet nopku transparent_hugepage=never log_buf_len=8M"
Regenerate the GRUB configuration: sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot the VM and try DPDK.

"vagrant up" failing: Vagrant VM failed to remain in the running state

The command vagrant up is failing and I don't know why.
$ egrep -v '^ *(#|$)' Vagrantfile
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
config.vm.box = "precise32"
end
$ vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
[default] Importing base box 'precise32'...
[default] Matching MAC address for NAT networking...
[default] Setting the name of the VM...
[default] Clearing any previously set forwarded ports...
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Preparing network interfaces based on configuration...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] Booting VM...
[default] Waiting for VM to boot. This can take a few minutes.
The VM failed to remain in the "running" state while attempting to boot.
This is normally caused by a misconfiguration or host system incompatibilities.
Please open the VirtualBox GUI and attempt to boot the virtual machine
manually to get a more informative error message.
$ vagrant status
Current machine states:
default poweroff (virtualbox)
The VM is powered off. To restart the VM, simply run `vagrant up`
$ VBoxManage list runningvms
$
Here are the messages in the VirtualBox log file, VBoxSVC.log:
$ cat ~/.VirtualBox/VBoxSVC.log
VirtualBox XPCOM Server 4.2.16 r86992 linux.amd64 (Jul 4 2013 16:29:59) release log
00:00:00.000499 main Log opened 2013-08-13T18:40:45.907580000Z
00:00:00.000508 main OS Product: Linux
00:00:00.000509 main OS Release: 3.6.11-4.fc16.x86_64
00:00:00.000510 main OS Version: #1 SMP Tue Jan 8 20:57:42 UTC 2013
00:00:00.000537 main DMI Product Name: X8DA3
00:00:00.000547 main DMI Product Version: 1234567890
00:00:00.000647 main Host RAM: 24103MB total, 17127MB available
00:00:00.000654 main Executable: /usr/local/VirtualBox/VBoxSVC
00:00:00.000655 main Process ID: 9417
00:00:00.000656 main Package type: LINUX_64BITS_GENERIC
00:00:00.110125 nspr-2 Loading settings file "/opt/tomcat/.VirtualBox/VirtualBox.xml" with version "1.12-linux"
00:00:00.110817 nspr-2 Failed to retrive disk info: getDiskName(/dev/md126p1) --> md126p1
00:00:00.264367 nspr-2 VDInit finished
00:00:00.275173 nspr-2 Loading settings file "/opt/tomcat/VirtualBox VMs/vagrant_getting_started_default_1376419129/vagrant_getting_started_default_1376419129.vbox" with version "1.12-linux"
00:00:05.288923 main ERROR [COM]: aRC=VBOX_E_OBJECT_IN_USE (0x80bb000c) aIID={29989373-b111-4654-8493-2e1176cba890} aComponent={Medium} aText={Medium '/opt/tomcat/VirtualBox VMs/vagrant_getting_started_default_1376419129/box-disk1.vmdk' cannot be closed because it is still attached to 1 virtual machines}, preserve=false
00:00:05.290229 Watcher ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={3b2f08eb-b810-4715-bee0-bb06b9880ad2} aComponent={VirtualBox} aText={The object is not ready}, preserve=false
$
Any advice would be greatly appreciated.
Had the same error on OSX. Restarting VirtualBox fixed it :S
sudo /Library/StartupItems/VirtualBox/VirtualBox restart
Also see: https://forums.virtualbox.org/viewtopic.php?t=5489
I solved the problem by re-installing VirtualBox and adding myself to the vboxusers group. The re-installation process printed a message indicating that VM users had to be a member of that group. I don't know if the re-installation was necessary or if being added to the group would have sufficed.
The host machine was 32-bit (Ubuntu) and the guest was 64-bit. I changed the guest to 32-bit and that solved the problem.
My understanding is that vboxusers group is related to accessing USB devices within the guest. Not sure why it is causing the issue. Normally, as a vagrant base box build guideline, audio and USB are both disabled.
As per the VirtualBox Manual => The vboxusers group
The Linux installers create the system user group vboxusers during installation. Any system user who is going to use USB devices from VirtualBox guests must be a member of that group. A user can be made a member of the group vboxusers through the GUI user/group management or at the command line with sudo usermod -a -G vboxusers username
Note that adding an active user to that group will require that user to log out and back in again. This should be done manually after successful installation of the package.
I had the same problem. It was caused by a wrong configuration in the provider section of my Vagrantfile: I had tried to make my VM more powerful by giving it 2 CPUs when the host machine has just one.
This often happens when you try to add more hardware to your VM than the host machine can provide.
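A quick sanity check before editing the provider section is to compare the CPU count you intend to request with what the host actually has. This is only a sketch; clamp_cpus is a hypothetical helper and the requested value of 2 is taken from the example above.

```python
import os


def clamp_cpus(requested, host_cpus):
    """Limit the requested guest CPU count to what the host offers (minimum 1)."""
    return max(1, min(requested, host_cpus))


if __name__ == "__main__":
    host_cpus = os.cpu_count() or 1  # cpu_count() may return None
    requested = 2
    print(f"host has {host_cpus} CPU(s); use at most "
          f"{clamp_cpus(requested, host_cpus)} for the guest")
```

On a single-CPU host this reports 1, which is the value the VM should be configured with.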