Multiple errors when trying to use DPDK pdump - dpdk

Goal: To write a DPDK based application that can read UDP packets via a 100gbe ethernet port and write the payload to disk depending on what the destination IP/Port is. At most, each 100gbe link will have two different destination IP addresses and 4 unique destination port numbers. Initial design calls for two unique port numbers.
Hardware
Current Test System
I am currently developing a proof of concept (POC) on the hardware below. The server and NVMe drives will be significantly upgraded in the next few weeks; the NIC will remain the same unless recommended otherwise.
2 x Intel Xeon Gold 6348 CPU @ 2.6 GHz
28 cores per socket
Max 3.5 GHz
Hyperthreading disabled
Ubuntu 22.04.1 LTS
Kernel 5.15.0-53-generic
Cores set to performance governor
4 x Sabrent 2TB Rocket 4 Plus in RAID0 Config
128 GB DDR4 Memory
10 x 1 GB hugepages (can change to whatever is required)
1 x Mellanox ConnectX-5 100gbe NIC
31:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Firmware-version: 16.35.1012
UDP Source:
100 gbe NIC
9000 MTU Packets
ipv4-udp packets
Currently only 4GB/s per port but eventually will be 10GB/s per port.
NIC Information
ethtool output:
Settings for ens7f0np0:
Supported ports: [ Backplane ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
40000baseKR4/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
50000baseCR2/Full
50000baseKR2/Full
100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 1000baseKX/Full
10000baseKR/Full
40000baseKR4/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
50000baseCR2/Full
50000baseKR2/Full
100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: RS
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000004 (4)
link
Link detected: yes
testpmd info:
[sudo] password for maa:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: B8:CE:F6:FB:13:30
Configuring Port 1 (socket 0)
Port 1: B8:CE:F6:FB:13:31
Checking link statuses...
Done
testpmd> show port info 0
********************* Infos for port 0 *********************
MAC address: B8:CE:F6:FB:13:30
Device name: 31:00.0
Driver name: mlx5_pci
Firmware-version: 16.35.1012
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 100 Gbps
Link duplex: full-duplex
Autoneg status: On
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 128
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 1
Supported RSS offload flow types:
ipv4
ipv4-frag
ipv4-tcp
ipv4-udp
ipv4-other
ipv6
ipv6-frag
ipv6-tcp
ipv6-udp
ipv6-other
ipv6-ex
ipv6-tcp-ex
ipv6-udp-ex
l4-dst-only
l4-src-only
l3-dst-only
l3-src-only
Minimum size of RX buffer: 32
Maximum configurable length of RX packet: 65536
Maximum configurable size of LRO aggregated packet: 65280
Current number of RX queues: 1
Max possible RX queues: 1024
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 1024
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1
Max segment number per packet: 40
Max segment number per MTU/TSO: 40
Device capabilities: 0x14( RXQ_SHARE FLOW_SHARED_OBJECT_KEEP )
Switch name: 31:00.0
Switch domain Id: 0
Switch Port Id: 65535
Switch Rx domain: 0
RX offload capabilities:
testpmd> show port 0 rx_offload capabilities
Rx Offloading Capabilities of port 0 :
Per Queue : VLAN_STRIP IPV4_CKSUM UDP_CKSUM TCP_CKSUM TCP_LRO SCATTER TIMESTAMP KEEP_CRC RSS_HASH BUFFER_SPLIT
Per Port : VLAN_FILTER
Physical Layout
For now, I am only attempting to get a single stream on a single port working. My plan is for each queue to be tied to an lcore, and each lcore has its own thread to strip the headers, and then a thread to write to disk. Assume that the hardware is on the same NUMA node. No TX is currently required.
Requirements
Eventually 10 GB/s to disk from one 100gb/s port (Currently trying to get 4GB/s)
Assume that I will have a processor, PCI Lanes, and NVME configuration that can support this, I am only worried about the DPDK side of things for now.
Each individual stream will be ~5GB/s, currently they are ~2GB/s and I am just trying to get that to work for now.
Each stream's headers are stripped.
Each stream has its payload written to disk as a single continuous file (.dat format).
Each stream has a unique port number.
If the FCS is bad, drop the packet and replace with blank data (all zeros).
What I have tried:
I am having a similar issue to the poster at this question: DPDK pdump failed to hotplug add device
I attempted the fix there and am still having issues.
I have simply tried to get a pcap file by attaching pdump to the skeleton code.
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2010-2015 Intel Corporation
*/
#include <stdint.h>
#include <inttypes.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_pdump.h>
#define RX_RING_SIZE 1024
#define TX_RING_SIZE 1024
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 250
#define BURST_SIZE 32
/* basicfwd.c: Basic DPDK skeleton forwarding example. */
/*
* Initializes a given port using global settings and with the RX buffers
* coming from the mbuf_pool passed as a parameter.
*/
/* Main functional part of port initialization. 8< */
static inline int
port_init(uint16_t port, struct rte_mempool *mbuf_pool)
{
struct rte_eth_conf port_conf;
const uint16_t rx_rings = 1, tx_rings = 1;
uint16_t nb_rxd = RX_RING_SIZE;
uint16_t nb_txd = TX_RING_SIZE;
int retval;
uint16_t q;
struct rte_eth_dev_info dev_info;
struct rte_eth_txconf txconf;
if (!rte_eth_dev_is_valid_port(port))
return -1;
memset(&port_conf, 0, sizeof(struct rte_eth_conf));
retval = rte_eth_dev_info_get(port, &dev_info);
if (retval != 0) {
printf("Error during getting device (port %u) info: %s\n",
port, strerror(-retval));
return retval;
}
if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
port_conf.txmode.offloads |=
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
/* Configure the Ethernet device. */
retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
if (retval != 0)
return retval;
retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
if (retval != 0)
return retval;
/* Allocate and set up 1 RX queue per Ethernet port. */
for (q = 0; q < rx_rings; q++) {
retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
rte_eth_dev_socket_id(port), NULL, mbuf_pool);
if (retval < 0)
return retval;
}
txconf = dev_info.default_txconf;
txconf.offloads = port_conf.txmode.offloads;
/* Allocate and set up 1 TX queue per Ethernet port. */
for (q = 0; q < tx_rings; q++) {
retval = rte_eth_tx_queue_setup(port, q, nb_txd,
rte_eth_dev_socket_id(port), &txconf);
if (retval < 0)
return retval;
}
/* Starting Ethernet port. 8< */
retval = rte_eth_dev_start(port);
/* >8 End of starting of ethernet port. */
if (retval < 0)
return retval;
/* Display the port MAC address. */
struct rte_ether_addr addr;
retval = rte_eth_macaddr_get(port, &addr);
if (retval != 0)
return retval;
printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
" %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
port, RTE_ETHER_ADDR_BYTES(&addr));
/* Enable RX in promiscuous mode for the Ethernet device. */
retval = rte_eth_promiscuous_enable(port);
/* End of setting RX port in promiscuous mode. */
if (retval != 0)
return retval;
return 0;
}
/* >8 End of main functional part of port initialization. */
/*
* The lcore main. This is the main thread that does the work, reading from
* an input port and writing to an output port.
*/
/* Basic forwarding application lcore. 8< */
static __rte_noreturn void
lcore_main(void)
{
uint16_t port;
int total = 0;
/*
* Check that the port is on the same NUMA node as the polling thread
* for best performance.
*/
RTE_ETH_FOREACH_DEV(port)
if (rte_eth_dev_socket_id(port) >= 0 &&
rte_eth_dev_socket_id(port) !=
(int)rte_socket_id())
printf("WARNING, port %u is on remote NUMA node to "
"polling thread.\n\tPerformance will "
"not be optimal.\n", port);
printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
rte_lcore_id());
/* Main work of application loop. 8< */
for (;;) {
/*
* Receive packets on a port and forward them on the paired
* port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
*/
RTE_ETH_FOREACH_DEV(port) {
/* Get burst of RX packets, from first port of pair. */
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
bufs, BURST_SIZE);
if (unlikely(nb_rx == 0))
continue;
total += nb_rx;
//rte_pktmbuf_free(bufs);
//printf("\nTotal: %d",total);
}
}
/* >8 End of loop. */
}
/* >8 End Basic forwarding application lcore. */
/*
* The main function, which does initialization and calls the per-lcore
* functions.
*/
int
main(int argc, char *argv[])
{
struct rte_mempool *mbuf_pool;
unsigned nb_ports;
uint16_t portid;
/* Initializion the Environment Abstraction Layer (EAL). 8< */
int ret = rte_eal_init(argc, argv);
rte_pdump_init();
if (ret < 0)
rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
/* >8 End of initialization the Environment Abstraction Layer (EAL). */
argc -= ret;
argv += ret;
/* Check that there is an even number of ports to send/receive on. */
nb_ports = rte_eth_dev_count_avail();
// if (nb_ports < 2 || (nb_ports & 1))
// rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
/* Creates a new mempool in memory to hold the mbufs. */
/* Allocates mempool to hold the mbufs. 8< */
mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
/* >8 End of allocating mempool to hold mbuf. */
if (mbuf_pool == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
/* Initializing all ports. 8< */
RTE_ETH_FOREACH_DEV(portid)
if (port_init(portid, mbuf_pool) != 0)
rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
portid);
/* >8 End of initializing all ports. */
if (rte_lcore_count() > 1)
printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
/* Call lcore_main on the main core only. Called on single lcore. 8< */
lcore_main();
/* >8 End of called on single lcore. */
/* clean up the EAL */
rte_eal_cleanup();
rte_pdump_uninit();
return 0;
}
Command to run this example that I am using: sudo ./dpdk-skeleton -l 1,2,3,4 -n 4 -a 0000:31:00.0
Command running:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Port 0 MAC: b8 ce f6 fb 13 30
WARNING: Too many lcores enabled. Only 1 used.
Core 1 forwarding packets. [Ctrl+C to quit]
I am not really sure what some command line values should be (like -n) so I am just guessing at this point.
For pdump, I am using this command sudo ./dpdk-pdump -l 3,4,5 -a 0000:31:00.0 -- --multi --pdump 'port=0,queue=0,rx-dev=/mnt/md0/rx-1.pcap'
Again I am not sure of some of these values so making a best guess.
The Issue
Now with the primary application running, I attempt to run pdump the first time:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57567_26f06531cb8cd
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
Segmentation fault
The Second Time:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57601_26f14f88bf7bb
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
EAL: Failed to hotplug add device
EAL: Error - exiting with code: 1
Cause: vdev creation failed:create_mp_ring_vdev:767
And finally, the third time seems to run but has an unfamiliar MAC address:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57679_26f28a2a6e9f1
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
Port 1 MAC: 02 70 63 61 70 01
core (4); port 0 device ((null)) queue 0
However, I am not receiving anything and the pcap file is empty (and yes I am sending packets):
Signal 2 received, preparing to exit...
##### PDUMP DEBUG STATS #####
-packets dequeued: 0
-packets transmitted to vdev: 0
-packets freed: 0
Also, on the second run, these messages start appearing on the primary application:
EAL: Failed to send hotplug request to secondary
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
lcore 1 called rx_pkt_burst for not ready port 1
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
lcore 1 called rx_pkt_burst for not ready port 1
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
My Questions
I will handle writing to file etc. later. I am simply trying to get the data into DPDK properly and to the correct queue at this point.
I assume that I need to use RSS or rte_flow to direct each stream to its own queue based on IP/Port. How do I do this? This is my first time doing any kind of advanced network programming so I am not familiar with these concepts, but based on my research they appear to be what I need.
What is the difference between RSS and rte_flow/offloading? Are they the same thing?
Are there any examples out there that are similar to what I am trying to do?
Is what I am trying to do feasible?

First part of question:
If the goal is to capture packets at line rate and write the descriptors and metadata to NVMe, please do not use the testpmd-pdump model. It will not scale for your use case.
To clarify the above:
DPDK allows a multi-process model: a single primary and multiple secondaries share the same hugepages, hardware resources and NIC.
testpmd together with dpdk-pdump is a prime example of the primary-secondary model.
BUT the pdump library as of today uses hugepage sharing to clone (copy) each packet received by the PRIMARY (testpmd) and send it over a ring buffer to the SECONDARY (pdump), which then writes it to a file using PCAP.
The pdump application has been enhanced to support multiple queues on multiple CPUs, but the inherent overhead of the packet copy and the ring-buffer transfer remains.
Hence, for the use case described, the best approach is to have a custom primary application receive the packets over multiple queues (4 streams, so 4 queues) for UDP. Take as much help as possible from the hardware (Mellanox ConnectX-5 NIC): use rte_flow based flow bifurcation and packet-type classification to minimize the CPU overhead of parsing.
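To illustrate what the per-stream classification amounts to, here is a minimal software sketch in plain C, assuming untagged Ethernet, unfragmented IPv4 and UDP. It is not the rte_flow API: with rte_flow the NIC would perform the equivalent match in hardware and deliver each stream to its own RX queue. The names stream_rule and classify_udp are hypothetical, chosen for this example only.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN    14u
#define ETHERTYPE_IPV4 0x0800u
#define IP_PROTO_UDP   17u
#define UDP_HDR_LEN    8u

/* One rule per stream: destination IP (host byte order), UDP port, target queue. */
struct stream_rule {
    uint32_t dst_ip;
    uint16_t dst_port;
    int queue;
};

/* Read big-endian (network order) fields from a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the queue of the first matching rule, or -1 if the frame is not an
 * unfragmented IPv4/UDP packet or matches no rule. On a match, *payload_off
 * and *payload_len describe the UDP payload, i.e. the frame with the
 * Ethernet, IPv4 and UDP headers stripped.
 */
static int classify_udp(const uint8_t *frame, size_t len,
                        const struct stream_rule *rules, size_t n_rules,
                        size_t *payload_off, size_t *payload_len)
{
    if (len < ETH_HDR_LEN + 20u + UDP_HDR_LEN)
        return -1;
    if (rd16(frame + 12) != ETHERTYPE_IPV4)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0f) * 4u;   /* IPv4 header length in bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20u || len < ETH_HDR_LEN + ihl + UDP_HDR_LEN)
        return -1;
    if ((ip[6] & 0x3f) != 0 || ip[7] != 0)      /* MF flag or fragment offset set */
        return -1;
    if (ip[9] != IP_PROTO_UDP)
        return -1;

    const uint8_t *udp = ip + ihl;
    size_t udp_len = rd16(udp + 4);             /* UDP header plus payload */
    if (udp_len < UDP_HDR_LEN || ETH_HDR_LEN + ihl + udp_len > len)
        return -1;

    uint32_t dst_ip = rd32(ip + 16);
    uint16_t dst_port = rd16(udp + 2);
    for (size_t i = 0; i < n_rules; i++) {
        if (rules[i].dst_ip == dst_ip && rules[i].dst_port == dst_port) {
            *payload_off = ETH_HDR_LEN + ihl + UDP_HDR_LEN;
            *payload_len = udp_len - UDP_HDR_LEN;
            return rules[i].queue;
        }
    }
    return -1;
}
```

The returned offset and length are what a writer thread would append to the stream's .dat file. When the NIC does the matching, the per-packet loop over the rules disappears and only the header-length arithmetic is left for the CPU.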
The second part of the question:
you are using the DPDK skeleton application as primary
only 1 RX/TX queue is configured for the port.
due to the inherent design of the CX-5, the most a single queue can handle is around 35 Mpps to 40 Mpps with the right settings.
for a 100 Gbps port with 1 queue, you can achieve line rate for simple forwarding from a payload size of 512 B onwards
From the code snippet, rte_pdump_init is invoked but its return value is not checked (please fix).
I believe the MLX CX-5 works with pdump given the right firmware, so please cross-check that too (information requested in the comments).
But the recommendation is not to use pdump as a secondary because of the inherent overhead of the multi-process handling (MP_HANDLE). Instead, consume the payload directly in the primary or in your own secondary.
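The 512 B figure can be sanity-checked with a little arithmetic. The sketch below assumes standard Ethernet framing, where every frame costs an extra 20 bytes on the wire (8 bytes of preamble and start-of-frame delimiter plus a 12-byte inter-frame gap), and it treats the size as the full frame including FCS. line_rate_pps is a hypothetical helper, not a DPDK function.

```c
#include <stdint.h>

/* Per-frame wire overhead: 8-byte preamble/SFD + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/*
 * Packets per second needed to saturate a link of link_bps bits per second
 * with Ethernet frames of frame_bytes bytes (headers and FCS included).
 */
static double line_rate_pps(double link_bps, uint32_t frame_bytes)
{
    return link_bps / (8.0 * (double)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}
```

At 100 Gbps this gives roughly 148.8 Mpps for 64-byte frames, about 45 Mpps for 256-byte frames and about 23.5 Mpps for 512-byte frames, so of these three only the 512-byte case fits within a 35 to 40 Mpps single-queue budget. For the 9000-byte MTU traffic in the question the packet rate is only around 1.4 Mpps at full line rate.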
Note:
as mentioned, there are 3 or 4 scenarios crammed into a single question, and sorting them out one by one requires live debugging.
please do not combine multiple questions and scenarios into one question

Related

run l2fwd fail in two containers

I am trying to run l2fwd in two containers on the same host. l2fwd starts successfully in container1, but as soon as I start l2fwd in container2, both instances crash with a segmentation fault. Has anyone encountered this error? Thanks.
Host: 4 sriov-vf enabled, driver: vfio-pci
container1: docker run --privileged --name="vhost_user" -v /dev:/dev -v /tmp:/tmp -itd centos-cu:v3
container2: docker run --privileged --name="virtio_user" -v /dev:/dev -v /tmp:/tmp -itd centos-cu:v3
l2fwd logs:
container1:
# ./l2fwd -l 2-3 -n 2 -w 0000:04:10.7 -w 0000:04:10.5 -- -p 0x3
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:04:10.5 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10ed net_ixgbe_vf
EAL: using IOMMU type 1 (Type 1)
EAL: PCI device 0000:04:10.7 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10ed net_ixgbe_vf
MAC updating enabled
Lcore 2: RX port 0
Lcore 3: RX port 1
Initializing port 0... done:
Port 0, MAC address: 02:09:C0:11:47:97
Initializing port 1... done:
Port 1, MAC address: 02:09:C0:00:2C:47
Checking link statusdone
Port0 Link Up. Speed 10000 Mbps - full-duplex
Port1 Link Up. Speed 10000 Mbps - full-duplex
L2FWD: entering main loop on lcore 3
L2FWD: -- lcoreid=3 portid=1
L2FWD: entering main loop on lcore 2
L2FWD: -- lcoreid=2 portid=0
Port statistics ====================================
Statistics for port 0 ------------------------------
Packets sent: 0
Packets received: 0
Packets dropped: 0
Statistics for port 1 ------------------------------
Packets sent: 0
Packets received: 5
Packets dropped: 0
Aggregate statistics ===============================
Total packets sent: 0
Total packets received: 5
Total packets dropped: 0
====================================================
Port statistics ====================================
Statistics for port 0 ------------------------------
Packets sent: 23
Packets received: 16
Packets dropped: 0
Statistics for port 1 ------------------------------
Packets sent: 16
Packets received: 26
Packets dropped: 0
Aggregate statistics ===============================
Total packets sent: 39
Total packets received: 42
Total packets dropped: 0
====================================================
(start to run l2fwd in container2)
./run_l2fwd.sh: line 3: 116 Segmentation fault (core dumped) ./l2fwd -l 2-3 -n 2 -w 0000:04:10.7 -w 0000:04:10.5 -- -p 0x3
container2:
# ./l2fwd -l 0-1 -n 2 -w 0000:04:10.3 -w 0000:04:10.1 -- -p 0x3
EAL: Detected 4 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:04:10.1 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10ed net_ixgbe_vf
EAL: using IOMMU type 1 (Type 1)
EAL: PCI device 0000:04:10.3 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 8086:10ed net_ixgbe_vf
MAC updating enabled
Lcore 0: RX port 0
Lcore 1: RX port 1
Initializing port 0... ./run_l2fwd.sh: line 3: 90 Segmentation fault (core dumped) ./l2fwd -l 0-1 -n 2 -w 0000:04:10.3 -w 0000:04:10.1 -- -p 0x3
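Mapping hugepages from files in hugetlbfs is essential for multi-process, because secondary processes need to map the same hugepages. EAL creates files like rtemap_0 in directories specified with --huge-dir option (or in the mount point for a specific hugepage size). The rte prefix can be changed using --file-prefix. This may be needed for running multiple primary processes that share a hugetlbfs mount point. Each backing file by default corresponds to one hugepage, it is opened and locked for the entire time the hugepage is used. This may exhaust the number of open files limit (NOFILE).
In this setup both containers mount the host's /dev (and so, most likely, the same hugetlbfs mount point), and both l2fwd instances are primary processes using the default rte prefix. Try giving each instance its own prefix, for example --file-prefix=c1 in container1 and --file-prefix=c2 in container2.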

Cannot allocate memory :Failed to create packet memory pool (rte_pktmbuf_pool_create failed) - for port_id 0

I have upgraded DPDK from 17.02 to 21.11. RPM build was built and installed successfully. While running the custom application I saw the following error:
Cannot allocate memory#012ms_dpdk::port::port: Failed to create packet memory pool (rte_pktmbuf_pool_create failed) - for port_id
Function call parameters : rte_pktmbuf_pool_create(port-0,267008,32,0,2176,0)
I have added std::string msg = rte_strerror(rte_errno); in error logs and it gives the output as
Cannot allocate memory
ldd output shows the libraries are linked properly and there are no "not found" entries.
ldd /opt/NETAwss/proxies/proxy | grep "buf"
librte_mbuf.so.22 => /lib64/librte_mbuf.so.22 (0x00007f795873f000)
ldd /opt/NETAwss/proxies/proxy | grep "pool"
librte_mempool_ring.so.22 => /lib64/librte_mempool_ring.so.22 (0x00007f7a1da3f000)
librte_mempool.so.22 => /lib64/librte_mempool.so.22 (0x00007f7a1da09000)
igb_uio is also loaded successfully.
lsmod | grep uio
igb_uio 4190 1
uio 8202 3 igb_uio
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
512
grep Huge /proc/meminfo
AnonHugePages: 983040 kB
ShmemHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 511
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
When I run dpdk-testpmd it seems to be working fine. Below is the output of the test application.
./dpdk-testpmd
EAL: Detected CPU lcores: 2
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:13:00.0 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:50:56:88:9A:43
Checking link statuses...
Done
No commandline core given, start packet forwarding
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
Press enter to exit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 2 RX-dropped: 0 RX-total: 2
TX-packets: 2 TX-dropped: 0 TX-total: 2
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 2 RX-dropped: 0 RX-total: 2
TX-packets: 2 TX-dropped: 0 TX-total: 2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
Stopping port 0...
Stopping ports...
Done
Shutting down port 0...
Closing ports...
Port 0 is closed
Done
Bye...
I am not able to figure out the root cause of this error. Any help is appreciated. Thanks
The memory allocation failure appears after moving from DPDK 17.02 to 21.11. This is expected given the fixed 512 * 2 MB of hugepages and the memory requirements of the custom application.
DPDK 21.11 introduces features such as telemetry, fbarray, multi-process communication sockets and service cores, which require more internal memory allocation (not all of it from the regular heap; some of it from hugepages).
rte_pktmbuf_pool_create tries to create a pool of 267008 * 2176 bytes plus per-mbuf overhead, which is about 0.8 GB.
Hence, with the new memory model and services, the total hugepage demand goes over 1 GB of mmapped area. Currently, only 512 * 2 MB of hugepages are allocated in the system.
Solutions:
reduce the number of mbufs from 267008 to a lower value such as 200000 to fit the available memory.
increase the number of available hugepages from 512 to 600.
use EAL options to select legacy memory and to disable telemetry, multi-process and service cores, reducing the memory footprint.
use the EAL argument --socket-mem or -m to fix the amount of memory allocated.
Note: the RPM package initially did not include libdpdk.pc. This file is required for obtaining the platform-specific CFLAGS and LDFLAGS.
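For a rough feel of how the mbuf count translates into hugepages, here is a back-of-the-envelope sketch. It assumes a 128-byte struct rte_mbuf and a 64-byte per-object mempool header; both constants are assumptions for illustration, and both function names are hypothetical. The result is only a lower bound for the mbuf pool itself: padding, objects that cannot straddle a page boundary, EAL internal structures and any other pools or rings the application creates all come on top.

```c
#include <stdint.h>

#define MBUF_STRUCT_BYTES   128u  /* assumed sizeof(struct rte_mbuf) */
#define OBJ_OVERHEAD_BYTES   64u  /* assumed per-object mempool header */

/* Lower-bound size in bytes of an mbuf pool with n_mbufs elements. */
static uint64_t mbuf_pool_bytes(uint64_t n_mbufs, uint32_t data_room,
                                uint32_t priv_size)
{
    uint64_t per_obj = (uint64_t)MBUF_STRUCT_BYTES + priv_size +
                       data_room + OBJ_OVERHEAD_BYTES;
    return n_mbufs * per_obj;
}

/* Number of hugepages of page_bytes each needed to hold the given bytes. */
static uint64_t hugepages_needed(uint64_t bytes, uint64_t page_bytes)
{
    return (bytes + page_bytes - 1) / page_bytes;
}
```

With the parameters from the question, mbuf_pool_bytes(267008, 2176, 0) comes to 632274944 bytes, or at least 302 pages of 2 MB, while 200000 mbufs need at least 226. Comparing such an estimate with HugePages_Free before and after EAL initialization shows how much headroom is left for everything else.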

DPDK failed to configure eth device with err No: -22, tx_queues is greater than one (Ethdev port_id=0 nb_tx_queues=2 > 1)

I have an application that uses DPDK for the fast path. During DPDK initialization, I am trying to configure two TX queues for a port, but configuring the eth device fails.
I am using the Intel IGB driver (I350 NIC) on a bare-metal setup. As per the DPDK documentation, the IGB Poll Mode Driver (https://doc.dpdk.org/guides/nics/igb.html) should support multiple RX/TX queues for a port. I am trying to configure two TX queues for a port (port_id = 0). When I invoke the API "rte_eth_dev_configure(portid = 0, nb_rx_queue = 1, nb_tx_queue = 2, &local_port_conf)", it returns error code -22, "Ethdev port_id=0 nb_tx_queues=2 > 1".
Does the IGB PMD driver support multiple TX queues for a port? Or do I need to do any configuration changes to support multiple TX queues?
For virtual functions (VFs), the NIC model in question supports only single-queue mode (source).
To test multi-queue support, consider passing a physical function (PF) to the setup instead.
The Intel Foundational NIC I350 assigns up to eight queues per port to the physical function driver, as per the Product Brief for the Intel® Ethernet Controller I350. For virtual functions, each VF gets 1 queue, with a maximum of 8 VFs per port, as defined in the Intel® Ethernet Controller I350 Datasheet.
In Linux, these can be validated by
ethtool: check the per-queue counters with option -S, since statistics are reported per queue.
dpdk: use rte_eth_dev_info_get to read the max_rx_queues and max_tx_queues attributes
Note: since a VF is allocated only 1 RX/TX queue pair, the reported error -22, Ethdev port_id=0 nb_tx_queues=2 > 1, may mean that a VF is in use. Hence the right way to configure the port is to first get its maximum RX/TX queue counts and ensure the request is within bounds.
Sample Code Snippet:
/* Initialise each port */
RTE_ETH_FOREACH_DEV(portid) {
struct rte_eth_rxconf rxq_conf;
struct rte_eth_txconf txq_conf;
struct rte_eth_conf local_port_conf = port_conf;
struct rte_eth_dev_info dev_info;
/* skip ports that are not enabled */
if ((l2fwd_enabled_port_mask & (1 << portid)) == 0) {
printf("Skipping disabled port %u\n", portid);
continue;
}
nb_ports_available++;
/* init port */
printf("Initializing port %u... ", portid);
fflush(stdout);
ret = rte_eth_dev_info_get(portid, &dev_info);
if (ret != 0)
rte_exit(EXIT_FAILURE,
"Error during getting device (port %u) info: %s\n",
portid, strerror(-ret));
if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
local_port_conf.txmode.offloads |=
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
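/* Configure the number of queues for a port. */
/* user_rx_queues and user_tx_queues are the queue counts requested by the application. */
if ((dev_info.max_rx_queues >= user_rx_queues) && (dev_info.max_tx_queues >= user_tx_queues)) {
ret = rte_eth_dev_configure(portid, user_rx_queues, user_tx_queues, &local_port_conf);
if (ret < 0)
rte_exit(EXIT_FAILURE, "Cannot configure device: err=%d, port=%u\n",
ret, portid);
/* >8 End of configuration of the number of queues for a port. */
} else {
/* More queues requested than the device (for example a VF) supports. */
rte_exit(EXIT_FAILURE, "Port %u supports at most %u RX and %u TX queues\n",
portid, dev_info.max_rx_queues, dev_info.max_tx_queues);
}
} /* end of RTE_ETH_FOREACH_DEV */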

DPDK sample app ipsec-secgw failing with virtio NIC

I tried running the DPDK ipsec-secgw sample app with the following versions
DPDK version dpdk-stable-19.11.5
OS CentOS Linux release 7.7.1908 (Core)
Kernel 3.10.0-1062.el7.x86_64
NIC type and driver
0000:00:04.0 'Virtio network device 1000' drv=igb_uio unused=virtio_pci,uio_pci_generic
Command and cmd line args used to run the app
./build/ipsec-secgw -l 6 -w 00:04.0 -w 00:05.0 --vdev "crypto_null" --log-level 8 \
--socket-mem 1024 -- -p 0xf -P -u 0x2 \
--config="(0,0,6),(1,0,6)" -f /root/config_file
Output:
EAL: Detected 8 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probing VFIO support...
EAL: PCI device 0000:00:04.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 1af4:1000 net_virtio
EAL: PCI device 0000:00:05.0 on NUMA socket -1
EAL: Invalid NUMA socket, default to 0
EAL: probe driver: 1af4:1000 net_virtio
CRYPTODEV: Creating cryptodev crypto_null
CRYPTODEV: Initialisation parameters - name: crypto_null,socket id: 0, max queue pairs: 8
Promiscuous mode selected
librte_ipsec usage: disabled
replay window size: 0
ESN: disabled
SA flags: 0
Frag TTL: 10000000000 ns
Allocated mbuf pool on socket 0
CRYPTODEV: elt_size 64 is expanded to 176
Allocated session pool on socket 0
Allocated session priv pool on socket 0
Configuring device port 0:
Address: 52:54:00:A5:82:2D
Creating queues: nb_rx_queue=1 nb_tx_queue=1...
EAL: Error - exiting with code: 1
Cause: Error: port 0 required RX offloads: 0xe, avaialbe RX offloads: 0xa1d
Config file contents:
#SP IPv4 rules
sp ipv4 out esp protect 1005 pri 1 dst 192.168.105.0/24 sport 0:65535 dport 0:65535
#SA rules
sa out 1005 aead_algo aes-128-gcm aead_key 2b:7e:15:16:28:ae:d2:a6:ab:f7:15:88:09:cf:4f:3d:de:ad:be:ef \
mode ipv4-tunnel src 172.16.1.5 dst 172.16.2.5 \
port_id 1 \
type inline-crypto-offload \
sa in 5 aead_algo aes-128-gcm aead_key 2b:7e:15:16:28:ae:d2:a6:ab:f7:15:88:09:cf:4f:3d:de:ad:be:ef \
mode ipv4-tunnel src 172.16.1.5 dst 172.16.2.5 \
port_id 1 \
type inline-crypto-offload \
#Routing rules
rt ipv4 dst 172.16.2.5/32 port 1
rt ipv4 dst 192.168.105.10/32 port 0
It says that certain offload capabilities are missing.
I got the config file details and command line arguments from a DPDK test plan for Niantic NICs. Is the app only supposed to work with Niantic PFs/VFs? Is there any way to get it to work with virtio paravirtualized NICs?
Instructions link followed:
Instructions
The DPDK ipsec-secgw example requests the RX offload .offloads = DEV_RX_OFFLOAD_CHECKSUM. In DPDK 19.11.5 LTS, the following drivers support it:
axgbe
dpaa2
e1000
enic
hinic
ixgbe
mlx4
mlx5
mvneta
mvpp2
netvsc
octeontx
octeontx2
sfc
tap
thunderx
vmxnet3
The DPDK RX checksum offload is defined as #define DEV_RX_OFFLOAD_CHECKSUM (DEV_RX_OFFLOAD_IPV4_CKSUM | DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM). In the error log (Cause: Error: port 0 required RX offloads: 0xe, available RX offloads: 0xa1d), the required mask 0xe is IPV4 (0x2) | UDP (0x4) | TCP (0x8). The available mask 0xa1d has 0x4 and 0x8 set but not 0x2, so DEV_RX_OFFLOAD_IPV4_CKSUM is the offload the virtio PMD does not provide.
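The mask comparison above can be reproduced with a few lines of C. This is only a sketch: the RX_OFFLOAD_* constants are local copies of the values I believe DEV_RX_OFFLOAD_*_CKSUM have in rte_ethdev.h for DPDK 19.11 (check them against your tree), and missing_offloads() and print_missing_cksum() are hypothetical helper names, not DPDK APIs.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror DEV_RX_OFFLOAD_*_CKSUM in DPDK 19.11 rte_ethdev.h. */
#define RX_OFFLOAD_IPV4_CKSUM UINT64_C(0x2)
#define RX_OFFLOAD_UDP_CKSUM  UINT64_C(0x4)
#define RX_OFFLOAD_TCP_CKSUM  UINT64_C(0x8)
#define RX_OFFLOAD_CHECKSUM \
    (RX_OFFLOAD_IPV4_CKSUM | RX_OFFLOAD_UDP_CKSUM | RX_OFFLOAD_TCP_CKSUM)

/* Hypothetical helper: bits the application requires but the port lacks. */
uint64_t missing_offloads(uint64_t required, uint64_t available)
{
    uint64_t missing = required & ~available;

    /* A missing bit is always one of the required bits. */
    assert((missing & ~required) == 0);
    return missing;
}

/* Hypothetical helper: name each missing checksum offload on stdout. */
void print_missing_cksum(uint64_t required, uint64_t available)
{
    uint64_t missing = missing_offloads(required, available);

    printf("missing mask: 0x%" PRIx64 "\n", missing);
    if (missing & RX_OFFLOAD_IPV4_CKSUM)
        printf("  IPv4 checksum offload\n");
    if (missing & RX_OFFLOAD_UDP_CKSUM)
        printf("  UDP checksum offload\n");
    if (missing & RX_OFFLOAD_TCP_CKSUM)
        printf("  TCP checksum offload\n");
}
```

Calling print_missing_cksum(0xe, 0xa1d) with the two values from the log reports only the IPv4 checksum bit as missing.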
On the question of whether ipsec-gw only works with Niantic NICs: that assumption is incorrect, because the application can run on any NIC whose driver offers the RX checksum offload. The list is shared above.
On the question "Is there any way to get it to work with virtio para-virtualized NICs?": you can always disable the RX checksum offload and verify the IPv4 checksum in software, but you will need to edit the application and use rte_ipv4_cksum.
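To illustrate what verifying the IPv4 checksum in software involves, here is a self-contained sketch of the RFC 1071 header checksum. It is a stand-in for the DPDK helper, not its implementation; ipv4_hdr_cksum() and ipv4_hdr_cksum_ok() are hypothetical names, and the header is treated as raw bytes in network order rather than as a DPDK header struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of the header as 16-bit big-endian words,
 * folded to 16 bits and inverted (RFC 1071).
 * len is the IPv4 header length in bytes (IHL * 4).
 */
uint16_t ipv4_hdr_cksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(hdr != NULL && len >= 20 && len % 4 == 0);

    for (i = 0; i < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}

/*
 * A received header is valid when the sum over all of its words,
 * including the stored checksum field, inverts to zero.
 */
int ipv4_hdr_cksum_ok(const uint8_t *hdr, size_t len)
{
    return ipv4_hdr_cksum(hdr, len) == 0;
}
```

In the application, a check like this would run on each received packet once the checksum offload is removed from the port configuration, and packets that fail it would be dropped. Check rte_ip.h for the exact contract of the DPDK helper, in particular whether it expects the checksum field to be zeroed first.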

How to send and receive data using DPDK

I have a quad port Intel 1G network card. I am using DPDK to send data on one physical port and receive on another.
I looked at a few examples in the DPDK code but could not make them work. If anybody knows how to do this, please send me simple instructions so I can follow and understand. I have set up my PC for hugepages, loaded the driver, and bound the network ports to the DPDK driver. I can run helloworld from DPDK, so the system setup looks OK to me.
Thanks in advance.
After building DPDK:
cd to the DPDK directory.
Run sudo build/app/testpmd -- --interactive
You should see output like this:
$ sudo build/app/testpmd -- --interactive
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Multi-process socket /var/run/.rte_unix
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0002:00:02.0 on NUMA socket 0
EAL: probe driver: 15b3:1004 net_mlx4
PMD: net_mlx4: PCI information matches, using device "mlx4_0" (VF: true)
PMD: net_mlx4: 1 port(s) detected
PMD: net_mlx4: port 1 MAC address is 00:0d:3a:f4:6e:17
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=203456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port
will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:0D:3A:F4:6E:17
Checking link statuses...
Done
testpmd>
Don't worry about the "No free hugepages" message. It means EAL couldn't find any free 1024 MB hugepages, but since it continued OK, it must have found some 2 MB hugepages. It'd be nice if it said "EAL: Using 2 MB huge pages" instead.
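If you want to confirm which hugepage sizes actually have free pages, the kernel exposes the counts in sysfs. The sketch below reads them; hugepage_free_path() and free_hugepages() are hypothetical helper names, and the sysfs layout is the standard Linux one (the same hugepages-1048576kB directory name appears in the EAL message).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path holding the free-page count for one hugepage size. */
int hugepage_free_path(unsigned long size_kb, char *buf, size_t buflen)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen,
                 "/sys/kernel/mm/hugepages/hugepages-%lukB/free_hugepages",
                 size_kb);
    /* Fail if snprintf errored or the path was truncated. */
    return (n > 0 && (size_t)n < buflen) ? 0 : -1;
}

/* Return the free page count, or -1 if that size is absent or unreadable. */
long free_hugepages(unsigned long size_kb)
{
    char path[128];
    long count = -1;
    FILE *f;

    if (hugepage_free_path(size_kb, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &count) != 1)
        count = -1;
    fclose(f);
    return count;
}
```

On a system like the one above, free_hugepages(1048576) would be expected to return 0 or -1 and free_hugepages(2048) a positive count, but the actual values depend on how the machine was configured.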
At the prompt, type start tx_first, then quit. You should see something like:
testpmd> start tx_first
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0:
CRC stripping enabled
RX queues=1 - RX desc=1024 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX queues=1 - TX desc=1024 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX RS bit threshold=0 - TXQ offloads=0x0
testpmd> quit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 0 RX-dropped: 0 RX-total: 0
TX-packets: 32 TX-dropped: 0 TX-total: 32
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 0 RX-dropped: 0 RX-total: 0
TX-packets: 32 TX-dropped: 0 TX-total: 32
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In my system there is only one DPDK port, so I sent 32 packets but did not receive any. If I had a multi-port card with a cable directly between the ports, then I'd see the RX count also increase.
You can use testpmd to test DPDK.
testpmd can work as a packet generator (txonly mode), a receiver (rxonly mode), or a forwarder (io mode).
If you want to use testpmd only as a forwarder, you will need generator nodes connected to your box.
I suggest starting with the following setup:
generator (pktgen) ------> testpmd (io mode) ----------> receiver (testpmd rxonly mode).
In pktgen, set the destination MAC address to the MAC address of the receiver's receiving port.
PKTGEN and how it works in detail is explained more in this link :
http://pktgen.readthedocs.io/en/latest/getting_started.html
TESTPMD and how it works is explained here :
http://www.intel.com/content/dam/www/public/us/en/documents/guides/dpdk-testpmd-application-user-guide.pdf
I hope this helps.