I am using a Netronome 4000 SmartNIC. After binding a VF to the DPDK driver, I try to run dpdk testpmd but it shows 'Link status: down'. I also tried to bring the port up in testpmd, but that fails too.
Do I need to explicitly bring up the interface/port in some way?
lspci -kd 19ee:
05:00.0 Ethernet controller: Netronome Systems, Inc. Device 4000
Subsystem: Netronome Systems, Inc. Device 4000
Kernel driver in use: nfp
Kernel modules: nfp
05:08.0 Ethernet controller: Netronome Systems, Inc. Device 6003
Subsystem: Netronome Systems, Inc. Device 4000
Kernel driver in use: igb_uio
Kernel modules: nfp
testpmd> show port info all
********************* Infos for port 0 *********************
MAC address: 00:15:4D:00:00:00
Device name: 0000:05:08.0
Driver name: net_nfp_vf
Connect to socket: 0
memory allocation on the socket: 0
Link status: down
Link speed: 0 Mbps
Link duplex: half-duplex
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 1
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 128
Supported RSS offload flow types:
ipv4
ipv4-tcp
ipv4-udp
ipv6
ipv6-tcp
ipv6-udp
Minimum size of RX buffer: 68
Maximum configurable length of RX packet: 9216
Maximum configurable size of LRO aggregated packet: 0
Current number of RX queues: 1
Max possible RX queues: 1
Max possible number of RXDs per queue: 32768
Min possible number of RXDs per queue: 64
RXDs number alignment: 128
Current number of TX queues: 1
Max possible TX queues: 1
Max possible number of TXDs per queue: 32768
Min possible number of TXDs per queue: 64
TXDs number alignment: 128
Max segment number per packet: 255
Max segment number per MTU/TSO: 8
testpmd> set link-up port 0
nfp_net_set_link_up(): Set link up
Set link up fail.
Goal: To write a DPDK based application that can read UDP packets via a 100gbe ethernet port and write the payload to disk depending on what the destination IP/Port is. At most, each 100gbe link will have two different destination IP addresses and 4 unique destination port numbers. Initial design calls for two unique port numbers.
Hardware
Current Test System
I am currently testing with the following hardware to develop a proof of concept (POC). The server hardware and NVMe drives will be significantly upgraded in the next few weeks. The NIC will remain the same unless recommended otherwise.
2 x Intel Xeon Gold 6348 CPU @ 2.6 GHz
28 cores per socket
Max 3.5 GHz
Hyperthreading disabled
Ubuntu 22.04.1 LTS
Kernel 5.15.0-53-generic
Cores set to performance governor
4 x Sabrent 2TB Rocket 4 Plus in RAID0 Config
128 GB DDR4 Memory
10 1GB HugePages (Can change to what is required)
1 x Mellanox ConnectX-5 100gbe NIC
31:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Firmware-version: 16.35.1012
UDP Source:
100 gbe NIC
9000 MTU Packets
ipv4-udp packets
Currently only 4GB/s per port but eventually will be 10GB/s per port.
NIC Information
ethtool output:
Settings for ens7f0np0:
Supported ports: [ Backplane ]
Supported link modes: 1000baseKX/Full
10000baseKR/Full
40000baseKR4/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
50000baseCR2/Full
50000baseKR2/Full
100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Supported pause frame use: Symmetric
Supports auto-negotiation: Yes
Supported FEC modes: None RS BASER
Advertised link modes: 1000baseKX/Full
10000baseKR/Full
40000baseKR4/Full
40000baseCR4/Full
40000baseSR4/Full
40000baseLR4/Full
25000baseCR/Full
25000baseKR/Full
25000baseSR/Full
50000baseCR2/Full
50000baseKR2/Full
100000baseKR4/Full
100000baseSR4/Full
100000baseCR4/Full
100000baseLR4_ER4/Full
Advertised pause frame use: Symmetric
Advertised auto-negotiation: Yes
Advertised FEC modes: RS
Speed: 100000Mb/s
Duplex: Full
Auto-negotiation: on
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000004 (4)
link
Link detected: yes
testpmd info:
[sudo] password for maa:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.1 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Interactive-mode selected
testpmd: create a new mbuf pool <mb_pool_0>: n=171456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Configuring Port 0 (socket 0)
Port 0: B8:CE:F6:FB:13:30
Configuring Port 1 (socket 0)
Port 1: B8:CE:F6:FB:13:31
Checking link statuses...
Done
testpmd> show port info 0
********************* Infos for port 0 *********************
MAC address: B8:CE:F6:FB:13:30
Device name: 31:00.0
Driver name: mlx5_pci
Firmware-version: 16.35.1012
Devargs:
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 100 Gbps
Link duplex: full-duplex
Autoneg status: On
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 128
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip off, filter off, extend off, qinq strip off
Hash key size in bytes: 40
Redirection table size: 1
Supported RSS offload flow types:
ipv4
ipv4-frag
ipv4-tcp
ipv4-udp
ipv4-other
ipv6
ipv6-frag
ipv6-tcp
ipv6-udp
ipv6-other
ipv6-ex
ipv6-tcp-ex
ipv6-udp-ex
l4-dst-only
l4-src-only
l3-dst-only
l3-src-only
Minimum size of RX buffer: 32
Maximum configurable length of RX packet: 65536
Maximum configurable size of LRO aggregated packet: 65280
Current number of RX queues: 1
Max possible RX queues: 1024
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Current number of TX queues: 1
Max possible TX queues: 1024
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1
Max segment number per packet: 40
Max segment number per MTU/TSO: 40
Device capabilities: 0x14( RXQ_SHARE FLOW_SHARED_OBJECT_KEEP )
Switch name: 31:00.0
Switch domain Id: 0
Switch Port Id: 65535
Switch Rx domain: 0
RX offload capabilities:
testpmd> show port 0 rx_offload capabilities
Rx Offloading Capabilities of port 0 :
Per Queue : VLAN_STRIP IPV4_CKSUM UDP_CKSUM TCP_CKSUM TCP_LRO SCATTER TIMESTAMP KEEP_CRC RSS_HASH BUFFER_SPLIT
Per Port : VLAN_FILTER
Physical Layout
For now, I am only attempting to get a single stream on a single port working. My plan is for each queue to be tied to an lcore, and each lcore has its own thread to strip the headers, and then a thread to write to disk. Assume that the hardware is on the same NUMA node. No TX is currently required.
Requirements
Eventually 10 GB/s to disk from one 100gb/s port (Currently trying to get 4GB/s)
Assume that I will have a processor, PCI lanes, and NVMe configuration that can support this; I am only worried about the DPDK side of things for now.
Each individual stream will be ~5GB/s, currently they are ~2GB/s and I am just trying to get that to work for now.
Each stream's headers are stripped.
Each stream has its payload written to disk as a single continuous file (.dat format).
Each stream has a unique port number.
If the FCS is bad, drop the packet and replace with blank data (all zeros).
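The header-stripping requirement can be sketched in plain C, independent of DPDK. This is only an illustration under stated assumptions: frames are untagged Ethernet/IPv4/UDP, and the helper name udp_payload_span and its constants are hypothetical, not part of any library. It locates the UDP payload inside a raw frame so that only the payload bytes are handed to the writer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes and values fixed by the protocols themselves. */
#define ETH_HDR_LEN   14u      /* dst MAC + src MAC + EtherType */
#define UDP_HDR_LEN   8u
#define ETHERTYPE_IP4 0x0800u
#define IP_PROTO_UDP  17u

/*
 * Hypothetical helper: locate the UDP payload inside an untagged
 * Ethernet/IPv4/UDP frame. Returns 0 and fills *off and *len on success,
 * or -1 if the frame is not plain IPv4/UDP or is truncated.
 */
static int udp_payload_span(const uint8_t *frame, size_t frame_len,
                            size_t *off, size_t *len)
{
    assert(frame != NULL && off != NULL && len != NULL);

    if (frame_len < ETH_HDR_LEN + 20u + UDP_HDR_LEN)
        return -1;
    if (((unsigned)frame[12] << 8 | frame[13]) != ETHERTYPE_IP4)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4u;   /* IPv4 header length, bytes */
    if ((ip[0] >> 4) != 4 || ihl < 20u || ip[9] != IP_PROTO_UDP)
        return -1;
    if (frame_len < ETH_HDR_LEN + ihl + UDP_HDR_LEN)
        return -1;

    const uint8_t *udp = ip + ihl;
    size_t udp_len = (size_t)udp[4] << 8 | udp[5];  /* UDP header + payload */
    if (udp_len < UDP_HDR_LEN || frame_len < ETH_HDR_LEN + ihl + udp_len)
        return -1;

    *off = ETH_HDR_LEN + ihl + UDP_HDR_LEN;
    *len = udp_len - UDP_HDR_LEN;
    return 0;
}
```

In a DPDK receive loop the frame pointer and length would come from the received mbuf; a VLAN-tagged or IPv6 stream would need extra cases that this sketch leaves out.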
What I have tried:
I am having a similar issue to the poster at this question: DPDK pdump failed to hotplug add device
I attempted the fix there and am still having issues.
I have simply tried to get a pcap file by attaching pdump to the skeleton code.
/* SPDX-License-Identifier: BSD-3-Clause
* Copyright(c) 2010-2015 Intel Corporation
*/
#include <stdint.h>
#include <inttypes.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_pdump.h>
#define RX_RING_SIZE 1024
#define TX_RING_SIZE 1024
#define NUM_MBUFS 8191
#define MBUF_CACHE_SIZE 250
#define BURST_SIZE 32
/* basicfwd.c: Basic DPDK skeleton forwarding example. */
/*
* Initializes a given port using global settings and with the RX buffers
* coming from the mbuf_pool passed as a parameter.
*/
/* Main functional part of port initialization. 8< */
static inline int
port_init(uint16_t port, struct rte_mempool *mbuf_pool)
{
struct rte_eth_conf port_conf;
const uint16_t rx_rings = 1, tx_rings = 1;
uint16_t nb_rxd = RX_RING_SIZE;
uint16_t nb_txd = TX_RING_SIZE;
int retval;
uint16_t q;
struct rte_eth_dev_info dev_info;
struct rte_eth_txconf txconf;
if (!rte_eth_dev_is_valid_port(port))
return -1;
memset(&port_conf, 0, sizeof(struct rte_eth_conf));
retval = rte_eth_dev_info_get(port, &dev_info);
if (retval != 0) {
printf("Error during getting device (port %u) info: %s\n",
port, strerror(-retval));
return retval;
}
if (dev_info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE)
port_conf.txmode.offloads |=
RTE_ETH_TX_OFFLOAD_MBUF_FAST_FREE;
/* Configure the Ethernet device. */
retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &port_conf);
if (retval != 0)
return retval;
retval = rte_eth_dev_adjust_nb_rx_tx_desc(port, &nb_rxd, &nb_txd);
if (retval != 0)
return retval;
/* Allocate and set up 1 RX queue per Ethernet port. */
for (q = 0; q < rx_rings; q++) {
retval = rte_eth_rx_queue_setup(port, q, nb_rxd,
rte_eth_dev_socket_id(port), NULL, mbuf_pool);
if (retval < 0)
return retval;
}
txconf = dev_info.default_txconf;
txconf.offloads = port_conf.txmode.offloads;
/* Allocate and set up 1 TX queue per Ethernet port. */
for (q = 0; q < tx_rings; q++) {
retval = rte_eth_tx_queue_setup(port, q, nb_txd,
rte_eth_dev_socket_id(port), &txconf);
if (retval < 0)
return retval;
}
/* Starting Ethernet port. 8< */
retval = rte_eth_dev_start(port);
/* >8 End of starting of ethernet port. */
if (retval < 0)
return retval;
/* Display the port MAC address. */
struct rte_ether_addr addr;
retval = rte_eth_macaddr_get(port, &addr);
if (retval != 0)
return retval;
printf("Port %u MAC: %02" PRIx8 " %02" PRIx8 " %02" PRIx8
" %02" PRIx8 " %02" PRIx8 " %02" PRIx8 "\n",
port, RTE_ETHER_ADDR_BYTES(&addr));
/* Enable RX in promiscuous mode for the Ethernet device. */
retval = rte_eth_promiscuous_enable(port);
/* End of setting RX port in promiscuous mode. */
if (retval != 0)
return retval;
return 0;
}
/* >8 End of main functional part of port initialization. */
/*
* The lcore main. This is the main thread that does the work, reading from
* an input port and writing to an output port.
*/
/* Basic forwarding application lcore. 8< */
static __rte_noreturn void
lcore_main(void)
{
uint16_t port;
int total = 0;
/*
* Check that the port is on the same NUMA node as the polling thread
* for best performance.
*/
RTE_ETH_FOREACH_DEV(port)
if (rte_eth_dev_socket_id(port) >= 0 &&
rte_eth_dev_socket_id(port) !=
(int)rte_socket_id())
printf("WARNING, port %u is on remote NUMA node to "
"polling thread.\n\tPerformance will "
"not be optimal.\n", port);
printf("\nCore %u forwarding packets. [Ctrl+C to quit]\n",
rte_lcore_id());
/* Main work of application loop. 8< */
for (;;) {
/*
* Receive packets on a port and forward them on the paired
* port. The mapping is 0 -> 1, 1 -> 0, 2 -> 3, 3 -> 2, etc.
*/
RTE_ETH_FOREACH_DEV(port) {
/* Get burst of RX packets, from first port of pair. */
struct rte_mbuf *bufs[BURST_SIZE];
const uint16_t nb_rx = rte_eth_rx_burst(port, 0,
bufs, BURST_SIZE);
if (unlikely(nb_rx == 0))
continue;
total += nb_rx;
//rte_pktmbuf_free(bufs);
//printf("\nTotal: %d",total);
}
}
/* >8 End of loop. */
}
/* >8 End Basic forwarding application lcore. */
/*
* The main function, which does initialization and calls the per-lcore
* functions.
*/
int
main(int argc, char *argv[])
{
struct rte_mempool *mbuf_pool;
unsigned nb_ports;
uint16_t portid;
/* Initializion the Environment Abstraction Layer (EAL). 8< */
int ret = rte_eal_init(argc, argv);
rte_pdump_init();
if (ret < 0)
rte_exit(EXIT_FAILURE, "Error with EAL initialization\n");
/* >8 End of initialization the Environment Abstraction Layer (EAL). */
argc -= ret;
argv += ret;
/* Check that there is an even number of ports to send/receive on. */
nb_ports = rte_eth_dev_count_avail();
// if (nb_ports < 2 || (nb_ports & 1))
// rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n");
/* Creates a new mempool in memory to hold the mbufs. */
/* Allocates mempool to hold the mbufs. 8< */
mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS * nb_ports,
MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
/* >8 End of allocating mempool to hold mbuf. */
if (mbuf_pool == NULL)
rte_exit(EXIT_FAILURE, "Cannot create mbuf pool\n");
/* Initializing all ports. 8< */
RTE_ETH_FOREACH_DEV(portid)
if (port_init(portid, mbuf_pool) != 0)
rte_exit(EXIT_FAILURE, "Cannot init port %"PRIu16 "\n",
portid);
/* >8 End of initializing all ports. */
if (rte_lcore_count() > 1)
printf("\nWARNING: Too many lcores enabled. Only 1 used.\n");
/* Call lcore_main on the main core only. Called on single lcore. 8< */
lcore_main();
/* >8 End of called on single lcore. */
/* clean up the EAL */
rte_eal_cleanup();
rte_pdump_uninit();
return 0;
}
Command to run this example that I am using: sudo ./dpdk-skeleton -l 1,2,3,4 -n 4 -a 0000:31:00.0
Command running:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
Port 0 MAC: b8 ce f6 fb 13 30
WARNING: Too many lcores enabled. Only 1 used.
Core 1 forwarding packets. [Ctrl+C to quit]
I am not really sure what some command line values should be (like -n) so I am just guessing at this point.
For pdump, I am using this command sudo ./dpdk-pdump -l 3,4,5 -a 0000:31:00.0 -- --multi --pdump 'port=0,queue=0,rx-dev=/mnt/md0/rx-1.pcap'
Again I am not sure of some of these values so making a best guess.
The Issue
Now with the primary application running, I attempt to run pdump the first time:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57567_26f06531cb8cd
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
Segmentation fault
The Second Time:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57601_26f14f88bf7bb
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
EAL: Failed to hotplug add device
EAL: Error - exiting with code: 1
Cause: vdev creation failed:create_mp_ring_vdev:767
And finally, the third time seems to run but has an unfamiliar MAC address:
EAL: Detected CPU lcores: 56
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket_57679_26f28a2a6e9f1
EAL: Selected IOVA mode 'PA'
EAL: VFIO support initialized
EAL: Probe PCI driver: mlx5_pci (15b3:1017) device: 0000:31:00.0 (socket 0)
Port 1 MAC: 02 70 63 61 70 01
core (4); port 0 device ((null)) queue 0
However, I am not receiving anything and the pcap file is empty (and yes I am sending packets):
Signal 2 received, preparing to exit...
##### PDUMP DEBUG STATS #####
-packets dequeued: 0
-packets transmitted to vdev: 0
-packets freed: 0
Also, on the second run, these messages start appearing on the primary application:
EAL: Failed to send hotplug request to secondary
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
lcore 1 called rx_pkt_burst for not ready port 1
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
lcore 1 called rx_pkt_burst for not ready port 1
6: [./dpdk-skeleton(_start+0x25) [0x55f70a909045]]
5: [/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7fb90e478e40]]
4: [/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7fb90e478d90]]
3: [./dpdk-skeleton(main+0x1ad) [0x55f70a20a8ad]]
2: [./dpdk-skeleton(+0x1fdde7) [0x55f70a014de7]]
1: [./dpdk-skeleton(rte_dump_stack+0x32) [0x55f70aa973d2]]
My Questions
I will handle writing to file etc. later. I am simply trying to get the data into DPDK properly and to the correct queue at this point.
I assume that I need to use RSS or rte_flow to direct each stream to its own queue based on IP/Port. How do I do this? This is my first time doing any kind of advanced network programming so I am not familiar with these concepts, but based on my research they appear to be what I need.
What is the difference between RSS and rte_flow/offloading? Are they the same thing?
Are there any examples out there that are similar to what I am trying to do?
Is what I am trying to do feasible?
First part of question:
If the goal is to capture packets at line rate and write the descriptor and metadata to NVMe, please do not use the testpmd-pdump model. It will not scale for this use case.
To clarify the above:
DPDK allows a multi-process model: a single primary and multiple secondaries that share the same hugepages, HW resources and NIC.
DPDK testpmd with dpdk-pdump is a prime example of the primary-secondary model.
BUT the PDUMP library, as of today, uses hugepage sharing to clone-copy each packet received by the PRIMARY (testpmd) and send it over a ring buffer to the SECONDARY (PDUMP), which then writes it to a file using PCAP.
The PDUMP application has been enhanced to support multiple queues on multiple CPUs, but it still carries the inherent overhead of copying each packet and sending it over a ring buffer.
Hence, for the use case above, the best approach is to have a custom primary application receive the UDP packets over multiple queues (4 streams, hence 4 queues). Take as much help as possible from the HW (Mellanox CX-5 NIC) by using rte_flow based flow bifurcation and packet-type classification for packet filtering, to minimize the CPU overhead of parsing.
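To make the difference between exact-match steering (the rte_flow approach) and hash-based spreading (RSS) concrete, here is a plain-C model with no DPDK calls. It is a conceptual sketch only: steer_rule, steer_exact and steer_hash are hypothetical names, and the mixing function is a simple stand-in rather than the Toeplitz hash that NICs use for RSS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One steering rule: exact destination IPv4 address and UDP port. */
struct steer_rule {
    uint32_t dst_ip;    /* host byte order */
    uint16_t dst_port;  /* host byte order */
    uint16_t queue;     /* RX queue that should receive the stream */
};

/*
 * Exact-match steering, the idea behind a flow rule with a queue action:
 * a known stream always lands on the queue chosen for it.
 * Returns the queue index, or -1 when no rule matches.
 */
static int steer_exact(const struct steer_rule *rules, size_t n_rules,
                       uint32_t dst_ip, uint16_t dst_port)
{
    for (size_t i = 0; i < n_rules; i++)
        if (rules[i].dst_ip == dst_ip && rules[i].dst_port == dst_port)
            return rules[i].queue;
    return -1;
}

/*
 * Hash-based spreading, the idea behind RSS: the queue is derived from a
 * hash of header fields, so the application does not pick which stream
 * goes to which queue. The mixer below is a stand-in, NOT Toeplitz.
 */
static unsigned steer_hash(uint32_t dst_ip, uint16_t dst_port,
                           unsigned n_queues)
{
    assert(n_queues > 0);
    uint32_t h = dst_ip ^ ((uint32_t)dst_port * 2654435761u);
    h ^= h >> 16;
    return h % n_queues;
}
```

With only a handful of known streams, the exact-match form is the better fit: each stream gets a dedicated queue and lcore, whereas a hash may place two streams on the same queue. In DPDK the exact-match half would be expressed as an rte_flow rule whose pattern matches the IPv4 destination address and UDP destination port and whose action is RTE_FLOW_ACTION_TYPE_QUEUE.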
The second part of the question:
You are using the DPDK skeleton as the primary.
There is only 1 RX/TX queue pair configured for the port.
Due to the inherent design of the CX-5, the most that 1 queue can do is around 35 Mpps to 40 Mpps with the right settings.
For a 100 Gbps port with 1 queue, you can achieve line rate for simple forwarding with a payload size of 512B onwards.
From the code snippet, I see rte_pdump_init is invoked but its return value is not checked (please fix).
I believe the MLX CX-5 works with PDUMP given the right firmware, so please cross-check that too (information requested in a comment).
But the recommendation is not to use PDUMP as a secondary, due to the native overhead of MP_HANDLE. Instead, consume the payload directly in the primary or a secondary.
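The arithmetic behind the packet-rate figures above can be checked with a few lines of C. This is a back-of-the-envelope sketch: line_rate_pps is a hypothetical helper, and it assumes the usual 20 bytes of per-frame wire overhead on Ethernet (preamble, start-of-frame delimiter and inter-frame gap).

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame bytes on the wire that never reach the application:
 * 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte gap. */
#define WIRE_OVERHEAD_BYTES 20u

/*
 * Packets per second needed to saturate a link.
 * frame_bytes is the Ethernet frame size including the 4-byte FCS.
 */
static uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / ((uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8u);
}
```

At 100 Gbps this gives about 148.8 Mpps for 64-byte frames, far more than one queue can handle, and about 23.5 Mpps for 512-byte frames, which fits within the 35 to 40 Mpps single-queue figure. For the 9000-byte MTU streams in the question (assuming 9018-byte frames), full line rate is only roughly 1.38 Mpps.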
Note:
As mentioned, there are 3 or 4 scenarios crammed into a single question, and sorting them out one by one requires live debugging.
Please do not combine multiple questions and scenarios into one question.
I have upgraded DPDK from 17.02 to 21.11. RPM build was built and installed successfully. While running the custom application I saw the following error:
Cannot allocate memory#012ms_dpdk::port::port: Failed to create packet memory pool (rte_pktmbuf_pool_create failed) - for port_id
Function call parameters : rte_pktmbuf_pool_create(port-0,267008,32,0,2176,0)
I have added std::string msg = rte_strerror(rte_errno); in error logs and it gives the output as
Cannot allocate memory
The ldd output shows the libraries are linked properly and there are no "not found" entries.
ldd /opt/NETAwss/proxies/proxy | grep "buf"
librte_mbuf.so.22 => /lib64/librte_mbuf.so.22 (0x00007f795873f000)
ldd /opt/NETAwss/proxies/proxy | grep "pool"
librte_mempool_ring.so.22 => /lib64/librte_mempool_ring.so.22 (0x00007f7a1da3f000)
librte_mempool.so.22 => /lib64/librte_mempool.so.22 (0x00007f7a1da09000)
igb_uio is also loaded successfully.
lsmod | grep uio
igb_uio 4190 1
uio 8202 3 igb_uio
cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
512
grep Huge /proc/meminfo
AnonHugePages: 983040 kB
ShmemHugePages: 0 kB
HugePages_Total: 512
HugePages_Free: 511
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
When I run dpdk-testpmd it seems to be working fine. Below is the output of the test application.
./dpdk-testpmd
EAL: Detected CPU lcores: 2
EAL: Detected NUMA nodes: 1
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: Probe PCI driver: net_vmxnet3 (15ad:7b0) device: 0000:13:00.0 (socket 0)
TELEMETRY: No legacy callbacks, legacy socket not created
testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:50:56:88:9A:43
Checking link statuses...
Done
No commandline core given, start packet forwarding
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP allocation mode: native
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0: RX queue number: 1 Tx queue number: 1
Rx offloads=0x0 Tx offloads=0x0
RX queue: 0
RX desc=0 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
RX Offloads=0x0
TX queue: 0
TX desc=0 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX offloads=0x0 - TX RS bit threshold=0
Press enter to exit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 2 RX-dropped: 0 RX-total: 2
TX-packets: 2 TX-dropped: 0 TX-total: 2
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 2 RX-dropped: 0 RX-total: 2
TX-packets: 2 TX-dropped: 0 TX-total: 2
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Done.
Stopping port 0...
Stopping ports...
Done
Shutting down port 0...
Closing ports...
Port 0 is closed
Done
Bye...
I am not able to figure out the root cause of this error. Any help is appreciated. Thanks
The memory allocation failure appears after moving from DPDK 17.02 to 21.11. This is expected given the fixed 512 * 2MB of hugepages and the memory requirements of the custom application.
DPDK 21.11 introduces new features such as telemetry, fbarray, MP communication sockets and service cores, which require more internal memory allocation (not everything comes from the heap region; some comes from hugepages).
rte_pktmbuf_pool_create tries to create a pool of (267008 * 2176 + additional placeholder) bytes, which is about 0.8GB.
Hence, with the new memory model and services, the total hugepage demand shoots over 1GB of mmapped area. Currently, the hugepages allocated in the system are only 512 * 2MB.
Solutions:
Reduce the number of MBUFs from 267008 to a lower value, such as 200000, to satisfy the memory requirement.
Increase the number of available hugepages from 512 to 600.
Use the EAL options for legacy memory, no telemetry, no multi-process and no service cores to reduce the memory footprint.
Use the EAL argument --socket-mem or -m to fix the memory allocations.
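To compare different mbuf counts against the hugepages available, a rough lower bound can be computed as below. This is a sketch under assumptions, not DPDK's own accounting: the helper names are hypothetical, sizeof(struct rte_mbuf) is assumed to be 128 bytes, and per-object mempool headers, padding, the per-lcore cache and the free-object ring are all ignored, so the real footprint is larger.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: sizeof(struct rte_mbuf) is 128 bytes (two 64-byte cache
 * lines), as on common x86-64 builds. */
#define MBUF_STRUCT_BYTES 128u

/*
 * Lower bound on the bytes needed for the objects of a packet mbuf pool.
 * Ignores mempool per-object headers, alignment padding, the per-lcore
 * cache and the ring that tracks free objects.
 */
static uint64_t mbuf_pool_min_bytes(uint64_t n_mbufs, uint32_t priv_size,
                                    uint32_t data_room_size)
{
    assert(data_room_size > 0);
    return n_mbufs * ((uint64_t)MBUF_STRUCT_BYTES + priv_size + data_room_size);
}

/* Total bytes provided by n_pages hugepages of page_kib KiB each. */
static uint64_t hugepage_bytes(uint64_t n_pages, uint64_t page_kib)
{
    return n_pages * page_kib * 1024u;
}
```

For the call in the question, mbuf_pool_min_bytes(267008, 0, 2176) is 615,186,432 bytes (about 587 MiB), while hugepage_bytes(512, 2048) is 1,073,741,824 bytes. The pool objects alone therefore take well over half of the hugepage memory before any of DPDK's own allocations; with 200000 mbufs the same bound drops to 460,800,000 bytes.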
Note: the RPM package was not initially housing libdpdk.pc. This is required for obtaining platform-specific CFLAGS and LDFLAGS.
I've been playing around with DPDK and trying to create the following scenario.
I have 2 identical servers with 2 different NICs each. The goal is from server1 to send packets in different rates (up to the link maximum using DPDK), and capture the packets on the other side where an app will be running.
On server1 one NIC (Netronome) is taken by DPDK, but on server 2 it's not. The NICs are directly connected with fiber.
On server1 I run
./dpdk-devbind.py --bind=vfio-pci 0000:05:00.0
and then pktgen. It appears to be working (packets are being reported as sent by pktgen). However, on the other side (server2), the interface goes down the moment I devbind. From:
Settings for enp6s0np1:
Supported ports: [ FIBRE ]
Supported link modes: Not reported
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Supported FEC modes: None
Advertised link modes: Not reported
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: None
Speed: 40000Mb/s
Duplex: Full
Auto-negotiation: off
Port: Direct Attach Copper
PHYAD: 0
Transceiver: internal
Link detected: yes
it goes to:
ethtool enp6s0np1
Settings for enp6s0np1:
Supported ports: [ FIBRE ]
Supported link modes: Not reported
Supported pause frame use: Symmetric
Supports auto-negotiation: No
Supported FEC modes: None
Advertised link modes: Not reported
Advertised pause frame use: Symmetric
Advertised auto-negotiation: No
Advertised FEC modes: None
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: off
Port: Other
PHYAD: 0
Transceiver: internal
Link detected: no
It thinks that there is no physical connection between the two NICs and obviously this is not the case.
enp6s0np1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 00:15:4d:13:30:5c brd ff:ff:ff:ff:ff:ff
The pktgen output:
\ Ports 0-1 of 2 <Main Page> Copyright(c) <2010-2021>, Intel Corporation
Flags:Port : P------Sngl :0
Link State : <--Down--> ---Total Rate---
Pkts/s Rx : 0 0
Tx : 14,976 14,976
MBits/s Rx/Tx : 0/10 0/10
Pkts/s Rx Max : 0 0
Tx Max : 15,104 15,104
Broadcast : 0
Multicast : 0
Sizes 64 : 0
65-127 : 0
128-255 : 0
256-511 : 0
512-1023 : 0
1024-1518 : 0
Runts/Jumbos : 0/0
ARP/ICMP Pkts : 0/0
Errors Rx/Tx : 0/0
Total Rx Pkts : 0
Tx Pkts : 2,579,072
Rx/Tx MBs : 0/1,733
TCP Flags : .A....
TCP Seq/Ack : 305419896/305419920
Pattern Type : abcd...
Tx Count/% Rate : Forever /0.1%
Pkt Size/Tx Burst : 64 / 128
TTL/Port Src/Dest : 10/ 1234/ 8000
Pkt Type:VLAN ID : IPv4 / UDP:0001
802.1p CoS/DSCP/IPP : 0/ 0/ 0
VxLAN Flg/Grp/vid : 0000/ 0/ 0
IP Destination : 192.168.0.2
Source : 192.168.0.1
MAC Destination : 00:15:4d:13:30:5c
Source : 00:15:4d:13:30:81
PCI Vendor/Addr : 19ee:4000/05:00.0
When I try to capture with tcpdump -i enp6s0np1, it doesn't record anything. Are those issues related, and if so, is there any workaround? Shouldn't some packets be captured by tcpdump on server2?
Never mind, this got resolved. The NIC has two ports and only one was connected, which is why the Link State shown below was down.
Ports 0-1 of 2 <Main Page> Copyright(c) <2010-2021>, Intel Corporation
Flags:Port : P------Sngl :0
Link State : <--Down-->
To resolve it I had to use port 1 instead of port 0
Hello Stackoverflow Experts,
I am using DPDK on a Mellanox NIC, but am struggling with applying packet fragmentation in a DPDK application.
sungho@c3n24:~$ lspci | grep Mellanox
81:00.0 Ethernet controller: Mellanox Technologies MT27500 Family [ConnectX-3]
The DPDK applications (l3fwd, ip-fragmentation, ip-assemble) did not recognize the received packets as having an IPv4 header.
At first, I crafted my own packets when sending IPv4 headers, so I assumed that I was crafting the packets the wrong way.
So I used DPDK-pktgen, but the DPDK applications (l3fwd, ip-fragmentation, ip-assemble) still did not recognize the IPv4 header.
As a last resort, I tested with dpdk-testpmd and found this in the status info.
********************* Infos for port 1 *********************
MAC address: E4:1D:2D:D9:CB:81
Driver name: net_mlx4
Connect to socket: 1
memory allocation on the socket: 1
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
MTU: 1500
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 127
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
strip on
filter on
qinq(extend) off
No flow type is supported.
Max possible RX queues: 65408
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Max possible TX queues: 65408
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1
testpmd> show port
According to the DPDK documentation, the supported flow types should appear in the info status of port 1, but mine shows that no flow type is supported.
The example below is what should be displayed under flow types:
Supported flow types:
ipv4-frag
ipv4-tcp
ipv4-udp
ipv4-sctp
ipv4-other
ipv6-frag
ipv6-tcp
ipv6-udp
ipv6-sctp
ipv6-other
l2_payload
port
vxlan
geneve
nvgre
So does my NIC, a Mellanox ConnectX-3, not support DPDK IP fragmentation? Or is there additional configuration that needs to be done before trying out packet fragmentation?
-- [EDIT]
So I have checked the packets sent by DPDK-PKTGEN and the packets received by the DPDK application.
The packets that I receive are exactly the ones that I sent from the application. (I get the correct data.)
The problem begins at this code:
struct rte_mbuf *pkt;
RTE_ETH_IS_IPV4_HDR(pkt->packet_type)
This determines whether the packet is IPv4 or not.
The value of pkt->packet_type is zero both for packets from DPDK-PKTGEN and for packets from my own DPDK application, and if pkt->packet_type is zero, the DPDK application treats the packet as NOT having an IPv4 header.
This basic type check goes wrong from the start.
So what I believe is that either the DPDK sample is wrong or the NIC cannot support IPv4 for some reason.
The data I receive has a pattern: at the beginning I receive the correct message, but after that the sequence of packets has different data between the MAC address and the data offset.
So what I assume is that they are interpreting the data differently and getting the wrong result.
I am pretty sure any NIC, including Mellanox ConnectX-3 MUST support ip fragments.
The flow type you are referring to is for the Flow Director, i.e. mapping specific flows to specific RX queues. Even if your NIC does not support Flow Director, it does not matter for IP fragmentation.
I guess there is an error in the setup or in the app. You wrote:
the dpdk application did not recognized the received packet as the ipv4 header.
I would look into this more closely. Try to dump those packets with dpdk-pdump, or simply dump the received packets on the console with rte_pktmbuf_dump().
If you still suspect the NIC, the best option would be to temporarily substitute it with another brand or a virtual device, just to confirm it is indeed the NIC.
EDIT:
Have a look at mlx4_ptype_table: for fragmented IPv4 packets it should return packet_type set to RTE_PTYPE_L2_ETHER | RTE_PTYPE_L3_IPV4_EXT_UNKNOWN | RTE_PTYPE_L4_FRAG.
Please note the functionality was added in DPDK 17.11.
I suggest you dump pkt->packet_type on the console to make sure it is indeed zero. Also make sure you have the latest libmlx4 installed.
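If packet_type really is zero because the driver does not fill it in, one way to cross-check is to classify the frame in software from the raw bytes. The following is a minimal sketch in plain C, not DPDK code: ipv4_offset and ipv4_is_fragment are hypothetical helpers, and only untagged frames or frames with a single 802.1Q tag are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4 0x0800u
#define ETHERTYPE_VLAN 0x8100u

/* Read a 16-bit big-endian (network order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: return the offset of the IPv4 header inside a raw
 * Ethernet frame, or -1 if the frame is too short or is not IPv4.
 */
static int ipv4_offset(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    size_t off = 12;                  /* EtherType follows the two MACs */
    if (len < 14)
        return -1;
    uint16_t type = be16(frame + off);
    off += 2;
    if (type == ETHERTYPE_VLAN) {     /* skip one 802.1Q tag */
        if (len < off + 4)
            return -1;
        type = be16(frame + off + 2);
        off += 4;
    }
    if (type != ETHERTYPE_IPV4 || len < off + 20 || (frame[off] >> 4) != 4)
        return -1;
    return (int)off;
}

/* True when the IPv4 header at ip belongs to a fragment: the More
 * Fragments flag is set or the fragment offset is non-zero. */
static bool ipv4_is_fragment(const uint8_t *ip)
{
    return (be16(ip + 6) & 0x3FFFu) != 0;
}
```

Running the received bytes through a check like this shows whether the frames are well-formed IPv4 even when packet_type is zero, which separates a packet-crafting problem from a driver that does not report packet types.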
I have a quad port Intel 1G network card. I am using DPDK to send data on one physical port and receive on another.
I saw a few examples in the DPDK code, but could not make them work. If anybody knows how to do that, please send me simple instructions so I can follow and understand. I set up my PC properly for huge pages, loading the driver, assigning the network port to the DPDK driver, etc. I can run helloworld from DPDK, so the system setup looks OK to me.
Thanks in advance.
After building DPDK:
cd to the DPDK directory.
Run sudo build/app/testpmd -- --interactive
You should see output like this:
$ sudo build/app/testpmd -- --interactive
EAL: Detected 8 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Multi-process socket /var/run/.rte_unix
EAL: Probing VFIO support...
EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
EAL: PCI device 0002:00:02.0 on NUMA socket 0
EAL: probe driver: 15b3:1004 net_mlx4
PMD: net_mlx4: PCI information matches, using device "mlx4_0" (VF: true)
PMD: net_mlx4: 1 port(s) detected
PMD: net_mlx4: port 1 MAC address is 00:0d:3a:f4:6e:17
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=203456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
Warning! port-topology=paired and odd forward ports number, the last port
will pair with itself.
Configuring Port 0 (socket 0)
Port 0: 00:0D:3A:F4:6E:17
Checking link statuses...
Done
testpmd>
Don't worry about the "No free hugepages" message. It means it couldn't find any 1024 MB hugepages, but since it continued OK, it must have found some 2 MB hugepages. It'd be nice if it said "EAL: Using 2 MB huge pages" instead.
At the prompt, type start tx_first, then quit. You should see something like:
testpmd> start tx_first
io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support enabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 1 streams:
RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
io packet forwarding packets/burst=32
nb forwarding cores=1 - nb forwarding ports=1
port 0:
CRC stripping enabled
RX queues=1 - RX desc=1024 - RX free threshold=0
RX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX queues=1 - TX desc=1024 - TX free threshold=0
TX threshold registers: pthresh=0 hthresh=0 wthresh=0
TX RS bit threshold=0 - TXQ offloads=0x0
testpmd> quit
Telling cores to stop...
Waiting for lcores to finish...
---------------------- Forward statistics for port 0 ----------------------
RX-packets: 0 RX-dropped: 0 RX-total: 0
TX-packets: 32 TX-dropped: 0 TX-total: 32
----------------------------------------------------------------------------
+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++
RX-packets: 0 RX-dropped: 0 RX-total: 0
TX-packets: 32 TX-dropped: 0 TX-total: 32
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In my system there is only one DPDK port, so I sent 32 packets but did not receive any. If I had a multi-port card with a cable directly between the ports, then I'd see the RX count also increase.
You can use TESTPMD to test DPDK.
TestPMD can work as a packet generator (txonly mode), a receiver (rxonly mode), or a forwarder (io mode).
You will need generator nodes connected to your box if you want to use TESTPMD as a forwarder only.
I propose that you start with the following example:
generator (pktgen) ------> testPMD (io mode) ----------> receiver (testPMD rxonly mode).
In the pktgen generator, set the destination MAC address to the MAC address of the receiver's receiving port.
PKTGEN and how it works in detail is explained more in this link :
http://pktgen.readthedocs.io/en/latest/getting_started.html
TESTPMD and how it works is explained here :
http://www.intel.com/content/dam/www/public/us/en/documents/guides/dpdk-testpmd-application-user-guide.pdf
I hope this helps.