rte_eth_tx_burst suddenly stops sending packets - c++

I am using DPDK 21.11 for my application. After a certain time, the API rte_eth_tx_burst stops sending any packets out.
Ethernet Controller X710 for 10GbE SFP+ 1572
drv=vfio-pci
#define MAX_RETRY_COUNT_RTE_ETH_TX_BURST 3

do
{
    num_sent_pkt = rte_eth_tx_burst(eth_port_id, queue_id, &mbuf[mbuf_idx], pkt_count);
    pkt_count -= num_sent_pkt;
    retry_count++;
} while (pkt_count && (retry_count != MAX_RETRY_COUNT_RTE_ETH_TX_BURST));
To debug, I tried to use telemetry to print out the xstats. However, I do not see any errors.
--> /ethdev/xstats,1
{"/ethdev/xstats": {"rx_good_packets": 97727, "tx_good_packets": 157902622, "rx_good_bytes": 6459916, "tx_good_bytes": 229590348448, "rx_missed_errors": 0, "rx_errors": 0, "tx_errors": 0, "rx_mbuf_allocation_errors": 0, "rx_unicast_packets": 95827, "rx_multicast_packets": 1901, "rx_broadcast_packets": 0, "rx_dropped_packets": 0, "rx_unknown_protocol_packets": 97728, "rx_size_error_packets": 0, "tx_unicast_packets": 157902621, "tx_multicast_packets": 0, "tx_broadcast_packets": 1, "tx_dropped_packets": 0, "tx_link_down_dropped": 0, "rx_crc_errors": 0, "rx_illegal_byte_errors": 0, "rx_error_bytes": 0, "mac_local_errors": 0, "mac_remote_errors": 0, "rx_length_errors": 0, "tx_xon_packets": 0, "rx_xon_packets": 0, "tx_xoff_packets": 0, "rx_xoff_packets": 0, "rx_size_64_packets": 967, "rx_size_65_to_127_packets": 96697, "rx_size_128_to_255_packets": 0, "rx_size_256_to_511_packets": 64, "rx_size_512_to_1023_packets": 0, "rx_size_1024_to_1522_packets": 0, "rx_size_1523_to_max_packets": 0, "rx_undersized_errors": 0, "rx_oversize_errors": 0, "rx_mac_short_dropped": 0, "rx_fragmented_errors": 0, "rx_jabber_errors": 0, "tx_size_64_packets": 0, "tx_size_65_to_127_packets": 46, "tx_size_128_to_255_packets": 0, "tx_size_256_to_511_packets": 0, "tx_size_512_to_1023_packets": 0, "tx_size_1024_to_1522_packets": 157902576, "tx_size_1523_to_max_packets": 0, "rx_flow_director_atr_match_packets": 0, "rx_flow_director_sb_match_packets": 13, "tx_low_power_idle_status": 0, "rx_low_power_idle_status": 0, "tx_low_power_idle_count": 0, "rx_low_power_idle_count": 0, "rx_priority0_xon_packets": 0, "rx_priority1_xon_packets": 0, "rx_priority2_xon_packets": 0, "rx_priority3_xon_packets": 0, "rx_priority4_xon_packets": 0, "rx_priority5_xon_packets": 0, "rx_priority6_xon_packets": 0, "rx_priority7_xon_packets": 0, "rx_priority0_xoff_packets": 0, "rx_priority1_xoff_packets": 0, "rx_priority2_xoff_packets": 0, "rx_priority3_xoff_packets": 0, "rx_priority4_xoff_packets": 0, "rx_priority5_xoff_packets": 0, "rx_priority6_xoff_packets": 0, "rx_priority7_xoff_packets": 0, "tx_priority0_xon_packets": 0, "tx_priority1_xon_packets": 0, "tx_priority2_xon_packets": 0, "tx_priority3_xon_packets": 0, "tx_priority4_xon_packets": 0, "tx_priority5_xon_packets": 0, "tx_priority6_xon_packets": 0, "tx_priority7_xon_packets": 0, "tx_priority0_xoff_packets": 0, "tx_priority1_xoff_packets": 0, "tx_priority2_xoff_packets": 0, "tx_priority3_xoff_packets": 0, "tx_priority4_xoff_packets": 0, "tx_priority5_xoff_packets": 0, "tx_priority6_xoff_packets": 0, "tx_priority7_xoff_packets": 0, "tx_priority0_xon_to_xoff_packets": 0, "tx_priority1_xon_to_xoff_packets": 0, "tx_priority2_xon_to_xoff_packets": 0, "tx_priority3_xon_to_xoff_packets": 0, "tx_priority4_xon_to_xoff_packets": 0, "tx_priority5_xon_to_xoff_packets": 0, "tx_priority6_xon_to_xoff_packets": 0, "tx_priority7_xon_to_xoff_packets": 0}}
I have RX-DESC = 128 and TX-DESC = 512 configured.
I am assuming there is some descriptor leak. Is there a way to know if the drop is due to no descriptors being available? Which counter should I check for that?
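One way to probe for that (a sketch, not verified on this exact setup; eth_port_id and queue_id are the same variables as in the burst loop above) is rte_eth_tx_descriptor_status(), which reports whether a descriptor at a given offset in the TX ring is still owned by the NIC:
#include <rte_ethdev.h>

/* RTE_ETH_TX_DESC_FULL: the NIC has not completed this descriptor yet;
 * RTE_ETH_TX_DESC_DONE: the driver can reuse it */
int st = rte_eth_tx_descriptor_status(eth_port_id, queue_id, 0);
if (st == RTE_ETH_TX_DESC_FULL)
    printf("TX descriptor at offset 0 is still owned by the NIC\n");
else if (st == RTE_ETH_TX_DESC_DONE)
    printf("TX descriptor at offset 0 is done\n");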
[More Info]
Debugging refcnt led to a dead end.
Following the code, it seems that the NIC card does not set the DONE status on the descriptor.
When rte_eth_tx_burst is called, the PMD's transmit function internally calls i40e_xmit_pkts -> i40e_xmit_cleanup.
When the issue occurs, the following check fails, so the driver cannot clean up descriptors and the NIC stops sending packets out.
if ((txd[desc_to_clean_to].cmd_type_offset_bsz &
        rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
        rte_cpu_to_le_64(I40E_TX_DESC_DTYPE_DESC_DONE)) {
    PMD_TX_LOG(DEBUG, "TX descriptor %4u is not done "
               "(port=%d queue=%d)", desc_to_clean_to,
               txq->port_id, txq->queue_id);
    return -1;
}
If I comment out the "return -1" (of course not the fix, and it will lead to other issues), traffic is stable for a very long time.
I tracked all the mbufs from the start of traffic until the issue is hit; at least in the mbufs I could see, there is no issue.
I40E_TX_DESC_DTYPE_DESC_DONE is supposed to be set by the hardware on the descriptor. Is there any way I can see that code? Is it part of the X710 driver code?
I still suspect my own code, since the issue is present even after the NIC card was replaced.
However, how can my code affect the NIC not setting the DONE status on the descriptor?
Any suggestions would really be helpful.
[UPDATE]
Found out that 2 cores were using the same TX queue ID to send packets:
Data processing and TX core
ARP req/response by the Data RX core
This led to some potential corruption?
Found some info on this:
http://mails.dpdk.org/archives/dev/2014-January/001077.html
After creating a separate queue for ARP messages, the issue has not been seen anymore (yet) for 2+ hours.

[EDIT-2] The error is narrowed down to multiple threads using the same port-id/queue-id pair, which stalls the NIC's transmit path. Earlier debugging was not focused on the slow path (ARP reply), hence this was missed.
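Since rte_eth_tx_burst() is not thread-safe for the same port/queue pair, the usual fix is exactly this: one TX queue per transmitting lcore (or external locking). A minimal setup sketch, where nb_tx_lcores and port_conf are placeholders rather than names from the original code:
/* one TX queue per transmitting lcore, so no two cores ever call
 * rte_eth_tx_burst() on the same (port, queue) pair */
uint16_t nb_tx_lcores = 2; /* e.g. data-TX core + ARP/slow-path core */
if (rte_eth_dev_configure(eth_port_id, 1, nb_tx_lcores, &port_conf) < 0)
    rte_exit(EXIT_FAILURE, "cannot configure port %u\n", (unsigned)eth_port_id);

for (uint16_t q = 0; q < nb_tx_lcores; q++)
    if (rte_eth_tx_queue_setup(eth_port_id, q, 512,
            rte_eth_dev_socket_id(eth_port_id), NULL) < 0)
        rte_exit(EXIT_FAILURE, "cannot set up TX queue %u\n", (unsigned)q);
/* each sending core then uses its own queue_id (0, 1, ...) exclusively */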
[Edit-1] Based on the limited debug opportunities and the updates shared so far, the findings are:
The internal TX code increments refcnt by 2 (that is, refcnt becomes 3).
Once the reply is received, the refcnt is decremented by 2.
Corner cases are now addressed for mbuf free.
Tested on both RHEL and CentOS; both have the issue, hence it is software and not the OS.
Updated the NIC firmware; now all platforms consistently show the error after a couple of hours of the run.
Note:
All pointers lead to gaps in the code and corner-case handling, since testpmd/l2fwd/l3fwd do not show this error with the DPDK library or platform.
Since the code base is not shared, the only option is to rely on updates.
Hence, after extensive debugging and analysis, the root cause of the issue is not DPDK, the NIC or the platform, but a gap in the code being used.
If the code's intent is to transmit all pkt_count packets within MAX_RETRY_COUNT_RTE_ETH_TX_BURST retries, the current code snippet needs a few corrections. Let me explain:
mbuf is the array of valid packets to be transmitted.
mbuf_idx represents the current index in the array to be sent for TX.
pkt_count represents the number of packets still to be sent in the current attempt.
num_sent_pkt represents the number of packets actually handed to the NIC for DMA (physical TX).
retry_count is the local variable keeping count of the retries.
There are 2 corner cases to be taken care of (not covered in the current snippet):
If MAX_RETRY_COUNT_RTE_ETH_TX_BURST is exceeded and not all packets have actually been transmitted, the non-transmitted mbufs need to be freed at the end of the while loop.
If there are any mbufs with ref_cnt greater than 1 (especially with multicast, broadcast or packet duplication), one needs a mechanism to free those too (see the sketch after the snippet below).
A possible code snippet could be:
#define MAX_RETRY_COUNT_RTE_ETH_TX_BURST 3

retry_count = 0;
mbuf_idx = 0;
pkt_count = try_sent; /* try_sent is the intended number of packets to send */
/* if there are any mbufs with ref_cnt > 1, we need separate logic to handle those */
do {
    num_sent_pkt = rte_eth_tx_burst(eth_port_id, queue_id, &mbuf[mbuf_idx], pkt_count);
    pkt_count -= num_sent_pkt;
    mbuf_idx += num_sent_pkt;
    retry_count++;
} while ((pkt_count) && (retry_count < MAX_RETRY_COUNT_RTE_ETH_TX_BURST));

/* to prevent a leak for unsent packets */
if (pkt_count) {
    rte_pktmbuf_free_bulk(&mbuf[mbuf_idx], pkt_count);
}
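For the second corner case (mbufs with ref_cnt greater than 1), a rough sketch of the idea, assuming the packet is duplicated with rte_pktmbuf_clone() into a hypothetical clone_pool; this is not the original code:
/* a clone holds an extra reference to orig's data; rte_pktmbuf_free() only
 * returns the data to the mempool once the reference count reaches zero,
 * so every owner of a reference has to free it */
struct rte_mbuf *clone = rte_pktmbuf_clone(orig, clone_pool);
if (clone != NULL) {
    if (rte_eth_tx_burst(eth_port_id, queue_id, &clone, 1) == 0)
        rte_pktmbuf_free(clone); /* TX failed: drop the clone's reference */
}
rte_pktmbuf_free(orig); /* drop our own reference once orig is no longer needed */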
Note: the easiest way to identify an mbuf leak is to run the DPDK proc-info secondary process and check the mbuf free count.
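The same check can also be done in-process; a minimal sketch, assuming mbuf_pool is the struct rte_mempool * used for the TX mbufs (not part of the original code):
#include <stdio.h>
#include <rte_mempool.h>

/* sample periodically: with steady traffic, a continuously shrinking
 * avail count points to an mbuf leak */
static void log_mempool_usage(const struct rte_mempool *mbuf_pool)
{
    printf("mempool %s: avail=%u in_use=%u\n", mbuf_pool->name,
           rte_mempool_avail_count(mbuf_pool),
           rte_mempool_in_use_count(mbuf_pool));
}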
[EDIT-1] Based on the debugging, it has been identified that the refcnt is indeed greater than 1. Accumulating such corner cases leads to mempool depletion.
logs:
dump mbuf at 0x2b67803c0, iova=0x2b6780440, buf_len=9344
pkt_len=1454, ol_flags=0x180, nb_segs=1, port=0, ptype=0x291
segment at 0x2b67803c0, data=0x2b67804b8, len=1454, off=120, refcnt=3

Related

DPDK packet lost and disorder

I did a simple test program using DPDK: program1 on computer1 sends packets, and program2 on computer2 receives packets. computer1 and computer2 are directly connected, with no switch.
In program1, I use a packId to indicate the packet sequence id.
while (true) {
    pkt = rte_pktmbuf_alloc(mbuf_pool);
    uint8_t* pchar = rte_pktmbuf_mtod(pkt, uint8_t*);
    // set MAC addresses and length field (bytes 0 to 13)
    // use bytes from offset 14 to store the uint64_t packId
    uint64_t* pPackId = (uint64_t*)(pchar + 14);
    *pPackId = packId;
    packId++;
    // put 1024 bytes of data inside the packet
    uint16_t sent = rte_eth_tx_burst(0, 0, &pkt, 1);
    while (sent != 1)
    {
        sent = rte_eth_tx_burst(0, 0, &pkt, 1);
    }
}
In the receiver, I define a long RX ring, nb_rxd = 3072:
rte_eth_dev_adjust_nb_rx_tx_desc(0, &nb_rxd, &nb_txd);
rte_eth_rx_queue_setup(0, 0, nb_rxd, rte_eth_dev_socket_id(0), NULL, mbuf_pool);
There is a for loop to receive packets, and check packet sequence id.
for (;;)
{
    struct rte_mbuf *bufs[32];
    const uint16_t nb_rx = rte_eth_rx_burst(0, 0, bufs, 32);
    if (unlikely(nb_rx == 0))
        continue;
    int m = 0;
    for (m = 0; m < nb_rx; m++)
    {
        uint8_t* pchar = rte_pktmbuf_mtod(bufs[m], uint8_t*);
        uint64_t* pPackId = (uint64_t*)(pchar + 14);
        uint64_t packid = *pPackId;
        if (expectedPackid != packid) {
            printf...
            expectedPackid = packid + 1;
        }
        else expectedPackid++;
    }
}
Based on program2, I see a lot of packet loss and reordering. The received packets are put inside the ring buffer. Shouldn't they be received in order? I also find there is packet loss, even though program1's sending speed is only around 1 Gbps.
rte_eth_stats_get() is very useful for troubleshooting. From rte_eth_stats, I found that ipackets is correct, q_ipackets[0] is correct, and imissed, ierrors, rx_nombuf and q_errors[0] are all 0. So the problem should be in program2's code. After checking the code, it turned out to be a memory management issue in program2.
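For reference, a minimal sketch of that kind of check (port 0, queue 0, names as in the snippets above; not the original troubleshooting code):
#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

struct rte_eth_stats stats;
if (rte_eth_stats_get(0, &stats) == 0) {
    /* compare driver/HW counters against the application's own counters */
    printf("ipackets=%" PRIu64 " imissed=%" PRIu64 " ierrors=%" PRIu64
           " rx_nombuf=%" PRIu64 " q_ipackets[0]=%" PRIu64 " q_errors[0]=%" PRIu64 "\n",
           stats.ipackets, stats.imissed, stats.ierrors,
           stats.rx_nombuf, stats.q_ipackets[0], stats.q_errors[0]);
}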

How can I manually set custom sound card device for espeak in C++?

I have already written the following code and the program can speak.
espeak_Initialize(AUDIO_OUTPUT_PLAYBACK, 0, NULL, 0);
espeak_SetParameter(espeakWORDGAP,7,0);
espeak_SetParameter(espeakCAPITALS,20,0);
espeak_SetVoiceByName("en-gb");
espeak_Synth(s.c_str(), s.length() + 1, 0, POS_CHARACTER, 0,
espeakCHARS_UTF8, NULL, NULL);
However, I have two sound cards in my machine (audio and audio1 in /dev). Can I manually set which device plays the sound in the program? Many thanks.

clGetProfilingEventInfo: How to get multiple profiling info?

I would like to get profiling info. My command queue already has profiling enabled.
This is my code:
status = clEnqueueNDRangeKernel(
commandQueue,
kernl,
2,
NULL,
globalThreads,
localThreads,
0,
NULL,
&ndrEvt);
CHECK_OPENCL_ERROR(status, "clEnqueueNDRangeKernel failed.");
//Won't proceed ahead if all work-items have not finished processing; Synchronization point
status = clFinish(commandQueue);
CHECK_OPENCL_ERROR(status, "clFlush failed.");
//fetch performance data
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &time_start2, NULL);
//clRetainEvent(ndrEvt);
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &time_end2, NULL);
single_exec_time2 = time_end2 - time_start2;
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &time_end, NULL);
single_exec_time = time_end - time_start2;
single_exec_time2 shows me a correct result, but single_exec_time = 0.
I think the problem lies in event handling: the reference count of ndrEvt goes to zero.
I've tried introducing clRetainEvent(ndrEvt) (you can see it commented out above) and it "worked", so at this point I'm wondering whether introducing clRetainEvent() would give me the right result.
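For what it's worth, a small sketch of querying several timestamps from the same event (standard OpenCL profiling API; error checking trimmed). The event returned by clEnqueueNDRangeKernel already carries one reference, so clRetainEvent should only be needed if something else releases it earlier; the reference is dropped with clReleaseEvent when the queries are done:
cl_ulong t_queued = 0, t_submit = 0, t_start = 0, t_end = 0;

clWaitForEvents(1, &ndrEvt);   /* make sure the kernel has finished */
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &t_queued, NULL);
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &t_submit, NULL);
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_START,  sizeof(cl_ulong), &t_start,  NULL);
clGetEventProfilingInfo(ndrEvt, CL_PROFILING_COMMAND_END,    sizeof(cl_ulong), &t_end,    NULL);

cl_ulong queued_to_submit = t_submit - t_queued;  /* time spent queued on the host */
cl_ulong exec_time        = t_end    - t_start;   /* actual kernel execution time in ns */
clReleaseEvent(ndrEvt);                           /* release the event once the queries are done */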

GDB is killing my inferior process

GDB is killing my inferior. The inferior is a long-running (20-30 minute) benchmark. GDB and the inferior are both running under my uid. It runs fine for a while, then my signal handler is called with a siginfo_t instance with si_signo = 11, si_errno = 0 and si_code = 0; _sifields._kill.si_pid = (gdb-pid), _sifields._kill.si_uid = (my-uid).
I read this as GDB decided to send a kill signal to my inferior process. Under what circumstances would GDB do this?
This is not a SIGSEGV (even though si_signo would suggest that it is) since si_code is 0 and si_pid and si_uid are set. My inferior is a multi-threaded C++ application with a custom signal handler to handle GPFs when the application hits a memory barrier that I set up to protect certain ranges of memory. When I run under GDB I set
handle SIGSEGV noprint
to ensure that GDB passes SIGSEGV signals relating to the memory barrier on to my application for handling. That part seems to be working fine -- SIGSEGV with nonzero si_code in the siginfo_t struct are handled properly (after verifying that the faulting address in siginfo->_sifields.si_addr is within a protected range of memory).
But SIGSEGV with zero si_code indicates that the inferior is being killed, as far as I can tell, and the _sifields._kill fields, which overlays _sifields._sigfault fields, support this interpretation: GDB is killing my inferior process.
I just don't understand what causes GDB to do this.
An update on this: it looks like GDB is sending SIGSTOP to the inferior. If I look at $_siginfo at point of failure I see:
(gdb) p $_siginfo
$2 = {
si_signo = 5,
si_errno = 0,
si_code = 128,
_sifields = {
_pad = {0, 0, -1054653696, 57, 97635496, 0, 5344160, 0, 47838328, 0, -154686444, 32767, 47838328, 0, 4514687, 0, 0, 0, 49642032, 0, 50016832, 0, 49599376, 1, 0, 0, 92410096, 0},
_kill = {
si_pid = 0,
si_uid = 0
},
_timer = {
si_tid = 0,
si_overrun = 0,
si_sigval = {
sival_int = -1054653696,
sival_ptr = 0x39c1234300
}
},
_rt = {
si_pid = 0,
si_uid = 0,
si_sigval = {
sival_int = -1054653696,
sival_ptr = 0x39c1234300
}
},
_sigchld = {
si_pid = 0,
si_uid = 0,
si_status = -1054653696,
si_utime = 419341262248738873,
si_stime = 22952992424591360
},
_sigfault = {
si_addr = 0x0
},
_sigpoll = {
si_band = 0,
si_fd = -1054653696
}
}
}
But my signal handler sees this (somewhat obfuscated * -- I am working in a clean-room environment):
(gdb) bt
#0 ***SignalHandler (signal=11, sigInfo=0x7fff280083f0, contextInfo=0x7fff280082c0) at ***signal.c:***
...
(gdb) setsig 0x7fff280083f0
[signo=11; code=0; addr=0xbb900007022] ((siginfo_t*) 0x7fff280083f0)
...
(gdb) p *((siginfo_t*) 0x7fff280083f0)
$4 = {
si_signo = 11,
si_errno = 0,
si_code = 0,
_sifields = {
_pad = {28706, 3001, -515511096, 32767, -233916640, 32767, -228999566, 32767, 671122824, 32767, -468452105, 1927272, 1, 0, -515510808, 32767, 0, 32767, 37011703, 0, -515511024, 32767, 37011703, 32767, 2, 32767, 1000000000, 0},
_kill = {
si_pid = 28706,
si_uid = 3001
},
_timer = {
si_tid = 28706,
si_overrun = 3001,
si_sigval = {
sival_int = -515511096,
sival_ptr = 0x7fffe145ecc8
}
},
_rt = {
si_pid = 28706,
si_uid = 3001,
si_sigval = {
sival_int = -515511096,
sival_ptr = 0x7fffe145ecc8
}
},
_sigchld = {
si_pid = 28706,
si_uid = 3001,
si_status = -515511096,
si_utime = 140737254438688,
si_stime = 140737259355762
},
_sigfault = {
si_addr = 0xbb900007022
},
_sigpoll = {
si_band = 12889196884002,
si_fd = -515511096
}
}
}
(gdb) shell ps -ef | grep gdb
*** 28706 28704 0 Jun26 pts/17 00:00:02 /usr/bin/gdb -q ***
(gdb) shell echo $UID
3001
So my signal handler sees a siginfo_t struct with si_signo 11 (SIGSEGV), si_code = 0 (kill), si_pid = 28706 (gdb), and si_uid = 3001 (me). And GDB reports a siginfo_t with si_signo = 5 (SIGSTOP).
It may be that the inferior process is performing some low-level handling of the original SIGSTOP and sending it up the chain as a kill. But it is the original SIGSTOP that I don't understand/want to eliminate.
I should add that I am setting the following directives before starting the inferior (and it makes no difference whether the handle SIGSTOP directive is set or not):
handle SIGSEGV noprint
handle SIGSTOP nostop print ignore
Does this shed any light on the problem? This is killing me. Also, if no insight here, can anyone suggest other forums that might be helpful to post this to?
(gdb) show version
GNU gdb (GDB) Red Hat Enterprise Linux (7.1-29.el6_0.1)
Copyright (C) 2010 Free Software Foundation, Inc.
I am running this on a 1.8GHz 16 Core/32 Thread Xeon, 4x E7520, Nehalem-based server. I get the same result regardless of whether hyperthreading is enabled or disabled.
Under Linux, si_signo = 11 would indicate that this is GDB propagating SIGSEGV. See signal(7) for the signal numbers.
Try:
(gdb) handle SIGSEGV nopass
Signal Stop Print Pass to program Description
SIGSEGV Yes Yes No Segmentation fault
Try casting the third argument of the signal handler function you register with sigaction() to a (ucontext_t *) and dumping the CPU registers. The instruction pointer in particular could provide some clue:
#include <signal.h>
#include <ucontext.h>

void my_sigsegv_handler(int signo, siginfo_t *info, void *context)
{
    ucontext_t *u = (ucontext_t *)context;
    /* dump u->uc_mcontext.gregs[REG_RIP] (or REG_EIP on 32-bit) */
}
then look up that instruction pointer in GDB (for example with info symbol <address>).
To really understand what's happening, I'd try to pin down:
Exactly which signal is seen by your process, is it SIGSEGV as indicated by the si_signo member of siginfo_t? What's the first argument of the signal handler function registered by sigaction()? (Those two things not matching is unlikely but not impossible with PTRACE_SETSIGINFO)
GDB either intercepted a signal that the kernel is sending to your process and then injected the signal again or decided to send the signal itself. Try to determine which. This could be done by running GDB under itself and breaking on kill, tkill, tgkill and ptrace if $rdi == PTRACE_KILL (Sounds time consuming, I know).
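A rough sketch of that second approach (GDB 7.x syntax; my_benchmark is a placeholder for the actual inferior, and catch syscall needs a GDB built with syscall-name support):
$ gdb -q /usr/bin/gdb
(gdb) catch syscall kill tkill tgkill ptrace
(gdb) run -q ./my_benchmark
When a catchpoint fires in the outer GDB, bt shows whether the inner GDB is issuing the kill/tkill itself or merely re-injecting a signal it received from the kernel.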
I'm running into the exact same problem on a different program with gdb 7.4. I upgraded to gdb 7.9 (same machine and kernel version) and the problem disappeared. On a different machine, gdb 7.7 works too. My guess is that the issue was fixed in gdb 7.5-7.7.

Socket select reducing the number of sockets in file descriptor set

I have a piece of code that accepts 2 connections, creates a file descriptor set with their respective sockets, and passes it to select. But when select returns, the number of file descriptors in the set has been reduced to 1, and select only detects received data for the first socket in the fd_array array.
Any ideas where I should look?
Thanks in advance,
Andre
fd_set mSockets;
/* At this point
mSockets.fd_count = 2
mSockets.fd_array[0] = 3765
mSockets.fd_array[1] = 2436
*/
select(0, & mSockets, 0, 0, 0);
/* At this point
mSockets.fd_count = 1
mSockets.fd_array[0] = 3765
mSockets.fd_array[1] = 2436
*/
That is by design: the readfds, writefds and exceptfds parameters of the select function are in/out parameters.
You should initialize the fd_set before each call to select:
SOCKET s1;
SOCKET s2;
fd_set mSockets;
// open sockets s1 and s2
// prepare select call
FD_ZERO(&mSockets);
FD_SET(s1, &mSockets);
FD_SET(s2, &mSockets);
select(0, &mSockets, 0, 0, 0);
// evaluate select results
if (FD_ISSET(s1, &mSockets))
{
    // process s1 traffic
}
if (FD_ISSET(s2, &mSockets))
{
    // process s2 traffic
}
Additionally, you can check the return value of select. It indicates whether you can operate on the sockets at all; i.e. a zero return indicates that all FD_ISSET macros will return 0.
EDIT:
Since readfds, writefds and exceptfds are also out parameters of the select function, they are modified. The fd_count member indicates how many fd_array members are valid. You should not evaluate fd_array[1] if fd_count is less than 2.
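Putting this together, the usual pattern is to rebuild the set on every iteration; a minimal sketch along the lines of the snippet above (Winsock-style, error handling kept simple):
for (;;)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(s1, &readSet);
    FD_SET(s2, &readSet);

    // select() overwrites readSet so that it contains only the ready sockets
    int ready = select(0, &readSet, 0, 0, 0);
    if (ready <= 0)
        break; // SOCKET_ERROR, or nothing ready if a timeout were used

    if (FD_ISSET(s1, &readSet))
    {
        // read from s1
    }
    if (FD_ISSET(s2, &readSet))
    {
        // read from s2
    }
}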