C++ UDP recvfrom: reduce drops

I have a fairly standard setup for my UDP receiver socket. My sender sends data at 36 Hz and my receiver reads at 72 Hz, 12072 bytes per send.
When I run cat /proc/net/udp, I usually get:
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
7017: 0101007F:0035 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 10636 2 0000000000000000 0
7032: 00000000:0044 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 14671 2 0000000000000000 0
7595: 00000000:0277 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 11113 2 0000000000000000 0
7660: 00000000:22B8 00000000:0000 07 00000000:00004100 00:00000000 00000000 1000 0 251331 3 0000000000000000 352743
You can see that rx_queue has a non-zero value (and the drops column shows 352743). Are my reads not fast enough?
My code:
int recv_len = recvfrom(s, buf, BUFLEN, MSG_TRUNC, (struct sockaddr *) &si_other, &slen);
// don't worry, BUFLEN is 64000, so no truncation error here
std::cout << " recv_len " << recv_len << std::endl;
I always get recv_len 12072 as output, even though the queue is quite big. Why is this? Is there a way to speed up my reads, or to read all the messages in the queue at once? I don't understand what's wrong, given that my read frequency is higher than the send frequency.

UDP datagrams always travel as complete, atomic units. If you send a 12072 byte UDP datagram, your receiver will get exactly one 12072 byte datagram or nothing at all -- you won't ever receive a partial message (*) or multiple messages concatenated.
Note that with datagrams of this size, they're almost certainly being fragmented at the IP layer because they're probably larger than your network's MTU (maximum transmission unit). In that case, if any one of the fragments is dropped along the way or at the receiving host or found to be corrupted, the entire UDP datagram will be dropped.
(* A message may be truncated if the buffer provided to recvfrom is too small, but it will never even be considered for receiving if the entire message could not be reassembled in the kernel.)
If you are unable to receive all the messages being sent, I would check whether you need to increase the kernel buffer space allocated to UDP. This is done with the sysctl utility. Specifically you should check and possibly adjust the values of net.core.rmem_max and net.ipv4.udp_mem. See the corresponding documentation:
https://www.kernel.org/doc/Documentation/sysctl/net.txt
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
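For example, here is a minimal sketch (assuming Linux and the same socket descriptor s as in the question's code). Note that raising net.core.rmem_max only lifts the ceiling; the application still has to request the larger buffer via SO_RCVBUF:
#include <sys/socket.h>
#include <cstdio>
#include <iostream>

// beforehand, as root: sysctl -w net.core.rmem_max=8388608
int rcvbuf = 8 * 1024 * 1024; // illustrative 8 MiB request
if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf)) < 0)
    perror("setsockopt(SO_RCVBUF)");
socklen_t optlen = sizeof(rcvbuf);
getsockopt(s, SOL_SOCKET, SO_RCVBUF, &rcvbuf, &optlen); // read back the effective size (the kernel may double it)
std::cout << "effective SO_RCVBUF: " << rcvbuf << std::endl;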
Finally, it seems a bit curious to talk about "read frequency" -- I assume that means you are polling the socket 72 times per second? Why not just dedicate a thread to reading from the socket? Then the thread can block on the recvfrom and the receive will complete with the least possible latency. (In any case, this is worth a try, even if only for a test -- to see if the polling is contributing to your inability to keep up with the sender.)
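A minimal sketch of that dedicated-thread approach, reusing the s, buf and BUFLEN names from the question (handle_packet() is a hypothetical hand-off to your processing code):
#include <thread>

std::thread reader([&]() {
    for (;;) {
        struct sockaddr_in si_other;
        socklen_t slen = sizeof(si_other);
        // blocking read: wakes exactly when a datagram arrives, no polling tick
        int recv_len = recvfrom(s, buf, BUFLEN, MSG_TRUNC, (struct sockaddr *) &si_other, &slen);
        if (recv_len < 0) { perror("recvfrom"); break; }
        handle_packet(buf, recv_len); // hypothetical consumer
    }
});
reader.detach();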


rte_eth_tx_burst() descriptor/mbuf management guarantees vs. free thresholds

The rte_eth_tx_burst() function is documented as:
* It is the responsibility of the rte_eth_tx_burst() function to
* transparently free the memory buffers of packets previously sent.
* This feature is driven by the *tx_free_thresh* value supplied to the
* rte_eth_dev_configure() function at device configuration time.
* When the number of free TX descriptors drops below this threshold, the
* rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf* buffers
* of those packets whose transmission was effectively completed.
I have a small test program where this doesn't seem to hold true (when using the ixgbe driver on a vfio X553 1GbE NIC).
So my program sets up one transmit queue like this:
uint16_t tx_ring_size = 1024-32;
rte_eth_dev_configure(port_id, 0, 1, &port_conf);
r = rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &rx_ring_size, &tx_ring_size);
struct rte_eth_txconf txconf = dev_info.default_txconf;
r = rte_eth_tx_queue_setup(port_id, 0, tx_ring_size,
rte_eth_dev_socket_id(port_id), &txconf);
The transmit mbuf packet pool is created like this:
struct rte_mempool *pkt_pool = rte_pktmbuf_pool_create("pkt_pool", 1023, 341, 0,
RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
That way, when sending packets, I run out of TX descriptors before I run out of packet buffers. (The program generates packets with just one segment.)
My expectation is that when I call rte_eth_tx_burst() in a loop (to send one packet after another), it never fails, since it transparently frees the mbufs of already-sent packets.
However, this doesn't happen.
I basically have a transmit loop like this:
for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}
After 1086 transmitted packets (of ~ 300 bytes each), rte_eth_tx_burst() returns 0.
I use the default threshold values, i.e. the queried values are (from dev_info.default_txconf):
tx thresh : 32
tx rs thresh: 32
wthresh : 0
So the main question now is: How hard is rte_eth_tx_burst() supposed to try to free mbuf buffers (and thus descriptors)?
I mean, it could busy loop until the transmission of previously supplied mbufs is completed.
Or it could just quickly check if some descriptors are free again. But if not, just give up.
Related question: Are the default threshold values appropriate for this use case?
So I work around it like this:
for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "cannot send packet\n");
        int r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
            rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
}
With that I get output like this:
USER1: cannot send packet
USER1: 1086. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1118. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1150. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 1182. cleaned up 0 descriptors ...
[..]
USER1: cannot send packet
USER1: 1950. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1982. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 2046. cleaned up 32 descriptors ...
This means that at most 32 descriptors are freed at a time, and that the cleanup doesn't always succeed; but then a subsequent call frees some again.
Side question: Is there a better more dpdk-idiomatic way to handle the recycling of mbufs?
When I change the code such that I run out of mbufs before I run out of transmit descriptors (i.e. the TX ring is created with 1024 descriptors, while the mbuf pool still has 1023 elements), I have to change the allocation part like this:
struct rte_mbuf *pkt;
do {
    pkt = rte_pktmbuf_alloc(args.pkt_pool);
    if (!pkt) {
        r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
            rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
} while (!pkt);
The output is similar, e.g.:
USER1: 1023. cleaned up 95 descriptors ...
USER1: 1118. cleaned up 32 descriptors ...
USER1: 1150. cleaned up 32 descriptors ...
USER1: 1182. cleaned up 32 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 32 descriptors ...
[..]
That means the freeing of descriptors/mbufs is so 'slow' that the loop has to busy-wait up to 3 times.
Again, is this a valid approach, or are there better dpdk ways to solve this?
Since rte_eth_tx_done_cleanup() might return -ENOTSUP, this may be a hint that my usage of it is not the best solution.
Incidentally, even with the ixgbe driver it fails for me when I disable checksum offloads!
Apparently, ixgbe_dev_tx_done_cleanup() then invokes ixgbe_tx_done_cleanup_vec() instead of ixgbe_tx_done_cleanup_full(), and ixgbe_tx_done_cleanup_vec() unconditionally returns -ENOTSUP:
static int
ixgbe_tx_done_cleanup_vec(struct ixgbe_tx_queue *txq __rte_unused,
                          uint32_t free_cnt __rte_unused)
{
    return -ENOTSUP;
}
Does this make sense?
So perhaps the better strategy is to make sure that there are fewer descriptors than pool elements (e.g. 1024-32 < 1023) and just re-call rte_eth_tx_burst() until it returns 1?
That means like this:
for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "%u. cannot send packet - retry\n", i);
    }
}
This works, and the output shows again that the descriptors are freed 32 at a time, e.g.:
USER1: 1951. cannot send packet - retry
USER1: 1951. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2047. cannot send packet - retry
USER1: 2047. cannot send packet - retry
I know that I can also use rte_eth_tx_burst() to submit bigger bursts. But I want to get the simple/edge cases right and understand the DPDK semantics first.
I'm on Fedora 33 and DPDK 20.11.2.
Recommendation/solution: after verifying (with either rte_mempool_list_dump() or dpdk-procinfo) that the cause of the issue is indeed TX descriptor exhaustion, use rte_eth_tx_buffer_flush() or change the settings for the TX thresholds.
Explanation:
The mbuf-free behaviour varies across PMDs, and even within the same NIC it differs between the PF and the VF. The following points help to understand this properly:
An rte_mempool can be created with or without cache elements.
When created with cache elements, the configured mbufs are added to a per-core cache, depending upon the available lcores (EAL options) and the cache-elements-per-core parameter.
When the HW offload DEV_TX_OFFLOAD_MBUF_FAST_FREE is available and enabled, the agreement is that every mbuf has a ref_cnt of 1 (see the sketch after this list for enabling it).
So whenever tx_burst is invoked (successfully or not), the threshold levels are checked to see whether free mbufs/mbuf segments can be pushed back to the pool.
With DEV_TX_OFFLOAD_MBUF_FAST_FREE enabled, the driver blindly puts the elements into the lcore cache.
Without DEV_TX_OFFLOAD_MBUF_FAST_FREE, the generic approach is used: each mbuf is validated (its nb_segs and ref_cnt are checked) and then pushed to the mempool.
But in either case, a fixed number of free mbufs (32, I believe, is the default for all PMDs) or however many are available is always pushed back to the cache or pool.
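A sketch of enabling that offload when the PMD advertises it (DEV_TX_OFFLOAD_MBUF_FAST_FREE per the question's DPDK 20.11; port_conf is the struct later passed to rte_eth_dev_configure()):
struct rte_eth_dev_info dev_info;
rte_eth_dev_info_get(port_id, &dev_info);
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
    /* contract: all TX mbufs come from one pool and have ref_cnt == 1 */
    port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
}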
Facts:
In the case of the ixgbe VF driver, the DEV_TX_OFFLOAD_MBUF_FAST_FREE option is not available. That means that each time the thresholds are met, each individual mbuf is checked and pushed to the mempool.
As per the code snippet, rte_eth_dev_configure() is configured for TX only, and rte_pktmbuf_pool_create() creates the pool with 341 cache elements.
We have to assume that there is only one lcore (which runs the alloc-and-tx loop).
Code Snippet-1:
for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}
After 1086 transmitted packets (of ~ 300 bytes each), rte_eth_tx_burst() returns 0.
[Observation] If we were indeed running out of mbufs, rte_pktmbuf_alloc() should fail before rte_eth_tx_burst() does. But failing at packet 1086 is an interesting phenomenon, because the total number of mbufs created is 1023 and the failure occurs after two iterations of 32 mbufs each being released back to the mempool. Analyzing the ixgbe driver code, the only place tx_xmit_pkts() returns 0 is:
/* Only use descriptors that are available */
nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
    return 0;
Even though tx_ring_size is set to 992 in the config, internally rte_eth_dev_adjust_nb_rx_tx_desc() sets it to the maximum of *nb_desc and desc_lim->nb_min. Based on the code, the failure is not because there are no free mbufs, but because the number of free TX descriptors is low or zero.
In all other cases, rte_eth_tx_done_cleanup() or rte_eth_tx_buffer_flush() actually pushes any pending descriptors out of the SW PMD to be DMA'd immediately. This internally frees up more descriptors, which makes the tx_burst much smoother.
To identify the root cause whenever the DPDK tx_burst API returns 0, either:
invoke rte_mempool_list_dump(), or
make use of the mempool dump via dpdk-procinfo.
Note: most PMDs amortize the cost of descriptor (PCIe payload) writes by batching and bunching at least 4 at a time (in the case of SSE). Hence, even if DPDK tx_burst returns 1 for a single packet, that packet may not yet have been pushed out of the NIC. To make sure it has been, use rte_eth_tx_buffer_flush().
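A minimal sketch of that rte_eth_tx_buffer_flush() route, using the standard ethdev buffered-TX helpers (BURST is an illustrative size):
#define BURST 32
/* one-time setup: a helper that batches up to BURST packets per queue */
struct rte_eth_dev_tx_buffer *txb = rte_zmalloc_socket("tx_buffer",
        RTE_ETH_TX_BUFFER_SIZE(BURST), 0, rte_eth_dev_socket_id(port_id));
rte_eth_tx_buffer_init(txb, BURST);

/* per packet: queue the mbuf; a full buffer is transmitted automatically */
uint16_t sent = rte_eth_tx_buffer(port_id, 0, txb, pkt);

/* periodically, or before going idle: force out whatever is still buffered */
sent += rte_eth_tx_buffer_flush(port_id, 0, txb);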
Say, you invoke rte_eth_tx_burst() to send one small packet (single mbuf, no offloads). Suppose, the driver indeed pushes the packet to the HW. Doing so eats up one descriptor in the ring: the driver "remembers" that this packet mbuf is associated with that descriptor. But the packet is not sent instantly. The HW typically has some means to notify the driver of completions. Just imagine: if the driver checked for completions on every rte_eth_tx_burst() invocation (thus ignoring any thresholds), then calling rte_eth_tx_burst() one more time in a tight loop manner for another packet would likely consume one more descriptor rather than recycle the first one. So, given this fact, I'd not use tight loop when investigating tx_free_thresh semantics. And it shouldn't matter whether you invoke rte_eth_tx_burst() once per a packet or once per a batch of them.
Now. Say, you have a Tx ring of size N. Suppose, tx_free_thresh is M. And you have a mempool of size Z. What you do is allocate a burst of N - M - 1 small packets and invoke rte_eth_tx_burst() to send this burst (no offloads; each packet is assumed to eat up one Tx descriptor). Then you wait for some wittingly sufficient (for completions) amount of time and check the number of free objects in the mempool. This figure should read Z - (N - M - 1). Then you allocate and send one extra packet. Then wait again. This time, the number of spare objects in the mempool should read Z - (N - M). Finally, you allocate and send one more packet (again!) thus crossing the threshold (the number of spare Tx descriptors becomes less than M). During this invocation of rte_eth_tx_burst(), the driver should detect crossing the threshold and start checking for completions. This should make the driver free (N - M) descriptors (consumed by two previous rte_eth_tx_burst() invocations) thus clearing up the whole ring. Then the driver proceeds to push the new packet in question to the HW thus spending one descriptor. You then check the mempool: this should report Z - 1 free objects.
So, the short of it: no loop, just three rte_eth_tx_burst() invocations with sufficient waiting time between them. And you check the spare object count in the mempool after each send operation. Theoretically, this way, you'll be able to understand the corner case semantics. That's the gist of it. However, please keep in mind that the actual behaviour may vary across different vendors / PMDs.
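A sketch of that three-step experiment (send_burst() is a hypothetical helper that allocates and transmits the given number of single-descriptor packets; the fixed waits stand in for "wittingly sufficient" completion time):
send_burst(N - M - 1);                                /* fill the ring up to the threshold margin */
rte_delay_ms(100);
printf("free: %u\n", rte_mempool_avail_count(pool));  /* expect Z - (N - M - 1) */

send_burst(1);
rte_delay_ms(100);
printf("free: %u\n", rte_mempool_avail_count(pool));  /* expect Z - (N - M) */

send_burst(1);                                        /* crosses tx_free_thresh */
rte_delay_ms(100);
printf("free: %u\n", rte_mempool_avail_count(pool));  /* expect Z - 1: ring reaped */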
Relying on rte_eth_tx_done_cleanup() really isn't an option since many PMDs don't implement it. Mostly the Intel PMDs provide it, but e.g. the SFC, MLX* and af_packet ones don't.
However, it's still unclear why the ixgbe PMD doesn't support cleanup when no offloads are enabled.
The requirements on rte_eth_tx_burst() with respect to freeing are really light - from the API docs:
* It is the responsibility of the rte_eth_tx_burst() function to
* transparently free the memory buffers of packets previously sent.
* This feature is driven by the *tx_free_thresh* value supplied to the
* rte_eth_dev_configure() function at device configuration time.
* When the number of free TX descriptors drops below this threshold, the
* rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf* buffers
* of those packets whose transmission was effectively completed.
[..]
* @return
* The number of output packets actually stored in transmit descriptors of
* the transmit ring. The return value can be less than the value of the
* *tx_pkts* parameter when the transmit ring is full or has been filled up.
So just attempting to free (but not waiting on the results of that attempt) and returning 0 (since 0 is less than tx_pkts) is covered by that 'contract'.
FWIW, no example distributed with DPDK loops around rte_eth_tx_burst() to re-submit not-yet-sent packets. There are some examples that use rte_eth_tx_burst() and discard unsent packets, though.
AFAICS, besides rte_eth_tx_done_cleanup() and rte_eth_tx_burst() there is no other function for requesting the release of mbufs previously submitted for transmission.
Thus, it's advisable to size the mbuf packet pool larger than the configured ring size in order to survive situations where all mbufs are in flight and can't be recovered because there is no mbuf left for calling rte_eth_tx_burst() again.

Sometimes the Disconnect Req is inside the Publish Message

On the client side I use:
mosquitto_pub -t tpc -m msg
On the server side I use a non-blocking socket and the socket() API:
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzab6/xnonblock.htm
After the first received packet I send a connect acknowledgement packet.
For each received packet I print how many bytes were received and the whole buffer in hex.
I compare the received data with a Wireshark capture.
Sometimes it works well:
37 bytes received - Connect Command
10 bytes received - Publish Message [tpc]
2 bytes received - Disconnect Req
Sometimes I get the Disconnect Req inside the Publish Message [tpc]:
37 bytes received - Connect Command
12 bytes received - Publish Message [tpc] + Disconnect Req
These last two bytes are the Disconnect Req:
30
8
0
3
74
70
63
6d
73
67
ffffffe0 <--
0 <--
How can I avoid these situations and always get 3 packets?
Short answer: you can't. You have to actually parse the messages to determine the length.
The constant used to create a TCP socket is called SOCK_STREAM for a reason, and the socket has to be treated as such: a stream of bytes. Nobody guarantees that one send() on one side results in one recv() on the other side. The only guarantee is that the sequence is preserved: abcd may become (ab, cd), but will not become acbd.
The packets may be split somewhere along the way. So it may be that the client sends 2048 bytes, but on the server side you'll first receive ~1400 bytes and then the rest. So N sends do not result in N recvs.
Another thing is that the client also treats the socket as a stream. It may send byte by byte, or send a batch of messages with one send(). N messages are not N sends.
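A minimal framing sketch in C++ (handle_mqtt_packet() is a hypothetical consumer): append whatever recv() delivers to an accumulation buffer, decode the MQTT fixed header's variable-length Remaining Length field, and peel off complete packets one at a time:
#include <cstddef>
#include <vector>

// Returns the total packet length once the fixed header is complete, else 0.
static size_t mqtt_packet_len(const unsigned char *p, size_t n) {
    size_t rem = 0, mult = 1;
    for (size_t i = 1; i < n && i < 5; ++i) { // byte 0 is type/flags
        rem += (p[i] & 0x7F) * mult;          // Remaining Length, 7 bits per byte
        mult *= 128;
        if (!(p[i] & 0x80))                   // MSB clear: last length byte
            return 1 + i + rem;               // type byte + length bytes + body
    }
    return 0; // length field not fully received yet
}

static std::vector<unsigned char> acc; // bytes accumulated across recv() calls

void on_bytes(const unsigned char *data, size_t n) {
    acc.insert(acc.end(), data, data + n);
    size_t len;
    while ((len = mqtt_packet_len(acc.data(), acc.size())) != 0 && acc.size() >= len) {
        handle_mqtt_packet(acc.data(), len); // hypothetical consumer
        acc.erase(acc.begin(), acc.begin() + len);
    }
}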

DPDK rte_eth_tx_burst() reliability

According to the DPDK documentation, the rte_eth_tx_burst() function takes a batch of packets, and returns the number of packets that have been actually stored in transmit descriptors of the transmit ring.
Assuming that the packets are sent in exactly the same order as they are inserted in the tx_pkts array parameter, it is possible to call the function iteratively until all the packets are sent. Here is some sample code taken from one of the examples:
sent = 0;
do {
    n_pkts = rte_eth_tx_burst(portid, 0, &tx_pkts_burst[sent], n_mbufs - sent);
    sent += n_pkts;
} while (sent < n_mbufs);
However, using the above code, I see that, sometimes, packets that the function reports as sent are not really sent.
I am accumulating the return value of rte_eth_tx_burst() in a variable and, at the end of the job, the value of the accumulator is greater than the value of opackets in the device eth_stats.
I see the same number of transmitted packets in eth_stats, eth_xstats and on the other side of the cable, and this number is less than the sum of the values returned by rte_eth_tx_burst().
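(For reference, a sketch of that comparison, with tx_total being a hypothetical accumulator summed over all rte_eth_tx_burst() return values:)
#include <inttypes.h>

struct rte_eth_stats stats;
rte_eth_stats_get(portid, &stats);
printf("tx_burst sum: %" PRIu64 "  opackets: %" PRIu64 "  oerrors: %" PRIu64 "\n",
       tx_total, stats.opackets, stats.oerrors);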
So, my question is: in what cases does the rte_eth_tx_burst() function return a value that does not correspond to the real number of transmitted packets?
According to the documentation, the function returns only the number of packets that have been successfully inserted in the ring, so I assumed the return value was reliable.
My testbed:
NIC: Intel 82599ES
DPDK driver: igb_uio
DPDK version: 18.05
Traffic: UDP packets, sized 174B, with IP and UDP checksum offload
Edit 1
My test is the following:
the sender sends 32 messages with different IDs; then, for each ACK received, a new message with the same ID as the ACKed packet is sent again. The test ends when every ID has been sent and ACKed N times (N=36864).
As described above, at some point one packet is not sent, so all the IDs complete the cycle except one. This is what I see as output:
ID - #sent
0 - 36864
1 - 36864
2 - 36864
3 - 36864
4 - 8151
5 - 36864
6 - 36864
7 - 36864
....
At the end of the test, the accumulator variable holding the number of packets sent is greater than the stats, and the difference is 1. So it looks like the rte_eth_tx_burst() function failed to send the one packet that was never acknowledged.
Edit 2
It may be relevant that the value n_mbufs is not necessarily constant, since the packets are read as a burst from a ring.

Why DPDK can only send and receive packets of at most 60 bytes

I have written a simple DPDK send-and-receive application. When the packet length is <= 60 bytes, the send and receive applications work; but when the packet length is > 60 bytes, the send application shows that it has sent the packet out, while the receive application does not receive anything.
In send application:
mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS,
    MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
pkt = rte_pktmbuf_alloc(mbuf_pool);
pkt->data_len = packlen; // if packlen <= 60 it works, but when packlen > 60 the receiver gets nothing
I tried both l2fwd and basicfwd as the receive application, with the same result.
The issue is here:
pchar[12] = 0;
pchar[13] = 0;
This means the EtherType is 0. From the list of assigned EtherTypes:
https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml
We see that 0 means a zero Ethernet frame length (values this low are interpreted as an IEEE 802.3 length field rather than an EtherType). Since the minimum Ethernet frame length is 64 bytes (60 + 4 FCS), that is why you have trouble sending packets longer than 60 bytes.
To fix the issue, simply put a reasonable EtherType from the list above into those bytes.
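For instance, a sketch using the ethdev helpers (struct/constant names per recent DPDK; older releases spell them ether_hdr and ETHER_TYPE_IPv4):
#include <rte_ether.h>
#include <rte_mbuf.h>

struct rte_ether_hdr *eth = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
/* any assigned EtherType >= 0x0600 will do; 0x0800 declares an IPv4 payload */
eth->ether_type = rte_cpu_to_be_16(RTE_ETHER_TYPE_IPV4);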

Realtime receiving of UDP packets with QNX RTOS

I have a source which sends UDP packets at a rate of 819.2 Hz (one packet every ~1.2 ms) to my QNX Neutrino machine. I want to receive and process those messages with as little delay and jitter as possible.
My first code was basically:
SetupUDPSocket();
while (true) {
recv(socket, buffer, BufferSize, MSG_WAITALL); // blocks until whole packet is received
processPacket(buffer);
}
The problem is that recv() only checks at each timer tick of the system whether there is a new packet available. The timer tick is usually 1 ms. So, if I use this, I get huge jitter, because I process a packet every 1 ms or every 2 ms. I could reduce the size of the timer ticks, but that would affect the whole system (and the timers of other processes, etc.). And I would still have jitter, because I would certainly never exactly match the 819.2 Hz.
So I tried to use the interrupt line of the network card (5). But it seems there are also other things that cause the interrupt to fire. I used the following code:
ThreadCtl(_NTO_TCTL_IO, 0);
SIGEV_INTR_INIT(&event);
iID = InterruptAttachEvent(IRQ5, &event, _NTO_INTR_FLAGS_TRK_MSK);
while (true) {
    if (InterruptWait(0, NULL) == -1) {
        std::cerr << "errno: " << errno << std::endl;
    }
    length = recv(socket, buffer, bufferSize, 0); // non-blocking this time
    LogTimeAndLength();
    InterruptUnmask(IRQ5, iID);
}
This results in a single successful read at the beginning, followed by reads of 0-byte length with no time passing in between. It seems that after the InterruptUnmask(), the InterruptWait() does not wait at all, so there must already be a new interrupt pending (or the same one?).
Is it possible to do something like this with the interrupt line of the network card? Are there any other possibilities to receive the packets at a rate of 819.2 Hz?
Some information about the network card:
'pci -vvv' outputs:
Class = Network (Ethernet)
Vendor ID = 8086h, Intel Corporation
Device ID = 107ch, 82541PI Gigabit Ethernet Controller
PCI index = 0h
Class Codes = 020000h
Revision ID = 5h
Bus number = 4
Device number = 15
Function num = 0
Status Reg = 230h
Command Reg = 17h
I/O space access enabled
Memory space access enabled
Bus Master enabled
Special Cycle operations ignored
Memory Write and Invalidate enabled
Palette Snooping disabled
Parity Error Response disabled
Data/Address stepping disabled
SERR# driver disabled
Fast back-to-back transactions to different agents disabled
Header type = 0h Single-function
BIST = 0h Build-in-self-test not supported
Latency Timer = 40h
Cache Line Size= 8h un-cacheable
PCI Mem Address = febc0000h 32bit length 131072 enabled
PCI Mem Address = feba0000h 32bit length 131072 enabled
PCI IO Address = ec00h length 64 enabled
Subsystem Vendor ID = 8086h
Subsystem ID = 1376h
PCI Expansion ROM = feb80000h length 131072 disabled
Max Lat = 0ns
Min Gnt = 255ns
PCI Int Pin = INT A
Interrupt line = 5
CPU Interrupt = 5h
Capabilities Pointer = dch
Capability ID = 1h - Power Management
Capabilities = c822h - 28002000h
Capability ID = 7h - PCI-X
Capabilities = 2h - 400000h
Device Dependent Registers:
0x040: 0000 0000 0000 0000 0000 0000 0000 0000
...
0x0d0: 0000 0000 0000 0000 0000 0000 01e4 22c8
0x0e0: 0020 0028 0700 0200 0000 4000 0000 0000
0x0f0: 0500 8000 0000 0000 0000 0000 0000 0000
and 'nicinfo' outputs:
wm1:
INTEL 82544 Gigabit (Copper) Ethernet Controller
Physical Node ID ........................... 000E0C C5F6DD
Current Physical Node ID ................... 000E0C C5F6DD
Current Operation Rate ..................... 100.00 Mb/s full-duplex
Active Interface Type ...................... MII
Active PHY address ....................... 0
Maximum Transmittable data Unit ............ 1500
Maximum Receivable data Unit ............... 0
Hardware Interrupt ......................... 0x5
Memory Aperture ............................ 0xfebc0000 - 0xfebdffff
Promiscuous Mode ........................... Off
Multicast Support .......................... Enabled
Thanks for reading!
I am not quite sure why the statement "The problem is that recv() only checks at each timer tick of the system if there is a new packet available. The timer tick is usually 1ms." would be true for a preemptive OS. There must be something in the system configuration, or the network protocol stack implementation has some issues.
Years ago, when I was working on an IPTV STB project for Yahoo BB Japan, I ran into an issue with RTP receiving. The issue was not delay or jitter, but the overall system performance of the STB after we added some NDS algorithm. We were using VxWorks, and VxWorks supports an Ethernet hook interface, which is called each time an Ethernet packet is received by the driver.
I hooked an API into it and just parsed the UDP packets with the specified port directly out of the Ethernet frames. Of course, we made the assumption that there is no fragmentation, which was guaranteed by the network setup for performance reasons. Maybe you can check whether you can get the same hook in the QNX Ethernet driver. At least you would find out whether the jitter comes from the driver or not.
How big are your UDP packets? If the packet size is small, you will gain greater efficiency by packing more data into a single packet and decreasing the transmission rate.
I suspect the interrupt service routine (ISR) is not masking the interrupt. Perhaps it is designed for edge sensitivity while the interrupt is level-sensitive.
Sorry, I'm a bit late to the party, but I came across your question and saw that it was similar to a situation I encountered. Instead of hardware interrupts, you could try a software interrupt using signals. QNX has some documentation here: http://www.qnx.com/developers/docs/qnx_4.25_docs/qnx4/sysarch/microkernel.html#IPCSIGNALS . I was using CentOS at the time, but the theory is the same. According to http://www.qnx.com/developers/docs/6.3.0SP3/neutrino/lib_ref/s/socket.html you can use ioctl() to set up a receive group for the SIGIO signal for a given file descriptor, in your case a UDP socket. When the socket has data ready for reading, a SIGIO signal is sent to the process indicated by ioctl(). Use sigaction() to tell the OS what signal-handling function to use. In your case, the signal handler can read the data off the socket and store it in a buffer for processing. Use pause() to suspend the process until it handles the SIGIO signal. When the signal handler returns, the process will wake up and you can process the data in the buffer.
That should allow you to process your data as it comes in without having to deal with timers or hardware interrupts. One thing to be aware of is whether your system can process those signals as fast as the UDP traffic comes in.
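A minimal sketch of that signal-driven setup (assuming a BSD-style stack where the FIOSETOWN/FIOASYNC ioctls are available; SetupUDPSocket() is the question's own helper):
#include <signal.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

static int sock;
static char buffer[2048];

static void onSigio(int) {
    // drain everything queued; MSG_DONTWAIT keeps the handler from blocking
    while (recv(sock, buffer, sizeof(buffer), MSG_DONTWAIT) > 0) {
        // copy into a queue for the main loop to process
    }
}

int main() {
    sock = SetupUDPSocket();

    struct sigaction sa = {};
    sa.sa_handler = onSigio;
    sigaction(SIGIO, &sa, NULL);

    pid_t owner = getpid();
    ioctl(sock, FIOSETOWN, &owner); // deliver this socket's SIGIO to us
    int on = 1;
    ioctl(sock, FIOASYNC, &on);     // enable asynchronous I/O notification

    for (;;)
        pause();                    // sleep until the next SIGIO arrives
}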