DPDK packet loss and reordering

I wrote a simple test using DPDK: program1 on computer1 sends packets, and program2 on computer2 receives them. computer1 and computer2 are directly connected, with no switch in between.
In program1, I use a packId field to carry the packet sequence number.
while (true) {
    pkt = rte_pktmbuf_alloc(mbuf_pool);
    uint8_t* pchar = rte_pktmbuf_mtod(pkt, uint8_t*);
    // set MAC addresses and EtherType/length (bytes 0 to 13).
    // use bytes from offset 14 to store the uint64_t packId.
    uint64_t* pPackId = (uint64_t*)(pchar + 14);
    *pPackId = packId;
    packId++;
    // put 1024 bytes of data inside the packet.
    uint16_t sent = rte_eth_tx_burst(0, 0, &pkt, 1);
    while (sent != 1)
    {
        sent = rte_eth_tx_burst(0, 0, &pkt, 1);
    }
}
In the receiver, I configure a long RX ring with nb_rxd = 3072:
rte_eth_dev_adjust_nb_rx_tx_desc(0, &nb_rxd, &nb_txd);
rte_eth_rx_queue_setup(0, 0, nb_rxd, rte_eth_dev_socket_id(0), NULL, mbuf_pool);
A for loop receives packets and checks the packet sequence id:
for (;;)
{
    struct rte_mbuf *bufs[32];
    const uint16_t nb_rx = rte_eth_rx_burst(0, 0, bufs, 32);
    if (unlikely(nb_rx == 0))
        continue;
    int m = 0;
    for (m = 0; m < nb_rx; m++)
    {
        uint8_t* pchar = rte_pktmbuf_mtod(bufs[m], uint8_t*);
        uint64_t* pPackId = (uint64_t*)(pchar + 14);
        uint64_t packid = *pPackId;
        if (expectedPackid != packid) {
            printf...
            expectedPackid = packid + 1;
        }
        else expectedPackid++;
    }
}
Based on program2's output, I see a lot of packet loss and reordering. Received packets are put into the RX ring buffer, so shouldn't they be received in order? I also find packets being lost, even though program1's sending rate is only around 1 Gbit/s.

rte_eth_stats_get() is very useful for troubleshooting. From the rte_eth_stats counters, ipackets and q_ipackets[0] were correct, while imissed, ierrors, rx_nombuf and q_errors[0] were all 0. So the problem had to be in program2's code. After reviewing it, the cause turned out to be a memory-management bug in program2.
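For reference, a minimal sketch of dumping those counters (port 0 assumed, as in the test programs):

#include <stdio.h>
#include <inttypes.h>
#include <rte_ethdev.h>

struct rte_eth_stats stats;
if (rte_eth_stats_get(0, &stats) == 0) {   /* port 0 assumed */
    printf("ipackets=%" PRIu64 " imissed=%" PRIu64 " ierrors=%" PRIu64 " rx_nombuf=%" PRIu64 "\n",
           stats.ipackets, stats.imissed, stats.ierrors, stats.rx_nombuf);
    printf("q_ipackets[0]=%" PRIu64 " q_errors[0]=%" PRIu64 "\n",
           stats.q_ipackets[0], stats.q_errors[0]);
}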

I need a code snippet to embed an Ethernet header with payload and pass it from a secondary DPDK application to the primary

The architecture is as follows:
Secondary DPDK app ------> Primary DPDK App ----> (EDIT)Interface
Inside my Secondary I have a vector of u8 bytes representing an L2 packet.
I want to send this L2 packet to the Primary App so the primary could send it to the internet.
From what I understood, the L2 packet has to be wrapped in an mbuf in order to be put on a shared ring.
But I have no clue on how to do this wrapping.
What I don't know exactly: my packet is just a vector of bytes, how could I extract useful information out of it in order to fill the mbuf fields? And which fields of the mbuf should be filled minimally for this to work?
For better understanding, here is what should happen step by step:
Vector of bytes gets in secondary (doesn't matter how)
Secondary gets an mbuf from the shared mempool.
Secondary puts the vector inside the mbuf (the vector is an L2 packet)
mbuf has many fields representing many things, so I don't know which field to fill and with what.
Secondary places the mbuf on a shared ring.
Primary grabs the mbuf from shared ring.
Primary sends the mbuf to the internet.
This is what I have coded so far; the secondary app is in Rust and the primary app is in C.
Secondary is here (github):
Remember, the L2 packet is just a Vec<u8>, i.e. something like [1, 222, 23, 34...], a simple array of bytes.
// GETTING THE MBUF FROM SHARED MEMPOOL
let mut my_buffer = self.do_rte_mempool_get();
while let Err(er) = my_buffer {
    warn!("rte_mempool_get failed, trying again.");
    my_buffer = self.do_rte_mempool_get();
    // it may fail if not enough entries are available.
}
warn!("rte_mempool_get success");
// Let's just send an empty packet for starters.
let my_buffer = my_buffer.unwrap();
// HERE I SHOULD PUT THE L2 PACKET INSIDE THE MBUF.
// MY L2 PACKET is a Vec<u8>
// NOW I PUT THE MBUF ON THE SHARED RING, BYE MBUF
let mut res = self.do_rte_ring_enqueue(my_buffer);
// it may fail if not enough room in the ring to enqueue
while let Err(er) = res {
    warn!("rte_ring_enqueue failed, trying again.");
    res = self.do_rte_ring_enqueue(my_buffer);
}
warn!("rte_ring_enqueue success");
And Primary is here (it just gets mbufs from ring and has to send them with rte_eth_tx_burst()):
/* Run until the application is quit or killed. */
for (;;)
{
    // receive packets on the rte ring
    // then send them to the NIC
    struct rte_mbuf *bufs[BURST_SIZE];
    void *mbuf;
    if (rte_ring_dequeue(recv_ring, &mbuf) < 0) {
        continue;
    }
    printf("Received mbuf.\n");
    //for now I just want to test it out so I stop here
    continue;
    /* Send packet to port */
    bufs[0] = mbuf;
    uint16_t nbPackets = 1;
    const uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nbPackets);
    /* Free any unsent packets. */
    if (unlikely(nb_tx < nbPackets))
    {
        rte_pktmbuf_free(bufs[nb_tx]);
    }
}
If you have any questions please let me know!
As always, thanks for reading!
UPDATE: the dpdk primary wasn't actually connected to the internet. It is simply using an interface of a virtual machine. The DPDK secondary and primary are both running inside a virtual machine and the interface used by primary is connected to a host interface through a bridge. So I can watch the bridge in question on the host using tcpdump.
I tried something to put the L2 packet inside the mbuf on secondary and it looks like this:
(you can also check github)
// After receiving something on the channel
// I want to send it to the primary DPDK
// And the primary will send it to hardware NIC
let mut my_buffer = self.do_rte_mempool_get();
while let Err(er) = my_buffer {
    warn!("rte_mempool_get failed, trying again.");
    my_buffer = self.do_rte_mempool_get();
    // it may fail if not enough entries are available.
}
warn!("rte_mempool_get success");
// Let's just send an empty packet for starters.
let my_buffer = my_buffer.unwrap();
let my_buffer_struct: *mut rte_mbuf = my_buffer as (*mut rte_mbuf);
unsafe {
    // the packet buffer, not the mbuf
    let buf_addr: *mut c_void = (*my_buffer_struct).buf_addr;
    let mut real_buf_addr = buf_addr.offset((*my_buffer_struct).data_off as isize);
    // try to copy the Vec<u8> inside the mbuf
    copy(my_data.as_mut_ptr(), real_buf_addr as *mut u8, my_data.len());
    (*my_buffer_struct).data_len = my_data.len() as u16;
};
(my_data is the Vec<u8> from the earlier code snippet)
Now, on the primary DPDK side I am receiving those bytes, which are the bytes of the L2 packet. I printed them and they are the same as in the secondary, which is great.
for (;;)
{
    // receive packets on the rte ring
    // then send them to the NIC
    struct rte_mbuf *bufs[BURST_SIZE];
    void *mbuf;
    unsigned char *my_packet;
    uint16_t data_len;
    uint16_t i = 0;
    if (rte_ring_dequeue(recv_ring, &mbuf) < 0) {
        continue;
    }
    printf("Received mbuf.\n");
    my_packet = ((unsigned char *)(*(struct rte_mbuf *)mbuf).buf_addr) + ((struct rte_mbuf *)mbuf)->data_off;
    data_len = ((struct rte_mbuf *)mbuf)->data_len;
    for (i = 0; i < data_len; i++) {
        printf("%d ", (uint8_t)my_packet[i]);
    }
    printf("\n");
    //for now I just want to test it out so I stop here
    // rte_pktmbuf_free(mbuf);
    // continue;
    /* Send packet to port */
    bufs[0] = (struct rte_mbuf *)mbuf;
    uint16_t nbPackets = 1;
    const uint16_t nb_tx = rte_eth_tx_burst(port, 0, bufs, nbPackets);
    /* Free any unsent packets. */
    if (unlikely(nb_tx < nbPackets))
    {
        rte_pktmbuf_free(bufs[nb_tx]);
    }
}
But the issue is that after sending the mbuf from primary with eth_tx_burst, I cannot see any packet while using tcpdump on the host.
So I am guessing I am not wrapping the packet inside the mbuf properly.
I hope it makes more sense.
@Mihai, if one needs to create a DPDK mbuf in the secondary and send it via an RTE_RING, these are the steps to do so:
Start the secondary application
Get the mbuf pool pointer via rte_mempool_lookup
Allocate an mbuf from the mbuf pool via rte_pktmbuf_alloc
Set the minimum fields in the mbuf, such as pkt_len, data_len, next and nb_segs, to appropriate values
Fetch the start of the data region to memcpy your custom packet into, using rte_pktmbuf_mtod_offset or rte_pktmbuf_mtod
Then memcpy the content from the user vector into the DPDK data area
Note: depending on checksum offload, the actual frame length and chained-mbuf mode, other fields need to be updated as well.
Code snippet:
mbuf_ptr = rte_pktmbuf_alloc(mbuf_pool);
mbuf_ptr->data_len = [size of vector, preferably under 1500];
mbuf_ptr->pkt_len = mbuf_ptr->data_len;
struct rte_ether_hdr *eth_hdr = rte_pktmbuf_mtod(mbuf_ptr, struct rte_ether_hdr *);
rte_memcpy(eth_hdr, user_buffer, mbuf_ptr->data_len);
Note: something similar to the above has been implemented in a Rust + C WireGuard setup enabled with DPDK.
Please rework the above code into your code:
warn!("rte_mempool_get success");
// Let's just send an empty packet for starters.
let my_buffer = my_buffer.unwrap();
let my_buffer_struct: *mut rte_mbuf = my_buffer as (*mut rte_mbuf);
unsafe {
    // the packet buffer, not the mbuf
    let buf_addr: *mut c_void = (*my_buffer_struct).buf_addr;
    let mut real_buf_addr = buf_addr.offset((*my_buffer_struct).data_off as isize);
    // try to copy the Vec<u8> inside the mbuf
    copy(my_data.as_mut_ptr(), real_buf_addr as *mut u8, my_data.len());
    (*my_buffer_struct).data_len = my_data.len() as u16;
};
unsafe {
    warn!("Length of segment buffer: {}", (*my_buffer_struct).buf_len);
    warn!("Data offset: {}", (*my_buffer_struct).data_off);
    let buf_addr: *mut c_void = (*my_buffer_struct).buf_addr;
    let real_buf_addr = buf_addr.offset((*my_buffer_struct).data_off as isize);
    warn!("Address of buf_addr: {:?}", buf_addr);
    warn!("Address of buf_addr + data_off: {:?}", real_buf_addr);
    warn!("\n");
};
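Putting the answer's steps together, a minimal C sketch of the secondary-side fill and enqueue could look like the following; the pool and ring names ("MBUF_POOL", "PKT_RING") and the helper function are made up for illustration, not taken from the code above:

#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_ring.h>
#include <rte_memcpy.h>

/* user_buf/user_len stand in for the Vec<u8> coming from the secondary. */
static int enqueue_l2_frame(const uint8_t *user_buf, uint16_t user_len)
{
    struct rte_mempool *pool = rte_mempool_lookup("MBUF_POOL"); /* assumed name */
    struct rte_ring *ring = rte_ring_lookup("PKT_RING");        /* assumed name */
    if (pool == NULL || ring == NULL)
        return -1;

    struct rte_mbuf *m = rte_pktmbuf_alloc(pool);
    if (m == NULL)
        return -1;

    /* Minimal fields for a single-segment frame. */
    m->data_len = user_len;
    m->pkt_len  = user_len;
    m->nb_segs  = 1;
    m->next     = NULL;

    /* Copy the raw L2 frame into the mbuf data area. */
    rte_memcpy(rte_pktmbuf_mtod(m, void *), user_buf, user_len);

    if (rte_ring_enqueue(ring, m) != 0) {
        rte_pktmbuf_free(m);
        return -1;
    }
    return 0;
}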

Recv function for TCP Socket programming

I am new to socket programming. I am trying to create a client application. The server is a camera which communicates using TCP, and it sends data continuously. Using Wireshark, I can see that the camera is sending a continuous stream of packets of different sizes, but never more than 1514 bytes. However, my recv function always returns 2000, which is the size of my buffer.
unsigned char buf[2000];
int bytesIn = recv(sock, (char*)buf, sizeof(buf), 0);
if (bytesIn > 0)
{
    std::cout << bytesIn << std::endl;
}
The first packet I receive is 9 bytes, which recv reports correctly, but after that it always returns 2000.
Can anyone please tell me the solution so that I can get the correct size of the actual data payload?
EDIT
int bytesIn = recv(sock, (char*)buf, sizeof(buf), 0);
if (bytesIn > 0)
{
    while (bytes != 1514)
    {
        if (count == 221184)
        {
            break;
        }
        buffer[count++] = buf[bytes++];
    }
    std::cout << count;
}
EDIT:
Here is my Wireshark capture:
My code to handle packets:
int bytesIn = recv(sock, (char*)&buf, sizeof(buf), 0);
if (bytesIn > 0)
{
    if (flag1 == true)
    {
        while ((bytes != 1460 && (buf[bytes] != 0)) && _fillFlag)
        {
            buffer[fill++] = buf[bytes++];
            if (fill == 221184)
            {
                flag1 = false;
                _fillFlag = false;
                fill = 0;
                queue.Enqueue(buffer, sizeof(buffer));
                break;
            }
        }
    }
    if ((strncmp(buf, _string2, 10) == 0))
    {
        flag1 = true;
    }
}
For each frame the camera sends 221184 bytes, and after each frame it sends a 9-byte packet; those 9 bytes are constant, so I use them for comparison.
The 221184 bytes sent by the camera never contain a 0, so I use that condition in the while loop. This code works and shows the frames, but after a few frames it shows a fully black frame. I think the mistake is in how I receive the packets.
Size per frame: 221184 bytes (fixed)
Size per recv: 0 ~ 1514 bytes
My implementation is here:
DWORD MakeFrame(int socket)
{
    INT nFrameSize = 221184;
    INT nSizeToRecv = 221184;
    INT nRecvSize = 2000;
    INT nReceived = 0;
    INT nTotalReceived = 0;
    BYTE byCamera[2000] = { 0 };   // byCamera size = nRecvSize
    BYTE byFrame[221184] = { 0 };  // byFrame size = nFrameSize
    while (0 != nSizeToRecv)
    {
        nRecvSize = min(2000, nSizeToRecv);
        nReceived = recv(socket, (char*)byCamera, nRecvSize, 0);
        if (nReceived <= 0)        // connection closed or recv error
            break;
        memcpy_s(byFrame + nTotalReceived, nFrameSize - nTotalReceived, byCamera, nReceived);
        nSizeToRecv -= nReceived;
        nTotalReceived += nReceived;
    }
    // byFrame is ready to use.
    // ...
    // ...
    return WSAGetLastError();
}
TCP is not a packet-oriented but a stream-oriented transport protocol. There is no notion of packets in TCP (apart maybe from the MTU at a lower layer). If you want to work with packets, you either have to use UDP (which is in fact packet-oriented, but by default not reliable with respect to ordering, packet loss and the like), or you have to implement your own packet logic on top of TCP, i.e. read from the stream and partition the data into logical packets once received.
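For the fixed 221184-byte frames described above, that packet logic boils down to reading until exactly that many bytes have arrived. A minimal sketch (Winsock-style, matching the code in the question; error handling kept short):

// Read exactly 'len' bytes into 'dst'; returns 0 on success, -1 on error/close.
static int recv_exact(SOCKET sock, char *dst, int len)
{
    int total = 0;
    while (total < len) {
        int n = recv(sock, dst + total, len - total, 0);
        if (n <= 0)          // 0 = connection closed, <0 = error
            return -1;
        total += n;
    }
    return 0;
}

// Usage: assemble one camera frame of 221184 bytes.
// char frame[221184];
// if (recv_exact(sock, frame, sizeof(frame)) == 0) { /* frame complete */ }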

Replacing av_read_frame() to reduce delay

I am implementing a (very) low latency video streaming C++ application using ffmpeg. The client receives a video which is encoded with x264’s zerolatency preset, so there is no need for buffering. As described here, if you use av_read_frame() to read packets of the encoded video stream, you will always have at least one frame delay because of internal buffering done in ffmpeg. So when I call av_read_frame() after frame n+1 has been sent to the client, the function will return frame n.
Getting rid of this buffering by setting the AVFormatContext flags AVFMT_FLAG_NOPARSE | AVFMT_FLAG_NOFILLIN as suggested in the source disables packet parsing and therefore breaks decoding, as noted in the source.
Therefore, I am writing my own packet receiver and parser. First, here are the relevant steps of the working solution (including one frame delay) using av_read_frame():
AVFormatContext *fctx;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;
//Initialization of AV structures
//…
//Main Loop
while (true) {
    //Receive packet
    av_read_frame(fctx, pkt);
    //Decode:
    avcodec_send_packet(cctx, pkt);
    avcodec_receive_frame(cctx, frm);
    //Display frame
    //…
}
And below is my solution, which mimics the behavior of av_read_frame() as far as I could reproduce it. I was able to track the source code of av_read_frame() down to ff_read_packet(), but I cannot find the source of AVInputFormat.read_packet().
int tcpsocket;
AVCodecContext *cctx;
AVPacket *pkt;
AVFrame *frm;
uint8_t recvbuf[(int)10e5];
memset(recvbuf,0,10e5);
int pos = 0;
AVCodecParserContext * parser = av_parser_init(AV_CODEC_ID_H264);
parser->flags |= PARSER_FLAG_COMPLETE_FRAMES;
parser->flags |= PARSER_FLAG_USE_CODEC_TS;
//Initialization of AV structures and the tcpsocket
//…
//Main Loop
while (true) {
    //Receive packet
    int length = read(tcpsocket, recvbuf, 10e5);
    if (length >= 0) {
        //Creating temporary packet
        AVPacket *tempPacket = new AVPacket;
        av_init_packet(tempPacket);
        av_new_packet(tempPacket, length);
        memcpy(tempPacket->data, recvbuf, length);
        tempPacket->pos = pos;
        pos += length;
        memset(recvbuf, 0, length);
        //Parsing temporary packet into pkt
        av_init_packet(pkt);
        av_parser_parse2(parser, cctx,
                         &(pkt->data), &(pkt->size),
                         tempPacket->data, tempPacket->size,
                         tempPacket->pts, tempPacket->dts, tempPacket->pos);
        pkt->pts = parser->pts;
        pkt->dts = parser->dts;
        pkt->pos = parser->pos;
        //Set keyframe flag
        if (parser->key_frame == 1 ||
            (parser->key_frame == -1 &&
             parser->pict_type == AV_PICTURE_TYPE_I))
            pkt->flags |= AV_PKT_FLAG_KEY;
        if (parser->key_frame == -1 && parser->pict_type == AV_PICTURE_TYPE_NONE && (pkt->flags & AV_PKT_FLAG_KEY))
            pkt->flags |= AV_PKT_FLAG_KEY;
        pkt->duration = 96000; //Same result as in av_read_frame()
        //Decode:
        avcodec_send_packet(cctx, pkt);
        avcodec_receive_frame(cctx, frm);
        //Display frame
        //…
    }
}
I checked the fields of the resulting packet (pkt) just before avcodec_send_packet() in both solutions. They are as far as I can tell identical. The only difference might be the actual content of pkt->data. My solution decodes I-Frames fine, but the references in P-Frames seem to be broken, causing heavy artifacts and error messages such as “invalid level prefix”, “error while decoding MB xx”, and similar.
I would be very grateful for any hints.
Edit 1: I have developed a workaround for the time being: in the video server, after sending the packet containing the encoded data of a frame, I send one dummy packet which only contains the delimiters marking the beginning and end of a packet. This way, I push the actual video data frames through av_read_frame(). I discard the dummy packets immediately after av_read_frame().
Edit 2: Solved here by rom1v, as written in his comment to this question.
av_parser_parse2() does not necessarily consume your tempPacket in one go. You have to call it in another loop and check its return value, as shown in the API docs.
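For illustration, the parse step could be wrapped in a loop roughly like this (a sketch reusing the question's tempPacket, parser, cctx, pkt and frm; timestamps are passed as AV_NOPTS_VALUE for brevity, as in the FFmpeg decoding examples):

// Feed the received bytes to the parser until they are fully consumed;
// each complete packet that comes out is handed to the decoder.
uint8_t *in_data = tempPacket->data;
int in_size = tempPacket->size;
while (in_size > 0) {
    int consumed = av_parser_parse2(parser, cctx,
                                    &pkt->data, &pkt->size,
                                    in_data, in_size,
                                    AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
    if (consumed < 0)
        break;                 // parser error
    in_data += consumed;
    in_size -= consumed;
    if (pkt->size > 0) {       // a complete packet is available
        avcodec_send_packet(cctx, pkt);
        avcodec_receive_frame(cctx, frm);
    }
}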

Reinjecting modified packets in netfilter module

I have used netfilter_queue to create an NFQUEUE module for iptables that handles all outgoing UDP packets.
I want to modify all UDP packets that match a certain pattern, and reinject them into the network.
Here is some example code:
...
static int Callback(nfq_q_handle *myQueue, struct nfgenmsg *msg, nfq_data *pkt, void *cbData) {
    uint32_t id = 0;
    nfqnl_msg_packet_hdr *header;
    if ((header = nfq_get_msg_packet_hdr(pkt))) {
        id = ntohl(header->packet_id);
    }
    // Get the packet payload
    unsigned char *pktData;
    int len = nfq_get_payload(pkt, &pktData);
    // The following is an example.
    // In reality, it involves more parsing of the packet payload.
    if (len && pktData[40] == 'X') {
        // Modify byte 40
        pktData[40] = 'Y';
    }
    // Pass through the (modified) packet.
    return nfq_set_verdict(myQueue, id, NF_ACCEPT, 0, NULL);
}
...
int main() {
    ...
    struct nfq_handle *nfqHandle = nfq_open();
    nfq_create_queue(nfqHandle, 0, &Callback, NULL);
    ...
    return 0;
}
The modified packet does not get reinjected into the stream. How would I inject the modified version of the packet?
Two things. First:
return nfq_set_verdict(myQueue, id, NF_ACCEPT, 0, NULL);
should be:
return nfq_set_verdict(myQueue, id, NF_ACCEPT, len, pktData);
That tells it you want to send a modified packet. (You might need some type casting.)
Second, you just modified the packet. The IP stack isn't helping you out any more at this point, so you'll need to recompute the UDP checksum for that packet, or zero it out so the other end won't even check it.
The UDP checksum will live in bytes 0x1A and 0x1B of your packet, so this will zero them out:
pktData[0x1a] = 0;
pktData[0x1b] = 0;
and then your packet will go through.
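If you would rather recompute the checksum than zero it, a sketch of the usual pseudo-header sum looks like this (assuming an IPv4 packet with a plain 20-byte IP header, which matches the 0x1A/0x1B offsets above):

#include <stdint.h>
#include <netinet/in.h>

// Recompute the UDP checksum of an IPv4/UDP packet in place.
// pkt points at the start of the IP header; assumes no IP options (20-byte header).
static void fix_udp_checksum(unsigned char *pkt)
{
    unsigned char *udp = pkt + 20;
    uint16_t udpLen = (udp[4] << 8) | udp[5];   // UDP length field (header + payload)
    uint32_t sum = 0;
    int i;

    udp[6] = udp[7] = 0;                        // clear the old checksum

    // Pseudo-header: source IP, destination IP, protocol, UDP length.
    for (i = 12; i < 20; i += 2)
        sum += (pkt[i] << 8) | pkt[i + 1];
    sum += IPPROTO_UDP;
    sum += udpLen;

    // UDP header plus payload, padding an odd length with a zero byte.
    for (i = 0; i + 1 < udpLen; i += 2)
        sum += (udp[i] << 8) | udp[i + 1];
    if (udpLen & 1)
        sum += udp[udpLen - 1] << 8;

    // Fold the carries and take the one's complement.
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    uint16_t csum = (uint16_t)~sum;
    if (csum == 0)
        csum = 0xffff;                          // 0 means "checksum not used" for UDP
    udp[6] = csum >> 8;
    udp[7] = csum & 0xff;
}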

IOCP and overwritten buffer

Well, I made an IOCP server for handling client connections, with the following details:
- Threads = (CPU cores * 2)
- Assigning a completion port to each socket
- Accessing the socket context by client index or by the overlapped struct (either way works the same)
So I am trying to debug the incoming packets. It works like a charm, except for one small but nasty detail... I set a breakpoint in the WorkerThread function (where I recv the packet) and watch the buffer holding the packet I received, when suddenly the buffer gets overwritten with a new packet from the client.
Why is that? According to what I read, IOCP should wait until I process the packet and send a response to the client before receiving any other packet. So I set a flag on my socket context called "Processing", and the buffer still gets overwritten with an incoming packet. It doesn't let me debug at all, and it's driving me crazy.
Is it OllyDbg's (the debugger's) fault for letting the other threads run while I sit at a breakpoint? Or is there an error in my IOCP implementation?
Here is how my WorkerThread is coded:
DWORD WINAPI WorkerThread(void* argument)
{
    int BytesTransfer;
    int BytesRecv;
    int ClientID;
    int result;
    OVERLAPPED* overlapped = 0;
    ClientInfo* clientinfo = 0;
    WSABUF wsabuf;
    int flags;
    //Exit only when the shutdown signal is received
    while (WaitForSingleObject(IOCPBase::internaldata->sockcontext.ShutDownSignal, NULL) != WAIT_OBJECT_0)
    {
        flags = 0; BytesTransfer = 0; BytesRecv = 0; ClientID = 0;
        //Get from queued list
        if (GetQueuedCompletionStatus(IOCPBase::internaldata->sockcontext.CompletionPort, (LPDWORD)&BytesTransfer, (PULONG_PTR)&ClientID, &overlapped, INFINITE) == TRUE)
        {
            if (overlapped == 0)
            {
                //Fatal error
                break;
            }
            clientinfo = (ClientInfo*)overlapped;
            if (BytesTransfer != 0)
            {
                //Assign the buffer pointer and buffer len to the local WSABUF
                clientinfo->RecvContext.RecvBytes = BytesTransfer;
                wsabuf.buf = (char*)clientinfo->RecvContext.Buffer;
                wsabuf.len = clientinfo->RecvContext.Len;
                //Switch on OperationCode
                //switch (IOCPBase::internaldata->ClientContext[ClientID].OperationCode)
                switch (clientinfo->OperationCode)
                {
                case FD_READ:
                    // Check if we have sent all data to the client from a previous send
                    if (clientinfo->SendContext.SendBytes < clientinfo->SendContext.TotalBytes)
                    {
                        clientinfo->OperationCode = FD_READ; //Keep FD_READ because on the next send there could still be bytes left to send
                        wsabuf.buf += clientinfo->SendContext.SendBytes; //The buffer position is advanced by the sent bytes
                        wsabuf.len = clientinfo->SendContext.TotalBytes - clientinfo->SendContext.SendBytes; //the buffer len is total - sent bytes
                        //Send the remaining bytes
                        result = WSASend(clientinfo->sock, &wsabuf, 1, (LPDWORD)&BytesRecv, flags, &clientinfo->overlapped, NULL);
                        if (result == SOCKET_ERROR && (WSAGetLastError() != WSA_IO_PENDING))
                        {
                            CloseClient(ClientID);
                        }
                        clientinfo->SendContext.SendBytes += BytesRecv;
                    }
                    else
                    {
                        if (clientinfo->Processing == 0)
                        {
                            clientinfo->OperationCode = FD_WRITE; //If no more bytes are left to send we can set the operation code to write (in fact it is a read)
                            memset(clientinfo->RecvContext.Buffer, NULL, MAX_DATA_BUFFER_SIZE); //Clean the buffer for receiving new data
                            //Recv data from our client
                            clientinfo->RecvContext.RecvBytes = WSARecv(clientinfo->sock, &wsabuf, 1, (LPDWORD)&BytesRecv, (LPDWORD)&flags, &clientinfo->overlapped, NULL);
                            if (clientinfo->RecvContext.RecvBytes == SOCKET_ERROR && WSAGetLastError() != WSA_IO_PENDING)
                            {
                                CloseClient(ClientID);
                                break;
                            }
                        }
                    }
                    break;
                case FD_WRITE:
                    //Send data to the RecvProtocol
                    clientinfo->Processing = 1;
                    IOCPBase::internaldata->callback.RecvProtocol(clientinfo->RecvContext.Buffer, clientinfo->RecvContext.Len, ClientID);
                    clientinfo->Processing = 0;
                default:
                    break;
                }
            }
        }
    }
    return false;
}
The problem appears when looking at clientinfo->RecvContext.Buffer. I am watching the packet, a few seconds pass, and boom, the buffer is overwritten with a new packet.
Thanks!
Never mind, I fixed the debugging problem by copying the packet into the stack frame of the function I use to analyze it; this way I have no overwrite problem.
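In other words, something along these lines inside the FD_WRITE handling (just a sketch; MAX_DATA_BUFFER_SIZE, clientinfo and BytesTransfer are the names from the code above):

// Take a private copy of the received bytes before handing them off,
// so a later overlapped WSARecv on the same context cannot overwrite
// them while they are still being inspected in the debugger.
char localCopy[MAX_DATA_BUFFER_SIZE];
int localLen = BytesTransfer;
memcpy(localCopy, clientinfo->RecvContext.Buffer, localLen);
// ...then analyze/enqueue localCopy instead of clientinfo->RecvContext.Buffer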