DPDK rte_eth_tx_burst() reliability

According to the DPDK documentation, the rte_eth_tx_burst() function takes a batch of packets and returns the number of packets that have actually been stored in transmit descriptors of the transmit ring.
Assuming that the packets are sent in exactly the same order as they appear in the tx_pkts array parameter, it should be possible to call the function iteratively until all the packets are sent. Here is sample code taken from one of the examples:
sent = 0;
do {
    n_pkts = rte_eth_tx_burst(portid, 0, &tx_pkts_burst[sent], n_mbufs - sent);
    sent += n_pkts;
} while (sent < n_mbufs);
However, using the above code, I see that sometimes the packets that the function reports as sent are not really sent.
I am accumulating the return value of rte_eth_tx_burst() in a variable and, at the end of the job, the value of the accumulator is greater than the value of opackets in the device eth_stats.
I see the same number of transmitted packets in eth_stats, eth_xstats and on the other side of the cable, and this number is less than the sum of the values returned by rte_eth_tx_burst().
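For reference, the comparison boils down to something like this (a sketch; acc_sent is the uint64_t accumulator variable mentioned above, portid as in the loop):
#include <inttypes.h>
struct rte_eth_stats stats;
rte_eth_stats_get(portid, &stats);
/* acc_sent: sum of rte_eth_tx_burst() return values over the whole run */
printf("tx_burst sum: %" PRIu64 ", opackets: %" PRIu64 "\n",
        acc_sent, stats.opackets);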
So, my question is: in what cases does the rte_eth_tx_burst() function return a value that does not correspond to the real number of transmitted packets?
According to the documentation, the function is returning only the number of packets that have been successfully inserted in the ring, so I assumed the return value was reliable.
My testbed:
NIC: Intel 82599ES
DPDK driver: igb_uio
DPDK version: 18.05
Traffic: UDP packets, sized 174B, with IP and UDP checksum offload
Edit 1
My test is the following:
the sender sends 32 messages with different IDs; then, for each ACK received, a new message with the same ID as the acked packet is sent again. The test ends when every ID has been sent and acked N times (N = 36864).
As described above, at some point one packet is not sent, so all the IDs complete the cycle except one. This is what I see as output:
ID - #sent
0 - 36864
1 - 36864
2 - 36864
3 - 36864
4 - 8151
5 - 36864
6 - 36864
7 - 36864
....
At the end of the test, the accumulator variable with the number of packets sent is greater than the stats and the difference is 1. So, it looks like the rte_eth_tx_burst function failed to send that one packet that is not acknowledged.
Edit 2
It may be relevant that the value "n_mbufs" is not necessarily constant, since the packets are read as a burst from a ring.

Related

rte_eth_tx_burst() descriptor/mbuf management guarantees vs. free thresholds

The rte_eth_tx_burst() function is documented as:
* It is the responsibility of the rte_eth_tx_burst() function to
* transparently free the memory buffers of packets previously sent.
* This feature is driven by the *tx_free_thresh* value supplied to the
* rte_eth_dev_configure() function at device configuration time.
* When the number of free TX descriptors drops below this threshold, the
* rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf* buffers
* of those packets whose transmission was effectively completed.
I have a small test program where this doesn't seem to hold true (when using the ixgbe driver on a vfio X553 1GbE NIC).
So my program sets up one transmit queue like this:
uint16_t tx_ring_size = 1024 - 32;
rte_eth_dev_configure(port_id, 0, 1, &port_conf);
r = rte_eth_dev_adjust_nb_rx_tx_desc(port_id, &rx_ring_size, &tx_ring_size);
struct rte_eth_txconf txconf = dev_info.default_txconf;
r = rte_eth_tx_queue_setup(port_id, 0, tx_ring_size,
        rte_eth_dev_socket_id(port_id), &txconf);
The transmit mbuf packet pool is created like this:
struct rte_mempool *pkt_pool = rte_pktmbuf_pool_create("pkt_pool", 1023, 341, 0,
        RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
That way, when sending packets, I run out of TX descriptors before I run out of packet buffers. (The program generates packets with just one segment.)
My expectation is that when I call rte_eth_tx_burst() in a loop (to send one packet after another), it never fails, since it transparently frees the mbufs of already-sent packets.
However, this doesn't happen.
I basically have a transmit loop like this:
for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}
After 1086 transmitted packets (of ~ 300 bytes each), rte_eth_tx_burst() returns 0.
I use the default threshold values; the values queried from dev_info.default_txconf are:
tx thresh : 32
tx rs thresh: 32
wthresh : 0
So the main question now is: How hard is rte_eth_tx_burst() supposed to try to free mbuf buffers (and thus descriptors)?
I mean, it could busy loop until the transmission of previously supplied mbufs is completed.
Or it could just quickly check if some descriptors are free again. But if not, just give up.
Related question: Are the default threshold values appropriate for this use case?
So I work around this like this:
for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "cannot send packet\n");
        int r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
            rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
}
With that I get output like this:
USER1: cannot send packet
USER1: 1086. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1118. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1150. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 1182. cleaned up 0 descriptors ...
[..]
USER1: cannot send packet
USER1: 1950. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 1982. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 0 descriptors ...
USER1: cannot send packet
USER1: 2014. cleaned up 32 descriptors ...
USER1: cannot send packet
USER1: 2046. cleaned up 32 descriptors ...
Meaning that it frees at most 32 descriptors at a time, and that it doesn't always succeed, but then a following rte_eth_tx_burst() call succeeds in freeing some.
Side question: Is there a better, more dpdk-idiomatic way to handle the recycling of mbufs?
When I change the code such that I run out of mbufs before I run out of transmit descriptors (i.e. the TX ring is created with 1024 descriptors while the mbuf pool still has 1023 elements), I have to change the allocation part like this:
struct rte_mbuf *pkt;
do {
    pkt = rte_pktmbuf_alloc(args.pkt_pool);
    if (!pkt) {
        r = rte_eth_tx_done_cleanup(args.port_id, 0, 256);
        if (r < 0) {
            rte_panic("%u. cannot cleanup tx descs: %s\n", i, rte_strerror(-r));
        }
        RTE_LOG(WARNING, USER1, "%u. cleaned up %d descriptors ...\n", i, r);
    }
} while (!pkt);
The output is similar, e.g.:
USER1: 1023. cleaned up 95 descriptors ...
USER1: 1118. cleaned up 32 descriptors ...
USER1: 1150. cleaned up 32 descriptors ...
USER1: 1182. cleaned up 32 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 0 descriptors ...
USER1: 1214. cleaned up 32 descriptors ...
[..]
That means the freeing of descriptors/mbufs is so 'slow' that the allocation has to busy-loop up to 3 times.
Again, is this a valid approach, or are there better dpdk ways to solve this?
Since rte_eth_tx_done_cleanup() might return -ENOTSUP, this may be a hint that my usage of it is not the best solution.
Incidentally, even with the ixgbe driver it fails for me when I disable checksum offloads!
Apparently, ixgbe_dev_tx_done_cleanup() then invokes ixgbe_tx_done_cleanup_vec() instead of ixgbe_tx_done_cleanup_full(), which unconditionally returns -ENOTSUP:
static int
ixgbe_tx_done_cleanup_vec(struct ixgbe_tx_queue *txq __rte_unused,
                          uint32_t free_cnt __rte_unused)
{
    return -ENOTSUP;
}
Does this make sense?
So perhaps the better strategy is to make sure that there are fewer descriptors than pool elements (e.g. 1024 - 32 < 1023) and just re-call rte_eth_tx_burst() until it returns one?
That means like this:
for (;;) {
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    if (l == 1) {
        break;
    } else {
        RTE_LOG(ERR, USER1, "%u. cannot send packet - retry\n", i);
    }
}
This works, and the output shows again that the descriptors are freed 32 at a time, e.g.:
USER1: 1951. cannot send packet - retry
USER1: 1951. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 1983. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2015. cannot send packet - retry
USER1: 2047. cannot send packet - retry
USER1: 2047. cannot send packet - retry
I know that I can also use rte_eth_tx_burst() to submit bigger bursts. But I want to get the simple/edge cases right and understand the DPDK semantics first.
I'm on Fedora 33 and DPDK 20.11.2.
Recommendation/solution: after confirming with rte_mempool_list_dump() or dpdk-procinfo that the cause of the issue is indeed TX descriptor exhaustion, use rte_eth_tx_buffer_flush() or change the settings for the TX thresholds.
Explanation:
The mbuf-free behaviour varies across PMDs, and it also varies between the PF and VF of the same NIC. The following points help to understand this properly:
An rte_mempool can be created with or without cache elements.
When created with cache elements, depending upon the available lcores (EAL options) and the number-of-cache-elements-per-core parameter, the configured mbufs are added to each core's cache.
When the HW offload DEV_TX_OFFLOAD_MBUF_FAST_FREE is available and enabled, the agreement is that the mbufs will have a ref_cnt of 1.
So whenever tx_burst is invoked (with success or failure), the threshold levels are checked to see whether free mbufs/mbuf segments can be pushed back to the pool.
With DEV_TX_OFFLOAD_MBUF_FAST_FREE enabled, the driver blindly puts the elements into the lcore cache (see the sketch after these points).
Without DEV_TX_OFFLOAD_MBUF_FAST_FREE, the generic approach of validating the mbuf (checking nb_segs and ref_cnt) is taken before pushing it back to the mempool.
Either way, a fixed number (32, I believe, is the default for all PMDs) or whatever free mbufs are available is always pushed to the cache or pool.
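For illustration, a sketch of enabling that offload only when the PMD reports it (dev_info and port_conf as in the question's setup code; where exactly this runs in the surrounding program is an assumption):
if (dev_info.tx_offload_capa & DEV_TX_OFFLOAD_MBUF_FAST_FREE)
    port_conf.txmode.offloads |= DEV_TX_OFFLOAD_MBUF_FAST_FREE;
/* must be set before calling rte_eth_dev_configure(port_id, 0, 1, &port_conf) */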
Facts:
In the case of the IXGBE VF driver, the option DEV_TX_OFFLOAD_MBUF_FAST_FREE is not available. This means that each time the thresholds are met, each individual mbuf is checked and pushed to the mempool.
As per the code snippet, rte_eth_dev_configure() is configured for TX only, and rte_pktmbuf_pool_create() creates the pool with 341 cache elements.
The assumption has to be made that there is only 1 lcore (which runs the loop of alloc and tx).
Code Snippet-1:
for (unsigned i = 0; i < 2048; ++i) {
    struct rte_mbuf *pkt = rte_pktmbuf_alloc(args.pkt_pool);
    // error check, prepare packet etc.
    uint16_t l = rte_eth_tx_burst(args.port_id, 0, &pkt, 1);
    // error check etc.
}
After 1086 transmitted packets (of ~ 300 bytes each), rte_eth_tx_burst() returns 0.
[Observation] If the mbufs were indeed running out, rte_pktmbuf_alloc() should fail before rte_eth_tx_burst(). But failing at 1086 creates an interesting phenomenon, because the total number of mbufs created is 1023, and the failure happens two iterations of 32-mbuf releases to the mempool later. Analyzing the driver code for ixgbe, the only place tx_xmit_pkts returns 0 is
/* Only use descriptors that are available */
nb_pkts = (uint16_t)RTE_MIN(txq->nb_tx_free, nb_pkts);
if (unlikely(nb_pkts == 0))
    return 0;
Even though tx_ring_size is set to 992 in the configuration, internally rte_eth_dev_adjust_nb_rx_tx_desc() sets it to the max of *nb_desc and desc_lim->nb_min. Based on the code, the failure is not because there are no free mbufs, but because TX descriptors are low or unavailable.
In all other cases, rte_eth_tx_done_cleanup() and rte_eth_tx_buffer_flush() actually push any pending descriptors out of the SW PMD for DMA immediately. This internally frees up more descriptors, which makes tx_burst run much more smoothly.
To identify the root cause whenever the tx_burst API returns 0, either
invoke rte_mempool_list_dump(), or
make use of the mempool dump via dpdk-procinfo.
Note: most PMDs amortize the cost of descriptor (PCIe payload) writes by batching at least 4 at a time (in the SSE case). Hence, even if tx_burst returns 1 for a single packet, that packet may not yet have been pushed out of the NIC. To make sure it is, use rte_eth_tx_buffer_flush().
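As an illustration of the tx-buffer API mentioned above (a sketch assuming port 0, queue 0 and a burst size of 32; the allocation details are assumptions, not part of the original answer):
#include <rte_ethdev.h>
#include <rte_malloc.h>

#define BURST 32
struct rte_eth_dev_tx_buffer *tx_buffer;

/* one-time setup: allocate and initialize the buffer */
tx_buffer = rte_zmalloc_socket("tx_buffer",
        RTE_ETH_TX_BUFFER_SIZE(BURST), 0, rte_eth_dev_socket_id(0));
rte_eth_tx_buffer_init(tx_buffer, BURST);

/* per-packet path: buffers pkt and transmits automatically once full */
uint16_t sent = rte_eth_tx_buffer(0, 0, tx_buffer, pkt);

/* at the end of a batch, force out whatever is still buffered */
sent += rte_eth_tx_buffer_flush(0, 0, tx_buffer);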
Say you invoke rte_eth_tx_burst() to send one small packet (single mbuf, no offloads). Suppose the driver indeed pushes the packet to the HW. Doing so eats up one descriptor in the ring: the driver "remembers" that this packet mbuf is associated with that descriptor. But the packet is not sent instantly. The HW typically has some means to notify the driver of completions. Just imagine: if the driver checked for completions on every rte_eth_tx_burst() invocation (thus ignoring any thresholds), then calling rte_eth_tx_burst() one more time in a tight-loop manner for another packet would likely consume one more descriptor rather than recycle the first one. So, given this fact, I'd not use a tight loop when investigating tx_free_thresh semantics. And it shouldn't matter whether you invoke rte_eth_tx_burst() once per packet or once per batch of them.
Now, say you have a Tx ring of size N, tx_free_thresh is M, and you have a mempool of size Z. What you do is allocate a burst of N - M - 1 small packets and invoke rte_eth_tx_burst() to send this burst (no offloads; each packet is assumed to eat up one Tx descriptor). Then you wait for some amount of time wittingly sufficient for completions and check the number of free objects in the mempool. This figure should read Z - (N - M - 1). Then you allocate and send one extra packet. Then wait again. This time, the number of spare objects in the mempool should read Z - (N - M). Finally, you allocate and send one more packet (again!), thus crossing the threshold (the number of spare Tx descriptors becomes less than M). During this invocation of rte_eth_tx_burst(), the driver should detect the crossing of the threshold and start checking for completions. This should make the driver free (N - M) descriptors (consumed by the two previous rte_eth_tx_burst() invocations), thus clearing up the whole ring. Then the driver proceeds to push the new packet in question to the HW, thus spending one descriptor. You then check the mempool: this should report Z - 1 free objects.
So, the short of it: no loop, just three rte_eth_tx_burst() invocations with sufficient waiting time between them. And you check the spare object count in the mempool after each send operation. Theoretically, this way, you'll be able to understand the corner case semantics. That's the gist of it. However, please keep in mind that the actual behaviour may vary across different vendors / PMDs.
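A minimal sketch of that experiment (hypothetical; make_test_pkt() stands in for allocation and packet preparation, and port 0, queue 0, single-segment packets and a generous completion delay are assumptions):
#include <stdio.h>
#include <rte_ethdev.h>
#include <rte_cycles.h>
#include <rte_mempool.h>

/* N = Tx ring size, M = tx_free_thresh, Z = mempool size */
static void tx_thresh_experiment(uint16_t port, struct rte_mempool *mp,
                                 unsigned N, unsigned M, unsigned Z)
{
    struct rte_mbuf *burst[N];

    /* Step 1: N - M - 1 packets; stays above the free threshold. */
    for (unsigned j = 0; j < N - M - 1; ++j)
        burst[j] = make_test_pkt(mp);              /* hypothetical helper */
    rte_eth_tx_burst(port, 0, burst, (uint16_t)(N - M - 1));
    rte_delay_ms(100);                             /* wait for completions */
    printf("free objs: %u (expect %u)\n",
           rte_mempool_avail_count(mp), Z - (N - M - 1));

    /* Step 2: one extra packet; still above the threshold. */
    burst[0] = make_test_pkt(mp);
    rte_eth_tx_burst(port, 0, burst, 1);
    rte_delay_ms(100);
    printf("free objs: %u (expect %u)\n",
           rte_mempool_avail_count(mp), Z - (N - M));

    /* Step 3: crossing the threshold should trigger completion
     * processing, freeing the ring before this packet is queued. */
    burst[0] = make_test_pkt(mp);
    rte_eth_tx_burst(port, 0, burst, 1);
    rte_delay_ms(100);
    printf("free objs: %u (expect %u)\n", rte_mempool_avail_count(mp), Z - 1);
}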
Relying on rte_eth_tx_done_cleanup() really isn't an option since many PMDs don't implement it. Mostly the Intel PMDs provide it, but e.g. the SFC, MLX* and af_packet ones don't.
However, it's still unclear why the ixgbe PMD doesn't support cleanup when no offloads are enabled.
The requirements on rte_eth_tx_burst() with respect to freeing are really light - from the API docs:
* It is the responsibility of the rte_eth_tx_burst() function to
* transparently free the memory buffers of packets previously sent.
* This feature is driven by the *tx_free_thresh* value supplied to the
* rte_eth_dev_configure() function at device configuration time.
* When the number of free TX descriptors drops below this threshold, the
* rte_eth_tx_burst() function must [attempt to] free the *rte_mbuf* buffers
* of those packets whose transmission was effectively completed.
[..]
* @return
* The number of output packets actually stored in transmit descriptors of
* the transmit ring. The return value can be less than the value of the
* *tx_pkts* parameter when the transmit ring is full or has been filled up.
So just attempting to free (but not waiting on the results of that attempt) and returning 0 (since 0 is less than tx_pkts) is covered by that 'contract'.
FWIW, no example distributed with DPDK loops around rte_eth_tx_burst() to re-submit not-yet-sent packets. There are some examples that use rte_eth_tx_burst() and discard unsent packets, though.
AFAICS, besides rte_eth_tx_done_cleanup() and rte_eth_tx_burst() there is no other function for requesting the release of mbufs previously submitted for transmission.
Thus, it's advisable to size the mbuf packet pool larger than the configured ring size in order to survive situations where all mbufs are inflight and can't be recovered because there is no mbuf left for calling rte_eth_tx_burst() again.
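For instance (an illustrative sketch, not from the original post; the numbers merely keep the pool strictly larger than the ring):
uint16_t tx_ring_size = 1024 - 32;                  /* 992 descriptors */
struct rte_mempool *pkt_pool = rte_pktmbuf_pool_create("pkt_pool",
        2 * tx_ring_size, 341, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
/* even with every descriptor holding an mbuf, ~992 spare mbufs remain,
 * so rte_eth_tx_burst() can always be re-invoked with a fresh packet */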

Sometimes Disconnect Req is inside Publish Message

On the client side I use:
mosquitto_pub -t tpc -m msg
On the server side I use nonblocking socket and socket() API:
https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzab6/xnonblock.htm
After the first received packet, I send a connect acknowledge packet.
For each received packet I print how many bytes were received, plus the whole buffer in hex.
I compare the received data with a Wireshark capture.
Sometimes it works well:
37 bytes received - Connect Command
10 bytes received - Publish Message [tpc]
2 bytes received - Disconnect Req
Sometimes I get the Disconnect Req inside the Publish Message [tpc]:
37 bytes received - Connect Command
12 bytes received - Publish Message [tpc] + Disconnect Req
These last two bytes are Disconnect Req:
30
8
0
3
74
70
63
6d
73
67
ffffffe0 <--
0 <--
How can I avoid these situations and always get 3 packets?
Short answer: you can't. You have to actually parse the messages to determine the length.
The constant used to create a TCP socket is called SOCK_STREAM for a reason. The socket has to be treated as such: a stream of bytes. Nobody guarantees that one send() on one side results in one recv() on the other side. The only guarantee is that the sequence is preserved: abcd may become (ab, cd), but will not become acbd.
The packets may be split somewhere along the way. So it may be that the client sends 2048 bytes, but on the server side you'll first receive ~1400 bytes and then the rest. So N sends do not result in N recvs.
Another thing is that the client also treats the socket as a stream. It may send byte by byte, or send a batch of messages with one send(). N messages are not N sends.
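For MQTT specifically, the framing is defined by the fixed header's "remaining length" field, so the receiver can locate message boundaries itself. A minimal sketch (assuming standard MQTT 3.1.1 framing; accumulate recv() data in a buffer and call this until it returns a positive length):
#include <stddef.h>
#include <sys/types.h>

/* Returns the total length of the first MQTT packet in buf, 0 if more
 * bytes are needed, or -1 if the length field is malformed. */
ssize_t mqtt_packet_len(const unsigned char *buf, size_t n)
{
    size_t mul = 1, len = 0, i = 1;   /* buf[0] is the control-packet type */
    do {
        if (i >= n)
            return 0;                 /* length field not complete yet */
        if (i > 4)
            return -1;                /* more than 4 length bytes: malformed */
        len += (size_t)(buf[i] & 0x7F) * mul;
        mul *= 128;
    } while (buf[i++] & 0x80);
    return (ssize_t)(i + len);        /* fixed header bytes + remaining length */
}
Applied to the capture above, the publish message 30 08 00 03 74 70 63 6d 73 67 yields 2 + 8 = 10 bytes, so the trailing e0 00 can be recognised as the start of the next packet (the Disconnect Req).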

Decoding an unknown CRC or checksum?

I've been trying to decode the CRC or checksum algorithm that is used for the serial communication between a drone and its camera for about a week without a lot of luck, and I was wondering if anybody here sees something I am missing or has any suggestions.
A typical packet looks like this:
FE1A390100020001AE0BE0FF090046250B00040000004E0D32080008540D8808F4016B54
They always start with 0xFE. The 2nd byte is the total size of the packet minus 10 bytes. The packet sizes vary, but I think I am specifically interested in the 0x1A size. Byte 3 seems to be a packet counter, because it usually increases by 1, but sometimes I have seen it jump to a completely different number for a few packets (usually when changing to a 0x22-size packet) before resuming the increment-by-1 sequence. The last 2 bytes always change, and I believe they are the checksum or CRC. All the rest of the bytes seem to stay the same from one 0x1A packet to the next unless I manipulate the drone's radio controls.
Right after powering up there is a series of packets that I assume is for initializing the communication. They are the shortest packets and have the least amount of change between them, so it seems like they might be the easiest to look at. Here are the first 7 packets sent after powering it on.
From Drone to camera
Time:
8.3982205 FE030001000000010200018F68
8.39934725 FE03010100000001020001A844
8.400473958 FE03020100000001020001C130
8.401600708 FE050301000000000000000001AAE8
8.402900792 FE1A040100020001000000000000000000000C000300000853060008AB028808F4014629
8.406020958 FE22050100030002000000000000000000000000000000000000B3FFFFFFDE22006300FF615110050000C956
8.4098345 FE1A060100020001000000000000000000000C000300000853060008AB028808F40180A9
If I put the first 3 packets into reveng with -w 16 -s then it comes back with:
reveng: warning: you have only given 3 samples
reveng: warning: to reduce false positives, give 4 or more samples
width=16 poly=0x1487 init=0x0334 refin=false refout=false xorout=0x0000 check=0xa5b9 residue=0x0000 name=(none)
If I add the 4th packet, it finds the same poly, but the rest of it looks different:
width=16 poly=0x1487 init=0x417d refin=false refout=false xorout=0x5582 check=0xbfa2 residue=0xb059 name=(none)
If I add the 5th packet, reveng comes back with no model found.
However, if I remove packet 4 and then run it with packets 1, 2, 3 and 5, it finds the same poly again, but different values for the rest:
width=16 poly=0x1487 init=0x804b refin=false refout=false xorout=0x0138 check=0x7dcc residue=0xc8ca name=(none)
Most combinations of packets containing a 0x1A-size packet and the first 3 initialization packets that I run through reveng come back with 'no model found'. So far, every run of reveng with only 0x1A-sized packets has failed to find a model.
I think it is possible that, after the initialization packets, it somehow incorporates information it receives from the camera into the CRC calculation for the data going from the drone to the camera, but there isn't a lot of data in those packets. Here are the first 9 packets sent from the camera to the drone. Prior to the first 0x1A packet being sent from the drone, the only data sent from the camera seems to be 0x7D0001.
From camera to drone:
Time
3.474456792 FE0500020000000000007D00013D40
4.475220208 FE0501020000000000007D000168C5
5.476483875 FE0502020000000000007D00018642
6.477295958 FE0503020000000000007D0001D3C7
7.4783405 FE0504020000000000007D00014B45
8.479420458 FE06050200010003FA078538B838B3
8.480811667 FE0506020000000000007D0001F047
9.48057875 FE0507020000000000007D0001A5C2
9.481883 FE06080200010003F9078638B8386037
I have tried incorporating 0x7D0001 into the packets and running them through reveng, but that didn't seem to help.
I have also tried reveng -w 8 -s on various combinations of packets without finding a model. And I have tried various checksum algos manually (possibly incorrectly) without success.
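For reference, this is the sort of manual check I mean: a sketch of a straight (non-reflected) CRC-16 with reveng's reported poly 0x1487, parameterised by the init/xorout candidates it printed (whether the CRC bytes sit big- or little-endian in the packet is itself an assumption to test):
#include <stdint.h>
#include <stddef.h>

uint16_t crc16_msb(const uint8_t *data, size_t len,
                   uint16_t init, uint16_t xorout)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; ++i) {
        crc ^= (uint16_t)data[i] << 8;          /* feed bytes MSB-first */
        for (int b = 0; b < 8; ++b)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1487)
                                 : (uint16_t)(crc << 1);
    }
    return crc ^ xorout;
}

/* a candidate model matches a packet if crc16_msb(pkt, pkt_len - 2,
 * init, xorout) equals the last two bytes of the packet */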
I have a bunch more data that I have captured here:
https://drive.google.com/open?id=1v8MCaXOvP_2Wv_hcaqhUZnXvqNI1_2Ur
Any ideas? Suggestions? This has been driving me nuts for a week.

Why DPDK cannot send and receive packets larger than 60 bytes

I have written a simple DPDK send-and-receive application. When the packet length is <= 60 bytes, sending and receiving work, but when the packet length is > 60 bytes, the send application reports that it has sent out the packet, while the receive application does not receive anything.
In send application:
mbuf_pool = rte_pktmbuf_pool_create("MBUF_POOL", NUM_MBUFS,
        MBUF_CACHE_SIZE, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
pkt = rte_pktmbuf_alloc(mbuf_pool);
pkt->data_len = packlen; // if packlen <= 60 it works; when packlen > 60, the receiver gets nothing
I tried both l2fwd and basicfwd as the receive application. The result is the same.
The issue is here:
pchar[12] = 0;
pchar[13] = 0;
This means the Ethertype is 0. From the list of assigned Ethertypes:
https://www.iana.org/assignments/ieee-802-numbers/ieee-802-numbers.xhtml
we see that 0 means a zero Ethernet frame length (values up to 1500 are interpreted as an IEEE 802.3 length field rather than a type). Since the minimum Ethernet frame length is 64 bytes (60 + 4 FCS), that is why you have trouble sending packets longer than 60 bytes.
To fix the issue, simply put a reasonable Ethertype from the list above there.
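A sketch of the fix in the same pchar[] style as the snippet above (IPv4's Ethertype 0x0800 is chosen arbitrarily as an example):
pchar[12] = 0x08;  // Ethertype high byte: 0x0800 = IPv4
pchar[13] = 0x00;  // Ethertype low byte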

TCP memcpy buffer returns rubbish data using C++

I'm doing something similar to Stack Overflow question Handling partial return from recv() TCP in C.
The data received is bigger than the buffer initialised (for example, 1000 bytes). Therefore a temporary buffer of a bigger size (for example, 10000 bytes) is used. The problem is that much of the received data is rubbish. I've already checked the offset of the memcpy into the temporary buffer, but I keep getting rubbish data.
This sample shows what I do:
First message received:
memcpy(tmpBuffer, dataRecv, 1000);
offSet = offSet + 1000;
Second message onwards:
memcpy(tmpBuffer + offSet, dataRecv, 1000);
Is there something I should check?
I've checked the TCP hex that was sent out. Apparently, the sender is sending an incomplete message. The way my program works is that when the sender sends a message, it packs (message header + actual message). The message header holds some metadata, one item of which is the message length.
When the receiver receives a packet, it gets the message header using the message header offset and length. It extracts the message length, checks whether the current packet size is at least the message length, and returns the correct message size to the users. If there is a remaining amount of message left in the packet, it stores it in a temporary buffer and waits for the next packet. When it receives the next packet, it checks that message header for the message length and does the same thing.
If the sender packs three messages in a packet, each message has its own message header indicating the message length. Assume all three messages are 300 bytes each in length. Also assume that the second message sent is incomplete and turns out to be only 100 bytes.
When the receiver receives the three messages in a packet, it returns the first message correctly. Since the second message is incomplete, my program has no way to know, and so it returns 100 bytes of the second message plus 200 bytes of the third message, since the message header indicates a total size of 300 bytes. Thus the second message returned will contain some rubbish data.
As for the third message, my program will try to get the message length from its message header. Since the first 200 bytes have already been returned, the message header is invalid. Thus, the message length returned to my program will be rubbish as well. Is there a way to check for a complete message?
Suppose you are expecting 7000 bytes over the TCP connection. In this case it is very likely that your messages will be split into TCP packets with an actual payload size of, let's say, 1400 bytes (so 5 packets).
In this case it is perfectly possible that consecutive recv calls with a target buffer of 1000 bytes behave as follows:
recv -> reads 1000 bytes (packet 1)
recv -> reads 400 bytes (packet 1)
recv -> reads 1000 bytes (packet 2)
recv -> reads 400 bytes (packet 2)
...
Now, in this case, when reading the 400-byte chunk you still copy the full 1000 bytes to your larger buffer, actually pasting 600 bytes of rubbish in between. You should only memcpy the number of bytes received, which is the return value of recv itself. Of course you should also check whether this value is 0 (socket closed) or less than zero (socket error).
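A minimal sketch of that fix (sock and tmpBuffer are assumed to come from the surrounding program):
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

char dataRecv[1000];
size_t offSet = 0;
for (;;) {
    ssize_t n = recv(sock, dataRecv, sizeof dataRecv, 0);
    if (n == 0)
        break;                        /* peer closed the connection */
    if (n < 0) {
        perror("recv");
        break;                        /* socket error */
    }
    memcpy(tmpBuffer + offSet, dataRecv, (size_t)n);
    offSet += (size_t)n;              /* advance by bytes actually received */
}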