dpdk ixgbe nic multi rx queue with rss not work well

dpdk ixgbe nic multi rx queue with rss not work well - dpdk

I am using IXGBE Nic, using dpdk 19.11 multi rx queue with rss "ETH_RSS_TCP | ETH_RSS_IP".
IXGBE support max 64 queues, and i used 4 queues. But all packages arrvied to the same queue(queue 0), it seems rss function not work well.
The following is the pseudo code of my rss part. Is the rss function not taking effect due to my incorrect configuration?
static int rss_setup(const int port_id, const uint16_t nb_queues)
{
int i;
int ret;
struct rte_eth_dev_info dev_info;
struct rte_eth_rss_reta_entry64 *reta_conf = NULL;
rte_eth_dev_info_get(port_id, &dev_info);
if (nb_queues == 0) {
return ERR_VAL;
}
reta_conf = calloc(dev_info.reta_size / RTE_RETA_GROUP_SIZE,
sizeof(struct rte_eth_rss_reta_entry64));
if (!reta_conf) {
return ERR_MEM;
}
for (i = 0; i < dev_info.reta_size; i++) {
struct rte_eth_rss_reta_entry64 *one_reta_conf =
&reta_conf[i / RTE_RETA_GROUP_SIZE];
one_reta_conf->reta[i % RTE_RETA_GROUP_SIZE] = i % nb_queues;
}
for (i = 0; i < dev_info.reta_size / RTE_RETA_GROUP_SIZE; i++) {
struct rte_eth_rss_reta_entry64 *one_reta_conf = &reta_conf[i];
one_reta_conf->mask = 0xFFFFFFFFFFFFFFFFULL;
}
ret = rte_eth_dev_rss_reta_update(port_id, reta_conf, dev_info.reta_size);
if (ret < 0) {
printf("cannot update rss reta at port %d: %s\n",
port_id, rte_strerror(-ret));
}
free(reta_conf);
return ERR_OK;
}
main (){
struct rte_eth_conf conf = {0}；
conf.link_speeds = ETH_LINK_SPEED_AUTONEG;
conf.txmode.mq_mode = ETH_MQ_TX_NONE;
conf.rxmode.mq_mode = ETH_MQ_RX_NONE;
struct rte_eth_dev_info dev_info;
rte_eth_dev_info_get(port_id, &dev_info);
int rss_enable = 0;
uint64_t def_rss_hf = ETH_RSS_TCP | ETH_RSS_IP;
struct rte_eth_rss_conf rss_conf = {
NULL,
40,
def_rss_hf,
};
rss_conf.rss_hf &= dev_info->flow_type_rss_offloads;
if (rss_conf.rss_hf) {
rss_enable = 1;
conf->rx_adv_conf.rss_conf = rss_conf;
conf->rxmode.mq_mode = ETH_MQ_RX_RSS;
rss_setup(port_id, 4); // queues cnt is 4
}
}

Looks like the program does not invoke rte_eth_dev_configure() before invoking rss_setup(). This is not correct as rte_eth_dev_configure() "must be invoked first before any other function in the Ethernet API" (doc reference). Please re-arrange the program to first invoke rte_eth_dev_configure() with rx_adv_conf.rss_conf and ETH_MQ_RX_RSS specified, then proceed with the rest of bring-up.
In rss_setup(), the program invokes rte_eth_dev_rss_reta_update(). For the PMD in question, this API can only be invoked in started state (code reference). So please re-arrange the program a bit more to invoke rte_eth_dev_start() before rss_setup().
Please make sure that TCP source port numbers (and/or IP source addresses) are randomised at the sender side. If IP/TCP source and destination fields do not vary, it's no surprise that the packets end up in the same Rx queue.

Related

How to send data simultaneously to 2 HID device which have the same VendorID and ProductID with HID API?

1. About the problem
I'm trying to send data store in a array called UxbTx[64] to 2 Hid devices that have the same VendorID(VID) and ProductID(PID) but differ in SerialNum (this differnce help me to enumerate 2 devices) and connect to PC via Usb Hub . I have successfully recognized two device and use hid_write() to send data but just 1 device acting right with my data config. For example, i want to turn on 2 devices but device 1 ON whereas the device 2 still remain OFF.
2. What I have tried
At first I thought that I have failed to send to the second device but it not. I use res=hid_write() and it return 0 which means successfully send for both devices.
This is the code that i use:
static hid_device** id_device;
static int max_size;
int res;
struct device_info
{
wchar_t* serial_num;
int id;
};
std::vector<device_info> devEnum(unsigned short vendor_id, unsigned short product_id)
{
std::vector<device_info> device;
int count = 0, total = 0, res = 0,i=0;
hid_device_info* dev = NULL, * cur_dev = NULL;
hid_device* temp_handle = NULL;
std::vector<wchar_t*> string;
wchar_t wstr[MAX_STR], temp[MAX_STR];
device_info inf;
hid_enumerate(vendor_id, product_id);
res = hid_init();
dev = hid_enumerate(0x461, 0x20);
for (cur_dev = dev; cur_dev != NULL; cur_dev = cur_dev->next)
{
memcpy(temp, cur_dev->serial_number, MAX_STR);
wcsncpy(temp, cur_dev->serial_number, MAX_STR);
temp[MAX_STR - 1] = L'\0';
string.push_back(temp);
inf.serial_num = cur_dev->serial_number;
inf.id = count;
device.push_back(inf);
count++;
}
max_size = device.size();
return device;
}
int NhgIsOpen(wchar_t* Manufacturer, wchar_t* Product, wchar_t* SerialNumber, std::vector<device_info> devices) {
int length = devices.size();
int res = 0;
id_device = new hid_device * [length];
wchar_t KeyManufacturer[MAX_STR] = L"test";
wchar_t KeyProduct[MAX_STR] = L"device test";
for (int i = 0; i < length; i++)
{
id_device[i] = hid_open(0x123, 0x15, devices[i].serial_num);
if (!id_device[i])
return -2;
hid_set_nonblocking(id_device[i], 0);
res = hid_get_manufacturer_string(id_device[i], Manufacturer, MAX_STR);
if (wcscmp(Manufacturer, KeyManufacturer) != 0)
return -3; //Manufacturer not match
res = hid_get_product_string(id_device[i], Product, MAX_STR);
if (wcscmp(Product, KeyProduct) != 0)
return -4; // KeyProdeuct not match
}
return 0;
}
INT32 IoControl(UINT16 IoState, int axis_id) {
UINT8 UsbTx[64];
//clear TX buffer
std::fill_n(UsbTx, 64, 0);
//report byte
UsbTx[0] = 0x01;
//USB user define cmd
UsbTx[1] = 0x00;
UsbTx[2] = 0x15; // turn on device
UsbTx[11] = 0x00;
UsbTx[12] = (0xFF) & IoState;
res = hid_write(id_device[axis_id], UsbTx, 64);
if (res < 0)
return -5; //can't write
return 0;
}
3. Question
If i can enumerate 2 devices does it means i can talk to 2 devices simultaneously?
Does the Report ID in firmware is the problem? I mean does i have to re-config Report ID in the Descriptor of the device so they can read the data at the same time?

Interface or port id to IP address in dpdk 20.11

Are they any existing API's in dpdk (as of 20.11 LTS) to map IP addresses to interfaces (port-id's) or vice versa?

As per the current available DPDK 21.05.0 release, there is no direct API that supports the mapping of IP address to port Number. This is because DPDK API does not implement IP stack.
if one needs to maintain a simple mapping of IP address to DPDK port, I heavily recommend using a simple array to do so.
uin8_t mapPortIp4[RTE_MAX_ETHPORTS][4] = {0};
uint8_t rte_ipv6_hdr mapPortIp6[RTE_MAX_ETHPORTS][16] = {0};
int updateIp4(uin8_t Ip4[4], uint16_t port)
{
if (port >= rte_eth_dev_count_avail())
return -1;
mapPortIp4[port][0] = Ip4[0];
mapPortIp4[port][1] = Ip4[1];
mapPortIp4[port][2] = Ip4[2];
mapPortIp4[port][3] = Ip4[3];
return 0;
}
int updateIp6(uin8_t Ip6[16], uint16_t port)
{
if (port >= rte_eth_dev_count_avail())
return -1;
mapPortIp6[port][0] = Ip6[0];
mapPortIp6[port][1] = Ip6[1];
mapPortIp6[port][2] = Ip6[2];
mapPortIp6[port][3] = Ip6[3];
mapPortIp6[port][4] = Ip6[4];
mapPortIp6[port][5] = Ip6[5];
mapPortIp6[port][6] = Ip6[6];
mapPortIp6[port][7] = Ip6[7];
mapPortIp6[port][8] = Ip6[8];
mapPortIp6[port][9] = Ip6[9];
mapPortIp6[port][10] = Ip6[10];
mapPortIp6[port][11] = Ip6[11];
mapPortIp6[port][12] = Ip6[12];
mapPortIp6[port][13] = Ip6[13];
mapPortIp6[port][14] = Ip6[14];
mapPortIp6[port][15] = Ip6[15];
return 0;
}

DPDK 17.11.1 - drops seen when doing destination based rate limiting

Editing the problem statement to highlight more on the core logic
We are seeing performance issues when doing destination based rate limiting.
We maintain state for every {destination-src} pair (max of 100 destinations and 2^16 sources). We have an array of 100 nodes and at each node we have a rte_hash*. This hash table is going to maintain the state of every source ip seen by that destination. We have a mapping for every destination seen (0 to 100) and this is used to index into the array. If a particular source exceeds a threshold defined for this destination in a second, we block the source, else we allow the source. At runtime, when we see only traffic for 2 or 3 destinations, there are no issues, but when we go beyond 5, we are seeing lot of drops. Our function has to do a lookup and identify the flow matching the dest_ip and src_ip. Process the flow and decide whether it needs dropping. If the flow is not found, add it to the hash.
struct flow_state {
struct rte_hash* hash;
};
struct flow_state flow_state_arr[100];
// am going to create these hash tables using rte_hash_create at pipeline_init and free them during pipeline_free.
Am outlining what we do in pseudocode.
run()
{
1) do rx
2) from the pkt, get index into the flow_state_arr and retrieve the rte_hash* handle
3) rte_hash_lookup_data(hash, src_ip,flow_data)
4) if entry found, take decision on the flow (the decision is simply say rate limiting the flow)
5) else rte_hash_add_data(hash,src_ip,new_flow_data) to add the flow to table and forward
}
Please guide if we can have these multiple hash table objects in data path or whats the best way if we need to handle states for every destination separately.
Edit
Thanks for answering. I will be glad to share the code snippets and our gathered results. I don't have comparison results for other DPDK versions, but below are some of the results for our tests using 17.11.1.
Test Setup
Am using IXIA traffic gen (using two 10G links to generate 12Mpps) for 3 destinations 14.143.156.x (in this case - 101,102,103). Each destination's traffic comes from 2^16 different sources. This is the traffic gen setup.
Code Snippet
struct flow_state_t {
struct rte_hash* hash;
uint32_t size;
uint64_t threshold;
};
struct flow_data_t {
uint8_t curr_state; // 0 if blocked, 1 if allowed
uint64_t pps_count;
uint64_t src_first_seen;
};
struct pipeline_ratelimit {
struct pipeline p;
struct pipeline_ratelimit_params params;
rte_table_hash_op_hash f_hash;
uint32_t swap_field0_offset[SWAP_DIM];
uint32_t swap_field1_offset[SWAP_DIM];
uint64_t swap_field_mask[SWAP_DIM];
uint32_t swap_n_fields;
pipeline_msg_req_handler custom_handlers[2]; // handlers for add and del
struct flow_state_t flow_state_arr[100];
struct flow_data_t flows[100][65536];
} __rte_cache_aligned;
/*
add_handler(pipeline,msg) -- msg includes index and threshold
In the add handler
a rule/ threshold is added for a destination
rte_hash_create and store rte_hash* in flow_state_arr[index]
max of 100 destinations or rules are allowed
previous pipelines add the ID (index) to the packet to look in to the
flow_state_arr for the rule
*/
/*
del_handler(pipeline,msg) -- msg includes index
In the del handler
a rule/ threshold #index is deleted
the associated rte_hash* is also freed
the slot is made free
*/
#define ALLOWED 1
#define BLOCKED 0
#define TABLE_MAX_CAPACITY 65536
int do_rate_limit(struct pipeline_ratelimit* ps, uint32_t id, unsigned char* pkt)
{
uint64_t curr_time_stamp = rte_get_timer_cycles();
struct iphdr* iph = (struct iphdr*)pkt;
uint32_t src_ip = rte_be_to_cpu_32(iph->saddr);
struct flow_state_t* node = &ps->flow_state_arr[id];
struct flow_data_t* flow = NULL
rte_hash_lookup_data(node->hash, &src_ip, (void**)&flow);
if (flow != NULL)
{
if (flow->curr_state == ALLOWED)
{
if (flow->pps_count++ > node->threshold)
{
uint64_t seconds_elapsed = (curr_time_stamp - flow->src_first_seen) / CYCLES_IN_1SEC;
if (seconds_elapsed)
{
flow->src_first_seen += seconds_elapsed * CYCLES_IN_1_SEC;
flow->pps_count = 1;
return ALLOWED;
}
else
{
flow->pps_count = 0;
flow->curr_state = BLOCKED;
return BLOCKED;
}
}
return ALLOWED;
}
else
{
uint64_t seconds_elapsed = (curr_time_stamp - flow->src_first_seen) / CYCLES_IN_1SEC;
if (seconds_elapsed > 120)
{
flow->curr_state = ALLOWED;
flow->pps_count = 0;
flow->src_first_seen += seconds_elapsed * CYCLES_IN_1_SEC;
return ALLOWED;
}
return BLOCKED;
}
}
int index = node->size;
// If entry not found and we have reached capacity
// Remove the rear element and mark it as the index for the new node
if (node->size == TABLE_MAX_CAPACITY)
{
rte_hash_reset(node->hash);
index = node->size = 0;
}
// Add new element #packet_flows[mit_id][index]
struct flow_data_t* flow_data = &ps->flows[id][index];
*flow_data = { ALLOWED, 1, curr_time_stamp };
node->size++;
// Add the new key to hash
rte_hash_add_key_data(node->hash, (void*)&src_ip, (void*)flow_data);
return ALLOWED;
}
static int pipeline_ratelimit_run(void* pipeline)
{
struct pipeline_ratelimit* ps = (struct pipeline_ratelimit*)pipeline;
struct rte_port_in* port_in = p->port_in_next;
struct rte_port_out* port_out = &p->ports_out[0];
struct rte_port_out* port_drop = &p->ports_out[2];
uint8_t valid_pkt_cnt = 0, invalid_pkt_cnt = 0;
struct rte_mbuf* valid_pkts[RTE_PORT_IN_BURST_SIZE_MAX];
struct rte_mbuf* invalid_pkts[RTE_PORT_IN_BURST_SIZE_MAX];
memset(valid_pkts, 0, sizeof(valid_pkts));
memset(invalid_pkts, 0, sizeof(invalid_pkts));
uint64_t n_pkts;
if (unlikely(port_in == NULL)) {
return 0;
}
/* Input port RX */
n_pkts = port_in->ops.f_rx(port_in->h_port, p->pkts,
port_in->burst_size);
if (n_pkts == 0)
{
p->port_in_next = port_in->next;
return 0;
}
uint32_t rc = 0;
char* rx_pkt = NULL;
for (j = 0; j < n_pkts; j++) {
struct rte_mbuf* m = p->pkts[j];
rx_pkt = rte_pktmbuf_mtod(m, char*);
uint32_t id = rte_be_to_cpu_32(*(uint32_t*)(rx_pkt - sizeof(uint32_t)));
unsigned short packet_len = rte_be_to_cpu_16(*((unsigned short*)(rx_pkt + 16)));
struct flow_state_t* node = &(ps->flow_state_arr[id]);
if (node->hash && node->threshold != 0)
{
// Decide whether to allow of drop the packet
// returns allow - 1, drop - 0
if (do_rate_limit(ps, id, (unsigned char*)(rx_pkt + 14)))
valid_pkts[valid_pkt_count++] = m;
else
invalid_pkts[invalid_pkt_count++] = m;
}
else
valid_pkts[valid_pkt_count++] = m;
if (invalid_pkt_cnt) {
p->pkts_mask = 0;
rte_memcpy(p->pkts, invalid_pkts, sizeof(invalid_pkts));
p->pkts_mask = RTE_LEN2MASK(invalid_pkt_cnt, uint64_t);
rte_pipeline_action_handler_port_bulk_mod(p, p->pkts_mask, port_drop);
}
p->pkts_mask = 0;
memset(p->pkts, 0, sizeof(p->pkts));
if (valid_pkt_cnt != 0)
{
rte_memcpy(p->pkts, valid_pkts, sizeof(valid_pkts));
p->pkts_mask = RTE_LEN2MASK(valid_pkt_cnt, uint64_t);
}
rte_pipeline_action_handler_port_bulk_mod(p, p->pkts_mask, port_out);
/* Pick candidate for next port IN to serve */
p->port_in_next = port_in->next;
return (int)n_pkts;
}
}
RESULTS
When generated traffic for only one destination from 60000 sources with threshold of 14Mpps, there were no drops. We were able to send 12Mpps from IXIA and recv 12Mpps
Drops were observed after adding 3 or more destinations (each configured to recv traffic from 60000 sources). The throughput was only 8-9 Mpps. When sent for 100 destinations (60000 src each), only 6.4Mpps were handled. 50% drop was seen.
On running it through vtune-profiler, it reported rte_hash_lookup_data as the hotspot and mostly memory bound (DRAM bound). I will attach the vtune report soon.

Based on the update from internal testing, rte_hash library is not causing performance drops. Hence as suggested in comment is more likely due to current pattern and algorithm design which might be leading cache misses and lesser Instruction per Cycle.
To identify whether it is frontend stall or backend pipeline stall or memory stall please either use perf or vtune. Also try to minimize branching and use more likely and prefetch too.

Why I receive no WT_PACKETEXT window messages after initializing Wintab extensions?

I'm currently trying to process the absolute value of a drawing tablet's touch ring, through the Wintab API. However, despite following instructions as they are described in the documentation, it seems like the WTOpen call doesn't set any extension settings. Using the touch ring after initializing Wintab still triggers the default events, while the default events for pen inputs are suppressed and all pen inputs related to my application instead.
Here are the relevant segments of code:
...
#include "wintab.h"
#define PACKETDATA (PK_X | PK_Y | PK_Z | PK_NORMAL_PRESSURE | PK_ORIENTATION | PK_TIME | PK_BUTTONS)
#define PACKETMODE 0
#define PACKETTOUCHSTRIP PKEXT_ABSOLUTE
#define PACKETTOUCHRING PKEXT_ABSOLUTE
#include "pktdef.h"
...
internal b32
InitWinTab(HWND Window, window_mapping *Map)
{
if(!LoadWintabFunctions())
return false;
LOGCONTEXT Tablet;
AXIS TabletX, TabletY, TabletZ, Pressure;
if(!gpWTInfoA(WTI_DEFCONTEXT, 0, &Tablet))
return false;
gpWTInfoA(WTI_DEVICES, DVC_X, &TabletX);
gpWTInfoA(WTI_DEVICES, DVC_Y, &TabletY);
gpWTInfoA(WTI_DEVICES, DVC_Z, &TabletZ);
gpWTInfoA(WTI_DEVICES, DVC_NPRESSURE, &Pressure);
UINT TouchStripOffset = 0xFFFF;
UINT TouchRingOffset = 0xFFFF;
for(UINT i = 0, ScanTag = 0; gpWTInfoA(WTI_EXTENSIONS + i, EXT_TAG, &ScanTag); i++)
{
if (ScanTag == WTX_TOUCHSTRIP)
TouchStripOffset = i;
if (ScanTag == WTX_TOUCHRING)
TouchRingOffset = i;
}
Tablet.lcOptions |= CXO_MESSAGES;
Tablet.lcPktData = PACKETDATA;
Tablet.lcPktMode = PACKETMODE;
Tablet.lcMoveMask = PACKETDATA;
Tablet.lcBtnUpMask = Tablet.lcBtnDnMask;
Tablet.lcInOrgX = 0;
Tablet.lcInOrgY = 0;
Tablet.lcInExtX = TabletX.axMax;
Tablet.lcInExtY = TabletY.axMax;
if(TouchStripOffset != 0xFFFF)
{
WTPKT DataMask;
gpWTInfoA(WTI_EXTENSIONS + TouchStripOffset, EXT_MASK, &DataMask);
Tablet.lcPktData |= DataMask;
}
if(TouchRingOffset != 0xFFFF)
{
WTPKT DataMask;
gpWTInfoA(WTI_EXTENSIONS + TouchRingOffset, EXT_MASK, &DataMask);
Tablet.lcPktData |= DataMask;
}
Map->AxisMax.x = (r32)TabletX.axMax;
Map->AxisMax.y = (r32)TabletY.axMax;
Map->AxisMax.z = (r32)TabletZ.axMax;
Map->PressureMax = (r32)Pressure.axMax;
if(!gpWTOpenA(Window, &Tablet, TRUE))
return false;
return(TabletX.axMax && TabletY.axMax && TabletZ.axMax && Pressure.axMax);
}
...
case WT_PACKET:
{
PACKET Packet;
if(gpWTPacket((HCTX)LParam, (UINT)WParam, &Packet))
{
...
}
} break;
case WT_PACKETEXT:
{
PACKETEXT Packet;
if(gpWTPacket((HCTX)LParam, (UINT)WParam, &Packet))
{
...
}
} break;
...
The bitmask for the packet data in the initialization have sensible bits set for both extensions and don't overlap with the existing bitmask. No stage of the initialization fails. WT_PACKET gets called only with valid packet data while WT_PACKETEXT never gets called. Furthermore, calling WTPacketsGet with a PACKETEXT pointer on the HCTX returned by WTOpen fills the packet with garbage from the regular packet queue. This leaves me with the conclusion that somehow WTOpen didn't receive notification that the extensions should be loaded and I'm unable to find what else I should define in the LOGCONTEXT data structure to change that.
Is there a mistake in my initialization? Or is there a way to get a better readout to why the extensions didn't load?

It turned out that a driver setting prevented the extension packets from being sent, in favor of using the touch ring for different function. Changing this setting resolved the issue. The code didn't contain any errors itself.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

dpdk ixgbe nic multi rx queue with rss not work well - dpdk

Related

How to send data simultaneously to 2 HID device which have the same VendorID and ProductID with HID API?

Interface or port id to IP address in dpdk 20.11

DPDK 17.11.1 - drops seen when doing destination based rate limiting

More efficient way for reading CAN data in while loop

Why I receive no WT_PACKETEXT window messages after initializing Wintab extensions?

Categories

Resources