I'm writing a C++ function to get the CPU usage of a specific process on Windows.
Many references (like this) suggest using the GetProcessTimes function for the implementation.
However, I tried it in a sample program, and the values of KernelTime and UserTime do not change over time. Below is my code:
#include <iostream>
#include <Windows.h>

int main()
{
    int processID = 14532;
    HANDLE processHandle = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, processID);
    if (processHandle == NULL) {
        return -1;
    }
    FILETIME ftProcCreation, ftProcExit, ftProcKernel, ftProcUser;
    for (int i = 0; i < 10; i++) {
        if (!GetProcessTimes(GetCurrentProcess(), &ftProcCreation,
                             &ftProcExit, &ftProcKernel, &ftProcUser)) {
            return -1;
        }
        LARGE_INTEGER lKernel, lUser;
        lKernel.LowPart = ftProcKernel.dwLowDateTime;
        lKernel.HighPart = ftProcKernel.dwHighDateTime;
        lUser.LowPart = ftProcUser.dwLowDateTime;
        lUser.HighPart = ftProcUser.dwHighDateTime;
        printf("%lld : %lld\n", lKernel.QuadPart, lUser.QuadPart);
        Sleep(250);
    }
}
The process that I inspect is a running VirtualBox process that always takes about 20% of the CPU.
However, when I run the sample code, the result is as below:
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
0 : 0
Sometimes it might give the following result:
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
312500 : 0
Again, it might give the following result:
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
156250 : 0
Or it might be:
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
0 : 156250
And so on, but the "0 : 0" result is the most frequent.
Is there anything wrong with my code that prevents it from getting the kernel/user time of the process? And why does the value not change over time?
=======
You're querying the times of the current process (GetCurrentProcess()), not the target process (processHandle). GetProcessTimes() returns the CPU time consumed by the queried process, and since your own process spends almost all of its time sleeping, its counters grow very slowly.
So pass processHandle instead of GetCurrentProcess() to GetProcessTimes().
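A minimal sketch of the corrected call, reusing the variables from your program (illustrative only):
if (!GetProcessTimes(processHandle, &ftProcCreation,
                     &ftProcExit, &ftProcKernel, &ftProcUser)) {
    return -1; // query the target process, not our own
}
With that change, the kernel and user counters of the inspected process should grow between iterations.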
I'm working on a project where I'm supposed to generate and solve random mazes with DFS and backtracking. So far I have managed to mark single nodes as "visited", and they show up as "1" instead of "0" on the grid. Yet when I try to run the "random neighbour selection" algorithm below, the program only visits a few nodes and then stops. Sometimes it even loops back and forth between, for example, coordinates (0, 2) and (0, 1). Any tips? Code:
// Starting node. Set to "visited".
node new_node{0, 0};
set_as_visited(new_node);
node_stack.push(new_node);

// "visited_nodes" is the counter for visited nodes.
if (visited_nodes < (rows * cols)) {
    int stack_x = node_stack.top().x;
    int stack_y = node_stack.top().y;
    std::vector<int> neighbour;

    int north_x = stack_x;
    int north_y = stack_y - 1;
    if (north_y > 0 && !is_visited(north_x, north_y)) {
        neighbour.push_back(0);
    }
    int east_x = stack_x + 1;
    int east_y = stack_y;
    if (east_x < (rows - 1) && !is_visited(east_x, east_y)) {
        neighbour.push_back(1);
    }
    int west_x = stack_x - 1;
    int west_y = stack_y;
    if (west_x > 0 && !is_visited(west_x, west_y)) {
        neighbour.push_back(2);
    }
    int south_x = stack_x;
    int south_y = stack_y + 1;
    if (south_y < (cols - 1) && !is_visited(south_x, south_y)) {
        neighbour.push_back(3);
    }

    if (!neighbour.empty()) {
        srand(time(nullptr));
        int random_neighbour = neighbour[rand() % neighbour.size()];
        node neighbour_to_add{};
        switch (random_neighbour) {
            case 0: // North
                neighbour_to_add.x = north_x;
                neighbour_to_add.y = north_y;
                break;
            case 1: // East
                neighbour_to_add.x = east_x;
                neighbour_to_add.y = east_y;
                break;
            case 2: // West
                neighbour_to_add.x = west_x;
                neighbour_to_add.y = west_y;
                break;
            case 3: // South
                neighbour_to_add.x = south_x;
                neighbour_to_add.y = south_y;
                break;
        }
        set_as_visited(neighbour_to_add);
        node_stack.push(neighbour_to_add);
    }
    else node_stack.pop();
}
draw_maze(grid);
Here is my node struct:
struct node {
int x, y;
};
This is what shows up when I run it:
1 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
Process finished with exit code 0
When I try to wrap my algorithm in a while loop, nothing gets printed at all. I'm not even sure the code compiles at that point; all I see is my working directory.
Thank you in advance!
The platform is Ubuntu 20.04 with an Intel 82599 NIC, using DPDK 20.08. Below is the main demo code. Even though every dummy thread can get packets with rte_eth_rx_burst, only q_ipackets[0] returns a non-zero number, which equals ipackets. Is this not supported by the NIC, or is it caused by a misconfiguration?
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>
#include <sys/types.h>
#include <string.h>
#include <sys/queue.h>
#include <stdarg.h>
#include <errno.h>
#include <getopt.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <rte_common.h>
#include <rte_byteorder.h>
#include <rte_log.h>
#include <rte_memory.h>
#include <rte_memcpy.h>
#include <rte_eal.h>
#include <rte_launch.h>
#include <rte_atomic.h>
#include <rte_cycles.h>
#include <rte_prefetch.h>
#include <rte_lcore.h>
#include <rte_per_lcore.h>
#include <rte_branch_prediction.h>
#include <rte_interrupts.h>
#include <rte_random.h>
#include <rte_debug.h>
#include <rte_ether.h>
#include <rte_ethdev.h>
#include <rte_mempool.h>
#include <rte_mbuf.h>
#include <rte_ip.h>
#include <rte_tcp.h>
#include <rte_udp.h>
#include <rte_string_fns.h>
#include <rte_acl.h>
#include <rte_ring.h>
#include <rte_hash.h>
#include <rte_rwlock.h>
#include <rte_flow.h>
#include <rte_ring_elem.h>
#include <rte_bpf.h>
#include <rte_member.h>
#include "include/data_structure.h"
#include "include/utils.h"
#define DEFAULT_RX_PORT 0
#define DEFAULT_TX_PORT 1
#define RTE_TEST_RX_DESC_DEFAULT 4096
#define RTE_TEST_TX_DESC_DEFAULT 4096
#define NB_MBUF 1048576
#define MEMPOOL_CACHE_SIZE 256
#define BURST_SIZE 32
static struct rte_eth_conf port_conf = {
    .rxmode = {
        .mq_mode = ETH_MQ_RX_RSS,
        .max_rx_pkt_len = RTE_ETHER_MAX_LEN,
    },
    .rx_adv_conf = {
        .rss_conf = {
            .rss_key = NULL,
            .rss_hf = ETH_RSS_IPV4 | ETH_RSS_NONFRAG_IPV4_TCP | ETH_RSS_NONFRAG_IPV4_UDP,
        },
    },
    .txmode = {
        .mq_mode = ETH_MQ_TX_NONE,
    },
};
static int
dummy_thread(void *args)
{
    struct thread_args *t_args = (struct thread_args *)args;
    struct rte_mbuf *bufs[BURST_SIZE];
    uint32_t nb_deq = 0;
    printf("Entering thread...\n");
    printf("Port %u queue %u\n", t_args->portid, t_args->queueid);
    while (true) {
        nb_deq = rte_eth_rx_burst(t_args->portid, t_args->queueid, bufs, BURST_SIZE);
        if (unlikely(nb_deq == 0))
            continue;
        do {
            rte_pktmbuf_free(bufs[--nb_deq]);
        } while (nb_deq > 0);
    }
}
int
main(int argc, char **argv)
{
    int ret;
    ret = rte_eal_init(argc, argv);
    if (ret < 0)
        rte_exit(EXIT_FAILURE, "Invalid EAL parameters\n");
    argc -= ret;
    argv += ret;
    unsigned nb_ports = rte_eth_dev_count_avail();
    uint32_t nb_lcores = rte_lcore_count();
    uint16_t portid;
    uint8_t nb_rx_queue, nb_tx_queue;
    struct rte_eth_dev_info dev_info;
    struct rte_eth_conf local_port_conf = port_conf;
    portid = 0;
    ret = rte_eth_dev_info_get(portid, &dev_info);
    nb_rx_queue = 5;
    nb_tx_queue = 0;
    ret = rte_eth_dev_configure(portid, nb_rx_queue, nb_tx_queue, &local_port_conf);
    if (ret < 0)
        rte_exit(EXIT_FAILURE,
                 "Cannot configure device: err=%d, port=%d\n",
                 ret, portid);
    else
        printf("Configuring finished...\n");
    uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
    uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
    ret = rte_eth_dev_adjust_nb_rx_tx_desc(portid, &nb_rxd, &nb_txd);
    if (ret < 0)
        rte_exit(EXIT_FAILURE,
                 "rte_eth_dev_adjust_nb_rx_tx_desc: err=%d, port=%d\n",
                 ret, portid);
    struct rte_mempool *pktmbuf_pool;
    pktmbuf_pool = rte_pktmbuf_pool_create("global_pktmbuf_pool", NB_MBUF,
                                           MEMPOOL_CACHE_SIZE, 0,
                                           RTE_MBUF_DEFAULT_BUF_SIZE,
                                           rte_socket_id());
    if (pktmbuf_pool == NULL)
        rte_exit(EXIT_FAILURE,
                 "Cannot init mbuf pool on socket %d\n",
                 rte_socket_id());
    else
        printf("Allocated Succeeded...\n");
    for (uint8_t queue = 0; queue < nb_rx_queue; queue++) {
        struct rte_eth_rxconf rxq_conf;
        ret = rte_eth_dev_info_get(portid, &dev_info);
        if (ret != 0)
            rte_exit(EXIT_FAILURE,
                     "Error during getting device (port %u) info: %s\n",
                     portid, strerror(-ret));
        rxq_conf = dev_info.default_rxconf;
        rxq_conf.offloads = port_conf.rxmode.offloads;
        ret = rte_eth_rx_queue_setup(portid, queue, nb_rxd, rte_socket_id(), &rxq_conf, pktmbuf_pool);
        if (ret < 0)
            rte_exit(EXIT_FAILURE,
                     "rte_eth_rx_queue_setup: err=%d, port=%d\n",
                     ret, portid);
    }
    struct thread_args *args = NULL;
    for (uint8_t queue = 0; queue < nb_rx_queue; queue++) {
        args = (struct thread_args *)malloc(sizeof(struct thread_args));
        args->logic_no = queue;
        args->portid = portid;
        args->queueid = queue;
        // rte_eal_remote_launch(thread, (void*)args, queue + 1);
        rte_eal_remote_launch(dummy_thread, (void *)args, queue + 1);
    }
    ret = rte_eth_promiscuous_enable(portid);
    if (ret != 0)
        rte_exit(EXIT_FAILURE,
                 "rte_eth_promiscuous_enable: err=%s, port=%u\n",
                 rte_strerror(-ret), portid);
    else
        printf("Enabled Promiscuous Mode...\n");
    if (ret == 0) {
        printf("Succeeded...\n");
    } else {
        printf("failed...\n");
    }
    ret = rte_eth_dev_start(portid);
    if (ret < 0)
        rte_exit(EXIT_FAILURE,
                 "rte_eth_dev_start: err=%d, port=%d\n",
                 ret, portid);
    else
        printf("Starting port succeeded...\n");
    struct rte_eth_stats stats;
    while (true) {
        sleep(2);
        rte_eth_stats_get(portid, &stats);
        printf("***************************************\n");
        for (unsigned long i = 0; i < nb_rx_queue; i++) {
            printf("hardware port 0 queue %lu queue length: %d\n", i, rte_eth_rx_queue_count(0, i));
        }
        for (unsigned long i = 0; i < RTE_ETHDEV_QUEUE_STAT_CNTRS; i++) {
            printf("hardware port 0 queue %lu queue received: %lu\n", i, stats.q_ipackets[i]);
        }
        printf("%lu\n", stats.ipackets);
        printf("%lu\n", stats.imissed);
    }
    rte_eal_mp_wait_lcore();
}
PS:
The traffic is generated by MoonGen and contains only UDP packets; the L3/L4 information is:
Sip: 10.1.0.10
Dip: 10.2.0.10 - 10.3.0.10
Sport: 1234
Dport: 319
The ethtool stats are:
NIC statistics:
rx_packets: 17029206
tx_packets: 18
rx_bytes: 5108761800
tx_bytes: 1476
rx_pkts_nic: 17029206
tx_pkts_nic: 18
rx_bytes_nic: 5178531776
tx_bytes_nic: 1548
lsc_int: 11
tx_busy: 0
non_eop_descs: 0
rx_errors: 0
tx_errors: 0
rx_dropped: 0
tx_dropped: 0
multicast: 0
broadcast: 0
rx_no_buffer_count: 0
collisions: 0
rx_over_errors: 0
rx_crc_errors: 0
rx_frame_errors: 0
hw_rsc_aggregated: 0
hw_rsc_flushed: 0
fdir_match: 0
fdir_miss: 17045796
fdir_overflow: 0
rx_fifo_errors: 0
rx_missed_errors: 5438
tx_aborted_errors: 0
tx_carrier_errors: 0
tx_fifo_errors: 0
tx_heartbeat_errors: 0
tx_timeout_count: 0
tx_restart_queue: 0
rx_length_errors: 0
rx_long_length_errors: 0
rx_short_length_errors: 0
tx_flow_control_xon: 2
rx_flow_control_xon: 0
tx_flow_control_xoff: 4
rx_flow_control_xoff: 0
rx_csum_offload_errors: 17028934
alloc_rx_page: 32287
alloc_rx_page_failed: 0
alloc_rx_buff_failed: 0
rx_no_dma_resources: 0
os2bmc_rx_by_bmc: 0
os2bmc_tx_by_bmc: 0
os2bmc_tx_by_host: 0
os2bmc_rx_by_host: 0
tx_hwtstamp_timeouts: 0
tx_hwtstamp_skipped: 0
rx_hwtstamp_cleared: 0
tx_ipsec: 0
rx_ipsec: 0
fcoe_bad_fccrc: 0
rx_fcoe_dropped: 0
rx_fcoe_packets: 0
rx_fcoe_dwords: 0
fcoe_noddp: 0
fcoe_noddp_ext_buff: 0
tx_fcoe_packets: 0
tx_fcoe_dwords: 0
tx_queue_0_packets: 0
tx_queue_0_bytes: 0
tx_queue_1_packets: 12
tx_queue_1_bytes: 936
tx_queue_2_packets: 0
tx_queue_2_bytes: 0
tx_queue_3_packets: 0
tx_queue_3_bytes: 0
tx_queue_4_packets: 0
tx_queue_4_bytes: 0
tx_queue_5_packets: 0
tx_queue_5_bytes: 0
tx_queue_6_packets: 0
tx_queue_6_bytes: 0
tx_queue_7_packets: 0
tx_queue_7_bytes: 0
tx_queue_8_packets: 0
tx_queue_8_bytes: 0
tx_queue_9_packets: 0
tx_queue_9_bytes: 0
tx_queue_10_packets: 0
tx_queue_10_bytes: 0
tx_queue_11_packets: 0
tx_queue_11_bytes: 0
tx_queue_12_packets: 0
tx_queue_12_bytes: 0
tx_queue_13_packets: 0
tx_queue_13_bytes: 0
tx_queue_14_packets: 0
tx_queue_14_bytes: 0
tx_queue_15_packets: 0
tx_queue_15_bytes: 0
tx_queue_16_packets: 0
tx_queue_16_bytes: 0
tx_queue_17_packets: 0
tx_queue_17_bytes: 0
tx_queue_18_packets: 0
tx_queue_18_bytes: 0
tx_queue_19_packets: 0
tx_queue_19_bytes: 0
tx_queue_20_packets: 0
tx_queue_20_bytes: 0
tx_queue_21_packets: 0
tx_queue_21_bytes: 0
tx_queue_22_packets: 0
tx_queue_22_bytes: 0
tx_queue_23_packets: 0
tx_queue_23_bytes: 0
tx_queue_24_packets: 0
tx_queue_24_bytes: 0
tx_queue_25_packets: 0
tx_queue_25_bytes: 0
tx_queue_26_packets: 0
tx_queue_26_bytes: 0
tx_queue_27_packets: 0
tx_queue_27_bytes: 0
tx_queue_28_packets: 0
tx_queue_28_bytes: 0
tx_queue_29_packets: 0
tx_queue_29_bytes: 0
tx_queue_30_packets: 0
tx_queue_30_bytes: 0
tx_queue_31_packets: 0
tx_queue_31_bytes: 0
tx_queue_32_packets: 0
tx_queue_32_bytes: 0
tx_queue_33_packets: 0
tx_queue_33_bytes: 0
tx_queue_34_packets: 0
tx_queue_34_bytes: 0
tx_queue_35_packets: 0
tx_queue_35_bytes: 0
tx_queue_36_packets: 0
tx_queue_36_bytes: 0
tx_queue_37_packets: 0
tx_queue_37_bytes: 0
tx_queue_38_packets: 0
tx_queue_38_bytes: 0
tx_queue_39_packets: 0
tx_queue_39_bytes: 0
tx_queue_40_packets: 0
tx_queue_40_bytes: 0
tx_queue_41_packets: 0
tx_queue_41_bytes: 0
tx_queue_42_packets: 0
tx_queue_42_bytes: 0
tx_queue_43_packets: 0
tx_queue_43_bytes: 0
tx_queue_44_packets: 0
tx_queue_44_bytes: 0
tx_queue_45_packets: 0
tx_queue_45_bytes: 0
tx_queue_46_packets: 0
tx_queue_46_bytes: 0
tx_queue_47_packets: 0
tx_queue_47_bytes: 0
tx_queue_48_packets: 0
tx_queue_48_bytes: 0
tx_queue_49_packets: 0
tx_queue_49_bytes: 0
tx_queue_50_packets: 0
tx_queue_50_bytes: 0
tx_queue_51_packets: 0
tx_queue_51_bytes: 0
tx_queue_52_packets: 0
tx_queue_52_bytes: 0
tx_queue_53_packets: 0
tx_queue_53_bytes: 0
tx_queue_54_packets: 0
tx_queue_54_bytes: 0
tx_queue_55_packets: 0
tx_queue_55_bytes: 0
tx_queue_56_packets: 0
tx_queue_56_bytes: 0
tx_queue_57_packets: 0
tx_queue_57_bytes: 0
tx_queue_58_packets: 0
tx_queue_58_bytes: 0
tx_queue_59_packets: 0
tx_queue_59_bytes: 0
tx_queue_60_packets: 0
tx_queue_60_bytes: 0
tx_queue_61_packets: 6
tx_queue_61_bytes: 540
tx_queue_62_packets: 0
tx_queue_62_bytes: 0
tx_queue_63_packets: 0
tx_queue_63_bytes: 0
rx_queue_0_packets: 1064316
rx_queue_0_bytes: 319294800
rx_queue_1_packets: 1064318
rx_queue_1_bytes: 319295400
rx_queue_2_packets: 1064328
rx_queue_2_bytes: 319298400
rx_queue_3_packets: 1064329
rx_queue_3_bytes: 319298700
rx_queue_4_packets: 1064326
rx_queue_4_bytes: 319297800
rx_queue_5_packets: 1064328
rx_queue_5_bytes: 319298400
rx_queue_6_packets: 1064330
rx_queue_6_bytes: 319299000
rx_queue_7_packets: 1064327
rx_queue_7_bytes: 319298100
rx_queue_8_packets: 1064316
rx_queue_8_bytes: 319294800
rx_queue_9_packets: 1064317
rx_queue_9_bytes: 319295100
rx_queue_10_packets: 1064329
rx_queue_10_bytes: 319298700
rx_queue_11_packets: 1064331
rx_queue_11_bytes: 319299300
rx_queue_12_packets: 1064325
rx_queue_12_bytes: 319297500
rx_queue_13_packets: 1064325
rx_queue_13_bytes: 319297500
rx_queue_14_packets: 1064331
rx_queue_14_bytes: 319299300
rx_queue_15_packets: 1064330
rx_queue_15_bytes: 319299000
rx_queue_16_packets: 0
rx_queue_16_bytes: 0
rx_queue_17_packets: 0
rx_queue_17_bytes: 0
rx_queue_18_packets: 0
rx_queue_18_bytes: 0
rx_queue_19_packets: 0
rx_queue_19_bytes: 0
rx_queue_20_packets: 0
rx_queue_20_bytes: 0
rx_queue_21_packets: 0
rx_queue_21_bytes: 0
rx_queue_22_packets: 0
rx_queue_22_bytes: 0
rx_queue_23_packets: 0
rx_queue_23_bytes: 0
rx_queue_24_packets: 0
rx_queue_24_bytes: 0
rx_queue_25_packets: 0
rx_queue_25_bytes: 0
rx_queue_26_packets: 0
rx_queue_26_bytes: 0
rx_queue_27_packets: 0
rx_queue_27_bytes: 0
rx_queue_28_packets: 0
rx_queue_28_bytes: 0
rx_queue_29_packets: 0
rx_queue_29_bytes: 0
rx_queue_30_packets: 0
rx_queue_30_bytes: 0
rx_queue_31_packets: 0
rx_queue_31_bytes: 0
rx_queue_32_packets: 0
rx_queue_32_bytes: 0
rx_queue_33_packets: 0
rx_queue_33_bytes: 0
rx_queue_34_packets: 0
rx_queue_34_bytes: 0
rx_queue_35_packets: 0
rx_queue_35_bytes: 0
rx_queue_36_packets: 0
rx_queue_36_bytes: 0
rx_queue_37_packets: 0
rx_queue_37_bytes: 0
rx_queue_38_packets: 0
rx_queue_38_bytes: 0
rx_queue_39_packets: 0
rx_queue_39_bytes: 0
rx_queue_40_packets: 0
rx_queue_40_bytes: 0
rx_queue_41_packets: 0
rx_queue_41_bytes: 0
rx_queue_42_packets: 0
rx_queue_42_bytes: 0
rx_queue_43_packets: 0
rx_queue_43_bytes: 0
rx_queue_44_packets: 0
rx_queue_44_bytes: 0
rx_queue_45_packets: 0
rx_queue_45_bytes: 0
rx_queue_46_packets: 0
rx_queue_46_bytes: 0
rx_queue_47_packets: 0
rx_queue_47_bytes: 0
rx_queue_48_packets: 0
rx_queue_48_bytes: 0
rx_queue_49_packets: 0
rx_queue_49_bytes: 0
rx_queue_50_packets: 0
rx_queue_50_bytes: 0
rx_queue_51_packets: 0
rx_queue_51_bytes: 0
rx_queue_52_packets: 0
rx_queue_52_bytes: 0
rx_queue_53_packets: 0
rx_queue_53_bytes: 0
rx_queue_54_packets: 0
rx_queue_54_bytes: 0
rx_queue_55_packets: 0
rx_queue_55_bytes: 0
rx_queue_56_packets: 0
rx_queue_56_bytes: 0
rx_queue_57_packets: 0
rx_queue_57_bytes: 0
rx_queue_58_packets: 0
rx_queue_58_bytes: 0
rx_queue_59_packets: 0
rx_queue_59_bytes: 0
rx_queue_60_packets: 0
rx_queue_60_bytes: 0
rx_queue_61_packets: 0
rx_queue_61_bytes: 0
rx_queue_62_packets: 0
rx_queue_62_bytes: 0
rx_queue_63_packets: 0
rx_queue_63_bytes: 0
tx_pb_0_pxon: 0
tx_pb_0_pxoff: 0
tx_pb_1_pxon: 0
tx_pb_1_pxoff: 0
tx_pb_2_pxon: 0
tx_pb_2_pxoff: 0
tx_pb_3_pxon: 0
tx_pb_3_pxoff: 0
tx_pb_4_pxon: 0
tx_pb_4_pxoff: 0
tx_pb_5_pxon: 0
tx_pb_5_pxoff: 0
tx_pb_6_pxon: 0
tx_pb_6_pxoff: 0
tx_pb_7_pxon: 0
tx_pb_7_pxoff: 0
rx_pb_0_pxon: 0
rx_pb_0_pxoff: 0
rx_pb_1_pxon: 0
rx_pb_1_pxoff: 0
rx_pb_2_pxon: 0
rx_pb_2_pxoff: 0
rx_pb_3_pxon: 0
rx_pb_3_pxoff: 0
rx_pb_4_pxon: 0
rx_pb_4_pxoff: 0
rx_pb_5_pxon: 0
rx_pb_5_pxoff: 0
rx_pb_6_pxon: 0
rx_pb_6_pxoff: 0
rx_pb_7_pxon: 0
rx_pb_7_pxoff: 0
[EDIT-1 based on the comment update and code snippet shared]
The Intel 82599 NIC supports receiving on multiple RX queues and sending on multiple TX queues. There are two types of stats: the PMD-based rte_eth_stats_get and the HW-register-based rte_eth_xstats_get.
When using the DPDK stats call rte_eth_stats_get, the RX stats are updated by the PMD on each rte_eth_rx_burst, so one either needs to invoke rte_eth_rx_burst periodically or run it in a polling loop.
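For illustration, a minimal sketch of reading those HW-register counters (the dump_xstats helper is our own, not part of the snippet above):
static void dump_xstats(uint16_t port_id)
{
    /* illustrative helper: a first call with NULL asks only for the counter count */
    int n = rte_eth_xstats_get(port_id, NULL, 0);
    if (n <= 0)
        return;
    struct rte_eth_xstat *xstats = (struct rte_eth_xstat *)malloc(n * sizeof(*xstats));
    struct rte_eth_xstat_name *names = (struct rte_eth_xstat_name *)malloc(n * sizeof(*names));
    if (rte_eth_xstats_get(port_id, xstats, n) == n &&
        rte_eth_xstats_get_names(port_id, names, n) == n) {
        for (int i = 0; i < n; i++)
            printf("%s: %" PRIu64 "\n", names[xstats[i].id].name, xstats[i].value);
    }
    free(xstats);
    free(names);
}
The per-queue hardware counters in that list show directly which RX queues are receiving, independently of the PMD's software stats.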
Solution: modifying the code to enable RSS before dev_configure allows the traffic to be spread across multiple RX queues. Replace the current rte_eth_dev_configure call with the code below:
struct rte_eth_conf local_port_conf = port_conf;
local_port_conf.rx_adv_conf.rss_conf.rss_hf &= dev_info.flow_type_rss_offloads;
if (local_port_conf.rx_adv_conf.rss_conf.rss_hf != port_conf_default.rx_adv_conf.rss_conf.rss_hf)
{
    printf("Port %u modified RSS hash function based on hardware support,"
           "requested:%#"PRIx64" configured:%#"PRIx64"\n",
           port,
           port_conf.rx_adv_conf.rss_conf.rss_hf,
           local_port_conf.rx_adv_conf.rss_conf.rss_hf);
}
retval = rte_eth_dev_configure(port, rx_rings, tx_rings, &local_port_conf);
Note: I have validated the same on Intel FVL and CVL NICs; an update was requested via comment.
The MVCE below, simplified from the real codebase, shows the issue.
The server continuously sends a "burst" of 5 UDP frames, each filled with 150 bytes of value 0xA5, with little or no delay in between. Then a pause of 1 second is made.
The client uses the boost::asio async_receive_from() function in parallel with a 1-second timer.
The client works relatively well, except when the delay between the UDP frames is "too" small. It seems that the correct size (here, 150 bytes) is retrieved, but the buffer/vector does not appear to be updated.
5 x 150 bytes of UDP frames does not seem like much.
Wireshark DOES see the complete and correct frames sent.
If I use a synchronous boost::asio receive_from(), I see no issues.
I have tried maybe half a dozen times to dive into Boost.Asio without much success in finding a single truth or rationale. Similar posts on SO show very different code, so it is difficult to transpose them to the present code.
Here is the code.
client (client_with_timer.cc)
#include <iostream>
#include <vector>
#include <string>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
using namespace boost::asio;
void asyncReadHandler( const boost::system::error_code& error, std::size_t bytesTransferred );
void timeoutHandler( const boost::system::error_code& error, bool* ptime_out );
size_t ReceivedDataSize;
std::string ReadError;
int main(int argc, char * argv[])
{
    io_service io;
    ip::udp::socket socket(io, ip::udp::endpoint(ip::udp::v4(), 1620));
    size_t num = 0;
    while (true)
    {
        std::vector<unsigned char> vec(1500);
        ip::udp::endpoint from;
        socket.async_receive_from(
            boost::asio::buffer( vec ),
            from,
            boost::bind(
                asyncReadHandler,
                boost::asio::placeholders::error,
                boost::asio::placeholders::bytes_transferred ) );
        bool timeout = false;
        ReceivedDataSize = 0;
        ReadError = "";
        // Creating and starting timer (by registering timeout handler)
        deadline_timer timer( io, boost::posix_time::seconds( 1 ) );
        timer.async_wait(
            boost::bind( timeoutHandler, boost::asio::placeholders::error, &timeout ) );
        // Resetting IO service instance
        io.reset();
        while (io.run_one())
        {
            if ( timeout ) {
                socket.cancel();
                timer.cancel();
                // Leave the io run_one loop
                break;
            }
            else if ( (0 != ReceivedDataSize) || (!ReadError.empty()) ) {
                timer.cancel();
                socket.cancel();
                std::cout << "Received n°" << num++ << ": " << ReceivedDataSize << "\r" << std::flush;
                if (0 != ReceivedDataSize)
                    vec.resize(ReceivedDataSize);
                if (!ReadError.empty())
                    std::cout << "Error: " << ReadError << std::endl;
                bool result = true;
                for ( auto x : vec )
                    if ( 0xA5 != x ) { result = false; break; }
                if ( false == result ) {
                    std::cout << std::endl << "Bad reception" << std::endl << std::hex;
                    for ( auto x : vec )
                        std::cout << (int)x << " ";
                    std::cout << std::dec << "\n";
                }
                // Leave the io run_one loop
                break;
            }
            else {
                // What shall I do here ???
                // another potential io.reset() did not bring much
            }
        }
    }
    return 0;
}
void asyncReadHandler( const boost::system::error_code& error, std::size_t bytesTransferred )
{
    // If read canceled, simply returning...
    if( error == boost::asio::error::operation_aborted ) return;
    ReceivedDataSize = 0;
    // If no error
    if( !error ) {
        ReceivedDataSize = bytesTransferred;
    }
    else {
        ReadError = error.message();
    }
}

void timeoutHandler( const boost::system::error_code& error, bool* ptime_out )
{
    // If timer canceled, simply returning...
    if( error == boost::asio::error::operation_aborted ) return;
    // Setting timeout flag
    *ptime_out = true;
}
Here is the server (server.cc), so that you do not have to roll your own:
#include <iostream>
#include <vector>
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <unistd.h>
using namespace boost::asio;
int main(int argc, char * argv[])
{
    io_service io;
    ip::udp::socket socket(io, ip::udp::endpoint(ip::udp::v4(), 0));
    std::vector<char> vec(150, 0xA5);
#if 1
    int separator = 1 * 1000;
#else
    int separator = 0;
#endif
    while (true)
    {
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        if ( separator ) usleep(separator);
        socket.send_to(buffer(vec), ip::udp::endpoint(ip::udp::v4(), 1620));
        usleep(1000*1000);
    }
    return 0;
}
I compiled both with the naive commands below:
g++ client_with_timer.cc -std=c++11 -O2 -Wall -o client_with_timer -lboost_system
g++ server.cc -std=c++11 -O2 -Wall -o server -lboost_system
It produces output like the one below when the delay is too small:
[nils@localhost ASIO_C]$ ./client_with_timer
Received n°21: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Received n°148: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Received n°166: 150
Bad reception
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Received n°194: 150
How can I correct the client code to avoid missed frames?
Any hint toward a better understanding of the Boost.Asio rationale is welcome.
I think there are data races in your code.
If the timer expires (a timeout occurs) before the read operation completes, the code below is executed:
if ( timeout ) {
    socket.cancel();
    timer.cancel();
    // Leave the io run_one loop
    break; // [1]
}
You break out of the while loop. socket.cancel() cancels the asynchronous read operation; its handler, with the operation_aborted error, is queued and waits to be processed by the event loop. Because you jumped out of the while loop, run_one is not invoked, and this handler remains in the queue.
io_service::reset() does not clear the queue. The handler for the aborted operation is still there, waiting to be invoked. reset() only sets the stopped flag of the io_service back to false so that handlers can again be processed by calls to run_one, run, etc.; you are using reset to resume processing handlers from the queue.
So we have an unprocessed handler in the queue. In the main while loop a new vector vec is created, with all its elements initialized to 0. async_receive_from is started (it reads into vec and sets ReceivedDataSize in its handler), then reset is called, and run_one processes a handler: it invokes the handler of the aborted operation! You then test ReceivedDataSize and vec against the aborted operation, while you should be testing them against the last started async operation.
I would rewrite the timeout clause as:
if ( timeout ) {
    socket.cancel();
    timer.cancel();
} // no break
After removing the break, we guarantee that the aborted operation is processed by run_one and that there is no outstanding handler left to be invoked when a new async operation is started.
After this modification, I have not seen a bad reception while testing your code.
EDIT
Regarding your comment: yes, the other break statement should also be removed from the code.
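That is, the receive clause keeps its processing but loses its break as well (a sketch based on the client code above):
else if ( (0 != ReceivedDataSize) || (!ReadError.empty()) ) {
    timer.cancel();
    socket.cancel();
    // ... process vec exactly as before ...
    // no break: stay in the run_one() loop so the canceled timer's
    // operation_aborted handler is consumed before the next iteration
}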
The output of the program is unpredictable because you are starting async operations that take a reference to a local variable (vec is modified by async_receive_from); the handler is queued, the local variable is destroyed, and later the handler is called by the io_service while vec has already been destroyed.
You can test the code below and see what happens:
#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
using std::cout; using std::endl;

int main()
{
    boost::asio::io_context io; // alias of io_service
    boost::asio::system_timer t1{io};
    t1.expires_from_now(std::chrono::seconds(1));
    boost::asio::system_timer t2{io};
    t2.expires_from_now(std::chrono::seconds(1));
    boost::asio::system_timer t3{io};
    t3.expires_from_now(std::chrono::seconds(1));
    t1.async_wait([](const boost::system::error_code& ec){ cout << "[1]" << endl; });
    t2.async_wait([](const boost::system::error_code& ec){ cout << "[2]" << endl; });
    t3.async_wait([](const boost::system::error_code& ec){ cout << "[3]" << endl; });
    // 3 handlers are queued
    cout << "num of handlers executed " << io.run_one() << endl; // waits for a handler, prints 1
    io.reset(); // RESET is called
    cout << "num of handlers executed " << io.run_one() << endl; // waits for a handler, prints 1
    io.reset(); // RESET is called
    cout << "num of handlers executed " << io.run_one() << endl; // waits for a handler, prints 1
    cout << "executed: " << io.poll_one() << endl; // calls a handler if any is ready, prints 0
}
We are calling io_service::reset(), yet all the handlers are executed. After removing the breaks from your code, you ensure that all handlers are performed, which guarantees that the local data is valid when these handlers are called.
I have been trying for about 2 hours, and I'm not sure whether what I want to do is even possible.
I have a large file with some data that looks like:
43034452 LONGSHIRTPAIETTE 17.30
27.90
0110
COLOR : : : : :
: : :
-11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034453 LONG SHIRT PAI ETTE 16.40
25.90
0110
COLOR : : : : :
: : :
-3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034454 BASIC 4.99
8.90
0110
COLOR : : : : :
: : :
-5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(The file has 36k rows.)
What I want to do is clean this whole thing up.
In the end, the rows should look like:
43034452;LONGSHIRTPAIETTE;17.30;27.90;0110
43034453;LONG SHIRT PAI ETTE;16.40;25.90;0110
43034454;BASIC;4.99;8.90;0110
So there is a lot of data that I don't need. I'm using Notepad++ to do my regex.
My regex string looks like ([0-9]*)\s{6,}([A-Z]*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*) at the moment.
This brings me the first number followed by 6 spaces. (It has to be like this because some rows start with FF, and FF is not a letter. It's some kind of character that I can't identify, but if I let Notepad++ show all characters, I see FF.)
So as a result I get
\1: 43034452
\2: LONGSHIRTPAIETTE
\3: 17.30
\4: 27.90
\5: 0110
as expected, but on the next row it stops at the space. If I add \s to the pattern, it also matches all the spaces after the word part. And I obviously can't say "only one space", can I?
So my question is, can I use regex to get a selection like the one I want?
If so, what am I doing wrong?
Try this:
([0-9]+)\s{6,}((?:[A-Z]+\ )+)\s*([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)
Note a few things:
The *s are tightened to +s where appropriate, so you're enforcing at least one character in those columns, or actual whitespace.
A non-capturing group is used to repeat one or more instances of a word followed by a space.
Use the regex below
([0-9]*)\s{6,}([A-Z]+(?:\s+[A-Z]+)*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*).*?(?=\n\S|$)
and then replace the match with \1;\2;\3;\4;\5
Don't forget to enable the DOTALL modifier s.
Your approach is correct; just replace * with + (one or more) in your regex:
/([0-9]+)\s{6,}([A-Z ]+)\s+([0-9\.]+)\s+([0-9\.]+)\s+([0-9]+)/g
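If you'd rather script the cleanup outside Notepad++, here is a small, self-contained C++ sketch (sample data abbreviated; the pattern combines the + quantifiers with the word group from the answers above) that performs the same extraction:
#include <iostream>
#include <regex>
#include <string>

int main() {
    // Two abbreviated records from the question (junk lines shortened).
    std::string data =
        "43034452        LONGSHIRTPAIETTE   17.30\n"
        "   27.90\n"
        "   0110\n"
        "COLOR : : : : :\n"
        "-11 0 0 0\n"
        "43034453        LONG SHIRT PAI ETTE   16.40\n"
        "   25.90\n"
        "   0110\n";
    // \s+ also crosses line breaks, so the prices and the 0110 code may
    // sit on their own lines, as in the original file.
    std::regex record(R"(([0-9]+)\s{6,}([A-Z]+(?:\s[A-Z]+)*)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9]+))");
    for (std::sregex_iterator it(data.begin(), data.end(), record), end; it != end; ++it)
        std::cout << (*it)[1] << ';' << (*it)[2] << ';' << (*it)[3]
                  << ';' << (*it)[4] << ';' << (*it)[5] << '\n';
}
This prints the two sample records in the desired semicolon-separated form.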
There is a simple program in C++ / MPI (MVAPICH) that sends an array of type float. When I use MPI_Send, MPI_Ssend, or MPI_Rsend and the size of the data is more than the eager threshold (64 KB in my program), the program hangs during the send call. If the array is smaller than the threshold, the program works fine. The source code is below:
#include "mpi.h"
#include <unistd.h>
#include <stdio.h>
int main(int argc,char *argv[]) {
int mype=0,size=1;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&mype);
MPI_Comm_size(MPI_COMM_WORLD,&size);
int num = 2048*2048;
float* h_pos = new float[num];
MPI_Status stat;
if(mype == 0)
{
MPI_Rsend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD);
}
if(mype == 1)
{
printf("%fkb\n", 20000.0f*sizeof(float)/1024);
MPI_Recv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &stat);
}
MPI_Finalize();
return 0;
}
I think my settings may be wrong. The parameters are below:
MVAPICH2 All Parameters
MV2_COMM_WORLD_LOCAL_RANK : 0
PMI_ID : 0
MPIRUN_RSH_LAUNCH : 0
MPISPAWN_GLOBAL_NPROCS : 2
MPISPAWN_MPIRUN_HOST : g718a
MPISPAWN_MPIRUN_ID : 10800
MPISPAWN_NNODES : 1
MPISPAWN_WORKING_DIR : /home/g718a/new_workspace/mpi_test
USE_LINEAR_SSH : 1
PMI_PORT : g718a:42714
MV2_3DTORUS_SUPPORT : 0
MV2_NUM_SA_QUERY_RETRIES : 20
MV2_NUM_SLS : 8
MV2_DEFAULT_SERVICE_LEVEL : 0
MV2_PATH_SL_QUERY : 0
MV2_USE_QOS : 0
MV2_ALLGATHER_BRUCK_THRESHOLD : 524288
MV2_ALLGATHER_RD_THRESHOLD : 81920
MV2_ALLGATHER_REVERSE_RANKING : 1
MV2_ALLGATHERV_RD_THRESHOLD : 0
MV2_ALLREDUCE_2LEVEL_MSG : 262144
MV2_ALLREDUCE_SHORT_MSG : 2048
MV2_ALLTOALL_MEDIUM_MSG : 16384
MV2_ALLTOALL_SMALL_MSG : 2048
MV2_ALLTOALL_THROTTLE_FACTOR : 4
MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE : 64
MV2_GATHER_SWITCH_PT : 0
MV2_INTRA_SHMEM_REDUCE_MSG : 2048
MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
MV2_KNOMIAL_INTER_LEADER_THRESHOLD : 65536
MV2_KNOMIAL_INTER_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_THRESHOLD : 131072
MV2_RED_SCAT_LARGE_MSG : 524288
MV2_RED_SCAT_SHORT_MSG : 64
MV2_REDUCE_2LEVEL_MSG : 16384
MV2_REDUCE_SHORT_MSG : 8192
MV2_SCATTER_MEDIUM_MSG : 0
MV2_SCATTER_SMALL_MSG : 0
MV2_SHMEM_ALLREDUCE_MSG : 32768
MV2_SHMEM_COLL_MAX_MSG_SIZE : 131072
MV2_SHMEM_COLL_NUM_COMM : 8
MV2_SHMEM_COLL_NUM_PROCS : 2
MV2_SHMEM_COLL_SPIN_COUNT : 5
MV2_SHMEM_REDUCE_MSG : 4096
MV2_USE_BCAST_SHORT_MSG : 16384
MV2_USE_DIRECT_GATHER : 1
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
MV2_USE_DIRECT_SCATTER : 1
MV2_USE_OSU_COLLECTIVES : 1
MV2_USE_OSU_NB_COLLECTIVES : 1
MV2_USE_KNOMIAL_2LEVEL_BCAST : 1
MV2_USE_KNOMIAL_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
MV2_USE_SHMEM_ALLREDUCE : 1
MV2_USE_SHMEM_BARRIER : 1
MV2_USE_SHMEM_BCAST : 1
MV2_USE_SHMEM_COLL : 1
MV2_USE_SHMEM_REDUCE : 1
MV2_USE_TWO_LEVEL_GATHER : 1
MV2_USE_TWO_LEVEL_SCATTER : 1
MV2_USE_XOR_ALLTOALL : 1
MV2_DEFAULT_SRC_PATH_BITS : 0
MV2_DEFAULT_STATIC_RATE : 0
MV2_DEFAULT_TIME_OUT : 67374100
MV2_DEFAULT_MTU : 0
MV2_DEFAULT_PKEY : 0
MV2_DEFAULT_PORT : -1
MV2_DEFAULT_GID_INDEX : 0
MV2_DEFAULT_PSN : 0
MV2_DEFAULT_MAX_RECV_WQE : 128
MV2_DEFAULT_MAX_SEND_WQE : 64
MV2_DEFAULT_MAX_SG_LIST : 1
MV2_DEFAULT_MIN_RNR_TIMER : 12
MV2_DEFAULT_QP_OUS_RD_ATOM : 257
MV2_DEFAULT_RETRY_COUNT : 67900423
MV2_DEFAULT_RNR_RETRY : 202639111
MV2_DEFAULT_MAX_CQ_SIZE : 40000
MV2_DEFAULT_MAX_RDMA_DST_OPS : 4
MV2_INITIAL_PREPOST_DEPTH : 10
MV2_IWARP_MULTIPLE_CQ_THRESHOLD : 32
MV2_NUM_HCAS : 1
MV2_NUM_NODES_IN_JOB : 1
MV2_NUM_PORTS : 1
MV2_NUM_QP_PER_PORT : 1
MV2_MAX_RDMA_CONNECT_ATTEMPTS : 10
MV2_ON_DEMAND_UD_INFO_EXCHANGE : 1
MV2_PREPOST_DEPTH : 64
MV2_HOMOGENEOUS_CLUSTER : 0
MV2_COALESCE_THRESHOLD : 6
MV2_DREG_CACHE_LIMIT : 0
MV2_IBA_EAGER_THRESHOLD : 0
MV2_MAX_INLINE_SIZE : 0
MV2_MAX_R3_PENDING_DATA : 524288
MV2_MED_MSG_RAIL_SHARING_POLICY : 0
MV2_NDREG_ENTRIES : 0
MV2_NUM_RDMA_BUFFER : 0
MV2_NUM_SPINS_BEFORE_LOCK : 2000
MV2_POLLING_LEVEL : 1
MV2_POLLING_SET_LIMIT : -1
MV2_POLLING_SET_THRESHOLD : 256
MV2_R3_NOCACHE_THRESHOLD : 32768
MV2_R3_THRESHOLD : 4096
MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
MV2_RAIL_SHARING_MED_MSG_THRESHOLD : 2048
MV2_RAIL_SHARING_POLICY : 4
MV2_RDMA_EAGER_LIMIT : 32
MV2_RDMA_FAST_PATH_BUF_SIZE : 4096
MV2_RDMA_NUM_EXTRA_POLLS : 1
MV2_RNDV_EXT_SENDQ_SIZE : 5
MV2_RNDV_PROTOCOL : 3
MV2_SMALL_MSG_RAIL_SHARING_POLICY : 0
MV2_SPIN_COUNT : 5000
MV2_SRQ_LIMIT : 30
MV2_SRQ_MAX_SIZE : 4096
MV2_SRQ_SIZE : 256
MV2_STRIPING_THRESHOLD : 8192
MV2_USE_COALESCE : 0
MV2_USE_XRC : 0
MV2_VBUF_MAX : -1
MV2_VBUF_POOL_SIZE : 512
MV2_VBUF_SECONDARY_POOL_SIZE : 256
MV2_VBUF_TOTAL_SIZE : 0
MV2_USE_HWLOC_CPU_BINDING : 1
MV2_ENABLE_AFFINITY : 1
MV2_ENABLE_LEASTLOAD : 0
MV2_SMP_BATCH_SIZE : 8
MV2_SMP_EAGERSIZE : 65537
MV2_SMPI_LENGTH_QUEUE : 262144
MV2_SMP_NUM_SEND_BUFFER : 256
MV2_SMP_SEND_BUF_SIZE : 131072
MV2_USE_SHARED_MEM : 1
MV2_CUDA_BLOCK_SIZE : 0
MV2_CUDA_NUM_RNDV_BLOCKS : 8
MV2_CUDA_VECTOR_OPT : 1
MV2_CUDA_KERNEL_OPT : 1
MV2_EAGER_CUDAHOST_REG : 0
MV2_USE_CUDA : 1
MV2_CUDA_NUM_EVENTS : 64
MV2_CUDA_IPC : 1
MV2_CUDA_IPC_THRESHOLD : 0
MV2_CUDA_ENABLE_IPC_CACHE : 0
MV2_CUDA_IPC_MAX_CACHE_ENTRIES : 1
MV2_CUDA_IPC_NUM_STAGE_BUFFERS : 2
MV2_CUDA_IPC_STAGE_BUF_SIZE : 524288
MV2_CUDA_IPC_BUFFERED : 1
MV2_CUDA_IPC_BUFFERED_LIMIT : 33554432
MV2_CUDA_IPC_SYNC_LIMIT : 16384
MV2_CUDA_USE_NAIVE : 1
MV2_CUDA_REGISTER_NAIVE_BUF : 524288
MV2_CUDA_GATHER_NAIVE_LIMIT : 32768
MV2_CUDA_SCATTER_NAIVE_LIMIT : 2048
MV2_CUDA_ALLGATHER_NAIVE_LIMIT : 1048576
MV2_CUDA_ALLGATHERV_NAIVE_LIMIT : 524288
MV2_CUDA_ALLTOALL_NAIVE_LIMIT : 262144
MV2_CUDA_ALLTOALLV_NAIVE_LIMIT : 262144
MV2_CUDA_BCAST_NAIVE_LIMIT : 2097152
MV2_CUDA_GATHERV_NAIVE_LIMIT : 0
MV2_CUDA_SCATTERV_NAIVE_LIMIT : 16384
MV2_CUDA_ALLTOALL_DYNAMIC : 1
MV2_CUDA_ALLGATHER_RD_LIMIT : 1024
MV2_CUDA_ALLGATHER_FGP : 1
MV2_SMP_CUDA_PIPELINE : 1
MV2_CUDA_INIT_CONTEXT : 1
MV2_SHOW_ENV_INFO : 2
MV2_DEFAULT_PUT_GET_LIST_SIZE : 200
MV2_EAGERSIZE_1SC : 0
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_PIN_POOL_SIZE : 2097152
MV2_PUT_FALLBACK_THRESHOLD : 0
MV2_ASYNC_THREAD_STACK_SIZE : 1048576
MV2_THREAD_YIELD_SPIN_THRESHOLD : 5
MV2_USE_HUGEPAGES : 1
and the configuration:
mpiname -a
MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail
Compilation
CC: gcc -DNDEBUG -DNVALGRIND -O2
CXX: g++ -DNDEBUG -DNVALGRIND
F77: no -L/lib -L/lib
FC: no
Configuration
-with-device=ch3:mrail --with-rdma=gen2 --enable-cuda --disable-f77 --disable-fc --disable-mcast
The program runs on 2 processes:
mpirun_rsh -hostfile hosts -n 2 MV2_USE_CUDA=1 MV2_SHOW_ENV_INFO=2 ./myTest
Any ideas?
The MPI Standard specifies that
A send that uses the ready communication mode may be started only if the matching receive is already posted. Otherwise, the operation is erroneous and its outcome is undefined.
In this program there is no guarantee that the Recv will be posted before the Rsend, so the operation may fail or hang.
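For example (a sketch reusing the variables from the program above), ready mode becomes legal when the receive is guaranteed to be posted first, e.g. with a barrier:
MPI_Request req;
if (mype == 1)
    MPI_Irecv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &req);
MPI_Barrier(MPI_COMM_WORLD); // rank 1's receive is now posted
if (mype == 0)
    MPI_Rsend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD);
if (mype == 1)
    MPI_Wait(&req, &stat);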
I have run this on my laptop with 781.2 KiB without any deadlock, and on a Blue Gene/Q with 781.2 KiB without any deadlock. So thanks for the short test case, but I'm sorry I cannot reproduce your issue. Maybe it's specific to InfiniBand?
The general solution in this case is to post non-blocking sends and receives. I can provide code, but you're asking about ready-send and the eager threshold, so I'm pretty sure you know about those already and must have a good reason not to use them...
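For reference, a minimal sketch of that non-blocking variant, reusing the variables from the question (illustrative only):
MPI_Request req;
if (mype == 0) {
    // post the send, then wait; no ordering with the receive is required
    MPI_Isend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
if (mype == 1) {
    MPI_Irecv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}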
I just ran your test case using MVAPICH2 2.0 on an InfiniBand system and was not able to reproduce the hang. Would you be able to post a debug trace of the process that is hanging?
$ gdb attach <PID>
gdb> thread apply all bt