How does the flow classify example in DPDK work?

I want to test the flow classify example in DPDK 20.08, and I'm trying to modify the given ACL rules file to match all TCP packets.
#file format:
#src_ip/masklen dst_ip/masklen src_port : mask dst_port : mask proto/mask priority
#
2.2.2.3/24 2.2.2.7/24 32 : 0xffff 33 : 0xffff 17/0xff 0
9.9.9.3/24 9.9.9.7/24 32 : 0xffff 33 : 0xffff 17/0xff 1
9.9.9.3/24 9.9.9.7/24 32 : 0xffff 33 : 0xffff 6/0xff 2
9.9.8.3/24 9.9.8.7/24 32 : 0xffff 33 : 0xffff 6/0xff 3
6.7.8.9/24 2.3.4.5/24 32 : 0x0000 33 : 0x0000 132/0xff 4
6.7.8.9/32 192.168.0.36/32 10 : 0xffff 11 : 0xffff 6/0xfe 5
6.7.8.9/24 192.168.0.36/24 10 : 0xffff 11 : 0xffff 6/0xfe 6
6.7.8.9/16 192.168.0.36/16 10 : 0xffff 11 : 0xffff 6/0xfe 7
6.7.8.9/8 192.168.0.36/8 10 : 0xffff 11 : 0xffff 6/0xfe 8
#error rules
#9.8.7.6/8 192.168.0.36/8 10 : 0xffff 11 : 0xffff 6/0xfe 9
Should I add the rule 0.0.0.0/0 0.0.0.0/0 0 : 0x0000 0 : 0x0000 6/0xff 0? I tried, but there are still no packets matching.
PS: this is the file I'm using.
#file format:
#src_ip/masklen dst_ip/masklen src_port : mask dst_port : mask proto/mask priority
#
2.2.2.3/24 2.2.2.7/24 32 : 0xffff 33 : 0xffff 17/0xff 0
9.9.9.3/24 9.9.9.7/24 32 : 0xffff 33 : 0xffff 17/0xff 1
9.9.9.3/24 9.9.9.7/24 32 : 0xffff 33 : 0xffff 6/0xff 2
9.9.8.3/24 9.9.8.7/24 32 : 0xffff 33 : 0xffff 6/0xff 3
6.7.8.9/24 2.3.4.5/24 32 : 0x0000 33 : 0x0000 132/0xff 4
6.7.8.9/32 192.168.0.36/32 10 : 0xffff 11 : 0xffff 6/0xfe 5
6.7.8.9/24 192.168.0.36/24 10 : 0xffff 11 : 0xffff 6/0xfe 6
6.7.8.9/16 192.168.0.36/16 10 : 0xffff 11 : 0xffff 6/0xfe 7
#6.7.8.9/8 192.168.0.36/8 10 : 0xffff 11 : 0xffff 6/0xfe 8
0.0.0.0/0 0.0.0.0/0 0 : 0x0000 0 : 0x0000 6/0xff 8
#error rules
#9.8.7.6/8 192.168.0.36/8 10 : 0xffff 11 : 0xffff 6/0xfe 9
I ran it again, and the output looks like this:
rule [0] query failed ret [-22]
rule [1] query failed ret [-22]
rule [2] query failed ret [-22]
rule [3] query failed ret [-22]
rule [4] query failed ret [-22]
rule [5] query failed ret [-22]
rule [6] query failed ret [-22]
rule [7] query failed ret [-22]
rule[8] count=2
proto = 6
Segmentation fault
I don't know what is causing the segmentation fault.
The command is sudo ./build/flow_classify -l 101 --log-level=pmd,8 -- --rule_ipv4="./ipv4_rules_file_pass.txt" > ~/flow_classify_log, and I didn't change the source code.
I'm using a two-port 82599 NIC. The log file below contains the output leading up to the segmentation fault:
flow_classify log
Sometimes it processes packets normally in the first iteration, and sometimes it doesn't.
Update 1-3:
I modified the code to stop forwarding packets and to free every packet received, to check whether the forwarding path was causing the problem.
In the main function:
/* if (nb_ports < 2 || (nb_ports & 1))
    rte_exit(EXIT_FAILURE, "Error: number of ports must be even\n"); */
if (nb_ports < 1)
    rte_exit(EXIT_FAILURE, "Error: no port available\n");
In the lcore_main function:
/* Send burst of TX packets, to second port of pair. */
/* const uint16_t nb_tx = rte_eth_tx_burst(port ^ 1, 0,
        bufs, nb_rx); */
const uint16_t nb_tx = 0;

/* Free any unsent packets. */
if (unlikely(nb_tx < nb_rx)) {
    uint16_t buf;
    for (buf = nb_tx; buf < nb_rx; buf++)
        rte_pktmbuf_free(bufs[buf]);
}
And this is the new log, but I don't see any difference. I'm using only one of the two ports on a single 82599ES NIC. Maybe the classification rule I added is what's causing the problem, because it ran fine with the default rule set.

Flow classify requires a minimum of 2 ports, and always an even number of ports.
Flow entries have to be populated in the valid format.
Entries in the rules file:
2.2.2.3/0 2.2.2.7/0 32 : 0xffff 33 : 0xffff 17/0xff 0
2.2.2.3/0 2.2.2.7/0 32 : 0x0 33 : 0x0 6/0xff 1
Packet sent: IPv4 TCP
Log from flow_classify:
rule [0] query failed ret [-22] -- UDP lookup failed
rule[1] count=32 -- TCP lookup success
proto = 6
Virtual NIC: ./build/flow_classify -c 8 --no-pci --vdev=net_tap0 --vdev=net_tap1 -- --rule_ipv4="ipv4_rules_file.txt"
Physical NIC: ./build/flow_classify -c 8 -- --rule_ipv4="ipv4_rules_file.txt"
Hence the issue faced at your end is because of an incorrect configuration: you were only using 1 port.
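Putting this together for the original goal (matching all TCP packets): keep two ports configured, and wildcard everything except the protocol. A minimal rules file in the format above might contain just:
#src_ip/masklen dst_ip/masklen src_port : mask dst_port : mask proto/mask priority
2.2.2.3/0 2.2.2.7/0 32 : 0x0 33 : 0x0 6/0xff 0
The /0 prefix lengths and 0x0 port masks make the IPs and ports irrelevant, so only the protocol field (6 = TCP) is tested.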

Related

Why do these two functions that print the binary representation of an integer have the same output?

I have two functions that print a 32-bit number in binary.
The first one divides the number into bytes and starts printing from the last byte (from the 25th bit of the whole integer).
The second one is more straightforward and starts from the 1st bit of the number.
It seems to me that these functions should have different outputs, because they process the bits in different orders, yet the outputs are the same. Why?
#include <stdio.h>

void printBits(size_t const size, void const * const ptr)
{
    unsigned char *b = (unsigned char *) ptr;
    unsigned char byte;
    int i, j;

    for (i = size - 1; i >= 0; i--)
    {
        for (j = 7; j >= 0; j--)
        {
            byte = (b[i] >> j) & 1;
            printf("%u", byte);
        }
    }
    puts("");
}

void printBits_2(unsigned *A)
{
    for (int i = 31; i >= 0; i--)
    {
        printf("%u", (A[0] >> i) & 1u);
    }
    puts("");
}

int main()
{
    unsigned a = 1014750;
    printBits(sizeof(a), &a); // -> 00000000000011110111101111011110
    printBits_2(&a);          // -> 00000000000011110111101111011110
    return 0;
}
Both your functions print the binary representation of the number from the most significant bit to the least significant bit. Today's PCs (and the majority of other computer architectures) use the so-called little-endian format, in which multi-byte values are stored with the least significant byte first.
That means that 32-bit value 0x01020304 stored on address 0x1000 will look like this in the memory:
+--------++--------+--------+--------+--------+
|Address || 0x1000 | 0x1001 | 0x1002 | 0x1003 |
+--------++--------+--------+--------+--------+
|Data || 0x04 | 0x03 | 0x02 | 0x01 |
+--------++--------+--------+--------+--------+
Therefore, on Little Endian architectures, printing value's bits from MSB to LSB is equivalent to taking its bytes in reversed order and printing each byte's bits from MSB to LSB.
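A quick way to see this for yourself (an illustrative sketch, not part of the original answer) is to print the bytes of a known constant in memory order:
#include <cstdio>

int main()
{
    unsigned x = 0x01020304;
    unsigned char *p = reinterpret_cast<unsigned char *>(&x);
    // Little endian prints "04 03 02 01"; big endian prints "01 02 03 04".
    for (unsigned i = 0; i < sizeof x; ++i)
        std::printf("%02x ", p[i]);
    std::printf("\n");
    return 0;
}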
This is the expected result when:
1) you use both functions to print a single integer in binary, and
2) your C implementation runs on a little-endian hardware platform.
Change either one of these factors (with printBits_2 appropriately adjusted), and the results will be different.
They don't process the bits in different orders. Here's a visual:
Bytes:        4                         3                         2                        1
Bits:    8 7 6 5 4 3 2 1      8 7 6 5 4 3 2 1     8 7 6 5 4 3 2 1      8 7 6 5 4 3 2 1
Bits:  32 31 30 29 28 27 26 25   24 23 22 21 20 19 18 17   16 15 14 13 12 11 10 9   8 7 6 5 4 3 2 1
The fact that the output is the same from both of these functions tells you that your platform uses little-endian encoding, which means the most significant byte comes last.
The first two rows show how the first function works on your platform, and the last row shows how the second function works.
However, the first function will fail on platforms that use big-endian encoding and will output the bits in the order shown in the third row here:
Bytes:        4                         3                         2                        1
Bits:    8 7 6 5 4 3 2 1      8 7 6 5 4 3 2 1     8 7 6 5 4 3 2 1      8 7 6 5 4 3 2 1
Bits:    8 7 6 5 4 3 2 1   16 15 14 13 12 11 10 9   24 23 22 21 20 19 18 17   32 31 30 29 28 27 26 25
For the printBits function, it takes the uint32 pointer and assigns it to a char pointer:
unsigned char *b = (unsigned char*) ptr;
On a little-endian processor, b[size-1] points to the most significant byte of the uint32 value. The outer loop runs from size-1 down to 0, so it visits the bytes from most to least significant, and the inner loop prints each byte in binary. Therefore this method prints the uint32 value MSB first.
As for printBits_2, you are using
unsigned *A
i.e. a pointer to the unsigned int itself. Its loop runs from bit 31 down to 0 and prints the uint32 value in binary, regardless of byte order.

Why do Linux's IPv4 addresses take 16 bytes instead of 4?

I'm trying to resolve a domain name to an IPv4 address.
Here's a minimal version of the code I'm using:
addrinfo hints, *results;
memset(&hints, 0, sizeof(hints));
hints.ai_family = AF_INET;

int error = getaddrinfo("google.com", 0, &hints, &results);

for (addrinfo* info = results; info; info = info->ai_next)
{
    std::cout << (info->ai_family == AF_INET6 ? "ip6" : "ip4")
              << '[' << info->ai_addrlen << "]: ";
    for (uint i = 0; i < info->ai_addrlen; ++i)
        std::cout << int(((uint8_t*)info->ai_addr)[i]) << ' ';
    std::cout << std::endl;
}
freeaddrinfo(results);
When I run this, it prints:
ip4[16]: 2 0 0 0 216 58 208 238 0 0 0 0 0 0 0 0
ip4[16]: 2 0 0 0 216 58 208 238 0 0 0 0 0 0 0 0
ip4[16]: 2 0 0 0 216 58 208 238 0 0 0 0 0 0 0 0
The actual IP of google.com is 216.58.208.238. So for some reason, for IPv4, ai_addr contains 16 bytes and actually stores the address in bytes 4 through 7.
My question is: what are the other 12 bytes for?
I read the docs and googled a bit, but found nothing on ai_addrlen. Also note that I didn't set the AI_V4MAPPED flag.
I'm not sure whether this is actually required by some standard, but struct sockaddr is defined to contain a short for the address family and then 14 bytes of address-family-specific data.
Every type used as an address must be compatible with that, and thus be at least the same length.
For struct sockaddr_in (the one for IPv4), it looks like this:
struct sockaddr_in {
    short           sin_family;
    unsigned short  sin_port;
    struct in_addr  sin_addr;
    char            sin_zero[8];
};
2 bytes for the address family, 2 bytes for the port, 4 bytes for the IPv4 address, and 8 bytes of padding to reach the length of struct sockaddr.
That said, your question is slightly wrong: this is not an IPv4 address (that would be just struct in_addr, which is indeed just 4 bytes) but a socket address for IPv4. Still, 8 bytes are "wasted" here.
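If you just want the 4-byte address out of the result, cast ai_addr to the matching struct sockaddr_in and read sin_addr. Here is a sketch of the question's loop rewritten that way (using inet_ntop for printing; an illustration, not the only approach):
#include <iostream>
#include <cstring>
#include <netdb.h>
#include <arpa/inet.h>

int main()
{
    addrinfo hints, *results;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    if (getaddrinfo("google.com", 0, &hints, &results) != 0)
        return 1;
    for (addrinfo *info = results; info; info = info->ai_next)
    {
        // For AF_INET, ai_addr really points at a sockaddr_in,
        // so the IPv4 address lives in its 4-byte sin_addr member.
        sockaddr_in *sin = reinterpret_cast<sockaddr_in *>(info->ai_addr);
        char buf[INET_ADDRSTRLEN];
        inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof buf);
        std::cout << buf << '\n'; // e.g. 216.58.208.238
    }
    freeaddrinfo(results);
    return 0;
}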

Getting status from a thermal printer with libusb on Linux ARM

I know there are a lot of questions about printer status...
I have a Citizen CT-S310 II, and I have managed all the code for writing characters over USB without problems using libusb_bulk_transfer (text, bold, center, CR, CUT_PAPER, etc.):
#define ENDPOINT_OUT 0x02
#define ENDPOINT_IN 0x81
struct libusb_device_handle *_handle;
[detach kernel driver...]
[claim interface...]
[etc ...]
r = libusb_bulk_transfer(device_handle, ENDPOINT_OUT, Mydata, out_len, &transferred, 1000);
Now I need to receive data from the printer to check its status. My first idea was to send the status command from the doc with the same bulk_transfer:
1D (hex) 72 (hex) n
n => 1 (send the paper sensor status)
and then retrieve the value with a bulk_transfer on the ENDPOINT_IN endpoint. The doc says there are 8 bytes to receive:
bit 0,1 => paper found by paper near-end sensor 00H
bit 0,1 => paper not found by paper near-end sensor 03H
bit 1,2 => paper found by paper-end sensor 00H
bit 1,2 => paper not found by paper-end sensor 0CH
[...]
So: two bulk_transfers, one to send the status command (ENDPOINT_OUT) and one to receive the result (ENDPOINT_IN). But I always get a USB error (the read bulk_transfer returns -1).
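In code, the attempt looked roughly like this (a sketch reconstructed from the description above; 1D 72 01 is the GS r 1 command, and error handling is trimmed):
unsigned char cmd[3] = { 0x1D, 0x72, 0x01 }; /* GS r 1: transmit paper sensor status */
unsigned char status[8];
int transferred = 0;

/* Send the command on the bulk OUT endpoint... */
r = libusb_bulk_transfer(device_handle, ENDPOINT_OUT, cmd, sizeof(cmd), &transferred, 1000);
/* ...then read the reply on the bulk IN endpoint (the call that fails here). */
r = libusb_bulk_transfer(device_handle, ENDPOINT_IN, status, sizeof(status), &transferred, 1000);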
Maybe USB doesn't work like this? So my second idea was to use the function implemented in the USB printer class via control_transfer:
int r = 0;
int out_len = 1;
unsigned char* _udata = NULL;

uint8_t bmRequestType = LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_CLASS | LIBUSB_RECIPIENT_INTERFACE;
uint8_t bRequest = LIBUSB_REQUEST_GET_STATUS;
uint16_t wValue = 0; // the value field for the setup packet (?????)
uint16_t wIndex = 0; // interface number of the printer (the index field for the setup packet)

r = libusb_control_transfer(device_handle, bmRequestType, bRequest, wValue,
                            wIndex, _udata, out_len, USB_TIMEOUT);
I don't know exactly how to fill in all the parameters. I know they depend on my device, but the libusb doc is not very explicit.
What exactly is wValue?
What exactly is wIndex? The interface number?
The LIBUSB_ENDPOINT_IN parameter is 0x80 by default, but my printer uses 0x81; do I have to change this default endpoint?
Here is the lsusb -v output for the printer:
Bus 001 Device 004: ID 1d90:2060
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x1d90
idProduct 0x2060
bcdDevice 0.02
iManufacturer 1 CITIZEN
iProduct 2 Thermal Printer
iSerial 3 00000000
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 32
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0xc0
Self Powered
MaxPower 0mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 2
bInterfaceClass 7 Printer
bInterfaceSubClass 1 Printer
bInterfaceProtocol 2 Bidirectional
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x81 EP 1 IN
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x02 EP 2 OUT
bmAttributes 2
Transfer Type Bulk
Synch Type None
Usage Type Data
wMaxPacketSize 0x0040 1x 64 bytes
bInterval 0
Device Status: 0x0001
Self Powered
The response of control_transfer in my case is always 0 :( with paper or without. How do I send a correct control_transfer to request the status of my printer?
Any help solving this problem is welcome!
Finally resolved!
The value of LIBUSB_REQUEST_GET_STATUS is 0x00, but for a printer the status request is 0x01.
To check the status of the printer with libusb-1.0:
uint8_t bmRequestType = LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_CLASS | LIBUSB_RECIPIENT_INTERFACE;
uint8_t bRequest = 0x01; // here, not LIBUSB_REQUEST_GET_STATUS
uint16_t wValue = 0;
uint16_t wIndex = 0;

r = libusb_control_transfer(device_handle, bmRequestType, bRequest, wValue,
                            wIndex, &_udata, out_len, USB_TIMEOUT);
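For completeness, here is a self-contained sketch of that query (assuming the device has already been opened and interface 0 claimed elsewhere; the GET_PORT_STATUS request code and its one-byte reply come from the USB printer class specification, not from this question):
#include <libusb-1.0/libusb.h>
#include <cstdio>
#include <cstdint>

#define USB_TIMEOUT 1000

/* Query the printer-class port status; returns the raw status byte,
   or a negative libusb error code on failure. */
int get_port_status(libusb_device_handle *device_handle)
{
    unsigned char status = 0;
    uint8_t bmRequestType = LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_CLASS |
                            LIBUSB_RECIPIENT_INTERFACE;
    int r = libusb_control_transfer(device_handle, bmRequestType,
                                    0x01, /* GET_PORT_STATUS, not the standard GET_STATUS (0x00) */
                                    0,    /* wValue: always 0 for this request */
                                    0,    /* wIndex: the interface number */
                                    &status, 1, USB_TIMEOUT);
    if (r < 0)
        return r;
    /* Per the printer class spec: bit 5 = paper empty, bit 4 = selected,
       bit 3 = no error. */
    std::printf("port status: 0x%02x\n", status);
    return status;
}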

Boost serialization: hexadecimal/decimal encoding of data

I am new to boost serialization, but this seems very strange to me.
I have a very simple class with two members:
int number // always = 123
char buffer[?] // buffer with size ?
Sometimes I set the size to buffer[31]; then I serialize the class:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 31 0 0 0 65 65
We can see the 123 and the 31, so no issue here; both are in decimal format.
Now I change buffer to buffer[1024], so I expected to see:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 1024 0 0 0 65 65
but this is the actual outcome:
22 serialization::archive 8 0 0 1 1 0 0 0 0 123 0 0 0 4 0 0 65 65 65
Has boost switched to hex for the buffer size only? Notice the other value is still decimal.
So what happens if I change number from 123 to 1024? I would imagine 040?
22 serialization::archive 8 0 0 1 1 0 0 0 0 1024 0 0 0 4 0 0 65 65
If this is by design, why does the 31 not get converted to 1F? It's not consistent.
This causes problems in our load function for split_free; we were doing this:
unsigned int size;
ar >> size;
but as you might guess, when the value is 040, it truncates to zero :(
What is the recommended solution to this?
I was using boost 1.45.0, but I tested this on boost 1.56.0 and it is the same.
EDIT: sample of the serialization function
template<class Archive>
void save(Archive& ar, const MYCLASS& buffer, unsigned int /*version*/) {
    ar << boost::serialization::make_array(
        reinterpret_cast<const unsigned char*>(buffer.begin()),
        buffer.length());
}
MYCLASS is just a wrapper around a char*, with the first element an unsigned int holding the length, approximating a UNICODE_STRING (http://msdn.microsoft.com/en-gb/library/windows/desktop/aa380518(v=vs.85).aspx).
The code is the same whether the length is 1024 or 31, so I would not have expected this to be a problem.
I don't think Boost "switched to hex". I honestly don't have much experience with this, but it looks like boost is serializing the length as a sequence of bytes, which can only hold numbers from 0 through 255; 1024 would be a byte with value 4 followed by a byte with value 0.
"Why does the 31 not get converted to 1F? It's not consistent": your assumptions are creating false inconsistencies. Stop assuming you can read the serialization archive format when actually you're just guessing.
If you want to know, trace the code. If not, just use the archive format.
If you want a "human accessible form", consider the xml_oarchive.
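For reference, a minimal xml_oarchive sketch (the file name and variable are made up for illustration); the XML output stays in human-readable decimal:
#include <boost/archive/xml_oarchive.hpp>
#include <boost/serialization/nvp.hpp>
#include <fstream>

int main()
{
    std::ofstream ofs("out.xml");
    boost::archive::xml_oarchive ar(ofs);
    int number = 1024;
    // XML archives require name-value pairs; this writes
    // <number>1024</number> inside the archive wrapper.
    ar << BOOST_SERIALIZATION_NVP(number);
    return 0;
}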

MVAPICH hangs on MPI_Send for message larger than eager threshold

There is a simple C++ / MPI (MVAPICH) program which sends an array of type float. When I use MPI_Send, MPI_Ssend, or MPI_Rsend and the size of the data is more than the eager threshold (64 KB in my program), the program hangs during the send call. If the array is smaller than the threshold, the program works fine. Source code is below:
#include "mpi.h"
#include <unistd.h>
#include <stdio.h>
int main(int argc,char *argv[]) {
int mype=0,size=1;
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&mype);
MPI_Comm_size(MPI_COMM_WORLD,&size);
int num = 2048*2048;
float* h_pos = new float[num];
MPI_Status stat;
if(mype == 0)
{
MPI_Rsend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD);
}
if(mype == 1)
{
printf("%fkb\n", 20000.0f*sizeof(float)/1024);
MPI_Recv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &stat);
}
MPI_Finalize();
return 0;
}
I think my settings may be wrong. The parameters are below:
MVAPICH2 All Parameters
MV2_COMM_WORLD_LOCAL_RANK : 0
PMI_ID : 0
MPIRUN_RSH_LAUNCH : 0
MPISPAWN_GLOBAL_NPROCS : 2
MPISPAWN_MPIRUN_HOST : g718a
MPISPAWN_MPIRUN_ID : 10800
MPISPAWN_NNODES : 1
MPISPAWN_WORKING_DIR : /home/g718a/new_workspace/mpi_test
USE_LINEAR_SSH : 1
PMI_PORT : g718a:42714
MV2_3DTORUS_SUPPORT : 0
MV2_NUM_SA_QUERY_RETRIES : 20
MV2_NUM_SLS : 8
MV2_DEFAULT_SERVICE_LEVEL : 0
MV2_PATH_SL_QUERY : 0
MV2_USE_QOS : 0
MV2_ALLGATHER_BRUCK_THRESHOLD : 524288
MV2_ALLGATHER_RD_THRESHOLD : 81920
MV2_ALLGATHER_REVERSE_RANKING : 1
MV2_ALLGATHERV_RD_THRESHOLD : 0
MV2_ALLREDUCE_2LEVEL_MSG : 262144
MV2_ALLREDUCE_SHORT_MSG : 2048
MV2_ALLTOALL_MEDIUM_MSG : 16384
MV2_ALLTOALL_SMALL_MSG : 2048
MV2_ALLTOALL_THROTTLE_FACTOR : 4
MV2_BCAST_TWO_LEVEL_SYSTEM_SIZE : 64
MV2_GATHER_SWITCH_PT : 0
MV2_INTRA_SHMEM_REDUCE_MSG : 2048
MV2_KNOMIAL_2LEVEL_BCAST_MESSAGE_SIZE_THRESHOLD : 2048
MV2_KNOMIAL_2LEVEL_BCAST_SYSTEM_SIZE_THRESHOLD : 64
MV2_KNOMIAL_INTER_LEADER_THRESHOLD : 65536
MV2_KNOMIAL_INTER_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_FACTOR : 4
MV2_KNOMIAL_INTRA_NODE_THRESHOLD : 131072
MV2_RED_SCAT_LARGE_MSG : 524288
MV2_RED_SCAT_SHORT_MSG : 64
MV2_REDUCE_2LEVEL_MSG : 16384
MV2_REDUCE_SHORT_MSG : 8192
MV2_SCATTER_MEDIUM_MSG : 0
MV2_SCATTER_SMALL_MSG : 0
MV2_SHMEM_ALLREDUCE_MSG : 32768
MV2_SHMEM_COLL_MAX_MSG_SIZE : 131072
MV2_SHMEM_COLL_NUM_COMM : 8
MV2_SHMEM_COLL_NUM_PROCS : 2
MV2_SHMEM_COLL_SPIN_COUNT : 5
MV2_SHMEM_REDUCE_MSG : 4096
MV2_USE_BCAST_SHORT_MSG : 16384
MV2_USE_DIRECT_GATHER : 1
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_MEDIUM : 1024
MV2_USE_DIRECT_GATHER_SYSTEM_SIZE_SMALL : 384
MV2_USE_DIRECT_SCATTER : 1
MV2_USE_OSU_COLLECTIVES : 1
MV2_USE_OSU_NB_COLLECTIVES : 1
MV2_USE_KNOMIAL_2LEVEL_BCAST : 1
MV2_USE_KNOMIAL_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RD_INTER_LEADER_BCAST : 1
MV2_USE_SCATTER_RING_INTER_LEADER_BCAST : 1
MV2_USE_SHMEM_ALLREDUCE : 1
MV2_USE_SHMEM_BARRIER : 1
MV2_USE_SHMEM_BCAST : 1
MV2_USE_SHMEM_COLL : 1
MV2_USE_SHMEM_REDUCE : 1
MV2_USE_TWO_LEVEL_GATHER : 1
MV2_USE_TWO_LEVEL_SCATTER : 1
MV2_USE_XOR_ALLTOALL : 1
MV2_DEFAULT_SRC_PATH_BITS : 0
MV2_DEFAULT_STATIC_RATE : 0
MV2_DEFAULT_TIME_OUT : 67374100
MV2_DEFAULT_MTU : 0
MV2_DEFAULT_PKEY : 0
MV2_DEFAULT_PORT : -1
MV2_DEFAULT_GID_INDEX : 0
MV2_DEFAULT_PSN : 0
MV2_DEFAULT_MAX_RECV_WQE : 128
MV2_DEFAULT_MAX_SEND_WQE : 64
MV2_DEFAULT_MAX_SG_LIST : 1
MV2_DEFAULT_MIN_RNR_TIMER : 12
MV2_DEFAULT_QP_OUS_RD_ATOM : 257
MV2_DEFAULT_RETRY_COUNT : 67900423
MV2_DEFAULT_RNR_RETRY : 202639111
MV2_DEFAULT_MAX_CQ_SIZE : 40000
MV2_DEFAULT_MAX_RDMA_DST_OPS : 4
MV2_INITIAL_PREPOST_DEPTH : 10
MV2_IWARP_MULTIPLE_CQ_THRESHOLD : 32
MV2_NUM_HCAS : 1
MV2_NUM_NODES_IN_JOB : 1
MV2_NUM_PORTS : 1
MV2_NUM_QP_PER_PORT : 1
MV2_MAX_RDMA_CONNECT_ATTEMPTS : 10
MV2_ON_DEMAND_UD_INFO_EXCHANGE : 1
MV2_PREPOST_DEPTH : 64
MV2_HOMOGENEOUS_CLUSTER : 0
MV2_COALESCE_THRESHOLD : 6
MV2_DREG_CACHE_LIMIT : 0
MV2_IBA_EAGER_THRESHOLD : 0
MV2_MAX_INLINE_SIZE : 0
MV2_MAX_R3_PENDING_DATA : 524288
MV2_MED_MSG_RAIL_SHARING_POLICY : 0
MV2_NDREG_ENTRIES : 0
MV2_NUM_RDMA_BUFFER : 0
MV2_NUM_SPINS_BEFORE_LOCK : 2000
MV2_POLLING_LEVEL : 1
MV2_POLLING_SET_LIMIT : -1
MV2_POLLING_SET_THRESHOLD : 256
MV2_R3_NOCACHE_THRESHOLD : 32768
MV2_R3_THRESHOLD : 4096
MV2_RAIL_SHARING_LARGE_MSG_THRESHOLD : 16384
MV2_RAIL_SHARING_MED_MSG_THRESHOLD : 2048
MV2_RAIL_SHARING_POLICY : 4
MV2_RDMA_EAGER_LIMIT : 32
MV2_RDMA_FAST_PATH_BUF_SIZE : 4096
MV2_RDMA_NUM_EXTRA_POLLS : 1
MV2_RNDV_EXT_SENDQ_SIZE : 5
MV2_RNDV_PROTOCOL : 3
MV2_SMALL_MSG_RAIL_SHARING_POLICY : 0
MV2_SPIN_COUNT : 5000
MV2_SRQ_LIMIT : 30
MV2_SRQ_MAX_SIZE : 4096
MV2_SRQ_SIZE : 256
MV2_STRIPING_THRESHOLD : 8192
MV2_USE_COALESCE : 0
MV2_USE_XRC : 0
MV2_VBUF_MAX : -1
MV2_VBUF_POOL_SIZE : 512
MV2_VBUF_SECONDARY_POOL_SIZE : 256
MV2_VBUF_TOTAL_SIZE : 0
MV2_USE_HWLOC_CPU_BINDING : 1
MV2_ENABLE_AFFINITY : 1
MV2_ENABLE_LEASTLOAD : 0
MV2_SMP_BATCH_SIZE : 8
MV2_SMP_EAGERSIZE : 65537
MV2_SMPI_LENGTH_QUEUE : 262144
MV2_SMP_NUM_SEND_BUFFER : 256
MV2_SMP_SEND_BUF_SIZE : 131072
MV2_USE_SHARED_MEM : 1
MV2_CUDA_BLOCK_SIZE : 0
MV2_CUDA_NUM_RNDV_BLOCKS : 8
MV2_CUDA_VECTOR_OPT : 1
MV2_CUDA_KERNEL_OPT : 1
MV2_EAGER_CUDAHOST_REG : 0
MV2_USE_CUDA : 1
MV2_CUDA_NUM_EVENTS : 64
MV2_CUDA_IPC : 1
MV2_CUDA_IPC_THRESHOLD : 0
MV2_CUDA_ENABLE_IPC_CACHE : 0
MV2_CUDA_IPC_MAX_CACHE_ENTRIES : 1
MV2_CUDA_IPC_NUM_STAGE_BUFFERS : 2
MV2_CUDA_IPC_STAGE_BUF_SIZE : 524288
MV2_CUDA_IPC_BUFFERED : 1
MV2_CUDA_IPC_BUFFERED_LIMIT : 33554432
MV2_CUDA_IPC_SYNC_LIMIT : 16384
MV2_CUDA_USE_NAIVE : 1
MV2_CUDA_REGISTER_NAIVE_BUF : 524288
MV2_CUDA_GATHER_NAIVE_LIMIT : 32768
MV2_CUDA_SCATTER_NAIVE_LIMIT : 2048
MV2_CUDA_ALLGATHER_NAIVE_LIMIT : 1048576
MV2_CUDA_ALLGATHERV_NAIVE_LIMIT : 524288
MV2_CUDA_ALLTOALL_NAIVE_LIMIT : 262144
MV2_CUDA_ALLTOALLV_NAIVE_LIMIT : 262144
MV2_CUDA_BCAST_NAIVE_LIMIT : 2097152
MV2_CUDA_GATHERV_NAIVE_LIMIT : 0
MV2_CUDA_SCATTERV_NAIVE_LIMIT : 16384
MV2_CUDA_ALLTOALL_DYNAMIC : 1
MV2_CUDA_ALLGATHER_RD_LIMIT : 1024
MV2_CUDA_ALLGATHER_FGP : 1
MV2_SMP_CUDA_PIPELINE : 1
MV2_CUDA_INIT_CONTEXT : 1
MV2_SHOW_ENV_INFO : 2
MV2_DEFAULT_PUT_GET_LIST_SIZE : 200
MV2_EAGERSIZE_1SC : 0
MV2_GET_FALLBACK_THRESHOLD : 0
MV2_PIN_POOL_SIZE : 2097152
MV2_PUT_FALLBACK_THRESHOLD : 0
MV2_ASYNC_THREAD_STACK_SIZE : 1048576
MV2_THREAD_YIELD_SPIN_THRESHOLD : 5
MV2_USE_HUGEPAGES : 1
and the configuration:
mpiname -a
MVAPICH2 2.0 Fri Jun 20 20:00:00 EDT 2014 ch3:mrail
Compilation
CC: gcc -DNDEBUG -DNVALGRIND -O2
CXX: g++ -DNDEBUG -DNVALGRIND
F77: no -L/lib -L/lib
FC: no
Configuration
-with-device=ch3:mrail --with-rdma=gen2 --enable-cuda --disable-f77 --disable-fc --disable-mcast
The program runs on 2 processes:
mpirun_rsh -hostfile hosts -n 2 MV2_USE_CUDA=1 MV2_SHOW_ENV_INFO=2 ./myTest
Any ideas?
The MPI Standard specifies that
A send that uses the ready communication mode may be started only if the matching receive is already posted. Otherwise, the operation is erroneous and its outcome is undefined.
In this program there is no guarantee that the Recv will be posted before the Rsend, so the operation may fail or hang.
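If ready mode is really wanted, one legal pattern is to guarantee that the receive is posted before the send starts, for example with a barrier. A sketch using the same sizes as the question (illustrative, not the only fix):
#include "mpi.h"

int main(int argc, char *argv[]) {
    int mype = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mype);

    const int num = 2048 * 2048;
    float* h_pos = new float[num];
    MPI_Request rreq;

    if (mype == 1)
        MPI_Irecv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &rreq);
    MPI_Barrier(MPI_COMM_WORLD); /* after this, the receive is certainly posted */
    if (mype == 0)
        MPI_Rsend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD);
    if (mype == 1)
        MPI_Wait(&rreq, MPI_STATUS_IGNORE);

    delete[] h_pos;
    MPI_Finalize();
    return 0;
}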
I have run this on my laptop with 781.2 KiB without any deadlock, and on a Blue Gene/Q with 781.2 KiB without any deadlock. So thanks for the short test case, but I'm sorry I cannot reproduce your issue. Maybe it's specific to InfiniBand?
The general solution in this case is to post non-blocking sends and receives. I could provide code, but you're asking about ready-send and the eager threshold, so I'm pretty sure you know about those already and must have a good reason not to use them...
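A minimal sketch of that non-blocking variant (MPI_Isend/MPI_Irecv plus MPI_Wait, same sizes as the question; each rank's wait completes locally, so neither side depends on the other having posted first):
#include "mpi.h"

int main(int argc, char *argv[]) {
    int mype = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mype);

    const int num = 2048 * 2048;
    float* h_pos = new float[num];
    MPI_Request req = MPI_REQUEST_NULL;

    if (mype == 0)
        MPI_Isend(h_pos, 20000, MPI_FLOAT, 1, 5, MPI_COMM_WORLD, &req);
    else if (mype == 1)
        MPI_Irecv(h_pos, 20000, MPI_FLOAT, 0, 5, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE); /* no-op on ranks > 1 (MPI_REQUEST_NULL) */

    delete[] h_pos;
    MPI_Finalize();
    return 0;
}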
I just ran your test case using MVAPICH2 2.0 on an InfiniBand system, and I was not able to reproduce the hang. Would you be able to post a debug trace of the hanging process?
$ gdb attach <PID>
gdb> thread apply all bt