Poor multicast performance sending using boost::asio on Windows - c++

I have a very simple wrapper for boost::asio sockets sending multicast messages:
// header
class MulticastSender
{
public:
/// Constructor
/// @param ip - The multicast address to broadcast on
/// @param port - The multicast port to broadcast on
MulticastSender(const String& ip, const UInt16 port);
/// Sends a multicast message
/// @param msg - The message to send
/// @param size - The size of the message (in bytes)
/// @return number of bytes sent
size_t send(const void* msg, const size_t size);
private:
boost::asio::io_service m_service;
boost::asio::ip::udp::endpoint m_endpoint;
boost::asio::ip::udp::socket m_socket;
};
// implementation
inline MulticastSender::MulticastSender(const String& ip, const UInt16 port) :
m_endpoint(boost::asio::ip::address_v4::from_string(ip), port),
m_socket(m_service, m_endpoint.protocol())
{
m_socket.set_option(boost::asio::socket_base::send_buffer_size(8 * 1024 * 1024));
m_socket.set_option(boost::asio::socket_base::broadcast(true));
m_socket.set_option(boost::asio::socket_base::reuse_address(true));
}
inline size_t MulticastSender::send(const void* msg, const size_t size)
{
try
{
return m_socket.send_to(boost::asio::buffer(msg, size), m_endpoint);
}
catch (const std::exception& e)
{
setError(e.what());
}
return 0;
}
// read and send a message
MulticastSender sender(ip, port);
while(readFile(&msg)) sender.send(&msg, sizeof(msg));
When compiled on Windows 7 using Visual Studio 2013, I get throughput of ~11 MB/s; on Ubuntu 14.04 I get ~100 MB/s. I added timers and was able to confirm that the send(...) method is the culprit.
I tried with and without antivirus enabled, and tried disabling a few other services, with no luck. Some, like the firewall, I cannot disable due to permissions on the computer.
I assume either a service running on Windows is interfering, or my implementation is missing something that affects the application on Windows but not on Linux.
Any ideas on what might be causing this would be appreciated.

Are Windows and Ubuntu running on the same machine?
If not, it seems that your Windows machine is limited by 100 Mbit Ethernet, while the Ubuntu machine seems to be on 1 Gbit Ethernet.
(In case that's not the cause of the problem, I am sorry for posting an answer instead of commenting. I am not able to comment, your code is that simple, and the data rates are that telling [11 MB/s * 8 ≈ 88 Mbit/s, close to 100 Mbit/s, and 100 MB/s * 8 = 800 Mbit/s]. I just had to give that hint...)
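The arithmetic behind that hint can be written down as a small sketch. The helper names below are made up for illustration, and decimal units are assumed (1 MB/s = 8 Mbit/s), which is how link speeds are normally quoted.

```cpp
// Convert a payload rate in megabytes per second to megabits per second.
// Assumes decimal units: 1 MB/s = 8 Mbit/s.
constexpr double mbytes_to_mbits(double mbytes_per_sec)
{
    return mbytes_per_sec * 8.0;
}

// Hypothetical helper: fraction of a link's nominal rate that a payload rate uses.
constexpr double link_utilisation(double mbytes_per_sec, double link_mbits)
{
    return mbytes_to_mbits(mbytes_per_sec) / link_mbits;
}
```

11 MB/s comes out at 88% of a 100 Mbit link, which is about what is left after protocol overhead, while 100 MB/s is 80% of a 1 Gbit link.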

If your data transfers are huge, say messages of more than 10 MB, I would suggest using TCP instead of UDP/multicast. TCP is a reliable protocol.
I read of a case where a stream of 300-byte packets was sent over Ethernet (1500-byte MTU) and TCP was 50% faster than UDP. TCP buffers the data and tries to fill a full network segment, making more efficient use of the available bandwidth, whereas UDP puts each packet on the wire immediately, congesting the network with lots of small packets. On Windows I suggest using TCP rather than UDP/multicast.
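If you have to stay on UDP/multicast, the same "fill the segment" idea can be approximated by hand. This is only a sketch: DatagramBatcher is a made-up name, the 1472-byte limit assumes a 1500-byte MTU with IPv4 and no IP options, and the send callback stands in for whatever actually sends (e.g. the send_to call in the question). The receiver must know the record size to split a datagram again, which holds for fixed-size messages like the sizeof(msg) records in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Coalesces small messages into datagram-sized buffers before sending.
class DatagramBatcher
{
public:
    using SendFn = std::function<void(const std::vector<std::uint8_t>&)>;

    // Assumption: 1500-byte Ethernet MTU minus 20 bytes IPv4 and 8 bytes UDP header.
    static constexpr std::size_t kMaxPayload = 1500 - 20 - 8;

    explicit DatagramBatcher(SendFn send) : m_send(std::move(send))
    {
        m_buffer.reserve(kMaxPayload);
    }

    // Append one message; flush first if it would not fit into the current datagram.
    // A message larger than kMaxPayload is passed on alone, unsplit.
    void add(const void* msg, std::size_t size)
    {
        if (m_buffer.size() + size > kMaxPayload) flush();
        const auto* p = static_cast<const std::uint8_t*>(msg);
        m_buffer.insert(m_buffer.end(), p, p + size);
    }

    // Send whatever is buffered; call once more after the last message.
    void flush()
    {
        if (m_buffer.empty()) return;
        m_send(m_buffer);
        m_buffer.clear();
    }

private:
    SendFn m_send;
    std::vector<std::uint8_t> m_buffer;
};
```

With 300-byte messages this sends four messages per datagram instead of one, at the cost of a little added latency while the buffer fills.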

Related

How to beat delays in UDP client

I'm trying to write a UDP client app that receives control packets (52-104 bytes long) from a server, which sends them split into datagrams of 1-4 bytes each. (Why they are not sent as one large packet is a mystery to me.)
I created a thread, and in it I used a typical recvfrom example from MS. I append the data received in the small buffer to a string to recreate the packet (if the packet grows too big, the string is cleared).
My problem is the latency:
The inbound packets change, but the data in the buffer and the string does not change for a minute or more. I tried a circular buffer instead of a string, but it had no effect on the latency.
So, what am I doing wrong, and what is the proper way to receive a fragmented UDP packet?
I don't have the original sender code, so I'm attaching part of my sender emulator. As you can see, the original data string (mSendString) is split into four-byte packets and sent to the network. When the data string changes on the sender side, the data on the receiver side does not change within an acceptable time; it changes a few minutes later.
UdpClient mSendClient = new UdpClient();
string mSendString = "head,data,data,data,data,data,data,data,chksumm\n";//Control string
public static void SendCallback(IAsyncResult ar)
{
UdpClient u = (UdpClient)ar.AsyncState;
mMsgSent = true;
}
public void Send()
{
while (!mThreadStop)
{
if (!mSendStop)
{
for (int i = 0; i < mSendString.Length; i+=4)
{
Byte[] sendBytes = new Byte[4];
Encoding.ASCII.GetBytes(mSendString,i,4,sendBytes,0);
mSendClient.BeginSend(sendBytes, 1, mEndPoint, new AsyncCallback(SendCallback), mSendClient);
}
}
Thread.Sleep(100);
}
}
I was wrong on a few points when I asked this question:
First, I used the wrong term: the string was chopped/sliced/divided into four-byte packets, not fragmented.
Second, I thought that too many small UDP packets were the cause of the latency in my app, but when I ran my UDP receive code separately from the rest of the app, I found that it works without latency.
It seems the problem is in the threading, not in the UDP sockets.
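For reference, the reassembly step itself (append each small datagram, cut at the terminating '\n', clear when the packet gets too long) could look roughly like the sketch below. It is written in C++ rather than the C# of the emulator, PacketAssembler is a made-up name, and it assumes the datagrams arrive in order and without loss, which UDP does not guarantee.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rebuilds newline-terminated packets from a sequence of small chunks.
class PacketAssembler
{
public:
    explicit PacketAssembler(std::size_t max_len) : m_max(max_len) {}

    // Feed one received datagram; returns every packet it completed,
    // without the trailing '\n'.
    std::vector<std::string> feed(const char* data, std::size_t len)
    {
        std::vector<std::string> done;
        for (std::size_t i = 0; i < len; ++i)
        {
            if (data[i] == '\n')
            {
                done.push_back(m_partial);
                m_partial.clear();
            }
            else if (m_partial.size() >= m_max)
            {
                // Oversized packet: discard what was collected so far
                // (and this byte), as the question describes.
                m_partial.clear();
            }
            else
            {
                m_partial.push_back(data[i]);
            }
        }
        return done;
    }

private:
    std::size_t m_max;     // longest packet accepted, excluding '\n'
    std::string m_partial; // bytes of the packet currently being rebuilt
};
```

Calling feed() directly in the receive thread and handing only completed packets to the rest of the app keeps the shared state between threads small.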

Bind UDP socket to specific network interface using Boost.Asio

My PC has several network cards and I'm trying to receive UDP data from several broadcast devices. Each device is isolated on a dedicated network and I'm trying to read UDP data from multiple devices at the same time. I'm using Boost version 1.67. Let's pretend in this post that I want to get data from one only specific device, so I want to bind on a local network interface.
On Windows the following code works, but on my Ubuntu 16.04 64bits machine it does not. Indeed, if I bind on one specific local IP address (192.168.1.1 in this example) I do not get any data. But if I use the ANY "0.0.0.0" address then I get what I want. Except that in that case I don't know where it comes from. It could be received by any network card!
Is this normal behavior? Or do I need to read the sender_endpoint to get that information on Linux and filter afterwards?
#include <iostream>
#include <boost/array.hpp>
#include <boost/asio.hpp>
using boost::asio::ip::udp;
int main(int argc, char* argv[])
{
try
{
boost::asio::io_context io_context;
// Setup UDP Socket
udp::socket socket(io_context);
socket.open(udp::v4());
// Bind to specific network card and chosen port
socket.bind(udp::endpoint(boost::asio::ip::address::from_string("192.168.1.1"), 2368));
// Prepare to receive data
boost::array<char, 128> recv_buf;
udp::endpoint sender_endpoint;
size_t len = socket.receive_from(boost::asio::buffer(recv_buf), sender_endpoint);
// Write data to std output
std::cout.write(recv_buf.data(), len);
}
catch (std::exception& e)
{
std::cerr << e.what() << std::endl;
}
return 0;
}
A little late, but others might come across this too, as I have been attempting the same thing with Boost and trying to figure out how it works. After reviewing the question "Fail to listen to UDP Port with boost::asio", I went to this page: https://forums.codeguru.com/showthread.php?504427-boost-asio-receive-on-linux and it turns out that on Linux you need to bind to the "any address" in order to receive broadcast packets. So you would set this up as your receiving endpoint:
udp::endpoint(boost::asio::ip::address_v4::any(), port)
And then yes you would need to filter on the sender information. Seems a bit odd but seems to be the way Linux interfaces handle broadcasts.
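The filtering on the sender could be a plain subnet check, sketched below without any Boost dependency. The helper names are made up, and it assumes each device sends from an address inside its own dedicated network (e.g. 192.168.1.0/24 for the card at 192.168.1.1). With Asio, the host-order value should be obtainable from sender_endpoint.address().to_v4().to_uint() (to_ulong() in older Boost versions), but check that against your Boost version.

```cpp
#include <cstdint>

// Pack four octets into a host-order 32-bit IPv4 address.
constexpr std::uint32_t ipv4(std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d)
{
    return (std::uint32_t{a} << 24) | (std::uint32_t{b} << 16) |
           (std::uint32_t{c} << 8) | std::uint32_t{d};
}

// True if addr lies in the network net with the given prefix length (0 to 32).
constexpr bool in_subnet(std::uint32_t addr, std::uint32_t net, unsigned prefix_len)
{
    // Shifting a 32-bit value by 32 is undefined, so prefix 0 is handled separately.
    return prefix_len == 0 ? true : ((addr ^ net) >> (32 - prefix_len)) == 0;
}
```

After each receive_from you would then drop any datagram for which in_subnet(sender, ipv4(192, 168, 1, 0), 24) is false.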

Using boost::asio for simple udp communication

This is a simple problem, but I can't seem to figure out what I am doing wrong. I am attempting to read data sent to a port on a client using Boost and I have the following code which sets up 1) the UDP client, 2) a buffer for reading to and 3) an attempt to read from the socket:
// Set up the socket to read UDP packets on port 10114
boost::asio::io_service io_service;
udp::endpoint endpoint_(udp::v4(), 10114);
udp::socket socket(io_service, endpoint_);
// Data coming across will be 8 bytes per packet
boost::array<char, 8> recv_buf;
// Read data available from port
size_t len = socket.receive_from(
boost::asio::buffer(recv_buf,8), endpoint_);
cout.write(recv_buf.data(), len);
The problem is that the receive_from function never returns. The server is running on another computer and generating data continuously. I can see traffic on this port on the local computer using Wireshark. So, what am I doing wrong here?
So, it turns out that I need to listen on that port for datagrams coming from anywhere. As such, the endpoint needs to be set up as
boost::asio::ip::udp::endpoint endpoint_(boost::asio::ip::address::from_string("0.0.0.0"), 10114);
Using this setup, I get the data back that I expect. And fyi, 0.0.0.0 is the same as INADDR_ANY.

How to reconstruct TCP stream from multiple IP packets?

I am working on a TUN-based VPN server whose goal is to analyze packets it receives before forwarding them to their destination. Currently I am receiving the IP packets from a TUN interface, and simply sending them off to their destination unmodified.
I understand that analyzing the content of UDP packets would be as simple as stripping the IP and UDP headers. However, to analyze the contents of TCP traffic, I would need to reconstruct the message from multiple IP packets. Is there an easy way to do this without re-implementing TCP? Are there any easily accessible C/C++ libraries meant for this task? I would prefer Linux system libraries and/or open-source, non-viral/non-copyleft libraries.
One thing I have already considered is making a copy of each IP packet, and changing the destination IP of the copy to localhost, so that a different part of my server may receive these TCP requests and responses fully reconstructed and without headers. However, I would not be able to associate destination IPs with traffic content, which is something that I desire.
The functionality you need will most likely always be tightly coupled with packet dissection. Good protocol dissectors are really needed to extract the required information. So my suggestion is to use the best open-source tool available: wireshark.org
It provides "Follow TCP stream" functionality.
It doesn't look like you can easily extract part of the Wireshark dissection logic, but at least there is a good example in packet-tcp:
typedef struct _tcp_flow_t {
guint32 base_seq; /* base seq number (used by relative sequence numbers)
* or 0 if not yet known.
*/
tcp_unacked_t *segments;
guint32 fin; /* frame number of the final FIN */
guint32 lastack; /* last seen ack */
nstime_t lastacktime; /* Time of the last ack packet */
guint32 lastnondupack; /* frame number of last seen non dupack */
guint32 dupacknum; /* dupack number */
guint32 nextseq; /* highest seen nextseq */
guint32 maxseqtobeacked;/* highest seen continuous seq number (without hole in the stream) from the fwd party,
* this is the maximum seq number that can be acked by the rev party in normal case.
* If the rev party sends an ACK beyond this seq number it indicates TCP_A_ACK_LOST_PACKET contition */
guint32 nextseqframe; /* frame number for segment with highest
* sequence number
*/
Basically, there is separate conversation extraction logic, please notice find_conversation usage:
/* Attach process info to a flow */
/* XXX - We depend on the TCP dissector finding the conversation first */
void
add_tcp_process_info(guint32 frame_num, address *local_addr, address *remote_addr, guint16 local_port, guint16 remote_port, guint32 uid, guint32 pid, gchar *username, gchar *command) {
conversation_t *conv;
struct tcp_analysis *tcpd;
tcp_flow_t *flow = NULL;
conv = find_conversation(frame_num, local_addr, remote_addr, PT_TCP, local_port, remote_port, 0);
if (!conv) {
return;
}
The actual logic is well documented and available here:
/*
* Given two address/port pairs for a packet, search for a conversation
* containing packets between those address/port pairs. Returns NULL if
* not found.
*
* We try to find the most exact match that we can, and then proceed to
* try wildcard matches on the "addr_b" and/or "port_b" argument if a more
* exact match failed.
* ...
*/
conversation_t *
find_conversation(const guint32 frame_num, const address *addr_a, const address *addr_b, const port_type ptype,
const guint32 port_a, const guint32 port_b, const guint options)
{
conversation_t *conversation;
/*
* First try an exact match, if we have two addresses and ports.
*/
if (!(options & (NO_ADDR_B|NO_PORT_B))) {
So what I'm actually suggesting is to use the EPAN library. It is possible to extract this library and use it independently. Please be careful with the license.
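To give an idea of what the core of such reassembly involves, here is a heavily simplified sketch that is not taken from Wireshark. StreamReassembler is a made-up name; it handles one direction of one connection and assumes no sequence-number wraparound, no partially overlapping retransmissions, and that the initial sequence number is already known from the SYN.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Reorders the payload of ONE direction of ONE TCP connection.
// Simplifications of this sketch: no sequence wraparound, no merging of
// partially overlapping retransmissions, no SYN/FIN/RST tracking.
class StreamReassembler
{
public:
    // initial_seq: sequence number of the first payload byte (ISN + 1).
    explicit StreamReassembler(std::uint32_t initial_seq) : m_next(initial_seq) {}

    // Store one segment and return any bytes that are now contiguous.
    std::string add(std::uint32_t seq, const std::string& payload)
    {
        std::string ready;
        if (payload.empty() || seq < m_next) return ready; // nothing new
        m_pending.emplace(seq, payload); // keeps the first copy of a duplicate

        auto it = m_pending.begin();
        while (it != m_pending.end() && it->first <= m_next)
        {
            if (it->first == m_next)
            {
                ready += it->second;
                m_next += static_cast<std::uint32_t>(it->second.size());
            }
            // Entries starting before m_next are stale and simply dropped.
            it = m_pending.erase(it);
        }
        return ready;
    }

private:
    std::uint32_t m_next;                           // next expected sequence number
    std::map<std::uint32_t, std::string> m_pending; // out-of-order segments by seq
};
```

A server would keep one such object per direction of each connection, keyed by source address, source port, destination address and destination port, which also preserves the association between destination IPs and content that the question asks for.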
Maybe you might be interested in libipq - iptables userspace packet queuing library.
#include <linux/netfilter.h>
#include <libipq.h>
Netfilter provides a mechanism for passing packets out of the stack
for queueing to userspace, then receiving these packets back into the
kernel with a verdict specifying what to do with the packets (such as
ACCEPT or DROP). These packets may also be modified in userspace prior
to reinjection back into the kernel. For each supported protocol, a
kernel module called a queue handler may register with Netfilter to
perform the mechanics of passing packets to and from userspace.
The standard queue handler for IPv4 is ip_queue. It is provided as an
experimental module with 2.4 kernels, and uses a Netlink socket for
kernel/userspace communication.
Once ip_queue is loaded, IP packets may be selected with iptables and
queued for userspace processing via the QUEUE target
Here is a brief example of how to decompose a TCP/IP packet:
ipq_packet_msg_t *m = ipq_get_packet(buf);
struct iphdr *ip = (struct iphdr*) m->payload;
struct tcphdr *tcp = (struct tcphdr*) (m->payload + (4 * ip->ihl));
int port = ntohs(tcp->dest); /* network to host byte order */
status = ipq_set_verdict(h, m->packet_id,
NF_ACCEPT, 0, NULL);
if (status < 0)
die(h);
If this is not what you are looking for, you might try the Wireshark EPAN library.
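Since the packets in the question come from a TUN interface as raw IP packets, the same header decomposition can also be done without libipq. The following is only a sketch: TcpSegmentInfo and parse_ipv4_tcp are made-up names, only IPv4 is handled, and fragmented IP packets are rejected rather than reassembled.

```cpp
#include <cstddef>
#include <cstdint>

struct TcpSegmentInfo
{
    std::uint32_t src_ip, dst_ip;     // host byte order
    std::uint16_t src_port, dst_port; // host byte order
    std::uint32_t seq;                // TCP sequence number
    std::size_t payload_offset;       // offset of the TCP payload within the packet
    std::size_t payload_len;          // number of payload bytes
};

// Read big-endian (network order) integers from a byte buffer.
static std::uint16_t read_be16(const std::uint8_t* p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t read_be32(const std::uint8_t* p)
{
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Parse a raw IPv4 packet carrying TCP. Returns false if the packet is not
// unfragmented IPv4/TCP or is truncated.
bool parse_ipv4_tcp(const std::uint8_t* pkt, std::size_t len, TcpSegmentInfo& out)
{
    if (len < 20 || (pkt[0] >> 4) != 4) return false;        // IP version
    const std::size_t ihl = (pkt[0] & 0x0F) * 4u;             // IP header length in bytes
    const std::size_t total = read_be16(pkt + 2);             // IP total length
    if (ihl < 20 || total > len || total < ihl + 20) return false;
    if ((read_be16(pkt + 6) & 0x3FFF) != 0) return false;     // MF flag or fragment offset set
    if (pkt[9] != 6) return false;                            // protocol 6 = TCP

    const std::uint8_t* tcp = pkt + ihl;
    const std::size_t doff = (tcp[12] >> 4) * 4u;             // TCP header length in bytes
    if (doff < 20 || ihl + doff > total) return false;

    out.src_ip = read_be32(pkt + 12);
    out.dst_ip = read_be32(pkt + 16);
    out.src_port = read_be16(tcp);
    out.dst_port = read_be16(tcp + 2);
    out.seq = read_be32(tcp + 4);
    out.payload_offset = ihl + doff;
    out.payload_len = total - ihl - doff;
    return true;
}
```

The addresses, ports and sequence number extracted this way are what a stream reassembler needs as its key and ordering information.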

Optimization for multicast receiver program

I am writing a C++ program on Linux that uses Boost.Asio to receive multicast messages from around 30 multicast IPs. I am seeking advice on how to minimize packet drops on my client side at runtime. I have already maximized the NIC receive buffer, and I am using an 8-core CPU. I am also wondering whether the NIC creates as many buffer queues as there are sockets in the program. Besides configuring the NIC, is there anything I could do in the Linux kernel? I believe the kernel copies data out of the NIC buffer before our program copies it from the kernel, right?
template<typename msg, int id>
void data_service<msg, id>::on_rt_recv( char* p_raw_packet, int p_length, const boost::system::error_code& error )
{
if (!error)
{
//post to strand and wait to proceed
processing_strand_.post(boost::bind(&data_service::on_rt_recv_handler, this,
p_raw_packet,
p_length));
//continue to listen as soon as possible
auto new_buffer = get_new_buffer();
rt_socket_[p_line]->async_receive_from(boost::asio::buffer(new_buffer, BUFFER_SIZE_), rt_endpoint_,
boost::bind(&data_service::on_rt_recv, this,
new_buffer,
boost::asio::placeholders::bytes_transferred,
boost::asio::placeholders::error));
}
else if (error != boost::asio::error::operation_aborted)
{
memory_pool_.free((void*)p_raw_packet);
}
}
The packet loss was caused by hardware, including the switch and the NICs. The packet rate was actually 2500 * 70 per second, because there were 70 UDP sockets. I highly recommend a network monitoring tool called Wireshark, which provides loads of information about your current network traffic.
Regarding Demon's solution: under the hood, Boost.Asio uses IOCP on Windows and epoll on Linux.
No buffer sizes needed adjusting either.