I want to write an algorithm that calculates the MTU. I'm writing all of this with functions from the WinSock2 library.
From what I understood from other posts, people recommend setting the "don't fragment" bit and then sending messages of various lengths until the maximum packet size is found, but what I don't understand is:
How to check for a dropped packet?
E.g.:
/// socket has the "don't fragment" bit set.
int res = sendto(socket, dataToSend, dataToSendLen, flags /* = NULL*/, remoteAddr, remoteAddrLen);
/// How to check whether this packet has been dropped due to fragmentation?
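One approach that is often described (this is only a sketch under assumptions, not a verified solution): set the IP_DONTFRAGMENT socket option and check for WSAEMSGSIZE when a probe exceeds the local interface MTU. The send_probe name and its parameters are placeholders made up for illustration; drops at later hops are only reported via ICMP "fragmentation needed" messages, so an application-level echo with a timeout is still needed to confirm that a probe actually arrived.
#include <winsock2.h>
#include <ws2tcpip.h>

// Hypothetical probe helper: returns false if the datagram was refused locally
// because it exceeds the interface MTU while the DF bit is set.
bool send_probe(SOCKET sock, const char* data, int probeSize,
                const sockaddr* remoteAddr, int remoteAddrLen)
{
    // Ask the stack not to fragment outgoing datagrams.
    DWORD dontFragment = 1;
    setsockopt(sock, IPPROTO_IP, IP_DONTFRAGMENT,
               reinterpret_cast<const char*>(&dontFragment), sizeof(dontFragment));

    int res = sendto(sock, data, probeSize, 0, remoteAddr, remoteAddrLen);
    if (res == SOCKET_ERROR && WSAGetLastError() == WSAEMSGSIZE)
        return false; // too large for the local MTU; try a smaller probe

    // A successful sendto() only means the datagram left this host; drops at
    // later hops show up (if at all) as ICMP "fragmentation needed" messages,
    // so confirm delivery with an application-level echo and a timeout.
    return res != SOCKET_ERROR;
}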
Related
I have C++ classes that handle sending and receiving UDP packets. So far I have used them to send signals (PING, WAKEUP, ...), in other words very small packets, and never had a problem.
Now I'd like to send large blocks of data (i.e. 0.5 MB), but to limit the impact of packet losses I want to be able to do my own fragmentation. First I wrote a function that gives me the MTU size:
int udp_server::get_mtu_size() const
{
    if(f_mtu_size == 0)
    {
        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", sizeof(ifr.ifr_name));
        if(ioctl(f_socket, SIOCGIFMTU, &ifr) == 0)
        {
            f_mtu_size = ifr.ifr_mtu;
        }
        else
        {
            f_mtu_size = -1;
        }
    }
    return f_mtu_size;
}
Note: I know about PMTUD, which this function ignores. As mentioned below, this is to work on a controlled network, so the path MTU won't just change on us.
This function is likely to return 1500 under Linux.
What is really not clear, and seems contradictory between many answers, is whether this 1,500-byte size is just my payload, or whether it also includes headers over which I have no control (i.e. Ethernet header and footer, IPv4 header, UDP header).
From some other questions and answers, it feels like I can send 1,500 bytes of data without fragmentation, assuming all my MTUs are 1,500.
So... Which is true?
My data buffer can have a size equal to MTU
My data buffer must be MTU - sizeof(various-headers/footers)
P.S. The network is a LAN that we control 100%. The packets will travel from one main computer to a set of slave computers using UDP multicast. There is only one 1Gbps switch in between. Nothing more.
The size is very clearly defined in RFC-8085: UDP Usage Guidelines.
https://www.rfc-editor.org/rfc/rfc8085#section-3.2
Here is the relevant bit about the size calculation for the payload:
To determine an appropriate UDP payload size, applications MUST subtract the size of the IP header (which includes any IPv4 optional headers or IPv6 extension headers) as well as the length of the UDP header (8 bytes) from the PMTU size. This size, known as the Maximum Segment Size (MSS), can be obtained from the TCP/IP stack [RFC1122].
So in C/C++, this becomes:
#include <netinet/ip.h> // for iphdr
#include <netinet/udp.h> // for udphdr
int mss(udp.get_mtu_size());
mss -= sizeof(iphdr);
mss -= sizeof(udphdr);
WARNING: The size of the IP header varies depending on options. If you use options that increase the size, your MSS computation must take that into account.
The sizes of the Ethernet header and footer are not included here because those are transparent to the UDP packet.
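As a quick sanity check of that arithmetic, here is a minimal sketch assuming a plain 1,500-byte Ethernet MTU, a 20-byte IPv4 header with no options, and the 8-byte UDP header:
#include <cstddef>

int main()
{
    constexpr std::size_t mtu         = 1500; // typical Ethernet MTU
    constexpr std::size_t ipv4_header = 20;   // no IPv4 options
    constexpr std::size_t udp_header  = 8;
    constexpr std::size_t mss         = mtu - ipv4_header - udp_header;
    static_assert(mss == 1472, "expected UDP payload limit on a plain 1500-byte MTU link");
    return 0;
}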
I want to implement a client for a sensor that sends data over TCP and uses the following protocol:
the message-header starts with the byte sequence 0xAFFEC0C2 of type uint32
the header is 24 bytes long in total (including the start sequence) and contains the size in bytes of the message-body as a uint32
the message-body is sent directly after the header and is not terminated by a delimiter
Currently, I have the following code (assume a connected socket exists):
typedef unsigned char byte;
boost::system::error_code error;
boost::asio::streambuf buf;
std::string magic_word_s = {static_cast<char>(0xAF), static_cast<char>(0xFE),
                            static_cast<char>(0xC0), static_cast<char>(0xC2)};
std::size_t n = boost::asio::read_until(socket_, buf, magic_word_s, error);
if(error)
    std::cerr << boost::system::system_error(error).what() << std::endl;
buf.consume(n);
n = boost::asio::read(socket_, buf, boost::asio::transfer_exactly(20));
const byte* p = boost::asio::buffer_cast<const byte*>(buf.data());
uint32_t size_of_body = *reinterpret_cast<const uint32_t*>(p); // assumes the size field comes first and is in host byte order
unfortunately the documentation for read_until remarks:
After a successful read_until operation, the streambuf may contain additional data beyond the delimiter. An application will typically leave that data in the streambuf for a subsequent read_until operation to examine.
which means that I lose synchronization with the described protocol.
Is there an elegant way to solve this?
Well... as it says... you just "leave" it in the object, or temporarily store it in another one, and handle the whole message (below called a 'packet') once it is complete.
I have a similar approach in one of my projects. I'll explain a little how I did it; that should give you a rough idea of how to handle the packets correctly.
In my read handler (callback) I keep checking whether the packet is complete. The meta-data information (the header, in your case) is temporarily stored in a map associated with the remote partner (map<RemoteAddress, InfoStructure>).
For example it can look like this:
4 byte identifier
4 byte message-length
n byte message
Handle incoming data: check whether the identifier and message-length have already been received, then keep checking whether the received data completes the message-data.
Leave the rest of the packet in the temporary buffer and erase the old data.
Continue handling when the next packet arrives, or check whether the received data already completes the next packet (see the sketch below).
This approach may sound a little slow, but even with SSL I get 10 MB/s+ on a slow machine.
Without SSL, much higher transfer rates are possible.
With this approach, you may also want to take a look at read_some or its asynchronous version.
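Here is a minimal sketch of that "buffer until complete" idea, independent of Asio. The 4-byte identifier plus 4-byte length layout and the feed/parse_packet names are assumptions made up for illustration:
#include <cstdint>
#include <cstring>
#include <string>

std::string pending; // leftover bytes between reads

// Call this with every chunk received from the socket (e.g. from read_some()).
// parse_packet is any callable that takes one complete packet as a std::string.
template <typename Handler>
void feed(const char* data, std::size_t len, Handler parse_packet)
{
    pending.append(data, len);

    for(;;)
    {
        if(pending.size() < 8)                 // identifier + length not complete yet
            return;

        std::uint32_t body_len;
        std::memcpy(&body_len, pending.data() + 4, sizeof(body_len));
        // body_len = ntohl(body_len);         // if the sender uses network byte order

        if(pending.size() < 8 + body_len)      // body not complete yet
            return;

        parse_packet(pending.substr(0, 8 + body_len)); // hand over one full packet
        pending.erase(0, 8 + body_len);                // keep the rest for the next round
    }
}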
I am working on a TUN-based VPN server whose goal is to analyze packets it receives before forwarding them to their destination. Currently I am receiving the IP packets from a TUN interface, and simply sending them off to their destination unmodified.
I understand that analyzing the content of UDP packets would be as simple as stripping the IP and UDP headers. However, to analyze the contents of TCP traffic, I would need to reconstruct the message from multiple IP packets. Is there an easy way to do this without re-implementing TCP? Are there any easily accessible C/C++ libraries meant for this task? I would prefer Linux system libraries and/or open-source, non-viral/non-copyleft libraries.
One thing I have already considered is making a copy of each IP packet, and changing the destination IP of the copy to localhost, so that a different part of my server may receive these TCP requests and responses fully reconstructed and without headers. However, I would not be able to associate destination IPs with traffic content, which is something that I desire.
It is likely that the functionality you need will always be tightly coupled with packet dissection. Good protocol dissectors are really needed to extract the required information. So my suggestion is to use the best open-source tool available: wireshark.org
It provides "Follow TCP stream" functionality.
It doesn't look like you can easily extract part of Wireshark's dissection logic, but at least there is a good example in packet-tcp:
typedef struct _tcp_flow_t {
    guint32 base_seq;        /* base seq number (used by relative sequence numbers)
                              * or 0 if not yet known.
                              */
    tcp_unacked_t *segments;
    guint32 fin;             /* frame number of the final FIN */
    guint32 lastack;         /* last seen ack */
    nstime_t lastacktime;    /* Time of the last ack packet */
    guint32 lastnondupack;   /* frame number of last seen non dupack */
    guint32 dupacknum;       /* dupack number */
    guint32 nextseq;         /* highest seen nextseq */
    guint32 maxseqtobeacked; /* highest seen continuous seq number (without hole in the stream) from the fwd party,
                              * this is the maximum seq number that can be acked by the rev party in normal case.
                              * If the rev party sends an ACK beyond this seq number it indicates TCP_A_ACK_LOST_PACKET condition */
    guint32 nextseqframe;    /* frame number for segment with highest
                              * sequence number
                              */
Basically, there is separate conversation-extraction logic; notice the find_conversation usage:
/* Attach process info to a flow */
/* XXX - We depend on the TCP dissector finding the conversation first */
void
add_tcp_process_info(guint32 frame_num, address *local_addr, address *remote_addr, guint16 local_port, guint16 remote_port, guint32 uid, guint32 pid, gchar *username, gchar *command) {
    conversation_t *conv;
    struct tcp_analysis *tcpd;
    tcp_flow_t *flow = NULL;

    conv = find_conversation(frame_num, local_addr, remote_addr, PT_TCP, local_port, remote_port, 0);
    if (!conv) {
        return;
    }
The actual logic is well documented and available here:
/*
 * Given two address/port pairs for a packet, search for a conversation
 * containing packets between those address/port pairs. Returns NULL if
 * not found.
 *
 * We try to find the most exact match that we can, and then proceed to
 * try wildcard matches on the "addr_b" and/or "port_b" argument if a more
 * exact match failed.
 * ...
 */
conversation_t *
find_conversation(const guint32 frame_num, const address *addr_a, const address *addr_b, const port_type ptype,
                  const guint32 port_a, const guint32 port_b, const guint options)
{
    conversation_t *conversation;

    /*
     * First try an exact match, if we have two addresses and ports.
     */
    if (!(options & (NO_ADDR_B|NO_PORT_B))) {
So what I'm actually suggesting is to use the EPAN library. It is possible to extract this library and use it independently. Please be careful with the license.
You might also be interested in libipq, the iptables userspace packet-queuing library.
#include <linux/netfilter.h>
#include <libipq.h>
Netfilter provides a mechanism for passing packets out of the stack
for queueing to userspace, then receiving these packets back into the
kernel with a verdict specifying what to do with the packets (such as
ACCEPT or DROP). These packets may also be modified in userspace prior
to reinjection back into the kernel. For each supported protocol, a
kernel module called a queue handler may register with Netfilter to
perform the mechanics of passing packets to and from userspace.
The standard queue handler for IPv4 is ip_queue. It is provided as an
experimental module with 2.4 kernels, and uses a Netlink socket for
kernel/userspace communication.
Once ip_queue is loaded, IP packets may be selected with iptables and
queued for userspace processing via the QUEUE target
Here is a brief example of how to decompose a TCP/IP packet:
ipq_packet_msg_t *m = ipq_get_packet(buf);   /* buf was filled by ipq_read() */
struct iphdr *ip = (struct iphdr*) m->payload;
struct tcphdr *tcp = (struct tcphdr*) (m->payload + (4 * ip->ihl));
int port = ntohs(tcp->dest);                 /* convert from network byte order */
status = ipq_set_verdict(h, m->packet_id,
                         NF_ACCEPT, 0, NULL);
if (status < 0)
    die(h);
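Building on that snippet, here is a small self-contained helper (a sketch, not part of libipq) that locates the TCP payload inside an IPv4 packet; field names follow the Linux <netinet/ip.h> and <netinet/tcp.h> headers:
#include <netinet/ip.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>   // ntohs
#include <cstddef>

// Returns a pointer to the TCP payload and stores its length in *out_len.
// Assumes 'packet' points at the start of a well-formed IPv4 packet carrying TCP.
const unsigned char* tcp_payload(const unsigned char* packet, std::size_t* out_len)
{
    const struct iphdr*  ip  = reinterpret_cast<const struct iphdr*>(packet);
    const struct tcphdr* tcp = reinterpret_cast<const struct tcphdr*>(packet + 4 * ip->ihl);

    const std::size_t ip_header_len  = 4 * ip->ihl;   // ihl is in 32-bit words
    const std::size_t tcp_header_len = 4 * tcp->doff; // doff is in 32-bit words

    *out_len = ntohs(ip->tot_len) - ip_header_len - tcp_header_len;
    return packet + ip_header_len + tcp_header_len;
}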
quick intro
If this is not what you are looking for, you might try the Wireshark EPAN library.
I need to send data to another process every 0.02s.
The Server code:
//set socket, bind, listen
while(1){
    sleep(0.02);
    echo(newsockfd);
}

void echo (int sock)
{
    int n;
    char buffer[256]="abc";
    n=send(sock,buffer,strlen(buffer),0);
    if (n < 0) error("ERROR Sending");
}
The Client code:
//connect
while(1)
{
    bzero(buffer,256);
    n = read(sock,buffer,255);
    printf("Recieved data:%s\n",buffer);
    if (n < 0)
        error("ERROR reading from socket");
}
The problem is that the client shows something like this:
Recieved data:abc
Recieved data:abcabcabc
Recieved data:abcabc
....
Why does this happen? When I set the sleep time to:
...
sleep(2)
...
It would be ok:
Recieved data:abc
Recieved data:abc
Recieved data:abc
...
TCP sockets do not guarantee framing. When you send bytes over a TCP socket, those bytes will be received on the other end in the same order, but they will not necessarily be grouped the same way — they may be split up, or grouped together, or regrouped, in any way the operating system sees fit.
If you need framing, you will need to send some sort of packet header to indicate where each chunk of data starts and ends. This may take the form of either a delimiter (e.g., a \n or \0 to indicate where each chunk ends) or a length value (e.g., a number at the head of each chunk to denote how long it is).
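For instance, here is a minimal sketch of the delimiter variant, assuming '\n' terminates each chunk; the extract_lines helper is a made-up name for illustration, not from the code above:
#include <cstddef>
#include <string>
#include <vector>

// Appends newly received bytes to 'pending' and returns every complete,
// '\n'-terminated chunk; partial data stays in 'pending' for the next call.
std::vector<std::string> extract_lines(std::string& pending, const char* data, std::size_t len)
{
    pending.append(data, len);

    std::vector<std::string> complete;
    std::size_t pos;
    while ((pos = pending.find('\n')) != std::string::npos)
    {
        complete.push_back(pending.substr(0, pos)); // one full chunk, without the '\n'
        pending.erase(0, pos + 1);
    }
    return complete;
}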
Also, as other respondents have noted, sleep() takes an integer, so you're effectively not sleeping at all here.
sleep takes an unsigned int as its argument, so sleep(0.02) is actually sleep(0).
unsigned int sleep(unsigned int seconds);
Use usleep(20000) instead; it sleeps in microseconds, and 0.02 s is 20,000 µs:
int usleep(useconds_t usec);
The OS is at liberty to buffer data (i.e., why send multiple small packets when it can send one full packet instead).
Besides, sleep takes an unsigned integer.
The reason is that the OS is buffering data to be sent. It will buffer based on either size or time. In this case, you're not sending enough data, but you're sending it fast enough the OS is choosing to bulk it up before putting it on the wire.
When you add the sleep(2), that is long enough that the OS chooses to send a single "abc" before the next one comes in.
You need to understand that TCP is simply a byte stream. It has no concept of messages or sizes. You simply put bytes on the wire on one end and take them off on the other. If you want to do specific things, then you need to interpret the data in specific ways when you read it. Because of this, the correct solution is to create an actual protocol for this. That protocol could be as simple as "each 3 bytes is one message", or more complicated, where you send a size prefix.
UDP may also be a good solution for you, depending on your other requirements.
sleep(0.02)
is effectively
sleep(0)
because the argument is an unsigned int, so the implicit conversion does it for you. So you have no sleep at all here. (Note that sleep counts whole seconds; to wait 0.02 s you would need something like usleep(20000).) Next, even if you did sleep, there is no guarantee that your messages will be sent in separate frames. If you need that, you should apply some sort of delimiter; I have seen the
'\0'
character used in some implementations.
TCP/IP stacks buffer up data until there's a decent amount of it, or until they decide that there's no more coming from the application, and then send what they've got anyway.
There are two things you will need to do. First, turn off Nagle's algorithm. Second, sort out some sort of framing mechanism.
Turning off Nagle's algorithm will cause the stack to send data immediately, rather than waiting on the off chance that you'll be wanting to send more. It actually leads to lower network efficiency because you're not filling up Ethernet frames, something to bear in mind on Gigabit, where jumbo frames help get the best throughput. But in your case timeliness is more important than throughput.
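Turning off Nagle's algorithm is a one-line setsockopt call; a minimal sketch, assuming a POSIX TCP socket like the one in the question:
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_NODELAY
#include <sys/socket.h>
#include <cstdio>

// Disable Nagle's algorithm so small writes go out immediately.
void disable_nagle(int sock)
{
    int flag = 1;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) < 0)
        std::perror("setsockopt(TCP_NODELAY)");
}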
You can do your own framing by very simple means, e.g. by sending an integer first that says how long the rest of the message will be. At the reading end you would read the integer, and then read that number of bytes. For the next message you'd send another integer saying how long that message is, and so on.
That sort of thing is OK but not hugely robust. You could look at something like ASN.1 or Google Protocol Buffers.
I've used Objective Systems' ASN.1 libraries and tools (they're not free) and they do a good job of looking after message integrity, framing, etc. They're good because they don't read data from a network connection one byte at a time, so the efficiency and speed aren't too bad. Any extra data read is retained and included in the next message decode.
I've not used Google Protocol Buffers myself, but it's possible that they have similar characteristics, and there may be other similar serialisation mechanisms out there. I'd recommend avoiding XML serialisation for speed/efficiency reasons.
I have a TCP client connecting to my server which is sending raw data packets. How, using Boost.Asio, can I get the "whole" packet every time (asynchronously, of course)? Assume these packets can be any size up to the full size of my memory.
Basically, I want to avoid creating a statically sized buffer.
Typically, when you build a custom protocol on top of TCP/IP, you use a simple message format where the first 4 bytes are an unsigned integer containing the message length and the rest is the message data. If you have such a protocol, then the reception loop is as simple as the one below (not sure about the ASIO notation, so it's just the idea):
for(;;) {
    uint32_t len = 0u;
    read(socket, &len, 4);    // may need multiple reads in non-blocking mode
    len = ntohl(len);
    assert(len < my_max_len);
    char* buf = new char[len];
    read(socket, buf, len);   // may need multiple reads in non-blocking mode
    ...
}
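Since the answer above is unsure about the Asio notation, here is a rough synchronous Boost.Asio translation of the same loop, a sketch assuming socket_ is a connected boost::asio::ip::tcp::socket:
#include <boost/asio.hpp>
#include <arpa/inet.h>   // ntohl
#include <cstdint>
#include <vector>

void read_messages(boost::asio::ip::tcp::socket& socket_)
{
    for(;;)
    {
        std::uint32_t len = 0;
        // boost::asio::read() blocks until the buffer is completely filled (or throws).
        boost::asio::read(socket_, boost::asio::buffer(&len, sizeof(len)));
        len = ntohl(len);

        std::vector<char> body(len);
        boost::asio::read(socket_, boost::asio::buffer(body));

        // ... process body ...
    }
}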
Typically, when you do async IO, your protocol should support it.
One easy way is to prefix a byte array with its length at the logical level, and have the reading code buffer up until it has a full buffer ready for parsing.
If you don't do that, you will end up with this logic scattered all over the place (think about reading a null-terminated string, and what it means if you only get part of it each time select/poll returns).
TCP doesn't operate with packets. It provides you one contiguous stream. You can ask for the next N bytes, or for all the data received so far, but there is no "packet" boundary, no way to distinguish what is or is not a packet.