I am trying to set the DF (Don't Fragment) flag for packets sent over UDP.
Looking at W. Richard Stevens's book Unix Network Programming, Volume 1: The Sockets Networking API, I am unable to find how to set this.
I suspect that I would do it with setsockopt() but can't find it in the table on page 193.
Please suggest how this is done.
You do it with the setsockopt() call, by using the IP_DONTFRAG option:
int val = 1;
setsockopt(sd, IPPROTO_IP, IP_DONTFRAG, &val, sizeof(val));
Here's a page explaining this in further detail.
For Linux, it appears you have to use the IP_MTU_DISCOVER option with the value IP_PMTUDISC_DO (or IP_PMTUDISC_DONT to turn it off):
int val = IP_PMTUDISC_DO;
setsockopt(sd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
I haven't tested this; I just looked in the header files and did a bit of a web search, so you'll need to test it.
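For what it's worth, here is a minimal Linux sketch with error checking (equally untested; sd is assumed to be an already-created UDP socket):
#include <netinet/in.h>  /* IPPROTO_IP; on glibc also IP_MTU_DISCOVER and IP_PMTUDISC_* */
#include <stdio.h>
#include <sys/socket.h>

/* Ask the kernel to set DF on all datagrams sent through sd. */
static int set_df(int sd)
{
    int val = IP_PMTUDISC_DO;
    if (setsockopt(sd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val)) < 0) {
        perror("setsockopt(IP_MTU_DISCOVER)");
        return -1;
    }
    return 0;
}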
As to whether there's another way the DF flag could be set:
I find nowhere in my program where the "force DF flag" is set, yet tcpdump suggests it is. Is there any other way this could get set?
From this excellent page:
IP_MTU_DISCOVER: Sets or receives the Path MTU Discovery setting for a socket. When enabled, Linux will perform Path MTU Discovery as defined in RFC 1191 on this socket. The don't fragment flag is set on all outgoing datagrams. The system-wide default is controlled by the ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and disabled on all others. For non SOCK_STREAM sockets it is the user's responsibility to packetize the data in MTU sized chunks and to do the retransmits if necessary. The kernel will reject packets that are bigger than the known path MTU if this flag is set (with EMSGSIZE).
This looks to me like you can set the system-wide default using sysctl:
sysctl net.ipv4.ip_no_pmtu_disc
(the bare key returned error: "ip_no_pmtu_disc" is an unknown key on my system, presumably because the key needs the net.ipv4. prefix). Other than that, I'm not aware of anything else (other than setsockopt() as previously mentioned) that can affect the setting.
If you are working in userland with the intention of bypassing the kernel network stack, building your own packets and headers and handing them to a custom kernel module, there is a better option than setsockopt().
You can actually set the DF flag just like any other field of struct iphdr, defined in linux/ip.h. The 3-bit IP flags are in fact part of the frag_off (Fragment Offset) member of the structure.
When you think about it, it makes sense to group those two things, as the flags are fragmentation related. According to RFC 791, the section describing the IP header states that Fragment Offset is 13 bits long and there are three 1-bit flags. The frag_off member is of type __be16, which can hold 13 + 3 = 16 bits.
Long story short, here's a solution:
struct iphdr ip;
memset(&ip, 0, sizeof(ip)); /* zero the header before setting any fields */
ip.frag_off |= htons(IP_DF);
Here we are setting exactly the DF bit, using the IP_DF mask designed for that purpose (htons because frag_off is a big-endian field on the wire).
IP_DF is defined in net/ip.h (kernel headers, of course), whereas struct iphdr is defined in linux/ip.h.
I agree with paxdiablo's answer.
setsockopt(sockfd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val))
where val is one of:
#define IP_PMTUDISC_DONT 0 /* Never send DF frames. */
#define IP_PMTUDISC_WANT 1 /* Use per route hints. */
#define IP_PMTUDISC_DO 2 /* Always DF. */
#define IP_PMTUDISC_PROBE 3 /* Ignore dst pmtu. */
Here is how ip_no_pmtu_disc sets the default in the kernel source:
if (ipv4_config.no_pmtu_disc)
        inet->pmtudisc = IP_PMTUDISC_DONT;
else
        inet->pmtudisc = IP_PMTUDISC_WANT;
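For completeness, the mode a socket ended up with can be read back with getsockopt() on the same option (a sketch; sd is an illustrative descriptor):
#include <netinet/in.h>  /* IP_MTU_DISCOVER, IP_PMTUDISC_* on glibc */
#include <stdio.h>
#include <sys/socket.h>

/* Print which IP_PMTUDISC_* mode the socket sd currently uses. */
static void print_pmtudisc(int sd)
{
    int val = 0;
    socklen_t len = sizeof(val);
    if (getsockopt(sd, IPPROTO_IP, IP_MTU_DISCOVER, &val, &len) == 0)
        printf("pmtudisc mode: %d\n", val); /* matches the table above */
    else
        perror("getsockopt(IP_MTU_DISCOVER)");
}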
I want to write an algorithm that calculates the MTU. I'm writing all of this with functions from the WinSock2 library.
From what I understood from other posts, people recommend setting the Don't Fragment bit and then sending messages of various lengths until the maximal packet size is found, but what I don't understand is:
How to check for a dropped packet?
E.g.:
/// socket has the "don't fragment" bit set.
int res = sendto(socket, dataToSend, dataToSendLen, flags /* = 0 */, remoteAddr, remoteAddrLen);
/// How to check whether this packet has been dropped due to fragmentation?
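For reference, here is an untested sketch of the probing loop those posts describe, using the names from the snippet above. It assumes the stack rejects an oversize DF datagram locally with WSAEMSGSIZE; a drop at a router further along the path produces no local error, so a real implementation still needs an acknowledgement/timeout scheme on top of this.
#include <winsock2.h>

/* Binary-search the largest payload the local stack will send with DF set.
   dataToSend must be at least 1500 bytes; the 576/1500 probe bounds are
   illustrative, not mandated. */
int probe_mtu_payload(SOCKET socket, const char *dataToSend,
                      const struct sockaddr *remoteAddr, int remoteAddrLen)
{
    int lo = 576, hi = 1500;
    while (lo < hi) {
        int mid = (lo + hi + 1) / 2;
        int res = sendto(socket, dataToSend, mid, 0, remoteAddr, remoteAddrLen);
        if (res == SOCKET_ERROR && WSAGetLastError() == WSAEMSGSIZE)
            hi = mid - 1;   /* too big: rejected before leaving the host */
        else
            lo = mid;       /* accepted by the local stack */
    }
    return lo;
}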
I have been implementing the MSEX CITP protocol - with success so far - in my project for streaming images over the network. I'm using Winsock like so:
v_id = socket(AF_INET, SOCK_DGRAM, 0);
setsockopt(v_id, SOL_SOCKET, SO_BROADCAST, (char*)&isbroad, sizeof(isbroad));
sendto(v_id, temp_buf, v_buffer->o(), 0, address->get(), socksize);
But for images larger than 65k, the spec says that I have to fragment my packets and add a given "preamble".
After some research, from what I understand, I have to set the MTU size and a fragment header, but all my attempts have failed. Can someone point me in the right direction?
MTU means Maximum Transmission Unit: it is the size of the largest PDU (protocol data unit) that can be sent or received in a single network transaction (generally 1500 bytes for Ethernet).
In your case, it is clear that the size of your data will greatly exceed the MTU. You must therefore send your files in several fragments.
This constraint forces you to explicitly manage:
1. The sending/receiving of fragments
2. The reconstruction of the original file by concatenating its different fragments in the correct order
To identify and manage your fragments, you must add headers as metadata to your packets. These headers contain, for example, the size and the sequence number of the fragment:
---------------------------
| headers | data |
---------------------------
Thanks to these headers, you will know the size of the data to read and the sequence number of the fragment, which will allow you to rebuild your image.
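For illustration only, such a header and sending loop could look like this. The field layout here is invented for the example; the real CITP/MSEX preamble is prescribed by the spec, so follow that for the actual format:
#include <stdint.h>
#include <string.h>

/* Hypothetical fragment header; the real CITP/MSEX preamble differs. */
struct frag_header {
    uint32_t image_id;    /* which image the fragment belongs to */
    uint16_t frag_index;  /* position of this fragment           */
    uint16_t frag_count;  /* total number of fragments           */
    uint32_t data_len;    /* payload bytes following this header */
};

/* Split buf into payload-sized chunks, each preceded by a header.
   emit() stands in for your sendto() call; payload must not exceed
   sizeof(packet) - sizeof(struct frag_header), and the header fields
   should be converted to network byte order (htonl/htons) on the wire. */
static void send_fragmented(const uint8_t *buf, uint32_t len, uint32_t payload,
                            void (*emit)(const uint8_t *, uint32_t))
{
    uint8_t packet[2048];
    uint16_t count = (uint16_t)((len + payload - 1) / payload);
    for (uint16_t i = 0; i < count; ++i) {
        uint32_t off = (uint32_t)i * payload;
        uint32_t n = (len - off < payload) ? len - off : payload;
        struct frag_header h = { 0 /* fill in per image */, i, count, n };
        memcpy(packet, &h, sizeof(h));
        memcpy(packet + sizeof(h), buf + off, n);
        emit(packet, (uint32_t)(sizeof(h) + n));
    }
}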
I have C++ classes that handle sending and receiving UDP packets. So far I have used those to send signals (PING, WAKEUP, ...), in other words very small packets, and never had a problem.
Now I'd like to send large blocks of data (i.e. 0.5Mb), but to limit the impact of packet losses, I want to do my own fragmentation. First I wrote a function that gives me the MTU size:
int udp_server::get_mtu_size() const
{
    // note: f_mtu_size must be declared mutable for this const method to compile
    if(f_mtu_size == 0)
    {
        struct ifreq ifr;
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", sizeof(ifr.ifr_name));
        if(ioctl(f_socket, SIOCGIFMTU, &ifr) == 0)
        {
            f_mtu_size = ifr.ifr_mtu;
        }
        else
        {
            f_mtu_size = -1;
        }
    }
    return f_mtu_size;
}
Note: I know about PMTUD, which this function ignores. As mentioned below, this is to work on a controlled network, so the path MTU won't just change on us.
This function is likely to return 1500 under Linux.
What is really not clear, and seems contradictory between many answers, is whether this 1,500-byte size is just my payload, or whether it also includes headers over which I have no control (i.e. Ethernet header + footer, IPv4 header, UDP header).
From some other questions and answers, it feels like I can send 1,500 bytes of data without fragmentation, assuming all my MTUs are 1,500.
So... Which is true?
My data buffer can have a size equal to MTU
My data buffer must be MTU - sizeof(various-headers/footers)
P.S. The network is a LAN that we control 100%. The packets will travel from one main computer to a set of slave computers using UDP multicast. There is only one 1Gbps switch in between. Nothing more.
The size is very clearly defined in RFC-8085: UDP Usage Guidelines.
https://www.rfc-editor.org/rfc/rfc8085#section-3.2
Here is the relevant bit about the size calculation for the payload:
To determine an appropriate UDP payload size, applications MUST subtract the size of the IP header (which includes any IPv4 optional headers or IPv6 extension headers) as well as the length of the UDP header (8 bytes) from the PMTU size. This size, known as the Maximum Segment Size (MSS), can be obtained from the TCP/IP stack [RFC1122].
So in C/C++, this becomes:
#include <netinet/ip.h> // for iphdr
#include <netinet/udp.h> // for udphdr
int mss(udp.get_mtu_size());
mss -= sizeof(iphdr);
mss -= sizeof(udphdr);
WARNING: The size of the IP header varies depending on options. If you use options that increase the size, your MSS computation must take that into account.
The size of the Ethernet header and footer are not included here because those are transparent to the UDP packet. For the typical 1500-byte Ethernet MTU with no IP options, this gives 1500 - 20 - 8 = 1472 bytes of UDP payload.
When using recvmsg() I use MSG_TRUNC and MSG_PEEK like so:
msgLen = recvmsg(fd, &hdr, MSG_PEEK | MSG_TRUNC);
This gives me the size of the buffer to allocate for the next message.
My question is: how do I get the size of the buffer I should allocate for the msg_control field inside the header?
Based on the documentation, you need to allocate the buffer for msg_control with the size msg_controllen. To know the size beforehand, you could call recvmsg(fd, &hdr, MSG_PEEK | MSG_TRUNC) like you did: MSG_PEEK won't remove the message, and MSG_TRUNC makes the call return the size of the message even if the buffer is too small.
A few solutions:
1. Call recvmsg(fd, &hdr, MSG_PEEK | MSG_TRUNC), initialize the buffer in hdr based on the size returned, and call it again without the flags (sketched below).
2. Allocate a buffer big enough, if you know the size of your messages beforehand, and call recvmsg(); afterwards, check the msg_flags field to see whether the message or its control data was truncated (MSG_TRUNC or MSG_CTRUNC).
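A minimal sketch of the first approach (error handling trimmed; fd is your socket). Note that MSG_TRUNC reports the data size; for control data you still check MSG_CTRUNC afterwards, as in the second approach:
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Receive the next datagram into a freshly sized buffer; returns its
   length and stores the buffer in *out (caller frees), or -1 on error. */
static ssize_t recv_sized(int fd, char **out)
{
    struct msghdr hdr;
    memset(&hdr, 0, sizeof(hdr));
    ssize_t n = recvmsg(fd, &hdr, MSG_PEEK | MSG_TRUNC); /* message stays queued */
    if (n < 0)
        return -1;
    *out = malloc((size_t)n);
    struct iovec iov = { *out, (size_t)n };
    hdr.msg_iov = &iov;
    hdr.msg_iovlen = 1;
    return recvmsg(fd, &hdr, 0); /* dequeues the message this time */
}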
I cannot speak for platforms other than macOS (whose core is based upon a FreeBSD core, so it may be no different on BSD systems), and the POSIX standard is not much help either, as it leaves pretty much all details to be defined by the protocol. But the default behavior of recvmsg() on macOS for a UDP socket is to not deliver any control data at all: no matter what buffer size you set msg_control up with on input, msg_controllen will always be 0 on output. If you wish to receive any control data, you first have to explicitly enable that for the socket.
E.g. if you want to know both addresses, source and destination address of a packet (msg_name only gives you the source address of a received packet), then you have to do this:
int yes = 1;
setsockopt(soc, IPPROTO_IP, IP_RECVDSTADDR, &yes, sizeof(yes));
And now you'll get the destination address for IPv4 sockets, documented as follows:
The msg_control field in the msghdr structure points to a buffer that
contains a cmsghdr structure followed by the IP address. The cmsghdr
fields have the following values:
cmsg_len = sizeof(struct in_addr)
cmsg_level = IPPROTO_IP
cmsg_type = IP_RECVDSTADDR
This means you need to provide at least 16 bytes of storage on my system, as struct cmsghdr alone is always 12 bytes there (three times 32 bit) and an IPv4 address is another 4 bytes, which makes 16 bytes together. That value needs to be correctly rounded using the CMSG_SPACE macro, but on my system the macro only pads the data size to a multiple of 32 bit, and 4 bytes already is such a multiple, so CMSG_SPACE(sizeof(struct in_addr)) returns 16 for me.
As I know in advance which options I have enabled and which control data I will receive, I can exactly calculate the required space in advance.
For raw and other more obscure sockets, certain control data may always be included in the output by default, even if not explicitly enabled, but this control data will then always be equal in size and won't fluctuate from packet to packet as the packet payload size does. Thus once you know the correct size, you can rely upon the fact that it won't change, at least not without you enabling/disabling any options.
If your control data buffer was too small, the MSG_CTRUNC flag is set in the output flags, always (even if you don't set any flags on input). In that case you need to increase the control data buffer size and try again (with the next packet, or with the same packet if you used MSG_PEEK as an input flag), until you manage to make the call without getting MSG_CTRUNC in the output. Then look at what msg_controllen says: on input it's the amount of buffer space available, but on output it contains the exact amount of buffer space that was actually used. This is the exact buffer size you need to receive the control data of all future packets of that socket, unless you change options that cause more or less control data to be sent; then you just have to detect that size again, the same way as before.
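As a concrete illustration of the sizing logic for the IP_RECVDSTADDR case above (a sketch; fd is an illustrative socket):
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* One IP_RECVDSTADDR cmsg: header plus an IPv4 address, padded. */
static void recv_with_dstaddr(int fd, char *data, size_t datalen)
{
    char ctrl[CMSG_SPACE(sizeof(struct in_addr))]; /* 16 bytes as computed above */
    struct iovec iov = { data, datalen };
    struct msghdr hdr;
    memset(&hdr, 0, sizeof(hdr));
    hdr.msg_iov = &iov;
    hdr.msg_iovlen = 1;
    hdr.msg_control = ctrl;
    hdr.msg_controllen = sizeof(ctrl);
    if (recvmsg(fd, &hdr, 0) < 0 || (hdr.msg_flags & MSG_CTRUNC))
        return; /* error, or the control buffer was too small after all */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&hdr); c; c = CMSG_NXTHDR(&hdr, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_RECVDSTADDR) {
            struct in_addr dst;
            memcpy(&dst, CMSG_DATA(c), sizeof(dst)); /* the destination address */
        }
    }
}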
For a more complete example, you may also have a look at:
https://stackoverflow.com/a/49308499/15809
I am afraid you can't get that value from the POSIX.1g sockets API. Not sure about all implementations, but it is not possible in Linux. As you may notice, no flow control is provided for ancillary data buffers, so you will need to implement it yourself if you are sending a lot of information between processes. On the other hand, for common use cases you already know at compile time what you are going to receive (but you probably already know this). If you need to implement your own flow control, take into account that, in Linux, ancillary data seems to behave like a stream socket.
However, you can get/set the worst-case buffer length in /proc/sys/net/core/optmem_max; see cmsg(3). So, I guess you could set it to a reasonable value and declare a buffer that big.
I'm writing code to send raw Ethernet frames between two Linux boxes. To test this I just want to get a simple client-send and server-receive.
I have the client correctly making packets (I can see them using a packet sniffer).
On the server side I initialize the socket like so:
fd = socket(PF_PACKET, SOCK_RAW, htons(MY_ETH_PROTOCOL));
where MY_ETH_PROTOCOL is a 2 byte constant I use as an ethertype so I don't hear extraneous network traffic.
When I bind this socket to my interface, I must pass it a protocol again in the sockaddr_ll struct:
socket_address.sll_protocol = htons(MY_ETH_PROTOCOL);
If I compile and run the code like this then it fails. My server does not see the packet. However if I change the code like so:
socket_address.sll_protocol = htons(ETH_P_ALL);
The server then can see the packet sent from the client (as well as many other packets) so I have to do some checking of the packet to see that it matches MY_ETH_PROTOCOL.
But I don't want my server to hear traffic that isn't being sent on the specified protocol so this isn't a solution. How do I do this?
I have resolved the issue.
According to http://linuxreviews.org/dictionary/Ethernet/, referring to the 2-byte field following the MAC addresses:
"values of that field between 64 and 1522 indicated the use of the new 802.3 Ethernet format with a length field, while values of 1536 decimal (0600 hexadecimal) and greater indicated the use of the original DIX or Ethernet II frame format with an EtherType sub-protocol identifier."
so I have to make sure my ethertype is >= 0x0600.
According to http://standards.ieee.org/regauth/ethertype/eth.txt use of 0x88b5 and 0x88b6 is "available for public use for prototype and vendor-specific protocol development." So this is what I am going to use as an ethertype. I shouldn't need any further filtering as the kernel should make sure to only pick up ethernet frames with the right destination MAC address and using that protocol.
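Put together, the fix is just a matter of choosing an EtherType in that range and keeping it consistent between socket() and bind(); an untested sketch, with the interface name as a parameter:
#include <arpa/inet.h>        /* htons */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>

#define MY_ETH_PROTOCOL 0x88b5  /* >= 0x0600, public prototype range */

static int open_proto_socket(const char *ifname)
{
    int fd = socket(PF_PACKET, SOCK_RAW, htons(MY_ETH_PROTOCOL));
    if (fd < 0)
        return -1;
    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family   = AF_PACKET;
    addr.sll_protocol = htons(MY_ETH_PROTOCOL);
    addr.sll_ifindex  = if_nametoindex(ifname);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    return fd;
}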
I've worked around this problem in the past by using a packet filter.
Hand Waving (untested pseudocode)
struct sock_filter my_filter[] = {
    ...   /* BPF instructions; see linux/filter.h */
};

int s = socket(PF_PACKET, SOCK_DGRAM, htons(protocol));

struct sock_fprog pf;
pf.filter = my_filter;
pf.len = sizeof(my_filter) / sizeof(my_filter[0]);
setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));

struct sockaddr_ll sll;
memset(&sll, 0, sizeof(sll));
sll.sll_family = PF_PACKET;
sll.sll_protocol = htons(protocol);
sll.sll_ifindex = if_nametoindex("eth0");
bind(s, (struct sockaddr *)&sll, sizeof(sll));
Error checking and getting the packet filter right is left as an exercise for the reader...
Depending on your application, an alternative that may be easier to get working is libpcap.