I am getting acquainted with BSD sockets, and while flicking through the man page of sendto, I bumped into the MSG_CONFIRM flag, which is quite mysterious to me at the moment.
The description says:
Tell the link layer that forward progress happened: you got a
successful reply from the other side. If the link layer doesn't get
this it will regularly reprobe the neighbor (e.g., via a unicast ARP).
Only valid on SOCK_DGRAM and SOCK_RAW sockets and currently
implemented only for IPv4 and IPv6.
After a quick look at the man page of arp, I understand that setting MSG_CONFIRM prevents the ARP mapping (MAC address ↔ IP address) of the remote machine from being considered stale.
Now I am puzzled, because I can't see any reason why I should not set it, and therefore why they didn't enforce it directly in the library. Why is the application layer expected to deal with anything that happens down at the link layer?
So did I miss anything? When should I set it, and when not?
You should only set the flag if the datagram you're sending is a direct response to a datagram you just received from the same peer.
If you're sending an initial request, or sending a datagram in response to some other event (like user input, or a timeout) then you should not set the MSG_CONFIRM flag.
The reason not to set it is in case the MAC address for the IP changes over time. If you're constantly telling your system not to check, it will continue to send to the same MAC even if the host at that IP address isn't there anymore.
It seems the case for sending it requires a very special situation where you can guarantee things about the recipient of your messages. The overhead of a periodic ARP request is very low, so the benefits are extremely limited.
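For illustration, a minimal sketch of that rule in plain C on Linux, assuming a UDP socket that is already bound (the names and buffer size are placeholders):

    /* Minimal sketch (Linux, plain C): reply to a datagram we just received.
     * MSG_CONFIRM is appropriate here because the peer has just proven it is
     * reachable at this address. */
    #include <sys/socket.h>
    #include <netinet/in.h>

    void echo_reply(int sock)
    {
        char buf[1500];
        struct sockaddr_in peer;
        socklen_t peerlen = sizeof(peer);

        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                             (struct sockaddr *)&peer, &peerlen);
        if (n < 0)
            return;

        /* Direct response to the datagram above: MSG_CONFIRM is fine. */
        sendto(sock, buf, (size_t)n, MSG_CONFIRM,
               (struct sockaddr *)&peer, peerlen);

        /* An unsolicited datagram (timer, user input, initial request)
         * should be sent with flags = 0 so ARP keeps reprobing. */
    }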
This is my attempt at making sense of this after reading the other two answers here.
If you're wondering when to use the MSG_CONFIRM flag: essentially, all it does is tell the underlying ARP layer NOT to periodically verify the MAC of the recipient IP (which would refresh the ARP hardware-MAC-address-to-IP-address mapping), because we are confident the IP we are sending to is the device we think it is, since the message we are sending is a direct response to a message we just received from it! In other words, if in doubt, leave OUT the MSG_CONFIRM flag. Put it in ONLY if the message we are sending is a direct response to a message just received, and we therefore want to make the network a tiny bit more efficient by NOT re-verifying the MAC periodically, at the risk that the destination MAC could change, no longer match the IP address, and leave us sending to the wrong device!
Pros of using MSG_CONFIRM:
It increases the efficiency of the network by NOT having ARP reprobe for the MAC of the destination address.
Cons or risks of using MSG_CONFIRM:
It may mean that if the old device drops off the network, and a new device pops up with the same IP address as the old one but a different hardware MAC, we won't know, because MSG_CONFIRM tells ARP not to reprobe that IP address's MAC.
When MSG_CONFIRM can be used:
When our message out is a direct reply to a message we just received in from that device, so we are sure its MAC is still correct.
MSG_CONFIRM in this case can be thought of as a "confirmation" message to the sender that we got their previous message.
Official documentation for sendto(): https://linux.die.net/man/2/sendto:
MSG_CONFIRM (Since Linux 2.3.15)
Tell the link layer that forward progress happened: you got a successful reply from the other side. If the link layer doesn't get this it will regularly reprobe the neighbor (e.g., via a unicast ARP). Only valid on SOCK_DGRAM and SOCK_RAW sockets and currently only implemented for IPv4 and IPv6. See arp(7) for details.
So, my rule of thumb is: just don't use it. It has little benefit. But, if you choose to use it, only use it
To reply (via a UDP message) directly to the sender of a UDP message just received via recvfrom(), or equivalent, or
In embedded device networks where you have fixed (static) MAC addresses, IP addresses, and ARP MAC-to-IP mapping, and carefully manually control it all, and care about the tiny bit of network efficiency gain achieved by disabling ARP probing.
I'm creating my own server using some protocols: TCP-PULL ok, TCP-PUSH ok, UDP-PULL ok (but I can't serve two clients at the same time!), UDP-PUSH ok (same problem).
Now I need to create the last protocol: Multicast-PUSH, but I can't understand how it works and I really don't know how to code it in C++. I've read about joining a group, and that in multicast there's no connection, so bytes are sent even if no one is connected.
I'm coding in C++, using MFC libraries and CSockets.
Could someone please help?
Thanks!!
Consider an example where one system needs to send the same information to multiple systems. How best to accomplish this? The obvious approach is to have a socket "connection" for each target system. When data is ready to be sent, the sender iterates over each "connection," transmitting the data to the target system. This iteration process has to occur every time a message is sent, and it has to be robust such that if a transmission fails for one system, it doesn't fail for the remaining systems. But the problem is really worse than that, because typically all the systems in a multicast exchange wish to transmit data. This means that each system has to have a "connection" to each and every other system wishing to participate.
This is where multicast comes in. In multicast, the sender sends data once to a specialized IP address and port called the multicast group. From there the network equipment, e.g., routers, take care of forwarding the data to the other systems in the multicast group. To achieve this, all systems wishing to participate in the multicast exchange have to "join" the multicast group, which happens during socket initialization and is used to simply notify the network equipment that the system wishes to participate in the multicast exchange. There is a special range of IPv4 addresses used for multicast - 224.0.0.0 to 239.255.255.255. You must use an IP address within this range and a port number of your choosing in order for multicast to work correctly.
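As a rough sketch in plain BSD-style sockets (rather than MFC), with a placeholder group address and port: the receivers join the group, and the sender simply sends to the group address.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    /* Receiver: bind to the chosen port, then join the multicast group
     * 239.0.0.1:5000 (placeholder values in the 224.0.0.0-239.255.255.255
     * range). recvfrom() on this socket then yields group traffic. */
    int join_group(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);

        struct sockaddr_in local;
        memset(&local, 0, sizeof(local));
        local.sin_family = AF_INET;
        local.sin_port = htons(5000);
        local.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(sock, (struct sockaddr *)&local, sizeof(local));

        /* The join is what tells the network equipment (via IGMP) that this
         * host wants traffic addressed to the group. */
        struct ip_mreq mreq;
        mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1");
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);
        setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

        return sock;
    }

    /* Sender: no join required; a plain sendto() to the group address is enough. */
    void send_to_group(int sock, const char *msg)
    {
        struct sockaddr_in group;
        memset(&group, 0, sizeof(group));
        group.sin_family = AF_INET;
        group.sin_port = htons(5000);
        group.sin_addr.s_addr = inet_addr("239.0.0.1");
        sendto(sock, msg, strlen(msg), 0,
               (struct sockaddr *)&group, sizeof(group));
    }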
Check out the Multicast Wrapper Class at CodeProject for an example of how to do this in MFC.
I am working on Connecting an Embedded Circuit board to PC via TCP.
The board contains a chip which, sadly, doesn't generate any interrupt on receiving data. But it does generate an interrupt on receiving a "Keep-Alive" signal.
Currently I have to poll for data.
Instead, I am thinking that I will send data from the PC and then a Keep-Alive signal. Whenever a Keep-Alive is received, I will read the data too.
I do understand that this might generate false alarms but it's better than continuous polling.
I observed a Keep-Alive packet in Wireshark; it has one byte of data, and it is "00".
I then tried to send a TCP packet with "00" as its data.
Comparing the two in Wireshark, I can see that only the flags section is different.
I have two questions:
(Broadly) How to manually send a Keep-Alive Signal?
How to change that flag setting? (Flags in send and sendto are different)
Update:
I have tried raw sockets, but that didn't help me, or I missed something. I just changed the flag to ACK in the raw socket's header.
RFC 1122 section 4.2.3.6 might be worth reading.
It states that keepalive is an optional feature of the TCP implementation. It also states that the keepalive interval must be configurable and must default to no less than two hours. So manually emitting one from your application isn't a desired feature in general.
Furthermore, it describes details about the implementation, in particular pointing out the sequence number involved. This is one difference visible in your screenshots which you apparently failed to notice: the real keepalive packet has a very high relative sequence number, which is simply the unsigned representation of -1. To reproduce this with raw sockets, I think you'd have to somehow get your hands on the current TCP sequence number of the existing connection. I haven't worked enough with raw sockets to know the details of how to do this.
The supported way to have the system send keepalives periodically is the SO_KEEPALIVE socket option. But that won't be of much use for emitting such a signal at a specific moment in time, I think.
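For completeness, enabling and tuning it looks roughly like this on Linux (the TCP_KEEP* options are Linux-specific and the values are arbitrary examples):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Ask the kernel to probe the peer itself; the application never sees or
     * sends the keepalive segments directly. */
    void enable_keepalive(int sd)
    {
        int on = 1;
        setsockopt(sd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));

        /* Linux-specific tuning: seconds of idle time before the first probe,
         * seconds between probes, and probe count before the connection is
         * declared dead. */
        int idle = 60, intvl = 10, cnt = 5;
        setsockopt(sd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle));
        setsockopt(sd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl));
        setsockopt(sd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt));
    }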
I'm writing a cross-platform client application that uses sockets, written in C++. I'm having problems where the server is doing a hard close on the socket when it's done sending me info.
I've been reading other posts on this topic, and I'm not so much interested in the rights or wrongs of this approach, but it seems the server is either explicitly setting SO_LINGER=0, or that's the default behavior on that system (not sure, it's a Linux box).
I can see (in Wireshark) that the data sent to me is followed within milliseconds by an RST, indicating a hard close by the server. I personally don't agree with this approach, as it should be up to the client to shut down the socket.
The server team are saying there's nothing wrong with that approach (doing a hard close rather than a shutdown); it's typical on servers to avoid accumulating TIME_WAIT sockets. On Windows my select() returns indicating there's something to read (while I haven't read any of this "in transit" data yet).
However, because of the quick arrival of the RST, on Windows recv() returns -1 and I'm seeing a 10054 for the error code (connection reset by peer). This wouldn't be too bad if I could at least get the data that was sent, but it seems that once my client's socket stack sees the RST any unread bytes are no longer made available to me.
On Linux (client), there's no problem. It seems the TCP stack is behaving slightly differently, in that I can read the outstanding bytes before the RST is honoured. I'm having trouble convincing the server guys they have a bug, given that it works for a Linux client.
First off, am I correct? Is this a server-side issue? I can't see that the client end is doing anything wrong, so it must be right?
It seems the server team are adamant that they want to perform the close, and they don't want to have TIME_WAITs, so I was going to push for them to add an SO_LINGER of, say, 2 seconds? Does that sound like it will solve my problem? From what I understand this will stop the server from sending out an RST so soon after sending data, and should give me a chance to read the outstanding bytes.
Found a definitive answer to my own question:
"...Upon reception of RST segment, the receiving side will immediately abort the connection. This statement has more implications than just meaning that you will not be able to receive or send any more data to/from this connection. It also implies that any unread data still in the TCP reception buffer will be lost..." It cites the book "TCP/IP Internetworking Volume II". I don't have that book, so I can only take his word for it. Doesn't seems to discard data on Linux, only Windows...
Olivier Langlois's blog
The side-effect of fiddling with SO_LINGER to force a reset is that all pending data is lost. The fact that you don't receive it is all the proof you need that the server team is wrong to do this.
RFC 793 cited below says 'this command [ABORT] causes all pending SENDs and RECEIVEs to be aborted, ... and a special RESET message to be sent to the TCP on the other side of the connection.' See also W.R. Stevens, TCP/IP Illustrated, Vol. 1, p. 287: 'Aborting a connection provides two features to the application: (1) any queued data is thrown away and the reset is sent immediately, and (2) the receiver of the RST can tell that the other end did an abort instead of a normal close'. There is similar wording, along with an extract from the BSD code that implements it, in Vol. 2.
The TIME_WAIT state only occurs on a socket which sends a FIN before it has received one: see RFC 793. So the server should be waiting for a FIN from the client, with a suitable timeout, rather than resetting. This will also permit the client to do connection pooling.
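A sketch of what that looks like on the server side with plain BSD sockets (the timeout is arbitrary and error handling is omitted; this illustrates the approach described above, not the server team's actual code):

    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* After sending the full response: instead of aborting, wait (with a
     * timeout) for the client's FIN, i.e. until recv() returns 0, and only
     * then close. The client, having closed first, takes the TIME_WAIT. */
    void close_after_client_fin(int sd)
    {
        char buf[4096];
        struct timeval tv = { 5, 0 };   /* arbitrary 5-second timeout */
        setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        ssize_t n;
        while ((n = recv(sd, buf, sizeof(buf), 0)) > 0)
            ;                           /* discard trailing data from the client */

        /* n == 0: client's FIN arrived; n < 0: timeout or error. Either way a
         * normal close() follows, so no RST is generated and no data is lost. */
        close(sd);
    }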
Is there a way to check if the send buffer of a TCP connection is completely empty?
I haven't found anything so far, and I just want to make sure a connection is not closed by my server while there is still data being transmitted to a certain client.
I'm using poll to check if I'm able to send data on a non-blocking socket. But that doesn't tell me whether EVERYTHING in the buffer has been sent, does it?
In Linux, you can query a socket's send queue with ioctl(sd, SIOCOUTQ, &bytes). See man ioctl for details.
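A minimal sketch of that query (Linux-specific; the helper name is just illustrative):

    #include <linux/sockios.h>   /* SIOCOUTQ */
    #include <sys/ioctl.h>

    /* Number of bytes still sitting in the socket's send queue, i.e. handed
     * to the kernel but not yet acknowledged by the peer; 0 means "empty". */
    int unsent_bytes(int sd)
    {
        int bytes = 0;
        if (ioctl(sd, SIOCOUTQ, &bytes) < 0)
            return -1;
        return bytes;
    }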
The information is not completely reliable, in the sense that the data may already have been received by the remote host even though it still counts against the queue, since the buffer cannot be emptied until an ACK is received. You probably should not use it to add another level of flow control on top of TCP.
If the remote host actually closes the connection (or half-closes it), then the socket becomes unwritable, regardless of how much data might have been in the buffer. You can detect this condition by writing 0 bytes to the socket.
The more difficult (and often more likely) condition is the remote host becoming unreachable, because of network issues or because it crashes. In that case, data will pile up in the send buffer, but that can also happen because the remote host's receive buffer is full (perhaps because the process reading the buffer doesn't have enough resources to process its input). In the case of network routing issues, you might get a router notification (an ICMP error), which should make the socket unwritable; unfortunately, there are many network errors which just result in black holes.
Using Winsock, C++, I send and receive the data with send()/recv(), TCP connection. I want to be sure that the data has been delivered to the other party, and wonder if it is recommended to send back some acknowledgment message after (if) receiving data with recv.
Here are two possibilities; please advise which way to go:
If send returns the size of the passed buffer, assume that the data has been delivered at least to the recv function on the other side of the wire. When I say "at least", I mean even if the recv fails there (e.g. due to an insufficient buffer, etc.), I don't care, I just want to be sure I've done my server's part of the work properly - I've sent the data completely (i.e. the data reached the other machine).
Use an additional acknowledgment: after receiving the data with recv, send back some ID of the received packet (part of the header of each data packet sent) signaling the successful receive operation for that packet. If I don't receive such an "acknowledgment message" after some interval, return a failure code from the sender function.
The second approach looks safer, but I don't want to complicate the transfer protocol if it is redundant. Also please note that I'm talking about a TCP connection (which is more reliable by itself than UDP).
Are there any other mechanisms (maybe some other APIs? maybe WSARecv()/WSASend() work differently?) for ensuring that the data was delivered to the recv function on the other side?
If you recommend the second way, could you please give me some code snippet that allows me to use recv with a timeout to receive the acknowledgment? recv is a blocking operation, so it will hang forever if the previous send attempt failed (the other party was not notified). Is there any simple way of using recv with a timeout (without creating a separate thread every time, which would probably be overkill for each and every send operation)?
Also, the amount of data I pass to the send function might be quite big (several megabytes), so how do I choose the timeout for the "acknowledgment message"? Maybe I should "split" large buffers and use several send calls? I think it will get quite complicated; please advise!
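(For context, the kind of timed receive I have in mind is a select() before the recv(), roughly as sketched below; whether that is the right direction is part of what I'm asking. SOCKET s and the return codes are just placeholders.)

    // Rough sketch: wait up to timeout_ms for the acknowledgment, so the
    // following recv() cannot block forever. No extra thread needed.
    #include <winsock2.h>

    int recv_with_timeout(SOCKET s, char *buf, int len, int timeout_ms)
    {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(s, &readfds);

        struct timeval tv;
        tv.tv_sec  = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        int ready = select(0, &readfds, NULL, NULL, &tv);  // first arg ignored by Winsock
        if (ready == 0)
            return -2;                      // timed out: no acknowledgment arrived
        if (ready == SOCKET_ERROR)
            return -1;                      // select() itself failed

        return recv(s, buf, len, 0);        // data is ready, so this won't block
    }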
EDIT: OK, you people are suggesting that TCP/IP stack will handle it (i.e. no manual acknowledgment required), but this is what I found on MSDN page: "The successful completion of a send function does not indicate that the data was successfully delivered and received to the recipient. This function only indicates the data was successfully sent." So even if the TCP mechanism has the ability to ensure data delivery, I can't get that status (success or not) via send() function, or any other Winsock function I know. Do you know any way of getting the status from the TCP layer? Again - return value of send() function seems to be not enough!
========================================================
EDIT 2: OK, I think we agree that even though the TCP protocol handles errors when something goes wrong, the send() function of Winsock is not capable of reporting them (simply because it returns before the actual transmission of data is started by the network driver). So here is the million dollar question: does the send() function of Winsock at least ensure that no other packets will be delivered to the other party until the current packet is? In other words, if the sending fails because of some network failure (not reported by the send() call), and the network failure is then fixed before the next call to send() with the next chunk of data, is it ensured that the previous packet (which failed but was not reported by send()) will be delivered before the next packet? In other words, is there a chance that one particular send() call will fail "silently", so that subsequent send() calls succeed but the first packet is lost? AGAIN - I'm not talking at the TCP level, I'm talking at the Winsock API level!
Why don't you trust your TCP/IP stack to guarantee delivery? After all, that is the whole point of using TCP instead of UDP.
The existing answers here are mostly correct: if you use TCP you really don't need to worry about reliable delivery of your packets to your peer.
But this is a dangerous view for some systems where data integrity must be taken to the next level: the Common Criteria auditing requirement FAU_STG.4.1 requires the ability to prevent auditable events if the audit log might suffer a loss of audit entries. (For example, the Linux auditd(8) audit logging daemon can be configured to place the computer in single-user mode or halt the system completely when there is no more space left for audit logs.) Audit logs from remote systems should probably be maintained until it is known that they have been successfully written to centralized log servers.
Financial transactions would probably be best handled with a more reliable protocol than simple TCP as well -- crediting or debiting accounts would be best handled with a multi-staged protocol to ensure availability of funds, perform the transaction, then report the result of the transaction to the origination point.
TCP allows nearly a gigabyte of in-flight data between two peers (under extreme conditions); depending upon the requirements of your application, you might need to maintain that data at the sending side until you receive positive confirmation from your peer that the data has been properly handled.
Thankfully, most applications aren't this critical; losing a megabyte of data here or there down a socket that reports a closed connection at some point "in the future" really isn't horrible -- we just re-try our HTTP request, or re-attempt the SFTP connection.
Update
A socket will only accept enough data to fill its available window. The window size is negotiated between the two peers during the session handshake. So your calls to send() will begin blocking when the socket's window fills. (The OS might keep letting you add data to its internal buffers too, but at some point the writes will block.) If the peer breaks the connection with a RST or ICMP Unreachable message, a future call to send() will return an error value for Connection Reset or Broken Pipe.
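A small sketch of how that surfaces to the caller (Winsock names; the helper function itself is illustrative, not from the question):

    #include <winsock2.h>

    /* Hand len bytes to the local TCP stack, looping over partial writes.
     * send() blocks (or writes partially) while the peer's window and the
     * local buffers are full; once the peer has reset the connection, a later
     * send() fails (WSAGetLastError() returning e.g. WSAECONNRESET, 10054)
     * rather than silently dropping the data. */
    int send_all(SOCKET s, const char *data, int len)
    {
        int total = 0;
        while (total < len) {
            int n = send(s, data + total, len - total, 0);
            if (n == SOCKET_ERROR)
                return -1;      /* connection is gone; check WSAGetLastError() */
            total += n;
        }
        return total;   /* all bytes accepted by the local stack -- still NOT
                           proof that the peer's application has read them */
    }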
Update 2
I'm not talking at the TCP level, I'm talking at the Winsock API level
This might be the source of confusion. send() has no choice but to adhere to the TCP behavior when used with TCP.
TCP guarantees in-order reliable delivery of a stream of bytes, to the extent that packets can be delivered. (See #Hans's comment about a pony and careless people kicking power cords.) The peer program will see bytes in the correct order they were sent. (Well, okay, TCP also has out-of-band urgent packet delivery, but I haven't actually seen any applications that use it. Using OOB packets, you can get some data out-of-line. Forget I mentioned it.)
If the remote program receives a byte sent on a TCP stream, it reliably received all preceding bytes as well. (Well, there are entire classes of replay attacks that splice together legitimate and fake packets for the remote peer, but those are increasingly difficult on systems with randomized initial sequence numbers. If this is within your threat model, you should be using TLS on top of TCP to provide cryptographically strong tamper evident information. But TLS can't provide better per-packet delivery notification.)
If you use UDP and you care about the data actually being received by the other side, you NEED to use ACKs, but if you don't need the speed of UDP you should use TCP, as it does the ACKing for you.
I think you are overcomplicating this; trust your TCP/IP software stack and the reliable delivery it offers. TCP sockets operate on streams of data, not packets. Also, one call to send does not guarantee one call to recv.