Packet Size modification over Sockets


I am doing socket programming in Qt and I have to design a protocol to transfer data over TCP/IP.
My protocol design is simple: the first byte of the data in every write to the socket is the command. So whenever I write to the socket using socket->write("CDATA"), the first byte ("C" in this case) is a command telling the server to do something.
I want to know whether a single write can be broken into multiple reads on the server. I know there is a buffer size on the server for the read. But can one socket->write() on the client be received in multiple reads on the server even when the write fits within the server's buffer limit?
To make the question concrete: say the read buffer size of the socket on the server is 4096 bytes, and the client calls socket->write("CDATA"). Is there any possibility that the server will receive this in more than one read? I ask because I have a while loop on the server:
while (true) {
    QByteArray str = socket->read(4096);
    // What is the command in the first byte?
    if (!str.isEmpty() && str.at(0) == 'C') {
        // Do something
    }
}
If the data sent by the client is received in more than one read (even though the client sent it in one write) my protocol design will fail.

Now is there any possibility that server will receive this in more than one read?
Yes, TCP/IP can fragment messages any way it likes. TCP is a stateful stream protocol: you are guaranteed that bytes you put in on one end will come out the other end in the same order. IP is connectionless and datagram based. Due to the nature of carrying TCP over IP, circumstances can arise in which data packets are split, merged, or otherwise processed in transit.
You should find a way to make your program robust against the intricacies of network communication. You can:
Use a datagram protocol like UDP (you lose the guarantee of getting data in the order it was sent, and dropped packets become a possibility as well. Today's networks are fairly robust; this is not usually a problem).
[DATAGRAM (size specified in datagram header)]
Always read blocks of a fixed size from the network
[DATA - block of data of some fixed size]
Include the size of the incoming data as a header attached to the front
[LENGTH - 4 byte integer][DATA - block of data of size LENGTH]
Use some sort of delimiter to indicate end-of-data and continue reading until you get it
[DATA - indeterminately sized data][DELIMITER - end-of-data control sequence]
Chances are a library can do this for you, requiring very little code on your part.
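As a rough illustration of the length-header option above, here is a minimal C++ sketch. The names encode_frame and try_decode_frame are made up for this example, and the big-endian byte order of the 4-byte length is an assumption; neither comes from Qt or any other library. The receiver keeps appending whatever each read returns to a buffer and only acts once a whole frame is present.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: prepend a 4-byte big-endian length to the payload.
std::string encode_frame(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>((n >> 24) & 0xFF));
    frame.push_back(static_cast<char>((n >> 16) & 0xFF));
    frame.push_back(static_cast<char>((n >> 8) & 0xFF));
    frame.push_back(static_cast<char>(n & 0xFF));
    frame += payload;
    return frame;
}

// Hypothetical helper: if `buffer` starts with a complete frame, move its
// payload into `payload`, remove the frame from `buffer`, and return true.
// Otherwise leave `buffer` untouched and return false (more bytes needed).
bool try_decode_frame(std::string& buffer, std::string& payload) {
    if (buffer.size() < 4) return false;  // length header not complete yet
    std::uint32_t n = 0;
    for (int i = 0; i < 4; ++i)
        n = (n << 8) | static_cast<unsigned char>(buffer[i]);
    if (buffer.size() - 4 < n) return false;  // body not complete yet
    payload = buffer.substr(4, n);
    buffer.erase(0, 4 + static_cast<std::size_t>(n));
    return true;
}
```

With this, the command byte is simply payload[0] of a decoded frame, no matter how many reads it took to arrive.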

Related

UDP Read entire socket buffer in one shot

I have three components (client, proxy, server). At times, when the proxy is heavily loaded, its socket buffers (configured to, say, 1 MB) fill up. Is there a way to read the entire 1 MB buffer in one shot and then process it?
FYI:
None of the datagrams exceeds the MTU size, and all are in a predefined structured format that also includes the length of each packet.
The proxy routes data between client and server. I tried using producer and consumer threads, but that did NOT solve the problem.

Short answer: no.
Long answer:
The Berkeley-style socket implementation lets you receive or send only one packet per call. Therefore it is not possible to read a complete network stream and replay it at the other side.
One reason is that your UDP socket can receive data from several sources. The interface has to pass meta information, such as the sender's socket address and at least the packet size, to the caller. If you received a whole bunch of packets at once, you would still have to parse that bunch, pick out the packets that meet your criteria, and finally build the bunch of packets to send.
Since you have to be able to check whether each packet is really expected, you need a function that reads a single packet from the bunch. That function is recvfrom.

Receive packet by packet data from TCP socket?

I have a TCP socket on which I receive a video stream. I want to receive the data packet by packet so that I can remove the packet header and keep only the stream data. How can I do this?
Any help will be appreciated.

You can't. TCP doesn't work with packets, messages, etc. TCP works with bytes: you get a stream of bytes. The problem is that there's no guarantee regarding the number of bytes you'll get each time you read from a socket. The usual way to handle this:
When you want to send a "packet", include its length as the first thing.
When you read from the socket, make sure you keep reading until you have at least that many bytes.
Your message could be:
|Message Length:4bytes|Additional header Information:whatever1|Message Data:whatever2|
What you'll then have to do is read 4 bytes and then read as much as those 4 bytes tell you. Then you'll be able to strip the header and get the data.

As others have mentioned, TCP is a streaming protocol. This means from an API point of view, there is no concept of "packet". As a user, all you can expect is a stream of data.
Internally, TCP will break the stream into segments that can be placed into IP packets. These packets will be sent along with control data, over IP, to the remote end. The remote end will receive these IP packets. It may discard certain IP packets (in the case of duplicates), reorder the packets or withhold data until earlier packets have arrived. All this is internal to TCP meaning the concept of a "TCP packet" is meaningless.
You might be able to use raw sockets to receive the raw IP packets but this will mean you will have to reimplement much of the TCP stack (like sending ACKs and adjusting window size) to get the remote end to perform correctly. You do not want to do this.
UDP, on the other hand, is a datagram protocol. This means that the user is made aware of how the data is sent over the network. If the concept of packets or datagrams is important to you, you will need to build your own protocol on top of UDP.

TCP is a stream protocol and it doesn't guarantee that a call to the socket read function returns exactly one complete packet. UDP and SCTP are packet-oriented protocols and do guarantee this. With TCP you can get part of a packet or several packets at once. You have to build your own application protocol on top of TCP and fragment/defragment messages manually.

TCP is a streaming protocol. You get bytes with no message boundaries. The solution is to buffer all your reads and extract/process full video packets from the buffer.
Algorithm:
1) Initialize an empty buffer.
2) Examine the buffer for a complete packet.
3) If found, remove the complete packet from the beginning of the buffer, process it, and go to #2.
4) If not found, append data from a recv() to the buffer and go to #2.
What a "complete packet" contains should be defined by the video streaming protocol.
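A possible shape for that buffer in C++ is sketched below. Since the definition of a "complete packet" belongs to the protocol, this example takes it as a caller-supplied function; PacketBuffer and PacketLength are names invented here, not part of any library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Caller-supplied rule (hypothetical): return the length of the first
// complete packet at the start of the buffer, or 0 if there is none yet.
using PacketLength = std::function<std::size_t(const std::string&)>;

class PacketBuffer {
public:
    explicit PacketBuffer(PacketLength complete_length)
        : complete_length_(std::move(complete_length)) {}

    // Append whatever a recv() call returned, however small.
    void append(const char* data, std::size_t n) { buffer_.append(data, n); }

    // Remove and return every complete packet currently in the buffer.
    // Leftover bytes of a partial packet stay buffered for the next call.
    std::vector<std::string> take_packets() {
        std::vector<std::string> packets;
        for (;;) {
            const std::size_t n = complete_length_(buffer_);
            if (n == 0 || n > buffer_.size()) break;
            packets.push_back(buffer_.substr(0, n));
            buffer_.erase(0, n);
        }
        return packets;
    }

    // Number of buffered bytes that do not yet form a complete packet.
    std::size_t pending() const { return buffer_.size(); }

private:
    PacketLength complete_length_;
    std::string buffer_;  // starts empty
};
```

In use, the receive loop would call append() after each recv() and then handle whatever take_packets() returns, which may be zero, one, or several packets.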

Are you sure about this approach? In my opinion this "preprocessing" will introduce additional overhead to the system. And of course this is handled by a lower layer (read about the OSI model), so it is not easy to change. Note that most of the existing streaming protocols are already optimized for the best performance.

Receiving all data sent with C sockets

If I write a server, how can I implement the receive function to get all the data sent by a specific client if I don't know how that client sends the data?
I am using a TCP/IP protocol.

If you really have no protocol defined, then all you can do is accept groups of bytes from the client as they arrive. Without a defined protocol, there is no way to know that you have received "all the bytes" that the client sent, since there is always the possibility that a network failure occurred somewhere between the client and your server during transmission, causing the last part of the stream not to arrive at the server. In that case, you would get the usual end-of-connection indication from the TCP socket (e.g. recv() returning 0, or failing with an error such as ECONNRESET; note that EWOULDBLOCK on a non-blocking socket only means that no data is available yet), so you would know that you aren't going to receive any more data from the client (because the TCP connection is now disconnected)... but that isn't quite the same thing as knowing you have received all of the data the client meant for you to receive.
Depending on your application, that might be good enough. If not, then you'll have to work out a protocol, and trust that your clients will abide by the rules of that protocol. Having the client send a header first saying how many bytes it plans to send is a good approach; having it send some special "Okay, that's all I meant to send" indicator is also possible (although if you do it that way, you have to watch out for false positives if the special indicator could appear by chance inside the data itself).

One call to send does not equal one call to recv. Either send a header so the receiver knows how much data to expect, or send some sort of sentinel value so that the receiver knows when to stop reading.

It depends on how you want to design your protocol.
ASCII protocols usually use a special character to delimit the end of the data, while binary protocols usually send the length of the data first as a fixed-size integer (both sides know this size) and then the variable-length data follows.
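For the delimiter style, a small sketch in C++: keep a receive buffer, and after every read pull out the complete lines. take_lines is a name invented for this example, and '\n' as the delimiter is an assumption; the payload must never contain the delimiter, or it has to be escaped.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: move every complete delimiter-terminated message out
// of `buffer` and return them without the delimiter. A trailing partial
// message stays in `buffer` until more bytes arrive.
std::vector<std::string> take_lines(std::string& buffer, char delimiter = '\n') {
    std::vector<std::string> lines;
    std::size_t start = 0;
    for (;;) {
        const std::size_t end = buffer.find(delimiter, start);
        if (end == std::string::npos) break;  // no complete message left
        lines.push_back(buffer.substr(start, end - start));
        start = end + 1;                      // skip past the delimiter
    }
    buffer.erase(0, start);                   // drop what was consumed
    return lines;
}
```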

You can combine the size with your data in one buffer and call send once. People usually use the first 2 bytes for the size of the data in a packet, like this:
|size N (2 bytes) | data (N bytes) |
In this case, your custom data can be up to 65535 bytes long.
Since TCP does not preserve message boundaries, it doesn't matter how many times you call send. On the receiving side, you have to keep calling receive until you have the 2-byte size N, and then keep calling receive until you have all N bytes of the data that was sent.
UPDATE: This is just a sample to show how to check message boundaries in TCP. Security/encryption is a whole different story and deserves a new thread. That said, do not simply copy this design. :)

TCP is stream-based, so there is no concept of a "complete message": it's given by a higher-level protocol (e.g. HTTP) or you'd have to invent it yourself. If you were free to use UDP (datagram-based), then there would be no need to call send() or receive() multiple times per message.
The newer SCTP protocol also supports the concept of a message natively.
With TCP, to implement messages, you have to tell the receiver the size of the message. It can be the first few bytes (commonly 2, since that allows messages up to 64K -- but you have to be careful of byte order if you may be communicating between different systems), or it can be something more complicated. HTTP, for example, has a whole set of rules by which the receiver determines the length of the message. One of them is the Content-Length HTTP header, which contains a string representing the number of bytes in the body of the message. Header-only HTTP messages are simply delimited by a blank line. As you can see, there are no easy (or standard) answers.
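On the byte-order point: one way to stay independent of the host's endianness is to build the 2-byte length with shifts instead of copying the integer's memory. A small sketch follows; put_u16_be, get_u16_be and make_message are names invented for this example, and big-endian ("network order") is an assumed convention that both sides must agree on.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Store a 16-bit value most significant byte first. Shifts operate on the
// value, not on memory, so the result is the same on any host.
void put_u16_be(std::uint16_t value, unsigned char out[2]) {
    out[0] = static_cast<unsigned char>(value >> 8);
    out[1] = static_cast<unsigned char>(value & 0xFF);
}

// Inverse of put_u16_be.
std::uint16_t get_u16_be(const unsigned char in[2]) {
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}

// Hypothetical framing helper: 2-byte length followed by the body.
// Returns false if the body does not fit in a 16-bit length.
bool make_message(const std::string& body, std::string& out) {
    if (body.size() > 0xFFFF) return false;
    unsigned char header[2];
    put_u16_be(static_cast<std::uint16_t>(body.size()), header);
    out.assign(reinterpret_cast<const char*>(header), 2);
    out += body;
    return true;
}
```

The standard htons()/ntohs() functions do the same conversion for 16-bit values if you prefer to work with the integer directly.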

TCP is a stream based protocol. As such there is no concept of length of data built into TCP in the same way as there is no concept of data length for keyboard input.
It is therefore up to the higher level protocol to specify the end of the message. This can be done by including the packet length in the protocol or specifying a special end-of-message byte sequence.
For example, HTTP headers are terminated by a double \r\n sequence, and the length of the message body can be obtained from the Content-Length header.

What should I know about UDP programming?

I don't mean how to connect to a socket. What should I know about UDP programming?
Do I need to worry about bad data in my socket?
Should I assume that if I send 200 bytes I may get 120 and 60 bytes separately?
Should I worry about another connection sending me bad data on the same port?
If data doesn't arrive, how long may I typically not see data for (250 ms? 1 second? 1.75 sec?)
What do I really need to know?

"I should assume if I send 200 bytes I may get 120 and 60 bytes separately?"
When you're sending UDP datagrams your read size will equal your write size. This is because UDP is a datagram protocol, vs TCP's stream protocol. However, you can only write data up to the size of the MTU before the packet could be fragmented or dropped by a router. For general internet use, the safe MTU is 576 bytes including headers.
"I should worry about another connection sending me bad data on the same port?"
You don't have a connection, you have a port. You will receive any data sent to that port, regardless of where it's from. It's up to you to determine if it's from the right address.
"If data doesn't arrive typically how long may I (typically) not see data for (250 ms? 1 second? 1.75 sec?)"
Data can be lost forever, data can be delayed, and data can arrive out of order. If any of those things bother you, use TCP. Writing a reliable protocol on top of UDP is a highly nontrivial task, and for almost all applications there is no reason to do so.

"Should I worry about another connection sending me bad data on the same port?"
Yes, you should worry about it. Any application can send data to your open UDP port at any time. One of the big uses of UDP is many-to-one style communication, where you multiplex communications with several peers on a single port, using the address passed back by recvfrom to differentiate between peers.
However, if you want to avoid this and only accept packets from a single peer, you can actually call connect on your UDP socket. This causes the IP stack to reject packets coming from any host:port combination (socket) other than the one you want to talk to.
A second advantage of calling connect on your UDP socket is that in many OS's it gives a significant speed / latency improvement. When you call sendto on an unconnected UDP socket the OS actually temporarily connects the socket, sends your data and then disconnects the socket adding significant overhead.
A third advantage of using connected UDP sockets is it allows you to receive ICMP error messages back to your application, such as routing or host unknown due to a crash. If the UDP socket isn't connected the OS won't know where to deliver ICMP error messages from the network to and will silently discard them, potentially leading to your app hanging while waiting for a response from a crashed host ( or waiting for your select to time out ).

Your packet may not get there.
Your packet may get there twice or even more often.
Your packets may not be in order.
You have a size limitation on your packets imposed by the underlying network layers. The packet size may be quite small (possibly 576 bytes).
None of this says "don't use UDP". However you should be aware of all the above and think about what recovery options you may want to take.

Fragmentation and reassembly happens at the IP level, so you need not worry about that (Wikipedia). (This means that you won't receive split or truncated packets).
UDP packets have a checksum for the data and the header, so receiving bogus data is unlikely, but possible. Lost or duplicate packets are also possible. You should check your data in any case anyway.
There's no congestion control, so you may wish to consider that, if you plan on clogging the tubes with a lot of UDP packets.

UDP is a connectionless protocol. Sending data over UDP can get to the receiver, but can also get lost during transmission. UDP is ideal for things like broadcasting and streaming audio or video (i.e. a dropped packet is never a problem in those situations.) So if you need to ensure your data gets to the other side, stick with TCP.
UDP has less overhead than TCP and is therefore faster. (TCP needs to build a connection first and also checks data packets for data corruption which takes time.)
Fragmented UDP packets (i.e. packets bigger than about half a KB) will probably be dropped by routers, so split your data into small chunks before sending it. (In some cases, the OS can take care of that.) Note that it is always a whole packet that might make it, or not. Half packets aren't processed.
Latency over long distances can be quite big. If you want to do retransmission of data, I would go with something like 5 to 10 times the average latency over the current connection. (You can measure the latency by sending and receiving a few packets.)
Hope this helps.
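The "5 to 10 times the average latency" rule of thumb above is easy to put into code. A tiny sketch, where retransmit_timeout is an invented name and the 1000 ms fallback for "no measurements yet" is purely an assumption of this example:

```cpp
#include <chrono>
#include <vector>

// Hypothetical helper: average the measured round-trip times and multiply
// by `factor` (the answer suggests something between 5 and 10).
std::chrono::milliseconds retransmit_timeout(
        const std::vector<std::chrono::milliseconds>& rtt_samples,
        int factor = 5,
        std::chrono::milliseconds fallback = std::chrono::milliseconds(1000)) {
    if (rtt_samples.empty()) return fallback;  // nothing measured yet
    std::chrono::milliseconds total(0);
    for (const auto& sample : rtt_samples) total += sample;
    const auto count =
        static_cast<std::chrono::milliseconds::rep>(rtt_samples.size());
    return (total / count) * factor;
}
```

For example, samples of 40, 60 and 50 ms average to 50 ms, which gives a 250 ms timeout with the default factor.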

I won't follow suit with the other people who answered this, they all seem to push you toward TCP, and that's not for gaming at all, except maybe for login/chat info. Let's go in order:
Do I need to worry about bad data in my socket?
Yes. Even though UDP contains an extremely simple checksum for routers and such, it is not 100% effective. You can add your own checksum, but most of the time UDP is used when reliability is already not an issue, so data that doesn't conform should just be dropped.
I should assume if I send 200 bytes I may get 120 and 60 bytes separately?
No, UDP is direct data write and read. However, if the data is too large, some routers will truncate and you lose part of the data permanently. Some have said roughly 576 bytes with header; I personally wouldn't use more than 256 bytes (a nice round power of two).
Should I worry about another connection sending me bad data on the same port?
UDP listens for any data from any computer on a port, so in this sense yes. Also note that UDP is primitive and the sender address can be faked using raw packets, so you should use some sort of "key" so that the listener can verify the sender in addition to checking its IP.
If data doesn't arrive, how long may I typically not see data for (250 ms? 1 second? 1.75 sec?)
Data sent over UDP is usually disposable, so if you don't receive data, it can easily be ignored. However, sometimes you want "semi-reliable" delivery without the "ordered reliable" behavior of TCP; in that case 1 second is a good estimate for treating a packet as dropped. You can number your packets on a rotation and write your own ACK communication: when a packet is received, the receiver records its number and sends back a bitfield letting the sender know which packets it received. You can read this unfinished document for more information (although unfinished, it still yields valuable info):
http://gafferongames.com/networking-for-game-programmers/
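To make the "numbered packets plus ACK bitfield" idea a bit more concrete, here is one possible C++ sketch. It is not taken from the linked article; the AckTracker name, the 16-bit sequence numbers and the 32-bit window are all assumptions of this example. The receiver remembers the newest sequence number seen plus one bit for each of the 32 packets before it, and sends that pair back to the sender.

```cpp
#include <cstdint>

// Hypothetical receiver-side tracker. `newest` is the most recent sequence
// number seen; bit i of `bits` is set if packet (newest - 1 - i) arrived.
struct AckTracker {
    bool any = false;
    std::uint16_t newest = 0;
    std::uint32_t bits = 0;

    // True if `a` is newer than `b`, allowing for 16-bit wrap-around.
    static bool newer(std::uint16_t a, std::uint16_t b) {
        return a != b && static_cast<std::uint16_t>(a - b) < 0x8000;
    }

    // Record that packet `seq` arrived.
    void on_receive(std::uint16_t seq) {
        if (!any) { any = true; newest = seq; bits = 0; return; }
        if (newer(seq, newest)) {
            const std::uint16_t shift = static_cast<std::uint16_t>(seq - newest);
            // Existing history moves `shift` places further into the past.
            bits = (shift >= 32) ? 0 : (bits << shift);
            // The previous `newest` is now `shift` packets old.
            if (shift <= 32) bits |= (1u << (shift - 1));
            newest = seq;
        } else if (seq != newest) {
            const std::uint16_t age = static_cast<std::uint16_t>(newest - seq);
            if (age <= 32) bits |= (1u << (age - 1));  // late arrival
        }
    }

    // Sender side: does the pair (newest, bits) acknowledge packet `seq`?
    static bool acked(std::uint16_t seq, std::uint16_t newest,
                      std::uint32_t bits) {
        if (seq == newest) return true;
        const std::uint16_t age = static_cast<std::uint16_t>(newest - seq);
        return age >= 1 && age <= 32 && ((bits >> (age - 1)) & 1u) != 0;
    }
};
```

The sender can then treat any packet that is still unacknowledged after its timeout (the answer suggests about 1 second) as dropped and decide whether resending it is worth it.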

The big thing to know when attempting to use UDP is:
Your packets might not all make it over the line, which means there is going to be possible data corruption.
If you're working on an application where 100% of the data needs to arrive reliably to provide functionality, use TCP. If you're working on an application where some loss is allowable (streaming media, etc.), then go for UDP, but don't expect everything to get from one end of the pipe to the other intact.

One way to look at the difference between applications appropriate for UDP vs. TCP is that TCP is good when data delivery is "better late than never", UDP is good when data delivery is "better never than late".
Another aspect is that the stateless, best-effort nature of most UDP-based applications can make scalability a bit easier to achieve. Also note that UDP can be multicast while TCP can't.

In addition to don.neufeld's recommendation to use TCP.
For most applications TCP is easier to implement. If you need to maintain packet boundaries in a TCP stream, a good way is to transmit a two byte header before the data to delimit the messages. The header should contain the message length. At the receiving end just read two bytes and evaluate the value. Then just wait until you have received that many bytes. You then have a complete message and are ready to receive the next 2-byte header.
This gives you some of the benefit of UDP without the hassle of lost data, out-of-order packet arrival etc.

And don't assume that if you send a packet it got there.

If there is a packet size limitation imposed by some router along the way, your UDP packets could be silently truncated to that size.

Two things:
1) You may or may not receive what was sent.
2) Whatever you receive may not be in the same order it was sent.

Confusion about UDP/IP and sendto/recvfrom return values

I'm working with UDP sockets in C++ for the first time, and I'm not sure I understand how they work. I know that sendto/recvfrom and send/recv normally return the number of bytes actually sent or received. I've heard this value can be arbitrarily small (but at least 1), and depends on how much data is in the socket's buffer (when reading) or how much free space is left in the buffer (when writing).
If sendto and recvfrom only guarantee that 1 byte will be sent or received at a time, and datagrams can be received out of order, how can any UDP protocol remain coherent? Doesn't this imply that the bytes in a message can be arbitrarily shuffled when I receive them? Is there a way to guarantee that a message gets sent or received all at once?

It's a little stronger than that. UDP does deliver a full packet; the packet can be arbitrarily small, but what you receive has to include all the data that was sent in that packet. But there's also a size limit: if you want to send a lot of data, you have to break it into packets and be able to reassemble them yourself. Delivery is also not guaranteed, so you have to check to make sure everything comes through.
But since you can implement all of TCP with UDP, it has to be possible.
Usually, what you do with UDP is make small packets that are discrete.
Metaphorically, think of UDP like sending postcards and TCP like making a phone call. When you send a postcard, you have no guarantee of delivery, so you need to do something like have an acknowledgement come back. With a phone call, you know the connection exists, and you hear the answers right away.

Actually you can send a UDP datagram of 0 bytes length. All that gets sent is the IP and UDP headers. The UDP recvfrom() on the other side will return with a length of 0. Unlike TCP this does not mean that the peer closed the connection because with UDP there is no "connection".

No. With sendto you send out packets, which can contain down to a single byte.
If you send 10 bytes in a single sendto call, these 10 bytes get sent in a single packet, which will be received intact, as you would expect.
Of course, if you decide to send those 10 bytes one by one, each of them with a sendto call, then indeed you send and receive 10 different packets (each one containing 1 byte), and they could be in arbitrary order.
It's similar to sending a book via postal service. You can package the book as a whole into a single box, or tear down every page and send each one as an individual letter. In the first case, the package is bulkier but you receive the book as a single, ordered entity. In the latter, each package is very light, but good luck reading that ;)

I have a client program that uses a blocking select (NULL timeout parameter) in a thread dedicated to waiting for incoming data on a UDP socket. Even though it is blocking, the select would sometimes return with an indication that the single read descriptor was "ready". A subsequent recvfrom returned 0.
After some experimentation, I have found that on Windows at least, sending a UDP packet to a port on a host that's not expecting it can result in a subsequent recvfrom getting 0 bytes. I suspect some kind of rejection notice might be coming from the other end. I now use this as a reminder that I've forgotten to start the process on the server that looks for the client's incoming traffic.
BTW, if I instead "sendto" a valid but unused IP address, then the select does not return a ready status and blocks as expected. I've also found that blocking vs. non-blocking sockets makes no difference.
