I've run into an issue when sending large segments of data through a TCP socket. Having spent about 3 days trying to pick it apart without success, I decided it was best to turn here for help and advice.
My Project
I've written a basic HTTP server which (slightly irrelevant) can run Lua scripts to output pages. This all works perfectly fine under Windows (32-bit).
The Problem
When sending medium/large files (anything from roughly 8000 bytes upward appears to have issues) over the TCP socket on Ubuntu Linux (64-bit), they appear to cut out at different lengths: the result displayed in the browser is between 8000 and 10200 bytes. When I check the return value of the send function, it's exactly 9926 bytes every time the send ends. No error.
Smaller files send absolutely fine, and there are no issues under Windows. Going on this information, I thought it could be a buffer size issue, so I ran
cat /proc/sys/net/ipv4/tcp_mem
which output 188416 192512 196608
Those numbers are far above 9926, so I assume that isn't the problem.
I'm using CSimpleSockets as a socket library and haven't had any issues with it before. In case the issue is inside this library, here is what I found its send function uses under Unix:
#define SEND(a,b,c,d) send(a, (const int8 *)b, c, d)
send(socket, buffer, bytestosend, 0);
buffer gets cast from a const char * to const unsigned char* to const int8 * before getting passed to the OS to be sent.
OK, I think that covers everything I checked. If you need any more information or I've missed anything glaringly obvious I'll do my best to provide. Thanks for your help!
Your problem is that send does not guarantee to send the amount of data passed to it.
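It has internal buffers that can fill, socket parameters that affect buffers, etc. You need to note how many bytes were actually sent and then send the remaining data, repeating until everything has gone out. On a blocking socket you can simply call send again, since it blocks until there is room in the buffer; on a non-blocking socket, wait until the socket is writable (with select or poll, for example) rather than sleeping for a few milliseconds. There is no automatic way to do this, so you'll need to write a bit of logic that advances your buffer by the number of bytes that were actually sent.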
Are you using blocking or non-blocking sockets? If you're using non-blocking sockets, you must (and with blocking sockets, you should) check for a short send (one where the return value is fewer than the number of bytes you meant to send).
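A minimal sketch of that retry loop. The names send_all and send_some are made up for illustration; send_some stands in for whatever actually writes to the socket, and is assumed to return the number of bytes it accepted, or a value <= 0 on failure.

```cpp
#include <cassert>
#include <cstddef>

// Calls send_some(ptr, len) repeatedly until every byte has been accepted.
// send_some must return how many bytes it took (> 0), or <= 0 on failure.
// Returns true only if all len bytes were handed over.
template <typename SendSome>
bool send_all(SendSome&& send_some, const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t sent = 0;
    while (sent < len) {
        long n = send_some(data + sent, len - sent);
        if (n <= 0) {
            return false;  // error or closed connection; the caller checks errno
        }
        sent += static_cast<std::size_t>(n);  // skip past what was accepted
    }
    return true;
}
```

With a real blocking socket, send_some would be a small lambda that forwards to send(fd, p, n, 0) and returns its result. A send that stops at 9926 bytes then just means one more trip around the loop.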
I'm experiencing a frustrating behaviour of Windows sockets that I can't find any info on, so I thought I'd try here.
My problem is as follows:
I have a C++ application that serves as a device driver, communicating with a serial device connected through a serial to TCP/IP converter.
The serial protocol requires a lot of single-byte messages to be communicated between the device and my software. I noticed that these small messages are only sent about 3 times after startup, after which they are no longer actually transmitted (checked with Wireshark). All the while, the send() method keeps returning > 0, indicating that the message has been copied to its send buffer.
I'm using blocking sockets.
I discovered this issue because this particular driver eventually has to drop its connection when the send buffer is completely filled (select() fails due to this after about 5 hours, but it happens much sooner when I reduce the SO_SNDBUF size).
I checked, and noticed that when I call send with messages of 2 bytes or larger, transmission never fails.
Any input would be very much appreciated, I am out of ideas how to fix this.
This is a rare case when you should set TCP_NODELAY so that the sends are written individually, not coalesced. But I think you have another problem as well. Are you sure you're reading everything that's being sent back? And acting on it properly? It sounds like an application protocol problem to me.
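For reference, a sketch of turning the option on. This is the POSIX form; the helper name set_tcp_nodelay is made up. On Winsock the setsockopt call has the same shape, but the headers differ and the value pointer is passed as a const char *.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disables Nagle's algorithm on a TCP socket so small writes are not
// held back waiting to be coalesced with later ones.
// Returns true on success; on failure errno holds the reason.
bool set_tcp_nodelay(int fd) {
    assert(fd >= 0);
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}
```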
EDIT!
I just read that read will block until the buffer is full. How on earth do I receive smaller packets without having to send 1MB (my max buffer length) each time? What if I want to send arbitrary-length messages?
In Java you seem to be able to just send a char array without any worries. But in C++ with the boost sockets I seem to either have to keep calling socket.read(...) until I think I have everything or send my full buffer length of data which seems wasteful.
Old original question for context.
Yet again Boost sockets have me completely stumped. I am using boost::asio::ssl::stream<boost::asio::ip::tcp::socket> socket; I used the Boost SSL example for guidance, but I have dedicated a thread to it rather than using the async calls.
The first socket.read_some(...) on the socket is fine and reads all the bytes. After that it reads 1 byte and then all the rest on the next socket.read_some(...), which had me really confused. I then noticed that read_some typically has this behaviour. So I moved to boost::asio::read, as the socket does not have a member function read, which surprised me; boost::asio has a free read function that takes a socket and a buffer. However, it blocks permanently.
// Blocking read.
// Now: blocks permanently and never seems to return.
bytesread = boost::asio::read(socket, buffer(readBuffer, max_length));
// Was: after the first read this always reads one byte and needs another
// socket.read_some(...) call to read the rest.
// bytesread = socket.read_some(buffer(readBuffer, max_length));
What do I need to do to make boost::asio::read(...) work?
Note: I have used Wireshark to make sure that the server is not sending the data broken up. The server is not faulty.
Read with read_some() in a loop merging the buffers until you get a complete application message. Assume you can get back anything between 1 byte and full length of your buffer.
Regarding "knowing when you are finished" - that goes into your application level protocol, which could use either delimited messages, fixed length messages, fixed length headers that tell payload length, etc.
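As a rough illustration of the "fixed length header that tells payload length" option, here is a small decoder that can be fed whatever each read_some call returns. The class name FrameDecoder and the wire format (a 4-byte big-endian length followed by the payload) are assumptions made for this sketch, not something the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Accumulates raw bytes from the stream and hands back complete messages.
// Assumed wire format: 4-byte big-endian length, then that many payload bytes.
class FrameDecoder {
public:
    // Append whatever a single read returned, however small.
    void feed(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        pending_.append(data, len);
    }

    // Extracts every message that is now complete; leftovers stay buffered.
    std::vector<std::string> take_messages() {
        std::vector<std::string> out;
        while (pending_.size() >= 4) {
            std::uint32_t n = 0;
            for (int i = 0; i < 4; ++i) {
                n = (n << 8) | static_cast<unsigned char>(pending_[i]);
            }
            if (pending_.size() - 4 < n) break;  // payload not complete yet
            out.push_back(pending_.substr(4, n));
            pending_.erase(0, 4 + static_cast<std::size_t>(n));
        }
        return out;
    }

private:
    std::string pending_;
};
```

The read loop then becomes: call read_some, pass the bytes it returned to feed(), and process whatever take_messages() gives back. A 1-byte read followed by a larger one is handled the same way as a single full read.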
I have a problem: when I try to send huge amounts of data through POSIX sockets (it doesn't matter whether it's files or other data), at some point I don't receive what I expect. I used Wireshark to determine what's causing the errors, and found that exactly at the point where my app breaks there are packets marked red saying "zero window" or "window full" sent in both directions.
The result is that the application layer does not get a piece of the data sent by the send() function. It gets the next part, though...
Am I doing something wrong?
EDIT:
Let's say I want to send 19232 pieces of data, 1024 bytes each. At some random point (or not at all), instead of the 9344th packet I get the 9345th. And I didn't implement any retransmission protocol because I thought TCP does it for me.
Zero Window / Window Full is an indication that one end of the TCP connection cannot receive any more data until its client application reads some of the data it has already received. In other words, it is one side of the connection telling the other side "do not send any more data until I tell you otherwise".
TCP does handle retransmissions. Your problem is likely that:
The application on the receiving side is not reading data fast enough.
This causes the receiving TCP to report Window Full to the sending TCP.
This in turn causes send() on the sending side to accept only part of the data (a short count) or, on a non-blocking socket, to fail with -1 and errno set to EWOULDBLOCK.
Your sending application is NOT detecting this case, and is assuming that send() sent all the data you asked to send.
This causes the data to get lost. You need to fix the sending side so that it handles send() failing, including returning a value smaller than the number of bytes you asked it to send. If the socket is non-blocking, this means waiting until select() tells you that the socket is writeable before trying again.
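A sketch of what that handling could look like on a POSIX system. It uses poll() instead of select() to wait for writability (same idea, simpler interface), and MSG_DONTWAIT to make each send call non-blocking; both choices, along with the function names, are assumptions of this example rather than anything from the question.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Blocks until fd can accept more outgoing data or timeout_ms elapses.
// Returns true if the socket is writable, false on timeout or error.
bool wait_writable(int fd, int timeout_ms) {
    assert(fd >= 0);
    pollfd p{};
    p.fd = fd;
    p.events = POLLOUT;
    int rc = poll(&p, 1, timeout_ms);
    return rc == 1 && (p.revents & POLLOUT) != 0;
}

// Sends all len bytes without blocking inside send(): short counts advance
// the offset, and a full send buffer makes us wait until there is room.
bool send_all_nonblocking(int fd, const char* data, std::size_t len,
                          int timeout_ms) {
    std::size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(fd, data + sent, len - sent, MSG_DONTWAIT);
        if (n > 0) {
            sent += static_cast<std::size_t>(n);  // possibly a short send
        } else if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal, just retry
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            if (!wait_writable(fd, timeout_ms)) return false;
        } else {
            return false;  // real error; errno says why
        }
    }
    return true;
}
```

The key point is that no byte is ever skipped: the offset only moves by what send() reported, so a zero-window stall delays the transfer instead of dropping a chunk.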
First of all, TCP is a byte stream protocol, not a packet-based protocol. Just because you sent a 1024 byte chunk doesn't mean it will be received that way. If you're filling the pipe fast enough to get a zero window condition (i.e., that there is no more room in either a receive buffer or send buffer) then it's very likely that the receiver code will at some point be able to read far more at one time than the size of your "packet".
If you haven't specifically requested non-blocking sockets, then both send and recv will block with a zero window/window full condition rather than return an error.
If you want to paste in the receiver-side code we can take a look, but from what you've described it sounds very likely that your 9344th read is actually getting more bytes than your packet size. Do you check the value returned from recv?
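If the receiver really does want fixed 1024-byte records, it has to keep reading until it has a whole record, whatever sizes the individual reads come back with. A minimal sketch, where recv_exact and recv_some are hypothetical names and recv_some stands in for the actual recv call (returning bytes stored, 0 on close, negative on error):

```cpp
#include <cassert>
#include <cstddef>

// Keeps calling recv_some(ptr, len) until exactly len bytes have arrived.
// Returns false if the peer closes or an error occurs before that.
template <typename RecvSome>
bool recv_exact(RecvSome&& recv_some, char* out, std::size_t len) {
    assert(out != nullptr || len == 0);
    std::size_t got = 0;
    while (got < len) {
        long n = recv_some(out + got, len - got);
        if (n <= 0) {
            return false;  // connection closed or error mid-record
        }
        got += static_cast<std::size_t>(n);  // may be less than requested
    }
    return true;
}
```

Because each call asks for at most the bytes still missing from the current record, a read can never swallow part of the next record, which is the kind of slip that makes it look as if packet 9344 vanished.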
Does iperf also fail to send this number of packets of this size on your network? If not, check how it sends this amount of data.
Hm, from what I read on Wikipedia this may be some kind of buffer overflow (receiver reports zero receive window). Just a guess though.
I have to send mesh data via TCP from one computer to another... These meshes can be rather large. I'm having a tough time thinking about what the best way to send them over TCP will be as I don't know much about network programming.
Here is my basic class structure that I need to fit into buffers to be sent via TCP:
class Primitive; // forward declarations so the pointers below compile
class Vertex;

class PrimitiveCollection
{
    std::vector<Primitive*> primitives;
};

class Primitive
{
    PRIMTYPES primType; // PRIMTYPES is just an enum with values for fan, strip, etc...
    unsigned int numVertices;
    std::vector<Vertex*> vertices;
};

class Vertex
{
    float X;
    float Y;
    float Z;
    float XNormal;
    float ZNormal;
};
I'm using the Boost library and their TCP stuff... it is fairly easy to use. You can just fill a buffer and send it off via TCP.
However, of course this buffer can only be so big and I could have up to 2 megabytes of data to send.
So what would be the best way to get the above class structure into the buffers needed and sent over the network? I would need to deserialize on the receiving end also.
Any guidance in this would be much appreciated.
EDIT: I realize after reading this again that this really is a more general problem that is not specific to Boost... It's more of a problem of chunking the data and sending it. However, I'm still interested to see if Boost has anything that can abstract this away somewhat.
Have you tried it with Boost's TCP? I don't see why 2MB would be an issue to transfer. I'm assuming we're talking about a LAN running at 100 Mbps or 1 Gbps, a computer with plenty of RAM, and don't have to have > 20ms response times? If your goal is to just get all 2MB from one computer to another, just send it; TCP will handle chunking it up for you.
I have a TCP latency checking tool that I wrote with Boost, that tries to send buffers of various sizes, I routinely check up to 20MB and those seem to get through without problems.
I guess what I'm trying to say is don't spend your time developing a solution unless you know you have a problem :-)
--------- Solution Implementation --------
Now that I've had a few minutes on my hands, I went through and made a quick implementation of what you were talking about: https://github.com/teeks99/data-chunker There are three big parts:
The serializer/deserializer: Boost has its own, but it's not much better than rolling your own, so I did.
Sender - Connects to the receiver over TCP and sends the data
Receiver - Waits for connections from the sender and unpacks the data it receives.
I've included the .exe(s) in the zip, run Sender.exe/Receiver.exe --help to see the options, or just look at main.
More detailed explanation:
Open two command prompts, and go to DataChunker\Debug in both of them.
Run Receiver.exe in one of them.
Run Sender.exe in the other one (possibly on a different computer, in which case add --remote-host=IP.ADD.RE.SS after the executable name; if you want to try sending more than once, add --num-sends=10 to send ten times).
Looking at the code, you can see what's going on: the receiver and sender ends of the TCP socket are created in the respective main() functions. The sender creates a new PrimitiveCollection and fills it in with some example data, then serializes and sends it. The receiver deserializes the data into a new PrimitiveCollection, at which point the primitive collection could be used by someone else, but I just wrote to the console that it was done.
Edit: Moved the example to github.
Without anything fancy, from what I remember in my network class:
Send a message to the receiver asking what size data chunks it can handle
Take a minimum of that and your own sending capabilities, then reply saying:
What size you'll be sending, how many you'll be sending
After you get that, just send each chunk. You'll want to wait for an "Ok" reply, so you know you're not wasting time sending to a client that's not there. This is also a good time for the client to send a "I'm canceling" message instead of "Ok".
Send until all packets have been replied with an "Ok"
The data is transferred.
This works because TCP guarantees in-order delivery. UDP would require packet numbers (for ordering).
Compression is the same, except you're sending compressed data. (Data is data, it all depends on how you interpret it). Just make sure you communicate how the data is compressed :)
As for examples, all I could dig up was this page and this old question. I think what you're doing would work well in tandem with Boost.Serialization.
I would like to add one more point to consider - setting TCP socket buffer size in order to increase socket performance to some extent.
There is a utility, iperf, that lets you test the speed of exchange over a TCP socket. I ran a few tests on Windows in a 100 Mbps LAN. With the default 8 KB TCP window size the speed is 89 Mbits/sec, and with a 64 KB TCP window size the speed is 94 Mbits/sec.
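For what it's worth, the per-socket buffer sizes are set with setsockopt. The sketch below is the POSIX form (my tests above were on Windows, where the call looks the same apart from headers and a const char * cast); the helper names are made up, and the kernel is free to round or cap whatever you request, so it is worth reading the value back.

```cpp
#include <cassert>
#include <sys/socket.h>

// Requests send and receive buffers of `bytes` for the socket.
// Returns true if both setsockopt calls succeeded.
bool set_socket_buffers(int fd, int bytes) {
    assert(fd >= 0 && bytes > 0);
    bool ok_send =
        setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == 0;
    bool ok_recv =
        setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
    return ok_send && ok_recv;
}

// Reads back the send buffer size the kernel actually granted, or -1 on error.
int get_send_buffer(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &value, &len) != 0) return -1;
    return value;
}
```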
In addition to how to chunk and deliver the data, another issue you should consider is platform differences. If the two computers are the same architecture, and the code running on both sides is the same version of the same compiler, then you should, probably, be able to just dump the raw memory structure across the network and have it work on the other side. If everything isn't the same, though, you can run into problems with endianness, structure padding, field alignment, etc.
In general, it's good to define a network format for the data separately from your in-memory representation. That format can be binary, in which case numeric values should be converted to standard forms (mainly, changing endianness to "network order", which is big-endian), or it can be textual. Many network protocols opt for text because it eliminates a lot of formatting issues and because it makes debugging easier. Personally, I really like JSON. It's not too verbose, there are good libraries available for every programming language, and it's really easy for humans to read and understand.
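To make the binary option concrete, here is a small sketch of writing values in network order one byte at a time, which sidesteps host endianness and struct padding entirely. The helper names are invented for this example, and floats are assumed to be 32-bit IEEE 754 on both machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Appends a 32-bit unsigned integer in network (big-endian) byte order.
void put_u32(std::string& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8) {
        out.push_back(static_cast<char>((v >> shift) & 0xFF));
    }
}

// Reads a big-endian 32-bit unsigned integer starting at offset pos.
std::uint32_t get_u32(const std::string& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) {
        v = (v << 8) | static_cast<unsigned char>(in[pos + i]);
    }
    return v;
}

// A float travels as its 32-bit bit pattern (assumes IEEE 754 on both ends).
void put_f32(std::string& out, float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "need 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    put_u32(out, bits);
}

float get_f32(const std::string& in, std::size_t pos) {
    std::uint32_t bits = get_u32(in, pos);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

A Vertex would then be five put_f32 calls in a fixed order, and a Primitive would be a put_u32 for the type, a put_u32 for the vertex count, and then its vertices.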
One of the key issues to consider when defining your network protocol is how the receiver knows when it has received all of the data. There are two basic approaches. First, you can send an explicit size at the beginning of the message, then the receiver knows to keep reading until it's gotten that many bytes. The other is to use some sort of an end-of-message delimiter. The latter has the advantage that you don't have to know in advance how many bytes you're sending, but the disadvantage that you have to figure out how to make sure the end-of-message delimiter can't appear in the message.
Once you decide how the data should be structured as it's flowing across the network, then you should figure out a way to convert the internal representation to that format, ideally in a "streaming" way, so you can loop through your data structure, converting each piece of it to network format and writing it to the network socket.
On the receiving side, you just reverse the process, decoding the network format to the appropriate in-memory format.
My recommendation for your case is to use JSON. 2 MB is not a lot of data, so the overhead of generating and parsing won't be large, and you can easily represent your data structure directly in JSON. The resulting text will be self-delimiting, human-readable, easy to stream, and easy to parse back into memory on the destination side.
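Just to show the shape of it, here is a hand-rolled sketch that writes a list of vertices as JSON. In practice you would more likely use a JSON library; the VertexData struct, the to_json name, and the layout (one array of five numbers per vertex) are all assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Plain stand-in for the Vertex class from the question.
struct VertexData {
    float x, y, z, xNormal, zNormal;
};

// JSON has no NaN or infinity, so only finite values are allowed.
void write_number(std::ostringstream& out, float f) {
    assert(std::isfinite(f));
    out << f;
}

// Produces {"vertices":[[x,y,z,xNormal,zNormal],...]}.
std::string to_json(const std::vector<VertexData>& vertices) {
    std::ostringstream out;
    out.precision(9);  // enough digits to round-trip a 32-bit float
    out << "{\"vertices\":[";
    for (std::size_t i = 0; i < vertices.size(); ++i) {
        const VertexData& v = vertices[i];
        if (i != 0) out << ',';
        out << '[';
        write_number(out, v.x);       out << ',';
        write_number(out, v.y);       out << ',';
        write_number(out, v.z);       out << ',';
        write_number(out, v.xNormal); out << ',';
        write_number(out, v.zNormal);
        out << ']';
    }
    out << "]}";
    return out.str();
}
```

The closing brace of the top-level object doubles as the end-of-message marker, which is what makes the text self-delimiting.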
I am trying to use some socket network programming in C++. I am trying to send the text "Hello World!" to a server using the C++ send() function. At first, I set the buffer to the size of 13 since "Hello World!" altogether is 12 characters (you have to make it one more than the character count). The send function only sends the characters to the server if I send it about 7 times. And when it does finally come to the server it looks like this:
"Hello World! Hello World! Hello World! Hello World! Hello World! Hello World! Hello World!"
Now here is the funny part. The "Hello World!" sentence sends immediately if I set the buffer size to 256 (char buffer[256];). When it comes to the server like that though, it shows "Hello World!" with a whole bunch of space after the two words. Why is this happening and if possible, how can I fix it? Please let me know.
Thanks
When you call read (or receive) with your buffer to read from the socket, an integer value is returned that specifies the number of bytes read. You should only take that much from the buffer. The rest is irrelevant:
int count = read(...);
// buffer[0 .. count - 1] contains the appropriate data.
Nagle's algorithm usually is turned on by default. This will combine several small packets into one. Turning Nagle's algorithm off will allow small packets to be sent immediately.
Buffers exist to store data until you are ready to send it. You have a send buffer size of 256. Until 256 characters are transmitted through the buffer, your data won't be sent to the other side. You can fix this by calling a flush method on your buffer when you know you are ready to send.
To be clear, you are buffering internally, then the OS (or library) is buffering again when you call send() and pass some data.
If you get more specific with what library you are using and maybe include a code snippet, we can probably find the right buffer flush function to send you on your way with.
Alternatively, if you are on *nix, just turn off Nagle's algorithm so that the OS won't hold back your small packets: set the TCP_NODELAY option when you set up your socket.
Assuming this is a SOCK_STREAM socket, the important thing to be aware of is that the underlying TCP protocol does not maintain any segment boundaries. That is, when you call send() multiple times, all of the data you sent may very easily be returned by a single recv() call at the other end. Or, the data sent in one send() call may be returned in multiple recv()s at the other end, e.g. if some packets got delayed due to network congestion. This is fundamental to the design of TCP, and your application must be designed accordingly.
Also, as noted by Mehrdad, the recv() call returns the number of bytes that were read off the wire. Anything after that point in the buffer is garbage, and the data is not zero-terminated.
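A tiny sketch of that point: build the string from the byte count the call reported, not from the whole buffer. The function name received_text is made up, and count is assumed to be the return value of recv() or read().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds a string from exactly the bytes a read/recv call reported.
// count is that call's return value; values <= 0 mean there is no data.
std::string received_text(const char* buffer, long count) {
    assert(buffer != nullptr || count <= 0);
    if (count <= 0) {
        return std::string();
    }
    // Explicit length: the data is not zero-terminated, and anything past
    // count in the buffer is leftover garbage.
    return std::string(buffer, static_cast<std::size_t>(count));
}
```

The same care applies on the sending side: pass the length of the message (12 for "Hello World!"), not the capacity of a 256-byte buffer, otherwise the padding travels over the wire too.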
A SOCK_DGRAM socket uses UDP underneath, which is entirely packet-oriented, as opposed to stream-oriented like TCP. However, UDP does not guarantee reliability (the U stands for User, not Unreliable, but delivery is unreliable all the same), so you have to handle lost, duplicated, out-of-order etc. packets yourself. This is a lot harder than stream-oriented I/O.
Socket programming is tedious, error prone and non portable. Start using libraries like Boost or ACE that shield you from the low level C APIs and provide you with platform independent abstractions.