I wrote simple server and client apps where I can switch between the TCP, DCCP and UDP protocols. The goal was to transfer a file from one machine to the other and measure the traffic for each protocol, so I can compare them for different network setups (I know roughly what the results should be, but I need exact numbers/graphs). Anyway, after starting both apps on different computers and starting tcpdump, I only get the first few MB (~50 MB) of my 4 GB file in the tcpdump log. The apps are written in standard C/C++ code of the kind that can be found anywhere on the web.
What may be the problem or what could I be doing wrong here?
-- Edit
The command line I use is:
tcpdump -s 1500 -w mylog
tcpdump then captures packets only for the first ~55 seconds. That's the time the client needs to send the file to the socket. Afterwards it stops, even though the server continues receiving and writing the file to the hard drive.
-- Edit2
Source code:
client.cpp
server.cpp
common.hpp
common.cpp
-- Edit final
As many of you pointed out (and as I suspected), there were several misconceptions/bugs in the source code. After I cleaned it up (or almost rewrote it), it works as needed with tcpdump. I will accept the answer from @Laurent Parenteau, but only for point 5, as it was the only one relevant to the problem. If someone is interested in the correct code, here it is:
Source code edited
client.cpp
server.cpp
common.hpp
common.cpp
There are many things wrong in the code.
The file size / transfer size is hardcoded to 4294967295 bytes, so if the supplied file isn't exactly that many bytes, you'll have problems.
In the sender, you aren't checking whether the file read succeeds. So if the file is smaller than 4294967295 bytes, you won't know it and will send junk data (or nothing at all) over the network.
When you use UDP or DCCP, packet order isn't guaranteed, so the data received may be out of order (i.e., junk).
When you use UDP, there's no retransmission of lost packets, so some data may never be received.
In the receiver, you aren't checking how many bytes you received; you always write MAX_LINE bytes to the file. So even if you receive 0 bytes, you'll still be writing to the file, which is wrong (see the sketch at the end of this answer).
When you use UDP, since you're sending in a tight loop, even if the write() call returns the same number of bytes you requested, a lot of data will probably be dropped by the network stack or the network interface, since there's no congestion control in place. You will need to put some congestion control in place yourself.
And this is just from a quick scan of the code; there are probably more problems in there...
My suggestion is:
Try the transfer with TCP, take an md5sum of the file you read/send and an md5sum of the file you receive/save, and compare the two. Once you have this case working, you can move on to testing (still using the md5sum comparison) with UDP and DCCP...
For the tcpdump command, you should change -s 1500 to -s 0, which means unlimited. With that command, you can trust tcpdump: any data it doesn't see was never sent/received. Another good thing to do is to compare the tcpdump output of the sender with that of the receiver, so you'll know whether any packet loss occurred between the two network stacks.
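For point 5, here is a minimal sketch of a receive loop that writes only the bytes actually received. MAX_LINE is the buffer size from the posted common.hpp; the value and everything else here are assumptions for illustration:

#include <cstdio>
#include <sys/types.h>
#include <sys/socket.h>

#define MAX_LINE 4096  // placeholder; use the value from common.hpp

// Drain a connected socket into a file, writing only the bytes that
// recv() actually returned instead of a fixed MAX_LINE every time.
void receive_file(int sock, FILE *out)
{
    char buf[MAX_LINE];
    for (;;) {
        ssize_t n = recv(sock, buf, sizeof(buf), 0);
        if (n == 0)              // peer closed the connection: transfer done
            break;
        if (n < 0) {             // real error: report and stop
            perror("recv");
            break;
        }
        fwrite(buf, 1, (size_t)n, out);  // write exactly what arrived
    }
}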
Do you have X term access? Switch to Wireshark instead and try with that - it's free, open source, and probably more widely used than tcpdump today. (It was formerly known as Ethereal.)
Also, do try the following tcpdump options:
-xx prints the link-level header and the data of the packet as well (does -w write the data?)
-C specifies the max capture-file size explicitly.
-U writes packet by packet to the file instead of flushing the buffer.
-p doesn't put the NIC in promiscuous mode.
-O doesn't use the packet-matching optimizer, as yours is a new app-level protocol.
Are you using verbose output in tcpdump? This can make the buffers fill quickly, so redirect stdout/stderr to a file when you run it.
Are these Gigabit Ethernet cards on both ends?
tcpdump is used as a diagnostic and forensics tool by tens of thousands (at least) of programmers and computer-security professionals worldwide. When a tool like this seems to be mishandling a very common task, the first thing to suspect is the code you wrote, not the tool.
In this particular case your code has a wide variety of significant errors. In particular, with TCP, your server will continue to write data to the file regardless of whether or not the client is sending any.
This code has race conditions that will result in non-deterministic behavior in some situations, improperly treats '\0' as a special value in network data, ignores error conditions, and ignores end-of-file conditions. And that's just from a brief reading.
In this case I am nearly certain that tcpdump is functioning perfectly and telling you that your application does not do what you think it does.
"That's the time the client needs to
send the file to the socket.
Afterwards it stops, even though the
server continues receiving and writing
the file to the hard drive."
This sounds really weird. The socket buffers are way too small to allow this to happen. I really think that your server code only seems to receive data, while the sender has actually already stopped sending.
I know this might sound silly, but are you sure it is not a problem of a missing flush() on the file? I.e., the data are still in memory and not yet written to disk (because they do not amount to a sufficient quantity).
Try sync, or just wait a bit until you are certain that enough data has been transmitted.
Related
I have an application that compresses data and sends it via a socket, and the received data is written on a remote machine. During recovery, this data is decompressed and retrieved. Compression/decompression is done using zlib. But during decompression I face the following problem randomly:
zlib inflate() fails with the error "Z_DATA_ERROR" for binary files like .xls, .qbw, etc.
The application compresses data in blocks of, say, 1024 bytes in a loop with data read from the file, and decompresses it in the same way. From forum posts, I found that one reason for Z_DATA_ERROR is data corruption. As of now, to avoid this problem, we have introduced a CRC check of the compressed data as sent against what is received.
Any possible reasons why this happens are really appreciated! (It occurs randomly, and for the same file it works the other time around.) Is it because of incorrect handling of zlib inflate() and deflate()?
Note: if needed, I will post the exact code snippet for further analysis!
Thanks...Udhai
You didn't mention whether the socket was TCP or UDP, but based on the blocking and checksumming, I'm going out on a limb and guessing it's UDP.
If you're sending the compressed packets over UDP they could be received out-of-order on the other end, or the packets could be lost in transit.
Getting things like out-of-order delivery and lost packets right ends up being a lot of work - all of which is fixed by using the TCP protocol, which gives you a simple pipe that guarantees the data arrives in order and as expected.
Also, I'd make sure that the code on the receiving side is simple and receives into buffers allocated on the heap, not on the stack (I've seen many a bug triggered by this).
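As a hypothetical illustration (not taken from the asker's code), receiving into heap storage can be as simple as:

#include <sys/types.h>
#include <sys/socket.h>
#include <vector>

// Receive one block into a heap-allocated std::vector rather than a
// large stack array. 'sock' is assumed to be a connected socket.
ssize_t receive_block(int sock, std::vector<char> &buf)
{
    buf.resize(64 * 1024);                      // storage lives on the heap
    ssize_t n = recv(sock, buf.data(), buf.size(), 0);
    if (n >= 0)
        buf.resize(static_cast<size_t>(n));     // keep only the valid bytes
    return n;                                   // 0 = peer closed, -1 = error
}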
Again, this is just an educated guess based on the detail of the question.
I've encountered an issue when sending large segments of data through a TCP socket. Having spent about 3 days trying to pick apart the issue and failing, I decided it was best to turn here for help/advice.
My Project
I've written a basic HTTP server which (slightly irrelevant) can run lua scripts to output pages. This all works perfectly fine under Windows (32 bit).
The Problem
When sending medium/large files (anything from roughly 8000 bytes and above appears to have issues) over the TCP socket on Ubuntu Linux (64-bit), they appear to cut out at different lengths (the result displayed in the browser is between 8000 and 10200 bytes). When I check the return value of the send function, it's exactly 9926 bytes every time the send ends. No error.
Smaller files send absolutely fine, and there are no issues under Windows. Going on this information, I thought it could be a buffer size issue, so I did
cat /proc/sys/net/ipv4/tcp_mem
which output 188416 192512 196608
Those numbers are far above 9926, so I assume that isn't the problem.
I'm using CSimpleSockets as a socket library; I haven't had any issues with it before. In case the issue is inside the library, I dug around for the send call it uses under Unix:
#define SEND(a,b,c,d) send(a, (const int8 *)b, c, d)
send(socket, buffer, bytestosend, 0);
buffer gets cast from a const char * to a const unsigned char * to a const int8 * before being passed to the OS to be sent.
OK, I think that covers everything I checked. If you need any more information or I've missed anything glaringly obvious, I'll do my best to provide it. Thanks for your help!
Your problem is that send does not guarantee to send the whole amount of data passed to it.
It has internal buffers that can fill up, socket parameters that affect buffering, etc. You need to note how many bytes were sent, wait a few milliseconds (for the send to move data over the wire and empty the buffer), then send the remaining data. There is no automatic way to do this; you'll need to write a bit of logic that advances your buffer by the number of bytes actually sent.
Are you using blocking or non-blocking sockets? If you're using non-blocking sockets, you must (and with blocking sockets, you should) check for a short send (one where the return value is fewer than the number of bytes you meant to send).
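A minimal sketch of such a short-send-aware loop (generic POSIX code, not taken from CSimpleSockets; 'sock' is assumed to be a connected, blocking socket):

#include <sys/types.h>
#include <sys/socket.h>
#include <cstddef>

// Keep calling send() until the whole buffer has gone out, advancing
// past however many bytes each call actually accepted.
bool send_all(int sock, const char *buf, size_t len)
{
    size_t sent = 0;
    while (sent < len) {
        ssize_t n = send(sock, buf + sent, len - sent, 0);
        if (n < 0)      // on non-blocking sockets, also handle EAGAIN here
            return false;
        sent += static_cast<size_t>(n);
    }
    return true;
}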
I have to send mesh data via TCP from one computer to another... These meshes can be rather large. I'm having a tough time deciding on the best way to send them over TCP, as I don't know much about network programming.
Here is my basic class structure that I need to fit into buffers to be sent via TCP:
class PrimitiveCollection
{
std::vector<Primitive*> primitives;
};
class Primitive
{
PRIMTYPES primType; // PRIMTYPES is just an enum with values for fan, strip, etc...
unsigned int numVertices;
std::vector<Vertex*> vertices;
};
class Vertex
{
float X;
float Y;
float Z;
float XNormal;
float ZNormal;
};
I'm using the Boost library and their TCP stuff... it is fairly easy to use. You can just fill a buffer and send it off via TCP.
However, of course this buffer can only be so big and I could have up to 2 megabytes of data to send.
So what would be the best way to get the above class structure into the buffers needed and sent over the network? I would also need to deserialize on the receiving end.
Any guidance in this would be much appreciated.
EDIT: I realize after reading this again that this really is a more general problem not specific to Boost... It's more a problem of chunking the data and sending it. However, I'm still interested to see whether Boost has anything that can abstract this away somewhat.
Have you tried it with Boost's TCP? I don't see why 2 MB would be an issue to transfer. I'm assuming we're talking about a LAN running at 100 Mbps or 1 Gbps, a computer with plenty of RAM, and no requirement for response times under 20 ms. If your goal is just to get all 2 MB from one computer to another, just send it; TCP will handle chunking it up for you.
I have a TCP latency-checking tool that I wrote with Boost that tries to send buffers of various sizes. I routinely check up to 20 MB, and those seem to get through without problems.
I guess what I'm trying to say is don't spend your time developing a solution unless you know you have a problem :-)
--------- Solution Implementation --------
Now that I've had a few minutes on my hands, I went through and made a quick implementation of what you were talking about: https://github.com/teeks99/data-chunker There are three big parts:
The serializer/deserializer - Boost has its own, but it's not much better than rolling your own, so I did.
Sender - Connects to the receiver over TCP and sends the data
Receiver - Waits for connections from the sender and unpacks the data it receives.
I've included the .exe(s) in the zip; run Sender.exe/Receiver.exe --help to see the options, or just look at main().
More detailed explanation:
Open two command prompts, and go to DataChunker\Debug in both of them.
Run Receiver.exe in one of them.
Run Sender.exe in the other one (possibly on a different computer, in which case add --remote-host=IP.ADD.RE.SS after the executable name; if you want to try sending more than once, add --num-sends=10 to send ten times).
Looking at the code, you can see what's going on: the receiver and sender ends of the TCP socket are created in the respective main() functions. The sender creates a new PrimitiveCollection and fills it in with some example data, then serializes and sends it... The receiver deserializes the data into a new PrimitiveCollection, at which point the primitive collection could be used by someone else, but I just write to the console that it's done.
Edit: Moved the example to github.
Without anything fancy, from what I remember from my networking class:
Send a message to the receiver asking what size data chunks it can handle.
Take the minimum of that and your own sending capabilities, then reply, saying:
what size you'll be sending and how many chunks you'll be sending.
After you get that, just send each chunk. You'll want to wait for an "OK" reply, so you know you're not wasting time sending to a client that's not there. This is also a good time for the client to send an "I'm canceling" message instead of "OK".
Send until all packets have been acknowledged with an "OK".
The data is transferred.
This works because TCP guarantees in-order delivery. UDP would require packet numbers (for ordering).
Compression is the same, except you're sending compressed data. (Data is data; it all depends on how you interpret it.) Just make sure you communicate how the data is compressed :)
As for examples, all I could dig up was this page and this old question. I think what you're doing would work well in tandem with Boost.Serialization.
I would like to add one more point to consider: setting the TCP socket buffer size can increase socket performance to some extent.
There is a utility, iperf, that lets you test the speed of exchange over a TCP socket. I ran a few tests on Windows in a 100 Mbps LAN: with the default 8 KB TCP window size the speed was 89 Mbits/sec, and with a 64 KB TCP window size it was 94 Mbits/sec.
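For what it's worth, a sketch of how the socket buffer sizes can be enlarged from code (the 64 KB value just mirrors the iperf test above; treat it as illustrative, not a recommendation):

#include <sys/socket.h>

// Enlarge the send and receive buffers on a socket before using it.
bool set_socket_buffers(int sock, int bytes)
{
    return setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) == 0
        && setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == 0;
}

// Hypothetical usage: set_socket_buffers(sock, 64 * 1024);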
In addition to how to chunk and deliver the data, another issue you should consider is platform differences. If the two computers have the same architecture, and the code running on both sides was built with the same version of the same compiler, then you should probably be able to just dump the raw memory structure across the network and have it work on the other side. If everything isn't the same, though, you can run into problems with endianness, structure padding, field alignment, etc.
In general, it's good to define a network format for the data separately from your in-memory representation. That format can be binary, in which case numeric values should be converted to standard forms (mainly, changing endianness to "network order", which is big-endian), or it can be textual. Many network protocols opt for text because it eliminates a lot of formatting issues and because it makes debugging easier. Personally, I really like JSON. It's not too verbose, there are good libraries available for every programming language, and it's really easy for humans to read and understand.
One of the key issues to consider when defining your network protocol is how the receiver knows when it has received all of the data. There are two basic approaches. First, you can send an explicit size at the beginning of the message; the receiver then knows to keep reading until it has gotten that many bytes. The other is to use some sort of end-of-message delimiter. The latter has the advantage that you don't have to know in advance how many bytes you're sending, but the disadvantage that you have to figure out how to make sure the end-of-message delimiter can't appear in the message.
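A minimal sketch of the first approach, length-prefix framing, on the sending side (this assumes a send_all-style helper that loops over short sends, like the one sketched in an earlier answer):

#include <arpa/inet.h>   // htonl
#include <cstddef>
#include <cstdint>
#include <cstring>

bool send_all(int sock, const char *buf, size_t len);  // loops over short sends

// Prefix each message with its size as a 4-byte big-endian ("network
// order") integer so the receiver knows exactly how much to read.
bool send_message(int sock, const char *data, uint32_t len)
{
    uint32_t be_len = htonl(len);
    char header[sizeof(be_len)];
    std::memcpy(header, &be_len, sizeof(be_len));
    return send_all(sock, header, sizeof(header))
        && send_all(sock, data, len);
}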
Once you decide how the data should be structured as it's flowing across the network, then you should figure out a way to convert the internal representation to that format, ideally in a "streaming" way, so you can loop through your data structure, converting each piece of it to network format and writing it to the network socket.
On the receiving side, you just reverse the process, decoding the network format to the appropriate in-memory format.
My recommendation for your case is to use JSON. 2 MB is not a lot of data, so the overhead of generating and parsing won't be large, and you can easily represent your data structure directly in JSON. The resulting text will be self-delimiting, human-readable, easy to stream, and easy to parse back into memory on the destination side.
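For instance, here is a sketch of how the Vertex from the question maps to JSON, assuming the nlohmann/json library (my choice for illustration; the thread doesn't name a specific library):

#include <nlohmann/json.hpp>  // https://github.com/nlohmann/json

struct Vertex {
    float X, Y, Z;
    float XNormal, ZNormal;   // fields exactly as declared in the question
};

// Convert one vertex to a self-describing JSON object.
nlohmann::json vertex_to_json(const Vertex &v)
{
    return {{"x", v.X}, {"y", v.Y}, {"z", v.Z},
            {"xn", v.XNormal}, {"zn", v.ZNormal}};
}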
I need some help understanding a peculiar issue I'm having when using asio.
I have a client-server app, with a C++ client (using Boost.Asio) sending a 2-byte heartbeat (say, every second) to a server written in Java (and receiving lots of data as well).
For quite a few minutes the server correctly receives the 2-byte heartbeat, but after that the server's read complains about a 0-byte read and closes the connection (which I guess is correct for a blocking read). The client, however, always prints that it has been transferring the correct amount.
I've experimented with almost all variants of the write family of functions. Are all of them implemented in terms of write_some, and does that mean this behavior is expected?
I must be making some mistake in my usage. Basically, I'm looking for something within asio that guarantees a write (of at least a byte). Please help me figure out where I'm going wrong (and whether any further info is required)...
Any advice, most appreciated!
Thanks!
If it's sockets, you can't "guarantee a write"; what if the network is down, the cable is yanked out, the switch is on fire, or the power is out worldwide and your computer happens to be the only one running on batteries?
That said, it sounds as if you have some kind of buffering/emptying issue; check over your read code to make sure it really consumes all the data that arrives.
A 0-byte read is not an error. Look over that code again, and check for any error status flags on the socket(s). A read can also fail with an EAGAIN status, which really means you should try again.
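A sketch of a read loop that keeps those cases separate (generic POSIX code, since the server in question is Java and not shown):

#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <unistd.h>

// Distinguish "no data right now" (EAGAIN on a non-blocking socket)
// from "peer closed" (return 0) and real errors (return -1).
ssize_t read_some(int fd, char *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;    // n == 0 is an orderly shutdown, not an error
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            continue;    // transient: retry (or wait in poll()/select())
        perror("read");
        return -1;       // real error
    }
}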
strace the applications at both ends. It will show any error codes returned by read(), write(), etc. Use strace -f if the application is multithreaded.
The advantage of this approach is that all applications - java, c++, python appear the same in an strace, so it's easy to spot bad behaviour.
In this case, it would probably show that the tcp connection ended (gracefully).
Is there a good method for transferring a file from, say, a client to a server?
Probably just images, but my professor was asking about any type of file.
I've looked around and am a little confused as to the general idea.
So if we have a large file, we can split that file into segments...? Then send each segment off to the server.
Should I also use a while loop to receive all the files/segments on the server side? Also, how will my server know that all the segments were received if it doesn't know in advance how many segments there are?
I was looking on the cplusplus.com website and found that there is something like a binary transfer of files...
Thanks for all the help =)
If you are using TCP:
You are right, there is no way to "know" how much data you will be receiving. This gives you a few options:
1) Before transmitting the image data, first send the number of bytes to be expected. So your first 4 bytes might be the 4-byte integer 4096. Then your client can read the first 4 bytes, "know" that it is expecting 4096 bytes, and malloc(4096) so it can hold the rest. Then your server can send() 4096 bytes' worth of image data.
When you do this, be aware that you might have to recv() multiple times - for one reason or another, you might not have received all 4096 bytes in one call. So you will need to check the return value of recv() to make sure you have gotten everything (a sketch follows below).
2) If you are just sending one file, you could just have your receiver read it, and keep recv()ing from the socket until the server closes the connection. This is a bit harder - you will have to keep track of how much you have received, and if your buffer fills, you will have to reallocate it. I don't recommend this method, but it would technically accomplish the task.
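Here is a sketch of option 1's receive side (generic POSIX code; the 4-byte size is assumed to be sent in network byte order):

#include <arpa/inet.h>   // ntohl
#include <cstdint>
#include <cstdlib>
#include <sys/socket.h>

// recv() exactly 'len' bytes, looping because a single call may
// return less than requested.
static bool recv_all(int sock, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(sock, buf + got, len - got, 0);
        if (n <= 0)                   // 0 = peer closed early, <0 = error
            return false;
        got += static_cast<size_t>(n);
    }
    return true;
}

// Read the 4-byte size header, then malloc() and read the payload.
// Note: a real program would sanity-check 'len' before allocating.
char *receive_image(int sock, uint32_t *out_len)
{
    uint32_t be_len;
    if (!recv_all(sock, reinterpret_cast<char *>(&be_len), sizeof(be_len)))
        return nullptr;
    uint32_t len = ntohl(be_len);
    char *data = static_cast<char *>(std::malloc(len));
    if (!data || !recv_all(sock, data, len)) {
        std::free(data);              // free(nullptr) is a safe no-op
        return nullptr;
    }
    *out_len = len;
    return data;
}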
If you are using UDP:
This means that you don't have reliable transfer, so packets might be dropped, and they might also arrive out of order. If you are going to use UDP, you must fragment your data into little segments, and both the sender and receiver must agree on how large a segment is (100 bytes? 1000 bytes?).
Not only that, but you must also transmit a sequence number with each packet - that is, label each packet #1, #2, etc. - because your client must be able to tell whether any packets are missing (if you receive packets 1, 2, and 4, you are thus missing #3) and to make sure they are in order (if you receive 3, 2, then 1, you must make sure the packets are saved to the file in the correct order: 1, 2, then 3).
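As a sketch, such a per-packet header might look like this (the field sizes are arbitrary illustrative choices, not any standard):

#include <cstdint>

// Every UDP datagram carries a small header so the receiver can
// detect missing packets and restore the original order.
#pragma pack(push, 1)
struct ChunkHeader {
    uint32_t seq;       // packet number: 1, 2, 3, ... (send in network order)
    uint16_t payload;   // number of data bytes following this header
};
#pragma pack(pop)
// The receiver buffers arriving chunks, sorts them by 'seq' before
// writing to the file, and notices any sequence numbers that never arrive.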
So for your assignment, well, it will depend on what protocol you have to/are allowed to use.
If you use a UDP-based transfer protocol, you will have to break the file up into chunks for network transmission. You'll also have to reassemble them in the correct order on the receiving end and verify the results. If you use a TCP-based transfer protocol, all of this will be taken care of under the hood.
You should consult Beej's Guide to Network Programming for how best to send and receive data and use sockets in general. It explains most of the things about which you are asking.
There are many ways of transferring files. If you're transferring files in a lossless manner, you're basically going to divide the file into chunks, tag each chunk with a sequence number, send the chunks to the other side, and reconstitute the file. Stream-oriented protocols are simpler, since lost packets are retransmitted for you. If you're using an unreliable protocol, you will need to retransmit missing packets and resequence chunks that arrive out of order.
If lossy transfer is acceptable (as with video or online game data), use an unreliable protocol. Lossy transfer is simpler because you don't have to retransmit missing chunks; all you need to do is make sure the chunks are processed in the proper sequence.
Many protocols send a terminator packet to indicate the end of transmission. You could use this strategy if you don't want to send the number of chunks to the other side before transmission.
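As an illustrative sketch (reusing the ChunkHeader idea from the earlier UDP answer, which is my own hypothetical layout), the terminator can simply be a final chunk with a zero-length payload:

#include <arpa/inet.h>   // htonl
#include <cstdint>
#include <sys/socket.h>

#pragma pack(push, 1)
struct ChunkHeader {
    uint32_t seq;       // sequence number (sent in network order)
    uint16_t payload;   // payload length; 0 doubles as "end of transmission"
};
#pragma pack(pop)

// Send a final zero-payload chunk so the receiver knows the transfer
// is complete without knowing the chunk count in advance.
void send_terminator(int sock, uint32_t last_seq)
{
    ChunkHeader end = {htonl(last_seq + 1), 0};
    send(sock, &end, sizeof(end), 0);   // error handling omitted in this sketch
}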