What (standard) protocols offer piecemeal access to streams?

I have an application in which I need to access large files remotely in a piecemeal fashion. I will know the start offset, but having read some prefix of the file from that position onwards, I will determine a new offset and will want to read next from that new position, crucially, with the minimum possible latency.
I've considered using HTTP, posting a request that specifies the offset at which to start a transfer, but I want neither to specify a transfer size (too small a size would lead to low throughput; too large a size would lead to unacceptable latency) nor to drop an open connection, as that incurs a latency penalty on reconnection.
I've considered 'rolling my own' with TCP/UDP and sockets, but this approach feels like reinventing the wheel. UDP might promise the lowest latency, but I am not in a position to trade reliability for lower latency.
I would be very interested to be pointed towards any standards (proposals, RFCs, etc.) for protocols that tackle this mode of access to data. Perhaps a good approach has already been developed in the context of cloud storage?

What you want, I believe, is a variation of the FTP protocol (RFC 959: https://www.rfc-editor.org/rfc/rfc959). I don't think there's any established protocol standard for exactly what you want to do, but FTP is very close. It uses two connections, a "control connection" and a "data connection". The control connection carries commands from client to server and status messages back, while the data connection is used separately to transfer the data. It sounds like this is the kind of system you need to set up.
The main thing you want to do that's different is to be able to seek to and transfer data from arbitrary offsets in your file which can be easily accomplished via custom commands. Depending on your setup, you may be able to grab existing open source implementations of FTP clients and servers and just add your custom commands.
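To make the custom-command idea concrete, here is a minimal Python sketch. It simplifies FTP's two-connection design to one persistent connection, and the `READ <offset> <length>` / `QUIT` commands and the 8-byte length prefix are hypothetical choices, not part of RFC 959. Keeping each request modest bounds latency, while reusing the connection avoids the reconnection penalty mentioned in the question.

```python
import io
import socket
import threading


def serve(conn: socket.socket, f) -> None:
    """Answer hypothetical 'READ <offset> <length>' commands on one connection.

    `f` is any seekable binary file object. Each reply is an 8-byte big-endian
    length followed by that many bytes (fewer than requested at end of file).
    """
    with conn.makefile("rb") as commands:
        for line in commands:
            parts = line.decode("ascii").split()
            if not parts:
                continue
            if parts[0] == "QUIT":
                break
            if parts[0] == "READ" and len(parts) == 3:
                f.seek(int(parts[1]))
                data = f.read(int(parts[2]))
                conn.sendall(len(data).to_bytes(8, "big") + data)


def _read_exact(stream, n: int) -> bytes:
    data = stream.read(n)  # a buffered socket file blocks until n bytes or EOF
    if len(data) != n:
        raise ConnectionError("connection closed mid-reply")
    return data


class RemoteFile:
    """Client side: every seek-and-read reuses the same open connection."""

    def __init__(self, conn: socket.socket) -> None:
        self.conn = conn
        self.replies = conn.makefile("rb")

    def read_at(self, offset: int, length: int) -> bytes:
        self.conn.sendall(f"READ {offset} {length}\n".encode("ascii"))
        size = int.from_bytes(_read_exact(self.replies, 8), "big")
        return _read_exact(self.replies, size)

    def close(self) -> None:
        self.conn.sendall(b"QUIT\n")
        self.replies.close()
        self.conn.close()


def local_demo() -> list:
    """Run server and client over a local socket pair (illustration only)."""
    client_sock, server_sock = socket.socketpair()
    worker = threading.Thread(
        target=serve, args=(server_sock, io.BytesIO(b"0123456789")), daemon=True
    )
    worker.start()
    remote = RemoteFile(client_sock)
    chunks = [remote.read_at(3, 4), remote.read_at(8, 10), remote.read_at(0, 2)]
    remote.close()
    worker.join(timeout=2)
    server_sock.close()
    return chunks
```

In a real deployment the server would sit behind a listening TCP socket; the socket pair only keeps the sketch self-contained.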

Related

How do I check if gsoap is failing to serialize/send a big amount of data?

I'm running a web service client which we developed using gsoap library and we need to send 3.5 GB of data through a mutually authenticated connection - meaning we've got encrypted traffic on the network.
The server - which I have no access to - says it is receiving "empty" data.
I've made a network traffic capture and noticed a pause (a few seconds) during transmission and then an "Encrypted alert (21)" and connection closure.
Checking my code, it seems there's some problem serializing or sending the data, but I haven't been able to find out exactly what's going on.
My suspicion is that gsoap is not able to allocate the necessary memory to serialize/send data.
How should I go about analyzing this?
EDIT1:
I've ruled out serialization, and my suspicion now lies with the attachment functions. It seems that defining attachment callbacks may solve my problem. I'm still interested in people's opinions and suggestions.
From your description it seems less likely that XML serialization is the cause of the problem. The standard HTTP-based communication in gsoap (i.e. without HTTP chunking) serializes the C/C++ data before sending to determine the HTTP content length. A serialization failure would show up immediately and not during transfer.
To speed up transfer of large XML payloads, you can optimize XML serialization by initializing the struct soap context with the SOAP_XML_TREE flag. This reduces SOAP encoding overhead substantially. The reason is that the SOAP 1.1/1.2 encoding protocol incurs overhead in the gsoap engine to determine co-referenced elements by analyzing data structure pointers (e.g. graphs, possibly with cycles). When lots of pointers are used, this can affect the XML serializer's performance. Also, debug mode (-DDEBUG) substantially slows down the gsoap engine, so it is best avoided for large transfers.
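To illustrate where that overhead comes from, here is a toy Python serializer. It is not gSOAP's code and the element names are made up; Python's `id()` stands in for pointer identity. In "encoded" mode it first walks the whole object graph counting references, so that a shared node is written once with an `id` and referenced elsewhere with `href`. In "tree" mode it skips that extra pass and simply writes shared nodes again.

```python
from collections import Counter


class Node:
    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


def count_refs(node, counts):
    """Extra pre-pass needed by 'encoded' mode: count references to each node."""
    counts[id(node)] += 1
    if counts[id(node)] == 1:  # descend only on first visit, so cycles terminate
        for child in node.children:
            count_refs(child, counts)


def serialize(root, tree_mode=False):
    """Toy XML writer; tree mode assumes the data is acyclic."""
    counts = Counter()
    if not tree_mode:
        count_refs(root, counts)
    emitted = set()
    out = []

    def emit(node):
        key = id(node)
        if not tree_mode and counts[key] > 1:
            if key in emitted:  # already written: emit a reference instead
                out.append(f'<{node.name} href="#id{key}"/>')
                return
            emitted.add(key)
            out.append(f'<{node.name} id="id{key}">')
        else:
            out.append(f"<{node.name}>")
        for child in node.children:
            emit(child)
        out.append(f"</{node.name}>")

    emit(root)
    return "".join(out)
```

The trade-off is visible even in the toy: encoded mode pays for a full graph walk plus a lookup per node, while tree mode does one pass but duplicates shared data.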
Perhaps a tool such as ssldump can shed some more light on the TLS/SSL communication issue you are experiencing.

UDP transfer is too fast, Apache Mina doesn't handle it

We decided to use UDP to send a lot of data like coordinates between:
client [C++] (using poll)
server [JAVA] [Apache MINA]
My datagrams are at most 512 bytes, to avoid fragmentation during transfer as much as possible.
Each datagram has a header I added (with an ID inside), so that I can monitor:
how many datagrams are received
which ones are received
The problem is that we are sending the datagrams too fast. We receive the first few, then suffer a big loss, then get some more, then another big loss. The sequence of received datagram IDs is something like [1], [2], [250], [251].....
The problem happens locally too (using localhost, with only one network card).
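For that kind of monitoring, a Python sketch of one possible ID header and receiver-side gap detection. The 4-byte big-endian sequence number is an assumed layout, and the 512-byte cap mirrors the question.

```python
import struct

MAX_DATAGRAM = 512
HEADER = struct.Struct("!I")  # assumed: 4-byte unsigned sequence ID, network byte order


def make_datagram(seq: int, payload: bytes) -> bytes:
    if HEADER.size + len(payload) > MAX_DATAGRAM:
        raise ValueError("payload too large for one datagram")
    return HEADER.pack(seq) + payload


class GapTracker:
    """Counts arrivals and records missing ID ranges as (first, last), inclusive.

    Late (reordered) arrivals are counted but not removed from `missing`.
    """

    def __init__(self, first_id: int = 0) -> None:
        self.highest = first_id - 1
        self.received = 0
        self.missing = []

    def on_datagram(self, datagram: bytes) -> bytes:
        (seq,) = HEADER.unpack_from(datagram)
        self.received += 1
        if seq > self.highest + 1:
            self.missing.append((self.highest + 1, seq - 1))
        self.highest = max(self.highest, seq)
        return datagram[HEADER.size:]  # payload without the header
```

Fed the sequence from the question (1, 2, 250, 251), the tracker reports one missing range, 3 to 249.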
I do not care about losing datagrams, but here it is not about simple loss due to network (which I can deal with)
So my questions here are:
On the client, how can I get the best:
settings, or socket settings?
way to send as much as I can without it being too much?
On the server, Apache MINA seems to say that it manages the socket buffer size itself, but are there still some settings to care about?
Is it possible to reach something like 1MB/s, given that our connection already gives us at least this bandwidth when downloading regular files?
Currently, when we want to transfer ~4KB of coordinate info, we have to add sleeps, so we end up waiting 5 minutes or more for it to finish. That's a big issue for us, given that we need to send at least 10MB of coordinate information every minute.
If you want reliable transport, you should use TCP. This will let you send almost as fast as the slower of the network and the client, with no losses.
If you want a highly optimized low-latency transport, which does not need to be reliable, you need UDP. This will let you send exactly as fast as the network can handle, but you can also send faster, or faster than the client can read, and then you'll lose packets.
If you want reliable highly optimized low-latency transport with fine-grained control, you're going to end up implementing a custom subset of TCP on top of UDP. It doesn't sound like you could or should do this.
... how can I get the best settings, or socket settings
Typically by experimentation.
If the reason you're losing packets is because the client is slow, you need to make the client faster. Larger receive buffers only buy a fixed amount of headroom (say to soak up bursts), but if you're systematically slower any sanely-sized buffer will fill up eventually.
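The receive buffer in question is the `SO_RCVBUF` socket option. A small Python sketch for requesting a larger one and reading back what was actually granted; the 4 MiB request is an arbitrary example. On Linux, for instance, the kernel doubles the requested value for bookkeeping and caps it at `net.core.rmem_max`, which is why the effective size should always be checked.

```python
import socket


def open_udp_receiver(port: int = 0, rcvbuf: int = 4 * 1024 * 1024) -> socket.socket:
    """Bind a UDP socket on localhost and request a larger kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    sock.bind(("127.0.0.1", port))
    return sock


def effective_rcvbuf(sock: socket.socket) -> int:
    """The size the OS actually applied, which may differ from the request."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

As the answer says, this only absorbs bursts; a receiver that is systematically slower than the sender will still overflow it.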
Note however that this only cures excessive or avoidable drops. The various network stack layers (even without leaving a single box) are allowed to drop packets even if your client can keep up, so you still can't treat it as reliable without custom retransmit logic (and we're back to implementing TCP).
... way to send as much as I can without it being too much?
You need some kind of ack/nack/back-pressure/throttling/congestion/whatever message from the receiver back to the source. This is exactly the kind of thing TCP gives you for free, and which is relatively tricky to implement well yourself.
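Receiver feedback is the real fix, but a simple first step (and a more principled replacement for ad-hoc sleeps) is pacing the sender with a token bucket. A Python sketch, assuming a fixed target rate such as the 1MB/s from the question; it does not react to loss, so it is throttling only, not congestion control. The clock and sleep functions are injectable so the pacing can be checked without real waiting.

```python
import time


class TokenBucket:
    """Paces sends to an average of `rate` bytes/s, allowing bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full: the first burst goes out immediately
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def consume(self, nbytes):
        """Spend `nbytes` tokens, sleeping long enough to pay back any deficit."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens < 0:
            # Borrow against the future and sleep until the debt is repaid.
            self.sleep(-self.tokens / self.rate)
```

Usage would be `bucket.consume(len(datagram))` before each `sock.sendto(datagram, addr)`. At 1MB/s with 512-byte datagrams the individual sleeps are well under a millisecond, so real timer granularity will make sending burstier than this model suggests.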
Is it possible to reach something like 1MB/s ...
I just saw 8MB/s using scp over loopback, so I would say yes. That uses TCP and apparently chose AES128 to encrypt and decrypt the file on the fly - it should be trivial to get equivalent performance if you're just sending plaintext.
UDP is only a viable choice when any number of datagrams can be lost without sacrificing QoS. I am not familiar with Apache MINA, but the scenario described resembles a server that handles every datagram sequentially. In that case, datagrams that arrive while one is being serviced will be lost as soon as the socket's receive buffer fills up; beyond that buffer there is no queuing of UDP datagrams. As I said, I do not know whether MINA can be tuned for parallel datagram processing, but if it can't, it is simply the wrong choice of tool.
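If the framework can't be tuned, the usual pattern is to keep the socket drained by a thread that does nothing but receive, and hand datagrams to a worker through a queue, so slow processing never stalls reads. A Python sketch of that pattern (not MINA's API); the queue bound is an assumed value, and when the queue is full new datagrams are dropped and counted in user space, so loss becomes visible instead of happening silently in the kernel.

```python
import queue
import socket
import threading


def start_receiver(sock: socket.socket, backlog: int = 10000):
    """Drain `sock` on a dedicated thread into a bounded queue.

    Returns (queue, stats). A worker thread consumes the queue at its own pace.
    """
    q = queue.Queue(maxsize=backlog)
    stats = {"received": 0, "dropped": 0}

    def reader() -> None:
        while True:
            try:
                data = sock.recv(65535)
            except OSError:  # stop if the socket reports an error
                return
            stats["received"] += 1
            try:
                q.put_nowait(data)  # never block the reading thread
            except queue.Full:
                stats["dropped"] += 1

    threading.Thread(target=reader, daemon=True).start()
    return q, stats
```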

TCP and PF_RING

I was taking a look at using PF_RING for sending and receiving in my application.
If I plan to use PF_RING for maintaining a TCP connection, it looks like I'll need to manually "forge" the IP and TCP messages myself, as pfring_send sends raw packets. Does this mean I'll have to manually reimplement TCP on top of PF_RING?
I understand there is a clear advantage for receiving using PF_RING, has anyone tried sending data with PF_RING? Is there a clear advantage over normal send calls?
note: I am not using DNA (Direct NIC Access), I am just using the kernel partial bypass with NIC aware drivers.
To answer your first question: yes, you will have to build the TCP/IP messages manually from the ground up, MAC address and all. For an example, take a look at pfsend.c from ntop.org.
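To give a feel for what "from the ground up" means, a Python sketch that builds an Ethernet + IPv4 + TCP SYN frame by hand with `struct`, including the IPv4 header checksum and the TCP checksum over the pseudo-header. The MAC and IP addresses, ports and sequence number are placeholders. It only produces bytes that a raw sender such as `pfring_send` could transmit; it does none of TCP's state machine, retransmission or windowing, which is the part you would really have to reimplement.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IPv4 and TCP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_tcp_syn(src_mac: bytes, dst_mac: bytes, src_ip: str, dst_ip: str,
                  src_port: int, dst_port: int, seq: int) -> bytes:
    src, dst = socket.inet_aton(src_ip), socket.inet_aton(dst_ip)

    # TCP header: ports, seq, ack=0, data offset 5 words + SYN flag, window, checksum, urgent.
    offset_flags = (5 << 12) | 0x002
    tcp = struct.pack("!HHIIHHHH", src_port, dst_port, seq, 0, offset_flags, 65535, 0, 0)
    pseudo = src + dst + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(tcp))
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp)) + tcp[18:]

    # IPv4 header: version 4 / IHL 5, total length, DF flag, TTL 64, protocol TCP.
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0x4000, 64,
                     socket.IPPROTO_TCP, 0, src, dst)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]

    # Ethernet II header: destination MAC, source MAC, EtherType 0x0800 (IPv4).
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)
    return eth + ip + tcp
```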
ntop.org has also made a PF_RING user guide available that contains explanations.
As for sending data using PF_RING, it is absolutely possible. The idea is to bypass any notion of what the data on the wire actually is and send as fast as possible; see wire speed traffic generation from ntop.org. The only advantages it has over normal sending calls using the kernel's TCP/IP stack are that you can send data (1) faster and (2) completely unformatted onto the wire. (2) can be handy, for example, when you want to play back one or more previously captured packets onto the network.
Unless you have a specific use case that requires access to the raw underlying data without kernel intervention, there is no good reason to use PF_RING at all. Your best bet is to use the standard sockets API; in most cases the performance you can achieve with it is more than adequate.
What specific use case did you have in mind?

Simulate network conditions with a C/C++ Socket

I'm looking for a way to add network emulation to a socket.
The basic solution would be some way to add bandwidth limitation to a connection.
The ideal solution for me would:
Support advanced network properties (latency, packet-loss)
Open-source
Have a similar API as standard sockets (or wraps around them)
Work on both Windows and Linux
Support IPv4 and IPv6
I saw a few options that work at the system level, or even as a proxy (Dummynet, WANem, netem, etc.), but they won't work for me, because I want to be able to emulate each socket individually (for example, open one socket with modem emulation and one with 3G emulation). Basically, I want to know how these tools do it.
EDIT: I need to embed this functionality in my own product, therefore using an extra box or a third-party tool that needs manual configuration is not acceptable. I want to write code that does the same thing as those tools do, and my question is how to do it.
Epilogue: In hindsight, my question was a bit misleading. Apparently, there is no way to do what I wanted directly on the socket. There are two options:
Add delays to send/receive operations (based on @PaulCoccoli's answer):
By adding a delay before sending and receiving, you can get a very crude network simulation: a constant delay for latency, and delayed sending, so as not to send more than X bytes per second, for bandwidth.
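A minimal Python sketch of that crude approach: a wrapper that waits a constant latency before each send or receive and splits outgoing data into chunks so the average rate stays at or below a bytes-per-second cap. The class name, chunk size and parameters are hypothetical. Sleeping on the calling thread blocks the caller and makes latency and bandwidth delays simply add up, which is part of why this is only a crude simulation.

```python
import socket
import time


class ThrottledSocket:
    """Wraps a connected socket; adds fixed latency and a bandwidth cap."""

    def __init__(self, sock: socket.socket, latency_s: float, bytes_per_s: float,
                 chunk: int = 1024) -> None:
        self.sock = sock
        self.latency_s = latency_s
        self.bytes_per_s = bytes_per_s
        self.chunk = chunk

    def sendall(self, data: bytes) -> None:
        time.sleep(self.latency_s)  # constant delay for latency
        for start in range(0, len(data), self.chunk):
            piece = data[start:start + self.chunk]
            began = time.monotonic()
            self.sock.sendall(piece)
            # Sleep off whatever remains of this chunk's time budget.
            remaining = len(piece) / self.bytes_per_s - (time.monotonic() - began)
            if remaining > 0:
                time.sleep(remaining)

    def recv(self, bufsize: int) -> bytes:
        time.sleep(self.latency_s)
        return self.sock.recv(bufsize)

    def __getattr__(self, name):
        # Everything else (close, fileno, ...) goes straight to the real socket.
        return getattr(self.sock, name)
```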
Paul's answer and comment were great inspiration for me, so I award him the bounty.
Add the network simulation logic as a proxy (based on @m0she's and others' answers):
Either send the requests through the proxy, or use the proxy to intercept them, then add the desired simulation. However, it makes more sense to use a ready-made solution than to write your own proxy implementation. From what I've seen, Dummynet is probably the best choice (this is what webpagetest.org uses). Other options are in the answers below; I'll also add DonsProxy.
This is the better way to do it, so I'm accepting this answer.
You can compile a proxy into your software that would do that.
It can be an implementation of a full-fledged SOCKS proxy (like this) or, probably better, something simpler that serves only your purpose (and doesn't require prefixing your communication with the destination, or other SOCKS overhead).
That code could run as a separate process or a thread within your process.
Adding throttling to a proxy shouldn't be too hard. You can:
delay forwarding of data if it passes some bandwidth limit
add latency by adding timer before read/write operations on buffers.
If you're working with connection based protocol (like TCP), it would be senseless to drop packets, but with a datagram based protocol (UDP) it would also be simple to implement.
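A Python sketch of such a throttling relay for the datagram case, one direction only. The function name and parameters are hypothetical. Loss is a per-datagram coin flip, bandwidth is emulated by tracking when the virtual link is next idle, and latency is added with a timer so that delayed datagrams don't block further reads.

```python
import random
import socket
import threading
import time


def relay_udp(listen, out, forward_to, latency_s=0.0, bytes_per_s=None,
              loss=0.0, seed=None, stop=None):
    """Forward datagrams from `listen` to `forward_to` via `out`, emulating a slow, lossy link."""
    rng = random.Random(seed)
    next_free = time.monotonic()  # when the emulated link is next idle
    listen.settimeout(0.1)  # wake up periodically to check `stop`
    while stop is None or not stop.is_set():
        try:
            data = listen.recv(65535)
        except socket.timeout:
            continue
        if rng.random() < loss:
            continue  # emulated packet loss
        now = time.monotonic()
        start = max(now, next_free)
        # Bandwidth: the datagram occupies the link for len/rate seconds.
        next_free = start + (len(data) / bytes_per_s if bytes_per_s else 0.0)
        # Deliver once it has "left" the link, plus the fixed latency.
        delay = (next_free - now) + latency_s
        threading.Timer(delay, out.sendto, args=(data, forward_to)).start()
```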
The connection creation API would be a bit different from normal posix/winsock (unless you do some macro or other magic), but everything else (send/recv/select/close/etc..) is the same.
If you're building this into your product, then you should implement a layer of abstraction over the sockets API so you can select your own implementation at run time. Alternatively, you can implement wrappers of each socket function and select whether to call your own version or the system's version.
As for adding latency, you could have your implementation of the sockets API spin off a thread. In that thread, have a priority queue ordered by time (i.e. this background thread does a very basic discrete event simulation). Each "packet" you send or receive could be enqueued along with a delivery time. Each delivery time should have some amount of delay added. I would use some kind of random number generator with a Gaussian distribution.
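A minimal Python sketch of that background scheduler, with `heapq` as the priority queue ordered by delivery time and `random.gauss` for the delay. The mean and standard deviation are made-up example values, negative samples are clamped to zero, and delivery goes to a callback rather than a simulated peer. Because each packet gets an independent delay, packets can be delivered out of order, which is realistic for UDP but something a TCP emulation would have to account for.

```python
import heapq
import itertools
import random
import threading
import time


class DelayLine:
    """Background thread that delivers items after a Gaussian-distributed delay."""

    def __init__(self, deliver, mean_s=0.05, stddev_s=0.01, seed=None):
        self._deliver = deliver  # called as deliver(item) at delivery time
        self._mean, self._stddev = mean_s, stddev_s
        self._rng = random.Random(seed)
        self._heap = []  # entries: (deliver_at, tiebreak, item)
        self._counter = itertools.count()  # tiebreak keeps entries comparable
        self._cv = threading.Condition()
        self._closed = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, item) -> None:
        delay = max(0.0, self._rng.gauss(self._mean, self._stddev))
        with self._cv:
            entry = (time.monotonic() + delay, next(self._counter), item)
            heapq.heappush(self._heap, entry)
            self._cv.notify()

    def close(self) -> None:
        """Stop accepting work; returns once everything queued has been delivered."""
        with self._cv:
            self._closed = True
            self._cv.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cv:
                while True:
                    if self._heap:
                        wait = self._heap[0][0] - time.monotonic()
                        if wait <= 0:
                            _, _, item = heapq.heappop(self._heap)
                            break
                        self._cv.wait(wait)  # sleep until the earliest delivery time
                    elif self._closed:
                        return
                    else:
                        self._cv.wait()
            self._deliver(item)  # call outside the lock
```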
The background thread would also have to simulate the other side of the connection, though it sounds like you may have already implemented that part?
The only one I know of is Network Link Conditioner for Mac OS X Lion. You need to be a Mac developer to download it, so I can't post a download link here. Here is the description from 9to5mac.com: http://9to5mac.com/2011/08/10/new-in-os-x-lion-network-link-conditioner-utility-lets-you-simulate-internet-and-bandwidth-conditions/
This answer might be a partial solution for you when using linux:
Simulate delayed and dropped packets on Linux. It refers to a kernel module called netem, which can simulate all kinds of network problems.
If you want to work with TCP connections, simulating packet loss could be problematic, since a lot of the error handling (like recovering lost packets) is done in the kernel. Simulating this in a cross-platform way could be hard.
You usually add a network device to your network that throttles the bandwidth or latency on a port-by-port basis. You can then achieve what you want just by connecting to the port allocated to the particular type of crappy network you want to test, with no code changes or modifications required.
The easiest way to do this is to add iptables rules to a Linux server acting as a proxy.
If you want it to work without the separate device, try trickle, a software package that throttles the network on your client PC. (or for Windows)
You may want to check out WANem (http://wanem.sourceforge.net/). WANem is open source and licensed under the GNU General Public License.
WANem allows the application development team to setup a transparent application gateway which can be used to simulate WAN characteristics like Network delay, Packet loss, Packet corruption, Disconnections, Packet re-ordering, Jitter, etc.
I think you could use a tool like Network Simulator. It's free, for Windows.
The only thing to do is to setup your program to use the right ports (and the settings for the network, of course).
If you want a software only solution that you control, you will have to implement it yourself. I know of no such existing package.
While a wrapper layer over a socket may give you the ability to introduce delay, it won't be sufficient to introduce loss or out-of-order delivery. To simulate those, you actually need to intercept the data in transit between the two TCP stacks.
The approach I would recommend is to use a tunneling device (say tunX). Routes should be set so the client believes the way to the server is through tunX. Additional code (perhaps running in a different thread) would promiscuously intercept traffic on tunX, and perform your augmented behavior, before forwarding packets over the true physical interface that will get the traffic to your server. The reverse would happen for packets arriving from the server on the physical interface. Those packets would be intercepted by the client code, behavior augmented, before forwarding through tunX.
However, since you are testing client software, I am unclear as to why you would want to embed this code in your released software, unless the software itself is a WAN simulating client.

Custom IP/UDP/RTP header in windows xp (and above) + general network questions

Lots of questions, I am sorry!
I am making a voice-chat (VoIP) application, and I was thinking of doing a custom implementation of the IP and UDP headers, along with a little extra information, mainly a sequence number. Yes, this sounds a lot like RTP, but I'm mainly just interested in the sequence number or timestamp, and implementing the whole of RTP myself sounds like a nightmare, with all the complexity involved and data I'm not likely to use.
Target OS for the application is windows xp and above. I have read http://msdn.microsoft.com/en-us/library/ms740548%28v=vs.85%29.aspx on the topic of Raw sockets in windows, and now I just want some confirmation.
I also have some general networking questions.
Here's the following questions;
1) According to MSDN, you cannot send custom IP packets with a source that is not on the network list. I understand it from a security PoV, but is there any way around this? My idea was to have for example two clients open UDP communication to a non-NAT protected server, and then have the clients spoof the source-header to make it look like packets come from the server instead of each other, thereby eliminating the need for a server as a relay of data to get through NAT, which would improve latency.
I have heard of winpcap but I don't want each client to have to install any 3rd party apps. Considering the number of DoS attacks surely there must be some way around this, like spoofing the network table the OS uses to check if source-header is legit? Will this trigger anti-virus systems?
I feel it would be really fun to actually toy with IP headers and above instead of just using predefined headers.
2) I've been having more trouble than I can almost tolerate getting free RTP libraries like JRTPLIB to work (it's probably very good; it just doesn't want to work for me), and I'm thinking of just writing my own protocol on top of UDP. Do application-level protocols like RTP simply build their header directly inside the UDP payload, with the actual data afterwards? I suspect so, considering the encapsulation process, but I just want to make sure.
If so, one doesn't need a raw socket to implement an application-level protocol, just an ordinary UDP socket and your own payload interpretation on top?
3) RTP doesn't give any performance boost compared to UDP, since it adds more headers; all it does is make sure packets arrive in a sort-of correct manner based on timestamps and sequence numbers, right?
Is it really that useful to use an RTP implementation for basic VoIP needs instead of adding basic sequencing yourself? I realise that in video conferencing you really don't want frames to play out of order, but in audio conversations, would you really notice?
4) If my solution in #1 is not applicable and I would have to use a server as a data relay between clients, would multicast be a good solution to reduce server loads? Is multicast supported enough in routing hardware?
5) This is related to question 1). Why do routers/firewalls allow things like UDP hole punching? For example, two clients first connect to the server, then the server gives each client the other's IP/port, so the clients can talk to each other on those ports.
Why would firewalls allow data to be received from an IP other than the one used to make the connection on that very port? That sounds like a big security hole that should easily be filtered. I understand that source IP spoofing would trick it, but this?
6) To set up a UDP session between two parties (a client behind NAT and a server not behind NAT), does the client simply have to send a packet to the server, after which the session is allowed through the firewall, meaning the client can receive from the server too?
Based on article at wiki, http://en.wikipedia.org/wiki/UDP_hole_punching
7) Is SIP dependent on RTP? For some reason I got this impression, but I can't find data to back it up. I may add softphone functionality to my VoIP client in the future and want to make sure I have a good foundation (RTP if I really must, otherwise my own protocol on top of UDP).
Thanks in advance!
1) Raw sockets seem unnecessary for this application.
2) Yes.
3) RTP runs on top of UDP, so of course it adds overhead. In many ways RTP (ignoring RTCP) is already pretty much the bare minimum; if you implemented a halfway decent alternative, it would save you a few bytes at best, and you wouldn't be able to use any of the many RTP test tools.
7) SIP is completely independent of RTP. SIP is used to initiate sessions. SDP is the protocol commonly carried by SIP, and it is SDP that negotiates and controls the RTP voice/video streams.
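To illustrate answers 2) and 3): the RTP fixed header defined in RFC 3550 is 12 bytes placed at the start of an ordinary UDP payload, ahead of the audio data, so no raw socket is needed. A Python sketch of packing and parsing it; payload type 0 (PCMU), the SSRC and the 160-byte frame are example values, and CSRC lists, padding and header extensions are not handled.

```python
import socket
import struct

RTP_HEADER = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def pack_rtp(seq: int, timestamp: int, ssrc: int, payload: bytes,
             payload_type: int = 0, marker: bool = False) -> bytes:
    first = 2 << 6  # version 2, no padding, no extension, CSRC count 0
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


def unpack_rtp(packet: bytes) -> dict:
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "marker": bool(second >> 7),
        "payload_type": second & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def send_frame(sock: socket.socket, addr, seq: int, frame: bytes, ssrc: int) -> None:
    # Ordinary UDP socket: the RTP header is simply the first 12 bytes of the payload.
    # For 8 kHz audio in 20 ms frames the timestamp advances by 160 per packet;
    # a real sender would also start seq and timestamp at random values.
    sock.sendto(pack_rtp(seq, seq * 160, ssrc, frame), addr)
```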