How do you throttle the bandwidth of a socket connection in C? - c++

I'm writing a client-server app using BSD sockets. It needs to run in the background, continuously transferring data, but cannot hog the bandwidth of the network interface from normal use. Depending on the speed of the interface, I need to throttle this connection to a certain max transfer rate.
What is the best way to achieve this, programmatically?

The problem with sleeping a constant amount (say, 1 second) after each transfer is that you will get choppy network performance.
Let BandwidthMaxThreshold be the desired bandwidth threshold.
Let TransferRate be the current transfer rate of the connection.
Then...
If you detect that TransferRate > BandwidthMaxThreshold, then set SleepTime = 1 + SleepTime * 1.02 (i.e., increase the sleep time by 2%, with a small constant added so it can grow from zero)
Before or after each network operation do a
Sleep(SleepTime)
If you detect that TransferRate is well below BandwidthMaxThreshold, you can decrease SleepTime. Alternatively, you could simply decay SleepTime over time; eventually it will reach 0 again.
Instead of a fixed 2% increase, you could also increase SleepTime by an amount proportional to the difference TransferRate - BandwidthMaxThreshold.
This approach is nice because it introduces no sleeps at all when the transfer rate is already below the threshold.
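A minimal sketch of that adaptive-sleep loop in C++ (the class, member names, and constants are illustrative, simply following the description above):

#include <chrono>
#include <thread>

class AdaptiveThrottle {
public:
    explicit AdaptiveThrottle(double maxBytesPerSec) : maxRate_(maxBytesPerSec) {}

    // Call once around every send()/recv(), passing the bytes just transferred.
    void onTransfer(std::size_t bytes) {
        using clock = std::chrono::steady_clock;
        totalBytes_ += bytes;
        double elapsed = std::chrono::duration<double>(clock::now() - start_).count();
        if (elapsed <= 0.0) return;
        double rate = totalBytes_ / elapsed;           // current transfer rate, bytes/s

        if (rate > maxRate_)
            sleepMs_ = 1.0 + sleepMs_ * 1.02;          // grow the sleep by 2% (plus a 1 ms floor)
        else
            sleepMs_ *= 0.98;                          // decay the sleep when under the threshold

        if (sleepMs_ >= 1.0)
            std::this_thread::sleep_for(std::chrono::duration<double, std::milli>(sleepMs_));
    }

private:
    double maxRate_;
    double sleepMs_ = 0.0;
    std::size_t totalBytes_ = 0;
    std::chrono::steady_clock::time_point start_ = std::chrono::steady_clock::now();
};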

The best way would be to use a token bucket.
Transmit only when you have enough tokens to fill a packet (1460 bytes, one TCP MSS, is a good amount); if you are on the receive side, read from the socket only when you have enough tokens. A bit of simple math tells you how long to wait before you have enough tokens, so you can sleep for that amount of time. Be careful to credit yourself with tokens for the time you actually slept, since most operating systems will sleep your process for longer than you asked.
To control the size of the bursts, limit the maximum number of tokens you can accumulate; one second's worth of tokens is a good amount.
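A minimal token-bucket sketch along those lines (class, member, and constant names are illustrative):

#include <algorithm>
#include <chrono>
#include <thread>

class TokenBucket {
public:
    TokenBucket(double bytesPerSec, double burstBytes)
        : rate_(bytesPerSec), capacity_(burstBytes), tokens_(burstBytes),
          last_(std::chrono::steady_clock::now()) {}

    // Block until `bytes` tokens are available, then consume them.
    void consume(double bytes) {
        refill();
        while (tokens_ < bytes) {
            double deficit = bytes - tokens_;
            // Sleep roughly until enough tokens have accrued; the sleep may overshoot,
            // but refill() re-reads the clock, so overslept time still earns tokens.
            std::this_thread::sleep_for(std::chrono::duration<double>(deficit / rate_));
            refill();
        }
        tokens_ -= bytes;
    }

private:
    void refill() {
        auto now = std::chrono::steady_clock::now();
        double elapsed = std::chrono::duration<double>(now - last_).count();
        last_ = now;
        tokens_ = std::min(capacity_, tokens_ + elapsed * rate_);  // cap at the burst size
    }

    double rate_, capacity_, tokens_;
    std::chrono::steady_clock::time_point last_;
};

// Usage sketch: pace writes of one MSS-sized packet at a time.
//   TokenBucket bucket(1000000 /* 1 MB/s */, 1000000 /* one second of burst */);
//   bucket.consume(1460);
//   send(sock, buf, 1460, 0);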

I've had good luck with trickle. It's cool because it can throttle arbitrary user-space applications without modification. It works by preloading its own send/recv wrapper functions which do the bandwidth calculation for you.
The biggest drawback I found is that it's hard to coordinate multiple applications that need to share a finite amount of bandwidth. "trickled" helps with this, but I found it complicated.
Update in 2017: it looks like trickle moved to https://github.com/mariusae/trickle


What is the proper way to calculate latency in omnet++?

I have written a simulation module. For measuring latency, I am using this:
simTime().dbl() - tempLinkLayerFrame->getCreationTime().dbl();
Is this the proper way? If not, please suggest an alternative; sample code would be very helpful.
Also, is the simTime() latency the actual latency in microseconds, which I can report in my research paper, or do I need to scale it?
Also, I found that the channel data rate and channel delay have no impact on the link latency; instead, the latency varies when I vary the trigger interval. For example:
timer = new cMessage("SelfTimer");
scheduleAt(simTime() + 0.000000000249, timer);
If this is not the proper way to trigger a simple module repeatedly, please suggest one.
Assuming both simTime and getCreationTime use the OMNeT++ class for representing time, you can operate on them directly, because that class overloads the relevant operators. Going with what the manual says, I'd recommend using a signal for the measurements (e.g., emit(latencySignal, simTime() - tempLinkLayerFrame->getCreationTime());).
simTime() is in seconds, not microseconds.
Regarding your last question, this code will have problems if you use it for all nodes, and you start all those nodes at the same time in the simulation. In that case you'll have perfect synchronization of all nodes, meaning you'll only see collisions in the first transmission. Therefore, it's probably a good idea to add a random jitter to every newly scheduled message at the start of your simulation.
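A rough sketch of both points (assuming a "latency" signal declared, with a matching @statistic, in the module's NED file; the module name and structure are illustrative):

#include <omnetpp.h>
using namespace omnetpp;

class Sender : public cSimpleModule
{
    cMessage *timer = nullptr;
    simsignal_t latencySignal;
    simtime_t interval = 0.000000000249;   // the trigger period from the question

  protected:
    virtual void initialize() override
    {
        latencySignal = registerSignal("latency");
        timer = new cMessage("SelfTimer");
        // Random jitter on the first trigger desynchronizes the nodes.
        scheduleAt(simTime() + interval + uniform(0, interval.dbl()), timer);
    }

    virtual void handleMessage(cMessage *msg) override
    {
        if (msg == timer) {
            // ... build and send a frame here ...
            scheduleAt(simTime() + interval, timer);   // steady period after the first trigger
        }
        else {
            // Record the latency of a received frame as a signal; no .dbl() needed.
            emit(latencySignal, simTime() - msg->getCreationTime());
            delete msg;
        }
    }

  public:
    virtual ~Sender() { cancelAndDelete(timer); }
};

Define_Module(Sender);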

Is it possible to stream data into a ZeroMQ message as it is being sent via UDP?

We're working with a latency-critical application at 30fps, with multiple stages in the pipeline (e.g. compression, network send, image processing, 3D calculations, texture sharing, etc).
Ordinarily we could achieve these multiple stages like so:
[Process 1][Process 2][Process 3]
---------------time------------->
However, if we can stack these processes, then it is possible that as [Process 1] is working on the data, it is continuously passing its result to [Process 2]. This is similar to how iostream works in c++, i.e. "streaming". With threading, this can result in reduced latency:
[Process 1]
[Process 2]
[Process 3]
<------time------->
Let's presume that [Process 2] is our UDP communication (i.e. [Process 1] is on Computer A and [Process 3] is on Computer B).
The output of [Process 1] is approximately 3 MB (i.e. typically > 300 jumbo packets at 9 KB each), therefore we can presume that when we call ZeroMQ's:
socket->send(message); // message size is 3 MB
Then somewhere in the library or OS, the data is being split into packets which are sent in sequence. This function presumes that the message is already fully formed.
Is there a way (e.g. an API) for parts of the message to be 'under construction' or 'constructed on demand' while sending large data over UDP? And would this also be possible on the receiving side (i.e. being allowed to act on the beginning of the message while the remainder is still arriving)? Or is the only way to split the data into smaller chunks ourselves manually?
Note: the network connection is a straight-wire GigE connection between Computers A and B.
No, you can't realistically do it. The API doesn't provide for it, and ZeroMQ promises that a receiver will get a complete message (including multi-part messages) or no message at all, which means that it won't present a message to a receiver until it's fully transferred. Splitting the data yourself into individually actionable chunks that are sent as separate ZeroMQ messages would be the right thing to do, here.
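For completeness, a minimal sketch of that manual chunking using the plain ZeroMQ C API; the 64 KB chunk size and the empty end-of-frame marker are arbitrary conventions of this sketch, not anything ZeroMQ prescribes:

#include <string.h>
#include <zmq.h>

static void send_in_chunks(void *socket, const char *data, size_t total)
{
    const size_t CHUNK = 64 * 1024;               /* 64 KB per message (illustrative) */
    for (size_t off = 0; off < total; off += CHUNK) {
        size_t n = (total - off < CHUNK) ? (total - off) : CHUNK;
        zmq_send(socket, data + off, n, 0);       /* each chunk is a complete message */
    }
    zmq_send(socket, "", 0, 0);                   /* empty message marks end-of-frame */
}

The receiver can start working as soon as the first chunk arrives. Note that ZeroMQ multi-part messages (ZMQ_SNDMORE) would not help here, because a multi-part message is still delivered to the receiver atomically as a whole.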
TL;DR: the simple answer is no, a few lines of ZeroMQ code will not make your project win on their own. The project is doable, but another design view is needed.
Having stated a minimum set of facts:
- 30 fps,
- 3 MB/frame,
- a 3-stage processing pipeline,
- a private host-to-host GigE interconnect,
there is not much to decide without further details.
Sure, there is a budget of about 33.333 [ms] for the end-to-end pipeline processing (of which you already plan to lose some 30 [ms] straight away on network I/O), and the rest is left in the designer's hands. Ouch!
A latency-controlled design must not skip a real-time I/O design phase.
ZeroMQ is a powerhouse, but that does not mean it can save a poor design.
If you spend a few moments on the timing constraints, you will see that LAN network-I/O latency is the worst enemy in your design.
Ref.: Latency numbers everyone should know
1) DESIGN the processing phases
2) BENCHMARK each phase's implementation model
3) PARALLELISE wherever Amdahl's Law says it is meaningful and possible
If your code allows for parallelised processing, your plans will make much better use of "progressive"-pipeline processing with ZeroMQ's zero-copy, (almost) zero-latency, zero-blocking inproc: transport class, and your code can support "progressive" pipelining as data moves through the processing phases.
Remember, this is not a one-liner; do not expect a single line of code to implement your "progressive" pipelining for you.
Nanoseconds matter. Read the numbers from your data-processing micro-benchmarks carefully; they decide your success.
Here you may read how much time was "lost"/"wasted" in just changing colour representations, something your code will need for object detection, 3D scene processing, and texture post-processing. So set your design criteria to rather high standards.
Check the numbers in the lower-left window for the milliseconds lost in this real-time pipeline.
If your code's processing requirements do not fit safely into your 33,000,000 [ns] time budget with { quad | hexa | octa }-core CPU resources, and the numerical processing would benefit from many-core GPU resources, then Amdahl's Law may well justify an asynchronous multi-GPU-kernel processing approach, with its additional ~21,000-23,000 [ns] lost in initial/terminal data transfers and ~350-700 [ns] introduced by GPU.gloMEM -> GPU.SM.REG latency masking (which happily has enough quasi-parallel thread depth in your image-processing case, even for the low computational density of the expected trivial GPU kernels).
Ref.: GPU/CPU latencies one should validate the initial design against.
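As a rough illustration of the inproc:-based "progressive" pipelining mentioned above (a sketch only: the stage logic, endpoint names, chunk sizes, and end-of-stream convention are all illustrative, not part of any ZeroMQ recipe):

#include <thread>
#include <zmq.h>

static void stage2(void *ctx)
{
    void *in = zmq_socket(ctx, ZMQ_PULL);
    zmq_connect(in, "inproc://stage1-out");
    char buf[9000];
    for (;;) {
        int n = zmq_recv(in, buf, sizeof buf, 0);
        if (n <= 0)                              /* empty message marks end of stream */
            break;
        /* ... process this chunk and push it on to stage 3 in the same way ... */
    }
    zmq_close(in);
}

int main()
{
    void *ctx = zmq_ctx_new();
    void *out = zmq_socket(ctx, ZMQ_PUSH);
    zmq_bind(out, "inproc://stage1-out");        /* bind before the worker connects */
    std::thread worker(stage2, ctx);

    /* Stage 1: push chunks downstream as soon as each one is ready,
       instead of waiting for the whole 3 MB frame to be assembled. */
    char chunk[9000] = {0};
    for (int i = 0; i < 300; ++i)
        zmq_send(out, chunk, sizeof chunk, 0);
    zmq_send(out, "", 0, 0);                     /* end-of-stream marker */

    worker.join();
    zmq_close(out);
    zmq_ctx_term(ctx);
    return 0;
}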

Packet Delay Variation (PDV)

I am currently implementing a video streaming application where the goal is to utilize as much of the gigabit Ethernet bandwidth as possible.
The application protocol is built on top of TCP/IP
The network library uses the asynchronous IOCP mechanism
Only streaming over a LAN is needed
Packets do not need to go through routers
This simplifies many things. Nevertheless, I am experiencing problems with packet delay variation.
This means that a video frame which should arrive, for example, every 20 ms (a 1280x720p, 50 Hz video signal) sometimes arrives delayed by tens of milliseconds. More specifically:
The average frame rate is maintained
The maximum video frame delay depends on network utilization
The more data on the LAN, the higher the maximum video frame delay
For example, when bandwidth usage is 800 Mbit/s, the PDV is about 45-50 ms.
To my questions:
What are the practical limits on lowering that value?
Do you know of any measurement reports available on the internet that deal with this?
I want to know whether there is some subtle error in my application (perhaps excessive locking) or whether there is no way to improve these numbers with current technology.
For video streaming, I would recommend using UDP instead of TCP, as it has less overhead and packet confirmation is usually not needed: by the time retransmitted data arrived, it would already be obsolete.
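If you do switch to UDP (or even while staying on TCP), a cheap way to pin down where the delay variation comes from is to stamp each datagram with a sequence number and a send timestamp, so the receiver can measure per-packet delay variation directly; the constant offset between the two machines' clocks cancels out of the variation. A minimal POSIX-socket sketch (the address, port, and header layout are illustrative; a Winsock version differs only in setup and teardown):

#include <arpa/inet.h>
#include <chrono>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

struct FrameHeader {
    uint32_t seq;        // datagram sequence number
    uint64_t sendNs;     // sender steady-clock timestamp, nanoseconds
};

int main()
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in dst{};
    dst.sin_family = AF_INET;
    dst.sin_port = htons(5000);                        // illustrative port
    inet_pton(AF_INET, "192.168.1.2", &dst.sin_addr);  // illustrative receiver address

    char packet[1400];                                 // stay below a typical MTU
    for (uint32_t seq = 0; seq < 100; ++seq) {
        FrameHeader h;
        h.seq = seq;
        h.sendNs = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count();
        std::memcpy(packet, &h, sizeof h);
        sendto(sock, packet, sizeof packet, 0,
               reinterpret_cast<sockaddr *>(&dst), sizeof dst);
    }
    close(sock);
    return 0;
}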

How do I send UDP packets at a specified rate in C++ on Windows?

I'm writing a program that implements the RFC 2544 network test. As part of the test, I must send UDP packets at a specified rate.
For example, I should send 64-byte packets at 1 Gb/s. That means I should send a UDP packet every 0.5 microseconds. The pseudocode could look like this:
while (true) {
    some_sleep(0.5);
    Send_UDP();
}
But I'm afraid there is no some_sleep() function on Windows (or on Linux, for that matter) that can give me 0.5-microsecond resolution.
Is it possible to do this task in C++, and if yes, what is the right way to do it?
Two approaches:
Implement your own sleep by busy-looping on a high-resolution timer such as Windows' QueryPerformanceCounter (a sketch of this approach follows below).
Allow slight variations in rate: insert a Sleep(1) whenever you are far enough ahead of the calculated rate, and use timeBeginPeriod to get 1 ms timer resolution.
For both approaches, you can't rely on the sleeps being exact. You will need to keep running totals and adjust the sleep period as you get ahead of or behind the target rate.
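A sketch of the first approach under those caveats (Windows-only; Send_UDP() stands in for the question's own send routine, and the rate and packet count are illustrative):

#include <windows.h>
#include <cstdint>

void Send_UDP();  // the caller's send routine, assumed to exist elsewhere

void send_at_rate(double packetsPerSecond, uint64_t packetCount)
{
    LARGE_INTEGER freq, start, now;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (uint64_t i = 0; i < packetCount; ++i) {
        // Absolute deadline for packet i, in QPC ticks, measured from the start.
        const long long deadline =
            start.QuadPart + (long long)((double)i * freq.QuadPart / packetsPerSecond);
        do {
            QueryPerformanceCounter(&now);
        } while (now.QuadPart < deadline);           // busy-wait until the deadline
        Send_UDP();
    }
}

Computing absolute deadlines from the start time, rather than adding a fixed delta each iteration, keeps the long-run average rate on target even when individual iterations overrun.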
This might be helpful, but I doubt it's directly portable to anything but Windows. Implement a Continuously Updating, High-Resolution Time Provider for Windows by Johan Nilsson.
However, do keep in mind that for packets that small, the IP and UDP overhead is going to account for a large fraction of the actual on-the-wire data. This may be what you intended, or not. A very quick scan of RFC 2544 suggests that much larger packets are allowed; you may be better off going that route instead. Consistently delaying for as little as 0.5 microseconds between each Send_UDP() call is going to be difficult at best.
To transmit 64-byte Ethernet frames at line rate, you actually want to send one every 672 ns (64 bytes of frame plus 20 bytes of preamble and inter-frame gap is 84 bytes, i.e. 672 bits, which takes 672 ns at 1 Gbit/s). I think the only way to do that is to get really friendly with the hardware. You'll be running up against bandwidth limitations of the PCI bus, and the system call to send one packet will take significantly longer than 672 ns. A sleep function is the least of your worries.
I guess you should be able to do it with Boost.Asio's timer functionality. I haven't tried it myself, but I believe deadline_timer accepts a boost::posix_time::nanosec as well as a boost::posix_time::second.
Check out the timer examples in the Boost.Asio documentation.
There is a native Windows implementation of nanosleep available; if the GPL is acceptable you can reuse that code, otherwise you'll have to reimplement it.

how to control socket rate?

I want to know how I can control the rate of my network interface. In fact, I want to receive data at a rate of 32 kbit/s and send the received data back onto the network at a rate of 1 Mbit/s. Do you have any ideas on how to control the interface's rate, or do you know any tricks that could help?
Thanks in advance.
There is a difference between data throughput rate and the baud rate of the connection. Generally, you want the baud rate to be as fast as possible (without errors of course). Some low level drivers or the OS may allow you to control this, but it is fundamentally a low-level hardware/driver issue.
For data throughput rate, throttling sending is easy: just don't call send() as often. This requires tracking how much you send per time period and limiting it with sleeps.
Receiving can work the same way, but bear in mind that if the other side sends faster than you are willing to receive, data will back up in the kernel buffers; over TCP, flow control will eventually slow the sender down, while over UDP packets will simply be dropped.
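A sketch of that receive-side throttling (POSIX sockets; the function name and the 1-second accounting window are illustrative). Data you don't ask for simply stays in the kernel's receive buffer, which is what ultimately provides the back-pressure:

#include <chrono>
#include <thread>
#include <sys/socket.h>
#include <sys/types.h>

ssize_t throttled_recv(int sock, char *buf, size_t len, double maxBytesPerSec)
{
    using clock = std::chrono::steady_clock;
    static clock::time_point windowStart = clock::now();   // sketch only: not thread-safe
    static double bytesThisWindow = 0.0;

    double elapsed = std::chrono::duration<double>(clock::now() - windowStart).count();
    if (elapsed >= 1.0) {                         // start a new 1-second window
        windowStart = clock::now();
        bytesThisWindow = 0.0;
        elapsed = 0.0;
    }
    if (bytesThisWindow >= maxBytesPerSec) {      // budget spent: wait out the window
        std::this_thread::sleep_for(std::chrono::duration<double>(1.0 - elapsed));
        windowStart = clock::now();
        bytesThisWindow = 0.0;
    }
    // Never ask for more than the remaining budget; the kernel buffers the rest.
    size_t budget = (size_t)(maxBytesPerSec - bytesThisWindow);
    ssize_t n = recv(sock, buf, len < budget ? len : budget, 0);
    if (n > 0)
        bytesThisWindow += (double)n;
    return n;
}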
You can do this; you just have to keep track of time and take care not to recv() more (or less) than 32 kbit in each second (you can control how much you request via the function arguments), and apply the same practice to send().
I've done this "the hard way" (dunno if there is an easier way). Specifically, I did it by controlling the rate at which I called send() and/or recv(), and how much data I indicated I was willing to send/receive in each of those calls. It takes a bit of math to do it right, but it's not impossible.