Jitter buffer calculation in an RTP receiver - rtp

Can someone tell me how to calculate the buffer size needed to de-jitter received packets in RTP? My connection is 1 Gbps and the maximum bitrate of the ASI stream is 80 Mbps. So how can I calculate the buffer size I need?
Regards
Arash

Two different questions?
https://net7mma.codeplex.com/SourceControl/latest#Rtp/RtpClient.cs
That is my implementation; otherwise, see RFC 3550: https://www.rfc-editor.org/rfc/rfc3550
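The jitter calculation itself is specified in RFC 3550 (Section 6.4.1, with sample code in Appendix A.8). Below is a minimal Python sketch of that estimator; the class and method names are mine, and both inputs must already be in the same RTP clock units:

    class JitterEstimator:
        """Interarrival jitter per RFC 3550, Appendix A.8."""

        def __init__(self):
            self.prev_transit = None
            self.jitter = 0.0  # running estimate, in RTP timestamp units

        def update(self, arrival_ts, rtp_ts):
            # Both values must be in the same RTP clock units
            # (e.g. 90 kHz for an MPEG-TS/ASI-over-RTP stream).
            transit = arrival_ts - rtp_ts
            if self.prev_transit is not None:
                d = abs(transit - self.prev_transit)
                # first-order filter with gain 1/16, as the RFC specifies
                self.jitter += (d - self.jitter) / 16.0
            self.prev_transit = transit
            return self.jitter

The de-jitter buffer is then sized from this running estimate (plus a safety margin), not from the bitrates alone: 1 Gbps vs. 80 Mbps only tells you the buffer drains faster than it fills; the measured jitter tells you how deep it must be.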

Related

How to play audio stream over UDP?

I'm writing a Windows application that receives audio data from an Android app. I use UDP to transfer the data over the LAN, and RtAudio to play the audio stream.
Every UDP packet payload is an array of audio samples in 32 kHz/16-bit PCM format.
When the payload size is 576 bytes (288 samples), everything is OK and we hear a clear voice.
But when the payload size is 192 bytes (96 samples), the sound is not clear.
Has anyone had this problem?
It is a balancing act to determine the optimum size of each buffer packet: too large and you progressively move away from real-time response, yet too small and the code spends proportionately too much time negotiating the boilerplate plumbing of simply transferring the data. It looks like you have hit this lower boundary when, as you say, 192 bytes starts acting up.
This is true independent of the transport mechanism. Also keep in mind that the wall-clock duration consumed by a few hundred bytes of audio is tiny (CD-quality mono audio runs at 44,100 samples per second), so you will not lose much real-time responsiveness by staying above the lower bound you have hit.
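For a sense of scale, a back-of-the-envelope sketch of the two payload sizes mentioned in the question, assuming 32 kHz, 16-bit (2 bytes/sample), mono PCM:

    SAMPLE_RATE = 32_000      # Hz
    BYTES_PER_SAMPLE = 2      # 16-bit PCM

    for payload_bytes in (576, 192):
        samples = payload_bytes // BYTES_PER_SAMPLE
        duration_ms = 1000.0 * samples / SAMPLE_RATE
        packets_per_sec = SAMPLE_RATE / samples
        print(f"{payload_bytes} B = {samples} samples = {duration_ms:.1f} ms "
              f"-> {packets_per_sec:.0f} packets/s")

    # 576 B = 288 samples = 9.0 ms -> 111 packets/s
    # 192 B = 96 samples = 3.0 ms -> 333 packets/s

So the 192-byte payload means roughly three times as many packets per second, each carrying only 3 ms of audio; a single late packet is then enough to starve the playback callback.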

GOP size for realtime video stream

I'm working on a kind of rich remote-desktop system, with a video stream of the desktop encoded using avcodec/x264. I have to set the GOP size for the stream manually, and so far I have been using a size of fps/2.
But I've just read the following on Wikipedia:
This structure [Group Of Pictures] suggests a problem because the fourth frame (a P-frame) is needed in order to predict the second and the third (B-frames). So we need to transmit the P-frame before the B-frames and it will delay the transmission (it will be necessary to keep the P-frame).
It means I'm creating a lot of latency, since the client needs to receive at least half of the GOP to output the first frame following the I-frame. What is the best strategy for the GOP size if I want the smallest latency possible? A GOP of 1 picture?
If you want to minimize latency with h264, you should generally avoid b-frames. This way the decoder has at least a chance to emit decoded frames early. This prevents decoder-induced latency.
You may also want to tune the encoder for latency, by reducing/disabling look-ahead. x264 has a "zerolatency" setting which should be a good starting point for finding your optimal settings.
The "GOP" size (which afaik is not really defined for h264; I'll just assume you mean the I(DR)-frame interval) does not necessarily affect the latency. This parameter only affects how long a client will have to wait until it can "sync" on the stream (time-to-first-picture).

Are there any constraints when encoding an audio signal?

I capture PCM sound at some sampling rate, e.g. 24 kHz. I need to encode it using some codec (I use Opus) to send over the network. I noticed that at some sampling rates I use for encoding with Opus, I often hear extra "cracking" noise at the receiving end. At other rates, it sounds OK. That might be an implementation bug, but I thought there might also be some constraints that I don't know about.
I also noticed that if I use a different sampling rate while decoding the Opus-encoded audio stream, I get a lower or higher pitch, which seems logical to me. So I've read that I need to resample on the other end if the receiving side doesn't support the original PCM sampling rate.
So I have 2 questions regarding all this:
Are there any constraints on the sampling rate (or other parameters) of audio encoding? (E.g. I have 24 kHz PCM sound; maybe only certain sample rates should be used with it?)
Are there any common techniques to provide the same sound quality at both sides when sending audio stream over network?
The crackling noises are most likely a bug, since there are no limitations on the sample rate that would result in this kind of noise. (There are other kinds of signal changes that come with sample-rate conversion, especially when downsampling to a lower sample rate; but definitely not crackling.)
A wild guess would be that there is something wrong with the input buffer. Crackling often occurs if samples are omitted or duplicated, oftentimes the result of the boundaries of subsequent buffers not lining up correctly.
Sending audio data over the network in real time will require compression, no matter what; the required data rate is simply too high otherwise. There are codecs that provide lossless audio compression (e.g. FLAC), but their compression ratio is low compared to a lossy codec such as Opus.
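As a rough illustration of those rates (assuming 16-bit mono PCM at the 24 kHz mentioned in the question; the Opus figure is a typical VoIP target, not a measured value):

    pcm_bps = 24_000 * 16 * 1   # 384,000 bit/s of raw PCM
    opus_bps = 32_000           # a common Opus speech/VoIP bitrate
    print(f"raw PCM : {pcm_bps / 1000:.0f} kbit/s")
    print(f"Opus    : {opus_bps / 1000:.0f} kbit/s (~{pcm_bps / opus_bps:.0f}x smaller)")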
The problem was solved by buffering packets at the receiving end and writing them to the soundcard buffer only once a certain amount had accumulated. The "crackling" noise was then most likely due to the gaps between subsequent frames that were sent to the soundcard buffer.
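A minimal sketch of that fix, assuming a packet-in/audio-callback-out structure; the names and the threshold value are illustrative, not from the original code:

    from collections import deque

    PREBUFFER_PACKETS = 8   # more = smoother playback, but higher latency

    class PlayoutBuffer:
        def __init__(self):
            self.queue = deque()
            self.started = False

        def push(self, packet):
            # Called when a packet arrives from the network.
            self.queue.append(packet)
            if not self.started and len(self.queue) >= PREBUFFER_PACKETS:
                self.started = True   # enough buffered: start playback

        def pull(self):
            # Called by the audio callback; returns None while still
            # prebuffering or on underrun, so the caller plays silence.
            if self.started and self.queue:
                return self.queue.popleft()
            return None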

Is reading from a buffer quicker than reading from a file in Python?

I have an FPGA board, and I wrote VHDL code that receives images (in binary) from a serial port and saves them in the SDRAM on my board. The FPGA then displays the images on a monitor via a VGA cable. My problem is that filling the SDRAM takes too long (about 10 minutes at a 115200 baud rate).
On my computer, I wrote Python code to send an image (in binary) to the FPGA via the serial port. My code reads a binary file saved on my hard disk and sends it to the FPGA.
My question is: if I use a buffer to hold my images instead of a binary file, do I get a better result? If so, can you help me with how to do that, please? If not, can you suggest a solution?
Thanks in advance,
Unless you are significantly compressing before download, and decompressing the image after download, the problem is your 115,200 baud transfer rate, not the speed of reading from a file.
At the standard N/8/1 line encoding, each byte requires 10 bits to transfer, so you will be transferring 11,520 bytes per second.
In 10 minutes, you will transfer 11,520 * 60 * 10 = 6,912,000 bytes. At 3 bytes per pixel (for R, G, and B), this is 2,304,000 pixels, which happens to be the number of pixels in a 1920 by 1200 image.
The answer is to (a) increase the baud rate; and/or (b) compress your image (using something simple to decompress on the FPGA like RLE, if it is amenable to that sort of compression).
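To see how the transfer time scales with the baud rate (pure arithmetic, assuming the same N/8/1 framing and the 1920x1200 24-bit frame worked out above):

    FRAME_BYTES = 1920 * 1200 * 3    # one 24-bit RGB frame

    for baud in (115_200, 921_600, 3_000_000):
        bytes_per_sec = baud / 10    # 10 line bits per byte at N/8/1
        seconds = FRAME_BYTES / bytes_per_sec
        print(f"{baud:>9} baud: {seconds / 60:5.1f} minutes")

    #    115200 baud:  10.0 minutes
    #    921600 baud:   1.2 minutes
    #   3000000 baud:   0.4 minutes

The source of the bytes (file vs. in-memory buffer) is irrelevant at these rates; the serial link is the bottleneck by several orders of magnitude.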

How to implement a TCP traffic limitation for the sender side?

I'm about to implement a webcam video-chat system for multiple users in C++ (Windows/Linux). As the 'normal' user is usually connected via DSL/cable, there is a strong bandwidth limitation for my (preferred) TCP/IP connections.
The basic idea is to transmit the highest possible frame rate given a bandwidth limitation on the sender side. (Other applications may still require internet bandwidth in the background.) In a second step, the camera capture rate shall be automatically adjusted to the network limitations to avoid unnecessary CPU overhead.
What I have is a constant stream of compressed images (with strongly varying buffer sizes) that have to be transmitted to the remote side. Given a limit of, let's say, 20 kB/s, how do I best implement that limitation? (Note that the user shall define this limit!)
Thx in advance,
Mayday
Edit: Question clarifications (sorry!)
It's about how to traffic-shape an arbitrary TCP/IP connection.
It's not about how to implement image rate/quality reduction, as my use case might suggest. (Although I didn't consider automatically adjusting image compression yet. (Thanks Jon))
There are two things you can do to reduce your bandwidth:
Send smaller images (more compression)
Send fewer images
When implementing an algorithm that picks image size and quantity to honor the user-selected limit, you have to balance between a simple/robust algorithm and a performant algorithm (one that makes maximum use out of the limit).
The first approach I would try is to use a rolling average of the bandwidth you are using at any point in time to "seed" your algorithm. Every once in a while, check the average. If it becomes more than your limit, instruct the algorithm to use less (in proportion to how much you overstepped the limit). If it becomes significantly lower than your limit, say less than 90%, instruct the algorithm to use more. (For enforcing the hard limit itself, see the token-bucket sketch after the list below.)
The less/more instruction might be a variable (maybe int or float, really there is much scope for inventiveness here) used by your algorithm to decide:
How often to capture an image and send it
How hard to compress that image
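One standard mechanism for the hard limit, not specific to this answer, is a token bucket refilled at the user-set rate; a frame is sent only when enough tokens have accumulated. A hedged Python sketch, with illustrative names:

    import time

    class TokenBucket:
        """Refill at the user-set rate; a frame may be sent only when
        enough tokens (bytes) have accumulated."""

        def __init__(self, rate_bytes_per_sec, burst_bytes):
            self.rate = rate_bytes_per_sec
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def try_consume(self, nbytes):
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return True
            return False  # over budget: skip this frame or compress harder

    # Example: a user-set 20 kB/s limit with a one-second burst allowance.
    bucket = TokenBucket(20_000, 20_000)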
You need a buffer / queue of at least 3 frames:
One frame currently being sent to the network;
One complete frame to be sent next;
One frame currently being copied from the camera.
When the network sender finishes sending a frame, it copies the "to be sent next" frame to the "currently sending" slot. When the camera reader finishes copying a frame from the camera, it replaces the "to be sent next" frame with the copied frame. (Obviously, synchronisation is required around the "to be sent next" frame).
The sender can then modulate its sending rate as it sees fit. If it's running slower than the camera, it will simply drop frames.
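A sketch of the synchronised "to be sent next" slot from the scheme above, assuming one camera thread and one sender thread (the "currently sending" and "currently being copied" frames live as locals in those threads); names are illustrative:

    import threading

    class FrameMailbox:
        """Latest-frame mailbox for the 'to be sent next' slot. The camera
        thread overwrites it; the sender thread takes it. If the sender is
        slower than the camera, intermediate frames are silently dropped."""

        def __init__(self):
            self._lock = threading.Lock()
            self._next = None

        def put(self, frame):            # called from the camera thread
            with self._lock:
                self._next = frame       # overwrite: drops any unsent frame

        def take(self):                  # called from the sender thread
            with self._lock:
                frame, self._next = self._next, None
                return frame             # None if no fresh frame is ready

Overwriting the slot rather than queueing is what gives the frame-dropping behaviour: the sender always transmits the freshest complete frame, never a backlog.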