I capture PCM sound at some sampling rate, e.g. 24 kHz. I need to encode it with some codec (I use Opus) to send it over the network. I noticed that at some sampling rates I use for encoding with Opus, I often hear extra "cracking" noise at the receiving end. At other rates, it sounds OK. That might be an implementation bug, but I thought there might also be constraints that I don't know about.
I also noticed that if I use another sampling rate while decoding an Opus-encoded audio stream, I get a lower or higher pitch, which seems logical to me. So I've read that I need to resample on the receiving end if that side doesn't support the original PCM sampling rate.
So I have two questions regarding all this:
Are there any constraints on the sampling rate (or other parameters) of audio encoding? (E.g., with 24 kHz PCM sound, are there certain sample rates I should use with it?)
Are there any common techniques to provide the same sound quality on both sides when sending an audio stream over the network?
The crackling noises are most likely a bug, since there are no limitations on the sample rate that would result in this kind of noise (there are other kinds of signal changes that come with sample rate conversion, especially when downsampling to a lower rate, but definitely not crackling).
A wild guess would be that something is wrong with the input buffer. Crackling often occurs when samples are omitted or duplicated, frequently because the boundaries of subsequent buffers don't line up correctly.
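To address the constraints part of the question: libopus only accepts sampling rates of 8, 12, 16, 24, and 48 kHz, and each call to opus_encode must be handed exactly one frame of 2.5, 5, 10, 20, 40, or 60 ms. A minimal sketch for your 24 kHz case (the libopus calls are real; the buffer sizes and application type are just illustrative):

```cpp
// Sketch: encoding 24 kHz mono PCM with libopus, one 20 ms frame at a time.
// libopus only accepts rates of 8000, 12000, 16000, 24000, or 48000 Hz, and
// each opus_encode() call must receive exactly 2.5/5/10/20/40/60 ms of audio.
#include <opus/opus.h>

constexpr int kRate      = 24000;            // one of the permitted rates
constexpr int kChannels  = 1;
constexpr int kFrameSize = kRate / 50;       // 20 ms = 480 samples per channel

int encode_frame(OpusEncoder *enc, const opus_int16 *pcm,
                 unsigned char *packet, opus_int32 max_bytes) {
    // Feeding a partial or oversized frame here is exactly the kind of
    // buffer-boundary mistake that produces crackling at the receiver.
    return opus_encode(enc, pcm, kFrameSize, packet, max_bytes);
}

int main() {
    int err = 0;
    OpusEncoder *enc = opus_encoder_create(kRate, kChannels,
                                           OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) return 1;

    opus_int16    pcm[kFrameSize * kChannels] = {}; // filled from your capture
    unsigned char packet[4000];                     // generous packet buffer
    int n = encode_frame(enc, pcm, packet, sizeof(packet));
    // n > 0: packet holds n bytes ready to send; n < 0: an Opus error code.

    opus_encoder_destroy(enc);
    return n < 0;
}
```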
Sending audio data over the network in real time will require compression, no matter what; the required data rate is simply too high otherwise. There are codecs which provide lossless audio compression (e.g. FLAC), but their compression ratio is low compared to lossy codecs such as Opus.
The problem was solved by buffering packets at the receiving end and writing them to the sound card buffer as soon as a certain amount had accumulated. The 'crackling' noise was then most likely due to the gaps between subsequent frames sent to the sound card buffer.
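For reference, here is a minimal sketch of that receiving-side fix; the prebuffer threshold, the Frame type, and write_to_soundcard are placeholders for whatever your audio API provides:

```cpp
// Sketch of the fix described above: accumulate decoded packets and only
// start writing to the sound card once a threshold is buffered, so short
// network gaps no longer leave holes between the frames handed to playback.
#include <cstddef>
#include <deque>
#include <vector>

using Frame = std::vector<short>;               // one decoded PCM frame

void write_to_soundcard(const Frame &frame);    // placeholder: your audio API

constexpr std::size_t kPrebufferFrames = 10;    // tune to your latency budget

std::deque<Frame> jitter_buffer;
bool playing = false;

// Called once per decoded network packet.
void on_packet_decoded(Frame frame) {
    jitter_buffer.push_back(std::move(frame));
    if (!playing && jitter_buffer.size() >= kPrebufferFrames)
        playing = true;                         // enough margin, start playback
    while (playing && !jitter_buffer.empty()) {
        write_to_soundcard(jitter_buffer.front());
        jitter_buffer.pop_front();
    }
}
```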
I am looking for a standardized approach to streaming JPG images over the network. A C++ programming interface that can be easily integrated into existing software would also be desirable.
I am working on a GPGPU program which processes digitized signals and compresses them to JPG images. The size of the images can be defined by the user; typically the images are 1024 x 2048 or 2048 x 4096 pixels. I have written my "own" protocol, which first sends a header (image size in bytes, width, height and corresponding channel) and then the JPG data itself via TCP. After that, the receiver sends a confirmation that all data have been received and displayed correctly, so that the next image can be sent. So far so good; unfortunately my approach reaches just 12 fps, which does not satisfy the project requirements.
I am sure that there are better approaches with higher frame rates. Which approach do streaming services like Netflix and Amazon take for UHD videos? Of course I googled a lot, but I couldn't find any satisfactory results.
Is there a standardized method to send JPG images over the network with TCP/IP?
There are several internet protocols that are commonly used to transfer files over TCP. Perhaps the most commonly used protocol is HTTP. Another, older one is FTP.
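For example, here is a hedged sketch of pushing a single JPG to an HTTP endpoint with libcurl (the mime API calls are real libcurl; the URL and form-field name are made up):

```cpp
// Sketch: upload one JPEG via HTTP POST using libcurl's mime API.
// The URL and the "image" field name are placeholders for your server.
#include <curl/curl.h>

int send_jpeg(const char *path) {
    CURL *curl = curl_easy_init();
    if (!curl) return -1;

    curl_mime *mime = curl_mime_init(curl);
    curl_mimepart *part = curl_mime_addpart(mime);
    curl_mime_name(part, "image");          // form field name (assumed)
    curl_mime_filedata(part, path);         // stream the file from disk
    curl_mime_type(part, "image/jpeg");

    curl_easy_setopt(curl, CURLOPT_URL, "http://receiver.example/upload");
    curl_easy_setopt(curl, CURLOPT_MIMEPOST, mime);

    CURLcode rc = curl_easy_perform(curl);  // blocks until the POST completes

    curl_mime_free(mime);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : -1;
}
```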
Which approach do streaming services like Netflix and Amazon take for UHD videos?
Firstly, they don't use JPEG at all. They use a video compression codec (such as MPEG) that compresses the data not only spatially but also temporally (successive frames tend to hold similar data). An example of a protocol they might use to stream the data is DASH, which operates over HTTP.
I don't have a specific library in mind that already does these things well, but some items to keep in mind:
Most image/screen-share/video streaming applications use UDP, RTP, or RTSP exclusively for the video stream data, in a lossy fashion. They use TCP for control-flow data, like sending key commands or client/server communication about what to present, but the streamed data is not sent over TCP.
If you are streaming video, see this.
For sending individual images, you just need efficient methods to compress, serialize, and deserialize, and you probably want to do so in batches instead of one at a time: batch 10 JPEGs together, compress them, serialize them, send (see the sketch after this list).
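As a sketch of that batching idea, a simple length-prefixed wire format might look like this (the framing is invented for illustration; it also assumes both ends share the same byte order, which a real protocol should pin down):

```cpp
// Sketch: pack N JPEGs into one length-prefixed batch before sending,
// so you pay for one send() per batch instead of one per image.
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

Bytes pack_batch(const std::vector<Bytes> &jpegs) {
    Bytes out;
    uint32_t count = static_cast<uint32_t>(jpegs.size());
    out.insert(out.end(), reinterpret_cast<uint8_t *>(&count),
               reinterpret_cast<uint8_t *>(&count) + sizeof(count));
    for (const Bytes &img : jpegs) {
        uint32_t len = static_cast<uint32_t>(img.size());
        out.insert(out.end(), reinterpret_cast<uint8_t *>(&len),
                   reinterpret_cast<uint8_t *>(&len) + sizeof(len));
        out.insert(out.end(), img.begin(), img.end());  // raw JPEG bytes
    }
    return out;  // hand this to your socket layer as one write
}
```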
You mentioned fps, so it sounds like you are trying to stream video and not just copy images over quickly. I'm not entirely sure what you are trying to do. Can you elaborate on the digitized signals and why they have to be in JPEG? Could they be in some other format and converted to JPEG at the receiving end?
This is not a direct answer to your question, but a suggestion that you will probably need to change how you are sending your movie.
Here's a calculation: suppose you can get 1 Gb/s throughput out of your network. If each 2048x4096 file compresses to about 10 MB (80 Mb), then:
1,000,000,000 b/s ÷ 80,000,000 b/frame = 12.5 frames/s
So you can send about 12 frames a second. This means that if you have a continuous stream of JPGs you want to display and you want faster frame rates, you need a faster network.
If your stream is a fixed-length movie, then you could buffer the data and start the movie after enough data is buffered to allow playback at the desired frame rate, sooner than waiting for the entire movie to download. If you want playback at 24 frames a second, you will need to buffer at least half of the movie before you begin playback, because the playback is twice as fast as your download speed (in general, the fraction you must buffer is 1 minus the ratio of download rate to playback rate).
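A quick sanity check of that fraction, using the rates from the calculation above:

```cpp
// Minimum fraction of the movie to prebuffer so playback never stalls:
// f = 1 - download_rate / playback_rate.
#include <cstdio>

int main() {
    double download_fps = 12.0;   // from the throughput estimate above
    double playback_fps = 24.0;   // desired playback rate
    double fraction = 1.0 - download_fps / playback_fps;
    std::printf("buffer at least %.0f%% of the movie\n", fraction * 100);
    // prints: buffer at least 50% of the movie
}
```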
As stated in another answer, you should use a streaming codec so that you can also take advantage of compression between successive frames, rather than just compressing the current frame alone.
To sum up, your playback rate will be limited by the number of frames you can transfer per second if the stream never ends (e.g., a live stream).
If you have a fixed length movie, buffering can be used to hide the throughput and latency bottlenecks.
I'm working on a kind of rich remote desktop system, with a video stream of the desktop encoded using avcodec/x264. I have to set the GOP size for the stream manually, and so far I have been using a size of fps/2.
But I've just read the following on Wikipedia:
This structure [Group Of Pictures] suggests a problem because the fourth frame (a P-frame) is needed in order to predict the second and the third (B-frames). So we need to transmit the P-frame before the B-frames and it will delay the transmission (it will be necessary to keep the P-frame).
It means I'm creating a lot of latency, since the client needs to receive at least half of the GOP to output the first frame following the I-frame. What is the best strategy for the GOP size if I want the smallest latency possible? A GOP of 1 picture?
If you want to minimize latency with H.264, you should generally avoid B-frames. That way the decoder at least has a chance to emit decoded frames early, which prevents decoder-induced latency.
You may also want to tune the encoder for latency by reducing or disabling look-ahead. x264 has a "zerolatency" tune which should be a good starting point for finding your optimal settings.
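Since you are already going through avcodec, here is a sketch of those two knobs together (the option names are the real libx264/FFmpeg ones; the resolution and interval values are illustrative):

```cpp
// Sketch: configuring an x264 encoder through libavcodec for low latency.
// "tune=zerolatency" disables look-ahead, B-frames, and frame threading.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/opt.h>
}

AVCodecContext *make_low_latency_encoder(int width, int height, int fps) {
    const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);

    ctx->width        = width;
    ctx->height       = height;
    ctx->time_base    = AVRational{1, fps};
    ctx->pix_fmt      = AV_PIX_FMT_YUV420P;
    ctx->max_b_frames = 0;        // no B-frames: decoder can emit immediately
    ctx->gop_size     = fps * 2;  // IDR interval: affects sync time, not latency

    av_opt_set(ctx->priv_data, "preset", "ultrafast",   0);
    av_opt_set(ctx->priv_data, "tune",   "zerolatency", 0);

    if (avcodec_open2(ctx, codec, nullptr) < 0) { /* handle error */ }
    return ctx;
}
```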
The "GOP" size (which afaik is not really defined for h264; I'll just assume you mean the I(DR)-frame interval) does not necessarily affect the latency. This parameter only affects how long a client will have to wait until it can "sync" on the stream (time-to-first-picture).
I want to store a number of sound fragments as MP3 or WAV files, but these fragments are each highly repetitive (a 10-second burst of tone, for example). Are the MP3 or WAV file formats able to take advantage of this, i.e. is there a sound-file equivalent of run-length encoding?
No, neither format can do this.
WAV files (typically) use PCM, which stores a value for every single sample. Even with complete digital silence (all values the same), every sample is stored.
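To make that concrete, 10 seconds of a pure tone costs exactly as many bytes as 10 seconds of anything else:

```cpp
// PCM data size is rate x channels x bytes-per-sample x seconds,
// regardless of what the samples contain.
#include <cstdio>

int main() {
    const long rate = 44100, channels = 2, bytes_per_sample = 2, seconds = 10;
    long bytes = rate * channels * bytes_per_sample * seconds;
    std::printf("%ld bytes (~%.2f MB)\n", bytes, bytes / 1e6);
    // prints: 1764000 bytes (~1.76 MB), identical for tone, music, or silence
}
```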
MP3 works in frames of 1,152 samples. Each frame stands alone (well, there is the bit reservoir, but for the purposes of encoding/decoding this is just extra bandwidth made available). Even if there were a way to say do-this-n-times, it would be confined to a single frame. Now, if you are using MP3 with variable bit rate, I suspect you will get great results with perfect sine waves, since they have no harmonics. MP3 works by converting from the time domain to the frequency domain; that is, it encodes the frequencies present in each frame. If you only have one of those frequencies (or no sound at all), the VBR method will be efficient.
I should note that FLAC does use RLE when encoding silence. However, I don't think FLAC could be hacked to use RLE for 10 seconds of arbitrary audio, since again there is a frame border. FLAC's RLE for silence is actually problematic for live internet radio stations that leave a few seconds' gap between songs: the silence compresses to almost nothing, so clients receive very little data during the gap and will often pause the stream. It's therefore important for these stations to have a large buffer. (Clients do catch back up as soon as that silent block is sent, once audio resumes.)
I'm writing a video player. For the audio part I'm using XAudio2. For this I have a separate thread that waits for the BufferEnd event, then fills the buffer with new data and calls SubmitSourceBuffer.
The problem is that XAudio2 (or the driver, or the sound card) has huge delays before playing the next buffer if the buffer size is small (1024 bytes). I made measurements, and XAudio2 takes up to twice as long to play such a chunk. (A 1024-byte chunk of 48 kHz raw 2-channel PCM should play in roughly 5 ms, but on my computer it takes up to 10 ms.) There are nearly no delays if I make the buffer 4 KB or more.
I need such a small buffer to be able to synchronize with the video clock or an external clock (like ffplay does). If I make my buffer too big, the end user will hear a lot of noise in the output due to the synchronization adjustments.
I have also measured all my functions that decode and synchronize audio, and anything else that could block or produce delays; they take 0 or 1 ms to execute, so they are definitely not the problem.
Does anybody know what this could be and why it's happening? Can anyone check whether they see the same delay problems with a small buffer?
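For reference, a minimal sketch of the refill pattern being described, as an XAudio2 voice callback (error handling omitted; the fill function and chunk management are placeholders, and queueing two or three small buffers ahead rather than exactly one is the usual way to keep the voice from starving between submissions):

```cpp
// Sketch of the BufferEnd -> refill -> SubmitSourceBuffer loop.
#include <xaudio2.h>

class StreamCallback : public IXAudio2VoiceCallback {
public:
    IXAudio2SourceVoice *voice = nullptr;

    void __stdcall OnBufferEnd(void *ctx) override {
        BYTE *chunk = static_cast<BYTE *>(ctx);    // the buffer just finished
        fill_with_next_audio(chunk, kChunkBytes);  // your decoder (placeholder)

        XAUDIO2_BUFFER buf = {};
        buf.AudioBytes = kChunkBytes;
        buf.pAudioData = chunk;
        buf.pContext   = chunk;                    // round-trips to OnBufferEnd
        voice->SubmitSourceBuffer(&buf);
    }

    // Remaining IXAudio2VoiceCallback methods left empty for brevity.
    void __stdcall OnVoiceProcessingPassStart(UINT32) override {}
    void __stdcall OnVoiceProcessingPassEnd() override {}
    void __stdcall OnStreamEnd() override {}
    void __stdcall OnBufferStart(void *) override {}
    void __stdcall OnLoopEnd(void *) override {}
    void __stdcall OnVoiceError(void *, HRESULT) override {}

private:
    static constexpr UINT32 kChunkBytes = 1024;    // the small buffer in question
    static void fill_with_next_audio(BYTE *, UINT32) { /* placeholder */ }
};
```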
I've not experienced any delay or pause using .wav files. If you are using the MP3 format, it may add silence at the beginning and end of the sound during compression, thus causing a delay in your sound playing. See this post for more information.
I'm trying to encode and transfer raw video frames (captured by a webcam) over a LAN (100 Mbps) connection. What video/image encoding format is recommended for fast compression (with not much regard for the compression ratio)?
If you need individual frames to be independent of one another, use MJPEG (which is equivalent to encoding each frame as a JPEG). It's simple, and you have plenty of tools with which to manipulate it.
Otherwise, as long as you have a remotely modern CPU and the resolution isn't insanely high, just use a simple MPEG-4 ASP or even H.264 profile. Encoding 320x240 video with the simplest profile should take less than 5% CPU on a current low-end machine.
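For the MJPEG route mentioned above, a sketch of the encoder setup through libavcodec (the FFmpeg calls are real; the quality values are illustrative):

```cpp
// Sketch: an intra-only MJPEG encoder via libavcodec -- cheap to encode,
// and every frame stands alone. Resolution/quality values are illustrative.
extern "C" {
#include <libavcodec/avcodec.h>
}

AVCodecContext *make_mjpeg_encoder(int width, int height, int fps) {
    const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);
    AVCodecContext *ctx = avcodec_alloc_context3(codec);

    ctx->width     = width;
    ctx->height    = height;
    ctx->time_base = AVRational{1, fps};
    ctx->pix_fmt   = AV_PIX_FMT_YUVJ420P;  // full-range YUV, as MJPEG expects
    ctx->qmin = ctx->qmax = 5;             // fixed quality, favors speed

    if (avcodec_open2(ctx, codec, nullptr) < 0) { /* handle error */ }
    return ctx;  // then avcodec_send_frame()/avcodec_receive_packet() per frame
}
```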