WAV files at any rate except 44.1kHz have messed up sound - c++

I'm using ALSA on Ubuntu to play a WAV file. At present I read the WAV header to get the file's sampling rate and other parameters, then set the ALSA parameters to match. This works perfectly for files with a 44.1 kHz sampling rate, but files with rates of ~11 kHz or ~22 kHz do not play correctly. I'm not sure that I am setting the sampling rate correctly.
val = realSampleRate;
//Sampling rate to given sampling rate
snd_pcm_hw_params_set_rate_max(handle, params, &val, &dir);
cout << "sampling at " << val << " Hz \n";
This gives the correct output ("sampling at 22050 Hz") but if I follow it with this:
val = realSampleRate;
snd_pcm_hw_params_set_rate_min(handle, params, &val, &dir);
cout << "sampling at " << val << " Hz \n";
the output changes to "sampling at 44100 Hz", which is obviously contradictory. I also tried snd_pcm_hw_params_set_rate_near, but that doesn't work either: it reports 44100 Hz for a 22050 Hz file. In all of these cases the audio was badly distorted.
EDIT: One issue is incorrect sampling rates, which will speed up the playing, but the real issue comes from mono tracks. Mono tracks sound really distorted and very off.
EDIT: 8 Bit files are off too

Looks to me like your hardware is not capable of handling a 22.05 kHz sampling rate for playback. The fact that the API function returns a different value is a clue.
ALSA is just an API. It can only do what your current underlying hardware is capable of supporting. Low-end, bottom-of-the-barrel, el-cheapo audio playback hardware will support a handful of sampling frequencies, and that's about it.
I had some custom-written audio recording and playback software that sampled and recorded audio at a particular rate, then played it back using ALSA's aplay. When I got some new hardware, I found that it still supported my sampling rate for recording but not for playback, and aplay simply played back the previously recorded audio at the nearest supported playback rate, with hilarious results. I had to change my custom-written software to record and play back at the supported rate.
If the hardware does not support your requested playback rate, a raw hardware device (such as "hw:0,0") won't resample your audio data. It's up to you to resample it for playback.
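As a rough illustration of what "resampling it yourself" can mean, here is a minimal sketch that converts mono 16-bit PCM from one rate to another by linear interpolation. The function name resample_linear is hypothetical, and the sketch applies no anti-aliasing filter, so it is only a starting point; a dedicated resampling library will sound better, especially when downsampling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Resample mono 16-bit PCM from src_rate to dst_rate by linear interpolation.
// Hypothetical helper for illustration only; no anti-aliasing filter is applied.
std::vector<std::int16_t> resample_linear(const std::vector<std::int16_t>& in,
                                          unsigned src_rate, unsigned dst_rate)
{
    assert(src_rate > 0 && dst_rate > 0);
    if (in.empty()) return {};

    // Number of output samples that covers the same duration as the input.
    const std::size_t out_len = static_cast<std::size_t>(
        static_cast<std::uint64_t>(in.size()) * dst_rate / src_rate);
    std::vector<std::int16_t> out(out_len);

    for (std::size_t i = 0; i < out_len; ++i) {
        // Position of output sample i on the input sample axis.
        const double pos = static_cast<double>(i) * src_rate / dst_rate;
        const std::size_t i0 = std::min(static_cast<std::size_t>(pos), in.size() - 1);
        const std::size_t i1 = std::min(i0 + 1, in.size() - 1);  // clamp at the end
        const double frac = pos - static_cast<double>(i0);
        const double v = in[i0] * (1.0 - frac) + in[i1] * frac;
        out[i] = static_cast<std::int16_t>(std::lround(v));
    }
    return out;
}
```

For example, resampling a 22050 Hz buffer to 44100 Hz doubles the number of samples, with every second output sample interpolated halfway between its neighbours.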

snd_pcm_hw_params_set_rate_max() sets the maximum sample rate, i.e., when this function succeeds, the device's sample rate will not be larger than what you've specified.
snd_pcm_hw_params_set_rate_min() sets the minimum sample rate.
snd_pcm_hw_params_set_rate_near() searches for the nearest sample rate that is actually supported by the device, sets it, and returns it.
If you have audio data with a specific sample rate, and cannot do resampling, you must use snd_pcm_hw_params_set_rate().

Using "default" instead of "hw:0,0" solves this, including the sampling rate being too slow. "plughw:0,0" works as well, and it's better because you can select the different devices/cards programmatically whereas default just uses the default.
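A likely reason this helps is that ALSA's plug layer (used by "plughw", and typically by "default") converts sample format, channel count, and rate when the hardware cannot accept the data as it is. Whether that was the exact cause of the distortion in the question is an assumption. As a sketch of what such a conversion amounts to if you do it by hand for a raw "hw" device, the hypothetical helper below turns unsigned 8-bit mono samples (the way WAV stores 8-bit PCM) into signed 16-bit interleaved stereo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert unsigned 8-bit mono PCM (as stored in WAV files) to signed 16-bit
// interleaved stereo. Hypothetical helper for illustration; not part of ALSA.
std::vector<std::int16_t> u8_mono_to_s16_stereo(const std::vector<std::uint8_t>& mono)
{
    std::vector<std::int16_t> out;
    out.reserve(mono.size() * 2);
    for (std::uint8_t s : mono) {
        // Re-center 0..255 around zero (128 is silence), then scale to 16 bits.
        const std::int16_t v =
            static_cast<std::int16_t>((static_cast<int>(s) - 128) * 256);
        out.push_back(v);  // left channel
        out.push_back(v);  // right channel
    }
    assert(out.size() == mono.size() * 2);
    return out;
}
```

Feeding unsigned 8-bit or mono data to a device configured for signed 16-bit stereo without a step like this would explain audio that sounds distorted or plays at the wrong speed.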

Related

Are there any constraints on encoding an audio signal?

I capture PCM sound at some sampling rate, e.g. 24 kHz. I need to encode it with a codec (I use Opus) to send it over the network. I noticed that at some of the sampling rates I use for encoding with Opus, I often hear extra "cracking" noise at the receiving end. At other rates, it sounds OK. That might be an implementation bug, but I thought there might also be some constraints that I don't know about.
I also noticed that if I use a different sampling rate while decoding the Opus-encoded audio stream, I get a lower or higher pitch, which seems logical to me. I've read that I need to resample on the other end if the receiving side doesn't support the original PCM sampling rate.
So I have 2 questions regarding all this:
Are there any constraints on sampling rate (or other parameters) of audio encoding? (Like I have a 24kHz pcm sound - maybe there are certain sample rates to use with it?)
Are there any common techniques to provide the same sound quality at both sides when sending audio stream over network?
The crackling noises are most likely a bug, since there are no limitations on the sample rate that would result in this kind of noise (there are other kinds of signal changes that come with sample rate conversion, especially when downsampling to a lower sample rate, but definitely not crackling).
A wild guess would be that there is something wrong with the input buffer. Crackling often occurs if samples are omitted or duplicated, often because the boundaries of subsequent buffers are not correct.
Sending audio data over a network in real time generally calls for compression, because the uncompressed data rate is often too high. There are codecs which provide lossless audio compression (e.g. FLAC), but their compression ratio is low compared to e.g. Opus.
The problem was solved by buffering packets at the receiving end and writing them to the soundcard buffer only once a certain amount had accumulated. The 'crackling' noise was then most likely due to the gaps between subsequent frames that were sent to the soundcard buffer.
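The buffering fix described above can be sketched roughly as follows. This is a minimal, single-threaded illustration, not the poster's actual code: the class name PrebufferQueue and the packet-count threshold are assumptions, and a real receiver would also need locking and handling for lost or reordered packets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Minimal prebuffering queue: hold decoded packets until `threshold` of them
// are queued, then release them in order. All names here are hypothetical.
class PrebufferQueue {
public:
    explicit PrebufferQueue(std::size_t threshold) : threshold_(threshold) {
        assert(threshold > 0);
    }

    // Called when a decoded packet arrives from the network.
    void push(std::vector<std::int16_t> packet) {
        packets_.push_back(std::move(packet));
        if (packets_.size() >= threshold_) primed_ = true;
    }

    // Returns true and fills `out` when a packet may be written to the soundcard.
    bool pop(std::vector<std::int16_t>& out) {
        if (!primed_) return false;  // still filling up
        if (packets_.empty()) {      // underrun: wait for the threshold again
            primed_ = false;
            return false;
        }
        out = std::move(packets_.front());
        packets_.pop_front();
        return true;
    }

private:
    std::size_t threshold_;
    bool primed_ = false;
    std::deque<std::vector<std::int16_t>> packets_;
};
```

The threshold trades latency for robustness: a larger value tolerates more network jitter but delays the audio by that many packets.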

Can the mp3 or wav file format take advantage of repetitious sounds?

I want to store a number of sound fragments as MP3 or WAV files, but these fragments are each highly repetitive (a 10 second burst of tone for example). Are the MP3 or WAV file formats able to take advantage of this - i.e. is there a sound file equivalent of run-length encoding?
No, neither codec can do this.
WAV files (typically) use PCM, which holds a value for every single sample. Even if there were complete digital silence (all values the same), every sample is stored.
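To make the PCM point concrete, the size of the audio data depends only on the format and duration, never on the content. The short sketch below computes it; the function name and the example format (44.1 kHz, 16-bit) are assumptions for illustration, and the WAV header is not counted.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of uncompressed PCM audio data (WAV header not included).
// The result is the same for a pure tone, music, or digital silence.
std::uint64_t pcm_data_bytes(unsigned sample_rate, unsigned channels,
                             unsigned bits_per_sample, unsigned seconds)
{
    assert(bits_per_sample % 8 == 0);
    return static_cast<std::uint64_t>(sample_rate) * channels *
           (bits_per_sample / 8) * seconds;
}
```

Under these assumptions, a 10 second mono fragment takes 44100 × 2 × 10 = 882,000 bytes whether it holds a repetitive tone or anything else.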
MP3 works in frames of 1,152 samples. Each frame stands alone (well, there is the bit reservoir but for the purpose of encoding/decoding, this is just extra bandwidth made available). Even if there were a way to say do-this-n-times, it would be fixed within a frame. Now, if you are using MP3 with variable bit rate, I suspect that you will have great results with perfect sine waves since they have no harmonics. MP3 works by converting from the time domain to the frequency domain. That is, it samples the frequencies in each frame. If you only have one of those frequencies (or no sound at all), the VBR method would be efficient.
I should note that FLAC does use RLE when encoding silence. However, I don't think FLAC could be hacked to use RLE for 10 seconds of audio, since again there is a frame border. FLAC's RLE for silence is problematic for live internet radio stations that leave a few-second gap in between songs. It's important for these stations to have a large buffer, since clients will often pause the stream if they don't receive enough data. (They do get caught back up again though as soon as that silent block is sent, once audio resumes.)

WASAPI lagging playback

I'm writing a Windows Store program in C++ that plays back the microphone input. I have to modify the samples before sending them to the speakers. First I wanted to play back the microphone without any effect, but it is lagging. The sample rate and bit depth are the same on both sides (24 bit, 192000 Hz), and I also tried 24 bit, 96000 Hz. I debugged it, and it seems that the speaker consumes data faster and therefore has to wait for data from the microphone, as if the speakers were running at a higher frequency, but according to the settings they aren't. Does anyone have the slightest idea what the problem is here?
When you say that there is 'lag', do you mean that there is a delay between when the audio capture device picks up the sound and when the playback device renders it, or do you mean that the audio stream is 'chopped', with small pauses between the samples being rendered?
If there's a delay in playback, I would look at the latency value with which you initialized the audio capture client.
If there are small pauses, I would recommend double buffering the sample data, so that one buffer is being rendered while the other is being refilled from the audio capture device.
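The double-buffering idea can be sketched independently of WASAPI. The class below is a hypothetical, single-threaded illustration: DoubleBuffer and its member names are not part of any Windows API, and real capture and render threads would need synchronization around the swap.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Two equally sized sample buffers: one is rendered while the other is filled.
// Hypothetical sketch; no thread synchronization is shown.
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t frames) {
        assert(frames > 0);
        buffers_[0].resize(frames);
        buffers_[1].resize(frames);
    }

    // Buffer the capture side fills next.
    std::vector<std::int16_t>& back() { return buffers_[1 - front_]; }

    // Buffer the render side is currently playing.
    const std::vector<std::int16_t>& front() const { return buffers_[front_]; }

    // Call once the back buffer is full and the front buffer has been played.
    void swap() { front_ = 1 - front_; }

private:
    std::array<std::vector<std::int16_t>, 2> buffers_;
    std::size_t front_ = 0;
};
```

The render side only ever reads from front(), so a slow refill of back() does not interrupt the buffer that is currently playing.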

Sampling rate deviation and sound playing position

When you set the soundcard rate to, for example, 44100, you cannot guarantee that the actual rate equals 44100. In my case, traffic measurements between the application and ALSA (in samples/sec) gave values of 44066...44084.
This should not be related to resampling issues: even 48000-only hardware must "eat" data at a 44100 rate in "44100" mode.
The problem occurs when I try to draw a cursor over a waveform while it is playing. I calculate the cursor position using the "ideal" sampling rate read from the WAV file (22050, ..., 44100, ..., 48000) and the milliseconds elapsed since playback started, using the following C++ function:
#include <boost/date_time/posix_time/posix_time.hpp>

// Milliseconds since 1970-01-01, measured on the local-time clock.
long long getCurrentTimeMs(void)
{
    boost::posix_time::ptime now = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::ptime epoch_start(boost::gregorian::date(1970,1,1));
    boost::posix_time::time_duration dur = now - epoch_start;
    return dur.total_milliseconds();
}
QTimer is used to generate frames for the cursor animation, but I do not depend on QTimer precision, because I query the time with getCurrentTimeMs() (assuming it is precise enough) every frame, so I can work with a varying frame rate.
After 2-3 minutes of playing I see a small difference between what I hear and what I see: the cursor position is ahead of the playing position by something like 1/20 of a second.
When I measure the traffic that goes through ALSA's callback, I get a mean value of 44083.7 samples/sec. If I then use this value as the actual rate in the screen drawing function, the problem disappears. The program is cross-platform, so I will test these measurements on Windows and with another soundcard later.
But is there a better way to sync sound and screen? Is there some way of asking the soundcard for the actual playing sample number, for example, that is not very CPU-consuming?
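The size of the drift reported here can be checked with a little arithmetic. The sketch below uses the measured 44083.7 samples/sec from the question and assumes 150 seconds as the midpoint of "2-3 minutes"; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Effective rate in samples per second over a measurement window.
double effective_rate(std::uint64_t samples, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return static_cast<double>(samples) / elapsed_seconds;
}

// Seconds by which a cursor driven by wall-clock time and the nominal rate
// runs ahead of the audio actually played after `elapsed_seconds`.
double drift_seconds(double nominal_rate, double actual_rate, double elapsed_seconds)
{
    assert(nominal_rate > 0.0);
    return (nominal_rate - actual_rate) * elapsed_seconds / nominal_rate;
}
```

With a nominal rate of 44100 and an actual rate of 44083.7, drift_seconds gives about 0.055 s after 150 s, which is in line with the roughly 1/20 of a second observed.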
This is a known effect. In Windows, for example, it is addressed by rate matching, described under "Live Sources".
On playback, the effect is typically addressed by using the audio hardware as the "clock" and synchronizing to audio playback instead of the "real" clock. For example, with an audio sampling rate of 44100, the next frame of a 25 fps video is presented once another 44100/25 = 1764 samples have been played, rather than after a 1/25 s increment of system time. This compensates for the imprecise effective playback rate.
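A minimal sketch of using the audio position as the clock is shown below. It assumes you can obtain the number of samples actually played so far; how to get that is API-specific and not shown (in ALSA, one option is to subtract the delay reported by snd_pcm_delay() from the number of frames written). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Index of the video frame that should be on screen once `samples_played`
// audio samples (per channel) have been rendered. Hypothetical helper.
std::uint64_t video_frame_for_sample(std::uint64_t samples_played,
                                     unsigned sample_rate, unsigned fps)
{
    assert(sample_rate > 0);
    // Integer arithmetic avoids accumulating rounding error over long playback.
    return samples_played * fps / sample_rate;
}
```

The same idea applies to the waveform cursor in the question: deriving its position from the played sample count, rather than from elapsed wall-clock time, keeps it aligned with what is heard even when the effective rate deviates from the nominal one.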
On capture, the hardware itself acts as if it were delivering data at exactly the requested rate. I think the best you can do is measure the effective rate and resample the audio from the effective to the correct sampling rate.

encoding camera with audio source in realtime with WMAsfWriter - jitter problem

I build a DirectShow graph consisting of my video capture filter (grabbing the screen) and the default audio input filter, both connected through a splitter to the WM ASF Writer output filter and to the VMR9 renderer. This means I want real-time audio/video encoding to disk together with a preview. The problem is that no matter which WM profile I choose (even a very low resolution profile), the output video file always has "jitter": every few frames there is a delay. The audio is OK; there is no jitter in the audio. The CPU usage is low (< 10%), so I believe this is not a lack of CPU resources. I think I'm time-stamping my frames correctly.
What could be the reason?
Below is a link to a recorded video explaining the problem:
http://www.youtube.com/watch?v=b71iK-wG0zU
Thanks
Dominik Tomczak
I have had this problem in the past. Your problem is the volume of data being written to disk. Writing to a faster drive is a great and simple solution to this problem. The other thing I've done is place a video compressor into the graph. You need to make sure both input streams are using the same reference clock. I have had a lot of problems using this compressor scheme while keeping a good preview. My preview's frame rate dies even if I use an Infinite Tee rather than a Smart Tee, although the result written to disk was fine. It's also worth noting that the more powerful the machine I ran it on, the less of an issue this was, so if you need both, it may not gain you much over simply putting a new, faster hard disk in the machine.
I don't think this is the issue. The volume of data written is less than 1 MB/s (average compression ratio during encoding). I found the reason: when I build the graph without audio input (the WM ASF Writer has only a video input pin) and my video capture pin is connected through a Smart Tee to the preview pin and to the WM ASF Writer video input pin, there is no glitch in the output movie. I reckon the problem is audio-to-video synchronization in my graph. The same happens when I build the graph in GraphEdit: without audio, no glitch; with audio, there is a constant glitch every 1 s. I wonder whether I time-stamp my frames wrongly, but I think I'm doing it correctly. What is the general solution for audio-to-video synchronization in DirectShow graphs?