Sampling rate deviation and sound playing position - C++

When you set the soundcard rate to, for example, 44100, you cannot guarantee that the actual rate will equal 44100. In my case, traffic measurements between the application and ALSA (in samples/sec) gave me values of 44066...44084.
This should not be a resampling issue: even hardware that only supports 48000 must "eat" data at the 44100 rate when opened in "44100" mode.
The problem occurs when I try to draw a cursor over the waveform while that waveform is playing. I calculate the cursor position from the "ideal" sampling rate read from the WAV file (22050, ..., 44100, ..., 48000) and the milliseconds elapsed since playback started, using the following C++ function:
#include <boost/date_time/posix_time/posix_time.hpp>

// Milliseconds since 1970-01-01 (local time); only differences between
// two calls matter here, so the epoch choice is irrelevant.
long long getCurrentTimeMs(void)
{
    boost::posix_time::ptime now = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::ptime epoch_start(boost::gregorian::date(1970, 1, 1));
    boost::posix_time::time_duration dur = now - epoch_start;
    return dur.total_milliseconds();
}
QTimer is used to generate frames for the cursor animation, but I do not depend on QTimer's precision: I query the time with getCurrentTimeMs() (assuming it is precise enough) every frame, so I can cope with a varying frame rate.
After 2-3 minutes of playing I see a small difference between what I hear and what I see: the cursor position is ahead of the playing position by something like 1/20 of a second.
When I measure the traffic that goes through ALSA's callback I get a mean value of 44083.7 samples/sec. If I then use this value in the screen drawing function as the actual rate, the problem disappears. The program is cross-platform, so I will repeat these measurements on Windows and with another soundcard later.
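A minimal sketch of that rate measurement, assuming a hypothetical hook that reports how many frames the audio callback has consumed (the class and method names are illustrative, not part of any real API):

#include <chrono>
#include <atomic>

// Illustrative estimator of the effective sample rate: divide the total
// number of frames the audio callback consumed by the wall-clock time
// elapsed since playback started.
class EffectiveRateEstimator {
public:
    void start() {
        t0_ = std::chrono::steady_clock::now();
        frames_ = 0;
    }
    // Call from the audio callback with the number of frames just consumed.
    void onFramesConsumed(long long n) { frames_ += n; }
    // Effective samples/sec so far (e.g. ~44083.7 instead of the nominal 44100).
    double effectiveRate() const {
        double secs = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0_).count();
        return secs > 0.0 ? frames_.load() / secs : 0.0;
    }
private:
    std::chrono::steady_clock::time_point t0_;
    std::atomic<long long> frames_{0};
};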
But is there a better way to sync sound and screen? For example, is there some way, without much CPU cost, of asking the soundcard for the actual playback sample number?

This is a known effect. On Windows, for example, it is addressed by rate matching, described under Live Sources.
On playback, the effect is typically addressed by using the audio hardware as the clock and synchronizing to audio playback instead of the "real" clock. That is, with an audio sampling rate of 44100, the next frame of a 25 fps video is presented in sync with the playback of sample 44100/25 rather than after a 1/25 increment of system time. This compensates for the imprecise effective playback rate.
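On ALSA specifically, one low-cost way to get the actual playback position (per the question) is snd_pcm_delay(), which reports how many frames still sit between the application and the speaker. A minimal sketch, assuming you maintain a framesWritten counter around your snd_pcm_writei() calls:

#include <alsa/asoundlib.h>

// Returns the index of the sample that is (approximately) being heard
// right now, or -1 on error. framesWritten must count every frame
// successfully handed to snd_pcm_writei().
long long currentPlaybackSample(snd_pcm_t* pcm, long long framesWritten)
{
    snd_pcm_sframes_t delay = 0; // frames still buffered between app and DAC
    if (snd_pcm_delay(pcm, &delay) < 0)
        return -1;
    long long pos = framesWritten - delay;
    return pos < 0 ? 0 : pos;
}

The cursor time is then currentPlaybackSample(...) divided by the rate; since the card drains its buffer at its own real pace, this stays locked to what you hear regardless of how far the effective rate drifts from the nominal one.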
On capture, the hardware itself acts as if it is delivering data at exactly the requested rate. I think the best you can do is measure the effective rate and resample the audio from the effective to the correct sampling rate.

Related

Directshow returns wrong Frame rate FPS

I want to obtain the frame rate of a media file using DirectShow.
Currently, I use the following method, which seems inaccurate in some cases:
I add a source filter to my graph, enumerate its pins, then call pPin->ConnectionMediaType(&compressedMediaFormat) and extract AvgTimePerFrame from it. As far as I understand, it is the average time per frame expressed in 100-nanosecond units, so I just divide 10,000,000 by AvgTimePerFrame to get the average FPS of the file.
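A sketch of that extraction, assuming the pin's format block is a VIDEOINFOHEADER (the VIDEOINFOHEADER2 case is analogous); error handling is abbreviated:

#include <dshow.h>

// Returns the average FPS advertised by the pin's media type, or 0.0
// if the format block does not carry AvgTimePerFrame.
double averageFpsFromPin(IPin* pPin)
{
    AM_MEDIA_TYPE mt = {};
    if (FAILED(pPin->ConnectionMediaType(&mt)))
        return 0.0;

    double fps = 0.0;
    if (mt.formattype == FORMAT_VideoInfo &&
        mt.cbFormat >= sizeof(VIDEOINFOHEADER)) {
        const VIDEOINFOHEADER* vih =
            reinterpret_cast<const VIDEOINFOHEADER*>(mt.pbFormat);
        if (vih->AvgTimePerFrame > 0)               // 100 ns units
            fps = 10000000.0 / vih->AvgTimePerFrame;
    }
    // Free the media type manually (equivalent of FreeMediaType from
    // the DirectShow base classes).
    if (mt.cbFormat)
        CoTaskMemFree(mt.pbFormat);
    if (mt.pUnk)
        mt.pUnk->Release();
    return fps;
}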
For media files in which almost all frames have the same frame time, I get a correct FPS. But for those with different frame times for different frames, this method returns very inaccurate results.
A correct way would be to take the duration and the frame count of the file and calculate the average FPS from them (frameCount / duration). However, as I understand it, this is a costly operation, because calculating the exact number of frames requires passing through the whole file.
I wonder if there is a way to get that frame rate information more accurately?
In general, media files don't have to be fixed frame rate; there might be a variable frame rate. The file metadata still carries some frame-rate-related information which, in that case, might be inaccurate. When you start accessing the file, the metadata's frame rate is quickly available; to get the full picture, though, you are supposed to read all frames and process their time stamps.
Even though in many cases it is technically possible to quickly read just the time stamps of frames without reading the actual data, DirectShow demultiplexers/parsers have no method defined to obtain that information, so you would have to read and count the frames to get an accurate value.
You don't need to decompress the video for that, though, and you can also remove the clock from the filter graph while doing it, so that counting frames does not require streaming the data in real time (frames are then streamed at the maximal rate); a sketch of the clock removal follows.
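Removing the reference clock is done through IMediaFilter::SetSyncSource on the graph. A minimal sketch, assuming pGraph is an already-built IGraphBuilder:

#include <dshow.h>

// Detach the reference clock so the graph runs as fast as the source
// can deliver frames instead of pacing itself to real time.
HRESULT runUnclocked(IGraphBuilder* pGraph)
{
    IMediaFilter* pMediaFilter = nullptr;
    HRESULT hr = pGraph->QueryInterface(IID_IMediaFilter,
                                        reinterpret_cast<void**>(&pMediaFilter));
    if (FAILED(hr))
        return hr;
    hr = pMediaFilter->SetSyncSource(nullptr); // no clock: free-run the graph
    pMediaFilter->Release();
    return hr;
}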

Change FPS on video capture from file with opencv

I'm reading a video file, and it's slower than the actual FPS of the file (59 FPS at 1080p), even though I'm not doing any processing on the image:
#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

// Global variables
UMat frame; // current frame

int main(int argc, char** argv)
{
    VideoCapture capture("myFile.MP4");
    namedWindow("Frame");
    capture.set(CAP_PROP_FPS, 120); // has no effect when reading from a file
    cout << capture.get(CAP_PROP_FPS) << endl;
    int charCheckForEscKey = 0;
    while (charCheckForEscKey != 27) { // loop until Esc
        capture >> frame;
        if (frame.empty())
            break;
        imshow("Frame", frame);
        charCheckForEscKey = waitKey(1); // required for imshow to update
    }
    return 0;
}
Even though I try to set CAP_PROP_FPS to 120, it doesn't change the FPS of the file, and when I call get(CAP_PROP_FPS) I still get 59.9...
When I read the video, the actual outcome is more or less 54 FPS (even using UMat).
Is there a way to read the file at a higher FPS rate?
I asked the question on the OpenCV Q&A website as well: http://answers.opencv.org/question/117482/change-fps-on-video-capture-from-file/
Is it just because my computer is too slow?
TL;DR: FPS is irrelevant to the problem; it is probably a performance issue.
What is FPS used for? Before you can display a single frame of video, you have to read the data (from an HDD, DVD, network, the Internet or whatever) and decode it. Both of these operations take time, the amount of which differs from system to system, depending on HDD/Internet speed, processor speed, etc. If we just displayed each frame as soon as it was ready, the resulting movie speed would therefore vary from system to system. That is usually not what we want, so along with the sequence of video frames we get a "frames per second" value (a.k.a. FPS), which tells us how soon to display each consecutive frame (once every 1/30th of a second for 30 FPS, once every 1/60th of a second for 60 FPS, etc.). If a frame is ready to be displayed but it's too early, we can wait until its time comes. If it's time to display a frame but it's not ready (on an underpowered or too busy system), there's not much we can do (maybe drop frames in some situations).
To see the effect for yourself, try doubling the FPS value in a file, saving it and playing it with VLC: for the same amount of data and the same number of frames, you will notice that the speed of your video has doubled and its duration has halved. Try writing each frame twice at your doubled FPS, and the playback speed is back to normal (with double the number of frames and a meaningless increase in file size).
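The "wait until its time comes" part is just frame pacing. A minimal sketch of such a loop, with hypothetical decodeNextFrame()/display() placeholders standing in for the real work:

#include <chrono>
#include <thread>

bool decodeNextFrame(); // hypothetical: decodes one frame, returns false at EOF
void display();         // hypothetical: presents the decoded frame

// Illustrative pacing loop: frame i of an fps-rate video is due at
// t0 + i/fps; if decoding finished early, sleep until the due time.
void playPaced(double fps)
{
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    for (long i = 0; decodeNextFrame(); ++i) {
        auto due = t0 + std::chrono::duration_cast<clock::duration>(
                            std::chrono::duration<double>(i / fps));
        std::this_thread::sleep_until(due); // no-op if we are already late
        display();
    }
}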
What is FPS not used for? When processing (not displaying) a video, we are not limited by the original FPS; the processing goes as fast as it can. If your PC can process 1000 frames per second, good; if 1500, even better. Needless to say, changing the FPS value in the file won't improve your CPU/HDD speed, so if you were only able to process 54 frames per second before, you are still only going to process 54 frames per second.
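To see where the time goes, it can help to measure the pure read-and-decode throughput with no display at all. A minimal sketch using the same VideoCapture (the file name is taken from the question):

#include <opencv2/opencv.hpp>
#include <chrono>
#include <iostream>

// Measures how many frames per second this machine can read and decode,
// with no imshow()/waitKey() in the loop.
int main()
{
    cv::VideoCapture capture("myFile.MP4");
    cv::UMat frame;
    long frames = 0;
    auto t0 = std::chrono::steady_clock::now();
    while (capture.read(frame))
        ++frames;
    double secs = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - t0).count();
    std::cout << frames / secs << " decoded frames/sec\n";
    return 0;
}

If this already caps out near 54 fps, the bottleneck is reading/decoding, not the FPS metadata.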
But how can VLC display it faster? Assuming you didn't forget to switch from a Debug to a Release build before measuring, there are still a number of possibilities: VLC is probably better optimized for the particular task of video playback (OpenCV is not really that fast at some tasks, plus it has to convert each frame to the more general Mat/UMat structure); multithreading (including "double buffering", as mentioned in the comments) is another possible reason; and maybe caching as well (e.g. reading a block of data containing many frames from the HDD at once, instead of reading and processing frames one by one).

WAV files at any rate except 44.1kHz have messed up sound

I'm using ALSA on Ubuntu to try to play a WAV file. Presently, I'm able to read the WAV header to figure out the file's sampling rate and such, then set ALSA's parameters to correspond. This works perfectly for files with a 44.1 kHz sampling rate, but files with rates around 11 kHz or 22 kHz do not play correctly. I'm not sure that I am setting the sampling rate correctly.
val = realSampleRate;
// Constrain the maximum sampling rate to the file's rate
snd_pcm_hw_params_set_rate_max(handle, params, &val, &dir);
cout << "sampling at " << val << " Hz \n";
This gives the correct output ("sampling at 22050 Hz") but if I follow it with this:
val = realSampleRate;
snd_pcm_hw_params_set_rate_min(handle, params, &val, &dir);
cout << "sampling at " << val << " Hz \n";
the output proceeds to say "sampling at 44100 Hz", which is obviously contradictory. I also tried snd_pcm_hw_params_set_rate_near, but that doesn't work either: it says "sampling at 44100 Hz" for a 22050 file, and in all of these cases the audio was very messed up.
EDIT: One issue is incorrect sampling rates, which speed up the playback, but the real problem comes from mono tracks. Mono tracks sound really distorted and very off.
EDIT: 8-bit files are off too.
It looks to me like your hardware is not capable of handling a 22.05 kHz sampling rate for playback. The fact that the API function returns a different value is a clue.
ALSA is just an API. It can only do what your current underlying hardware is capable of supporting. Low-end, bottom-of-the-barrel, el-cheapo audio playback hardware will support a handful of sampling frequencies, and that's about it.
I had some custom-written audio recording and playback software that sampled and recorded audio at a particular rate, then played it back using ALSA's aplay. When I got some new hardware, I found that it still supported my sampling rate for recording, but not for playback, and aplay simply played the previously recorded audio back at the nearest supported playback rate, with hilarious results. I had to change my custom-written software to record and play back at a supported rate.
If the hardware does not support your requested playback rate, ALSA won't resample your raw audio data. It's up to you to resample it, for playback.
snd_pcm_hw_params_set_rate_max() sets the maximum sample rate, i.e., when this function succeeds, the device's sample rate will not be larger than what you've specified.
snd_pcm_hw_params_set_rate_min() sets the minimum sample rate.
snd_pcm_hw_params_set_rate_near() searches for the nearest sample rate that is actually supported by the device, sets it, and returns it.
If you have audio data with a specific sample rate, and cannot do resampling, you must use snd_pcm_hw_params_set_rate().
Using "default" instead of "hw:0,0" solves this, including the sampling rate being too slow. "plughw:0,0" works as well, and it's better because you can select the different devices/cards programmatically whereas default just uses the default.

Is it possible to get time stamps with use of QueryPerformanceCounter (Win32, C++)?

Is it possible to get time stamps with the use of QueryPerformanceCounter (Win32, C++)? If not, what is the most accurate way of obtaining time stamps in a Win32 C++ application?
QueryPerformanceCounter is just a counter: it starts from some arbitrary value when the machine is powered on and counts up. It's not connected to the wall clock at all.
GetSystemTime and GetSystemTimeAsFileTime are accurate to ~15ms if that's good enough.
If you can target Windows 8 or later only, then GetSystemTimePreciseAsFileTime is very precise.
If you need a really high-resolution time on Windows before 8, you can try a hybrid approach using the system time and the performance counter, something like what is outlined here: Implement a Continuously Updating, High-Resolution Time Provider for Windows.
If you're using C++11, you might want to look at std::chrono::system_clock or std::chrono::high_resolution_clock as well.
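A minimal sketch of the hybrid idea: calibrate the performance counter against the wall clock once, then derive wall-clock timestamps from QPC ticks. This inherits the ~15 ms accuracy of the initial calibration but gains high resolution between samples; periodic re-calibration and drift correction are omitted for brevity.

#include <windows.h>

// Wall-clock time in 100 ns FILETIME units, with QPC resolution.
// Note: naive scaling; the multiplication can overflow after days of uptime.
ULONGLONG preciseFileTimeNow()
{
    static LARGE_INTEGER freq, qpc0;
    static ULONGLONG ft0;
    static bool init = false;
    if (!init) {
        FILETIME ft;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&qpc0);
        GetSystemTimeAsFileTime(&ft);
        ft0 = (static_cast<ULONGLONG>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
        init = true;
    }
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    ULONGLONG ticks = now.QuadPart - qpc0.QuadPart;
    return ft0 + ticks * 10000000ULL / freq.QuadPart; // ticks -> 100 ns units
}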
What clock sources do you have in hardware?
You cannot use QueryPerformanceCounter if you want accuracy, since it ticks with the CPU clock, which is fast but not accurate.
The interrupt timer: This timer generates an interrupt with a frequency of 1 kHz and is probably quite exact.
The HPET: A 1 MHz timer. This one is the one you want to use.
The audio interface sample clock: The frequency of this clock depends on the audio sample rate, but the interrupt frequency depends on the audio buffer size, which typically is required to be longer than some milliseconds.
I have successfully used a waitable timer to drive a screen-capturing program at intervals of 40 ms. For a demo, see http://www.youtube.com/watch?v=SLad8-IRtg4. The audio track was recorded through an analog feedback loop from audio out to audio in, demonstrating that the frame clock keeps in sync with the sample clock of the audio device. The same clock should work at 1 ms, but not while the machine is busy rendering 3D stuff and compressing video frames to PNG :-)
To use the audio device clock, record into a dummy buffer. On Windows 7, use the new audio architecture to get the smallest possible buffer size; on older systems, you need kernel streaming. If you go for an audio solution, you can also plug a function generator into your wave device and have a loop that returns a timestamp based on a trigger condition. This way you can go below 1 ms.
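A minimal sketch of the waitable-timer approach mentioned above, ticking every 40 ms; the capture call is a hypothetical placeholder:

#include <windows.h>

// Hypothetical placeholder for the real screen-capture work.
static void captureOneFrame() { /* grab and encode one frame */ }

int main()
{
    // Auto-reset waitable timer: each wait consumes one tick.
    HANDLE timer = CreateWaitableTimer(nullptr, FALSE, nullptr);
    LARGE_INTEGER due;
    due.QuadPart = -400000LL; // first tick in 40 ms (100 ns units, negative = relative)
    SetWaitableTimer(timer, &due, 40 /* period in ms */, nullptr, nullptr, FALSE);

    for (int i = 0; i < 250; ++i) {      // ~10 seconds at 25 fps
        WaitForSingleObject(timer, INFINITE);
        captureOneFrame();
    }
    CancelWaitableTimer(timer);
    CloseHandle(timer);
    return 0;
}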

Playing video at frame rates that are not multiples of the refresh rate.

I'm working on an application to stream video into OpenGL textures. My first thought was to lock the rendering loop to 60 Hz, so to play a video at 30 fps or 60 fps I would update the texture every other frame or every frame, respectively. How do computers play videos at other frame rates when monitors run at 60 Hz? For that matter, if a monitor runs at 75 Hz, how do they play 30 fps video?
For most consumer devices, you get something like 3:2 pulldown, which basically repeats the source video frames unevenly. Specifically, for 24 Hz video shown on a 60 Hz display, frames are alternately doubled and tripled. For your use case (video in OpenGL textures), this is likely the best way to do it, as it avoids tearing; a sketch follows.
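A sketch of the frame selection: mapping each display refresh back to a source frame index with integer arithmetic reproduces the doubled/tripled (3:2) pattern automatically.

#include <cstdio>

// For each display refresh, pick the source frame whose presentation
// time has most recently passed: src = floor(refresh * srcFps / dispHz).
int main()
{
    const int srcFps = 24, dispHz = 60;
    for (int refresh = 0; refresh < 10; ++refresh) {
        int src = refresh * srcFps / dispHz; // integer division = floor
        printf("refresh %d -> source frame %d\n", refresh, src);
    }
    // Output: source frames 0,0,0,1,1,2,2,2,3,3 ... i.e., alternately
    // tripled and doubled (3:2 pulldown).
    return 0;
}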
If you have enough compute ability to run actual resampling algorithms, you can convert any frame rate to any other frame rate. Your choice of algorithm defines how smooth the conversion looks, and different algorithms will work best in different scenarios.
Too much smoothness may cause things like the 120 Hz "soap opera" effect:
We have been trained by growing up watching movies at 24 FPS to expect movies to have a certain look and feel to them that is an artifact of that particular frame rate.
When these movies are [processed], the extra sharpness and clearness can make the movies look wrong to viewers, even though the video quality is actually closer to real.
This is commonly called the Soap Opera Effect, because some feel it makes these expensive movies look like cheap shot-on-video soap operas (because the videotape format historically used on soap operas worked at 30 FPS).
Essentially you're dealing with a resampling problem. Your original data was sampled at 30 Hz or 60 Hz, and you have to resample it to another sample rate. The very same algorithms apply; most of the time you'll find them discussed in articles about audio signal resampling. Just think of each pixel's color channel as an individual waveform that you want to resample.
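To make the analogy concrete, here is a minimal linear-interpolation resampler over one such "waveform" (one pixel channel, one sample per frame); real converters use better kernels, but the structure is the same as for audio:

#include <vector>
#include <cstddef>

// Resample one per-frame sample stream (e.g., one pixel's color channel)
// from srcFps to dstFps by blending the two nearest source frames.
std::vector<float> resampleChannel(const std::vector<float>& src,
                                   double srcFps, double dstFps)
{
    std::vector<float> dst;
    const std::size_t n = src.size();
    if (n == 0) return dst;
    const std::size_t outCount =
        static_cast<std::size_t>(n * dstFps / srcFps);
    for (std::size_t i = 0; i < outCount; ++i) {
        double t = i * srcFps / dstFps;            // position in source frames
        std::size_t k = static_cast<std::size_t>(t);
        if (k + 1 >= n) { dst.push_back(src[n - 1]); continue; }
        double frac = t - k;                       // blend weight
        dst.push_back(static_cast<float>(src[k] * (1.0 - frac) +
                                         src[k + 1] * frac));
    }
    return dst;
}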