Proper implementation of libspotify get_audio_buffer_stats callback - c++

Can anyone help decipher the correct implementation of the libspotify get_audio_buffer_stats callback? Specifically, we are supposed to populate a sp_audio_buffer_stats buffer, consisting of samples and stutter.
According to the Docs:
int samples - Samples in buffer.
int stutter - Number of stutters (audio dropouts) since last query.
I'm wondering about "samples." What exactly is this referring to?
The music playback (audio_delivery) callback has a num_frames variable, but then you have the issue of audio format (channels and/or sample_rate).
Is it correct to set "samples" to the total amount of "num_frames" currently in my buffer? Or do I need to run some math based on total "num_samples", "channels", and "sample_rate"?

It should be the number of frames in your output buffer. I.e. int samples is slightly misnamed and should probably be called int frames instead.
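As a rough illustration of that answer, here is a minimal sketch of a buffer that counts frames rather than individual samples. BufferStats is only a stand-in with the same two fields as sp_audio_buffer_stats, and FrameQueue, push, pop and stats are hypothetical names, not part of libspotify. It assumes interleaved 16-bit PCM and is not thread-safe, so a real player would need a lock around it.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Stand-in for sp_audio_buffer_stats (hypothetical mirror with the same two fields).
struct BufferStats {
    int samples;  // frames currently queued, despite the field name
    int stutter;  // dropouts since the last query
};

class FrameQueue {
public:
    explicit FrameQueue(int channels) : channels_(channels) {}

    // Delivery side: `frames` holds num_frames * channels interleaved samples.
    void push(const std::int16_t* frames, int num_frames) {
        pcm_.insert(pcm_.end(), frames,
                    frames + static_cast<std::size_t>(num_frames) * channels_);
    }

    // Output side: removes up to num_frames frames and returns how many were taken.
    // Asking for more than is queued is counted as one stutter.
    int pop(int num_frames) {
        int available = queuedFrames();
        int taken = num_frames < available ? num_frames : available;
        pcm_.erase(pcm_.begin(),
                   pcm_.begin() + static_cast<std::ptrdiff_t>(taken) * channels_);
        if (taken < num_frames) ++stutter_;
        return taken;
    }

    // Frames, not samples: the sample count divided by the channel count.
    int queuedFrames() const { return static_cast<int>(pcm_.size()) / channels_; }

    // What the stats callback would report; the stutter counter resets on each query.
    BufferStats stats() {
        BufferStats s{queuedFrames(), stutter_};
        stutter_ = 0;
        return s;
    }

private:
    int channels_;
    int stutter_ = 0;
    std::deque<std::int16_t> pcm_;
};
```

The sample rate does not enter into it: the count is the same number of frames the delivery callback handed over, minus what has already been played.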


How to validate properly ffmpeg pts/dts after demuxing/decoding?

How should I validate pts/dts after demuxing and then after decoding?
For me it is significant to have valid pts all the time for days and
possibly weeks of continuous streaming.
After demuxing I check:
dts <= pts
prev_packet_dts < next_packet_pts
I also discard packets with AV_NOPTS_VALUE and wait for packets with
proper pts, because I don't know video duration at this case.
The pts of packets may not be increasing because of I-P-B frames.
Is it all right?
What about decoded AVFrames?
Should 'pts' be increasing all the time?
Why at some point 'pts' could lag behind 'dts'?
Why is pict_type a field of AVFrame? It should be in AVPacket, because
AVPacket is a compressed frame, not the other way around?
Should pts be increasing all the time? Ideally, yes, unless your format allows discontinuities or wraps timestamps around due to overflow, like MPEG-TS.
Why could pts lag behind dts? That is a writing error.
Why is pict_type a field of AVFrame? It is an informational field, indicating the provenance of the frame. It can be used by filters or encoders, e.g. for keyframe alignment during a re-encode.
At libav support I was advised not to rely on decoder output. It is more robust to produce pts/dts for encoding/muxing manually, and I should look at the sources of the ffmpeg tools for a proper implementation. I will look into this approach.
For now I discard AVFrames only with AV_NOPTS_VALUE, and the rest of encoding/muxing works fine.
Validation of AVPackets after Demuxing remains the same, as described above.
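For reference, the packet checks described in the question could be sketched roughly like this. It deliberately does not include the FFmpeg headers: PacketTimes is a stand-in for the pts and dts fields of AVPacket, kNoPts has the same value as AV_NOPTS_VALUE, and PacketValidator is a hypothetical name. Treating a missing dts as unusable is an extra assumption on top of what the question states.

```cpp
#include <cstdint>
#include <limits>

// Same value FFmpeg uses for AV_NOPTS_VALUE (0x8000000000000000 as a signed 64-bit integer).
constexpr std::int64_t kNoPts = std::numeric_limits<std::int64_t>::min();

// Stand-in for the two AVPacket fields the checks need.
struct PacketTimes {
    std::int64_t pts;
    std::int64_t dts;
};

class PacketValidator {
public:
    // Returns true if the packet passes the checks listed in the question.
    bool accept(const PacketTimes& p) {
        // Discard packets without timestamps (missing dts is treated the same way: an assumption).
        if (p.pts == kNoPts || p.dts == kNoPts) return false;
        // dts <= pts: a packet cannot be presented before it is decoded.
        if (p.dts > p.pts) return false;
        // prev_packet_dts < next_packet_pts, as stated in the question.
        if (have_prev_ && prev_dts_ >= p.pts) return false;
        prev_dts_ = p.dts;
        have_prev_ = true;
        return true;
    }

private:
    bool have_prev_ = false;
    std::int64_t prev_dts_ = 0;
};
```

Note that pts itself is allowed to go backwards between accepted packets, which is what happens with B-frames. Rejected packets do not update the remembered dts.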

Synchronizing input pins in directshow

I am creating a DirectShow filter whose purpose is to take 3 input pins and create a video which alternately shows video from the first source, the second source and the third source, at a fixed time interval.
So if I have three webcams connected to my filter, I want the final video to show, for example, 5 seconds of the first cam, 5 seconds of the second cam, and so on...
I have tried two approaches:
Approach one
I use a class TimeManager. This class has a function isItPinsTurn(pinname), which returns true or false depending on whether the pin is supposed to send its sample to the output. To do this, the TimeManager creates a new thread which sleeps for x seconds at a time.
After each sleep it changes the currently active input pin to the next one.
The result is that every x seconds the isItPinsTurn(pinname) function returns true for another pin. This way every pin only sends output to the output pin when it is its turn, hence I get the desired video with x-second intervals between the input cams.
The problem with this approach
Sleep doesn't seem to work in directshow filters. I get a runtime error:
abort() has been called
Approach two
I use the sample's GetMediaTime method and a buffer which keeps track of how much video, in terms of media time, has already been sent to the output pin. This is best illustrated with code:
void MyFilter::acceptFilterInput(LPCWSTR pinname, IMediaSample* sample)
{
    mylogger->LogDebug("In acceptFIlterInput", L"D:\\TEMP\\yc.log");
    // Only samples from the currently active pin are counted.
    if (wcscmp(pinname, this->currentInputPin) == 0)
    {
        LONGLONG timestart;
        LONGLONG timeend;
        sample->GetTime(&timestart, &timeend);
        *mediaTimeBuffer += timeend - timestart;
        if (*mediaTimeBuffer > this->MEDIATIME)
        {
            *mediaTimeBuffer = 0;
            // The next pin is then made active (that code is not shown here).
        }
    }
}
When the filter starts, currentInputPin is set to pin0 (the first). Calls to acceptFilterInput (which is called by the input pin's Receive function) increase mediaTimeBuffer by the duration of the media sample. If this buffer is higher than MEDIATIME (which can for example be 5 (seconds)), the buffer is set back to zero and the next pin is set active.
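The bookkeeping described above can be sketched outside DirectShow like this. PinRotator and its members are hypothetical names, and pins are identified by index rather than by name. One assumption worth stating: stream times from IMediaSample::GetTime are REFERENCE_TIME values in 100-nanosecond units, so a 5 second slice is 50,000,000 units rather than 5.

```cpp
#include <cstdint>

// REFERENCE_TIME counts 100-nanosecond units, so one second is 10,000,000 units.
constexpr std::int64_t kUnitsPerSecond = 10'000'000;

class PinRotator {
public:
    PinRotator(int pinCount, std::int64_t sliceUnits)
        : pinCount_(pinCount), sliceUnits_(sliceUnits) {}

    // Returns true if a sample from `pin` spanning [start, end) should be forwarded.
    // Samples from inactive pins are rejected and do not advance the slice.
    bool onSample(int pin, std::int64_t start, std::int64_t end) {
        if (pin != active_) return false;
        accumulated_ += end - start;
        if (accumulated_ >= sliceUnits_) {
            // Slice used up: reset the counter and hand over to the next pin.
            accumulated_ = 0;
            active_ = (active_ + 1) % pinCount_;
        }
        return true;
    }

    int activePin() const { return active_; }

private:
    int pinCount_;
    std::int64_t sliceUnits_;
    std::int64_t accumulated_ = 0;
    int active_ = 0;
};
```

This only decides which samples to keep. It does nothing to keep the three streaming threads in step with each other.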
Problems with this approach
I am not even sure if CMediaSample->GetMediaTime returns the data I need, as it seems to return negative numbers, which doesn't seem to make much sense. I didn't find useful information about the return value of GetMediaTime on the web.
You are expected to block execution (incoming calls to IPin::Receive) on input streams so that other streams could catch up on their own streaming threads. You typically achieve this by either using wait/synchronization APIs and functions, or by holding references on media samples so that input peer would block on empty allocator waiting for a media sample (buffer) to get available.
Yes, Sleep works well, although polling is the worst of the possible options.
Approach two does not make sense to me because I don't see any real synchronization there: there is no execution blocking, and there is nothing that makes a pin active. You cannot force data on the input pin; you can only wait to get called with a new media sample. So you should block accepting data on one input stream/pin until you get data on another.
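To make "blocking with a synchronization primitive" a bit more concrete, here is a minimal sketch using only the C++ standard library. TurnGate is a hypothetical helper, not a DirectShow class. The idea is that each input pin's Receive would call waitForTurn with its own index before forwarding a sample, and whatever decides the schedule calls setActive. stop() is there so that blocked streaming threads can be released when the graph stops.

```cpp
#include <condition_variable>
#include <mutex>

class TurnGate {
public:
    // Blocks the calling thread until `pin` is the active pin or the gate is stopped.
    // Returns true if it is this pin's turn, false if the gate was stopped.
    bool waitForTurn(int pin) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return active_ == pin || stopping_; });
        return active_ == pin && !stopping_;
    }

    // Makes `pin` the active pin and wakes every waiting thread so it can re-check.
    void setActive(int pin) {
        {
            std::lock_guard<std::mutex> lock(m_);
            active_ = pin;
        }
        cv_.notify_all();
    }

    // Releases all waiters, e.g. when streaming stops or the filter is flushed.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int active_ = 0;
    bool stopping_ = false;
};
```

Unlike a Sleep loop, a waiting thread here consumes no CPU and wakes as soon as the active pin changes.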
Some useful relevant information on multiplexing:
How to make a DirectShow Muxer Filter - Part 1
How to make a DirectShow Muxer Filter - Part 2
GDCL MPEG-4 Multiplexer - available in source, and can multiplex data from 2+ streams

FFMPEG reading keyframes

I am trying to write a C++ program that reads key frames from a video file using ffmpeg.
So far I have managed to get all the frames using av_read_frame, where you sequentially read
frame by frame.
But I am having some problems using av_seek_frame, which (if I am correct) is supposed to do the trick for keyframes.
int av_seek_frame(AVFormatContext *s, int stream_index, int64_t timestamp, int flags);
I have the AVFormatContext, but what are the other correct arguments to sequentially get only the keyframes?
Is there another function that I can use instead?
EDIT: In av_read_frame I am getting an AVPacket, which I can use to get frame data, but how can I get a packet by using av_seek_frame?
SOLUTION: OK, there is a simple boolean value in AVFrame->key_frame. It is true if the frame is a keyframe.
av_seek_frame has the ability to seek to a certain timestamp in a video file. It takes 4 parameters: a pointer to the AVFormatContext, a stream index, the timestamp to seek to and flags to select the direction and seeking mode.
The function will then seek to the first key frame before the given timestamp.
Check the documentation of that function for more information.
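To illustrate both ideas without pulling in the FFmpeg headers, here is a small sketch on stand-in data. PacketInfo is a hypothetical struct holding only a timestamp and flags, and kKeyFlag has the same value as FFmpeg's AV_PKT_FLAG_KEY. The first function keeps only the packets flagged as keyframes while reading sequentially; the second models where a backward seek would land, assuming it picks the last keyframe at or before the requested timestamp.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Same value as FFmpeg's AV_PKT_FLAG_KEY.
constexpr int kKeyFlag = 0x0001;

// Stand-in for the AVPacket fields used here.
struct PacketInfo {
    std::int64_t pts;
    int flags;
};

// Collects the timestamps of packets flagged as keyframes, in reading order.
std::vector<std::int64_t> keyframeTimestamps(const std::vector<PacketInfo>& packets) {
    std::vector<std::int64_t> result;
    for (const PacketInfo& p : packets) {
        if (p.flags & kKeyFlag) result.push_back(p.pts);
    }
    return result;
}

// Given keyframe timestamps sorted ascending, returns the last one at or before `target`,
// or nothing if `target` is earlier than the first keyframe.
std::optional<std::int64_t> keyframeAtOrBefore(const std::vector<std::int64_t>& sortedKeyframes,
                                               std::int64_t target) {
    auto it = std::upper_bound(sortedKeyframes.begin(), sortedKeyframes.end(), target);
    if (it == sortedKeyframes.begin()) return std::nullopt;
    return *(it - 1);
}
```

In a real program the same flag test would be applied to each packet returned by av_read_frame, which avoids seeking altogether.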

Silence between played buffers in OpenAL?

I use alSourceQueueBuffers to stream buffers into an AL sound source. I have buffers of different sizes that need to be played one after another. So far so good. However, between some buffers I need a variable amount of silence. How can I add it programmatically?
Perhaps the easiest way would be to generate buffers that hold silence of the length needed and queue them appropriately. You just need to make an array full of zeros based on the sample rate and the desired length of silence and pass it into the buffer.
If you want things to be more complicated, then you can't queue all of the buffers. You queue the one that needs to play right now and set a timer for when it will be done (and the amount of silent time has also passed). Then you can queue the next buffer. Or you can poll the source to see if it has stopped and when it does, start counting down the silent time. You could also use the streaming functionality...
This worked for me. Sample rate needs to be the same as other buffers queued on your source. You could also have a 'greatest common denominator' length buffer and just queue it up multiple times.
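int sampleRate=22050;
double sTime=2.5; // How long to maintain silence.
int sampleCount= int(sTime*sampleRate);
int byteCount = sampleCount*sizeof(short);
// calloc zero-fills the memory; malloc would leave it uninitialized (noise, not silence).
short* silence = (short*)calloc(sampleCount, sizeof(short));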
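The "one short buffer queued several times" idea could be sketched as follows. makeSilence and repeatsNeeded are hypothetical helpers, the data is assumed to be 16-bit PCM, and durations are given in whole milliseconds to keep the arithmetic exact. No OpenAL calls are made here; the resulting vector is what would be uploaded to an AL buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Zero-filled 16-bit PCM of the requested length; for signed 16-bit audio, zero is silence.
std::vector<std::int16_t> makeSilence(int sampleRate, int channels, int milliseconds) {
    std::size_t frames = static_cast<std::size_t>(sampleRate) * milliseconds / 1000;
    return std::vector<std::int16_t>(frames * channels, 0);
}

// How many times a chunk of chunkMs must be queued to cover at least totalMs (rounds up).
int repeatsNeeded(int totalMs, int chunkMs) {
    return (totalMs + chunkMs - 1) / chunkMs;
}
```

For example, 2.5 seconds of silence built from a 100 ms chunk means queueing that chunk 25 times. A gap that is not a multiple of the chunk length gets rounded up, so the chunk should be short enough for that to be acceptable.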

How to use ALSA's snd_pcm_writei()?

Can someone explain how snd_pcm_writei() works?
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
I have used it like so:
for (int i = 0; i < 1; i++) {
    f = snd_pcm_writei(handle, buffer, frames);
}
Full source code at
Does this mean, that I shouldn't give snd_pcm_writei() the number of
all the frames in buffer, but only
sample_rate * latency = frames
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should be?:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
Is that how it works?
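The arithmetic in the question can be checked with a small helper. planChunks is a hypothetical name and nothing here calls ALSA; it only splits a total frame count into the sizes that would be passed to successive writes.

```cpp
#include <cstddef>
#include <vector>

// Splits totalFrames into chunks of at most framesPerWrite frames each.
// The last chunk carries the remainder, if there is one.
std::vector<std::size_t> planChunks(std::size_t totalFrames, std::size_t framesPerWrite) {
    std::vector<std::size_t> chunks;
    if (framesPerWrite == 0) return chunks;  // avoid an endless loop on a zero chunk size
    std::size_t remaining = totalFrames;
    while (remaining > 0) {
        std::size_t n = remaining < framesPerWrite ? remaining : framesPerWrite;
        chunks.push_back(n);
        remaining -= n;
    }
    return chunks;
}
```

With 100000 frames and 22050 frames per write this gives four chunks of 22050 frames and a final one of 11800, matching the numbers above.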
frames should be the number of frames (samples) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at JACK, which is based on registering a callback function with the sound playback engine. In fact, you're probably better off just using JACK instead of trying to do it yourself if you're really concerned about latency.
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me using several examples I found in the ALSA tutorials, and what I concluded was that the simple examples were doing a snd_pcm_close() before the sound device could play the complete stream sent to it.
I set the rate to 11025, used a 128 byte random buffer, and looped snd_pcm_writei() 11025/128 times for each second of sound. Two seconds required 86*2 calls to snd_pcm_writei().
In order to give the device sufficient time to convert the data to audio, I used a for loop after the snd_pcm_writei() loop to delay execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before the snd_pcm_close() function was called, which implies that the close function has less latency than the snd_pcm_writei() function.
If the ALSA driver's start threshold is not set properly (if in your case it is about 2s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Or you may set appropriate threshold in the SW params of ALSA device.