How to use ALSA's snd_pcm_writei()? - c++

Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
f = snd_pcm_writei(handle, buffer, frames);
...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean, that I shouldn't give snd_pcm_writei() the number of
all the frames in buffer, but only
sample_rate * latency = frames
?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should be?:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9

frames should be the number of frames (samples) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at jack (http://jackaudio.org/) which is based on registering a callback function with the sound playback engine. In fact, you're probably just better off using jack instead of trying to do it yourself if you're really concerned about latency.

I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.

I did some testing to determine why snd_pcm_writei() didn't seem to work for me using several examples I found in the ALSA tutorials and what I concluded was that the simple examples were doing a snd_pcm_close () before the sound device could play the complete stream sent it to it.
I set the rate to 11025, used a 128 byte random buffer, and for looped snd_pcm_writei() for 11025/128 for each second of sound. Two seconds required 86*2 calls snd_pcm_write() to get two seconds of sound.
In order to give the device sufficient time to convert the data to audio, I put used a for loop after the snd_pcm_writei() loop to delay execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before the snd_pcm_close function was called which implies that the close function has less latency than the snd_pcm_write() function.

If the ALSA driver's start threshold is not set properly (if in your case it is about 2s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Or you may set appropriate threshold in the SW params of ALSA device.
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html

Related

Speed up data logging code

I have a device that outputs 64 bits of binary data at a rate of 1KHz. I am reading the device over USB via a 3rd party DLL, converting the binary data into a float, timestamping it, and writing to file.
I have the following setup at the moment:
int main(int argc, char* argv[])
{
unsigned char Message_Rx[64];
USHORT Bytes_Read=0;
std::ofstream out(argv[1]);
do
{
Result = Comms.USBRead(&Message_Rx[0],&Bytes_Read);
unsigned long now = getTickCount(start);
if(Result != 0)
{
uint16_t msb (Message_Rx[11] & 0xff) \\leftshited 8;
uint16_t lsb (Message_Rx[12] & 0xff);
uint16_t rate = msb | lsb;
char outstring[1024];
sprintf(outstring, "%d\t%.7f", now, (float)rate*0.03125);
out << outstring << "\n";
}
}while(!kbhit());
out.close();
}
(Sorry, formatting gets messed up with >> or <<).
This produces perfectly good results on my desktop. There doesn't appear to be any data missing and the timestamps are continuous and 1ms apart.
143379582 -0.5937500
143379583 -1.5312500
143379584 -1.6250000
143379585 -1.4062500
143379586 -1.1875000
143379587 -1.3437500
143379588 -1.3125000
143379589 -1.3125000
143379590 -1.1562500
But when I run this on the old laptop that I need to use I get timestamps that appear in blocks and it looks like there must be some data missing:
143379582 -0.5937500
143379582 -1.5312500
143379582 -1.6250000
143379582 -1.4062500
143379582 -1.1875000
143379593 -1.3437500
143379593 -1.3125000
143379593 -1.3125000
143379593 -1.1562500
Is there a way to achieve a speedup of my code so that I won't lose data?
To say this loud and clear: for any PC that is not a Intel 486SX, 64kb/s is a utmost laughable rate. Getting a few Mb/s over USB is very doable with small, Dollar-a-piece microcontrollers without any optimization.
Whatever goes wrong needs investigation much more than your code does.
I don't know the Comms library, but that's where I'd look for the place where time is spent.
Other than that, your printing stuff to the screen should take orders of magnitude more time than your processing, but still shouldn't be a problem. As mentioned, 1kS/s * 64 b/S is nothing for modern (read: last twenty years) PC hardware.
I recommend storing the raw data until the key is hit. After the key is pressed, output the data.
You want to remove formatting and output from high performance code areas.
Paraphrasing a song, There will be time enough for printing when the data's done.
Edit 1:
An array-based circular queue is a good data structure to hold the incoming data. This gives you the last N data samples.
Whenever you have issues with performance, your first step should be to profile the code to see what parts of it are taking up time.
However, for your code, I would say that the printing and string handling are unnecessary for the main loop. I would have a separate array of timestamps and within my main loop only acquire data.
After a key is hit, you no longer have timing restrictions and can deal with the somewhat expensive operation of file I/O and building up of the strings.
A final note is that your OS might be stealing CPU cycles from you. You may want to try to run your code with higher priorities to rule out scheduling.
With all that said, as was mentioned above, your data rate should be sustainable unless you're running on some really vintage hardware.

Synchronizing input pins in directshow

I am creating a directshow filter which's purpose is to take 3 input pins and create a video which shows alternately vidoe from the first source, the second source and the third source, in a fixed time internal.
So if i have three webcam connected to my filter, i want the final video for example to show 5 seconds of the first cam, five seconds of the second cam, and so on...
I have tried two approaches:
Approach one
I use a class TimeManager. This class has a function isItPinsTurn(pinname). This functions returns true or false regarding if the pin is supposed to send sample to the output. To do this the TimeManager creates a new thread which sleeps every x seconds.
After it slept it changes to the current active inputpin to the next.
The result is that every x seconds the isItPinSTurn(pinname) function returns another pin. This way every pin only seconds output to the outputpin when it is its turn, hence i get the desired videos with x intervalls between the input cam.
The problem with this approach
Sleep doesn't seem to work in directshow filters. I get a runtime error:
abort() has been called
Approach two
I use the samples GetMediaTime method and a buffer which keeps track of how much video samples in terms of its mediatime, has already been sent to the output pin. This is best illustrated with code:
void MyFilter::acceptFilterInput(LPCWSTR pinname, IMediaSample* sample)
{
mylogger->LogDebug("In acceptFIlterInput", L"D:\\TEMP\\yc.log");
if (wcscmp(pinname, this->currentInputPin) == 0)
{
outpin->Deliver(sample);
LONGLONG timestart;
LONGLONG timeend;
sample->GetTime(&timestart, &timeend);
*mediaTimeBuffer += timeend - timestart;
if (*mediaTimeBuffer > this->MEDIATIME)
{
this->SetNextPinActive(pinname);
*mediaTimeBuffer = 0;
}
}
}
When the filter starts the currentInputPin is set to pin0 (the first). Calls to acceptFilterInput (which is called by the the input pins receie function) adjust the mediaTimeBUffer with the size of the MediaSample-MediaTime. If this buffer is higher than MEDIATIME (which can for example be 5 (seconds)), the buffer is set back to zero and the next pin is set active.
Problems with this approach
I am not even sure if CMediaSample->GetMediaTime returns the data i need, as it seems to return negative numbers, which doesn't seem to make much sense. I didn't find useful information about the return value of GetMediaTime on the web.
You are expected to block execution (incoming calls to IPin::Receive) on input streams so that other streams could catch up on their own streaming threads. You typically achieve this by either using wait/synchronization APIs and functions, or by holding references on media samples so that input peer would block on empty allocator waiting for a media sample (buffer) to get available.
Yes Sleep works well, although polling is the worst of possible options.
Approach two does not make sense for me because I don't see any real synchronization there: there is no execution blocking, and there is no making pin active. You cannot force data on the input pin, you only can wait to get called with new media sample. So you should block accepting data on one input stream/pin until you get data on another.
Some useful relevant information on multiplexing:
How to make a DirectShow Muxer Filter - Part 1
How to make a DirectShow Muxer Filter - Part 2
GDCL MPEG-4 Multiplexer - available in source, and can multiplex data from 2+ streams

Serial communication protocol design issues

This is an embedded solution using C++, im reading the changes of brightness from a cellphone screen, from very bright (white) to dark (black).
Using JavaScript and a very simple script im changing the background of a webpage from white to black on 100 milliseconds intervals and reading the result on my brightness sensor, as expected the browser is not very precise on timing, some times it does 100ms sometimes less and sometimes more with a huge deviation at times.
var syncinterval = setInterval(function(){
bytes = "010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101010101";
bit = bytes[i];
output_bit(bit);
i += 1;
if(i > bytes.length) {
clearInterval(syncinterval);
i = 0;
for (i=0; i < input.length; i++) {
tbits = input[i].charCodeAt(0).toString(2);
while (tbits.length < 8) tbits = '0' + tbits;
bytes += tbits;
}
console.log(bytes);
}
}, sync_speed);
My initial idea, before knowing how the timing was on the browser was to use asynchronous serial communication, with some know "word" to sync the stream of data as RS232 does with his start bit, but on RS232 the clocks are very precise.
I could use a second sensor to read a different part of the screen as a clock, in this case even if the monitor or the browser "decides" to go faster or slower my system will only read when there is a clock signal (this is a similar application were they swipe the sensors instead of making the screen flicks as i need), but this require a more complex hardware system, i would like not to complicate things before searching for a software solution.
I don't need high speeds, the data im trying to send is just about 8 Bytes as much.
With any kind of asynchronous communications, you rely on transmitter sending a new 'bit' of data at a fixed time interval, and the receiver sampling the data at the same (fixed) interval. If the browser isn't accurate on timings, you'll just need to slow the bitrate down until its good enough.
There are a few tricks you can use to help you improve the reliability:-
a : While sending, calculate the required 'start transmit time' of each 'bit' in advance, and modify the delay after each bit has been 'sent', based on current time vs. required time. This means you'll avoid cumulative errors (i.e. if Bit 1 is sent a little 'late', the delay to bit 2 will be reduced to compensate), rather than delaying a constant N microseconds per bit.
b: While receiving, you must sample the incoming data much faster than you expect changes. (UARTS normally use a 16x oversample) This means you can resynchronize with the 'start bit' (the initial change from 1 to 0 in your diagram) and you can then sample each bit at the expected 'centre' of its time period.
In other words, if you're sending data at 1000us intervals, you sample data at ~62us intervals, and when you detect a 'start bit, you wait 500us to put you in the centre of the time period, then take 8 single-bit samples at 1000us intervals to form an 8-bit byte.
You might consider not using a fixed-rate encoding, where each bit is represented as a sequence of the same length, and instead go for a variable-rate encoding:
Time: 0 1 2 3 4
0: _/▔\_
1: _/▔▔▔▔▔\_
This means that when decoding, all you need to do is to measure the time the screen is lit. Short pulses are 0s, long pulses are 1s. It's woefully inefficient, but doesn't require accurate clocking and should be relatively resistant to inaccurate timing. By using some synchronisation pulses (say, an 010 sequence) between bytes you can automatically detect the length of the pulses and so end up not needing a fixed clock at all.

Proper implementation of libspotify get_audio_buffer_stats callback

Can anyone help decipher the correct implementation of the libspotify get_audio_buffer_stats callback. Specifically, we are supposed to populate a sp_audio_buffer_stats buffer, consisting of samples and stutter?
According to the Docs:
int samples - Samples in buffer.
int stutter - Number of stutters (audio dropouts) since last query.
I'm wondering about "samples." What exactly is this referring to?
The music playback (audio_delivery) callback has a num_frames variable, but then you have the issue of audio format (channels and/or sample_rate).
Is it correct to set "samples" to total amount of "num_frames" currently in my buffer? Or do I need to run some math based on total "num_samples", "channels", and "sample_rate"
It should be the number of frames in your output buffer. I.e. int samples is slightly misnamed and should probably be called int frames instead.

Limit iterations per time unit

Is there a way to limit iterations per time unit? For example, I have a loop like this:
for (int i = 0; i < 100000; i++)
{
// do stuff
}
I want to limit the loop above so there will be maximum of 30 iterations per second.
I would also like the iterations to be evenly positioned in the timeline so not something like 30 iterations in first 0.4s and then wait 0.6s.
Is that possible? It does not have to be completely precise (though the more precise it will be the better).
#FredOverflow My program is running
very fast. It is sending data over
wifi to another program which is not
fast enough to handle them at the
current rate. – Richard Knop
Then you should probably have the program you're sending data to send an acknowledgment when it's finished receiving the last chunk of data you sent then send the next chunk. Anything else will just cause you frustrations down the line as circumstances change.
Suppose you have a good Now() function (GetTickCount() is bad example, it's OS specific and has bad precision):
for (int i = 0; i < 1000; i++){
DWORD have_to_sleep_until = GetTickCount() + EXPECTED_ITERATION_TIME_MS;
// do stuff
Sleep(max(0, have_to_sleep_until - GetTickCount()));
};
You can check elapsed time inside the loop, but it may be not an usual solution. Because computation time is totally up to the performance of the machine and algorithm, people optimize it during their development time(ex. many game programmer requires at least 25-30 frames per second for properly smooth animation).
easiest way (for windows) is to use QueryPerformanceCounter(). Some pseudo-code below.
QueryPerformanceFrequency(&freq)
timeWanted = 1.0/30.0 //time per iteration if 30 iterations / sec
for i
QueryPerf(count1)
do stuff
queryPerf(count2)
timeElapsed = (double)(c2 - c1) * (double)(1e3) / double(freq) //time in milliseconds
timeDiff = timeWanted - timeElapsed
if (timeDiff > 0)
QueryPerf(c3)
QueryPerf(c4)
while ((double)(c4 - c3) * (double)(1e3) / double(freq) < timeDiff)
queryPerf(c4)
end for
EDIT: You must make sure that the 'do stuff' area takes less time than your framerate or else it doesn't matter. Also instead of 1e3 for milliseconds, you can go all the way to nanoseconds if you do 1e9 (if you want that much accuracy)
WARNING... this will eat your CPU but give you good 'software' timing... Do it in a separate thread (and only if you have more than 1 processor) so that any guis wont lock. You can put a conditional in there to stop the loop if this is a multi-threaded app too.
#FredOverflow My program is running very fast. It is sending data over wifi to another program which is not fast enough to handle them at the current rate. – Richard Knop
What you might need a buffer or queue at the receiver side. The thread that receives the messages from the client (like through a socket) get the message and put it in the queue. The actual consumer of the messages reads/pops from the queue. Of course you need concurrency control for your queue.
Besides the flow control methods mentioned, if you also have the need to maintain an accurate specific data sending rate in your sender part. Usually it can be done like this.
E.x. if you want to send at 10Mbps, create a timer of interval 1ms so it will call a predefined function every 1ms. Then in the timer handler function, by keep tracking of 2 static variables 1)Time elapsed since beginning of sending data 2)How much data in bytes have been sent up to last call, you can easily calculate how much data is needed to be sent in the current call (or just sleep and wait for next call).
By this way, you can do "streaming" of data in a very stable way with very little jitterness, and this is usually adopted in streaming of videos. Of course it also depends on how accurate the timer is.