Real-time audio processing in C++

I want to produce software that reads raw audio from an external audio interface (Focusrite Scarlett 2i2) and processes it in C++ before returning it to the interface for playback. I currently run Windows 8 and am wondering how to do this with minimum latency.
I've spent a while looking into (boost) ASIO but the documentation seems fairly poor. I've also been considering OpenCL but I've been told it would most likely have higher latency. Ideally I'd like to be able to just access the Focusrite driver directly.
I'm sorry that this is such an open question, but I've been having some trouble finding educational material on audio programming, other than just manipulating the audio when provided by a third-party plug-in design suite such as RackAFX. I'd also be grateful if anyone could recommend some reading on low-level stuff like this.

You can get very low latency by communicating directly with the Focusrite ASIO driver (this is totally different from Boost.Asio). To work with this you'll need to register and download the ASIO SDK from Steinberg. Within the SDK download there is a Visual C++ sample project called hostsample which is a good starting point, and there is pretty good documentation about the buffering process that is used by ASIO.
ASIO uses double buffering. Your application chooses a buffer size within the limits the driver allows. For each input channel and each output channel, two buffers of that size are created. While the driver is playing from and recording into one set of buffers, your program reads from and writes to the other set. For a simple loopback, your program gets access to the input one buffer period after it was recorded and writes directly to the output buffer, which is played out on the next period, so the total latency is two buffer periods. You'll need to experiment to find the smallest buffer size you can tolerate without glitches, and that gives you the lowest latency. And of course the signal-processing code will need to be optimized well enough to keep up. A 64-sample buffer (1.3 ms @ 48 kHz) is not unheard of.
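As a rough sketch of what the loopback case looks like inside the buffer-switch callback (names follow the Steinberg ASIO SDK headers; buffer creation, sample-type handling and error checking are left out, so treat this as an outline rather than working host code):

// Sketch only: assumes asio.h from the ASIO SDK's 'common' folder, that
// ASIOCreateBuffers() has already set up one input and one output channel in
// bufferInfos[], and that both channels use the same fixed-size sample format.
#include <cstring>
#include "asio.h"

extern ASIOBufferInfo bufferInfos[2]; // [0] = input channel, [1] = output channel
extern long bufferSizeInSamples;      // chosen within the driver's min/max limits
extern long bytesPerSample;           // derived from the channel's ASIOSampleType

void bufferSwitch(long index, ASIOBool /*directProcess*/)
{
    // 'index' selects which half of the double buffer the host may touch now,
    // while the driver records into / plays from the other half.
    // Simple loopback: copy the freshly captured block straight to the output.
    std::memcpy(bufferInfos[1].buffers[index],
                bufferInfos[0].buffers[index],
                bufferSizeInSamples * bytesPerSample);
    // Round-trip latency is roughly 2 buffer periods:
    // 2 * bufferSizeInSamples / sampleRate seconds.
}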

Related

No audio output from one of two streams when rendering directly to WASAPI

I've been stuck on this problem for weeks now and Google is no help, so hopefully someone here can help me.
I am programming a software sound mixer in C++, getting audio packets from the network and Windows microphones, mixing them together as PCM, and then sending them back out over the network and to speakers/USB headsets. This works. I have a working setup using the PortAudio library to handle the interface with Windows. However, my supervisors think the latency could be reduced between this software and our system, so in an attempt to lower latency (and better handle USB headset disconnects) I'm now rewriting the Windows interface layer to directly use WASAPI. I can eliminate some buffers and callbacks doing this, and theoretically use the super low latency interface if that's still not fast enough for the higher ups.
I have it only partially working now, and the "partially" part is what is killing me here. Our system has the speaker and headset as three separate mono audio streams: the speaker is mono, and the headset is combined from two streams into stereo. I'm outputting this to Windows as two streams, one for a device of the user's choice for the speaker, and one for another device of the user's choice for the headset. For testing, they're both outputting to the default stereo mix on my system.
I can hear the speaker perfectly fine, but I cannot hear the headset, no matter what I try. They both use the same code path and both go through a WMF resampler to convert to 2-channel audio at the sample rate Windows wants, yet I hear the speaker and never the headset stream.
It's not an exclusive mode problem: I'm using shared mode on all streams, and I've even specifically tried cutting down the streams to only the headset, in case one was stomping the other or something, and still the headset has no audio output.
It's not a mixer problem upstream, as I haven't modified any code from when it worked with PortAudio streams. I can see the audio passing through the mixer and to the output via my debug visualizers.
I can see the data going into the buffer I get from the system, when the system calls back to ask for audio. I should be hearing something, static even, but I'm getting nothing. (At one point, I bypassed the ring buffer entirely and put random numbers directly into the buffer in the callback and I still got no sound.)
What am I doing wrong here? It seems like Windows itself is the problem or something, but I don't have the expertise with the Windows APIs to know what, and I'm apparently the most expert on this stuff in my company. I haven't even looked yet into why the microphone input isn't working, and I've been stuck on this for weeks now. If anyone has any suggestions, it'd be much appreciated.
Check the resampled streams: output the stereo stream to the speaker, and output the mono stream to the headset.
Use IAudioClient::IsFormatSupported to check the formats the headset endpoint supports.
Verify your code using an MP3 file. Use two media players to play different files to different devices simultaneously.
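For the IsFormatSupported check, something along these lines should do (it assumes you have already activated an IAudioClient on the headset endpoint; the printout is only illustrative):

// Sketch: ask the headset's shared-mode engine whether the resampled format is
// accepted, and print the closest match Windows suggests if it isn't.
#include <windows.h>
#include <audioclient.h>
#include <cstdio>

bool checkHeadsetFormat(IAudioClient* client, const WAVEFORMATEX* desired)
{
    WAVEFORMATEX* closest = nullptr;
    HRESULT hr = client->IsFormatSupported(AUDCLNT_SHAREMODE_SHARED, desired, &closest);
    if (hr == S_OK)
        return true;                          // format accepted as-is
    if (hr == S_FALSE && closest != nullptr)  // not accepted, but a close match exists
    {
        std::printf("Closest match: %u channels, %lu Hz, %u bits\n",
                    closest->nChannels, closest->nSamplesPerSec, closest->wBitsPerSample);
        CoTaskMemFree(closest);
    }
    return false;                             // includes AUDCLNT_E_UNSUPPORTED_FORMAT
}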

C++ playing audio live from byte array

I am using C++ and have the sample rate, number of channels, and bit depth for my audio. I also have a char array containing the audio that I want to play. I am looking for something along the lines of sending a quarter of a second (or some other short amount) of audio to be played, then sending some more, and so on. Is this possible, and if so, how would it be done?
Thanks for any help.
I've done this before with the OpenAL library.
A full answer would be pretty involved and hopefully the OpenAL documentation can walk you through it all, but here is a source example I wrote that plays audio streamed in from a Mumble server in Node.js.
You may need to ask a more specific question to get a better answer, as this is a fairly large topic. It may also help to list other technologies you may be using, such as the target operating system(s) and any libraries you already use. Many desktop and game engines already have APIs for playing simple sounds, and using OpenAL may be much more complex than you really need.
But, briefly, the steps of the solution are:
Enumerate devices
Capture a device
Stream data to device
enqueue audio to buffers with alSourceQueueBuffers
play the queued buffers with alSourcePlay
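Very roughly, the queueing part could look like the sketch below. It assumes an ALC device/context and a generated source already exist, and that the data is 16-bit mono PCM (use AL_FORMAT_STEREO16 etc. to match your channel count and bit depth):

// Sketch: push one short chunk of raw PCM onto an OpenAL streaming source.
#include <AL/al.h>
#include <AL/alc.h>

void streamChunk(ALuint source, const char* pcm, int bytes, int sampleRate)
{
    // Recycle a buffer the source has finished playing, or make a new one.
    ALint processed = 0;
    alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);
    ALuint buf;
    if (processed > 0)
        alSourceUnqueueBuffers(source, 1, &buf);
    else
        alGenBuffers(1, &buf);

    // Upload the chunk (e.g. ~a quarter second of audio) and queue it behind
    // whatever is already playing.
    alBufferData(buf, AL_FORMAT_MONO16, pcm, bytes, sampleRate);
    alSourceQueueBuffers(source, 1, &buf);

    // Restart playback if the source ran dry between chunks.
    ALint state = 0;
    alGetSourcei(source, AL_SOURCE_STATE, &state);
    if (state != AL_PLAYING)
        alSourcePlay(source);
}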

Audio Recording/Mixer Software

As a guitarist I have always wanted to develop my own recording and mixing software. I have some experience in DirectSound and Windows Multimedia (waveOutOpen, etc.). I realise that this will be a complex project, but it is purely for my own use and learning, i.e. no deadlines! I intend to use C++ but as yet am unsure of the best SDK/API to use. I want the software to be extensible as I may wish to add effects in the future. A few prerequisites...
To run on Windows XP
Minimal latency
VU meter (on all tracks)
This caused me to shy away from Direct Sound as there doesn't appear to be a way to read audio data from the primary buffer.
Overdubbing (i.e. record a new track whilst playing existing tracks).
Include a metronome
My initial thoughts are to use WMM and the waveOutWrite function to play audio data. I guess this is essentially an audio streaming player. To keep things simpler, I will hard-code the format to 16-bit, 44.1 kHz (the best my sound card supports). What I need are some ideas and guidance on an overall architecture.
For example, assume my tempo is 60 BPM and time signature is 4/4. I want the metronome to play a click at the start of every bar/measure. Now assume that I have recorded a rhythm track. Upon playback I need to orchestrate (pun intended) what data is sent to the primary sound buffer. I may also, at some point, want to add instruments, drums (mainly). Again, I need to know how to send the correct audio data, at the correct time to the primary audio buffer. I appreciate timing is key here. What I am unsure of is how to grab correct data from individual tracks to send to the primary sound buffer.
My initial thoughts are to have a timing thread which periodically asks each track, "I need data to cover N milliseconds of play". Where N depends upon the primary buffer size.
I appreciate that this is a complex question, I just need some guidance as to how I might approach some of the above problems.
An additional question: is WMM or DirectSound better suited to my needs? Maybe even ASIO? However, the main question is how, using a streaming mechanism, I gather the correct data from multiple tracks to send to the primary buffer while keeping latency minimal.
Any help is appreciated,
Many thanks
Karl
Thanks for the responses. However, my main question is how to time all of this, to ensure that each track writes appropriate data to the primary buffer, at the correct time. I am of course open to (free) libraries that will help me achieve my main goals.
As you intend to support XP (which I would not recommend, as even its extended support will end next year) you really have no choice but to use ASIO. The appropriate SDK can be downloaded from Steinberg. On Windows Vista and above, WASAPI Exclusive Mode might be a better option due to wider availability; however, the documentation is severely lacking IMO. In any case, you should have a look at PortAudio, which wraps these APIs (and, unlike Juce, is free).
Neither WMM nor DirectSound nor XAudio2 will be able to achieve sufficiently low latencies for real-time monitoring. Low-latency APIs usually call a callback periodically for each block of data.
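To make the callback model concrete, here is a minimal sketch using PortAudio (mentioned above); the duplex stream parameters are only an example and the processing inside the callback is a placeholder for your mixing code:

// Minimal PortAudio sketch: open a duplex float32 stream and fill each output
// block inside the callback. Error handling is mostly omitted.
#include <portaudio.h>
#include <cstring>

static int audioCallback(const void* input, void* output,
                         unsigned long frameCount,
                         const PaStreamCallbackTimeInfo* /*timeInfo*/,
                         PaStreamCallbackFlags /*statusFlags*/,
                         void* /*userData*/)
{
    float* out = static_cast<float*>(output);
    // Placeholder: mix your tracks' next frameCount samples into 'out' here.
    std::memset(out, 0, frameCount * sizeof(float)); // mono output, silence for now
    return paContinue;
}

int main()
{
    Pa_Initialize();
    PaStream* stream = nullptr;
    // 1 input channel, 1 output channel, 44.1 kHz, 64-frame blocks.
    Pa_OpenDefaultStream(&stream, 1, 1, paFloat32, 44100, 64, audioCallback, nullptr);
    Pa_StartStream(stream);
    Pa_Sleep(5000);          // let it run for five seconds
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
}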
As every callback processes a given number of samples, you can calculate the time from the sample rate and a sample counter (simply accumulate across callback calls). Tip: do not accumulate time with floating point; that way lies madness. Use a 64-bit integer sample counter, as the smallest increment is always exactly one sample (1/sampleRate seconds).
Effectively your callback function would (for each track) call a getSamples(size_t n, float* out) (or similar) method and sum up the results (i.e. mix them). Each individual track would then keep its own integrated sample time to work out what is currently required. For periodic things (infinite waves, loops, metronomes) you can easily calculate the number of samples per period and use a modulo counter. That leads to periods rounded to whole samples, but as mentioned before, floating-point accumulators are a no-no; they can work acceptably for periodic signals, though.
In the case of the metronome example you might have a waveform "click.wav" with n samples and a period of m samples. Your counter periodically goes from 0 to m-1 and as long as the counter is less than n you play the corresponding sample of your waveform. For example a simple metronome that plays a click each beat could look something like this:
#include <cmath>
#include <cstddef>
#include <vector>

class Metronome
{
    std::vector<float> waveform;   // the click sample, one value per output sample
    size_t counter, period;        // position within the current beat, samples per beat
public:
    Metronome(std::vector<float> const& waveform, float bpm, float sampleRate)
        : waveform(waveform), counter(0)
    {
        float secondsPerBeat = 60.f / bpm;            // 60 / bpm seconds per beat
        float samplesPerBeat = sampleRate * secondsPerBeat;
        period = (size_t)std::round(samplesPerBeat);  // rounded to whole samples
    }
    void getSamples(size_t n, float* out)
    {
        while (n--)
        {
            // Play the click while inside the waveform, silence otherwise.
            *out++ = counter < waveform.size() ? waveform[counter] : 0.f;
            counter += 1;
            counter -= counter >= period ? period : 0;   // wrap at the start of each beat
        }
    }
};
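To tie this into the block callback described above, the callback would then sum all tracks into the output, roughly like this (Track is a hypothetical interface name; the Metronome above already has the same getSamples shape and would just need to derive from it):

#include <algorithm>
#include <cstddef>
#include <vector>

struct Track
{
    virtual void getSamples(size_t n, float* out) = 0;  // any mixable source
    virtual ~Track() {}
};

// Called once per audio block: sum every track's next n samples into 'out'.
void mixTracks(std::vector<Track*> const& tracks, size_t n, float* out)
{
    std::fill(out, out + n, 0.f);             // start from silence
    std::vector<float> scratch(n);
    for (Track* t : tracks)
    {
        t->getSamples(n, scratch.data());     // each track advances its own sample counter
        for (size_t i = 0; i < n; ++i)
            out[i] += scratch[i];             // simple sum; scale or clip as needed
    }
}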
Furthermore you could check the internet for VST/AU Plugin programming tutorials, as these have the same "problem" of determining time from the number of samples.
As you've discovered, you are entering a world of pain. If you're really building audio software for Windows XP and expect low latency, you'll definitely want to avoid any audio API provided by the operating system and do as almost all commercial software does: use ASIO. Whilst things have got better, ASIO isn't going away any time soon.
To ease your pain considerably, I would recommend having a look at Juce, which is a cross-platform framework for building both audio host software and plugins. It's been used to build many commercial products.
They've got many of the really nasty architectural hazards covered, and it comes with examples of both host applications and plug-ins to play with.

How to stream live audio and video while keeping low latency

I'm writing a program similar to StreamMyGame with the difference of the client being free and more importantly, open source, so I can port it to other devices (in my case an OpenPandora), or even make an html5 or flash client.
Because the objective of the program is to stream video games, latency should be reduced to a minimum.
Right now I can capture video of Direct3D 9 games at a fixed frame rate, encode it using libx264 and dump it to disk, and send input remotely, but I'm stumped at sending the video (and eventually the audio) over the network.
I don't want to implement one approach only to discover that it introduces several seconds of delay, and I don't care how it is done as long as it is done.
Off the top of my head I can think of several ways:
My current way: encode video with libx264 and audio with LAME or as AC-3, and send them with live555 as an RTSP feed, though the library is not playing nicely with MSVC and I'm still trying to understand how it works.
Have the ffmpeg library do all the grunt work, where it encodes and sends (I guess I'll have to use ffserver to get an idea on how to do it)
Same but using libvlc, perhaps hurting encoding configurability in the process.
Using several pipes with independent programs (i.e. piping data to x264.exe or ffmpeg.exe).
Use other libraries such as pjsip or JRTPLIB that might simplify the process.
The hard way: sending video and audio through a UDP channel and figuring out how to synchronize everything at the client (though the reason to use RTSP is to avoid this).
Your way, if I didn't think of something.
The second option would really be the best, as it would reduce the number of libraries (integrating swscale, libx264, the audio codec and the sender library), simplify the development and bring more codec variety (CELT looks promising), but I worry about latency as it might have a longer pipeline.
100 ms would already be too much, especially when you consider you might be adding another 150 ms of latency when it is used over broadband.
Do any of you have experience with these libraries? Would you recommend switching to ffmpeg, keeping on wrestling with live555, or doing something else entirely (even something I didn't mention)?
I had very good results streaming large blocks of data with low latency using the UDT4 library. But first I would suggest checking ffmpeg's networking capabilities, so you have a native solution for all operations.

Streaming video to and from multiple sources

I wanted to get some ideas on how some of you would approach this problem.
I've got a robot, that is running linux and uses a webcam (with a v4l2 driver) as one of its sensors. I've written a control panel with gtkmm. Both the server and client are written in C++. The server is the robot, client is the "control panel". The image analysis is happening on the robot, and I'd like to stream back the video from the camera to the control panel for two reasons:
A) for fun
B) to overlay image analysis results
So my question is, what are some good ways to stream video from the webcam to the control panel while giving priority to the robot code that processes it? I'm not interested in writing my own video compression scheme and putting it through the existing network port; a new network port (dedicated to video data) would be best, I think. The second part of the problem is how to display video in gtkmm. The video data arrives asynchronously and I don't have control over main() in gtkmm, so I think that would be tricky.
I'm open to using things like vlc, gstreamer or any other general compression libraries I don't know about.
thanks!
EDIT:
The robot has a 1 GHz processor and runs a desktop-like version of Linux, but no X11.
GStreamer solves nearly all of this for you, with very little effort, and also integrates nicely with the GLib event system. It includes V4L source plugins, GTK+ output widgets, various filters to resize/encode/decode the video and, best of all, network sinks and sources to move the data between machines.
For prototyping, you can use the 'gst-launch' tool to assemble video pipelines and test them; then it's fairly simple to create pipelines programmatically in your code. Search for 'GStreamer network streaming' to see examples of people doing this with webcams and the like.
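As a rough illustration of the programmatic route (the pipeline string is only an example, and the element names are from GStreamer 1.x; swap in whatever encoder/payloader and destination address suit the robot):

// Sketch: build a webcam -> H.264 -> RTP/UDP sender pipeline from a launch
// string, exactly as you would first test it with gst-launch.
#include <gst/gst.h>

int main(int argc, char* argv[])
{
    gst_init(&argc, &argv);

    GError* error = nullptr;
    GstElement* pipeline = gst_parse_launch(
        "v4l2src ! videoconvert ! x264enc tune=zerolatency ! "
        "rtph264pay ! udpsink host=192.168.1.10 port=5000",   // control panel's address (example)
        &error);
    if (!pipeline)
    {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        g_error_free(error);
        return 1;
    }

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    // Run a GLib main loop so the pipeline keeps streaming.
    GMainLoop* loop = g_main_loop_new(nullptr, FALSE);
    g_main_loop_run(loop);

    gst_element_set_state(pipeline, GST_STATE_NULL);
    gst_object_unref(pipeline);
    return 0;
}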
I'm not sure about the actual technologies used, but this can end up being a huge synchronization ***** if you want to avoid dropped frames. I was streaming a video to a file and to the network at the same time. What I eventually ended up doing was using a big circular buffer with three pointers: one write and two read. There were three control threads (and some additional encoding threads): one writing to the buffer, which would pause if it reached a point in the buffer not yet read by both of the others, and two reader threads that would read from the buffer and write to the file/network (and pause if they caught up with the producer). Since everything was written and read as frames, synchronization overhead could be kept to a minimum.
My producer was a transcoder (from another file source), but in your case, you may want the camera to produce whole frames in whatever format it normally does and only do the transcoding (with something like ffmpeg) for the server, while the robot processes the image.
Your problem is a bit more complex, though, since the robot needs real-time feedback so can't pause and wait for the streaming server to catch up. So you might want to get frames to the control system as fast as possible and buffer some up in a circular buffer separately for streaming to the "control panel". Certain codecs handle dropped frames better than others, so if the network gets behind you can start overwriting frames at the end of the buffer (taking care they're not being read).
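A simplified sketch of that single-producer, two-consumer ring buffer (the names and capacity are illustrative, and the producer here blocks when full; in the robot's case you would likely overwrite or drop frames instead of blocking):

// One writer, two readers (file and network), each reader with its own index.
#include <algorithm>
#include <array>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

struct Frame { std::vector<unsigned char> data; };

class FrameRing
{
    static const size_t N = 64;               // capacity in frames
    std::array<Frame, N> frames;
    size_t writeIdx = 0;
    size_t readIdx[2] = {0, 0};               // consumer 0 = file, 1 = network
    std::mutex m;
    std::condition_variable cv;

    size_t slowestReader() const { return std::min(readIdx[0], readIdx[1]); }

public:
    void push(Frame f)                        // producer (camera / transcoder thread)
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&]{ return writeIdx - slowestReader() < N; }); // wait if full
        frames[writeIdx % N] = std::move(f);
        ++writeIdx;
        cv.notify_all();
    }

    Frame pop(int consumer)                   // each reader thread calls with its id
    {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&]{ return readIdx[consumer] < writeIdx; });   // wait if empty
        Frame f = frames[readIdx[consumer] % N];
        ++readIdx[consumer];
        cv.notify_all();
        return f;
    }
};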
When you say 'a new video port' and then start talking about VLC/GStreamer, I'm finding it hard to work out what you want. Obviously these software packages will assist in streaming and compressing via a number of protocols, but clearly you'll need a 'network port', not a 'video port', to send the stream.
If what you really mean is sending display output via wireless video/tv feed that's another matter, however you'll need advice from hardware experts rather than software experts on that.
Moving on. I've done plenty of streaming over MMS/UDP protocols and VLC handles it very well (as both server and client). However, it's designed for desktops and may not be as lightweight as you want. Something like GStreamer, MEncoder or ffmpeg, on the other hand, is going to be better, I think. What kind of CPU does the robot have? You'll need a bit of grunt if you're planning real-time compression.
On the client side I think you'll find a number of widgets to handle video in GTK. I would look into that before worrying about interface details.