Process audio packets decoded with ffmpeg - c++

Following my other post, I am wondering if it is possible to do some processing, like MFCC extraction, on the decoded audio packets. The code I use decodes audio and video from an MPEG-2 file using ffmpeg. Processing on the video is done using OpenCV, as this library permits grabbing frames one by one. I need to process the corresponding audio samples at the same time.
Thanks.
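For reference, a minimal sketch of the kind of demux loop being described (this uses the modern libav* API, which has changed since the question was asked; av_find_best_stream and the handler comments are illustrative, not from the original post):
extern "C" {
#include <libavformat/avformat.h>
}

// Split the MPEG-2 file's packets by stream so audio can go to MFCC
// extraction while video frames go to OpenCV.
void demux(const char* path)
{
    AVFormatContext* fmt = nullptr;
    if (avformat_open_input(&fmt, path, nullptr, nullptr) < 0)
        return;
    avformat_find_stream_info(fmt, nullptr);

    const int audioIdx = av_find_best_stream(fmt, AVMEDIA_TYPE_AUDIO, -1, -1, nullptr, 0);
    const int videoIdx = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);

    AVPacket* pkt = av_packet_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == audioIdx) {
            // decode here and hand the PCM samples to MFCC extraction
        } else if (pkt->stream_index == videoIdx) {
            // decode here and hand the frame to OpenCV
        }
        av_packet_unref(pkt);
    }
    av_packet_free(&pkt);
    avformat_close_input(&fmt);
}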

I've created a C++ audio engine named "Crosstalk".
Although it's referred to as an "audio engine", it's really just a real-time C++ data (floating point) processing engine. Crosstalk allows you to create and route systems at design time and in real time. Basically, the engine takes care of all the data routing and gives you a simple platform for creating components through which the data gets processed (e.g. your "Audio Feed" component connected in parallel with the "Video Feed" component). As long as your branches are of equal total buffer length, they will be perfectly synchronized.
It's very easy to use. Here's an example of how to configure a system to play an mp3 file (The components used here are provided with the engine):
XtSystem system;
XtMp3Decoder mp3Decoder;
XtAudioDevice audioDevice;

// Register the components with the system
long md = system.addComponent(&mp3Decoder);
long ad = system.addComponent(&audioDevice);

// Route the decoder's left/right outputs into the audio device's inputs
system.connOutToIn(md, 0, ad, 0);
system.connOutToIn(md, 1, ad, 1);

mp3Decoder.loadFile("../05 Tchaikovski-Swan Lake-Scene.mp3");
mp3Decoder.play();
You can check out the API documentation and licensing details here: http://www.adaptaudio.com/Crosstalk
EDIT (01-12-2012):
Crosstalk has been replaced by an open-source project called "DSPatch". DSPatch is essentially an upgraded version of the routing engine behind Crosstalk that is no longer limited to audio processing. DSPatch allows you to create and route almost any type of process chain imaginable, and it is free for personal AND proprietary use :)

I downloaded your library and I'm playing with it.
Did you perform some kind of performance comparison with other IPC techniques like socket/localhost, message queues, circular buffers for audio streams?
I'm developing a software application that receives a multichannel UDP stream (128 channels), performs FFT on a subset of it, plays one selected channel, visualizes the spectrum of 2 channels and spectrogram of one channel.
Do you think that DSPatch is fast enough for this use case?

Related

Routing Audio from microphone to network using QT 6.4.x

With Qt 6.4.x (Windows), how can I capture microphone audio, repackage it, and forward the repackaged audio to a QUdpSocket?
The repackaging will involve converting the captured audio from its typical 16-bit little-endian format to a 24-bit big-endian format, where each outgoing packet has a constant-size payload that may differ in size from what the microphone delivers. I am not sure, but I think I need to replace the QAudioSink with a QAudioDecoder, as the description indicates:
The QAudioDecoder class is a high level class for decoding audio media files. It is similar to the QMediaPlayer class except that audio is provided back through this API rather than routed directly to audio hardware.
I have a partially working example that sends synthesized audio directly to the speaker. This functionality is based on the 'Audio Output Example' that ships with Qt 6 (my modified example sends a generated sine-wave tone to the speakers).
Also, in this RtpWorker thread, using the 'Audio Source Example' for inspiration, I was able to capture and intercept audio packets from the microphone, but I do not know how to send these packets (repackaged per the above) to a UDP socket as fixed-size datagrams; instead, I just log the captured packets. I think I need an intermediate circular buffer: the write side fills it with captured microphone audio, while the read side gets called by a QAudioSink or QAudioDecoder in pull mode.
Per my comment above I think I might need to send them to a QAudioDevice so I can handle the packaging and sending over the network myself.
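For what it's worth, here is a rough, untested sketch of that capture-convert-send path using Qt 6's QAudioSource; the 1200-byte datagram size, sample rate, and destination address are placeholders, not values from the bug report:
#include <QAudioFormat>
#include <QAudioSource>
#include <QHostAddress>
#include <QMediaDevices>
#include <QObject>
#include <QUdpSocket>

class MicForwarder : public QObject {
public:
    MicForwarder() {
        QAudioFormat fmt;
        fmt.setSampleRate(48000);                 // placeholder rate
        fmt.setChannelCount(1);
        fmt.setSampleFormat(QAudioFormat::Int16); // 16-bit LE from the mic
        source_ = new QAudioSource(QMediaDevices::defaultAudioInput(), fmt, this);
        io_ = source_->start();                   // QIODevice emitting readyRead
        connect(io_, &QIODevice::readyRead, this, &MicForwarder::onReadyRead);
    }
private:
    void onReadyRead() {
        const QByteArray in = io_->readAll();
        const qint16* s = reinterpret_cast<const qint16*>(in.constData());
        for (qsizetype i = 0; i < in.size() / 2; ++i) {
            const quint32 v = quint32(qint32(s[i]) * 256); // widen 16 -> 24 bit
            pending_.append(char((v >> 16) & 0xff));       // big-endian: MSB first
            pending_.append(char((v >> 8) & 0xff));
            pending_.append(char(v & 0xff));
        }
        while (pending_.size() >= 1200) {                  // fixed-size payloads
            sock_.writeDatagram(pending_.left(1200),
                                QHostAddress("192.168.0.10"), 5004);
            pending_.remove(0, 1200);
        }
    }
    QAudioSource* source_ = nullptr;
    QIODevice* io_ = nullptr;
    QByteArray pending_; // acts as the intermediate buffer described above
    QUdpSocket sock_;
};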
My code is contained in two attachments on QTBUG-108383.
It would be great if someone could point to some useful examples that try to do something similar.
Try running it on macOS or Linux; it seems to be a Windows-specific bug.

embed video stream with custom meta data

I have an optical system that provides a UDP video stream.
From device specification FAQ:
Both a single metadata (KLV) stream and compressed video (H.264) with metadata (KLV) are available on the Ethernet link. Compressed video and metadata are coupled in the same stream, compliant with the STANAG 4609 standard. Each encoded video stream is encapsulated with the associated metadata within an MPEG-TS single program stream over Ethernet UDP/IP. The video and metadata are synchronized through the use of timestamps.
There are also other devices that provide data about the state of the aircraft (velocity, coordinates, etc.). This data should be displayed on a client GUI alongside the video, and of course it has to be synchronized with the current video frame.
One of the approaches I thought of is to embed this data into the video stream. But I am not sure whether that is possible, or whether I should use another protocol (other than UDP) for this purpose.
Is it possible/reasonable to use such an approach? Is the ffmpeg library suitable in this case?
If not, what are the other ways to synchronize the data with a video frame?
Latency is crucial, and bandwidth is limited to 2-5 Mbps.
It seems to be possible using ffmpeg: an AVPacket can be given additional data using the function av_packet_add_side_data, which takes a preallocated buffer, a size, and an AVPacketSideDataType.
However, I am not sure which enum value of AVPacketSideDataType can be used for custom user-provided binary data.
Something similar that might be used for my needs:
How do I encode KLV packets to an H.264 video using libav*
The quote sounds like you have a transport stream containing two elementary streams (the H.264 video in one, and the KLV data in another). The transport stream is sent over UDP (or TCP, or is just a file, whatever you want - it's mostly independent of the transport).
There is a discussion of implementing this kind of thing in the Motion Imagery handbook (which you can download from the MISB part of the NSG Registry at https://nsgreg.nga.mil/misb.jsp - it's towards the bottom of the Non-cited Standards Documents table) and in detail in ST 1402 (which you can find in the same table). I'm avoiding providing direct links because the versions change - just look for whatever is current.
The short version is that you can embed the timestamp in the video (see ST 0603 and ST 0604), and then correlate that to the metadata timestamp (Precision Time Stamp, see ST 0601). You don't want to do that at the AVPacket level though. Instead, you need to put side data into AVFrame, with the AV_FRAME_DATA_SEI_UNREGISTERED key (https://ffmpeg.org/doxygen/trunk/group__lavu__frame.html#ggae01fa7e427274293aacdf2adc17076bca4f2dcaee18e5ffed8ff4ab1cc3b326aa). You will need a fairly recent FFmpeg version.
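A sketch of that side-data call (FFmpeg >= 4.4 assumed; the 16-byte UUID that prefixes the SEI payload is a made-up placeholder - pick your own):
extern "C" {
#include <libavutil/error.h>
#include <libavutil/frame.h>
}
#include <cstdint>
#include <cstring>

// Unregistered-user-data SEI payloads start with a 16-byte UUID.
static const uint8_t kUuid[16] = {
    0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef,
    0xde, 0xad, 0xbe, 0xef, 0xde, 0xad, 0xbe, 0xef
};

int attachMetadata(AVFrame* frame, const uint8_t* data, size_t len)
{
    AVFrameSideData* sd = av_frame_new_side_data(
        frame, AV_FRAME_DATA_SEI_UNREGISTERED, 16 + len);
    if (!sd)
        return AVERROR(ENOMEM);
    std::memcpy(sd->data, kUuid, 16);      // UUID prefix
    std::memcpy(sd->data + 16, data, len); // your custom binary payload
    return 0;
}
In recent FFmpeg versions, the H.264/H.265 encoders turn this side data into an unregistered-user-data SEI NAL unit in the bitstream.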
Note: if all you want to do is see the UDP data stream - video on one side, and decoded KLV on the other, then you might like to check out the jMISB Viewer application: https://github.com/WestRidgeSystems/jmisb
It also provides an example of encoding (generator example). Disclaimer: I contribute to the project.

real time audio processing in C++

I want to produce software that reads raw audio from an external audio interface (Focusrite Scarlett 2i2) and processes it in C++ before returning it to the interface for playback. I currently run Windows 8 and am wondering how to do this with minimum latency.
I've spent a while looking into (boost) ASIO but the documentation seems fairly poor. I've also been considering OpenCL but I've been told it would most likely have higher latency. Ideally I'd like to be able to just access the Focusrite driver directly.
I'm sorry that this is such an open question, but I've been having some trouble finding educational material on audio programming, other than just manipulating audio provided by a third-party plug-in design suite such as RackAFX. I'd also be grateful if anyone could recommend some reading on low-level stuff like this.
You can get very low latency by communicating directly with the Focusrite ASIO driver (this is totally different from boost ASIO). To work with this you'll need to register and download the ASIO SDK from Steinberg. Within the SDK download there is a Visual C++ sample project called hostsample which is a good starting point, and there is pretty good documentation about the buffering process used by ASIO.
ASIO uses double buffering. Your application chooses a buffer size within the limits of the driver. For each input channel and each output channel, two buffers of that size are created. While the driver is playing from and recording to one set of buffers, your program is reading from and writing to the other set. If your program performed a simple loopback, it would have access to the input one buffer period after it was recorded and would write directly to the output buffer, which would be played out on the next period, so there would be two buffer periods of latency. You'll need to experiment to find the smallest buffer size you can tolerate without glitches, and this will give you the lowest latency. And of course the signal processing code will need to be optimized well enough to keep up. A 64-sample buffer (1.3 ms at 48 kHz) is not unheard of.
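For illustration, the core of a loopback looks something like this (a sketch, assuming buffers were created with ASIOCreateBuffers() using 32-bit integer samples, one input and one output channel; gBufferInfos and gBufferSize are assumed globals from your init code):
#include <cstdint>
#include <cstring>
#include "asio.h" // Steinberg ASIO SDK header

extern ASIOBufferInfo gBufferInfos[2]; // [0] = input, [1] = output
extern long gBufferSize;               // samples per buffer, chosen at init

// The driver calls this every time it flips the double buffer; 'index'
// selects the half that is now ours while the driver uses the other half.
void bufferSwitch(long index, ASIOBool /*directProcess*/)
{
    std::memcpy(gBufferInfos[1].buffers[index],  // next output buffer
                gBufferInfos[0].buffers[index],  // freshly recorded input
                gBufferSize * sizeof(int32_t));
    // Replace the memcpy with your signal processing.
}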

Portable library to play samples on individual 5.1 channels with C/C++?

I'm looking for a free, portable C or C++ library which allows me to play mono sound samples on specific channels in a 5.1 setup. For example, the sound should be played through the front-left speaker while all other speakers remain silent. Is there any library capable of doing this?
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
Any hints are welcome!
I had a look at OpenAL. However, I can only specify the position from which the sound should come, but it seems to me that I cannot say something like "use only the front left channel to play this sound".
I don't think this is quite true. I think you can do it with OpenAL, although it's not trivial. OpenAL only does the positional stuff if you feed it mono format data. If you give it stereo or higher, it plays the data the way it was provided. However, you're only guaranteed stereo support. You'll need to check whether the 5.1-channel format extension is available on your system (AL_FORMAT_51CHN16). If so, then I think you feed your sound to the channel you want and feed zeroes to all the other channels when you buffer the samples. Note that you need hardware support for this on the sound card. A "generic software" device won't cut it.
See this discussion from the OpenAL mailing list.
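To make that concrete, an untested sketch (channel order FL, FR, C, LFE, RL, RR is assumed for AL_FORMAT_51CHN16, and you should check for the AL_EXT_MCFORMATS extension first):
#include <AL/al.h>
#include <cstdint>
#include <vector>

// Build a 5.1 buffer where only the front-left slot carries the mono sample.
ALuint makeFrontLeftBuffer(const int16_t* mono, size_t frames, ALsizei rate)
{
    std::vector<int16_t> interleaved(frames * 6, 0); // all 6 channels silent
    for (size_t i = 0; i < frames; ++i)
        interleaved[i * 6 + 0] = mono[i];            // slot 0 = front left

    ALenum fmt = alGetEnumValue("AL_FORMAT_51CHN16"); // extension-defined enum
    ALuint buf = 0;
    alGenBuffers(1, &buf);
    alBufferData(buf, fmt, interleaved.data(),
                 ALsizei(interleaved.size() * sizeof(int16_t)), rate);
    return buf;
}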
Alternatively, PortAudio is open-source, cross-platform, and supports multi-channel output. You still have to interleave the data, so if you're sending a sound to a single channel, you have to send zeroes to all the others (the same trick as in the OpenAL sketch above). You'll also still need to do some checking when opening a stream on a device to make sure the device supports 6 channels of output.
A long time ago I used RtAudio. I cannot say whether this lib can do what you want to achieve, but maybe it helps.
http://fmod.org could do the trick too
I use the BASS Audio Library http://www.un4seen.com for all my audio, sound and music projects. I am very happy with it.
BASS is an audio library to provide developers with powerful and efficient sample, stream (MP3, MP2, MP1, OGG, WAV, AIFF, custom generated, and more via add-ons), MOD music (XM, IT, S3M, MOD, MTM, UMX), MO3 music (MP3/OGG compressed MODs), and recording functions. All in a tiny DLL, under 100KB* in size. C/C++, Delphi, Visual Basic, MASM, .Net and other APIs are available. BASS is available for the Windows, Mac, Win64, WinCE, Linux, and iOS platforms.
I have never used it to play different samples in a 5.1 configuration, but according to their own documentation, it should be possible.
Main features
Samples: support for WAV/AIFF/MP3/MP2/MP1/OGG and custom generated samples
Sample streams: stream any sample data in 8/16/32 bit, with both "push" and "pull" systems
File streams: MP3/MP2/MP1/OGG/WAV/AIFF file streaming
Internet file streaming: stream data from HTTP and FTP servers (inc. Shoutcast, Icecast & Icecast2), with IDN and proxy server support and adjustable buffering
Custom file streaming: stream data from anywhere using any delivery method, with both "push" and "pull" systems
Multi-channel: support for more than plain stereo, including multi-channel OGG/WAV/AIFF files
...
Multiple outputs: simultaneously use multiple soundcards, and move channels between them
Speaker assignment: assign streams and MOD musics to specific speakers to take advantage of hardware capable of more than plain stereo (up to 4 separate stereo outputs with a 7.1 soundcard)
3D sound: play samples/streams/musics in any 3D position
Licensing
BASS is free for non-commercial use. If you are a non-commercial entity (eg. an individual) and you are not making any money from your product (through sales, advertising, etc), then you can use BASS in it for free. Otherwise, one of the following licences will be required.
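For what it's worth, the speaker-assignment feature maps to a flag at stream creation; a sketch based on the BASS documentation (untested by me in an actual 5.1 setup):
#include "bass.h"

// Play a file through the front-left speaker only.
void playOnFrontLeft(const char* file)
{
    BASS_Init(-1, 44100, 0, 0, nullptr); // default device, 44.1 kHz
    HSTREAM s = BASS_StreamCreateFile(FALSE, file, 0, 0,
                                      BASS_SPEAKER_FRONTLEFT);
    BASS_ChannelPlay(s, FALSE);
}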

streaming video to and from multiple sources

I wanted to get some ideas one how some of you would approach this problem.
I've got a robot running Linux that uses a webcam (with a v4l2 driver) as one of its sensors. I've written a control panel with gtkmm. Both the server and the client are written in C++. The server is the robot, the client is the "control panel". The image analysis happens on the robot, and I'd like to stream the video from the camera back to the control panel for two reasons:
A) for fun
B) to overlay image analysis results
So my question is: what are some good ways to stream video from the webcam to the control panel while giving priority to the robot code that processes it? I'm not interested in writing my own video compression scheme and putting it through the existing network port; a new network port (dedicated to video data) would be best, I think. The second part of the problem is how to display video in gtkmm. The video data arrives asynchronously and I don't have control over main() in gtkmm, so I think that would be tricky.
I'm open to using things like vlc, gstreamer or any other general compression libraries I don't know about.
thanks!
EDIT:
The robot has a 1 GHz processor, running a desktop-like version of Linux, but no X11.
GStreamer solves nearly all of this for you, with very little effort, and also integrates nicely with the GLib event system. GStreamer includes V4L source plugins, GTK+ output widgets, various filters to resize / encode / decode the video, and best of all, network sinks and sources to move the data between machines.
For prototyping, you can use the 'gst-launch' tool to assemble video pipelines and test them; then it's fairly simple to create pipelines programmatically in your code. Search for 'GStreamer network streaming' to see examples of people doing this with webcams and the like.
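For example, a rough sketch of building such a pipeline programmatically (GStreamer 1.x assumed; the element choices, host, and port are illustrative - a matching receiver could be udpsrc ! rtph264depay ! avdec_h264 ! autovideosink):
#include <gst/gst.h>

int main(int argc, char* argv[])
{
    gst_init(&argc, &argv);
    GError* err = nullptr;
    GstElement* pipeline = gst_parse_launch(
        "v4l2src ! videoconvert ! x264enc tune=zerolatency bitrate=512 "
        "! rtph264pay ! udpsink host=192.168.0.10 port=5000", &err);
    if (!pipeline) {
        g_printerr("Pipeline error: %s\n", err->message);
        g_error_free(err);
        return 1;
    }
    gst_element_set_state(pipeline, GST_STATE_PLAYING);
    // A GLib main loop services the pipeline; gtkmm's loop works the same way.
    GMainLoop* loop = g_main_loop_new(nullptr, FALSE);
    g_main_loop_run(loop);
    return 0;
}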
I'm not sure about the actual technologies used, but this can end up being a huge synchronization headache if you want to avoid dropped frames. I was streaming a video to a file and the network at the same time. What I eventually ended up doing was using a big circular buffer with three pointers: one write and two read. There were three control threads (and some additional encoding threads): one writing to the buffer, which would pause if it reached a point in the buffer not yet read by both of the others, and two reader threads that would read from the buffer and write to the file/network (and pause if they got ahead of the producer). Since everything was written and read as frames, synchronization overhead could be kept to a minimum.
My producer was a transcoder (from another file source), but in your case, you may want the camera to produce whole frames in whatever format it normally does and only do the transcoding (with something like ffmpeg) for the server, while the robot processes the image.
Your problem is a bit more complex, though, since the robot needs real-time feedback and so can't pause and wait for the streaming server to catch up. So you might want to get frames to the control system as fast as possible and buffer some up in a separate circular buffer for streaming to the "control panel". Certain codecs handle dropped frames better than others, so if the network gets behind you can start overwriting frames at the end of the buffer (taking care they're not being read).
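A minimal sketch of that single-writer / two-reader ring (the Frame type, capacity, and blocking policy here are placeholders; a real version needs shutdown handling and a frame-drop strategy):
#include <algorithm>
#include <array>
#include <condition_variable>
#include <mutex>

struct Frame { /* pixel data, timestamp, ... */ };

template <size_t N>
class FrameRing {
public:
    void write(const Frame& f) {
        std::unique_lock<std::mutex> lk(m_);
        // Producer pauses when it would lap the slower of the two readers.
        cv_.wait(lk, [&] { return w_ - std::min(r_[0], r_[1]) < N; });
        buf_[w_ % N] = f;
        ++w_;
        cv_.notify_all();
    }
    Frame read(int reader) { // reader 0 = file writer, 1 = network streamer
        std::unique_lock<std::mutex> lk(m_);
        // A reader pauses when it has caught up with the producer.
        cv_.wait(lk, [&] { return r_[reader] < w_; });
        Frame f = buf_[r_[reader] % N];
        ++r_[reader];
        cv_.notify_all();
        return f;
    }
private:
    std::array<Frame, N> buf_;
    size_t w_ = 0;
    size_t r_[2] = {0, 0};
    std::mutex m_;
    std::condition_variable cv_;
};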
When you say 'a new video port' and then start talking about vlc/gstreamer, I'm finding it hard to work out what you want. Obviously these software packages will assist in streaming and compressing via a number of protocols, but clearly you'll need a 'network port', not a 'video port', to send the stream.
If what you really mean is sending display output via a wireless video/TV feed, that's another matter; however, you'll need advice from hardware experts rather than software experts on that.
Moving on: I've done plenty of streaming over MMS/UDP protocols and VLC handles it very well (as both server and client). However, it's designed for desktops and may not be as lightweight as you want. Something like gstreamer, mencoder or ffmpeg, on the other hand, is going to be better, I think. What kind of CPU does the robot have? You'll need a bit of grunt if you're planning real-time compression.
On the client side I think you'll find a number of widgets to handle video in GTK. I would look into that before worrying about interface details.