Capture default audio stream with ALSA in C++

I am doing a fun project to change the color of Philips Hue bulbs based on the sound coming from the default ALSA device.
I want to write a small C++ program that captures and analyzes the default audio stream, splits it into three channels (low, mid, and high), and then assigns those channels to red, green, and blue.
I have been reading about how to create ALSA devices, but I am struggling to figure out (and to Google) how to capture streams with ALSA. This is the first time I have worked with audio and ALSA. I am trying to avoid Python for now because I want to learn a bit more.
If you believe it is not worth writing this in C++, I will do it in Python.

This answer is broken into two parts. The first part discusses how to take the audio data and turn it into LED "bits" for setting LED brightness. The second part discusses how to use C++ to read audio data from the ALSA sound card.
Part 1
As an idea for splitting into RGB, you could work out how to convert the audio samples into a 24 bit representation in a "perceptual manner". As we hear nonlinearly, you probably want to take the logarithm of the audio data. Because the audio data is both positive and negative, you probably want to do this on its absolute value. Finally, for each buffer read from the ADC audio input, you probably want to take the RMS first (which handles the absolute value for you).
So the steps in processing would be:
Capture the audio buffer
Take the RMS for each column of the audio buffer (each column is an audio channel).
Take the logarithm of the RMS value for each column.
Work out how to map each channel's log(RMS) value onto the LEDs. One idea is to use log base 2 (log2) of the RMS of the audio data, as that will give you 32 bits of data, which you can divide down (shift by 8 : log2(RMS) >> 8) to get a 24 bit representation. Then work out how to map these 24 bits onto the LEDs to achieve your aim.
For example, in pseudocode:
float loudness = log2(RMS(buffer));
if (loudness > pow(2., 16.))
    setTheRedLED(loudness / pow(2., 16.));
else if (loudness > pow(2., 8.))
    setTheBlueLED(loudness / pow(2., 8.));
else
    setTheGreenLED(loudness);
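As a minimal sketch of the RMS and log2 steps above (the RMS helper here is an assumption for illustration, not a gtkiostream function, and the buffer is assumed to hold interleaved 16 bit samples):

#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: RMS of one channel of an interleaved 16 bit buffer.
// "channel" selects the column, "channelCount" is the number of interleaved channels.
double RMS(const std::vector<int16_t> &buffer, size_t channel, size_t channelCount) {
    size_t frames = buffer.size() / channelCount;
    if (frames == 0)
        return 0.0;
    double sumOfSquares = 0.0;
    for (size_t f = 0; f < frames; ++f) {
        double s = buffer[f * channelCount + channel];
        sumOfSquares += s * s;
    }
    return std::sqrt(sumOfSquares / static_cast<double>(frames));
}

// Perceptual loudness as described above: log base 2 of the RMS value.
double log2Loudness(const std::vector<int16_t> &buffer, size_t channel, size_t channelCount) {
    double rms = RMS(buffer, channel, channelCount);
    return rms > 0.0 ? std::log2(rms) : 0.0; // guard against log of zero
}

Note that for 16 bit samples log2 of the RMS stays below about 15, so you may need to rescale it before mapping it onto the thresholds in the pseudocode above.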
Part 2
You can use gtkiostream to implement C++ classes for handling audio with ALSA.
For example, the ALSA::Capture class allows you to capture audio for processing.
To use it, include it in your code:
#include "ALSA/ALSA.H"
using namespace ALSA;
Then you can stream audio into a matrix (matrix columns are audio channels). First, however, instantiate the class in your C++ code:
Capture capture("hw:0"); // to open the device hw:0 you could use "default" or another device
// you can now reset params if you don't want to use the default, see here : https://github.com/flatmax/gtkiostream/blob/master/applications/ALSACapture.C#L82
capture.setParams(); // set the parameters
if (!capture.prepared()){
cout<<"should be prepared, but isn't"<<endl;
return -1;
}
// now define your audio buffer you want to use for signal processing
Eigen::Array<int, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> buffer(latency, chCnt);
// start capturing
if ((res=capture.start())<0) // start the device capturing
ALSADebug().evaluateError(res);
cout<<"format "<<capture.getFormatName(format)<<endl;
cout<<"channels "<<capture.getChannels()<<endl;
cout<<"period size "<<pSize<<endl;
// now do an infinite loop capturing audio and then processing it to do what you want.
while (true){
capture>>buffer; // capture the audio to the buffer
// do something with the audio in the buffer to separate out for red blue and green
}
A more complete capture example is available here.
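If you would rather talk to ALSA directly instead of going through gtkiostream, a minimal capture loop with plain alsa-lib (link with -lasound) looks roughly like the sketch below. The device name, rate, channel count, frame count and latency are placeholder values, and error handling is kept to a minimum.

#include <alsa/asoundlib.h>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    snd_pcm_t *pcm = nullptr;
    // open the default capture device (use "hw:0" or another name if you prefer)
    if (snd_pcm_open(&pcm, "default", SND_PCM_STREAM_CAPTURE, 0) < 0) {
        fprintf(stderr, "could not open capture device\n");
        return 1;
    }
    // 2 channels, 44100 Hz, 16 bit interleaved, 500 ms latency (placeholder values)
    if (snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                           2, 44100, 1, 500000) < 0) {
        fprintf(stderr, "could not set parameters\n");
        return 1;
    }
    const snd_pcm_uframes_t frames = 1024;   // frames per read
    std::vector<int16_t> buffer(frames * 2); // interleaved stereo buffer
    while (true) {
        snd_pcm_sframes_t got = snd_pcm_readi(pcm, buffer.data(), frames);
        if (got < 0)
            got = snd_pcm_recover(pcm, got, 0); // try to recover from overruns
        if (got < 0)
            break;
        // buffer now holds "got" frames of interleaved samples:
        // compute the RMS per channel here and map it onto your bulbs
    }
    snd_pcm_close(pcm);
    return 0;
}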

Related

Sample audio from microphone to array of integers (C++/Qt)

I am developing an app that records audio from the microphone into an integer array. The array is then passed to an FFT and MFCC. I need to make frames of about n samples and overlap them by 50% (they cannot simply sit side by side). So I need 3 buffers: when the first is full, it is passed to the FFT. At that moment the app should be recording the second half of the second buffer and the first half of the third buffer. The FFT will run in a separate thread (my idea).
So I tried to sample audio using QAudioRecorder and QAudioProbe. I connected the audioBufferProbed signal to processBuffer, and there I use buffer.constData<int>(). It seems to work. I understand audioBufferProbed is emitted when the buffer is full.
I don't know how to associate more buffers with one recorder, or how to start writing to the second buffer when the first buffer is half full.
The easiest way here (since 50% is a nice round number) is to ask Qt for frames half the size of the ones you need for your FFT. You now run your FFT over frames 0&1, 1&2, 2&3, etc.
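As a rough, framework-agnostic illustration of that idea (the class and names below are just an example, not part of Qt), you could collect half-size chunks as they arrive and hand every pair of consecutive halves to the FFT:

#include <functional>
#include <utility>
#include <vector>

// Collects half-size chunks and emits 50% overlapping full frames:
// frame 0 = halves 0+1, frame 1 = halves 1+2, frame 2 = halves 2+3, ...
class OverlappingFramer {
public:
    OverlappingFramer(size_t halfSize, std::function<void(const std::vector<int>&)> onFrame)
        : halfSize_(halfSize), onFrame_(std::move(onFrame)) {}

    // Call this with each half-size chunk as it arrives (e.g. from your audioBufferProbed slot).
    void pushHalf(const std::vector<int> &half) {
        if (!previous_.empty()) {
            std::vector<int> frame;
            frame.reserve(halfSize_ * 2);
            frame.insert(frame.end(), previous_.begin(), previous_.end());
            frame.insert(frame.end(), half.begin(), half.end());
            onFrame_(frame); // hand the overlapping frame to the FFT
        }
        previous_ = half; // keep this half around for the next frame
    }

private:
    size_t halfSize_;
    std::vector<int> previous_;
    std::function<void(const std::vector<int>&)> onFrame_;
};

Each call to pushHalf would come from the probe's slot, and onFrame would run the FFT, possibly on your separate thread.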

How to insert a key frame(Iframe) to a h.264 video stream in ffmpeg C++ api?

I have a real-time video stream and want to cut some video clips from it by exact timestamp (pts).
When I receive an AVPacket, I decode it, do something, and cache the packet. I don't want to re-encode all the packets, as that costs CPU resources.
An H.264 stream contains many GOP structures; usually we should cut the video so that it begins at a key frame and ends at a key frame. Otherwise the first few frames of the clip will display incorrectly.
Right now I use av_write_frame to write the AVPackets to the output video. But sometimes the GOP is very long, for example 250 frames, i.e. 8.3 s at 30 frames per second. That means the distance between two I-frames could be 250 frames. The clip is short, and I don't want to add too many unused frames.
What should I do? I think I should insert an I-frame at the start position of the clip. Can I convert a P-frame into an I-frame?
Thanks for reading!
This is not possible in the generic case, but it may be in specific cases. Even then, there are no open source/free tools to do this, and I am unaware of any commercial tools. The reason I say it is not possible in the generic case is that each frame can reference up to 16 other frames. So you cannot just replace a single frame; you will need to replace all the frames that reference it. Doing this will likely take almost as much CPU as encoding the whole GOP.

Can the mp3 or wav file format take advantage of repetitious sounds?

I want to store a number of sound fragments as MP3 or WAV files, but these fragments are each highly repetitive (a 10 second burst of tone for example). Are the MP3 or WAV file formats able to take advantage of this - i.e. is there a sound file equivalent of run-length encoding?
No, neither codec can do this.
WAV files (typically) use PCM, which holds a value for every single sample. Even if there were complete digital silence (all values the same), every sample is stored.
MP3 works in frames of 1,152 samples. Each frame stands alone (well, there is the bit reservoir but for the purpose of encoding/decoding, this is just extra bandwidth made available). Even if there were a way to say do-this-n-times, it would be fixed within a frame. Now, if you are using MP3 with variable bit rate, I suspect that you will have great results with perfect sine waves since they have no harmonics. MP3 works by converting from the time domain to the frequency domain. That is, it samples the frequencies in each frame. If you only have one of those frequencies (or no sound at all), the VBR method would be efficient.
I should note that FLAC does use RLE when encoding silence. However, I don't think FLAC could be hacked to use RLE for 10 seconds of audio, since again there is a frame border. FLAC's RLE for silence is problematic for live internet radio stations that leave a few seconds' gap in between songs. It's important for these stations to have a large buffer, since clients will often pause the stream if they don't receive enough data. (They do get caught up again, though, as soon as that silent block is sent, once the audio resumes.)

WASAPI lagging playback

I'm writing a Windows Store program in C++ which plays back the microphone. I have to modify the bits before sending them to the speakers. First I wanted to play back the microphone without any effect, but it is lagging. The frequency and the bit rate are the same (24 bit, 192000 Hz), but I also tried (24 bit, 96000 Hz). I debugged it, and it seems that the speaker side is faster and therefore has to wait for data from the microphone, as if the speakers were running at a higher frequency, though according to the settings they aren't. Does anyone have the slightest idea what the problem is here?
When you say that there is some 'lag', do you mean that there is a delay between when the audio capture device provides data and when the playback device renders that data, or do you mean that the audio stream is 'chopped', with small pauses in between each sample being rendered?
If there is a delay in playback, I would take a look at what latency value you've initialized the audio capture client with.
If there are small pauses, then I would recommend using double buffering of the sample data, so that one buffer is being rendered while the other is being fetched from the audio capture device.
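As a generic (not WASAPI-specific) illustration of that double-buffering idea, you could fill one buffer while the other is being rendered and swap them each pass. The buffer size and the captureInto/renderFrom stand-ins below are placeholders for the real capture and render calls.

#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real capture and render calls.
static void captureInto(std::vector<int16_t> &buffer) {
    // in a real program: read one buffer's worth of samples from the capture client
    (void)buffer;
}

static void renderFrom(const std::vector<int16_t> &buffer) {
    // in a real program: submit the buffer to the render client
    // (ideally asynchronously, so capture can proceed in parallel)
    (void)buffer;
}

int main() {
    const size_t framesPerBuffer = 1024; // placeholder buffer size
    std::vector<int16_t> front(framesPerBuffer), back(framesPerBuffer);

    captureInto(front); // prime one buffer before playback starts
    while (true) {
        renderFrom(front);      // render the buffer filled on the previous pass
        captureInto(back);      // refill the other buffer in the meantime
        std::swap(front, back); // swap roles for the next iteration
    }
}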

MPEG backwards frame decoding using FFmpeg

I have so-called "blocks" that store some MPEG4 frames (I, P, P, P, P, ...).
Every "block" starts with an "I" frame and ends before the next "I" frame.
(The VOL - "visual_object_sequence_start_code" - is always included before the "I" frame.)
I need to be able to play those "block" frames in "backwards" mode.
The trick is that:
It's not possible to just take the last frame in my block and decode it, because it's a "P" frame and it needs an intra frame ("I") to be correctly decoded.
I can't just take my first "I" frame, pass it to FFmpeg's "avcodec_decode_video" function and only then pass my last "P" frame, because that last "P" frame depends on the "P" frame before it, right? (Well, as far as I've tested this method, my last decoded P frame had artifacts.)
The way I'm performing backwards playback now is to first decode all of my "block" frames to RGB and store them in memory (in most cases that would be ~25 frames per block max). But this method really requires a lot of memory, especially if the frame resolution is high.
And I have a feeling that this is not the right way to do this...
So I would like to ask: does anyone have any suggestions for how this "backwards" frame decoding/playing could be performed using FFmpeg?
Thanks
What you are looking at is really a research problem. To get a glimpse of the overall approach, look at the following papers:
Compressed-Domain Reverse Play of MPEG Video Streams, SPIE International Symposium on Voice, Video, and Data Communications, Boston, MA, November, 1998.
Reverse-play algorithm for MPEG video streaming
Manipulating Temporal Dependencies in Compressed Video Data with Applications to Compressed-Domain Processing of MPEG Video.
Essentially, there is still advanced encoding based on key frames; however, you can reverse the process of motion compensation to achieve the reverse flow. This is done by on-the-fly conversion of P frames into I frames. It does require looking ahead, but it doesn't require that much more memory. Possibly you can save this as a new file and then feed it to a standard decoder for your reverse-play requirements.
However, this is very complex, and I have rarely seen software doing this in practice.
I do not think there is a way around starting from the I-frame and decoding all the P-frames, since each P-frame depends on the previous frame. To handle the decoded frames, they can be saved to a file, or, with limited storage and extra CPU power, older decoded frames can be discarded and recomputed later.
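To make the first option concrete, here is a rough sketch (not the answerer's code) of decoding one "block" with the newer FFmpeg decode API (avcodec_send_packet / avcodec_receive_frame, which replaces the older avcodec_decode_video) and then stepping through the decoded frames in reverse. The codecContext and blockPackets are assumed to be set up elsewhere, and the caller is responsible for freeing the cloned frames.

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <vector>

// Decode every packet of one block (I, P, P, ...) and return the frames in decode order.
// codecContext is assumed to be an opened decoder context for the stream.
std::vector<AVFrame*> decodeBlock(AVCodecContext *codecContext,
                                  const std::vector<AVPacket*> &blockPackets) {
    std::vector<AVFrame*> frames;
    AVFrame *frame = av_frame_alloc();
    for (AVPacket *pkt : blockPackets) {
        if (avcodec_send_packet(codecContext, pkt) < 0)
            break;
        while (avcodec_receive_frame(codecContext, frame) == 0)
            frames.push_back(av_frame_clone(frame)); // keep a reference to each decoded frame
    }
    // flush the decoder so any delayed frames are emitted too
    avcodec_send_packet(codecContext, nullptr);
    while (avcodec_receive_frame(codecContext, frame) == 0)
        frames.push_back(av_frame_clone(frame));
    av_frame_free(&frame);
    return frames;
}

// Reverse playback of the block: display the decoded frames from last to first.
void playBlockBackwards(const std::vector<AVFrame*> &frames) {
    for (auto it = frames.rbegin(); it != frames.rend(); ++it) {
        AVFrame *frame = *it;
        // displayFrame(frame); // hand the frame to your renderer here (hypothetical)
        (void)frame;
    }
}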
At the command level, you can convert input video to a series of images:
ffmpeg -i input_video output%4d.jpg
then reverse their order somehow and convert back to a video:
ffmpeg -r FRAME_RATE -i reverse_output%4d.jpg output_video
You may consider pre-processing, if it is an option.