I'm new to Qt's multimedia library and in my application I want to mix audio from multiple input devices (e.g. microphone), in order to stream it via TCP.
As far as I know I have to obtain the specific QAudioDeviceInfo for all needed devices first - together with an according QAudioFormat object - and use this with QAudioInput. Then I simply call start() for every created QAudioInput object and read out pending bytes with readLine().
But how can I mix audio data of multiple devices to one buffer?
I am not sure if there is any Qt specific method / class to do this. However it's pretty simple to do it yourself.
The most basic way (assuming you are using PCM), you can simply add the two streams/buffers together word by word (if I recall they are 16-bit PCM words).
So if you have two input buffers:
int16 buff1[10];
int16 buff2[10];
int16 mixBuff[10];
// Fill them...
//... code goes here to read from the buffers ....
// Add them (effectively mix them)
for (int i = 0; i < 10; i++)
{
mixBuff[i] = buff1[i] + buff2[i];
}
Now, this is very crude and does not take any scaling into consideration. So imagine buff1 and buff2 both use 80% of the dynamic range (call this full volume, beyond which you get distortion), then when you add them together you will get number overrun (i.e. 16-bit max is 65535 so 50000 + 50000 will be a over run).
Each time you mix you effectively need half the two inputs (so 65535 / 2 + 65535 / 2 = 65535... i.e. when you add them up you can't overrun). So your mix code is like this:
for (int i = 0; i < 10; i++)
{
mixBuff[i] = (buff1[i] >> 1) + (buff2[i] >> 1);
}
There is much more you can do (noise removal etc...) but then the maths start getting a bit hairy. This is very simple. You can use the shift afterwards to increase / decrease volume as simple volume control if you want.
EDIT
One thing to note... you are using readline() (which the docs say reads the data out as ASCII). I always use read() which it does not state the "format" it is read out, but I am assuming binary. So this code may not work if you use readline() but I have never tried it. It works well for read(), you don't really want to be working in ASCII if you want to manipulate the data.
Related
I'm trying to use Google Oboe for a 3D audio processing app due to it's low latency. The app will have a C++ backend, which does the processing, and the frontend is done with Flutter. I'm running a couple of tests to see if it'll work but I'm having issues loading assets from Flutter to Oboe. I checked the example RhythmGame in Oboe's repo, done with Java, but couldn't quiet find a way of doing that straight from Dart to C++.
The connection between front and backend is through dart::ffi
Here's what I've tried so far. Based on the example published by Richard Heap here, I changed the noise variable from just a sine wave to a short fragment of a song in a wav file:
class _MyAppState extends State<MyApp> {
final stream = OboeStream();
var noise = Float32List(512);
Timer t;
#override
void initState() {
super.initState();
// for (var i = 0; i < noise.length; i++) {
// noise[i] = sin(8 * pi * i / noise.length);
// }
_loadSound();
}
void _loadSound() async {
final ByteData data = await rootBundle.load('assets/song_cut.wav');
noise = data.buffer.asFloat32List();
}
(...)
Then this function in Dart calls the Dart wrapper of the native library:
void start() {
stream.start();
var interval = (512000 / stream.getSampleRate()).floor() + 1;
t = Timer.periodic(Duration(milliseconds: interval), (_) {
stream.write(noise);
});
}
The wrapper in Dart is:
void write(Float32List original) {
var length = original.length;
var copy = allocate<Float>(count: length)
..asTypedList(length).setAll(0, original);
FfiGoogleOboe()._streamWrite(_nativeInstance, copy, length);
free(copy);
}
_streamWrite is the native function in C++:
EXTERNC void stream_write(void* ptr, void* data, int32_t size) {
auto stream = static_cast<OboeFfiStream*>(ptr);
auto dataToWrite = static_cast<float*>(data);
stream->write(dataToWrite, size);
}
void OboeFfiStream::write(float *data, int32_t size) {
managedStream->write(data, size, 1000000);
}
Now I can hear the song but it comes out with too much distortion. When trying with the sine I could hear it too, but it also had some distortion. I'm not yet using the callback mode in Oboe, since I wanted to try if this worked first.
1 - what format is your WAV file in? Is it 32 bit floats? Don't forget that WAV files have a header, so you should discard the first few tens of bytes (up to the data segment). Be sure that you start reading the audio data on a float boundary (which may not be a multiple of 4 if the header isn't). If necessary, just use a hex editor to ascertain the offset of the float data and start reading there. Or, truncate the header and rename your asset to song_cut.raw. Audacity should be able to produce a header-less raw audio file.
2 - What sample rate is your audio clip recorded at? Does that match the sample rate of the device? (Note that iOS devices are normally 44.1k, but Android devices are frequently 48k. When using an Android emulator on macOS, who knows what the reported sample rate will be! Expect pitch distortion if your rates don't match - or use a resampler. I think Oboe has one. Alternatively, the sample repo associated with the talk contains one you can use.)
3 - note that the timer interval is finely tuned (for demo purposes) to the approximate time taken to deliver 512 samples at the sound card rate. This might be ok for demos, but isn't for real life. Also, your wav file probably doesn't have exactly 512 samples in it. Either adjust your audio loop to 512 samples, or adjust the 512000 constant to match the number of samples in your loop.
4a - You aren't using the callback method yet, but you probably should as soon as possible. One method I've had success with is to use a lock-free circular buffer. The Oboe callback tries to empty the buffer, while the Dart timer routine tries to fill it. The bigger the buffer the less chance there is of an underflow, but the worse the latency.
4b - The ideal solution would be to have the Oboe callback call up into Dart, but I haven't found a way to do that as C->Dart calls must be on the main Dart thread, but the Oboe callbacks are surely on a high-priority IO thread.
I have a question regarding buffering in between blocks in GNU Radio. I know that each block in GNU (including custom blocks) have buffers to store items that are going to be sent or received items. In my project, there is a certain sequence I have to maintain to synchronize events between blocks. I am using GNU radio on the Xilinx ZC706 FPGA platform with the FMCOMMS5.
In the GNU radio companion I created a custom block that controls a GPIO Output port on the board. In addition, I have an independent source block that is feeding information into the FMCOMMS GNU block. The sequence I am trying to maintain is that, in GNU radio, I first send data to the FMCOMMS block, second I want to make sure that the data got consumed by the FMCOMMS block (essentially by checking buffer), then finally I want to control the GPIO output.
From my observations, the source block buffer doesn’t seem to send the items until it’s full. This will cause a major issue in my project because this means that the GPIO data will be sent before or in parallel with sending the items to the other GNU blocks. That’s because I’m setting the GPIO value through direct access to its address in the ‘work’ function of my custom block.
I tried to use pc_output_buffers_full() in the ‘work’ function of my custom source in order to monitor the buffer, but I’m always getting 0.00. I’m not sure if it’s supposed to be used in custom blocks or if the ‘buffer’ in this case is something different from where the output items are stored. Here's a small code snippet which shows the problem:
char level_count = 0, level_val = 1;
vector<float> buff (1, 0.0000);
for(int i=0; i< noutput_items; i++)
{
if(level_count < 20 && i< noutput_items)
{
out[i] = gr_complex((float)level_val,0);
level_count++;
}
else if(i<noutput_items)
{
level_count = 0;
level_val ^=1;
out[i] = gr_complex((float)level_val,0);
}
buff = pc_output_buffers_full();
for (int n = 0; n < buff.size(); n++)
cout << fixed << setw(5) << setprecision(2) << setfill('0') << buff[n] << " ";
cout << "\n";
}
Is there a way to monitor the buffer so that I can determine when my first part of data bits have been sent? Or is there a way to make sure that the each single output item is being sent like a continuous stream to the next block(s)?
GNU Radio Companion version: 3.7.8
OS: Linaro 14.04 image running on the FPGA
Or is there a way to make sure that the each single output item is being sent like a continuous stream to the next block(s)?
Nope, that's not how GNU Radio works (at all!):
A while back I wrote an article that explains how GNU Radio deals with buffers, and what these actually are. While the in-memory architecture of GNU Radio buffers might be of lesser interest to you, let me quickly summarize the dynamics of it:
The buffers that (general_)work functions are called with behave for all that's practical like linearly addressable ring buffers. You get a random number of samples at once (restrictable to minimum numbers, multiples of numbers), and all that you not consume will be handed to you the next time work is called.
These buffers hence keep track of how much you've consumed, and thus, how much free space is in a buffer.
The input buffer a block sees is actually the output buffer of the "upstream" block in the flow graph.
GNU Radio's computation is backpressure-controlled: Any block's work method will immediately be called in an endless loop given that:
There's enough input for the block to do work,
There's enough output buffer space to write to.
Therefore, as soon as one block finishes its work call, the upstream block is informed that there's new free output space, thus typically leading to it running
That leads to high parallelity, since even adjacent blocks can run simultaneously without conflicting
This architecture favors large chunks of input items, especially for blocks that take a relative long time to computer: while the block is still working, its input buffer is already being filled with chunks of samples; when it's finished, chances are it's immediately called again with all the available input buffer being already filled with new samples.
This architecture is asynchronous: even if two blocks are "parallel" in your flow graph, there's no defined temporal relation between the numbers of items they produce.
I'm not even convinced switching GPIOs at times based on the speed computation in this completely non-deterministic timing data flow graph model is a good idea to start with. Maybe you'd rather want to calculate "timestamps" at which GPIOs should be switched, and send (timestamp, gpio state) command tuples to some entity in your FPGA that keeps absolute time? On the scale of radio propagation and high-rate signal processing, CPU timing is really inaccurate, and you should use the fact that you have an FPGA to actually implement deterministic timing, and use the software running on the CPU (i.e. GNU Radio) to determine when that should happen.
Is there a way to monitor the buffer so that I can determine when my first part of data bits have been sent?
Other than that, a method to asynchronously tell another another block that, yes, you've processed N samples, would be either to have a single block that just observes the outputs of both blocks that you want to synchronize and consumes an identical number of samples from both inputs, or to implement something using message passing. Again, my suspicion is that this is not a solution to your actual problem.
I'm working on software for processing audio in real time in C++ with Qt. I need that requirements are minimized.
Defining a temporary buffer 40ms, launching our device with a sampling frequency Fs = 8000Hz, every 320 samples entered a feature called Data Processing ().
The idea is to have a global buffer that stores the 10s last recorded, 80000 samples.
This Buffer in each iteration eliminates the initial 320 samples and looped at the end, 320 new samples. Thus the buffer is updated and the user can observe the real-time graphical representation of the recorded signal.
At first I thought of using QVector (equivalent to std::vector but for Qt) for this deployment, thus we reduce the process a few lines of code
int NUM_POINTS=320;
DatosTemporales.erase(DatosTemporales.begin(),DatosTemporales.begin()+NUM_POINTS);
DatosTemporales+= (DatosNuevos); // Datos Nuevos con un tamaño de NUM_POINTS
In each iteration we create a vector of 80000 samples in addition to free some positions so requires some processing time. An alternative for opting was the use of * double, and iterations a loop:
for(int i=0;i<80000;i++){
if(i<80000-NUM_POINTS){
aux=DatosTemporales[i];
DatosTemporales[i+NUM_POINTS]=aux;
}else{
DatosTemporales[i]=DatosNuevos[i-NUN_POINTS];
}
}
Does fails. I think the best way is to use dynamic memory. Implementing this process by pointers. Could anyone give me some idea how to implement it?
It sounds like what you are looking for is a circular buffer.
https://www.google.com/search?q=qcircularbuffer
https://qt.gitorious.org/qt/qtbase/merge_requests/60
And it looks like you only need the header file and you should be good to go.
A similar tool that is already in the Qt data set is found here:
http://doc.qt.io/qt-5/qcontiguouscache.html#details
The advantage of using a system like these presented, is that they don't need to have dynamic memory, it just needs to move the head and the tail pointers.
Hope that helps.
I have raw 16bit 48khz pcm data. I need to strip all data which is out of the range of human hearing.
For now I'm just doing a sum of all samples and then dividing by the sample count to calculate peak sound level, but I need to reduce false positives.
I have big peak level all the time, speaking and other sounds which I can hear increasing levels just a little, so I need to implement some filtering. I am not familiar with sound processing at all, so currently I am not using any filtering because I do not understand how to create it. My current code looks like this:
for(size_t i = 0; i < buffer.size(); i++)
level += abs(buffer[i]);
level /= buffer.size();
How can I implement this kind of filtering using C++?
Use a band pass filter.
A band-pass filter is a device that passes frequencies within a
certain range and rejects (attenuates) frequencies outside that range.
This sounds like exactly the sort of filter you are looking for.
I had a quick google search and found this thread that discusses implementation in C++.
It sounds like you want to do something (maybe start recording) if the sound level goes above a certain threshold. This is sometimes called a "gate". It also sounds like you are having trouble with false positives. This is sometimes handled with a "side-chain" applied to the gate.
The general principle of a gate is create an envelope of your signal, and then monitor the envelope to discover when it goes above a certain threshold. If it is above the threshold, your gate is "on", if not, your gate is "off". If you treat your signal before creating the envelope in some way to make it more or less sensitive to various parts of your signal/noise the treatment is called a "side-chain".
You will have to discover the details on your own because there is too much for a Q&A website, but maybe this is enough of a start:
float[] buffer; //defined elsewhere
float HOLD = .9999 ; //there are precise ways to compute this, but experimentation might work fine
float THRESH = .7 ; //or whatever
float env = 0; //we initialize to 0, but in real code be sure to save this between runs
for(size_t i = 0; i < buffer.size(); i++) {
// side-chain, if used, goes here
float b = buffer[i];
// create envelope:
float tmp = abs(b); // you could also do buffer[i] * buffer[i]
env = env * HOLD + tmp * (1-HOLD);
// threshold detection
if( env > THRESH ) {
//gate is "on"
} else {
//gate is "off"
}
}
The side-chain might consist of filters like an eq. Here is a tutorial on designing audio eq: http://blog.bjornroche.com/2012/08/basic-audio-eqs.html
Can someone explain how snd_pcm_writei
snd_pcm_sframes_t snd_pcm_writei(snd_pcm_t *pcm, const void *buffer,
snd_pcm_uframes_t size)
works?
I have used it like so:
for (int i = 0; i < 1; i++) {
f = snd_pcm_writei(handle, buffer, frames);
...
}
Full source code at http://pastebin.com/m2f28b578
Does this mean, that I shouldn't give snd_pcm_writei() the number of
all the frames in buffer, but only
sample_rate * latency = frames
?
So if I e.g. have:
sample_rate = 44100
latency = 0.5 [s]
all_frames = 100000
The number of frames that I should give to snd_pcm_writei() would be
sample_rate * latency = frames
44100*0.5 = 22050
and the number of iterations the for-loop should be?:
(int) 100000/22050 = 4; with frames=22050
and one extra, but only with
100000 mod 22050 = 11800
frames?
Is that how it works?
Louise
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html#gf13067c0ebde29118ca05af76e5b17a9
frames should be the number of frames (samples) you want to write from the buffer. Your system's sound driver will start transferring those samples to the sound card right away, and they will be played at a constant rate.
The latency is introduced in several places. There's latency from the data buffered by the driver while waiting to be transferred to the card. There's at least one buffer full of data that's being transferred to the card at any given moment, and there's buffering on the application side, which is what you seem to be concerned about.
To reduce latency on the application side you need to write the smallest buffer that will work for you. If your application performs a DSP task, that's typically one window's worth of data.
There's no advantage in writing small buffers in a loop - just go ahead and write everything in one go - but there's an important point to understand: to minimize latency, your application should write to the driver no faster than the driver is writing data to the sound card, or you'll end up piling up more data and accumulating more and more latency.
For a design that makes producing data in lockstep with the sound driver relatively easy, look at jack (http://jackaudio.org/) which is based on registering a callback function with the sound playback engine. In fact, you're probably just better off using jack instead of trying to do it yourself if you're really concerned about latency.
I think the reason for the "premature" device closure is that you need to call snd_pcm_drain(handle); prior to snd_pcm_close(handle); to ensure that all data is played before the device is closed.
I did some testing to determine why snd_pcm_writei() didn't seem to work for me using several examples I found in the ALSA tutorials and what I concluded was that the simple examples were doing a snd_pcm_close () before the sound device could play the complete stream sent it to it.
I set the rate to 11025, used a 128 byte random buffer, and for looped snd_pcm_writei() for 11025/128 for each second of sound. Two seconds required 86*2 calls snd_pcm_write() to get two seconds of sound.
In order to give the device sufficient time to convert the data to audio, I put used a for loop after the snd_pcm_writei() loop to delay execution of the snd_pcm_close() function.
After testing, I had to conclude that the sample code didn't supply enough samples to overcome the device latency before the snd_pcm_close function was called which implies that the close function has less latency than the snd_pcm_write() function.
If the ALSA driver's start threshold is not set properly (if in your case it is about 2s), then you will need to call snd_pcm_start() to start the data rendering immediately after snd_pcm_writei().
Or you may set appropriate threshold in the SW params of ALSA device.
ref:
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m.html
http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___s_w___params.html