How to mix voice audio

How to mix voice audio - c++

I am currently developing a simple VoIP project where multiple clients send out his voice to a server and later the server will mix up those voices together.
However, I can't mix it directly by using simple mathematic addition. Each cycle, a client will send 3584 Bytes voice data to the mixer.
Below is the snippet of the value contained in a receiver buffer:
BYTE buffer[3584];
[0] 0 unsigned char
[1] 192 'À' unsigned char
[2] 176 '°' unsigned char
[3] 61 '=' unsigned char
[4] 0 unsigned char
[5] 80 'P' unsigned char
[6] 172 '¬' unsigned char
[7] 61 '=' unsigned char
[8] 0 unsigned char
[9] 144 '' unsigned char
[10] 183 '·' unsigned char
[11] 61 '=' unsigned char
.
.
.
I'm not so sure how the pattern inside the buffer is generated in that way from a client side but I'm thinking it may be a wave pattern. Now let say I have another similar data like this, how do I mix the voice together.
Please help. Thank you.

You need to find out if your VoIP system uses compression. It probably does, in which case the first thing you need to do is to decompress the streams, then mix them, then recompress.

This is probably an array of floats (unlikely due to the byte pattern presented) or singed integers if it's raw PCM data so try using it as such. Mixing to PCM streams is fairly trivial, just add them together and divide them by two (use other weighting for volume control).

I looked at your data again and they appear to be floating point values the reason I was mistaken in my previous post is probably related to me working on big endian systems for a while now. However your data is in little endian IEEE floating point. Here are the values I got after conversion.
0.089630127 -> 0x0090b73d
0.084136963 -> 0x0050ac3d
0.086303711 -> 0x00c0b03d
As you can see, the values are fairly small so you'll probably need to take that into account when applying the volume; the usual convention is to have this data either between 0..1 or -1..1 for min and max volumes respectively.
Here is part of a mixing loop I've written a few years ago, for reference the full mixer is available here
for(int i = 0; i < a_Sample->count() / a_Sample->channels(); i++){
float l_Volume = a_Sample->volume() * m_MasterVolume;
*l_Output++ += *l_Left * l_PanLeft * l_Volume;
*l_Output++ += *l_Right * l_PanRight * l_Volume;
l_Left += a_Sample->channels();
l_Right += a_Sample->channels();
}
Notice that for the output you'll probably need to convert the data to signed integers so communicate properly if that's the responsibility of the mixer or the outputting device.

As others have mentioned you have to know what format the buffer is in. You can't simply just operate on the bytes directly (well, you could, but it would become quite complicated). Most raw PCM data is usually 44100 bits/second, 16 bit, 2 channel. However, that's not always the case. Each one of those can be different. It won't effect it too much, but is an example. However, even WAV files can be in other formats (like IEEE Float). You will need to interpret the buffer correct as the appropriate data type in order to operate on it.
Like:
BYTE buffer[3584];
if (SampleTypeIsPcm16Bit())
{
short *data = reinterpret_cast<short *>(buffer);
// Rock on
}
else if (SampleTypeIsFloat())
{
float *data = reinterpret_cast<float *>(buffer);
// Rock on
}
Of course, you can make it more generic with templates, but ignore that for know :P.
Keep in mind that if you are dealing with floats, they need to be capped to the range -1.0 and 1.0.
So, are you currently saying the "add two values and divide by two" (mentioned by Jasper) isn't working? How are you playing the data when you just hear silence? I wonder if that's a problem because if your math is off, you would likely hear audio glitches (pops/clicks/etc.) rather than just silence.

Related

byte order over custom buffer

I want to send a payload that has this structure:
uint8_t num_of_products;
//1 product
uint8_t action;
time_t unix_time;//uint64_t
uint32_t name_len;//up to 256
char* name;
uint8_t num_of_ips;
//1 ip out if num_of_ips
uint8_t ip_ver; //0 error , 1 ipv4 , 2 ipv6
char* IP;// ipv4 : 4 , ipv6 : 16
before sending the packet I aggregate products using memcpy into jumbo size mbuf
from tests I did name_len must go through hton in order to not look "inverted" in wireshark.
my question is ,what logic can I apply in order to get the byte order right for a custom structure with inner variables with unknown size
i.e what should go through hton what should be left as is

If you are aiming to have your message in network byte order (big endian), only your integer fields that take up more than one byte will need the htonl treamnet. In your case, that would be time_t unix_time and uint32_t name_len. Your strings and single-byte fields (such as num_of_products) won't need any specific conversion.
As some of the commenters in your question suggested - it's really up to you if you want to use a strict network byte order. Serializing your message to have a strict byte ordering is useful if you intend your code to run across different platforms.
Writing efficient byte packing code is annoyingly hard. You wind up writing a lot of code to just to save a few bytes of network bandwidth.
User jxh mentioned JSON as a possible encoding for your message. Not sure why he deleted his answer because it was on point. But in any case, a standard messaging format of either JSON or XML (or any ascii text schema) is 100x easier to observe in wireshark and when debugging.

How to mix audio input devices in Qt

I'm new to Qt's multimedia library and in my application I want to mix audio from multiple input devices (e.g. microphone), in order to stream it via TCP.
As far as I know I have to obtain the specific QAudioDeviceInfo for all needed devices first - together with an according QAudioFormat object - and use this with QAudioInput. Then I simply call start() for every created QAudioInput object and read out pending bytes with readLine().
But how can I mix audio data of multiple devices to one buffer?

I am not sure if there is any Qt specific method / class to do this. However it's pretty simple to do it yourself.
The most basic way (assuming you are using PCM), you can simply add the two streams/buffers together word by word (if I recall they are 16-bit PCM words).
So if you have two input buffers:
int16 buff1[10];
int16 buff2[10];
int16 mixBuff[10];
// Fill them...
//... code goes here to read from the buffers ....
// Add them (effectively mix them)
for (int i = 0; i < 10; i++)
{
mixBuff[i] = buff1[i] + buff2[i];
}
Now, this is very crude and does not take any scaling into consideration. So imagine buff1 and buff2 both use 80% of the dynamic range (call this full volume, beyond which you get distortion), then when you add them together you will get number overrun (i.e. 16-bit max is 65535 so 50000 + 50000 will be a over run).
Each time you mix you effectively need half the two inputs (so 65535 / 2 + 65535 / 2 = 65535... i.e. when you add them up you can't overrun). So your mix code is like this:
for (int i = 0; i < 10; i++)
{
mixBuff[i] = (buff1[i] >> 1) + (buff2[i] >> 1);
}
There is much more you can do (noise removal etc...) but then the maths start getting a bit hairy. This is very simple. You can use the shift afterwards to increase / decrease volume as simple volume control if you want.
EDIT
One thing to note... you are using readline() (which the docs say reads the data out as ASCII). I always use read() which it does not state the "format" it is read out, but I am assuming binary. So this code may not work if you use readline() but I have never tried it. It works well for read(), you don't really want to be working in ASCII if you want to manipulate the data.

Speed up data logging code

I have a device that outputs 64 bits of binary data at a rate of 1KHz. I am reading the device over USB via a 3rd party DLL, converting the binary data into a float, timestamping it, and writing to file.
I have the following setup at the moment:
int main(int argc, char* argv[])
{
unsigned char Message_Rx[64];
USHORT Bytes_Read=0;
std::ofstream out(argv[1]);
do
{
Result = Comms.USBRead(&Message_Rx[0],&Bytes_Read);
unsigned long now = getTickCount(start);
if(Result != 0)
{
uint16_t msb (Message_Rx[11] & 0xff) \\leftshited 8;
uint16_t lsb (Message_Rx[12] & 0xff);
uint16_t rate = msb | lsb;
char outstring[1024];
sprintf(outstring, "%d\t%.7f", now, (float)rate*0.03125);
out << outstring << "\n";
}
}while(!kbhit());
out.close();
}
(Sorry, formatting gets messed up with >> or <<).
This produces perfectly good results on my desktop. There doesn't appear to be any data missing and the timestamps are continuous and 1ms apart.
143379582 -0.5937500
143379583 -1.5312500
143379584 -1.6250000
143379585 -1.4062500
143379586 -1.1875000
143379587 -1.3437500
143379588 -1.3125000
143379589 -1.3125000
143379590 -1.1562500
But when I run this on the old laptop that I need to use I get timestamps that appear in blocks and it looks like there must be some data missing:
143379582 -0.5937500
143379582 -1.5312500
143379582 -1.6250000
143379582 -1.4062500
143379582 -1.1875000
143379593 -1.3437500
143379593 -1.3125000
143379593 -1.3125000
143379593 -1.1562500
Is there a way to achieve a speedup of my code so that I won't lose data?

To say this loud and clear: for any PC that is not a Intel 486SX, 64kb/s is a utmost laughable rate. Getting a few Mb/s over USB is very doable with small, Dollar-a-piece microcontrollers without any optimization.
Whatever goes wrong needs investigation much more than your code does.
I don't know the Comms library, but that's where I'd look for the place where time is spent.
Other than that, your printing stuff to the screen should take orders of magnitude more time than your processing, but still shouldn't be a problem. As mentioned, 1kS/s * 64 b/S is nothing for modern (read: last twenty years) PC hardware.

I recommend storing the raw data until the key is hit. After the key is pressed, output the data.
You want to remove formatting and output from high performance code areas.
Paraphrasing a song, There will be time enough for printing when the data's done.
Edit 1:
An array-based circular queue is a good data structure to hold the incoming data. This gives you the last N data samples.

Whenever you have issues with performance, your first step should be to profile the code to see what parts of it are taking up time.
However, for your code, I would say that the printing and string handling are unnecessary for the main loop. I would have a separate array of timestamps and within my main loop only acquire data.
After a key is hit, you no longer have timing restrictions and can deal with the somewhat expensive operation of file I/O and building up of the strings.
A final note is that your OS might be stealing CPU cycles from you. You may want to try to run your code with higher priorities to rule out scheduling.
With all that said, as was mentioned above, your data rate should be sustainable unless you're running on some really vintage hardware.

Building a fast PNG encoder issues

I am trying to build a fast 8-bit greyscale PNG encoder. Unfortunately I must be misunderstanding part of the spec. Smaller image sizes seem to work, but the larger ones will only open in some image viewers. This image (with multiple DEFLATE Blocks) gives a
"Decompression error in IDAT" error in my image viewer but opens fine in my browser:
This image has just one DEFLATE block but also gives an error:
Below I will outline what I put in my IDAT chunk in case you can easily spot any mistakes (note, images and steps have been modified based on answers, but there is still a problem):
IDAT length
"IDAT" in ascii (literally the bytes 0x49 0x44 0x41 0x54)
Zlib header 0x78 0x01
Steps 4-7 are for every deflate block, as the data may need to be broken up:
The byte 0x00 or 0x01, depending on if it is a middle or the last block.
Number of bytes in block (up to 2^16-1) stored as a little endian 16-bit integer
The 1's complement of this integer representation.
Image data (each scan-line is starts with a zero-byte for the no filter option in PNG, and is followed by width bytes of greyscale pixel data)
An adler-32 checksum of all the image data
A CRC of all the IDAT data
I've tried pngcheck on linux, an it does not spot any errors. If nobody can see what is wrong, can you point me in the right direction for a debugging tool?
My last resort is to use the libpng library to make my own decoder, and debug from there.
Some people have suggested it may be my adler-32 function calculation:
static uint32_t adler32(uint32_t height, uint32_t width, char** pixel_array)
{
uint32_t a=1,b=0,w,h;
for(h=0;h<height;h++)
{
b+=a;
for(w=0;w<width;w++)
{
a+=pixel_array[h][w];
b+=a;
}
}
return (uint32_t)(((b%65521)*65536)|(a%65521));
}
Note that because the pixel_array passed to the function does not contain the zero-byte at the beginning of each scanline (needed for PNG) there is an extra b+=a (and implicit a+=0) at the beginning of each iteration of the outer loop.

I do get an error with pngcheck: "zlib: inflate error = -3 (data error)". As your PNG scaffolding structure looks okay, it's time to take a low-level look into the IDAT block with a hex viewer. (I'm going to type this up while working through it.)
The header looks alright; IDAT length is okay. Your zlib flags are 78 01 ("No/low compression", see also What does a zlib header look like?), where one of my own tools use 78 9C ("Default compression"), but then again, these flags are only informative.
Next: zlib's internal blocks (per RFC1950).
Directly after the compression flags (CMF in RFC1950) it expects FLATE compressed data, which is the only compression scheme zlib supports. And that is in another castle RFC: RFC1951.
Each separately compression block is prepended by a byte:
3.2.3. Details of block format
Each block of compressed data begins with 3 header bits
containing the following data:
first bit BFINAL
next 2 bits BTYPE
...
BFINAL is set if and only if this is the last block of the data
set.
BTYPE specifies how the data are compressed, as follows:
00 - no compression
01 - compressed with fixed Huffman codes
10 - compressed with dynamic Huffman codes
11 - reserved (error)
So this value can be set to 00 for 'not last block, uncompressed' and to 01 for 'last block, uncompressed', immediately followed by the length (2 bytes) and its bitwise inverse, per 3.2.4. Non-compressed blocks (BTYPE=00):
3.2.4. Non-compressed blocks (BTYPE=00)
Any bits of input up to the next byte boundary are ignored.
The rest of the block consists of the following information:
0 1 2 3 4...
+---+---+---+---+================================+
| LEN | NLEN |... LEN bytes of literal data...|
+---+---+---+---+================================+
LEN is the number of data bytes in the block. NLEN is the
one's complement of LEN.
They are the final 4 bytes in your IDAT segment. Why do small images work, and larger not? Because you only have 2 bytes for the length.1 You need to break up your image into blocks no larger than 65,535 bytes (in my own PNG creator I seem to have used 32,768, probably "for safety"). If the last block, write out 01, else 00. Then add the two times two LEN bytes, properly encoded, followed by exactly LEN data bytes. Repeat until done.
The Adler-32 checksum is not part of this Flate-compressed data, and should not be counted in the blocks of LEN data. (It is still part of the IDAT block, though.)
After re-reading your question to verify I addressed all of your issues (and confirming I spelled "Adler-32" correctly), I realized you describe all of the steps right -- except that the 'last block' indicator is 01, not 80 (later edit: uh, perhaps you are right about that!) -- but that it does not show in this sample PNG. See if you can get it to work following all of the steps by the letter.
Kudos for doing this 'by hand'. It's a nice exercise in 'following the specs', and if you get this to work, it may be worthwhile to try and add proper compression. I shun pre-made libraries as much as possible; the only allowance I made for my own PNG encoder/decoder was to use Rich Geldreich's miniz.c, because implementing proper Flate encoding/decoding is beyond my ken.
1 That's not the whole story. Browsers are particularly forgiving in HTML errors; it seems they are as forgiving for PNG errors as well. Safari displays your image just fine, and so does Preview. But they may just all be sharing OS X's PNG decoder, because Photoshop rejects the file.

The byte 0x00 or 0x80, depending on if it is a middle or the last block.
Change the 0x80 to 0x01 and all will be well.
The 0x80 is appearing as a stored block that is not the last block. All that's being looked at is the low bit, which is zero, indicating a middle block. All of the data is in that "middle" block, so a decoder will recover the full image. Some liberal PNG decoders may then ignore the errors it gets when it tries to decode the next block, which isn't there, and then ignore the missing check values (Adler-32 and CRC-32), etc. That's why it shows up ok in browsers, even though it is an invalid PNG file.
There are two things wrong with your Adler-32 code. First, you are accessing the data from a char array. char is signed, so your 0xff bytes are being added not as 255, but rather as -127. You need to make the array unsigned char or cast it to that before extracting byte values from it.
Second, you are doing the modulo operation too late. You must do the % 65521 before the uint32_t overflows. Otherwise you don't get the modulo of the sum as required by the algorithm. A simple fix would be to do the % 65521 to a and b right after the width loop, inside the height loop. This will work so long as you can guarantee that the width will be less than 5551 bytes. (Why 5551 is left as an exercise for the reader.) If you cannot guarantee that, then you will need to embed a another loop to consume bytes from the line until you get to 5551 of them, do the modulo, and then continue with the line. Or, a smidge slower, just run a counter and do the modulo when it gets to the limit.
Here is an example of a version that works for any width:
static uint32_t adler32(uint32_t height, uint32_t width, unsigned char ** pixel_array)
{
uint32_t a = 1, b = 0, w, h, k;
for (h = 0; h < height; h++)
{
b += a;
w = k = 0;
while (k < width) {
k += 5551;
if (k > width)
k = width;
while (w < k) {
a += pixel_array[h][w++];
b += a;
}
a %= 65521;
b %= 65521;
}
}
return (b << 16) | a;
}

Arduino Ethernet Byte size problem

I'm using an Arduino (duemilanove) with the official Ethernet shield to send data to the controller for controlling an LED matrix. I am trying to send some raw 32-bit unsigned int values (unix timestamps) to the controller by taking the 4 bytes in the 32-bit value on the desktop and sending it to the arduino as 4 consecutive bytes. However, whenever a byte value is larger than 127, the returned value by the ethernet client library is 63.
The following is a basic example of what I'm doing on the arduino side of things. Some things have been removed for neatness.
byte buffer[32];
memset(buffer, 0, 32);
int data;
int i=0;
data = client.read();
while(data != -1 && i < 32)
{
buffer[i++] = (byte)data;
data = client.read();
}
So, whenever the input byte is bigger than 127 the variable "data" will end up getting set to 63! At first I thought the problem was further down the line (buffer used to be char instead of byte) but when I print out "data" right after the read, it's still 63.
Any ideas what could be causing this? I know client.read() is supposed to output int and internally reads data from the socket as uint8_t which is a full byte and unsigned, so I should be able to at least go to 255...
EDIT: Right you are, Hans. Didn't realize that Encoding.ASCII.GetBytes only supported the first 7 bits and not all 8.

I'm more inclined to suspect the transmit side. Are you positive the transmit side is working correctly? Have you verified with a wireshark capture or some such?

63 is the ASCII code for ?. There's some relevance to the values, ASCII doesn't have character codes for values over 127. An ASCII encoder commonly replaces invalid codes like this with a question mark. Default behavior for the .NET Encoding.ASCII encoder for example.
It isn't exactly clear where that might happen. Definitely not in your snippet. Probably on the other end of the wire. Write bytes, not characters.

+1 for Hans Passant and Karl Bielefeldt.
Can you just send the data without encoding? How is the data being sent? TCP/UDP/IP/Ethernet definitely support sending binary data without restriction. If this isn't possible, perhaps converting the data to hex will solve the problem. Base64 will also work (better) but is considerably more work. For small amounts of data, hex is probably the easiest and fastest solution.
+1 again to Karl and Ben for mentioning wireshark. Invaluable for debugging network problems like this.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to mix voice audio - c++

You need to find out if your VoIP system uses compression. It probably does, in which case the first thing you need to do is to decompress the streams, then mix them, then recompress.

This is probably an array of floats (unlikely due to the byte pattern presented) or singed integers if it's raw PCM data so try using it as such. Mixing to PCM streams is fairly trivial, just add them together and divide them by two (use other weighting for volume control).

Related

byte order over custom buffer

How to mix audio input devices in Qt

Speed up data logging code

Building a fast PNG encoder issues

Arduino Ethernet Byte size problem

Categories

Resources