VST audio input values completely different to those in Matlab - c++

Apologies if this sounds like a stupid question; I'm relatively new to VST development. I'm trying to build a plugin using the JUCE framework and I'm currently testing it with a sine wave .wav file. When I open the .wav file in Audacity it tells me it's 44100 Hz and 32-bit float. When I load the same file into Matlab, the first three samples are something like 0.00, 0.0443, 0.0884... However, when I put the same file into Ableton and Reaper and step through the code, I find the first three samples are 0.00000000, 0.00012068315, 0.00048156900... I see this when I peek into the memory in Visual Studio and view it as 32-bit floating point. Why are my sample values so much smaller?
My problem is that I need the audio to have the same sample values as in Matlab for my algorithm to work. Obviously there's a conversion happening that I have no control over. Can anyone shed any light on this problem and how I should go about fixing it? It looks like a scaling problem, maybe. Ableton is being run in 32-bit mode and my VST is being compiled as 32-bit.
I can also provide more samples if that helps.
Thanks

The problem was that Ableton and Reaper were converting the 32-bit audio to 16-bit audio. I was able to confirm this by loading sine.wav into Ableton and exporting it at 16-bit; when I then loaded that file into Matlab I got the smaller samples shown above. My next problem is to figure out a way to convert the 16-bit audio back to 32-bit audio within the VST.
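If the samples ever do arrive as raw signed 16-bit integers, the usual conversion to normalized 32-bit float is just a divide by 32768. A minimal sketch (the function and buffer names are mine, not from any host API):

#include <cstddef>
#include <cstdint>

// Convert signed 16-bit PCM samples to normalized 32-bit floats in [-1, 1),
// the range VST hosts conventionally use for float audio.
void int16ToFloat32(const int16_t* in, float* out, std::size_t numSamples)
{
    for (std::size_t i = 0; i < numSamples; ++i)
        out[i] = static_cast<float>(in[i]) / 32768.0f;
}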

What you seem to describe is a very low amplitude, so you will see small values (you could convert your sample values to dB to verify that). Usually an audio signal ranges between -1 and +1, where the extrema represent the maximum possible volume in the digital domain (i.e. 0 dBFS).
I believe the dilemma between 16 bits and 32 bits has nothing to do with your issue.
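For reference, converting a linear sample value to decibels relative to full scale is a one-liner; a quick sketch (the function name is mine):

#include <cmath>

// Convert a linear sample value in [-1, 1] to dBFS (0 dBFS = full scale).
// The value 0.00012068315 comes out around -78 dBFS, i.e. very quiet.
float toDecibels(float sample)
{
    const float magnitude = std::fabs(sample);
    return magnitude > 0.0f ? 20.0f * std::log10(magnitude)
                            : -INFINITY; // treat exact silence as -inf dB
}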

Related

HD Video Calling in Unity3D

I am an amateur in video/image processing, but I am trying to create an app for HD video calling. I hope someone will see where I may be going wrong and guide me onto the right path. Here is what I am doing and what I think I understand; please correct me if you know better.
I am using OpenCV currently to grab an image from my webcam in a DLL. (I will be using this image for other things later)
Currently, the image that OpenCV gives me is a cv::Mat. I resized this and converted it to a byte array the size of a 720p image, which is about 3 million bytes.
I pass this pointer back to my C# code, where I can now render it onto a texture.
Now I create a TCP socket, connect the server and client, and start transmitting the previously captured image byte array. I am able to transmit the byte array to the client, and then I use the GPU to render it to a texture.
Currently there is a big delay of about 400-500 ms. This is after I tried compressing the buffer with GZipStream for Unity, which was able to compress the byte array from about 3 million bytes to 1.5 million. I am trying to get this as small and as fast as possible, but this is where I am completely lost. I saw that Skype requires only a 1.2 Mbps connection for 720p video calling at 22 fps. I have no idea how they achieve such small frames, though of course I don't need it to be that small; it just needs to be at least decent.
Please give me a lecture on how this can be done! And let me know if you need anything else from me.
I found a link that may be very useful to anyone working on something similar. https://www.cs.utexas.edu/~teammco/misc/udp_video/
https://github.com/chenxiaoqino/udp-image-streaming/
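Some back-of-the-envelope math shows why per-frame gzip can't get anywhere near Skype-like bitrates; a sketch (the 24-bit RGB assumption is mine):

#include <cstdio>

// Rough bitrate math for raw 720p video at 22 fps, 3 bytes per pixel.
int main()
{
    const double bytesPerFrame = 1280.0 * 720.0 * 3.0;       // ~2.76 million bytes per raw frame
    const double rawMbps = bytesPerFrame * 8.0 * 22.0 / 1e6; // ~487 Mbit/s uncompressed
    std::printf("raw: %.0f Mbit/s, target: 1.2 Mbit/s -> ~%.0f:1 compression needed\n",
                rawMbps, rawMbps / 1.2);
    // gzip on individual frames manages about 2:1; ratios of ~400:1 require an
    // inter-frame video codec (e.g. H.264) that encodes only what changes
    // between frames, which is how Skype reaches 1.2 Mbps.
    return 0;
}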

How to read raw audio data in c++?

I'm trying to do a Fourier transform on an audio file. So far I've managed to read the header of the file with the help of this answer. This is the output.
The audio format is 1, which means PCM, so I should be able to work with the data quite easily. However, this is what I can't figure out:
Is the data binary, and should I convert it to float or to something else?
Yes, it's binary. Specifically, it's signed 16-bit integers.
You may want to convert it to float or double depending on your FFT needs.
I suggest you use a mono input audio file; the sample you showed has two channels (stereo), which complicates the data slightly. For a mono PCM file the structure is
two-byte sample A immediately followed by two-byte sample B, and so on.
In PCM, each such sample directly corresponds to a point on the analog audio curve as the microphone diaphragm (or your eardrum) wobbles. Pay attention to the endianness of your data, and note that format 1 WAV files store 16-bit samples as signed integers, so the values range from -32768 to 32767; confirm your samples stay inside this range.
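A minimal sketch of that conversion, assuming the interleaved signed 16-bit stereo data has already been read past the header (the function name is mine):

#include <cstddef>
#include <cstdint>
#include <vector>

// Deinterleave signed 16-bit stereo PCM and average the two channels into a
// mono double signal in [-1, 1), ready to feed to an FFT.
std::vector<double> stereoPcm16ToMono(const std::vector<int16_t>& interleaved)
{
    std::vector<double> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i)
    {
        const double left  = interleaved[2 * i]     / 32768.0;
        const double right = interleaved[2 * i + 1] / 32768.0;
        mono[i] = 0.5 * (left + right);
    }
    return mono;
}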

How can I get the frequency value at given time with XAudio2?

I've already loaded the .wav audio into a buffer with XAudio2 (Windows 8.1), and to play it I just have to use:
//start consuming audio in the source voice
/* IXAudio2SourceVoice* */ g_source->Start();
//play the sound
g_source->SubmitSourceBuffer(buffer.xaBuffer());
I wonder, how can I get the frequency value at given time with XAudio2?
The question does not make much sense as stated; a .wav file contains a great many frequencies. It is the blend of them that makes it sound like music to your ears instead of just an artificially generated tone, and that blend is constantly changing.
A signal processing step is required to convert the samples in the .wav file from the time domain to the frequency domain. This is generally known as spectrum analysis, and the Fast Fourier Transform (FFT) is the standard technique.
A random Google hit on "xaudio2 fft" produced this code sample. No idea how good it is, but it's something to play with to get the lay of the land. You'll find more about it in this gamedev question.
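For orientation, the core of such an analysis looks like the naive DFT below (a sketch only; real code would use an FFT library, since the naive form runs in O(N²)):

#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive DFT: magnitude of each frequency bin for one block of samples.
// Bin k corresponds to the frequency k * sampleRate / N.
std::vector<double> dftMagnitudes(const std::vector<float>& block)
{
    const std::size_t N = block.size();
    const double pi = 3.14159265358979323846;
    std::vector<double> mags(N / 2); // bins above Nyquist mirror the lower ones
    for (std::size_t k = 0; k < mags.size(); ++k)
    {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n)
        {
            const double angle = -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            sum += static_cast<double>(block[n])
                 * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        mags[k] = std::abs(sum);
    }
    return mags;
}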

How to get PCM data from recorded sound for Fourier analysis

I've been working on C++ code that will take in sound and output its core frequency, like a guitar tuner. I can generate my own randomized sine wave and successfully perform the FFT on a text file that is just amplitude vs. time. I just don't know how to produce usable data from either a microphone or a sound file.
Is there a simple way to sample sound and have it output the data in an amplitude vs. time text file?
I've looked into the WAV file format and how the various chunks work but it's a bit above my level. Any help is really appreciated.
If you can ensure that your WAV is mono, uncompressed, 16-bit and at a known sample rate, you can either skip the WAV/RIFF/whatever header or suck it in as if it were samples (that shouldn't affect your FFT results much if the file is long).
Other than that, uncompressed WAV isn't a horribly complex format. With some more effort you'll parse it.
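A minimal sketch of that shortcut, assuming the canonical 44-byte header of a simple mono 16-bit PCM WAV (real files can contain extra chunks, so this is deliberately fragile; the file name is a placeholder):

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

// Skip the canonical 44-byte header of a simple mono 16-bit PCM WAV and
// print amplitude vs. time as text, one "time amplitude" pair per line.
int main()
{
    const double sampleRate = 44100.0;   // assumed known in advance
    std::ifstream file("input.wav", std::ios::binary);
    file.seekg(44);                      // skip RIFF/fmt/data headers

    std::vector<int16_t> samples;
    int16_t s;
    while (file.read(reinterpret_cast<char*>(&s), sizeof s))
        samples.push_back(s);

    for (std::size_t i = 0; i < samples.size(); ++i)
        std::printf("%f %f\n", i / sampleRate, samples[i] / 32768.0);
    return 0;
}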

How can I reduce or remove the noise created by changing the 'volume' of a sample from 16bit PCM

I'm currently working on a small project where I'm loading 16-bit wave files with a sample rate of 44100 Hz. In normal playback the audio sounds fine, but as soon as I start to play with things like the amplitude to change the volume, I get a little bit of static noise.
What I'm doing is getting a sample from the buffer (in the case of this 16-bit type, a short) and converting it to a float in the range of -1 to 1 to do mixing and other effects. This is also where I change the volume. When I just multiply by 1, giving the same output, it's fine, but as soon as I start to change the volume I hear the static noise. It happens when going above 1.0 as well as below 1.0, and it gets worse the further the scale moves from 1.
Anyone have an idea how to reduce or remove the noise?
"Static", otherwise known as "clicks and pops" are the result of discontinuities in the output signal. Here is a perfect example of a discontinuity:
http://en.wikipedia.org/wiki/File:Discontinuity_jump.eps.png
If you send a buffer of audio to the system to play back, and then for the next buffer you multiply every sample by 1.1, you can create a discontinuity. For example, consider a buffer that contains a sine wave with values from [-0.5, 0.5]. You send a piece of this wave to the output device, and the last sample happens to be 0.5.
Now on your next buffer you try to adjust the volume by multiplying by 1.1. The first sample of the new buffer will be close to 0.5 (since the previous sample was 0.5). Multiply that by 1.1 and you get 0.55.
A change from one sample to the next of 0.05 will probably sound like a click or a pop. If you create enough of these, it will sound like static.
The solution is to "ramp" your volume change over the buffer. For example, if you want to apply a gain of 1.1 to a buffer of 100 samples, and the previous gain was 1.0, then you would loop over all 100 samples starting with gain 1 and smoothly increase the gain until you reach the last sample, at which point your gain should be 1.1.
If you want an example of this code look at juce::AudioSampleBuffer::applyGainRamp:
http://www.rawmaterialsoftware.com/api/classAudioSampleBuffer.html
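A hand-rolled equivalent of that ramp might look like this (a sketch of the technique, not JUCE's actual implementation):

// Linearly interpolate the gain from startGain to endGain across the buffer
// so no single sample-to-sample step is large enough to click.
void applyGainRamp(float* buffer, int numSamples, float startGain, float endGain)
{
    const float step = (endGain - startGain) / static_cast<float>(numSamples);
    float gain = startGain;
    for (int i = 0; i < numSamples; ++i)
    {
        buffer[i] *= gain;
        gain += step;
    }
}

On the next buffer you would start the ramp from endGain, so the gain stays continuous across buffer boundaries.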
I found the flaw: I was abstracting over different bit-depth data types by accessing their data through a char*, and I did not cast it back to the correct data-type pointer when using it. This meant bytes were being cut off when writing the data, which created the noise and the volume-changing bugs, among others.
It was a flaw in my implementation and in my not thinking about this when working with the audio data. A tip for anyone doing the same kind of thing: keep a close eye on your data when modifying it, and check which type it is when working through abstractions.
Many thanks to the guys trying to help me; the links were really interesting and taught me more about audio programming.