How does this FFT smoother work in c++/openframeworks? - c++

I'm doing some tutorials for OpenFrameworks (i'm kind of a noob when it comes to coding but have a bit of experience so far w/ tutorials and learning what's going on and stuff over the past few years) and a major part of the code involves grabbing the sound spectrum of an audio sample and throwing the values into an array to control a float value. But I can't seem to wrap my head around what's going on here.
This is the relevant code (it's a VJ shaper that rotates and changes the size of shapes according to input from the sound spectrum):
header:
float * fftSmooth;
int bands;
cpp setup:
fftSmooth = new float[8192];
for (int i = 0; i < 8192; i++) {
fftSmooth[i] = 0;
}
bands = 64;
cpp update:
float * value = ofSoundGetSpectrum(bands);
for (int i = 0; i < bands; i++) {
fftSmooth[i] *= release; //"release" is a float
if (fftSmooth[i] < value[i]) {
fftSmooth[i] = value[i];
}
}
if anyone could walk me through the steps of what's going on, that would be great. I understand (sort of) that in the setup, an array called "fftSmooth" is being created, with 8192 floats in it, then being filled with zeros in the for loop after which the int "bands" is being assigned a value of 64. Then in the update, another array called "value" is being created with 64 floats in it by looking at "bands", which is also the number of bands in ofSoundGetSpectrum, which is grabbing the frequency levels from a sound file as it plays. I've looked at the openframeworks reference page for the sound spectrum thing and didn't really get any more clues as to what it's doing in this context, and i have no idea what the for loops and if statements in the update section are doing either.
Not knowing what's going on really isn't going to impact whether i can actually use the code or not, but i feel like if i want to actually build on this code (grabbing different frequency ranges etc) i need to know what the for loops and if statements in the update are doing.

ofSoundGetSpectrum(...)
Gets a frequency spectrum sample, taking all current sound players into account.
Each band will be represented as a float between 0 and 1.
This appears to be taking an instantaneous FFT, and returning the "strength" of each of the frequency bands.
I assume the second half of the code is run in a loop. The first time through, it is just going to copy the current band strength into fftSmooth. In subsequent passes, the multiply by release is designed to reduce the value in fftSmooth by some percentage. Then any new band strength greater than the filtered one will overwrite the old value.
If you animate plots of fftSmooth, you should get an image like this (minus the color) :

Related

Where to alter reference code to extract motion vectors from HEVC encoded video

So this question has been asked a few times, but I think my C++ skills are too deficient to really appreciate the answers. What I need is a way to start with an HEVC encoded video and end with CSV that has all the motion vectors. So far, I've compiled and run the reference decoder, everything seems to be working fine. I'm not sure if this matters, but I'm interested in the motion vectors as a convenient way to analyze motion in a video. My plan at first is to average the MVs in each frame to just get a value expressing something about the average amount of movement in that frame.
The discussion here tells me about the TComDataCU class methods I need to interact with to get the MVs and talks about how to iterate over CTUs. But I still don't really understand the following:
1) what information is returned by these MV methods and in what format? With my limited knowledge, I assume that there are going to be something like 7 values associated with the MV: the frame number, an index identifying a macroblock in that frame, the size of the macroblock, the x coordinate of the macroblock (probably the top left corner?), the y coordinate of the macroblock, the x coordinate of the vector, and the y coordinate of the vector.
2) where in the code do I need to put new statements that save the data? I thought there must be some spot in TComDataCU.cpp where I can put lines in that print the data I want to a file, but I'm confused when the values are actually determined and what they are. The variable declarations look like this:
// create motion vector fields
m_pCtuAboveLeft = NULL;
m_pCtuAboveRight = NULL;
m_pCtuAbove = NULL;
m_pCtuLeft = NULL;
But I can't make much sense of those names. AboveLeft, AboveRight, Above, and Left seem like an asymmetric mix of directions?
Any help would be great! I think I would most benefit from seeing some example code. An explanation of the variables I need to pay attention to would also be very helpful.
At TEncSlice.cpp, you can access every CTU in loop
for( UInt ctuTsAddr = startCtuTsAddr; ctuTsAddr < boundingCtuTsAddr; ++ctuTsAddr )
then you can choose exact CTU by using address of CTU.
pCtu(TComDataCU class)->getCtuRsAddr().
After that,
pCtu->getCUMvField()
will return CTU's motion vector field. You can extract MV of CTU in that object.
For example,
TComMvField->getMv(g_auiRasterToZscan[y * 16 + x])->getHor()
returns specific 4x4 block MV's Horizontal element.
You can save these data after m_pcCuEncoder->compressCtu( pCtu ) because compressCtu determines all data of CTU such as CU partition and motion estimation, etc.
I hope this information helps you and other people!

C++ mathematical function generation

In working on a project I came across the need to generate various waves, accurately. I thought that a simple sine wave would be the easiest to begin with, but it appears that I am mistaken. I made a simple program that generates a vector of samples and then plays those samples back so that the user hears the wave, as a test. Here is the relevant code:
vector<short> genSineWaveSample(int nsamples, float freq, float amp) {
vector<short> samples;
for(float i = 0; i <= nsamples; i++) {
samples.push_back(amp * sinx15(freq*i));
}
return samples;
}
I'm not sure what the issue with this is. I understand that there could be some issue with the vector being made of shorts, but that's what my audio framework wants, and I am inexperienced with that kind of library and so do not know what to expect.
The symptoms are as follows:
frequency not correct
ie: given freq=440, A4 is not the note played back
strange distortion
Most frequencies do not generate a clean wave. 220, 440, 880 are all clean, most others are distorted
Most frequencies are shifted upwards considerably
Can anyone give advice as to what I may be doing wrong?
Here's what I've tried so far:
Making my own sine function, for greater accuracy.
I used a 15th degree Taylor Series expansion for sin(x)
Changed the sample rate, anything from 256 to 44100, no change can be heard given the above errors, the waves are simply more distorted.
Thank you. If there is any information that can help you, I'd be obliged to provide it.
I suspect that you are passing incorrect values to your sin15x function. If you are familiar with the basics of signal processing the Nyquist frequency is the minimum frequency at which you can faithful reconstruct (or in your case construct) a sampled signal. The is defined as 2x the highest frequency component present in the signal.
What this means for your program is that you need at last 2 values per cycle of the highest frequency you want to reproduce. At 20Khz you'd need 40,000 samples per second. It looks like you are just packing a vector with values and letting the playback program sort out the timing.
We will assume you use 44.1Khz as your playback sampling frequency. This means that a snipet of code producing one second of a 1kHz wave would look like
DataStructure wave = new DataStructure(44100) // creates some data structure of 44100 in length
for(int i = 0; i < 44100; i++)
{
wave[i] = sin(2*pi * i * (frequency / 44100) + pi / 2) // sin is in radians, frequency in Hz
}
You need to divide by the frequency, not multiply. To see this, take the case of a 22,050 Hz frequency value is passed. For i = 0, you get sin(0) = 1. For i = 1, sin(3pi/2) = -1 and so on are so forth. This gives you a repeating sequence of 1, -1, 1, -1... which is the correct representation of a 22,050Hz wave sampled at 44.1Khz. This works as you go down in frequency but you get more and more samples per cycle. Interestingly though this does not make a difference. A sinewave sampled at 2 samples per cycle is just as accurately recreated as one that is sampled 1000 times per second. This doesn't take into account noise but for most purposes works well enough.
I would suggest looking into the basics of digital signal processing as it a very interesting field and very useful to understand.
Edit: This assumes all of those parameters are evaluated as floating point numbers.
Fundamentally, you're missing a piece of information. You don't specify the amount of time over which you want your samples taken. This could also be thought of as the rate at which the samples will be played by your system. Something roughly in this direction will get you closer, for now, though.
samples.push_back(amp * std::sin(M_PI / freq *i));

Audio Visualizer from wav looks wrong

I'm having trouble making an audio visualizer look accurate. The bins that have a significant amount of sound tend to draw correctly, but the problem I'm having is that all the frequencies with no significant sound seem to be coming back with a value that usually bounces between -60dB and -40dB. This forms a flat bouncing line (usually in the higher freqencies).
I want to display 512 bins or less at 30 frames per second. I've been reading up on FFT and audio non stop for a couple weeks now, and so far my process has been:
Load pcm data from wav file. This comes in as 44100 samples per second that have a range of -/+ 32767. I'm assuming I treat these as real numbers when passing them to the FFT.
Divide these samples up into 1470 per frame. (446 are ignored)
Take 1024 samples and apply a Hann window.
Pass the samples to FFT as an array of real[1024] as well as another array of the same size filled with zeros for the imaginary part.
Get the magnitude by looping through the (samples/2) bins and do a sqrt(real[i]*real[i] + img[i]*img[i]).
Taking 20 * log(magnitude) to get the decibel level of each bin
Draw a rectangle for each bin. Draw these bins for each frame.
I've tested it with a couple songs, and a wav file I generated that just plays a tone at 440Hz. With the wav file, I do get a spike at the 440 bin, but all the other bins form a line that isn't much shorter than the 440 bin. Also every other frame, the bins apart from 440 look like a graphed log function with a dip on some other bin.
I'm writing this in c++. Using STK to only load left channel from the audio file:
//put every sample in the song into a temporary vector
for (int i = 0; i < stkObject->getSize(); i++)
{
standardVector.push_back(stkObject->tick(LEFT));
}
I'm using FFTReal to perform the FFT:
std::vector<std::vector <double> > leftChannelData;
int numberOfFrames = stkObject->getSize()/samplesPerFrame;
leftChannelData.resize(numberOfFrames);
for(int i = 0; i < numberOfFrames; i++)
{
for(int j = 0; j < FFT_SAMPLE_LENGTH; j++)
{
real[j] = standardVector[j + (i*samplesPerFrame)];
}
applyHannWindow(real, FFT_SAMPLE_LENGTH);
fft_object.do_fft(imaginary,real);
//FFTReal instructions say to run this after an fft
fft_object.rescale(real);
leftChannelData[i].resize(FFT_SAMPLE_LENGTH/2);
for (int j = 0; j < FFT_SAMPLE_LENGTH/2; j++)
{
double magnitude = sqrt(real[j]*real[j] + imaginary[j]*imaginary[j]);
double dbValue = 20 * log(magnitude/maxMagnitude);
leftChannelData[i].at(j) = dbValue;
}
}
I'm at a loss as to what's causing this. I've tried various ways to pull those 446 samples I'm ignoring, but the results don't seem to change. I think I may be doing something fundamentally wrong. I've tried normalizing the pcm data before handing it to the fft and I've tried normalizing the magnitude before finding the decibels, but it doesn't seem to be working. Any thoughts?
EDIT: I don't see any difference between log(magnitude) and log(magnitude/maxMagnitude). All it seems to do is shift all of the bin's values evenly downwards.
EDIT2:
Here's a what they look like to get a visual:
Song playing low sounds - with log(mag)
Song playing low sounds - same but with log(mag/maxMag)
Again, log(mag) and log(mag/maxMag) generally look the same, but with values spanning in the negative. Like MSalters said, the decibel can approach -infinite, so I can clamp those values to -100dB. Then take log(mag/maxMag) and add 100. That way the rectangle's height range from 0 to 100 instead of -100 to 0.
Is this what I should do? I've tried this, but it still looks wrong. Maybe it's just a scaling issue? When I do this, a lot of the bars don't make it above the line when it sounds like they should. And if they do make it above 0, they do so just barely.
You have to understand that you're not taking the Fourier Transform of an infinite signal, but the FT of an windowed version thereof. And your window isn't even a plain Hann window. Discarding 446 points is effectively a rectangular window function. The FT of the window functions will both show up in your output.
Secondly, the dB scale is logarithmic. That indeed means it can go quite low in the absence of a signal. You mention -60 dB, but it in fact could hit minus infinity. The only thing that would save you from that is the window function, which will introduce smear at about -110 dB.
The noise (stop band ripple) produced by a quantized Von Hann window of length 1024 could well be around -40 to -60 dB. So one strategy is to just set a threshold, and ignore (don't plot) all values below that threshold.
Also, try removing the rescale(real) function, as that could distort your complex vector before you take the log magnitude.
Also, make sure you are actually loading the audio samples into your real vector correctly (sign, number of bits and endianess).

OpenCV Image Manipulation

I am trying to find out the difference in 2 images.
Scenario: Suppose that i have 2 images, one of a background and the other of a person in front of the background, I want to subtract the two images in such a way that I get the position of the person, that is the program can detect where the person was standing and give the subtracted image as the output.
The code that I have managed to come up with is taking two images from the camera and re-sizing them and is converting both the images to gray scale. I wanted to know what to do after this. I checked the subtract function provided by OpenCV but it takes arrays as inputs so I don't know how to progress.
The code that I have written is:
cap>>frame; //gets the first image
cv::cvtColor(frame,frame,CV_RGB2GRAY); //converts it to gray scale
cv::resize(frame,frame,Size(30,30)); //re-sizes it
cap>>frame2;//gets the second image
cv::cvtColor(frame2,frame2,CV_RGB2GRAY); //converts it to gray scale
cv::resize(frame2,frame2,Size(30,30)); //re-sizes it
Now do I simply use the subtract function like:
cv::subtract(frame_gray,frame,frame);
or do I apply some filters first and then use the subtract function?
As others have noticed, it's a tricky problem: easy to come up with a hack that will work sometimes, hard to come up with a solution that will work most of the time with minimal human intervention. Also, much easier to do if you can control tightly the material and illumination of the background. The professional applications are variously known as "chromakeying" (esp. in the TV industry), "bluescreening", "matting" or "traveling matte" (in cinematography), "background removal" in computer vision.
The groundbreaking work for matting quasi-uniform backdrops was done by Petro Vlahos many years ago. The patents on its basic algorithms have already expired, so you can go to town with them (and find open source implementations of various quality). Needless to say, IANAL, so do your homework on the patent subject.
Matting out more complex backgrounds is still an active research area, especially for the case when no 3D information is available. You may want to look into a few research papers that have come out of MS Research in the semi-recent past (A. Criminisi did some work in that area).
Using the subtract would not be appropriate because, it might result in some values becoming negative and will work only if you are trying to see if there is a difference or not( a boolean true/false).
If you need to get the pixels where it is differing, you should do a pixel by pixel comparison - something like:
int rows = frame.rows;
int cols = frame.cols;
cv::Mat diffImage = cv::Mat::zeros(rows, cols, CV_8UC1);
for(int i = 0; i < rows; ++i)
{
for(int j = 0; j < cols; ++j)
{
if(frame.at<uchar>(i,j) != frame2.at<uchar>(i,j))
diffImage.at<uchar>(i, j) = 255;
}
}
now, you can either show or save diffImage. All pixels that differ will be white while the similar ones will be in black

Draw sound wave with possibility to zoom in/out

I'm writing a sound editor for my graduation. I'm using BASS to extract samples from MP3, WAV, OGG etc files and add DSP effects like echo, flanger etc. Simply speaching I made my framework that apply an effect from position1 to position2, cut/paste management.
Now my problem is that I want to create a control similar with this one from Cool Edit Pro that draw a wave form representation of the song and have the ability to zoom in/out select portions of the wave form etc. After a selection i can do something like:
TInterval EditZone = WaveForm->GetSelection();
where TInterval have this form:
struct TInterval
{
long Start;
long End;
}
I'm a beginner when it comes to sophisticated drawing so any hint on how to create a wave form representation of a song, using sample data returned by BASS, with ability to zoom in/out would be appreciated.
I'm writing my project in C++ but I can understand C#, Delphi code so if you want you can post snippets in last two languages as well :)
Thanx DrOptix
By Zoom, I presume you mean horizontal zoom rather than vertical. The way audio editors do this is to scan the wavform breaking it up into time windows where each pixel in X represents some number of samples. It can be a fractional number, but you can get away with dis-allowing fractional zoom ratios without annoying the user too much. Once you zoom out a bit the max value is always a positive integer and the min value is always a negative integer.
for each pixel on the screen, you need to have to know the minimum sample value for that pixel and the maximum sample value. So you need a function that scans the waveform data in chunks and keeps track of the accumulated max and min for that chunk.
This is slow process, so professional audio editors keep a pre-calculated table of min and max values at some fixed zoom ratio. It might be at 512/1 or 1024/1. When you are drawing with a zoom ration of > 1024 samples/pixel, then you use the pre-calculated table. if you are below that ratio you get the data directly from the file. If you don't do this you will find that you drawing code gets to be too slow when you zoom out.
Its worthwhile to write code that handles all of the channels of the file in an single pass when doing this scanning, slowness here will make your whole program feel sluggish, it's the disk IO that matters here, the CPU has no trouble keeping up, so straightforward C++ code is fine for building the min/max tables, but you don't want to go through the file more than once and you want to do it sequentially.
Once you have the min/max tables, keep them around. You want to go back to the disk as little as possible and many of the reasons for wanting to repaint your window will not require you to rescan your min/max tables. The memory cost of holding on to them is not that high compared to the disk io cost of building them in the first place.
Then you draw the waveform by drawing a series of 1 pixel wide vertical lines between the max value and the min value for the time represented by that pixel. This should be quite fast if you are drawing from pre built min/max tables.
I've recently done this myself. As Marius suggests you need to work out how many samples are at each column of pixels. You then work out the minimum and maximum and then plot a vertical line from the maximum to the minimum.
As a first pass this seemingly works fine. The problem you'll get is that as you zoom out it will start to take too long to retrieve the samples from disk. As a solution to this I built a "peak" file alongside the audio file. The peak file stores the minimum/maximum pairs for groups of n samples. PLaying with n till you get the right amount is up to uyou. Personally I found 128 samples to be a good tradeoff between size and speed. Its also worth remembering that, unless you are drawing a control larger than 65536 pixels in size that you needn't store this peak information as anything more than 16-bit values which saves a bit of space.
Wouldn't you just plot the sample points on a 2 canvas? You should know how many samples there are per second for a file (read it from the header), and then plot the value on the y axis. Since you want to be able to zoom in and out, you need to control the number of samples per pixel (the zoom level). Next you take the average of those sample points per pixel (for example take the average of every 5 points if you have 5 samples per pixel. Then you can use a 2d drawing api to draw lines between the points.
Using the open source NAudio Package -
public class WavReader2
{
private readonly WaveFileReader _objStream;
public WavReader2(String sPath)
{
_objStream = new WaveFileReader(sPath);
}
public List<SampleRangeValue> GetPixelGraph(int iSamplesPerPixel)
{
List<SampleRangeValue> colOutputValues = new List<SampleRangeValue>();
if (_objStream != null)
{
_objStream.Position = 0;
int iBytesPerSample = (_objStream.WaveFormat.BitsPerSample / 8) * _objStream.WaveFormat.Channels;
int iNumPixels = (int)Math.Ceiling(_objStream.SampleCount/(double)iSamplesPerPixel);
byte[] aryWaveData = new byte[iSamplesPerPixel * iBytesPerSample];
_objStream.Position = 0; // startPosition + (e.ClipRectangle.Left * iBytesPerSample * iSamplesPerPixel);
for (float iPixelNum = 0; iPixelNum < iNumPixels; iPixelNum += 1)
{
short iCurrentLowValue = 0;
short iCurrentHighValue = 0;
int iBytesRead = _objStream.Read(aryWaveData, 0, iSamplesPerPixel * iBytesPerSample);
if (iBytesRead == 0)
break;
List<short> colValues = new List<short>();
for (int n = 0; n < iBytesRead; n += 2)
{
short iSampleValue = BitConverter.ToInt16(aryWaveData, n);
colValues.Add(iSampleValue);
}
float fLowPercent = (float)((float)colValues.Min() /ushort.MaxValue);
float fHighPercent = (float)((float)colValues.Max() / ushort.MaxValue);
colOutputValues.Add(new SampleRangeValue(fHighPercent, fLowPercent));
}
}
return colOutputValues;
}
}
public struct SampleRangeValue
{
public float HighPercent;
public float LowPercent;
public SampleRangeValue(float fHigh, float fLow)
{
HighPercent = fHigh;
LowPercent = fLow;
}
}