RMS in Frequency Domain - C++

I am trying to create a spectral analyzer plugin in C++. After the FFT, I would like to somehow average each bin using RMS, because I want the frequency plot to update at a slower rate for better viewing. How can I achieve this? To be a little more specific, I have an FFT of size 4096 at a sampling frequency of 44,100 Hz, and I'm updating the display every 40 ms. Each FFT frame displays too fast for the human eye. How can I smooth this out with some type of averaging?
Thanks,
Isaiah Thompson

Updating your display every 40 ms is of course pointless. You have 44,100 samples per second and 4096 samples per FFT, so about 11 FFTs per second. That's one every ~90 ms, not 40 ms.
Furthermore, the common way to display this is as a spectrogram. Don't use a 4096-bin FFT; that's overkill anyway. Instead, use a 1024-point FFT. You'll now get about 43 FFTs per second. Color-code each bin, and plot each FFT on a vertical line. The horizontal axis is the time axis. You can now show half a minute of FFTs on a single screen, and it will horizontally scroll at about 43 pixels/second. This is slow enough for the eye to track.
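As for the per-bin averaging the question asks about, one common approach is to keep an exponential moving average of each bin's power and display its square root, i.e. an RMS over the recent past. This is only a hedged sketch; the class name and the smoothing factor are illustrative choices, not anything from the answer above:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Per-bin RMS-style smoothing: keep a running exponential average of each
// bin's power (magnitude squared), and display the square root of it.
class SpectrumSmoother {
public:
    // alpha in (0, 1]: smaller = slower, smoother display response.
    SpectrumSmoother(std::size_t numBins, float alpha)
        : avgPower_(numBins, 0.0f), alpha_(alpha) {}

    // Call once per FFT frame with the per-bin magnitudes.
    void push(const std::vector<float>& magnitudes) {
        for (std::size_t k = 0; k < avgPower_.size(); ++k) {
            const float p = magnitudes[k] * magnitudes[k];
            avgPower_[k] += alpha_ * (p - avgPower_[k]);
        }
    }

    // RMS value of bin k over the recent past.
    float rms(std::size_t k) const { return std::sqrt(avgPower_[k]); }

private:
    std::vector<float> avgPower_;
    float alpha_;
};
```

With roughly 11 FFT frames per second, an alpha around 0.2-0.3 gives a display response time of a few hundred milliseconds.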

Related

How would you measure the amount of bass in one audio sample?

I am a beginner in audio programming and was wondering how you would measure the amount of bass in just one single audio sample. I was thinking it would be measured in dB maybe, but I don't know if there is a unit that is actually for measuring bass.
I have no code to show for measuring the bass, since I have no idea where to start. I've already got everything up to the point of having all the samples of my audio file stored as a float array using the JUCE library; now it's just a matter of going through the samples and measuring the bass.
Any help please?
I am assuming by one audio sample you mean an array of floats, and not just one element of that array.
If you Google the word "bass", the very first result tells you:
Bass (also called bottom end) describes tones of low (also called "deep") frequency, pitch and range from 16 to 256 Hz.
Yes, Bass is just the audio in that range.
Now, with that I think you should be able to figure out how to find frequencies in audio samples, and if not, then this is the best I can do...
With that, you can find the amount of bass: the frequencies in the said range. :)
There's just one solution here, and it's not what you think. You need to transform your signal in the time domain to a signal in the frequency domain. Bass is the lower part of the frequency domain.
The first thing you need then is the FFT. This takes a number of samples as input; a typical value would be 2048 samples. If your input is a 48 kHz signal, this gives you bins spaced 48000/2048 ≈ 23.4 Hz apart. The lowest 10 bins or so (up to about 256 Hz) contain the bass part of your signal. (Bin 0 also contains any DC offset, which might be problematic.)
You then need to convert these bins into energy: square the magnitude of each bin and sum them.
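A minimal sketch of that bass-energy computation, assuming a 48 kHz mono block and the 16-256 Hz band quoted above. The function name is mine, and the direct DFT of just the low bins is used only so the example is self-contained; a real FFT library would be the normal choice:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Energy in the bass band of one block of samples, computed with a direct
// DFT of just the low bins (cheap, since only a handful of bins are needed).
double bassEnergy(const std::vector<float>& block, double sampleRate) {
    const double kPi = 3.14159265358979323846;
    const std::size_t n = block.size();
    const double binHz = sampleRate / static_cast<double>(n);
    // Bins covering roughly 16 Hz to 256 Hz; bin 0 (DC) is skipped.
    const std::size_t kLo = static_cast<std::size_t>(std::ceil(16.0 / binHz));
    const std::size_t kHi = static_cast<std::size_t>(std::floor(256.0 / binHz));

    double energy = 0.0;
    for (std::size_t k = kLo; k <= kHi && k < n / 2; ++k) {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double phase = 2.0 * kPi * k * i / n;
            re += block[i] * std::cos(phase);
            im -= block[i] * std::sin(phase);
        }
        energy += re * re + im * im;  // |X[k]|^2 for this bin
    }
    return energy;
}
```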

OpenCV: Detecting seizure-inducing lights in a video?

I have been working on an algorithm which can detect seizure-inducing strobe lights in a video.
Currently, my code returns virtually every frame as capable of causing a seizure (3Hz flashes).
My code calculates the relative luminance of each pixel and sees how many times the luminance goes up then down, etc. or down then up, etc. by more than 10% within any given second.
Is there any way to do this without comparing each individual pixel within a second of each other, and that only returns the correct frames?
An example of what I am trying to emulate: https://trace.umd.edu/peat
The common approach to solving this type of problem is to convert the frames to grayscale and then construct a cube containing frames from a 1 to 3 second time interval. From this cube, you can extract the time-varying characteristics of either individual pixels (noisy) or blocks (recommended). The resulting 1D curves can first be inspected manually to see if they actually show the 3 Hz variation that you are looking for (sometimes these variations are lost or distorted by the camera's auto exposure settings). If you can see it, then you should be able to use the FFT to isolate and detect it automatically.
Convert the image to grayscale. Break the image up into blocks, maybe 16x16 or 64x64 or larger (experiment to see what works). Take the average luminance of each block over a minimum of 2/3 seconds. Create a wave of luminance over time. Do an FFT on this wave and look for a minimum energy threshold around 3 Hz.
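The final step in both answers, measuring the energy near 3 Hz in a block's mean-luminance curve, can be sketched with the Goertzel algorithm, which evaluates a single DFT bin. The function name, frame rate, and normalization below are illustrative assumptions, not part of either answer:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Goertzel algorithm: squared magnitude of a single DFT bin, used here to
// measure how much ~3 Hz flicker a block's mean-luminance curve contains.
// luminance: one value per video frame; fps: frames per second.
double energyAtHz(const std::vector<double>& luminance, double fps, double hz) {
    const std::size_t n = luminance.size();
    const double kPi = 3.14159265358979323846;
    const double w = 2.0 * kPi * hz / fps;
    const double coeff = 2.0 * std::cos(w);
    double s1 = 0.0, s2 = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double s0 = luminance[i] + coeff * s1 - s2;
        s2 = s1;
        s1 = s0;
    }
    // Squared magnitude of the bin, normalized by window length.
    return (s1 * s1 + s2 * s2 - coeff * s1 * s2) / n;
}
```

A block whose 3 Hz energy exceeds some tuned threshold (relative to its energy at other frequencies) would then be flagged as a potential strobe region.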

Frequency & amplitude

I have a data (file) which contains 2 columns:
Seconds, Volts
0, -0.4238353
2.476346E-08, -0.001119718
4.952693E-08, -0.006520569
(..., thousands of similar entries in file)
4.516856E-05, -0.0002089292
How do I calculate the frequency of the highest-amplitude wave? (Each wave is of fixed frequency.)
Is there any difference between calculating frequency from seconds and amplitude versus seconds and volts? In Frequency & amplitude there is a seconds-and-amplitude example solved, so it might help in my case.
Your data is in the time domain; the question is about the frequency domain. Your course should have told you how the two are related. In two words: Fourier Transform. In practical programming, we use the FFT: Fast Fourier Transform. If the input is a fixed-frequency sine wave, your FFT output will have one hump. Model that as a parabola and find the peak of the parabola. (Just taking the highest-amplitude FFT bin is about 10 times less accurate.)
The link you give is horrible; I've downvoted the nonsense answer there. In your example, time starts at t=0 and the solution given would do a 1/0.
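The parabola idea can be sketched as follows. The function name is mine, and the naive DFT is used only to keep the sketch self-contained; for real data sizes you would use an FFT library such as FFTW:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Estimate the dominant frequency of a sampled signal: take the magnitude
// spectrum, find the largest bin, then refine with a parabolic fit through
// the peak and its two neighbours.
double dominantFrequency(const std::vector<double>& x, double sampleRate) {
    const std::size_t n = x.size();
    const double kPi = 3.14159265358979323846;

    std::vector<double> mag(n / 2, 0.0);
    for (std::size_t k = 1; k < n / 2; ++k) {  // skip bin 0 (DC)
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const double phase = 2.0 * kPi * k * i / n;
            re += x[i] * std::cos(phase);
            im -= x[i] * std::sin(phase);
        }
        mag[k] = std::sqrt(re * re + im * im);
    }

    std::size_t peak = 1;
    for (std::size_t k = 2; k < mag.size(); ++k)
        if (mag[k] > mag[peak]) peak = k;

    // Parabolic interpolation around the peak bin: the fractional offset of
    // the parabola's vertex from the peak bin is 0.5*(a-c)/(a-2b+c).
    double delta = 0.0;
    if (peak > 1 && peak + 1 < mag.size()) {
        const double a = mag[peak - 1], b = mag[peak], c = mag[peak + 1];
        const double denom = a - 2.0 * b + c;
        if (denom != 0.0) delta = 0.5 * (a - c) / denom;
    }
    return (static_cast<double>(peak) + delta) * sampleRate /
           static_cast<double>(n);
}
```

Note that your sample rate comes from the time column: with samples 2.476346e-8 s apart, it is 1/2.476346e-8 ≈ 40.4 MHz.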

Generating a square wave for DFT

I'm working on an assignment to perform a 200 point DFT at a sampling frequency of 20kHz on a square wave of frequency 500Hz whose amplitude alternates between 0 and 20.
I'm using C++ and I have figured out how to code the DFT equation; my problem is that I'm having trouble representing the square wave in code using a for loop.
What I'm really still confused about is how many cycles of this square wave will be in my 200 point sample.
Thanks
The period of the square wave is 20000/500=40 points, so you'll have exactly 5 periods of the square wave in your 200-point sample (200/40=5).
One cycle of your square wave will take 1/500 seconds. Each sample will be 1/20000 seconds. A simple division should tell you how many samples each square wave will be.
Another division will tell you how many of those waves will fit in your 200 point window.
If your sampling frequency is 20,000 Hz and your square wave has a frequency of 500 Hz, you will have 500 cycles of the wave per second, i.e. 500 cycles in every 20,000 samples. Each wave cycle therefore requires 40 samples (or points), so your 200 points contain 5 square wave cycles within your DFT.
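Based on those numbers, the loop the question is stuck on can be sketched like this; the function name and the high-first phase are my own choices:

```cpp
#include <cstddef>
#include <vector>

// 200-point square wave for the DFT: 500 Hz at 20 kHz sampling means a
// 40-sample period; the wave is 20 for the first half of each period and
// 0 for the second half (swap the two if you want the opposite phase).
std::vector<double> makeSquareWave() {
    const std::size_t numPoints = 200;
    const std::size_t samplesPerPeriod = 20000 / 500;  // = 40
    std::vector<double> wave(numPoints);
    for (std::size_t i = 0; i < numPoints; ++i)
        wave[i] = (i % samplesPerPeriod < samplesPerPeriod / 2) ? 20.0 : 0.0;
    return wave;
}
```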
You can make sure you do your calculation right by including the units in your calculation. So the period has the dimension time, Hertz has the dimension of 1.0/time and samples is dimensionless.
Programmatically, you can do this with Boost.Units. It will check your units at compile time and give you an error if you make a mistake.
It will also stop your user from entering the wrong units into your code, for example by entering 20 instead of 20000 for the frequency (thinking you were measuring in kHz).
Your interface will then be something like
using namespace boost::units;
void set_period(quantity<si::time> period);
The user will have to enter the time in seconds,
set_period(5 * si::seconds);

Parameters to improve a music frequency analyzer

I'm using an FFT on audio data to output an analyzer, like you'd see in Winamp or Windows Media Player. However, the output doesn't look that great. I'm plotting using a logarithmic scale, and I average the linear results from the FFT into the corresponding logarithmic bins. As an example, I'm using bins like:
16k,8k,4k,2k,1k,500,250,125,62,31,15 [hz]
Then I plot the magnitude (dB) against frequency [hz]. The graph definitely 'reacts' to the music, and I can see the response of a drum sample or a high pitched voice. But the graph is very 'saturated' close to the lower frequencies, and overall doesn't look much like what you see in applications, which tend to be more evenly distributed. I feel that apps that display visual output tend to do different things to the data to make it look better.
What things could I do to the data to make it look more like the typical music player app?
Some useful information:
I downsample to a single channel at 32 kHz and specify a time window of 35 ms. That means the FFT gets ~1100 points. I vary these values to experiment (i.e., tried 16 kHz, and increasing/decreasing the interval length) but I get similar results.
With an FFT of 1100 points, you probably aren't able to capture the low frequencies with a lot of frequency resolution.
Think about it, 30 Hz corresponds to a period of 33ms, which at 32kHz is roughly 1000 samples. So you'll only be able to capture about 1 period in this time.
Thus, you'll need a longer FFT window to capture those low frequencies with sharp frequency resolution.
You'll likely need a time window of 4000 samples or more to start getting noticeably more frequency resolution at the low frequencies. This will be fine too, since you'll still get about 8-10 spectrum updates per second.
One option too, if you want very fast updates for the high frequency bins but good frequency resolution at the low frequencies, is to update the high frequency bins more quickly (such as with the windows you're currently using) but compute the low frequency bins less often (and with larger windows necessary for the good freq. resolution.)
I think a lot of these applications have variable FFT bins.
What you could do is start with very wide, evenly spaced FFT bins like you have and then keep track of the number of elements that are placed in each FFT bin. If some of the bins are not used significantly at all (usually the higher frequencies), then widen those bins so that they are larger (and thus have more frequency entries) and shrink the low-frequency bins.
I have worked on projects where we just spent a lot of time tuning bins for specific input sources, but it is much nicer to have the software adjust in real time.
A typical visualizer would use constant-Q bandpass filters, not a single FFT.
You could emulate a set of constant-Q bandpass filters by multiplying the FFT results by a set of constant-Q filter responses in the frequency domain, then sum. For low frequencies, you should use an FFT longer than the significant impulse response of the lowest frequency filter. For high frequencies, you can use shorter FFTs for better responsiveness. You can slide any length FFTs along at any desired update rate by overlapping (re-using) data, or you might consider interpolation. You might also want to pre-window each FFT to reduce "spectral leakage" between frequency bands.
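As a simpler starting point than a full constant-Q filter bank, here is a hedged sketch of grouping linear FFT magnitudes into octave-spaced bands (matching the bins listed in the question) and converting the average power in each band to dB. The band edges and the -100 dB floor are illustrative choices:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Group linear FFT magnitudes into octave-spaced display bands and convert
// to dB. mag holds the n/2 magnitudes of an n-point FFT.
std::vector<double> toDisplayBands(const std::vector<double>& mag,
                                   double sampleRate) {
    const double edges[] = {31, 62, 125, 250, 500, 1000,
                            2000, 4000, 8000, 16000, 32000};
    const double binHz = sampleRate / (2.0 * static_cast<double>(mag.size()));
    std::vector<double> bands;
    for (std::size_t b = 0; b + 1 < sizeof(edges) / sizeof(edges[0]); ++b) {
        double sum = 0.0;
        std::size_t count = 0;
        for (std::size_t k = 0; k < mag.size(); ++k) {
            const double f = k * binHz;
            if (f >= edges[b] && f < edges[b + 1]) {
                sum += mag[k] * mag[k];  // average power, not magnitude
                ++count;
            }
        }
        const double power = count ? sum / count : 0.0;
        const double db = 10.0 * std::log10(power + 1e-10);
        bands.push_back(db < -100.0 ? -100.0 : db);
    }
    return bands;
}
```

Averaging power (rather than summing magnitudes) per band keeps wide high-frequency bands from dominating, which helps with the saturated low end described in the question; per-band smoothing over time would then make the display less jumpy.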