Audio frequency of each frame of a audio file like .mp3 .wav - c++

Can I find a way to get frequency of each frame on a audio file like .mp3 or .wav or any other sound format using "fmod" or "cwave" libraries or even other libraries?
How can I find out this frequency in C/C++?

The FFTW library is a set of very fast implementations of different fourier transformations.

If you have a number of samples of digitized audio, you pretty much have, in total, as many frequencies and phases as you've got samples. Suppose you've got just two samples of audio. In order to faithfully represent them, you need one frequency and one phase -- so again, two values. There is no "single" frequency to represent multiple samples of digitized audio.
You can of course, akin to the question of "How can I get the color of a specific video frame?", ask what is the average frequency. Or you can ask what is the most prominent frequency (the one with highest amplitude). Or you can ask what is the frequency that with its harmonics carries the most energy in the signal (assuming the signal was physical, like electrical current sampled in time).
In all those cases, you'd probably want to use a premade library that internally uses the FFT or a similar discrete transform to get the signal from the time domain to a frequency or a similar domain (quefrency domain, for example, and it's not a typo). It's hard to get what you want from a plain FFT, you'd need some mathematical training to process raw FFT results into what you're after. I'm sure there are libraries for it, I just can't think of any right now. Perhaps someone who deals with such work can edit the answer.

Related

How can I get the frequency value at given time with XAudio2?

I've already loaded the .wav audio to the buffer with XAudio2 (Windows 8.1) and to play it I just have to use:
//start consuming audio in the source voice
/* IXAudio2SourceVoice* */ g_source->Start();
//play the sound
g_source->SubmitSourceBuffer(buffer.xaBuffer());
I wonder, how can I get the frequency value at given time with XAudio2?
The question does not make much sense, a .wav file contains a great many frequencies. It is the blend of them that makes it sound like music to your ears, instead of just an artificial generated tone. A blend that's constantly changing.
A signal processing step is required to convert the samples in the .wav file from the time domain to the frequency domain. Generally known as spectrum analysis, the Fast Fourier Transform (FFT) is the standard technique.
A random Google hit on "xaudio2 fft" produced this code sample. No idea how good it is, but something to play with to get the lay of the land. You'll find more about it in this gamedev question.

jpeg compression ratio

Is there a table that gives the compression ratio of a jpeg image at a given quality?
Something like the table given on the wiki page, except for more values.
A formula could also do the trick.
Bonus: Are the [compression ratio] values on the wiki page roughly true for all images? Does the ratio depend on what the image is and the size of the image?
Purpose of these questions: I am trying to determine the upper bound of the size of a compressed image for a given quality.
Note: I am not looking to make a table myself(I already have). I am looking for other data to check with my own.
I had exactly the same question and I was disappointed that no one created such table (studies based on a single classic Lena image or JPEG tombstone are looking ridiculous). That's why I made my own study. I cannot say that it is perfect, but it is definitely better than others.
I took 60 real life photos from different devices with different dimensions. I created a script which compress them with different JPEG quality values (it uses our company imaging library, but it is based on libjpeg, so it should be fine for other software as well) and saved results to CSV file. After some Excel magic, I came to the following values (note, I did not calculated anything for JPEG quality lower than 55 as they seem to be useless to me):
Q=55 43.27
Q=60 36.90
Q=65 34.24
Q=70 31.50
Q=75 26.00
Q=80 25.06
Q=85 19.08
Q=90 14.30
Q=95 9.88
Q=100 5.27
To tell the truth, the dispersion of the values is significant (e.g. for Q=55 min compression ratio is 22.91 while max value is 116.55) and the distribution is not normal. So it is not so easy to understand what value should be taken as typical for a specific JPEG quality. But I think these values are good as a rough estimate.
I wrote a blog post which explains how I received these numbers.
http://www.graphicsmill.com/blog/2014/11/06/Compression-ratio-for-different-JPEG-quality-values
Hopefully anyone will find it useful.
Browsing Wikipedia a little more led to http://en.wikipedia.org/wiki/Standard_test_image and Kodak's test suite. Although they're a little outdated and small, you could make your own table.
Alternately, pictures of stars and galaxies from NASA.gov should stress the compressor well, being large, almost exclusively composed of tiny speckled detail, and distributed in uncompressed format. In other words, HUBBLE GOTCHOO!
The compression you get will depend on what the image is of as well as the size. Obviously a larger image will produce a larger file even if it's of the same scene.
As an example, a random set of photos from my digital camera (a Canon EOS 450) range from 1.8MB to 3.6MB. Another set has even more variation - 1.5MB to 4.6MB.
If I understand correctly, one of the key mechanisms for attaining compression in JPEG is using frequency analysis on every 8x8 pixel block of the image and scaling the resulting amplitudes with a "quantization matrix" that varies with the specified compression quality.
The scaling of high frequency components often result in the block containing many zeros, which can be encoded at negligible cost.
From this we can deduce that in principle there is no relation between the quality and the final compression ratio that will be independent of the image. The number of frequency components that can be dropped from a block without perceptually altering its content significantly will necessarily depend on the intensity of those components, i.e. whether the block contains a sharp edge, highly variable content, noise, etc.

Real-time spectrum analyzer with API

I'm looking for a C or C++ API that will give me real-time spectrum analysis of a waveform on Windows.
I'm not entirely sure how large a sample window it should need to determine frequency content, but the smaller the better. For example, if it can work with a 0.5 second long sample and determine frequency content to the Hz, that would be wicked-awesome.
I used FFTW a few years ago. It is supposedly fast (though I didn't use it for anything real-time myself) and was certainly pretty easy to use, even on Windows.
Regarding the window size, see the Nyquist-Shannon sampling theorem.
(I imagine there are other issues involved when using a window on the data, particularly for low frequencies, but I'm no expert and I couldn't find any useful-looking info about this, so maybe I'm wrong.)
For details of how to generate a power spectrum and how to determine frequency resolution of same, please see my answer to this question: How to extract semi-precise frequencies from a WAV file using Fourier Transforms

Programmatically convert WAV

I'm writing a file compressor utility in C++ that I want support for PCM WAV files, however I want to keep it in PCM encoding and just convert it to a lower sample rate and change it from stereo to mono if applicable to yield a lower file size.
I understand the WAV file header, however I have no experience or knowledge of how the actual sound data works. So my question is, would it be relatively easy to programmatically manipulate the "data" sub-chunk in a WAV file to convert it to another sample rate and change the channel number, or would I be much better off using an existing library for it? If it is, then how would it be done? Thanks in advance.
PCM merely means that the value of the original signal is sampled at equidistant points in time.
For stereo, there are two sequences of these values. To convert them to mono, you merely take piecewise average of the two sequences.
Resampling the signal at lower sampling rate is a little bit more tricky -- you have to filter out high frequencies from the signal so as to prevent alias (spurious low-frequency signal) from being created.
I agree with avakar and nico, but I'd like to add a little more explanation. Lowering the sample rate of PCM audio is not trivial unless two things are true:
Your signal only contains significant frequencies lower than 1/2 the new sampling rate (Nyquist rate). In this case you do not need an anti-aliasing filter.
You are downsampling by an integer value. In this case, downampling by N just requires keeping every Nth sample and dropping the rest.
If these are true, you can just drop samples at a regular interval to downsample. However, they are both probably not true if you're dealing with anything other than a synthetic signal.
To address problem one, you will have to filter the audio samples with a low-pass filter to make sure the resulting signal only contains frequency content up to 1/2 the new sampling rate. If this is not done, high frequencies will not be accurately represented and will alias back into the frequencies that can be properly represented, causing major distortion. Check out the critical frequency section of this wikipedia article for an explanation of aliasing. Specifically, see figure 7 that shows 3 different signals that are indistinguishable by just the samples because the sampling rate is too low.
Addressing problem two can be done in multiple ways. Sometimes it is performed in two steps: an upsample followed by a downsample, therefore achieving rational change in the sampling rate. It may also be done using interpolation or other techniques. Basically the problem that must be solved is that the samples of the new signal do not line up in time with samples of the original signal.
As you can see, resampling audio can be quite involved, so I would take nico's advice and use an existing library. Getting the filter step right will require you to learn a lot about signal processing and frequency analysis. You won't have to be an expert, but it will take some time.
I don't think there's really the need of reinventing the wheel (unless you want to do it for your personal learning).
For instance you can try to use libsnd

Visualizing volume of PCM samples

I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the different audio volume in each of these chunks.
My first attempt was to calculate the average of the sample values for each chunk and use that as an a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values only differ slighly and don't seem to resemble the actual volume.
What would be a better algorithem calculate the volume ?
I hear G.711 audio is logarithmic PCM. How should I take that into account ?
Note, I haven't worked with G.711 PCM audio myself, but I presume that you are performing the correct conversion from the encoded amplitude to an actual amplitude before processing the values.
You'd expect the average value of most samples to be approximately zero as sound waveforms oscillate either side of zero.
A crude volume calculation would be rms (root mean square), i.e. taking a rolling average of the square of the samples and take the square root of that average. This will give you a postive quantity when there is some sound; the quantity is related to the power represented in the waveform.
For something better related to human perception of volume you may want to investigate the sort of techniques used in Replay Gain.
If you're feeling ambitious, you can download G.711 from the ITU-web site, and spend the next few weeks (or maybe more) implementing it.
If you're lazier (or more sensible) than that, you can download G.191 instead -- it includes source code to compress and decompress G.711 encoded data.
Once you've decoded it, visualizing the volume should be a whole lot easier.