MP3 bitrate detection through frequency spectrum analysis - mp3

Is there any program that detects the bitrate of an MP3?
I'm not talking about the effective bitrate that the file has been encoded with, but the real bitrate that can be calculated only by frequency spectrum analysis.
For example, if I have an MP3 encoded at 128 kbps whose size is 1 MB, and then transcode it to 320 kbps so that its size becomes 3 MB, I will have the identical audio track, but with different sizes.
If I have a 320 kbps MP3 and transcode it to 128 kbps, I will lose some quality and, with it, some file size.
But still, I have no way to verify that 320 kbps is my MP3's "real" bitrate.
Details are explained in this paper:
http://www.fileden.com/files/2009/2/14/2321055/My%20Documents/MP3%20Bit%20Rate%20Quality%20Detection%20through%20Frequency.pdf

Firstly, https://www.google.com/?q=mp3+cutoff+frequency can be quite enlightening.
Secondly, almost all MP3s are encoded using presets with fairly standard polyphase lowpass filters. Since lossless compression is impossible with MP3, what is lost is mainly the higher harmonics of the base frequencies (see FFT, DCT, wavelet transforms, etc.); the filter is applied so that the result of the later Fourier analysis of the spectrum agrees better with the human hearing range (i.e. inaudible/masked frequencies are excluded from the analysis entirely). It is essentially impossible to achieve high compression without cutting off or severely distorting the higher frequencies, since it is they that occupy most of the space in the bit stream.
Of course, without the cutoff the frequency domain limiting would be less accurate - but it would still occur. The cutoff is applied, amongst other reasons, so that the compression artifacts are generated outside of the psychoacoustic hearing range.
As a point of reference, do a spectrum analysis of the stream (a realtime SA in a Winamp clone will suffice if the higher frequency ranges are saturated enough; you can also simply make a spectrogram if you have the tools), and find the cutoff point. In the example I analyzed, the cutoff occurs at ~15 kHz, which tells me the stream was originally compressed at ~128 kbps; I'd even go so far as to say it's possible to distinguish <= 128 kbps streams by ear with many kinds of music (drum'n'bass and other electronic genres with lots of highs come to mind).
The most common cutoffs are: (note that they are "hard" in CBR and "soft" in ABR/VBR)
128 kbps : 15-16 kHz (very audible on rock/electronic music! "loss of space" effect)
192 kbps : ~19 kHz (barely audible in most cases, considered transparent by most)
256-320 kbps : > ~20 kHz (inaudible)
Yes, I'm aware that some people can hear above 20 kHz, but the masking effects present in music plus the actual response of the speakers mean that in real music a 20 kHz cutoff is irrelevant to sound quality.
Source: my own research as an audio engineer, plus
https://web.archive.org/web/20150313010213/http://www.whatinterviewprep.com/prepare-for-the-interview/spectral-analysis/ as an additional reference
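As a sketch of that workflow, assuming the MP3 has already been decoded to raw PCM samples, the cutoff can be located with a plain FFT (NumPy only; the synthetic lowpassed noise below is a stand-in for a decoded 128 kbps file):

```python
import numpy as np

def estimate_cutoff(samples, sample_rate, threshold_db=-60.0):
    """Highest frequency whose magnitude stays above `threshold_db`
    relative to the spectral peak - a crude cutoff detector."""
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    mag_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(mag_db > threshold_db)[0]
    return freqs[above[-1]] if len(above) else 0.0

# Demo: 1 s of white noise brick-wall lowpassed at 15 kHz, which is
# roughly what the spectrum of a 128 kbps encode looks like.
rate = 44100
noise = np.random.default_rng(0).standard_normal(rate)
spec = np.fft.rfft(noise)
spec[np.fft.rfftfreq(len(noise), 1.0 / rate) > 15000] = 0
filtered = np.fft.irfft(spec, n=len(noise))
print(estimate_cutoff(filtered, rate))   # close to 15000 Hz
```

A "soft" VBR cutoff will smear this transition, so in practice you would look at where the energy starts rolling off rather than a single hard threshold.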

The cutoff frequency and the bit rate are independent. Yes, the majority of people use presets, therefore there is a correlation between the two, but it's not deterministic.

The only thing you can easily determine from frequency spectrum analysis is the sampling frequency of the input MP3 file.
For example, if your MP3 is sampled at 44,100 Hz, you won't have any sound above 22,050 Hz (the Nyquist limit), and that will be clearly visible on the spectrum graph.
Since you are crossing into the transcendental domain here, try this:
encode mp3 to 128kbps
transcode it to say 320kbps
try RAR-ing or 7Z-ing the resulting file and the original file, and observe the compression ratios.
Their 'entropy', i.e. randomness, will differ, and maybe that number will tell you something about how much information is 'fabricated' during the bitrate expansion of the transcode.
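The experiment can be sketched with Python's stdlib LZMA (the algorithm behind 7z); the random-byte buffers below are only stand-ins for real MP3 files, modeling the redundancy a transcode introduces as literal repetition:

```python
import lzma, os

def compression_ratio(data: bytes) -> float:
    """Compressed size / original size; lower means more redundancy."""
    return len(lzma.compress(data)) / len(data)

# Stand-ins for the two files (assumption: no real MP3s at hand).
# A native high-bitrate stream looks close to random to a generic
# compressor, while a stream inflated to a higher bitrate carries
# redundant, predictable structure.
native = os.urandom(300_000)
transcoded = os.urandom(100_000) * 3   # same information, 3x the bytes

print(compression_ratio(native))       # close to 1.0 - barely compressible
print(compression_ratio(transcoded))   # noticeably lower
```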

You can open the file in Adobe Audition or Cool Edit and open the frequency-analysis window. If the frequency content extends to 20 kHz or more, the MP3 may be 320 kbps; if it is cut off below 20 kHz, it is not 320 kbps.

Related

Are there any constraints to encoding an audio signal?

I capture a PCM sound at some sampling rate, e.g. 24 kHz. I need to encode it using some codec (I use Opus for that) to send over the network. I noticed that at some sampling rates I use for encoding with Opus, I often hear some extra "cracking" noise at the receiving end. At other rates, it sounds OK. That might be an implementation bug, but I thought there might also be some constraints that I don't know about.
I also noticed that if I use another sampling rate while decoding the Opus-encoded audio stream, I get a lower or higher pitch of sound, which seems logical to me. So I've read that I need to resample on the other end if the receiving side doesn't support the original PCM sampling rate.
So I have 2 questions regarding all this:
Are there any constraints on the sampling rate (or other parameters) of audio encoding? (Like I have a 24 kHz PCM sound - maybe there are certain sample rates to use with it?)
Are there any common techniques to provide the same sound quality at both sides when sending audio stream over network?
The crackling noises are most likely a bug, since there are no limitations on the sample rate that would result in this kind of noise (there are other kinds of signal change that come with sample-rate conversion, especially when downsampling to a lower sample rate, but definitely not crackling).
A wild guess would be, that there is something wrong with the input buffer. Crackling often occurs if samples are omitted or duplicated, oftentimes the result of the boundaries of subsequent buffers not being correct.
Sending audio data over a network in realtime will require compression, no matter what; the required data rate is simply too high otherwise. There are codecs which provide lossless audio compression (e.g. FLAC), but their compression ratio is low compared to a lossy codec such as Opus.
The problem was solved by buffering packets at the receiving end and writing them to the soundcard buffer only once some amount had accumulated. The 'crackling' noise was then most likely due to the gaps between subsequent frames that were sent to the soundcard buffer.
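That fix can be sketched as a minimal jitter buffer (hypothetical class and threshold; real code would feed the soundcard from its playback callback):

```python
from collections import deque

class JitterBuffer:
    """Minimal receive-side buffer (a sketch, not production code):
    hold incoming frames until `prefill` of them have arrived, then
    release them in order so the soundcard never starves mid-stream."""
    def __init__(self, prefill=5):
        self.prefill = prefill
        self.frames = deque()
        self.playing = False

    def push(self, frame):
        self.frames.append(frame)
        if len(self.frames) >= self.prefill:
            self.playing = True

    def pop(self):
        """Return the next frame for the soundcard, or None to wait."""
        if self.playing and self.frames:
            return self.frames.popleft()
        return None

buf = JitterBuffer(prefill=3)
out = []
for i in range(6):              # frames arriving from the network
    buf.push(f"frame{i}")
    f = buf.pop()
    if f is not None:
        out.append(f)
print(out)   # playback starts only after 3 frames are buffered
```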

MP3-encoding: Does CBR 320 cost more CPU than a lower bitrate?

I have an Android-app which uses LAME to encode an audio live-stream to MP3.
Right now, I'm using a constant bitrate (CBR) of 128 for this.
Now I wonder: if I switch over to a bitrate of e.g. 320, will this cost more CPU / take longer?
This is a mandatory part of the app since, as mentioned, it's a live stream; therefore I can't risk higher CPU usage.
MP3 encoders/decoders usually need more processing power for higher bit rates. I could find two data points, an MP3 encoder datasheet and an MP3 decoder datasheet, to support this.
On a modern phone, the difference in CPU load should be insignificant, as MP3 encoding/decoding is not particularly CPU intensive to begin with.

Can the mp3 or wav file format take advantage of repetitious sounds?

I want to store a number of sound fragments as MP3 or WAV files, but these fragments are each highly repetitive (a 10 second burst of tone for example). Are the MP3 or WAV file formats able to take advantage of this - i.e. is there a sound file equivalent of run-length encoding?
No, neither codec can do this.
WAV files (typically) use PCM, which holds a value for every single sample. Even if there were complete digital silence (all values the same), every sample is stored.
MP3 works in frames of 1,152 samples. Each frame stands alone (well, there is the bit reservoir but for the purpose of encoding/decoding, this is just extra bandwidth made available). Even if there were a way to say do-this-n-times, it would be fixed within a frame. Now, if you are using MP3 with variable bit rate, I suspect that you will have great results with perfect sine waves since they have no harmonics. MP3 works by converting from the time domain to the frequency domain. That is, it samples the frequencies in each frame. If you only have one of those frequencies (or no sound at all), the VBR method would be efficient.
I should note that FLAC does use RLE when encoding silence. However, I don't think FLAC could be hacked to use RLE for 10 seconds of audio, since again there is a frame border. FLAC's RLE for silence is problematic for live internet radio stations that leave a few seconds' gap between songs. It's important for these stations to have a large buffer, since clients will often pause the stream if they don't receive enough data. (They do get caught up again as soon as that silent block is sent, once audio resumes.)
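To make the missing feature concrete, here is what sample-level run-length encoding would look like (a sketch only; as noted above, neither WAV/PCM nor MP3 actually does this):

```python
def rle_encode(samples):
    """Run-length encode a list of samples into [(value, count), ...]."""
    runs = []
    for s in samples:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)
        else:
            runs.append((s, 1))
    return runs

def rle_decode(runs):
    """Inverse of rle_encode."""
    return [v for v, n in runs for _ in range(n)]

# One second of digital silence at 8 kHz collapses to a single run,
# but a PCM WAV file would still store all 8000 samples.
silence = [0] * 8000
print(rle_encode(silence))   # [(0, 8000)]
```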

Playing video at frame rates that are not multiples of the refresh rate.

I'm working on an application to stream video to OpenGL textures. My first thought was to lock the rendering loop to 60hz, so to play a video at 30fps or 60fps I would update the texture on every other frame or every frame respectively. How do computers play videos at other frame rates when monitors are at 60hz, or for that matter if a monitor is at 75 hz how do they play 30fps video?
For most consumer devices, you get something like 3:2 pulldown, which basically copies the source video frames unevenly. Specifically, in a 24 Hz video being shown on a 60 Hz display, the frames are alternately doubled and tripled. For your use case (video in OpenGL textures), this is likely the best way to do it, as it avoids tearing.
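The doubling/tripling pattern can be sketched with a simple frame-repetition accumulator (illustrative code, not taken from any real player):

```python
def pulldown_schedule(src_fps, display_hz, n_frames):
    """Number of display refreshes each source frame is shown for,
    using floor-accumulator frame repetition; 3:2 pulldown falls out
    of this for 24 fps on a 60 Hz display."""
    counts, shown = [], 0
    for i in range(1, n_frames + 1):
        target = (i * display_hz) // src_fps   # refreshes due after frame i
        counts.append(target - shown)
        shown = target
    return counts

print(pulldown_schedule(24, 60, 8))   # [2, 3, 2, 3, 2, 3, 2, 3]
print(pulldown_schedule(25, 60, 10))  # the uneven cadence for 25 fps on 60 Hz
```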
If you have enough compute ability to run actual resampling algorithms, you can convert any frame rate to any other frame rate. Your choice of algorithm defines how smooth the conversion looks, and different algorithms will work best in different scenarios.
Too much smoothness may cause things like the 120 Hz "soap opera" effect [1][2]:
We have been trained by growing up watching movies at 24 FPS to expect movies to have a certain look and feel to them that is an artifact of that particular frame rate.
When these movies are [processed], the extra sharpness and clearness can make the movies look wrong to viewers, even though the video quality is actually closer to real.
This is commonly called the Soap Opera Effect, because some feel it makes these expensive movies look like cheap shot-on-video soap operas (because the videotape format historically used on soap operas worked at 30 FPS).
Essentially you're dealing with a resampling problem. Your original data was sampled at 30 Hz or 60 Hz, and you have to resample it to another sample rate. The very same algorithms apply; most of the time you'll find articles about audio signal resampling. Just think of each pixel's color channel as an individual waveform you want to resample.
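As a sketch of that idea, the crudest such resampler is linear interpolation applied to one waveform (a windowed-sinc filter would be the higher-quality choice, at more CPU cost):

```python
def resample_linear(signal, src_rate, dst_rate):
    """Resample a waveform by linear interpolation - the simplest
    possible algorithm; works equally for audio samples or for one
    pixel's brightness sampled once per video frame."""
    n_out = int(len(signal) * dst_rate / src_rate)
    out = []
    for j in range(n_out):
        pos = j * src_rate / dst_rate          # position in source samples
        i = int(pos)
        frac = pos - i
        a = signal[min(i, len(signal) - 1)]
        b = signal[min(i + 1, len(signal) - 1)]
        out.append(a + (b - a) * frac)
    return out

# One pixel's brightness over 4 frames at 30 Hz, resampled to 60 Hz:
print(resample_linear([0.0, 1.0, 0.0, 1.0], 30, 60))
```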

Visualizing volume of PCM samples

I have several chunks of PCM audio (G.711) in my C++ application. I would like to visualize the different audio volume in each of these chunks.
My first attempt was to calculate the average of the sample values for each chunk and use that as a volume indicator, but this doesn't work well. I do get 0 for chunks with silence and differing values for chunks with audio, but the values only differ slightly and don't seem to resemble the actual volume.
What would be a better algorithm to calculate the volume?
I hear G.711 audio is logarithmic PCM. How should I take that into account?
Note, I haven't worked with G.711 PCM audio myself, but I presume that you are performing the correct conversion from the encoded amplitude to an actual amplitude before processing the values.
You'd expect the average value of most samples to be approximately zero, since sound waveforms oscillate on either side of zero.
A crude volume calculation would be RMS (root mean square), i.e. taking a rolling average of the square of the samples and then the square root of that average. This gives a positive quantity whenever there is some sound; the quantity is related to the power represented in the waveform.
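A sketch of both steps, using the classic CCITT µ-law expansion followed by per-chunk RMS (the constant "loud" chunk is purely illustrative; real G.711 chunks would come off the wire):

```python
import math

def ulaw_to_linear(u):
    """Expand one 8-bit G.711 mu-law byte to a linear sample in the
    16-bit range (the classic CCITT reference expansion)."""
    u = ~u & 0xFF
    t = ((u & 0x0F) << 3) + 0x84     # mantissa plus bias
    t <<= (u & 0x70) >> 4            # exponent (segment) shift
    return 0x84 - t if u & 0x80 else t - 0x84

def rms(chunk):
    """Root mean square of a chunk of linear samples."""
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))

silence = bytes([0xFF]) * 160   # mu-law encodes zero as 0xFF
loud = bytes([0x80]) * 160      # maximum positive amplitude (+32124)
print(rms([ulaw_to_linear(b) for b in silence]))  # 0.0
print(rms([ulaw_to_linear(b) for b in loud]))     # 32124.0
```

Decoding to linear first matters: averaging the raw logarithmic bytes is exactly why the original attempt gave values that didn't track the real volume.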
For something better related to human perception of volume you may want to investigate the sort of techniques used in Replay Gain.
If you're feeling ambitious, you can download G.711 from the ITU web site and spend the next few weeks (or maybe more) implementing it.
If you're lazier (or more sensible) than that, you can download G.191 instead -- it includes source code to compress and decompress G.711 encoded data.
Once you've decoded it, visualizing the volume should be a whole lot easier.