Extract and analyse sound from mp3 files

Extract and analyse sound from mp3 files - mp3

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3 because if would mean my test for these silences would be very slow. Unless I can decode the last 5 minutes of the file?
Thanks
Andrew

I'm not aware of a library that will detect silence directly in the MP3 encoded data, since its not a trivial task to detect silence without first decompressing. Luckily, its easy to find libraries that decode MP3 files and access them as PCM data, and its trivial to detect silence in PCM Data. Here is one such Library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In the most basic form, the algorithm you need to detect silence is simply to analyze a small chunks (could be as little as .25s or as much as several seconds), and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to be below that threshold to be considered silence (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be ok. There are improvements to the basic approach such as using historesis (which is basically using two thresholds, one for the transition to silence, and one for the transition from silence), and filtering.
Unfortunately, I don't know a library for C++ or C# that implements level detection off hand, and nothing immediately springs up on google, but at least for the simple version its pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#

Related

Partial decoding h264 stream

I'm trying to get information about frames in h264 bitstream. Especially motion vectors of macroblocks. I think, I have to use ffmpeg code for it, but it's really huge and hard to understand.
So, can someone give me some tips or exapmles of partial decoding from raw data of single frame from h264 stream?
Thank you.

Unfortunately, to get that level of information from the bitstream you have to decode every macroblock, there's no quick option, like there would be for getting information from the slice header.
One option is to use the h.264 reference software and turn on the verbose debug output and/or add your own printf's where needed, but this is also a large code base to navigate:
http://iphome.hhi.de/suehring/tml/
(You can also use ffmpeg and add output where needed too as you said, but it would take some understanding of that code base too)
There are graphical tools for analyzing video bitstreams which will show you this type of information on a per-macroblock basis, many are expensive, but sometimes there are free trial versions available.

How to add sound effects to PCM buffered audio in C++

I have an int16_t[] buffer with PCM raw audio data and I want to apply some effects (like echo, reverb, gain...) into it.
I thought that SoX or similar can do the trick for me, but SoX only works with files and other similar libraries that supports adding sound effects seems to add the effects only when the sound is played. So my problem with this is that I want to apply the effect to the samples into my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain multiplying each sample value, for example. But I'm looking for any library or relatively easy algorithms that I can use directly in my buffer to get the sound effects applied.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.

For everyone like me, interested in learning DSP related to audio processing with C++ I want to share my little research results and opinion, and perhaps save you some time :)
After trying several DSP libraries, finally I have found The Synthesis ToolKit in C++ (STK), an open-source library that offer easy and clear interfaces and easy to understand code that you can dive in to learn about various basic DSP algorithms.
So, I recommend to anyone who is starting out and have no previous experience to take a look at this library.

Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range (-32767,32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.

You can do it either with hacking the audio buffer and trying to do some effects like gain and threshold with simple math operations or do it correct using proper DSP algorithms. If you wish to do it correct, I would recommend using the Speex Library. It's open source and and well tested. www (dot)speex (dot)org. The code should compile on MSVC or linux with minimal effort. This is the fastest way to get a good audio code working with proper DSP techniques. Your code would look like .. please read the AEC example.
st = speex_echo_state_init(NN, TAIL);
den = speex_preprocess_state_init(NN, sampleRate);
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to setup the states, the code testecho includes these.

Sound processing in C++ on Windows - a nudge in the right direction

I want to write a simple sound editor with a very specific purpose: cutting and re-gluing an audio file (which will contain spoken prose) in such a way that each sentence is repeated N times. (This is for foreign language learning.)
I don't want to use an existing sound editor because I would like to tailor the GUI specifically for this narrow task, reducing the amount of motions and clicks to a minimum.
Unfortunately I don't have any experience whatsoever in working with sound. I was wondering about recommendations for C++ libraries/APIs on Windows that would enable me to:
read in an audio file (mp3 or wav)
select a portion from "here" to "here"
listen to it
append it to a new file
write the whole thing out as mp3 (or at least wav)
Also any general thoughts are very welcome (this is completely unknown territory for me so if you have had any stumbling blocks and mistakes you don't want others to repeat, please do share).

I have previously been quite happy about http://www.portaudio.com/, which is a nice platform independent wrapper to the sound hardware (low latency recording and playback).
For reading / writing mp3s I have used LAME http://lame.sourceforge.net which is also supported support on pretty much all popular platforms.
You might also want to check out the source code of Audacity http://audacity.sourceforge.net/, which does what you want and a lot more.

Can you encode an mp3 file with multiple bitrates?

Is it possible to encode an mp3 file using multiple bit rates?
e.g., 0-2min 64kbps, 2-4min 128kbps, and 4-10min 64kbps (the middle section needs higher sound quality)
Or am I stuck having to encode it all at the highest?

Yes. See the following:
Variable Bitrate # Wikipedia
You will either need an encoder that supports it, or, if you are emitting frames on your own- you can vary the rate per segment as you wish.
edit:
Also, you may have better luck looking for resources using the VBR (variable-bit-rate) keyword.
edit (caveat):
You should note that there are potentially two different concepts in conflict here, as mentioned by sellibitze.
A higher bitrate allows the capability of storing more audio detail, but doesn't do anything for the fidelity of your recording. If your recording was already of low quality, higher bitrates will only help preserve the level of fidelity available in your audio sample.

Does the middle section need to be higher quality or just higher bitrate to maintain constant quality. If it's the latter you get it with a decent encoder in VBR mode (variable bitrate). If you want the quality of the middle section ("region of interest") to be higher I don't think it's that easy. In theory you can encode the track twice and mix & match afterwards. But mixing frames is not that easy due to the bitreservoir.

I think you are looking for the term variable bit rate, more info here.

WAV compression help

How do you programmatically compress a WAV file to another format (PCM, 11,025 KHz sampling rate, etc.)?

I'd look into audacity... I'm pretty sure they don't have a command line utility that can do it, but they may have a library...
Update:
It looks like they use libsndfile, which is released under the LGPL. I for one, would probably just try using that.

Use sox (Sound eXchange : universal sound sample translator) in Linux:
SoX is a command line program that can convert most popular audio files to most other popular audio file formats. It can optionally
change the audio sample data type and apply one or more sound effects to the file during this translation.

If you mean how do you compress the PCM data to a different audio format then there are a variety of libraries you can use to do this, depending on the platform(s) that you want to support. If you just want to change the sample rate of the PCM data then you need a sample rate conversion algorithm instead, which is a completely different problem. Can you be more specific in your requirements?

You're asking about resampling, and more specifically downsampling, not compression. While both processes are lossy (meaning that you will suffer loss of information), downsampling works on raw samples instead of in the frequency domain.
If you are interested in doing compression, then you should look into lame or OGG vorbis libraries; you are no doubt familiar with MP3 and OGG technology, though I have a feeling from your question that you are interested in getting back a PCM file with a lower sampling rate.
In that case, you need a resampling library, of which there are a few possibilites. The most widely known is libsamplerate, which I honestly would not recommend due to quality issues not only within the generated audio files, but also of the stability of the code used in the library itself. The other non-commercial possibility is sox, as a few others have mentioned. Depending on the nature of your program, you can either exec sox as a separate process, or you can call it from your own code by using it as a library. I personally have not tried this approach, but I'm working on a product now where we use sox (for upsampling, actually), and we're quite happy with the results.
The other option is to write your own sample rate conversion library, which can be a significant undertaking, but, if you only are interested in converting with an integer factor (ie, from 44.1kHz to 22kHz, or from 44.1kHz to 11kHz), then it is actually very easy, since you only need to strip out every Nth sample.

In Windows, you can make use of the Audio Compression Manager to convert between files (the acm... functions). You will also need a working knowledge of the WAVEFORMAT structure, and WAV file formats. Unfortunately, to write all this yourself will take some time, which is why it may be a good idea to investigate some of the open source options suggested by others.
I have written a my own open source .NET audio library called NAudio that can convert WAV files from one format to another, making use of the ACM codecs that are installed on your machine. I know you have tagged this question with C++, but if .NET is acceptable then this may save you some time. Have a look at the NAudioDemo project for an example of converting files.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js