Can you encode an mp3 file with multiple bitrates? - mp3

Is it possible to encode an mp3 file using multiple bit rates?
e.g., 0-2min 64kbps, 2-4min 128kbps, and 4-10min 64kbps (the middle section needs higher sound quality)
Or am I stuck having to encode it all at the highest?

Yes. See the following:
Variable Bitrate # Wikipedia
You will either need an encoder that supports it, or, if you are emitting frames on your own- you can vary the rate per segment as you wish.
edit:
Also, you may have better luck looking for resources using the VBR (variable-bit-rate) keyword.
edit (caveat):
You should note that there are potentially two different concepts in conflict here, as mentioned by sellibitze.
A higher bitrate allows the capability of storing more audio detail, but doesn't do anything for the fidelity of your recording. If your recording was already of low quality, higher bitrates will only help preserve the level of fidelity available in your audio sample.

Does the middle section need to be higher quality or just higher bitrate to maintain constant quality. If it's the latter you get it with a decent encoder in VBR mode (variable bitrate). If you want the quality of the middle section ("region of interest") to be higher I don't think it's that easy. In theory you can encode the track twice and mix & match afterwards. But mixing frames is not that easy due to the bitreservoir.

I think you are looking for the term variable bit rate, more info here.

Related

Best video format / codec to optimise 'seeking' with Videogular

I am using the Videogular2 library within my Ionic 3 application. A major feature of the application is the ability to seek to different places within a video.
I noticed that some formats have very quick seek response, while others take seconds to get there, even if the video is in the buffer already - I assume this may depend on the decoding process being used.
What would the best compromise be in order to speed up seek time while still keeping the file size reasonably small so that the video can be streamed from a server?
EDIT
Well, I learned that the fact that my video was recorded in the mov format caused the seek delays. Any transcoding applied to this didn't help because mov is lossy and the damage must have been done already. After screen-capturing the video and encoding it in regular mp4, the seeking happens almost instantaneously.
What would the best compromise be in order to speed up seek time while
still keeping the file size reasonably small so that the video can be
streamed from a server?
Decrease key-frame distance when encoding the video. This will allow for building a full frame quicker with less scanning, depending on codec.
This will increase the file size if using the same quality parameters, so the compromise for this is to reduce quality at the same time.
The actual effect depends on the codec itself, how it builds intermediate frames, and how it is supported/implemented in the browser. This together with the general load/caching-strategy (you can control some of the latter via media source extensions).

Streaming File Delta Encoding/Decoding

Here's the problem - I want to generate the delta of a binary file (> 1 MB in size) on a server and send the delta to a memory-constrained (low on RAM and no dynamic memory) embedded device over HTTP. Deltas are preferred (as opposed to sending the full binary file from the server) because of the high cost involved in transmitting data over the wire.
Trouble is, the embedded device cannot decode deltas and create the contents of the new file in memory. I have looked into various binary delta encoding/decoding algorithms like bsdiff, VCDiff etc. but was unable to find libraries that supported streaming.
Perhaps, rather than asking if there are suitable libraries out there, are there alternate approaches I can take that will still solve the original problem (send minimal data over the wire)? Although it would certainly help if there are suitable delta libraries out there that support streaming decode (written in C or C++ without using dynamic memory).
Maintain a copy on the server of the current file as held by the embedded device. When you want to send an update, XOR the new version of the file with the old version and compress the resultant stream with any sensible compressor. (Algorithms which allow high-cost encoding to allow low-cost decoding would be particularly helpful here.) Send the compressed stream to the embedded device, which reads the stream, decompresses it on the fly and XORs directly (a copy of) the target file.
If your updates are such that the file content changes little over time and retains a fixed structure, the XOR stream will be predominantly zeroes, and will compress extremely well: number of bytes transmitted will be small, effort to decompress will be low, memory requirements on the embedded device will be minimal. The further your model is from these assumptions, the less this approach will gain you.
Since you said the delta could be arbitrarily random (from zero delta to a completely different file), compression of the delta may be a lost cause. Lossless compression of random binary data is theoretically impossible. Also, since the embedded device has limited memory anyway, using a sophisticated -and therefore computationally expensive- library for compression/decompression of the occasional "simple" delta will probably be infeasible.
I would recommend simply sending the new file to the device in raw byte format, and overwriting the existing old file.
As Kevin mentioned, compressing random data should not be your goal. A few more comments about the type of data your working with would be helpful. Context is key in compression.
You used the term image which makes it sound like the classic video codec challenge. If you've ever seen weird video aliasing effects that impact the portion of the frame that has changed, and then suddenly everything clears up. You've likely witnessed the notion of a key frame along with a series of delta frames. Where the delta frames were not properly applied.
In this model, the server decides what's cheaper:
complete key frame
delta commands
The delta commands are communicated as a series of write instructions that can overlay the clients existing buffer.
Example Format:
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
[Address][Length][Repeat][Delta Payload]
There are likely a variety of methods for computing these delta commands. A brute force method would be:
Perform Smith Waterman between two images.
Compress the resulting transform into delta commands.

How to add sound effects to PCM buffered audio in C++

I have an int16_t[] buffer with PCM raw audio data and I want to apply some effects (like echo, reverb, gain...) into it.
I thought that SoX or similar can do the trick for me, but SoX only works with files and other similar libraries that supports adding sound effects seems to add the effects only when the sound is played. So my problem with this is that I want to apply the effect to the samples into my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain multiplying each sample value, for example. But I'm looking for any library or relatively easy algorithms that I can use directly in my buffer to get the sound effects applied.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.
For everyone like me, interested in learning DSP related to audio processing with C++ I want to share my little research results and opinion, and perhaps save you some time :)
After trying several DSP libraries, finally I have found The Synthesis ToolKit in C++ (STK), an open-source library that offer easy and clear interfaces and easy to understand code that you can dive in to learn about various basic DSP algorithms.
So, I recommend to anyone who is starting out and have no previous experience to take a look at this library.
Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range (-32767,32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.
You can do it either with hacking the audio buffer and trying to do some effects like gain and threshold with simple math operations or do it correct using proper DSP algorithms. If you wish to do it correct, I would recommend using the Speex Library. It's open source and and well tested. www (dot)speex (dot)org. The code should compile on MSVC or linux with minimal effort. This is the fastest way to get a good audio code working with proper DSP techniques. Your code would look like .. please read the AEC example.
st = speex_echo_state_init(NN, TAIL);
den = speex_preprocess_state_init(NN, sampleRate);
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to setup the states, the code testecho includes these.

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3 because if would mean my test for these silences would be very slow. Unless I can decode the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3 encoded data, since its not a trivial task to detect silence without first decompressing. Luckily, its easy to find libraries that decode MP3 files and access them as PCM data, and its trivial to detect silence in PCM Data. Here is one such Library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In the most basic form, the algorithm you need to detect silence is simply to analyze a small chunks (could be as little as .25s or as much as several seconds), and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to be below that threshold to be considered silence (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be ok. There are improvements to the basic approach such as using historesis (which is basically using two thresholds, one for the transition to silence, and one for the transition from silence), and filtering.
Unfortunately, I don't know a library for C++ or C# that implements level detection off hand, and nothing immediately springs up on google, but at least for the simple version its pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#

h.264 bytestream parsing

The input data is a byte array which represents a h.264 frame. The frame consists of a single slice (not multislice frame).
So, as I understood I can cope with this frame as with slice. The slice has header, and slice data - macroblocks, each macroblock with its own header.
So I have to parse that byte array to extract frame number, frame type, quantisation coefficient (as I understood each macroblock has its own coefficient? or I'm wrong?)
Could You advise me, where I can get more detailed information about parsing h.264 frame bytes.
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks
The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
Slice header is described in section 7.3.3
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, so most easy to parse; you should definitely start with this one)
"Quantization coefficient" - do you mean "quantization parameter"? If yes, be prepared to write a full H.264 parser (or reuse an existing one). Look in section 9.3 to get an idea on a complexity of a H.264 parser.
Standard is very hard to read. You can try to analyze source code of existing H.264 video stream decoding software such as ffmpeg with it's C (C99) libraries. For example there is avcodec_decode_video2 function documented here. You can get full working C (open file, get H.264 stream, iterate thru frames, dump information, get colorspace, save frames as raw PPM images etc.) here. Alternatively there is great "The H.264 Advanced Video Compression Standard" book, which explains standard in "human language". Another option is to try Elecard StreamEye Pro software (there is trial version), which could give you some additional (visual) perspective.
Actually much better and easier (it is only my opinion) to read H.264 video coding documentation.
ffmpeg is very good library but it contain a lot of optimized code. Better to look at reference implementation of the H.264 codec and official documentation.
http://iphome.hhi.de/suehring/tml/download/ - this is link to the JM codec implementation.
Try to separate levels of decoding process, like transport layer that contains NAL units (SPS, PPS, SEI, IDR, SLICE, etc). Than you need to implement VLC engine (mostly exp-Golomb codes of 0 range). Than very difficult and powerful codec called CABAC (Context Adaptive Arithmetic Binary Codec). It is quite tricky task. Demuxing process (goes after unpacking of a video data) also complicated. You need completely understand each of such modules.
Good luck.