h.264 bytestream parsing - c++

The input data is a byte array which represents an H.264 frame. The frame consists of a single slice (it is not a multi-slice frame).
So, as I understand it, I can treat this frame as a slice. The slice has a header and slice data: macroblocks, each with its own header.
So I have to parse that byte array to extract the frame number, the frame type, and the quantisation coefficient (as I understand it, each macroblock has its own coefficient, or am I wrong?).
Could you advise me where I can get more detailed information about parsing H.264 frame bytes?
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks

The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
The slice header is described in section 7.3.3.
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, so the easiest to parse; you should definitely start with this one).
"Quantization coefficient": do you mean "quantization parameter"? If so, be prepared to write a full H.264 parser (or reuse an existing one). Look at section 9.3 to get an idea of the complexity of an H.264 parser.

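The slice-header tips in the answer above can be sketched roughly as follows. This is a minimal illustration, not a full parser: it assumes the caller has already located a NAL unit just past its Annex B start code, and BitReader, nalToRbsp, parseSliceHeaderStart and sliceTypeName are hypothetical helper names, not part of any library. It reads only the first three ue(v) fields of the slice header (first_mb_in_slice, slice_type, pic_parameter_set_id); frame_num comes later, and its bit length depends on the SPS, so it is left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Removes emulation prevention bytes (00 00 03 -> 00 00) from a NAL payload.
std::vector<uint8_t> nalToRbsp(const uint8_t* nal, std::size_t size) {
    std::vector<uint8_t> rbsp;
    rbsp.reserve(size);
    unsigned zeros = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (zeros >= 2 && nal[i] == 0x03) { zeros = 0; continue; }
        zeros = (nal[i] == 0x00) ? zeros + 1 : 0;
        rbsp.push_back(nal[i]);
    }
    return rbsp;
}

// Minimal MSB-first bit reader (hypothetical helper).
class BitReader {
public:
    explicit BitReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

    // Reads n bits (n <= 32); returns nullopt when the buffer runs out.
    std::optional<uint32_t> readBits(unsigned n) {
        assert(n <= 32);
        uint32_t value = 0;
        for (unsigned i = 0; i < n; ++i) {
            if (pos_ >= data_.size() * 8) return std::nullopt;
            const uint8_t byte = data_[pos_ / 8];
            value = (value << 1) | ((byte >> (7 - pos_ % 8)) & 1u);
            ++pos_;
        }
        return value;
    }

    // ue(v): unsigned order-0 Exp-Golomb code (section 9.1 of the standard).
    std::optional<uint32_t> readUe() {
        unsigned leadingZeros = 0;
        while (true) {
            const auto bit = readBits(1);
            if (!bit) return std::nullopt;
            if (*bit == 1) break;
            if (++leadingZeros > 31) return std::nullopt;  // malformed code
        }
        const auto suffix = readBits(leadingZeros);
        if (!suffix) return std::nullopt;
        return ((1u << leadingZeros) - 1u) + *suffix;
    }

private:
    std::vector<uint8_t> data_;
    std::size_t pos_ = 0;  // position in bits
};

struct SliceHeaderStart {
    uint8_t nalUnitType;
    uint32_t firstMbInSlice;
    uint32_t sliceType;
    uint32_t picParameterSetId;
};

// `nal` points at the NAL header byte, i.e. just past the start code.
std::optional<SliceHeaderStart> parseSliceHeaderStart(const uint8_t* nal, std::size_t size) {
    if (size < 2) return std::nullopt;
    const uint8_t type = nal[0] & 0x1F;
    if (type != 1 && type != 5) return std::nullopt;  // 1 = non-IDR slice, 5 = IDR slice
    BitReader br(nalToRbsp(nal + 1, size - 1));
    const auto firstMb = br.readUe();
    const auto sliceType = br.readUe();
    const auto ppsId = br.readUe();
    if (!firstMb || !sliceType || !ppsId) return std::nullopt;
    return SliceHeaderStart{type, *firstMb, *sliceType, *ppsId};
}

// slice_type values 5..9 mean the same as 0..4 (all slices of the picture share the type).
std::string_view sliceTypeName(uint32_t sliceType) {
    static constexpr std::string_view names[] = {"P", "B", "I", "SP", "SI"};
    return sliceType < 10 ? names[sliceType % 5] : std::string_view("invalid");
}
```

For an IDR slice whose NAL unit begins with the bytes 65 88 84, this yields nal_unit_type 5, first_mb_in_slice 0 and slice_type 7, which sliceTypeName reports as an I slice.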
The standard is very hard to read. You can try to analyze the source code of existing H.264 decoding software such as ffmpeg, with its C (C99) libraries. For example, there is the avcodec_decode_video2 function documented here (newer FFmpeg releases deprecate it in favour of avcodec_send_packet and avcodec_receive_frame). You can get a full working C example (open a file, get the H.264 stream, iterate through frames, dump information, get the colorspace, save frames as raw PPM images, etc.) here. Alternatively, there is the great book "The H.264 Advanced Video Compression Standard", which explains the standard in "human language". Another option is to try the Elecard StreamEye Pro software (there is a trial version), which could give you some additional (visual) perspective.

Actually, in my opinion it is much better and easier to read the H.264 video coding documentation.
ffmpeg is a very good library, but it contains a lot of optimized code. It is better to look at the reference implementation of the H.264 codec and the official documentation.
http://iphome.hhi.de/suehring/tml/download/ is the link to the JM codec implementation.
Try to separate the levels of the decoding process. First there is the transport layer, which contains the NAL units (SPS, PPS, SEI, IDR, slice, etc.). Then you need to implement the VLC engine (mostly order-0 Exp-Golomb codes). Then comes the very difficult and powerful entropy coder called CABAC (Context-Adaptive Binary Arithmetic Coding), which is quite a tricky task. The demuxing process (which goes after unpacking the video data) is also complicated. You need to understand each of these modules completely.
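As a rough sketch of that first level, the following splits an Annex B byte stream into NAL units and reports each unit's type. NalUnit, splitAnnexB and nalTypeName are hypothetical names chosen for illustration, and the sketch assumes the whole stream is already in memory. It relies on the rule that a NAL unit never ends in a zero byte, so zero bytes in front of a start code can be treated as padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

struct NalUnit {
    std::size_t offset;  // index of the NAL header byte
    std::size_t size;    // bytes up to the next start code (or end of buffer)
    uint8_t type;        // nal_unit_type: low 5 bits of the header byte
};

std::vector<NalUnit> splitAnnexB(const std::vector<uint8_t>& buf) {
    // Collect the offsets just past every 00 00 01 start code prefix.
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i + 3 <= buf.size();) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            starts.push_back(i + 3);
            i += 3;
        } else {
            ++i;
        }
    }

    std::vector<NalUnit> units;
    for (std::size_t k = 0; k < starts.size(); ++k) {
        const std::size_t begin = starts[k];
        std::size_t end = (k + 1 < starts.size()) ? starts[k + 1] - 3 : buf.size();
        // Drop zero padding, including the extra zero of a 4-byte start code.
        while (end > begin && buf[end - 1] == 0) --end;
        if (end > begin) {
            units.push_back({begin, end - begin, static_cast<uint8_t>(buf[begin] & 0x1F)});
        }
    }
    return units;
}

// Names for the most common nal_unit_type values.
std::string_view nalTypeName(uint8_t type) {
    switch (type) {
        case 1: return "non-IDR slice";
        case 5: return "IDR slice";
        case 6: return "SEI";
        case 7: return "SPS";
        case 8: return "PPS";
        case 9: return "access unit delimiter";
        default: return "other";
    }
}
```

Each returned unit still contains emulation prevention bytes; those have to be removed before the payload is fed to an Exp-Golomb or CABAC reader.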
Good luck.

Related

Explanation of Brotli's encoder algorithm

The Brotli compression format is excellently documented in RFC 7932. You can just read this RFC top to bottom, and it tells you how the format works.
However, while you could probably implement a decoder (decompressor) based on the RFC alone, the RFC doesn't describe the encoder algorithm that is part of Google's reference C implementation (the brotli command line tool). In other words, it doesn't tell us what strategies the encoder uses at different quality levels to find an efficient compressed representation for a given input stream.
Of course I can always read the encoder source, but I was wondering if there was an accessible high-level description of how the encoder works?
All I am aware of is a very brief description in this article:
The higher data density is achieved by a 2nd order context modeling,
re-use of entropy codes, larger memory window of past data and joint
distribution codes.
More importantly, from the same article:
the new algorithm is named after Swiss bakery products. Brötli means
‘small bread’ in Swiss German.
Update:
AardvarkSoup added a much better answer with this link to a comprehensive paper on how Brotli works, by its authors. Some moderator inexplicably deleted that answer, so I have copied the link here.

Partial decoding h264 stream

I'm trying to get information about frames in an H.264 bitstream, especially the motion vectors of macroblocks. I think I have to use the ffmpeg code for this, but it's really huge and hard to understand.
So, can someone give me some tips or examples of partial decoding from the raw data of a single frame of an H.264 stream?
Thank you.
Unfortunately, to get that level of information from the bitstream you have to decode every macroblock. There's no quick option like there is for getting information from the slice header.
One option is to use the H.264 reference software and turn on the verbose debug output and/or add your own printf calls where needed, but this is also a large code base to navigate:
http://iphome.hhi.de/suehring/tml/
(You can also use ffmpeg and add output where needed, as you said, but that would take some understanding of that code base too.)
There are graphical tools for analyzing video bitstreams that will show you this type of information on a per-macroblock basis. Many are expensive, but sometimes free trial versions are available.

How to add sound effects to PCM buffered audio in C++

I have an int16_t[] buffer with raw PCM audio data and I want to apply some effects (like echo, reverb, gain...) to it.
I thought that SoX or something similar could do the trick for me, but SoX only works with files, and other libraries that support sound effects seem to apply them only when the sound is played. So my problem is that I want to apply the effect to the samples in my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain by multiplying each sample value, for example. But I'm looking for a library or relatively easy algorithms that I can use directly on my buffer to apply the sound effects.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.
For everyone like me, interested in learning DSP related to audio processing with C++ I want to share my little research results and opinion, and perhaps save you some time :)
After trying several DSP libraries, I finally found The Synthesis ToolKit in C++ (STK), an open-source library that offers clear, easy interfaces and easy-to-understand code that you can dive into to learn about various basic DSP algorithms.
So, I recommend that anyone who is starting out and has no previous experience take a look at this library.
Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range [-32768, 32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.
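The three operations described above (gain, mixing, and the delayed-copy effect) can be sketched directly on an int16_t buffer. This is a minimal illustration under a few assumptions: the audio is mono, the function names applyGain, mixInto and addEcho are hypothetical, and results are clamped to the int16_t range instead of being allowed to wrap around. Strictly speaking, the delayed copy is a single echo; a real reverb needs many such reflections.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Clamps to the int16_t range so loud results clip instead of wrapping around.
inline int16_t saturate(float v) {
    return static_cast<int16_t>(std::clamp(v, -32768.0f, 32767.0f));
}

// Gain: multiply every sample by a constant.
void applyGain(int16_t* samples, std::size_t count, float gain) {
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] = saturate(samples[i] * gain);
    }
}

// Mixing: add a second stream to the first, sample by sample.
void mixInto(int16_t* dst, const int16_t* src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = saturate(static_cast<float>(dst[i]) + static_cast<float>(src[i]));
    }
}

// Echo: add a quieter copy of the signal, shifted by delaySamples positions.
void addEcho(int16_t* samples, std::size_t count, std::size_t delaySamples, float decay) {
    assert(delaySamples > 0);
    // Walk backwards so every read still sees the original sample,
    // which gives one echo rather than a feedback loop.
    for (std::size_t i = count; i-- > delaySamples;) {
        samples[i] = saturate(samples[i] + decay * samples[i - delaySamples]);
    }
}
```

For a delay of a tenth of a second, delaySamples would be the sample rate divided by 10, for example 4410 at 44100 Hz.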
You can either hack the audio buffer and try to do some effects like gain and threshold with simple math operations, or do it correctly using proper DSP algorithms. If you wish to do it correctly, I would recommend using the Speex library. It's open source and well tested: www (dot)speex (dot)org. The code should compile on MSVC or Linux with minimal effort. This is the fastest way to get good audio code working with proper DSP techniques. Your code would look like the following; please read the AEC example.
st = speex_echo_state_init(NN, TAIL);
den = speex_preprocess_state_init(NN, sampleRate);
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to set up the states; the testecho code includes these.

Parsing char* data of h.264

I have a char* array of binary data.
It is a binary media stream encoded with H.264.
It has the following structure: ...
stream_header is a 64-byte struct.
I've already done reinterpret_cast(charArray), where charArray represents the first 64 bytes of the stream, and I successfully get all the header data. In this header there is an nLength variable, which tells us how many bytes of media data are in the next stream_data.
For example, 1024 bytes.
I read the next 1024 bytes into a char* data array, and here my question begins: how can I get the set of video frames from this data (in the structure I have info about the resolution of these frames) and save them as *.jpg files (1.jpg, 2.jpg, 3.jpg, ...)?
Has anyone already done something similar? Please help.
You need an H.264 decoder library; the best option is ffmpeg.
But even then it's a bit complicated to use the library, although decoding is simpler since you have fewer options to worry about.
Do you really need to do this in a program? It's very simple to use the 'ffmpeg' executable to save a video as JPEGs.
If you just want to get a sequence of JPEGs from a video file, GStreamer can do that among many other things.
If you want to write code from scratch to convert H.264 video into JPEGs, let me warn you that you have many hundreds of pages of specifications documents and some very serious mathematics to understand and then implement. It would be months of work for a reasonably skilled programmer mathematician. Understanding the MP4 format is the easy part, the video compression will blow your mind.

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3, because it would mean my test for these silences would be very slow. Unless I can decode just the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3 encoded data, since it's not a trivial task to detect silence without first decompressing. Luckily, it's easy to find libraries that decode MP3 files and access them as PCM data, and it's trivial to detect silence in PCM data. Here is one such library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In its most basic form, the algorithm you need to detect silence is simply to analyze small chunks (as little as .25s or as much as several seconds) and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to stay below that threshold to be considered silence. (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be OK.) There are improvements to the basic approach, such as filtering and hysteresis (which basically means using two thresholds, one for the transition to silence and one for the transition from silence).
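A minimal C++ sketch of that chunk-based approach, including the two-threshold hysteresis, might look like this. It assumes the MP3 has already been decoded to mono 16-bit PCM; SilentRange and findSilence are hypothetical names, and the threshold values are up to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct SilentRange {
    std::size_t begin;  // first sample index of the silent stretch
    std::size_t end;    // one past its last sample
};

// Scans fixed-size chunks. Silence starts when a chunk's peak drops below
// enterThreshold and ends only when a chunk's peak rises above exitThreshold.
std::vector<SilentRange> findSilence(const std::vector<int16_t>& pcm,
                                     std::size_t chunkSize,
                                     int enterThreshold,
                                     int exitThreshold) {
    assert(chunkSize > 0 && enterThreshold <= exitThreshold);
    std::vector<SilentRange> ranges;
    bool silent = false;
    std::size_t start = 0;
    for (std::size_t pos = 0; pos < pcm.size(); pos += chunkSize) {
        const std::size_t end = std::min(pos + chunkSize, pcm.size());
        int peak = 0;  // largest absolute sample value in this chunk
        for (std::size_t i = pos; i < end; ++i) {
            peak = std::max(peak, std::abs(static_cast<int>(pcm[i])));
        }
        if (!silent && peak < enterThreshold) {
            silent = true;
            start = pos;
        } else if (silent && peak > exitThreshold) {
            silent = false;
            ranges.push_back({start, pos});
        }
    }
    if (silent) ranges.push_back({start, pcm.size()});  // silence runs to the end
    return ranges;
}
```

With 44100 Hz audio, a chunkSize of 11025 corresponds to the .25s chunks mentioned above. Dividing the begin and end indices by the sample rate gives the silent stretches in seconds.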
Unfortunately, I don't know offhand of a library for C++ or C# that implements level detection, and nothing immediately springs up on Google, but at least the simple version is pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#