Partial decoding h264 stream - c++

I'm trying to get information about frames in h264 bitstream. Especially motion vectors of macroblocks. I think, I have to use ffmpeg code for it, but it's really huge and hard to understand.
So, can someone give me some tips or exapmles of partial decoding from raw data of single frame from h264 stream?
Thank you.

Unfortunately, to get that level of information from the bitstream you have to decode every macroblock, there's no quick option, like there would be for getting information from the slice header.
One option is to use the h.264 reference software and turn on the verbose debug output and/or add your own printf's where needed, but this is also a large code base to navigate:
http://iphome.hhi.de/suehring/tml/
(You can also use ffmpeg and add output where needed too as you said, but it would take some understanding of that code base too)
There are graphical tools for analyzing video bitstreams which will show you this type of information on a per-macroblock basis, many are expensive, but sometimes there are free trial versions available.

Related

Windows Media Foundation: IMFSourceReader::SetCurrentMediaType execution time issue

I'm currently working on retrieving image data from a video capturing device.
It is important for me that I have raw output data in a rather specific format and I need a continuous data stream. Therefore I figured to use the IMFSourceReader. I pretty much understand how it is working. For the whole pipeline to work I checked the output formats of the camera and looked at the available Media Foundation Transforms(MFTs).
The critical function here is IMFSourceReader::SetCurrentMediaType. I'd like to elaborate one critical functionality I discovered. If I just use the function with the parameters of my desired output format, it changes some parameters like fps or resolution, but the call succeeds. When I first call the function with a native media type with my desired parameters and a wrong subtype (like MJPG or sth.) and call it again with my desired parameters and the correct subtype the call succeeds and I end up with my correct parameters. I suspect this is only true, if fitting MFTs (decoders) are available.
So far I've pretty much beaten the WMF to get what I want. The Problem now is, that the second call of IMFSourceReader::SetCurrentMediaType takes a long time. The duration depends heavily on the camera used. Varying from 0.5s to 10s. To be honest I don't really know why its taking so long, but I think the calculation of the correct transformation paths and/or the initialization of the transformations is the problem. I recognized an excessive amount of loading and unloading of the same dlls(ntasn1.dll, ncrypt.dll, igd10iumd32.dll). But loading them once myself didn't change anything.
So does anybody know this issue and has a quick fix for it?
Or does anybody know a work around to:
Get raw image data via media foundation without the use ofIMFSourceReader?
Select and load the transformations myself, to support the source reader call?
You basically described the way Source Reader is supposed to work in first place. The underlying media source has its own media types and the reader can supply a conversion if it ever needs to fit requested media type and closest original.
Video capture devices tend to expose many [native] media types (I have a webcam which enumerates 475 of them!), so if format fitting does not go well, source reader might take some time to try one conversion or another.
Note that you can disable source reader's conversions by applying certain attributes like MF_READWRITE_DISABLE_CONVERTERS, in which case inability to set a video format directly on the source would result in an failure.
You can also read data in original device's format and decode/convert/process yourself by feeding the data into one or a chain of MFTs. Typically, when you set respective format on the source reader, the source reader manages the MFTs for you. If you however prefer, you can do it yourself too. Unfortunately you cannot build a chain of MFTs for the source reader to manage. Either you leave it on source reader completely, or you set native media type, you read the data in original format from the reader, and then you manage the MFTs on your side by doing IMFTransform::ProcessInput, IMFTransform::ProcessOutput and friends. This is not as easy as source reader, but is doable.
Since VuVirt does not want to write any answer, I'd like to add one for him and everybody who has the same issue.
Under some conditions the call IMFSourceReader::SetCurrentMediaType takes a long time, when the target format is RGB of some kind and is not natively available. So to get rid of it, I adjusted my image pipeline to be able to interpret YUV (YUY2). I still have no idea, why this is the case, but this is a working work around for me. I don't know any alternative to speed the call up.
Additional hint: I've found that there are usually several IMFTransforms to decode many natively available formats to YUY2. So, if you are able to use YUY2, you are safe. NV12 is another working alternative. Though there are probably more.
Thanks for your answer anyways

How to add sound effects to PCM buffered audio in C++

I have an int16_t[] buffer with PCM raw audio data and I want to apply some effects (like echo, reverb, gain...) into it.
I thought that SoX or similar can do the trick for me, but SoX only works with files and other similar libraries that supports adding sound effects seems to add the effects only when the sound is played. So my problem with this is that I want to apply the effect to the samples into my buffer without playing them.
I have never worked with audio, but reading about PCM data I have learned that I can apply gain multiplying each sample value, for example. But I'm looking for any library or relatively easy algorithms that I can use directly in my buffer to get the sound effects applied.
I'm sure there are a lot of solutions to my problem out there if you know what to look for, but it's my first time with audio "processing" and I'm lost, as you can see.
For everyone like me, interested in learning DSP related to audio processing with C++ I want to share my little research results and opinion, and perhaps save you some time :)
After trying several DSP libraries, finally I have found The Synthesis ToolKit in C++ (STK), an open-source library that offer easy and clear interfaces and easy to understand code that you can dive in to learn about various basic DSP algorithms.
So, I recommend to anyone who is starting out and have no previous experience to take a look at this library.
Your int16_t[] buffer contains a sequence of samples. They represent instantaneous amplitude levels. Think of them as the voltage to apply to the speaker at the corresponding instant in time. They are signed numbers with values in the range (-32767,32767]. A stream of constant zeros means silence. A stream of constant -32000 (for example) also means silence, but it will eventually burn your your speaker coil. The position in the array represents time, and the value of each sample represents voltage.
If you want to mix two sample streams together, for example to apply a chirp, you get yourself a sample stream with the chirp in it (record a bird or something). You then add the two sounds sample by sample.
You can do a super-cheesy reverb effect by taking your original sound buffer, lowering its volume (perhaps by dividing all the samples by a constant), and adding it back to your original stream, but shifting the samples by a tenth of a second's worth of array position.
Those are the basics of audio processing. Things get very sophisticated indeed. This field is known as "digital signal processing" and there are plenty of books on the subject.
You can do it either with hacking the audio buffer and trying to do some effects like gain and threshold with simple math operations or do it correct using proper DSP algorithms. If you wish to do it correct, I would recommend using the Speex Library. It's open source and and well tested. www (dot)speex (dot)org. The code should compile on MSVC or linux with minimal effort. This is the fastest way to get a good audio code working with proper DSP techniques. Your code would look like .. please read the AEC example.
st = speex_echo_state_init(NN, TAIL);
den = speex_preprocess_state_init(NN, sampleRate);
speex_echo_ctl(st, SPEEX_ECHO_SET_SAMPLING_RATE, &sampleRate);
speex_preprocess_ctl(den, SPEEX_PREPROCESS_SET_ECHO_STATE, st);
You need to setup the states, the code testecho includes these.

video/audio encoding/decoding/playback

I've always wanted to try and make a media player but I don't understand how. I found FFmpeg and GStreamer but I seem to be favoring FFmpeg despite its worse documentation even though I haven't written anything at all. That being said, I feel I would understand how things worked more if I knew what they were doing. I have no idea how video/audio streams work and the several media types so that doesn't help. At the end of the day, I'm just 'emulating' some of the code samples.
Where do I start to learn how to encode/decode/playback video/audio streams without having to read hundreds of pages of several 'standards'. Perhaps to a certain extent also be enough knowledge to playback media without relying on another API. Googling 'basic video audio decoding encoding' doesn't seem to help. :(
This seem to be a black art that nobody is out to tell anyone about.
The first part is extracting streams from the container. From there, you need to decode the streams into media. I recommend finding a small Theora video and seeing how the pieces relate there.
you want that we write one answer and you read that and be master in multimedia domain..!!!!
Anyway that can not be by one answer.
First of all understand this terminolgy by googling
1> container -- muxer/demuxer
2> codec --coder/decoder
If you like ffmpeg then go with its basic video plater application. iT is well documented at here http://dranger.com/ffmpeg/ it will shows the method of demuxing container and decoding any elementry stream with ffmpeg api. more about this at http://ffmpeg.org/ffplay.html
i like gstreamer more then ffmpeg. it has well documentation. it will be good choise if you start with gstreamer

Extract and analyse sound from mp3 files

I have a set of mp3 files, some of which have extended periods of silence or periodic intervals of silence. How can I programmatically detect this?
I am looking for a library in C++, or preferably C#, that will allow me to examine the sound content of these files for the silences.
EDIT: I should elaborate what I am trying to achieve. I am capturing streaming sports commentary using VLC and saving it to mp3. When a game is delayed, or cancelled, the streaming commentary is replaced by a repetitive message saying commentary is not available. By looking for these periodic silences (or total silence), I can detect if there is no commentary and stop the streaming recording
For this reason I am reluctant to decompress the mp3 because if would mean my test for these silences would be very slow. Unless I can decode the last 5 minutes of the file?
Thanks
Andrew
I'm not aware of a library that will detect silence directly in the MP3 encoded data, since its not a trivial task to detect silence without first decompressing. Luckily, its easy to find libraries that decode MP3 files and access them as PCM data, and its trivial to detect silence in PCM Data. Here is one such Library for C# I found, but I'm sure there are tons: http://www.robburke.net/mle/mp3sharp/
Once you decode the data, you will have a list of PCM samples. In the most basic form, the algorithm you need to detect silence is simply to analyze a small chunks (could be as little as .25s or as much as several seconds), and make sure that the absolute value of each sample in the chunk is below a threshold. The threshold value you use determines how 'quiet' the sound has to be to be considered silence, and the chunk size determines how long the volume needs to be below that threshold to be considered silence (If you go with very short chunks, you will get lots of false positives due to samples near zero-crossings, but .25s or higher should be ok. There are improvements to the basic approach such as using historesis (which is basically using two thresholds, one for the transition to silence, and one for the transition from silence), and filtering.
Unfortunately, I don't know a library for C++ or C# that implements level detection off hand, and nothing immediately springs up on google, but at least for the simple version its pretty easy to code.
Edit: Also, this library seems interesting: http://naudio.codeplex.com/
Also, while not a true duplicate question, the answers here will be useful for you:
Detecting audio silence in WAV files using C#

h.264 bytestream parsing

The input data is a byte array which represents a h.264 frame. The frame consists of a single slice (not multislice frame).
So, as I understood I can cope with this frame as with slice. The slice has header, and slice data - macroblocks, each macroblock with its own header.
So I have to parse that byte array to extract frame number, frame type, quantisation coefficient (as I understood each macroblock has its own coefficient? or I'm wrong?)
Could You advise me, where I can get more detailed information about parsing h.264 frame bytes.
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks
The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
Slice header is described in section 7.3.3
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, so most easy to parse; you should definitely start with this one)
"Quantization coefficient" - do you mean "quantization parameter"? If yes, be prepared to write a full H.264 parser (or reuse an existing one). Look in section 9.3 to get an idea on a complexity of a H.264 parser.
Standard is very hard to read. You can try to analyze source code of existing H.264 video stream decoding software such as ffmpeg with it's C (C99) libraries. For example there is avcodec_decode_video2 function documented here. You can get full working C (open file, get H.264 stream, iterate thru frames, dump information, get colorspace, save frames as raw PPM images etc.) here. Alternatively there is great "The H.264 Advanced Video Compression Standard" book, which explains standard in "human language". Another option is to try Elecard StreamEye Pro software (there is trial version), which could give you some additional (visual) perspective.
Actually much better and easier (it is only my opinion) to read H.264 video coding documentation.
ffmpeg is very good library but it contain a lot of optimized code. Better to look at reference implementation of the H.264 codec and official documentation.
http://iphome.hhi.de/suehring/tml/download/ - this is link to the JM codec implementation.
Try to separate levels of decoding process, like transport layer that contains NAL units (SPS, PPS, SEI, IDR, SLICE, etc). Than you need to implement VLC engine (mostly exp-Golomb codes of 0 range). Than very difficult and powerful codec called CABAC (Context Adaptive Arithmetic Binary Codec). It is quite tricky task. Demuxing process (goes after unpacking of a video data) also complicated. You need completely understand each of such modules.
Good luck.