Parsing char* data of h.264 - c++

I have a char* array of binary data.
It is a binary media stream encoded with H.264.
It has the following structure: ...
stream_header is a 64-byte struct.
I've already done reinterpret_cast<stream_header*>(charArray), where charArray represents the first 64 bytes of the stream, and I successfully get all the header data. The header contains an nLength variable, which tells how many bytes of media data are in the stream_data that follows.
For example, 1024 bytes.
I read the next 1024 bytes into a char* data array, and here my question begins: how can I get a set of video frames from this data (the structure gives me the resolution of the frames) and save them as *.jpg files (1.jpg, 2.jpg, 3.jpg, ...)?
Has anyone already done something similar? Please help.
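The header-then-payload walk described above could be sketched roughly like this. The real layout of stream_header is not shown in the question, so StreamHeader, the position of nLength (assumed here to be the first four bytes, in host byte order), Chunk, and splitChunks are all hypothetical names and assumptions. The sketch also copies the header with memcpy instead of using reinterpret_cast, which avoids alignment problems when the buffer is not suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical 64-byte header: only nLength is modelled, and its position
// (first 4 bytes, host byte order) is an assumption, not the real layout.
struct StreamHeader {
    std::uint32_t nLength;      // payload size in bytes (assumed position)
    unsigned char reserved[60]; // remaining header fields, unknown here
};
static_assert(sizeof(StreamHeader) == 64, "header must be 64 bytes");

struct Chunk {
    StreamHeader header;
    const char* payload; // points into the caller's buffer, not a copy
};

// Splits a buffer into (header, payload) pairs. Stops at the first
// truncated header or payload instead of reading past the end.
std::vector<Chunk> splitChunks(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<Chunk> chunks;
    std::size_t pos = 0;
    while (size - pos >= sizeof(StreamHeader)) {
        Chunk c;
        std::memcpy(&c.header, data + pos, sizeof(StreamHeader));
        pos += sizeof(StreamHeader);
        if (c.header.nLength > size - pos) break; // truncated payload
        c.payload = data + pos;
        pos += c.header.nLength;
        chunks.push_back(c);
    }
    return chunks;
}
```

Each payload returned this way is still compressed H.264 data, so it would have to be handed to a decoder before anything can be written out as a JPEG.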

You need an H.264 decoder library; the best option is ffmpeg.
Even then, the library is a bit complicated to use, although decoding is simpler than encoding since you have fewer options to worry about.
Do you really need to do this in a program? It's very simple to use the 'ffmpeg' executable to save a video as JPEGs.

If you just want to get a sequence of JPEGs from a video file, GStreamer can do that among many other things.
If you want to write code from scratch to convert H.264 video into JPEGs, let me warn you that you have many hundreds of pages of specification documents and some very serious mathematics to understand and then implement. It would be months of work for a reasonably skilled programmer with a strong mathematics background. Understanding the MP4 format is the easy part; the video compression will blow your mind.

Related

How to convert wav to mp3 and mp3 to wav while keeping the same size

I cannot find out how to convert a WAV to MP3 and an MP3 back to WAV. Does anyone know how to convert a .wav file into a .mp3 or .ogg and later convert it back into a .wav that matches the untouched original 100% in size (if it can be done on the command line, so much the better)? I tried to use LAME and then converted back to .wav with some tools, but the file wouldn't match the original byte for byte as if it had never been touched. Does anyone know a SoX or FFmpeg command line that can help me? Thanks!
Most WAV files are raw PCM. MP3 is MP3. And, most Ogg files are going to contain Vorbis or Opus.
MP3, Vorbis, and Opus are all lossy codecs. They work by taking advantage of what we hear and what we don't hear (psychoacoustics and all that) to save bandwidth. It's a tradeoff between bandwidth and audio quality.
You cannot use the output of a lossy codec to get back to the original source. Therefore, you definitely can't expect to binary compare the outputs and get them to be the same.
You also can't even get the same file size really without knowing more about the source. For instance, the input of your MP3 codec might have been 24-bit audio, but the output of the receiving codec is almost always going to be configured for 16-bit. Also, it's common for these lossy codecs to not be sample-accurate. MP3 in particular has a problem with this. Read up on "gapless playback" if you're in doubt.
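To make the file-size point concrete, here is a minimal sketch of the arithmetic for raw PCM. pcmBytes is a hypothetical helper, and the figures used with it (44.1 kHz, stereo, 10 seconds) are made-up illustration values, not anything from the question.

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of the raw PCM payload of a WAV file (file headers excluded).
// sampleFrames is the number of samples per channel.
std::uint64_t pcmBytes(std::uint64_t sampleFrames, unsigned channels,
                       unsigned bitsPerSample) {
    assert(bitsPerSample % 8 == 0); // only whole-byte sample sizes handled
    return sampleFrames * channels * (bitsPerSample / 8);
}
```

For 10 seconds of 44.1 kHz stereo (441000 sample frames), pcmBytes(441000, 2, 24) gives 2646000 bytes while pcmBytes(441000, 2, 16) gives 1764000 bytes, so a 24-bit source decoded to 16-bit cannot match in size. Any samples the codec adds or drops at the ends change sampleFrames as well.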

C/C++: Streaming MP3

In a C++ program, I get multiple chunks of PCM data and I am currently using libmp3lame to encode this data into MP3 files. The PCM chunks are produced one after another. However, instead of waiting until the PCM data stream has finished, I'd like to encode data as early as possible into multiple MP3 chunks, so the client can either play the pieces or append them together.
As far as I understand, MP3 files consist of frames, and files can be split along frame boundaries and published in isolation. Moreover, no length information is needed in advance, so the format is suitable for streaming. However, when I use libmp3lame to generate MP3 files from partial data, the result cannot be interpreted by audio players after the pieces are concatenated. I deactivated the bit reservoir, so I expect the frames to be independent.
Based on this article, I wrote a Python script that extracts and lists frames from MP3 files. I generated an MP3 file with libmp3lame by first collecting the whole PCM data and then applying libmp3lame. Then I took the first n frames from this file and put them into another file, but the result was unplayable as well.
How is it possible to encode only chunks of an audio stream, which library is suitable for this, and what is the minimum size of a chunk?
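For reference, locating frame boundaries comes down to reading each 4-byte frame header. Below is a minimal sketch that handles MPEG-1 Layer III only; Mp3FrameInfo and parseMpeg1Layer3Header are hypothetical names, and other MPEG versions, other layers, and free-format streams are deliberately rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Mp3FrameInfo {
    int bitrateKbps;
    int sampleRateHz;
    bool padding;
    std::size_t frameBytes; // total frame size, including the 4 header bytes
};

// Parses a 4-byte MPEG-1 Layer III frame header. Returns false for anything
// else (other versions or layers, free-format or reserved values).
bool parseMpeg1Layer3Header(const std::uint8_t* p, std::size_t avail,
                            Mp3FrameInfo& out) {
    assert(p != nullptr || avail == 0);
    static const int kBitrates[16] = {0,   32,  40,  48,  56,  64,  80,  96,
                                      112, 128, 160, 192, 224, 256, 320, 0};
    static const int kSampleRates[4] = {44100, 48000, 32000, 0};
    if (avail < 4) return false;
    if (p[0] != 0xFF || (p[1] & 0xE0) != 0xE0) return false; // 11-bit sync
    if (((p[1] >> 3) & 0x03) != 0x03) return false;          // MPEG-1 only
    if (((p[1] >> 1) & 0x03) != 0x01) return false;          // Layer III only
    const int bitrate = kBitrates[(p[2] >> 4) & 0x0F];
    const int sampleRate = kSampleRates[(p[2] >> 2) & 0x03];
    if (bitrate == 0 || sampleRate == 0) return false;
    out.bitrateKbps = bitrate;
    out.sampleRateHz = sampleRate;
    out.padding = ((p[2] >> 1) & 0x01) != 0;
    // Layer III, MPEG-1: 1152 samples per frame, i.e. 144 * bitrate / rate.
    out.frameBytes =
        static_cast<std::size_t>(144 * bitrate * 1000 / sampleRate) +
        (out.padding ? 1 : 0);
    return true;
}
```

A header of FF FB 90 00 (128 kbit/s, 44.1 kHz, no padding) yields a 417-byte frame. Knowing where a frame starts does not by itself make it independently decodable: with the bit reservoir enabled, a frame's audio data can begin inside earlier frames.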
I examined the source code of lame and the file lame_main.c helped me to come to a solution. This file implements the lame command-line utility, which also can encode multiple wav files, so they can be appended to a single mp3 file without gaps.
My mistake was to initialize lame every single time I called my encode function, that is, to initialize lame for each new segment. This causes short interruptions in the output mp3. Initializing lame once and re-using it for subsequent calls already solved the problem. Additionally, I call lame_init_bitstream at the start of encode and use lame_set_nogap_currentindex and lame_set_nogap_total appropriately. Now the output fragments can be combined seamlessly.

Partial decoding h264 stream

I'm trying to get information about frames in an H.264 bitstream, especially the motion vectors of macroblocks. I think I have to use the ffmpeg code for this, but it's really huge and hard to understand.
So, can someone give me some tips or examples of partially decoding the raw data of a single frame from an H.264 stream?
Thank you.
Unfortunately, to get that level of information from the bitstream you have to decode every macroblock, there's no quick option, like there would be for getting information from the slice header.
One option is to use the h.264 reference software and turn on the verbose debug output and/or add your own printf's where needed, but this is also a large code base to navigate:
http://iphome.hhi.de/suehring/tml/
(You can also use ffmpeg and add output where needed too as you said, but it would take some understanding of that code base too)
There are graphical tools for analyzing video bitstreams which will show you this type of information on a per-macroblock basis, many are expensive, but sometimes there are free trial versions available.

h.264 bytestream parsing

The input data is a byte array which represents an H.264 frame. The frame consists of a single slice (it is not a multi-slice frame).
So, as I understand it, I can treat this frame as a slice. The slice has a header and slice data: macroblocks, each with its own header.
So I have to parse that byte array to extract the frame number, the frame type, and the quantisation coefficient (as I understand it, each macroblock has its own coefficient, or am I wrong?).
Could you advise me where I can get more detailed information about parsing H.264 frame bytes?
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks
The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
Slice header is described in section 7.3.3
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, so the easiest to parse; you should definitely start with this one)
"Quantization coefficient" - do you mean "quantization parameter"? If yes, be prepared to write a full H.264 parser (or reuse an existing one). Look in section 9.3 to get an idea of the complexity of an H.264 parser.
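As a starting point for the Annex B tip above, here is a minimal sketch that scans a byte array for start codes and reads the one-byte NAL unit header that follows each one. NalUnit and findNalUnits are hypothetical names. The sketch only locates NAL units; it does not remove emulation prevention bytes or parse any payload.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct NalUnit {
    std::size_t offset; // index of the NAL header byte, right after the start code
    int nalRefIdc;      // 2-bit nal_ref_idc
    int nalUnitType;    // e.g. 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS
};

// Finds every 00 00 01 start code (a 4-byte 00 00 00 01 start code ends with
// the same three bytes) and decodes the NAL header byte that follows it.
std::vector<NalUnit> findNalUnits(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<NalUnit> units;
    for (std::size_t i = 0; i + 3 < size; ++i) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            const int header = data[i + 3];
            NalUnit u;
            u.offset = i + 3;
            u.nalRefIdc = (header >> 5) & 0x03;
            u.nalUnitType = header & 0x1F;
            units.push_back(u);
            i += 2; // skip the rest of the start code
        }
    }
    return units;
}
```

For a single-slice frame, the unit with nalUnitType 1 or 5 is the one whose payload starts with the slice header from section 7.3.3.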
The standard is very hard to read. You can try to analyze the source code of existing H.264 decoding software such as ffmpeg, with its C (C99) libraries. For example, there is the avcodec_decode_video2 function documented here. You can get a full working C example (open a file, get the H.264 stream, iterate through the frames, dump information, get the colorspace, save frames as raw PPM images, etc.) here. Alternatively, there is the great book "The H.264 Advanced Video Compression Standard", which explains the standard in "human language". Another option is to try the Elecard StreamEye Pro software (there is a trial version), which could give you some additional (visual) perspective.
Actually, in my opinion it is much better and easier to read the H.264 video coding documentation.
ffmpeg is a very good library, but it contains a lot of optimized code. It is better to look at the reference implementation of the H.264 codec and the official documentation.
http://iphome.hhi.de/suehring/tml/download/ - this is the link to the JM reference codec implementation.
Try to separate the levels of the decoding process, starting with the transport layer that contains the NAL units (SPS, PPS, SEI, IDR, SLICE, etc.). Then you need to implement the VLC engine (mostly order-0 Exp-Golomb codes). Then comes the very difficult and powerful entropy coder called CABAC (Context-Adaptive Binary Arithmetic Coding), which is quite a tricky task. The demuxing process (which goes after unpacking of the video data) is also complicated. You need to understand each of these modules completely.
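The Exp-Golomb part is small enough to sketch. Below is a minimal MSB-first bit reader with a ue(v) decoder, enough to read the first two slice header fields (first_mb_in_slice and slice_type). BitReader is a hypothetical class, and the input is assumed to start right after the one-byte NAL header with emulation prevention bytes already removed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

class BitReader {
public:
    BitReader(const std::uint8_t* data, std::size_t size)
        : data_(data), bits_(size * 8), pos_(0) {
        assert(data != nullptr || size == 0);
    }

    // Reads one bit, most significant bit first; returns 0 past the end.
    unsigned readBit() {
        if (pos_ >= bits_) return 0;
        const unsigned bit = (data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
        ++pos_;
        return bit;
    }

    // Reads n bits (0 <= n <= 32) as an unsigned value.
    unsigned readBits(int n) {
        unsigned v = 0;
        for (int i = 0; i < n; ++i) v = (v << 1) | readBit();
        return v;
    }

    // ue(v): count leading zero bits up to the first 1 bit, then read that
    // many suffix bits. Value = 2^zeros - 1 + suffix.
    unsigned readUe() {
        int zeros = 0;
        while (pos_ < bits_ && zeros < 31 && readBit() == 0) ++zeros;
        return ((1u << zeros) - 1u) + readBits(zeros);
    }

private:
    const std::uint8_t* data_;
    std::size_t bits_;
    std::size_t pos_;
};
```

With the payload bytes 88 00, the first readUe() returns 0 (first_mb_in_slice) and the second returns 7 (slice_type). Taking slice_type modulo 5 gives the basic type: 0 = P, 1 = B, 2 = I, 3 = SP, 4 = SI, so 7 is an I slice.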
Good luck.

How to write mp3 frames from PCM data (C/C++)?

How can I write mp3 frames (not full mp3 files with ID3 etc.) from PCM data?
I have some PCM data (for example, 100 MB) and I want to create an array of mp3 frames from that data. How can I perform such an operation (for example, with lame or any other open-source encoder)?
What I need:
Open-source libraries for encoding.
Tutorials and blog articles on how to do it.
You should be able to use LAME. It has a -t command line switch that turns off the INFO header in the output (otherwise present in frame 0). If that still leaves too much bookkeeping data, you should be able to write a separate tool to strip that away.
You are already on the right track: use the LAME external executable, or any other shell-invoked encoder.
Building MPEG audio frames, where your layer of interest is 3, is not easy to do from scratch. There are compression steps, fast Fourier transforms followed by quantization, whose explanation is complex and tediously long. The amount of work required for a developer to build it from scratch is very large.
There are programmatic C and C++ MPEG audio encoding libraries, but you will either be asked for fees, be left with very limited support, or have very limited interfacing options.
Go LAME, study their wiki.