I am currently attempting to develop a player that can perform accurate seeking based on an mpeg4 elementary video stream. I'm in the planning stage and trying to decide how to go about things and I'd like some advice before I start.
Some things to note are:
I will have complete control over the encoding of the file.
The original content will be I-frame only
FFmpeg is the encoding/decoding library
Audio can be disregarded for now. I will only be dealing with the video stream.
Frame accurate seeking must be implemented
So, when I'm encoding the content, can I query what type of frame (I, P, or B) has just been encoded, so that I can construct an additional index stream for the seeking operation? If not, I can query the GOP after it has been encoded to find the I-frame.
As for playback, the user needs to be able to type in a specific time and go to that frame (the nearest I-frame will be suitable for now). We can assume that the GOP is closed and the length is fairly short (e.g. 15 frames). My thoughts are to query the index stream that I created during encode and determine the relevant distance into the stream for the requested time.
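To make the idea concrete, here is a minimal sketch of the lookup side, assuming a hypothetical IndexEntry record (time plus byte offset of each I-frame) written out during encode; the names are mine, not FFmpeg's:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical index entry written out during encoding: one record per
// I-frame, holding its timestamp and byte offset in the elementary stream.
struct IndexEntry {
    double  time;       // presentation time in seconds
    int64_t byteOffset; // offset of the I-frame in the stream
};

// Find the nearest I-frame at or before the requested time.
// Assumes the index is sorted by time (it is, if written during encode).
const IndexEntry* findSeekPoint(const std::vector<IndexEntry>& index,
                                double requestedTime) {
    if (index.empty() || requestedTime < index.front().time)
        return nullptr;
    // upper_bound gives the first entry strictly after requestedTime;
    // the entry before it is the last I-frame at or before that time.
    auto it = std::upper_bound(
        index.begin(), index.end(), requestedTime,
        [](double t, const IndexEntry& e) { return t < e.time; });
    return &*(it - 1);
}
```

With a closed 15-frame GOP at 25 fps, I-frames land every 0.6 s, so the returned entry is never more than 0.6 s before the requested time.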
I'm not sure how to seek using the FFmpeg library when playing back files.
Has anyone done anything similar and if so, can you give a brief explanation of how you did it?
I am using the Videogular2 library within my Ionic 3 application. A major feature of the application is the ability to seek to different places within a video.
I noticed that some formats have very quick seek response, while others take seconds to get there, even if the video is in the buffer already - I assume this may depend on the decoding process being used.
What would the best compromise be in order to speed up seek time while still keeping the file size reasonably small so that the video can be streamed from a server?
EDIT
Well, I learned that the fact that my video was recorded in the mov format caused the seek delays. Transcoding it afterwards didn't help; the damage must have been done already. After screen-capturing the video and encoding it as a regular mp4, seeking happens almost instantaneously.
What would the best compromise be in order to speed up seek time while
still keeping the file size reasonably small so that the video can be
streamed from a server?
Decrease the key-frame distance when encoding the video. This lets the player rebuild a full frame sooner, with less data to scan, depending on the codec.
This will increase the file size if you use the same quality parameters, so the compromise is to reduce quality at the same time.
The actual effect depends on the codec itself, how it builds intermediate frames, and how it is supported/implemented in the browser, together with the general loading/caching strategy (you can control some of the latter via Media Source Extensions).
I have a char* array of binary data.
It is a binary media stream encoded with H.264.
It has the following structure: ...
stream_header is a 64-byte struct.
I've already done a reinterpret_cast on the first 64 bytes of the stream and can successfully read all the header data. The header contains an nLength variable, which tells us how many bytes of media data are in the following stream_data.
For example 1024 bytes.
I read the next 1024 bytes into a char* data array, and here my question begins: how can I get a set of video frames from this data (the header gives me the resolution of these frames) and save them as *.jpg files (1.jpg, 2.jpg, 3.jpg, ...)?
Has anyone already done something similar? Any help is appreciated.
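To show what I mean, here is a sketch of how I walk the stream. The header layout below is just an assumption for illustration (only nLength comes from the camera documentation; the other fields and offsets are made up), and memcpy is used instead of reinterpret_cast to avoid alignment issues:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: the real 64-byte stream_header comes from the
// device documentation; only nLength matters for walking the stream.
#pragma pack(push, 1)
struct stream_header {
    uint32_t nLength;       // bytes of media data that follow this header
    uint16_t width;         // assumed resolution fields
    uint16_t height;
    uint8_t  reserved[56];  // pad the struct to 64 bytes
};
#pragma pack(pop)
static_assert(sizeof(stream_header) == 64, "header must be 64 bytes");

// Split the raw byte buffer into payload records, one per stream_data.
// memcpy avoids the alignment/aliasing pitfalls of reinterpret_cast.
std::vector<std::vector<uint8_t>> splitStream(const uint8_t* data,
                                              size_t size) {
    std::vector<std::vector<uint8_t>> payloads;
    size_t pos = 0;
    while (pos + sizeof(stream_header) <= size) {
        stream_header hdr;
        std::memcpy(&hdr, data + pos, sizeof(hdr));
        pos += sizeof(hdr);
        if (pos + hdr.nLength > size)
            break; // truncated record
        payloads.emplace_back(data + pos, data + pos + hdr.nLength);
        pos += hdr.nLength;
    }
    return payloads;
}
```

The payloads are still compressed H.264 data; to get JPEGs out of them they have to go through a decoder first.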
You need an H.264 decoder library; the best option is FFmpeg.
But even then it's a bit complicated to use the library, although decoding is simpler since you have fewer options to worry about.
Do you really need to do this in a program? It's very simple to use the ffmpeg executable to save a video as JPEGs.
If you just want to get a sequence of JPEGs from a video file, GStreamer can do that among many other things.
If you want to write code from scratch to convert H.264 video into JPEGs, let me warn you that you have many hundreds of pages of specification documents and some very serious mathematics to understand and then implement. It would be months of work for a reasonably skilled programmer/mathematician. Understanding the MP4 format is the easy part; the video compression will blow your mind.
I'm just trying to read a video stream from an IP camera (Basler BIP-1280c).
The stream I want is saved in a buffer on the camera, has a length of 40 seconds, and is encoded as MJPEG.
Now, if I access the stream via my web browser, it shows me the 40 seconds without any problems.
But actually I need an application which is capable of downloading and saving the stream by itself.
The camera is accessed via http, so I am using libcurl to access it. This works fine and I can also download the stream without any trouble. I have chosen to save the stream data into an *.avi file (hope that's correct…?).
But now to the problem: I can open the video (tried with Totem Video Player and VLC) and also view everything that has been recorded, BUT it's way too fast. The whole video lasts about 5 seconds (instead of 40). Does MJPEG have a header where information like the total video length or the fps can be stored? I mean, there must be some information missing for the video players, so that they play it way too fast.
Update:
As suggested in the answers, I opened the file with a hex editor, and what I found was this:
--myboundary..Content-Type: image/jpeg..Content-Length: 39050.........*Exif..II*...............V...........................2...................0210................FrameNr=000398732
6.AOI=(0800x0720)#(0240,0060)/(1280x0720).Motion=00000 (no)
[00000 | 00000 | 00000 | 00000 | 00000].Alarm=0000 (no) .IO
=000.RtTrigger=0...Basler..BIP2-1280c..1970:01:05 23:08:10.8
98286......JFIF.................................. ....&"((
This header recurs throughout the file (each time followed by a lot of bytes of binary data). This is actually okay, since I read in the camera manual that all MJPEG pictures get this header.
More interesting is the JFIF in the last line. As suggested in the answers, this may be the indicator of the file format. But AFAIK, JFIF is a single-picture format, just like jpg. So does this mean that the whole video file is just some "brainless" chained pictures, and my player just assumes that it should show these pictures one after another, without any knowledge of the frame rate?
There is not a single format to use with MJPEG. From Wikipedia:
[...] there is no document that defines a single exact format that is
universally recognized as a complete specification of “Motion JPEG”
for use in all contexts.
The formats differ by vendor. My advice would be to closely inspect the file you download. Check whether it is really an AVI container. (Some cameras send the frames wrapped in a MIME container.)
Once the container format is clear, you can check the documentation of that container and look at a file which has that format and the desired fps. Then you can start adjusting your downloaded file to have the desired effect.
You might also find this project useful: http://mjpeg.sourceforge.net/
Edit:
According to your sample data, your camera sends the frames packed into a MIME container. (The first line is the boundary, then the headers until you encounter an empty line, then the file data itself, followed by the boundary, and so on.)
These are JPEG files as the header suggests: image/jpeg. JFIF is the standard file format to store JPEG data.
I recommend you to:
Extract the contents of the file into multiple jpeg files (with munpack for instance), then
use ffmpeg or mplayer to create a movie file out of the series of jpegs.
This way you can specify the desired frame rate too.
It can make things more complicated if the camera dynamically changes the AOI (area of interest), meaning it can send only a smaller part of the image where change occurred. But you should check first whether the simple approach works.
On un*x systems (Linux, OS X, ...), you can use the file command-line tool to make a (usually good) guess about the file format.
--myboundary is an indication that the stream is regular M-JPEG streamed as multipart content over HTTP. There is no well known file format which can hold this stream "as is" and be playable (that is if you rename this to AVI it is not supposed to play back).
The format itself is a sequence of (boundary, subheader, JPEG image) triplets, repeated over and over. The stream does not have timestamps, so playback speed depends entirely on the player.
I'm trying to get ffmpeg to seek H.264 interlaced videos, and I found that I can seek to any frame if I just force it.
I already hacked the decoder to consider I-frames as keyframes, and it works nicely with the videos I need it to work with. And there will NEVER be any videos encoded with different encoders.
However, I'd like the seek to find me an I-frame and not just any frame.
What I'd need to do is hack the AVIndexEntry creation so that it marks any frame that is an I-frame as a keyframe.
Or alternatively, hack the search to return I-frames.
The code gets a tad difficult to follow at this point.
Can someone please point me at the correct place in the ffmpeg code which handles this?
This isn't possible as far as I can tell.
But if you do know where the I-frames are, either by decoding the entire video or by just knowing, you can insert entries into the AVIndexEntry information stored in the stream.
AVIndexEntry has a flag that tells whether an entry is a keyframe; just set it to true on I-frames.
Luckily, I happen to know where they are in my videos :)
-mika
The input data is a byte array which represents an H.264 frame. The frame consists of a single slice (not a multi-slice frame).
So, as I understand it, I can treat this frame as a slice. The slice has a header and slice data: macroblocks, each macroblock with its own header.
So I have to parse that byte array to extract the frame number, frame type, and quantisation coefficient (as I understand it, each macroblock has its own coefficient, or am I wrong?).
Could you advise me where I can get more detailed information about parsing H.264 frame bytes?
(In fact I've read the standard, but it wasn't very specific, and I'm lost.)
Thanks
The H.264 Standard is a bit hard to read, so here are some tips.
Read Annex B; make sure your input starts with a start code
Read section 9.1: you will need it for all of the following
Slice header is described in section 7.3.3
"Frame number" is not encoded explicitly in the slice header; frame_num is close to what you probably want.
"Frame type" probably corresponds to slice_type (the second value in the slice header, and the easiest to parse; you should definitely start with this one)
"Quantization coefficient" - do you mean "quantization parameter"? If yes, be prepared to write a full H.264 parser (or reuse an existing one). Look in section 9.3 to get an idea of the complexity of an H.264 parser.
The standard is very hard to read. You can try to analyze the source code of existing H.264 video stream decoding software such as ffmpeg, with its C (C99) libraries. For example, there is the avcodec_decode_video2 function documented here. You can get fully working C code (open a file, get the H.264 stream, iterate through frames, dump information, get the colorspace, save frames as raw PPM images, etc.) here. Alternatively, there is the great book "The H.264 Advanced Video Compression Standard", which explains the standard in "human language". Another option is to try the Elecard StreamEye Pro software (there is a trial version), which could give you some additional (visual) perspective.
Actually it is much better and easier (in my opinion) to read the H.264 video coding documentation.
ffmpeg is a very good library, but it contains a lot of optimized code. It is better to look at the reference implementation of the H.264 codec and the official documentation.
http://iphome.hhi.de/suehring/tml/download/ - this is a link to the JM codec implementation.
Try to separate the levels of the decoding process, like the transport layer that contains NAL units (SPS, PPS, SEI, IDR, slice, etc.). Then you need to implement the VLC engine (mostly Exp-Golomb codes of range 0). Then comes the very difficult and powerful coder called CABAC (Context-Adaptive Binary Arithmetic Coding). It is quite a tricky task. The demuxing process is also complicated. You need to completely understand each of these modules.
Good luck.