MPEG backwards frame decoding using FFmpeg - C++

I have so-called "blocks" that store a sequence of MPEG-4 frames (I, P, P, P, P, ...).
Every "block" starts with an "I" frame and ends just before the next "I" frame.
(The VOL - "visual_object_sequence_start_code" - is always included before the "I" frame.)
I need to be able to play those "block" frames in "backwards" mode.
The tricky part is this:
It's not possible to just take the last frame in my block and decode it, because it's a "P" frame and it needs an intra frame ("I") to be decoded correctly.
I also can't just take my first "I" frame, pass it to FFmpeg's "avcodec_decode_video" function, and then pass only my last "P" frame, because that last "P" frame depends on the "P" frame before it, right? (Well, as far as I've tested this method, my last decoded "P" frame had artifacts.)
The way I'm currently performing backwards playing is to first decode all of my "block" frames to RGB and store them in memory (in most cases ~25 frames per block at most). But this method really requires a lot of memory, especially if the frame resolution is high.
And I have a feeling that this is not the right way to do this...
So I would like to ask: does anyone have any suggestions on how this "backwards" frame decoding/playing could be performed using FFmpeg?
Thanks

What you are looking at is really a research problem. To get a glimpse of the overall approach, look at the following papers:
Compressed-Domain Reverse Play of MPEG Video Streams, SPIE International Symposium on Voice, Video, and Data Communications, Boston, MA, November, 1998.
Reverse-play algorithm for MPEG video streaming
Manipulating Temporal Dependencies in Compressed Video Data with Applications to Compressed-Domain Processing of MPEG Video.
Essentially, the encoding is still based on key frames, but you can reverse the motion-compensation process to achieve the reverse flow. This is done by converting P frames into I frames on the fly. It does require looking ahead, but not that much more memory. You could possibly save the result as a new file and then feed it to a standard decoder with your reverse-play requirements.
However, this is very complex, and I have rarely seen software that does this in practice.

I do not think there is a way around starting from the I-frame and decoding all the P-frames, since each P-frame depends on the previous frame. The decoded frames can be saved to a file, or, with limited storage and extra CPU power, older decoded frames can be discarded and recomputed later.
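If you go the decode-and-buffer route, a minimal sketch with the libavcodec send/receive API could look like the following. The function and variable names are illustrative and error handling is omitted; keeping the frames in their native YUV format and converting to RGB only at display time roughly halves the memory cost compared to buffering RGB.

    // Sketch: decode one block (I frame + following P frames) forward,
    // buffer the decoded pictures, then hand them out back to front.
    // Assumes codecCtx is an opened AVCodecContext for the stream and
    // blockPackets holds the block's packets in decode order.
    #include <vector>
    extern "C" {
    #include <libavcodec/avcodec.h>
    }

    std::vector<AVFrame*> decode_block(AVCodecContext* codecCtx,
                                       const std::vector<AVPacket*>& blockPackets)
    {
        std::vector<AVFrame*> decoded;
        for (AVPacket* pkt : blockPackets) {
            if (avcodec_send_packet(codecCtx, pkt) < 0)
                break;
            for (;;) {
                AVFrame* frame = av_frame_alloc();
                if (avcodec_receive_frame(codecCtx, frame) < 0) {   // EAGAIN or EOF
                    av_frame_free(&frame);
                    break;
                }
                decoded.push_back(frame);   // still YUV; convert at display time
            }
        }
        return decoded;   // iterate from back() to front() for reverse play
    }

For very long blocks you could also keep only every Nth decoded frame as a checkpoint and re-decode the short stretch between checkpoints on demand, trading CPU for memory as described above.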
At the command level, you can convert input video to a series of images:
ffmpeg -i input_video output%4d.jpg
then reverse their order somehow and convert back to a video:
ffmpeg -r FRAME_RATE -i reverse_output%4d.jpg output_video
You may consider pre-processing, if it is an option.

Related

How to detect camera frame loss using Windows media API like Media Foundation or DirectShow?

I am writing an application for Windows that runs a CUDA accelerated HDR algorithm. I've set up an external image signal processor device that presents as a UVC device, and delivers 60 frames per second to the Windows machine over USB 3.0.
Every "even" frame is a more underexposed frame, and every "odd" frame is a more overexposed frame, which allows my CUDA code perform a modified Mertens exposure fusion algorithm to generate a high quality, high dynamic range image.
Very abstract example of Mertens exposure fusion algorithm here
My only problem is that I don't know how to tell when I'm missing frames, since the only camera API I have interfaced with on Windows (Media Foundation) doesn't make it obvious whether a frame I grab with IMFSourceReader::ReadSample is the frame that was received right after the last one I grabbed.
Is there any way that I can guarantee that I am not missing frames, or at least easily and reliably detect when I have, using a Windows available API like Media Foundation or DirectShow?
It wouldn't be such a big deal to miss a frame and then have to purposefully "skip" the next frame in order to grab the next overexposed or underexposed frame to pair with the last frame we grabbed, but I would need to know how many frames were actually missed since the last frame was grabbed.
Thanks!
There is the IAMDroppedFrames::GetNumDropped method in DirectShow, and chances are that it can be retrieved through Media Foundation as well (never tried - it is possibly obtainable with a method similar to this).
The GetNumDropped method retrieves the total number of frames that the filter has dropped since it started streaming.
However, I would question its reliability. With both of these APIs, the attribute that is more or less reliable is the time stamp of a frame. Capture devices can reduce the frame rate for several reasons, both external (such as low-light conditions) and internal (such as slow, blocking processing downstream in the pipeline). This makes it hard to distinguish odd from even frames, but the time stamp remains accurate, and you can apply frame-rate math to convert time stamps to frame indices.
In your scenario, however, I would rather detect large gaps in the frame times to identify a possible loss of continuity, and from there run an algorithm that compares the exposure of the next few consecutive frames to get back in sync with the under-/over-exposure pattern. That sounds like a more reliable way out.
After all, this exposure problem is most likely quite specific to the hardware you are using.
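For the frame-rate math, here is a small sketch (assuming a nominal 60 fps source; the names are illustrative) that turns the 100-ns time stamps returned by ReadSample into an estimate of how many frames were skipped:

    #include <windows.h>
    #include <cmath>

    // prevHns/currHns are consecutive sample times in 100-ns units,
    // as returned by IMFSourceReader::ReadSample or IMFSample::GetSampleTime.
    int frames_missed(LONGLONG prevHns, LONGLONG currHns, double fps = 60.0)
    {
        const double frameDurationHns = 10000000.0 / fps;       // ~166667 at 60 fps
        double gapInFrames = double(currHns - prevHns) / frameDurationHns;
        int missed = int(std::lround(gapInFrames)) - 1;         // 0 when the very next frame arrived
        return missed < 0 ? 0 : missed;
    }

An odd number of missed frames means the under-/over-exposure parity has flipped, so the next frame would have to be skipped to restore the pairing.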
Normally, MFSampleExtension_Discontinuity is there for this. When you use IMFSourceReader::ReadSample, check this attribute on the returned sample.
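A minimal sketch of that check (assuming `reader` is an already-configured IMFSourceReader; the attribute only signals that a gap occurred, not how long it was, so it combines well with the time-stamp math above):

    #include <mfapi.h>
    #include <mfreadwrite.h>

    void check_discontinuity(IMFSourceReader* reader)
    {
        DWORD streamIndex = 0, streamFlags = 0;
        LONGLONG timestamp = 0;
        IMFSample* sample = nullptr;
        HRESULT hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                                        &streamIndex, &streamFlags, &timestamp, &sample);
        if (SUCCEEDED(hr) && sample) {
            UINT32 discontinuity = 0;
            // Set on the first sample delivered after one or more samples were dropped.
            if (SUCCEEDED(sample->GetUINT32(MFSampleExtension_Discontinuity, &discontinuity))
                && discontinuity) {
                // A gap occurred before this sample; re-check the exposure parity here.
            }
            sample->Release();
        }
    }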

Modifying CISCO openh264 to take image frames and output compressed frames

Has anyone tried to modify the Cisco openh264 library to take JPEG images as input and compress them into P and I frames (output as individual frames, NOT a video), and similarly to modify the decoder to take compressed P and I frames and generate uncompressed frames?
I have a camera looking at a static scene and taking pictures (1280x720p) every 30 seconds. The scene is almost static. Currently I am using JPEG compression to compress each frame individually, which results in an image size of ~270 KB. This compressed frame is transferred over the internet to a storage server. Since there is very little motion in the scene, the "I" frame size should be very small (I think ~20-50 KB). So it would be much more cost-effective to transmit I frames over the internet instead of JPEG images.
Can anyone point me to a relevant project, or advise on how to proceed with this task?
You are describing exactly what a codec does. It takes images and compresses them. Their relationship in time is irrelevant to the compression step. The decoder then decides how to display them, or just writes them to disk. You don't need to modify openh264; what you want to do is exactly what it is designed to do.
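Roughly, the encode side would look like the sketch below: decode each JPEG to an I420 picture first, feed it to the encoder, and you get back one compressed frame per call, flagged as an IDR/I or P frame. The parameter values are illustrative assumptions for a 1280x720, nearly static source, not settings taken from the question.

    #include <cstring>
    #include <vector>
    #include "wels/codec_api.h"

    void encode_one_picture(ISVCEncoder* encoder, unsigned char* y,
                            unsigned char* u, unsigned char* v)
    {
        SSourcePicture pic;
        std::memset(&pic, 0, sizeof(pic));
        pic.iPicWidth    = 1280;
        pic.iPicHeight   = 720;
        pic.iColorFormat = videoFormatI420;
        pic.iStride[0] = 1280;
        pic.iStride[1] = pic.iStride[2] = 640;
        pic.pData[0] = y;  pic.pData[1] = u;  pic.pData[2] = v;

        SFrameBSInfo info;
        std::memset(&info, 0, sizeof(info));
        encoder->EncodeFrame(&pic, &info);
        // info.eFrameType says whether this came out as an IDR/I or P frame;
        // the NAL units in info.sLayerInfo[] are what you would ship to the server.
    }

    int main()
    {
        ISVCEncoder* encoder = nullptr;
        WelsCreateSVCEncoder(&encoder);

        SEncParamBase param;
        std::memset(&param, 0, sizeof(param));
        param.iUsageType     = CAMERA_VIDEO_REAL_TIME;
        param.iPicWidth      = 1280;
        param.iPicHeight     = 720;
        param.fMaxFrameRate  = 1.0f;      // one picture every 30 s is far below this
        param.iTargetBitrate = 100000;    // illustrative value
        encoder->Initialize(&param);

        // Placeholder I420 buffer; in practice this holds a decoded JPEG.
        std::vector<unsigned char> yuv(1280 * 720 * 3 / 2, 0);
        unsigned char* y = yuv.data();
        unsigned char* u = y + 1280 * 720;
        unsigned char* v = u + 1280 * 720 / 4;
        encode_one_picture(encoder, y, u, v);

        encoder->Uninitialize();
        WelsDestroySVCEncoder(encoder);
        return 0;
    }

The decode side works the same way through the library's decoder interface, so neither half needs modifying, as the answer says.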

How to insert a key frame (I-frame) into an H.264 video stream with the FFmpeg C++ API?

I have a real-time video stream and want to cut some video clips from it at accurate timestamps (pts).
When I receive an avpacket, I decode it, do something, and cache the avpacket. I don't want to re-encode all the avpackets; it costs CPU resources.
An H.264 stream contains many GOP structures; usually we should cut the video so that it begins at a key frame and ends at a key frame. Otherwise the first few frames in the video clip would display with errors.
Now I use av_write_frame to write the avpackets to a video file. But sometimes the GOP is very long, e.g. 250 frames, which is 8.3 s at 30 frames per second; the distance between two I-frames could be 250 frames. The video clip is short, and I don't want to add too many unused frames.
What should I do? I think I should insert an I-frame at the start position of the video clip. Could I change a P-frame into an I-frame?
Thanks for reading!
This is not possible in the generic case, but it may be in specific cases. Even then, there are no open-source/free tools to do this, and I am unaware of any commercial tools. The reason I say it is not possible in the generic case is that each frame can reference up to 16 other frames, so you cannot just replace a single frame; you will need to replace all referenced frames as well. Doing this will likely take almost as much CPU as re-encoding the whole GOP.
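For reference, the key-frame-aligned cut already described in the question (no re-encoding, just remuxing the cached packets) amounts to something like the sketch below; outCtx and the cached packet list are illustrative names, and the output format context is assumed to be opened with its header already written.

    extern "C" {
    #include <libavformat/avformat.h>
    }
    #include <vector>

    // Write the packets of one clip, starting at the first key frame at or
    // after clipStartPts and stopping after clipEndPts, without re-encoding.
    void write_clip(AVFormatContext* outCtx, const std::vector<AVPacket*>& cached,
                    int64_t clipStartPts, int64_t clipEndPts)
    {
        bool started = false;
        for (AVPacket* pkt : cached) {
            if (!started) {
                if (pkt->pts < clipStartPts || !(pkt->flags & AV_PKT_FLAG_KEY))
                    continue;           // leading P-frames would show artifacts
                started = true;
            }
            if (pkt->pts > clipEndPts)
                break;
            av_write_frame(outCtx, pkt);   // or av_interleaved_write_frame
        }
    }

If the clip must start exactly at clipStartPts rather than at the next key frame, the frames in between have to be decoded and re-encoded, which is exactly the cost warned about above.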

Can the mp3 or wav file format take advantage of repetitious sounds?

I want to store a number of sound fragments as MP3 or WAV files, but these fragments are each highly repetitive (a 10 second burst of tone for example). Are the MP3 or WAV file formats able to take advantage of this - i.e. is there a sound file equivalent of run-length encoding?
No, neither codec can do this.
WAV files (typically) use PCM, which holds a value for every single sample. Even if there were complete digital silence (all values the same), every sample is stored.
MP3 works in frames of 1,152 samples. Each frame stands alone (well, there is the bit reservoir, but for the purpose of encoding/decoding this is just extra bandwidth made available). Even if there were a way to say "do this n times", it would be confined within a frame. Now, if you are using MP3 with a variable bit rate, I suspect you will have great results with perfect sine waves, since they have no harmonics. MP3 works by converting from the time domain to the frequency domain; that is, it samples the frequencies in each frame. If you only have one of those frequencies (or no sound at all), the VBR method would be efficient.
I should note that FLAC does use RLE when encoding silence. However, I don't think FLAC could be hacked to use RLE for 10 seconds of audio, since again there is a frame border. FLAC's RLE for silence is problematic for live internet radio stations that leave a few-second gap in between songs. It's important for these stations to have a large buffer, since clients will often pause the stream if they don't receive enough data. (They do catch back up again, though, as soon as that silent block is sent and audio resumes.)

C/C++ library for seekable movie format

I'm doing some processing on some very large video files (often up to 16 MP), and I need a way to store these videos in a format that allows seeking to specific frames (rather than to times, as ffmpeg does). I was planning on just rolling my own format that concatenates all of the individually zlib-compressed frames together and then appends an index at the end that maps frame numbers to file byte offsets. Before I go about this, though, I just wanted to check to make sure I'm not duplicating the functionality of another format/library. Has anyone heard of a format/library that allows lossless compression and random access of videos?
The reason it is hard to seek to a specific frame in most video codecs is that most frames depend on another frame or frames, so frames must be decoded as a group. For this reason, most libraries will only let you seek to the closest I-frame (Intra-frame - independently decodable frame). To actually produce an image from a non-I-frame, data from other frames is required, so you have to decode a number of frames worth of data.
The only ways I have seen this problem solved involve creating an index of some kind on the file. In other words, make a pass through the file and create an index of which frame corresponds to a certain time or section of the file. Since the seeking functions of most libraries can only seek to an I-frame, you may have to seek to the closest preceding I-frame and then decode from there to the exact frame you want.
If space is not of high importance, I would suggest doing it like you say, but use JPEG compression instead of zlib as it will give you a lot higher compression ratio since it exploits the fact you are dealing with image data.
If space is an issue, P frames (depend on previous frame/frames) can greatly reduce the size of the file. I would not mess with B frames (depend on previous and future frame/frames) since they make it much harder to get things right.
I have solved the problem of seeking to a specific frame in the presence of B and P frames in the past by using ffmpeg (libavformat) to demux the video into packets (one frame's worth of data per packet) and concatenating these into a single file. The important thing is to keep an index into that file so you can find the packet bounds for a given frame. If the frame is an I-frame, you can just feed that frame's data into an ffmpeg decoder and it can be decoded. If the frame is a B or P frame, you have to go back to the last I-frame and decode forward from there. This can be quite tricky to get right, especially for B-frames, since they are often sent in a different order than they are displayed.
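A minimal sketch of such an index (field and function names are illustrative): record each packet's byte offset and whether it is an I-frame while writing the file, then seek by jumping to the nearest preceding I-frame and decoding forward.

    #include <cstdint>
    #include <vector>

    // One entry per frame, stored in the order the packets were written.
    struct FrameIndexEntry {
        uint64_t byteOffset;   // where this frame's packet starts in the file
        uint32_t byteSize;     // packet length in bytes
        bool     isKeyframe;   // true for I-frames (independently decodable)
    };

    // To display frame `target`, decode from the last key frame at or before it.
    size_t seek_start(const std::vector<FrameIndexEntry>& index, size_t target)
    {
        size_t start = target;
        while (start > 0 && !index[start].isKeyframe)
            --start;
        return start;   // decode frames [start .. target], display only `target`
    }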
Some formats allow you to change the number of key frames per second.
For example, I've used ffmpeg to encode to FLV at 25 frames per second with 25 key frames per second, and then used a player that had no trouble moving to key frames. Basically, this allowed me to do frame-by-frame seeking.
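With current ffmpeg builds that keyframe interval is usually set with the -g option (an assumption about your encoder, not the exact command from the answer; check your encoder's documentation), so an all-keyframe encode looks like:
ffmpeg -i input_video -g 1 output.flv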
Also, the last time I checked, QuickTime can do frame-by-frame seeking without requiring every frame to be a key frame.
This may not be applicable to you, but those are my thoughts.