How to control framerate of encoded data inside uridecodebin? - gstreamer

I'm using uridecodebin with multiple types of sources: rtsp, http, filesrc, etc. But I want to control the framerate at which frames are decoded, for example decode only 1 frame every 60 frames. I can put a plugin that changes the framerate after uridecodebin, but that only changes the framerate of already decoded frames (it drops them).
Maybe there is some element that autodetects the source element, which I can connect to decodebin? I found autovideosrc, but I don't understand how to use it. Any advice is appreciated.

Related

Grey Video frames when using OpenCV Videocapture with GStreamer C++

Hey,
I am new to GStreamer and want to send a video that is captured from a camera and manipulated with OpenCV over a network to the receiving part. The receiving part then reads it and displays it. This shall be done in real time. It basically works with the code/GStreamer settings below; however, as soon as a frame is dropped (at least I think this is the reason) the video gets corrupted in the form of grey parts (attached picture).
OpenCV Sending Part:
cv::VideoWriter videoTransmitter("appsrc ! videoconvert ! videoscale ! x264enc ! rtph264pay config-interval=1 pt=96 ! udpsink host=192.168.168.99 port=5000", cv::VideoWriter::fourcc('H', '2', '6', '4'), 10, videoTransmitter_imageSize, true);
OpenCV Receiving Part:
cv::VideoCapture videoReceiver("udpsrc port=5000 ! application/x-rtp ! rtpjitterbuffer ! rtph264depay ! avdec_h264 ! videoconvert ! appsink", cv::CAP_GSTREAMER);
It basically works, but I often get grey parts in the video which then stay for a bit until the video is displayed correctly again. I guess it happens whenever a frame is dropped during transmission. However, how can I get rid of these grey/corrupted frames? Any hints? Are there any GStreamer parameters I need to set to tune the result? Is there a better way to stream a video with OpenCV over a network?
Any help is appreciated!
No, there isn't any mechanism in GStreamer to detect corrupted frames, because at that level it doesn't really make sense.
In most modern video codecs, frames aren't sent in full anymore but split into slices (each covering only a small part of the frame). It can take multiple intra packets (each containing multiple slices) to build a complete frame, and this is a good thing: it makes your stream more resilient to errors and allows multithreaded decoding of the slices, for example.
In order to achieve what you want, you have multiple solutions:
1. Use RTP/RTCP instead of RTP over UDP only. RTP at least contains a sequence number and "end of frame" markers, so it is possible to detect some packet drops. GStreamer doesn't care about those by default unless you have started an RTP/RTCP session. If you set up a session with RTCP, you can get reports when packets were dropped. I'm not sure there is a pipeline-only way to be informed when a packet is dropped, so you might still have to write an appsink in your GStreamer pipeline and add some code to detect this event. However, this will tell you that something is wrong, but not how wrong it is or when it's OK to resume. In GStreamer speak this is called an RTPSession, and you're interested in the stats::XXX_nack_count properties.
2. Add an additional protocol that computes a checksum of the encoder's output frame/NAL/packet and transmits it out of band. Make sure the decoder also computes the checksum of each incoming frame/NAL/packet; if it doesn't match, you'll know decoding will fail. Beware of packet/frame reordering (typically B-frames are re-ordered after their dependencies), which could disturb your algorithm. Again, you have no way to know when to resume after an error. Using TCP instead of UDP might be enough to fix it if you only have partial packet drops, but it will fail to recover if it's a bandwidth issue (if the video bandwidth exceeds the network bandwidth, the stream will collapse, since TCP can't drop packets to adapt).
3. Use an intra-only video codec (like APNG, or JPEG). JPEG can also be partially decoded, but GStreamer's default software JPEG decoder doesn't output a partial JPEG frame.
4. Set a closed, shorter GOP in your encoder. Many encoders have a GOP (group of pictures) parameter; count the frames in your decoder when decoding after an error. A GOP guarantees that, whatever the state of the encoding, after GOP frames the encoder will emit a non-dependent group of frames (typically enough intra frames/slices to rebuild the complete picture). This allows resuming after an error by dropping GOP - 1 frames (you must decode them, but you can't use them, since they might be corrupted); you'll still need a way to detect the error, see point 1 or 2 above. For x264enc the parameter is called key-int-max. You might also want to try intra-refresh=true so the broken-frame effect after an error is shorter (see the pipeline sketch after this list). The downside is increased bandwidth for the same video quality.
5. Use a video codec with scalable video coding (SVC instead of AVC, for example). In that case, on a decoding error you'll get lower quality instead of a corrupted frame. I'm not aware of any free SVC encoder for GStreamer.
6. Deal with it. Compute a saturation map of the picture with OpenCV and compute its deviation and mean (a small OpenCV sketch follows this list). If it's very different from the previous picture, stop computation until a GOP has elapsed and the saturation is back to expected levels.
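To make point 4 concrete, this is roughly what the sender line from the question could look like with a short closed GOP. The values are illustrative only, and videoTransmitter_imageSize comes from the question's own code; only key-int-max and intra-refresh are the properties discussed above.

cv::VideoWriter videoTransmitter(
    "appsrc ! videoconvert ! videoscale "
    "! x264enc key-int-max=10 intra-refresh=true "   // short closed GOP, spread intra data over frames
    "! rtph264pay config-interval=1 pt=96 "
    "! udpsink host=192.168.168.99 port=5000",
    cv::VideoWriter::fourcc('H', '2', '6', '4'), 10, videoTransmitter_imageSize, true);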
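And here is a minimal sketch of the saturation check from point 6, assuming BGR frames as delivered by cv::VideoCapture; the function name and the threshold of 20 are arbitrary placeholders you would need to tune for your content.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <cmath>

// Returns true when the saturation statistics jump away from the previous
// frame's, which point 6 above suggests treating as a likely corrupted frame.
bool looksCorrupted(const cv::Mat &bgr, cv::Scalar &prevMean, cv::Scalar &prevStd) {
    cv::Mat hsv, channels[3];
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::split(hsv, channels);

    cv::Scalar mean, stddev;
    cv::meanStdDev(channels[1], mean, stddev);            // channel 1 = saturation

    bool suspect = std::abs(mean[0] - prevMean[0]) > 20.0 ||
                   std::abs(stddev[0] - prevStd[0]) > 20.0;
    prevMean = mean;
    prevStd = stddev;
    return suspect;
}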

How to insert a key frame (I-frame) into an h.264 video stream with the ffmpeg C++ API?

I have a real-time video stream and want to cut some video clips from it at accurate timestamps (pts).
When I receive an avpacket, I decode it, do something, and cache the avpacket. I don't want to re-encode all the avpackets; it costs CPU resources.
There are many GOP structures in an H.264 stream; usually we should cut the video starting at a key frame and ending at a key frame, otherwise the first few frames of the clip will display incorrectly.
Now I use av_write_frame to turn the avpackets into a video. But sometimes the GOP is very long, e.g. 250 frames, which is 8.3 s at 30 frames per second. That means the distance between two I-frames could be 250 frames. The video clip is short, and I don't want to add too many unused frames.
What should I do? I think I should insert an I-frame at the start position of the video clip. Could I change a P-frame into an I-frame?
Thanks for reading!
This is not possible in the generic case, but may be in specific cases. Even then, there are no open source/free tools to do this, and I am unaware of any commercial tools. The reason I say it is not possible in the generic case is that each frame can reference up to 16 other frames, so you cannot just replace a single frame; you will need to replace all referenced frames. Doing this would likely take almost as much CPU as re-encoding the whole GOP.

Capturing an AVI video with DirectShow

I'm trying to capture an AVI video using the DirectShow AVIMux and File Writer filters.
When I connect a SampleGrabber filter instead of the AVIMux, I can clearly see that the stream is 30 fps; however, upon capturing the video, each frame is duplicated 4 times and I get 120 frames instead of 30. The movie is 4 times slower than it should be, and only the first frame in each set of 4 is a key frame.
I tried the same experiment at 8 fps, and for each image I received I had 15 frames in the video. In the case of 15 fps, I got each frame 8 times.
I tried both writing the code in C++ and testing it with Graph Edit Plus.
Is there any way I can control it? Maybe some restrictions on the AVIMux filter?
You don't specify your capture format, which could have some bearing on the problem, but generally it sounds like the graph, when writing to file, has some bottleneck which prevents the stream from continuing to flow at 30 fps. The camera is attempting to produce frames at 30 fps, and it will do so as long as buffers are recycled for it to fill.
But here the buffers aren't available, because the file writer is busy getting them onto disk. The capture filter is starved, and in this situation it increments the "dropped frame" counter which travels with each captured frame. AVIMux uses this count to insert an indicator into the AVI file which says, in effect, "a frame should have been available here to write to file, but isn't; at playback time, repeat the last frame". So the file should have placeholders for 30 frames per second: some filled with actual frames, and some "dropped frames".
Also, you don't mention whether you're muxing in audio, which would act as a reference clock for the graph to maintain audio-video sync. When capture completes, if an audio stream is also in use, AVIMux alters the framerate of the video stream to make the durations of the two streams equal. You can check whether AVIMux has altered the framerate of the video stream by dumping the AVI file header (or maybe right-click the file in Explorer and look at its properties); a small sketch of such a check follows.
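If you don't have a header-dumping tool at hand, a quick-and-dirty sketch along these lines can print the values AVIMux wrote. It naively scans the first part of the file for the 'avih' main header instead of properly walking the RIFF tree, so treat it only as a sanity check, not as a robust parser.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    std::vector<unsigned char> buf(64 * 1024);            // the AVI header sits near the start
    size_t n = fread(buf.data(), 1, buf.size(), f);
    fclose(f);

    for (size_t i = 0; i + 8 + 40 <= n; ++i) {
        if (memcmp(&buf[i], "avih", 4) == 0) {
            const unsigned char *h = &buf[i + 8];          // skip fourcc + chunk size
            uint32_t usPerFrame, totalFrames;
            memcpy(&usPerFrame, h, 4);                     // dwMicroSecPerFrame
            memcpy(&totalFrames, h + 16, 4);               // dwTotalFrames
            printf("fps ~= %.2f, total frames = %u\n",
                   1000000.0 / usPerFrame, totalFrames);
            return 0;
        }
    }
    printf("no avih header found\n");
    return 1;
}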
If I had to hazard a guess at the root of the problem, I'd wager the capture driver has a bug in calculating the dropped-frame count, which is in turn messing up AVIMux. Does this happen with a different camera?

Using Async_reader and Wave Parser in DirectShow filter graph results in video seeking issues

Some background:
I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. It essentially outputs a set of bitmaps, each of which can last for a long time (for example 30 seconds), to a video. I have set up a filter graph which uses an Async_reader with a Wave Parser for the audio, and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
The problem:
When I attempt to seek through the resulting video, I have to wait until a bitmap's start time occurs before it is displayed. For example, if I'm currently seeing bitmap 4 and skip back to the time at which bitmap 2 is displayed, the video output will not change until the third bitmap starts. Initially I wondered whether I wasn't allowing FillBuffer to be called often enough (at the moment it's only called once per bitmap); however, I have since noted that when the audio track is very short (just a second long, perhaps), I can seek through the video as expected. Is there another way I should be introducing audio into the filter graph? Do I need to perform some kind of indexing once the WMV has been rendered? I'm at a bit of a loss...
You may need to do indexing as a post-processing step. Try indexing the file with Windows Media File Editor from the Windows Media Encoder SDK and see if this improves seeking.
Reducing the key frame interval in the encoder profile may also improve seeking. This can be done in Windows Media Profile Editor from the same SDK. Note that this will increase the file size.

C/C++ library for seekable movie format

I'm doing some processing on some very large video files (often up to 16MP), and I need a way to store these videos in a format that allows seeking to specific frames (rather than to times, as ffmpeg does). I was planning on just rolling my own format that concatenates all of the individually zlib-compressed frames together and then appends an index at the end that maps frame numbers to file byte offsets. Before I go ahead with this, I just wanted to check that I'm not duplicating the functionality of another format/library. Has anyone heard of a format/library that allows lossless compression and random access of videos?
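To make the layout described above concrete, here is a minimal writer sketch for such a format: per-frame zlib compression, frames written back to back, and a trailing offset table. The function name and the exact on-disk field layout are just illustrative assumptions, and a reader would still need the frame dimensions stored elsewhere to size its decompression buffer.

#include <cstdint>
#include <cstdio>
#include <vector>
#include <zlib.h>

// Writes zlib-compressed frames back to back, then an offset table and a
// frame count at the end of the file, so a reader can locate the table from
// the end, look up frame i, and fseek() straight to its compressed data.
void write_indexed_video(const char *path,
                         const std::vector<std::vector<uint8_t>> &frames) {
    FILE *f = fopen(path, "wb");
    std::vector<uint64_t> offsets;

    for (const auto &frame : frames) {
        offsets.push_back(static_cast<uint64_t>(ftell(f)));    // where this frame starts
        uLongf destLen = compressBound(frame.size());
        std::vector<Bytef> dest(destLen);
        compress2(dest.data(), &destLen, frame.data(), frame.size(), Z_BEST_SPEED);
        uint64_t compressedSize = destLen;
        fwrite(&compressedSize, sizeof(compressedSize), 1, f); // per-frame size header
        fwrite(dest.data(), 1, destLen, f);
    }

    uint64_t count = offsets.size();
    fwrite(offsets.data(), sizeof(uint64_t), offsets.size(), f);
    fwrite(&count, sizeof(count), 1, f);                        // index is found from EOF
    fclose(f);
}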
The reason it is hard to seek to a specific frame in most video codecs is that most frames depend on another frame or frames, so frames must be decoded as a group. For this reason, most libraries will only let you seek to the closest I-frame (intra frame: an independently decodable frame). To actually produce an image from a non-I-frame, data from other frames is required, so you have to decode a number of frames' worth of data.
The only ways I have seen this problem solved involve creating an index of some kind on the file. In other words, make a pass through the file and create an index of which frame corresponds to a certain time or section of the file. Since the seeking functions of most libraries can only seek to an I-frame, you may have to seek to the closest I-frame and then decode from there to the exact frame you want.
If space is not of high importance, I would suggest doing it as you say, but use JPEG compression instead of zlib, as it will give you a much higher compression ratio since it exploits the fact that you are dealing with image data.
If space is an issue, P-frames (which depend on previous frames) can greatly reduce the size of the file. I would not mess with B-frames (which depend on both previous and future frames), since they make it much harder to get things right.
I have solved the problem of seeking to a specific frame in the presence of B- and P-frames in the past by using ffmpeg (libavformat) to demux the video into packets (one frame's worth of data per packet) and concatenating these into a single file. The important thing is to keep an index into that file so you can find the packet bounds for a given frame. If the frame is an I-frame, you can just feed that frame's data into an ffmpeg decoder and it can be decoded. If the frame is a B- or P-frame, you have to go back to the last I-frame and decode forward from there. This can be quite tricky to get right, especially for B-frames, since they are often sent in a different order from the one in which they are displayed. A rough sketch of this seek-back-and-decode-forward pattern follows.
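For reference, the general "seek to the previous I-frame, then decode forward" pattern described above looks roughly like this with a reasonably recent FFmpeg API. This is only a sketch of the idea, not the indexing scheme from the answer itself; error handling and cleanup are omitted, and all calls are standard libavformat/libavcodec functions.

extern "C" {
#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
}

// Decode and return the first frame whose timestamp reaches target_pts
// (expressed in the video stream's time base).
AVFrame *decode_exact_frame(const char *path, int64_t target_pts) {
    AVFormatContext *fmt = nullptr;
    avformat_open_input(&fmt, path, nullptr, nullptr);
    avformat_find_stream_info(fmt, nullptr);

    int vstream = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
    const AVCodec *codec = avcodec_find_decoder(fmt->streams[vstream]->codecpar->codec_id);
    AVCodecContext *dec = avcodec_alloc_context3(codec);
    avcodec_parameters_to_context(dec, fmt->streams[vstream]->codecpar);
    avcodec_open2(dec, codec, nullptr);

    // Jump to the last keyframe at or before target_pts, then decode forward.
    av_seek_frame(fmt, vstream, target_pts, AVSEEK_FLAG_BACKWARD);
    avcodec_flush_buffers(dec);

    AVPacket *pkt = av_packet_alloc();
    AVFrame *frame = av_frame_alloc();
    while (av_read_frame(fmt, pkt) >= 0) {
        if (pkt->stream_index == vstream) {
            avcodec_send_packet(dec, pkt);
            while (avcodec_receive_frame(dec, frame) == 0) {
                if (frame->best_effort_timestamp >= target_pts)
                    return frame;              // caller owns the frame; contexts leak in this sketch
            }
        }
        av_packet_unref(pkt);
    }
    return nullptr;
}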
Some formats allow you to change the number of key frames per second.
For example, I've used ffmpeg to encode to FLV at 25 frames per second with 25 key frames per second, and then used a player that had no trouble moving between key frames. Basically, this allowed me to do frame-by-frame seeking.
Also, the last time I checked, QuickTime can do frame-by-frame seeking without every frame having to be a key frame.
This may not be applicable to you, but those are my thoughts.