I decode an RTP H.264 stream and display it on the screen. In a parallel thread, recording to an MP4 file is sometimes performed; during recording I also mux sound into the file through mp4mux. Separately, sound and video are written perfectly, but as soon as I combine them, a problem appears: the first few seconds of the video are a black screen, although there is sound. Apart from that, sound and video are in sync. How can I solve this problem? Thanks in advance.
Video has a higher latency than audio, which is why you get audio sooner. You would need to trim the file afterwards if you don't want that, or add some logic to your code that drops all audio until the first video frame has been decoded.
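The second approach can be sketched with plain timestamped records (the Buffer struct below is a made-up stand-in, not the GStreamer API; in a real pipeline you would apply this logic in a pad probe or appsink callback before the buffers reach mp4mux):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical buffer record standing in for a GStreamer buffer:
// only the stream type and presentation timestamp matter here.
enum class Stream { Audio, Video };
struct Buffer { Stream stream; int64_t pts_ms; };

// Drop every audio buffer that arrives before the first video buffer,
// so the muxed file starts with both streams present.
std::vector<Buffer> drop_leading_audio(const std::vector<Buffer>& in) {
    std::vector<Buffer> out;
    bool video_seen = false;
    for (const Buffer& b : in) {
        if (b.stream == Stream::Video) video_seen = true;
        if (b.stream == Stream::Audio && !video_seen) continue; // drop it
        out.push_back(b);
    }
    return out;
}
```

This trades the leading black video for a slightly shorter audio track, which is usually the less noticeable artifact.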
Hey,
I am new to GStreamer and want to send video that is captured from a camera and manipulated with OpenCV over a network to a receiver, which reads and displays it. This should happen in real time. It basically works with the code/GStreamer settings below, but as soon as a frame is dropped (at least I think that is the reason) the video gets corrupted in the form of grey areas (see attached picture).
OpenCV Sending Part:
cv::VideoWriter videoTransmitter("appsrc ! videoconvert ! videoscale ! x264enc ! rtph264pay config-interval=1 pt=96 ! udpsink host=192.168.168.99 port=5000", cv::VideoWriter::fourcc('H', '2', '6', '4'), 10, videoTransmitter_imageSize, true);
OpenCV Receiving part:
cv::VideoCapture videoReceiver("udpsrc port=5000 ! application/x-rtp ! rtpjitterbuffer ! rtph264depay ! avdec_h264 ! videoconvert ! appsink", cv::CAP_GSTREAMER);
It basically works, but I often get grey areas in the video which then persist for a while until the picture is displayed correctly again. I guess this happens whenever a frame is dropped in transmission. How can I get rid of these grey/corrupted frames? Any hints? Are there GStreamer parameters I should tune? Is there a better way to stream video with OpenCV over a network?
Any help is appreciated!
No, there isn't any mechanism in GStreamer to detect corrupted frames, because that wouldn't make sense.
In most modern video codecs, frames aren't sent in full anymore but split into slices (each covering only a small part of the frame). It can take multiple intra packets (each containing multiple slices) to build a complete frame, and this is a good thing: it makes your stream more resilient to errors and allows multithreaded decoding of the slices, for example.
To achieve what you want, you have several options:
Use RTP/RTCP instead of RTP over plain UDP. RTP at least contains a sequence number and an end-of-frame marker, so it is possible to detect some packet drops. GStreamer doesn't act on those by default unless you have started an RTP/RTCP session; if you set up a session with RTCP, you can get reports when packets are dropped. I'm not sure there is a pipeline-only way to be informed when a packet is dropped, so you might still have to write an appsink in your GStreamer pipeline and add code to detect this event. Note that this tells you something is wrong, but neither how wrong it is nor when it is safe to resume. In GStreamer terms this is called RTPSession, and you're interested in the stats::XXX_nack_count properties.
Add an additional protocol: compute a checksum of the encoder's output frame/NAL/packet and transmit it out of band. Have the decoder compute the checksum of each incoming frame/NAL/packet as well; if it doesn't match, you know decoding will fail. Beware of packet/frame reordering (typically B-frames are reordered after their dependencies), which could disturb your algorithm. Again, you have no way to know when to resume after an error. Using TCP instead of UDP might be enough to fix things if you only have partial packet loss, but it will fail to recover from a bandwidth problem (if the video bitrate exceeds the network bandwidth, the stream collapses, since TCP can't drop packets to adapt).
Use an intra-only video codec (like APNG or JPEG). JPEG can also be decoded partially, but GStreamer's default software JPEG decoder doesn't output partial frames.
Set a closed, shorter GOP in your encoder. Many encoders have a GOP (group of pictures) parameter; count frames in your decoder when decoding after an error. A closed GOP ensures that, whatever the state of the encoding, after GOP frames the encoder will emit a non-dependent group of frames (enough intra frames/slices to rebuild the complete picture). This lets you resume after an error by dropping GOP - 1 frames (you must still decode them, but you can't use them, as they might be corrupted); you still need a way to detect the error, see points 1 and 2 above. For x264enc the parameter is called key-int-max. You might also try intra-refresh=true so that the broken-frame effect after an error is shorter. The downside is increased bandwidth for the same video quality.
Use a video codec with scalable video coding (SVC rather than AVC, for example). In that case a decoding error gives you lower quality instead of a corrupted frame. However, I'm not aware of any free SVC encoder in GStreamer.
Deal with it. Compute a saturation map of the picture with OpenCV along with its mean and deviation. If they differ strongly from the previous picture's, stop processing until the GOP has elapsed and the saturation is back to expected levels.
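For the closed-GOP option, only the x264enc element of the sender pipeline from the question needs to change. A sketch (key-int-max=30 and intra-refresh=true are illustrative values, not tested settings; tune them against your frame rate and bandwidth budget):

```
appsrc ! videoconvert ! videoscale
  ! x264enc key-int-max=30 intra-refresh=true
  ! rtph264pay config-interval=1 pt=96
  ! udpsink host=192.168.168.99 port=5000
```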
I have a DirectShow filter graph that runs forever without ever stopping. But when I change the graph's source to another video file, synchronization between the audio and video streams fails.
It happens because some audio frames haven't been played yet. How can I tell the graph to flush the audio buffer?
When you stop the filter graph, the data is flushed unconditionally.
Without stopping, you can remove buffered data by calling the respective input pin's IPin::BeginFlush and IPin::EndFlush methods (the first one, then the second immediately afterwards). This does not have to be the renderer's input pin; you want to call the upstream audio pin so that the flush propagates downstream and drains everything up to the renderer.
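The call order can be sketched as follows (IPinLike is a minimal stand-in invented for illustration; the real IPin interface comes from DirectShow's strmif.h and its methods return HRESULT, with error handling you should not skip):

```cpp
// Minimal stand-in for DirectShow's IPin, just enough to show the
// BeginFlush/EndFlush pairing; not the real COM interface.
struct IPinLike {
    bool in_flush = false;
    int  flushes_completed = 0;
    long BeginFlush() { in_flush = true; return 0; }
    long EndFlush() {
        if (in_flush) { in_flush = false; ++flushes_completed; }
        return 0;
    }
};

// Flush buffered data without stopping the graph: call BeginFlush on the
// upstream audio pin, then EndFlush immediately afterwards, so everything
// down to the renderer is drained.
void FlushPin(IPinLike& upstream_audio_pin) {
    upstream_audio_pin.BeginFlush();
    upstream_audio_pin.EndFlush();
}
```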
I'm trying to capture an AVI video, using DirectShow AVIMux and FileWriter Filters.
When I connect a SampleGrabber filter instead of the AVIMux, I can clearly see that the stream is 30 fps; however, in the captured video each frame is duplicated 4 times and I get 120 frames instead of 30. The movie is 4 times slower than it should be, and only the first frame in each set of 4 is a keyframe.
I tried the same experiment at 8 fps and got 15 frames in the video for each image I received; at 15 fps, I got each frame 8 times.
I tried both writing the code in C++ and testing it with Graph Edit Plus.
Is there any way I can control this? Maybe some restriction on the AVIMux filter?
You don't specify your capture format, which could have some bearing on the problem, but generally it sounds like the graph, when writing to file, has a bottleneck that prevents the stream from flowing at 30 fps. The camera attempts to produce frames at 30 fps, and it will do so as long as buffers are recycled for it to fill.
But here the buffers aren't available, because the file writer is busy getting them onto the disk. The capture filter is starved, and in this situation it increments the "dropped frame" counter that travels with each captured frame. AVIMux uses this count to insert an indicator into the AVI file which says, in effect, "a frame should have been available here to write to file, but isn't; at playback time repeat the last frame". So the file should have placeholders for 30 frames per second: some filled with actual frames, and some marked as dropped.
Also, you don't mention whether you're muxing in audio, which would act as the reference clock for the graph to maintain audio-video sync. When capture completes, if an audio stream is also present, AVIMux alters the frame rate of the video stream to make the durations of the two streams equal. You can check whether AVIMux has altered the frame rate by dumping the AVI file header (or right-click the file in Explorer and look at its properties).
If I had to hazard a guess as to the root of the problem, I'd wager the capture driver has a bug in calculating the dropped frame count which is in turn messing up AVIMux. Does this happen with a different camera?
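For what it's worth, the three experiments in the question are internally consistent with this placeholder explanation: in each case the capture rate times the duplication factor is 120, which suggests the file's frame index is being written at 120 fps and padded with "repeat last frame" entries. A trivial check:

```cpp
// If the AVI index runs at file_fps but real frames arrive at capture_fps,
// each real frame is followed by (file_fps / capture_fps - 1) placeholders,
// i.e. it appears file_fps / capture_fps times on playback.
// (Assumes capture_fps divides file_fps evenly, as in the question.)
int repeats_for(int file_fps, int capture_fps) {
    return file_fps / capture_fps;
}
```

All three observations (30 fps giving 4 copies, 8 fps giving 15, 15 fps giving 8) fit a 120 fps file index, so checking the frame rate recorded in the AVI header would be a good first step.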
I built a DirectShow graph consisting of my video capture filter (grabbing the screen) and the default audio input filter, both connected through a splitter to the WM ASF Writer output filter and to a VMR9 renderer. That is, I want real-time audio/video encoding to disk together with a preview. The problem is that no matter which WM profile I choose (even a very low-resolution one), the output video file always jitters: every few frames there is a delay. The audio is fine, with no jitter at all. CPU usage is low (< 10%), so I don't believe this is a lack of CPU resources. I think I'm timestamping my frames correctly.
What could be the reason?
Below is a link to a recorded video demonstrating the problem:
http://www.youtube.com/watch?v=b71iK-wG0zU
Thanks
Dominik Tomczak
I have had this problem in the past. Your problem is the volume of data being written to disk; writing to a faster drive is a simple and effective solution. The other thing I've done is place a video compressor into the graph. You need to make sure both input streams use the same reference clock. I have had a lot of problems keeping a good preview with this compressor scheme: my preview's frame rate dies even if I use an Infinite Tee rather than a Smart Tee, though the result written to disk is fine. It's also worth noting that the more powerful the machine, the less of an issue this was, so the compressor may not provide much of a win over simply putting a faster hard disk in the machine.
I don't think this is the issue. The volume of data written is less than 1 MB/s (the average rate after compression). I found the reason: when I build the graph without audio input (the WM ASF Writer has only a video input pin) and my video capture pin is connected through a Smart Tee to the preview pin and to the WM ASF Writer video input pin, there is no glitch in the output movie. I reckon the problem is audio-to-video synchronization in my graph. The same happens when I build the graph in GraphEdit: without audio, no glitch; with audio, a glitch every second. I wonder whether I'm timestamping my frames wrongly, but I think I'm doing it correctly. What is the general approach to audio-to-video synchronization in DirectShow graphs?
I have two dump files of raw video and raw audio from an encoder, and I want to be able to measure the lip-sync. Imagine a video of a hammer striking an anvil: I want to go frame by frame and see that when the hammer finally hits the anvil, there is a spike in amplitude on the audio track.
Because of the speed at which everything happens, I cannot merely listen to the audio; I need to see the waveform in the time domain.
Are there any tools out there that will let me see both the video and audio?
If you are concerned with validating a decoder, then from a validation perspective the goal is generally to check audio and video PTS values against a common real-time clock.
Raw YUV and PCM files do not include timestamps. If you know the frame rate and sample rate, you can use a raw YUV file viewer (I wrote my own) to figure out the time (from the start of the file) of a given frame in the video, and a tool like Audacity to find the time from the start of the file to the start of a tone in the audio file. This still may not tell you the whole story, since tools usually embed a delay between the audio and video in the TS/PS file. Or you can hook up an oscilloscope and go old school.
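The arithmetic behind "figure out the time from the start of the file" is simple enough to sketch (the frame index, sample index, and rates below are made-up example values, not from the question):

```cpp
// Time of a given video frame in a raw YUV dump, derived from the frame rate.
double video_time_s(long frame_index, double fps) {
    return frame_index / fps;
}

// Time of a given audio sample in a raw PCM dump, derived from the sample rate.
double audio_time_s(long sample_index, double sample_rate) {
    return sample_index / sample_rate;
}
```

The lip-sync error is then simply the difference between the two times for the matching events (hammer impact frame vs. start of the amplitude spike), minus any known tool-introduced delay.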