I couldn't find any information on the way av_interleaved_write_frame deals with video and audio packets.
I have multiple audio and video packets coming from 2 threads. Each thread calls write_video_frame or write_audio_frame, locks a mutex, initializes an AVPacket and writes data to an .avi file.
Initialization of AVCodecContext and AVFormatContext is OK.
-- Edit 1 --
Audio and video are coming from an external source (microphone and camera) and are captured as raw data without any compression (even for video).
I use H.264 to encode the video and no compression for the audio (PCM).
Audio is captured as 16-bit, 44.1 kHz, stereo
Video is captured at 25 FPS
Question:
1) Is it a problem if I write multiple video packets at once (let's say 25 packets per second) and just one audio packet per second?
Answer: Apparently not. av_interleaved_write_frame should be able to manage that kind of data as long as pts and dts are well managed.
This means I call av_interleaved_write_frame 25 times per second for video and just once per second for audio. Could this be a problem? If it is, how can I deal with this scenario?
2) How can I manage pts and dts in this case? It seems to be a problem in my application since I cannot correctly render the .avi file. Can I use real time stamps for both video and audio?
Answer: The best thing to do here is to use the timestamp taken when capturing the audio/video as the pts and dts for this kind of application. So these are not exactly real-time stamps (from the wall clock) but media capture timestamps.
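For the video side that ends up looking roughly like the sketch below (just an illustration: capture_time_ms and the millisecond time base are assumptions about how the capture timestamps are stored, not part of my actual code):

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavformat/avformat.h>
    }

    // Sketch: stamp the raw frame with the capture timestamp before encoding,
    // then rescale the encoder's packet timestamps to the stream time base
    // before muxing. 'capture_time_ms' is the (assumed) capture time in ms.
    static const AVRational MS_TIME_BASE = {1, 1000};

    int encode_and_write(AVFormatContext *fmt_ctx, AVStream *stream,
                         AVCodecContext *codec_ctx, AVFrame *frame,
                         AVPacket *pkt, int64_t capture_time_ms)
    {
        // Express the capture time in the encoder's time base.
        frame->pts = av_rescale_q(capture_time_ms, MS_TIME_BASE, codec_ctx->time_base);

        int ret = avcodec_send_frame(codec_ctx, frame);
        if (ret < 0)
            return ret;

        while ((ret = avcodec_receive_packet(codec_ctx, pkt)) == 0) {
            // The encoder sets pkt->pts/pkt->dts in codec_ctx->time_base
            // (dts may lag pts when B-frames are enabled).
            av_packet_rescale_ts(pkt, codec_ctx->time_base, stream->time_base);
            pkt->stream_index = stream->index;

            // av_interleaved_write_frame takes ownership of the packet's data.
            ret = av_interleaved_write_frame(fmt_ctx, pkt);
            if (ret < 0)
                return ret;
        }
        return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
    }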
Thank you for your precious advice.
av_interleaved_write_frame writes output packets in such a way that they are properly interleaved (possibly queueing them internally). "Properly interleaved" depends on the container format, but usually it means that the DTS stamps of the packets in the output file are monotonically increasing.
av_interleaved_write_frame, like most FFmpeg APIs, should never be called simultaneously by two threads with the same AVFormatContext. I assume you make sure of that with a mutex. If you do, then it doesn't matter whether the application is multithreaded or not.
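In other words, something like the sketch below (write_packet_locked is just an illustrative wrapper name):

    #include <mutex>
    extern "C" {
    #include <libavformat/avformat.h>
    }

    // One mutex per AVFormatContext: both the audio and the video thread must
    // go through it, because libavformat is not thread-safe for a given context.
    static std::mutex write_mutex;

    int write_packet_locked(AVFormatContext *fmt_ctx, AVPacket *pkt)
    {
        std::lock_guard<std::mutex> lock(write_mutex);
        // Only one thread at a time touches fmt_ctx here.
        return av_interleaved_write_frame(fmt_ctx, pkt);
    }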
Is it a problem if I write multiple video packets at once (let's say 25 packets/sec) and just one audio packet per second?
It is not a problem in general, but most audio codecs can't output one-second-long audio packets. Which codec do you use?
How can I manage pts and dts in this case? Can I use real time stamps for both video and audio?
The same way as you would in a single-threaded application. dts is usually generated by the codec from pts. pts usually comes from a capture device/decoder together with the corresponding audio/video data.
Real time stamps might be OK to use, but it really depends on how and when you acquire them. Please elaborate on what exactly you are trying to do. Where is the audio/video data coming from?
I work on an application which consists of multiple pipelines:
Video source: captures video, does some simple operations on it (encoding and overlaying) and passes the output to an appsink
Audio source: captures audio and forwards it to an appsink
Streamer: sends the video (and maybe later the audio as well) out as a UDP stream or WebRTC (live video)
Buffer: a plain C++ class which stores the buffers for a configured time (it can be changed dynamically, and the audio and video buffer times can differ, so it should be possible for the video from the buffer to start a few minutes before the audio)
Filewriter: a simple filesink; when it is triggered I push the buffers to its audio and video appsrc with the gst_app_src_push_buffer method (at the moment I encode the video with VP8 to a webm file using a webmmux)
All the pipelines are started at the same time (sequentially, one after the other).
VideoSource -------------------------------> LiveStream
     |
     |                      record trigger
     |                            |
     +-----------> Buffer ------- X -------> FileWriter
                      ^
AudioSource ----------+
I have two kinds of issues:
I cannot get the audio and video in sync in the filewriter
During playback the playtime counter does not start from 0 (I think it somehow receives the timestamps from the video source, so when I open the file in VLC it starts counting from e.g. 30 seconds, which is the amount of time I waited before triggering the record), and I'm not able to seek in the video file.
I tried to play around with the following approaches:
the 'do-timestamp' and 'is-live' properties on the filewriter's receiving appsrc
resetting the timestamps on the buffers (audio and video) with 'GST_BUFFER_PTS' to sequential values starting from zero and increased by the duration (see the sketch after this list)
adjusting the timestamps on the buffers (audio and video) with 'GST_BUFFER_PTS' by the time difference between the start times of the source and filewriter timelines
trying to sync all pipelines by setting the same clock and the same base time on them
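A minimal sketch of what I mean by resetting the timestamps when pushing a stored buffer into the filewriter's appsrc (push_retimestamped and first_pts are just illustrative names; the same recording start time would be subtracted for both the audio and the video buffers):

    #include <gst/gst.h>
    #include <gst/app/gstappsrc.h>

    // Sketch: shift a stored buffer's timestamps so the recording starts at 0,
    // then hand it to the filewriter's appsrc. 'first_pts' is assumed to be the
    // PTS of the first buffer pushed for this recording.
    static GstFlowReturn push_retimestamped(GstAppSrc *appsrc, GstBuffer *stored,
                                            GstClockTime first_pts)
    {
        GstBuffer *buf = gst_buffer_copy(stored);   // keep the original in the ring buffer

        if (GST_BUFFER_PTS_IS_VALID(buf))
            GST_BUFFER_PTS(buf) = GST_BUFFER_PTS(buf) - first_pts;
        if (GST_BUFFER_DTS_IS_VALID(buf))
            GST_BUFFER_DTS(buf) = GST_BUFFER_DTS(buf) - first_pts;

        // gst_app_src_push_buffer takes ownership of 'buf'.
        return gst_app_src_push_buffer(appsrc, buf);
    }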
I think I am missing something fundamental here. Can you give me some hints on how to do this time synchronisation properly?
Thanks a lot!
I write the received packets into binary files. When the recording of the first file is completed, I call flush:
avcodec_send_frame(context, NULL);
This is the signal to end the stream. But when I then send a new frame to the encoder, the function returns AVERROR_EOF (man: the encoder has been flushed, and no new frames can be sent to it). What should I do to make the encoder accept frames again after flushing?
Example: when decoding, you can call:
avcodec_flush_buffers(context);
This function resets the internal stream state, but only for decoding.
Is there maybe an analogous function for encoding?
Ideas:
1) Do not call flush. But the encoder buffers frames internally and emits some packets only after flushing (I use H.264 with B-frames), so some packets would end up in the next file.
2) Recreate the codec context?
Details: Windows 7, Qt 5.10, FFmpeg 4.0.2
The correct answer is that you should create a new codec context for each file, or headaches will follow. The small expense of additional headers and keyframes should be negligible unless you are doing something very exotic.
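A rough sketch of that (assuming H.264 and that the same settings are reused for every file; reopen_encoder and its parameters are only illustrative):

    extern "C" {
    #include <libavcodec/avcodec.h>
    }

    // Sketch: throw the drained context away and open a fresh one for the next file.
    AVCodecContext *reopen_encoder(AVCodecContext **old_ctx,
                                   int width, int height, int fps)
    {
        avcodec_free_context(old_ctx);          // the flushed context cannot be reused

        const AVCodec *codec = avcodec_find_encoder(AV_CODEC_ID_H264);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        if (!ctx)
            return nullptr;

        // Reapply whatever settings the application normally uses.
        ctx->width     = width;
        ctx->height    = height;
        ctx->time_base = AVRational{1, fps};
        ctx->pix_fmt   = AV_PIX_FMT_YUV420P;

        if (avcodec_open2(ctx, codec, nullptr) < 0) {
            avcodec_free_context(&ctx);
            return nullptr;
        }
        return ctx;
    }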
B-frames can refer to both previous and future frames, so how would you even split such a beast?
In theory you could probably force a keyframe and hope for the best, but then there is really no point in not starting a new context, unless the few hundred bytes of H.264 init data are a problem.
I'm capturing the audio stream of a voice chat program (it is proprietary, closed-source and I have no control over it) which is encoded with the OPUS Codec, and I want to decode it into raw PCM audio (Opus Decoder doc).
What I'm doing is:
Create an OPUS decoder: opusDecoder = opus_decoder_create(48000, 1, &opusResult);
Decode the stream: opusResult = opus_decode(opusDecoder, voicePacketBuffer, voicePacketLength, pcm, 9600, 0);
Save it to a file: pcmFile.write(pcm, opusResult * sizeof(opus_int16));
Read the file with Audacity (File > Import > Raw Data...)
Here comes the problem: sometimes it works perfectly well (I can hear the decoded PCM audio without glitch and with the original speed) but sometimes, the decoded audio stream is in "slow motion" (sometimes a little slower than normal, sometimes much slower).
I can't find out why because I don't change my program: the decoding settings remain the same. Yet, sometimes it works, sometimes it doesn't. Also, opus_decode() is always able to decode the data, it doesn't return an error code.
I read that the decoder has a "state" (opus_decoder_ctl() doc). I thought that maybe the time between opus_decode() calls is important?
Can you think of any parameter, be it explicit (like the parameters given to the functions) or implicit (time between two function calls), that might cause this effect?
"Slow motion" audio is almost always mismatch of sampling rate (recorded on high rate but played in low rate). For example if you record audio on 48kHz but play it as 8kHz.
Another possible reason of "slow motion" is more than one stream decoded by the same decoder. But in this case you also get distorted slow audio.
As for OPUS:
It always decodes at the rate you specified in the create parameters.
Internally it is pure math (without any timers or realtime-related things), so it does not matter when you call the decode function.
Therefore, some troubleshooting advice:
Make sure that you do not create decoders with different sampling rates
Make sure that when you import the raw file into Audacity you always import it as 48 kHz mono
If none of the above helps, check how many bytes you receive from the decoder for each packet in the normal and slow-motion cases (see the sketch below). For normal audio streams (with uniform inter-packet timing) you always get the same number of raw audio samples.
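For that last check, something like the sketch below is enough (assuming the 48 kHz mono decoder from the question; decode_packet is just an illustrative wrapper):

    #include <opus/opus.h>
    #include <cstdio>

    // Sketch: decode one packet and log how many samples come out. For a normal
    // stream the sample count per packet stays constant (e.g. 960 samples per
    // 20 ms packet at 48 kHz); a change here points at the input packets,
    // not at the decoder.
    int decode_packet(OpusDecoder *dec, const unsigned char *pkt, int pktLen,
                      opus_int16 *pcm, int maxSamples)
    {
        int samples = opus_decode(dec, pkt, pktLen, pcm, maxSamples, 0);
        if (samples < 0)
            std::fprintf(stderr, "opus_decode failed: %d\n", samples);
        else
            std::fprintf(stderr, "packet of %d bytes -> %d samples\n", pktLen, samples);
        return samples;
    }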
I am currently trying to implement an algorithm which can quickly discard unwanted frames from an MP4 video when encoding to another MP4 (using Media Foundation).
The encoding part doesn't seem too bad: the "Source Reader plus Sink Writer" approach is nice and fast. You basically just have to create an IMFSourceReader and an IMFSinkWriter, set the source native media type on the writer, yada, yada, yada, and just loop: source.ReadSample(&sample) --> writer.WriteSample(&sample). The WriteSample() calls can be conditioned on whether the sample is "! 2 b discarded".
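In code, that naive loop looks roughly like the sketch below (simplified: error handling and stream selection are omitted, and ShouldDiscard is just a placeholder for the drop condition):

    #include <mfapi.h>
    #include <mfidl.h>
    #include <mfreadwrite.h>

    bool ShouldDiscard(LONGLONG timestamp);   // hypothetical drop predicate

    // Sketch of the naive ReadSample -> WriteSample copy loop described above.
    void CopyLoop(IMFSourceReader *reader, IMFSinkWriter *writer, DWORD outStream)
    {
        for (;;)
        {
            DWORD streamIndex = 0, flags = 0;
            LONGLONG timestamp = 0;
            IMFSample *sample = nullptr;

            HRESULT hr = reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                            0, &streamIndex, &flags, &timestamp, &sample);
            if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
                break;

            if (sample)
            {
                if (!ShouldDiscard(timestamp))      // drop or keep this sample
                    writer->WriteSample(outStream, sample);
                sample->Release();
            }
        }
    }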
That naive approach is no good once you consider that the samples read will include "predicted frames", a.k.a. P-frames, of the H.264 encoded stream. Dropping any preceding "intra-coded picture frame" (I-frame or keyframe) that they depend on will result in garbled video.
So, my question is: is it possible to introduce an I-frame (somehow) into the sink writer before resuming the sample writing?
Doing something with the MFSampleExtension_CleanPoint attribute doesn't seem to help. I could manually create an IMFSample (via MFCreateSample), but getting it in the right H.264 format might be tricky.
Any ideas? Or thoughts on other approaches to dropping frames during encoding?
I think that this is not possible without re-encoding the video! The references between P- and I-frames are in the H.264 bitstream, not in the container (MP4). You can only safely skip frames which are not referenced by other frames:
the last P-frames of a GOP (before the next I-frame)
B-Frames
Normally these frames are not referenced, but they can be! This depends on the encoder settings used to create the H.264 stream.
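For what it's worth, you can at least detect the GOP boundaries on the Media Foundation side, because keyframe samples carry the MFSampleExtension_CleanPoint attribute. A small sketch (detection only; it does not make dropping referenced frames safe):

    #include <mfapi.h>
    #include <mfidl.h>

    // Sketch: returns true if the sample is marked as a clean point (keyframe),
    // i.e. where a new GOP starts in the encoded stream.
    bool IsCleanPoint(IMFSample *sample)
    {
        UINT32 cleanPoint = 0;
        if (SUCCEEDED(sample->GetUINT32(MFSampleExtension_CleanPoint, &cleanPoint)))
            return cleanPoint != 0;
        return false;
    }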
I receive audio packets from the network (4 packets per second, 250 ms each) and video at 15 fps. Everything comes with my own timestamps. How should I sync them? I've seen the source code of one of our developers, but he synced VIDEO to audio, i.e. audio is always played immediately and video can be dropped or buffered. I don't think that is correct, because audio can overrun video by a second or two; in that case we would not have actual video frames at all.
I'd like to know some basics of sync. What should be buffered? Should audio and video in sync mode be played in separate thread(s)? Any clues would be appreciated!
Thanks a lot!
I needed something like this: http://www.freepatentsonline.com/7680153.html
It is pretty difficult to understand, but I think this patent explains the basics of sync.
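For the basics themselves: the usual approach (what most players do) is to treat the audio clock as the master and schedule or drop video frames against it. A minimal sketch of that loop (audio_clock_ms, display and the drop threshold are placeholders, not taken from any particular player):

    #include <chrono>
    #include <cstdint>
    #include <thread>

    // Sketch of the classic "sync video to the audio clock" loop.
    struct VideoFrame { int64_t pts_ms; /* ... decoded pixels ... */ };

    int64_t audio_clock_ms();               // hypothetical: timestamp of the audio being played now
    void    display(const VideoFrame &f);   // hypothetical renderer

    void present_video(const VideoFrame &frame)
    {
        const int64_t DROP_THRESHOLD_MS = 80;   // assumption: a bit more than one frame at 15 fps

        int64_t diff = frame.pts_ms - audio_clock_ms();
        if (diff > 0)
            std::this_thread::sleep_for(std::chrono::milliseconds(diff));  // early: wait (this is the video buffer)
        else if (-diff > DROP_THRESHOLD_MS)
            return;                                                        // too late: drop the frame
        display(frame);
    }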