ffmpeg based multi threaded c++ application fails on decoding - c++

I am using remuxing example from ffmpeg sources as reference. I wrote a multi-threaded application based on boost threads to perform a codec copy and remux using ffmpeg API. That works fine . The problem arises when I try to decode the frame
"
ret = avcodec_decode_video2(dec_ctx, frame, &got_frame, &pkt);
if (ret < 0) {
av_log(NULL, AV_LOG_ERROR, "Error decoding video %s\n",av_make_error_string(errorBuff,80,ret));
return -1;
}"
I need the decoded frame to convert it to Opencv Mat object. For a single instance this code works fine. But as soon as I run multiple threads I start getting decoding errors like these
left block unavailable for requested intra mode at 0 0
[h264 # 0x7f9a48115100] error while decoding MB 0 0, bytestream 1479
[h264 # 0x7f9a480825e0] number of reference frames (0+2) exceeds max (1; probably corrupt input), discarding one
[h264 # 0x7f9a480ae680] error while decoding MB 13 5, bytestream -20
[h264 # 0x7f9a48007700] number of reference frames (0+2) exceeds max (1; probably corrupt input), discarding one
[h264 # 0x7f9a48110340] top block unavailable for requested intra4x4 mode -1 at 31 0
[h264 # 0x7f9a48110340] error while decoding MB 31 0, bytestream 1226
[h264 # 0x7f9a48115100] number of reference frames (0+2) exceeds max (1; probably corrupt input), discarding one
[h264 # 0x7f9a480825e0] top block unavailable for requested intra4x4 mode -1 at 4 0
[h264 # 0x7f9a480825e0] error while decoding MB 4 0, bytestream 1292
[h264 # 0x7f9a480ae680] number of reference frames (0+2) exceeds max (1; probably corrupt input), discarding one
All variables used by ffmpeg api are declared local to the thread function. I am not sure how ffmpeg frame allocs or context allocs work.
any help in making the decoding process multi-threaded ?
Update:
I have included ff_lockmgr
static int ff_lockmgr(void **mutex, enum AVLockOp op)
{
pthread_mutex_t** pmutex = (pthread_mutex_t**) mutex;
switch (op) {
case AV_LOCK_CREATE:
*pmutex = (pthread_mutex_t*) malloc(sizeof(pthread_mutex_t));
pthread_mutex_init(*pmutex, NULL);
break;
case AV_LOCK_OBTAIN:
pthread_mutex_lock(*pmutex);
break;
case AV_LOCK_RELEASE:
pthread_mutex_unlock(*pmutex);
break;
case AV_LOCK_DESTROY:
pthread_mutex_destroy(*pmutex);
free(*pmutex);
break;
}
return 0;
}
and initialized it as well "av_lockmgr_register(ff_lockmgr);"
Now the video is being decoded in all threads BUT the images saved from the decoded frame using FFMPEG AVFrame to OpenCv Mat conversion and imwrite results in garbled (mixed) frame. Part of the frame is from one camera and rest is from another or the image doesnt make any sense at all.

Not every format decoder supports multiple threads, and even for the decoders which support it, it might not be supported for a particular file.
For example, consider a MPEG4 file with a single keyframe at the beginning, followed by P frames. In this case every next frame depends on previous, and using multiple threads would not likely produce any benefits.
In my app I had to disable multithreaded encoders because of that.

Related

Ffmpeg video output is 0 seconds with correct filesize when uploading to google cloud bucket

I've made a C++ program that lives in gke and takes some videos as input using ffmpeg, then does something with that input using opengl(not relevant), then finally encodes those edited videos as a single output. Normally the program works perfectly fine on my local machine, it encodes just as I want it to with no warnings or valgrind errors whatsoever. Then, after encoding the said video, I want my program to upload that video to the google cloud storage. This is where the problem comes, I have tried 2 methods for this: First, I tried using curl to upload to the cloud using a signed url. Second, I tried mounting the google storage using gcsfuse(I was already mounting the bucket to access the inputs in question). Both of those methods yielded undefined, weird behaviour's ranging from: Outputing a 0byte or 44byte file, (This is the most common one:) encoding in the correct file size ~500mb but the video is 0 seconds long, outputing a 0.4 second video or just encoding the desired output normally (really rare).
From the logs I can't see anything unusual, everything seems to work fine and ffmpeg does not give any errors or warnings, so does valgrind. Everything seems to work normally, even when I use curl to upload the video to the cloud the output is perfectly fine when it first encodes it (before sending it with curl) but the video gets messed up when curl uploads it to the cloud.
I'm using the muxing.c example of ffmpeg to encode my video with the only difference being:
void video_encoder::fill_yuv_image(AVFrame *frame, struct SwsContext *sws_context) {
const int in_linesize[1] = { 4 * width };
//uint8_t* dest[4] = { rgb_data, NULL, NULL, NULL };
sws_context = sws_getContext(
width, height, AV_PIX_FMT_RGBA,
width, height, AV_PIX_FMT_YUV420P,
SWS_BICUBIC, 0, 0, 0);
sws_scale(sws_context, (const uint8_t * const *)&rgb_data, in_linesize, 0,
height, frame->data, frame->linesize);
}
rgb_data is the data I got after editing the inputs. Again, this works fine and I don't think there are any errors here.
I'm not sure where the error is and since the code is huge I can't provide a replicable example. I'm just looking for someone to point me to the right direction.
Running the cloud's output in mplayer wields this result (This is when the video is the right size but is 0 seconds long, the most common one.):
MPlayer 1.4 (Debian), built with gcc-11 (C) 2000-2019 MPlayer Team
do_connect: could not connect to socket
connect: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.
Playing /media/c36c2633-d4ee-4d37-825f-88ae54b86100.
libavformat version 58.76.100 (external)
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
RAWDV file format detected.
VIDEO: [DVSD] 720x480 24bpp 29.970 fps 0.0 kbps ( 0.0 kbyte/s)
X11 error: BadMatch (invalid parameter attributes)
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[vdpau] Error when calling vdp_device_create_x11: 1
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 58.134.100 (external)
[dvvideo # 0x7f2cb987a380]Requested frame threading with a custom get_buffer2() implementation which is not marked as thread safe. This is not supported anymore, make your callback thread-safe.
Selected video codec: [ffdv] vfm: ffmpeg (FFmpeg DV)
==========================================================================
Load subtitles in /media/
==========================================================================
Opening audio decoder: [libdv] Raw DV Audio Decoder
Unknown/missing audio format -> no sound
ADecoder init failed :(
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
[dvaudio # 0x7f2cb987a380]Decoder requires channel count but channels not set
Could not open codec.
ADecoder init failed :(
ADecoder init failed :(
Cannot find codec for audio format 0x56444152.
Audio: no sound
Starting playback...
[dvvideo # 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
[dvvideo # 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
V: 0.0 2/ 2 ??% ??% ??,?% 0 0
Exiting... (End of file)
Edit: Since the code runs on a VM, I'm using xvfb-run ro start my application, but again even when using xvfb-run it works completely fine on when not encoding to the cloud.
Apparently, I'm assuming for security reasons, the google cloud storage does not allow us to do multiple continuous operations on a file, just a singular read/write operation. So I found a workaround by encoding my video to a local file inside the pod and then doing a copy operation to the cloud.

Raw files aren't playing, or are playing incorrectly - Oboe (Android-ndk)

I'm attempting to Play a Raw (int16 PCM) encoded audio file in my android application. I've been following and reading through the Oboe documentation/samples to try to get one of my own audio files to play.
The audio file I need to play is roughly 6kb, or 1592 frames (stereo).
Either no sound plays, or sound/jitter plays on startup (with varying output - see bellow)
Troubleshooting
update
I have switched to floats for buffer queuing, instead of keeping everything to int16_t (and converting back to int16_t when done), although now I'm back to no sound.
The audio seems to be either not playing, or playing on startup (which is wrong). The sound should play after I press 'start'.
When the app was implemented with int16_t only, the premature sound was relative to how big the buffer size was. If the buffer size is smaller than the audio file, the sound is very fast and clipped (more drone-like at lower buffer sizes). Bigger than the Raw audio size it seems like it plays on a loop and gets quieter at higher buffer sizes. The sound would also get "softer" when the start button is pressed. I'm not even entirely sure this means the raw audio was playing, it could just be random nonsense jitters from Android.
When filling the buffers with floats, and converting to int16_t afterwards, no audio is played.
(I have tried running systrace, but I honestly don't know what I'm looking for)
The stream opens fine.
The buffer size fails to be ajusted in createPlaybackStream() (although somehow it still sets it to twice the burst size)
The stream starts fine.
The Raw resources are being loaded fine.
Implementation
What I am currently trying in the builder:
Setting the callback to this, or onAudioReady()
Setting the performance mode to LowLatency
Setting the sharing mode to Exclusive
Setting the buffer capacity to (anything bigger than my audio file frame count)
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
I am using the Player class and the AAssetManager class from the Rhythm Game sample here: https://github.com/google/oboe/blob/master/samples/RhythmGame. I am using these classes to load my resources and play the sound. Player.renderAudio writes the audio data to the output buffer.
Here are the relevant methods from my audio engine:
void AudioEngine::createPlaybackStream() {
// // Load the RAW PCM data files into memory
std::shared_ptr<AAssetDataSource> soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
if (soundSource == nullptr) {
LOGE("Could not load source data for sound");
return;
}
sound = std::make_shared<Player>(soundSource);
AudioStreamBuilder builder;
builder.setCallback(this);
builder.setPerformanceMode(PerformanceMode::LowLatency);
builder.setSharingMode(SharingMode::Exclusive);
builder.setChannelCount(mChannelCount);
Result result = builder.openStream(&stream);
if (result == Result::OK && stream != nullptr) {
mSampleRate = stream->getSampleRate();
mFramesPerBurst = stream->getFramesPerBurst();
int channelCount = stream->getChannelCount();
if (channelCount != mChannelCount) {
LOGW("Requested %d channels but received %d", mChannelCount, channelCount);
}
// Set the buffer size to (burst size * 2) - this will give us the minimum possible latency while minimizing underruns
stream->setBufferSizeInFrames(mFramesPerBurst * 2);
if (setBufferSizeResult != Result::OK) {
LOGW("Failed to set buffer size. Error: %s", convertToText(setBufferSizeResult.error()));
}
// Start the stream - the dataCallback function will start being called
result = stream->requestStart();
if (result != Result::OK) {
LOGE("Error starting stream. %s", convertToText(result));
}
} else {
LOGE("Failed to create stream. Error: %s", convertToText(result));
}
}
DataCallbackResult AudioEngine::onAudioReady(AudioStream *audioStream, void *audioData, int32_t numFrames) {
int16_t *outputBuffer = static_cast<int16_t *>(audioData);
sound->renderAudio(outputBuffer, numFrames);
return DataCallbackResult::Continue;
}
// When the 'start' button is pressed, it calls this method with true
// There should be no sound on app start-up until this button is pressed
// Sound stops when 'stop' is pressed
setPlaying(bool isPlaying) {
sound->setPlaying(isPlaying);
}
Setting the buffer capacity to (anything bigger than my audio file frame count)
You don't need to set the buffer capacity. This will be set automatically at a reasonable level for you. Typically ~3000 frames. Note that buffer capacity is different from buffer size which defaults to 2*framesPerBurst.
Setting the burst size (frames per call back) to (anything equal to or lower than the buffer capacity / 2)
Again, don't do this. onAudioReady will be called every time the stream requires more audio data and numFrames indicates how many frames you should supply. If you override this value with a value which isn't an exact ratio of the audio device's native burst size (typical values are 128, 192 and 240 frames depending on underlying hardware) then you may get audio glitches.
I have switched to floats for buffer queuing
The format which you need to supply data in is determined by the audio stream and it is only known after the stream has been opened. You can get it by calling stream->getFormat().
In the RhythmGame sample (at least the version you're referring to) here's how the formats work:
Source file is converted from 16-bit to float inside AAssetDataSource::newFromAssetManager (floats are the preferred format for any kind of signal processing)
If the stream format is 16-bit then convert it back inside onAudioReady
1592 frames (stereo).
You said that your source was stereo but you're specifying it as mono here:
std::shared_ptr soundSource(AAssetDataSource::newFromAssetManager(assetManager, "sound.raw", ChannelCount::Mono));
Without doubt that will cause audio problems because the AAssetDataSource will have a value for numFrames which is double the correct value. This will cause audio glitches because half the time you'll be playing random parts of system memory.

Media Foundation video re-encoding producing audio stream sync offset

I'm attempting to write a simple windows media foundation command line tool to use IMFSourceReader and IMFSyncWriter to load in a video, read the video and audio as uncompressed streams and re-encode them to H.246/AAC with some specific hard-coded settings.
The simple program Gist is here
sample video 1
sample video 2
sample video 3
(Note: the video's i've been testing with are all stereo, 48000k sample rate)
The program works, however in some cases when comparing the newly outputted video to the original in an editing program, I see that the copied video streams match, but the audio stream of the copy is pre-fixed with some amount of silence and the audio is offset, which is unacceptable in my situation.
audio samples:
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[silence] [silence] [silence] [audio1] [audio2] [audio3] ... etc
In cases like this the first video frames coming in have a non zero timestamp but the first audio frames do have a 0 timestamp.
I would like to be able to produce a copied video who's first frame from the video and audio streams is 0, so I first attempted to subtract that initial timestamp (videoOffset) from all subsequent video frames which produced the video i wanted, but resulted in this situation with the audio:
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[audio4] [audio5] [audio6] [audio7] [audio8] ... etc
The audio track is shifted now in the other direction by a small amount and still doesn't align. This can also happen sometimes when a video stream does have a starting timestamp of 0 yet WMF still cuts off some audio samples at the beginning anyway (see sample video 3)!
I've been able to fix this sync alignment and offset the video stream to start at 0 with the following code inserted at the point of passing the audio sample data to the IMFSinkWriter:
//inside read sample while loop
...
// LONGLONG llDuration has the currently read sample duration
// DWORD audioOffset has the global audio offset, starts as 0
// LONGLONG audioFrameTimestamp has the currently read sample timestamp
//add some random amount of silence in intervals of 1024 samples
static bool runOnce{ false };
if (!runOnce)
{
size_t numberOfSilenceBlocks = 1; //how to derive how many I need!? It's aribrary
size_t samples = 1024 * numberOfSilenceBlocks;
audioOffset = samples * 10000000 / audioSamplesPerSecond;
std::vector<uint8_t> silence(samples * audioChannels * bytesPerSample, 0);
WriteAudioBuffer(silence.data(), silence.size(), audioFrameTimeStamp, audioOffset);
runOnce= true;
}
LONGLONG audioTime = audioFrameTimeStamp + audioOffset;
WriteAudioBuffer(dataPtr, dataSize, audioTime, llDuration);
Oddly, this creates an output video file that matches the original.
original - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
copy - |[audio1] [audio2] [audio3] [audio4] [audio5] ... etc
The solution was to insert extra silence in block sizes of 1024 at the beginning of the audio stream. It doesn't matter what the audio chunk sizes provided by IMFSourceReader are, the padding is in multiples of 1024.
My problem is that there seems to be no detectable reason for the the silence offset. Why do i need it? How do i know how much i need? I stumbled across the 1024 sample silence block solution after days of fighting this problem.
Some videos seem to only need 1 padding block, some need 2 or more, and some need no extra padding at all!
My question here are:
Does anyone know why this is happening?
Am I using Media Foundation incorrectly in this situation to cause this?
If I am correct, How can I use the video metadata to determine if i need to pad an audio stream and how many 1024 blocks of silence need to be in the pad?
EDIT:
For the sample videos above:
sample video 1 : the video stream starts at 0 and needs no extra blocks, passthrough of original data works fine.
sample video 2 : video stream starts at 834166 (hns) and needs 1 1024 block of silence to sync
sample video 3 : video stream starts at 0 and needs 2 1024 blocks of silence to sync.
UPDATE:
Other things I have tried:
Increasing the duration of the first video frame to account for the offset: Produces no effect.
I wrote another version of your program to handle NV12 format correctly (yours was not working) :
EncodeWithSourceReaderSinkWriter
I use Blender as video editing tools. Here is my results with Tuning_against_a_window.mov :
from the bottom to the top :
Original file
Encoded file
I changed the original file by settings "elst" atoms with the value of 0 for number entries (I used Visual Studio hexa editor)
Like Roman R. said, MediaFoundation mp4 source doesn't use the "edts/elst" atoms. But Blender and your video editing tools do. Also the "tmcd" track is ignored by mp4 source.
"edts/elst" :
Edits Atom ( 'edts' )
Edit lists can be used for hint tracks...
MPEG-4 File Source
The MPEG-4 file source silently ignores hint tracks.
So in fact, the encoding is good. I think there is no audio stream sync offset, comparing to the real audio/video data. For example, you can add "edts/elst" to the encoded file, to get the same result.
PS: on the encoded file, i added "edts/elst" for both audio/video tracks. I also increased size for trak atoms and moov atom. I confirm, Blender shows same wave form for both original and encoded file.
EDIT
I tried to understand relation between mvhd/tkhd/mdhd/elst atoms, in the 3 video samples. (Yes I know, i should read the spec. But i'm lazy...)
You can use a mp4 explorer tool to get atom's values, or use the mp4 parser from my H264Dxva2Decoder project :
H264Dxva2Decoder
Tuning_against_a_window.mov
elst (media time) from tkhd video : 20689
elst (media time) from tkhd audio : 1483
GREEN_SCREEN_ANIMALS__ALPACA.mp4
elst (media time) from tkhd video : 2002
elst (media time) from tkhd audio : 1024
GOPR6239_1.mov
elst (media time) from tkhd video : 0
elst (media time) from tkhd audio : 0
As you can see, with GOPR6239_1.mov, media time from elst is 0. That's why there is no video/audio sync problem with this file.
For Tuning_against_a_window.mov and GREEN_SCREEN_ANIMALS__ALPACA.mp4, i tried to calculate the video/audio offset.
I modified my project to take this into account :
EncodeWithSourceReaderSinkWriter
For now, i didn't find a generic calculation for all files.
I just find the video/audio offset needed to encode correctly both files.
For Tuning_against_a_window.mov, i begin encoding after (movie time - video/audio mdhd time).
For GREEN_SCREEN_ANIMALS__ALPACA.mp4, i begin encoding after video/audio elst media time.
It's OK, but I need to find the right unique calculation for all files.
So you have 2 options :
encode the file and add elst atom
encode the file using right offset calculation
it depends on your needs :
The first option permits you to keep the original file.But you have to add the elst atom
With the second option you have to read atom from the file before encoding, and the encoded file will loose few original frames
If you choose the first option, i will explain how I add the elst atom.
PS : i'm intersting by this question, because in my H264Dxva2Decoder project, the edts/elst atom is in my todo list.
I parse it, but i don't use it...
PS2 : this link sounds interesting :
Audio Priming - Handling Encoder Delay in AAC

Running error, [h264 # 0x10af4a0] AVC: nal size [big number]

I am using libavformat apis to get video frame from a MP4 video file. My code (c++) runs good in my personal computer, but when I try to deploy it into computing server, there are something strange happens. In the function 'av_read_frame()', some errors appear.
[h264 # 0x10af4a0] AVC: nal size 555453589
[h264 # 0x10af4a0] AVC: nal size 555453589
[h264 # 0x10af4a0] no frame!
My code is like this:
if (av_read_frame(_p_format_ctx, &_packet) < 0) {
return false;
}
But when this error occurs, the program doesn't exit. But the final results are wrong.
The OS of computing server is Linux, the kernel is 2.6.32.
The version of FFmpeg is 3.2.4.
The version of gcc is 4.8.2.

X264 encoding using Opencv

I am working with a high resolution camera: 4008x2672. I a writing a simple program which grabs frame from the camera and sends the frame to a avi file. For working with such a high resolution, I found only x264 codec that could do the trick (Suggestions welcome). I am using opencv for most of the image handling stuff. As mentioned in this post http://doom10.org/index.php?topic=1019.0 , I modified the AVCodecContext members as per ffmpeg presets for libx264 (Had to do this to avoid broken ffmpeg defaults settings error). This is output I am getting when I try to run the program
libx264 # 0x992d040]non-strictly-monotonic PTS
1294846981.526675 1 0 //Timestamp camera_no frame_no
1294846981.621101 1 1
1294846981.715521 1 2
1294846981.809939 1 3
1294846981.904360 1 4
1294846981.998782 1 5
1294846982.093203 1 6
Last message repeated 7 times
[avi # 0x992beb0]st:0 error, non monotone timestamps
-614891469123651720 >= -614891469123651720
OpenCV Error: Unspecified error (Error while writing video frame) in
icv_av_write_frame_FFMPEG, file
/home/ajoshi/ext/OpenCV-2.2.0/modules/highgui/src/cap_ffmpeg.cpp, line 1034
terminate called after throwing an instance of 'cv::Exception'
what(): /home/ajoshi/ext/OpenCV-2.2.0/modules/highgui/src/cap_ffmpeg.cpp:1034:
error: (-2) Error while writing video frame in function icv_av_write_frame_FFMPEG
Aborted
Modifications to the AVCodecContext are:
if(codec_id == CODEC_ID_H264)
{
//fprintf(stderr, "Trying to parse a preset file for libx264\n");
//Setting Values manually from medium preset
c->me_method = 7;
c->qcompress=0.6;
c->qmin = 10;
c->qmax = 51;
c->max_qdiff = 4;
c->i_quant_factor=0.71;
c->max_b_frames=3;
c->b_frame_strategy = 1;
c->me_range = 16;<br>
c->me_subpel_quality=7;
c->coder_type = 1;
c->scenechange_threshold=40;
c->partitions = X264_PART_I8X8 | X264_PART_I4X4 | X264_PART_P8X8 | X264_PART_B8X8;
c->flags = CODEC_FLAG_LOOP_FILTER;
c->flags2 = CODEC_FLAG2_BPYRAMID | CODEC_FLAG2_MIXED_REFS | CODEC_FLAG2_WPRED | CODEC_FLAG2_8X8DCT | CODEC_FLAG2_FASTPSKIP;
c->keyint_min = 25;
c->refs = 3;
c->trellis=1;
c->directpred = 1;
c->weighted_p_pred=2;
}
I am probably not setting the dts and pts values which I believed ffmpeg should be setting it for me.
Any sugggestions welcome.
Thanks in advance
I would probably run the x264 executable in another process and pipe either rgb or yuv pixels to it. Then you can use all the normal x264 (or ffmpeg) flags and it handles multi threading for you.
And since x264 is GPL licensed it also gives you more freedom on licensing your app.
ps. Here is some sample code using ffmpeg from Qt you can ignore the Qt specific bits but it gives a good starting point for using ffmpeg from a c++ app.
Actual error is "non monotone timestamps". I seems that you didn't properly initialized video frame properties. If its possible use libx264 directly. It'll be more easy to handle.
PS. you can work around ffmpeg x264 setting problem by specify 264 preset file with -fvpre option.
The pts value of the AVFrame you send as the last argument to avcodec_encode_video needs to be set by you. Once you set this, the codec context's coded_from->pts field will have the correct value which you can av_rescale_q() and set in the AVPacket for your av_interleaved_write_frame().