FFmpeg: Encoding PCM 16 audio data Allocation error - c++

I am currently trying to encode some raw audio data along with video inside an AVI container.
The video codec is MPEG-4 and I would like to use PCM_S16LE for the audio codec, but I am facing a problem with the AVCodecContext->frame_size parameter for the audio samples.
After doing all the correct allocations, I try to allocate the audio frame, but for the AV_CODEC_ID_PCM_S16LE codec there is no frame_size available to derive the sample buffer size. As a result the computed sample buffer size is huge and I simply can't allocate that much memory.
Does someone know how to work around this issue and how to compute the frame_size manually?
frame = av_frame_alloc();
if (!frame)
{
    return NULL;
}

// Problem is right here with the frame_size
frame->nb_samples     = m_pAudioCodecContext->frame_size;
frame->format         = m_pAudioStream->codec->sample_fmt;
frame->channel_layout = m_pAudioStream->codec->channel_layout;

// The codec gives us the frame size, in samples, so we can calculate
// the size of the samples buffer in bytes.
// This returns a huge value due to a null frame_size.
m_audioSampleBufferSize = av_samples_get_buffer_size(NULL,
                                                     m_pAudioCodecContext->channels,
                                                     m_pAudioCodecContext->frame_size,
                                                     m_pAudioCodecContext->sample_fmt,
                                                     0);
Thank you for your help,
Robert

As you can see in the pcm_encode_init function in pcm.c, all PCM encoders have frame_size = 0. Why?
Because PCM formats have, de facto, no such thing as a frame; there is no compression by the nature of PCM.
So you have to decide on your own how many samples you want to store in the buffer.
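For illustration, here is a minimal sketch built on the variables from the question. The 1024 samples per frame is an arbitrary assumption; for PCM any reasonable chunk size works:

// Arbitrary choice: how many samples each "frame" will carry (assumption).
const int kSamplesPerFrame = 1024;

frame = av_frame_alloc();
if (!frame)
{
    return NULL;
}

frame->nb_samples     = kSamplesPerFrame;
frame->format         = m_pAudioCodecContext->sample_fmt;
frame->channel_layout = m_pAudioCodecContext->channel_layout;

// Buffer size for the chosen chunk, instead of the codec's (zero) frame_size.
m_audioSampleBufferSize = av_samples_get_buffer_size(NULL,
                                                     m_pAudioCodecContext->channels,
                                                     kSamplesPerFrame,
                                                     m_pAudioCodecContext->sample_fmt,
                                                     0);

// Let FFmpeg allocate the frame's own data buffers for that many samples.
if (av_frame_get_buffer(frame, 0) < 0)
{
    av_frame_free(&frame);
    return NULL;
}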

Related

Pass ID3D11Texture2D back buffer to libx264 encoder

I'm writing a C++ program to encode frames from a DirectX game to the H.264/MPEG-4 AVC format. I am using libx264 alone with no other dependencies at the moment.
I have an ID3D11Texture2D* resolved back buffer of the next game frame. I need to somehow copy this into the x264_picture input (apparently the YUV420P format, according to the limited help I've found), but I cannot find any way to do so online.
Here is my code at the moment:
void Fx264VideoEncoder::Fx264VideoEncoderImpl::InitFrameInputBuffer(const FTexture2DRHIRef& BackBuffer, FFrame& Frame)
{
    x264_picture_alloc(Frame.InputPicture, X264_CSP_I420, x264Parameters.i_width, x264Parameters.i_height);

    // We need to take the back buffer and convert it to an input format that libx264 can understand
    {
        ID3D11Texture2D* ResolvedBackBufferDX11 = (ID3D11Texture2D*)(GetD3D11TextureFromRHITexture(Frame.ResolvedBackBuffer)->GetResource());
        EPixelFormat PixelFormat = Frame.ResolvedBackBuffer->GetFormat();

        // ...?
    }
}
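One route that is often used for this kind of problem, sketched here under the assumption that the back buffer is BGRA and that Device and Context are the usual D3D11 device and immediate context (neither appears in the original code), is to copy the resolved back buffer into a CPU-readable staging texture, map it, and let libswscale convert the pixels into the I420 planes of the x264_picture:

// Hedged sketch: Device, Context and the BGRA format are assumptions about the setup.
D3D11_TEXTURE2D_DESC Desc;
ResolvedBackBufferDX11->GetDesc(&Desc);
Desc.Usage          = D3D11_USAGE_STAGING;
Desc.BindFlags      = 0;
Desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
Desc.MiscFlags      = 0;

// Copy the GPU back buffer into a texture the CPU can read.
ID3D11Texture2D* Staging = nullptr;
Device->CreateTexture2D(&Desc, nullptr, &Staging);
Context->CopyResource(Staging, ResolvedBackBufferDX11);

D3D11_MAPPED_SUBRESOURCE Mapped;
Context->Map(Staging, 0, D3D11_MAP_READ, 0, &Mapped);

// Convert BGRA -> YUV420P straight into the x264 input picture planes.
SwsContext* Sws = sws_getContext((int)Desc.Width, (int)Desc.Height, AV_PIX_FMT_BGRA,
                                 (int)Desc.Width, (int)Desc.Height, AV_PIX_FMT_YUV420P,
                                 SWS_BILINEAR, nullptr, nullptr, nullptr);
const uint8_t* SrcData[1]   = { static_cast<const uint8_t*>(Mapped.pData) };
const int      SrcStride[1] = { static_cast<int>(Mapped.RowPitch) };
sws_scale(Sws, SrcData, SrcStride, 0, (int)Desc.Height,
          Frame.InputPicture->img.plane, Frame.InputPicture->img.i_stride);

sws_freeContext(Sws);
Context->Unmap(Staging, 0);
Staging->Release();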

How do I pre-allocate the memory for libavcodec to write decoded frame data?

I am trying to decode a video with libav by following the demo code: here
I need to be able to control where the frame data in pFrame->data[0] is stored. I have tried setting pFrame->data to my own buffer as follows:
// Determine required buffer size and allocate buffer
int numBytes = av_image_get_buffer_size(pixFmt, width, height, 1);
uint8_t* dataBuff = (uint8_t*) malloc(numBytes * sizeof(uint8_t));

// Assign buffer to image planes in pFrame
av_image_fill_arrays(pFrame->data, pFrame->linesize, dataBuff, pixFmt,
                     width, height, 1);
While this does set pFrame->data to dataBuff (if I print their addresses, they are the same), the call ret = avcodec_receive_frame(pCodecContext, pFrame) that receives the decoded data always writes the data to a different address. It seems to manage its own memory somewhere in the underlying API and ignores the dataBuff that I assigned to pFrame right before.
So I'm stuck: how can I tell libav to write decoded frame data to memory that I pre-allocate? I've seen people ask similar questions online and on the libav forum, but I haven't been able to find an answer.
Many thanks~
I found that the proper way to do it is via the get_buffer2 callback, which lets you install your own custom allocator, as this answer demonstrates:
FFMPEG: While decoding video, is possible to generate result to user's provided buffer?
Further documentation is here!
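For what it's worth, here is a minimal sketch of such an allocator. Names and the av_malloc stand-in are illustrative; the decoder must support custom allocation (AV_CODEC_CAP_DR1) for the callback to be honored:

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/imgutils.h>
}

// Sketch of a get_buffer2 callback that serves video frames from caller-owned memory.
static int my_get_buffer2(AVCodecContext *ctx, AVFrame *frame, int flags)
{
    // Compute padded dimensions that satisfy the decoder's alignment requirements.
    int w = frame->width, h = frame->height;
    int linesize_align[AV_NUM_DATA_POINTERS];
    avcodec_align_dimensions2(ctx, &w, &h, linesize_align);

    int size = av_image_get_buffer_size((AVPixelFormat)frame->format, w, h, 32);
    if (size < 0)
        return size;

    // Here you would hand out memory from your own pool; av_malloc stands in for it.
    uint8_t *mem = (uint8_t *)av_malloc(size);
    if (!mem)
        return AVERROR(ENOMEM);

    // Point the frame's planes and line sizes at our memory.
    av_image_fill_arrays(frame->data, frame->linesize, mem,
                         (AVPixelFormat)frame->format, w, h, 32);

    // Wrap the memory in an AVBufferRef so libavcodec can track its lifetime.
    frame->buf[0] = av_buffer_create(mem, size, av_buffer_default_free, NULL, 0);
    if (!frame->buf[0])
    {
        av_free(mem);
        return AVERROR(ENOMEM);
    }
    return 0;
}

// Install the callback before avcodec_open2():
//     pCodecContext->get_buffer2 = my_get_buffer2;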

How can I convert an FFmpeg AVFrame with pixel format AV_PIX_FMT_CUDA to a new AVFrame with pixel format AV_PIX_FMT_RGB

I have a simple C++ application that uses FFmpeg 3.2 to receive an H264 RTP stream. In order to save CPU, I'm doing the decoding part with the codec h264_cuvid. My FFmpeg 3.2 is compiled with hw acceleration enabled. In fact, if I do the command:
ffmpeg -hwaccels
I get
cuvid
This means that my FFmpeg setup has everything OK to "speak" with my NVIDIA card.
The frames that the function avcodec_decode_video2 provides me have the pixel format AV_PIX_FMT_CUDA. I need to convert those frames to new ones with AV_PIX_FMT_RGB. Unfortunately, I can't do the conversion using the well-known functions sws_getContext and sws_scale because the pixel format AV_PIX_FMT_CUDA is not supported. If I try with swscale I get the error:
"cuda is not supported as input pixel format"
Do you know how to convert an FFmpeg AVFrame from AV_PIX_FMT_CUDA to AV_PIX_FMT_RGB ?
(pieces of code would be very appreciated)
This is my understanding of hardware decoding on the latest FFmpeg 4.1 version; below are my conclusions after studying the source code.
First, I recommend taking inspiration from the hw_decode example:
https://github.com/FFmpeg/FFmpeg/blob/release/4.1/doc/examples/hw_decode.c
With the new API, you send a packet to the decoder using avcodec_send_packet(), then use avcodec_receive_frame() to retrieve the decoded frame.
There are two different kinds of AVFrame: a software one, which is stored in "CPU" memory (a.k.a. RAM), and a hardware one, which is stored in the graphics card's memory.
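For reference, a bare-bones sketch of that send/receive loop (variable names are placeholders, error handling trimmed):

// Feed one packet to the decoder, then drain every frame it produces.
if (avcodec_send_packet(m_codecCtx, &packet) >= 0)
{
    while (avcodec_receive_frame(m_codecCtx, m_decodedFrame) >= 0)
    {
        // For hw decoding, m_decodedFrame->format is AV_PIX_FMT_CUDA here.
    }
}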
Getting AVFrame from the hardware
To retrieve the hardware frame and get it into a readable AVFrame that can be converted with swscale, av_hwframe_transfer_data() needs to be used to retrieve the data from the graphics card. Then look at the pixel format of the retrieved frame; it is usually NV12 when using NVIDIA decoding.
// According to the API, if the format of the AVFrame is set before calling
// av_hwframe_transfer_data(), the graphics card will try to automatically
// convert to the desired format (with some limitations, see below).
m_swFrame->format = AV_PIX_FMT_NV12;

// Retrieve data from GPU to CPU.
err = av_hwframe_transfer_data(
    m_swFrame,      // The frame that will contain the usable data.
    m_decodedFrame, // Frame returned by avcodec_receive_frame().
    0);

const char* gpu_pixfmt = av_get_pix_fmt_name((AVPixelFormat)m_decodedFrame->format);
const char* cpu_pixfmt = av_get_pix_fmt_name((AVPixelFormat)m_swFrame->format);
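Once the data has landed in the software NV12 frame, a regular swscale conversion gives you RGB. A rough sketch, assuming m_swFrame was filled by the transfer above:

// Convert the transferred NV12 frame to packed RGB24 with swscale.
SwsContext* swsCtx = sws_getContext(m_swFrame->width, m_swFrame->height, AV_PIX_FMT_NV12,
                                    m_swFrame->width, m_swFrame->height, AV_PIX_FMT_RGB24,
                                    SWS_BILINEAR, nullptr, nullptr, nullptr);

AVFrame* rgbFrame = av_frame_alloc();
rgbFrame->format = AV_PIX_FMT_RGB24;
rgbFrame->width  = m_swFrame->width;
rgbFrame->height = m_swFrame->height;
av_frame_get_buffer(rgbFrame, 0);

sws_scale(swsCtx, m_swFrame->data, m_swFrame->linesize, 0, m_swFrame->height,
          rgbFrame->data, rgbFrame->linesize);
sws_freeContext(swsCtx);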
Listing supported "software" pixel formats
A side note here: if you want to select the pixel format, be aware that not every AVPixelFormat is supported. AVHWFramesConstraints is your friend here:
AVHWDeviceType type = AV_HWDEVICE_TYPE_CUDA;
int err = av_hwdevice_ctx_create(&hwDeviceCtx, type, nullptr, nullptr, 0);
if (err < 0) {
    // Err
}

AVHWFramesConstraints* hw_frames_const = av_hwdevice_get_hwframe_constraints(hwDeviceCtx, nullptr);
if (hw_frames_const == nullptr) {
    // Err
}

// Check if we can convert the pixel format to a readable format.
AVPixelFormat found = AV_PIX_FMT_NONE;
for (AVPixelFormat* p = hw_frames_const->valid_sw_formats;
     *p != AV_PIX_FMT_NONE; p++)
{
    // Check if we can convert to the desired format.
    if (sws_isSupportedInput(*p))
    {
        // Ok! This format can be used with swscale!
        found = *p;
        break;
    }
}

// Don't forget to free the constraint object.
av_hwframe_constraints_free(&hw_frames_const);

// Attach your hw device to your codec context if you want to use hw decoding.
// Check AVCodecContext.hw_device_ctx!
Finally, a quicker way is probably the av_hwframe_transfer_get_formats() function, but you need to decode at least one frame.
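A rough sketch of how that call can look, run on a frame obtained from the decoder (the returned list is freed with av_free()):

// Ask which software formats this hw frames context can transfer to.
AVPixelFormat* formats = nullptr;
if (av_hwframe_transfer_get_formats(m_decodedFrame->hw_frames_ctx,
                                    AV_HWFRAME_TRANSFER_DIRECTION_FROM,
                                    &formats, 0) >= 0)
{
    for (AVPixelFormat* p = formats; *p != AV_PIX_FMT_NONE; p++)
        printf("transferable sw format: %s\n", av_get_pix_fmt_name(*p));
    av_free(formats);
}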
Hope this will help!
You must use vf_scale_npp to do this. You can use either nppscale_deinterleave or nppscale_resize, depending on your needs.
Both have the same input parameters: an AVFilterContext that should be initialized with nppscale_init, an NPPScaleStageContext which takes your in/out pixel formats, and two AVFrames which are of course your input and output frames.
For more information you can look at the npplib\nppscale definition, which has done the CUDA-accelerated format conversion and scaling since FFmpeg 3.1.
Anyway, I recommend using the NVIDIA Video Codec SDK directly for this purpose.
I am not an ffmpeg expert, but I had a similar problem and managed to solve it. I was getting AV_PIX_FMT_NV12 from cuvid (mjpeg_cuvid decoder), and wanted AV_PIX_FMT_CUDA for cuda processing.
I found that setting the pixel format just before decoding the frame worked.
pCodecCtx->pix_fmt = AV_PIX_FMT_CUDA; // change format here
avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
// do something with pFrame->data[0] (Y) and pFrame->data[1] (UV)
You can check which pixel formats are supported by your decoder using pix_fmts:
AVCodec *pCodec = avcodec_find_decoder_by_name("mjpeg_cuvid");
for (int i = 0; pCodec->pix_fmts[i] != AV_PIX_FMT_NONE; i++)
    std::cout << pCodec->pix_fmts[i] << std::endl;
I'm sure there's a better way of doing this, but I then used this list to map the integer pixel format ids to human-readable pixel formats.
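For reference, av_get_pix_fmt_name() (already used earlier in this thread) does that mapping directly, so the loop can print readable names instead of integer ids:

for (int i = 0; pCodec->pix_fmts[i] != AV_PIX_FMT_NONE; i++)
    std::cout << av_get_pix_fmt_name(pCodec->pix_fmts[i]) << std::endl;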
If that doesn't work, you can do a cudaMemcpy to transfer your pixels from device to host:
cudaMemcpy(pLocalBuf, pFrame->data[0], size, cudaMemcpyDeviceToHost);
The conversion from YUV to RGB/RGBA can be done in many ways. This example does it using the libavdevice API.

PTS not set after decoding H264/RTSP stream

Question: What does the Libav/FFmpeg decoding pipeline need in order to produce valid presentation timestamps (PTS) in the decoded AVFrames?
I'm decoding an H264 stream received via RTSP. I use Live555 to parse the H264 and feed the stream to my LibAV decoder. Decoding and displaying are working fine, except that I'm not using the timestamp info and get some stuttering.
After getting a frame with avcodec_decode_video2, the presentation timestamp (PTS) is not set.
I need the PTS in order to find out for how long each frame needs to be displayed, and avoid any stuttering.
Notes on my pipeline
I get the SPS/PPS information via Live555 and copy these values into my AVCodecContext->extradata.
I also send the SPS and PPS to my decoder as NAL units, with the {0,0,0,1} start code prepended.
Live555 provides presentation timestamps for each packet; these are in most cases not monotonically increasing. The stream contains B-frames.
My AVCodecContext->time_base is not valid; its value is 0/2.
Unclear:
Where exactly should I set the NAL PTS coming from my H264 sink (Live555)? As the AVPacket->dts, pts, none, or both?
Why is my time_base value not valid? Where does this information come from?
According to the RTP payload spec, it seems that
The RTP timestamp is set to the sampling timestamp of the content. A 90 kHz clock rate MUST be used.
Does this mean that I must always assume a 1/90000 time base for the decoder? What if some other value is specified in the SPS?
Copy the live555 pts into the avpacket pts. Process the packet with avcodec_decode_video2, and then retrieve the pts from avframe->pkt_pts; these will be monotonically increasing.
There is no need to set anything in the codec context, apart from setting the SPS and PPS in the AVCodecContext extradata.
You can find a good example in VLC's github:
Setting AVPacket pts: https://github.com/videolan/vlc/blob/master/modules/codec/avcodec/video.c#L983
Decoding AVPacket into AVFrame: https://github.com/videolan/vlc/blob/master/modules/codec/avcodec/video.c#L1014
Retrieving the pts from the AVFrame:
https://github.com/videolan/vlc/blob/master/modules/codec/avcodec/video.c#L1078
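In other words, something along these lines. This is only a sketch with placeholder names: nalBuffer and nalSize come from Live555, pts90k is the Live555 presentation time already rescaled to the stream's time base, and stream is the AVStream being decoded:

// Wrap one annex-B NAL unit from Live555 in an AVPacket and decode it.
AVPacket pkt;
av_init_packet(&pkt);
pkt.data = nalBuffer;
pkt.size = nalSize;
pkt.pts  = pts90k;          // Live555 presentation time in the stream time base
pkt.dts  = AV_NOPTS_VALUE;  // let the decoder handle reordering

int gotFrame = 0;
avcodec_decode_video2(codecCtx, frame, &gotFrame, &pkt);
if (gotFrame)
{
    // pkt_pts carries the reordered presentation timestamp of this frame.
    double seconds = frame->pkt_pts * av_q2d(stream->time_base);
}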
avcodec_decode_video2() reorders the frames so that decode order and presentation order are the same.
Even if you somehow convince ffmpeg to give you a PTS on the decoded frame, it should be the same as the DTS.
//
// decode a video frame
//
avcodec_decode_video2(
    ctxt->video_st->codec,
    frame,
    &is_finished,
    buffer);

if (buffer->dts != AV_NOPTS_VALUE)
{
    //
    // you should end up here
    //
    pts = buffer->dts;
}
else
{
    pts = 0;
}

//
// adjust time base
//
pts *= av_q2d(ctxt->video_st->time_base);

Reading out specific video frame using FFMPEG API

I read frames from a video stream in FFmpeg using this loop:
while (av_read_frame(pFormatCtx, &packet) >= 0) {
    // Is this a packet from the video stream?
    if (packet.stream_index == videoStream) {
        // Decode video frame
        avcodec_decode_video2(pCodecCtx, pFrame, &frameFinished, &packet);
        // Did we get a video frame?
        if (frameFinished) {
            sws_scale(img_convert_context, pFrame->data, pFrame->linesize, 0,
                      pCodecCtx->height, pFrameRGBA->data, pFrameRGBA->linesize);
            printf("%s\n", "Frame read finished");
            // Save the frame to disk
            ExportFrame(pFrameRGBA->data[0]);
            break;
        }
    }
    printf("%s\n", "Read next frame");
    // Free the packet that was allocated by av_read_frame
    av_free_packet(&packet);
}
So in this way the stream is read sequentially. What I want is random access to the frames, so that I can read a specific frame (by frame number). How is this done?
You may want to look at
int av_seek_frame(AVFormatContext *s, int stream_index, int64_t timestamp,
                  int flags);
The above API will seek to the keyframe at the given timestamp. After seeking you can read the frames from there. Also, the tutorial below explains the conversion between position and timestamp.
http://dranger.com/ffmpeg/tutorial07.html
Since most frames in a video depend on previous and next frames, accessing arbitrary frames in a video is in general not straightforward. However, some frames are encoded independently of any other frames and occur regularly throughout the video. These frames are known as I-frames. Accessing these frames is straightforward through seeking.
If you want to "randomly" access any frame in the video, then you must:
Seek to the previous I-frame
Read the frames one by one until you get to the frame number that you want
You've already got the code for the second point, so all you need to do is take care of the first point and you're done. Here's an updated version of the Dranger tutorials that people often refer to -- it may be of help.
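Putting the two steps together, a rough sketch (frameNumber is the zero-based index you want; this assumes the stream reports a valid avg_frame_rate):

// Convert the frame number to a timestamp in the stream's time base.
AVStream* st = pFormatCtx->streams[videoStream];
int64_t targetPts = av_rescale_q(frameNumber, av_inv_q(st->avg_frame_rate), st->time_base);

// Seek to the nearest keyframe at or before that timestamp...
if (av_seek_frame(pFormatCtx, videoStream, targetPts, AVSEEK_FLAG_BACKWARD) >= 0)
{
    avcodec_flush_buffers(pCodecCtx);   // drop frames buffered before the seek
    // ...then run the decode loop from the question until the decoded
    // frame's timestamp reaches targetPts.
}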