Getting audio sound level from FLTP audio stream

Getting audio sound level from FLTP audio stream - c++

I need to get audio level or even better, EQ data from NDI audio stream in C++. Here's the struct of a audio packet:
// This describes an audio frame.
typedef struct NDIlib_audio_frame_v3_t {
// The sample-rate of this buffer.
int sample_rate;
// The number of audio channels.
int no_channels;
// The number of audio samples per channel.
int no_samples;
// The timecode of this frame in 100-nanosecond intervals.
int64_t timecode;
// What FourCC describing the type of data for this frame.
NDIlib_FourCC_audio_type_e FourCC;
// The audio data.
uint8_t* p_data;
union {
// If the FourCC is not a compressed type and the audio format is planar, then this will be the
// stride in bytes for a single channel.
int channel_stride_in_bytes;
// If the FourCC is a compressed type, then this will be the size of the p_data buffer in bytes.
int data_size_in_bytes;
};
// Per frame metadata for this frame. This is a NULL terminated UTF8 string that should be in XML format.
// If you do not want any metadata then you may specify NULL here.
const char* p_metadata;
// This is only valid when receiving a frame and is specified as a 100-nanosecond time that was the exact
// moment that the frame was submitted by the sending side and is generated by the SDK. If this value is
// NDIlib_recv_timestamp_undefined then this value is not available and is NDIlib_recv_timestamp_undefined.
int64_t timestamp;
#if NDILIB_CPP_DEFAULT_CONSTRUCTORS
NDIlib_audio_frame_v3_t(
int sample_rate_ = 48000, int no_channels_ = 2, int no_samples_ = 0,
int64_t timecode_ = NDIlib_send_timecode_synthesize,
NDIlib_FourCC_audio_type_e FourCC_ = NDIlib_FourCC_audio_type_FLTP,
uint8_t* p_data_ = NULL, int channel_stride_in_bytes_ = 0,
const char* p_metadata_ = NULL,
int64_t timestamp_ = 0
);
#endif // NDILIB_CPP_DEFAULT_CONSTRUCTORS
} NDIlib_audio_frame_v3_t;
Problem is that unlike video frames I have absolutely no idea how binary audio is packed and there's much less information about it online. The best information I found so far is this project:
https://github.com/gavinnn101/fishing_assistant/blob/7f5fcd73de1e39336226b5969cd1c5ca84c8058b/fishing_main.py#L124
It uses PyAudio however which I'm not familiar with and they use 16 bit audio format while mine seems to be 32bit and I can't figure out the struct.unpack stuff either because "%dh"%(count) is telling it some number then h for short which I don't understand how it would interpret.
Is there any C++ library that can take pointer to the data and type then has functions to extract sound level, sound level at certain hertz etc?
Or just some good information on how I would extract this myself? :)
I've searched the web a lot while finding very little. I've placed a breakpoint when the audio frame is populated but given up once I realize there's too many variables to think of that I don't have a clue about like sample rate, channels, sample count etc.

Got it working using
// This function calculates the RMS value of an audio frame
float calculateRMS(const NDIlib_audio_frame_v2_t& frame)
{
// Calculate the number of samples in the frame
int numSamples = frame.no_samples * frame.no_channels;
// Get a pointer to the start of the audio data
const float* data = frame.p_data;
// Calculate the sum of the squares of the samples
float sumSquares = 0.0f;
for (int i = 0; i < numSamples; ++i)
{
float sample = data[i];
sumSquares += sample * sample;
}
// Calculate the RMS value and return it
return std::sqrt(sumSquares / numSamples);
}
called as
// Keep receiving audio frames and printing their RMS values
NDIlib_audio_frame_v2_t audioFrame;
while (true)
{
// Wait for the next audio frame to be received
if (NDIlib_recv_capture_v2(pNDI_recv, NULL, &audioFrame, NULL, 0) != NDIlib_frame_type_audio)
continue;
// Print the RMS value of the audio frame
std::cout << "RMS: " << calculateRMS(audioFrame) << std::endl;
NDIlib_recv_free_audio_v2(pNDI_recv, &audioFrame);
}
Shoutout to chatGPT for explaining and feeding me with possible solutions until I managed to get a working solution :--)

Related

NVencs Output Bitstream is not readable

I have one question related to Nvidias NVenc API. I want to use the API to encode some OpenGL graphics. My problem is, that the API reports no error throughout the whole program, everything seems to be fine. But the generated output is not readable by, e.g. VLC. If I try to play the generated file, VLC would flash a black screen for about 0.5s, then ends the playback.
The Video has the length of 0, the size of the Vid seems rather small, too.
Resolution is 1280*720 and the size of 5secs recording is only 700kb. Is this realistic?
The flow of the application is as following:
Render to secondary Framebuffer
Download Framebuffer to one of two PBOs (glReadPixels())
Map the PBO of the previous frame, to get a pointer understandable by Cuda.
Call a simple CudaKernel converting OpenGLs RGBA to ARGB which should be understandable by NVenc according to this(p.18). The kernel reads the content of the PBO and writes the converted content into a CudaArray (created with cudaMalloc) which is registered as InputResource with NVenc.
The content of the converted Array gets encoded. A completion event plus the corresponding output bitstream buffer get queued.
A secondary thread listens on the queued output events, if one event is signaled, the Output Bitstream gets mapped and written to hdd.
The initializion of NVenc-Encoder:
InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid; //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0;
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height;
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1;
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config
NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
Registration of CudaResource
m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType =
NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
Encoding
m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;
EventWithId latestEvent(currentBufferAndEvent,
m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue
CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
Every little hint, where to look at, is very much appreciated. I'm running out of ideas what might be wrong.
Thanks a lot!

With the help of hall822 from the nvidia forums I managed to solve the issue.
The primary error was that I registered my cuda resource with a pitch equal to the size of the frame. I'm using a Framebuffer-Renderbuffer to draw my content into. The data of this is a plain, unpitched array. My first thought, giving a pitch equal to zero, failed. The encoder did nothing. The next idea was to set it to the width of the frame, a quarter of the image was encoded.
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
To answer this question: Yes, it is correct. But the pitch is measured in byte. So because I'm encoding RGBA-Frames, the correct pitch has to be FRAME_WIDTH * 4.
The second error was that my color channels were not right (See point 4 in my opening post). The NVidia enum says that the encoder expects the channels in ARGB format but actually ment is BGRA, so the alpha channel which is always 255 polluted the blue channel.
Edit: This may be due to the fact that NVidia is using little endian internally. I'm writing
my pixel data to a byte array, choosing an other type like int32 may allow one to pass actual ARGB data.

Obtain the total number of samples with FFMpeg

Currently my application reads audio files based on a while-realloc loop:
// Pseudocode
float data* = nullptr;
int size = 0;
AVFrame* frame;
while(readFrame(formatContext, frame))
{
data = realloc(data, size + frame.nSamples);
size += frame.nSamples;
/* Read frame samples into data */
}
Is there a way to obtain the total number of samples in a stream at the beginning? I want to be able to create the array with new[] instead of malloc.

For reference, this was answered here:
FFmpeg: How to estimate number of samples in audio stream?
I used the following in my code:
int total_samples = (int) ((format_context->duration / (float) AV_TIME_BASE) * SAMPLE_RATE * NUMBER_CHANNELS);
NOTE: my testing shows this calculation will be most likely be more than the actual number of samples found, so make sure you compensate for that in your code. I set all remaining "unset" samples to zero.

How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?

I am writing a C++ code where a sequence of N different frames is generated after performing some operations implemented therein. After each frame is completed, I write it on the disk as IMG_%d.png, and finally I encode them to a video through ffmpeg using the x264 codec.
The summarized pseudocode of the main part of the program is the following one:
std::vector<int> B(width*height*3);
for (i=0; i<N; i++)
{
// void generateframe(std::vector<int> &, int)
generateframe(B, i); // Returns different images for different i values.
sprintf(s, "IMG_%d.png", i+1);
WriteToDisk(B, s); // void WriteToDisk(std::vector<int>, char[])
}
The problem of this implementation is that the number of desired frames, N, is usually high (N~100000) as well as the resolution of the pictures (1920x1080), resulting into an overload of the disk, producing write cycles of dozens of GB after each execution.
In order to avoid this, I have been trying to find documentation about parsing directly each image stored in the vector B to an encoder such as x264 (without having to write the intermediate image files to the disk). Albeit some interesting topics were found, none of them solved specifically what I exactly want to, as many of them concern the execution of the encoder with existing images files on the disk, whilst others provide solutions for other programming languages such as Python (here you can find a fully satisfactory solution for that platform).
The pseudocode of what I would like to obtain is something similar to this:
std::vector<int> B(width*height*3);
video_file=open_video("Generated_Video.mp4", ...[encoder options]...);
for (i=0; i<N; i++)
{
generateframe(B, i+1);
add_frame(video_file, B);
}
video_file.close();
According to what I have read on related topics, the x264 C++ API might be able to do this, but, as stated above, I did not find a satisfactory answer for my specific question. I tried learning and using directly the ffmpeg source code, but both its low ease of use and compilation issues forced me to discard this possibility as a mere non-professional programmer I am (I take it as just as a hobby and unluckily I cannot waste that many time learning something so demanding).
Another possible solution that came to my mind is to find a way to call the ffmpeg binary file in the C++ code, and somehow manage to transfer the image data of each iteration (stored in B) to the encoder, letting the addition of each frame (that is, not "closing" the video file to write) until the last frame, so that more frames can be added until reaching the N-th one, where the video file will be "closed". In other words, call ffmpeg.exe through the C++ program to write the first frame to a video, but make the encoder "wait" for more frames. Then call again ffmpeg to add the second frame and make the encoder "wait" again for more frames, and so on until reaching the last frame, where the video will be finished. However, I do not know how to proceed or if it is actually possible.
Edit 1:
As suggested in the replies, I have been documenting about named pipes and tried to use them in my code. First of all, it should be remarked that I am working with Cygwin, so my named pipes are created as they would be created under Linux. The modified pseudocode I used (including the corresponding system libraries) is the following one:
FILE *fd;
mkfifo("myfifo", 0666);
for (i=0; i<N; i++)
{
fd=fopen("myfifo", "wb");
generateframe(B, i+1);
WriteToPipe(B, fd); // void WriteToPipe(std::vector<int>, FILE *&fd)
fflush(fd);
fd=fclose("myfifo");
}
unlink("myfifo");
WriteToPipe is a slight modification of the previous WriteToFile function, where I made sure that the write buffer to send the image data is small enough to fit the pipe buffering limitations.
Then I compile and write the following command in the Cygwin terminal:
./myprogram | ffmpeg -i pipe:myfifo -c:v libx264 -preset slow -crf 20 Video.mp4
However, it remains stuck at the loop when i=0 at the "fopen" line (that is, the first fopen call). If I had not called ffmpeg it would be natural as the server (my program) would be waiting for a client program to connect to the "other side" of the pipe, but it is not the case. It looks like they cannot be connected through the pipe somehow, but I have not been able to find further documentation in order to overcome this issue. Any suggestion?

After some intense struggle, I finally managed to make it work after learning a bit how to use the FFmpeg and libx264 C APIs for my specific purpose, thanks to the useful information that some users provided in this site and some others, as well as some FFmpeg's documentation examples. For the sake of illustration, the details will be presented next.
First of all, the libx264 C library was compiled and, after that, the FFmpeg one with the configure options --enable-gpl --enable-libx264. Now let us go to the coding. The relevant part of the code that achieved the requested purpose is the following one:
Includes:
#include <stdint.h>
extern "C"{
#include <x264.h>
#include <libswscale/swscale.h>
#include <libavcodec/avcodec.h>
#include <libavutil/mathematics.h>
#include <libavformat/avformat.h>
#include <libavutil/opt.h>
}
LDFLAGS on Makefile:
-lx264 -lswscale -lavutil -lavformat -lavcodec
Inner code (for the sake of simplicity, the error checkings will be omitted and the variable declarations will be done when needed instead of the beginning for better understanding):
av_register_all(); // Loads the whole database of available codecs and formats.
struct SwsContext* convertCtx = sws_getContext(width, height, AV_PIX_FMT_RGB24, width, height, AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL); // Preparing to convert my generated RGB images to YUV frames.
// Preparing the data concerning the format and codec in order to write properly the header, frame data and end of file.
char *fmtext="mp4";
char *filename;
sprintf(filename, "GeneratedVideo.%s", fmtext);
AVOutputFormat * fmt = av_guess_format(fmtext, NULL, NULL);
AVFormatContext *oc = NULL;
avformat_alloc_output_context2(&oc, NULL, NULL, filename);
AVStream * stream = avformat_new_stream(oc, 0);
AVCodec *codec=NULL;
AVCodecContext *c= NULL;
int ret;
codec = avcodec_find_encoder_by_name("libx264");
// Setting up the codec:
av_dict_set( &opt, "preset", "slow", 0 );
av_dict_set( &opt, "crf", "20", 0 );
avcodec_get_context_defaults3(stream->codec, codec);
c=avcodec_alloc_context3(codec);
c->width = width;
c->height = height;
c->pix_fmt = AV_PIX_FMT_YUV420P;
// Setting up the format, its stream(s), linking with the codec(s) and write the header:
if (oc->oformat->flags & AVFMT_GLOBALHEADER) // Some formats require a global header.
c->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
avcodec_open2( c, codec, &opt );
av_dict_free(&opt);
stream->time_base=(AVRational){1, 25};
stream->codec=c; // Once the codec is set up, we need to let the container know which codec are the streams using, in this case the only (video) stream.
av_dump_format(oc, 0, filename, 1);
avio_open(&oc->pb, filename, AVIO_FLAG_WRITE);
ret=avformat_write_header(oc, &opt);
av_dict_free(&opt);
// Preparing the containers of the frame data:
AVFrame *rgbpic, *yuvpic;
// Allocating memory for each RGB frame, which will be lately converted to YUV:
rgbpic=av_frame_alloc();
rgbpic->format=AV_PIX_FMT_RGB24;
rgbpic->width=width;
rgbpic->height=height;
ret=av_frame_get_buffer(rgbpic, 1);
// Allocating memory for each conversion output YUV frame:
yuvpic=av_frame_alloc();
yuvpic->format=AV_PIX_FMT_YUV420P;
yuvpic->width=width;
yuvpic->height=height;
ret=av_frame_get_buffer(yuvpic, 1);
// After the format, code and general frame data is set, we write the video in the frame generation loop:
// std::vector<uint8_t> B(width*height*3);
The above commented vector has the same structure than the one I exposed in my question; however, the RGB data is stored on the AVFrames in a specific way. Therefore, for the sake of exposition, let us assume we have instead a pointer to a structure of the form uint8_t[3] Matrix(int, int), whose way to access the color values of the pixels for a given coordinate (x, y) is Matrix(x, y)->Red, Matrix(x, y)->Green and Matrix(x, y)->Blue, in order to get, respectively, to the red, green and blue values of the coordinate (x, y). The first argument stands for the horizontal position, from left to right as x increases and the second one for the vertical position, from top to bottom as y increases.
Being that said, the for loop to transfer the data, encode and write each frame would be the following one:
Matrix B(width, height);
int got_output;
AVPacket pkt;
for (i=0; i<N; i++)
{
generateframe(B, i); // This one is the function that generates a different frame for each i.
// The AVFrame data will be stored as RGBRGBRGB... row-wise, from left to right and from top to bottom, hence we have to proceed as follows:
for (y=0; y<height; y++)
{
for (x=0; x<width; x++)
{
// rgbpic->linesize[0] is equal to width.
rgbpic->data[0][y*rgbpic->linesize[0]+3*x]=B(x, y)->Red;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+1]=B(x, y)->Green;
rgbpic->data[0][y*rgbpic->linesize[0]+3*x+2]=B(x, y)->Blue;
}
}
sws_scale(convertCtx, rgbpic->data, rgbpic->linesize, 0, height, yuvpic->data, yuvpic->linesize); // Not actually scaling anything, but just converting the RGB data to YUV and store it in yuvpic.
av_init_packet(&pkt);
pkt.data = NULL;
pkt.size = 0;
yuvpic->pts = i; // The PTS of the frame are just in a reference unit, unrelated to the format we are using. We set them, for instance, as the corresponding frame number.
ret=avcodec_encode_video2(c, &pkt, yuvpic, &got_output);
if (got_output)
{
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base); // We set the packet PTS and DTS taking in the account our FPS (second argument) and the time base that our selected format uses (third argument).
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt); // Write the encoded frame to the mp4 file.
av_packet_unref(&pkt);
}
}
// Writing the delayed frames:
for (got_output = 1; got_output; i++) {
ret = avcodec_encode_video2(c, &pkt, NULL, &got_output);
if (got_output) {
fflush(stdout);
av_packet_rescale_ts(&pkt, (AVRational){1, 25}, stream->time_base);
pkt.stream_index = stream->index;
printf("Write frame %6d (size=%6d)\n", i, pkt.size);
av_interleaved_write_frame(oc, &pkt);
av_packet_unref(&pkt);
}
}
av_write_trailer(oc); // Writing the end of the file.
if (!(fmt->flags & AVFMT_NOFILE))
avio_closep(oc->pb); // Closing the file.
avcodec_close(stream->codec);
// Freeing all the allocated memory:
sws_freeContext(convertCtx);
av_frame_free(&rgbpic);
av_frame_free(&yuvpic);
avformat_free_context(oc);
Side notes:
For future reference, as the available information on the net concerning the time stamps (PTS/DTS) looks so confusing, I will next explain as well how I did manage to solve the issues by setting the proper values. Setting these values incorrectly caused that the output size was being much bigger than the one obtained through the ffmpeg built binary commandline tool, because the frame data was being redundantly written through smaller time intervals than the actually set by the FPS.
First of all, it should be remarked that when encoding there are two kinds of time stamps: one associated to the frame (PTS) (pre-encoding stage) and two associated to the packet (PTS and DTS) (post-encoding stage). In the first case, it looks like the frame PTS values can be assigned using a custom unit of reference (with the only restriction that they must be equally spaced if one wants constant FPS), so one can take for instance the frame number as we did in the above code. In the second one, we have to take into account the following parameters:
The time base of the output format container, in our case mp4 (=12800 Hz), whose information is held in stream->time_base.
The desired FPS of the video.
If the encoder generates B-frames or not (in the second case the PTS and DTS values for the frame must be set the same, but it is more complicated if we are in the first case, like in this example). See this answer to another related question for more references.
The key here is that luckily it is not necessary to struggle with the computation of these quantities, as libav provides a function to compute the correct time stamps associated to the packet by knowing the aforementioned data:
av_packet_rescale_ts(AVPacket *pkt, AVRational FPS, AVRational time_base)
Thanks to these considerations, I was finally able to generate a sane output container and essentially the same compression rate than the one obtained using the commandline tool, which were the two remaining issues before investigating more deeply how the format header and trailer and how the time stamps are properly set.

Thanks for your excellent work, #ksb496 !
One minor improvement:
c=avcodec_alloc_context3(codec);
should be better written as:
c = stream->codec;
to avoid a memory leak.
If you don't mind, I've uploaded the complete ready-to-deploy library onto GitHub: https://github.com/apc-llc/moviemaker-cpp.git

Thanks to ksb496 I managed to do this task, but in my case I need to change some codes to work as expected. I thought maybe it could help others so I decided to share (with two years delay :D).
I had an RGB buffer filled by directshow sample grabber that I needed to take a video from. RGB to YUV conversion from given answer didn't do the job for me. I did it like this :
int stride = m_width * 3;
int index = 0;
for (int y = 0; y < m_height; y++) {
for (int x = 0; x < stride; x++) {
int j = (size - ((y + 1)*stride)) + x;
m_rgbpic->data[0][j] = data[index];
++index;
}
}
data variable here is my RGB buffer (simple BYTE*) and size is data buffer size in bytes. It's start filling RGB AVFrame from bottom left to top right.
The other thing is that my version of FFMPEG didn't have av_packet_rescale_ts function. It's latest version but FFMPEG docs didn't say this function is deprecated anywhere, I guess this might be the case for windows only. Anyway I used av_rescale_q instead that does the same job. like this :
AVPacket pkt;
pkt.pts = av_rescale_q(pkt.pts, { 1, 25 }, m_stream->time_base);
And the last thing, using this format conversion I needed to change my swsContext to BGR24 instead of RGB24 like this :
m_convert_ctx = sws_getContext(width, height, AV_PIX_FMT_BGR24, width, height,
AV_PIX_FMT_YUV420P, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);

avcodec_encode_video2 & avcodec_encode_audio2 seems to be deprecated. FFmpeg of Current Version (4.2) has new API: avcodec_send_frame & avcodec_receive_packet.

ffmpeg, C++ - get fps in program

I have a video which has 23.98 fps, this can be seen from Quicktime and ffmpeg in the command line. OpenCV, wrongly, thinks it has 23 fps. I am interested in finding a programmatic way to find out a video fps' from ffmpeg.

To get video Frames per Second (fps) value using FFMPEG-C++
/* find first stream */
for(int i=0; i<pAVFormatContext->nb_streams ;i++ )
{
if( pAVFormatContext->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO )
/* if video stream found then get the index */
{
VideoStreamIndx = i;
break;
}
}
/* if video stream not availabe */
if((VideoStreamIndx) == -1)
{
std::cout<<"video streams not found"<<std::endl;
return -1;
}
/* get video fps */
double videoFPS = av_q2d(ptrAVFormatContext->streams[VideoStreamIndx]->r_frame_rate);
std::cout<<"fps :"<<videoFPS<<std::endl;

A quick look into the OpenCV sources show the following:
double CvCapture_FFMPEG::get_fps()
{
double fps = r2d(ic->streams[video_stream]->r_frame_rate);
#if LIBAVFORMAT_BUILD >= CALC_FFMPEG_VERSION(52, 111, 0)
if (fps < eps_zero)
{
fps = r2d(ic->streams[video_stream]->avg_frame_rate);
}
#endif
if (fps < eps_zero)
{
fps = 1.0 / r2d(ic->streams[video_stream]->codec->time_base);
}
return fps;
}
so looks quite right. Maybe run a debug session through this part to verify the values at this point? The avg_frame_rate of AVStream is an AVRational so it should be able to hold the precise value. Maybe if your code uses the second if block due to an older ffmpeg version the time_base is not set right?
EDIT
If you debug take a look if the r_frame_rate and avg_frame_rate differ, since at least according to this they tend to differ based on the codec used. Since you have not mentioned the video format it's hard to guess, but seems that at least for H264 you should use avg_ frame_rate straightforward and a value obtained from r_frame_rate could mess things up.

From libavformat version 55.1.100 released in 2013-03-29, av_guess_frame_rate() is added.
/**
* Guess the frame rate, based on both the container and codec information.
*
* #param ctx the format context which the stream is part of
* #param stream the stream which the frame is part of
* #param frame the frame for which the frame rate should be determined, may be NULL
* #return the guessed (valid) frame rate, 0/1 if no idea
*/
AVRational av_guess_frame_rate(AVFormatContext *ctx, AVStream *stream, AVFrame *frame);

ffmpeg::avcodec_encode_video setting PTS h264

I'm trying to encode video as H264 using libavcodec
ffmpeg::avcodec_encode_video(codec,output,size,avframe);
returns an error that I don't have the avframe->pts value set correctly.
I have tried setting it to 0,1, AV_NOPTS_VALUE and 90khz * framenumber but still get the error non-strictly-monotonic PTS
The ffmpeg.c example sets the packet.pts with ffmpeg::av_rescale_q() but this is only called after you have encoded the frame !
When used with the MP4V codec the avcodec_encode_video() sets the pts value correctly itself.

I had the same problem, solved it by calculating pts before calling avcodec_encode_video as follows:
//Calculate PTS: (1 / FPS) * sample rate * frame number
//sample rate 90KHz is for h.264 at 30 fps
picture->pts = (1.0 / 30) * 90 * frame_count;
out_size = avcodec_encode_video(c, video_outbuf, video_outbuf_size, picture);
Solution stolen from this helpful blog post
(Note: Changed sample rate to khz, expressed in hz was far too long between frames, may need to play with this value - not a video encoding expert here, just wanted something that worked and this did)

I had this problem too. I sloved the problem in this way:
Before you invoke
ffmpeg::avcodec_encode_video(codec,output,size,avframe);
you set the pts value of avframe an integer value which has an initial value 0 and increments by one every time, just like this:
avframe->pts = nextPTS();
The implementation of nextPTS() is:
int nextPTS()
{
static int static_pts = 0;
return static_pts ++;
}
After giving the pts of avframe a value, then encoded it. If encoding successfully. Add the following code:
if (packet.pts != AV_NOPTS_VALUE)
packet.pts = av_rescale_q(packet.pts, mOutputCodecCtxPtr->time_base, mOutputStreamPtr->time_base);
if (packet.dts != AV_NOPTS_VALUE)
packet.dts = av_rescale_q(packet.dts, mOutputCodecCtxPtr->time_base, mOutputStreamPtr->time_base);
It'll add correct dts value for the encoded AVFrame. Among the code, packe of type AVPacket, mOutputCodeCtxPtr of type AVCodecContext* and mOutputStreamPtr of type AVStream.
avcodec_encode_video returns 0 indicates the current frame is buffered, you have to flush all buffered frames after all frames have been encoded. The code flushs all buffered frame somewhat like:
int ret;
while((ret = ffmpeg::avcodec_encode_video(codec,output,size,NULL)) >0)
;// place your code here.

I had this problem too. As far as I remember, the error is related to dts
setting
out_video_packet.dts = AV_NOPTS_VALUE;
helped me

A strictly increase monotonic function is a function where f(x) < f(y) if x < y.
So it means you cannot encode 2 frames with the same PTS as you were doing... check for example with a counter and it should not return error anymore.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Getting audio sound level from FLTP audio stream - c++

Related

NVencs Output Bitstream is not readable

Obtain the total number of samples with FFMpeg

How to encode a video from several images generated in a C++ program without writing the separate frame images to disk?

ffmpeg, C++ - get fps in program

ffmpeg::avcodec_encode_video setting PTS h264

Categories

Resources