I'm trying to encode video as H264 using libavcodec
ffmpeg::avcodec_encode_video(codec,output,size,avframe);
returns an error that I don't have the avframe->pts value set correctly.
I have tried setting it to 0,1, AV_NOPTS_VALUE and 90khz * framenumber but still get the error non-strictly-monotonic PTS
The ffmpeg.c example sets the packet.pts with ffmpeg::av_rescale_q() but this is only called after you have encoded the frame !
When used with the MP4V codec the avcodec_encode_video() sets the pts value correctly itself.
I had the same problem, solved it by calculating pts before calling avcodec_encode_video as follows:
//Calculate PTS: (1 / FPS) * sample rate * frame number
//sample rate 90KHz is for h.264 at 30 fps
picture->pts = (1.0 / 30) * 90 * frame_count;
out_size = avcodec_encode_video(c, video_outbuf, video_outbuf_size, picture);
Solution stolen from this helpful blog post
(Note: Changed sample rate to khz, expressed in hz was far too long between frames, may need to play with this value - not a video encoding expert here, just wanted something that worked and this did)
I had this problem too. I sloved the problem in this way:
Before you invoke
ffmpeg::avcodec_encode_video(codec,output,size,avframe);
you set the pts value of avframe an integer value which has an initial value 0 and increments by one every time, just like this:
avframe->pts = nextPTS();
The implementation of nextPTS() is:
int nextPTS()
{
static int static_pts = 0;
return static_pts ++;
}
After giving the pts of avframe a value, then encoded it. If encoding successfully. Add the following code:
if (packet.pts != AV_NOPTS_VALUE)
packet.pts = av_rescale_q(packet.pts, mOutputCodecCtxPtr->time_base, mOutputStreamPtr->time_base);
if (packet.dts != AV_NOPTS_VALUE)
packet.dts = av_rescale_q(packet.dts, mOutputCodecCtxPtr->time_base, mOutputStreamPtr->time_base);
It'll add correct dts value for the encoded AVFrame. Among the code, packe of type AVPacket, mOutputCodeCtxPtr of type AVCodecContext* and mOutputStreamPtr of type AVStream.
avcodec_encode_video returns 0 indicates the current frame is buffered, you have to flush all buffered frames after all frames have been encoded. The code flushs all buffered frame somewhat like:
int ret;
while((ret = ffmpeg::avcodec_encode_video(codec,output,size,NULL)) >0)
;// place your code here.
I had this problem too. As far as I remember, the error is related to dts
setting
out_video_packet.dts = AV_NOPTS_VALUE;
helped me
A strictly increase monotonic function is a function where f(x) < f(y) if x < y.
So it means you cannot encode 2 frames with the same PTS as you were doing... check for example with a counter and it should not return error anymore.
Related
I need to get audio level or even better, EQ data from NDI audio stream in C++. Here's the struct of a audio packet:
// This describes an audio frame.
typedef struct NDIlib_audio_frame_v3_t {
// The sample-rate of this buffer.
int sample_rate;
// The number of audio channels.
int no_channels;
// The number of audio samples per channel.
int no_samples;
// The timecode of this frame in 100-nanosecond intervals.
int64_t timecode;
// What FourCC describing the type of data for this frame.
NDIlib_FourCC_audio_type_e FourCC;
// The audio data.
uint8_t* p_data;
union {
// If the FourCC is not a compressed type and the audio format is planar, then this will be the
// stride in bytes for a single channel.
int channel_stride_in_bytes;
// If the FourCC is a compressed type, then this will be the size of the p_data buffer in bytes.
int data_size_in_bytes;
};
// Per frame metadata for this frame. This is a NULL terminated UTF8 string that should be in XML format.
// If you do not want any metadata then you may specify NULL here.
const char* p_metadata;
// This is only valid when receiving a frame and is specified as a 100-nanosecond time that was the exact
// moment that the frame was submitted by the sending side and is generated by the SDK. If this value is
// NDIlib_recv_timestamp_undefined then this value is not available and is NDIlib_recv_timestamp_undefined.
int64_t timestamp;
#if NDILIB_CPP_DEFAULT_CONSTRUCTORS
NDIlib_audio_frame_v3_t(
int sample_rate_ = 48000, int no_channels_ = 2, int no_samples_ = 0,
int64_t timecode_ = NDIlib_send_timecode_synthesize,
NDIlib_FourCC_audio_type_e FourCC_ = NDIlib_FourCC_audio_type_FLTP,
uint8_t* p_data_ = NULL, int channel_stride_in_bytes_ = 0,
const char* p_metadata_ = NULL,
int64_t timestamp_ = 0
);
#endif // NDILIB_CPP_DEFAULT_CONSTRUCTORS
} NDIlib_audio_frame_v3_t;
Problem is that unlike video frames I have absolutely no idea how binary audio is packed and there's much less information about it online. The best information I found so far is this project:
https://github.com/gavinnn101/fishing_assistant/blob/7f5fcd73de1e39336226b5969cd1c5ca84c8058b/fishing_main.py#L124
It uses PyAudio however which I'm not familiar with and they use 16 bit audio format while mine seems to be 32bit and I can't figure out the struct.unpack stuff either because "%dh"%(count) is telling it some number then h for short which I don't understand how it would interpret.
Is there any C++ library that can take pointer to the data and type then has functions to extract sound level, sound level at certain hertz etc?
Or just some good information on how I would extract this myself? :)
I've searched the web a lot while finding very little. I've placed a breakpoint when the audio frame is populated but given up once I realize there's too many variables to think of that I don't have a clue about like sample rate, channels, sample count etc.
Got it working using
// This function calculates the RMS value of an audio frame
float calculateRMS(const NDIlib_audio_frame_v2_t& frame)
{
// Calculate the number of samples in the frame
int numSamples = frame.no_samples * frame.no_channels;
// Get a pointer to the start of the audio data
const float* data = frame.p_data;
// Calculate the sum of the squares of the samples
float sumSquares = 0.0f;
for (int i = 0; i < numSamples; ++i)
{
float sample = data[i];
sumSquares += sample * sample;
}
// Calculate the RMS value and return it
return std::sqrt(sumSquares / numSamples);
}
called as
// Keep receiving audio frames and printing their RMS values
NDIlib_audio_frame_v2_t audioFrame;
while (true)
{
// Wait for the next audio frame to be received
if (NDIlib_recv_capture_v2(pNDI_recv, NULL, &audioFrame, NULL, 0) != NDIlib_frame_type_audio)
continue;
// Print the RMS value of the audio frame
std::cout << "RMS: " << calculateRMS(audioFrame) << std::endl;
NDIlib_recv_free_audio_v2(pNDI_recv, &audioFrame);
}
Shoutout to chatGPT for explaining and feeding me with possible solutions until I managed to get a working solution :--)
I am using fftw to analyse the frequency spectrum of audio input to a computer from the mic input. I am using portaudio c++ libraries to capture the windows of time-domain audio data and then fftw to do a real to complex r2c transformation of this data to the frequency domain. Below is my function which I call everytime I receive the block of data.
The sample rate is 44100 samples per second , the sample type is short (signed 16 bit integer)and I am taking 250ms blocks of data in each window. The fft resolution is therefore 4Hz.
The problem is , i'm not sure how to interpret the data which I am receiving after the transformation. When no audio is played , I am getting amplitudes of around 1000 to 4000 for every frequency component, as soon as audio is played from an instrument for example, all of the amplitudes go negative.
I have tried doing a normalisation before the fft, by dividing by the average power and then the data makes more sense. All amplitudes are from 200 to 500 when nothing is played, then for example if I play a tone of 76Hz, the amplitude for this component increases to around 2000. So that is something along the lines of what I expect, but still not sure if this process can be implemented better.
My question is, am I doing the right thing here? Must the data be normalised and am I doing it right? Why am I still receiving high amplitudes on the frequencies that are not being played. Has anyone any experience of doing something similar and maybe give some tips. Many thanks in advance.
void AudioProcessor::GetFFT(void* inputData, void* freqSpectrum)
{
double* input = (double*)inputData;
short* freq_spectrum = (short*)freqSpectrum;
fftPlan = fftw_plan_dft_r2c_1d(FRAMES_PER_BUFFER, input, complexOut, FFTW_ESTIMATE);
fftw_execute(fftPlan);
////
for (int k = 0; k < (FRAMES_PER_BUFFER + 1) / 2; ++k)
{
freq_spectrum[k] = (short)(sqrt(complexOut[k][0] * complexOut[k][0] + complexOut[k][1] * complexOut[k][1]));
}
if (FRAMES_PER_BUFFER % 2 == 0) /* frames per buffer is even number */
{
freq_spectrum[FRAMES_PER_BUFFER / 2] = (short)(sqrt(complexOut[FRAMES_PER_BUFFER / 2][0] * complexOut[FRAMES_PER_BUFFER / 2][0] + complexOut[FRAMES_PER_BUFFER / 2][1] * complexOut[FRAMES_PER_BUFFER / 2][1])); /* Nyquist freq. */
}
}
I am using ffmpeg to record video input from GDI (windows screen recorder) to view it later using VLC (via ActiveX plugin) + ffmpeg to decode it.
Right now seeking in video is not working in VLC via plugin (which is critical). VLC player itself provide seeking, but it is more like byte position seeking (on I- frames which are larger than other frames it makes larger steps on horizontal scroll and also there are no timestamps).
Encoder is opened with next defaults:
avformat_alloc_output_context2(&outputContext, NULL, "mpegts", "test.mpg");
outputFormat = outputContext->oformat;
encoder = avcodec_find_encoder(AV_CODEC_ID_H264);
outputStream = avformat_new_stream(outputContext, encoder);
outputStream->id = outputContext->nb_streams - 1;
encoderContext = outputStream->codec;
encoderContext->bit_rate = bitrate; // 800000 by default
encoderContext->rc_max_rate = bitrate;
encoderContext->width = imageWidth; // 1920
encoderContext->height = imageHeight; // 1080
encoderContext->time_base.num = 1;
encoderContext->time_base.den = fps; // 25 by default
encoderContext->gop_size = fps;
encoderContext->keyint_min = fps;
encoderContext->max_b_frames = 0;
encoderContext->pix_fmt = AV_PIX_FMT_YUV420P;
outputStream->time_base = encoderContext->time_base;
avcodec_open2(encoderContext, encoder, NULL);
Recording is done this way:
// my impl of GDI recorder, returning AVFrame with only data and linesize filled.
AVFrame* tmp_frame = impl_->recorder->acquireFrame();
// converting RGB -> YUV420
sws_scale(impl_->scaleContext, tmp_frame->data, tmp_frame->linesize, 0, impl_->frame->height, impl_->frame->data, impl_->frame->linesize);
// pts variable is calculated by using QueryPerformanceCounter form WinAPI. It is strictly increasing
impl_->frame->pts = pts;
avcodec_encode_video2(impl_->encoderContext, impl_->packet, impl_->frame, &out_size);
if (out_size) {
impl_->packet->pts = pts;
impl_->packet->dts = pts;
impl_->packet->duration = 1; // here it is! It is set but has no effect
av_packet_rescale_ts(impl_->packet, impl_->encoderContext->time_base, impl_->outputStream->time_base);
// here pts = 3600*pts, dts = 3600*pts, duration = 3600 what I consider to be legit in terms of milliseconds
impl_->packet->stream_index = impl_->outputStream->index;
av_interleaved_write_frame(impl_->outputContext, impl_->packet);
av_packet_unref(impl_->packet);
out_size = 0;
}
ffprobe is providing next info on frames:
[FRAME]
media_type=video
stream_index=0
key_frame=1
pkt_pts=3600
pkt_pts_time=0:00:00.040000
pkt_dts=3600
pkt_dts_time=0:00:00.040000
best_effort_timestamp=3600
best_effort_timestamp_time=0:00:00.040000
pkt_duration=N/A
pkt_duration_time=N/A
pkt_pos=564
pkt_size=97.018555 Kibyte
width=1920
height=1080
pix_fmt=yuv420p
sample_aspect_ratio=N/A
pict_type=I
coded_picture_number=0
display_picture_number=0
interlaced_frame=0
top_field_first=0
repeat_pict=0
[/FRAME]
I believe that problem is in pkt_duration variable, though it was set.
What I am doing wrong in recording so I can't seek in video?
P.S. on other videos (also h264) seeking is working in ActiveX VLC plugin.
What is definitely wrong, is:
impl_->packet->pts = pts;
impl_->packet->dts = pts;
PTS and DTS are not equal! They could be if you would have only I-frames, which is not the case here. Also, your comment says: pts variable is calculated by using QueryPerformanceCounter form WinAPI. If your frame rate is constant, and I believe it is, then you don't need QueryPerformanceCounter API. PTS is usually in 90kHz units. The duration of 1 frame expressed in 90kHz is calculated like this:
90000 x denominator / numerator
If fps is 25 then numerator is 25 and denominator is 1. For 29.97 fps the numerator is 30000 and denominator is 1001. Each new frame's PTS should be increased for that amount (unless you have dropped frames). Regarding the DTS, the encoder should provide that value.
Currently my application reads audio files based on a while-realloc loop:
// Pseudocode
float data* = nullptr;
int size = 0;
AVFrame* frame;
while(readFrame(formatContext, frame))
{
data = realloc(data, size + frame.nSamples);
size += frame.nSamples;
/* Read frame samples into data */
}
Is there a way to obtain the total number of samples in a stream at the beginning? I want to be able to create the array with new[] instead of malloc.
For reference, this was answered here:
FFmpeg: How to estimate number of samples in audio stream?
I used the following in my code:
int total_samples = (int) ((format_context->duration / (float) AV_TIME_BASE) * SAMPLE_RATE * NUMBER_CHANNELS);
NOTE: my testing shows this calculation will be most likely be more than the actual number of samples found, so make sure you compensate for that in your code. I set all remaining "unset" samples to zero.
I have a video which has 23.98 fps, this can be seen from Quicktime and ffmpeg in the command line. OpenCV, wrongly, thinks it has 23 fps. I am interested in finding a programmatic way to find out a video fps' from ffmpeg.
To get video Frames per Second (fps) value using FFMPEG-C++
/* find first stream */
for(int i=0; i<pAVFormatContext->nb_streams ;i++ )
{
if( pAVFormatContext->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO )
/* if video stream found then get the index */
{
VideoStreamIndx = i;
break;
}
}
/* if video stream not availabe */
if((VideoStreamIndx) == -1)
{
std::cout<<"video streams not found"<<std::endl;
return -1;
}
/* get video fps */
double videoFPS = av_q2d(ptrAVFormatContext->streams[VideoStreamIndx]->r_frame_rate);
std::cout<<"fps :"<<videoFPS<<std::endl;
A quick look into the OpenCV sources show the following:
double CvCapture_FFMPEG::get_fps()
{
double fps = r2d(ic->streams[video_stream]->r_frame_rate);
#if LIBAVFORMAT_BUILD >= CALC_FFMPEG_VERSION(52, 111, 0)
if (fps < eps_zero)
{
fps = r2d(ic->streams[video_stream]->avg_frame_rate);
}
#endif
if (fps < eps_zero)
{
fps = 1.0 / r2d(ic->streams[video_stream]->codec->time_base);
}
return fps;
}
so looks quite right. Maybe run a debug session through this part to verify the values at this point? The avg_frame_rate of AVStream is an AVRational so it should be able to hold the precise value. Maybe if your code uses the second if block due to an older ffmpeg version the time_base is not set right?
EDIT
If you debug take a look if the r_frame_rate and avg_frame_rate differ, since at least according to this they tend to differ based on the codec used. Since you have not mentioned the video format it's hard to guess, but seems that at least for H264 you should use avg_ frame_rate straightforward and a value obtained from r_frame_rate could mess things up.
From libavformat version 55.1.100 released in 2013-03-29, av_guess_frame_rate() is added.
/**
* Guess the frame rate, based on both the container and codec information.
*
* #param ctx the format context which the stream is part of
* #param stream the stream which the frame is part of
* #param frame the frame for which the frame rate should be determined, may be NULL
* #return the guessed (valid) frame rate, 0/1 if no idea
*/
AVRational av_guess_frame_rate(AVFormatContext *ctx, AVStream *stream, AVFrame *frame);