Get frame time in ffmpeg

Get frame time in ffmpeg - c++

I am trying to make a little video player that has seek bar (with ffmpeg, of course). For that i need function that will, using data from frame and/or packet, get me current time in the video that should be set in seek slider.
It should work like this:
my_time = get_cur_time()
seek(my_time + 10)
assert(my_time+10 == get_cur_time())
seek(my_time - 10)
assert(my_time-10 == get_cur_time())
I do understand thatffmpeg does not support precise seeking, so equality here means "something reasonably cloae).
What code have i used for this thus far:
frame_time = frame->pts*av_q2d(video_dec_ctx->time_base) * 1000;
where frame is AVFrame and video_dec_ctx is AVCodecContext.
And for seeking:
int fn = ffmpeg::av_rescale(tsms,fmt_ctx->streams[video_stream->index]->time_base.den,
fmt_ctx->streams[video_stream->index]->time_base.num);
int frame = fn/1000;
printf("\t avformat_seek_file to %d\n",frame);
int flags = AVSEEK_FLAG_FRAME;
if (frame < this->frame->pts)
flags |= AVSEEK_FLAG_BACKWARD;
if(ffmpeg::av_seek_frame(fmt_ctx,video_stream->index,frame,flags))
{
printf("\nFailed to seek for time %d",frame);
return false;
}
avcodec_flush_buffers(video_dec_ctx);
int got_frame = 0;
do
if (av_read_frame(fmt_ctx, &pkt) >= 0) {
decode_packet_ro(&got_frame, 0);
av_free_packet(&pkt);
}
else
{
read_cache = true;
pkt.data = NULL;
pkt.size = 0;
break;
}
while(!(got_frame && this->frame->pts >= frame));
The code does forward seeking passably, but after any attempt of backward seeking my second assertion fails. After seeking to previous position, my method of getting time does not return position less that one before seeking. That causes my seek slider to work grossly incorrectly.

Related

iOS AVCaptureSession stream micro audio Objective-C++

I am currently messing the first time with iOS and Objective-C++. Im coming from C/C++ so please excuse my bad coding in the below examples.
I am trying to live stream the microphone audio of my iOS device over tcp, the iOS device is acting as server and sends the data to all clients that connect.
To do so, I am first using AVCaptureDevice and requestAccessForMediaType:AVMediaTypeAudio to request access to the microphone (along with the needed entry in the Info.plist).
Then I create a AVCaptureSession* using the below function:
AVCaptureSession* createBasicARecordingSession(aReceiver* ObjectReceivingAudioFrames){
AVCaptureSession* s = [[AVCaptureSession alloc] init];
AVCaptureDevice* aDevice = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
AVCaptureDeviceInput* aInput = NULL;
if([aDevice lockForConfiguration:NULL] == YES && aDevice){
aInput = [AVCaptureDeviceInput deviceInputWithDevice:aDevice error:nil];
[aDevice unlockForConfiguration];
}
else if(!aDevice){
fprintf(stderr, "[d] could not create device. (%p)\n", aDevice);
return NULL;
}
else{
fprintf(stderr, "[d] could not lock device.\n");
return NULL;
}
if(!aInput){
fprintf(stderr, "[d] could not create input.\n");
return NULL;
}
AVCaptureAudioDataOutput* aOutput = [[AVCaptureAudioDataOutput alloc] init];
dispatch_queue_t aQueue = dispatch_queue_create("aQueue", NULL);
if(!aOutput){
fprintf(stderr, "[d] could not create output.\n");
return NULL;
}
[aOutput setSampleBufferDelegate:ObjectReceivingAudioFrames queue:aQueue];
// the below line does only work on macOS
//aOutput.audioSettings = settings;
[s beginConfiguration];
if([s canAddInput:aInput]){
[s addInput:aInput];
}
else{
fprintf(stderr, "[d] could not add input.\n");
return NULL;
}
if([s canAddOutput:aOutput]){
[s addOutput:aOutput];
}
else{
fprintf(stderr, "[d] could not add output.\n");
return NULL;
}
[s commitConfiguration];
return s;
}
The aReceiver* class (?) is defined below and receives the audio frames provided by the AVCaptureAudioDataOutput* object. The frames are stored inside a std::vector.
(im adding the code as image as I could not get it formatted right...)
Then I start the AVCaptureSession* using [audioSession start]
When a tcp client connects I first create a AudioConverterRef and two AudioStreamBasicDescription to convert the audio frames to AAC, see below:
AudioStreamBasicDescription asbdIn, asbdOut;
AudioConverterRef converter;
asbdIn.mFormatID = kAudioFormatLinearPCM;
//asbdIn.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
asbdIn.mFormatFlags = 12;
asbdIn.mSampleRate = 44100;
asbdIn.mChannelsPerFrame = 1;
asbdIn.mFramesPerPacket = 1;
asbdIn.mBitsPerChannel = 16;
//asbdIn.mBytesPerFrame = (asbdIn.mBitsPerChannel / 8) * asbdIn.mBitsPerChannel;
asbdIn.mBytesPerFrame = 2;
asbdIn.mBytesPerPacket = asbdIn.mBytesPerFrame;
asbdIn.mReserved = 0;
asbdOut.mFormatID = kAudioFormatMPEG4AAC;
asbdOut.mFormatFlags = 0;
asbdOut.mSampleRate = 44100;
asbdOut.mChannelsPerFrame = 1;
asbdOut.mFramesPerPacket = 1024;
asbdOut.mBitsPerChannel = 0;
//asbdOut.mBytesPerFrame = (asbdOut.mBitsPerChannel / 8) * asbdOut.mBitsPerChannel;
asbdOut.mBytesPerFrame = 0;
asbdOut.mBytesPerPacket = asbdOut.mBytesPerFrame;
asbdOut.mReserved = 0;
OSStatus err = AudioConverterNew(&asbdIn, &asbdOut, &converter);
Then I create a AudioBufferList* to store the encoded frames:
while(audioInput.locked){ // audioInput is my aReceiver*
usleep(0.2 * 1000000);
}
audioInput.locked = true;
UInt32 RequestedPackets = 8192;
//AudioBufferList* aBufferList = (AudioBufferList*)malloc(sizeof(AudioBufferList));
AudioBufferList* aBufferList = static_cast<AudioBufferList*>(calloc(1, offsetof(AudioBufferList, mBuffers) + (sizeof(AudioBuffer) * 1)));
aBufferList->mNumberBuffers = 1;
aBufferList->mBuffers[0].mNumberChannels = asbdIn.mChannelsPerFrame;
aBufferList->mBuffers[0].mData = static_cast<void*>(calloc(RequestedPackets, asbdIn.mBytesPerFrame));
aBufferList->mBuffers[0].mDataByteSize = asbdIn.mBytesPerFrame * RequestedPackets;
Then I go through the frames stored in the std::vector mentioned earlier and pass them to AudioConverterFillComplexBuffer(). After conversion, i concat all encoded frames into one NSMutableData which I then write() to the socket connected to the client.
long aBufferListSize = audioInput.aBufferList.size();
while(aBufferListSize > 0){
err = AudioConverterFillComplexBuffer(converter, feedAFrames, static_cast<void*>(&audioInput.aBufferList[audioInput.aBufferList.size() - aBufferListSize]), &RequestedPackets, aBufferList, NULL);
NSMutableData* encodedData = [[NSMutableData alloc] init];
long encodedDataLen = 0;
for(int i = 0; i < aBufferList->mNumberBuffers; i++){
Float32* frame = (Float32*)aBufferList->mBuffers[i].mData;
[encodedData appendBytes:frame length:aBufferList->mBuffers[i].mDataByteSize];
encodedDataLen += aBufferList->mBuffers[i].mDataByteSize;
}
unsigned char* encodedDataBytes = (unsigned char*)[encodedData bytes];
fprintf(stderr, "[d] got %li encoded bytes to send...\n", encodedDataLen);
long bytes = write(Client->GetFD(), encodedDataBytes, encodedDataLen);
fprintf(stderr, "[d] written %li of %li bytes.\n", bytes, encodedDataLen);
usleep(0.2 * 1000000);
aBufferListSize--;
}
audioInput.aBufferList.clear();
audioInput.locked = false;
Below is the feedAFrames() callback used in the AudioConverterFillComplexBuffer() call:
(again this is an image of the code, same reason as above)
Step 5 to 7 are repeated until the tcp connection is closed.
Each step runs without any noticeable error (I know I could include way better error handling here), and I do get data out of step 3 and 7. However it does not seem to be AAC what comes out at the end.
As im rather new to all of this, im really not sure what my error is, im sure there are several things I made wrong. It is kind of hard to find suitable example code of what I am trying to do, and the above is the best I could come up with until now with all that I have found paired with the apple dev documentation.
I hope someone might take some time to explain me what I did wrong and how I can get this to work. Thanks for reading until here!

How does AudioKit's AKNodeOutputPlot pull it's data?

I'm very new to the AudioKit framework and I have been trying to understand a bit more about the DSP side to it. Whilst rummaging around in the source code I realised that AKNodeOutputPlot does not pull data from the node the same way others would.
In the DSP code for the AKAmplitudeTracker an RMS value is calculated for each channel and the result is briefly written to the output buffer but at the end of the for loop the node is essentially bypassed by setting the output to the original input:
void process(AUAudioFrameCount frameCount, AUAudioFrameCount bufferOffset) override {
for (int frameIndex = 0; frameIndex < frameCount; ++frameIndex) {
int frameOffset = int(frameIndex + bufferOffset);
for (int channel = 0; channel < channels; ++channel) {
float *in = (float *)inBufferListPtr->mBuffers[channel].mData + frameOffset;
float temp = *in;
float *out = (float *)outBufferListPtr->mBuffers[channel].mData + frameOffset;
if (channel == 0) {
if (started) {
sp_rms_compute(sp, leftRMS, in, out);
leftAmplitude = *out;
} else {
leftAmplitude = 0;
}
} else {
if (started) {
sp_rms_compute(sp, rightRMS, in, out);
rightAmplitude = *out;
} else {
rightAmplitude = 0;
}
}
*out = temp;
}
}
}
This makes sense since outputting the RMS value to the device speakers would sound terrible but when this node is used as the input to the AKNodeOutputPlot object RMS values are plotted.
I assumed that the leftAmplitude and rightAmplitude variables were being referenced somewhere but even if they are zeroed out the plot works just fine. I'm interested in doing some work on the signal without effecting the output so I'd love it someone could help me figure how the AKPlot is grabbing this data.
Cheers

AKNodeOutputPlot works with something called a "tap":
https://github.com/AudioKit/AudioKit/blob/master/AudioKit/Common/User%20Interface/AKNodeOutputPlot.swift
There are also a few other taps that are not necessarily just for user interface purposes:
https://github.com/AudioKit/AudioKit/tree/master/AudioKit/Common/Taps
Taps allow you to inspect the data being pulled through another node without being inserted into the signal chain itself.

SDL2 + ffmpeg2 intermittent clicks instead of audio

Using c++, SDL2, SDL2_mix, ffmpeg2.
Inited SDL2_mix with callback
Mix_HookMusic(MusicPlayer, &g_audioPos);
Decoded audio from ogg to AVFrame* by this code:
while(av_read_frame(m_formatContext, &m_packet) >= 0)
{
if(m_packet.stream_index == m_audioStream)
{
int audio_frame_finished;
avcodec_decode_audio4(m_audioCodecContext, m_audioFrame, &audio_frame_finished, &m_packet);
if(!audio_frame_finished)
{
continue;
}
}
}
After this code i have frame with some m_audioFrame->data[0] and m_audioFrame->linesize[0] == 4096
Once in a while my callback being called:
void MusicPlayer(void *_udata, Uint8 *_stream, int _len)
{
if(!g_audioData)
{
return;
}
AudioData* audio = reinterpret_cast<AudioData*>(g_audioData);
if(!audio || audio->Frame->linesize[0] == 0)
{
return;
}
SDL_memcpy((Uint8*)audio->Data + audio->Pos, (Uint8*)audio->Frame->data[0], audio->Frame->linesize[0]);
audio->Pos+= audio->Frame->linesize[0];
int rest = g_chunkSize - audio->Pos;//frame->linesize[0];
if(rest <= 0)
{
SDL_memcpy(_stream, audio->Data, _len);
audio->Pos = 0;
*(int*)_udata += _len;
}
}
_len == 8192, so i must push 2 frames to fill stream, but all i get - clicks in my speaker. What am i doing wrong?
PS: Tried to reopen MIX with Mix_OpenAudio(m_audioCodecContext->sample_rate, AUDIO_S16SYS, m_audioCodecContext->channels, 4096);. Interesting thing is m_audioCodecContext->channels == 2 and when my callback being called _len = 16384. Have no idea what to do. Please help!!!

You did not show how you perform the ffmpeg initialization; I'd speculate you forgot to specify the requested sample format:
aCodecCtx->request_sample_fmt = AV_SAMPLE_FMT_S16;
Take a look at the source code of my Karaoke lyrics editor which uses the SDL/ffmpeg audio player based on the latest ffmpeg (and hence decode_audio4): https://sourceforge.net/p/karlyriceditor/code/HEAD/tree/src/audioplayerprivate.cpp

How to write a Live555 FramedSource to allow me to stream H.264 live

I've been trying to write a class that derives from FramedSource in Live555 that will allow me to stream live data from my D3D9 application to an MP4 or similar.
What I do each frame is grab the backbuffer into system memory as a texture, then convert it from RGB -> YUV420P, then encode it using x264, then ideally pass the NAL packets on to Live555. I made a class called H264FramedSource that derived from FramedSource basically by copying the DeviceSource file. Instead of the input being an input file, I've made it a NAL packet which I update each frame.
I'm quite new to codecs and streaming, so I could be doing everything completely wrong. In each doGetNextFrame() should I be grabbing the NAL packet and doing something like
memcpy(fTo, nal->p_payload, nal->i_payload)
I assume that the payload is my frame data in bytes? If anybody has an example of a class they derived from FramedSource that might at least be close to what I'm trying to do I would love to see it, this is all new to me and a little tricky to figure out what's happening. Live555's documentation is pretty much the code itself which doesn't exactly make it easy for me to figure out.

Ok, I finally got some time to spend on this and got it working! I'm sure there are others who will be begging to know how to do it so here it is.
You will need your own FramedSource to take each frame, encode, and prepare it for streaming, I will provide some of the source code for this soon.
Essentially throw your FramedSource into the H264VideoStreamDiscreteFramer, then throw this into the H264RTPSink. Something like this
scheduler = BasicTaskScheduler::createNew();
env = BasicUsageEnvironment::createNew(*scheduler);
framedSource = H264FramedSource::createNew(*env, 0,0);
h264VideoStreamDiscreteFramer
= H264VideoStreamDiscreteFramer::createNew(*env, framedSource);
// initialise the RTP Sink stuff here, look at
// testH264VideoStreamer.cpp to find out how
videoSink->startPlaying(*h264VideoStreamDiscreteFramer, NULL, videoSink);
env->taskScheduler().doEventLoop();
Now in your main render loop, throw over your backbuffer which you've saved to system memory to your FramedSource so it can be encoded etc. For more info on how to setup the encoding stuff check out this answer How does one encode a series of images into H264 using the x264 C API?
My implementation is very much in a hacky state and is yet to be optimised at all, my d3d application runs at around 15fps due to the encoding, ouch, so I will have to look into this. But for all intents and purposes this StackOverflow question is answered because I was mostly after how to stream it. I hope this helps other people.
As for my FramedSource it looks a little something like this
concurrent_queue<x264_nal_t> m_queue;
SwsContext* convertCtx;
x264_param_t param;
x264_t* encoder;
x264_picture_t pic_in, pic_out;
EventTriggerId H264FramedSource::eventTriggerId = 0;
unsigned H264FramedSource::FrameSize = 0;
unsigned H264FramedSource::referenceCount = 0;
int W = 720;
int H = 960;
H264FramedSource* H264FramedSource::createNew(UsageEnvironment& env,
unsigned preferredFrameSize,
unsigned playTimePerFrame)
{
return new H264FramedSource(env, preferredFrameSize, playTimePerFrame);
}
H264FramedSource::H264FramedSource(UsageEnvironment& env,
unsigned preferredFrameSize,
unsigned playTimePerFrame)
: FramedSource(env),
fPreferredFrameSize(fMaxSize),
fPlayTimePerFrame(playTimePerFrame),
fLastPlayTime(0),
fCurIndex(0)
{
if (referenceCount == 0)
{
}
++referenceCount;
x264_param_default_preset(&param, "veryfast", "zerolatency");
param.i_threads = 1;
param.i_width = 720;
param.i_height = 960;
param.i_fps_num = 60;
param.i_fps_den = 1;
// Intra refres:
param.i_keyint_max = 60;
param.b_intra_refresh = 1;
//Rate control:
param.rc.i_rc_method = X264_RC_CRF;
param.rc.f_rf_constant = 25;
param.rc.f_rf_constant_max = 35;
param.i_sps_id = 7;
//For streaming:
param.b_repeat_headers = 1;
param.b_annexb = 1;
x264_param_apply_profile(&param, "baseline");
encoder = x264_encoder_open(&param);
pic_in.i_type = X264_TYPE_AUTO;
pic_in.i_qpplus1 = 0;
pic_in.img.i_csp = X264_CSP_I420;
pic_in.img.i_plane = 3;
x264_picture_alloc(&pic_in, X264_CSP_I420, 720, 920);
convertCtx = sws_getContext(720, 960, PIX_FMT_RGB24, 720, 760, PIX_FMT_YUV420P, SWS_FAST_BILINEAR, NULL, NULL, NULL);
if (eventTriggerId == 0)
{
eventTriggerId = envir().taskScheduler().createEventTrigger(deliverFrame0);
}
}
H264FramedSource::~H264FramedSource()
{
--referenceCount;
if (referenceCount == 0)
{
// Reclaim our 'event trigger'
envir().taskScheduler().deleteEventTrigger(eventTriggerId);
eventTriggerId = 0;
}
}
void H264FramedSource::AddToBuffer(uint8_t* buf, int surfaceSizeInBytes)
{
uint8_t* surfaceData = (new uint8_t[surfaceSizeInBytes]);
memcpy(surfaceData, buf, surfaceSizeInBytes);
int srcstride = W*3;
sws_scale(convertCtx, &surfaceData, &srcstride,0, H, pic_in.img.plane, pic_in.img.i_stride);
x264_nal_t* nals = NULL;
int i_nals = 0;
int frame_size = -1;
frame_size = x264_encoder_encode(encoder, &nals, &i_nals, &pic_in, &pic_out);
static bool finished = false;
if (frame_size >= 0)
{
static bool alreadydone = false;
if(!alreadydone)
{
x264_encoder_headers(encoder, &nals, &i_nals);
alreadydone = true;
}
for(int i = 0; i < i_nals; ++i)
{
m_queue.push(nals[i]);
}
}
delete [] surfaceData;
surfaceData = NULL;
envir().taskScheduler().triggerEvent(eventTriggerId, this);
}
void H264FramedSource::doGetNextFrame()
{
deliverFrame();
}
void H264FramedSource::deliverFrame0(void* clientData)
{
((H264FramedSource*)clientData)->deliverFrame();
}
void H264FramedSource::deliverFrame()
{
x264_nal_t nalToDeliver;
if (fPlayTimePerFrame > 0 && fPreferredFrameSize > 0) {
if (fPresentationTime.tv_sec == 0 && fPresentationTime.tv_usec == 0) {
// This is the first frame, so use the current time:
gettimeofday(&fPresentationTime, NULL);
} else {
// Increment by the play time of the previous data:
unsigned uSeconds = fPresentationTime.tv_usec + fLastPlayTime;
fPresentationTime.tv_sec += uSeconds/1000000;
fPresentationTime.tv_usec = uSeconds%1000000;
}
// Remember the play time of this data:
fLastPlayTime = (fPlayTimePerFrame*fFrameSize)/fPreferredFrameSize;
fDurationInMicroseconds = fLastPlayTime;
} else {
// We don't know a specific play time duration for this data,
// so just record the current time as being the 'presentation time':
gettimeofday(&fPresentationTime, NULL);
}
if(!m_queue.empty())
{
m_queue.wait_and_pop(nalToDeliver);
uint8_t* newFrameDataStart = (uint8_t*)0xD15EA5E;
newFrameDataStart = (uint8_t*)(nalToDeliver.p_payload);
unsigned newFrameSize = nalToDeliver.i_payload;
// Deliver the data here:
if (newFrameSize > fMaxSize) {
fFrameSize = fMaxSize;
fNumTruncatedBytes = newFrameSize - fMaxSize;
}
else {
fFrameSize = newFrameSize;
}
memcpy(fTo, nalToDeliver.p_payload, nalToDeliver.i_payload);
FramedSource::afterGetting(this);
}
}
Oh and for those who want to know what my concurrent queue is, here it is, and it works brilliantly http://www.justsoftwaresolutions.co.uk/threading/implementing-a-thread-safe-queue-using-condition-variables.html
Enjoy and good luck!

The deliverFrame method lacks the following check at its start:
if (!isCurrentlyAwaitingData()) return;
see DeviceSource.cpp in LIVE

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Get frame time in ffmpeg - c++

Related

More efficient way for reading CAN data in while loop

iOS AVCaptureSession stream micro audio Objective-C++

How does AudioKit's AKNodeOutputPlot pull it's data?

SDL2 + ffmpeg2 intermittent clicks instead of audio

How to write a Live555 FramedSource to allow me to stream H.264 live

Categories

Resources