ffmpeg H264 Encode Frame at a time for network streaming - c++

I'm working on a remote desktop application. I would like to send encoded H.264 packets over TCP, using FFmpeg for the encoding. However, I couldn't find useful information for the particular case of encoding just one frame (already in YUV444) and getting the packet.
I ran into several issues. The first was that
avcodec_encode_video2
was not blocking: I found that most of the time you get the "delayed" frames at the end. However, since this is real-time streaming, the solution was:
av_opt_set(mCodecContext->priv_data, "tune", "zerolatency", 0);
Now I do get the frame, but there are several problems: it takes a while, and even worse, the result is a gray video full of garbage pixels. My configuration for the codec context:
m_pCodecCtx->bit_rate = 8000000;
m_pCodecCtx->codec_id = AV_CODEC_ID_H264;
m_pCodecCtx->codec_type = AVMEDIA_TYPE_VIDEO;
m_pCodecCtx->width = 1920;
m_pCodecCtx->height = 1080;
m_pCodecCtx->pix_fmt = AV_PIX_FMT_YUV444P;
m_pCodecCtx->time_base.num = 1;
m_pCodecCtx->time_base.den = 25;
m_pCodecCtx->gop_size = 1;
m_pCodecCtx->keyint_min = 1;
m_pCodecCtx->i_quant_factor = 0.71f;
m_pCodecCtx->b_frame_strategy = 20;
m_pCodecCtx->qcompress = 0.6f;
m_pCodecCtx->qmax = 51;
m_pCodecCtx->qmin = 20;
m_pCodecCtx->max_qdiff = 4;
m_pCodecCtx->refs = 4;
m_pCodecCtx->max_b_frames = 1;
m_pCodecCtx->thread_count = 1;
I would like to know how this should be done: how do I force "I-frames", and what would be optimal for "one frame at a time" encoding? Also, I'm not concerned with quality right now; it just needs to be fast enough (under 16 ms).
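For what it's worth, the per-frame way I understand of requesting an I-frame through the FFmpeg API is to mark the AVFrame before encoding; a minimal sketch, with a helper name that is mine rather than anything from the question:

extern "C" {
#include <libavutil/avutil.h>
#include <libavutil/frame.h>
}

// Ask the encoder to code this particular frame as an I/IDR frame.
// With gop_size == 1 every frame should already come out as an I-frame;
// this is just the explicit per-frame request.
static void requestKeyFrame(AVFrame *frame)
{
    frame->pict_type = AV_PICTURE_TYPE_I; // most encoders treat this as a forced frame type
    frame->key_frame = 1;                 // informational flag, kept consistent with pict_type
}

Calling this on m_pFrame right before avcodec_encode_video2 should mark any frame that must be independently decodable.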
For the encoding part:
nres = avcodec_encode_video2(m_pCodecCtx, &packet, m_pFrame, &framefinished);
if (nres < 0) {
    qDebug() << "error encoding: " << nres << endl;
}
if (framefinished) {
    m_pFrame->pts++;
    ofstream vidout("video.h264", ios::app);
    if (vidout.good()) {
        vidout.write((const char*)&packet.data[0], packet.size);
    }
    vidout.close();
    av_packet_unref(&packet);
}
I'm not using a container, just a raw file; ffplay plays raw files if the packets are right, and that's my main concern. I'm planning to send the packets over TCP and decode on the client. Any help would be greatly appreciated.

You could take a look at the source code of WebRTC.
It uses openh264 and FFmpeg to accomplish what you describe.
I studied it for a while, but I can't get at the latest source code at the moment.
I found this:
source code.
Hope it helps.

Turns out I had it working from the beginning; I made a very simple but important mistake: I was writing a binary file in text mode, so...
Thanks for the feedback and your help
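For anyone who hits the same mistake, a minimal sketch of the fix, reusing the packet and file name from the question: open the stream in binary mode so no text-mode translation can corrupt the bitstream.

#include <fstream>

// Append the encoded packet bytes to a raw .h264 file.
// std::ios::binary is the important part; without it the runtime may
// translate byte values and corrupt the H.264 bitstream.
std::ofstream vidout("video.h264", std::ios::binary | std::ios::app);
if (vidout.good()) {
    vidout.write(reinterpret_cast<const char*>(packet.data), packet.size);
}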

Related

FFmpeg, decode mp3 packet after sending by network

Good day, everyone.
I'm working with FFmpeg
and have some issues with decoding that I cannot find answers to in the docs or forums.
The code is not simple, so I will first try to explain the problem in words; maybe someone really skilled in FFmpeg will understand the issue from the description alone. If the code really helps, I will try to post it.
First, in general terms, what I want to do: capture voice, encode it to MP3, get the packet, send it over the network, and on the other side accept the packet, decode it, and play it. Why not use FFmpeg's own streaming? Because the packet will be modified a little, and may be encoded/encrypted, so FFmpeg has no functions for that and I have to do it manually.
What I have managed to do so far: I can encode and decode via a file. That code works fine, without any issues, so I can encode, decode, and play MP3, and I assume my encoding/decoding code is correct.
Then I changed the code so that nothing is saved to a file and the packets are sent over the network instead.
This is the standard method I use to send a packet to the decoder on the receiving side:
result = avcodec_send_packet( codecContext, networkPacket );
if( result < 0 ) {
    if( result != AVERROR(EAGAIN) ) {
        qDebug() << "Some decoding error occurred: " << result;
        return;
    }
}
networkPacket is the AVPacket restored from the network:
AVPacket* networkPacket = NULL;
networkPacket = av_packet_alloc();
...
And this is how I restore it:
void FFmpegPlay::processNetworkPacket( MediaPacket* mediaPacket ) {
    qDebug() << "FFmpegPlay::processNetworkPacket start";
    int result;

    AVPacket* networkPacket = NULL;
    networkPacket = av_packet_alloc();
    networkPacket->size = mediaPacket->data.size();
    networkPacket->data = (uint8_t*) malloc( mediaPacket->data.size() + AV_INPUT_BUFFER_PADDING_SIZE );
    memcpy( networkPacket->data, mediaPacket->data.data(), mediaPacket->data.size() );
    networkPacket->pts      = mediaPacket->pts;
    networkPacket->dts      = mediaPacket->dts;
    networkPacket->flags    = mediaPacket->flags;
    networkPacket->duration = mediaPacket->duration;
    networkPacket->pos      = mediaPacket->pos;
    ...
And there I get -22, EINVAL, invalid argument.
The docs tell me:
AVERROR(EINVAL): codec not opened, it is an encoder, or requires flush
Well, my codec really is open, it is a decoder, and this is the first call, so I don't think a flush is required. So I assume the issue is in the packet and codec setup. I have also tried different flags and I always get this error; the codec just doesn't want to accept the packet.
So, that is the situation.
And the question is: are there any special options or flags for FFmpeg's MP3 decoder to make the above work? Which of them should I change?
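As a hedged side note on the packet setup itself: av_new_packet allocates the payload with the required padding and FFmpeg's own reference counting, which avoids mixing a raw malloc buffer into the packet's lifetime; a sketch assuming the same MediaPacket type as above, with a helper name that is purely illustrative:

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <cstring>

// Rebuild an AVPacket from bytes received over the network.
// av_new_packet() allocates a refcounted payload of the given size,
// already padded with AV_INPUT_BUFFER_PADDING_SIZE.
AVPacket* rebuildPacket( MediaPacket* mediaPacket ) {
    AVPacket* pkt = av_packet_alloc();
    if( !pkt )
        return NULL;
    if( av_new_packet( pkt, (int)mediaPacket->data.size() ) < 0 ) {
        av_packet_free( &pkt );
        return NULL;
    }
    memcpy( pkt->data, mediaPacket->data.data(), mediaPacket->data.size() );
    pkt->pts      = mediaPacket->pts;
    pkt->dts      = mediaPacket->dts;
    pkt->flags    = mediaPacket->flags;
    pkt->duration = mediaPacket->duration;
    pkt->pos      = mediaPacket->pos;
    return pkt;
}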
Upd.
After some testing, I decided to run a clearer test and check whether I can decode immediately after encoding, without the network, and it looks like I can.
So it seems that in the network case the decoder has to be initialized in some special way, or needs some particular options.
I handle initialization by copying the AVCodecParameters from the original and sending them over the network. Maybe I should change them in some special way?
I'm stuck on this one and have no idea how to deal with it, so any help is appreciated.
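For reference, the usual pattern I know of for bringing up a decoder from transmitted AVCodecParameters looks roughly like the sketch below; it is not tested against the setup above, and the function name is only illustrative:

extern "C" {
#include <libavcodec/avcodec.h>
}

// Open a decoder from codec parameters received from the sender.
// avcodec_parameters_to_context() copies extradata, sample rate,
// channel layout, etc. into the context before avcodec_open2().
AVCodecContext* openDecoderFromParams( const AVCodecParameters* params ) {
    const AVCodec* codec = avcodec_find_decoder( params->codec_id );
    if( !codec )
        return NULL;
    AVCodecContext* ctx = avcodec_alloc_context3( codec );
    if( !ctx )
        return NULL;
    if( avcodec_parameters_to_context( ctx, params ) < 0 ||
        avcodec_open2( ctx, codec, NULL ) < 0 ) {
        avcodec_free_context( &ctx );
        return NULL;
    }
    return ctx;
}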

Core Audio Ring Buffer Data comes out blank

I am working from a demo in the book "Learning Core Audio: A Hands-On Guide to Audio Programming for Mac and iOS." Chapter 8 shows how to set up a simple AudioUnit graph to play through from the AUHAL input unit to an output unit. This setup doesn't actually connect the audio units; instead, both units use a callback and pass audio data through an instance of CARingBuffer. I'm coding for macOS 10.15.6, and using code directly from the publisher here. Here's a picture of how it works:
The code builds and runs, but I get no audio. Note that later, after introducing a speech synthesis unit, I do get playback, so I know the basics are working.
InputRenderProc asks the AUHAL unit for input and stores it in the ring buffer.
MyAUGraphPlayer *player = (MyAUGraphPlayer*) inRefCon;

// have we ever logged input timing? (for offset calculation)
if (player->firstInputSampleTime < 0.0) {
    player->firstInputSampleTime = inTimeStamp->mSampleTime;
    if ((player->firstOutputSampleTime > -1.0) &&
        (player->inToOutSampleTimeOffset < 0.0)) {
        player->inToOutSampleTimeOffset = player->firstInputSampleTime - player->firstOutputSampleTime;
    }
}

// render into our buffer
OSStatus inputProcErr = noErr;
inputProcErr = AudioUnitRender(player->inputUnit,
                               ioActionFlags,
                               inTimeStamp,
                               inBusNumber,
                               inNumberFrames,
                               player->inputBuffer);
if (! inputProcErr) {
    inputProcErr = player->ringBuffer->Store(player->inputBuffer,
                                             inNumberFrames,
                                             inTimeStamp->mSampleTime);
    UInt32 sz = sizeof(player->inputBuffer);
    printf ("stored %d frames at time %f (%d bytes)\n", inNumberFrames, inTimeStamp->mSampleTime, sz);
    for (int i = 0; i < player->inputBuffer->mNumberBuffers; i++ ){
        //printf("stored audio string[%d]: %s\n", i, player->inputBuffer->mBuffers[i].mData);
    }
}
If I uncomment the printf statement, I see what looks like audio data being stored.
stored audio string[1]: #P'\274a\353\273\336^\274x\205 \2741\330B\2747'\274\371\361U\274\346\274\274}\212C\274\334\365%\274\261\367\273\340\307/\274E
stored 512 frames at time 134610.000000 (8 bytes)
However, when I fetch from the ring buffer in the GraphRenderCallback like this...
MyAUGraphPlayer *player = (MyAUGraphPlayer*) inRefCon;

// have we ever logged output timing? (for offset calculation)
if (player->firstOutputSampleTime < 0.0) {
    player->firstOutputSampleTime = inTimeStamp->mSampleTime;
    if ((player->firstInputSampleTime > -1.0) &&
        (player->inToOutSampleTimeOffset < 0.0)) {
        player->inToOutSampleTimeOffset = player->firstInputSampleTime - player->firstOutputSampleTime;
    }
}

// copy samples out of ring buffer
OSStatus outputProcErr = noErr;
// new CARingBuffer doesn't take bool 4th arg
outputProcErr = player->ringBuffer->Fetch(ioData,
                                          inNumberFrames,
                                          inTimeStamp->mSampleTime + player->inToOutSampleTimeOffset);
I get nothing (I know I can't expect proper null-terminated string output, but I thought I'd see something).
fetched 512 frames at time 160776.000000
fetched audio string[0, size 2048]: xx
fetched audio string[1, size 2048]: xx
fetched 512 frames at time 161288.000000
fetched audio string[0, size 2048]: xx
fetched audio string[1, size 2048]: xx
This is not a permission problem; I have other non-AudioUnit code that can get mic input. In addition, I created a plist that makes this app prompt for mic access every time, so I know that is working. I cannot understand why data goes into this ring buffer, but never comes out.
These days you need to declare that you want to use the microphone, providing an explanation string. This wasn't the case in 2012 when Learning Core Audio was published.
In short, you now need to:
add an NSMicrophoneUsageDescription string to your Info.plist
add sandboxing capability and enable Audio Input
The sample code you're using is a command-line tool, so adding an Info.plist to it in Xcode isn't as simple as with a .app package. Also, the code does not seem to work if you run it from Xcode; in my case it has to be run from Terminal.app. This may be because my Terminal already has microphone permission (viewable in System Preferences > Security & Privacy > Microphone). You can, and probably should, explicitly request microphone access from the user (yourself in this case!) by using requestAccessForMediaType on an AVCaptureDevice. That's right, AVFoundation code in a Core Audio tutorial; what is the world coming to.
There are more details on the above steps in this answer.
p.s. I think the person who thought capturing zeroes instead of returning an error was a good idea is probably good friends with whoever invented returning HTTP 200 with an error code in the body.

FFMPEG: av_rescale_q - time_base difference

I want to understand, once and for all, how time base calculation and rescaling work in FFmpeg.
Before getting to this question I did some research and found many contradictory answers, which make it even more confusing.
So, based on the official FFmpeg examples, one has to
rescale output packet timestamp values from codec to stream timebase
with something like this:
pkt->pts = av_rescale_q_rnd(pkt->pts, *time_base, st->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
pkt->dts = av_rescale_q_rnd(pkt->dts, *time_base, st->time_base, AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX);
pkt->duration = av_rescale_q(pkt->duration, *time_base, st->time_base);
But in this question someone was asking a similar question to mine and gave more examples, each of them doing it differently. And contrary to the answer, which says that all of those ways are fine, for me only the following approach works:
frame->pts += av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
In my application I am generating video packets (H.264) at 60 fps outside the FFmpeg API, then writing them into an MP4 container.
I set explicitly:
video_st->time_base = {1,60};
video_st->r_frame_rate = {60,1};
video_st->codec->time_base = {1 ,60};
The first weird thing happens right after I have written the header for the output format context:
AVDictionary *opts = nullptr;
int ret = avformat_write_header(mOutputFormatContext, &opts);
av_dict_free(&opts);
After that, video_st->time_base is populated with:
num = 1;
den = 15360
And I fail to understand why.
I would like someone to explain that to me. Next, before writing a frame, I calculate the PTS for the packet. In my case PTS = DTS, as I don't use B-frames at all.
And I have to do this:
const int64_t duration = av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
totalPTS += duration;   // totalPTS is a global variable
packet->pts = totalPTS;
packet->dts = totalPTS;
av_write_frame(mOutputFormatContext, packet);
I don't get why the codec and the stream have different time_base values even though I explicitly set them to be the same. And because I see across all the examples that av_rescale_q is always used to calculate the duration, I really want someone to explain this point.
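To make the rescaling concrete, here is a small worked sketch, assuming the muxer really did change the stream time base to 1/15360 while the codec time base stayed 1/60:

extern "C" {
#include <libavutil/mathematics.h>
#include <libavutil/rational.h>
}
#include <cstdio>

int main() {
    AVRational codec_tb  = {1, 60};     // one tick per frame at 60 fps
    AVRational stream_tb = {1, 15360};  // time base chosen by the MP4 muxer

    // One frame in codec ticks, expressed in stream ticks:
    // 1 * (1/60) / (1/15360) = 15360 / 60 = 256
    int64_t per_frame = av_rescale_q(1, codec_tb, stream_tb);
    printf("each frame advances pts by %lld stream ticks\n", (long long)per_frame);
    return 0;
}

So with a 1/15360 stream time base, consecutive frames at 60 fps get PTS values 0, 256, 512, ..., which is exactly what the av_rescale_q-based accumulation above produces.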
Additionally, as a comparison, and for the sake of the experiment, I decided to try writing the stream to a WebM container, without using the libav output stream at all.
I just grab the same packet I use for the MP4 and write it manually into an EBML stream. In this case I calculate the duration like this:
const int64_t duration =
( video_st->codec->time_base.num / video_st->codec->time_base.den) * 1000;
The multiplication by 1000 is required for WebM, as timestamps in that container are expressed in milliseconds. And this works. So why, in the case of MP4 stream encoding, is there a difference in time_base that has to be rescaled?
This behavior of ffmpeg confuses me too. It was discussed a little by users here: http://ffmpeg.org/pipermail/libav-user/2018-January/010843.html. But the resolution there was to just live with the 15360 time_base rather than exert control over it.
From the source pointed out by the poster in that forum thread (https://github.com/FFmpeg/FFmpeg/blob/master/libavformat/movenc.c, search for "*= 2"), it doesn't look easily avoidable as far as I can tell. It appears your choice is either to let the time_base get changed, or to pick something >= 10000, in which case it will not be changed.
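A minimal sketch of the second option, assuming you are free to pick the stream time base yourself before avformat_write_header; 90000 is just a common MPEG-style value that happens to satisfy the >= 10000 condition, not something required by the muxer:

extern "C" {
#include <libavformat/avformat.h>
#include <libavutil/mathematics.h>
}

// Rescale a packet produced with a 1/60 codec time base into the stream
// time base and write it. The stream time base is assumed to have been set
// to {1, 90000} before avformat_write_header(), which (being >= 10000)
// the MP4 muxer should then leave untouched.
static int writeFrame(AVFormatContext *oc, AVStream *video_st,
                      AVPacket *pkt, int64_t frame_index)
{
    const AVRational codec_tb = {1, 60};

    pkt->pts = av_rescale_q(frame_index, codec_tb, video_st->time_base);
    pkt->dts = pkt->pts;                               // no B-frames, as in the question
    pkt->duration = av_rescale_q(1, codec_tb, video_st->time_base);
    pkt->stream_index = video_st->index;
    return av_write_frame(oc, pkt);
}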

How to reduce latency when streaming x264

I would like to produce a zerolatency live video stream and play it in VLC player with as little latency as possible.
These are the settings I currently use:
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
m_Params.i_threads = 2;
m_Params.b_sliced_threads = true;
m_Params.i_width = m_SourceWidth;
m_Params.i_height = m_SourceHeight;
m_Params.b_intra_refresh = 1;
m_Params.b_vfr_input = true;
m_Params.i_timebase_num = 1;
m_Params.i_timebase_den = 1000;
m_Params.i_fps_num = 1;
m_Params.i_fps_den = 60;
m_Params.rc.i_vbv_max_bitrate = 512;
m_Params.rc.i_vbv_buffer_size = 256;
m_Params.rc.f_vbv_buffer_init = 1.1f;
m_Params.rc.i_rc_method = X264_RC_CRF;
m_Params.rc.f_rf_constant = 24;
m_Params.rc.f_rf_constant_max = 35;
m_Params.b_annexb = 0;
m_Params.b_repeat_headers = 0;
m_Params.b_aud = 0;
x264_param_apply_profile( &m_Params, "high" );
Using those settings, I have the following issues:
VLC shows lots of missing frames (labelled "verloren", i.e. "lost", in the screenshot). I am not sure whether this is an issue.
If I set a value < 200 ms for the network stream delay in VLC, VLC renders a few frames and then stops decoding/rendering frames.
If I set a value >= 200 ms for the network stream delay in VLC, everything looks good so far, but the latency is, obviously, 200 ms, which is too high.
Question:
Which settings (x264lib and VLC) should I use in order to encode and stream with as little latency as possible?
On your x264 settings: many are redundant, i.e. already contained in "zerolatency". However, as best I can tell, your encoding latency is nevertheless zero frames, i.e. you put one frame in and you immediately (as soon as your CPU has finished encoding it, anyway) get one frame out. It never waits for a newer frame in order to hand back an encoded older frame (the way it would with lookahead, for example).
On why VLC pauses unless you give it a large network delay: the problem is that your combination of rate control and VBV settings for encoding is not ideal. What you want for a low-latency encode is CBR, with the VBV buffer set to the size of exactly one frame. This enables a special VBV calculation, if you look in the x264 source.
You may also try not setting anything timing-related whatsoever (no fps, no VBV) and use CRF with zerolatency. The results will depend on what container the video is packaged in for streaming.
Read this for more info: http://x264dev.multimedia.cx/archives/249
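A sketch of the CBR plus single-frame-VBV idea described above; the resolution, frame rate, and bitrate below are placeholders rather than values taken from the question:

#include <x264.h>

// Configure x264 for low-latency CBR with a VBV buffer of roughly one frame.
static void configureLowLatencyCbr(x264_param_t *p, int width, int height,
                                   int fps, int bitrate_kbps)
{
    x264_param_default_preset(p, "veryfast", "zerolatency");
    p->i_width   = width;
    p->i_height  = height;
    p->i_fps_num = fps;
    p->i_fps_den = 1;

    p->rc.i_rc_method       = X264_RC_ABR;
    p->rc.i_bitrate         = bitrate_kbps;
    p->rc.i_vbv_max_bitrate = bitrate_kbps;       // maxrate == bitrate gives CBR behaviour
    p->rc.i_vbv_buffer_size = bitrate_kbps / fps; // about one frame's worth of bits

    x264_param_apply_profile(p, "high");
}

// e.g. configureLowLatencyCbr(&m_Params, 1280, 720, 60, 2000);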
If you want to have the fastest possible encoding, then delete everything after
x264_param_default_preset( &m_Params, "veryfast", "zerolatency" );
and change veryfast to ultrafast. The rest of the latency comes from the network delay plus decoding.

How to use FFMPEG to play H.264 stream from NAL units that are stored as video AVPackets

I am writing a client-server system that uses the FFmpeg library to parse an H.264 stream into NAL units on the server side, then uses channel coding to send them over the network to the client side, where my application must be able to play the video.
The question is how to play the received AVPackets (NAL units) in my application as a video stream.
I found this tutorial helpful and used it as the base for both the server and the client side.
Some sample code or resources related to playing video not from a file, but from data inside the program using the FFmpeg library, would be very helpful.
I am sure that the received information is sufficient to play the video, because I tried saving the received data as a .h264 or .mp4 file and it can be played by VLC.
From what I understand of your question, you have the AVPackets and want to play a video. In reality this is two problems: 1. decoding your packets, and 2. playing the video.
For decoding your packets with FFmpeg, you should take a look at the documentation for AVPacket, AVCodecContext and avcodec_decode_video2 to get some ideas; the general idea is that you want to do something (just wrote this in the browser, take it with a grain of salt) along the lines of:
// the context; set this up appropriately for your video. See the links above for the documentation
AVCodecContext *decoder_context;
std::vector<AVPacket> packets; // assume this holds your packets
...
AVFrame *decoded_frame = av_frame_alloc();
int ret = -1;
int got_frame = 0;
for (AVPacket packet : packets)
{
    avcodec_get_frame_defaults(decoded_frame);
    ret = avcodec_decode_video2(decoder_context, decoded_frame, &got_frame, &packet);
    if (ret <= 0) {
        // error decoding the current packet, or the packet could not be decoded
        break;
    }
    if (got_frame)
    {
        // send the frame to whatever video player queue you're using / do whatever with it
        ...
    }
    got_frame = 0;
    av_free_packet(&packet);
}
It's a pretty rough sketch, but that's the general idea for decoding the AVPackets. As for playing the video, you have many options, which will likely depend more on your clients. What you're asking about is a pretty large problem; I'd advise familiarizing yourself with the FFmpeg documentation and the examples provided on the FFmpeg site. Hope that makes sense.
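As a side note, newer FFmpeg versions replace avcodec_decode_video2 with the send/receive API; here is a minimal sketch of the same loop in that style, with decoder_context and the packet list standing in for whatever you actually use:

extern "C" {
#include <libavcodec/avcodec.h>
}
#include <vector>

// Decode a list of packets with the newer send/receive API.
static void decodePackets(AVCodecContext *decoder_context, std::vector<AVPacket*> &packets)
{
    AVFrame *frame = av_frame_alloc();
    for (AVPacket *pkt : packets) {
        if (avcodec_send_packet(decoder_context, pkt) < 0)
            break;                                    // feeding the decoder failed
        while (avcodec_receive_frame(decoder_context, frame) == 0) {
            // hand 'frame' to your renderer / player queue here
            av_frame_unref(frame);
        }
        av_packet_unref(pkt);
    }
    // flush: signal end of stream and drain any remaining frames
    avcodec_send_packet(decoder_context, NULL);
    while (avcodec_receive_frame(decoder_context, frame) == 0)
        av_frame_unref(frame);
    av_frame_free(&frame);
}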