Low quality H.265 Media Foundation encoding? - c++

I'm trying to encode video with Media Foundation's H.265 encoder, and no matter what I try, the quality is always lower than a same-settings video produced by non-MF encoders, like the one VideoPad uses (i.e. ffmpeg), at the same 4000 kbps bitrate.
VideoPad produces this video of a swimming boy. My app produces this video. The sky in my app's output is clearly worse at a 6K bitrate, whereas VideoPad's is at 1K.
pMediaTypeOutVideo->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
pMediaTypeOutVideo->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_HEVC);
pMediaTypeOutVideo->SetUINT32(MF_MT_AVG_BITRATE, 4000000);
MFSetAttributeSize(pMediaTypeOutVideo, MF_MT_FRAME_SIZE, 1920,1080);
MFSetAttributeRatio(pMediaTypeOutVideo, MF_MT_FRAME_RATE, 25, 1);
MFSetAttributeRatio(pMediaTypeOutVideo, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
pMediaTypeOutVideo->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
pMediaTypeOutVideo->SetUINT32(MF_MT_VIDEO_NOMINAL_RANGE, MFNominalRange_Wide);
CComPtr<ICodecAPI> ca;
hr = pSinkWriter->GetServiceForStream(OutVideoStreamIndex, GUID_NULL, __uuidof(ICodecAPI), (void**)&ca);
if (ca)
{
    if (true)
    {
        VARIANT v = {};
        v.vt = VT_BOOL;
        v.boolVal = VARIANT_FALSE;
        ca->SetValue(&CODECAPI_AVLowLatencyMode, &v);
    }
    if (true)
    {
        VARIANT v = {};
        v.vt = VT_UI4;
        v.ulVal = 100;
        hr = ca->SetValue(&CODECAPI_AVEncCommonQualityVsSpeed, &v);
    }
    if (true)
    {
        VARIANT v = {};
        v.vt = VT_UI4;
        v.ulVal = eAVEncCommonRateControlMode_Quality;
        ca->SetValue(&CODECAPI_AVEncCommonRateControlMode, &v);
        if (true)
        {
            VARIANT v = {};
            v.vt = VT_UI4;
            v.ulVal = 100;
            ca->SetValue(&CODECAPI_AVEncCommonQuality, &v);
        }
    }
}
No matter what, the quality at 4000k remains inferior to what ffmpeg produces. Also, eAVEncCommonRateControlMode_Quality and CODECAPI_AVEncCommonQuality do not seem to take effect (they work in H.264). The only way to get better quality is to raise the bitrate.
Also, the speed parameter does not seem to affect the quality or the encoder speed.
Even at 1000k the VideoPad-produced video does not have pixelation in the sky. Of course, its encoding speed is about 1/100 of mine.
Are the Media Foundation encoders worse than ffmpeg's? What am I missing?
Edit: Encoding in software (MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS set to FALSE) is equally bad.
Update: Tried it on my laptop with an AMD hardware encoder. Similar problem: when the bitrate is low, the quality is awful.

I checked the two videos with MediaInfo and it is obvious that they use different HEVC profiles, which is probably the main reason for the lower quality of the NVIDIA-encoded video.
Here is the comparison screenshot:
You can try setting MF_MT_VIDEO_PROFILE in your input IMFMediaType to eAVEncH265VProfile_Main_420_8. Additionally, MF_MT_MPEG2_LEVEL should be set accordingly as well, for instance to eAVEncH265VLevel4_1.
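For reference, a minimal sketch of setting those two attributes, applied here to the pMediaTypeOutVideo variable from the question (whether the encoder picks them up from the output or the input type is an assumption worth testing both ways):
pMediaTypeOutVideo->SetUINT32(MF_MT_VIDEO_PROFILE, eAVEncH265VProfile_Main_420_8); // profile constant from codecapi.h
pMediaTypeOutVideo->SetUINT32(MF_MT_MPEG2_LEVEL, eAVEncH265VLevel4_1);             // level constant from codecapi.h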
You might also consider using the IClassFactory approach in order to guarantee the correct order of the ICodecAPI calls.

Perhaps simply because software encoders are better than hardware encoders.
If I take a look at this: https://www.techspot.com/article/1131-hevc-h256-enconding-playback/page7.html, I would also conclude the NVIDIA hardware encoder is poor compared to x265 (dated 2016).
I can't investigate much further, but from what I see in your post:
pMediaTypeOutVideo->SetUINT32(MF_MT_VIDEO_NOMINAL_RANGE, MFNominalRange_Wide); -> Why not MFNominalRange_Normal ?
Are there other ICodecAPI properties exposed by the NVIDIA encoder besides CODECAPI_AVLowLatencyMode / CODECAPI_AVEncCommonQualityVsSpeed / CODECAPI_AVEncCommonRateControlMode, etc.?
Where is the two-pass encoding parameter?
I found at least three forum threads saying the NVIDIA HEVC encoder blurs images, and you confirm this, so... fake news or not (dated 2018/2019).
From NVidia https://developer.nvidia.com/nvidia-video-codec-sdk#NVENCFeatures (date undefined).
I don't understand much about this diagram, but NVIDIA seems to claim they are the best... so, fake news or not.
EDIT
Rendering with software (MF_READWRITE_ENABLE_HARDWARE_TRANSFORMS to FALSE) is also equally bad.
Can you confirm that in software mode the H.265 / HEVC Video Encoder MFT is the one actually used?
If so, did you play with the codec properties below? (A small sketch follows the list.)
CODECAPI_AVEncCommonRateControlMode
CODECAPI_AVEncCommonMeanBitRate
CODECAPI_AVEncCommonBufferSize
CODECAPI_AVEncCommonMaxBitRate
CODECAPI_AVEncMPVGOPSize
CODECAPI_AVLowLatencyMode
CODECAPI_AVEncCommonQualityVsSpeed
CODECAPI_AVEncVideoForceKeyFrame
CODECAPI_AVEncVideoEncodeQP
CODECAPI_AVEncVideoMinQP
CODECAPI_AVEncVideoMaxQP
CODECAPI_VideoEncoderDisplayContentType
CODECAPI_AVEncNumWorkerThreads
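A hedged sketch of exercising a few of these through the same ICodecAPI pointer ("ca") that the question already obtains; which properties a given encoder MFT actually honors varies, so check the returned HRESULTs:
{
    VARIANT v = {};
    v.vt = VT_UI4;
    v.ulVal = eAVEncCommonRateControlMode_UnconstrainedVBR;
    ca->SetValue(&CODECAPI_AVEncCommonRateControlMode, &v);

    v.ulVal = 4000000;  // mean bitrate in bits per second
    ca->SetValue(&CODECAPI_AVEncCommonMeanBitRate, &v);

    v.ulVal = 8000000;  // peak bitrate for VBR
    ca->SetValue(&CODECAPI_AVEncCommonMaxBitRate, &v);

    v.ulVal = 50;       // GOP length in frames (2 seconds at 25 fps)
    ca->SetValue(&CODECAPI_AVEncMPVGOPSize, &v);
}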

Related

How to check if a true hardware video adapter is used

I am developing an application which shows something like a video in its window. I use the technologies described here: Introducing Direct2D 1.1. In my case the only difference is that eventually I create a bitmap using
ID2D1DeviceContext::CreateBitmap
then I use
ID2D1Bitmap::CopyFromMemory
to copy raw RGB data to it and then I call
ID2D1DeviceContext::DrawBitmap
to draw the bitmap. I use the high quality cubic interpolation mode D2D1_INTERPOLATION_MODE_HIGH_QUALITY_CUBIC for scaling to have the best picture, but in some cases (RDP, Citrix, virtual machines, etc.) it is very slow and has very high CPU consumption. This happens because in those cases a non-hardware video adapter is used. So for non-hardware adapters I am trying to turn off the interpolation and use faster methods. The problem is that I cannot reliably check whether the system has a true hardware adapter.
When I call D3D11CreateDevice, I use it with D3D_DRIVER_TYPE_HARDWARE, but on virtual machines it typically returns "Microsoft Basic Render Driver", which is a software driver and does not use the GPU (it consumes CPU). So currently I check the vendor ID: if the vendor is AMD (ATI), NVIDIA or Intel, then I use the cubic interpolation; otherwise I use the fastest method, which does not consume much CPU.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
    Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc)))
        {
            // NVIDIA
            if (desc.VendorId == 0x10DE ||
                // AMD
                desc.VendorId == 0x1002 || // 0x1022 ?
                // Intel
                desc.VendorId == 0x8086)   // 0x163C, 0x8087 ?
            {
                bSupported = true;
            }
        }
    }
}
This works for physical (console) Windows sessions, even in virtual machines. But for RDP sessions on real machines, IDXGIAdapter still reports those vendors even though the GPU is not used (I can see it via Process Hacker 2 and AMD System Monitor in the case of ATI Radeon), so I still have high CPU consumption with the cubic interpolation. In the case of an RDP session to Windows 7 with ATI Radeon, the CPU usage is about 10% higher than via the physical console.
Or am I mistaken and somehow RDP uses GPU resources and that is the reason why it returns a real hardware adapter via IDXGIAdapter::GetDesc?
DirectDraw
Also I looked at the DirectX Diagnostic Tool. It looks like the "DirectDraw Acceleration" info field returns exactly what I need. In case of physical (console) sessions it says "Enabled". In case of RDP and virtual machine (without hardware video acceleration) sessions it says "Not Available". I looked at the sources and theoretically I could use its verification algorithm, but it is actually for DirectDraw, which I do not use in my application. I would like to use something which is directly linked to ID3D11Device, IDXGIDevice, IDXGIAdapter and so on.
IDXGIAdapter1::GetDesc1 and DXGI_ADAPTER_FLAG
I also tried to use IDXGIAdapter1::GetDesc1 and check the flags.
Microsoft::WRL::ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(m_pD3dDevice->QueryInterface(...)))
{
    Microsoft::WRL::ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        Microsoft::WRL::ComPtr<IDXGIAdapter1> adapter1;
        if (SUCCEEDED(adapter->QueryInterface(__uuidof(IDXGIAdapter1), reinterpret_cast<void**>(adapter1.GetAddressOf()))))
        {
            DXGI_ADAPTER_DESC1 desc;
            if (SUCCEEDED(adapter1->GetDesc1(&desc)))
            {
                // desc.Flags
                // DXGI_ADAPTER_FLAG_NONE = 0,
                // DXGI_ADAPTER_FLAG_REMOTE = 1,
                // DXGI_ADAPTER_FLAG_SOFTWARE = 2,
                // DXGI_ADAPTER_FLAG_FORCE_DWORD = 0xffffffff
            }
        }
    }
}
Information about the DXGI_ADAPTER_FLAG_SOFTWARE flag
Virtual Machine RDP Win Serv 2012 (Microsoft Basic Render Driver) -> (0x02) DXGI_ADAPTER_FLAG_SOFTWARE
Physical Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
Physical Win 7 (ATI Radeon) - > (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 10 (Intel Video) -> (0x00) DXGI_ADAPTER_FLAG_NONE
RDP Win 7 (ATI Radeon) -> (0x00) DXGI_ADAPTER_FLAG_NONE
In the case of an RDP session on a real machine with a hardware adapter, Flags == 0, but as I can see via Process Hacker 2 the GPU is not used. At least on Windows 7 with ATI Radeon I can see higher CPU usage during an RDP session. So it looks like DXGI_ADAPTER_FLAG_SOFTWARE is only set for the Microsoft Basic Render Driver, and the issue is not solved.
The question
Is there a correct way to check if a real hardware video card (GPU) is used for the current Windows session? Or maybe it is possible to check if a specific interpolation mode of ID2D1DeviceContext::DrawBitmap has hardware implementation and uses GPU for the current session?
UPD
The topic is not about detecting RDP or Citrix sessions. It is not about detecting if the application is inside a virtual machine or not. I already have all those checks and use the linear interpolation for those cases. The topic is about detecting whether a real GPU is used for the current Windows session to display the desktop. I am looking for a more sophisticated solution that makes the decision using features of DirectX and DXGI.
If you want to detect the Microsoft Basic Renderer, the best option is to use its VID/PID combo:
ComPtr<IDXGIDevice> dxgiDevice;
if (SUCCEEDED(device.As(&dxgiDevice)))
{
    ComPtr<IDXGIAdapter> adapter;
    if (SUCCEEDED(dxgiDevice->GetAdapter(&adapter)))
    {
        DXGI_ADAPTER_DESC desc;
        if (SUCCEEDED(adapter->GetDesc(&desc)))
        {
            if ( (desc.VendorId == 0x1414) && (desc.DeviceId == 0x8c) )
            {
                // WARNING: Microsoft Basic Render Driver is active.
                // Performance of this application may be unsatisfactory.
                // Please ensure that your video card is Direct3D10/11 capable
                // and has the appropriate driver installed.
            }
        }
    }
}
See Microsoft Docs and Anatomy of Direct3D 11 Create Device
You will probably find for testing/debugging that you don't want to explicitly block these scenarios, but you do want to provide some kind of warning or notice feedback to the user that they are using software rather than hardware rendering.
Remote Desktop detection from Win32 classic desktop applications is better done directly via GetSystemMetrics( SM_REMOTESESSION ).
See Microsoft Docs
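For completeness, a minimal sketch of that check:
#include <windows.h>

bool IsRemoteSession()
{
    // Nonzero when the calling process is running in a Remote Desktop session.
    return GetSystemMetrics(SM_REMOTESESSION) != 0;
}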
Answering a 3-year-old question, as I struggled with this myself.
I had to go through the registry. The first thing is to find the adapter LUID in the registry, in order to get the adapter GUID:
private string GetAdapterGuid(long luid)
{
    var directXRegistryKey = Registry.LocalMachine.OpenSubKey(@"SOFTWARE\Microsoft\DirectX");
    if (directXRegistryKey == null)
        return "";

    var subKeyNames = directXRegistryKey.GetSubKeyNames();
    foreach (var subKeyName in subKeyNames)
    {
        var subKey = directXRegistryKey.OpenSubKey(subKeyName);
        if (subKey.GetValueKind("AdapterLuid") != RegistryValueKind.QWord)
            continue;

        var luidValue = (long)subKey.GetValue("AdapterLuid");
        if (luidValue == luid)
            return subKeyName;
    }
    return "";
}
Once you have that GUID, you can search for the details of the graphics card in HKLM like this. If it is virtual, the service name will be "INDIRECTKMD":
private bool IsVirtualAdapter(string adapterGuid)
{
    var videoRegistryKey = Registry.LocalMachine.OpenSubKey($@"SYSTEM\CurrentControlSet\Control\Video\{adapterGuid}\Video");
    if (videoRegistryKey == null)
        return false;

    if (videoRegistryKey.GetValueKind("Service") != RegistryValueKind.String)
        return false;

    var serviceName = (string)videoRegistryKey.GetValue("Service");
    return serviceName.ToUpper() == "INDIRECTKMD";
}
Checking the service name felt easier than parsing the DeviceDesc value.
My use case involved having the GUID ready, so I split it into two functions; you could merge them into one.
This also only detects RDP/MSTSC; additional service names might be needed for other virtual adapters. Or you could try to detect only NVIDIA/AMD/Intel driver names... up to you.
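If you need the LUID to feed into GetAdapterGuid above, it can be taken from the DXGI adapter description that the question already retrieves; a minimal C++ sketch (the adapter pointer is assumed to be obtained as in the question):
DXGI_ADAPTER_DESC desc;
if (SUCCEEDED(adapter->GetDesc(&desc)))
{
    // DXGI_ADAPTER_DESC::AdapterLuid is the locally unique identifier of the adapter.
    long long luid = (static_cast<long long>(desc.AdapterLuid.HighPart) << 32)
                   | desc.AdapterLuid.LowPart;
    // pass luid to the registry lookup above
}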

input delay with PortAudio callback and ASIO sdk

I'm trying to get the input from my guitar to be played through my computer using the PortAudio library and the ASIO SDK.
I have been following some of the tutorials on the official website to get the basics set up. Currently I have it working so that PortAudio is listening to the right input and output devices, and I have the callback set up to just pass the input straight to the output, like this:
static int paTestCallback(const void *inputBuffer, void *outputBuffer, unsigned long framesPerBuffer, const PaStreamCallbackTimeInfo* timeInfo, PaStreamCallbackFlags statusFlags, void *userData)
{
    float *out = (float*)outputBuffer;
    float *in = (float*)inputBuffer;
    for (int i = 0; i < framesPerBuffer; i++)
    {
        *out++ = *in++; /* left */
        *out++ = *in++; /* right */
    }
    return 0;
}
This callback is set up by calling this:
PaError error = Pa_OpenDefaultStream(&stream, 2, 2, paFloat32, 44100, paFramesPerBufferUnspecified, paTestCallback, &data);
Pa_StartStream(stream);
Now, this does work, but there is a lot of delay (about 0.5 s) between when I strike a string on my guitar and when I hear it through the monitors.
Is there a way to solve this delay? Do I need to rewrite the callback method?
EDIT:
So, I got the delay to be a lot better using this code instead of the basic Pa_OpenDefaultStream()
int defaultIn = Pa_GetDefaultInputDevice();
int defaultOut = Pa_GetDefaultOutputDevice();
PaStreamParameters *inParam = new PaStreamParameters();
inParam->channelCount = 2;
inParam->device = defaultIn;
inParam->sampleFormat = paFloat32;
inParam->suggestedLatency = 0.05;
PaStreamParameters *outParam = new PaStreamParameters();
outParam->channelCount = 2;
outParam->device = defaultOut;
outParam->sampleFormat = paFloat32;
outParam->suggestedLatency = 0;
error = Pa_OpenStream(&stream, inParam, outParam, 44100, paFramesPerBufferUnspecified, paNoFlag, paTestCallback, &data);
if (error != paNoError) {
    Logger::log("[PortAudioManager] Could not open default stream. Exiting function...");
    return;
}
Pa_StartStream(stream);
There is still a little bit of delay though, mostly noticeable when playing more than just a single note.
EDIT:
I figured out, with the help of Ross Bencina, that the Windows default input and output devices have nothing to do with the host API indices in PortAudio. I seemed to be using MME all this time. I did the following to get the right index for the ASIO host API:
int hostNr = Pa_GetHostApiCount();
std::vector<const PaHostApiInfo*> infoVertex;
for (int t = 0; t < hostNr; ++t) {
    infoVertex.push_back(Pa_GetHostApiInfo(t));
}
Then I just checked which one is ASIO and set the suggestedLatency in both PaStreamParameters to 0, and the delay is now gone and the sound is good (although it's mono for now).
You are on the right track using paFramesPerBufferUnspecified.
The ASIO latency behavior depends on the driver. There are two possibilities:
The ASIO driver lets the code (i.e. PortAudio) request a latency (possibly with some constraints). PortAudio finds the best match between a supported driver buffer size and the latency that you request.
The other possibility is that your audio interface does not provide programmatic control over latency settings. Instead, latency is only selectable from the driver's ASIO control panel UI (and the driver will force a fixed buffer size on PortAudio). In this case, you should investigate the driver control panel UI to set the lowest workable latency.
In either case, your approach with Pa_OpenStream is close to optimal, but you should request zero latency for both input and output (in your edit you're requesting 50ms input latency, zero output latency). The end result will be that PortAudio selects the lowest available ASIO buffer size. If this turns out to be unstable (audio glitches) then you'll need to increase the requested latency.
include/pa_asio.h exposes a host-API-specific interface for querying the ASIO buffer sizes allowed by the driver (be aware that this can change if you change settings in the control panel). It also provides a function to display the driver's control panel UI.
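A hedged sketch of using that interface, assuming the PaAsio_GetAvailableBufferSizes and PaAsio_ShowControlPanel declarations found in recent versions of pa_asio.h (check your header if they differ):
#include <cstdio>
#include "portaudio.h"
#include "pa_asio.h"

void InspectAsioDevice(PaDeviceIndex device)
{
    long minFrames = 0, maxFrames = 0, preferredFrames = 0, granularity = 0;
    if (PaAsio_GetAvailableBufferSizes(device, &minFrames, &maxFrames,
                                       &preferredFrames, &granularity) == paNoError)
    {
        // Buffer sizes (in frames) the driver will accept; a granularity of -1
        // conventionally means powers of two between min and max.
        printf("ASIO buffers: min=%ld max=%ld preferred=%ld granularity=%ld\n",
               minFrames, maxFrames, preferredFrames, granularity);
    }

    // Pops up the driver's own control panel UI (the second argument can be a window handle).
    PaAsio_ShowControlPanel(device, nullptr);
}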
EDIT: Note that Pa_GetDefaultInputDevice() and Pa_GetDefaultOutputDevice() will only return ASIO devices if you built PortAudio for ASIO only. If you included any other more common APIs in the build (e.g. WMME or DirectSound) they will be given priority as the (lowest common denominator) default device. You could add a check that you are actually accessing the ASIO device:
assert(Pa_GetHostApiInfo(Pa_GetDeviceInfo(Pa_GetDefaultOutputDevice())->hostApi)->type == paASIO);
If PortAudio is compiled with support for multiple host APIs: to get the default ASIO device, enumerate the host APIs using Pa_GetHostApiCount and Pa_GetHostApiInfo to find the ASIO host API, then pull the default ASIO device indices from the returned PaHostApiInfo struct.
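A minimal sketch of that enumeration (field names as in portaudio.h):
#include "portaudio.h"

int FindDefaultAsioDevices(PaDeviceIndex* inDev, PaDeviceIndex* outDev)
{
    for (PaHostApiIndex i = 0; i < Pa_GetHostApiCount(); ++i)
    {
        const PaHostApiInfo* info = Pa_GetHostApiInfo(i);
        if (info != nullptr && info->type == paASIO)
        {
            *inDev = info->defaultInputDevice;   // paNoDevice if none
            *outDev = info->defaultOutputDevice;
            return 0;
        }
    }
    return -1; // PortAudio was built without ASIO support
}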

IMediaSample returned by Sample Grabber has unexpected buffer size

I was working on an audio/video capture library for Windows using Media Foundation. However, I faced the issue described in this post for some webcams on Windows 8.1. I have thus decided to have another implementation using DirectShow to support in my app the webcams for which the drivers haven't been updated yet.
The library works quite well, but I have noticed a problem with some webcams for which the sample (IMediaSample) returned does not have the expected size according to the format that was set before starting the camera.
For example, I have a case where the format set has a subtype of MEDIASUBTYPE_RGB24 (3 bytes per pixel) and the frame size is 640x480. The biSizeImage (from BITMAPINFOHEADER) is indeed 640*480*3 = 921600 when applying the format.
The IAMStreamConfig::SetFormat() method succeeds in applying the format:
hr = pStreamConfig->SetFormat(pmt);
I also set the format on the Sample Grabber interface as follows:
hr = pSampleGrabberInterface->SetMediaType(pmt);
I applied the format before starting the graph.
However, in the callback (ISampleGrabberCB::SampleCB), I'm receiving a sample of size 230400 (which might be a buffer for a frame of size 320x240 (320*240*3=230400)).
HRESULT MyClass::SampleCB(double sampleTime, IMediaSample *pSample)
{
    unsigned char* pBuffer = 0;
    HRESULT hr = pSample->GetPointer((unsigned char**)&pBuffer);
    if (SUCCEEDED(hr)) {
        long bufSize = pSample->GetSize();
        // bufSize = 230400
    }
    return S_OK;
}
I tried to investigate the media type returned using the IMediaSample::GetMediaType() method, but the media type is NULL, which means, according to the documentation of GetMediaType, that the media type has not changed (so I guess it's still the media type I applied successfully using IAMStreamConfig::SetFormat()).
HRESULT hr = pSample->GetMediaType(&pType);
if (SUCCEEDED(hr)) {
    if (pType == NULL) {
        // it enters here => the media type has not changed!
    }
}
Why is the returned sample buffer size not the expected size in this case? How can I solve this issue?
Thanks in advance!
The Sample Grabber callback will always return the "correct" size, in the sense that it matches the actual data size and format used in the streaming pipeline.
If you are seeing a mismatch, it means that your filter graph topology differs from what you expect it to be. You need to review the graph (esp. by connecting to it remotely with GraphEdit), inspect the media types and check why it was built incorrectly. For example, you could be applying the format of interest after connecting the pins, which is too late.
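A hedged sketch of the ordering being described, reusing the pointers from the question (the graph-builder and filter variables are illustrative, not from the original code):
// 1. Configure formats while the pins are still unconnected.
hr = pStreamConfig->SetFormat(pmt);                 // capture pin format
hr = pSampleGrabberInterface->SetMediaType(pmt);    // what the Sample Grabber will accept

// 2. Only now build the connections, e.g. via ICaptureGraphBuilder2::RenderStream,
//    so the negotiated media type matches what was requested.
hr = pCaptureGraphBuilder->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                                        pSourceFilter, pSampleGrabberFilter, pNullRenderer);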
See also:
How can I reverse engineer a DirectShow graph?

How to use hardware acceleration with ffmpeg

I need to have ffmpeg decode my video (e.g. H.264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame. And I'd like to have ffmpeg speed up decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264. But I don't really know what I should do next. I've tried using avcodec_find_decoder_by_name("h264_vaapi"), but it returns nullptr.
Anyway, I might want to use other APIs and not just VA-API. How is one supposed to speed up ffmpeg decoding?
P.S. I didn't find any examples on the Internet that use ffmpeg with hwaccel.
After some investigation I was able to implement the necessary HW-accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well.
So let's start with the easiest:
Mac OS X
To get HW acceleration working on Mac OS you should just use the following:
avcodec_find_decoder_by_name("h264_vda");
Note, however, that on Mac OS you can only accelerate H.264 videos with FFmpeg.
Linux VDPAU
On Linux things are much more complicated (who is surprised?). FFmpeg has 2 HW accelerators on Linux: VDPAU (Nvidia) and VAAPI (Intel), and only one HW decoder: for VDPAU. And it may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above:
avcodec_find_decoder_by_name("h264_vdpau");
You might be surprised to find out that it doesn't change anything and you get no acceleration at all. That's because it is only the beginning; you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least 2 good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and which I've based my implementation on. You can also consult ffmpeg_vdpau.c to get another implementation to compare. In my opinion the libavg implementation is easier to grasp, though.
The only thing both aforementioned examples lack is proper copying of the decoded frame to main memory. Both examples use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:
bool VdpauDecoder::fillFrameWithData(AVCodecContext* context, AVFrame* frame)
{
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);
    VdpOutputSurface surface;
    vdp_output_surface_create(m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8, frame->width, frame->height, &surface);
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
                                         VDP_INVALID_HANDLE,
                                         nullptr,
                                         VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
                                         0, nullptr,
                                         videoSurface,
                                         0, nullptr,
                                         nullptr,
                                         surface,
                                         nullptr, nullptr, 0, nullptr);
    if (status == VDP_STATUS_OK)
    {
        auto tmframe = av_frame_alloc();
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width = frame->width;
        tmframe->height = frame->height;
        if (av_frame_get_buffer(tmframe, 32) >= 0)
        {
            VdpStatus status = vdp_output_surface_get_bits_native(surface, nullptr,
                reinterpret_cast<void * const *>(tmframe->data),
                reinterpret_cast<const uint32_t *>(tmframe->linesize));
            if (status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                vdp_output_surface_destroy(surface);
                return true;
            }
        }
        av_frame_unref(tmframe);
    }
    vdp_output_surface_destroy(surface);
    return false;
}
While it uses some "external" objects inside, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are of great help). Also, I've used the BGRA format, which was more suitable for my needs; maybe you will choose another.
The problem with all of it is that you can't just get it working from FFmpeg alone; you need to understand at least the basics of the VDPAU API. I hope that my answer will aid someone in implementing HW acceleration on Linux. I spent a lot of time on it myself before I realized that there is no simple, one-line way of implementing HW-accelerated decoding on Linux.
Linux VA-API
Since my original question was regarding VA-API, I can't leave it unanswered.
First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it returns nullptr.
I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, even though I had to implement the acceleration for an Intel card. Fortunately for me, there is a VDPAU library (driver?) which works on top of VA-API, so you can use VDPAU on Intel cards!
I used the following link to set it up on my Ubuntu system.
Also, you might want to check the comments on the original question, where @Timothy_G also mentioned some links regarding VA-API.

FFMPEG with C++ accessing a webcam

I have searched all around and cannot find any examples or tutorials on how to access a webcam using ffmpeg in C++. Any sample code or help pointing me to some documentation would be greatly appreciated.
Thanks in advance.
I have been working on this for months now. Your first "issue" is that ffmpeg (libavcodec and other ffmpeg libs) does NOT access webcams, or any other device.
For a basic USB webcam, or an audio/video capture card, you first need driver software to access that device. For Linux, these drivers fall under the Video4Linux (V4L2, as it is known) category, which are modules that are part of most distros. If you are working with MS Windows, then you need to get an SDK that allows you to access the device (MS may have something for accessing generic devices, but from my experience they are not very capable, if they work at all). If you've made it this far, then you now have raw frames (video and/or audio).
THEN you get to the ffmpeg part - libavcodec - which takes the raw frames (audio and/or video) and encodes them into streams, which ffmpeg can then mux into your final container.
I have searched, but have found very few examples of all of these, and most are piecemeal.
If you don't actually need to code this yourself, the command-line ffmpeg, as well as VLC, can access these devices, capture and save to files, and even stream.
That's the best I can do for now.
ken
For Windows, use dshow (a small sketch of the dshow path appears after the Linux sample below).
For Linux (like Ubuntu), use Video4Linux (V4L2).
FFmpeg can take input from V4L2 and process it.
To find the USB video device path, type: ls /dev/video*
E.g.: /dev/video(n) where n = 0 / 1 / 2 ….
AVInputFormat – struct which holds the information about the input device format / media device format.
av_find_input_format("v4l2") [Linux]
avformat_open_input(AVFormatContext, "/dev/video(n)", AVInputFormat, NULL)
If the return value is != 0, there was an error.
Now you have accessed the camera using FFmpeg and can continue the operation.
Sample code is below:
int CaptureCam()
{
    avdevice_register_all(); // for device support
    avcodec_register_all();
    av_register_all();

    const char *dev_name = "/dev/video0"; // here mine is video0, it may vary

    AVInputFormat *inputFormat = av_find_input_format("v4l2");
    AVDictionary *options = NULL;
    av_dict_set(&options, "framerate", "20", 0);

    AVFormatContext *pAVFormatContext = NULL;
    // check video source
    if (avformat_open_input(&pAVFormatContext, dev_name, inputFormat, &options) != 0)
    {
        cout << "\nOops, couldn't open video source\n\n";
        return -1;
    }
    else
    {
        cout << "\n Success !";
    }
    return 0;
} // end function
Note: the headers <libavdevice/avdevice.h>, <libavformat/avformat.h> and <iostream> must be included.
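For the Windows/dshow path mentioned at the top of this answer, a minimal sketch using libavdevice's "dshow" input format (the device name string below is an assumption; list your devices first, e.g. with ffmpeg -list_devices true -f dshow -i dummy):
avdevice_register_all();
AVInputFormat *inputFormat = av_find_input_format("dshow");
AVFormatContext *pAVFormatContext = NULL;
// "video=<friendly device name>" is how the dshow demuxer addresses a camera;
// replace "Integrated Camera" with your own device's name.
if (avformat_open_input(&pAVFormatContext, "video=Integrated Camera", inputFormat, NULL) != 0)
{
    cout << "\nCould not open the DirectShow video source\n";
}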
This really doesn't answer the question, as I don't have a pure ffmpeg solution for you. However, I personally use Qt for webcam access. It is C++ and has a much nicer API for accomplishing this. It does add a very large dependency to your code, however.
It definitely depends on the webcam - for example, at work we use IP cameras that deliver a stream of jpeg data over the network. USB will be different.
You can look at the DirectShow samples, e.g. PlayCap (but they show AmCap and DVCap samples too). Once you have a DirectShow input device (chances are whatever device you have will provide this natively), you can hook it up to ffmpeg via the dshow input device.
And having spent 5 minutes browsing the ffmpeg site to get those links, I see this...