I have searched all around and cannot find any examples or tutorials on how to access a webcam using ffmpeg in C++. Any sample code or help pointing me to some documentation would be greatly appreciated.
Thanks in advance.
I have been working on this for months now. Your first "issue" is that ffmpeg (libavcodec and the other ffmpeg libs) does NOT access webcams, or any other device.
For a basic USB webcam, or an audio/video capture card, you first need driver software to access that device. For Linux, these drivers fall under the Video4Linux (V4L2, as it is known) category, which are modules that are part of most distros. If you are working with MS Windows, then you need to get an SDK that allows you to access the device. MS may have something for accessing generic devices (but from my experience they are not very capable, if they work at all). If you've made it this far, then you now have raw frames (video and/or audio).
THEN you get to the ffmpeg part - libavcodec - which takes the raw frames (audio and/or video) and encodes them into streams, which ffmpeg can then mux into your final container.
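To make that encode step concrete, here is a minimal sketch using the newer libavcodec send/receive API; the function name and the "write/mux the packet here" placeholder are illustrative, not part of the original answer:

// Minimal sketch of the "raw frame -> encoded packets" step with libavcodec.
// Assumes a reasonably modern FFmpeg; the codec context is set up elsewhere.
extern "C" {
#include <libavcodec/avcodec.h>
}

static int encode_frame(AVCodecContext *enc, AVFrame *raw, AVPacket *pkt)
{
    // Push one raw frame into the encoder (pass NULL to flush at the end).
    int ret = avcodec_send_frame(enc, raw);
    if (ret < 0)
        return ret;

    // Drain every packet the encoder produces; each packet would then be
    // handed to the muxer (av_interleaved_write_frame) or a network sink.
    while ((ret = avcodec_receive_packet(enc, pkt)) == 0) {
        /* write/mux the packet here */
        av_packet_unref(pkt);
    }
    return (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) ? 0 : ret;
}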
I have searched, but have found very few examples of all of these, and most are piecemeal.
If you don't need to actually code this yourself, the command-line ffmpeg, as well as VLC, can access these devices, capture, save to files, and even stream.
That's the best I can do for now.
ken
For Windows, use dshow.
For Linux (like Ubuntu), use Video4Linux (V4L2).
FFmpeg can take input from V4L2 and process it.
To find the USB video device path, type: ls /dev/video*
E.g.: /dev/video(n) where n = 0, 1, 2, …
AVInputFormat – struct which holds the information about the input device format / media device format.
av_find_input_format("v4l2") [Linux]
avformat_open_input(&AVFormatContext, "/dev/video(n)", AVInputFormat, NULL)
If the return value is != 0, there was an error.
Now you have accessed the camera using FFmpeg and can continue the operation.
Sample code is below.
#include <iostream>

extern "C" {
#include <libavdevice/avdevice.h>
#include <libavformat/avformat.h>
}

using std::cout;

int CaptureCam()
{
    avdevice_register_all(); // registers the device input formats (v4l2, etc.)
    avcodec_register_all();
    av_register_all();

    const char *dev_name = "/dev/video0"; // here mine is video0, it may vary

    AVInputFormat *inputFormat = av_find_input_format("v4l2");

    AVDictionary *options = NULL;
    av_dict_set(&options, "framerate", "20", 0);

    AVFormatContext *pAVFormatContext = NULL;

    // check video source (pass &options so the framerate setting is applied)
    if (avformat_open_input(&pAVFormatContext, dev_name, inputFormat, &options) != 0)
    {
        cout << "\nOops, couldn't open video source\n\n";
        return -1;
    }
    else
    {
        cout << "\n Success !";
    }
    return 0;
} // end function
Note: the header file <libavdevice/avdevice.h> must be included.
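If it helps, here is a rough sketch of how reading frames could continue from there; the function name is mine, and decoding/encoding of the captured packets is left out:

// Sketch only: pull raw packets from an already-opened capture context
// (e.g. the pAVFormatContext opened in CaptureCam above).
extern "C" {
#include <libavformat/avformat.h>
}

int read_camera_frames(AVFormatContext *fmtCtx)
{
    if (avformat_find_stream_info(fmtCtx, NULL) < 0)
        return -1;

    AVPacket packet;
    while (av_read_frame(fmtCtx, &packet) >= 0)
    {
        // packet.data holds one captured frame in the camera's native format
        // (raw YUYV, MJPEG, ...); hand it to a decoder/encoder here.
        av_packet_unref(&packet);
    }

    avformat_close_input(&fmtCtx);
    return 0;
}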
This really doesn't answer the question, as I don't have a pure ffmpeg solution for you. However, I personally use Qt for webcam access. It is C++ and has a much better API for accomplishing this. It does add a very large dependency to your project, however.
It definitely depends on the webcam - for example, at work we use IP cameras that deliver a stream of jpeg data over the network. USB will be different.
You can look at the DirectShow samples, e.g. PlayCap (they also show AmCap and DVCap samples). Once you have a DirectShow input device (chances are whatever device you have will provide this natively), you can hook it up to ffmpeg via the dshow input device.
And having spent 5 minutes browsing the ffmpeg site to get those links, I see this...
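For what it's worth, a minimal sketch of what that looks like from the libavformat side; the device name below is a placeholder, and you would list your actual devices with ffmpeg -list_devices true -f dshow -i dummy:

// Sketch only: open a DirectShow capture device through libavformat.
// "video=Integrated Camera" is a placeholder device name.
extern "C" {
#include <libavdevice/avdevice.h>
#include <libavformat/avformat.h>
}

int open_dshow_cam(AVFormatContext **ctx)
{
    avdevice_register_all(); // registers the "dshow" input format

    AVInputFormat *dshow = av_find_input_format("dshow");
    if (!dshow)
        return -1;

    // The name must match what "ffmpeg -list_devices true -f dshow -i dummy" reports.
    return avformat_open_input(ctx, "video=Integrated Camera", dshow, NULL);
}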
Related
I've made a C++ program that lives in GKE and takes some videos as input using ffmpeg, then does something with that input using OpenGL (not relevant), then finally encodes those edited videos as a single output. Normally the program works perfectly fine on my local machine; it encodes just as I want, with no warnings or valgrind errors whatsoever. Then, after encoding the video, I want my program to upload it to Google Cloud Storage. This is where the problem comes in. I have tried 2 methods for this: first, I tried using curl to upload to the cloud using a signed URL; second, I tried mounting the storage bucket using gcsfuse (I was already mounting the bucket to access the inputs in question). Both of those methods yielded undefined, weird behaviours, ranging from: outputting a 0-byte or 44-byte file, (this is the most common one:) encoding a file of the correct size (~500 MB) but with a video that is 0 seconds long, outputting a 0.4-second video, or just encoding the desired output normally (really rare).
From the logs I can't see anything unusual; everything seems to work fine, ffmpeg does not give any errors or warnings, and neither does valgrind. Even when I use curl to upload the video to the cloud, the output is perfectly fine when it is first encoded (before sending it with curl), but the video gets messed up once curl uploads it to the cloud.
I'm using the muxing.c example of ffmpeg to encode my video with the only difference being:
void video_encoder::fill_yuv_image(AVFrame *frame, struct SwsContext *sws_context) {
    const int in_linesize[1] = { 4 * width };

    // Convert the interleaved RGBA buffer (rgb_data) into the encoder's YUV420P frame.
    // Note: the context is created fresh on every call here; it should really be
    // cached (or at least freed, as below) rather than leaked each frame.
    sws_context = sws_getContext(
                      width, height, AV_PIX_FMT_RGBA,
                      width, height, AV_PIX_FMT_YUV420P,
                      SWS_BICUBIC, 0, 0, 0);
    sws_scale(sws_context, (const uint8_t * const *)&rgb_data, in_linesize, 0,
              height, frame->data, frame->linesize);
    sws_freeContext(sws_context);
}
rgb_data is the data I got after editing the inputs. Again, this works fine and I don't think there are any errors here.
I'm not sure where the error is, and since the code is huge I can't provide a reproducible example. I'm just looking for someone to point me in the right direction.
Running the cloud's output in mplayer yields this result (this is when the video is the right size but is 0 seconds long, the most common case):
MPlayer 1.4 (Debian), built with gcc-11 (C) 2000-2019 MPlayer Team
do_connect: could not connect to socket
connect: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.
Playing /media/c36c2633-d4ee-4d37-825f-88ae54b86100.
libavformat version 58.76.100 (external)
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
libavformat file format detected.
[mov,mp4,m4a,3gp,3g2,mj2 # 0x7f2cba1168e0]moov atom not found
LAVF_header: av_open_input_stream() failed
RAWDV file format detected.
VIDEO: [DVSD] 720x480 24bpp 29.970 fps 0.0 kbps ( 0.0 kbyte/s)
X11 error: BadMatch (invalid parameter attributes)
Failed to open VDPAU backend libvdpau_nvidia.so: cannot open shared object file: No such file or directory
[vdpau] Error when calling vdp_device_create_x11: 1
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 58.134.100 (external)
[dvvideo # 0x7f2cb987a380]Requested frame threading with a custom get_buffer2() implementation which is not marked as thread safe. This is not supported anymore, make your callback thread-safe.
Selected video codec: [ffdv] vfm: ffmpeg (FFmpeg DV)
==========================================================================
Load subtitles in /media/
==========================================================================
Opening audio decoder: [libdv] Raw DV Audio Decoder
Unknown/missing audio format -> no sound
ADecoder init failed :(
Opening audio decoder: [ffmpeg] FFmpeg/libavcodec audio decoders
[dvaudio # 0x7f2cb987a380]Decoder requires channel count but channels not set
Could not open codec.
ADecoder init failed :(
ADecoder init failed :(
Cannot find codec for audio format 0x56444152.
Audio: no sound
Starting playback...
[dvvideo # 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
[dvvideo # 0x7f2cb987a380]could not find dv frame profile
Error while decoding frame!
V: 0.0 2/ 2 ??% ??% ??,?% 0 0
Exiting... (End of file)
Edit: Since the code runs on a VM, I'm using xvfb-run to start my application, but again, even when using xvfb-run, everything works completely fine when not encoding to the cloud.
Apparently (I'm assuming for security reasons), Google Cloud Storage does not allow multiple continuous operations on a file, just a single read/write operation. So I found a workaround: encode the video to a local file inside the pod and then copy it to the cloud, as sketched below.
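A minimal sketch of that workaround, assuming the bucket is mounted with gcsfuse at /gcs/bucket and the encoder writes to a local scratch path (both paths are placeholders):

// Finish the encode on local disk, then copy the completed file onto the
// gcsfuse mount in a single sequential write. Paths are placeholders.
#include <filesystem>
#include <system_error>

bool publish_output()
{
    const std::filesystem::path local  = "/tmp/output.mp4";        // encoder writes here
    const std::filesystem::path bucket = "/gcs/bucket/output.mp4"; // gcsfuse mount

    std::error_code ec;
    std::filesystem::copy_file(local, bucket,
                               std::filesystem::copy_options::overwrite_existing, ec);
    return !ec;
}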
When using the Source Reader I can use it to get decoded YUV samples using an mp4 file source (example code).
How can I do the opposite with a webcam source? Can I use the Source Reader to provide encoded H264 samples? My webcam supports RGB24 and I420 pixel formats, and I can get H264 samples if I manually wire up the H264 MFT transform. But it seems as if the Source Reader should be able to take care of the transform for me. I get an error whenever I attempt to set an MF_MT_SUBTYPE of MFVideoFormat_H264 on the Source Reader.
Sample snippet is shown below and the full example is here.
// Get the first available webcam.
CHECK_HR(MFCreateAttributes(&videoConfig, 1), "Error creating video configuration.");
// Request video capture devices.
CHECK_HR(videoConfig->SetGUID(
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID), "Error initialising video configuration object.");
CHECK_HR(videoConfig->SetGUID(MF_MT_SUBTYPE, WMMEDIASUBTYPE_I420),
"Failed to set video sub type to I420.");
CHECK_HR(MFEnumDeviceSources(videoConfig, &videoDevices, &videoDeviceCount), "Error enumerating video devices.");
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &webcamFriendlyName, &nameLength),
"Error retrieving video device friendly name.\n");
wprintf(L"First available webcam: %s\n", webcamFriendlyName);
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->ActivateObject(IID_PPV_ARGS(&pVideoSource)),
"Error activating video device.");
CHECK_HR(MFCreateAttributes(&pAttributes, 1),
"Failed to create attributes.");
// Adding this attribute creates a video source reader that will handle
// colour conversion and avoid the need to manually convert between RGB24 and RGB32 etc.
CHECK_HR(pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, 1),
"Failed to set enable video processing attribute.");
CHECK_HR(pAttributes->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
// Create a source reader.
CHECK_HR(MFCreateSourceReaderFromMediaSource(
pVideoSource,
pAttributes,
&pVideoReader), "Error creating video source reader.");
MFCreateMediaType(&pSrcOutMediaType);
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264), "Error setting video sub type.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_AVG_BITRATE, 240000), "Error setting average bit rate.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_INTERLACE_MODE, 2), "Error setting interlace mode.");
CHECK_HR(pVideoReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pSrcOutMediaType),
"Failed to set media type on source reader.");
CHECK_HR(pVideoReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFirstOutputType),
"Error retrieving current media type from first video stream.");
std::cout << "Source reader output media type: " << GetMediaTypeDescription(pFirstOutputType) << std::endl << std::endl;
Output:
bind returned success
First available webcam: Logitech QuickCam Pro 9000
Failed to set media type on source reader. Error: C00D5212.
finished.
The Source Reader does not look like a suitable API here. It is an API for implementing "half of the pipeline", which includes any necessary decoding but not encoding. The other half is the Sink Writer API, which is capable of handling encoding, and which can encode H.264.
Another option, unless you are developing a UWP project, is the Media Session API, which implements a pipeline end to end.
Even though technically (in theory) you could have an encoding MFT as part of the Source Reader pipeline, the Source Reader API itself is not flexible enough to add encoding-style transforms based on requested media types.
So, one solution could be to have the Source Reader read with the necessary decoding (for example, up to RGB32 or NV12 video frames), then a Sink Writer manage the encoding with an appropriate media sink on its end (or a Sample Grabber as the media sink). Another solution is to put the Media Foundation primitives into a Media Session pipeline, which can manage both the decoding and encoding parts connected together.
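For illustration, here is a rough sketch of that first approach: the Source Reader delivers uncompressed (e.g. NV12) samples and a Sink Writer encodes them to H.264. It assumes pVideoReader is already configured; the frame size, rate, bitrate and file name are placeholders, and error handling is trimmed:

// Rough sketch: pull uncompressed samples from the Source Reader and push
// them into a Sink Writer that encodes H.264 into an MP4 file.
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>

HRESULT EncodeFromReader(IMFSourceReader *pVideoReader, IMFMediaType *pReaderOutputType)
{
    IMFSinkWriter *pWriter = NULL;
    HRESULT hr = MFCreateSinkWriterFromURL(L"capture.mp4", NULL, NULL, &pWriter);
    if (FAILED(hr)) return hr;

    // Output (encoded) type on the writer.
    IMFMediaType *pOutType = NULL;
    MFCreateMediaType(&pOutType);
    pOutType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    pOutType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    pOutType->SetUINT32(MF_MT_AVG_BITRATE, 240000);
    pOutType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    MFSetAttributeSize(pOutType, MF_MT_FRAME_SIZE, 640, 480);
    MFSetAttributeRatio(pOutType, MF_MT_FRAME_RATE, 30, 1);

    DWORD streamIndex = 0;
    hr = pWriter->AddStream(pOutType, &streamIndex);

    // Input (uncompressed) type matches what the reader is producing (e.g. NV12).
    if (SUCCEEDED(hr))
        hr = pWriter->SetInputMediaType(streamIndex, pReaderOutputType, NULL);
    if (SUCCEEDED(hr))
        hr = pWriter->BeginWriting();

    while (SUCCEEDED(hr))
    {
        DWORD flags = 0;
        IMFSample *pSample = NULL;
        hr = pVideoReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM,
                                      0, NULL, &flags, NULL, &pSample);
        if (FAILED(hr) || (flags & MF_SOURCE_READERF_ENDOFSTREAM))
            break;
        if (pSample)
        {
            hr = pWriter->WriteSample(streamIndex, pSample); // the encoder runs inside the writer
            pSample->Release();
        }
    }

    pWriter->Finalize();
    pOutType->Release();
    pWriter->Release();
    return hr;
}

The Sink Writer loads the H.264 encoder MFT internally when the input and output types differ, which is what removes the manual MFT wiring.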
Now, your use case is clearer.
For me, your MFWebCamRtp is the most optimized way of doing: webcam Source Reader -> encoding -> RTP streaming.
But you are experiencing presentation clock issues, synchronization issues, or unsynchronized audio/video. Am I right?
So you tried the Sample Grabber sink, and now the Source Reader, like I suggested. Of course, you may think that a Media Session will be able to do it better.
I think so, but extra work will be needed.
Here is what I would do in your case:
Code a custom RTP sink
Create a topology with your webcam source, an H.264 encoder, and your custom RTP sink
Add your topology to a MediaSession
Use the MediaSession to play the topology (see the sketch below)
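As a very rough sketch of steps 2-4 (pWebcamSource would come from MFEnumDeviceSources/ActivateObject, pMyRtpStreamSink is the stream sink exposed by the custom sink from step 1, and error handling is stripped):

// Rough sketch of the topology/session part. The custom RTP sink is assumed
// to exist; CLSID_CMSH264EncoderMFT is the stock Microsoft H.264 encoder.
#include <mfapi.h>
#include <mfidl.h>
#include <wmcodecdsp.h> // CLSID_CMSH264EncoderMFT

HRESULT BuildAndRunTopology(IMFMediaSource *pWebcamSource, IMFStreamSink *pMyRtpStreamSink)
{
    IMFPresentationDescriptor *pPD = NULL;
    IMFStreamDescriptor *pSD = NULL;
    BOOL selected = FALSE;
    pWebcamSource->CreatePresentationDescriptor(&pPD);
    pPD->GetStreamDescriptorByIndex(0, &selected, &pSD);

    IMFTopology *pTopology = NULL;
    MFCreateTopology(&pTopology);

    // Source node: the webcam.
    IMFTopologyNode *pSourceNode = NULL;
    MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pSourceNode);
    pSourceNode->SetUnknown(MF_TOPONODE_SOURCE, pWebcamSource);
    pSourceNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPD);
    pSourceNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pSD);
    pTopology->AddNode(pSourceNode);

    // Transform node: the H.264 encoder MFT.
    IMFTopologyNode *pEncoderNode = NULL;
    MFCreateTopologyNode(MF_TOPOLOGY_TRANSFORM_NODE, &pEncoderNode);
    pEncoderNode->SetGUID(MF_TOPONODE_TRANSFORM_OBJECTID, CLSID_CMSH264EncoderMFT);
    pTopology->AddNode(pEncoderNode);

    // Output node: your custom RTP sink's stream sink.
    IMFTopologyNode *pOutputNode = NULL;
    MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNode);
    pOutputNode->SetObject(pMyRtpStreamSink);
    pTopology->AddNode(pOutputNode);

    pSourceNode->ConnectOutput(0, pEncoderNode, 0);
    pEncoderNode->ConnectOutput(0, pOutputNode, 0);

    // Hand the topology to a Media Session and start it.
    IMFMediaSession *pSession = NULL;
    MFCreateMediaSession(NULL, &pSession);
    pSession->SetTopology(0, pTopology);

    PROPVARIANT varStart;
    PropVariantInit(&varStart);
    return pSession->Start(&GUID_NULL, &varStart);
}

The custom RTP sink from step 1 is the piece you still have to write yourself; the rest is stock Media Foundation plumbing.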
If you want a network-stream sink sample, see this: MFSkJpegHttpStreamer.
This is old, but it's a good start. This program also uses Winsock, like yours.
You should be aware that the RTP protocol uses UDP, which is a very good way to get synchronization issues... That is your main problem, I would guess.
Here is what I think: you are trying to compensate for the weaknesses of the RTP protocol (UDP) with Media Foundation's audio/video synchronization management. I think you will just fail with this approach.
I think your main problem is the RTP protocol.
EDIT
No, I'm not having synchronisation issues. The Source Reader and Sample Grabber both provide correct timestamps which I can use in the RTP header. Likewise, no problems with RTP/UDP etc.; that's the bit I do know about. My questions originate from a desire to understand the most efficient (least amount of plumbing code) and most flexible solution. And yes, it does look like a custom sink writer is the optimal solution.
Again things are clearer. If you need help with a custom RTP sink, I'll be there.
I need to have ffmpeg decode my video (e.g. H.264) using hardware acceleration. I'm using the usual way of decoding frames: read packet -> decode frame. And I'd like to have ffmpeg speed up decoding. So I've built it with --enable-vaapi and --enable-hwaccel=h264, but I don't really know what I should do next. I've tried to use avcodec_find_decoder_by_name("h264_vaapi"), but it returns nullptr.
Anyway, I might want to use other APIs and not just VA-API. How is one supposed to speed up ffmpeg decoding?
P.S. I didn't find any examples on the Internet which use ffmpeg with hwaccel.
After some investigation I was able to implement the necessary HW accelerated decoding on OS X (VDA) and Linux (VDPAU). I will update the answer when I get my hands on a Windows implementation as well.
So let's start with the easiest:
Mac OS X
To get HW acceleration working on Mac OS you should just use the following:
avcodec_find_decoder_by_name("h264_vda");
Note, however, that with FFmpeg on Mac OS you can only accelerate h264 videos.
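To make that concrete, a small sketch; the fallback to the plain software decoder is my own addition, not something the original answer relies on:

// Prefer the VDA-accelerated decoder on Mac OS, fall back to the plain
// software h264 decoder if this FFmpeg build doesn't provide it.
extern "C" {
#include <libavcodec/avcodec.h>
}

AVCodec *find_h264_decoder()
{
    AVCodec *codec = avcodec_find_decoder_by_name("h264_vda");
    if (!codec)
        codec = avcodec_find_decoder(AV_CODEC_ID_H264); // software fallback
    return codec;
}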
Linux VDPAU
On Linux things are much more complicated (who is surprised?). FFmpeg has 2 HW accelerators on Linux: VDPAU (Nvidia) and VA-API (Intel), and only one HW decoder: for VDPAU. It may seem perfectly reasonable to use the vdpau decoder like in the Mac OS example above:
avcodec_find_decoder_by_name("h264_vdpau");
You might be surprised to find out that it doesn't change anything and you have no acceleration at all. That's because it is only the beginning; you have to write much more code to get the acceleration working. Happily, you don't have to come up with a solution on your own: there are at least 2 good examples of how to achieve that: libavg and FFmpeg itself. libavg has a VDPAUDecoder class which is perfectly clear and which I've based my implementation on. You can also consult ffmpeg_vdpau.c for another implementation to compare against. In my opinion, the libavg implementation is easier to grasp, though.
The only thing both aforementioned examples lack is proper copying of the decoded frame to main memory. Both examples use VdpVideoSurfaceGetBitsYCbCr, which killed all the performance I gained on my machine. That's why you might want to use the following procedure to extract the data from the GPU:
bool VdpauDecoder::fillFrameWithData(AVCodecContext* context,
                                     AVFrame* frame)
{
    // The decoder instance is stashed in the codec context's opaque pointer.
    VdpauDecoder* vdpauDecoder = static_cast<VdpauDecoder*>(context->opaque);

    // Create an RGBA output surface matching the frame size.
    VdpOutputSurface surface;
    if (vdp_output_surface_create(m_VdpDevice, VDP_RGBA_FORMAT_B8G8R8A8,
                                  frame->width, frame->height, &surface) != VDP_STATUS_OK)
        return false;

    bool result = false;

    // FFmpeg's VDPAU hwaccel stores the render state in data[0].
    auto renderState = reinterpret_cast<vdpau_render_state*>(frame->data[0]);
    VdpVideoSurface videoSurface = renderState->surface;

    // Let the video mixer convert the decoded YCbCr surface to RGBA on the GPU.
    auto status = vdp_video_mixer_render(vdpauDecoder->m_VdpMixer,
                                         VDP_INVALID_HANDLE,
                                         nullptr,
                                         VDP_VIDEO_MIXER_PICTURE_STRUCTURE_FRAME,
                                         0, nullptr,
                                         videoSurface,
                                         0, nullptr,
                                         nullptr,
                                         surface,
                                         nullptr, nullptr, 0, nullptr);
    if (status == VDP_STATUS_OK)
    {
        AVFrame* tmframe = av_frame_alloc();
        tmframe->format = AV_PIX_FMT_BGRA;
        tmframe->width  = frame->width;
        tmframe->height = frame->height;
        if (av_frame_get_buffer(tmframe, 32) >= 0)
        {
            // Read the RGBA pixels back from the GPU into the temporary frame.
            status = vdp_output_surface_get_bits_native(surface, nullptr,
                         reinterpret_cast<void* const*>(tmframe->data),
                         reinterpret_cast<const uint32_t*>(tmframe->linesize));
            if (status == VDP_STATUS_OK && av_frame_copy_props(tmframe, frame) == 0)
            {
                // Replace the hwaccel frame with the readable BGRA frame.
                av_frame_unref(frame);
                av_frame_move_ref(frame, tmframe);
                result = true;
            }
        }
        // tmframe is either empty (after av_frame_move_ref) or holds an unused
        // buffer; freeing it is safe in both cases.
        av_frame_free(&tmframe);
    }

    vdp_output_surface_destroy(surface);
    return result;
}
While it has some "external" objects used inside, you should be able to understand it once you have implemented the "get buffer" part (for which the aforementioned examples are of great help). Also, I've used the BGRA format, which was more suitable for my needs; maybe you will choose another.
The problem with all of it is that you can't just get it working from FFmpeg; you need to understand at least the basics of the VDPAU API. And I hope that my answer will aid someone in implementing HW acceleration on Linux. I've spent much time on it myself before I realized that there is no simple, one-line way of implementing HW accelerated decoding on Linux.
Linux VA-API
Since my original question was regarding VA-API, I can't leave it unanswered.
First of all, there is no decoder for VA-API in FFmpeg, so avcodec_find_decoder_by_name("h264_vaapi") doesn't make any sense: it is nullptr.
I don't know how much harder (or maybe simpler?) it is to implement decoding via VA-API, since all the examples I've seen were quite intimidating. So I chose not to use VA-API at all, although I had to implement the acceleration for an Intel card. Fortunately for me, there is a VDPAU library (driver?) which works over VA-API. So you can use VDPAU on Intel cards!
I've used the following link to set it up on my Ubuntu.
Also, you might want to check the comments on the original question, where @Timothy_G also mentioned some links regarding VA-API.
I am writing a client-server system that uses the FFmpeg library to parse an H.264 stream into NAL units on the server side, then uses channel coding to send them over the network to the client side, where my application must be able to play the video.
The question is how to play the received AVPackets (NAL units) in my application as a video stream.
I have found this tutorial helpful and used it as the base for both the server and client sides.
Some sample code or a resource related to playing video not from a file, but from data inside the program using the FFmpeg library, would be very helpful.
I am sure that the received information is sufficient to play the video, because I tried saving the received data as a .h264 or .mp4 file and it can be played by VLC.
From what I understand of your question, you have the AVPackets and want to play a video. In reality this is two problems: 1. decoding your packets, and 2. playing the video.
For decoding your packets with FFmpeg, you should take a look at the documentation for AVPacket, AVCodecContext, and avcodec_decode_video2 to get some ideas; the general idea is that you want to do something (I just wrote this in the browser, so take it with a grain of salt) along the lines of:
// the context; set this up appropriately based on your video
// (see the above links for the documentation)
AVCodecContext *decoder_context;
std::vector<AVPacket> packets; // assume this has your packets
...
AVFrame *decoded_frame = av_frame_alloc();
int ret = -1;
int got_frame = 0;
for (AVPacket packet : packets)
{
    ret = avcodec_decode_video2(decoder_context, decoded_frame, &got_frame, &packet);
    if (ret < 0) {
        // had an error decoding the current packet
        break;
    }
    if (got_frame)
    {
        // send decoded_frame to whatever video player queue you're using /
        // do whatever with the frame
        ...
    }
    got_frame = 0;
    av_free_packet(&packet);
}
It's a pretty rough sketch, but that's the general idea for your problem of decoding the AVPackets. As for your problem of playing the video, you have many options, which will likely depend more on your clients. What you're asking about is a pretty large problem; I'd advise familiarizing yourself with the FFmpeg documentation and the examples provided at the FFmpeg site. Hope that makes sense.
I'm developing an application to send video over RTP to a client that can play only H.263 (1996) and H.263+ (1998).
To do this I've encoded the video using libav following these steps (this is only part of the code):
av_register_all();
avformat_network_init();
Fmt = av_guess_format("rtp", NULL, NULL);
...
st = add_video_stream(FmtCtx, CODEC_ID_H263);
...
avio_open(&FmtCtx->pb, rtp_url, URL_WRONLY)
before finally entering a loop where I encode the video. The problem is that the stream generated by this program is encoded in H.263-2000 (or H.263++), which the other side cannot understand; whether I use CODEC_ID_H263 or CODEC_ID_H263P in the initialization, the same thing happens.
Is it possible to encode in those old H.263 versions using libav? I haven't managed to do it, not even using ffmpeg commands. The stream is always H.263-2000 (PT=96).