Use Source Reader to get H264 samples from webcam source - c++

When using the Source Reader I can use it to get decoded YUV samples using an mp4 file source (example code).
How can I do the opposite with a webcam source? Use the Source Reader to provide encoded H264 samples? My webcam supports RGB24 and I420 pixel formats and I can get H264 samples if I manually wire up the H264 MFT transform. But it seems as is the Source Reader should be able to take care of the transform for me. I get an error whenever I attempt to set MF_MT_SUBTYPE of MFVideoFormat_H264 on the Source Reader.
Sample snippet is shown below and the full example is here.
// Get the first available webcam.
CHECK_HR(MFCreateAttributes(&videoConfig, 1), "Error creating video configuration.");
// Request video capture devices.
CHECK_HR(videoConfig->SetGUID(
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID), "Error initialising video configuration object.");
CHECK_HR(videoConfig->SetGUID(MF_MT_SUBTYPE, WMMEDIASUBTYPE_I420),
"Failed to set video sub type to I420.");
CHECK_HR(MFEnumDeviceSources(videoConfig, &videoDevices, &videoDeviceCount), "Error enumerating video devices.");
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &webcamFriendlyName, &nameLength),
"Error retrieving video device friendly name.\n");
wprintf(L"First available webcam: %s\n", webcamFriendlyName);
CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->ActivateObject(IID_PPV_ARGS(&pVideoSource)),
"Error activating video device.");
CHECK_HR(MFCreateAttributes(&pAttributes, 1),
"Failed to create attributes.");
// Adding this attribute creates a video source reader that will handle
// colour conversion and avoid the need to manually convert between RGB24 and RGB32 etc.
CHECK_HR(pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, 1),
"Failed to set enable video processing attribute.");
CHECK_HR(pAttributes->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
// Create a source reader.
CHECK_HR(MFCreateSourceReaderFromMediaSource(
pVideoSource,
pAttributes,
&pVideoReader), "Error creating video source reader.");
MFCreateMediaType(&pSrcOutMediaType);
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264), "Error setting video sub type.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_AVG_BITRATE, 240000), "Error setting average bit rate.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_INTERLACE_MODE, 2), "Error setting interlace mode.");
CHECK_HR(pVideoReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pSrcOutMediaType),
"Failed to set media type on source reader.");
CHECK_HR(pVideoReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFirstOutputType),
"Error retrieving current media type from first video stream.");
std::cout << "Source reader output media type: " << GetMediaTypeDescription(pFirstOutputType) << std::endl << std::endl;
Output:
bind returned success
First available webcam: Logitech QuickCam Pro 9000
Failed to set media type on source reader. Error: C00D5212.
finished.

Source Reader does not look like suitable API here. It is API to implement "half of pipeline" which includes necessary decoding but not encoding. The other half is Sink Writer API which is capable to handle encoding, and which can encode H.264.
Or your another option, unless you are developing a UWP project, is Media Session API which implements a pipeline end to end.
Even though technically (in theory) you could have an encoding MFT as a part of Source Reader pipeline, Source Reader API itself is insufficiently flexible to add encoding style tansforms based on requested media types.
So, one solution could be to have Source Reader to read with necessary decoding (such as up to having RGB32 or NV12 video frames), then Sink Writer to manage encoding with respectively appropriate media sink on its end (or Sample Grabber as media sink). Another solution is to put Media Foundation primitives into Media Session pipeline which can manage both decoding and encoding parts, connected together.

Now, your use case is clearer.
For me, your MFWebCamRtp is the best optimized way of doing : WebCam Source Reader -> Encoding -> RTP Streaming.
But you are experiencing presentation clock issues, synchronization issues, or unsynchronized audio video issues. Am I right ?
So you tried Sample Grabber Sink, and now Source Reader, like I suggested to you. Of course, you can think that a Media Session will be able to do it better.
I think so, but extra work will be needed.
Here is what I would do in your case :
Code a custom RTP Sink
Create a topology with webcam source, h264 encoder, your custom RTP Sink
Add your topology to a MediaSession
Use the MediaSession to play the process
If you want a networkstream sink sample, see this : MFSkJpegHttpStreamer
This is old, but it's a good start. This program also uses winsock, like your.
You should be aware that RTP protocol uses UDP. A very good way to have synchronization issues... Definitely your main problem, as I guess.
What I think. You are trying to compensate for the weaknesses of the RTP protocol (UDP), with a management of the audio / video synchronization of MediaFoundation. I think you will just fail with this approach.
I think your main problem is RTP protocol.
EDIT
No I'm not having synchronisation issues. The Source Reader and Sample Grabber both provide correct timestamps which I can use in the RTP header. Likewise no problems with RTP/UDP etc. that's the bit I do know about. My questions are originating from a desire to understand the most efficient (least amount of plumbing code) and flexible solution. And yes it does look like a custom sink writer is the optimal solution.
Again things are clearer. If you need help with a custom RTP sink, I'll be there.

Related

Write H.264 stream in buffer to a streamable mp4 using ffmpeg

I wrote code to create H.264 stream, which has a loop to generate H.264 encoded frame.
while(true) {
...
x264_encoder_encode(encoder, &buffer, &i_buffer, &pic_in, &pic_out);
...
/*TODO: Write one frame in the buffer to a streamable mp4 file*/
}
Every single time, an H.264 encoded frame is generated and stored in the buffer. How can I write it into a streamable mp4 file directly through the buffer?
I spent lots of time searching for the solution. All I can find is to read stream from a file using
avformat_open_input(&fmtCtx, in_filename, 0, 0)
Is there any way to read directly from buffer without a file?
MP4 is actually not streamable. So in other words, you can't do it at all. I ran in that very problem.
The reason why it won't work is because when you open an mp4 file, you have to have all sorts of parameters, which by default get saved at the end of the file. When you create an MP4, you can always forcibly save that info at the start. However, to know what those parameters are, you need all the data. And without those parameters, the software trying to load the mp4 fails very early on. This is true for some other formats such as webm videos and .m4a or .wav for audio.
What you have to do is stream the actual H.264, possibly using RTSP or a format of your own if you're in control of both sides.

Stream live audio live555

I was writing as I could not find the answer in previous topics. I am using live555 to stream live video (h264) and audio(g723), which are being recorded by a web camera. The video part is already done and it works perfectly, but I have no clue about the audio task.
As long as I have read I have to create a ServerMediaSession to which I should add two subsessions: one for the video and one for the audio. For the video part I created a subclass of OnDemandServerMediaSubsession, a subclass of FramedSource and the Encoder class, but for the audio aspect I do not know on which classes should I base the implementation.
The web camera records and delivers audio frames in g723 format separatedly from the video. I would say the audio is raw as when I try to play it in VLC it says that it could not find any startcode; so I suppose it is the raw audio stream what is recorded by the web cam.
I was wondering if someone could give me a hint.
For an audio stream ,your override of OnDemandServerMediaSubsession::createNewRTPSink should create a SimpleRTPSink.
Something like :
RTPSink* YourAudioMediaSubsession::createNewRTPSink(Groupsock* rtpGroupsock, unsigned char rtpPayloadTypeIfDynamic, FramedSource* inputSource)
{
return SimpleRTPSink::createNew(envir(), rtpGroupsock,
4,
frequency,
"audio",
"G723",
channels );
}
The frequency and the number of channels should comes from the inputSource.

FFMPEG with C++ accessing a webcam

I have searched all around and can not find any examples or tutorials on how to access a webcam using ffmpeg in C++. Any sample code or any help pointing me to some documentation, would greatly be appreciated.
Thanks in advance.
I have been working on this for months now. Your first "issue" is that ffmpeg (libavcodec and other ffmpeg libs) does NOT access web cams, or any other device.
For a basic USB webcam, or audio/video capture card, you first need driver software to access that device. For linux, these drivers fall under the Video4Linux (V4L2 as it is known) category, which are modules that are part of most distros. If you are working with MS Windows, then you need to get an SDK that allows you to access the device. MS may have something for accessing generic devices, (but from my experience, they are not very capable, if they work at all) If you've made it this far, then you now have raw frames (video and/or audio).
THEN you get to the ffmpeg part - libavcodec - which takes the raw frames (audio and/or video) and encodes them into a streams, which ffmpeg can then mux into your final container.
I have searched, but have found very few examples of all of these, and most are piece-meal.
If you don't need to actually code of this yourself, the command line ffmpeg, as well as vlc, can access these devices, capture and save to files, and even stream.
That's the best I can do for now.
ken
For windows use dshow
For Linux (like ubuntu) use Video4Linux (V4L2).
FFmpeg can take input from V4l2 and can do the process.
To find the USB video path type : ls /dev/video*
E.g : /dev/video(n) where n = 0 / 1 / 2 ….
AVInputFormat – Struct which holds the information about input device format / media device format.
av_find_input_format ( “v4l2”) [linux]
av_format_open_input(AVFormatContext , “/dev/video(n)” , AVInputFormat , NULL)
if return value is != 0 then error.
Now you have accessed the camera using FFmpeg and can continue the operation.
sample code is below.
int CaptureCam()
{
avdevice_register_all(); // for device
avcodec_register_all();
av_register_all();
char *dev_name = "/dev/video0"; // here mine is video0 , it may vary.
AVInputFormat *inputFormat =av_find_input_format("v4l2");
AVDictionary *options = NULL;
av_dict_set(&options, "framerate", "20", 0);
AVFormatContext *pAVFormatContext = NULL;
// check video source
if(avformat_open_input(&pAVFormatContext, dev_name, inputFormat, NULL) != 0)
{
cout<<"\nOops, could'nt open video source\n\n";
return -1;
}
else
{
cout<<"\n Success !";
}
} // end function
Note : Header file < libavdevice/avdevice.h > must be included
This really doesn't answer the question as I don't have a pure ffmpeg solution for you, However, I personally use Qt for webcam access. It is C++ and will have a much better API for accomplishing this. It does add a very large dependency on your code however.
It definitely depends on the webcam - for example, at work we use IP cameras that deliver a stream of jpeg data over the network. USB will be different.
You can look at the DirectShow samples, eg PlayCap (but they show AmCap and DVCap samples too). Once you have a directshow input device (chances are whatever device you have will be providing this natively) you can hook it up to ffmpeg via the dshow input device.
And having spent 5 minutes browsing the ffmpeg site to get those links, I see this...

Encoding video on H.263 to send over RTP

I'm developing an application to send video over RTP to a client that can play only H.263 (1996) and H263+ (1998).
To do this i've encoded the video using libav following these steps: (this is only part of the code)
av_register_all();
avformat_network_init();
Fmt = av_guess_format("rtp", NULL, NULL);
...
st = add_video_stream(FmtCtx, CODEC_ID_H263);
...
avio_open(&FmtCtx->pb, rtp_url, URL_WRONLY)
To finally enter a loop where i encode the video, the problem is that the stream generated by this program is encoded in H.263-2000 (or H.263++) which the other side cannot undertand, even though i use CODEC_ID_H263 or CODEC_ID_H263P in the initialization the same thing happens.
Is it possible to encode in those old H.263 versions using libav? i havent managed to do it not even using ffmpeg commands. The stream is always h.263-2000 (PT=96)

Saving H.264 RTP stream without re-encoding?

My C++ application receives a H.264 RTP video stream.
Right now it decodes the stream, saves it into a YUV file and later I use ffmpeg to re-ecode the file into something suitable to watch on a Windows PC (eg. Mpeg4 AVI).
Shouldn't it be possible to save the H.264 stream into a AVI (or similar) container without having to decode and re-encode it ? That would require some H.264 decoder on the PC to watch, but it should be much more efficient.
How could that be done ? Are there any libraries supporting that ?
using ffmpeg is correct but the answers posted so far dont look right to me.
the correct switch should be:
-vcodec copy
Your program could pipe the rtp itself through ffmpeg - even invoking it using popen3().
It seems that you need to use an intermediate SDP file - I speculate that you can specify a file you created as a named pipe or with tmpfile() which your application writes to - using the file as an intermediary.
The command-line would be something like:
int p[3];
const char* const out_fmt = "avi";
const char* cmd[] = {"ffmpeg","-f",,"-i",temp_sdp_filename,"-vcodec","copy","-f",out_fmt,"-",NULL};
if(-1 == popen3(p,cmd)) ...
// write the rtp that you receive to p[STDIN_FILENO]
// read the avi from p[STDOUT_FILENO]
// read any messages and error text from p[STDERR_FILENO]
I believe that in this circumstance ffmpeg is clever enough to repackage the container (rtp stream vs AVI) without transcoding the video and audio (this is the -vcodec copy switch); therefore, you'd have no loss of quality and it'd be blazingly fast.