I am attempting to write a GStreamer wrapper around a video analytics library. The input is video frames and the output is a metadata object containing the binary representation of the outlines of the objects in the video. This will be used downstream to further inspect the metadata.
Is the GstBaseTransform the correct parent class for this conversion? Or should I be using some GstVideo* base class? Like GstVideoFilterClass?
Should the plugin type be Converter/Video/Metadata?
It seems that the GstBaseTransform is more set up for filters. Should I just derive from GstElement? I can't really find an example of this in any prior plugins.
The node graph will sort of be like the following:
           video                    video                          video
VideoSrc --------- tee ------------------------------- Annotation -------- Stream
                    |                                     |
                    | video                      metadata |
                    |              metadata               |   metadata
                    `--- Analytics -------- Processing ---'------------- Cloud
I used the GstVideoFilterClass and the GstMeta API to implement this rather than split up the streams.
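For reference, a minimal sketch of the shape this takes: a GstVideoFilter subclass whose transform_frame_ip() runs the analytics and attaches the result to the buffer as a custom GstMeta. The run_analytics() and analytics_meta_add() helpers and the MyAnalytics type are illustrative placeholders, not real API.

#include <gst/video/video.h>
#include <gst/video/gstvideofilter.h>

/* Sketch only: run_analytics() and analytics_meta_add() are hypothetical
 * helpers standing in for the analytic library and the GstMeta implementation. */
static GstFlowReturn
my_analytics_transform_frame_ip (GstVideoFilter *filter, GstVideoFrame *frame)
{
  /* Run the analytic library on the raw frame data (hypothetical call). */
  GBytes *outlines = run_analytics (GST_VIDEO_FRAME_PLANE_DATA (frame, 0),
                                    GST_VIDEO_FRAME_WIDTH (frame),
                                    GST_VIDEO_FRAME_HEIGHT (frame));

  /* Attach the result to the buffer as a custom meta; downstream elements
   * that know the meta API can read it, everything else passes it through. */
  analytics_meta_add (frame->buffer, outlines);

  return GST_FLOW_OK;  /* the video data itself is left untouched */
}

static void
my_analytics_class_init (MyAnalyticsClass *klass)  /* type boilerplate omitted */
{
  GstVideoFilterClass *vf_class = GST_VIDEO_FILTER_CLASS (klass);

  vf_class->transform_frame_ip = my_analytics_transform_frame_ip;
}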
I'm currently building a small video player (and cutter) using GStreamer and Qt.
My pipeline is as follows:

|              | -> video  -> | Queue | ----------------------------------------------------> | PlaySink |
| UriDecodeBin | -> audio1 -> | Queue | -> | AudioConvert | -> | Volume | -> | AudioMixer | -> |          |
|              | -> audio2 -> | Queue | -> | AudioConvert | -> | Volume | -> |            |    |          |
Volume is the plugin from https://gstreamer.freedesktop.org/documentation/volume/index.html?gi-language=c
Playback is fine, and so are pausing and seeking, but when I try to change the volume (while playing a video) using the following call:
g_object_set(_volumes[track], "volume", value, NULL);
The change can be heard only around 1 second later, which feels extremely slow.
Is this latency to be expected for this plugin (and/or the whole pipeline)?
If it isn't, how can I improve the latency of the change?
If it is, is there any other plugin I can use to change the volume that would react faster?
The answer came from printing the full pipeline: the delay was caused by the playsink element.
The old pipeline I used in the question looks like this:
We can see here that playsink created two queues, one for audio (aqueue) and one for video (vqueue), and that those queues use the default buffering settings, allowing up to one second of buffering, which corresponds more or less to the delay I was experiencing when changing the volume property of the volume elements.
To solve the problem, I first looked into configuring the queue sizes for playsink, but that was unsuccessful, so I simply removed playsink from the pipeline, which now looks like this:
The audio queue audioQueue is set up to allow at most 50 ms of buffering, which makes the audio volume change quite responsive.
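For reference, this is roughly how the queue is capped and the volume changed (a minimal sketch; the audioQueue and volume pointers are the elements from my pipeline):

// Cap the audio queue to ~50 ms of buffered data so that volume changes
// become audible almost immediately.
g_object_set(audioQueue,
             "max-size-time",    (guint64)(50 * GST_MSECOND), // 50 ms of data
             "max-size-buffers", 0,                           // disable buffer-count limit
             "max-size-bytes",   0,                           // disable byte limit
             NULL);

// Volume change on a track's volume element, now heard within ~50 ms
// instead of ~1 s with playsink's default one-second queues.
g_object_set(volume, "volume", 0.5, NULL);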
I didn't add a StreamSynchronizer like playsink uses. Synchronization seems to be fine. I'll try to figure out whether it is needed in my case (single pipeline), and will update here if I find an answer.
Using the Source Reader I can get decoded YUV samples from an mp4 file source (example code).
How can I do the opposite with a webcam source, i.e. use the Source Reader to provide encoded H264 samples? My webcam supports RGB24 and I420 pixel formats, and I can get H264 samples if I manually wire up the H264 MFT transform. But it seems as if the Source Reader should be able to take care of the transform for me. I get an error whenever I attempt to set an MF_MT_SUBTYPE of MFVideoFormat_H264 on the Source Reader.
A sample snippet is shown below and the full example is here.
// Get the first available webcam.
CHECK_HR(MFCreateAttributes(&videoConfig, 1), "Error creating video configuration.");

// Request video capture devices.
CHECK_HR(videoConfig->SetGUID(
    MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE,
    MF_DEVSOURCE_ATTRIBUTE_SOURCE_TYPE_VIDCAP_GUID), "Error initialising video configuration object.");

CHECK_HR(videoConfig->SetGUID(MF_MT_SUBTYPE, WMMEDIASUBTYPE_I420),
    "Failed to set video sub type to I420.");

CHECK_HR(MFEnumDeviceSources(videoConfig, &videoDevices, &videoDeviceCount), "Error enumerating video devices.");

CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->GetAllocatedString(MF_DEVSOURCE_ATTRIBUTE_FRIENDLY_NAME, &webcamFriendlyName, &nameLength),
    "Error retrieving video device friendly name.\n");

wprintf(L"First available webcam: %s\n", webcamFriendlyName);

CHECK_HR(videoDevices[WEBCAM_DEVICE_INDEX]->ActivateObject(IID_PPV_ARGS(&pVideoSource)),
    "Error activating video device.");

CHECK_HR(MFCreateAttributes(&pAttributes, 1),
    "Failed to create attributes.");

// Adding this attribute creates a video source reader that will handle
// colour conversion and avoid the need to manually convert between RGB24 and RGB32 etc.
CHECK_HR(pAttributes->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, 1),
    "Failed to set enable video processing attribute.");

CHECK_HR(pAttributes->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");

// Create a source reader.
CHECK_HR(MFCreateSourceReaderFromMediaSource(
    pVideoSource,
    pAttributes,
    &pVideoReader), "Error creating video source reader.");

MFCreateMediaType(&pSrcOutMediaType);
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video), "Failed to set major video type.");
CHECK_HR(pSrcOutMediaType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264), "Error setting video sub type.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_AVG_BITRATE, 240000), "Error setting average bit rate.");
CHECK_HR(pSrcOutMediaType->SetUINT32(MF_MT_INTERLACE_MODE, 2), "Error setting interlace mode.");

CHECK_HR(pVideoReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pSrcOutMediaType),
    "Failed to set media type on source reader.");

CHECK_HR(pVideoReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pFirstOutputType),
    "Error retrieving current media type from first video stream.");

std::cout << "Source reader output media type: " << GetMediaTypeDescription(pFirstOutputType) << std::endl << std::endl;
Output:
bind returned success
First available webcam: Logitech QuickCam Pro 9000
Failed to set media type on source reader. Error: C00D5212.
finished.
The Source Reader does not look like the suitable API here. It is an API that implements "half of the pipeline", the half which includes the necessary decoding but not encoding. The other half is the Sink Writer API, which is capable of handling encoding and can encode H.264.
Another option, unless you are developing a UWP project, is the Media Session API, which implements a pipeline end to end.
Even though technically (in theory) you could have an encoding MFT as a part of the Source Reader pipeline, the Source Reader API itself is not flexible enough to add encoding-style transforms based on the requested media types.
So one solution could be to have the Source Reader read with the necessary decoding (for example, up to RGB32 or NV12 video frames), then a Sink Writer manage the encoding, with an appropriate media sink on its end (or a Sample Grabber as the media sink). Another solution is to put the Media Foundation primitives into a Media Session pipeline, which can manage both the decoding and encoding parts, connected together.
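To give an idea of the first option, here is a rough sketch, not a drop-in implementation: the Source Reader is asked for NV12 frames and a Sink Writer owns the H.264 encoder. It reuses pVideoSource and pAttributes from the question's code; the capture.mp4 output name is only illustrative, and COM/MF startup and error handling are omitted.

#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <atlbase.h>

CComPtr<IMFSourceReader> pReader;
CComPtr<IMFSinkWriter> pWriter;
MFCreateSourceReaderFromMediaSource(pVideoSource, pAttributes, &pReader);

// Ask the reader for uncompressed NV12; it inserts any needed converters.
CComPtr<IMFMediaType> pReaderType;
MFCreateMediaType(&pReaderType);
pReaderType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
pReaderType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12);
pReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, NULL, pReaderType);

CComPtr<IMFMediaType> pActualType; // full media type negotiated by the reader
pReader->GetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, &pActualType);

// The Sink Writer gets an H.264 output type and the reader's NV12 type as
// input; it loads and configures the encoder MFT internally.
MFCreateSinkWriterFromURL(L"capture.mp4", NULL, NULL, &pWriter);

CComPtr<IMFMediaType> pOutType;
MFCreateMediaType(&pOutType);
pOutType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
pOutType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
pOutType->SetUINT32(MF_MT_AVG_BITRATE, 240000);
pOutType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);

// Frame size and rate must match what the reader actually delivers.
UINT32 w = 0, h = 0, num = 0, den = 0;
MFGetAttributeSize(pActualType, MF_MT_FRAME_SIZE, &w, &h);
MFGetAttributeRatio(pActualType, MF_MT_FRAME_RATE, &num, &den);
MFSetAttributeSize(pOutType, MF_MT_FRAME_SIZE, w, h);
MFSetAttributeRatio(pOutType, MF_MT_FRAME_RATE, num, den);

DWORD streamIndex = 0;
pWriter->AddStream(pOutType, &streamIndex);
pWriter->SetInputMediaType(streamIndex, pActualType, NULL);
pWriter->BeginWriting();

// Pump: read decoded samples from the reader and hand them to the writer,
// which runs them through the H.264 encoder. For a live webcam source you
// would stop on your own condition rather than end of stream.
for (;;)
{
    DWORD flags = 0;
    LONGLONG timestamp = 0;
    CComPtr<IMFSample> pSample;
    pReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0, NULL, &flags, &timestamp, &pSample);
    if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
        break;
    if (pSample)
        pWriter->WriteSample(streamIndex, pSample);
}
pWriter->Finalize();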
Now, your use case is clearer.
To me, your MFWebCamRtp is the most streamlined way of doing: WebCam Source Reader -> Encoding -> RTP Streaming.
But you are experiencing presentation clock issues, synchronization issues, or unsynchronized audio/video issues. Am I right?
So you tried the Sample Grabber Sink, and now the Source Reader, like I suggested. Of course, you might think that a Media Session would be able to do it better.
I think so, but extra work will be needed.
Here is what I would do in your case (a rough sketch follows the list):
Code a custom RTP sink
Create a topology with the webcam source, an H.264 encoder, and your custom RTP sink
Add your topology to a MediaSession
Use the MediaSession to run it
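Something like the following, roughly (error handling and MF startup omitted; pPresentationDescriptor, pStreamDescriptor, pEncoderActivate and pRtpStreamSink are placeholders for your webcam source's descriptors, the H.264 encoder activation object, and your custom sink's stream sink):

#include <mfapi.h>
#include <mfidl.h>
#include <atlbase.h>

CComPtr<IMFTopology> pTopology;
MFCreateTopology(&pTopology);

// Source node for the webcam's video stream.
CComPtr<IMFTopologyNode> pSourceNode;
MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &pSourceNode);
pSourceNode->SetUnknown(MF_TOPONODE_SOURCE, pVideoSource);
pSourceNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pPresentationDescriptor);
pSourceNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, pStreamDescriptor);
pTopology->AddNode(pSourceNode);

// Transform node hosting the H.264 encoder MFT.
CComPtr<IMFTopologyNode> pEncoderNode;
MFCreateTopologyNode(MF_TOPOLOGY_TRANSFORM_NODE, &pEncoderNode);
pEncoderNode->SetObject(pEncoderActivate);
pTopology->AddNode(pEncoderNode);

// Output node wrapping the stream sink of the custom RTP sink.
CComPtr<IMFTopologyNode> pOutputNode;
MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &pOutputNode);
pOutputNode->SetObject(pRtpStreamSink);
pTopology->AddNode(pOutputNode);

// Wire source -> encoder -> sink and hand the topology to the session.
pSourceNode->ConnectOutput(0, pEncoderNode, 0);
pEncoderNode->ConnectOutput(0, pOutputNode, 0);

CComPtr<IMFMediaSession> pSession;
MFCreateMediaSession(NULL, &pSession);
pSession->SetTopology(0, pTopology);

PROPVARIANT varStart;
PropVariantInit(&varStart);
pSession->Start(&GUID_NULL, &varStart); // the session resolves the topology and starts streaming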
If you want a network stream sink sample, see this: MFSkJpegHttpStreamer
This is old, but it's a good start. This program also uses Winsock, like yours.
You should be aware that the RTP protocol uses UDP, which is a very good way to end up with synchronization issues... Definitely your main problem, I guess.
Here is what I think: you are trying to compensate for the weaknesses of the RTP protocol (UDP) with Media Foundation's audio/video synchronization management. I think you will just fail with this approach.
I think your main problem is the RTP protocol.
EDIT
No, I'm not having synchronisation issues. The Source Reader and Sample Grabber both provide correct timestamps which I can use in the RTP header. Likewise, no problems with RTP/UDP etc.; that's the bit I do know about. My questions originate from a desire to understand the most efficient (least amount of plumbing code) and most flexible solution. And yes, it does look like a custom sink writer is the optimal solution.
Again things are clearer. If you need help with a custom RTP sink, I'll be there.
I am writing because I could not find the answer in previous topics. I am using live555 to stream live video (H.264) and audio (G.723), which are being recorded by a web camera. The video part is already done and it works perfectly, but I have no clue about the audio task.
From what I have read, I have to create a ServerMediaSession to which I should add two subsessions: one for the video and one for the audio. For the video part I created a subclass of OnDemandServerMediaSubsession, a subclass of FramedSource, and the encoder class, but for the audio part I do not know which classes I should base the implementation on.
The web camera records and delivers audio frames in G.723 format separately from the video. I would say the audio is raw, because when I try to play it in VLC it says that it could not find any start code; so I suppose it is the raw audio stream that is recorded by the web cam.
I was wondering if someone could give me a hint.
For an audio stream, your override of OnDemandServerMediaSubsession::createNewRTPSink should create a SimpleRTPSink.
Something like:
RTPSink* YourAudioMediaSubsession::createNewRTPSink(Groupsock* rtpGroupsock,
                                                    unsigned char rtpPayloadTypeIfDynamic,
                                                    FramedSource* inputSource)
{
    return SimpleRTPSink::createNew(envir(), rtpGroupsock,
                                    4,          // static RTP payload type for G.723
                                    frequency,  // RTP timestamp frequency
                                    "audio",    // SDP media type
                                    "G723",     // RTP payload format name
                                    channels);  // number of channels
}
The frequency and the number of channels should come from the inputSource.
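For example, if your G.723 FramedSource subclass exposes its capture parameters (the class and accessor names below are hypothetical, not live555 API), it could look like this:

RTPSink* YourAudioMediaSubsession::createNewRTPSink(Groupsock* rtpGroupsock,
                                                    unsigned char /*rtpPayloadTypeIfDynamic*/,
                                                    FramedSource* inputSource)
{
    // Hypothetical custom source class that knows its own capture parameters.
    YourG723AudioSource* audioSource = (YourG723AudioSource*)inputSource;
    unsigned frequency = audioSource->samplingFrequency(); // 8000 Hz for G.723
    unsigned channels  = audioSource->numChannels();       // typically 1

    return SimpleRTPSink::createNew(envir(), rtpGroupsock,
                                    4, frequency, "audio", "G723", channels);
}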
Hi, I am using Wowza Streaming Engine 4 to stream a SMIL file. I am able to trace events when the file plays on Flash, and gather information such as which file is played, the time, etc., in the onConnect() event. Precisely, I want to get which file is played from my SMIL file.
But in the case of Apple HLS streaming, when I try to get the file name in the onHTTPSessionDestroy() method, e.g.
public void onHTTPSessionDestroy(IHTTPStreamerSession httpSession) {
    String streamName = httpSession.getStreamName();
}
I only get the name of the SMIL file, not the actual file played.
Is it possible to get the played file info in Wowza HLS streaming?
A new API was introduced in Wowza 4.1.1:
public void onHTTPStreamerRequest(IHTTPStreamerSession httpSession, IHTTPStreamerRequestContext reqContext)
In it we can check which bitrate rendition is being played, e.g. media_w1577403587_b4000000_1.ts,
where the _b part is the bitrate.
I have developed a DirectShow C++ app which successfully previews the web cam view in a provided window. Now I want to capture an image from this live web cam preview. I have used the Filter Graph Manager, ICaptureGraphBuilder2, IMoniker, etc. for that.
I have searched and found following options:
WIA & Sample Grabber.
Many recommend using the SampleGrabber, but as per Microsoft's MSDN documentation the SampleGrabber is deprecated and one should not use it. And I don't want to use the WIA API.
So which is the best DirectShow way to capture image from live web cam preview?
Here is a quote from the DxSnap sample from the DirectShow.NET library:
Use DirectShow to take snapshots from the Still pin of a capture
device. Note that MS encourages you to use WIA for this, but if you
want to do it with DirectShow and C#, here's how.
Note that this sample will only work with devices that output
uncompressed video as RGB24. This will include most webcams, but
probably zero TV tuners.
This is C# code, but you should get the idea as the interfaces are all the same. And there are other samples on how to use Sample Grabber Filter in C++.
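For illustration, a bare-bones C++ fragment along those lines using the stock Sample Grabber (qedit.h); it assumes the graph (pGraph) and capture setup from your existing code and leaves out error handling:

#include <dshow.h>
#include <qedit.h>     // ISampleGrabber / CLSID_SampleGrabber (older SDK headers)
#include <atlbase.h>
#include <vector>

CComPtr<IBaseFilter> pGrabberFilter;
pGrabberFilter.CoCreateInstance(CLSID_SampleGrabber);
pGraph->AddFilter(pGrabberFilter, L"Sample Grabber");

CComPtr<ISampleGrabber> pGrabber;
pGrabberFilter->QueryInterface(IID_PPV_ARGS(&pGrabber));

// Force an uncompressed RGB24 connection so the grabbed buffer is a plain DIB.
AM_MEDIA_TYPE mt = {};
mt.majortype = MEDIATYPE_Video;
mt.subtype = MEDIASUBTYPE_RGB24;
pGrabber->SetMediaType(&mt);
pGrabber->SetBufferSamples(TRUE); // keep a copy of the most recent sample

// Connect the capture/preview path through the grabber (for example with
// ICaptureGraphBuilder2::RenderStream), run the graph, then pull a frame:
long cbBuffer = 0;
pGrabber->GetCurrentBuffer(&cbBuffer, NULL); // first call queries the required size
std::vector<BYTE> frame(cbBuffer);
pGrabber->GetCurrentBuffer(&cbBuffer, reinterpret_cast<long*>(frame.data()));
// 'frame' now holds the RGB24 pixels of the latest frame seen by the grabber.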
The Sample Grabber is deprecated and the headers are removed from a couple of the latest SDKs, however the runtime components are all there and are going to be there for a long time, or otherwise a multitude of applications would be broken (e.g. the video chat in browser-hosted GMail uses the Sample Grabber). So basically the Sample Grabber is still an easy way to capture snapshots from a web camera. If you alternatively prefer to follow the latest MS APIs, you would want to look into Media Foundation (09 Jul 2016 update: new Windows Server installations might need the "Media Foundation" and/or "Desktop Experience" features added to make the Media Foundation API available along with DirectShow and DirectShow Editing Services, of which the Sample Grabber is a part; the default installation does not offer qedit.dll out of the box).
Also, in C++ you certainly don't have to use the Sample Grabber filter. You can develop a custom filter using the DirectShow BaseClasses, either a custom transformation filter or a custom renderer, which accepts the incoming video feed and exports the frames from the DirectShow pipeline. Another option is to use the Sample Grabber sample source code from one of the older SDKs (which is not the exact source of the OS Sample Grabber, but it does the same thing). The point, however, is that the Sample Grabber shipped with Windows is still a good option.
Listed on Microsoft's website is an example of how to capture a frame using IVMRWindowlessControl9::GetCurrentImage ... Here's one way of doing it:
IBaseFilter* vmr9ptr; // I'm assuming that you got this pointer already
IVMRWindowlessControl9* controlPtr = NULL;
vmr9ptr->QueryInterface(IID_IVMRWindowlessControl9, (void**)&controlPtr);
assert(controlPtr != NULL);

// Get the current frame
BYTE* lpDib = NULL;
HRESULT hr = controlPtr->GetCurrentImage(&lpDib);

// If everything is okay, we can create a BMP
if (SUCCEEDED(hr))
{
    BITMAPINFOHEADER* pBMIH = (BITMAPINFOHEADER*)lpDib;
    DWORD bufSize = pBMIH->biSizeImage;

    // Let's create a bmp
    BITMAPFILEHEADER bmpHdr;
    BITMAPINFOHEADER bmpInfo;
    size_t hdrSize = sizeof(bmpHdr);
    size_t infSize = sizeof(bmpInfo);

    memset(&bmpHdr, 0, hdrSize);
    bmpHdr.bfType = ('M' << 8) | 'B';
    bmpHdr.bfOffBits = static_cast<DWORD>(hdrSize + infSize);
    bmpHdr.bfSize = bmpHdr.bfOffBits + bufSize;

    // Build the bitmap info.
    memset(&bmpInfo, 0, infSize);
    bmpInfo.biSize = static_cast<DWORD>(infSize);
    bmpInfo.biWidth = pBMIH->biWidth;
    bmpInfo.biHeight = pBMIH->biHeight;
    bmpInfo.biPlanes = pBMIH->biPlanes;
    bmpInfo.biBitCount = pBMIH->biBitCount;

    // boost::shared_arrays are awesome!
    boost::shared_array<BYTE> buf(new BYTE[bmpHdr.bfSize]); //(lpDib);
    memcpy(buf.get(), &bmpHdr, hdrSize);            // copy the header
    memcpy(buf.get() + hdrSize, &bmpInfo, infSize); // now copy the info block
    memcpy(buf.get() + bmpHdr.bfOffBits, lpDib, bufSize);

    // Do something with your image data ... seriously...
    CoTaskMemFree(lpDib);
} // All done!
Jeez... so much disinformation. If you're previewing in a DirectShow graph, then it depends on what you're previewing off of. Capture filters have 1, 2, or 3 pins. If a filter has 1 pin, it's most likely a "capture" pin (no preview pin). In that case, if you want to capture and preview at the same time, you should put in a "Smart Tee" filter, connect the VMR off of its preview pin, and hook up "something that grabs frames" off its capture pin, since you don't want to fool around with DirectShow's crummy pin start/stop stuff (instead, just control the entire graph's start/stop state). You don't need to use a SampleGrabber; it's a dead-simple filter and you could write it in a few hours (I should know, I'm the one who wrote it). It's simply a CTransInPlace filter that you can set a forced media type on, and you can set a callback interface on it to call you back when it receives a sample. It's actually simpler to write a NullRenderer which calls you back when it receives a sample; you could write this quite easily.
If the capture filter has 2 pins, it's most likely a capture pin and a still pin. In this case you still need a Smart Tee connected to the source's capture pin, and you need to preview off the Smart Tee's preview pin and capture samples off the Smart Tee's capture pin.
(If you don't know what a Smart Tee is, it's a filter that plays allocator tricks and only sends a sample down the preview pin if the capture pin isn't super bogged down. Its job is to provide a path for the VMR to render from which won't botch up the allocators between the capture filter and the filters downstream of the capture filter.)
If the capture filter has both capture and preview pins, I think you can figure out what you need to do then.
Anyhow, summary: the SampleGrabber is simply a CTransInPlaceFilter. You could write it as a Null Renderer too; just make sure to fill out some junk in CheckInputType, and to call your callback in DoRenderSample.
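For illustration, a minimal sketch of that null-renderer approach using the DirectShow base classes' CBaseRenderer (the callback type and the GUID_NULL placeholder CLSID are illustrative, not a finished filter):

#include <streams.h>   // DirectShow base classes

typedef void (*FrameCallback)(BYTE* pData, long length);

class CFrameGrabRenderer : public CBaseRenderer
{
public:
    CFrameGrabRenderer(FrameCallback callback, HRESULT* phr)
        : CBaseRenderer(GUID_NULL /* placeholder CLSID */, NAME("Frame Grab Renderer"), NULL, phr)
        , m_callback(callback)
    {
    }

    // Only accept uncompressed RGB24 video (the "forced media type" idea).
    HRESULT CheckMediaType(const CMediaType* pmt)
    {
        if (*pmt->Type() != MEDIATYPE_Video || *pmt->Subtype() != MEDIASUBTYPE_RGB24)
            return E_FAIL;
        return S_OK;
    }

    // Called for every sample that reaches the renderer: hand it to the callback.
    HRESULT DoRenderSample(IMediaSample* pSample)
    {
        BYTE* pData = NULL;
        if (SUCCEEDED(pSample->GetPointer(&pData)) && m_callback != NULL)
            m_callback(pData, pSample->GetActualDataLength());
        return S_OK;
    }

private:
    FrameCallback m_callback;
};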