Read frame by request with DirectShow - c++

I'm trying to use DirectShow to capture video from a webcam. I plan to use the SampleGrabber class. So far I can see that DirectShow only reads frames continuously at some desired fps. Can DirectShow read frames on request?

A DirectShow pipeline sets up streaming video. Frames stream continuously through the Sample Grabber and its callback, if you set one up. The callback itself adds minimal processing overhead as long as you don't force a format change (forcing the video to be RGB in particular). It is up to you whether to process or skip a frame there.
Grabbing on request then means taking either the last known video frame that was streamed or the next one to pass through the Sample Grabber. This is the typical mode of operation.
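The "last known frame" approach can be sketched in portable C++ as follows. This is a minimal sketch, not DirectShow code: `Frame` and `LatestFrameBuffer` are hypothetical names, and it assumes your Sample Grabber callback copies each sample's bytes and hands them to `onFrame()`.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical frame type: raw bytes plus a start time in 100-ns units.
struct Frame {
    std::vector<std::uint8_t> data;
    std::int64_t timestamp100ns;
};

// Keeps only the most recent frame. The streaming thread calls onFrame()
// for every sample; any other thread calls latest() when it wants one.
class LatestFrameBuffer {
public:
    // Called from the streaming callback. Cheap: one lock and one move.
    void onFrame(Frame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
        hasFrame_ = true;
        ++received_;
    }

    // Copies the last frame seen into `out`. Returns false if no frame
    // has been streamed yet.
    bool latest(Frame& out) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) return false;
        out = latest_;
        return true;
    }

    // Number of frames that have passed through so far.
    std::uint64_t framesReceived() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return received_;
    }

private:
    mutable std::mutex mutex_;
    Frame latest_{};
    bool hasFrame_ = false;
    std::uint64_t received_ = 0;
};
```

The stream keeps running at the device's frame rate. A "request" is just a call to `latest()`, and frames nobody asked for are overwritten.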
Some devices offer the additional feature of taking a still image on request. This is a rarer case, and it is described on MSDN in Capturing an Image From a Still Image Pin:
Some cameras can produce a still image separate from the capture
stream, and often the still image is of higher quality than the images
produced by the capture stream. The camera may have a button that acts
as a hardware trigger, or it may support software triggering. A camera
that supports still images will expose a still image pin, which is pin
category PIN_CATEGORY_STILL.
The recommended way to get still images from the device is to use the
Windows Image Acquisition (WIA) APIs. [...]
To trigger the still pin, use [...]

Related

Live streaming and processing with opencv

I am having a hard time figuring out a seemingly simple problem: my aim is to send a video stream to a server, process it using OpenCV, then send back the processed feed to be displayed.
I am thinking of using Kafka to send and receive the feed since I already have some experience with it. However, this raises a problem: OpenCV processes video streams using the VideoCapture method, which is different from just reading a single image using the Read method.
If I stream my video feed frame by frame, will I be able to process my feed on the server as a video rather than a single image at a time? And when I get back the processed frame, can I display it again as a video?
I am sure I misunderstood some concepts so please let me know if you need further explanations.
Apologies for the late response. I have built a live-streaming project with basic analytics (face detection) using Kafka and OpenCV.
The publisher application uses OpenCV to access live video from a webcam, IP camera, or USB camera. As you mentioned, VideoCapture.read(frame) fetches a continuous stream of frames (images) of the video, each as a Mat. The Mat is then converted into a String (JSON) and published to Kafka.
Consumers can then transform these objects as their requirements dictate (into a BufferedImage for a live-streaming application) or work with the raw form (for a face detection application). This is a desirable design because it is reusable: one publisher application produces data for multiple consumers.
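The "frame to JSON string" step can be sketched like this in C++. It is only an illustration under assumptions: the field names (`rows`, `cols`, `channels`, `data`) and the choice of base64 for the pixel bytes are mine, not necessarily what the project above used, and the pixel buffer is assumed to have been copied out of the Mat already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648 alphabet, '=' padding).
std::string base64Encode(const std::vector<std::uint8_t>& bytes) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((bytes.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Full 3-byte groups become 4 characters each.
    for (; i + 3 <= bytes.size(); i += 3) {
        const std::uint32_t n = (std::uint32_t(bytes[i]) << 16) |
                                (std::uint32_t(bytes[i + 1]) << 8) |
                                std::uint32_t(bytes[i + 2]);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // A trailing group of 1 or 2 bytes is padded with '='.
    const std::size_t rest = bytes.size() - i;
    if (rest == 1) {
        const std::uint32_t n = std::uint32_t(bytes[i]) << 16;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        const std::uint32_t n = (std::uint32_t(bytes[i]) << 16) |
                                (std::uint32_t(bytes[i + 1]) << 8);
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += '=';
    }
    return out;
}

// Builds one message per frame. Field names are hypothetical.
std::string frameToJson(int rows, int cols, int channels,
                        const std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() ==
           static_cast<std::size_t>(rows) * cols * channels);
    return "{\"rows\":" + std::to_string(rows) +
           ",\"cols\":" + std::to_string(cols) +
           ",\"channels\":" + std::to_string(channels) +
           ",\"data\":\"" + base64Encode(pixels) + "\"}";
}
```

Each message is self-describing, so a consumer can rebuild the image without out-of-band information. Note that base64 makes the payload about one third larger than the raw bytes.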

How to use Intel Hardware MJPEG Decoder MFT in MediaFoundation SourceReader for a Windows desktop application?

I'm developing a USB camera streaming desktop application using the MediaFoundation SourceReader technique. The camera supports USB 3.0 and delivers 60 fps at 1080p in the MJPG video format.
I used the software MJPEG Decoder MFT to convert MJPG to YUY2 frames and then converted those to RGB32 frames to draw on the window. With this software decoder I can render only 30 fps on the window instead of 60 fps. I posted a question on this site and got a suggestion to use the Intel Hardware MJPEG Decoder MFT to solve the frame drop issue.
I got error 0xC00D36B5 (MF_E_NOTACCEPTING) when calling the IMFTransform::ProcessInput() method. To solve this error, MSDN suggested using the IMFTransform interface asynchronously. So I used the IMFMediaEventGenerator interface and call GetEvent for every input/output sample. I can successfully process only one input sample; after that, IMFMediaEventGenerator::GetEvent() keeps returning the MF_E_NO_EVENTS_AVAILABLE error (GetEvent() is synchronous).
I have tried to configure an asynchronous callback for the SourceReader as well as for the IMFTransform, but the MFAsyncCallback::Invoke method is never invoked, hence I planned to use the GetEvent method.
Am I missing anything? If so, could someone guide me on how to use the Intel hardware decoder in my project?
The Intel Hardware MJPEG Decoder MFT is an asynchronous MFT, and if you manage it directly, you are responsible for applying the asynchronous model. You seem to be doing this, but you don't provide information that allows nailing the problem down. Yes, you are supposed to use the event model described in the ProcessInput and ProcessOutput sections of the article linked above. Since you already get the first frame, you should debug further to make it work with smooth continuous processing.
When you use APIs like Media Session or Source Reader, Media Foundation itself deals with the MFTs. It is capable of synchronous and asynchronous consumption as appropriate. In that case, however, you don't make IMFTransform calls yourself, and even from your vague description it appears you are doing it the wrong way.
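The shape of that event model can be illustrated with a small portable C++ model. This is not Media Foundation code: `MockAsyncTransform`, `Event`, and `decodeAll` are hypothetical stand-ins. In the real API the events are METransformNeedInput and METransformHaveOutput, the MFT has to be unlocked with the MF_TRANSFORM_ASYNC_UNLOCK attribute, and streaming starts with the MFT_MESSAGE_NOTIFY_START_OF_STREAM message. The model only captures the rule that input is fed in response to a "need input" event and output is collected in response to a "have output" event.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

enum class Event { NeedInput, HaveOutput };

// Hypothetical stand-in for an asynchronous transform.
class MockAsyncTransform {
public:
    // Models start of streaming: the transform asks for its first input.
    void start() { requestInput(); }

    // Non-blocking poll. Returns false when no event is pending.
    bool getEvent(Event& event) {
        if (events_.empty()) return false;
        event = events_.front();
        events_.pop_front();
        return true;
    }

    // Accepts input only if an input request is outstanding; otherwise
    // it refuses (the model's analogue of "not accepting").
    bool processInput(int sample) {
        if (inputRequests_ == 0) return false;
        --inputRequests_;
        outputs_.push_back(sample * 2);  // stand-in for decoding
        events_.push_back(Event::HaveOutput);
        return true;
    }

    // Hands out one finished sample, then asks for more input.
    bool processOutput(int& sample) {
        if (outputs_.empty()) return false;
        sample = outputs_.front();
        outputs_.pop_front();
        requestInput();
        return true;
    }

private:
    void requestInput() {
        ++inputRequests_;
        events_.push_back(Event::NeedInput);
    }

    std::deque<Event> events_;
    std::deque<int> outputs_;
    int inputRequests_ = 0;
};

// Event-driven loop: every event is answered with exactly one call.
std::vector<int> decodeAll(MockAsyncTransform& mft,
                           const std::vector<int>& inputs) {
    std::vector<int> decoded;
    std::size_t next = 0;
    mft.start();
    Event event = Event::NeedInput;
    while (mft.getEvent(event)) {
        if (event == Event::NeedInput) {
            if (next == inputs.size()) break;  // nothing left to feed
            const bool accepted = mft.processInput(inputs[next++]);
            assert(accepted);
            (void)accepted;
        } else {
            int sample = 0;
            const bool produced = mft.processOutput(sample);
            assert(produced);
            (void)produced;
            decoded.push_back(sample);
        }
    }
    return decoded;
}
```

In this model, if the loop stops answering "have output" events, no further "need input" events are generated and the pipeline stalls after the first sample. Whether that is what happens in the question's code cannot be told from the description; it is one thing worth checking while debugging.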

DXGI Desktop Duplication: encoding frames to send them over the network

I'm trying to write an app which will capture a video stream of the screen and send it to a remote client. I've found out that the best way to capture a screen on Windows is to use DXGI Desktop Duplication API (available since Windows 8). Microsoft provides a neat sample which streams duplicated frames to screen. Now, I've been wondering what is the easiest, but still relatively fast way to encode those frames and send them over the network.
The frames come from AcquireNextFrame with a surface that contains the desktop bitmap and metadata which contains dirty and move regions that were updated. From here, I have a couple of options:
Extract a bitmap from a DirectX surface and then use an external library like ffmpeg to encode series of bitmaps to H.264 and send it over RTSP. While straightforward, I fear that this method will be too slow as it isn't taking advantage of any native Windows methods. Converting D3D texture to a ffmpeg-compatible bitmap seems like unnecessary work.
From this answer: convert D3D texture to IMFSample and use MediaFoundation's SinkWriter to encode the frame. I found this tutorial of video encoding, but I haven't yet found a way to immediately get the encoded frame and send it instead of dumping all of them to a video file.
Since I haven't done anything like this before, I'm asking if I'm moving in the right direction. In the end, I want to have a simple, preferably low latency desktop capture video stream, which I can view from a remote device.
Also, I'm wondering if I can make use of the dirty and move regions provided by Desktop Duplication. Instead of encoding the frame, I could send them over the network and do the processing on the client side, but this means that my client has to have DirectX 11.1 or higher available, which is impossible if I want to stream to a mobile platform.
You can use the IMFTransform interface for H.264 encoding. Once you have an IMFSample created from the ID3D11Texture2D, just pass it to IMFTransform::ProcessInput and get the encoded IMFSample from IMFTransform::ProcessOutput.
Refer to this example for encoding details.
Once you have the encoded IMFSamples, you can send them one by one over the network.
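The answer does not say how to delimit samples on the wire. Over a byte stream such as TCP, sample boundaries are not preserved, so one common option is to prefix each sample with its length. The following is a sketch under that assumption: `appendPacket` and `takePacket` are hypothetical helpers, and the payload bytes are assumed to have been copied out of the encoded sample already (for example via IMFSample::ConvertToContiguousBuffer and IMFMediaBuffer::Lock).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends one encoded sample to the outgoing byte stream as
// [4-byte big-endian length][payload].
void appendPacket(std::vector<std::uint8_t>& stream,
                  const std::vector<std::uint8_t>& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    stream.push_back(static_cast<std::uint8_t>((n >> 24) & 0xFF));
    stream.push_back(static_cast<std::uint8_t>((n >> 16) & 0xFF));
    stream.push_back(static_cast<std::uint8_t>((n >> 8) & 0xFF));
    stream.push_back(static_cast<std::uint8_t>(n & 0xFF));
    stream.insert(stream.end(), payload.begin(), payload.end());
}

// Removes the next complete packet from the front of the received byte
// stream. Returns false (and leaves the stream untouched) if the
// header or the payload has not fully arrived yet.
bool takePacket(std::vector<std::uint8_t>& stream,
                std::vector<std::uint8_t>& payload) {
    if (stream.size() < 4) return false;
    const std::uint32_t n = (std::uint32_t(stream[0]) << 24) |
                            (std::uint32_t(stream[1]) << 16) |
                            (std::uint32_t(stream[2]) << 8) |
                            std::uint32_t(stream[3]);
    if (stream.size() - 4 < n) return false;
    const std::size_t total = 4 + static_cast<std::size_t>(n);
    payload.assign(stream.begin() + 4, stream.begin() + total);
    stream.erase(stream.begin(), stream.begin() + total);
    return true;
}
```

The sender calls `appendPacket` for each encoded sample and writes the bytes to the socket. The receiver appends whatever it reads to its own buffer and calls `takePacket` in a loop until it returns false. A real protocol such as RTP would also carry timestamps; this sketch only preserves sample boundaries.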

Using Async_reader and Wave Parser in DirectShow filter graph results in video seeking issues

Some background:
I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps, each of which can last for a long time (for example 30 seconds), to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
The problem:
When I attempt to seek through the resulting video, I have to wait until a bitmap's start time occurs before it is displayed. For example, if I'm currently seeing bitmap 4 and skip back to the time at which bitmap 2 is displayed, the video output will not change until the third bitmap starts. Initially I wondered if I wasn't allowing FillBuffer to be called enough (at the moment it's called only once per bitmap). However, I have since noted that when the audio track is very short (just a second long perhaps), I can seek through the video as expected. Is there another way I should be introducing audio into the filter graph? Do I need to perform some kind of indexing when the WMV has been rendered? I'm at a bit of a loss...
You may need to do indexing as a post-processing step. Try indexing the file with Windows Media File Editor from the Windows Media Encoder SDK and see if this improves seeking.
Reducing the key frame interval in the encoder profile may also improve seeking. This can be done in Windows Media Profile Editor from the SDK. Note that this will increase the file size.

DirectShow filter graph using WMASFWriter creates video which is too short

I am attempting to create a DirectShow source filter based on the pushsource example from the DirectShow SDK. This essentially outputs a set of bitmaps to a video. I have set up a filter graph which uses Async_reader with a Wave Parser for audio and my new filter to push the video (the filter is a CSourceStream and I populate my frames in the FillBuffer function). These are both connected to a WMASFWriter to output a WMV.
Each bitmap can last for several seconds so in the FillBuffer function I'm calling SetTime on the passed IMediaSample with a start and end time several seconds apart. This works fine when rendering to the screen but writing to disk results in a file which is too short in duration. It seems like the last bitmap is being ignored when writing a WMV (it is shown as the video ends rather than lasting for the intended duration). This is the case both with my filter and a modified pushsource filter (in which the frame length has been increased).
I've seen additional odd behaviour: at one point, whilst I was trying to make this work, it was not possible to produce a video whose length wasn't a multiple of 10 seconds. I'm not sure what caused this, but I thought I'd mention it in case it's relevant.
I think the end time is simply ignored. Normally video samples only have a start time because each one is a point in time. If there is movement in the video, the movement appears fluid even though the frames are just points in time.
I think the solution is simple. Because the video stays the same until the next frame is received, you can just add a dummy frame at the end of your video. You can simply repeat the previous frame.
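The timing of that dummy frame can be sketched as follows. This is a portable illustration, not filter code: `TimedFrame` and `buildSchedule` are hypothetical names, times are in 100-nanosecond units as in DirectShow's REFERENCE_TIME, and the premise that the writer honours only start times is the guess stated above, not a confirmed fact.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 100-ns units per second (the unit used by REFERENCE_TIME).
const std::int64_t kUnitsPerSecond = 10000000;

// Hypothetical description of one sample to deliver from FillBuffer.
struct TimedFrame {
    int bitmapIndex;     // which bitmap to copy into the sample
    std::int64_t start;  // start time in 100-ns units
    std::int64_t end;    // end time in 100-ns units
};

// Lays the bitmaps out back to back, then repeats the last bitmap as a
// short dummy frame that starts where the last real frame ends. If only
// start times are honoured, the dummy's start time marks the intended
// end of the last real frame.
std::vector<TimedFrame> buildSchedule(
        const std::vector<std::int64_t>& durations,
        std::int64_t dummyDuration) {
    std::vector<TimedFrame> schedule;
    std::int64_t clock = 0;
    for (std::size_t i = 0; i < durations.size(); ++i) {
        assert(durations[i] > 0);
        const TimedFrame frame = {static_cast<int>(i), clock,
                                  clock + durations[i]};
        schedule.push_back(frame);
        clock += durations[i];
    }
    if (!schedule.empty()) {
        const TimedFrame dummy = {schedule.back().bitmapIndex, clock,
                                  clock + dummyDuration};
        schedule.push_back(dummy);
    }
    return schedule;
}
```

For two bitmaps lasting 5 and 3 seconds, the schedule holds three entries: bitmap 0 at 0 s, bitmap 1 at 5 s, and bitmap 1 again at 8 s. The final entry adds only `dummyDuration` to the total length, so a single frame period (for example `kUnitsPerSecond / 30`) keeps the overshoot small.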