I am having a hard time figuring out a seemingly simple problem: my aim is to send a video stream to a server, process it using OpenCV, then send the processed feed back to be displayed.
I am thinking of using Kafka to send and receive the feed, since I already have some experience with it. However, this raises a problem: OpenCV processes video streams through the VideoCapture class, which is different from just reading a single image with the read method.
If I stream my video feed frame by frame, will I be able to process the feed on the server as a video rather than a single image at a time? And when I get back the processed frames, can I display them again as a video?
I am sure I misunderstood some concepts so please let me know if you need further explanations.
Apologies for the late response. I have built a live-streaming project with basic analytics (face detection) using Kafka and OpenCV.
The publisher application uses OpenCV to access the live video from a webcam / IP camera / USB camera. As you mentioned, VideoCapture.read(frame) fetches the video as a continuous stream of frames, each as a Mat. Each Mat is then converted into a String (JSON) and published to Kafka.
You can then transform these objects as required (into a BufferedImage for a live-streaming application) or work with the raw form (for a face-detection application). This is the desired solution, as it exhibits reusability: a single publisher application produces data for multiple consumers.
Related
I'm trying to write an app which will capture a video stream of the screen and send it to a remote client. I've found that the best way to capture the screen on Windows is to use the DXGI Desktop Duplication API (available since Windows 8). Microsoft provides a neat sample which streams duplicated frames to screen. Now I'm wondering what the easiest, but still relatively fast, way is to encode those frames and send them over the network.
The frames come from AcquireNextFrame with a surface that contains the desktop bitmap and metadata which contains dirty and move regions that were updated. From here, I have a couple of options:
Extract a bitmap from the DirectX surface and then use an external library like ffmpeg to encode the series of bitmaps to H.264 and send it over RTSP. While straightforward, I fear that this method will be too slow, as it isn't taking advantage of any native Windows methods. Converting a D3D texture to an ffmpeg-compatible bitmap seems like unnecessary work.
From this answer: convert the D3D texture to an IMFSample and use Media Foundation's SinkWriter to encode the frame. I found this tutorial on video encoding, but I haven't yet found a way to immediately get each encoded frame and send it, instead of dumping all of them to a video file.
Since I haven't done anything like this before, I'm asking if I'm moving in the right direction. In the end, I want to have a simple, preferably low latency desktop capture video stream, which I can view from a remote device.
Also, I'm wondering if I can make use of dirty and move regions provided by Desktop Duplication. Instead of encoding the frame, I can send them over the network and do the processing on the client side, but this means that my client has to have DirectX 11.1 or higher available, which is impossible if I would want to stream to a mobile platform.
You can use IMFTransform interface for H264 encoding. Once you get IMFSample from ID3D11Texture2D just pass it to IMFTransform::ProcessInput and get the encoded IMFSample from IMFTransform::ProcessOutput.
Refer to this example for encoding details.
Once you get the encoded IMFSamples you can send them one by one over the network.
I am currently working on a project of real-time camera video transmission. I have done the camera capture using OpenCV and the real-time encoding/decoding using FFmpeg.
Now my problem is that, at the decoder side, I want the decoder to keep playing the decoded video while continuing to receive encoded packets at the same time. My supervisor told me I need to use threading to realize this function.
Any idea how should I do this or any example program?
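What the supervisor suggests maps to the classic producer/consumer pattern: one thread keeps receiving and decoding packets, pushing decoded frames into a bounded queue, while a second thread pulls frames off the queue and displays them. A minimal Python sketch of the pattern (frames are simulated as strings; in the real program the producer would call your FFmpeg decode and the consumer would display each frame, e.g. with cv2.imshow):

```python
import queue
import threading

frames = queue.Queue(maxsize=32)   # bounded: the decoder blocks if display falls behind
SENTINEL = None                    # signals end of stream to the display thread

def decode_thread():
    # Stand-in for the receive + decode loop; each item would be a decoded frame.
    for i in range(100):
        frames.put(f"frame-{i}")
    frames.put(SENTINEL)

def display_thread(shown):
    # Stand-in for the playback loop; a real consumer would render each frame here.
    while True:
        frame = frames.get()
        if frame is SENTINEL:
            break
        shown.append(frame)

shown = []
t1 = threading.Thread(target=decode_thread)
t2 = threading.Thread(target=display_thread, args=(shown,))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(shown))  # 100
```

The bounded queue is the important design choice: it decouples decode rate from display rate without letting memory grow unbounded when one side stalls.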
I'm trying to use DirectShow to capture video from a webcam. I assume I should use the SampleGrabber class. For now, I see that DirectShow can only read frames continuously at some desired fps. Can DirectShow read frames on request?
A DirectShow pipeline sets up streaming video. Frames will continuously stream through the Sample Grabber and its callback, if you set one up. The callback itself adds minimal processing overhead as long as you don't force a format change (forcing the video to be RGB in particular). It is up to you whether to process or skip a frame there.
On-request grabbing takes either the last known video frame streamed, or the next one to go through the Sample Grabber. This is the typical mode of operation.
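The "last known frame streamed" behavior described above can be emulated in any framework: the streaming callback overwrites a shared slot on every frame, and a snapshot request simply reads the slot. A hedged Python sketch of that pattern (the class and method names are illustrative, not part of any DirectShow API):

```python
import threading

class LatestFrame:
    """Holds only the most recent frame. A snapshot request reads it without
    stopping the stream, mirroring how a Sample Grabber callback sees every
    frame while a grab-on-request returns the last one streamed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def update(self, frame):
        # Called from the streaming callback on every frame.
        with self._lock:
            self._frame = frame

    def snapshot(self):
        # Called on request; returns whatever frame streamed most recently.
        with self._lock:
            return self._frame

holder = LatestFrame()
for i in range(5):          # simulated streaming callbacks
    holder.update(f"frame-{i}")
print(holder.snapshot())    # frame-4
```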
Some devices offer additional feature of taking a still on request. This is a rarer case and it's described on MSDN here: Capturing an Image From a Still Image Pin:
Some cameras can produce a still image separate from the capture stream, and often the still image is of higher quality than the images produced by the capture stream. The camera may have a button that acts as a hardware trigger, or it may support software triggering. A camera that supports still images will expose a still image pin, which is pin category PIN_CATEGORY_STILL.
The recommended way to get still images from the device is to use the Windows Image Acquisition (WIA) APIs. [...]
To trigger the still pin, use [...]
I use the Live555 H.264 stream client to fetch frame packets from an IP camera, FFmpeg to decode the buffers, and OpenCV to analyze the frames. (The pipeline is based on the testRTSPClient sample; I decode the H.264 frame buffer with FFmpeg in DummySink::afterGettingFrame().)
Now I want to stream the frames to another (remote) client in on-demand mode in real time. The frames may have analysis results added (bounding boxes, text, etc.). How can I use Live555 to achieve this?
Well, your best bet is to re-encode the resultant frame (with bounding boxes etc.) and pass it to an RTSPServer process, which will let clients connect via an RTSP URL and stream the encoded data to any compatible RTSP client. There is a good reference in the FAQ on how to do this, http://www.live555.com/liveMedia/faq.html#liveInput, which walks you through the steps taken and provides example source code which you can modify for your needs.
Normally, I can get a still snapshot from an IP camera with a vendor-provided URL. However, the JPEGs served this way are not of good enough quality, and the vendor says there is no facility for serving snapshots in other image formats or with smaller/lossless compression.
I noticed that when I open an RTSP H.264 stream from the camera with VLC and manually take a screenshot, the resulting image has none of the JPEG artifacts observed previously.
The question is, how would I obtain these superior snapshots from an H.264 stream with a C++ program? I need to perform multiple operations on the image (annotations, cropping, face recognition), but those have to come after getting the highest-quality initial image possible.
(note that this is related to my previous question. I obtained jpeg images with CURL but would now like to replace the snapshot getter with this new one if possible. I am again running on linux, Fedora 11)
You need an RTSP client implementation to connect to the camera, start receiving the video feed, and defragment/depacketize the video frames; then you can save/process/present them as needed.
You might want to look towards the live555 library as a well-known RTSP library/implementation.