OpenCV high resolution camera capture & display - C++

Two questions (both related):
[q1] I've read through the OpenCV documentation; it seems that no matter what the backing store for a cv::Mat is (byte, float, double), the display is always rendered down to 256 levels per channel in BGR format. What is currently the best method to bypass this limitation? Can I somehow use the OpenGL integration to pipe directly to an OpenGL display?
[q2] Is it possible to back the display with a higher-precision cv::Mat and apply a dynamic transform on presentation? Again, via OpenGL? For example, if I have a 1024 x 768 Mat and wish to keep it at that size, but declare the window with different dimensions (512 x 384), can feeding it the 1024-wide cv::Mat scale the presentation down to 512 while the underlying interface (zoom in particular) retains the full 1024 resolution?
I suspect the answer to both is no (as far as features directly provided by OpenCV), based on the documentation - so the OpenGL integration would be interesting in this respect. If it makes a difference, my OpenCV build has Qt support. I don't want to have to write a bunch of GUI code. I just want to keep the data in OpenCV, call an appropriate display function (compatible with OpenCV), and have the data presented with somewhat higher fidelity than what imshow currently delivers.
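For concreteness, this is roughly the scenario from [q2], as a minimal sketch (the window name and values are just illustrative):

#include <opencv2/opencv.hpp>

int main()
{
    // High-precision data: 32-bit float per channel, 1024 x 768.
    cv::Mat img(768, 1024, CV_32FC3, cv::Scalar(0.25, 0.5, 0.75));

    // WINDOW_NORMAL decouples the window size from the image size.
    cv::namedWindow("view", cv::WINDOW_NORMAL);
    cv::resizeWindow("view", 512, 384);

    // imshow still maps the float data (range [0,1] -> [0,255]) down to
    // 8 bits per channel for display, which is exactly the limitation I mean.
    cv::imshow("view", img);
    cv::waitKey(0);
    return 0;
}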

Related

How can I horizontally mirror video in DirectShow?

I need to display the local webcam stream on the screen,
horizontally flipped,
so that the screen appears as a mirror.
I have a DirectShow graph which does all of this,
except for mirroring the image.
I have tried several approaches to mirror the image,
but none have worked.
Approach A: VideoControlFlag_FlipHorizontal
I tried setting the VideoControlFlag_FlipHorizontal flag
on the output pin of the webcam filter,
like so:
IAMVideoControl* pAMVidControl;
IPin* pWebcamOutputPin;
// ...
// Omitting error-handling for brevity
pAMVidControl->SetMode(pWebcamOutputPin, VideoControlFlag_FlipHorizontal);
However, this has no effect.
Indeed, the webcam filter claims to not have this capability,
or any other capabilities:
long supportedModes;
hr = pAMVidControl->GetCaps(pWebcamOutputPin, &supportedModes);
// Prints 0, i.e. no capabilities
printf("Supported modes: %ld\n", supportedModes);
Approach B: SetVideoPosition
I tried flipping the image by flipping the rectangles passed to SetVideoPosition.
(I am using an Enhanced Video Renderer filter, in windowless mode.)
There are two rectangles:
a source rectangle and a destination rectangle.
I tried both.
Here's approach B(i),
flipping the source rectangle:
MFVideoNormalizedRect srcRect;
srcRect.left = 1.0; // note flipped
srcRect.right = 0.0; // note flipped
srcRect.top = 0.0;
srcRect.bottom = 0.5;
return m_pVideoDisplay->SetVideoPosition(&srcRect, &destRect);
This results in nothing being displayed.
It works in other configurations,
but appears to dislike srcRect.left > srcRect.right.
Here's approach B(ii),
flipping the destination rectangle:
RECT destRect;
GetClientRect(hwnd, &destRect);
LONG left = destRect.left;
destRect.left = destRect.right;
destRect.right = left;
return m_pVideoDisplay->SetVideoPosition(NULL, &destRect);
This also results in nothing being displayed.
It works in other configurations,
but appears to dislike destRect.left > destRect.right.
Approach C: IMFVideoProcessorControl::SetMirror
IMFVideoProcessorControl::SetMirror(MF_VIDEO_PROCESSOR_MIRROR)
sounds like what I want.
This IMFVideoProcessorControl interface is implemented by the Video Processor MFT.
Unfortunately, this is a Media Foundation Transform,
and I can't see how I could use it in DirectShow.
Approach D: Video Resizer DSP
The Video Resizer DSP
is "a COM object that can act as a DMO",
so theoretically I could use it in DirectShow.
Unfortunately, I have no experience with DMOs,
and in any case,
the docs for the Video Resizer don't say whether it would support flipping the image.
Approach E: IVMRMixerControl9::SetOutputRect
I found
IVMRMixerControl9::SetOutputRect,
which explicitly says:
Because this rectangle exists in compositional space,
there is no such thing as an "invalid" rectangle.
For example, set left greater than right to mirror the video in the x direction.
However, IVMRMixerControl9 appears to be deprecated,
and I'm using an EVR rather than a VMR,
and there are no docs on how to obtain an IVMRMixerControl9 anyway.
Approach F: Write my own DirectShow filter
I'm reluctant to try this one unless I have to.
It will be a major investment,
and I'm not sure it will be performant enough anyway.
Approach G: start again with Media Foundation
Media Foundation would possibly allow me to solve this problem,
because it provides "Media Foundation Transforms".
But it's not even clear that Media Foundation would fit all my other requirements.
I'm very surprised that I am looking at such radical solutions
for a transform that seems so standard.
What other approaches exist?
Is there anything I've overlooked in the approaches I've tried?
If Option E does not work (see the comment above; neither the source nor the destination rectangle allows mirroring), and given that it's DirectShow, I would suggest looking into Option F.
However, writing a full filter might not be so trivial if you have never done it before. There are a few shortcuts here, though. You don't need to develop a full filter: similar functionality can be achieved with at least two alternative methods:
Sample Grabber Filter with an ISampleGrabberCB::SampleCB callback. You will find lots of mentions of this technique: when it is inserted into the graph, your code receives a callback for every processed frame. If you rearrange the pixels in the frame buffer within the callback, the image will be mirrored. (A bare-bones callback skeleton is sketched just below, after these two options.)
Implement a DMO and insert it into the filter graph with the help of the DMO Wrapper Filter. You can similarly rearrange the pixels of each frame, with a bit more flexibility at the expense of more code to write.
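A bare-bones sketch of what the first option's callback looks like (class and member names are illustrative; graph building, media type negotiation and proper reference counting are omitted, and qedit.h, which declares ISampleGrabberCB, only ships with older SDKs):

#include <dshow.h>
#include <qedit.h>   // ISampleGrabber / ISampleGrabberCB

class MirrorCallback : public ISampleGrabberCB
{
public:
    // Minimal IUnknown implementation; the object is expected to outlive the graph.
    STDMETHODIMP_(ULONG) AddRef()  { return 1; }
    STDMETHODIMP_(ULONG) Release() { return 1; }
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
    {
        if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB) { *ppv = this; return S_OK; }
        *ppv = nullptr;
        return E_NOINTERFACE;
    }

    // Called by the Sample Grabber for every frame when SetCallback(pCallback, 0) is used.
    STDMETHODIMP SampleCB(double /*sampleTime*/, IMediaSample* pSample)
    {
        BYTE* pData = nullptr;
        if (SUCCEEDED(pSample->GetPointer(&pData)))
        {
            // Rearrange the pixels in pData here to mirror the frame
            // (see the RGB24 row swap sketched further down).
        }
        return S_OK;
    }

    // Not used in this mode.
    STDMETHODIMP BufferCB(double, BYTE*, long) { return E_NOTIMPL; }

    int m_width = 0, m_height = 0;   // filled in from the negotiated media type
};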
Both of these are easier because you don't have to use the DirectShow BaseClasses, which are notoriously obsolete in 2020.
Neither requires you to understand the data flow inside a DirectShow filter. Both of them, as well as a full DirectShow filter, assume that your code handles the rearrangement for a limited set of pixel formats. You can go with 24-bit RGB, for example, or a typical webcam format such as NV12 (nowadays).
If your pixel data rearrangement is reasonably well written, you can ignore the performance impact without super-optimizing the code - it can be neglected in most cases.
I expect integrating a Media Foundation solution to be more complicated, and much more so if the Media Foundation solution is to be really well optimized.
The complexity of the problem comes, in the first place, from the combination of the following factors.
First, you are mixing different kinds of solutions:
Mirroring right in the webcam (driver), so that the video frames are already mirrored at the very beginning.
Mirroring as the data flows through the pipeline. Even though this sounds simple, it is not: sometimes the frames are still compressed (webcams quite often send JPEGs), sometimes the frames are backed by video memory, there are multiple pixel formats, etc.
Mirroring as the video is presented.
Your approach A is #1 above. However, if there is no support for the respective mode, you can't mirror.
Mirroring in the EVR renderer (#3) is apparently possible in theory. The EVR uses Direct3D 9 and internally renders a surface (texture) into a scene, so it's absolutely possible to set up the 3D position of that surface so that it appears mirrored. However, the problem here is that the API design and coordinate checks prevent you from passing mirroring arguments.
Also, Direct3D 9 is pretty much deprecated, and DirectShow itself, and even DirectShow/Media Foundation's EVR, is in no way compatible with current Direct3D 11. Even though a capability to mirror via hardware might exist, you may have a hard time consuming it through the legacy API.
Since you want a simple solution, you are limited to mirroring as the data streams through, i.e. #2. Even though this comes with a reasonable performance impact, you don't need to rely on specific camera or video hardware support: you just swap the pixels in every frame and that's it.
As I mentioned, the easiest way is to set up a SampleCB callback using either the 24-bit RGB or the NV12 pixel format. It depends on whatever else your application is doing too, but with no further information I would say it is sufficient to implement 24-bit RGB: given the video frame data, you just go row by row and swap the three-byte pixels width/2 times. If the application pipeline allows it, you might want an additional code path to flip NV12, which is similar but does not require the video to be converted to RGB in the first place and so is a bit more efficient. If NV12 can't work, RGB24 remains as the backup code path.
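A minimal sketch of that row-by-row swap, assuming a tightly packed RGB24 frame with no row padding (the function name is illustrative; width and height come from the negotiated media type):

#include <algorithm>   // std::swap

void MirrorRGB24InPlace(unsigned char* frame, int width, int height)
{
    const int bpp = 3;   // bytes per pixel for RGB24
    for (int y = 0; y < height; ++y)
    {
        unsigned char* row = frame + static_cast<size_t>(y) * width * bpp;
        for (int x = 0; x < width / 2; ++x)
        {
            unsigned char* left  = row + x * bpp;
            unsigned char* right = row + (width - 1 - x) * bpp;
            for (int b = 0; b < bpp; ++b)   // swap the two 3-byte pixels
                std::swap(left[b], right[b]);
        }
    }
}

If the actual stride differs from width * 3 (rows padded to DWORD boundaries), step between rows using the stride instead; the horizontal swap itself stays the same.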
See also: Mirror effect with DirectShow.NET - it seems I already explained something similar 8 years ago.

Using OpenGL to perform video compositing with YUV color format - performance

I have written a C/C++ implementation of what I term a "compositor" (I come from a video background) to composite/overlay video/graphics on top of a video source. My current compositor implementation is rather naive and there is room for CPU optimization improvements (e.g. SIMD, threading, etc.).
I've created a high-level diagram of what I am currently doing:
The diagram is self-explanatory. Nonetheless, I'll elaborate on some of the constraints:
The main video always comes served in an 8-bit YUV 4:2:2 packed format
The secondary video (optional) will come served in either an 8-bit YUV 4:2:2 or YUVA 4:2:2:4 packed format.
The output from the overlay must come out in an 8-bit YUV 4:2:2 packed format
Some other bits of information:
The number of graphics inputs will vary; it may (or may not) be a constant value.
The colour format of the Graphics can be pinned to either ARGB or YUVA (i.e. I can provide it in whichever format you see fit). At the moment, I pin it to YUVA to keep a consistent colour format.
The potential of using OpenGL and accompanying shaders is rather appealing:
No need to reinvent the wheel (in terms of actually performing the composition)
The possibility of using GPU where available.
My concern with using OpenGL is performance. Looking around on the web, it is my understanding that a YUV surface would be converted to RGB internally; I would like to minimize the number of colour format conversions and ensure optimal performance. Without prior OpenGL experience, I hope someone can shed some light and suggest if I'm about to venture down the wrong path.
Perhaps my concern relating to performance is less of an issue when using a dedicated GPU? Do I need to consider separate code paths:
Hardware with GPU(s)
Hardware with only CPU(s)?
Additionally, am I going to struggle when I need to process 10-bit YUV?
You should be able to treat YUV as independent channels throughout. OpenGL shaders will be calling them r, g, and b, but it's just data that can be treated as whatever you want.
Most GPUs will support 10 bits per channel (plus 2 alpha bits). Some will support 16 bits per channel for all 4 channels, but I'm a little rusty here, so I have no idea how common support for this is. I'm not sure about the 4:2:2 data, but you can always treat it as 3 separate surfaces.
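As an illustration only (the uniform names and packed layout are my assumptions, not something from your pipeline), a fragment shader that blends a YUVA overlay onto a YUV base never needs to know the channels aren't RGB:

// GLSL shown as a C++ string literal; the sampled .r/.g/.b components are really Y/U/V.
const char* kCompositeFragmentShader = R"(
#version 330 core
uniform sampler2D uBaseYUV;      // main video, Y/U/V stored in r/g/b
uniform sampler2D uOverlayYUVA;  // overlay, Y/U/V/A stored in r/g/b/a
in vec2 vTexCoord;
out vec4 outColor;               // rendered into a target that is later read back as YUV
void main()
{
    vec4 base    = texture(uBaseYUV, vTexCoord);
    vec4 overlay = texture(uOverlayYUVA, vTexCoord);
    // Straight alpha blend on the raw samples; no colour-space conversion anywhere.
    outColor = vec4(mix(base.rgb, overlay.rgb, overlay.a), 1.0);
}
)";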
The number of graphics inputs will vary; it may (or may not) be a constant value.
This is something I'm a little less sure about. Shaders like this sort of thing to be predictable. If your implementation allows you to add each input iteratively, then you should be fine.
As an alternative suggestion, have you looked into OpenCL?

Background extraction using OpenCV

I want to extract the background from a video, but I don't want to use methods such as cv::bgsegm::BackgroundSubtractorMOG or cv::BackgroundSubtractorMOG2, because they use frame means. Instead, I plan to use a frame-comparison method: I use the first frame as the background model and compare the pixel values of subsequent frames with the first frame's pixel values; if a pixel has not changed, or has changed less than a threshold, it is a background pixel. How can I implement this using OpenCV and C++?
Your question is too vague, I think. I can only give you some hints.
First, your approach is very simplistic. That's not bad. But from my experience, it won't give great results, even if you have a lot of control over your scene. Nevertheless, I do not want to hold you back if you want to gain your own experience.
You probably want to take a look at
Operations on Arrays in OpenCV
Basic Threshold Operations in OpenCV
Everything you need should be there. In particular, the absdiff operation and the threshold function (with binary threshold type) should be of interest.
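Here is a rough sketch of your first-frame comparison idea on grayscale frames (the threshold value of 25 is arbitrary; tune it for your scene):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);   // or a video file path
    cv::Mat firstFrame, frame, diff, foregroundMask;

    cap >> firstFrame;         // the first frame is the background model
    if (firstFrame.empty()) return 1;
    cv::cvtColor(firstFrame, firstFrame, cv::COLOR_BGR2GRAY);

    while (cap.read(frame))
    {
        cv::cvtColor(frame, frame, cv::COLOR_BGR2GRAY);
        cv::absdiff(frame, firstFrame, diff);   // per-pixel difference to the background model
        cv::threshold(diff, foregroundMask, 25, 255, cv::THRESH_BINARY);   // below 25 -> background
        cv::imshow("foreground mask", foregroundMask);
        if (cv::waitKey(30) == 27) break;       // Esc to quit
    }
    return 0;
}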

Real time drawing in GDI

I'm currently writing a 3D renderer (for fun and research), so I need a way to draw my framebuffer to a window. Since I'm doing all of my calculations on CPU, the drawing needs to be as fast as possible.
One of my goals is to use no existing graphics library (OpenGL/DirectX) so the drawing to the screen is pure Win32. In my research I've found a couple of ways to create and draw bitmaps and now I'm looking for the best one.
My current implementation uses a bitmap created with CreateDIBSection(), which is drawn to my window DC using BitBlt().
CreateDIBSection() gives me a pointer to my bitmap bytes so I can manipulate it without copying. Using this method I achieve an update rate of about 260 FPS (without any rendering done).
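For reference, the setup looks roughly like this (a trimmed-down sketch; the names and the 32 bpp format are just what I use here for illustration):

#include <windows.h>

void* CreateFramebuffer(HDC hdcWindow, int width, int height, HBITMAP* outBitmap)
{
    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = width;
    bmi.bmiHeader.biHeight      = -height;   // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 32;        // 4 bytes per pixel (BGRX)
    bmi.bmiHeader.biCompression = BI_RGB;

    void* pixels = nullptr;
    *outBitmap = CreateDIBSection(hdcWindow, &bmi, DIB_RGB_COLORS, &pixels, NULL, 0);
    // Render into 'pixels', select *outBitmap into a memory DC, then BitBlt to the window DC.
    return pixels;
}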
This seems a bit slow, so I'm looking for optimizations.
I've read that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
How can I make sure my DIB bitmap and window are compatible?
Are there methods of drawing a bitmap which are faster than my current implementation?
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
I've read that if you don't create a bitmap with the same palette as the system palette, some slow color conversions are done.
Very few systems run in a palette mode any more, so it seems unlikely this is an issue for you.
Aside from palettes, some GDI functions also cause a color matching conversion to be applied if the source bitmap and the destination have different gamuts. BitBlt, however, does not do this type of color matching, so you're not paying a price for that.
How can I make sure my DIB bitmap and window are compatible?
You don't. You can use DIBs (which are Device-Independent Bitmaps) or compatible (device-dependent) bitmaps. It's possible that your DIB bitmap matches the current mode of your device. For example, if you're using a 32 bpp DIB, and your display is in that same mode, then no conversion is necessary. If you want a bitmap that's guaranteed to be in the same mode as your device, then you can't use a DIB and all the nice properties it provides for predictable pixel layout and format.
Are there methods of drawing a bitmap which are faster than my current implementation?
The limitation is most likely in getting the data from system memory to graphics adapter memory. To get around that limitation, you need a faster graphics bus, or you need to render directly into graphic memory, which means you'd need to do your computation on the GPU rather than the CPU.
If you're rendering a 1920 x 1080 pixel image at 24 bits per pixel, that's close to 6 MB for your frame buffer. That's an awful lot of data. If you're doing that 260 times per second, that's actually pretty impressive.
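(For scale: 1920 x 1080 x 3 bytes is about 5.9 MB per frame, and at 260 frames per second that is roughly 1.6 GB/s pushed across the bus.)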
I've also read something about DrawDibDraw(), can anyone confirm that this is faster?
It's conceivable, but the only way to know would be to measure it. And the results might vary from machine to machine because of differences in the graphics adapter (and which bus they use).

Is it possible to draw in another window (using OpenCV/FFmpeg)?

I realize this question might be closed as "not enough research". However, I did spend about 2 days googling for it and didn't find a conclusive answer.
Well, I have an application that spawns a window, and it is not written in C++. This application can have a C interface with DLLs. Now I wish to use the power of OpenCV, so I started on a DLL to extend it. Passing image data from/to the application is near impossible (it can only pass C strings and double values directly, and using the hard drive to exchange images is too slow for real-time image manipulation).
I am looking into letting OpenCV draw the image data directly onto that window. I can obtain the window handle easily, so would it then be possible to let OpenCV draw its data "over" the other window, or better, into the other window?
Is this even possible with any library (FFmpeg, or something else)?
Yes, it's possible, but it's far from ideal. You can use GDI to draw on top of the other window (just convert IplImage to HBITMAP). Another technique is to do such drawing in a borderless layered window.
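A sketch of the GDI route (the function name is illustrative; it assumes an 8-bit, 3-channel BGR IplImage created with OpenCV's default 4-byte row alignment, which matches what a 24 bpp DIB expects, and it uses StretchDIBits on the pixel data directly instead of creating an HBITMAP first):

#include <windows.h>
#include <opencv2/core/core_c.h>   // IplImage

void DrawIplImageOnWindow(HWND hwndTarget, const IplImage* img)
{
    HDC hdc = GetDC(hwndTarget);

    BITMAPINFO bmi = {};
    bmi.bmiHeader.biSize        = sizeof(BITMAPINFOHEADER);
    bmi.bmiHeader.biWidth       = img->width;
    bmi.bmiHeader.biHeight      = -img->height;   // negative height = top-down rows
    bmi.bmiHeader.biPlanes      = 1;
    bmi.bmiHeader.biBitCount    = 24;             // 8-bit BGR, 3 channels
    bmi.bmiHeader.biCompression = BI_RGB;

    // Copy the pixel data straight onto the other window's device context.
    StretchDIBits(hdc, 0, 0, img->width, img->height,
                  0, 0, img->width, img->height,
                  img->imageData, &bmi, DIB_RGB_COLORS, SRCCOPY);

    ReleaseDC(hwndTarget, hdc);
}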
An easier approach, since you own both applications, is to write a function that passes an IplImage between them using standard C data types; after all, IplImage is nothing but a data type built from these standard types.
Here is how you will disassemble IplImage into 5 standard parameters:
The size (int, int) of the image (width/height);
The (int) bit depth of the image;
The number (int) of channels;
And the (unsigned char*) pixels of the image;
After receiving these parameters on the other side, you may wonder: how do I assemble an IplImage from scratch? Call cvCreateImageHeader() followed by cvSetData().
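A sketch of that round trip (the function names on both sides are hypothetical; only the OpenCV calls are real, and CV_AUTOSTEP assumes the pixel buffer is continuous, i.e. no row padding):

#include <opencv2/core/core_c.h>   // legacy C API: IplImage, cvCreateImageHeader, cvSetData

// Sending side: break the IplImage into the 5 standard parameters.
void ExportImage(const IplImage* img,
                 int* width, int* height, int* depth, int* channels, unsigned char** pixels)
{
    *width    = img->width;
    *height   = img->height;
    *depth    = img->depth;                     // e.g. IPL_DEPTH_8U
    *channels = img->nChannels;                 // e.g. 3 for BGR
    *pixels   = (unsigned char*)img->imageData; // no copy, just the pointer
}

// Receiving side: rebuild a header around the existing pixel buffer.
IplImage* ImportImage(int width, int height, int depth, int channels, unsigned char* pixels)
{
    IplImage* img = cvCreateImageHeader(cvSize(width, height), depth, channels);
    cvSetData(img, pixels, CV_AUTOSTEP);        // attach the buffer; release with cvReleaseImageHeader()
    return img;
}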