I am writing a custom video rendering filter for DirectShow. My renderer assumes the incoming pixels are organized one row of pixels at a time (is that a correct assumption?) and blits them to another DirectX display elsewhere using a DirectX texture.
This approach works with webcams as input, but when I use an analog capture board, the samples the renderer receives are not in any expected order (see the left image below). When I render the capture using the stock DirectShow video renderer, it looks fine (see the right image below). So the DirectShow renderer must be doing something extra that my renderer is not. Any idea what it is?
Some more details:
The capture card is NTSC; I'm not sure if that matters.
As input to the custom renderer, I am accepting only MEDIASUBTYPE_RGB24, so I do not think this is a YUV issue (is it?).
It's a bit hard to see, but the second image below is my filter graph. My custom renderer connects to the Color Space Converter on the far right.
I assume that the pixels coming into my renderer are all organized one row of pixels at a time. Is this a correct assumption?
The texture is probably padded to keep rows aligned at a multiple of 32 bytes per row. Mind you, I have never used DirectShow, but that is what I would expect in D3D. In other words, your input might have a different stride than you think. Unfortunately I do not know DirectShow, so I can only assume that whatever computes input/output coordinates should use a different stride factor, e.g. somewhere in code that looks like offset = y * stride + x.
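For illustration, here is a minimal sketch of the kind of row-by-row copy I mean, assuming 24-bit RGB and a source stride that may be larger than width * 3 (the function and parameter names are mine, not from any DirectShow API):
#include <cstdint>
#include <cstring>

// Copy a 24-bit RGB frame whose rows may be padded (srcStride >= width * 3)
// into a tightly packed destination buffer, one row at a time, skipping the
// padding bytes at the end of each source row.
void CopyRgb24Rows(const uint8_t* src, int srcStride,
                   uint8_t* dst, int width, int height)
{
    const int rowBytes = width * 3;          // useful bytes per row
    for (int y = 0; y < height; ++y)
    {
        std::memcpy(dst + y * rowBytes,      // packed destination row
                    src + y * srcStride,     // padded source row
                    rowBytes);
    }
}
If the copy ignores the source stride and assumes width * 3 bytes per row, each row after the first starts a few bytes too early, which could easily produce the kind of scrambled output you are seeing.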
Related
I need to display the local webcam stream on the screen, horizontally flipped, so that the screen appears as a mirror. I have a DirectShow graph which does all of this, except for mirroring the image. I have tried several approaches to mirror the image, but none have worked.
Approach A: VideoControlFlag_FlipHorizontal
I tried setting the VideoControlFlag_FlipHorizontal flag on the output pin of the webcam filter, like so:
IAMVideoControl* pAMVidControl;
IPin* pWebcamOutputPin;
// ...
// Omitting error-handling for brevity
pAMVidControl->SetMode(pWebcamOutputPin, VideoControlFlag_FlipHorizontal);
However, this has no effect. Indeed, the webcam filter claims to not have this capability, or any other capabilities:
long supportedModes;
hr = pAMVidControl->GetCaps(pWebcamOutputPin, &supportedModes);
// Prints 0, i.e. no capabilities
printf("Supported modes: %ld\n", supportedModes);
Approach B: SetVideoPosition
I tried flipping the image by flipping the rectangles passed to SetVideoPosition. (I am using an Enhanced Video Renderer filter, in windowless mode.) There are two rectangles: a source rectangle and a destination rectangle. I tried both.
Here's approach B(i), flipping the source rectangle:
MFVideoNormalizedRect srcRect;
srcRect.left = 1.0; // note flipped
srcRect.right = 0.0; // note flipped
srcRect.top = 0.0;
srcRect.bottom = 0.5;
return m_pVideoDisplay->SetVideoPosition(&srcRect, &destRect);
This results in nothing being displayed. It works in other configurations, but appears to dislike srcRect.left > srcRect.right.
Here's approach B(ii), flipping the destination rectangle:
RECT destRect;
GetClientRect(hwnd, &destRect);
LONG left = destRect.left;
destRect.left = destRect.right;
destRect.right = left;
return m_pVideoDisplay->SetVideoPosition(NULL, &destRect);
This also results in nothing being displayed. It works in other configurations, but appears to dislike destRect.left > destRect.right.
Approach C: IMFVideoProcessorControl::SetMirror
IMFVideoProcessorControl::SetMirror(MF_VIDEO_PROCESSOR_MIRROR) sounds like what I want. This IMFVideoProcessorControl interface is implemented by the Video Processor MFT. Unfortunately, this is a Media Foundation Transform, and I can't see how I could use it in DirectShow.
Approach D: Video Resizer DSP
The Video Resizer DSP is "a COM object that can act as a DMO", so theoretically I could use it in DirectShow. Unfortunately, I have no experience with DMOs, and in any case, the docs for the Video Resizer don't say whether it would support flipping the image.
Approach E: IVMRMixerControl9::SetOutputRect
I found IVMRMixerControl9::SetOutputRect, which explicitly says:
Because this rectangle exists in compositional space, there is no such thing as an "invalid" rectangle. For example, set left greater than right to mirror the video in the x direction.
However, IVMRMixerControl9 appears to be deprecated, I'm using an EVR rather than a VMR, and there are no docs on how to obtain an IVMRMixerControl9 anyway.
Approach F: Write my own DirectShow filter
I'm reluctant to try this one unless I have to. It will be a major investment, and I'm not sure it will be performant enough anyway.
Approach G: start again with Media Foundation
Media Foundation would possibly allow me to solve this problem, because it provides "Media Foundation Transforms". But it's not even clear that Media Foundation would fit all my other requirements.
I'm very surprised that I am looking at such radical solutions for a transform that seems so standard.
What other approaches exist? Is there anything I've overlooked in the approaches I've tried? How can I horizontally mirror video in DirectShow?
If Option E does not work (see the comment above; neither the source nor the destination rectangle allows mirroring), then, given that it's DirectShow, I would suggest looking into Option F.
However, writing a full filter might not be trivial if you have never done it before. There are a few shortcuts, though. You don't need to develop a full filter: similar functionality can be achieved with at least two alternative methods:
Sample Grabber Filter with an ISampleGrabberCB::SampleCB callback. You will find many mentions of this technique: when the filter is inserted into the graph, your code receives a callback for every processed frame. If you rearrange the pixels in the frame buffer within the callback, the image will be mirrored.
Implement a DMO and insert it into the filter graph with the help of the DMO Wrapper Filter. You get to rearrange the pixels of the frames in a similar way, with a bit more flexibility at the expense of more code to write.
Both of these are easier because you don't have to use the DirectShow BaseClasses, which are notoriously obsolete in 2020.
Neither requires you to understand the data flow inside a DirectShow filter. Both (as well as developing a full DirectShow filter) assume that your code supports rearrangement in a limited set of pixel formats. You can go with 24-bit RGB, for example, or typical webcam formats such as NV12 (nowadays).
If your pixel rearrangement is implemented reasonably, with no need to super-optimize the code, you can ignore the performance impact; either way it is negligible in most cases.
I expect integrating a Media Foundation solution to be more complicated, and much more so if the Media Foundation solution is to be really well optimized.
The complexity of the problem in the first place comes from the combination of the following factors.
First, you are mixing different kinds of solutions:
Mirroring right in the webcam (driver), where setting up mirroring means the video frames arrive already mirrored at the very beginning.
Mirroring as the data flows through the pipeline. Even though this sounds simple, it is not: sometimes the frames are still compressed (webcams quite often send JPEGs), sometimes frames are backed by video memory, and there are multiple pixel formats, etc.
Mirroring as the video is presented.
Your Approach A is #1 above. However, if there is no support for the respective mode, you can't mirror.
Mirroring in the EVR renderer (#3) is possible in theory. The EVR uses Direct3D 9 and internally renders a surface (texture) into a scene, so it is absolutely possible to set up the 3D position of the surface so that it becomes mirrored. However, the problem here is that the API design and coordinate checks prevent you from passing mirroring arguments.
Furthermore, Direct3D 9 is pretty much deprecated, and DirectShow itself, and even DirectShow/Media Foundation's EVR, are in no way compatible with current Direct3D 11. Even though a capability to mirror in hardware might exist, you may have a hard time consuming it through the legacy API.
As you want a simple solution, you are limited to mirroring as the data is streamed through, i.e. #2. Even though this comes with a modest performance impact, you don't need to rely on specific camera or video hardware support: you just swap the pixels in every frame and that's it.
As I mentioned, the easiest way is to set up a SampleCB callback using either the 24-bit RGB and/or the NV12 pixel format. It also depends on whatever else your application is doing, but with no such information I would say it is sufficient to implement 24-bit RGB: given the video frame data, you just go row by row and swap the three-byte pixels width/2 times. If the application pipeline allows, you might want an additional code path that flips NV12, which is similar but does not require the video to be converted to RGB in the first place and so is a bit more efficient. If NV12 can't work, RGB24 would be the backup code path.
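As a rough sketch of that RGB24 code path (assuming the Sample Grabber is connected with MEDIASUBTYPE_RGB24, that m_width and m_height are filled in from the connection's VIDEOINFOHEADER, and that qedit.h from an older Windows SDK is available for ISampleGrabberCB):
#include <dshow.h>
#include <qedit.h>      // ISampleGrabberCB (ships with older Windows SDKs)
#include <algorithm>    // std::swap

// Minimal callback that mirrors each 24-bit RGB frame in place.
class CMirrorCB : public ISampleGrabberCB
{
public:
    LONG m_width;       // fill in from the pin's VIDEOINFOHEADER
    LONG m_height;

    STDMETHODIMP SampleCB(double /*SampleTime*/, IMediaSample* pSample)
    {
        BYTE* pData = NULL;
        if (FAILED(pSample->GetPointer(&pData)) || pData == NULL)
            return S_OK;
        const LONG stride = (m_width * 3 + 3) & ~3;   // RGB24 rows are DWORD-aligned
        for (LONG y = 0; y < m_height; ++y)
        {
            BYTE* row = pData + y * stride;
            for (LONG x = 0; x < m_width / 2; ++x)    // swap pixel x with its mirror
            {
                BYTE* a = row + 3 * x;
                BYTE* b = row + 3 * (m_width - 1 - x);
                std::swap(a[0], b[0]);
                std::swap(a[1], b[1]);
                std::swap(a[2], b[2]);
            }
        }
        return S_OK;
    }

    STDMETHODIMP BufferCB(double, BYTE*, long) { return E_NOTIMPL; }

    // Trivial IUnknown for an object with static lifetime.
    STDMETHODIMP QueryInterface(REFIID riid, void** ppv)
    {
        if (riid == IID_IUnknown || riid == IID_ISampleGrabberCB)
        {
            *ppv = static_cast<ISampleGrabberCB*>(this);
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;
    }
    STDMETHODIMP_(ULONG) AddRef() { return 2; }
    STDMETHODIMP_(ULONG) Release() { return 1; }
};
You would register an instance of this with ISampleGrabber::SetCallback, passing 0 as the second argument so that SampleCB rather than BufferCB is called.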
See also: Mirror effect with DirectShow.NET, where it seems I already explained something similar 8 years ago.
I'm using OpenCV 2.4.6 with C++ (sometimes with Python too, but that is irrelevant here). I would like to know if there is a simple way to get all the available frame sizes from a capture device.
For example, my webcam can provide 640x480, 320x240 and 160x120. Suppose that I don't know about these frame sizes a priori... Is it possible to get a vector or an iterator, or something like this that could give me these values?
In other words, I don't want to get the current frame size (which is easy to obtain) but the sizes I could set the device to.
Thanks!
When you retrieve a frame from a camera, it is the maximum size that the camera can give. If you want a smaller image, you have to specify it when you get the image, and OpenCV will resize it for you.
A normal camera has one sensor of one size, and it sends one kind of image to the computer. What OpenCV does with it afterwards is up to you to specify.
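For example, a minimal sketch of that: grab whatever the camera delivers and resize it to what you need (the 320x240 target size is arbitrary, not something the device reports):
#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);       // default camera
    if (!cap.isOpened())
        return 1;

    cv::Mat frame, smaller;
    cap >> frame;                  // whatever size the camera/driver provides
    if (frame.empty())
        return 1;

    // Resize to the size the application actually wants.
    cv::resize(frame, smaller, cv::Size(320, 240));
    return 0;
}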
The scenario I'm dealing with is as follows: I need to take the screen generated by OpenGL and send it through HDMI to an FPGA component while keeping the alpha channel. Right now the data being sent over HDMI is only RGB (24-bit, no alpha channel), so I need a way to force the alpha bits through this port somehow.
See image: http://i.imgur.com/hhlcbb9.jpg
One solution I could think of is to convert the screen buffer from RGBA to RGB while interleaving the alpha values into the RGB buffer.
For example:
The original buffer: [R G B A][R G B A][R G B A]
The output i want: [R G B][A R G][B A R][G B A]
The point is to avoid having to go through every single pixel.
But I'm not sure whether this is possible at all using OpenGL or any other technology (the VideoCore kernel?).
Do you actually mean a framebuffer, or some kind of texture? Framebuffers cannot be resized, and the resulting image will have a third more pixels (3 RGBA pixels become 4 RGB texels), so you can't actually do that with a framebuffer.
You could do it with a texture, but only by resizing it. You would have to get the texel data with glGetTexImage into some buffer, then upload the texel data to another texture with glTexImage2D. You would simply change the pixel transfer format and texture width appropriately. The read would use GL_RGBA, and the write would use GL_RGB, with an internal format of GL_RGB8.
The performance of this will almost certainly not be very good. But it should work.
But no, there is no simple call you can make to cause this to happen.
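To make that concrete, here is a minimal sketch, assuming a current GL context, two existing texture objects srcTexture and dstTexture (names are mine), and an RGBA width that is a multiple of 3 so that each row's bytes map exactly onto width * 4 / 3 RGB texels:
#include <vector>
// plus your usual OpenGL header, e.g. GL/gl.h or a loader such as GLEW

GLint width = 0, height = 0;
glBindTexture(GL_TEXTURE_2D, srcTexture);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_WIDTH, &width);
glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_HEIGHT, &height);

// Read the RGBA texels back into client memory, tightly packed.
std::vector<unsigned char> texels(width * height * 4);
glPixelStorei(GL_PACK_ALIGNMENT, 1);
glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, texels.data());

// Re-upload the same bytes as GL_RGB8 with a row that is 4/3 as wide,
// so every 3 RGBA pixels become 4 RGB texels.
glBindTexture(GL_TEXTURE_2D, dstTexture);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB8,
             width * 4 / 3, height,
             0, GL_RGB, GL_UNSIGNED_BYTE, texels.data());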
You may be able to send the alpha channel separately in OpenGL via a different RGB or HDMI video output on your video card.
So your PC outputs RGB down one cable and the alpha down the other cable. Your alpha probably needs to be converted to greyscale so that it is still 24 bits.
You then select which signal is the key in your FPGA.
I'm presuming your FPGA supports chroma keying.
I have done something similar before using a program called Brainstorm, which uses a specific video card that supports SDI out and splits the RGB and the alpha into separate video channels (cables); the vision mixer then does the keying.
Today, however, I have created a new solution which mixes the multiple video channels first on the PC and then outputs the final mixed video directly either to an RTMP streaming server or to a DVI-to-SDI scan converter.
Right now, what I'm trying to do is make a new GUI, essentially software using DirectX (more exactly, Direct3D), that displays streaming images from Axis IP cameras.
For the time being I figured the flow for the entire program would be like this:
1. Get the Axis program to get streaming images.
2. Pass the images to the Direct3D program.
3. Display the images on the screen.
Currently I have made a somewhat basic Direct3D app that loads and displays video frames from AVI videos (for testing). I don't know how to load images directly from videos using DirectX, so I used OpenCV to save frames from the video and have DirectX upload them. Very slow.
Right now I have some unclear things:
1. How to get an Axis program that works in C++ (going to look up examples later, probably no big deal).
2. How to upload images directly from the Axis IP camera program.
Do you have any recommendations or suggestions on how to make my program work more efficiently? Anything at all, just let me know.
Well, you may find it faster to use DirectShow and add a custom renderer at the far end that copies the decompressed video data directly to a Direct3D texture.
It's well worth double-buffering that texture, i.e. have texture 0 displaying while texture 1 is being uploaded to, then swap the two over when a new frame is available (i.e. display texture 1 while uploading to texture 0).
This way you can decouple the video frame rate from the rendering frame rate, which makes dropped frames a little easier to handle.
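A rough sketch of that double-buffering, using IDirect3DTexture9::LockRect for the upload (the two textures are assumed to be created elsewhere with D3DUSAGE_DYNAMIC and a 32-bit X8R8G8B8 format matching the incoming frames; thread synchronization and error handling are omitted):
#include <d3d9.h>
#include <cstring>

IDirect3DTexture9* g_tex[2] = { NULL, NULL };   // created elsewhere with D3DUSAGE_DYNAMIC
int g_displayIndex = 0;                          // index of the texture currently being rendered

// Called whenever a new decompressed 32-bit frame arrives from the renderer filter.
void UploadFrame(const BYTE* frame, int width, int height, int frameStride)
{
    IDirect3DTexture9* back = g_tex[1 - g_displayIndex];   // the texture not on screen

    D3DLOCKED_RECT lr;
    if (SUCCEEDED(back->LockRect(0, &lr, NULL, D3DLOCK_DISCARD)))
    {
        for (int y = 0; y < height; ++y)
        {
            std::memcpy(static_cast<BYTE*>(lr.pBits) + y * lr.Pitch,   // texture row (padded)
                        frame + y * frameStride,                        // source row
                        width * 4);                                     // 4 bytes per pixel
        }
        back->UnlockRect(0);
    }

    // Swap: the freshly uploaded texture is displayed on the next render pass.
    g_displayIndex = 1 - g_displayIndex;
}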
I use in-place updates of Direct3D textures (using IDirect3DTexture9::LockRect) and it works very fast. What part of your program is slow?
For capturing images from Axis cams you can use the iPSi C++ library: http://sourceforge.net/projects/ipsi/
It can be used to capture images and to control camera zoom and rotation (if available).
I'm relatively new to DirectX and have to work on an existing C++ DX9 application. The app does tracking on camera images and displays some DirectDraw (i.e. 2D) content. The camera always has an aspect ratio of 4:3; the screen's is undefined.
I want to load a texture and use it as a mask, so that tracking and display of the content are only done within the masked area of the texture. Therefore I'd like to load a texture that has exactly the same size as the camera images.
I've done all the steps to load the texture, but when I call GetDesc() the Width and Height fields of the D3DSURFACE_DESC struct are the next bigger power-of-two size. I don't mind that the actual memory used for the texture is optimized for the graphics card, but I have not found any way to get the dimensions of the original image file on the hard disk.
I am looking (and have looked, with no success) for a way to load the image into the computer's RAM only (the graphics card is not required) without adding a new dependency to the code. Otherwise I'd have to use OpenCV (which might be a good idea anyway when it comes to tracking), but for the moment I am still trying to avoid including OpenCV.
Thanks for your hints,
Norbert
Use D3DXCreateTextureFromFileEx with parameters 3 and 4 (Width and Height) set to D3DX_DEFAULT_NONPOW2. After that, you can use
D3DSURFACE_DESC Desc;
m_Sprite->GetLevelDesc(0, &Desc);
to fetch the height & width.
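A sketch of the full call, with the file name and device pointer as placeholders and error handling omitted:
IDirect3DTexture9* pTexture = NULL;
HRESULT hr = D3DXCreateTextureFromFileEx(
    pDevice,                    // your IDirect3DDevice9*
    TEXT("mask.png"),           // placeholder file name
    D3DX_DEFAULT_NONPOW2,       // width: keep the file's width
    D3DX_DEFAULT_NONPOW2,       // height: keep the file's height
    1,                          // mip levels
    0,                          // usage
    D3DFMT_UNKNOWN,             // take the format from the file
    D3DPOOL_MANAGED,            // pool
    D3DX_DEFAULT,               // filter
    D3DX_DEFAULT,               // mip filter
    0,                          // no color key
    NULL,                       // optional D3DXIMAGE_INFO
    NULL,                       // palette
    &pTexture);

D3DSURFACE_DESC desc;
pTexture->GetLevelDesc(0, &desc);   // desc.Width / desc.Height now match the image file
Note that D3DX_DEFAULT_NONPOW2 only keeps the original size if the device supports non-power-of-two textures.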
D3DXGetImageInfoFromFile may be what you are looking for.
I'm assuming you are using D3DX because I don't think Direct3D automatically resizes any textures.
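A minimal sketch (the file name is a placeholder); this only reads the image header, without creating any texture or touching the graphics card:
D3DXIMAGE_INFO info;
if (SUCCEEDED(D3DXGetImageInfoFromFile(TEXT("mask.png"), &info)))
{
    // info.Width and info.Height are the dimensions of the file on disk,
    // independent of any power-of-two rounding applied to the texture.
}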