I am currently using OpenCV 2.4.8 with three cameras on Windows 7 in C++. I understand how to read from a camera through VideoCapture cap, etc., but I am having trouble identifying which camera is which.
From what I can understand, cap.open(int num) takes in the camera index. In my case, these are 0, 1, & 2.
CAMA = 0
CAMB = 1
CAMC = 2
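For reference, this is roughly how I open them today (a minimal sketch with placeholder names, no real error handling):

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture camA(0), camB(1), camC(2);   // indices shift if a camera is unplugged
    if (!camA.isOpened() || !camB.isOpened() || !camC.isOpened())
        return 1;                                  // one of the expected cameras is missing

    cv::Mat frame;
    camA >> frame;                                 // grab a frame from camera A
    return 0;
}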
If, however, the camera at index 0 is unplugged before starting the program, my camera indices change.
CAMA //GONE
CAMB = 0
CAMC = 1
So is there some way to remember which camera is which (other than displaying each camera to the user every single time...)? Perhaps logging some unique ID for each camera that can then be read back in on restart?
In Device Manager, under the properties of the USB device, I see a Device Class GUID, which seems to be a somewhat persistent value. Is there a way to correlate this ID with the camera index?
I have searched here and tried the code here. I imagine stereovision users commonly run into this problem, so how have people managed this?
Not really an answer, but then I think there isn't really one.
What I've done for production applications is to do the video handling completely outside of OpenCV and then convert the frames to OpenCV images and do further processing.
On Windows you could use DirectShow, and I've used camera-specific APIs as well. Not in any way portable or convenient, but it has the benefit of working. On the plus side, you usually get access to the full set of camera settings and features, rather than just the few properties OpenCV defines.
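If all you need is to figure out which index belongs to which physical camera, a lighter-weight option is to enumerate the DirectShow video capture devices yourself and read their DevicePath property, which is unique per device and persistent across reboots. Here is a minimal sketch (error handling omitted); note that it assumes OpenCV's DirectShow backend hands out indices in the same enumeration order, which you should verify on your setup:

#include <dshow.h>
#include <cstdio>
#pragma comment(lib, "strmiids")
#pragma comment(lib, "ole32")
#pragma comment(lib, "oleaut32")

int main()
{
    CoInitialize(NULL);

    ICreateDevEnum* devEnum = NULL;
    CoCreateInstance(CLSID_SystemDeviceEnum, NULL, CLSCTX_INPROC_SERVER,
                     IID_ICreateDevEnum, (void**)&devEnum);

    IEnumMoniker* enumMoniker = NULL;
    if (devEnum->CreateClassEnumerator(CLSID_VideoInputDeviceCategory,
                                       &enumMoniker, 0) == S_OK)
    {
        IMoniker* moniker = NULL;
        int index = 0;
        while (enumMoniker->Next(1, &moniker, NULL) == S_OK)
        {
            IPropertyBag* props = NULL;
            moniker->BindToStorage(0, 0, IID_IPropertyBag, (void**)&props);

            VARIANT name, path;
            VariantInit(&name);
            VariantInit(&path);
            props->Read(L"FriendlyName", &name, 0);
            props->Read(L"DevicePath", &path, 0);   // unique and persistent per device
            wprintf(L"index %d: %s  [%s]\n", index++, name.bstrVal, path.bstrVal);

            VariantClear(&name);
            VariantClear(&path);
            props->Release();
            moniker->Release();
        }
        enumMoniker->Release();
    }
    devEnum->Release();
    CoUninitialize();
    return 0;
}

You could store the DevicePath for each camera once and, on every start, re-run the enumeration to map each stored path back to its current index.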
I need to display the local webcam stream on the screen, horizontally flipped, so that the screen appears as a mirror. I have a DirectShow graph which does all of this, except for mirroring the image. I have tried several approaches to mirror the image, but none have worked.
Approach A: VideoControlFlag_FlipHorizontal
I tried setting the VideoControlFlag_FlipHorizontal flag on the output pin of the webcam filter, like so:
IAMVideoControl* pAMVidControl;
IPin* pWebcamOutputPin;
// ...
// Omitting error-handling for brevity
pAMVidControl->SetMode(pWebcamOutputPin, VideoControlFlag_FlipHorizontal);
However, this has no effect. Indeed, the webcam filter claims to not have this capability, or any other capabilities:
long supportedModes;
HRESULT hr = pAMVidControl->GetCaps(pWebcamOutputPin, &supportedModes);
// Prints 0, i.e. no capabilities
printf("Supported modes: %ld\n", supportedModes);
Approach B: SetVideoPosition
I tried flipping the image by flipping the rectangles passed to SetVideoPosition. (I am using an Enhanced Video Renderer filter, in windowless mode.) There are two rectangles: a source rectangle and a destination rectangle. I tried both. Here's approach B(i), flipping the source rectangle:
MFVideoNormalizedRect srcRect;
srcRect.left = 1.0; // note flipped
srcRect.right = 0.0; // note flipped
srcRect.top = 0.0;
srcRect.bottom = 0.5;
// destRect is the (unflipped) window client rectangle, obtained as in approach B(ii)
return m_pVideoDisplay->SetVideoPosition(&srcRect, &destRect);
This results in nothing being displayed. It works in other configurations, but appears to dislike srcRect.left > srcRect.right.
Here's approach B(ii), flipping the destination rectangle:
RECT destRect;
GetClientRect(hwnd, &destRect);
LONG left = destRect.left;
destRect.left = destRect.right;
destRect.right = left;
return m_pVideoDisplay->SetVideoPosition(NULL, &destRect);
This also results in nothing being displayed. It works in other configurations, but appears to dislike destRect.left > destRect.right.
Approach C: IMFVideoProcessorControl::SetMirror
IMFVideoProcessorControl::SetMirror(MF_VIDEO_PROCESSOR_MIRROR) sounds like what I want. This IMFVideoProcessorControl interface is implemented by the Video Processor MFT. Unfortunately, that is a Media Foundation Transform, and I can't see how I could use it in DirectShow.
Approach D: Video Resizer DSP
The Video Resizer DSP is "a COM object that can act as a DMO", so theoretically I could use it in DirectShow. Unfortunately, I have no experience with DMOs, and in any case, the docs for the Video Resizer don't say whether it supports flipping the image.
Approach E: IVMRMixerControl9::SetOutputRect
I found IVMRMixerControl9::SetOutputRect, which explicitly says: "Because this rectangle exists in compositional space, there is no such thing as an 'invalid' rectangle. For example, set left greater than right to mirror the video in the x direction." However, IVMRMixerControl9 appears to be deprecated, I'm using an EVR rather than a VMR, and there are no docs on how to obtain an IVMRMixerControl9 anyway.
Approach F: Write my own DirectShow filter
I'm reluctant to try this one unless I have to. It will be a major investment, and I'm not sure it will be performant enough anyway.
Approach G: Start again with Media Foundation
Media Foundation would possibly allow me to solve this problem, because it provides "Media Foundation Transforms". But it's not even clear that Media Foundation would fit all my other requirements. I'm very surprised that I am looking at such radical solutions for a transform that seems so standard.
What other approaches exist?
Is there anything I've overlooked in the approaches I've tried?
How can I horizontally mirror video in DirectShow?
If Option E does not work (see comment above; neither the source nor the destination rectangle allows mirroring), and given that it's DirectShow, I would suggest looking into Option F.
However, writing a full filter may not be trivial if you have never done it before. There are a few shortcuts, though. You don't need to develop a full filter: similar functionality can be achieved with at least two alternative methods:
A Sample Grabber filter with an ISampleGrabberCB::SampleCB callback. You will find lots of mentions of this technique: once the filter is inserted into the graph, your code receives a callback for every processed frame. If you rearrange the pixels in the frame buffer within the callback, the image will be mirrored.
Implement a DMO and insert it into the filter graph with the help of the DMO Wrapper Filter. You will similarly be able to rearrange the pixels of each frame, with a bit more flexibility at the expense of more code to write.
Both of these are easier because you don't have to use the DirectShow BaseClasses, which are notoriously obsolete in 2020. Neither requires you to understand the data flow inside a DirectShow filter. Both of them (and a full DirectShow filter as well) assume that your code handles the pixel rearrangement for a limited set of pixel formats. You can go with 24-bit RGB, for example, or a typical webcam format such as NV12 (nowadays).
If the pixel rearrangement is implemented reasonably, without any need to super-optimize the code, the performance impact can be neglected in most cases.
I expect integrating a Media Foundation solution to be more complicated, and much more complicated still if that solution is to be really well optimized.
The complexity of the problem comes, in the first place, from a combination of factors. First, you are mixing up different kinds of solutions:
Mirroring right in the webcam (driver), so that video frames arrive already mirrored at the very beginning.
Mirroring as the data flows through the pipeline. Even though this sounds simple, it is not: sometimes the frames are still compressed (webcams quite often send JPEGs), sometimes frames are backed by video memory, there are multiple pixel formats, etc.
Mirroring as video is presented.
Your approach A is #1 above. However, if the driver reports no support for the respective mode, you can't mirror there.
Mirroring in the EVR renderer (#3) is possible in theory. The EVR uses Direct3D 9 and internally renders a surface (texture) into a scene, so it is absolutely possible to set up the 3D position of the surface so that it appears mirrored. The problem, however, is that the API design and its coordinate checks prevent you from passing mirroring arguments. Then, Direct3D 9 is pretty much deprecated, and DirectShow itself, and even the DirectShow/Media Foundation EVR, are in no way compatible with current Direct3D 11. Even if a capability to mirror in hardware exists, you may have a hard time consuming it through the legacy API.
Since you want a simple solution, you are limited to mirroring as the data is streamed through, i.e. #2. Even though this carries a reasonable performance impact, you don't need to rely on specific camera or video hardware support: you just swap the pixels in every frame and that's it.
As mentioned, the easiest way is to set up a SampleCB callback using the 24-bit RGB and/or NV12 pixel format. It depends on whatever else your application is doing, but with no further information I would say it is sufficient to implement 24-bit RGB: given the video frame data, you just go row by row and swap the three-byte pixels width/2 times. If the application pipeline allows it, you might want an additional code path to flip NV12, which is similar but avoids converting the video to RGB in the first place and so is a bit more efficient. If NV12 can't work, RGB24 would be the backup code path.
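To make the RGB24 path concrete, here is a minimal sketch of the in-place horizontal flip; the function name and parameters are placeholders, and in a Sample Grabber callback you would get the buffer from IMediaSample::GetPointer() and the width/height/stride from the connected media type:

#include <algorithm>
#include <cstdint>

// Horizontally flip a 24-bit RGB frame in place.
// 'stride' is the number of bytes per row (may include padding).
void FlipRGB24Horizontally(uint8_t* data, int width, int height, int stride)
{
    for (int y = 0; y < height; ++y)
    {
        uint8_t* row = data + y * stride;
        for (int x = 0; x < width / 2; ++x)
        {
            uint8_t* left  = row + 3 * x;
            uint8_t* right = row + 3 * (width - 1 - x);
            std::swap(left[0], right[0]); // B
            std::swap(left[1], right[1]); // G
            std::swap(left[2], right[2]); // R
        }
    }
}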
See also: Mirror effect with DirectShow.NET - I seem to have already explained something similar 8 years ago.
I want to build a depth camera that can pick out an image at a particular distance. I have already read the following links:
http://www.i-programmer.info/news/194-kinect/7641-microsoft-research-shows-how-to-turn-any-camera-into-a-depth-camera.html
https://jahya.net/blog/how-depth-sensor-works-in-5-minutes/
But I couldn't clearly understand which hardware is required and how to integrate it all together.
Thanks
Certainly, a depth sensor needs an IR sensor, just like the Kinect, the Asus Xtion and other available cameras that provide a depth or range image. However, Microsoft came up with machine learning techniques and algorithmic modifications; you can find the research here. Also, here is a video link showing a mobile camera that was modified to produce depth renderings. Some hardware changes might still be necessary to turn a standalone 2D camera into such a device, so I would suggest you also look at the hardware design of the devices already on the market.
One way or the other, you need two views of the same points to get depth. So search for depth sensors and examples, e.g. Kinect with ROS or OpenCV, or here.
You could also turn two camera streams into a point cloud, but that's another story.
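As an illustration of the two-view idea, here is a minimal sketch (assuming OpenCV 3+ and an already rectified stereo pair; the file names are placeholders) that computes a disparity map, which is inversely proportional to depth:

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);   // placeholder inputs
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

    // 64 disparity levels, 15x15 matching block -- tune for your cameras
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 15);

    cv::Mat disparity;
    bm->compute(left, right, disparity);   // 16-bit fixed-point disparities

    // depth = focal_length * baseline / disparity (after calibration)
    cv::Mat disp8;
    disparity.convertTo(disp8, CV_8U, 255.0 / (64 * 16));
    cv::imwrite("disparity.png", disp8);
    return 0;
}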
Here's what I know:
3D Cameras
RGBD and stereoscopic cameras are popular for these applications but are not always practical / available. I've prototyped with Kinects (v1, v2) and Intel cameras (R200, D435). Those are certainly still preferred today.
2D Cameras
IF YOU WANT TO USE RGB DATA FOR DEPTH INFO then you need an algorithm that does the math for each frame; try an RGB SLAM. A good algorithm will not process ALL the data every frame; it will process all the data once and then look for clues that indicate changes in your scene. A number of BIG companies have already done this (it's not that difficult if you have a big team with big money): think Google, Apple, MSFT, etc.
Good luck out there, make something amazing!
I would like to do skeletal tracking simultaneously from two Kinect cameras in SkeletalViewer and obtain the skeleton results. As I understand it, Nui_Init() only starts the processing threads for the first Kinect (which I suppose has index 0). However, could I have the two skeletal-tracking sessions run at the same time, so that I can output their results into two text files simultaneously?
(e.g. Kinect 0 outputs to "cam0.txt" while Kinect 1 outputs to "cam1.txt")
Does anyone have experience with such a case, or is anyone able to help?
Regards,
Eva
PS: I read the following in the Kinect SDK documentation:
If you are using multiple Kinect sensors, skeleton tracking works only on the first device that you initialize. To switch the device being used to track, uninitialize the old one and initialize the new one.
So is it possible to acquire the coordinates simultaneously? Or, even if I acquire them one by one, how should I address the sensors separately? (As I understand it, the index of the active Kinect will always be 0, so I can't differentiate them.)
I assume you are using the MS SkeletalViewer example. The problem with SkeletalViewer is that the display and the skeleton tracking are closely tied together, which makes it difficult to change.
Using multiple Kinect sensors should be possible; you just need to initialize all the sensors the same way. The best thing to do would be to define a sensor class to wrap the Kinect sensors. If you don't need the display, you can just write a new program. That's a bit of work, but not that much; you can probably get a fully working program for multiple sensors in less than 100 lines. If you need the display, you can rewrite the SkeletalViewer example to use your sensor class, but that's more tedious.
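As a starting point, here is a minimal sketch of such a wrapper class, assuming the Kinect for Windows SDK 1.x C++ API and an SDK version that actually allows skeleton tracking on more than one initialized sensor (the documentation note you quote suggests older versions do not); error handling is reduced and the class name is a placeholder:

#include <windows.h>
#include <NuiApi.h>
#include <cstdio>

struct SensorLogger
{
    INuiSensor* sensor = nullptr;
    FILE*       out    = nullptr;

    bool Init(int index)
    {
        if (FAILED(NuiCreateSensorByIndex(index, &sensor))) return false;
        if (FAILED(sensor->NuiInitialize(NUI_INITIALIZE_FLAG_USES_SKELETON))) return false;
        if (FAILED(sensor->NuiSkeletonTrackingEnable(nullptr, 0))) return false;

        char name[32];
        sprintf_s(name, "cam%d.txt", index);        // cam0.txt, cam1.txt, ...
        fopen_s(&out, name, "w");
        return out != nullptr;
    }

    void Poll()
    {
        NUI_SKELETON_FRAME frame = {0};
        if (FAILED(sensor->NuiSkeletonGetNextFrame(0, &frame))) return;

        for (int i = 0; i < NUI_SKELETON_COUNT; ++i)
        {
            const NUI_SKELETON_DATA& s = frame.SkeletonData[i];
            if (s.eTrackingState != NUI_SKELETON_TRACKED) continue;
            const Vector4& head = s.SkeletonPositions[NUI_SKELETON_POSITION_HEAD];
            fprintf(out, "%f %f %f\n", head.x, head.y, head.z);
        }
    }
};

int main()
{
    int count = 0;
    NuiGetSensorCount(&count);

    SensorLogger loggers[2];
    for (int i = 0; i < count && i < 2; ++i)
        loggers[i].Init(i);

    for (;;)                                        // poll both sensors in one loop,
        for (int i = 0; i < count && i < 2; ++i)    // or use one thread per sensor
            loggers[i].Poll();
}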
I have: a Linux laptop with a built-in camera, 2 other cameras, and an OpenCV-based program.
I need: to pass the device numbers of those two cameras to the program automatically.
In OpenCV you open a camera with videoCapture.open(n);, where videoCapture is a cv::VideoCapture object and n is the device number of the camera you wish to open. My program uses two webcams. That is where the OpenCV part of this question ends.
Usually this n is hard-coded or passed manually by the user. I want to write a script that automatically detects the device numbers of the two desired cameras. But the built-in camera is the obstacle.
When the laptop boots, Linux usually assigns device number 0 to the built-in camera. The two connected USB cameras get numbers 1 and 2 accordingly. But when you reboot the laptop, you may well find the numbers mixed up, e.g. built-in camera - 1, and 0 and 2 for the USB cameras. And anyway, I have to change the device numbers in the code whenever I switch platforms and run the program on a desktop that has no built-in camera.
I thought I could write an sh script that parses the output of lsusb | grep Logitech (the two USB webcams are Logitech ones) and derives the device number from the USB bus number, but the reboot issue gets in the way of that too.
I would appreciate any ideas and thoughts on what I could check out for the problem.
I think you should enumerate the USB devices--you can see details on how to do it here:
Enum USB devices in Linux/C++
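For example, here is a minimal sketch using libudev (link with -ludev; it assumes the ID_VENDOR/ID_MODEL properties are populated for your webcams) that lists each /dev/video* node with its vendor and model, so you can map a Logitech camera to the n you pass to videoCapture.open(n):

#include <libudev.h>
#include <cstdio>

int main()
{
    udev* ud = udev_new();
    udev_enumerate* en = udev_enumerate_new(ud);
    udev_enumerate_add_match_subsystem(en, "video4linux");  // only video devices
    udev_enumerate_scan_devices(en);

    udev_list_entry* entry;
    udev_list_entry_foreach(entry, udev_enumerate_get_list_entry(en))
    {
        udev_device* dev = udev_device_new_from_syspath(ud, udev_list_entry_get_name(entry));
        const char* node   = udev_device_get_devnode(dev);              // e.g. /dev/video1
        const char* vendor = udev_device_get_property_value(dev, "ID_VENDOR");
        const char* model  = udev_device_get_property_value(dev, "ID_MODEL");
        printf("%s  %s %s\n", node ? node : "?", vendor ? vendor : "?", model ? model : "?");
        udev_device_unref(dev);
    }
    udev_enumerate_unref(en);
    udev_unref(ud);
    return 0;
}

Alternatively, the symlinks under /dev/v4l/by-id/ encode the vendor, model and serial of each camera and stay stable across reboots, so a small script can resolve them to the current /dev/videoN numbers.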
For a previous question similar to yours, see: How to count cameras in OpenCV 2.3?
Right now, what I'm trying to do is to make a new GUI, essentially software using DirectX (more exactly, Direct3D), that displays streaming images from Axis IP cameras.
For the time being I figure the flow of the entire program will be like this:
1. Get the Axis program to fetch streaming images.
2. Pass the images to the Direct3D program.
3. Display the images on the screen.
Currently I have made a fairly basic Direct3D app that loads and displays video frames from AVI files (for testing). I don't know how to load images directly from video using DirectX, so I used OpenCV to save frames from the video and had DX upload them. Very slow.
Right now a few things are unclear to me:
1. How to get an Axis program that works in C++ (I'll look up examples later; probably no big deal).
2. How to upload images directly from the Axis IP camera program.
So, do you have any recommendations or suggestions on how to make my program work more efficiently? Anything, just let me know.
Well, you may find it faster to use DirectShow and add a custom renderer at the far end that copies the decompressed video data directly into a Direct3D texture.
It's well worth double-buffering that texture: have texture 0 displaying while texture 1 is being uploaded to, then swap the two over when a new frame is available (i.e. display texture 1 while uploading to texture 0).
This way you can decouple the video frame rate from the rendering frame rate, which makes dropped frames a little easier to handle.
I use in-place updates of Direct3D textures (using IDirect3DTexture9::LockRect) and it works very fast. Which part of your program is slow?
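For reference, a minimal sketch of that in-place update, assuming the texture was created with D3DUSAGE_DYNAMIC in D3DPOOL_DEFAULT and its format matches the incoming frame data:

#include <d3d9.h>
#include <cstring>

void UploadFrame(IDirect3DTexture9* tex, const unsigned char* frame,
                 int width, int height, int bytesPerPixel)
{
    D3DLOCKED_RECT lr;
    if (FAILED(tex->LockRect(0, &lr, NULL, D3DLOCK_DISCARD)))   // dynamic texture assumed
        return;

    const int rowBytes = width * bytesPerPixel;
    unsigned char* dst = static_cast<unsigned char*>(lr.pBits);
    for (int y = 0; y < height; ++y)                 // copy row by row: the surface
        memcpy(dst + y * lr.Pitch,                   // pitch may be larger than the
               frame + y * rowBytes, rowBytes);      // packed row size

    tex->UnlockRect(0);
}

Combined with the double-buffering suggested above, you would call this on the off-screen texture and swap the two texture pointers once the upload completes.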
For capturing images from Axis cams you can use the iPSi C++ library: http://sourceforge.net/projects/ipsi/
It can be used for capturing images and controlling camera zoom and rotation (where available).