webcam "still pin" capture - c++

I am trying to replicate the image quality that is achieved when using the Logitech webcam driver to capture a still image.
The Logitech forum has several threads on the subject; unfortunately, they all point to a website which is down, such as here.
I am currently able to use DirectShow and a frame grabber to capture images, but they are nowhere near the quality of the snapshot button. Could anyone point me in the direction of a working C++/C example of a snapshot button?
After some research I found this about the Still Image pin; is this the correct method for implementing a snapshot-like button?
The webcam I am using is the C910, which is capable of taking 10-megapixel still images.
Thanks for any help.

My best guess, which I'll use to gather some upvotes (or downvotes), and which will be valid until someone disassembles the application or the driver, is:
Something like http://www1.idc.ac.il/toky/videoproc-07/projects/superres/srproject.html was used at the application level to enhance the resolution of the images collected as video.
Rationale: having watched a friend pull his hair out over far simpler things inside a driver, I can only imagine how difficult it would be to code such an algorithm INSIDE the driver with an extremely limited set of libraries.
I won't mind taking downvotes here, since I'm quite interested in this subject myself, but please back them up with some information on the subject.

I have not had a chance to deal with this directly; however, I suspect that the high-resolution images captured from the camera are the result of taking a sequence of images followed by "superresolution" post-processing. This functionality might be unavailable via the DirectShow API, since it mostly covers video streaming. However, the camera driver might also expose it via the Windows Image Acquisition API, where you might have better luck taking oversampled snapshots of the quality you are looking for.
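For completeness: if the driver does expose a still pin, triggering it from C++ goes through IAMVideoControl. The following is an untested sketch, not Logitech's method. It assumes pCap is the webcam's capture filter in an already-built graph, pBuilder is the ICaptureGraphBuilder2 used to build it, and something downstream (e.g. a sample grabber) is connected to the still pin to receive the frame; error handling is stripped for brevity.

#include <dshow.h>

// Sketch: fire the still pin on an already-running capture graph.
// 'pCap' (IBaseFilter*) and 'pBuilder' (ICaptureGraphBuilder2*) are assumed.
void trigger_snapshot(IBaseFilter *pCap, ICaptureGraphBuilder2 *pBuilder)
{
    IAMVideoControl *pVideoControl = NULL;
    pCap->QueryInterface(IID_IAMVideoControl, (void**)&pVideoControl);

    // Locate the still pin on the capture filter.
    IPin *pStillPin = NULL;
    pBuilder->FindPin(pCap, PINDIR_OUTPUT, &PIN_CATEGORY_STILL,
                      &MEDIATYPE_Video, FALSE, 0, &pStillPin);

    // Fire the trigger: the next frame delivered on the still pin
    // is the snapshot, received by whatever filter is connected to it.
    pVideoControl->SetMode(pStillPin, VideoControlFlag_Trigger);

    pStillPin->Release();
    pVideoControl->Release();
}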

Related

Object Tracking in h.264 compressed video

I am working on a project that requires me to detect and track a human in a live video from a webcam connected to a Beagleboard xm.
I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and do the same task on an h.264/MPEG-4 compressed video, as that would greatly reduce the computational overhead.
I have read many research papers but failed to find any software platform or library that I can use to analyze and process h.264 compressed video.
I would be thankful if someone could suggest a library for h.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead at the brick in the previous picture that happened to be illuminated about the same way. The bricks are enough alike that the closest match will depend more on illumination than on the brick itself.
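For what it's worth, one concrete way to get at those motion vectors without writing your own bitstream parser is FFmpeg's libavcodec, which can export them as frame side data. The sketch below is untested and abridged: demuxer setup and error handling are omitted, and dec_ctx, pkt, and frame stand for an already-allocated codec context, a packet read from the stream, and an allocated AVFrame.

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/motion_vector.h>
}
#include <cstdio>

// Decode a packet and print the motion vectors attached to each frame.
// Requires the decoder to have been opened with export_mvs enabled (below).
void decode_and_dump_mvs(AVCodecContext *dec_ctx, AVPacket *pkt, AVFrame *frame)
{
    avcodec_send_packet(dec_ctx, pkt);
    while (avcodec_receive_frame(dec_ctx, frame) == 0) {
        AVFrameSideData *sd =
            av_frame_get_side_data(frame, AV_FRAME_DATA_MOTION_VECTORS);
        if (!sd)
            continue;
        const AVMotionVector *mvs = (const AVMotionVector *)sd->data;
        size_t count = sd->size / sizeof(*mvs);
        for (size_t i = 0; i < count; i++)
            printf("block %dx%d moved (%d,%d) -> (%d,%d)\n",
                   mvs[i].w, mvs[i].h,
                   mvs[i].src_x, mvs[i].src_y,
                   mvs[i].dst_x, mvs[i].dst_y);
    }
}

// Before avcodec_open2(), the export must be switched on:
//     AVDictionary *opts = NULL;
//     av_dict_set(&opts, "flags2", "+export_mvs", 0);
//     avcodec_open2(dec_ctx, decoder, &opts);

Grouping blocks with similar (dst - src) displacements then gives you the car-shaped blob described above, without fully decoding to pixels yourself.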
You may be able, eventually, to parse h.264 and determine that it contains an object, but this will not be the "object tracking" you're looking for. OpenCV is excellent software, and this is what it does best. Have you considered scaling the video down to a smaller resolution for easier analysis by OpenCV?
I think you are highly overestimating the computing power of this $45 computer. Object recognition and tracking is VERY hard, computationally speaking. I would start by seeing how many frames per second your board can track and optimize from there. Start by looking at where your bottlenecks are; you may be better off processing raw video instead of having to decode h.264 video first. Then again, RAW video takes a LOT of RAM, and processing through it takes a LOT of CPU.
Minimize the overhead from decoding video and minimize RAM overhead by scaling down the video before analysis, but in the end, you're asking a LOT from a 1 GHz, 32-bit ARM processor.
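As a small illustration of the scale-down advice (the capture index and scale factor here are hypothetical, not from the original posts), the shrinking should happen before any analysis stage:

#include <opencv2/opencv.hpp>

int main()
{
    cv::VideoCapture cap(0);          // assumed: webcam at index 0
    if (!cap.isOpened())
        return 1;

    cv::Mat frame, small, gray;
    while (cap.read(frame)) {
        // Shrink first: a quarter-size frame means roughly 1/16th
        // of the pixels for every subsequent processing stage.
        cv::resize(frame, small, cv::Size(), 0.25, 0.25, cv::INTER_AREA);
        cv::cvtColor(small, gray, cv::COLOR_BGR2GRAY);

        // ... run detection/tracking on 'gray' here ...
    }
    return 0;
}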
FFmpeg is a very old library that is not well supported nowadays. It has very limited capabilities in terms of processing and object tracking in h.264 compressed video, and most of its commands are outdated.
The best thing would be to study h.264 thoroughly and then try to implement your own API in some language like Java or C#.

DirectShow webcam recording

I need to use DirectShow (C++) for recording a webcam and saving the data to a file.
I really don't know how DirectShow works; this is for an internship (a "stage"), and we didn't study it at school.
I think the best way to implement this could be:
List the video devices connected to the computer
Select the correct camera (there will be only one)
Retrieve the video
Save it to a file
Now there are two problems:
Where can I find a good reference book or how do I start?
The saved video shouldn't be too big, does DirectShow provide a way to compress it?
I won't use OpenCV because sometimes it doesn't work properly (it doesn't find the camera).
Are there any high-level wrappers that could help?
EDIT: the program won't have a window; it will run in the background, called from a DLL.
Where can I find a good reference book or how do I start?
DirectShow introduction material
The saved video shouldn't be too big, does DirectShow provide a way to compress it?
Yes, it provides the capability to attach codecs, which need to be installed on the system. These are typically third-party codecs (for reasons beyond the scope of a brief answer). You might want to record into Windows Media files so as not to depend on third-party codecs. See more on MSDN: Choosing a Compression Filter.
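For orientation, the skeleton of such a capture-to-AVI graph looks roughly like the sketch below. It is untested and heavily abridged: COM initialization, device enumeration (ICreateDevEnum over CLSID_VideoInputDeviceCategory), error checks, and cleanup are all omitted, and pCap stands for the webcam's capture filter.

#include <dshow.h>

// Sketch: record a webcam ('pCap', already bound from the device
// enumerator) into an AVI file. COM is assumed to be initialized.
void record(IBaseFilter *pCap)
{
    IGraphBuilder *pGraph = NULL;
    ICaptureGraphBuilder2 *pBuild = NULL;

    CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER,
                     IID_IGraphBuilder, (void**)&pGraph);
    CoCreateInstance(CLSID_CaptureGraphBuilder2, NULL, CLSCTX_INPROC_SERVER,
                     IID_ICaptureGraphBuilder2, (void**)&pBuild);
    pBuild->SetFiltergraph(pGraph);

    pGraph->AddFilter(pCap, L"Capture");

    // Create the file-writing section of the graph (AVI mux + file writer).
    IBaseFilter *pMux = NULL;
    pBuild->SetOutputFileName(&MEDIASUBTYPE_Avi, L"C:\\capture.avi",
                              &pMux, NULL);

    // A compression filter would be inserted as the 4th argument here.
    pBuild->RenderStream(&PIN_CATEGORY_CAPTURE, &MEDIATYPE_Video,
                         pCap, NULL, pMux);

    IMediaControl *pControl = NULL;
    pGraph->QueryInterface(IID_IMediaControl, (void**)&pControl);
    pControl->Run();
    // ... record for a while, then pControl->Stop() and release everything ...
}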

Intercepting video frames from game

I would like to grab video frames (images) from a game that is currently running on the PC.
XSplit Broadcaster has such functionality. It somehow lists the processes that are actually video games and allows grabbing video frames from them.
As far as I understand, this can be accomplished by enumerating the Direct3D surfaces that exist at the moment and grabbing the picture from them.
Am I correct? What is the solution for OpenGL games then?
Have you checked out glReadPixels()? I have used it before. It is a little slow though.
Try
glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer);
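For context, glReadPixels() reads from the framebuffer of the current GL context, so for someone else's game this code has to run inside that process (e.g. injected and called from a hooked buffer swap). A minimal, untested sketch, assuming the frame dimensions are known:

#include <GL/gl.h>
#include <vector>

// Read back the current framebuffer as tightly packed RGB bytes.
// Must be called on a thread with a current GL context.
std::vector<unsigned char> grab_frame(int width, int height)
{
    std::vector<unsigned char> buffer(width * height * 3);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);   // rows are tightly packed
    glReadPixels(0, 0, width, height, GL_RGB, GL_UNSIGNED_BYTE, buffer.data());
    // Note: rows arrive bottom-up; flip vertically before saving as an image.
    return buffer;
}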
apitrace seems able to capture frames using Ye Olde LD_PRELOAD Tricke.

How to make RGBDemo work with non-Kinect stereo cameras?

I was trying to get RGBDemo (mostly the reconstructor) working with 2 Logitech stereo cameras, but I could not figure out how to do it.
I noticed that there is an OpenCV grabber in the nestk library, and its header file is included in reconstructor.cpp. Yet when I try "rgbd-viewer --camera-id 0", it keeps looking for a Kinect.
My questions:
1. Is RGBDemo only working with kinect so far?
2. If RGBDemo can work with non-kinect stereo cameras, how do I do that?
3. If I need to write my own implementation for non-kinect stereo cameras, any suggestion on how to start?
Thanks in advance.
If you want to do it with non-Kinect cameras, you don't even need stereo. There are algorithms now that can determine whether two images' viewpoints are sufficiently different that they can be used as if they were taken by a stereo camera. In fact, they use images from different cameras found on the internet and reconstruct 3D models of famous places. I can write you a tutorial on how to get it working; I've been meaning to do so. The software is called Bundler. Along with Bundler, people often also use CMVS and PMVS: CMVS preprocesses the images for PMVS, and PMVS generates the dense point clouds.
BUT! I highly recommend that you don't go this route. There is so much less information in 2D images that reconstructing the 3D model is very hard, and it ends up making a lot of mistakes or not working at all. Although Bundler and PMVS are awesome compared to previous software, the stuff you can do with a Kinect is on a whole other level.
Using a Kinect will only cost you $80 for the Kinect on eBay (or $99 on Amazon) and another $5 for the power adapter. So I'd highly recommend this route. The Kinect provides much more information for the algorithm to work with than 2D images do, making it much more effective, reliable and fast. In fact, it can take hours to process images with Bundler and PMVS, whereas with the Kinect I made a model of my desk in just a few seconds! It truly rocks!

Using DirectShow to capture frames and OpenCV to Process

I have made two different solutions for Video-to-Image Capturing and was wondering if I could intertwine the best of both worlds. I am currently using DirectShow to load in an AVI file and capture images. However, DirectShow's lack of image processing capabilities and the need to make additional filters have stopped me dead in my tracks.
I then turned to OpenCV.
It has all the image processing functions I need, but it has trouble loading the videos that the DirectShow solution was able to retrieve. Are there any tutorials online about this process or anything close to it? Thanks for any advice.
Yes, here is a link to an article: http://opencv.willowgarage.com/wiki/DirectShow
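The usual bridge between the two is DirectShow's Sample Grabber: let DirectShow handle decoding, then wrap each raw buffer in an OpenCV image for processing. Below is a hypothetical, untested sketch of the hand-off; it assumes a graph already configured with an ISampleGrabber negotiated to 24-bit RGB, with buffering enabled and the frame dimensions known.

#include <dshow.h>
#include <qedit.h>                 // ISampleGrabber (deprecated header)
#include <opencv2/opencv.hpp>
#include <vector>

// Pull the latest frame out of a configured sample grabber and wrap it
// in a cv::Mat. 'pGrabber' delivers 24-bit RGB frames of width x height.
cv::Mat grab_as_mat(ISampleGrabber *pGrabber, int width, int height)
{
    long size = 0;
    pGrabber->GetCurrentBuffer(&size, NULL);          // query buffer size

    std::vector<BYTE> buf(size);
    pGrabber->GetCurrentBuffer(&size, (long*)buf.data());

    // Wrap without copying, then clone so the Mat owns its pixels.
    cv::Mat frame(height, width, CV_8UC3, buf.data());
    cv::Mat result = frame.clone();
    cv::flip(result, result, 0);   // DIB buffers are stored bottom-up
    return result;
}

From there, every OpenCV function is available on the returned Mat, while DirectShow keeps handling the formats OpenCV's own loader struggles with.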