I am creating an app in C++ (OpenGL) using a Kinect. Whenever we click in the OpenGL window, the function invoked is
void myMouseFunction( int button, int state, int mouseX, int mouseY )
{
}
But can we invoke it using the Kinect? Maybe we have to use the depth buffer for it, but how?
First: You don't "click in OpenGL", because OpenGL doesn't deal with user input. OpenGL is purely a rendering API. What you're referring to is probably a callback to be used with GLUT; GLUT is not part of OpenGL, but a free-standing framework that also does some user-input event processing.
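For reference, here is a minimal GLUT setup (assuming freeglut or classic GLUT headers) that registers the callback from the question; GLUT, not OpenGL, invokes it, and only for real mouse events:

#include <GL/glut.h>

void myMouseFunction(int button, int state, int mouseX, int mouseY)
{
    // Called by GLUT for actual mouse clicks; the Kinect never triggers this.
}

void display()
{
    glClear(GL_COLOR_BUFFER_BIT);
    glutSwapBuffers();
}

int main(int argc, char **argv)
{
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB);
    glutCreateWindow("kinect-demo");
    glutDisplayFunc(display);
    glutMouseFunc(myMouseFunction);   // registered with GLUT, not with OpenGL itself
    glutMainLoop();
    return 0;
}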
The Kinect does not generate input events. What the Kinect does is return a depth image of what it "sees". You need to process this depth image somehow. There are frameworks like OpenNI which process this depth image and translate it into gesture data or similar. You can then process that gesture data further and interpret it as user input.
In your tags you referred to "openkinect", the open source drivers for the Kinect. However, OpenKinect does no gesture extraction or interpretation; it only provides the depth image. You can of course perform simple tests on the depth data yourself, for example testing whether there's some object within the bounds of a defined volume and interpreting that as a sort of event.
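A rough sketch of that idea, assuming you receive raw 16-bit depth frames from the OpenKinect/libfreenect depth callback (the box coordinates and thresholds are arbitrary placeholders):

#include <cstdint>

// Returns true if enough depth pixels fall inside a screen-space box and
// depth range; you could interpret this as a "hand in trigger volume" event.
bool objectInTriggerVolume(const uint16_t *depth, int width, int height)
{
    const int x0 = 200, x1 = 280, y0 = 150, y1 = 230;   // pixel box
    const uint16_t zNear = 600, zFar = 800;             // raw depth units

    int hits = 0;
    for (int y = y0; y < y1 && y < height; ++y)
        for (int x = x0; x < x1 && x < width; ++x)
        {
            const uint16_t z = depth[y * width + x];
            if (z >= zNear && z <= zFar)
                ++hits;
        }
    return hits > 50;   // require a minimum pixel count to suppress noise
}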
I think you are confusing what the Kinect really does. The Kinect feeds depth and video data to your computer, which will then have to process it. OpenKinect only does very minimal processing for you -- no skeleton tracking. Skeleton tracking allows you to get a 3D representation of where each of your user's joints is.
If you're just doing some random hacking, you could perhaps switch to the KinectSDK -- with the caveat that you will only be able to develop and deploy on Windows.
The Kinect SDK works with OpenGL and C++ too, and it gives you the user's "skeleton".
OpenNI -- which is multiplatform and free as in freedom -- also supports skeleton tracking, but I haven't used it so I can't recommend it.
After you have some sort of skeleton tracking up, you can focus on the user's hands and process their movements to get your "mouse clicks" working. This will not use GLUT's mouse handlers, though.
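A hedged sketch of that last step: once some skeleton tracker gives you a normalized hand position each frame, you can run your own "click" heuristic instead of relying on GLUT's mouse callback. The HandSample struct and the push threshold below are illustrative, not part of any SDK:

// Hand position in normalized screen coordinates plus depth in metres.
struct HandSample { float x, y, z; };

struct HandClickDetector
{
    float restingZ = 0.0f;
    bool  pressed  = false;

    // Returns true on the frame where a forward "push" is first detected.
    bool update(const HandSample &hand)
    {
        if (restingZ == 0.0f)
            restingZ = hand.z;                                   // first sample
        const bool nowPressed = (restingZ - hand.z) > 0.15f;     // pushed ~15 cm forward
        const bool clicked = nowPressed && !pressed;
        pressed = nowPressed;
        if (!nowPressed)
            restingZ = 0.95f * restingZ + 0.05f * hand.z;        // slow drift correction
        return clicked;
    }
};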
Related
I'm creating an augmented reality application in OpenGL where I want to augment a video stream captured by a Kinect with virtual objects. I found some running code using fixed-function-pipeline OpenGL that creates a texture with glTexImage2D, fills it with the image data, and then draws a GL_QUADS primitive with glTexCoord2f to fill the screen.
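Simplified from memory, that code does roughly this (variable names here are placeholders):

// One-time setup: create the texture that will hold each video frame.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, pixels);

// Every frame: re-upload the pixel data and draw a screen-filling quad.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGB, GL_UNSIGNED_BYTE, pixels);
glEnable(GL_TEXTURE_2D);
glBegin(GL_QUADS);
glTexCoord2f(0.f, 1.f); glVertex2f(-1.f, -1.f);
glTexCoord2f(1.f, 1.f); glVertex2f( 1.f, -1.f);
glTexCoord2f(1.f, 0.f); glVertex2f( 1.f,  1.f);
glTexCoord2f(0.f, 0.f); glVertex2f(-1.f,  1.f);
glEnd();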
I'm looking for an optimized solution that uses modern, shader-based OpenGL only and is also capable of handling HD video streams.
I guess what I hope for as an answer to my question is a list of possibilities for rendering a camera video stream in OpenGL, from which I can select the one that best fits my needs.
I am trying to build a program that takes just the hand inputs from the Kinect.
I need to acquire three things:
-the streaming Kinect depth video with OpenGL output data drawn on top of it
-recognition of just two simple hand gestures, open hand and closed fist; I will build some function to reduce each hand's state to a Boolean
-left and right hand positions; if it's possible to track more than two hands, that would be great
Basically, I want to do a click-and-drag mouse operation with open- and closed-hand motions from the Kinect. Let's start with just one hand; if it's possible to do more than two hands, I will learn that later.
From what I have read so far, the Kinect can do this easily without any extra libraries, so I should be able to build my application with just the Kinect library and OpenGL.
I heard there are tons of examples for this online, but all I have found so far are in C#, not C++. The other components of my program are in C++ only, and I want to stay with C++ if possible.
There are essentially two layers:
Interaction Stream (C++ or managed)
Interaction Controls (managed only, WPF-specific)
The WPF controls are implemented in terms of the interaction stream.
If you are using a UI framework other than WPF, you will need to do the following:
Implement the "interaction client" interface. This interface has a
single method, GetInteractionInfoAtLocation. This method will be
called repeatedly by the interaction stream as it tracks the user's
hand movements. Each time it is called, it is your responsibility to
return the "interaction info" (InteractionInfo in managed,
NUI_INTERACTION_INFO in C++) for the given user, hand, and position.
Essentially, this is how the interaction stream performs hit-testing
on the controls within your user interface.
Create an instance of the interaction stream, supplying it a
reference to your interaction client implementation.
Start the Kinect sensor's depth and skeleton streams.
For each depth and skeleton frame produced by the sensor streams,
pass the frame's data to the appropriate method (ProcessDepth or
ProcessSkeleton) of the interaction stream. As the interaction stream
processes the input frames from the sensor, it will produce
interaction frames for your code to consume. In C++, call the
interaction stream's GetNextFrame method to retrieve each such frame.
In managed code, you can either call OpenNextFrame, or subscribe to
the InteractionFrameReady event.
Read the data from each interaction frame to find out what the user
is doing. Each frame has a timestamp and a collection of user info
structures, each of which has a user tracking ID and a collection of
hand info structures, which provide information about each hand's
position, state, and grip/ungrip events
You can find a complete sample here.
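A hedged C++ sketch of those steps follows. The interface and struct names come from the KinectInteraction header of the Kinect for Windows Developer Toolkit; check the exact signatures against your toolkit version, and note that the IUnknown boilerplate, stream creation, and error handling are omitted:

#include <NuiApi.h>
#include <KinectInteraction.h>

// Step 1: the interaction client used for hit-testing hand positions.
class InteractionClient : public INuiInteractionClient
{
public:
    HRESULT STDMETHODCALLTYPE GetInteractionInfoAtLocation(
        DWORD userTrackingId, NUI_HAND_TYPE handType,
        FLOAT x, FLOAT y, NUI_INTERACTION_INFO *pInfo) override
    {
        pInfo->IsPressTarget = TRUE;   // treat the whole window as pressable
        pInfo->IsGripTarget  = TRUE;   // and grippable (enables grip/ungrip events)
        return S_OK;
    }
    // AddRef/Release/QueryInterface omitted for brevity.
};

// Steps 4-5: feed sensor frames in, read interaction frames out.
void onSensorFrames(INuiInteractionStream *pStream,
                    BYTE *depthData, UINT depthSize,
                    NUI_SKELETON_FRAME &skeletonFrame,
                    Vector4 accelerometer, LARGE_INTEGER timestamp)
{
    pStream->ProcessDepth(depthSize, depthData, timestamp);
    pStream->ProcessSkeleton(NUI_SKELETON_COUNT, skeletonFrame.SkeletonData,
                             &accelerometer, timestamp);

    NUI_INTERACTION_FRAME frame = {};
    if (SUCCEEDED(pStream->GetNextFrame(0, &frame)))
    {
        for (const NUI_USER_INFO &user : frame.UserInfos)
        {
            if (user.SkeletonTrackingId == 0)
                continue;   // this slot is not tracking anyone
            for (const NUI_HANDPOINTER_INFO &hand : user.HandPointerInfos)
            {
                // hand.X / hand.Y give the hand position;
                // hand.HandEventType reports grip and grip-release events.
            }
        }
    }
}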
I am extending an existing OpenGL project with new functionality.
I can play a video stream using OpenGL with FFMPEG.
Some objects are moving in the video stream. The co-ordinates of those objects are known to me.
I need to show tracking of motion for that object, like continuously drawing a point or rectangle around the object as it moves on the screen.
Any idea how to start with it?
Are you sure you want to use OpenGL for this task? Usually, for computer vision algorithms like motion tracking, one uses OpenCV. In that case you could simply use the drawing functions of OpenCV, as documented here.
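For example, something like this (a minimal sketch; the frame and the object's bounding box are assumed to come from your existing decoding and coordinate data):

#include <opencv2/opencv.hpp>

// Draw a tracking marker around an object whose bounding box is already known.
void drawTrackedObject(cv::Mat &frame, const cv::Rect &objectBox)
{
    cv::rectangle(frame, objectBox, cv::Scalar(0, 255, 0), 2);    // green box
    const cv::Point centre = (objectBox.tl() + objectBox.br()) * 0.5;
    cv::circle(frame, centre, 4, cv::Scalar(0, 0, 255), -1);      // filled centre dot
}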
If you are using OpenGL you might have a look at this question because in this case I guess you draw the frames as textures.
I've read a lot of posts describing how people use AVAssetReader or AVPlayerItemVideoOutput to get video frames as raw pixel data from a video file, which they then upload to an OpenGL texture. However, this seems to create the needless step of decoding the video frames with the CPU (as opposed to the graphics card), as well as creating unnecessary copies of the pixel data.
Is there a way to let AVFoundation own all aspects of the video playback process, but somehow also provide access to an OpenGL texture ID it created, which can just be drawn into an OpenGL context as necessary? Has anyone come across anything like this?
In other words, something like this pseudo code:
initialization:
open movie file, providing an opengl context;
get opengl texture id;
every opengl loop:
draw texture id;
If you use the Video Decode Acceleration framework on OS X, it gives you a CVImageBufferRef when you "display" decoded frames, which you can call CVOpenGLTextureGetName(...) on to get a native texture handle for OpenGL.
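A hedged sketch of what that looks like on the OpenGL side, assuming you already have a CVOpenGLTextureRef for the decoded frame (for example, obtained through a CVOpenGLTextureCache):

#include <CoreVideo/CoreVideo.h>
#include <OpenGL/gl.h>

void drawDecodedFrame(CVOpenGLTextureRef frameTexture)
{
    const GLenum target = CVOpenGLTextureGetTarget(frameTexture); // often GL_TEXTURE_RECTANGLE_ARB
    const GLuint name   = CVOpenGLTextureGetName(frameTexture);   // native GL texture handle

    glEnable(target);
    glBindTexture(target, name);
    // ... draw your textured quad here ...
    glBindTexture(target, 0);
    glDisable(target);
}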
This of course is lower level than your question, but it is definitely possible for certain video formats. This is the only technique that I have personal experience with. However, I believe QTMovie also has similar functionality at a much higher level, and would likely provide the full range of features you are looking for.
I wish I could comment on AVFoundation, but I have not done any development work on OS X since 10.6. I imagine the process ought to be similar, though, since it should be layered on top of CoreVideo.
I am looking for a tutorial or documentation on how to overlay Direct3D on top of a video (webcam) feed in DirectShow.
I want to provide a virtual webcam (a virtual device that looks like a webcam to the system, i.e. so that it can be used wherever a normal webcam could be used, like IM video chats).
I want to capture a video feed from a webcam attached to the computer.
I want to overlay a 3d model on top of the video feed and provide that as the output.
I had planned on doing this in DirectShow only because it looked possible there. If you have any ideas about possible alternatives, I am all ears.
I am writing C++ using Visual Studio 2008.
Use the Video Mixing Renderer Filter to render the video to a texture, then render it to the scene as a full screen quad. After that you can render the rest of the 3D stuff on top and then present the scene.
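A hedged sketch of wiring that up, using the VMR-9 in renderless mode with a custom allocator-presenter (the allocator-presenter itself, where each frame ends up on a D3D texture, is omitted):

#include <dshow.h>
#include <d3d9.h>
#include <vmr9.h>
#include <atlbase.h>

HRESULT addVmr9ToGraph(IGraphBuilder *pGraph, IVMRSurfaceAllocator9 *pMyAllocator)
{
    CComPtr<IBaseFilter> pVmr;
    HRESULT hr = pVmr.CoCreateInstance(CLSID_VideoMixingRenderer9);
    if (FAILED(hr)) return hr;
    hr = pGraph->AddFilter(pVmr, L"VMR9");
    if (FAILED(hr)) return hr;

    // Put the renderer into renderless mode so we control the D3D surfaces.
    CComPtr<IVMRFilterConfig9> pConfig;
    hr = pVmr.QueryInterface(&pConfig);
    if (FAILED(hr)) return hr;
    hr = pConfig->SetRenderingMode(VMR9Mode_Renderless);
    if (FAILED(hr)) return hr;

    // Hand our allocator-presenter to the VMR; it will receive each decoded
    // frame as an IDirect3DSurface9 that can be copied onto a texture and
    // drawn as a full-screen quad before the rest of the 3D scene.
    CComPtr<IVMRSurfaceAllocatorNotify9> pNotify;
    hr = pVmr.QueryInterface(&pNotify);
    if (FAILED(hr)) return hr;
    return pNotify->AdviseSurfaceAllocator(0xDEADBEEF, pMyAllocator);
}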
Are you after a filter that sits somewhere in the graph that renders D3D stuff over the video?
If so, then you need to look at deriving a filter from CTransformFilter. Something like the EZRGB example will give you something to work from. Basically, once you have this sorted, your filter needs to do the Direct3D rendering and, literally, insert the resulting image into the DirectShow stream. Alas, you can't render Direct3D directly into a DirectShow video frame, so you will have to do your rendering, then lock the front/back buffer, copy the 3D data out, and write it into the DirectShow stream. This isn't ideal, as it WILL be quite slow (compared to standard D3D rendering), but it's the best you can do, to my knowledge.
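A hedged sketch of that Transform override; the D3D members (m_pRenderSurface, m_width, m_height, m_stride) and the RenderSceneOverVideo helper are placeholders for your own rendering code, and it assumes a lockable render target and a format whose stride matches the media sample:

HRESULT CMyOverlayFilter::Transform(IMediaSample *pIn, IMediaSample *pOut)
{
    BYTE *pSrc = nullptr, *pDst = nullptr;
    pIn->GetPointer(&pSrc);
    pOut->GetPointer(&pDst);

    // 1. Upload the incoming frame as the D3D background and render the
    //    3D model on top of it (placeholder helper, not a DirectShow API).
    RenderSceneOverVideo(pSrc, m_width, m_height);

    // 2. Lock the rendered surface and copy the pixels back into the
    //    DirectShow stream, row by row.
    D3DLOCKED_RECT lr = {};
    HRESULT hr = m_pRenderSurface->LockRect(&lr, nullptr, D3DLOCK_READONLY);
    if (FAILED(hr)) return hr;
    for (long y = 0; y < m_height; ++y)
        memcpy(pDst + y * m_stride,
               static_cast<BYTE *>(lr.pBits) + y * lr.Pitch,
               m_stride);
    m_pRenderSurface->UnlockRect();

    pOut->SetActualDataLength(pIn->GetActualDataLength());
    return S_OK;
}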
Edit: In light of your update, what you want is quite complicated. You need to create a source filter to begin with (you should look at the CPushSource example). Once you've done that, you will need to register it as a video capture source. Basically, you do this with the IFilterMapper2::RegisterFilter call in your DllRegisterServer function, passing CLSID_VideoInputDeviceCategory as the category. Adding the Direct3D will be as I stated above.
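A hedged sketch of that registration step, following the usual DirectShow base-class pattern (CLSID_MyVirtualCam and the filter name are placeholders defined elsewhere in your project):

#include <streams.h>   // DirectShow baseclasses
#include <atlbase.h>

STDAPI DllRegisterServer()
{
    // Standard base-class registration first.
    HRESULT hr = AMovieDllRegisterServer2(TRUE);
    if (FAILED(hr)) return hr;

    CComPtr<IFilterMapper2> pMapper;
    hr = pMapper.CoCreateInstance(CLSID_FilterMapper2);
    if (FAILED(hr)) return hr;

    REGFILTER2 rf2 = {};
    rf2.dwVersion = 1;
    rf2.dwMerit   = MERIT_DO_NOT_USE;

    // Register the source filter under the video capture category so other
    // applications enumerate it like a real webcam.
    return pMapper->RegisterFilter(CLSID_MyVirtualCam, L"My Virtual Camera",
                                   nullptr, &CLSID_VideoInputDeviceCategory,
                                   nullptr, &rf2);
}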
All round, you want to spend as much time as you can reading through the DirectShow samples in the Windows SDK and start modifying them to do what YOU want them to do.