DirectX render video - c++

Can someone explain how intro company logos and similar video sequences are presented in modern 3D games?
Is it a pre-rendered video sequence, or is it rendered using DirectX to preserve quality, resolution and aspect ratio?
I understand that it may vary from company to company, but I am looking for a "best practice" since I can't find one in the documentation.

They are pretty much always videos. Showing a video at a different aspect ratio almost always comes down to cropping it, which is the same thing that happens to the field of view when a 3D scene is rendered at a different aspect ratio.
Also:
1) they're not always high resolution (sometimes, when playing 1080p, you can see that the video is actually 720p or much smaller)
2) to avoid quality issues, they can just record at a high resolution and scale down for smaller resolutions (it's hard to tell the difference between a scaled-down video and a perfect fit).
On a side note, some games do use real-time 3D scenes here, but that means possible lag (the intro plays before the player has adjusted the video settings), less complex visual effects and lower graphics quality (rendering an animated movie is not the same as rendering a game; we're talking a million times longer in the most extreme cases).
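Scaling a pre-rendered video to whatever resolution the player runs at usually means fitting it into the window without changing its aspect ratio. The following is a minimal sketch of that calculation, not taken from any particular engine; `Rect` and `fitVideoToWindow` are hypothetical names, and actually drawing the frame into the resulting rectangle (for example as a textured quad) is left out.

```cpp
#include <algorithm>
#include <cassert>

// Destination rectangle, in window pixels, for a video frame.
struct Rect {
    int x, y, width, height;
};

// Scale a videoW x videoH frame to fit inside a windowW x windowH window
// without changing its aspect ratio. Any unused area (letterbox or
// pillarbox bars) is split evenly between the two sides.
Rect fitVideoToWindow(int videoW, int videoH, int windowW, int windowH) {
    assert(videoW > 0 && videoH > 0 && windowW > 0 && windowH > 0);
    // Use the smaller scale factor so the whole frame stays visible.
    const double scale = std::min(static_cast<double>(windowW) / videoW,
                                  static_cast<double>(windowH) / videoH);
    Rect r;
    r.width  = static_cast<int>(videoW * scale + 0.5);  // round to nearest
    r.height = static_cast<int>(videoH * scale + 0.5);
    r.x = (windowW - r.width) / 2;
    r.y = (windowH - r.height) / 2;
    return r;
}
```

For a 1280x720 video in a 1920x1200 window this gives a 1920x1080 rectangle with 60-pixel bars above and below.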

Related

Object Tracking in h.264 compressed video

I am working on a project that requires me to detect and track a human in a live video from a webcam connected to a Beagleboard xm.
I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and do the same task on the H.264/MPEG-4 compressed video, as that would greatly reduce the computational overhead.
I have read many research papers but failed to find any software platform or library that I can use to analyze and process H.264 compressed video.
I would be thankful if someone could suggest a library for H.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, the motion vector for a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead point at the brick in the previous picture that happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than on the brick itself.
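To make the idea concrete, here is a rough sketch of the grouping step only. It assumes the per-macro-block motion vectors have already been extracted from the stream by some decoder (that part is not shown), treats the median vector as the background or camera motion, and returns the bounding box of all blocks that deviate from it. The names `MotionVector`, `BlockRegion`, `medianOf` and `findMovingRegion` are made up for illustration, and the caveats above about unreliable vectors still apply.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct MotionVector { int dx; int dy; };

// Bounding box, in macro-block units, of the blocks that move differently
// from the background. `found` is false when no block qualifies.
struct BlockRegion {
    bool found;
    int minCol, minRow, maxCol, maxRow;
};

// Median of a list of values (taken by copy so the caller's data is untouched).
int medianOf(std::vector<int> values) {
    assert(!values.empty());
    auto mid = values.begin() + values.size() / 2;
    std::nth_element(values.begin(), mid, values.end());
    return *mid;
}

// mvs holds one vector per macro-block in row-major order (cols * rows).
// A block counts as "moving" when its vector differs from the median
// (background) vector by more than `threshold` in L1 distance.
BlockRegion findMovingRegion(const std::vector<MotionVector>& mvs,
                             int cols, int rows, int threshold) {
    assert(cols > 0 && rows > 0);
    assert(mvs.size() == static_cast<std::size_t>(cols) * rows);

    std::vector<int> xs, ys;
    for (const MotionVector& mv : mvs) {
        xs.push_back(mv.dx);
        ys.push_back(mv.dy);
    }
    const int bgX = medianOf(xs);
    const int bgY = medianOf(ys);

    BlockRegion region{false, 0, 0, 0, 0};
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const MotionVector& mv = mvs[static_cast<std::size_t>(r) * cols + c];
            if (std::abs(mv.dx - bgX) + std::abs(mv.dy - bgY) <= threshold) {
                continue;  // moves with the background
            }
            if (!region.found) {
                region = BlockRegion{true, c, r, c, r};
            } else {
                region.minCol = std::min(region.minCol, c);
                region.maxCol = std::max(region.maxCol, c);
                region.minRow = std::min(region.minRow, r);
                region.maxRow = std::max(region.maxRow, r);
            }
        }
    }
    return region;
}
```

The resulting region is only a candidate; a pixel-domain detector would still have to look at that part of the decoded picture to decide what the object is.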
You may eventually be able to parse the H.264 stream and determine that it contains an object, but this will not be the "object tracking" you're looking for. OpenCV is excellent software, and this is what it does best. Have you considered scaling the video down to a smaller resolution for easier analysis by OpenCV?
I think you are greatly overestimating the computing power of this $45 computer. Object recognition and tracking is VERY hard computationally speaking. I would start by seeing how many frames per second your board can track and optimize from there. Start looking at where your bottlenecks are; you may be better off processing raw video instead of having to decode H.264 video first. Again, RAW video takes a LOT of RAM, and processing through that takes a LOT of CPU.
Minimize overhead from decoding video, minimize RAM overhead by scaling down the video before analysis, but in the end, you're asking a LOT from a 1 GHz, 32-bit ARM processor.
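As an illustration of the "scale down before analysis" advice, this is a minimal sketch of halving a grayscale frame with a 2x2 box filter, which cuts the number of pixels the tracker has to touch to a quarter. `GrayImage` and `downscaleByTwo` are hypothetical names; in practice a library resize routine would normally be used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 8-bit grayscale image stored row by row.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // width * height entries
};

// Halve the image in both dimensions by averaging each 2x2 block.
// An odd last row or column is dropped.
GrayImage downscaleByTwo(const GrayImage& src) {
    assert(src.width >= 2 && src.height >= 2);
    assert(src.pixels.size() ==
           static_cast<std::size_t>(src.width) * src.height);

    GrayImage dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.pixels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    const std::size_t stride = static_cast<std::size_t>(src.width);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            // Index of the top-left pixel of the 2x2 source block.
            const std::size_t i = static_cast<std::size_t>(2 * y) * stride + 2 * x;
            const int sum = src.pixels[i] + src.pixels[i + 1] +
                            src.pixels[i + stride] + src.pixels[i + stride + 1];
            // Rounded average of the four pixels.
            dst.pixels[static_cast<std::size_t>(y) * dst.width + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);
        }
    }
    return dst;
}
```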
FFmpeg is a very old library that is not being supported nowadays. It has very limited capabilities in terms of processing and object tracking in H.264 compressed video. Most of the commands are outdated.
The best thing would be to study H.264 thoroughly and then try to implement your own API in some language like Java or C#.

How do I output 3D images to my 3D TV?

I have a 3D TV and feel that I would be shirking my responsibilities (as a geek) if I didn't at least try to make it display pretty 3D images of my own creation!
I've done a very basic amount of OpenGL programming before, so I understand the concepts involved. Assume that I can render a simple tetrahedron or cube and make it spin around a bit: how can I get my 3D TV to display this image in, well, 3D?
Note that I understand the basics of how 3D works (render the same image twice from 2 different angles, one for each eye); my question is about the logistics of actually doing this (do I need an SDK? etc.).
The TV I have uses polarization 3D, although my intention is that this question also be relevant to other 3D technologies (if possible).
My laptop has an HDMI output, which is what I intend to use to connect to my TV (does this make any difference compared with using a VGA / component video cable?).
In the past I have experimented with GLUT / OpenGL, but if it's easier (or only really possible) to do this using some alternative technology, then that's fine.
The main problem is getting your GPU to send a stereoscopic format. In the case of an HDMI connection this will not work without the help of a driver. If you have a professional-grade GPU (Quadro, FireGL), then it likely supports OpenGL quad buffers, i.e. you get framebuffers for the left and right eye, both back and front:
glDrawBuffer(GL_BACK_LEFT);
render_left_eye();
glDrawBuffer(GL_BACK_RIGHT);
render_right_eye();
glDrawBuffer(GL_BACK); // renders to both eyes simultaneously
render_screen_level_and_nonstereoscopic();
SwapBuffers();
Unfortunately, OpenGL quad buffering is considered professional-grade stuff.
Instead, NVidia (at least) provides its own stereoscopy library plus some extensions to control it. The main reasoning is that shared fragments are to be rendered only once and then sent to both eyes with the appropriate parallax applied. However, from my semi-professional experience with stereoscopy¹, these kinds of semi-/automatic stereoscopifications just don't cut it. Stereoscopy requires tight control of the whole "production" pipeline, otherwise you're screwed. With Elephants Dream I went as far as modifying the renderer's core code.
I sent the people at the 3D division at NVidia some case scenarios where you need exact control over the stereoscopy process, and I hope they will see the light and give access to quad buffer stereo on consumer-grade hardware as well.
"Note that I understand the basics of how 3D works (render the same image twice from 2 different angles, one for each eye)"
Actually, you don't render from two different angles but with a shifted parallax and lens shift. Otherwise you get trapezoidal (keystone) distortion in the horizontal direction, which is very, very unpleasant to watch. (In fact, I now think that in the stereoscopic rendering process one should slightly diverge the optical axes, i.e. do the complete opposite of what one would naively do, and "over"compensate with lens shift. I'm currently preparing a small study about this, but still need to gather my testing and control groups.)
1: heck, I'm the guy who single-handedly stereographed Elephants Dream, rendered it and got it an award at a 3D movie festival.
Because you have a passive 3D TV, it's likely that the left and right eye views are rendered on alternate scan lines (or perhaps on alternate pixels in a checkerboard pattern).
Thus your mission is to render the left-eye view to the even numbered scan lines, and the right eye view to the odd numbered scan lines (or vice versa). This can be accomplished either via OpenGL stencil operations, or, more modernly, using custom fragment shaders.
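As a CPU-side illustration of that layout (in a real renderer the selection would happen in the stencil test or fragment shader, as described above), here is a small sketch that interleaves two already-rendered views row by row. `interleaveRows` is a hypothetical helper, pixels are assumed to be packed 32-bit values, and whether the left eye belongs on the even or the odd lines depends on the TV.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a row-interleaved stereo frame: even scan lines are copied from the
// left-eye image, odd scan lines from the right-eye image. Both inputs are
// row-major images of width * height packed 32-bit pixels.
std::vector<std::uint32_t> interleaveRows(const std::vector<std::uint32_t>& left,
                                          const std::vector<std::uint32_t>& right,
                                          int width, int height) {
    assert(width > 0 && height > 0);
    assert(left.size() == static_cast<std::size_t>(width) * height);
    assert(right.size() == left.size());

    std::vector<std::uint32_t> out(left.size());
    for (int y = 0; y < height; ++y) {
        const std::vector<std::uint32_t>& src = (y % 2 == 0) ? left : right;
        const std::ptrdiff_t offset = static_cast<std::ptrdiff_t>(y) * width;
        std::copy(src.begin() + offset, src.begin() + offset + width,
                  out.begin() + offset);
    }
    return out;
}
```

Note that each eye only keeps half of its rendered rows, so the vertical resolution per eye is halved.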
This way, you can avoid the whole quad-buffered video card/GL_BACK_LEFT/GL_BACK_RIGHT approach described by datenwolf. And you want to avoid that approach, as I have never encountered a video driver that directs quad-buffered stereo 3D to an actual 3D TV.
I agree with datenwolf's advice that you should use asymmetric frustum shift rather than scene rotation to generate the right and left eye viewpoints.
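A common way to set up such an asymmetric frustum can be sketched as follows. This is one textbook formulation under stated assumptions, not datenwolf's exact method: both cameras look straight ahead (parallel axes), the eyes are separated along the camera's x axis, and the two frusta are shifted so that they coincide at a chosen convergence distance. `Frustum`, `Eye` and `stereoFrustum` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Frustum bounds at the near plane plus the eye's offset along the camera x axis.
struct Frustum {
    double left, right, bottom, top, nearZ, farZ;
    double eyeOffsetX;
};

enum class Eye { Left, Right };

// Off-axis (asymmetric) frustum for one eye. The camera is moved sideways by
// half the eye separation and the frustum is shifted the other way, so both
// eyes see the same window at the convergence distance (zero parallax there).
Frustum stereoFrustum(Eye eye, double fovYRadians, double aspect,
                      double nearZ, double farZ,
                      double eyeSeparation, double convergence) {
    assert(fovYRadians > 0.0 && fovYRadians < 3.141592653589793);
    assert(aspect > 0.0 && nearZ > 0.0 && farZ > nearZ);
    assert(eyeSeparation >= 0.0 && convergence > 0.0);

    const double top = nearZ * std::tan(fovYRadians / 2.0);
    const double halfWidth = aspect * top;
    // Horizontal shift of the frustum, measured at the near plane.
    const double shift = 0.5 * eyeSeparation * nearZ / convergence;
    const double sign = (eye == Eye::Left) ? 1.0 : -1.0;

    Frustum f;
    f.left = -halfWidth + sign * shift;
    f.right = halfWidth + sign * shift;
    f.bottom = -top;
    f.top = top;
    f.nearZ = nearZ;
    f.farZ = farZ;
    f.eyeOffsetX = -sign * 0.5 * eyeSeparation;  // left eye sits at -separation/2
    return f;
}
```

With legacy fixed-function OpenGL, the six bounds could be passed to glFrustum and the scene translated by -eyeOffsetX along x before drawing each eye; with shaders the same numbers go into a hand-built projection matrix.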

optimal pixel-read back strategy

I need to render certain scenes and read the whole image back into main memory. I've searched for this, and it seems that most video cards will accelerate the rendering but the read-back will be very slow. After a bit of research I only found this card mentioning "Hardware-Accelerated Pixel Read-Back".
The other approach would be software rendering, where the read-back problem doesn't exist, but then the rendering performance will be bad.
Most likely I will have to implement both in order to find the optimal trade-off, but my question is about what other alternatives I have hardware-wise. I understand Quadro is aimed at the modelling and design market segment, which is precisely the target client of this application. Does this mean that I'm not likely to find better pixel read-back performance in other video card lines, e.g. Tesla or Fermi (which don't even have video outputs, by the way)?
I don't know if the performance would be any different, but you could at least try rendering to an off-screen buffer, then setting that as a texture of a full-screen quad (or outputting that to video in some other way)

Looking to develop a visual odometer (distance traveled) APP for indoor use

Is there any open source code which will take a video taken indoors (from a smartphone, for example, of a home or office building, hallways) and superimpose the path traveled on a 2D picture? This can be a hand-drawn picture or a photo of a floor layout.
At first I thought of doing this using the accelerometer and compass sensors, but then thought that perhaps one can get better accuracy with the visual odometer approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) for superimposing that data on the path traveled (this is the real application of this project, and we know how to do this part). The post-processing of the video can be done later on a standalone computer, so speed and CPU power are not an issue.
Challenges -
The user will simply hand carry the smart phone so the video taker is moving (walking) and not fixed
The video frame rate should be limited to keep the file size small (5 frames/sec? Is that OK?). Typically perhaps a full hour of video is needed.
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks.
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision based navigation using just a cellphone camera is very difficult. Most of the literature with great results show ~1% distance traveled as state-of-the-art but is usually using stereo cameras. Stereo helps a great deal, particularly in indoor environments for coping with scale drift. I've worked on a system which achieves 0.5% distance traveled for stereo but only roughly 5% distance traveled for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
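The 100 m figure follows from dividing the allowed position error by the drift rate. A tiny sketch of that arithmetic, assuming drift grows linearly with distance traveled (a simplification) and using the hypothetical helper name `maxDistanceForDrift`:

```cpp
#include <cassert>

// Distance (in meters) that can be covered before the accumulated drift
// exceeds the allowed position error, assuming drift is a fixed fraction
// of the distance traveled (e.g. 0.01 for 1%).
double maxDistanceForDrift(double allowedErrorMeters, double driftFraction) {
    assert(allowedErrorMeters > 0.0 && driftFraction > 0.0);
    return allowedErrorMeters / driftFraction;
}
```

With the numbers quoted above, a 1 m budget at 1% drift and a 0.5 m budget at 0.5% drift both allow about 100 m, while a 1 m budget at the 5% monocular figure allows only about 20 m.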
Multi-sensor is the way to go, though other sensors are worse than vision by themselves.
I've heard of some good work with accelerometers mounted on the foot to do ZUPT (zero velocity updates): when the foot is briefly motionless on the ground while taking a step, the drift is zeroed out. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
A compass is interesting but will be distracted by the ton of metal within an office building. Translating a few feet around a large metal cabinet might cause a directional jump of 50+ degrees.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do wifi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization". The problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
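The scale ambiguity mentioned in the optical-flow pointer can be seen directly from the pinhole camera model: a point projects to the same pixel if the whole scene is scaled up and moved proportionally further away. A minimal sketch, assuming an ideal pinhole camera with no lens distortion; `Point3`, `Point2` and `projectPinhole` are hypothetical names, and the focal length and principal point stand in for the camera matrix.

```cpp
#include <cassert>

struct Point3 { double x, y, z; };  // camera coordinates, z pointing forward
struct Point2 { double u, v; };     // pixel coordinates

// Ideal pinhole projection with focal length focalPx (in pixels) and
// principal point (cx, cy).
Point2 projectPinhole(const Point3& p, double focalPx, double cx, double cy) {
    assert(p.z > 0.0);  // the point must be in front of the camera
    return Point2{focalPx * p.x / p.z + cx, focalPx * p.y / p.z + cy};
}
```

For example, a point at (0.5, 0.25, 2.0) and a point at (500, 250, 2000) land on the same pixel, so image motion alone cannot say which of the two scenes the camera is moving through. Some known real-world size or distance is needed to fix the scale.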

Is stereoscopy (3D stereo) making a come back?

I'm working on a stereoscopy application in C++ and OpenGL (for medical image visualization). From what I understand, the technology was quite big news about 10 years ago but it seems to have died down since. Now, many companies seem to be investing in the technology... Including nVidia it would seem.
Stereoscopy is also known as "3D Stereo", primarily by nVidia (I think).
Does anyone see stereoscopy as a major technology in terms of how we visualize things? I'm talking in both a recreational and professional capacity.
With nVidia's 3D kit you don't need to "make a stereoscopy application"; the drivers and video card take care of that. 10 years ago there was good-quality stereoscopy with polarized glasses and extremely expensive monitors, and low-quality stereoscopy with red/cyan glasses. What you have now is both cheap and good quality. Right now all you need is a 120 Hz LCD, an entry-level graphics card and $100 shutter glasses.
So no doubt about it, it will be the next big thing. At least in entertainment.
One reason why it is probably coming back is that we now have screens with a high enough refresh rate to make 3D possible. I think I read that you need somewhere around 100 Hz for 3D TV. So, no need for bulky glasses anymore.
Edit: To reiterate: You no longer need glasses in order to have 3D TV. This article was posted in a Swedish magazine a few weeks ago: http://www.nyteknik.se/nyheter/it_telekom/tv/article510136.ece.
What it says is basically that instead of glasses you use a technique with vertical lenses on the screen. The problem with CRTs is that they are not flat. Our more modern flat screens obviously don't have this problem.
The second problem is that you need high frequency (at least 100 Hz as that makes the eye get 50 frames per second) and a lot of pixels, since each eye only gets half the pixels.
TV sets that support 3D without glasses have been sold by various companies since 2005.
Enthusiasm for stereo display seems to come and go in cycles of hype and disappointment (e.g. cinema). I don't expect TVs and PCs to be any different.
For medical visualisation, if it was that useful there would be armies of clinicians sitting in front of expensive displays wearing shutter glasses already. Big hint: there aren't. And that market doesn't need 3D display tech to reach "impulse purchase" pricing levels as an enabler.