What main factors/features explain the high price of most industrial computer vision hardware? [closed] - computer-vision

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 7 years ago.
I am a student working on a computer science project that will soon require computer vision, and more specifically stereoscopy (for depth detection). I am now looking for a good camera to do the job, and I found several interesting options:
1- A custom built set of two cheap cameras (i.e. webcam);
2- The old, classic, economic and proven Kinect;
3- Specialized stereo sensors.
A couple of months ago I found this sensor: https://duo3d.com/product/duo-mini-lv1
I thought it was interesting because it is small, stereoscopic and brand new (and it supports a young US company). However, if we set aside the additional APIs that come with it, I just don't understand why you would buy it when a Kinect (or a pair of "cheap" cameras) is at least 4-5 times less expensive and still has equally good, if not better, specifications.
Same applies for this one: http://www.ptgrey.com/bumblebee2-firewire-stereo-vision-camera-systems
Can someone please explain why I would need such a device and, if I don't, why others do?

The reason you want a "real" stereo camera as opposed to a pair of USB webcams is synchronization. Cameras like the Bumblebee have hardware triggering, which captures the two images with virtually no delay in between. With webcams you will always have a noticeable delay between the two shots. This may be OK if you are looking at a static scene or at a scene where things are moving slowly. But if you want to have a stereo camera on a mobile robot, you will need good synchronization.
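To get a feel for why the delay matters, here is a rough back-of-the-envelope sketch in Python. It assumes a simple pinhole model and a camera moving sideways; the function name, focal length, speed and delays are illustrative values of mine, not measurements from any of the cameras mentioned.

```python
def pixel_shift_from_delay(focal_px, speed_m_s, delay_s, depth_m):
    """Apparent image shift (in pixels) of a point at depth_m when the camera
    moves sideways by speed * delay between the left and right exposures."""
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return focal_px * speed_m_s * delay_s / depth_m


# Hypothetical setup: 700 px focal length, robot moving at 1 m/s, object 2 m away.
for delay_ms in (0.1, 10.0, 33.0):
    shift = pixel_shift_from_delay(700.0, 1.0, delay_ms / 1000.0, 2.0)
    print(f"{delay_ms:5.1f} ms delay -> {shift:.2f} px of false disparity")
```

Any shift of this kind is added to the true disparity, so the computed depth is off by a corresponding amount.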
Kinect is great. However, a good stereo pair of cameras has a couple of serious advantages. One is that Kinect will not work outdoors during the day. The bright sun will interfere with the infra-red camera that Kinect uses for depth estimation. Also Kinect has a fairly short range of a few meters. You can get 3D information at much larger distances with a stereo pair by increasing the baseline.
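The point about the baseline follows from the standard rectified-stereo relation Z = f * B / d. Below is a minimal sketch in Python; the focal length, baselines and the one-pixel matching error are assumed values for illustration only.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity_px must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~ Z^2 / (f * B) * dd."""
    return depth_m ** 2 * disparity_error_px / (focal_px * baseline_m)


# Hypothetical cameras with a 700 px focal length, looking at a point 10 m away.
for baseline in (0.1, 0.5):
    err = depth_error(700.0, baseline, 10.0)
    print(f"baseline {baseline:.1f} m -> about {err:.2f} m depth error at 10 m")
```

The error grows with the square of the distance and shrinks as the baseline grows, which is why a wide pair can see much further than a compact sensor.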

In computer vision, we always want an ideal stereo camera: no pixel skew, and two perfectly matched, aligned, identical cameras. The cameras must also deliver enough frames per second, because some algorithms use temporal information and need a high frame rate to approximate continuous motion. The lens is another important part, and it can be very expensive. Additionally, hardware suppliers generally provide an SDK. The SDK adds extra value, because we want to spend our time on what matters to us, such as the algorithms; writing the software needed to connect the cameras to a PC or another board would waste that time.
Kinect lets researchers get depth information easily at a really good price; however, I agree with Dima that it is only suitable for indoor applications, and Kinect depth data has holes that generally need to be filled.

In addition to what Dima already pointed out: the cameras that you have listed both give you only the raw image data. If you are interested in the depth data, you will have to compute it yourself, which can be very resource-demanding and hard to do in real time.
I currently know of one hardware system that does this for you in real time:
http://nerian.com/products/sp1-stereo-vision/
But I don't think it would be cheap either, and you would have to buy it on top of the industrial cameras. So, if you are looking for a cheap solution, you should go with the Kinect.

Related

Object Tracking in h.264 compressed video

I am working on a project that requires me to detect and track a human in live video from a webcam connected to a BeagleBoard-xM.
I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and do the same task on H.264/MPEG-4 compressed video, as that would greatly reduce the computational overhead.
I have read many research papers but failed to find any software platform or library that I can use to analyze and process H.264 compressed video.
I would be thankful if someone could suggest a library for H.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, the motion vector for a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead at the brick in the previous picture that happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than on the brick itself.
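The grouping idea described above can be sketched roughly as follows in Python. This is only an illustration under assumptions: the motion vectors are taken as already extracted into a 2D grid with one (dx, dy) pair per macro-block (getting them out of a real H.264 stream is a separate problem not shown here), and the function name and thresholds are made up for the example.

```python
from collections import deque
import math


def group_moving_blocks(mv_grid, min_motion=1.0, tol=1.0):
    """Group 4-connected macro-blocks whose motion vectors are similar.

    mv_grid[r][c] is the (dx, dy) motion vector of one macro-block.
    Returns a list of regions, each a sorted list of (row, col) blocks.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    seen = [[False] * cols for _ in range(rows)]

    def moving(r, c):
        return math.hypot(*mv_grid[r][c]) >= min_motion

    def similar(a, b):
        return abs(a[0] - b[0]) <= tol and abs(a[1] - b[1]) <= tol

    regions = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or not moving(r, c):
                continue
            # Breadth-first flood fill over neighbours with similar vectors.
            seen[r][c] = True
            queue, region = deque([(r, c)]), []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                    if (0 <= nr < rows and 0 <= nc < cols and not seen[nr][nc]
                            and moving(nr, nc)
                            and similar(mv_grid[cr][cc], mv_grid[nr][nc])):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            regions.append(sorted(region))
    return regions
```

Each returned region is a candidate object; as the brick example shows, the vectors are noisy, so you would still want to verify a region in the pixel domain before trusting it.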
You may eventually be able to parse the H.264 stream and determine that it contains an object, but this will not be the "object tracking" you're looking for. OpenCV is excellent software, and tracking is what it does best. Have you considered scaling the video down to a smaller resolution for easier analysis by OpenCV?
I think you are highly overestimating the computing power of this $45 computer. Object recognition and tracking is VERY hard, computationally speaking. I would start by measuring how many frames per second your board can track and optimize from there. Look at where your bottlenecks are; you may be better off processing raw video instead of having to decode H.264 video first. Then again, raw video takes a LOT of RAM, and processing it takes a LOT of CPU.
Minimize the overhead from decoding video and minimize RAM overhead by scaling down the video before analysis, but in the end, you're asking a LOT from a 1 GHz, 32-bit ARM processor.
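To put rough numbers on the RAM and scaling argument, here is a small Python sketch. The resolutions, frame rate and the 3 bytes per pixel (uncompressed 8-bit colour) are assumptions for illustration; your webcam's actual format may differ.

```python
def raw_frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def raw_stream_mb_per_s(width, height, fps, bytes_per_pixel=3):
    """Data rate of an uncompressed stream in megabytes per second."""
    return raw_frame_bytes(width, height, bytes_per_pixel) * fps / 1e6


# Hypothetical comparison: full VGA versus the same video scaled to half size.
for width, height in ((640, 480), (320, 240)):
    rate = raw_stream_mb_per_s(width, height, fps=30)
    print(f"{width}x{height}: {raw_frame_bytes(width, height)} bytes/frame, {rate:.1f} MB/s")
```

Halving each dimension cuts the pixels to process by a factor of four, which is usually the cheapest speed-up available on a small board.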
FFmpeg is a long-standing library and is still actively maintained, but it is a decoding and encoding toolkit rather than a tracking library, so it offers very little for object tracking in H.264 compressed video. Many of the command examples you find online are also outdated.
The best approach would be to study H.264 thoroughly and then implement your own API in a language such as Java or C#.

Looking to develop a visual odometer (distance traveled) app for indoor use

Is there any open source code that will take a video taken indoors (from a smartphone, for example, of a home, office building or hallway) and superimpose the path traveled on a 2D picture? The picture can be a hand-drawn sketch or a photo of a floor layout.
At first I thought of doing this with the accelerometer and compass sensors, but then thought that one could perhaps get better accuracy with a visual odometry approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) so that this data can be superimposed on the path traveled (this is the real application of the project, and we know how to do this part). The video can be post-processed later on a standalone computer, so speed and CPU power are not an issue.
Challenges -
The user will simply hand-carry the smartphone, so the camera is moving (walking) and not fixed
We need to limit the video frame rate to keep the file size small (5 frames/sec? Is that OK?). Typically we need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks.
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision-based navigation using just a cellphone camera is very difficult. Most of the literature with great results shows an error of ~1% of distance traveled as state of the art, but usually with stereo cameras. Stereo helps a great deal, particularly in indoor environments, for coping with scale drift. I've worked on a system which achieves an error of 0.5% of distance traveled with stereo but only roughly 5% with a monocular camera. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
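As a rough sanity check on that range figure, here is a minimal sketch in Python. It assumes the position error grows linearly with distance traveled, which is a simplification; the function name is mine and the drift rates are the ones quoted above.

```python
def max_travel_distance(allowed_error_m, drift_fraction):
    """Distance you can travel before the accumulated drift exceeds the
    allowed error, assuming error = drift_fraction * distance."""
    if drift_fraction <= 0:
        raise ValueError("drift_fraction must be positive")
    return allowed_error_m / drift_fraction


# 1 m error budget with the drift rates mentioned in this answer.
for label, drift in (("stereo, 0.5%", 0.005),
                     ("literature, ~1%", 0.01),
                     ("monocular, ~5%", 0.05)):
    print(f"{label}: about {max_travel_distance(1.0, drift):.0f} m")
```

With a 1 m budget and 1% drift you get roughly the 100 m mentioned above; at monocular drift rates the usable range is far shorter.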
Multi-sensor is the way to go, though other sensors are worse than vision by themselves.
I've heard of some good work with accelerometers mounted on the foot that do ZUPT (zero-velocity updates): when the foot is briefly motionless on the ground during a step, the velocity estimate is zeroed to cancel drift. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
A compass is interesting but will be disturbed by the ton of metal within an office building. Translating a few feet around a large metal cabinet might cause a 50+ degree jump in heading.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
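To illustrate the ZUPT idea mentioned above, here is a toy one-dimensional sketch in Python. It assumes you already have a stance detector that flags the samples where the foot is on the ground (given here as a plain list of booleans, which is hypothetical), and it models sensor error as a constant accelerometer bias.

```python
def integrate_velocity(accels, dt, stance=None):
    """Integrate acceleration samples into velocity.

    If stance is given, stance[i] == True marks samples where the foot is
    known to be motionless, and the velocity estimate is reset to zero there.
    """
    velocity, history = 0.0, []
    for i, a in enumerate(accels):
        velocity += a * dt
        if stance is not None and stance[i]:
            velocity = 0.0  # zero-velocity update
        history.append(velocity)
    return history


# Stationary sensor with a 0.1 m/s^2 bias, sampled at 100 Hz for 10 seconds.
biased = [0.1] * 1000
every_step = [(i % 100) == 99 for i in range(1000)]  # one "footfall" per second
print("drift without ZUPT:", integrate_velocity(biased, 0.01)[-1])
print("worst drift with ZUPT:", max(integrate_velocity(biased, 0.01, every_step)))
```

Without the resets the velocity error keeps growing; with them it is bounded by whatever accumulates during a single step.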
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do Wi-Fi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization". The problem you state is very similar to the problem a robot with a camera has when it enters a new environment. In this field the usual approach is to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed, because you have no objective reference: you don't know whether the things you see moving in the video feed are 1 cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images, you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene reconstruction without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the Google results); the camera matrix is what lets you tie real-world object sizes (and hence distances) into your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speed.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
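The scale problem with optical flow described in the pointers above can be shown with a few lines of Python. This is a toy pinhole projection with a camera sliding sideways; the focal length and scene coordinates are invented for the example.

```python
def project(point, cam_x, focal_px=500.0):
    """Pinhole projection of a 3D point seen from a camera at (cam_x, 0, 0)."""
    x, y, z = point
    return (focal_px * (x - cam_x) / z, focal_px * y / z)


def image_flow(point, cam_x_before, cam_x_after, focal_px=500.0):
    """Image motion of one point when the camera slides along the x axis."""
    u0, v0 = project(point, cam_x_before, focal_px)
    u1, v1 = project(point, cam_x_after, focal_px)
    return (u1 - u0, v1 - v0)


# A small, close scene with a 5 cm camera move ...
near = image_flow((0.2, 0.1, 1.0), 0.0, 0.05)
# ... and the same scene 100 times bigger with a 5 m camera move.
far = image_flow((20.0, 10.0, 100.0), 0.0, 5.0)
print(near, far)
```

Both cases produce the same image motion, so flow alone cannot distinguish a slow camera near small objects from a fast camera near large ones. Some external reference (a known object size, a second camera, or another sensor) is needed to fix the scale.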

Design of virtual trial room

As part of my master's project I proposed to build a virtual trial room application intended for retail clothing stores. Currently it's meant to be used directly in the store, though it may be extended to online stores as well.
This application will show customers how selected apparel would look on them by showing it on their 3D replica on screen.
It involves 3 steps
Sizing up the customer
Building customer replica 3D humanoid model
Apply simulated cloth on the model
My question is about the feasibility of the project and choice of framework.
Can this be achieved in real time using a normal desktop computer? If yes, what would be an appropriate framework (hardware, software, programming language, etc.) for this purpose?
Based on the work I have done so far, I was planning to achieve the above steps in the following ways:
For step 1: option a) two cameras for front and side views, or
option b) one or two Kinects for complete 3D data.
For step 2: either use the MakeHuman (http://www.makehuman.org/) code to build a customised 3D model using the above data, or build everything from scratch; I am unsure about the framework.
For step 3: I just need a few cloth samples, so I thought of building simulated clothes in Blender.
Currently I have just a vague idea of the different pieces, but I am not sure how to develop the complete application.
Theoretically this can be achieved in real time. Many useful algorithms for video tracking, stereo vision and 3D reconstruction are available in the OpenCV library. But it's very difficult to build a robust solution. For example, you'll probably need to track a human body which moves from frame to frame and perform pose estimation (OpenCV contains the POSIT algorithm); however, it's not trivial to eliminate noise in the resulting object coordinates. For inspiration, see a nice work on video tracking.
You might want to choose another way: simplify some things, avoid the complicated stuff, do things less dynamically, and estimate only the clothes size and the approximate human location. In this case you will most likely create something useful and interesting.
I've lost the link to one online fitting room where hand and body detection is implemented. Using Kinect solves many problems. But if for some reason you won't use it, then AR (augmented reality) can help you (yet another fitting room).

Modelling clothing in C++ [closed]

I'm looking to write a bit of software that will end up drawing a human frame (which can be configured with various parameters), and the plan is to have some sort of clothing placed on the dummy.
I've looked at Blender and OpenGL libraries, as well as other rendering and physics engines. I'm not looking for you to tell me how to do this; mainly I'm wondering what libraries are out there to do this sort of thing.
So there'll be a pattern for the clothing in 2D, and then the system (at least in theory) will be able to translate that into a 3D representation of, for example, a shirt, and then place that on the human frame. I know there's a lot of work I need to do for this. However, in terms of rendering the clothing onto the frame and accounting for collisions and how it drapes around the frame, I've been googling and have found a few bits, but was wondering if there are C++ libraries out there that would do that.
I'm developing using Visual C++ 2010, and the target environment is a Windows box.
Either that, or I'm going to need to take some physics lessons.
Unfortunately, developing a system like the one you're talking about would be insanely difficult. On the plus side, there are a lot of easy-to-use technologies that will hopefully help you attain your goal.
Generally, the way that this type of thing works is as follows: you make some 3D asset in a modeling program such as Blender, 3ds Max, Maya, Softimage, etc., and then use this in your program/game. You can think of these programs as just spitting out a bunch of 3D coordinates, which your program, with the help of OpenGL or DirectX, can load into memory and render.
Modeling and loading assets is, of course, the alternative to developing algorithms that generate the points, which is what it seems you're trying to accomplish.
The bad news is that clothing is really, really complicated. A big part of this is that most of it requires simulating cloth dynamics. Another part of the problem is that even if you had a 2D pattern, how would you determine the manner in which the clothing adheres to your human model? Is it skin-tight? Loose? How will you parameterize that? The placement of the actual clothing on the body is a chore in and of itself, as anyone with experience in 3D modeling might tell you.
Nevertheless, some of the industry's brightest professionals are looking for both better ways to simulate cloth, and better ways to automate asset creation.
In summation, the easy answer is that what you're trying to do, as interesting and noble as it may be, is going to be extremely difficult and may not have the result you're looking for.
As for where you can go for more answers:
If you're still interested in finding a way to automate clothing attachment to models, I would start by looking around academic websites. Look for computer science departments that have computer graphics research programs. You will find a lot of interesting things there.
For more academic-type resources, look at the Game Programming Gems, GPU Programming Gems, and Graphics Programming Gems book series. They feature many good articles that tackle difficult graphics problems such as these.
Another thing you might do is check out Blender a little more. There is an interesting project called MakeHuman (http://makehuman.blogspot.com/) that automates the process of developing human models in Blender.
There are a couple of tutorials for putting clothing on the models; take a look at this one:
http://www.davidjarvis.ca/blender/tutorial-05.shtml
For more tutorials on clothing and cloth simulation in Blender, you can always check out
www.blendercookie.com
cg.tutsplus.com
I hope some of this has been useful.
From what I remember, cloth is simulated as a mesh of springs which suggests physics libraries for the simulation along with an understanding of the physics of springs/cloth. I've not heard of a physics library tailored to cloth simulation though, but no doubt someone on this site will know of one.
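To make the "mesh of springs" idea concrete, here is a minimal 2D mass-spring sketch in Python (chosen for brevity; the same structure carries over to C++). It is only a toy under assumptions of mine: structural springs between grid neighbours, the top row pinned, semi-implicit Euler integration with simple velocity damping, no collisions, and made-up stiffness and mass values.

```python
import math


def make_cloth(n=4, rest=0.1):
    """Build an n x n grid of particles hanging in the x-y plane."""
    pos = [[c * rest, -r * rest] for r in range(n) for c in range(n)]
    vel = [[0.0, 0.0] for _ in pos]
    springs = []
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:
                springs.append((i, i + 1, rest))  # horizontal neighbour
            if r + 1 < n:
                springs.append((i, i + n, rest))  # vertical neighbour
    pinned = set(range(n))  # the top row is held in place
    return pos, vel, springs, pinned


def step(pos, vel, springs, pinned, dt=0.005, k=100.0, mass=0.1,
         gravity=-9.81, damping=0.98):
    """Advance the cloth by one time step (semi-implicit Euler)."""
    forces = [[0.0, mass * gravity] for _ in pos]
    for i, j, rest in springs:
        dx, dy = pos[j][0] - pos[i][0], pos[j][1] - pos[i][1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue
        f = k * (length - rest)  # Hooke's law along the spring direction
        fx, fy = f * dx / length, f * dy / length
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy
    for i in range(len(pos)):
        if i in pinned:
            continue
        vel[i][0] = (vel[i][0] + forces[i][0] / mass * dt) * damping
        vel[i][1] = (vel[i][1] + forces[i][1] / mass * dt) * damping
        pos[i][0] += vel[i][0] * dt
        pos[i][1] += vel[i][1] * dt


pos, vel, springs, pinned = make_cloth()
for _ in range(2000):
    step(pos, vel, springs, pinned)
print("bottom-right particle settles at", pos[-1])
```

A usable cloth would add shear and bend springs, collision handling against the body, and a more robust integrator, which is where most of the effort goes.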
This answer is about cloth simulation itself (which may not be what you're interested in).
If you want to do cloth simulation with vendor middleware, you can try:
Havok (it's commercial). It seems to me that it supports any collision object represented by a triangle mesh.
PhysX (it's free, but when you try to use it you will find it comes with a lot of constraints).
If you want to model cloth physics by hand, I can advise these steps:
Refresh your basic knowledge of physics (inertia, energy, Newton's laws).
A good starting point for cloth simulation, and physics simulation in general, is this book:
http://www.amazon.com/Game-Physics-Pearls-Gino-Bergen/dp/1568814747
Read the SIGGRAPH articles about cloth.
Think about which collision objects you need.
Think about which forces you need.
Split the challenge into
Broad Phase / Integration / Collision Detection / Collision Response / Constraint Solver
I have developed a cloth physics simulation in C++ and OpenCL.
It took me about 4 months to develop, and about 2 months to debug stage 5.
But it was a very intense time in my life; the job consumed a huge amount of time.
Except for the part where you want to change the dummy while the application is running, what you want is more or less a standard example for game engines such as Esenthel Engine. The whole idea is to load a mesh for the body and then put a "cloth" on it (cloth is already defined as a physical type in most game engines). But when it comes to runtime changes to the human frame, it becomes a little trickier, since you have to know how you are going to affect the parameters, which is not easy for organic shapes.
A free game engine to use these days is Unity 3D. It all depends on the level of detail, though, and Maya and 3ds Max are the best of the modeling programs.

Is stereoscopy (3D stereo) making a comeback?

I'm working on a stereoscopy application in C++ and OpenGL (for medical image visualization). From what I understand, the technology was quite big news about 10 years ago but it seems to have died down since. Now, many companies seem to be investing in the technology... Including nVidia it would seem.
Stereoscopy is also known as "3D Stereo", primarily by nVidia (I think).
Does anyone see stereoscopy as a major technology in terms of how we visualize things? I'm talking in both a recreational and professional capacity.
With nVidia's 3D kit you don't need to "make a stereoscopy application"; the drivers and video card take care of that. 10 years ago there was good-quality stereoscopy with polarized glasses and extremely expensive monitors, and low-quality stereoscopy with red/cyan glasses. What you have now is both cheap and good quality. Right now all you need is a 120 Hz LCD, an entry-level graphics card and $100 shutter glasses.
So no doubt about it, it will be the next big thing. At least in entertainment.
One reason why it is probably coming back is that we now have screens with a high enough refresh rate to make 3D possible. I think I read that you will need somewhere around 100 Hz for 3D TV. So, no need for bulky glasses anymore.
Edit: To reiterate: you no longer need glasses in order to have 3D TV. This article was posted in a Swedish magazine a few weeks ago: http://www.nyteknik.se/nyheter/it_telekom/tv/article510136.ece.
What it says is basically that instead of glasses you use a technique with vertical lenses on the screen. The problem with CRTs is that they are not flat. Our more modern flat screens obviously don't have this problem.
The second problem is that you need a high refresh rate (at least 100 Hz, as that gives each eye 50 frames per second) and a lot of pixels, since each eye only gets half the pixels.
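The arithmetic behind those two numbers is simple; a tiny Python sketch follows. It assumes frames alternate between the eyes (time-sequential display) for the refresh figure, and a two-view lens sheet that splits the pixel columns between the eyes for the resolution figure. The 1920-pixel width is just an example value.

```python
def per_eye_refresh_hz(display_hz):
    """Frames per second seen by each eye when frames alternate between eyes."""
    return display_hz / 2


def per_eye_columns(width_px, views=2):
    """Pixel columns available to each eye when columns are split between views."""
    return width_px // views


for hz in (100, 120):
    print(f"{hz} Hz display -> {per_eye_refresh_hz(hz):.0f} Hz per eye")
print("1920 px wide panel ->", per_eye_columns(1920), "columns per eye")
```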
TV sets that support 3D without glasses have been sold by various companies since 2005.
Enthusiasm for stereo display seems to come and go in cycles of hype and disappointment (e.g. cinema). I don't expect TV and PCs will be any different.
For medical visualisation, if it was that useful there would be armies of clinicians sitting in front of expensive displays wearing shutter glasses already. Big hint: there aren't. And that market doesn't need 3D display tech to reach "impulse purchase" pricing levels as an enabler.