Traffic Motion Recognition - computer-vision

I'm trying to build a simple traffic motion monitor to estimate average speed of moving vehicles, and I'm looking for guidance on how to do so using an open source package like OpenCV or others that you might recommend for this purpose. Any good resources that are particularly good for this problem?
The setup I'm hoping for is to install a webcam on a high-rise building next to the road in question, and point the camera down onto moving traffic. Camera altitude would be anywhere between 20 ft and 100ft, and the building would be anywhere between 20ft and 500ft away from the road.
Thanks for your input!

Generally speaking, you need a way to detect cars so you can get their 2D coordinates in the video frame. You might want to use a tracker to speed up the process and take advantage of the predictable motion of the vehicles. You, also, need a way to calibrate the camera so you can translate the 2D coordinates in the image to depth information so you can approximate speed.
So as a first step, look at detectors such as deformable parts model DPM, and tracking by detection methods. You'll probably need to port some code from Matlab (and if you do, please make it available :-) ). If that's too slow, maybe do some segmentation of foreground blobs, and track the colour histogram or HOG descriptors using a Particle Filter or a Kalman Filter to predict motion.

Related

How detect rain on camera vision using OpenCV in C++

How to recognize rain on camera vision using with OpenCV in C++?
Or if somebody stick a sticker on a camera how recognize it with OpenCV in C++?
Or if somebody throw color to the camera how can i detect it with OpenCV in C++?
Detect these on camera vision:
Rain
Sticker
Color
Here is an example video of sticker!
Camera Vision-Sticker
In case of a sticker, you're just looking for a large dark area that doesn't change in time.
In case of color, analyze image color stats - if somebody sprays some paint on a camera (is that what you mean by "throwing color"?), some color is going to be dominant over all the others.
You can also try to handle both cases by subtracting frames and detecting image areas that don't change in time that way.
You may want to use machine learning for finding threshold values (e.g. area size, its shape properties, such as width/length ratio, continuousness etc.) used to decide when to consider something to be a sticker/color or something else.
As for the rain, I guess there's no simple answer that can be given in a few sentences. There are some articles available in the web though. That said, I would guess it would be simpler and cheaper to detect rain by just installing external rain sensors (like the ones activating wipers in a car) rather than trying to do it by developing your own computer vision algorithm for that purpose.
This sounds like an interesting project, where a camera can automatically detect obstruction (paint, sticker, rain). It will most likely be necessary for the camera to be mounted without obstructions so that the expected image can be learned. If the usage scenario allows that, it won't be very hard.Both sticker and rain result in strong permanent deviations from the expected image, while rain will result in noisy images.
OpenCV with C++ or Python can help solve this kind of problems, because complicated computer vision algorithms are already implemented there. It takes some time to get started with, but after that OpenCV is not hard.

How to turn any camera into a Depth Camera?

I want to build a depth camera that finds out any image from particular distance. I have already read the following link.
http://www.i-programmer.info/news/194-kinect/7641-microsoft-research-shows-how-to-turn-any-camera-into-a-depth-camera.html
https://jahya.net/blog/how-depth-sensor-works-in-5-minutes/
But couldn't understand clearly which hardware requirements need & how to integrated into all together?
Thanks
Certainly, a depth sensor needs an IR sensor, just like in Kinect or Asus Xtion and other cameras available that provides the depth or range image. However, Microsoft came up with machine learning techniques and using algorithmic modification and research which you can find here. Also here is a video link which shows the mobile camera that has been modified to get depth rendering. But some hardware changes might be necessary if you make a standalone 2D camera into a new performing device. So I would suggest you to see the hardware design of the existing market devices as well.
one way or the other you would need two angles to the same points to get a depth. So search for depth sensors and examples e.g. kinect with ros or openCV or here
also you could transfere two camera streams into a point cloud but that's another story
Here's what I know:
3D Cameras
RGBD and Stereoscopic cameras are popular for these applications but are not always practical / available. I've prototyped with Kinects (v1,v2) and intel cameras (r200,d435). Certainly those are preferred even today.
2D Cameras
IF YOU WANT TO USE RGB DATA FOR DEPTH INFO then you need to have an algorithm that will process the math for each frame; try an RGB SLAM. A good algo will not process ALL the data every frame but it will process all the data once and then look for clues to support evidence of changes to your scene. A number of BIG companies have already done this (it's not that difficult if you have a big team w big money) think Google, Apple, MSFT, etc etc.
Good luck out there, make something amazing!

Generate an image that can be most easily detected by Computer Vision algorithms

Working on a small side project related to Computer Vision, mostly to try playing around with OpenCV. It lead me to an interesting question:
Using feature detection to find known objects in an image isn't always easy- objects are hard to find, especially if the features of the target object aren't great.
But if I could choose ahead of time what it is I'm looking for, then in theory I could generate for myself an optimal image for detection. Any quality that makes feature detection hard would be absent, and all the qualities that make it easy would exist.
I suspect this sort of thought went into things like QR codes, but with the limitations that they wanted QR codes to be simple, and small.
So my question for you: How would you generate an optimal image for later recognition by a camera? What if you already know that certain problems like skew, or partial obscuring would occur?
Thanks very much
I think you need something like AR markers.
Take a look at ArToolkit, ArToolkitPlus or Aruco libraries, they have marker generators and detectors.
And papeer about marker generation: http://www.uco.es/investiga/grupos/ava/sites/default/files/GarridoJurado2014.pdf
If you plan to use feature detection, than marker should be specific to used feature detector. Common practice for detector design is good response to "corners" or regions with high x,y gradients. Also you should note the scaling of target.
The simplest detection can be performed with BLOBS. It can be faster and more robust than feature points. For example you can detect circular blobs or rectangular.
Depending on the distance you want to see your markers from and viewing conditions/backgrounds you typically use and camera resolution/noise you should choose different images/targets. Under moderate perspective from a longer distance a color target is pretty unique, see this:
https://surf-it.soe.ucsc.edu/sites/default/files/velado_report.pdf
at close distances various bar/QR codes may be a good choice. Other than that any flat textured object will be easy to track using homography as opposed to 3D objects.
http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_feature_homography/py_feature_homography.html
Even different views of 3d objects can be quickly learned and tracked by such systems as Predator:
https://www.youtube.com/watch?v=1GhNXHCQGsM
then comes the whole field of hardware, structured light, synchronized markers, etc, etc. Kinect, for example, uses a predefined pattern projected on the surface to do stereo. This means it recognizes and matches million of micro patterns per second creating a depth map from the matched correspondences. Note that one camera sees the pattern and while another device - a projector generates it working as a virtual camera, see
http://article.wn.com/view/2013/11/17/Apple_to_buy_PrimeSense_technology_from_the_360s_Kinect/
The quickest way to demonstrate good tracking of a standard checkerboard pattern is to use pNp function of open cv:
http://www.juergenwiki.de/work/wiki/lib/exe/fetch.php?media=public:cameracalibration_detecting_fieldcorners_of_a_chessboard.gif
this literally can be done by calling just two functions
found = findChessboardCorners(src, chessboardSize, corners, camFlags);
drawChessCornersDots(dst, chessboardSize, corners, found);
To sum up, your question is very broad and there are multiple answers and solutions. Formulate your viewing condition, camera specs, backgrounds, distances, amount of motion and perspective you expect to have indoors vs outdoors, etc. There is no such a thing as a general average case in computer vision!

Is Eigenface best for moving object face recognition

I am doing a project on face recognition from CCTV cameras, I want to recognize each individual faces. I think eigenface method is best for face recognition. But when we use eigenface method for moving object face recognition, is there any problem? Can we recognize individuals perfectly? Since it is not still image, I am really confused to select a method.
Please help me to know whether this method is ok, otherwise suggest a better alternative.
Short answer: Typically those computer vision techniques used in image analysis can be used in video analysis, too. Videos just give you more information (esp. the temporal information.) For example, you could do face recognition using multiple frames, and between each frame you do object tracking. Associating multiple frames typically give you higher accuracy.
IMO, the most difficult problems are: you're more likely to face viewing angle, calibration problems, and lighting condition problems, in which you will need accurate face detection technique, or more training data in order to recognize faces under viewing angles and lighting conditions. Eigen face based approach relies on an accurate position of faces, eyes, and so on. Otherwise, you are likely to mix different features in the same vector. But again, this problem also exists in face recognition under still image.
To sum up, video content only gives you more information. If you don't really want to associated frames and consider temporal information, video is just a collection of still images :)

Looking to develop a visual odometer (distance traveled) APP for indoor use

Are there any open source code which will take a video taken indoors (from a smart phone for example of a home or office buildings, hallways) and superimpose that on a 2D picture showing the path traveled? This can be a handr drawn picture or a photo of a floor layout.
First I thought of doing this using the accelerometer and compass sensors but thought that perhaps one can get better accuracy with the visual odometer approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no gps) for superimposing that data on the path traveled (this is the real application of this project and we know how to do this part). The post processing of the video can be done later on a stand alone computer so speed and cpu power is not a issue.
Challenges -
The user will simply hand carry the smart phone so the video taker is moving (walking) and not fixed
limit the video rate to keep the file size small (5 frames/sec? is that ok?). Typically need perhaps a full hour of video
Will using inputs from the phone sensors help the visual approach?
any help or guidance is appreciated Thanks
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision based navigation using just a cellphone camera is very difficult. Most of the literature with great results show ~1% distance traveled as state-of-the-art but is usually using stereo cameras. Stereo helps a great deal, particularly in indoor environments for coping with scale drift. I've worked on a system which achieves 0.5% distance traveled for stereo but only roughly 5% distance traveled for monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at full 60fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100m or so. Is that enough?
Multi-sensor is way to go. Though other sensors are worse than vision by themselves.
I've heard some good work with accelerometers mounted on the foot to do ZUPT (zero velocity updates) when the foot is briefly motionless on the ground while taking a step in order to zero out drift. This approach has the clear drawback of needing to mount the device on your foot, making a vision approach largely useless.
Compass is interesting but will be distracted by the ton of metal within an office building. Translating few feet around a large metal cabinet might cause 50+ degrees of directional jump.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do wifi triangulation? Does it need to be an initial exploration? If you can go through the environment before hand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization" the problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The amount of images/second you need depends on the speed of the camera. More is better, but my guess is that 5/second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.