OpenCV outdoor tracking tips - c++

I'm currently working on an application that tracks movement on a bridge and could use a few tips to make the tracking more robust. The tracking runs over long periods during the day, so tips on how best to deal with that would be handy.
Currently I'm doing basic frame differencing and blob tracking using OpenCV and openFrameworks. Unfortunately, in this state the question is quite open-ended: I'm looking for advice on stable tracking in outdoor conditions:
how to handle light changes
how to ignore shadows (tracking dark blobs can trigger shadows to be tracked as well)
how to isolate people (I've looked into OpenCV's HOGDescriptor, but it's a bit much for my setup; I can deal with simpler/less exact data)
I'm also thinking of improving stability by applying a few filters, such as blur and high pass, to the images. Are there any other tricks/tips I could use?
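One possible way to make plain frame differencing tolerate gradual light changes is to difference each frame against a slowly updated running-average background rather than against the previous frame. The sketch below works on raw grayscale buffers so that it stays self-contained; the names `BackgroundModel`, `alpha` and `threshold` are illustrative assumptions, not part of the setup described above. In OpenCV, `cv::accumulateWeighted` and `cv::absdiff` are comparable building blocks, and `cv::createBackgroundSubtractorMOG2` has a `detectShadows` option that may help with the shadow problem.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Running-average background model over a grayscale frame (one float per pixel).
// alpha and threshold are illustrative values, not tuned for any real scene.
struct BackgroundModel {
    std::vector<float> mean;  // per-pixel background estimate
    float alpha;              // learning rate: higher adapts faster to light changes
    float threshold;          // minimum absolute difference to call a pixel foreground

    BackgroundModel(float a, float t) : alpha(a), threshold(t) {}

    // Returns a 0/255 foreground mask. The background is updated only where the
    // pixel was classified as background, so moving blobs are not absorbed.
    std::vector<std::uint8_t> apply(const std::vector<std::uint8_t>& frame) {
        if (mean.empty()) mean.assign(frame.begin(), frame.end());
        std::vector<std::uint8_t> mask(frame.size(), 0);
        for (std::size_t i = 0; i < frame.size() && i < mean.size(); ++i) {
            const float value = static_cast<float>(frame[i]);
            if (std::fabs(value - mean[i]) > threshold) {
                mask[i] = 255;                         // foreground pixel
            } else {
                mean[i] += alpha * (value - mean[i]);  // follow slow lighting drift
            }
        }
        return mask;
    }
};
```

Calling `apply` once per frame gives a mask that can be fed to the existing blob tracker. A small `alpha` follows slow daylight changes while ignoring brief motion; sudden changes (a cloud passing) would still need a larger `alpha` or a reset.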

Related

Combine tracking and detection

I'm currently working on a multiple object tracking problem. I think using Tracking-by-Detection is a good choice. However, I do not know how to combine the tracking and detection results so that detection can help improve tracking.
For detection, I'm using Faster R-CNN from the TensorFlow Object Detection API as a simple starting point.
For tracking, I use the KCF algorithm from OpenCV.
Detection is unstable because the model treats every frame independently, while tracking is much more stable.
Although tracking is more stable, the tracker cannot follow the object when it moves, so it is not accurate.
So I'm thinking of combining the two methods to get a result that is both stable and accurate.
I have a background in computer vision but I'm new to this field (Multiple Object Tracking). Could anyone please give me some advice on how I should deal with this kind of problem?
Thanks a lot! :)
I recently tried using detection to track objects. The instability can be resolved with classic filtering techniques such as Kalman filtering (in signal processing, measured points are likewise "unstable" due to noise). You can define a small region around the tracked object and look for the same object inside that region in the next frame. That establishes a "matched" relationship; you then match the object from that frame to the one after it, and so on. A trace is built up by this process. Any smoothing method can then be used to suppress the noise in the predicted boxes. An example is shown in:
The transparent points are the detected trace points and the solid ones are the smoothed points.
The corresponding trace shown over the background:
Some tricks are also useful. If detection fails at some random position, you can set a "skip gate" and try to find a matching point in a later frame (in my experiment, 60 was not bad for 24 fps video). You will prefer recall over precision, since you can build fairly long sequences and drop the short noise sequences that come from false-alarm detections.
Reference code: https://github.com/yiyuezhuo/detection-tracking
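The matching scheme described in this answer could be sketched roughly as follows. This is not the code from the linked repository; it is a minimal illustration that assumes detections are reduced to centre points, uses greedy nearest-neighbour matching inside a distance gate, keeps a trace alive for `skipGate` frames without a match, and smooths with simple exponential smoothing in place of a Kalman filter. All names (`GatedTracker`, `gate`, `skipGate`, `smooth`) are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Point { double x, y; };

struct Track {
    std::vector<Point> points;  // matched detection centres, oldest first
    int lastFrame;              // frame index of the most recent match
};

// Greedy nearest-neighbour association. A detection joins the closest live
// trace within `gate` pixels; a trace stays live for `skipGate` frames
// without a match. Both parameters are scene dependent.
class GatedTracker {
public:
    GatedTracker(double gate, int skipGate) : gate_(gate), skipGate_(skipGate) {}

    void update(int frame, const std::vector<Point>& detections) {
        const std::size_t existing = tracks_.size();
        std::vector<bool> used(existing, false);
        for (const Point& d : detections) {
            std::size_t best = existing;  // sentinel: no match yet
            double bestDist = gate_;
            for (std::size_t i = 0; i < existing; ++i) {
                const Track& t = tracks_[i];
                if (used[i] || frame - t.lastFrame > skipGate_) continue;
                const Point& last = t.points.back();
                const double dist = std::hypot(d.x - last.x, d.y - last.y);
                if (dist <= bestDist) { bestDist = dist; best = i; }
            }
            if (best < existing) {        // extend the matched trace
                tracks_[best].points.push_back(d);
                tracks_[best].lastFrame = frame;
                used[best] = true;
            } else {                      // otherwise start a new trace
                Track t;
                t.points.push_back(d);
                t.lastFrame = frame;
                tracks_.push_back(t);
            }
        }
    }

    // Traces shorter than minLength are treated as false alarms and dropped.
    std::vector<Track> longTracks(std::size_t minLength) const {
        std::vector<Track> out;
        for (const Track& t : tracks_)
            if (t.points.size() >= minLength) out.push_back(t);
        return out;
    }

    const std::vector<Track>& tracks() const { return tracks_; }

private:
    double gate_;
    int skipGate_;
    std::vector<Track> tracks_;
};

// Exponential smoothing of a trace; a Kalman filter could be substituted here.
std::vector<Point> smooth(const std::vector<Point>& pts, double alpha) {
    std::vector<Point> out;
    for (const Point& p : pts) {
        if (out.empty()) { out.push_back(p); continue; }
        const Point prev = out.back();
        out.push_back(Point{prev.x + alpha * (p.x - prev.x),
                            prev.y + alpha * (p.y - prev.y)});
    }
    return out;
}
```

Greedy matching depends on the order of the detections; with many nearby objects, an optimal assignment (for example the Hungarian algorithm) would likely behave better.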
I think you should try the CSRT tracker from OpenCV, which is much more stable than KCF. As for detection, you could run it every fixed number of frames and use the detections to reinitialize the tracker. This way you fuse the tracker with the detector.
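The control flow of that suggestion might look like the sketch below. The detector and tracker are hypothetical callbacks (`DetectFn`, `TrackFn`) standing in for something like Faster R-CNN and CSRT; no real OpenCV or TensorFlow API is used, and `redetectEvery` is an assumed parameter that must be greater than zero.

```cpp
#include <functional>
#include <vector>

struct Box { double x, y, w, h; };

// Hypothetical stand-ins for a detector and a single-object tracker.
// Each returns false on failure. The tracker receives the current box
// and updates it in place.
using DetectFn = std::function<bool(int frame, Box& out)>;
using TrackFn  = std::function<bool(int frame, Box& inOut)>;

// Runs the tracker on every frame and lets the detector overwrite the
// tracked box every `redetectEvery` frames, or immediately when tracking
// fails. Returns the box reported for each frame that had one.
std::vector<Box> fuse(int numFrames, int redetectEvery, DetectFn detect, TrackFn track) {
    std::vector<Box> result;
    Box current{0.0, 0.0, 0.0, 0.0};
    bool haveBox = false;
    for (int f = 0; f < numFrames; ++f) {
        const bool tracked = haveBox && track(f, current);
        if (!tracked || f % redetectEvery == 0) {
            Box det{0.0, 0.0, 0.0, 0.0};
            if (detect(f, det)) {
                current = det;      // re-initialise from the detection
                haveBox = true;
            } else {
                haveBox = tracked;  // keep the tracked box if there is one
            }
        }
        if (haveBox) result.push_back(current);
    }
    return result;
}
```

With a real tracker, the line that overwrites `current` is where the tracker object would be re-created or re-initialised on the detected box.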

Stitching a full spherical mosaic using only a smartphone and sensor data?

I'm really interested in the Google Street View mobile application, which integrates a method to create a fully functional spherical panorama using only your smartphone camera. (Here's the procedure for anyone interested: https://www.youtube.com/watch?v=NPs3eIiWRaw)
What strikes me most is that it always manages to create the full sphere, even when stitching a featureless, near-monochrome blue sky or ceiling, which makes me think they're not using feature-based matching.
Is it possible to get a decent-quality full spherical mosaic without feature-based matching, using only sensor data? Are smartphone sensors precise enough? What library would be usable to do this? OpenCV? Something else?
Thanks!
Features are needed for registration. In the app, the clever UI ensures they already know where each photo sits relative to the sphere, so in the extreme case all they have to do is reproject/warp and blend. No additional geometry processing is needed.
I would assume they do try some small corrections to improve the registration, but even if these fail, they can fall back on the sensor-based registration acquired at capture time.
This is a case where a clever UI makes the vision problem significantly easier.
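To illustrate why a known orientation makes registration easy: once the sensors give a yaw and pitch for a shot, placing its centre on an equirectangular panorama is plain arithmetic. The sketch below assumes a particular angle convention (stated in the comments) and a hypothetical function name; a full reprojection would apply the same mapping per pixel, using the camera intrinsics to get each pixel's viewing direction.

```cpp
#include <cmath>

struct Pixel { double u, v; };

// Maps a viewing direction to equirectangular pixel coordinates.
// Assumed convention: yaw in degrees, 0 at the horizontal centre of the
// panorama and increasing to the right; pitch in degrees, +90 at the top
// row and -90 at the bottom row.
Pixel directionToEquirect(double yawDeg, double pitchDeg, int width, int height) {
    double yaw = std::fmod(yawDeg + 180.0, 360.0);
    if (yaw < 0.0) yaw += 360.0;  // wrap into [0, 360)
    const double u = yaw / 360.0 * width;
    const double v = (90.0 - pitchDeg) / 180.0 * height;
    return Pixel{u, v};
}
```

For example, a shot taken straight ahead at the horizon (yaw 0, pitch 0) lands at the centre of the panorama, and a shot of the ceiling (pitch +90) lands on the top row, whether or not the image contains any features.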

Object Tracking in h.264 compressed video

I am working on a project that requires me to detect and track a human in live video from a webcam connected to a BeagleBoard-xM.
I have completed this task using OpenCV in the pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested that I leave the pixel domain and do the same task on h.264/MPEG-4 compressed video, as that would greatly reduce the computational overhead.
I have read many research papers but failed to find any software platform or library that I can use to analyze and process h.264 compressed video.
I would be thankful if someone could suggest a library for h.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. For example, let's assume our hypothetical car is driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, the motion vector for a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead at whichever brick in the previous picture happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than on the brick itself.
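A rough sketch of the grouping idea, assuming the per-macroblock motion vectors have already been extracted by a decoder into a row-major grid (how to obtain them is outside this sketch). It labels 4-connected groups of blocks whose vectors are non-negligible and similar to their neighbours'. The function name and the thresholds `minMotion` and `tolerance` are illustrative assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct MotionVector { float dx, dy; };

// Returns one label per macroblock (0 = static or unlabelled) in row-major
// order and writes the number of groups found to numGroups.
std::vector<int> groupMacroblocks(const std::vector<MotionVector>& mv, int cols, int rows,
                                  float minMotion, float tolerance, int& numGroups) {
    std::vector<int> label(mv.size(), 0);
    numGroups = 0;
    if (cols <= 0 || rows <= 0 ||
        mv.size() != static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows)) {
        return label;  // grid dimensions do not match the data
    }
    for (int start = 0; start < cols * rows; ++start) {
        if (label[start] != 0 || std::hypot(mv[start].dx, mv[start].dy) < minMotion) continue;
        label[start] = ++numGroups;
        std::vector<int> stack(1, start);  // flood fill from this block
        while (!stack.empty()) {
            const int cur = stack.back();
            stack.pop_back();
            const int cx = cur % cols, cy = cur / cols;
            const int nx[4] = {cx - 1, cx + 1, cx, cx};
            const int ny[4] = {cy, cy, cy - 1, cy + 1};
            for (int k = 0; k < 4; ++k) {
                if (nx[k] < 0 || nx[k] >= cols || ny[k] < 0 || ny[k] >= rows) continue;
                const int n = ny[k] * cols + nx[k];
                if (label[n] != 0) continue;
                if (std::hypot(mv[n].dx, mv[n].dy) < minMotion) continue;
                // Join only neighbours that move in (nearly) the same way.
                if (std::hypot(mv[n].dx - mv[cur].dx, mv[n].dy - mv[cur].dy) > tolerance) continue;
                label[n] = numGroups;
                stack.push_back(n);
            }
        }
    }
    return label;
}
```

Each labelled group is then a candidate region that can be examined in isolation, subject to the imprecision described above.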
You may eventually be able to parse the h.264 stream and determine that it contains an object, but this will not be the "object tracking" you're looking for. OpenCV is excellent software, and this is what it does best. Have you considered scaling the video down to a smaller resolution so that OpenCV can analyze it more easily?
I think you are greatly overestimating the computing power of this $45 computer. Object recognition and tracking are VERY hard, computationally speaking. I would start by measuring how many frames per second your board can track and optimize from there. Look at where your bottlenecks are; you may be better off processing raw video instead of having to decode h.264 video first. Then again, raw video takes a LOT of RAM, and processing it takes a LOT of CPU.
Minimize the overhead of decoding video and minimize RAM overhead by scaling the video down before analysis, but in the end you're asking a LOT from a 1 GHz, 32-bit ARM processor.
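To put rough numbers on this: assuming, purely for illustration, a 640x480 frame at 3 bytes per pixel, one raw frame is 921,600 bytes, and halving both dimensions cuts the pixel count (and roughly the per-frame work) to a quarter. The sketch below shows that arithmetic and a simple 2x2 averaging downscale on a grayscale buffer. The function names are hypothetical; in OpenCV, `cv::resize` with `cv::INTER_AREA` serves a similar purpose.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed for one uncompressed frame.
std::size_t rawFrameBytes(int width, int height, int bytesPerPixel) {
    return static_cast<std::size_t>(width) * static_cast<std::size_t>(height) *
           static_cast<std::size_t>(bytesPerPixel);
}

// Halves a grayscale image by averaging each 2x2 block.
// An odd trailing row or column is dropped.
std::vector<std::uint8_t> downscaleHalf(const std::vector<std::uint8_t>& src,
                                        int width, int height) {
    const int w = width / 2, h = height / 2;
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(w) * static_cast<std::size_t>(h));
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const std::size_t top = static_cast<std::size_t>(2 * y) * width + 2 * x;
            const std::size_t bottom = top + width;
            const int sum = src[top] + src[top + 1] + src[bottom] + src[bottom + 1];
            dst[static_cast<std::size_t>(y) * w + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);  // rounded average
        }
    }
    return dst;
}
```

Measuring frames per second before and after such a downscale is one way to see whether resolution is the bottleneck on the board.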
FFmpeg is a very old library that is not being supported nowadays. It has very limited capabilities in terms of processing and object tracking in h.264 compressed video, and most of its commands are outdated.
The best thing would be to study h.264 thoroughly and then try to implement your own API in a language like Java or C#.

Pose independent face detection

I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine...). Faces can have yaw angles that vary between -90 and +90 degrees, pitch with almost the same range (well, a bit less due to the constraints of the human body) and possibly some roll variation too.
I've spent a lot of time searching for a pose-independent face detector. In my project I'm using OpenCV, but the OpenCV face detector is not even close to the detection rate I need. It gives very good results on frontal faces but almost none on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yields slightly better results, but still not even close to what I need.
Training my own haarcascade will be my very last resort because of the huge computational (or time) requirements.
For now, what I'm asking for is any help or advice on this matter.
The requirements for a face detector I could use are:
very good detection rate. I don't mind a very high false positive rate, since by using some temporal consistency in my video I'll probably be able to get rid of the majority of them
written in C++, or usable from a C++ application
Real time is not an issue for now; detection rate is all I care about right now.
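The temporal-consistency idea mentioned in the requirements could, for instance, be sketched like this: a detection is kept only if an overlapping detection was also present in enough of the preceding frames. This is one possible interpretation, not a known implementation; `TemporalFilter`, `window`, `minSupport` and `minIou` are hypothetical names and parameters.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

struct Rect { double x, y, w, h; };

// Intersection over union of two axis-aligned rectangles.
double iou(const Rect& a, const Rect& b) {
    const double ix = std::max(0.0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const double iy = std::max(0.0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const double inter = ix * iy;
    const double uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}

// Keeps a detection only if an overlapping detection (IoU >= minIou) was
// seen in at least minSupport of the previous `window` frames.
class TemporalFilter {
public:
    TemporalFilter(std::size_t window, std::size_t minSupport, double minIou)
        : window_(window), minSupport_(minSupport), minIou_(minIou) {}

    std::vector<Rect> filter(const std::vector<Rect>& detections) {
        std::vector<Rect> kept;
        for (const Rect& d : detections) {
            std::size_t support = 0;
            for (const std::vector<Rect>& past : history_) {
                const double minIou = minIou_;
                const bool hit = std::any_of(past.begin(), past.end(),
                    [&d, minIou](const Rect& p) { return iou(d, p) >= minIou; });
                if (hit) ++support;
            }
            if (support >= minSupport_) kept.push_back(d);
        }
        history_.push_back(detections);  // remember the raw detections
        if (history_.size() > window_) history_.pop_front();
        return kept;
    }

private:
    std::size_t window_;
    std::size_t minSupport_;
    double minIou_;
    std::deque<std::vector<Rect> > history_;
};
```

Because it only looks backwards, this filter delays the first report of a real face by `minSupport` frames; for offline video, looking at later frames as well would avoid that.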
I've seen many papers achieving these results, but I couldn't find any code that I could use.
I sincerely thank you for any help you are able to provide.
Perhaps not an answer, but too long to put into a comment.
You can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. This post may be of help. http://note.sonots.com/SciSoftware/haartraining.html. I have managed to train a detector that is sensitive within -50:+50 yaw by using the FERET data set. In my case, we did not want to detect purely side-on faces, so the training data was prepared accordingly. Since FERET already provides convenient pose variations, it might be possible to train a detector somewhat close to your specification. Time is not an issue if you are using LBP features: training completes in 4-5 hours at most, and it goes even faster (15-30 min) with appropriate parameters and less training data (useful for checking whether the detector is going to produce the output you expect).

Real time Object detection: where to learn?

I have been working with OpenCV lately and can do 99% of the things explained in the official OpenCV tutorials. I also managed to do motion tracking manually with background subtraction, which some users claimed was impossible.
However, right now I am working on object detection: I need to track a hand and find out whether it moved to the left or to the right. Can this be done with the following steps (the ones I used for motion detection)?
Get 2 instances of the camera video (real time)
blur them to reduce noise
threshold them to find the hand (or skip this if the blur is enough)
find the absolute difference between the 2 images
Get PSR
find the pixel position of the motion
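The differencing and position steps above might be sketched as follows, on raw grayscale buffers and without the blur step. One caveat the sketch makes explicit: a single difference image contains both the old and the new position of the hand, so it gives a location but not a direction; comparing the motion centroids of two consecutive difference images does. The function names, the `threshold` and the `minShift` parameter are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Mean x coordinate of the pixels that changed by more than `threshold`
// between two grayscale frames of the given width. Returns -1 if no pixel changed.
double motionCentroidX(const std::vector<std::uint8_t>& prev,
                       const std::vector<std::uint8_t>& curr,
                       int width, int threshold) {
    double sumX = 0.0;
    std::size_t count = 0;
    for (std::size_t i = 0; i < prev.size() && i < curr.size(); ++i) {
        const int diff = std::abs(static_cast<int>(curr[i]) - static_cast<int>(prev[i]));
        if (diff > threshold) {
            sumX += static_cast<double>(i % static_cast<std::size_t>(width));
            ++count;
        }
    }
    return count == 0 ? -1.0 : sumX / static_cast<double>(count);
}

enum class Direction { None, Left, Right };

// Compares the motion centroids of two consecutive difference images.
// "Right" means increasing x in image coordinates; a mirrored camera flips it.
Direction horizontalDirection(double prevCx, double currCx, double minShift) {
    if (prevCx < 0.0 || currCx < 0.0) return Direction::None;
    if (currCx - prevCx > minShift) return Direction::Right;
    if (prevCx - currCx > minShift) return Direction::Left;
    return Direction::None;
}
```

With three frames f0, f1, f2, the direction would be `horizontalDirection(motionCentroidX(f0, f1, w, t), motionCentroidX(f1, f2, w, t), minShift)`. This reacts to any moving object, not only a hand, so some form of hand detection is still needed to single it out.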
However, it seems this is not 100% the same as motion detection, because I have read about the Kalman filter, block matching, etc., which I did not use in motion detection. I did find this tutorial:
http://homepages.cae.wisc.edu/~ece734/project/s06/lintangwuReport.pdf
But I really need your advice. Is there any tutorial that teaches how to do this? I am interested in learning the core theory with an OpenCV explanation (C++).
Since I am not good at maths (I am working on it; I didn't go to university, they found me and invited me to join the final year for free because of my programming skills, so I missed the math), material that is full of math will not work for me.
Please help. Thank you.