I'm looking for a method for tracking ships on a video.
For examples:
Video 1
Video 2
My expectation is tracking 12 frames/seconds and the accuracy is greater than 70%. I have heard about using learning-based detecting algorithm. However, the video in Video 1 & Video 3 are very poor quality, so I think the learning-based algorithm will not have good performance in this case.
Can anyone give me some advices for this task? Thank you so much!
Do you want to track or detect the ships? At least for the videos you posted, the tracking problems seems quite easy. Even a simple x-correlation tracker should have no trouble following points on those ships for quite long subsequences.
Related
I am trying to write a very good ANPR (automatic number plate recognition) system for Brazil's cars plates. So far I have used the javaANPR method which is the X and Y projection to find the ROI (car plate). It works well but not so good with image that has a lot of shadow in the car. And I am using tesseract-ocr as well for character recognition.
I got 80% of success for really good car images, from cars not moving.
And I got less than 60% for not so good images from moving cars.
I have been resourcing online, reading papers, etc. What do you think could help me improve it ? Maybe marge two methods ? Use templateMatch as well ? Because I need about 95% - 98% of success rate.
I see the anpronline
Their demo: https://www.anpronline.net/demo.html
They have done a really good job. It worked on 100% of my images.
Are you guys aware of what OCR engine do they use ? Maybe this is a top secret.
But can you guys point me to the right direction of how to improve my OCR ?
I really appreciate any help.
Thanks
It's likely that any highly successful system has already been patented or classified as a trade secret. If you are not doing this for a business you could try and find the patent and replicate from that. Alternatively you could look in the scientific literature for state of the art algorithms. Google Scholar is a pretty good search engine for that kind of search.
I am working on a project that requires me to detect and track a human in a live video from a webcam connected to a Beagleboard xm.
I have completed this task using Opencv in pixel domain. The results on the board are very accurate but extremely slow. Many people have suggested me to leave pixel domain and do the same task in an h.264/MPEG-4 compressed video as it would extremely reduce the computational overhead.
I have read many research papers but failed to discover any software platform or a library that I can use to analyze and process h.264 compressed videos.
I will be thankful if someone can suggest me some library for h.264 compressed video analysis and guide me further.
Thanks and Regards.
I'm not sure how practical this really is (I've never tried to do it), but my guess would be that what they're referring to would be looking for a block of macro-blocks that all have (nearly) identical motion vectors.
For example, let's assume you have a camera that's not panning, and the picture shows a car driving across the screen. Looking at the motion vectors, you should have a (roughly) car-shaped bunch of macro-blocks that all have similar motion vectors (denoting the motion of the car). Then, rather than look at the entire picture for your object of interest, you can look at that block in isolation and try to identify it. Likewise, if the camera was panning with the car, you'd have a car-shaped block with small motion vectors, and most of the background would have similar motion vectors in the opposite direction of the car's movement.
Note, however, that this is likely to be imprecise at best. Just for example, let's assume our mythical car as driving in front of a brick building, with its headlights illuminating some of the bricks. In this case, a brick in one picture might (easily) not point back at the same brick in the previous picture, but instead point at the brick in the previous picture that happened to be illuminated about the same. The bricks are enough alike that the closest match will depend more on illumination than the brick itself.
You may be able, eventually, to parse and determine that h.264 has an object, but this will not be "object tracking" like your looking for. openCV is excellent software and what it does best. Have you considered scaling the video down to a smaller resolution for easier analysis by openCV?
I think you are highly over estimating the computing power of this $45 computer. Object recognition and tracking is VERY hard computationally speaking. I would start by seeing how many frames per second your board can track and optimize from there. Start looking at where your bottlenecks are, you may be better off processing raw video instead of having to decode h.264 video first. Again, RAW video takes a LOT of RAM, and processing through that takes a LOT of CPU.
Minimize overhead from decoding video, minimize RAM overhead by scaling down the video before analysis, but in the end, your asking a LOT from a 1ghz, 32bit ARM processor.
FFMPEG is a very old library that is not being supported now a days. It has very limited capabilities in terms of processing and object tracking in h.264 compressed video. Most of the commands usually are outdated.
The best thing would be to study h.264 thoroughly and then try to implement your own API in some language like Java or c#.
I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine..). Faces can have angles of yaw that variate between -90 and +90, pitch with almost the same variation (well, a bit lower due to the human body constraints..) and possibly some roll variations too.
I've spent a lot of time searching for some pose independent face detector. In my project I'm using OpenCV but OpenCV face detector is not even close to the detection rate I need. It has very good results on frontal faces but almost zero results on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yield slightly better results but still, not even close to what I need.
Training my own haarcascade will be my very last resource since the huge computational (or time) requirements.
By now, what I'm asking is any help or any advice regarding this matter.
The requirements for a face detector I could use are:
very good detection rate. I don't mind a very high false positive rate since using some temporal consistency in my video I'll probably be able to get rid of the majority of them
written in c++, or that could work in a c++ application
Real time is not an issue by now, detection rate is everything I care right now.
I've seen many papers achieving these results but i couldn't find any code that I could use.
I sincerely thank for any help that you'll be able to provide.
perhaps not an answer but too long to put into comment.
you can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. this post may be of help. http://note.sonots.com/SciSoftware/haartraining.html. i have managed to trained a detector that is sensitive within -50:+50 yaw by using feret data set. for my case, we did not want to detect purely side faces so training data is prepared accordingly. since feret already provides convenient pose variations it might be possible to train a detector somewhat close to your specification. time is not an issue if you are using lbp features, training completes in 4-5 hours at most and it goes even faster(15-30min) by setting appropriate parameters and using fewer training data(useful for ascertaining whether the detector is going to produce the output you expected).
I am working with opencv these days and I am capable of doing 99% of stuff explained in opencv official tutorials. And I managed to do motion tracking manually with background substraction, where some users claimed as impossible.
However, right now I am working with object detection, where I need to track the hand and want to find whether the hand is moved to left or right. Can this be done by following steps? (used in motion detection)
Get camera 2 instances of camera video (real time)
blur it to reduce noise
theresold it to find hand (or leave it if blur is enough)
find the absolute deference between 2 images
Get PSR
find pixel position of motion
However, it seems like it is not 100% same as motion detection, because I read some stuff about Kalman Filter, Block-matching, etc which I did not use in motion detection. However, I found this tutorial
http://homepages.cae.wisc.edu/~ece734/project/s06/lintangwuReport.pdf
But, I really need your advice. Is there any tutorial which teach me how to do this? I am interested in learning core theory with opencv explanation (c++).
Since I am not good at maths( I am working on it - I didnt go to the university , they found me and invited me to join the final year for free because of my programming skills, so I missed math) , full of math stuff will not work.
Please help. Thank you.
Im trying to do a screen-flashing application, that flashes the screen according to the music(which will be frequencies, such as healing frequencies, etc...).
I already made the player and know how will I make the screen flash, but I need to make the screen flash super fast according to the music, for example if the music speeds up, the screen-flash will flash faster. I understand that I would achieve this by FFT or DSP(as I only need to know when the frequency raises from some Hz, lets say 20 to change the color, making the screen-flash).
But I've found that I understand NOTHING, even less try to implement it to my application.
Can somebody help me out to learn whichever both of them? My email is sismetic_chaos#hotmail.com. I really need help, I've been stucked for like 3 days not coding or doing anything at all, trying to understand, but I dont.
PS:My application is written in C++ and Qt.
PS:Thanks for taking the time to read this and the willingness to help.
Edit: Thanks to all for the answers, the problem is in no way solved yet, but I appreciate all the answers, I didnt thought I would get so many answers and info. Thanks to you all.
This is a difficult problem, requiring more than an FFT. I'll briefly describe how I implemented beat detection when I was writing software for professional DJ equipment.
First of all, you'll need to cut down the amount of data you're dealing with, since there are only two or three beats per second, but tens of thousands of samples. You'll also need to look at different frequency ranges, since some types of music carry the tempo in the bassline, and others in percussion or other instruments. So pass the signal through several band-pass filters (I chose 8 filters, each covering one octave, from low bass to high treble), and then downsample each band by averaging the power over a few hundred samples.
Every few seconds, you'll have a thousand or so samples in each band. Your next tool is an autocorrelation, to identify repetitive patterns in the music. The peaks of the autocorrelation tell you what the beat is more or less likely to be; but you'll need to make up some heuristics to compare all the frequency bands to find a beat that you can be confident in, and to avoid misleading syncopations. If you can manage that, then you'll have a reasonable guess at the tempo, but no idea of the phase (i.e. exactly when to flash the screen).
Now you can look at the a smoothed version of the audio data for peaks, some of which are likely to correspond to beats. Initially, look for the strongest peak over the course of a few seconds and take that as a downbeat. In conjunction with the tempo you estimated in the first stage, you can predict when the next beat is due, and measure where you actually saw something like a beat, and adjust your estimate to more closely match the data. You can also maintain a confidence level based on how well the predicted beats match the measured peaks; if that drops too low, then restart the beat detection from scratch.
There are a lot of fiddly details to this, and it took me some weeks to get it working nicely. It is a difficult problem.
Or for a simple visualisation effect, you could simply detect peaks and flash the screen for each one; it will probably look good enough.
The output of a FFT will give you the frequency spectrum of an audio sample, but extracting the tempo from the FFT output is probably not the way you want to go.
One thing you can do is to use peak detection to identify the volume "spikes" that typically occur on the "down-beats" of the music. If you can identify the down-beats, then you can use a resource like bpmdatabase.com to find the tempo of the song. The tempo will tell you how fast to flash and the peaks you detected will tell you when to start flashing. Have your app monitor your flashes to make sure that they generally occur at the same time as a peak (if the two start to diverge, then the tempo may have changed mid-song).
That may sound straightforward, but this is actually a very non-trivial thing to implement. You might want to read this SO question for more information. There are some quality links in the answers there.
If I'm completely mis-interpreting what you are trying to do and you need to do FFTs for something different, then you might want to look at using one of the existing FFT libraries to do the heavy lifting for you. Some examples are FFTW and KissFFT.
It sounds like maybe you're trying to get your visualizer to flash the screen in time with the
music somehow. I don't think calculating the FFT is going to help you here. At any
given instant, there will be many simultaneous frequency components, all over the audio spectrum (roughly 20 Hz to 20 kHz). But you're likely to be a lot more interested in the
musical tempo (beats per minute -- more like 5 Hz or below), and that's not going to show
up anywhere in an FFT of the raw audio signal.
You probably need something much simpler -- some sort of real-time peak detection.
Whenever you see a peak greater than some threshold above the average volume,
make your screen flash.
Of course, more complicated visualizations might well take advantage of the FFT,
but not the one you're describing.
My recommendation would be to find a library that does this for you. Unless you have a lot of mathematics to back you up, I think you will be wasting a ton of your time trying to learn FFTs when all you really want out is some sort of 'base hits per minute' number out which you can adjust your graphics to accordingly.
Check out this similar post:
here
It took me about three weeks to understand the mathematics behind FFTs and then another week to write something in Matlab using those concepts. If you are discouraged after three days, don't try and roll your own.
I hope this is helpful advice and not discouraging.
-Brian J. Stinar-
As previous answers have noted, an FFT is probably not the tool you need in order to solve your problem, which requires tempo detection rather than spectral analysis.
For an example of what can be done using FFT - and of how a particular FFT implementation was integrated into a Qt application, take a look at this blog post which describes the spectrum analyzer demo I developed. Code for the demo is shipped with Qt itself, in the demos/spectrum directory.