Training the Face Recognizer is taking a lot of time.
Is this training time machine dependent?
Any tips for minimizing this time if I have data for a few hundred people?
Yes, it is machine dependent. Depending on where you keep the pictures, I/O can also be an issue, since each image has to be read in its entirety.
I currently train on ~5,500 pictures a day with two different OpenCV models; one takes ~40 minutes and the other about three hours. On top of that, there are roughly 15 minutes of pre-processing before the training even begins, which includes the steps below (sketched in code after the list):
Grayscaling everything
Cropping faces
Facial alignment
Verifying
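For reference, the grayscaling and cropping steps might look roughly like this with OpenCV's Python bindings; the cascade file ships with OpenCV, the 200x200 crop size is just an example, and the alignment and verification steps are omitted:

    import cv2

    # Stock frontal-face cascade bundled with OpenCV.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def preprocess(path, size=(200, 200)):
        """Grayscale an image and return equal-size crops of detected faces."""
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return [cv2.resize(gray[y:y + h, x:x + w], size)
                for (x, y, w, h) in faces]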
I found that if you will be adding to your picture repository often, it is easier to save the trained model, load it later, and, if the model is updateable, update it and re-save. That avoids retraining from scratch every time you instantiate the recognizer.
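A minimal sketch of that save/load/update pattern, assuming OpenCV's LBPH recognizer (the one OpenCV face recognizer that supports incremental updates; it needs the contrib modules, and the file name and stand-in data below are illustrative):

    import cv2
    import numpy as np

    model_path = "recognizer.yml"  # illustrative path
    recognizer = cv2.face.LBPHFaceRecognizer_create()

    # Stand-ins for real pre-processed crops (equal-size grayscale images).
    faces = [np.random.randint(0, 256, (200, 200), np.uint8) for _ in range(4)]
    recognizer.train(faces, np.array([0, 0, 1, 1]))  # full training, done once
    recognizer.write(model_path)                     # persist the trained model

    # Later, when new pictures are added to the repository:
    recognizer.read(model_path)                      # load instead of retraining
    new_faces = [np.random.randint(0, 256, (200, 200), np.uint8)]
    recognizer.update(new_faces, np.array([2]))      # incremental update
    recognizer.write(model_path)                     # re-save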
Good luck,
Orlando
I'm using Keras (with a TensorFlow backend) for an image classification project. I have a total of almost 40,000 high-resolution (1920x1080) images that I use as training input. Training takes about 45 minutes, and this is becoming a problem, so I was thinking I might be able to speed things up by lowering the resolution of the image files. Looking at the code (I didn't write it myself), it seems all images are resized to 30x30 pixels anyway before processing.
I have two general questions about this.
Is it reasonable to expect this to improve the training speed?
Would resizing the input image files affect the accuracy of the image classification?
1. Of course it will affect training speed: the spatial dimensions of the input are among the most important factors in a model's speed.
2. It will certainly affect accuracy too, but how much depends on many other aspects, such as what objects you are classifying and what dataset you are working with.
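One practical consequence: since the pipeline throws the detail away anyway, you can resize the files on disk once and stop paying the 1920x1080 decode cost every epoch. A minimal one-off script, assuming Pillow and illustrative directory names:

    from pathlib import Path
    from PIL import Image

    src, dst = Path("images_full"), Path("images_small")
    dst.mkdir(exist_ok=True)

    for path in src.glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        img = img.resize((30, 30), Image.LANCZOS)  # match the model's input size
        img.save(dst / path.name)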
I'm working on a project where I need to detect faces in very messy videos (recorded from an egocentric point of view, so you can imagine). Faces can have yaw angles that vary between -90 and +90 degrees, pitch with almost the same variation (well, a bit less due to human body constraints), and possibly some roll variation too.
I've spent a lot of time searching for a pose-independent face detector. In my project I'm using OpenCV, but the OpenCV face detector is not even close to the detection rate I need. It has very good results on frontal faces but almost zero results on profile faces. Using haarcascade .xml files trained on profile images doesn't really help. Combining frontal and profile cascades yields slightly better results, but still not even close to what I need.
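Roughly, the combination I tried looks like the following (sketched in Python for brevity, although my application is C++; the cascade files ship with OpenCV):

    import cv2

    frontal = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    profile = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_profileface.xml")

    def detect_faces(gray):
        boxes = list(frontal.detectMultiScale(gray, 1.1, 5))
        boxes += list(profile.detectMultiScale(gray, 1.1, 5))
        # The profile cascade covers one side only; flip to catch the other.
        flipped = cv2.flip(gray, 1)
        width = gray.shape[1]
        for (x, y, w, h) in profile.detectMultiScale(flipped, 1.1, 5):
            boxes.append((width - x - w, y, w, h))  # map back to original coords
        return boxes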
Training my own haarcascade will be my very last resort, given the huge computational (or time) requirements.
By now, what I'm asking is any help or any advice regarding this matter.
The requirements for a face detector I could use are:
very good detection rate. I don't mind a very high false-positive rate, since by exploiting temporal consistency in my video I'll probably be able to get rid of the majority of them
written in C++, or usable from a C++ application
Real time is not an issue for now; detection rate is all I care about right now.
I've seen many papers achieving these results, but I couldn't find any code that I could use.
I sincerely thank you for any help that you'll be able to provide.
Perhaps not an answer, but too long to put in a comment.
You can use opencv_traincascade.exe to train a new detector that can detect a wider variety of poses. This post may be of help: http://note.sonots.com/SciSoftware/haartraining.html. I have managed to train a detector that is sensitive within -50:+50 degrees of yaw by using the FERET dataset. In my case we did not want to detect purely side-on faces, so the training data was prepared accordingly. Since FERET already provides convenient pose variations, it might be possible to train a detector somewhat close to your specification. Time is not an issue if you are using LBP features: training completes in 4-5 hours at most, and it goes even faster (15-30 minutes) by setting appropriate parameters and using less training data (useful for ascertaining whether the detector is going to produce the output you expect).
I have a universal iOS game built with cocos2d-iphone that has a large number of small images (amongst others). For these small images, the game works fine with a 1:2:4 ratio for iphone:ipad/iphone-retina:ipad-retina. I have two approaches to enable this in the game:
A) Have three sets of sprites/spritesheets for the three form factors required, name them appropriately, and have the correct set picked up at runtime
B) Have one set of highest-resolution images that are scaled at runtime depending on the device and its resolution: aSprite.scale = [self getScaleAccordingToDevice];
Option A has the advantage of lower runtime overhead, at the cost of a larger on-disk footprint (an important consideration, as the app is currently ~94 MB).
Option B has the advantage of a smaller on-disk footprint, but the cost is that iPad Retina images will be loaded into memory even on the iPhone 3GS (the lowest supported device).
Can someone provide arguments that will help me decide one way or the other?
Thanks
There is no argument: use option A.
Option B is absolutely out of the question because you would be loading images that may be 6-8 times larger in memory (as a texture) on a device (3GS) that has a quarter of the memory of an iPad 3 or 4 (256 MB vs 1 GB). Not to mention the additional processing power needed to render a scaled down version of such a large image. There's a good chance it won't work at all due to running out of memory and running too slowly (have you tried?).
Next, it stands to reason that at ~94 MB you might still not get your app below 50 MB with option B. The large Retina textures make up two thirds or three quarters of your bundle size; the SD textures don't weigh in much. This is the only bundle-size target you should ever consider, because below 50 MB users can download your app over the air, while over 50 MB they'll have to sync via Wi-Fi or a computer. If you can't get below 50 MB, it really doesn't matter whether your bundle size is 55 MB or 155 MB.
Finally there are better options to decrease bundle size. Read my article and especially the second part.
If your images are PNG, the first thing you should try is to convert them all to .pvr.ccz NPOT texture atlases (the easiest way to do that: TexturePacker). You may be able to cut bundle size by as much as 30-50% without losing image quality. And if you can afford to lose some image quality, even greater savings are possible (plus additional loading and performance improvements).
Well, at 94 MB, your app is already way beyond the download limit for the phone network, i.e. it will only ever be downloaded when a Wi-Fi or wired internet connection is available. So... is it really an issue? The other big factor you need to consider is the memory footprint when running. If you run the 4x assets on a 3GS and scale down, the memory requirement will still be that of the full-size sprite, i.e. 16x the amount of memory (for example, a 512x512 RGBA texture occupies 1 MB, while the same art at 2048x2048 occupies 16 MB). So the other question you have to ask yourself is whether the game is likely to run at all with such a high memory footprint on older devices. Also, the load time for the textures could be enough to affect the usability of your app on older devices. You need to measure these things and decide based on hard data (unfortunately).
The first test you need to do is to see whether your scaled-down sprites will look OK on standard-resolution iPhones; scaling down sometimes falls short of expectations when rendered. If your graphic designer turns option B down, you don't have a decision to make: he or she then has the onus of providing all three formats. If option B is still viable after that, I would start with it and measure on a 3GS (a small-scale project, not the complete implementation). If all is well, you are done.
PS: for app size, consider using the .pvr.ccz format (I use TexturePacker). Smaller textures and much faster load times (because of the PVR format). The load-time improvement may be smaller on the 3GS because of its generally slower processor, which has to do the decompression.
Is there any open-source code that will take a video shot indoors (from a smartphone, for example, of a home or office building's hallways) and superimpose the path traveled on a 2D picture? This can be a hand-drawn picture or a photo of a floor layout.
First I thought of doing this using the accelerometer and compass sensors, but then thought that one might get better accuracy with a visual odometry approach. I only need 0.5 to 1 meter accuracy. The phone will also collect important information indoors (no GPS) to superimpose on the path traveled (this is the real application of the project, and we know how to do that part). The post-processing of the video can be done later on a standalone computer, so speed and CPU power are not an issue.
Challenges:
The user will simply hand-carry the smartphone, so the video camera is moving (walking) rather than fixed.
I want to limit the video rate to keep the file size small (5 frames/sec? Is that OK?). I will typically need perhaps a full hour of video.
Will using inputs from the phone sensors help the visual approach?
Any help or guidance is appreciated. Thanks.
I have worked in the area for quite some time. There are three points which I'd care to make.
Vision only is hard
Vision-based navigation using just a cellphone camera is very difficult. Most of the literature with great results shows ~1% of distance traveled as the state-of-the-art error, and usually relies on stereo cameras. Stereo helps a great deal, particularly in indoor environments, for coping with scale drift. I've worked on a system which achieves 0.5% of distance traveled with stereo but only roughly 5% with monocular. While I can't share code, much of our system was inspired by this Sibley and Mei paper.
Stereo code in our case ran at a full 60 fps on a desktop. Provided you can push data fast enough, it'll be fine. With your error envelope, you can only navigate for 100 m or so. Is that enough?
Multi-sensor is the way to go, though the other sensors are worse than vision by themselves.
I've heard of some good work with accelerometers mounted on the foot, doing ZUPTs (zero-velocity updates) while the foot is briefly motionless on the ground during each step, in order to zero out drift. This approach has the clear drawback of needing the device mounted on your foot, which makes a vision approach largely useless.
A compass is interesting, but it will be thrown off by the large amount of metal inside an office building. Moving a few feet around a large metal cabinet can cause a directional jump of 50+ degrees.
Ultimately, a combination of sensors is likely to be the best if you can make that work.
Can you solve a simpler problem?
How much control do you have over your environment? Can you slap down fiducial markers? Can you do Wi-Fi triangulation? Does it need to be an initial exploration? If you can go through the environment beforehand and produce visual bubbles (akin to Google Street View) to match against, you'll be much more accurate.
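For instance, if you can place fiducial markers, detecting them is only a few lines with OpenCV's ArUco module. The sketch below uses the OpenCV 4.7+ detector API; the dictionary choice and file name are illustrative, and pose estimation from the corners is a separate step:

    import cv2

    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

    frame = cv2.imread("hallway_frame.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, rejected = detector.detectMarkers(gray)
    print("markers seen:", [] if ids is None else ids.ravel().tolist())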
I'm not aware of any software that does this directly (though it might exist) but stuff similar to what you want to do has been done. A few pointers:
Google for "Vision based robot localization" the problem you state is very similar to the problem robots with a camera have when they enter a new environment. In this field the approach is usually to have the robot map its environment and then use the model for later reference, but the techniques are similar to what you'll need.
Optical flow will roughly tell you in what direction the camera is moving, but it won't tell you the speed because you have no objective reference. This is because you don't know if the things you see moving in the video feed are 1cm away and very small or 1 mile away and very big.
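A sparse Lucas-Kanade flow sketch with OpenCV makes this concrete; note the output is pixel displacement per tracked feature, i.e. direction but no metric scale (the file name is illustrative):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("walkthrough.mp4")
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                      qualityLevel=0.01, minDistance=10)
        if pts is not None:
            nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            flow = (nxt - pts)[status.ravel() == 1]  # per-feature pixel motion
            print("median motion (px):", np.median(flow, axis=0).ravel())
        prev_gray = gray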
If you know the camera matrix of the camera recording the images you could try partial 3D scene reconstruction techniques to take a stab at the speed. Note that you can do the 3D scene stuff without the camera matrix (this is the "uncalibrated" part you see in the title of a lot of the google results), the camera matrix will let you add real world object sizes (and hence distances) to your reconstruction.
The number of images per second you need depends on the speed of the camera. More is better, but my guess is that 5 per second should be sufficient at walking speeds.
Using extra sensors will help. Probably the robot localization articles talk about this as well.
I'm trying to build a screen-flashing application that flashes the screen according to the music (which will be frequencies, such as healing frequencies, etc.).
I already made the player and know how I will make the screen flash, but I need the screen to flash super fast according to the music; for example, if the music speeds up, the flashing should speed up too. I understand that I would achieve this via FFT or DSP (as I only need to know when the frequency rises above some value, let's say 20 Hz, so I can change the color and make the screen flash).
But I've found that I understand NOTHING about them, let alone how to implement one in my application.
Can somebody help me learn either of them? My email is sismetic_chaos#hotmail.com. I really need help; I've been stuck for about 3 days, not coding or doing anything at all, just trying to understand, but I don't.
PS: My application is written in C++ and Qt.
PS: Thanks for taking the time to read this and for the willingness to help.
Edit: Thanks to all for the answers. The problem is in no way solved yet, but I appreciate all the answers; I didn't think I would get so many answers and so much info. Thanks to you all.
This is a difficult problem, requiring more than an FFT. I'll briefly describe how I implemented beat detection when I was writing software for professional DJ equipment.
First of all, you'll need to cut down the amount of data you're dealing with, since there are only two or three beats per second, but tens of thousands of samples. You'll also need to look at different frequency ranges, since some types of music carry the tempo in the bassline, and others in percussion or other instruments. So pass the signal through several band-pass filters (I chose 8 filters, each covering one octave, from low bass to high treble), and then downsample each band by averaging the power over a few hundred samples.
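Sketched in Python with scipy (not my original code; the band edges and hop size are illustrative), that first stage might look like:

    import numpy as np
    from scipy.signal import butter, sosfilt

    def band_envelopes(samples, sr=44100, hop=512):
        """Split audio into 8 octave bands and average each band's power
        over 'hop' samples (~86 envelope frames/second at 44.1 kHz)."""
        edges = [55 * 2 ** i for i in range(9)]  # 55 Hz ... 14080 Hz
        envs = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
            power = sosfilt(sos, samples) ** 2
            frames = power[: len(power) // hop * hop].reshape(-1, hop)
            envs.append(frames.mean(axis=1))
        return np.array(envs)  # shape: (8 bands, n_frames)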
Every few seconds, you'll have a thousand or so samples in each band. Your next tool is an autocorrelation, to identify repetitive patterns in the music. The peaks of the autocorrelation tell you what the beat is more or less likely to be; but you'll need to make up some heuristics to compare all the frequency bands to find a beat that you can be confident in, and to avoid misleading syncopations. If you can manage that, then you'll have a reasonable guess at the tempo, but no idea of the phase (i.e. exactly when to flash the screen).
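Continuing the sketch, the autocorrelation step for one band could look like this (the tempo range and the plain argmax stand in for the heuristics just described):

    import numpy as np

    def estimate_bpm(env, frame_rate=86.0, lo_bpm=60.0, hi_bpm=180.0):
        """Pick the autocorrelation peak at a plausible beat period."""
        env = env - env.mean()
        ac = np.correlate(env, env, mode="full")[len(env) - 1:]  # lags >= 0
        min_lag = int(frame_rate * 60.0 / hi_bpm)  # shortest allowed period
        max_lag = int(frame_rate * 60.0 / lo_bpm)  # longest allowed period
        best = min_lag + int(np.argmax(ac[min_lag:max_lag]))
        return 60.0 * frame_rate / best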
Now you can look at a smoothed version of the audio data for peaks, some of which are likely to correspond to beats. Initially, look for the strongest peak over the course of a few seconds and take that as a downbeat. In conjunction with the tempo you estimated in the first stage, you can predict when the next beat is due, measure where you actually saw something like a beat, and adjust your estimate to more closely match the data. You can also maintain a confidence level based on how well the predicted beats match the measured peaks; if that drops too low, restart the beat detection from scratch.
There are a lot of fiddly details to this, and it took me some weeks to get it working nicely. It is a difficult problem.
Or for a simple visualisation effect, you could simply detect peaks and flash the screen for each one; it will probably look good enough.
The output of an FFT will give you the frequency spectrum of an audio sample, but extracting the tempo from the FFT output is probably not the way you want to go.
One thing you can do is to use peak detection to identify the volume "spikes" that typically occur on the "down-beats" of the music. If you can identify the down-beats, then you can use a resource like bpmdatabase.com to find the tempo of the song. The tempo will tell you how fast to flash and the peaks you detected will tell you when to start flashing. Have your app monitor your flashes to make sure that they generally occur at the same time as a peak (if the two start to diverge, then the tempo may have changed mid-song).
That may sound straightforward, but this is actually a very non-trivial thing to implement. You might want to read this SO question for more information. There are some quality links in the answers there.
If I'm completely misinterpreting what you are trying to do and you need FFTs for something else, then you might want to look at using one of the existing FFT libraries to do the heavy lifting for you. Some examples are FFTW and KissFFT.
It sounds like maybe you're trying to get your visualizer to flash the screen in time with the music somehow. I don't think calculating the FFT is going to help you here. At any given instant there will be many simultaneous frequency components, all over the audio spectrum (roughly 20 Hz to 20 kHz). But you're likely to be a lot more interested in the musical tempo (beats per minute -- more like 5 Hz or below), and that's not going to show up anywhere in an FFT of the raw audio signal.
You probably need something much simpler -- some sort of real-time peak detection. Whenever you see a peak greater than some threshold above the average volume, make your screen flash.
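A minimal sketch of that idea, in Python for clarity (the logic ports directly to C++/Qt; block size, smoothing factor, and threshold are illustrative, and samples are assumed to be floats in [-1, 1]):

    import numpy as np

    def flash_indices(samples, block=1024, threshold=1.5, alpha=0.05):
        """Return sample indices where a block's RMS volume spikes above
        a slowly adapting running average."""
        avg, flashes = None, []
        for i in range(0, len(samples) - block, block):
            vol = np.sqrt(np.mean(samples[i:i + block] ** 2))  # RMS volume
            avg = vol if avg is None else (1 - alpha) * avg + alpha * vol
            if vol > threshold * avg:
                flashes.append(i)  # flash the screen at this point
        return flashes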
Of course, more complicated visualizations might well take advantage of the FFT, but not the one you're describing.
My recommendation would be to find a library that does this for you. Unless you have a lot of mathematics to back you up, I think you will waste a ton of time trying to learn FFTs when all you really want is some sort of 'bass hits per minute' number that you can adjust your graphics to.
Check out this similar post: here
It took me about three weeks to understand the mathematics behind FFTs and then another week to write something in Matlab using those concepts. If you are discouraged after three days, don't try and roll your own.
I hope this is helpful advice and not discouraging.
-Brian J. Stinar-
As previous answers have noted, an FFT is probably not the tool you need in order to solve your problem, which requires tempo detection rather than spectral analysis.
For an example of what can be done using an FFT, and of how a particular FFT implementation was integrated into a Qt application, take a look at this blog post describing the spectrum analyzer demo I developed. Code for the demo ships with Qt itself, in the demos/spectrum directory.