How to use buildOpticalFlowPyramid? - c++

I'm using OpenCV 3.3.1. I want to do a semi-dense optical flow operation using cv::calcOpticalFlowPyrLK, but I've been getting some really noticeable slowdown whenever my ROI is pretty big (partly because I let the user decide what the winSize should be, ranging from 10 to 100). Anyway, it seems like cv::buildOpticalFlowPyramid can mitigate the slowdown by building image pyramids? I'm somewhat familiar with what image pyramids are, but in the context of this function I'm especially confused about which parameters to pass in, and how they affect my call to cv::calcOpticalFlowPyrLK. With that in mind, I now have this set of questions:
According to the documentation, the output is an OutputArrayOfArrays, which I take it can be a vector of cv::Mat objects. If so, what do I pass in to cv::calcOpticalFlowPyrLK for prevImg and nextImg (assuming that I need to build image pyramids for both)?
According to the docs for cv::buildOpticalFlowPyramid, you need to pass in a winSize parameter in order to calculate the required padding for the pyramid levels. Do you pass in the same winSize value when you eventually call cv::calcOpticalFlowPyrLK?
What exactly do the pyrBorder and derivBorder arguments do?
Lastly, and apologies if this sounds newbish, what is the purpose of this function? I always assumed that cv::calcOpticalFlowPyrLK internally builds the image pyramids. Is it just to speed up the optical flow operation?
I hope my questions were clear. I'm still very new to OpenCV and computer vision, but this topic is very interesting.
Thank you for your time.
EDIT:
I used the function to see if my guess was correct; so far it has worked, but I've seen no noticeable speed-up. Below is how I used it:
// Building pyramids
int maxLvl = 3;
maxLvl = cv::buildOpticalFlowPyramid(imgPrev, imPyr1, cv::Size(searchSize, searchSize), maxLvl, true);
maxLvl = cv::buildOpticalFlowPyramid(tmpImg, imPyr2, cv::Size(searchSize, searchSize), maxLvl, true);
// LK optical flow call
cv::calcOpticalFlowPyrLK(imPyr1, imPyr2, currentPoints, nextPts, status, err,
                         cv::Size(searchSize, searchSize), maxLvl, termCrit, 0, 0.00001);
So now I'm wondering what's the purpose of preparing the image pyramids if calcOpticalFlowPyrLK does it internally?

So the point of your question is that you are trying to improve the speed of optical flow tracking by tuning your input parameters.
If you want the quick and dirty answer, here it is:
KLT (OpenCV's calcOpticalFlowPyrLK) defines a residual function, which is the sum of squared intensity differences between the two images over the points inside the search window.
The main purpose is to find the displacement vector that minimizes this residual function.
So if you increase the search window size (winSize), there are more points to evaluate and it is harder to find that minimizing displacement.
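For reference, the residual the tracker minimizes over the search window (as defined in the Bouguet paper linked below) is roughly:

\epsilon(d_x, d_y) = \sum_{x = p_x - w_x}^{p_x + w_x} \; \sum_{y = p_y - w_y}^{p_y + w_y} \bigl( I(x, y) - J(x + d_x,\, y + d_y) \bigr)^2

where the (2 w_x + 1) x (2 w_y + 1) neighborhood is the search window you set with winSize, so a larger window means more terms to evaluate per iteration.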
If you really want to dig into that, then please read the official paper.
See section 2.4:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.185.585&rep=rep1&type=pdf
I took it from the official documentation:
https://docs.opencv.org/2.4/modules/video/doc/motion_analysis_and_object_tracking.html#bouguet00
Hope that helps.
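On the edit's follow-up (why prebuild pyramids at all if calcOpticalFlowPyrLK builds them internally): the usual benefit is that in a video loop you can build each frame's pyramid once and reuse it as the previous-frame pyramid on the next iteration, instead of letting calcOpticalFlowPyrLK rebuild it on every call. A rough sketch, assuming a standard VideoCapture loop (variable names like cap, firstGray, prevPts and termCrit are placeholders, not from your code):

// Rough sketch: each frame's pyramid is built once and reused on the next iteration.
std::vector<cv::Mat> prevPyr, nextPyr;
cv::Mat frame, gray;
const int maxLvl = 3;
const cv::Size winSize(searchSize, searchSize);   // same winSize as the LK call
cv::buildOpticalFlowPyramid(firstGray, prevPyr, winSize, maxLvl, true);
while (cap.read(frame)) {
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::buildOpticalFlowPyramid(gray, nextPyr, winSize, maxLvl, true);
    // prevPts would come from e.g. goodFeaturesToTrack on the previous frame
    cv::calcOpticalFlowPyrLK(prevPyr, nextPyr, prevPts, nextPts, status, err,
                             winSize, maxLvl, termCrit);
    std::swap(prevPyr, nextPyr);   // next frame's "previous" pyramid is already built
    prevPts = nextPts;
}

If you build both pyramids fresh for every call, as in the edit, you should indeed see little or no speed-up over letting calcOpticalFlowPyrLK do it internally.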

Related

How to improve accuracy of estimateAffine2D (or estimateRigidTransform) in OpenCV?

I have two sets of points, one from time t-1 and one from the current time t. The first set was generated using goodFeaturesToTrack, and the latter using calcOpticalFlowPyrLK(). Using these two sets of points, I then estimate a transformation matrix via estimateAffinePartial2D() in order to keep track of its scale & rotation. A code snippet is listed below:
// Precompute image pyramids
maxLvl = cv::buildOpticalFlowPyramid(_imgPrev, imPyr1, _winSize, maxLvl, true);
maxLvl = cv::buildOpticalFlowPyramid(tmpImg, imPyr2, _winSize, maxLvl, true);
// Optical flow call for tracking pixels
cv::calcOpticalFlowPyrLK(imPyr1, imPyr2, _currentPoints, nextPts, status, err, _winSize, maxLvl, _terminationCriteria, 0, 0.000001);
// Get transformation matrix between the two data sets
cv::Mat H = cv::estimateAffinePartial2D(_currentPoints, nextPts, inlier_mask, cv::RANSAC, 10.0, 2000, 0.99);
Using H, I then map my masking points using perspectiveTransform(). The result seems accurate for the first few dozen frames, until I notice some drift (in terms of rotation) occurring when the object I am tracking continues to rotate (usually when the rotation becomes > M_PI). I'm honestly stumped on where the culprit is, but my main suspicion is that my window size for optical flow might be too small, or too big. However, tweaking the window size did not seem to help: the position of my object is still accurate, but the estimated rotation (and scale) got worse. Can anyone shed some light on this?
Warm regards and thanks.
EDIT: Images attached to show drift issue
Starting Frame
First few frames -- Rotation OK
Z-Rotation Drift occurs -- see anchor line has drifted towards the red rectangle.
The Lucas-Kanade tracker needs more features. My guess is that the tracking template you provided is not feature-rich enough.
(1) Try with other feature-rich real images, e.g. the OpenCV feature tracking template image.
(2) Fix the scale. Since you are doing a simulation, you can try to anchor the size first.
calcOpticalFlowPyrLK is widely used in visual-inertial state estimation work, such as Semi-Direct Visual Odometry (SVO) or VINS-Mono. You can look at the code inside those projects to see how other people handle the features and parameters.
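As a diagnostic for the drift, note that the scale and rotation can be read straight off the 2x3 matrix returned by estimateAffinePartial2D; logging them per frame may show whether the error comes from the flow points or from the estimator. A small sketch, assuming H is non-empty and CV_64F (which is what the function returns):

// For a partial affine (similarity) transform the 2x3 matrix has the form
//   [ s*cos(a)  -s*sin(a)  tx ]
//   [ s*sin(a)   s*cos(a)  ty ]
// so scale and rotation can be recovered directly (needs <cmath>):
double a = H.at<double>(0, 0);
double b = H.at<double>(1, 0);
double scale    = std::sqrt(a * a + b * b);
double angleRad = std::atan2(b, a);   // rotation between the two frames, in (-pi, pi]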

The meaning of sigma_s and sigma_r in detailEnhance function on OpenCV

The detailEnhance function provided by OpenCV has the parameters InputArray, OutputArray, sigma_s and sigma_r. What do sigma_s and sigma_r mean, and what are they used for?
Here is the source: http://docs.opencv.org/3.0-beta/modules/photo/doc/npr.html#detailenhance
Thank you in advance.
sigma_s controls how much the image is smoothed - the larger its value, the more smoothed the image gets, but it's also slower to compute.
sigma_r is important if you want to preserve edges while smoothing the image. A small sigma_r results in only very similar colors being averaged (i.e. smoothed), while colors that differ a lot will stay intact.
See also: https://www.learnopencv.com/non-photorealistic-rendering-using-opencv-python-c/
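For completeness, a minimal usage sketch (the file names are placeholders; 10 and 0.15 are the documented defaults):

#include <opencv2/photo.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat src = cv::imread("input.jpg"), dst;
    // sigma_s (range 0..200): size of the neighborhood used for smoothing
    // sigma_r (range 0..1):   how different colors may be and still get averaged
    cv::detailEnhance(src, dst, 10.0f, 0.15f);
    cv::imwrite("enhanced.jpg", dst);
    return 0;
}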

Refining Camera parameters and calculating errors - OpenCV

I've been trying to refine my camera parameters with CvLevMarq, but after reading about it, it seems to give mixed results - which is exactly what I am experiencing. I read about the alternatives and came upon Eigen - and also found this library that utilizes it.
However, the library above seems to use a stitching class that doesn't support OpenCV and will probably require me to port it to OpenCV.
Before going ahead and doing so, which will probably not be an easy task, I figured I'd ask around first and see if anyone else had the same problem?
I'm currently using:
1. Calculating features with FASTFeatureDetector
Ptr<FeatureDetector> detector = new FastFeatureDetector(5,true);
detector->detect(firstGreyImage, features_global[firstImageIndex].keypoints); // Previous picture
detector->detect(secondGreyImage, features_global[secondImageIndex].keypoints); // New picture
2. Extracting features with SIFTDescriptorExtractor
Ptr<SiftDescriptorExtractor> extractor = new SiftDescriptorExtractor();
extractor->compute(firstGreyImage, features_global[firstImageIndex].keypoints, features_global[firstImageIndex].descriptors); // Previous Picture
extractor->compute(secondGreyImage, features_global[secondImageIndex].keypoints, features_global[secondImageIndex].descriptors); // New Picture
3. Matching features with BestOf2NearestMatcher
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_use_gpu, 0.50f);
matcher(features_global, pairwise_matches);
matcher.collectGarbage();
4. CameraParams.R quaternion passed from a device (slightly inaccurate which causes the issue)
5. CameraParams.Focal == 389.0f -- Played around with this value, 389.0f is the only value that matches the images horizontally but not vertically.
6. Bundle Adjustment (cvLevMarq, calcError & calcJacobian)
Ptr<BPRefiner> adjuster = new BPRefiner();
adjuster->setConfThresh(0.80f);
adjuster->setMaxIterations(5);
(*adjuster)(features,pairwise_matches,cameras);
7. ExposureCompensator (GAIN)
8. OpenCV MultiBand Blender
What works so far:
SeamFinder - works to some extent, but it depends on the result of the cvLevMarq algorithm. I.e. if the algorithm is off, SeamFinder is going to be off too.
HomographyBasedEstimator works beautifully. However, since it "relies" on the features, it's unfortunately not the method that I'm looking for.
I wouldn't want to rely on the features since I already have the matrix; if there's a way to "refine" the current matrix instead, then that would be the targeted result.
Results so far:
cvLevMarq "Russian roulette" 6/10:
This is what I'm trying to achieve 10/10 times. But 4/10 times, it looks like the picture below this one.
By simply re-running the algorithm, the results change. 4/10 times it looks like this (or worse):
cvLevMarq "Russian roulette" 4/10:
Desired Result:
I'd like to "refine" my camera parameters with the features that I've matched - in hope that the images would align perfectly. Instead of hoping that cvLevMarq will do the job for me (which it won't 4/10 times), is there another way to ensure that the images will be aligned?
Update:
I've tried these versions:
OpenCV 3.1: Using CvLevMarq with 3.1 is like playing Russian roulette. Sometimes it can align the images perfectly, and other times it estimates the focal length as NaN, which causes a segfault in the MultiBand Blender (ROI = 0,0,1,1 because of the NaN).
OpenCV 2.4.9/2.4.13: Using CvLevMarq with 2.4.9 or 2.4.13 is unfortunately the same thing minus the NaN issue. 6/10 times it can align the images perfectly, but the other 4 times it's completely off.
My Speculations / Thoughts:
Template matching using OpenCV. Maybe if I template match the ends of the images (i.e. x = 0, y = 0, height = image.height, width = 50)? Any thoughts about this?
I found this interesting paper about Levenberg Marquardt applied in Homography. That looks like something that could solve my problem since the paper uses corner detection and whatnot to detect the features in the images. Any thoughts about this?
Maybe the problem isn't in CvLevMarq but instead in BestOf2NearestMatcher? However, I've searched for days and I couldn't find another method that returns the pairwise matches to pass to BPRefiner.
Hough Line Transform: detecting the lines in the first/second image and using those to align the images. Any thoughts on this? -- One concern: what if the images don't have any lines, i.e. an empty wall?
Maybe I'm overcomplicating something simple... or maybe I'm not? Basically, I'm trying to align a set of images so I can warp them without overlapping each other. Drop a comment if it doesn't make sense :)
Update Aug 12:
After trying all kinds of combinations, the absolute best so far is CvLevMarq. The only problem with it is the mixed results shown in the images above. If anyone has any input, I'd be forever grateful.
It seems your parameter initialization is the problem. I would use a linear estimator first, i.e. ignore your noisy sensor, and then use this as the initial values for the non-linear optimizer.
A quick method is to use getAffineTransform, as you have mostly rotation.
Maybe you want to take a look at this library: https://github.com/ethz-asl/kalibr.
Cheers
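To make the "linear estimator first" suggestion concrete, here's a rough sketch of computing a feature-only initial estimate from your pairwise matches and using that, rather than the noisy sensor rotation, as the optimizer's starting point (names like good_matches, keypoints1 and keypoints2 are placeholders for whatever your matcher produced):

// Feature-only linear estimate, no sensor data involved (needs <opencv2/calib3d.hpp>).
std::vector<cv::Point2f> pts1, pts2;
for (const cv::DMatch &m : good_matches) {
    pts1.push_back(keypoints1[m.queryIdx].pt);
    pts2.push_back(keypoints2[m.trainIdx].pt);
}
cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0);
// Use H (or the rotations/focals derived from it, e.g. via the
// HomographyBasedEstimator you already have) as the initial camera
// parameters for the bundle adjuster, instead of the raw device quaternion.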
If you want to stitch the images, you should see stitching_detailed.cpp. It will probably solve your problem.
In addition, I have used the Graph Cut seam finding method with Canny edge detection for better stitching results in this code. If you want to optimize this code, see here.
Also, SIFT is fine if you are only going to use it for personal use. You should know that SIFT is patented and will cost you if you use it for commercial purposes. Use ORB instead.
Hope it helps!
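If you do switch to ORB, a minimal sketch with the OpenCV 3.x API would look roughly like this (matching with Hamming distance, since ORB descriptors are binary; the 2000 feature budget is arbitrary):

#include <opencv2/features2d.hpp>

cv::Ptr<cv::ORB> orb = cv::ORB::create(2000);   // 2000 = arbitrary feature budget
std::vector<cv::KeyPoint> kps1, kps2;
cv::Mat desc1, desc2;
orb->detectAndCompute(firstGreyImage, cv::noArray(), kps1, desc1);
orb->detectAndCompute(secondGreyImage, cv::noArray(), kps2, desc2);

// ORB descriptors are binary, so use Hamming distance with cross-checking
cv::BFMatcher matcher(cv::NORM_HAMMING, true /*crossCheck*/);
std::vector<cv::DMatch> matches;
matcher.match(desc1, desc2, matches);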

Parameter of BackgroundSubtractorMOG2

I have trouble understanding all the parameters of BackgroundSubtractorMOG2.
I looked in the code (located in bgfg_gaussmix2.cpp), but I don't see the connection to the mentioned paper. For example, Tb = varThreshold, but what is Tb called in the paper?
I am especially interested in the parameters marked in bold.
Let's start with the easy parameters [my remarks in brackets]:
int nmixtures
Maximum allowed number of mixture components. Actual number is determined dynamically per pixel.
[set 0 for GMG]
uchar nShadowDetection
The value for marking shadow pixels in the output foreground mask. Default value is 127.
float fTau
Shadow threshold. A shadow is detected if the pixel is a darker version of the background. Tau is a threshold defining how much darker the shadow can be. Tau = 0.5 means that if a pixel is more than twice as dark, it is not considered a shadow.
Now to the ones I don't understand:
float backgroundRatio
Threshold defining whether the component is significant enough to be included in the background model (corresponds to TB = 1 - cf from the paper [which paper?]). cf = 0.1 => TB = 0.9 is the default. For alpha = 0.001, it means that the mode should exist for approximately 105 frames before it is considered foreground.
float varThresholdGen
Threshold for the squared Mahalanobis distance that helps decide when a sample is close to the existing components (corresponds to Tg). If it is not close to any component, a new component is generated. 3 sigma => Tg = 3*3 = 9 is the default. A smaller Tg value generates more components; a higher Tg value may result in a small number of components, but they can grow too large. [I don't understand a word of this]
In the constructor, the variable varThreshold is used. Is it the same as varThresholdGen?
Threshold on the squared Mahalanobis distance to decide whether a sample is well described by the background model (see Cthr??). This parameter does not affect the background update. A typical value could be 4 sigma, that is, varThreshold = 4*4 = 16 (see Tb??).
float fVarInit
Initial variance for the newly generated components. It affects the speed of adaptation. The parameter value is based on your estimate of the typical standard deviation from the images. OpenCV uses 15 as a reasonable value.
float fVarMin
Parameter used to further control the variance.
float fVarMax
Parameter used to further control the variance.
float fCT
Complexity reduction parameter. This parameter defines the number of samples needed to prove that the component exists. CT = 0.05 is the default value for all the samples. By setting CT = 0 you get an algorithm very similar to the standard Stauffer & Grimson algorithm.
Someone asked pretty much the same question on the OpenCV website, but without an answer.
Well, I don't think anyone could tell you which parameter is what if you don't know the details of the algorithm you are using. Conversely, you shouldn't need anyone to tell you which parameter is what once you do know the details of the algorithm. I say this about the detailed parameters (fCT, fVarMax, etc.), not the straightforward ones (nmixtures, nShadowDetection, etc.).
So, I think you should read the papers referenced in the documentation. Here are the links for the papers 1, 2, 3.
You should also read this paper, which is where background estimation started.
After reading these papers and checking the code against them, I'm sure you will understand what those parameters are.
Good luck!
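To make the names concrete in code, here is a sketch of how the documented parameters map onto the OpenCV 3.x setters (the values below are just the documented defaults, not recommendations):

#include <opencv2/video.hpp>

cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 =
    cv::createBackgroundSubtractorMOG2(500 /*history*/, 16.0 /*varThreshold, Tb*/, true /*detectShadows*/);
mog2->setNMixtures(5);                        // max mixture components per pixel
mog2->setBackgroundRatio(0.9);                // TB = 1 - cf
mog2->setVarThresholdGen(9.0);                // Tg, used when spawning new components
mog2->setVarInit(15.0);                       // fVarInit
mog2->setComplexityReductionThreshold(0.05);  // fCT
mog2->setShadowValue(127);                    // nShadowDetection
mog2->setShadowThreshold(0.5);                // fTau

cv::Mat frame, fgMask;
// inside the processing loop:
mog2->apply(frame, fgMask);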

Reverb Algorithm [closed]

I'm looking for a simple or commented reverb algorithm, even in pseudocode would help a lot.
I've found a couple, but the code tends to be rather esoteric and hard to follow.
Here is a very simple implementation of a "delay line" which will produce a reverb effect in an existing array (C#, buffer is short[]):
int delayMilliseconds = 500; // half a second
int delaySamples =
    (int)((float)delayMilliseconds * 44.1f); // assumes 44100 Hz sample rate
float decay = 0.5f;
for (int i = 0; i < buffer.Length - delaySamples; i++)
{
    // WARNING: overflow potential
    buffer[i + delaySamples] += (short)((float)buffer[i] * decay);
}
Basically, you take the value of each sample, multiply it by the decay parameter and add the result to the value in the buffer delaySamples away.
This will produce a true "reverb" effect, as each sound will be heard multiple times with declining amplitude. To get a simpler echo effect (where each sound is repeated only once) you use basically the same code, only run the for loop in reverse.
Update: the word "reverb" in this context has two common usages. My code sample above produces a classic reverb effect common in cartoons, whereas in a musical application the term is used to mean reverberation, or more generally the creation of artificial spatial effects.
A big reason the literature on reverberation is so difficult to understand is that creating a good spatial effect requires much more complicated algorithms than my sample method here. However, most electronic spatial effects are built up using multiple delay lines, so this sample hopefully illustrates the basics of what's going on. To produce a really good effect, you can (or should) also muddy the reverb's output using FFT or even simple blurring.
Update 2: Here are a few tips for multiple-delay-line reverb design (a rough sketch putting them together follows this list):
Choose delay values that won't positively interfere with each other (in the wave sense). For example, if you have one delay at 500ms and a second at 250ms, there will be many spots that have echoes from both lines, producing an unrealistic effect. It's common to multiply a base delay by different prime numbers in order to help ensure that this overlap doesn't happen.
In a large room (in the real world), when you make a noise you will tend to hear a few immediate (a few milliseconds) sharp echoes that are relatively undistorted, followed by a larger, fainter "cloud" of echoes. You can achieve this effect cheaply by using a few backwards-running delay lines to create the initial echoes and a few full reverb lines plus some blurring to create the "cloud".
The absolute best trick (and I almost feel like I don't want to give this one up, but what the hell) only works if your audio is stereo. If you slightly vary the parameters of your delay lines between the left and right channels (e.g. 490ms for the left channel and 513ms for the right, or .273 decay for the left and .2631 for the right), you'll produce a much more realistic-sounding reverb.
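A rough C++ sketch combining the tips above: a few feedback delay lines with prime-scaled delays, slightly detuned between the left and right channels. The specific primes, delay times and decay values are arbitrary, just for illustration:

#include <vector>
#include <cstddef>

void simpleStereoReverb(std::vector<float>& left, std::vector<float>& right,
                        int sampleRate = 44100)
{
    const int primes[]   = {29, 37, 43, 53};   // keeps the echoes from coinciding
    const float baseMs[] = {11.0f, 11.3f};     // slight L/R difference
    const float decay[]  = {0.27f, 0.26f};

    std::vector<float>* channels[2] = {&left, &right};
    for (int c = 0; c < 2; ++c) {
        std::vector<float>& buf = *channels[c];
        for (int p : primes) {
            std::size_t d = static_cast<std::size_t>(baseMs[c] * p * sampleRate / 1000.0f);
            if (d == 0 || d >= buf.size()) continue;
            for (std::size_t i = 0; i + d < buf.size(); ++i)
                buf[i + d] += buf[i] * decay[c];   // feedback delay line
        }
    }
}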
Digital reverbs generally come in two flavors.
Convolution Reverbs convolve an impulse response with an input signal. The impulse response is often a recording of a real room or other reverberation source. The character of the reverb is defined by the impulse response. As such, convolution reverbs usually provide limited means of adjusting the reverb character.
Algorithmic Reverbs mimic reverb with a network of delays, filters and feedback. Different schemes will combine these basic building blocks in different ways. Much of the art is in knowing how to tune the network. Algorithmic reverbs usually expose several parameters to the end user so the reverb character can be adjusted to suit.
The A Bit About Reverb post at EarLevel is a great introduction to the subject. It explains the differences between convolution and algorithmic reverbs and shows some details on how each might be implemented.
Physical Audio Signal Processing by Julius O. Smith has a chapter on reverb algorithms, including a section dedicated to the Freeverb algorithm. Skimming over that might help when searching for some source code examples.
Sean Costello's Valhalla blog is full of interesting reverb tidbits.
What you need is the impulse response of the room or reverb chamber which you want to model or simulate. The full impulse response will include all the multiple and multi-path echos. The length of the impulse response will be roughly equal to the length of time (in samples) it takes for an impulse sound to completely decay below audible threshold or given noise floor.
Given an impulse vector of length N, you could produce an audio output sample by vector multiplication of the input vector (made up of the current audio input sample concatenated with the previous N-1 input samples) by the impulse vector, with appropriate scaling.
Some people simplify this by assuming most taps (down to all but 1) in the impulse response are zero, and just using a few scaled delay lines for the remaining echos which are then added into the output.
For even more realistic reverb, you might want to use different impulse responses for each ear, and have the response vary a bit with head position. A head movement of as little as a quarter inch might vary the position of peaks in the impulse response by 1 sample (at 44.1k rates).
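A minimal time-domain sketch of the convolution idea above, just to make it concrete (for impulse responses of realistic length you would switch to FFT-based, partitioned convolution):

#include <vector>
#include <cstddef>

std::vector<float> convolveReverb(const std::vector<float>& input,
                                  const std::vector<float>& impulse)
{
    std::vector<float> out(input.size() + impulse.size() - 1, 0.0f);
    for (std::size_t n = 0; n < input.size(); ++n)
        for (std::size_t k = 0; k < impulse.size(); ++k)
            out[n + k] += input[n] * impulse[k];   // scaled, delayed copy of the input
    return out;
}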
You can use GVerb. Get the code from here. GVerb is a LADSPA plug-in; you can go here if you want to know more about LADSPA.
Here is the wiki for GVerb, including an explanation of the parameters and some ready-made reverb settings.
We can also use it directly in Objective-C:
ty_gverb *_verb;
_verb = gverb_new(16000.f, 41.f, 40.0f, 7.0f, 0.5f, 0.5f, 0.5f, 0.5f, 0.5f);
AudioSampleType *samples = (AudioSampleType *)dataBuffer.mBuffers[0].mData; // audio data from AudioUnit render or ExtAudioFile reading
float lval, rval;
for (int i = 0; i < fileLengthFrames; i++) {
    float value = (float)samples[i] / 32768.f; // SInt16 to float
    gverb_do(_verb, value, &lval, &rval);
    samples[i] = (SInt16)(lval * 32767.f);     // float to SInt16
}
GVerb is a mono effect, but if you want a stereo effect you could run each channel through the effect separately and then pan and mix the processed signals with the dry signals as required.