Parameter of BackgroundSubtractorMOG2 - c++

I have Problem understanding all Parameter of backgroundsubtractormog2.
I looked in the code (located in bfgf_gaussmix2.cpp), but don't see the connection to the mentioned paper. For exmaple is Tb = varThreshold, but what is the name of Tb in the paper?
I am especially interested in the fat marked parameter.
Let's start with the easy parameter [my remarks]:
int nmixtures
Maximum allowed number of mixture components. Actual number is determined dynamically per pixel.
[set 0 for GMG]
uchar nShadowDetection
The value for marking shadow pixels in the output foreground mask. Default value is 127.
float fTau
Shadow threshold. The shadow is detected if the pixel is a darker version of the background. Tau is a threshold defining how much darker the shadow can be. Tau= 0.5 means that if a pixel is more than twice darker then it is not shadow.
Now to the ones i don't understand:
float backgroundRatio
Threshold defining whether the component is significant enough to be included into the background model ( corresponds to TB=1-cf from the paper??which paper??). cf=0.1 => TB=0.9 is default. For alpha=0.001, it means that the mode should exist for approximately 105 frames before it is considered foreground.
float varThresholdGen
Threshold for the squared Mahalanobis distance that helps decide when a sample is close to the existing components (corresponds to Tg). If it is not close to any component, a new component is generated. 3 sigma => Tg=3*3=9 is default. A smaller Tg value generates more components. A higher Tg value may result in a small number of components but they can grow too large. [i don't understand a word of this]
In the Constructor the variable varThreshold is used. Is it the same as varThresholdGen?
Threshold on the squared Mahalanobis distance to decide whether it is well described by the background model (see Cthr??). This parameter does not affect the background update. A typical value could be 4 sigma, that is, varThreshold=4*4=16; (see Tb??).
float fVarInit
Initial variance for the newly generated components. It affects the speed of adaptation. The parameter value is based on your estimate of the typical standard deviation from the images. OpenCV uses 15 as a reasonable value.
float fVarMin
Parameter used to further control the variance.
float fVarMax
Parameter used to further control the variance.
float fCT
Complexity reduction parameter. This parameter defines the number of samples needed to accept to prove the component exists. CT=0.05 is a default value for all the samples. By setting CT=0 you get an algorithm very similar to the standard Stauffer&Grimson algorithm.
Someone asked pretty much the same question on the OpenCV website, but without an answer.

Well, I don't think anyone could tell you which parameter is what if you don't know the details of the algorithm that you are using. Besides, you should not need anyone to tell you which parameter is what if you know the details of the algorithm. I'm telling this for detailed parameters (fCT, fVarMax, etc.) not for straightforward ones (nmixtures, nShadowDetection, etc.).
So, I think you should read the papers referenced in the documentation. Here are the links for the papers 1, 2, 3.
And also you should read this paper as well, which is the beginning of background estimation.
After reading these papers and checking out the code with, I'm sure you will understand what those parameters are.
Good luck!

Related

Can't find GDK::InterpType members in gtkmm

I'm trying to make a Gtk::Image widget display a picture from a file, but prevent the widget from expanding in size, so I'm loading it from a Gdk::Pixbuf and then scaling the picture. I'm using Gdk::Pixbuf instead of GdkPixBuf because the latter one works on regular pointers, but Gtk::Image requires a Glib::RefPtr<Gdk::Pixbuf>. (Just mentioning all this in case there's a better way to achieve what I'm doing that I'm unaware of.)
auto pixbuf = Gdk::Pixbuf::create_from_file("/home/raitis/Music/WRLD/Awake EP/cover.jpg");
auto scaled = pixbuf->scale_simple(48, 48, Gdk::InterpType::NEAREST);
image->set(scaled);
Anyway, problem is that although I'm following the documentation for Gdk::Pixbuf, line 2 in my code generate the error:
error: ‘NEAREST’ is not a member of ‘Gdk::InterpType’
auto scaled = pixbuf->scale_simple(48, 48, Gdk::InterpType::NEAREST);
^~~~~~~
Trying GDK_INTERP_NEAREST instead also leads to an error. :(
no known conversion for argument 3 from ‘GdkInterpType’ to ‘Gdk::InterpType’
From the stable gtkmm gdkmm documentation, Gdk::InterpType members are:
INTERP_NEAREST
Nearest neighbor sampling; this is the fastest and lowest quality
mode. Quality is normally unacceptable when scaling down, but may be OK when
scaling up.
INTERP_TILES
This is an accurate simulation of the PostScript image operator
without any interpolation enabled.
Each pixel is rendered as a tiny parallelogram of solid color, the
edges of which are implemented with antialiasing. It resembles nearest
neighbor for enlargement, and bilinear for reduction.
INTERP_BILINEAR
Best quality/speed balance; use this mode by default.
Bilinear interpolation. For enlargement, it is equivalent to
point-sampling the ideal bilinear-interpolated image. For reduction,
it is equivalent to laying down small tiles and integrating over the
coverage area.
INTERP_HYPER
This is the slowest and highest quality reconstruction function.
It is derived from the hyperbolic filters in Wolberg's "Digital Image
Warping", and is formally defined as the hyperbolic-filter sampling
the ideal hyperbolic-filter interpolated image (the filter is designed
to be idempotent for 1:1 pixel mapping).
And from the documentation of the Gdk::Pixbuf, in the scale_simple method you'll find a reference to the interpolation type:
Leaves src unaffected. interp_type should be Gdk::INTERP_NEAREST if
you want maximum speed (but when scaling down Gdk::INTERP_NEAREST is
usually unusably ugly). The default interp_type should be
Gdk::INTERP_BILINEAR which offers reasonable quality and speed.

About generalized hough transform code

I was looking for an implementation of Generalized Hough Transform,and then I found this website,which showed me a complete implementation of GHT .
I can totally understand how the algorithm processes except this:
Vec2i referenceP = Vec2i(id_max[0]*rangeXY+(rangeXY+1)/2, id_max[1]*rangeXY+(rangeXY+1)/2);
which calculates the reference point of the object based on the maximum value of the hough space,then mutiplied by rangXY to get back to the corresponding position of origin image.(rangeXY is the dimensions in pixels of the squares in which the image is divided. )
I edited the code to
Vec2i referenceP = Vec2i(id_max[0]*rangeXY, id_max[1]*rangeXY);
and I got another reference point then show all edgePoints in the image,which apparently not fit the shape.
I just cannot figure out what the factor(rangeXY+1)/2means.
Is there anyone who has implemented this code or familiared with the rationale of GHT can tell me what the factor rangeXYmeans? Thanks~
I am familiar with the classic Hough Transform, though not with the generalised one. However, I believe you give enough information in your question for me to answer it without being familiar with the algorithm in question.
(rangeXY+1)/2 is simply integer division by 2 with rounding. For instance (4+1)/2 gives 2 while (5+1)/2 gives 3 (2.5 rounds up). Now, since rangeXY is the side of a square block of pixels and id_max is the position (index) of such a block, then id_max[dim]*rangeXY+(rangeXY+1)/2 gives the position of the central pixel in that block.
On the other hand, when you simplified the expression to id_max[dim]*rangeXY, you were getting the position of the top-left rather than the central pixel.

Backpropagation 2-Dimensional Neuron Network C++

I am learning about Two Dimensional Neuron Network so I am facing many obstacles but I believe it is worth it and I am really enjoying this learning process.
Here's my plan: To make a 2-D NN work on recognizing images of digits. Images are 5 by 3 grids and I prepared 10 images from zero to nine. For Example this would be number 7:
Number 7 has indexes 0,1,2,5,8,11,14 as 1s (or 3,4,6,7,9,10,12,13 as 0s doesn't matter) and so on. Therefore, my input layer will be a 5 by 3 neuron layer and I will be feeding it zeros OR ones only (not in between and the indexes depends on which image I am feeding the layer).
My output layer however will be one dimensional layer of 10 neurons. Depends on which digit was recognized, a certain neuron will fire a value of one and the rest should be zeros (shouldn't fire).
I am done with implementing everything, I have a problem in computing though and I would really appreciate any help. I am getting an extremely high error rate and an extremely low (negative) output values on all output neurons and values (error and output) do not change even on the 10,000th pass.
I would love to go further and post my Backpropagation methods since I believe the problem is in it. However to break down my work I would love to hear some comments first, I want to know if my design is approachable.
Does my plan make sense?
All the posts are speaking about ranges ( 0->1, -1 ->+1, 0.01 -> 0.5 etc ), will it work for either { 0 | .OR. | 1 } on the output layer and not a range? if yes, how can I control that?
I am using TanHyperbolic as my transfer function. Does it make a difference between this and sigmoid, other functions.. etc?
Any ideas/comments/guidance are appreciated and thanks in advance
Well, by the description given above, I think that the design and approach taken it's correct! With respect to the choice of the activation function, remember that those functions help to get the neurons which have the largest activation number, also, their algebraic properties, such as an easy derivative, help with the definition of Backpropagation. Taking this into account, you should not worry about your choice of activation function.
The ranges that you mention above, correspond to a process of scaling of the input, it is better to have your input images in range 0 to 1. This helps to scale the error surface and help with the speed and convergence of the optimization process. Because your input set is composed of images, and each image is composed of pixels, the minimum value and and the maximum value that a pixel can attain is 0 and 255, respectively. To scale your input in this example, it is essential to divide each value by 255.
Now, with respect to the training problems, Have you tried checking if your gradient calculation routine is correct? i.e., by using the cost function, and evaluating the cost function, J? If not, try generating a toy vector theta that contains all the weight matrices involved in your neural network, and evaluate the gradient at each point, by using the definition of gradient, sorry for the Matlab example, but it should be easy to port to C++:
perturb = zeros(size(theta));
e = 1e-4;
for p = 1:numel(theta)
% Set perturbation vector
perturb(p) = e;
loss1 = J(theta - perturb);
loss2 = J(theta + perturb);
% Compute Numerical Gradient
numgrad(p) = (loss2 - loss1) / (2*e);
perturb(p) = 0;
end
After evaluating the function, compare the numerical gradient, with the gradient calculated by using backpropagation. If the difference between each calculation is less than 3e-9, then your implementation shall be correct.
I recommend to checkout the UFLDL tutorials offered by the Stanford Artificial Intelligence Laboratory, there you can find a lot of information related to neural networks and its paradigms, it's worth to take look at it!
http://ufldl.stanford.edu/wiki/index.php/Main_Page
http://ufldl.stanford.edu/tutorial/

Determine difference in stops between images with no EXIF data

I have a set of images of the same scene but shot with different exposures. These images have no EXIF data so there is no way to extract useful info like f-stop, shutter speed etc.
What I'm trying to do is to determine the difference in stops between the images i.e. Image1 is +1.3 stops of Image0.
My current approach is to first calculate luminance from the image's RGB values using the equation
L = 0.2126 * R + 0.7152 * G + 0.0722 * B
I've seen different numbers being used in the equation but generally it should not affect the end result L too much.
After that I derive the log-average luminance of the image.
exp(avg of log(luminance of image))
But somehow the log-avg luminance doesn't seem to give much indication on exposure difference btw the images.
Any ideas on how to determine exposure difference?
edit: on c/c++
You have to generally solve two problems:
1. Linearize your image data
(In case it's not obvious what is meant: two times more light collected by your pixel shall result in two times the intensity value in your linearized image.)
Your image input might be (sufficiently) linearized already -> you may skip to part 2. If your content came from a camera and it's a JPEG, then this will most certainly not be the case.
The real 'solution' to this problem is finding the camera response function, which you want to invert and apply to your image data to get linear intensity values. This is by no means a trivial task. The EMoR model is widely used in all sorts of software (Photoshop, PTGui, Photomatix, etc.) to describe camera response functions. Some open source software solving this problem (but using a different model iirc) is PFScalibrate.
Having that said, you may get away with a simple inverse gamma application. A rough 'gestimation' for the right gamma value might be found by doing this:
capture an evenly lit, static scene with two exposure times e and e/2
apply a couple of inverse gamma transforms (e.g. for 1.8 to 2.4 in 0.1 steps) on both images
multiply all the short exposure images with 2.0 and subtract them from the respective long exposure images
pick the gamma that lead to the smallest overall difference
2. Find the actual difference of irradiation in stops, i.e. log2(scale factor)
Presuming the scene was static (no moving objects or camera), this is relatively easy:
sum1 = sum2 = 0
foreach pixel pair (p1,p2) from the two images:
if p1 or p2 is close to 0 or 255:
skip this pair
sum1 += p1 and sum2 += p2
return log2(sum1 / sum2)
On large images this will certainly work just as well and a lot faster if you sub-sample the images.
If the camera was static but the scene was not (moving objects), this starts to work less well. I produced acceptable results in this case by simply repeating the above procedure several times and use the output of the previous run as an estimate for the correct scale factor and then discard pixel pairs who's quotient is too far away from the current estimate. So basically replacing the above if line with the following:
if <see above> or if abs(log2(p1/p2) - estimate) > 0.5:
I'd stop the repetition after a fixed number of iterations or if two consecutive estimates are sufficiently close to each other.
EDIT: A note about conversion to luminance
You don't need to do that at all (as Tony D mentioned already) and if you insist, then do it after the linearization step (as Mark Ransom noted). In a perfect setting (static scene, no noise, no de-mosaicing, no quantization) every channel of every pixel would have the same ratio p1/p2 (if neither is saturated). Therefore the relative weighting of the different channels is irrelevant. You may sum over all pixels/channels (weighing R, G and B equally) or maybe only use the green channel.

Using ImageMagick++ to modify image contrast/brightness

I'm trying to apply contrast and brightness to a bitmap in memory and I'm completely lost. Currently I'm trying to use Magick++ to do it, but if one of the other APIs would work better I'm all ears. I managed to find Magick::Image::sigmoidalContrast() for applying the contrast, but I can't figure out how to get it to work. I'm creating an image, passing it the buffer pointer, then calling that function, but it doesn't seem like it's changing anything so my first though was that it's making a copy and modifying that. Even so, I have no idea how to get the data out of the Magick::Image object.
Here's what I got so far.
Magick::Image image(fBitmapData->mGetTextureWidth(), fBitmapData->mGetTextureHeight(), "RGBA", MagickCore::CharPixel, pixels);
image.sigmoidalContrast(1, 20.0);
The documentation is useless and after searching I could only find hints that the first parameter is actually a boolean, even though it takes a size_t, that specifies whether to add or subtract the contrast, and the second value is something I have no idea what to pass so I'm just using 20.0 to test.
So does anyone know if this will work for contrast, and if not, then how do you apply contrast? And likewise I still have no idea how to apply brightness either and can't find any functions that look like they would work.
Figured it out; The function for contrast I was using was correct, and for brightness I ended up using image.modulate(brightness, 100.0, 100.0);. To get the data out of the image object you can grab the pixels of the entire image by doing
const MagickCore::PixelPacket * magickPixels = image.getConstPixels(0, 0, image.columns(), image.rows());
And then copy the magickPixels data back into the original pixels that were passed into the image constructor. An important thing to note is that the member MagickCore::PixelPacket::opacity is not what you would think it would be. If the pixel is completely transparent you'd think the value would be 0, right? Well for some reason ImageMagick is doing it opposite. So for full transparency the value would be 255. This means you need to do 255 - opacity to get the correct value.
Also be careful of the MAGICKCORE_QUANTUM_DEPTH that ImageMagick was compiled with, as this will change the values drastically. For my code MAGICKCORE_QUANTUM_DEPTH just happened to be defined as 16 so all of the values were a range of 0 to 65535, which I just fixed by doing realValue = magickValue >> 8 when copying the data back over since the texture data is unsigned char values.
Just for clarification on how to use these functions, since the documentation is horrible and completely wrong, the first parameter to signmoidalContrast() is actually a boolean, even though the type is a size_t, that specifies whether to increase the contrast (true) or reduce it (false), and the second is a range from 0.00001 to 20.0. I say 0.00001 because 0.0 is an invalid value so it just needs to be some decimal that is close to but not exactly 0.0.
For modulate() the documentation says that each value should be specified as 1.0 for no change, which is completely wrong. The values are actually a percentage so for no change you would specify 100.0.
I hope that helps someone because it took me all damn day to figure this stuff out.
According to the Imagemagick website - for the command line but may be the same?
-sigmoidal-contrast contrastxmid-point
increase the contrast without saturating highlights or shadows.
Increase the contrast of the image using a sigmoidal transfer function without saturating highlights or shadows. Contrast indicates how much to increase the contrast. For example, near 0 is none, 3 is typical and 20 is a lot. Note that exactly zero is invalid, but 0.0001 is negligibly different from no change in contrast. mid-point indicates where midtones fall in the resultant image (0 is white; 50% is middle-gray; 100% is black). By default the image contrast is increased, use +sigmoidal-contrast to decrease the contrast.
To achieve the equivalent of a sigmoidal brightness change, use -sigmoidal-contrast brightnessx0% to increase brightness and class="arg">+sigmoidal-contrast brightnessx0% to decrease brightness.
On the command line there is a new brightness contrast setting that may be in later versions of magic++?
-brightness-contrast brightness{xcontrast}{%}}
Adjust the brightness and/or contrast of the image.
Brightness and Contrast values apply changes to the input image. They are not absolute settings. A brightness or contrast value of zero means no change. The range of values is -100 to +100 on each. Positive values increase the brightness or contrast and negative values decrease the brightness or contrast. To control only contrast, set the brightness=0. To control only brightness, set contrast=0 or just leave it off.
You may also use -channel to control which channels to apply the brightness and/or contrast change. The default is to apply the same transformation to all channels.
Brightness and Contrast arguments are converted to offset and slope of a linear transform and applied using -function polynomial "slope,offset".
The slope varies from 0 at contrast=-100 to almost vertical at contrast=+100. For brightness=0 and contrast=-100, the result are totally midgray. For brightness=0 and contrast=+100, the result will approach but not quite reach a threshold at midgray; that is the linear transformation is a very steep vertical line at mid gray.
Negative slopes, i.e. negating the image, are not possible with this function. All achievable slopes are zero or positive.
The offset varies from -0.5 at brightness=-100 to 0 at brightness=0 to +0.5 at brightness=+100. Thus, when contrast=0 and brightness=100, the result is totally white. Similarly, when contrast=0 and brightness=-100, the result is totally black.
As the range of values for the arguments are -100 to +100, adding the '%' symbol is no different than leaving it off.
If magick++ is like Imagick it may be lagging a long way behind the Imagemagick options