Logistic Regression with pymc3 - what's the prior for build in glm? - pymc3

I could not find good explanation for what's going on exactly by using glm with pymc3 in case of logistic regression. So I compared the GLM version to an explicit pymc3 model. I started to write an ipython notebook for documentation, see:
http://christianherta.de/lehre/dataScience/machineLearning/mcmc/logisticRegressionPymc3.slides.php
What I don't understand is:
What prior is used for the Parameters in GLM? I assume they are also Normal distributed. I got different results with my explicit model in comparison to the build in GLM. (see link above)
With less data the sampling get's stuck and/or I got really poor results. With more training data I could not observe this behaviour. Is this normal for mcmc?
There are more issue in the notebook.
Thanks for your answer.

What prior is used for the Parameters in GLM
GLM is name for family of methods. Two popular priors: gaussian (corresponds to l2 regularization) and laplacian (corresponds to l1), usually the first one.
With less data the sampling get's stuck and/or I got really poor results. With more training data I could not observe this behaviour. Is this normal for mcmc?
Did you play with prior parameter? If model behaves badly with small amount of data, this may be due to strong prior (= too high regularization), which becomes the main term in optimization.

Related

In SAS, does the using the GLS option 'absorb' invalidate the standard errors of the parameter estimates?

SAS provides a handy tool for handling GLM models with a large number of groups or fixed effects - "absorb" (PROC GLM; ABSORB). My understanding is that it factors the effect of the absorbed parameters out of the data before estimating the remaining parameters. It seems like this may invalidate the standard errors of the parameter estimates. The variance covariance matrix would not be calculated with the full set of parameters. Hence, we would be calculating the variance covariance matrix conditional on the absorbed parameters, where we would really want the unconditional variance. Am I misunderstanding what is going on here? I haven't found any good explanations in the help files.

Opencv cascade classifier with SVM as weak learner

I'm working on a project related to people detection. I successfully implemented both an HOG SVM based classifier (with libSVM) and a cascade classifier (with opencv). The svm classifier works really good, i tested over a number of videos and it is correctly detecting people with really a few false positive and a few false negative; problem here is the computational time: nearly 1.2-1.3 sec over the entire image and 0.2-0.4 sec over the foreground patches; since i'm working on a project that must be able to work in nearly real-time environment, so i switched to the cascade classifier (to get less computational time).
So i trained many different cascade classifiers with opencv (opencv_traincascade). The output is good in terms of computational time (0.2-0.3 sec over the entire image, a lot less when launched only over the foreground), so i achieved the goal, let's say. Problem here is the quality of detection: i'm getting a lot of false positive and a lot of false negative. Since the only difference between the two methods is the base classifier used in opencv (decision tree or decision stumps, anyway no SVM as far as i understand), so i'm starting to think that my problem could be the base classifier (in some way, hog feature are best separated with hyperplanes i guess).
Of course, the dataset used in libsvm and Opencv is exactly the same, both for training and for testing...for the sake of completeness, i used nearly 9 thousands positive samples and nearly 30 thousands negative samples.
Here my two questions:
is it possible to change the base weak learner in the opencv_traincascade function? if yes, it the svm one of the possible choices? if the both answers are yes, how can i do such a thing? :)
are there other computer vision or machine learning libraries that implement the svm as weak classifier and have some methods to train a cascade classifier? (are these libraries suitable to be used in conjuction with opencv?)
thank you in advance as always!
Marco.
In principle a weak classifier can be anything, but the strength of Adaboost related methods is that they are able to obtain good results out of simple classifiers (they are called “weak” for a reason).
Using SVN and Adaboost cascade is a contradiction, as the former has no need to be used in such a framework: it is able to do its job by itself, and the latter is fast just because it takes advantage of weak classifiers.
Furthermore I don’t know of any study about it and OpenCv doesn’t support it: you have to write code by yourself. It is a huge undertaking and probably you won’t get any interesting result.
Anyway if you think that HOG features are more fitted for your task, OpenCv’s traincascade has an option for it, apart from Haar and Lbp.
As to your second question, I’m not sure but quite confident that the answer is negative.
My advice is: try to get the most you can from traincascade, for example try increase the number of samples id you can and compare the results.
This paper is quite good. It simply says that SVM can be treated as a weak classifier if you use fewer samples to train it (let's say less than half of the training set). The higher the weights the more chance it will be trained by the 'weak-SVM'.
The source code is not widely available unfortunately. If you want a quick prototype, use python scikit learn and see if you can get desirable results before modifying opencv.

Stata - Tobit - Lagrange Multiplier Test

Community,
I am running a left- and right-censored tobit regression model. The dependent variable is the proportion of cash used in M&A transactions running from 0 to 1.
I assume heteroskedasticity to be prevalent due to the characteristics of my cross-sectional sample as well as the BPCW test for the LS regression model. In order to test the tobit specifications, I used bctobit. However, bctobit is not applicable for right-censored data.
This gives rise to the following question:
- Is there another user-written command to test for the tobit specifications with right- and left-censored data?
Thanks a lot for your efforts!
The most immediate question to me here is statistical. From what you say this approach is inadvisable, so how to implement it is immaterial.
I don't think Tobit makes much sense for variables that are defined to lie in an interval. Censoring to me implies that some high or low values might have been observed in principle, but in practice are recorded as less extreme values. It seems to me that logit or probit are the appropriate link functions for proportional responses, and in that Stata that means glm with e.g. logit link.
Regardless of that, do you regard linear dependence as expected here?
For an excellent concise review making this point, see http://www.stata-journal.com/sjpdf.html?articlenum=st0147

Supprt Vector Machine works in matlab, doesn't work in c++

I'm writing an application that uses an SVM to do classification on some images (specifically these). My Matlab implementation works really well. Using a SIFT bag-of-words approach, I'm able to get near 100% accuracy with a linear kernel.
I need to implement this in C++ for speed/portability reasons, and so I've tried using both libsvm and dlib. I've tried multiple SVM types (c_svm, nu_svm, one_class) and multiple kernels (linear, polynomial, rbf). The best I've been able to achieve is around 50% accuracy - even on the same samples that I've trained on. I've confirmed that my feature generators are working, because when I export my c++-generated features to Matlab and train on those, I'm able to get near-perfect results again.
Is there something magical about Matlab's SVM implementation? Are there any common pitfalls or areas that I might look into that would explain the behavior I'm seeing? I know this is a little vague, but part of the problem is that I don't know where to go. Please let me know in the comments if there is other info I can provide that would be helpful.
There is nothing magical about the Matlab version of the libraries, other that it runs in Matlab which makes it harder to shoot yourself on the foot.
A check list:
Are you normalizing your data, making all values lie between 0 and 1
(or between -1 and 1), either linearly or using the mean and the
standard deviation?
Are you parameter searching for a good value of C (or C and gamma in
the case of an RBF kernel)? Doing cross validation or on a hold out set?
Are you sure that your're handling NaN, and all other floating point
nastiness? Matlab is very good at hiding this from you, C++ not so
much.
Could it be that you're loading your data incorrectly, reading a
"%s" into a double or something that is adding noise to your input
data?
Could it be that libsvm/dlib expects the data in row major order and
your're sending it in in column major (or the other way around)? Again Matlab makes this almost impossible, C++ not so much.
32-64 bit nastiness one version of the library, executable compiled
with the other?
Some other things:
Could it be that in Matlab you're somehow leaking the class (y) into
the preprocessing? no one does this on purpose, but I've seen it happen.
If you make almost any f(y) a feature, you'll get almost 100%
everytime.
Sometimes it helps to verify that everything is numerically
identical by printing to file before training both in C++ and
Matlab.
i'm very happy with libsvm using the rbf kernel. carlosdc pointed out the most common errors in the correct order :-). for libsvm - did you use the python tools shipped with libsvm? if not i recommend to do so. write your feature vectors to a file (from matlab and/or c++) and do a metatraining for the rbf kernel with easy.py. you get the parameters and a prediction for the generated model. if this prediction is ok continue with c++. from training you also get a scaled feature file (min/max transformed to -1.0/1.0 for every feature). compare these to your c++ implementation as well.
some libsvm issues: a nasty habit is (if i remember correctly) that values scaling to 0 (zero) are omitted in the scaled file. in grid.py is a parameter "nr_local_worker" which is defining the mumber of threads. you might wish to increase it.

Two way clustering in ordered logit model, restricting rstudent to mitigate outlier effects

I have an ordered dependent variable (1 through 21) and continuous independent variables. I need to run the ordered logit model, clustering by firm and time, eliminating outliers with Studentized Residuals <-2.5 or > 2.5. I just know ologit command and some options for the command; however, I have no idea about how to do two way clustering and eliminate outliers with studentized residuals:
ologit rating3 securitized retained, cluster(firm)
As far as I know, two way clustering has only been extended to a few estimation commands (like ivreg2 from scc and tobit/logit/probit here). Eliminating outliers can easily be done on your own and there's no automated way of doing it.
Use the logit2.ado from the link Dimitriy gave (Mitchell Petersen's website) and modify it to use the ologit command. It's simple enough to do with a little trial and error. Good luck!
If you have a variable with 21 ordinal categories, I would have no problems treating that as a continuous one. If you want to back that up somehow, I wrote a paper on welfare measurement with ordinal variables, see DOI:10.1111/j.1475-4991.2008.00309.x. Then you can use ivreg2. You should be aware of all the issues involved with that estimator, in particular, that it implicitly assumed that the correlations are fully modeled by this two-way structure, and observations for firms i and j and times t and s are definitely uncorrelated for i!=j and t!=s. Sometimes, this is a strong assumption to make -- i.e., New York and New Jersey may be correlated in 2010, but New York 2010 is uncorrelated with New Jersey 2009.
I have no idea of what you might mean by ordinal outliers. Somebody must have piled a bunch of dissertation advice (or worse analysis requests) without really trying to make sense of every bit.