Same do-file, same computer, sometimes different results - Stata

I've got a large do-file that calls several sub-do-files, all in the lead-up to the estimation of a custom maximum likelihood model. That is, I have a main.do, which looks like this
version 12
set seed 42
do prepare_data
* some other stuff
do estimate_ml
and estimate_ml.do looks like this
* lots of other stuff
global cdf "normal"
program define customML
args lnf r noise
tempvar prob1l prob2l prob1r prob2r y1l y2l y1r y2r euL euR euDiff scale
quietly {
generate double `prob1l' = $ML_y2
generate double `prob2l' = $ML_y3
generate double `prob1r' = $ML_y4
generate double `prob2r' = $ML_y5
generate double `scale' = 1/100
generate double `y1l' = `scale'*((($ML_y10+$ML_y6)^(1-`r'))/(1-`r'))
generate double `y2l' = `scale'*((($ML_y10+$ML_y7)^(1-`r'))/(1-`r'))
generate double `y1r' = `scale'*((($ML_y10+$ML_y8)^(1-`r'))/(1-`r'))
generate double `y2r' = `scale'*((($ML_y10+$ML_y9)^(1-`r'))/(1-`r'))
generate double `euL' = (`prob1l'*`y1l')+(`prob2l'*`y2l')
generate double `euR' = (`prob1r'*`y1r')+(`prob2r'*`y2r')
generate double `euDiff' = (`euR'-`euL')/`noise'
replace `lnf' = ln($cdf( `euDiff')) if $ML_y1==1
replace `lnf' = ln($cdf(-`euDiff')) if $ML_y1==0
}
end
ml model lf customML ... , maximize technique(nr) difficult cluster(id)
ml display
To my great surprise, when I run the whole thing from top to bottom in Stata 12/SE I get different results for one of the coefficients reported by ml display each time I run it.
At first I thought this was a problem of running the same code on different computers, but the issue occurs even if I run the same code on the same machine multiple times. Then I thought this was a random number generator issue but, as you can see, I can reproduce the issue even if I fix the seed at the beginning of the main do-file. The same holds when I move the set seed command immediately above the ml model.... The only way to get the same results through multiple runs is if I run everything above ml model once and then only run ml model and ml display repeatedly.
I know that the likelihood function is very flat in the direction of the parameter whose value changes over runs so it's no surprise it can change. But I don't understand why it would, given that there seems to be little that isn't deterministic in my do files to begin with and nothing that couldn't be made deterministic by fixing the seed.

I suspect a problem with sorting. Stata's default behaviour is that if two observations have the same value of the sort key, their order is broken randomly. Moreover, the random process that governs this tie-breaking uses a different seed from the one you set. This is intentional: it prevents users from accidentally seeing consistency where none exists. The logic is that it is better to be puzzled than to be overly confident.

As someone mentioned in the comments to this answer, adding the option stable to my sort command made the difference in my situation.
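The mechanism can be sketched in Python (an analog, not Stata; the shuffle-then-sort here is only a stand-in for Stata's irreproducible tie-breaking):

```python
import random

# Records with duplicate sort keys; the payload differs.
data = [("a", 1), ("b", 1), ("c", 2), ("d", 1)]

def stata_like_sort(records, seed):
    # Stata's default sort breaks ties in an irreproducible order; emulate
    # that by shuffling (under a hidden seed) before sorting on the key.
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    return sorted(shuffled, key=lambda r: r[1])

# Two "runs" with different hidden seeds: the first record among the
# tied keys can differ, so any computation that depends on row order
# (e.g. numerical optimization) can differ too.
print(stata_like_sort(data, seed=1)[0], stata_like_sort(data, seed=2)[0])

# A stable sort (like `sort key, stable` in Stata) preserves input order
# among ties, so the result is reproducible across runs.
print(sorted(data, key=lambda r: r[1]))
```

This is why adding stable fixes the symptom: the row order, and hence the floating-point summation order inside the likelihood, becomes deterministic.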

Related

MCMC taking forever to run in SAS

Suppose I have a regression where the response variable is sales and the independent variables are various drivers of sales. I want to build this model using MCMC, but I am unsure whether it is even feasible (I am running it in SAS). See below for a simplified model structure (there are many more variables and random interactions in the production model):
Y_ij = β0 + β1^(TV) X1_ij + γ(TV×dma)_i + ε_i
For the model above, I have one main effect for TV, represented by β1, and a random interaction between DMA (there are 210 DMAs in the US) and TV, represented by γ. I have priors for all my parameters, and when I run PROC MCMC in SAS it takes hours. Can MCMC handle 210 levels for the random interaction term? I am using MCMC because I want to use the prior knowledge from previous modeling rounds, but that makes no sense if it takes forever to run.
proc mcmc data=modeldbsubset outpost=postout thin=1000 nmc=20000 seed=7893
monitor=(b0 b1);
ods select PostSummaries PostIntervals tadpanel;
parms b0 0 b1 0;
parms s2 1 ;
parms s2g 1;
prior b: ~ normal(0, var = 10000);
prior s2: ~ igamma(0.001, scale = 1000);
random gamma ~ normal(0, var=s2g) subject = dmanum monitor = (gamma) namesuffix = position;
mu = b0 + b1*TV + gamma;
model Y ~ normal(mu, var = s2);
run;
I don’t use SAS, but it’s no surprise that a model at this scale would struggle with the default random-walk Metropolis sampler, which initializes the proposal distribution with an identity covariance matrix. The documentation on scale tuning says you can tune the proposal using a MAP estimate of the covariance (this is what PyMC3 does by default), so that may be a place to start. However, the docs also say that doing this will then use the MAP for parameter initialization, which is a bad idea, since in high dimensions the MAP is usually not in the typical set.
In the end, I expect you’ll need to do a lot of tuning specific to your data to really get it running cleanly, and unfortunately that’s just part of the art.
Alternatively, you might be better off picking up a more advanced MCMC sampling framework that implements HMC/NUTS, such as Stan or PyMC3 or Edward. There are even some high-level packages, like RStanArm, specifically for Bayesian regression modeling, but which keep the lower level MCMC stuff in the background.
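To see why an untuned identity proposal is so painful, here is a minimal NumPy sketch (not SAS; the target and scales are made up for illustration): random-walk Metropolis on a strongly correlated 2-D Gaussian, where a unit-scale proposal rejects most moves while a smaller (or tuned) proposal does much better.

```python
import numpy as np

rng = np.random.default_rng(7893)

# Target: zero-mean bivariate normal with correlation 0.99 (a narrow ridge,
# a toy stand-in for the correlated posteriors of hierarchical models).
cov = np.array([[1.0, 0.99], [0.99, 1.0]])
prec = np.linalg.inv(cov)
log_target = lambda x: -0.5 * x @ prec @ x

def rw_metropolis(n_iter, prop_scale):
    """Random-walk Metropolis with an isotropic (identity-shaped) proposal."""
    x = np.zeros(2)
    accepted = 0
    for _ in range(n_iter):
        prop = x + prop_scale * rng.standard_normal(2)
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x, accepted = prop, accepted + 1
    return accepted / n_iter

# A unit-scale proposal mostly steps off the ridge and is rejected;
# shrinking the scale (or matching the proposal covariance to the target,
# which is what tuning does) raises the acceptance rate.
print(rw_metropolis(5000, 1.0), rw_metropolis(5000, 0.2))
```

With 210 random-effect levels the same geometry problem appears in hundreds of dimensions, which is why HMC/NUTS samplers tend to cope far better.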

UserWarning in PyMC3: what does reparameterize mean?

I built a PyMC3 model using the DensityDist distribution. I have four parameters, three of which use Metropolis and one of which uses NUTS (this is chosen automatically by PyMC3). However, I get two different UserWarnings:
1. Chain 0 contains number of diverging samples after tuning. If increasing target_accept does not help try to reparameterize.
May I know what reparameterize means here?
2. The acceptance probability in chain 0 does not match the target. It is , but should be close to 0.8. Try to increase the number of tuning steps.
Digging through a few examples, I used 'random_seed', 'discard_tuned_samples', 'step = pm.NUTS(target_accept=0.95)' and so on, and got rid of these warnings. But I couldn't find details of how these parameter values should be decided. I am sure this has been discussed in various contexts, but I am unable to find solid documentation for it. I was doing trial and error, as below.
with patten_study:
#SEED = 61290425 #51290425
step = pm.NUTS(target_accept=0.95)
trace = pm.sample(step = step)#4000,tune = 10000,step =step,discard_tuned_samples=False)#,random_seed=SEED)
I need to run this on different datasets, so I am struggling to fix these parameter values for each dataset I use. Is there a way to set these values, check the outcome (whether there are any warnings), and if so try other values in a loop?
Pardon me if I am asking something stupid!
In this context, reparameterization basically means finding a different but equivalent model that is easier to compute. There are many things you can do, depending on the details of your model:
Using a Normal distribution with a large variance instead of a Uniform distribution.
Changing from a centered hierarchical model to a non-centered one.
Replacing a Gaussian with a Student-t.
Modeling a discrete variable as a continuous one.
Marginalizing variables, as in this example.
Whether these changes make sense or not is something you should decide based on your knowledge of the model and the problem.
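The centered vs non-centered change can be illustrated outside PyMC3 with plain NumPy (a sketch, not the PyMC3 API): instead of drawing theta ~ Normal(mu, tau) directly, draw a standard normal z and transform it. Both define the same distribution for theta, but when tau is itself a parameter the sampler sees very different geometry.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, tau, n = 2.0, 0.5, 200_000

# Centered parameterization: sample theta directly.
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered parameterization: sample z ~ Normal(0, 1), then transform.
z = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * z

# Same mean and spread up to Monte Carlo error: the models are equivalent,
# only the parameterization the sampler works with differs.
print(theta_centered.mean(), theta_noncentered.mean())
print(theta_centered.std(), theta_noncentered.std())
```

In a hierarchical model the non-centered version decouples z from tau, which often removes the funnel geometry that causes divergences.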

Stata seems to be ignoring my starting values in maximum likelihood estimation

I am trying to estimate a maximum likelihood model and it is running into convergence problems in Stata. The actual model is quite complicated, but it converges with no troubles in R when it is supplied with appropriate starting values. I however cannot seem to get Stata to accept the starting values I provide.
I have included a simple example below estimating the mean of a Poisson distribution. This is not the actual model I am trying to estimate, but it demonstrates my problem. I set the trace option, which allows you to see the parameter values as Stata searches the likelihood surface.
Although I use init to set a starting value of 0.5, the first iteration still shows that Stata is trying a coefficient of 4.
Why is this? How can I force the estimation procedure to use my starting values?
Thanks!
set obs 1000 // give the dataset some observations to fill
generate y = rpoisson(4)
capture program drop mypoisson
program define mypoisson
args lnf mu
quietly replace `lnf' = $ML_y1*ln(`mu') - `mu' - lnfactorial($ML_y1)
end
ml model lf mypoisson (mean:y=)
ml init 0.5, copy
ml maximize, iterate(2) trace
Output:
Iteration 0:
Parameter vector:
mean:
_cons
r1 4
Added: Stata doesn't ignore the initial value. If you look at the output of the ml maximize command, the first line in the listing will be titled
initial: log likelihood =
Following the equal sign is the value of the likelihood for the parameter value set in the init statement.
I don't know how the search(off) or search(norescale) solutions affect the subsequent likelihood calculations, so these solutions might still be worthwhile.
Original "solutions":
To force a start at your initial value, add the search(off) option to ml maximize:
ml maximize, iterate(2) trace search(off)
You can also force a use of the initial value with search(norescale). See Jeff Pitblado's post at http://www.stata.com/statalist/archive/2006-07/msg00499.html.
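For intuition about what a starting value does here, the same Poisson exercise can be sketched in Python (an analog, not Stata): maximize the Poisson log-likelihood in mu by Newton's method from an explicit initial value, the way ml init plus trace lets you do in Stata.

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.poisson(4, size=1000)

def newton_poisson_mean(y, mu0, n_iter=25):
    """Newton-Raphson on the Poisson log-likelihood, starting from mu0."""
    mu = mu0
    for _ in range(n_iter):
        score = y.sum() / mu - y.size    # d logL / d mu
        hess = -y.sum() / mu**2          # d^2 logL / d mu^2
        mu = mu - score / hess           # Newton step
    return mu

# Starting value 0.5, as in `ml init 0.5, copy`; the iterates climb toward
# the MLE, which for a Poisson mean is just the sample mean.
mu_hat = newton_poisson_mean(y, mu0=0.5)
print(mu_hat, y.mean())
```

The point of the Stata answer carries over: the optimizer does use the supplied start, but an initial search step (unless disabled) may move away from it before the first reported iteration.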

Using predict for ancillary parameters in maximum likelihood model in Stata

I wanted to know whether I can use the predict option for ancillary parameters in a maximum likelihood program, as follows (I estimated lnsigma, so sigma is the ancillary parameter in the model):
predict lnsigma, eq(lnsigma)
gen sigma=exp(lnsigma)
I would also like to know whether we can use the above for a heteroscedastic model.
Thank you in advance.
That sounds correct. I would be more explicit by typing predict lnsigma, xb eq(lnsigma). This way your code will not break when someone later decides to write a prediction program for your estimation program and sets the default to something other than the linear prediction.
You can also do it in one line:
predictnl sigma = exp(xb(#2))
This assumes that lnsigma is the second equation in your model; if it is the third equation, replace xb(#2) with xb(#3). predictnl is also an easy way of using the delta method to get standard errors and confidence intervals for sigma.
I assume this is your own Stata program. If so, you also have a third option: you can write your own prediction program, which Stata's predict command will recognize. You can find some useful tricks on how to do that here: http://www.stata.com/help.cgi?_pred_se
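The delta-method step that predictnl performs for sigma = exp(lnsigma) is easy to check by hand; here is a sketch in Python with made-up numbers (the point estimate and standard error below are hypothetical, not from any real fit):

```python
import math

# Hypothetical estimates, stand-ins for what an ML fit would report.
lnsigma_hat = 0.3   # point estimate of lnsigma
se_lnsigma = 0.05   # standard error of lnsigma

# Delta method: se(g(x)) ~= |g'(x)| * se(x); here g = exp, so g' = exp.
sigma_hat = math.exp(lnsigma_hat)
se_sigma = math.exp(lnsigma_hat) * se_lnsigma

# Building the 95% CI on the log scale and exponentiating keeps the
# interval strictly positive, which a symmetric CI for sigma may not.
lo = math.exp(lnsigma_hat - 1.96 * se_lnsigma)
hi = math.exp(lnsigma_hat + 1.96 * se_lnsigma)
print(sigma_hat, se_sigma, (lo, hi))
```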

OpenCV Neural Network train one iteration at a time

The only way I know to train a multilayer neural network in OpenCV is:
CvANN_MLP network;
....
network.train(input, output, Mat(), Mat(), params, flags);
But this does not print any meaningful debug output (e.g. iteration count, current error, ...); the program just sits there until it finishes training. That is very troublesome when the dataset is gigabytes in size, since there is no way to see progress.
How do I train the network one iteration at a time, or print out some debug while training?
The problem is not solved, but the question is answered: it's impossible as far as current OpenCV versions are concerned.
Are you setting the UPDATE_WEIGHTS flag?
You can test the error yourself by having the ANN predict the result vector for each sample in the training set.
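The idea behind that workaround, independent of the OpenCV API, can be sketched with a tiny MLP in NumPy (not OpenCV; the network and data are invented for illustration): keep the weights between calls, run one training pass per call, and report the error after each call.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)  # XOR-like target

# One hidden layer of 8 tanh units, sigmoid output.
W1 = rng.normal(0, 0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def train_one_epoch(lr=0.5):
    """One full-batch gradient step; weights persist across calls,
    which is the effect UPDATE_WEIGHTS has in the OpenCV API."""
    global W1, b1, W2, b2
    h = np.tanh(X @ W1 + b1)                    # hidden activations
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))      # sigmoid output
    err = out - y
    # Backpropagation of the squared error.
    delta_out = err * out * (1 - out)
    gW2 = h.T @ delta_out / len(X); gb2 = delta_out.mean(axis=0)
    dh = delta_out @ W2.T * (1 - h**2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2; W1 -= lr * gW1; b1 -= lr * gb1
    return float((err**2).mean())

# Calling the one-epoch trainer in a loop gives the per-iteration
# progress readout that a single opaque train() call hides.
errors = [train_one_epoch() for _ in range(200)]
print(errors[0], errors[-1])
```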
According to http://opencv.willowgarage.com/documentation/cpp/ml_neural_networks.html#cvann-mlp-train
the params parameter is of Type cvANN_MLP_TrainParams. This class contains a property TermCriteria which controls the when the training function terminates. This Termination criteria class http://opencv.willowgarage.com/documentation/cpp/basic_structures.html can be set to terminate after a given number of iterations or when a given epsilon conditions is fulfilled or some combination of both. I have not used the training function myself so I can't know the code that you'd use to make this work, but something like this should limit the number of training cycles
CvANN_MLP_TrainParams params = CvANN_MLP_TrainParams();
params.term_crit.type = CV_TERMCRIT_ITER; // terminate on iteration count
params.term_crit.max_iter = 1;            // stop after a single iteration
network.train(input, output, Mat(), Mat(), params, flags);
Like I said, I haven't worked with OpenCV, but having read the documentation, something like this should work.
Your answer lies in the source code. If you want output after every x epochs, add something to the source code, in this loop:
https://github.com/opencv/opencv/blob/9787ab598b6609a6ca6652a12441d741cb15f695/modules/ml/src/ann_mlp.cpp#L941
When they made OpenCV they had to find a balance between user customizability and how easy it is to use/read. Ultimately you have the power to do whatever you want when editing the source code.