Modifying the family parameter in ggmcmc plots (regex)

I am using BUGS through R for Bayesian analysis, and I use the ggmcmc package for Bayesian inference.
In my latest example I monitor a whole matrix b of parameters, with dimensions 5x8. If I use a ggmcmc plot directly, e.g. ggs_histogram, there are so many parameters that I can't see anything in the posterior plot.
The plotting functions in ggmcmc have a parameter called family, which selects the subset of parameters to include in the plot. The official package page says that family must be a regular expression matching the parameters you want, which is easy if you have, say, parameters a and b and want to plot only b (family='b').
Now I want to plot only the elements of one column of the b matrix, for example b[1,1], b[2,1], b[3,1], ..., b[8,1].
So I tried to subset it the usual R way, with family='b[,1]', and got this error:
Error in seq.default(mn, mx, by = bw) :
'from' cannot be NA, NaN or infinite
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
Any ideas? Maybe a correct regexp, or some facet_grid trick in ggplot?

In the end, the official ggmcmc package documentation (PDF) had all the information I was looking for. I was right that a regular expression is needed, and the package tutorial explains clearly what form that regular expression should take.
So, to plot for example the elements of the first column of the parameter matrix,
family='b\\[.,1\\]'
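would do the job. The doubled backslashes escape the square brackets, which are otherwise regular-expression metacharacters, and the dot matches any single character, so this pattern only covers single-digit row indices. It works with any of the ggmcmc plotting functions that take a family argument.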
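To see why 'b[,1]' selects nothing while 'b\\[.,1\\]' works, here is a small sketch using Python's re module instead of R. It assumes ggmcmc keeps a parameter when family matches anywhere in its name (as R's grepl would), and the parameter names below are made up in the b[row,col] form BUGS reports. In 'b[,1]' the unescaped brackets form a character class meaning "a comma or a 1", so no name matches, the plot data ends up empty, and min()/max() return Inf/-Inf, which is consistent with the error in the question.

```python
import re
from typing import List

# Hypothetical monitored parameter names in BUGS-style "b[row,col]" form.
params = [f"b[{i},{j}]" for i in range(1, 9) for j in range(1, 6)]


def select(pattern: str, names: List[str]) -> List[str]:
    """Keep names where the regex matches anywhere (like R's grepl)."""
    return [name for name in names if re.search(pattern, name)]


# Unescaped: "[,1]" is a character class, i.e. "b followed by ',' or '1'".
broken = select(r"b[,1]", params)

# Escaped brackets; "." stands for any single character (the row index).
first_col = select(r"b\[.,1\]", params)

# "[0-9]+" also accepts multi-digit row indices such as b[10,1].
first_col_any = select(r"b\[[0-9]+,1\]", params + ["b[10,1]"])

print(broken)                          # []
print(len(first_col), first_col[:2])   # 8 ['b[1,1]', 'b[2,1]']
print(len(first_col_any))              # 9
```

In R the last pattern would be written family='b\\[[0-9]+,1\\]'.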


How to get y axis range in Stata

Suppose I am using a twoway graph command in Stata. Without any action on my part, Stata chooses reasonable ranges for both the y and x axes, based on the minimum and maximum y and x values in my data, but also on some algorithm that decides when it is prettier for the range to extend to a number like 0 rather than 0.0139. Wonderful!
Now suppose that after (or while) I draw my graph, I want to slap some very important text onto it, and I want to be choosy about precisely where the text appears. Having the minimum and maximum values of the displayed axes would be useful: how can I get these min and max numbers? (Either before or while calling the graph command.)
NB: I am not asking how to set the y or x axis ranges.
This issue has been a headache for me for quite some time, and I believe there is no good solution out there yet, so I want to write up two ways in which I solved a problem similar to the one described in the post. Specifically, I used them to shade part of a graph in gray.
Define a global macro in the code that generates the axis labels. This is the less elegant way, but it works well. Locate the file tickset_g.class in your ado-path; graph twoway uses it to draw the axes of every graph. In its draw program, I defined a global macro that takes the values of the locals omin and omax after they have been set to the more extreme of the axis range and the data range (the command that does this is local omin = min(.scale.min,omin), and analogously for the max), since the data range sometimes exceeds the axis range. You could also define the global further up in that code block to get only the axis extent. After the graph command you can then read the axis range from the globals and use something like addplot to add to the graph you just drew. Two caveats apply. First, using global macros is, as far as I understand, bad practice and can be dangerous, so I gave mine names with the prefix userwritten, which I was sure no other program would use. Second, your organization may not give you the administrator privileges needed to alter this file. Still, this is the simpler way. If you prefer a more elegant approach along the lines of what Nick Cox suggested, then you can:
Use the undocumented gdi natscale command to define your own axis labels. The gdi commands are the internal commands that produce what you see as graph output (cf. https://www.stata.com/meeting/dcconf09/dc09_radyakin.pdf), and tickset_g.class uses gdi natscale to generate the nice numbers on the axes. Basic documentation is available via help _natscale: you supply the minimum and maximum (e.g. from the results returned by summarize) and a suggested number of steps, and the command returns a min, max, and delta to use in the xlabel() or ylabel() option. There are several ways to do that, all straightforward once you have those numbers, so I won't spell them out. You would have to adjust this approach if you use a scale transformation.
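To illustrate what a min/max/delta triple from such a routine looks like, here is a Python sketch. It does not reproduce gdi natscale, whose internals are undocumented; it swaps in Heckbert's well-known "nice numbers" labelling algorithm, which snaps steps to 1, 2 or 5 times a power of ten and extends the range outward, e.g. turning a data minimum of 0.0139 into an axis minimum of 0. The function names are hypothetical.

```python
import math
from typing import Tuple


def nice_num(x: float, round_result: bool) -> float:
    """Return a 'nice' value (1, 2, 5 or 10 times a power of ten) near x > 0."""
    exponent = math.floor(math.log10(x))
    fraction = x / 10 ** exponent  # roughly in [1, 10)
    if round_result:
        if fraction < 1.5:
            nice = 1
        elif fraction < 3:
            nice = 2
        elif fraction < 7:
            nice = 5
        else:
            nice = 10
    else:
        if fraction <= 1:
            nice = 1
        elif fraction <= 2:
            nice = 2
        elif fraction <= 5:
            nice = 5
        else:
            nice = 10
    return nice * 10 ** exponent


def nice_scale(lo: float, hi: float, nticks: int = 5) -> Tuple[float, float, float]:
    """Heckbert-style loose labelling: returns (axis_min, axis_max, delta)."""
    if hi <= lo:
        raise ValueError("need hi > lo")
    span = nice_num(hi - lo, round_result=False)
    delta = nice_num(span / (nticks - 1), round_result=True)
    axis_min = math.floor(lo / delta) * delta  # extend down to a multiple of delta
    axis_max = math.ceil(hi / delta) * delta   # extend up to a multiple of delta
    return axis_min, axis_max, delta


print(nice_scale(0.0139, 0.97))  # (0.0, 1.0, 0.2)
```

The three numbers map directly onto a label specification of the form min(delta)max.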
Hope this helps!
I like Nick's suggestion, but if you're really determined, it seems that you can find these values by inspecting the output after set trace on. Here's some inefficient code that seems to do exactly what you want. Three notes:
1. When I import the log file I get this message:
Note: Unmatched quote while processing row XXXX; this can be due to a formatting problem in the file or because a quoted data element spans multiple lines. You should carefully inspect your data after importing. Consider using option bindquote(strict) if quoted data spans multiple lines or option bindquote(nobind) if quotes are not used for binding data.
2. Sometimes the data fall outside the min and max values chosen for the graph's axis labels (but you can easily test for this).
3. The log linesize matters for the code below, because the key values must fall on the same line as the strings I use to identify the relevant rows.
* start a log (critical step for my solution)
cap log close _all
set linesize 255
log using "log", replace text
* make up some data:
clear
set obs 3
gen xvar = rnormal(0,10)
gen yvar = rnormal(0,.01)
* turn trace on, run the -twoway- call, and then turn trace off
set trace on
twoway scatter yvar xvar
set trace off
cap log close _all
* now read the log file in and find the desired info
import delimited "log.log", clear
egen my_string = concat(v*)
keep if regexm(my_string,"forvalues yf") | regexm(my_string,"forvalues xf")
drop if regexm(my_string,"delta")
split my_string, parse("=") gen(new)
gen axis = "vertical" if regexm(my_string,"yf")
replace axis = "horizontal" if regexm(my_string,"xf")
keep axis new*
duplicates drop
loc my_regex = "(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)"
gen min = regexs(1) if regexm(new3,"`my_regex'")
gen delta = regexs(2) if regexm(new3,"`my_regex'")
gen max_temp= regexs(3) if regexm(new3,"`my_regex'")
destring min max_temp delta, replace
gen max = min + delta* int((max_temp-min)/delta)
*here is the info you want:
list axis min delta max
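For anyone who wants to check the parsing logic outside Stata, here is a Python sketch of the same steps: take the text after the last "=" of a traced forvalues line, apply the same my_regex pattern, and compute the last tick as min + delta*int((max_temp-min)/delta). The sample trace lines are hypothetical (the exact trace format may differ), and axis_from_trace is an illustrative name, not part of Stata.

```python
import re
from typing import Optional, Tuple

# Same pattern as the Stata local my_regex: "<min>(<delta>)<max>".
RANGE_RE = re.compile(r"(.*[0-9]+)\((.*[0-9]+)\)(.*[0-9]+)")


def axis_from_trace(line: str) -> Optional[Tuple[str, float, float, float]]:
    """Hypothetical helper: (axis, min, delta, last tick) from a traced forvalues line."""
    rhs = line.rsplit("=", 1)[-1]  # text after the last "=", e.g. " -20(10)25 {"
    match = RANGE_RE.search(rhs)
    if match is None:
        return None
    lo, delta, max_temp = (float(g) for g in match.groups())
    # Mirrors: gen max = min + delta*int((max_temp-min)/delta)
    last_tick = lo + delta * int((max_temp - lo) / delta)
    # Mirrors the yf/xf test used to label the axis in the Stata code.
    axis = "vertical" if "yf" in line else "horizontal"
    return axis, lo, delta, last_tick


print(axis_from_trace("= forvalues yf = -20(10)25 {"))  # ('vertical', -20.0, 10.0, 20.0)
```

As in the Stata version, the upper bound in the forvalues range can exceed the last tick actually drawn, which is why the last tick is recomputed from min and delta.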

specify priors in multi-label Naive Bayes in python scikit-learn

I am working on a multi-label classification problem, using GaussianNB from scikit-learn in Python. The target is an array of shape (N, L), where L is the number of classes (labels) and N is the number of observations.
I used three ways to handle the multi-label case: binary relevance, classifier chains, and label powerset.
I have a prior distribution over the L classes, an array of shape (L,). I tried to pass it to GaussianNB through the priors parameter like this:
classifier = BinaryRelevance(GaussianNB(priors = prior_dist))
However, it raises the following error:
ValueError: Number of priors must match number of classes.
What is the correct way to specify priors into GaussianNB in a multi-label case?
I haven't added support for this in scikit-multilearn yet, but it seems fairly easy to add. Could you file it as a feature request in scikit-multilearn? I think I have an idea of how to add this, and we can track the issue further on GitHub.
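The error arises because binary relevance trains one binary classifier per label, so each GaussianNB sees only two classes (0 and 1) and expects priors of length 2, not L. Until scikit-multilearn supports this, one workaround is a hand-rolled binary relevance loop that gives each label its own two-element prior. This sketch assumes prior_dist[l] is the prior probability that label l is present and that every label column contains both 0s and 1s; fit_binary_relevance and predict_binary_relevance are hypothetical helper names, and the data are made up.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def fit_binary_relevance(X, Y, label_priors):
    """Hypothetical helper: one GaussianNB per label, each with a 2-element prior."""
    models = []
    for label in range(Y.shape[1]):
        p = float(label_priors[label])
        # priors follow the sorted class order: [P(label absent), P(label present)].
        # A label column containing only one class would still raise the ValueError.
        clf = GaussianNB(priors=[1.0 - p, p])
        clf.fit(X, Y[:, label])
        models.append(clf)
    return models


def predict_binary_relevance(models, X):
    """Stack per-label predictions back into an (N, L) 0/1 array."""
    return np.column_stack([m.predict(X) for m in models])


# Made-up data: N = 200 observations, L = 3 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = np.column_stack([X[:, 0] > 0, X[:, 1] > 0, X[:, 0] + X[:, 2] > 0]).astype(int)
prior_dist = np.array([0.5, 0.3, 0.6])  # assumed P(label l present)

models = fit_binary_relevance(X, Y, prior_dist)
pred = predict_binary_relevance(models, X)
print(pred.shape)  # (200, 3)
```

If your prior_dist is instead a single distribution over L mutually exclusive classes, it fits the label powerset setting better than binary relevance, and the mapping to per-label priors would need more thought.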

How can I create spline function variables in Stata by hand?

I know that Stata has a command, mkspline, that generates cubic spline variables. But I want to replicate my Stata output using other software, so I need to learn how these spline variables are created.
Say I use syntax like this:
mkspline age3sp = age, cubic knots(-13 -7 0 8 16)
How would I do it by hand?
I would read the Stata documentation for mkspline (see http://www.stata.com/manuals14/rmkspline.pdf for an online copy of the version included with Stata 14) and follow the guidance in the Methods and formulas section.
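As I read the Methods and formulas section, the cubic option builds a restricted cubic spline. With knots k_1 < ... < k_n and (u)_+ = max(u, 0), the first variable is x itself and, for i = 1, ..., n-2, variable i+1 is [(x-k_i)_+^3 - {(x-k_{n-1})_+^3 (k_n-k_i) - (x-k_n)_+^3 (k_{n-1}-k_i)}/(k_n-k_{n-1})] / (k_n-k_1)^2. The Python sketch below implements that formula; please check it against the manual, and against mkspline output for a few values, before relying on it. rcs_basis is a hypothetical name.

```python
from typing import List, Sequence


def rcs_basis(x: float, knots: Sequence[float]) -> List[float]:
    """Restricted cubic spline variables for one value x (formula as read from the manual)."""
    k = sorted(knots)
    n = len(k)
    if n < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u: float) -> float:
        # (u)_+^3: cube of the positive part
        return u ** 3 if u > 0 else 0.0

    norm = (k[n - 1] - k[0]) ** 2   # (k_n - k_1)^2
    last_gap = k[n - 1] - k[n - 2]  # k_n - k_{n-1}
    out = [float(x)]                # first variable is x itself
    for i in range(n - 2):          # i = 0..n-3 here, i.e. i = 1..n-2 in the formula
        tail = (pos3(x - k[n - 2]) * (k[n - 1] - k[i])
                - pos3(x - k[n - 1]) * (k[n - 2] - k[i])) / last_gap
        out.append((pos3(x - k[i]) - tail) / norm)
    return out


knots = [-13, -7, 0, 8, 16]  # knots(-13 -7 0 8 16)
for age in (-20, 0, 10, 25):
    print(age, [round(v, 4) for v in rcs_basis(age, knots)])
```

With five knots this returns four values per observation, which we would expect to line up with age3sp1 to age3sp4. A quick sanity check of the formula: every variable is zero below the first knot (except x itself) and linear beyond the last knot, which is what "restricted" means.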

Stata seems to be ignoring my starting values in maximum likelihood estimation

I am trying to estimate a maximum likelihood model, and it runs into convergence problems in Stata. The actual model is quite complicated, but it converges without trouble in R when given appropriate starting values. However, I cannot get Stata to accept the starting values I provide.
I have included a simple example below that estimates the mean of a Poisson distribution. This is not the model I am actually trying to estimate, but it demonstrates my problem. I use the trace option, which shows the parameter values as Stata searches the likelihood surface.
Although I use ml init to set a starting value of 0.5, the first iteration still shows Stata trying a coefficient of 4.
Why is this? How can I force the estimation procedure to use my starting values?
Thanks!
clear
* any sample size will do; 1000 is arbitrary
set obs 1000
generate y = rpoisson(4)
capture program drop mypoisson
program define mypoisson
args lnf mu
quietly replace `lnf' = $ML_y1*ln(`mu') - `mu' - lnfactorial($ML_y1)
end
ml model lf mypoisson (mean:y=)
ml init 0.5, copy
ml maximize, iterate(2) trace
Output:
Iteration 0:
Parameter vector:
          mean:
         _cons
r1           4
Added: Stata doesn't ignore the initial value. If you look at the output of the ml maximize command, the first line in the listing will be titled
initial: log likelihood =
Following the equal sign is the value of the log likelihood at the parameter value set by ml init.
I don't know how the search(off) or search(norescale) solutions affect the subsequent likelihood calculations, so these solutions might still be worthwhile.
Original "solutions":
To force a start at your initial value, add the search(off) option to ml maximize:
ml maximize, iterate(2) trace search(off)
You can also force a use of the initial value with search(norescale). See Jeff Pitblado's post at http://www.stata.com/statalist/archive/2006-07/msg00499.html.
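To see how a start at 0.5 can turn into 4 before iteration 0, here is a Python sketch of the log likelihood that mypoisson computes (y*ln(mu) - mu - lnfactorial(y), summed over observations). In this constant-only model, scaling the starting value to maximize the likelihood lands exactly on the sample mean, which is consistent with the rescaling step that search(norescale) turns off. The grid over scale factors is only a crude stand-in for Stata's own search, and the counts are made up.

```python
import math
from typing import Sequence


def poisson_loglik(mu: float, ys: Sequence[int]) -> float:
    """Sum over observations of y*ln(mu) - mu - lnfactorial(y), as in mypoisson."""
    if mu <= 0:
        return -math.inf
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in ys)


ys = [3, 5, 4, 2, 6, 4, 3, 5]  # made-up counts; sample mean = 4.0
start = 0.5                     # the value passed to ml init

# Crude stand-in for a rescaling step: try start*c for c = 0.01, 0.02, ..., 20.00.
scales = [s / 100 for s in range(1, 2001)]
best_c = max(scales, key=lambda c: poisson_loglik(start * c, ys))

print(poisson_loglik(start, ys), poisson_loglik(start * best_c, ys))
print(start * best_c)  # 4.0, the sample mean (the Poisson MLE)
```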

Excel Formula Calculations

I'm trying to add spreadsheet editing to my iOS app. I use a grid view for display (not relevant to the question) and LibXL to load the data into the view. That part works very well, but I have no way to recalculate formulas after a cell has been modified.
It appears that when I write a formula with LibXL, it does not calculate the new value; it just sets the formula (a string). So when I read the numeric value from that cell, it is still the last value computed by Excel.
Likewise, if I create cells with numbers and a formula cell that SUMs them, the formula is never actually computed, so its numeric value reads 0 until the file is opened in Excel.
I was hoping LibXL was the silver bullet for my problem, but now I'm stuck with only the formula string (e.g. "SUM(A1:B2)") and the last computed value.
I would love it if LibXL simply DID compute values and I just have it all wrong, but I can't find any documentation that says so. If not, are there any Objective-C, C, or C++ libraries I can use to parse Excel formula syntax and compute these values?
Just adding my previous comment as an answer:
Dave DeLong's DDMathParser lets you add custom functions; check it out here: http://github.com/davedelong/DDMathParser
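As a rough illustration of what recomputing a formula involves (parse the reference syntax, then evaluate over the current cell values), here is a toy sketch in Python rather than Objective-C. It is not DDMathParser or LibXL code: eval_sum and the cells dictionary are hypothetical, and it supports only SUM over one rectangular range, treating empty cells as 0 and skipping text and booleans the way Excel's SUM does for ranges.

```python
import re
from typing import Dict, Tuple, Union

CellValue = Union[int, float, str, bool, None]
REF_RE = re.compile(r"([A-Z]+)([0-9]+)")
SUM_RE = re.compile(r"=?SUM\(([A-Z]+[0-9]+):([A-Z]+[0-9]+)\)")


def col_to_index(col: str) -> int:
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27 (Excel-style base-26 column letters)."""
    idx = 0
    for ch in col:
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx


def index_to_col(idx: int) -> str:
    """Inverse of col_to_index: 1 -> 'A', 28 -> 'AB'."""
    letters = ""
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def parse_ref(ref: str) -> Tuple[int, int]:
    """'B2' -> (column index 2, row 2)."""
    col, row = REF_RE.fullmatch(ref).groups()
    return col_to_index(col), int(row)


def eval_sum(formula: str, cells: Dict[str, CellValue]) -> float:
    """Hypothetical toy evaluator for 'SUM(A1:B2)'-style formulas only."""
    m = SUM_RE.fullmatch(formula.strip().upper())  # references are case-insensitive
    if m is None:
        raise ValueError(f"unsupported formula: {formula!r}")
    (c1, r1), (c2, r2) = parse_ref(m.group(1)), parse_ref(m.group(2))
    total = 0.0
    for c in range(min(c1, c2), max(c1, c2) + 1):
        for r in range(min(r1, r2), max(r1, r2) + 1):
            v = cells.get(f"{index_to_col(c)}{r}")  # missing cell -> None
            # Only numbers count; empty cells, text and booleans are skipped.
            if isinstance(v, (int, float)) and not isinstance(v, bool):
                total += v
    return total


cells = {"A1": 1, "A2": 2, "B1": 3, "B2": 4.5, "C1": 100}
print(eval_sum("SUM(A1:b2)", cells))  # 10.5
```

A real recalculation engine also has to track which formula cells depend on which inputs and recompute them in dependency order after an edit, which is the part a library would save you from writing.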