Adding formatted value to plot title in Stata

I have variables, y and x, and I have run a regression to obtain the slope of the regression line. I can retrieve the p value (and other parameters) from the regression model using the method described in https://journals.sagepub.com/doi/pdf/10.1177/1536867X0800700408, such as:
regress y x
local t = _b[x]/_se[x]
local p = 2*ttail(e(df_r),abs(`t'))
I want to produce a plot of the same data with a line predictor such as:
graph twoway (lfitci x y) (scatter x y), note("Simple regression p value = `p'")
However, in the above case, the p value that is presented in the plot's note area is expressed to full precision (and without the leading zero). How can I restrict the number of decimal places in the plot to a sensible number? I've tried including %9.3f but haven't been able to work out the correct syntax.

local p : di %04.3f `p'
If you want spaces in front, put spaces in front.
Examples of a similar technique can be found in the code for aaplot on SSC.

Related

numpy.interp returning only the last value after interpolation

I have a set of data points in pressure (p) and vmr.
I want to find the vmr values on another pressure grid (pressure_grid). I used np.interp:
for lines in itertools.islice(input_file, i, l):
    lines = lines.split()
    p.append(float(lines[0]))
    vmr.append(float(lines[3]))
    x = np.array(p)
    y = np.array(vmr)
    yi = np.interp(pressure_grid, x, y)
But when I print yi, it contains only the value (i.e. the vmr value) corresponding to the last value of pressure_grid. The same value is printed on every iteration.
I printed p and vmr, and everything seems fine up to that point. I can't understand why this is happening.
I'm new to this, so please help.
This is what my file looks like (first column p, second column vmr), and this is my pressure grid:
https://1drv.ms/t/s!AmPNuP3pNnN8g35NPwIfzSl-VBeO
https://1drv.ms/f/s!AmPNuP3pNnN8hAx3opovgipabSjJ
There are two issues with your code. First, x and y are overwritten on each iteration of the for loop, which means that x and y contain just a single element for the interpolation. To fix this, you could define the x and y lists outside the loop and append to them inside it, or, more simply, just use numpy.loadtxt():
import numpy as np
data = np.loadtxt('demo.txt',comments='<',usecols=[0,2])
Here, I have specified to skip rows beginning with a less than sign, so we only get the actual data.
Second, for numpy.interp to actually work, the x-coordinates must be an increasing sequence (see the Notes section of the numpy.interp documentation). For your data, x is a decreasing sequence, so you should flip the data after loading it:
x = data[::-1,0]
y = data[::-1,1]
interpolation = np.interp(grid,x,y)
Alternatively, you could just use the scipy.interpolate package on the original, unflipped data. This has the added advantage of allowing you to extrapolate data that isn't enclosed by your input domain:
from scipy import interpolate
f = interpolate.interp1d(x, y, fill_value='extrapolate')  # interp1d returns a callable
interpolation = f(grid)
Note: your input file appears to have more than one <Matrix> </Matrix> set. To get all this to work, I trimmed the file so it only contained one dataset. Otherwise, your x input data will not be strictly increasing, even after flipping, and you will have to sort.
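The flipping step can be checked on a small example. The following is a minimal sketch with made-up pressure and vmr values (hypothetical, not taken from the asker's file), assuming pressure decreases down the file as described above:

```python
import numpy as np

# Hypothetical values standing in for the asker's file;
# pressure decreases with row number, as in the original data.
pressure = np.array([1000.0, 500.0, 100.0, 10.0])
vmr = np.array([4.0, 3.0, 2.0, 1.0])
pressure_grid = np.array([750.0, 300.0, 55.0])

# np.interp expects increasing x and does not check this itself,
# so reverse both arrays before interpolating.
x = pressure[::-1]
y = vmr[::-1]
interpolated = np.interp(pressure_grid, x, y)
```

Because the whole grid is passed in one call, the result has one interpolated value per grid point rather than a single number.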

How to apply a mask to a DataFrame in Python?

My dataset named ds_f is a 840x57 matrix which contains NaN values. I want to forecast a variable with a linear regression model but when I try to fit the model, I get this message "SVD did not converge":
X = ds_f[ds_f.columns[:-1]]
y = ds_f['target_o_tempm']
model = sm.OLS(y,X) #stackmodel
f = model.fit() #ERROR
So I've been searching for an answer to apply a mask to a DataFrame. Although I was thinking of creating a mask to "ignore" NaN values and then convert it into a DataFrame, I get the same DataFrame as ds_f, nothing changes:
m = ma.masked_array(ds_f, np.isnan(ds_f))
m_ds_f = pd.DataFrame(m,columns=ds_f.columns)
EDIT: I've solved the problem by writing model=sm.OLS(X,y,missing='drop') but a new problem appears when I display results, I get only NaN:
Are you using statsmodels? If so, you could specify sm.OLS(y, X, missing='drop'), to drop the NaN values prior to estimation.
Alternatively, you may want to consider interpolating the missing values, rather than dropping them.
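For illustration, dropping incomplete rows before fitting can be sketched in plain NumPy. This is an assumption-laden stand-in, not statsmodels' own implementation: the arrays are hypothetical, and np.linalg.lstsq is used here in place of sm.OLS.

```python
import numpy as np

# Hypothetical design matrix (intercept plus one predictor) and response.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, np.nan],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 3.0, 5.0, 7.0, np.nan])

# Keep only the rows where every predictor and the response are observed.
complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)

# Ordinary least squares on the complete rows only.
coef, *_ = np.linalg.lstsq(X[complete], y[complete], rcond=None)
```

A single NaN anywhere in a row removes that whole row, so it is worth checking how many observations survive before trusting the fit.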

Extracting coefficients from sqreg in Stata

I am trying to run quantile regressions across deciles, and so I use the sqreg command to get bootstrap standard errors for every decile. However, after I run the regression (so Stata runs 9 different regressions - one for each decile except the 100th) I want to store the coefficients in locals. Normally, this is what I would do:
reg y x, r
local coeff = _b[x]
And things would work well. However, here my command is:
sqreg y x, q(0.1 0.2 0.3)
So, I will have three different coefficients here that I want to store as three different locals. Something like:
local coeff10 = _b[x] //Where _b[x] is the coefficient on x for the 10th quantile.
How do I do this? I tried:
local coeff10 = _b[[q10]x]
But this gives me an error. Please help!
Thank you!
Simply save the coefficient matrix from the stored estimation results, e(b), and reference its elements by row and column.
You cannot do the same as after OLS because the sqreg coefficient matrix holds the same coefficient names once for each quantile:
* OUTPUTS MATRIX OF COEFFICIENTS (1 X 6)
matrix list e(b)
* SAVE COEFF. MATRIX TO REGULAR MATRIX VARIABLE
mat b = e(b)
* EXTRACT BY ROW/COLUMN INTO OTHER VARIABLES
local coeff10 = b[1,1]
local coeff20 = b[1,3]
local coeff30 = b[1,5]

Stata: extract p-values and save them in a list

This may be a trivial question, but as an R user coming to Stata I have so far failed to find the correct Google terms to find the answer. I want to do the following steps:
1. Do a bunch of tests (e.g. lrtest results in a foreach loop)
2. Extract the p-value from each test and save them in a list of some kind
3. Have a list I can do further operations on (e.g. perform multiple comparison correction)
So I am wondering how to extract p-values (or similar) from command results and how to save them into a vector-like object that I can work with. Here is some R code that does something similar:
myData <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10)) ## generate some data
pValue <- c()
for (variableName in c("b", "c")) {
  myModel <- lm(as.formula(paste("a ~", variableName)), data=myData) ## fit model
  pValue <- c(pValue, coef(summary(myModel))[2, "Pr(>|t|)"]) ## extract p-value and save in vector
}
pValue * 2 ## do amazing multiple comparison correction
To me it seems like Stata has much less of a 'programming' mindset to it than R. If you have any general Stata literature recommendations for an R user who can program, that would also be appreciated.
Here is an approach that would save the p-values in a matrix and then you can manipulate the matrix, maybe using Mata or standard matrix manipulation in Stata.
matrix storeMyP = J(2, 1, .) //create empty matrix with 2 (as many variables as we are looping over) rows, 1 column
matrix list storeMyP //look at the matrix
loc n = 0 //count the iterations
foreach variableName of varlist b c {
    loc n = `n' + 1 //each iteration, adjust the count
    reg a `variableName'
    test `variableName' //this does an F-test, but for one variable it's equivalent to a t-test (check -help test-; there is lots this can do)
    matrix storeMyP[`n', 1] = `r(p)' //save the p-value in the matrix
}
matrix list storeMyP //look at your p-values
matrix storeMyP_2 = 2*storeMyP //replicating your example above
What's going on is that Stata automatically stores certain quantities after estimation and test commands. When the help files say that a command stores the following values in r(), you can refer to them inside macro quotes, as in `r(p)' above.
It could also be interesting for you to convert the matrix column(s) into variables using svmat storeMyP, or see help svmat for more info.
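For readers coming from a general-purpose language, the collect-then-adjust pattern looks like this in Python. It is only a sketch: the p-values are hypothetical placeholders, and the Bonferroni rule is an assumed choice of correction (with two tests it reduces to the multiplication by 2 used above).

```python
# Hypothetical p-values, one per test, collected as a loop runs.
p_values = []
for p in (0.012, 0.300, 0.049):
    p_values.append(p)

# Bonferroni correction: multiply each p-value by the number of tests
# and cap the result at 1.
n_tests = len(p_values)
adjusted = [min(1.0, p * n_tests) for p in p_values]
```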

Perform Fisher Exact Test from aggregated data using Stata

I have a set of data like below:
A B C D
1 2 3 4
2 3 4 5
These are aggregated data in which A, B, C and D make up a 2x2 table. I need to do a Fisher exact test on each row and add a new column holding the p-value of the Fisher exact test for that row.
I can use fisher.test and a loop to do it in R, but I can't find a command in Stata for the Fisher exact test.
You are thinking in R terms, and that is often fruitless in Stata (just as it is impossible for a Stata guy to figure out how to do by ... : regress in R; every package has its own paradigm and its own strengths).
There are no objects to add columns to. Maybe you could say a little more about what you eventually need to do with your p-values, so as to find an appropriate solution that your Stata collaborators would sympathize with.
If you really want to add a new column (generate a new variable, speaking Stata), then you might want to look at tabulate and its returned values:
clear
input x y f1 f2
0 0 5 10
0 1 7 12
1 0 3 8
1 1 9 5
end
I assume that your A B C D stand for two binary variables, and the numbers are frequencies in the data. You have to clear the memory, as Stata thinks about one data set at a time.
Then you could tabulate the results and generate new variables containing p-values, although that would be a major waste of memory to create variables that contain a constant value:
tabulate x y [fw=f1], exact
return list
generate p1 = r(p_exact)
tabulate x y [fw=f2], exact
generate p2 = r(p_exact)
Here, [fw=variable] is a way to specify frequency weights; I typed return list to find out what kind of information Stata stores as the result of the procedure. THAT'S the object-like thing Stata works with. R would return the test results in the fisher.test()$p.value component, and Stata creates returned values, r(component) for simple commands and e(component) for estimation commands.
If you want a loop solution (if you have many sets), you can do this:
forvalues k=1/2 {
    tabulate x y [fw=f`k'], exact
    generate p`k' = r(p_exact)
}
That's the scripting capacity in which Stata, IMHO, is way stronger than R (although it can be argued that this is an extremely dirty programming trick). The local macro k takes values from 1 to 2, and this macro is substituted as `k' everywhere in the curly bracketed piece of code.
Alternatively, you can keep the results in Stata short term memory as scalars:
tabulate x y [fw=f1], exact
scalar p1 = r(p_exact)
tabulate x y [fw=f2], exact
scalar p2 = r(p_exact)
However, the scalars are not associated with the data set, so you cannot save them with the data.
The immediate commands like cci suggested here would also have returned values that you can similarly retrieve.
HTH, Stas
Have a look at the cci command with the exact option:
cci 10 15 30 10, exact
It is part of the so-called "immediate" commands. They allow you to do computations directly from the arguments rather than from data stored in memory. Have a look at help immediate.
Each observation in the poster's original question apparently consisted of the four counts in one traditional 2 x 2 table. Stas's code applied to data of individual observations. Nick pointed out that -cci- can analyze a b c d data. Here's code that applies -cci- to each table and, like Stas's code, adds the p-values to the data set. The forvalues i = 1/`=_N' statement tells Stata to run the loop from the first to the last observation. a[`i'] refers to the value of the variable a in the i-th observation.
clear
input a b c d
10 2 8 4
5 8 2 1
end
gen exactp1 = .
gen exactp2 = .
label var exactp1 "1-sided exact p"
label var exactp2 "2-sided exact p"
forvalues i = 1/`=_N' {
    local a = a[`i']
    local b = b[`i']
    local c = c[`i']
    local d = d[`i']
    qui cci `a' `b' `c' `d', exact
    replace exactp1 = r(p1_exact) in `i'
    replace exactp2 = r(p_exact) in `i'
}
list
Note that there is no problem in giving a local macro the same name as a variable.
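To make clear what is computed for each row of counts, here is a small Python sketch of a two-sided Fisher exact p-value. It assumes one common definition (summing the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one, as R's fisher.test does); the function name is hypothetical, and this is not a statement of how cci computes its result internally.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # holding the row and column totals fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is at least as unlikely as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-7))

# One p-value per row of aggregated counts, as in the data set above.
rows = [(10, 2, 8, 4), (5, 8, 2, 1)]
p_values = [fisher_exact_two_sided(*row) for row in rows]
```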