Nonlinear least squares in Stata, how to model summation over variables/sets? - stata

I would like to estimate the following function by nonlinear least squares using Stata:
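Written in the notation of the GAMS statement further down, the function is as follows (the labels w_i for the wage of unit i and d_ij for the distance between i and j are assumed here for readability):

```latex
\ln w_i = \alpha_0 + \alpha_1 \ln\Big( \sum_{j} (S/H)_{ij}^{\alpha_2} \, e^{\alpha_3 d_{ij}} \Big)
```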
I am testing the results of another paper and would like to use Stata, since it is the same software/solver used in the paper I am replicating and because it should be easier than using GAMS, for example.
My problem is that I cannot find any way to write out the sum part of the equation above. In my data each i is a single observation, with the values for the j's in separate variables. I could write out the whole expression in the following manner (for three j's):
nl (ln_wage = {alpha0} + {alpha1}*log( ((S_over_H_1)^{alpha2})*exp({alpha3}*distance_1) + ((S_over_H_2)^{alpha2})*exp({alpha3}*distance_2) + ((S_over_H_3)^{alpha2})*exp({alpha3}*distance_3) ))
Is there a simple way to tell Stata to sum over an expression/variables for a given set of numbers, like in GAMS where you can write:
lnwage(i) = alpha0 + alpha1*ln(sum((j), power(S_over_H(i,j),alpha2) * exp(alpha3 * distance(i,j))))

There is no direct equivalent in Stata of the GAMS notation you cite, but you could build the sum in a local macro with a loop and pass that to nl:
local call 0
forval j = 1/3 {
    local call `call' + S_over_H_`j'^({alpha2}) * exp({alpha3} * distance_`j')
}
nl (ln_wage = {alpha0} + {alpha1} * ln(`call'))
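To check what text such a loop has to hand to nl, the same accumulation can be sketched in Python. This is only an illustration: the function names are made up here, and the variable names follow the question. Each term is generated once and the terms are joined with +, so the parentheses stay balanced for any number of j's.

```python
def build_sum(n_terms):
    """Return the Stata expression text for the sum over j = 1..n_terms."""
    terms = [
        f"S_over_H_{j}^({{alpha2}}) * exp({{alpha3}} * distance_{j})"
        for j in range(1, n_terms + 1)
    ]
    # Joining with " + " avoids a stray leading or trailing operator.
    return " + ".join(terms)


def build_nl_command(n_terms):
    """Return the full nl command line as a string."""
    return f"nl (ln_wage = {{alpha0}} + {{alpha1}} * ln({build_sum(n_terms)}))"


print(build_nl_command(3))
```

The printed line can be compared with the command Stata echoes back when the macro version runs.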
P.S. please explain what GAMS is.

sympy separate fractions from variables

Using sympy, how do I keep fractions separate from variables? I want to change the output of
Mul(Fraction(3,5), Pow(K, Integer(2)))
from
   2
3⋅K
────
 5
to
3  2
─ K
5
I know this simplified version is not too bad, but when I have really big equations, it gets messy.
I'm not very familiar with pretty printing or LaTeX printing but I managed to come up with something. Put UnevaluatedExpr in each of the arguments of Mul:
from sympy import *
from fractions import Fraction
K = symbols("K")
expr1 = Mul(UnevaluatedExpr(Fraction(3,5)), UnevaluatedExpr(Pow(K, Integer(2))))
expr2 = Mul(UnevaluatedExpr(pi/5), UnevaluatedExpr(Pow(K, Integer(2))))
expr3 = ((UnevaluatedExpr(S(1)*3123456789/512345679) * UnevaluatedExpr(Pow(K, Integer(2)))))
pprint(expr1)
pprint(expr2)
pprint(expr3)
Produces:
     2
3/5⋅K
π  2
─⋅K
5
1041152263  2
──────────⋅K
170781893
I couldn't find a way to make it print a stacked fraction for the slashed fraction 3/5. Longer fractions seem to work though. If you are printing in LaTeX, however, the documentation suggests something like latex(expr1, fold_short_frac=False) to correct this.
Too bad I couldn't find an elegant solution like putting init_printing(div_stack_symbols=False) at the top of the document.
To elaborate on Maelstrom's Answer, you need to do 2 things to make this work like you want:
Create the separate fraction you want as its own expression.
Prevent the numerator or denominator from being modified when the expression is combined with other expressions.
What Maelstrom showed will work, but it's much more complicated than what's actually needed. Here's a much cleaner solution:
from sympy import *
K = symbols("K")
# Step 1: make the fraction
# This seems to be a weird workaround to prevent fractions from being broken
# apart. See the note after this code block.
lh_frac = UnevaluatedExpr(3) / 5
# Step 2: prevent the fraction from being modified
# Creating a new multiplication expression will normally modify the involved
# expressions as sympy sees fit. Setting evaluate to False prevents that.
expr = Mul(lh_frac , Pow(K, 2), evaluate=False)
pprint(expr)
gives:
3  2
-*K
5
Important Note:
Doing lh_frac = UnevaluatedExpr(3) / 5 is not how fractions involving 2 literal numbers should typically be created. Normally, you would do:
lh_frac = Rational(3, 5)
as shown in the sympy docs. However, that gives undesirable output for our use case right now:
   2
3*K
----
 5
This outcome is surprising to me; setting evaluate to False inside Mul should be sufficient to do what we want. I have an open question about this.

How can I improve bulk calculation from file data

I have a file of binary values. The section I am looking at holds 4-byte values (read here as floats) in the pattern MW1, MVAR1, MW2, MVAR2, ...
I read the values in with
import array
temp = array.array("f")  # "f" is a 4-byte C float
temp.fromfile(file, length * 2)  # file: open binary file; length: number of (MW, MVAR) pairs
mw_mvar = temp.tolist()
I then calculate the magnitude like this.
from math import sqrt
mag = [0] * length
for x in range(0, length * 2, 2):
    a = mw_mvar[x]
    b = mw_mvar[x + 1]
    mag[x // 2] = sqrt(a*a + b*b)  # integer division, since list indices must be ints
The calculations (not the read) are doubling the total run time of my script. I know there is (theoretically) a way to do this faster because I am mimicking a script that ultimately calls Fortran (a pyd that calls functions in Fortran DLLs, I think), which is able to do this calculation with negligible effect on run time.
This is the best I can come up with. Any suggestions for improvements?
I have also tried math.pow(), **.5 and **2, with no difference.
With no luck improving the calculations, I went around the problem. I realised that I only needed 1% of those calculated values, so I created a class to calculate them on demand. It was important (to me) that the resulting code behave as if it were a list of calculated values. A lot of the remainder of the process uses the values, and different versions of the data are pre-calculated. The class means I don't need a set of procedures for each version of the data.
from math import sqrt
class mag:
    def __init__(self, mw_mvar):
        self._mw_mvar = mw_mvar
        #_sgn = sgn
    def __len__(self):
        # one magnitude per (MW, MVAR) pair
        return len(self._mw_mvar) // 2
    def __getitem__(self, item):
        return sqrt(self._mw_mvar[2*item] ** 2 + self._mw_mvar[2*item+1] ** 2)
PS: this could also be done in a function that takes both versions, but I would have had to make more changes to the overall script.
def magnitude(a, b, x):
    # a and b are indexable sequences; x is the position to evaluate
    if b[x] == 0:
        return a[x]
    else:
        return sqrt(a[x]**2 + b[x]**2)
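If all the magnitudes turn out to be needed after all, one way to trim the per-element overhead in pure Python is to let slicing, zip and math.hypot do the pairing and the arithmetic instead of indexing inside a loop. A minimal sketch, assuming mw_mvar is the interleaved list read above; the function name is made up here, and any speed-up has not been measured on the original data:

```python
from math import hypot


def magnitudes(mw_mvar):
    """Return sqrt(mw**2 + mvar**2) for each interleaved (MW, MVAR) pair."""
    mw = mw_mvar[0::2]    # MW1, MW2, ...
    mvar = mw_mvar[1::2]  # MVAR1, MVAR2, ...
    return [hypot(a, b) for a, b in zip(mw, mvar)]


print(magnitudes([3.0, 4.0, 5.0, 12.0]))  # [5.0, 13.0]
```

The slices also work directly on the array.array object, so the tolist() step may not be needed for this version.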

Pretty print expression as entered

I would like to pretty print an expression to double check that it's what I want, without any manipulations or simplifications. Here's a simple example:
from sympy import *
import abc
init_session()
sigma_1, sigma_0, mu_1, mu_0,x = symbols("sigma_1 sigma_0 mu_1 mu_0 x")
diff = log(1/(sqrt(2*pi*sigma_1**2)) * exp(-(x-mu_1)**2/(2*sigma_1**2))) - log(1/(sqrt(2*pi*sigma_0**2)) * exp(-(x-mu_0)**2/(2*sigma_0**2)))
diff
This has manipulated the expression a bit, but I'd like to see it pretty printed just in the order I entered it, so I can check it easily against the formulas I've got written down.
Is there a way to do that?
You can avoid some simplifications by using
sympify("log(1/(sqrt(2*pi*sigma_1**2)) * exp(-(x-mu_1)**2/(2*sigma_1**2))) - log(1/(sqrt(2*pi*sigma_0**2)) * exp(-(x-mu_0)**2/(2*sigma_0**2)))", evaluate=False)
However, some simplifications can't be avoided. For example, there's no way to keep terms in the same order, and some expressions, like 1/x and x**-1 are internally represented in the same way. With that being said, there are definitely places where sympify(evaluate=False) could be improved.

Extracting coefficients from sqreg in Stata

I am trying to run quantile regressions across deciles, and so I use the sqreg command to get bootstrap standard errors for every decile. However, after I run the regression (so Stata runs 9 different regressions - one for each decile except the 100th) I want to store the coefficients in locals. Normally, this is what I would do:
reg y x, r
local coeff = _b[x]
And things would work well. However, here my command is:
sqreg y x, q(0.1 0.2 0.3)
So, I will have three different coefficients here that I want to store as three different locals. Something like:
local coeff10 = _b[x] //Where _b[x] is the coefficient on x for the 10th quantile.
How do I do this? I tried:
local coeff10 = _b[[q10]x]
But this gives me an error. Please help!
Thank you!
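Simply save the matrix of coefficients from the stored estimation results and reference its elements by row and column.
The reason you could not do the same as after OLS is that the sqreg coefficient vector holds the same coefficient names several times, once per quantile equation (q10, q20, q30), so _b[x] alone is ambiguous. Qualifying the name with the equation, as in [q10]_b[x], is the alternative; the matrix route looks like this: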
* OUTPUTS MATRIX OF COEFFICIENTS (1 X 6)
matrix list e(b)
* SAVE COEFF. MATRIX TO REGULAR MATRIX VARIABLE
mat b = e(b)
* EXTRACT BY ROW/COLUMN INTO OTHER VARIABLES
local coeff10 = b[1,1]
local coeff20 = b[1,3]
local coeff30 = b[1,5]

Getting unknown function mean() in a forvalues loop

Getting unknown function mean for this. Can't use egen because it has to be calculated for each value. A little confused.
edu_mov_avg=.
forvalues current_year = 2/133 {
local current_mean = mean(higra) if longitbirthqtr >= current_year - 2 & longitbirthqtr >= current_year + 2
replace edu_mov_avg = current_mean if longitbirthqtr =
}
Your code is a long way from working. This should be closer.
gen edu_mov_avg = .
qui forvalues current_qtr = 2/133 {
su higra if inrange(longitbirthqtr, `current_qtr' - 2, `current_qtr' + 2), meanonly
replace edu_mov_avg = r(mean) if longitbirthqtr == `current_qtr'
}
You need to use the generate command to produce a new variable.
You need to reference local macro values with left and right single quotation marks, as in `current_qtr'.
egen has its own mean() function, but it produces a variable, whereas you need a constant here. Using summarize, meanonly is the most efficient method. There is in Stata no mean() function that can be applied anywhere. Once you use summarize, there is no need to use a local macro to hold its results. Here r(mean) can be used directly.
You have >= twice, but presumably don't mean that. Using inrange() is not essential in writing your condition, but gives shorter code.
You can't use if qualifiers to qualify assignment of local macros in the way you did. They make no sense to Stata, as such macros are constants.
longitbirthqtr looks like a quarterly date. Hence I didn't use the name current_year.
With a window this short, there is an alternative using time series operators
tsset longitbirthqtr
gen edu_mov_avg = (L2.higra + L1.higra + higra + F1.higra + F2.higra) / 5
That is not exactly equivalent, as missing values will be returned for the first two observations and the last two.
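The difference between the two calculations can be sketched in Python, assuming one value per quarter held in a dictionary; the function names and the data values are made up for illustration. The first function averages whatever falls inside the window, as the summarize loop does; the second insists on all five terms, as the lag and lead formula does.

```python
def window_mean(series, t, half_width=2):
    """Mean of the values whose period lies in [t - half_width, t + half_width].

    Periods that are not in the data are simply skipped.
    """
    vals = [v for q, v in series.items() if t - half_width <= q <= t + half_width]
    return sum(vals) / len(vals) if vals else None


def lag_lead_mean(series, t, half_width=2):
    """Mean of exactly 2 * half_width + 1 terms, or None if any term is absent."""
    periods = range(t - half_width, t + half_width + 1)
    if any(q not in series for q in periods):
        return None
    return sum(series[q] for q in periods) / len(periods)


higra = {1: 10.0, 2: 12.0, 3: 14.0, 4: 16.0, 5: 18.0}  # made-up values
print(window_mean(higra, 1), lag_lead_mean(higra, 1))  # 12.0 None
print(window_mean(higra, 3), lag_lead_mean(higra, 3))  # 14.0 14.0
```

In the middle of the series the two agree; at the edges only the window version returns a number.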
Your code may need further work if your data are panel data. But the time series operators approach remains easy so long as you declare the panel identifier, e.g.
tsset panelid longitbirthqtr
after which the generate call is the same as above.
All that said, rolling offers a framework for such calculations.