Regressing state-level coefficient on state-level law - stata

I have the following.
county  state  employment  county_shock  state_law
1       NY     70          3             10
2       NY     80          4             10
4       IL     100         2             5
7       IL     60          9             5
3       TX     90          8             2
I ran the regression for all counties:
regress employment county_shock
But now I am curious about how the state-level law changes the degree to which county_shock affects employment.
I am not sure whether adding an interaction term achieves this.
But what I am trying to do is the following:
Run
regress employment county_shock
for "each state"
Then I will have coefficient for county_shock for each state. I could get those coefficients by "_b"
Then, regress those coefficients on state-level law.
How should I do this?

This is what I thought at first:
The initial model is y = b0 + b1*x + u, where x is county_shock for short. You mean that b1 is a function of z, which is state_law. For this you can regress y on x, z and x*z. (Here you want to let the intercept differ across states as well. The model without z looks strange because it implies that the intercept is the same for all states.)
Now, let the model with interaction be y = c0 + c1*x + c2*z + c3*x*z + u. This is written as y = (c0 + c2*z) + (c1+c3*z)*x + u. Thus, c3 is the coefficient you want.
In Stata,
reg employment c.county_shock##c.state_law
After reading the second part of your question, I am not sure if this is what you want.
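If the two-step procedure from the question is what you are after, a minimal sketch could use statsby to collect one slope per state and then regress those slopes on the law. The data below are simulated and purely hypothetical (10 states with 20 counties each, and a true slope of 1 + 0.3*state_law), and b_shock is a name chosen here for illustration. Note that each state needs enough counties to estimate its own slope, and that the second step treats the estimated slopes as if they were observed data.

```stata
clear
set seed 12345
set obs 200

* hypothetical data: 10 states, 20 counties per state
gen state = ceil(_n / 20)
gen state_law = 2 * state                 // constant within state
gen county_shock = rnormal()
gen employment = 50 + (1 + 0.3 * state_law) * county_shock + rnormal()

* Step 1: one regression per state, keeping the slope on county_shock.
* Listing state_law in by() carries it into the collapsed dataset.
statsby b_shock = _b[county_shock], by(state state_law) clear: ///
    regress employment county_shock

* Step 2: regress the state-level slopes on the state-level law
regress b_shock state_law
```

The coefficient on state_law in the second regression plays the same role as c3 in the interaction model above.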

How to compare and calculate each row to all other rows in the same column and table

My data looks like this: places with their coordinates.
Place  Latitude     Longitude
A      2.314        97.6110288
B      3.425        98.6925504
C      4.1231       99.774072
D      5.096466667  100.8555936
E      6.001016667  101.9371152
F      6.905566667  103.0186368
G      7.810116667  104.1001584
H      8.714666667  105.18168
I      9.619216667  106.2632016
J      10.52376667  107.3447232
K      11.42831667  108.4262448
L      12.33286667  109.5077664
M      13.23741667  110.589288
N      14.14196667  111.6708096
O      15.04651667  112.7523312
P      15.95106667  113.8338528
What I want to do is compare each place to all the other places by calculating the distance between them. If the distance meets the criterion, we add one to the output.
For example, we compare the distance from Place A to B, C, D, E, F, G, and so on:
A-B, distance = 100
A-C, distance = 70
A-D, distance = 50
A-E, distance = 120
A-F, distance = 140
A-G, distance = 175
A-H, distance = 80
A-I, distance = 40
A-J, distance = 190
A-K, distance = 209
A-L, distance = 109
A-M, A-N, A-O, A-P, distance = 150 each
Then we apply a condition: if I only want to count the distances larger than 151, it returns 3 for that row.
This is calculated for all rows in the table.
The expected output looks like this:
Place  Latitude     Longitude    Bigger Than 151
A      2.314        97.6110288   3
B      3.425        98.6925504   5
C      4.1231       99.774072    1
D      5.096466667  100.8555936  3
E      6.001016667  101.9371152  2
F      6.905566667  103.0186368  1
G      7.810116667  104.1001584  5
H      8.714666667  105.18168    2
I      9.619216667  106.2632016  4
J      10.52376667  107.3447232  1
K      11.42831667  108.4262448  0
L      12.33286667  109.5077664  0
M      13.23741667  110.589288   0
N      14.14196667  111.6708096  0
O      15.04651667  112.7523312  0
P      15.95106667  113.8338528  0
I can also use Python in Power BI if Power Query or DAX cannot solve this.
Thank you
Start by cross-joining your Places table with itself (see "Cross join").
Next, calculate the (haversine) distances between all pairs of places (see "Use Power Query to Calculate Distance").
Finally, filter the Distance column to values > 151 and group by Place, counting the rows.
Of course, everything can be done in DAX as well, but then all calculations run "live" in the report, which will hurt performance with 100k x 100k rows.
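For reference, the haversine distance used in the second step can be written as follows. Here \varphi_1, \varphi_2 are the two latitudes and \lambda_1, \lambda_2 the two longitudes, all converted from degrees to radians, and R is the sphere's radius (roughly 6371 km for the Earth, which is an assumed mean value; the result comes out in the same unit as R).

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
   + \cos\varphi_1 \,\cos\varphi_2 \,
     \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
d &= 2R \,\arcsin\!\left(\sqrt{a}\right)
\end{aligned}
```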

Fixed effects in Stata

I am very new to Stata, so I am struggling a bit with using fixed effects. The data here is made up, but bear with me. I have a bunch of dummy variables that I am running a regression with. My dependent variable is a dummy that is 1 if a customer bought something and 0 if not. My fixed effect is whether or not there was a yellow sign out front (a dummy variable again). My independent variable is whether the store manager said hi or not (also a dummy variable).
Basically, I want my output to look like this (with standard errors obviously)
                  Yellow sign   No Yellow sign
Manager said hi   estimate      estimate
You can use the ## operator in a regression to get a saturated model with fixed effects:
First, input data such that you have a binary outcome (bought), an independent variable (saidhi), and a fixed effects variable (sign). saidhi should be correlated with your outcome, but only partly (so there is a portion of saidhi that is correlated with bought and a portion that is not), and your FE variable should be correlated with both bought and saidhi (otherwise there is no point in having it in your regression if you are only interested in the effect of saidhi).
clear
set obs 100
set seed 45
gen bought = runiform() > 0.5 // Binary y, 50/50 probability
gen saidhi = runiform() + runiform()^2*bought
gen sign = runiform() + runiform()*saidhi + runiform()*bought > 0.66666 // Binary FE, correlated with both x and y
replace saidhi = saidhi > 0.5
Now, run your regression:
* y = x + FE + x*FE + cons
reg bought saidhi##sign, r
exit
Your output should be:
Linear regression                               Number of obs     =        100
                                                F(3, 96)          =      13.34
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1703
                                                Root MSE          =     .46447
------------------------------------------------------------------------------
             |               Robust
      bought |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    1.saidhi |   .3571429   .2034162     1.76   0.082    -.0466351    .7609209
      1.sign |   .3869048   .1253409     3.09   0.003      .138105    .6357046
             |
 saidhi#sign |
        1 1  |  -.1427489   .2373253    -0.60   0.549    -.6138359    .3283381
             |
       _cons |   .0714286   .0702496     1.02   0.312    -.0680158     .210873
------------------------------------------------------------------------------
1.saidhi is the effect of saidhi when sign == 0. 1.sign is the effect of the sign alone, i.e. when saidhi == 0. The row under saidhi#sign describes the interaction between these two variables, i.e. the additional effect of both being 1 at the same time (keep in mind that the total effect of both being 1 also includes the previous two terms). The constant represents the average value of bought when both are 0 (this is the same as you would get from sum bought if saidhi == 0 & sign == 0).
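To get the two numbers the question's table asks for (the effect of saidhi with and without the sign), one possible sketch is to combine the coefficients with lincom, or to let margins report both at once. The data here are hypothetical, generated along the lines of the example above but with 500 observations, and eff_sign1 is an illustrative scalar name.

```stata
clear
set seed 45
set obs 500

* hypothetical data, generated along the lines of the example above
gen bought = runiform() > 0.5
gen saidhi = runiform() + runiform()^2 * bought
gen sign = runiform() + runiform() * saidhi + runiform() * bought > 0.66666
replace saidhi = saidhi > 0.5

* saturated model with robust standard errors
reg bought i.saidhi##i.sign, r

* effect of saidhi when sign == 0 (just the 1.saidhi coefficient)
lincom 1.saidhi

* effect of saidhi when sign == 1 (main effect plus interaction)
lincom 1.saidhi + 1.saidhi#1.sign
scalar eff_sign1 = r(estimate)

* both effects, with standard errors, in a single table
margins sign, dydx(saidhi)
```

Because the model is saturated, each of these effects equals a simple difference in the mean of bought between the corresponding cells.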

How to simulate pairs from a joint distribution

I have two normally distributed variables X and Y, with given variances and a given covariance between them. I want to simulate pairs of points (say 200) from the joint distribution, but I can't seem to find a command or way to do this. I eventually want to plot these points in a scatter plot.
So far, I have
set obs 100
set seed 1
gen y = 64*rnormal(1, 5/64)
gen x = 64*rnormal(1, 5/64)
matrix D = (1, .5 | .5, 1)
drawnorm x, y, cov(D)
but this makes an error saying that x and y already exist.
Also, once I have a sample, how would I plot the drawnorm output as a scatter?
A related approach for generating correlated data is to use the corr2data command:
clear
set obs 100
set seed 1
matrix D = (1, .5 \ .5, 1)
drawnorm x1 y1, cov(D)
corr2data x2 y2, cov(D)
. summarize x*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          x1 |        100    .0630304    1.036762  -2.808194   2.280756
          x2 |        100    1.83e-09           1  -2.332422   2.238905
. summarize y*
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          y1 |        100   -.0767662    .9529448  -2.046532   2.726873
          y2 |        100    3.40e-09           1  -2.492884   2.797518
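It is important to note that, unlike drawnorm, the corr2data approach does not generate data that is a sample from an underlying population. Instead, it constructs data whose means and covariances match the specified values exactly, which is why x2 and y2 above have means of essentially zero and standard deviations of exactly 1.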
You can then create a scatter plot as follows:
scatter x1 y1
Or to compare the two approaches in a single graph:
twoway scatter x1 y1 || scatter x2 y2
EDIT:
For specific means and variances you need to specify the mean vector μ and covariance matrix Σ in drawnorm. For example, to draw two random variables that are jointly normally distributed with means of 8 and 12, and variances 5 and 8 respectively, you type:
matrix mu = (8, 12)
scalar cov = 0.4 * sqrt(5 * 8) // assuming a correlation of 0.4
matrix sigma = (5, cov \ cov, 8)
drawnorm double x y, means(mu) cov(sigma)
The means() and cov() options of drawnorm are both documented in the help file.
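For the original question (200 pairs, then a scatter plot), a sketch along these lines may help. It reuses the illustrative means, variances, and correlation from the example above, relies on drawnorm's n() option to set the number of observations, and uses cov_xy as a name chosen here.

```stata
clear
set seed 1

* illustrative parameters: means 8 and 12, variances 5 and 8, correlation 0.4
matrix mu = (8, 12)
scalar cov_xy = 0.4 * sqrt(5 * 8)
matrix sigma = (5, cov_xy \ cov_xy, 8)

* draw 200 pairs into new variables x and y
drawnorm x y, n(200) means(mu) cov(sigma)

* sample covariance matrix, which should be close to sigma
correlate x y, covariance

* scatter plot with y on the vertical axis and x on the horizontal axis
scatter y x
```

Because this is a random sample, the sample means and covariances will only approximate the values in mu and sigma.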
Here is an almost minimal example:
. clear
. set obs 100
number of observations (_N) was 0, now 100
. set seed 1
. matrix D = (1, .5 \ .5, 1)
. drawnorm x y, cov(D)
As the help for drawnorm explains, you must supply new variable names. As x and y already exist, drawnorm threw you out. You also had a superfluous comma that would have triggered a syntax error.
help scatter tells you about scatter plots.

Adding variables in Stata and then dividing by a number is giving unexpected results

I am trying to calculate z-scores by creating a variable D from three other variables, A, B, and C, as D = (A-B)/C, but for some reason this produces very large numbers. When I computed just (A-B), I did not get what I had calculated by hand: instead of -2, I got -105.66.
Variable A is 'long' and variable B is 'float'; I am not sure if this is the reason. My Stata syntax is:
gen zscore= (height-avheight)/meansd
did not work.
You are confusing scalars and variables. Here's a solution (chop off the first four lines and replace x by height to fit the calculation into your code):
// example data
clear
set obs 50
gen x = runiform()
// summarize
qui su x
// store scalars
sca de mu = r(mean)
sca de sd = r(sd)
// z-score
gen zx = (x - mu) / sd
su zx
x and its z-score zx are variables that take many values, whereas mu and sd are constants. You might code constants in Stata by using scalars or macros.
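As a cross-check, the same standardization can be sketched with egen's std() function, which uses the sample mean and standard deviation of the variable. The data are hypothetical, and z_manual and z_egen are names chosen here for illustration.

```stata
clear
set seed 1
set obs 50
gen x = runiform()                      // hypothetical example data

* manual z-score, using the results that summarize leaves in r()
quietly summarize x
gen z_manual = (x - r(mean)) / r(sd)

* the same standardization with egen's std() function
egen z_egen = std(x)

* both should have mean 0 and standard deviation 1
summarize z_manual z_egen
```

If the two columns agree on your own data, the arithmetic is fine and the problem lies in what avheight and meansd actually contain.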
I am not sure what you are trying to get, but I will use the auto data from Stata to explain. This is basic stuff in Stata. Say I want to test that the price=3
sysuse auto
sum price
return list // optional: shows the results stored by summarize
scalar myz = (3 - r(mean)) / r(sd) // r(mean) and r(sd) are the mean and sd of price; if you already know them, you can enter the values directly
dis myz
-2.0892576
So, z value is -2.09 here.

Calculate a difference in stata with if command

I want to calculate something like
by group: egen x if y==1 - x if y==2
Of course this is not real Stata code, but I'm kind of lost. In R this is simply done with "[]" after the variable of interest, but I'm not sure about Stata.
R would be
x[y==1] - x[y==2]
I would use reshape.
clear
version 11.2
set seed 2001
* generate data
set obs 100
generate y = 1 + mod(_n - 1, 2)
generate x = rnormal()
generate group = 1 + floor((_n - 1) / 2)
list in 1/10
* reshape to wide and difference
reshape wide x, i(group) j(y)
generate x_diff = x1 - x2
list in 1/5
I would use reshape in R, also. Otherwise can you be sure that everything is properly ordered to give you the difference you want?
There is likely a neat Mata solution, but I know very little Mata. You may find preserve and restore helpful if you're averse to reshaping.
Richard Herron makes a good point that a reshape to a different structure might be worthwhile. Here I focus on how to do it with the existing structure.
Assuming that there are precisely two observations for each group of group, one with y == 1 and one with y == 2, then
bysort group (y) : gen diff = x[1] - x[2]
gives the difference between values of x, necessarily repeated for each of the two observations in a group. An assumption-free method is
bysort group: egen mean_1 = mean(x / (y == 1))
by group: egen mean_2 = mean(x / (y == 2))
gen diff = mean_1 - mean_2
Consider expressions such as x / (y == 1). Here the denominator y == 1 is 1 when y is indeed 1 and 0 otherwise. Division by 0 yields missing in Stata, but the egen command here ignores those. So the first command of the three commands above yields the mean of x for observations for which y == 1 and the second the mean of x for observations for which y == 2. Other values of y (even missings) will be ignored. This method should agree with the first method when the first method is valid.
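A small self-contained check of that last claim, on hypothetical simulated data with exactly one y == 1 and one y == 2 observation per group, might look like this (diff_a and diff_b are names chosen here):

```stata
clear
set seed 2001
set obs 100

* hypothetical data: 50 groups, each with one y == 1 and one y == 2 row
generate group = 1 + floor((_n - 1) / 2)
generate y = 1 + mod(_n - 1, 2)
generate x = rnormal()

* Method 1: relies on exactly two observations per group, sorted by y
bysort group (y): generate diff_a = x[1] - x[2]

* Method 2: x / (y == 1) equals x when y == 1 and is missing otherwise,
* so mean() averages only over the matching observations
bysort group: egen mean_1 = mean(x / (y == 1))
by group: egen mean_2 = mean(x / (y == 2))
generate diff_b = mean_1 - mean_2

* the two methods agree when the first method's assumption holds
assert abs(diff_a - diff_b) < 1e-6
```

If the assert passes silently, both methods produced the same differences for every observation.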
For a review of similar problems, see http://stata-journal.com/article.html?article=dm0055
In Stata the if referred to here is a qualifier (not a command).