I have two normally distributed variables X and Y. Given the covariance between them and the variance of each, I want to simulate pairs of points (say 200) from their joint distribution, but I can't find a command to do this. Eventually I want to plot these points in a scatter plot.
So far, I have:
set obs 100
set seed 1
gen y = 64*rnormal(1, 5/64)
gen x = 64*rnromal(1, 5/64)
matrix D = (1, .5 | .5, 1)
drawnorm x, y, cov(D)
but this gives an error saying that x and y already exist.
Also, once I have a sample, how would I plot the drawnorm output as a scatter plot?
A related approach for generating correlated data is to use the corr2data command:
clear
set obs 100
set seed 1
matrix D = (1, .5 \ .5, 1)
drawnorm x1 y1, cov(D)
corr2data x2 y2, cov(D)
. summarize x*
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
          x1 |       100    .0630304    1.036762   -2.808194   2.280756
          x2 |       100    1.83e-09           1   -2.332422   2.238905
. summarize y*
    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
          y1 |       100   -.0767662    .9529448   -2.046532   2.726873
          y2 |       100    3.40e-09           1   -2.492884   2.797518
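Note that, unlike drawnorm, corr2data does not generate a random sample from an underlying population. It generates data whose sample means, standard deviations and correlations exactly match the values requested (here means of 0, standard deviations of 1 and a correlation of .5), which is why x2 and y2 above have means of essentially 0 and standard deviations of exactly 1.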
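To make the distinction concrete, here is a minimal sketch in Python (not Stata) of the idea behind exact-moment data. It draws random pairs, centers them, removes their own sample covariance ("whitening"), and then imposes the target covariance, so the sample moments match the target exactly. This illustrates the technique. It is not claimed to be corr2data's actual implementation, and the function name exact_moment_pairs is hypothetical.

```python
import math
import random

def exact_moment_pairs(n, target_cov, seed=1):
    """Return n (x, y) pairs whose sample means are 0 and whose sample
    covariance matrix (n - 1 divisor) equals target_cov exactly."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Center each variable so its sample mean is exactly 0.
    mx, my = sum(x) / n, sum(y) / n
    x = [v - mx for v in x]
    y = [v - my for v in y]
    # Sample covariance matrix of the random draws.
    sxx = sum(a * a for a in x) / (n - 1)
    sxy = sum(a * b for a, b in zip(x, y)) / (n - 1)
    syy = sum(b * b for b in y) / (n - 1)
    # Whiten: undo the sample Cholesky factor so (u, w) has identity sample covariance.
    a11 = math.sqrt(sxx)
    a21 = sxy / a11
    a22 = math.sqrt(syy - a21 ** 2)
    u = [a / a11 for a in x]
    w = [(b - a21 * ui) / a22 for b, ui in zip(y, u)]
    # Color: apply the Cholesky factor of the target covariance matrix.
    (t11, t12), (_, t22) = target_cov
    l11 = math.sqrt(t11)
    l21 = t12 / l11
    l22 = math.sqrt(t22 - l21 ** 2)
    return [(l11 * ui, l21 * ui + l22 * wi) for ui, wi in zip(u, w)]

pairs = exact_moment_pairs(100, [[1.0, 0.5], [0.5, 1.0]])
```

Changing the seed changes the individual values but not the sample moments. This matches the x2 and y2 rows of the summarize output, which show exactly the requested standard deviations of 1.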
You can then create a scatter plot as follows:
scatter x1 y1
Or to compare the two approaches in a single graph:
twoway scatter x1 y1 || scatter x2 y2
EDIT:
For specific means and variances you need to specify the mean vector μ and covariance matrix Σ in drawnorm. For example, to draw two random variables that are jointly normally distributed with means of 8 and 12, and variances 5 and 8 respectively, you type:
matrix mu = (8, 12)
scalar cov = 0.4 * sqrt(5 * 8) // assuming a correlation of 0.4
matrix sigma = (5, cov \ cov, 8)
drawnorm double x y, means(mu) cov(sigma)
The means() and cov() options of drawnorm are both documented in help drawnorm.
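For intuition about what drawnorm produces, here is a minimal Python sketch (not Stata, and not drawnorm's actual code) of one standard way to draw from a bivariate normal with mean vector μ and covariance matrix Σ: multiply independent standard normal draws by a Cholesky factor of Σ and add μ. The function name draw_bivariate_normal is hypothetical. The parameters are the means 8 and 12, variances 5 and 8 and assumed correlation 0.4 from the Stata example above.

```python
import math
import random

def draw_bivariate_normal(n, mu, sigma, seed=1):
    """Draw n (x, y) pairs from a bivariate normal N(mu, sigma) by scaling
    independent standard normal draws with the Cholesky factor of sigma."""
    (s11, s12), (s21, s22) = sigma
    if s12 != s21:
        raise ValueError("sigma must be symmetric")
    # Lower-triangular L with L @ L.T == sigma.
    # No positive-definiteness check beyond what math.sqrt enforces.
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2))
    return pairs

# Means 8 and 12, variances 5 and 8, correlation 0.4 (as in the Stata example).
cov_xy = 0.4 * math.sqrt(5 * 8)
pairs = draw_bivariate_normal(200, (8.0, 12.0), [[5.0, cov_xy], [cov_xy, 8.0]])
```

The sample means and covariance of the draws only approximate μ and Σ, and the approximation improves as n grows. This is the "sample from a population" behaviour of drawnorm.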
Here is an almost minimal example:
. clear
. set obs 100
number of observations (_N) was 0, now 100
. set seed 1
. matrix D = (1, .5 \ .5, 1)
. drawnorm x y, cov(D)
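As the help for drawnorm explains, you must supply new variable names. Because x and y already exist, drawnorm exited with an error. You also had a superfluous comma (drawnorm x, y) that would have triggered a syntax error. Note, too, that the rows of a matrix are separated by \, not |, and that rnromal() should be rnormal().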
help scatter tells you about scatter plots.
Related
I have to find the subpixel (x, y) coordinates of the maximum value given a set of discrete points. In my case, I run the cv::matchTemplate function, which slides a model window across an image and returns a score for each pixel position. The result is an image holding the score for each position, together with the location (x0, y0) of the maximum value. Here are the values around the maximum found:
       x_1    x0     x1
y_1  | 0.91 | 0.89 | 0.90 |
y0   | 0.92 | 0.99 | 0.89 |
y1   | 0.95 | 0.95 | 0.90 |
I would like to use quadratic interpolation, with only the nearest neighbors, to find the subpixel coordinates of the interpolated maximum.
In a 1d case, I use this formula (assuming x0 is the origin):
interpolated_x = (x_1-x1)/(2.*(x_1-2.*x0+x1));
For example:
x_1 x0 x1
|0.92 | 0.99 | 0.89|
you get interpolated_x = -0.08823, which correctly lies slightly to the left of x0.
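The 1D formula fits a parabola through the three scores at offsets -1, 0 and +1 and returns the offset of its vertex. A minimal Python sketch reproduces the worked example. The function name is hypothetical, and the guard for a zero denominator is an added assumption for flat neighborhoods.

```python
def subpixel_offset_1d(f_left, f_center, f_right):
    """Vertex offset (relative to the center sample) of the parabola through
    (-1, f_left), (0, f_center), (1, f_right)."""
    denom = f_left - 2.0 * f_center + f_right
    if denom == 0.0:
        # No curvature: the parabola degenerates, so keep the integer peak.
        return 0.0
    return (f_left - f_right) / (2.0 * denom)

offset = subpixel_offset_1d(0.92, 0.99, 0.89)
print(offset)
```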
Is there some C++ code for the 2d case?
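One common way to extend this to 2D is to fit a quadratic surface to the 3×3 neighborhood with finite differences and take a single Newton step. This is not an OpenCV function. Estimate the gradient g and the Hessian H at the center, then solve H·(dx, dy) = -g. When the mixed term is zero, this reduces to applying the 1D formula separately along x and along y. The sketch is in Python to match the other examples, but it uses only scalar arithmetic and ports directly to C++ with doubles. The name subpixel_peak_2d is hypothetical. The layout is also an assumption: rows are y = -1, 0, +1 and columns are x = -1, 0, +1, as in the table above.

```python
def subpixel_peak_2d(grid):
    """Sub-pixel offset (dx, dy) of the maximum of a quadratic fitted to a 3x3
    neighborhood. grid[r][c] is the score at y = r - 1, x = c - 1."""
    def f(x, y):
        return grid[y + 1][x + 1]
    # First derivatives (central differences).
    gx = (f(1, 0) - f(-1, 0)) / 2.0
    gy = (f(0, 1) - f(0, -1)) / 2.0
    # Second derivatives, including the mixed term.
    hxx = f(1, 0) - 2.0 * f(0, 0) + f(-1, 0)
    hyy = f(0, 1) - 2.0 * f(0, 0) + f(0, -1)
    hxy = (f(1, 1) - f(1, -1) - f(-1, 1) + f(-1, -1)) / 4.0
    det = hxx * hyy - hxy * hxy
    if det <= 0.0 or hxx >= 0.0:
        # Fitted surface has no proper maximum here: keep the integer peak.
        return 0.0, 0.0
    # Solve H * (dx, dy) = -g using the explicit inverse of the 2x2 Hessian.
    dx = -(hyy * gx - hxy * gy) / det
    dy = -(hxx * gy - hxy * gx) / det
    return dx, dy

scores = [
    [0.91, 0.89, 0.90],  # y_1
    [0.92, 0.99, 0.89],  # y0
    [0.95, 0.95, 0.90],  # y1
]
dx, dy = subpixel_peak_2d(scores)
print(f"dx = {dx:.4f}, dy = {dy:.4f}")
```

For the scores in the question this gives an offset of roughly (-0.10, +0.22): slightly left of x0 and toward the y1 row, where the neighboring scores (0.95) are highest. Fitting the mixed term hxy, rather than running two independent 1D fits, lets the estimate follow a peak that is elongated diagonally.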
I have variables y, x1 and x2. I estimate a differenced equation (without the intercept) using reg d.(y x1 x2), nocons. Now I want to get the residuals for the original variables using the estimated coefficients. I can do it by
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
But is there an easier way? I need to keep these residuals for later use. Here is a complete minimal example.
clear all
set obs 100
gen id = floor((_n-1)/5)+1
by id, sort: gen year = 1990+_n
xtset id year
set seed 1
gen x1 = rnormal()
gen x2 = rnormal()
gen y = rnormal()
*** Data generated ***
reg d.(y x1 x2), nocons
matrix b = e(b)
gen resid = y - b[1,1]*x1 - b[1,2]*x2
I wonder whether there is a more flexible approach, because sometimes I use completely different variable names in the regression (e.g., reg dy dx1 dx2, nocons rather than reg d.(y x1 x2), nocons). I thought predict might help, but I'm not sure. Is it possible to avoid typing the variable names explicitly?
predict will not work since it will create residuals on the differenced scale. You want residuals in terms of the original y, which is unusual, so there is no off-the-shelf solution.
I think the easiest path is to do something like this:
reg d.(y x1 x2), nocons coefl
local vars:colnames e(b) // get a list of coefficients
foreach x of local vars {
local xvar = subinstr("`x'","D.","",1) // strip out the D. prefix from the coefficient names
local diff "`diff' - _b[`x']*`xvar'"
}
gen resid = y `diff'
If your differenced variables are named like dx1 and dx2, you can modify the prefix stripper like this:
local xvar = subinstr("`x'","d","",1) // strip out the first d prefix
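The logic of this answer can be illustrated outside Stata: estimate on first differences without an intercept, then rebuild the residuals in levels by stripping the D. prefix from each coefficient name. Below is a minimal Python sketch under simplifying assumptions. It uses a single time series rather than a panel, hypothetical noise-free data and hypothetical helper names. Because differencing removes any constant, the level residuals recover it: with y = 5 + 2*x1 - x2, every level residual is 5.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_nocons(y, regressors):
    """OLS without an intercept; regressors maps a coefficient name to a list."""
    names = list(regressors)
    xtx = [[sum(a * b for a, b in zip(regressors[i], regressors[j])) for j in names]
           for i in names]
    xty = [sum(a * b for a, b in zip(regressors[i], y)) for i in names]
    return dict(zip(names, solve(xtx, xty)))

def diff(v):
    """First difference, like Stata's D. operator on a single series."""
    return [b - a for a, b in zip(v, v[1:])]

def level_residuals(y, levels, coefs, prefix="D."):
    """y minus sum(b * level variable); the level name is the coefficient name
    with its first `prefix` removed (mirrors the subinstr() step above)."""
    resid = list(y)
    for name, b in coefs.items():
        base = name.replace(prefix, "", 1)
        resid = [r - b * v for r, v in zip(resid, levels[base])]
    return resid

rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(20)]
x2 = [rng.gauss(0, 1) for _ in range(20)]
y = [5 + 2 * a - b for a, b in zip(x1, x2)]  # hypothetical data with a constant of 5

coefs = ols_nocons(diff(y), {"D.x1": diff(x1), "D.x2": diff(x2)})
resid = level_residuals(y, {"x1": x1, "x2": x2}, coefs)
```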
Here are my data. The data are structured like so: id x1 x2 x3 y.
I used proc mixed to analyze them, but now I want to determine the regression coefficients and I don't know how. I'm only a beginner with SAS. From the results I see that x1, x2, x3 and x1*x2*x3 are the significant effects, but how do I determine the coefficients theta, alpha, beta, gamma and delta in
y = theta + alpha*x1 + beta*x2 + gamma*x3 + delta*x1*x2*x3
This is my code:
ods graphics on;
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id;
run;
ods graphics off;
EDIT 1: Here is part of the Solution for Fixed Effects table:
Since x1 has two levels, there are two rows for it in the table. Do I get the effect of x1 by summing these two values (-109.07 for the first row and 0 for the second), or should I do something else? Note that this is a 2^k factorial design: the effect of x1 should be computed as half the difference between the average values of y when x1 is high (20) and when it is low (10).
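The half-difference definition can be checked numerically. In a balanced two-level design with x1 coded -1 (low, 10) and +1 (high, 20), the least-squares slope on the coded variable equals half the difference between the mean of y at the high and low levels. A minimal Python sketch with made-up (hypothetical) y values follows. It illustrates coding and coefficients only and does not reproduce PROC MIXED's output.

```python
from statistics import mean

def half_difference(x, y, low, high):
    """Half the difference between mean(y | x == high) and mean(y | x == low)."""
    y_high = mean(v for u, v in zip(x, y) if u == high)
    y_low = mean(v for u, v in zip(x, y) if u == low)
    return (y_high - y_low) / 2

def slope(x, y):
    """Least-squares slope of y on x (model with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

x1 = [10, 20, 10, 20, 10, 20, 10, 20]                # balanced: four runs per level
y = [52.0, 61.0, 49.0, 66.0, 50.0, 63.0, 47.0, 60.0]  # hypothetical responses
coded = [(u - 15) / 5 for u in x1]                   # 10 -> -1, 20 -> +1

effect = half_difference(x1, y, 10, 20)
```

On the raw 10/20 scale, the slope is that same half-difference divided by 5 (half the range). This is why the coding of x1 matters when reading coefficients off the output.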
Given your model equation, x1, x2 and x3 should be treated as continuous variables (that is, drop the class statement). The Solution for Fixed Effects table then gives the coefficients of your model directly.
proc mixed data=test;
model y=x1 x2 x3 x1*x2*x3/ solution residual;
random id/s;
run;
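However, given your code and the values of x1, x2 and x3, it may be better to treat them as categorical variables, as you did. In that case each Estimate in your table is a difference in means: it compares that level with the last (reference) level, whose estimate is set to 0. Because the model includes interactions, the other factors are held at their reference levels. The link below may help you understand your results.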
http://support.sas.com/kb/38/384.html (explanation of estimation of coefficients)
The solution option produces your estimates. Include it on both the model and random statements (s is short for solution). You should then see two tables, Solution for Fixed Effects and Solution for Random Effects, which hold the estimates.
proc mixed data=test;
class x1 x2 x3;
model y = x1 | x2 | x3 / solution residual;
random id / s;
run;
The Random Coefficients example in the documentation is close to your question.
https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_mixed_sect034.htm
I am trying to calculate z-scores by creating a variable D from three other variables, A, B and C, as D = (A-B)/C, but for some reason this produces very large numbers. When I computed just (A-B), I did not get the value I calculated by hand: instead of -2, I got -105.66.
Variable A is long and variable B is float; could this be the reason? My Stata syntax is:
gen zscore= (height-avheight)/meansd
did not work.
You are confusing scalars and variables. Here's a solution (chop off the first four lines and replace x by height to fit the calculation into your code):
// example data
clear
set obs 50
gen x = runiform()
// summarize
qui su x
// store scalars
sca de mu = r(mean)
sca de sd = r(sd)
// z-score
gen zx = (x - mu) / sd
su zx
x and its z-score zx are variables that take many values, whereas mu and sd are constants. You might code constants in Stata by using scalars or macros.
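The same distinction between constants and variables appears in this minimal Python sketch with hypothetical data. Here statistics.stdev uses the n - 1 divisor, which matches the r(sd) that summarize stores.

```python
import statistics

def zscores(values):
    """Standardize a list: subtract the sample mean, divide by the sample SD."""
    mu = statistics.mean(values)   # one constant, like scalar mu = r(mean)
    sd = statistics.stdev(values)  # one constant, like scalar sd = r(sd)
    return [(v - mu) / sd for v in values]

heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical data
z = zscores(heights)
```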
I am not sure exactly what you are trying to get, but I will use Stata's auto data to explain. This is basic stuff in Stata. Say I want to test price = 3, that is, compute the z value of 3 relative to the distribution of price:
sysuse auto
sum price
* return list   (optional: displays the stored results)
scalar myz = (3 - r(mean))/r(sd)   // r(mean) and r(sd) are the mean and SD of price; if you already know them, type the values instead
dis myz
-2.0892576
So the z value is -2.09 here.
I want to calculate something like
by group: egen x if y==1 - x if y==2
Of course this is not real Stata code, but I'm kind of lost. In R this is done simply by putting [] after the variable of interest, but I'm not sure how to do it in Stata.
In R it would be:
x[y==1] - x[y==2]
I would use reshape.
clear
version 11.2
set seed 2001
* generate data
set obs 100
generate y = 1 + mod(_n - 1, 2)
generate x = rnormal()
generate group = 1 + floor((_n - 1) / 2)
list in 1/10
* reshape to wide and difference
reshape wide x, i(group) j(y)
generate x_diff = x1 - x2
list in 1/5
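The reshape idea carries over directly to other languages: pivot to one record per group with one column per value of y, then subtract. A minimal Python sketch with hypothetical rows of (group, y, x) follows. It assumes each group has exactly one row with y == 1 and one with y == 2.

```python
from collections import defaultdict

def wide_difference(rows):
    """rows: iterable of (group, y, x). Pivot to {group: {y: x}} (like
    reshape wide x, i(group) j(y)) and return {group: x at y == 1 minus x at y == 2}."""
    wide = defaultdict(dict)
    for group, y, x in rows:
        if y in wide[group]:
            # reshape also refuses values of y that are not unique within group.
            raise ValueError(f"duplicate y={y} in group {group}")
        wide[group][y] = x
    return {g: vals[1] - vals[2] for g, vals in wide.items()}

rows = [(1, 1, 0.5), (1, 2, 0.2), (2, 2, 1.0), (2, 1, 3.0)]
diffs = wide_difference(rows)
```

Because the pivot is keyed on y, the result does not depend on the order of the rows. Group 2 above lists y == 2 first and still gives 3.0 - 1.0.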
I would use reshape in R, too. Otherwise, how can you be sure that everything is ordered properly to give you the difference you want?
There is likely a neat Mata solution, but I know very little Mata. You may find preserve and restore helpful if you're averse to reshaping.
Richard Herron makes a good point that a reshape to a different structure might be worthwhile. Here I focus on how to do it with the existing structure.
Assuming that there are precisely two observations for each distinct value of group, one with y == 1 and one with y == 2, then
bysort group (y) : gen diff = x[1] - x[2]
gives the difference between the two values of x, necessarily repeated for both observations in each group. An assumption-free method is
bysort group: egen mean_1 = mean(x / (y == 1))
by group: egen mean_2 = mean(x / (y == 2))
gen diff = mean_1 - mean_2
Consider expressions such as x / (y == 1). Here the denominator y == 1 is 1 when y is indeed 1 and 0 otherwise. Division by 0 yields missing in Stata, and egen's mean() ignores missing values. So the first of the three commands above yields the mean of x for observations with y == 1, and the second yields the mean of x for observations with y == 2. Other values of y (even missing values) are ignored. This method agrees with the first method whenever the first method is valid.
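The assumption-free version, as a minimal Python sketch: for each group, average x over observations with y == 1 and over observations with y == 2, then subtract. Everything else is skipped, including a missing y or x, represented here by None. A group lacking either value of y gets None, which plays the role of Stata's missing. Names and data are hypothetical.

```python
from collections import defaultdict

def group_mean_difference(rows):
    """rows: iterable of (group, y, x). Returns {group: mean(x | y == 1) - mean(x | y == 2)},
    or None for a group lacking either value of y."""
    sums = defaultdict(lambda: {1: [0.0, 0], 2: [0.0, 0]})
    for group, y, x in rows:
        acc = sums[group]
        # Like x / (y == 1) under egen mean(): other y values and missings drop out.
        if y in (1, 2) and x is not None:
            acc[y][0] += x
            acc[y][1] += 1
    result = {}
    for g, acc in sums.items():
        (s1, n1), (s2, n2) = acc[1], acc[2]
        result[g] = s1 / n1 - s2 / n2 if n1 and n2 else None
    return result

rows = [(1, 1, 4.0), (1, 2, 1.0), (1, 3, 100.0),
        (2, 1, 2.0), (2, 1, 4.0), (2, 2, 1.0), (2, None, 50.0),
        (3, 1, 7.0)]
res = group_mean_difference(rows)
```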
For a review of similar problems, see http://stata-journal.com/article.html?article=dm0055
In Stata the if referred to here is a qualifier (not a command).