I am trying to impute two variables simultaneously in Stata, say y and x, and then run a linear regression on them.
The code I used is:
mi set mlong
mi register imputed y x
mi impute regress y a b c, add(10)
mi impute regress x a b c, add(10)
mi estimate: regress y x
I run into an error: "estimation sample varies between m=1 and m=11". Can someone help me out? Thanks!
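The two separate mi impute regress calls each add their own 10 imputations, so y is most likely imputed only in m=1 to m=10 and x only in m=11 to m=20, which would explain why the estimation sample differs between m=1 and m=11. I prefer imputing both variables together using chained equations. The code below should work (Part 1 can be skipped; I only used it to generate a suitable mock dataset):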
* Part 1
clear all
set seed 0945
set obs 50
gen y0 = _n
gen y = runiform()
sort y
gen x0 = _n
gen x = runiform()
sort x
replace y = . in 1
replace y = . in 5
replace y = . in 10
replace y = . in 15
replace y = . in 20
replace y = . in 25
replace y = . in 30
replace y = . in 35
replace y = . in 40
replace y = . in 45
replace y = . in 50
sort y
replace x = . in 1
replace x = . in 5
replace x = . in 10
replace x = . in 15
replace x = . in 20
replace x = . in 25
replace x = . in 30
replace x = . in 35
replace x = . in 40
replace x = . in 45
replace x = . in 50
gen a = _n
sort x
gen b = _n
gen c = _n
* Part 2
mi set mlong
mi register imputed y x
mi impute chained (regress) y x = a b c, add(10)
mi estimate, dots: regress y x
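To see why the original two-step approach is likely to fail, the sketch below reruns the question's commands on a small invented dataset and counts what is still missing in an early and a late imputation. The mock data and the seed are assumptions made for illustration only; the variable names follow the question.

```stata
* Mock data (invented values; only the variable names come from the question)
clear all
set seed 12345
set obs 100
gen a = rnormal()
gen b = rnormal()
gen c = rnormal()
gen x = a + b + rnormal()
gen y = x + c + rnormal()
replace x = . in 1/15
replace y = . in 10/30

* The two separate calls from the question
mi set mlong
mi register imputed y x
mi impute regress y a b c, add(10)
mi impute regress x a b c, add(10)

* Compare what is still missing in imputation 1 and in imputation 11
mi xeq 1 11: count if missing(y)
mi xeq 1 11: count if missing(x)
```

If the counts differ between the two imputations, the regression of y on x uses different observations in each of them, which is what mi estimate complains about. Imputing both variables in one mi impute chained call avoids this.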
Working in Stata, suppose I have a data table like this...
Household Identifier  Person Identifier  Var1  Var2
1                     1                  a     b
1                     1                  c     d
1                     2                  e     f
2                     1                  g     h
2                     1                  i     j
2                     1                  k     l
2                     2                  m     n
2                     2                  o     p
3                     1                  q     r
I want to be able to combine these so there is just one observation per household, i.e. like this
Household Identifier  Person1_Var1_1  Person1_Var2_1  Person1_Var1_2  Person1_Var2_2  Person1_Var3_1  Person1_Var3_2  Person2_Var1_1  Person2_Var2_1  Person2_Var1_2  Person2_Var2_2  Person2_Var3_1  Person2_Var3_2
1                     a               b               c               d               .               .               e               f               .               .               .               .
2                     g               h               i               j               k               l               m               n               o               p               .               .
3                     q               r               .               .               .               .               .               .               .               .               .               .
Is there a straightforward way of doing this?
You can use reshape wide twice. Note that when I create rowid, I append an underscore to it; I also append an underscore to the var1 and var2 columns. In the first reshape call, I use the string option to identify rowid as a string variable:
bysort householdidentifier personidentifier: gen rowid = strofreal(_n) + "_"
rename var* =_
reshape wide var1 var2, i(householdidentifier personidentifier) j(rowid) string
reshape wide var*, i(householdidentifier) j(personidentifier)
Output:
    househ~r  var1_1_1  var2_1_1  var1_2_1  var2_2_1  var1_3_1  var2_3_1  var1_1_2  var2_1_2  var1_2_2  var2_2_2  var1_3_2  var2_3_2
1.  1         a         b         c         d                             e         f
2.  2         g         h         i         j         k         l         m         n         o         p
3.  3         q         r
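For anyone who wants to try the double reshape end to end, here is a minimal self-contained sketch on the question's example data. It departs from the code above in three ways, all of them my own assumptions rather than part of the original answer: a helper variable order (hypothetical name) pins down the original row order so that rowid does not depend on how sort breaks ties, the underscore suffix is added with rename var* var*_, and the stub names are spelled out in full instead of relying on abbreviation.

```stata
clear
input byte householdidentifier byte personidentifier str1 var1 str1 var2
1 1 a b
1 1 c d
1 2 e f
2 1 g h
2 1 i j
2 1 k l
2 2 m n
2 2 o p
3 1 q r
end

* Number the rows within each household/person pair in their original order
gen long order = _n
bysort householdidentifier personidentifier (order): gen rowid = strofreal(_n) + "_"
drop order

* Add the underscore suffix, then reshape within person, then within household
rename var* var*_
reshape wide var1_ var2_, i(householdidentifier personidentifier) j(rowid) string
unab stubs : var*
reshape wide `stubs', i(householdidentifier) j(personidentifier)
list
```

Slots with no data (for example the third row of person 1 in household 1) come out as empty strings, because the variables are strings.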
I have the following SAS dataset.
correlation  policynum  risknum
A            X          Y
A            X          Y
A            X          Y
B            X          Y
B            X          Y
B            X          Y
B            X          L
B            X          L
B            X          L
C            Z          M
C            Z          M
C            Z          M
D            Z          M
D            Z          M
D            Z          M
In SAS, I want to filter the above dataset so I get my final output as:
correlation  policynum  risknum
B            X          Y
B            X          Y
B            X          Y
B            X          L
B            X          L
B            X          L
D            Z          M
D            Z          M
D            Z          M
i.e. for each group of policynum and risknum, if multiple values exist for correlation, I want to keep the second value and get rid of the first value.
If only a single value of correlation exists for a group of policynum and risknum, I want to retain that group in my final output too.
What would be the best way to do this? It might be something simple as I am relatively new to SAS.
Thanks in advance!
If the correlation value you want to keep is always the highest one in sort order within each group, you can use SQL. Otherwise SQL cannot be used: it is based on set theory and has no implicit row numbers. In that case a DATA step with a DOW loop can be used. (SAS coders commonly use the phrase 'DOW loop' for the situation where SET and BY statements occur inside a DO loop.)
Example:
data have;
input correlation $ policynum $ risknum $;
datalines;
A X Y
A X Y
A X Y
B X Y
B X Y
B X Y
B X L
B X L
B X L
C Z M
C Z M
C Z M
D Z M
D Z M
D Z M
;
/* keep last group of a nested group */
* SQL can be used only if correlation wanted is ALWAYS highest valued correlation;
proc sql;
create table want as
select * from have
group by policynum, risknum
having correlation = max(correlation)
;
quit;
* DATA Step DOW loops can be used when correlation wanted is last occurring correlation within by group;
data want;
do _n_ = 1 by 1 until (last.risknum); /* one pass per policynum/risknum group */
set have;
by policynum risknum notsorted; /* presume at least contiguous */
end;
_want_correlation = correlation;
do _n_ = 1 to _n_;
set have;
if _want_correlation = correlation then OUTPUT;
end;
run;
I have a string variable
var1
x
y
z
that I need to "duplicate" and append to give
var1 var2
x x
x y
x z
--------
y x
y y
y z
--------
z x
z y
z z
where I added the horizontal lines to facilitate reading. Is such an expansion possible in Stata without loops? (I am not sure if "duplicate" is the right term.)
Two commands:
gen var2 = var1
fillin var1 var2
See help fillin and http://www.stata-journal.com/sjpdf.html?articlenum=dm0011
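A minimal sketch of the two commands on the question's three values, assuming var1 is a str1 variable. fillin also adds an indicator variable _fillin that marks the rows it created, which is dropped here because every pair is wanted.

```stata
clear
input str1 var1
x
y
z
end

* Copy the variable, then let fillin create every var1 x var2 combination
gen var2 = var1
fillin var1 var2
drop _fillin
list, sepby(var1)
```

The result is sorted by var1 and var2, so the sepby(var1) option reproduces the separator lines from the question.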
I'm trying to pull only the last 4 working days of data in SAS. I tried the following code, but I'm not getting what I intended:
data input;
Input id $ id1 $ id2 $ num date date9.;
Format Date Date9.;
datalines;
x y z 3 19JUL2015
x y z 2 18JUL2015
x y z 3 17JUL2015
x y z 2 16JUL2015
x y z 3 15JUL2015
x y z 2 14JUL2015
x y z 3 13JUL2015
a b c 1 12JUL2015
a b c 1 11JUL2015
a b c 1 10JUL2015
a b c 1 09JUL2015
a b c 1 08JUL2015
a b c 2 07JUL2015
x y z 1 06JUL2015
;
Run;
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
*if intck('weekday',Date,today()) >4;
if 1<Weekday(Date)<7 and Date>=today()-4;
Run;
I think you need to reverse the > in your code, and add a qualification that you only want weekdays:
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
if intck('weekday',Date,'20JUL2015'd) le 4 and 1<weekday(Date)<7;
*if 1<Weekday(Date)<7 and Date>='20JUL2015'd-5;
Run;
I would like to group concatenate a categorical variable. Example:
pat x
1 a
1 b
1 b
2 a
2 a
The group concatenating should result in:
pat y
1 a-b-b
2 a-a
In MySQL this would be done using GROUP_CONCAT:
SELECT pat, GROUP_CONCAT(x SEPARATOR '-') y FROM tb GROUP BY pat
Also it would be nice if the function could concatenate distinct ordered values. With above example the output should be:
pat y
1 a-b
2 a
With MySQL:
SELECT pat, GROUP_CONCAT(DISTINCT x ORDER BY x SEPARATOR '-') y FROM tb GROUP BY pat
Note that this would reduce the data set to fewer observations.
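* keep one observation per distinct value of x within pat
bysort pat x: keep if _n == 1
* build the concatenated string cumulatively within pat
by pat: gen y = x[1]
by pat: replace y = y[_n-1] + "-" + x if _n > 1
* the last observation in each pat holds the full string
by pat: keep if _n == _N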
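The first variant in the question, which keeps duplicates (a-b-b), can be sketched along the same lines by skipping the de-duplication step. This is only a sketch on the question's example data; the helper variable order is a hypothetical name, used to keep the values in their original row order within each pat.

```stata
clear
input byte pat str1 x
1 a
1 b
1 b
2 a
2 a
end

* Remember the original row order so that ties are not shuffled by the sort
gen long order = _n
bysort pat (order): gen y = x if _n == 1
* Append each value to the running string of the previous observation
by pat: replace y = y[_n-1] + "-" + x if _n > 1
* The last observation in each pat holds the full string
by pat: keep if _n == _N
keep pat y
list
```

As with the distinct version, this reduces the dataset to one observation per pat.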