How do I fill down empty values within different groups?
My data looks like:
id visit status var reason
1 1 Done x1
1 1 Done x2
1 1 Done x3
1 2 Not Done x1 text1
1 2 Not Done x2
1 2 Not Done x3
1 3 Done x1
1 3 Done x2
1 3 Done x3
2 1 Not Done x1 text2
2 1 Not Done x2
2 1 Not Done x3
2 2 Done x1
2 2 Done x2
2 2 Done x3
2 3 Done x1
2 3 Done x2
2 3 Done x3
The output should look like this:
id visit status var reason
1 1 Done x1
1 1 Done x2
1 1 Done x3
1 2 Not Done x1 text1
1 2 Not Done x2 text1
1 2 Not Done x3 text1
1 3 Done x1
1 3 Done x2
1 3 Done x3
2 1 Not Done x1 text2
2 1 Not Done x2 text2
2 1 Not Done x3 text2
2 2 Done x1
2 2 Done x2
2 2 Done x3
2 3 Done x1
2 3 Done x2
2 3 Done x3
I think this is a quite simple problem, but for now, I haven't been able to resolve it. Any help would be greatly appreciated!
This is a simple task utilising the first.variable functionality that exists in a data step when a by statement is used.
Essentially I've created a new variable that is assigned the value of Reason whenever a new visit is encountered. The retain statement ensures that the new variable value is copied for all subsequent rows where the Id and Visit do not change. Then I just delete the original Reason variable and rename the new one.
data have;
infile datalines dsd;
input id visit status &$ var $ reason $;
datalines;
1, 1, Done, x1,,
1, 1, Done, x2,,
1, 1, Done, x3,,
1, 2, Not Done, x1, text1,
1, 2, Not Done, x2,,
1, 2, Not Done, x3,,
1, 3, Done, x1,,
1, 3, Done, x2,,
1, 3, Done, x3,,
2, 1, Not Done, x1, text2,
2, 1, Not Done, x2,,
2, 1, Not Done, x3,,
2, 2, Done, x1,,
2, 2, Done, x2,,
2, 2, Done, x3,,
2, 3, Done, x1,,
2, 3, Done, x2,,
2, 3, Done, x3
;
run;
data want;
set have;
retain reason_new;
by id visit;
if first.visit then reason_new=reason;
drop reason;
rename reason_new = reason;
run;
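For readers who want to check the retain / first.visit logic outside SAS, here is a minimal Python sketch of the same idea. It is an illustration, not a translation of the data step: the row tuples are a hypothetical subset of the question's data, the name fill_down is made up for this example, and the rows are assumed to be already sorted by id and visit (as the BY statement also requires).

```python
# Hypothetical subset of the question's rows: (id, visit, status, var, reason).
rows = [
    (1, 1, "Done", "x1", ""),
    (1, 2, "Not Done", "x1", "text1"),
    (1, 2, "Not Done", "x2", ""),
    (1, 2, "Not Done", "x3", ""),
    (1, 3, "Done", "x1", ""),
    (2, 1, "Not Done", "x1", "text2"),
    (2, 1, "Not Done", "x2", ""),
]


def fill_down(rows):
    """Copy the reason found on the first row of each (id, visit) group to the whole group."""
    filled = []
    previous_key = None
    retained = ""  # plays the role of the retained reason_new variable
    for id_, visit, status, var, reason in rows:
        key = (id_, visit)
        if key != previous_key:  # equivalent of the first.visit test
            retained = reason
            previous_key = key
        filled.append((id_, visit, status, var, retained))
    return filled


result = fill_down(rows)
for row in result:
    print(row)
```

As in the SAS version, only the value on the first row of each group is used; a reason that appears on a later row of a group would be overwritten.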
Related
I have the following data frame:
df1 <- data.frame(x1=c(1,2,3,4), x2=c(10,20,30,40), x3=c(100,200,300,400))
And I want to generate all the possible data frames that can be created by combining df1$x1, df1$x2 and df1$x3 in different orders, so 4^3 different data frames, e.g.:
x1 x2 x3
1 1 10 100
2 2 20 200
3 3 30 300
4 4 40 400
x1 x2 x3
1 1 40 400
2 2 30 300
3 3 20 200
4 4 10 100
and so on. For each of them I want to compute the following function:
my.function <- function(x1, x2, x3) {
sum(0.3*x1^2+0.3*x2^2+0.4*x3)/length(x1)  # x1 is a vector, so length() rather than nrow()
}
I did this, but it's clearly wrong:
res1 <- rep(NA, nrow(df1)^3)
for(i in 1:nrow(df1)){
for(j in 1:nrow(df1)){
for(k in 1:nrow(df1)){
x1.1 <- as.vector(c(df1[-i, 1], df1[i, 1]))
x2.1 <- as.vector(c(df1[-k, 2], df1[k, 2]))
x3.1 <- as.vector(c(df1[-j, 3], df1[j, 3]))
res1[nrow(df1)^2*(i-1) + nrow(df1)*(j-1)+k] <- my.function(x1.1, x2.1, x3.1)
}
}
}
I looked for a similar question without much luck. Could you please help me?
Thank you so much!!!
I currently have a health injury data set of scores 0-6, where 0 is no injury and 6 is fatal injury. This is across 6 categorical body region variables. I'm attempting to construct an Abbreviated Injury Scale, where the three highest scores in an observation would be considered for the calculations. How do I filter the three highest in each row in SAS? Below is an example:
ID A B C D E F
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
So in OBS 1, scores 3, 4, and 0 would be used; OBS 2 - 4, 2, and 1; OBS 3 - 5, 0, and 0; OBS 4 - 5, 4, 2.
I've provided code below to do what you asked, and detailed out the steps enough that you should be able to modify it for many options/uses.
Basically, it takes your data, transposes it as Quentin suggested and then uses proc means to output the top 3 observations for each ID.
DATA NEW;
INPUT ID A B C D E F;
CARDS;
1 0 0 0 3 4 0
2 1 2 1 4 0 0
3 0 0 5 0 0 0
4 1 2 1 5 4 0
;
RUN;
PROC TRANSPOSE DATA=NEW OUT=T_OUT(RENAME=(_NAME_ = VARIABLE COL1=VALUES));
BY ID;
VAR A B C D E F;
PROC PRINT DATA=T_OUT;
RUN;
PROC MEANS DATA=T_OUT NOPRINT;
CLASS ID;
TYPES ID;
VAR VALUES;
OUTPUT OUT=TOP3LIST(RENAME=(_FREQ_=RANK VALUES_MEAN=INDEX_CRITERIA))SUM= MEAN=
IDGROUP(MAX(VALUES) OUT[3] (VALUES VARIABLE)=)/AUTOLABEL AUTONAME;
PROC PRINT DATA=TOP3LIST;
RUN;
***THEN YOU CAN MERGE THIS DATA SET TO YOUR ORIGINAL ONE BY ID TO GET YOUR INDEX CRITERIA ADDED TO IT***;
***THE INDEX_CRITERIA IS A MEAN FROM PROC MEANS BEFORE THE KEEPING OF JUST THE TOP3 VALUES***;
DATA FINAL (DROP=_TYPE_ RANK VALUES_Sum VALUES_1 VALUES_2 VALUES_3 VARIABLE_1 VARIABLE_2 VARIABLE_3);
MERGE NEW TOP3LIST;
BY ID;
INDEX_CRITERIA2=SUM(VALUES_1, VALUES_2, VALUES_3)/3; *THIS CRITERIA IS AVERAGE OF THE KEPT 3 VALUES;
PROC PRINT DATA=FINAL;
RUN;
Best regards,
john
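As a cross-check on the expected result, the "top three scores per observation" step can be sketched in a few lines of Python. This is only an illustration of the logic, not a replacement for the PROC MEANS approach; the dictionary layout and the names scores, top3 and mean_top3 are assumptions made for this example.

```python
import heapq

# The four example observations from the question: ID -> scores for regions A-F.
scores = {
    1: [0, 0, 0, 3, 4, 0],
    2: [1, 2, 1, 4, 0, 0],
    3: [0, 0, 5, 0, 0, 0],
    4: [1, 2, 1, 5, 4, 0],
}

# Three largest scores per observation, in descending order.
top3 = {obs_id: heapq.nlargest(3, values) for obs_id, values in scores.items()}

# Average of the three kept values (the role INDEX_CRITERIA2 plays above).
mean_top3 = {obs_id: sum(values) / 3 for obs_id, values in top3.items()}

for obs_id in scores:
    print(obs_id, top3[obs_id], round(mean_top3[obs_id], 3))
```

The selected values agree with the ones listed in the question (4, 3, 0 for observation 1; 4, 2, 1 for observation 2; and so on).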
I would like to know if it's possible to select the 5 minimum or maximum values in each row with IML.
This is my code :
Proc iml ;
use table;
read all var {&varlist} into matrix ;
n=nrow(matrix) ; /* n=369 here*/
p=ncol(matrix); /* p=38 here*/
test=J(n,5,.) ;
Do i=1 to n ;
test[i,1]=MIN(matrix[i,]);
End;
Quit ;
So I would like to obtain a matrix test whose 1st column contains the minimum value of each row, whose 2nd column contains the minimum of the row EXCLUDING that first value, and so on.
If you have any idea ! :)
Even if it's not with IML (but with another part of SAS: Base, SQL, ...)
So for example :
Data test; input x1-x8 ; cards;
1 9 8 7 3 4 2 6
9 3 2 1 4 7 12 -2
;run;
And I would like to obtain the results sorted by row:
1 2 3 4 6 7 8 9
-2 1 2 3 4 7 9 12
in order to select my 5 minimum values in another table :
y1 y2 y3 y4 y5
1 2 3 4 6
-2 1 2 3 4
Read the article "Compute the kth smallest data value in SAS".
Define the modules as in the article. Then use the following:
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
x = have`; /* transpose */
ord = j(5,ncol(x));
do j = 1 to ncol(x);
ord[,j] = ordinal(1:5, x[,j]);
end;
print ord;
If you have missing values in your data and want to exclude them, use the SMALLEST module instead of the ORDINAL module.
You can use call sort() in PROC IML to sort a column. Because you want to separate the columns and not sort the whole matrix, extract the column, sort it, and then update the original.
You want to sort rows, so transpose your matrix, do the sorting, and then transpose back.
proc iml;
have = {1 9 8 7 3 4 2 6,
9 3 2 1 4 7 12 -2};
print have;
n = nrow(have);
have = have`; /*Transpose because sort works on columns*/
do i=1 to n;
tmp = have[,i];
call sort(tmp,1);
have[,i]=tmp;
end;
have = have`;
want = have[,1:5];
print want;
quit;
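The sort-each-row-then-keep-the-first-five idea is easy to verify outside IML. The following Python sketch is only an illustration of that logic using the two example rows; the names have and want simply mirror the matrices above.

```python
# The two example rows from the question.
have = [
    [1, 9, 8, 7, 3, 4, 2, 6],
    [9, 3, 2, 1, 4, 7, 12, -2],
]

# Sort each row independently, then keep its five smallest values.
want = [sorted(row)[:5] for row in have]

for row in want:
    print(row)
```

Each row is sorted on its own, which is what the transpose / sort / transpose-back loop achieves in IML.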
I have a variable age, 13 variables x1 to x13, and 802 observations in a Stata dataset. age has values ranging 1 to 9. x1 to x13 have values ranging 1 to 13.
I want to know how to count the number of 1 .. 13 in x1 to x13 according to different values of age. For example, for age 1, in x1 to x13, count the number of 1,2,3,4,...13.
I first change x1 to x13 as a matrix by using
mkmat x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13, matrix (a)
Then, I want to count using the following loop:
gen count = 0
quietly forval i = 1/802 {
quietly forval j = 1/13 {
replace count = count + inrange(a[r'i', x'j'], 0, 1), if age==1
}
}
I failed.
I am still somewhat uncertain about what you would like to achieve, but if I understand you correctly, here is one way to do it.
First, a simple dataset in which age ranges from one to three, with four variables x1-x4, each taking integer values between 5 and 7.
clear
input age x1 x2 x3 x4
1 5 6 6 6
1 7 5 6 5
2 5 7 6 6
3 5 6 7 7
3 7 6 6 6
end
Then we create three count variables (n5, n6 and n7) that count the number of 5s, 6s, and 7s for each subject across x1-x4.
forval i=5/7 {
egen n`i'=anycount(x1 x2 x3 x4),v(`i')
}
Below is what the data look like now. To explain, the first "1" under n5 indicates that there is only one "5" for that subject across x1-x4.
+----------------------------------------+
| age x1 x2 x3 x4 n5 n6 n7 |
|----------------------------------------|
1. | 1 5 6 6 6 1 3 0 |
2. | 1 7 5 6 5 2 1 1 |
3. | 2 5 7 6 6 1 2 1 |
4. | 3 5 6 7 7 1 1 2 |
5. | 3 7 6 6 6 0 3 1 |
+----------------------------------------+
It sounds to me like your ultimate goal is to have sums calculated separately for each value in age. Assuming this is true, let's create a 3x3 matrix to store such results.
mat A=J(3,3,.) // age (1-3) and values (5-7)
mat rown A=age1 age2 age3
mat coln A=value5 value6 value7
forval i=5/7 {
forval j=1/3 {
qui su n`i' if age==`j'
loca k=`i'-4 // the first column for value5
mat A[`j',`k']=r(sum)
}
}
The matrix looks like this. To explain, the first "3" under value5 indicates that, across all children of age 1, the value 5 appears a total of three times in x1-x4.
A[3,3]
value5 value6 value7
age1 3 4 1
age2 1 2 1
age3 1 4 3
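The age-by-value totals in that matrix can be double-checked with a short Python sketch. It is only an illustration of the counting logic on the same toy data, not Stata code; the names data and counts are made up for this example.

```python
from collections import Counter, defaultdict

# The toy dataset from above: (age, [x1, x2, x3, x4]).
data = [
    (1, [5, 6, 6, 6]),
    (1, [7, 5, 6, 5]),
    (2, [5, 7, 6, 6]),
    (3, [5, 6, 7, 7]),
    (3, [7, 6, 6, 6]),
]

# One Counter per age, accumulating how often each value occurs across x1-x4.
counts = defaultdict(Counter)
for age, xs in data:
    counts[age].update(xs)

# Rows are ages 1-3, columns are the values 5, 6 and 7.
for age in sorted(counts):
    print(age, [counts[age][value] for value in (5, 6, 7)])
```

A Counter returns 0 for a value that never occurs, so empty cells need no special handling.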
With Aspen's example, you could do this:
gen id = _n
reshape long x, i(id)
tab age x
Note that your sample code doesn't loop over different ages and there is an incorrect comma in the count command. I won't try to fix the code, as there are many more direct methods, one of which is above. tabulate has an option to save the table as a matrix.
Here is another solution closer to the original idea. Warning: code not tested.
matrix count = J(9, 13, 0)
forval i = 1/9 {
forval j = 1/13 {
forval J = 1/13 {
qui count if age == `i' & x`J' == `j'
matrix count[`i', `j'] = count[`i', `j'] + r(N)
}
}
}
Combining multiple variables into one by choosing the maximum value
id v1 v2 v3 v4 v5 v6
1 1 2 5 3 1 1
2 4 2 3 5 1
3 3 2 2 1 3
4 2 1 2 5 7
5 6 7 1 2 1 7
into n1=max(v1,v2), n2=v3, n3=max(v4,v5,v6)
id n1 n2 n3
1 2 5 3
2 4 3 5
3 3 2 3
4 2 2 7
5 7 1 7
How do I do this in SAS? (It's so easy in Excel, and relatively intuitive in R, but I can't figure it out in SAS. Please help!)
Thank you for your time!
The MAX function is your friend.
data want;
set have;
n1 = max(of v1 v2);
n2 = v3;
n3 = max(of v4 v5 v6);
run;
Arrays and variable lists also work (such as, n3 = max(of v4-v6);).
I agree that the MAX function is what you want, but I would code it differently.
data want;
set have;
n1 = max(v1, v2);
n2 = v3;
n3 = max(v4, v5, v6);
run;
Alternatively:
data want;
set have;
n1 = max(v1, v2);
n2 = v3;
n3 = max(of v4-v6);
run;
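One detail worth knowing for data like the example, where some rows have fewer values: the SAS MAX function returns the largest nonmissing argument, so a missing v6 does not turn n3 into a missing value. The Python sketch below only illustrates that behaviour. The helper name max_nonmissing is hypothetical, None stands in for a SAS missing value, and placing the gaps in v6 for ids 2 to 4 is an assumption, since the question does not show which variable is missing.

```python
def max_nonmissing(*values):
    """Largest non-missing value, or None if every value is missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# id -> (v1, ..., v6); the None entries are assumed to be in v6.
have = {
    1: (1, 2, 5, 3, 1, 1),
    2: (4, 2, 3, 5, 1, None),
    3: (3, 2, 2, 1, 3, None),
    4: (2, 1, 2, 5, 7, None),
    5: (6, 7, 1, 2, 1, 7),
}

# n1 = max(v1, v2), n2 = v3, n3 = max(v4, v5, v6)
want = {
    row_id: (max_nonmissing(v[0], v[1]), v[2], max_nonmissing(v[3], v[4], v[5]))
    for row_id, v in have.items()
}

for row_id, (n1, n2, n3) in want.items():
    print(row_id, n1, n2, n3)
```

Under that assumption the sketch reproduces the n1, n2, n3 table shown in the question.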