I need to create a new variable that is built from several other variables in my dataset:
HAD1 (1=yes 2=no 9=unknown),
HAF10 (1=yes 2=no 9=unknown),
HAC1C (1=yes 2=no 9=unknown),
and HAC1D (1=yes 2=no 9=unknown)
I want to use them to add up the number of health conditions an individual has. I also want every 9 (unknown) to be set to missing (".").
My new variable will be named CC4
CC4 =
(0=no conditions,
1=one condition,
2=two conditions,
3=three conditions,
4=four conditions,
.=any condition is missing, i.e. coded 9)
How do I code it in the correct way and add it to my dataset?
So far I have only written this:
data dataset;
set dataset;
*use arrays to create clean, re-coded versions of variables;
array code1 [4] HAD1 HAF10 HAC1C HAC1D;
array code2 [4] diabetes hattack hfailure stroke;
do new= 1 to 4;
if code1 [new] = 1 then code2 [new] = 1; /* keep all 1's as 1's */
else if code1 [new] = 2 then code2 [new] = 2; /* keep all 2's as 2's */
else if code1 [new] = 9 then code2 [new] = .; /* make all 9's into .'s */
end;
drop new;
*create summation variable;
cc4=HAD1+HAF10+HAC1C+HAC1D;
run;
You don't need to recode to count.
You can take advantage of the fact that SAS evaluates boolean expressions to 1 for TRUE and 0 for FALSE.
data want;
set have;
cc4 = (HAD1=1)+(HAF10=1)+(HAC1C=1)+(HAC1D=1);
run;
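The expression above counts the yes answers but does not turn a 9 (unknown) into missing. A minimal sketch that also covers that requirement, assuming the four variables are numeric and using made-up sample rows, checks for a 9 first with WHICHN:

```sas
/* Hypothetical sample rows; replace with your own data set */
data have;
  input HAD1 HAF10 HAC1C HAC1D;
  datalines;
1 2 2 1
2 2 2 2
1 9 1 1
1 1 1 1
;

data want;
  set have;
  /* WHICHN returns the position of the first 9 in the list, or 0 if there is none */
  if whichn(9, HAD1, HAF10, HAC1C, HAC1D) then cc4 = .;
  /* otherwise count the 1 (yes) codes; each comparison evaluates to 1 or 0 */
  else cc4 = (HAD1=1) + (HAF10=1) + (HAC1C=1) + (HAC1D=1);
run;
```

For these four rows, cc4 should be 2, 0, missing and 4.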
P.S. Do not overwrite your input data set by using the same name in the DATA and SET statements. That makes coding mistakes hard to correct.
If you want new names for the original code variables, use RENAME.
If you want both the new names and the original code variables, you probably shouldn't.
If you only want the cc4 result, you can take advantage of each code being a single digit: concatenate the codes into one string, then count the 1s (yes) as long as none of the codes is 9 (unknown).
Example:
data have;
do code1 = 1,2,9;
do code2 = 1,2,9;
do code3 = 1,2,9;
do code4 = 1,2,9;
output;
end;end;end;end;
run;
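data want;
set have;
/* concatenate the four single-digit codes, e.g. 1,2,9,1 gives 1291 */
codes = cats(code1,code2,code3,code4);
/* missing if any code is 9, otherwise the number of 1s */
cc4 = ifn(index(codes,'9'),.,count(codes,'1'));
drop codes;
run;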
In your case, replace code1, code2, code3, code4 with HAD1, HAF10, HAC1C, HAC1D.
I am trying to collapse my multiple rows of binary variables into a single row per patient id as depicted in my illustration. Could someone please help me with the SAS code to do this? Thanks
If the rule is to set the value to 1 if it is ever 1, take the MAX. If the rule is to set it to 1 only if all of the values are 1, take the MIN.
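proc summary data=have nway ;
by id;
var '2018'n--'2020'n; * PROC SUMMARY only counts observations unless you list analysis variables;
output out=want(drop=_type_ _freq_) max= ;
run;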
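Update trick (it keeps the last non-missing value of each variable within an id, so it matches the "ever 1" rule only when the values are 1 or missing rather than 0/1):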
data want;
update have(obs=0) have;
by id;
run;
Or
proc sql;
create table want as
select ID, max('2018'n) as Y2018, max('2019'n) as Y2019, max('2020'n) as Y2020
from have
group by ID
order by ID;
quit;
Untested, because you provided the data as images. Please post data as text, preferably as a DATA step.
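For the other rule (1 only if every row for the ID is 1), the same query works with MIN instead of MAX. This sketch assumes the year columns are named '2018'n to '2020'n, as in the query above, and uses hypothetical sample rows because the real data were only posted as images:

```sas
options validvarname=any;  /* allow column names such as '2018'n */

/* Hypothetical sample rows standing in for the data in the images */
data have;
  input ID '2018'n '2019'n '2020'n;
  datalines;
1 1 0 0
1 1 1 0
2 0 0 1
2 1 0 1
;

proc sql;
  create table want_all as
  select ID,
         min('2018'n) as Y2018,  /* 1 only if every row for the ID is 1 */
         min('2019'n) as Y2019,
         min('2020'n) as Y2020
  from have
  group by ID
  order by ID;
quit;
```

Here ID 1 should get 1, 0, 0 and ID 2 should get 0, 0, 1.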
Here is a DATA step solution. It is more complex than the answers above, but it shows how you can use arrays, FIRST./LAST. processing, and the RETAIN statement.
Use a retained temporary array to hold the values of 2018-2020 until the last observation of each id group. On the last observation of each id, check whether each held value is 1 and set the corresponding year variable to 1 or 0.
data want;
set have;
by id;
array year[3] '2018'n--'2020'n;
array hold[3] _TEMPORARY_;
retain hold;
if(first.id) then call missing(of hold[*]);
do i = 1 to dim(year);
if(year[i] = 1) then hold[i] = 1;
end;
if(last.id) then do;
do i = 1 to dim(year);
year[i] = (hold[i] = 1);
end;
output;
end;
drop i;
run;
I am working with a panel dataset: many countries and many variables over a period of years. The problem is that some countries have no value for certain variables across the whole period, and I would like to get rid of them. I found this code for deleting rows with missing values:
DATA data0;
SET data1;
IF cmiss(of _all_) then delete;
RUN;
But this only checks each row on its own, while I would like to delete a whole country if at least one variable has no observations for that country.
Here's a part of the data:
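If you want to delete the whole country when any of its rows has missing information, you are on the right track; you just need to process the data by country. Adding a BY statement by itself is not enough, because DELETE still removes one row at a time. Instead, read each country's rows once to flag whether any row has a missing value, then read the same rows again and output them only if the country was not flagged.
The data must be sorted by country, as it appears to be in the picture. If it is not sorted, first run:
proc sort data=have;
by country; run;
Then:
data want;
array _bad[1] _temporary_; _bad[1] = 0; /* temporary, so not part of _ALL_ */
do _n_ = 1 by 1 until (last.country); /* pass 1: flag the country */
set have; by country;
if cmiss(of _all_) then _bad[1] = 1;
end;
do _n_ = 1 to _n_; /* pass 2: re-read the same rows */
set have; if not _bad[1] then output;
end;
run;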
However, if you have 60 years of data for every country, my guess is that you will not find a single one that has all the information for every year. It will probably be better to make substantive choices about which countries and periods you want to analyze, and then perform multiple imputation of the missing data: https://support.sas.com/rnd/app/stat/papers/multipleimputation.pdf
You can use a DOW loop to compute which variable(s) contain only missing values within a group.
A second DOW loop outputs only those groups in which all variables contain at least one value.
Example:
data have;
call streaminit (2020);
do country = 1 to 6;
do year = 1960 to 1999;
array x gini kof tradegdp fdi gdp age_dep educ;
do over x;
x = rand('integer', 20, 100);
end;
if country = 1 then call missing (gini);
if country = 2 then call missing (educ);
if country = 4 then call missing (fdi);
output;
end;
end;
run;
data want;
* count number of non-missing values over group for each arrayed variable;
do _n_ = 1 by 1 until (last.country);
set have;
by country;
array x gini kof tradegdp fdi gdp age_dep educ;
array flag(100) _temporary_; * flag if variable has a non-missing value in group;
do _index = 1 to dim(x);
if not(flag(_index)) then flag(_index) = 1 - missing(x(_index));
end;
end;
* check if at least one variable has no values;
_remove_group_flag = sum(of flag(*)) ne dim(x);
do _n_ = 1 to _n_;
set have;
if not _remove_group_flag then output;
end;
call missing (of flag(*));
run;
Log:
NOTE: There were 240 observations read from the data set WORK.HAVE. First DOW loop
NOTE: There were 240 observations read from the data set WORK.HAVE. Second DOW loop
NOTE: The data set WORK.WANT has 120 observations and 11 variables. Conditional output
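If you prefer PROC SQL, a sketch of the same filter (keep a country only if every variable has at least one non-missing value) can use COUNT(variable), which counts only non-missing values within each GROUP BY group. It rebuilds the simulated data above with an explicit array and assumes RAND('INTEGER', ...) is available (SAS 9.4M5 or later):

```sas
data have;
  call streaminit(2020);
  array x[*] gini kof tradegdp fdi gdp age_dep educ;
  do country = 1 to 6;
    do year = 1960 to 1999;
      do i = 1 to dim(x);
        x[i] = rand('integer', 20, 100);
      end;
      if country = 1 then call missing(gini);
      if country = 2 then call missing(educ);
      if country = 4 then call missing(fdi);
      output;
    end;
  end;
  drop i;
run;

proc sql;
  create table want as
  select *
  from have
  group by country
  /* COUNT(var) = 0 means the variable is missing for the whole country */
  having count(gini) > 0 and count(kof) > 0 and count(tradegdp) > 0
     and count(fdi) > 0 and count(gdp) > 0 and count(age_dep) > 0
     and count(educ) > 0
  order by country, year;
quit;
```

This should keep the same 120 rows (countries 3, 5 and 6). SAS notes that the query remerges summary statistics back with the original data, which is expected here. Unlike the DOW loop, every variable has to be listed by name.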
In a SAS DATA step I have a character variable called "varName". This variable stores the name of another variable. In the example below, it stores the name of the numeric variable "changeMe":
data TMP;
length
varName $32
changeMe 8
;
varName = 'changeMe';
/*??? How to change the content of variable that varName holds ???*/
run;
Now the question is: how do I change the content of the variable that varName holds?
The use case is that varName acts as a dynamic pointer to different variables that I want to manipulate in a big SAS data set.
The DATA step does not directly support indirect assignment by variable name.
In some cases, the indirect assignment requirement might indicate you want to perform a Proc TRANSPOSE data transformation. If the variable names and values are provided in a transaction data set, and the data has BY group variables, your better solution might be to TRANSPOSE the transaction data and merge that transform to the master data using an UPDATE or MODIFY statement.
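A minimal sketch of that TRANSPOSE-then-UPDATE route, using SASHELP.CLASS as the master data and a small hypothetical transaction data set (TRANS, with made-up values) that pairs a variable name with a new value:

```sas
/* Hypothetical transaction data: which variable to change, and the new value */
data trans;
  length name $8 varname $32 varvalue 8;
  input name $ varname $ varvalue;
  datalines;
Alfred Age 15
Alfred Height 70
Jane Weight 90
;

proc sort data=trans;
  by name;
run;

/* One row per NAME, one column per variable named in VARNAME */
proc transpose data=trans out=trans_wide(drop=_name_);
  by name;
  id varname;
  var varvalue;
run;

proc sort data=sashelp.class out=class;
  by name;
run;

/* UPDATE applies only the non-missing transaction values to the master rows */
data want;
  update class trans_wide;
  by name;
run;
```

Because UPDATE ignores missing transaction values, Alfred keeps his original Weight and Jane keeps her original Age and Height.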
Regardless, you can array variables of a given type and iterate the array looking for the target requiring assignment.
Example:
data want;
set sashelp.class;
varname = 'name';
varvalue = 'Scooter';
array chars _character_;
do _n_ = 1 to dim(chars);
if upcase (vname(chars(_n_))) = upcase(varname) then do;
chars(_n_) = varvalue;
end;
end;
run;
CALL EXECUTE() is a workable alternative: it generates and runs a second DATA step that assigns to the variable named in varName.
data TMP;
length
varName $32
changeMe 8
;
varName = 'changeMe';
run;
data _null_;
set TMP end=eof;
if _n_ = 1 then call execute('data %trim(&syslast.); modify %trim(&syslast.);');
call execute(cats(varName)||' = rand("uniform",1,0);');
if eof then call execute('run;');
run;
Log:
NOTE: CALL EXECUTE generated line.
1 + data WORK.TMP; modify WORK.TMP;
2 + changeMe = rand("uniform",1,0);
3 + run;
NOTE: There were 1 observations read from the data set WORK.TMP.
NOTE: The data set WORK.TMP has been updated. There were 1 observations rewritten, 0 observations
added and 0 observations deleted.
I am supposed to create a summary data set containing the mean, median, and standard deviation broken down by gender and group (using the CLASS statement). Using this summary data set, create four other data sets (in one DATA step) as follows:
(1) grand mean
(2) stats broken down by gender
(3) stats broken down by group
(4) stats broken down by gender and group
I was given the hint to use the CHARTYPE option.
I provided my attempted solution, but I don't think I did it in the way asked.
DATA CLINICAL;
*Use LENGTH statement to control the order of
variables in the data set;
LENGTH PATIENT VISIT DATE_VISIT 8;
RETAIN DATE_VISIT WEIGHT;
DO PATIENT = 1 TO 25;
IF RANUNI(135) LT .5 THEN GENDER = 'Female';
ELSE GENDER = 'Male';
X = RANUNI(135);
IF X LT .33 THEN GROUP = 'A';
ELSE IF X LT .66 THEN GROUP = 'B';
ELSE GROUP = 'C';
DO VISIT = 1 TO INT(RANUNI(135)*5);
IF VISIT = 1 THEN DO;
DATE_VISIT = INT(RANUNI(135)*100) + 15800;
WEIGHT = INT(RANNOR(135)*10 + 150);
END;
ELSE DO;
DATE_VISIT = DATE_VISIT + VISIT*(10 + INT(RANUNI(135)*50));
WEIGHT = WEIGHT + INT(RANNOR(135)*10);
END;
OUTPUT;
IF RANUNI(135) LT .2 THEN LEAVE;
END;
END;
DROP X;
FORMAT DATE_VISIT DATE9.;
RUN;
PROC MEANS DATA=CLINICAL;
CLASS GENDER GROUP;
OUTPUT OUT=SUMMARY
MEAN=
MEDIAN=
STDDEV= / AUTONAME;
RUN;
No, what they're asking you to do is:
Use the OUTPUT statement in PROC MEANS to create a summary dataset. Choose the appropriate TYPES and CLASS values in PROC MEANS such that all four sets of data are represented on the output.
Using a single data step that has four dataset names on the data statement, selectively output those rows to the correct dataset. You would use the _TYPE_ variable to determine which dataset a row would be output to.
CHARTYPE just means your _TYPE_ variable becomes a character value such as 1001 instead of the number 9 (its binary representation, basically). 1001 indicates which class variables are used (the first and the fourth) to create that breakout. (With only two class variables, the possible values are 00, 01, 10 and 11.) This is sometimes easier for non-programmers who aren't used to thinking in binary: without CHARTYPE these values would be 0, 1, 2 and 3 in decimal, which makes it harder to tell which corresponds to which variable.
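A sketch of the whole approach, using a small hypothetical stand-in for CLINICAL and assuming WEIGHT is the analysis variable (the assignment does not say which one). With CHARTYPE, _TYPE_ has one digit per CLASS variable (GENDER first, then GROUP), so a SELECT on it routes each summary row to one of the four data sets:

```sas
/* Hypothetical stand-in for the CLINICAL data set built in the question */
data clinical;
  input gender $ group $ weight;
  datalines;
Female A 140
Female B 155
Male A 170
Male B 182
Male B 176
;

proc means data=clinical noprint chartype;
  class gender group;
  var weight;
  output out=summary mean= median= stddev= / autoname;
run;

data grand bygender bygroup bygendergroup;
  set summary;
  select (_type_);
    when ('00') output grand;          /* no class variable: grand statistics */
    when ('10') output bygender;       /* GENDER only */
    when ('01') output bygroup;        /* GROUP only */
    when ('11') output bygendergroup;  /* GENDER and GROUP */
    otherwise;
  end;
run;
```

With this sample, GRAND has 1 row (mean weight 164.6), BYGENDER and BYGROUP have 2 rows each, and BYGENDERGROUP has 4.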
I want to create a flag, here called timeflag, that is set to 1 for the first and last entry of each session, designated by logflag. What I have is the following, but it gives me missing values:
data OUT.TENMAY_TIMEFLAG;
set IN.TENMAY_LOGFLAG;
if first.logflag then timeflag = 1;
if last.logflag then timeflag = 1;
run;
What is it about the first. and last. variables that I am not understanding here, or is it that I have two IF statements?
To have SAS create the FIRST. and LAST. automatic variables, you need to use a BY statement. If you want the new variable to be coded 1/0, there is no need for IF statements; just assign the automatic variables to a new permanent variable. To make one variable that is 1 for both the first and the last observation, combine them with OR.
data want;
set have;
by logflag ;
timeflag = first.logflag or last.logflag ;
run;
Or keep your two IF statements and just add the BY statement:
data OUT.TENMAY_TIMEFLAG;
set IN.TENMAY_LOGFLAG;
by logflag;
if first.logflag then timeflag = 1;
if last.logflag then timeflag = 1;
run;
P.S. For this to work, IN.TENMAY_LOGFLAG must be sorted by logflag. Rows that are neither first nor last in their group will still have a missing timeflag.