BEGINNER - coding date of birth in SAS - sas

I'm new to SAS coding, in a beginning class, and I'm struggling with the basics. I'm trying to create a date_of_birth variable, but I keep getting notes that Month_of_Birth, Day_of_Birth, and Year_of_Birth are uninitialized. I've put the code below; any help would be appreciated!
code:
data dob;
set Assign6_sp2022;
date_of_birth = MDY (Month_of_Birth, Day_of_Birth, Year_of_Birth);
run;
log:
48 data dob;
49 set Assign6_sp2022;
50 date_of_birth = MDY (Month_of_Birth, Day_of_Birth, Year_of_Birth);
51 run;
NOTE: Variable Month_of_Birth is uninitialized.
NOTE: Variable Day_of_Birth is uninitialized.
NOTE: Variable Year_of_Birth is uninitialized.
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 50:17
NOTE: There were 1 observations read from the data set WORK.ASSIGN6_SP2022.
NOTE: The data set WORK.DOB has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

It seems your table is in the Assn6 library.
Make sure to reference that library in the SET statement: set Assn6.Assign6_sp2022;
It also seems your variables have spaces in their names.
If that is the case, you need to use a name literal: a quoted string followed by the letter N.
Use "Month of Birth"n, "Day of Birth"n and "Year of Birth"n instead:
data dob;
set Assn6.Assign6_sp2022;
date_of_birth = mdy("Month of Birth"n, "Day of Birth"n, "Year of Birth"n);
run;
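A minimal, self-contained sketch of name literals, in case it helps to test the idea without the class data set. The data set name demo and the values are made up for illustration, and it assumes your session allows VALIDVARNAME=ANY. It also attaches a format so the result prints as a date rather than a raw number.

```sas
/* Assumption: the session permits VALIDVARNAME=ANY (needed for names with spaces). */
options validvarname=any;

data demo;                          /* hypothetical data set, not the class table */
  "Month of Birth"n = 3;
  "Day of Birth"n   = 17;
  "Year of Birth"n  = 2000;
  /* MDY builds a SAS date value from month, day, year */
  date_of_birth = mdy("Month of Birth"n, "Day of Birth"n, "Year of Birth"n);
  format date_of_birth date9.;      /* display as a date instead of a day count */
  put date_of_birth=;               /* writes date_of_birth=17MAR2000 to the log */
run;

/* PROC CONTENTS lists the real variable names, which is a quick way to
   check whether your own table uses spaces or underscores. */
proc contents data=demo;
run;
```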

Related

How to create a new variable of age based upon an existing numeric date born variable in sas?

I want to create a numeric age variable from an existing numeric birth-date variable in SAS. This BORN variable is numeric with a length of 8 and the format MMDDYY10. I assume I should compute age = today's date - BORN. However, the raw BORN values look like -15226, -8803, ... and I don't understand why there is a minus sign in front of these numbers.
How do I subtract the patient's birth date from today's date, and what code converts the result to an actual age?
SAS stores dates and times as numbers. A date is the number of days between January 1, 1960 and the given date, so dates before 1960 are negative. To display a date in a human-readable form, you apply a format (for example MMDDYY10.).
Similarly, a time is the number of seconds since midnight of the current day, so SAS time values are between 0 and 86400.
Your code would look like this:
data have;
input born MMDDYY10.;
format born MMDDYY10.;
datalines;
03/17/2000
11/11/1988
08/11/1923
;
run;
data want;
set have;
age = floor((DATE()-born) / 365.25);
run;
SAS will correctly translate your input into numbers (provided you used the correct informat when reading it), and numbers are easy for a program to calculate with.
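To see the day-count representation directly, here is a small sketch that prints a few date values both raw and formatted. The variable names are made up for illustration, and the last line assumes SAS 9.3 or later, where YRDIF accepts the 'AGE' basis.

```sas
data _null_;
  epoch  = '01JAN1960'd;   /* day zero of the SAS calendar */
  before = '31DEC1959'd;   /* one day earlier, so negative */
  put epoch= before=;      /* writes epoch=0 before=-1 */

  born = -15226;           /* one of the raw values from the question */
  put born=;               /* the raw number of days */
  put born= mmddyy10.;     /* the same value displayed as a calendar date */

  /* Assumption: SAS 9.3+; 'AGE' computes age in years, FLOOR keeps whole years */
  age = floor(yrdif(born, today(), 'AGE'));
  put age=;
run;
```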

SAS - create data step variables using a dynamic macro-variable

I want to store an instance of a data step variable in a macro-variable using call symput, then use that macro-variable in the same data step to populate a new field, assigning it a new value every 36 records.
I tried the following code:
data a;
set a;
if MOB = 1 then do;
MOB1_accounts = accounts;
call symput('MOB1_acct', MOB1_accounts);
end;
else if MOB > 1 then MOB1_accounts = &MOB1_acct.;
run;
I have a series of repeating MOB's (1-36). I want to create a field called MOB1_Accts, set it equal to the # of accounts for that cohort where MOB = 1, and keep that value when MOB = 2, 3, 4 etc. I basically want to "drag down" the MOB 1 value every 36 records.
For some reason this macro-variable is returning "1" instead of the correct # accounts. I think it might be a char/numeric issue but unsure. I've tried every possible permutation of single quotes, double quotes, symget, etc... no luck.
Thanks for the help!
You are misusing the macro system.
The ampersand (&) introducer in source code tells SAS to resolve the following symbol and place its value into the code submission stream before the step is compiled. Thus, the resolved &MOB1_acct. cannot be changed by the running DATA step. In other words, a running step cannot change its own source code: the resolved macro variable will be the same for all implicit iterations of the step because its value became part of the source code of the step.
You can use SYMPUT() and SYMGET() functions to move strings out of and into a DATA Step. But that is still the wrong approach for your problem.
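A small sketch of that timing difference, assuming a macro variable left over from earlier in the session (the %LET line and the value 42 are made up for illustration). The &-reference is resolved when the step is compiled, while SYMGETN looks the value up while the step runs.

```sas
/* Hypothetical leftover value from an earlier submission */
%let mob1_acct = 1;

data _null_;
  call symputx('mob1_acct', 42);        /* changes the macro variable at run time */
  from_source  = &mob1_acct.;           /* already compiled as: from_source = 1; */
  from_runtime = symgetn('mob1_acct');  /* reads the current value while running */
  put from_source= from_runtime=;       /* writes from_source=1 from_runtime=42 */
run;

%put After the step: &mob1_acct.;       /* now resolves to 42 */
```

This is one plausible way to end up with a stale value such as 1 in the original step, although without the full session it is only a guess.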
The most straightforward technique combines two things:
a retained variable, and
a mod(_n_, 36) computation to detect the first row of every block of 36 rows. (_n_ is a proxy for row number in a simple step with a single SET.)
Example:
data a;
set a;
retain mob1_accounts;
* every 36 rows change the value, otherwise the value is retained;
if mod(_n_,36) = 1 then mob1_accounts = accounts;
run;
You didn't show any data, so the actual program statements you need might be slightly different.
Contrasting SYMPUT/SYMGET with RETAIN
As stated, SYMPUT/SYMGET is a possible way to retain values by storing them in the macro symbol table. There is a penalty, though: each SYM* use requires a function call, whatever internal work is needed to store or retrieve the symbol value, and possibly additional conversions between character and numeric.
Example:
1,000,000 rows read. DATA _null_ steps to avoid writing overhead as part of contrast.
data have;
do rownum = 1 to 1e6;
mob + 1;
accounts = sum(accounts, rand('integer', 1,50) - 10);
if mob > 36 then mob = 1;
output;
end;
run;
data _null_;
set have;
if mob = 1 then call symput ('mob1_accounts', cats(accounts));
mob1_accounts = symgetn('mob1_accounts');
run;
data _null_;
set have;
retain mob1_accounts;
if mob = 1 then mob1_accounts = accounts;
run;
On my system the log shows:
142 data _null_;
143 set have;
144
145 if mob = 1 then call symput ('mob1_accounts', cats(accounts));
146
147 mob1_accounts = symgetn('mob1_accounts');
148 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.34 seconds
cpu time 0.34 seconds
149
150 data _null_;
151 set have;
152 retain mob1_accounts;
153
154 if mob = 1 then mob1_accounts = accounts;
155 run;
NOTE: There were 1000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.03 seconds
Or, summarized:
way            real  cpu
-------------  ----  ----
SYMPUT/SYMGET  0.34  0.34
RETAIN         0.04  0.03

sas, combine, unknown code datasets observations

I need help reading the code below. I am not sure what specific parts in this code are doing. For example, what does ( firstobs = 2 keep = column3 rename = (column3 = column4) ) do?
Also, what does ( obs = 1 drop = _all_ ); do?
I have also not used column5 = ifn( first.column1, (.), lag(column3) ); before. What does this do?
I am reading someone else's code. I wish I could provide more detail. If I find a solution, I will post it. Thanks for your help.
data out.dataset1;
set out.dataset2;
by column1;
WHERE column2 = 'N';
set out.dataset1 ( firstobs = 2 keep = column3 rename = (column3 = column4) )
out.dataset1 ( obs = 1 drop = _all_ );
FORMAT column5 DATETIME20.;
FORMAT column4 DATETIME20.;
column5 = ifn( first.column1, (.), lag(column3) );
column4 = ifn( last.column1, (.), column4 );
IF first.column1 then DIF=intck('dtday',column4,column3);
ELSE DIF= intck('dtday',column5,column3);
format column6 $6.;
IF first.column1
OR intck('dtday',column5,column3) GT 20 THEN column6= 'HARM';
ELSE column6= 'REPEAT';
run;
It seems you need to learn about the SAS DATA step language!
The items in the parentheses are data set options.
You can use those options whenever you reference a table, even in PROC SQL.
The options you have:
firstobs= : starts reading at the given record; in your case, 2 means SAS starts at the 2nd record of the table.
keep= : reads only the listed fields rather than all the fields in the table.
rename= : renames a field, so it works like an alias in SQL.
obs= : sets the last record to read from the table, so it limits the rows much like TOP or LIMIT in SQL.
drop= : removes the listed fields; in your case _all_ is used, which means all the fields are dropped.
As for the functions:
LAG returns the value from the previous record for the field you put in parentheses, here DPD_CLOSE_OF_BUSINESS_DT (apparently column3 in the posted code).
IFN works like a CASE or IF expression. You put a condition in the 1st argument; the 2nd argument is returned when that condition is true, and the 3rd argument is returned when it is false.
So, to answer that question: if it is the first record for a given value of SOR_LEASE_NBR, the field Prev_COB_DT will be missing (.); otherwise it will be the previous value of DPD_CLOSE_OF_BUSINESS_DT.
The best advice I can give you is to start googling SAS plus the name of any function you are wondering about; then it's a matter of encapsulation!
Hope this helps!
Basically your data step is using the LAG() function to look back one observation and the extra SET statement to look ahead one observation.
The IFN() function calls are then being used to make sure that missing values are assigned when at the boundary of a group.
You then use these calculated PREV and NEXT dates to calculate the DIF variable.
Note that for this to work, the first data set in the second SET statement must be the same input data set as in the first SET statement. The last data set, the one with the obs=1 and drop=_all_ data set options, doesn't really need to be the same, since none of its actual data is read; it just has to have at least one observation.
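A minimal sketch of the look-back/look-ahead pattern described above, on a tiny made-up data set. The names have, id, dt, prev_dt and next_dt are hypothetical stand-ins for the real table and columns.

```sas
data have;                              /* hypothetical sample data, sorted by id */
  input id $ dt :date9.;
  format dt date9.;
  datalines;
A 01JAN2020
A 15JAN2020
B 01FEB2020
;
run;

data want;
  set have;
  by id;
  /* Second SET reads the same table starting one row ahead (look-ahead).
     The extra obs=1 drop=_all_ piece supplies one more (empty) row so the
     step does not stop before the last row of the first SET is processed. */
  set have (firstobs=2 keep=dt rename=(dt=next_dt))
      have (obs=1 drop=_all_);
  prev_dt = ifn(first.id, ., lag(dt));  /* no previous date at the start of a group */
  next_dt = ifn(last.id,  ., next_dt);  /* no next date at the end of a group */
  format prev_dt next_dt date9.;
run;

proc print data=want;
run;
```

With this sample, the first A row should get only a next_dt, the second A row only a prev_dt, and the single B row neither. LAG is called on every row (IFN evaluates all of its arguments), which is what keeps its queue in step with the data.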
What does ( firstobs = 2 keep = DPD_CLOSE_OF_BUSINESS_DT rename = (DPD_CLOSE_OF_BUSINESS_DT = Next_COB_DT) ) do?
Here firstobs=2 tells SAS to start reading the data set at the 2nd observation, keep= restricts the read to that one variable, and rename= changes the variable's name.
What does (obs = 1 drop = _all_); do?
obs=1 reads only the 1st observation in the data set. If you specify obs=2, observations up to the 2nd are read.
drop=_all_ drops all of your variables.
Firstobs:
Reads part of the data. If you specify firstobs=10, SAS starts reading the data from the 10th observation.
Obs:
If you specify obs=15, the data is read up to the 15th observation.
If you run the step below, you get 3 observations (the 2nd through the 4th) in the output.
Example:
DATA INSURANCE;
INFILE CARDS FIRSTOBS=2 OBS=4;
INPUT NAME$ GENDER$ AGE INSURANCE $;
CARDS;
SOWMYA FEMALE 20 MEDICAL
SUNDAR MALE 25 MEDICAL
DIANA FEMALE 67 MEDICARE
NINA FEMALE 56 MEDICAL
;
RUN;

Automating creation of an indicator variable in SAS

I am working with a SAS dataset that includes up to 30 medications prescribed to an individual patient. The medications are coded med1, med2 ... med30. Each medication is represented by a 5-digit character variable. Using the identifier, I can then code the name of the drug, and whether that particular medication is a topical antibiotic or a systemic antibiotic.
For each patient, I want to use all 30 medication codes to create one variable indicating whether the patient got a topical antibiotic only, a systemic antibiotic only, or both a topical and an oral antibiotic. So if any of the 30 medications is a systemic antibiotic, I want the patient coded as oral_antibiotic=1.
I currently have this code:
data want;
set have;
array meds[30] med1-med30;
if meds[i] in ('06925' '06920') then do;
penicillin=1;
oral_antibiotic=1;
end;
else if meds[i] in ('03197') then do;
neosporin=1;
topical_antibiotic=1;
end;
.... (many more do loops with many more medications)
run;
The problem is that this code creates one indicator variable instead of 30, overwriting previous information.
I think that I really need 30 indicator variables, indicating whether each of the 30 drugs is an oral or topical antibiotic, before I write code that says if any of the drugs are oral antibiotics, the patient received an oral antibiotic.
I am new to macros and would really appreciate help.
data current;
input med1 med2 med3;
cards;
'06925' '06920' '03197' ;
run;
And I want this:
data want;
input med1 topical_antibiotic1 oral_antibiotic1 med2 topical_antibiotic2 oral_antibiotic2 med3 topical_antibiotic3 oral_antibiotic3;
cards;
'06925' 0 1 '06920' 0 1 '03197' 1 0
;
run;
"I think that I really need 30 indicator variables, indicating whether each of the 30 drugs is an oral or topical antibiotic, before I write code that says if any of the drugs are oral antibiotics, the patient received an oral antibiotic."
That's not true. Your current approach is fine as long as you're not resetting them. You don't show us the full code, so it's hard to say, but I'm going to assume that's what is happening here.
Your loop should look like:
array med(30) med1-med30;
*set to 0 at top of the loop;
topical_antibiotic=0; oral_antibiotic=0;
do i=1 to dim(med);
if med(i) in (.....) /*list of topical codes*/ then topical_antibiotic=1;
else if med(i) in (.....) /*list of oral codes*/ then oral_antibiotic=1;
end;
This assumes that an antibiotic cannot be in both Topical/Oral groups. If it can, you need to remove the ELSE from the second IF statement.
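A runnable sketch of that loop, cut down to three medication columns and using only the codes that appear in the question ('06925' and '06920' as oral, '03197' as topical). The sample rows and the data set names have and want are made up for illustration; a real program would list all 30 variables and the full code lists.

```sas
data have;                               /* hypothetical sample patients */
  infile datalines missover;             /* short lines leave later meds missing */
  input (med1-med3) (:$5.);
  datalines;
06925 06920 03197
03197
06925
;
run;

data want;
  set have;
  array med[3] med1-med3;
  /* reset once per patient, before the loop, so nothing is overwritten inside it */
  topical_antibiotic = 0;
  oral_antibiotic    = 0;
  do i = 1 to dim(med);
    if med[i] in ('03197') then topical_antibiotic = 1;
    else if med[i] in ('06925' '06920') then oral_antibiotic = 1;
  end;
  drop i;
run;

proc print data=want;
run;
```

The first patient should end up with both flags set, the second with only the topical flag, and the third with only the oral flag.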
I agree that you probably only need one indicator variable for each drug group, (medication of interest). Seems like you just want to know for each subject, "Do they have it?" This example flips the arguments of the IN operator. If you had given more example data I could have done better with this example.
data current;
infile cards missover;
array med[3] $5;
input med[*];
oral_antibiotic = '069' in: med; /*Assume oral all start with '069'*/
topical_antibiotic = '03197' in med;
cards;
06925 06920 03197
06925
;;;;
run;

Yrdif returns unexpected values for certain date spans?

Attempting to calculate age, I did a bit of googling and discovered that yrdif was updated in 9.3 to include a handy-dandy 'AGE' option.
However, in using it, I noticed that when calculating date spans ranging from Jan 1st to Dec 31st, we get some unexpected results. Examples:
age = yrdif('01Jan1932'd,'31Dec2012'd,'Age');
put age;
The above yields 81 years, when it should be one day less than 81 years (80.9972222). But more surprising is the result when we increment the dates by one day:
age = yrdif('02Jan1932'd,'01Jan2013'd,'AGE');
put age;
Now we get the expected value (80.997222).
Bug? Something else going on here that I'm not aware of? Desired next step was to simply do floor(yrdif(dob,dod,'AGE')) to get age, but it seems like it will not be quite so easy.
In 9.3 TS1M2 and 9.4 TS1M2 I get the expected result:
1 data _null_;
2 age = yrdif('01Jan1932'd,'31Dec2012'd,'Age');
3 put age;
4 run;
80.997260274
Perhaps it was a bug that has since been fixed, although searching the TS notes doesn't come up with anything.
In 9.3+ you can also use INTCK to correctly calculate age if you want the years as an integer.
age2= intck('YEAR','01Jan1932'd,'31Dec2012'd,'c');
The 'c' at the end asks SAS to consider the interval continuous, so it correctly handles intrayear differences.
data _null_;
age = yrdif('01Jan1932'd,'02Jan2013'd,'Age');
age2= intck('YEAR','01Jan1932'd,'31Dec2012'd,'c');
age3= intck('YEAR','01Mar1932'd,'01Jan2013'd,'c');
age4= intck('YEAR','01Mar1932'd,'01Apr2013'd,'c');
put age= age2= age3= age4=;
run;
So here, age3 is correctly 80 while age4 is correctly 81; in the past this would've been incorrect (both would be 81).