How to use the RETAIN statement in SAS to populate missing data?

I have a dataset which has multiple rows of data for a given person, but only the first row of the person's information contains their name. The rest of the rows of that person's data have the name field missing. I think I can use the retain statement to populate the name, but nothing I try works.
Here is an example of the dataset structure I am working with:
data test;
input id $ value ;
datalines;
Bob 100
. 200
. 300
Jim 475
. 250
. 300
;
run;

I think the problem is that INPUT assigns id again on every row, so a RETAIN on id itself has no effect: the . in the datalines is read as a missing (blank) character value and replaces whatever was retained.
Try this:
data test;
input id $ value;
/* store the last non-missing ID in a separate retained variable */
retain current_id;
if not missing(id) then current_id=id;
else id=current_id;
datalines;
Bob 100
. 200
. 300
Jim 475
. 250
. 300
;
run;
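If the data already sits in a SAS dataset rather than in datalines, the same idea should carry over with SET in place of INPUT. This is only a minimal sketch: it assumes the dataset is called test as in the question, and filled and last_id are names made up for illustration.

```sas
/* Rebuild the example data from the question so the sketch runs on its own */
data test;
input id $ value;
datalines;
Bob 100
. 200
. 300
Jim 475
. 250
. 300
;
run;

/* Carry the last non-missing id forward (filled and last_id are hypothetical names) */
data filled;
set test;
length last_id $8;   /* same length as id, which defaults to $8 with list input */
retain last_id;      /* keep the value across iterations of the data step */
if not missing(id) then last_id = id;   /* remember the most recent name */
else id = last_id;                      /* fill the gap with it */
drop last_id;
run;
```

The helper variable is needed because SET, like INPUT, reloads id on every row, so retaining id directly would not survive the next read.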

Related

How to fill in missing values to show age at a specified date

I am looking to find a person's age, in years, at several different dates. I have the person's age on their first record, at a specific date, and I want to find their age at each of the following dates.
My dataset looks something like this:
data;
input ID date age ;
dataline;
1 10/27/2004 21
1 02/04/2006 .
1 12/08/2009 .
2 07/25/2007 24
2 08/31/2008 .
2 08/27/2012 .
run;
I tried this, but it rearranged my data and only added to the first age variable of each ID. I thought of maybe using RETAIN as well, but did not have any luck with that either.
data want;
set have;
by age;
if first.id then age1=first.date-date+first.age;
run;
Instead of trying to calculate the age dependent on previous values I would recommend calculating the birth_year and using that going forward.
Use RETAIN to hold the birth_year across the rows and then use it to calculate the age when age is missing.
data have;
input ID date : mmddyy10. age;
format date date9.;
datalines;
1 10/27/2004 21
1 02/04/2006 .
1 12/08/2009 .
2 07/25/2007 24
2 08/31/2008 .
2 08/27/2012 .
;
run;
data want;
set have;
by id;
retain birth_year;
if first.id then
birth_year=year(date) - age;
if missing(age) then
age=year(date) - birth_year;
run;
FYI your data step doesn't run as posted. Please test it before posting.

Produce custom table in SAS with a subsetted data set

I want to use SAS, e.g. proc report, to produce a custom table within my workflow.
Why: Previously, I used proc export (dbms=excel), did some very basic stats by hand, and copied and pasted them into an Excel sheet to complete the report. Recently, I've started to use ODS excel to print all the relevant data to Excel sheets, but since ODS excel always overwrites the whole Excel workbook (and hence also the handcrafted stats), I now want to streamline the process.
The task itself is actually very straightforward. We have some information about IDs, age, and registration, so something like this:
data test;
input ID $ AGE CENTER $;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
The goal is to produce a table report which should look like this structure-wise:
                  ID    NO-ID   Total
Count             3     2       5
Age (mean)        27    45.5    34.4
Count by Center:
  A               2     1       3
  B               0     1       1
  C               1     0       1
It seems proc report only takes variables as columns, not a subsetted data set (ID NE .; ID =''). Of course I could just produce three reports from three subsetted data sets and print them all separately, but I hope there is a way to put this in one table.
Is proc report the right tool for this and, if so, how should I proceed? Or is it better to use proc tabulate or proc template or...?
I found a way to get a close match to what I wanted. First of all, I had to introduce a new variable vID (valid ID: 0 not valid, 1 valid) in the data set, like so:
data test;
input ID $ AGE CENTER $;
if ID = '' then vID = 0;
else vID = 1;
datalines;
111 23 A
. 27 B
311 40 C
131 18 A
. 64 A
;
run;
After this I was able to use proc tabulate, as suggested by @Reeza in the comments, to build a table which pretty much resembles what I initially aimed for:
proc tabulate data = test;
class vID Center;
var age;
keylabel N = 'Count';
table N age*mean Center*N, vID ALL;
run;
Still, I wonder if there is a way without introducing the new variable at all and just use the SAS counters for missing and non-missing observations.
UPDATE:
@Reeza suggested using proc format to assign a label to missing/non-missing ID values. In combination with the missing option of proc tabulate (which includes missing class values in the table), this delivers the output without introducing a new variable:
proc format;
value $id_fmt
' ' = 'No-ID'
other = 'ID'
;
run;
proc tabulate data = test missing;
format ID $id_fmt.;
class ID Center;
var age;
keylabel N = 'Count';
table N age*(mean median) Center*N, (ID=' ') ALL;
run;

enter column in a dataset to an array

I have 33 different datasets with one column and all share the same column name/variable name;
net_worth
I want to load the values into arrays and use them in a data step. But the array that I use should depend on the by groups in the data step (country by city). There are a total of 33 datasets and 33 groups (country by city). Each dataset corresponds to exactly one by group.
here is an example what the by groups look like in the dataset: customers
UK 105 (other fields)
UK 102 (other fields)
US 291 (other fields)
US 292 (other fields)
Could I get some advice on how to load the columns into arrays and then use them in a data step? Or do you suggest doing it another way?
%let var1 = uk105;
%let var2 = uk102;
.....
%let var33 = jk12;
data want;
set customers;
by country city;
if _n_ = 1 then do;
*set datasets and create and populate arrays*;
* use array values in calculations with fields from dataset customers, depending on which by group. if the by group is uk and city is 105 then i need to use the created array corresponding to that by group;
It is a little hard to understand what you want.
It sounds like you have one dataset named CUSTOMERS that has all of the main variables, and a bunch of single-variable datasets that hold the values of NET_WORTH for a lot of different things (countries?).
Assuming that the observations in all of the datasets are in the same order then I think you are asking for how to generate a data step like this:
data want;
set customers;
set uk105 (rename=(net_worth=uk105));
set uk102 (rename=(net_worth=uk102));
....
run;
Code like that might be easiest to generate with another data step that writes the SET statements to a temporary file:
filename code temp;
data _null_;
input name $32. ;
file code ;
put ' set ' name '(rename=(net_worth=' name '));' ;
cards;
uk105
uk102
;;;;
data want;
set customers;
%include code / source2;
run;

Convert long to wide without missing values in SAS

I have a dataset which has three variables: application number, decline code and sequence. There may be multiple decline codes for a single application (each with a different sequence number). So the data looks like the following:
Application No Decline Code Sequence
1234 FG 1
1234 FK 3
1234 AF 2
1256 AF 2
1256 FK 1
.
.
.
.
And so on
So, I have to put this in wide format such that the first column contains unique application numbers and corresponding to each of them is their decline code(I don't need sequence number, just that decline codes should appear in order of their sequence number from left to right, separated by a comma). Something like below
Application Number Decline Code
1234 FG, AF, FK
1256 FK, AF
..........
.........
And so on
I tried running proc transpose by application number in SAS. The problem is that it creates multiple columns with all the decline codes listed, and if a certain decline code doesn't apply to an application, it shows . in that column. So there are many missing values and it isn't quite the format I am expecting. Is there any way to do this in SAS or SQL?
PROC TRANSPOSE can certainly help here; then you can CATX the variables together if you really just want one variable:
data have;
input ApplicationNo DeclineCode $ Sequence ;
datalines;
1234 FG 1
1234 FK 3
1234 AF 2
1256 AF 2
1256 FK 1
;;;;
run;
proc sort data=have;
by ApplicationNo Sequence;
run;
proc transpose data=have out=want_pre;
by ApplicationNo;
var DeclineCode;
run;
data want;
set want_pre;
length decline_codes $1024;
decline_codes = catx(', ',of col:);
keep ApplicationNo decline_codes;
run;
You could also do this trivially in one data step, using first. and last. checks (the data must be sorted by ApplicationNo and Sequence, as done by the proc sort above).
data want_ds;
set have;
by ApplicationNo Sequence;
retain decline_codes;
length decline_codes $1024; *or whatever you need;
if first.ApplicationNo then call missing(decline_codes);
decline_codes = catx(', ',decline_codes, DeclineCode);
if last.ApplicationNo then output;
run;

SAS reading data with 2 delimiter

I have the data as follows
id^number^obs
123^2^a~b
124^3^c~d~e
125^4^f~g~h~i
the first number is a unique id, the second number is the # of observations for the id, the rest of the line is the observations.
for the first line, the unique id is 123, it has 2 observations: they are a and b
I want to read the data into SAS as
id number obs
123 2 a
123 2 b
124 3 c
124 3 d
124 3 e
125 4 f
125 4 g
125 4 h
125 4 i
My question is how I can do that in SAS?
Thanks a lot!
I'm assuming this is a question regarding reading in data from a flat-file and storing it in a SAS dataset. The following code will do that for you:
/* Insert filename */
filename myfile "";
/* This writes out a dataset called mydataset from the flat-file */
data mydataset;
infile myfile dlm='^' dsd firstobs=2;
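/* :$200. lets _obs hold more than the default 8 characters (200 is an assumed upper bound) */
input id number _obs :$200.;
_i=1;
do until (scan(_obs,_i,'~') = '');
obs=scan(_obs,_i,'~');
_i+1;
output;
end;
drop _:; /* Remove this line to see all variables in final dataset */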
run;
Explanation
The data step reads in records from the flat file, but before writing to the dataset it uses the scan function to split the _obs variable on '~', outputting a separate observation for each value.
As mentioned in the comment, you can remove the drop statement to further understand how the code is working.
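The same splitting can probably also be written with a counted loop instead of do until. This is just a sketch under a few assumptions: it reads the sample rows from datalines (without the header line) instead of a file, it caps _obs at 200 characters, and it uses countw to count the '~'-separated values up front.

```sas
data mydataset;
/* Assumption: sample rows inlined via datalines instead of an external file */
infile datalines dlm='^' truncover;
/* :$200. is an assumed maximum length for the '~'-separated list */
input id number _obs :$200.;
/* countw returns how many '~'-delimited values _obs contains */
do _i = 1 to countw(_obs, '~');
obs = scan(_obs, _i, '~');   /* pick the _i-th value */
output;                      /* write one observation per value */
end;
drop _:;                     /* drop the helper variables _obs and _i */
datalines;
123^2^a~b
124^3^c~d~e
125^4^f~g~h~i
;
run;
```

One difference from the do until version: a record with an empty third field produces no observation here, whereas do until always outputs at least one row.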