How do I use PROC EXPAND to fill in time series observations within a panel (longitudinal) data set?

I'm using this SAS code:
data test1;
input cust_id $
month
category $
status $;
datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
run;
proc sql;
create view test2 as
select cust_id, input(put(month, 6.), yymmn6.) as month format date9.,
category, status from test1 order by cust_id, month asc;
quit;
proc expand data=test2 out=test3 to=month method=none;
by cust_id;
id month;
run;
proc print data=test3;
title "after expand";
run;
and I want to create a dataset that looks like this:
Obs cust_id month category status
1 A 01MAR2000 ABC C
2 A 01APR2000 DEF C
3 A 01MAY2000 . .
4 A 01JUN2000 XYZ 3
5 B 01OCT1999 ASD X
6 B 01NOV1999 . .
7 B 01DEC1999 ASD C
but the output from proc expand just says "Nothing to do. The data set WORK.TEST3 has 0 observations and 0 variables." I don't want or need to change the frequency of the data, just fill in the missing months with missing values.
What am I doing wrong here? I think proc expand is the correct procedure to use, based on this example and the documentation, but for whatever reason it doesn't create the data.

You need to add a VAR statement. Unfortunately, the VAR variables must be numeric, so just expand month within each cust_id, then join the original character values back on.
proc expand data=test2 out=test3 to=month ;
by cust_id;
id month;
var _numeric_;
run;
proc sql noprint;
create table test4 as
select a.*,
b.category,
b.status
from test3 as a
left join
test2 as b
on a.cust_id = b.cust_id
and a.month = b.month;
quit;
proc print data=test4;
title "after expand";
run;
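If SAS/ETS (the product that provides PROC EXPAND) is not available, the gaps can also be filled with a DATA step. The sketch below is a swapped-in alternative, not the answer's method. It assumes the rows are sorted by cust_id and month and that month holds the first day of each month. It uses a look-ahead self-merge so each row can see the next row's month, then INTNX to emit one blank row per missing month. The data set names panel and filled are hypothetical.

```sas
/* Hypothetical sample data: same values as the question's TEST1 */
data panel;
   input cust_id $ month :yymmn6. category $ status $;
   format month date9.;
   datalines;
A 200003 ABC C
A 200004 DEF C
A 200006 XYZ 3
B 199910 ASD X
B 199912 ASD C
;
run;

proc sort data=panel;
   by cust_id month;
run;

data filled;
   /* one-to-one merge with itself shifted by one row (no BY statement), */
   /* so next_id/next_month hold the following row's values (missing on  */
   /* the last row)                                                        */
   merge panel
         panel(firstobs=2 keep=cust_id month
               rename=(cust_id=next_id month=next_month));
   output;                                  /* the observed month */
   if cust_id = next_id then do;
      /* emit an empty row for every month strictly between this row and the next */
      call missing(category, status);
      month = intnx('month', month, 1);
      do while (month < next_month);
         output;
         month = intnx('month', month, 1);
      end;
   end;
   drop next_id next_month;
run;

proc print data=filled;
run;
```

Because the merge rereads cust_id, month, category and status on every iteration, the values changed inside the loop do not leak into the next observed row.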

Related

SAS transpose columns to row and values to columns

I have a summary table that I want to transpose, but I can't get my head around it. The columns should become the rows, and the values should become the columns.
Some explanation about the table. Each column represents a year. People can be in 3 groups: A, B or C. In 2016, everyone (100) is in group A. In 2017, 35 are in group A (5 + 20 + 10), 15 in B and 50 in C.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
I want to be able to make a nice graph of how the groups evolve over the periods. So I want to end up with a table where the rows are the periods and the columns are the values (the 3 different groups). An example of the table I want:
(Image of the desired table: one row per period, one column per group.)
I have tried different approaches, but I can't get what I want.
There may be a more direct way, but this is probably how I would do it.
DATA have;
INPUT year2016 $ year2017 $ year2018 $ count;
id + 1;
DATALINES;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
RUN;
proc print;
proc transpose data=have out=want1 name=period;
by id count notsorted;
var year:;
run;
proc print;
run;
proc summary data=want1 nway completetypes;
class period col1;
freq count;
output out=want2(drop=_type_);
run;
proc print;
run;
proc transpose data=want2 out=want(drop=_name_) prefix=Group_;
by period;
var _freq_;
id col1;
run;
proc print;
run;
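As a cross-check, here is a sketch of the same reshaping without the two PROC TRANSPOSE steps. This is a swapped-in technique, not the answer's: an ARRAY turns each input row into one row per period, then PROC SQL sums count per group with CASE expressions. The data set names long and want_sql and the column names grp and Group_A to Group_C are assumptions. Unlike the answer, the CASE list must name every group explicitly.

```sas
/* Sample data from the question */
data have;
   input year2016 $ year2017 $ year2018 $ count;
   datalines;
A A A 5
A A B 20
A A C 10
A B C 15
A C A 50
;
run;

/* wide to long: one row per original row and period */
data long(keep=period grp count);
   set have;
   array yr{*} $ year2016-year2018;
   length period $32 grp $8;
   do i = 1 to dim(yr);
      period = vname(yr{i});   /* variable name, e.g. year2017 */
      grp    = yr{i};          /* group in that period          */
      output;
   end;
run;

/* long to wide: one row per period, one column per group */
proc sql;
   create table want_sql as
   select t.period,
          sum(case when t.grp = 'A' then t.count else 0 end) as Group_A,
          sum(case when t.grp = 'B' then t.count else 0 end) as Group_B,
          sum(case when t.grp = 'C' then t.count else 0 end) as Group_C
   from long as t
   group by t.period
   order by t.period;
quit;
```

For year2017 this gives 35 in A, 15 in B and 50 in C, matching the counts described in the question.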

SAS: Most frequent value (Like a MODE) ties solved by recency?

I have data like this:
data mydata;
input ID $ Val $ Date :yymmdd10.;
format Date yymmdd10.;
datalines;
1 A 2010-12-01
1 B 2010-12-03
1 A 2010-12-04
1 B 2010-12-08
2 X 2009-10-01
2 X 2009-10-02
2 Z 2009-10-03
;
run;
I would like the mode returned where it exists. ID 1, however, doesn't have a true mode. Where there is a tie and no single mode exists, I would like the most recent Val to break the tie (as for ID 1).
Desired OUTPUT:
ID Mode
1 B
2 X
I tried PROC UNIVARIATE (which only handles numeric modes, another problem), but it returns a missing mode for ID 1. That is correct by SAS's definition but not the output I want. I would like to do this in a DATA step.
CODE:
proc univariate data=mydata noprint;
class id;
var val;
output out=modetable mode=mode;
run;
OUTPUT:
ID Mode
1
2 X
Use the IDGROUP option of the OUTPUT statement in PROC MEANS.
An example of this option can be found in "Identifying the Top Three Extreme Values with the Output Statistics" in the PROC MEANS documentation.
Let us extend the example data a little:
data myInput;
input ID Val $ Date :yymmdd10.;
format Date yymmdd10.;
datalines;
2 X 2009-10-01
2 X 2009-10-02
2 Z 2009-10-03
3 C 2010-10-01
3 B 2010-10-03
3 A 2010-10-04
3 A 2010-12-01
3 B 2010-12-03
3 C 2010-12-04
;
run;
Now let us count the frequency and the last occurrence of each Val for each ID:
proc sql;
create view myView as
select ID, Val, max(Date) as Date format=yymmdd10., count(*) as freq
from myInput
group by ID, Val;
quit;
And finally, retain one Val for each ID, preferring the more frequent one and, among equally frequent ones, the most recent one:
proc means data=myView nway noprint;
class ID;
output out=myModes(keep= ID Mode)
idgroup( max(freq Date) out[1] (Val)=Mode);
run;
proc print data=myModes;
run;
The result is:
ID Mode
2 X
3 C
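Since the question asked for a DATA step, here is a minimal sketch that reaches the same kind of result with PROC SORT and DATA steps only. It is an alternative to IDGROUP, not the answer's method. It assumes Date is read as a SAS date, and the data set names mode_input, sorted, counts and modes are hypothetical. The input reuses the question's values, so the expected modes are B for ID 1 and X for ID 2.

```sas
/* Hypothetical input with the question's values; Date read as a SAS date */
data mode_input;
   input ID $ Val $ Date :yymmdd10.;
   format Date yymmdd10.;
   datalines;
1 A 2010-12-01
1 B 2010-12-03
1 A 2010-12-04
1 B 2010-12-08
2 X 2009-10-01
2 X 2009-10-02
2 Z 2009-10-03
;
run;

proc sort data=mode_input out=sorted;
   by ID Val Date;
run;

/* one row per ID and Val: how often it occurs and its latest Date */
data counts(keep=ID Val freq last_date);
   set sorted;
   by ID Val;
   if first.Val then freq = 0;
   freq + 1;                      /* sum statement: freq is retained */
   if last.Val then do;
      last_date = Date;           /* sorted by Date, so the last row is the latest */
      output;
   end;
   format last_date yymmdd10.;
run;

/* most frequent first; among equal counts, most recent first */
proc sort data=counts;
   by ID descending freq descending last_date;
run;

/* keep the top-ranked Val for each ID */
data modes(keep=ID Mode);
   set counts(rename=(Val=Mode));
   by ID;
   if first.ID;
run;

proc print data=modes;
run;
```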
Here is a PROC SQL solution I came up with, although I like the selected solution better:
%macro modes(data, id, mode, tie, break, outset, lib);
proc sql;
create table &lib..&outset as
select &id, &mode
from (select &id, &mode, latest
from(select &id, &mode, latest
from(select &id, &mode, count(*) as n, &break.(&tie) as latest
from &data
where &mode is not null
group by &id, &mode)
group by &id
having n = max(n))
group by &id
having latest= &break.(latest) )
;
quit;
%mend modes;
%modes(data=mydata, id=ID, mode=age, tie=somedateorvalue, break=max, outset=outtable, lib=mylib);
id: the grouping column (one mode is returned per value of id)
tie: the column used to break ties
break: min or max, depending on whether the earliest (lowest) or latest (highest) value of tie should win the tie
The rest should be self-explanatory.

Extracting info by matching two datasets in SAS

I have two data sets. Both have a common column, ID. I would like to check whether each ID in df1 also appears in df2 and extract all such rows from df1. I'm doing this in SAS.
This is easily done in one SQL query:
proc sql;
create table extract_from_df1 as
select
*
from
df1
where
id in (select id from df2)
;
quit;
There are lots of ways to do this. For example:
proc sql;
create table compare as select distinct
a.id as id1, b.id as id2
from table1 as a
left join table2 as b
on a.id = b.id;
quit;
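and then keep the matches (rows where id2 is not missing). Or you can delete the non-matching rows from table1 in place:
proc sql;
delete from table1 where id not in (select id from table2);
quit;
A DATA step MERGE also works (both data sets must be sorted by id):
data df1;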
input id name $;
cards;
1 abc
2 cde
3 fgh
4 ijk
;
run;
data df2;
input id address $;
cards;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;
data c;
merge df1(in=x) df2(in=y);
by id;
if x and y;
keep id name;
run;
proc print data=c;
run;
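Another option, sketched here under the assumption that df2's ids fit in memory, is a DATA step hash lookup. df2's ids are loaded once, and each df1 row is kept only if the CHECK method finds its id. Unlike MERGE, this does not require either data set to be sorted. The data set name extract_hash is hypothetical.

```sas
/* Same sample data as above */
data df1;
   input id name $;
   datalines;
1 abc
2 cde
3 fgh
4 ijk
;
run;

data df2;
   input id address $;
   datalines;
1 abc
2 cde
5 ggh
6 ihh
7 jjj
;
run;

data extract_hash;
   if _n_ = 1 then do;
      /* load the ids of df2 into an in-memory lookup table */
      declare hash ids(dataset: 'df2(keep=id)');
      ids.defineKey('id');
      ids.defineDone();
   end;
   set df1;
   /* subsetting IF: CHECK returns 0 when the current id is in the hash */
   if ids.check() = 0;
run;

proc print data=extract_hash;
run;
```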

SAS Proc SQL Count Issue

I have one column of data, named Daily_Mileage. It has 15 different types of daily mileage across 250 rows, and I want a separate count for each of the 15 types. I am using PROC SQL in SAS and it does not like the CROSS JOIN. I am not really sure what I should do, but this is what I started with:
PROC SQL;
select A, B
From (select count(Daily_Mileage) as A from Work.full where Daily_Mileage = 'Farm Utility Vehicle (Class 7)') a
cross join (select count(Daily_Mileage) as B from Work.full where Daily_Mileage = 'Farm Truck Light (Class 35)') b);
QUIT;
Use CASE expressions inside SUM to define your counts, as below:
proc sql;
create table submit as
select sum(case when Daily_Mileage = 'Farm Utility Vehicle (Class 7)'
then 1 else 0 end) as A,
sum(case when Daily_Mileage = 'Farm Truck Light (Class 35)'
then 1 else 0 end) as B
from Work.full
;
quit ;
Can't you just use PROC FREQ?
data example ;
input #1 Daily_Mileages $5. ;
datalines ;
TYPE1
TYPE1
TYPE2
TYPE3
TYPE3
TYPE3
TYPE3
;
run ;
proc freq data = example ;
table Daily_Mileages ;
run ;
/* Create an output dataset */
proc freq data = example ;
table Daily_Mileages /out=f_example ;
run ;
You can also GROUP BY Daily_Mileage and count the rows in each group, which gives one row per mileage type. Let me know if I'm misunderstanding your question.
PROC SQL;
CREATE TABLE tab1 AS
SELECT Daily_Mileage, COUNT(*) AS Count
FROM Work.full
GROUP BY Daily_Mileage;
QUIT;

Update one dataset with another without using PROC SQL

I have the two data sets below:
Dataset A
id age mark
1 . .
2 . .
1 . .
Dataset B
id age mark
2 20 200
1 10 100
I need the following data set as output:
Output Dataset
id age mark
1 10 100
2 20 200
1 10 100
How can I do this without PROC SQL, i.e., with a DATA step?
There are many ways to do this. The easiest is to sort the two data sets and then use MERGE. For example:
proc sort data=A;
by id;
run;
proc sort data=B;
by id;
run;
data WANT;
merge A(drop=age mark) B;
by ID;
run;
The trick is to drop the variables you are adding from the first data set A; the new variables will come from the second data set B.
Of course, this solution does not preserve the original order of the observations in your data set AND only works because your second data set contains unique values of id.
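If the original order of A must be kept, a hash lookup avoids sorting altogether. This is a minimal sketch, assuming B has one row per id and fits in memory; the data set name want_hash is hypothetical. When an id is not found in B, FIND leaves the values just read from A (here missing) untouched.

```sas
/* The question's two data sets */
data A;
   input id age mark;
   datalines;
1 . .
2 . .
1 . .
;
run;

data B;
   input id age mark;
   datalines;
2 20 200
1 10 100
;
run;

data want_hash;
   if _n_ = 1 then do;
      /* load lookup table B into memory, keyed by id */
      declare hash lkp(dataset: 'B');
      lkp.defineKey('id');
      lkp.defineData('age', 'mark');
      lkp.defineDone();
   end;
   set A;                 /* rows are read, and written, in A's original order */
   rc = lkp.find();       /* on a match, copies B's age and mark into the PDV */
   drop rc;
run;

proc print data=want_hash;
run;
```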
I tried this and it worked for me, even when the column already contains values you want to preserve. For completeness' sake, I added an SQL variant too.
data a;
input id a;
datalines;
1 10
2 20
;
run;
data b;
input id a;
datalines;
1 .
1 5
1 .
2 .
3 4
;
run;
/* fill only the missing values of a in B from the lookup data set A */
data c (drop=b);
merge a (rename = (a=b) in=ina) b (in = inb);
by id;
if missing(a) then a = b;
run;
proc sql;
create table d as
select b.id, coalesce(b.a, a.a) as a
from b left join a
on b.id = a.id
;
quit;