Transpose Multiple Variables to Columns Suffixed with Dates - sas

I have data that looks like this:
data have;
format date monyy.;
input date:date9. group$ value1 value2;
datalines;
01JAN2020 A 100 10
01FEB2020 A 200 20
01JAN2020 B 300 30
01FEB2020 B 400 40
;
run;
But I need it to look like this, where value1 and value2 are suffixed with the date. This layout will be exported to Excel.
data want;
input group$ value1_JAN20 value1_FEB20 value2_JAN20 value2_FEB20;
datalines;
A 100 200 10 20
B 300 400 30 40
;
run;
I have tried PROC TRANSPOSE (code below), but the values of value1 and value2 end up in columns named after the dates, with the variable names value1 and value2 stored in the _NAME_ column.
Is there a way to do this with PROC TRANSPOSE, or does it require a data step or macro solution?
proc transpose data = have
out = try
delim = '_'n
name = date;
by group;
id date;
var value1 value2;
run;

Transpose twice. First to get a truly vertical structure. Then to generate the structure you want.
data have;
input date :date9. group $ value1 value2;
format date monyy.;
datalines;
01JAN2020 A 100 10
01FEB2020 A 200 20
01JAN2020 B 300 30
01FEB2020 B 400 40
;
proc transpose data=have out=tall;
by group date ;
var value1-value2;
run;
proc transpose data=tall out=want(drop=_name_) delim=_;
by group;
id _name_ date ;
var col1 ;
run;
Results:
Obs  group  value1_JAN20  value2_JAN20  value1_FEB20  value2_FEB20
1    A      100           10            200           20
2    B      300           30            400           40
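PROC TRANSPOSE creates the new columns in the order in which the ID values first appear, which is why the result above alternates value1 and value2. If the columns should be grouped by variable, as in the requested layout, one option is to sort the tall dataset before the second transpose. This is a minimal sketch that assumes the tall dataset from the step above; the names tall_sorted and want_ordered are chosen here for illustration only.

```sas
/* Sort so that, within each group, all value1 rows come before value2 rows */
proc sort data=tall out=tall_sorted;
  by group _name_ date;
run;

/* Same second transpose as above, now reading the sorted rows */
proc transpose data=tall_sorted out=want_ordered(drop=_name_) delim=_;
  by group;
  id _name_ date;
  var col1;
run;
```

With the sample data this should give the columns in the order value1_JAN20, value1_FEB20, value2_JAN20, value2_FEB20.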

Related

matching two datasets with one month lag

I am trying to match max daily data within a month to a monthly data.
data daily;
input permno $ date ret;
datalines;
1000 19860101 88
1000 19860102 90
1000 19860201 70
1000 19860202 55
1001 19860201 97
1001 19860202 74
1001 19860203 79
1002 19860301 55
1002 19860302 100
1002 19860301 10
;
run;
data monthly;
input permno $ date ret;
datalines;
1000 19860131 1
1000 19860228 2
1000 19860331 5
1001 19860331 3
1002 19860430 4
;
run;
This is the result I want: the daily maximum for each month, matched to the monthly record of the following month (a one-month lag).
1000 19860102 90 1000 19860228 2
1000 19860201 70 1000 19860331 5
1001 19860201 97 1001 19860331 3
1002 19860302 100 1002 19860430 4
Below is what I have tried so far.
I want the maximum ret value within each month, so I created yrmon to give all daily rows of the same month the same yyyymm value:
data a1; set daily;
yrmon=year(date)*100 + month(date);
run;
To pick the maximum ret within each yrmon group for each permno, I used the code below:
proc means data=a1 noprint;
class permno yrmon ;
var ret;
output out= a2 max=maxret;
run;
However, this only gave me permno, yrmon and the maximum ret, and dropped the original date. I then tried to shift yrmon forward by one month:
data a3;
set a2;
new=intnx('month',yrmon,1);
format date new yymmn6.;
run;
But this does not work, since yrmon is a plain number and not a SAS date value.
Thank you in advance.
Hello
I am trying to match two datasets by permno (the same company) with a one-month lag (e.g. yrmon=198601 in the daily9 dataset against yrmon=198602 in the monthly2 dataset).
This is difficult for me because simply adding 1 to yrmon fails at year boundaries: 198612 + 1 gives 198613, not 198701.
Can anyone help?
1) informat date1/date2 yymmn6. is used to read the date in yyyymm format
2) format date1/date2 yymmn6. is used to view the date in yyyymm format
3) intnx("months",b.date2,-1) is used to join the dates with lag of 1 month
data data1;
input date1 value1;
informat date1 yymmn6.;
format date1 yymmn6.;
cards;
200101 200
200212 300
200211 400
;
run;
data data2;
input date2 value2;
informat date2 yymmn6.;
format date2 yymmn6.;
cards;
200101 3000000
200102 4000000
200301 2000000
200212 2000000
;
run;
proc sql;
create table result as
select a.*,b.date2,b.value2 from
data1 a
left join
data2 b
on a.date1 = intnx("months",b.date2,-1);
quit;
My Output:
date1 |value1 |date2 |value2
200101 |200 |200102 |4000000
200211 |400 |200212 |2000000
200212 |300 |200301 |2000000
Let me know in case of any queries.
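The same INTNX idea can be applied to the daily and monthly datasets from the question. The following is only a sketch under some assumptions: it treats the date column as a plain number in yyyymmdd form (as the datalines suggest), converts it to a SAS date first, and uses the illustrative names daily_dt, sasdate, next_month, daily_max and matched, none of which appear in the original post.

```sas
/* Convert the numeric yyyymmdd value to a SAS date and compute the
   first day of the following month; December rolls over to January */
data daily_dt;
  set daily;
  sasdate = input(put(date, 8.), yymmdd8.);
  next_month = intnx('month', sasdate, 1, 'b');
  format sasdate yymmddn8. next_month yymmn6.;
run;

proc sql;
  /* Keep the daily row(s) with the largest ret per permno and month.
     SAS remerges the MAX back onto the detail rows; ties keep all rows. */
  create table daily_max as
  select permno, date, ret, next_month
  from daily_dt
  group by permno, next_month
  having ret = max(ret);

  /* Match each daily maximum to the monthly row of the following month */
  create table matched as
  select d.permno, d.date as daily_date, d.ret as daily_max_ret,
         m.date as monthly_date, m.ret as monthly_ret
  from daily_max as d
  inner join monthly as m
    on d.permno = m.permno
   and d.next_month = intnx('month', input(put(m.date, 8.), yymmdd8.), 0, 'b');
quit;
```

Aligning both sides to the beginning of the month with the 'b' argument means the comparison does not depend on the day of the month stored in either dataset.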

Transpose multiple columns to rows in SAS

I am new to SAS and I want to transpose the following table in SAS
From
ID  Var1  Var2  Jul-09  Aug-09  Sep-09
1   10    15    200     300
2   5     17            -150    200
to
ID  Var1  Var2  Date    Transpose
1   10    15    Jul-09  200
1   10    15    Aug-09  300
2   5     17    Aug-09  -150
2   5     17    Sep-09  200
Can anyone help please?
You can use PROC TRANSPOSE to transform the data.
options validvarname=any;
data a;
infile datalines missover;
input ID Var1 Var2 "Jul-09"n "Aug-09"n "Sep-09"n;
datalines;
1 10 15 200 300 .
2 5 17 . -150 200
;
run;
proc transpose data=a out=b(rename=(_NAME_=Date COL1=Transpose));
var "Jul-09"n--"Sep-09"n;
by ID Var1-Var2;
run;
Alternatively, use a data step with an array over the date columns (shown here with only the Jul_09 and Aug_09 columns):
data a;
input ID Var1 Var2 Jul_09 Aug_09;
CARDS;
1 10 15 200 300
2 5 17 -150 200
;
DATA b(drop=i jul_09 aug_09);
array dates_{*} jul_09 aug_09;
set a;
do i=1 to dim(dates_);
this_value=dates_{i};
this_date=input(compress(vname(dates_{i}),'_'),MONYY5.);
output;
end;
format this_date monyy5.;
run;
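The requested output lists only the months that actually have a value. A possible variation on the array approach, sketched here under the assumption that the empty cells are entered as explicit missing values (.) and that all three month columns are present, skips the missing cells before writing a row. The output column names Date and Transpose follow the question; everything else is illustrative.

```sas
data a;
  infile datalines missover;
  input ID Var1 Var2 Jul_09 Aug_09 Sep_09;
  datalines;
1 10 15 200 300 .
2 5 17 . -150 200
;
run;

data b(drop=i Jul_09 Aug_09 Sep_09);
  set a;
  array dates_{*} Jul_09 Aug_09 Sep_09;
  do i = 1 to dim(dates_);
    /* write a row only when the month column holds a value */
    if not missing(dates_{i}) then do;
      Transpose = dates_{i};
      /* turn the column name, e.g. Jul_09, into a SAS date via Jul09 */
      Date = input(compress(vname(dates_{i}), '_'), monyy5.);
      output;
    end;
  end;
  format Date monyy5.;
run;
```

With this input, b should contain four rows: two for ID 1 (JUL09, AUG09) and two for ID 2 (AUG09, SEP09).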

Getting next observation per group

I am working on a dataset in SAS where, for each observation, the column Next_Row_score should hold the score of the next observation within the same group (ID). If there is no next observation in the group, Next_Row_score should be null. To illustrate, here is a sample dataset:
ID Score
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
The resulting output should look like this:
ID Salary Next_Row_Salary
10 1000 1500
10 1500 2000
10 2000 .
20 3000 4000
20 4000 .
30 2500 2500
Thank you in advance for your help.
data want(drop=_: flag);
merge have have(firstobs=2 rename=(ID=_ID Score=_Score));
if ID=_ID then do;
Next_Row_Salary=_Score;
flag+1;
end;
else if ID^=_ID and flag>=1 then do;
Next_Row_Salary=.;
flag=.;
end;
else Next_Row_Salary=score;
run;
Try this:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
proc sql noprint;
select count(*) into :obsHave
from have;
quit;
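data want2(rename=(id1=ID Score1=Salary) drop=ID id2 Score);
do i=1 to &obsHave;
/* read the current row */
set have point=i;
id1=ID;
Score1=Score;
j=i+1;
/* read the next row only if it exists; POINT= past the last row sets _ERROR_ */
if j <= &obsHave then do;
set have point=j;
id2=ID;
end;
else id2=.;
if id1=id2 then Next_Row_Salary = Score;
/* numeric missing value, not the character string "." */
else Next_Row_Salary=.;
output;
end;
stop;
run;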
There is a simpler (in my mind, at least) proc sql approach that doesn't involve loops:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
/*count each observation's place in its ID group*/
data have2;
set have;
count + 1;
by id;
if first.id then count = 1;
run;
/*if an ID group has only one observation, keep its own score; otherwise take the score of the next observation in the group*/
proc sql;
create table want as select distinct
a.id, a.score,
case when max(a.count) = 1 then a.score else b.score end as score2
from have2 as a
left join have2 (where = (count > 1)) as b
on a.id = b.id and a.count = b.count - 1
group by a.id;
quit;

proc tabulate table with where clause

I have a dataset with three variables: country and two binary variables, var1 and var2.
I wrote a PROC TABULATE step:
proc tabulate data=mydata;
class country var1 var2;
table Country, var1 var2;
run;
     Var1      Var2
     0    1    0    1
USA  40   50   40   50
AUS  50   20   50   20
IRE  60   40   60   40
DUB  70   50   70   50
Here I get the table with the totals of both var1 and var2 for 0s and 1s.
However, I want only the totals of 1s in this cross table. How can I do that?
If I use a where clause as below, it keeps only the rows where both are 1.
proc tabulate data=mydata;
class country var1 var2;
table Country, var1 var2;
where var1=1 and var2=1;
run;
When I use the above, it brings out only the rows where both variables are 1 at the same time, which is not what I am looking for.
The table I want is as below.
     Var1  Var2
     1     1
USA  50    50
AUS  20    20
IRE  40    40
DUB  50    50
Is there any other way of doing this?
Change and to or in the where statement.
Truth table for the conditions Var1=1 and Var2=1, showing whether a row is included:
Var1  Var2  AND  OR
0     0     N    N
0     1     N    Y
1     0     N    Y
1     1     Y    Y
Since your variables are coded 0,1 you can ask for the SUM statistic to get the "count" of the number of ones.
proc tabulate data=mydata;
class country;
var var1 var2;
table Country, var1*sum var2*sum;
run;
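Because the variables are coded 0/1, summing them counts the 1s, so the same numbers can be cross-checked with a grouped query. This is a small sketch on made-up data; the dataset mydata_demo, its rows, and the column names var1_ones and var2_ones are assumptions for illustration and not part of the original question.

```sas
/* Hypothetical sample rows: one observation per respondent */
data mydata_demo;
  input country $ var1 var2;
  datalines;
USA 1 0
USA 1 1
USA 0 1
AUS 0 0
AUS 1 1
;
run;

/* The sum of a 0/1 variable is the number of 1s in each country */
proc sql;
  create table ones_by_country as
  select country,
         sum(var1) as var1_ones,
         sum(var2) as var2_ones
  from mydata_demo
  group by country;
quit;
```

For these rows the query should return 2 and 2 for USA, and 1 and 1 for AUS, which are the values the var1*sum and var2*sum cells would show.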

PROC SUMMARY using multiple fields as a key

Is there a way to replicate the behaviour below using PROC SUMMARY without having to create the dummy variable UniqueKey?
DATA table1;
input record_desc $ record_id $ response;
informat record_desc $4. record_id $1. response best32.;
format record_desc $4. record_id $1. response best32.;
DATALINES;
A001 1 300
A001 1 150
A002 1 200
A003 1 150
A003 1 99
A003 2 250
A003 2 450
A003 2 250
A004 1 900
A005 1 100
;
RUN;
DATA table2;
SET table1;
UniqueKey = record_desc || record_id;
RUN;
PROC SUMMARY data = table2 NWAY MISSING;
class UniqueKey record_desc record_id;
var response;
output out=table3(drop = _FREQ_ _TYPE_) sum=;
RUN;
You can summarise by record_desc and record_id (see class statement below) without creating a concatenation of the two columns. What made you think you couldn't?
PROC SUMMARY data = table1 NWAY MISSING;
class record_desc record_id;
var response;
output out=table4(drop = _FREQ_ _TYPE_) sum=;
RUN;
proc compare
base=table3
compare=table4;
run;
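For comparison, the same two-field grouping can be written as a grouped query, again without building a concatenated key. This is a sketch against table1 from the question; the output name table5 is chosen here for illustration.

```sas
/* Sum response for every combination of record_desc and record_id */
proc sql;
  create table table5 as
  select record_desc, record_id, sum(response) as response
  from table1
  group by record_desc, record_id;
quit;
```

Both approaches treat the pair (record_desc, record_id) as the key, so A003 with record_id 1 and A003 with record_id 2 are summarised separately.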