SAS, switch rows and columns, for results

I would like to reformat my printed results, if there is a way.
What I got after running a proc means is:
Year | # | Variable | N | Mean
-------------------------------------
1991 | x | AAAAA | x | x
1991 | x | BBBBB | x | x
1991 | x | CCCCC | x | x
1992 | x | AAAAA | x | x
1992 | x | BBBBB | x | x
1992 | x | CCCCC | x | x
1993 | x | AAAAA | x | x
1993 | x | BBBBB | x | x
1993 | x | CCCCC | x | x
I would like to change this result table so that 1991 1992 1993 are the columns
and I would like the rows presented as:
AAAAA
N
Mean
BBBBB
N
Mean
CCCCC
N
Mean
Thank you.

For displayed output (as opposed to output data sets) you can use Proc TABULATE to arrange the presentation easily.
SASHELP.FISH is a SAS sample data set that likely has the same kind of layout as your data.
Example:
For demonstration purposes, the variable species in FISH corresponds to your YEAR.
Compare and contrast the MEANS output with the TABULATE output. The numbers are the same but the presentation is different (statistics in rows instead of columns).
proc means data=sashelp.fish n mean ;
class species;
var weight height width;
where species =: 'P'; * limit for demonstration;
run;
proc tabulate data=sashelp.fish;
title "Tabulate";
title2 font=courier "(weight height width) * (n mean) , species";
class species;
var weight height width;
table (weight height width) * (n mean) , species;
where species =: 'P'; * limit for demonstration;
run;
will produce these two outputs
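Applied to the layout in the question, the same TABLE statement pattern might look like the sketch below. The data set HAVE and its random values are made up for illustration; only the variable names YEAR, AAAAA, BBBBB and CCCCC come from the question, and they are assumed to be numeric analysis variables in a data set with one row per observation.

```sas
/* Hypothetical input: 5 made-up observations per year */
data have;
  call streaminit(1);
  do year = 1991 to 1993;
    do i = 1 to 5;
      AAAAA = rand('uniform');
      BBBBB = rand('uniform');
      CCCCC = rand('uniform');
      output;
    end;
  end;
  drop i;
run;

/* Rows: each variable crossed with N and Mean; columns: one per year */
proc tabulate data=have;
  class year;
  var AAAAA BBBBB CCCCC;
  table (AAAAA BBBBB CCCCC) * (n mean), year;
run;
```

The row dimension comes before the comma and the column dimension after it, so swapping the two sides swaps rows and columns.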

Try this. It might not be exactly what you want, but it probably comes close.
proc sort data=WHAT_I_HAVE;
by variable year;
run;
proc transpose data=WHAT_I_HAVE out=WHAT_I_NEED;
by variable;
id year;
var n mean;
run;
Disclaimer: this is untested code and might need debugging.


Deleting first instance of a column after group by in sas proc sql

I have the following SAS dataset.
correlation  policynum  risknum
A            X          Y
A            X          Y
A            X          Y
B            X          Y
B            X          Y
B            X          Y
B            X          L
B            X          L
B            X          L
C            Z          M
C            Z          M
C            Z          M
D            Z          M
D            Z          M
D            Z          M
In SAS, I want to filter the above dataset so I get my final output as:
correlation  policynum  risknum
B            X          Y
B            X          Y
B            X          Y
B            X          L
B            X          L
B            X          L
D            Z          M
D            Z          M
D            Z          M
i.e. for each group of policynum and risknum, if multiple values exist for correlation, I want to keep the second value and get rid of the first value.
If only a single value of correlation exists for a group of policynum and risknum, I want to retain that group in my final output too.
What would be the best way to do this? It might be something simple as I am relatively new to SAS.
Thanks in advance!
If the sort order of the correlation values matches the order in which they appear row-wise in the data set, you can use SQL. Otherwise SQL cannot be used, because it is based on set theory and has no implicit row numbers; a DATA step with a DOW loop can be used instead.
Example:
FYI, one common situation in which SAS coders use the phrase 'DOW loop' is when SET & BY statements occur inside a DO loop.
data have;
input correlation $ policynum $ risknum $;
datalines;
A X Y
A X Y
A X Y
B X Y
B X Y
B X Y
B X L
B X L
B X L
C Z M
C Z M
C Z M
D Z M
D Z M
D Z M
;
/* keep last group of a nested group */
* SQL can be used only if correlation wanted is ALWAYS highest valued correlation;
proc sql;
create table want as
select * from have
group by policynum, risknum
having correlation = max(correlation)
;
quit;
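* DATA Step DOW loops can be used when correlation wanted is last occurring correlation within by group;
data want;
* first pass: read one policynum/risknum group and count its rows in _n_;
do _n_ = 1 by 1 until (last.risknum);
set have;
by policynum risknum notsorted; /* presume at least contiguous */
end;
_want_correlation = correlation; /* correlation of the last row in the group */
* second pass: re-read the same rows and keep those with the wanted correlation;
do _n_ = 1 to _n_;
set have;
if _want_correlation = correlation then OUTPUT;
end;
drop _want_correlation;
run;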

Convert Stata "egen, group" to SAS

I am trying to find the equivalent of the Stata code "egen group" in SAS.
The goal is:
I have three variables x, y, and z. I want to create a new variable which will assign a different ordinal number for each combination of values of x, y, and z. How can I do this in SAS?
If you order your data by x, y, and z, SAS knows exactly where the groups x, y, and z start/end. You can use this to create unique identifiers.
Let's make some sample data. This data purposefully has duplicate values to illustrate how first. works.
data have;
do x = 'a', 'b', 'c';
do y = 'd', 'e', 'f';
do z = 'g', 'h', 'i';
output;
output;
end;
end;
end;
run;
Single-Threaded Unique IDs
This is the most likely case for you. This applies if you're running code in Base SAS.
First, sort the data by x y z.
proc sort data=have;
by x y z;
run;
Next, create your identifiers. We'll tell SAS that the data is ordered by x y z. Since z is nested within y and x, if we reach the first value of z, we've reached a unique combination of x y z.
data want;
set have;
by x y z;
if(first.z) then id+1;
run;
Output:
x y z id
a d g 1
a d g 1
a d h 2
a d h 2
a d i 3
a d i 3
...
id+1 is a special SAS shortcut called a sum statement and is equivalent to the following code:
retain id 0;
if(first.z) then id = id+1;
Multi-threaded Unique IDs
This applies if you're running code in SAS Viya in CAS. You need to add _THREADID_ to the ID to make it unique. For example:
cas;
libname casuser cas caslib='casuser';
data casuser.have;
set have;
run;
data casuser.want;
set casuser.have;
by x y z;
if(first.z) then _id+1;
id = catx('_', _THREADID_, _id);
drop _id;
run;
Output:
x y z id
a d g 15_1
a d g 15_1
a d h 15_2
a d h 15_2
a d i 15_3
a d i 15_3
...

Duplicate Observations in Stata (data manipulation)

I have a string variable
var1
x
y
z
that I need to "duplicate" and append to give
var1 var2
x x
x y
x z
--------
y x
y y
y z
--------
z x
z y
z z
where I added the horizontal lines to facilitate reading. Is such an expansion possible in Stata without loops? (I am not sure if "duplicate" is the right term.)
Two commands:
gen var2 = var1
fillin var1 var2
See help fillin and http://www.stata-journal.com/sjpdf.html?articlenum=dm0011

How to get only last 4 WORKING days data in SAS?

I'm trying to pull only the last 4 working days of data in SAS. I tried the following code but I'm not getting what I intended:
data input;
Input id $ id1 $ id2 $ num date date9.;
Format Date Date9.;
datalines;
x y z 3 19JUL2015
x y z 2 18JUL2015
x y z 3 17JUL2015
x y z 2 16JUL2015
x y z 3 15JUL2015
x y z 2 14JUL2015
x y z 3 13JUL2015
a b c 1 12JUL2015
a b c 1 11JUL2015
a b c 1 10JUL2015
a b c 1 09JUL2015
a b c 1 08JUL2015
a b c 2 07JUL2015
x y z 1 06JUL2015
;
Run;
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
*if intck('weekday',Date,today()) >4;
if 1<Weekday(Date)<7 and Date>=today()-4;
Run;
I think you need to reverse the > in your code, and add a qualification that you only want weekdays:
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
if intck('weekday',Date,'20JUL2015'd) le 4 and 1<weekday(Date)<7;
*if 1<Weekday(Date)<7 and Date>='20JUL2015'd-5;
Run;
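An alternative way to express the same kind of cutoff is to compute the earliest date to keep with INTNX and the 'weekday' interval. This is a sketch rather than a tested replacement: it reuses the INPUT data set and the fixed reference date '20JUL2015'd from the code above, the names test2 and cutoff are made up here, and its boundary handling may differ slightly from the INTCK version.

```sas
data test2;
  set input;
  /* step back 4 weekday intervals from the reference date (assumed to be a weekday) */
  cutoff = intnx('weekday', '20JUL2015'd, -4);
  format cutoff date9.;
  /* keep rows on or after the cutoff that fall on Monday through Friday */
  if date >= cutoff and 1 < weekday(date) < 7;
run;
```

Replacing '20JUL2015'd with today() would make the window move with the run date.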

SAS transpose using a part of the name

I'm here asking for help for a problem with proc transpose.
I have a dataset made this way (I'm going to show only 3 variables but I have lots of them)
PR ID VAR1a VAR1b VAR1c VAR2a VAR2b VAR2c VAR3a VAR3b VAR3c
1 1 x x x x x x x x x
1 2 x x x x x x x x x
1 3 x x x x x x x x x
2 1 x x x x x x x x x
2 2 x x x x x x x x x
2 3 x x x x x x x x x
I need an output dataset like this:
PREID ID VAR(name) A B C
1 1 VAR1(name) x x x
1 1 VAR2(name) x x x
1 1 VAR3(name) x x x
1 2 VAR1(name) x x x
1 2 VAR2(name) x x x
1 2 VAR3(name) x x x
1 3 VAR1(name) x x x
1 3 VAR2(name) x x x
1 3 VAR3(name) x x x
etc with preid 2 id 1 2 3, preid 3 id 1 2 3.
So I need to transpose but using the name (discriminating from a b c), I really have no idea from where I could start.
Can you help me please?
If I understand the desired output correctly, each observation of your input data first has to be broken into several observations, so that a single observation becomes 9 observations (var1a to var3c). You can achieve that with proc transpose, using pr and id as BY variables and transposing the var1a to var3c variables. After that, a data step splits the _NAME_ variable into the var1/2/3 part and the a/b/c part. Once that is done, a second transpose gives the result.
I tried to write down the code based on your input data. Let me know if it helps.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
proc transpose data=input out=staging ;
by pr id ;
var VAR1a--VAR3c;
run;
data staging;
set staging;
var=substrn(strip(_name_),1,length(strip(_name_))-1);
dummy=substrn(strip(_name_),length(strip(_name_)),1);
drop _name_;
run;
proc transpose data=staging out=final(drop=_name_);
by pr id var;
id dummy;
var col1;
run;
proc print data=final;run;
Similar to @sushil's solution above, but with one less step. Since you have to go into a data step anyway, you may as well transpose the data in that step too, so this solution combines the first Proc Transpose and the data step. If you had few enough variables I'd remove the last transpose as well, but this is more flexible if you have quite a few variables.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
data out1;
set input;
array vars(*) var1a--var3c;
do i=1 to dim(vars);
name=vname(vars(i));
varname=substr(name,1,length(name)-1);
group=substr(name,length(name));
value=vars(i);
output;
end;
drop var1a--var3c;
run;
proc transpose data=out1 out=out2;
by pr id varname;
id group;
var value;
run;