SAS: PROC RANK: Use operators on variables

I have some data (created using the code below) that ranks observations according to two variables. In this case, it ranks each player's first bet and second bet and creates two 'rank' variables. What I want to do instead is rank the observations according to a function of the two variables (like their average), and I'd like to do this in the PROC RANK step itself rather than in a preliminary data step, as the ranking will get fairly involved once I replicate this on all the variables I need. Can I put operators into the PROC RANK statement? Rather than doing this:
Proc rank data=want ties=mean out=ranked groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
I would like to do this:
Proc rank data=want ties=mean out=ranked groups=2;
var avg(bet1stake, bet2stake);
ranks firstTwoBetsRank;
run;
Is this possible?
This is how the full example data can be created.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
proc sort data=have;
by username betdate;
run;
data have;
set have;
by username betdate;
retain eventTime;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
run;
proc sql;
create table want as
select
distinct username,
(select distinct stake from have where username = main.username and eventTime = 1) as bet1Stake,
(select distinct stake from have where username = main.username and eventTime = 2) as bet2Stake
from have main;
quit;
Proc rank data=want ties=mean out=want groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
Thanks for any help on this.

I'm afraid you cannot apply operators or functions to the variables you list in PROC RANK's VAR statement.
You can either use a DATA step to both apply the operators and calculate the ranking,
or,
if you are concerned about disk space, use a DATA step view or an SQL view to apply the operator as an intermediate step.
If you are pulling the data from a SQL database (assuming it supports windowing functions), you should be able to do exactly what you are trying to do with SQL code that is passed through to the database.
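As a minimal sketch of the view approach, assuming the want table from the question: the view name want_v and the variable firstTwoBetsAvg are made up for illustration. The view computes the average when PROC RANK reads it, so no intermediate table is stored.

```sas
/* DATA step view: the average is computed on the fly when the view is read. */
/* want_v and firstTwoBetsAvg are hypothetical names, not from the question. */
data want_v / view=want_v;
    set want;
    /* MEAN() ignores missing values, so a player with only one bet */
    /* gets that bet's stake as the average.                        */
    firstTwoBetsAvg = mean(bet1stake, bet2stake);
run;

/* Rank on the derived variable instead of the two original ones. */
proc rank data=want_v ties=mean out=ranked groups=2;
    var firstTwoBetsAvg;
    ranks firstTwoBetsRank;
run;
```

If a missing second bet should instead produce a missing average, (bet1stake + bet2stake) / 2 could be used in place of MEAN().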

Related

SAS compare two tables with different column order

Hi, I have two tables whose columns are in a different order, and the column names are not capitalized the same way. How can I check whether the contents of these two tables are the same?
For example, I have two tables of students' grades
table A:
Name  | Math | English | History
------+------+---------+--------
Tim   | 98   | 95      | 90
Helen | 100  | 92      | 85
table B:
Name  | history | MATH | english
------+---------+------+--------
Tim   | 90      | 98   | 95
Helen | 85      | 100  | 92
You may use either of two approaches to compare them, regardless of column order or capitalization:
/*1. Proc compare*/
proc sort data=A; by name; run;
proc sort data=B; by name; run;
proc compare base=A compare=B;
id name;
run;
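/*2. Proc SQL*/
proc sql;
/* Without CORR, set operators match columns by position, so list them in the same order. */
select name, Math, English, History from A
except /* or union / intersect, depending on the check you need */
select name, MATH, english, history from B;
quit;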
Use EXCEPT CORR (CORRESPONDING); it matches columns by name rather than by position. If every row of the first table also appears in the second, you will get zero records.
data have1;
input Math English History;
datalines;
1 2 3
;
run;
data have2;
input English math History;
datalines;
2 1 3
;
run;
proc sql ;
select * from have1
except corr
select * from have2;
quit;
Edit:
If you want to see which particular column differs, you may have to transpose both tables and compare them, as in the example below.
data have1;
input name $ Math English pyschology History;
datalines;
Tim 98 95 76 90
Helen 100 92 55 85
;
run;
data have2;
input name $ English Math pyschology History;
datalines;
Tim 95 98 76 90
Helen 92 100 99 85
;
run;
proc sort data = have1 out =hav1;
by name;
run;
proc sort data = have2 out =hav2;
by name;
run;
proc transpose data =hav1 out=newhave1 (rename = (_name_= subject
col1=marks));
by name;
run;
proc transpose data =hav2 out=newhave2 (rename = (_name_= subject
col1=marks));
by name;
run;
proc sql;
create table want(drop=mark_dif) as
select
a.name as name
,a.subject as subject
,a.marks as have1_marks
,b.marks as have2_marks
,a.marks -b.marks as mark_dif
from newhave1 a inner join newhave2 b
on upcase(a.name) = upcase(b.name)
and upcase(a.subject) =upcase(b.subject)
where calculated mark_dif ne 0;
quit;

Calculated Variable in PROC REPORT

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT.
data have1;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout groups=2;
var stake;
ranks stakeRank;
run;
I want to add an extra, computed variable to the report above. What am I doing wrong here? I'm sure it's just a small syntax issue, but I'm having no luck with it!
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake, (n mean stake=discountedstake);
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS '';
define n/format=8. ;
define discountedstake / analysis format=8.2;
compute discountedstake;
discountedstake = stake * 0.9;
endcompute;
RUN;
Thanks.
I'm not sure what you are trying to do, but below I'm using
one variable with two statistics:
stake, labeled Count, uses the N statistic;
stakemean, an alias of stake labeled Mean, uses the MEAN statistic;
and a computed column, discountedstake. (I'm multiplying the mean statistic. If you need to multiply the original values instead, you can do that by, e.g., creating a DATA step view on top of the dataset.)
Example:
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake stake = stakemean discountedstake;
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean';
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
One of the problems in your code is stake=discountedstake: it creates an alias named discountedstake while a COMPUTE block also tries to compute discountedstake.

Calculated 'Across' Variable in PROC REPORT

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT. This is an elaboration on a previous post of mine, but I am posting it as a separate question because the sample data has changed and the question is a bit more involved.
data have1;
input username $ betdate : datetime. stake winnings winner;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 0
player1 04NOV2008:09:03:44 100 40 1
player2 07NOV2008:14:03:33 120 -120 0
player1 05NOV2008:09:00:00 50 15 1
player1 05NOV2008:09:05:00 30 5 1
player1 05NOV2008:09:00:05 20 10 1
player2 09NOV2008:10:05:10 10 -10 0
player2 09NOV2008:10:05:40 15 -15 0
player2 09NOV2008:10:05:45 15 -15 0
player2 09NOV2008:10:05:45 15 45 1
player2 15NOV2008:15:05:33 35 -35 0
player1 15NOV2008:15:05:33 35 15 1
player1 15NOV2008:15:05:33 35 15 1
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout1 groups=2;
var stake winner;
ranks stakeRank winnerRank;
run;
PROC TABULATE DATA=ranksout1 NOSEPS;
VAR stake;
class stakerank winnerrank;
TABLE stakerank = '', winnerrank=''*stake=''*(N Mean Skewness);
RUN;
The output provided by tabulate above is what I want, although I will ultimately be adding some more calculated fields so would like to do this with PROC REPORT.
PROC REPORT DATA=ranksout1 NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
When I start grouping the variables 'ACROSS' using commas and brackets, I can't seem to insert a calculated variable at all. It works if I only GROUP once on stakerank, but if I introduce the winnerrank grouping, it doesn't work. I get errors telling me that 'missing values were generated', and that 'stakemean is uninitialized'.
Would appreciate any tips at all. Thanks.
Maybe this helps:
prepare the calculated variable in a SAS view on the detail data:
data ranks_view / view=ranks_view;
set ranksout1;
discountedstake = stake * 0.9;
run;
PROC REPORT DATA=ranks_view NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / ANALYSIS MEAN format=8.2;
RUN;
In DEFINE discountedstake / ANALYSIS MEAN format=8.2;, MEAN means the cell shows the average of discountedstake. Because discountedstake is stake * 0.9 on every row, this equals 0.9 times the mean of stake.
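If you would rather keep the calculation inside PROC REPORT, a commonly used alternative is to refer to the columns under the ACROSS variable by their absolute column names (_c2_, _c3_, and so on), because items nested under an ACROSS variable cannot be referenced by name in a COMPUTE block. The sketch below assumes winnerrank takes exactly two values (0 and 1), so the column numbers line up as commented. stakecount is a hypothetical alias introduced here, and the column numbers would need adjusting for other data.

```sas
proc report data=ranksout1 nowindows;
    column stakerank winnerrank, (stake=stakecount stake=stakemean discountedstake);
    define stakerank       / group '' order=internal;
    define winnerrank      / across '' order=internal;
    define stakecount      / analysis n 'Count';
    define stakemean       / analysis mean 'Mean' format=8.2;
    define discountedstake / computed 'Discounted' format=8.2;
    compute discountedstake;
        /* Assumed layout: _c1_ = stakerank,                     */
        /* _c2_ to _c4_ = count, mean, discounted (winnerrank 0), */
        /* _c5_ to _c7_ = count, mean, discounted (winnerrank 1). */
        _c4_ = _c3_ * 0.9;
        _c7_ = _c6_ * 0.9;
    endcomp;
run;
```

The trade-off is that the column numbers are hard-coded, so the view-based version is easier to maintain when the number of ACROSS levels can change.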

SAS: Replicate PROC MEANS output in PROC TABULATE

I would like to replicate the output of PROC MEANS using PROC TABULATE. The reason for this is that I would like to have a profit percentage (or margin) as one of the variables in the PROC MEANS output, but would like to suppress the calculation for one or more of the statistics, i.e. there will be a '-' or similar in the 'margin' row under 'N' and 'SUM'.
Here is the sample data:
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
data want;
set have;
retain margin;
margin = (winnings) / stake;
PROC PRINT; RUN;
I have been calculating statistics with PROC MEANS (like below), but the value for the SUM statistics for the 'margin' variable means nothing: I would like to suppress this value. I have therefore been attempting to replicate this table using PROC TABULATE to have more control of the output, but have been unsuccessful so far.
proc means data=want N sum mean median stddev min max maxdec=2 order=freq STACKODS;
var stake winnings margin;
run;
proc tabulate data=want;
var stake winnings margin;
table stake * (N Sum mean Median StdDev Min Max);
run;
I would appreciate any help on this.
In principle, you can't create this type of output with PROC TABULATE by default; in essence, you are asking for two different table definitions. Anything you do with the TABLE syntax will basically amount to adding more dimensions to the table, but it won't fix your core problem.
You can use this code to get the tables you want, but they're still different tables:
PROC TABULATE DATA=want NOSEPS;
VAR stake winnings margin;
TABLE (stake winnings),(N SUM MEAN MEDIAN STDDEV MIN MAX);
TABLE (margin),(N MEAN MEDIAN STDDEV MIN MAX);
RUN;
There are some guides out there on hacking ODS to do what you want (namely, creating "stacked tables", where several child tables are assembled into a single table). Check out here for an example. If you Google "SAS stack tables" you'll find more examples.
I've done this in HTML by creating a new tagset - basically, a special ODS destination that removes spaces between tables, etc. I don't have the code that I used anymore, unfortunately; I moved to R to do automated reporting.

SAS Group by numerical value or convert to char

I have the following data:
data have;
input username $ betdate : datetime. customerCode;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 1
player1 04NOV2008:09:03:44 10
player2 07NOV2008:07:03:33 1
player2 05NOV2008:09:00:00 0.5
player3 05NOV2008:09:05:00 1
player2 07NOV2008:14:03:33 1
player1 05NOV2008:09:00:05 20
player2 07NOV2008:16:03:33 1
player2 07NOV2008:18:03:33 1
player2 09NOV2008:10:05:10 0.7
player3 15NOV2008:15:05:33 10
player3 15NOV2008:15:05:33 1
player2 15NOV2008:15:05:33 0.1
run;
PROC PRINT; RUN;
When I run the following, I don't get distinct, collapsed entries for customerCode when I group by it because it is numeric, I presume.
proc sql;
select username, customerCode from have group by 1,2;
quit;
How can I do this? I want to get a history of all the customer codes that have been assigned to a customer (i.e. as they change), rather than an entry for each numeric value of customerCode. I haven't been able to convert the variable to a char value so that the grouping works:
proc sql;
create table want as
select * from have, customerCode FORMAT $10. as code;
quit;
Thanks for any help on this.
You're not getting distinct entries because your GROUP BY is being ignored: you didn't ask for any summary functions. SAS does not perform a GROUP BY without a summary function (i.e., sum(something), count(something), or similar); it converts it to an ORDER BY instead. There's no inherent reason a numeric variable wouldn't work for grouping.
This is noted in the log with a NOTE, by the way.
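For illustration, a minimal sketch of the same query with a summary function added, so that the GROUP BY is actually applied. The column name n_rows is made up for this example.

```sas
proc sql;
    /* count(*) is a summary function, so the GROUP BY is honoured   */
    /* and each username/customerCode pair collapses to a single row */
    select username, customerCode, count(*) as n_rows
    from have
    group by username, customerCode;
quit;
```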
You can use distinct, as you suggested in the comments:
proc sql;
select distinct username, customercode from have;
quit;
That will give you a list of all username/customercode combinations.
If you want to format it, you have to remove the $. The $ in a format name does not mean "make this a character value" (all formats produce character output); it means "the original, pre-format value is a character value".
proc sql;
create table want as
select distinct username, customercode format=10. from have;
quit;
This won't quite work as expected, because the format is applied after the distinct is processed (and the post-decimal portion still exists, just under the hood). However, you can do:
proc sql;
create table want as
select distinct username, put(customercode,10.) as code from have;
quit;
Or you could use ROUND or something else to keep it numeric.
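A minimal sketch of the ROUND variant, assuming that rounding to the nearest integer is the grouping you want. The rounding unit of 1 and the column name code are choices made for this example.

```sas
proc sql;
    create table want as
    /* round(x, 1) rounds to the nearest integer and keeps the value numeric, */
    /* so DISTINCT collapses codes that differ only after the decimal point   */
    select distinct username, round(customercode, 1) as code
    from have;
quit;
```

Unlike a format, ROUND changes the stored value itself, so DISTINCT sees the rounded number.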