Calculated 'Across' Variable in PROC REPORT - sas

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT. This elaborates on a previous question of mine; I am posting it separately because the sample data has changed and the question is a bit more involved.
data have1;
input username $ betdate : datetime. stake winnings winner;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 0
player1 04NOV2008:09:03:44 100 40 1
player2 07NOV2008:14:03:33 120 -120 0
player1 05NOV2008:09:00:00 50 15 1
player1 05NOV2008:09:05:00 30 5 1
player1 05NOV2008:09:00:05 20 10 1
player2 09NOV2008:10:05:10 10 -10 0
player2 09NOV2008:10:05:40 15 -15 0
player2 09NOV2008:10:05:45 15 -15 0
player2 09NOV2008:10:05:45 15 45 1
player2 15NOV2008:15:05:33 35 -35 0
player1 15NOV2008:15:05:33 35 15 1
player1 15NOV2008:15:05:33 35 15 1
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout1 groups=2;
var stake winner;
ranks stakeRank winnerRank;
run;
PROC TABULATE DATA=ranksout1 NOSEPS;
VAR stake;
class stakerank winnerrank;
TABLE stakerank = '', winnerrank=''*stake=''*(N Mean Skewness);
RUN;
The PROC TABULATE output above is what I want, but I will ultimately be adding more calculated fields, so I would like to produce it with PROC REPORT.
PROC REPORT DATA=ranksout1 NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
When I start grouping the variables 'ACROSS' using commas and brackets, I can't seem to insert a calculated variable at all. It works if I only GROUP once on stakerank, but if I introduce the winnerrank grouping, it doesn't work. I get errors telling me that 'missing values were generated', and that 'stakemean is uninitialized'.
Would appreciate any tips at all. Thanks.

Maybe this helps: compute the discounted value on the detail data in a DATA step view, then summarize it in PROC REPORT:
data ranks_view / view=ranks_view;
set ranksout1;
discountedstake = stake * 0.9;
run;
PROC REPORT DATA=ranks_view NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / ANALYSIS MEAN format=8.2;
RUN;
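In the statement DEFINE discountedstake / ANALYSIS MEAN format=8.2; the MEAN keyword requests the average of discountedstake. Because discountedstake is stake * 0.9 on every row, that average equals 0.9 times the mean of stake.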
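If you would rather keep the COMPUTE block, a common pattern under ACROSS is to refer to the report columns by position (_c2_, _c3_, ...) instead of by name, because the nested columns have no usable names there. The following is only a sketch: it assumes winnerrank takes exactly two values in ranksout1, so that the columns are numbered as in the comment, and the numbering must be adjusted if that assumption does not hold.

```sas
PROC REPORT DATA=ranksout1 NOWINDOWS;
  COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
  DEFINE stakerank / GROUP '' ORDER=INTERNAL;
  DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
  DEFINE N / 'Count';
  DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
  DEFINE discountedstake / COMPUTED 'Discounted' format=8.2;
  COMPUTE discountedstake;
    /* Assumed layout: _c1_ = stakerank,                      */
    /* _c2_ to _c4_ = N, mean, discounted for first winnerrank, */
    /* _c5_ to _c7_ = the same three for second winnerrank.     */
    _c4_ = _c3_ * 0.9;
    _c7_ = _c6_ * 0.9;
  ENDCOMP;
RUN;
```

Cells with no observations for a stakerank/winnerrank combination stay missing, and the log may still note that missing values were generated for them.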

Numeric Macro in SAS

I have a variable for counting days. I'm trying to divide by the total number of days, i.e. the latest day count.
How do I create a macro variable that stores the most recent day so that I can reference it later?
This is what I have so far (I've cut out code that's not relevant):
DATA scotland;
input day deathsscotland casesscotland;
cards;
1 1 85
2 1 121
3 1 153
4 1 171
5 2 195
6 3 227
7 6 266
8 6 322
9 7 373
10 10 416
11 14 499
12 16 584
13 22 719
14 25 894
;
run;
proc sort data=scotland out=scotlandsort;
by day;
run;
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then output;
run;
%let daycountscot = day
Data ratio;
set cdratio;
SCOTLANDAVERAGE = (SCOTLANDRATIO/&daycountscot)*1000;
run;
Using your own code, you can create the macro variable with CALL SYMPUTX like this:
Data _null_;
keep day;
set scotlandsort end=eof;
if eof then call symputx('daycountscot', day);
run;
%put &daycountscot.;
The DATA _NULL_ step in the question is not doing anything, since it writes no output dataset. You can eliminate the sort and data steps by selecting the maximum day value directly into a macro variable.
proc sql noprint;
select max(day) into :daycountscot trimmed
from scotland
;
quit;
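Once the macro variable exists, it can be referenced in the later step exactly as the question intended. This sketch assumes cdratio and its SCOTLANDRATIO variable exist as in the question (they are not defined in the code shown); for the sample data, daycountscot should resolve to 14.

```sas
/* Write the resolved value to the log as a quick check */
%put NOTE: daycountscot=&daycountscot;

data ratio;
  set cdratio;  /* assumed to exist, as in the question */
  SCOTLANDAVERAGE = (SCOTLANDRATIO / &daycountscot) * 1000;
run;
```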
There is no need to use macro code for this; it is better to keep values in variables anyway, because to store the value as text in a macro variable SAS has to round the number.
You could make a dataset with the maximum DAY value and then combine it with the dataset where you want to do the division.
data last_day;
set scotlandsort end=eof;
if eof then output;
keep day;
rename day=last_day;
run;
data ratio;
set cdratio;
if _n_=1 then set last_day;
SCOTLANDAVERAGE = (SCOTLANDRATIO/last_day)*1000;
run;
Probably easier in SQL code:
proc sql;
create table ratio as
select a.*, (SCOTLANDRATIO/last_day)*1000 as SCOTLANDAVERAGE
from cdratio a
, (select max(day) as last_day from scotland)
;
quit;

Calculated Variable in PROC REPORT

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT.
data have1;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout groups=2;
var stake;
ranks stakeRank;
run;
I want to add an extra, computed variable to the report above. What am I doing wrong here? I'm sure it's just a small syntax issue, but I'm having no luck with it!
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake, (n mean stake=discountedstake);
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS '';
define n/format=8. ;
define discountedstake / analysis format=8.2;
compute discountedstake;
discountedstake = stake * 0.9;
endcompute;
RUN;
Thanks.
I'm not sure what you are trying to do, but the example below uses one variable with two statistics:
stake, labeled Count, uses the N statistic;
stakemean, labeled Mean, uses the MEAN statistic.
It also creates a computed column, discountedstake, by multiplying the mean statistic. (If you need to multiply the original values instead, you can do that by, e.g., creating a DATA step view on top of the dataset.)
Example:
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake stake = stakemean discountedstake;
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean';
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
One of the problems in your code is stake=discountedstake: it declares discountedstake as an alias of stake while also trying to compute discountedstake.

SAS: Replicate PROC MEANS output in PROC TABULATE

I would like to replicate the output of PROC MEANS using PROC TABULATE. The reason is that I would like to have a profit percentage (or margin) as one of the variables in the PROC MEANS output, but suppress the calculation for one or more of the statistics, i.e. there will be a '-' or similar in the 'margin' row under 'N' and 'SUM'.
Here is the sample data:
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
data want;
set have;
retain margin;
margin = (winnings) / stake;
PROC PRINT; RUN;
I have been calculating statistics with PROC MEANS (like below), but the value for the SUM statistics for the 'margin' variable means nothing: I would like to suppress this value. I have therefore been attempting to replicate this table using PROC TABULATE to have more control of the output, but have been unsuccessful so far.
proc means data=want N sum mean median stddev min max maxdec=2 order=freq STACKODS;
var stake winnings margin;
run;
proc tabulate data=want;
var stake winnings margin;
table stake * (N Sum mean Median StdDev Min Max);
run;
I would appreciate any help on this.
In principle, you can't create this type of output with PROC TABULATE by default; in essence, you are asking for two different table definitions. Anything you do with the TABULATE syntax will basically amount to adding more dimensions to the table, which won't fix your core problem.
You can use this code to get the tables you want, but they're still different tables:
PROC TABULATE DATA=want NOSEPS;
VAR stake winnings margin;
TABLE (stake winnings),(N SUM MEAN MEDIAN STDDEV MIN MAX);
TABLE (margin),(N MEAN MEDIAN STDDEV MIN MAX);
RUN;
There are some guides out there on hacking ODS to do what you want, namely creating "stacked tables" in which several child tables are assembled into a single table. Check out here for an example. If you Google "SAS stack tables" you'll find more examples.
I've done this in HTML by creating a new tagset - basically, a special ODS destination that removes spaces between tables, etc. I don't have the code that I used anymore, unfortunately; I moved to R to do automated reporting.
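A different route, which is not the stacked-tables technique described above, is to capture the PROC MEANS table in a dataset, blank out the unwanted cell, and print the result. This is a sketch only: it assumes that with STACKODS the ODS table Summary has a column named Variable and one column per statistic (N, Sum, Mean, ...), which is worth checking with PROC CONTENTS on your installation.

```sas
/* Capture the stacked PROC MEANS table as a dataset */
ods output Summary=stats;
proc means data=want N sum mean median stddev min max maxdec=2 STACKODS;
  var stake winnings margin;
run;

/* Blank out the statistic that is meaningless for margin */
data stats;
  set stats;
  if upcase(Variable) = 'MARGIN' then Sum = .;
run;

/* Show missing values as '-' while printing, then restore the default */
options missing='-';
proc print data=stats noobs;
run;
options missing='.';
```

The same IF statement can set N to missing as well if that cell should also be suppressed.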

SAS Group by numerical value or convert to char

I have the following data:
data have;
input username $ betdate : datetime. customerCode;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 1
player1 04NOV2008:09:03:44 10
player2 07NOV2008:07:03:33 1
player2 05NOV2008:09:00:00 0.5
player3 05NOV2008:09:05:00 1
player2 07NOV2008:14:03:33 1
player1 05NOV2008:09:00:05 20
player2 07NOV2008:16:03:33 1
player2 07NOV2008:18:03:33 1
player2 09NOV2008:10:05:10 0.7
player3 15NOV2008:15:05:33 10
player3 15NOV2008:15:05:33 1
player2 15NOV2008:15:05:33 0.1
run;
PROC PRINT; RUN;
When I run the following, I don't get distinct, collapsed entries for customerCode when I group by it because it is numeric, I presume.
proc sql;
select username, customerCode from have group by 1,2;
quit;
How can I do this? I want to get a history of all the customer codes that have been assigned to a customer (i.e. as they change), rather than one entry per row for each numeric value of customerCode. I haven't been able to convert the variable to a char value so that the grouping works:
proc sql;
create table want as
select * from have, customerCode FORMAT $10. as code;
quit;
Thanks for any help on this.
You're not getting distinct entries because your GROUP BY is being ignored: the query asks for no summary functions. SAS does not permit GROUP BY without a summary function (i.e. sum(something), count(something) or similar); it converts it to ORDER BY instead. There's no reason a numeric variable wouldn't work for grouping.
This is noted in the log with a NOTE, by the way.
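To illustrate the point, adding any summary function makes the GROUP BY take effect and collapses the rows. A minimal sketch, where the column name n_rows is just an illustrative choice:

```sas
proc sql;
  /* count(*) is a summary function, so GROUP BY is honored */
  select username, customerCode, count(*) as n_rows
  from have
  group by username, customerCode;
quit;
```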
You can use distinct, as you suggested in the comments:
proc sql;
select distinct username, customercode from have;
quit;
That will give you a list of all username/customercode combinations.
If you want to apply a format, you have to remove the $. The $ in a format name does not mean "make this a character value" (every format produces text); it means that the value being formatted is a character value.
proc sql;
create table want as
select distinct username, customercode format=10. from have;
quit;
This won't quite work as expected, because the format is applied after the distinct is processed (and the post-decimal portion still exists, just under the hood). However, you can do:
proc sql;
create table want as
select distinct username, put(customercode,10.) as code from have;
quit;
Or you could use ROUND or something else to keep it numeric.
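A sketch of the ROUND variant, assuming rounding to the nearest integer is what you want (the rounding unit 1 and the column name code are illustrative choices):

```sas
proc sql;
  create table want as
  /* round() changes the stored value, so DISTINCT sees the rounded codes */
  select distinct username, round(customercode, 1) as code
  from have;
quit;
```

Unlike a format, ROUND changes the value itself, so DISTINCT collapses 0.5, 0.7 and 1 for the same user into a single row.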

SAS: PROC RANK: Use operators on variables

I have some data (created using the code below) in which observations are ranked according to two variables. In this case, the code ranks each player's first bet and second bet and creates two 'rank' variables. What I want to do instead is rank the observations according to a function of the two variables (such as their average), and I'd like to do this in PROC RANK itself rather than in a preliminary data step, as the ranking will get fairly involved once I replicate this for all the variables I need. Can I put operators into the PROC RANK statement? Rather than doing this:
Proc rank data=want ties=mean out=ranked groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
I would like to do this:
Proc rank data=want ties=mean out=ranked groups=2;
var avg(bet1stake, bet2stake);
ranks firstTwoBetsRank;
run;
Is this possible?
This is how the full example data can be created.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
proc sort data=have;
by username betdate;
run;
data have;
set have;
by username betdate;
retain eventTime;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
run;
proc sql;
create table want as
select
distinct username,
(select distinct stake from have where username = main.username and eventTime = 1) as bet1Stake,
(select distinct stake from have where username = main.username and eventTime = 2) as bet2Stake
from have main;
quit;
Proc rank data=want ties=mean out=want groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
Thanks for any help on this.
I'm afraid you cannot apply operators to the variables you'd like to rank your observations by.
You have two choices. Either use a DATA step to both apply the operators and calculate the ranking,
or
use a DATA step view or an SQL view to apply the operators as an intermediate step, in case you are concerned about disk space.
If you are pulling the data from a SQL database (assuming it supports windowing functions), you should be able to do exactly what you are trying to do with SQL code that is passed through to the database.
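The view option could look like the sketch below. It uses the want dataset as built in the question, before PROC RANK is run on it; the names want_avg and firstTwoBetsAvg are made up for illustration.

```sas
/* The view stores no data; the average is computed when PROC RANK reads it */
data want_avg / view=want_avg;
  set want;
  firstTwoBetsAvg = mean(bet1stake, bet2stake);
run;

proc rank data=want_avg ties=mean out=ranked groups=2;
  var firstTwoBetsAvg;
  ranks firstTwoBetsRank;
run;
```

Note that MEAN ignores missing arguments, so a player with only one bet is ranked on that single stake.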