Calculated Variable in PROC REPORT - sas

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT.
data have1;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout groups=2;
var stake;
ranks stakeRank;
run;
I want to add an extra, computed variable to the report below. What am I doing wrong here? I'm sure it's just a small syntax issue, but I'm having no luck with it!
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake, (n mean stake=discountedstake);
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS '';
define n/format=8. ;
define discountedstake / analysis format=8.2;
compute discountedstake;
discountedstake = stake * 0.9;
endcompute;
RUN;
Thanks.

I'm not sure what you are trying to do, but in the example below I use one variable with two statistics:
stake, labeled Count, uses the N statistic;
stakemean, labeled Mean, uses the MEAN statistic.
I also create a computed column, discountedstake, by multiplying the mean statistic by 0.9. If you need to multiply the original values instead, you can do that by, for example, creating a DATA step view on top of the dataset.
Example:
PROC REPORT DATA=ranksout NOWINDOWS;
COLUMN stakerank stake stake = stakemean discountedstake;
DEFINE stakerank / GROUP id 'Rank for Variable Stake' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean';
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
One of the problems in your code is stake=discountedstake: it creates an alias named discountedstake while you also try to compute a column named discountedstake.

Related

Order that variables appear in plot output of proc freq

I have created a frequency plot using the plots option in PROC FREQ. However, I am not able to get the order that I want. I have categories such as '5 to 10 weeks', 'Greater than 25 weeks', '10 to 15 weeks', and '15 to 20 weeks'. I want them to appear in the logical order of increasing weeks, but I'm not sure how to do that. I tried using the order option but nothing seemed to fix that.
A possible solution would be to code the order I want as values of 1-5, order them using the order= option and then have a label for 1-5. But I'm not sure if that's possible.
Tried the order= option, however, that didn't fix the issue.
I want the bins to show up as 'less than 5 weeks' '5 to 10 weeks' '10 to 15 weeks' '15 to 20 weeks' '20 to 25 weeks' 'greater than 25 weeks'
When the PROC FREQ plot displays the tabled variable's values in alphabetical order and the ORDER= option is not specified, you have the following scenario:
the variable is character
the display order is the default (INTERNAL)
Note: Other frequency plotting techniques, such as SGPLOT VBAR, recognize a midpoint axis specification that can control the explicit order in which the character values appear. Proc FREQ does not have a plot option for mxaxis.
You are correct in presuming that an inverse map (or remap, or unmap) from label to a desired ordered value is essential. There are two main ways to remap:
custom format to map label to a character value (via PUT)
custom informat to map label to a numeric value (via INPUT)
Once you have remapped the labels to a value, you need a second custom format to map the values back to the original labels.
Example:
* format to map unmapped labels back to original labels;
proc format;
value category
1 = 'Less than 5 weeks'
2 = '5 to 10 weeks'
3 = '10 to 15 weeks'
4 = '15 to 20 weeks'
5 = '20 to 25 weeks'
6 = 'Greater than 25 weeks'
;
* informat to unmap labels to numeric with desired freq plot order;
invalue category_to_num
'Less than 5 weeks' = 1
'5 to 10 weeks' = 2
'10 to 15 weeks' = 3
'15 to 20 weeks' = 4
'20 to 25 weeks' = 5
'Greater than 25 weeks' = 6
;
* generate sample data;
data have;
do itemid = 1 to 500;
cat_num = rantbl(123,0.05,0.35,0.25,0.15,0.07); * for demonstration purposes;
cat_char = put(cat_num, category.); * your actual category values;
output;
end;
run;
* demonstration: numeric category (unformatted) goes ascending internal order;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
run;
* demonstration: numeric category (formatted) in desired order with desired category text;
proc freq data=have;
table cat_num / plots=freqplot(scale=percent) ;
format cat_num category.;
run;
* your original plot showing character values being ordered alphabetically
* (as is expected from default order=internal);
proc freq data=have;
table cat_char / plots=freqplot(scale=percent) ;
run;
* unmap the category texts to numeric values that are ordered as desired;
data have_remap;
set have;
cat_numX = input(cat_char, category_to_num.);
run;
* table the numeric values computed during unmap, using format to display
* the desired category texts;
proc freq data=have_remap;
table cat_numX / plots=freqplot(scale=percent) ; * <-- cat_numX ;
format cat_numX category.; * <-- format ;
run;
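As a rough sketch of the SGPLOT VBAR alternative mentioned in the note above, the remapped numeric variable can also be plotted directly. This assumes SAS 9.4 (for STAT=PERCENT on VBAR) and reuses the have_remap dataset and the category. format from the example; the axis label text is only illustrative.

```sas
/* Sketch: bar chart of the remapped categories in numeric order.      */
/* Assumes have_remap and the category. format from the example above. */
proc sgplot data=have_remap;
  vbar cat_numX / stat=percent;              /* percent of observations per category */
  format cat_numX category.;                 /* show the original category text      */
  xaxis discreteorder=unformatted            /* order bars by the underlying 1-6     */
        label='Category';                    /* illustrative label                   */
run;
```

Because the bars are ordered by the unformatted values 1 to 6, the formatted labels should appear in the intended sequence rather than alphabetically.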

Calculated 'Across' Variable in PROC REPORT

I'm trying to use an alias to create multiple statistics for the same variable in PROC REPORT. This is an elaboration on a previous post I put up, but I am posting it as a separate question because the sample data has changed and the question is a bit more involved.
data have1;
input username $ betdate : datetime. stake winnings winner;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90 0
player1 04NOV2008:09:03:44 100 40 1
player2 07NOV2008:14:03:33 120 -120 0
player1 05NOV2008:09:00:00 50 15 1
player1 05NOV2008:09:05:00 30 5 1
player1 05NOV2008:09:00:05 20 10 1
player2 09NOV2008:10:05:10 10 -10 0
player2 09NOV2008:10:05:40 15 -15 0
player2 09NOV2008:10:05:45 15 -15 0
player2 09NOV2008:10:05:45 15 45 1
player2 15NOV2008:15:05:33 35 -35 0
player1 15NOV2008:15:05:33 35 15 1
player1 15NOV2008:15:05:33 35 15 1
run;
PROC PRINT; RUN;
Proc rank data=have1 ties=mean out=ranksout1 groups=2;
var stake winner;
ranks stakeRank winnerRank;
run;
PROC TABULATE DATA=ranksout1 NOSEPS;
VAR stake;
class stakerank winnerrank;
TABLE stakerank = '', winnerrank=''*stake=''*(N Mean Skewness);
RUN;
The output provided by tabulate above is what I want, although I will ultimately be adding some more calculated fields so would like to do this with PROC REPORT.
PROC REPORT DATA=ranksout1 NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / computed format=8.2;
COMPUTE discountedstake;
discountedstake = stakemean * 0.9;
ENDCOMP;
RUN;
When I start grouping the variables 'ACROSS' using commas and brackets, I can't seem to insert a calculated variable at all. It works if I only GROUP once on stakerank, but if I introduce the winnerrank grouping, it doesn't work. I get errors telling me that 'missing values were generated', and that 'stakemean is uninitialized'.
Would appreciate any tips at all. Thanks.
Maybe this helps: prepare the calculated variable in a SAS view on the detail data, then report on the view:
data ranks_view / view=ranks_view;
set ranksout1;
discountedstake = stake * 0.9;
run;
PROC REPORT DATA=ranks_view NOWINDOWS;
COLUMN stakerank winnerrank, (N stake=stakemean discountedstake);
DEFINE stakerank / GROUP '' ORDER=INTERNAL;
DEFINE winnerrank / ACROSS '' ORDER=INTERNAL;
DEFINE stake / ANALYSIS N 'Count';
DEFINE stakemean / ANALYSIS MEAN 'Mean' format=8.2;
DEFINE discountedstake / ANALYSIS MEAN format=8.2;
RUN;
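In DEFINE discountedstake / ANALYSIS MEAN format=8.2; the MEAN keyword makes the reported value the average of discountedstake. Because the mean of stake * 0.9 equals 0.9 times the mean of stake, this gives the same number as multiplying stakemean by 0.9.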
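If the value really has to be computed inside PROC REPORT, a commonly used pattern is to refer to the columns under an ACROSS variable by their absolute column names (_c2_, _c3_, and so on) in the COMPUTE block. The following is only a sketch under the assumption that winnerrank has exactly two distinct values, so that the report has seven columns; with a different number of across values the column numbers would need to change.

```sas
/* Sketch only: assumes winnerrank has exactly two distinct values,   */
/* giving columns _c1_ (stakerank) through _c7_.                      */
proc report data=ranksout1 nowindows;
  column stakerank winnerrank, (stake stake=stakemean discountedstake);
  define stakerank / group '' order=internal;
  define winnerrank / across '' order=internal;
  define stake / analysis n 'Count';
  define stakemean / analysis mean 'Mean' format=8.2;
  define discountedstake / computed 'Discounted' format=8.2;
  compute discountedstake;
    /* first winnerrank value: _c2_ = N, _c3_ = mean, _c4_ = computed */
    _c4_ = _c3_ * 0.9;
    /* second winnerrank value: _c5_ = N, _c6_ = mean, _c7_ = computed */
    _c7_ = _c6_ * 0.9;
  endcomp;
run;
```

The name stakemean cannot be used inside the COMPUTE block here because, under an ACROSS variable, there is one stakemean column per across value, which is why the original attempt reported an uninitialized variable.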

SAS: Replicate PROC MEANS output in PROC TABULATE

I would like to replicate the output of PROC MEANS using PROC TABULATE. The reason for this is that I would like to have a profit percentage (or margin) as one of the variables in the PROC MEANS output, but would like to suppress the calculation for one or more of the statistics, i.e. there will be a '-' or similar in the 'margin' row under 'N' and 'SUM'.
Here is the sample data:
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
data want;
set have;
retain margin;
margin = (winnings) / stake;
PROC PRINT; RUN;
I have been calculating statistics with PROC MEANS (like below), but the value of the SUM statistic for the 'margin' variable means nothing, so I would like to suppress it. I have therefore been attempting to replicate this table using PROC TABULATE to have more control over the output, but have been unsuccessful so far.
proc means data=want N sum mean median stddev min max maxdec=2 order=freq STACKODS;
var stake winnings margin;
run;
proc tabulate data=want;
var stake winnings margin;
table stake * (N Sum mean Median StdDev Min Max);
run;
I would appreciate any help on this.
In principle, you can't create this type of output as a default part of the TABULATE procedure; in essence, you are asking for two different table definitions. Anything you do with the SAS syntax will basically amount to adding more dimensions to the table, but it won't fix your core problem.
You can use this code to get the tables you want, but they're still different tables:
PROC TABULATE DATA=want NOSEPS;
VAR stake winnings margin;
TABLE (stake winnings),(N SUM MEAN MEDIAN STDDEV MIN MAX);
TABLE (margin),(N MEAN MEDIAN STDDEV MIN MAX);
RUN;
There are some guides out there on hacking ODS to do what you want (namely, create "stacked tables" where several child tables are assembled into a single table). Check out here for an example. If you Google "SAS stack tables" you'll find more examples.
I've done this in HTML by creating a new tagset - basically, a special ODS destination that removes spaces between tables, etc. I don't have the code that I used anymore, unfortunately; I moved to R to do automated reporting.

SAS Group by numerical value or convert to char

I have the following data:
data have;
input username $ betdate : datetime. customerCode;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 1
player1 04NOV2008:09:03:44 10
player2 07NOV2008:07:03:33 1
player2 05NOV2008:09:00:00 0.5
player3 05NOV2008:09:05:00 1
player2 07NOV2008:14:03:33 1
player1 05NOV2008:09:00:05 20
player2 07NOV2008:16:03:33 1
player2 07NOV2008:18:03:33 1
player2 09NOV2008:10:05:10 0.7
player3 15NOV2008:15:05:33 10
player3 15NOV2008:15:05:33 1
player2 15NOV2008:15:05:33 0.1
run;
PROC PRINT; RUN;
When I run the following, I don't get distinct, collapsed entries for customerCode when I group by it because it is numeric, I presume.
proc sql;
select username, customerCode from have group by 1,2;
quit;
How can I do this? I want to get a history of all the customer codes that have been assigned to a customer (i.e as they change), rather than an entry for each numeric value for customerCode. I haven't been able to convert the variable to a char value so that the grouping works:
proc sql;
create table want as
select * from have, customerCode FORMAT $10. as code;
quit;
Thanks for any help on this.
You're not getting distinct entries because your group by is being ignored: you didn't ask for any summary functions. SAS does not apply group by without a summary function (i.e., sum(something) or count(something) or whatever); instead, it converts it to order by. There's no inherent reason a numeric variable wouldn't work for grouping.
This is noted in the log with a NOTE, by the way.
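For comparison, here is a minimal sketch of the same query with a summary function added, so that the group by is actually applied. The column name n_rows is just an illustrative choice.

```sas
/* Sketch: with a summary function present, GROUP BY collapses the rows. */
proc sql;
  select username,
         customerCode,
         count(*) as n_rows     /* n_rows is an illustrative name */
  from have
  group by username, customerCode;
quit;
```

Each username/customerCode combination then appears once, along with the number of rows it had in the input.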
You can use distinct, as you suggested in the comments:
proc sql;
select distinct username, customercode from have;
quit;
That will give you a list of all username/customercode combinations.
If you want to format it, you have to remove the $. The $ in a format name does not mean "make this a character" (producing text is what all formats do); it means "the original value, before formatting, was a character value".
proc sql;
create table want as
select distinct username, customercode format=10. from have;
quit;
This won't quite work as expected, because the format is applied after the distinct is processed (and the post-decimal portion still exists, just under the hood). However, you can do:
proc sql;
create table want as
select distinct username, put(customercode,10.) as code from have;
quit;
Or you could use ROUND or something else to keep it numeric.
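A minimal sketch of the ROUND variant, assuming you want the codes collapsed to the nearest whole number (the rounding unit of 1 and the names want_round and code are assumptions for illustration):

```sas
/* Sketch: round before DISTINCT so the value itself changes, not just its display. */
proc sql;
  create table want_round as
  select distinct username,
         round(customercode, 1) as code   /* rounding unit 1 is an assumption */
  from have;
quit;
```

Unlike format=10., this changes the stored value, so distinct operates on the rounded numbers and the result stays numeric.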

SAS: PROC RANK: Use operators on variables

I have some data (created using the code below) that ranks observations according to two variables. In this case, it ranks each player's first bet and second bet and creates two 'rank' variables. What I want to do instead is rank the observations according to a function of the two variables (like the average of the two variables), and I'd like to do this in the PROC RANK step itself rather than using a preliminary data step, as the ranking will get fairly involved after I replicate this on all the variables I need. Can I put operators into the PROC RANK statement? Rather than doing this:
Proc rank data=want ties=mean out=ranked groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
I would like to do this:
Proc rank data=want ties=mean out=ranked groups=2;
var avg(bet1stake, bet2stake);
ranks firstTwoBetsRank;
run;
Is this possible?
This is how the full example data can be created.
data have;
input username $ betdate : datetime. stake winnings;
dateOnly = datepart(betdate) ;
format betdate DATETIME.;
format dateOnly ddmmyy8.;
datalines;
player1 12NOV2008:12:04:01 90 -90
player1 04NOV2008:09:03:44 100 40
player2 07NOV2008:14:03:33 120 -120
player1 05NOV2008:09:00:00 50 15
player1 05NOV2008:09:05:00 30 5
player1 05NOV2008:09:00:05 20 10
player2 09NOV2008:10:05:10 10 -10
player2 15NOV2008:15:05:33 35 -35
player1 15NOV2008:15:05:33 35 15
player1 15NOV2008:15:05:33 35 15
run;
proc sort data=have;
by username betdate;
run;
data have;
set have;
by username betdate;
retain eventTime;
if first.username then eventTime = 0;
if first.betdate then eventTime + 1;
run;
proc sql;
create table want as
select
distinct username,
(select distinct stake from have where username = main.username and eventTime = 1) as bet1Stake,
(select distinct stake from have where username = main.username and eventTime = 2) as bet2Stake
from have main;
quit;
Proc rank data=want ties=mean out=want groups=2;
var bet1stake bet2stake;
ranks bet1stakeRank bet2stakeRank;
run;
Thanks for any help on this.
I'm afraid you cannot apply operators to the variables in PROC RANK itself.
You have two choices. Either use a DATA step to apply the operators before calculating the ranking,
or
use a DATA step view or SQL view to apply the operators as an intermediate step, if you are concerned about disk space.
If you are pulling the data from a SQL database (assuming it supports windowing functions), you should be able to do exactly what you are trying to do with some SQL code that is passed through to the database.
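The view-based option can be sketched as follows, using the want dataset from the question. The names want_avg and firstTwoBetsAvg are illustrative; MEAN() is used for the average because it ignores missing values, which may or may not be what you need when a player has only one bet.

```sas
/* Sketch: a DATA step view computes the average without writing a new dataset. */
data want_avg / view=want_avg;
  set want;
  firstTwoBetsAvg = mean(bet1stake, bet2stake);  /* ignores missing arguments */
run;

/* Rank on the derived variable; the view is evaluated when PROC RANK reads it. */
proc rank data=want_avg ties=mean out=ranked groups=2;
  var firstTwoBetsAvg;
  ranks firstTwoBetsRank;
run;
```

Since the view stores only the program and not the data, adding more derived variables for further rankings does not take extra disk space.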