I want to display values in the PERCENT3.1 style, but without multiplying them by a hundred, as they are already computed as percentages.
I tried this:
proc format;
picture abc low-high='000.0%';
run;
and then using abc.
It shows values greater than 1 as desired (e.g. 38.12 as 38.1%), but values less than 1 come out wrong: 0.92 shows as 9%.
Are there any other methods to do this?
Change your PICTURE statement to have non-zero digit selectors on both the left- and right-hand sides of the decimal point. You will also want rounding to occur before the value is rendered by the picture.
Example:
proc format;
picture abc (round) low-high='009.9 %';
run;
data _null_;
input x @@;
put x= @10 x= abc.;
datalines;
100 38 38.12 3.14 3.19 3 0.92 0
;
----- LOG -----
x=100 x=100.0 %
x=38 x=38.0 %
x=38.12 x=38.1 %
x=3.14 x=3.1 %
x=3.19 x=3.2 %
x=3 x=3.0 %
x=0.92 x=0.9 %
x=0 x=0.0 %
I have the following SAS PROC MEANS statement that works great as it is.
proc means data=MBA_NODUP_APPLICANT_&TERM. missing nmiss n mean median p10 p90 fw = 8;
where ENR = 1;
by SRC_TYPE;
var gmattotal greverb2 grequant2 greanwrt;
run;
However, I am trying to add a new variable calculating nmiss/(nmiss+n). I don't see any examples of this online, but also nothing that says it cannot be done.
To calculate the percent missing, which is what your formula computes, just use the OUTPUT statement to generate a dataset with the NMISS and N values. Then add a step to do the arithmetic yourself.
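A minimal sketch of that approach, assuming the sashelp.cars example used further down (the output dataset and variable names here are just illustrative):
proc means data=sashelp.cars noprint;
  var cylinders;
  output out=stats nmiss=n_miss n=n_nonmiss;
run;
data stats;
  set stats;
  pct_missing = n_miss / (n_miss + n_nonmiss);  * your nmiss/(nmiss+n) formula;
run;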
Or you could create a new binary variable using the MISSING() function and take the MEAN of that. The mean of a 1/0 variable is the same as the percent that were 1 (TRUE).
Example:
data test;
set sashelp.cars;
missing_cylinders=missing(cylinders);
run;
proc means data=test nmiss n mean;
var cylinders missing_cylinders ;
run;
So 2/428 is a little less than 0.5%.
The MEANS Procedure

                          N
Variable               Miss      N          Mean
------------------------------------------------
Cylinders                 2    426     5.8075117
missing_cylinders         0    428     0.0046729
------------------------------------------------
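If you want that proportion displayed as a percent, a small sketch (the dataset name pct is just illustrative) is to attach a PERCENT format:
data pct;
  pct_missing = 2 / (2 + 426);      * nmiss/(nmiss+n) = 0.0046729;
  format pct_missing percent8.2;    * prints as 0.47%;
run;
proc print data=pct;
run;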
Let's say I have stores all around the world, and I want to know my top losses (sales amounts) per store. What is the code for that?
Here is my try:
proc sort data= store out=sorted_store;
by store descending amount;
run;
and
data calc1;
do _n_=1 by 1 until(last.store);
set sorted_store;
by store;
if _n_ <= 5 then "Sum_5Largest_Losses"n=sum(amount);
end;
run;
but this just prints out the 5th amount and not the sum of 1 to 5! And I really don't know how to select the top 5 of EACH store. I think a kind of group-by would be a perfect fit. But first things first: how do I select i = 1 to 5, and not just i = 5?
There is also a way of doing it with PROC SQL:
data have;
input store$ amount;
datalines;
A 100
A 200
A 300
A 400
A 500
A 600
A 700
B 1000
B 1100
C 1200
C 1300
C 1400
D 600
D 700
E 1000
E 1100
F 1200
;
run;
proc sql outobs=4; /* limit output to the first 4 rows */
select store, sum(amount) as TOTAL_AMT
from have
group by 1
order by 2 desc; /* order them to the TOP selection*/
quit;
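If you specifically want the sum of the five largest amounts within each store in PROC SQL, one possible sketch (using the same have data; exact ties on amount would need extra handling) is a correlated subquery:
proc sql;
  create table sum_top5 as
  select store, sum(amount) as Sum_5Largest_Losses
  from have as a
  where (select count(*)
         from have as b
         where b.store = a.store and b.amount > a.amount) < 5
  group by store;
quit;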
The data step SUM() function adds up its arguments. If you only give it one argument, there is nothing to accumulate with, so it just returns that value.
data calc1;
  do _n_=1 by 1 until(last.store);
    set sorted_store;
    by store;
    if _n_ <= 5 then Sum_5Largest_Losses=sum(Sum_5Largest_Losses,amount);
  end;
run;
I would highly recommend learning the basic methods before getting into DOW loops:
1. Add a counter so you can find the first 5 of each store.
2. As the data step loops, the sum accumulates.
3. Output the sum when counter=5.
proc sort data= store out=sorted_store;
by store descending amount;
run;
data calc1;
  set sorted_store;
  by store;
  retain total_sum;   *retain the running total across observations;
  *if first store then set counter to 1 and total sum to 0;
  if first.store then do;
    counter=1;
    total_sum=0;
  end;
  *otherwise increment the counter;
  else counter+1;
  *accumulate the sum if counter <= 5;
  if counter <= 5 then total_sum = sum(total_sum, amount);
  *output only when on the 5th record for each store;
  if counter=5 then output;
run;
I had posted this earlier, and got help on it. My interest was piqued, and I ventured into this a little further to see what I could do with it. I am fascinated with simulations, but am just an average SAS programmer. I wonder if somebody might help here.
data out;
  call streaminit(7);                   *seed better random number engine;
  do pointvar = 1 by 1 until (outs=27); *iterate starting at 1 and stop when 27 outs;
    randvar = rand('Uniform');          *better random number engine;
    if pointvar > 9 then pointvar=1;    *reset to 1 if over 9;
    set in point=pointvar;              *pull the row we need;
    if randvar < cutoff then do;
      outs+1;
      outs_inning+1;
    end;
    output;
    if outs_inning=3 then outs_inning=0;
  end;
  stop;
run;
The data set in has just one observation for each of the 9 hitters:
.73
.75
.72
.78
.81
.69
.74
.72
.75
With the help of Joe and others, the above did what I wanted, which was to simulate primarily the counting of outs involved in ONE baseball game.
I have been playing around with this (to no avail), trying to get it to repeat a game, so to speak, where it would start at the top of the lineup after 27 outs. So for what I have right now, assume the 27th out is achieved with the 5th batter. I would like to put this whole code inside a loop where it starts the process again at the beginning of the data set (1st observation, i.e., first batter).
So, assume I want to complete 3 iterations here. 3 games of 27 outs. Is there a way to do this? I tried doing the following.
%macro replicate(new,out,n) / des='&new is &out repeated &n times';
data &new;
  %do i=1 %to &n;
    set &out;
    output;
  %end;
run;
%mend;
%replicate(new,out,3);
proc print;
run;
I was hoping I could do this with a DO statement, but the problem is that it reads each observation 3 times. With do i=1 to 3 followed by set out, it takes the first observation from data set out three times, then the second observation three times, and so on.
i.e.
Outs  randvar  cutoff  outs_inning
0     0.84     0.73    0
0     0.84     0.73    0
0     0.84     0.73    0
1     0.61     0.75    0
1     0.61     0.75    0
1     0.61     0.75    0
Can anybody help? I appreciate that this is a little outside the realm of what is typically discussed here, but a few of my students are also interested in simulations, and a baseball example has certainly caught their attention. It has become a fun problem. Thanks for getting me this far.
You don't need a macro. You should be able to add an outer DO loop which is do game=1 to 3;
Below I changed the variable POINTVAR to be BATTER, and added a PUT statement to write messages to the log.
data in;
  input cutoff @@;
cards;
.73 .75 .72 .78 .81 .69 .74 .72 .75
;

data play;
  call streaminit(7);
  do game=1 to 3;
    outs=0;
    outs_inning=0;
    do batter = 1 by 1 until (outs=27);
      randvar = rand('Uniform');
      if batter > 9 then batter=1;
      set in point=batter;
      if randvar < cutoff then do;
        outs+1;
        outs_inning+1;
      end;
      output;
      put (game batter cutoff randvar outs_inning outs)(=);
      if outs_inning=3 then outs_inning=0;
    end;
  end;
  stop;
run;
I have a dataset in which some cells are filled with 888888888 or 999999999. I would like to compute the mean without considering these values. That is, with
x=5, y=10, z=888888888
the mean should be 5.
How can I fix this?
As you're calculating across variables, just store them in an array, loop through them and sum any that are less than the required threshold (I've used 100,000,000), then divide by the total number of variables to get the mean.
data have;
input x y z;
datalines;
5 10 888888888
4 20 999999999
;
run;
data want;
set have;
array vars{*} x y z;
_sum=0;
do _i = 1 to dim(vars);
if vars{_i}<1e8 then _sum+vars{_i};
end;
mean_vars = _sum/dim(vars);
drop _: ;
run;
I would like to assign IDs with blank Sizes a size based on the frequency distribution of their Group.
Dataset A contains a snapshot of my data:
ID Group Size
1 A Large
2 B Small
3 C Small
5 D Medium
6 C Large
7 B Medium
8 B -
Dataset B shows the frequency distribution of the Sizes among the Groups:
Group Small Medium Large
A 0.31 0.25 0.44
B 0.43 0.22 0.35
C 0.10 0.13 0.78
D 0.29 0.27 0.44
For ID 8, we know that it has a 43% probability of being "small", a 22% probability of being "medium" and a 35% probability of being "large". That's because these are the Size distributions for Group B.
How do I assign ID 8 (and other blank IDs) a Size based on the Group distributions in Dataset B? I'm using SAS 9.4. Macros, SQL, anything is welcome!
The 'Table' distribution in RAND is ideal for this. The last data step here shows that; before it, I set things up to create random data and determine the frequency table, so you can skip those steps if you already have them.
See Rick Wicklin's blog about simulating multinomial data for an example of this in other use cases (and more information about the function).
*Setting this up to help generate random data;
proc format;
value sizef
low - 1.3 = 'Small'
1.3 <-<2.3 = 'Medium'
2.3 - high = 'Large'
;
quit;
*Generating random data;
data have;
call streaminit(7);
do id = 1 to 1e5;
group = byte(65+rand('Uniform')*4); *A = 65, B = 66, etc.;
size = put((rank(group)-66)*0.5 + rand('Uniform')*3,sizef.); *Intentionally making size somewhat linked to group to allow for differences in the frequency;
if rand('Uniform') < 0.05 then call missing(size); *A separate call to set missingness;
output;
end;
run;
proc sort data=have;
by group;
run;
title "Initial frequency of size by group";
proc freq data=have;
by group;
tables size/list out=freq_size;
run;
title;
*Transpose to one row per group, needed for table distribution;
proc transpose data=freq_size out=table_size prefix=pct_;
var percent;
id size;
by group;
run;
data want;
merge have table_size;
by group;
array pcts pct_:; *convenience array;
if first.group then do _i = 1 to dim(pcts); *must divide by 100 but only once!;
pcts[_i] = pcts[_i]/100;
end;
if missing(size) then do;
size_new = rand('table',of pcts[*]); *table uses the pcts[] array to tell SAS the table of probabilities;
size = scan(vname(pcts[size_new]),2,'_');
end;
run;
title "Final frequency of size by group";
proc freq data=want;
by group;
tables size/list;
run;
title;
You can also do this with a random value and some if-else logic:
proc sql;
create table temp_assigned as select
a.*, rand("Uniform") as random_roll, /*generate a random number from 0 to 1*/
case when missing(size) then
case when calculated random_roll < small then small
when calculated random_roll < sum(small, medium) then medium
when calculated random_roll < sum(small, medium, large) then large
end end as value_selected, /*pick the value of the size associated with that value in each group*/
coalesce(case when calculated value_selected = small then "Small"
when calculated value_selected = medium then "Medium"
when calculated value_selected = large then "Large" end, size) as group_assigned /*pick the value associated with that size*/
from temp as a
left join freqs as b
on a.group = b.group;
quit;
Obviously you can do this without creating the value_selected variable, but I thought showing it for demonstrative purposes would be helpful.
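For reference, a minimal sketch of that simplified form, folding the logic into one CASE expression (same assumed temp and freqs tables as above):
proc sql;
  create table temp_assigned as select
    a.*,
    rand("Uniform") as random_roll,
    case when not missing(size)                           then size
         when calculated random_roll < small              then "Small"
         when calculated random_roll < sum(small, medium) then "Medium"
         else                                                  "Large"
    end as group_assigned
  from temp as a
  left join freqs as b
    on a.group = b.group;
quit;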