I have a problem changing the x-axis value labels in PROC SGPLOT (see graph). I want the x value 0 to display as 'female' and 1 to display as 'male'. What should I do?
Many thanks in advance!
This is a box plot. The x-axis label is gender, and I want the value labels to display 'female' instead of '0' and 'male' instead of '1'.
You should change the value before creating your GPLOT.
Following this example :
First prepare your data :
data work.classtemp (drop=name);
  length Gender $ 6;
  set sashelp.class;
  /* Recode the one-letter code into a readable label */
  if sex="F" then Gender="Female";
  else Gender="Male";
run;
proc sort data=work.classtemp out=work.class;
  by weight height;
run;
Then define the legend labels:
legend1 label=none value=("Male" "Female") position=(right middle outside);
legend2 label=none value=("Male" "Female");
In your case you have to prepare your data before the GPLOT by changing the 1 to Male and 0 to Female.
With something like this :
data want;
  set mydata;
  length Gender $ 6;
  if value=0 then Gender="Female";
  else Gender="Male";
run;
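Since the question is about PROC SGPLOT, a user-defined format is another way to get the same display without recoding the data. This is only a minimal sketch: the data set name mydata, the 0/1 variable gender, the analysis variable score, and the format name genderfmt are all assumed names, not taken from the original post.

```sas
/* Assumed names: mydata, gender (0/1), score, genderfmt */
proc format;
  value genderfmt 0 = 'Female'
                  1 = 'Male';
run;

proc sgplot data=mydata;
  vbox score / category=gender;   /* one box per gender value */
  format gender genderfmt.;       /* axis shows Female/Male instead of 0/1 */
run;
```

The stored values stay 0 and 1; only the displayed tick labels change.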
I have a very big SAS dataset with over 280 variables in it, and I need to retrieve all the completely NULL columns based on a variable value. For example, I have a variable called Reported (with only the values Yes and No) in this dataset, and I want to find, for the value No, all the columns that are completely NULL.
Is there a quick way to find this out without writing out all the column names?
So for example if I have 4 Variables in the table,
So based on the above table I would like to see the output like this where Var4='No' and only return the columns with all the missing values
This would help me to identify variables which are not being populated at all where the Var4 value is 'No'
Note the WHERE statement in PROC FREQ.
proc format;
  /* Map every missing value to blank and every non-missing value to '1' */
  value $_xmiss_(default=1 min=1 max=1) ' ' =' ' other='1';
  value _xmiss_(default=1 min=1 max=1) ._-.Z=' ' other='1';
run;
%let data=sashelp.heart;
proc freq data=&data nlevels;
  where status eq: 'A';
  ods select nlevels;
  ods output nlevels=nlevels;
  format _character_ $_xmiss_. _numeric_ _xmiss_.;
run;
data nlevels;
  set nlevels;
run;
There are 2 parts to the question I think. First is to subset records where Reported = "N". Then among those records, report columns that have all missing values. If this is correct then you could do something as follows (I am assuming that the columns with missing values are all numeric. If not, this approach will need a slight modification):
/* Thanks to REEZA for pointing out this way of getting the freqs. This eliminates some constraints and is more efficient */
proc freq data=have nlevels;
  where var1 = "N";
  ods output nlevels = freqs;
  table _all_;
run;
proc sql noprint;
  /* Variables with no non-missing levels are entirely missing */
  select TableVar into :cols separated by " " from freqs where NNonMissLevels = 0;
quit;
%put &cols;
data want;
  set have (keep = &cols var1);
  where var1 = "N";
run;
I would like to turn the following long dataset:
data test;
  input Id Injury $;
  datalines;
1 Ankle
1 Shoulder
2 Ankle
2 Head
3 Head
3 Shoulder
;
run;
Into a wide dataset that looks like this:
ID Ankle Shoulder Head
1 1 1 0
2 1 0 1
3 0 1 1
This answer seemed the most relevant but was falling over at the proc freq stage (my real dataset is around 1 million records, and has around 30 injury types):
Creating dummy variables from multiple strings in the same row
Additional help: https://communities.sas.com/t5/SAS-Statistical-Procedures/Possible-to-create-dummy-variables-with-proc-transpose/td-p/235140
Thanks for the help!
Here's a basic method that should work easily, even with several million records.
First you sort the data, then add a count variable that is always 1. Next you use PROC TRANSPOSE to flip the data from long to wide, and finally you fill in the missing values with 0. This is a fully dynamic method: it doesn't matter how many different Injury types you have or how many records there are per person. There are other methods that probably need less code, but I think this one is simple and easy to understand and modify if required.
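data test;
  input Id Injury $;
  datalines;
1 Ankle
1 Shoulder
2 Ankle
2 Head
3 Head
3 Shoulder
;
run;
proc sort data=test;
  by id injury;
run;
data test2;
  set test;
  count=1;   /* indicator value that PROC TRANSPOSE spreads across columns */
run;
proc transpose data=test2 out=want prefix=Injury_;
  by id;
  var count;
  id injury;
  idlabel injury;
run;
data want;
  set want;
  array inj(*) injury_:;
  /* Injuries a person never had come out missing, so set them to 0 */
  do i=1 to dim(inj);
    if inj(i)=. then inj(i) = 0;
  end;
  drop _name_ i;
run;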
Here's a solution involving only two steps... Just make sure your data is sorted by id first (the injury column doesn't need to be sorted).
First, create a macro variable containing the list of injuries
proc sql noprint;
  select distinct injury
    into :injuries separated by " "
    from have
    order by injury;
quit;
Then, let RETAIN do the magic -- no transposition needed!
data want(drop=i injury);
  set have;
  by id;
  format &injuries 1.;
  retain &injuries;
  array injuries(*) &injuries;
  /* Reset all indicators at the start of each id */
  if first.id then do i = 1 to dim(injuries);
    injuries(i) = 0;
  end;
  /* Flag the indicator whose name matches the current injury */
  do i = 1 to dim(injuries);
    if injury = scan("&injuries",i) then injuries(i) = 1;
  end;
  if last.id then output;
run;
Following OP's question in the comments, here's how we could use codes and labels for injuries. It could be done directly in the last data step with a label statement, but to minimize hard-coding, I'll assume the labels are entered into a sas dataset.
1 - Define Labels:
data myLabels;
  infile datalines dlm="|" truncover;
  /* labl widened from $24. so the longer label is not truncated */
  informat injury $12. labl $40.;
  input injury labl;
  datalines;
S460|Acute meniscal tear, medial
S520|Head trauma
;
run;
2 - Add a new query to the existing proc sql step to prepare the label assignment.
proc sql noprint;
  /* Existing query */
  select distinct injury
    into :injuries separated by " "
    from have
    order by injury;
  /* New query */
  select catx("=",injury,quote(trim(labl)))
    into :labls separated by " "
    from myLabels;
quit;
3 - Then, at the end of the data want step, just add a label statement.
data want(drop=i injury);
  set have;
  by id;
  /* ...same as before... */
  * Add labels;
  label &labls;
run;
And that should do it!
So, I have a list of counties and I'd like to find the under 18 population as a percent of the population for each county, so as an example from the table above I'd like to add only the population of agegrp 1 and 2 and divide by the 'all' population. In this case it would be 300/400. I'm wondering if this can be done for every county.
Let's call your SAS data set "HAVE" and say it has two character variables (County and AgeGrp) and one numeric variable (Population). And let's say you always have one observation in your data set for each County with AgeGrp='All', on which the value of Population is the total for the county.
To be safe, let's sort the data set by County and process it in another data step, creating a new data set named "WANT" with new variables for the county population (TOT_POP), the sum of the two Age Group values you want (TOT_GRP), and the calculated proportion (AgeGrpPct):
proc sort data=HAVE;
  by County;
run;
data WANT;
  retain TOT_POP TOT_GRP 0;
  set HAVE;
  by County;
  /* Reset the totals at the start of each county */
  if first.County then do;
    TOT_POP = 0;
    TOT_GRP = 0;
  end;
  if AgeGrp in ('1','2') then TOT_GRP + Population;
  else if AgeGrp = 'All' then TOT_POP = Population;
  /* Keep one row per county */
  if last.County;
  AgeGrpPct = TOT_GRP / TOT_POP;
  keep County TOT_POP TOT_GRP AgeGrpPct;
run;
Notice that the observation containing AgeGrp='All' is not really needed; you could just as well have created another variable to collect a running total for all age groups.
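The running-total variant mentioned above could look roughly like this. It is a sketch under two assumptions: HAVE is already sorted by County, and the age groups other than 'All' add up to the whole county population.

```sas
data WANT;
  set HAVE;
  where AgeGrp ne 'All';          /* ignore the summary row */
  by County;
  if first.County then do;
    TOT_POP = 0;
    TOT_GRP = 0;
  end;
  TOT_POP + Population;                              /* running total, all age groups */
  if AgeGrp in ('1','2') then TOT_GRP + Population;  /* running total, under 18 only */
  if last.County;                                    /* keep one row per county */
  AgeGrpPct = TOT_GRP / TOT_POP;
  keep County TOT_POP TOT_GRP AgeGrpPct;
run;
```

The sum statements retain TOT_POP and TOT_GRP across observations, so no separate RETAIN statement is needed here.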
If you want a procedural approach, create a format for the under 18's, then use PROC FREQ to calculate the percentage. It is necessary to exclude the 'All' values from the dataset with this method (it's generally bad practice to include summary rows in the source data).
PROC TABULATE could also be used for this.
data have;
  input County $ AgeGrp $ Population;
  datalines;
A 1 200
A 2 100
A 3 100
A All 400
B 1 200
B 2 300
B 3 500
B All 1000
;
run;
proc format;
  value $age_fmt '1','2' = '<18'
                 other = '18+';
run;
proc sort data=have;
  by county;
run;
/* WEIGHT makes PROC FREQ total Population instead of counting rows */
proc freq data=have (where=(agegrp ne 'All')) noprint;
  by county;
  table agegrp / out=want (drop=COUNT where=(agegrp in ('1','2')));
  format agegrp $age_fmt.;
  weight population;
run;
I have two cross-tabs being output in SAS: one for Time0 and one for Time1. I am interested in comparing the values in each cell of the first crosstab with those in the second.
Is there a clever way to change the background colour of a cell based on a comparison with an equivalent cell in another cross-tab? If not, and I create a variable with the change in the variable between Time0 and Time1, how can I change the cell colour of the crosstab depending on whether a value is positive or negative? Is it possible to put a colour gradient in increments of 5% if the cell contains a percentage change?
I have some sample data as follows:
data have;
  input username $ betdate : datetime. stake;
  dateOnly = datepart(betdate);
  format betdate DATETIME.;
  format dateOnly ddmmyy8.;
  datalines;
player1 12NOV2008:12:04:01 90
player1 04NOV2008:09:03:44 30
player2 07NOV2008:14:03:33 120
player1 05NOV2008:09:00:00 50
player1 05NOV2008:09:05:00 30
player1 05NOV2008:09:00:05 20
player2 09NOV2008:10:05:10 10
player2 15NOV2008:15:05:33 35
player1 15NOV2008:15:05:33 35
player1 15NOV2008:15:05:33 35
;
run;
proc sort data=have; by username betdate; run;
data have;
  set have;
  by username dateOnly betdate;
  retain eventTime;
  /* Number each distinct bet time within a player */
  if first.username then eventTime = 0;
  if first.betdate then eventTime + 1;
run;
proc sql;
  create table playerStats as
  select distinct username,
    (select distinct avg(stake) from have where username = main.username and eventTime <= 1) format comma10.2 as bet1AvgStake,
    (select distinct avg(stake) from have where username = main.username and eventTime <= 2) format comma10.2 as bet2AvgStake,
    (select distinct avg(stake) from have where username = main.username and eventTime <= 3) format comma10.2 as bet3AvgStake
  from have main;
quit;
proc rank data=playerStats ties=mean out=customerStats groups=2;
  var bet1AvgStake bet2AvgStake;
  ranks bet1AvgStakeRank bet2AvgStakeRank;
run;
proc tabulate data=customerStats;
  VAR bet1AvgStake bet2AvgStake;
  class bet1AvgStakeRank;
  TABLE bet1AvgStakeRank, bet1AvgStake*(N Mean);
  TABLE bet1AvgStakeRank, bet2AvgStake*(N Mean);
run;
I would like to see a red cell when the value in each cell in the second crosstab is lower than the equivalent cell in the first and a green cell when the value is higher.
Thanks for any help on this.
I don't think you can do all that in a single proc, but you certainly can do part 2 if I understand properly. It's called "Traffic Lighting" more generally, to help with googling for more detailed information; for example, this paper has some examples of how to do so.
Generally, the concept is that you create a format, the label of which is a color:
proc format;
  value betfmt
    low - -5 = 'red'
    -5 <-< 0 = 'lightred'
    0 - 5    = 'lightgreen'
    5 <- high = 'green'; *or hex values like 'cxFF0099';
run;
Then use that format in the proc tabulate:
proc tabulate data=yourdata;
  var bets;
  /* The style override is attached to the data cells, not to the table as a whole */
  tables bets*[style=[background=betfmt.]];
run;
It does need to be based on the current cell, though; you can't calculate based on another cell without using PROC REPORT.
I have the following, but I wish to control the order in which the data is displayed. Instead of displaying the bars in the order of A, B, C, D, E, F, I wish to display the bars based on a user-specified ordering. For example, I would like to be able to assign in a SAS dataset a value to a variable named rank that will control the order in which the bars are stacked.
How can I do this?
%let name=ex_17;
%let myfont=Albany AMT;
goptions reset=all;
goptions reset=(global goptions);
/*GOPTIONS DEVICE=png xpixels=800 ypixels=400;*/
goptions gunit=pct border cback=white colors=(blacks) ctext=black
htitle=4 htext=3.0 ftitle="&myfont" ftext="&myfont";
data mileage;
  length factor $ 24;
  input factor $ level $ value;
  datalines;
C left -38.882
C right 39.068
D right 38.99
D left -38.97
E right 38.982
E left -38.975
F left -38.973
F right 38.979
B left -38.975
B right 38.975
A right 38.977
A left -38.973
;
run;
/* base case: 38.975 */
data mileage;
  set mileage;
  if level="right" then value = value - 38.975;
  if level="left" then value = -1*(38.975 - value*-1);
run;
data convert;
  set mileage;
  *if level='left' then value=-value;
run;
proc format;
  picture posval low-high='000,009';
run;
data anlabels(drop=factor level value);
  length text $ 24;
  retain function 'label' when 'a' xsys ysys '2' hsys '3' size 2;
  set convert;
  midpoint=factor; subgroup=level;
  *text=left(put(value, BEST6.3));
  if level ='left' then position='>';
  else position='<';
  output;
run;
title1 'Sensitivity Analysis graph';
*footnote1 justify=left ' SAS/GRAPH' move=(+0,+.5) 'a9'x move=(+0,-.5) ' Software'
justify=right 'DRIVER ';
*title2 'by Daniel Underwood' h=3.0;
footnote1 'Estimates accurate within +/- 0.002';
*axis1 label=(justify=left 'Disutility') style=0 color=black;
axis1 label=(justify=left '') style=0 color=black;
*axis2 label=none value=(tick=3 '') minor=none major=none
width=3 order=(-10000 to 20000 by 10000) color=black;
axis2 label=none minor=none major=none value=(tick=3 '')
width=3 order=(-0.093 to 0.093 by 0.186) color=black;
pattern1 value=solid color=ltgray;
pattern2 value=solid color=ltgray;
goptions vpos=25;
goptions vsize=5in;
proc gchart data=convert;
  format value BEST6.3;
  note move=(40,90) height=3 'Women' move=(+12,+0) 'Men';
  hbar factor / sumvar=value discrete nostat subgroup=level
    maxis=axis1 raxis=axis2 nolegend annotate=anlabels
    coutline=same des='' space=2;
run;
quit;
The order of values displayed is controlled by the ORDER= option on either an AXIS statement (to order midpoints or the chart variable) or a LEGEND statement (to order values of a sub-group variable).
If you are asking for a way to use a variable named RANK to control the order for sub-group variables, here is a SAS sample program that does exactly that.
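For the midpoint (bar) order, a minimal sketch of the ORDER= approach is to list the FACTOR values explicitly on the AXIS statement that MAXIS= points to. The particular order shown here is only an assumed example; substitute whatever ordering you need.

```sas
/* Assumed example order: bars appear in the order listed */
axis1 label=(justify=left '') style=0 color=black
      order=('C' 'D' 'E' 'F' 'B' 'A');
```

This definition would replace the existing AXIS1 statement in the question's code, since HBAR already references it through MAXIS=AXIS1.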