Add a new column based on other columns in SAS

I'm new to SAS and would like help with the following question.
The sample table is shown below:
Time Color Food label
2020 red Apple A
2019 red Orange A,B
2018 blue Apple A,B
2017 blue Orange B
The logic that determines the label is:
when color = 'red' then 'A'
when color = 'blue' then 'B'
when food = 'orange' then 'B'
when food = 'apple' then 'A'
Row 2 has both red and orange, so its label should contain both: 'A,B'. The same applies to row 3.
The requirement is to print the label for each combination. I know a CASE WHEN expression can define the label from color and food. Here there are only 2 colors and 2 foods, but with, say, 7 colors and 10 foods there would be 7*10 = 70 combinations, and I don't want to list all of them in a CASE WHEN expression.
Is there a convenient way to return the label? Thanks for any ideas! (I would prefer PROC SQL, but other SAS approaches are also welcome.)

This looks like a simple application of formats. Define one format that converts COLOR to a code letter and a second that converts FOOD to a code letter. Because both variables are character, the format names need the $ prefix.
proc format ;
value $color 'red'='A' 'blue'='B';
value $food 'Apple'='A' 'Orange'='B' ;
run;
Then use those formats to convert the actual values of the COLOR and FOOD variables into labels. Either in a data step:
data want;
set have ;
length label $5 ;
label=catx(',',put(color,$color.),put(food,$food.));
run;
Or in an SQL query:
proc sql ;
create table want as
select *
, catx(',',put(color,$color.),put(food,$food.)) as label length=5
from have
;
quit;
You do not need to re-create the format if the data changes, only if the list of possible values changes.
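One caveat: CATX simply joins the two codes in the order given, so with the sample data a red Apple row would come out as 'A,A' and a blue Apple row as 'B,A', whereas the question's table shows 'A' and 'A,B'. If each letter should appear once and in alphabetical order, a data step along the following lines is one possible sketch. It assumes the $color and $food character formats have been defined as described, that every code is a single letter, and that HAVE does not already contain a LABEL column; c1 and c2 are hypothetical helper variables.

```sas
data want;
  set have;
  length label $5 c1 c2 $1;
  c1 = put(color, $color.);   /* code letter derived from COLOR */
  c2 = put(food,  $food.);    /* code letter derived from FOOD  */
  call sortc(c1, c2);         /* put the two codes in alphabetical order */
  if c1 = c2 then label = c1; /* same letter twice: keep it once */
  else label = catx(',', c1, c2);
  drop c1 c2;
run;
```

With more than two source columns, the same idea would need one helper variable per column and a slightly more general de-duplication step.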

Related

display only variable names in stacked correlation matrix in SAS

In SAS, is there a way to display the variable label instead of the variable name in a stacked correlation matrix? Specifically in the row that goes across at the top of the matrix? I'm applying a template that modifies base.corr.stackedmatrix, changing the color of significant p-values to red, and I know using RowLabel for the column displays the variable label. I can't figure out how to display the label for the row of variable names so only the variable labels are displayed.
proc format;
value pvalsig low-.05 ="red" .05-high="black";
run;
proc template;
edit base.corr.stackedmatrix;
column (RowLabel) (Matrix) * (Matrix2) * (Matrix3) * (Matrix4);
edit matrix2;
style={foreground=pvalsig.};
end;
end;
run;

Counting matching character values

I'm new to SAS and I have a basic question about counting the number of matching values in a column.
For example, if I have a variable called hair_color, and the different values are "brown", "black", "blonde", and "red" - I want to be able to produce a table that shows my database has 45 people with brown hair, 43 with black hair, 23 with blonde, etc.
I've written:
proc freq data=fake_dataset;
tables hair_color;
where hair_color="brown" "black" "blonde" "red";
run;
This code only runs if I have one value in the 'where' statement, for example, only "brown". How can I create a table of counts for all four hair colors?
PROC FREQ creates a bin for each distinct value of hair_color, so a plain TABLES statement already counts every color. If the data contains more than the four colors you listed and you want counts for only those four, use the IN operator in the WHERE statement:
proc freq data=fake_dataset;
tables hair_color;
where hair_color in ("brown", "black", "blonde", "red");
run;
The commas separating the values list in the IN clause are optional.
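As a small illustration of that last point, the sketch below writes the same filter without commas and also saves the counts to a dataset through the OUT= option of the TABLES statement. The output dataset name hair_counts is a hypothetical choice; fake_dataset and hair_color are taken from the question.

```sas
proc freq data=fake_dataset;
  /* values in the IN list are separated by blanks instead of commas */
  where hair_color in ("brown" "black" "blonde" "red");
  /* OUT= stores one row per hair color with COUNT and PERCENT columns */
  tables hair_color / out=hair_counts;
run;
```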

By group controlling line colors/where clause

I want to plot Y by X, grouped by year, but color-code each year based on a different variable (dry). Each year should show as a separate line, with dry=1 years in one color and dry=0 years in another. I actually figured out one option (yeah!), shown below, but it doesn't give me much control.
Is there a way to put a where clause in the series statement to select specific categories so that I can assign a specific color (or other attribute)? Or is there another way? This would be analogous to R, where one can use multiple line statements for different subsets of data.
Thanks!!
This code works.
proc sgplot data = tmp;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Use an attribute map; see the documentation.
You can use the DRY variable to set the specific colours. For each year, assign the colour using the DRY variable in a data step.
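/* keep one row per year (assumes DRY is constant within a year) */
proc sort data=tmp out=attr_data nodupkey; by year; run;
data attrs;
set attr_data;
length value $12 linecolor $8;
id='year';
/* VALUE is the group value each row applies to; CATS turns a numeric year into text */
value=cats(year);
if dry=0 then linecolor='green';
if dry=1 then linecolor='red';
keep id value linecolor;
run;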
Then add dattrmap=attrs to the PROC SGPLOT statement and attrid=year to the SERIES statement options.
ods graphics / attrpriority=none;
proc sgplot data = tmp dattrmap=attrs;
where microsite_id = "&msit";
by microsite_id ;
yaxis label= "Pct. Stakes" values = (0 to 100 by 20);
xaxis label= 'Date' values = (121 to 288 by 15);
series y=tpctwett x=jday / markers markerattrs=(symbol=plus) group = year grouplc=dry groupmc=dry attrid=year;
format jday tadjday metajday jdyfmt.;
label tpctwett='%surface water' tadval1='breed' metaval1='meta';
run;
Note that I tested and edited this post so it should work now.

SAS, exclude any value that is flagged x in another column unless it is also flagged y

I have multiple values in a flag column. I want to eliminate any IDs that have been flagged blue unless they have also been flagged red somewhere else.
Id flag
a red
b blue
c red
d red
e blue
a blue
result:
ID
a
c
d
thank you
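proc sql;
select id
from dataset
group by id
/* keep an ID if it has any red flag, or if it has no blue flag at all */
having max(flag = 'red') = 1 or max(flag = 'blue') = 0;
quit;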

Different number formats for axis value labels and bar labels in PROC GCHART

Is there any way to specify formats directly for axis values and data labels? As far as I can tell, it uses whatever format is applied to the dependent variable.
Example:
data sample;
input group $ number;
format number dollar6.1;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
axis1 minor=none order=0 to 60 by 10;
proc gchart data=sample;
vbar group/ type=sum sumvar=number sum levels=all raxis=axis1;
run;
If I set the format to dollar6.1, the axis labels have an unnecessary decimal (0.0, 10.0, 20.0, etc.).
But if I set the format to dollar6.0, the labels on the tops of the bars are missing the decimal that I would like to show.
Is there any way to specify formats independently for each of these?
I don't believe you can control the formats separately; you have limited kinds of control as far as time axis, log axis, etc., but otherwise no control over the numeric format.
What you can do is one of two things. First, at least in SGPLOT, you can create a second variable with a different format and produce an empty graph (or an identical copy of your bar chart but with no labels) using the variable formatted the way you want the axis formatted; then produce the chart with the other, differently formatted variable.
Secondly, you can assign explicit values to the axis. Rather than using the automatic values arising from your data, you can use VALUE= to overwrite the labels placed on the tick marks. This isn't optimal if you have a varying axis (i.e., you produce twenty of these charts with different axis ranges), but if it's a fixed axis you can probably get away with it. See the AXIS statement used with GCHART for more information.
How you'd do the first option:
data sample;
input group $ number;
format number dollar6.1;
axis_number = number;
format axis_number dollar6.0;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
proc sgplot data=sample;
vbar group /response=axis_number;
vbar group /response=number datalabel;
yaxis label='Number (sum)';
run;
That creates the bar chart twice, once with axis_number which then defines the axis, and once with number which defines the labels.
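The second option, explicit tick labels through VALUE=, might look roughly like the sketch below. It assumes the SAMPLE dataset from the question with NUMBER formatted as dollar6.1, and a fixed axis from 0 to 60 by 10, so exactly seven label strings are supplied, one per major tick.

```sas
/* Bar labels keep the dollar6.1 format of NUMBER;
   the response-axis tick labels are hard-coded as whole dollars. */
axis1 minor=none
      order=(0 to 60 by 10)
      value=('$0' '$10' '$20' '$30' '$40' '$50' '$60');

proc gchart data=sample;
  vbar group / type=sum sumvar=number sum levels=all raxis=axis1;
run;
quit;
```

Because the tick text is typed by hand, it has to be updated whenever the ORDER= range changes.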
You can do this sort of thing using an annotate dataset. I'd give a better explanation of this if I had a solid understanding of how it works, but I use it so rarely that it's usually more of a trial-and-error process:
data sample;
input group $ number;
format number dollar6.0;
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
Create the anno dataset. I pulled this from the link above and got rid of extraneous stuff. Set function='label' and position='2' to place the labels above the bars, and xsys='2' and ysys='2' to base the coordinates on the data values. size and style control the font.
midpoint=group puts the labels on the bars, y=number makes the y coordinate of the label equal the height of the bars, and text is where you specify the value and format of your label.
SAS Annotate Dictionary
data anno;
length function style $12;
retain function 'label' size 1 position '2'
xsys '2' ysys '2' style 'Albany AMT';
set sample;
midpoint=group;
y=number;
text=put(number,dollar6.1);
run;
Make your chart using your current code, but remove the sum option and add annotate=anno.
axis1 minor=none order=0 to 60 by 10;
proc gchart data=sample;
vbar group/ type=sum sumvar=number annotate=anno levels=all raxis=axis1;
run;
If you're running 9.2 or later, and are happy to use the Graphics Template Language (GTL) then you can do it like this:
Add a new column to your data that rounds the value:
data sample;
input group $ number;
format number dollar6.1;
axisval=round(number,1);
cards;
A 55.2
B 20.3
C 47.1
D 43.2
;
run;
Define the chart:
proc template;
define statgraph mychart;
begingraph;
layout overlay;
barchartparm x=group y=axisval / datalabel=number;
endlayout;
endgraph;
end;
run;
Render the chart using the data we created earlier:
proc sgrender data=sample template=mychart;
run;
The trick here is using the datalabel= option of the barchartparm statement to specify which column contains the values for the labels. There may be some other ways to do this using the GTL and specifying formats but this seemed pretty straightforward to me.
I believe the GTL is included in Base SAS from 9.2 onwards.