SAS: ODS EXCEL (how to name different sheets) - sas

I want to export a table generated by PROC TABULATE. My code goes like this:
ODS EXCEL FILE="myFile.xlsx" OPTIONS(SHEET_NAME="CRIME TYPE");
PROC TABULATE DATA=myData;
TITLE 'myTitle';
BY crime_type;
CLASS year;
CLASS nationality / ORDER=FREQ;
TABLES year, nationality / CONDENSE;
RUN;
ODS EXCEL CLOSE;
This creates an Excel file with one sheet per crime type:
THEFT
country1 country2 country3 ...
--------------------------------------
1990
1991
1992
--------------------------------------
ASSAULT
country1 country2 country3 ...
--------------------------------------
1990
1991
1992
--------------------------------------
Unfortunately, the sheets do not have the names of the different crimes (theft, assault, …) but are called "CRIME TYPE 1", "CRIME TYPE 2" and so forth (SHEET_NAME="CRIME TYPE").
Does anyone know how to name the sheets according to the values of the variable crime_type?

If you want to name the sheets using the values of the crime_type variable, use options(sheet_name='#byval1') in place of the fixed sheet name "CRIME TYPE". #BYVAL1 resolves to the current value of the first BY variable.
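Put together with the code from the question, that could look roughly like the sketch below. It assumes the class variable is called nationality and that myData is not yet sorted by crime_type; myData_sorted is just a name made up for this example. The sheet_interval="bygroup" and NOBYLINE settings are optional extras, not something the question used.

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data=myData out=myData_sorted;  /* myData_sorted: hypothetical name */
  by crime_type;
run;

options nobyline;  /* optional: hide the "crime_type=..." line above each table */

ods excel file="myFile.xlsx"
    options(sheet_interval="bygroup"   /* start a new sheet for every BY group */
            sheet_name="#byval1");     /* sheet name = value of the first BY variable */

proc tabulate data=myData_sorted;
  title 'myTitle';
  by crime_type;
  class year;
  class nationality / order=freq;     /* assumed variable name */
  tables year, nationality / condense;
run;

ods excel close;
options byline;    /* restore the default */
```

Keep in mind that Excel limits the length of sheet names, so long crime_type values may get shortened.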

Try this solution from the SAS support communities:
https://communities.sas.com/t5/ODS-and-Base-Reporting/ODS-Excelxp-PROC-TABULATE-multiple-sheets/td-p/359181

Related

I would like to represent the title and gender variables, in my SAS table, as numbers. How do I do this in SAS?

I would like to represent the title and gender variables as numbers. What code do I need to add to do this?
DATA test;
INPUT title $ gender $ name :$10. age;
CARDS;
Mr Male Micheal 20
Mrs Female Stephanie 25
Mr Female Linda 30
Dr Male James 40
Dr Female Jane 45
;
run;
Below is my attempt at the question. However, something is wrong because the title and gender variables do not change!
proc format library = Work;
value $title_ 'Mr' = 1 'Mrs' = 2 'Dr' = 3;
value $gender_ 'Male' = 1 'Female' = 2;
run;
OPTIONS FMTSEARCH = (Work);
data test;
format $title $title_;
set test;
run;
You're nearly there - you just have slightly wrong syntax for your format statement. This is your current format statement:
format $title $title_;
Here's a corrected one. I've extended it to apply your gender format as well:
format title $title_. gender $gender_.;
It is not necessary to overwrite a dataset to apply a format, i.e.
data mydata;
set mydata;
format ...;
run;
You can apply one directly by using proc datasets instead of writing a data step like the one above, e.g.
proc datasets lib = work;
modify test;
format title $title_. gender $gender_.;
run;
quit;
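One thing to be aware of: a format only changes how the values are displayed; the stored values are still the character strings. If you need real numeric variables, one option is a numeric informat together with input(). This is only a sketch: the informat names title_n and gender_n, the new variable names and the output dataset test_numeric are all made up for this example.

```sas
proc format;
  /* numeric informats: map text to numbers, anything else to missing */
  invalue title_n  'Mr' = 1  'Mrs' = 2  'Dr' = 3  other = .;
  invalue gender_n 'Male' = 1  'Female' = 2  other = .;
run;

data test_numeric;                        /* hypothetical output dataset */
  set test;
  title_num  = input(title,  title_n.);   /* numeric copy of title  */
  gender_num = input(gender, gender_n.);  /* numeric copy of gender */
run;
```

The original character columns are kept, so you can check the mapping before dropping them.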

Look up and replace values from a separate table in SAS

Dataset HAVE includes two variables with misspelled names in them: names and friends.
Name Age Friend
Jon 11 Ann
Jon 11 Tom
Jimb 12 Egg
Joe 11 Egg
Joe 11 Anne
Joe 11 Tom
Jed 10 Ann
I have a small dataset CORRECTIONS that includes current_names (the misspelled values) and resolved_names (the corrected values).
current_names resolved_names
Jon John
Ann Anne
Jimb Jim
I need any name in names or friends in HAVE that matches a name in the current_names column of CORRECTIONS to get recoded to the corresponding string in resolved_names. The resulting dataset WANT should look like this:
Name Age Friend
John 11 Anne
John 11 Tom
Jim 12 Egg
Joe 11 Egg
Joe 11 Anne
Joe 11 Tom
Jed 10 Anne
In R, I could simply invoke each dataframe and vector using if_else(), but the DATA step in SAS doesn't play nicely with multiple datasets. How can I make these replacements using CORRECTIONS as a look-up table?
There are many ways to do a lookup in SAS.
First of all, however, I would suggest de-duplicating your look-up table (for example, using PROC SORT and Data Step/Set/By), and deciding which duplicate to keep (if any exist).
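As a rough sketch of that de-duplication step, assuming the CORRECTIONS table from the question already exists: the DUPOUT= option of PROC SORT writes the dropped rows to a second table, so you can review them before deciding which one to keep. The names corrections_dedup and corrections_dups are made up for this example.

```sas
proc sort data=corrections
          out=corrections_dedup      /* hypothetical: one row per current_names */
          dupout=corrections_dups    /* hypothetical: the rows that were dropped */
          nodupkey;
  by current_names;
run;

/* review what was thrown away */
proc print data=corrections_dups;
  title 'Rows dropped from the look-up table';
run;
title;
```

NODUPKEY keeps the first row it meets for each key, so if a different duplicate should win, sort the table accordingly beforehand.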
As for the lookup task itself, for simplicity and learning I would suggest the following:
The "OLD SCHOOL" way - good for auditing inputs and outputs (it is easier to validate the results of a join when input tables are in the required order):
*** data to validate;
data have;
length name $10. age 4. friend $10.;
input name age friend;
datalines;
Jon 11 Ann
Jon 11 Tom
Jimb 12 Egg
Joe 11 Egg
Joe 11 Anne
Joe 11 Tom
Jed 10 Ann
run;
*** lookup table;
data corrections;
length current_names $10. resolved_names $10.;
input current_names resolved_names;
datalines;
Jon John
Ann Anne
Jimb Jim
run;
*** de-duplicate lookup table;
proc sort data=corrections nodupkey; by current_names; run;
proc sort data=have; by name; run;
data have_corrected;
merge have(in=a)
corrections(in=b rename=(current_names=name))
;
by name;
if a;
if b then do;
name=resolved_names;
end;
run;
The SQL way - which avoids sorting the have table:
proc sql;
create table have_corrected_sql as
select
coalesce(b.resolved_names, a.name) as name,
a.age,
a.friend
from work.have as a left join work.corrections as b
on a.name eq b.current_names
order by name;
quit;
NB the Coalesce() is used to replace missing resolved_names values (ie when there is no correction) with names from the have table
EDIT: To reflect Quentin's (CORRECT) comment that I'd missed the update to both name and friend fields.
For correcting both fields there are again many approaches, but the essence is to update a value only IF it exists in the lookup (corrections) table. The hash object is pretty good at this, once you've understood its declaration.
NB: any key fields in the Hash object need to be specified on a Length statement BEFOREHAND.
EDIT: as per ChrisJ's alternative to the Length statement declaration, and my reply (see below) - it would be better to state that key variables need to be defined BEFORE you declare the hash table.
data have_corrected;
keep name age friend;
*** key and data variables must be defined before the hash is declared;
length current_names resolved_names $10;
*** load valid names into hash lookup table;
if _n_=1 then do;
declare hash h(dataset: 'work.corrections');
rc = h.defineKey('current_names');
rc = h.defineData('resolved_names');
rc = h.defineDone();
end;
do until(eof);
set have(in=a) end=eof;
*** validate both name fields;
if h.find(key:name) eq 0 then
name = resolved_names;
if h.find(key:friend) eq 0 then
friend = resolved_names;
output;
end;
run;
EDIT: to answer the comments re ChrisJ's SQL/Update alternative
Basically, you need to restrict each UPDATE statement to ONLY those rows that have name values or friend values in the corrections table - this is done by adding another where clause AFTER you've specified the set var = (clause). See below.
NB. AFAIK, an SQL solution to your requirement will require MORE than 1 pass of both the base table & the lookup table.
The lookup/hash table, however, requires a single pass of the base table, a load of the lookup table and then the lookup actions themselves. You can see the performance difference in the log...
proc sql;
*** create copy of have table;
create table work.have_sql as select * from work.have;
*** correct name field;
update work.have_sql as u
set name = (select resolved_names
from work.corrections as n
where u.name=n.current_names)
where u.name in (select current_names from work.corrections)
;
*** correct friend field;
update work.have_sql as u
set friend = (select resolved_names
from work.corrections as n
where u.friend=n.current_names)
where u.friend in (select current_names from work.corrections)
;
quit;
Given data
*** data to validate;
data have;
length name $10. age 4. friend $10.;
input name age friend;
datalines;
Jon 11 Ann
Jon 11 Tom
Jimb 12 Egg
Joe 11 Egg
Joe 11 Anne
Joe 11 Tom
Jed 10 Ann
run;
*** lookup table;
data corrections;
length from_name $10. to_name $10.;
input from_name to_name;
datalines;
Jon John
Ann Anne
Jimb Jim
run;
One SQL alternative is to perform an existence-checked lookup subquery for each field to be mapped, as opposed to joining the corrections table once for each field to be mapped.
proc sql;
create table want1 as
select
case when exists (select * from corrections where from_name=name)
then (select to_name from corrections where from_name=name)
else name
end as name
, age
, case when exists (select * from corrections where from_name=friend)
then (select to_name from corrections where from_name=friend)
else friend
end as friend
from
have
;
quit;
Another, SAS-only, way to perform inline left joins is to use a custom format.
data cntlin;
set corrections;
retain fmtname '$cohen'; /* the fixer */
rename from_name=start to_name=label;
run;
proc format cntlin=cntlin;
run;
data want2;
set have;
name = put(name,$cohen10.); /* explicit width so unmatched names are not cut to the longest label */
friend = put(friend,$cohen10.);
run;
You can use an UPDATE in proc sql :
proc sql ;
update have a
set name = (select resolved_names from corrections b where a.name = b.current_names)
where name in(select current_names from corrections)
;
update have a
set friend = (select resolved_names from corrections b where a.friend = b.current_names)
where friend in(select current_names from corrections)
;
quit ;
Or, you could use a format :
/* Create format */
data current_fmt ;
retain fmtname 'NAMEFIX' type 'C' ;
set corrections ;
start = current_names ;
label = resolved_names ;
run ;
proc format cntlin=current_fmt ; run ;
/* Apply format */
data want ;
set have ;
name = put(name ,$NAMEFIX10.) ; /* explicit width so unmatched names are not cut to the longest label */
friend = put(friend,$NAMEFIX10.) ;
run ;
Try this:
proc sql;
create table want as
select p.name,p.age,
case
when q.current_names is null then p.friend
else q.resolved_names
end
as friend
from
(
select
case
when b.current_names is null then a.name
else b.resolved_names
end
as name,
a.age,a.friend
from
have a
left join
corrections b
on upcase(a.name) = upcase(b.current_names)
) p
left join
corrections q
on upcase(p.friend) = upcase(q.current_names);
quit;
Output:
name age friend
John 11 Anne
Jed 10 Anne
Joe 11 Anne
Jim 12 Egg
Joe 11 Egg
Joe 11 Tom
John 11 Tom
Let me know in case of any clarifications.

sas relative frequencies by group

I have a categorical variable, say SALARY_GROUP, and a group variable, say COUNTRY. I would like to get the relative frequency of SALARY_GROUP within COUNTRY in SAS. Is it possible to get it by proc SUMMARY or proc means?
Perhaps explore proc tabulate and a counter variable?
Yes, you can calculate the frequency counts behind the relative frequency of a categorical variable using either PROC MEANS or PROC SUMMARY. For both procedures you have to:
- Specify NWAY in the PROC statement,
- Specify your categorical fields in the CLASS statement,
- Specify your response or numeric field in the VAR statement.
Example below is for proc means:
Dummy Data:
/*Dummy Data*/
data work.have;
input Country $ Salary_Group $ Value;
datalines;
USA Group1 100
USA Group1 100
GBR Group1 100
GBR Group1 100
USA Group2 20
USA Group2 20
GBR Group2 20
GBR Group1 100
;
run;
Code:
/* Calculating frequency and saving output to table sg_means */
proc means data=have n nway ;
class Country Salary_Group;
var Value;
output out=sg_means n=frequency;
run;
Output Table:
Country=GBR Salary_Group=Group1 _TYPE_=3 _FREQ_=3 frequency=3
Country=GBR Salary_Group=Group2 _TYPE_=3 _FREQ_=1 frequency=1
Country=USA Salary_Group=Group1 _TYPE_=3 _FREQ_=2 frequency=2
Country=USA Salary_Group=Group2 _TYPE_=3 _FREQ_=2 frequency=2
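The counts above are not yet relative frequencies. One way to get there, sketched under the assumption that sg_means was produced by the step above (sg_relfreq and rel_freq are names made up for this example), is to divide each count by its country total:

```sas
proc sql;
  create table sg_relfreq as              /* hypothetical output table */
  select Country,
         Salary_Group,
         frequency,
         /* share of this salary group within its country */
         frequency / sum(frequency) as rel_freq format=percent8.1
  from sg_means
  group by Country
  order by Country, Salary_Group;
quit;
```

PROC SQL will write a note about remerging summary statistics to the log; that is expected here. With the dummy data, GBR/Group1 should come out as 3 out of 4 and each USA group as 2 out of 4.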

How to write an Excel file with a formatted date in the header, and formatted columns?

I want to export a dataset into an Excel file from SAS, like shown below:
Claim_id State Suffix Policy Amount
125 CA 231 cyt $58,000.00
458 dd 789 ghu $78,961.00
458 lk 586 lk -$56.00
785 ga 712 js -$89.00
It needs to have a header like this:
"As of [current month name] [current year]", for instance "As of January 2017".
Also, if the amount is negative, it needs to be shown in red.
Title with today's Date in Month Name - Year format:
%let today_month = %sysfunc(today(), monname.); /* default width 9, so "September" is not truncated */
%let today_year = %sysfunc(today(), year4.);
%put &today_month. &today_year.;
title "As of &today_month. &today_year.";
Setting a column in Excel to a custom format:
/* This line goes into your PROC PRINT or PROC REPORT */
var amount / style(column)={tagattr="format: $#,##0.00_);[Red]($#,##0.00)"};
To tweak the format, in Excel, right-click on the cell, go to Format Cells -> Custom, create your format, and paste the resulting string after the "format: " part.
Example with ODS Excel:
data test;
input Claim_id Amount;
datalines;
125 58000
458 78961
458 -56
785 -89
;
run;
%let today_month = %sysfunc(today(), monname.); /* default width 9, so "September" is not truncated */
%let today_year = %sysfunc(today(), year4.);
%put &today_month. &today_year.;
ods excel file='output path and file name here.xlsx'
options(embedded_titles="yes");
proc print data=test noobs;
title "As of &today_month. &today_year.";
var claim_id;
var amount
/ style(column)={tagattr="format: $#,##0.00_);[Red]($#,##0.00)"};
run;
ods excel close;
Note that ODS EXCEL was experimental in 9.4M2 and became production in 9.4M3. To use the older ods tagsets.excelxp, which is XML but appears as an Excel file, simply swap that in for ods excel:
ods tagsets.excelxp file='output path and file name here.xml'
options(embedded_titles="yes");
... code here ...
ods tagsets.excelxp close;
Sources:
http://blogs.sas.com/content/sasdummy/2014/09/21/ods-excel-and-proc-export-xlsx/
https://support.sas.com/resources/papers/proceedings16/SAS5642-2016.pdf
http://support.sas.com/resources/papers/proceedings13/366-2013.pdf
Sounds like you should use ODS EXCEL, which ships with SAS 9.4 TS1M2 I believe. That would let you do exactly what you're asking.
If you don't have that version, you might do the same with tagsets.excelxp, though that doesn't create a normal-excel-file-type file; it would be an xml file that might need a further step to process.

SAS: PROC MEANS Grouping in Class Variable

I have the following sample data and 'proc means' command.
data have;
input measure country :$20.; /* wider than the default 8 so "Netherlands" is not truncated */
datalines;
250 UK
800 Ireland
500 Finland
250 Slovakia
3888 Slovenia
34 Portugal
44 Netherlands
4666 Austria
run;
PROC PRINT data=have; RUN;
The following PROC MEANS command prints out a listing for each country above. How can I group some of those countries (e.g. UK & Ireland as one group, Slovakia/Slovenia as Central Europe) in the PROC MEANS step, rather than adding another data step to add a 'case when' etc.?
proc means data=have sum maxdec=2 order=freq STACKODS;
var measure;
class country;
run;
Thanks for any help at all on this. I understand there are various things you can do in the PROC MEANS command itself, like limiting the number of countries by doing this:
proc means data=have(WHERE=(country not in ('Finland', 'UK')));
I'd like to do the grouping in the PROC MEANS command for brevity.
Thanks.
This is very easy with a format for any PROC that takes a CLASS statement.
Simply build a format, either with code or from data; then apply it with a FORMAT statement in the PROC MEANS step.
proc format lib=work;
value $countrygroup
"UK"="British Isles"
"Ireland"="British Isles"
"Slovakia","Slovenia"="Central Europe"
;
run;
proc means data=have;
class country;
var measure;
format country $countrygroup.;
run;
It's usually better to have numeric codes for country and then format those to be whichever set of names is needed at any one time, particularly as capitalization/etc. is pretty irritating, but this works well enough even here.
The CNTLIN= option in PROC FORMAT lets you build a format from a dataset: FMTNAME holds the format name (the name you would otherwise put on the VALUE statement), START holds the value to be labelled, and LABEL holds the label. (END holds the end of the range, if you are mapping numeric ranges.) There are other options as well; the documentation goes into more detail.
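A minimal sketch of the CNTLIN= route, building the same grouping from data instead of a VALUE statement. The dataset name country_map is made up for this example, and the mapping rows simply repeat the ones from the VALUE statement above; note the two spaces between the country and its group, which the & modifier needs in order to read labels containing a single blank.

```sas
/* control dataset: one row per country to be regrouped */
data country_map;                          /* hypothetical name */
  retain fmtname 'countrygroup' type 'C';  /* TYPE='C' makes it a character format */
  input start :$20. label & $20.;
  datalines;
UK  British Isles
Ireland  British Isles
Slovakia  Central Europe
Slovenia  Central Europe
;
run;

proc format cntlin=country_map;
run;

proc means data=have sum maxdec=2;
  class country;
  var measure;
  format country $countrygroup.;           /* grouping happens via the format */
run;
```

Countries that are not in the control dataset keep their own name and still get a row of their own in the output.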