Replacing field name suffixes in bulk - SAS

I have a dataset in which several variables have suffixes that correspond to given dates. I want to replace the suffixes with the dates to make my output tables more user-friendly.
Here is a sample of my data.
The fields in my SALES dataset are:
product number_of_sales_1 number_of_sales_2 number_of_sales_3 revenue_1 revenue_2 revenue_3 tax_1 tax_2 tax_3
The suffixes 1, 2, 3 correspond to dates, which are held in a second dataset with the following format:
dates
id date
1 01Apr
2 01May
3 01Jun
I want to replace the suffixes with the dates in bulk, so that the fields in SALES become:
product number_of_sales_01Apr number_of_sales_01May number_of_sales_01Jun revenue_01Apr revenue_01May revenue_01Jun tax_01Apr tax_01May tax_01Jun
Both the number of dates and the number of metrics in SALES are dynamic, so I can't hardcode them in the code.

I assume your datasets look like this:
data sales;
product="abc";number_of_sales_1=1;number_of_sales_2=2;number_of_sales_3=3;
revenue_1=1000;revenue_2=2000;revenue_3=3000;tax_1=100;tax_2=200;tax_3=300;
run;
data dates;
id=1;date="01Apr";output;id=2;date="01May";output;id=3;date="01Jun";output;
run;
1st Step - Find the date-suffixed variables that need to be renamed
proc contents data=sales out=sales_temp(keep=name) noprint; run;
data sales_temp1;
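length check_date_vars $1 id 8;
set sales_temp;
/* last character of the name; this only handles single-digit suffixes */
check_date_vars=substr(name,length(name));
temp=notdigit(check_date_vars);
/* a digit suffix becomes the numeric ID used to look up DATES */
if temp=0 then id=input(check_date_vars,1.);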
run;
2nd Step - Merge the above dataset with the dataset that contains the dates, to create a mapping between old names and new names, and create macro variables from it
proc sort data=sales_temp1; by id; run;
proc sort data=dates; by id; run;
data sales_temp_date;
merge sales_temp1(in=a) dates(in=b);
by id;
if a and b;
/* CATS strips any trailing blanks before appending the date */
new_name=cats(substr(name,1,length(name)-1),date);
run;
proc sql noprint;
select count(*) into :num_vars separated by " " from sales_temp_date;
quit;
proc sql noprint;
select name into:old_name1 - :old_name&num_vars. from sales_temp_date;
select new_name into:new_name1 - :new_name&num_vars. from sales_temp_date;
quit;
3rd Step - Rename the variables
%macro rename();
%local i;
proc datasets library=work nolist;
modify sales;
rename
%do i=1 %to &num_vars.;
&&old_name&i.= &&new_name&i.
%end;
;
quit;
%mend;
%rename;
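The last-character check above only works for single-digit suffixes (a `_10` suffix would be read as `0`). A minimal sketch of a different technique, assuming the suffix is always the text after the last underscore, that DATES.ID is numeric and DATES.DATE is character, and that both tables live in WORK, builds the whole RENAME list in one PROC SQL query against DICTIONARY.COLUMNS instead of merging. The macro variable RENAME_LIST is a name chosen for illustration.

```sas
/* Sample data, as assumed in the answer above */
data sales;
  product="abc"; number_of_sales_1=1; number_of_sales_2=2; number_of_sales_3=3;
  revenue_1=1000; revenue_2=2000; revenue_3=3000; tax_1=100; tax_2=200; tax_3=300;
run;
data dates;
  id=1; date="01Apr"; output;
  id=2; date="01May"; output;
  id=3; date="01Jun"; output;
run;

/* Build "old=new" pairs for every column whose text after the last
   underscore is a number matching DATES.ID. The ?? modifier suppresses
   notes for non-numeric suffixes such as PRODUCT. */
%let rename_list=;
proc sql noprint;
  select catx('=', c.name,
              cats(substr(c.name, 1, length(c.name) - length(scan(c.name, -1, '_'))),
                   d.date))
    into :rename_list separated by ' '
    from dictionary.columns as c
         inner join dates as d
         on input(scan(c.name, -1, '_'), ?? 32.) = d.id
    where c.libname = 'WORK' and c.memname = 'SALES';
quit;
%put NOTE: &=rename_list;

/* Apply all renames in place; PROC DATASETS does not rewrite the data */
proc datasets library=work nolist;
  modify sales;
  rename &rename_list;
quit;
```

Because the suffix is taken with SCAN(..., -1, '_') rather than the last character, `revenue_12` would be matched against ID 12. If no column matches, RENAME_LIST is empty and the RENAME statement will fail, so a real job might want to check it first.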

Related

Macrotise proc format in SAS

I've created a user-defined format using PROC FORMAT statements. I would like to wrap it in a macro so that, if the input data changes, the format changes accordingly.
Here is the code:
proc format ;
value $a 1='1-sepstrata'
0='0-Non-sepstrata'
A='A-sepstrata';
run;
In the dataset I have a column named stratum, which has unique values such as 1, 0 and A.
Select the distinct values of STRATA and use them to generate the format definition as a dataset (a CNTLIN dataset). Then use PROC FORMAT to create the format from it.
proc sql;
create table fmtdef as
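select distinct '$A' as fmtname
, strata as start
, catx('-',strata
,case when (strata='0') then 'Non-sepstrata' else 'sepstrata' end
) as label
from have
/* DISTINCT removes duplicate values; GROUP BY without a summary function
   is only turned into ORDER BY, which would leave the duplicates in place */
order by fmtname,start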
;
quit;
proc format lib=work.formats cntlin=fmtdef;
run;
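A minimal end-to-end check might look like the following. HAVE and its values are hypothetical sample data; it deliberately contains a value (`B`) that the original hard-coded format did not list, to show that the generated format picks up new values without any code change.

```sas
/* Hypothetical input; B is a value the hard-coded format did not cover */
data have;
  input strata $;
  datalines;
1
0
A
B
1
;
run;

/* One CNTLIN row per distinct value */
proc sql;
  create table fmtdef as
  select distinct '$A' as fmtname length=8,
         strata as start,
         catx('-', strata,
              case when strata = '0' then 'Non-sepstrata' else 'sepstrata' end)
           as label length=40
  from have;
quit;

proc format lib=work.formats cntlin=fmtdef;
run;

/* Apply the new format to see the labels */
data check;
  set have;
  strata_label = put(strata, $a.);
run;
```

Rerunning the same three steps after HAVE changes regenerates $A, which is the behaviour the question asks for; a macro wrapper would only be needed to parameterise the dataset or column names.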

Out of Memory using PROC FREQ

I have approximately 1,000,000 rows and 25 columns of data and I'm trying to return a list of column names, the number of distinct values and whether there are missing values.
I can't hard-code column names in a PROC SQL COUNT(DISTINCT) query, because I have numerous data sets with different column names and I'm trying to return the desired outcome for all tables automatically with one piece of code.
I've tried running the following code
proc freq nlevels data= &DATASET_NAME;
ods output nlevels=nlevels ;
tables _all_ / noprint;
run;
This returns an out-of-memory error. Is there another way to get the result that avoids the error?
TABLES _ALL_ saves you from typing column names, but requesting all columns at once is probably what runs out of memory. Try running PROC FREQ on one column at a time and then combining the results:
proc sql;
create table name as
select name from dictionary.columns where libname='SASHELP' and memname='CLASS';
quit;
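/* empty results table; TableVar gets room for any 32-character name */
data want;
length TableVar $32;
stop;
run;
data _null_;
set name;
call execute(
'proc freq data=sashelp.class nlevels;
tables '||strip(name)||' / noprint;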
ods output nlevels=nlevels;
run;
data want;
set want nlevels;
run;'
);
run;
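The same one-column-at-a-time idea can be sketched in PROC SQL, since COUNT(DISTINCT) and NMISS give exactly the two statistics asked for. This is a different technique from the PROC FREQ loop, not part of the answer above. It assumes the table is named by the hypothetical macro variables LIB and MEM (uppercase), that column names are ordinary SAS names, and that each generated SELECT stays short enough not to be truncated. Each SELECT still reads the table once per column.

```sas
/* Hypothetical target table: change these two values */
%let lib = SASHELP;
%let mem = CLASS;

/* Generate one single-row SELECT per column and glue them together */
proc sql noprint;
  select catx(' ',
              'select', quote(strip(name)), 'as column_name length=32,',
              'count(distinct', strip(name), ') as n_distinct,',
              'nmiss(', strip(name), ') as n_missing',
              "from &lib..&mem")
    into :query separated by ' outer union corr '
    from dictionary.columns
    where libname = "&lib" and memname = "&mem";
quit;

/* Run the generated query: one row per column */
proc sql;
  create table col_summary as
  &query;
quit;
```

N_MISSING greater than zero answers the "are there missing values" part; COUNT(DISTINCT) ignores missing values, so N_DISTINCT counts non-missing levels only.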
This question is very similar to SAS summary statistic from a dataset
The answers cover techniques for:
transpose + freq
hash
freq with ODS exclude + output

Value labels to be created using data from another data set

I have two data sets. The first data set has airport codes (JFK, LGA, EWR) in a variable 'airport'. The second data set has the list of all major airports in the world. It has two variables: 'faa', holding the FAA code (like JFK, LGA, EWR), and 'name', holding the actual name of the airport (John F. Kennedy, LaGuardia, etc.).
My requirement is to create value labels for the first data set, so that the actual airport name is shown instead of the airport code. I know I can use custom formats to achieve this. But can I write SAS code that reads the unique airport codes, gets the names from the other data set and creates the value labels automatically?
PS: Otherwise, the only option I see is to use MS Excel to get the unique list of FAA codes in data set 1, use VLOOKUP to get the airport names, and then create one custom format by listing each unique FAA code with its airport name.
I think "value label" is SPSS terminology. Looks like you want to create a format. Just use your lookup table to create an input dataset for PROC FORMAT.
So if your second table looks like this:
data table2;
length FAA $4 Name $40 ;
input FAA Name $40. ;
cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
You can use this code to convert it into a dataset that PROC FORMAT can use to create a format.
data fmt ;
fmtname='$FAA';
hlo=' ';
set table2 (rename=(faa=start name=label));
run;
proc format cntlin=fmt lib=work.formats;
run;
Now you can use that format with your other data.
proc freq data=table1 ;
tables airport ;
format airport $faa. ;
run;
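If some codes in table1 have no match in the lookup table, the CNTLIN dataset can also carry a catch-all row, the equivalent of OTHER= in a VALUE statement, by setting HLO='O'. A minimal sketch, reusing the TABLE2 layout above; the label text "Unknown FAA code" is only an example.

```sas
/* Lookup table with the same layout as TABLE2 above */
data table2;
  length FAA $4 Name $40;
  input FAA Name $40.;
  cards;
JFK John F. Kennedy (NYC)
LGA Laguardia (NYC)
EWR Newark (NJ)
;
run;

data fmt;
  set table2(rename=(faa=start name=label)) end=last;
  retain fmtname '$FAA' type 'C';
  length hlo $1;
  hlo = ' ';
  output;
  /* after the last code, add one OTHER row for unmatched values */
  if last then do;
    start = ' ';
    hlo = 'O';
    label = 'Unknown FAA code';
    output;
  end;
run;

proc format cntlin=fmt lib=work.formats;
run;
```

PROC FORMAT will reject the CNTLIN dataset if the same START value appears twice, so a lookup table with duplicate codes would need to be deduplicated first.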
First, consider whether a format is really what you need. For example, you could simply do a left join to retrieve the airport name from table2 (the FAA-to-name table).
Anyway, I believe the following macro does the trick:
Create auxiliary tables:
data have1;
input airport $;
datalines;
a
d
e
;
run;
data have2;
input faa $ name $;
datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;
Macro to create Format:
%macro create_format;
%local n i;
*count number of faa codes;
proc sql noprint;
select count(*) into :n trimmed
from have2;
quit;
*create macro variables for each faa and name;
proc sql noprint;
select faa, name
into :faa1-:faa&n, :name1-:name&n
from have2;
quit;
*create format (&&faa&i resolves to &faa1, &faa2, ...);
proc format;
value $airport
%do i=1 %to &n;
"&&faa&i" = "&&name&i"
%end;
other = "Unknown FAA code";
run;
%mend create_format;
%create_format;
Apply format:
data want;
set have1;
format airport $airport.;
run;
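The left-join alternative mentioned at the start of this answer can be sketched as follows, using the same HAVE1/HAVE2 sample data. The column name AIRPORT_NAME and the fallback text are illustrative choices.

```sas
/* Same sample data as the auxiliary tables above */
data have1;
  input airport $;
  datalines;
a
d
e
;
run;
data have2;
  input faa $ name $;
  datalines;
a aaaa
b bbbb
c cccc
d dddd
;
run;

/* Look up each code; unmatched codes get a fallback label */
proc sql;
  create table want_join as
  select a.airport,
         coalescec(b.name, 'Unknown FAA code') as airport_name length=40
  from have1 as a
       left join have2 as b
       on a.airport = b.faa
  order by a.airport;
quit;
```

Unlike a format, this stores the name as a new column, and if HAVE2 contains the same FAA code twice the join will duplicate rows, which a format would refuse instead.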

SAS - Add origin table name as a column in report

I have an output table that contains 300+ variables from 30 different tables that are joined by UNION, which is used for modelling. I have created a macro that creates a report with a number of statistics, such as mean, min/max values etc. using this output table. I am trying to add a column to the report that details which table(s) the variables come from. I say table(s) as some of the variables are shared across different tables. I want to avoid having the same variable in the report multiple times as the statistics are the same irrespective of what table the variable comes from. Is there an efficient way to do this?
Instead of UNION, consider using a DATA step with the INDSNAME= option on the SET statement.
data want;
set sashelp.class sashelp.cars indsname=source;
source_dataset = source;
run;
If it were me, I would loop over each of the unioned datasets and put the table name and variable names into a compiled dataset. You probably already have all the table names in a macro list or typed out, so you can add a few more lines of code to run PROC CONTENTS on each of them and compile a full list of table and variable names. Note that, as in your example, there will be duplicate variable names, which you can handle after the table is compiled:
** create different tables **;
data height; set sashelp.class(keep=name height); run;
data weight; set sashelp.class(keep=name weight); run;
data sex; set sashelp.class(keep=name sex); run;
** put your datasets into a list either manually or dynamically **;
/* manually */
%let ds_list=height weight sex;
/* dynamically -- be careful to include only tables in your union */
proc sql noprint;
select MEMNAME
into: ds_list separated by " "
from sashelp.vmember
where libname = "WORK" and memtype = "DATA";
quit;
%put &ds_list.;
** loop over each table to put the table name and variables in a dataset **;
%MACRO get_names(ds_list);
%local i ds;
%do i=1 %to %sysfunc(countw(&ds_list.));
%let ds = %scan(&ds_list.,&i.);
proc contents data = &ds. noprint
out=names_&ds.(keep=MEMNAME NAME rename=(MEMNAME=SOURCE_DATASET));
run;
proc append data = names_&ds. base=full force; run;
%end;
%MEND;
%get_names(&ds_list.);
I managed to do this using the following:
Create table with source tables.
PROC SQL;
CREATE TABLE SOURCES AS
SELECT NAME
,MEMNAME
FROM DICTIONARY.COLUMNS
WHERE LIBNAME='LIBNAME'
ORDER BY 1,2;
QUIT;
Join to my stats table.
PROC SQL;
CREATE TABLE STATS_NEW AS
SELECT b.memname AS TABLE_NAME,a.*
FROM STATS a
LEFT JOIN SOURCES b
ON a.name = b.name
ORDER BY a.name, b.memname;
QUIT;
Transpose data and add in comma separators.
DATA STATS_TRANSPOSE (drop=TABLE_NAME);
LENGTH INPUT_TABLES $1000;
SET STATS_NEW;
BY name;
RETAIN INPUT_TABLES;
IF FIRST.name THEN INPUT_TABLES=TABLE_NAME;
/* CATX keeps the ', ' separator; CATS would strip it down to ',' */
ELSE INPUT_TABLES=CATX(', ',INPUT_TABLES,TABLE_NAME);
IF LAST.name THEN DO;
IF name IN ('FIELD1','FIELD2')
THEN DO; INPUT_TABLES='ALL'; END;
OUTPUT;
END;
RUN;
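The same comma-separated list can be built without the extra join, straight from DICTIONARY.COLUMNS, using a DO UNTIL (LAST.) loop so that no RETAIN is needed. A minimal sketch, assuming the member tables are in WORK and listed in a macro variable like DS_LIST from the previous answer; HEIGHT, WEIGHT and SEX are the illustrative tables built there.

```sas
/* Illustrative member tables, as in the previous answer */
data height; set sashelp.class(keep=name height); run;
data weight; set sashelp.class(keep=name weight); run;
data sex;    set sashelp.class(keep=name sex);    run;

%let ds_list = HEIGHT WEIGHT SEX;

/* One row per (variable, table) pair, restricted to the listed tables */
proc sql;
  create table sources as
  select upcase(name) as name length=32, memname
  from dictionary.columns
  where libname = 'WORK'
    and findw("&ds_list", strip(memname), ' ', 'i') > 0
  order by 1, 2;
quit;

/* Collapse to one row per variable; INPUT_TABLES is reset for each BY group */
data var_sources(keep=name input_tables);
  length input_tables $1000;
  do until (last.name);
    set sources;
    by name;
    input_tables = catx(', ', input_tables, memname);
  end;
run;
```

VAR_SOURCES can then be joined to the statistics table on NAME, giving each variable exactly once with all of its source tables.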

Adding date in variable name in SAS

I have a column named total_transaction. I want to add the date from 4 days ago to its name.
For example, if today is 20161220, I want the variable to be renamed to total_transaction_20161216.
Please suggest a way to do this.
Just create a macro variable that stores the date in the required format, and then use it in a RENAME statement within PROC DATASETS.
%let datevar = %sysfunc(intnx(day,%sysfunc(today()),-4),yymmddn8.);
%put &=datevar.;
data have;
total_transaction=1;
run;
proc datasets lib=work nolist nodetails;
modify have;
rename total_transaction = total_transaction_&datevar.;
quit;
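To check the date arithmetic against the example in the question, the reference date can be fixed instead of using TODAY(). The sketch below also shows the RENAME= data set option, an alternative when a renamed copy (here the illustrative dataset WANT) is wanted rather than an in-place change.

```sas
/* Fixed reference date from the question instead of TODAY() */
%let base_date = %sysfunc(inputn(20161220, yymmdd8.));
%let datevar = %sysfunc(intnx(day, &base_date, -4), yymmddn8.);
%put &=datevar;  /* DATEVAR=20161216 */

data have;
  total_transaction = 1;
run;

/* RENAME= data set option: writes a renamed copy and leaves HAVE unchanged */
data want;
  set have(rename=(total_transaction = total_transaction_&datevar));
run;
```

PROC DATASETS only changes the descriptor, so it is cheaper on large tables; the DATA step version rereads every row but fits naturally if the data is being rewritten anyway.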