Populate SAS macro-variable using a SQL statement within another SQL statement?

I stumbled upon the following code snippet in which the variable top3 has to be filled from a table have rather than from an array of numbers.
%let top3 = 14 15 42; /* This should be made obsolete.. */
%let no = 3;
proc sql;
create table want as
select *
from (select x, y from foo) a
%do i = 1 %to &no.;
%let current = %scan(&top3.,&i.); /* What do I need to put here? */
left join (select x, y from bar where z=&current.) row_&current.
on a.x = row_&current..x
%end;
;
quit;
The table have contains the xs from the string and looks as follows:
i x
1 14
2 15
3 42
I am now wondering how I should modify the %let current = ... line such that current is populated from the table have. I know how to populate a macro variable using proc sql with select .. into, but I am afraid that the way I am going right now is fully against SAS philosophy.

It looks like you're more or less transposing something. If that's the case, this is doable in macro/sql pretty easily.
First, here's the simple version - no macro.
proc sql;
create table class_t as
select * from (
select name from sashelp.class ) class
left join (
select name, age as age_Alfred
from sashelp.class
where name='Alfred') Alfred
on class.name = Alfred.name
;
quit;
We grab the value of age from the Alfred row and put it on the main join. This isn't exactly what you're doing, but it seems similar. (I'm just using one table, but you can of course use two here.)
Now, how do we extend this to be table-driven and not handwritten? Macros!
First, here's the macro - just taking the Alfred bit and making it generic.
%macro joiner(name=);
left join (
select name, age as age_&name.
from sashelp.class
where name="&name.") &name.
on class.name = &name..name
%mend joiner;
Second, we look at this and see two things we need to put into macro lists: the SELECT variable list (we'll get one new variable for each call), and the JOIN list.
proc sql;
select cats('%joiner(name=',name,')')
into :joinlist separated by ' '
from sashelp.class;
select cats(name,'.age_',name)
into :selectlist separated by ','
from sashelp.class;
quit;
And then, we just call it!
proc sql;
create table class_t as
select class.name,&selectlist. from (
select name from sashelp.class) class
&joinlist.
;
quit;
Now, the dataset that drives the macro lists is presumably the three-row dataset you show above ("have"). The dataset the appended data actually comes from is some other dataset ("bar"), right? And the one you join onto is perhaps a third dataset ("foo"). Here I use a single dataset for simplicity, but the concept is the same, just with different sources.
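Applied to the tables in the question, the hard-coded %let could be replaced by a SELECT ... INTO against have. This is only a minimal sketch, assuming have has the columns i and x shown in the question and that foo and bar exist as described there. The macro name build_want is hypothetical; the wrapper is there because %do loops are traditionally only valid inside a macro definition.

```sas
/* Build the space-separated list from HAVE instead of hard-coding it */
proc sql noprint;
  select x into :top3 separated by ' '
  from have
  order by i;
quit;

/* Number of values in the list */
%let no = %sysfunc(countw(&top3., %str( )));

%macro build_want;  /* hypothetical wrapper macro */
  %local i current;
  proc sql;
    create table want as
    select *
    from (select x, y from foo) a
    %do i = 1 %to &no.;
      %let current = %scan(&top3., &i.);
      left join (select x, y from bar where z = &current.) row_&current.
        on a.x = row_&current..x
    %end;
    ;
  quit;
%mend build_want;
%build_want
```

The join logic itself is unchanged from the question; only the source of top3 and no differs.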

When the lookup data is in a table you can perform a three way join without any need for SAS Macro. You don't provide any data so the example will mock some.
Example:
Suppose a master record has several associated detail records, and the detail records contain a z value used for selection into a result set per a wanted z lookup table.
data masters;
call streaminit(2020);
do id = 1 to 100;
do x = 1 to 100;
m_rownum + 1;
code = rand('integer', 10,45);
output;
end;
end;
run;
data details;
call streaminit(2020);
do date = 1 to 20;
do x = 1 to 100;
do rep = 1 to 5;
d_rownum + 1;
amount = rand('integer', 100,200);
z = rand('integer', 10,45);
output;
end;
end;
end;
run;
data zs;
input z @@; datalines;
14 15 42
;
proc sql;
create table want as
select
m_rownum
, d_rownum
, masters.id
, masters.x
, masters.code
, details.z
, details.date
, details.amount
from
masters
left join
details
on
details.x = masters.x
inner join
zs
on
zs.z = details.z
order by
masters.id, masters.x, details.z, details.date
;
quit;

Related

Transformation of data using proc transpose -- static column names lead to duplicate records

I have the following data structure, which I need to transform:
I have been working on transposing the data properly to obtain a table structure.
Target structure
Attribute 1, Attribute 2, ...., Attribute n
Outcome1-A, Outcome2-A, ...., Outcome n-A .....
....
Outcome A-Z, Outcome2-Z, ...., Outcome n-Z .....
I started with the following statement, modifications would be great. The static names of the attributes are imported as duplicate records.
PROC TRANSPOSE DATA=INPUT_TAB OUT=vertikal ;
VAR v name ;
id n;
RUN;
The id statement tells SAS which variable to use for column names, and if I understand your question correctly, that is Name, not n. Hence Name should not be in the VAR list.
You should also tell SAS when to start a new observation (aka row), using a by statement. Probably one of the columns left of Name is a good candidate. Let's name it group_id. Hence the code should look like this:
PROC TRANSPOSE DATA=INPUT_TAB OUT=vertikal ;
by group_id;
VAR v;
id Name;
RUN;
If there is no variable like group_id available, you can create one this way:
DATA INPUT_VIEW / view=INPUT_VIEW;
set INPUT_TAB;
retain group_id 1;
if Name eq "SZENARIO_ID" then group_id = group_id + 1;
PROC TRANSPOSE DATA=INPUT_VIEW OUT=vertikal ;
by group_id;
VAR v;
id Name;
RUN;
I chose a view to avoid needlessly writing data to disk, but that only matters for large datasets.
I expanded the table using COL_TYP, however the transpose command does not work, see error message
PROC SQL;
CREATE TABLE TEST_CASE_WHEN AS
SELECT A.*,
CASE WHEN A.NAME IN ('SZENARIO_ID') THEN 1
ELSE 0 END AS COL_TYP
FROM WORK.TEST A
;
QUIT;
proc sort
data = work.INPUT_VIEW out= input_view2;
BY N;
RUN;
PROC TRANSPOSE DATA=INPUT_VIEW2 OUT=vertikal ;
by COL_TYP;
VAR v;
id Name;
RUN;
ERROR: Data set WORK.INPUT_VIEW2 is not sorted in ascending sequence. The current BY group has
COL_TYP = 2 and the next BY group has COL_TYP = 0.
Thanks and Kind Regards
Kingsley

SAS: Replace rare levels in variable with new level "Other"

I've got a pretty big table in which I want to replace rare values (for this example, values with fewer than 10 occurrences, but the real case is more complicated: a variable might have 1000 levels while I want to keep only 15). The list of possible levels might change, so I don't want to hardcode anything.
My code is like:
%let var = Make;
proc sql;
create table stage1_ as
select &var.,
count(*) as count
from sashelp.cars
group by &var.
having count >= 10
order by count desc
;
quit;
/* Join table with table including only top obs to replace rare
values with "other" category */
proc sql;
create table stage2_ as
select t1.*,
case when t2.&var. is missing then "Other_&var." else t1.&var. end as &var._new
from sashelp.cars t1 left join
stage1_ t2 on t1.&var. = t2.&var.
;
quit;
/* Drop old variable and rename the new as old */
data result;
set stage2_(drop= &var.);
rename &var._new=&var.;
run;
It works, but unfortunately it is not very efficient, as it needs a join for each variable (in the real case I am doing it in a loop).
Is there a better way to do it? Maybe some smart replace function?
Thanks!!
You probably don't want to change the actual data values. Instead consider creating a custom format for each variable that will map the rare values to an 'Other' category.
PROC FREQ's ODS output table (OneWayFreqs) can capture the counts and percentages of every listed variable in a single table. NOTE: the TABLES option out= captures only the last table requested. Those counts can then be used to construct the format according to whatever 'othering' rules you want to implement.
data have;
do row = 1 to 1000;
array x x1-x10;
do over x;
if row < 600
then x = ceil(100*ranuni(123));
else x = ceil(150*ranuni(123));
end;
output;
end;
run;
ods output onewayfreqs=counts;
proc freq data=have ;
table x1-x10;
run;
data count_stack;
length name $32;
set counts;
array x x1-x10;
do over x;
name = vname(x);
value = x;
if value then output;
end;
keep name value frequency;
run;
proc sort data=count_stack;
by name descending frequency ;
run;
data cntlin;
do _n_ = 1 by 1 until (last.name);
set count_stack;
by name;
length fmtname $32;
fmtname = trim(name)||'top';
start = value;
label = cats(value);
if _n_ < 11 then output;
end;
hlo = 'O';
label = 'Other';
output;
run;
proc format cntlin=cntlin;
run;
ods html;
proc freq data=have;
table x1-x10;
format
x1 x1top.
x2 x2top.
x3 x3top.
x4 x4top.
x5 x5top.
x6 x6top.
x7 x7top.
x8 x8top.
x9 x9top.
x10 x10top.
;
run;
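If a physically recoded column is needed after all, for example for a procedure that does not group by formatted values, the same formats could be applied with PUT. A minimal sketch, assuming the x1top. format created above exists; the names have_recoded and x1_grp are hypothetical.

```sas
data have_recoded;          /* hypothetical output dataset */
  set have;
  length x1_grp $8;         /* long enough for 'Other' or a 3-digit value */
  x1_grp = put(x1, x1top.); /* top values keep their own label, the rest become 'Other' */
run;
```

This writes a new dataset, so for large tables the format-only approach above avoids the extra pass.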

sas + formatting macro variable result created using proc sql select into:

I use code to write tables that captures the total count for each column as a macro variable, then uses it in the labels statement to complete the table column headers.
The count cohort&cnum._tot is created as:
proc sql noprint;
select count(*) into : cohort&cnum._tot from &analytic_file. (&&cohort&cnum);
quit;
And is used:
proc print data=TABLES.&tbl noobs label split="*";
var label_
c1_STAT1 c2_STAT1 c12_stat
c3_STAT1 c4_STAT1 c34_stat
c5_STAT1 c6_STAT1 c56_stat ;
* labeling step creates column header detail ;
label
%do i=1 %to &num;
c&i._STAT1 = "&&&c&i.lab. * N= &&cohort&i._tot. * N"
%end;
c12_stat = "* * * % of row"
c34_stat = "* * * % of row"
c56_stat = "* * * % of row"
;
run;
I've looked around and can't find a solution ... so I'm here asking is there a way to format &&cohort&i._tot. so that it returns 8,675,309 instead of 8675309?
Thanks!
You can format the count(*) in the select by using the PUT function. In this example the row count is multiplied to get a number large enough to require commas. The TRIMMED option removes leading and trailing spaces from the value before sticking it into the macro variable.
proc sql noprint;
select put( 123456789 * count(*),comma18.-L) into :count trimmed from sashelp.class;
%put !&count.!;
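The alternative is to leave the macro variable unformatted and format it where it is used, via %sysfunc. Either of the two ways below works, provided &count holds a plain number (not the comma-formatted string created above, whose embedded commas would be read as argument separators).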
%put %sysfunc(sum(&count.), comma12.); %* format feature of sysfunc evaluation;
%put %sysfunc(putn(&count , comma12.)); %* versus putn function;
You can assign the format in your proc sql using format=comma12.
Your code would be like this:
proc sql noprint;
select count(*) format=comma12. into : cohort&cnum._tot from &analytic_file. (&&cohort&cnum);
quit;
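Without TRIMMED, the formatted value keeps its leading blanks, which then show up in the label text. A sketch combining format= with TRIMMED, using sashelp.class as a stand-in table; the macro variable name n_fmt is hypothetical.

```sas
proc sql noprint;
  /* format= controls how the number is converted to text; TRIMMED strips the padding */
  select count(*) format=comma12. into :n_fmt trimmed
  from sashelp.class;
quit;

%put !&n_fmt.!;  /* the exclamation marks make stray blanks visible in the log */
```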

How to write a concise list of variables in table of a freq when the variables are differentiated only by a suffix?

I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1 S2 S3 S4 Frequency
A 10
A E 100
A E J F 300
B 10
B E 100
B E J F 300
but with an instruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is PROC FREQ code.
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is way to get counts using PROC SUMMARY
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages you can instead use the WAYS statement to get both the overall and the individual cell counts. And then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
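The macro %DO loop variant mentioned above might look like the following. This is a sketch only: the macro name starlist and its parameters are hypothetical, and it assumes the variables are named s1 to s10 in a dataset called have.

```sas
%macro starlist(prefix=s, n=10);  /* hypothetical helper macro */
  %local i list;
  %do i = 1 %to &n;
    %if &i = 1 %then %let list = &prefix.&i;
    %else %let list = &list.*&prefix.&i;
  %end;
  &list  /* the macro resolves to s1*s2*...*sN */
%mend starlist;

proc freq data=have;
  tables %starlist(prefix=s, n=10) / list missing;
run;
```

Because the macro resolves to plain text, it can be called directly inside the TABLES statement without an intermediate macro variable.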
But here is an easy way to make such a macro variable from ANY variable list not just those with numeric suffixes.
First get the variables names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimension.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to &top;
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to &top;
end;
%end;
run;
%mend;
%have;
As you probably noticed table s: created one freq per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimension.
NOTE: If you specify out=, the column names in the output data set will be the last variable in the level. So for above, the out= table will have a column "s5", but contain counts corresponding to combinations for each s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE, I don't trust eyeballs. If zero rows with differences (per noequal) then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)

How to create an output row if a proc sql “group by” group has no observations

I am working in SAS Enterprise guide and am running a proc sql query as follows:
proc sql;
CREATE TABLE average_apples AS
SELECT farm, size, type, mean(apples) as average_apples
FROM input_table
GROUP BY farm, size, type
;
quit;
For some of the data sets I am running this query on there are groups which have no observations assigned to them, so there is no entry for them in the query output.
How can I force this query to return a row for each of my groups (for example, with a value of 0 in the apples column)?
Thanks up front for the help!
I'd do this:
/* sample input table */
data input_table;
length farm size type $3 apples 8;
stop; /* try also with this statement commented out
to check the result for non-empty input table */
run;
proc sql;
CREATE TABLE average_apples AS
SELECT farm, size, type, mean(apples) as average_apples
FROM input_table
GROUP BY farm, size, type
;
quit;
%let group_rows = &SQLOBS;
%put &group_rows;
data average_apples_blank;
if &group_rows ne 0 then set average_apples(obs=0);
else do;
array zeros {*} _numeric_ /* or your list of variables */;
do i=1 to dim(zeros);
zeros[i] = 0;
end;
output; /* empty row */
end;
drop i;
run;
proc append base=average_apples data=average_apples_blank force;
run;
Try this
proc sql;
select f.farm, s.size, t.type, coalesce(mean(i.apples), 0) as average_apples
from (select distinct farm from input_table) as f
cross join (select distinct size from input_table) as s
cross join (select distinct type from input_table) as t
left join input_table as i
on i.farm = f.farm and i.size = s.size and i.type = t.type
group by f.farm, s.size, t.type;
quit;
I did not test it, though. If it does not work, put this in a comment and I will debug it.
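A different technique, not used in either answer above, is PROC SUMMARY with the COMPLETETYPES option, which emits a row for every combination of the class values that occur somewhere in the data. A minimal sketch, assuming the input_table layout from the question; combinations without observations come out with _FREQ_ = 0 and a missing mean, which the follow-up step sets to 0.

```sas
proc summary data=input_table nway completetypes;
  class farm size type;
  var apples;
  output out=average_apples(drop=_type_) mean=average_apples;
run;

data average_apples;
  set average_apples;
  /* empty combinations have no observations, so their mean is missing */
  if _freq_ = 0 then average_apples = 0;
run;
```

Like the cross-join query, this can only produce combinations of values that appear somewhere in input_table; levels that never occur at all would need a separate lookup table.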