I have a SAS table that has the if condition embedded in the condition1 column of that table. To be more explicit, I created a test dataset:
data test;
infile datalines delimiter=',';
input x1 x2 flag $ condition1 $ value_cond1_true $ value_cond1_false $ ;
datalines;
1,5, ,x1>x2,A,B
6,5, ,x2>x1,D,A
3,2, , ,C,D
;
run;
I am wondering if it possible to create a code that can directly output in the SAS code the if statement instead of creating a single macro-variable for each observation (&cond1_1, &cond1_2, ... &cond1_n).
Here is what I would want to do (I know it is not possible to use call symput in that case):
data final;
set test;
/* For each observation */
do i=1 to _n_;
/* Creating macro-variables for the if condition */
call symput("cond1",CONDITION1);
call symput("value_cond1_true",VALUE_COND1_TRUE);
call symput("value_cond1_false",VALUE_COND1_FALSE);
/* If the cond1 macro-variable is not empty then do */
if %sysevalf(%superq(cond1)=, boolean) = 0 then do;
if &cond1. then flag = &value_cond1_true.;
else flag = &value_cond1_false.;
end;
/* If the cond1 macro-variable is empty then */
else flag = "X";
end;
run;
Data can not modify the statements of a running DATA Step.
There is no 'dynamic expression resolver' that is part of data step.
There are some options though
Use the data to write source code
A different conditional has to be performed for each row (n)
Use resolve() to dynamically evaluate an expression in the macro system.
The values of the variables have to be replaced into the conditional for each row (n)
Write a program
filename evals temp;
data _null_;
file evals;
set test;
length statement $256;
put 'if _n_ = ' _n_ ' then do;';
if missing(condition1) then
statement = 'flag="X";'; /* 'call missing(flag);'; */
else
statement = 'flag = ifc('
|| trim(condition1) || ','
|| quote(trim(value_cond1_true )) || ','
|| quote(trim(value_cond1_false ))
|| ');';
put statement;
put 'end;';
run;
options source2;
data want;
set test;
length flag $8;
%include evals;
keep x1 x2 flag;
run;
filename evals;
RESOLVE function
data want;
set test;
length flag $8 cond expr $256;
cond = condition1;
cond = transtrn(cond,'x1',cats(x1));
cond = transtrn(cond,'x2',cats(x2));
expr = 'ifc(' || trim(cond) || ',' ||
trim(value_cond1_true) || ',' ||
trim(value_cond1_false) ||
')';
if not missing (condition1) then
flag = resolve ('%sysfunc(' || trim(expr) || ')');
else
flag = "X";
keep x1 x2 flag;
run;
If you are going to use the data to write code then take advantage of the power of the PUT statement.
data have;
infile cards dsd truncover;
input x1 x2 condition1 $ value_cond1_true $ value_cond1_false $ ;
cards;
1,5,x1>x2,A,B
6,5,x2>x1,D,A
3,2,,C,D
;
filename code temp;
data _null_;
set have;
file code;
if condition1 ne ' ';
put
'if _n_=' _n_ 'then flag=ifc(' condition1
',' value_cond1_true :$quote.
',' value_cond1_false :$quote.
');'
;
run;
data want;
set have;
length flag $8 ;
flag='X';
%include code / source2;
run;
Results:
84 data want;
85 set have;
86 length flag $8 ;
87 flag='X';
88 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file ...\#LN00059.
89 +if _n_=1 then flag=ifc(x1>x2 ,"A" ,"B" );
90 +if _n_=2 then flag=ifc(x2>x1 ,"D" ,"A" );
NOTE: %INCLUDE (level 1) ending.
91 run;
NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 3 observations and 6 variables.
Related
I would like to categorize variables from one table which looks like this:
Var1 Var2
19 0.2
30 0.1
45 0.2
With table that stores conditions for the categroziation
variable condition category
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
And the result of that would be a new table created containing categories of variables based on first table:
Var1 Var2
1 2
2 1
3 2
This is just a duplicate of this previous question. Categorize variables basing on conditions from other data set
Code generation from data is much easier to create and debug if you just use SAs code to do it and not add in complications of macro code.
Here is the answer again in more detail. First let's make your example data printouts into actual SAS datasets.
data rawdata ;
input Var1 Var2;
cards;
19 0.2
30 0.1
45 0.2
;
data metadata ;
input variable :$32. condition :$200. category ;
cards;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
Now let's generate an SQL select statement with a CASE statement to generate each output variable from the metadata.
filename code temp;
data _null_;
set metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table want as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put 'from rawdata' / ';' ;
sep=',' ;
run;
And run it.
proc sql;
%include code / source2 ;
quit;
Example SAS LOG:
1639 proc sql;
1640 %include code / source2 ;
NOTE: %INCLUDE (level 1) file CODE is file C:\Users\xxx\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00654.
1641 +create table want as select
1642 + case
1643 + when (Var1<20 ) then 1
1644 + when (40>Var1>=20 ) then 2
1645 + when (Var1>=40 ) then 3
1646 + else . end as Var1
1647 +,case
1648 + when (Var2<0.2 ) then 1
1649 + when (Var2>=0.2 ) then 2
1650 + else . end as Var2
1651 +from rawdata
1652 +;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
Results:
Obs Var1 Var2
1 1 2
2 2 1
3 3 2
If you want to convert it to macro then just replace the hard coded input dataset names and output dataset names with macro variable references.
%macro gencat(indata=,outdata=,metadata=metadata);
filename code temp;
data _null_;
set &metadata end=eof;
by variable ;
file code ;
retain sep ' ';
if _n_=1 then put "create table &outdata as select";
if first.variable then put sep $1. 'case ';
put ' when (' condition ') then ' category ;
if last.variable then put ' else . end as ' variable ;
if eof then put "from &indata" / ';' ;
sep=',' ;
run;
proc sql;
%include code / nosource2 ;
quit;
%mend gencat;
So now the same result is gotten by calling with these values:
%gencat(indata=rawdata,outdata=want)
So the log now looks like this:
1783 %gencat(indata=rawdata,outdata=want)
MPRINT(GENCAT): filename code temp;
NOTE: PROCEDURE SQL used (Total process time):
real time 10.35 seconds
cpu time 0.20 seconds
MPRINT(GENCAT): data _null_;
MPRINT(GENCAT): set metadata end=eof;
MPRINT(GENCAT): by variable ;
MPRINT(GENCAT): file code ;
MPRINT(GENCAT): retain sep ' ';
MPRINT(GENCAT): if _n_=1 then put "create table want as select";
MPRINT(GENCAT): if first.variable then put sep $1. 'case ';
MPRINT(GENCAT): put ' when (' condition ') then ' category ;
MPRINT(GENCAT): if last.variable then put ' else . end as ' variable ;
MPRINT(GENCAT): if eof then put "from rawdata" / ';' ;
MPRINT(GENCAT): sep=',' ;
MPRINT(GENCAT): run;
NOTE: The file CODE is:
Filename=C:\Users\AppData\Local\Temp\1\SAS Temporary Files\_TD13724_AMRL20B7F00CGPP_\#LN00659,
RECFM=V,LRECL=32767,File Size (bytes)=0,
Last Modified=02Feb2018:12:36:39,
Create Time=02Feb2018:12:36:39
NOTE: 12 records were written to the file CODE.
The minimum record length was 1.
The maximum record length was 28.
NOTE: There were 5 observations read from the data set WORK.METADATA.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
MPRINT(GENCAT): proc sql;
MPRINT(GENCAT): create table want as select case when (Var1<20 ) then 1 when (40>Var1>=20 ) then 2 when (Var1>=40 ) then 3 else .
end as Var1 ,case when (Var2<0.2 ) then 1 when (Var2>=0.2 ) then 2 else . end as Var2 from rawdata ;
NOTE: Table WORK.WANT created, with 3 rows and 2 columns.
MPRINT(GENCAT): quit;
Here is a macro way to accomplish this. It assumes that the conditions in the table are in the order you want them applied and grouped by variable. If not, then sort the table appropriately.
First test data:
data have;
input Var1 Var2;
datalines;
19 0.2
30 0.1
45 0.2
;
data conditions;
informat variable condition $32.;
input variable $ condition $ category;
datalines;
Var1 Var1<20 1
Var1 40>Var1>=20 2
Var1 Var1>=40 3
Var2 Var2<0.2 1
Var2 Var2>=0.2 2
;
Now make a macro. We will read the table into macro variables and then write a datastep to apply them. We use IF/THEN/ELSE blocks for each variable.
%macro apply_conditions();
%local i j n;
proc sql noprint;
select count(*) into :n trimmed from conditions;
%do i=1 %to &n;
%local var&i;
%local condition&i;
%local category&i;
%end;
select variable, condition, category
into :var1 - :var&n,
:condition1 - :condition&n,
:category1 - :category&n
from conditions;
quit;
data want;
set have;
%do i=1 %to &n;
/*If the variable changes, then don't add the ELSE */
%if &i>1 %then %do;
%let j=%eval(&i-1);
%if &&var&i = &&var&j %then %do;
else
%end;
%end;
/*apply the condition*/
if &&condition&i then
&&var&i = &&category&i;
%end;
run;
%mend;
Finally run the macro. Using MPRINT to see the code that is generated.
options mprint;
%apply_conditions;
I have a SAS table with a lot of missing values. This is only a simple example.
The real table is much bigger (>1000 rows) and the numbers is not the same. But what is the same is that I have a column a that have no missing numbers. Column b and c have a sequence that is shorter than the length of a.
a b c
1 1b 1000
2 2b 2000
3 3b
4
5
6
7
What I want is to fill b an c with repeating the sequences until they columns are full. The result should look like this:
a b c
1 1b 1000
2 2b 2000
3 3b 1000
4 1b 2000
5 2b 1000
6 3b 2000
7 1b 1000
I have tried to make a macro but it become to messy.
The hash-of-hashes solution is the most flexible here, I suspect.
data have;
infile datalines delimiter="|";
input a b $ c;
datalines;
1|1b|1000
2|2b|2000
3|3b|
4| |
5| |
6| |
7| |
;;;;
run;
%let vars=b c;
data want;
set have;
rownum = _n_;
if _n_=1 then do;
declare hash hoh(ordered:'a');
declare hiter hih('hoh');
hoh.defineKey('varname');
hoh.defineData('varname','hh');
hoh.defineDone();
declare hash hh();
do varnum = 1 to countw("&vars.");
varname = scan("&vars",varnum);
hh = _new_ hash(ordered:'a');
hh.defineKey("rownum");
hh.defineData(varname);
hh.defineDone();
hoh.replace();
end;
end;
do rc=hih.next() by 0 while (rc=0);
if strip(vvaluex(varname)) in (" ",".") then do;
num_items = hh.num_items;
rowmod = mod(_n_-1,num_items)+1;
hh.find(key:rowmod);
end;
else do;
hh.replace();
end;
rc = hih.next();
end;
keep a &Vars.;
run;
Basically, one hash is built for each variable you are using. They're each added to the hash of hashes. Then we iterate over that, and search to see if the variable requested is populated. If it is then we add it to its hash. If it isn't then we retrieve the appropriate one.
Assuming that you can tell how many rows to use for each variable by counting how many non-missing values are in the column then you could use this code generation technique to generate a data step that will use the POINT= option SET statements to cycle through the first Nx observations for variable X.
First get a list of the variable names;
proc transpose data=have(obs=0) out=names ;
var _all_;
run;
Then use those to generate a PROC SQL select statement to count the number of non-missing values for each variable.
filename code temp ;
data _null_;
set names end=eof ;
file code ;
if _n_=1 then put 'create table counts as select ' ;
else put ',' #;
put 'sum(not missing(' _name_ ')) as ' _name_ ;
if eof then put 'from have;' ;
run;
proc sql noprint;
%include code /source2 ;
quit;
Then transpose that so that again you have one row per variable name but this time it also has the counts in COL1.
proc transpose data=counts out=names ;
var _all_;
run;
Now use that to generate SET statements needed for a DATA step to create the output from the input.
filename code temp;
data _null_;
set names ;
file code ;
length pvar $32 ;
pvar = cats('_point',_n_);
put pvar '=mod(_n_-1,' col1 ')+1;' ;
put 'set have(keep=' _name_ ') point=' pvar ';' ;
run;
Now use the generated statements.
data want ;
set have(drop=_all_);
%include code / source2;
run;
So for your example data file with variables A, B and C and 7 total observations the LOG for the generated data step looks like this:
1229 data want ;
1230 set have(drop=_all_);
1231 %include code / source2;
NOTE: %INCLUDE (level 1) file CODE is file .../#LN00026.
1232 +_point1 =mod(_n_-1,7 )+1;
1233 +set have(keep=a ) point=_point1 ;
1234 +_point2 =mod(_n_-1,3 )+1;
1235 +set have(keep=b ) point=_point2 ;
1236 +_point3 =mod(_n_-1,2 )+1;
1237 +set have(keep=c ) point=_point3 ;
NOTE: %INCLUDE (level 1) ending.
1238 run;
NOTE: There were 7 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.
Populate a temporary array with the values, then check the row and add the appropriate value.
Setup the data
data have;
infile datalines delimiter="|";
input a b $ c;
datalines;
1|1b|1000
2|2b|2000
3|3b|
4| |
5| |
6| |
7| |
;
Get a count of the non-null values
proc sql noprint;
select count(*)
into :n_b
from have
where b ^= "";
select count(*)
into :n_c
from have
where c ^=.;
quit;
Now populate the missing values by repeating the contents of each array.
data want;
set have;
/*Temporary Arrays*/
array bvals[&n_b] $ 32 _temporary_;
array cvals[&n_c] _temporary_;
if _n_ <= &n_b then do;
/*Populate the b array*/
bvals[_n_] = b;
end;
else do;
/*Fill the missing values*/
b = bvals[mod(_n_+&n_b-1,&n_b)+1];
end;
if _n_ <= &n_c then do;
/*populate C values array*/
cvals[_n_] = c;
end;
else do;
/*fill in the missing C values*/
c = cvals[mod(_n_+&n_c-1,&n_c)+1];
end;
run;
data want;
set have;
n=mod(_n_,3);
if n=0 then b='3b';
else b=cats(n,'b');
if n in (1,0) then c=1000;
else c=2000;
drop n;
run;
I am trying to parse a delimited dataset with over 300 fields. Instead of listing all the input fields like
data test;
infile "delimited_filename.txt"
DSD delimiter="|" lrecl=32767 STOPOVER;
input field_A:$200.
field_B :$200.
field_C:$200.
/*continues on */
;
I am thinking I can dump all the field names into a file, read in as a sas dataset, and populate the input fields - this also gives me the dynamic control if any of the field names changes (add/remove) in the dataset. What would be some good ways to accomplish this?
Thank you very much - I just started sas, still trying to wrap my head around it.
This worked for me - Basically "write" data open code using macro language and run it.
Note: my indata_header_file contains 5 columns: Variable_Name, Variable_Length, Variable_Type, Variable_Label, and Notes.
%macro ReadDsFromFile(filename_to_process, indata_header_file, out_dsname);
%local filename_to_process indata_header_file out_dsname;
/* This macro var contain code to read data file*/
%local read_code input_in_line;
%put *** Processing file: &filename_to_process ...;
/* Read in the header file */
proc import OUT = ds_header
DATAFILE = &indata_header_file.
DBMS = EXCEL REPLACE; /* REPLACE flag */
SHEET = "Names";
GETNAMES = YES;
MIXED = NO;
SCANTEXT = YES;
run;
%let id = %sysfunc(open(ds_header));
%let NOBS = %sysfunc(attrn(&id.,NOBS));
%syscall set(id);
/*
Generates:
data &out_dsname.;
infile "&filename_to_process."
DSD delimiter="|" lrecl=32767 STOPOVER FIRSTOBS=3;
input
'7C'x
*/
%let read_code = data &out_dsname. %str(;)
infile &filename_to_process.
DSD delimiter=%str("|") lrecl=32767 STOPOVER %str(;)
input ;
/*
Generates:
<field_name> : $<field_length>;
*/
%do i = 1 %to &NObs;
%let rc = %sysfunc(fetchobs(&id., &i));
%let VAR_NAME = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Name)) ));
%let VAR_LENGTH = %sysfunc(getvarn(&id., %sysfunc(varnum(&id., Variable_Length)) ));
%let VAR_TYPE = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Type)) ));
%let VAR_LABEL = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Variable_Label)) ));
%let VAR_NOTES = %sysfunc(getvarc(&id., %sysfunc(varnum(&id., Notes)) ));
%if %upcase(%trim(&VAR_TYPE.)) eq CHAR %then
%let input_in_line = &VAR_NAME :$&VAR_LENGTH..;
%else
%let input_in_line = &VAR_NAME :&VAR_LENGTH.;
/* append in_line statment to main macro var*/
%let read_code = &read_code. &input_in_line. ;
%end;
/* Close the fid */
%let rc = %sysfunc(close(&id));
%let read_code = &read_code. %str(;) run %str(;) ;
/* Run the generated code*/
&read_code.
%mend ReadDsFromFile;
Sounds like you want to generate code based on metadata. A data step is actually a lot easier to code and debug than a macro.
Let's assume you have metadata that describes the input data. For example let's use the metadata about the SASHELP.CARS. We can build our metadata from the existing DICTIONARY.COLUMNS metadata on the existing dataset. Let's set the INFORMAT to the FORMAT since that table does not have INFORMAT value assigned.
proc sql noprint ;
create table varlist as
select memname,varnum,name,type,length,format,format as informat,label
from dictionary.columns
where libname='SASHELP' and memname='CARS'
;
quit;
Now let's make a sample text file with the data in it.
filename mydata temp;
data _null_;
set sashelp.cars ;
file mydata dsd ;
put (_all_) (:);
run;
Now we just need to use the metadata to write a program that could read that data. All we really need to do is define the variables and then add a simple INPUT firstvar -- lastvar statement to read the data.
filename code temp;
data _null_;
set varlist end=eof ;
by varnum ;
file code ;
if _n_=1 then do ;
firstvar=name ;
retain firstvar ;
put 'data ' memname ';'
/ ' infile mydata dsd truncover lrecl=1000000;'
;
end;
put ' attrib ' name 'length=' #;
if type = 'char' then put '$'# ;
put length ;
if informat ne ' ' then put #10 informat= ;
if format ne ' ' then put #10 format= ;
if label ne ' ' then put #10 label= :$quote. ;
put ' ;' ;
if eof then do ;
put ' input ' firstvar '-- ' name ';' ;
put 'run;' ;
end;
run;
Now we can just run the generated code using %INCLUDE.
%include code / source2 ;
I have a data set with a variable named "Condition" that I want to use in the code. I'm guessing I need to do it in a macro but I'm still learning how to write macros in SAS.
So if my data set is this:
Question,Answer,Condition,Result
Q1,1,Answer=1," "
Q2,2,Answer=1," "
Q3,3,Answer=4," "
Then I want the program to take the Condition variable as a string and then use it as an if statement:
if Condition then Result = "Correct";
Is this possible?
That is not easy to do. For your simple example you could do:
data want ;
set have ;
if cats('Answer=',answer) = condition then ....
But that will not generalize to situations where CONDITION references the values of other variables. You might be able to generate code from a set of unique values of CONDITION.
Sample data:
data have ;
infile cards dsd truncover ;
input Question $ Answer Condition :$30. Expected $ ;
cards;
Q1,1,Answer=1,"Correct"
Q2,2,Answer=1,"Wrong"
Q3,3,Answer=4,"Wrong"
;;;;
Generate code using unique values of CONDITION.
filename code temp ;
data _null_;
set have end=eof ;
by condition ;
file code ;
if _n_=1 then put 'SELECT ;' ;
if first.condition then put ' WHEN (' CONDITION= :$quote. ' AND (' condition ')) RESULT="CORRECT" ;' ;
if eof then put ' OTHERWISE RESULT="WRONG";'
/ 'END;'
;
run;
Use the generated code in a data step.
data want ;
set have ;
%inc code / source2;
run;
Sample Log records.;
252 data want ;
253 set have ;
254 %inc code / source2;
255 +SELECT ;
256 + WHEN (Condition="Answer=1" AND (Answer=1 )) RESULT="CORRECT" ;
257 + WHEN (Condition="Answer=4" AND (Answer=4 )) RESULT="CORRECT" ;
258 + OTHERWISE RESULT="WRONG";
259 +END;
NOTE: %INCLUDE (level 1) ending.
260 run;
I have a SAS dataset which contains one column of polynomials. For example, X1**(-2)+X1**(2).
Is there a function to transform this into a numeric expression?
Many thanks,
If I understand you correctly, I don't think there is a specific function that will easily let you do this. You have two options - write your own logic to interpret the polynomial expressions, or use call execute to have SAS write out a (potentially very long) data step for you, assuming that the polynomials are all entered as valid data step code. Here's a call execute approach:
data have;
input x1 polynomial $255.;
infile datalines truncover;
datalines;
1 X1**(-2)+X1**(2)
2 X1**(-1)+X1**(1)
3 X1**(1)+X1**(-1)
;
run;
data _null_;
set have end = eof;
if _n_ = 1 then call execute('data want; set have; select(_n_);');
call execute(catx(' ','when(',_N_,') y =',polynomial,';'));
if eof then call execute('end; run;');
run;
Convert them to macro variables, and then resolve them into a calculation...
Using the dataset example in user667489's answer :
/* Create numbered macro variables, 1 per row of data */
data _null_ ;
set have end=eof ;
call symputx(cats('POLY',_n_),polynomial) ;
if eof then call symputx('POLYN',_n_) ;
run ;
%MACRO ROWLOOPER ;
%DO N = 1 %TO &POLYN ;
if _n_ = &N then result = &&POLY&N ;
%END ;
%MEND ;
data want ;
set have ;
/* Not very efficient, looping over all polynomials on each row of data */
/* So for 3 rows, you'll perform 9 iterations here */
%ROWLOOPER ;
run ;
Or, alternatively, write your dataset out into a SAS program, and %inc that program :
data _null_ ;
file "polynomials.sas" ;
set have end=eof ;
if _n_ = 1 then do ;
put "data poly;" ;
put " set have;" ;
end ;
put " result = " polynomial ";" ;
if eof then put "run;" ;
run ;
%inc "polynomials.sas" ;