Why does WHERE work in a SAS DATA step but IF does not?

Working code:
data t2;
set t1;
where a like "%SR";
run;
Code that fails:
data t2;
set t1;
if a like "%SR";
run;
Error message:
ERROR 388-185: Expecting an arithmetic operator.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
It complains about 'like'.
Any idea?

LIKE is not an operator that ordinary SAS expressions understand. It works in WHERE only because the WHERE statement supports SQL-style operators such as LIKE and BETWEEN, which makes it easier to push the WHERE condition into a remote database.
Use some other way to test whether the last two letters are SR. Here are two methods.
if 'SR' = substrn(a,length(a)-1);
if 'RS' =: left(reverse(a));
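As a minimal sketch of the first method, assuming a small made-up table t1 (the sample values are hypothetical), the subsetting IF below should keep only the rows whose value ends in SR. The length check is an added guard for values shorter than two characters.

```sas
/* hypothetical sample data, for illustration only */
data t1;
input a $char10.;
datalines;
ABCSR
SRABC
MARK SR
;
run;

data t2;
set t1;
/* compare the last two non-blank characters with 'SR' */
if length(a) >= 2 and substrn(a, length(a)-1) = 'SR';
run;
```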

The most similar solution is to use prxmatch:
data t2;
set t1;
if prxmatch("/SR\s*$/o",a); /* $ anchors the match at the end of the value; \s* allows trailing blanks */
run;
Note that this is much slower than WHERE with LIKE, and Tom's solutions are faster if there's a reasonable way to do them (as there is in the example).

The LIKE operator is not understood by the DATA Step IF statement.
LIKE is available to DATA Step in the WHERE statement, the WHERE= data set option, or PROC SQL WHERE clause.
data have;
input text $CHAR20.;
datalines;
ABCEFG
YESSR
Mark JR
Mark SR
;
data want;
set have;
where text like '%SR'; /* where statement */
run;
data want;
set have(where=(text like '%SR')); /* where= option */
run;
proc sql;
create table want as
select text from have
where text like '%SR' /* where clause */
;
quit;

Related

SAS: proc reg and macro

I have a data set that contains 30 variables and 2000 observations.
I want to run a regression in a loop, where in each step I delete the i-th row of the data.
So in the end my output should be 2001 regressions: one on all the data and 2000 in which a different row is dropped each time.
I am new to SAS, and I tried to find out how to do it with a macro, but I didn't understand.
Any comments and help will be appreciated!
This will create the data set I was talking about in my comment to Chris.
data del1V /view=del1v;
length group _obs_ 8;
set sashelp.class nobs=nobs;
_obs_ = _n_;
group=0;
output;
do group=1 to nobs;
if group eq _n_ then;
else output;
end;
run;
proc sort data=del1v out=analysis;
by group;
run;
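A possible next step, sketched under assumptions: the ANALYSIS data set built above can be passed to a single PROC REG call with a BY statement. The model (weight on height and age, variables that exist in sashelp.class) and the output data set name EST are illustrative choices, not part of the original answer.

```sas
/* one PROC REG call; one model is fitted per value of GROUP */
proc reg data=analysis outest=est noprint;
by group;
model weight = height age; /* hypothetical model for illustration */
run;
quit;

/* EST holds one row of parameter estimates per GROUP */
proc print data=est;
run;
```

GROUP=0 is the fit on all rows, and each other value of GROUP is the fit with that observation left out.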
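DATA NEW;
SET OLD;
/* write each row once per group, skipping the group whose number equals the row number */
do group = 1 to 2001;
IF _N_ ^= group THEN output; /* group 2001 keeps every row */
end;
run;
proc sort data=new;
by group;
run;
proc reg data=new; /* your PROC REG statements go here */
by group;
run;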
This will create a data set that is much longer. You will only call proc reg once, but it will run 2001 models.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. SAS can create a data set with the GROUP column to differentiate the results.
I edited my original answer per @data null's suggestion. I agree that the above is probably faster, though I'm not confident that it would be 100x faster. I do not know enough about the overhead of calling PROC REG repeatedly versus the cost of BY-group processing on a larger data set. Regardless, the answer above is simpler to program. Here is my original answer as an alternate approach.
You can do this within a macro program. It will have this general structure:
%macro regress;
%do i=1 %to 2001;
DATA NEW;
SET OLD;
IF _N_=&I THEN DELETE;
RUN;
proc reg syntax;
run;
%end;
%mend;
%regress
Macros are an advanced programming feature in SAS. A macro is required here in order to loop over PROC REG. Keywords that start with % are macro statements, and &i is a reference to a macro variable (& is the prefix used when a macro variable is resolved). The macro is defined in a block that starts with %macro and ends with %mend, and it is called by %regress.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. Use &i to create a different data set each time and then append together as part of the macro loop.

List all the output created by a SAS step

Is there a way to get a list of all the outputs (datasets/files) created by a step(iteration) in SAS?
I tried using the automatic variables but all that I could get was the last created dataset using &syslast and &sysdsn variables. But what if a data step creates multiple datasets? How can I get their names/details automatically in SAS without using any list, etc keywords? Is there a way possible?
Please Suggest!
Thank you!
I don't believe this is possible. The only way I can think of is to parse the log following your data step / iteration.
For this you can use something like:
/* set up a fresh log prior to your iteration */
%let logloc=%sysfunc(pathname(work))/mylog.txt;
proc printto log="&logloc" new;
run;
/* run your iteration */
data mystep with lots of output datasets;
set something;
run;
/* return to normal logging */
proc printto log=log;
run;
data _null_;
infile "&logloc";
input;
if _infile_=:'data' then do;
/* perform log scanning */
/* will likely need some complex logic to be robust!*/
end;
run;
PROC SCAPROC will report this in the log, with the caveat that you have to run the process first and then you'll get the output.
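A minimal sketch of the PROC SCAPROC approach, assuming the record file can be written to the WORK directory (the file name record.txt and the data sets a and b are made up for illustration):

```sas
/* start recording; the file location is an assumption */
proc scaproc;
record "%sysfunc(pathname(work))/record.txt";
run;

/* the step to analyse: it creates two data sets */
data a b;
set sashelp.class;
if sex = 'F' then output a;
else output b;
run;

/* stop recording and write the collected information */
proc scaproc;
write;
run;
```

The record file should then contain comment lines that name the data sets each step opened for input and output, which can be read back with a DATA step much like the log-scanning example above.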

Automating IF and then statement in sas using macro in SAS

I have data with various types of loan descriptions; there are at least 100 of them.
I have to categorise them into various buckets using IF/THEN statements. Please have a look at the code below for reference.
data des;
set desc;
if loan_desc in ('home_loan','auto_loan')then product_summary ='Loan';
if loan_desc in ('Multi') then product_summary='Multi options';
run;
For illustration I have shown just two loan descriptions, but I have around 1000 different loan_desc values that I need to categorise into different buckets.
How can I categorise these loan descriptions into buckets without writing product_summary and loan_desc again and again in the code, which makes it very lengthy and time-consuming?
Please help!
Another option for categorizing is to use a format. This example defines the format with a manual VALUE statement, but you can also create a format from a data set if you have the to/from values in a data set. As indicated by @Tom, this lets you change only the table while the code stays the same for future changes.
One note regarding your current code: you're using separate IF/THEN statements rather than IF/THEN followed by ELSE IF. You should use ELSE IF because evaluation then stops as soon as one condition is met, rather than testing every condition.
proc format;
value $ loan_fmt
'home_loan', 'auto_loan' = 'Loan'
'Multi' = 'Multi options';
run;
data want;
set have;
product_summary = put(loan_desc, $loan_fmt.); /* look up the bucket for each loan description */
run;
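Building the format from a data set, as mentioned above, could look roughly like this. It is a sketch under assumptions: the mapping table MAP and the control data set FMT_CTL are hypothetical names, and DESC is the input data set from the question.

```sas
/* hypothetical mapping table: one row per loan_desc value */
data map;
length loan_desc $20 product_summary $20;
infile datalines dsd;
input loan_desc $ product_summary $;
datalines;
home_loan,Loan
auto_loan,Loan
Multi,Multi options
;
run;

/* PROC FORMAT expects the columns FMTNAME, TYPE, START and LABEL */
data fmt_ctl;
set map(rename=(loan_desc=start product_summary=label));
retain fmtname 'loan_fmt' type 'C'; /* type C = character format */
run;

proc format cntlin=fmt_ctl;
run;

data want;
set desc;
product_summary = put(loan_desc, $loan_fmt.);
run;
```

With this layout, adding or changing a bucket means editing only the rows of MAP.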
For a mapping exercise like this, the best technique is to use a mapping table. This is so the mappings can be changed without changing code, among other reasons.
A simple example is shown below:
/* create test data */
data desc (drop=x);
do x=1 to 3;
loan_desc='home_loan'; output;
loan_desc='auto_loan'; output;
loan_desc='Multi'; output;
loan_desc=''; output;
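end;
run;
data map;
length loan_desc $9 product_summary $13; /* set lengths up front so 'Multi options' is not truncated */
loan_desc='home_loan'; product_summary ='Loan'; output;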
loan_desc='auto_loan'; product_summary ='Loan'; output;
loan_desc='Multi'; product_summary='Multi options'; output;
run;
/* perform join */
proc sql;
create table des as
select a.*
,coalescec(b.product_summary,'UNMAPPED') as product_summary
from desc a
left join map b
on a.loan_desc=b.loan_desc;
quit;
There is no need to use the macro language for this task (I have updated the question tag accordingly).
Good solutions have already been proposed (I like @Reeza's PROC FORMAT solution), but here's another route that also minimizes coding.
Generate sample data
data have;
loan_desc="home_loan"; output;
loan_desc="auto_loan"; output;
loan_desc="Multi"; output;
loan_desc=""; output;
run;
Using PROC SQL's case expression
This way doesn't allow, to my knowledge, having several criteria on a single when line, but it really simplifies coding since the resulting variable's name needs to be written down only once.
proc sql;
create table want as
select
loan_desc,
case loan_desc
when "home_loan" then "Loan"
when "auto_loan" then "Loan"
when "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;
Otherwise, using the following syntax is also possible, giving the same results:
proc sql;
create table want as
select
loan_desc,
case
when loan_desc in ("home_loan", "auto_loan") then "Loan"
when loan_desc = "Multi" then "Multi options"
else "Unknown"
end as product_summary
from have;
quit;

How does "into" work in PROC SQL?

This is PROC SQL. Could anyone explain what I am getting as output ? Thanks !
proc sql;
select time into :date from end_date;
quit;
In addition to Chris J's answer, the INTO clause is very versatile. The resources listed below give a good overview.
Essentially, using the INTO clause you can create a macro variable that holds a list of items separated by a custom delimiter, or create a whole host of macro variables inside a single PROC SQL step, a task that could otherwise take multiple DATA _NULL_ and PROC SORT/MEANS/FREQ steps.
It is the PROC SQL equivalent of using %let date = <some time value>; or, inside a DATA step:
DATA _NULL_;
set end_date;
call symputx("date", time);
RUN;
Using the Magical Keyword "INTO:" in PROC SQL
SAS(R) 9.2 Macro Language: Reference: INTO Clause
It simply puts the result into a macro variable; in this case the macro variable DATE contains the time value from the first row returned from the data set end_date.
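The forms of INTO described in the answers can be sketched as follows. This is an illustration only: it uses sashelp.class instead of end_date, the macro variable names are made up, and the TRIMMED keyword and the open-ended range :n1- assume SAS 9.3 or later.

```sas
proc sql noprint;
/* single macro variable: only the first row's value is stored */
select name into :first_name trimmed
from sashelp.class;

/* all values in one macro variable, with a custom delimiter */
select name into :name_list separated by ', '
from sashelp.class;

/* one macro variable per row: n1, n2, ... */
select name into :n1-
from sashelp.class;
quit;

%put first_name=&first_name;
%put name_list=&name_list;
%put rows read by the last query=&sqlobs;
```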

How to remove Warning from SAS log?

I am getting a log warning stating
WARNING: 21 observations omitted due to missing ID values
I was transposing the data set using this code:
PROC TRANSPOSE DATA= PT OUT= PT;
BY SOC_NM PT_NM;
ID TREATMENT;
VAR COUNT;
RUN;
I want to remove this warning from the log. Is there any option available in SAS for this?
Thank you for your help.
You need to decide whether you are keeping the TREATMENT=' ' records or not. If you want to keep them, then you need to assign a nonmissing value to TREATMENT. If not, then the WHERE statement like vasja's answer will work.
Would adding a WHERE statement do the job for you?
PROC TRANSPOSE DATA= PT OUT= PT;
BY SOC_NM PT_NM;
ID TREATMENT;
VAR COUNT;
WHERE NOT MISSING(TREATMENT);
RUN;
Before transposing, add this condition in a DATA step:
if TREATMENT=. then TREATMENT=99;
After transposing, drop the variable "_99".
There's no option to remove warning messages from the log. If you really must keep your code as is then you can use PROC PRINTTO to temporarily divert the log output to an external file. However, this means you won't see anything in the log for that particular step, so it is not something I would recommend unless you are very sure of what you are doing. Check out the example code below, you'll see that only the steps creating tables a and c show in the log.
data a;
run;
proc printto log='c:\temp\temp.log';
run;
data b;
run;
proc printto;
run;
data c;
run;