Print a warning to the log from an if/then/else statement in SAS

I am writing an if/then/else statement, where the final else is:
if variable2 = 'foo' then variable = 'bar';
else variable = .;
Can I print a custom 'warning' to the log file that has a list or array of the variable2 names where variable = . ?

You can use the PUTLOG statement to write messages to the log.
if variable2 = 'foo' then variable = 'bar' ;
else do;
variable = . ;
putlog "WARNING: bad value " variable2 = ;
end ;

It looks like the OP wanted a single message with all the relevant variable names listed. There are various ways to do that... the easiest would be a quick PROC SQL on the output data set after this step finishes:
proc sql noprint;
select distinct variable2 into :bad_vals separated by ' '
from my_data
where variable = .
;
quit;
%put WARNING: Bad values of VARIABLE: &bad_vals;
This would be fine for relatively small data sets; for big data sets you could avoid the extra pass through the data by maintaining a list of relevant values while the initial data step is running, and just print the message once at the end of the data step.
data mydata;
length bad_vals $ 10000;
retain bad_vals; /* keep the accumulated list across observations */
drop bad_vals;
set in end=end;
...blah...
...else do;
variable = .;
bad_vals = catx(' ', bad_vals, variable2); /* append, skipping blanks */
end;
if end then do;
putlog 'WARNING: Bad values of VARIABLE: ' bad_vals;
end;
run;
or you could use a macro var instead of the data step var bad_vals, etc.
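The macro variable variant mentioned above could look roughly like the following. This is a minimal sketch, not the answerer's code: the input data set HAVE and its values are made up for illustration, and because VARIABLE is character here, a blank is assigned instead of the numeric missing value used in the question.

```sas
/* Hypothetical input data, named HAVE for illustration only */
data have;
    input variable2 $;
    datalines;
foo
baz
foo
qux
;
run;

/* Make sure the macro variable exists even if the step reads no rows */
%let bad_vals = ;

data mydata;
    set have end=end;
    length bad_vals $ 10000;
    retain bad_vals;
    drop bad_vals;
    if variable2 = 'foo' then variable = 'bar';
    else do;
        variable = ' ';   /* assumption: blank stands in for "missing" */
        bad_vals = catx(' ', bad_vals, variable2);
    end;
    /* Copy the accumulated list to a macro variable on the last row */
    if end then call symputx('bad_vals', bad_vals);
run;

%put WARNING: Bad values of VARIABLE: &bad_vals;
```

Note that the %PUT statement runs unconditionally, so the warning is written even when the list is empty, and repeated values appear once per observation. Both can be handled with extra checks if that matters for your log.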

SAS Append datasets only if they exist

I have many datasets, one per month, with the same name apart from a suffix that identifies the month. For instance, the datasets that I am referencing with this code:
TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
are called "TEMPCAAD.LIFT_MODEL_V1_202021", "TEMPCAAD.LIFT_MODEL_V1_202022" and so on...
I am trying to append all the datasets, but some of them don't exist, so when I run the following code I get the error
Dataset "TEMPCAAD.LIFT_MODEL_V1_202022" does not exist.
%let currentmonth = &anomes_scores;
%let previousyearmonth = &anomes_x12;
data _null_;
length string $1000;
cur_month = input("&previousyearmonth.01",yymmdd8.);
do until (cur_month > input("&currentmonth.01",yymmdd8.));
string = catx(' ',trim(string),'TEMPCAAD.LIFT_&NOME_MODELO._&VERSAO_MODELO._'!! put(cur_month,yymmn6.));
cur_month = intnx('month',cur_month,1,'b');
end;
call symput('mydatasets',trim(string));
%put &mydatasets;
run;
data WORK.LIFTS_U6M;
set &mydatasets.;
run;
How can I append only existing datasets?
Instead of looping over every dataset name to check whether it exists, why don't you just extract the names of the datasets that do exist from dictionary.tables?
libname TEMPCAAD "/home/kermit/TEMPCAAD";
data tempcaad.lift_model_v1_202110 tempcaad.lift_model_v1_202111 tempcaad.lift_model_v1_202112;
id = 1;
output tempcaad.lift_model_v1_202110;
id = 2;
output tempcaad.lift_model_v1_202111;
id = 3;
output tempcaad.lift_model_v1_202112;
run;
%let nome_modelo = MODEL;
%let versao_modelo = V1;
proc sql;
select strip("TEMPCAAD."||memname) into :dataset separated by " "
from dictionary.tables
where libname="TEMPCAAD" and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO.%";
quit;
data want;
set &dataset.;
run;
You can easily tweak the WHERE clause to extract only the datasets that you wish to append. Just remember to use double quotes if you reference a macro variable in it, because macro variables do not resolve inside single quotes.
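As one example of such a tweak, the query below also restricts the list to a range of months. It is only a sketch: the %LET values are hypothetical stand-ins for the macro variables in the question, it assumes the TEMPCAAD libref is already assigned, and it assumes every dataset name ends in a YYYYMM suffix after the last underscore.

```sas
/* Hypothetical values; the question derives these from other macro variables */
%let nome_modelo       = MODEL;
%let versao_modelo     = V1;
%let previousyearmonth = 202101;
%let currentmonth      = 202112;
%let dataset = ;   /* stays empty if no dataset matches */

proc sql noprint;
    select catx('.', libname, memname)
        into :dataset separated by ' '
    from dictionary.tables
    where libname = "TEMPCAAD"
      and memname like "LIFT_&NOME_MODELO._&VERSAO_MODELO._%"
      /* compare the YYYYMM suffix as text; this works because it is fixed width */
      and scan(memname, -1, '_') between "&previousyearmonth." and "&currentmonth."
    order by 1;
quit;

%put NOTE: Datasets to append: &dataset;
```

If nothing matches, the macro variable stays empty, so it is worth checking it before using it in a SET statement.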

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's a temporary array solution that is fully data driven. The numbered comments in the code correspond to these steps:
1. Store the number of terms in a macro variable to set the size of the array.
2. Load the search terms into a temporary array.
3. For each observation, loop through the terms and search for each one.
4. Exit the loop as soon as a term is found, to speed up the process.
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
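A macro along those lines might look like the sketch below. The macro name flag_terms and its parameter names are hypothetical, chosen only for illustration; the generated code is the same chain of index() calls shown above, and the example call uses the TERMS data set built as in the question.

```sas
/* Search terms, built as in the question */
data terms;
    set sashelp.baseball(obs=5);
    search_term = substr(team, 1, 3);
    keep search_term;
run;

/* Hypothetical macro: parameter names are illustrative only */
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
    filename _code temp;

    /* Write one statement: flag = index(..) or index(..) ... ; */
    data _null_;
        set &terms end=eof;
        file _code;
        if _n_ = 1 then put "&flag = " @;
        else put ' or ' @;
        put "index(&var, " &termvar :$quote. ')' @;
        if eof then put ';';
    run;

    /* Run the generated statement against every observation */
    data &out;
        set &data;
        %include _code / source2;
    run;

    filename _code clear;
%mend flag_terms;

%flag_terms(data=sashelp.baseball, var=team, terms=terms,
            termvar=search_term, out=want)
```

The sketch does not handle an empty term list, and index() is case sensitive, so the terms must match the case of the text being searched.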

Print values in a PROC IML

This is a dataset that I am using.
data have;
input name $char17.; datalines;
Abdallah
Abou Hanna
Afonso
Angre
Audepart
Bah Aicha
Baudras
Berthelot
;
This consists of just one variable which has some names in it. I have to create a macro in SAS which I have named RAN_NAMES. I have to enter a NUM and then it will print that amount of names. The problem that I am facing is that I have to print the names using a macro variable '&NAME' using a DO loop in different lines. And with them I have to print the dates.
So I don't know how to create a macro character variable because inside IML only a matrix is returned.
%MACRO RAN_NAMES(NUM);
PROC IML;
varnms = {"nom"};
USE WORKK.NOMS;
READ ALL VAR varnms INTO NAMES;
CLOSE WORKK.NOMS;
N = NROW(NAMES);
IF &NUM. > N THEN DO; PRINT "Maximum number of names exceeded. You can enter more than 41"; ABORT; END;
ELSE DO;
SAM = SAMPLE(NAMES, &NUM.);
END;
QUIT;
%MEND RAN_NAMES;

SAS Array <array-elements> to jump by 10

I want to achieve the same output, but instead of hardcoding each array element I would like to use something like var1 - var10 that jumps by 10, like decades.
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
/* here and below something like:
array pop_diff_amount {10} pop_diff_amount_1920 -- pop_diff_amount_2010;*/
array pop_diff_amount {10} pop_diff_amount_1920 pop_diff_amount_1930
pop_diff_amount_1940 pop_diff_amount_1950
pop_diff_amount_1960 pop_diff_amount_1970
pop_diff_amount_1980 pop_diff_amount_1990
pop_diff_amount_2000 pop_diff_amount_2010;
array pop_diff_prcnt {10} pop_diff_prcnt_1920 pop_diff_prcnt_1930
pop_diff_prcnt_1940 pop_diff_prcnt_1950
pop_diff_prcnt_1960 pop_diff_prcnt_1970
pop_diff_prcnt_1980 pop_diff_prcnt_1990
pop_diff_prcnt_2000 pop_diff_prcnt_2010;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
I am still a beginner, so I am not sure whether this is possible or easy to achieve.
Thanks!
Not automatic, but not all that difficult either. First create a data set of the names, then transpose it and use an unexecuted SET statement (if 0 then set ...) to bring in the names, and then define the arrays. Note how the arrays are defined using [*] and name:, as you did with population_array.
data names;
do type = 'Amount','Prcnt';
do year=1920 to 2010 by 10;
length _name_ $32;
_name_ = catx('_','pop_diff',type,year);
output;
end;
end;
run;
proc print;
run;
proc transpose data=names out=pop_diff(drop=_name_);
var;
run;
proc contents varnum;
run;
data pop;
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
if 0 then set pop_diff;
array pop_diff_amount[*] pop_diff_amount:;
array pop_diff_prcnt[*] pop_diff_prcnt:;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
run;
proc print data=pop;
run;
SAS is automatically going to increment the array elements by 1. Here is an alternative solution that creates the variables using one extra step to create a set of macro variables that hold the desired variable names. Since you are basing them off of the variable POPULATION_<year>, we will simply grab the years from those variable names, create the variable names for the arrays that we want, and store them into a few macro variables.
proc sql noprint;
select cats('pop_diff_amount_', scan(name, -1, '_') )
, cats('pop_diff_prcnt_', scan(name, -1, '_') )
into :pop_diff_amount_vars separated by ' '
, :pop_diff_prcnt_vars separated by ' '
from dictionary.columns
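where libname = 'SASHELP'
AND memname = 'US_DATA'
AND upcase(name) LIKE 'POPULATION_%'
AND upcase(name) NE 'POPULATION_1910' /* the first year has no difference, so the names start at 1920 */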
;
quit;
data work.test(keep= statename pop_diff:);
set sashelp.us_data(keep=STATENAME POPULATION:);
array population_array {*} POPULATION_1910 -- POPULATION_2010;
dimp = dim(population_array);
array pop_diff_amount {*} &pop_diff_amount_vars.;
array pop_diff_prcnt {*} &pop_diff_prcnt_vars.;
do i=1 to dim(population_array) - 1;
pop_diff_amount{i} = population_array{i+1} - population_array{i};
pop_diff_prcnt{i} = (population_array{i+1} / population_array{i} -1) * 100;
end;
RUN;
Moving the year out of the metadata (the variable names) and into a data variable, year, would make coding life easier.
proc transpose data=sashelp.us_data out=us_pop(rename=(col1=Population));
by statename;
var population_:;
run;
data us_pop;
set us_pop;
by statename;
year = input(scan(_name_,-1,'_'),4.);
pop_diff_amount=dif(population);
pop_diff_prcnt =(population/lag(population))-1;
format pop_diff_prcnt percent10.2;
if first.statename then call missing(of pop_diff_amount pop_diff_prcnt);
drop _:;
run;
proc print data=us_pop(obs=10);
run;

How to calculate a mean for the non zero values using proc means or proc summary

I want to calculate a mean based on the non-zero values of given variables, using proc means only.
I know we can calculate this using proc sql, but I want to get it done through proc means or proc summary.
In my study I have 8 variables, so how can I calculate the mean based on non-zero values when I am using all of them in the var statement, as below:
proc means data=xyz;
var var1 var2 var3 var4 var5 var6 var7 var8;
run;
If we take one variable at a time in the var statement and use a where condition for non zero variables , it works but can we have something which would work for all the variables of interest mentioned in the var statement?
Your suggestions would be highly appreciated.
Thank you !
One method is to change all of your zero values to missing, and then use PROC MEANS.
data zeromiss /view=zeromiss ;
set xyz ;
array n{*} var1-var8 ;
do i = 1 to dim(n) ;
if n{i} = 0 then call missing(n{i}) ;
end ;
drop i ;
run ;
proc means data=zeromiss ;
var var1-var8 ;
run ;
Create a view of your input dataset. In the view, define a weight variable for each variable you want to summarise. Set the weight to 0 if the corresponding variable is 0 and 1 otherwise. Then do a weighted summary via proc means / proc summary. E.g.
data xyz_v /view = xyz_v;
set xyz;
array weights {*} weight_var1-weight_var8;
array vars {*} var1-var8;
do i = 1 to dim(vars);
weights[i] = (vars[i] ne 0);
end;
run;
%macro weighted_var(n);
%do i = 1 %to &n;
var var&i /weight = weight_var&i;
%end;
%mend weighted_var;
proc means data = xyz_v;
%weighted_var(8);
run;
This is less elegant than Chris J's solution for this specific problem, but it generalises slightly better to other situations where you want to apply different weightings to different variables in the same summary.
Can't you use a data step?
data lala;
set xyz;
drop qty;
mean = 0;
qty = 0;
if(not missing(var1) and var1 ^= 0) then do;
mean + var1;
qty + 1;
end;
if(not missing(var2) and var2 ^= 0) then do;
mean + var2;
qty + 1;
end;
/* ... repeat to all variables ... */
if(not missing(var8) and var8 ^= 0) then do;
mean + var8;
qty + 1;
end;
if qty > 0 then mean = mean/qty; else mean = .; /* avoid dividing by zero */
run;
If you want to keep the mean in the same xyz dataset, just replace lala with xyz.
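The repeated blocks above can be collapsed with an array. The sketch below assumes the same xyz data set with numeric variables var1 to var8; the helper variable names total, qty and i are my own. Like the step above, it computes one mean per observation across the eight variables, which is not the same as the per-variable means that proc means reports.

```sas
data lala;
    set xyz;
    array v{*} var1-var8;
    total = 0;
    qty   = 0;
    do i = 1 to dim(v);
        /* count only values that are neither missing nor zero */
        if not missing(v{i}) and v{i} ne 0 then do;
            total + v{i};
            qty + 1;
        end;
    end;
    /* mean stays missing when no non-zero values were found */
    if qty > 0 then mean = total / qty;
    drop i total qty;
run;
```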