Using dataset variable values as IN clause values in SAS

I have a dataset in which I store all the values to be used in an IN clause elsewhere.
DATA INVALUES;
INPUT INVAL;
DATALINES;
1
2
3
;
RUN;
I have to use the invalues in another dataset as below.
DATA OUTPUT;
SET INPUT;
IF A IN ( --- INVAL values from dataset INVALUES ----);
;
RUN;
Could this be done in any way?

You can use a macro variable for this.
/* Put the inval values in a comma separated macro variable */
proc sql noprint;
select inval into :inval separated by ","
from invalues;
quit;
/* prints the macro variable in the log */
%put &inval; /* 1,2,3 */
/* use the macro variable in the IF statement; the IN operator needs parentheses */
DATA OUTPUT;
SET INPUT;
IF A IN (&inval);
RUN;
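If the lookup variable were character rather than numeric, each value in the macro variable would have to be quoted. A minimal sketch of that case follows; the datasets INVALUES_C and HAVE_C and the macro variable INVAL_C are made up for illustration and are not part of the question.

```sas
/* Hypothetical lookup table with a character variable */
data invalues_c;
input inval $;
datalines;
A
B
C
;
run;

/* Hypothetical data to filter */
data have_c;
input a $;
datalines;
A
X
C
;
run;

/* QUOTE() wraps each value in double quotes; STRIP() removes trailing blanks first */
proc sql noprint;
select quote(strip(inval)) into :inval_c separated by ','
from invalues_c;
quit;
%put &inval_c; /* "A","B","C" */

/* keeps only the rows where A is one of the listed values */
data output_c;
set have_c;
if a in (&inval_c);
run;
```

Values that themselves contain double quotes are doubled by QUOTE(), so the generated list stays valid.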

A neater way is to use a subquery in SQL, so there is no need to worry about the datatype or quoting.
proc sql ;
create table output as
select *
from input
where a in(select distinct inval from invalues) ;
quit ;

Related

Matching SAS character variables to a list

So I have a vector of search terms, and my main data set. My goal is to create an indicator for each observation in my main data set where variable1 includes at least one of the search terms. Both the search terms and variable1 are character variables.
Currently, I am trying to use a macro to iterate through the search terms, and for each search term, indicate if it is in the variable1. I do not care which search term triggered the match, I just care that there was a match (hence I only need 1 indicator variable at the end).
I am a novice when it comes to using SAS macros and loops, but I have tried searching and piecing together code from some online sites. Unfortunately, when I run it, it does nothing; it does not even give me an error.
I have put the code I am trying to run below.
*for example, I am just testing on one of the SASHELP data sets;
*I take the first five team names to create a search list;
data terms; set sashelp.baseball (obs=5);
search_term = substr(team,1,3);
keep search_term;;
run;
*I will be searching through the baseball data set;
data test; set sashelp.baseball;
run;
%macro search;
%local i name_list next_name;
proc SQL;
select distinct search_term into : name_list separated by ' ' from work.terms;
quit;
%let i=1;
%do %while (%scan(&name_list, &i) ne );
%let next_name = %scan(&name_list, &i);
*I think one of my issues is here. I try to loop through the list, and use the find command to find the next_name and if it is in the variable, then I should get a non-zero value returned;
data test; set test;
indicator = index(team,&next_name);
run;
%let i = %eval(&i + 1);
%end;
%mend;
Thanks
Here's the temporary array solution which is fully data driven.
1. Store the number of terms in a macro variable to assign the length of arrays
2. Load terms to search into a temporary array
3. Loop through for each word and search the terms
4. Exit loop if you find the term to help speed up the process
/*1*/
proc sql noprint;
select count(*) into :num_search_terms from terms;
quit;
%put &num_search_terms.;
data flagged;
*declare array;
array _search(&num_search_terms.) $ _temporary_;
/*2*/
*load array into memory;
if _n_ = 1 then do j=1 to &num_search_terms.;
set terms;
_search(j) = search_term;
end;
set test;
*set flag to 0 for initial start;
flag = 0;
/*3*/
*loop through and create flag;
do i=1 to &num_search_terms. while(flag=0); /*4*/
if find(team, _search(i), 'it')>0 then flag=1;
end;
drop i j search_term ;
run;
Not sure I totally understand what you are trying to do but if you want to add a new binary variable that indicates if any of the substrings are found just use code like:
data want;
set have;
indicator = index(term,'string1') or index(term,'string2')
... or index(term,'string27') ;
run;
Not sure what a "vector" would be but if you had the list of terms in a dataset you could easily generate that code from the data. And then use %include to add it to your program.
filename code temp;
data _null_;
set term_list end=eof;
file code ;
if _n_ =1 then put 'indicator=' @ ;
else put ' or ' @;
put 'index(term,' string :$quote. ')' @;
if eof then put ';' ;
run;
data want;
set have;
%include code / source2;
run;
If you did want to think about creating a macro to generate code like that then the parameters to the macro might be the two input dataset names, the two input variable names and the output variable name.
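A minimal sketch of such a macro, wrapping the code-generation step above, might look like this. The macro name FLAG_TERMS and all of its parameter names are hypothetical, and an OUT= parameter for the result dataset is added beyond the five parameters suggested above.

```sas
/* Hypothetical macro: flags rows of &data where &var contains any value of &termvar in &terms */
%macro flag_terms(data=, var=, terms=, termvar=, out=, flag=indicator);
filename code temp;
/* write one assignment statement: flag=index(...) or index(...) ... ; */
data _null_;
set &terms end=eof;
file code;
if _n_ = 1 then put "&flag=" @;
else put ' or ' @;
put "index(&var," &termvar :$quote. ')' @;
if eof then put ';';
run;
/* pull the generated statement into the data step */
data &out;
set &data;
%include code / source2;
run;
filename code clear;
%mend flag_terms;

/* example call using the TERMS dataset built from SASHELP.BASEBALL earlier */
%flag_terms(data=sashelp.baseball, var=team, terms=terms, termvar=search_term, out=flagged2)
```

The sketch assumes the terms dataset has at least one row; with an empty list nothing is written to the temporary file and the %include step fails.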

Print values in a PROC IML

This is a dataset that I am using.
data have;
input name $char17.; datalines;
Abdallah
Abou Hanna
Afonso
Angre
Audepart
Bah Aicha
Baudras
Berthelot
;
This dataset has just one variable, which holds some names. I have to create a SAS macro, which I have named RAN_NAMES, that takes a number NUM and prints that many names. The problem I am facing is that I have to print the names through a macro variable '&NAME', one per line, using a DO loop, and print the dates alongside them.
I don't know how to create a character macro variable, because inside IML only a matrix is returned.
%MACRO RAN_NAMES(NUM);
PROC IML;
varnms = {"nom"};
USE WORKK.NOMS;
READ ALL VAR varnms INTO NAMES;
CLOSE WORKK.NOMS;
N = NROW(NAMES);
IF &NUM. > N THEN DO; PRINT "Maximum number of names exceeded. You can enter more than 41"; ABORT; END;
ELSE DO;
SAM = SAMPLE(NAMES, &NUM.);
END;
QUIT;
%MEND RAN_NAMES;

Concatenating all variables in an observation in SAS

Is there a general purpose way of concatenating each variable in an observation into one larger variable whilst preserving the format of numeric/currency fields in terms of how it looks when you do a proc print on the dataset. (see sashelp.shoes for example)
Here is some code you can run. As you can see in the log, using the catx function to produce comma-separated output removes both the $ currency sign and the period from the numeric variables.
proc print data=sashelp.shoes (obs=10);
run;
proc sql;
select name into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
Any solution needs to be general purpose as it will run on disparate datasets with differently formatted variables.
You can manually loop over PDV variables of the data set, concatenating each formatted value retrieved with vvaluex. A hash can be used to track which variables of the data set to process. If you are comma separating values you will probably want to double quote formatted values that contain a comma.
data want;
set sashelp.cars indsname=_data;
if _n_ = 1 then do;
declare hash vars();
length _varnum 8 _varname $32;
vars.defineKey('_n_');
vars.defineData('_varname');
vars.defineDone();
_dsid = open(_data);
do _n_ = 1 to attrn(_dsid,'NVAR');
rc = vars.add(key:_n_,data:varname(_dsid,_n_));
end;
_dsid = close(_dsid);
call missing (of _:);
end;
format weight comma7.;
length allcat $32000 _vvx $32000;
do _n_ = 1 to vars.NUM_ITEMS;
vars.find();
_vvx = strip(vvaluex(_varname));
if index(_vvx,",") then _vvx = quote(strip(_vvx));
if _n_ = 1
then allcat = _vvx;
else allcat = cats(allcat,',',_vvx);
end;
drop _:;
run;
You can use import and export to csv file:
filename tem temp;
proc export data=sashelp.SHOES file=tem dbms=csv replace;
run;
data l;
length all $ 200;
infile tem truncover firstobs=2;
input all 1-200;
run;
P.S.
If you need to concatenate only character variables, you can create an array of all CHARACTER columns in the dataset and just iterate through it:
data l;
length all $ 5000;
set sashelp.SHOES;
array ch [*] _CHARACTER_;
do i = 1 to dim(ch);
all=catx(',',all,ch[i]);
end;
run;
The PUT statement is the easiest way to do that. You don't need to know the variables names as you can use the _all_ variable list.
put (_all_) (+0);
It will honor the formats attached to the variables, and if you have used the DSD option on the FILE statement then the result is a delimited list.
What is the ultimate goal of this exercise? If you want to create a file you can just write the file directly.
data _null_;
set sashelp.shoes(obs=3);
file 'myfile.csv' dsd ;
put (_all_) (+0);
run;
If you really do want to get that string into a dataset variable there is no need to invent some new function. Just take advantage of the PUT statement's abilities by creating a file and then reading the lines from the file.
filename junk temp;
data _null_;
set sashelp.shoes(obs=3);
file junk dsd ;
put (_all_) (+0);
run;
data stuff ;
set sashelp.shoes(obs=3);
infile junk truncover ;
input all $5000.;
run;
You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.
filename junk temp;
data stuff;
set sashelp.shoes(obs=3);
file junk dsd lrecl=5000 ;
length all $5000;
put @1 (_all_) (+0) +(-2) ' ' @;
all = _file_;
output;
all=' ';
put @1 all $5000. @;
run;
Solution with the vvalue function and the concatenation operator (||):
It is similar to the 'solution without catx' (the last one), but simplified by using the vvalue function instead of put.
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for concat function (||)*/
proc sql;
select ('strip(vvalue('|| strip(name) ||'))') into :varstr4 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step to concat all variables*/
data stuff2;
format all $5000.;
set work.wocatx ;
all = &varstr4. ;
put all;
run;
Solution with catx:
proc print data=SASHELP.SHOES;
run;
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "SASHELP" and
memname = "SHOES";
quit;
data stuff();
format all $5000.;
set sashelp.shoes ;
all = catx(',',&varstr2.) ;
put all;
run;
If a variable has no format in dictionary.columns, the macro variable varstr2 contains just its name. If it has a format, the generated expression applies that format inside catx: for example, put(Sales,DOLLAR12.) for a numeric variable, or the input function for a character variable. You can add further conditions to the select ... into if you need them.
If the input function is not needed, just change the select expression to:
ifc(strip(format) is missing,strip(name),'put('|| strip(name) ||','|| strip(format) ||')')
Solution without catx:
/*edit sashelp.shoes with missing values in Product as test-cases*/
proc sql noprint;
create table wocatx as
select * from SASHELP.SHOES;
update wocatx
set Product = '';
quit;
/*Macro variable for catx*/
proc sql;
select ifc(strip(format) is missing,strip(name),ifc(type='num','put('|| strip(name) ||','|| strip(format) ||')','input('|| strip(name) ||','|| strip(format) ||')')) into :varstr2 separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*data step with catx*/
data stuff;
format all $5000.;
set work.wocatx ;
all = catx(',',&varstr2.) ;
put all;
run;
/*Macro variable for concat function (||)*/
proc sql;
select ifc(strip(format) is missing,
'strip(' || strip(name) || ')',
'strip(put('|| strip(name) ||','|| strip(format) ||'))') into :varstr3 separated by "|| ',' ||"
from dictionary.columns
where libname = "WORK" and
memname = "WOCATX";
quit;
/*Data step without catx*/
data stuff1;
format all $5000.;
set work.wocatx ;
all = &varstr3. ;
put all;
run;
Result with catx and missing values:
Result without catx and with missing values:

How to write a concise list of variables in table of a freq when the variables are differentiated only by a suffix?

I have a dataset with some variables named sx for x = 1 to n.
Is it possible to write a freq which gives the same result as:
proc freq data=prova;
table s1 * s2 * s3 * ... * sn /list missing;
run;
but without listing all the names of the variables?
I would like an output like this:
S1 S2 S3 S4 Frequency
A 10
A E 100
A E J F 300
B 10
B E 100
B E J F 300
but with an instruction like this (which, of course, is invented):
proc freq data=prova;
table s1:sn /list missing;
run;
Why not just use PROC SUMMARY instead?
Here is an example using two variables from SASHELP.CARS.
So this is PROC FREQ code.
proc freq data=sashelp.cars;
where make in: ('A','B');
tables make*type / list;
run;
Here is a way to get counts using PROC SUMMARY.
proc summary missing nway data=sashelp.cars ;
where make in: ('A','B');
class make type ;
output out=want;
run;
proc print data=want ;
run;
If you need to calculate the percentages you can instead use the WAYS statement to get both the overall and the individual cell counts. And then add a data step to calculate the percentages.
proc summary missing data=sashelp.cars ;
where make in: ('A','B');
class make type ;
ways 0 2 ;
output out=want;
run;
data want ;
set want ;
retain total;
if _type_=0 then total=_freq_;
percent=100*_freq_/total;
run;
So if you have 10 variables you would use
ways 0 10 ;
class s1-s10 ;
If you just want to build up the string "S1*S2*..." then you could use a DO loop or a macro %DO loop and put the result into a macro variable.
data _null_;
length namelist $200;
do i=1 to 10;
namelist=catx('*',namelist,cats('S',i));
end;
call symputx('namelist',namelist);
run;
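The macro %DO loop alternative mentioned above could be sketched as follows. The macro name STARLIST and its parameters are made up for illustration; it returns the list as text so it can be used directly in a TABLES statement or a %LET.

```sas
/* Hypothetical helper: builds PREFIX1*PREFIX2*...*PREFIXn as text */
%macro starlist(prefix=S, n=10);
%local i list;
%do i=1 %to &n;
%if &i=1 %then %let list=&prefix&i;
%else %let list=&list*&prefix&i;
%end;
&list
%mend starlist;

%put %starlist(n=4); /* S1*S2*S3*S4 */
%let namelist=%starlist(n=10);
```

Because the macro generates no statements of its own, it can also be called inline, as in tables %starlist(n=10) / list missing;.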
But here is an easy way to make such a macro variable from ANY variable list not just those with numeric suffixes.
First get the variables names into a dataset. PROC TRANSPOSE is a good way if you use the OBS=0 dataset option so that you only get the _NAME_ column.
proc transpose data=have(obs=0) ;
var s1-s10 ;
run;
Then use PROC SQL to stuff the names into a macro variable.
proc sql noprint;
select _name_
into :namelist separated by '*'
from &syslast
;
quit;
Then you can use the macro variable in your TABLES statement.
proc freq data=have ;
tables &namelist / list missing ;
run;
In short, no. There is no shortcut syntax for specifying a variable list that crosses dimension.
In long, yes -- if you create a surrogate variable that is an equivalent crossing.
Discussion
Sample data generator:
%macro have(top=5);
%local index;
data have;
%do index = 1 %to &top;
do s&index = 1 to 2+ceil(3*ranuni(123));
%end;
array V s:;
do _n_ = 1 to 5*ranuni(123);
x = ceil(100*ranuni(123));
if ranuni(123) < 0.1 then do;
ix = ceil(&top*ranuni(123));
h = V(ix);
V(ix) = .;
output;
V(ix) = h;
end;
else
output;
end;
%do index = 1 %to &top;
end;
%end;
run;
%mend;
%have;
As you probably noticed table s: created one freq per s* variable.
For example:
title "One table per variable";
proc freq data=have;
tables s: / list missing ;
run;
There is no shortcut syntax for specifying a variable list that crosses dimension.
NOTE: If you specify out=, the column names in the output data set will be the last variable in the level. So for above, the out= table will have a column "s5", but contain counts corresponding to combinations for each s1 through s5.
At each dimensional level you can use a variable list, as in level1 * (sublev:) * leaf. The same caveat for out= data applies.
Now, reconsider the original request discretely (no-shortcut) crossing all the s* variables:
title "1 table - 5 columns of crossings";
proc freq data=have;
tables s1*s2*s3*s4*s5 / list missing out=outEach;
run;
And, compare to what happens when a data step view uses a variable list to compute a surrogate value corresponding to the discrete combinations reported above.
data haveV / view=haveV;
set have;
crossing = catx(' * ', of s:); * concatenation of all the s variables;
keep crossing;
run;
title "1 table - 1 column of concatenated crossings";
proc freq data=haveV;
tables crossing / list missing out=outCat;
run;
Reality check with COMPARE, I don't trust eyeballs. If zero rows with differences (per noequal) then the out= data sets have identical counts.
proc compare noprint base=outEach compare=outCat out=diffs outnoequal;
var count;
run;
----- Log -----
NOTE: There were 31 observations read from the data set WORK.OUTEACH.
NOTE: There were 31 observations read from the data set WORK.OUTCAT.
NOTE: The data set WORK.DIFFS has 0 observations and 3 variables.
NOTE: PROCEDURE COMPARE used (Total process time)

How to scan a numeric variable

I have a table like this:
Lista_ID 1 4 7 10 ...
in total there are 100 numbers.
I want to call a macro I created once for each of these numbers. I was trying to use 'scan' but read that it's only for character variables.
Here is the code I ran (the error it produced is shown after it):
proc sql;
select ID INTO: LISTA_ID SEPARATED BY '*' from
WORK.AMOSTRA;
run;
PROC SQL;
SELECT COUNT(*) INTO: NR SEPARATED BY '*' FROM
WORK.AMOSTRA;
RUN;
%MACRO CICLO_teste();
%LET LIM_MSISDN = %EVAL(NR);
%LET I = %EVAL(1);
%DO %WHILE (&I<= &LIM_MSISDN);
%LET REF = %SCAN(LISTA_ID,&I,,'*');
DATA WORK.UP&REF;
SET WORK.BASE&REF;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%LET I = %EVAL(&I+1);
%END;
%MEND;
%CICLO_TESTE;
The error was:
VARIABLE PERC IS UNINITIALIZED and
VARIABLE FIRST.ID_CLIENTE IS UNINITIALIZED.
What I want is to run this macro for each one of the Id's in the List I showed before, and that are referenced in work.base&ref and work.up&ref.
How can I do it? What I'm doing wrong?
thanks!
Here's the CALL EXECUTE version.
%MACRO CICLO_teste(REF);
DATA WORK.UP&REF;
SET WORK.BASE&REF;
BY ID_CLIENTE;
FORMAT PERC_ACUM 9.3;
IF FIRST.ID_CLIENTE THEN PERC_ACUM=0;
PERC_ACUM+PERC;
RUN;
%MEND CICLO_teste;
DATA _NULL_;
SET amostra;
*CREATE YOUR MACRO CALL;
STR = CATT('%CICLO_TESTE(', ID, ')');
CALL EXECUTE(STR);
RUN;
First, note that resolving a SAS macro variable is essentially a text-based copy-and-paste operation: all user-defined macro variables hold text. Therefore, %eval is unnecessary in this case.
Other miscellaneous corrections include:
Check the %scan() function for correct usage. The first argument should be a text string WITHOUT QUOTES, and a macro variable must be referenced with an ampersand (&lista_id, not LISTA_ID).
run; is redundant in proc sql since each SQL statement runs as soon as it is submitted. Use quit; to exit proc sql.
A semicolon is not required after a macro call (it sometimes causes unexpected problems).
Use %do %to for loops.
The code below should work.
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
proc sql noprint;
select id into :lista_id separated by ' ' from work.amostra;
select count(*) into :nr separated by ' ' from work.amostra;
quit;
* check;
%put lista_id=&lista_id nr=&nr;
%macro ciclo_teste();
%local ref;
%do i = 1 %to &nr;
%let ref = %scan(&lista_id, &i);
%*check;
%put ref = &ref;
/* your task below */
/* data work.up&ref;*/
/* set work.base&ref;*/
/* by id_cliente; (required for first.id_cliente) */
/* format perc_acum 9.3;*/
/* if first.id_cliente then perc_acum=0;*/
/* perc_acum + perc;*/
/* run; */
%end;
%mend;
%ciclo_teste()
tested on SAS 9.4 win7 x64
Edited:
In fact I would recommend doing this to avoid scanning a long string which is inefficient.
%macro tester();
/* get the number of obs (a more efficient way) */
%local NN;
proc sql noprint;
select nobs into :NN
from dictionary.tables
where upcase(libname) = 'WORK'
and upcase(memname) = 'AMOSTRA';
quit;
/* assign &ref by random access */
%do i = 1 %to &NN;
data _null_;
a = &i;
set work.amostra point=a;
call symputx('ref',id,'L');
stop;
run;
%*check;
%put ref = &ref;
/* your task below */
%end;
%mend;
%tester()
Please let me know if you have further questions.
Wow that seems like a lot of work. Why not just do the following:
data work.amostra;
input id;
cards;
1
4
7
10
;
run;
%macro test001;
proc sql noprint;
select count(*) into: cnt
from amostra;
quit;
%let cnt = &cnt;
proc sql noprint;
select id into: x1 - :x&cnt
from amostra;
quit;
%do i = 1 %to &cnt;
%let x&i = &&x&i;
%put &&x&i;
%end;
%mend test001;
%test001;
Now the macro variables &x1 through &&x&cnt hold your values, and you can process them however you like.
In general, if your list is small enough (macro variables are limited to 64K characters) then you are better off passing the list in a single delimited macro variable instead of multiple macro variables. Remember that PROC SQL automatically sets the count into the macro variable SQLOBS, so there is no need to run the query twice. Or you can use %sysfunc(countw()) to count the number of entries in your delimited list.
proc sql noprint ;
select id into :idlist separated by '|' from .... ;
%let nr=&sqlobs;
quit;
...
%do i=1 %to &nr ;
%let id=%scan(&idlist,&i,|);
data up&id ;
...
%end;
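As a small self-contained illustration of the %sysfunc(countw()) alternative, here is a sketch that counts and loops over a pipe-delimited list. The macro name LOOP_IDS is hypothetical, and the list is typed in directly rather than read from a dataset.

```sas
/* Hypothetical macro: walks a pipe-delimited list of ids */
%macro loop_ids(idlist);
%local i nr id;
/* count the entries, treating | as the only delimiter */
%let nr=%sysfunc(countw(&idlist,|));
%do i=1 %to &nr;
%let id=%scan(&idlist,&i,|);
%put NOTE: processing id=&id;
%end;
%mend loop_ids;

%loop_ids(1|4|7|10)
```

The %PUT line is where the real work (for example the data up&id step) would go.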
If you do generate multiple macro variables there is no need to set the upper bound in advance as SAS will only create the number of macro variables it needs based on the number of observations returned by the query.
select id into :idval1 - from ... ;
%let nr=&sqlobs;
If you are using an older version of SAS then you need to set an upper bound on the macro variable range.
select id into :idval1 - :idval99999 from ... ;