I need to break one variable into multiple variables (max length 2000).
For example, my string has length 10000000 (10 MB). I use:
proc sql;
create table str as
select
substr(string,2000,1) as field1,
substr(string,2000,2001) as field2,
.......
from data_table
Can I write a loop in the SELECT statement so that I don't have to write out field1-field5000 by hand?
Thank you!
First
The substr() function takes up to 3 arguments:
substr(string, position <, length>)
string - string constant or field
position - starting position
length - length of the string you want to return
Second
In PROC SQL you can only loop with macro language statements, so you must write a macro program.
options mprint;
%macro substrLoop;
%let length = 2000;
%let endLoop = %eval(1000000/&length.);
proc sql;
create table str as
select
%do i = 1 %to &endLoop.;
substr(string, %eval(1 + (&i.-1)*&length.),&length.) as field&i.
%if &i ne &endLoop. %then ,;
%end;
from data_table;
quit;
%mend substrLoop;
%substrLoop
Explanation
options mprint;
writes the code generated by the called macro to the log.
%let length = 2000;
%let endLoop = %eval(1000000/&length.);
Sets the macro variables for the substring length and calculates when the loop should end.
%do i = 1 %to &endLoop.;
substr(string, %eval(1 + (&i.-1)*&length.),&length.) as field&i.
%if &i ne &endLoop. %then ,;
%end;
The actual loop, which writes substr(string, 1, 2000) as field1, substr(string, 2001, 2000) as field2, etc. into the SQL code.
%if &i ne &endLoop. %then ,; is needed to avoid putting a comma after the last generated field.
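If the pieces only need to end up as variables, a data step with an array can do the same split without generating any code. The sketch below is an alternative to the macro above, not part of the original answer; the tiny sample table, the 3-character piece length and the use of SUBSTRN (which returns a short or empty value instead of writing a note when the position runs past the end of the string) are assumptions for illustration. Keep in mind that a single SAS character variable holds at most 32,767 bytes.

```sas
/* Hypothetical sample data: one 8-character string */
data data_table;
length string $8;
string = 'abcdefgh';
run;

/* Piece length and number of pieces: ceil(8/3) = 3 */
%let length = 3;
%let nfields = %sysevalf(8/&length, ceil);

data str;
set data_table;
/* declare field1-fieldN as character variables of the piece length */
length field1-field&nfields $&length;
array field{&nfields} field1-field&nfields;
do i = 1 to dim(field);
/* SUBSTRN returns a shorter (or empty) value past the end of STRING */
field{i} = substrn(string, 1 + (i-1)*&length, &length);
end;
drop i;
run;
```

Here field1-field3 become abc, def and gh; for real data, set the piece length to 2000 and the count to the string length divided by 2000, rounded up.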
Related
I need help with one query. I have to iterate over dates in a %DO loop; the values are in yyyymm form (e.g. 202112), and once the month reaches 12 it should automatically change to the first month of the next year.
Code:
%let startmo=202010 ;
%let endmo= 202102;
%macro test;
%do month= &startmo %to &endmo;
Data ABC_&month;
Set test&month;
X=&month ;
%end;
Run;
%mend;
%test;
The output should be 5 datasets:
ABC_202010
ABC_202011
ABC_202012
ABC_202101
ABC_202102
I need the macro variable month to resolve to 202101 once it has reached 202012.
Those are not actual DATE values, just strings that you have imposed your own interpretation on so that they LOOK like dates to you.
Use date values instead; then it is easy to generate strings in the style you need by using a FORMAT.
%macro test(startmo,endmo);
%local offset month month_string;
%do offset = 0 %to %sysfunc(intck(month,&startmo,&endmo));
%let month=%sysfunc(intnx(month,&startmo,&offset));
%let month_string=%sysfunc(putn(&month,yymmn6.));
data ABC_&month_string;
set test&month_string;
X=&month ;
format X monyy7.;
run;
%end;
%mend;
%test(startmo='01OCT2020'd , endmo='01FEB2021'd)
And if you need to convert one of those strings into a date value use an INFORMAT.
%let date=%sysfunc(inputn(202010,yymmn6.));
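Putting the informat and the format together, a function-style macro can turn the yyyymm strings from the question straight into the list of months, letting INTNX handle the December to January rollover. This is a sketch; the macro name %month_list is hypothetical.

```sas
/* Returns the yyyymm strings from STARTMO to ENDMO, e.g. 202010 ... 202102 */
%macro month_list(startmo, endmo);
%local start stop offset list;
/* the YYMMN6. informat turns 202010 into the date value for 01OCT2020 */
%let start = %sysfunc(inputn(&startmo, yymmn6.));
%let stop = %sysfunc(inputn(&endmo, yymmn6.));
%do offset = 0 %to %sysfunc(intck(month, &start, &stop));
/* INTNX steps one month at a time, so 12 rolls over to 01 automatically */
%let list = &list %sysfunc(intnx(month, &start, &offset), yymmn6.);
%end;
&list
%mend month_list;

%put NOTE: %month_list(202010, 202102);
```

The %put should write 202010 202011 202012 202101 202102 to the log; each value could then drive a data step such as data ABC_&month; in a %DO loop over the list.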
I would prefer to use a %DO %WHILE loop.
Check whether the last 2 characters are 12; if so, increment the year and change the month part to 01.
Code:
%let startmo=202010 ;
%let endmo= 202102;
%macro test;
%local mon yr;
%do %while(&startmo <= &endmo);
Data ABC_&startmo;
Set test&startmo;
X=&startmo ;
Run;
/* advance to the next month inside the loop: 12 rolls over to 01 of the next year */
%let mon = %substr(&startmo, 5, 2);
%let yr = %substr(&startmo, 1, 4);
%if &mon = 12 %then %let startmo = %eval(&yr + 1)01;
%else %let startmo = %eval(&startmo + 1);
%end;
%mend;
%test;
I have a dataset with each observation having two space-separated lists as string variables. I want a third variable showing the overlap between the string lists. Using another SO post, I've created a macro to calculate the overlap. I can't work out how to implement it in a DATA step to get the third variable.
This is my dataset, with dummy data:
data use;
infile datalines dlm='~~';
input list1:$100. list2:$100. expected_match:$10.;
datalines;
Homer Bart~~Homer Bart~~Full
Marge Lisa~~Lisa Marge~~Full
Homer Marge~~Marge~~Partial
Bart Lisa~~Bart~~Partial
Homer Marge Bart Lisa~~Maggie~~None
;;;;
run;
This is the macro, with tests (all of which pass):
%macro list_overlap(list1, list2);
%local i matches match_type;
%let matches = 0;
%do i = 1 %to %sysfunc(countw(&list1, %str( )));
%if %sysfunc(findw(&list2, %scan(&list1, &i,, s)))
%then %let matches = %eval(&matches + 1);
%end;
%if &matches = %sysfunc(countw(&list1, %str( )))
and %sysfunc(countw(&list1, %str( ))) = %sysfunc(countw(&list2, %str( )))
%then %let match_type = 'Full';
%else %if &matches = 0 %then %let match_type = 'None';
%else %let match_type = 'Partial';
match_type = &match_type%str(;)
%mend list_overlap;
%put NOTE: %list_overlap(Homer Bart,Homer Bart);
%put NOTE: %list_overlap(Marge Lisa,Lisa Marge);
%put NOTE: %list_overlap(Homer Marge,Marge);
%put NOTE: %list_overlap(Bart Lisa,Bart);
%put NOTE: %list_overlap(Homer Marge Bart Lisa,Maggie);
This is how I'm trying to implement it in a DATA step:
data matches;
set use;
call execute(catt('%list_overlap(', list1, ',', list2, ')'));
run;
I'm getting the following error with this case:
NOTE: Line generated by the CALL EXECUTE routine.
1 + match_type = 'Full';
__________
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
I've tried other ways too, but this is the closest I've got.
Looks like you want the RESOLVE() function instead of CALL EXECUTE.
data matches;
set use;
match_type = resolve(cats('%list_overlap(',list1,',',list2,')'));
run;
But with your current definition of the macro, the value of the MATCH_TYPE variable will contain all of the characters the macro generates (match_type = 'Full';). So remove the superfluous characters, so that the macro generates only the value you want to save.
... %then Full;
%else %if &matches eq 0 %then None;
%else Partial;
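A complete version along those lines might look like the sketch below. It reuses the sample data from the question; the restructured counting (N1/N2 variables) and the STRIP around RESOLVE, which removes any surrounding blanks in the generated text, are assumptions of this sketch rather than part of the answer.

```sas
/* Sample data from the question */
data use;
infile datalines dlm='~~';
input list1:$100. list2:$100. expected_match:$10.;
datalines;
Homer Bart~~Homer Bart~~Full
Marge Lisa~~Lisa Marge~~Full
Homer Marge~~Marge~~Partial
Bart Lisa~~Bart~~Partial
Homer Marge Bart Lisa~~Maggie~~None
;;;;
run;

/* Emits only the word Full, Partial or None */
%macro list_overlap(list1, list2);
%local i matches n1 n2;
%let matches = 0;
%let n1 = %sysfunc(countw(&list1, %str( )));
%let n2 = %sysfunc(countw(&list2, %str( )));
%do i = 1 %to &n1;
%if %sysfunc(findw(&list2, %scan(&list1, &i, %str( )))) %then %let matches = %eval(&matches + 1);
%end;
%if &matches = &n1 and &n1 = &n2 %then Full;
%else %if &matches = 0 %then None;
%else Partial;
%mend list_overlap;

data matches;
set use;
length match_type $10;
/* RESOLVE runs the macro once per observation while the step executes */
match_type = strip(resolve(cats('%list_overlap(', list1, ',', list2, ')')));
run;
```

The single quotes around '%list_overlap(' keep the macro from running while the data step compiles, so it only runs when RESOLVE is called for each row.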
Your problem here is that call execute isn't doing what you think, I suspect.
What's happening: the data step runs, and CALL EXECUTE runs the macro and stacks up the code it generates. That generated code only runs after the data step has finished, so what you effectively have is:
data matches;
set use;
call execute(stuff);
run;
match_type = 'Full';
That's not legal: it's a data step statement, but it isn't inside a data step.
Instead of doing all of that work in macro land, do it in data step land. Works just as well, and gets done what you want.
Something like this:
%macro list_overlap(list1,list2);
matches=0;
length match_type $7;
do i = 1 to countw(&list1,' ');
if findw(&list2,scan(&list1,i,' '))
then matches = matches + 1;
end;
/* Full only when every word matched AND both lists have the same number of words */
if matches eq countw(&list1,' ') and countw(&list1,' ') eq countw(&list2,' ') then match_type = 'Full';
else if matches eq 0 then match_type = 'None';
else match_type = 'Partial';
drop i matches;
%mend list_overlap;
Something like that; I can't test it right now, but it should generally work. Then don't CALL EXECUTE the macro, just call it normally inside the data step:
data matches;
set use;
%list_overlap(list1,list2);
run;
The following macro performs an inner join between two tables, keeping one column from each table in addition to the join column:
%macro ij(x=,y=,to=".default",xc=,yc=,by=);
%if &to = ".default" %then %let to = &from;
PROC SQL;
CREATE TABLE &to AS
SELECT t1.&xc, t2.&yc, t1.&by
FROM &x t1 INNER JOIN &y t2
ON t1.&by = t2.&by;
RUN;
%mend;
I want to find a way to use several columns in &xc, &yc and &by, but I don't think I can use vectors of variables.
My idea is to pass the parameters as vectors of strings instead of simple variables, for example xc = {"col1" "col2"}, and loop through them,
using %let some_var= %sysfunc(dequote(&some_string)); to convert them back to variables.
Applied on xc only it would become something like:
%macro ij(x=,y=,to=".default",xc=,yc=,by=);
%if &to = ".default" %then %let to = &from;
PROC SQL;
CREATE TABLE &to AS
SELECT
%do i = 1 %to %NCOL(&xc)
%let xci = %sysfunc(dequote(&xc[1]));
t1.&xci,
%end;
t2.&yc, t1.&by
FROM &x t1 INNER JOIN &y t2
ON t1.&by = t2.&by;
RUN;
%mend;
But this loop fails. How could I make it work ?
Note: this is a simplified example, my ultimate ambition is to build join macros that would be as little verbose as possible and integrate data quality checks.
Really, this would be much easier to code using SAS dataset options instead of building complicated macro logic.
proc sql ;
create table want2 as
select *
from sashelp.class(keep=name age)
natural inner join sashelp.class(keep=name height weight)
;
quit;
I would suggest learning how to use data step code instead of SQL code. For most normal data manipulations it is clearer and simpler. Say you wanted to combine IN1 and IN2 (both sorted by ID) on the variable ID, keeping the variables A and B from IN1 and the variables X and Y from IN2.
data out ;
merge in1 in2 ;
by id ;
keep id a b x y ;
run;
Second I would resist the urge to generate too complex a web of macro code. It will make the programs harder to understand for the next programmer. Including yourself two weeks later. Your particular example does not look like something that is worth coding as a macro. You are not really typing less information, just using a few commas in place of where your SQL code would have had keywords like FROM or JOIN.
Now to answer your actual question. To pass in a list of values to macro use a delimited list. When at all possible use space as the delimiter, but especially avoid using comma as the delimiter. This will be easier to type, easier to pass into the macro and easier to use since it matches the SAS language as you can see in the data step above. If you really need to generate code like SQL syntax that uses commas then have the macro code generate them where needed.
%macro ij
(x= /* First dataset name */
,y= /* Second dataset name */
,by= /* BY variable list */
,to= /* Output dataset name. If empty use data step to generate DATAn work name */
,xc= /* Variable list from first dataset */
,yc= /* Variable list from second dataset */
);
%if not %length(&to) %then %do;
* Let SAS generate a name for new dataset ;
data ; run;
%let to=&syslast ;
proc delete data=&to; run;
%end;
%if not %length(&xc) %then %let xc=*;
%if not %length(&yc) %then %let yc=*;
%local i sep ;
proc sql ;
create table &to as
select
%let sep= ;
/* %str( ) makes space the only delimiter, so * is not treated as one */
%do i=1 %to %sysfunc(countw(&by,%str( ))) ;
&sep.T1.%scan(&by,&i,%str( ))
%let sep=,;
%end;
%do i=1 %to %sysfunc(countw(&xc,%str( ))) ;
&sep.T1.%scan(&xc,&i,%str( ))
%end;
%do i=1 %to %sysfunc(countw(&yc,%str( ))) ;
&sep.T2.%scan(&yc,&i,%str( ))
%end;
from &x T1 inner join &y T2 on
%let sep= ;
/* multiple join conditions must be combined with AND, not commas */
%do i=1 %to %sysfunc(countw(&by,%str( ))) ;
&sep.T1.%scan(&by,&i,%str( ))=T2.%scan(&by,&i,%str( ))
%let sep=%str( and );
%end;
;
quit;
%mend ij ;
Try it:
options mprint;
%ij(x=sashelp.class,y=sashelp.class,by=name,to=want,xc=age,yc=height weight);
SAS LOG:
MPRINT(IJ): proc sql ;
MPRINT(IJ): create table want as select T1.name ,T1.age ,T2.height ,T2.weight from sashelp.class
T1 inner join sashelp.class T2 on T1.name=T2.name ;
NOTE: Table WORK.WANT created, with 19 rows and 4 columns.
MPRINT(IJ): quit;
Instead of vectors, think simple lists.
Pass your variable lists as unquoted, space separated list of values. The values are SAS variable names that can be scanned out as tokens.
%macro ij (x=, ...);
...
%local i token;
%let i = 1;
%do %while (%length(%scan(&X,&i)));
%let token = %scan(&X,&i);
&token.,/* emit the token as source code */
%let i = %eval(&i+1);
%end;
...
%mend;
%ij ( x = one two three, ... )
Be sure to localize all your macro variables to prevent unwanted side effects outside the macro.
For consistency I try to use i/o related macro parameters that mimic SAS Procs -- data=, out=, file=, ...
Some would say named arguments are verbose!
If your 'proto-code' expects the xci symbol to be some sort of serially numbered variable, it is not. You would have to use %local xc&i; %let xc&i= for assignment, and &&xc&i for resolution. Also, your original code references &from which is not passed.
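To illustrate the serially numbered approach, here is a small sketch: a hypothetical function-style macro that stores each column name in xc1, xc2, ... and then reads them back with &&xc&i while inserting the commas SQL needs.

```sas
/* Turns "col1 col2 col3" into "t1.col1, t1.col2, t1.col3" using
   serially numbered macro variables xc1, xc2, ... */
%macro prefix_cols(list, alias);
%local i n out;
%let n = %sysfunc(countw(&list, %str( )));
/* assignment: create xc1 ... xcN as local macro variables */
%do i = 1 %to &n;
%local xc&i;
%let xc&i = %scan(&list, &i, %str( ));
%end;
/* resolution: && becomes &, &i becomes 1, then &xc1 resolves to col1 */
%do i = 1 %to &n;
%if &i > 1 %then %let out = &out,;
%let out = &out &alias..&&xc&i;
%end;
&out
%mend prefix_cols;

%put NOTE: %prefix_cols(col1 col2 col3, t1);
```

The double period in &alias.. is needed because the first period ends the macro variable reference and the second one is the literal dot between alias and column.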
Building is fun. I would also recommend surveying past conference papers and SAS literature for similar works that may already meet your goal.
You could start with a space-separated list of column names and avoid looping entirely:
/*Define list of columns*/
%let COLS = A B C;
%put COLS = &COLS;
/*Add table alias prefix*/
%let REGEX = %sysfunc(prxparse(s/(\S+)/t1.$1/));
%let COLS = %sysfunc(prxchange(&REGEX,-1,&COLS));
%put COLS = &COLS;
%syscall prxfree(REGEX);
/*Condense multiple spaces to a single space*/
%let COLS = %sysfunc(compbl(&COLS));
%put COLS = &COLS;
/*Replace spaces with commas*/
%let COLS = %sysfunc(translate(&COLS,%str(,),%str( )));
%put COLS = &COLS;
In the end, as @Tom noted, SAS dataset options are more convenient, and with them one doesn't need to loop over variables.
Here is the macro I came up with:
*--------------------------------------------------------------------------------------------- ;
* JOIN ;
* Performs any join (defaults to inner join). ;
* By default left table is overwritten (convenient for successive left joins) ;
* Performs a natural join so columns should be renamed accordingly through 'rename' parameters ;
*----------------------------------------------------------------------------------------------;
%macro join
(data1= /* left table */
,data2= /* right table */
,keep1= /* columns to keep (default: keep all), don't use with drop */
,keep2=
,drop1= /* columns to drop (default: none), don't use with keep */
,drop2=
,rename1= /* rename pairs, such as old1=new1 old2=new2 */
,rename2=
,j=ij /* join type: ij, lj or rj */
,out= /* created table, by default data1 (left table is overwritten)*/
);
%if not %length(&out) %then %let out = &data1;
%if %length(&keep1) %then %let keep1 = keep=&keep1;
%if %length(&keep2) %then %let keep2 = keep=&keep2;
%if %length(&drop1) %then %let drop1 = drop=&drop1;
%if %length(&drop2) %then %let drop2 = drop=&drop2;
%if %length(&rename1) %then %let rename1 = rename=(&rename1);
%if %length(&rename2) %then %let rename2 = rename=(&rename2);
%let kdr1 =;
%let kdr2 =;
%if (%length(&keep1) | %length(&drop1) | %length(&rename1)) %then %let kdr1 = (&keep1&drop1 &rename1);
%if (%length(&keep2) | %length(&drop2) | %length(&rename2)) %then %let kdr2 = (&keep2&drop2 &rename2);
%if &j=lj %then %let j = LEFT JOIN;
%if &j=ij %then %let j = INNER JOIN;
%if &j=rj %then %let j = RIGHT JOIN;
proc sql;
create table &out as select *
from &data1&kdr1 t1 natural &j &data2&kdr2 t2;
quit;
%mend;
Reproducible Examples:
data temp1;
input letter $ number1 $;
datalines;
a 1
a 2
a 3
b 4
c 8
;
data temp2;
input letter $ letter2 $ number2 $;
datalines;
a c 666
b d 0
;
* left join on common columns into new table temp3;
%join(data1=temp1,data2=temp2,j=lj,out=temp3)
* inner join by default, overwriting temp 1, after renaming to join on another column;
%join(data1=temp1,data2=temp2,drop2=letter,rename2= letter2=letter)
The character variable in my dataset never matches the macro variable, so the %IF condition is never true. Kindly advise.
I am trying to match by month, create an array, and put counts only into the columns for specific months. It doesn't work because the month macro variable never matches the dataset variable holding the month.
/*create dummy data*/
data datefile;
input tran_date date9. cnt 3.;
datalines;
13feb2015 5
10feb2015 4
11feb2015 3
05feb2015 8
08feb2015 5
01jan2015 1
20dec2014 1
31jan2015 2
23dec2014 2
12jan2015 1
;
/*calculate month*/
data datefile11;
set datefile;
tran_mon=year(tran_date)*100+month(tran_date);
run;
/*select distinct month*/
proc sql;
create table datefile12 as select distinct(tran_mon)
from datefile11 order by tran_mon;
quit;
/*convert month from numeric to character*/
data datefile11(drop=tran_mon);
informat tran_mon2 $6.;
set datefile11;
tran_mon2=tran_mon;
run;
/*create macro variables through datastep*/
data datefile13;
set datefile12;
monum = cat('mnth',_N_);
run;
data _null_;
set datefile13;
call symput(monum,trim(left(tran_mon)));
run;
/*use array to make separate column for each month and
put split count for each month to each column*/
%macro c;
proc sql noprint;
select count(1) into :nrow from datefile13;
quit;
%let nrow = &nrow;
data datefile14;
set datefile11;
array mon{*} mon_1 - mon_&nrow;
%do i=1 %to &nrow;
%if tran_mon2 = &&mnth&i %then %do; %put tran_mon2;
mon_&i = cnt; %end;
%else %do; mon_&i = 0 ; %end;
%end;
run;
%mend c;
%c
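Your macro %if %then %do check executes while the data step is still being compiled; by the time the data step has begun to execute, there is no further opportunity to use macro logic like that. In fact, %if tran_mon2 = &&mnth&i compares the literal text tran_mon2 with a number such as 201412, so it can never be true.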
Try doing it the other way round - write your loop using if then do data step logic instead.
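A sketch of that data step version follows. It is not the original poster's code: it uses a subset of the sample data, collects the distinct months with PROC SQL into a space-separated list (the names month_list and tran_mon are assumptions), loads them into a temporary array, and compares them numerically for every observation.

```sas
/* Sample data (a subset of the question's DATEFILE) */
data datefile;
input tran_date :date9. cnt;
format tran_date date9.;
datalines;
13feb2015 5
01jan2015 1
20dec2014 1
31jan2015 2
;
run;

/* Distinct months as a space-separated list; SQLOBS holds how many there are */
proc sql noprint;
select distinct year(tran_date)*100 + month(tran_date) as tran_mon
into :month_list separated by ' '
from datefile
order by tran_mon;
quit;
%let nrow = &sqlobs;

data datefile14;
set datefile;
array mon{&nrow} mon_1-mon_&nrow;
/* month keys loaded once into a temporary array, e.g. (201412 201501 201502) */
array months{&nrow} _temporary_ (&month_list);
tran_mon = year(tran_date)*100 + month(tran_date);
/* this DO loop runs for every observation, unlike a macro %DO loop */
do i = 1 to dim(mon);
if tran_mon = months{i} then mon{i} = cnt;
else mon{i} = 0;
end;
drop i;
run;
```

Here the macro language only supplies constants (the month list and the count); every comparison happens at data step run time.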
I am creating a macro variable with the SAS code below. It's storing a list of data names where I need to replace certain values in specific variables.
proc sql noprint;
select distinct data_name
into :data_repl separated by ' '
from TP_attribute_matching
where Country="&Country_Name" and Replace_this ne ' ';
quit;
I would like to skip the following 2 blocks if data_repl is empty. These 2 blocks go through each data set and variables in that data set, and then replaces x with y.
/*Block 1*/
%do i=1 %to %_count_(word=&data_repl);
proc sql noprint;
select var_name,
Replace_this,
Replace_with
into :var_list_repl_&i. separated by ' ',
:repl_this_list_&i. separated by '#',
:repl_with_list_&i. separated by '#'
from TP_attribute_matching
where Replace_this ne ' ' and data_name="%scan(&data_repl,&i.)";
quit;
/* Block 2 */
%do i=1 %to %_count_(word=&data_repl);
data sasdata.%scan(&data_repl,&i);
set sasdata.%scan(&data_repl,&i);
%do j=1 %to %_count_(word=&&var_list_repl_&i.);
%let from=%scan("&&repl_this_list_&i.",&j,'#');
%let to=%scan("&&repl_with_list_&i.",&j,'#');
%scan(&&var_list_repl_&i.,&j)=translate(%scan(&&var_list_repl_&i.,&j),&to,&from);
%end;
run;
%end;
How shoould I do this? I was going through %SKIP and if then leave, but cannot figure this out yet.
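%IF and %DO are macro statements that can only be used inside a macro (SAS 9.4M5 and later also accept a simple, non-nested %IF-%THEN-%DO block in open code, but %DO loops still need a macro), so wrap the blocks in one: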
%macro DoSomething;
%if "&data_repl" ne "" %then %do;
/*Block 1*/
%do i=1 %to %_count_(word=&data_repl);
proc sql noprint;
select var_name,
Replace_this,
Replace_with
into :var_list_repl_&i. separated by ' ',
:repl_this_list_&i. separated by '#',
:repl_with_list_&i. separated by '#'
from TP_attribute_matching
where Replace_this ne ' ' and data_name="%scan(&data_repl,&i.)";
quit;
%end;
/* Block 2 */
%do i=1 %to %_count_(word=&data_repl);
data sasdata.%scan(&data_repl,&i);
set sasdata.%scan(&data_repl,&i);
%do j=1 %to %_count_(word=&&var_list_repl_&i.);
%let from=%scan("&&repl_this_list_&i.",&j,'#');
%let to=%scan("&&repl_with_list_&i.",&j,'#');
%scan(&&var_list_repl_&i.,&j)=translate(%scan(&&var_list_repl_&i.,&j),&to,&from);
%end;
run;
%end;
%end;
%mend;
%DoSomething
EDIT:
Instead of checking the string, you can use the row count from PROC SQL (the &SQLOBS macro variable):
%let SQLOBS=0; /* reset SQLOBS */
%let data_repl=; /* initialize data_repl,
would not be defined in case when no rows returned */
proc sql noprint;
select distinct data_name
into :data_repl separated by ' '
from TP_attribute_matching
where Country="&Country_Name" and Replace_this ne ' '
and not missing(data_name);
quit;
%let my_count = &SQLOBS; /* keep the record count from last PROC SQL */
...
%if &my_count gt 0 %then %do;
...
...
%end;
If you already have a main macro, there is no need to define a new one (I'm not sure what you're asking now).
First off, this is yet another good example where list processing basics would simplify the code to where you don't need to worry about your actual question. Will elaborate later.
Second, the way these loops are usually coded is something like
%do ... %while(&macrovar ne );
which checks for empty and doesn't execute the loop at all if it's empty to start with. &macrovar there would be the result of the scan, i.e.:
%let scan_result = %scan(&Data_repl.,1);
%do i = 1 %to %_count_... %while(&scan_result ne ); *perhaps minus one, not sure what %_count_() does exactly;
... code
%let scan_result=%scan(&data_Repl.,&i+1);
%end;
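A runnable sketch of the same pattern: the hypothetical macro %qualify below scans a space-separated list and prefixes each word with sasdata. When the list is empty, the %WHILE condition is already false, so the body never runs and nothing is generated.

```sas
/* Prefixes every word of LIST with sasdata.; an empty LIST yields nothing
   because the %DO %WHILE condition is false before the first iteration */
%macro qualify(list);
%local i word out;
%let i = 1;
%let word = %scan(&list, &i, %str( ));
%do %while(%length(&word));
%let out = &out sasdata.&word;
%let i = %eval(&i + 1);
%let word = %scan(&list, &i, %str( ));
%end;
&out
%mend qualify;

%put NOTE: [%qualify(cars class)];
%put NOTE: [%qualify()];
```

The first %put should show the two qualified names and the second an empty pair of brackets, which is exactly the "skip when data_repl is empty" behavior.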
Going back to list processing, what you're ultimately doing is:
data &dataset.;
set &dataset.;
[for some set of &variables,&tos, &froms]
&variable. = translate(&variable.,&to.,&from.);
[/set of variables]
run;
So what you need is a couple of macros. Assuming you have a dataset with
<dataset> <varname> <to> <from>
You can call this pretty easily. Two ways:
First, run it as a set of nested macros/calls. This is a bit messier, but might be easier to understand.
%macro do_dataset(data=);
proc sql noprint;
/* to_chars/from_chars: FROM is a reserved word in PROC SQL, so avoid it as a column name */
select cats('%convert_var(var=',varname,',to=',to_chars,',from=',from_chars,')')
into :convertlist separated by ' '
from dataset_with_conversions
where dataset="&data.";
quit;
data &data;
set &data;
&convertlist.;
run;
%mend do_dataset;
%macro convert_var(var=,to=,from=);
&var. = translate(&var.,"&to.","&from.");
%mend convert_var;
proc sql noprint;
/* DISTINCT: process each dataset once, even if it has several conversions */
select distinct cats('%do_dataset(data=',dataset,')')
into :dslist separated by ' '
from dataset_with_conversions;
quit;
&dslist;
Second, you can do all of that in one data step using CALL EXECUTE (rather than having two different steps). That is, use a BY dataset statement; for first.dataset execute data <dataset>; (filling in the name), for last.dataset execute run;, and otherwise execute the translates.
More complicated, but one pass solution - depends on your comfort level which you prefer, they should generally work similarly.
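A sketch of that one-pass version, under stated assumptions: the conversion table is a hypothetical CONVERSIONS dataset with columns dataset, varname, to_chars and from_chars, and the target is a work copy of SASHELP.CLASS.

```sas
/* Hypothetical inputs: a copy of SASHELP.CLASS and two conversion rules */
data class;
set sashelp.class;
run;

data conversions;
length dataset varname $32 to_chars from_chars $10;
input dataset $ varname $ to_chars $ from_chars $;
datalines;
class name X A
class sex F M
;
run;

proc sort data=conversions;
by dataset;
run;

/* Open a DATA step at first.dataset, write one TRANSLATE per row,
   close it with RUN at last.dataset; the stacked code runs after this step */
data _null_;
set conversions;
by dataset;
if first.dataset then call execute(catx(' ', 'data', dataset, '; set', dataset, ';'));
call execute(cats(varname, '=translate(', varname, ',', quote(trim(to_chars)), ',', quote(trim(from_chars)), ');'));
if last.dataset then call execute('run;');
run;
```

CATX is used for the DATA/SET line because CATS would strip the blank after the keyword and produce invalid code such as dataclass.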
If you want to skip code based on a parameter, for example when data_repl is null, add a check on its value. This avoids errors later on, such as in an include statement, where the null value would cause an error: for example, if a library path is derived from the value passed in, a null value leads to an invalid library path. The check looks like this:
%macro DoSomething(data_repl=);
%if "&data_repl" ne "" %then %do;
/* your code goes here */
%end;
%mend;
%DoSomething