Is there a way to extract all the data sources and libraries used in an open program in SAS EG?
I found the following code, which I can adapt, but unfortunately I cannot use FILENAME because of how SAS is set up here, and I cannot use FILENAME FTP either. Any ideas?
Filename inp "<path and name of the sas code/>";
data temp;
infile inp;
input rec $100.;
if index(lowcase(rec), "libname") ne 0 then output; /*check libraries */
else if index(lowcase(rec),"filename") ne 0 then output; /*check input files*/
else if index(lowcase(rec),"data") ne 0 and index(rec,".") ne 0 then output; /* check permanent datasets */
else if index(lowcase(rec),"merge") ne 0 and index(rec,".") ne 0 then output; /* check permanent datasets used in merge */
run;
I know I can use call execute to create and execute multiple data steps. But is there a way to generate code for one singular data step, with many repetitive lines of code?
In R for instance I can create a vector of variables, executing some paste/print statement and get something approximating the output I need. As follows:
strings<-c("Exkl_UtgUtl_Flyg",
"Exkl_UtgUtl_Tag",
"Exkl_UtgUtl_Farja",
"Exkl_UtgUtl_Hyrbil",
"Exkl_UtgUtl_Bo",
"Exkl_UtgUtl_Aktiv",
"Exkl_UtgUtl_Annat")
first_string<-strings[1]
other_strings<-strings[strings!=first_string]
gsub("\t", "",gsub("\n","",gsub(",","",paste0(
paste0("DATA IBIS3_5;
Set IBIS3_5;
if ",first_string,"=3 and hjalpvariabel=1 then do;",
first_string,"=1;",
paste0(gsub("Exkl_","",first_string),"SSEK_Pers")," = ",paste0(gsub("^Exkl_","",first_string),"SSEK_PPmedel"),";
end;
else if ",first_string,"=3 then ",first_string,"=2;"),
paste0("else if ",other_strings,"=3 and hjalpvariabel=1 then do;",
other_strings,"=1;",
paste0(gsub("Exkl_","",other_strings),"SSEK_Pers")," = ",paste0(gsub("^Exkl_","",other_strings),"SSEK_PPmedel"),";
end;
else if ",other_strings,"=3 then ",other_strings,"=2;", collapse=","),"run;"))))
I still have to delete the quotes and the bracketed number manually, but that's at least bearable. The final output looks something like this:
DATA IBIS3_5; Set IBIS3_5;
if Exkl_UtgUtl_Flyg=3 and hjalpvariabel=1 then do; Exkl_UtgUtl_Flyg=1;UtgUtl_FlygSSEK_Pers=UtgUtl_FlygSSEK_PPmedel; end; else if Exkl_UtgUtl_Tag=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Tag=1;UtgUtl_TagSSEK_Pers=UtgUtl_TagSSEK_PPmedel; end; else if Exkl_UtgUtl_Tag=3 then Exkl_UtgUtl_Tag=2;
else if Exkl_UtgUtl_Farja=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Farja=1;UtgUtl_FarjaSSEK_Pers=UtgUtl_FarjaSSEK_PPmedel; end; else if Exkl_UtgUtl_Farja=3 then Exkl_UtgUtl_Farja=2;
else if Exkl_UtgUtl_Hyrbil=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Hyrbil=1;UtgUtl_HyrbilSSEK_Pers=UtgUtl_HyrbilSSEK_PPmedel; end; else if Exkl_UtgUtl_Hyrbil=3 then Exkl_UtgUtl_Hyrbil=2;
else if Exkl_UtgUtl_Bo=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Bo=1;UtgUtl_BoSSEK_Pers=UtgUtl_BoSSEK_PPmedel; end; else if Exkl_UtgUtl_Bo=3 then Exkl_UtgUtl_Bo=2;
else if Exkl_UtgUtl_Aktiv=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Aktiv=1;UtgUtl_AktivSSEK_Pers=UtgUtl_AktivSSEK_PPmedel; end; else if Exkl_UtgUtl_Aktiv=3 then Exkl_UtgUtl_Aktiv=2;
else if Exkl_UtgUtl_Annat=3 and hjalpvariabel=1 then do;Exkl_UtgUtl_Annat=1;UtgUtl_AnnatSSEK_Pers=UtgUtl_AnnatSSEK_PPmedel; end; else if Exkl_UtgUtl_Annat=3 then Exkl_UtgUtl_Annat=2;
run;
Is there a way of generating this code in SAS, without having to resort to other programs?
I do not see the pattern in the code you want to generate so I will leave that part to your imagination. So for the purpose of discussion let's assume you want to generate code like:
var1=var1**2;
var2=var2**2;
You can use conditional logic in the data step that is generating the code via CALL EXECUTE() to generate the beginning (and possibly the ending) of the data step you want to generate.
One way is to test when you are on the first (or last) observation.
Example:
data _null_;
set variables end=eof;
if _n_=1 then call execute('data want; set have;');
call execute(cats(variable,'=',variable,'**2;'));
if eof then call execute('run;');
run;
You could also use that _n_=1 test to determine whether or not you need to generate the ELSE.
if _n_>1 then call execute(' else ');
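Putting the _n_=1 test and the ELSE together, a minimal sketch of generating the whole IF/ELSE IF chain from the question might look like the following. It assumes a hypothetical dataset VARIABLES with one name per observation, that every name starts with Exkl_, and that IBIS3_5 already exists with the referenced variables; adjust the generated text to your real pattern.

```sas
* Hypothetical input: one variable name per observation;
data variables;
  length variable $32;
  input variable;
cards;
Exkl_UtgUtl_Flyg
Exkl_UtgUtl_Tag
Exkl_UtgUtl_Farja
;

* Generate a single DATA step containing one IF/ELSE IF chain;
data _null_;
  set variables end=eof;
  length target $32;
  target = substr(variable, 6);   * name without the leading Exkl_ ;
  if _n_ = 1 then call execute('data IBIS3_5; set IBIS3_5;');
  else call execute('else');      * link this branch to the previous one;
  call execute(catx(' ',
    'if', cats(variable, '=3'), 'and hjalpvariabel=1 then do;',
    cats(variable, '=1;'),
    cats(target, 'SSEK_Pers=', target, 'SSEK_PPmedel;'),
    'end;',
    'else if', cats(variable, '=3'), 'then', cats(variable, '=2;')));
  if eof then call execute('run;');
run;
```

The generated statements are echoed in the log as they execute after the generating step finishes, so you can check them there.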
Another way is to use a DO loop to read all of the observations.
data _null_;
call execute('data want; set have;');
do while(not eof);
set variables end=eof;
call execute(cats(variable,'=',variable,'**2;'));
end;
call execute('run;');
stop;
run;
Or if instead of using CALL EXECUTE() you write the code to a file you don't need to worry about generating the beginning or ending of the data step. You can just %INCLUDE the generated code into the middle of a data step.
filename code temp;
data _null_;
file code;
set variables end=eof;
put variable '=' variable '**2;' ;
run;
data want;
set have;
%include code / source2;
run;
The code below writes the program to a file and then brings the file back into SAS for execution. This is how I avoid macros in almost all cases. After 13 years of macros I ended up with 5 ampersands one night and vowed to fix that:
* USE WHEN NOT DEBUGGING CODE: ;
* filename TEMP '$MYTEMPFILE';
* USE WHEN DEBUGGING CODE: ;
filename TEMP 'c:\temp\Test.sas';
data _null_ ;
file TEMP ;
set DICT end=eof;
if _n_ = 1 then
do ;
put 'data Africa ; '
/ ' attrib '
;
end;
if label = "" then
put @12 name @30 'label="XX_' name +(-1) '"' ;
else
put @12 name @30 'label="XX_' label +(-1)'"' ;
if eof then
do ;
put @12';'
/ ' set sashelp.SHOES;'
/ 'run;'
;
end;
run;
%include TEMP ;
I'm trying to create a custom text report from my SAS code. The code is below:
data have ;
ncandidates=1; ngames=3; controlppt=1; controlgame=2;
ppt1='Abc'; ppt2='Bcd';
infile cards dsd dlm='|';
input (var1-var21) ($);
cards;
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
;
filename report 'myreport.txt';
data _null_;
file report dsd dlm='|' LRECL=8614;
a='';
put
83*'#'
/ '##### Number of ppts'
/ 83*'#'
/ 'input.Name=' @
;
eof = 0;
do until(eof);
set have end=eof;
If not missing(var1) then
put var1-var10 @@ ;
end;
put a
// 83*'#'
/ '##### Output Data'
/ 83* '#'
// 'output.Name=' @;
eof=0;
do until(eof);
set have ;
If not missing(var11) then
put var11-var20 @@ ;
end;
put '1';
run;
Everything gets printed to the file except for the last put '1';
Nothing after the second do until block gets executed.
Also, if I add end=eof to the last do until block, everything gets printed twice.
Is there a way around this?
I am not sure about the cause of the issue, but SAS sometimes behaves oddly when a dataset is read several times the way you do here. Using a different end-of-file variable for the second read, set have end=eof2;, and adding a stop; before run; resolves the problem:
data have ;
ncandidates=1; ngames=3; controlppt=1; controlgame=2;
ppt1='Abc'; ppt2='Bcd';
infile cards dsd dlm='|';
input (var1-var21) ($);
cards;
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b1|2|a|1|3|b
;
filename report '~/myreport.txt';
data _null_;
file report dsd dlm='|' LRECL=8614;
a='';
put
83*'#'
/ '##### Number of ppts'
/ 83*'#'
/ 'input.Name=' @
;
eof = 0;
do until(eof);
set have end=eof;
If not missing(var1) then
put var1-var10 @@ ;
end;
put a
// 83*'#'
/ '##### Output Data'
/ 83* '#'
// 'output.Name=' @;
eof2=0;
do until(eof2);
set have end=eof2;
If not missing(var11) then
put var11-var20 @@ ;
end;
put '1';
stop;
run;
A 'plain' DATA step stops when a read is attempted after the last record of a set has been read. This typically happens during the implicit loop that is inherent in the magic of a DATA step. When you loop over a set explicitly with end of data checks, the read attempt beyond does not occur, and thus does not implicitly end the step.
The eof flag is changed only when the end of data is reached. It is not set to 0 when not at end of data -- the eof flag is simply what it is at the start of the loop. Thus the flag needs to be reset if reused for a subsequent loop.
* 'top' is logged twice;
* the data step ends when the second implicit iteration tries to read past eof of the first set;
data _null_;
put 'top';
do until (eof);
set sashelp.class(obs=2) end=eof;
put name=;
end;
eof = 0; * reset flag;
do until (eof);
set sashelp.class(where=(name=:'J')) end=eof;
put name=;
end;
run;
* 'top' is logged once;
* the data step ends when the stop is reached at the bottom;
data _null_;
put 'top';
do until (eof);
set sashelp.class(obs=2) end=eof;
put name=;
end;
eof = 0;
do until (eof);
set sashelp.class(where=(name=:'J')) end=eof;
put name=;
end;
stop; * end the step explicitly so there is no second implicit iteration;
run;
%let dirname = C:\Users\data;
filename DIRLIST pipe 'dir/B &dirname\*.dbf';
/* Create a data set with one observation for each file name */
data dirlist;
length fname $8.;
infile dirlist length=reclen;
input fname $8.;
run;
data all_text (drop=fname);
set dirlist;
filepath = "&dirname\"||fname||".dbf";
infile dummy filevar = filepath length=reclen end=done missover;
do while(not done);
INPUT
F1 : 2.
F2 : 2.
F3 : 2.
F4 : 10.
F5 : 4.;
output;
end;
run;
The problem is that it reads only the first line of each file, not the whole file, before moving on to the next. Also, variable F1 is shown as missing.
Suggestions are welcome.
So a standard proc import would be:
proc import out=sample1 datafile="path to dbf file.dbf" dbms=DBF replace;
run;
The problem now is how to generate this code for every file in your file list. Using the CALL EXECUTE routine, as in the answer from @Tom, is your best bet. You can also create a small macro and call it for each filename using CALL EXECUTE. If you're new to SAS, this can be easier to understand.
*Create a macro that imports the DBF;
*INPUT is the quoted path to the DBF file, OUTPUT is the name of the dataset to create;
%macro import_dbf(input= , output=);
proc import out=&output datafile=&input dbms=DBF replace;
run;
%mend;
Then call the macro from a dataset. I'm naming the datasets DBF0001, DBF0002, etc.
%let dirname=C:\_localdata;
data dirlist;
informat fname $20.;
input fname;
cards;
data1.dbf
data2.dbf
data3.dbf
data4.dbf
;
run;
data out;
set dirlist;
str=catt('%import_dbf(input="', "&dirname", '\', fname, '", output=dbf',
put(_n_, z4.), ');');
call execute(str); /* run the generated macro call for this file */
run;
proc print data=out;
run;
Import them one by one and then combine them.
%let dirname = C:\Users\data;
data filelist ;
infile "dir /b &dirname\*.dbf" pipe truncover end=eof;
fileno + 1;
input fname $256. ;
tempname = 'temp'||put(fileno,z4.);
/* dir /b returns bare file names, so prepend the directory */
call execute(catx(' ','proc import replace dbms=dbf'
,'out=',tempname,'datafile=',quote(catx('\',"&dirname",fname)),';run;'
));
if eof then call symputx('lastname',tempname);
run;
data want ;
set temp0001-&lastname;
run;
Is there a way to read in specific parts of my data without using FIRSTOBS=? For example, I have 5 different files, all of which have a few rows of unwanted characters. I want my data to be read starting with the first row that is numeric, but in each of these 5 files that first numeric row is at a different position. Rather than going into each file to find where FIRSTOBS should be, is there a way I can check this instead? Perhaps by using an IF statement with ANYDIGIT?
Have you tried something like this from the SAS docs? Example 5: Positioning the Pointer with a Numeric Variable
data office (drop=x);
infile file-specification;
input x @;
if 1<=x<=10 then
input @x City $9.;
else do;
put 'Invalid input at line ' _n_;
delete;
end;
run;
This assumes that you don't know how many lines are to be skipped at the beginning of each file. My filerefs are UNIX paths; to run the example on another OS they will need to be changed.
*Create two example input data files;
filename FT15F001 '~/file1.txt';
parmcards;
char
char and 103
10 10 10
10 10 10.1
;;;;
run;
filename FT15F001 '~/file2.txt';
parmcards;
char
char and 103
char
char
char
10 10 10.5
10 10 10
;;;;
run;
*Read them starting from the first line that has all numbers;
filename FT77F001 '~/file*.txt';
data both;
infile FT77F001 eov=eov;
input @;
/*Reset the flag at the start of each new file*/
if _n_ eq 1 or eov then do;
eov=0;
flag=1;
end;
if flag then do;
if anyalpha(_infile_) then delete;
else flag=0;
end;
input v1-v3;
drop flag;
retain flag;
run;
proc print;
run;
I ended up doing:
INPUT City $ @;
StateAvg = input(substr(City,1,4),COMMA4.);
IF 5000<= StateAvg <= 7000 THEN
INPUT City 1-7 State ZIP;
ELSE DO;
Delete;
END;
And this worked. Thanks for the suggestions, I went back and looked at example 5 and it helped.
I am trying to read a folder of zip files using a pipe command, but I get an error saying the ls command is not recognized. There are actually 2 zip files (ABC_*.zip) in the folder /PROD/.
Can anybody help me with this?
%let extl_dir=/PROD/ ;
filename zl pipe "ls &extl_dir.ABC_*.zip";
data ziplist_a;
infile zl end=last;
length path $200 zipnm $50 filedt $15;
input path $;
zipnm=scan(path,-1,"/");
filedt=scan(scan(path,-1,"_"),1,".");
call symput('zip'||left(_n_), zipnm);
call symput('path'||left(_n_), path);
call symput('filedt'||left(_n_),filedt);
if last then call symput('num_zip',_n_);
*call symput('flenm',filenm);
run;
SAS has published a convenient macro to list files within a directory that does not rely upon running external commands. It can be found here. I prefer this approach as it does not introduce external sources of possible error such as user permissions, pipe permissions, etc.
The macro uses DATA step functions (through %sysfunc), and the same functions can be called in the same manner from a DATA step. Below is an example which extracts file information.
%let dir = /some/folder;
%let fType = csv;
data want (drop = _:);
_rc = filename("dRef", "&dir.");
_id = dopen("dRef");
_n = dnum(_id);
do _i = 1 to _n;
name = dread(_id, _i);
if upcase(scan(name, -1, ".")) = upcase("&fType.") then do;
_rc = filename("fRef", "&dir./" || strip(name));
_fid = fopen("fRef");
size = finfo(_fid, "File Size (bytes)");
dateCreate = finfo(_fid, "Create Time");
dateModify = finfo(_fid, "Last Modified");
_rc = fclose(_fid);
output;
end;
end;
_rc = dclose(_id);
run;
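Applied to the zip-file question, the same functions could replace the ls pipe. This is only a sketch: it assumes the files sit directly in /PROD/, that names fit in 50 characters, and that the date is the last underscore-delimited piece of the file name, as in the question's code. The fileref name zdir and the counter _n are made up for illustration.

```sas
%let extl_dir=/PROD/;

data ziplist_a (drop=_:);
  length path $200 zipnm $50 filedt $15;
  _rc = filename("zdir", "&extl_dir");   * assign a fileref to the folder;
  _id = dopen("zdir");                   * returns 0 if the folder cannot be opened;
  if _id > 0 then do;
    do _i = 1 to dnum(_id);
      zipnm = dread(_id, _i);
      /* keep only names matching ABC_*.zip */
      if upcase(zipnm) =: 'ABC_' and upcase(scan(zipnm, -1, '.')) = 'ZIP' then do;
        path   = cats("&extl_dir", zipnm);
        filedt = scan(scan(zipnm, -1, '_'), 1, '.');
        _n + 1;                          * running count of matching files;
        call symputx(cats('zip', _n), zipnm);
        call symputx(cats('path', _n), path);
        call symputx(cats('filedt', _n), filedt);
        output;
      end;
    end;
    _rc = dclose(_id);
  end;
  call symputx('num_zip', _n);           * 0 when nothing matched;
  _rc = filename("zdir");                * clear the fileref;
run;
```

CALL SYMPUTX strips leading and trailing blanks, so the LEFT() calls from the original are not needed.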