file statement vs. filename assignment in SAS

I thought the following two data steps would be equivalent, but in a UNIX environment they produce slightly different binary files. Can anyone explain why?
/*Example 1*/
filename myfile "/tmp/file1";
data _null_;
file myfile recfm=n;
a=1;
put a;
run;
filename myfile;
/*Example 2*/
data _null_;
file "/tmp/file2" recfm=n;
a=1;
put a;
run;

Related

How to convert a SAS dataset into a CSV file when a single field in it has a value containing a comma

I have a SAS dataset, let us say it has 4 columns A, B, C, D with the values
A = x
B = x
C = x
**D = x,y**
Here column D has a comma inside a single value, and when the dataset is converted to CSV format it generates a new column with the value y. How can I avoid this and convert the SAS dataset into a CSV file?
* get some test records in a file;
Data _null_;
file 'c:\tmp\test.txt' lrecl=80;
put '1,22,Hans Olsen,Denmark,333,4';
put '1111,2,Turner, Alfred,England,3333,4';
put '1,222,Horst Mayer,Germany,3,4444';
run;
* Read the file as a delimited file;
data test; infile 'c:\tmp\test.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
* Read the file and write another file.
* If 6 delimiters and not 5, change the third to #;
data test2;
infile 'c:\tmp\test.txt' lrecl=80 truncover;
file 'c:\tmp\test2.txt' lrecl=80;
length rec $80;
drop pos len;
input rec $char80.;
if count(rec,',') = 6 then do;
call scan(rec,4,pos,len,',');
substr(rec,pos-1,1) = '#';
end;
put rec;
run;
* Read the new file as a delimited file;
data test2; infile 'c:\tmp\test2.txt' dsd dlm=',' missover;
length v1 v2 8 v3 v4 $40 v5 v6 8;
input
'V1'n : ?? BEST5.
'V2'n : ?? BEST5.
'V3'n : $CHAR40.
'V4'n : $CHAR40.
'V5'n : ?? BEST5.
'V6'n : ?? BEST5.;
run;
In this code, it adds '#', but I want ',' itself in the output.
Could anyone please guide me on how to do that?
Thanks in advance!!
It sounds like you are starting with an improperly created CSV file.
1,22,Hans Olsen,Denmark,333,4
1111,2,Turner, Alfred,England,3333,4
1,222,Horst Mayer,Germany,3,4444
That should have been made like this:
1,22,Hans Olsen,Denmark,333,4
1111,2,"Turner, Alfred",England,3333,4
1,222,Horst Mayer,Germany,3,4444
If you are positive that the only field with embedded commas is the third, then you can use a data step to read it in and generate a valid file.
data _null_;
infile bad dsd truncover ;
file good dsd ;
length v1-v6 dummy $200;
input v1-v2 @;
do i=1 to countw(_infile_,',','q')-5;
input dummy @;
v3=catx(', ',v3,dummy);
end;
input v4-v6 ;
put v1-v6 ;
run;
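Here bad and good are filerefs assumed to point at the original file and the corrected output, for example (reusing the paths from the test code above):
filename bad 'c:\tmp\test.txt';
filename good 'c:\tmp\test2.txt';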
Once you have a properly formatted CSV file then it is easy to read.
data want;
infile good dsd truncover ;
length v1-v2 8 v3-v4 $40 v5-v6 8;
input v1-v6 ;
run;
But if the extra comma could be in any field then you will probably need to have a human fix those lines.
If your field value contains the field delimiter, you will want to double-quote the field value. PROC EXPORT will do such double quoting when the DBMS type is specified as CSV.
Example:
data have;
A = 1;
B = 2;
C = 3;
D = 'x,y';
run;
filename csv temp;
proc export data=have outfile=csv dbms=csv;
run;
data _null_;
infile csv;
input;
put _infile_;
run;
The log will show that the exported file contains double-quoted values where needed.
Log
A,B,C,D
1,2,3,"x,y"

How to create a data set by referencing a CSV file

data name
filename reference name "filename.csv"
infile filename.csv dlm=",";
run;
What is wrong with the code? How do I create a data set by referencing the CSV file?
Place the FILENAME statement before the DATA step. You will need an INPUT statement to read the data into variables, or, if the file has a header row, use PROC IMPORT and the system will make its best guess at the input needed.
Example 1
Presume file has no header row and there are 3 columns of numbers separated by commas
filename myfile 'mydatafile.csv';
data want;
infile myfile dsd dlm=',';
input x y z;
run;
Example 2
Presume there is a header row
filename myfile 'mydatafile.csv';
proc import file=myfile replace out=want dbms=csv;
run;
or
* columns expected are known;
filename myfile 'mydatafile.csv';
data want;
infile myfile dsd dlm=',' firstobs=2;
input x y z;
run;
NOTE: An INFILE statement can also refer directly to a file:
...
INFILE "filename.csv" ... ;
...
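For example, a minimal sketch of that direct form, assuming the same three comma-separated columns and header row as Example 2 above:
data want;
  infile 'mydatafile.csv' dsd dlm=',' firstobs=2;
  input x y z;
run;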

SAS include code into data step

I have dynamically created a myfile.sas with the following content:
and a = 0
and b = 0
Now I want to include this file into a data step:
data y;
set x;
if 1=1
%include incl("myfile.sas")
then selektion=0;
else selektion=1;
run;
The result should be:
data y;
set x;
if 1=1
and a=0
and b=0
then myvar=0
else myvar=1;
run;
However I get the following error:
ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored.
Is it possible to include the file within the if statement?
Indeed, that doesn't work. You can use %include within a data or proc step to add some lines to it but not within an incomplete statement.
Had your myfile.sas looked like this:
if 1=1
and a = 0
and b = 0
you could have written
data y;
set x;
%include "myfile.sas";;
then selektion=0;
else selektion=1;
run;
Couldn't you have these lines in a macro instead of a file?
%macro mymacro;
and a=0
and b=0
%mend;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;
If that myfile.sas has to stay as is, you could work around it in this rather convoluted (but still generic) way:
filename myfile temp;
data _null_;
file myfile;
infile 'myfile.sas' eof=end;
input;
if _n_=1 then put '%macro mymacro;';
put _infile_;
return;
end:
put '%mend;';
run;
%include myfile;
data y;
set x;
if 1=1
%mymacro
then selektion=0;
else selektion=1;
run;
The %INCLUDE needs to be at a statement boundary. You could put the IF 1=1 into the same file or into another file. Make sure to include a semicolon to end the %INCLUDE statement, but don't include a semicolon in the contents of the file.
data y;
set x;
%include incl("if1file.sas","myfile.sas") ;
then selektion=0;
else selektion=1;
run;
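Here if1file.sas is assumed to hold only the statement prefix, again with no trailing semicolon:
if 1=1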
A better solution might be to put the code into a macro variable (if less than 64K bytes).
%let condition=
and a = 0
and b = 0
;
data y;
set x;
if 1=1 &condition then selektion=0;
else selektion=1;
run;
If it is longer than 64K bytes then define it as a macro instead.
%macro condition;
and a = 0
and b = 0
%mend;
data y;
set x;
if 1=1 %condition then selektion=0;
else selektion=1;
run;
According to SAS documentation:
%INCLUDE Statement
Brings a SAS programming statement, data lines, or both, into a current SAS program.
The injection you are attempting is not a complete statement, so it fails. A more specific description of the action you are describing would be %INLINE. However, there is no such SAS statement.
Let's call a program that outputs code a 'codegener' and the output it produces the 'codegen'.
In the context of your use, the codegen is specific to a single statement. This strongly suggests the codegener should be placing the codegen in a macro variable (for ease of later use) instead of a file.
Suppose the codegener uses data about statement construction:
DATA statements_meta;
length varname $32 operator $10 value $200;
input varname operator value;
datalines;
a = 0
b = 0
run;
and the codegener is a DATA step
DATA _null_;
file "myfile.snippet";
... looping logic over data for statement construction ...
put " and " varname " = 0 "
...
run;
Change the codegener to be more like the following:
DATA _null_;
length snippet $32000;
snippet = "";
... looping logic over data for statement construction ...
snippet = catx (" ", snippet, "and", varname, comparisonOperator, comparisonValue);
... end loop
call symput('snippet', trim(snippet));
stop;
run;
...
DATA ...
if 1=1 &snippet then ... else ...
run;
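A runnable sketch of that approach, driven by the hypothetical statements_meta table above; the column names and the selektion logic are assumptions carried over from earlier in the thread:
DATA _null_;
  length snippet $32000;
  retain snippet ' ';
  set statements_meta end=last;
  * append one "and <varname> <operator> <value>" clause per row;
  snippet = catx(' ', snippet, 'and', varname, operator, value);
  if last then call symputx('snippet', snippet);
run;
DATA y;
  set x;
  if 1=1 &snippet then selektion=0;
  else selektion=1;
run;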

How to import multiple .dbf files in SAS

%let dirname = C:\Users\data;
filename DIRLIST pipe "dir /B &dirname\*.dbf";
/* Create a data set with one observation for each file name */
data dirlist;
length fname $8.;
infile dirlist length=reclen;
input fname $8.;
run;
data all_text (drop=fname);
set dirlist;
filepath = "&dirname\"||fname||".dbf";
infile dummy filevar = filepath length=reclen end=done missover;
do while(not done);
INPUT
F1 : 2.
F2 : 2.
F3 : 2.
F4 : 10.
F5 : 4.;
output;
end;
run;
The problem is that it is only reading the first line of each file and not the whole file before moving to the next. Also, variable F1 is shown as missing.
Suggestions are welcome.
So a standard proc import would be:
proc import out=sample1 datafile="path to dbf file.dbf" dbms=DBF replace;
run;
The problem now is how to generate this set of code for every file in your file list. Using the CALL EXECUTE statement from @Tom is your best bet. You can also create a small macro and call it for each filename, using CALL EXECUTE. If you're new to SAS this can be easier to understand.
* Create a macro that imports the DBF;
%macro import_dbf(input= , output=);
proc import out=&output datafile=&input dbms=DBF replace;
run;
%mend;
Then call the macro from the dataset. I'm naming the datasets dbf0001, dbf0002, etc.
%let dirname=C:\_localdata;
data dirlist;
informat fname $20.;
input fname;
cards;
data1.dbf
data2.dbf
data3.dbf
data4.dbf
;
run;
data out;
set dirlist;
str=catt('%import_dbf(input="', "&dirname", '\', fname, '", output=dbf',
put(_n_, z4.), ');');
run;
proc print data=out;
run;
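To actually submit the generated calls rather than just print them, one option (as suggested above) is CALL EXECUTE; a minimal sketch:
data _null_;
  set out;
  call execute(str);
run;
Each generated PROC IMPORT step then runs after this DATA step completes.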
Import them one by one and then combine them.
%let dirname = C:\Users\data;
data filelist ;
infile "dir /b &dirname\*.dbf" pipe truncover end=eof;
fileno + 1;
input fname $256. ;
tempname = 'temp'||put(fileno,z4.);
call execute(catx(' ','proc import replace dbms=dbf'
,'out=',tempname,'datafile=',quote(catx('\',"&dirname",fname)),';run;'
));
if eof then call symputx('lastname',tempname);
run;
data want ;
set temp0001-&lastname;
run;

Double Looping in SAS

This problem might be trivial but I got stuck.
My problem is I have to go to each folder for a dataset and transpose the data.
I wrote the following code and it works fine.
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%LET LOC=E:\folder;
%macro test1(k,l);
libname libary "&loc.\&k\&l.";
data dataset_&l.;
set libary.dataset_original;
run;
proc transpose data=dataset_&l. out=dataset_&l._T;
run;
%mend;
%test1(var_1,var'_1);
%test1(var_2,var'_2);
%test1(var_3,var'_3);
The issue with this code is that it's not dynamic in terms of folder structure. E.g. if there are 4 extra folders, I have to write lines like "%test1(var_3,var'_3);" 4 more times.
So I tried writing the following code to make it more dynamic, but unfortunately it's not working. Can anybody please tell me where I'm making the mistake?
OPTIONS MPRINT MLOGIC SYMBOLGEN;
%LET LOC=E:\folder;
%let k=var_1 var_2 var_3;
%let l=var'_1 var'_2 var'_3;
%macro words(string);
%local count word;
%let count=1;
/* The third argument of the %QSCAN function specifies the delimiter */
%let word=%qscan(&string,&count,%str( ));
%do %while(&word ne);
%let count=%eval(&count+1);
%let word=%qscan(&string,&count,%str( ));
%end;
%eval(&count-1)
%mend words;
%macro test1(k,l);
libname libary "&loc.\&k\&l.";
data dataset_&l.;
set libary.dataset_original;
run;
proc transpose data=dataset_&l. out=dataset_&l._T;
run;
%mend;
%macro test();
%do i=1 %to %words(&k.);
%do j=1 %to %words(&l.);
%let var=%scan(&k.,&i.,str());
%let var1=%scan(&l.,&j.,str());
%test1(&var.,&var1.);
%end;
%end;
%mend;
%test();
Thanks!
Try this:
/* Set your base directory */
%let base = E:\Folder;
/* Pipe output from dir */
filename flist pipe "dir /s /b /a:-h &base";
/* Read files from pipe */
data files;
length file dir $ 200 name $ 50 ext $ 10;
infile flist;
input #1 file $ &;
/* File extension */
ext = scan(file, -1, ".");
/* File name */
name = scan(scan(file, -1, "\"), 1, ".");
/* Directory */
rfile = reverse(file);
dir = reverse(substr(rfile, index(rfile, "\") + 1));
/* Select only SAS datasets */
if upcase(ext) = "SAS7BDAT" then output;
drop rfile;
run;
/* Define a macro to process each file */
%macro trans_file(dir, name);
libname d "&dir";
proc transpose data = d.&name out = d.&name._t;
run;
libname d clear;
%mend trans_file;
/* Run on all files */
data _null_;
set files;
call execute(cats('%trans_file(', dir, ",", name, ");"));
run;
This gets the file list by submitting the Windows command dir. It gets all files in the specified directory and its subdirectories.
This approach then uses a simple macro that defines a data library, reads a dataset from the library, writes a transposed dataset to it, then clears it. The macro is called for each file in the list using call execute.
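As an illustration, for a hypothetical dataset stored at E:\Folder\var_1\dataset_original.sas7bdat, the CALL EXECUTE step would submit the equivalent of:
%trans_file(E:\Folder\var_1,dataset_original);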