I have an XML file that I need to read as a single-column table. Currently, to achieve this, I embed the following option in the INFILE statement:
dlmstr='nodlmstr'
I have not found any option that would let me do this more cleanly.
I don't think you need an option at all. Just create one variable with enough length to hold an entire line.
data _null_;
file "c:\temp\test.xml";
put "<a>";
put " <aa>1 </aa>";
put " <bb>2</bb>";
put "</a>";
run;
data test;
infile "c:\temp\test.xml";
format line $2000.;
input;
line = _infile_;
run;
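If your XML lines can exceed 2000 characters, here is a variant of the same step (a sketch; LRECL= and TRUNCOVER are standard INFILE options, and 32767 is the maximum length of a SAS character variable) that avoids truncation:
data test;
infile "c:\temp\test.xml" lrecl=32767 truncover; /* read records up to 32767 bytes, pad short ones */
length line $32767;
input; /* load the raw record into _infile_ */
line = _infile_;
run;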
I'm relatively new to SAS (using SAS EG, if it matters).
I have several previously written programs that read new data files. For each type of data file there's a separate program with its specific INPUT column statement.
For example, one program would have:
DATA data1;
INFILE 'D:\file.txt' noprint missover;
INPUT
ID 1 - 8
NAME $ 9 - 20;
run;
whereas another program would have other definitions, for example:
INPUT
ID 1 - 5
NAME $ 6 - 20
Each data file contains hundreds of variables, so the INPUT column statement in each program is very long. However, the rest of these programs are completely identical.
My intention is to combine these programs into one.
I have two questions:
Is it possible to combine these programs with a conditional INPUT column statement?
Is it possible to read the definition of each file type columns from a variable? (Thus enabling me to define it elsewhere in the workflow or even to read it from an external file)
It seems like you are using text files with a fixed-width layout. For each of these you can specify a format file of the form
column, type, start, end
and then read that file first in order to build the INPUT statement. Here column is the column name, type is either n (numeric) or c (character), and start and end are the start and end positions of the column.
You would wrap this in a macro like this:
%macro readFile(file, output);
%local input_statement;
/* First, read the format file that contains the column details. */
data _null_;
infile "&file..fmt" dlm="," end=eof;
input column $ type $ start end;
length input_statement $ 32767;
retain input_statement "input";
if type = "c" then type = "$";
else type = "";
input_statement = catx(" ", input_statement, column, type, start, "-", end);
if eof then call symputx("input_statement", input_statement);
run;
/* Read the actual file. */
data &output.;
infile "&file.";
&input_statement.;
run;
%mend;
For a file file.txt the macro needs the format file to be named file.txt.fmt in the same path. Call the macro as
%readFile(%str(D:\file.txt), data1);
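For the first example program above, the format file would be named D:\file.txt.fmt and (hypothetical content, matching the ID/NAME layout from the question) would contain:
ID,n,1,8
NAME,c,9,20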
I am trying to import multiple unformatted data files in a single folder into a SAS dataset using a '*.xle' wildcard while skipping the first 47 lines of each file. SAS will use the 'firstobs=48' for the first file but will ignore it for each subsequent file and begin reading at line 1. I have set up the code to reset EOV=0 as suggested in multiple other Stack Overflow threads, but it still does not seem to work. Any help is much appreciated. Please see my code below:
data test;
infile "*.xle" eov=eov firstobs=48;
input @;
if eov then input;
input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
eov=0;
run;
You are very close; you need to INPUT 47 times when you start a new file (EOV=1).
Alternatively, you could use FILEVAR= and FIRSTOBS= would then work for each file, but that would require generating a list of filenames to drive the data step. Six of one, half a dozen of the other, so to speak.
filename FT15F001 '.\a.xle';
parmcards;
a line 1
a line 2
a line 3
a line 4
;;;;
filename FT15F001 '.\b.xle';
parmcards;
b line 1
b line 2
b line 3
b line 4
;;;;
filename FT15F001 '.\c.xle';
parmcards;
c line 1
c line 2
c line 3
c line 4
;;;;
data test;
infile "*.xle" eov=eov firstobs=3 length=l;
input @;
if eov then do;
do _n_ = 1 to 2; input; end;
eov=0;
end;
input line $varying40. l;
list;
run;
proc print;
run;
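Applied to the original step, a sketch following the same pattern, where the skip count of 47 mirrors FIRSTOBS=48 just as the skip-2 loop above mirrors FIRSTOBS=3:
data test;
infile "*.xle" eov=eov firstobs=48;
input @; /* read and hold one line; EOV=1 signals the first line of a new file */
if eov then do;
do _n_ = 1 to 47; input; end; /* discard the held line and the rest of lines 1-47 */
eov = 0; /* reset the flag yourself; SAS only ever sets it to 1 */
end;
input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
run;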
It is possible to use the variable created by the EOV= option, but I have found it easier to just use FILENAME= instead and then use the LAG() function to detect when a new file starts.
To skip 48 lines you could execute multiple INPUT statements or add multiple / line pointer controls to one INPUT statement.
data test;
length fname $256 ;
infile "*.xle" filename=fname ;
input @;
if fname ne lag(fname) then do;
input %sysfunc(repeat(/,48-1));
end;
input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
run;
Note that if any of the files is actually shorter than expected then you will need to be more careful in both the skipping step and the reading step. Otherwise, when you read multiple lines in one INPUT statement, you could read past the end of one file and start reading lines from the next file.
It might be better to get the list of files first and use that with the FILEVAR= option to drive the process. Then INFILE executes separately for each file and you can use the FIRSTOBS= option. You will then need to add a loop to read and output the observations from the text file(s). This way each iteration of the data step will process one whole file.
data files;
infile "ls *.xle" pipe truncover ;
input filename $256.;
run;
data test;
set files ;
fname=filename ;
infile dummy filevar=fname firstobs=48 end=eof;
do while (not eof);
input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
output;
end;
run;
But again, reading multiple lines in one INPUT statement is dangerous, and you should change the code that reads the lines to read them one by one and check that you have not read past the end of the file yet. Remember that SAS will stop the whole data step if an INPUT statement (or SET statement) reads past the end of the input stream.
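For example, here is a sketch of the same FILEVAR= step reading one line per INPUT statement, assuming the layout the original statement implies (values on lines 1, 2, 4, and 5 of each block, separator lines elsewhere):
data test;
set files;
fname = filename;
infile dummy filevar=fname firstobs=48 end=eof truncover;
do while (not eof);
input Date $ 19-28; /* line 1 of the block */
if eof then leave; /* file ended mid-block: discard the partial record */
input Time $ 19-26; /* line 2 */
if eof then leave;
input; /* line 3: separator */
if eof then leave;
input Data 18-24; /* line 4 */
if eof then leave;
input Temp 18-22; /* line 5 */
output; /* complete record; output before testing for end of file */
if eof then leave;
input; /* line 6: separator */
if eof then leave;
input; /* line 7: separator */
end;
run;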
I'm trying to create a program that takes a text file, replaces any macro references within it, and appends it to a single output file. The macro references are generated as I iterate over the observations in a dataset.
I'm having trouble trying to get it to read the entire text file for each observation in my source table. I think there's an implicit stop instruction related to my use of the end= option on the infile statement that is preventing my set statement from iterating over each record.
I've simplified the template and code, examples below:
Here is the template that I'm trying to populate:
INSERT INTO some_table (name,age)
VALUES (&name,&age);
Here is the SAS code:
filename dest "%sysfunc(pathname(work))\backfill.sql";
data _null_;
attrib line length=$1000;
set sashelp.class;
file dest;
infile "sql_template.sas" end=template_eof;
call symput('name', quote(cats(name)));
call symput('age' , cats(age));
do while (not template_eof);
input;
line = resolve(_infile_);
put line;
end;
run;
Running the above code produces the desired output file but only for the first observation in the dataset.
You cannot do it that way since after the first observation you are already at the end of the input text file. So your DO WHILE loop only runs for the first observation.
Here is a trick that I learned a long time ago on SAS-L. Toggle between two input files so that you can start at the top of the input file again.
First let's create your example template program and an empty dummy file.
filename template temp;
filename dummy temp;
data _null_;
file template;
put 'INSERT INTO some_table (name,age)'
/ ' VALUES (&name,&age)'
/ ';'
;
file dummy ;
run;
Now let's write a data step to read the input data and use the RESOLVE() function to convert the text.
filename result temp;
data _null_;
length filename $256 ;
file result ;
set sashelp.class;
call symputx('name', catq('1at',name));
call symputx('age' , age);
do filename=pathname('template'),pathname('dummy');
infile in filevar=filename end=eof ;
do while (not eof);
input;
_infile_ = resolve(_infile_);
put _infile_;
end;
end;
run;
The resulting file will look like this:
INSERT INTO some_table (name,age)
VALUES ('Alfred',14)
;
INSERT INTO some_table (name,age)
VALUES ('Alice',13)
;
...
data &state.&sheet.;
set di;
retain &header.;
infile in filevar=path end=done missover;
do until(done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
output;
end;
run;
The variable path is in the di data set.
I want to read multiple txt files into one SAS data set. In each txt file the first row is a header, and I want to retain the header values on each observation, so I used IF _N_ = 1 to input the header and then input the following rows of other variables for analysis.
The output is very strange: only the first row contains the header values, and the other rows are not correct observations.
Could someone help me a little bit? Thank you so much.
I like Shenglin Chen's answer, but here's another option: reset the row counter to 1 each time the data step starts importing a new file.
data &state.&sheet.;
set di;
retain &header.;
infile in filevar=path end=done missover;
do _N_ = 1 by 1 until(done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
output;
end;
run;
This generalises more easily if you ever want to do something different with every nth row within each file, as sketched below.
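For instance, a sketch for a hypothetical requirement to keep only every 10th data row of each file (the output dataset name and the every-10th rule are made up; the macro variables are the ones from the question):
data every10th;
set di;
retain &header.;
infile in filevar=path end=done missover;
do _N_ = 1 by 1 until(done);
if _N_ = 1 then input &headerlength.;
else do;
input &allvar.;
if mod(_N_, 10) = 1 then output; /* rows 11, 21, 31, ... of each file */
end;
end;
run;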
Try:
data &state.&sheet.;
set di;
retain &header.;
infile in filevar=path end=done missover dlm='09'x;
input &headerlength.;
do until(done);
input &allvar.;
output;
end;
run;
You should use WHILE (NOT DONE) instead of UNTIL (DONE) to prevent reading past the end of the file (which stops the data step) when a file is empty, or, for some of the answers, when a file contains only the header row.
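Applied to the last answer above, a sketch (a truly empty file would still stop the step at the header INPUT, so that case needs a separate guard):
data &state.&sheet.;
set di;
retain &header.;
infile in filevar=path end=done missover dlm='09'x;
input &headerlength.;
do while (not done); /* a header-only file skips the loop instead of stopping the step */
input &allvar.;
output;
end;
run;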
I'm trying to use a double pipe delimiter "||" when I export a file from SAS to txt. Unfortunately, it only seems to correctly delimit the header row and uses a single pipe for the data rows.
The code is:
proc export data=notes3 outfile='/file_location/notes3.txt'
dbms = dlm;
delimiter = '||';
run;
Which results in:
ID||VAR1||VAR2
1|0|STRING1
2|1|STRING2
3|1|STRING3
If you want to use a two-character delimiter, you need to use DLMSTR= instead of DLM= on the FILE statement in data step file creation. You can't use PROC EXPORT, unfortunately, as it doesn't support DLMSTR=.
You can create your own PROC EXPORT fairly easily by using dictionary.columns or sashelp.vcolumn to construct the PUT statement. Feel free to ask more specific questions on that side if you need help with it, but search around for "data driven output" and you'll most likely find what you need.
The reason PROC EXPORT won't use a double pipe is that it generates a data step to do the export, which uses a FILE statement. This is a known limitation; quoting the help file:
Restriction: Even though a character string or character variable is accepted, only the first character of the string or variable is used as the output delimiter. This differs from INFILE DELIMITER= processing.
The header row's || works because SAS writes it as a single string constant rather than relying on the delimiter option.
So I don't think you can fix the PROC EXPORT code, but here's a quick and dirty data step that will transform the output into the desired format, provided that your dataset has no missing values and doesn't contain any tab or pipe characters:
/*Export as before to temporary file, using non-printing TAB character as delimiter*/
proc export
data=sashelp.class
outfile="%sysfunc(pathname(work))\temp.txt"
dbms = dlm;
delimiter = '09'x;
run;
/*Replace TAB with double pipe for all rows beyond the 1st*/
data _null_;
infile "%sysfunc(pathname(work))\temp.txt" lrecl = 32767;
file "%sysfunc(pathname(work))\class.txt";
input;
length text $32767;
text = _infile_;
if _n_ > 1 then text = tranwrd(text,'09'x,'||');
put text;
run;
/*View the resulting file in the log*/
data _null_;
infile "%sysfunc(pathname(work))\class.txt";
input;
put _infile_;
run;
As Joe suggested, you could alternatively write your own delimiter logic in a dynamically generated data step, e.g.
/*More efficient option - write your own delimiter logic in a data step*/
proc sql noprint;
select name into :VNAMES separated by ','
from sashelp.vcolumn
where libname = "SASHELP" and memname = "CLASS";
quit;
data _null_;
file "%sysfunc(pathname(work))\class.txt";
set sashelp.class;
length text $32767;
text = catx('||',&VNAMES);
put text;
run;