Do loop following filevar option in SAS - sas

data &state.&sheet.;
  set di;
  retain &header.;
  infile in filevar= path end=done missover;
  do until(done);
    if _N_ = 1 then input &headerlength.;
    input &allvar.;
    output;
  end;
run;
The variable path is in the di data set.
I want to read multiple txt files into one SAS data set. In each txt file the first row is a header, and I want to retain this header for every observation, so I used if _N_ = 1 to input the header and then input the following rows of other variables for analysis.
The output is very strange: only the first row contains the header, and the other rows are not correct observations.
Could someone help me a little bit? Thank you so much.

I like Shenglin Chen's answer, but here's another option: reset the row counter to 1 each time the data step starts importing a new file.
data &state.&sheet.;
  set di;
  retain &header.;
  infile in filevar= path end=done missover;
  /* _N_ restarts at 1 for every file named in PATH */
  do _N_ = 1 by 1 until(done);
    if _N_ = 1 then input &headerlength.;
    input &allvar.;
    output;
  end;
run;
This generalises more easily in case you ever want to do something different with every nth row within each file.

data &state.&sheet.;
  set di;
  retain &header.;
  infile in filevar= path end=done missover dlm='09'x;
  /* read the header once per file, before the loop over data rows */
  input &headerlength.;
  do until(done);
    input &allvar.;
    output;
  end;
run;

You should use WHILE (NOT DONE) instead of UNTIL (DONE). UNTIL tests the condition only after the loop body has run, so for an empty file the INPUT statement reads past the end of the file and stops the data step. For some of the answers the same thing happens when the file has only the header row.
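A minimal sketch of the WHILE (NOT DONE) form, assuming two made-up demo files written to the WORK directory (the names demo_a.txt, demo_b.txt, header, x and y are all hypothetical, not from the question). The second file has only a header row, which is the case the comment above warns about:

```sas
/* Hypothetical demo files in the WORK directory */
%let dir = %sysfunc(pathname(work));

data _null_;
  file "&dir/demo_a.txt";
  put 'HeaderA';
  put '1 2';
  put '3 4';
  file "&dir/demo_b.txt";
  put 'HeaderB';               /* header only, no data rows */
run;

/* Driver data set: one row per file to read */
data di;
  length path $260;
  path = "&dir/demo_a.txt"; output;
  path = "&dir/demo_b.txt"; output;
run;

data want;
  set di;
  length header $20;
  infile in filevar=path end=done truncover;
  /* guard the header read so an empty file cannot stop the step */
  if not done then input header $20.;
  /* the test runs before the body, so no INPUT is attempted after the last record */
  do while (not done);
    input x y;
    output;
  end;
run;
```

With this layout the header-only file should simply contribute no observations, and the step carries on to any files that follow it.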


Manually Reading in Data in SAS from CSV

So I have a large dataset that is rather oddly formatted and I want to read it in based on the header. It only has unique columns for each unique participant and each participant participated in multiple rounds of the study. The data is from some experiments and is formatted as having variables for each participant (e.g. "participant.code") then some session variables which I can drop and then the actual variables from the experiment. These are formatted as "study.[round number].player.[variable]"
Rather than repeating the variable for every round, I want to just take out the round number as a separate variable and have an observation for every round for each participant.
I want to read these in differently depending on the variable and pick it out. I would rather not have to manually mess with the source file since the experiment is going to be run multiple times.
If someone could just point me towards some relevant material or whatnot that would be great.
Thank you!
Edit: example of some of the raw data:
1,kppf7hjb,,0,221,221,study,FinalPay,2022-04-16 22:08:18.471115,1,,,0.0,lew8kph3,,,,,0,1.0,0.0,externality_control,0,2,Seller,0.0,1,0,0,10,0,125,125,50,100,50,0,0,0,1,1,,,1,3,,0,1,1,100,0,0,,50.0,,,,,,1,1,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,100,0,0,,45.0,,,,,,1,2,6,1,5,6,13,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,3,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,138,1,0,,38.0,,,,,,1,4,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,135,1,0,,35.0,,,,,,1,5,6,1,5,6,11,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,0,0,,0,,,100,0,0,,,,,,,,1,6,5,1,5,6,6,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,6,,0,,,132,1,0,,32.0,,,,,,1,7,6,1,5,6,4,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,150,1,0,,50.0,,,,,,1,8,6,1,5,6,9,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,2,,0,,,100,0,0,,49.0,,,,,,1,9,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,5,,0,,,100,0,0,,39.0,,,,,,1,10,6,1,5,6,3,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,132,1,0,,32.0,,,,,,1,11,6,1,5,6,10,2,Seller,0.0,,0,0,0,0,,,,,,,,,,,,,1,1,,0,,,130,1,0,,30.0,,,,,,1,12,6,1,5,6,8,2,Seller,0.0,1,192,132,10,0,,,,,,,,,,,,,1,2,,0,,,128,1,0,,28.0,,,,,,1,13,6,1,5,6,11
Your file is not really as complicated as it first seems. For example the bulk of the data is just 43 columns that repeat 13 times. The STUDY.1 columns, then STUDY.2 columns etc.
For this one just write a program to read it. There are 22 columns that are not "study" columns. Then 13 copies of the 43 study columns.
data want;
  infile csv dsd truncover firstobs=2;
  /* trailing @ holds the line so the next INPUT continues on it */
  input var1 ..... var22 @;
  do study=1 to 13;
    input svar1 .... svar43 @ ;
    output;
  end;
run;
So you turn each line into 13 observations (study=1 to study=13).
To complete the sketch of a data step above you just need to figure out what names you want to use for the 65 (22 + 43) variables other than STUDY, and for each variable what type it is, numeric or character, and, when character, what length it needs to store the longest possible value.
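To make the sketch above concrete, here is a scaled-down version. It assumes a hypothetical layout of 2 participant columns followed by 3 rounds of 2 columns each; the names id, code, role, price and round are invented for illustration and the data lines are made up:

```sas
data want;
  infile datalines dsd truncover;
  length code $8 role $8;
  /* participant-level columns; the trailing @ holds the line */
  input id code @;
  do round = 1 to 3;
    /* next block of per-round columns from the same held line */
    input role price @;
    output;
  end;
datalines;
1,abc,Seller,50,Seller,45,Buyer,
2,xyz,Buyer,10,Buyer,12,Seller,30
;
run;
```

Each input line becomes three observations, one per round, with id and code repeated on each. The real file would use 22 and 43 columns and 13 rounds in the same pattern.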
If you need to work with a lot of different variations of files in this style then it might be worth working on a program to analyze the headers and determine the role of the columns based on the pattern of the header name and perhaps generate the code to read the file.
You might start by building a dataset with just the header names.
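data headers;
  infile csv dsd obs=1 ;
  length col 8 words 8 ;
  array header [4] $50 ;
  /* @@ holds the header line across iterations: one column name per observation */
  input header1 :$50. @@ ;
  col + 1;                        /* column position in the file */
  words = countw(header1,'.');    /* number of dot-separated parts */
  /* count down so HEADER1 is overwritten last */
  do _n_=words to 1 by -1;
    header[_n_] = scan(header1,_n_,'.');
  end;
run;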
You can use that list of the headers to help you figure out what would be useful names for the variables.
If you want to let SAS guess how to define and name the variables you could try splitting the CSV file into two separate CSV files: one with the first 22 columns and one with the other 43. So first split the headers (perhaps removing the STUDY.N. prefix while you are at it). Then split the data. Add a ROW number to make it easy to join them later.
filename single temp;
filename multiple temp;
/* Step 1: split the header line into two header lines */
data _null_;
  infile csv dsd obs=1 ;
  input header :$50. @@ ;
  file single dsd ;
  if _n_=1 then put 'ROW,' @;
  if _n_<= 22 then put header @;
  else do;
    file multiple dsd;
    if _n_=23 then put 'ROW,STUDY,' @ ;
    /* drop the STUDY.N. prefix: keep from the third dot-separated word on */
    call scan(header,3,pos,len,'.');
    header = substr(header,pos);
    put header @;
    if _n_=22+43 then stop;
  end;
run;
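/* Step 2: split the data lines the same way */
data _null_;
  infile csv dsd firstobs=2 truncover ;
  length s1-s43 $200 ;
  row + 1;                 /* ROW number used to join the two files later */
  input s1-s22 @;
  file single dsd mod;
  put row s1-s22 ;
  file multiple dsd mod;
  do study=1 to 13 ;
    input s1-s43 @ ;
    put row study s1-s43 ;
  end;
run;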
Now you can use PROC IMPORT to GUESS how to read SINGLE and MULTIPLE and then you can join them back together.
proc import file=single dbms=csv out=single replace;
run;
proc import file=multiple dbms=csv out=multiple replace;
run;
data want;
  merge single multiple;
  by row;
run;

How to get last modified time of file while using infile statement?

I am going to analyze a batch of SAS program files and I am stuck on getting the last modified time of the program files. I have thought about the X command but it was too inefficient.
I just found that when I use the infile statement:
data test;
  infile 'D:\test.txt' truncover;
  input ;
run;
Log shows the last modified time:
NOTE: The infile 'D:\test.txt' is:
RECFM=V,LRECL=32767,File Size (bytes)=7,
Last Modified=2021/1/26 15:25:48,
Create Time=2021/1/26 15:25:42
As you can see, the log window shows the information about the file as a NOTE. However, the output I want is a variable filled with the last modified time.
Is there some option to get it while using the infile statement?
Other efficient ways are welcome, too.
Use functions FOPEN and FINFO
Show all available information items and their value for a sample data file.
filename datafile 'c:\temp\datafile.txt';
/* create a sample file */
data _null_;
  file datafile;
  put 'Me data';
run;
/* list every information item the host reports for the file */
data _null_;
  fid = fopen('datafile');
  if fid then do;
    do index = 1 to foptnum(fid);
      info_name = foptname(fid,index);
      info_value = finfo(fid, info_name);
      put index= info_name= @40 info_value=;
    end;
    rc = fclose(fid);
  end;
run;
Will log information such as
index=1 info_name=Filename info_value=c:\temp\datafile.txt
index=2 info_name=RECFM info_value=V
index=3 info_name=LRECL info_value=32767
index=4 info_name=File Size (bytes) info_value=9
index=5 info_name=Last Modified info_value=26Jan2021:06:29:47
index=6 info_name=Create Time info_value=26Jan2021:06:28:23
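To get the value into a variable rather than the log, something along these lines may work. It is only a sketch: the fileref demo and the file demo_file.txt are made up, the item name 'Last Modified' is taken from the log above and is host-dependent, and whether ANYDTDTM can parse the returned text depends on the host and locale (the ?? modifier just suppresses errors if it cannot):

```sas
/* Hypothetical sample file in the WORK directory */
filename demo "%sysfunc(pathname(work))/demo_file.txt";
data _null_;
  file demo;
  put 'sample';
run;

data file_info;
  length last_modified_text $40;
  format last_modified datetime20.;
  fid = fopen('demo');
  if fid > 0 then do;
    /* FINFO returns the item as character text */
    last_modified_text = finfo(fid, 'Last Modified');
    rc = fclose(fid);
  end;
  /* best-effort conversion to a SAS datetime; missing if the text is not recognized */
  last_modified = input(last_modified_text, ?? anydtdtm40.);
  drop fid rc;
run;
```

Keeping the raw text alongside the converted value makes it easy to check what the host actually returned.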

How do I import multiple unformatted data files into SAS while skipping multiple lines for each file?

I am trying to import multiple unformatted data files in a single folder into a SAS dataset using a '*.xle' wildcard while skipping the first 47 lines of each file. SAS will use the 'firstobs=48' for the first file but will ignore for each subsequent file and begin reading at line 1. I have set up the code using the eov=0 as suggested on multiple other Stackoverflow threads, but it still does not seem to work. Any help is much appreciated. Please see my code below:
data test;
  infile "*.xle" eov=eov firstobs=48;
  input @;
  if eov then input;
  input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
run;
You are very close. You need to execute INPUT 47 times when a new file starts (EOV=1).
Alternatively you could use FILEVAR and FIRSTOBS would work for each file but that would require generating a list of filenames to use to drive the data step. Six vs. half dozen so to speak.
filename FT15F001 '.\a.xle';
parmcards;
a line 1
a line 2
a line 3
a line 4
;
filename FT15F001 '.\b.xle';
parmcards;
b line 1
b line 2
b line 3
b line 4
;
filename FT15F001 '.\c.xle';
parmcards;
c line 1
c line 2
c line 3
c line 4
;
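data test;
  infile "*.xle" eov=eov firstobs=3 length=l;
  input @;
  if eov then do;
    /* a new file has started: skip its first two lines */
    do _n_ = 1 to 2; input; end;
    eov = 0;    /* EOV is not reset automatically */
  end;
  input line $varying40. l;
run;
proc print;
run;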
It is possible to use the variable created by the EOV= option, but I have found it easier to just use FILENAME= instead and then use LAG() function to detect when a new file starts.
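To skip the first 47 lines, so that reading starts at line 48, you could execute multiple INPUT statements or add multiple / characters to one INPUT statement.
data test;
  length fname $256 ;
  infile "*.xle" filename=fname ;
  input @;
  if fname ne lag(fname) then do;
    /* Line 1 is already held by INPUT @, and the line the pointer ends on is */
    /* released as well, so 46 slashes skip lines 1-47. REPEAT(x,n) gives n+1 copies. */
    input %sysfunc(repeat(/,45));
  end;
  input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
run;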
Note that if any of the files is actually shorter than expected then you will need to be more careful in both the skipping step and the reading step. Otherwise when you read multiple lines in one INPUT statement you could read past the end of one file and start reading lines from the next file.
It might be better to get the list of files first and use that with the FILEVAR= option to drive the process. Then INFILE executes separately for each file and you can use the FIRSTOBS= option. You will then need to add a loop to read and output the observations from the text file(s). This way each iteration of the data step will process one whole file.
data files;
  infile "ls *.xle" pipe truncover ;
  input filename $256.;
run;
data test;
  set files ;
  fname=filename ;
  infile dummy filevar=fname firstobs=48 end=eof;
  /* one data step iteration processes one whole file */
  do while (not eof);
    input Date $ 19-28 / Time $ 19-26 // Data 18-24 / Temp 18-22 //;
    output;
  end;
run;
But again, reading multiple lines in one INPUT statement is dangerous, and you should change the code that reads the lines to read them one by one and check that you have not read past the end of the file yet. Remember that SAS will stop the whole data step if an INPUT statement (or SET statement) reads past the end of the input stream.

Read in Files with pattern match into one SAS dataset

I would like to read a number of .csv files into a single SAS dataset using a pattern match. For example if in the directory /home/datasets there are 5 files:
All with known and identical structures and data types. I would like to read in only those files corresponding to group 1 without having to explicitly specify the filenames.
You can use a wildcard in your infile statement. If you have headers in each file you'll need to account for that. Here's a bit more of an example.
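data try01;
  length filename txt_file_name $256;
  retain txt_file_name;
  infile "Path\*.txt" eov=eov filename=filename truncover;
  input @;    /* load a record so FILENAME and EOV are set */
  if _n_ eq 1 or eov then do;
    /* header line of a new file: keep the file name, read no data */
    txt_file_name = scan(filename, -2, ".\");
    eov = 0;
  end;
  else input
    /* place input specifications here */
  ;
run;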

Get the current observation count in SAS

I have a file in which the first line is a header line containing some meta-data information.
How can I get the current observation number (say, 1 for the first observation) that SAS is processing, so that I can put in an IF clause to handle this special data line?
Follow up: I want to process the first line and keep one of the column values in a local variable for further processing. I don't want to keep this line in my final output. Is this possible?
The automatic variable _N_ returns the current iteration number of the SAS data step loop. For a traditional data step, ie:
data something;
  set something;
run;
_N_ is equivalent to the row number (since one row is retrieved for each iteration of the data step loop).
So if you wanted to only do something once, on the first iteration, this would accomplish that:
data something;
  set something;
  if _n_ = 1 then do;
    (more code);
  end;
run;
For your follow up, you want something like this:
data want;
  set have;
  retain _temp;
  if _n_ = 1 then do;
    _temp = x;
  end;
  ... more code ...
  drop _temp;
run;
DROP and RETAIN statements can appear anywhere in the code and have the same effect; I placed them where a human reader would expect them. RETAIN says not to reset the variable to missing each time through the data step loop, so you can access it further down.
If you are reading a particularly large text file, you may want to avoid executing the (if _n_=1 then) condition on every iteration. You can do this by reading the file twice: once to extract the header row, and again to read in the data, as follows:
data _null_; /* create dummy file for demo purposes */
  file "c:\myfile.txt";
  put 'blah'; output;
  put 'blah blah blah 666'; output;
run;
data _null_; /* read in header info */
  infile "c:\myfile.txt";
  input myvar:$10.; /* or wherever the info is that you need */
  call symput('myvar',myvar);/* create macro variable with relevant info */
  stop; /* no further processing at this point */
run;
data test; /* read in data FROM SECOND LINE */
  infile "c:\myfile.txt" firstobs=2 ; /* note the FIRSTOBS option */
  input my $ regular $ input $ statement ;
run;
For short / simple stuff though, Joe's answer is better as it's more readable (and may be more efficient for small files).