From this link I learned how to read multiple txt files.
Problem: is it possible to create a macro variable that picks up every txt file in a folder, say C:\Users\Desktop\ (all files are in txt format and named datasetyyyymmdd)?
I have dataset20150101.txt through dataset20150806.txt and I do not want to type all of those paths into the datalines manually.
data whole2;
infile datalines;
length fil2read $256;
input fil2read $;
infile dummy filevar=fil2read end=done dsd;
do while (not done);
input name$ value1 value2;
output;
end;
datalines;
C:\Users\Desktop\dataset20150501.txt
C:\Users\Desktop\dataset20150502.txt
run;
Ask the operating system which files are present:
filename DataIn pipe "dir C:\Users\Desktop\dataset*.txt /S /B";
data whole2;
infile DataIn truncover;
length fil2read $256;
input fil2read $;
infile dummy filevar=fil2read end=done dsd;
do while (not done);
input name$ value1 value2;
output;
end;
run;
The Bare option /B removes unneeded information like last access date.
I added the subfolder option /S because it makes dir return full path names. As a side effect, it also reads dataset*.txt files in subfolders of C:\Users\Desktop\. If that does not suit you, remove /S, build the full path yourself, and point FILEVAR= at it:
path2Read = "C:\Users\Desktop\"||fil2read;
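A minimal sketch of the variant without /S, assuming the same folder and file layout as in the question. The variable path2Read is the one named above; CATS is used here instead of || only to strip trailing blanks, and the $256. informat is an assumption so that file names containing spaces are read whole.

```sas
/* /B alone returns bare file names, without the folder */
filename DataIn pipe "dir C:\Users\Desktop\dataset*.txt /B";

data whole2;
  infile DataIn truncover;
  length fil2read path2Read $256;
  input fil2read $256.;                       /* one file name per line */
  /* Rebuild the full path, because /B without /S drops the folder */
  path2Read = cats("C:\Users\Desktop\", fil2read);
  infile dummy filevar=path2Read end=done dsd;
  do while (not done);
    input name $ value1 value2;
    output;
  end;
run;
```

The variable named in FILEVAR= is not written to the output data set, while fil2read is kept and records which file each row came from.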
This is a follow-up to my previous question:
How to import a txt file with single quote mark in a variable and another in another variable.
The solution there works perfectly unless one of the variables can be null.
In that case, this is what I get:
filename sample 'c:\temp\sample.txt';
data _null_;
file sample;
input;
put _infile_;
datalines;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data prova;
infile sample dlm='|' lrecl=50 truncover;
format
codice $3.
could_be_null $20.
nome $20.
luogo $20.
importo 4.
;
input
codice
could_be_null
nome
luogo
importo
;
putlog _infile_;
run;
proc print;
run;
Is it possible to correctly load a file like the one in the example directly in SAS, without manually modifying the original .txt?
You will need to pre-process the file to fix the issue.
If you add quotes around the values then you will not have the problem.
002||"'80S WERE GREAT"|"FORLI'"|1100
If you know that none of the values contain the delimiter, then adding a space before every delimiter
002 | |'80S WERE GREAT |FORLI' |1100
will let you read it without the DSD option.
If the lines are shorter than 32K bytes, you can make that change in the same step that reads the data, by editing the _INFILE_ buffer before parsing it.
data test2 ;
infile sample dlm='|' truncover ;
input @;
_infile_ = tranwrd(_infile_,'|',' |');
input (var1-var5) (:$40.);
run;
proc print;
run;
Results:
Obs    var1    var2                           var3               var4      var5
 1     001     This variable could be null    PROVA              MILANO    1000
 2     002                                    '80S WERE GREAT    FORLI'    1100
 3     003                                    '80S WERE GREAT    ROMA      1110
One way to test if you have the issue is to make sure each line has the right number of fields.
filename sample temp;
options parmcards=sample;
parmcards;
001|This variable could be null|PROVA|MILANO|1000
002||'80S WERE GREAT|FORLI'|1100
003||'80S WERE GREAT|ROMA|1110
;
data _null_;
infile sample dsd end=eof;
if eof then do;
call symputx('nfound',nfound);
putlog / 'Found ' nfound :comma11.
'problem lines out of ' _n_ :comma11. 'lines.'
;
end;
input;
retain expect nfound;
words=countw(_infile_,'|','qm');
if _n_=1 then expect=words;
else if expect ne words then do;
nfound+1;
if nfound <= 10 then do;
putlog (_n_ expect words) (=) ;
list;
end;
end;
run;
Example Results:
_N_=2 expect=5 words=4
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8
2 002||'80S WERE GREAT|FORLI'|1100 32
_N_=3 expect=5 words=3
3 003||'80S WERE GREAT|ROMA|1110 30
Found 2 problem lines out of 4 lines.
PS Go tell SAS to enhance their delimited file processing: https://communities.sas.com/t5/SASware-Ballot-Ideas/Enhancements-to-INFILE-FILE-to-handle-delimited-file-variations/idi-p/435977
You need to add the DSD option to your INFILE statement.
https://support.sas.com/techsup/technote/ts673.pdf
DSD (delimiter-sensitive data) option—Specifies that SAS should treat
delimiters within a data value as character data when the delimiters
and the data value are enclosed in quotation marks. As a result, SAS
does not split the string into multiple variables and the quotation
marks are removed before the variable is stored. When the DSD option
is specified and SAS encounters consecutive delimiters, the software
treats those delimiters as missing values. You can change the default
delimiter for the DSD option with the DELIMITER= option.
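A minimal sketch of the two behaviours described in that note, using made-up inline data. The data set name, variable names and values are hypothetical and are not taken from any of the questions.

```sas
data dsd_demo;
  infile datalines dsd dlm='|' truncover;
  input id :$3. note :$30. city :$20. amount;
datalines;
001|"contains | a pipe"|MILANO|1000
002||ROMA|1100
;
run;

proc print data=dsd_demo;
run;
```

On the first row, note keeps the embedded pipe and loses the surrounding quotation marks. On the second row, the two consecutive delimiters leave note missing instead of shifting ROMA into it.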
I am trying to use SAS to read multiple files from a directory, but only those created before a certain date.
I have used the code below to read all the files, and it works perfectly. Now I have found out that I only need the files that were created before a certain date. I think that could be done either with FILENAME PIPE dir options or with INFILE statement options, but I cannot find the answer.
code source:
http://support.sas.com/kb/41/880.html
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
data dirlist ;
infile dirlist lrecl=200 truncover;
input file_name $100.;
run;
data _null_;
set dirlist end=end;
count+1;
call symputx('read'||put(count,4.-l),cats('c:\_today\',file_name));
call symputx('dset'||put(count,4.-l),scan(file_name,1,'.'));
if end then call symputx('max',count);
run;
options mprint symbolgen;
%macro readin;
%do i=1 %to &max;
data &&dset&i;
infile "&&read&i" lrecl=1000 truncover dsd;
input var1 $ var2 $ var3 $;
run;
%end;
%mend readin;
%readin;
Currently you are reading in just the file names using the dir command. The existing /b modifier tells dir to print just the file name and nothing else. You want to read both the file name and the creation date of each file, which gets a little messy. You will need to change the pipe command from:
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /b ';
...to this... :
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /tc ';
The output will change from something like this:
file1.csv
file2.csv
...
...to something like this... :
Volume in drive C has no label.
Volume Serial Number is 90ED-A122
Directory of C:\_today
01/13/2017 09:14 AM 1,991 file1.csv
01/11/2017 11:43 AM 169 file2.csv
...
...
...
01/11/2017 11:43 AM 169 file99.csv
99 File(s) 6,449 bytes
0 Dir(s) 57,999,806,464 bytes free
So you will then need to modify your data step that creates dirlist to clean up the results returned by the new dir statement. You will need to ignore the header and footer and read in the date and time etc. Once you have that date and time in the appropriate SAS format, you can then just use a SAS where clause to keep the rows you are interested in. I will leave this as an exercise for you to do. If you have trouble with it you can always open a new question.
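One possible way to do that clean-up is sketched below. It assumes the dir output looks exactly like the sample above (US-style mm/dd/yyyy dates and a 12-hour clock with AM/PM), which depends on the Windows regional settings. The macro variable cutoff and its value are hypothetical.

```sas
filename DIRLIST pipe 'dir "C:\_today\file*.csv" /tc ';

%let cutoff = 12JAN2017;   /* hypothetical cutoff date */

data dirlist;
  infile dirlist lrecl=200 truncover;
  length line $200 file_name $100;
  input line $char200.;
  line = left(line);
  /* Keep only rows that start with a date such as 01/13/2017; */
  /* this drops the header, footer and blank lines.            */
  if not prxmatch('/^\d{2}\/\d{2}\/\d{4}\s/', line) then delete;
  created = input(scan(line, 1, ' '), mmddyy10.);
  /* Tokens are date, time, AM/PM, size, name: the name starts at token 5 */
  call scan(line, 5, pos, len, ' ');
  if pos > 0 then file_name = substr(line, pos);
  /* Keep files created before the cutoff */
  if created < "&cutoff"d;
  format created date9.;
  keep file_name created;
run;
```

Because the result still has a file_name variable, the later step that builds the read and dset macro variables can stay as it is. With a 24-hour clock there is no AM/PM token, so the file name would start at token 4 instead.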
If you need more information on the dir command, you can open up a command prompt (Start Menu->Run->"cmd"), and then type in dir /? to see a list of available switches for the dir command. You may find a slightly different combination of switches for it that better suits your task than what I listed above.
You can use powershell to leverage the features of the operating system.
filename get_them pipe
" powershell -command
""
dir c:\temp
| where {$_.LastWriteTime -gt '3/19/2019'}
| select -property name
| ft -hidetableheader
""
";
data _null_;
infile get_them;
input;
putlog _infile_;
run;
I am trying to convert a comma-delimited text file to a pipe-delimited file, but the name of my input file (the comma-delimited one) is held in a variable (flname1). I am using the code below, suggested by a Stack Overflow member. It works fine as long as I hard-code the file name in the infile statement, but I don't know how to supply the file name from a variable:
data _null_;
enddate=date();
flname1=compress("d:\temp\wq_" || year(enddate) || put(month(enddate),z2.) || ".txt");
length x1-x6 $200;
infile 'flname1' dsd dlm=',' truncover;
file 'C:\temp\pipe.txt' dsd dlm='|';
input x1-x6;
put x1-x6;
run;
I am new to SAS and any help will be greatly appreciated. Thank you!
You should be able to use the filevar option in the infile statement, e.g.:
data _null_;
enddate=date();
flname1=compress("d:\temp\wq_"||year(enddate)||put(month(enddate),z2.)||".txt");
length x1-x6 $200;
infile myinputfile dsd dlm=',' filevar=flname1 truncover;
file 'C:\temp\pipe.txt' dsd dlm='|';
input x1-x6;
put x1-x6;
run;
The documentation explains more about the option and has an example of its use in Example 5.
You probably want to actually do this as a macro variable - this isn't a normal usage of filevar (which you'd use if you had a dataset with a bunch of filenames in it or something).
%let filename = d:\temp\wq_%sysfunc(today(),YYMMN6.).txt;
%put &=filename;
data _null_;
length x1-x6 $200;
infile "&filename." dsd dlm=',' truncover;
file 'C:\temp\pipe.txt' dsd dlm='|';
input x1-x6;
put x1-x6;
run;
Macro variables are just text substitutions, so they can be used wherever you could type the same thing in. They also don't need concatenating functions - any more than you have to concatenate when you type a word in - so it's easier to do.
Here, I use %sysfunc to tell SAS to execute the today() function, and the second argument tells it how to format the result. YYMMN6. is the format it looks like you want (201506 or similar). Then just make sure to use " quotes rather than ' quotes, because macro variables do not resolve inside single quotes.
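A small sketch of the quoting rule, reusing the same %let statement. The data step variable names resolved and unresolved are made up for the illustration.

```sas
%let filename = d:\temp\wq_%sysfunc(today(), yymmn6.).txt;

data _null_;
  length resolved unresolved $100;
  resolved   = "&filename";   /* double quotes: the macro variable is substituted */
  unresolved = '&filename';   /* single quotes: the text stays as typed */
  put resolved= / unresolved=;
run;
```

The log should show the full path for resolved and the literal text &filename for unresolved, which is why the INFILE statement above needs double quotes.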
data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover;
do until(done);
if _N_ =1 then
input &headerlength.;
input &allvar.;
output;
end;
run;
The variable path is in the di data set.
I want to read multiple txt files into one SAS data set. In each txt file the first row is a header, and I want to retain this header for each observation. So I used if _N_ = 1 to input the header, followed by a second input for the other variables used in the analysis.
The output is very strange. Only the first row contains the header, and the other rows are not correct observations.
Could someone help me a little bit? Thank you so much.
I like Shenglin Chen's answer, but here's another option: reset the row counter to 1 each time the data step starts importing a new file.
data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover;
do _N_ = 1 by 1 until(done);
if _N_ = 1 then input &headerlength.;
input &allvar.;
output;
end;
run;
This generalises more easily in case you ever want to do something different with every nth row within each file.
Try:
data &state.&sheet.;
set di;
retain &header.;
infile in filevar= path end=done missover dlm='09'x;
input &headerlength.;
do until(done);
input &allvar.;
output;
end;
run;
You should use WHILE (NOT DONE) instead of UNTIL (DONE). UNTIL tests the condition only after the loop body has run, so with an empty file the INPUT statement reads past the end of the file and stops the data step. The same happens in some of the answers when a file contains only the header row.
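A minimal sketch that combines the header read with the WHILE (NOT DONE) loop. It reuses the di data set and its path variable from the question. Everything else is an assumption made for illustration: the output data set name combined, keeping the whole header line as one string, and the data row layout name, value1, value2. It also relies on end= already being set when an empty file is opened, as the advice above implies.

```sas
data combined;
  set di;                                     /* one row per file, path holds the file name */
  length header $200;                         /* hypothetical: whole header line as one string */
  infile in filevar=path end=done truncover;
  header = ' ';
  if not done then input header $char200.;    /* skipped when the file is empty */
  do while (not done);                        /* skipped when only the header exists */
    input name :$32. value1 value2;           /* hypothetical data row layout */
    output;
  end;
run;
```

All rows of one file are written within a single iteration of the data step, so header keeps its value for every observation from that file without a RETAIN statement.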
I would like to read a number of .csv files into a single SAS dataset using a pattern match. For example if in the directory /home/datasets there are 5 files:
/home/datasets
~/output_group1a.csv
~/output_group1b.csv
~/output_group1c.csv
~/output_group2a.csv
~/output_group2b.csv
All files have known and identical structures and data types. I would like to read in only the files corresponding to group 1, without having to specify the file names explicitly.
You can use a wildcard in your infile statement. If you have headers in each file you'll need to account for that. Here's a bit more of an example.
https://gist.github.com/statgeek/4c27ea9a7ed6d3528835
data try01;
length filename txt_file_name $256;
retain txt_file_name;
infile "Path\*.txt" eov=eov filename=filename truncover;
input @;
if _n_ eq 1 or eov then do;
txt_file_name = scan(filename, -2, ".\");
eov=0;
end;
else input
/* place your input specification here */
;
run;
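Applied to the group 1 files from the question, the same pattern might look like the sketch below. The column layout id, value1, value2 and the names group1, source, current_file and new_file are hypothetical, and each file is assumed to start with one header row.

```sas
data group1;
  length source current_file $256;
  retain current_file;
  infile "/home/datasets/output_group1*.csv" dsd truncover
         filename=source eov=new_file;
  input @;                              /* load the record and hold it */
  if _n_ = 1 or new_file then do;       /* first record of each file is the header */
    current_file = source;              /* remember which file the rows come from */
    new_file = 0;                       /* EOV= is not reset automatically */
    delete;                             /* skip the header row */
  end;
  input id :$20. value1 value2;         /* hypothetical column layout */
run;
```

The variables named in FILENAME= and EOV= are not written to the output data set, which is why the file name is copied into current_file.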