SAS Input from .txt where input spans multiple lines

Hi everyone.
I have a question that is driving me crazy.
Say I have 2 text files that look like this:
File_one.txt:
Name_sample_f1 *spans one line
File_sample_f1 *spans one line
String_sample_f1 *spans multiple, varying lines until the end of the file
String_sample_f1
File_two.txt:
Name_sample_f2 *spans one line
File_sample_f2 *spans one line
String_sample_f2 *spans multiple, varying lines until the end of the file
String_sample_f2
String_sample_f2
String_sample_f2
I would like to read both of them into a single dataset named test that has the following form:
   Name            File            String
   ----            ----            ------
1  Name_sample_f1  File_sample_f1  String_sample_f1
                                   String_sample_f1
2  Name_sample_f2  File_sample_f2  String_sample_f2
                                   String_sample_f2
                                   String_sample_f2
                                   String_sample_f2
I would appreciate any help.
Thanks!

You don't need anything as complicated as three data steps (especially if you're going to do N files). It's pretty easy, really. Use the EOV= (End of Volume) option on INFILE to detect when you're at the start of a new file: the EOV variable is set to 1 when the first record of the next file is read, and it has to be reset to 0 by hand. Each time you're at the start of a new file, read the name and the filename from its first two lines.
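data test;
  format name filename $100.;
  retain name filename line;
  infile '("c:\temp\file1.txt", "c:\temp\file2.txt")' eov=end lrecl=100 pad truncover; *or use wildcards, like infile "c:\temp\file*.txt";
  input a $ @;   /* read from the record and hold it, so the EOV variable is set */
  put _all_;     /* debugging aid: write all variables to the log */
  if (_n_=1) or (end=1) then do;   /* first record of a file */
    end=0;       /* EOV is not reset automatically */
    line=1;
  end;
  else line+1;
  if line=1 then do;
    input @1 name $100.;       /* first line of the file: name */
  end;
  else if line=2 then do;
    input @1 filename $100.;   /* second line of the file: filename */
  end;
  else do;
    input @1 string $100.;     /* every remaining line: one output row */
    output;
  end;
run;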

filename file1 'testfile1.txt';
filename file2 'testfile2.txt';
DATA file1;
LENGTH thisname thisfile thistext $ 200;
RETAIN thisname thisfile;
linecounter=0;
DO UNTIL(eof);
INFILE file1 end = eof;
INPUT;
linecounter+1;
IF (linecounter eq 1) THEN thisname=_infile_;
ELSE IF (linecounter eq 2) then thisfile=_infile_;
ELSE DO;
thistext=_infile_;
output;
END;
END;
RUN;
DATA file2;
LENGTH thisname thisfile thistext $ 200;
RETAIN thisname thisfile;
linecounter=0;
DO UNTIL(eof);
INFILE file2 end = eof;
INPUT;
linecounter+1;
IF (linecounter eq 1) THEN thisname=_infile_;
ELSE IF (linecounter eq 2) then thisfile=_infile_;
ELSE DO;
thistext=_infile_;
output;
END;
END;
RUN;
DATA all_files;
SET file1 file2;
RUN;
PROC PRINT DATA=all_files; RUN;

Related

Create a table from one line CSV data on SAS

I am trying to import data from a CSV file that contains a single line, formatted like this:
CAS$#$#$LLT_CODE$#$#$PT_CODE$#$#$HLT_CODE$#$#$HLGT_CODE$#$#$SOC_CODE$#$#$LLT$#$#$PT$#$#$HLT$#$#$HLGT$#$#$SOC$#$#$SOC_ABB#$#$#DJ20210005-0$#$#$10001896$#$#$10012271$#$#$10001897$#$#$10057167$#$#$10029205$#$#$Maladie d'Alzheimer$#$#$Démence de type Alzheimer$#$#$Maladie d'Alzheimer (incl sous-types)$#$#$Déficiences mentales$#$#$Affections du système nerveux$#$#$Nerv#$#$#DJ20210005-0$#$#$10019308$#$#$10003664$#$#$10007607$#$#$10007510$#$#$10010331$#$#$Communication interauriculaire$#$#$Communication interauriculaire$#$#$Défauts congénitaux du septum cardiaque$#$#$Troubles congénitaux cardiovasculaires$#$#$Affections congénitales, familiales et génétiques$#$#$Cong#$#$#
"#$#$#" marks the end of a line and "$#$#$" separates the columns.
How can I import it?
Here's my code :
data a;
  infile "C:/Users/Papa Yatma/Documents/My SAS Files/9.4/ATCD.txt" dlm="$" dsd;
  input var1 $ var2 $ var3 $ var4 $ var5 $ var6 $ var7 $ var8 $ var9 $ var10 $ var11 $ var12 $ @@;
run;
Thank you for your help.
As long as the actual "records" are not too long I would use the DLMSTR= option to process the file twice. First to parse the "records" into lines. Then to read the fields from the lines.
So first make a new text file that has one line per record.
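filename new temp;
data _null_;
  /* recfm=n reads the file as a byte stream; each "#$#$#" ends one value */
  infile have recfm=n lrecl=1000000 dlmstr='#$#$#';
  file new ;
  input line :$32767. @;
  put line ;   /* write each record as its own line */
run;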
Now you can read the file NEW using the other delimiter string.
For example you could convert it to a real CSV file.
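filename csv temp;
data _null_;
  infile new dlmstr='$#$#$' length=ll column=cc truncover ;
  file csv dsd ;
  /* loop until the column pointer has passed the end of the line */
  do until(cc>=ll);
    input word :$32767. @ ;   /* read one field and hold the input line */
    put word @;               /* write it and hold the output line */
  end;
  put;                        /* finish the output line */
run;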
Results:
CAS,LLT_CODE,PT_CODE,HLT_CODE,HLGT_CODE,SOC_CODE,LLT,PT,HLT,HLGT,SOC,SOC_ABB
DJ20210005-0,10001896,10012271,10001897,10057167,10029205,Maladie d'Alzheimer,Démence de type Alzheimer,Maladie d'Alzheimer (incl sous-types),Déficiences mentales,Affections du système nerveux,Nerv
DJ20210005-0,10019308,10003664,10007607,10007510,10010331,Communication interauriculaire,Communication interauriculaire,Défauts congénitaux du septum cardiaque,Troubles congénitaux cardiovasculaires,"Affections congénitales, familiales et génétiques",Cong
This CSV file is then easy to read:
data test;
infile csv dsd firstobs=2 truncover ;
length CAS LLT_CODE PT_CODE HLT_CODE HLGT_CODE SOC_CODE LLT PT HLT HLGT SOC SOC_ABB $100;
input CAS -- SOC_ABB;
run;
If any of the values might include end-of-line characters, you should add code to the first step to replace them. For example, you might add this line to replace CRLF strings with pipe characters.
line = tranwrd(line,'0D0A'x,'|');
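As a small illustration of that replacement (a sketch only; the sample value built with cats() is made up for the demonstration), TRANWRD swaps every CRLF pair in the string for a pipe:

```sas
data _null_;
  length line $40;
  /* hypothetical sample value: two words separated by a CRLF pair */
  line = cats('first', '0D0A'x, 'second');
  /* replace each carriage return + line feed pair with a pipe */
  line = tranwrd(line, '0D0A'x, '|');
  put line=;   /* writes line=first|second to the log */
run;
```

In the real step the assignment would go between the INPUT and PUT statements, so the cleaned value is what gets written to the new file.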

Is there a way to instantly resolve macro variable created in a data step in the same data step?

Background is that I need to use filename command to execute grep and use the result as input.
Here is my input data set named test
firstname lastname filename
<blank> <blank> cus_01.txt
<blank> <blank> cus_02.txt
The filename values are actual files that I need to grep, because certain strings inside those files are needed to fill in firstname and lastname.
Here is the code:
data work.test;
set work.test;
call symputx('file', filename);
filename fname pipe "grep ""Firstname"" <path>/&file.";
filename lname pipe "grep ""Lastname"" <path>/&file.";
infile fname;
input firstname;
infile lname;
input lastname;
run;
However, macro variables created with CALL SYMPUTX inside a data step cannot be referenced until that data step has finished. That means &file. cannot be resolved in time to be used in the FILENAME statements.
Is there a way to resolve the macro variable within the same step?
Thanks!
This is not tested. You need to use the INFILE statement option FILEVAR.
data test;
input (firstname lastname filename) (:$20.);
cards;
<blank> <blank> cus_01.txt
<blank> <blank> cus_02.txt
;;;;
run;
data work.grep;
set work.test;
length cmd $128;
cmd = catx(' ','grep',quote(strip(firstname)),filename);
putlog 'NOTE: ' cmd=;
infile dummy pipe filevar=cmd end=eof;
do while(not eof);
input;
*something;
output;
end;
run;
If you have many customer files, piping to grep for each one can be an expensive operating system action, and on some SAS servers shell access (pipe, x, system, etc.) is disallowed.
You can read all pattern-named files in a single data step using the wildcard feature of infile and the filename= option to capture the active file being read from.
Sample:
%let sandbox_path = %sysfunc(pathname(WORK));
* create 99 customer files, each with 20 customers;
data _null_;
length outfile $125;
do index = 1 to 99;
outfile = "&sandbox_path./" || 'cust_' || put(index,z2.) || '.txt';
file huzzah filevar=outfile;
putlog outfile=;
do _n_ = 1 to 20;
custid+1;
put custid=;
put "firstname=Joe" custid;
put "lastname=Schmoe" custid;
put "street=";
put "city=";
put "zip=";
put "----------";
end;
end;
run;
* read all the customer files in the path;
* scan each line for 'landmarks' -- either 'lastname' or 'firstname';
data want;
length from_whence source $128;
infile "&sandbox_path./cust_*.txt" filename=from_whence ;
source = from_whence;
input;
select;
when (index(_infile_,"firstname")) topic="firstname";
when (index(_infile_,"lastname")) topic="lastname";
otherwise;
end;
if not missing(topic);
line_read = _infile_;
run;

How to read informats data: $1,000.1M to 1000.1

The datasets include a list of numbers:
$1,000.1M
$100.5M
$1,002.3M
$23.4M
$120.3M
I want to read the variable as a numeric in SAS
the result should be:
Money(millions)
1000.1
100.5
1002.3
23.4
120.3
I tried the COMMAw.d informat to read this data, but the code does not run.
The code is:
input Money(millions) COMMA9.1;
run;
How should I modify it?
Thank you very much!
The COMMA informat does not expect letters like 'M'; it removes only commas, blanks, dollar signs, percent signs, dashes, and close parentheses.
You can just convert your raw string to a string containing a number by removing all characters you do not need:
data input;
length moneyRaw $200;
infile datalines;
input moneyRaw $;
datalines;
$1,000.1M
$100.5M
$1,002.3M
$23.4M
$120.3M
;
run;
data result;
set input;
* the "k" modifier keeps the listed characters instead of removing them;
money = input(compress(moneyRaw,"0123456789.","k"),best.);
run;
Or if you know regex, you can add some intrigue to the code for anyone who reads it in the future:
data resultPrx;
set input;
moneyUpdated = prxChange("s/^\$(\d+(,\d+)*(\.\d+)?)M$/$1/",1,strip(moneyRaw));
money = input(compress(moneyUpdated,','),best.);
run;
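If the values always end in a single uppercase M, another option is to strip just that letter and let the COMMA informat deal with the dollar sign and the commas. This is a minimal sketch under that assumption; the dataset name resultComma is made up here, and the width 12 is an arbitrary choice that is wide enough for the sample values.

```sas
data resultComma;
  length moneyRaw $200;
  input moneyRaw $;
  /* drop the letter M, then let COMMAw.d remove the $ and the commas */
  money = input(compress(moneyRaw, 'M'), comma12.);
  datalines;
$1,000.1M
$100.5M
$1,002.3M
$23.4M
$120.3M
;
run;
```

COMPRESS is case-sensitive by default, so a lowercase m would be left in place and the conversion would fail for that row.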
I think you're best off reading it as a character and then processing it as in Dmitry's answer. But if it was a single column you could read it if you set the delimiter to M. I suspect this will work in a demo, but not in your full process.
data input;
informat moneyRaw dollar8.;
infile datalines dlm='M';
input moneyRaw ;
*moneyRaw = moneyRaw * (1000000);
format moneyRaw dollar32.;
datalines;
$1,000.1M
$100.5M
$1,002.3M
$23.4M
$120.3M
;
run;

SAS Replace line break / carriage return with a space

I want to read a file and replace all the line breaks / carriage returns with a space. Can you please help?
/*input.txt*/
<li
data-linked-resource-type="userinfo" data-base-url="https://gbr.host.com/cc">Jean
Paul
Gautier
</a></li>
/*Required output*/
<li data-linked-resource-type="userinfo" data-base-url="https://gbr.host.com/cc">Jean Paul Gautier</a></li>
/*sas datastep*/
data inp;
infile "c:/tmp/input.txt";
/*ADD LOGIC*/
infile "c:/tmp/output.txt";
run;
I would suggest reading the text file line by line and then concatenating the lines.
data inp;
length x $300. y $300.;
retain y "";
infile "d:/input.txt" dsd truncover;
input x $;
y=catx(" ",y, x); /*concatenate every line, separated by a space*/
run;
data _null_;
set inp end=EOF ;
FILE 'd:\input2.txt' ; /* Output Text File */
if eof ; /*only the last observation has the full concatenated string*/
y=compbl(y); /*collapse multiple spaces into one*/
PUT y;
run;
Otherwise, you can replace linefeeds the same way I showed you in your last question:
tranwrd(mydata,'0A'x, " ");
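For completeness, the whole job can also be sketched in a single step that writes the output file directly. This is only a sketch: it assumes that each input line, and the combined output line, fits within 32767 bytes. The paths are the ones from the question.

```sas
data _null_;
  length line $32767;
  infile "c:/tmp/input.txt" truncover;
  file "c:/tmp/output.txt" lrecl=32767;
  input line $char32767.;   /* read the whole line, blanks included */
  line = strip(line);       /* drop leading and trailing blanks */
  /* list-style PUT writes the value followed by one blank; the double
     trailing @ keeps the output line open across data step iterations */
  put line @@;
run;
```

Because every line break becomes a space, the closing tag that sits on its own line ends up preceded by a space in the result.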

How to read a file which has the delimiter within double quotes

I have to read a tab-delimited file, x'05'c (dlm='0C'x). For a few records the delimiter appears inside a string that is enclosed in double quotes. Using '&' in the INPUT statement works fine, but records with more than one consecutive space give an error.
Data I have to read:
1.AIRWORLDWIDE.z1234565
2.MEDICAL.y121546
3."INPUTTTFAM.ILY TRUST"
Output desired:
ID text text_ref
-----------------------------------
1 AIRWORLDWIDE z1234565
2 MEDICAL y121546
3 "INPUTTTFAM ILY TRUST"
My program :
Data Want;
format id $char1.
text $char12.
text_ref $char12.;
informat id $char1.
text $char12.
text_ref $char12.;
length id text text_ref;
infile have dlm='0C'x dsd END=eof missover ;
input id text text_ref;
/* input id (text text_ref) (& $12.); */
run;
thanks in advance
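DSD is not the INFILE option you want here. With DSD, a delimiter inside a quoted string is treated as data, so the quoted value stays in one field, whereas your desired output splits it at the delimiter.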
filename FT15F001 temp;
data want;
infile FT15F001 dlm='.' missover;
informat id $char1. text $char12. text_ref $char12.;
input (_all_)(:);
list;
parmcards;
1.AIRWORLDWIDE.z1234565
2.MEDICAL.y121546
3."INPUTTTFAM.ILY TRUST"
;;;;
run;
proc contents varnum;
run;
proc print;
run;
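To see the effect of DSD on the quoted record, the following sketch reads the third sample line twice, once with DSD and once without. The dataset names with_dsd and without_dsd and the variable lengths are made up for the illustration. With DSD the quoted string stays in one field (and the quotes are removed), so text_ref is left empty; without DSD the period inside the quotes splits the value, which matches the desired output.

```sas
data with_dsd;
  infile datalines dlm='.' dsd truncover;
  length id $1 text $24 text_ref $12;
  input id text text_ref;
  datalines;
3."INPUTTTFAM.ILY TRUST"
;
run;

data without_dsd;
  infile datalines dlm='.' truncover;
  length id $1 text $24 text_ref $12;
  input id text text_ref;
  datalines;
3."INPUTTTFAM.ILY TRUST"
;
run;

/* compare how the same record was split in each case */
proc print data=with_dsd; run;
proc print data=without_dsd; run;
```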