Unable to import .txt file in SAS using proc IMPORT - sas

My program makes a web-service call and receives a response in XML format which I store as output.txt. When opened in notepad, the file looks like this
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
I am unable to read this file into SAS using proc IMPORT. My SAS code is below
proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
delimiter='<>"=';
getnames=yes;
run;
My log is
1 %_eg_hidenotesandsource;
5 %_eg_hidenotesandsource;
28
29 proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
30 delimiter='<>"=';
31 getnames=yes;
32 run;
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Unable to sample external file, no data in first 5 records.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
33
34 %_eg_hidenotesandsource;
46
47
48 %_eg_hidenotesandsource;
51
My ultimate goal is to mine Employee first name (Gerald), last name (Harris) and Employee Number (108181) from the above file and store it in the dataset (and then do this over and over again with a loop and upend the same dataset). If you can help regarding importing the entire file or just the information that I need directly, then that would help.

If you only need these three fields then named input a single input statement is perfectly viable, and arguably preferable to parsing xml with regex:
data want;
infile xmlfile dsd dlm = ' /';
input #"Employee" #"firstName=" firstName :$32. #"lastName=" lastName :$32. #"emplnbr=" emplnbr :8.;
run;
This uses the input file constructed in Richard's answer. The initial #Employee is optional but reduces the risk of picking up any fields with the same names as the desired ones that are subfields of a different top-level field.
Bonus: the same approach can also be used to import json files if you're in a similar situation.

Since you are unable to use the preferred methods of reading xml data, and you are processing a single record result from a service query the git'er done approach seems warranted.
One idea that did not pan out was to use named input.
input #'Employee' lastname= firstname= emplnbr=;
The results could not be made to strip the quotes with $QUOTE. informat nor honor infile dlm=' /'
An approach that did work was to read the single line and parse the value out using a regular expression with capture groups. PRXPARSE is used to compile a pattern, PRXMATCH to test for a match and PRXPOSN to retrieve the capture group.
* create a file to read from (represents the file from the service call capture);
options ls=max;
filename xmlfile "%sysfunc(pathname(WORK))\1-service-call-record.xml";
data have;
input;
file xmlfile;
put _infile_;
datalines;
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
run;
* read the entire line from the file and parse out the values using Perl regular expression;
data want;
infile xmlfile;
input;
rx_employee = prxparse('/employee\s+firstname="([^"]+)"\s+lastname="([^"]+)"\s+emplnbr="([^"]+)"/i');
if prxmatch(rx_employee,_infile_) then do;
firstname = prxposn(rx_employee, 1, _infile_);
lastname = prxposn(rx_employee, 2, _infile_);
emplnbr = prxposn(rx_employee, 3, _infile_);
end;
keep firstname last emplnbr;
run;

Related

Invalid data for numeric variable in line xxx

I am trying to import a data sheet in SAS using the Proc Import function. However, the import is unsuccessfull due to invalid data in two numeric variables (SAMPLE_D and SAMPLE_T). How do I change this so I can import my data? I still want the variables to be numeric but this is pretty much a hard stop. First my code:
/* Source File: LAB_MICRO.csv */
/* Source Path: /sasfolders/user/mhau0061 */
/* Code generated on: 20/02/22 17.25 */
%web_drop_table(WORK.LAB_CULT);
FILENAME REFFILE '/sasfolders/user/mhau0061/LAB_MICRO.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=DLM
OUT=WORK.LAB_CULT;
DELIMITER=";";
GETNAMES=YES;
GUESSINGROWS=15000;
RUN;
PROC CONTENTS DATA=WORK.LAB_CULT; RUN;
%web_open_table(WORK.LAB_CULT); ```
and it gives me the following error message in the log:
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
x many.
I have tried to change the guessing rows to max but it still gives errors. What should I do?
Write your own data step step to read the file. Then you can read any problem variables as text.
If you have documentation on the lengths of the fields in the file then use those, otherwise you could just round up little from what PROC IMPORT was able to guess from looking at this one version of the file.
data lab_micrp;
infile REFFILE dsd dlm=';' truncover firstobs=2 ;
length
PATIENT $300
LAB_INT_ID $40
SRC $10
SAMPLE_D $10
SAMPLE_T $20
SAMPLE_MAT $50
REF_DEPARTM $50
INV_EXAM_TYPE $20
COM_GRP $200
SAMPLE_LOC $30
INV_EXAM $50
CLIN_INF $80
MIC_RES $200
REF_HOSP $50
SAMPLE_LAB_INT_ID $40
;
input patient -- sample_lab_int_id;
run;
Once you have the data in a dataset you can use SAS to look a the values and see if you need to modify the step that reads the values. Or add steps to convert the text read into other types of variables.

SAS Export Issue as it is giving additional double quote

I am trying to export SAS data into CSV, sas dataset name is abc here and format is
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
I am using following code.
filename exprt "C:/abc.csv" encoding="utf-8";
proc export data=abc
outfile=exprt
dbms=tab;
run;
output is
LINE_NUMBER DESCRIPTION
524JG "24PC AMEFA VINTAGE CUTLERY SET ""DUBARRY"""
so there is double quote available before and after the description here and additional doble quote is coming after & before DUBARRY word. I have no clue whats happening. Can some one help me to resolve this and make me understand what exatly happening here.
expected result:
LINE_NUMBER DESCRIPTION
524JG 24PC AMEFA VINTAGE CUTLERY SET "DUBARRY"
There is no need to use PROC EXPORT to create a delimited file. You can write it with a simple DATA step. If you want to create your example file then just do not use the DSD option on the FILE statement. But note that depending on the data you are writing that you could create a file that cannot be properly parsed because of extra un-protected delimiters. Also you will have trouble representing missing values.
Let's make a sample dataset we can use to test.
data have ;
input id value cvalue $ name $20. ;
cards;
1 123 A Normal
2 345 B Embedded|delimiter
3 678 C Embedded "quotes"
4 . D Missing value
5 901 . Missing cvalue
;
Essentially PROC EXPORT is writing the data using the DSD option. Like this:
data _null_;
set have ;
file 'myfile.txt' dsd dlm='09'x ;
put (_all_) (+0);
run;
Which will yield a file like this (with pipes replacing the tabs so you can see them).
1|123|A|Normal
2|345|B|"Embedded|delimiter"
3|678|C|"Embedded ""quotes"""
4||D|Missing value
5|901||Missing cvalue
If you just remove DSD option then you get a file like this instead.
1|123|A|Normal
2|345|B|Embedded|delimiter
3|678|C|Embedded "quotes"
4|.|D|Missing value
5|901| |Missing cvalue
Notice how the second line looks like it has 5 values instead of 4, making it impossible to know how to split it into 4 values. Also notice how the missing values have a minimum length of at least one character.
Another way would be to run a data step to convert the normal file that PROC EXPORT generates into the variant format that you want. This might also give you a place to add escape characters to protect special characters if your target format requires them.
data _null_;
infile normal dsd dlm='|' truncover ;
file abnormal dlm='|';
do i=1 to 4 ;
if i>1 then put '|' #;
input field :$32767. #;
field = tranwrd(field,'\','\\');
field = tranwrd(field,'|','\|');
len = lengthn(field);
put field $varying32767. len #;
end;
put;
run;
You could even make this datastep smart enough to count the number of fields on the first row and use that to control the loop so that you wouldn't have to hard code it.

SAS says data contains no columns when using linear regression

I'm new to SAS and am having issues with using Linear Regression.
I loaded a CSV file and then in Tasks and Utilities > Tasks > Statistics > Linear Regression I selected WORK.BP (BP = filename) for my data. When I try to select my dependent variable SAS says "No columns are available."
The CVS file appears to have loaded correctly and has 2 columns so I can't figure out what the issue is.
Thanks for the help.
This is the code I used for loading the file:
data BP;
infile '/folders/myfolders/BP.csv' dlm =',' firstobs=2;
input BP $Pressure$;
run;
And this is what the output looks like
By running your code. you import the .csv file with the 'PRESSURE' variable as character variable; in a linear regression model, you need to have all varaible as _numeric_.
In order to do this, I suggest to use the PROC IMPORT to import a .csf format file, instead of a DATA step with an INPUT statement.
In your case, you shold follows those steps:
Define the path where the .csv file is located:
%let path = the_folder_path_where_the_csv_file_is_located ;
Define the number of rows from which start your data (by including the labels/variables names in the count):
%let datarow = 2;
Import the .csv file, here named 'BP', as follows:
proc import datafile="&path.\BP.csv"
out=BP
dbms=csv
replace;
delimiter=",";
datarow=&datarow.;
getnames=YES;
run;
I assumed that the file you want as output has to be called BP too (you'll find it in the work library!) and that the delimiter is the colon.
Hope this helps!

not reading a numeric value using input list method in sas

This is the story
This is the input file
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
this is the import statement that I am using
data datetimedata;
infile fileref dlm=',';
input lastname$ datechkin mmddyy10. datechkout mmddyy10. room_rate equip_cost;
run;
the below is the log which shows success
NOTE: The infile FILEREF is:
Filename=\\VBOXSVR\win_7\SAS\DATA\datetime\datetimedata.csv,
RECFM=V,LRECL=256,File Size (bytes)=688,
Last Modified=13Jun2015:12:08:36,
Create Time=13Jun2015:09:13:09
NOTE: 17 records were read from the infile FILEREF.
The minimum record length was 34.
The maximum record length was 40.
NOTE: The data set WORK.DATETIMEDATA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
I have published only 3 observation here.
Now when I print the sas dataset everything works fine except the room_rate variable.
THe output should be 3 digit numbers , but i am getting only the last digit .
Where Am i going wrong !!!
You're mixing input types. When you use list input, you can't specify informats. You either need to specify them using modified list input (add a colon to the informat) or use an informat statement earlier. The following works.
data datetimedata;
infile datalines dlm=',';
input lastname$ datechkin :mmddyy10. datechkout :mmddyy10. room_rate equip_cost;
datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;;;;
run;
proc print data=datetimedata;
run;

Import text file into SAS

I'm importing a text file into SAS, using the code below :
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
However, I get error messages in the log and certain fields are populated with "." in place of the real data and I don't know what is the problem.
The error message is :
Invalid data for DIPL in line 26 75-76.
Invalid data for DIPL in line 28 75-76.
Invalid data for DIPL in line 31 75-76.
Invalid data for DIPL in line 34 75-76.
A sample of the data is available here http://m.uploadedit.com/b029/1392916373370.txt
Don't use PROC IMPORT in most cases for delimited files; you should use data step input. You can use PROC IMPORT to generate initial code (to your log), but most of the time you will want to make at least some changes. This sounds like one of those times.
data want;
infile "blah.dat" dlm=';' dsd lrecl=32767 missover;
informat
trans $1.
triris $1.
typc $6.
;
input
trans $
triris $
typc $
... rest of variables ...
;
run;
PROC IMPORT generates code just like this in your log, so you can use that as a starting point, and then correct things that are wrong (numeric instead of character, add variables if it has too few as the above apparently does, etc.).
I copied the text file from your link, and ran your code (without the apostrophe):
proc import datafile="C:\temp\test.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
And it worked fine despite the following:
Number of names found is less than number of variables found.
Result:
NOTE: WORK.INDIVS data set was successfully created.
NOTE: The data set WORK.INDIVS has 50 observations and 89 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
cpu time 0.26 seconds
If log has this "Number of names found is less than number of variables found."
then it creates new variables which have blank values.