I'm importing a text file into SAS using the code below:
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
However, I get error messages in the log, and certain fields are populated with "." in place of the real data. I don't know what the problem is.
The error message is:
Invalid data for DIPL in line 26 75-76.
Invalid data for DIPL in line 28 75-76.
Invalid data for DIPL in line 31 75-76.
Invalid data for DIPL in line 34 75-76.
A sample of the data is available here http://m.uploadedit.com/b029/1392916373370.txt
Don't use PROC IMPORT in most cases for delimited files; you should use data step input. You can use PROC IMPORT to generate initial code (to your log), but most of the time you will want to make at least some changes. This sounds like one of those times.
data want;
infile "blah.dat" dlm=';' dsd lrecl=32767 missover;
informat
trans $1.
triris $1.
typc $6.
;
input
trans $
triris $
typc $
... rest of variables ...
;
run;
PROC IMPORT generates code just like this in your log, so you can use that as a starting point, and then correct things that are wrong (numeric instead of character, add variables if it has too few as the above apparently does, etc.).
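One likely reason DIPL is guessed wrongly: for delimited files PROC IMPORT scans only the first 20 rows by default, and the errors above begin at line 26. As a rough sketch (reusing the path from the question, and assuming a SAS release that accepts GUESSINGROWS=MAX), you can ask it to scan the whole file before generating the code you then copy from the log:

```sas
/* Sketch: scan every row before guessing column types.        */
/* The path and output name are taken from the question above. */
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
    delimiter=';';
    getnames=yes;
    guessingrows=max;   /* default is 20 rows; MAX can be slow on big files */
run;
```

If DIPL still comes out numeric, give it a character informat (and a $ in the INPUT statement) in the generated data step.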
I copied the text file from your link, and ran your code (without the apostrophe):
proc import datafile="C:\temp\test.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
And it worked fine despite the following:
Number of names found is less than number of variables found.
Result:
NOTE: WORK.INDIVS data set was successfully created.
NOTE: The data set WORK.INDIVS has 50 observations and 89 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
cpu time 0.26 seconds
If the log contains "Number of names found is less than number of variables found.",
then PROC IMPORT creates extra variables, which have blank values.
I am trying to import a data sheet into SAS using PROC IMPORT. However, the import is unsuccessful due to invalid data in two numeric variables (SAMPLE_D and SAMPLE_T). How do I change this so I can import my data? I still want the variables to be numeric, but this is pretty much a hard stop. First my code:
/* Source File: LAB_MICRO.csv */
/* Source Path: /sasfolders/user/mhau0061 */
/* Code generated on: 20/02/22 17.25 */
%web_drop_table(WORK.LAB_CULT);
FILENAME REFFILE '/sasfolders/user/mhau0061/LAB_MICRO.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=DLM
OUT=WORK.LAB_CULT;
DELIMITER=";";
GETNAMES=YES;
GUESSINGROWS=15000;
RUN;
PROC CONTENTS DATA=WORK.LAB_CULT; RUN;
%web_open_table(WORK.LAB_CULT);
and it gives me the following error message in the log:
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
(repeated many times)
I have tried to change the guessing rows to max but it still gives errors. What should I do?
Write your own data step to read the file. Then you can read any problem variables as text.
If you have documentation on the lengths of the fields in the file, then use those; otherwise you could just round up a little from what PROC IMPORT was able to guess from looking at this one version of the file.
data lab_micrp;
infile REFFILE dsd dlm=';' truncover firstobs=2 ;
length
PATIENT $300
LAB_INT_ID $40
SRC $10
SAMPLE_D $10
SAMPLE_T $20
SAMPLE_MAT $50
REF_DEPARTM $50
INV_EXAM_TYPE $20
COM_GRP $200
SAMPLE_LOC $30
INV_EXAM $50
CLIN_INF $80
MIC_RES $200
REF_HOSP $50
SAMPLE_LAB_INT_ID $40
;
input patient -- sample_lab_int_id;
run;
Once you have the data in a dataset, you can use SAS to look at the values and see if you need to modify the step that reads them, or add steps to convert the text into other types of variables.
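As a hedged sketch of that conversion step: the INPUT function with the ?? modifier turns text into numeric values and quietly sets unreadable ones to missing. The informats DDMMYY10. and TIME8. and the names lab_cult, sample_date, and sample_time are assumptions for illustration; pick informats that match what the raw SAMPLE_D and SAMPLE_T values actually look like.

```sas
/* Sketch: convert the text columns to numeric date/time values. */
data lab_cult;                                    /* hypothetical output name */
    set lab_micrp;
    /* ?? suppresses the invalid-data notes; bad values become missing */
    sample_date = input(sample_d, ?? ddmmyy10.);  /* assumed date layout */
    sample_time = input(sample_t, ?? time8.);     /* assumed time layout */
    format sample_date date9. sample_time time8.;
run;

/* List the raw values that could not be converted */
proc freq data=lab_cult;
    where missing(sample_date) and not missing(sample_d);
    tables sample_d / missing;
run;
```

The PROC FREQ step shows which raw strings caused the original "Invalid data" notes, so you can decide how to handle them.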
I have been working on this for 3 days now and have tried everything I can think of, including %str(), %bquote(), translate() and tranwrd(), to replace a single apostrophe with a double apostrophe or %'.
The data step and macro below work fine until I hit a last name that contains an apostrophe, e.g. O'Brien. I then get syntax errors due to an unclosed left parenthesis. In the code below I have left what I thought was closest to working, with the tranwrd included.
Any assistance you can provide is greatly appreciated.
%macro put_data (object1,id);
Proc http
method="put"
url="https://myurl/submissionid/&id"
in=&object1;
Headers "content-type"="application/json";
Run;
%mend;
data _null_;
Set work.emp_lis;
call execute(catt('%put_data(','%quote(''{"data":{"employeeName":"',tranwrd(employeeName,"''","'"),'"}}''),',id,')'));
run;
Craig
There is a wide range of potential problems in constructing or passing a JSON string in SAS macro code. PROC JSON will produce valid JSON (in a file) from data, and that file can in turn be specified as the input to be consumed by your web service.
Example:
data have;
length id 8 name $25;
input id name &;
datalines;
1 Homer Simpson
2 Ned Flanders
3 Shaun O'Connor
4 Tom&Bob
5 'X Æ A-12'
6 Goofy "McDuck"
;
%macro put_data (data,id);
filename jsonfile temp;
proc json out=jsonfile;
export &data.(where=(id=&id));
write values "data";
write open object;
write values "name" name;
write close;
run;
proc http
method="put"
url="https://myurl/submissionid/&id"
in=jsonfile
;
headers "content-type"="application/json";
run;
%mend;
data _null_;
set have;
call execute(cats('%nrstr(%put_data(have,',id,'))'));
run;
I found the issue with my code: the tranwrd arguments were backwards, and the call needed to be moved to the proc sql create table statement. I also needed to wrap &object1 in %bquote. This was the final code that worked.
When creating the table, wrap the variables in tranwrd as below.
tranwrd(employeeName, "'", "''")
%macro put_data (object1,id);
Proc http
method="put"
url="https://myurl/submissionid/&id"
in=%bquote(&object1);
Headers "content-type"="application/json";
Run;
%mend;
data _null_;
Set work.emp_lis;
call execute(catt('%put_data(','%quote(''{"data":{"employeeName":"',employeeName,'"}}''),',id,')'));
run;
Just use actual quotes and you won't have to worry about macro quoting at all.
So if your macro looks like this:
%macro put_data(object1,id);
proc http method="put"
url="https://myurl/submissionid/&id"
in=&object1
;
headers "content-type"="application/json";
run;
%mend;
Then the value of OBJECT1 would usually be a quoted string literal or a fileref. (There are actually other forms.) Looks like you are trying to generate a quoted string. So just use the QUOTE() function.
So if your data looks like:
data emp_lis;
input id employeeName $50.;
cards;
1234 O'Brien
4567 Smith
;
Then you can use a data step like this to generate one macro call for each observation.
data _null_;
set emp_lis;
call execute(cats
('%nrstr(%put_data)('
,quote(cats('{"data":{"employeeName":',quote(trim(employeeName)),'}}'))
,',',id
,')'
));
run;
And your SAS log will look something like:
NOTE: CALL EXECUTE generated line.
1 + %put_data("{""data"":{""employeeName"":""O'Brien""}}",1234)
NOTE: PROCEDURE HTTP used (Total process time):
real time 2.46 seconds
cpu time 0.04 seconds
2 + %put_data("{""data"":{""employeeName"":""Smith""}}",4567)
NOTE: PROCEDURE HTTP used (Total process time):
real time 2.46 seconds
cpu time 0.04 seconds
I'm new to SAS and am having issues with using Linear Regression.
I loaded a CSV file and then in Tasks and Utilities > Tasks > Statistics > Linear Regression I selected WORK.BP (BP = filename) for my data. When I try to select my dependent variable SAS says "No columns are available."
The CSV file appears to have loaded correctly and has 2 columns, so I can't figure out what the issue is.
Thanks for the help.
This is the code I used for loading the file:
data BP;
infile '/folders/myfolders/BP.csv' dlm =',' firstobs=2;
input BP $Pressure$;
run;
And this is what the output looks like
Your code reads the 'PRESSURE' variable as a character variable; in a linear regression model, all variables need to be numeric.
To fix this, I suggest using PROC IMPORT to import the .csv file instead of a DATA step with an INPUT statement.
In your case, follow these steps:
Define the path where the .csv file is located:
%let path = the_folder_path_where_the_csv_file_is_located ;
Define the row on which your data starts (counting the row of labels/variable names):
%let datarow = 2;
Import the .csv file, here named 'BP', as follows:
proc import datafile="&path.\BP.csv"
out=BP
dbms=csv
replace;
delimiter=",";
datarow=&datarow.;
getnames=YES;
run;
I assumed that the output dataset should be called BP too (you'll find it in the work library!) and that the delimiter is the comma.
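If you would rather keep the DATA step, a minimal sketch is to drop the $ signs so the columns are read as numeric. This assumes both columns contain only numbers; if BP is really text, put the $ back after BP only. The DSD and TRUNCOVER options are additions here to handle empty fields and short lines.

```sas
/* Sketch: same file as in the question, read with numeric columns. */
data BP;
    infile '/folders/myfolders/BP.csv' dsd dlm=',' firstobs=2 truncover;
    input BP Pressure;   /* no $ means both are read as numeric */
run;
```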
Hope this helps!
My program makes a web-service call and receives a response in XML format which I store as output.txt. When opened in notepad, the file looks like this
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
I am unable to read this file into SAS using proc IMPORT. My SAS code is below
proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
delimiter='<>"=';
getnames=yes;
run;
My log is
1 %_eg_hidenotesandsource;
5 %_eg_hidenotesandsource;
28
29 proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
30 delimiter='<>"=';
31 getnames=yes;
32 run;
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Unable to sample external file, no data in first 5 records.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
33
34 %_eg_hidenotesandsource;
46
47
48 %_eg_hidenotesandsource;
51
My ultimate goal is to extract the employee's first name (Gerald), last name (Harris) and employee number (108181) from the above file and store them in a dataset (and then do this over and over again in a loop, appending to the same dataset). Help with either importing the entire file or reading just the information I need directly would be appreciated.
If you only need these three fields, then a single input statement is perfectly viable, and arguably preferable to parsing XML with regex:
data want;
infile xmlfile dsd dlm = ' /';
input @"Employee" @"firstName=" firstName :$32. @"lastName=" lastName :$32. @"emplnbr=" emplnbr :8.;
run;
This uses the input file constructed in Richard's answer. The initial @"Employee" is optional but reduces the risk of picking up fields with the same names as the desired ones that are subfields of a different top-level field.
Bonus: the same approach can also be used to import json files if you're in a similar situation.
Since you are unable to use the preferred methods of reading xml data, and you are processing a single record result from a service query the git'er done approach seems warranted.
One idea that did not pan out was to use named input.
input @'Employee' lastname= firstname= emplnbr=;
Named input could not be made to strip the quotes with the $QUOTE. informat, nor to honor the infile option dlm=' /'.
An approach that did work was to read the single line and parse the value out using a regular expression with capture groups. PRXPARSE is used to compile a pattern, PRXMATCH to test for a match and PRXPOSN to retrieve the capture group.
* create a file to read from (represents the file from the service call capture);
options ls=max;
filename xmlfile "%sysfunc(pathname(WORK))\1-service-call-record.xml";
data have;
input;
file xmlfile;
put _infile_;
datalines;
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
run;
* read the entire line from the file and parse out the values using Perl regular expression;
data want;
infile xmlfile;
input;
rx_employee = prxparse('/employee\s+firstname="([^"]+)"\s+lastname="([^"]+)"\s+emplnbr="([^"]+)"/i');
if prxmatch(rx_employee,_infile_) then do;
firstname = prxposn(rx_employee, 1, _infile_);
lastname = prxposn(rx_employee, 2, _infile_);
emplnbr = prxposn(rx_employee, 3, _infile_);
end;
keep firstname lastname emplnbr;
run;
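For the "do this over and over and append" part of the question, one possible sketch is to run PROC APPEND after each service call. The dataset name work.all_employees is hypothetical; PROC APPEND creates the base dataset on the first call if it does not exist yet.

```sas
/* Sketch: add the row(s) just parsed into WANT to a running dataset. */
/* work.all_employees is a hypothetical name for the accumulated data. */
proc append base=work.all_employees data=work.want;
run;
```

Because the parsing step produces the same variables with the same lengths each time, repeated appends should line up without needing the FORCE option.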
I am trying to import a text file into SAS. The data looks like this:
AccNumber Name Date of Birth Type City Score
1211111111 Mmmmm Ggggg 01-Dec-1989 Base Nanded 111
7222222222 Rannnn Sssss 14-Jan-1989 Silver mumbai 222
FILENAME REFFILE '/folders/myshortcuts/MyFolder/AccountChar.txt';
PROC IMPORT DATAFILE=REFFILE
DBMS=csv
OUT=WORK.IMPORT2;
GETNAMES=YES;
delimiter='09'x;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2; RUN;
But, after import, I got a dataset with 107 columns and only Account number column is showing correct data.
Need help.
Log output:
NOTE: 296 records were read from the infile REFFILE.The minimum record length was 128.The maximum record length was 150.
NOTE: The data set WORK.IMPORT5 has 296 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
296 rows created in WORK.IMPORT5 from REFFILE.
NOTE: WORK.IMPORT5 data set was successfully created.
NOTE: The data set WORK.IMPORT5 has 296 observations and 1 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.14 seconds
cpu time 0.13 seconds
PROC CONTENTS DATA=WORK.IMPORT5; RUN;
From the sample you posted and the comments it looks like your file is NOT a delimited file, but it does appear to have data in fixed column locations. Just figure out where on the line each column is placed and read it directly using a data step. Something like this:
data WORK.IMPORT2;
infile '/folders/myshortcuts/MyFolder/AccountChar.txt' firstobs=2 truncover;
input
AccountNumber $ 1-25
Name $ 26-50
@51 Date_of_Birth date11.
Type $ 74-98
City $ 99-123
Fica 124-130
;
format date_of_birth date9. ;
run;
You are providing a delimiter option '09'x (tab) which is ignored as your dbms is set to csv.
Try:
FILENAME REFFILE '/folders/myshortcuts/MyFolder/AccountChar.txt';
PROC IMPORT DATAFILE=REFFILE
DBMS=dlm /* use delimiter option */
OUT=WORK.IMPORT2;
GETNAMES=YES;
delimiter='09'x;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2; RUN;
For more info, see documentation