I am trying to import text file in sas. Data as below:
AccNumber Name Date of Birth Type City Score
1211111111 Mmmmm Ggggg 01-Dec-1989 Base Nanded 111
7222222222 Rannnn Sssss 14-Jan-1989 Silver mumbai 222
FILENAME REFFILE '/folders/myshortcuts/MyFolder/AccountChar.txt';
PROC IMPORT DATAFILE=REFFILE
DBMS=csv
OUT=WORK.IMPORT2;
GETNAMES=YES;
delimiter='09'x;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2; RUN;
But, after import, I got a dataset with 107 columns and only Account number column is showing correct data.
Need help.
Log output:
NOTE: 296 records were read from the infile REFFILE.The minimum record length was 128.The maximum record length was 150.
NOTE: The data set WORK.IMPORT5 has 296 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
296 rows created in WORK.IMPORT5 from REFFILE.
NOTE: WORK.IMPORT5 data set was successfully created.
NOTE: The data set WORK.IMPORT5 has 296 observations and 1 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.14 seconds
cpu time 0.13 seconds
PROC CONTENTS DATA=WORK.IMPORT5; RUN;
From the sample you posted and the comments it looks like your file is NOT a delimited file, but it does appear to have data in fixed column locations. Just figure out where on the line each column is placed and read it directly using a data step. Something like this:
data WORK.IMPORT2;
infile '/folders/myshortcuts/MyFolder/AccountChar.txt' firstobs=2 truncover;
input
AccountNumber $ 1-25
Name $ 26-50
#51 Date_of_Birth date11.
Type $ 74-98
City $ 99-123
Fica 124-130
;
format date_of_birth date9. ;
run;
You are providing a delimiter option '09'x (tab) which is ignored as your dbms is set to csv.
Try:
FILENAME REFFILE '/folders/myshortcuts/MyFolder/AccountChar.txt';
PROC IMPORT DATAFILE=REFFILE
DBMS=dlm /* use delimiter option */
OUT=WORK.IMPORT2;
GETNAMES=YES;
delimiter='09'x;
RUN;
PROC CONTENTS DATA=WORK.IMPORT2; RUN;
For more info, see documentation
Related
I have a very large data table on "dsv" format and i'm trying to import it on sas. However i don't have enough space to import the full table and then filter it (i've done this for smaller tables).
Is there any way to filter the data while importing it because at the end i will only use a part of that table ? If i want for example to import only rows that have the value 103 for Var2
PS: i'm using "proc import" not "data - infile..." because i don't know the exact number of columns
Var1
Var2
Var3
A10
103
Test
A02
102
Hiis
...
...
....
Thank you
You can add dataset options to the dataset listed in the OUT= option of PROC IMPORT.
Example:
filename dsv temp;
data _null_;
input (var1-var3) (:$20.);
file dsv dsd dlm='|';
put var1-var3;
cards;
Var1 Var2 Var3
A10 103 Test
A02 102 Hiis
;
proc import file=dsv dbms=csv out=want(where=(var2=102)) replace ;
delimter='|';
run;
The result is a dataset with just one observation.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
If you don't know the name of the second variable you could always just read the header row first and put the name into a macro variable.
data _null_;
infile dsv dsd dlm='|' truncover obs=1;
input (2*name) (:$32.);
call symputx('var2',nliteral(name));
run;
proc import file=dsv dbms=csv out=want(where=(&var2=102)) replace ;
delimter='|';
run;
You can add a where dataset option to the out= statement. For example:
proc import
file = 'myfile.txt'
out = want(where=(var2=103))
...;
run;
I am trying to import a data sheet in SAS using the Proc Import function. However, the import is unsuccessfull due to invalid data in two numeric variables (SAMPLE_D and SAMPLE_T). How do I change this so I can import my data? I still want the variables to be numeric but this is pretty much a hard stop. First my code:
/* Source File: LAB_MICRO.csv */
/* Source Path: /sasfolders/user/mhau0061 */
/* Code generated on: 20/02/22 17.25 */
%web_drop_table(WORK.LAB_CULT);
FILENAME REFFILE '/sasfolders/user/mhau0061/LAB_MICRO.csv';
PROC IMPORT DATAFILE=REFFILE
DBMS=DLM
OUT=WORK.LAB_CULT;
DELIMITER=";";
GETNAMES=YES;
GUESSINGROWS=15000;
RUN;
PROC CONTENTS DATA=WORK.LAB_CULT; RUN;
%web_open_table(WORK.LAB_CULT); ```
and it gives me the following error message in the log:
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
NOTE: Invalid data for SAMPLE_D in line 18 159-162.
x many.
I have tried to change the guessing rows to max but it still gives errors. What should I do?
Write your own data step step to read the file. Then you can read any problem variables as text.
If you have documentation on the lengths of the fields in the file then use those, otherwise you could just round up little from what PROC IMPORT was able to guess from looking at this one version of the file.
data lab_micrp;
infile REFFILE dsd dlm=';' truncover firstobs=2 ;
length
PATIENT $300
LAB_INT_ID $40
SRC $10
SAMPLE_D $10
SAMPLE_T $20
SAMPLE_MAT $50
REF_DEPARTM $50
INV_EXAM_TYPE $20
COM_GRP $200
SAMPLE_LOC $30
INV_EXAM $50
CLIN_INF $80
MIC_RES $200
REF_HOSP $50
SAMPLE_LAB_INT_ID $40
;
input patient -- sample_lab_int_id;
run;
Once you have the data in a dataset you can use SAS to look a the values and see if you need to modify the step that reads the values. Or add steps to convert the text read into other types of variables.
My program makes a web-service call and receives a response in XML format which I store as output.txt. When opened in notepad, the file looks like this
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
I am unable to read this file into SAS using proc IMPORT. My SAS code is below
proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
delimiter='<>"=';
getnames=yes;
run;
My log is
1 %_eg_hidenotesandsource;
5 %_eg_hidenotesandsource;
28
29 proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
30 delimiter='<>"=';
31 getnames=yes;
32 run;
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Unable to sample external file, no data in first 5 records.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
33
34 %_eg_hidenotesandsource;
46
47
48 %_eg_hidenotesandsource;
51
My ultimate goal is to mine Employee first name (Gerald), last name (Harris) and Employee Number (108181) from the above file and store it in the dataset (and then do this over and over again with a loop and upend the same dataset). If you can help regarding importing the entire file or just the information that I need directly, then that would help.
If you only need these three fields then named input a single input statement is perfectly viable, and arguably preferable to parsing xml with regex:
data want;
infile xmlfile dsd dlm = ' /';
input #"Employee" #"firstName=" firstName :$32. #"lastName=" lastName :$32. #"emplnbr=" emplnbr :8.;
run;
This uses the input file constructed in Richard's answer. The initial #Employee is optional but reduces the risk of picking up any fields with the same names as the desired ones that are subfields of a different top-level field.
Bonus: the same approach can also be used to import json files if you're in a similar situation.
Since you are unable to use the preferred methods of reading xml data, and you are processing a single record result from a service query the git'er done approach seems warranted.
One idea that did not pan out was to use named input.
input #'Employee' lastname= firstname= emplnbr=;
The results could not be made to strip the quotes with $QUOTE. informat nor honor infile dlm=' /'
An approach that did work was to read the single line and parse the value out using a regular expression with capture groups. PRXPARSE is used to compile a pattern, PRXMATCH to test for a match and PRXPOSN to retrieve the capture group.
* create a file to read from (represents the file from the service call capture);
options ls=max;
filename xmlfile "%sysfunc(pathname(WORK))\1-service-call-record.xml";
data have;
input;
file xmlfile;
put _infile_;
datalines;
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
run;
* read the entire line from the file and parse out the values using Perl regular expression;
data want;
infile xmlfile;
input;
rx_employee = prxparse('/employee\s+firstname="([^"]+)"\s+lastname="([^"]+)"\s+emplnbr="([^"]+)"/i');
if prxmatch(rx_employee,_infile_) then do;
firstname = prxposn(rx_employee, 1, _infile_);
lastname = prxposn(rx_employee, 2, _infile_);
emplnbr = prxposn(rx_employee, 3, _infile_);
end;
keep firstname last emplnbr;
run;
This is the story
This is the input file
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
this is the import statement that I am using
data datetimedata;
infile fileref dlm=',';
input lastname$ datechkin mmddyy10. datechkout mmddyy10. room_rate equip_cost;
run;
the below is the log which shows success
NOTE: The infile FILEREF is:
Filename=\\VBOXSVR\win_7\SAS\DATA\datetime\datetimedata.csv,
RECFM=V,LRECL=256,File Size (bytes)=688,
Last Modified=13Jun2015:12:08:36,
Create Time=13Jun2015:09:13:09
NOTE: 17 records were read from the infile FILEREF.
The minimum record length was 34.
The maximum record length was 40.
NOTE: The data set WORK.DATETIMEDATA has 17 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
I have published only 3 observation here.
Now when I print the sas dataset everything works fine except the room_rate variable.
THe output should be 3 digit numbers , but i am getting only the last digit .
Where Am i going wrong !!!
You're mixing input types. When you use list input, you can't specify informats. You either need to specify them using modified list input (add a colon to the informat) or use an informat statement earlier. The following works.
data datetimedata;
infile datalines dlm=',';
input lastname$ datechkin :mmddyy10. datechkout :mmddyy10. room_rate equip_cost;
datalines;
mukesh,04/04/15,04/06/15,125.00,333.23
vishant,04/05/15,04/07/15,200.00,200
achal,04/06/15,04/08/15,275.00,55.43
;;;;
run;
proc print data=datetimedata;
run;
I'm importing a text file into SAS, using the code below :
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
However, I get error messages in the log and certain fields are populated with "." in place of the real data and I don't know what is the problem.
The error message is :
Invalid data for DIPL in line 26 75-76.
Invalid data for DIPL in line 28 75-76.
Invalid data for DIPL in line 31 75-76.
Invalid data for DIPL in line 34 75-76.
A sample of the data is available here http://m.uploadedit.com/b029/1392916373370.txt
Don't use PROC IMPORT in most cases for delimited files; you should use data step input. You can use PROC IMPORT to generate initial code (to your log), but most of the time you will want to make at least some changes. This sounds like one of those times.
data want;
infile "blah.dat" dlm=';' dsd lrecl=32767 missover;
informat
trans $1.
triris $1.
typc $6.
;
input
trans $
triris $
typc $
... rest of variables ...
;
run;
PROC IMPORT generates code just like this in your log, so you can use that as a starting point, and then correct things that are wrong (numeric instead of character, add variables if it has too few as the above apparently does, etc.).
I copied the text file from your link, and ran your code (without the apostrophe):
proc import datafile="C:\temp\test.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
And it worked fine despite the following:
Number of names found is less than number of variables found.
Result:
NOTE: WORK.INDIVS data set was successfully created.
NOTE: The data set WORK.INDIVS has 50 observations and 89 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
cpu time 0.26 seconds
If log has this "Number of names found is less than number of variables found."
then it creates new variables which have blank values.