Is there a way to combine a piped infile and utf-8 encoding in SAS?
For example, this works:
data wordlist;
infile 'wordlist.txt' dlm='|' encoding='utf-8';
input polar $3. word :$30.;
run;
but this doesn't:
filename inf pipe 'perl fix.pl';
data wordlist;
infile inf dlm='|' encoding='utf-8';
input polar $3. word :$30.;
run;
The error is:
ERROR 23-2: Invalid option name ENCODING.
I've tried putting the encoding option in the filename statement and in the infile statement, but neither works.
The ENCODING option is not currently supported for the PIPE "device" in SAS for Windows. However, my understanding is that it can work on UNIX.
As another responded, if you set your session encoding to UTF-8 (using the -ENCODING system option) that might work for you. But setting the session encoding may have other side effects for your processing, so you'll need to exercise care.
No idea why piped files would be different, but what if you change your session encoding to UTF-8?
Alternately you can of course work around it by having the perl script output written to a file, then reading that via normal input methods.
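A minimal sketch of that file-based workaround, assuming the perl script writes to standard output and that shell commands are allowed in the session (the PIPE device needs the same permission). The output file name fixed_wordlist.txt is hypothetical:

```sas
/* Run the script and capture its output in a regular file.      */
/* fixed_wordlist.txt is a made-up name used for illustration.   */
x 'perl fix.pl > fixed_wordlist.txt';

/* Read the file with the same options that worked for wordlist.txt. */
data wordlist;
  infile 'fixed_wordlist.txt' dlm='|' encoding='utf-8';
  input polar $3. word :$30.;
run;
```

Since ENCODING= is accepted for ordinary files, this avoids the error at the cost of a temporary file.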
This was discussed a bit ago on SAS-L, and I don't think any better solution was found (see http://listserv.uga.edu/cgi-bin/wa?A2=ind1203b&L=sas-l&D=0&P=9728 )
Related
I am trying to read the following data from a Notepad (text) file into a SAS data set:
name1,124325,08/10/2003,1250.03
name2,114565,08/11/2003,11115.11
name3,000007,08/11/2003,12500.02
When I use this SAS code:
data new;
filename tfile '~\transact2.txt';
infile tfile dsd;
input name $ id date mmddyy10. cost 8.2;
run;
I get a data set in which cost is missing for every row.
However, if I just replace dsd with dlm=',', then the cost variable is read in correctly. Why does dsd cause the cost variable to be read in incorrectly?
dsd does more than say "use a delimiter". It tells SAS how to treat that delimiter (mostly, that anything in quotes is one field, and that consecutive delimiters mean a missing value). When dsd is specified the default delimiter becomes a comma, which is why you got as close to correct as you did; adding dlm=',' explicitly makes the intent clearer.
Also, you're mixing two styles of input, which isn't allowed.
When you use delimited input, you are using list input, not column or formatted input. In plain list input you can only indicate character or numeric, and cannot apply informats directly. If you want to embed informats, as you do for the date, you need modified list input, which puts a colon before the informat:
data new;
filename tfile '~\transact2.txt';
infile tfile dsd;
input name $ id date :mmddyy10. cost;
run;
Also note that reading in cost with 8.2 is incorrect. The decimal part of an informat only exists to read 12345678 as 123456.78 (back in the day when you had 80-column cards and didn't want to spend a column on the decimal point). In "modern" SAS you should generally never use the decimal portion of an informat. SAS will see the decimal point in the data and work it out properly.
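To illustrate the point about the decimal portion of an informat, here is a small sketch with made-up data. The data set name demo and the variables a and b are hypothetical; both variables read the same eight columns, one with 8.2 and one with 8.:

```sas
data demo;
  /* a uses the 8.2 informat; @1 moves back so b rereads the same columns with 8. */
  input a 8.2 @1 b 8.;
  datalines;
12345678
1250.03
;
run;

/* Line 1: no decimal point in the data, so 8.2 implies one: a=123456.78, b=12345678 */
/* Line 2: the data has a decimal point, so the .2 is ignored: a=1250.03, b=1250.03  */
proc print data=demo;
  format a b best12.;
run;
```

The implied decimal only applies when the field contains no decimal point, which is why it silently rescales integer-looking values.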
I am a novice with SAS and am having trouble importing a CSV file. The information is in the following layout:
aircraft,duration,no_pasg,speed_ground,speed_air,height,pitch,distance
boeing,98.4790912,53,107.91568,109.3283765,27.41892425,4.043514571,3369.836364
boeing,125.7332973,69,101.6555886,102.8514051,27.80471618,4.117431699,2987.803924
boeing,112.0170008,61,71.05196088,,18.58938573,4.434043129,1144.922426
I run the code below and I get 0 observations (not expected) and 8 variables (expected). What am I doing wrong? I even tried adding firstobs=2 to skip the first row, which contains the headers, but that hasn't helped.
I appreciate the help.
DATA FAA_DATA1;
INFILE '~/Project/Data/FAA1.csv' dsd dlm=',' firstobs=2;
INPUT aircraft $ duration no_pasg speed_ground speed_air height pitch distance;
RUN;
PROC PRINT;
RUN;
Sounds like it is not seeing the end-of-line characters and so thinks your file is one long line. That would explain why FIRSTOBS=2 causes you to get 0 observations. There should be a note in the log about how many lines SAS read from the file.
Try using the TERMSTR= option on INFILE statement. The normal end of line for Unix is TERMSTR=LF. The normal end of line for Windows is TERMSTR=CRLF. If you made the file using Excel on a Mac then you should try using TERMSTR=CR.
For some reason Excel on a Mac still thinks that Mac OS uses CR as the end-of-line character for text files even though Apple converted the Mac OS to using Unix years ago. There also should be an option in Excel when saving the file to save it as comma delimited but using normal end of line characters.
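As a sketch, this is the data step from the question with TERMSTR= added. It assumes the file really does use a bare CR as its line terminator; swap in LF or CRLF if the log still shows only one line read:

```sas
data faa_data1;
  /* termstr=cr assumes the file was saved with CR-only line endings */
  infile '~/Project/Data/FAA1.csv' dsd dlm=',' firstobs=2 termstr=cr;
  input aircraft $ duration no_pasg speed_ground speed_air height pitch distance;
run;
```

The note in the log about the number of records read from the infile is a quick way to confirm that the chosen terminator matches the file.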
I need to export a data set as text file for an ancient batch process probably running on Unix. The file has one column and all fields are numeric.
I want to create a text file which emulates the way Excel creates Text (MS-DOS) files:
Saves a workbook as a tab-delimited text file for use on the MS-DOS
operating system, and ensures that tab characters, line breaks, and
other characters are interpreted correctly. Saves only the active
sheet.
What is the best way to achieve this?
DOS uses code page 437, which is a very limited set of characters. If you don't have any special characters, you're good. If you do have special characters, you'll need to change the encoding to code page 437 in order to guarantee character compatibility. This can be done as a dataset option.
SAS internally names this pcoem437. You can see the difference in output by changing the encoding= option.
data have;
input var$;
datalines;
ElNiño
ElNino
;
run;
proc export data=have(encoding=pcoem437)
outfile='C:\Directory\want.txt'
dbms=tab
replace;
run;
If you just have one column then the delimiter doesn't matter. You can write the file using a DATA step very easily.
data _null_;
set have ;
file 'myfile.txt' ;
put VAR1 ;
run;
If you want to add an extra line with the column name then add this before the PUT statement.
if _n_=1 then put 'VAR1';
If you are worried about whether you need to generate LF or CRLF for the end of line you can control that with the TERMSTR= option on the FILE statement.
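For example, a sketch of the same step writing Unix-style line endings, assuming the downstream batch process expects LF (use termstr=crlf for DOS/Windows-style output instead):

```sas
data _null_;
  set have;
  /* termstr=lf ends each record with a line feed only */
  file 'myfile.txt' termstr=lf;
  put VAR1;
run;
```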
I'm a beginner in SAS and am currently using SAS 9.1. I want to import a txt file using the infile statement, but it gives an error. My code is as follows:
data sasdata.twenty;
infile "C:\Users\Ravi Raghava\Desktop\Cricket.txt" firstobs=2;
input Position Runs Sixes Fours balls;
run;
Any help will be highly appreciated.
Usually I do it by the following:
proc import datafile='data.txt'
out=TableName
replace;
delimiter='09'x;
run;
It works correctly.
You are using the default SAS input (i.e. a space is probably used as the separator, SAS guesses whether you meant numeric or alphanumeric, and so on). You assume that SAS knows what it is doing. You know what happens when you assume.
Use a
Length Position $20 Runs Sixes Fours balls 8;
statement. That will at least make sure the first word is treated as alphanumeric and the rest as numbers. Also use
infile ..... dlm=',';
if you have a CSV file, or
infile ..... dlm='09'x;
if it is a tab-separated file.
I am using SAS 9.1.3 in AIX 5.3
I have to proc import a CSV file using SAS.
The first line of CSV are column names.
SAS reports an error in the log.
Then I found out that the CSV file has 3 characters (the UTF-8 byte order mark) at the very beginning of the file.
I tried to use :
filename XXX 'XXXXXXXXXX' BOM ;
But this is a syntax error.
I replaced BOM with BOMFILE; still a syntax error.
It seems that SAS 9.1.3 cannot recognize the BOM options.
Does anyone have similar experience?
Instead of the import procedure, you might try a data step like the following:
data test;
infile "data.csv" firstobs=2 dlm=','; /* assuming delimiter is a comma */
input /* use Input with $UTF8Xw. informat */
field1 $utf8x3. /* input fields 1 through 3 */
field2 $utf8x10.
field3 $utf8x3.
;
run;
SAS can read this (at least 9.1 plus) but your SAS session must be running with the DBCS and encoding options set.
-DBCS
-encoding UTF-8
These need to be in the sasconfig file or on command line of invocation. With these options the default encoding is Unicode for the SAS session. Without it Unicode options pass syntax checks but have no effect.
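One way to confirm that the options took effect is to check the session encoding from inside SAS. A minimal sketch:

```sas
/* Print the current value of the ENCODING system option to the log */
proc options option=encoding;
run;

/* The same value, retrieved with the GETOPTION function */
%put Session encoding: %sysfunc(getoption(encoding));
```

If the log does not report UTF-8, the invocation options were not picked up.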
You can try using the encoding= option on the infile statement, but for me that never worked.
For some related info see also http://www.phuse.eu/download.aspx?type=cms&docID=3658