Can SAS 9.1.3 read a CSV file with a UTF-8 BOM?

I am using SAS 9.1.3 on AIX 5.3.
I have to PROC IMPORT a CSV file whose first line contains the column names.
SAS reports an error in the log.
I then found out that the CSV file has 3 extra bytes (EF BB BF, the UTF-8 byte order mark) at the very beginning of the file.
I tried to use:
filename XXX 'XXXXXXXXXX' BOM ;
but this is a syntax error.
I replaced BOM with BOMFILE and still got a syntax error.
It seems that SAS 9.1.3 does not recognize the BOM options.
Does anyone have similar experience?

Instead of the import procedure, you might try a data step like the following:
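data test;
infile "data.csv" firstobs=2 dsd dlm=','; /* assuming the delimiter is a comma; FIRSTOBS=2 skips the header line, which is where the BOM sits */
input /* list input with the $UTF8Xw. informat */
field1 :$utf8x3. /* the colon modifier stops each read at the next comma instead of reading a fixed width */
field2 :$utf8x10.
field3 :$utf8x3.
;
run;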
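If the file has no header row, FIRSTOBS=2 would throw away real data, and when SAS does not recognize the BOM the three bytes ('EFBBBF'x) simply arrive glued to the front of the first value. A minimal sketch that strips them from the first record; the temporary fileref BOMFILE, the sample rows, and the variables FIELD1 and FIELD2 are hypothetical and only there so the example runs on its own:

```sas
/* Hypothetical sample file: starts with a UTF-8 BOM and has no header row. */
filename bomfile temp;

data _null_;
  file bomfile;
  put 'EFBBBF'x 'abc,first row';   /* BOM bytes written in front of the first record */
  put 'def,second row';
run;

data test;
  length field1 $ 10 field2 $ 20;
  infile bomfile dsd dlm=',' truncover;
  input field1 field2;
  /* On the first record only, drop the three BOM bytes if they are present. */
  if _n_ = 1 and substr(field1, 1, 3) = 'EFBBBF'x then field1 = substr(field1, 4);
run;
```

For a real file, point the INFILE statement at the CSV path instead of BOMFILE and keep the same check on the first variable.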

SAS can read this (at least in 9.1 and later), but your SAS session must be started with the DBCS and ENCODING options set:
-DBCS
-encoding UTF-8
These need to be in the SAS configuration file or on the command line when SAS is invoked. With these options, the default encoding of the SAS session is Unicode. Without them, the Unicode options pass syntax checks but have no effect.
You can try using the ENCODING= option on the INFILE statement, but for me that never worked.
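To confirm that the options actually took effect, you can ask the running session. A small sketch using PROC OPTIONS and the GETOPTION function; the wording of the log message is only illustrative:

```sas
/* Write the current ENCODING and DBCS settings to the log. */
proc options option=encoding;
run;

proc options option=dbcs;
run;

/* Same check from a DATA step, with a message if UTF-8 is not active. */
data _null_;
  length enc $ 32;
  enc = getoption('encoding');
  put 'Session encoding: ' enc;
  if upcase(enc) ne 'UTF-8' then
    put 'Session was not started with -encoding UTF-8.';
run;
```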
For some related info see also http://www.phuse.eu/download.aspx?type=cms&docID=3658

Related

SAS Infile not reading any data rows

I am a novice with SAS and am having trouble importing a CSV file. The information is in the following layout:
aircraft,duration,no_pasg,speed_ground,speed_air,height,pitch,distance
boeing,98.4790912,53,107.91568,109.3283765,27.41892425,4.043514571,3369.836364
boeing,125.7332973,69,101.6555886,102.8514051,27.80471618,4.117431699,2987.803924
boeing,112.0170008,61,71.05196088,,18.58938573,4.434043129,1144.922426
When I run the code below, I get 0 observations (unexpected) and 8 variables (expected). What am I doing wrong? I even tried adding FIRSTOBS=2 to skip the first row, which contains the headers, but that hasn't helped.
I appreciate the help.
DATA FAA_DATA1;
INFILE '~/Project/Data/FAA1.csv' dsd dlm=',' firstobs=2;
INPUT aircraft $ duration no_pasg speed_ground speed_air height pitch distance;
RUN;
PROC PRINT;
RUN;
Sounds like it is not seeing the end-of-line characters and so thinks your file is one long line. That would explain why FIRSTOBS=2 causes you to get 0 observations. There should be a note in the log about how many lines SAS read from the file.
Try using the TERMSTR= option on the INFILE statement. The normal end of line for Unix is TERMSTR=LF. The normal end of line for Windows is TERMSTR=CRLF. If you made the file using Excel on a Mac, then you should try TERMSTR=CR.
For some reason, Excel on a Mac still assumes that Mac OS uses CR as the end-of-line character for text files, even though Apple moved Mac OS to a Unix base years ago. There should also be an option in Excel, when saving the file, to save it as comma-delimited but with normal end-of-line characters.
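To see the effect of TERMSTR=, the sketch below writes a small file with CR-only line endings (as Excel on a Mac would) and then reads it back with TERMSTR=CR. The temporary fileref SAMPLE and the three-column subset of the question's data are only for illustration; for the real file, add the matching TERMSTR= value to the INFILE statement that points at FAA1.csv.

```sas
/* Hypothetical sample: a three-line CSV written with CR-only line endings. */
filename sample temp;

data _null_;
  file sample termstr=cr;
  put 'aircraft,duration,no_pasg';
  put 'boeing,98.4790912,53';
  put 'boeing,125.7332973,69';
run;

/* TERMSTR=CR tells SAS that each record ends at a carriage return. */
data faa_demo;
  infile sample dsd dlm=',' firstobs=2 termstr=cr truncover;
  input aircraft :$10. duration no_pasg;
run;
```

Without TERMSTR=CR, SAS on Unix would look for LF, see the whole file as a single line, and FIRSTOBS=2 would then leave nothing to read.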

Export SAS dataset in Excel's "Text (MS-DOS)" format

I need to export a data set as a text file for an ancient batch process, probably running on Unix. The file has one column and all fields are numeric.
I want to create a text file which emulates the way Excel creates Text (MS-DOS) files:
Saves a workbook as a tab-delimited text file for use on the MS-DOS operating system, and ensures that tab characters, line breaks, and other characters are interpreted correctly. Saves only the active sheet.
What is the best way to achieve this?
DOS uses code page 437, which has a very limited set of characters. If you don't have any special characters, you're fine. If you do have special characters, you'll need to change the encoding to code page 437 to guarantee character compatibility. This can be done with the ENCODING= data set option.
SAS internally names this encoding pcoem437. You can see the difference in the output by changing the encoding= option.
data have;
input var$;
datalines;
ElNiño
ElNino
;
run;
proc export data=have(encoding=pcoem437)
file='C:\Directory\want.txt'
dbms=tab
replace;
run;
If you just have one column then the delimiter doesn't matter. You can write the file using a DATA step very easily.
data _null_;
set have ;
file 'myfile.txt' ;
put VAR1 ;
run;
If you want to add an extra line with the column name then add this before the PUT statement.
if _n_=1 then put 'VAR1';
If you are worried about whether you need to generate LF or CRLF for the end of line you can control that with the TERMSTR= option on the FILE statement.
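Putting the two answers together, here is a sketch that writes a one-column file with DOS-style CRLF line endings and code page 437 encoding directly from a DATA step. It assumes a numeric variable named VAR1; the data set HAVE1 and its values are hypothetical, and the temporary fileref DOSOUT stands in for the real path (e.g. 'myfile.txt'):

```sas
/* Hypothetical one-column numeric data set. */
data have1;
  input var1;
  datalines;
12
345
6789
;
run;

filename dosout temp;   /* replace with the real target path */

data _null_;
  set have1;
  /* CRLF line endings and code page 437, as for an MS-DOS text file */
  file dosout termstr=crlf encoding='pcoem437';
  if _n_ = 1 then put 'VAR1';   /* optional header line */
  put var1;
run;
```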

SAS: Reading all files containing a specific string

I would like to read a set of files whose names all start with a specific string (xxx*.csv). So if my folder has the files below, the code should read in files #1 and #2.
1. xxx_123.csv
2. xxx_456.csv
3. xyz_111.csv
The first part of the filename is known, but the second part is unknown. Does anyone know how I can achieve this in SAS Enterprise Guide?
You can create a wildcard fileref. I'm not sure how to do it interactively in EG, but in Base SAS the code would be similar to the following.
filename wildxxx "/path/xxx*.csv" ;
data readin ;
infile wildxxx /* + infile options */;
/* input statement */
run ;
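A self-contained sketch of the wildcard approach, assuming each file has a header row and two columns. It first writes the three example files into the WORK library's directory (only so the code runs anywhere), then reads just the xxx*.csv ones. The FILENAME= option records which file each row came from, and EOV= flags the first line of every file after the first, so each header can be skipped. The variable names ID and AMOUNT and the file contents are hypothetical.

```sas
/* Create the three example files in the WORK directory (hypothetical contents). */
%let dir = %sysfunc(pathname(work));

data _null_;
  length fname $ 256 line $ 40;
  do name = 'xxx_123.csv', 'xxx_456.csv', 'xyz_111.csv';
    fname = "&dir/" || name;
    file out filevar=fname;          /* OUT is only a placeholder with FILEVAR= */
    put 'id,amount';
    line = trim(scan(name, 1, '.')) || ',1';
    put line;
  end;
run;

/* Read only the files whose names start with xxx. */
filename wildxxx "&dir/xxx*.csv";

data readin;
  length fname source $ 256 id $ 20;
  infile wildxxx dsd dlm=',' truncover filename=fname eov=newfile;
  input @;                           /* hold the line so we can decide what to do with it */
  if _n_ = 1 or newfile then do;     /* first line of each file is a header */
    newfile = 0;                     /* the EOV= variable is not reset automatically */
    delete;
  end;
  source = fname;                    /* FILENAME= variables are not kept, so copy the value */
  input id amount;
run;
```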

How to import txt file in SAS using infile command

I'm a beginner in SAS, currently using SAS 9.1. I want to import a txt file using the INFILE statement, but it gives an error. My code is as follows:
data sasdata.twenty;
infile "C:\Users\Ravi Raghava\Desktop\Cricket.txt" firstobs=2;
input Position Runs Sixes Fours balls;
run;
Any help will be highly appreciated.
Usually I do it like this:
proc import datafile='data.txt'
out=TableName
dbms=dlm /* DELIMITER= is used with DBMS=DLM */
replace;
delimiter='09'x; /* tab character */
run;
It works correctly.
You are relying on SAS's default list input (blanks are used as the separator, and every variable not followed by a $ is read as numeric). You assume SAS knows what it is doing. You know what happens when you assume.
Use a
Length Position $20 Runs Sixes Fours balls 8;
statement. That will at least make sure the first field is read as character and the rest as numbers. Also use
infile ..... dlm=',';
if you have a CSV file, or
infile ..... dlm='09'x;
if it is a tab-separated file.
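Combining both suggestions, a sketch for a tab-separated file with a header row. It writes a tiny sample file first so it can run on its own (the value 'Opener' and the numbers are made up); for the real data, point INFILE at the Cricket.txt path instead. Because the LENGTH statement already declares Position as character, the INPUT statement does not need a $ after it.

```sas
/* Hypothetical tab-separated sample with a header row. */
filename cricket temp;

data _null_;
  file cricket;
  put 'Position' '09'x 'Runs' '09'x 'Sixes' '09'x 'Fours' '09'x 'balls';
  put 'Opener' '09'x '45' '09'x '2' '09'x '5' '09'x '38';
run;

data twenty;
  length Position $20 Runs Sixes Fours balls 8;   /* character first, numbers after */
  infile cricket dlm='09'x dsd firstobs=2 truncover;
  input Position Runs Sixes Fours balls;
run;
```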

utf-8 encoding in SAS with a pipe

Is there a way to combine a piped infile and utf-8 encoding in SAS?
For example, this works:
data wordlist;
infile 'wordlist.txt' dlm='|' encoding='utf-8';
input polar $3. word :$30.;
run;
but this doesn't:
filename inf pipe 'perl fix.pl';
data wordlist;
infile inf dlm='|' encoding='utf-8';
input polar $3. word :$30.;
run;
The error is:
ERROR 23-2: Invalid option name ENCODING.
I've tried putting the ENCODING= option on the FILENAME statement and on the INFILE statement, but neither works.
The ENCODING option is not currently supported for the PIPE "device" in SAS for Windows. However, my understanding is that it can work on UNIX.
As another responded, if you set your session encoding to UTF-8 (using the -ENCODING system option) that might work for you. But setting the session encoding may have other side effects for your processing, so you'll need to exercise care.
No idea why piped files would be different, but what if you change your session encoding to UTF-8?
Alternatively, you can of course work around it by having the perl script write its output to a file, then reading that file via normal input methods.
This was discussed a while ago on SAS-L, and I don't think any better solution was found (see http://listserv.uga.edu/cgi-bin/wa?A2=ind1203b&L=sas-l&D=0&P=9728 )
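The same workaround can also be kept inside SAS: copy the pipe's output into a temporary file with no ENCODING= option on either side (so the bytes should pass through without transcoding), then read that ordinary file with ENCODING='utf-8'. A sketch, assuming fix.pl writes the pipe-delimited word list to standard output and that no record is longer than 32767 bytes:

```sas
filename inf pipe 'perl fix.pl';
filename fixed temp;

/* Step 1: copy the piped output to a temporary file.                */
/* No ENCODING= on either side, so no transcoding should occur here. */
data _null_;
  infile inf lrecl=32767;
  file fixed lrecl=32767;
  input;
  put _infile_;
run;

/* Step 2: read the copy, now an ordinary file, with ENCODING=. */
data wordlist;
  infile fixed dlm='|' encoding='utf-8';
  input polar $3. word :$30.;
run;
```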