I need to export a data set as a text file for an ancient batch process that probably runs on Unix. The file has one column and all fields are numeric.
I want to create a text file which emulates the way Excel creates Text (MS-DOS) files:
Saves a workbook as a tab-delimited text file for use on the MS-DOS
operating system, and ensures that tab characters, line breaks, and
other characters are interpreted correctly. Saves only the active
sheet.
What is the best way to achieve this?
DOS uses code page 437, which covers a very limited set of characters. If your data has no special characters, you're fine. If it does, you'll need to change the encoding to code page 437 to guarantee character compatibility. This can be done with a data set option.
SAS internally names this pcoem437. You can see the difference in output by changing the encoding= option.
data have;
input var$;
datalines;
ElNiño
ElNino
;
run;
proc export data=have(encoding=pcoem437)
file='C:\Directory\want.txt'
dbms=tab
replace;
run;
If you just have one column then the delimiter doesn't matter. You can write the file using a DATA step very easily.
data _null_;
set have ;
file 'myfile.txt' ;
put VAR1 ;
run;
If you want to add an extra line with the column name then add this before the PUT statement.
if _n_=1 then put 'VAR1';
If you are worried about whether you need to generate LF or CRLF for the end of line you can control that with the TERMSTR= option on the FILE statement.
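As a minimal sketch of that option, here is the same DATA step with an explicit line terminator. It assumes the downstream process expects Windows-style CRLF line endings; swap in termstr=lf if it expects Unix-style ones.

```sas
data _null_;
  set have;
  /* termstr=crlf ends each line with CR+LF; use termstr=lf for LF only */
  file 'myfile.txt' termstr=crlf;
  put VAR1;
run;
```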
I am importing an Excel spreadsheet into SAS using Proc Import:
Proc Import out=OUTPUT
Datafile = "(filename)"
DBMS=XLSX Replace;
Range = "Sheet1$A:Z";
run;
My numeric data columns contain a mixture of values held in Excel as numerics and '0 values held as text - i.e. with a leading apostrophe / single quote. When SAS imports these it treats them all the same (i.e. it returns Character strings of the values with the leading apostrophe stripped out).
This results in differences from the spreadsheet when calculations are applied (e.g. averaging) as Excel treats the '0 values as missing but SAS treats them as 0.
Is it possible to import the values as strings including the leading single quote / apostrophe, so that I can replace the '0 with missing values but keep the 0 records as 0? I would like to avoid having to manually manipulate the data in Excel as this data is drawn from an external source (don't ask...)
I doubt it. I think Excel doesn't really consider the leading apostrophe as part of the value. It's just a crazy way to indicate that a value is a text string (rather than numeric). When SAS imports the data, it recognizes that the quote is not part of the value. So if you've got an Excel column with '0 in some cells and 0 in others, it's going to come in as character, and I don't think you can tell the difference between them.
Unfortunately, the xlsx engine doesn't support the DBSASTYPE option. Other engines that import Excel do have it. That should allow you to tell SAS to import a column as a numeric variable, even if it sees character values. If you want all text values in the column converted to missing, that might do the trick. But it's possible it would still treat '0 the same as 0. I'm away from SAS, so can't test.
Option:
The ~ (tilde) format modifier enables you to read and retain single quotation marks.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003209907.htm
Is it possible to convert the .xlsx to .txt keeping the single quotes? Because it is not possible to infile xlsx in a data step.
filename df disk 'C:\data_temp\ex.txt';
data test;
infile df firstobs=2;
input ID $2. x ~$3. ;
run;
proc print data=test;
run;
I am trying to read the following data from a Notepad (text) file into a SAS data set:
name1,124325,08/10/2003,1250.03
name2,114565,08/11/2003,11115.11
name3,000007,08/11/2003,12500.02
When I use this SAS code:
data new;
filename tfile '~\transact2.txt';
infile tfile dsd;
input name $ id date mmddyy10. cost 8.2;
run;
I get this, where cost is all missing:
However, if I just replace dsd with dlm=',', then the cost variable is read in correctly. Why does dsd cause the cost variable to be read in incorrectly?
dsd does not say "use a delimiter". It tells SAS how to use that delimiter (mostly, saying anything in quotes is treated as one field, and modifying how consecutive delimiters are treated). dlm=',' is necessary to read this in correctly. I'm a bit surprised you got as close to correct as you did. (Fortunately, SAS makes some assumptions here that end up making it work correctly, more-or-less).
Also, you're mixing two styles of input, which isn't allowed.
When you use delimited input, you are using list input, not column or formatted input. You can only indicate character/not character, and cannot use informats directly. If you want to embed informats like you do for the date, you need to use modified list input (the colon modifier):
data new;
filename tfile '~\transact2.txt';
infile tfile dsd;
input name $ id date :mmddyy10. cost;
run;
Also note that reading in cost with 8.2 is incorrect. The decimal in an informat is only for reading in 12345678 as 123456.78 (back in the day when you had 80 column cards and didn't want to spend one on the decimal). In general in "modern" SAS you should not be using decimal portion of informat ever. SAS will see the decimal and work it out properly.
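A quick way to see this is the sketch below. It uses the INPUT function rather than an INPUT statement, and the literal values are just illustrations.

```sas
data _null_;
  /* No decimal point in the text: the .2 implies two decimal places */
  a = input('12345678', 8.2);
  /* Decimal point present in the text: the .2 is ignored */
  b = input('1250.03', 8.2);
  put a= b=;   /* log shows a=123456.78 b=1250.03 */
run;
```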
I am a novice with SAS and am having trouble importing a CSV file. The information is in the following layout:
aircraft,duration,no_pasg,speed_ground,speed_air,height,pitch,distance
boeing,98.4790912,53,107.91568,109.3283765,27.41892425,4.043514571,3369.836364
boeing,125.7332973,69,101.6555886,102.8514051,27.80471618,4.117431699,2987.803924
boeing,112.0170008,61,71.05196088,,18.58938573,4.434043129,1144.922426
I run the code below and I get 0 observations (not expected) and 8 variables (expected). What am I doing wrong? I even tried adding firstobs=2 to skip the first row, which contains the headers, but that hasn't helped.
I appreciate the help.
DATA FAA_DATA1;
INFILE '~/Project/Data/FAA1.csv' dsd dlm=',' firstobs=2;
INPUT aircraft $ duration no_pasg speed_ground speed_air height pitch distance;
RUN;
PROC PRINT;
RUN;
Sounds like it is not seeing the end-of-line characters and so thinks your file is one long line. That would explain why FIRSTOBS=2 causes you to get 0 observations. There should be a note in the log about how many lines SAS read from the file.
Try using the TERMSTR= option on INFILE statement. The normal end of line for Unix is TERMSTR=LF. The normal end of line for Windows is TERMSTR=CRLF. If you made the file using Excel on a Mac then you should try using TERMSTR=CR.
For some reason Excel on a Mac still thinks that Mac OS uses CR as the end-of-line character for text files even though Apple converted the Mac OS to using Unix years ago. There also should be an option in Excel when saving the file to save it as comma delimited but using normal end of line characters.
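As a sketch, this is the original step with TERMSTR= added. It assumes the file was saved by Excel on a Mac and so uses CR line endings; if the log still reports a single line read, try LF or CRLF instead.

```sas
data FAA_DATA1;
  /* termstr=cr tells SAS that each line ends with a bare carriage return */
  infile '~/Project/Data/FAA1.csv' dsd dlm=',' firstobs=2 termstr=cr;
  input aircraft $ duration no_pasg speed_ground speed_air height pitch distance;
run;
```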
I'm creating a text file with SAS and I'm using a macro variable with a date in my text file's name to make it distinct from other similar files.
The problem I'm experiencing:
SAS is adding two unwanted spaces in the middle of the file name. The unwanted spaces are placed directly before the text generated by my macro variable
I'm certain this has everything to do with my macro variable being used, but on its own, the variable doesn't contain any spaces. Below is my code:
proc format;
picture dateFormat
other = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;
data _null_;
dateTime=datetime();
call symput('dateTime', put(dateTime,dateFormat.));
run;
%LET FILE = text_text_abc_&dateTime..txt;
filename out "/location/here/&FILE" termstr=crlf;
data _null_; set flatfile;
/*file content is created in here*/
run;
The exported file name will look like this:
NOTE: The file OUT is:
Filename=/location/here/text_text_abc_ 201702010855.txt
If it helps, I'm using SAS E-Guide 7.1.
Any help is appreciated! Thanks, all!
You need to assign an appropriate default length to your picture format. SAS is applying a default length of 14, but you need 12, e.g.
proc format;
picture dateFormat (default=12)
other = '%Y%0m%0d%0H%0M' (datatype=datetime);
run;
Use call symputx() instead of call symput(), then SAS will automatically strip the leading and trailing blanks from the value written to the macro variable. You should really only use call symput() in the rare cases where you want the macro variable value to have leading or trailing blanks.
Run this little program to see the difference.
data _null_;
str=' XX ';
call symput('var1',str);
call symputX('var2',str);
run;
%put |&var1|;
%put |&var2|;
I am using SAS 9.1.3 in AIX 5.3
I have to proc import a CSV file using SAS.
The first line of CSV are column names.
SAS reports error in the log.
Then I found out that the CSV file has 3 characters (the UTF-8 byte order mark) at the very beginning of the file.
I tried to use :
filename XXX 'XXXXXXXXXX' BOM ;
But, this is syntax error.
I replace BOM with BOMFILE, still syntax error.
It seems that SAS 9.1.3 cannot recognize the BOM options.
Does anyone have similar experience ?
Instead of the import procedure, you might try a data step like the following:
data test;
infile "data.csv" firstobs=2 dlm=','; /* assuming delimiter is a comma */
input /* use Input with $UTF8Xw. informat */
field1 $utf8x3. /* input fields 1 through 3 */
field2 $utf8x10.
field3 $utf8x3.
;
run;
SAS can read this (at least in 9.1 and later), but your SAS session must be running with the DBCS and encoding options set.
-DBCS
-encoding UTF-8
These need to be in the sasconfig file or on command line of invocation. With these options the default encoding is Unicode for the SAS session. Without it Unicode options pass syntax checks but have no effect.
You can try using the encoding= option on the infile statement, but for me that never worked.
For some related info see also http://www.phuse.eu/download.aspx?type=cms&docID=3658