I'm quite new to SAS and after many attempts I'm still wondering how I could import my CSV file into SAS accurately.
Here's my data
color, Description, price
"Black, blue, grey", "Pipe, 16" inch wide, PVC", 20.27
Here's my SAS code:
PROC IMPORT datafile='/home/..data.csv'
out=data dbms=csv replace;
getnames=yes;
guessingrows = max;
RUN;
Here's what SAS reads:
Color Description Price
Black, blue, grey "Pipe .
I suspect SAS treats "Pipe as the Description value instead of Pipe, 16" inch wide, PVC. What could I do so that SAS can read the whole line?
The CSV data is invalid.
The 'desired' double-quoted value Pipe, 16" inch wide, PVC contains both a double quote (") and the value separator (,). Some CSV readers will parse it correctly if the 16" has an escaped quote, as 16"". However, SAS PROC IMPORT appears not to be one of those.
Can you get the data with an alternate field delimiter such as | or ~?
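If the data can be re-exported with a delimiter that never appears inside the values, a plain data step read is straightforward. A minimal sketch, assuming a hypothetical pipe-delimited file /home/data_pipe.txt holding the same three columns:
data work.data;
    /* hypothetical file contents:
       color|Description|price
       Black, blue, grey|Pipe, 16" inch wide, PVC|20.27 */
    infile '/home/data_pipe.txt' dlm='|' firstobs=2 truncover;
    length color $50 description $100;
    input color $ description $ price;
run;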
Related
I am trying to download a file from SAS and import it to Hadoop.
It's a huge dataset - 6 GB.
When I export the SAS dataset to a CSV file and then import it back into SAS (as I was facing a few issues in Hadoop, I tried importing back into SAS to verify the values), the import shows problems in the dataset in the same tool itself.
The column values are jumbled up.
A few columns have junk values, and a few columns overlap.
How can I export the dataset in CSV format with the column values intact?
filename output 'AAA.csv' encoding="utf-8";
Proc export data= input_data
outfile= output
dbms = CSV;
run;
Just a guess, but try removing any end of line characters that might exist in your character strings.
For example you could use a simple data step view to convert the strings on the fly. Here is one that replaces any CR or LF character with a pipe character.
data for_export / view=for_export;
set input_data;
array _c _character_;
do over _c;
_c = translate(_c,'||','0D0A'x);
end;
run;
proc export data=for_export outfile=output dbms=CSV;
run;
You might also watch out for backslash characters. Some readers try to interpret those as an escape character.
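If backslashes do turn out to be a problem, the same on-the-fly view can neutralise them in the same pass; a sketch, assuming a forward slash is an acceptable substitute:
data for_export / view=for_export;
    set input_data;
    array _c _character_;
    do over _c;
        /* '0D0A5C'x = carriage return, line feed, backslash (ASCII) */
        _c = translate(_c, '||/', '0D0A5C'x);
    end;
run;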
My program makes a web-service call and receives a response in XML format, which I store as output.txt. When opened in Notepad, the file looks like this:
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
I am unable to read this file into SAS using proc IMPORT. My SAS code is below
proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
delimiter='<>"=';
getnames=yes;
run;
My log is
1 %_eg_hidenotesandsource;
5 %_eg_hidenotesandsource;
28
29 proc import datafile="/mktg/prc203/abhee/output.txt" out=work.test2 dbms=dlm replace;
30 delimiter='<>"=';
31 getnames=yes;
32 run;
NOTE: Unable to open parameter catalog: SASUSER.PARMS.PARMS.SLIST in update mode. Temporary parameter values will be saved to
WORK.PARMS.PARMS.SLIST.
Unable to sample external file, no data in first 5 records.
ERROR: Import unsuccessful. See SAS Log for details.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.09 seconds
cpu time 0.09 seconds
33
34 %_eg_hidenotesandsource;
46
47
48 %_eg_hidenotesandsource;
51
My ultimate goal is to mine the Employee first name (Gerald), last name (Harris) and Employee Number (108181) from the above file and store them in a dataset (and then do this over and over again with a loop and append to the same dataset). If you can help with importing the entire file or just the information that I need directly, then that would help.
If you only need these three fields then a single INPUT statement is perfectly viable, and arguably preferable to parsing XML with regex:
data want;
infile xmlfile dsd dlm = ' /';
input #"Employee" #"firstName=" firstName :$32. #"lastName=" lastName :$32. #"emplnbr=" emplnbr :8.;
run;
This uses the input file constructed in Richard's answer. The initial @"Employee" is optional but reduces the risk of picking up fields with the same names as the desired ones that are subfields of a different top-level element.
Bonus: the same approach can also be used to import json files if you're in a similar situation.
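For example, the same string-search technique applied to a hypothetical single-line JSON response (the fileref jsonfile and the field layout are assumptions):
/* assumed contents of jsonfile:
   {"employee":{"firstName":"Gerald","lastName":"Harris","emplnbr":"108181"}} */
data want_json;
    infile jsonfile dsd dlm=',}';
    input @'"firstName":' firstName :$32.
          @'"lastName":'  lastName  :$32.
          @'"emplnbr":'   emplnbr   :8.;
run;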
Since you are unable to use the preferred methods of reading XML data, and you are processing a single-record result from a service query, the git-'er-done approach seems warranted.
One idea that did not pan out was to use named input.
input #'Employee' lastname= firstname= emplnbr=;
The results could not be made to strip the quotes with the $QUOTE. informat, nor to honor infile dlm=' /'.
An approach that did work was to read the single line and parse the value out using a regular expression with capture groups. PRXPARSE is used to compile a pattern, PRXMATCH to test for a match and PRXPOSN to retrieve the capture group.
* create a file to read from (represents the file from the service call capture);
options ls=max;
filename xmlfile "%sysfunc(pathname(WORK))\1-service-call-record.xml";
data have;
input;
file xmlfile;
put _infile_;
datalines;
<OwnerInquiryResponse xmlns="http://www.fedex.com/esotservice/schema"><ResponseHeader><TimeStamp time="2018-02-01T16:09:19.319Z"/></ResponseHeader><Owner><Employee firstName="Gerald" lastName="Harris" emplnbr="108181"/><SalesAttribute type="Sales"/><Territory NodeGlobalRegion="US" SegDesc="Worldwide Sales" SegNbr="1" TTY="2-2-1-2-1-1-10"/></Owner><Delegates/><AlignmentDetail><SalesAttribute type="Sales"/><Alignments/></AlignmentDetail></OwnerInquiryResponse>
run;
* read the entire line from the file and parse out the values using Perl regular expression;
data want;
infile xmlfile;
input;
rx_employee = prxparse('/employee\s+firstname="([^"]+)"\s+lastname="([^"]+)"\s+emplnbr="([^"]+)"/i');
if prxmatch(rx_employee,_infile_) then do;
firstname = prxposn(rx_employee, 1, _infile_);
lastname = prxposn(rx_employee, 2, _infile_);
emplnbr = prxposn(rx_employee, 3, _infile_);
end;
keep firstname lastname emplnbr;
run;
I have the following SAS code that exports to an .xls file. (NB: I need the OLD 1997-2003 format.)
I specify the sheet name to be 'PB Organization', but when the file is created the sheet name is 'PB_Organization'.
An "_" has been added. What is happening?
PS: The file contains the right columns and rows, it is just the sheet name that is wrong.
%let Path_Org = "\\Folder\CurrentMonth - PB Organization";
proc export data=pb_org2
outfile = &Path_Org
dbms=xls replace;
sheet = 'PB Organization';
run;
from SAS docs:
SHEET=sheet-name
identifies a particular spreadsheet in an Excel workbook. Use the SHEET= option only when you want to import an entire spreadsheet. If the sheet name used with the EXPORT procedure contains special characters (such as a space), SAS converts them to underscores.
The space is converted to an underscore. "Employee Information" becomes "Employee_Information"
see also here http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003103761.htm
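A minimal sketch of the practical workaround with DBMS=XLS: pick a sheet name without special characters, so no substitution happens.
proc export data=pb_org2
    outfile=&Path_Org
    dbms=xls replace;
    sheet='PBOrganization'; /* no space, so the name is kept as-is */
run;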
When I run the following and output to the ODS LaTeX destination, SAS escapes the dollar sign and all the brackets, which won't compile in LaTeX.
How can I output what's in each cell verbatim?
ods escapechar='^';
Proc format;
picture sigstar (round)
low-0.01="***" (NOEDIT)
0.01<-0.05="** " (FILL=' ' PREFIX='')
0.05<-0.10="* " (FILL=' ' PREFIX='')
other= " " (FILL=' ' PREFIX='');
run;
data test;
input mean pvalue;
datalines;
2.50 0.001
3.50 0.05
4.25 0.12
5.00 0.01
;
run;
data test;
set test;
output = cats( put(mean,7.2) , '{$^{', put(pvalue,sigstar.),'}$}' );
format pvalue sigstar.;
drop mean pvalue;
run;
ods tagsets.simplelatex file="test.tex" (notop nobot);
proc print data=test;run;
ods tagsets.simplelatex close;
This outputs the following .tex code:
\sassystemtitle[c]{~}
\sascontents[1]{The Print Procedure}
\sascontents[2]{Data Set WORK.TEST}
\begin{longtable}{|r|l|r|}\hline
Obs & output & pvalue\\\hline
\endhead
1 & 2.50\{\$\^\{***\}\$\} & ~\\\hline
2 & 3.50\{\$\^\{**\}\$\} & ~\\\hline
3 & 4.25\{\$\^\{\}\$\} & ~\\\hline
4 & 5.00\{\$\^\{***\}\$\} & ~\\\hline
\end{longtable}
How can I ensure that the second cell in the first row shows 2.50{$^{***}$} and not 2.50\{\$\^\{***\}\$\}? The package I'm using in LaTeX requires that the stars are surrounded by the characters above, so I just want to output what's in the dataset verbatim.
I don't think you would generally do that in LaTeX tagsets. The point of tagsets is to take normal SAS data and reformat it to make a LaTeX file; what you're doing is writing the LaTeX yourself.
This paper shows an example of how you can do something close to what you want: basically, you write the LaTeX file minus the SAS-provided graphs or charts, mark where you want the SAS part to go, and effectively paste it in via SAS code. You should also be able to write the initial LaTeX file in SAS - but don't do it with the tagset; just write it directly with a data step PUT.
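A rough sketch of that data step alternative, writing the table body straight to a file so nothing passes through the tagset's character mapping (the output file name is an assumption):
data _null_;
    set test end=last;
    file 'test_manual.tex';
    if _n_ = 1 then do;
        put '\begin{longtable}{|r|l|}\hline';
        put 'Obs & output\\\hline';
        put '\endhead';
    end;
    put _n_ ' & ' output '\\\hline';
    if last then put '\end{longtable}';
run;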
The easiest way, though, may be to simply customize your own LaTeX tagset. Edit the one closest to what you are looking for, and add in your own code as needed. An example is this question.
To avoid these characters being escaped, it suffices to edit the following lines at the end of the SimpleLatex tagset:
mapsub = %nrstr("/\%%/$/\&/\~/\\#/{\textunderscore}");
map = %nrstr("%%$&~#_");
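If you prefer not to touch the shipped tagset file, the same two lines can be overridden in a child tagset instead; a sketch, assuming PROC TEMPLATE and a hypothetical name tagsets.verbatimlatex:
proc template;
    define tagset tagsets.verbatimlatex;
        parent = tagsets.simplelatex;
        /* same two lines as above, so { } $ ^ are no longer escaped */
        mapsub = %nrstr("/\%%/$/\&/\~/\\#/{\textunderscore}");
        map = %nrstr("%%$&~#_");
    end;
run;
ods tagsets.verbatimlatex file="test.tex" (notop nobot);
proc print data=test; run;
ods tagsets.verbatimlatex close;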
I'm importing a text file into SAS, using the code below:
proc import datafile="C:\Users\Desktop\data.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
However, I get error messages in the log, and certain fields are populated with "." in place of the real data; I don't know what the problem is.
The error message is :
Invalid data for DIPL in line 26 75-76.
Invalid data for DIPL in line 28 75-76.
Invalid data for DIPL in line 31 75-76.
Invalid data for DIPL in line 34 75-76.
A sample of the data is available here http://m.uploadedit.com/b029/1392916373370.txt
Don't use PROC IMPORT in most cases for delimited files; you should use data step input. You can use PROC IMPORT to generate initial code (to your log), but most of the time you will want to make at least some changes. This sounds like one of those times.
data want;
infile "blah.dat" dlm=';' dsd lrecl=32767 missover;
informat
trans $1.
triris $1.
typc $6.
;
input
trans $
triris $
typc $
... rest of variables ...
;
run;
PROC IMPORT generates code just like this in your log, so you can use that as a starting point, and then correct things that are wrong (numeric instead of character, add variables if it has too few as the above apparently does, etc.).
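For the DIPL notes specifically, the generated code almost certainly reads DIPL with a numeric informat while the file contains non-numeric codes. A hedged sketch of the kind of change to make (variable names, order, and widths here are assumptions; keep the full variable list from the generated code and only adjust DIPL):
data indivs;
    infile "C:\Users\Desktop\data.txt" dlm=';' dsd lrecl=32767 firstobs=2 truncover;
    informat trans $1. triris $1. typc $6. dipl $2.; /* DIPL as character, not numeric */
    input trans $ triris $ typc $ dipl $;
run;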
I copied the text file from your link, and ran your code (without the apostrophe):
proc import datafile="C:\temp\test.txt" out=Indivs dbms=dlm replace;
delimiter=';';
getnames=yes;
run;
And it worked fine despite the following:
Number of names found is less than number of variables found.
Result:
NOTE: WORK.INDIVS data set was successfully created.
NOTE: The data set WORK.INDIVS has 50 observations and 89 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 0.30 seconds
cpu time 0.26 seconds
If the log contains "Number of names found is less than number of variables found.", then SAS creates additional variables with blank values.
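A quick way to confirm what was created, including any extra auto-named blank variables, is to list the imported variables in creation order:
proc contents data=work.indivs varnum;
run;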