I have a CSV file with Polish characters in it, but when I import it into SAS, certain Polish characters are replaced by "?" or other random characters. How do I handle this?
I have a list of all the possible Polish characters, and I don't mind them being replaced by their English counterparts.
You need to set the ENCODING= option on your INFILE statement to match the encoding the CSV was saved in, e.g. encoding="UTF-8".
SAS documentation: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000146932.htm
http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/viewer.htm#a002610945.htm
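If you would rather replace the Polish letters with their unaccented counterparts before SAS ever sees the file, a small preprocessing step outside SAS can do it. This is a minimal Python sketch, assuming the CSV was saved as UTF-8; `transliterate_polish` and `transliterate_file` are hypothetical names. An explicit mapping is used because ł/Ł has no Unicode decomposition, so stripping accents via normalization would leave it unchanged.

```python
# Minimal sketch (assumption: the source CSV is UTF-8; pass src_encoding="cp1250"
# instead if it was saved with the Windows Central European code page).
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)


def transliterate_polish(text: str) -> str:
    """Replace each Polish diacritic letter with its plain ASCII letter."""
    return text.translate(POLISH_TO_ASCII)


def transliterate_file(src: str, dst: str, src_encoding: str = "utf-8") -> None:
    """Hypothetical helper: rewrite a CSV as plain ASCII before the SAS import."""
    with open(src, encoding=src_encoding, newline="") as fin:
        text = fin.read()
    # Writing as ASCII raises UnicodeEncodeError if any non-ASCII character
    # other than the mapped Polish letters is still present.
    with open(dst, "w", encoding="ascii", newline="") as fout:
        fout.write(transliterate_polish(text))


if __name__ == "__main__":
    print(transliterate_polish("Zażółć gęślą jaźń"))  # Zazolc gesla jazn
```

The strict ASCII write is deliberate: it surfaces any unexpected characters instead of silently turning them into "?" again.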
I have a string that is going to be used as a filename, so I want to check whether it contains special characters and replace them, so they won't cause a problem when I create the file. Is it good practice to replace them with "_"?
I used the code below; is it correct? Are there other characters besides letters and digits that can be used in a filename? Which characters should I avoid in file names?
String filename = ch.replaceAll(RegExp('[^A-Za-z0-9]'), '_');
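The list of allowed filename characters depends on the underlying filesystem. On (most) Unix filesystems, anything except / and the NUL byte (\0) is allowed. On Windows, the rules get weird: the characters < > : " / \ | ? * and control characters are reserved, you (usually) can't end a filename with a space or a period, and you can't use reserved device names such as CON, PRN, AUX, NUL, COM1 or LPT1.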
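The Windows and Unix rules can be checked mechanically. This is a rough Python sketch; `windows_name_problems` and `unix_name_problems` are hypothetical helpers, and they cover only the basic rules (reserved characters, trailing space or period, reserved device names), not path-length limits or every edge case.

```python
import re

# Characters Windows reserves in file names, plus ASCII control characters 0-31.
_RESERVED_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Reserved device names; these are also rejected with an extension (e.g. "nul.txt").
_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{n}" for n in range(1, 10)}
    | {f"LPT{n}" for n in range(1, 10)}
)


def windows_name_problems(name: str) -> list:
    """Return reasons why `name` is likely to be rejected on Windows (empty list if none)."""
    if not name:
        return ["empty name"]
    problems = []
    if _RESERVED_CHARS.search(name):
        problems.append("contains a reserved character")
    if name[-1] in " .":
        problems.append("ends with a space or period")
    if name.split(".")[0].upper() in _RESERVED_NAMES:
        problems.append("uses a reserved device name")
    return problems


def unix_name_problems(name: str) -> list:
    """On most Unix filesystems only "/" and NUL are forbidden, besides "." and ".."."""
    if not name:
        return ["empty name"]
    if name in (".", ".."):
        return ["reserved directory entry"]
    return ["contains / or NUL"] if ("/" in name or "\0" in name) else []
```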
Other considerations: It would be confusing to allow spaces at the beginning/end of a filename. Spaces within a filename break certain tools (looking at you, make). Is your filesystem case-sensitive or case-preserving? Does it have a maximum filename length?
Which characters should I avoid in file names?
Wrong question. Do you have a particular need to allow "unusual" characters in filenames?
If these are machine-generated names, just do what you're doing (I prefer hyphens, but that's a stylistic decision). If these are user-generated filenames, just try saving the file; if it fails, ask the user to choose another name.
tl;dr: use URL-safe characters: [A-Za-z0-9_-]+.
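The tl;dr rule translates directly into a regular expression. Here is a Python take on the question's replaceAll call that also keeps _ and -. `safe_filename` is a hypothetical name, and its `max_length` default of 255 is an assumption (a common per-name limit; check your filesystem).

```python
import re

# Anything outside the URL-safe set [A-Za-z0-9_-] becomes the replacement.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def safe_filename(stem: str, replacement: str = "_", max_length: int = 255) -> str:
    """Sanitise a file name stem (without extension) to URL-safe characters."""
    cleaned = _UNSAFE.sub(replacement, stem)
    # An empty input would otherwise produce an empty (invalid) name.
    if not cleaned:
        cleaned = replacement
    # Truncate to the assumed maximum name length.
    return cleaned[:max_length]


print(safe_filename("Quarterly report: Q1/2024?") + ".txt")  # Quarterly_report__Q1_2024_.txt
```

Sanitising only the stem and appending a fixed extension keeps the dot, which the [^A-Za-z0-9] pattern in the question would otherwise replace as well.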
I am completely new to SAS programming, so pardon me if the question is very basic. I am trying to send a file from Linux to a Windows server using SAS SFTP. I am able to transfer the file, but the destination file has LF line endings, whereas our job expects CRLF. I tried using the TERMSTR= option, but it fails with the error "invalid option termstr". Below is my code:
filename out-file sftp 'file.txt' cd='/project/dir'
host='hostname' recfm=v
user=user1;
data _null_;
file out-file TERMSTR=crlf;
do i=1 to 10;
put i=;
end;
run;
Your program is using an invalid value for the fileref. You cannot use a hyphen in a SAS name; use an underscore instead, e.g. out_file.
You can use the TERMSTR= option on either the FILENAME or the FILE statement to change the end-of-line characters.
I thought that SFTP always moved files as binary. You could check your SFTP options to make sure it is doing that; for example, try removing the recfm=v option.
Note: Text files have end-of-line characters, not record delimiters. If you are writing some type of proprietary binary file format you might consider the characters between rows of data a record delimiter, but it just leads to confusion if you think of the lines in text files as being separated instead of terminated.
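If the file still arrives with LF endings, the line endings can also be converted after the transfer. This is a minimal Python sketch, assuming you can run a script on either machine; `lf_to_crlf` and `convert_file_to_crlf` are hypothetical names.

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Convert LF line endings to CRLF without doubling existing CRLFs."""
    # Normalise any CRLF to LF first so "\r\n" does not become "\r\r\n".
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def convert_file_to_crlf(path: str) -> None:
    """Hypothetical helper: rewrite a text file in place with CRLF endings."""
    # Binary mode, so Python does not apply its own newline translation.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(lf_to_crlf(data))
```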
TERMSTR= is an option for the INFILE statement; there is no corresponding option for the FILE statement.
Try using PUT with a hexadecimal character constant and a trailing @ to hold the output line:
PUT I= '0d0a'x @;
From the SAS documentation:
Specifying Hexadecimal Values
Hexadecimal values for (system) option values must begin with a number (0–9) and must be followed by an X. For example, the following OPTIONS statement sets the line size to 160 using a hexadecimal number:
options linesize=0a0x;
Character assignments for hexadecimal numbers require quotation marks:
options formchar='a0'x;
Additional reading in "SAS Constants in Expressions" reveals:
Character Constants Expressed in Hexadecimal Notation
SAS character constants can be expressed in hexadecimal notation. A character hexadecimal constant is a string of an even number of hexadecimal characters enclosed in single or double quotation marks, followed immediately by an X
and
Numeric Constants Expressed in Hexadecimal Notation
A numeric constant that is expressed as a hexadecimal value starts with a numeric digit (usually 0), can be followed by more hexadecimal characters, and ends with the letter X. The constant can contain up to 16 valid hexadecimal characters (0 to 9, A to F)
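To make the hexadecimal notation concrete, here is the same arithmetic in Python (an illustration only; SAS is not involved): '0d0a'x is the two bytes carriage return and line feed, and 0a0x is the number 160.

```python
# SAS character constant '0d0a'x: an even number of hex digits -> raw bytes.
crlf = bytes.fromhex("0d0a")  # carriage return (0x0D) + line feed (0x0A)

# SAS numeric constant 0a0x: leading digit 0, then hex digits a0.
# 0xA0 = 10 * 16 + 0 = 160, so linesize=0a0x means linesize=160.
linesize = int("0a0", 16)

# SAS character constant 'a0'x: the single byte 0xA0.
formchar = bytes.fromhex("a0")

print(crlf, linesize, formchar)
```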
I am importing an Excel spreadsheet into SAS using Proc Import:
Proc Import out=OUTPUT
Datafile = "(filename)"
DBMS=XLSX Replace;
Range = "Sheet1$A:Z";
run;
My numeric data columns contain a mixture of values held in Excel as numbers and '0 values held as text, i.e. with a leading apostrophe (single quote). When SAS imports these, it treats them all the same (i.e. it returns character strings of the values with the leading apostrophe stripped out).
This results in differences from the spreadsheet when calculations are applied (e.g. averaging) as Excel treats the '0 values as missing but SAS treats them as 0.
Is it possible to import the values as strings including the leading single quote / apostrophe, so that I can replace the '0 with missing values but keep the 0 records as 0? I would like to avoid having to manually manipulate the data in Excel as this data is drawn from an external source (don't ask...)
I doubt it. I think Excel doesn't really consider the leading apostrophe part of the value. It's just a crazy way to indicate that a value is a text string (rather than a number). When SAS imports the data, it recognizes that the quote is not part of the value. So if you've got an Excel column with '0 in some cells and 0 in others, it's going to come in as character, and I don't think you can tell the difference between them.
Unfortunately, the XLSX engine doesn't support the DBSASTYPE= option, although other engines that import Excel files do. DBSASTYPE= should allow you to tell SAS to import a column as a numeric variable, even if it sees character values. If you want all text values in the column converted to missing, that might do the trick. But it's possible it would still treat '0 the same as 0. I'm away from SAS, so I can't test this.
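Outside SAS, the two cases can be told apart, because a '0 cell is stored as text and a 0 cell as a number. The sketch below assumes the third-party openpyxl library, which returns text cells as str and numeric cells as int or float. It rewrites the sheet as a CSV in which text cells in chosen numeric columns become empty, so SAS should read them as missing. The function names and the `numeric_columns` parameter (zero-based column positions) are hypothetical.

```python
import csv


def numeric_or_missing(value):
    """Keep real numbers; turn text cells (such as '0), booleans and blanks into None."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return value
    return None


def xlsx_to_csv_for_sas(xlsx_path, csv_path, sheet="Sheet1", numeric_columns=()):
    """Hypothetical helper: copy a sheet to CSV, blanking text in numeric columns."""
    # Third-party dependency (assumed installed): pip install openpyxl
    from openpyxl import load_workbook

    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    try:
        rows = wb[sheet].iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:
            return  # empty sheet: nothing to write
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # header row unchanged
            for row in rows:
                # csv.writer writes None as an empty field.
                writer.writerow(
                    numeric_or_missing(v) if i in numeric_columns else v
                    for i, v in enumerate(row)
                )
    finally:
        wb.close()
```

The resulting CSV can then be read with PROC IMPORT and DBMS=CSV, where empty numeric fields should come in as missing values.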
Option:
The ~ (tilde) format modifier enables you to read and retain single quotation marks.
http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a003209907.htm
Is it possible to convert the .xlsx file to .txt while keeping the single quotes? It is not possible to read an .xlsx file with INFILE in a DATA step.
filename df disk 'C:\data_temp\ex.txt';
data test;
infile df firstobs=2;
input ID $2. x ~$3. ;
run;
proc print data=test;
run;
I am trying to reformat my variables in SAS using the PUT function and a user-defined format. However, I can't seem to get it to work. I want the value "S0001-001" to be converted to "S0001-002". However, when I use this code:
put("S0001-001",$format.)
it returns "S0001-001". I double-checked my format and it is mapped correctly. I import it from Excel, convert it to a SAS table, and convert the SAS table to a SAS format.
Am I misunderstanding what the PUT function is supposed to do?
Thanks for the help.
Assuming that you tried something like this, it should work as you intended:
proc format ;
value $format 'S0001-001' = 'S0001-002' ;
run;
data want ;
old= 'S0001-001';
new=put(old,$format.);
put (old new) (=:$quote.);
run;
Make sure that you do not have leading spaces or other invisible characters in either the variable value or the START value of your format. Similarly, make sure that your hyphens are actual hyphens and not en dash or em dash characters.
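If the format's START values come from Excel, it can help to scan them for such characters before building the format. This is a Python sketch run outside SAS on the mapping values; `suspicious_chars` and `has_leading_whitespace` are hypothetical helper names. It reports dash look-alikes, non-printing characters and leading whitespace.

```python
import unicodedata

# Dash look-alikes that are easy to confuse with the ASCII hyphen-minus "-".
DASH_LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2212"}


def suspicious_chars(value: str) -> list:
    """Return (position, Unicode name) for characters that can break an exact format match."""
    found = []
    for pos, ch in enumerate(value):
        # str.isprintable() is False for control characters, no-break spaces,
        # zero-width spaces and byte-order marks, among others.
        if ch in DASH_LOOKALIKES or not ch.isprintable():
            found.append((pos, unicodedata.name(ch, repr(ch))))
    return found


def has_leading_whitespace(value: str) -> bool:
    """True if the value starts with whitespace."""
    return value != value.lstrip()
```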
I am trying to import Excel sheets that contain Chinese characters into Stata 13.1. Following the guidelines at the link "Chinese Characters in Stata", I am able to get Chinese characters read in Stata. For example, I have .dta files which contain Chinese characters, and these are displayed correctly. The issue is that when I try to import Excel sheets that contain Chinese characters, these are imported as "????", a string of question marks of varying length. Is there a way to solve this issue?
Note: I am using Windows 8.1, but I think the method in the link above still applies.
It sounds like an issue with your file and not so much with Stata. Chinese text is often encoded as UTF-8, although legacy encodings such as GBK and Big5 are also common. It's possible that your Excel sheet didn't store the text correctly. If you're not required to import from Excel directly, try opening the file in Excel, saving the sheet as a "*.csv" (Comma Separated Values) file, and making sure to select the option for UTF-8 encoding. Then use insheet using "file.csv", names to read the file into Stata, with the first row used as variable names.
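If Stata still shows question marks, the CSV may need to be in the encoding Stata expects rather than UTF-8. Stata 13 predates the Unicode support added in Stata 14, so on a Chinese-locale Windows machine it may expect the system code page (for example GBK); treat that as an assumption to verify. This is a Python sketch for converting a CSV between encodings; `reencode_bytes` and `reencode_file` are hypothetical names.

```python
def reencode_bytes(data: bytes, src_encoding: str = "utf-8-sig", dst_encoding: str = "gbk") -> bytes:
    """Decode `data` from one encoding and re-encode it in another."""
    # "utf-8-sig" also drops a leading byte-order mark, which Excel adds
    # when saving as "CSV UTF-8". Characters that GBK cannot represent
    # raise UnicodeEncodeError rather than being silently replaced.
    return data.decode(src_encoding).encode(dst_encoding)


def reencode_file(src: str, dst: str, src_encoding: str = "utf-8-sig", dst_encoding: str = "gbk") -> None:
    """Hypothetical helper: rewrite a CSV file in a different encoding."""
    # Binary mode keeps line endings exactly as they are.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(reencode_bytes(data, src_encoding, dst_encoding))
```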