SAS: Converting character to numeric variable - comma as a decimal separator - sas

I'm trying to use INPUT function, as it is always suggested, but it seems that SAS has some problems with proper interpretation of amounts like:
2,30
1,61
0,00
...and I end up with missing values. Perhaps it's caused by comma being thousands separator where SAS come from ;)
data temp;
old = '1,61';
new = input(old, 5.2);
run;
Why the result of above is new = .?
It seems that I've found some work-around - by replacing comma with a period using TRANWRD before INPUT function is called (vide code below), but it's quite ugly solution and I suppose there must be a proper one.
data temp;
old = '1,61';
new = input(tranwrd(old,',','.'), 5.2);
run;

The reason new = . in your example is because SAS does not recognize the comma as a decimal separator. See the note in the log.
NOTE: Invalid argument to function INPUT at line 4 column 11.
old=1,61 new=. ERROR=1 N=1
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to
missing values.
The documentation contains a list of various SAS informats. Based on the documentation it looks like you can use the COMMAX informat.
COMMAXw.d - Writes numeric values with a period that separates every three digits and a comma that separates the decimal fraction.
The modified code looks like this:
data temp;
old = '1,61';
new = input(old,commax5.);
run;
proc print;
The resulting output is:
Obs old new
1 1,61 1.61
If you want to keep the new variable in the same format you can just add the statement format new commax5.; to the data step.
Thanks to Tom for pointing out that SAS uses informats in the INPUT() function.

Related

Unable to convert a character variable with numbers with a comma into numeric

I have a set of variables in SAS that should be numeric but are characters. Numbers are comma separated and I need a point. For example, I need 19,000417537 to be 19.000417537. I tried translate without success. the comma is still there and I'm not able to convert the variable to numeric using input(). Can anyone help me please?
Thank you in advance
Best
Use INPUT() with the COMMAX informat.
data have;
length have $20.;
have = "19,000417537";
want = input(have, commax32.);
format want 32.8;
run;
proc print data=have;
run;
Obs have want
1 19,000417537 19.00041754
In two steps you can replace the , with . with tranwrd and then use input to convert it to numeric.
data yourdf;
set df;
charnum2=tranwrd(charnum, ",", "."); /*replace , with .*/
numvar = input(charnum2, 12.); /*convert to numeric*/
run;
You can use the COMMA informat to read strings with commas in them. But if you want it to treat the commas as decimal points instead of ignoring them then you probably need to use COMMAX instead (Or perhaps use the NLNUM informat instead so that the meaning of commas and periods in the text will depending on your LOCALE settings).
So if the current dataset is named HAVE and the text you want to convert is in the variable named STRING you can create a new dataset named WANT with a new numeric variable named NUMBER with code like this:
data want;
set have;
number = input(string,commax32.);
run;

SAS treatment of blanks interferes with regex rules

I have a dataset which I need to clean using regex rules. These rules come from a file regex_rules.csv with columns string_pattern and string_replace and are applied using a combination of prxparse and prxchange as follows:
array a_rules{1:&NOBS} $200. _temporary_;
array a_rules_parsed{1:&num_rules} _temporary_;
if _n_ = 1 then
do i = 1 to &num_rules;
a_rules{i} = cat("'s/",string_pattern,"/",string_replace,"/'");
a_rules_parsed{i} = prxparse(cats('s/',string_pattern,'/',string_replace,'/','i'));
end
set work.dirty_strings;
clean_string = dirty_string;
do i = 1 to &num_rules;
debug_string = cats("Executing prxchange(",a_rules{i},",",-1,",","'",clean_string,"'",")");
put debug_string;
clean_string = PRXCHANGE(a_rules_parsed{i},-1,clean_string);
end
Some rules specify replacing certain patterns with a single blank space, so the corresponding string_replace value in the file is a single blank space.
The issue I'm facing is that SAS never respects the single space, and instead replaces the matched string_pattern for these records with an empty string (the other rules are applied as expected).
To troubleshoot I executed the following:
proc sql;
create table work.single_blanks as
select
string_pattern,
string_replace,
from work.regex_rules
where string_replace = " ";
quit;
which yielded the expected records. I was confused to find that changing the where clause to
where string_replace = "" or
where string_replace = " " gave identical results! (I've been using sas for a while but I guess this behavior has gone unnoticed until now). Consequently, I could not determine whether SAS is neglecting to properly read in the file and retain the single blank, or whether one of the prx functions is failing to properly handle the single blank.
I can think of "hacky" work-arounds, but I'd rather understand what I'm doing wrong here and what the correct solution should be.
EDIT 1:
Here is a rule from the file and how I'd expect it to act on an example input value:
string_pattern, string_replace
"(#|,|/|')", " "
running the code above on the input string dirty_string = "10,120 DIRTY DRIVE"; does not produce the expected output of "10 120 DIRTY DRIVE" but rather "10120 DIRTY DRIVE".
EDIT 2
In addition to not respecting single spaces, leading and trailing spaces do not seem to be respected. For example, for a file with the rules
string_pattern, string_replace
"\\bDR(\\.|\\b)", "DRIVE "
"\\bS(\\.|\\b)?W(\\.|\\b)", " SOUTH WEST"
running the code above on the input string dirty_string = "10120 DIRTY DR.SW."; does not produce the expected output of "10120 DIRTY DRIVE SOUTH WEST" but rather "10120 DIRTY DRIVESW.". This is because the space at the end of the first string_replace value gets lost, meaning there is no word boundary at the beginning of the second string_pattern to be matched.
SAS stores character variables as fixed length strings that are padded with spaces. As a consequence string comparisons ignore trailing spaces. So x=' ' and x=' ' are the same test.
The CATS() will remove all of the leading and trailing spaces, so empty strings will generate nothing at all. It sounds like you want to treat an empty string as a single space. The TRIM() function will return a single space for an empty string. So perhaps you just want to change this:
cats('s/',string_pattern,'/',string_replace,'/','i')
into
cat('s/',trim(string_pattern),'/',trim(string_replace),'/','i')
Here is a working code (with a fixed string_pattern) of your example data:
data test;
length string_pattern string_replace dirty_string expect
clean_string regex $200
;
infile cards dsd truncover;
input string_pattern string_replace dirty_string expect;
regex= cat('s/',trim(string_pattern),'/',trim(string_replace),'/i') ;
regex_id = prxparse(trim(regex));
clean_string = prxchange(regex_id,-1,trim(dirty_string));
if clean_string=expect then put 'GOOD'; else put 'BAD';
*put (_character_) (=$quote./);
cards4;
"(#|,|\/|')", " ","10,120 DIRTY DRIVE","10 120 DIRTY DRIVE"
;;;;
If any of your values have significant trailing spaces then you will need to store the data differently. You could for example quote the values:
string_replace = "'DRIVE '";
...
cat('s/',dequote(string_pattern),'/',dequote(string_replace),'/','i')
If you only add quotes around values that need them then you will need to include the TRIM() function calls.
cat('s/',dequote(trim(string_pattern)),'/',dequote(trim(string_replace)),'/','i')
Or store the string lengths into separate numeric fields.
cat('s/',substrn(string_pattern,1,len1),'/',substrn(string_replace,1,len2),'/','i')
And note that if any of your original character strings had either significant leading or trailing spaces they would have been eliminated by reading the data from a CSV file.

One function to replace different text with other in SAS

I want to replace one combination of text with another. For example
data test;
a='raja\ram{work}italic';
if index(a,'\') then b=tranwrd(a,'\','\\');
if index(a,'{') then b=tranwrd(a,'{','\{');
if index(a,'}') then b=tranwrd(a,'}','\}');
if index(upcase(a),'ITALIC') then b=tranwrd(a,substr(a,index(upcase(a),'ITALIC'),length('ITALIC')),'\i');
run;
Required Result: b=raja\\ram\{work\}\i;
These kind of combination I wanted to replace. I'm not interested to use a macro or FCMP or if else condition.
Is there any function to do all at once? I tried to use a Perl expression that also working for one at a time b= prxchange('s/\\/\\\\/', -1, a)
Your regular expression is on the right track. You have a set of characters, right, you want to always prepend a \ to? So search for (one of that set of characters), which you do with [...], and then add a \ to it, using a capturing group. That's the escape character, so you have to add two any time you want to use one (\\ escapes itself to \).
data test;
a='Hello\Goodbye{stuff}';
b= prxchange('s/([\\{}])/\\$1/',-1,a);
put b=;
run;
You should do the italic bit in a second expression (or just use tranwrd). That's a totally different replacement and while theoretically possible to put in one, would make it too messy.
This question is almost identical to the other question: Multiple search and replace within a string through regular expression in SAS
Is that a coincidence?
Here is the code that worked for the other question.
%let text = abc\pqr{work};
data _null_;
var=prxchange("s/\\/\\\\/",-1,"&text");
var=prxchange("s/\{/\\\{/",-1,var);
var=prxchange("s/\}/\\\}/",-1,var);
put var;
run;
Result: abc\\pqr\{work\};
%let text = BOLD\ITALIC\ITALICBOLD\BOLDITALIC\B\I\IB\BI;
data _null_;
var=prxchange("s/BOLD/b/",-1,"&text");
var=prxchange("s/ITALIC/i/",-1,var);
var=lowcase(var);
put var;
run;
RESULT: b\i\ib\bi\b\i\ib\bi

Convert string into numeric and change period to comma seperator sas

I have a string called weight that is 85.5
I would like to convert it into a numeric 85,5 and replace the decimal seperator with a comma using SAS.
So far I am using this (messy) two step approach
weight_num= (weight*1);
format weight_num COMMAX13.2;
How can this be achieved in a less clumpsy way??
Your sample code is the recommended method of changing a variable type.
Another way is transtrn function to replace the . with a comma. This is only a good method if you don't plan to do any calculations on the values.
data have;
set sashelp.class;
keep name weight:;
weight_char=put(weight, 8.1);
run;
data want;
set have;
weight_char=transtrn(weight_char, ".", ",");
run;
proc print data=want;
run;
If you just want to change it so that commas are used for decimal point instead of periods then why not just use a simple character substitution. Do you also want to change thousands separator from comma to period? TRANSLATE() is good for that.
weight = translate(weight,',.','.,');
If you want to convert it to a number then use the INPUT() function rather than forcing SAS to convert for you.
weight_num = input(weight,comma32.);
You can then attach whatever format you want to the new numeric variable.

how to import informats (comma separated values) into sas

I have a data with commas in tab file and I have imported it the values were imported into sas as a char datatype with a comma values.
like 23,1 53,2
I want to now convert these into numeric with either . or comma how do i do it?
if I use
want=input(have,comma.);
informat want comma.;
format want comma.;
I get missing values., !
You can use the NUMXw.d informat to input numbers with commas as the decimal separator.
want = input(have,NUM4.1);
or just use that on the initial input statement and you don't have to convert it.
NUMXw.d also is a format, so you can use it to display the variable with a comma if that's how you are more comfortable viewing decimals.
You can use a TRANWRD function to replace the comma with a period, then wrap this within an INPUT function to convert the new character value to numeric.
F2 = INPUT(TRANWRD(F1,',','.'),4.1);