I am using the notalnum function in SAS. The input is a db field. Now, the function is returning a value that tells me there is a special character at the end of every string.
It is not a space character, because I have used COMPRESS function on the input field.
How can I print the ASCII value of the special character at the end of each string?
The $HEX. format is the easiest way to see what they are:
data have;
var="Something With A Special Char"||'0D'x;
run;
data _null_;
set have;
rul=repeat('1 2 3 4 5 6 7 8 9 0 ',3); *so we can easily see what char is what;
put rul=;
put var= $HEX.;
run;
You can also use the c option on compress (var=compress(var,,'c');) to compress out control characters (which are often the ones you're going to run into in these situations).
Finally - 'A0'x is a good one to add to the list, the non-breaking space, if your data comes from the web.
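As a rough sketch of the two suggestions above combined, both removals can go into a single COMPRESS call. This assumes a single-byte session encoding such as Latin1, where the non-breaking space is the single byte 'A0'x; the variable name var_clean is made up for illustration.

```sas
data _null_;
  set have;                              /* dataset created in the example above */
  /* The second argument lists extra characters to remove ('A0'x);
     the 'c' modifier adds all control characters to that list. */
  var_clean = compress(var, 'A0'x, 'c');
  put var= $hex.;
  put var_clean= $hex.;                  /* the trailing 0D should no longer appear */
run;
```

Comparing the two hex dumps in the log is a quick way to confirm that only the unwanted bytes were dropped.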
If you want to see the position of the character within the ASCII table you can use the rank() function, e.g.:
data _null_;
string = 'abc123';
do i = 1 to length(string);
asc = rank(substr(string,i,1));
put i= asc=;
end;
run;
Gives:
i=1 asc=97
i=2 asc=98
i=3 asc=99
i=4 asc=49
i=5 asc=50
i=6 asc=51
Joe's solution is very elegant, but seeing as my hex->decimal conversion skills are pretty poor I tend to do it this way.
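For the original question (only the last character of each string is of interest), the same idea can be narrowed down. This is a minimal sketch that assumes the have dataset and the variable var from the first answer; the names len, last_char and code are made up for illustration.

```sas
data _null_;
  set have;
  len = length(var);                 /* position of the last non-blank character */
  last_char = substr(var, len, 1);   /* a control character is not a blank, so it is found */
  code = rank(last_char);            /* decimal position in the collating sequence */
  put len= code= last_char= $hex2.;  /* decimal and hex side by side */
run;
```

For the sample row ending in '0D'x this should report code=13 next to the hex value 0D, which saves the mental hex-to-decimal conversion.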
Using SAS, I have a table with sentences and I am looking to find the rows in the table where the keyword is found in the sentence making use of fuzzy matching (complev function). Is there a way in SAS to find the keyword string in the sentences? I know how to use complev, but I only can use it to compare complete strings, not a string as a part of a larger string. For this example table the keyword would be 'example' and the result of the comparison would be in the column Result.
Thanks for your ideas!
This is an Example sentence : 1
Here is another one : 0
Also an exmple : 1
The examples keep coming : 1
No worries : 0
See if you can use this as a template. I compare the Complev value to three, but you can set it to any fitting value.
data have;
input string $ 1-25;
datalines;
Example sentence
Here is another one
Also an exmple
The examples keep coming
No worries
;
data want;
set have;
result = 0;
do _N_ = 1 to countw(string);
if complev('example', scan(string, _N_)) < 3 then do;
result=1; leave;
end;
end;
run;
EDIT: Use complev('example', scan(string, _N_), 'i') if you want the comparison to be case-insensitive.
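When picking the cutoff, it can help to look at the smallest distance found in each row first. The following is only a sketch built on the have dataset above; min_dist, d and i are illustrative names, not part of the original answer.

```sas
data distances;
  set have;
  min_dist = .;
  do i = 1 to countw(string);
    /* case-insensitive edit distance between the keyword and the i-th word */
    d = complev('example', scan(string, i), 'i');
    min_dist = min(min_dist, d);   /* MIN ignores missing values, so the first d is kept */
  end;
  drop i d;
run;

proc print data=distances;
run;
```

Rows whose min_dist falls below the chosen threshold are the ones that would get result=1 in the template above.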
This is somewhat related to my other question recently.
Setup: I am reading in character variables such as 1, 2,0, 10,0 or 2,5. I want to convert them to numerics using a decimal point instead of a comma.
So ideally I would like to get the following result:
1 -> 1
2,0 -> 2
10,0 -> 10
2,5 -> 2.5
My code
data _null_;
test='5,0';
result=input(test,comma10.1);
put 'this should be:' result;
run;
does this for all character variables which are of the type 'xy,z' but fails for 'xy' with no comma separation at all. Here I would get
xy -> x,y
I was thinking to add an if/else to check whether the character string has length of 1 or bigger. So something like
data _null_;
test='5';
if length(test)=1 then result=input(test, comma10.);
else result=input(test, comma10.1);
put 'this should be:' result;
run;
But the problem here would be that
10 -> 1
Problems like 10,00 (which is supposed to be 10) becoming 100 could probably be resolved by substituting the ',' with '.', but the strings with no decimal delimiter remain a problem.
Is there any clever solution to this?
My solution which is a bit hacky (and basically only uses the fact that the comma introduces a length>2 - problems with e.g. 123 would still arise):
data _null_;
t='5,5';
test=tranwrd(t, ',', '.');
if length(test)=1 or length(test)=2 then result=input(test, comma10.);
else result=input(test, comma10.1);
put 'this should be:' result;
run;
Sounds like your text strings were created in a place where the normal meaning of comma and period in numbers is reversed. So instead of using a period for decimal point and comma for thousand grouping they have reversed the meaning.
For that type of string SAS has the COMMAX informat.
Normally you do NOT want to add a decimal specification to your informat. The decimal part of the informat is only used when the source string does not have an explicit decimal point. Basically it is telling SAS to divide values without an explicit decimal point by 10 to the power of the number of decimal places in the informat specification. It is designed to read data where the decimal point was purposely not written in order to save space.
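A two-line illustration of that rule, using the plain numeric informat (the variable names are made up for the example):

```sas
data _null_;
  no_point   = input('123', 10.2);   /* no decimal point in the text: 123 / 10**2 = 1.23 */
  with_point = input('12.3', 10.2);  /* explicit decimal point: the .2 is ignored, 12.3 */
  put no_point= with_point=;
run;
```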
Pretty much all the COMMA informat does is strip the string of commas and dollar signs and then read it using the normal numeric informat.
The COMMAX informat is the one that will understand the reversed meaning of the commas and periods. So it pretty much eliminates the periods and then converts the commas to periods and then reads it using the normal numeric informat.
Try a little test of your own.
data check;
input #1 string $32. #1 num ??32. #1 comma ??comma32. #1 commax ??commax32.
#1 d2num ??32.2 #1 d2comma ??comma32.2 #1 d2commax ??commax32.2
;
cards;
123
123.4
123,4
1,234.5
1.234,5
;
proc print;
run;
As it turns out, the COMMAXw.d informat does the trick without any hassle. The code would then be:
data _null_;
test='0,5';
result = input(test, COMMAX10.);
put 'this should be:' result;
run;
I find it a bit counterintuitive, but it works.
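To check this against the four sample values from the question, a small sketch (the loop and variable names are only for illustration):

```sas
data _null_;
  length test $10;
  /* the four sample strings from the question */
  do test = '1', '2,0', '10,0', '2,5';
    result = input(test, commax10.);  /* no decimal specification on the informat */
    put test= result=;
  end;
run;
```

The log should show 1, 2, 10 and 2.5, matching the desired results listed in the question.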
I cannot find a way to reverse text strings.
For example I want to reverse these:
MMMM121231M34 to become 43M132121MMMM
MM1M11M1 to become 1M11M1MM
1111213111 to become 1113121111
Judging from your examples, what you mean by 'rearrange' is actually 'reverse'.
In that case, you've got the very handy reverse() function in SAS.
Used in context:
data test;
length text $32;
infile datalines;
input text $;
result=reverse(strip(text));
datalines;
MMMM121231M34
MM1M11M1
1111213111
;
run;
EDIT on @Joe's request: in the particular example above, I create the test dataset by setting a length of 32 characters for the text variable. Therefore, when reading the values from datalines, these are padded with blanks up to that total of 32 characters. Hence, when reversing that value, the result has that many blanks at the start, followed by the actual value you are looking for. By adding the strip function, you remove the excess blanks from the value of text before reversing, keeping only the "real" value in the result.
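The padding effect can be seen directly in the log with a sketch like the following (padded and stripped are illustrative names; $CHAR32. is used so that leading blanks are printed rather than trimmed):

```sas
data _null_;
  length text $32;
  text = 'MM1M11M1';
  padded   = reverse(text);          /* 24 leading blanks, then 1M11M1MM */
  stripped = reverse(strip(text));   /* 1M11M1MM, followed by trailing blanks */
  put padded= $char32. / stripped= $char32.;
run;
```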
I have a SAS string that always starts with a date. I want to remove the date from the string.
Example of data is below (data does not have bullets, included bullets to increase readability)
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
I want the data to look like this (data does not have bullets, included bullets to increase readability)
test_num15
recom_1_test1
test_0_8_i0|vacc_previous0
Use INDEX to find the position of the first '|' in the string and SUBSTR to take everything after it, or use a regular expression.
data have;
input x $50.;
x1=substr(x,index(x,'|')+1);
x2=prxchange('s/([^_]+\|)(?=\w+)//',1,x);
cards;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;
run;
This is a great use case for call scan. If the length of your date is constant (always 10), you don't actually need it: start would always be 12, so you can skip straight to the substr, as user667489 noted in comments. If the length varies, call scan is helpful.
data have;
length textstr $100;
input textstr $;
datalines;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;;;;
run;
data want;
set have;
call scan(textstr,2,start,length,'|');
new_textstr = substr(textstr,start);
run;
It would also let you grab the second word only if that's useful (using length third argument for substr).
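A sketch of that variant, reusing the have dataset above (want_second and second_word are made-up names). The guard on start is there because CALL SCAN returns a position and length of 0 when the requested word does not exist.

```sas
data want_second;
  set have;
  call scan(textstr, 2, start, length, '|');
  /* the third SUBSTR argument limits the result to the second word only */
  if start > 0 then second_word = substr(textstr, start, length);
run;
```

For the third sample row this would give test_0_8_i0 rather than the rest of the string.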
I have data as follows:
ID date shoesize shoetype
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
I want to count the number of '.' there are in each row and make a frequency table with the information. Thanks in advance
You can determine the number of missing values in a row with the NMISS and CMISS functions (NMISS for numeric, CMISS for character). If you have a list of just some of your variables, you should use that list; if not, you need to deal with the fact that number_missing itself will be missing (the -1 there).
data want;
set have;
number_missing=nmiss(of _numeric_) + cmiss(of _character_)-1;
run;
Then do whatever you want with that new variable.
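For the frequency table asked for in the question, one option is PROC FREQ on the new variable. This is a minimal sketch that assumes the want dataset created above:

```sas
proc freq data=want;
  tables number_missing;   /* one row per distinct count of missing values */
run;
```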
NMISS doesn't work if you wish to evaluate character variables. It converts character variables in the list of arguments to numeric which results in a count being made of missing in every instance that a character variable is encountered. CMISS doesn't convert character variable values to missing and therefore you get the correct answer.
Obviously you can choose not to include the character variables as your arguments, however I am assuming that you want to count missing values in character variables as well, based on the sample you provided. If this is the case the following should do what you want.
DATA WANT3;
SET HAVE;
NUMBER_MISSING = 0;
NUMBER_MISSING=CMISS(OF _ALL_);
RUN;
You must assign a value to NUMBER_MISSING first; otherwise the new variable itself is also counted as missing.