How to detect and modify lowercase letters in SAS

I have a variable named ID that looks like the following:
ID
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
The last one or two characters of this variable before the period might be lowercase. I want to check whether that is the case, and if it is, create a new variable with the lowercase characters dropped. If there are no lowercase characters, the new variable should be the same as the original variable. Can anyone suggest how I can achieve this? The desired result is:
ID New_ID
ABC.L ABC.L
ABCa.L ABC.L
BDE.L BDE.L
BDEna.L BDE.L
BNE.F BNE.F
HDF.A HDF.A

Use the COMPRESS function with the K (keep) and U (uppercase alphabetic characters) modifiers, and list the period (.) as an extra character to keep:
data have;
   input ID $;
   /* K = keep the listed characters; U = add uppercase letters to that list */
   newid = compress(id, '.', 'KU');
   put 'NOTE: ' (_all_)(=);
   cards;
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
;
run;
The log shows:
NOTE: ID=ABC.L newid=ABC.L
NOTE: ID=ABCa.L newid=ABC.L
NOTE: ID=BDE.L newid=BDE.L
NOTE: ID=BDEna.L newid=BDE.L
NOTE: ID=BNE.F newid=BNE.F
NOTE: ID=HDF.A newid=HDF.A

Another way is to use the PRXCHANGE function. In the pattern, [a-z] matches a lowercase letter, and the empty replacement between the second and third slashes (//) means replace it with nothing. The -1 means replace every occurrence.
data want;
set have;
new_id1=prxchange('s/[a-z]//',-1, id);
run;
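Both approaches above remove lowercase letters anywhere in ID. If lowercase letters could legitimately appear elsewhere, a minimal sketch, assuming only a run of lowercase letters immediately before the period should be dropped, anchors the pattern to the period and uses ANYLOWER to flag the affected IDs. NEW_ID2 and HAS_LOWER are illustrative names, not part of the original answers:

```sas
data have;
   input ID $;
   datalines;
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
;
run;

data want;
   set have;
   length new_id2 $ 8;
   /* ANYLOWER returns the position of the first lowercase letter, or 0 if none */
   has_lower = (anylower(id) > 0);
   /* drop only a run of lowercase letters sitting right before the period */
   new_id2 = prxchange('s/[a-z]+\././', -1, id);
run;
```

For this sample data NEW_ID2 matches NEWID, because the lowercase letters only ever appear right before the period.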


Why is $ not needed between the names of the last two columns?

I can't understand why there isn't a $ between Age and Weight like there is for the others.
And why is the missing value shown as a blank in row 2 but as a period (.) in row 4?
Please help with these questions. I'm new to SAS, and it was so miserable when I asked my colleagues and nobody was willing to answer...
The $ after name and gender indicates that those variables are character variables. The other variables are numeric variables. Those are the only 2 data types in SAS.
Missing values for character variables are blanks while missing values for numeric variables are shown with a period.
The $ modifier in an INPUT statement lets SAS know that it should treat the preceding variable as CHARACTER instead of numeric. It is only needed when you have not already told SAS that the variable is character, because the default assumption is that a variable is numeric.
The normal character informat, $, which is what your code used because it did not specify any other informat for NAME or SEX, will convert a single period into blanks, which is how SAS stores an empty character value. If you want SAS to input a single period as an actual single period you would need to use the $CHAR informat instead. So if you modified the input statement like this:
input name $ sex :$char1. age weight ;
then the value of SEX would be the one-byte string '.' (since the width of the informat gave SAS better information about what length to use to define the variable) instead of the current 8-character string of blanks.
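A small sketch of that difference, reading the same one-character field twice with formatted input. SEX_STD, SEX_CHR and the DEMO dataset are illustrative names, assumed only for this example:

```sas
data demo;
   /* read column 1 twice: once with the $1. informat, once with $CHAR1. */
   input @1 sex_std $1. @1 sex_chr $char1.;
   /* $1. treats a lone period as missing, so SEX_STD should be blank */
   std_is_missing = missing(sex_std);
   put sex_std= sex_chr= std_is_missing=;
   datalines;
.
;
run;
```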
The normal way to display a missing numeric value is as a single period. You can have it display some other single character instead by changing the setting of the MISSING option. For example, to have the missing values of AGE also print as blanks, you could add this OPTIONS statement before you print it:
options missing=' ';
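A minimal sketch, using a made-up AGES dataset, showing that the MISSING= option only changes how missing numeric values are displayed (for every numeric variable, not just AGE); the stored value is still missing:

```sas
data ages;
   input name $ age;
   datalines;
Ann 12
Bob .
;
run;

options missing=' ';   /* missing numeric values now print as blanks */
proc print data=ages;
run;
options missing='.';   /* restore the default display */
```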

SAS variable value does not match the formatting specified

I have a SAS dataset. When I 'View Columns', I find a column with Type=text, length=3, informat = $3., format=$3.
The value stored in this variable is 10.
But based on the attributes, should it not be stored as 010?
The attributes say you have a character variable that can hold 3 bytes (single-byte encodings use one byte per character). You could store '010' in that variable, or '10 ', or even ' 10'. You could also store 'ABC' or 'abc'. It is just a character variable. Note that SAS always stores character variables as fixed-length fields, so shorter values are padded with trailing spaces.
It also has optional FORMAT metadata attached, saying that when displaying the value SAS should use the $3. format. Similarly, it has optional INFORMAT metadata saying that when reading text SAS should use the $3. informat to convert the incoming text into the value to be stored.
This metadata is NOT needed, because SAS already knows how to read and display character data. If you did store values with leading spaces, you might want to attach the $CHAR3. format instead so that the leading spaces are preserved when writing the value.
As the variable is just text, it will just store what it is assigned. For example:
data have;
length var1 $ 3;
informat var1 $3.;
format var1 $3.;
input var1;
datalines;
10
010
;
run;
The fact that it has a format of $3. will not cause the value to be prefixed with a leading 0; the documentation of the $w. format describes no such behavior. Also, the value could later be changed to 'ab'. In both cases ('10' and 'ab') the value is padded with a trailing space to make up the length of 3.
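If what you actually want is a value with a leading zero, that comes from the Zw. format, which applies to numeric values. A minimal sketch, assuming VAR1 always holds digits (anything else would convert to missing, with a note in the log), converts the text to a number and writes it back with Z3.; VAR2 is an illustrative name:

```sas
data have;
   length var1 $ 3;
   input var1;
   datalines;
10
010
;
run;

data want;
   set have;
   length var2 $ 3;
   /* text -> number with the 3. informat, then number -> zero-padded text */
   var2 = put(input(var1, 3.), z3.);
run;
```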

SAS print ASCII value of special character

I am using the NOTALNUM function in SAS. The input is a database field. The function is returning a value that tells me there is a special character at the end of every string.
It is not a space character, because I have used COMPRESS function on the input field.
How can I print the ASCII value of the special character at the end of each string?
The $HEX. format is the easiest way to see what they are:
data have;
var="Something With A Special Char"||'0D'x;
run;
data _null_;
set have;
rul=repeat('1 2 3 4 5 6 7 8 9 0 ',3); *so we can easily see what char is what;
put rul=;
put var= $HEX60.; *$HEX. alone defaults to a width of 4, so use twice the 30-byte length of VAR;
run;
You can also use the c option on compress (var=compress(var,,'c');) to compress out control characters (which are often the ones you're going to run into in these situations).
Finally, 'A0'x, the non-breaking space in Latin-1 and Windows-1252 data, is a good one to add to the list of characters to remove if your data comes from the web.
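A minimal sketch tying this back to the question, using a made-up value that ends in a carriage return: NOTALNUM reports the control character, and COMPRESS with the c modifier removes it. The TRIM calls matter, because SAS pads character values with trailing blanks and a blank is itself not alphanumeric, so NOTALNUM would otherwise report the padding:

```sas
data have;
   var = 'abc' || '0D'x;   /* '0D'x is a carriage return */
run;

data check;
   set have;
   /* position of the first non-alphanumeric character, ignoring trailing blanks */
   pos_before = notalnum(trim(var));
   clean = compress(var, , 'c');   /* c modifier removes control characters */
   pos_after = notalnum(trim(clean));
   put pos_before= pos_after=;
run;
```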
If you want to see the position of the character within the ASCII table, you can use the RANK() function, e.g.:
data _null_;
string = 'abc123';
do i = 1 to length(string);
asc = rank(substr(string,i,1));
put i= asc=;
end;
run;
Gives:
i=1 asc=97
i=2 asc=98
i=3 asc=99
i=4 asc=49
i=5 asc=50
i=6 asc=51
Joe's solution is very elegant, but seeing as my hex->decimal conversion skills are pretty poor I tend to do it this way.
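Combining RANK with LENGTH gives the code of the last character directly, which is what the question asks for. A minimal sketch, reusing the sample value from the $HEX. example above; LAST_CHAR_CODE is an illustrative name. LENGTH returns the position of the last non-blank character, so padding blanks are skipped:

```sas
data have;
   var = "Something With A Special Char" || '0D'x;
run;

data codes;
   set have;
   /* code of the last non-blank character in VAR */
   last_char_code = rank(substr(var, length(var), 1));
   put last_char_code=;
run;
```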

How to modify a string so that each of its characters is separated by a comma?

I came across this question this morning and I am still trying to figure out how it can be done. The following dataset has a character variable CAT.
CAT
A
AB
B
ABCD
CB
.
.
.
and so on.
We need to write a SAS program that inserts a comma between each pair of characters in the string if the string is longer than 1 character. I used the LENGTH() function and a DO loop to create different variables, and it just got messy. How do I tackle this?
Regular expression solution:
data have;
input CAT $;
datalines;
A
AB
B
ABCD
CB
;;;;
run;
data want;
set have;
cat_c = prxchange('s/(?<=[[:alpha:]])([[:alpha:]])/,$1/io',-1,CAT);
put cat_c=;
run;
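The first parenthetical group is a look-behind that requires an alphabetic character before the match; the second group captures the alphabetic character itself. The replacement is a comma followed by that captured character ($1). If you want to match something other than [[:alpha:]] (i.e., the letters A-Z), supply that as a character class instead.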
Honestly, the solution using LENGTH and a DO loop isn't bad if you want something more readable for novice programmers. Just use SUBSTR on the left side of the equal sign:
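data want2;
   set have;
   length cat_c $ 15;   /* room for up to 8 characters plus 7 commas */
   if length(cat) > 1 then
      do _t = 1 to length(cat)-1;
         /* write character _t followed by a comma at position 2*_t-1 */
         substr(cat_c, 2*_t-1, 2) = substr(cat, _t, 1) || ',';
      end;
   /* the last character gets no trailing comma */
   substr(cat_c, 2*length(cat)-1, 1) = substr(cat, length(cat), 1);
   drop _t;
   put cat_c=;
run;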
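Another option is to build the string with CATX, which inserts the delimiter only between non-blank arguments, so the first and last characters need no special handling. A minimal sketch; WANT3 and _I are illustrative names, and CAT_C is sized for an 8-character CAT:

```sas
data have;
   input CAT $;
   datalines;
A
AB
B
ABCD
CB
;
run;

data want3;
   set have;
   length cat_c $ 15;
   /* CAT_C starts blank on each observation; CATX skips blank arguments */
   do _i = 1 to length(cat);
      cat_c = catx(',', cat_c, substr(cat, _i, 1));
   end;
   drop _i;
   put cat_c=;
run;
```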

How many values are missing for each observation?

I have data as follows:
ID date shoesize shoetype
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
I want to count the number of missing values ('.') in each row and make a frequency table with that information. Thanks in advance.
You can determine the number of missing values in a row with the NMISS and CMISS functions (NMISS for numeric variables; CMISS works for character and numeric variables alike). If you have a list of just the variables you care about, use that list; if not, you need to deal with the fact that NUMBER_MISSING itself will be missing when the functions count it (that is what the -1 is for).
data want;
set have;
number_missing=nmiss(of _numeric_) + cmiss(of _character_)-1;
run;
Then do whatever you want with that new variable.
NMISS doesn't work if you wish to evaluate character variables. It converts character arguments to numeric, so every character value that isn't a valid number is counted as missing. CMISS evaluates character values as they are, without converting them, and therefore gives the correct answer.
Obviously you can choose not to include the character variables as arguments; however, based on the sample you provided, I am assuming that you want to count missing values in character variables as well. If that is the case, the following should do what you want.
DATA WANT3;
SET HAVE;
NUMBER_MISSING = 0;
NUMBER_MISSING=CMISS(OF _ALL_);
RUN;
You must assign a value to NUMBER_MISSING first; otherwise the new variable, which is included in _ALL_, is itself counted as missing.
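The question also asks for a frequency table. A minimal sketch, assuming the sample data is read with DATE as a numeric SAS date (MMDDYY8. informat) and SHOETYPE as character, combines the CMISS approach with PROC FREQ:

```sas
data have;
   /* a lone period is read as missing by both the numeric and $ informats */
   input id date :mmddyy8. shoesize shoetype $;
   format date mmddyy8.;
   datalines;
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
;
run;

data want;
   set have;
   number_missing = 0;                 /* non-missing, so it does not count itself */
   number_missing = cmiss(of _all_);
run;

proc freq data=want;
   tables number_missing;
run;
```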