I have data as follows:
ID date shoesize shoetype
1 4/3/12 . bball
2 . 12 running
3 1/2/12 8 .
4 . 9.5 bball
I want to count how many '.' (missing values) there are in each row and build a frequency table from those counts. Thanks in advance.
You can determine the number of missing values in a row with the NMISS and CMISS functions (NMISS for numeric, CMISS for character). If you only care about some of your variables, pass that list explicitly; if you use _numeric_ instead, you need to deal with the fact that number_missing is itself in that list and is still missing when the functions run (that is what the -1 is for).
data want;
set have;
number_missing=nmiss(of _numeric_) + cmiss(of _character_)-1;
run;
Then do whatever you want with that new variable.
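Since the question also asks for a frequency table, one possible way to finish is PROC FREQ on the new variable. This is only a sketch, and it assumes the want data set and the number_missing variable created in the step above:

```sas
/* Assumes WANT (with NUMBER_MISSING) was created by the data step above */
proc freq data=want;
  /* one row per distinct count of missing values, with frequency and percent */
  tables number_missing;
run;
```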
NMISS doesn't work if you wish to evaluate character variables. It converts every character variable in the argument list to numeric, so any character value that is not a valid number is counted as missing. CMISS doesn't convert character values to numeric and therefore gives the correct answer.
Obviously you can choose not to include the character variables as your arguments, however I am assuming that you want to count missing values in character variables as well, based on the sample you provided. If this is the case the following should do what you want.
DATA WANT3;
SET HAVE;
NUMBER_MISSING = 0;
NUMBER_MISSING=CMISS(OF _ALL_);
RUN;
You must assign a value to NUMBER_MISSING first (the NUMBER_MISSING = 0 line); otherwise the new variable is itself counted as missing by CMISS(OF _ALL_).
I have a variable named ID that looks like the following.
ID
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
The last character or the last two characters of this variable before the . might be in lower case. I want to check whether that is the case; if it is, I will create a new variable with the lower-case characters dropped. If there are no lower-case characters, the new variable will be the same as the original variable. Can anyone kindly suggest how I can achieve this? The desired result is:
ID New_ID
ABC.L ABC.L
ABCa.L ABC.L
BDE.L BDE.L
BDEna.L BDE.L
BNE.F BNE.F
HDF.A HDF.A
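Use the COMPRESS function with the modifiers K (keep the listed characters instead of removing them) and U (add uppercase letters to the list), and put the period in the character list so that it is kept as well.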
data have;
input ID $;
/* K = keep the listed characters, U = add uppercase letters to the list */
newid = compress(id,'.','KU');
put 'NOTE: ' (_all_)(=);
cards;
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
;
run;
Log output:
NOTE: ID=ABC.L newid=ABC.L
NOTE: ID=ABCa.L newid=ABC.L
NOTE: ID=BDE.L newid=BDE.L
NOTE: ID=BDEna.L newid=BDE.L
NOTE: ID=BNE.F newid=BNE.F
NOTE: ID=HDF.A newid=HDF.A
Another way is to use the prxchange function. Here [a-z] matches any lowercase letter, the empty replacement between the last two slashes (//) means replace it with nothing, and -1 means repeat the replacement for every match.
data want;
set have;
new_id1=prxchange('s/[a-z]//',-1, id);
run;
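The pattern above removes every lowercase letter anywhere in the value. If only the lowercase letters directly before the period should go, as the question describes, a stricter pattern is one option. The following is a sketch, not the original answerer's code; the names want_strict and new_id2 are made up for illustration, and have is rebuilt from the sample values in the question:

```sas
/* Sample data from the question */
data have;
  input ID $;
  datalines;
ABC.L
ABCa.L
BDE.L
BDEna.L
BNE.F
HDF.A
;
run;

data want_strict;
  set have;
  /* replace a run of lowercase letters that sits directly before a period
     with just the period; values without such a run are left unchanged */
  new_id2 = prxchange('s/[a-z]+\././', -1, ID);
  put ID= new_id2=;
run;
```

With the sample values this should give the same New_ID column as requested, while leaving lowercase letters elsewhere in the string alone.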
I hope you can help with the below problem.
I have a dataset with a Char column with for example '000000036', 'C', 'M' etc. I want to convert this column to a Numeric column with the examples above displayed as 36, C, M etc.
Thanks
Tope
SAS numeric values can have special missing values, which are referenced in code as .A .B .C etc, but when printed they display as A B C. If your variable only has one character long alpha values, this may be the easiest way to maintain the characters. You can use the MISSING statement to tell SAS which characters represent special missing values.
data have;
input cvar $3.;
cards;
036
C
M
;
missing C M;
data want;
set have;
nvar=input(cvar,3.);
put cvar= nvar=;
run;
missing; *restore default;
If your data has longer character strings, you would need to use a different approach for the conversion (perhaps a custom informat), but I would still consider using special missing values, assuming you have no more than 27 unique character values in the source data.
Numeric columns can only store numbers or missing values. It is possible to display numbers as character, using a format, but that is not relevant here.
@Quentin is correct that you can use special missing values to display your letters, although, as he points out, these are restricted to single letters. SAS normally shows a missing numeric value as a period (.), but you can also use the letters A-Z and the underscore as special missing values; these are written in code as .A, .B, .C, and so on through .Z, plus ._ for the underscore.
If you use the missing statement in a data step with all the alphabet letters, then it will automatically assign and display the relevant letter as the special missing value. If any character values are more than one letter then it won't be able to convert or display that value.
data have;
input char_col $15.;
datalines;
000000036
C
M
;
run;
data want;
set have;
missing a b c d e f g h i j k l m n o p q r s t u v w x y z;
num_col=input(char_col,best12.);
run;
I have dataset a:
data q7;
input trt$;
cards;
a150
b250
c300
400
abc180
;
run;
We have to create dataset b like this
trt dose
a150 150mg
b250 250mg
c300 300mg
400 400mg
abc180 180mg
A new dose variable is added, and mg is written after each numeric value.
Here is my solution. Basically, use the compress function to keep (hence the 'k') only the digits from the trt variable. From there it is just a case of concatenating mg to the digits.
data want;
set q7;
dose = cats(compress(trt,'0123456789','k'),'mg');
run;
The compress function default behaviour is to return a character string with specified characters removed from the original string.
so
compress(trt,'0123456789') would have removed all numbers from the trt variable.
However, compress comes with a battery of modifiers that let the user alter the default behaviour.
So in your case, we wanted to keep the digits regardless of the number of preceding letters, so I used the modifier k to keep, rather than remove, the listed characters, in this case 0123456789.
For a full list of modifiers, see the SAS documentation for the COMPRESS function.
cats is one of the many functions SAS has for concatenating strings, so passing the compress result as the 1st string and mg as the 2nd string will concatenate both to produce your desired result.
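As a variation on the same idea, the d modifier adds all digits to the character list, so the explicit '0123456789' list can be left out and 'kd' used instead. This is a sketch that should be equivalent to the step above; want_kd is a hypothetical data set name, and q7 is rebuilt from the question's data:

```sas
/* Sample data from the question */
data q7;
  input trt $;
  datalines;
a150
b250
c300
400
abc180
;
run;

data want_kd;
  set q7;
  /* k = keep instead of remove, d = digits; the second argument is left empty */
  dose = cats(compress(trt, , 'kd'), 'mg');
  put trt= dose=;
run;
```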
hope it helps
I am using the notalnum function in SAS. The input is a db field. Now, the function is returning a value that tells me there is a special character at the end of every string.
It is not a space character, because I have used COMPRESS function on the input field.
How can I print the ASCII value of the special character at the end of each string?
The $HEX. format is the easiest way to see what they are:
data have;
var="Something With A Special Char"||'0D'x;
run;
data _null_;
set have;
rul=repeat('1 2 3 4 5 6 7 8 9 0 ',3); *so we can easily see what char is what;
put rul=;
put var= $HEX.;
run;
You can also use the c option on compress (var=compress(var,,'c');) to compress out control characters (which are often the ones you're going to run into in these situations).
Finally - 'A0'x is a good one to add to the list, the non-breaking space, if your data comes from the web.
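The two suggestions can be combined in one call, since compress accepts both a character list and modifiers. A minimal sketch, assuming a single-byte session encoding (in a UTF-8 session, removing the byte 'A0'x could damage multibyte characters); the names clean, var_clean, len_before and len_after are made up for illustration:

```sas
/* Same test value as above: a string with a trailing carriage return */
data have;
  var = "Something With A Special Char" || '0D'x;
run;

data clean;
  set have;
  /* remove non-breaking spaces ('A0'x) plus all control characters (modifier c) */
  var_clean = compress(var, 'A0'x, 'c');
  /* compare lengths to confirm that something was removed */
  len_before = length(var);
  len_after  = length(var_clean);
  put len_before= len_after=;
run;
```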
If you want to see the position of the character within the ASCII table, you can use the rank() function, e.g.:
data _null_;
string = 'abc123';
do i = 1 to length(string);
asc = rank(substr(string,i,1));
put i= asc=;
end;
run;
Gives:
i=1 asc=97
i=2 asc=98
i=3 asc=99
i=4 asc=49
i=5 asc=50
i=6 asc=51
Joe's solution is very elegant, but seeing as my hex->decimal conversion skills are pretty poor I tend to do it this way.
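Applied to the original problem, where only the last character of each string is suspect, the same rank() idea might look like the sketch below. It assumes the have data set and var variable from the $HEX. example (rebuilt here so the step runs on its own); last_char and code are hypothetical names:

```sas
/* Same test value as in the $HEX. example */
data have;
  var = "Something With A Special Char" || '0D'x;
run;

data _null_;
  set have;
  /* length() ignores trailing blanks, so this picks the last non-blank character */
  last_char = substr(var, length(var), 1);
  code = rank(last_char);
  put 'decimal ' code;
  put 'hex     ' code hex2.;
run;
```

For the carriage return in this test value, the log should show 13 in decimal and 0D in hex.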
I am looking for a way to convert the characters into numbers in SAS so that I can use the max function. It would also be helpful if the characters were removed and only the numbers kept. Below is a list of data for a column in a SAS table.
Column UNK
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
After stripping out the number values from the column, I would then group the data and use the max function in SAS, which should only generate the value 20150718.
To avoid any confusion, my question, is there a way to strip out the non-numeric values, and then convert the column into a numeric column so I can use the max function?
Thanks.
Sure!
var_num = input(compress(var_char,,'kd'),yymmdd8.);
Compress removes or keeps characters from a list. 'kd' says to 'keep digits'.
You then input using the appropriate informat; yymmdd8. looks right based on the data you provide. Then apply a format, format var_num yymmddn8.; or similar, so it looks like a date visually (even if it's really a number underneath).
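Put together, the whole flow could look like the sketch below. The data set and variable names (have, want, var_char, max_date) are assumptions, since the question does not give them, and the sample rows are copied from the question. The ?? modifier is an addition here: it suppresses the invalid-data note for rows whose digits do not form a valid date.

```sas
/* Sample values from the question */
data have;
  input var_char :$30.;
  datalines;
abc20140714
abc20140714x
abc20140714xyz
123_abc20140714_xyz
abc20150718
;
run;

data want;
  set have;
  /* keep digits only, then read the first 8 of them as a yyyymmdd date */
  var_num = input(compress(var_char, , 'kd'), ?? yymmdd8.);
  format var_num yymmddn8.;
run;

proc sql;
  /* max() ignores missing values, such as the row with extra leading digits */
  select max(var_num) as max_date format=yymmddn8.
  from want;
quit;
```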
As pointed out, this won't work if there are other numeric digits in the values; you need to look at your data and identify how those appear and clean them out separately. You could use a regular expression for example to identify things that have 8 consecutive digits, starting with a 20; but ultimately it is a data analysis issue to handle these as your data require.
To get the first sequence of 8 digits in a row starting with a 1 or a 2 as a numeric value, you can use the following:
data want;
set have;
pos = prxmatch("/[12]\d{7}/", character_string);
if pos > 0 then number = input(substr(character_string, pos, 8), 8.);
else number = .;
drop pos;
run;
The prxmatch expression finds the starting position of the sequence, and the substr expression extracts the sequence, then the input function converts it to a numeric.
(Edited to incorporate Joe's feedback)