SAS: How to delete a word between two specific positions?

data:
Hell_TRIAL21_o World
Good Mor_Trial9_ning
How do I remove the _TRIAL21_ and _Trial9_?
What I did was find the positions of the first _ and the second _. Then I wanted to remove everything from the first _ through the second _, but COMPRESS cannot do that, because it removes individual characters rather than a range of positions. How can I do it?
x = index(string, '_');
if (x>0) then do;
y = x+1;
z = find(string, '_', y);
end;
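The two positions found above are already enough to finish the job with SUBSTR: keep the text before the first underscore and append the text after the second one. A minimal sketch, assuming each value contains at most one tagged word; the dataset WANT and the variable NEW_STRING are illustrative names, not from the question.

```sas
data want;
  length string new_string $ 40;
  input string $40.;                           /* formatted input keeps the embedded blank */
  x = index(string, '_');                      /* position of the first underscore */
  z = 0;
  if x > 0 then z = find(string, '_', x + 1);  /* position of the second underscore */
  if x > 1 and z > x then
    /* text before the first _ followed by text after the second _ */
    new_string = substr(string, 1, x - 1) || substr(string, z + 1);
  else new_string = string;                    /* no usable pair found: keep the value as-is */
  datalines;
Hell_TRIAL21_o World
Good Mor_Trial9_ning
;
run;
```

The `x > 1` guard avoids asking SUBSTR for a zero-length piece when the string starts with an underscore; such values are simply left unchanged in this sketch.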

Text = " Hell_TRIAL21_o World Good Mor_Trial9_ning";
var = cats(scan(text,1,"_"), scan(text,3,"_"), scan(text,5,"_"));
This relies on the string containing exactly two tagged words. Note that VAR gets a default length of 200 from CATS, which may not suit your case; set it with a LENGTH statement if needed.

Perl regular expressions are a good way of identifying these sorts of strings. CALL PRXCHANGE is the routine that removes the relevant characters; it requires PRXPARSE beforehand to compile the search-and-replace pattern.
I've used MODIFY here to amend the existing dataset; you may prefer to use SET to write out to a new dataset and test the results first.
data have;
input string $30.;
datalines;
Hell_TRIAL21_o World
Good Mor_Trial9_ning
;
run;
data have;
modify have;
regex = prxparse('s/_.*_//'); /* remove the first underscore, the last underscore and everything between them */
call prxchange(regex,-1,string);
run;
Or to create a new variable and dataset, just use prxchange (which doesn't require prxparse).
data want;
set have;
new_string = prxchange('s/_.*_//',-1,string);
run;
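One caveat: the `.*` in `s/_.*_//` is greedy, so it runs to the last underscore on the line. That is fine for the one-tag-per-record data above, but on a value with several tagged words (such as the combined string in the previous answer) it removes too much. Perl's lazy `.*?` stops at the next underscore instead. A small comparison sketch; the dataset COMPARE and its variables are illustrative.

```sas
data compare;
  length text greedy lazy $ 60;
  text   = 'Hell_TRIAL21_o World Good Mor_Trial9_ning';
  greedy = prxchange('s/_.*_//', -1, text);   /* .*  runs to the LAST underscore  */
  lazy   = prxchange('s/_.*?_//', -1, text);  /* .*? stops at the NEXT underscore */
  put greedy= / lazy=;
run;
```

Here GREEDY ends up as "Hellning", while LAZY removes each _..._ pair separately and gives "Hello World Good Morning".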

Related

Unable to convert a character variable with numbers with a comma into numeric

I have a set of variables in SAS that should be numeric but are character. The numbers use a comma as the decimal separator and I need a point. For example, I need 19,000417537 to be 19.000417537. I tried TRANSLATE without success: the comma is still there and I'm not able to convert the variable to numeric using INPUT(). Can anyone help me, please?
Thank you in advance
Best
Use INPUT() with the COMMAX informat.
data have;
length have $20;
have = "19,000417537";
want = input(have, commax32.);
format want 32.8;
run;
proc print data=have;
run;
Obs have want
1 19,000417537 19.00041754
In two steps, you can replace the comma with a period using TRANWRD and then use INPUT to convert the result to numeric.
data yourdf;
set df;
charnum2=tranwrd(charnum, ",", "."); /*replace , with .*/
numvar = input(charnum2, 32.); /*convert to numeric; a 32. width avoids truncating longer strings*/
run;
You can use the COMMA informat to read strings with commas in them. But if you want it to treat the commas as decimal points instead of ignoring them, then you probably need to use COMMAX instead (or perhaps use the NLNUM informat, so that the meaning of commas and periods in the text will depend on your LOCALE setting).
So if the current dataset is named HAVE and the text you want to convert is in the variable named STRING you can create a new dataset named WANT with a new numeric variable named NUMBER with code like this:
data want;
set have;
number = input(string,commax32.);
run;
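To see why COMMAX rather than COMMA is needed here, the sketch below reads the same string with both informats. COMMA strips the comma as if it were a thousands separator, while COMMAX treats it as the decimal mark. The dataset INFORMAT_CHECK and its variables are illustrative.

```sas
data informat_check;
  string    = '19,000417537';
  as_comma  = input(string, comma32.);    /* COMMA ignores the comma: 19000417537 */
  as_commax = input(string, commax32.);   /* COMMAX reads the comma as the decimal point */
  format as_comma 20. as_commax 20.9;
  put as_comma= as_commax=;
run;
```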

Create new variable using partial string of another variable in SAS

I am new to SAS and would like to keep what's before the hyphen '-' to create a new variable:
x
abc-something
efgh-everything
hij-something
I tried:
DATA NEW
set OLD;
y = (compress(substr([x], 3, 1));
RUN;
PROC PRINT DATA = NEW;
RUN;
to get it to look like this but it doesn't work:
x
abc
efgh
hij
Use the scan() function to split a string based on delimiter character(s).
y=scan(x,1,'-');
Or, if you just want the first three characters, use the SUBSTR() function (note that this cuts efgh down to efg).
y=substr(x,1,3);
Try it without the square brackets, and add the missing semicolon after DATA NEW. COMPRESS is not required either.
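Putting the two suggestions into a complete step makes the difference visible: SCAN keeps everything before the hyphen, whatever its length, while SUBSTR always takes exactly three characters. A sketch using the sample values from the question; the variable names Y_SCAN and Y_SUBSTR are illustrative.

```sas
data new;
  input x $20.;
  y_scan   = scan(x, 1, '-');   /* everything before the first hyphen */
  y_substr = substr(x, 1, 3);   /* always exactly three characters */
  datalines;
abc-something
efgh-everything
hij-something
;
run;

proc print data=new;
run;
```

For the second record, Y_SCAN is efgh while Y_SUBSTR is efg, so SCAN is the one that matches the desired output.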

How to check whether the first character of a string is a lowercase letter using SAS

I have a variable NAME. I want to check whether the first character of this variable is a lowercase letter or not. NAME looks like the following:
aBMS
BMS
xMS
zVewS
fPP
NBMS
I extract the first character of my variable using first_letter = first(NAME);. Can anyone teach me how to check whether the variable first_letter is a lowercase letter or not? For now I did it as follows, but I am wondering if I can achieve this without typing out the whole alphabet: if first_letter = 'a' | first_letter = 'b' | first_letter = 'c' ... then dummy = 1;
Using the compress function with kl as the 3rd argument tells SAS to keep only lowercase characters, so the following works correctly for all cases, including non-alphanumeric first characters:
data have;
input NAME $;
cards;
aBMS
BMS
xMS
zVewS
fPP
NBMS
;
run;
data want;
set have;
FLAG = compress(first(NAME),,'lk') ne '';
run;
N.B. The third argument for compress is a feature that was only added to SAS in version 9.1, so this won't work in earlier versions of SAS.
Also, this will work both in a where clause and in a data step if statement - by contrast, the between syntax used in Gordon's answer is only valid in a where clause. A variant of this approach that would work in both cases is:
data want;
set have;
/*Yes, SAS supports character inequalities!*/
FLAG = 'a' <= first(NAME) <= 'z';
run;
Perl Regular Expression can also provide an alternative:
data have;
input NAME $;
cards;
aBMS
BMS
xMS
zVewS
fPP
NBMS
;
run;
data want;
set have;
if prxmatch('/^[[:lower:]]/', name)>0;
run;
This is very straightforward: it literally checks whether the first character is a lowercase letter. ^ anchors the match at the beginning of the string, and [[:lower:]] matches a lowercase character.
first(string) eq lowcase(first(string))
This will also be true if the first character in the string is not an alphabetic character. You don't mention whether that scenario needs to be considered.
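The difference between the LOWCASE comparison and the stricter checks only shows up when the first character is not a letter. The sketch below adds a hypothetical value, 1abc, to the sample data and computes three of the flags side by side. It assumes an ASCII session: the 'a' to 'z' range check relies on the letters being contiguous in the collating sequence, which is not the case in EBCDIC.

```sas
data flags;
  input NAME $;
  flag_lowcase  = first(NAME) eq lowcase(first(NAME));  /* 1 for any first char that is not uppercase */
  flag_range    = 'a' <= first(NAME) <= 'z';            /* 1 only for a-z (ASCII) */
  flag_compress = compress(first(NAME), , 'lk') ne '';  /* keep lowercase letters only */
  datalines;
aBMS
BMS
1abc
;
run;
```

For 1abc, FLAG_LOWCASE is 1 while FLAG_RANGE and FLAG_COMPRESS are 0, which is exactly the case described above.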
String comparisons in SAS PROC SQL are case sensitive, so the following should work:
proc sql;
select t.*
from t
where substring(t.name from 1 for 1) between 'a' and 'z';
quit;

Convert string into numeric and change period to comma separator in SAS

I have a string called weight that is 85.5
I would like to convert it into a numeric 85,5, replacing the decimal separator with a comma, using SAS.
So far I am using this (messy) two-step approach:
weight_num= (weight*1);
format weight_num COMMAX13.2;
How can this be achieved in a less clumsy way?
Your sample code is the recommended method of changing a variable type.
Another way is to use the TRANSTRN function to replace the . with a comma. The result is still a character value, so this is only a good method if you don't plan to do any calculations on the values.
data have;
set sashelp.class;
keep name weight:;
weight_char=put(weight, 8.1);
run;
data want;
set have;
weight_char=transtrn(weight_char, ".", ",");
run;
proc print data=want;
run;
If you just want commas to be used as the decimal point instead of periods, why not use a simple character substitution? Do you also want to change the thousands separator from comma to period? TRANSLATE() is good for that, because it swaps both characters in a single call.
weight = translate(weight,',.','.,');
If you want to convert it to a number then use the INPUT() function rather than forcing SAS to convert for you.
weight_num = input(weight,comma32.);
You can then attach whatever format you want to the new numeric variable.
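Combining the two suggestions: convert with INPUT and a COMMA informat, then attach COMMAX so the stored number 85.5 is displayed as 85,50. The TRANSLATE call shows the pure text swap on a hypothetical value with a thousands separator. A sketch; the dataset WANT and the variables SWAPPED and SHOWN are illustrative.

```sas
data want;
  weight = '85.5';                                /* character value, as in the question */
  weight_num = input(weight, comma32.);           /* explicit character-to-numeric conversion */
  format weight_num commax13.2;                   /* display with a comma as the decimal mark */
  swapped = translate('1,234.5', ',.', '.,');     /* text-only swap of the two separators */
  shown = strip(put(weight_num, commax13.2));     /* formatted value as text */
  put weight_num= swapped= shown=;
run;
```

Only WEIGHT_NUM is a number here; SWAPPED and SHOWN are character values, so use the formatted numeric variable for any calculations.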

Strip apostrophes from a character string (compress?)

I have a string which looks like this:
"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"
And I'd like it to look like this:
ABAR_VAL ACQ_EXPTAX_Y ACQ_EXP_TAX ADJ_MATHRES2
I.e. no apostrophes or commas and single space separated.
What is the cleanest / shortest way to do so in SAS 9.1.3?
Preferably something along the lines of:
call symput ('MyMacroVariable',compress(????,????,????))
Just to be clear, the result needs to be single space separated, devoid of punctuation, and contained in a macro variable.
Here you go:
data test;
var1='"ABAR_VAL", "ACQ_EXPTAX_Y", "ACQ_EXP_TAX", "ADJ_MATHRES2"';
run;
data test2;
set test;
call symput('macrovar', compbl(compress(var1, '",')));
run;
%put &macrovar;
Is this part of an infile statement or are you indeed wanting to create macro variables that contain these values? If this is part of an infile statement you shouldn't need to do anything if you have the delimiter set properly.
infile foo DLM=',' ;
And yes, you can indeed use the compress function to remove specific characters from a character string, either in a data step or as part of a macro call.
COMPRESS(source<,characters-to-remove>)
Sample Data:
data temp;
input a $;
datalines;
"boo"
"123"
"abc"
;
run;
Resolve issue in a data step (rather than create a macro variable):
data temp2; set temp;
a=compress(a,'"');
run;
Resolve issue whilst generating a macro variable:
data _null_; set temp;
call symput('MyMacroVariable',compress(a,'"'));
run;
%put &MyMacroVariable.;
If you use the latter code, note that CALL SYMPUT runs once per observation, so MyMacroVariable ends up holding only the compressed value from the last record; you'll have to loop through the observations to see the value for each record. :)
To compress multiple blanks into one, use compbl : http://www.technion.ac.il/docs/sas/lgref/z0214211.htm
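If you need every record's value in one macro variable rather than only the last one, PROC SQL's INTO clause with SEPARATED BY builds a single-space-separated list without an explicit loop. A sketch, assuming the TEMP sample data above (recreated here so the example runs on its own); the macro variable name MyList is illustrative.

```sas
data temp;
  input a $;
  datalines;
"boo"
"123"
"abc"
;
run;

proc sql noprint;
  select compress(a, '"')
    into :MyList separated by ' '   /* one macro variable, values joined by single spaces */
    from temp;
quit;

%put &MyList;
```

With SEPARATED BY, leading and trailing blanks are trimmed from each value before joining, so the result is single space separated without needing COMPBL.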