I am new to SAS and would like to keep what's before the hyphen '-' to create a new variable:
x
abc-something
efgh-everything
hij-something
I tried:
DATA NEW
set OLD;
y = (compress(substr([x], 3, 1));
RUN;
PROC PRINT DATA = NEW;
RUN;
to get the result to look like this, but it doesn't work:
x
abc
efgh
hij
Use the scan() function to split a string based on delimiter character(s).
y=scan(x,1,'-');
Or, if you just want the first three characters, use the SUBSTR() function.
y=substr(x,1,3);
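Here is a small sketch of how the two suggestions differ on the sample data. The dataset names follow the question, and `y_scan`/`y_substr` are illustrative names. SCAN() returns everything before the first hyphen, however long it is. SUBSTR(x,1,3) always takes exactly three characters, so `efgh-everything` gives `efgh` in one case and `efg` in the other.

```sas
/* Sample data from the question */
data old;
   input x $20.;
   datalines;
abc-something
efgh-everything
hij-something
;
run;

data new;
   set old;
   length y_scan y_substr $20;
   y_scan   = scan(x, 1, '-');   /* text before the first hyphen */
   y_substr = substr(x, 1, 3);   /* always exactly the first 3 characters */
run;

proc print data=new;
run;
```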
Try it without the square brackets. COMPRESS() is not required either.
I have a set of variables in SAS that should be numeric but are character. The numbers use a comma as the decimal separator and I need a period. For example, I need 19,000417537 to be 19.000417537. I tried TRANSLATE() without success: the comma is still there and I'm not able to convert the variable to numeric using INPUT(). Can anyone help me, please?
Use INPUT() with the COMMAX informat.
data have;
length have $20.;
have = "19,000417537";
want = input(have, commax32.);
format want 32.8;
run;
proc print data=have;
run;
Obs have want
1 19,000417537 19.00041754
In two steps: replace the , with . using TRANWRD, then use INPUT to convert the result to numeric.
data yourdf;
set df;
charnum2=tranwrd(charnum, ",", "."); /*replace , with .*/
numvar = input(charnum2, 32.); /*convert to numeric; 32 is the widest w.d informat, so longer strings are not truncated*/
run;
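One caveat with the TRANWRD route, sketched below with a hypothetical dataset `compare`: if a value also uses a period as a thousands separator (e.g. `1.234,5`), swapping the comma produces `1.234.5`. The standard informat rejects that, while COMMAX drops the periods and reads the comma as the decimal point. The `??` modifier only suppresses the invalid-data note in the log.

```sas
data compare;
   input charnum $20.;
   /* Approach 1: replace the comma with a period, then read with a standard informat */
   num_tranwrd = input(tranwrd(charnum, ",", "."), ?? 32.);
   /* Approach 2: COMMAX reads the comma as the decimal point and drops embedded periods */
   num_commax = input(charnum, commax32.);
   format num_tranwrd num_commax 20.9;
   datalines;
19,000417537
1.234,5
;
run;

proc print data=compare;
run;
```

Both approaches agree on the first row. On the second row `num_tranwrd` is missing and `num_commax` is 1234.5.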
You can use the COMMA informat to read strings with commas in them. But if you want the commas treated as decimal points instead of ignored, you probably need to use COMMAX instead (or perhaps the NLNUM informat, so that the meaning of commas and periods in the text depends on your LOCALE setting).
So if the current dataset is named HAVE and the text you want to convert is in the variable named STRING you can create a new dataset named WANT with a new numeric variable named NUMBER with code like this:
data want;
set have;
number = input(string,commax32.);
run;
I have a table in SAS and I want to create a new column C whose value is computed from A and B: A should be in uppercase letters and B in brackets.
If A is dog and B is cat, then C in that row should be DOG (cat).
I'm very new to SAS; how can I do that?
I know that I can get uppercase with upcase(A), but I don't know how to put two character variables one after another to create a new variable, or how to put a value in brackets.
SAS has a family of CAT functions (CAT(), CATS(), CATT(), CATX()) that make this simple. CATS() strips the leading and trailing spaces from the values. CATX() also strips them and lets you specify a value to paste between the values.
data want ;
set have;
length new $100 ;
new=catx(' ',upcase(a),cats('(',b,')')); /* e.g. DOG (cat) */
run;
Personally, I use CAT/CATS/CATX only in very specific cases. For a problem like this, you can simply use the concatenation operator ||, which makes the code much easier to understand:
data want;
set have;
attrib new format=$100.;
new = strip(upcase(a)) || " (" || strip(b) || ")";
run;
OK, that's a little more verbose, but I think it's also easier to understand for a new SAS programmer :)
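The STRIP() calls in the || version matter because SAS character variables are padded with trailing blanks to their declared length. Here is a small sketch comparing the two answers, plus a || version without STRIP(). The dataset and variable names (`c_catx`, `c_pipe`, `c_nostrip`) are illustrative.

```sas
data have;
   length a b $10;   /* fixed-length variables: 'dog' is stored with 7 trailing blanks */
   a = 'dog';
   b = 'cat';
run;

data want;
   set have;
   length c_catx c_pipe c_nostrip $40;
   c_catx    = catx(' ', upcase(a), cats('(', b, ')'));      /* DOG (cat) */
   c_pipe    = strip(upcase(a)) || ' (' || strip(b) || ')';  /* DOG (cat) */
   c_nostrip = upcase(a) || ' (' || b || ')';                /* padding blanks are kept */
   put c_catx= / c_pipe= / c_nostrip=;
run;
```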
Quick question: I need to remove punctuation by replacing those characters with a space (e.g., if a field contains a *, I need to replace it with a blank).
I can't seem to get it right. I was originally just removing the characters, but I've found that in some cases the words in my string get squished together.
Thoughts?
STRING2 = compress(STRING, ":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...");
The COMPRESS() function will remove the characters. If you want to replace them with spaces then use the TRANSLATE() function. If you want to reduce multiple blanks to a single blank use the COMPBL() function.
STRING2 = compbl(translate(STRING,' ',":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…..."));
Rather than listing the characters that need to be converted to spaces you could use COMPRESS() to turn the problem around to listing the characters that should be kept.
This example uses the modifiers a (alphabetic) and d (digits) on the COMPRESS() call, together with the blank in its character list. As a result, COMPRESS() returns only the characters in STRING that are not alphanumeric or blank. Those characters are passed to TRANSLATE() as the ones to replace with spaces.
STRING2 = compbl(translate(STRING,' ',compress(STRING,' ','ad')));
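To make the "list what to keep" trick concrete, here is a sketch on a made-up ASCII-only string. The dataset `demo` and the helper variable `junk` are illustrative. `junk` holds the characters COMPRESS() leaves behind, and TRANSLATE() then blanks exactly those characters.

```sas
data demo;
   length string string2 junk $60;
   string = "Cats**dogs, and-birds!";
   /* Remove blanks, letters (a) and digits (d): what is left is the punctuation, "**,-!" */
   junk = compress(string, ' ', 'ad');
   /* Replace each of those characters with a blank, then squeeze runs of blanks */
   string2 = compbl(translate(string, ' ', junk));
   put junk= / string2=;
run;
```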
Try using the translate function and see if it fits your needs:
data want;
STRING = "!';AAAAÄAA$";
STRING2 = translate(STRING,' ',':;,*~''’°-!()®#""#$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...');
run;
Output:
STRING STRING2
!';AAAAÄAA$ AAAA AA
Try the TRANSLATE() function.
TRANSLATE(SOURCE,TO,FROM);
data test;
string = "1:,*2~’°-ÍÓ3Äö‘—È…...4";
string2 = translate(string,
" ",
":,*~’°-!';()®""##$%^&©+=\/|[]}{]{?><ÉÑËÁ’ÍÓÄö‘—È…...");
put string2=;
run;
I get
string2=1 2 3 4
While the TRANSLATE function could get you there, you could also use a regular expression (PRXCHANGE) in SAS. It is more elegant, but you need to escape the special characters in the regex pattern.
data want;
input string $60.;
length new_string $60.;
new_string = prxchange('s/([\:\,\*\~\’\°\-\!\'||"\'"||';\(\)\®\"\"\#\#\$\%\^\&\©\+\=\\\/\|\[\}\{\]\{\\\?\>\<\É\Ñ\Ë\Á\’\Í\Ó\Ä\ö\‘\—\È\…\.\.\.\]])/ /',-1,string);
datalines;
Cats, dogs, and anyone else!
;
Try it with the help of regular expressions.
data have;
old = "AM;'IGH}|GH";
new = prxchange("s/[^A-Z]/ /",-1,old);
run;
proc print data=have nobs;
run;
Output:
old new
AM;'IGH}|GH AM IGH GH
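A variation on the patterns above (a sketch, not either answerer's code): the negated class `[^A-Za-z0-9]` also keeps lowercase letters and digits, which `[^A-Z]` would blank out. The `+` makes a whole run of punctuation and blanks collapse to a single blank, so no COMPBL() call is needed. The dataset name `clean` is illustrative.

```sas
data clean;
   input string $60.;
   length new_string $60;
   /* Replace each run of characters that are not letters or digits with one blank */
   new_string = prxchange('s/[^A-Za-z0-9]+/ /', -1, string);
   datalines;
Cats, dogs, and anyone else!
AM;'IGH}|GH
;
run;

proc print data=clean noobs;
run;
```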
I want to replace several combinations of text with others. For example:
data test;
a='raja\ram{work}italic';
if index(a,'\') then b=tranwrd(a,'\','\\');
if index(a,'{') then b=tranwrd(a,'{','\{');
if index(a,'}') then b=tranwrd(a,'}','\}');
if index(upcase(a),'ITALIC') then b=tranwrd(a,substr(a,index(upcase(a),'ITALIC'),length('ITALIC')),'\i');
run;
Required Result: b=raja\\ram\{work\}\i;
These are the kinds of combinations I want to replace. I'm not interested in using a macro, FCMP, or IF/ELSE conditions.
Is there a function that does them all at once? I tried a Perl regular expression, but it also works for only one replacement at a time: b= prxchange('s/\\/\\\\/', -1, a)
Your regular expression is on the right track. You have a set of characters that you always want to prepend a \ to, right? So search for one of that set of characters, which you do with a character class [...], capture it with a group (...), and replace it with a \ followed by the captured character ($1). Backslash is the escape character, so you have to write two any time you want one literal backslash (\\ escapes itself to \).
data test;
a='Hello\Goodbye{stuff}';
b= prxchange('s/([\\{}])/\\$1/',-1,a);
put b=;
run;
You should do the italic bit in a second expression (or just use TRANWRD). That's a totally different replacement, and while it is theoretically possible to put both in one expression, doing so would make it too messy.
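Here is a sketch that puts the two steps together for the string from the question. The order matters: if `italic` were replaced first, the escaping step would then double the backslash in `\i`. The `i` after the final delimiter makes the match case-insensitive, mirroring the UPCASE check in the question.

```sas
data test;
   length a b $100;
   a = 'raja\ram{work}italic';
   /* Step 1: prefix every \, { and } with a backslash */
   b = prxchange('s/([\\{}])/\\$1/', -1, a);
   /* Step 2: replace "italic" (any case) with \i; done second so its backslash is not doubled */
   b = prxchange('s/italic/\\i/i', -1, b);
   put b=;
run;
```

This should write `b=raja\\ram\{work\}\i` to the log, matching the required result.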
This question is almost identical to the other question: Multiple search and replace within a string through regular expression in SAS
Is that a coincidence?
Here is the code that worked for the other question.
%let text = abc\pqr{work};
data _null_;
var=prxchange("s/\\/\\\\/",-1,"&text");
var=prxchange("s/\{/\\\{/",-1,var);
var=prxchange("s/\}/\\\}/",-1,var);
put var;
run;
Result: abc\\pqr\{work\}
%let text = BOLD\ITALIC\ITALICBOLD\BOLDITALIC\B\I\IB\BI;
data _null_;
var=prxchange("s/BOLD/b/",-1,"&text");
var=prxchange("s/ITALIC/i/",-1,var);
var=lowcase(var);
put var;
run;
RESULT: b\i\ib\bi\b\i\ib\bi
Data:
Hell_TRIAL21_o World
Good Mor_Trial9_ning
How do I remove the _TRIAL21_ and _Trial9_?
What I did was find the positions of the first _ and the second _. Then I wanted to remove everything from the first _ to the second _, but the COMPRESS function can't do that. How can I do it?
x = index(string, '_');
if (x>0) then do;
y = x+1;
z = find(string, '_', y);
end;
data _null_;
text = " Hell_TRIAL21_o World Good Mor_Trial9_ning";
var = cats(scan(text,1,"_"), scan(text,3,"_"), scan(text,5,"_")); /* words 1, 3 and 5 between underscores; CATS strips the blanks around each */
put var=; run;
Note that the default length of var may not suit your case; remember to adjust it with a LENGTH statement.
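For completeness, you can finish the INDEX/FIND idea from the question without COMPRESS. Glue together the piece before the first underscore and the piece after the second one with SUBSTR. This is a sketch that assumes at most one underscore pair per value. Values whose first underscore is in column 1, or whose second underscore is in the last column, are left unchanged. `new` is an illustrative variable name.

```sas
data have;
   input string $30.;
   datalines;
Hell_TRIAL21_o World
Good Mor_Trial9_ning
;
run;

data want;
   set have;
   length new $30;
   x = index(string, '_');                        /* position of the first underscore */
   if x > 0 then z = find(string, '_', x + 1);    /* position of the second underscore */
   else z = 0;
   /* Text before the first _ followed by text after the second _ */
   if x > 1 and z > x and z < vlength(string) then
      new = substr(string, 1, x - 1) || substr(string, z + 1);
   else new = string;                             /* no usable pair: keep the value as-is */
   drop x z;
run;
```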
Perl regular expressions are a good way of identifying this sort of string. CALL PRXCHANGE is the routine that removes the relevant characters; it requires PRXPARSE beforehand to compile the search-and-replace pattern.
I've used MODIFY here to amend the existing dataset; you may prefer to use SET to write to a new dataset and test the results first.
data have;
input string $ 30.;
datalines;
Hell_TRIAL21_o World
Good Mor_Trial9_ning
;
run;
data have;
modify have;
regex = prxparse('s/_.*_//'); /* identify and remove anything between 2 underscores */
call prxchange(regex,-1,string);
run;
Or, to create a new variable in a new dataset, just use the PRXCHANGE function (which doesn't require PRXPARSE).
data want;
set have;
new_string = prxchange('s/_.*_//',-1,string);
run;
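One thing to watch with `_.*_`: `.*` is greedy. If a string ever contains more than one underscore pair, everything from the first underscore to the last one is removed. Here is a sketch with a made-up two-pair string that compares it against the negated class `_[^_]*_`, which stops at the next underscore. The dataset and variable names are illustrative.

```sas
data greedy_demo;
   length string greedy restricted $40;
   string = "Hell_TRIAL21_o Mor_Trial9_ning";
   greedy     = prxchange('s/_.*_//', -1, string);     /* removes first _ through last _ */
   restricted = prxchange('s/_[^_]*_//', -1, string);  /* removes each _..._ pair separately */
   put greedy= / restricted=;
run;
```

For the one-pair-per-row data in the question both patterns give the same result. With two pairs, `greedy` becomes `Hellning` while `restricted` becomes `Hello Morning`.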