I am trying to remove a special character from a string:
"Mumbai rains live updates: IMD predicts heavy rainfall for next 24 hours �"
data demo1 (keep=headline2 headline3 headline4 headline5);
set kk.newspaper_append_freq_daily1;
headline2=trim(headline);
headline3=tranwrd(headline2,"�"," ");
headline5=compress(headline2,"�");
headline4=index(headline2,"�");
run;
You can use the KPROPDATA function.
From doc:
Removes or converts unprintable characters.
Code example:
%let in=kk.newspaper_append_freq_daily1;
%let out=demo1;
data &out;
set &in;
array cc (*) _character_;
do i=1 to dim(cc);
cc(i)=kpropdata(cc(i),"TRUNC", 'utf-8'); /* index by the loop counter i, not by _N_ */
end;
run;
In the code I've used an ARRAY statement to iterate over all character columns in the table.
compress should also handle this if you keep a whitelist of characters rather than trying to exclude a blacklist - e.g.
clean_text = compress(dirty_text,'','kw');
The k modifier keeps characters instead of removing them, and w adds all printable characters to the list.
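A minimal sketch of the whitelist approach, using made-up test data rather than the headline table. The variable names dirty and clean and the embedded BEL control character (byte(7)) are assumptions for illustration only. Whether a given multi-byte character such as the one in the headline counts as printable depends on the session encoding, so treat this as a starting point:

```sas
data _null_;
  length dirty clean $20;
  /* build a test value with an unprintable control character in the middle */
  dirty = cats('abc', byte(7), 'def');
  /* k = keep the listed characters, w = add printable characters to the list */
  clean = compress(dirty, '', 'kw');
  put clean=;
run;
```

The log should show the value with the control character removed and the letters left untouched.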
I am new to SAS and would like to keep what's before the hyphen '-' to create a new variable:
x
abc-something
efgh-everything
hij-something
I tried:
DATA NEW
set OLD;
y = (compress(substr([x], 3, 1));
RUN;
PROC PRINT DATA = NEW;
RUN;
to get it to look like this but it doesn't work:
x
abc
efgh
hij
Use the scan() function to split a string based on delimiter character(s).
y=scan(x,1,'-');
Or if you just want the first three characters, use the SUBSTR() function.
y=substr(x,1,3);
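For reference, here is a self-contained sketch of the SCAN() approach on the sample values from the question. The dataset names old and new follow the question; the $20 lengths are an assumption chosen to fit the sample data:

```sas
/* recreate the sample data from the question */
data old;
  input x :$20.;
  datalines;
abc-something
efgh-everything
hij-something
;
run;

data new;
  set old;
  length y $20;
  /* first "word" of x when '-' is the only delimiter */
  y = scan(x, 1, '-');
run;

proc print data=new;
run;
```

SCAN() copes with prefixes of different lengths (abc, efgh, hij), which a fixed SUBSTR(x,1,3) would not.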
Try without square brackets. Compress not required either.
Do you know how to remove the commas and periods from something like this:
'18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,. '
I had to concatenate to get a list of claim numbers (with leading zeros). Now I have that string, but I want to delete all the stuff at the end. I tried this, but it didn't work:
data OUT.REQ_1_4_25 ;
set OUT.REQ_1_4_24;
CONCAT1=PRXCHANGE('s/,.//',1,CONCAT);
run;
By the way, I am using SAS and regex, something like prxchange.
This also worked for me:
data OUT.REQ_1_4_25 ;
set OUT.REQ_1_4_24;
CONCAT1=TRANWRD(CONCAT, ',.', '');
run;
The second argument to the PRXCHANGE function specifies the number of times the search and replace should be done. Replacing your 1 by -1 will run the replacement until the end of the string, rather than only once.
Also, the pattern ',.' matches a comma followed by any character ('.' is a wildcard). You want to catch either a comma (',') or a period ('.'), the latter of which is a metacharacter that you need to escape with '\':
CONCAT1=PRXCHANGE('s/[,\.]//',-1,CONCAT);
If you only want to remove the comma-period pairs, then remove the square brackets:
CONCAT1=PRXCHANGE('s/,\.//',-1,CONCAT);
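To compare the two patterns side by side, here is a small sketch on a shortened copy of the string from the question. The variable names no_punct and no_pairs and the $200 length are assumptions for illustration:

```sas
data _null_;
  length concat no_punct no_pairs $200;
  concat = '18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.';
  /* character class: removes every comma and every period */
  no_punct = prxchange('s/[,\.]//', -1, concat);
  /* literal pair: removes only a comma immediately followed by a period */
  no_pairs = prxchange('s/,\.//', -1, concat);
  put no_punct= / no_pairs=;
run;
```

The first result runs the claim numbers together into one block of digits, while the second keeps the commas between the claim numbers and drops only the trailing ',.' pairs, which is probably what is wanted here.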
No need for regex unless you have something more complicated than actually shown.
Just use the scan() function and tell it to use . and , as delimiters:
data claims;
length claim $50;
list = '18430109646000104331929350001,064380958490001,974317618110001,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.';
cnt=1;
claim=scan(list,cnt,'.,');
do while (claim ne '');
output;
cnt=cnt+1;
claim=scan(list,cnt,'.,');
end;
keep claim;
run;
Match strings ending in certain character
I am trying to create a new variable that indicates whether a string ends with a certain character.
Below is what I have tried, but when this code is run, the variable ending_in_e is all zeros. I would expect that names like "Alice" and "Jane" would be matched by the code below, but they are not:
proc sql;
select *,
case
when prxmatch("/e$/",name) then 1
else 0
end as ending_in_e
from sashelp.class
;quit;
You should account for the fact that, in SAS, character variables have a fixed length, and a value shorter than that length is padded with trailing spaces. The "e" is therefore not the last character of the stored value, so /e$/ does not match.
Either trim the string:
prxmatch("/e$/",trim(name))
Or add a whitespace pattern:
prxmatch("/e\s*$/",name)
Here \s* matches 0 or more whitespace characters before the end of the string.
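Applied to the query from the question, the two fixes could look like the sketch below. The column names ending_in_e_trim and ending_in_e_ws are made up so that both variants can be compared in one result:

```sas
proc sql;
  select name,
         /* variant 1: strip the trailing blanks before matching */
         case when prxmatch('/e$/', trim(name)) then 1 else 0 end
           as ending_in_e_trim,
         /* variant 2: allow trailing whitespace in the pattern itself */
         case when prxmatch('/e\s*$/', name) then 1 else 0 end
           as ending_in_e_ws
  from sashelp.class;
quit;
```

Both columns should agree on every row, flagging names such as "Alice" and "Jane".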
SAS character variables are fixed length. So you either need to trim the trailing spaces or include them in your regular expression.
Regular expressions are powerful, but they might be confusing to some. For such a simple pattern it might be clearer to use simpler functions.
proc print data=sashelp.class ;
where char(name,length(name))='e';
run;
I know that the underscore wildcard can be used to match a single character, but I'm not sure if it can be used to match a single space? Can anybody help answer this? Thanks a lot.
Yes.
data test;
x=' ';
run;
proc sql;
select count(1) from test
where x like "_";
quit;
returns 1.
I think you can match a single space with _ as long as the pattern covers the whole text of the value. Each _ in a LIKE pattern stands for exactly one character, so a pattern built only from literal characters and underscores has to have the same number of characters as the text you want to match.
E.g. If I want to select "black wolf":
data work.animals;
input name $1-16 weight;
datalines;
monkey           20
shark            500
blue whale       200
black wolf       120
buffalo          400
;
data work.animals3;
set animals;
where name like 'black_wol_';
run;
I can use like 'black_wol_'; because that pattern accounts for every character of the value, including the space. But I can't just use like 'black_' or like 'black_wol';. Those won't work because the number of characters in the pattern and in the string is different.
Alternatively, you can use the % sign which can specify any number of characters before, after or in the middle of a string. E.g. where name like '%e'; or where name like 'blu%e'; can select "blue whale". You can use both _ and % together.
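A short sketch combining both wildcards, assuming the work.animals table built above with name read from columns 1-16. The pattern 'bl_e%' is my own example, not taken from the question:

```sas
proc sql;
  /* _ stands for exactly one character, % for any number of characters */
  select name, weight
    from work.animals
    where name like 'bl_e%';
quit;
```

This should select "blue whale" but not "black wolf", because the fourth character of "black wolf" is "c" rather than "e".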