Is it possible to use the number in this string:
'xx8xx'
by replacing the number with 8 spaces to get this string:
'xx        xx'
I can identify the number between the xx but the replacement syntax does not work as intended:
PRXCHANGE(s/xx([\d]*)xx/' ' x $1/io, -1, 'xx8xx')
Is there a way to use the number being held in $1 to repeat the space character by that number i.e. something like ' ' x $1?
Any help much appreciated!
Tiaan
Suppose you need to replace the number with three blanks.
data _null_;
x=prxchange('s/(xx)\d+(xx)/$1   $2/', -1, 'xx8xx');
_x=prxchange('s/(?=\w+)(\d+)/   /',1,'xx8xx');
put _all_;
run;
Edit:
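I missed an important detail: the number of blanks has to come from the string itself. TRANWRD and REPEAT can be used to get that.
data _null_;
/* repeat(' ',n) returns n+1 blanks, hence the -1; note that .*(\d+) captures only the last digit */
x=tranwrd('xx8xx', prxchange('s/.*(\d+).*/$1/',1,'xx8xx'), repeat(' ',prxchange('s/.*(\d+).*/$1/',1,'xx8xx')-1));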
put _all_;
run;
You'll need to extract first, then compile a new regex. This will be expensive since you have to compile once per line.
data have;
input xstr $;
datalines;
xx8xx
xx3xx
xx4xx
;;;;
run;
data want;
set have;
rx1 = prxparse('/xx(\d+)xx/io'); /* (\d+) captures the whole number; ([\d])* would keep only its last digit */
rc1 = prxmatch(Rx1,xstr);
num_x = prxposn(rx1,1,xstr);
rx2 = prxparse(cat('s/(xx)[\d]*(xx)/$1',repeat(" ",num_x-1),'$2/i'));
newstr = prxchange(rx2,-1,xstr);
run;
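If recompiling a regex on every row is a concern, one possible alternative is to compile a single pattern once and build the replacement with ordinary string functions. The following is only a sketch under a few assumptions: the dataset name want_single_rx, the variables pos, len and n, and the $200 length are made up for illustration, only the first match in each value is handled, and a value whose number is 0 is left unchanged.

```sas
data want_single_rx;
  set have;
  length newstr $200;                 /* assumed maximum result length */
  retain rx;
  if _n_ = 1 then rx = prxparse('/xx(\d+)xx/i');   /* compiled once */
  newstr = xstr;                      /* default: leave non-matching values alone */
  if prxmatch(rx, xstr) then do;
    call prxposn(rx, 1, pos, len);    /* position and length of the digits */
    n = input(substr(xstr, pos, len), 8.);
    if n >= 1 then
      newstr = cat(substr(xstr, 1, pos - 1),   /* text before the number */
                   repeat(' ', n - 1),         /* n blanks */
                   substr(xstr, pos + len));   /* text after the number */
  end;
  drop rx pos len n;
run;
```

CAT is used rather than CATS or CATX because it keeps the blanks produced by REPEAT.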
Related
I have a number of text entries (municipalities) from which I need to remove the s at the end.
Data test;
input city $;
datalines;
arjepogs
askers
Londons
;
run;
data cities;
set test;
if prxmatch("/^(.*?)s$/",city)
then city=prxchange("s/^(.*?)s$/$1/",-1,city);
run;
Strangely enough, my s's are only removed from my first entry.
What am I doing wrong?
You defined CITY with a length of $8. The s in Londons is in the 7th position of the string, not the last position, because the value is padded with a trailing blank. Use the TRIM() function to remove the trailing spaces from the value of the variable.
data have;
input city $20.;
datalines;
arjepogs
Kent
askers
Londons
;
data want;
set have;
length new_city $20 ;
new_city=prxchange("s/^(.*?)s$/$1/",-1,trim(city));
run;
Result
Obs    city        new_city
1      arjepogs    arjepog
2      Kent        Kent
3      askers      asker
4      Londons     London
You could also just change the REGEX to account for the trailing spaces.
new_city=prxchange("s/^(.*?)s\ *$/$1/",-1,city);
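To see why the trailing blanks matter, a small check along these lines may help. It is a sketch only; the variable names (len_trimmed, len_stored, no_trim, with_trim) are made up for illustration.

```sas
data _null_;
  length city $8;
  city = 'Londons';
  len_trimmed = length(city);                 /* 7: trailing blanks ignored */
  len_stored  = lengthc(city);                /* 8: full storage length */
  no_trim     = prxmatch('/s$/', city);       /* 0: the value ends in a blank */
  with_trim   = prxmatch('/s$/', trim(city)); /* 7: position of the final s */
  put len_trimmed= len_stored= no_trim= with_trim=;
run;
```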
Here is another solution using only SAS string functions and no regex. Note that in this case there is no need to trim the variable, because LENGTH() ignores trailing blanks:
data cities;
set test;
if substr(city,length(city)) eq "s" then
city=substr(city,1,length(city)-1);
run;
Using SAS, I have a table with sentences and I am looking to find the rows in the table where the keyword is found in the sentence making use of fuzzy matching (complev function). Is there a way in SAS to find the keyword string in the sentences? I know how to use complev, but I only can use it to compare complete strings, not a string as a part of a larger string. For this example table the keyword would be 'example' and the result of the comparison would be in the column Result.
Thanks for your ideas!
This is an Example sentence : 1
Here is another one : 0
Also an exmple : 1
The examples keep coming : 1
No worries : 0
See if you can use this as a template. I compare the Complev value to three, but you can set it to any fitting value.
data have;
input string $ 1-25;
datalines;
Example sentence
Here is another one
Also an exmple
The examples keep coming
No worries
;
data want;
set have;
result = 0;
do _N_ = 1 to countw(string);
if complev('example', scan(string, _N_)) < 3 then do;
result=1; leave;
end;
end;
run;
EDIT: Use complev('example', scan(string, _N_), 'i') if you want the comparison to be case insensitive.
I have a table that looks similar to this:
A | B
1234|A1B2C
1124|$1n7
1342|*6675
1189|966
I need to create a column C where it takes the data from column B and replaces all non numeric characters with a "9" and makes each one 5 characters long by adding 0's to the front. It should come out like this:
91929
09197
96675
00966
Any assistance would be much appreciated, Thank you!
Edit: Sorry, this is my first time posting on any forum like this and I got a bit ahead of myself. I created the table using SQL to pull data from 3 other tables and am a bit more familiar with SQL than SAS, which I have only been using for a few weeks. I have tried using COMPRESS, but as I read more about it, it seems like it only removes the values, so I tried TRANWRD, but from what I was able to figure out I would need to create an entry for each letter and symbol that could appear, i.e.
data Work.temp;
str = b;
Alpha=tranwrd(str, "a", "9");
Alpha=tranwrd(str, "b", "9");
put Alpha;
run;
so then I researched some more and found SAS replace character in ALL columns
based on that I used this code:
data temp;
set work.temp;
array vars [*] _character_;
do i = 1 to dim(vars);
vars[i] = compress(tranwrd(vars[i],"a","9"));
end;
drop i;
run;
That just returns:
|Str|B|Alpha|
|---.|-.|.-------|
(sorry about the bad formatting there, spent 30 min trying to figure out how to make the table look right with spaces but kept coming out wrong. Please imagine the -'s are spaces)
again any help would be appreciated, Thank you!
try this.
data test;
input var1 $5.;
datalines;
A1B2C
$1n7
*6675
966
;
run;
data test1;
set test;
length var2 $5.;
regex = prxparse ("s/[^0-9\s]/9/"); /*matches any character that is neither a digit nor whitespace; inside [] a | is a literal pipe, so it is left out here*/
var2 = prxchange (regex, -1, var1); /*use this function to substitute all instances of the pattern*/
var3 = put (input (var2, best5.), z5.); /*use input and put to pad the front of the variable with 0s*/
run;
Good luck.
Keeping only the digits is simple. Use the modifiers on the COMPRESS() function.
c=compress(b,,'kd');
There are a number of ways to pad on the left with zeros.
You could convert the digits to a number and then write it back to a string using the Z format.
c=put(input(c,??5.),Z5.);
You could prepend the zeros yourself, using an IF statement:
if length(c) < 5 then c=repeat('0',5-length(c)-1)||c ;
Or using SUBSTRN() function.
c=substrn('00000',1,5-length(c))||c;
Or have some fun with the REVERSE() function.
c=reverse(substr(reverse(cats('00000',c)),1,5));
I have a SAS string that always starts with a date. I want to remove the date from the string.
Example of data is below (data does not have bullets, included bullets to increase readability)
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
I want the data to look like this (data does not have bullets, included bullets to increase readability)
test_num15
recom_1_test1
test_0_8_i0|vacc_previous0
Use INDEX to find the position of '|' in the string, then SUBSTR to take everything after it; or use a regular expression.
data have;
input x $50.;
x1=substr(x,index(x,'|')+1);
x2=prxchange('s/([^_]+\|)(?=\w+)//',1,x);
cards;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;
run;
This is a great use case for CALL SCAN. If the length of the date is constant (always 10), then you don't actually need it: start would always be 12, and you could skip straight to the SUBSTR, as user667489 noted in the comments. If the length is not constant, CALL SCAN is helpful.
data have;
length textstr $100;
input textstr $;
datalines;
10/01/2016|test_num15
11/15/2016|recom_1_test1
03/04/2017|test_0_8_i0|vacc_previous0
;;;;
run;
data want;
set have;
call scan(textstr,2,start,length,'|');
new_textstr = substr(textstr,start);
run;
It would also let you grab the second word only, if that's useful (by passing length as the third argument to SUBSTR).
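A minimal sketch of that variant, assuming the same have dataset; the names want_second and second_word are made up for illustration, and the start > 0 check covers rows that have no second word:

```sas
data want_second;
  set have;
  length second_word $100;
  call scan(textstr, 2, start, length, '|');   /* locate the 2nd '|'-delimited word */
  if start > 0 then
    second_word = substr(textstr, start, length);  /* only that word */
run;
```

If the position is not needed for anything else, scan(textstr, 2, '|') returns the same word directly.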
I came across this question this morning and I am still trying to figure out how it can be done. The following dataset is present and has a character variable CAT.
CAT
A
AB
B
ABCD
CB
.
.
.
and so on.
We need to write a SAS program to introduce commas in between each character of the string if the length of the string is more than 1. I used the length() function and a do loop to create different variables, and it just got messy. How do I tackle this?
Regular expression solution:
data have;
input CAT $;
datalines;
A
AB
B
ABCD
CB
;;;;
run;
data want;
set have;
cat_c = prxchange('s/(?<=[[:alpha:]])([[:alpha:]])/,$1/io',-1,CAT);
put cat_c=;
run;
The first parenthetical group is a look-behind for an alpha character; then the captured alpha character. Then replace with comma and character. If you want something other than [[:alpha:]] (ie, A-Z) then supply that as a class.
The solution using length and do loop isn't bad, honestly, if you want something that is more readable to novice programmers. Just use SUBSTR left of the equal sign.
data want2;
set have;
length cat_c $15; /* CAT is $8, so at most 8 characters plus 7 commas */
if length(cat) > 1 then
do _t = 1 to length(cat)-1;
substr(cat_c,2*_t-1,2)=substr(cat,_t,1)||',';
end;
substr(cat_c,2*length(cat)-1,1)=substr(cat,length(cat),1);
put cat_c=;
run;