How to extract a value from a character column in PROC SQL in SAS Enterprise Guide?

I have a table in SAS like the one below.
Data types:
ID - numeric
TYPE - character
ID | TYPE
--- |-----------------------------------
111 | {"Top":10}
222 | {"abC":"false","Top":111, "aa":5}
333 | {"Top":15.2}
... | ...
I need to create a new column "COL1" based on column "TYPE", holding the value that follows "Top": in the string.
So I need something like below:
ID | TYPE | COL1
--- |-----------------------------------|------
111 | {"Top":10} | 10
222 | {"abC":"false","Top":111, "aa":5} | 111
333 | {"Top":15.2} | 15.2
... | ... | ...
How can I do that in PROC SQL in SAS Enterprise Guide?

Perhaps use the following
data want;
set have;
length col1 $50.;
col1 = strip(scan(tranwrd(type,'"Top":','|'),2,'|},'));
run;
Use the tranwrd() function to replace all occurrences of "Top": in type by a vertical bar |.
Use the scan() function to extract the substring between the vertical bar | and the closing bracket } or comma ,.
id type col1
111 {"Top":10} 10
222 {"abC":"false","Top":111,"aa":5} 111
333 {"Top":15.2} 15.2
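Since the question asks for PROC SQL, the same tranwrd()/scan() expression can also be placed directly in a query. This is only a minimal sketch: it assumes the input table is called have, that every TYPE value contains "Top": and that a numeric COL1 is wanted. The 32. informat and the output name want_sql are my own choices, not part of the original answer.

```sas
/* Sample data taken from the question */
data have;
  infile datalines truncover;
  input ID TYPE $50.;
datalines;
111 {"Top":10}
222 {"abC":"false","Top":111, "aa":5}
333 {"Top":15.2}
;
run;

proc sql;
  create table want_sql as
  select ID,
         TYPE,
         /* Replace "Top": with |, take the text up to the next } or comma,
            then convert that text to a number */
         input(strip(scan(tranwrd(TYPE, '"Top":', '|'), 2, '|},')), 32.) as COL1
  from have;
quit;
```

Drop the input() call if a character COL1 is preferred, as in the data step version.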

A regex-based solution:
data want;
set have;
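/* Compile the regular expression; the optional group also matches decimals such as 15.2 */
regexid = prxparse('/(?<="Top":)\d+(?:\.\d+)?/');
/* Find the position and length of the match (pos = 0 when nothing matches) */
call prxsubstr(regexid, type, pos, len);
/* Extract the match and convert it to a number */
if pos > 0 then COL1 = input(substr(type, pos, len), 32.);
drop regexid pos len;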
run;
You can test this out here:
https://regex101.com/r/0RDsCJ/1

Looks like your TYPE variable has JSON strings. If the strings are as simple as your example you can just use nested SCAN() function calls to tease out the values.
So first let's convert your data listing into actual data.
data have;
input ID TYPE $50.;
cards4;
111 {"Top":10}
222 {"abC":"false","Top":111, "aa":5}
333 {"Top":15.2}
;;;;
Now let's process the JSON text to pull out the name:value pairs. And when the name is Top let's convert the value into a number.
data want;
set have;
do index=1 to countw(type,'{,}','q');
pair = scan(type,index,'{,}','q');
name = dequote(scan(pair,1,':','q'));
value = dequote(scan(pair,2,':','q'));
if name='Top' then number=input(value,32.);
end;
drop index pair name value;
run;
Note that if Top appears multiple times in your JSON text the last value seen will be used.
Results:
OBS ID TYPE number
1 111 {"Top":10} 10.0
2 222 {"abC":"false","Top":111, "aa":5} 111.0
3 333 {"Top":15.2} 15.2

Related

How to count the occurrences of each value per column in a table in SAS Enterprise Guide?

I have a table in SAS Enterprise Guide like below:
COL1 | COL2 | COL3
-----|-------|------
111 | A | C
111 | B | C
222 | A | D
333 | A | D
I need to aggregate the above table to find out how many times each value occurs in each column, so as to get something like below:
COL2_A | COL2_B | COL3_C | COL3_D
--------|--------|--------|--------
3 | 1 | 2 | 2
Because:
COL2_A = 3, because the value "A" occurs 3 times in COL2
and so on...
How can I do that in SAS Enterprise Guide or in PROC SQL?
I need the output as a SAS dataset.
Try this
data have;
input COL1 COL2 $ COL3 $;
datalines;
111 A C
111 B C
222 A D
333 A D
;
data long;
set have;
array col COL2 COL3;
do over col;
c = col;
n = cats(vname(col), '_', c);
output;
end;
run;
proc summary data = long nway;
class n;
output out = freq(drop = _TYPE_);
run;
proc transpose data = freq out = wide_freq(drop = _:);
id n;
run;
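If the set of values is small and known in advance, a PROC SQL alternative is to sum Boolean comparisons, since SAS evaluates a true comparison to 1 and a false one to 0. The sketch below hard-codes the four values from the example, which is an assumption on my part; the data-driven approach above is the better fit when the values are not known beforehand. The output name wide_freq_sql is hypothetical.

```sas
/* Sample data from the question */
data have;
  input COL1 COL2 $ COL3 $;
datalines;
111 A C
111 B C
222 A D
333 A D
;
run;

proc sql;
  create table wide_freq_sql as
  select sum(COL2 = 'A') as COL2_A,  /* each true comparison contributes 1 */
         sum(COL2 = 'B') as COL2_B,
         sum(COL3 = 'C') as COL3_C,
         sum(COL3 = 'D') as COL3_D
  from have;
quit;
```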

Extracting row with highest value in a column while also calculating averages by group

I have been tasked with taking the following data and creating two permanent data sets from it. One of these permanent data sets is supposed to contain the average of the "value" column for each group (meaning there should only be four rows in the end, with a new column that represents the average of respective values for A, B, C, and D). Averages should exclude missing values, meaning that if category A has a missing value, its sum should be divided by 3, not 4. The second permanent data set needs to be the one row with the highest overall value in the "value" column (in this case, the row with D 18JUL2021 951 should be the only row exported). I am having a tough time extracting that single row for the second data set. If you know of a way to perform these operations simultaneously, please let me know. Thank you for your time!
Example data:
data work.have;
input type $ date DATE9. value;
datalines;
A 08JUL2021 .
A 09JUL2021 20
A 20JUL2021 55
A 20JUL2021 2
B 02JUL2021 9
B 22JUL2021 6
B 04JUL2021 8
B 07JUL2021 406
C 01JUL2021 215
C 28JUL2021 63
C 30JUL2021 78
C 21JUL2021 80
D 18JUL2021 951
D 09JUL2021 .
D 14JUL2021 54
D 08JUL2021 73
;
Here is what I tried:
data mylib.data1(keep=type date value value_avg) mylib.data2;
set work.have;
by type;
if value ne . then NotMissing=1; else NotMissing=0;
if first.type then call missing(of value_avg);
value_avg+value;
if first.type then call missing(of num_per_cat);
num_per_cat+NotMissing;
Avg=divide((value_avg+value),(num_per_cat+NotMissing));
if last.type then output mylib.data1;
run;
This was successful for me with calculating averages, but I have no idea how to extract the row with the highest value in the "value" column to a second data set.
data work.have;
input type $ date DATE9. value;
datalines;
A 08JUL2021 .
A 09JUL2021 20
A 20JUL2021 55
A 20JUL2021 2
B 02JUL2021 9
B 22JUL2021 6
B 04JUL2021 8
B 07JUL2021 406
C 01JUL2021 215
C 28JUL2021 63
C 30JUL2021 78
C 21JUL2021 80
D 18JUL2021 951
D 09JUL2021 .
D 14JUL2021 54
D 08JUL2021 73
;
proc summary data = have nway;
class type;
var value;
output out = want_mean(drop = _:) mean = ;
run;
proc summary data = have nway;
class type;
var value;
output out = want_max(drop = _:) max = ;
run;
Both data sets are easily created with PROC SQL.
First one:
proc sql;
create table want1 as
select distinct type, max(value) as Max_value, mean(value) as Average_value
from have
group by type
;
quit;
Second one:
proc sql;
create table want2 as
select *
from have
having value = max(value)
;
quit;
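The question also asks whether both results can be produced simultaneously. One possible sketch is a single DATA step with two output data sets. It assumes the data are sorted by type; the names type_avg, top_row, sum_value, n_value and the top_* variables are hypothetical, and WORK is used here in place of the permanent library mylib from the question. If several rows tie for the maximum, only the first one is kept.

```sas
data work.have;
  input type $ date :date9. value;
  format date date9.;
datalines;
A 08JUL2021 .
A 09JUL2021 20
A 20JUL2021 55
A 20JUL2021 2
B 02JUL2021 9
B 22JUL2021 6
B 04JUL2021 8
B 07JUL2021 406
C 01JUL2021 215
C 28JUL2021 63
C 30JUL2021 78
C 21JUL2021 80
D 18JUL2021 951
D 09JUL2021 .
D 14JUL2021 54
D 08JUL2021 73
;
run;

/* BY-group processing needs the rows ordered by type */
proc sort data=work.have;
  by type;
run;

data work.type_avg(keep=type value_avg)
     work.top_row(keep=top_type top_date top_value
                  rename=(top_type=type top_date=date top_value=value));
  set work.have end=last_obs;
  by type;
  length top_type $8;
  retain top_type top_date top_value;
  format top_date date9.;

  /* Reset the running totals at the start of each type */
  if first.type then do;
    sum_value = 0;
    n_value = 0;
  end;

  if not missing(value) then do;
    /* Sum statements keep their variables across rows */
    sum_value + value;
    n_value + 1;
    /* Remember the row holding the largest value seen so far */
    if missing(top_value) or value > top_value then do;
      top_type = type;
      top_date = date;
      top_value = value;
    end;
  end;

  /* One row per type, averaged over non-missing values only */
  if last.type then do;
    if n_value > 0 then value_avg = sum_value / n_value;
    output work.type_avg;
  end;

  /* After the final row, write the single overall maximum */
  if last_obs then output work.top_row;
run;
```

The trade-off against the PROC SQL version is more code in exchange for a single pass over the data.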

Automatically replace outlying values with missing values

Suppose the data set have contains various outliers which have been identified in an outliers data set. These outliers need to be replaced with missing values, as demonstrated below.
Have
Obs group replicate height weight bp cholesterol
1 1 A 0.406 0.887 0.262 0.683
2 1 B 0.656 0.700 0.083 0.836
3 1 C 0.645 0.711 0.349 0.383
4 1 D 0.115 0.266 666.000 0.015
5 2 A 0.607 0.247 0.644 0.915
6 2 B 0.172 333.000 555.000 0.924
7 2 C 0.680 0.417 0.269 0.499
8 2 D 0.787 0.260 0.610 0.142
9 3 A 0.406 0.099 0.263 111.000
10 3 B 0.981 444.000 0.971 0.894
11 3 C 0.436 0.502 0.563 0.580
12 3 D 0.814 0.959 0.829 0.245
13 4 A 0.488 0.273 0.463 0.784
14 4 B 0.141 0.117 0.674 0.103
15 4 C 0.152 0.935 0.250 0.800
16 4 D 222.000 0.247 0.778 0.941
Want
Obs group replicate height weight bp cholesterol
1 1 A 0.4056 0.8870 0.2615 0.6827
2 1 B 0.6556 0.6995 0.0829 0.8356
3 1 C 0.6445 0.7110 0.3492 0.3826
4 1 D 0.1146 0.2655 . 0.0152
5 2 A 0.6072 0.2474 0.6444 0.9154
6 2 B 0.1720 . . 0.9241
7 2 C 0.6800 0.4166 0.2686 0.4992
8 2 D 0.7874 0.2595 0.6099 0.1418
9 3 A 0.4057 0.0988 0.2632 .
10 3 B 0.9805 . 0.9712 0.8937
11 3 C 0.4358 0.5023 0.5626 0.5799
12 3 D 0.8138 0.9588 0.8293 0.2448
13 4 A 0.4881 0.2731 0.4633 0.7839
14 4 B 0.1413 0.1166 0.6743 0.1032
15 4 C 0.1522 0.9351 0.2504 0.8003
16 4 D . 0.2465 0.7782 0.9412
The "get it done" approach is to manually enter each variable/value combination in a conditional which replaces with missing when true.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
;
run;
data outliers;
input parameter :$11. group replicate $ measurement;
datalines;
cholesterol 3 A 111
height 4 D 222
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
EDIT: Updated outliers so that parameter avoids truncation and changed measurement to be numeric type so as to match the corresponding height, weight, bp, cholesterol. This shouldn't change the responses.
data want;
set have;
if group = 3 and replicate = 'A' and cholesterol = 111 then cholesterol = .;
if group = 4 and replicate = 'D' and height = 222 then height = .;
if group = 2 and replicate = 'B' and weight = 333 then weight = .;
if group = 3 and replicate = 'B' and weight = 444 then weight = .;
if group = 2 and replicate = 'B' and bp = 555 then bp = .;
if group = 1 and replicate = 'D' and bp = 666 then bp = .;
run;
This, however, doesn't utilize the outliers data set. How can the replacement process be made automatic?
I immediately think of the IN= operator, but that won't work. It's not the entire row which needs to be matched. Perhaps an SQL key matching approach would work? But to match the key, don't I need to use a where statement? I'd then effectively be writing everything out manually again. I could probably create macro variables which contain the various if or where statements, but that seems excessive.
I don't think generating statements is excessive in this case. The complexity arises here because your outlier dataset cannot be merged easily since the parameter values represent variable names in the have dataset. If it is possible to reorient the outliers dataset so you have a 1 to 1 merge, this logic would be simpler.
Let's assume you cannot. There are a few ways to use a variable in a dataset that corresponds to a variable in another.
You could use an array like array params{*} height -- cholesterol; and then use the vname function as you loop through the array to compare to the value in the parameter variable, but this gets complicated in your case because you have a one to many merge, so you would have to retain the replacements and only output the last record for each by group... so it gets complicated.
You could transpose the outliers data using proc transpose, but that will get lengthy because you will need a transpose for each parameter, and then you'd need to merge all the transposed datasets back to the have dataset. My main issue with this method is that code with a bunch of transposes like that gets unwieldy.
You could create the macro variable logic that you think might be excessive. But compared to the other ways of getting the values of the parameter variable to match up with the variable names in the have dataset, I don't think something like this is excessive:
data _null_;
set outliers;
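/* cats() removes the leading blanks that "outlierstatement"||_n_ would put into the macro variable name */
call symputx(cats("outlierstatement",_n_),"if group = "||group||" and replicate = '"||replicate||"' and "||parameter||" = "||measurement||" then "|| parameter ||" = .;");
call symputx("outliercount",_n_);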
run;
%macro makewant();
data want;
set have;
%do i = 1 %to &outliercount;
&&outlierstatement&i;
%end;
run;
%mend;
%makewant()
Lorem:
Transposition is the key to a fully automatic programmatic approach. The transposition that will occur is of the filter data, not the original data. The transposed filter data will have fewer rows than the original. As John indicated, transposition of the want data can create a very tall table and has to be transposed back after applying the filters.
As to the filter data, the presence of a filter row for a specific group, replicate and parameter should be enough to mark a cell for filtering. This is on the presumption that you have a system for automatic outlier detection and the filter values will always be in concordance with the original values.
So, what has to be done to automate the filter application process without code-generating a wall of test-and-assign statements?
1. Transpose the filter data into the same form as the want data; call it Filter^.
2. Merge Want and Filter^ by record key (the BY group of Group and Replicate).
3. Array-process the data elements, looking for filtering conditions.
For your consideration, try the following SAS code. There is an erroneous filter record added to the mix.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
5 E 222 0.2465 0.7782 0.9412 /* test record for filter value misalignment test */
;
run;
data outliers;
length parameter $32; %* <--- widened parameter so it can be transposed into a column via id;
input parameter $ group replicate $ measurement ; %* <--- changed measurement to numeric variable;
datalines;
cholesterol 3 A 111
height 4 D 222
height 5 E 223 /* test record for filter value misalignment test */
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
data want;
set have;
if group = 3 and replicate = 'A' and cholesterol = 111 then cholesterol = .;
if group = 4 and replicate = 'D' and height = 222 then height = .;
if group = 2 and replicate = 'B' and weight = 333 then weight = .;
if group = 3 and replicate = 'B' and weight = 444 then weight = .;
if group = 2 and replicate = 'B' and bp = 555 then bp = .;
if group = 1 and replicate = 'D' and bp = 666 then bp = .;
run;
/* Create a view with 1st row having all the filtered parameters
* This is necessary so that the first transposed filter row
* will have the parameters as columns in alphabetic order;
*/
proc sql noprint;
create view outliers_transpose_ready as
select distinct parameter from outliers
union
select * from outliers
order by group, replicate, parameter
;
/* Generate an alphabetically ordered list of parameters for use
* as a variable (aka column) list in the filter application step */
select distinct parameter
into :parameters separated by ' '
from outliers
order by parameter
;
quit;
%put NOTE: &=parameters;
/* Transpose the filter data.
* The ID statement pivots row data into column names.
* The prefix=_filter_ ensures the new column names
* will not collide with the original data, and can be
* listed with the _filter_: shortcut in an array statement.
*/
proc transpose data=outliers_transpose_ready out=outliers_apply_ready prefix=_filter_;
by group replicate notsorted;
id parameter;
var measurement;
run;
/* Robust production code should contain a bin for
* data that does not conform to the filter application conditions
*/
data
want2(label="Outlier filtering applied" drop=_i_ _filter_:)
want2_warnings(label="Outlier filtering: misaligned values")
;
merge have outliers_apply_ready(keep=group replicate _filter_:);
by group replicate;
/* The arrays are for like named columns
* due to the alphabetic ordering enforced in data and codegen preparation
*/
array value_filter_check _filter_:;
array value &parameters;
if group ne .;
do _i_ = 1 to dim(value);
if value(_i_) EQ value_filter_check(_i_) then
value(_i_) = .;
else
if not missing(value_filter_check(_i_)) AND
value(_i_) NE value_filter_check(_i_)
then do;
put 'WARNING: Filtering expected but values do not match. ' group= replicate= value(_i_)= value_filter_check(_i_)=;
output want2_warnings;
end;
end;
output want2;
run;
Confirm your want and automated want2 agree.
proc compare noprint data=want compare=want2 outnoequal out=diffs;
by group replicate;
run;
Enjoy your SAS
You could use a hash table. Load a hash table with the outlier dataset, with parameter-group-replicate defined as the key. Then read in the data, and as you read each record, check each of the variables to see if that combination of parameter-group-replicate can be found in the hash table. I think below works (I'm no hash expert):
data want;
if 0 then set outliers (keep=parameter group replicate);
if _N_ = 1 then
do;
declare hash h(dataset:'outliers') ;
h.defineKey('parameter', 'group', 'replicate') ;
h.defineDone() ;
end;
set have ;
array vars {*} height weight bp cholesterol ;
do i=1 to dim(vars);
parameter=vname(vars{i});
if h.check()=0 then call missing(vars{i});
end;
drop i parameter;
run;
I like @John's suggestion:
You could use an array like array params{*} height -- cholesterol; and
then use the vname function as you loop through the array to compare
to the value in the parameter variable, but this gets complicated in
your case because you have a one to many merge, so you would have to
retain the replacements and only output the last record for each by
group... so it gets complicated.
Generally in a one to many merge I would avoid recoding variables from the dataset that is unique, because variables are retained within BY groups. But in this case, it works out well.
proc sort data=outliers;
by group replicate;
run;
data want (keep=group replicate height weight bp cholesterol);
merge have (in=a)
outliers (keep=group replicate parameter in=b)
;
by group replicate;
array vars {*} height weight bp cholesterol ;
do i=1 to dim(vars);
if vname(vars{i})=parameter then call missing(vars{i});
end;
if last.replicate;
run;
Thank you @John for providing a proof of concept. My implementation is a little different and, I think, worth a separate entry for posterity. I went with a macro variable approach because I find it the most intuitive, being a simple text replacement. However, since a macro variable can hold at most 65,534 characters, it is conceivable that there could be enough outliers to exceed this limit. In such a case, any of the other solutions would make fine alternatives. Note that it is important that the put() function use a wide format such as best32.; too short a width will truncate the value.
If you want a dataset containing the if statements (perhaps for verification), simply remove the into : clause and add a create table statements as line at the beginning of the select statement in the PROC SQL step.
data have;
input group replicate $ height weight bp cholesterol;
datalines;
1 A 0.4056 0.8870 0.2615 0.6827
1 B 0.6556 0.6995 0.0829 0.8356
1 C 0.6445 0.7110 0.3492 0.3826
1 D 0.1146 0.2655 666 0.0152
2 A 0.6072 0.2474 0.6444 0.9154
2 B 0.1720 333 555 0.9241
2 C 0.6800 0.4166 0.2686 0.4992
2 D 0.7874 0.2595 0.6099 0.1418
3 A 0.4057 0.0988 0.2632 111
3 B 0.9805 444 0.9712 0.8937
3 C 0.4358 0.5023 0.5626 0.5799
3 D 0.8138 0.9588 0.8293 0.2448
4 A 0.4881 0.2731 0.4633 0.7839
4 B 0.1413 0.1166 0.6743 0.1032
4 C 0.1522 0.9351 0.2504 0.8003
4 D 222 0.2465 0.7782 0.9412
;
run;
data outliers;
input parameter :$11. group replicate $ measurement;
datalines;
cholesterol 3 A 111
height 4 D 222
weight 2 B 333
weight 3 B 444
bp 2 B 555
bp 1 D 666
;
run;
proc sql noprint;
select
cat('if group = '
, strip(put(group, best32.))
, " and replicate = '"
, strip(replicate)
, "' and "
, strip(parameter)
, ' = '
, strip(put(measurement, best32.))
, ' then '
, strip(parameter)
, ' = . ;')
into : listIfs separated by ' '
from outliers
;
quit;
%put %quote(&listIfs);
data want;
set have;
&listIfs;
run;

Transform numbers with 0 values at the beginning

I have the following dataset:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
;
PROC PRINT; RUN;
I want to link this data to another table but the thing is that the numbers in the other table are stored in the following format: 0012, 0023, 0023.
So I am looking for a way to do the following:
Check how long the number is
If length = 1, add 3 0 values to the beginning
If length = 2, add 2 0 values to the beginning
Any thoughts on how I can get this working?
Numbers are numbers so if the other table has the field as a number then you don't need to do anything. 13 = 0013 = 13.00 = ....
If the other table actually has a character variable then you need to convert one or the other.
char_number = put(number, Z4.);
number = input(char_number, 4.);
You can use the Zw. format (here z4.) to accomplish this:
DATA survey;
INPUT zip_code number;
DATALINES;
1212 12
1213 23
1214 23
9999 999
8888 8
;
data survey2;
set survey;
number_long = put(number, z4.);
run;
If number is stored as a character variable and you need it to be four characters long, convert it to a number first and then apply the format:
want = put(input(number,best32.),z4.);
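Because the goal is to link the two tables, the conversion can also be done directly in the join condition. The following is a rough sketch under assumptions: the second table (called other here, with columns code and descr) is hypothetical, and its key is assumed to be a four-character, zero-padded character variable.

```sas
data survey;
  input zip_code number;
datalines;
1212 12
1213 23
1214 23
;
run;

/* Hypothetical lookup table keyed by a zero-padded character code */
data other;
  input code :$4. descr :$10.;
datalines;
0012 first
0023 second
;
run;

proc sql;
  create table linked as
  select s.zip_code, s.number, o.descr
  from survey as s
  left join other as o
    /* put() with z4. turns 12 into '0012' so the keys compare as text */
    on put(s.number, z4.) = o.code;
quit;
```

If the other table stores the key as a number instead, no conversion is needed and the join can compare the two numeric columns directly.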

How can I find and replace specific text in a SAS data set?

I have a data set with 400 observations of 4 digit codes which I would like to pad with a space on both sides
ex. Dataset
obs code
1 1111
2 1112
3 3333
.
.
.
400 5999
How can I go through another large data set and replace every occurrence of any of the padded 400 codes with a " ".
ex. Large Dataset
obs text
1 abcdef 1111 abcdef
2 abcdef 1111 abcdef 1112 8888
3 abcdef 1111 abcdef 11128888
...
Data set that I want
ex. New Data set
obs text
1 abcdef abcdef
2 abcdef abcdef 8888
3 abcdef abcdef 11128888
...
Note: I'm only looking to replace 4 digit codes that are padded on both sides by a space. So in obs 3, 1112 won't be replaced.
I've tried doing the following proc sql statement, but it only finds and replaces the first match, instead of all the matches.
proc sql;
select
*,
tranwrd(large_dataset.text, trim(small_dataset.code), ' ') as new_text
from large_dataset
left join small_dataset
on findw(large_dataset.text, trim(small_dataset.code))
;
quit;
You could just use a DO loop to scan through the small dataset of codes for each record in the large dataset. If you want to use TRANWRD() function then you will need to add extra space characters.
data want ;
set have ;
length code $4 ;
do i=1 to nobs while (text ne ' ');
set codes(keep=code) nobs=nobs point=i ;
text = substr(tranwrd(' '||text,' '||code||' ',' '),2);
end;
drop code;
run;
The DO loop will read the records from your CODES list. Using the POINT= option on the SET statement lets you read the file multiple times. The WHILE clause will stop if the TEXT string is empty since there is no need to keep looking for codes to replace at that point.
If your list of codes is small enough and you can get the right regular expression then you might try using PRXCHANGE() function instead. You can use an SQL step to generate the codes as a list that you can use in the regular expression.
proc sql noprint ;
select code into :codelist separated by '|'
from codes
;
quit;
data want ;
set have ;
text=prxchange("s/\b(&codelist)\b/ /",-1,text);
run;
There might be more efficient ways of doing this, but this seems to work fairly well:
/*Create test datasets*/
data codes;
input code;
cards;
1111
1112
3333
5999
;
run;
data big_dataset;
infile cards truncover;
input text $100.;
cards;
abcdef 1111 abcdef
abcdef 1111 abcdef 1112 8888
abcdef 1111 abcdef 11128888
;
run;
/*Get the number of codes to use for array definition*/
data _null_;
set codes(obs = 1) nobs = nobs;
call symput('ncodes',nobs);
run;
%put ncodes = &ncodes;
data want;
set big_dataset;
/*Define and populate array with padded codes*/
array codes{&ncodes} $6 _temporary_;
if _n_ = 1 then do i = 1 to &ncodes;
set codes;
codes[i] = cat(' ',put(code,4.),' ');
end;
do i = 1 to &ncodes;
text = tranwrd(text,codes[i],' ');
end;
drop i code;
run;
I expect a solution using prxchange is also possible, but I'm not sure how much work it is to construct a regex that matches all of your codes compared to just substituting them one by one.
Taking Tom's solution and putting the code-lookup into a hash-table. Thereby the dataset will only be read once and the actual lookup is quite fast. If the Large Dataset is really large this will make a huge difference.
data want ;
if _n_ = 1 then do;
length code $4 ;
declare hash h(dataset:"codes (keep=code)") ;
h.defineKey("code") ;
h.defineDone() ;
call missing (code);
declare hiter hiter('h') ;
end;
set big_dataset ;
rc = hiter.first() ;
do while (rc = 0 and text ne ' ') ;
text = substr(tranwrd(' '||text,' '||code||' ',' '),2) ;
rc = hiter.next() ;
end ;
drop code rc ;
run;
Use an array and a regular expression:
proc transpose data=codes out=temp;
var code;
run;
data want;
if _n_=1 then set temp;
array var col:;
set big_dataset;
do over var;
text = prxchange(cats('s/\b',var,'\b//'),-1,text);
end;
drop col: _name_;
run;