Search and replace not working in SAS as expected

I am trying to replace the phrases in list1 with those in list2 (basically translating from one language to another), but the search and replace function is not working as expected. It only replaces part of the string, not the complete phrase. Am I doing something incorrect?
Output I am getting after running the code below:
names_list names_list_french i
My name is Craig Matthews, My name is Donald Dunn Je m'appis Craig Matthews, Je m'appis Donald Dunn 3
Expected Output:
names_list names_list_french i
My name is Craig Matthews, My name is Donald Dunn Je m'appelle Craig Matthews, Je m'appelle Donald Dunn 3
SAS CODE:
data datatable;
names_list="My name is Craig Matthews, My name is Donald Dunn";
array list1{2} $ _temporary_ (
"My name is Craig Matthews",
"My name is Donald Dunn"
);
array list2{2} $ _temporary_ (
"Je m'appelle Craig Matthews",
"Je m'appelle Donald Dunn"
);
names_list_french=names_list;
put names_list= names_list_french=;
do i=1 to dim(list1);
put list1{i}= list2{i}=;
names_list_french=tranwrd(names_list_french,list1{i},list2{i});
end;
put names_list= names_list_french=;
run;

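SAS stores character values as fixed-length strings padded with trailing spaces.
If you declare a character array with $ but no length, each element defaults to a length of 8, so your search and replacement strings are truncated to "My name " and "Je m'app". That is why you get "Je m'appis".
So define the lengths of your variables and arrays, and use TRIM() when passing the strings to the TRANWRD() function so that the trailing spaces are not treated as part of the search text.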
data datatable;
length names_list $100;
names_list="My name is Craig Matthews, My name is Donald Dunn";
array list1{2} $100 _temporary_ (
"My name is Craig Matthews",
"My name is Donald Dunn"
);
array list2{2} $100 _temporary_ (
"Je m'appelle Craig Matthews",
"Je m'appelle Donald Dunn"
);
names_list_french=names_list;
put names_list= names_list_french=;
do i=1 to dim(list1);
put list1{i}= list2{i}=;
names_list_french=tranwrd(names_list_french,trim(list1{i}),trim(list2{i}));
end;
put names_list= names_list_french=;
run;
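To see the truncation for yourself, here is a minimal sketch that compares a character array declared with and without a length. The array names a8 and a100 and the variables short_value and full_value are made up for this illustration.

```sas
data _null_;
  /* $ with no length: each element defaults to 8 bytes */
  array a8{1} $ _temporary_ ("Je m'appelle Craig Matthews");
  /* explicit $100: the whole initial value is kept */
  array a100{1} $100 _temporary_ ("Je m'appelle Craig Matthews");
  length short_value $100 full_value $100;
  short_value = a8{1};
  full_value = a100{1};
  /* write both values to the log for comparison */
  put short_value= / full_value=;
run;
```

The log should show short_value cut off at Je m'app, which is the fragment that ends up in Je m'appis.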

Related

How to sort address alphabetically in SAS?

I have a dataset that has a bunch of addresses. I sort it like this:
PROC SORT DATA=work68;
by ADDRESS ;
run;
However, the ADDRESS column comes out like this; the sort seems to consider only the first digits of each address:
2237 Strang Avenue
2932 Ely Avenue
3306 Wilson Ave
3313 Wilson Avenue
3313 Wilson Avenue
3313 Wilson Avenue
46 Nuvern Avenue
You can use the option SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON) to ask SAS to try and sort numeric values as if they were numbers.
PROC SORT DATA=work68 sortseq=linguistic(numeric_collation=on);
by ADDRESS ;
run;
If I understand correctly what you're asking, you could try creating a new address column with all digits removed and sort on that:
data have;
infile cards truncover;
input address $100.;
cards;
1107 Huichton Rd.
1111 Ely Avenue
;
run;
data v_have /view = v_have;
set have;
address_nonumbers = strip(compress(address,,'d'));
run;
proc sort data = v_have out = want;
by address_nonumbers;
run;
Proc SQL syntax can sort data in special ways, ORDER BY <computation-1>, …, <computation-N>
You may want to sort by street names first, and then by numeric premise identifier (house number). For example
Data
data have; input; address=_infile_;datalines;
2237 Strang Avenue
2932 Ely Avenue
3306 Wilson Ave
3313 Wilson Avenue
46 Nuvern Avenue
3313 Ely Avenue
4494 Nuvern Avenue
run;
Sort on street name, then house number
proc sql;
create table want as
select *
from have
order by
compress (address,,'ds') /* ignore digits and spaces - presume to be street name */
, input (scan(address,1),? best12.) /* house number */
;
quit;
This example makes simplifying assumptions and will not properly sort address constructs such as #### ##th Street.

How to transpose Table in a specific way

This is some example data. The real data is more complex, with other fields, about 40000 observations and up to 180 values per id (I know that I will get 360 columns in the transposed table, but that's OK):
Data have;
input lastname $ firstname $ value;
datalines;
miller george 47
miller george 45
miller henry 44
miller peter 45
smith peter 42
smith frank 46
;
run;
I want to transpose it so that I have lastname, followed by alternating firstname and value columns for every row matching that lastname.
data want:
Lastname Firstname1 Value1 Firstname2 value2 Firstname3 Value3 firstname4 value4
miller george 47 george 45 henry 44 peter 45
smith peter 42 frank 46
I tried a bit with PROC TRANSPOSE, but I was not able to build the table exactly the way I want it, as described above. I need the want table in exactly that layout (the real data is more complex and has other fields), so please no answers that propose a want table with a different layout.
PROC SUMMARY has a very useful option for this: IDGROUP on the OUTPUT statement. You need to specify how many values to keep per lastname, so I've included a step to calculate the maximum number.
Data have;
input lastname $ firstname $ value;
datalines;
miller george 47
miller george 45
miller henry 44
miller peter 45
smith peter 42
smith frank 46
;
run;
/* get frequency count of lastnames */
proc freq data=have noprint order=freq;
table lastname / out=name_freq;
run;
/* store maximum into a macro variable (first record will be the highest) */
data _null_;
set name_freq (obs=1);
call symputx('max_num',count); /* symputx converts the number without padding or a conversion note */
run;
%put &max_num.;
/* transpose data using proc summary */
proc summary data=have nway;
class lastname;
output out=want (drop=_:)
idgroup(out[&max_num.] (firstname value)=) / autoname;
run;

What's the easiest way to get SAS to do this?

I have a dataset that looks like this but with many, many more variable pairs:
Stuff2016 Stuff2008 Earth2016 Earth2008 Fire2016 Fire2008
123456 5646743 45 456 456 890101
541351 543534534 45 489 489 74456
352352 564889 98 489489 1231 189
464646 542235423 13 15615 1561 78
987654 4561889 44 1212 12121 111
For each pair of almost identically named variables,
I want SAS to subtract 2016 data - 2008 data without typing the variable names.
What's the easiest way to tell SAS to do this without having to specifically type the variable names? Is there a way to tell it to subtract every other variable minus the one that precedes it without mentioning the specific variable names?
Thanks a lot!!!!
I would probably recommend three arrays but you could do it with one. This highly depends on the order of the variables which isn't a good assumption in my book. Also, how would you name the results automatically?
data want;
set have;
array vars(*) stuff2016--fire2008;
array diffs(*) diffs1-diffs20; *something big enough to hold difference;
do i=1 to dim(vars)-1 by 2; *step through the variables one pair at a time;
diffs((i+1)/2) = vars(i)-vars(i+1);
end;
run;
Instead, I'd highly suggest you use the dictionary tables to query your variable names and dynamically generate your variable lists which are then passed onto three different arrays, one for 2016, one for 2008 and one for the difference. The libname and memname are stored in uppercase in the Dictionary table so keep that in mind.
data have;
input Stuff2016 Stuff2008 Earth2016 Earth2008 Fire2016 Fire2008;
cards;
123456 5646743 45 456 456 890101
541351 543534534 45 489 489 74456
352352 564889 98 489489 1231 189
464646 542235423 13 15615 1561 78
987654 4561889 44 1212 12121 111
;
run;
proc sql;
select name into :var2016 separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2016'
order by name;
select name into :var2008 separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2008'
order by name;
select catx("_", compress(name, ,'d'), "diff") into :vardiff separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2016'
order by name;
quit;
%put &var2016.;
%put &var2008.;
%put &vardiff.;
data want;
set have;
array v2016(*) &var2016;
array v2008(*) &var2008;
array diffs(*) &vardiff;
do i=1 to dim(v2016);
diffs(i)=v2016(i)-v2008(i);
end;
run;

How many observations in the output dataset?

A raw data file is listed below:
RANCH,1250,2,1,Sheppard Avenue, "$64,000"
SPLIT,1190,1,1,Rand Street, "$65,850"
CONDO, 1400,2,1,Market Street, "80,050"
TWOSTORY, 1810,4,3,Garris Street, "$107,250"
RANCH, 1500,3,3,Kemble Avenue, "$86,650"
SPLIT, 1615, 4,3, West Drive, "94,450"
SPLIT, 1305, 3,1.5,Graham Avenue, "$73,650"
The following is the code:
data work.condo_ranch;
infile "file_specification" dsd;
input style $ @;
if style = 'CONDO' or style = 'RANCH' then
input sqfeet bedrooms baths street $ price: dollar10.;
run;
I think the output dataset contains 3 observations, but the correct answer is that it contains 7 observations. Can anyone tell me why? Many thanks for your time and attention.
Why would you expect the output dataset to have only 3 observations? There is an implied OUTPUT statement at the bottom of the DATA step, so every record that is read produces an observation. If you want to output only those records where STYLE IN ("CONDO","RANCH") you could add a conditional OUTPUT, e.g.:
if style = 'CONDO' or style = 'RANCH' then do;
input sqfeet bedrooms baths street $ price: dollar10.;
output;
end;
If you only want to output the records where style is CONDO or RANCH you could just change your THEN to a semi-colon. That would make your IF statement a subsetting IF. So the data step would return at that point and never run the second INPUT or the implied OUTPUT at the end of the step.
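As a sketch of that subsetting IF variant, assuming the question's first two statements were meant to be INFILE with the DSD option and an INPUT with a trailing @ (the file name is the question's placeholder, not a real path):

```sas
data work.condo_ranch;
  infile "file_specification" dsd;   /* placeholder path from the question */
  input style $ @;                   /* read style and hold the record */
  /* subsetting IF: any other style ends this iteration here, */
  /* so neither the second INPUT nor the implied OUTPUT runs  */
  if style = 'CONDO' or style = 'RANCH';
  input sqfeet bedrooms baths street $ price : dollar10.;
run;
```

Only the records that pass the IF reach the bottom of the step, so only those are written to the output dataset.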

Rearranging the order of the text in a character string in SAS?

I have a data set with a character variable called "name". It contains the full name of a person like this:
"firstname middlename lastname".
I want to have the data rearranged so that it becomes:
"lastname, firstname middlename".
I'm not that hardcore in SAS functions, but I have used some of the few I know.
(My code can be seen below).
In the first try (test2) I don't get the result I want - I get:
"lastName , firstName middleName" and not
"lastName, firstName middleName" - my problem is the comma.
So I thought that I would solve my problem by making a new last name variable containing the comma at the end (in test2_new). But I don't get what I want. SAS puts three dots at the end, and not a comma?
I hope someone with more SAS skills than me can answer my question.
Kind Regards
Maria
data have ;
input #1 text & $64. ;
datalines ;
Susan Smith
David A Jameson
Bruce Thomas Forsyth
;
run ;
data want ;
set have ;
lastname = scan(text,-1,' ') ;
firstnames = substr(text,1,length(text)-length(lastname)) ;
newname = catx(', ',lastname,firstnames) ;
run ;
Which gives
text                  lastname  firstnames    newname
Susan Smith           Smith     Susan         Smith, Susan
David A Jameson       Jameson   David A       Jameson, David A
Bruce Thomas Forsyth  Forsyth   Bruce Thomas  Forsyth, Bruce Thomas
Perl regular expressions are a useful tool here, particularly PRXCHANGE. The SAS Support website provides a good example of how to reverse first and last name; here's a slight modification of that code. I've only catered for people with either 2 or 3 names, but it should be fairly simple to expand this if necessary. My code is based on the HAVE dataset created in the answer from @Chris J.
data want;
set have;
if countw(text)=2 then text = prxchange('s/(\w+) (\w+)/$2, $1/', -1, text);
else if countw(text)=3 then text = prxchange('s/(\w+) (\w+) (\w+)/$3, $1 $2/', -1, text);
run;
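One possible way to expand this to any number of names is a single pattern that treats the last word as the last name and everything before it as the first names. This is only a sketch under that assumption. The variable newname and its length are made up for the illustration, and values with a single word are left unchanged because the pattern does not match them.

```sas
data want;
  set have;
  length newname $66;  /* room for the extra comma */
  /* group 1: everything before the last word, group 2: the last word */
  newname = prxchange('s/^\s*(.+)\s+(\S+)\s*$/$2, $1/', 1, text);
run;
```

The trailing \s*$ absorbs the blanks that pad the fixed-length text variable, so no TRIM() is needed before matching.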