Combining data from different rows into one variable - sas

I have a table as below:
id sprvsr phone name
2 123 5232 ali
2 128 5458 ali
3 145 7845 oya
3 125 4785 oya
I would like to combine the rows that share the same id and name into one row, with the sprvsr and phone values joined together, as below:
id sprvsr phone name
2 123-128 5232-5458 ali
3 145-125 7845-4785 oya
Edit:
I have one more question related to this one.
I followed the approach you showed me and it works. Thank you! Another problem is, for example:
sprvsr name
5232-5458 ali
5232-5458 ali
5458-5232 ali
Is there any way to put them in the same order?

If you need the values in the same order, you'll need to use a temporary array and sort it. This requires having some idea of the maximum number of rows you might have per id, and it requires the input to be sorted by id. This is a bit more complicated than the previous solution (in a previous revision).
data have;
input id sprvsr $ phone $ name $;
datalines;
2 123 5232 ali
2 128 5458 ali
3 145 7845 oya
3 125 4785 oya
4 128 5458 ali
4 123 5232 ali
;
run;
data want;
array phones[99] $8 _temporary_; *initialize these two to some reasonably high number;
array sprvsrs[99] $3 _temporary_;
length phone_all sprvsr_all $200; *same;
set have;
by id;
if first.id then do; *for each id, start out clearing the arrays;
call missing(of phones[*] sprvsrs[*]);
_counter=0;
end;
_counter+1; *increment counter;
phones[_counter]=phone; *assign current phone/sprvsr to array elements;
sprvsrs[_counter]=sprvsr;
if last.id then do; *now, create concatenated list and output;
call sortc(of phones[*]); *sort the lists;
call sortc(of sprvsrs[*]);
phone_all = catx('-',of phones[*]); *concatenate them together;
sprvsr_all= catx('-',of sprvsrs[*]);
output;
end;
drop phone sprvsr _counter;
rename
phone_all=phone
sprvsr_all=sprvsr;
run;
The construction array[*] means "All variables of that array". So catx('-',of phones[*]) means put all phones elements in the catx (fortunately, missing ones are ignored by catx).
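To see the sort-then-concatenate behavior in isolation, here is a minimal sketch. The array name p and the variable joined are made up for illustration; the blank element stands in for an unused slot in the temporary array.

```sas
data _null_;
  /* three slots, one left blank to mimic an unused array element */
  array p[3] $4 _temporary_ ('5458', ' ', '5232');
  length joined $20;
  call sortc(of p[*]);          /* blanks sort to the front */
  joined = catx('-', of p[*]);  /* the blank element is skipped */
  put joined=;                  /* writes joined=5232-5458 to the log */
run;
```

Because the blank sorts first and catx skips it, the unused slots never leave a stray delimiter in the result.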

This is a way to do that:
data have;
input id sprvsr $ phone $ name $;
datalines;
2 123 5232 ali
2 128 5458 ali
3 145 7845 oya
3 125 4785 oya
;
run;
data want (drop=lag_sprvsr lag_phone);
format id;
length sprvsr $7 phone $9;
set have;
by id;
lag_sprvsr=lag(sprvsr);
lag_phone=lag(phone);
if lag(id)=id then do;
sprvsr=catx('-',lag_sprvsr,sprvsr);
phone=catx('-',lag_phone,phone);
end;
if last.id then output;
run;
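Just pay attention to the possible lengths of the input variables and of the concatenated string. The input dataset must be sorted by id. Note that this approach only combines two rows per id: lag() returns the value from the previous row as it was read, so with three or more rows per id only the last two values end up in the result.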
The catx() function removes the leading and trailing blanks and concatenates with a delimiter.
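If an id can have more than two rows, a retained accumulator is one alternative to lag(). The following is only a sketch: the names want_retain, sprvsr_all and phone_all are invented here, the length of 200 is an arbitrary guess, and the third row for id 2 is made-up test data.

```sas
data have;
  input id sprvsr $ phone $ name $;
  datalines;
2 123 5232 ali
2 128 5458 ali
2 131 5501 ali
3 145 7845 oya
;
run;

data want_retain (drop=sprvsr phone
                  rename=(sprvsr_all=sprvsr phone_all=phone));
  length sprvsr_all phone_all $200;   /* assumed to be long enough */
  retain sprvsr_all phone_all;        /* keep values across rows */
  set have;
  by id;                              /* input must be sorted by id */
  if first.id then call missing(sprvsr_all, phone_all);
  sprvsr_all = catx('-', sprvsr_all, sprvsr);
  phone_all  = catx('-', phone_all, phone);
  if last.id then output;             /* one row per id */
run;
```

The values are joined in the order the rows arrive, so this does not address the ordering question above.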

Related

All values for only most recent occurrence

I am trying to extract all the Time occurrences for only the recent visit. Can someone help me with the code please.
Here is my data:
Obs Name Date Time
1 Bob 2017090 1305
2 Bob 2017090 1015
3 Bob 2017081 0810
4 Bob 2017072 0602
5 Tom 2017090 1300
6 Tom 2017090 1010
7 Tom 2017090 0805
8 Tom 2017072 0607
9 Joe 2017085 1309
10 Joe 2017081 0815
I need the output as:
Obs Name Date Time
1 Bob 2017090 1305,1015
2 Tom 2017090 1300,1010,0805
3 Joe 2017085 1309
Right now my code is designed to give me only one recent entry:
DATA OUT2;
SET INP1;
BY DATE;
IF FIRST.DATE THEN OUTPUT OUT2;
RETURN;
I would first sort the data by name and date. Then I would transpose and process the results.
proc sort data=have;
by name date;
run;
proc transpose data=have out=temp1;
by name date;
var time;
run;
data want;
set temp1;
by name date;
if last.name;
format value $2000.;
value = catx(',',of col:);
drop col: _name_;
run;
You may want to further process the new VALUE to remove excess commas (,) and missing value .'s.
Very similar to the question yesterday from another user, you can use quite a few solutions here.
SQL again is the easiest; this is not valid ANSI SQL and pretty much only SAS supports this, but it does work in SAS:
proc sql;
select name, date, time
from have
group by name
having date=max(date);
quit;
Even though date and time are not on the group by it's legal in SAS to put them on the select, and then SAS automatically merges (inner joins) the result of select name, max(date) from have group by name having date=max(date) to the original have dataset, returning multiple rows as needed. Then you'd want to collapse the rows, which I leave as an exercise for the reader.
You could also simply generate a table of maximum dates using any method you choose and then merge yourself. This is probably the easiest in practice to use, in particular including troubleshooting.
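A rough sketch of that "build the maximum dates, then merge" approach might look like the following. It assumes date is numeric and reads time as character to keep the leading zeros; the dataset names max_dates and latest are made up for this example.

```sas
data have;
  input name $ date time $;
  datalines;
Bob 2017090 1305
Bob 2017090 1015
Bob 2017081 0810
Tom 2017090 1300
Tom 2017072 0607
Joe 2017085 1309
Joe 2017081 0815
;
run;

proc sort data=have;
  by name;
run;

/* one row per name holding the latest date */
proc summary data=have nway;
  class name;
  var date;
  output out=max_dates(keep=name max_date) max=max_date;
run;

/* keep only the rows that fall on that latest date */
data latest;
  merge have max_dates;
  by name;
  if date = max_date;
run;
```

Since max_dates is an ordinary dataset, you can print it and check the dates before merging, which is what makes this variant easy to troubleshoot.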
The DoW loop also appeals here. This is basically the precise SAS data step implementation of the SQL above. First iterate over that name, figure out the max, then iterate again and output the ones with that max.
proc sort data=have;
by name date;
run;
data want;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then output;
end;
run;
Of course, here you can more easily collapse the rows, too:
data want;
length timelist $1024;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
max_Date = max(max_date,date);
end;
do _n_ = 1 by 1 until (last.name);
set have;
by name;
if date=max_date then timelist=catx(',',timelist,time);
if last.name then output;
end;
run;
If the data is sorted then just retain the first date so you know which records to combine and output.
proc sort data=have ;
by name descending date time;
run;
data want ;
set have ;
by name descending date ;
length timex $200 ;
retain start timex;
if first.name then do;
start=date;
timex=' ';
end;
if date=start then do;
timex=catx(',',timex,time);
if last.date then do;
output;
call missing(start,timex);
end;
end;
drop start time ;
rename timex=time ;
run;

SAS - Survey Select - Selecting Different Sample Size per Stratum

I have a list of financial advisors and I need to pull 4 samples per advisor, but the catch is that within those 4 samples I need to force 2 mortgages, 1 loan, and 1 credit card, let's say.
Is there a way in PROC SURVEYSELECT to set a specific number of samples to pull per stratum? I know you can stratify on one category and set an equal number for each. I was hoping I could use a mapping of employee names plus the number of samples left to pull for each category, and have SURVEYSELECT use that to pull samples dynamically.
I'm using this as an example, but it only stratifies on employee and gives me 4 per employee. I would need to further stratify on product type and set a specific sample size per product.
proc surveyselect data=work.Emp_Table_Final
method=srs n=4 out=work.testsample SELECTALL;
strata Employee_No;
run;
Thanks. I know it might sound complicated, but if I know it's possible then I can google the rest.
Yes, the n option can point to a dataset instead of a number. That dataset must:
Contain the strata variables as well as a variable named SampleSize or _NSIZE_ with the number to select
Define the strata variables with the same type and length as in the input data
Be sorted by the strata variables
Have an entry for every stratum that appears in the input data
See the documentation for more details.
data sample_counts;
length sex $1;
input sex $ _NSIZE_;
datalines;
F 5
M 3
;;;;
run;
proc sort data=sashelp.class out=class;
by sex;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex;
run;
For two strata variables it's the same; you just need both variables in sample_counts. Of course that makes the dataset a lot longer, so you may want to produce it in an automated fashion.
proc sort data=sashelp.class out=class;
by sex age;
run;
data sample_counts;
length sex $1;
input sex $ age _NSIZE_;
datalines;
F 11 1
F 12 1
F 13 1
F 14 1
F 15 1
M 11 1
M 12 1
M 13 1
M 14 1
M 15 1
M 16 0
;;;;
run;
/* or do it in an automated way*/
data sample_counts;
set class;
by sex age; *your strata;
if first.age then do; *do this once per stratum level;
if age le 15 then _NSIZE_ = 1; *whatever your logic is for defining _NSIZE_;
else _NSIZE_=0;
output;
end;
run;
proc surveyselect n=sample_counts method=srs out=samples data=class;
strata sex age;
run;
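Applied to the advisor question, the same idea could be sketched as below. Everything about the data here is an assumption: the toy table work.emp_demo, the variable Product_Type and its three values are invented for illustration, and the seed is arbitrary.

```sas
/* toy data: 2 employees, 3 product types, 5 accounts each */
data work.emp_demo;
  length Product_Type $12;
  do Employee_No = 1 to 2;
    do Product_Type = 'Mortgage', 'Loan', 'Credit Card';
      do account = 1 to 5;
        output;
      end;
    end;
  end;
run;

proc sort data=work.emp_demo out=work.emp_sorted;
  by Employee_No Product_Type;
run;

/* one row per employee/product stratum with its sample size */
data work.sample_counts (keep=Employee_No Product_Type _NSIZE_);
  set work.emp_sorted;
  by Employee_No Product_Type;
  if first.Product_Type;
  if Product_Type = 'Mortgage' then _NSIZE_ = 2;
  else _NSIZE_ = 1;
run;

proc surveyselect data=work.emp_sorted n=work.sample_counts
                  method=srs seed=12345 out=work.testsample;
  strata Employee_No Product_Type;
run;
```

With this toy data each employee ends up with 2 mortgages, 1 loan and 1 credit card. Building sample_counts from the sorted input keeps the strata variables' type and length identical to the input.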

What's the easiest way to get SAS to do this?

I have a dataset that looks like this but with many, many more variable pairs:
Stuff2016 Stuff2008 Earth2016 Earth2008 Fire2016 Fire2008
123456 5646743 45 456 456 890101
541351 543534534 45 489 489 74456
352352 564889 98 489489 1231 189
464646 542235423 13 15615 1561 78
987654 4561889 44 1212 12121 111
For each pair of almost identically named variables,
I want SAS to subtract 2016 data - 2008 data without typing the variable names.
What's the easiest way to tell SAS to do this without having to specifically type the variable names? Is there a way to tell it to subtract every other variable minus the one that precedes it without mentioning the specific variable names?
Thanks a lot!!!!
I would probably recommend three arrays, but you could do it with one. The one-array version depends heavily on the order of the variables in the dataset, which isn't a safe assumption in my book. Also, how would you name the results automatically?
data want;
set have;
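array vars(*) stuff2016--fire2008; *all variables in dataset order: 2016 then 2008 for each pair;
array diffs(*) diffs1-diffs20; *something big enough to hold the differences;
do i=1 to dim(vars)-1 by 2; *step through the pairs, not every adjacent variable;
diffs((i+1)/2) = vars(i)-vars(i+1); *2016 value minus 2008 value;
end;
drop i;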
run;
Instead, I'd highly suggest you use the dictionary tables to query your variable names and dynamically generate your variable lists which are then passed onto three different arrays, one for 2016, one for 2008 and one for the difference. The libname and memname are stored in uppercase in the Dictionary table so keep that in mind.
data have;
input Stuff2016 Stuff2008 Earth2016 Earth2008 Fire2016 Fire2008;
cards;
123456 5646743 45 456 456 890101
541351 543534534 45 489 489 74456
352352 564889 98 489489 1231 189
464646 542235423 13 15615 1561 78
987654 4561889 44 1212 12121 111
;
run;
proc sql;
select name into :var2016 separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2016'
order by name;
select name into :var2008 separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2008'
order by name;
select catx("_", compress(name, ,'d'), "diff") into :vardiff separated by " "
from sashelp.vcolumn
where libname='WORK'
and memname='HAVE'
and name like '%2016'
order by name;
quit;
%put &var2016.;
%put &var2008.;
%put &vardiff.;
data want;
set have;
array v2016(*) &var2016;
array v2008(*) &var2008;
array diffs(*) &vardiff;
do i=1 to dim(v2016);
diffs(i)=v2016(i)-v2008(i);
end;
run;

How to sort by formatted values

proc sort data=sas.mincome;
by F3 F4;
run;
Proc sort doesn't sort the dataset by formatted values, only internal values. I need to sort by two variables prior to a merge. Is there any way to do this with proc sort?
I don't think you can sort by formatted values in proc sort, but you can definitely use a simple proc SQL step to sort a dataset by formatted values. The ORDER BY clause in proc SQL accepts expressions, so you can sort on put(variable, format.) directly.
The general syntax of proc sql for sorting by formatted values will be:
proc sql;
create table NewDataSet as
select variable(s)
from OriginalDataSet
order by put(variable1, format1.), put(variable2, format2.);
quit;
For example, we have a sample data set containing the names, sex and ages of some people and we want to sort them:
proc format;
value gender 1='Male'
2='Female';
value age 10-15='Young'
16-24='Old';
run;
data work.original;
input name $ sex age;
datalines;
John 1 12
Zack 1 15
Mary 2 18
Peter 1 11
Angela 2 24
Jack 1 16
Lucy 2 17
Sharon 2 12
Isaac 1 22
;
run;
proc sql;
create table work.new as
select name, sex format=gender., age format=age.
from work.original
order by put(sex, gender.), put(age, age.);
quit;
Output of work.new will be:
Obs name sex age
1 Mary Female Old
2 Angela Female Old
3 Lucy Female Old
4 Sharon Female Young
5 Jack Male Old
6 Isaac Male Old
7 John Male Young
8 Zack Male Young
9 Peter Male Young
If we had used proc sort by sex, then Males would have been ranked first because we had used 1 to represent Males and 2 to represent Females which is not what we want. So, we can clearly see that proc sql did in fact sort them according to the formatted values (Females first, Males second).
Hope this helps.
Because of the nature of formats, SAS only uses the underlying values for the sort. To my knowledge, you cannot change that (unless you want to build your own translation table via PROC TRANTAB).
What you can do is create a new column that contains the formatted value. Then you can sort on that column.
proc format library=work;
value $test 'z' = 'a'
'y' = 'b'
'x' = 'c';
run;
data test;
format val $test.;
informat val $1.;
input val $;
val_fmt = put(val,$test.);
datalines;
x
y
z
;
run;
proc print data=test(drop=val_fmt);
run;
proc sort data=test;
by val_fmt;
run;
proc print data=test(drop=val_fmt);
run;
Produces
Obs val
1 c
2 b
3 a
Obs val
1 a
2 b
3 c

Creating a single record from multiple records in SAS

I have a SAS Data set called coaches_assistants with the following structure. There are always only two records per TeamID.
TeamID Team_City CoachCode
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
... ... ....
What I'd like to do with this is to create a data set with an extra field called AssistantCode and make it look like:
TeamID Team_City HeadCode AssistantCode
123 Durham 242 876
124 London 876 922
125 Bath 667 786
126 Dover 544 978
... ... ... ...
If possible, I'd like to do this in a single DATA step (though I recognize that I might need a PROC SORT step first). I know how to do it in python or ruby or any traditional scripting languages, but I don't know how to do it in SAS.
What's the best way to do this?
While it's possible to do in one datastep, I generally find that this sort of problem is better served in PROC TRANSPOSE. Less manual coding this way and more flexibility for new things (say a new value "HeadAssistant" appeared, this would instantly work).
data have;
length coachcode $25;
input TeamID Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;;;;
run;
data have_t;
set have;
id=scan(coachcode,1,'_');
val = scan(coachcode,2,'_');
keep teamId team_city id val;
run;
proc transpose data=have_t out=want(drop=_name_);
by teamID team_city;
id id;
var val;
run;
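The transposed columns above come out named Head and Assistant. If you want HeadCode and AssistantCode as in the question, one option is the SUFFIX= option of PROC TRANSPOSE. This is a small sketch on a cut-down copy of have_t, typed in directly so it runs on its own:

```sas
data have_t;
  length id $9 val $3;
  input TeamID Team_City $ id $ val $;
  datalines;
123 Durham Head 242
123 Durham Assistant 876
124 London Head 876
124 London Assistant 922
;
run;

/* suffix=Code appends "Code" to every name taken from the ID variable */
proc transpose data=have_t out=want(drop=_name_) suffix=Code;
  by TeamID Team_City;
  id id;
  var val;
run;
```

A new role such as HeadAssistant would then show up as HeadAssistantCode without further changes.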
Here are two possible solutions (one using a data step as requested and another using PROC SQL):
data have;
length TeamID $3 Team_City CoachCode $20;
input TeamID $ Team_City $ CoachCode $;
datalines;
123 Durham Head_242
123 Durham Assistant_876
124 London Head_876
124 London Assistant_922
125 Bath Head_667
125 Bath Assistant_786
126 Dover Head_544
126 Dover Assistant_978
;
run;
/* A data step solution */
proc sort data=have;
by TeamID;
run;
data want1(keep=TeamID Team_City HeadCode AssistantCode);
/* Define all variables, retain the new ones */
length TeamID $3 Team_City $20 HeadCode $3 AssistantCode $3;
retain HeadCode AssistantCode;
set have;
by TeamID;
if CoachCode =: 'Head'
then HeadCode = substr(CoachCode,6,3);
else AssistantCode = substr(CoachCode,11,3);
if last.TeamID;
run;
/* An SQL solution */
proc sql noprint;
create table want2 as
select TeamID
, max(Team_City) as Team_City
, max(CASE WHEN CoachCode LIKE 'Head%'
THEN substr(CoachCode,6,3) ELSE ' '
END) LENGTH=3 as HeadCode
, max(CASE WHEN CoachCode LIKE 'Assistant%'
THEN substr(CoachCode,11,3) ELSE ' '
END) LENGTH=3 as AssistantCode
from have
group by TeamID;
quit;
PROC SQL has the advantage of not requiring you to sort the data in advance.
This assumes you've sorted the data by teamID, and head coaches always come before assistants. Caveat: untested (I really need to get access to SAS again....)
data want (drop=nc coachcode);
set have;
length headcode assistantcode $3;
retain headcode;
by teamid;
nc = length(coachcode);
if substr(coachcode, 1, 4) = 'Head' then
headcode = substr(coachcode, nc-2, 3); *third argument is a length, so take the last 3 characters;
else
assistantcode = substr(coachcode, nc-2, 3);
if last.teamid;
run;