Make a string in SAS from one column - sas

I have a dataset in SAS and I want to Convert one column into string by the Product. I have attached the image of input and output required.
I need the Colomn STRING in the outut. can anyone please help me ?

I have coded a data step to create the input data:
data have;
input products $
dates
value
;
datalines;
a 1 0
a 2 0
a 3 1
a 4 0
a 5 1
a 6 1
b 1 0
b 2 1
b 3 1
b 4 1
b 5 0
b 6 0
c 1 1
c 2 0
c 3 1
c 4 1
c 5 0
c 6 1
;
Does the following suggested solution give you what you want?:
data want;
length string $ 20;
do until(last.products);
set have;
by products;
string = catx(',',string,value);
end;
do until(last.products);
set have;
by products;
output;
end;
run;

Here's my quick solution.
data temp;
length cat $20.;
do until (last.prod);
set have;
by prod notsorted;
cat=catx(',',cat,value);
end;
drop value date;
run;
proc sql;
create table want as
select have.*, cat as string
from have inner join temp
on have.prod=temp.prod;
quit;

Related

Needing to retain Lab category tests based on individual positive test result

Hello so this is a sample of my data (There is an additional column of LBCAT =URINALYSIS for those panel of tests)
I've been asked to only include the panel of tests where LBNRIND is populated for any of those tests and the rest to be removed. Some subjects have multiple test results at different visit timepoints and others only have 1.I can't utilise a simple where LBNRIND ne '' in the data step because I need the entire panel of Urinalysis tests and not just that particular test result. What would be the best approach here? I think transposing the data would be too messy but maybe putting the variables in an array/macro and utilising a do loop for those panel of tests?.
Update:I've tried this code but it doesn't keep the corresponding tests for where lb_nrind >0. If I apply the sum(lb_nrind > '' ) the same when applying lb_nrind > '' to the having clause
*proc sql;
*create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group; run;*
You can use a SQL HAVING clause to retain rows of a group meeting some aggregate condition. In your case that group might be a patientid, panelid and condition at least one LBNRIND not NULL
Example:
Consider this example where a group of rows is to be kept only if at least one of the rows in the group meets the criteria result7=77
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;

SAS, sum by row AND column

I want to do some sum calculate for a data set. The challenge is I need to do both row sum AND column Sum by ID. Below is the example.
data have;
input ID var1 var2;
datalines;
1 1 1
1 3 2
1 2 3
2 0 5
2 1 3
3 0 1
;
run;
data want;
input ID var1 var2 sum;
datalines;
1 1 1 12
1 3 2 12
1 2 3 12
2 0 5 9
2 1 3 9
3 0 1 1
;
run;
Using SQL is cool, but SAS has nice data step!
proc sort data=have; by id; run;
data result;
set have;
by id;
retain sum 0;
if first.id then sum=0;
sum=sum+sum(var1,var2);
if last.id then output;
run;
proc sort data=result; by id; run;
data want;
merge have result;
by id;
run;
You will decide what to use...
Use SQL to do all of it in one step. Group only by ID, but keep var1 and var2 in the column selection. This will create the same data in want.
proc sql noprint;
create table want as
select ID
, var1
, var2
, sum(var1) + sum(var2) as sum
from have
group by ID
;
quit;

SAS concatenate in SAS Data Step

I don't know how to describe this question but here is an example. I have an initial dataset looks like this:
input first second $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
...
;
I want an output dataset like this:
input first second $;
cards;
1 "A,B,C,D"
2 "E,F"
3 "S,A"
4 "C"
5 "Y"
6 "II,UU,OO,N"
7 "G,H"
...
;
Both tables will have two columns. Unique value of range of the column "first" could be 1 to any number.
Can someone help me ?
something like below
proc sort data=have;
by first second;
run;
data want(rename=(b=second));
length new_second $50.;
do until(last.first);
set have;
by first second ;
new_second =catx(',', new_second, second);
b=quote(strip(new_second));
end;
drop second new_second;
run;
output is
first second
1 "A,B,C,D"
2 "E,F"
3 "A,S"
4 "C"
5 "Y"
6 "II,N,OO,UU"
7 "G,H"
You can use by-group processing and the retain function to achieve this.
Create a sample dataset:
data have;
input id value $3.;
cards;
1 A
1 B
1 C
1 D
2 E
2 F
3 S
3 A
4 C
5 Y
6 II
6 UU
6 OO
6 N
7 G
7 H
;
run;
First ensure that your dataset is sorted by your id variable:
proc sort data=have;
by id;
run;
Then use the first. and last. notation to identify when the id variable is changing or about to change. The retain statement tells the datastep to keep the value within concatenated_value over observations rather than resetting it to a blank value. Use the quote() function to apply the " chars around the result before outputting the record. Use the cats() function to perform the actual concatenation and separate the records with a ,.
data want;
length contatenated_value $500.;
set have;
by id;
retain contatenated_value ;
if first.id then do;
contatenated_value = '';
end;
contatenated_value = catx(',', contatenated_value, value);
if last.id then do;
contatenated_value = quote(cats(contatenated_value));
output;
end;
drop value;
run;
Output:
contatenated_
value id
"A,B,C,D" 1
"E,F" 2
"S,A" 3
"C" 4
"Y" 5
"II,UU,OO,N" 6
"G,H" 7

SAS sort by original order

Say you have three separate data sets consisting of the same number of observations. Each observation has an ID letter, A-Z, followed by some numerical observation. For example:
Data set 1:
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
Data set 2:
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
I want to merge the data sets BY that first variable. The problem is, the first variable is NOT already sorted in alphabetical form, and I do not want to sort it in alphabetical form. I want to merge the data but keep the original order. For example, I would get:
Merged data:
B 3 8 1 9 4
B 0 3 3 1 8
C 4 1 9 3 1
C 3 1 9 4 0
A 4 4 5 4 9
A 4 1 2 0 0
Is there any way to do this?
You can create a variable that holds the order and then apply that the new dataset after its "merged". I believe this is an append rather than merge though. I've used a format, though you could use a sql or data set merge as well.
data have1;
input id $ var1-var5;
cards;
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
;
run;
data have2;
input id $ var1-var5;
cards;
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
;
run;
data order;
set have1;
fmtname='sort_order';
type='J';
label=_n_;
start=id;
keep id fmtname type label start;
run;
proc format cntlin=order;
run;
data want;
set have1 have2;
order_var=input(id, $sort_order.);
run;
proc sort data=want;
by order_var;
run;
This is just one SQL version which follows along a similar path to Joe's answer. Row order is input via a sub-query rather than a format. However the initial order of the two input tables is lost in the join to the row order sub-query. The original order (have2 follows have1) is re-instated by using the table names as a secondary order variable.
proc sql;
create table want1 as
select want.id
,want.var1
,want.var2
,want.var3
,want.var4
,want.var5
from (
select *
, 'have1' as source
from have1
union all
select *
, 'have2' as source
from have2
) as want
left join
(
select id
, monotonic() as row_no
from have1
) as order
on want.id eq order.id
order by order.row_no
,want.source
;
quit;
proc compare
base=want1
compare=want
;
run;
And this is a data step version without a format. Here the have1 table with row order is re-merged with the concatenated data (have1 and have2) and then re-sorted by row order.
data want2;
set have1 have2;
run;
data have1;
set have1;
order_var = _n_;
run;
proc sort data=want2;
by id;
run;
proc sort data=have1;
by id;
run;
data want2;
merge want2 have1;
by id;
run;
proc sort data=want2;
by order_var;
run;
proc compare
base=want2
compare=want
;
run;

Is there a way to delete 0's from a dataset?

Suppose I want to only apply proc means or the better means macro to only non zero entries in my dataset? Is there an easy option to do this? If I have a dataset:
A B C
0 1 2
2 2 0
2 0 1
How can I use proc means or the better means macro to ignore the 0 values?
You can create a view to convert them on the fly. BETTERMEANS may have a way of handling this; not sure.
data have;
input A B C ;
format a b c zeromissing1.;
datalines;
0 1 2
2 2 0
2 0 1
;;;;
run;
data have_z/view=have_z;
set have;
array num _numeric_;
do _i = 1 to dim(num);
if num[_i]=0 then num[_i]=.;
end;
run;
proc means data=have_z;
var a b c;
run;