Here is my Input:
ID Color
1 green
1 red
1 orange
1 green
1 red
2 red
2 red
2 blue
3 green
3 red
Here is what I want in my output - a count of records by ID for each color:
ID green red orange blue
1 2 2 1 0
2 0 2 0 1
3 1 1 0 0
I know I can get the information using proc freq, but I want to output a dataset exactly like the one I have written above. I can't seem to figure out how to make the colors the columns in this output dataset.
First, generate the data.
data data;
input id color :$8.;
datalines;
1 green
1 red
1 orange
1 green
1 red
2 red
2 red
2 blue
3 green
3 red
;
run;
Next, summarize the color counts by ID.
proc freq data=data noprint;
table id*color / out=freq;
run;
Then transpose the counts so that each color becomes a column.
proc transpose data=freq out=freq_trans(drop=_:);
id color;
by id;
var count;
run;
Optionally, fill the missing cells with 0.
data freq_trans_filled;
set freq_trans;
array c(*) green red orange blue;
do i = 1 to dim(c);
if c[i]=. then c[i]=0;
end;
drop i;
run;
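If you would rather not list the color names by hand (they change with the data), one variation is to put every numeric variable into the array. This is a minimal sketch, assuming a hypothetical wide dataset WIDE that stands in for FREQ_TRANS:

```sas
/* Hypothetical wide table with missing cells, standing in for FREQ_TRANS */
data wide;
  input id green red orange blue;
  datalines;
1 2 2 1 .
2 . 2 . 1
3 1 1 . .
;
run;

/* Replace every missing numeric value with 0 without naming the color columns */
data wide_filled;
  set wide;
  array c(*) _numeric_;              /* ID, GREEN, RED, ORANGE, BLUE */
  do i = 1 to dim(c);
    if missing(c[i]) then c[i] = 0;  /* ID is never missing, so it is unchanged */
  end;
  drop i;
run;
```

I is created after the ARRAY statement, so it is not part of _NUMERIC_ and is safe to use as the loop index.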
You can fill the missing cells with zeros by using the SPARSE option on the TABLE statement of PROC FREQ. This way, you don't need another DATA step. The order of the colors can also be controlled with the ORDER= option of PROC FREQ.
data one;
input id color :$8.;
datalines;
1 green
1 red
1 orange
1 green
1 red
2 red
2 red
2 blue
3 green
3 red
run;
proc freq data=one noprint order=data;
table id*color /out=freq sparse;
run;
proc transpose data=freq out=two(drop=_:);
id color;
by id;
var count;
run;
proc print data=two noobs;
run;
/* on lst
id green red orange blue
1 2 2 1 0
2 0 2 0 1
3 1 1 0 0
*/
I've never been a fan of proc transpose because I can never remember the syntax. Here's a way to do it with proc sql and a macro variable.
proc sql noprint;
/* build a list like: sum(color = 'blue') as blue, sum(color = 'green') as green, ... */
select "sum(color = '" || trim(color) || "') as " || trim(color)
into :color_list separated by ", "
from (select distinct color from one);
create table result as
select id,
&color_list
from one
group by id;
quit;
id blue green orange red
1 0 2 1 2
2 1 0 0 2
3 0 1 0 1
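For reference, here is roughly what the query looks like once &color_list is resolved for this data, written out by hand (the column order matches the output above). Each comparison such as color = 'green' evaluates to 1 or 0, so SUM counts the matching rows per ID:

```sas
data one;
  input id color :$8.;
  datalines;
1 green
1 red
1 orange
1 green
1 red
2 red
2 red
2 blue
3 green
3 red
;
run;

proc sql;
  create table result as
  select id,
         sum(color = 'blue')   as blue,    /* 1 when true, 0 when false */
         sum(color = 'green')  as green,
         sum(color = 'orange') as orange,
         sum(color = 'red')    as red
  from one
  group by id;
quit;
```

The macro version only saves you from typing one SUM line per color.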
For pteranodon: I happened to be reviewing the archives (6+ years later), which is why this is so late, but someone may benefit.
proc sql noprint feedback;
select catx(' ','sum(color =',quote(trim(color)),') as',color) into: color_list separated by ", "
from (select distinct color from one);
create table result as
select id, &color_list
from one
group by id;
quit;
Related
I have a data set where each ID has 2 different occurrences on the same day. There are about 10 different occurrences. I want to cross-tabulate the occurrences using PROC FREQ or PROC TABULATE and find how many times each pair of occurrences happens on the same day. I want my table to look something like this:
Frequency occ1 occ2 occ3 occ4 occ5 occ6
occ1 2 0 0 1 4 0
occ2 1 0 0 0 0 0
occ3 3 0 0 0 0 0
occ4 0 5 3 0 3 0
occ5 0 2 4 0 5 0
occ6 1 5 4 2 1 2
My data looks something like this (only the first few rows are shown):
data have;
input id $ occurrence $;
datalines;
id1 occ3
id1 occ2
id2 occ1
id2 occ6
id3 occ2
id3 occ4
;
run;
I tried:
proc freq data=have;
tables occurrence*occurrence;
run;
but I'm not having any luck.
I have also tried other variations, including BY ID, but that gives a separate table for every single ID, and I have about 200 IDs.
Thanks!
Reshape the data so that a tabulation of ordered pairs can be done.
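/* simulated data: 20 ids, each with two random integer topics */
data have;
call streaminit(2022);
do id = 1 to 20;
topic = rand('integer', 10); output;
topic = rand('integer', 10); output;
end;
run;
/* one row per id: the 1st topic ends up in ROW and the 2nd in COL */
data stage;
do until (last.id);
set have;
by id;
row = col;   /* previous topic becomes the row */
col = topic; /* current topic becomes the column */
end;
run;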
ods html file='pairfreq.html';
title "Ordered pair counts";
proc tabulate data=stage;
class row col;
table row='1st topic in id pair',col='2nd topic in id pair'*n='';
run;
ods html close;
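Since the question asked about PROC FREQ, the same ordered-pair counts can also be produced with a two-way TABLES request on the staged data. This sketch uses the first rows from the question and assumes HAVE is sorted by ID with exactly two rows per ID; the LENGTH statement is needed because ROW and COL are character here:

```sas
/* First rows from the question; OCCURRENCE is character */
data have;
  input id $ occurrence $;
  datalines;
id1 occ3
id1 occ2
id2 occ1
id2 occ6
id3 occ2
id3 occ4
;
run;

/* Assumes HAVE is sorted by ID and each ID has exactly two rows */
data stage;
  length row col $8;        /* declare as character before first use */
  do until (last.id);
    set have;
    by id;
    row = col;              /* previous occurrence becomes the row */
    col = occurrence;       /* current occurrence becomes the column */
  end;
  keep id row col;
run;

/* Crosstab of ordered pairs: rows = 1st occurrence, columns = 2nd */
proc freq data=stage;
  tables row*col / norow nocol nopercent;
run;
```

One TABLES request covers all IDs at once, so there is no need for BY ID.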
I have a dataset that consists of a product variable, an area variable, and one variable for each of the last 10 years, so 12 variables in total.
I can't work out how to display the data from a single row as a pie chart.
The dataset looks like this, to make it easier to visualise:
Product Area year1 year2 year3
1 1 7 14 7
1 2 12 15 11
1 3 5 9 8
2 1 4 12 5
2 2 8 3 14
2 3 5 0 2
3 1 2 12 12
My goal is to be able to enter, say, product 1 and area 3 and have it produce a pie chart that shows the values for each of the years. I can't figure out a way of doing it, though; my current knowledge and research suggest that pulling from a single row isn't possible?
First, stack the year variables into one column with PROC TRANSPOSE.
Then make a normal pie chart with BY Product Area; to get one chart per original line (assuming Product*Area uniquely identifies your lines). I used PROC GCHART here.
*** DEFINE DATA --------------;
data have;
infile datalines dlm=' ';
input Product Area year1 year2 year3;
datalines;
1 1 7 14 7
1 2 12 15 11
1 3 5 9 8
2 1 4 12 5
2 2 8 3 14
2 3 5 0 2
3 1 2 12 12
;run;
*** STACK YEARS --------------;
proc sort data=work.have out=work.tmp0temptableinput;
by product area;
run;
proc sql;
create view work.tt1 as
select src.*, "values" as _eg_idcol_
from work.tmp0temptableinput as src;
quit;
proc transpose data=work.tt1
out=work.tt2
name=year;
by product area;
id _eg_idcol_;
var year:; * THIS IS GENERALISED FOR MORE THAN 3 YEARxx VARIABLES;
run;
proc datasets lib=work nolist;
modify tt2;
label values = ;
label year = ;
run;
quit;
*** PLOT --------------;
proc sort
data=work.tt2
out=work.tt3;
by product area;
run;
proc gchart data =work.tt3;
pie year /
sumvar=values
type=sum
nolegend
slice=outside
percent=none
value=outside
other=4
otherlabel="other"
coutline=black
noheading
;
by product area;
run; quit;
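To chart one original row (say product 1 and area 3) instead of every BY group, you could also stack the years in a DATA step and subset with a WHERE= option. This is a sketch, not the method above: the array loop replaces PROC TRANSPOSE, and the macro variables SEL_PRODUCT and SEL_AREA and the variable PERIOD are hypothetical names. PROC GCHART still requires SAS/GRAPH.

```sas
/* Sample data from the question */
data have;
  input Product Area year1 year2 year3;
  datalines;
1 1 7 14 7
1 2 12 15 11
1 3 5 9 8
2 1 4 12 5
2 2 8 3 14
2 3 5 0 2
3 1 2 12 12
;
run;

/* Stack YEAR1-YEARn into one row per year */
data long;
  set have;
  array y(*) year:;          /* every variable whose name starts with YEAR */
  length period $32;         /* named so it does not match the YEAR: list */
  do i = 1 to dim(y);
    period = vname(y[i]);    /* e.g. year1 */
    values = y[i];
    output;
  end;
  keep product area period values;
run;

/* Hypothetical selection of the original row to chart */
%let sel_product = 1;
%let sel_area    = 3;

proc gchart data=long(where=(product=&sel_product and area=&sel_area));
  pie period / sumvar=values type=sum;
run;
quit;
```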
I have a situation where I would like to put the value of a variable in the label in SAS.
Example: the median of Total_Days is 2, and I would like to put this value in the label of Days_Median_Split. The median keeps changing as the data change, so I would like to automate it.
Phy_Activity Total_Days "Days_Median_Split: Number of Days with Median 2"
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
Sample Dataset
Thanks so much!
* step 1 create data;
data have;
input Phy_Activity $ Total_Days Days_Median_Split;
datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
run;
*step 2 sort data on Total_days;
proc sort data = have;
by Total_days;
run;
*step 3 get count of obs;
proc sql noprint;
select count(*) into: cnt
from have;quit;
* step 4 calculate the position of the median observation (a whole number only when the count is odd);
%let median = %sysevalf(&cnt/2 + .5);
*step 5 get median observation;
proc sql noprint;
select Total_days into: medianValue
from have
where monotonic()=&median;quit;
*step 6 create label (double quotes so that the macro variable resolves);
data have;
set have;
label Days_Median_split = "Days_Median_split: Number of Days with Median %trim(&medianValue)";
run;
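An alternative sketch lets PROC MEANS compute the median, which also handles an even number of observations (the two middle values are averaged by default) and avoids the undocumented MONOTONIC() function. The output dataset STATS and the variable MED_DAYS are hypothetical names:

```sas
data have;
  input Phy_Activity $ Total_Days Days_Median_Split;
  datalines;
No 0 0
No 0 0
Yes 2 1
Yes 3 1
Yes 5 1
;
run;

/* Compute the median of Total_Days into a one-row dataset */
proc means data=have noprint;
  var Total_Days;
  output out=stats median=med_days;
run;

/* Store it in a macro variable; SYMPUTX strips leading and trailing blanks */
data _null_;
  set stats;
  call symputx('medianValue', med_days);
run;

/* Double quotes let &medianValue resolve inside the label */
data have;
  set have;
  label Days_Median_Split = "Days_Median_Split: Number of Days with Median &medianValue";
run;
```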
How can I do the code below in PROC SQL?
It has two PROC SORT steps and one MERGE:
proc sort data=new out=new1 nodupkey;
by id;
where roll=100;
run;
proc sort data=new2 out=new4 nodupkey;
by id;
where roll=100;
run;
data score;
merge new4 (in=a) new1;
by id;
if a;
run;
The merge you show is equivalent to a SQL left join. You want all the rows from "new2" and want to ignore the rows from "new" that don't have a common id. The uniqueness of the id (per the pre-sorts) further supports a left-join equivalence.
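Proc SQL;
/* apply each WHERE before the join, as the PROC SORT steps do; aliases avoid ambiguous ROLL and ID */
select b.*, a.*
from (select * from new2 where roll=100) as a
left join (select * from new where roll=100) as b
on b.id = a.id
order by a.id;
quit;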
For the atypical scenario where there are many-to-many ids in the merge, the left join is not equivalent.
I did leave out the NODUPKEY equivalent. Presuming the EQUALS option is in effect, selecting each group's first row would be equivalent. The undocumented MONOTONIC() function can be used to apply a default row order in a subquery, which can then be used in a GROUP BY ... HAVING expression.
data LEFT;
input id x1 x2 x3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
;
run;
data RIGHT;
input id y1 y2 y3 x1;
datalines;
1 1 1 1 11
2 1 1 1 22
3 1 2 3 4
3 2 3 4 5
3 3 4 5 6
4 1 1 1 44
6 6 6 6 6
;
run;
proc sql;
select
LEFT.id
, coalesce(RIGHT.x1,LEFT.x1) as x1
, LEFT.x2
, LEFT.x3
, RIGHT.y1
, RIGHT.y2
, RIGHT.y3
from
(
select * from (select monotonic() as _seq_, * from LEFT) group by id having _seq_ = min(_seq_)
)
as LEFT
left join
(
select * from (select monotonic() as _seq_, * from RIGHT) group by id having _seq_ = min(_seq_)
)
as RIGHT
on
LEFT.id = RIGHT.id
;
quit;
I feel the need to reiterate that a SQL left join is not always the same as a merge, and SQL does not have the implicit 'overlaying' of common variables that the DATA step MERGE has. When LEFT and RIGHT collide on non-key variables, you need to select a coalescence of the common variables into a new like-named variable in the output.
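A small sketch of that point, using two hypothetical datasets A and B that share the non-key variable X1: the MERGE overlays X1 from B, and the SQL gets the same X1 only through COALESCE. One difference remains: if B had a matching row with a missing X1, the MERGE would overwrite X1 with missing, while COALESCE would keep the value from A.

```sas
/* Hypothetical inputs sharing the non-key variable X1 */
data a;
  input id x1 x2;
  datalines;
1 10 100
2 20 200
;
run;

data b;
  input id x1 y;
  datalines;
1 11 5
;
run;

/* DATA step: X1 from B overlays X1 from A where the IDs match */
data merged;
  merge a(in=ina) b;
  by id;
  if ina;
run;

/* SQL: no implicit overlay, so COALESCE builds a single X1 */
proc sql;
  create table joined as
  select a.id,
         coalesce(b.x1, a.x1) as x1,
         a.x2,
         b.y
  from a left join b
    on a.id = b.id
  order by a.id;
quit;
```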
I have created this table:
And from this I want to create an adjacency matrix which shows how many employee_id's the tables share. It would look like this (I think):
I'm not sure if I'm going about this the correct way; I think I may be doing it wrong. I know that this would probably be easier if I had more SAS products, but I only have basic SAS Enterprise Guide to work with.
I really appreciate the help. Thank you.
Here's another way, using PROC CORR, that is still better than the solution above. You also don't need to filter anything: you only specify the variables on the VAR statement of PROC CORR.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
ods output sscp=want;
proc corr data=id sscp ;
var human alien wizard;
run;
proc print data=want;
format _numeric_ 8.;
run;
Results are:
Obs Variable human alien wizard
1 human 4 2 1
2 alien 2 4 1
3 wizard 1 1 3
I think this is what you want, although it does not give exactly the result you show as the answer.
data id;
input id:$4. human alien wizard;
cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;;;;
run;
proc corr data=id noprint nocorr sscp outp=sscp;
var human alien wizard;
run;
proc print data=sscp;
run;
I was able to get the answer using this, although it does not include the last cell I wanted (human_alien_wizard):
proc transpose data=FULL_JOIN_ALL3 out=FULL_JOIN_ALL3_v2;
by employee_id;
var human_table alien_table wizard_table;
run;
proc sql;
create table FULL_JOIN_ALL3_v3 as
select distinct a._name_ as anm,b._name_ as bnm,
count(distinct case when a.col1=1 and b.col1=1 then a.employee_id else . end) as smalln
from FULL_JOIN_ALL3_v2 a, FULL_JOIN_ALL3_v2 b
where a.employee_id=b.employee_id
group by anm,bnm
;
quit;
proc tabulate data=FULL_JOIN_ALL3_v3;
class anm bnm;
var smalln;
table anm='',bnm=''*smalln=''*sum=''*f=best3. / rts=5;
run;
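Because the flags are 0/1, the product of two flags is 1 only when an ID has both, so summing products counts the shared IDs (the off-diagonal SSCP entries above are exactly these sums). The same idea gives the three-way cell. A sketch, assuming one row per ID with 0/1 flags as in the ID dataset above rather than the FULL_JOIN_ALL3 structure; the output names are hypothetical:

```sas
data id;
  input id :$4. human alien wizard;
  cards;
1005 1 1 0
1018 0 0 1
1022 0 0 1
1024 1 0 0
1034 0 1 0
1069 0 1 0
1078 1 0 0
1247 1 1 1
;
run;

/* Each product is 1 only when every flag in it is 1 */
proc sql;
  create table overlap as
  select sum(human*alien)        as human_alien,
         sum(human*wizard)       as human_wizard,
         sum(alien*wizard)       as alien_wizard,
         sum(human*alien*wizard) as human_alien_wizard
  from id;
quit;
```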