SAS count unique observation by group - if-statement

I am looking to figure out how many customers get their product from a certain store. The problem each prod_id can have up to 12 weeks of data for each customer. I have tried a multitude of codes, some add up all of the obersvations for each customer while others like the one below remove all but the last observation.
proc sort data= have; BY Prod_ID cust; run;
Data want;
Set have;
by Prod_Id cust;
if (last.Prod_Id and last.cust);
count= +1;
run;
data have
prod_id cust week store
1 A 7/29 ABC
1 A 8/5 ABC
1 A 8/12 ABC
1 A 8/19 ABC
1 B 7/29 ABC
1 B 8/5 ABC
1 B 8/12 ABC
1 B 8/19 ABC
1 B 8/26 ABC
1 C 7/29 XYZ
1 C 8/5 XYZ
1 F 7/29 XYZ
1 F 8/5 XYZ
2 A 7/29 ABC
2 A 8/5 ABC
2 A 8/12 ABC
2 A 8/19 ABC
2 C 7/29 EFG
2 C 8/5 EFG
2 C 8/12 EFG
2 C 8/19 EFG
2 C 8/26 EFG
what i want it to look like
prod_id store count
1 ABC 2
1 XYZ 2
2 ABC 1
2 EFG 2

Firstly, read about if-statement.
I've just edited your code to make it work:
proc sort data=have;
by prod_id store cust;
run;
data want(drop=cust week);
set have;
retain count;
by prod_id store cust;
if (last.cust) then count=count+1;
else if (first.prod_id or first.store) then count = 0;
if (last.prod_id or last.store) then output;
run;
If you will have questions, ask.

The only place where the result of the COUNT() aggregate function in SQL might be confusing is that it will not count missing values of the variable.
select prod_id
, store
, count(distinct cust) as count
, count(distinct cust)+max(missing(cust)) as count_plus_missing
from have
group by prod_id ,store
;

Related

SAS: assign a value to a variable based on characteristics of other variables

I'd like to assign to an empty field a value based on many values of other entries. Here my dataset:
input ID date $10. type typea $10. ;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for all empty entries of the variable "date", assign to it the same date of the record which has the same ID, the same type, but a different typea. If there aren't other records with the criteria described, leave the date field empty. So the output should be:
data temp;
input ID date $10. type typea $10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried with something like that based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use UPDATE statement to help you carry-forward the DATE values for a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;

SAS flag each row that contains the max value

I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 2 .
456 3 1
It is easier to define it with 1 or 0 instead of 1 or missing.
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
proc sort data=have;
by id;
proc summary data=have;
class id;
var dec;
output out=max_info max=max_value;
run;
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
data want(drop = m);
do _N_ = 1 by 1 until (last.id);
set have;
by id;
m = max(maxdex, dec);
end;
do _N_ = 1 to _N_;
set have;
maxdex = ifn(dec = m, 1, .);
output;
end;
run;

In a sas compare, output only differences and new records

In a compare with id, how can I output only the difference and the new records
but not the old records no more present?
Example, suppose I have two tables:
mybase:
key other
1 Ann
3 Ann
4 Charlie
5 Emily
and mycompare:
key other
2 Bill
3 Charlie
4 Charlie
running:
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
I get a table "myoutput" like this:
type obs key other
base 1 1 Ann
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
base 4 5 Emily
I would like to have this:
type obs key other
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
This works for your example. I think you want to output records that are not matched in base and any records that match and have differences.
data mybase;
input key other $;
cards;
1 Ann
3 Ann
4 Charlie
5 Emily
;;;;
data mycompare;
input key other $;
cards;
2 Bill
3 Charlie
4 Charlie
;;;;
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
proc print;
run;
data test;
set myoutput;
by key;
if (first.key and last.key) and _type_ eq 'BASE' then delete;
run;
proc print;
run;
Obs _TYPE_ _OBS_ key other
1 COMPARE 1 2 Bill
2 BASE 2 3 Ann
3 COMPARE 2 3 Charlie
4 DIF 1 3 XXXXXXX.

add tables down on other table with variable name in SAS

I have 2 sas output tables. First table has a,b,c columns and second table has d,e,f columns
First table is :
a b c
1 2 3
4 5 6
Second table is :
d e f
7 8 9
Is it possible to append them in one sheet with desired output
a b c
1 2 3
4 5 6
d e f
7 8 9
Yes, you can append the second table to the first using proc append , you just need to rename the columns in second table before appending.
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Full Code:
data table1;
input a b c;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d e f;
datalines;
7 8 9
;
run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Table1 will look like this after appending:
a=1 b=2 c=3
a=4 b=5 c=6
a=7 b=8 c=9
Another Option: If all your data is characters, you just need a create a third table to hold the column names you want to append.
Full Code:
data table1;
input a $ b $ c $;
datalines;
1 2 3
4 5 6
;
run;
data table2;
input d $ e $ f $;
datalines;
7 8 9
;
run;
data table2_names;
input d $ e $ f $;
datalines;
d e f
;
run;
proc append base=table1 data=table2_names(rename=(d=a e=b f=c)) ;run;
proc append base=table1 data=table2(rename=(d=a e=b f=c)) ;run;
Output:
a=1 b=2 c=3
a=4 b=5 c=6
a=d b=e c=f
a=7 b=8 c=9

Averaging Panel Data in SAS

I have panel data set that looks like this
ID Usage month
1234 2 -2
1234 4 -1
1234 3 1
1234 2 2
2345 5 -2
2345 6 -1
2345 3 1
2345 6 2
Obviously there are more ID variables and usage data, but this is the general form. I want to average the usage data when the month column is negative, and when it is positive for each ID. In other words for each unique ID, average the usage for negative months and for positive months. My goal is to get something like this.
ID avg_usage_neg avg_usage_pos
1234 3 2.5
2345 5.5 4.5
Here's a few options for you.
First create the test data:
data sample;
input ID
Usage
month;
datalines;
1234 2 -2
1234 4 -1
1234 3 1
1234 2 2
2345 5 -2
2345 6 -1
2345 3 1
2345 6 2
;
run;
Here's an SQL solution:
proc sql noprint;
create table result as
select id,
avg(ifn(month < 0, usage, .)) as avg_usage_neg,
avg(ifn(month > 0, usage, .)) as avg_usage_pos
from sample
group by 1
;
quit;
Here's a datastep / proc means solution:
data sample2;
set sample;
usage_neg = ifn(month < 0, usage, .);
usage_pos = ifn(month > 0, usage, .);
run;
proc means data=sample2 noprint missing nway;
class id;
var usage_neg usage_pos;
output out=result2 mean=;
run;