I have a dataset like this
ACC two three ...
124 12 a
124 14 a
32 129 a
32 12 b
48 1 c
I would like to keep only the first row for each ACC (the one with the smallest value in column two), i.e. remove the duplicate ACC values.
I tried with
Data ...;
Set ... ;
By ACC two;
ACC=first.ACC;
keep ACC
two
three;
Run;
However I still have duplicates.
Can you tell me where I am wrong?
Desired output:
ACC two three ...
124 12 a
32 12 b
48 1 c
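I think this is what you want. In your step, ACC=first.ACC; overwrites ACC with the 0/1 flag instead of filtering rows; a subsetting IF on first.ACC keeps only the first row of each ACC group.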
data have;
input ACC two three $;
datalines;
124 12 a
124 14 a
32 129 a
32 12 b
48 1 c
;
proc sort data=have;
by ACC two;
run;
data want;
set have;
by ACC two;
if first.ACC;
run;
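As an alternative, here is a minimal sketch using PROC SORT with the NODUPKEY option, assuming the same have dataset as above. NODUPKEY keeps the first observation it encounters in each BY group, so ordering by ACC and two first and then de-duplicating on ACC alone should give the same rows. The output name want_nodup is only illustrative.

```sas
/* Order the rows so the smallest "two" comes first within each ACC */
proc sort data=have;
  by ACC two;
run;

/* Keep one row per ACC. With the default EQUALS behaviour, PROC SORT
   preserves the incoming order within each ACC, so the first row is kept. */
proc sort data=have out=want_nodup nodupkey;
  by ACC;
run;
```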
I am trying to compute the frequency of observations in a group.
My dataset looks like:
Date Account C_group Age ...
1 152627 A 28
2 152627 B 28
1 163718 B 32
3 163628 D 12
4 163717 C 41
.
.
I would like to determine the percentage of accounts in the different groups.
Do you know how I could do that?
Thanks
The following should get you close to what you are looking for:
data dset ;
input
freqgroup $
subgroup ;
datalines ;
A 12
B 12
C 12
C 21
C 23
A 12
A 21
B 12
B 21
B 21
;
run;
proc sort data=dset;
by freqgroup;
run;
proc freq data=dset ;
table freqgroup ;
run ;
proc freq data=dset ;
by freqgroup ;
table subgroup ;
run ;
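Applied to the columns from the question, a sketch might look like the following. The dataset name accounts and the output name group_pct are assumptions made for illustration. The OUT= option writes the COUNT and PERCENT for each C_group to a dataset. Note that this counts rows, so an account that appears in two groups is counted once in each.

```sas
/* Sample rows copied from the question */
data accounts;
  input Date Account C_group $ Age;
  datalines;
1 152627 A 28
2 152627 B 28
1 163718 B 32
3 163628 D 12
4 163717 C 41
;
run;

/* One row per C_group with COUNT and PERCENT */
proc freq data=accounts noprint;
  tables C_group / out=group_pct;
run;

proc print data=group_pct;
run;
```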
Say I have a dataset like this
day product sales
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
On day "b" there were no sales for product 1, so there is no row where day = b and product = 1. Is there an easy way to add a row with day = b, product = 1 and sales = 0, and similar "missing" rows to get a dataset like this?
day product sales
1 a 1 48
2 a 2 55
3 a 3 88
4 b 1 0
5 b 2 33
6 b 3 87
7 c 1 97
8 c 2 95
9 c 3 0
In R you can do complete(df, day, product, fill = list(sales = 0)). I realize you can accomplish this with a self-join in proc sql, but I'm wondering if there is a procedure for this.
In this particular example you can also use the SPARSE option in PROC FREQ. It tells SAS to output every combination of the DAY and PRODUCT values, similar to a cross join between those elements. It cannot add a value that does not already appear somewhere in the table; you would need a different method in that case.
data have;
input n day $ product sales;
datalines;
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
;;;;
run;
proc freq data=have noprint;
table day*product / out=want sparse;
weight sales;
run;
proc print data=want;run;
There are, as usual in SAS, about a dozen ways to do this. Here's my favorite.
data have;
input n day $ product sales;
datalines;
1 a 1 48
2 a 2 55
3 a 3 88
4 b 2 33
5 b 3 87
6 c 1 97
7 c 2 95
;;;;
run;
proc means data=have completetypes;
class day product;
types day*product;
var sales;
output out=want sum=;
run;
completetypes tells SAS to output a row for every combination of class values, including combinations that never occur in the data. Those rows have a missing sales value; you could then use proc stdize to turn them into 0's (if you need them to be 0). It might be possible to do the whole thing with proc stdize in the first place, but unfortunately I'm not as familiar with that proc.
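A minimal sketch of that follow-up step, assuming the want dataset produced by the proc means step above and that SAS/STAT (which provides proc stdize) is available. REPONLY restricts the procedure to replacing missing values without standardizing anything, and MISSING=0 sets the replacement value. The output name want_zero is only illustrative.

```sas
/* Replace missing sales with 0; all non-missing values are left unchanged */
proc stdize data=want out=want_zero reponly missing=0;
  var sales;
run;
```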
You can do this with proc freq using the sparse option.
Code:
proc freq data=have noprint;
table day*product /sparse out=freq (drop=percent);
run;
Output:
day=a product=1 COUNT=1
day=a product=2 COUNT=1
day=a product=3 COUNT=1
day=b product=1 COUNT=0
day=b product=2 COUNT=1
day=b product=3 COUNT=1
day=c product=1 COUNT=1
day=c product=2 COUNT=1
day=c product=3 COUNT=0
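For comparison, the self-join approach mentioned in the question can be sketched in proc sql, assuming the have dataset from the answers above. The aliases grid, d, p and h and the output name want_sql are illustrative. The inner query builds every day and product combination, and the left join fills in sales where a row exists, with coalesce supplying 0 otherwise.

```sas
proc sql;
  create table want_sql as
  select grid.day,
         grid.product,
         coalesce(h.sales, 0) as sales
  from (select d.day, p.product
        from (select distinct day from have) as d,
             (select distinct product from have) as p) as grid
       left join have as h
         on  h.day = grid.day
         and h.product = grid.product
  order by grid.day, grid.product;
quit;
```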
How to generate a repeating series of numbers in a column in SAS, from 1 to x?
Suppose x is 3.
Data is like:
name age
A 15
D 16
C 21
B 35
E 79
F 85
G 64
and I want to add a column named list, like this:
name age list
A 15 1
D 16 2
C 21 3
B 35 1
E 79 2
F 85 3
G 64 1
data class;
set sashelp.class;
if list>=3 then list=0;
list+1;
run;
Easiest way I can think of is to use mod and the iteration counter.
data want;
set have;
list = 1 + mod(_N_ - 1,3);
run;
mod is the modulo function (gives the remainder after dividing).
So if you want that to vary based on some parameter, well, change the 3 to a parameter.
%let num_atwork = 2;
data want;
set have;
list = 1 + mod(_N_ - 1, &num_atwork.);
run;
I want to insert a record holding the mean into the data set, according to the identifier variables. The data set looks like DS1, and I want to insert a mean row whenever an a-b pair occurs more than once, so that the target data set looks like DS2. Thanks, my friends.
data DS1;
input a b c;
cards;
1 2 23
1 2 43
1 2 23
1 3 55
1 4 48
2 1 43
2 1 56
2 2 34
;
run;
data DS2;
input a b c;
cards;
1 2 23
1 2 43
1 2 23
1 2 29.67
1 3 55
1 4 48
2 1 43
2 1 56
2 1 49.5
2 2 34
;
run;
Why SQL? If you're going to request a specific solution it's good to know why. Here are two methods: one uses a data step and the other uses SQL. The SQL solution calculates the means and appends them to the data set with UNION ALL. The DATA step calculates the means as it reads the data set, so it needs only one pass through the data and keeps the original row order.
data want_datastep;
set ds1;
by a b;
retain sum count;
drop sum count; /* helper accumulators, not wanted in the output */
if first.b then do;
sum=0;
count=0;
end;
sum=sum+c;
count+1;
if last.b and not first.b then do;
output;
c=sum/count;
output;
end;
else output;
run;
proc sql;
create table want_sql as
select * from
(select ds1.* from ds1 as ds1
union all
(select ds1_x.a, ds1_x.b, mean(ds1_x.c) as c
from ds1 as ds1_x
group by ds1_x.a, ds1_x.b
having count(ds1_x.b)>1))
order by a, b, c;
quit;
I am having trouble with how to compare two data sets in SAS, but one data set might have extra observations. I want to get rid of these extra observations and just compare the rest of the two data sets as they are. Let me give an example:
Data Set 1
ID Value1 Value2
105 1 A
105 2 B
105 3 C
*105 4 D
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
Data Set 2
ID Value1 Value2
105 1 A
105 2 B
105 3 C
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
Both data sets are equal except for the observation with ID=105, Value1=4 (marked with an asterisk for visual convenience) that is in Data Set 1, but not in Data Set 2.
I need to compare both data sets with these types of observations gone from my first data set and check if those observations are equal for ID and Value1. And yes, the ID value is repeated for some observations. They are not duplicates though as they have different "Value1" values associated with them.
Is there an easy way to do this?
data a1;
input ID value1 value2$;
datalines;
105 1 A
105 2 B
105 3 C
105 4 D
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
;
run;
data b1;
input ID value1 value2$;
datalines;
105 1 A
105 2 B
105 3 C
106 10 E
106 20 F
106 30 G
107 50 H
107 60 I
;
run;
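data a2(rename=(value1=value1_a value2=value2_a));
set a1;
/* Delimited key: avoids implicit numeric-to-character conversion
   and ambiguous concatenations such as 10||51 vs. 105||1 */
newID=catx('_', ID, value1);
run;
data b2(rename=(value1=value1_b value2=value2_b));
set b1;
newID=catx('_', ID, value1);
run;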
proc sort data=a2;
by newID;
run;
proc sort data=b2;
by newid;
run;
data c1;
merge a2(in=a) b2(in=b);
by newID;
from_a=a;
from_b=b;
run;
/**check out unmatched data records**/
data unmatched;
set c1;
where from_a^=1 or from_b^=1;
run;
proc print data=unmatched;
run;
Here is the code for the matched records:
data matched;
set c1;
where from_a=1 and from_b=1;
run;
proc print data=matched;
run;
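As a variant, the same split can be sketched without building a combined key, by merging directly on both variables. This assumes the a1 and b1 datasets defined above; the output names both, only_a and only_b and the flags in_a and in_b are illustrative.

```sas
proc sort data=a1; by ID value1; run;
proc sort data=b1; by ID value1; run;

data both only_a only_b;
  /* value2 from b1 is renamed so it does not overwrite the one from a1 */
  merge a1(in=in_a) b1(in=in_b rename=(value2=value2_b));
  by ID value1;
  if in_a and in_b then output both;   /* key present in both data sets */
  else if in_a then output only_a;     /* extra observations in a1 */
  else output only_b;                  /* extra observations in b1 */
run;
```

The both dataset can then be checked by comparing value2 with value2_b.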
Use PROC COMPARE with BY or ID
proc sort data=data1;
by id value1 value2;
run;
proc sort data=data2;
by id value1 value2;
run;
proc compare base=data1 compare=data2;
id id value1;
run;
This is documented under Comparing datasets with an ID variable:
http://support.sas.com/documentation/cdl/en/proc/65145/HTML/default/viewer.htm#n14cxqy1h9hof4n1cq4xmhv2atgs.htm