I want to merge two data sets in SAS. Here is an example:
Group Value
A 10
A 8
A 6
B 7
B 9
B 11
That is my first data set. I also have a second data set:
Group Volume
A 2
B 3
I want to merge these two data sets. The result should be:
Group Value Volume
A 10 2
A 8 2
A 6 2
B 7 3
B 9 3
B 11 3
I hope that explains it. Many thanks.
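Well, one way is to use PROC SQL with a join:
proc sql;
  /* create table is needed here: a bare select under noprint produces no output at all */
  create table combine as
  select a.*, b.volume
  from dataset1 as a
  left join dataset2 as b
  on a.group = b.group;
quit;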
Or, if you want to do it with a MERGE statement:
data combine;
merge dataset1 dataset2;
by group;
run;
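One caveat worth adding: a data step MERGE with a BY statement expects both inputs to be sorted (or indexed) by the BY variable. A minimal sketch, assuming the same dataset1 and dataset2 names as above. The in= flag name a is only illustrative; it keeps the result to groups present in dataset1, mirroring the left join:

```sas
/* Sort both inputs by the BY variable before merging */
proc sort data=dataset1; by group; run;
proc sort data=dataset2; by group; run;

data combine;
  merge dataset1(in=a) dataset2;
  by group;
  if a;  /* keep only groups that occur in dataset1, like the left join */
run;
```

Because dataset2 has one row per group, its volume value is carried across every dataset1 row in that group.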
I want to transpose a simple dataset (on the left) into the dataset on the right. All the variables are numeric. Please also name the variables as shown; I have a lot of variables that should follow this pattern and would prefer not to rename them by hand one by one. Thank you!
Here is a simple approach. I added another id for demonstration. You can re-arrange the columns if you like.
data have;
input id Vistime v1 v2;
datalines;
1 1 2 5
1 2 3 6
1 3 4 7
2 1 2 5
2 2 3 6
2 3 4 7
;
proc transpose data=have out=temp;
by id Vistime;
var v1 v2;
run;
proc transpose data=temp delim=_ out=want(drop=_:);
by id;
var col1;
id _name_ Vistime;
run;
Result
id v1_1 v2_1 v1_2 v2_2 v1_3 v2_3
1 2 5 3 6 4 7
2 2 5 3 6 4 7
Searching Google for this has been difficult. I have two categorical variables, age and month, with 7 levels each. For a few combinations, say age = 7 and month = 7, there is no value, and when I use PROC SQL the combinations that have no entries do not show up, e.g.:
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
THIS LINE DOESNT SHOW
What I want:
age month value
1 1 4
2 1 12
3 1 5
....
7 1 6
...
1 7 8
....
5 7 44
6 7 5
7 7 0
This happens a few times in the data, where the last groups don't have a value and so don't show, but I'd like them to appear for later purposes.
You have a few options available. One approach is to create the master data (every combination) and then merge your data into it.
Another is to use PRELOADFMT with formats, or the CLASSDATA= option.
And the last, but possibly the easiest: if every month and every age appears somewhere in the data set, use the SPARSE option within PROC FREQ. It creates all possible combinations.
proc freq data=have;
  /* with WEIGHT, the COUNT column in want holds the sum of value */
  table age*month / out=want sparse;
  weight value;
run;
First some sample data:
data test;
do age=1 to 7;
do month=1 to 12;
value = ceil(10*ranuni(1));
if ranuni(1) < .9 then
output;
end;
end;
run;
This leaves a few holes, notably, (1,1).
I would use a series of SQL statements to get the levels, cross join those, and then left join the values on, doing a coalesce to put 0 when missing.
proc sql;
create table ages as
select distinct age from test;
create table months as
select distinct month from test;
create table want as
select a.age,
a.month,
coalesce(b.value,0) as value
from (
select age, month from ages, months
) as a
left join
test as b
on a.age = b.age
and a.month = b.month;
quit;
The group-independent crossing of the classification variables requires that the distinct levels of each variable be cross joined with the others. This forms a hull that can be left joined to the original data. If an age*month combination has more than one row, you need to decide whether you want:
- rows with repeated age and month and the original value, or
- rows with distinct age and month, with either
  - an aggregate function to summarize the values, or
  - an indication of too many values
data have;
input age month value;
datalines;
1 1 4
2 1 12
3 1 5
7 1 6
1 7 8
5 7 44
6 7 5
8 8 1
8 8 11
;
run;
proc sql;
create table want1(label="Original class combos including duplicates and zeros for absent cross joins")
as
select
allAges.age
, allMonths.month
, coalesce(have.value,0) as value
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
order by
allMonths.month, allAges.age
;
quit;
And a slight variation that marks duplicated class crossings
proc format;
value S_V_V .t = 'Too many source values'; /* single valued value */
run;
proc sql;
create table want2(label="Distinct class combos allowing only one contributor to value, or defaulting to zero when none")
as
select distinct
allAges.age
, allMonths.month
, case
when count(*) = 1 then coalesce(have.value,0)
else .t
end as value format=S_V_V.
, count(*) as dup_check
from
(select distinct age from have) as allAges
cross join
(select distinct month from have) as allMonths
left join
have
on
have.age = allAges.age and have.month = allMonths.month
group by
allMonths.month, allAges.age
order by
allMonths.month, allAges.age
;
quit;
This type of processing can also be done in Proc TABULATE using the CLASSDATA= option.
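A rough sketch of the CLASSDATA= idea, shown with PROC SUMMARY rather than PROC TABULATE (swapped in here because its output data set is simpler to post-process). It assumes the have data set above. The names classlevels and want3 are made up for illustration, and summing duplicate rows is just one of the choices listed earlier:

```sas
/* Build every age*month combination to use as the class data set */
proc sql;
  create table classlevels as
  select allAges.age, allMonths.month
  from (select distinct age from have) as allAges
       cross join
       (select distinct month from have) as allMonths;
quit;

/* NWAY keeps only the full age*month crossing; combinations absent from
   have are supplied by classlevels with _FREQ_ = 0 and a missing sum */
proc summary data=have classdata=classlevels nway;
  class age month;
  var value;
  output out=want3(drop=_type_) sum=value;
run;

/* Replace the missing sums with zero */
data want3;
  set want3;
  value = coalesce(value, 0);
run;
```

The _FREQ_ column left in want3 plays the same role as dup_check above: 0 means no source row, and more than 1 means several rows were summed.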
I want to count the number of times each value appears in a particular column in SAS. For example, in the following dataset the value 1 appears 3 times,
the value 2 appears twice, the value 3 appears once, the value 4 appears 3 times, and the value 5 appears 5 times.
Game_ball
1
1
1
2
2
3
4
4
4
5
5
5
5
5
I want the dataset to be represented like the following:
Game_ball Count
1 3
2 2
3 1
4 3
5 5
. .
. .
. .
Thanks in advance
As per @Dwal, PROC FREQ is the easiest solution.
Using your sample data,
proc freq data=sample;
table game_ball/out=output;
run;
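Or do it in a one-pass data step (after sorting):
proc sort data=sample; by game_ball; run;
data output;
  set sample;
  by game_ball;                      /* BY belongs right after SET */
  if first.game_ball then count = 0; /* reset at the start of each value */
  count + 1;                         /* sum statement: count is retained automatically */
  if last.game_ball then output;     /* one row per game_ball value */
run;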
Or in SQL
proc sql;
create table output as
select game_ball, count(*) as count
from sample
group by game_ball;
quit;
I am merging two SAS datasets by ID number and would like to remove all instances of duplicate IDs, i.e. if an ID number occurs twice in the merged dataset then both observations with that ID will be deleted.
Web searches have suggested some sql methods and nodupkey, but these are not working because they are for typical duplicate cleansing where one instance is kept and then the multiples are deleted.
Assuming you are using a DATA step with a BY id; statement, then adding:
if NOT (first.id and last.id) then delete;
should do it. If that doesn't work, please show your code.
I'm actually a fan of writing dropped records to a separate dataset so you can track how many records were dropped at different points. So I would code this something like:
data want
drop_dups
;
merge a b ;
by id ;
if first.id and last.id then output want ;
else output drop_dups ;
run ;
Here is an SQL way to do it. You can use whichever of left/right/inner join best suits your needs. Note that this works on a single dataset just as well.
proc sql;
create table singles as
select * from dataset1 a inner join dataset2 b
on a.ID = b.ID
group by a.ID
having count(*) = 1;
quit;
For example from
ID x
5 2
5 4
1 6
2 7
3 6
You will select
ID x
1 6
2 7
3 6
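The single-dataset case mentioned above can be sketched like this, using the example rows just shown. The data set name have is an assumption made for illustration:

```sas
data have;
  input ID x;
  datalines;
5 2
5 4
1 6
2 7
3 6
;
run;

proc sql;
  create table singles as
  select *
  from have
  group by ID
  having count(*) = 1;  /* keep only IDs that occur exactly once */
quit;
```

SAS remerges the group count back onto the detail rows here (it writes a note to the log saying so), which is what lets select * be combined with group by.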
Suppose I have a dataset A:
ID Geogkey
1 A
1 B
1 C
2 W
2 R
2 S
and another dataset B:
ID Temp Date
1 95 1
1 100 2
1 105 3
2 10 1
How do I merge these two datasets so I get three records each for geogkeys with id=1 and one record each for geogkeys where id =2?
Assuming you want the cartesian join, you are best off doing that in SQL, if it's not too big:
proc sql;
create table C as
select * from A,B
where A.ID=B.ID
;
quit;
The select * will generate a warning that the ID variable already exists on the output table; if that's a concern, explicitly spell out your select list (select A.ID, A.Geogkey, B.Temp, B.Date).
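Spelled out, the explicit version might look like the following sketch, assuming datasets A and B exactly as in the question:

```sas
proc sql;
  create table C as
  select A.ID, A.Geogkey, B.Temp, B.Date
  from A inner join B
    on A.ID = B.ID;  /* every A row pairs with every B row sharing its ID */
quit;
```

For ID=1 this gives 3 x 3 = 9 rows (three per Geogkey), and for ID=2 it gives 3 x 1 = 3 rows (one per Geogkey).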