NAME DATE
---- ----------
BOB 24/05/2013
BOB 12/06/2012
BOB 19/10/2011
BOB 05/02/2010
BOB 05/01/2009
CARL 15/05/2011
LOUI 15/01/2014
LOUI 15/05/2013
LOUI 15/05/2012
DATA newdata;
SET mydata;
count + 1;
IF FIRST.name THEN count=1;
BY name DESCENDING date;
run;
This gives me a running count within each name group (1, 2, 3, and so on). I want to output all observations for a name (for example, every BOB row) only if that name's count is greater than 3. Please help.
The simplest way to do that is to output the last row for each ID if it is > 3, then merge that dataset back to your master dataset, keeping only matches. You could also use PROC FREQ to generate the dataset of counts and merge to that.
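As a rough sketch of the PROC FREQ variant of that idea, assuming the input dataset is called have and is already sorted (or at least grouped in order) by name; the dataset name counts and the flag keep_me are made up for illustration:

```sas
/* Count the rows per name; the OUT= dataset holds NAME, COUNT and PERCENT */
proc freq data=have noprint;
  tables name / out=counts (keep=name count);
run;

/* Merge the counts back, keeping only names that occur more than 3 times */
data want;
  merge have
        counts (where=(count > 3) in=keep_me);
  by name;
  if keep_me;   /* drop rows whose name has no qualifying count */
  drop count;
run;
```

With the sample data above, only the five BOB rows should survive, since CARL has one row and LOUI has three.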
You can do it in a single datastep using a DoW loop, but that's more complicated, so I wouldn't recommend a new user do that.
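For reference, a double DoW loop for this problem might look something like the following. This is only a sketch: it assumes the dataset is called have and is sorted by name, and the helper variables _count and _i are hypothetical names.

```sas
data want;
  /* First loop: read one whole BY group and count its rows */
  _count = 0;
  do until (last.name);
    set have;
    by name;
    _count + 1;
  end;

  /* Second loop: re-read the same rows and output them if the group is big enough */
  do _i = 1 to _count;
    set have;
    if _count > 3 then output;
  end;

  drop _count _i;
run;
```

The two SET statements read the dataset independently, so the second loop replays exactly the rows that the first loop just counted.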
I think this shows the power of SQL, though some would say it isn't good practice because it generates a NOTE in the log (SAS remerges the summary statistic with the original rows). Use the GROUP BY and HAVING clauses to count the rows for each name, then keep only the names with more than 3.
proc sql;
create table want as
select *
from have
group by name
having count(name)>3;
quit;
Here are a couple different ways to do this using SUBQUERIES in PROC SQL
Data HAVE;
Length NAME $50;
Input Name $ Date: ddmmyy10.;
Format date ddmmyy10.;
datalines;
BOB 24/05/2013
BOB 12/06/2012
BOB 19/10/2011
BOB 05/02/2010
BOB 05/01/2009
CARL 15/05/2011
LOUI 15/01/2014
LOUI 15/05/2013
LOUI 15/05/2012
;
Run;
Using a multiple-value subquery in the WHERE clause
Proc sql;
Create table WANT1 as
Select *
From Have
Where Name in (Select name from have b group by b.name having count(b.name)>3);
Quit;
Using a subquery in the From clause
Proc sql;
Create table WANT2 as
Select a.name, a.date
From Have a Inner Join (select name, count(name) as Count from have group by name having count(name)>3) b
On a.name=b.name
;
Quit;
So I have the following data:
Data Cricket;
input match $;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
; run;
Need Output:
Match Count
IndVsPak 3
NezVsAus 2
WesVsPak 1
WesVsAus 1
Please help with the code. How many ways are there to get the above output?
Try this:
Data Cricket;
input match $;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
;
run;
/*standardise team order within each match - easier to do in data step*/
data temp /view = temp;
set cricket;
team1 = substr(match,1,3);
team2 = substr(match,6,3);
call sortc(of team:);
match_sorted = cats(team1,'Vs',team2);
run;
proc sql noprint;
create table want as
select match_sorted, count(match_sorted) as freq
from temp
group by match_sorted
order by freq descending
;
quit;
Output:
match_sorted freq
IndVsPak 3
AusVsNez 2
AusVsWes 1
PakVsWes 1
Here's my attempt at doing this entirely in proc sql:
proc sql noprint;
create table want as
select
ifc(
team1 < team2,
cats(team1, 'Vs', team2),
cats(team2, 'Vs', team1)
) as match_sorted length=8,
count(calculated match_sorted) as freq
from (
select
substr(match,1,3) as team1,
substr(match,6,3) as team2
from cricket
)
group by match_sorted
order by freq descending
;
quit;
N.B. this uses a calculated field - a bit of SAS-specific sql functionality. You could eliminate this by setting the whole thing up as a sub-query that produces match_sorted, or you could flatten the query and use calculated fields for everything.
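A sketch of the first alternative, where an inline view produces match_sorted so that the outer query needs no CALCULATED keyword. It assumes the same cricket dataset and the same fixed layout (three-letter team codes separated by "Vs"):

```sas
proc sql;
  create table want as
  select match_sorted,
         count(*) as freq
  from (
    /* Inline view: put the two team codes in alphabetical order */
    select
      ifc(
        substr(match,1,3) < substr(match,6,3),
        cats(substr(match,1,3), 'Vs', substr(match,6,3)),
        cats(substr(match,6,3), 'Vs', substr(match,1,3))
      ) as match_sorted length=8
    from cricket
  )
  group by match_sorted
  order by freq descending
  ;
quit;
```

The cost is that each SUBSTR call is written out twice, which is the trade-off against the calculated-field version.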
Good day. In SAS, (almost) everything is done via PROCs, which are a kind of macro that performs an action.
In this case I suggest using PROC FREQ:
Data Cricket;
input match $10.;
cards;
IndVsPak
NezVsAus
PakVsInd
WesVsPak
WesVsAus
IndVsPak
AusVsNez
; run;
proc freq data=Cricket noprint;
table match / out= freqs ;
run;
You can see the output by removing the noprint-option.
This will also work if you are more comfortable using SQL:
PROC SQL;
SELECT match, count(*) AS cnt FROM cricket GROUP BY match;
QUIT;
I want to count the number of unique items in a variable (call it "categories") then use that count to set the number of iterations in a SAS macro (i.e., I'd rather not hard code the number of iterations).
I can get a count like this:
proc sql;
select count(*)
from (select DISTINCT categories from myData);
quit;
I can run a macro like this:
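%macro superFreq;
%do i=1 %to &iterationVariable;
proc freq data=myData;
/* a period ends the macro variable reference, so var&i.freq resolves to var1freq, var2freq, ... */
table var&i / out=var&i.freq;
run;
%end;
%mend superFreq;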
%superFreq
I want to know how to get the count into the iteration variable so that the macro iterates as many times as there are unique values in the variable "categories".
Sorry if this is confusing. Happy to clarify if need be. Thanks in advance.
You can achieve this by using the into clause in proc sql:
proc sql noprint;
select max(age),
max(height),
max(weight)
into :max_age,
:max_height,
:max_weight
from sashelp.class;
quit;
%put &=max_age &=max_height &=max_weight;
Result:
MAX_AGE= 16 MAX_HEIGHT= 72 MAX_WEIGHT= 150
You can also select a list of results into a macro variable by combining the into clause with the separated by clause:
proc sql noprint;
select name into :list_of_names separated by ' ' from sashelp.class;
quit;
%put &=list_of_names;
Result:
LIST_OF_NAMES=Alfred Alice Barbara Carol Henry James Jane Janet Jeffrey John Joyce Judy Louise Mary Philip Robert Ronald Thomas
William
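Applied to the question, a minimal sketch could look like this. It reuses the names from the question (myData, categories, iterationVariable); the TRIMMED option, available in recent SAS releases, strips the leading blanks that would otherwise be stored with the number:

```sas
proc sql noprint;
  /* Store the number of distinct categories in a macro variable */
  select count(distinct categories)
    into :iterationVariable trimmed
  from myData;
quit;

%put &=iterationVariable;
```

Once this has run, the %do loop in the macro can use &iterationVariable as its upper bound.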
I am using the code below, but in the final output the name is missing in the first row, where income is 234234. How do I get the name to appear there?
data names;
input name $ age;
datalines;
John 10
Mary 12
Sally 12
Fred 1
Paul 2
;
run;
data check;
input name $ income;
datalines;
Mary 121212
Fred 334343
Ben 234234
;
Proc sql;
title 'Inner Join';
create table common_names as
select * from names as n right join check as c on
n.name = c.name;
quit;
Proc print data = common_names;
run;
Output
Inner Join
Obs name age income
1 . 234234
2 Fred 1 334343
3 Mary 12 121212
You cannot create two variables with the same name, in this case the variable NAME. So either create two variables
select n.name as name1, c.name as name2, ....
or use the COALESCE() function to create a single variable.
select coalesce(n.name,c.name) as name, ....
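Written out in full for the tables in the question, the COALESCE() version might look like this sketch (the column list n.age and c.income is taken from the datasets shown above):

```sas
proc sql;
  create table common_names as
  select coalesce(n.name, c.name) as name,  /* take whichever side has a name */
         n.age,
         c.income
  from names as n
       right join check as c
       on n.name = c.name;
quit;
```

For the Ben row, n.name is missing because Ben has no match in names, so COALESCE() falls back to c.name.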
You might also want to look at SAS's NATURAL join. That will link tables on variables with the same name and automatically coalesce the key variable values.
create table common_names as
select *
from names as n
natural right join check as c
;
Just a general question. Say I have two datasets called dataset1 and dataset2, and I want to compare each row of dataset1 against the whole of dataset2. Below is an example of the two datasets:
Dataset1
EmployeeID Name Employeer
12345 John Microsoft
1234567 Alice SAS
1234565 Jim IBM
Dataset2
EmployeeID2 Name DateAbsent
12345 John 25/06/2009
12345 John 26/06/2009
1234567 Alice 27/06/2010
1234567 Alice 30/06/2011
1234567 Alice 2/8/2012
12345 John 28/06/2009
12345 John 25/07/2009
12345 John 25/08/2009
1234565 Jim 26/08/2009
1234565 Jim 27/08/2010
1234565 Jim 28/08/2011
1234565 Jim 29/08/2012
I have written out my logic below. It is not SAS code, just pseudocode:
for item in dataset1:
for item2 in dataset2:
if item.EmployeeID=item2.EmployeeID2 and item.Name=item2.Name then output newSet
This is an inner join.
proc sql noprint;
create table output as
select a.EmployeeId,
a.Name,
a.Employeer,
b.DateAbsent
from dataset1 as a
inner join
dataset2 as b
on a.EmployeeID = b.EmployeeID2
and a.Name = b.name;
quit;
I recommend reading the SAS documentation on PROC SQL if you are unfamiliar with the syntax.
To do this in a Data step, the data sets need to be sorted by the variables to join on (or indexed). Also the variable names need to be the same, so I will assume both variables are EmployeeID.
/*sort*/
proc sort data=dataset1;
by EmployeeID Name;
run;
proc sort data=dataset2;
by EmployeeID Name;
run;
data output;
merge dataset1 (in=ds1) dataset2 (in=ds2);
by EmployeeID Name;
if ds1 and ds2;
run;
The data step does the loop for you. It needs sorted sets because it only takes 1 pass over the data sets. The if clause checks to make sure you are getting a value from both data sets.
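If the key really is called EmployeeID2 in dataset2, as in the question, one way to line the names up is to rename it while sorting. This is a sketch only: the output names ds1_sorted and ds2_sorted are made up, and it assumes the ID variables have the same type in both datasets.

```sas
proc sort data=dataset1 out=ds1_sorted;
  by EmployeeID Name;
run;

/* Sort by the original key name; the rename applies to the output dataset */
proc sort data=dataset2
          out=ds2_sorted (rename=(EmployeeID2=EmployeeID));
  by EmployeeID2 Name;
run;

data output;
  merge ds1_sorted (in=ds1) ds2_sorted (in=ds2);
  by EmployeeID Name;
  if ds1 and ds2;   /* keep only rows present in both datasets */
run;
```

Writing to new datasets with OUT= also leaves the original datasets in their existing order.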
Is your goal to compare the two dataset and see where there are differences? Proc Compare will do this for you. You can compare specific columns or the entire dataset.
I am trying to make a simpler unique identifier from an already existing identifier. Starting with just an ID column, I want to make a new, simpler ID column so that the final data looks like what follows. There are over 1 million IDs, so a series of IF/THEN statements isn't an option. Maybe a DO statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
Using PROC SQL:
(You can probably do this without the intermediate datasets using subqueries, but sometimes monotonic() doesn't act the way you'd think in a subquery.)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original a, uniq_id2 b
where a.id = b.id
;
quit;