Here is the the structure of my data below:
I only need ID 2 since all patients are alive, therefore trying to delete ID 1:
ID sex status
1 2 A
1 2 A
1 2 A
1 2 D
2 1 A
2 1 A
2 1 A
If you truly want to delete the records from your source dataset, you can do so with:
PROC SQL;
DELETE FROM MyData WHERE ID = 1;
QUIT;
However, if you want to retain the source dataset as-is; maybe you will use it again, it would be best to create a new dataset from it, like so:
PROC SQL;
CREATE TABLE MyFilteredData AS
SELECT ID, sex, status
FROM MyData
WHERE ID = 2;
QUIT;
or
DATA MyFilteredData;
SET MyData;
IF ID = 2;
RUN;
proc sql;
delete from your_data where id ~= 2;
quit;
This PROC SQL will create a new dataset Want from the original dataset Have including only IDs that have no status="D":
proc sql;
create table Want as
select *
from Have
where ID not in
(select distinct ID
from Have
where status="D")
;
quit;
Related
Below given dataset I am trying to find New Users Vs Repeated Users.
DATE ID Unique_Event
20200901 a12345 1
20200902 a12345 1
20200903 b12345 1
20200903 a12345 1
20200904 c12345 1
In the above dataset, since a12345 appeared on multiple dates, should be counted as a "repeated" user whereas b12345 only appeared once, so he is a "new" user. Please note, this is only sample data as the actual data is quite large. I tried the below code, but I am not getting the correct count. Ideally, tot_num_users-num_new_users should be repeated users, but I am getting incorrect counts. Am I missing something?
Expected Output:
Month new_users repeated_users
9 2 1
Code:
data user_events;
set user_events;
new_date=input(date,yymmdd10.);
run;
proc sql;select month(new_date) as mm,
count(distinct vv.id) as total_num_users,
count(distinct case when v.new_date = vv.minva then v.id end) as num_new_users,
(count(distinct vv.id) - count(distinct case when v.new_date = vv.minva then id end)
) as num_repeated_users
from user_events v inner join
(select t.id, min(new_date) as minva
from user_events t
group by t.id
) vv
on v.id = vv.id
group by 1
order by 1;quit;
In a sub-select, for each ID you can count the number of distinct DATE to determine the new / repeated status. The all ids aggregate computations are made from the sub-select.
proc sql;
create table freq as
select
count(*) as id_count
, sum (status='repeated') as id_repeated_count /* sum counts a logic eval state */
, sum (status='new') as id_new_count
from
( select
id
, case
when count(distinct date) > 1 then 'repeated'
else 'new'
end as status
from
user_events
group by
id
) as statuses
;
An alternative solution not using proc sql (though I'm aware you tagged this with "proc sql").
data final;
set user_events;
Month=month(new_date);
run;
proc sort data=final; by Month ID;
data final;
set final;
by Month ID;
if first.Month then do;
new_users=0;
repeated_users=0;
end;
if last.ID then do;
if first.ID then
new_users+1;
else
repeated_users+1;
end;
if last.Month then
output;
keep Month new_users repeated_users;
run;
Since you are using proc sql, this is a sql question, not a SAS question.
Try something like:
proc sql;
select ID,count(Unique_Event)
from <that table>
group by ID
order by ID
run;
I have a sas datebase with something like this:
id birthday Date1 Date2
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
And I want the data in this form:
id Date Datetype
1 12/4/01 birthday
1 12/4/13 1
1 12/3/14 2
2 12/3/01 birthday
2 12/6/13 1
2 12/2/14 2
3 12/9/01 birthday
3 12/4/03 1
3 12/9/14 2
4 12/8/13 birthday
4 12/3/14 1
4 12/10/16 2
Thanks by ur help, i'm on my second week using sas <3
Edit: thanks by remain me that i was not finding a sorting method.
Good day. The following should be what you are after. I did not come up with an easy way to rename the columns as they are not in beginning data.
/*Data generation for ease of testing*/
data begin;
input id birthday $ Date1 $ Date2 $;
cards;
1 12/4/01 12/4/13 12/3/14
2 12/3/01 12/6/13 12/2/14
3 12/9/01 12/4/03 12/9/14
4 12/8/13 12/3/14 12/10/16
; run;
/*The trick here is to use date: The colon means everything beginning with date, comparae with sql 'date%'*/
proc transpose data= begin out=trans;
by id;
var birthday date: ;
run;
/*Cleanup. Renaming the columns as you wanted.*/
data trans;
set trans;
rename _NAME_= Datetype COL1= Date;
run;
See more from Kent University site
Two steps
Pivot the data using Proc TRANSPOSE.
Change the names of the output columns and their labels with PROC DATASETS
Sample code
proc transpose
data=have
out=want
( keep=id _label_ col1)
;
by id;
var birthday date1 date2;
label birthday='birthday' date1='1' date2='2' ; * Trick to force values seen in pivot;
run;
proc datasets noprint lib=work;
modify want;
rename
_label_ = Datetype
col1 = Date
;
label
Datetype = 'Datetype'
;
run;
The column order in the TRANSPOSE output table is:
id variables
copy variables
_name_ and _label_
data based column names
The sample 'want' shows the data named columns before the _label_ / _name_ columns. The only way to change the underlying column order is to rewrite the data set. You can change how that order is perceived when viewed is by using an additional data view, or an output Proc that allows you to specify the specific order desired.
Trying to make a more simple unique identifier from already existing identifier. Starting with just and ID column I want to make a new, more simple, id column so the final data looks like what follows. There are 1million + id's, so it isnt an option to do if thens, maybe a do statement?
ID NEWid
1234 1
3456 2
1234 1
6789 3
1234 1
A trivial data step solution not using monotonic().
proc sort data=have;
by id;
run;
data want;
set have;
by id;
if first.id then newid+1;
run;
using proc sql..
(you can probably do this without the intermediate datasets using subqueries, but sometimes monotonic doesn't act the way you'd think in a subquery)
proc sql noprint;
create table uniq_id as
select distinct id
from original
order by id
;
create table uniq_id2 as
select id, monotonic() as newid
from uniq_id
;
create table final as
select a.id, b.newid
from original_set a, uniq_id2 b
where a.id = b.id
;
quit;
I have 2 datasets as below
id name status
1 A a
2 B b
3 C c
Another dataset
name status new
C c 0
D d 1
E e 1
F f 1
How do I insert all rows from 2nd table to 1st table? The situation is that the first table is permanent. The 2nd table is updated monthly, so I would like to add all rows from the monthly updated table to the permanent table, so that it would look like this
id name status
1 A a
2 B b
3 C c
4 D d
5 E e
6 F f
The problem I'm facing is that I cannot increment the id from dataset 1. As far as I searched, the dataset in SAS does not have auto increment property. The auto increment can be done with using data step, but I don't know if data step could be use in the case with 2 tables like this.
The usual sql would be
Insert into table1 (name, status)
select name, status from table2 where new = 1;
But since the sas dataset not support auto increment column hence the problem I'm facing.
I could solve it by using SAS data step as below after the above proc sql
data table1;
set table1;
if _n_ > 3 then id = _n_;
run;
This would increase the value of id column, but the code is kinda ugly, and also the id is a primary key, and being used as a foreign key in other table, so I don't want to mess up the ids of old rows.
I'm in the process of both learning and working with SAS so help is really appreciated. Thanks in advance.
Extra question:
If the 2nd table does not have the new column, is there any way to complete what I want (add new row from monthly table (2nd) to permanent table (1st)) with data step? Currently, I use this ugly proc sql/data step to create new column
proc sql; //create a temp table from table2
create t2temp as select t2.*,
(case when t2.name = t1.name and t2.status = t1.status then 0 else 1) as new
from table2 as t2
left join table1 as t1
on t2.name = t1.name and t2.status = t1.status;
drop table t2; //drop the old table2 with no column "new"
quit;
data table2; //rename the t2temp as table2
set t2temp;
run;
You can do it in the datastep. BTW, if you were creating it entirely anew, you could just use
id+1;
to create an autonumbered field (assuming your data step wasn't too complicated). This will keep track of the current highest ID number and assign one higher to each row as you go if it is in the new dataset.
data have;
input id name $ status $;
datalines;
2 A a
3 B b
1 C c
;;;;
run;
data addon;
input name $ status $ new;
datalines;
C c 0
D d 1
E e 1
F f 1
;;;;
run;
data want;
retain _maxID; *keep the value of _maxID from one row to the next,
do not reset it;
set have(in=old) addon(in=add); *in= creates a temporary variable indicating which
dataset a row came from;
if (old) or (add and new); *in SAS like in c/etc., 0/missing(null) is
false negative/positive numbers are true;
if add then ID = _maxID+1; *assigns ID to the new records;
_maxID = max(id,_maxID); *determines the new maximum ID -
this structure guarantees it works
even if the old DS is not sorted;
put id= name=;
drop _maxID;
run;
Response to second question:
Yes, you can still do that. One of the easiest ways is, if you have the datasets sorted by NAME:
data want;
retain _maxID;
set have(in=old) addon(in=add);
by name;
if (old) or (add and first.name);
if add then ID = _maxID+1;
_maxID = max(id,_maxID);
put id= name=;
run;
first.name will be true for the first record with the same value of name; so if HAVE has a value of that name, then ADDON will not be permitted to add a new record.
This does require name to be unique in HAVE, or you might delete some records. If that is not true then you have a more complicated solution.
I've the below dataset as input
ID
--
1
2
2
3
4
4
4
5
And need a new dataset as below
ID count of ID
-- -----------
1 1
2 2
3 1
4 3
5 1
Could you please tell how to do this in SAS wihtout using PROC SQL?
or how about Proc Freq or Proc Summary? These avoid having to presort the data.
proc freq data=have noprint;
table id / out=want1 (drop=percent);
run;
proc summary data=have nway;
class id;
output out=want2 (drop=_type_);
run;
proc sql noprint;
create table test as select distinct id, count(id)
from your_table
group by ID
order by ID
;
quit;
Try this:
DATA Have;
input id ;
datalines;
1
2
2
3
4
4
4
5
;
Proc Sort data=Have;
by ID;
run;
Data Want;
Set Have;
By ID;
If first.ID then Count=0;
Count+1;
If Last.ID then Output;
Run;
PROC SORT DATA=YOURS NOPRINT;
BY ID; RUN;
PROC MEANS DATA=YOURS;
VAR ID;
BY ID;
OUTPUT OUT=NEWDATASET N=; RUN;
You can also choose to keep only the Id and N variables in your newdataset.
We can use simple PROC SQL count to do this:
proc sql;
create table want as
select id, count(id) as count_of_id
from have
group by id;
quit;
Here is yet another possibility, often known as a DoW construction:
Data want;
do count=1 by 1 until(last.ID);
set have;
by id;
end;
run;
If the aggregation you want to do is complex then go with PROC SQL only as we are more familiar with Group by in SQL
proc sql ;
create table solution_1 as select distinct ID, count(ID)
from table_1
group by ID
order by ID
;
quit;
OR
If you are using SAS- EG Query builders are very useful in small
analyses .
It's just drag & drop the columns u want to aggregate and in summary option Select whatever operation you want to perform like Avg,Count,miss,NMiss etc .